
Random Thought on Research Methods in CS/CIS

CSCI 6530
July 1, 2010

Kwok-Bun Yue
University of Houston-Clear Lake

Random

• Random: not organized.

7/1/2010 Bun Yue: yue@uhcl.edu, http://dcm.uhcl.edu/yue slide 2

Merriam-Webster

• Research
– 1 : careful or diligent search
– 2 : studious inquiry or examination; especially : investigation or experimentation aimed at the discovery and interpretation of facts, revision of accepted theories or laws in the light of new facts, or practical application of such new or revised theories or laws

– 3 : the collecting of information about a particular subject


For what?

• Finding new things: facts, theories, processes, tools, relationships, techniques.

• Solving problems


Why Research?

• Solving problems.
• Enhancing understanding.
• Career enhancement.
• Curiosity and fun.
• …


Research Methods

• Discipline dependent.
– E.g. medical research: double-blind test with control.
• Scientific methods.
• Empirical methods.


Starting Research

• What do you need to start your research?
– Talk! Talk! Talk!
– Think! Think! Think!
– Read! Read! Read!


Asking Questions

• ASK! ASK! ASK!


Not Asking Questions

• Easy
• Comfortable
• Familiar
• …


Asking is crucial

• Get a context of the problem from many angles.
• Organize your thoughts.
• Model and refine your understanding.
• Discover new information and insight.


Intellectual Curiosity

• A key for deep understanding, important discovery and … fun.

• Sometimes it should not be too output driven: ‘down’ time is needed.


• Recommended reading: Surely You're Joking, Mr. Feynman! (Adventures of a Curious Character) by Richard Feynman.

Keeping an open mind

• Keep an open mind as long as possible.
– Do not jump to the first solution that you have come up with.


Research in Physics

• Scientific Methods:
1. Observe, ask questions and understand.
2. Make hypothesis and model.
3. Make (precise) predictions using the hypothesis.
4. Test the predictions.


Questions in Physics

• Fundamental questions: e.g.
– Can the four fundamental forces be unified: theory of everything?
– Where does our universe come from?
– What are elementary particles made of?


Results in Physics

• Theories: e.g.
– Superstring theory.
– Big bang theory.
– Quarks.

• New facts.


Validations in Physics

• Experiment with predictions by theories.
• E.g.: Big bang theory predicts abundance of light elements.
– Positive results: add confidence.
– Negative results: reject theory.


Questions in Computing

• Much more diverse. Have aspects from most other areas: engineering, science, humanities, …

• Can create your own ‘universe’ (vs. economics, for example).


Results in CS

• New theories, algorithms, processes, methods, facts, etc.

• New models, problems and application areas.


Validations

• Direct validation
• Theoretical analysis
• Simulation
• Benchmarking
• Statistical methods
• …


Planning: Goals

• Output oriented incentives can be too ‘far away’.

• Setting plans and goals.
– Create a detailed plan of steps and benchmarks.
– Small goals every step.
– Consider input-oriented goals.


Early Web Business Model


Build Websites -> Attract Huge Traffic -> Something happens -> Rich!

Thesis


Understand Problem -> Design and Implement Solution -> Good thing happens -> Done!

Detailed Plan

• Create a road map with enough details to the final goals.
– Preparation.
– Planning.
– Risk management.

• Recommended reading: Ed Viesturs, “No Shortcuts to the Top: Climbing the World's 14 Highest Peaks”


Areas of My Research Interest

• Internet Computing
• XML and semi-structured data
• CS and IS education
• Concurrent Programming


(Older) XML Projects

• Storage of XML in relational database (Used as an example)

• XML Metrics


Storing XML in RDB

• Advantages:
– Mature database technologies.
– May be queried by
  • XML technology: e.g. XPath, XQuery.
  • RDB technology: e.g. SQL.
• Disadvantages:
– Impedance mismatch: XML and relations are different data models.


Related Issues

• Effective mapping of XML DTDs (~ ordered tree model) to relational schemas.

• Mapping of XML queries (e.g. XQuery) to RDB queries (e.g. SQL).

• Mapping of RDB query results back to XML format.


Related Work and Context

• Mapping
– With or without schemas for XML.
– With or without user input.
• Schemas for XML:
– Document Type Definition (DTD)
– XML Schema

• We consider mapping with DTD and without user input.


Naïve Mapping

• An XML element is mapped to a relation.

Example 1a:
XML: <a><b><c><d>hello</d></c></b></a>
-> Relations: a, b, c and d.
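Example 1a can be sketched in a few lines of Python. This is only an illustration of the naïve idea, not a published mapping algorithm: the function name naive_shred, the surrogate id and parent_id fields, and the row layout are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET
from itertools import count


def naive_shred(xml_text):
    """Map every element to a row in a relation named after its tag."""
    relations = {}      # relation name (element tag) -> list of rows
    next_id = count(1)  # surrogate keys: an assumption of this sketch

    def visit(elem, parent_id):
        row_id = next(next_id)
        text = (elem.text or "").strip() or None
        relations.setdefault(elem.tag, []).append(
            {"id": row_id, "parent_id": parent_id, "text": text}
        )
        # Each child element goes to its own relation, linked by parent_id.
        for child in elem:
            visit(child, row_id)

    visit(ET.fromstring(xml_text), None)
    return relations


relations = naive_shred("<a><b><c><d>hello</d></c></b></a>")
for name, rows in relations.items():
    print(name, rows)
```

Four elements produce four relations, and the only way to connect "hello" back to its enclosing a element is to follow the parent_id links.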


Problems of Naïve Mapping

• Many relations.
• Inefficient queries: multiple joins.
Example 1b:
XPath Query: //a
SQL Query: need to join the relations a, b, c and d.
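The join cost in Example 1b can be made concrete with an in-memory SQLite database. The table layout (id, parent_id, content) is an assumption of this sketch; the slides do not give a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One relation per element; parent_id links a row to its parent element.
# The column layout is hypothetical.
for name in ("a", "b", "c", "d"):
    cur.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, parent_id INTEGER, content TEXT)"
    )

cur.execute("INSERT INTO a VALUES (1, NULL, NULL)")
cur.execute("INSERT INTO b VALUES (2, 1, NULL)")
cur.execute("INSERT INTO c VALUES (3, 2, NULL)")
cur.execute("INSERT INTO d VALUES (4, 3, 'hello')")

# Rebuilding the content of <a> touches all four relations.
rows = cur.execute(
    """
    SELECT a.id, d.content
    FROM a
    JOIN b ON b.parent_id = a.id
    JOIN c ON c.parent_id = b.id
    JOIN d ON d.parent_id = c.id
    """
).fetchall()
print(rows)
conn.close()
```

Every extra level of nesting in the XML adds another join to the SQL query.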


Inlining Algorithms

• First proposed by Shanmugasundaram et al.

• Expanded by Lu, Lee, Chu and others.
• Extended in various directions by various researchers, e.g.,
– Preserving XML element orders.
– Preserving XML constraints.

• Extensions are not considered here.


Basic Idea of Inlining Algorithms

• Inline child element into the relation for the parent element when appropriate.

• Different inlining algorithms differ in inlining criteria.

Example 1c: XML: <a><b><c><d>hello</d></c></b></a>

Inlined Relation: a.
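A minimal sketch of the idea behind Example 1c, under a deliberately simple inlining criterion: every descendant is assumed to occur exactly once, so all of them are folded into the row of the root element. This is not any of the published inlining algorithms; the function name inline_columns and the dotted column names are hypothetical.

```python
import xml.etree.ElementTree as ET


def inline_columns(elem, prefix=""):
    """Flatten an element and its descendants into one row.

    Assumes each child occurs at most once under its parent; a repeated
    child would overwrite the earlier value and would need its own relation.
    """
    path = f"{prefix}.{elem.tag}" if prefix else elem.tag
    row = {}
    text = (elem.text or "").strip()
    if text:
        row[path] = text
    for child in elem:
        row.update(inline_columns(child, path))
    return row


root = ET.fromstring("<a><b><c><d>hello</d></c></b></a>")
relation_a = [inline_columns(root)]  # the single inlined relation "a"
print(relation_a)
```

The whole document becomes one row in one relation, so reading the value under d needs no join at all.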


Inlining Algorithms

• Child elements & attributes may be inlined.

• Child elements may not have their own relations.

• Results in fewer relations.
• In general, more inlining -> fewer joins.


Inlining Algorithm Structure

1. Simplification of DTD.
2. Generation of DTD graphs.
3. Generation of relational schemas.
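Step 2 can be illustrated with a small sketch that turns element declarations into a graph. It assumes the DTD has already been simplified to plain comma-separated child lists with an optional occurrence marker; the sample DTD, the function name dtd_graph and the edge representation are all hypothetical and are not taken from the slides.

```python
import re

# Hypothetical, already simplified DTD: only comma-separated child lists.
DTD = """
<!ELEMENT a (b, c*)>
<!ELEMENT b (d)>
<!ELEMENT c (d?)>
<!ELEMENT d (#PCDATA)>
"""


def dtd_graph(dtd_text):
    """Return {element: [(child, occurrence), ...]} for simple declarations."""
    graph = {}
    pattern = r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>"
    for name, content in re.findall(pattern, dtd_text):
        edges = []
        for item in content.split(","):
            item = item.strip()
            if not item or item == "#PCDATA":
                continue  # text content adds no edge
            occurrence = item[-1] if item[-1] in "*+?" else "1"
            edges.append((item.rstrip("*+?"), occurrence))
        graph[name] = edges
    return graph


graph = dtd_graph(DTD)
for parent, edges in graph.items():
    print(parent, "->", edges)
```

An inlining criterion could then look at properties of this graph, for example the occurrence label on an edge or how many parents an element has, when deciding whether a child gets its own relation in step 3.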

Our work

• Improved on simplification of DTD and generation of DTD graphs.

• Constructed a new aggressive inlining algorithm.

• Student: Alakappan.


Internet Computing

• Web bias (older project)
• Web 2.0 framework (IS project)
• Content Management Software (CMS): Joomla (CS/IS Education)
• Mashup: Yahoo Pipes (CS/IS Education)


Measuring Web Bias

• Search engines dominate how information is accessed.

• Search results have major social, political and commercial consequences.

• Are search engines biased?
• How biased are they?


Previous Works

• To measure bias, results should be compared to a norm.

• The norm may be from human experts.
• Mowshowitz and Kawaguchi: the average search result of a collection of popular search engines as the norm.


Mowshowitz and Kawaguchi

[Diagram: search engines SE1 … SEn; result sets URLS1 … URLSn; their union NORMURLS; URLVector1 … URLVectorn; NORMURL Vector; Bias1 … Biasn]
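The norm-based measurement can be sketched as follows. The result lists are made up, and the slides do not say how a URL vector is compared with the norm vector; one minus cosine similarity is used here purely as an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical result lists for three search engines.
results = {
    "SE1": ["u1", "u2", "u3"],
    "SE2": ["u1", "u2", "u4"],
    "SE3": ["u1", "u5", "u6"],
}


def cosine(v, w):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(v[k] * w[k] for k in v)  # a Counter returns 0 for missing keys
    norm_v = sqrt(sum(x * x for x in v.values()))
    norm_w = sqrt(sum(x * x for x in w.values()))
    return dot / (norm_v * norm_w)


# Norm vector: how often each URL appears across all engines.
norm = Counter()
for urls in results.values():
    norm.update(urls)

# Bias of an engine: distance of its URL vector from the norm (assumed metric).
bias = {se: 1 - cosine(Counter(urls), norm) for se, urls in results.items()}
for se, value in bias.items():
    print(se, round(value, 3))
```

In this toy data SE3 shares only one URL with the other engines, so it ends up farthest from the norm.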


Limitations

• Based on URL Vector -> cannot measure bias quality.


Our Approach

• Use Kleinberg’s HITS algorithm to create clusters, authorities and hubs of the result norm URLs.

• Use them as norm clusters, authorities and hubs.

• Measure distances between norms and individual results as bias.
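For readers unfamiliar with HITS, the hub and authority computation can be sketched as a simple iteration. The link graph below is hypothetical, and the clustering step and the distance measurement mentioned above are not shown.

```python
from math import sqrt

# Hypothetical link graph among norm URLs: page -> pages it links to.
links = {
    "p1": ["p3", "p4"],
    "p2": ["p3", "p4"],
    "p3": ["p4"],
    "p4": [],
}


def hits(links, iterations=50):
    """Return (hub, authority) scores after a fixed number of iterations."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        # Normalize both score vectors to unit length.
        for scores in (auth, hub):
            length = sqrt(sum(x * x for x in scores.values())) or 1.0
            for p in scores:
                scores[p] /= length
    return hub, auth


hub, auth = hits(links)
print("top authority:", max(auth, key=auth.get))
print("top hub score:", round(max(hub.values()), 3))
```

Pages that are pointed to by good hubs become authorities, and pages that point to good authorities become hubs; p4 is the strongest authority in this graph.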


Our Approach

[Diagram: search engines SE1 … SEn; result sets URLS1 … URLSn; their union NORMURLS; NORMCluster; URLVector1 … URLVectorn; ClusterVector1 … ClusterVectorn; NORMClusterVector; Bias1 … Biasn]

Recent Projects

• Web 2.0 framework:
– A model and framework to study Web 2.0 technologies, implications and trends.
– Collaborator: Mr. Tracy Gate.
– Publications: Pre-ICIS Workshop and Communications of AIS.


CMS: Joomla

• Question: Using CMS/Joomla for capstone project.

• Methodology: projects and surveys.
• Collaborators:
– Capstone project teams.
– Industrial mentor: Dilhar DeSilva

• Publication: Journal of Information Systems Education.


End User Programming

• Use of Yahoo Pipes in constructing Web mashups.

• Methodology: projects and surveys.
• Collaborators: students in the XML class in Summer 2009.
• Publication: Journal of Information Systems Education.


Ongoing projects

• Google Wave as a communication/collaboration tool in capstone projects and software project management.

• Collaborators: capstone project students.

• Publications: under preparation.


Open Source Software

• Use of OSS in educational institutions.
• Methodology: meta-analysis.
• Collaborators: two master's students.


Other recent projects

• Assessment
• Scholarship
• Student Response Systems


Interested?

• Come and talk with me.


Conclusions

• Good time to do applied computing research in the Web, XML and other areas.

• Style: hands-on supervision + publications.

• Don't forget to donate a scholarship to the School if your future research leads to a windfall.


Questions?

• Any Questions?
• Thanks!