Data & Visual Analytics - Georgia Institute of...

Date post: 26-Apr-2020
Data & Visual Analytics CSE6242 / CX4242 Duen Horng (Polo) Chau Georgia Tech
Transcript
Page 1: Data & Visual Analytics - Georgia Institute of Technologypoloclub.gatech.edu/Cse6242/2015Spring/Slides/CSE6242-0-Intro.pdfData & Visual Analytics CSE6242 / CX4242 Duen Horng (Polo)

Data & Visual Analytics

CSE6242 / CX4242

Duen Horng (Polo) Chau

Georgia Tech

Who is Polo?

Polo Chau

Associate Director, MS Analytics

Assistant Prof, CSE

Adjunct Assistant Prof, IC

www.cc.gatech.edu/~dchau/

Course Staff

Office hours listed on course homepage.

Polo Chau

Amir Afsharinejad

Yichen Wang

Chris Berlind

Meera Manohar Kamath

We work with (really) large data.

Internet

50 Billion Web Pages

www.worldwidewebsize.com www.opte.org

Facebook

Modified from Marc_Smith, flickr

800 Million Users

Citation Network

www.scirus.com/press/html/feb_2006.html#2 Modified from well-formed.eigenfactor.org

250 Million Articles

Many More

Twitter: Who-follows-whom (500 million users)

Who-buys-what (120 million users)

Cellphone network: Who-calls-whom (100 million users)

Protein-protein interactions: 200 million possible interactions in human genome

Sources: www.selectscience.net www.phonedog.com www.mediabistro.com www.practicalecommerce.com/

Large Networks We Analyzed

DATA → INSIGHTS

Graph                         Nodes         Edges
YahooWeb                      1.4 Billion   6 Billion
Symantec Machine-File Graph   1 Billion     37 Billion
Twitter                       104 Million   3.7 Billion
Phone call network            30 Million    260 Million

7

7 ± 2

Number of items an average human holds in working memory

George Miller, 1956

Data

Insights

How to do that?

COMPUTATION + HUMAN INTUITION

How to do that?

COMPUTATION                                 INTERACTIVE VIS
Automatic                                   User-driven; iterative
Summarization, clustering, classification   Interaction, visualization
>Millions of nodes                          Thousands of nodes

Both develop methods for making sense of network data

“Computers are incredibly fast, accurate, and stupid.

Human beings are incredibly slow, inaccurate, and brilliant.

Together they are powerful beyond imagination.”

(Einstein might or might not have said this.)

“Essentially,

all models are wrong,

but some are useful”

George Box

Our Approach for Big Data Analytics

DATA MINING                                 HCI (Human-Computer Interaction)
Automatic                                   User-driven; iterative
Summarization, clustering, classification   Interaction, visualization
>Millions of items                          Thousands of items

Our research combines the Best of Both Worlds

Polonium

Patented with Symantec

Finds malware from 37 billion file relationships

Serving 120 million users worldwide

Published at SDM’11

Best papers of SDM 2014 (top data mining conference)

MARCO

Detecting Fake Yelp Reviews

Latent Gesture

Insider Trading Detection

with Securities and Exchange Commission (SEC)

NetProbe Auction Fraud Detection on eBay

$$$

Apolo: Machine Learning + Visualization Find relevant nodes in real time (CHI’11)

CareFlow: Healthcare Visual & Data Analytics

Logistics

Course homepage: poloclub.gatech.edu/cse6242/

Discussion, Q&A, find teammates: Piazza

Assignment Submission: T-Square (for submissions only; use Piazza for discussion)

Course Goals

• Learn scalable visual and computation techniques and tools, for typical data types

• Learn how to combine both kinds of methods (how they complement each other)

• Gain practical know-how

• Gain breadth of knowledge

Course Expectation

• Overview of scalable visual and computation techniques and tools

• Gain knowledge & experience (useful for jobs, research)

• Experience with designing and developing an interactive analysis tool

• Projects from previous class turned into papers (KDD, IUI, etc.)

Course Expectation

• Actively participate in class! Ask questions during class and on Piazza

• Polo will reserve the last 5-10 min of every lecture for Q&A

Schedule

See course homepage: poloclub.gatech.edu/cse6242/

Grading

• 4-5 homework assignments (50%)

  • End-to-end analysis

  • Techniques (computation and vis)

  • “Big data” tools, e.g., Hadoop, Spark, etc.

• Group project (50%) -- 3 to 4 people