What to expect when you are visualizing

Date post: 21-Apr-2017
Upload: krist-wongsuphasawat

WHAT TO EXPECT WHEN YOU ARE VISUALIZING

Krist Wongsuphasawat / @kristw

Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last minute coding
and many more…

Computer Engineer, Bangkok, Thailand
PhD in Computer Science, Information Visualization, Univ. of Maryland
IBM, Microsoft
Data Visualization Scientist, Twitter

VISUALIZE DATA

INPUT (DATA) + YOU = OUTPUT (VIS)

EXPECT THE MISMATCHES

INPUT (DATA): What clients think they have vs. what they usually have
YOU: What clients think you are vs. what they will get
OUTPUT (VIS): What clients ask for vs. what they really need

I need this. Take this.

I need this. Here you are.

I need this. Take this.

EXPECT THESE TASKS

INPUT (DATA) + YOU = OUTPUT (VIS)
1. Get data & Wrangle
2. Analyze & Visualize

1. GET DATA & WRANGLE

DATA SOURCES
Open data: publicly available
Internal data: private, owned by clients’ organization
Self-collected data: manual, site scraping, etc.
Combine the above

MANY FORMS OF DATA
Standalone files: txt, csv, tsv, json, Google Docs, …, pdf*
APIs: better quality with more overhead
Databases: doesn’t necessarily mean they are organized
Big data: bigger pain

HAVING ALL TWEETS
How people think I feel. How I really feel.

CHALLENGES
Get relevant Tweets: hashtag: #oscars, keywords: “spotlight” (movie name)
Too big: need to aggregate & reduce size
Slow: long processing time (hours)
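The first challenge, picking out relevant Tweets by hashtag and keyword, can be sketched as below. This is a minimal illustration in plain Python, not the pipeline used for the talk: the function name `is_relevant` and the three sample Tweets are made up, and real filtering at this scale would run on the cluster rather than on a list in memory.

```python
def is_relevant(text, hashtags=("#oscars",), keywords=("spotlight",)):
    """Return True if the text contains any of the hashtags or keywords.

    Matching is a case-insensitive substring test, so a keyword such as
    "spotlight" also matches Tweets that are not about the movie.
    """
    lowered = text.lower()
    return (any(tag in lowered for tag in hashtags)
            or any(word in lowered for word in keywords))


# Made-up sample Tweets, for illustration only.
tweets = [
    "Watching the #Oscars tonight!",
    "Spotlight deserves best picture",
    "Lunch was great",
]
relevant = [t for t in tweets if is_relevant(t)]
```

A plain keyword match is deliberately naive here; a common word used as a movie name is exactly why getting relevant Tweets is listed as a challenge.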

GETTING BIG DATA
Hadoop Cluster: Data Storage -> Tool: Pig / Scalding (slow) -> Smaller dataset
Your laptop: Smaller dataset -> Tool: node.js / python / excel (fast) -> Final dataset

EXPECT TO WAIT FOR (BIG) DATA

DATA WRANGLING
Clean: A clean dataset? Joking, right?
Filter: Less is more
Parse, Format, Correct, etc.: Change country code from 3-letter to 2-letter. Correct time of day based on users’ timezone, etc.
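The two wrangling examples above (country codes and local time of day) could look roughly like this. It is a sketch under stated assumptions: the lookup table is a tiny hand-written subset, the function names are hypothetical, and it assumes each user record carries a UTC offset in seconds, which the slides do not specify.

```python
from datetime import datetime, timedelta, timezone

# Tiny illustrative subset of a 3-letter to 2-letter country code table.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB"}


def to_alpha2(code):
    """Convert a 3-letter country code to 2 letters, or None if unknown."""
    return ALPHA3_TO_ALPHA2.get(code.upper())


def local_hour(unix_seconds, utc_offset_seconds):
    """Hour of day (0-23) in the user's timezone for a UTC timestamp.

    utc_offset_seconds is an assumed per-user field, e.g. 7 * 3600 for UTC+7.
    """
    utc_time = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    user_tz = timezone(timedelta(seconds=utc_offset_seconds))
    return utc_time.astimezone(user_tz).hour
```

A fixed offset ignores daylight saving time; a named timezone database would be needed where that matters.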

EXPECT A LOT OF TIME WITH DATA WRANGLING

70-80% of time “Data Janitor”

RECOMMENDATIONS
Always think that you will have to do it again: document the process, automate
Reusable scripts: break a gigantic do-it-all function into smaller ones
Reusable data: keep for future projects
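As one way to read the "reusable scripts" recommendation, the sketch below splits a load-clean-count job into small functions that can be reused or rerun separately. The column names (`user`, `country`) and all function names are hypothetical, chosen only for the example.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_rows(rows):
    """Drop rows whose country field is empty."""
    return [row for row in rows if row["country"].strip()]


def count_by_country(rows):
    """Count rows per upper-cased country code."""
    counts = {}
    for row in rows:
        key = row["country"].strip().upper()
        counts[key] = counts.get(key, 0) + 1
    return counts


def run_pipeline(csv_text):
    """Compose the small steps; each one can also be called on its own."""
    return count_by_country(clean_rows(load_rows(csv_text)))
```

When the data has to be processed again, only the step that changed needs to be touched, and the composed `run_pipeline` doubles as documentation of the process.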

2. ANALYZE & VISUALIZE

EXPECT DIFFERENT REQUIREMENTS

TYPE OF PROJECTS (Explanatory / Exploratory)
Storytelling: for General Public; see what data can tell us
Analytics Tools: for PMs, Data Scientists; understand product usage
Inspirations: for General Public; get inspired

So many things we could learn

from Twitter data

Give us interesting vis about xxxx by Nov 10

STORYTELLING : WHAT TO EXPECT
timely: Deadline is strict. Also can be unexpected events.
wide audience: easy to explain and understand, multi-device support
one-off projects
content screening

STORYTELLING
WHO/WHAT: user/content
WHERE: location
WHEN: time

TIME : TWEETS/SECOND (by Miguel Rios)
TIME : TWEETS/SECOND + ANNOTATION (by Miguel Rios)
http://www.flickr.com/photos/twitteroffice/5681263084/

IT DOESN’T HAVE TO BE COMPLEX.

LOCATION (by Miguel Rios)
Legend: Low density / High density
San Francisco: flickr.com/photos/twitteroffice/8798020541
Rebuild the world based on tweet density (by Nicolas Garcia Belmonte)
twitter.github.io/interactive/andes/

CONTENT : US ELECTION 2016

CONTENT : #MUSEUMWEEK

TIME + LOCATION : TWEET TIME BY CITY (by Miguel Rios & Jimmy Lin)
Legend: Daytime / Night / Late night

CONTENT + LOCATION : TWEET MAP (by Robert Harris)
most frequent term
Gmail was down Jan 24, 2014

USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
interactive.twitter.com/nba_followers
interactive.twitter.com/premierleague

CONTENT + TIME : STREAMGRAPH

CONTENT + TIME : MATCH SUMMARY
Biggest tournament for European soccer clubs
Count Tweets mentioning the teams every minute
Team 1: Dortmund, Team 2: Bayern Munich
time: begin to end
+ goals + players
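Counting Tweets that mention each team per minute can be sketched as follows. This is an illustrative in-memory version, not the code behind the match summary: the input shape (pairs of Unix seconds and text), the function name, and the keyword lists (including "bvb" as an alias for Dortmund) are assumptions made for the example.

```python
from collections import Counter


def mentions_per_minute(tweets, team_keywords):
    """Count Tweets mentioning each team, bucketed by minute.

    tweets: iterable of (unix_seconds, text) pairs.
    team_keywords: {team name: list of lowercase keywords}.
    Returns {team name: Counter({minute index: count})}.
    """
    counts = {team: Counter() for team in team_keywords}
    for unix_seconds, text in tweets:
        minute = unix_seconds // 60
        lowered = text.lower()
        for team, keywords in team_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                counts[team][minute] += 1
    return counts


# Made-up sample data, for illustration only.
sample_tweets = [
    (0, "Go Dortmund"),
    (30, "BVB!"),
    (61, "Bayern scores"),
    (65, "dortmund vs bayern"),
]
teams = {"Dortmund": ["dortmund", "bvb"], "Bayern Munich": ["bayern"]}
per_minute = mentions_per_minute(sample_tweets, teams)
```

A Tweet that names both teams counts once for each, which suits two mirrored timelines; goals and players would then be annotations layered on top of these counts.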

CONTENT + TIME : COMPETITION SUMMARY
[Figure: tournament bracket of teams A, B, C, D; match summaries combined into a competition summary]
uclfinal.twitter.com

CONTENT + TIME + LOCATION : NEW YEAR 2014

twitter.github.io/interactive/newyear2014/

BEHIND THE SCENES

https://interactive.twitter.com/tenyears

Project / Twitter 10 years

REQUEST

EXPECT FUNNY REQUESTS

DESIGN & PROTOTYPE

Engagements over: First Minute (0 to 60s), First Hour (0 to 60m), First Day (0 to 24h), First Week (0 to 7d)

EXPECT REVISIONS

Visualization is an important piece, but not the entire experience.

DON’T FORGET THE BIG PICTURE.

https://interactive.twitter.com/tenyears

Demo / Twitter 10 years

WORKFLOW
Requested / Identify needs
Design & Prototype
Refine: Mobile, Embed
Logging
Release

EXPECT THE UNEXPECTED

WORKFLOW
Requested / Identify needs
Design & Prototype
Refine: Mobile, Embed
Logging
Translations
Release

[Figure: workflow from Data sources to Output through get, explore, analyze, and present; labels: ad-hoc scripts, tools for exploration]

ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features: to support exploration of complex data
more technical audience: product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects

USER ACTIVITY LOGS

[Figure: Users use Twitter. Engineers write Twitter and instrument it, which produces log data in Hadoop. Product Managers are curious.]

WHAT IS BEING LOGGED?
activities: tweet (from home timeline on twitter.com, from search page on iPhone), sign up, log in, retweet, etc.

ORGANIZE?

LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]
Event name format: client : page : section : component : element : action
Example: web : home : timeline : tweet_box : button : tweet
Each log event has: 1) User ID 2) Timestamp 3) Event name 4) Event detail
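The six-part event name can be split into named fields with a few lines of Python. This is a sketch for illustration: the `ClientEvent` type and `parse_event_name` function are hypothetical names, not part of Twitter's logging library, and the only thing taken from the slides is the field order.

```python
from typing import NamedTuple


class ClientEvent(NamedTuple):
    """The six levels of a client event name, in order."""
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name):
    """Split 'client:page:section:component:element:action' into fields.

    Whitespace around the colons is ignored; a name that does not have
    exactly six parts raises ValueError.
    """
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
```

With the name parsed, questions like "all actions on the home page" become simple comparisons on `event.page` and `event.action`.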

LOG DATA
bigger than Tweet data
[Figure: Users use Twitter. Engineers write Twitter and instrument it, which produces log data in Hadoop. Curious Product Managers ask Data Scientists. Engineers and Data Scientists find, clean, analyze, and monitor the log data.]

Scribe Radar

Project / Find & Monitor client events

[Figure: Log data in Hadoop (billions of rows) -> Aggregate -> Client events count, which Engineers & Data Scientists use to find events]
Search fields: client, page, section, component, element, action
SECTION? COMPONENT? ELEMENT?
Query: web : home : * : * : * : impression
Results:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Problems:
search can be better
10,000+ event types
not everybody knows: “What are all sections under web:home?”
one graph / event, x 10,000
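A wildcard search over aggregated event counts, one pattern per level of the event name, might look like the sketch below. It is not the search tool described in the talk: the function name is hypothetical, the counts are made up, and matching uses Python's standard `fnmatch` patterns as a stand-in for whatever the real tool supports.

```python
from fnmatch import fnmatchcase


def search_events(event_counts, query):
    """Return the events whose fields match the query patterns.

    event_counts: {"client:page:section:component:element:action": count}.
    query: list of six patterns, where "*" matches any value.
    """
    results = {}
    for name, count in event_counts.items():
        fields = name.split(":")
        if len(fields) == len(query) and all(
            fnmatchcase(field, pattern)
            for field, pattern in zip(fields, query)
        ):
            results[name] = count
    return results


# Made-up counts, for illustration only.
counts = {
    "web:home:home:-:-:impression": 100,
    "web:home:wtf:-:-:impression": 40,
    "iphone:home:-:-:-:impression": 70,
    "web:home:timeline:tweet_box:button:tweet": 5,
}
matches = search_events(counts, ["web", "home", "*", "*", "*", "impression"])
```

This kind of search only helps people who already know what to type, which is the gap the slides point out: with 10,000+ event types, not everybody knows which sections exist under web:home.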

GOALS
Search for client events
Explore client event collection
Monitor changes

DESIGN

[Figure: Engineers & Data Scientists see the client event collection and narrow it down. Interactions: search box => filter]
HOW TO VISUALIZE?
client : page : section : component : element : action

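CLIENT EVENT HIERARCHY
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
[Figure: the two event names above drawn as one tree that shares the iphone -> home -> - prefix and then splits into - -> - -> impression and tweet -> tweet -> click]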
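Turning flat event names into a hierarchy amounts to building a prefix tree in which events that share leading fields share nodes. The sketch below is one minimal way to do it; the nested-dict node layout, the function name, and the counts are assumptions for illustration, not the data structure used in Scribe Radar.

```python
def build_event_tree(event_counts):
    """Build a prefix tree from 'a:b:c:d:e:f' event names.

    Each node is {"count": int, "children": {field: node}}; a node's
    count is the total of all events that pass through it.
    """
    root = {"count": 0, "children": {}}
    for name, count in event_counts.items():
        node = root
        node["count"] += count
        for field in name.split(":"):
            node = node["children"].setdefault(
                field, {"count": 0, "children": {}}
            )
            node["count"] += count
    return root


# Made-up counts for the two events named on the slide.
tree = build_event_tree({
    "iphone:home:-:-:-:impression": 10,
    "iphone:home:-:tweet:tweet:click": 3,
})
home = tree["children"]["iphone"]["children"]["home"]
section = home["children"]["-"]
```

Because counts roll up, every level answers a question directly, for example how many events of any kind were logged on the iPhone home page.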

DETECT CHANGES
[Figure: the client event tree for TODAY compared to the tree from 7 DAYS AGO]

CALCULATE CHANGES
[Figure: DIFF of the two trees, with a percentage change per node, e.g. +5%, +10%, -5%]
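The diff between today's counts and the counts from 7 days ago reduces to a percentage change per event. The sketch below works on flat dictionaries rather than trees to stay short; the function name and the numbers are made up, and returning `None` for events with no earlier count is a choice made for this example, not something the slides specify.

```python
def percent_changes(today, week_ago):
    """Percentage change per event between two {event name: count} dicts.

    Events with no count a week ago get None, since the change is undefined.
    """
    changes = {}
    for name in today.keys() | week_ago.keys():
        now = today.get(name, 0)
        before = week_ago.get(name, 0)
        if before == 0:
            changes[name] = None
        else:
            changes[name] = (now - before) * 100 / before
    return changes


# Made-up counts, for illustration only.
diff = percent_changes(
    today={"a": 105, "b": 110, "c": 95, "new_event": 5},
    week_ago={"a": 100, "b": 100, "c": 100},
)
```

Comparing against the same weekday a week earlier avoids flagging the normal weekday versus weekend swing as a change.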

DISPLAY CHANGES
[Figure: client event tree (iphone, home, ...) with the changes displayed on it]
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]

Demo / Scribe Radar

Twitter for Banana

Flying Sessions

Project / Funnel Analysis

COUNT PAGE VISITS
home page: banana : home : - : - : - : impression

FUNNEL
home page -> profile page
FUNNEL ANALYSIS
home page -> profile page: 1 job, 1 hour
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
home page -> profile page, home page -> search page: 2 jobs, 2 hours
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
banana : search : - : - : - : impression
Specify all funnels manually! n jobs
Time to find a new job
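A single manually specified funnel, such as home page then profile page, can be counted as below. This is an in-memory sketch with made-up sessions and a hypothetical function name; in the setting described above each such funnel was a separate cluster job, which is why specifying them one by one does not scale.

```python
def funnel_counts(sessions, steps):
    """Count how many sessions reach each funnel step, in order.

    sessions: list of event-name lists, one list per session.
    steps: ordered list of event names that define the funnel.
    Returns a list with one count per step.
    """
    counts = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next funnel step to look for
        for event in events:
            if position < len(steps) and event == steps[position]:
                counts[position] += 1
                position += 1
    return counts


HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

# Made-up sessions, for illustration only.
sample_sessions = [
    [HOME, PROFILE],
    [HOME, SEARCH],
    [PROFILE],
    [HOME, SEARCH, PROFILE],
]
home_to_profile = funnel_counts(sample_sessions, [HOME, PROFILE])
```

Other events may occur between steps in this version; a stricter funnel would require the steps to be adjacent.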

GOAL
1 job => all funnels, visualized
[Figure: funnels branching out from the home page (banana : home : - : - : - : impression) to … pages]

USER SESSIONS
Session#1: Start -> A -> B -> end
Session#2: Start -> A -> B -> end
Session#3: Start -> A -> C -> end
Session#4: Start -> A -> end

AGGREGATE
[Figure: the 4 sessions merged step by step into one tree: Start -> A, which branches to B -> end, C -> end, and end]
4 sessions, then 4,000,000 sessions
(~millions of sessions, 10,000+ event types)
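Merging sessions that share a common beginning into one tree can be sketched as follows, so that every funnel starting from the first event is covered in a single pass. This is a toy in-memory version with hypothetical names; the approach that works at millions of sessions is described in the VAST 2014 paper cited in the slides, and this sketch does not attempt to reproduce it.

```python
def aggregate_sessions(sessions):
    """Merge sessions into a prefix tree of event sequences.

    Each node is {"count": int, "children": {event: node}}; an explicit
    "end" child marks where sessions stop.
    """
    root = {"count": 0, "children": {}}  # the root plays the role of Start
    for events in sessions:
        node = root
        node["count"] += 1
        for event in list(events) + ["end"]:
            node = node["children"].setdefault(
                event, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


# Four toy sessions: A-B, A-B, A-C, and A alone.
session_tree = aggregate_sessions([["A", "B"], ["A", "B"], ["A", "C"], ["A"]])
node_a = session_tree["children"]["A"]
```

Reading the counts along any path gives a funnel: here 4 sessions reach A, 2 continue to B, 1 continues to C, and 1 ends right after A.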

TRY WITH SAMPLE DATA

FAIL…

Keep trying to make it work

EXPECT TRIAL AND ERROR

HOW TO MAKE IT WORK?
Read the details in Krist Wongsuphasawat and Jimmy Lin. “Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.” Proc. IEEE Conference on Visual Analytics Science and Technology (VAST) 2014

Demo / Flying Sessions

WORKFLOW
Requested / Identify needs
Design & Prototype: make it work for sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support: keep it running, feature requests & bug fixes

https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2

Project / Game of Tweets

EXPECT HARDWARE COMPLICATIONS

INPUT (DATA) + YOU = OUTPUT (VIS)
1. Get data & Wrangle
2. Analyze & Visualize

EXPECT TO IMPROVE

HOW TO BE BETTER?
Time is limited.
Grow the team
Expand skills
Improve tooling: solve a problem once and for all, automate repetitive tasks

http://twitter.github.io/labella.js

Demo / Labella.js

https://github.com/twitter/d3kit

Demo / d3Kit
http://www.slideshare.net/kristw/d3kit

yeoman.io

Demo / Yeoman

TO SUM UP

INPUT (DATA) + YOU = OUTPUT (VIS)
1. Get data & Wrangle
2. Analyze & Visualize

TYPE OF PROJECTS (Explanatory / Exploratory)
Storytelling: for General Public; see what data can tell us
Analytics Tools: for PMs, Data Scientists; understand product usage
Inspirations: for General Public; get inspired

TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements: Storytelling, Product insights, Art, etc.
Combine visualization with other skills: HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve: do more with less time; grow the team, expand skills, improve tooling

Krist Wongsuphasawat / @kristw / kristw.yellowpigz.com

ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios, Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu, and many colleagues at Twitter. Lastly, to my wife for taking care of our 3-month-old baby, so I had time to prepare these slides.

THANK YOU

