Date posted: 21-Apr-2017
Category: Data & Analytics
Uploaded by: krist-wongsuphasawat
WHAT TO EXPECT WHEN YOU ARE VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last minute coding
and many more…
Computer Engineer, Bangkok, Thailand
PhD in Computer Science (Information Visualization), Univ. of Maryland
IBM, Microsoft
Data Visualization Scientist, Twitter
DATA SOURCES
Open data: Publicly available
Internal data: Private, owned by clients’ organization
Self-collected data: Manual, site scraping, etc.
Combine the above
MANY FORMS OF DATA
Standalone files: txt, csv, tsv, json, Google Docs, …, pdf*
APIs: better quality with more overhead
Databases: doesn’t necessarily mean they are organized
Big data: bigger pain
CHALLENGES
Get relevant Tweets: hashtag: #oscars, keywords: “spotlight” (movie name)
Too big: Need to aggregate & reduce size
Slow: Long processing time (hours)
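A minimal sketch of the "get relevant Tweets" step, assuming each Tweet is available as a plain text string. The function name and the exact matching rules are illustrative assumptions, not the pipeline used in the talk.

```python
import re

# Hashtag and keyword taken from the slide; matching rules are an assumption.
HASHTAG = re.compile(r"#oscars\b", re.IGNORECASE)
KEYWORD = re.compile(r"\bspotlight\b", re.IGNORECASE)


def is_relevant(text):
    """Return True if the text contains the hashtag or the keyword."""
    return bool(HASHTAG.search(text) or KEYWORD.search(text))


def keep_relevant(texts):
    """Filter early so that later steps work on a much smaller dataset."""
    return [t for t in texts if is_relevant(t)]
```

Filtering before aggregating keeps the slow cluster job focused on reducing size, so the remaining work can happen on a laptop.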
GETTING BIG DATA
Hadoop Cluster: Data Storage => Tool: Pig / Scalding (slow) => Smaller dataset
Your laptop: Smaller dataset => Tool: node.js / python / excel (fast) => Final dataset
DATA WRANGLING
Clean: A clean dataset? Joking, right?
Filter: Less is more
Parse, Format, Correct, etc.: Change country code from 3-letter to 2-letter, correct time of day based on users’ timezone, etc.
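The two wrangling examples above could look roughly like this. The mapping table is a tiny illustrative subset, and the idea that each user record carries a UTC offset in seconds is an assumption made for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Illustrative subset only; a real project would load a full ISO 3166-1 table.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB", "DEU": "DE"}


def to_alpha2(code):
    """Convert a 3-letter country code to 2 letters, or None if unknown."""
    return ALPHA3_TO_ALPHA2.get(code.strip().upper())


def local_hour(utc_timestamp, utc_offset_seconds):
    """Hour of day (0-23) in the user's local time.

    utc_timestamp is seconds since the Unix epoch; utc_offset_seconds is
    the user's offset from UTC (hypothetical field for this example).
    """
    utc_time = datetime.fromtimestamp(utc_timestamp, tz=timezone.utc)
    user_tz = timezone(timedelta(seconds=utc_offset_seconds))
    return utc_time.astimezone(user_tz).hour
```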
RECOMMENDATIONS
Always think that you will have to do it again: document the process, automation
Reusable scripts: break a gigantic do-it-all function into smaller ones
Reusable data: keep for future project
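As a sketch of the "smaller functions" recommendation, the steps below are split so each can be reused or rerun on its own. The column names (`country`, `text`) and function names are hypothetical.

```python
import csv
import io
from collections import Counter


def parse_rows(csv_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows, keyword):
    """Keep rows whose 'text' column mentions the keyword."""
    return [r for r in rows if keyword.lower() in r["text"].lower()]


def count_by(rows, field):
    """Count rows per value of the given column."""
    return Counter(r[field] for r in rows)


def run_pipeline(csv_text, keyword, field):
    """Compose the small steps instead of writing one do-it-all function."""
    return count_by(filter_rows(parse_rows(csv_text), keyword), field)
```

When the same request comes back with a different keyword or grouping, only the arguments change.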
TYPE OF PROJECTS
Explanatory / Exploratory
Project types: Storytelling, Analytics Tools, Inspirations
Audiences: General Public, PMs and Data Scientists
Goals: Understand product usage, See what data can tell us, Get inspired
STORYTELLING : WHAT TO EXPECT
timely: Deadlines are strict. Unexpected events can also happen.
wide audience: easy to explain and understand, multi-device support
one-off projects
content screening
TIME : TWEETS/SECOND + ANNOTATION
http://www.flickr.com/photos/twitteroffice/5681263084/
by Miguel Rios
LOCATION
flickr.com/photos/twitteroffice/8798020541
San Francisco
Low density
High density
by Miguel Rios
Rebuild the world based on
tweet density
twitter.github.io/interactive/andes/
by Nicolas Garcia Belmonte
TIME + LOCATION : TWEET TIME BY CITY
by Miguel Rios & Jimmy Lin
Legend: Daytime, Night, Late night
USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
interactive.twitter.com/nba_followers
interactive.twitter.com/premierleague
CONTENT + TIME : MATCH SUMMARY
Count Tweets mentioning the teams every minute
Team 1: Dortmund, Team 2: Bayern Munich
Time axis: begin to end
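Counting Tweets that mention each team per minute might be sketched as below. It assumes Tweets arrive as `(created_at, text)` pairs and uses simple case-insensitive substring matching, which is a simplification of whatever matching the real project used.

```python
from collections import Counter
from datetime import datetime


def mentions_per_minute(tweets, teams):
    """Return {team: Counter({minute: count})}.

    tweets: iterable of (datetime, text) pairs (assumed input shape).
    teams: list of team names to look for in the text.
    """
    counts = {team: Counter() for team in teams}
    for created_at, text in tweets:
        # Truncate the timestamp to the start of its minute.
        minute = created_at.replace(second=0, microsecond=0)
        lowered = text.lower()
        for team in teams:
            if team.lower() in lowered:
                counts[team][minute] += 1
    return counts
```

Each team's counter can then be drawn as one series along the time axis.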
CONTENT + TIME : COMPETITION SUMMARY
Figure: match summaries (A vs B, C vs D, then A vs C) combined into one competition summary
uclfinal.twitter.com
CONTENT + TIME + LOCATION : NEW YEAR 2014
twitter.github.io/interactive/newyear2014/
Project / Twitter 10 years
https://interactive.twitter.com/tenyears
Demo / Twitter 10 years
WORKFLOW
Requested / Identify needs
Design & Prototype
Refine: Mobile, Embed
Logging
Translations
Release
ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features: to support exploration of complex data
more technical audience: product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
WHAT IS BEING LOGGED?
Activities: tweet from home timeline on twitter.com, tweet from search page on iPhone, sign up, log in, retweet, etc.
LOG EVENT A.K.A. “CLIENT EVENT”
1) User ID 2) Timestamp 3) Event name 4) Event detail
Event name: client : page : section : component : element : action
Example: web : home : timeline : tweet_box : button : tweet
[Lee et al. 2012]
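The six-part event name can be split into named fields with a few lines of code. This is a sketch for illustration; the class and function names are hypothetical and not part of Twitter's logging library.

```python
from typing import NamedTuple


class ClientEventName(NamedTuple):
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name):
    """Split 'client:page:section:component:element:action' into fields."""
    parts = [p.strip() for p in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {name!r}")
    return ClientEventName(*parts)
```

For example, `parse_event_name("web : home : timeline : tweet_box : button : tweet").action` gives `"tweet"`.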
Log data in Hadoop (bigger than Tweet data)
Users: Use
Engineers: Instrument, Write
Product Managers: Curious, Ask
Data Scientists: Find, Clean, Analyze
Monitor
Client events count
Engineers & Data Scientists
Log data in Hadoop => Aggregate => Client events count
Find: Search by client, page, section, component, element, action
Query: web home * * impression*
Return: Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
search can be better
10,000+ event types
not everybody knows: What are all sections under web:home?
one graph / event x 10,000
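A rough sketch of aggregating event names into counts and searching them with wildcards. The pattern syntax here (six colon-separated fields, `*` as wildcard) is an assumption for this example and may differ from the query form of the actual tool.

```python
from collections import Counter
from fnmatch import fnmatchcase


def normalize(name):
    """Strip spaces around each colon-separated field."""
    return ":".join(part.strip() for part in name.split(":"))


def count_events(event_names):
    """Aggregate raw event names into a count per distinct name."""
    return Counter(normalize(n) for n in event_names)


def search(counts, pattern):
    """Return {event_name: count} for names matching the pattern field by field."""
    wanted = normalize(pattern).split(":")
    matches = {}
    for name, n in counts.items():
        fields = name.split(":")
        if len(fields) == len(wanted) and all(
            fnmatchcase(f, w) for f, w in zip(fields, wanted)
        ):
            matches[name] = n
    return matches
```

With more than 10,000 event types, a flat search like this still leaves the reader guessing which names exist, which is the gap the slides go on to address.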
Client event collection
Engineers & Data Scientists
See: client : page : section : component : element : action
Interactions: search box => filter, narrow down
HOW TO VISUALIZE?
CLIENT EVENT HIERARCHY
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
Tree: iphone > home > - > { - > - > impression, tweet > tweet > click }
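One way to turn flat event names into a hierarchy is to build a tree keyed by each field in turn. The nested-dict layout below is an assumption chosen for brevity, not the data structure of the production tool.

```python
def new_node():
    return {"count": 0, "children": {}}


def build_hierarchy(counts):
    """Build a tree from {event_name: count}.

    Each level of the tree is one field of the event name, and each node
    stores the total count of all events underneath it.
    """
    root = new_node()
    for name, n in counts.items():
        node = root
        node["count"] += n
        for part in name.split(":"):
            node = node["children"].setdefault(part.strip(), new_node())
            node["count"] += n
    return root
```

Listing the children of a node then answers questions such as which sections exist under a given client and page.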
DETECT CHANGES
TODAY compared to 7 DAYS AGO
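Comparing today's counts with the counts from 7 days ago can be sketched as a relative change per event. Returning `None` for events with no baseline is a choice made for this example.

```python
def relative_changes(today, week_ago):
    """Return {event_name: relative change} between two count dicts.

    A value of 0.5 means +50% compared to 7 days ago. Events that did not
    exist 7 days ago have no baseline and map to None.
    """
    changes = {}
    for name in set(today) | set(week_ago):
        now = today.get(name, 0)
        before = week_ago.get(name, 0)
        changes[name] = None if before == 0 else (now - before) / before
    return changes
```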
DISPLAY CHANGES
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
FUNNEL ANALYSIS
home page => profile page: 1 job, 1 hour
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
home page => profile page, home page => search page: 2 jobs, 2 hours
banana : search : - : - : - : impression
Specify all funnels manually! n jobs
Time to find a new job
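A single manually specified funnel could be computed as below. This is a small in-memory sketch assuming events are `(user_id, timestamp, event_name)` tuples; the real jobs ran on Hadoop.

```python
def funnel_counts(events, steps):
    """Count how many users reached each step of the funnel, in order.

    events: iterable of (user_id, timestamp, event_name) tuples.
    steps: ordered list of event names, e.g. home impression then
    profile impression.
    """
    by_user = {}
    for user, _ts, name in sorted(events, key=lambda e: (e[0], e[1])):
        by_user.setdefault(user, []).append(name)

    reached = [0] * len(steps)
    for names in by_user.values():
        next_step = 0
        for name in names:
            if next_step < len(steps) and name == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached
```

Every new funnel needs its own list of steps and its own run, which is why specifying all funnels manually does not scale.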
USER SESSIONS
Session#1: Start, A, B, end
Session#2: Start, A, B, end
Session#3: Start, A, C, end
Session#4: Start, A, end
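Instead of specifying funnels one by one, sessions can be aggregated by their shared prefixes. The sketch below counts how many sessions pass through each prefix; it illustrates the idea only and is not the algorithm from the paper.

```python
from collections import Counter


def prefix_counts(sessions):
    """Count sessions sharing each prefix of events.

    sessions: list of event sequences, e.g. [["A", "B"], ["A", "C"]].
    Returns a Counter keyed by tuple prefixes.
    """
    counts = Counter()
    for session in sessions:
        for i in range(1, len(session) + 1):
            counts[tuple(session[:i])] += 1
    return counts
```

For the four sessions above, the prefix `("A",)` is shared by all four, while `("A", "B")` is shared by two.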
Read the details in Krist Wongsuphasawat and Jimmy Lin, “Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter,” Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), 2014.
HOW TO MAKE IT WORK?
WORKFLOW
Requested / Identify needs
Design & Prototype: Make it work for sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support: Keep it running, feature requests & bug fixes
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Project / Game of Tweets
HOW TO BE BETTER?
Time is limited.
Grow the team
Expand skills
Improve tooling: Solve a problem once and for all
Automate repetitive tasks
https://github.com/twitter/d3kit
Demo / d3Kit
http://www.slideshare.net/kristw/d3kit
TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements: Storytelling, Product insights, Art, etc.
Combine visualization with other skills: HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve: do more with less time, grow the team, expand skills, improve tooling
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios, Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu, and many colleagues at Twitter. Lastly, to my wife for taking care of our 3-month-old baby, so I had time to prepare these slides.
RESOURCES
Images
Banana phone http://goo.gl/GmcMPq
Bar chart https://goo.gl/1G1GBg
Boss https://goo.gl/gcY8Kw
Champions League http://goo.gl/DjtNKE
Database http://goo.gl/5N7zZz
Fishing shark http://goo.gl/2fp4zW
Globe visualization http://goo.gl/UiGMMj
Harry Potter http://goo.gl/Q9Cy64
Holding phone http://goo.gl/It2TzH
Kiwi orange http://goo.gl/ejQ73y
Kiwi http://goo.gl/9yk7o5
Library https://goo.gl/HVeE6h
Library earthquake http://goo.gl/rBqBrs
Minion http://goo.gl/I19Ijg
NBA http://goo.gl/p7HBdG
NFL http://goo.gl/feQMZs
Orange & Apple http://goo.gl/NG6RIL
Pile of paper http://goo.gl/mGLQTx
Premier League http://goo.gl/AqIINO
Scrooge McDuck https://goo.gl/aKv8D7
The Sound of Music https://goo.gl/dqHlzj
Trash pile http://goo.gl/OsFfo3
Tyrion http://goo.gl/WaBonl
Watercolor Map by Stamen Design