Date post: | 03-Aug-2015 |
Category: | Data & Analytics |
Upload: | krist-wongsuphasawat |
Krist Wongsuphasawat / @kristw
Computer Engineer, Bangkok, Thailand
M.S. and PhD in Computer Science, Univ. of Maryland (Information Visualization)
IBM, Microsoft
Data Visualization Scientist, Twitter
Using visualizations to monitor changes and harvest insights from log data at Twitter
Krist Wongsuphasawat (@kristw) & Jimmy Lin (@lintool)
IEEE VAST 2014
What is being logged?
Activities:
• tweet from home timeline on twitter.com
• tweet from search page on iPhone
• sign up
• log in
• retweet
• etc.
Log event, a.k.a. "client event" [Lee et al. 2012]
1) User ID
2) Timestamp
3) Event name
4) Event detail
Event name format: client : page : section : component : element : action
e.g. web : home : timeline : tweet_box : button : tweet
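To make the six-level naming scheme concrete, here is a minimal sketch that splits an event name into its levels. The `ClientEvent` type and `parse_event_name` function are hypothetical names for illustration only, not part of Twitter's logging code.

```python
from typing import NamedTuple


class ClientEvent(NamedTuple):
    """The six levels of a client event name (illustrative type)."""
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name: str) -> ClientEvent:
    """Split 'client:page:section:component:element:action' into fields."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 levels, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
print(event.section, event.action)
```

Levels that do not apply are written as "-" in the slides, so they parse as ordinary field values.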
Log data workflow
• Users use Twitter.
• Engineers instrument the product and write log data to Hadoop (bigger than Tweet data).
• Curious product managers ask questions.
• Data scientists and engineers find, clean and analyze the log data, and monitor it.
Client event collection (Engineers & Data Scientists)
Log data in Hadoop => Aggregate => 10,000+ event types

date      client  page  section  comp.  elem.  action      count
20141011  web     home  home     -      -      impression  100
20141011  web     home  wtf      -      -      click       20

wtf = Who-to-Follow
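The aggregation step can be pictured as a group-and-count over raw log records. The sketch below is a single-machine stand-in for what the slides describe as a Hadoop aggregation; the record layout and the timestamp format are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw records: (user_id, timestamp "YYYYMMDDhhmmss", event_name).
raw_events = [
    (1, "20141011093000", "web:home:home:-:-:impression"),
    (2, "20141011101500", "web:home:home:-:-:impression"),
    (1, "20141011101700", "web:home:wtf:-:-:click"),
]


def daily_counts(events):
    """Count events per (date, event name), dropping user-level detail."""
    counts = Counter()
    for _user_id, timestamp, name in events:
        date = timestamp[:8]  # keep only YYYYMMDD
        counts[(date, name)] += 1
    return counts


table = daily_counts(raw_events)
for (date, name), count in sorted(table.items()):
    print(date, name, count)
```

Each row of the resulting table corresponds to one row of the daily count table on the slide, with the event name still in its joined form.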
Find (Engineers & Data Scientists)
Search the client event collection by client, page, section, component, element, action
Query: web : home : * : * : * : impression
Results returned from the aggregated log data in Hadoop:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Search can be better
• 10,000+ event types
• Not everybody knows them, e.g. "What are all sections under web:home?"
• One graph / event, x 10,000
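A wildcard query over the six levels can be sketched as a field-by-field comparison. This is only an illustration of the search idea, assuming "*" matches any value at one level; the function names and the sample event list are hypothetical.

```python
def matches(pattern: str, name: str) -> bool:
    """True if every level of `name` equals the pattern level or the pattern has '*'."""
    pattern_levels = pattern.split(":")
    name_levels = name.split(":")
    if len(pattern_levels) != len(name_levels):
        return False
    return all(p == "*" or p == n for p, n in zip(pattern_levels, name_levels))


def find(pattern, event_names):
    """Return the event names that match the pattern, in input order."""
    return [name for name in event_names if matches(pattern, name)]


event_names = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "iphone:home:-:-:-:impression",
]
results = find("web:home:*:*:*:impression", event_names)
print(results)
```

The same helper also answers the "what are all sections under web:home?" question: query `web:home:*:*:*:*` and collect the distinct third level of the results.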
Related work
• Session analysis [Lam et al. 2007, Shen et al. 2013]
• Monitor network logs, not user activity logs [Ghoniem et al. 2013]
See
How to visualize?
Interactions: search box => filter, narrow down
Client event collection (Engineers & Data Scientists)
client : page : section : component : element : action
Client event hierarchy
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
Both events share the path iphone > home > -, then branch:
• - > - > impression
• tweet > tweet > click
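One way to turn flat event names into this hierarchy is a nested dictionary in which every node accumulates the counts of the events beneath it. This is a sketch under assumptions: the counts are made up, and `build_hierarchy` is an illustrative helper, not the tool's implementation.

```python
def build_hierarchy(counts):
    """Build a tree of levels; each node sums the counts of its descendants."""
    root = {"count": 0, "children": {}}
    for name, count in counts.items():
        node = root
        node["count"] += count
        for level in name.split(":"):
            node = node["children"].setdefault(level, {"count": 0, "children": {}})
            node["count"] += count
    return root


# Hypothetical daily counts for the two events on the slide.
counts = {
    "iphone:home:-:-:-:impression": 100,
    "iphone:home:-:tweet:tweet:click": 20,
}
tree = build_hierarchy(counts)
home = tree["children"]["iphone"]["children"]["home"]
print(home["count"], sorted(home["children"]["-"]["children"]))
```

Because the two events share their first three levels, the tree has a single path down to the section node and only branches at the component level.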
Detect changes
Compare the client event hierarchy for TODAY to the one from 7 DAYS AGO.
Display changes
Show the differences on the hierarchy itself.
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
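The comparison between today and seven days ago can be sketched as a relative change per event. The slides do not say which change metric the tool uses, so the relative difference below is an assumption, as are the counts.

```python
def relative_change(today, week_ago):
    """Map each event name to (today - before) / before, or None if it is new."""
    changes = {}
    for name in set(today) | set(week_ago):
        now = today.get(name, 0)
        before = week_ago.get(name, 0)
        # An event with no baseline cannot have a ratio; flag it as new instead.
        changes[name] = None if before == 0 else (now - before) / before
    return changes


# Hypothetical counts for two days, one week apart.
today = {
    "iphone:home:-:-:-:impression": 150,
    "iphone:home:-:tweet:tweet:click": 20,
    "iphone:home:-:tweet:tweet:reply": 5,
}
week_ago = {
    "iphone:home:-:-:-:impression": 100,
    "iphone:home:-:tweet:tweet:click": 40,
}
changes = relative_change(today, week_ago)
print(changes["iphone:home:-:-:-:impression"])
```

Events that disappeared since last week come out as -1.0, and events that are new come out as `None`, which a display could render differently from ordinary increases and decreases.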
Use cases
Users: PMs, Data Scientists, Engineers
• Search
• Monitor
• See effects after major product launch
More information in the paper
Funnel analysis
banana : home : - : - : - : impression (home page)
banana : profile : - : - : - : impression (profile page)
banana : search : - : - : - : impression (search page)
• home page > profile page: 1 job, 1 hour
• adding home page > search page: 2 jobs, 2 hours
• n funnels: n jobs, n hours
Specify all funnels manually!
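A single manually specified funnel amounts to counting how many sessions reach each step in order. The sketch below runs in memory on made-up sessions; on the real logs each such funnel was a separate Hadoop job, which is what makes the manual approach slow.

```python
def funnel_counts(sessions, steps):
    """Count sessions that reach each funnel step, with steps matched in order."""
    reached = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next step this session still has to reach
        for event in events:
            if position < len(steps) and event == steps[position]:
                reached[position] += 1
                position += 1
    return reached


HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

# Hypothetical sessions, each an ordered list of event names.
sessions = [
    [HOME, PROFILE],
    [HOME, SEARCH],
    [PROFILE],
]
home_to_profile = funnel_counts(sessions, [HOME, PROFILE])
home_to_search = funnel_counts(sessions, [HOME, SEARCH])
print(home_to_profile, home_to_search)
```

Every additional funnel requires another call with a different `steps` list, which mirrors the "n jobs, n hours" cost on the slide.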
Related work
• Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
• Big data? eBay checkout sequences [Shen et al. 2013]
  One funnel at a time: Checkout > Payment > Confirm > Success
User sessions
[Diagram: four example sessions (Session#1 to Session#4), each running from start to end through some of the events A, B and C.]
Original paper: 100,000 sessions, ~10 event types
Try with "sample" data: ~millions of sessions, 10,000+ event types
Reduce # of unique sequences
1. Reduce event types: select and merge from the 10,000 types
   e.g. tweet from home timeline, tweet from search page, tweet … = tweet
2. Reduce sequence length: a session can have 1000 events
   e.g. keep 10 events after (window size & direction) visit home page (alignment)
3. More aggregation on Hadoop
Ask users for input
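Steps 1 and 2 can be sketched as a mapping followed by a slice. The merge table, the event names marked hypothetical, and the helper functions are all assumptions for illustration; only the "after" direction is shown.

```python
# User-chosen mapping from raw event names to merged types.
# Events that are not listed are dropped (the "select" part).
MERGE = {
    "web:home:timeline:tweet_box:button:tweet": "tweet",
    "iphone:search:-:tweet_box:button:tweet": "tweet",  # hypothetical name
    "web:home:home:-:-:impression": "visit home page",
    "web:profile:-:-:-:impression": "visit profile page",  # hypothetical name
}


def simplify(session, merge):
    """Step 1: keep only selected events and rename them to merged types."""
    return [merge[event] for event in session if event in merge]


def window_after(events, alignment, size):
    """Step 2: keep the alignment event and up to `size` events after it."""
    if alignment not in events:
        return None  # session never hits the alignment event
    start = events.index(alignment)
    return events[start:start + 1 + size]


session = [
    "web:login:-:-:-:impression",  # hypothetical, not selected
    "web:home:home:-:-:impression",
    "web:home:timeline:tweet_box:button:tweet",
    "web:profile:-:-:-:impression",
    "iphone:search:-:tweet_box:button:tweet",
]
simplified = simplify(session, MERGE)
window = window_after(simplified, "visit home page", 2)
print(window)
```

A "before" direction would slice the events preceding the alignment point instead; the window size bounds the sequence length either way.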
Collapse events
e.g. tweet, tweet, tweet, … = tweet
Sequences before collapsing: ABBBCCCC, ABBCC, ABC, ABCCCC, ABCD, ABCCCD, ABCCE, ABCDF, ABCDG, ABCDH

Group & Count
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDF     1
ABCDG     1
ABCDH     1
ABCDI     1
ABCDJK    1
ABCDJL    1
The sequences with count 1 are rare sequences (count < threshold).

Truncate
Replace last event with x (…)
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     1
ABCDx     1
ABCDx     1
ABCDx     1
ABCDJx    1
ABCDJx    1
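The three aggregation steps can be sketched on the slide's single-letter sequences. This is an in-memory illustration, not the Hadoop job; the threshold value of 2 is an assumption chosen so that the count-1 sequences are treated as rare.

```python
from collections import Counter
from itertools import groupby


def collapse(sequence: str) -> str:
    """Collapse runs of the same event, e.g. 'ABBBCCCC' becomes 'ABC'."""
    return "".join(event for event, _run in groupby(sequence))


def group_and_count(sequences):
    """Count identical sequences after collapsing repeated events."""
    return Counter(collapse(sequence) for sequence in sequences)


def truncate_rare(counts, threshold):
    """Replace the last event of rare sequences with 'x', then regroup."""
    merged = Counter()
    for sequence, count in counts.items():
        key = sequence if count >= threshold else sequence[:-1] + "x"
        merged[key] += count
    return merged


raw = ["ABBBCCCC", "ABBCC", "ABC", "ABCCCC", "ABCD",
       "ABCCCD", "ABCCE", "ABCDF", "ABCDG", "ABCDH"]
grouped = group_and_count(raw)

# Counts taken from the Group & Count table on the slide.
slide_counts = Counter({"ABC": 2000, "ABCD": 80, "ABCE": 20, "ABCDF": 1,
                        "ABCDG": 1, "ABCDH": 1, "ABCDI": 1,
                        "ABCDJK": 1, "ABCDJL": 1})
truncated = truncate_rare(slide_counts, threshold=2)
print(grouped["ABC"], truncated["ABCDx"], truncated["ABCDJx"])
```

After truncation the six rare sequences merge into two patterns, which is how the step reduces the number of unique sequences.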
Final process
1. Define set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation)
4. Wait for it… (2+ hrs)
5. Visualize
Result: gazillion patterns (TBs) reduced to ~100,000 patterns (10MB)
Deployment
• Since Jan 2013
• Fewer users, but more in-depth ad-hoc analysis
• Initial meeting to provide support
Case studies
• What did users do when they visit Twitter? (in demo)
• Where did users give up in the sign up process?
• More in the paper
Conclusions & Future work
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore + Anomaly detection & automatic alert
• Funnel Analysis + More interactivity & data / reduce wait time
• Used in day-to-day operations at Twitter
• Generalize to smaller systems
Challenge: big data > aggregate & sacrifice > small data > visualize & interact
• Data Scientists & Engineers @Twitter — Linus Lee, Chuang Liu
• Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
Acknowledgement
Contact: @kristw
d3Kit helps you
• do boring setup tasks (margins, etc.)
• create responsive charts (handle resize)
• create reusable components
• manage layers
d3Kit.Chartlet
d3Kit.Skeleton
d3Kit.LayerOrganizer
d3Kit.factory
Margins: d3's margin convention
http://bl.ocks.org/kristw/…
<svg> <g transform="…"></g></svg>
Usually you will have to create an <svg> and a <g> inside with some translation to add margins for the axes. d3 has this page that explains the convention. However, there are several steps that you have to do every time.
It also does not let you easily change the margin later.
Bar chart
http://bl.ocks.org/kristw/…
Let's compare how to implement this simple bar chart.
Using d3Kit / Original d3 example
Using d3Kit, you can create a skeleton, passing in the container (e.g. body), and let it create the <svg>, <g>, and calculate the margins. Then you can obtain the <g> using skeleton.getRootG().
Always use skeleton.getInnerWidth() to get content area. If you change the margin via skeleton.margin() later, you do not have to worry about updating width calculation at all. skeleton.getInnerWidth() will return the updated inner width.
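The arithmetic behind the margin convention is small enough to sketch independently of any library. The Python classes below are hypothetical stand-ins that only illustrate why deriving the inner width on demand survives a later margin change; they are not d3Kit's API, which is JavaScript.

```python
from dataclasses import dataclass


@dataclass
class Margin:
    top: int
    right: int
    bottom: int
    left: int


@dataclass
class ChartBox:
    """Illustrative container: outer size plus margins around the content area."""
    width: int
    height: int
    margin: Margin

    def inner_width(self) -> int:
        # Computed on every call, so it always reflects the current margin.
        return self.width - self.margin.left - self.margin.right

    def inner_height(self) -> int:
        return self.height - self.margin.top - self.margin.bottom

    def root_transform(self) -> str:
        # The translation applied to the root <g> in the margin convention.
        return f"translate({self.margin.left},{self.margin.top})"


box = ChartBox(width=960, height=500, margin=Margin(20, 10, 20, 40))
width_before = box.inner_width()
box.margin.left = 60  # change the margin later
width_after = box.inner_width()
print(width_before, width_after, box.root_transform())
```

Storing the inner width once at setup would leave it stale after the margin change, which is the pitfall the slide warns about.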
Responsive chart
http://bl.ocks.org/treboresque/…
d3Kit.Skeleton also helps you catch resize events and resize the skeleton according to your need (full width, keep aspect ratio).
In this example, the Japan flag will grow/shrink when you resize the window, but always keep the same aspect ratio.
Skeleton dispatches a "resize" event, so you will know when to redraw your vis.
Reusable chart
http://bl.ocks.org/kristw/…
d3Kit also provides a lightweight factory to help you create a reusable chart on top of a skeleton.
We are not trying to define a complex framework here, but we aim to set the stage and get out of the way.
Chartlet
http://bl.ocks.org/treboresque/…
Chartlet helps you create reusable components within a chart. For example, these faces below are implemented using Chartlet. You can interact with them individually. We can easily reuse this face vis in another chart.