Date post: | 26-Jan-2015 |
Category: | Technology |
Upload: | mathieu-bastian |
MATHIEU BASTIAN
DATA VISUALIZATION SUMMIT, SAN FRANCISCO, APRIL 11-12, 2013
BIG GRAPH DATA
• The story of big graph data is just starting
BIG DATA GRAPHS
Keywords: LARGE DATASETS, DISTRIBUTED SYSTEMS, HADOOP, INDEXATION, REAL-TIME, STORAGE, COMPLEX ALGORITHM, ANALYTICS, VISUALIZATION, CLOUD COMPUTING, DATABASES
• “The Petabyte age”
• All industries and domains can leverage big data
• Big Data => Big Problems
• The focus is on building the technology to handle big data and big graph data (e.g. graph databases)
• Seeking efficient analysis of ever more complex systems
BIG DATA
Health Government Finance Technology
• Graphs are everywhere, and it’s easy to collect graph data
• The world is more complex and interconnected than we thought
GRAPHS
Source: Collective Dynamics of Small-World Networks, D Watts, S Strogatz, Nature 393, 440-442
• The study of graphs has exploded over the last 15 years
• Networks have properties and patterns one can study
  • Robustness – How resistant is a network to random attacks?
  • Contagion – How fast does a disease or gossip spread in a network?
  • Communities – How many communities exist in a network?
  • Centrality – Who is the most central individual in a network?
• If you read one of these books, you understand Network Science
NETWORK SCIENCE
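To make two of the questions above concrete, here is a minimal Python sketch that computes degree centrality (one simple answer to "who is the most central individual?") and connected components. Connected components are only a crude stand-in for the communities mentioned above, not a community detection method. The sample edge list and all function names are illustrative assumptions, not material from the talk.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def connected_components(adjacency):
    """Group nodes that can reach each other, using breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


# Hypothetical toy network: one group of four people and one isolated pair.
edges = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("eve", "fay"),
]
adjacency = build_adjacency(edges)
centrality = degree_centrality(adjacency)
most_central = max(centrality, key=centrality.get)
components = connected_components(adjacency)
```

Degree is only one of several centrality measures; betweenness or eigenvector centrality may rank individuals differently on the same network.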
• Saddam Hussein Network (2003)
GRAPHS HELP SOLVE PROBLEMS
The Universe
C. Wilson. Searching for Saddam: a five-part series on how the US military used social networking to capture the Iraqi dictator. 2010. www.slate.com/id/2245228/.
• Predicting and controlling infectious disease
GRAPHS HELP SOLVE PROBLEMS
Naoki Masuda, Petter Holme - Predicting and controlling infectious disease epidemics using temporal networks. http://f1000.com/prime/reports/b/5/6/
Haraldsdottir S, Gupta S, Anderson RM: Preliminary studies of sexual networks in a male homosexual community in Iceland. J Acquir Immune Defic Syndr. 1992, 5:374–81.
• Recommendation systems
GRAPHS HELP SOLVE PROBLEMS
Credit: http://markorodriguez.com/2011/09/22/a-graph-based-movie-recommender-engine/
• Recipe recommendation using ingredient networks
GRAPHS HELP SOLVE PROBLEMS
Credit: http://www.ladamic.com/wordpress/?p=294
• Power grid
GRAPHS HELP SOLVE PROBLEMS
Credit: http://www.npr.org/templates/story/story.php?storyId=110997398
• Famous “Zachary’s Karate Club” study in 1977 only involved 34 nodes.
• It could be drawn by hand on paper
SMALL GRAPHS
W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
Zachary’s Karate Club (1977)
• Your own Facebook or LinkedIn social network
• The Harlem Shake: Anatomy of a Viral Meme
MEDIUM GRAPHS
Gilad Lotan. http://www.huffingtonpost.com/gilad-lotan/the-harlem-shake_b_2804799.html
• The Internet Map (~350,000 domains)
• DBpedia (~290M relationships)
• Friendster Social Network dataset* (1.8B edges)
LARGE GRAPHS
Internet Map (http://internet-map.net)
* http://snap.stanford.edu/data/index.html
• Graphs can be explicit or implicit
  • Explicit: The network exists in nature (social networks, food webs, airline networks)
  • Implicit: The network is derived from other data (word networks, co-authorship)
• Example of an implicit graph:
  • Each document in a collection has a set of tags
  • One can create a link when two tags are on the same document
  • Aggregate all links across all documents
IMPLICIT GRAPHS
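The three steps of the tag example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the document names, the tags and the function name `cooccurrence_edges` are hypothetical, and the edge weight is simply the number of documents in which both tags appear.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Return a Counter mapping (tag_a, tag_b) pairs to co-occurrence counts.

    `documents` maps a document id to its list of tags. Tags are
    de-duplicated and sorted per document so that each undirected pair
    is always stored under the same key.
    """
    weights = Counter()
    for tags in documents.values():
        for tag_a, tag_b in combinations(sorted(set(tags)), 2):
            weights[(tag_a, tag_b)] += 1  # aggregate across all documents
    return weights


# Hypothetical sample data: three documents and their tags.
documents = {
    "doc1": ["python", "graphs", "hadoop"],
    "doc2": ["python", "graphs"],
    "doc3": ["hadoop", "storage"],
}
edges = cooccurrence_edges(documents)
```

The resulting weighted edge list is the implicit graph: the tags are the nodes, and a heavier edge means two tags share more documents.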
• Graphs of all the co-occurrences between LinkedIn Skills (2011)
SIMILARITY GRAPHS
• Visualization and statistics are the two basic toolkits one can use on graphs
• Complex questions are asked when studying graphs
• Easy
  • Min, max, average, quartiles
  • Exact queries, search
• Harder
  • Patterns, trends, correlations
  • Changes over time, context
  • Anomalies, data errors
  • Geographical representation
VISUALIZATION
Excel can do this!
Visualization can do this!
• Due to the size of graphs and the complexity of questions, visualization is the natural tool to understand what’s going on
GRAPH VISUALIZATION
“We are more easily persuaded by the reasons we ourselves discover than by those which are given to us by others.” Blaise Pascal
Let me play with the data!
Direct manipulation
• Use visualization and statistics to discover new hypotheses
• Exploratory data analysis
• The user interface is centered around the human
• Empowers the user to understand the structure and patterns in the data
• The machine augments the human
• How?
  • Overview and details, zoom and pan interface
  • Interactive, direct manipulation
DATA EXPLORATION AND INTERACTION
“The greatest value of a picture is when it forces us to notice what we never expected to see.” John Tukey
• Iterative process to transform relational data into a map
• Use color, size and position to highlight, group and set up a hierarchy
MAP YOUR DATA
• Exploring networks interactively and iterating often leads to “Eureka” moments for domain experts
FROM INFORMATION TO KNOWLEDGE
Eureka
• Big graph data doesn’t necessarily mean you’re visualizing or analyzing a large graph
• Small graphs can be extracted from large graphs and analyzed
• Small graphs can be extracted from non-graph data as well
• Graphs are just nodes and relationships after all
• Example: Adverse Drug Event Analysis with Hadoop, R, and Gephi (Josh Wills, Cloudera, 2012)
BIG GRAPH DATA
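One common way to extract a small graph from a large one is to keep only the neighborhood around a node of interest (an "ego network"). The Python sketch below assumes the large graph is available as a plain list of undirected edges. The function name `ego_network`, the `radius` parameter and the sample data are illustrative choices, not part of the talk or of the Cloudera example.

```python
from collections import defaultdict, deque


def ego_network(edges, center, radius=1):
    """Return (nodes, edges) of the subgraph within `radius` hops of `center`."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    # Breadth-first search that stops expanding once `radius` is reached.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue
        for nbr in adjacency.get(node, ()):
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)

    kept = set(distance)
    # Keep only the edges whose two endpoints both survived.
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, sub_edges


# Hypothetical sample: a short chain a-b-c-d-e plus a side branch a-f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "f")]
nodes, sub_edges = ego_network(edges, "a", radius=2)
```

For a graph that does not fit in memory, the same filtering idea would have to run in a distributed job or a graph database query; this sketch only shows the logic on a small in-memory list.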
• Built to solve large graph visualization problems
• Open source tool for Windows, Mac OS X and Linux
• Large international community involved
• The latest version has been downloaded more than 100,000 times
• Extensible with plug-ins
• Available at http://gephi.org
GEPHI
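As a hedged illustration of how an extracted graph might be handed to Gephi, the Python sketch below writes a weighted edge list as CSV. It assumes the `Source`, `Target` and `Weight` column headers that Gephi's spreadsheet import conventionally recognizes for edge tables; check this against the Gephi version in use. The function name `write_gephi_edge_csv` and the sample edges are hypothetical.

```python
import csv
import io


def write_gephi_edge_csv(weighted_edges, handle):
    """Write {(source, target): weight} as a CSV edge table to `handle`."""
    writer = csv.writer(handle, lineterminator="\n")
    # Assumed header names for Gephi's edge-table spreadsheet import.
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weighted_edges.items()):
        writer.writerow([source, target, weight])


# Hypothetical weighted edges, e.g. tag co-occurrence counts.
weighted_edges = {("graphs", "python"): 2, ("hadoop", "storage"): 1}

buffer = io.StringIO()
write_gephi_edge_csv(weighted_edges, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")` as `handle`.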
GEPHI
VISUALIZATION
LAYOUT
FILTER
STATISTICS
TIMELINE
VISUAL MAPPING
DATA EDITION
• Open-source lightweight JavaScript library to draw graphs
• Uses HTML5 Canvas
• Dynamically displays graphs that can be generated on the fly
• Available at http://sigmajs.org
SIGMA.JS
Sigma.js v0.1
• Big graph data = Relational Big Data
• Graphs are everywhere!
• Graphs have fascinating structure and patterns one can analyze
• Visualization is a natural tool for such complex data and complex questions
• On graphs, visualization done right allows interaction and iteration. Play.
• The hard part is to extract a small or medium graph from big data
• Open source tools like Gephi or Sigma.js are a good start
SUMMARY
Become a graph evangelist!
QUESTIONS?
Mathieu Bastian (@mathieubastian)
Join the Social Network Analysis class by Lada Adamic on Coursera https://www.coursera.org/course/sna
Support the Gephi Consortium http://consortium.gephi.org
Computational Information Design, Ben Fry (2004) http://benfry.com/phd/
The Atlas of Economic Complexity, Harvard's Center for International Development (CID) and the MIT Media Lab http://atlas.media.mit.edu/
The Mesh of Civilizations and International Email Flows, Bogdan State, Patrick Park, Ingmar Weber, Yelena Mejova, Michael Macy http://arxiv.org/abs/1303.0045
The Human Disease Network, Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A-L (2007) http://www.pnas.org/content/104/21/8685.full
What does your intranet look like? http://intranetdiary.blogspot.co.uk/2012/11/network-visualisation.html
Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic http://arxiv.org/abs/1111.3919
US Presidents Inaugural Speeches 1969-2013 Text Network Analysis http://noduslabs.com/cases/presidents-inaugural-speeches-text-network-analysis/
10 Reasons Why We Visualise Data http://www.slideshare.net/Facegroup/10-reasons-why-we-visualise-data
Sigma.js, Alexis Jacomy et al. http://sigmajs.org
Linked: How Everything Is Connected to Everything Else and What It Means, Albert-László Barabási http://www.amazon.com/gp/product/0452284392/
Six Degrees: The Science of a Connected Age, Duncan J. Watts http://www.amazon.com/gp/product/0393325423/
Nexus: Small Worlds and the Groundbreaking Science of Networks, Mark Buchanan http://www.amazon.com/gp/product/0393324427
Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives, Nicholas A. Christakis and James H. Fowler http://www.amazon.com/dp/product/0316036137
Atelier Iceberg – Gephi http://www.slideshare.net/ateliericeberg/gephi-17680699
Adding Value through graph analysis using Titan and Faunus, Matthias Broecheler http://www.slideshare.net/knowfrominfo/titan-talk-ebaymarch2013
Network Maps Board on Pinterest, Mathieu Bastian http://pinterest.com/mathieubastian/network-maps/
Network Science Book, Albert-László Barabási http://barabasilab.neu.edu/networksciencebook
Adverse Drug Event Analysis with Hadoop, R, and Gephi, Cloudera https://github.com/cloudera/ades
REFERENCES & LINKS