Social Web 2014, Lora Aroyo!
Lecture IV: How can we MINE, ANALYSE & VISUALISE the Social Web? (1)
Lora Aroyo The Network Institute
VU University Amsterdam
Social Web 2014
• 25 billion tweets on Twitter in 2010, by 175 million users
• 360 billion pieces of content on Facebook in 2010, by 600 million different users
• 35 hours of video uploaded to YouTube every minute
• 130 million photos uploaded to Flickr per month
The Age of BIG Data
Science with BIG Data
BIG Data Challenges
enormous wealth of data = lots of insights
• insights in users’ daily lives and activities
• insights in history
• insights in politics
• insights in communities
• insights in trends
• insights in businesses & brands
Why?
enormous wealth of data = lots of insights
• who uploads/talks? (age, gender, nationality, community, etc.)
• what are the trending topics? when?
• what else do these users like? on which platform?
• who are the most/least active users?
• …
Why?
Image: http://www.co.olmsted.mn.us/prl/propertyrecords/RecordingDocuments/PublishingImages/forms.jpg
This doesn’t work
How about this?
Who uses it?
Politicians Governmental institutions
Whole society
repurposing data
danger of second order effect
repurposing data
discoveries & correlations
Web-Scale Pharmacovigilance: Listening to Signals from the Crowd, R.W. White et al (2013)
Scientists
Bibliometrics
Culture History
Culture
Bill Howe, University of Washington
Entertainment
You?
Companies
Who does it?
The Rise of the Data Scientist
Data Geeks
Skills: Statistics, Data munging, Visualisation
http://radar.oreilly.com/2010/06/what-is-data-science.html
• Data Science enables the creation of data products
• Data products are applications that acquire their value from the data, and create more data as a result.
• Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.
Data Science
Data Science Venn Diagram
Drew Conway
Popular Data Products
Data Science is about building products
not just answering questions
empower others to use the data
empower others to do their own analysis
(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides)
Data mining is the exploration & analysis of large quantities of data
in order to discover valid, novel, potentially useful, & ultimately understandable patterns in data
http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.jpg
Data Mining 101
Databases Statistics
Artificial Intelligence
• Data input & exploration
• Preprocessing
• Data mining algorithms
• Evaluation & Interpretation
• What data do I need to answer question X?
• What variables are in the data?
• Basic stats of my data?
Data Input & Exploration
“LikeMiner”
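A minimal sketch of the exploration questions above, assuming the input is a list of (user, liked page) pairs. The sample data and the helper name `explore` are hypothetical and are not taken from LikeMiner.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample: (user, liked_page) pairs, invented for this sketch.
likes = [
    ("ann", "cycling"), ("ann", "jazz"), ("bob", "jazz"),
    ("cat", "jazz"), ("cat", "cycling"), ("cat", "cooking"),
    ("dan", "cooking"),
]

def explore(pairs):
    """Return basic statistics about a list of (user, page) pairs."""
    per_user = Counter(user for user, _ in pairs)   # likes per user
    per_page = Counter(page for _, page in pairs)   # likes per page
    counts = list(per_user.values())
    return {
        "n_rows": len(pairs),
        "n_users": len(per_user),
        "n_pages": len(per_page),
        "mean_likes_per_user": mean(counts),
        "median_likes_per_user": median(counts),
        "most_liked_page": per_page.most_common(1)[0][0],
    }

summary = explore(likes)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Looking at counts per user and per page first helps decide whether the data can answer question X at all.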
• Cleanup!
• Choose a suitable data model
• What happens if you integrate data from multiple sources?
• Reformat your data
Preprocessing
“LikeMiner”
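The preprocessing steps above can be sketched as follows, assuming two hypothetical sources that deliver records as dictionaries with `user` and `page` fields. The field names and the helpers `normalise` and `merge_sources` are assumptions made for illustration.

```python
def normalise(record):
    """Clean one record: trim whitespace, lowercase, drop incomplete rows."""
    user = (record.get("user") or "").strip().lower()
    page = (record.get("page") or "").strip().lower()
    if not user or not page:
        return None  # cleanup: a missing field makes the row unusable
    return (user, page)  # reformat: one common (user, page) data model

def merge_sources(*sources):
    """Integrate several sources; a set removes duplicates across them."""
    cleaned = set()
    for source in sources:
        for record in source:
            row = normalise(record)
            if row is not None:
                cleaned.add(row)
    return sorted(cleaned)

# Hypothetical input from two platforms with inconsistent formatting.
source_a = [{"user": " Ann ", "page": "Jazz"}, {"user": "bob", "page": ""}]
source_b = [
    {"user": "ann", "page": "jazz"},
    {"user": "Cat", "page": "Cycling"},
    {"user": None, "page": "x"},
]

rows = merge_sources(source_a, source_b)
print(rows)
```

Integrating sources typically surfaces duplicates and conflicting spellings, which is why normalisation happens before the merge.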
• Classification: Generalising a known structure and applying it to new data
• Association: Finding relationships between variables
• Clustering: Discovering groups and structures in data
Data Mining Algorithms
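A toy sketch of the three task types, one small function each. The concrete techniques (1-nearest-neighbour, rule confidence, 1-D k-means) are choices made for this illustration, and all data is invented.

```python
def classify(point, labelled):
    """Classification: give a new point the label of its nearest known example."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(labelled, key=lambda item: dist2(item[0], point))
    return nearest[1]

def confidence(baskets, a, b):
    """Association: share of baskets containing a that also contain b."""
    with_a = [s for s in baskets if a in s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if b in s) / len(with_a)

def kmeans_1d(values, centroids, rounds=10):
    """Clustering: 1-D k-means from given start centroids."""
    groups = []
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[idx].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical data for each task.
labelled = [((1, 1), "sports"), ((1, 2), "sports"), ((8, 9), "music"), ((9, 8), "music")]
label = classify((2, 1), labelled)

baskets = [{"jazz", "blues"}, {"jazz", "blues", "rock"}, {"jazz", "rock"}, {"rock"}]
conf = confidence(baskets, "jazz", "blues")

centroids, groups = kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5])
print(label, round(conf, 2), centroids, groups)
```

Classification needs labelled examples, while association and clustering work on unlabelled data.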
• Filter users by interests
• Construct user graphs
• PageRank on graphs to mine representativeness
• Result: set of influential users
• Compare page topics to user interests to find pages most representative for topics
Mining in “LikeMiner”
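The "PageRank on graphs" step can be sketched with plain power iteration. This is a generic PageRank under stated assumptions (damping 0.85, a hypothetical four-user graph in which an edge means "follows"), not LikeMiner's actual implementation.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each node to the list of nodes it links to.

    Every linked node must also appear as a key. Returns node -> score.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical user graph: three users follow cat, cat follows ann.
user_graph = {"ann": ["cat"], "bob": ["cat"], "cat": ["ann"], "dan": ["cat"]}
scores = pagerank(user_graph)
most_influential = max(scores, key=scores.get)
print(most_influential, {u: round(s, 3) for u, s in scores.items()})
```

Users with high scores are candidates for the "set of influential users" in the result step.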
Evaluation & Interpretation
What does the pattern I found mean?
• Pitfalls:
• Meaningless Discoveries
• Implication ≠ Causality (e.g. intensive care -> death: intensive-care patients die more often, but the care is not the cause)
• Simpson’s paradox
• Data Dredging
• Redundancy
• No New Information
• Overfitting
• Bad Experimental Setup
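Simpson’s paradox from the pitfalls above, as a small worked example. The counts are illustrative and assumed for this sketch: option A has the higher success rate in each subgroup, yet option B looks better once the subgroups are pooled.

```python
# (successes, total) per subgroup and option; illustrative counts only.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(option, group):
    """Success rate of one option inside one subgroup."""
    successes, total = groups[group][option]
    return successes / total

def overall(option):
    """Success rate of one option after pooling all subgroups."""
    successes = sum(groups[g][option][0] for g in groups)
    total = sum(groups[g][option][1] for g in groups)
    return successes / total

for g in groups:
    print(g, round(rate("A", g), 3), round(rate("B", g), 3))
print("overall", round(overall("A"), 3), round(overall("B"), 3))
```

The reversal happens because A is mostly applied to the severe cases, so the pooled numbers compare unequal mixes of cases.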
Data Mining is not easy
Data Journalism
source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png
Mining Social Web Data
Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg
Single Person
http://www.brandrants.com/brandrants/obama/
Populations
Brand Sentiment via Twitter
http://flowingdata.com/2011/07/25/brand-sentiment-showdown/
Sentiment Analysis as Service
http://text-processing.com/demo/sentiment/
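A minimal sketch of what sentiment analysis on tweets can look like, assuming a simple word-list approach. The word lists, function names and sample tweets are hypothetical. This does not call or reproduce the text-processing.com service, and real services generally use trained classifiers.

```python
import re

# Hypothetical mini-lexicons, far too small for real use.
POSITIVE = {"love", "great", "good", "happy", "awesome"}
NEGATIVE = {"hate", "bad", "terrible", "sad", "awful"}

def sentiment(text):
    """Label a text 'pos', 'neg' or 'neutral' by counting lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neutral"

def brand_sentiment(tweets):
    """Count the labels over a list of tweets mentioning one brand."""
    counts = {"pos": 0, "neg": 0, "neutral": 0}
    for tweet in tweets:
        counts[sentiment(tweet)] += 1
    return counts

# Invented tweets about a hypothetical brand.
tweets = [
    "I love this phone, great camera!",
    "Terrible battery, I hate it",
    "It arrived on Monday",
]
brand_counts = brand_sentiment(tweets)
print(brand_counts)
```

Word counting ignores negation and sarcasm, so it only gives a rough baseline for comparing brands.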
http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf
Recommended Reading
http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg
Assignment 2: Semantic Markup
• Part I: enrich/create a Web page with semantic markup
• Step 1: Mark up two different Web pages with the appropriate markup describing properties of at least people, relationships to other people, locations, some temporally related data and some multimedia. You can also try out tools such as Google Markup Helper
• Step 2: Validate your semantic markup. Use an existing validator.
• Step 3: Explain why you chose particular markups. Compare the advantages and disadvantages of the different markups. Include screenshots from validators.
• Part II: analyse another team’s Web page markup - as a consumer & as a publisher
• Step 1: Perform an evaluation and report your findings (consider findability or content extraction)
• Step 2: Support your critique with examples of how the semantic markup could be improved.
• In an introductory section explain what semantic markup is, what it is for, what it looks like etc.
• Support your choices and explanations with appropriate literature references.
• 5 pages (excluding screenshots).
• Other group’s evaluation details in appendix.
• Deadline: 4 March 23:59
Image Source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg
Final Assignment: Your SocWeb App
• Create your own Social Web app (in a group)
• Use structured data, entity relations, data analysis, visualisation
• Write an individual report on one of the main aspects of your app
• Pitch your app idea before finalising: 13 March, during Hands-on
• Submit: 28 March 23:59
Image source: http://www.flickr.com/photos/bionicteaching/1375254387/
Hands-on Teaser
• Build your own recommender system 101
• Recommend pages on del.icio.us
• Recommend pages to your Facebook friends
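A first taste of a recommender, sketched as user-based collaborative filtering over bookmark sets. The bookmark data, the Jaccard similarity measure and the function names are assumptions for illustration. The hands-on session may use a different approach and real del.icio.us or Facebook data.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, bookmarks, top_n=3):
    """Suggest pages bookmarked by similar users that `user` does not have yet."""
    mine = bookmarks[user]
    scores = {}
    for other, pages in bookmarks.items():
        if other == user:
            continue
        sim = jaccard(mine, pages)
        if sim == 0:
            continue  # users with nothing in common contribute nothing
        for page in pages - mine:
            scores[page] = scores.get(page, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]

# Hypothetical bookmark collections per user.
bookmarks = {
    "ann": {"python.org", "d3js.org"},
    "bob": {"python.org", "d3js.org", "scikit-learn.org"},
    "cat": {"python.org", "flickr.com"},
    "dan": {"youtube.com"},
}

suggestions = recommend("ann", bookmarks)
print(suggestions)
```

Pages from the most similar users rank highest, and users with no overlap (dan here) are ignored.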