February 17, 2012
Online social networks: Trends in research and the small world phenomenon
Meeyoung Cha, Assistant Professor, Graduate School of Culture
WST500
Roadmap
Trends in social networks
• Basic concepts
• Sentiment analysis
Six degrees of separation
• Milgram's experiment
• Kevin Bacon game
• Network models
[Credit: zastaviki.com]
The new science of networks
Why is the role of networks expanding in computer science, information science, social science, physics, economics, and biology?
• The rise of the Web and social media has produced far more data: online communities (Facebook: 500M users), news sites, and micro-blogging sites
• Different fields now share a vocabulary: a network is a set of weakly interacting entities, and this view helps us find patterns in seemingly complex structures
[Figure: example networks: Internet, citation, sexual contact, yeast protein]
What we can learn from social media
Social media help us gain new insight into the world we live in and answer long-standing questions in social science
[Credit: teslasociety.com]
Check out this link http://...
I am @Starbucks
Great shampoo!
Listening to …
I’m bored
Fire here!
Can public sentiment expressed in social media predict the stock market?
[Credit: wired.com]
Yes, based on a Twitter study
Analysis of public tweets from Feb 28 to Dec 19, 2008 (9.8 million tweets by 2.7 million users), by Johan Bollen, Huina Mao, and Xiao-Jun Zeng (Oct 2010): http://arxiv.org/abs/1010.3003
Of the various mood dimensions, "calmness" lines up very well with the Dow Jones Industrial Average (DJIA).
A machine-learning model trained on mood data from the preceding 3 days predicted the daily up/down direction of the DJIA closing value with 86.7% accuracy.
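The "3-day prior data" setup can be illustrated with a minimal sketch of how lagged training examples might be built: each example pairs the mood scores of the previous 3 days with the direction (up or down) of that day's index close. The series are made-up toy values, not the study's data, and `make_lagged_examples` and `directional_accuracy` are hypothetical helpers; the study's own prediction model is not reproduced here. The 86.7% figure is a directional accuracy of the kind `directional_accuracy` computes.

```python
from typing import List, Tuple


# Hypothetical helper: builds (features, label) pairs for a next-day
# up/down classifier; not code from the study.
def make_lagged_examples(
    calm: List[float], close: List[float], lag: int = 3
) -> List[Tuple[List[float], int]]:
    """Pair each day's up/down label with the mood scores of the `lag` prior days.

    calm[t] is the mood score for day t and close[t] the index close on day t.
    The label for day t is 1 if close[t] > close[t - 1], otherwise 0.
    """
    if len(calm) != len(close):
        raise ValueError("series must be aligned day by day")
    examples = []
    for t in range(lag, len(close)):
        features = calm[t - lag:t]  # only days t-lag .. t-1, so no look-ahead
        label = 1 if close[t] > close[t - 1] else 0
        examples.append((features, label))
    return examples


def directional_accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of days on which the predicted direction matches the actual one."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need equally long, non-empty label lists")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Toy, made-up series (not data from the study)
calm = [0.1, 0.4, 0.3, 0.5, 0.2, 0.6]
close = [100.0, 101.0, 100.5, 102.0, 101.5, 103.0]
for features, label in make_lagged_examples(calm, close):
    print(features, label)
```

Any classifier can then be trained on these pairs; keeping the features strictly before day t is what makes the evaluation a genuine prediction.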
Data methodology
Step 1: Finding opinions
• Focused on explicit mood statements, e.g., "I am feeling", "makes me", "I am"
• Excluded tweets containing URLs to avoid spam messages
Step 2: Finding mood dimensions
• Used POMS (Profile of Mood States), a psychology dictionary that gives scores to words across different mood states [McNair, Lorr, and Droppleman, 1979]
• The authors extended POMS, using Google, to cover more recently used words, and measured six mood states: calm, alert, sure, vital, kind, happy
• Example mood statement: "I feel nervous about doing something new"
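Both steps can be sketched as a small filter-then-score pipeline. This is a minimal sketch under stated assumptions: `TOY_LEXICON` is a hypothetical, made-up stand-in for the extended POMS word list (its words and weights are not from the study), the phrase list adds "i feel" (an assumption, so that the slide's own example sentence is kept), and a dimension's score is simply the average of the matched words' scores.

```python
import re
from typing import Dict, List

# Mood phrases from the slide, plus "i feel" (an assumption added so the
# slide's own example sentence is kept).
MOOD_PATTERN = re.compile(r"\b(?:i am feeling|i feel|makes me|i am)\b")
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Hypothetical toy lexicon standing in for the extended POMS word list:
# word -> {mood dimension: score}. Words and weights are made up.
TOY_LEXICON: Dict[str, Dict[str, float]] = {
    "nervous": {"calm": -1.0, "alert": 0.5},
    "relaxed": {"calm": 1.0},
    "happy": {"happy": 1.0},
    "bored": {"vital": -1.0},
}
DIMENSIONS = ("calm", "alert", "sure", "vital", "kind", "happy")


def is_mood_statement(tweet: str) -> bool:
    """Step 1: keep tweets with an explicit mood phrase and no URL."""
    text = tweet.lower()
    if URL_PATTERN.search(text):
        return False  # tweets with URLs are dropped as likely spam
    return MOOD_PATTERN.search(text) is not None


def mood_scores(tweets: List[str]) -> Dict[str, float]:
    """Step 2: average lexicon scores per dimension over matched words."""
    totals = {dim: 0.0 for dim in DIMENSIONS}
    matched = 0
    for tweet in filter(is_mood_statement, tweets):
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word in TOY_LEXICON:
                matched += 1
                for dim, score in TOY_LEXICON[word].items():
                    totals[dim] += score
    if matched == 0:
        return totals
    return {dim: total / matched for dim, total in totals.items()}


tweets = [
    "I feel nervous about doing something new",  # kept
    "I am so happy today",                       # kept
    "I am happy, see http://spam.example",       # dropped: contains a URL
    "Great shampoo!",                            # dropped: no mood phrase
]
print(mood_scores(tweets))
```

The word-boundary anchors (`\b`) keep "i am" from matching inside longer words, such as the end of "miami".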
Sanity check
The mood scores confirmed that people were anxious the day before the 2008 US presidential election.
On Thanksgiving, the "happy" score spiked.
Small world phenomenon
A network is a small world if every node is connected to every other node through a relatively short path.
1. How short is "relatively" short?
2. Do there exist models of networks that produce short paths?
Milgram experiment (1969)
[Photo: Stanley Milgram]
• Asked randomly chosen people from Nebraska to send a letter (via intermediaries) to a stockbroker in Boston
• Letters could be forwarded only to someone the sender knew on a first-name basis
• 296 volunteers participated, with the help of 453 intermediaries; ultimately, 29% of the letters reached the target!
[Travers and Milgram, Sociometry 1969]
Six degrees of separation
Completed chains had on average 5.2 intermediaries, i.e., about six hops, suggesting that people are typically connected to one another through about six links.
[Figure: distribution of completed chain lengths; mean = 5.2 intermediaries]
[Travers and Milgram, Sociometry 1969]
Erdős numbers - 1
Paul Erdős (1913-1996)
• Erdős number: the number of links required to connect a scholar to Erdős via co-authorship of papers
• Erdős wrote 1,500+ papers with 507 co-authors
• Jerry Grossman's site allows mathematicians to compute their Erdős numbers: http://www.oakland.edu/enp/
• Connecting path lengths, among mathematicians only: the average is 4.65; the maximum is 13
Erdős numbers - 2
Collaboration graph:
• Nodes are authors, and links represent coauthor relationships
• Erdős number: distance from Paul Erdős
• 280,000 reachable nodes; mean path length = 4.65 links
[Figure: count of authors vs. distance to Paul Erdős]
Scientists are linked to one another through the papers they write, because coauthorship represents a strong social link.
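Since an Erdős number is a shortest-path distance in the collaboration graph, it can be computed with breadth-first search (BFS) starting from Erdős. A minimal sketch over a made-up list of papers (every author name other than Erdős is hypothetical); the mean path length is then the average of these distances over all reachable authors.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def build_coauthor_graph(papers: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Undirected graph linking every pair of authors on the same paper."""
    graph: Dict[str, Set[str]] = {}
    for authors in papers:
        for a in authors:
            graph.setdefault(a, set()).update(b for b in authors if b != a)
    return graph


def erdos_numbers(graph: Dict[str, Set[str]], source: str = "Erdős") -> Dict[str, int]:
    """BFS distance from `source`; unreachable authors are left out."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Made-up papers; only "Erdős" refers to a real person.
papers = [
    ["Erdős", "Alice"],
    ["Alice", "Bob"],
    ["Bob", "Carol", "Dave"],
    ["Eve"],  # not connected to Erdős
]
numbers = erdos_numbers(build_coauthor_graph(papers))
others = [d for name, d in numbers.items() if name != "Erdős"]
print(numbers)
print("mean distance:", sum(others) / len(others))
```

BFS visits authors in order of increasing distance, so the first time an author is reached is along a shortest co-authorship chain.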
Kevin Bacon Game
• Invented by Albright College students in 1994; the goal is to connect any actor to Kevin Bacon by linking actors who acted in the same movie
• The Oracle of Bacon website uses the Internet Movie Database (IMDB.com) to find the shortest link between any two actors: http://oracleofbacon.org/
• Total number of actors in the database: ~550,000
• Average path length to Kevin Bacon: 2.79
• Actor closest to the "center": Rod Steiger (2.53)
On average, actors are fewer than 3 links away from Kevin Bacon!
[Figure: boxed version of the Kevin Bacon Game]
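The "average path length to Kevin" is the mean shortest-path distance from Kevin Bacon to every actor he is connected to, where two actors are linked if they appeared in the same movie. The actor "closest to the center" is the one with the smallest such average. A minimal sketch over a tiny made-up cast list (the movie titles and the other actor names are hypothetical; the real site works on IMDb data):

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set


def actor_graph(casts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Link two actors if they appear together in at least one movie."""
    graph: Dict[str, Set[str]] = {}
    for cast in casts.values():
        for actor in cast:
            graph.setdefault(actor, set())
        for a, b in combinations(cast, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_distance_from(graph: Dict[str, Set[str]], center: str) -> float:
    """Mean BFS distance from `center` to every other actor it can reach."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for actor, d in dist.items() if actor != center]
    return sum(others) / len(others) if others else 0.0


def most_central(graph: Dict[str, Set[str]]) -> str:
    """Actor with the smallest average distance to everyone else."""
    return min(graph, key=lambda actor: average_distance_from(graph, actor))


# Hypothetical movies and co-stars (only Kevin Bacon is a real name)
casts = {
    "Movie 1": ["Kevin Bacon", "Actor A", "Actor B"],
    "Movie 2": ["Actor B", "Actor C"],
    "Movie 3": ["Actor C", "Actor D"],
}
g = actor_graph(casts)
print(average_distance_from(g, "Kevin Bacon"))  # 1.75
print(most_central(g))                          # Actor B
```

In this toy graph, as in the real one, the game's namesake is not the most central actor: Actor B, who bridges two movies, is closer on average to everyone.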
What about online social networks?
Snail-mail letter network, co-authorship network, movie network:
• Small-scale examples
• Direct human networks, where social links are strong => hence the chance of having short paths could be higher
Online social networks:
• Tens of millions of users and links
• Social links are not necessarily based on direct encounters => should we expect online social networks to have short paths too?
Example 1: Microsoft IM
Leskovec and Horvitz (2007): 180 million nodes and 1.3 billion edges in the Messenger network; mean path length = 6.6 links
[Leskovec and Horvitz, WWW 2007]
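On a graph with 180 million nodes, running BFS from every node is expensive, so a mean path length is commonly estimated from BFS runs started at a random sample of source nodes. This is a sketch of that general idea under our own assumptions (undirected adjacency lists, uniformly sampled sources, unreachable pairs ignored); it is not a description of the exact procedure Leskovec and Horvitz used.

```python
import random
from collections import deque
from typing import Dict, Hashable, List, Optional


def bfs_distances(graph: Dict[Hashable, List[Hashable]], source: Hashable) -> Dict[Hashable, int]:
    """Hop distance from `source` to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def estimate_mean_path_length(
    graph: Dict[Hashable, List[Hashable]], num_sources: int, seed: Optional[int] = None
) -> float:
    """Average shortest-path length over (s, t) pairs, with s from a random sample.

    Pairs with no path between them are ignored.
    """
    rng = random.Random(seed)
    sources = rng.sample(list(graph), min(num_sources, len(graph)))
    total = pairs = 0
    for s in sources:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy check: on a 6-node ring every node sees distances 1, 1, 2, 2, 3,
# so any sample of sources gives 9 / 5.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(estimate_mean_path_length(ring, num_sources=3, seed=7))
```

The cost is one BFS per sampled source instead of one per node, and the estimate tightens as more sources are sampled.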
Example 2: LiveJournal
Liben-Nowell et al. (2005): greedy geographic routing, in which each person forwards the message to the friend closest to the destination; mean path length = 4.12 links
[Liben-Nowell et al., PNAS 2005]
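Greedy geographic routing uses only local information: each holder forwards the message to whichever friend is geographically closest to the target. A minimal sketch, assuming people sit on an n × n grid, distance is Manhattan distance, and (a choice made for this sketch) the chain dies if no friend is closer than the current holder. The grid and the single long-range friendship are hypothetical; Liben-Nowell et al. worked with LiveJournal users' reported locations, which are not modeled here.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def grid_friends(n: int) -> Dict[Point, List[Point]]:
    """Each person on an n x n grid knows only the (up to 4) adjacent people."""
    friends: Dict[Point, List[Point]] = {}
    for x in range(n):
        for y in range(n):
            friends[(x, y)] = [
                (x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < n and 0 <= y + dy < n
            ]
    return friends


def greedy_route(
    friends: Dict[Point, List[Point]], source: Point, target: Point
) -> Optional[List[Point]]:
    """Forward to the friend closest to `target`; return None if the chain gets stuck."""
    path = [source]
    current = source
    while current != target:
        candidates = friends.get(current, [])
        if not candidates:
            return None
        best = min(candidates, key=lambda f: manhattan(f, target))
        if manhattan(best, target) >= manhattan(current, target):
            return None  # no friend is closer to the target: the chain dies
        current = best
        path.append(current)
    return path


local = grid_friends(5)
print(len(greedy_route(local, (0, 0), (4, 4))) - 1)  # 8 hops

shortcut = grid_friends(5)
shortcut[(0, 0)].append((3, 3))  # one long-range friendship
shortcut[(3, 3)].append((0, 0))
print(len(greedy_route(shortcut, (0, 0), (4, 4))) - 1)  # 3 hops
```

A single long-range link cuts the route from 8 hops to 3, even though every forwarding decision is still made with local information only.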
Small world networks
Based on Milgram's (1967) famous work, the substantive point is that networks are structured such that even when most of our connections are local, any pair of people can be connected by a fairly small number of relational steps.
6-degrees: Should we be surprised?
Assume each person is connected to 100 other people. Then:
• In step 1, one can reach 100 people
• In step 2, one can reach 100 × 100 = 10,000 people
• …
• In step 5, one can reach 100^5 = 10 billion people
What's not obvious here? Many edges are local (i.e., a friend of a friend is often already a friend), so the people reached at each step overlap heavily.
[Credit: Jure Leskovec]
How can we understand the small world phenomenon? Is there a good model?
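The count above assumes that every friend-of-a-friend is a new person. A minimal sketch that compares that naive count, 100^k, with the number of distinct people actually reachable within k steps when friendships are purely local. The network is an assumption chosen to make overlap extreme: a ring of one million people in which everyone knows the 50 nearest people on each side (100 friends in total).

```python
from typing import List


def naive_reach(degree: int, steps: int) -> int:
    """People reached at step `steps` if no friendships overlap: degree ** steps."""
    return degree ** steps


def ring_neighbors(i: int, n: int, k: int) -> List[int]:
    """Neighbors of node i on a ring where everyone knows k/2 people on each side."""
    half = k // 2
    return [(i + d) % n for d in range(-half, half + 1) if d != 0]


def reach_within_ring(n: int, k: int, source: int, steps: int) -> int:
    """Distinct people (excluding `source`) within `steps` hops on the ring."""
    seen = {source}
    frontier = {source}
    for _ in range(steps):
        frontier = {v for u in frontier for v in ring_neighbors(u, n, k) if v not in seen}
        seen |= frontier
    return len(seen) - 1


for step in range(1, 6):
    print(step, naive_reach(100, step), reach_within_ring(1_000_000, 100, 0, step))
```

On this fully local ring, reach grows only linearly (100 new people per step, 500 after five steps) instead of reaching 10 billion; real networks sit between these extremes, which is what the models that follow try to capture.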
Transition from regular to random?
Do there exist models of networks that have high clustering and low diameter (like real networks)?
Approach: study the shift from structured networks (lattices) to "random" networks.
Watts and Strogatz Model (1998)
• 6 billion nodes on a circle
• Each connected to 1,000 neighbors
• Start rewiring links randomly
• Calculate "average path length" (APL) and "clustering" as the network starts to change
• Network changes from structured to random
• APL: starts at 3 million, decreases to 4 (!)
• Clustering: probability that two nodes linked to a common node will be linked to each other (degree of overlap)
• Clustering: starts at 0.75, decreases to 1 in 6 million
• So what happens along the way?
[Watts and Strogatz, Nature 1998]
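The starting and ending values above follow from the standard approximations for a ring lattice and a random graph with n nodes and average degree k, valid when n ≫ k ≫ ln n ≫ 1 [Watts and Strogatz, Nature 1998]. Plugging in n = 6 × 10^9 and k = 1,000 gives a worked check:

```latex
\begin{aligned}
L_{\text{lattice}} &\approx \frac{n}{2k} = \frac{6\times 10^{9}}{2\times 10^{3}} = 3\times 10^{6},
&\qquad
C_{\text{lattice}} &= \frac{3(k-2)}{4(k-1)} = \frac{3\cdot 998}{4\cdot 999} \approx 0.75,\\[4pt]
L_{\text{random}} &\approx \frac{\ln n}{\ln k} = \frac{\ln(6\times 10^{9})}{\ln 10^{3}} \approx \frac{22.5}{6.91} \approx 3.3,
&\qquad
C_{\text{random}} &\approx \frac{k}{n} = \frac{10^{3}}{6\times 10^{9}} = \frac{1}{6\times 10^{6}}.
\end{aligned}
```

So the average path length falls from millions of hops to a handful (roughly 3 to 4), while clustering falls from about 0.75 to roughly 1 in 6 million.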
From regularity to randomness
Each edge of the lattice is "rewired" independently with probability p; the average distance L(p) and the clustering coefficient C(p) are then examined.
[Watts and Strogatz, Nature 1998]
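The rewiring experiment can be reproduced at toy scale. A minimal sketch, assuming n = 200 nodes, k = 10 neighbors, a fixed random seed, and a simplified rewiring rule: each clockwise lattice edge, visited lap by lap, has its far end moved to a uniformly random node with probability p, and if that would create a self-loop or a duplicate edge the sketch simply leaves the edge in place. As in Watts and Strogatz's plot, L(p) and C(p) are normalized by their p = 0 values. Exact numbers depend on the seed.

```python
import random
from collections import deque
from typing import List, Sequence, Set, Tuple

Graph = List[Set[int]]


def ring_lattice(n: int, k: int) -> Graph:
    """Ring of n nodes, each linked to its k/2 nearest neighbors on each side."""
    adj: Graph = [set() for _ in range(n)]
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rewire(adj: Graph, k: int, p: float, rng: random.Random) -> Graph:
    """With probability p, move the far end of each clockwise lattice edge to a random node."""
    n = len(adj)
    for d in range(1, k // 2 + 1):  # lap by lap: nearest neighbors first
        for i in range(n):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                new = rng.randrange(n)
                if new != i and new not in adj[i]:  # skip self-loops and duplicates
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering(adj: Graph) -> float:
    """Average local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj:
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += links / (deg * (deg - 1) / 2)
    return total / len(adj)


def average_path_length(adj: Graph) -> float:
    """Mean BFS distance over all connected ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in range(len(adj)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def sweep(n: int = 200, k: int = 10,
          ps: Sequence[float] = (0.0, 0.001, 0.01, 0.1, 1.0),
          seed: int = 42) -> List[Tuple[float, float, float]]:
    """Return (p, L(p)/L(0), C(p)/C(0)) for each rewiring probability p."""
    base = ring_lattice(n, k)
    l0, c0 = average_path_length(base), clustering(base)
    results = []
    for p in ps:
        g = rewire(ring_lattice(n, k), k, p, random.Random(seed))
        results.append((p, average_path_length(g) / l0, clustering(g) / c0))
    return results


for p, l_ratio, c_ratio in sweep():
    print(f"p={p:<6} L(p)/L(0)={l_ratio:.3f} C(p)/C(0)={c_ratio:.3f}")
```

The interesting region is small p (around 0.01), where a few shortcuts have already pulled L(p) well down while C(p) is still close to its lattice value.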
Small worlds around us
• Caenorhabditis elegans: 959 cells; genome sequenced in 1998; nervous system mapped => small world network
• Power grid network of the Western States: 5,000 power plants with high-voltage lines => small world network
Scale-free network
The scale-free model focuses on the distance-reducing capacity of high-degree nodes: "hubs" create shortcuts that carry the disease.
[Figure: Colorado Springs high-risk network (sexual contact only); the network is power-law (a = -1.3)]
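Whether a network is "power-law" is usually judged from its degree distribution. A rough sketch, assuming an undirected edge list: count how many nodes have each degree, then fit a straight line to log(count) versus log(degree) by least squares, whose slope is a crude estimate of the exponent a. Log-log least squares is known to be biased on real, noisy data (maximum-likelihood fitting is generally preferred), so treat this only as an illustration; how the exponent in the figure was estimated is not stated here.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple


def degree_distribution(edges: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each degree to the number of nodes having that degree."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(Counter(degree.values()))


def loglog_slope(dist: Mapping[int, float]) -> float:
    """Least-squares slope of log(count) vs. log(degree): a crude exponent estimate."""
    if len(dist) < 2:
        raise ValueError("need at least two distinct degrees")
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# A star: one hub (node 0) with five leaves
star = [(0, i) for i in range(1, 6)]
print(degree_distribution(star))  # {5: 1, 1: 5}

# Synthetic counts that follow an exact power law with exponent -1.3
exact = {k: 1000 * k ** -1.3 for k in range(1, 21)}
print(round(loglog_slope(exact), 3))  # -1.3
```

In a hub-dominated network the distribution has a long tail: a few nodes with very high degree and many with degree 1, as the star shows in miniature.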
Implications
Applications to:
• Spread of diseases (foot-and-mouth disease, computer viruses, AIDS)
• Spread of fashions
• Spread of knowledge
Small-world networks (especially those with hubs, such as scale-free networks) are:
• Robust to random failures
• Vulnerable to selectively targeted attacks
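The robust-but-vulnerable behavior can be illustrated by deleting nodes and tracking the size of the largest connected component. A minimal sketch on a made-up hub-and-spoke network (5 hubs joined in a ring, each with 20 leaf contacts), comparing the removal of randomly chosen nodes with the removal of the highest-degree nodes; the network shape and the seed are assumptions for illustration only.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def hub_network(hubs: int = 5, leaves_per_hub: int = 20) -> Graph:
    """Hubs 0..hubs-1 joined in a ring; every other node is a leaf of one hub."""
    adj: Graph = {h: set() for h in range(hubs)}
    for h in range(hubs):
        nxt = (h + 1) % hubs
        adj[h].add(nxt)
        adj[nxt].add(h)
    node = hubs
    for h in range(hubs):
        for _ in range(leaves_per_hub):
            adj[node] = {h}
            adj[h].add(node)
            node += 1
    return adj


def largest_component(adj: Graph, removed: Set[int]) -> int:
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def random_attack(adj: Graph, count: int, seed: int = 0) -> Set[int]:
    """Failure model: `count` nodes chosen uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), count))


def targeted_attack(adj: Graph, count: int) -> Set[int]:
    """Attack model: the `count` highest-degree nodes."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:count])


net = hub_network()
print("intact:  ", largest_component(net, set()))
print("random:  ", largest_component(net, random_attack(net, 5, seed=1)))
print("targeted:", largest_component(net, targeted_attack(net, 5)))
```

Random failures almost always hit leaves and leave the network largely intact, whereas removing the five hubs shatters it into isolated nodes.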
Conclusion: small world phenomenon
Watts and Strogatz demonstrated that small world properties can occur in graphs with a surprisingly small number of shortcuts.
1. How short is "relatively" short?
Empirical evidence shows six degrees of separation, and people can find efficient routes using only local information.
2. Do there exist models of networks that produce short paths?
A small number of long-range "shortcuts" suffices to significantly reduce the average distance.