Log Dimension Hypothesis 1
The Logarithmic Dimension Hypothesis
Anthony Bonato, Ryerson University
MITACS International Problem Solving Workshop
July 2012
Workshop team
• David Gleich (Purdue)
• Dieter Mitsche (Ryerson)
• Stephen Young (UCSD)
• Myunghwan Kim (Stanford)
• Amanda Tian (York)
Friendship networks
• networks of friends (some real, some virtual) form a large web of interconnected links
6 degrees of separation
• (Stanley Milgram, 67): famous chain letter experiment
6 Degrees in Facebook?
• 900 million users, > 70 billion friendship links
• (Backstrom et al., 2012)
– 4 degrees of separation in Facebook
– when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs
Complex Networks
• web graph, social networks, biological networks, internet networks, …
The web graph
• nodes: web pages
• edges: links
• over 1 trillion nodes, with billions of nodes added each day
On-line Social Networks (OSNs)
• Facebook, Twitter, LinkedIn, Google+, …
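Key parameters
• degree distribution: $N_{i,n} = |\{u \in V(G) : \deg(u) = i\}|$
• average distance: $L(G) = \binom{n}{2}^{-1} \sum_{\{u,v\} \subseteq V(G)} d(u,v)$
• clustering coefficient: $C(G) = n^{-1} \sum_{x \in V(G)} c(x)$, where $c(x) = |E(N(x))| \binom{\deg(x)}{2}^{-1}$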
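The three parameters above can be computed directly on a small graph stored as adjacency sets. This is a minimal sketch; the function names are hypothetical, L(G) is computed only for connected graphs, and c(x) = 0 for vertices of degree below 2 is an assumed convention that the slide does not specify.

```python
from collections import Counter, deque
from itertools import combinations

def degree_distribution(adj):
    """N_{i,n}: number of vertices of degree i, for each degree i that occurs."""
    return Counter(len(nbrs) for nbrs in adj.values())

def bfs_distances(adj, source):
    """Shortest-path distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_distance(adj):
    """L(G): mean of d(u, v) over unordered pairs; assumes G is connected."""
    n = len(adj)
    total = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        if len(dist) != n:
            raise ValueError("this sketch only handles connected graphs")
        total += sum(dist.values())
    # total counts each unordered pair twice, so divide by 2 * C(n, 2) = n(n-1)
    return total / (n * (n - 1))

def local_clustering(adj, x):
    """c(x) = |E(N(x))| / C(deg(x), 2), taken as 0 when deg(x) < 2 (assumption)."""
    nbrs = adj[x]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(adj):
    """C(G): average of c(x) over all vertices."""
    return sum(local_clustering(adj, x) for x in adj) / len(adj)
```

For a triangle on {0, 1, 2} with a pendant vertex 3 attached to 0, this gives L(G) = 4/3 and C(G) = 7/12.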
Properties of Complex Networks
• power law degree distribution (Broder et al, 01): $N_{i,n} \sim n\, i^{-b}$, for some $b > 2$
• power laws in OSNs (Mislove et al,07)
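The slide does not say how the exponent b is estimated from data. One common approach, used here as an assumption and not necessarily the method of the cited works, is the continuous maximum-likelihood estimate b̂ = 1 + k / Σ ln(x_i / x_min) over the k samples with x_i ≥ x_min. The sketch treats degrees as continuous, and the cutoff x_min is a hypothetical parameter chosen by the user.

```python
import math
import random

def power_law_exponent_mle(samples, x_min):
    """Continuous MLE of b for a tail p(x) ~ x^(-b), using only samples >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def sample_power_law(b, x_min, size, rng):
    """Inverse-transform sampling from the continuous power law with exponent b > 1."""
    # CDF is F(x) = 1 - (x / x_min)^(-(b - 1)); 1 - rng.random() lies in (0, 1]
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (b - 1.0)) for _ in range(size)]

rng = random.Random(42)
data = sample_power_law(2.5, 1.0, 100_000, rng)
print(round(power_law_exponent_mle(data, 1.0), 2))  # close to 2.5
```

On synthetic data the estimate recovers the planted exponent closely; on real degree sequences the choice of x_min matters much more.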
Small World Property
• small world networks (Watts & Strogatz, 98)
– low distances
• diam(G) = O(log n)
• L(G) = O(log log n)
– higher clustering coefficient than a random graph with the same expected degree
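The clustering half of the small world property can be illustrated with a simplified Watts–Strogatz variant: start from a ring lattice, move one endpoint of each edge with probability p, and compare C(G) with G(n, p') for p' = k/(n−1), which has the same expected degree. The function names and the values n = 1000, k = 10, p = 0.1 are arbitrary choices for this sketch.

```python
import random
from itertools import combinations

def ring_lattice(n, k):
    """n vertices on a cycle, each joined to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, p, rng):
    """Move one endpoint of each original edge to a random vertex with probability p."""
    n = len(adj)
    edges = sorted({(min(a, b), max(a, b)) for a in adj for b in adj[a]})
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and w not in adj[u]:  # avoid loops and duplicate edges
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def gnp(n, p, rng):
    """Binomial random graph G(n, p)."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj):
    """Average local clustering, with c(x) = 0 when deg(x) < 2."""
    total = 0.0
    for x, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

n, k = 1000, 10
ws = rewire(ring_lattice(n, k), 0.1, random.Random(7))
rg = gnp(n, k / (n - 1), random.Random(8))
print(clustering_coefficient(ws), clustering_coefficient(rg))
```

The rewired lattice keeps a clustering coefficient far above that of the random graph, while the random shortcuts are what shrink distances.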
Sample data: Flickr, YouTube, LiveJournal, Orkut
• (Mislove et al,07): short average distances and high clustering coefficients
Community structure
• (Zachary, 72)
• (Mason et al, 09)
• (Fortunato, 10)
• (Li, Peng, 11): small community property
(Leskovec, Kleinberg, Faloutsos, 05):
• densification power law: average degree increases with time; the number of edges grows as $e(t) \propto n(t)^{a}$ for some $1 < a < 2$
• decreasing distances
• (Kumar et al, 06): observed in Flickr, Yahoo! 360
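Given snapshots of a growing network, the densification exponent a can be estimated as the least-squares slope of log e(t) against log n(t). This is a minimal sketch; the function name and the snapshot numbers below are hypothetical, not data from the cited studies.

```python
import math

def densification_exponent(num_nodes, num_edges):
    """Least-squares slope a in log e(t) = a * log n(t) + c over snapshots t."""
    xs = [math.log(n) for n in num_nodes]
    ys = [math.log(e) for e in num_edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# hypothetical snapshots that densify with a = 1.25
nodes = [1000, 2000, 4000, 8000]
edges = [3 * n ** 1.25 for n in nodes]
print(densification_exponent(nodes, edges))  # about 1.25
```

An exponent above 1 means the average degree 2e(t)/n(t) grows like n(t)^(a-1).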
Geometry of OSNs?
• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)
• IDEA: embed OSN in 2-, 3- or higher dimensional space
Dimension of an OSN
• dimension of an OSN: minimum number of attributes needed to classify nodes
• like the game of “20 Questions”: each question narrows the range of possibilities
• what is a credible mathematical formula for the dimension of an OSN?
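The “20 Questions” analogy has a simple counting version: each yes/no question at best halves the set of candidates, so singling out one of n users takes at least ⌈log₂ n⌉ questions. The sketch below only illustrates the analogy with binary attributes; it is not the GEO-P notion of dimension, and the function name is hypothetical.

```python
def min_yes_no_questions(n):
    """Smallest m with 2**m >= n: m binary attributes separate at most 2**m users."""
    if n < 1:
        raise ValueError("need at least one user")
    # (n - 1).bit_length() equals ceil(log2(n)) exactly, with no floating point
    return (n - 1).bit_length()

print(min_yes_no_questions(900_000_000))  # 30
```

With the 900 million Facebook users quoted for 2012, 30 yes/no questions suffice in principle.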
Geometric model for OSNs
• we consider a geometric model of OSNs, where
– nodes are in m-dimensional Euclidean space
– the threshold value is variable: a function of the ranking of nodes
Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in the m-dimensional unit hypercube (torus metric)
• each node is ranked 1, 2, …, n by some function r
– 1 is best, n is worst
– we use random initial ranking
• at each time-step, one new node v is born and one node chosen uniformly at random dies (and the ranking is updated)
• each existing node u has a region of influence with volume $r(u)^{-\alpha} n^{-\beta}$
• add edge uv if v is in the region of influence of u
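The steps above can be simulated directly. This is a minimal sketch under stated assumptions, not the authors' implementation: regions of influence are taken to be ℓ∞ balls (cubes) on the unit torus, the process starts from n randomly placed nodes with no edges, every qualifying edge is added (no extra edge probability), and all function names are hypothetical.

```python
import random

def torus_linf(x, y):
    """l-infinity distance between two points of the unit torus [0, 1)^m."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))

def influence_radius(rank, n, alpha, beta, m):
    """Radius of an l-infinity ball (a cube) whose volume is rank^-alpha * n^-beta."""
    volume = rank ** (-alpha) * n ** (-beta)
    return 0.5 * volume ** (1.0 / m)  # a cube of side 2r has volume (2r)^m

def geo_p(n, alpha, beta, m, steps, seed=0):
    """Run `steps` birth/death steps of a GEO-P-style process; return adjacency sets."""
    assert 0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1
    rng = random.Random(seed)
    pos = {v: tuple(rng.random() for _ in range(m)) for v in range(n)}
    order = list(range(n))  # order[i] is the node with rank i + 1 (rank 1 is best)
    rng.shuffle(order)      # random initial ranking
    adj = {v: set() for v in range(n)}  # assumption: start with no edges
    next_id = n
    for _ in range(steps):
        # one node, chosen uniformly at random, dies together with its edges
        dead = rng.choice(order)
        order.remove(dead)
        for u in adj.pop(dead):
            adj[u].discard(dead)
        del pos[dead]
        # one new node is born at a uniform point with a uniform random rank
        v = next_id
        next_id += 1
        pos[v] = tuple(rng.random() for _ in range(m))
        adj[v] = set()
        order.insert(rng.randrange(n), v)
        # add edge uv for every existing u whose region of influence contains v
        for i, u in enumerate(order):
            if u != v and torus_linf(pos[u], pos[v]) <= influence_radius(i + 1, n, alpha, beta, m):
                adj[u].add(v)
                adj[v].add(u)
    return adj
```

For example, `geo_p(200, 0.5, 0.2, 2, 400, seed=1)` returns a 200-node graph; because rank 1 has the largest region, top-ranked nodes tend to collect the most edges.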
Notes on GEO-P model
• model uses both geometry and ranking
• number of nodes is static: fixed at n
– order of OSNs at most number of people (roughly…)
• top ranked nodes have larger regions of influence
Simulation with 5000 nodes
[Figure: simulated graphs with 5000 nodes; panels: random geometric, GEO-P]
Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
• a.a.s. the GEO-P model generates graphs with the following properties:
– power law degree distribution with exponent $b = 1 + 1/\alpha$
– average degree $d = (1+o(1))\, n^{1-\alpha-\beta}/2^{1-\alpha}$ (densification)
– diameter $D = O\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$ (small world: constant order if $m = C \log n$)
– clustering coefficient larger than in a comparable random graph
Spectral properties
• the spectral gap λ of G is defined by $\lambda = \max\{|1-\lambda_1|, |\lambda_{n-1}-1|\}$, where $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1} \le 2$ are the eigenvalues of the normalized Laplacian of G
• for G(n,p) random graphs, λ tends to 0 as the order grows
• in the GEO-P model, λ is close to 1
• (Estrada, 06): bad spectral expansion in real OSN data
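The spectral gap can be computed with NumPy in its normalized-Laplacian form, λ = max{|1 − λ₁|, |λ_{n−1} − 1|}, under which λ is near 0 for dense G(n, p) and equals 1 for a disconnected graph. This is a minimal sketch; the function name is hypothetical and it assumes the graph has no isolated vertices.

```python
import numpy as np

def spectral_gap(adjacency):
    """max(|1 - l_1|, |l_{n-1} - 1|) over normalized Laplacian eigenvalues l_0 <= ... <= l_{n-1}."""
    a = np.asarray(adjacency, dtype=float)
    deg = a.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("this sketch assumes no isolated vertices")
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # normalized Laplacian I - D^(-1/2) A D^(-1/2)
    lap = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    eig = np.linalg.eigvalsh(lap)  # eigenvalues of a symmetric matrix, ascending
    return max(abs(1.0 - eig[1]), abs(eig[-1] - 1.0))

# a sample of G(400, 0.2): the gap should be small
rng = np.random.default_rng(0)
upper = np.triu(rng.random((400, 400)) < 0.2, 1)
print(spectral_gap((upper | upper.T).astype(float)))
```

For the complete graph K₅ the normalized Laplacian eigenvalues are 0 and 5/4, so λ = 1/4; two disjoint edges give λ = 1.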
Dimension of OSNs
• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m
• this gives a formula for the dimension of an OSN:
$$m = \left(1 - \frac{b-1}{b-2} \cdot \frac{\log d}{\log n}\right) \frac{\log n}{\log D}$$
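The dimension formula follows from the GEO-P estimates by taking logarithms and dropping constant and polylogarithmic factors: b = 1 + 1/α gives 1/(1−α) = (b−1)/(b−2), log d ≈ (1−α−β) log n gives β, and log D ≈ β log n / ((1−α)m) is then solved for m. Every log appears in a ratio, so the base does not matter. The sketch below is a direct transcription; the function name is hypothetical.

```python
import math

def osn_dimension(n, b, d, D):
    """m = (1 - (b-1)/(b-2) * log d / log n) * log n / log D, from the GEO-P estimates."""
    if b <= 2:
        raise ValueError("the formula needs b > 2 (that is, alpha < 1)")
    return (1.0 - (b - 1.0) / (b - 2.0) * math.log(d) / math.log(n)) * math.log(n) / math.log(D)

# consistency check with planted parameters alpha = 0.5, beta = 0.2, m = 3
alpha, beta, m, n = 0.5, 0.2, 3, 10**6
b = 1 + 1 / alpha
d = n ** (1 - alpha - beta)
D = n ** (beta / ((1 - alpha) * m))
print(osn_dimension(n, b, d, D))  # about 3.0
```

Feeding in (n, b, d, D) generated from known α, β, m recovers m, which is the reverse-engineering step used to estimate the dimension of real OSNs.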
6 Dimensions of Separation
OSN         Dimension
Facebook    7
YouTube     6
Twitter     4
Flickr      4
Cyworld     7
Uncovering the hidden reality
• reverse engineering approach
– given network data (n, b, d, D), the dimension of an OSN gives the smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space
Logarithmic Dimension Hypothesis
• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users of the OSN
– theoretical evidence: GEO-P and MAG (Leskovec, Kim, 12) models
– empirical evidence?
– (Sweeney, 2001)
Experimental design
• supervised machine learning
– Alternating Decision Trees (ADT)
– approach of (Janssen et al, 12+) based on earlier work on protein interaction networks (PIN) by (Middendorf et al, 05)
• classify OSN data vs. simulated graphs from the GEO-P model in various dimensions
• develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension
• ADT will classify which dimension best fits the data
– cross-validation and robustness testing
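A feature vector of the kind described above could be assembled as follows. This is a hypothetical sketch, not the workshop's actual feature set: it uses degree quartiles plus the two connected 3-vertex graphlets (open wedges and triangles), normalized per vertex, and omits average distance for brevity. Training the ADT classifier on such vectors is not shown.

```python
from itertools import combinations
from math import comb
from statistics import quantiles

def triangle_count(adj):
    """Number of triangles (the complete 3-vertex graphlet)."""
    # each triangle is found once from each of its three vertices
    return sum(1 for u in adj for v, w in combinations(adj[u], 2) if w in adj[v]) // 3

def open_wedge_count(adj):
    """Induced paths on 3 vertices (the other connected 3-vertex graphlet)."""
    wedges = sum(comb(len(nbrs), 2) for nbrs in adj.values())
    # every triangle contains three wedges that are not induced paths
    return wedges - 3 * triangle_count(adj)

def feature_vector(adj):
    """Hypothetical features: degree quartiles, then graphlet counts per vertex."""
    n = len(adj)
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    quartiles = quantiles(degrees, n=4, method="inclusive")
    return quartiles + [open_wedge_count(adj) / n, triangle_count(adj) / n]

adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(feature_vector(adj))  # [1.75, 2.0, 2.25, 0.5, 0.25]
```

Vectors computed this way for the OSN and for GEO-P graphs simulated in each candidate dimension m would form the classifier's training and test data.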
Example
• preprints, reprints, contact: search “Anthony Bonato”