+ All Categories
Home > Documents > CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks It is a ‘small world’...

CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks It is a ‘small world’...

Date post: 19-Dec-2015
Category:
View: 214 times
Download: 0 times
Share this document with a friend
16
CS 728 Lecture 4 It’s a Small World on the Web
Transcript

CS 728Lecture 4

It’s a Small World on the Web

Small World Networks

• It is a ‘small world’ after all

– Billions of people on Earth, yet every pair separated by “six degrees” of acquaintance relationships

– Notion popularized by experimental psychologist Stanley Milgram’s, different from his more infamous experiment

• Mathematically

– Sparse – linear number of edges

– Diameter - small like logarithm (log N) – Clustering is high – neighbors are neighbors

Small World = Small Diameter + Clustering

• Defined by two measures:– characteristic path length L = number of

edges in shortest path between two vertices, averaged over all vertex pairs

– clustering coefficient C:• take vertex v with k 1 neighbors• at most k(k-1)/2 edges among neighbors• C(v) = fraction of k(k-1)/2 edges present

• C = average clustering coefficient

• C >> C_random, L L_random

The small world of the Web

• Empirical study of Web-graph reveals small-world property– Sparse graph– Average distance (d) in simulated web: d = 0.35 + 2.06 log (n) e.g. n = 109, d ~= 19

– Diameter properties inferred from sampling• Calculation of max. diameter computationally

demanding for large values of n

– Clustering unknown

Implications for Web

• Logarithmic scaling of diameter makes future navigation of web manageable– 10-fold increase of web pages results in only

2 more additional ‘clicks’, but …– Users may not take shortest path, may use

bookmarks or just get distracted on the way– Search engines play a crucial role, how can

they use this SW link structure?

Small World in Real World of Hollywood: The Kevin Bacon Game

Goal: Connect any actor to Kevin Bacon, by linking actors who have acted in the same movie.

Oracle of Bacon website uses Internet Movie Database (IMDB.com) to find shortest link between any two actors. Created by students at Univ. of Virginia http://oracleofbacon.org/

Boxed version of theKevin Bacon Game

The Hollywood Network

Total # of actors in database: ~550,000

Most actors are within three links of each other!

Average path length to Kevin Bacon: 2.79

Actor closest to “center”: Rod Steiger (2.53)

Rank of Kevin, in closeness to center: 876th

Center of Hollywood?

Math Citation Network:Erdős Number

Number of links required to connect scholars to Erdős, via co-authorship of papers

Erdős wrote 1500+ papers with 507 co-authors.

Jerry Grossman’s (Oakland Univ.) website allows mathematicians to compute their Erdos numbers:

http://www.oakland.edu/enp/

Connecting path lengths, among mathematicians only:– average is 4.65– maximum is 13

Paul Erdős (1913-1996)

My number is 3

- Erdős and Renyi showed that average path length between connected nodes in a random graph is logarithmic

- But degree sequences in social networks like Web and Hollywood are not Poisson

- Back to Power-laws

Erdős

ArnyRosenberg Fred

AnnexsteinFanChung

k

N

ln

ln

Classes of small-world networks– Single-scale: Connectivity distribution decays exponentially

(e.g., Poisson and random graphs)– Scale-free: Power-law distribution of connectivity over

entire range– Broad-scale: Power-law over “broad range” + abrupt cut-off

Bow-tie Structure of Web

• A large scale study (Altavista crawls) reveals another interesting property of web – “symmetric asymmetry”– Study of 200 million nodes & 1.5 billion links– Small-world property not applicable to entire

web• Some parts unreachable• Others have long paths

– Power-law connectivity holds though• Page indegree ( = 2.1), outdegree ( = 2.72)

Bow-tie Components• Strongly Connected

Component (SCC)– Core with small-world

property

• Upstream (IN)– Core can’t reach IN

• Downstream (OUT)– OUT can’t reach core

• Disconnected (Tendrils)

Component Properties

• Each component is roughly same size– ~50 million nodes

• Tendrils not connected to SCC– But reachable from IN and can reach OUT

• Tubes: directed paths IN->Tendrils->OUT

• Disconnected components– Maximal and average diameter is infinite

Empirical Numbers for Bow-tie

• Maximal minimal (?) diameter– 28 for SCC, 500 for entire graph

• Probability of a path between any 2 nodes– ~1 quarter (0.24)

• Average length – 16 (directed path exists), 7 (undirected)

• Shortest directed path between 2 nodes in SCC: 16-20 links on average

Next Time:Models for the Web Graph

• Stochastic models that can explain or at least partially reproduce the properties of the web graph. Goals of model– power law distribution properties– maintain the small world property– bow-tie structure


Recommended