Dynamics of Peer-to-Peer Networks or
Who is Going to be The Next Pop Star?
Yuval Shavitt
School of Electrical Engineering
http://www.eng.tau.ac.il/~shavitt
Credits
The talk is based on the papers:
• Static and dynamic characterization of the Gnutella network [Shaked-Gish, Shavitt, Tankel, IPTPS 2007]
• How to predict the next pop star? [Koenigstein, Shavitt, Tankel, KDD 2008]
What are Peer-to-Peer Networks?
• The common computing paradigm is client-server:
  – Server waits for requests (on a known port)
  – Client sends a request
  – Server serves the client
  – Examples: WWW, FTP, SMTP (e-mail), …
• Peer-to-peer networks:
  – Each end-point is both client and server
[Figure: clients connected to a central server]
The Gnutella Network
• Gnutella: the most popular file sharing network on the Internet
• According to the Digital Music News Research Group, Gnutella had a 40% market share in Q4 2007
• LimeWire: the most popular file sharing client in the world; it dominates the Gnutella network
The Gnutella Protocol
• Originally: a flat peer-to-peer distributed protocol
  – Churn caused instability
• Today: a two-tier system
  – Stable nodes are promoted to become ultrapeers
  – Queries carry an out-of-band (OOB) reply address: the originator's address or, in most cases when the client is firewalled, the ultrapeer's address
Locating the Origin IP address
IP resolution process:
• Detect the ultrapeer (UP) IP
• Discard queries with more than 2 hops
• Discard queries with 2 hops and the same IP (OOB address equals the UP IP)
• Intercept queries with 2 hops and different IPs
[Figure: peers attached to ultrapeers (UP), which are connected to the listener]
Cancels the bias for rare queries
Introduces bias against firewalled clients
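The resolution rules above can be sketched as a single filter function. This is a minimal illustration, not the authors' listener: the `Query` record and its field names are hypothetical, and the sketch assumes the listener knows each query's hop count, the OOB address it carries, and the IP of the ultrapeer that forwarded it. Queries with fewer than 2 hops are not covered by the slide and are simply discarded here, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """A query as seen by the listener (hypothetical fields)."""
    text: str
    hops: int           # hop count carried by the query
    oob_ip: str         # OOB reply address carried by the query
    ultrapeer_ip: str   # IP of the ultrapeer (UP) that forwarded it


def resolve_origin_ip(q: Query) -> Optional[str]:
    """Return the originator's IP, or None if the query is discarded."""
    if q.hops != 2:
        # More than 2 hops: discarded per the slide.
        # Fewer than 2 hops: not covered by the slide; discarded by assumption.
        return None
    if q.oob_ip == q.ultrapeer_ip:
        # Firewalled client: the OOB address is the ultrapeer's own address,
        # so the origin cannot be resolved (source of the firewall bias).
        return None
    # 2 hops and a different IP: the OOB address is the originator's address.
    return q.oob_ip
```

Dropping the firewalled case is what introduces the bias against firewalled clients noted above.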
Data Sets
• First study:
  – Jul 2006 – Nov 2006
  – 665,000,000 world-wide geo-identified queries
• Second study:
  – Oct 2006 – Jul 2007, Sundays only
  – 310,000,000 USA geo-identified queries
• A network crawl of 24 hours:
  – 1.2M users
  – 533,000 different songs
Largest studies ever performed, in length and depth
Query Classification in Gnutella
• Music: 68.11%
• Adult: 22.01%
• Movie: 4.1%
• TV: 1.7%
• Unknown: 1.67%
• Japanese Anime/Comic: 1.37%
• Software: 0.54%
• File Suffix: 0.26%
• Spam: 0.23%
Top Countries
Queries Per Day
Queries Per Hour Per User
Top Queries (constant)
Top Volatile Queries
Temporal Ranking Drift
How to Predict an Artist’s Success?
Noam Koenigstein, Y. Shavitt, and Tomer Tankel. Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. The 2008 ACM SIGKDD Conference, August 2008, Las Vegas, NV, USA.
The Word of Mouth Effect
• A successful innovation: adopter clusters form around early adopters
• An unsuccessful product: a uniform spatial distribution of adopters
• The divergence of the adopters' spatial distribution from uniform can be used to predict a new product's probability of success [Garber et al., Marketing Science 2004]
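A minimal sketch of the divergence score for a single query string, assuming the input is its per-location query counts for one week and the reference is the uniform distribution over the same set of locations. The function name and input format are hypothetical, not the authors' code; locations with zero queries must still be listed, since they count toward the uniform reference.

```python
import math
from typing import Mapping


def kl_from_uniform(counts: Mapping[str, int]) -> float:
    """KL divergence D(P || U) of a query string's spatial distribution.

    `counts` maps every location (including those with zero queries) to the
    number of queries for the string there; U is uniform over those locations.
    """
    total = sum(counts.values())
    n = len(counts)
    if total == 0:
        raise ValueError("need at least one query")
    d = 0.0
    for c in counts.values():
        if c > 0:                       # convention: 0 * log 0 = 0
            p = c / total
            d += p * math.log(p * n)    # p * log(p / (1/n))
    return d
```

A string queried evenly everywhere scores 0, while a string queried in a single city out of four scores log 4, the largest possible value.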
The divergence
• When measured against the uniform distribution, the maximum is achieved when P is a delta function (all queries come from one location)
  – True for both Kullback-Leibler and Jensen-Shannon
  – This is the case when emerging artists are considered
• Non-uniform distribution of potential adopters
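A short derivation of the first bullet, assuming P = (p_1, …, p_n) is a query string's distribution over n locations and U is uniform over the same n locations (this notation is introduced here, not taken from the talk):

```latex
D_{\mathrm{KL}}(P \,\|\, U) = \sum_{i=1}^{n} p_i \log \frac{p_i}{1/n}
  = \log n + \sum_{i=1}^{n} p_i \log p_i
  = \log n - H(P) \;\le\; \log n

\mathrm{JS}(P, U) = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(U \,\|\, M),
  \qquad M = \tfrac{1}{2}(P + U)
```

Since the entropy H(P) is zero exactly when P puts all its mass on one location, the KL bound is attained only by a delta distribution. Jensen-Shannon is an f-divergence and therefore convex in P; a convex function on the probability simplex is maximized at a vertex, and by the symmetry of U every vertex (every delta distribution) gives the same value.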
Party Like a Rockstar in 2007
• Week 6: The string “party like a rockstar” is detected by the algorithm
• Week 8: In Atlanta’s popularity chart (Feb 18th)
• Week 15: Atlanta-based Shop Boyz sign a contract with Universal Recordings
• Week 18: The song first enters the Billboard Hot 100 (at 80th position)
• Week 23: Reaches 2nd position on the Billboard Hot 100
Ranked only 10,156 on the global chart
[Figure: “Party Like a Rockstar”: KL divergence and popularity by week number (2007), titled “Shop Boyz Popularity and Divergence in 2007”]
[Figure: Shop Boyz related queries in February 2007]
Soulja Boy
[Figure: Soulja Boy: KL divergence and popularity by week number (2007)]
• Detected by our algorithm already in 2006
• The string “soulja boy” entered the “Atlanta queries top 100” already in October 2006
• Entered the Bubbling Under R&B/Hip-Hop Singles chart on June 23, 2007
• Later ranked first in the following Billboard charts: Hot 100, Hot Rap Tracks, Hot Videoclip, Hot RingMasters and Hot Ringtones
Yung Berg
• Active in LA
• Week 2: Entered LA top 100
• Week 15: First appeared on the Billboard charts
• Week 32: Reached No. 18 on the Billboard Hot 100
[Figure: Yung Berg: KL divergence and popularity by week number (2007)]
Madonna
The Detection Algorithm
• Input: a list of geo-identified P2P query strings
• Output: a list of locally popular query strings with a high probability of becoming globally popular
• Build local and global popularity charts
• Local popularity is detected using local and global popularity thresholds
• Look for local popularity growth trends from week to week
• Filtering: non-music content and already familiar artists, which are characterized by a uniform distribution
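The steps above can be sketched in a few functions. This is a minimal sketch under stated assumptions, not the authors' implementation: "locally popular" is taken to mean inside a location's top `local_top` while still outside the global top `global_top`, a "growth trend" means the local rank improved since the previous week, and the `known` set stands in for the filtering step. All names and the default threshold values are hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict
from typing import AbstractSet, Dict, Iterable, List, Tuple

# One week of geo-identified queries as (location, query string) pairs.
Week = Iterable[Tuple[str, str]]


def rank_charts(week: Week) -> Tuple[Dict[str, Dict[str, int]], Dict[str, int]]:
    """Build per-location and global popularity charts (rank 1 = most queried)."""
    local: Dict[str, Counter] = defaultdict(Counter)
    glob: Counter = Counter()
    for loc, q in week:
        local[loc][q] += 1
        glob[q] += 1

    def ranks(counts: Counter) -> Dict[str, int]:
        return {q: r for r, (q, _) in enumerate(counts.most_common(), start=1)}

    return {loc: ranks(c) for loc, c in local.items()}, ranks(glob)


def detect(prev_week: Week, this_week: Week,
           known: AbstractSet[str] = frozenset(),
           local_top: int = 100, global_top: int = 1000) -> List[Tuple[str, str]]:
    """Flag (location, string) pairs that look like emerging local hits."""
    prev_local, _ = rank_charts(prev_week)
    cur_local, cur_global = rank_charts(this_week)
    flagged = []
    for loc, ranks in cur_local.items():
        for q, r in ranks.items():
            if q in known or r > local_top:
                continue  # filtered out, or not locally popular
            if cur_global[q] <= global_top:
                continue  # already globally popular
            prev_r = prev_local.get(loc, {}).get(q, math.inf)
            if r < prev_r:
                flagged.append((loc, q))  # local growth trend
    return sorted(flagged)
```

Strings that were not charted at all in a location last week count as growing, since any rank beats an absent one.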
Local Popularity
• Not all queries are “products” (e.g., rare typos), so divergence alone is not effective
• Detection is based on local popularity
ATPL: All Times Popular List
• Initialization: all the strings that reached global popularity in 2006
• Weekly aggregation
• Filters non-volatile strings:
  – adult related, e.g., “porn”
  – well-established artists, e.g., “madonna”, “avril lavigne”
  – movies, software, etc.
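A minimal sketch of the ATPL as a growing set, assuming "weekly aggregation" means that every string appearing in a week's global top chart is added to the list, so that it is filtered out as a candidate from then on. The class and method names are hypothetical; the seed strings in the usage below are the examples from the slide.

```python
from typing import Iterable, List


class ATPL:
    """All Times Popular List: strings that have ever been globally popular."""

    def __init__(self, initial: Iterable[str]):
        # Initialization: strings that reached global popularity in 2006.
        self._strings = set(initial)

    def add_week(self, global_top: Iterable[str]) -> None:
        """Weekly aggregation (assumed): fold in this week's globally popular strings."""
        self._strings.update(global_top)

    def __contains__(self, q: str) -> bool:
        return q in self._strings

    def filter(self, candidates: Iterable[str]) -> List[str]:
        """Drop non-volatile strings (adult terms, established artists, ...)."""
        return [q for q in candidates if q not in self]


atpl = ATPL(["porn", "madonna", "avril lavigne"])
emerging = atpl.filter(["madonna", "shop boyz"])  # ["shop boyz"]
```

Because the list only grows, a string that has once been globally popular can never again be reported as emerging.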
Algorithm's Flow
Detection Time
Local Threshold
Local Threshold
Manual inspection of the Atlanta data
Correlation Between Billboard and Downloads
Correlation Measurements
• Modified time series correlation
• P2P correlation with the Billboard:
Finding The Optimal Time Shift
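One way to find a time shift is to slide the Billboard series against the P2P series and keep the lag with the highest correlation. The sketch below uses plain Pearson correlation as a stand-in; it is not the modified time-series correlation used in the talk. It assumes both inputs are weekly series where a higher value means more popular, so a Billboard rank would first be converted, e.g. to 101 minus the rank. Function names and the `min_overlap` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_shift(p2p: Sequence[float], billboard: Sequence[float],
               max_shift: int, min_overlap: int = 3) -> Tuple[int, float]:
    """Return (shift, r) maximizing corr(p2p[t], billboard[t + shift])."""
    best = (0, -math.inf)
    for s in range(max_shift + 1):
        m = min(len(p2p), len(billboard) - s)  # length of the overlapping window
        if m < min_overlap:
            break
        r = pearson(p2p[:m], billboard[s:s + m])
        if r > best[1]:
            best = (s, r)
    return best
```

A positive best shift means the P2P signal leads the Billboard chart by that many weeks.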
Prediction Results
• Example: when a song enters the Billboard, will it reach the “top 20”?
• Precision: 89%, Recall: 80%
• On average, songs pass the threshold 2.83 weeks before reaching their top Billboard rank
• More details: Koenigstein, Shavitt, and Zilberman, AdMIRe 2009
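For reference, the two reported metrics can be computed as follows, assuming "predicted" is the set of songs flagged as future top-20 hits and "actual" is the set that really reached the top 20. The function name and the toy song sets are hypothetical, not data from the study.

```python
from typing import AbstractSet, Tuple


def precision_recall(predicted: AbstractSet[str],
                     actual: AbstractSet[str]) -> Tuple[float, float]:
    """Precision and recall of predicted top-20 songs against the true ones."""
    tp = len(predicted & actual)  # flagged songs that did reach the top 20
    precision = tp / len(predicted) if predicted else 0.0  # share of flags that were right
    recall = tp / len(actual) if actual else 0.0           # share of hits that were flagged
    return precision, recall


p, r = precision_recall({"song a", "song b"}, {"song a", "song b", "song c", "song d"})
```

Here both flagged songs are hits (precision 1.0), but only half of the hits were flagged (recall 0.5).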
Summary
• Following activity on the Internet can help us detect trends before they are visible:
  – P2P networks
  – Social networks
  – Blogs
  – Talk-backs
  – Searches
• More at http://www.eng.tau.ac.il/~shavitt