Copyright 2002 Ellis Horowitz
Peer-to-peer file sharing is all about the trading of copyrighted music and videos without paying anything to the authors.
Kazaa: a native Windows application
[Screenshot of the Kazaa client; callouts: query, music category, banner ad, "3 million users online sharing 4 petabytes of data"]
Slide 4
Kazaa Survives by Legal Maneuvering
- March 2001: Kazaa is founded by Niklas Zennström (a Swede) and Janus Friis (a Dane) in a Dutch company called Consumer Empowerment.
- The software is based upon their FastTrack P2P stack, a proprietary algorithm for peer-to-peer communication.
- Kazaa licenses FastTrack to Morpheus and Grokster.
- Oct. 2001: MPAA and RIAA sue Kazaa, Morpheus and Grokster.
- Nov. 2001: Consumer Empowerment is sued in the Netherlands by the Dutch music publishing body, Buma/Stemra. The court orders Kazaa to take steps to prevent its users from violating copyrights or else pay a heavy fine.
- Jan. 2002: Zennström and Friis sell the Kazaa software and website to Sharman Networks, based in Vanuatu, an island nation in the Pacific, but operating out of Australia.
- Feb. 2002: Kazaa cuts off Morpheus clients from FastTrack.
- April 2002: Sharman Networks agrees to let Brilliant Digital bundle its own stealth P2P application, called AltNet, within Kazaa. This network would be switched on remotely, allowing Kazaa users to trade Brilliant Digital content throughout FastTrack.
Slide 5
Morpheus File Sharing Software
[Screenshot of the Morpheus client; callouts: behind a firewall; searches over multiple categories, metadata; a Java application; Search Power; banner ad; shopping, web browser]
Morpheus adopts the Jtella version of Gnutella.
Slide 6
There Are Many Gnutella Clients
See http://gnutella.wego.com/
Slide 7
Gnutella History
- Originally conceived by Justin Frankel, the 21-year-old founder of Nullsoft.
- March 2000: Nullsoft posts Gnutella to the web. A day later, AOL removes Gnutella at the behest of Time Warner.
- The Gnutella protocol:
  - version 0.4: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
  - version 0.6: http://rfc-gnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm
- There are multiple open source implementations at http://sourceforge.net/, including Jtella and Gnucleus.
- Software is released under the GNU Lesser General Public License (LGPL).
- The Gnutella protocol has been widely analyzed.
Slide 8
Gnutella Protocol Messages
Broadcast messages
- Ping: initiating message ("I'm here")
- Query: search pattern and TTL (time-to-live)
Back-propagated messages
- Pong: reply to a ping; contains information about the peer
- Query response: contains information about the computer that has the needed file
Node-to-node messages
- GET: return the requested file
- PUSH: push the file to me
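The message types above share a common header on the wire. The following is a minimal sketch, assuming the 23-byte descriptor header layout of the v0.4 specification (16-byte descriptor ID, payload descriptor, TTL, hops, little-endian payload length). In that specification the query response is called QueryHit, and the file itself is fetched with an HTTP GET rather than a Gnutella descriptor. The function names are hypothetical and not taken from any client.

```python
import struct
import uuid

# Payload descriptor codes from the v0.4 specification.
PING, PONG, PUSH, QUERY, QUERY_HIT = 0x00, 0x01, 0x40, 0x80, 0x81

# 16-byte descriptor ID (GUID), 1-byte payload descriptor, 1-byte TTL,
# 1-byte hops, 4-byte little-endian payload length: 23 bytes, no padding.
HEADER = struct.Struct("<16sBBBI")


def pack_header(msg_type, ttl, payload_len, descriptor_id=None, hops=0):
    """Build a descriptor header; a random GUID is used if none is given."""
    if descriptor_id is None:
        descriptor_id = uuid.uuid4().bytes
    return HEADER.pack(descriptor_id, msg_type, ttl, hops, payload_len)


def unpack_header(data):
    """Parse the first 23 bytes of a message into a dict of header fields."""
    descriptor_id, msg_type, ttl, hops, payload_len = HEADER.unpack(
        data[:HEADER.size]
    )
    return {"id": descriptor_id, "type": msg_type, "ttl": ttl,
            "hops": hops, "payload_len": payload_len}


def forward_header(data):
    """Return the header to forward (TTL - 1, hops + 1), or None once TTL is spent."""
    h = unpack_header(data)
    if h["ttl"] <= 1:
        return None  # the message has reached the end of its lifetime
    return pack_header(h["type"], h["ttl"] - 1, h["payload_len"],
                       descriptor_id=h["id"], hops=h["hops"] + 1)
```

The descriptor ID is kept unchanged when forwarding, which is what lets other nodes recognize a message they have already seen.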
Slide 9
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with file A marked]
Steps:
1. Node 2 initiates search for file A
Slide 10
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with query messages labeled A]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
Slide 11
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with query messages labeled A]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
Slide 12
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with query messages labeled A and reply messages labeled A:5 and A:7]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
Slide 13
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with query messages labeled A and reply messages labeled A:5 and A:7]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
Slide 14
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, with reply messages labeled A:5 and A:7]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
Slide 15
Gnutella Search Mechanism
[Diagram: overlay network of nodes 1 to 7, showing the download of A]
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
6. File download
Note: file transfer is not possible when both clients are behind firewalls; if only one client, X, is behind a firewall, Y can request that X push the file to Y.
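The search steps above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the seven-node topology is hypothetical (the slide's actual wiring is not reproduced here), the flood is modeled as a breadth-first traversal, and a single `parent` map stands in for the per-message routing state that real servents keep.

```python
from collections import deque

# Hypothetical overlay: adjacency lists for nodes 1 to 7.
GRAPH = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2, 6], 5: [3], 6: [4, 7], 7: [6]}


def flood_search(graph, holders, origin, ttl=7):
    """Flood one query from `origin`; return {holder: reply path back to origin}."""
    parent = {origin: None}           # node -> neighbor it first heard the query from
    queue = deque([(origin, ttl)])    # (node, hops the query may still travel)
    hits = {}
    while queue:
        node, remaining = queue.popleft()
        if node in holders and node != origin:
            # The reply is back-propagated along the reverse of the query path.
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits[node] = path
        if remaining == 0:
            continue                  # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor not in parent:    # duplicates of the query are dropped
                parent[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


if __name__ == "__main__":
    # Node 2 searches for a file held by nodes 5 and 7.
    print(flood_search(GRAPH, {5, 7}, origin=2))
```

Lowering `ttl` shrinks the search horizon: in this toy graph a TTL of 2 still finds the copy on node 5 but never reaches node 7.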
Slide 16
Other Gnutella Issues
- GUID: short for Globally Unique Identifier, a randomized string that is used to uniquely identify a host or message on the Gnutella network. It lets servents recognize duplicate messages so that they are not sent on again.
- GWebCache: a distributed system for helping servents connect to the Gnutella network, thus solving the "bootstrapping" problem. Servents query any of several hundred GWebCache servers to find the addresses of other servents. GWebCache servers are typically web servers running a special module.
- Host Catcher: Pong responses allow servents to keep track of active Gnutella hosts.
- On most servents, the default port for Gnutella is 6346.
Slide 17
Network Growth Statistics
Growth factors:
- DSL and cable modem nodes grew substantially
- Multiple client implementations became available
There was significant growth in the Gnutella network in 2001:
- 5,000 nodes in February 2001
- 10,000 nodes on March 19, 2001
- 20,000 nodes on May 12, 2001
- 40,000 nodes on May 29, 2001
Statistics due to Matei Ripeanu; see http://people.cs.uchicago.edu/~matei/PAPERS/gnutella-rc.pdf
Slide 18
LimeWire Count of Gnutella Hosts in 2002
[Graph: the green graph represents unique hosts]
Slide 19
Growth Invariants (1): Average Node Connectivity
- 3.4 links per node on average
[Graph due to Matei Ripeanu]
Slide 20
Growth Invariants (2): Network Diameter
- Node-to-node distance maintains a similar distribution
- Average node-to-node distance grew 25% while the network grew 50 times over 6 months
[Graph due to Matei Ripeanu]
Slide 21
Is Gnutella a Power-Law Network?
[Graph: November 2000; due to Matei Ripeanu]
- Power-law networks: the number of links per node follows a power-law distribution
- Examples: the Internet, in/out links to/from HTML pages, citation network, US power grid
- Implications: high tolerance to random node failure, but low reliability when facing an intelligent adversary
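One rough way to check whether a degree distribution looks like a power law is to fit a straight line to it on log-log axes. The sketch below is an illustration only, not the method used for the graph on this slide. It assumes a hypothetical input `degree_counts` that maps each node degree to the number of nodes with that degree, and uses an ordinary least-squares fit, which is a crude estimator for real data.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate a in count ~ degree**(-a) by least squares on log-log axes."""
    points = [(math.log(k), math.log(c))
              for k, c in degree_counts.items() if k > 0 and c > 0]
    if len(points) < 2:
        raise ValueError("need at least two distinct degrees with nonzero counts")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The fitted slope is -a, so negate it to report a positive exponent.
    return -sxy / sxx


if __name__ == "__main__":
    # Synthetic data that follows an exact power law with exponent 2.3.
    synthetic = {k: 1000.0 * k ** -2.3 for k in range(1, 50)}
    print(fit_power_law_exponent(synthetic))
```

A clean straight line on log-log axes is consistent with a power law; visible curvature suggests some other distribution.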
Slide 22
Total Generated Traffic
- Ripeanu has determined that Gnutella traffic totals 1 Gbps (or 330 TB/month)!
  - Compare to 15,000 TB/month in the US Internet backbone (Dec. 2000)
  - This estimate excludes actual file transfers
- Reasoning:
  - QUERY and PING messages are flooded; they form more than 90% of generated traffic
  - The predominant TTL is 7
  - More than 95% of nodes are less than 7 hops away
  - Measured traffic at each link is about 6 kbps
  - The network has 50k nodes and 170k links
Statistics due to Matei Ripeanu
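The totals on this slide follow from the per-link figures by simple arithmetic. A minimal sketch, assuming a 30-day month and decimal units (1 TB = 10^12 bytes); the constant and function names are illustrative.

```python
LINKS = 170_000               # overlay links, from the slide
NODES = 50_000                # overlay nodes, from the slide
PER_LINK_BPS = 6_000          # about 6 kbps measured per link
BACKBONE_TB_PER_MONTH = 15_000
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month


def total_traffic_bps(links=LINKS, per_link_bps=PER_LINK_BPS):
    """Aggregate traffic over all links, in bits per second."""
    return links * per_link_bps


def monthly_volume_tb(bps, seconds=SECONDS_PER_MONTH):
    """Convert a sustained bit rate into terabytes transferred per month."""
    return bps / 8 * seconds / 1e12


if __name__ == "__main__":
    bps = total_traffic_bps()
    tb = monthly_volume_tb(bps)
    print(f"{bps / 1e9:.2f} Gbps, {tb:.0f} TB/month")
    print(f"{tb / BACKBONE_TB_PER_MONTH:.1%} of the backbone figure")
    print(f"{LINKS / NODES:.1f} links per node")
```

With these inputs the total comes to about 1.02 Gbps, or roughly 330 TB per month, which is a little over 2% of the quoted backbone volume. The ratio of links to nodes also reproduces the 3.4 links per node reported earlier.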
Slide 23
Mapping between Gnutella Network and Internet Infrastructure
[Diagram: nodes A to H; perfect mapping]
Slide 24
Mismatch between Gnutella Network and Internet Infrastructure
[Diagram: nodes A to H; inefficient mapping]
Link D-E needs to support six times higher traffic.
Slide 25
Topology Mismatch
The overlay network topology doesn't match the underlying Internet infrastructure topology!
- 40% of all nodes are in the 10 largest Autonomous Systems (AS)
- Only 2-4% of all TCP connections link nodes within the same AS
- Largely random wiring
- Most Gnutella-generated traffic crosses AS borders, making the traffic more expensive
- May cause ISPs to change their pricing scheme
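The "same AS" figure above is a simple ratio over the overlay's links. A minimal sketch of how such a ratio could be computed, assuming a hypothetical list of overlay links and a hypothetical mapping from each node to its AS number; the data below is made up for illustration.

```python
def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints sit in the same AS."""
    if not links:
        raise ValueError("no links to measure")
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)


if __name__ == "__main__":
    # Made-up example: four overlay links between four nodes in three ASes.
    links = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
    as_of = {"a": 1, "b": 1, "c": 2, "d": 3}
    print(intra_as_fraction(links, as_of))   # only a-b stays inside one AS
```

Every link that is not intra-AS carries its flooded traffic across an AS border, which is the cost the slide is pointing at.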
Slide 26
Scalability
- Whenever a node receives a message (ping/query), it sends copies out to all of its other connections.
- Existing mechanisms to reduce traffic:
  - TTL counter
  - Nodes cache information about the messages they have received, so that they don't forward duplicate messages.
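To see why the TTL counter matters, it helps to bound how far one flooded message can spread. The sketch below assumes an idealized overlay in which every node has the same number of connections and no two forwarding paths ever meet, so it gives an upper bound rather than a measurement; the function name is illustrative.

```python
def max_nodes_reached(degree, ttl):
    """Upper bound on nodes reached by one flooded message.

    The originator sends to `degree` neighbors; each receiving node forwards
    to its other `degree - 1` connections until the TTL runs out.
    """
    return sum(degree * (degree - 1) ** (hop - 1) for hop in range(1, ttl + 1))


if __name__ == "__main__":
    for ttl in (1, 3, 5, 7):
        print(ttl, max_nodes_reached(4, ttl))
```

With 4 connections per node the bound grows from 4 nodes at TTL 1 to 4,372 nodes at TTL 7, so each extra hop roughly triples the work done for a single ping or query. In a real overlay many paths do meet, which is where the duplicate cache saves traffic.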
Slide 27
Free Riding on Gnutella
- 70% of Gnutella users share no files
- 90% of users answer no queries
- Those who have files to share may limit the number of connections or the upload speed, resulting in a high download failure rate.
- If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
See Adar and Huberman at http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
Slide 28
Free Riding on Gnutella
- More than 25% of Gnutella clients share no files; 75% share 100 files or fewer
- Conclusion: Gnutella has a high percentage of free riders
* Statistics due to S. Gribble
Slide 29
Anonymity
- Gnutella provides for anonymity by masking the identity of the peer that generated a query.
- However, IP addresses are revealed at various points in its operation: HITS packets include the URL for each file, revealing the IP addresses.
- Clients claim that they have no control, but:
  - they support bootstrapping
  - they may control message flow
  - they may control metadata searches
  - they may control program updates
Slide 30
Query Expressiveness
- Format of query not standardized
- No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it.
  - String literal vs. regular expression
  - Directory name, filename, or file contents
- Malicious users may even return files unrelated to the query
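The lack of standard matching semantics means the same QUERY string can yield different results on different nodes. A minimal sketch with three plausible interpretations; the matcher functions and the file names are hypothetical and are not taken from any particular client.

```python
import re

FILES = ["love_song.mp3", "Love Song (live).ogg", "notes/love.*mp3.txt"]


def match_literal(query, name):
    """Treat the query as a case-insensitive substring."""
    return query.lower() in name.lower()


def match_regex(query, name):
    """Treat the query as a case-insensitive regular expression."""
    return re.search(query, name, re.IGNORECASE) is not None


def match_keywords(query, name):
    """Require every whitespace-separated word of the query to appear."""
    return all(word in name.lower() for word in query.lower().split())


def search(files, query, matcher):
    """Return the files a node would report under the given interpretation."""
    return [name for name in files if matcher(query, name)]


if __name__ == "__main__":
    for matcher in (match_literal, match_regex, match_keywords):
        print(matcher.__name__, search(FILES, "love.*mp3", matcher))
```

For the query `love.*mp3`, a literal-matching node reports only the file whose name contains that exact text, while a regex-matching node also reports `love_song.mp3`. The requester cannot tell which interpretation a given node applied.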
Slide 31
Gnutella Queries
- "The popularity of Gnutella queries and its implications on scalability", Kunwadee Sripanidkulchai; see http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
- Examining over 5 million queries
Slide 32
Security
- Recently, P2P viruses and worms have been constructed.
- The Benjamin virus uses Kazaa to spread itself; see http://www.viruslist.com/eng/viruslist.html?id=49790
- Kazaa now includes virus-checking software that is applied before upload and after download.
- There have been several Gnutella worms: Gnutella.worm, VBS/GWV.a, VBS_GNUTELWORM, VBS.Gnut.A, VBS/Gnu
- A Gnutella worm spreads by making a copy of itself in the Gnutella program directory, then making that directory available for sharing files on the Gnutella network.
Slide 33
Conclusions
- Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work.
- Growth is hindered by the volume of generated traffic and inefficient resource use.
- Since there is no central authority, the open source community must commit to making any changes.
- Changes have been suggested in:
  - Peer-to-Peer Architecture Case Study: Gnutella Network, by Matei Ripeanu
  - Improving Gnutella Protocol: Protocol Analysis and Research Proposals, by Igor Ivkovic
Slide 34
Legal Questions
- Do US courts have jurisdiction over P2P companies?
- Do P2P companies really contribute to copyright infringement? (Cite: the Sony Betamax case.)
- Do P2P companies affect file sharing? If Kazaa, Grokster and Morpheus are stopped, will that stop file sharing or copyright infringement?
Slide 35
Some References
[1] Eytan Adar and Bernardo A. Huberman, Free Riding on Gnutella. http://www.firstmonday.dk/issues/issue5_10/adar/
[2] Igor Ivkovic, Improving Gnutella Protocol: Protocol Analysis and Research Proposals. http://www9.limewire.com/download/ivkovic_paper.pdf
[3] Jordan Ritter, Why Gnutella Can't Scale. No, Really. http://www.monkey.org/~dugsong/mirror/gnutella.html
[4] Matei Ripeanu, Peer-to-Peer Architecture Case Study: Gnutella Network. http://www.cs.uchicago.edu/%7Ematei/PAPERS/gnutella-rc.pdf
[5] The Gnutella Protocol Specification v0.4. http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf