Performance Issues in P2P File Sharing Systems
Krishna Kant, Ravi Iyer, Vijay Tewari
Intel Corporation, www.intel.com/labs
(With contributions from Peter King, Heriot-Watt Univ.)

April 14, 2002

Kant, Iyer & Tewari, Performance Issues in P2P file-sharing systems

Outline

Part I: P2P Computing
• Overview of P2P applications
• Overview of distributed computing frameworks
• P2P services & their requirements
• New research issues introduced by P2P

Part II: Performance Study
• Issues in network modeling
• P2P file sharing issues
• Introduce a tool and some sample results
• Additional issues to investigate

P2P Beginnings
• Interest kindled by distributed file-sharing applications.
• Napster: Mediated digital music swapping. (http://www.napster.com)

[Figure: Napster lookup. (1) Peer A asks the mediator “Where is X?”; (2) the mediator answers “Peer B has it”; (3) Peer A copies X directly from Peer B.]
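The three steps in the Napster figure can be sketched as a tiny in-process simulation. This is only an illustration of mediated lookup followed by a direct peer-to-peer copy; the `Mediator` and `Peer` classes and their method names are hypothetical and do not reflect Napster's actual protocol.

```python
class Mediator:
    """Central index: knows who has which file, but stores no file content."""

    def __init__(self):
        self.index = {}  # file name -> set of peer ids holding it

    def register(self, peer_id, file_names):
        for name in file_names:
            self.index.setdefault(name, set()).add(peer_id)

    def lookup(self, name):
        return sorted(self.index.get(name, ()))


class Peer:
    def __init__(self, peer_id, files):
        self.peer_id = peer_id
        self.files = dict(files)  # file name -> content

    def fetch(self, mediator, peers, name):
        """Return the id of the peer the file was copied from, or None."""
        # Steps 1-2: ask the mediator where the file is.
        holders = [p for p in mediator.lookup(name) if p != self.peer_id]
        if not holders:
            return None
        # Step 3: copy directly from the other peer, bypassing the mediator.
        source = peers[holders[0]]
        self.files[name] = source.files[name]
        mediator.register(self.peer_id, [name])  # we can now serve it too
        return source.peer_id
```

Note that the mediator only answers the "where" question; the transfer itself never passes through it.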

P2P Beginnings
• Gnutella: Fully distributed file sharing. (http://gnutella.wego.com)
• Freenet: Distributed file sharing with anonymity and key-based search. (http://freenet.sourceforge.net)

[Figure: Distributed search among Peers A, B, C and D. The query “Where is File (Key) X?” is forwarded from peer to peer (steps 1 and 2); Peer C answers “I have it” and the answer is relayed back (steps 3 and 4); the requester then issues “GET File (Key) X (HTTP)” (step 5) and receives File X (step 6).]
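The forwarded query in the figure can be illustrated with a hop-limited flood over an overlay graph. This is a minimal sketch, not the Gnutella wire protocol: the function name, the `max_hops` limit, and the example topology are assumptions made for illustration.

```python
def flood_search(adj, files, origin, key, max_hops):
    """Flood a query hop by hop; return (holder, path) or None.

    adj:   dict mapping each peer to a list of its overlay neighbors
    files: dict mapping each peer to the set of keys it stores
    The returned path is the route the "I have it" reply would be
    relayed back along, in reverse.
    """
    seen = {origin}                    # peers that already saw the query
    frontier = [(origin, [origin])]
    for _ in range(max_hops):
        next_frontier = []
        for peer, path in frontier:
            for neighbor in adj[peer]:
                if neighbor in seen:   # drop duplicate copies of the query
                    continue
                seen.add(neighbor)
                new_path = path + [neighbor]
                if key in files.get(neighbor, ()):
                    return neighbor, new_path
                next_frontier.append((neighbor, new_path))
        frontier = next_frontier
    return None


# Hypothetical overlay loosely following the figure: A knows B and D, B knows C.
overlay = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
stored = {"C": {"X"}}
```

With this overlay, a search from A with `max_hops=2` finds the file at C via B; the download itself would then go directly from C to A, as in steps 5 and 6 of the figure.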

We had them already!
• Using idle CPU cycles on home PCs, e.g., SETI@home
  - Involves scanning of radio telescope images for extraterrestrial life.
  - Chunks of data downloaded by home PCs, processed and results returned to the coordinator.
  - Similar schemes used for other heavy-duty computational problems.
• Idle disk and main memory on workstations exploited in a number of network of workstations (NOW) projects.

[Figure: A master sends raw data to Peers 1 to 4; each peer does the data crunching and returns processed data to the master.]

Newer Applications
P2P streaming media distribution
• CenterSpan (C-Star Multisource Peer Streaming)
  - Mediated, secure P2P platform for distributing digital content.
  - Partition content and encrypt each segment.
  - Distribute segments amongst peers. Redundant distribution for reliability.
  - Download segments from local cache, peers or seed servers.
  - http://www.centerspan.com
• vTrails
  - vtCaster: At stream source. Creates network topology tree based on end users (vtPass client software).
  - Dynamically optimizes tree.
  - Content distributed in a tiered manner.
  - http://www.vtrails.com

Newer Applications
P2P Collaboration Networks
• A variety of applications: telemedicine, military planning, video-conferencing, document editing.
• A group of peers discover one another and form an ad hoc network.
• Peers set up communication channels & distribute objects.
• Peers do arbitrary real-time computation, perhaps involving multiparty synchronization.
• Example: Groove (http://www.groove.net)
  - Real-time, small-group interaction and collaboration.
  - Fundamental notion around a “shared space”.
  - Each member of the group owns a copy of the “shared space”.
  - Changes made to the “shared space” by one user are propagated to all others (store and forward if some member is offline).
  - Secure platform (PKI for authentication, end-to-end encryption, digitally signed components).

So, what is P2P?
• Hype: A new paradigm that can
  - Unlock vast idle computing power of the Internet, and
  - Provide unlimited performance scaling.
• Skeptic’s view: Nothing new, just distributed computing “re-discovered” or made fashionable.
• Reality: Distributed computing on a large scale
  - No longer limited to a single LAN or a single domain.
  - Autonomous nodes, no controlling/managing authority.
  - Heterogeneous nodes intermittently connected via links of varying speed and reliability.
• A tentative definition:
  - An uncoordinated dynamic network (peers can come & go as they please).
  - No central controlling or managing authority.
  - A node can act both as a “client” and as a “server”.

P2P Platforms
• Legion, University of Virginia, now owned by “Avaki” Corp.
• Globe, Vrije Univ., Netherlands
• Globus, developed by a consortium including Argonne Natl. Lab and USC’s Information Sciences Institute.
• JXTA, open source P2P effort started by Sun Microsystems.
• .NET by Microsoft Corp.
• WebOS, University of Washington
• Magi, Endeavors Technology
• Groove Networks
• PAST, OceanStore (persistent storage), CAN (content addressable network), CHORD (P2P lookup service)
• Several others not mentioned here.

Avaki (Legion)
• Objective: Wide-area O/S functionality via distributed objects.
• Middleware infrastructure for distributed resource sharing in a mutually distrustful environment.
• Global O/S services built on top of local O/S.

*Source: Peer-to-Peer Computing by David Barkai (Intel Press)

Avaki (Legion)
• Naming: LOID (location-independent object ID), current object address & object name.
• Persistent object space: generalization of file system (manages files, classes, hosts, etc.)
• Communication: RPC-like, except that the results can be forwarded to the real consumer directly.
• Security: RSA keys are a part of LOIDs; encryption, authentication and digesting provided.
• Local autonomy: Objects call local O/S services for all management, protection and scheduling.
• Active objects: objects represent both processes and methods.
• Overall: A comprehensive WAN O/S for distributed computing. Not targeted as a general P2P enabler.

Globe
• Objective: Another model for WAN O/S.
• Distributed passive object model. Processes are separate entities that bind to objects.
• Each object consists of 4 subobjects:
  - Semantics subobject for functionality.
  - Communication subobject for inter-object communication.
  - Replication subobject for replica handling including consistency maintenance.
  - Control subobject for control flow within the object.
• Binding to object includes two steps:
  - Name & location lookup and contact address creation.
  - Selecting an implementation of the interface.
• Overall: Similar to Legion, except that processes and objects are not tightly integrated.

Globus
• Objective: Grid computing, integration of existing services.
• Defines a collection of services, e.g.,
  - Service discovery protocol
  - Resource location & availability protocol
  - Resource replication service
  - Performance monitoring service
• Any service can be defined and becomes a part of the “system”.
• Higher level services can be built on top of basic ones.
• Preserves site autonomy. Existing legacy services can be offered unaltered.
• Overall:
  - Provides excellent reusability of existing services.
  - Unconstrained toolbox approach => difficult to join two “islands”.

JXTA
• Objective: A low-level framework to support P2P applications:
  - Avoids any reference to specific policies or usage models.
  - Not targeted for any specific language, O/S, runtime environment, or networking model.
  - All exchanges are XML based.
• Base concepts:
  - Peers & peer groups: An arbitrary grouping of peers; group members share resources & services.
  - Pipes: Unidirectional, asynchronous communication channels. A peer can dynamically connect/disconnect to any existing pipe within the peer group.
  - Advertisements: A “properties” record needed for name resolution, availability, etc. Specified as an XML document.
  - Messages: Arbitrary sized w/ source and destination addresses in URI form.
• At the highest abstraction defines a set of protocols using the base concepts:
  - Peer Discovery Protocol: Discovery of peers, resources, peer groups etc.
  - Peer Resolver Protocol
  - Peer Information Protocol
  - Peer Membership Protocol
  - Pipe Binding Protocol
  - Peer Endpoint Protocol

JXTA

Source: White Paper on Project JXTA: A Technology Overview by Li Gong

Microsoft .NET in the context of P2P
• Objective: An enabler of general XML/SOAP based web services.
• Message transfer via SOAP (Simple Object Access Protocol) over HTTP.
• Kerberos based user authentication.
• Extensive class library.
• Emphasizes global user authentication via Passport service (user distinct from the device being used).
• Hailstorm supports personal services which can be accessed via SOAP from any entity.

MAGI
• Enabler for collaborative business applications.

*Source: Peer-to-Peer Computing by David Barkai (Intel Press)

Magi
• Magi: Micro-Apache Generic Interface, an extension of the Apache project.
• Superset of HTTP using
  - WebDAV: Web Distributed Authoring & Versioning protocol, which provides locking services, discovery & assignment services, etc. for web documents.
  - SWAP (Simple Workflow Access Protocol), which supports interaction between running services (e.g., notification, monitoring, remote stop/synchronization, etc.)
• Intended for servers; client interface is HTTP.

WebOS
• Objective: WAN O/S that can dynamically push functionality to various nodes depending on loading.
• Outgrowth of the Berkeley NOW (network of workstations) project.
• Consists of a number of components:
  - Global naming: Mapping a service to multiple nodes, load balancing & failover.
  - Wide-area file system (with transparent caching and cache coherency).
  - Security & authentication w/ fine-grain capability control.
  - Process control: Support for remote process execution.
• Project no longer active; parts of it are being used elsewhere.
• Overall: Dynamic configurability useful for P2P environment.

Groove
• Groove (http://www.groove.net)
• Real-time, small-group interaction and collaboration.
• Fundamental notion around a “shared space”
  - Each member of the group owns a copy of the “shared space”.
  - Changes made to the “shared space” by one member are propagated to each member of the group (store and forward if some member is offline).
• Platform is secure.
  - PKI for user authentication.
  - End-to-end encryption.
  - Groove components are digitally signed.
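The replicated “shared space” with store-and-forward delivery can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Groove's API: the `SharedSpace` class and its methods are hypothetical, all replicas live in one process, and security (PKI, encryption, signing) is left out entirely.

```python
class SharedSpace:
    """Every member holds a full copy; changes queue up for offline members."""

    def __init__(self, members):
        self.copies = {m: {} for m in members}    # per-member replica
        self.online = {m: True for m in members}
        self.pending = {m: [] for m in members}   # store-and-forward queues

    def set_online(self, member, online):
        self.online[member] = online
        if online:
            # Forward everything that was stored while the member was away.
            for key, value in self.pending[member]:
                self.copies[member][key] = value
            self.pending[member].clear()

    def change(self, author, key, value):
        self.copies[author][key] = value
        for member in self.copies:
            if member == author:
                continue
            if self.online[member]:
                self.copies[member][key] = value             # propagate now
            else:
                self.pending[member].append((key, value))    # store for later
```

A member that was offline during a change sees it only after `set_online(member, True)` replays its queue, which is the store-and-forward behavior the slide describes.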

Requirements for P2P Applications
• Local autonomy: No control or management by a central authority.
• Scalability: Support collaboration of an arbitrarily large number of nodes.
• Security & Privacy: All accesses are authenticated and authorized.
• Fault Tolerance: Assured progress with up to k failures anywhere.
• Interoperability: Any peer that follows the protocol can participate irrespective of platform, OS, etc.
• Responsiveness: Satisfy the latency expectations of the application.
• Non-imposing: Allows the machine's user full resource usage whenever desired without affecting responsiveness.
• Simplicity: Setting up a P2P application or participating in one should require a minimum of manual intervention.
• Auto-optimization: Ability to dynamically reconfigure the application (number of nodes, functionality, etc.)
• Extensibility: Dynamic addition of functionality.

Some P2P Services
• Network services
  - Enable communication directly and via firewalls and in the face of intermittent connectivity.
  - Naming, discovery and membership protocols.
• Data and metadata services
  - Generic mechanism for publishing and obtaining metadata for various resources (devices, CPU, memory, files, etc.)
  - Event and exception management services (publish and subscribe model)
  - Low-level file and storage services
• Security services
  - Key distribution, authentication, encryption.
• Advanced services
  - Digital rights management.
  - Administration, auditing and resource management services.
  - High-level file services akin to a virtual file system.
  - User and group management services.
  - Replication and migration services.

From Services to possible Layers

[Figure: Layered P2P service stack with the layers Communications; Location Independent Services; Identity, Presence, Community; Security; Availability; Administration, Monitoring; Naming, Discovery, Directory; Sharable Resources. Standards and Policies run alongside the layers.]

Callouts on this slide:
• Transport and data protocols for interoperability
  - Common protocols: IP, IPv6, sockets, http, XML, SOAP, ...
  - NAT and firewall solutions
  - Roaming, intermittent connectivity
• Availability from unreliable components
  - Replication
  - Striping
  - Failover
  - Guaranteed message queuing
• Authorization, Integrity, Privacy, Web of trust, Certification, DRM

From Services to possible Layers

[Figure: Layered P2P service stack with the layers Communications; Location Independent Services; Identity, Presence, Community; Security; Availability; Administration, Monitoring; Naming, Discovery, Directory; Sharable Resources. Standards and Policies run alongside the layers.]

Callouts on this slide:
• User / group identity; Authentication; Persistence (beyond a session, across multiple devices)
• Local autonomy; IT allocation of resources; Self administration (reliable whole from unreliable parts); Resource monitoring; Payment tracking
• Name space management; Metadata management; Discovery & location of peers, services, resources, users
• CPU, storage, memory; Bandwidth; I/O devices; Capability discovery

P2P Research Issues
• Communication:
  - Communicating with peers behind NAT devices and firewalls.
  - Naming and addressing peers that do not have DNS entries.
  - Coping with intermittent connectivity & presence (e.g., queued transfers).
• Security and protection:
  - Authentication of users independent of devices.
  - Digital rights management.
  - Access control in a mutually suspicious environment (host machine & resident foreign objects cannot trust one another).
• Topological mapping:
  - P2P network is typically an ad hoc overlay network.
  - Usually a severe mismatch between application communication pattern and physical topology.
  - For planned collaborations, need to reduce this mismatch.

P2P Research Issues
• Unobtrusive use by machine owner
  - A mechanism to measure & control resource usage.
  - Low-latency service handoff protocols to allow machine owner takeover.
  - On-demand task migration w/o breaking the application.
• Information location and retrieval
  - Efficient distributed information location & need-based content migration.
  - Intelligent object retrieval: retrieval by properties rather than URL. Need distributed indexing mechanisms.
  - Directing searches to more promising and less loaded nodes.
  - Intelligent caching of search results.
• Architectural features
  - Efficiently propagate requests & responses w/o much CPU involvement.
  - Squelch duplicate, orphaned or very late responses.
  - Stitch traffic from multiple paths to reduce latency or losses for real-time applications.

Scalability Issues
• Many problems well studied in distributed systems context, but need to be revisited.
• Need scalability to huge number of peers (e.g., 100M):
  - Peer state management for huge number of peers.
  - Discovery and presence management w/ essentially infinite set of potential peers.
  - Certificate management and authentication for huge user base over a varied set of devices.
  - Geographically distributed load balancing.
  - Multiparty synchronization and communication.

Part 2: Performance Study

Goals:

1. Define a performance model including
- Network model
- File storage and access model

2. Introduce a tool and discuss sample results.

P2P Network Characteristics
• Desirable characteristics
  - Adequate representation of ad hoc nature of the network.
  - Expected to contain a few special sites (well-known, content rich, substantial resources, etc.)
  - Heavy-tailed nature of connectivity.
• Other issues
  - Dynamic changes to the network
    - Direct modeling not required if rate of change << request rate.
    - Metadata consistency issues still need to be considered.
  - Mapping of virtual P2P network on physical network
    - P2P applications generally don’t pay attention to mapping.
    - “Virtual links” between P2P neighbors are essentially statistically identical.
    - Better modeling is possible, but difficult to calibrate.

P2P Node & Link Models
• Consider a 3-tier model for nodes
  - tier-1: Well-known, resource-rich, always on & part of network.
    - Similar to traditional server nodes (globally known sites in Gnutella).
    - Henceforth called distinguished nodes.
  - tier-2: “Hub” nodes (reasonably resource rich & mostly on)
    - Contribute storage/files in addition to requesting them.
    - May join/leave the network, but at time-scale >> req-response time.
    - Henceforth called undistinguished nodes.
  - tier-3: Infrequently connected or primarily “client” functionality
    - No need to represent these explicitly in the network.
    - Requests/responses from these appear to originate from tier-1/2 nodes that they home on.
• A very simple link model
  - Physical topology ignored; each “link” treated like a single pipe.
  - => Links uninteresting from topological perspective.

P2P Network Model
• Use a random graph model to represent topology.
  - Traditional G(n,p) RG model too simplistic.
• Use a 2-tier non-uniform model built as follows:
  - Start with a degree-Kd regular graph of Nd distinguished nodes.
  - Add Nu undistinguished nodes sequentially as follows:
    - The new node connects to K other nodes. K: constant or an integer-valued RV in range 1..Kmax.
    - Each connection targets an undistinguished node with prob qu (this may not be possible for the first Kmax nodes).
    - Distinguished node target: uniform distribution over all distinguished nodes.
    - Undistinguished node target: Zipf distribution over existing undistinguished nodes.
    - At most one connection allowed between any pair of nodes.
• The Zipf parameter controls the decay rate of nodal degree.
  - Parameter = 0 => uniform dist => very slow decay. Used here for simplicity.
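The construction steps above can be sketched as a small generator. This is an illustration under several assumptions that the slides do not specify: the regular graph over distinguished nodes is a ring lattice, K is drawn uniformly from 1..Kmax, Zipf rank is the order of arrival, and a connection falls back to a distinguished node when no undistinguished target is available. The function and parameter names are hypothetical.

```python
import random


def build_p2p_graph(n_dist, k_dist, n_undist, k_max, q_undist,
                    alpha=0.0, seed=0):
    """Return {node: set(neighbors)} for the 2-tier random graph.

    Nodes 0..n_dist-1 are distinguished; the rest are undistinguished.
    alpha is the Zipf parameter (0 gives a uniform choice).
    """
    # Sketch-only restrictions: an even k_dist < n_dist makes the ring
    # lattice k_dist-regular; k_max <= n_dist guarantees a fallback target.
    assert k_dist % 2 == 0 and k_dist < n_dist and 1 <= k_max <= n_dist
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_dist)}
    for i in range(n_dist):
        for step in range(1, k_dist // 2 + 1):
            j = (i + step) % n_dist
            adj[i].add(j)
            adj[j].add(i)

    undist = []  # undistinguished nodes in order of arrival
    for new in range(n_dist, n_dist + n_undist):
        adj[new] = set()
        k = rng.randint(1, k_max)  # assumed: K uniform on 1..Kmax
        for _ in range(k):
            pool, weights = [], []
            if rng.random() < q_undist:
                # Zipf over existing undistinguished nodes, ranked by arrival;
                # already-connected nodes are skipped (one link per pair).
                for rank, u in enumerate(undist, start=1):
                    if u not in adj[new]:
                        pool.append(u)
                        weights.append(rank ** -alpha)
            if not pool:
                # Distinguished target, chosen uniformly. Also used as the
                # fallback when no undistinguished node is available.
                pool = [d for d in range(n_dist) if d not in adj[new]]
                weights = [1.0] * len(pool)
            target = rng.choices(pool, weights=weights, k=1)[0]
            adj[new].add(target)
            adj[target].add(new)
        undist.append(new)
    return adj
```

For example, `build_p2p_graph(10, 4, 90, 3, 0.5)` builds a 100-node overlay with 10 distinguished nodes; raising `alpha` above 0 concentrates links on early arrivals and makes the degree distribution decay faster.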

Topological properties
• Some network properties can be derived analytically.
• Outline of analysis (see http://kkant.ccwebhost.com/download.htm)
• Degree distribution:
  - Distinguished nodes at level 0; each new node defines a new level.
  - Pn(l2,l): Prob(level l node has degree n when current level = l2).
  - Get recurrence eqns for Pn(l2,l) & hence its PGF in z for given (l2,l).
  - Get avg degree Dat(l2,l) at level l when current level = l2.
  - Can be adapted for computing the undistinguished degree of a node.
• No of nodes reached in h hops:
  - Rh matrix: Rh(i,j) is prob of reaching level i from level j in exactly h hops.
  - Compute Rh(i,j) by enumerating all unique paths of length h.
  - Compute G(l2,h), avg no of nodes reached in h hops starting from a level l2.
• Request and response traffic at level l node:
  - nreqs = No of requests reaching undist. nodes in h hops = 1 + Σ_h G(l2,h)
  - nresps = 1 + Σ_h h G(l2,h), since a response from h hops away goes through h nodes.
• Nodal utilization & node engineering:
  - Easy to ensure that nodal utilization does not exceed some limits.
• Queuing properties generally intractable; explored via simulation.

Page 33: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results - 100 nodes

undist  no_of  nodes    undist   resps   traf
prob    hops   reached  reached  /node   /node
0.05    1        5.9      3.3      4.9     6.1
0.05    2       55.2     44.5    103.6   146.5
0.05    3       99.1     85.8    235.2   320.5
0.05    4      100       90.0    238.8   328.8
0.05    5      100       90.0    238.8   328.8
0.50    1        5.9      4.3      4.9     8.4
0.50    2       34.3     23.8     61.7    82.3
0.50    3       91.0     73.9    231.7   304.0
0.50    4       99.9     89.4    267.5   356.9
0.50    5      100       89.6    267.7   357.3
0.95    1        5.9      5.3      4.9    10.6
0.95    2       28.6     22.6     50.3    73.6
0.95    3       76.7     63.8    194.6   258.4
0.95    4       98.5     87.4    281.8   369.2
0.95    5       99.7     89.3    287.8   377.2
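The resps/node column for 100 nodes appears consistent with weighting each newly reached node by its hop distance, as stated on the previous slide. The following is a minimal sketch of that check, not the authors' code; the function name `responses_per_query` and the assumption that "nodes reached" includes the originating node are illustrative choices.

```c
/* Responses per query, assuming every node first reached at hop h sends
   one response that passes through h nodes on its way back.
   reached[h-1] is the cumulative number of nodes reached within h hops,
   counting the originating node itself (an assumption of this sketch). */
double responses_per_query(const double *reached, int max_hops)
{
    double prev = 1.0;   /* the originating node */
    double total = 0.0;
    for (int h = 1; h <= max_hops; h++) {
        double newly = reached[h - 1] - prev;  /* nodes first seen at hop h */
        total += h * newly;
        prev = reached[h - 1];
    }
    return total;
}
```

With the prob = 0.05 rows (5.9, 55.2 and 99.1 nodes reached), this gives about 103.5 for 2 hops and 235.2 for 3 hops, close to the tabulated 103.6 and 235.2.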

Page 34: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results - 500 nodes

undist  no_of  nodes    undist   resps    traf
prob    hops   reached  reached  /node    /node
0.05    1        6.0      3.6       5.0      6.2
0.05    2      243.7    232.7     480.5    711.5
0.05    3      499.7    488.6    1248.4   1737.0
0.05    4      500.0    490.0    1249.6   1739.6
0.50    1        6.0      4.7       5.0      8.5
0.50    2       95.7     84.2     184.3    264.6
0.50    3      483.5    465.1    1347.8   1812.4
0.50    4      500.0    490.0    1413.9   1903.9
0.95    1        6.0      5.8       5.0     10.7
0.95    2       35.1     29.1      63.2     91.7
0.95    3      163.5    137.1     448.3    582.4
0.95    4      405.7    367.7    1417.2   1782.7

Page 35: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Simulation of Random Graphs

- Simulation of a random graph is a hard problem:
  - The model represents a large number of topologies that the actual network might take.
  - Too many instances to simulate explicitly and then average the results.
  - Example: 2 dist & 3 undist nodes, each connects to 2 nodes => 6 distinct topologies.
- Possible approaches to simulation:
  - Average case analysis.
  - Constrained model (limit the number of instances).
  - Direct simulation of probabilistic model.

Page 36: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Average case analysis

- Intended environment
  - To study performance of an "average" network defined by the RG model.
  - No dynamic changes to the topology possible.
- Graph construction
  - Start with the regular graph of distinguished nodes (as usual).
  - For adding undist nodes, work with only the avg connectivities Kd & Ku for an incoming node.
  - Always connect to the existing node with min connectivity.
  - ⌊Kd⌋ & ⌈Kd⌉ can be used successively to handle non-integer Kd values (similarly for Ku).
- Characteristics/issues
  - Simple, only one graph to deal with in simulation.
  - Gives correct avg reachability and nodal utilizations.
  - All queuing metrics (including avg response time) are underestimated.

Page 37: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Constrained Connectivity

- Intended environment
  - To capture the most likely scenarios of connectivity.
  - Accommodate both static and slowly changing topologies.
- Graph construction and simulation
  - For the entering level l2 node, analytically estimate Dat(l2,l) at all l.
  - Allow connection to a level l node only if degree(l) falls in the range (min..max) × Dat(l2,l).
  - Found that min=0.5 and max=1.5 is quite adequate.
  - Generate a limited set (~100) of instances of the graph.
  - During simulation, each query randomly selects one instance.
- Characteristics/issues
  - Avoids highly asymmetric topologies => queuing properties may be underestimated.
  - All generated instances are given equal weight. Relative weights can be estimated, but doing so is very expensive.
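The degree constraint used during graph construction can be sketched as a simple window test. This is an illustration only: the function name `degree_in_window` is hypothetical, and treating the range as inclusive at both ends is an assumption the slides do not settle.

```c
/* Returns 1 when an existing node with the given degree may accept a new
   connection, i.e. its degree lies within [min_mult, max_mult] times the
   analytically estimated average degree for its level.
   Inclusive bounds are an assumption of this sketch. */
int degree_in_window(int degree, double avg_degree,
                     double min_mult, double max_mult)
{
    return degree >= min_mult * avg_degree &&
           degree <= max_mult * avg_degree;
}
```

With the multipliers quoted above (0.5 and 1.5) and an estimated average degree of 6, only nodes of degree 3 through 9 would be eligible targets.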

Page 38: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Probabilistic Graph Emulation

- Intended environment
  - To study overall performance when the topology is defined by the random graph model.
  - Accommodate fast-changing or unstable topologies.
- Method:
  - For each node i, estimate the relative prob qij of having an edge to node j ≠ i.
  - A query coming from node k to node i is sent to node j with prob qij/(1-qik).
  - This virtual topology for the query is used to return responses as well.
- Characteristics/issues
  - Method depends on analytic calculation of edge probabilities to neighbors.
  - A single simulation automatically visits various instances in the correct proportion.
  - No explicit control over which instances are visited => reliable results may take a very long time.
  - Very expensive, and difficult to handle complex operations (e.g., file migration).
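The forwarding rule qij/(1-qik) renormalizes node i's edge probabilities after excluding the node k that the query came from. A minimal sketch of that step follows, assuming the qij for a fixed i sum to 1; the function name `forwarding_probs` and the array layout are hypothetical.

```c
/* Fill fwd[j] with the probability that node i forwards a query to node j,
   given that the query arrived from node k. q[j] holds q_ij for j = 0..n-1,
   with q[i] expected to be 0. Returns 0 when q[k] leaves no probability
   mass for other nodes, 1 otherwise. */
int forwarding_probs(const double *q, int n, int k, double *fwd)
{
    double rest = 1.0 - q[k];
    if (rest <= 0.0)
        return 0;
    for (int j = 0; j < n; j++)
        fwd[j] = (j == k) ? 0.0 : q[j] / rest;  /* never send back to k */
    return 1;
}
```

For example, with q = (0.2, 0, 0.3, 0.5) at node i = 1 and a query arriving from k = 0, the forwarding probabilities to nodes 2 and 3 become 0.375 and 0.625.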

Page 39: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


File Size & access distribution

- Using a 2-segment model:
  - Small sizes: distribution generally irregular; uniform is a reasonable model.
  - Large sizes: Pareto tail with decay rate 1<α<2 is quite reasonable.
- Adopted distribution:
  - Uniform dist in the small-size range 400 bytes to 4 KB.
  - Pareto distribution with a min value of 4 KB and mean of 40 KB => α = 1.11.
  - 40 KB mean is typical for web pages, but too small for MP3 files.
- "File category" provides a link between file size and its "popularity".
  - Needed to model higher access rate of small files.
  - Chose 9 categories (equally spaced in log domain), with boundaries:
    400B, 1.265KB, 4KB, 12.65KB, 40KB, 126.5KB, 400KB, 1.265MB, 4MB, 12.65MB
- File access distribution:
  - Across categories, distribution specified by a discrete mass function:
    (0.07, 0.14, 0.2018, 0.20, 0.14, 0.098, 0.0686, 0.048, 0.0336)
  - This increases linearly first and then decays geometrically w/ factor 0.7.
  - Within each category, assume uniform access distribution.
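The quoted Pareto decay rate follows from the standard mean of an untruncated Pareto distribution. The short derivation below is a worked check under that assumption, with $x_m$ denoting the minimum value (a symbol introduced here, not in the slides).

```latex
E[X] = \frac{\alpha\, x_m}{\alpha - 1}, \qquad x_m = 4\ \text{KB}, \quad E[X] = 40\ \text{KB}
\;\Longrightarrow\; \frac{\alpha}{\alpha - 1} = 10
\;\Longrightarrow\; \alpha = \frac{10}{9} \approx 1.11
```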

Page 40: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


File Copy parameters

- Each search in a P2P network may result in multiple "hits".
  - Need only the dist. of hits; precise modeling of the search mechanism is not needed.
- Use file copies for this:
  - Each file has C copies in the range (1..Cmax) with a given distribution.
  - A file is now identified by the triplet (category, file_no, copy_no), where file_no is a unique id (e.g., sequence no) of files in a category.
- This allows the following capabilities:
  - Unique searches specified by the file-id triplet.
  - Non-unique searches specified by (category, file_no).
  - Replication control and fault-tolerant operation.
- File copy parameters:
  - Distribution may be related to the nature of the file (not considered here).
  - Separate distributions allowed for files allocated to dist & undist nodes.
  - Assuming a triangular distribution with Cmax = 20 and mode Cmode = 5 for all nodes => mean no of copies = 8.667.
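The mean number of copies matches the mean of a continuous triangular distribution with minimum 1, maximum Cmax and mode Cmode. The check below uses that continuous formula as an approximation; the tool's integer-valued variant may differ slightly.

```latex
E[C] = \frac{C_{\min} + C_{\max} + C_{\text{mode}}}{3}
     = \frac{1 + 20 + 5}{3} = \frac{26}{3} \approx 8.667
```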

Page 41: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


File Assignment to Nodes

- Assignment of copies to nodes:
  - Assign copies at a fixed distance so as to distribute them evenly across the network.
  - Apply an offset for each round of copy assignment to avoid bunching up.
  - Do not assign more than one copy of a file to a node.
- Algorithm:

    loop over all files {
        n_copies = triangular_rv(1, Cmax, Cmode);      // Generate random no of copies
        if (n_copies > n_nodes) n_copies = n_nodes;    // Don't allow more copies than nodes
        distance = n_nodes / n_copies;                 // Distance for copy allocation
        offset = 1 + n_nodes / no_files;               // If too few files, get an offset to avoid bunching
        tot_offset = (tot_offset + offset) % n_nodes;
        node_no = tot_offset;                          // Node for the assignment of first copy
        wraps = 0;                                     // Wrap-around count (per-file reset assumed; not shown on the slide)
        for (copy_no = 0; copy_no < n_copies; copy_no++) {
            assign_file(node_no, file_no, size);
            node_no = (node_no + distance) % n_nodes;  // Next node for assignment
            if (copy_no < n_copies - 1 && node_no == (tot_offset + wraps) % n_nodes) {
                node_no = (node_no + 1) % n_nodes;     // Back at the starting node: shift by one
                wraps++;
            }
        } // loop over copies
    } // loop over files
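The assignment loop can be exercised in isolation. The sketch below is a self-contained C rendering under stated assumptions: the number of copies per file is passed in (at least 1 each) rather than drawn from triangular_rv, `wraps` is reset for every file, and the `holds` table, `assign_all` and the size limits are hypothetical names added for illustration.

```c
#include <string.h>

#define MAX_NODES 64
#define MAX_FILES 64

/* holds[n][f] is 1 when node n stores a copy of file f. */
static unsigned char holds[MAX_NODES][MAX_FILES];

/* Place one copy; returns 0 if the node already held this file. */
static int assign_file(int node_no, int file_no)
{
    if (holds[node_no][file_no])
        return 0;
    holds[node_no][file_no] = 1;
    return 1;
}

/* Spread copies[f] copies of each file f over n_nodes nodes.
   Requires copies[f] >= 1, n_nodes <= MAX_NODES, no_files <= MAX_FILES.
   Returns the number of distinct (node, file) placements made. */
int assign_all(int n_nodes, int no_files, const int *copies)
{
    int placed = 0, tot_offset = 0;
    memset(holds, 0, sizeof holds);
    for (int file_no = 0; file_no < no_files; file_no++) {
        int n_copies = copies[file_no];
        if (n_copies > n_nodes)
            n_copies = n_nodes;                 /* no more copies than nodes */
        int distance = n_nodes / n_copies;      /* spacing between copies */
        int offset = 1 + n_nodes / no_files;    /* per-file shift to avoid bunching */
        tot_offset = (tot_offset + offset) % n_nodes;
        int node_no = tot_offset;
        int wraps = 0;
        for (int copy_no = 0; copy_no < n_copies; copy_no++) {
            placed += assign_file(node_no, file_no);
            node_no = (node_no + distance) % n_nodes;
            if (copy_no < n_copies - 1 &&
                node_no == (tot_offset + wraps) % n_nodes) {
                node_no = (node_no + 1) % n_nodes;
                wraps++;
            }
        }
    }
    return placed;
}
```

If the return value equals the sum of the (capped) copy counts, no node received two copies of the same file. For 10 nodes and 3 files with 3, 10 and 20 requested copies, the capped total is 3 + 10 + 10 = 23.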

Page 42: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Query Characteristics

- Assumptions:
  - No queries (searches) are started from distinguished nodes, since these nodes are essentially "servers".
  - Identical query arrival process at each undistinguished node.
- Arrival process model:
  - An on-off process with identical Pareto distribution for on & off periods: $P(X>x) = (x/T)^{-\alpha}$ for x > T.
  - Assume T=12 secs and α=1.4, which gives E(X)=30 secs.
  - Const inter-arrival time of 4 secs during the on-period, no traffic during the off-period.
  - Total traffic at a node is the superposition of arrivals from all reachable nodes.
  - Approx. a self-similar process with Hurst parameter H=(3-α)/2 = 0.8 when the no of reachable nodes is large.
- Query properties:
  - Each query specifies a file (category, file_no) w/ given access characteristics.
  - Shown results do not specify copy_no => multiple hits possible for each query.
  - Query percolates for h "hops" (h=3 can cover 90% of nodes for the chosen graph).
  - If a query arrives at a node more than once, it is not propagated again.
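Hop-limited query percolation with duplicate suppression can be sketched as a breadth-first flood. This is a simplified illustration, not FSST code: it ignores queuing delays (so the first copy of a query always arrives over a shortest path), and `flood_reach`, `make_ring` and `MAXN` are hypothetical names.

```c
#define MAXN 16

/* Build an n-node ring (n >= 3) in adjacency-matrix form for testing. */
void make_ring(int n, unsigned char adj[MAXN][MAXN])
{
    for (int i = 0; i < MAXN; i++)
        for (int j = 0; j < MAXN; j++)
            adj[i][j] = 0;
    for (int i = 0; i < n; i++) {
        int next = (i + 1) % n;
        adj[i][next] = 1;
        adj[next][i] = 1;
    }
}

/* Number of nodes (excluding src) that receive a query flooded from src
   with a limit of max_hops. A node forwards only the first copy it sees. */
int flood_reach(int n, unsigned char adj[MAXN][MAXN], int src, int max_hops)
{
    int hop_of[MAXN];          /* hop count at first arrival, -1 = not seen */
    int queue[MAXN];
    int head = 0, tail = 0, reached = 0;

    for (int i = 0; i < n; i++)
        hop_of[i] = -1;
    hop_of[src] = 0;
    queue[tail++] = src;

    while (head < tail) {
        int u = queue[head++];
        if (hop_of[u] == max_hops)
            continue;          /* hop budget used up: do not forward */
        for (int v = 0; v < n; v++) {
            if (adj[u][v] && hop_of[v] < 0) {
                hop_of[v] = hop_of[u] + 1;
                queue[tail++] = v;
                reached++;
            }
        }
    }
    return reached;
}
```

On a 6-node ring, a query from node 0 reaches 2, 4 and 5 other nodes for hop limits of 1, 2 and 3 respectively.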

Page 43: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


File Retrieval

- Query response:
  - A query reaching a node generates a found/not found response, which travels backwards along the search path.
  - Querying node runs a timer Tu; all responses after the timeout are ignored.
  - Currently no concept of retrying the timed-out requests.
  - Requests and responses may be culled if response time exceeds a limit.
- Distribution of Tu: triangular in the range (3, 14) secs with mean 8.0 secs.
- File retrieval:
  - Randomly choose one of the positively responding nodes for file retrieval.
  - Requested file(s) are obtained directly (i.e., do not follow the response path).
  - Retrieved file may be optionally cached at the requesting node.
- File cache flushing:
  - Used as an indirect modeling of dynamic changes in tier-3 nodes.
  - A cache flush represents a tier-3 user disconnecting and being replaced by another statistically identical tier-3 node.
  - No of cycles before cache flushing: Zipf with min=30, max=120 and α=1.0.

Page 44: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Service time modeling

- Node service
  - Each query & response needs service at each node visited.
  - File transfer needs service on both ends & has two parts:
    - A basic service time (indep. of file size, given by a distribution).
    - A file-size dependent component.
  - Each node implements 3 priority levels for efficient processing. Low: queries, Medium: file transfers, High: response processing.
  - Overall queue size constrained to avoid long queuing delays.
- Link service
  - Link service time also has two components:
    - A basic service time (indep. of transfer size, given by a distribution).
    - Size-dependent part determined from link bit rate.
  - Link bit rate taken as 3 KB/sec (an estimate of real-life rate on the Internet).
  - Links are pure delay servers (assuming P2P traffic << total traffic).
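Both node and link service times are described as a size-independent base plus a size-dependent part. A minimal sketch of the two formulas follows; the function names are hypothetical, and in the tool the base component is drawn from a distribution rather than passed in as a fixed value.

```c
/* Link service time in seconds: size-independent base plus bytes / rate. */
double link_service_time(double base_secs, double bytes, double bytes_per_sec)
{
    return base_secs + bytes / bytes_per_sec;
}

/* Node service time for a file transfer in seconds: size-independent base
   plus a per-byte processing cost. */
double node_transfer_time(double base_secs, double bytes, double secs_per_byte)
{
    return base_secs + bytes * secs_per_byte;
}
```

As an illustration, a 40 KB file over a 3 KB/sec link with a 0.015 sec base takes about 13.35 secs, so at this link rate the size-dependent part dominates.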

Page 45: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


P2P Simulation Tool (FSST)

- Developed a file sharing simulation tool (FSST) with the following functionality:
  - Generation of random graph instances w/ constrained degree.
  - Simultaneous simulation of multiple graphs.
  - Flexible specification of various network & file parameters.
  - Unique & non-unique file searches.
  - Optional culling of requests & responses.
  - Queuing and service at nodes and links.
  - File transfers, file caching, and cache flushing.
- Features currently unavailable:
  - Automatic propagation of files through the network.
  - Explicit modeling of user retry behavior.
  - Dynamic changes to the network.
  - Mapping between P2P network and physical network.
- Tool specifics:
  - Written in C/C++.
  - Uses Sim++ package as simulation engine.
  - Input interface common w/ Geist (demonstrated at this conf.).

Page 46: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample input file

num_graphs = 100;                          # Number of graphs simulated
max_deg_mult = 1.5; min_deg_mult = 0.5;    # multipliers to get min & max degrees
num_d_nodes = 10; num_u_nodes = 90;        # No of dist/undist nodes
num_d_edges = 2; num_u_edges = 4;          # Initial no of edges for dist/undist node
undist_node_prob = 0.50;                   # Prob of connecting to a undist node
num_hops = 3;                              # number of hops each message
n_categories = 10;                         # Total no of size categories
category_boundary = {400, 1265, 4000, 1.265e4, 4.0e4, 1.265e5, 4.0e5, 1.265e6, 4.0e6, 1.265e7};
category_prob = {0.07, 0.14, 0.2018, 0.20, 0.14, 0.098, 0.0686, 0.048, 0.0336, 0.0};   # Relative prob of each category bucket.
d_file_size = {400, 4000, 1.265e7, 4.0e4, 0.0, 0.0, 0.0};   # Distinguished file size parms
                                           # min_unif, max_unif, max, mean, unif_prob, alpha, beta
u_file_size = d_file_size;                 # Undist file size parms
d_copies_parms = {Triangle_int, 1, 20, 5, 0};   # number of file copies at dist. nodes
u_copies_parms = {Triangle_int, 1, 20, 5, 0};   # No of file copies at undist nodes
num_files = {500, 1000};                   # No of files at dist/undist nodes
filestore_size = {2.0e8, 3.2e7};           # File cache size at dist/undist nodes
queue_depth = {50, 50};                    # Max queue length allowed
max_cached_file_size = 80000;              # Max file size that is cached

Page 47: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample input file (contd)

srch_stime_parms = {Exponential, 0.010, 0.1, 0.015, 0};       # CPU time for searching and search propagation (no local hit)
local_srch_stime = {Exponential, 0.002, 0.050, 0.00225, 0};   # CPU time for search in local cache (local hit)
rel_cpu_speed = {1.0, 1.0};                                   # CPU speeds of dist/undist nodes
link_bandwidth = 3.0e3;                                       # Link BW in bytes/sec
link_stime_parms = {Exponential, 0.01, 0.20, 0.015, 0};       # Link service time
search_priority = low; response_priority = high;              # Rel. priorities of query & resp.
get_priority = medium; put_priority = medium;                 # Rel. priority of file gets & puts
put_stime_parms = {Exponential, 0.003, 0.1, 0.005, 0};        # CPU time for file put
per_byte_proc_time = 15e-7;                                   # time for processing files
resp_stime_parms = {Exponential, 0.002, 0.1, 0.004, 0};       # resp proc CPU time
int_arrival_time = 4;                                         # Inter-arrival time during on period
on_period_parms = {Pareto, 12, 1200, 30, 0};                  # On period for req. arrivals
num_user_on_cycles = {Zipf, 30, 120, 0, 1};                   # num cycles before a cache flush
timer_threshold = {Triangle, 3, 14, 7, 0};                    # Elapsed time for link traversal
simulation_warmup_time = 30000;
simulation_run_time = 120000;

Page 48: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results from FSST (1)

Node Utilization and Queue Lengths as a function of #hops

                                 Hops = 1   Hops = 2   Hops = 3   Hops = 4
Node Utilization (Dist Node)       8.0%      44.3%      86.0%      96.1%
Node Utilization (Other Nodes)     5.2%      25.9%      50.6%      58.9%
Queue Length (Dist Nodes)          1.013      2.357     14.581     30.134
Queue Length (Other Nodes)         1.048      2.285      5.179      6.972

Reachability and Response Rate

                                 Hops = 1   Hops = 2   Hops = 3   Hops = 4
Num Responses Per Request          6.58      51.16      84.17      80.70
% Unexpired Responses             99.39%     98.93%     99.27%     99.42%
% Expired Responses                0.61%      1.07%      0.73%      0.58%
Num Dropped Msgs Per Request       0.00       0.37       7.30      27.59

[Chart: % Successful Searches as #Hops Increase. x-axis: Number of Hops (1 to 4); y-axis: % Successful Requests (0% to 60%); series: % Successful Requests, % Requests served locally, % Requests served remotely.]

Observations:
- Node utilization is significant at hops >= 3.
- % successful requests saturates beyond 3 hops due to increased queuing and dropped messages.
- Local cache hit rate changes minimally as a function of the number of hops.

Page 49: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results from FSST (2)

Impact of the Caching Option Selected

                                 Cache All   Cache < 40K   No Caching
Node Utilization (Dist Node)       86.0%       89.8%         92.8%
Node Utilization (Other Nodes)     50.6%       49.9%         52.2%
Queue Length (Dist Nodes)          14.581      18.15         21.226
Queue Length (Other Nodes)          5.179       3.952         4.161
Num Responses Per Request          84.17       89.27         93.93
% Unexpired Responses              99.27%     100.00%       100.00%
% Expired Responses                 0.73%       0.00%         0.00%
Num Dropped Msgs Per Request        7.30        4.42          5.43

[Chart: Impact of File Caching. x-axis: Caching Option (Cache All, Cache < 40K, No Caching); y-axis: % Successful Searches (0% to 70%); series: % Requests served remotely, % Requests served locally.]

Observations:
- Node utilization and queue length at the distinguished nodes increase moderately as less caching is performed.
- Caching < 40K (avg file size) seems to provide the highest hit ratio for searches.
- Expired responses are negligible (perhaps need better parameterization).

Page 50: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results from FSST (3)

Impact of the File Store Size at Non-Distinguished Nodes

                                 FS = 16M   FS = 32M
Node Utilization (Dist Node)       86.0%      80.0%
Node Utilization (Other Nodes)     50.6%      46.6%
Queue Length (Dist Nodes)          14.581     11.518
Queue Length (Other Nodes)          5.179      4.703
Num Responses Per Request          84.17      77.54
% Unexpired Responses              99.27%     99.33%
% Expired Responses                 0.73%      0.67%
Num Dropped Msgs Per Request        7.30       6.12

[Chart: Impact of File Store Size. x-axis: File Store Size (FS = 16M, FS = 32M); y-axis: % Successful Searches (0% to 80%); series: % Requests served remotely, % Requests served locally.]

Observations:
- Increasing the file store size improves the performance scenario considerably:
  - Node utilization decreases.
  - Queue length reduces.
  - Search hit ratio improves.
- The average no of responses per request reduces somewhat because more local hits occur.

Page 51: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results from FSST (4)

Small vs. large queue depth at the nodes

                                 QL = 50    QL = 1000
Node Utilization (Dist Node)       85.4%      86.9%
Node Utilization (Other Nodes)     49.3%      50.4%
Queue Length (Dist Nodes)          14.255     16.125
Queue Length (Other Nodes)          4.636      6.571
Num Responses Per Request          83.67      87.62
% Unexpired Responses              99.59%     98.93%
% Expired Responses                 0.41%      1.07%
Num Dropped Msgs Per Request        5.53       0.00
% Successful Requests              47.60%     47.89%
% Requests served locally           9.92%      9.96%
% Requests served remotely         37.67%     37.94%

Impact of More Powerful Distinguished Nodes

                                 Base       DNodePow*2
Node Utilization (Dist Node)       86.0%      51.3%
Node Utilization (Other Nodes)     50.6%      49.4%
Queue Length (Dist Nodes)          14.581      3.504
Queue Length (Other Nodes)          5.179      5.447
Num Responses Per Request          84.17      82.33
% Unexpired Responses              99.27%     99.39%
% Expired Responses                 0.73%      0.61%
Num Dropped Msgs Per Request        7.30       9.81
% Successful Requests              51.93%     51.62%
% Requests served locally           8.10%      8.14%
% Requests served remotely         43.83%     43.49%

Observations:
- Increasing the queue depth ensures no dropping of requests BUT does not impact the success rate or node utilization much.
- Making the distinguished nodes more powerful seems to have no impact other than the obvious reduction in utilization at distinguished nodes.

Page 52: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Sample Results from FSST (5)

Effect of Caching / Flushing Switches

                                 C / FL    C / No FL   No C / No FL   No C / FL
Node Utilization (Dist Node)      86.0%      85.4%        87.8%         92.8%
Node Utilization (Other Nodes)    50.6%      49.3%        52.5%         52.2%
Queue Length (Dist Nodes)         14.581     14.255       16.053        21.226
Queue Length (Other Nodes)         5.179      4.636        6.067         4.161
Num Responses Per Request         84.17      83.67        83.40         93.93
% Unexpired Responses             99.27%     99.59%       98.63%       100.00%
% Expired Responses                0.73%      0.41%        1.37%         0.00%
Num Dropped Msgs Per Request       7.30       5.53        11.06          5.43
% Successful Requests             51.93%     47.60%       92.33%        28.81%
% Requests served locally          8.10%      9.92%        6.33%         0.00%
% Requests served remotely        43.83%     37.67%       86.00%        28.81%

Effect of Enforcing Message Expiry in Network

                                 Base w/ inf queue   + EXPIRY = 8s
Node Utilization (Dist Node)          88.3%             88.0%
Node Utilization (Other Nodes)        52.4%             52.2%
Queue Length (Dist Nodes)             17.556            17.21
Queue Length (Other Nodes)             8.578             8.377
Num Responses Per Request             89.10             88.88
% Unexpired Responses                 97.98%            98.31%
% Expired Responses                    2.02%             1.69%
Num Dropped Msgs Per Request           0.00              0.44
% Successful Requests                 52.41%            52.15%
% Requests served locally              8.29%             8.30%
% Requests served remotely            44.12%            43.85%

Observations:
- When flushing and caching are both turned off, the search hit ratio is the best (because files do not get replaced & lost).
- Enforcing message expiry makes very little difference to the results (when using the average timer threshold value as the message expiry threshold).

Page 53: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Conclusions & Future Work

- Summary of covered material:
  - Introduced major developments relevant to P2P computing.
  - Introduced sample middleware functionality to support P2P applications.
  - Discussed major research issues to be resolved.
  - Proposed a random graph model for P2P networks and studied its properties.
  - Studied some performance issues for P2P deployments using detailed simulation of file-sharing applications.
- Future P2P performance work:
  - Various strategies for automated file propagation through the network.
  - Intelligent caching and invalidation of search results.
  - Key-based file location (hashing + searching).
  - Dynamic changes to network and file-sets stored at nodes.
  - Mapping of virtual network over a physical network to obtain more realistic link delays.
  - Various ways of culling unnecessary requests and responses.

Page 54: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Relevant sites: P2P Applications

- Napster (http://www.napster.com)
- Gnutella (http://gnutella.wego.com)
- Freenet (http://freenet.sourceforge.net)
- JXTA (http://www.jxta.org)
- Avaki Corp (http://www.avaki.com)
- Legion (http://legion.virginia.edu)
- Globe (http://www.cs.vu.nl/~steen/globe)
- Globus (http://www.globus.org)
- Microsoft .Net (http://www.microsoft.com/net)
- CenterSpan (http://www.centerspan.com)
- vTrails (http://www.vtrails.com)
- SETI@Home (http://setiathome.ssl.berkeley.edu)
- CAN (http://www.acm.org/sigcomm/sigcomm2001/p13.html)
- CHORD (http://www.pdos.lcs.mit.edu/chord)
- PASTRY (http://research.microsoft.com/~antr/Pastry)

Page 55: Www.intel.com/labs Performance Issues in P2P File Sharing Systems Krishna Kant Ravi Iyer Vijay Tewari Intel Corporation (With contributions from Peter.


Relevant Sites: Modeling Issues

- File-sharing networks
  - Intl workshop on P2P (http://www.cs.rice.edu/Conferences/IPTPS02/)
  - Jovanovic et al (U/Cinn), Scalability issues in Gnutella (http://www.ececs.uc.edu/~mjovanov/Research/paper.html)
  - Adar & Huberman (HP), Free riding in Gnutella (http://www.firstmonday.dk/issues/issue5_10/adar)
  - Ripeanu (U/Chicago), Peer-to-Peer Architecture Case Study: Gnutella Network (http://www.cs.uchicago.edu/research/publications/techreports/TR-2001-26)
- Internet graph models
  - Kumar et al (IBM), Web as a Graph (http://www.almaden.ibm.com/cs/k53/algo.html)
  - Aiello et al (AT&T/UCSD), A random graph model for massive graphs (http://math.ucsd.edu/~llu/random_abs.html)
- Taxonomy
  - Kant, Iyer & Tewari, A classification framework for P2P technologies (http://kkant.ccwebhost.com/download.html)

