Post on 14-Dec-2015
A Statistical Physics approach for Modeling P2P Systems
Giovanna Carofiglio1, R. Gaeta2, M. Garetto1,
P. Giaccone1, E. Leonardi1, M. Sereno2
MAMA Workshop, joint with ACM SIGMETRICS 2005, Banff, June 6-10, 2005
1 Politecnico di Torino, 2 Università di Torino, Italy
MAMA Workshop, Sigmetrics ‘05
Outline
Motivation
Basic Model
Extended Model
Content Search
Download effects
P2P System Architecture
[Figure labels: peers, clients, server]
A possible definition
Decentralized, self-organizing distributed systems, in which all or most communication is symmetric.
Peer-to-Peer traffic
P2P is the single largest generator of traffic
P2P traffic significantly outweighs web traffic
P2P traffic is continuing to grow
P2P Applications
Communication: Voice over IP (Skype), Instant Messaging
Distributed Computation: Seti@home, UnitedDevices, Distributed Science
File Sharing: BitTorrent, KaZaA, Gnutella, eDonkey, Napster, etc.
DHTs: Chord, CAN, Pastry, Tapestry
Wireless Ad hoc Networking
Motivation
Most of the Internet traffic is generated by p2p applications.
Performance studies of p2p systems can guide the design of future applications.
Analytical models help analyze large and complex p2p networks.
Modeling techniques
Traditional Markov Models
A detailed microscopic description is provided, but with a huge state space.
It is computationally expensive to analyze large systems such as p2p systems (with millions of users and shared contents).
Fluid models
Network dynamics are described at a higher level of abstraction, neglecting stochastic information.
Scalability: the model is based on a set of differential equations that is invariant w.r.t. the size of the network (number of users, link capacity).
Model description
[1] F. Clevenot, P. Nain, “A Simple Model for the Analysis of SQUIRREL”, Infocom 2004, Hong Kong, Mar 2004.
[2] D. Qiu, R. Srikant, “Modeling and Performance Analysis of BitTorrent-like Peer-to-Peer Networks”, Sigcomm 2004, U.S.A.
We model a generic p2p system without focusing on a particular implementation.
Starting from a fluid approach as in [1] and [2], our model moves to a second-order diffusion approximation, in which stochasticity in the network dynamics plays a relevant role.
The model provides a description of user and content dynamics both in the transient and in steady state.
Model structure
Users dynamics
Contents dynamics
Search phase
Download phase
Users dynamics (1)
The number of users in the p2p network changes dynamically according to:
Enter-leave dynamics
λu = new users’ arrival rate
1/μu = average subscription time
Active-Sleeping mode
1/μas = average active time
1/μsa = average sleeping time
Users in sleeping mode do not interact at all with the other users of the community.
Users dynamics (2)
The evolution of the number of users in active or sleeping mode, Ua and Us respectively, can be described by two fluid differential equations:
sleeping users who become active
new users
active users who become sleeping
active users who leave the system
active users who become sleeping
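The term labels above suggest a simple balance between arrivals, departures and active/sleeping transitions. The sketch below writes that balance out and integrates it numerically. It is a reconstruction under stated assumptions, not the slide's own formulas: new users are assumed to arrive in the active state, only active users leave, all rates are constant, and the function name and the forward-Euler scheme are illustrative choices.

```python
def simulate_users(lam_u, mu_u, mu_as, mu_sa, ua0, us0, t_end, dt=1.0):
    """Forward-Euler integration of an ASSUMED fluid model of the user populations.

        dUa/dt = lam_u - mu_u*Ua - mu_as*Ua + mu_sa*Us
        dUs/dt = mu_as*Ua - mu_sa*Us

    Ua, Us are the numbers of active and sleeping users.
    """
    ua, us = float(ua0), float(us0)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        # new users - leaving users - users going to sleep + users waking up
        d_ua = lam_u - mu_u * ua - mu_as * ua + mu_sa * us
        # users going to sleep - users waking up
        d_us = mu_as * ua - mu_sa * us
        ua += dt * d_ua
        us += dt * d_us
    return ua, us


if __name__ == "__main__":
    # Rates in 1/s: arrival 0.1, subscription 4000 s, active and sleeping periods 400 s
    print(simulate_users(0.1, 1 / 4000, 1 / 400, 1 / 400, 10, 10, t_end=200_000, dt=10.0))
```

Setting both derivatives to zero in these assumed equations gives the steady state Ua = λu/μu and Us = (μas/μsa)·Ua, which the numerical solution approaches for large t_end.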
Content Dynamics
The evolution of the number of available copies of a content is driven by 2 phenomena:
the generation of new copies (downloads or off-on transitions)
the cancellation of existing copies
θ = average request rate
1/μh, 1/μ’h = average content holding time for active/sleeping users
Note: ps = ps(μ’h) is the probability that a sleeping user has the considered content when it becomes active.
Brownian Motion
Content dynamics are modelled through a second-order diffusion approximation.
Each content is a particle with instantaneous position x(t) moving according to a Brownian motion.
Langevin equation
Fokker Planck equation
The evolution of the pdf f(x,t) over time follows:
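For reference, the textbook one-dimensional forms of the two equations named above are sketched here, written with a drift m(x,t) and a variance σ²(x,t). The unit white-noise term ξ(t) and the Itô convention are assumptions; the slide's own notation may differ.

```latex
% Langevin equation for the particle position x(t)
\frac{dx}{dt} = m(x,t) + \sigma(x,t)\,\xi(t)

% Fokker-Planck equation for the pdf f(x,t)
\frac{\partial f(x,t)}{\partial t}
  = -\frac{\partial}{\partial x}\bigl[m(x,t)\,f(x,t)\bigr]
  + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\bigl[\sigma^2(x,t)\,f(x,t)\bigr]
```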
Content diffusion equation
Introduction of new contents in the system
A content disappears when no more copies are available. The rate at which a content disappears is:
The pdf F(x,t) of the number of copies follows the F.P. equation with boundary conditions for :
Diffusion Parameters
hh = coefficient of variation of the holding time
hr = coefficient of variation of the inter-request time
m(x,t) expresses the average speed at which the content-particle moves along the x axis.
The variance σ²(x,t) expresses the burstiness of the processes.
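To make the roles of m(x,t) and σ²(x,t) concrete, the sketch below simulates one content-particle with the Euler-Maruyama scheme and absorbs it when it reaches x = 0 (no copies left). The drift and variance functions passed in are hypothetical placeholders, not the expressions of the model, and all function names are illustrative.

```python
import math
import random


def simulate_copies(x0, drift, variance, t_end, dt=0.1, rng=None):
    """Euler-Maruyama path of dx = m(x,t) dt + sigma(x,t) dW, absorbed at x <= 0.

    drift(x, t) and variance(x, t) are caller-supplied (hypothetical) functions.
    Returns (x_final, absorbed).
    """
    if rng is None:
        rng = random.Random()
    x, t = float(x0), 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        sigma = math.sqrt(max(variance(x, t), 0.0))
        x += drift(x, t) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x <= 0.0:
            return 0.0, True  # the content has disappeared
    return x, False


def disappearance_probability(x0, drift, variance, t_end, runs=2000, seed=0):
    """Monte Carlo estimate of P(content disappears before t_end)."""
    rng = random.Random(seed)
    lost = sum(
        simulate_copies(x0, drift, variance, t_end, rng=rng)[1] for _ in range(runs)
    )
    return lost / runs
```

A larger variance makes early absorption more likely even when the drift is positive, which is the effect a purely first-order fluid model cannot capture.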
Case: Content disappearance (1)
In a single-content scenario we study the probability that the content disappears as a function of the users’ dynamics.
Initial condition
Active users = 10
Sleeping users = 10
Copies available = 1
Network parameters
λu = users’ arrival rate = 0.1 users/s
1/μu = avg subscription time = 4000 s
1/μas = avg active period = 400 s
1/μsa = avg sleeping period = 400 s
θ = average request rate
1/μh, 1/μ’h = avg content holding time for active/sleeping users = 100 s
Case: Content disappearance (2)
Which plot do we show? Model versus Michele’s simulator? Model only?
Dual distribution
Relations between users’ and contents’ dynamics
The number of active and sleeping users at time t
The number of copies available at time t
Dual equations
Ga(x,t) and Gs(x,t) are the pdfs of the number of active and sleeping users, respectively, having x contents:
new users
active users who become sleeping or leave the system
sleeping users who become active
Diffusion parameters
As in the content diffusion equation, m(x,t) expresses the average speed at which the copy-particle moves along the x axis, while σ²(x,t) expresses the variance of the associated process.
ra = rate of generation of new copies
da/s = rate of cancellation of existing copies (active/sleeping users)
Multi-contents case (1)
In a multi-content scenario, still assuming ideal search and download, we study the steady-state distribution of the contents among users.
Initial condition
Active users = 2500
Sleeping users = 7500
Copies available = 1
Network parameters
λu = users’ arrival rate = 0 users/s
1/μu = avg subscription time = inf
1/μas = avg active period = 6 h
1/μsa = avg sleeping period = 18 h
θ = average request rate = 2 contents/h
λc = contents’ introduction rate = 1/600 contents/s
1/μh, 1/μ’h = avg content holding time for active/sleeping users = 10 h, 8 h
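With no arrivals and no departures, the initial populations appear to sit at the active/sleeping equilibrium. Assuming the sleeping population obeys dUs/dt = μas·Ua − μsa·Us (an assumption about the form of the fluid equations), the balance works out as:

```latex
\mu_{as} U_a = \mu_{sa} U_s
\;\Longrightarrow\;
\frac{U_s}{U_a} = \frac{\mu_{as}}{\mu_{sa}}
               = \frac{1/\mu_{sa}}{1/\mu_{as}}
               = \frac{18\ \text{h}}{6\ \text{h}} = 3 = \frac{7500}{2500}
```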
Multi-contents case (2)
Which plots do we show? Model versus Michele’s simulator? Model only?
The contents’ transfer rate
In a non-ideal p2p system the transfer rate of the contents changes dynamically according to:
the probability of a successful search phit(x,t) (related to content diffusion, search algorithm)
the probability of a successful download pdown(x,t) (related to network congestion, user impatience, on-off dynamics)
The effective retrieval rate becomes:
Both search and download require knowledge of F(x,t), which the model provides as a function of time.
Search Phase
Search algorithm: flooding in an unstructured p2p network
For each content request, a query message is forwarded to all neighbors up to distance max_ttl.
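The flooding rule above can be sketched as a breadth-first traversal bounded by max_ttl. This is a minimal illustration, not a protocol implementation: the adjacency-dictionary representation, the `holders` set and the function name are assumptions, and the search stops at the first hit, whereas a real flood would keep propagating.

```python
from collections import deque


def flood_search(neighbors, holders, source, max_ttl):
    """Flood a query from `source`; return True if a holder is within max_ttl hops.

    neighbors: dict mapping each peer to an iterable of adjacent peers
    holders:   set of peers storing the requested content
    """
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        peer, dist = frontier.popleft()
        if dist == max_ttl:
            continue  # TTL exhausted: this peer does not forward the query
        for nxt in neighbors.get(peer, ()):
            if nxt in visited:
                continue  # duplicate query, dropped
            if nxt in holders:
                return True
            visited.add(nxt)
            frontier.append((nxt, dist + 1))
    return False
```

On a chain a-b-c-d where only d holds the content, a query from a succeeds with max_ttl = 3 and fails with max_ttl = 2.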
Graph Model
The P2P network topology is modeled as a finite random graph.
We consider a Generalized Random Graph (GRG) to allow an arbitrary vertex degree distribution.
[Figure legend: active peer, application-level connection]
GRG Model
Given the probability distribution {pk} that a vertex has k edges departing from it, we can define the generating function:
It can be shown that the generating function of the number of first neighbors with a copy of the content is:
α = x/Ua, where x = # copies and Ua = # active users
The composition of these generating functions gives the generating function of the number of neighbors at distance h.
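In the standard generating-function treatment of random graphs, the quantities referred to above can be written as follows. This is a sketch rather than the slide's own formulas: it assumes that each neighbor holds a copy independently with probability α, and the symbols G_0, G_1 and \tilde{G}_0 are notation introduced here.

```latex
% degree generating function
G_0(z) = \sum_{k \ge 0} p_k z^k

% number of first neighbors holding a copy (each neighbor holds one w.p. \alpha)
\tilde{G}_0(z) = G_0(1 - \alpha + \alpha z), \qquad \alpha = x / U_a

% excess degree of a vertex reached by following an edge
G_1(z) = \frac{G_0'(z)}{G_0'(1)}

% number of neighbors at distance 2, 3, ... by repeated composition
G_0\bigl(G_1(z)\bigr), \quad G_0\bigl(G_1(G_1(z))\bigr), \ \dots
```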
GRG Topology
To compute the degree distribution of the GRG nodes we adopt an M/M/∞ queue.
Assuming that an external observer joins the network:
# customers in queue = # connections established by the observer
Now we can define the generating function of the number of neighbors at distance up to max_ttl that have a copy of the content:
From it we derive the hit probability:
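The stationary number of customers in an M/M/∞ queue is Poisson distributed, so this construction yields Poisson vertex degrees. Under that distribution, and under two further assumptions that are not stated here (a locally tree-like graph, and each peer holding a copy independently with probability α = x/Ua), a hit probability for a search of depth max_ttl can be sketched as below. The function and parameter names are illustrative.

```python
import math


def hit_probability(mean_degree, alpha, max_ttl):
    """P(at least one peer within max_ttl hops holds the content).

    Assumes Poisson degrees with the given mean, a tree-like neighborhood,
    and independent content placement with probability alpha per peer.
    """
    if max_ttl < 1:
        return 0.0

    def g(z):
        # Poisson generating function; for Poisson degrees G0 and G1 coincide
        return math.exp(mean_degree * (z - 1.0))

    # u = P(a peer reached over an edge, together with the part of its subtree
    #       still within range, holds no copy)
    u = 1.0 - alpha  # remaining depth 1: only the neighbor itself counts
    for _ in range(max_ttl - 1):
        u = (1.0 - alpha) * g(u)
    return 1.0 - g(u)
```

For max_ttl = 1 the expression reduces to 1 − exp(−mean_degree·α), the probability that at least one direct neighbor holds a copy.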
Download Phase
Assumptions:
The transport network is ideal.
Infinite bandwidth on the client side.
The peer from which the desired content is downloaded is randomly chosen among those storing that content.
The dynamics of the download at each peer are modelled by an M/G/1-PS queue.
Problem
The download request rate incoming at peers is not known a priori!
It depends on:
The contents’ distribution at peers
The policy used by the system to distribute the load among peers
Probability of successful download (1)
Let θ be the popularity of a content present in x copies in a network with Ua active peers.
Download request rate
Assuming that the requests form a Poisson process, the queue becomes an M/G/1-PS with average delay:
Given a download rate y = θsphit, the probability of successful download is:
Single Content Case
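The average delay of the M/G/1-PS queue invoked above has a standard closed form that depends on the service-time distribution only through its mean. A minimal sketch of that textbook result follows; the function names, and the idea of expressing a job as a file size served at a given upload capacity, are assumptions introduced for illustration.

```python
def mg1_ps_mean_delay(request_rate, mean_service_time):
    """Mean sojourn time E[T] = E[S] / (1 - rho) of an M/G/1-PS queue.

    rho = request_rate * mean_service_time is the offered load.
    """
    rho = request_rate * mean_service_time
    if rho >= 1.0:
        raise ValueError("unstable queue: load rho = %.3f >= 1" % rho)
    return mean_service_time / (1.0 - rho)


def mg1_ps_delay_for_size(size, capacity, rho):
    """Conditional mean sojourn time of a job of the given size.

    Under processor sharing a job of size `size` served at `capacity`
    (hypothetical units, e.g. bytes and bytes/s) takes on average
    size / (capacity * (1 - rho)).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return size / (capacity * (1.0 - rho))
```

Because the delay grows as 1/(1 − ρ), a peer holding a popular content with few copies in the network sees its download times rise sharply as its load approaches 1.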
Probability of successful download (2)
Multiple Content Case
The overall probability of successful download is
From F(x) we derive the probability that a peer has k contents, present in x copies:
(F(x) is the pdf of the number of copies available for the content)
The overall download request rate seen by a peer is
Probability of successful download (3)
Since all Z(x) are independent, we can approximate the distribution of Y around its average with a normal distribution.
The probability of successful download becomes
Notes
my and σy are the first two moments of Y.
The integral is restricted to the interval for numerical reasons.
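The normal approximation of Y reduces the overall success probability to a one-dimensional integral of a per-rate success probability against a Gaussian density, evaluated on a truncated interval. The sketch below shows that numerical step only. The integrand `g`, the truncation half-width `width` (in standard deviations) and the function names are hypothetical, since neither the per-rate expression nor the integration interval is specified here.

```python
import math


def normal_pdf(y, mean, std):
    """Density of N(mean, std^2) at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def expect_under_normal(g, mean, std, width=6.0, points=2001):
    """Trapezoidal estimate of E[g(Y)] for Y ~ N(mean, std^2).

    The integral is truncated to [mean - width*std, mean + width*std];
    `width` is an illustrative choice, not a value from the slides.
    """
    lo = mean - width * std
    hi = mean + width * std
    h = (hi - lo) / (points - 1)
    total = 0.0
    for i in range(points):
        y = lo + i * h
        weight = 0.5 if i in (0, points - 1) else 1.0
        total += weight * g(y) * normal_pdf(y, mean, std)
    return total * h
```

Passing a function that maps a request rate y to a success probability in place of `g` gives the averaged success probability under the Gaussian approximation.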
Conclusions
We defined a stochastic fluid model of a p2p system that describes user and content dynamics in both the transient and the stationary regime.
A supporting model makes it possible to account for the effects of search and download on system performance.
Work in progress…
Analytical solution of the equations in steady state
Model extension to classes of different users
Model extension to classes of different contents
Comparison between model and simulations in realistic scenarios
Thank you!