P2P-VoD on Internet: Fault Tolerance and Control Architecture
Rodrigo Godoi
CAOS - Computer Architecture & Operating Systems Department, Universitat Autònoma de Barcelona
Barcelona, July 2009.
Advisor: Dr. Porfidio Hernández Budé
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Video on Demand - VoD
•Multimedia service
•Asynchronous requests
•Every client enjoys entire content
•Long sessions (> 60 min.)
VoD - requirements and constraints
Large-scale Video on Demand (LVoD). Clients: thousands, widely dispersed. Multimedia contents: huge catalogue.
Time limit on handling data
Quality of Service (QoS) Fault Tolerance
Multicast
Peer-to-Peer
Internet
Control Architecture
Scalability
Soft real-time
Multicast - implementations
IP Multicast:
•Source tree (e.g. PIM-DM)
•Shared tree (e.g. PIM-SM)
Application Layer Multicast - ALM (e.g. NICE, ALMI)
Overlay Multicast (e.g. OMNI)
Multicast - Patching
Patching: multicast technique for multimedia data delivery. A late-arriving client joins the ongoing multicast (base stream) and receives the portion it missed through a unicast patch stream.
[Figure: clients arriving at t = 0 and t = 6; shared multicast base stream plus unicast patch stream for the late arrival]
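As a concrete illustration (a minimal sketch, not code from the thesis; the function and the threshold parameter are assumptions mirroring the session threshold T used later by P2Cast):

```cpp
#include <iostream>

// Hypothetical sketch of patching admission (illustrative, not VoDSim code).
// A session's base stream started at 'sessionStart'; a client arrives at 'arrival'.
// 'threshold' is the maximum offset for which patching is still worthwhile.
struct Admission {
    bool joinExisting;   // join the running multicast base stream?
    double patchLength;  // duration of the unicast patch stream (minutes)
};

Admission admit(double sessionStart, double arrival, double threshold) {
    double offset = arrival - sessionStart;      // content the client has missed
    if (offset <= threshold)
        return {true, offset};                   // base stream + patch of 'offset' minutes
    return {false, 0.0};                         // too late: open a new multicast session
}

int main() {
    // Client arriving 6 minutes into a session, threshold 9 minutes (10% of a 90 min video)
    Admission a = admit(0.0, 6.0, 9.0);
    std::cout << (a.joinExisting ? "patch of " : "new session, ")
              << a.patchLength << " min\n";
}
```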
Peer-to-Peer
Free cooperation of equals in view of the performance of a common task.
•Takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet.
•Usage: file sharing, distributed multimedia systems, high performance computing.
Synchronised usage of peers' resources: Collaboration Groups
Peer-to-Peer - classification
P2P taxonomy - location mechanism:
•Unstructured
  - Purely decentralised (Gnutella): peers only
  - Partially decentralised (FastTrack): supernodes
  - Hybrid (BitTorrent): peers + tracker
•Structured
  - Purely decentralised (Chord)
Overlay topology: Chain, Tree, Mesh
Internet environment
•Worldwide scale
•Heterogeneous environment
•Best-effort service
•Exponential growth rate
Organisation:
•Autonomous Systems (AS): collection of connected IP routing prefixes under the control of one or more network operators (ISPs, universities, companies)
•Network arranged by dimension and purpose (LAN, WAN, MAN)
•Modeled by complex network theory
Clustering coefficient: $C_i = \dfrac{2 E_i}{k_i (k_i - 1)}$
Average path length: $l \approx \dfrac{\ln(N)}{\ln(\langle k \rangle)}$
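As a quick worked example (the average degree is an illustrative assumption; $N$ matches the client population used later in the experiments):

$$l \approx \frac{\ln(5400)}{\ln(4)} \approx \frac{8.59}{1.39} \approx 6.2 \text{ hops} \qquad (N = 5400 \text{ peers}, \ \langle k \rangle = 4),$$

so messages are expected to traverse only a handful of hops even in large systems - the small-world effect invoked later in the Time cost analysis.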
Problem
•Frequent arrivals/departures
•Failures: network, server, peers
•Large-scale system
•Input rate fluctuation
•Source crash
•Start-up delay

VoD service must…
•respect deadlines
•provide low start-up delay
•make clever use of the buffer
•enforce low control overhead

Failures/errors treatment (cushion buffer, QoS) → Control Architecture + Fault Tolerance
Control relevance
•Performance improvement vs. control complexity
•Resources sharing: delivery architecture (P2P and Multicast)
•Heterogeneity: Internet, peers' capabilities, lifetimes
→ Control Architecture + Fault Tolerance
Fault Tolerance
Failure: system defect. Error: consequence of a failure.
•Failure handling: network redundancy, source redundancy
•Error handling: Forward Error Correction (FEC), Automatic Repeat Request (ARQ) - these do not solve the fault
State of the art

System       | Service       | P2P               | Multicast | Internet | Fault Tolerance | Control Assessment
DirectStream | VoD           | Unstructured tree | ALM       |          | x               |
CoopNet      | Streaming/VoD | Unstructured tree | ALM       | x        | x               |
P2VoD        | VoD           | Unstructured tree | ALM       | x        | x               |
DynaPeer     | VoD           | Unstructured tree | IP/ALM    |          | x               |
PPLive       | Streaming     | Unstructured mesh | ALM       | x        | x               |
P2Cast       | VoD           | Unstructured tree | ALM       |          | x               |
Pn2Pn        | VoD           | Unstructured mesh | IP        | x        | x               |
BitToS       | VoD           | Unstructured mesh | ALM       |          | x               |
Promise      | Streaming     | Structured mesh   | ALM       |          | x               |
GloVe        | VoD           | Unstructured tree | IP/ALM    | x        | x               | x
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Goal of the Thesis
To assess Control impact and propose a Fault Tolerance Scheme for P2P-VoD service on the Internet.
LVoD system: Control Architecture, Fault Tolerance, Multicast
Desired properties: scalability, flexibility, reliability, efficiency, low overhead, QoS
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
System architecture

Service: VoD | P2P: unstructured tree/mesh | Multicast: IP/ALM | Internet: yes | Fault Tolerance: yes | Control Assessment: yes

[Figure: Internet Autonomous Systems with IP multicast zones; distributed video servers and distributed proxy servers; servers and clients overlay topologies; P2P collaborations among clients]
The Failure Management Process
Basis of Fault Tolerance Mechanisms
•Detection: income stream monitoring; heartbeat messages
•Recovery: centralised; subsequent queries
•Maintenance: network infrastructure; peer status
Load and Time metrics
Load cost: volume of control messages that flows through the system during failure management processes. Measures control overhead - congestion, bandwidth consumption.
Time cost: time consumed solving peer failures. Measures control efficiency - start-up delay, buffer usage.
Background: VoD service schemes
Gather different aspects of P2P-VoD services:
PCM/MCDB: IP Multicast (local level); Patching; mesh-based P2P; heartbeats / buffer monitoring; centralised recovery
P2Cast: ALM (AS level); Patching; tree-based P2P; heartbeats / buffer monitoring; recursive recovery
PCM/MCDB
PCM: Patch Collaboration Manager
MCDB: Multicast Channel Distributed Branching
[Figure: interaction of PCM, MCDB and the bypass mechanism]
Fault Tolerance - PCM/MCDB
•Centralised recovery
•IP Multicast tree rearrangement
Message types: detection, maintenance and recovery messages
[Figure: multicast channels Ch. M0, Ch. M1, Ch. M2 with MCDB and the three message flows]
P2Cast
•Clients are divided into sessions according to their arrival time in the system (session threshold parameter T)
•Best-fit algorithm: the peer with the greatest amount of available bandwidth is selected as parent (see the sketch below)
[Figure: VoD server with sessions 3 and 4; peers labelled by arrival time; multicast base stream and unicast patch streams; threshold T]
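A minimal sketch of the best-fit parent selection described above (illustrative; the peer record and field names are assumptions, not VoDSim code):

```cpp
#include <vector>
#include <cstdio>

// Hypothetical peer record (field names are illustrative assumptions).
struct Peer {
    int id;
    double availableBw;  // kb/s left after serving current children
};

// Best-fit selection: among session members able to serve the stream,
// pick the peer with the greatest available bandwidth as parent.
const Peer* selectParent(const std::vector<Peer>& session, double playRate) {
    const Peer* best = nullptr;
    for (const Peer& p : session) {
        if (p.availableBw >= playRate &&          // can sustain the stream
            (!best || p.availableBw > best->availableBw))
            best = &p;
    }
    return best;  // nullptr -> fall back to the VoD server
}

int main() {
    std::vector<Peer> session{{1, 1200.0}, {2, 3000.0}, {3, 1800.0}};
    if (const Peer* p = selectParent(session, 1500.0))
        std::printf("parent: peer %d\n", p->id);   // -> peer 2
}
```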
Fault Tolerance - P2Cast
•Failures of source peers (parents) provoke stream disruption for the peers served by them
•Recovery through subsequent queries
Message types: detection and recovery messages
[Figure: sessions 3 and 4 after a parent failure; orphaned peers re-query the overlay for a new source]
Load cost

PCM/MCDB:
•Detection (heartbeats; peers status): $C_d = f_{HB} \cdot \sum_{i=1}^{C_G} N_{cc_{g(i)}}$
•Recovery (recovery request; IP Multicast tree rearrangement): $C_r = f_e \cdot O_f \cdot [N_{MG} \cdot N_{CH}]$
•Maintenance (routers status): $C_m = f_{ICC} \cdot N_I + f_{TI} \cdot \sum_{i=1}^{M_G} N_{g(i)}$

P2Cast:
•Detection (heartbeats): $C_d = f_{HB} \cdot N_{cc}$
•Recovery (subsequent queries): $C_r = \dfrac{f_e \cdot O_f}{p}$
•Maintenance (peers status): $C_m = f_{TI} \cdot H$

where $f_{HB}$ is the heartbeat frequency, $f_e$ the failure frequency, $O_f$ the number of control messages per recovery, $p$ the success probability of a collaboration search, and the $N$ and $H$ terms count the peers/routers reached by each message type.
Time cost

PCM/MCDB: $C_t = W + l \cdot \dfrac{\ln(N)}{\ln(\langle k \rangle)}$
P2Cast: $C_t = W + \dfrac{1}{p} \cdot l \cdot \dfrac{\ln(N)}{\ln(\langle k \rangle)}$

•Detection: waiting time $W$ until the failure is noticed
•Recovery messages: network latency $l$ per hop
•Path (network theory): the small-world effect bounds the path length by $\ln(N)/\ln(\langle k \rangle)$
•P2Cast pays the additional $1/p$ factor for its subsequent recovery queries
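Two short derivations behind these terms (standard results, included here for completeness): a failure instant is uniformly distributed within a heartbeat period, and each recovery query succeeds independently with probability $p$:

$$\mathbb{E}[W] = \int_0^{1/f_{HB}} t \, f_{HB} \, dt = \frac{1}{2 f_{HB}} = \tau, \qquad \mathbb{E}[\text{queries}] = \sum_{n=1}^{\infty} n \, p \, (1-p)^{n-1} = \frac{1}{p}.$$

The first expression is the detection wait $\tau$ used later in the Time cost analysis; the second shows why a low success probability on the collaboration search inflates P2Cast's time cost.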
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
The Fault Tolerance Scheme (FTS)
The FTS stands on peers' capabilities:
•Input / Output bandwidth ($bw_i$ / $bw_o$)
•Buffer size, organised into regions: Cushion, Delivery, Collaboration, Altruist
[Figure: peer buffer with Buffer In / Buffer Out and numbered video blocks]
→ Fault Tolerance Groups
The Fault Tolerance Scheme (FTS)
MN
Video 8 9 10
L
6 7 13 14 1511 123 4 51 2
1 2 3 1 2 3
MN
t = 0
Video
C1
8 9 106 7 13 14 1511 123 4 51 2
7 L4 5 61 2 3 6 7
L 4 51 2 37 8 94 5 61 2 3 10
C1
MN
t = 0
t = 3
7
Video
MN
C1
C2
8 9 10
7 L
6 7 13 14 1511 123 4 51 2
4 5 61 2 3 14 1511 12 13
10 11 L127 8 94 5 6 9 106 7 813 14
13 14 L1510 11 128 9 4 51 2 3
C1 C2
MN
t = 0
t = 3 t = 10
[t = 17]
Cushion Delivery Gen. purpose Altruist
MN
Fault Tolerance Groups
Manager Node
FTS Collaborators
Load and Time costs with the FTS

•Detection: heartbeats are exchanged with the Manager Nodes only, $C_d = f_{HB} \cdot N_{MN}$; combined with the service scheme, $C_d = f_{HB} \cdot \left( \sum_{i=1}^{C_G} N_{cc_{g(i)}} + 2 N_{MN} \right)$
•Maintenance: $C_m = f_{TI} \cdot \sum_{i=1}^{M_G} N_{g(i)}$
•Recovery: $C_r = f_e \cdot O_f$
•Time: $C_t = W + l \cdot \dfrac{\ln(N)}{\ln(\langle k \rangle)}$

The proposed Fault Tolerance Scheme…
•distributes the control through Manager Nodes
•eliminates messages for peers' status maintenance
•removes subsequent queries during recovery (the $1/p$ factor disappears)
•can detect failures through heartbeats (FTS I) and income stream monitoring (FTS II); a sketch of the detector follows
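A minimal sketch of heartbeat-based detection at a Manager Node (FTS I), under the assumption that a member is declared failed after missing two heartbeat periods; all names are illustrative, not taken from the thesis implementation:

```cpp
#include <unordered_map>
#include <vector>

// Illustrative Manager Node failure detector (FTS I).
// A member is suspected once no heartbeat arrived for 2/fHB seconds,
// i.e. two full heartbeat periods (an assumption made for this sketch).
class ManagerNode {
public:
    explicit ManagerNode(double fHB) : period_(1.0 / fHB) {}

    void onHeartbeat(int peerId, double now) { lastSeen_[peerId] = now; }

    // Called periodically; returns members considered failed at time 'now'.
    std::vector<int> checkFailures(double now) const {
        std::vector<int> failed;
        for (const auto& [id, t] : lastSeen_)
            if (now - t > 2.0 * period_)
                failed.push_back(id);   // trigger FTG recovery for this member
        return failed;
    }

private:
    double period_;                              // heartbeat period 1/fHB
    std::unordered_map<int, double> lastSeen_;   // peerId -> last heartbeat time
};
```

Since heartbeats flow only between members and their MN, the detection traffic scales with $N_{MN}$ rather than with the whole group, which is the source of the load cost reduction claimed above.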
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Simulation tool: VoDSim
Computational simulations provide a more dynamic and scalable analysis.
•Discrete event-driven model
•More than 50 classes in C++
•Over 46,000 lines

VoDSim extensions:
•Implementation of the ALM service scheme: P2Cast
•Peer arrival rate: Poisson
•Content popularity: Zipf
•Peers' disruptions: Weibull - fault probability $F(x) = 1 - e^{-(x/\beta)^{\alpha}}$, lifetime $R(x) = e^{-(x/\beta)^{\alpha}}$
•FMP instrumentation: Load and Time costs measurement
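For illustration, the three workload distributions can be sampled with the C++ standard library; this is a hedged sketch, not VoDSim's actual code, and the parameter values are placeholders:

```cpp
#include <random>
#include <vector>
#include <cmath>
#include <cstdio>

int main() {
    std::mt19937 rng(42);

    // Peer arrivals: a Poisson process <=> exponential inter-arrival gaps (rate 10/min).
    std::exponential_distribution<double> interArrival(10.0);

    // Peer lifetime before disruption: Weibull(shape alpha, scale beta), in minutes.
    std::weibull_distribution<double> lifetime(/*alpha=*/1.5, /*beta=*/60.0);

    // Content popularity: Zipf over a catalogue of V videos, weight 1/rank^s.
    const int V = 10; const double s = 1.0;
    std::vector<double> w(V);
    for (int i = 0; i < V; ++i) w[i] = 1.0 / std::pow(i + 1, s);
    std::discrete_distribution<int> popularity(w.begin(), w.end());

    double t = interArrival(rng);                 // next peer arrives after t minutes
    std::printf("arrival=%.2f min, video=%d, lifetime=%.1f min\n",
                t, popularity(rng), lifetime(rng));
}
```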
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Experimental Results
1 - Control relevance
•PCM/MCDB and P2Cast service schemes
•Analytical and simulated results
•Load and Time costs behaviour
•Control vs. multimedia traffic

Failure Management Process validation
Parameter | Value
Service scheme | P2Cast
Request rate | 10 - 60 requests/minute
Client's output bandwidth | 3000 kb/s
Client's buffer | 113 MB (10 min. of video)
Video catalogue | 1 video
Video length | 90 minutes
Video play rate | 1500 kb/s
Threshold | 1%, 5% and 10% of video length
[Figure: Load cost (messages) vs. number of clients (1 to 10,000), analytical model and simulation.]

Confidence of the simulated results: $s = \sqrt{\dfrac{\sum (Z - M)^2}{N}}$
Z - simulated value; M - average simulated cost; N - number of simulation samples; s - standard deviation.
[Figure: Time cost (sec., log scale) vs. success probability on search collaboration (0.001 to 1.0), analytical model and simulation.]
Control vs. Multimedia traffic
$\Delta_w = \dfrac{Tr_{control}}{Tr_{server\_video}} \cdot 100$
Simulated results (P2Cast): $\Delta_w$ = 10%-28%; 13%-39%; 13%-37%
Performance improvement vs. control complexity - analytical results (PCM/MCDB and P2Cast)

Load cost analysis (number of clients sweep)
Parameter | Value
Number of clients | [27 ; 5400]
Number of multicast groups | 40
Heartbeat frequency | 60 msg./min.
Failure frequency | 0.2 faults/min.
Peers' available bandwidth | 1.5 - 3.0 Mb/s
Playback rate | 1500 kb/s
Video length | 90 minutes

Load cost analysis (heartbeat frequency sweep)
Parameter | Value
Number of clients | 5400
Number of multicast groups | 11
Heartbeat frequency | [0.2 ; 20] msg./min.
Failure frequency | 0.2 faults/min.
Peers' available bandwidth | 1.5 - 3.0 Mb/s
Playback rate | 1500 kb/s
Video length | 90 minutes
Time cost analysis
[Figure: Time cost (sec.) vs. network latency (0.0001 to 0.1 seconds), PCM-MCDB and P2Cast. The time cost increases at high latency, driven by the recovery control messages.]
Impact on QoS, at a download rate of 1500 kb/s (750+750):
•Cushion buffer: 56 MB vs. 11 MB
•Start-up delay: 5 min. vs. 1 min.
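The buffer figures follow directly from the playback rate; as a sanity check (plain arithmetic on the numbers above):

$$300\text{ s} \times \frac{1500\text{ kb/s}}{8} = 56{,}250\text{ kB} \approx 56\text{ MB}, \qquad 60\text{ s} \times \frac{1500\text{ kb/s}}{8} = 11{,}250\text{ kB} \approx 11\text{ MB},$$

and likewise the 113 MB client buffer corresponds to 10 minutes of video at 1500 kb/s.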
Experimental Results
2 - The Fault Tolerance Scheme
•PCM/MCDB and P2Cast service schemes
•Load and Time costs without the FTS (analytical)
•Load and Time costs with the FTS (analytical)
•FTS service performance - simultaneous failures
Load cost analysis
[Figure: Load cost (msg/min) vs. multicast group size (1 to 100 clients/session), for PCM-MCDB and P2Cast with and without FTS I / FTS II.]

$\Delta = \dfrac{C_{SP\_FTS} - C_{SP}}{C_{SP}} \cdot 100$
$C_{SP\_FTS} > C_{SP}$: cost increment; $C_{SP\_FTS} < C_{SP}$: cost reduction.

Parameter | Value
Number of clients | [27 ; 5400]
Number of multicast groups | 11 / 40
Heartbeat frequency | [0.2 ; 60] msg./min.
Failure frequency | [0.03 ; 80] faults/min.
Peers' available bandwidth | 1.5 - 3.0 Mb/s
Playback rate | 1500 kb/s
Buffer capacity | 113 MB (10 min.)
Video length | 90 minutes

Average ∆:
FTS I vs. PCM/MCDB | -60.3%
FTS II vs. PCM/MCDB | -85.5%
FTS I vs. P2Cast | +8.5%
FTS II vs. P2Cast | -87.5%

FTS I - heartbeat detection; FTS II - buffer monitoring detection.
Load cost analysis
[Figure: Load cost (msg/min) vs. heartbeat frequency (0.1 to 100 msg/min), for PCM-MCDB and P2Cast with and without FTS I / FTS II.]
[Figure: Load cost (msg/min) vs. failure frequency (0.01 to 100 faults/min), same six configurations.]

Average ∆ (first experiment):
FTS I vs. PCM/MCDB | -90.6%
FTS II vs. PCM/MCDB | -96.1%
FTS I vs. P2Cast | +2.1%
FTS II vs. P2Cast | -80.4%

Average ∆ (second experiment):
FTS I vs. PCM/MCDB | -49.9%
FTS II vs. PCM/MCDB | -75.0%
FTS I vs. P2Cast | -0.9%
FTS II vs. P2Cast | -91.6%

→ Overhead reduction, scalability. Cushion buffer: 56 MB → 11 MB.
Time cost analysis
[Figure: Time cost (sec.) vs. network latency (0.0001 to 0.1 seconds), PCM-MCDB and P2Cast; the time cost grows at high latency.]
•The FTS reduces the volume of recovery communication
•Mean detection wait with heartbeats: $\tau = 1/(2 \cdot f_{HB})$
•Start-up delay: 5 min. → 1 min. (efficiency)
FTS I - heartbeat detection; FTS II - buffer monitoring detection.
Fault Tolerance service performance
Parameter | Value
Number of clients | 10800
Video channels with P2P collaboration | 1000
Altruist buffer | 338 MB (30 min.) and 102 MB (9 min.)
Video length | 90 minutes
Video play rate | 1500 kb/s

Altruist buffer 338 MB ($s_f$: simultaneous failures supported):
N_CFTG \ bw (kb/s) | s_f
5 \ 300 | 200
4 \ 750 | 400
3 \ 1500 | 600
3 \ 3000 | 1200
3 \ 6000 | 2400

Altruist buffer 102 MB:
N_CFTG \ bw (kb/s) | s_f
10 \ 300 | 400
10 \ 750 | 1000
10 \ 1500 | 2000
10 \ 3000 | 4000
10 \ 6000 | 8000

→ Reliability, flexibility.
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Conclusions
•Load cost → control overhead: network congestion, bandwidth resources
•Time cost → efficiency: buffer usage, start-up delay
•Load and Time costs trade-off; reducing both improves Quality of Service
•Internet, P2P and Multicast increase control complexity
The control mechanism plays a crucial role in the design of P2P-VoD systems.
Conclusions
The Fault Tolerance Scheme…
•is flexible for Internet use
•presents a hierarchical control structure
•has a scalable backup mechanism
•does not demand extra data communication or dedicated resources
•is able to guarantee system reliability
•reduces Load and Time costs
Contents
Introduction
Goal of the Thesis
Control Assessment
The Fault Tolerance Scheme
Simulation
Experimental Results
Conclusions
Future Work
Future Work
• Application and assessment of the FTS in a wide range of VoD architectures and service policies
• Implementation of the FTS in a simulation environment
• FTS improvement: storing parts of non-visualized contents; using non-volatile storage devices (e.g. Solid State Disk drives)
• Addition of VCR / DVD-like operations
• Usage of clients' behaviour information to improve system performance
P2P-VoD on Internet: Fault Tolerance and Control Architecture
Rodrigo Godoi
CAOS
Barcelona, July 2009.
Thank you
Gracias
Obrigado
The Fault Tolerance Scheme (FTS)
Architecture elements:
•Server: content seed
•Peer: multimedia client / source
•FTG member: collaborator in the FTS
•Manager Node: organises and monitors the FTG
Properties:
•Distributed backup: flexibility and reliability
•Built on the fly: the backup does not need retransmission
•P2P based: the mechanism uses the system's own available resources
•Hierarchical control: scalability and deployment
[Figure: server, Manager Node, Fault Tolerance Group members and clients, linked by control communication]
The FTS formation law - peers' bandwidth greater than playback rate ($bw \ge V_{pr}$)

Input parameters: backup shares $d \in [d_{min}; d_{max}]$, output bandwidth $bw_o \in [bw_{min}; bw_{max}]$

Distributed backup: $L = \sum_{i=1}^{N_{C_{FTG}}} d_i$

FTG size: $\dfrac{L}{d_{max}} \le N_{C_{FTG}} \le \dfrac{L}{d_{min}}$, hence $N_{C_{FTG_{min}}} \le N_{C_{FTG}} \le N_{C_{FTG_{max}}}$

Collaboration capacity: $CC = MIN\left( \dfrac{bw_o}{V_{pr}},\, s \right)$, where $s = f(bw)$ is the number of FTS streams the peer may serve, bounded by the share of P2P streams reserved for the FTS ($\%FTS_{P2P\_streams}$)

Service conditions - formation loop: while backup remains ($d \le L$), if the candidate has collaboration capacity then add it as collaborator to the FTG; otherwise (or when the group is complete) create a new FTG.
The FTS formation law - peers' bandwidth lower than playback rate ($bw < V_{pr}$)

Buffer and bandwidth constraints: $L = \sum_{i=1}^{N_{C_{FTG}}} d_i$ and $V_{pr} \le \sum_{i=1}^{N_{C_{FTG}}} bw_{o_i}$

FTG size: $N_1 = \dfrac{V_{pr}}{bw}$ (peers needed to source the full playback rate) and $N_2 = \dfrac{L}{d_{max}}$ (peers needed to hold the backup); the minimum group size combines $N_1$ and $N_2$, with a correction for the remainder $r$ of their division, and $N_{C_{FTG_{min}}} \le N_{C_{FTG}} \le N_{C_{FTG_{max}}}$

Collaboration capacity: $CC = \dfrac{bw_o}{V_{pr}}$, subject to the service conditions: $CC = s \cdot f(bw)$

Formation loop: while backup remains ($d \le L$), keep adding collaborators until the group satisfies both constraints; then close the FTG and start a new one - see the sketch after this slide.
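A compact sketch of this formation loop under simplifying assumptions (homogeneous candidates, each offering the same backup share d and output bandwidth bw_o; all names are illustrative, not taken from the thesis implementation):

```cpp
#include <vector>
#include <cstdio>

// Illustrative FTG formation for peers with bw_o < Vpr (simplified sketch).
// Assumption: homogeneous candidates, each contributing a backup share 'd' (MB)
// and an output bandwidth 'bwo' (kb/s).
struct FTG { int members = 0; double storedL = 0, aggregateBw = 0; };

std::vector<FTG> formGroups(int candidates, double L, double d,
                            double bwo, double Vpr) {
    std::vector<FTG> groups;
    FTG g;
    for (int i = 0; i < candidates; ++i) {
        g.members++; g.storedL += d; g.aggregateBw += bwo;
        // A group is complete once it can both hold the backup L and
        // source the full playback rate Vpr.
        if (g.storedL >= L && g.aggregateBw >= Vpr) {
            groups.push_back(g);
            g = FTG{};                       // start a new FTG
        }
    }
    return groups;                           // leftover partial group discarded
}

int main() {
    // 1500 kb/s playback, 300 kb/s peers, 113 MB backup in 25 MB shares
    auto gs = formGroups(100, 113.0, 25.0, 300.0, 1500.0);
    std::printf("formed %zu groups of %d members\n", gs.size(),
                gs.empty() ? 0 : gs[0].members);
}
```

With 300 kb/s peers and a 1500 kb/s playback rate this yields groups of five collaborators, consistent with the N_CFTG = 5 row of the 338 MB table shown earlier (the 25 MB share is a placeholder value).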
[Figure: collaboration example with $V_{pr}$ = 1500 kb/s. The Manager Node (500 kb/s) and collaborators C1 (200 kb/s), C2 (300 kb/s) and C3 (500 kb/s) jointly source the video: their aggregate output bandwidth, 500 + 200 + 300 + 500 = 1500 kb/s, matches the playback rate, with each peer sending an interleaved share of the blocks (A*, B*, …).]
The Fault Tolerance Scheme (FTS)
Creation of Fault Tolerance Groups:
I. The client contacts the Local Server.
II. The Local Server checks collaborator availability.
III. FTS acknowledgement: the client joins an existing FTG, or is put in standby status.
IV. A standby client starts a new FTG and becomes its Manager Node.
The Fault Tolerance Scheme (FTS)
FTG: complexity and maintenance
•MN failure: (I) the Local Server designates a new MN among the standby peers; (II) the FTG is restored.
•Member failure: the MN restores the FTG.
•Restoration complexity: O(N_CFTG).
Evaluation environment
Underlying network: GT-ITM topology generator, transit-stub model
•1 transit domain (3 routers)
•6 stub domains (54 routers)
Service schemes:
•ALM / tree-based P2P (P2Cast)
•IP Multicast / mesh-based P2P (PCM/MCDB)
Network protocols:
•Unicast: OSPF
•IP Multicast: IGMP and PIM-SM
Evaluation environment - network protocols

Protocol | Message | Description | Stage
IP Multicast (PIM-SM) | Hello | Sent periodically on each PIM-enabled interface, addressed to the all-PIM-routers multicast group. | Maintenance
IP Multicast (PIM-SM) | Join/Prune | A list of groups with Joined and Pruned sources per group, used to build the distribution tree. | Recovery / Maintenance
IP Multicast (IGMP) | Create Group Request | Requests the creation of a new transient host group. | Recovery
IP Multicast (IGMP) | General Query | Periodically solicits the group membership information. | Recovery
IP Multicast (IGMP) | Host Membership Report | Sent to the group address for each group to which a host desires to belong. | Recovery
Unicast (OSPF) | Hello | Performs neighbour discovery; sent continually to notice when connectivity has failed. | Maintenance
Unicast (OSPF) | Advertisement | Communicates the router's local routing topology to all other routers in the same OSPF area. | Maintenance
Conclusions - publications
•CEDI/JP07: The FMP; Load cost; PCM/MCDB; tree topology network
•CACIC07: Load cost; PCM/MCDB; centralised vs. distributed FMP; multicast vs. unicast
•JCS&T08: Load cost; PCM/MCDB; transit-stub network topology; centralised vs. distributed FMP
•EuroPar08: PCM/MCDB; Load cost; transit-stub network topology; Manager Nodes; control vs. multimedia traffic
•PDPTA09: Load cost; Time cost; P2Cast; FMP simulated results
•ICIP09: Load cost; Time cost; P2Cast; control vs. multimedia traffic (simulated results)