
IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 12, NO. 8, DECEMBER 2010 901

TURINstream: A Totally pUsh, Robust, and effIcieNt P2P Video Streaming Architecture

Andrea Magnetto, Rossano Gaeta, Marco Grangetto, Senior Member, IEEE, and Matteo Sereno, Member, IEEE

Abstract—This paper presents TURINstream, a novel P2P video streaming architecture designed to jointly achieve low delay, robustness to peer churning, limited protocol overhead, and quality-of-service differentiation based on peer cooperation. Separate control and video overlays are maintained by peers organized in clusters, which represent sets of collaborating peers. Clusters are created by means of a distributed algorithm and permit the exploitation of the participant nodes' upload capacity. The video is conveyed with a push mechanism that exploits the advantages of multiple description coding. The TURINstream design has been optimized through an event-driven overlay simulator able to scale up to tens of thousands of peers. A complete prototype of TURINstream has been developed, deployed, and tested on PlanetLab. We tested our prototype under varying degrees of peer churn, flash crowd arrivals, sudden massive departures, and limited upload bandwidth resources. TURINstream fulfills our initial design goals, showing low average connection, startup, and playback delays, a high continuity index, low control overhead, and effective quality-of-service differentiation in all tested scenarios.

Index Terms—Multiple description coding, peer-to-peer, PlanetLab testbed, push architecture, video streaming.

I. INTRODUCTION

PEER-TO-PEER (P2P) streaming has proved to be a viable and efficient approach to support the broadcasting of live or prerecorded media over the Internet. Peers contribute their uplink bandwidth by forwarding content to their connected peers. Global available resources (aggregate uplink bandwidth) grow as the number of peers increases, thus making the approach potentially able to scale to a large number of users.

Nowadays, P2P streaming architectures can be broadly classified into two classes: tree-based (e.g., [1]–[3]) and mesh-based (e.g., [4], [5]). In tree-based approaches (also termed push-based), the overlay is composed of several diverse trees that are used to multicast the video packets to the peers. The tree-based approach can easily exploit multiple description coding (MDC) [6], [7], where the video is encoded into a set of independent descriptions to be pushed across separate trees. The mesh-based approach is based on file-swarming mechanisms where participating peers form a randomly connected mesh and use gossip-like protocols for the creation and administration of the overlay; buffer maps are frequently exchanged among the

Manuscript received March 10, 2010; revised July 09, 2010; accepted September 06, 2010. Date of publication September 20, 2010; date of current version November 17, 2010. This work was supported in part by the FP7 COAST (FP7-ICT-248036) project, funded by the European Community. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Paal Halvorsen.

The authors are with the Università degli Studi di Torino, Dipartimento di Informatica, Torino, Italy (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TMM.2010.2077623

peers to signal the available video chunks. Each peer aims at retrieving the video stream by explicitly requesting its missing chunks (pull-based approach). Recently, [8] provided a comparison between the two approaches as well as an identification of their similarities and differences. Push/tree P2P systems generally guarantee lower startup and playback delays, which depend on the topological features of the overlay. Despite these advantages, the results in [8] indicate that the mesh-based approach generally exhibits superior performance over the tree-based approach. The main shortcomings of tree-based systems are reported to be the large overhead due to the organization of the tree topology, the difficulty of responding to the dynamics of the peers, and the suboptimal exploitation of the upload bandwidth. Nevertheless, the mesh-based approach suffers from other problems; although several mesh-based mechanisms that deploy MDC have been proposed [9], [10], the use of MDC techniques is not straightforward, since one has to guarantee that the descriptions follow independent distribution trees; indeed, the average decoded video quality is maximized if one assumes independent losses of the descriptions. Access control policies and service differentiation are difficult to implement. Furthermore, mesh-based approaches suffer from a tradeoff between control overhead and delay. In fact, to minimize delay, peers must notify their neighbors of available packets as frequently as possible, thus resulting in high control overhead. On the other hand, to reduce control messages, notifications must be aggregated over time, thus increasing the delay. An attempt to bring the benefits of tree-based approaches into the mesh-based scenario is presented in [11]. The authors propose a pull-push hybrid protocol where packets are pushed along trees that are dynamically created according to a pull-based protocol. In [12], another mesh-based system that uses a push data dissemination approach is presented.

In the present paper, we take a different viewpoint. In particular, we leverage the topological guarantees of a tree-structured overlay, yielding short delays and allowing us to exploit MDC video, while improving robustness to peer dynamics and bandwidth utilization. These two latter objectives have been addressed with the introduction of a small set of fully connected collaborating peers forming what we term a cluster. The contributions of this work are the design, simulation, and implementation of TURINstream, a novel multi-tree P2P streaming architecture where tree nodes are represented by clusters of peers. The concept of cluster has already been introduced in some previous research; as opposed to such related works, briefly recalled in Section V, where the clusters are organized in a hierarchical structure, the overlay topology proposed in this paper is very flexible and permits a more efficient exploitation of the resources.

1520-9210/$26.00 © 2010 IEEE


Fig. 1. TURINstream control and data topologies.

Separation of data and control overlays, cluster-based organization, exploitation of MDC techniques, and a tree-based control overlay are all exploited in the TURINstream design to achieve:

• a high degree of robustness, providing playback continuity in order to minimize freezing or blackout periods despite peer departures, failures, and churning. When a peer leaves or crashes, the cluster does not disconnect from the network and, while the cluster is being repaired, the video content keeps reaching the cluster members. At the same time, the probability of a joint failure of all the peers supplying the video to a cluster becomes negligible as the number of multiple trees increases;

• low connection, startup, and playback delays. These metrics all depend on the depth of the control tree, whose scaling is logarithmic with respect to the total number of participating peers;

• high scalability and low control overhead, to allow a very large number of users to join the application;

• rewards for altruistic (cooperative) users and penalties for selfish (uncooperative) ones. In fact, the join algorithm we define allows for the adoption of access control policies in terms of bandwidth and peer fairness; in other words, differentiated quality-of-service policies can be used to reward more collaborative peers, i.e., peers that share a higher uplink bandwidth.

The TURINstream design has been aided by an event-driven overlay simulator able to scale up to tens of thousands of users. Finally, the major contribution of the paper is the development of a complete prototype of TURINstream, which has allowed us to deploy and test the application on PlanetLab. The TURINstream architecture has been used to stream H.264/AVC video encoded with the MDC technique presented in [7]. Such an MDC codec is compliant with H.264/AVC and has allowed us to use the standard RTP [13] protocol to transport the video packets and the RTSP/SDP [14], [15] protocols to signal the video session. We tested our prototype under varying degrees of peer churn, flash crowd arrivals, sudden massive departures, and limited upload bandwidth resources. The main finding is that TURINstream fulfills our initial design goals, showing low average connection, startup, and playback delays, a high continuity index, low control overhead, and effective quality-of-service differentiation in all the scenarios we addressed.

The outline of the paper is as follows: Section II presents the protocol operations, Section III describes the simulator we developed, while Section IV presents the TURINstream prototype and the tests performed on PlanetLab. Finally, Section V discusses similar approaches in the literature, while Section VI concludes the paper with an outline of some future developments.

II. PROTOCOL DESCRIPTION

In this section, we describe the TURINstream protocol. The protocol is fully decentralized and is designed to define and maintain separate control and video streaming topologies, graphically shown in Fig. 1; this feature allows the video stream to keep flowing across the peers while the control overlay undergoes rearrangements in reaction to peer exits/crashes and bandwidth fluctuations.

A. Terminology and Notation

A peer is a host participating in the overlay. The node that injects the video stream into the network is called the root source; the proposed protocol can use multiple root sources, improving robustness and scalability, e.g., by streaming each description from a different source. The node that coordinates the root sources and is responsible for maintaining their neighborhood relationships is called the tracker. The tracker and the root sources can reside on the same host.

The key element of the proposed protocol architecture is the cluster. A cluster is a group of up to a maximum number of cooperating peers (a set of nodes enclosed by a dashed circle in Fig. 1), coordinated by a peer outside the cluster called the cluster-head.


TABLE I: TABLE OF SYMBOLS

In our protocol, each peer belongs to only one cluster. We denote the set of peers belonging to a cluster and, similarly, the set of clusters whose cluster-head is a member of a given cluster; every cluster is fed with video data by one or more external peers called sources. The cluster-head is constrained to be one of these sources. Every peer receiving a video packet from a source must share that packet with the remaining members of its cluster.

As for the video codec, we assume the use of MDC; therefore, the stream is composed of independent and complementary descriptions. Each description can be decoded independently, yielding the base video quality; such quality is improved by decoding more descriptions and depends only on the number of received descriptions. Each description is further subdivided into sub-streams called stripes, each requiring an equal share of the description bitrate. A stripe is identified by the pair of its description and stripe indices. While MDC works at the video coding level and generates independent and mutually refinable video streams, the stripe is introduced only at the transmission level, by selecting the packets of one description based on their counter identifier modulo the number of stripes. The stripe represents the minimum access unit to the video stream. This approach is followed to allocate the upload bandwidth contributed by each peer with a finer granularity than the number of descriptions.
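The packet-to-stripe mapping just described can be sketched as follows. This is an illustrative sketch in our own notation, not the authors' code: it assumes each packet of a description carries a sequence counter, and that the stripe index is simply the counter modulo the number of stripes per description.

```python
# Illustrative sketch (names are ours): mapping a video packet to its
# stripe. A stripe is identified by the pair (description, stripe index),
# where the stripe index is the packet counter modulo the number of
# stripes per description.

def stripe_of(description: int, counter: int, num_stripes: int) -> tuple[int, int]:
    """Return the stripe identifier (d, s) for a packet of a description."""
    return (description, counter % num_stripes)

# With 2 stripes per description, consecutive packets of description 0
# alternate between stripes (0, 0) and (0, 1).
assignment = [stripe_of(0, c, 2) for c in range(4)]
```

With this interleaving, losing one stripe removes every second packet of a description rather than a contiguous run, which is what makes the stripe a convenient unit for splitting a description's bandwidth across peers.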

Every peer can share a portion of its upload bandwidth for both intra-cluster and inter-cluster communications. To simplify the management of the upload bandwidth of a peer, we introduce the concept of slot, defined as the bandwidth required to upload a single stripe. The total number of slots available at a peer, i.e., the maximum number of stripes it can upload, is obtained by dividing its shared upload bandwidth by the stripe bitrate. At a given time instant, each peer is characterized by a certain slot allocation, split among the slots used for intra-cluster communications, the slots used for inter-cluster communications, and the free slots still available at the peer.

Please note that most of the previous definitions are time dependent; we make the time dependency explicit only where strictly necessary. To ease the task of the reader, the notation used in this paper is summarized in Table I.
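The slot bookkeeping above can be made concrete with a minimal sketch. The class name, the stripe bitrate, and the `reserve` helper are our own illustrative assumptions; the invariant they maintain (intra + inter + free = total slots) is the one stated in the text.

```python
# Minimal sketch (not the paper's code) of per-peer slot accounting:
# a peer's total slots are partitioned into intra-cluster slots,
# inter-cluster slots, and free slots.

STRIPE_BITRATE_KBPS = 100  # hypothetical stripe bitrate

class SlotAccount:
    def __init__(self, upload_kbps: int):
        # total slots = floor(shared upload bandwidth / stripe bitrate)
        self.total = upload_kbps // STRIPE_BITRATE_KBPS
        self.intra = 0   # slots serving the peer's own cluster
        self.inter = 0   # slots serving other clusters (as a source)

    @property
    def free(self) -> int:
        # invariant: intra + inter + free == total
        return self.total - self.intra - self.inter

    def reserve(self, intra: int = 0, inter: int = 0) -> bool:
        """Reserve slots if enough free capacity remains."""
        if intra + inter > self.free:
            return False
        self.intra += intra
        self.inter += inter
        return True

acc = SlotAccount(upload_kbps=450)   # 4 slots at the assumed stripe bitrate
acc.reserve(intra=1, inter=2)        # 1 slot left free
```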

B. Data Delivery Overlay

A peer can decode the video at full quality provided that its cluster receives all the stripes. Nonetheless, the use of MDC guarantees service continuity, albeit at a lower level of fidelity, when only a subset of the descriptions is received. TURINstream distributes the video stream to the clusters by building and maintaining a video data distribution overlay that is a forest of trees, one for each stripe. Fig. 1 depicts an example of a multi-tree for the distribution of two descriptions with one stripe per description.

1) Data Sources for a Cluster: For a given cluster, each stripe is provided by a source that does not belong to the cluster. Each packet is forwarded to a single member of the cluster, which in turn propagates the information to all its cluster-mates. For each packet of a given stripe, the recipient is selected within the cluster in a round-robin fashion among the peers that have enough upload resources according to the bandwidth management algorithm described in Section II-C5. It is then possible to define the set of cluster sources of a cluster, as well as the set of clusters receiving a given stripe from a given peer.

To better exploit the advantages of MDC, TURINstream aims at guaranteeing that a source serves a cluster only for stripes belonging to a single description, so that, if a source stops serving a cluster, this most likely impacts only the delivery of one description until a new source replaces the departed one. This topological feature is enforced as far as possible by the bandwidth allocation algorithm described in Section II-C5. Indeed, it is of paramount importance in order to make the delivery paths of the video descriptions as independent as possible, so as to maximize the average quality of the received video.

2) Cluster Level: Every cluster is characterized by its distance from the root source, called its level. The level of a cluster does not change over time and represents an upper bound to the number of clusters to be traversed by any path of the video data distribution graph from the root source to the cluster. The root source has level 0, clusters fed directly by the root source have level 1, and so on. On the contrary, a peer can move across the clusters, and its level changes over time in response to network dynamics, as described in the following.

We impose that the sources of a cluster belong to clusters whose level is strictly lower than that of the cluster they serve. This constraint guarantees that the video packets of each stripe are forwarded along acyclic paths. Nevertheless, we do not impose that sources belong to the immediately preceding level, obtaining a more flexible data overlay structure, which in turn makes it more likely to find spare upload resources during overlay rearrangements. An example of a data link between levels 1 and 3 is shown in Fig. 1. Since a level 1 cluster is constrained to receive the video stream directly from the root source, it follows that the number of such clusters is bounded by the upload capacity of the root source.

3) Forwarding Algorithm: Content distribution occurs from sources to clusters. Upon reception of a data packet of a stripe, a peer is responsible for intra-cluster sharing. All members of the cluster are guaranteed to receive all packets of each stripe if

(1)


holds; in fact, condition (1) requires that the slots contributed by the cluster members cover the intra-cluster distribution of every stripe, once the slots pushed by the external sources are accounted for.

If a peer is also a source for higher-level clusters, it must forward the packet using inter-cluster communications. The main problem is how to select one recipient peer in each served cluster. In particular, a source selects the recipient depending on the forwarding activity already performed by the candidate peers and on the slot allocation described in Section II-C5.

Fig. 1 (bottom-right corner) shows an example of packet forwarding involving three peers: node A is sent a packet from a lower-level cluster, and then A forwards it to its served clusters and to its cluster members B and C. Let us assume that the next packet of the same stripe hits B; now B will share it with C and A, which in turn will forward it downward in the tree. From the previous example, it is clear that each video packet takes at most two hops to traverse a cluster. Since every cluster is fed by only one peer for a given stripe, it follows that each stripe spans a tree rooted at the root source; the complete overlay for video data delivery is a forest of trees, one for each stripe.
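The two-hop push just described can be sketched in a few lines. This is an illustrative toy (peer names and the round-robin helper are our own assumptions, and the real protocol also checks per-peer slot availability, which is omitted here): a source pushes each packet of a stripe to one cluster member, chosen round-robin, and that member relays it to its cluster-mates.

```python
# Toy sketch of intra-cluster push forwarding (not the authors' code):
# hop 1 is source -> chosen recipient, hop 2 is recipient -> cluster-mates,
# so a packet crosses a cluster in at most two hops.

from itertools import cycle

def deliver(packet, cluster_members, rr):
    """Push one packet into a cluster; return (recipient, relay targets)."""
    recipient = next(rr)                      # round-robin choice by the source
    mates = [p for p in cluster_members if p != recipient]
    return recipient, mates

members = ["A", "B", "C"]
rr = cycle(members)                 # round-robin recipient selector
first = deliver("pkt0", members, rr)
second = deliver("pkt1", members, rr)
```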

C. Control Topology

A control infrastructure is required to organize and maintain peers in clusters as they join and leave the application. To this end, one of the sources of a cluster is also responsible for the management of the cluster and is called the cluster-head. The control topology is composed of the edges from cluster-heads to the peers belonging to the clusters they head. Every connected peer depends on only one cluster-head. We will refer to the cluster-head of a cluster as its father and to the members of the cluster as the cluster-head's children. Since the role of the cluster-head can be played by any of the sources, the control topology is usually different from any distribution topology. In the following, we describe all the protocol operations.

1) Peer Join: The join procedure allows a new peer to enter the overlay network. The same operation can also be performed by a peer that is already connected but whose video continuity is not satisfactory. When a peer wants to join the network, it first retrieves the address of the tracker and sends it a message containing its number of available slots. The tracker can assign the peer to a cluster served by a root source (a level 1 cluster), or it can provide the peer with the address of a level 1 cluster-head (responsible for a cluster at level 2); the level 1 cluster-head is then contacted to recursively repeat the join operation. Reiterating this procedure, a peer can join at any level of the network. In other words, the peer follows a path along the control tree until it finds a cluster that can host it.

Acceptance of a new peer in a cluster is decided by the corresponding cluster-head. If the cluster is not full, the cluster-head must guarantee that (1) still holds after the admission. In the case of a full cluster, the new peer is accepted if and only if its slot contribution exceeds that of the least contributing member by a given factor. In such a case, the worst peer, i.e., the one contributing the fewest slots, is substituted by the new one. The worst peer uses a simplified join procedure and is demoted to a higher-level cluster. The acceptance factor must be greater than 1 to avoid frequent peer demotions. On the other hand, large values would result in a very strict criterion for the acceptance of a new peer. This cluster admission policy is introduced to ensure that more altruistic peers are placed closer to the root source, enjoying lower delays and higher stability in return for their higher contribution to the overlay.
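The admission rule can be sketched as follows. The factor `alpha` and the helper names are our own placeholders (the paper's symbols were lost in extraction); the sketch also leaves out the check on condition (1) for non-full clusters, which we only note in a comment.

```python
# Hedged sketch of the cluster admission policy (names and the value of
# alpha are hypothetical). A full cluster accepts a newcomer only if its
# slot contribution exceeds the worst member's by a factor alpha > 1; the
# worst member is then demoted via a simplified join.

def admit(new_slots: int, member_slots: list[int], max_size: int, alpha: float = 1.5):
    """Return (accepted, index of demoted member or None)."""
    if len(member_slots) < max_size:
        return True, None          # room left: accept (subject to condition (1))
    worst = min(range(len(member_slots)), key=member_slots.__getitem__)
    if new_slots > alpha * member_slots[worst]:
        return True, worst         # newcomer replaces the worst peer
    return False, None

ok, demoted = admit(new_slots=5, member_slots=[2, 4, 6], max_size=3)
```

A small `alpha` close to 1 would let marginally better peers keep displacing members, causing churn of demotions; a large `alpha` makes full clusters nearly closed, which matches the tradeoff discussed in the text.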

2) Peer Leave: A peer can leave the network at any time, notifying the following peers:
• the other members of its cluster;
• the sources of its cluster;
• the members of the cluster it serves (if it is a source);
• the other sources of the cluster it serves (if it is a source).
When the departure of a peer is silent, e.g., due to a software or hardware crash, its neighbors can infer it with some delay from the lack of periodic keep-alive messages. Keep-alive messages are sent from the members of a cluster to their cluster-head. This type of message carries information about the peer's free slots and about its subtree (when the reporting peer is a cluster-head itself). The cluster-head runs a timeout for each of the members of its cluster; receipt of this message from a peer resets the corresponding timeout. The expiration of this timeout is equivalent to the reception of a quit message, so the cluster-head removes the peer from the list of members of that cluster and informs the other members and sources accordingly. Keep-alive messages are also exchanged among the sources of a cluster (one of them being, of course, the cluster-head) to detect silent departures of sources (and cluster-heads) from the network. This allows all sources of a cluster to be informed about the departure and enables the cluster-head to notify all the members about the absence. Such messages are piggybacked along with the control information required for peer bandwidth management, described in Section II-C5.

After a departure, several actions take place:
• all other members of the departed peer's cluster and all its sources stop sending packets to it;
• when a cluster-head detects that the leaving peer is one of the sources of its cluster, it starts the repair process. The cluster-head sends a message to its father containing its IP address, the level of the cluster to be repaired, and the missing stripe. If one of the father's children has enough upload bandwidth to replace the source for the missing stripe, the repair procedure succeeds. If there is a child that is serving as the cluster-head of a sub-tree with spare resources (gathered through the previously mentioned periodic reports), the repair request is forwarded downward in the sub-tree. Otherwise, the repair message is forwarded upward to a lower-level father, thus following a path along the control tree looking for a cluster-head able to attempt the repair. To avoid cyclic paths, the repair message changes direction along the control tree only once. Moreover, the selected source must satisfy the service constraints depending on the cluster level and the requested stripe. Since this process is not guaranteed to be successful, it can be repeated after the expiration of a timeout. It is worth pointing out that, in the presence of a limited number of available slots, a higher priority is given to the repair requests issued by the lowest-level clusters;
• if the departed peer is a cluster-head, another source is elected cluster-head. If the sources of a cluster detect the silent departure of the cluster-head, a successor is univocally self-elected on the basis of a priority policy based on the values of the stripe identifiers. In particular, the cluster source forwarding the stripe with the lowest identifiers is the one taking the lead.
If massive departures occur and all the sources of a cluster stop serving it, the cluster becomes disconnected from the network; in this unlikely case, the cluster peers reconnect by repeating the join procedure.
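The keep-alive timeout mechanism used to detect silent departures can be sketched as follows. The timeout value and class names are our own assumptions; the behavior (one deadline per member, reset on keep-alive, expiry treated like a quit message) follows the text.

```python
# Sketch of silent-departure detection via keep-alive timeouts (names
# and the timeout value are hypothetical). The cluster-head keeps one
# deadline per member; a keep-alive resets it, and an expired deadline
# is handled exactly like an explicit quit message.

KEEPALIVE_TIMEOUT = 10.0  # seconds, hypothetical

class ClusterHead:
    def __init__(self, members, now=0.0):
        self.deadline = {m: now + KEEPALIVE_TIMEOUT for m in members}

    def on_keepalive(self, member, now):
        self.deadline[member] = now + KEEPALIVE_TIMEOUT  # reset the timer

    def expired(self, now):
        """Members considered silently departed at time `now`."""
        gone = [m for m, t in self.deadline.items() if now >= t]
        for m in gone:
            del self.deadline[m]  # equivalent to receiving a quit message
        return gone

head = ClusterHead(["A", "B"], now=0.0)
head.on_keepalive("A", now=8.0)     # A reports in time
departed = head.expired(now=12.0)   # B missed its keep-alive
```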

3) New Cluster Creation: A new cluster can be created below an existing cluster; this decision is taken by the cluster-head, which appoints some of its children as sources of the new cluster. The control information required for this operation is local to two levels of the control topology. To increase stability, the set of sources is not filled entirely using the children of the creating cluster. The remaining sources are gathered according to the repair procedure, so as to minimize the overlap between the control and data topologies, thus limiting the probability of cascading cluster failures. The new cluster is initially empty and is populated by successive join requests.

The decision to create a new cluster is taken only when the current clusters are almost full and it is likely that there are enough upload resources to support it. These conditions guarantee that the control and distribution trees are kept as compact as possible, to limit the communication delays, and that the new clusters are well connected to the rest of the overlay. In detail, the cluster is created if the following conditions are jointly satisfied:

• the population of the cluster is close to its maximum value, up to a suitable constant;

• the clusters headed by its members are well populated, again up to a suitable constant; the corresponding population counts are sent as control information by the cluster-heads of the next level and represent the only information that is not directly available at the local cluster-head;

• the cluster members have enough free slots to support the new cluster.

In particular, the required number of free slots is made proportional to the number of controlled clusters.
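The three creation conditions can be sketched jointly. The constants `near_full` and `min_child_pop` stand in for the paper's constants (whose symbols were lost in extraction), and the proportionality between required free slots and the number of controlled clusters follows the text.

```python
# Illustrative check of the three cluster-creation conditions (all
# constants and names are hypothetical stand-ins for the paper's
# symbols).

def should_create_cluster(pop, max_pop, child_pops, free_slots,
                          slots_per_cluster, near_full=0.9, min_child_pop=0.8):
    # condition 1: this cluster's population is close to its maximum
    almost_full = pop >= near_full * max_pop
    # condition 2: the clusters headed by its members are well populated
    children_ok = all(p >= min_child_pop * max_pop for p in child_pops)
    # condition 3: free slots proportional to the number of served
    # clusters, including the new one
    enough_slots = free_slots >= slots_per_cluster * (len(child_pops) + 1)
    return almost_full and children_ok and enough_slots

ok = should_create_cluster(pop=9, max_pop=10, child_pops=[9, 8],
                           free_slots=12, slots_per_cluster=4)
```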

4) Peers Promotion and Demotion: As already mentioned,peers are not constrained to a given cluster but can move in re-sponse to network dynamics. Such migrations are implementedby means of a simplified join procedure. In particular, thecluster-head of a full cluster can move the worst peer, i.e., theone with the minimum contribution in slots, to a higher levelcluster as far as a better peer joins its cluster. In the oppositecase, a cluster-head that notes that the population of clusteris falling below a certain threshold has means to select the bestpeers at level larger than by exploiting the periodicinformation sent by its children. As a consequence, it is possibleto move the best peers of the higher levels to the cluster .This approach keeps the overlay compact allowing to close thefurthest clusters when the network population decreases.

As a last resort, a peer is allowed to disconnect and reconnect in a new cluster if its quality-of-service is not satisfactory or because of the simultaneous departure of all of its sources.
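The promotion/demotion rule for a full cluster can be sketched as follows. The data structures are assumptions for illustration, not the prototype's internal representation:

```python
# Illustrative sketch of peer demotion: when a better peer joins a full
# cluster, the worst current member (minimum slot contribution) is
# displaced toward a higher-level cluster via a simplified join.
def admit_to_full_cluster(cluster, candidate):
    """cluster: list of peer dicts with a 'slots' contribution.
    Returns the peer that must be moved to a higher-level cluster."""
    worst = min(cluster, key=lambda peer: peer["slots"])
    if candidate["slots"] > worst["slots"]:
        cluster.remove(worst)
        cluster.append(candidate)
        return worst          # displaced toward a higher level
    return candidate          # redirected to a higher level
```

The symmetric case (an under-populated cluster pulling the best peers down from higher levels) follows the same pattern with the comparison reversed.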

5) Upload Bandwidth Management: The management of the cluster upload bandwidth is carried out by the cluster-head.

This is achieved on the basis of the periodic keep-alive messages sent by its children, which report the slot availability of each peer; clearly, only peers that are cluster sources contribute forwarding slots. The cluster-head's objective is the allocation of the slots for intra-cluster streaming under the constraint imposed by the available upload bandwidth; the overall upload bandwidth required by a cluster is proportional to the number of descriptions that can be supported. In fact, especially in leaf clusters that may host peers with limited resources, churning can limit slot availability for a certain period of time, thus forcing the cluster-head to reduce the streaming rate by dropping some descriptions. If the cluster peers were homogeneous in terms of upload capacity, a uniform allocation would be optimal. In practice, peers are not homogeneous, making this simple allocation sub-optimal. Moreover, within TURINstream, we take into account the presence of MDC video coding so as to improve the user experience, avoiding, if possible, that the same peer relays more than one description. To this end, the cluster-head performs slot allocation at a finer level of granularity, fixing the number of slots per peer and per description; the number of slots allocated to a description in the cluster is then the sum of the per-peer allocations for that description.

Therefore, the slot allocation sought by the cluster-head must cover the slots required by each description; moreover, if possible, a constraint is enforced that prevents a peer from forwarding more than one description.

The allocation is refined dynamically, starting from the initial value assigned when a new peer joins the cluster.

The algorithm used in TURINstream aims at achieving a balanced slot allocation. Every time a peer enters or leaves the cluster, the cluster-head performs the following steps:

• Update the number of supported descriptions by selecting the maximum number satisfying (1), given the slots available in the cluster; then compute the average peer contribution.

• Remove slots allocated in excess of the requirements (starting from peers holding more descriptions).

• For every peer forwarding more than one description, keep the allocation only for one description and release the others.

• For every peer whose allocation exceeds its upload capacity, release the excess slots.

• For every description still lacking slots, find additional resources from the free slots of the peers according to the following priorities:

1) find every peer already forwarding that description and increment its allocation by using some or all of its free slots, without exceeding the average contribution;

2) find every peer with free slots that forwards no description and increment its allocation while keeping it within the average;

3) find every peer already forwarding that description and increment its allocation beyond the average;

4) find every peer forwarding another description and increment its allocation without exceeding the average;

5) find every other peer and increment its allocation.
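The priority search above can be condensed into a simplified sketch. This is not the paper's algorithm verbatim: the data structures are invented, and the five priority levels are collapsed into a single ordering that prefers peers already carrying a description or carrying nothing over peers busy with another description.

```python
# Compact, simplified sketch of the cluster-head's slot allocation.
def allocate_slots(capacity, need, num_descriptions):
    """capacity: dict peer -> free upload slots.
    need: list of slots still required per description.
    Returns alloc[peer] = per-description slot counts, trying to avoid
    giving any peer more than one description."""
    alloc = {p: [0] * num_descriptions for p in capacity}
    free = dict(capacity)

    def carries(p):
        return [d for d in range(num_descriptions) if alloc[p][d] > 0]

    for d in range(num_descriptions):
        # Peers already forwarding d or forwarding nothing come first;
        # peers busy with a different description are used last.
        order = sorted(free,
                       key=lambda p: len(carries(p)) > 0 and d not in carries(p))
        for p in order:
            while need[d] > 0 and free[p] > 0:
                alloc[p][d] += 1
                free[p] -= 1
                need[d] -= 1
    return alloc
```

With two peers offering two slots each and two descriptions needing two slots each, the sketch assigns one description per peer, illustrating the "one description per peer" preference.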


Given the selected number of descriptions, the slot allocation of each child is known and can be used by the cluster-head to drive the repair and new-cluster creation procedures described above. Finally, the slot allocations are shared with all the other cluster sources, which must perform the actual forwarding of the video data accordingly. It is worth pointing out that the slot allocation is time dependent because of the dynamics of the cluster population and the resources it can provide; let us make this dependence explicit by indexing the allocation and the cluster population by the time instant. At any time, each source can evaluate the average bandwidth utilization of a peer for a given description, defined as the average ratio between the bandwidth spent to forward video packets to that peer since it joined, measured by the source according to the size of the transmitted packets, and the bandwidth allocated by the cluster-head. Each source then selects as destination of the next video packet of a description the peer exhibiting the lowest utilization.
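The scheduling rule can be sketched as follows; the field names are illustrative, but the logic follows the definition above (bits pushed since join, divided by the allocated bandwidth times the elapsed time):

```python
# Sketch of the push scheduling rule: each source sends the next packet
# of a description to the peer with the lowest bandwidth utilization.
def next_destination(peers, now):
    """peers: list of dicts with 'joined' (s), 'bits_sent', 'alloc_bps'."""
    def utilization(p):
        elapsed = max(now - p["joined"], 1e-9)   # avoid division by zero
        return p["bits_sent"] / (p["alloc_bps"] * elapsed)
    return min(peers, key=utilization)
```

Because the utilization normalizes by time since join, a freshly joined peer is not penalized for having received little data so far.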

III. PROTOCOL SIMULATOR

The first step towards the development of the TURINstream prototype has been the design of an overlay simulator to aid the development of the algorithms and the optimization of the protocol parameters, e.g., the number of stripes, the number of descriptions, and the number of peers per cluster. The simulator has been used to perform most of the optimization of the protocol before actually deploying it in a dynamic and distributed environment. In Section III-A we describe the simulator, in Section III-B the effect of the most critical protocol parameters is analyzed, and finally in Section III-C the ability of TURINstream to scale to large networks is tested.

A. Simulator Description

The behavior of the TURINstream protocol has been emulated by means of an event-driven simulator. The simulator can handle networks of several thousands of peers and analyze their performance over time. The peer/cluster relations are modeled by a bipartite graph composed of the objects Peers and Clusters; bidirectional edges link the two classes and form a graph that models the TURINstream overlay topology. The most important simulated events are peer arrivals and departures (both notified and silent), cluster repairs, and peer reconnections. Random latencies are used to accomplish each operation so as to simulate network delays. The simulator does not keep track of every message, so it can easily manage the dynamics of large networks for a long simulated time. In order to test the proposed protocol in a realistic scenario, we adopted two different statistical models for peer arrival and sojourn times. In the first scenario, in the following referred to as departure process 1 (DP1), arrivals and departures are distributed according to negative exponential distributions, as in an M/M/∞ queue. In the second scenario, a model where burst departures concentrate in short periods is considered (DP2). In particular, the simulated time interval is divided into cycles of one hour and, at the end of every cycle, 50% of the peers disconnect within a time interval of 100 s. The first model represents a behavior where departures occur continuously, keeping the cluster repair system under pressure. The second is representative of mass peer departures, e.g., after the end of a TV program, causing a major reorganization of the overlay. All simulations begin with an empty network, growing dynamically with peer arrivals; after an initial transient, which is not considered in the computation of the performance indexes, the network size becomes stable (in the case of DP1) or begins its cyclical behavior (DP2). In both DP1 and DP2, 20% of the departures are treated as silent to simulate unexpected peer crashes.
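The two churn models can be sketched as event generators; the function shapes are assumptions matching the textual description (exponential sojourn for DP1, half the population leaving within a 100 s window at each cycle end for DP2):

```python
import random

# Illustrative sketch of the two departure processes used by the simulator.
def dp1_departure(now, mean_sojourn):
    """DP1: a peer's departure time is now + an exponential sojourn."""
    return now + random.expovariate(1.0 / mean_sojourn)

def dp2_departures(peers, cycle_end, window=100.0, fraction=0.5):
    """DP2: at the end of each one-hour cycle, `fraction` of the peers
    leave within `window` seconds. Returns peer -> departure time."""
    leaving = random.sample(peers, int(len(peers) * fraction))
    return {p: cycle_end + random.uniform(0.0, window) for p in leaving}
```

DP1 keeps the repair machinery under constant pressure, whereas DP2 concentrates the stress in short bursts.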

Moreover, the simulator can handle peers with heterogeneous upload capacities. To obtain realistic results, all the following experiments have been worked out assuming a typical upload distribution, where the majority of peers access the network through connections with limited upload bandwidth. The upload bandwidth distribution used in all the reported experiments is 384 kbps (72%), 512 kbps (24%), and 768 kbps (4%). The video bitrate has been kept fixed, and a single root source node with fixed upload capacity initiates the video streaming.

The simulator's goal is the computation of the percentage of time a peer receives the video. We made the pessimistic assumption that a description cannot reach a cluster if at least one of the paths from the root to the cluster transporting that description is broken somewhere. In other words, we assume that a video description is completely useless as soon as a single packet of that description is missing. This assumption does not hold for real video players, where a limited number of packet losses can be mitigated, e.g., by means of error concealment. At any time, a peer can be in one of several states: it can be disconnected while trying to reconnect, or it can belong to a cluster that receives a given number of descriptions. During the simulation, the time spent by each peer in every possible state is accumulated, and the continuity index is evaluated from these state occupancy times.
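One plausible reconstruction of this computation is the fraction of a peer's time spent receiving at least one description; the exact formula may weight partially served states differently, so the following is only an assumption-laden sketch:

```python
# Plausible reconstruction of the simulator's continuity index: the
# fraction of time spent receiving at least one description.
def continuity_index(time_in_state):
    """time_in_state[k] = time spent receiving k descriptions;
    time_in_state[0] covers disconnected/no-description periods."""
    total = sum(time_in_state)
    served = sum(time_in_state[1:])
    return served / total if total > 0 else 0.0
```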

Moreover, the simulator reports some topological indexes, namely the maximum and average height of the control tree.

B. Protocol Parameters Optimization

As a first result, the simulator has been used to select reasonable values for the following parameters: the constants that manage peer admission in a full cluster and those that control the creation of new clusters. Such values have been chosen by studying their effects on the overlay topology and on the video service in terms of the continuity index.


Fig. 2. CI (a) and control tree maximum and average height (b) versus the number of peers per cluster, for an overlay size of 1000, using the DP1 and DP2 models.

Fig. 3. CI versus the average overlay size for two stripe/description configurations, for the DP1 and DP2 models.

The most important parameters of TURINstream are the cluster size and the number of stripes in which the video is fragmented. In Fig. 2(a), the CI is reported versus the cluster size for an overlay of 1000 peers, for both the DP1 and DP2 dynamic models. In Fig. 2(b), we show the mean and maximum tree height versus the cluster size. As expected, larger clusters yield a more reliable and shorter control overlay network. On the other hand, using very large clusters is not feasible because of the increased upload required for a peer to share the video data with all its cluster-mates, and the larger amount of control information to be managed by the cluster-head. It turns out that an intermediate cluster size is optimal from the point of view of both overlay reliability and control tree height. Fig. 2 also points out that the mass departure scenario (DP2) is the most critical one, yielding a lower CI and a higher maximum height; nevertheless, this effect is particularly evident only when using small clusters.

C. Protocol Scalability

The simulator has been used to test the behavior of the TURINstream overlay when the number of peers in the network increases up to the tens of thousands. The CI, the height of the control tree, and the number of messages required for frequent protocol operations have been studied as a function of the average overlay size, from 1000 to 50 000 peers.

In Fig. 3, the CI is shown for the DP1 and DP2 models. It can be noticed that TURINstream is able to scale to tens of thousands of users without a significant impact on the continuity index, which remains above 0.98 for both DP1 and DP2 peer dynamic models. Moreover, these experiments have been repeated for two configurations of the stripe settings, so as to compare the performance of the protocol when the number of stripes is constrained to 4 but the number of video descriptions changes. Fig. 3 shows the improvements in terms of CI obtained using 4 descriptions (see square markers); clearly, this advantage is due to the fact that, with 4 descriptions, the reception of a single stripe guarantees service continuity, even if with proportionally reduced video quality. Nevertheless, such a gain is achieved at the expense of higher video encoding and decoding computational costs and a higher bitrate overhead with respect to standard video coding [7], [16]. The observation that TURINstream achieves a good performance with few descriptions is of paramount importance, since most of the MDC video coding schemes available in the literature are designed for 2 or 4 descriptions [7], [16]–[18].

In Fig. 4(a), the average and maximum heights of the tree of clusters built by TURINstream are reported versus the network size. It turns out that the average height of the control tree has a logarithmic dependence on the overlay size (a logarithmic function is plotted for comparison). Clearly, a compact topology is able to guarantee low control and playback delays. Finally, in Fig. 4(b), we show the average number of messages required for two TURINstream operations, namely the join and repair procedures. It turns out that a limited number of messages needs to be exchanged to accomplish the two tasks, even for very large networks. The repair procedure, taking on average less than five control messages, turns out to be quite efficient and points out that the identification of additional resources requires only a local visit of the control tree. These achievements translate into the low protocol control overhead measured on the full prototype presented in the following section.

To conclude, for the prototype implementation and experimentation presented in Section IV, we selected the cluster size and 4 stripes by an appropriate setting of the stripe and description parameters. The cluster size has been selected as a compromise between reliability and upload bandwidth requirements for intra-cluster communications. The number of stripes has been selected in order


Fig. 4. Control tree maximum and average height (a) and average number of messages per operation (b) versus the average overlay size, for the DP1 and DP2 models.

to accommodate the efficient MDC video coding technique recalled in the following, which allows us to perform real-time encoding of live video.

IV. PROTOTYPE DEVELOPMENT AND EXPERIMENTAL RESULTS

In the following, we present details on the implementation of a prototype of TURINstream (Section IV-A) that we deployed and tested on PlanetLab (Section IV-B). We discuss results for the performance indexes defined in Section IV-C, obtained in several scenarios aimed at proving that TURINstream fulfills all its design goals. In particular, Section IV-D presents results when peers alternate between active and idle periods; in this case, the performance of TURINstream is compared against that presented in [4]. The scalability properties of TURINstream are investigated in Section IV-E. Different arrival and departure patterns are considered in Section IV-F, where most users join (flash crowd) and leave (mass departure) the application in a very short time. Finally, Section IV-G presents results in scenarios where the global amount of available upload resources (the so-called resource index [19]) is rather limited, i.e., less than 1.

A. Prototype Description

A complete prototype of the proposed P2P streaming protocol has been developed in C++. The application includes the client and the server implementing the protocol primitives described in this paper. The P2P protocol uses only UDP sockets so as to minimize transport delay and overhead. The server is equipped with a modified VideoLAN [20] server in order to encode and stream the video according to the MDC algorithm presented in [7]. The client is able to receive the video descriptions, decode them, and play the video. The video descriptions are transported using RTP on top of UDP, whereas the video session and the decoder are set up by means of SDP.

TURINstream is not constrained to the use of a particular MDC algorithm. Nevertheless, the selection of the MDC video codec is important from the point of view of the added encoder/decoder complexity and coding overhead with respect to standard single description coding. The prototype employs the MDC video technique of [7], which is designed to create two complementary video streams. According to this technique, every video picture is fragmented into a number of slices; MDC coding yields two alternative representations of each slice, i.e., one packet for each description bitstream to be forwarded by the TURINstream protocol. This MDC technique has a number of desirable features: first of all, it is compliant with the H.264/AVC standard [21], which achieves state-of-the-art compression performance and allows one to use standard transport and signaling protocols. Moreover, as opposed to other solutions, the MDC coding overhead,1 expressed as the rate penalty with respect to standard video coding, can be controlled and freely adjusted according to the desired robustness/video quality trade-off. In other words, the MDC overhead can be selected so as to be hardly noticeable in the absence of packet losses, while guaranteeing a more graceful degradation of the video quality when the P2P overlay is under stress. In the following experiments, an MDC coding overhead ranging from 5% to 10% of the rate has been used. Finally, the effort required on the decoder side is limited: a simple RTP packet-level selection algorithm is followed by a standard H.264/AVC player. The two descriptions are received as two separate RTP flows; the two streams are synchronized exploiting the RTP timestamps and merged into a single AVC/H.264 RTP flow containing the best available representation of each slice. The output stream can be decoded by any standard player.
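The packet-level merge described above can be sketched as follows. The flow layout and the per-packet quality field are assumptions for illustration; the real prototype operates on RTP packets and their timestamps:

```python
# Sketch of the MDC merge: two flows (one per description) are aligned
# by RTP timestamp and, for each slice, the best available
# representation is kept; missing slices fall back to whichever
# description actually arrived.
def merge_descriptions(flow_a, flow_b):
    """Each flow maps rtp_timestamp -> packet dict with a 'quality' field."""
    merged = {}
    for ts in sorted(set(flow_a) | set(flow_b)):
        a, b = flow_a.get(ts), flow_b.get(ts)
        if a and b:
            merged[ts] = a if a["quality"] >= b["quality"] else b
        else:
            merged[ts] = a or b
    return merged
```

This is precisely why a discontinuity occurs only when both descriptions of a slice are lost: a single surviving packet still yields a decodable, lower-quality representation.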

The TURINstream client and server applications are composed of four main modules:

• a thread that listens to the socket for incoming messages and processes them;

• an event list that is used to implement timeouts and periodic messaging;

• a video buffer where the incoming descriptions are synchronized and merged into a single video stream for playout according to [7];

• a queue that manages outgoing messages (control and video); in this queue, smaller control packets are given higher priority so as to guarantee that critical operations, e.g., overlay repairs, are completed as quickly as possible.
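The outgoing queue's priority policy can be sketched with a standard binary heap; the class shape and priority values are assumptions, not the prototype's C++ implementation:

```python
import heapq

# Sketch of the outgoing queue that favors small control packets over
# video packets so that repairs complete quickly.
class OutgoingQueue:
    CONTROL, VIDEO = 0, 1   # lower value = higher priority

    def __init__(self):
        self._heap, self._seq = [], 0

    def push(self, msg, is_control):
        prio = self.CONTROL if is_control else self.VIDEO
        heapq.heappush(self._heap, (prio, self._seq, msg))
        self._seq += 1      # sequence number preserves FIFO order per class

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

The monotonically increasing sequence number breaks ties, so messages of the same class are still delivered in FIFO order.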

1All MDC coding techniques incur a rate penalty with respect to standard video coding. The coding overhead allows one to decode the descriptions independently and makes them mutually refinable.


B. Experimental Testbed

The developed prototype has been initially validated by means of a local testbed; it has then been deployed on the PlanetLab platform to test the application in a realistic Internet scenario and to compare it against other solutions.

We considered two scenarios:

• in the first setting, we used our local PlanetLab node to host the video server (the root source) and a subset of other active PlanetLab nodes to run full clients, i.e., clients able to receive, mix, and decode the video. We conducted experiments with a number of active PlanetLab nodes ranging from 50 to 200. We refer to this case as the full scenario;

• the second setting aims at analyzing the performance of TURINstream when scaling from hundreds to thousands of users. Since there are usually slightly more than 200 usable2 PlanetLab hosts supporting our experiments, the only solution is to allocate several peers per active host. Nevertheless, the bandwidth of a single host is not sufficient to support tens of concurrent peers, besides generating unrealistic and very correlated congestion patterns over the IP network. To overcome this limitation, we considered a lightweight scenario where clients implement all the signaling procedures of the TURINstream protocol but video packet transmissions are emulated by sending only the packet headers. Clearly, the peers' bandwidth usage is computed by assuming full-size video packets. Peers push video packet headers into the network, allowing the estimation of all the performance indexes in the absence of actual video streaming. The shortcoming of this approach is that IP network congestion may be underestimated. Still, the possibility to test the protocol scalability is a key point of our study, and the obtained results are very accurate if one assumes that the underlying network infrastructure is not under stress. In fact, results obtained in the lightweight scenario have been successfully validated against results in the full scenario when the number of peers was equal to 200.

Moreover, all experiments have been performed by limiting the upload bandwidths; this was achieved by adding a software module between the application and transport layers that drops packets if the upstream exceeds the selected limit. Such a module is obtained by controlling the rate at which messages are popped from the outgoing queue. This choice has two important motivations. First of all, we are interested in simulating the system in a more realistic scenario comprising both residential ADSL and institutional users. More importantly, we want to make our results as reproducible as possible,3 even though we resort to the PlanetLab concurrent and open environment, where resources cannot be guaranteed using a preemptive allocation.
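The shaper just described can be sketched as a token bucket; this is one plausible shape for a "drop if the upstream exceeds the limit" module, with invented names and a hypothetical burst parameter:

```python
# Sketch of an upload shaper between the application and transport
# layers: packets are dropped once the upstream exceeds the limit.
class UploadLimiter:
    def __init__(self, limit_bps, burst_bytes=8192):
        self.rate = limit_bps / 8.0          # bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def allow(self, packet_bytes, now):
        # Refill tokens proportionally to elapsed time, up to the burst.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False                          # packet is dropped
```

Tying the check to the pop rate of the outgoing queue, as the prototype does, yields the same effect: traffic beyond the configured upstream never reaches the socket.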

According to the protocol optimization presented in Section III, the prototype has been tested with the parameter values selected above. Moreover, the development of the full prototype and its testing in realistic

2We use the term usable to denote a PlanetLab node that was free of all the following problems: frequently unreachable due to downtimes and other issues; unable to reach (or be reached by) other PlanetLab nodes; very bad connection quality; DNS problems; very high load; varying SSH keys; not enough disk space to write log files.

3Unfortunately, bandwidth dynamics (congestion) cannot be controlled.

scenarios has led to the selection of proper timeout values for the protocol operations. The most critical timeouts turned out to be the timeout for a cluster-head to receive a message from a child and the timeout for periodic messages between sources; the optimization of the protocol in the field has led to the selection of suitable values for both.

We also developed a console application to manage the joining and leaving of the peers and the collection of measurements. We conducted repeated experiments during September–November 2009; results showed similar characteristics, therefore we selected representative cases for the following discussion.

C. Performance Indexes

The purpose of our experiments was to estimate some performance indexes to confirm that TURINstream satisfies our design goals. To assess the robustness of TURINstream, we considered the continuity index CI, expressed as the fraction of video packets that arrive at a peer before their playback deadline. Since TURINstream is designed to exploit MDC, we consider the video service granted as long as at least one of the two representations of each video packet is received. To reflect this, the continuity index is defined as the fraction of video slices for which at least one description has been received. In fact, in the presence of MDC, every video slice is split into two packets, one per description, and a discontinuity occurs only if both packets are lost; if only one is received, the user does not perceive a discontinuity but a degradation of the video quality.

Moreover, the developed prototype allowed us to measure the delays incurred by the distributed application. The protocol latency is measured by means of the following indexes:

• Connection delay, defined as the time interval between the sending of the first join request message and the reception of the accept message;

• Startup delay, defined as the interval between the first join message and the decoding of the first video frame; it includes connection time and buffering time. Hence, it is the time between the application start and the instant the user starts watching the video;

• Playback delay, defined as the amount of time elapsed between the first transmission of a packet in the overlay by the server and its actual playback time as experienced by the user.

Finally, the efficiency of TURINstream has been evaluated in terms of signaling overhead. Two types of overhead can be identified, namely those caused by the video packet headers and by messages carrying control information. The video packet headers are 14 bytes long, summing up to 28 bytes if one takes into account the UDP headers as well; this overhead is negligible and similar to that yielded by standard streaming transport protocols such as RTP/UDP. Clearly, the major source of overhead is represented by the signaling information exchanged to maintain the control and video streaming overlay topologies; moreover, this contribution depends on the peer/network dynamics. To measure it, the overall amount of control traffic is logged by each peer. This allows us to compute the protocol overhead as the ratio between the average bitrate of the control information and the video bitrate.


TABLE II
PARAMETERS OF THE TEST ENVIRONMENTS SHARED WITH [4]

D. On-Off Behavior

The performance of a novel P2P video streaming architecture should be compared against that of other techniques. This important issue is nevertheless very difficult to deal with. Indeed, a thorough and fair comparison may not be accurate due to the difficulty of re-creating the same test environment.4 For instance, while it would be theoretically possible to compare different architectures using the same video bitrate, root source upload bandwidth, peer upload bandwidth distribution, arrival (departure) pattern, and number of peers in the overlay, it is surely impossible to reproduce the congestion and CPU load experienced by the PlanetLab nodes. Of course, this holds only if the prototype implementations of the other techniques are publicly available for experimentation. If this is not the case, then it is only possible to provide a comparison where the maximum number of system parameters is matched, to obtain hints on the relative performance of two competing architectures.

We chose to compare the performance of TURINstream against that of Coolstreaming as presented in [4]. In particular, we evaluated the average CI and control overhead performance indexes while matching the video bitrate, the peer arrival (departure) pattern, and the number of peers in the overlay (Table II shows the parameters of the test environments that match those used in [4]). We were not able to obtain information on the root source upload bandwidth and on the peer upload bandwidth distribution. In [4], the CI and the overhead have been measured using four connections per peer (including the root source); therefore, it seems reasonable to assume that the root source upload bandwidth was provisioned for four connections. On the other hand, no limitations seem to have been applied to the peer upload bandwidths.

To show that TURINstream is capable of providing performance at least as good as that of Coolstreaming, we fixed our root upload bandwidth to half of what was presumably used in [4]. This choice is representative of a scenario where a user wants to stream video content without resorting to a dedicated networking infrastructure. It follows that results could be significantly improved if a better-connected video provider was considered, e.g., one representative of a commercial video distribution system. In fact, by increasing the number of level-1 clusters, the continuity index would increase and delays would shorten for a larger number of peers, while the control overhead would remain practically unaltered.

We also imposed a cap on the peer upload bandwidths by assigning the upload bandwidths of the participating peers according to the distribution shown in Table III. The second column represents upload values distributed according to the bandwidth available to a majority of home ADSL users and a limited

4We do not consider architectures whose performance has been evaluated only through simulations.

TABLE III
LIMITED UPLOAD BANDWIDTH DISTRIBUTION

percentage of institutional/business users with high capacity. The resource index is equal to 1.18.

Moreover, we decided that peers' departures are notified with probability 0.95, while they are silent with probability 0.05 to simulate crashes. Silent departures are obtained by suppressing all quit messages. This is another feature that makes our experimental conditions more adverse than those in [4].

All our experiments lasted one hour, and performance indexes are computed over all the ON periods. Fig. 5(a) shows the TURINstream performance in terms of CI as a function of the average ON/OFF period for different network sizes. The four bars in the histograms show the CI obtained with overlay sizes of 50, 100, 150, and 200 nodes; the dash level in each bar shows the ratio between the average number of received descriptions and the total number of descriptions, and it is representative of the average received video quality; the black dots are experimental values of CI reported in [4]. Fig. 5(a) shows that TURINstream achieves a high CI by profitably exploiting MDC. In particular, it can be noticed that for the longer ON/OFF periods, TURINstream yields a CI close to 1 with an average number of received descriptions very close to the maximum. In the presence of higher churn, the TURINstream architecture still yields a CI larger than 0.9, at the expense of a slight reduction in the number of received descriptions. The results from [4] are reported in order to compare TURINstream performance with that of the first popular mesh-based streaming system; nevertheless, it must be recalled that in [4] no limitations on the peer upload bandwidths were considered, and the root source upload bandwidth was likely to be double that used for TURINstream.

TURINstream exhibits limited latencies, taking on average less than 500 ms to complete the join procedure in all cases, as shown in Fig. 5(b). In the same figure, a low startup delay of about 5 s is reported. The playback delay is about 12 s in all cases. We also computed the CDF of the startup delay in the case of 100 nodes (the other cases are qualitatively very similar); we observed that the 95th percentile is equal to 6.1 s and the 99th percentile is equal to 6.6 s.

Finally, the proposed design is very efficient also in terms of signaling overhead, which is kept below 0.003 as shown in Fig. 5(c): one order of magnitude smaller than what is reported in [4]. The main contribution to the overhead is given by periodic keep-alive messages. There are three kinds of such messages:

• every peer periodically sends a message to its cluster-head containing information for bandwidth management;

• every source periodically sends a keep-alive message to the cluster-head;


Fig. 5. TURINstream average CI (empty bars) and ratio between the average number of received descriptions and the total (grey bars), along with experimental results from CoolStreaming [4] (black dots) (a); connection and startup delays (b); and protocol overhead (c), for networks with 50, 100, 150, and 200 peers as a function of the average ON/OFF period (full scenario).

Fig. 6. Performance indexes as a function of the number of peers (lightweight scenario): average CI (a); connection, startup, and playback delays (b); and overhead (c).

• the cluster-head periodically communicates the slot allocation to the sources.

An approximation of the average per-peer overhead due to the keep-alive messages is obtained by summing, over the message types, the packet size divided by the sending period (2), where the first term takes into account the messages from a peer to its cluster-head, while the second one is the contribution of the cluster-head, i.e., the cost of sending the slot allocations to the other sources. The constant values represent the sizes of the control packets, including the cost of the UDP headers, which do not depend on any system parameter. As a sanity check, we computed (2) with the protocol parameter values reported in Section IV-B: one obtains 839 bps, which amounts to about 0.2% of the video bitrate. This estimate confirms the results shown in Fig. 5(c). The discrepancy is due to the fact that (2) neglects the contributions of control information exchanged for protocol operations, e.g., join, leave, cluster repair, etc.
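The structure of the estimate in (2) can be sketched with purely illustrative packet sizes and periods (the paper's actual constants and timeouts are not reproduced here):

```python
# Back-of-the-envelope helper mirroring the structure of (2):
# the per-peer keep-alive overhead is the sum, over message types,
# of packet size divided by sending period. Values are illustrative.
def keepalive_overhead_bps(msgs):
    """msgs: iterable of (packet_size_bytes, period_seconds)."""
    return sum(8.0 * size / period for size, period in msgs)
```

For instance, a 50-byte message each second plus a 25-byte message every two seconds would cost 500 bps, the same order of magnitude as the 839 bps sanity check above.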

To summarize, TURINstream is able to obtain low delays, like those previously reported for tree-based topologies, together with robustness to peer churning and very low control overhead. The comparison against the mesh-based architecture Coolstreaming reveals that TURINstream performance is at least as good in terms of CI, while it outperforms Coolstreaming for both control overhead and delays.

E. Scalability

To test systems composed of a larger peer population, we considered three-hour-long experiments in the lightweight scenario, where peers' inter-arrival and permanence times are generated according to an exponential distribution, i.e., the number of peers in the system behaves like an M/M/∞ queue. The exponential distribution parameters have been set to obtain the desired average sojourn time and steady-state average overlay sizes ranging from 100 to 2000 peers.
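The churn model above can be sketched as follows; the function names are our own illustrative choices, and the example parameters (360-ms inter-arrival, 360-s sojourn) are taken from the quality-of-service experiments described later.

```python
import random

def churn_trace(mean_interarrival_s, mean_sojourn_s, n_peers, seed=7):
    """Generate (join, leave) time pairs with exponential inter-arrival
    and sojourn times, i.e., an M/M/inf-style churn model."""
    rng = random.Random(seed)
    t, trace = 0.0, []
    for _ in range(n_peers):
        t += rng.expovariate(1.0 / mean_interarrival_s)
        trace.append((t, t + rng.expovariate(1.0 / mean_sojourn_s)))
    return trace

def overlay_size(trace, t):
    """Number of peers present in the overlay at time t."""
    return sum(1 for join, leave in trace if join <= t < leave)

# By Little's law, the steady-state average overlay size equals
# mean_sojourn / mean_interarrival: a 360-s average sojourn with
# 360-ms inter-arrivals yields roughly 1000 concurrent peers.
trace = churn_trace(mean_interarrival_s=0.36, mean_sojourn_s=360.0,
                    n_peers=50_000)
```

This is how the desired steady-state overlay sizes can be dialed in: fix the sojourn time and scale the arrival rate.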

Fig. 6(a)–(c) shows the performance indexes versus the average overlay size up to 2000 peers. It can be noticed that TURINstream is able to scale to thousands of peers without significant impairment in terms of CI [Fig. 6(a)], delays [Fig. 6(b)], and signaling overhead [Fig. 6(c)].

F. Mass Behavior

We also tested the performance of TURINstream in a more challenging setting where a large number of peers joins the application in a short amount of time (flash crowd). To this end, we considered a system of 1000 peers in the lightweight scenario. For flash crowds, we considered rates of peer arrivals ranging from 50 to 200 peers/s, while for mass departures, we forced a percentage ranging from 10% to 40% of the peers to leave the application in a time interval of 5 s (these departures are silent with probability 0.05). The behavior of CI in the presence of a flash crowd starting at time instant 0 is shown in Fig. 7(a). It can be noted that the application is robust under sudden massive arrivals. In fact, the initial unavoidable drop in CI is recovered in a few seconds. When arrivals occur on average every 5 ms, we


Fig. 7. TURINstream continuity index for flash crowd arrivals (a) and mass departures (b).

TABLE IV
LIMITED UPLOAD BANDWIDTH DISTRIBUTION

note that after 30 s, the average CI is above 0.9 and stabilizes to slightly less than 1 because no departures occur thereafter. In this case, the startup and playback delays slightly increase to 3 s and 8 s, respectively. These values must be compared with those experienced with 1000 peers in Fig. 6(b); it turns out that the startup delay is the most sensitive to flash crowds, whereas the impact on the playback delay is limited. Control overhead is only marginally affected. As a final comment, one has to keep in mind that these results have been obtained starting from an empty overlay; the presented results would improve if the flash crowd arrivals occurred on an already well-populated network.

The effect of massive departures on CI is reported in Fig. 7(b); the latter clearly shows that TURINstream is able to efficiently cope with the sudden reduction of the resources experienced by the system when many peers leave almost at the same time. In particular, it can be noted that the continuity index never drops below 0.8, even when 40% of the peers leave within 5 s. The delay of about 5 s with respect to the start of the mass departure event at time 0 is due to the presence of video buffering. In the presence of mass departures, delay and overhead values show negligible variations.

G. Quality-of-Service Differentiation Under Limited Resources

To test the ability of TURINstream to provide differentiated quality-of-service to peers sharing more resources, we considered scenarios where the resource index is significantly lower than the value of 1.18 used in the previous sections. In particular, we considered a fixed video bitrate and root server upload bandwidth, with peer upload bandwidths distributed according to five classes in four scenarios, as summarized in Table IV. We considered a lightweight scenario where the average sojourn time is equal to 360 s, while the inter-arrival time is equal to 720 ms and 360 ms to obtain average overlay sizes equal to 500 and 1000 peers, respectively. Experiments lasted three hours; we computed the average CI, the average number of descriptions Q, and the startup delay obtained by peers in each bandwidth class for each of the four scenarios. Results presented in Table V prove that TURINstream allows altruistic peers to enjoy very high CI and almost full-quality video reception. Indeed, this goal is achieved even in scenarios with very scarce available resources (S3 and S4), and it is maintained as the number of peers doubles. The more peers act selfishly, the more they are penalized; it can be noted that peers in class 1 (that contribute no more than half the video bitrate) experience low CI and Q, while the quality-of-service experienced by class 2 peers (that contribute three quarters of the video bitrate) is consistently better. It can also be noted that the startup delay is lower for more cooperative peers: this result is due to the fact that TURINstream is able to accommodate more altruistic peers into the top levels of the overlay through promotion and demotion operations.

V. RELATED WORK

The TURINstream architecture encompasses two key elements, namely MDC video distribution on a multi-tree topology and the organization of peers in clusters. In the past, many multi-tree solutions have been proposed where the nodes of the trees are individual peers, as opposed to the TURINstream clusters. Here we mention CoopNet [1] and SplitStream [2], two well-known approaches. In CoopNet, a set of random trees is built using a centralized approach. All the peers joining and leaving the overlay rely on a resourceful server node that coordinates the overlay construction and optimization. Because of the absence of a distributed and local algorithm for the construction of the trees, the overlay reorganization in response to peer crashes or bandwidth fluctuations can be costly. Moreover, CoopNet exploits a high number of video descriptions (up to 16) to make the offered service reliable. On the other hand, SplitStream represents a viable distributed approach to the construction of the multi-tree; in this case, the use of interior-node-disjoint trees is proposed, where each peer is an interior node in at most one tree. This solution limits the impact of the departure of a peer to a single tree. Nevertheless, it has the drawback that every peer is a leaf node in the distribution of all but one of the descriptions.


TABLE V
LIMITED UPLOAD BANDWIDTH DISTRIBUTION

In the presence of churn, the quality-of-service in terms of both delays and continuity decreases with the height of the tree. This fact heavily limits the overall quality offered to each peer.
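The interior-node-disjoint idea discussed above can be illustrated with a toy sketch. This is our own simplification, not SplitStream's actual Scribe-based tree construction: each peer is made interior in exactly one description tree and a leaf in all the others.

```python
def assign_interior_trees(peer_ids, num_trees):
    """Toy interior-node-disjoint assignment: peer p is an interior
    (forwarding) node only in tree p % num_trees and a leaf in all the
    other description trees."""
    return {p: p % num_trees for p in peer_ids}

def affected_trees(assignment, crashed_peers):
    """Trees whose interior forwarding is disrupted when the given
    peers crash: at most one tree per crashed peer."""
    return {assignment[p] for p in crashed_peers}
```

For example, with 8 peers and 4 description trees, the crash of peer 5 disrupts forwarding only in tree 1; the other three descriptions keep flowing, which is exactly the resilience argument made for this topology.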

A few peer-to-peer architectures have introduced the concept of clustering in recent years as well. The ZigZag [22] protocol, an enhanced version of NICE [23], builds a single clustered multicast tree. As opposed to TURINstream, in ZigZag all the peers belong to a cluster at level 0, whose cluster-head belongs to a cluster at level 1; level 1 cluster-heads, in turn, also belong to a cluster at level 2, and so on. As a consequence, in [22], a top level cluster-head belongs to all the clusters; on the other hand, in TURINstream, each cluster-head has only local knowledge of the overlay, making it more robust to peer churn. In particular, TURINstream repair operations are quicker since the topology is more flexible and does not require splitting and merging of clusters as in ZigZag. Moreover, TURINstream uses a multi-tree for content distribution, whereas ZigZag employs a single multicast tree. Finally, in [22], only simulation-based results without the exploitation of MDC have been reported. Reference [24] presents another layered cluster architecture, very similar to ZigZag from the point of view of the overlay topology. The focus of that work is on optimal exploitation of peer bandwidth, whereas the proposed architecture, as most structured topologies, is not resilient to peer churn and departures. In particular, in dHCPS, the cluster-heads represent a single point of failure. Furthermore, the experimental results in [24] have been worked out for small and stable systems (80 peers on PlanetLab) and considering a video server equipped with 3.2 Mbps of upload bandwidth capacity; actual scalability and resilience are thus a concern for this architecture.

Another goal pursued in this work is service differentiation based on peer contribution, so as to promote altruistic behavior. In [25], a centralized solution that organizes peers in a directed acyclic graph and adapts the video streaming rate based on peer contributions is proposed and analyzed by simulation. The system proposed in [25] requires a parent node to transcode the incoming stream to serve its children at different rates. Transcoding is more flexible in terms of achievable rates but is highly demanding in terms of computational resources with respect to the use of MDC, where peers can simply drop some descriptions to shape the rate.
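The MDC-based rate shaping mentioned above reduces to selecting a subset of descriptions; the following sketch (function name and parameters are our own illustrative choices) shows why it is computationally cheap compared to transcoding, which would require re-encoding the stream.

```python
def shape_rate_mdc(descriptions, child_budget_bps, desc_rate_bps):
    """Rate shaping with MDC: forward only as many complete descriptions
    as fit in the child's bandwidth budget. No re-encoding takes place,
    unlike transcoding; the child decodes whatever subset it receives
    at proportionally reduced quality."""
    k = min(len(descriptions), int(child_budget_bps // desc_rate_bps))
    return descriptions[:k]
```

For instance, with four 100-kbps descriptions and a 250-kbps budget toward a child, only the first two descriptions are forwarded; the selection is a slice, not a re-encode.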

VI. CONCLUSIONS

In this paper, we presented TURINstream, an architecture for P2P-based video streaming. It exploits the separation of data and control overlays, cluster-based organization, MDC video coding, and a tree-based control overlay to achieve a high degree of robustness, low connection, startup, and playback delays, high scalability, low control overhead, and differentiated quality-of-service for different classes of users.

TURINstream design has been optimized through an event-driven overlay simulator able to scale up to tens of thousands of users. A complete prototype of TURINstream has been developed, deployed, and tested on PlanetLab under varying degrees of peer churn, flash crowd arrivals, sudden massive departures, and upload bandwidth limitation. Moreover, the prototype has been equipped with state-of-the-art MDC video coding, allowing us to test the streaming protocol under realistic video traffic. The main finding is that TURINstream fulfills our initial design goals and proves competitive with other architectures.

Future efforts will be devoted to devising a more efficient exploitation of all the available upload bandwidth of participating users, as well as to coping with the delay in adapting to bandwidth fluctuations. Moreover, network-aware algorithms will be considered for optimal cluster formation.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which contributed to improving the paper.

REFERENCES

[1] V. Padmanabhan, H. Wang, P. Chou, and K. Sripanidkulchai, “Distributing streaming media content using cooperative networking,” in Proc. ACM NOSSDAV, May 2002, pp. 177–186.

[2] M. Castro, P. Druschel, A. Kermarrec, A. Nandi, A. Rowstron, and A. Singh, “SplitStream: High-bandwidth multicast in cooperative environments,” in Proc. ACM SOSP, Oct. 2003, pp. 298–313.

[3] V. N. Padmanabhan, H. J. Wang, and P. A. Chou, “Resilient peer-to-peer streaming,” in Proc. IEEE ICNP, Nov. 2003, pp. 16–27.

[4] X. Zhang, J. Liu, B. Li, and Y. S. P. Yum, “CoolStreaming/DONet: A data-driven overlay network for peer-to-peer live media streaming,” in Proc. IEEE INFOCOM, Mar. 2005, pp. 2102–2111.

[5] X. Hei, Y. Liu, and K. W. Ross, “IPTV over P2P streaming networks: The mesh-pull approach,” IEEE Commun. Mag., vol. 46, no. 2, pp. 86–92, Feb. 2008.

[6] V. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 821–834, May 1993.

[7] T. Tillo, M. Grangetto, and G. Olmo, “Redundant slice optimal allocation for H.264 multiple description coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 59–70, Jan. 2008.

[8] N. Magharei, R. Rejaie, and Y. Guo, “Mesh or multiple-tree: A comparative study of live P2P streaming approaches,” in Proc. IEEE INFOCOM, 2007.

[9] Z. Liu, Y. Shen, S. Panwar, K. Ross, and Y. Wang, “P2P video live streaming with MDC: Providing incentives for redistribution,” in Proc. ICME, Jul. 2007, pp. 48–51.

[10] N. Magharei and R. Rejaie, “PRIME: Peer-to-peer receiver-driven mesh-based streaming,” IEEE/ACM Trans. Netw., vol. 17, no. 4, pp. 1052–1065, 2009.

[11] M. Zhang, Q. Zhang, L. Sun, and S. Yang, “Understanding the power of pull-based streaming: Can we do better?,” IEEE J. Select. Areas Commun., vol. 25, no. 8, pp. 1678–1694, Dec. 2007.

[12] T. Bonald, L. Massoulié, F. Mathieu, D. Perino, and A. Twigg, “Epidemic live streaming: Optimal performance trade-offs,” in Proc. SIGMETRICS, Jun. 2008, pp. 325–336.

[13] RFC 3984, RTP Payload Format for H.264 Video, Feb. 2005.

[14] RFC 2326, Real Time Streaming Protocol (RTSP), Apr. 1998.

[15] RFC 5583, Signaling Media Decoding Dependency in the Session Description Protocol (SDP), Jul. 2009.

[16] R. Bernardini, M. Durigon, R. Rinaldo, L. Celetto, and A. Vitali, “Polyphase spatial subsampling multiple description coding of video streams with H.264,” in Proc. IEEE ICIP, Oct. 2004, pp. 3213–3216.

[17] E. Akyol, A. Tekalp, and M. Civanlar, “A flexible multiple description coding framework for adaptive peer-to-peer video streaming,” IEEE J. Select. Topics Signal Process., vol. 1, no. 2, pp. 231–245, Aug. 2007.

[18] C. Zhu and M. Liu, “Multiple description video coding based on hierarchical B pictures,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp. 511–521, Apr. 2009.

[19] K. Sripanidkulchai, A. Ganjam, B. Maggs, and H. Zhang, “The feasibility of supporting large-scale live streaming applications with dynamic application end-points,” in Proc. ACM SIGCOMM, Feb. 2004, pp. 107–120.

[20] VideoLAN Project. [Online]. Available: http://www.videolan.org/.

[21] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Int. Standard of Joint Video Specification (ITU-T Rec. H.264, ISO/IEC 14496-10 AVC), Mar. 2003.

[22] D. A. Tran, K. Hua, and T. Do, “ZIGZAG: An efficient peer-to-peer scheme for media streaming,” in Proc. IEEE INFOCOM, Mar. 2003, pp. 1283–1292.

[23] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, “Scalable application layer multicast,” in Proc. ACM SIGCOMM, Aug. 2002, pp. 205–217.

[24] Y. Guo, C. Liang, and Y. Liu, “dHCPS: Decentralized hierarchically clustered P2P video streaming,” in Proc. ACM CIVR, Jul. 2008, pp. 655–662.

[25] W. Ooi, “Dagster: Contributor aware end-host multicast for media streaming in heterogeneous environment,” in Proc. MMCN, Jan. 2005, pp. 77–90.

Andrea Magnetto was born in Rivoli, Italy, in 1984. He received the Laurea degree in computer science from the University of Torino, Torino, Italy, in 2006.

His main research interests include network modeling/simulation, with particular interest in peer-to-peer streaming.

Rossano Gaeta received the Laurea and Ph.D. degrees in computer science from the University of Torino, Torino, Italy, in 1992 and 1997, respectively.

He is currently an Associate Professor in the Computer Science Department of the University of Torino. His current research interests include the design and evaluation of peer-to-peer computing systems and the analysis of compressive sensing and coding techniques in distributed applications.

Dr. Gaeta is the recipient of the Best Paper award at the 14th IEEE/ACM International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2006) and at the 26th International Symposium on Computer Performance, Modeling, Measurements, and Evaluation (PERFORMANCE 2007).

Marco Grangetto (S’99–M’03–SM’09) received the Electrical Engineering degree and the Ph.D. degree from the Politecnico di Torino, Turin, Italy, in 1999 and 2003, respectively.

He is currently an Assistant Professor in the Computer Science Department, University of Torino. His research interests are in the fields of multimedia signal processing and networking. In particular, his expertise includes wavelets, image and video coding, data compression, video error concealment, error-resilient video coding, unequal error protection, and joint source-channel coding.

Dr. Grangetto was awarded the Premio Optime by Unione Industriale di Torino in September 2000, and a Fulbright grant in 2001 for a research period with the Department of Electrical and Computer Engineering, University of California at San Diego. He has participated in the ISO standardization activities on Part 11 of the JPEG 2000 standard. He has been a member of the Technical Program Committee for several international conferences, including IEEE ICME, ICIP, ICASSP, and ISCAS.

Matteo Sereno (M’08) was born in Nocera Inferiore, Italy. He received the Laurea degree in computer science from the University of Salerno, Salerno, Italy, in 1987, and the Ph.D. degree in computer science from the University of Torino, Torino, Italy, in 1992.

He is currently a Full Professor in the Computer Science Department of the University of Torino. His current research interests are in the area of performance evaluation of computer systems, communication networks, peer-to-peer systems, sensor networks, queuing networks, and stochastic Petri net models.

