    Multiple Description Coding for Video Delivery

    YAO WANG, FELLOW, IEEE, AMY R. REIBMAN, FELLOW, IEEE, AND SHUNAN LIN, MEMBER, IEEE

    Invited Paper

    Multiple description coding (MDC) is an effective means to combat bursty packet losses in the Internet and wireless networks. MDC is especially promising for video applications where retransmission is unacceptable or infeasible. When combined with multiple path transport (MPT), MDC enables traffic dispersion and hence reduces network congestion. This paper describes principles in designing MD video coders employing temporal prediction and presents several predictor structures that differ in their tradeoffs between mismatch-induced distortion and coding efficiency. The paper also discusses example video communication systems integrating MDC and MPT.

    Keywords: Multipath transport, multiple description coding (MDC), video communications.

    I. INTRODUCTION

    Multiple description coding (MDC) has emerged as a promising approach to enhance the error resilience of a video delivery system. In the most common implementation, a multiple description (MD) coder generates two descriptions of equal rate and equal importance, so that each description alone provides low but acceptable quality and both descriptions together lead to higher quality. The two descriptions are individually packetized and sent through either the same or separate physical channels. As long as the two descriptions are not affected by packet losses simultaneously (in terms of the spatial location and time in the underlying video sequence), an acceptable quality can be maintained. In the more general case, more than two descriptions can be generated, which may or may not have identical rates.

    A primary reason for the increasing popularity of MDC is that it can provide adequate quality without requiring retransmission of any lost packets (unless the loss rate is very high).

    Manuscript received January 11, 2004; revised May 20, 2004. This work was supported in part by the National Science Foundation under Grant ANI 0081375.

    Y. Wang is with the Department of Electrical and Computer Engineering, Polytechnic University, Brooklyn, NY 11201 USA (e-mail: [email protected]).

    A. R. Reibman is with AT&T Labs-Research, Florham Park, NJ 07932 USA (e-mail: [email protected]).

    S. Lin is with Harmonic Inc., White Plains, NY 10601 USA (e-mail: [email protected]).

    Digital Object Identifier 10.1109/JPROC.2004.839618

    This has two important implications. First, it makes MDC particularly appealing for real-time interactive applications such as video phone and conferencing, for which retransmission is often not acceptable because it incurs excessive delay. Second, it simplifies the network design: no feedback or retransmission is necessary, and all the packets can be treated equally. This is in contrast to layered coding (LC), which generates a base layer and one or more enhancement layers. One major obstacle to the adoption of LC in practical networks is that, to guarantee a basic level of quality, the base layer must be delivered almost error free. This requires differential treatment of base-layer and enhancement-layer packets by the network and retransmission of lost base-layer packets. Depending on the acceptable delay of the underlying application and the network setup, these options are not always feasible.

    In addition to error resilience, combining MDC with multiple path transport (MPT) enables traffic dispersion and load balancing in the network, which can effectively relieve congestion at hotspots and increase overall network utilization. Although one can split the bitstream from any coder onto multiple paths, the fact that substreams from MDC can be treated equally and independently makes the task of allocating and scheduling source packets onto transport paths much easier than for a conventional single description (SD) coder or a layered coder. Combining MDC with MPT has become a “hot” new research direction in the past few years because this architecture offers both error resilience and load balancing and is applicable in both wired and wireless networks.

    These benefits of MDC come at a price, however: for an MD coder to meet the same quality criterion as a conventional SD coder in the absence of any transmission errors, the MD coder uses more bits. This excess rate, or redundancy, is inserted intentionally to make the bitstream more resilient to transmission errors. The primary objective in designing an MD coder is to minimize the redundancy (or the total rate) while meeting an end-to-end distortion requirement that takes transmission loss into account.

    The idea of MDC was conceived and studied in the early 1980s by information theorists [1]–[3]. Some MD coders were later proposed, including [4] and [5]. In the second half of the 1990s, MDC became popular as an effective means to combat transmission errors in the best-effort Internet and wireless networks. Numerous MD coders have since been proposed for coding multimedia (speech, image, and video). For a comprehensive review of the development history of MDC, the rate-distortion (R-D) bound for MDC, and various MD algorithms developed (primarily for images), the reader is referred to Goyal's overview paper [6].

    0018-9219/$20.00 © 2005 IEEE

    PROCEEDINGS OF THE IEEE, VOL. 93, NO. 1, JANUARY 2005 57

    MDC is particularly promising for video because of the very stringent delay requirement in many video applications. However, designers of MD video coders face some unique challenges. As is well known, motion-compensated prediction can effectively exploit the temporal correlation between video frames. As a result, it is a fundamental component in all video coding standards, and many MD video coders also incorporate motion-compensated prediction. However, whenever an encoder uses a signal for prediction that is unavailable to the decoder (because of transmission loss), a condition called mismatch occurs. MD coding is designed around the principle that information may be lost during transmission; hence, mismatch is a fundamental design concern in MD video coders. Efficient predictors are more likely to introduce mismatch, so a primary technical challenge is to balance prediction efficiency with mismatch control.

    In Section II, we describe the principles of designing MD video coders. In Section III, we review some representative implementations of MD video coders based on the strategies they use for MD coding, prediction, and mismatch control. Section IV presents the general system architecture that integrates MDC and MPT and reviews three representative systems. Section V summarizes the main contributions of this paper and suggests several future research directions.

    II. PRINCIPLES OF PREDICTIVE MD CODING

    In this section, we focus on two fundamental design issues that must be tackled when creating useful MD video coders: mismatch control and redundancy allocation. We restrict our discussion to the two-description case.

    We discuss general design strategies for an MD coder sent over an ideal MD network in Section II-A. In general, a predictive coder is designed to ensure that the predictions formed at the encoder and decoder always match. In a predictive MD (P-MD) encoder, this may incur redundancy; hence, it may not always be desirable to eliminate mismatch. Section II-B examines P-MD coders and considers alternatives for handling mismatch in an ideal MD network. Section II-C highlights the differences between the ideal MD network and practical packet networks and presents alternatives for mismatch control in such networks.

    In general, the overall performance of an MD coder depends on the channel characteristics and the source statistics. Since both vary with time in many practical video applications, it is important that MD coders be designed to adapt the amount and types of redundancy flexibly and dynamically. Section II-D discusses redundancy allocation in a P-MD coder.

    Fig. 1. MD source coding for an ideal MD network.

    A. Characterization of a General MD Coder

    An ideal MD network consists of two channels. Either channel may fail with probability $p$. If a channel fails, it does not work throughout the duration of the transmission. The decoder knows when a channel has failed, while the encoder cannot know. A typical assumption is that both channels do not fail simultaneously.

    Fig. 1 shows the basic framework for MD source coding over an ideal MD network. The encoder creates two descriptions, which are sent separately across two channels. The bit rates used to send the descriptions, in bits per source sample, are $R_1$ and $R_2$, and the total rate is $R = R_1 + R_2$. Three situations are possible: both descriptions are received by the MD decoder, or either one of the two descriptions is missing.

    It is conceptually useful to describe the basic MD decoder as consisting of three individual decoders, each corresponding to one of the three situations. This forces the encoder to consider explicitly that the decoder may be in one of three states, even though the encoder cannot know which of the three states the decoder is in. The central decoder receives both descriptions and produces a high-quality reconstruction with central distortion $D_0$, while the two side decoders each receive only one of the two descriptions and produce lower, but still acceptable, quality reconstructions with side distortions $D_1$ and $D_2$. In many applications, a balanced design is useful, where $R_1 = R_2$ and $D_1 = D_2$.

    An SD coder minimizes $D_0$ for a fixed total rate $R$, and its performance is measured by its operational rate-distortion function $D^*(R)$. An MD coder has the conflicting requirements of simultaneously minimizing both $D_0$ and the side distortions $D_1$ and $D_2$. At one extreme, an MD coder that simply alternates the bits of an SD bitstream into each description will achieve the minimum central distortion but have unacceptably high side distortion. At the other extreme, an MD coder that simply duplicates the SD bitstream with rate $R$ into each description will use $2R$ bits to achieve minimal side distortion but have a larger central distortion than an SD coder using $2R$ bits.

    One way to measure the efficiency of an MD coder is by using the redundancy rate-distortion (RRD) curve [7]. Define the distortion of the best single description (SD) coder to be $D^*$ when $R^*$ bits are used. Then, define the redundancy to be $\rho = R - R^*$, where $R$ is the rate when the MD coder has central distortion $D_0 = D^*$. Intuitively, $\rho$ is the bit rate sacrificed compared to the SD coder for the purpose of reducing $D_1$. The RRD function $\rho(D_1; D_0)$ is the additional rate required to achieve a desired side distortion $D_1$ at central distortion $D_0$. The performance of an MD coder is characterized by three variables: $\rho$, $D_1$, and $D_0$.
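The definition of $\rho$ can be checked numerically on the two extreme coders described above. The Python sketch below assumes, only for illustration, a unit-variance memoryless Gaussian source whose best SD coder achieves $D^*(R) = 2^{-2R}$ bits per sample; the function names are hypothetical.

```python
import math

def sd_distortion(rate):
    """Assumed SD operational R-D function D*(R) = 2^(-2R) (unit-variance Gaussian)."""
    return 2.0 ** (-2.0 * rate)

def sd_rate(distortion):
    """Inverse of sd_distortion: R*(D) = 0.5 * log2(1 / D)."""
    return 0.5 * math.log2(1.0 / distortion)

def redundancy(total_rate, central_distortion):
    """rho = R - R*, where R* is the SD rate reaching the MD coder's central distortion."""
    return total_rate - sd_rate(central_distortion)

r_sd = 2.0  # rate of the underlying SD bitstream, bits/sample

# Extreme 1: alternate the bits of the SD stream between the two descriptions.
# Both halves together rebuild the SD stream, so D0 = D*(r_sd) at total rate r_sd.
rho_alternate = redundancy(r_sd, sd_distortion(r_sd))

# Extreme 2: duplicate the SD stream in both descriptions: total rate 2 * r_sd,
# yet the central decoder still only achieves D*(r_sd).
rho_duplicate = redundancy(2 * r_sd, sd_distortion(r_sd))

print(rho_alternate, rho_duplicate)  # 0.0 2.0
```

Alternating the bits adds no redundancy but leaves each side decoder with unacceptably high distortion, whereas duplication spends $\rho = R^*$ extra bits so that each side decoder is as good as the central one.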

    Let $M$ be the set of parameters that define the MD coder that is based on an SD coder with parameter set $Z$ (see footnote 1). When optimizing an MD encoder for transport over an ideal MD network, a common approach is to minimize the average distortion based on the (assumed known) channel failure probabilities, subject to a rate constraint. This leads to

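    $$\min_{Z,M}\; (1-p)^2 D_0(Z) + p(1-p)\,\big[D_1(Z,M) + D_2(Z,M)\big] \tag{1}$$

    $$\text{subject to}\quad R_1(Z,M) + R_2(Z,M) \le R. \tag{2}$$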

    We note that the central distortion depends on $Z$ only, and the side distortions depend on both $Z$ and $M$. The total rate can similarly be expressed as $R_1 + R_2 = R^*(Z) + \rho(Z,M)$. Then, (1) can be written as

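    $$\min_{Z}\Big\{(1-p)^2 D_0(Z) + \min_{M:\,\rho(Z,M) \le R - R^*(Z)} p(1-p)\,\big[D_1(Z,M) + D_2(Z,M)\big]\Big\} \tag{3}$$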

    The above optimization problem can be considered a problem of allocating a total rate $R$ between $R^*$ and $\rho$ to minimize the average distortion. If the central distortion is determined by the application, $Z$ and $R^*$ are fixed. Then, only the inner optimization, which corresponds to finding the best point on the RRD function, is necessary.
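The allocation in (3) can be illustrated with a grid search over the split $R = R^* + \rho$. The distortion model in `toy_distortions` is a hypothetical toy, not taken from this paper: the SD coder follows $D^*(R) = 2^{-2R}$, and the redundancy $\rho$ is spent duplicating a base layer that is the only part a side decoder can use. The sketch also assumes the balanced case $D_1 = D_2$ and the failure model of (1).

```python
import math

def toy_distortions(r_star, rho):
    """HYPOTHETICAL toy model (not from the paper). SD coder: D*(R) = 2^(-2R).
    The redundancy rho duplicates a base layer of rate rho into both
    descriptions; a side decoder can use only that base layer."""
    d0 = 2.0 ** (-2.0 * r_star)
    d_side = 2.0 ** (-2.0 * rho)
    return d0, d_side, d_side            # balanced design: D1 = D2

def average_distortion(p, d0, d1, d2):
    """Objective of (1): both descriptions arrive w.p. (1-p)^2, exactly one w.p. p(1-p)."""
    return (1 - p) ** 2 * d0 + p * (1 - p) * (d1 + d2)

def allocate(total_rate, p, steps=2000):
    """Grid search over the split R = R* + rho of a fixed total rate, 0 <= rho <= R*.
    Returns (average distortion, R*, rho) of the best split."""
    best = None
    for k in range(steps + 1):
        rho = 0.5 * total_rate * k / steps
        r_star = total_rate - rho
        j = average_distortion(p, *toy_distortions(r_star, rho))
        if best is None or j < best[0]:
            best = (j, r_star, rho)
    return best

def closed_form_rho(total_rate, p):
    """Stationary point of the convex toy objective, clipped to [0, R/2]; assumes 0 <= p < 1."""
    if p <= 0:
        return 0.0
    rho = (2 * total_rate + math.log2(2 * p / (1 - p))) / 4
    return min(max(rho, 0.0), total_rate / 2)

for p in (0.0, 0.01, 0.1):
    j, r_star, rho = allocate(4.0, p)
    print(f"p={p:<4}  R*={r_star:.3f}  rho={rho:.3f} (closed form {closed_form_rho(4.0, p):.3f})")
```

In this toy model the chosen redundancy grows with the failure probability, and at $p = 0$ the search returns $\rho = 0$, i.e., the MD coder collapses to an SD coder.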

    B. Predictive Coding in an MD Environment

    Because predictive coding is nearly universal in today's video coders, we consider generic P-MD coders. The generic P-MD encoder and decoder presented here are meant to be illustrative rather than exhaustive. Details on how specific implementations are built from, or differ from, these simple structures are given in Section III. We focus on the predictive structure, recognizing that intraframes are a special case where the predictor is zero.

    Consider a predictive coder in a single-description environment. The encoder typically tracks the state it expects to be present at the decoder and bases its predictor on that state. In this way, the encoder and decoder are able to maintain identical states, provided the decoder receives all information.
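The state tracking described above can be seen in a minimal closed-loop DPCM sketch. The first-order predictor coefficient `a` and the uniform quantizer step are illustrative assumptions; the point is that the encoder predicts from its own copy of the reconstruction, so the encoder and decoder stay identical as long as every residual arrives.

```python
import math

def quantize(x, step):
    """Uniform quantizer with reconstruction levels at multiples of step."""
    return step * round(x / step)

def dpcm_encode(samples, step, a=0.95):
    """Closed-loop DPCM: the prediction uses the encoder's copy of the decoder
    state (the previous reconstruction), never the original previous sample."""
    state, residuals, recon = 0.0, [], []
    for x in samples:
        pred = a * state
        q = quantize(x - pred, step)   # quantized prediction error (what gets sent)
        state = pred + q               # update exactly as the decoder will
        residuals.append(q)
        recon.append(state)
    return residuals, recon

def dpcm_decode(residuals, a=0.95):
    state, out = 0.0, []
    for q in residuals:
        state = a * state + q
        out.append(state)
    return out

signal = [10 * math.sin(0.1 * n) for n in range(100)]
res, enc_recon = dpcm_encode(signal, step=0.5)
dec_recon = dpcm_decode(res)
print(enc_recon == dec_recon)  # True: identical state when every residual arrives
```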

    Fig. 2 shows a simplified generic MD predictive decoder. The central and side decoders inside the dotted boxes are each typical predictive decoders. The two descriptions contain prediction error information. As before, the central decoder is used when both descriptions are received, and the side decoders are used when only one description is received. The prediction error signals are decoded by the MD prediction-error (MD-PE) central and side decoders. Each of the predictive decoders has its own predictor function $P_i$ based on the state $S_i$, $i = 0, 1, 2$, available to that decoder. The state of the central decoder, $S_0$, includes the states of the individual side decoders. The central predictor function $P_0$ may not use all components of $S_0$. The outputs of the side decoders and the central decoder may be processed by an additional block (not shown) prior to display to improve quality.

    ¹ For example, for both the MDSQ coder [5] and the MDTC coder [7], the set $Z$ is the quantizer parameters. For MDSQ, $M$ corresponds to the MD index assignment, and for MDTC, $M$ is the MDTC transform matrix.

    Fig. 2. Predictive MD decoder.

    Fig. 3. Predictive MD encoder.

    Depending on which descriptions are received, the decoder has three possible states, $S_0$, $S_1$, and $S_2$. However, the encoder can never know which of these states is present. If the encoder uses a predictor that depends on state not available at the decoder, the encoder and decoder states will be mismatched. This potential mismatch and the subsequent error propagation present a fundamental design concern for P-MD video coders.

    In general, a P-MD encoder must produce two descriptions that will be useful to the decoder, regardless of which state it is in. A variety of strategies have been proposed, with different tradeoffs between redundancy and side distortion. Fig. 3 shows a simplified generic P-MD encoder. The structure is similar to a basic predictive coder, with an MD-PE encoder in place of the usual quantizer and an MD-PE decoder in place of the usual inverse quantizer. Each branch (arrow) may contain one, two, or three signals, as indicated. In some implementations, some branches may be omitted or additional branches added, as described in more detail in Section III. The generic encoder of Fig. 3 tracks all three states in the decoder of Fig. 2. To keep the block diagram simple, we have omitted any optional block for coding the mismatch signal.

    WANG et al.: MULTIPLE DESCRIPTION CODING FOR VIDEO DELIVERY 59

    Table 1. Summary of Predictor Classes

    The generic encoder operates as follows. Depending on the implementation, the encoder typically has either one or two basic predictors. Let $\hat{F}_j = P_j(S_j)$ be the $j$th prediction of the original signal $F$, formed by predictor $P_j$ from state $S_j$. If the encoder has only one predictor, it is $\hat{F}_0 = P_0(S_0)$, which is not necessarily the same predictor as in a single-description coder. If the encoder has two predictors, they are $\hat{F}_1 = P_1(S_1)$ and $\hat{F}_2 = P_2(S_2)$. The $j$th prediction error signal is $E_j = F - \hat{F}_j$.

    The MD-PE encoder can be implemented using a wide variety of techniques, described in Section III-A. In general, it takes the prediction error signal(s) as input and adds redundancy to create two correlated signals, one for each description. The MD-PE decoder has three outputs, one from the central MD-PE decoder and two from the side MD-PE decoders. Each output of the MD-PE decoder is added to its respective prediction signal to form the reconstruction $\tilde{F}_i$, $i = 0, 1, 2$.

    Mismatch will occur in an ideal MD network whenever the states used for prediction at the encoder and decoder do not match. The error due to mismatch in one frame will propagate into subsequent frames, and the resulting error, which grows over time, may become disturbing. Therefore, P-MD coders should be designed to be fully aware of any mismatch they may introduce.

    The reconstruction errors for the three decoding states are $e_i = F - \tilde{F}_i$, where $\tilde{F}_i$ is the reconstruction in state $i = 0, 1, 2$. The mean square of $e_0$ defines the central distortion $D_0$, while that of $e_1$ ($e_2$) defines the side distortion $D_1$ ($D_2$). To emphasize the presence of the mismatch error in the side distortion of a P-MD coder, we assume the mismatch error is uncorrelated with the rest of the error (see footnote 2) and express the total side distortion as

    $$D_1 = D_{1,e} + D_{1,m} \tag{4}$$

    where $D_{1,e}$ is the side distortion in representing the prediction error by the MD-PE encoder, and $D_{1,m}$ is the side distortion due to mismatch.

    The predictor block shown in Fig. 3 outputs three prediction signals to mimic the prediction signals created in the three different decoding states. In some implementations, not all prediction signals are generated. We categorize the possible predictors for a P-MD coder into three classes, depending on the resulting tradeoff between overall redundancy and side distortion. These categories are summarized in Table 1.

    ² This assumption is not accurate in coders with explicit mismatch coding.

    1) Predictors in Class A have no mismatch. This requires the encoder to create a prediction error signal using the same prediction signal(s) as the decoder, regardless of which state the decoder is in.

    There are two principal ways for an encoder to achieve this. The first is to use two individual predictors, $\hat{F}_1 = P_1(S_1)$ and $\hat{F}_2 = P_2(S_2)$, that predict based only on information sent in one of the two descriptions. Such an encoder can be thought of as a two-state encoder [8]. Some examples of coders built using this approach are [9], [8], [10], and [11]. These are explained in more detail in Section III-B.

    The second way is for the encoder to use a single predictor that uses information common to both descriptions. MD-SNR [12] is one example, as is the coder in [13]. These are explained in more detail in Section III-C.

    2) Predictors in Class B use the same predictor that would be used by a single-description (SD) predictive encoder. This predictor minimizes the prediction error and hence introduces no additional redundancy during the prediction process. In general, the decoder cannot form this prediction unless both descriptions have been received; hence, mismatch will occur. Many MD video coders use this predictor strategy; a few examples are [14], [15], and [11]. These coders and others are explained in more detail in Sections III-D and III-E.

    3) A predictor in Class C has parameters that can be selected to trade off the loss in prediction efficiency (compared to an SD predictor) against the amount of mismatch. Two examples of encoders using this class of predictor are [16] and [17]. These are explained in more detail in Section III-F.
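The Class B versus Class C tradeoff can be sketched with a small scalar simulation. Everything in it is an assumption for illustration: a random-walk source, a first-order predictor scaled by a leak factor `alpha` and applied to the central reconstruction, and a hypothetical MD-PE made of two uniform quantizers offset by half a step, whose central decoder averages both outputs. With `alpha = 1` the encoder uses the SD predictor (Class B); intermediate values give a tunable, Class C-style predictor (leaky prediction is one simple way to get such a knob, not necessarily the design of [16] or [17]); `alpha = 0` disables prediction. A side decoder that receives only description 1 runs the same predictor on its own reconstruction.

```python
import random

def md_pe_encode(e, step):
    """Hypothetical MD-PE: two uniform quantizers offset by half a step."""
    e1 = step * round(e / step)
    e2 = step * round(e / step - 0.5) + 0.5 * step
    return e1, e2

def simulate(alpha, n=2000, step=1.0, seed=1):
    """Scalar P-MD loop with a leaky predictor alpha * (previous reconstruction).
    The encoder tracks only the central state; the side-1 decoder runs the same
    predictor on its own reconstruction. Returns (mean squared prediction error,
    mean squared difference between central and side-1 reconstructions)."""
    rng = random.Random(seed)
    x = central = side1 = 0.0
    err_energy = mismatch_energy = 0.0
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)                      # random-walk source sample
        e = x - alpha * central                       # prediction error w.r.t. central state
        e1, e2 = md_pe_encode(e, step)
        central = alpha * central + 0.5 * (e1 + e2)   # central MD-PE decoder: average
        side1 = alpha * side1 + e1                    # side decoder sees description 1 only
        err_energy += e * e
        mismatch_energy += (central - side1) ** 2
    return err_energy / n, mismatch_energy / n

for alpha in (1.0, 0.9, 0.5, 0.0):
    err, mis = simulate(alpha)
    print(f"alpha={alpha:.1f}  prediction-error energy={err:9.2f}  mismatch energy={mis:.3f}")
```

Lowering `alpha` raises the prediction-error energy, which costs rate at a fixed quality, while bounding how far the side reconstruction drifts from the central one; with `alpha = 0` the difference never exceeds a quarter quantizer step.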

    Because predictors in Classes B and C introduce mismatch, an encoder using one of these predictors can optionally send a compressed version of the mismatch. The side decoder, by combining the central prediction error and the mismatch signal, can completely or partially cancel the prediction mismatch when reconstructing each frame. Another approach to controlling mismatch is to periodically send I-frames, which clear any accumulated mismatch in the reconstructed frames.
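The effect of periodic I-frames can be seen from the mismatch recursion alone. The sketch below assumes a linear predictor with coefficient `a` and treats the per-frame difference `deltas[n]` between the central and side prediction-error reconstructions as given; in an I-frame the prediction is zero, so the accumulated mismatch is discarded. The function name and parameters are illustrative only.

```python
def mismatch_trajectory(deltas, a=1.0, intra_period=None):
    """Mismatch between the central (encoder) and side-decoder states per frame.
    Predicted frame: m[n] = a * m[n-1] + deltas[n]  (mismatch propagates).
    I-frame (every intra_period frames): the predictor is zero, so m[n] = deltas[n]."""
    m, out = 0.0, []
    for n, d in enumerate(deltas):
        is_intra = intra_period is not None and n % intra_period == 0
        m = d if is_intra else a * m + d
        out.append(m)
    return out

deltas = [0.25] * 30                       # constant per-frame MD-PE difference
no_refresh = mismatch_trajectory(deltas)
refresh = mismatch_trajectory(deltas, intra_period=10)
print(no_refresh[-1], max(refresh))        # 7.5 2.5
```

Without refresh the mismatch grows linearly; with an I-frame every ten frames it is capped at ten frames' worth of accumulation, at the cost of the extra rate an I-frame needs.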

    Table 2 summarizes the four types of redundancy that can be added to a P-MD video coder. First, the MD-PE encoder introduces redundancy, denoted $\rho_e$. Second, using a predictor that is less efficient than the SD predictor adds redundancy, denoted $\rho_p$. Third, the bit rate $\rho_m$ can be used solely to reduce mismatch. Finally, the redundancy $\rho_s$ corresponds to any bit rate used to describe side information in excess of that used by an SD encoder. This includes extra bits used to describe motion vectors and any other MD parameters. This redundancy may be nontrivial if the total rate is low or if the encoder adapts its redundancy dynamically based on time-varying source and/or channel characteristics. The overall rate of a P-MD encoder can thus be expressed as

    $$R = R^* + \rho_e + \rho_p + \rho_m + \rho_s \tag{5}$$

    Table 2. Summary of Redundancy Types

    C. Using P-MD Coders on a Packet-Loss Network

    In a packet-loss network, compressed information is partitioned into multiple packets, and during transmission each packet has a (typically equal) probability $p$ of being lost. Most real-world networks, including the Internet and many wireless networks, experience bursty packet losses, i.e., many successive packets are often lost in a burst due to congestion or path failure. If the two descriptions from an MD coder are packetized such that each burst usually affects only one description, then at the burst level the packet-loss network can be approximated as an ideal MD network. This has led to the surge of interest in employing MD coders to transport signals over the Internet and wireless networks. This section describes the challenges associated with designing a P-MD coder and corresponding packetization schemes for operation over a packet-loss network.
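Whether a burst hits one description or both depends on packetization and routing. The sketch below assumes a two-state Gilbert burst-loss model with illustrative transition probabilities (good-to-bad 0.01, bad-to-good 0.2, i.e., a mean burst of five packets; not measured values) and compares sending the two descriptions of each frame back to back on one path with sending them on two independent paths, as in MPT.

```python
import random

def gilbert_losses(n, p_gb, p_bg, rng):
    """Two-state Gilbert model: the channel moves good->bad w.p. p_gb and
    bad->good w.p. p_bg per packet slot; a packet sent in the bad state is lost."""
    bad, lost = False, []
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_bg   # stay in the burst unless we recover
        else:
            bad = rng.random() < p_gb    # enter a burst
        lost.append(bad)
    return lost

def both_lost_fraction(frames, p_gb=0.01, p_bg=0.2, seed=0):
    """Fraction of frames whose two descriptions are BOTH lost, for
    (a) one path carrying D1 and D2 of each frame in consecutive slots, and
    (b) two independent paths carrying D1 and D2 in the same slot."""
    rng = random.Random(seed)
    single = gilbert_losses(2 * frames, p_gb, p_bg, rng)
    one_path = sum(single[2 * t] and single[2 * t + 1] for t in range(frames))
    path_a = gilbert_losses(frames, p_gb, p_bg, rng)
    path_b = gilbert_losses(frames, p_gb, p_bg, rng)
    two_paths = sum(a and b for a, b in zip(path_a, path_b))
    return one_path / frames, two_paths / frames

one_path, two_paths = both_lost_fraction(100_000)
print(f"both descriptions lost: one path {one_path:.4f}, two paths {two_paths:.4f}")
```

On a single bursty path, a burst that starts within a frame's description pair usually wipes out both descriptions; with independent paths, simultaneous loss requires two bursts to overlap, which is what lets the packet network behave approximately like an ideal MD network.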

    In an ideal MD network, a description is either intact or completely lost. In a packet network, a description sent using multiple packets may be only partially lost. Further, losses may appear in both descriptions, either at distinct times or simultaneously. In an ideal MD network, an MD encoder can easily anticipate the possible decoding states exactly; in a packet-loss network, however, this is challenging. If the two descriptions are partitioned into $N$ packets in total, there are $2^N$ possible sets of packets the decoder might receive. Thus, mismatch control by the encoder is challenging.

    As for the ideal MD network, two approaches to mismatch control over a packet-loss network are possible: mismatch coding or mismatch elimination. In practical MD video coders, mismatch coding by sending mismatch signals is often abandoned in favor of the simpler approach of sending an I-frame to clear the error. Other approaches for mismatch coding include using a sync-frame [9] or an SP-frame [18].

    For mismatch elimination in a packet network, two approaches have been proposed. The first combines smart packetization with a mismatch-free P-MD encoder [10]. Motion-compensated prediction is constrained so that each description can be partitioned into multiple packets, each self-sufficient for prediction [10]. However, this requires a latency as long as a group of pictures (GOP). In the second approach, a mismatch-free P-MD coder generates as many descriptions as there are packets, one for each packet. However, this may increase redundancy dramatically.

    D. Optimization of a P-MD Coder

    The performance of a P-MD coder in an MD or a packet network depends on the channel characteristics and the source statistics. Because both vary with time, it is important that MD coders be able to adapt redundancy flexibly and dynamically. The many sources of redundancy in a P-MD coder make this a challenging task.

    For the ideal MD network, we need to allocate the total rate optimally between $R^*$ and $\rho$, as described in (3). If the total rate is constrained within each frame period, then it is also necessary to allocate the redundancy among its individual components [see (5)]. Further, if the total rate can be allocated across multiple frames, then it is also important to allocate rate and redundancy optimally to the different frames. In a coder with mismatch, the side distortions due to mismatch, $D_{1,m}$ and $D_{2,m}$, typically increase over time. Thus, it is generally useful to allocate more redundancy to early frames to reduce the mismatch in subsequent frames.

    For the packet-loss network, the goal is to minimize the overall distortion averaged over all possible packet-loss scenarios and over time, subject to a constraint on the overall rate. Except in the mismatch-free case (achievable with, e.g., the two approaches described in Section II-C), a packet loss in an early frame will affect the reconstruction of later frames. Choosing the MD parameters for each frame while taking into account the current and future impact of loss is a daunting task.

    Two approaches are possible for the packet-loss network. The first chooses all MD parameters across the GOP to minimize the average distortion, taking error propagation into account [19]. The second approach is recursive, using greedy optimization at each frame period (or a smaller interval) to choose the MD parameters that minimize the distortion for the current frame, based on past frames [20]. The effect of errors on future frames is ignored.

    When optimizing over either type of network, the coder should be able to adjust redundancy flexibly. As the channel failure or packet-loss probability $p$ becomes small, an MD coder should ideally add less redundancy so that it closely approximates an SD coder. As we will see in Section III, some proposed coders do not allow independent control over all types of redundancy: when one type of redundancy decreases, another necessarily increases.

    III. REVIEW OF MD VIDEO CODERS

    In this section, we describe specific implementations of MD video coding algorithms. We begin in Section III-A by describing MD coding algorithms useful for the MD-PE block of the P-MD encoder illustrated in Fig. 3. Sections III-B and III-C describe video coders that have no mismatch because they use a predictor in Class A. Then, Sections III-D and III-E describe video coders that use the SD predictor, both without and with explicit mismatch coding. Section III-F describes video coders that use a predictor in Class C, which trades prediction efficiency against mismatch, either with or without optional mismatch coding. Section III-G describes MD coders realized by applying unequal forward error correction (FEC) to different layers of a scalable bitstream. Depending on how many layers are included as references for prediction, these coders may have a predictor in Class A, B, or C.

    Ideally, we would like to compare the different implementations of MD video coders in a fair and balanced way. Unfortunately, because of the variety of experimental results reported in the literature, this is difficult. The experiments use high- or low-motion test video sequences and differ in resolution formats, bit rates, frame rates, error concealment methods, packetization schemes, and error patterns. Therefore, in this section, we purposely do not compare the performance of the different MD video coders. We refer the interested reader to the individual publications to learn more about comparative performance.

    A. Multiple Description Algorithms

    In recent years, a variety of practical MD compression algorithms have been proposed: subsampling in the spatial, temporal, or frequency domain; MD quantization; and MD transform coding. These algorithms typically appear in the MD-PE encoder of the P-MD encoder of Fig. 3. Some of these methods were also recently reviewed in [6].

    MD subsampling decomposes the original signal into subsets, either in the spatial, temporal, or frequency domain [21], [9], [22]–[24], [15], [8], [16], [11], [25], where each subset corresponds to a different description. These MD subsampling algorithms take advantage of the fact that spatially or temporally adjacent video data samples are correlated. Thus, one description can be estimated from the other. Representative algorithms include temporal frame interleaving [9], spatial pixel interleaving applied either to image samples [11], [22] or motion vectors [15], and transform coefficient interleaving [23], [24], [26]. Optimal partitioning strategies are considered in [25].
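    As a concrete illustration of spatial pixel interleaving, the following minimal Python sketch splits a frame on a quincunx (checkerboard) lattice and lets a side decoder estimate each missing pixel from its received neighbours. The function names, the list-of-rows frame layout, and the neighbour-averaging estimator are illustrative assumptions; no prefilter or prediction loop is modeled.

```python
def quincunx_split(frame):
    """Split a 2-D frame (list of rows) into two checkerboard descriptions.

    Description 0 keeps pixels with (row + col) even, description 1 keeps the
    others; positions a description does not carry are stored as None.
    """
    d0 = [[v if (r + c) % 2 == 0 else None for c, v in enumerate(row)]
          for r, row in enumerate(frame)]
    d1 = [[v if (r + c) % 2 == 1 else None for c, v in enumerate(row)]
          for r, row in enumerate(frame)]
    return d0, d1


def side_decode(desc):
    """Estimate each missing pixel as the mean of its in-frame 4-neighbours.

    On a quincunx lattice every 4-neighbour of a missing pixel belongs to the
    received description. Assumes the frame has at least two pixels.
    """
    rows, cols = len(desc), len(desc[0])
    out = [row[:] for row in desc]
    for r in range(rows):
        for c in range(cols):
            if desc[r][c] is None:
                nbrs = [desc[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < rows and 0 <= cc < cols]
                out[r][c] = sum(nbrs) / len(nbrs)
    return out


def central_decode(d0, d1):
    """Both descriptions together restore every pixel exactly."""
    return [[a if a is not None else b for a, b in zip(r0, r1)]
            for r0, r1 in zip(d0, d1)]


frame = [[r + c for c in range(4)] for r in range(4)]   # smooth ramp test frame
d0, d1 = quincunx_split(frame)
print(side_decode(d0)[1][2])   # prints 3.0 (interior estimate is exact on a ramp)
```

    Because neighbouring samples are correlated, interior pixels of a smooth frame are recovered well from one description; border pixels have fewer neighbours and are estimated less accurately.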

    Without prefiltering, the redundancy and side distortion of MD subsampling algorithms are controlled by the source statistics. However, a well-designed prefilter can control the tradeoff between redundancy and side distortion [11], [22].

    In MD quantization algorithms [5], [27]–[33], the output of a quantizer is assigned two indexes, one for each description. The MD decoder estimates the reconstructed signal based on the received index(es). The redundancy introduced by an MD scalar quantizer (MDSQ) and the corresponding side distortion are both controlled by the assignment of indexes to each quantization bin.
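    A minimal sketch of the index-assignment idea, assuming the simplest possible MDSQ: two uniform quantizers staggered by half a step. The central quantizer index k is mapped to the pair (k // 2, (k + 1) // 2); each index alone gives a coarse reconstruction, and both together halve the cell width. This staggered assignment is only an illustration, not the optimized designs of [5], and the function names are hypothetical.

```python
import math


def mdsq_encode(x, step):
    """Staggered two-description scalar quantizer (simplest MDSQ index assignment).

    The central quantizer has cells of width step/2 with index k; the index
    assignment maps k to (k // 2, (k + 1) // 2), so each description alone is a
    uniform quantizer of width `step`, offset by step/2 from the other.
    """
    k = math.floor(2 * x / step)
    return k // 2, (k + 1) // 2


def mdsq_decode(i0=None, i1=None, step=1.0):
    """Reconstruct at the midpoint of the cell implied by the received indexes."""
    if i0 is not None and i1 is not None:
        k = i0 + i1                      # inverse index assignment
        return (k + 0.5) * step / 2      # central cell [k*step/2, (k+1)*step/2)
    if i0 is not None:
        return (i0 + 0.5) * step         # side cell [i0*step, (i0+1)*step)
    if i1 is not None:
        return i1 * step                 # side cell [(i1-0.5)*step, (i1+0.5)*step)
    raise ValueError("no description received")


i0, i1 = mdsq_encode(0.8, step=1.0)
print(i0, i1, mdsq_decode(i0, i1), mdsq_decode(i0=i0), mdsq_decode(i1=i1))
# prints: 0 1 0.75 0.5 1.0
```

    This assignment is very redundant, since each description alone carries only about one bit less than the central quantizer. Index assignments that map more central cells to each pair of side indexes, as in [5], lower the redundancy at the cost of larger side distortion.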

    In MD transform coding (MDTC) [7], [34], a correlating transform introduces a controlled amount of redundancy between two sets of coefficients. Within each description, coefficients should be uncorrelated for maximum coding efficiency. At the decoder, missing coefficients can be estimated from the received description. For example, the pairwise correlating transform (PCT) [7] transforms a pair of random variables using a nonorthogonal linear transform. The two outputs each lie in one of two nonorthogonal subspaces. The optimal transform that achieves minimal side distortion for a given redundancy is parameterized by a single variable, which controls the amount of correlation introduced and, in turn, the redundancy and side distortion.
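    The following Python sketch illustrates the mechanism with an assumed one-parameter, unit-determinant pairwise transform; it is not the optimal PCT derived in [7]. The parameter alpha sets the correlation between the two outputs c and d, and a side decoder that receives only c forms a linear minimum mean-squared-error estimate of d before inverting the transform. Zero-mean, independent inputs with known variances are assumed, and all names are hypothetical.

```python
def pct_forward(a, b, alpha):
    """Illustrative unit-determinant pairwise transform (not the optimal PCT of [7]).

    c = alpha*a + beta*b and d = -alpha*a + beta*b with beta = 1/(2*alpha),
    so the determinant 2*alpha*beta equals 1.
    """
    beta = 1.0 / (2.0 * alpha)
    return alpha * a + beta * b, -alpha * a + beta * b


def pct_inverse(c, d, alpha):
    """Exact inverse, used by the central decoder."""
    beta = 1.0 / (2.0 * alpha)
    return (c - d) / (2.0 * alpha), (c + d) / (2.0 * beta)


def correlation(alpha, var_a, var_b):
    """Correlation between c and d for independent zero-mean a and b."""
    beta = 1.0 / (2.0 * alpha)
    return (beta**2 * var_b - alpha**2 * var_a) / (beta**2 * var_b + alpha**2 * var_a)


def side_decode_from_c(c, alpha, var_a, var_b):
    """Side decoder: linear MMSE estimate of the lost d from c, then invert.

    var(c) == var(d) for this transform, so the estimator slope is the correlation.
    """
    d_hat = correlation(alpha, var_a, var_b) * c
    return pct_inverse(c, d_hat, alpha)


c, d = pct_forward(3.0, -1.0, alpha=0.8)
print(pct_inverse(c, d, 0.8), side_decode_from_c(c, 0.8, var_a=1.0, var_b=4.0))
```

    Increasing the magnitude of the correlation makes c a better predictor of the lost d, lowering side distortion at the price of a less efficient (more redundant) representation when both descriptions arrive.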

    B. MD Video Coders Using Multiple Class A Predictors

    We consider here video coders that use multiple predictors to eliminate mismatch, starting with subsampling techniques [9], [8], [11] followed by MDSQ methods [10], [35].

    In the video redundancy coding (VRC) algorithm [9], [36], a video sequence is temporally down-sampled into two subsets, essentially with every other frame in each subset. The frames in each subset are coded into a description using an SD video encoder. At the decoder, if only one description is received, the missing frames can be estimated from the received frames. In the context of the framework in Fig. 3, we see that the MD-PE encoder simply alternates each frame; hence, there is no type (a) redundancy. There is no mismatch in an ideal MD network, because the two predictors used at the encoder are exactly those of the side decoders. However, this comes at the expense of nonzero type (b) redundancy, which is introduced because of the decreased correlation between the alternated frames as compared to adjacent frames. For this coder, type (b) redundancy is governed by the source and cannot be controlled by the encoder.
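    Temporal interleaving of this kind can be sketched in a few lines of Python. The sketch below, whose helper names are hypothetical, assigns frame t to description t % n_desc (n_desc > 2 gives the extension to more descriptions discussed in this section) and conceals missing frames at a side decoder by linear interpolation between the nearest received frames. The SD encoding of each subsequence is omitted, and frames are plain lists of pixel values.

```python
def temporal_split(frames, n_desc=2):
    """Frame t goes to description t % n_desc (VRC-style interleaving)."""
    return [{t: f for t, f in enumerate(frames) if t % n_desc == d}
            for d in range(n_desc)]


def side_decode_temporal(desc, n_frames):
    """Conceal missing frames by linear interpolation between the nearest
    received frames, repeating the edge frame at the ends of the sequence."""
    received = sorted(desc)
    out = []
    for t in range(n_frames):
        if t in desc:
            out.append(desc[t])
            continue
        prev = max((r for r in received if r < t), default=None)
        nxt = min((r for r in received if r > t), default=None)
        if prev is None:
            out.append(desc[nxt])
        elif nxt is None:
            out.append(desc[prev])
        else:
            w = (t - prev) / (nxt - prev)
            out.append([(1 - w) * p + w * n for p, n in zip(desc[prev], desc[nxt])])
    return out


frames = [[float(t)] for t in range(5)]      # one-pixel "frames" whose value ramps
even, odd = temporal_split(frames)
print(side_decode_temporal(odd, 5))          # [[1.0], [1.0], [2.0], [3.0], [3.0]]
```

    Each subsequence is predicted only from frames in the same description, which is why there is no mismatch; the cost is that the prediction reference is two frames away, the type (b) redundancy described above.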

    In a packet-loss network, VRC deals with uncertain losses by using sync-frames, and redundancy can be adjusted by adapting the frequency of the sync-frames.

    The general idea of using multiple states for prediction was proposed in [8] for a packet-loss network. The implementation of the multiple-state encoder in [8] was identical to VRC without sync-frames. State recovery for this coder was also presented in [37]: if one frame is lost, the correctly received frames are used to help recover the missing frame.

    Instead of temporal subsampling, the independent flow MD video coder (IF-MDVC) [11] uses the quincunx subsampling lattice on each video frame to generate two subsequences and applies a separate predictive encoder to each subsequence. Before down-sampling, each frame is low-pass filtered in the DCT domain using a prefilter. Adjusting the prefilter parameter controls both type (a) and type (b) redundancy.

    One advantage of using MD subsampling algorithms to build P-MD video coders is that, because of the subsampling, no redundancy is needed for sending motion information. Therefore, type (d) redundancy is close to zero. Another advantage is that these algorithms are easy to extend to more than two descriptions by increasing the subsampling rate. However, the redundancy due to prediction inefficiency is expected to increase rapidly as the number of descriptions grows.

    Mismatch-free MD video coding using multistate prediction can also be accomplished using MDSQ [10], with mutually refining DPCM (MR-DPCM). Again, two predictors are used at the encoder to mimic the two side-decoder predictors. Here, the prediction errors are DCT transformed and quantized by an MDSQ. To ensure that the information received in both descriptions is mutually refining at the central decoder, a coarse quantizer is applied in the prediction loops. In MR-DPCM, redundancy is controlled by changing the index assignment of MDSQ. However, as this type (a) redundancy decreases, the prediction efficiency decreases, so type (b) redundancy increases.

    62 PROCEEDINGS OF THE IEEE, VOL. 93, NO. 1, JANUARY 2005

    Regunathan and Rose use an estimation-theoretic approach to enhance MR-DPCM in [35]. They improve the quality of the reconstructed video displayed at the decoder in all instances by using an estimation algorithm that takes into account all information received by the decoder.

    C. MD Video Coders Using One Class A Predictor

    Next, we consider video coders that eliminate mismatch using a single predictor. To accomplish this, the predictor must form its prediction using information that is common to both descriptions. In general, this will incur type (b) redundancy because the predictor will be less efficient than an SD predictor.

    One example is MD-SNR, which was presented in [12] as a reference for comparison. The MD-SNR coder is built using an H.263 SNR-scalable coder by duplicating the base layer (BL) into both descriptions and alternating the enhancement layer between the two descriptions. At the decoder, if only one description is received, only the BL is used for reconstruction. In H.263 SNR scalability, the prediction of the BL information uses only BL information. Therefore, MD-SNR introduces no mismatch. Redundancy in the MD-SNR codec is controlled by the size of the BL. Duplicating more information increases type (a) redundancy but decreases type (b) redundancy, because the prediction efficiency is improved.

    This prediction strategy was also used in [13], which codes the prediction error using a discrete wavelet transform (DWT). The bitstream corresponding to the coded bitplane of a block of DWT coefficients may be included in one, two, or more descriptions. The prediction is based only on information that is common to all descriptions.

    D. MD Video Coders Using Class B Predictor Without Mismatch Coding

    The Class B predictor is the SD predictor. As a result, all coders that use this predictor have no type (b) redundancy. However, without some form of mismatch coding, these coders will all have nonzero mismatch. The mismatch can be controlled implicitly by adjusting the parameters of the MD-PE encoder, where more type (a) redundancy reduces the mismatch. Alternatively, the mismatch can be coded, at the cost of additional redundancy, to reduce or eliminate it. This section describes the basic coders without mismatch coding.
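    The accumulation of mismatch can be illustrated with a toy scalar DPCM loop in Python. The sketch assumes that each frame is a single number, that the encoder predicts from the central reconstruction (the SD predictor), and that one side decoder receives only a coarsely requantized copy of each prediction error; these simplifications and the function name are illustrative assumptions, not a model of any specific coder in this section. Optional periodic I-frames show how intra refresh resets the drift.

```python
def simulate_drift(x, step=1.0, coarse=4.0, intra_period=None):
    """Toy scalar DPCM loop showing mismatch (drift) at one side decoder.

    The encoder predicts from the central reconstruction (SD predictor); the
    side decoder is assumed to receive only a coarsely requantized copy of each
    prediction error and predicts from its own previous reconstruction.
    Returns |central - side| per frame.
    """
    def q(v, s):
        return round(v / s) * s

    central = side = 0.0
    mismatch = []
    for t, xt in enumerate(x):
        if intra_period and t % intra_period == 0:
            central = side = q(xt, step)       # I-frame, carried in both descriptions
        else:
            e = q(xt - central, step)          # SD prediction error
            central += e
            side += q(e, coarse)               # side decoder sees a coarser version
        mismatch.append(abs(central - side))
    return mismatch


ramp = [float(t) for t in range(20)]
print(simulate_drift(ramp)[-1], max(simulate_drift(ramp, intra_period=5)))   # 19.0 4.0
```

    For the ramp input, the mismatch grows by one quantization step per frame without I-frames, whereas an I-frame every five frames keeps it at or below four steps.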

    The first coder we describe here is based on MDTC [12]. We call it MDTC-NMC, where NMC denotes no mismatch coding. A single prediction error is formed using an SD predictor with all information that would be available to the central decoder. The DCT is applied to the prediction error, followed by MDTC, which introduces correlation between pairs of DCT coefficients. One member of each pair is sent in each description. Motion vectors are duplicated in each description, so type (d) redundancy is nonzero and depends on the spatial resolution. Type (a) redundancy can be controlled by adjusting the transforms applied to each pair of coefficients. Larger type (a) redundancy decreases both terms of the side distortion in (4). Total redundancy was allocated across time in [12] to reduce error propagation due to mismatch. However, in general, MDTC-NMC cannot completely eliminate mismatch. Therefore, periodic I-frames are inserted to clear up mismatch.

    To avoid the type (d) redundancy associated with duplicating motion vectors, Kim and Lee proposed a coder that splits motion vectors and DCT coefficients of adjacent blocks into two descriptions using the quincunx lattice [15]. For this coder, the redundancy is very small and fixed by the source, and, because the SD predictor is used, there is no type (b) redundancy. They use overlapped block motion compensation (OBMC) to improve the reconstruction of the side decoders; however, because of the minimal redundancy used, this coder may have significant side distortion.

    In [14], Reibman et al. proposed the MD-split method, which uses the simplest possible MD algorithm for the MD-PE encoder: duplication or alternation. Using an H.263 bitstream, motion vectors are always duplicated, and a varying number of low-frequency coefficients are also duplicated. The number of coefficients to duplicate can be adapted easily based on varying source and channel statistics without explicitly informing the decoder. Duplicating more coefficients increases type (a) redundancy and decreases both terms in the side distortion (4). This coder will always have mismatch unless all coefficients are duplicated.
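    The duplicate-or-alternate rule of MD-split can be sketched as follows. This is a minimal Python illustration on one block of zigzag-ordered coefficients, assuming hypothetical helper names and dictionary-based descriptions; motion vectors, which MD-split always duplicates, and the H.263 syntax are left out.

```python
def md_split(coeffs, n_dup):
    """Split one block's zigzag-ordered coefficients into two descriptions.

    The first n_dup (low-frequency) coefficients are duplicated; the rest
    alternate between the descriptions. Each description is a dict
    {position: value}.
    """
    d0, d1 = {}, {}
    for i, c in enumerate(coeffs):
        if i < n_dup:
            d0[i] = d1[i] = c            # duplicated: carried in both
        elif (i - n_dup) % 2 == 0:
            d0[i] = c                    # alternated: description 0 only
        else:
            d1[i] = c                    # alternated: description 1 only
    return d0, d1


def decode(n, *descs):
    """Merge the received descriptions; coefficients nobody sent are zero."""
    out = [0] * n
    for d in descs:
        for i, c in d.items():
            out[i] = c
    return out


block = [10, 5, 3, 2, 1, 1]
d0, d1 = md_split(block, n_dup=2)
print(decode(6, d0), decode(6, d0, d1))   # [10, 5, 3, 0, 1, 0] [10, 5, 3, 2, 1, 1]
```

    Raising n_dup increases the redundancy and lowers the side distortion; n_dup equal to the block length duplicates everything, which is the only setting without mismatch.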

    In parallel, Comas et al. [38] proposed an algorithm with very similar properties. Again, coefficients are either duplicated or sent in just one of two descriptions. However, in [38], one description of a block contains all coefficients while the other contains only low-frequency coefficients. By alternating, on a frame or block basis, which description gets all coefficients and which gets a subset of coefficients, balanced distortion can be achieved for each description.

    MD-split was optimized for an ideal MD network in [14] and for a packet-loss network in [20]. In each case, the effect of past mismatch was considered, but the impact of future mismatch and error propagation was not.

    The MD-split coder duplicates only the lower frequency coefficients. Recently, Kim and Cho [39] extended this to allow any coefficient to be duplicated. Instead of operating on the DCT coefficients, the matching pursuits MD video coder (MP-MDVC) proposed by Tang et al. [40] duplicates some matching pursuits atoms in both descriptions and alternates the rest between them. Redundancy is controlled by adjusting the number of duplicated atoms.

    The drift compensated MD video coder (DC-MDVC) [11] also uses the SD predictor. Similar to IF-MDVC, it uses spatial down-sampling with prefiltering, but applied to the prediction error images instead of the original video. By adjusting the prefiltering, the redundancy can be controlled. Both terms of the side distortion in (4) decrease as this redundancy increases.

    E. Optional Mismatch Coding for Class B Predictors

    All coders described in Section III-D may include optional mismatch coding, which incurs additional redundancy. One example of explicit mismatch coding is to use I-frames to reset errors.


    An alternative is to code the mismatch signal [12], [40], [11], [41], [42]. These methods use all three predictors shown in Fig. 3: the central predictor is the SD predictor, and each of the two side predictors mimics the predictor at the corresponding side decoder. For a Class B predictor, the mismatch error signal is the difference between the central (SD) prediction and the corresponding side prediction. We note that for these coders it may not be possible to decompose the side distortion into the two terms as in (4).

    Two mismatch coding strategies were proposed in [12], both using MDTC as the MD-PE encoder. We call them MDTC-FMC and MDTC-PMC, with the suffixes indicating full mismatch coding and partial mismatch coding, respectively. MDTC-FMC sends a mismatch correction signal in each description, using a quantizer that is generally coarser than the quantizer used to send the prediction error. Each side decoder adds the quantized correction signal to the reconstruction it would obtain without the correction, yielding a better side reconstruction that is free of mismatch. The side distortion is controlled by the mismatch quantizer only. By changing this quantizer, the mismatch-coding redundancy can be adjusted independently of the coding of the prediction error.
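    To show why coding the mismatch bounds the drift, the toy loop below adds a coarse correction signal to the side reconstruction. As an illustrative assumption, the correction is taken as the difference between the central reconstruction and the side reconstruction obtained without correction, quantized with step mismatch_step; the exact signal coded differs among the methods cited in this subsection, and the names are hypothetical.

```python
def simulate_mismatch_coding(x, step=1.0, coarse=4.0, mismatch_step=None):
    """Toy scalar loop with optional coarse mismatch correction.

    Each 'frame' is one number. The side decoder receives a coarsely
    requantized prediction error and, if mismatch_step is set, a quantized
    correction equal (by assumption) to central minus uncorrected side
    reconstruction. Returns |central - side| per frame.
    """
    def q(v, s):
        return round(v / s) * s

    central = side = 0.0
    mismatch = []
    for xt in x:
        e = q(xt - central, step)          # SD prediction error (Class B predictor)
        central += e
        side += q(e, coarse)               # side reconstruction before correction
        if mismatch_step is not None:
            side += q(central - side, mismatch_step)   # decoder adds the correction
        mismatch.append(abs(central - side))
    return mismatch


ramp = [float(t) for t in range(20)]
print(max(simulate_mismatch_coding(ramp)),
      max(simulate_mismatch_coding(ramp, mismatch_step=2.0)))   # 19.0 1.0
```

    With the correction, the mismatch can never exceed half the mismatch quantizer step, so a coarser step trades side distortion for fewer bits, in line with the role of the mismatch quantizer described above.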

    In MDTC, each description contains information in one of two nonorthogonal subspaces. Motivation for MDTC-PMC [12] arose from noting that most of the energy of the mismatch signal lies in the subspace orthogonal to its description. Therefore, MDTC-PMC codes the mismatch signal only in that orthogonal subspace. For the same mismatch quantizer, MDTC-PMC has lower redundancy than MDTC-FMC but higher side distortion.

    In [12], the decoder never uses the coded mismatch signal when both descriptions are received. Tang et al. [40] use maximum likelihood estimation in MP-MDVC to improve the reconstruction quality of the central decoder by incorporating the information in the coded mismatch signal. The MP atoms used to represent the mismatch are chosen to minimize the side distortion, and the redundancy is controlled by the number of atoms used to code it. In [43], MP-MDVC was further optimized for a packet-loss network. A fast algorithm is presented to jointly select the number of atoms that are duplicated and the number of atoms used for mismatch coding, depending on the packet-loss conditions.

    The mismatch coding described above codes the difference between the central and side predictions. An alternative approach, applied to DC-MDVC, codes a different error signal [11], chosen to minimize the difference between the central and side reconstructions, which reduces the mismatch in following frames.

    The coders above each send the mismatch error as an additional signal. Jagmohan and Ratakonda [41] instead send, using MDTC, a single signal in each description that contains the sum of the SD prediction error and the mismatch. A similar idea was incorporated in an MDSQ coder by Lee et al. [42] and applied to video; in [42], the signal included in each description is formed using the output of the MDSQ quantizer for that description. Both coders require that the same quantizer be applied to the prediction error and the mismatch signal. Thus, one cannot separately control the central and side distortion. Further, as the MD-PE parameters are changed to decrease the redundancy, the side distortion necessarily increases.

    F. MD Video Coders Using Class C Predictors With and Without Optional Mismatch Coding

    In this section, we consider coders that use a predictor in Class C. These coders have very flexible redundancy allocation, enabling easy adaptation to varying channel conditions.

    Motivated by the MD-DPCM speech coder [21], Wang and Lin [16] proposed a coder named MD motion compensation (MDMC). In MDMC, the central predictor forms a linear superposition of the past two reconstructed frames. The MD-PE encoder uses temporal subsampling similar to VRC, so that each side decoder receives only every other frame. As a result, without optional mismatch coding, the side decoders will have mismatch. The weights for the linear superposition control both redundancy and mismatch.
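    The superposition predictor can be sketched as follows, with a1 weighting frame t-1 and 1 - a1 weighting frame t-2 (this parameterization and the function names are assumptions for illustration, and motion compensation is omitted). Under the hypothetical simplification that a side decoder lacking frame t-1 simply predicts from frame t-2, the mismatch equals a1 times the difference between the two reference frames, which makes the weight's control over mismatch explicit.

```python
def mdmc_central_prediction(prev1, prev2, a1):
    """Central superposition prediction: a1 * frame(t-1) + (1 - a1) * frame(t-2).

    a1 = 1 gives the conventional SD predictor; a1 = 0 predicts only from frame
    t-2, which lies in the same description (VRC-like, no mismatch).
    """
    return [a1 * p1 + (1 - a1) * p2 for p1, p2 in zip(prev1, prev2)]


def side_mismatch(prev1, prev2, a1):
    """Mismatch at a side decoder that lacks frame t-1.

    Hypothetical simplification: the side decoder substitutes frame t-2 for the
    missing frame t-1, so its prediction is frame t-2 and the mismatch equals
    a1 * (frame(t-1) - frame(t-2)).
    """
    central = mdmc_central_prediction(prev1, prev2, a1)
    return [c - p2 for c, p2 in zip(central, prev2)]


prev1, prev2 = [4.0, 6.0], [2.0, 6.0]
for a1 in (0.0, 0.25, 1.0):
    print(a1, side_mismatch(prev1, prev2, a1))
```

    Small a1 keeps the mismatch small but makes the central prediction less efficient; large a1 approaches the SD predictor and its full mismatch.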

    In [17], Kim et al. presented double-vector motion compensation (DMC) for use in a packet-loss network. DMC also uses a linear superposition for motion compensation. Expressions for error propagation as a function of the superposition weights were presented. Optional mismatch coding was not considered.

    The MDMC encoder [16] has also been designed to include mismatch coding. MDMC with mismatch coding has highly flexible redundancy control: the superposition weights control the redundancy due to prediction inefficiency, and the quantization parameter for coding the mismatch controls both the mismatch-coding redundancy and the side distortion. The coded mismatch signal enables not only the cancellation of the mismatch but also more accurate estimation of the missing description (i.e., estimating even frames from odd frames, and vice versa). MDMC operates slightly differently depending on the expected network. For ideal MD networks, the encoder stores the reconstructed past frames from the three possible decoding scenarios separately and forms the side predictions using past frames corresponding to the different states. For packet-loss networks, where it is difficult to anticipate the possible decoding states, only the reconstructed frames from both descriptions are stored, and mismatch is controlled via both mismatch coding and periodic I-frames. Adaptation of the MDMC coder to channel loss conditions requires a good model that relates rate, redundancy, central distortion, and side distortion to the encoder parameters. This is studied in [44].

    G. MDC Through Unequal FEC

    Instead of designing the source encoder to yield multiple descriptions directly, one can apply unequal cross-packet FEC to different parts of a scalable bitstream. This method, pioneered in [45] and [46], is commonly known as MD-FEC. As shown in Fig. 4, the original bitstream is partitioned into $N$ layers, with layer $k$ divided into $k$ equal-length groups. An $(N, k)$ Reed–Solomon (RS) code is then applied across the $k$ groups to yield $N$ groups. Description $n$ contains the bits from group $n$ of all layers. The decoder can recover the first $m$ layers of the original bitstream from any $m$ descriptions by performing RS decoding. Redundancy and the associated side distortion are controlled by varying the layer partition $(R_1, R_2, \ldots, R_N)$, where $R_k$ denotes the amount of source data placed in layer $k$. Given the probability $p(m)$ of receiving $m$ descriptions, the optimal layer partition can be determined by minimizing the expected distortion subject to a total rate constraint (including channel coding) [47], [48].

    Fig. 4. MD-FEC algorithm.
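    The layered RS construction can be sketched in Python. To stay self-contained, the sketch below implements a systematic Reed–Solomon code in evaluation form over the prime field GF(257) using Lagrange interpolation, rather than the GF(2^8) arithmetic of practical RS codecs; the function names and the list-based layout of layers and groups are assumptions. Layer k is given as k equal-length groups, and the lengths chosen for each layer play the role of the layer partition.

```python
P = 257  # prime field GF(257); data symbols are bytes 0..255


def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P   # Fermat inverse of den
    return total


def md_fec_encode(layers, n_desc):
    """layers[k-1] holds k equal-length groups for layer k (k <= n_desc).

    Each symbol position of layer k uses a systematic (n_desc, k) RS code in
    evaluation form: the k data symbols are the values of a degree < k
    polynomial at x = 1..k, and description n stores its value at x = n + 1.
    """
    descs = [[] for _ in range(n_desc)]
    for k, groups in enumerate(layers, start=1):
        assert len(groups) == k <= n_desc
        xs = list(range(1, k + 1))
        for n in range(n_desc):
            descs[n].append([lagrange_eval(xs, [g[j] for g in groups], n + 1)
                             for j in range(len(groups[0]))])
    return descs


def md_fec_decode(received, n_layers):
    """received: {description index: description}. With m descriptions,
    layers 1..m are recovered; any k descriptions suffice for layer k."""
    ids = sorted(received)
    layers = []
    for k in range(1, min(len(ids), n_layers) + 1):
        use = ids[:k]
        xs = [n + 1 for n in use]
        length = len(received[use[0]][k - 1])
        layers.append([[lagrange_eval(xs, [received[n][k - 1][j] for n in use], t)
                        for j in range(length)]
                       for t in range(1, k + 1)])
    return layers


layers = [[[10, 20]], [[1, 2], [3, 4]], [[5, 6], [7, 8], [9, 11]]]
descs = md_fec_encode(layers, 3)
print(md_fec_decode({0: descs[0], 2: descs[2]}, 3))   # first two layers recovered
```

    Any m received descriptions return exactly the first m layers, independent of which descriptions arrived.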

    MD-FEC has become a popular tool for generating MD video because it can be applied to any scalable video stream to generate an arbitrary number of descriptions. With a scalable video coder, each GoP is coded into an embedded stream. MD-FEC is applied to each GoP independently by partitioning the bitstream of each GoP into layers and applying unequal FEC to the different layers. Depending on how many layers are included as references for temporal prediction, the coder can be considered as using either a Class A predictor (if only layer 1 is used for prediction), a Class B predictor without mismatch coding (when all layers are used for prediction), or a Class C predictor without mismatch coding (if more than one, but not all, of the layers are used).

    H. Summary

    Table 3 summarizes the P-MD video coding frameworks.

    Dynamic and flexible redundancy allocation is necessary to adapt to changing source and channel characteristics. In MD coders that eliminate mismatch completely, it is often difficult to allocate redundancy flexibly. For example, coders that use one or more Class A predictors cannot achieve very low redundancy; as type (a) redundancy decreases, prediction efficiency decreases and type (b) redundancy increases. Thus, they may not be the best choice for networks with low loss rates.

    A careful performance comparison among these coders is missing from the literature. In such a study, it would be essential to take into consideration factors such as optimal redundancy allocation; performance at high and low rates, for high- and low-motion sequences, and in high- and low-loss networks; and robustness to changing network conditions. Ideally, to fully understand the impact of the MD parameters on performance, this comparison should use MD video coders built from a single structure for the SD video coder.

    IV. INTEGRATION OF MULTIPLE DESCRIPTION CODING AND MULTIPLE PATH TRANSPORT

    The performance of MDC critically depends on whether the packets from different descriptions that correspond to the same or adjacent spatio-temporal segments in a video are likely to be lost simultaneously. One effective approach to reduce the likelihood of simultaneous loss is to send different descriptions through separate paths. We coin this transmission scheme multiple path transport (MPT) (a.k.a. path diversity). While each transport path may be unreliable, the chance that two paths simultaneously experience failures is typically low. MPT also enables load balancing in the network, thus reducing congestion and consequently packet losses. When a single path does not have sufficient bandwidth to carry an entire video stream, employing multiple paths can also effectively increase the aggregate end-to-end bandwidth.

    Table 3. Summary of Frameworks for P-MD Video Coders
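    The benefit of sending descriptions over separate paths can be checked with a small Monte Carlo experiment in Python. The sketch assumes a two-state Gilbert loss model with made-up transition probabilities, identical statistics on every path, and back-to-back transmission of the two descriptions of a segment under SPT; it is illustrative only and is not drawn from the studies cited in this section.

```python
import random


def gilbert_losses(n, p_gb, p_bg, rng):
    """Two-state Gilbert channel: every packet sent in the 'bad' state is lost.

    p_gb = P(good -> bad), p_bg = P(bad -> good). Returns booleans (True = lost).
    """
    bad = False
    lost = []
    for _ in range(n):
        bad = (rng.random() >= p_bg) if bad else (rng.random() < p_gb)
        lost.append(bad)
    return lost


def both_lost_fraction(n_segments, multipath, p_gb=0.02, p_bg=0.3, seed=0):
    """Fraction of segments whose two descriptions are both lost."""
    rng = random.Random(seed)
    if multipath:
        # MPT: each description travels on its own, independently fading path
        l0 = gilbert_losses(n_segments, p_gb, p_bg, rng)
        l1 = gilbert_losses(n_segments, p_gb, p_bg, rng)
    else:
        # SPT: the two descriptions of a segment are sent back to back on one path
        seq = gilbert_losses(2 * n_segments, p_gb, p_bg, rng)
        l0, l1 = seq[0::2], seq[1::2]
    return sum(a and b for a, b in zip(l0, l1)) / n_segments


print(both_lost_fraction(100_000, multipath=False),
      both_lost_fraction(100_000, multipath=True))
```

    With these parameters the stationary loss rate is 0.02 / (0.02 + 0.3) ≈ 6.3% per packet, so independent paths lose both descriptions of a segment with probability of about 0.4%, while back-to-back packets on one path do so with probability of about 0.063 × 0.7 ≈ 4.4%.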

    Several research groups have explored the integration of MDC with MPT for video communications, both for video over wireless networks [49], [50] and over the Internet [8], [51], [52]. In video streaming applications, server diversity (a variant of path diversity) combined with either MDC or SDC has also been studied [53]–[57]. Instead of using multiple servers, the work in [58] relays the information from a server through separate peers using a peer-to-peer (P2P) networking architecture. These works have clearly shown the advantage of MPT over single-path transport (SPT) when the same MD or SD coder is used. Furthermore, combining MDC and MPT can lead to substantial performance gains over using SDC and SPT.

    Fig. 5. General architecture of the system using multistream coding and multipath transport. Reprinted from [50]. Copyright 2003 IEEE.

    In Section IV-A, we discuss some of the fundamental issues involved in designing a system that integrates MDC with MPT. Then, in Sections IV-B, IV-C, and IV-D, we review three example systems that effectively integrate MDC with MPT for video transport over different networks.

    A. General System Architecture

    The general architecture of a system combining MDC with MPT is shown in Fig. 5. This figure assumes a multistream source coder, with MD coders as a special case. In general, the number of paths and the available bandwidth and reliability on each path may change with time. We assume the system receives feedback about network quality of service (QoS) parameters (e.g., bandwidth, delay, and loss probabilities). At each update interval, a multipath routing layer discovers and sets up paths between the source and destination. The transport layer continuously monitors path parameters and feeds this information back to the sender. Based on this information, the coder at the sender generates multiple packetized bitstreams (in general, at least as many as there are paths). (The feedback and interaction are desirable but not necessary.) The traffic allocator distributes the packets from the different substreams among the paths. At the receiver, the packets arriving from all the paths are put into a resequencing buffer, where they are reassembled into substreams after a preset time-out period. Finally, the decoder reconstructs the transmitted video from the received packets in all the substreams.
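    The receiver-side resequencing step can be sketched in Python as follows. The packet tuple layout, the single absolute deadline standing in for the preset time-out, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def resequence(arrivals, deadline):
    """Reassemble substreams from packets that arrived over several paths.

    arrivals: iterable of (arrival_time, substream_id, seq_no, payload) in any
    order. Packets arriving after `deadline` (the end of the preset time-out)
    are treated as lost. Returns {substream_id: [payload, ...]} ordered by
    sequence number; gaps are left for the decoder to conceal.
    """
    buffers = defaultdict(list)
    for t, stream, seq, payload in arrivals:
        if t <= deadline:
            buffers[stream].append((seq, payload))
    return {s: [p for _, p in sorted(pkts, key=lambda sp: sp[0])]
            for s, pkts in buffers.items()}


arrivals = [(0.05, 1, 0, "b0"), (0.01, 0, 1, "a1"), (0.02, 0, 0, "a0"), (0.30, 1, 1, "b1")]
print(resequence(arrivals, deadline=0.2))
```

    A longer time-out recovers more late packets from slow paths but adds playout delay, which is the tradeoff the preset time-out period controls.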

    Two important issues requiring close interaction between the source encoder and the transport layer are traffic allocation and path selection. For an MD source coder, traffic should be allocated so that packets from different descriptions that carry information about nearby spatio-temporal segments of the video sequence are spread over different paths, reducing the chance that these packets are lost simultaneously. Similarly, path selection should try to minimize the expected distortion of the reconstructed video, which depends on the loss characteristics of individual paths as well as on path correlation. Several papers have examined the effect of shared links among the chosen paths on the end-to-end video quality [59], [51], [60].

    Table 4. OPNET simulation results using the MDMC coder
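    Both considerations can be made concrete with a short Python sketch. The allocation rule (description d on path d modulo the number of paths) and the shared-link loss model, which assumes that a loss on the shared link hits both descriptions and that all other losses are independent, are simplifying assumptions, and the function names are hypothetical.

```python
def allocate(packets, n_paths):
    """Map each (segment_id, description_id) packet to a path index.

    Description d always uses path d % n_paths, so the descriptions of the same
    segment travel on different paths whenever there are at least as many paths
    as descriptions.
    """
    return {pkt: pkt[1] % n_paths for pkt in packets}


def prob_both_lost(p1, p2, p_shared=0.0):
    """Probability that both descriptions of a segment are lost.

    Assumes two paths that share one link (loss probability p_shared, a loss on
    it hitting both descriptions) and otherwise have disjoint parts with
    independent loss probabilities p1 and p2.
    """
    return p_shared + (1.0 - p_shared) * p1 * p2


print(allocate([(0, 0), (0, 1), (1, 0), (1, 1)], n_paths=2))
print(prob_both_lost(0.1, 0.1), prob_both_lost(0.1, 0.1, p_shared=0.05))
```

    In this model a shared link with 5% loss raises each path's loss probability from 10% to 14.5%, but raises the probability of losing both descriptions from 1% to about 6%, which is why shared links matter for MD video.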

    B. Video Transport in Wireless Ad Hoc Networks

    Ad hoc networks are multihop wireless networks without a preinstalled infrastructure. MPT is very appealing for wireless ad hoc networks for several reasons: 1) a path can break down frequently because of node movements; 2) links are unreliable, with frequent packet losses; and 3) individual links may not have adequate capacity to support high-bandwidth service.

    Mao et al. [50] proposed to combine multistream coding and MPT for video transport over ad hoc networks. Three multistream coding techniques with respective transport control protocols were compared: the MDMC video coder [16] (see also Section III-F) without retransmission or feedback-based adaptation, an H.263-based layered coder with constrained retransmission of the base layer, and a feedback-based reference picture selection (RPS) scheme. These schemes were examined and compared both using Markov models at the link level and using the OPNET network modeler. The first two schemes were also implemented and evaluated on a testbed consisting of laptops capable of transmitting and receiving over two paths.

    Table 4 summarizes the results from a simulated network using OPNET with 16 nodes moving in a 600 m × 600 m region, all at the same speed of 10 m/s. The video sequence (Foreman, QCIF) was coded using the MDMC coder at 118 kb/s. With SPT, all packets are transmitted over a single path, updated using the NIST dynamic source routing (DSR) protocol [61]. With MPT, the packets from the two descriptions are transported on two disjoint paths maintained by a multipath DSR (MDSR) protocol, which was developed by extending the DSR model. Table 4 shows that MPT leads to a lower packet loss rate on each description. The packet loss traces (see [50]) also show that the loss events on the two descriptions are less correlated with MPT. Both factors contribute to a substantial improvement in video quality.

    Fig. 6. Average PSNR versus time when the number of trees varied from 1 to 16, for sequence Akiyo. Curves from top to bottom correspond to results obtained using 16, 8, 4, 2, and 1 distribution trees, respectively. Reprinted with permission from [58]. Copyright 2003 Microsoft Corporation. All rights reserved.

    C. Video Streaming in Content Delivery Networks

    Apostolopoulos et al. [53] considered video streaming in a content delivery network (CDN) and examined the performance gains obtained by coupling MDC and server diversity. In a traditional CDN, many servers are placed over the entire Internet, and when a client requests a media file, the server closest to the client is found and the requested file is streamed to the client from that server. To overcome the losses caused by server failures and packet losses in the transmission path, video is coded with multiple descriptions and stored on separate servers. Specifically, the two-state MD coder with temporal down-sampling [8] (see also Section III-B) is used. This MD-CDN scheme (which streams two descriptions from two separate servers) is compared to an SD-CDN approach, which streams a single stream (generated using an H.263 coder) from a single server.

    There are, in general, three issues involved in designing a CDN: 1) where to place the servers in the network; 2) how to distribute the files over the servers; and 3) how to select the server(s) for a client request. The focus in [53] was on the performance gains achievable by varying strategies for 2) and 3) while employing three types of existing network infrastructure for server placement. Their simulation results show that MD-CDN can reduce the distortion of the received video significantly, and the gain is more significant when the servers are located at hotspots (e.g., locations of the data centers of an Internet service provider).

    D. CoopNet Project

    Padmanabhan et al. [58] explored video multicasting using the so-called CoopNet, which employs the combination of a dedicated server and many cooperating peers in a P2P network to distribute live or pre-encoded video. The rationale for using peers is to alleviate the overload at the server caused by a special event. A critical challenge in designing such a network is that the peer nodes are inherently unreliable. CoopNet addresses this issue by employing multiple distribution trees spanning the set of participating nodes. The streaming content is encoded using MDC, and the descriptions are distributed over different trees. Each distribution tree may contain one or more descriptions.

    The MD-FEC algorithm discussed in Section III-G is applied to each GoP of the scalable bitstream from an MPEG4-PFGS encoder [62] to generate MD video streams. Information about whether or not each description is received in each GoP is fed back periodically from the nodes to the server through a reverse path of each tree. Based on this feedback, the server updates the probability that a given number of descriptions out of the total are received. Based on these estimated probabilities and the rate-distortion curve of the PFGS video encoder, the server recalculates the optimal layer partition in each update interval.
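    The server-side update can be sketched in Python: estimate the distribution of the number of received descriptions from the feedback, then search for the layer partition that minimizes expected distortion under a per-description rate budget. The rate-distortion curve below is a hypothetical stand-in for the PFGS curve, the exhaustive search only suits small numbers of descriptions, and all names are assumptions rather than CoopNet's actual implementation.

```python
from collections import Counter
from itertools import product


def estimate_pm(received_counts, n_desc):
    """Estimate p(m), m = 0..n_desc, from per-node feedback.

    received_counts[i] is the number of descriptions node i received in a GoP.
    """
    counts = Counter(received_counts)
    return [counts[m] / len(received_counts) for m in range(n_desc + 1)]


def expected_distortion(groups, pm, dist):
    """Expected distortion of a layer partition.

    groups[k-1] is the number of symbols each description spends on layer k.
    Layer k carries k data groups, so m received descriptions decode
    sum_{k <= m} k * groups[k-1] source symbols.
    """
    return sum(p * dist(sum(k * g for k, g in enumerate(groups[:m], start=1)))
               for m, p in enumerate(pm))


def best_partition(pm, budget, dist):
    """Exhaustive search over ways to split a per-description budget among layers."""
    n_layers = len(pm) - 1
    candidates = (g for g in product(range(budget + 1), repeat=n_layers)
                  if sum(g) == budget)
    return min(candidates, key=lambda g: expected_distortion(g, pm, dist))


def dist(rate):
    """Hypothetical convex rate-distortion curve standing in for the PFGS curve."""
    return 2.0 ** (-rate / 4.0)


pm = estimate_pm([3, 3, 2, 3, 1, 3, 3, 2], n_desc=3)
print(pm, best_partition(pm, budget=6, dist=dist))
```

    When almost every node receives all descriptions, the search moves the budget into the last layer (no redundancy); when nodes typically receive a single description, it moves the budget into layer 1, which amounts to replicating the same data in every description.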

    The peer arrival and departure are modeled using the flash crowd traces recorded at the MSNBC site on September 11, 2001. It was assumed that clients stop forwarding traffic the moment they depart. Once a node in a tree departs, none of its descendant nodes receive the descriptions contained in this tree until the next update interval. Thus, within each update interval, a description is either completely received or completely lost at a node. If it is lost in the current interval, it will be distributed from a different parent in the next interval.

    Fig. 6 compares the PSNR curves obtained with different numbers of trees when the number of descriptions is held fixed. This figure shows that increasing the number of distribution trees (equivalent to the number of paths) can increase PSNR significantly. The improvement is most significant when the number of trees increases from one to two. However, the performance saturates once it increases beyond 8. This result is consistent with those reported in [56], which compared the performance obtained by varying the number of paths. They found that a significant gain is observed when the number of paths increases from one to two; as it increases further, performance stays almost constant.

    V. CONCLUSION

    In this paper, we have considered various techniques for MD video coding. We focused on MD video coders that use temporal prediction and classified these coders based on their strategies for generating multiple descriptions, prediction, and mismatch control. Our goal has been to create a structure that clearly defines the competing factors that must be considered when designing an MD video coder. This structure has allowed us to describe the progress to date, and it lays the groundwork for future contributions in MD video coding. We also presented a general architecture for transporting MD video over multiple paths and showed through example systems that combining MDC with MPT can lead to substantial performance gains compared to using SDC and a single path.

    Researchers interested in MDC have at their fingertips a wealth of promising directions. Coding structures that are likely to receive increasing attention are those using efficient prediction structures that eliminate mismatch, those using new Class C predictors that allow flexible tradeoffs between mismatch and redundancy, and those that can efficiently generate more than two descriptions. Optimizing MD video coders for use on packet-loss channels also merits additional attention, as does the application of distributed source coding [63] to MD video [64].

    A fairly new research direction is to combine scalable coding with MDC to offer both rate scalability and error resilience. This can be achieved either by creating MD coders in which each description is scalable or by designing layered coders in which each layer is MD with a different amount of redundancy. The MD-FEC coder (Section III-G) achieves this through channel coding. Other works in this direction include [26], [65], and [66].

    Finally, much work remains on integrating MD video with MPT, both in existing networking environments and in emerging communication infrastructures, including peer-to-peer, cooperative communications (in which nodes assist each other to relay information), and sensor networks.

    ACKNOWLEDGMENT

    The authors would like to thank Y. Altunbasak, J. Apostolopoulos, A. Cengiz Begen, P. A. Chou, T. Nguyen, and A. Zakhor for providing information on their work relevant to this paper. The authors would also like to thank V. Vaishampayan and S. Mao for helpful technical discussions.

    REFERENCES

    [1] J. K. Wolf et al., "Source coding for multiple descriptions," Bell Syst. Tech. J., vol. 59, pp. 1417–1426, Oct. 1980.

    [2] L. Ozarow, "On a source coding problem with two channels and three receivers," Bell Syst. Tech. J., vol. 59, pp. 1909–1921, Dec. 1980.

    [3] A. A. El-Gamal and T. M. Cover, "Achievable rates for multiple descriptions," IEEE Trans. Inform. Theory, vol. 28, pp. 851–857, Nov. 1982.

    [4] N. S. Jayant, "Subsampling of a DPCM speech channel to provide two self-contained half-rate channels," Bell Syst. Tech. J., vol. 60, no. 4, pp. 501–509, Apr. 1981.

    [5] V. A. Vaishampayan, "Design of multiple description scalar quantizer," IEEE Trans. Inform. Theory, vol. 39, pp. 821–834, May 1993.

    [6] V. K. Goyal, "Multiple description coding: Compression meets the network," IEEE Signal Processing Mag., vol. 18, pp. 74–93, Sep. 2001.

    [7] Y. Wang et al., "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Processing, vol. 10, pp. 351–366, Oct. 2001.

    [8] J. Apostolopoulos, "Reliable video communication over lossy packet networks using multiple state encoding and path diversity," in Proc. Visual Communications Image Processing, 2001, pp. 392–409.

    [9] S. Wenger, "Video redundancy coding in H.263+," presented at the Audio-Visual Services Over Packet Networks Conf., Aberdeen, U.K., 1997.

    [10] V. Vaishampayan and S. John, "Balanced interframe multiple description video compression," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 1999, pp. 812–816.

    [11] N. Franchi et al., "Multiple video coding for scalable and robust transmission over IP," presented at the Packet Video Conf., Nantes, France, 2003.

    [12] A. Reibman et al., "Multiple description coding for video using motion compensated prediction," IEEE Trans. Circuits Syst. Video Technol., pp. 193–204, Mar. 2002.

    [13] N. V. Boulgouris et al., "Drift-free multiple description coding of video," in Proc. IEEE Int. Workshop Multimedia Signal Processing, 2001, pp. 105–110.

    [14] A. Reibman et al., "Multiple description video using rate-distortion splitting," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2001, pp. 978–981.

    [15] C. Kim and S. Lee, "Multiple description coding of motion fields for robust video transmission," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 999–1010, Sep. 2001.

    [16] Y. Wang and S. Lin, "Error resilient video coding using multiple description motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 438–453, Jun. 2002.

    [17] C. Kim et al., "Robust transmission of video sequence using double-vector motion compensation," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1011–1020, Sep. 2001.

    [18] M. Karczewicz and R. Kurceren, "The SP- and SI-frames design for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., pp. 637–644, Jul. 2003.

    [19] S. Lin and Y. Wang, "Analysis and improvement of multiple description motion compensation video coding for lossy packet networks," in Proc. IEEE Int. Conf. Image Processing, vol. 2, 2002, pp. II-185–II-188.

    [20] A. Reibman, "Optimizing multiple description video coders in a packet loss environment," presented at the Packet Video Conf. 2002, Pittsburgh, PA.

    [21] A. Ingle and V. A. Vaishampayan, "DPCM system design for diversity systems with application to packetized speech," IEEE Trans. Speech Audio Process., vol. 3, pp. 48–58, Jan. 1995.

    [22] Y. Wang and D. Chung, "Non-hierarchical signal decomposition and maximally smooth reconstruction for wireless video transmission," in Mobile Multimedia Communications, D. Goodman and D. Raychaudhuri, Eds. New York: Plenum, 1997, pp. 285–292.

    [23] D. Chung and Y. Wang, "Multiple description image coding using signal decomposition and reconstruction based on lapped orthogonal transforms," IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 895–908, Sep. 1999.

    [24] D. Chung and Y. Wang, "Lapped orthogonal transforms designed for error resilient image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 752–764, Sep. 2002.

    [25] I. V. Bajic and J. W. Woods, "Domain-based multiple description coding of images and video," IEEE Trans. Image Process., vol. 12, pp. 1211–1225, Oct. 2003.

    68 PROCEEDINGS OF THE IEEE, VOL. 93, NO. 1, JANUARY 2005

    [26] S. Cho and W. A. Pearlman, "A full-featured, error resilient, scalable wavelet video codec based on the set partitioning in hierarchical trees (SPIHT) algorithm," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 157–170, Mar. 2002.

    [27] V. A. Vaishampayan and J. Domaszewicz, "Design of entropy-constrained multiple description scalar quantizer," IEEE Trans. Inform. Theory, vol. 40, pp. 245–250, Jan. 1994.

    [28] J.-C. Batllo and V. Vaishampayan, "Asymptotic performance of multiple description transform codes," IEEE Trans. Inform. Theory, vol. 43, pp. 703–707, Mar. 1997.

    [29] M. Srinivasan and R. Chellappa et al., "Multiple description subband coding," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 1998, pp. 684–688.

    [30] S. D. Servetto et al., "Multiple description lattice vector quantization," in Proc. IEEE Data Compression Conf., 1998, pp. 13–22.

    [31] H. Jafarkhani and V. Tarokh, "Multiple description trellis coded quantization," IEEE Trans. Commun., vol. 47, pp. 799–803, Jun. 1999.

    [32] M. Fleming and M. Effros, "Generalized multiple description vector quantization," in Proc. IEEE Data Compression Conf., 1999, pp. 3–12.

    [33] C. Tian and S. Hemami, "Universal multiple description scalar quantization: Analysis and design," in Proc. IEEE Data Compression Conf., 2003, pp. 183–192.

    [34] V. K. Goyal and J. Kovacevic, "Generalized multiple description coding with correlating transforms," IEEE Trans. Inform. Theory, vol. 47, pp. 2199–2224, Sep. 2001.

    [35] S. L. Regunathan and K. Rose, "Efficient prediction in multiple description video coding," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2000, pp. 1020–1023.

    [36] S. Wenger et al., "Error resilience support in H.263+," IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 867–877, Nov. 1998.

    [37] J. Apostolopoulos, "Error-resilient video compression through the use of multiple states," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 2000, pp. 352–355.

    [38] D. Comas et al., "Rate-distortion optimization in a robust video transmission based on unbalanced multiple description coding," in Proc. IEEE Int. Workshop Multimedia Signal Processing, 2001, pp. 581–586.

    [39] I. K. Kim and N. I. Cho, "Error resilient video coding using optimal multiple description of DCT coefficients," in Proc. IEEE Int. Conf. Image Process., vol. 3, 2003, pp. II-663–II-666.

    [40] X. Tang and A. Zakhor, "Matching pursuits multiple description coding for wireless video," IEEE Trans. Circuits Syst. Video Technol., vol. 12, pp. 566–575, Jun. 2002.

    [41] A. Jagmohan and K. Ratakonda, "Multiple description coding of predictively encoded sequences," in Proc. IEEE Data Compression Conf., 2002, pp. 13–22.

    [42] Y. Lee et al., "A drift-free motion-compensated predictive encoding technique for multiple description coding," in Proc. IEEE Int. Multimedia Expo, vol. 3, 2003, pp. III-581–III-584.

    [43] T. Nguyen and A. Zakhor, "Matching pursuits based multiple description video coding for lossy environments," in Proc. IEEE Int. Conf. Image Processing, vol. 1, 2003, pp. I-57–I-60.

    [44] S. Lin et al., "Rate-distortion analysis of the multiple description motion compensation video coding scheme," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. 3, 2003, pp. III-401–III-404.

    [45] A. Albanese et al., "Priority encoding transmission," IEEE Trans. Inform. Theory, pp. 1737–1744, Nov. 1996.

    [46] G. Davis and J. Danskin, "Joint source and channel coding for image transmission over lossy packet networks," Proc. SPIE, Conf. Wavelet Applicat. Digital Image Process. XIX, vol. 2847, pp. 376–387, 1996.

    [47] A. E. Mohr et al., "Unequal loss protection: Graceful degradation of image quality over packet erasure channels through forward error correction," IEEE J. Select. Areas Commun., vol. 18, pp. 819–829, Jun. 2000.

    [48] R. Puri and K. Ramchandran, "Multiple description source coding through forward error correction codes," in Proc. 33rd Asilomar Conf. Signals, System Comp., vol. 1, 1999, pp. 342–346.

    [49] N. Gogate et al., "Supporting video/image applications in a mobile multihop radio environment using route diversity and multiple description coding," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 9, pp. 777–792, Sep. 2002.

    [50] S. Mao et al., "Video transport over ad hoc networks: Multistream coding with multipath transport," IEEE J. Select. Areas Commun., vol. 21, pp. 1721–1737, Dec. 2003.

    [51] A. Begen et al., "Multi-path selection for multiple description encoded video streaming," in Proc. IEEE Int. Conf. Commun., vol. 3, 2003, pp. 1583–1589.

    [52] J. Chakareski et al., "Video streaming with diversity," in IEEE Int. Multimedia Expo, vol. 1, 2003, pp. 9–12.

    [53] J. Apostolopoulos et al., "On multiple description streaming in content delivery networks," in Proc. IEEE INFOCOM, 2002, pp. 1736–1745.

    [54] T. Nguyen and A. Zakhor, "Protocols for distributed video streaming," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 2002, pp. III-185–III-188.

    [55] A. Majumdar et al., "Distributed multimedia transmission from multiple servers," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 2002, pp. III-177–III-180.

    [56] J. Chakareski and B. Girod, "Server diversity in rate-distortion optimized media streaming," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 2003, pp. III-645–III-648.

    [57] A. Begen et al., "Rate-distortion optimized on-demand media streaming with server diversity," in Proc. IEEE Int. Conf. Image Processing, vol. 3, 2003, pp. III-657–III-660.

    [58] V. Padmanabhan et al. (2003, Mar.) Resilient peer-to-peer streaming. Microsoft Res. [Online]. Available: http://www.research.microsoft.com/projects/CoopNet/papers/msr-tr-2003-11.pdf

    [59] J. Apostolopoulos et al., "Modeling path diversity for multiple description video communication," in Proc. IEEE Int. Conf. Acoustics Speech Signal Processing, vol. 3, 2002, pp. III-2161–III-2164.

    [60] T. Nguyen and A. Zakhor, "Path diversity with forward error correction (pdf) system for packet switched networks," in Proc. IEEE INFOCOM, vol. 1, 2003, pp. 663–672.

    [61] J. Johnson et al., "The dynamic source routing protocol for mobile ad hoc networks," in IETF Internet draft (draft-ietf-manet-dsr-03.txt), Oct. 1999.

    [62] F. Wu et al., "A framework for efficient progressive fine granularity scalability video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 332–344, Mar. 2001.

    [63] B. Girod et al., "Distributed video coding," Proc. IEEE, vol. 93, pp. 71–83, Jan. 2005.

    [64] A. Jagmohan et al., "WYZE-PMD based multiple description video codec," in IEEE Int. Multimedia Expo, vol. 1, 2003, pp. I-569–I-572.

    [65] P. Chou et al., "Layered multiple description coding," presented at the Packet Video Conf., Nantes, France, 2003.

    [66] H. Wang and A. Ortega, "Robust video communication by combining scalability and multiple description coding techniques," Proc. SPIE, Image Video Commun. Process., vol. 5022, pp. 111–124, May 2003.

    Yao Wang (Fellow, IEEE) received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 1983 and 1985, respectively, and the Ph.D. degree from the University of California, Santa Barbara, in 1990, all in electrical engineering.

    Since 1990, she has been on the faculty of Electrical and Computer Engineering, Polytechnic University, Brooklyn, NY. She is the leading author of the textbook Video Processing and Communications and has published over 150 papers in journals and conference proceedings. Her research interests include video communications and multimedia signal processing.

    Dr. Wang received the New York City Mayor's Award for Excellence in Science and Technology in the Young Investigator Category in 2000. She was elected an IEEE Fellow for contributions to video processing and communications in 2004. She was the Co-Winner of the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2003.

    WANG et al.: MULTIPLE DESCRIPTION CODING FOR VIDEO DELIVERY 69

    Amy R. Reibman (Fellow, IEEE) received the Ph.D. degree in electrical engineering from Duke University, Durham, NC, in 1987.

    From 1988 to 1991, she taught electrical engineering at Princeton University, Princeton, NJ. In 1991, she joined AT&T and is currently a Technical Consultant in the Communication Sciences Research Department at AT&T Lab.-Research, Florham Park, NJ. Her research interests include video compression systems for transport over packet and wireless networks and video quality metrics.

    Dr. Reibman won the IEEE Communications Society Leonard G. Abraham Prize Paper Award in 1998. She was the Technical Co-Chair of the IEEE International Conference on Image Processing in 2002. She was elected an IEEE Fellow for contributions to the transport of video over networks.

    Shunan Lin (Member, IEEE) received the B.S. degree from the University of Science and Technology of China, Hefei, in 1995, the M.S. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1998, and the Ph.D. degree from Polytechnic University, Brooklyn, NY, in 2004, all in electrical engineering.

    Since 2004, he has been with Harmonic Inc., White Plains, NY. Between 2000 and 2003, he interned at Microsoft Research, Beijing, China, and Mitsubishi Electric Research Laboratory, Murray Hill, NJ. His research interests include video coding, processing, and transmission.

    Dr. Lin was the Co-Winner of the IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems in 2003.

