
1874 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 16, NO. 7, NOVEMBER 2014

Coding Structure and Replication Optimization for Interactive Multiview Video Streaming

Dongni Ren, S.-H. Gary Chan, Senior Member, IEEE, Gene Cheung, Senior Member, IEEE, and Pascal Frossard, Senior Member, IEEE

Abstract—Multiview video refers to videos of the same dynamic 3-D scene captured simultaneously by multiple closely spaced cameras from different viewpoints. We study interactive streaming of pre-encoded multiview videos, where, at any time, a client can request any one of many captured views for playback. Moreover, the client can periodically freeze the video in time and switch to neighboring views for a compelling look-around visual effect. We consider distributed content servers to support large-scale interactive multiview video service. These servers collaboratively replicate and access video contents. We study two challenges in this setting: what is an efficient coding structure that supports interactive view switching and, given that, what to replicate in each server in order to minimize the cost incurred by interactive temporal and view switches? We first propose a redundant coding structure that facilitates interactive view-switching, trading off storage with transmission rate. Using the coding structure, we next propose a content replication strategy that takes advantage of indirect hit to lower view-switching cost: in the event that the exact requested view is not available locally, the local server can fetch a different but correlated view from the other servers, so that the remote repository only needs to supply the pre-encoded view differential. We formulate the video content replication problem to minimize the switching cost as an integer linear programming (ILP) problem and show that it is NP-hard. We first propose an LP relaxation and rounding algorithm (termed Minimum Eviction) with bounded approximation error. We then study a more scalable solution based on dynamic programming and Lagrangian optimization (DPLO) with little sacrifice in performance. Simulation results show that our replication algorithms achieve substantially lower switching cost compared to other content replication schemes.

Index Terms—Multimedia computing, digital video broadcasting.

Manuscript received July 04, 2013; revised December 19, 2013; accepted June 09, 2014. Date of publication June 20, 2014; date of current version October 13, 2014. This work was supported in part by Hong Kong Research Grant Council (RGC) General Research Fund under Grant 610713, HKUST under Grant FSGRF13EG15, and the Hong Kong Innovation and Technology Fund under Grant UIM/246. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ebroul Izquierdo.

S.-H. Gary Chan and D. Ren are with the Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong (e-mail: [email protected]; [email protected]).

G. Cheung is with the National Institute of Informatics, Tokyo 1018430, Japan (e-mail: [email protected]).

P. Frossard is with the Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2014.2332139

Fig. 1. Distributed interactive multiview video streaming (IMVS) network.

I. INTRODUCTION

A MULTIVIEW video is a set of 2-D image sequences capturing the same dynamic 3-D scene recorded by a large array of time-synchronized and closely spaced cameras from different viewpoints [1]. We consider interactive streaming of multiview video, called interactive multiview video streaming (IMVS) in the sequel, where a client can freely select any one of these stored views and play it back on a conventional 2-D display. In addition, the client can freeze the video in time and switch to nearby viewpoints to examine the 3-D scene from different angles [2]. This static view-switching interaction enables a “Matrix”-like look-around visual effect¹ with the action scene frozen in time. It has been shown to be more appealing to users than dynamic view-switching, in which the video keeps playing uninterrupted while the viewer switches to neighboring views, resembling a single-camera panning effect [3].

In order to support a large-scale service, a content provider often deploys content servers close to user pools [4]. We show in Fig. 1 an example of such a network. There is a remote repository storing all the pre-encoded videos. The content servers then collaboratively replicate the videos subject to their storage capacities. A client is served by its local server. In addition to temporal interactive requests (random access in time of the same video view using traditional DVR functionalities), a client can send inter-view requests (the aforementioned static view-switching) to its local server. The server fulfills these requests directly if the data has been replicated locally. This situation is called a replication hit. Otherwise, it contacts the other servers for the missing data. If no other server has the requested data (a replication miss), the remote repository supplies it to the local server. While the network transmission delay is negligible among the servers, the delay from the repository is generally larger (potentially resulting in video playback delay), and the transmission is more expensive. Hence it is important to limit repository access as much as possible.

¹During the filming of the 1999 movie “The Matrix”, a 1-D array of closely spaced cameras was arranged in a near half-circle and used to capture an action scene simultaneously. Frames of the same captured time instant from consecutive views were then arranged together for playback, enabling a look-around visual observation of the scene frozen in time.

1520-9210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

During a temporal or inter-view switch, the client has deliberately chosen a different navigation path, and a typical video client has to empty its current buffer and rebuffer a fixed number of new frames before playback resumes. The delay in resuming video playback adversely affects the viewing experience. Thus, we define the switching cost as the time required to rebuffer the newly transmitted frames. This cost depends on both the size of the frames and the network location from which they are fetched.

In this paper, we study two critical challenges in supporting large-scale IMVS services. First, how can we design an efficient coding structure that facilitates interactive view-switching? Second, given such a coding structure, how can we optimize content replication across servers for multiple movies so as to minimize switching cost during interactive temporal and static view switches?

First, the multiview video must be pre-encoded a priori without knowing what view navigation paths the clients will eventually choose. These pre-encoded frames must be efficiently extracted for decoding and playback according to the actual viewpoints chosen by the clients at stream time. A simple (but naïve) way to enable random access for view-switching is to encode different viewpoints as independently coded Intra-frames (I-frames). In this case, selected views can be transmitted and decoded in any order, since there is no coding dependency between views. However, because I-frames are not coding-efficient, this naïve solution results in unacceptably high bandwidth requirements.

Adjacent viewpoint images are highly correlated, so we propose a more efficient coding structure to facilitate static view-switching. It trades off required storage size against expected transmission rate. The structure is based on redundant P-frames [2] and merge frames [5] to help users navigate among adjacent views. In addition to the replication hits (called direct hits in the sequel) and replication misses mentioned above, our redundant coding structure enables indirect hits. If the exact requested view is not available locally or in other servers, the local server can use a different but correlated view (either locally stored or fetched from the other servers). The remote repository then only needs to transmit the pre-encoded differentials between the target view and the correlated view. This leads to a lower overall switching cost than transmitting the intra-coded version of the requested view, as a replication miss requires.

Using this new coding structure, we then formulate the content replication problem for multiple movies to minimize the switching cost as an integer linear programming (ILP) problem and show that it is NP-hard. We then propose a near-optimal replication strategy. It first solves a relaxed LP version of the ILP problem and then heuristically rounds the resulting fractional LP solution to an integer one towards ILP feasibility (using a heuristic called Minimum Eviction). Since the LP-rounding algorithm does not scale well to large problems, we further propose a more computationally efficient solution based on dynamic programming and Lagrangian optimization (DPLO), at the price of only a minimal performance penalty. Simulation results show that our replication algorithms reduce switching cost substantially compared to other state-of-the-art content replication schemes.

The paper is organized as follows. We first discuss related work in Section II. We then present the coding structure we design to support static view-switching in Section III. In Section IV, we formulate the content replication problem as an ILP problem and show that it is NP-hard. We present our LP relaxation and rounding algorithm based on Minimum Eviction, and a scalable solution with dynamic programming and Lagrangian optimization, in Sections V and VI, respectively. Experimental results are discussed in Section VII. We conclude in Section VIII.

II. RELATED WORK

A. Multiview Video Coding for Interactive Applications

There has been much research in multiview video coding (MVC), focusing on compression of all captured frames across time and view, and exploiting both temporal and inter-view correlation to achieve maximal coding gain [6], [7]. However, the sender may transmit all camera-captured views in a non-interactive streaming paradigm, where it does not consider the receiver's feedback when deciding which views to transmit. Depending on the system setup, these views can come from well over one hundred cameras [8]. The transmission bandwidth cost is then exorbitant even after employing MVC for compression, since the video bitrate still grows approximately linearly with the number of coded views [6]. This transmission of multiple views looks especially wasteful given that a user typically observes only a single view at a time on a conventional 2-D display.

In order to reduce transmission bandwidth, a user in an IMVS system [2] requests and receives from the streaming server only a single coded view for temporal playback, though he can request switches to neighboring views periodically (e.g., every temporal frames) in an interactive manner; this is a new paradigm that we call interactive streaming [9]. Unfortunately, video data pre-encoded using MVC does not provide the data random accessibility required for interactive streaming. In particular, using MVC video for interactive streaming often means that more than one video view must be transmitted just so a single view can be correctly decoded and displayed, resulting in large streaming rates. The reason is that complicated inter-frame dependencies among coded frames across time and view (targeting pure coding gain) reduce the random decodability of the video stream. This point has been well established in the IMVS literature; see [2], [10], [11] for more detailed discussions.

In light of the practical need for new approaches to compress multiview video compactly while maintaining a desired level of data random access, new frame types and frame structures have recently been proposed [2], [12], [13]. A new frame type called SP-frame [12] in the video coding standard H.264 is designed for switching among pre-encoded video streams, thus providing flexibility in decoding. Later, the authors of [13] showed that, using distributed source coding (DSC) principles, one can construct a DSC frame using bit-plane and channel coding, which outperforms the corresponding SP-frame in rate-distortion (RD) performance. The method in [2] then optimizes the construction of a redundant frame structure using I-, redundant P-, and DSC frames as building blocks for an IMVS scenario, where a client can interactively switch to an adjacent view every frames without interrupting temporal video playback, i.e., dynamic view-switching. While these works focus their coding design on single server-client communication, we employ here a redundant coding structure to facilitate static view-switching (as opposed to dynamic view-switching in [2]) and to enable indirect hits in a large-scale IMVS network. Further, instead of the DSC frame, we employ a more coding-efficient and less complex merge frame for merging of decoding paths [5]. Section VII will show that indirect hits can significantly lower the expected transmission rate during an IMVS session.

B. Large-Scale Video Streaming and Distribution

There have been extensive studies on content replication and replacement strategies for interactive video [14]–[17]. There have also been studies on cooperative caching and server selection in video streaming applications [18]–[20]. However, all these works focus on single-view video and hence do not take advantage of correlation among views of the same video to save resources. Therefore, they are not efficient for IMVS. In contrast, we propose a coding structure for multiview video that facilitates static view-switching, and then design content replication strategies to minimize overall switching cost.

More recently, multiview video streaming has drawn considerable attention in the community [21], [22]. The work in [23] proposes a scheduling algorithm that allows peers to frequently compute the scheduling of multiview segments. The paper [24] studies the problem of achieving low view-switching delay by organizing viewers of different views together. These works essentially treat multiview video streaming as multiple independent single-view video streams, and have not considered view correlation and switching among views during video playback. Our work extends the field further by considering both the coding structure (through novel usage of inter-view P-frames and merge frames to exploit correlation among neighboring views) and the content replication strategy to achieve nearly minimum expected switching costs.

In our previous work [25], we studied multiview video streaming and content replication by proposing a heuristic-based strategy for the delivery of a single multiview video sequence, where we exploit the correlation between the multiple views. In our follow-up work [26], we formulated the optimization problem for multiple videos as an ILP problem. However, the rounding algorithm there does not scale to large problems. In the present work, we extend the coding structure of the multiview video to include a new encoding scheme for view switches, so that more advanced coding standards, such as H.264, can be supported.

III. CODING STRUCTURE FOR IMVS

In this section, we introduce the coding structure that we use for pre-encoding the multiview video content stored in the repository, with subsets of coded frames distributed among content servers. The structure is designed to facilitate interactive view-switching while video playback is paused in time (static view switching), so as to enable a look-around visual effect for the viewers. We employ coding tools that have been previously developed to solve the view-switching problem [5]. Table I shows a list of important symbols used in this paper.

TABLE I. SYMBOLS USED IN PROBLEM FORMULATIONS


Fig. 2. Dependencies among segments in our proposed multiview video coding structure. Arrows at heads indicate feasible view switches using pre-encoded differentials.

A. Frame Structure in Detail

To achieve inter-view and temporal switching with good compression efficiency, we propose the following frame structure to pre-encode a given multiview video content. There are a total of views in the sequence. For any view ( ), consecutive captured video frames in time are encoded into a segment. The th segment for view is labeled as , where and is the movie length in frames.²
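The segment indexing above can be sketched as follows. The symbols for the segment length and the movie length did not survive extraction, so `seg_len` and `movie_len` are hypothetical names. The sketch also assumes 1-based frame times, that segment i covers frames (i - 1)·seg_len + 1 through i·seg_len, and that the movie length is a multiple of the segment length. Every view is cut into segments the same way, so the view index does not enter the computation.

```python
from typing import List, Tuple


def segment_of(t: int, seg_len: int) -> Tuple[int, int]:
    """Map a 1-based frame time t to (segment index i, offset inside the segment).

    Assumes segment i covers frames (i - 1) * seg_len + 1 ... i * seg_len.
    Offset 0 marks the leading picture of the segment.
    """
    if t < 1 or seg_len < 1:
        raise ValueError("t and seg_len must be positive")
    return (t - 1) // seg_len + 1, (t - 1) % seg_len


def frames_of_segment(i: int, seg_len: int) -> List[int]:
    """Return the 1-based frame times grouped into segment i (same for every view)."""
    start = (i - 1) * seg_len + 1
    return list(range(start, start + seg_len))


def num_segments(movie_len: int, seg_len: int) -> int:
    """Segments per view; assumes movie_len is a multiple of seg_len."""
    if movie_len % seg_len != 0:
        raise ValueError("movie_len is assumed to be a multiple of seg_len")
    return movie_len // seg_len


if __name__ == "__main__":
    seg_len, movie_len = 5, 100                # hypothetical values
    print(num_segments(movie_len, seg_len))    # 20 segments per view
    print(segment_of(7, seg_len))              # frame 7 -> segment 2, offset 1
    print(frames_of_segment(2, seg_len))       # [6, 7, 8, 9, 10]
```

Keeping the mapping as plain integer arithmetic makes it easy to check that the segments partition the movie: each frame time belongs to exactly one segment.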

In our coding structure, a frame (or picture) at time instant of view is labeled as , . The segment contains the leading picture , called the head of , and the trailing pictures , , called the tail of .

In the coding structure, is the inter-view switching period, which is the minimum frame interval between two temporally consecutive static view-switches requested by a client. To facilitate view switching, has a redundant representation, so that inter-view correlations among nearby viewpoints are exploited during a view-switch. Specifically, for a given redundant window , the head contains up to pre-encoded differentials using heads ’s of nearby views ’s as predictors, where . Using this coding structure, a view-switch from view to only requires transmission of the corresponding pre-encoded differential. Fig. 2 illustrates a multiview video frame structure for five views and redundant window . Varying the redundant window size trades off the total data size against the switching cost. We will explore this tradeoff in Section VII.

The head of is represented by multiple compressed versions of the same picture :
• One independently coded I-frame ,
• Multiple differentially coded P-frames ’s, and
• One merge frame [5].

First, a temporal P-frame is motion-compensated using a P-frame of the previous time instant as predictor. Temporal P-frame is used for video playback in time in the same view . Then, inter-view P-frames ’s are disparity-compensated, each using I-frame of a nearby view at the same time instant as predictor. Inter-view P-frames ’s are designed for static inter-view switching.

²We will adopt the convention that superscripts denote movie and/or server indexes, subscripts denote time instants, and brackets denote view number in the sequel.

Fig. 3. Piecewise constant function used for signal merging in our proposed merge frame [5].

We now describe the key idea in the design of merge frames [5], which are used for effective view switching in our IMVS system. The merge frame is built with multiple P-frames ’s as predictors using the following procedure. I-frame is the encoding target for , which means that a correctly reconstructed merge frame is bit-by-bit equivalent to . Each provides side information (SI) to help decode . The reconstructed SI frame and the target are both representations of the same picture , so their block-wise frequency contents in the Discrete Cosine Transform (DCT) domain are similar. In [5], a piecewise constant (pwc) function is used to merge coefficients of different SI frames to the target. As an example, Fig. 3 shows two coefficients and of block and frequency for SI frames 1 and 2, respectively, which are close to each other. If the correct shift parameter and step size for a floor function (an example of a pwc function) are chosen, then the similar SI coefficients will land on the same step, and the computed value will be the same (merged) value for all SI frames . Reference [5] discusses in detail how and can be chosen for each frequency of each block so that a desired reconstructed value can be obtained. It also shows that this yields coding performance competitive with the DSC frames proposed in [13], which involve complex bit-plane and channel coding. SP-frames in H.264 [12] can also be used for the same decoding-path merging task. However, the number of secondary SP-frames required at head grows linearly with respect to (while our proposal requires only a single merge frame ). Moreover, each secondary SP-frame is significantly larger than our proposed merge frame due to lossless coding. See [5] for a more detailed discussion.

Thus, by merge frame construction, can be perfectly reconstructed via as long as one of the SI frames ’s is available at the decoder. Functionally, serves as a signal merging operator that reconciles minor differences among P-frames ’s and . These differences arise from motion/disparity compensation and quantization. Other frames in turn can then simply use the one unified version of (reconstructed I-frame ) as predictor for differential coding without introducing any coding drift. The reconstructed frame is identical no matter which decoding path is actually taken. More specifically, it is mathematically proven in [5] that the decoded quantization bin indices for each DCT block are exactly the same for every decoding path. Hence the resulting decoded frame is bit-by-bit identical, and there is no coding drift from this point onward.

Fig. 4. Implementation of a segment using I-, P-, and merge frames (denoted by circle, squares, and diamonds, respectively).

In practice, the merge frame is much smaller than the independently coded I-frame [5]. Fig. 4 shows a frame structure for segment with the different types of frames discussed above.
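The signal-merging idea behind the merge frame can be illustrated with a floor-based piecewise constant function on integer coefficient values. This is a simplified sketch, not the construction of [5]. It assumes the floor form pwc(x) = floor((x - shift) / step) · step + step/2 + shift with an even step. It also picks `shift` and `step` with a simple rule of our own: the step is wide enough that every SI value lies within half a step of the target, and the shift centers one step on the target. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def pwc_floor(x: int, shift: int, step: int) -> int:
    """Floor-based piecewise constant function: every x in the same step
    [shift + m*step, shift + (m + 1)*step) maps to that step's center."""
    return ((x - shift) // step) * step + step // 2 + shift


def merge_params(target: int, si_values: Iterable[int]) -> Tuple[int, int]:
    """Pick (shift, step) so that every SI value in si_values maps to target.

    step: smallest even number larger than twice the largest SI deviation,
          so every SI value lies strictly within step/2 of the target.
    shift: places the center of one step exactly on the target value.
    """
    max_dev = max(abs(v - target) for v in si_values)
    step = 2 * max_dev + 2
    shift = (target - step // 2) % step
    return shift, step


if __name__ == "__main__":
    target = 13                 # hypothetical target coefficient (from the I-frame)
    si = [12, 14]               # hypothetical coefficients from two SI frames
    shift, step = merge_params(target, si)
    print(shift, step)                               # 3 4
    print([pwc_floor(v, shift, step) for v in si])   # [13, 13]
```

In this toy setting, both SI coefficients land on the same step, so whichever SI frame the decoder holds, the merged value equals the target exactly. This is the property that removes coding drift. In [5], the parameters are chosen per frequency of each block. This sketch does not model how that choice affects the coding rate.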

B. Benefits of Coding Structure in Interactive Systems

We now discuss how the redundant frame structure described above is used in a large-scale distributed IMVS network. A server in the network may choose to replicate a segment, which includes the tail of the segment plus the I-frame of the head of the segment. All prediction differentials (inter-view P-frames and merge frames) of the head of the segment are stored only in the repository. Generally, the viewer tries to get the data from local servers, and accesses the repository only if this is not possible. If a viewer requests a view after observing view , where , only the inter-view P-frame and the merge frame are needed from the repository for the decoder to switch views, if the target I-frame is not available locally. On the other hand, if , then the much larger independently coded I-frame is needed.

Repository transmission is generally much slower than local server transmission. Therefore, minimizing switching cost is equivalent to minimizing the required repository transmission. Thus, to avoid repository transmission of during an inter-view switch, we do the following. If a neighboring server has replicated I-frame , where , then that server first sends to the local server (with negligible delay), while the repository transmits the smaller P-frame and merge frame to the local server. This is called an indirect hit. A local server that does not have the requested view but can obtain a correlated view pays only for the repository transmitting and , instead of the repository transmitting .

To summarize, there are four possible transmission cases during a static inter-view switch from to , depending on the content replicated in servers. In order of increasing cost, they are:

1) Direct hit: When a neighboring server or the local server has the exact segment , the replicated I-frame can be forwarded locally. This is of negligible cost.

2) Differential transmission: The repository transmits the pre-encoded differentially coded P-frame and the merge frame .

3) Indirect hit: A neighboring server has replicated a correlated frame , which is forwarded to the local server, and the repository transmits only P-frame and merge frame to the local server.

4) Replication miss: No server has exact or correlated frames,and the repository transmits independently-coded I-frame

to the local server.We illustrate the four transmission cases in Fig. 5. Fig. 5(a)

shows the flow diagram of an inter-view switch. In the exampleof Fig. 5(b), We consider a multiview video with 7 indepen-dent views. View 4 and view 7 are replicated by the contentservers, while the rest of the views are stored only in the reposi-tory. We consider the redundant window in this example,and the user is currently watching view 4. Switching to view 6is a direct hit because view 6 is replicated by the content servers.Switching to view 3 is a differential transmission because view3 is within the redundant window of view 4, and only the mergeframe and differential P-frame need to be transmitted. Switchingto view 7 is an indirect hit because view 7 is within the redundantwindow of another replicated view, view 6. Therefore view 6 isdelivered from a content server, and the merge frame and dif-ferential P-frame are delivered from the repository. Switchingto view 1 is a replicate miss because no server replicates theexact or correlated views. In this case the repository transmitsthe complete view to the user. Fig. 5(a) shows the flow diagramof an inter-view switch.Due to indirect hit and differential transmission with our

coding structure, the switching cost from the repository canbe substantially reduced. Note finally that because there areno differentially coded P-frames , , there areonly two possible costs for temporal switching, direct hit orreplication miss, similar to conventional content replicationmethods in interactive video applications.
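The four cases can be made concrete with a small classifier. This is a minimal sketch, not the authors' implementation: the function name `classify_switch`, the integer view indices, and the rule that a view is reachable through a pre-encoded differential when it lies within distance `window` of a reference view are assumptions for illustration. With views 4 and 6 replicated and a window of 1 (both assumed), it reproduces the four outcomes described for Fig. 5(b).

```python
def classify_switch(current, target, replicated, window):
    """Classify a static inter-view switch from view `current` to view `target`.

    replicated : set of views whose segment is held by the local or a neighboring server
    window     : redundant-window size (assumed): a view within this distance of a
                 reference view can be built from a pre-encoded P-frame plus a merge frame
    """
    if target in replicated:
        return "direct hit"                  # forward the replicated I-frame locally
    if abs(target - current) <= window:
        return "differential transmission"   # repository sends P-frame + merge frame
    if any(abs(target - v) <= window for v in replicated):
        return "indirect hit"                # correlated view from a server + differential
    return "replication miss"                # repository sends the full I-frame


# Fig. 5(b)-style example: user on view 4, views {4, 6} replicated, window 1
for target in (6, 3, 7, 1):
    print(target, classify_switch(4, target, {4, 6}, window=1))
```

The loop prints, in order, a direct hit, a differential transmission, an indirect hit and a replication miss. The check order mirrors the increasing-cost order of the list above, so the cheapest applicable case is always reported.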

IV. PROBLEM FORMULATION AND COMPLEXITY

In this section, we formulate the distributed content replication problem for a minimal-cost IMVS system as an ILP problem. We first define the decision variables, followed by the constraints and the optimization objective. We finally prove that the problem is NP-hard.

A. Decision Variables

We consider the distributed content replication problem with movies of captured views each. Let and be the set of servers and the repository, respectively. For efficient replication of movie , sets of consecutive segments in time are grouped as “chunks”, which are the basic replication units. More precisely, the chunk includes the segments ’s, where . When a server replicates , for each segment in it replicates the I-frame and temporal P-frames in its local storage. Inter-view P-frames ’s and merge frame reside only at the repository.

Each server must decide which chunk(s) of movie to replicate given its own (limited) storage size. Let be a binary variable indicating whether to replicate chunk of movie at server . When watching a segment in chunk of movie , a user can request an inter-view or temporal switch, and the IMVS system has to decide where (content server or repository) to get the requested multiview data from. For inter-view switching from view to , we first define to be the binary variable indicating whether to directly pull the requested view from a local server (direct hit). If chunk is not replicated locally, we define to be the binary variable indicating whether to pull a correlated intermediate view from a local server and pull the pre-encoded differential between the intermediate view and the requested view from the repository (indirect hit). Clearly, means that the requested view must be pulled entirely from the repository (differential transmission or replication miss, depending on the view distance from ).

For temporal switching (i.e., switching to a temporal segment of the same view not in the current chunk ), we define as the binary variable indicating whether to pull the requested multiview data from a local server for a temporal switch from chunk to .

Fig. 5. Four transmissions in inter-view switches. (a) Flow diagram. (b) An example.

B. Linear Constraints

We now discuss the system constraints in our content replication problem. For movie , first let be the size of chunk . Further, let the storage capacity of server be denoted . Using the decision variables introduced above, we can write the following capacity constraint for server :

(1)

which means that the sum of replicated content cannot exceed the storage capacity.

Then we observe that the temporal switch variable can be 1 only if there is at least one server replicating chunk . Thus, we can write:

(2)

Similarly, the direct inter-view switch variable can be 1 only if there is a server replicating chunk , i.e.,

(3)

Then, the indirect inter-view switch variable can be 1 only if there is a server replicating chunk , where view is in the window , so that the repository only needs to send the inter-view P-frame and the merge frame . Thus, we can write

(4)

Finally, we want to ensure that indirect and direct hits are not selected simultaneously. Thus, for a given inter-view switch from to , we write:

(5)

We can see that all the above system constraints are linear with respect to the decision variables.
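To make constraints (1)-(5) concrete, the hypothetical checker below tests a candidate integer solution against them. It is a sketch under simplifying assumptions, not the paper's formulation verbatim: a single movie, chunks keyed by `(view, time)` pairs, the temporal variable indexed by the target chunk only, and the window test in (4) taken as `|v - k| <= window` with `v != k`.

```python
def check_constraints(x, size, cap, t, d, y, window):
    """Return a list of violated constraints (empty if the solution is feasible).

    x[s][c] : 1 if server s replicates chunk c = (view, time), else 0
    size[c] : size of chunk c; cap[s]: storage capacity of server s
    t[c]    : temporal-switch pull variable for target chunk c (assumed indexing)
    d, y    : direct / indirect inter-view variables keyed by (from_view, to_view, time)
    window  : redundant window size (assumed test: |v - k| <= window)
    """
    violations = []

    def copies(chunk):
        # number of servers that replicate `chunk`
        return sum(row.get(chunk, 0) for row in x.values())

    for s, row in x.items():                           # (1) storage capacity
        if sum(size[c] * v for c, v in row.items()) > cap[s]:
            violations.append(("capacity", s))
    for c, val in t.items():                           # (2) temporal pull needs a copy
        if val > copies(c):
            violations.append(("temporal", c))
    for (j, k, tau), val in d.items():                 # (3) direct hit needs the exact chunk
        if val > copies((k, tau)):
            violations.append(("direct", (j, k, tau)))
    for (j, k, tau), val in y.items():                 # (4) indirect hit needs a nearby view
        nearby = sum(copies((v, tau))
                     for v in range(k - window, k + window + 1) if v != k)
        if val > nearby:
            violations.append(("indirect", (j, k, tau)))
    for key in set(d) | set(y):                        # (5) not both direct and indirect
        if d.get(key, 0) + y.get(key, 0) > 1:
            violations.append(("exclusive", key))
    return violations
```

Because every check is a comparison between a decision variable and a sum of other decision variables (or a weighted sum against a constant), the sketch also illustrates why all five constraints are linear.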

C. Inter-view & Temporal Switch Model

Before defining the objective function of our optimization problem, we describe the probabilistic model that we use to model the likelihood that a user chooses different inter-view and temporal switches. For simplicity, we assume that users select inter-view switches independently of time and temporal switches independently of view, and hence we model them separately.

We start with the temporal switch model. For movie , let be the probability that a user temporally switches from the -th chunk to the -th chunk for any view . We assume that users start an IMVS session at the first chunk of some view . At an arbitrary time after video playback has started, let be the average probability that a user is observing chunk of movie . We derive in terms of the switching probability as follows.

Let be a discrete random variable denoting the total number of temporal switches that a user has already made at the current time instance (including both sequential temporal playback and jumps). Denote the probability mass function of as . Let be the probability that a user is at the -th chunk after temporal switches, given that the user starts from the first chunk. Considering the transitions from chunks to chunks, can be derived recursively as

(6)


The average probability can then be expressed as³

(7)

Clearly, is a probability measure because it satisfies.
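Equations (6) and (7) describe a propagate-then-average computation. The sketch below assumes that (6) propagates the chunk distribution one temporal switch at a time through the chunk transition matrix, and that (7) weights each step by the probability mass function of the number of switches; the function and argument names are hypothetical.

```python
def average_chunk_probability(P, pmf_N):
    """Average probability of observing each chunk (assumed reading of Eqs. (6)-(7)).

    P[a][b]  : probability of a temporal switch from chunk a to chunk b
    pmf_N[n] : probability that the user has made n temporal switches so far,
               truncated to the non-negligible terms (cf. footnote 3)
    """
    K = len(P)
    p_n = [1.0] + [0.0] * (K - 1)        # the session starts at the first chunk
    avg = [0.0] * K
    for weight in pmf_N:
        # accumulate P(N = n) * p_n, where p_n is the distribution after n switches
        avg = [a + weight * p for a, p in zip(avg, p_n)]
        # recursion: p_{n+1}(b) = sum_a p_n(a) * P[a][b]
        p_n = [sum(p_n[a] * P[a][b] for a in range(K)) for b in range(K)]
    return avg
```

For example, with two chunks where every switch leads to (and stays at) the second chunk, and equal probability of having made zero or one switch, `average_chunk_probability([[0, 1], [0, 1]], [0.5, 0.5])` gives `[0.5, 0.5]`. The result sums to one whenever `pmf_N` does, matching the remark that the average is a probability measure.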

For inter-view switches, we let be the probability that a user switches from view to , independently of time. The inter-view switch probabilities ’s thus define a Markov chain, which we take to be irreducible, aperiodic and positive-recurrent, and we can compute the average state probability of view by: i) performing eigen-decomposition of the state transition matrix, and ii) identifying the (left) eigenvector associated with eigenvalue 1, normalized so that its entries sum to one.

Finally, we let be the probability that, in case of a switch, a client performs temporal switching as opposed to inter-view switching, independently of time and view; i.e., the viewer switches to a different view with probability .
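The stationary view probabilities can be obtained as described: eigen-decompose the transition matrix and keep the eigenvector for eigenvalue 1. Because each row of a transition matrix sums to one, the stationary distribution is a left eigenvector, so the sketch decomposes the transpose. It assumes NumPy is available; the function name is hypothetical.

```python
import numpy as np


def stationary_view_distribution(P):
    """Stationary distribution of an inter-view switching Markov chain.

    P[j, k] is the probability of switching from view j to view k (rows sum to 1).
    The stationary vector pi satisfies pi P = pi, i.e. it is an eigenvector of P.T
    with eigenvalue 1.
    """
    P = np.asarray(P, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()                     # normalize to a distribution (also fixes the sign)
```

For the two-view chain `[[0.9, 0.1], [0.5, 0.5]]`, balancing the flows (0.1·π₀ = 0.5·π₁) gives π = (5/6, 1/6), which the function returns up to floating-point error. For large view counts, power iteration would avoid a full decomposition, but the eigen-decomposition follows the text directly.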

D. Objective Function

For the movie , let and be the costs of temporal and inter-view switching, respectively, stemming from the chunk . Let be the probability that movie is selected for playback. The expected switching cost can then be expressed as

(8)

Our objective is to minimize the expected switching cost by deciding: i) on which server to replicate each chunk ( ’s), and ii) where to pull content from upon a temporal switch ( ’s) or an inter-view switch ( ’s and ’s). The problem is equivalent to computing the following:

(9)

The temporal switch cost in can be expressed as the sum of all temporal switch costs ’s to some chunk index:

(10)

Let the transmission rate between the repository and a local server be denoted as . Let be the size of the first frames of the segments in chunk , if the first frame is encoded as an I-frame, for video client rebuffering⁴. Each temporal switch cost ’s can then be written simply in terms of the temporal switch variable :

(11)

³In practice, in the summation we count only terms where is non-negligible, so that the number of terms in the summation is finite.

⁴ can be set according to the default client video player behavior. For example, means the player will play back video as soon as there is a complete frame in the buffer.

Equation (11) says that the temporal switch cost is equal to some small if the chunk is replicated in a neighboring server, and it is equal to the time required to transmit the first frames of the segment, , to the local server if the new segment must be pulled from the repository.

Then, we can write the inter-view switch cost as the sum of all view-to-view costs to some view :

(12)

The view-to-view cost has a small value if there is a direct hit in the local server ( ). If there is an indirect hit ( ), then the repository must transmit a P-frame and a merge frame, resulting in a cost no larger than ; we bound the size of the first frames (starting with an inter-view P-frame and merge frame ) for any intermediate view , , with . Finally, if the repository must transmit everything ( ), then the repository cost depends on whether the target view is within the prediction window (differential transmission) or not (replication miss):

(13)

Summarizing the above, the view-to-view cost is written as

(14)

Since , and are fixed, it is clear that the temporal switch cost and the inter-view switch cost are linear in the decision variables. Thus, the expected temporal and inter-view switch costs and are also linear, and so is the objective function. Since all the constraints are also linear, our problem is an Integer Linear Programming (ILP) problem.
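The case analysis behind Equations (11), (13) and (14) reduces to a few lines. The sketch below rests on assumptions: a small constant `eps` for local hits, a per-switch cost equal to the transmitted size divided by the repository rate, the same bound `pframe_merge_bound` for both differential transmission and indirect hits (as the text suggests), and a probability-weighted reading of the sums in (10) and (12), combined by the temporal-switch probability. All names are illustrative.

```python
def temporal_switch_cost(t, first_frames_size, rate, eps=1e-3):
    """Eq. (11): near-zero cost if a neighboring server holds the chunk (t = 1),
    otherwise the time to send the first frames of the segment from the repository."""
    return eps if t == 1 else first_frames_size / rate


def view_to_view_cost(direct, indirect, within_window, pframe_merge_bound,
                      iframe_size, rate, eps=1e-3):
    """Eqs. (13)-(14): cost of one inter-view switch given where data is pulled from."""
    if direct:                             # direct hit in a local server
        return eps
    if indirect or within_window:          # indirect hit or differential transmission:
        return pframe_merge_bound / rate   # repository sends P-frame + merge frame
    return iframe_size / rate              # replication miss: full I-frame(s)


def expected_chunk_cost(alpha, temporal_terms, view_terms):
    """Assumed reading of Eqs. (8), (10), (12): (probability, cost) pairs are
    averaged, with alpha the probability that a switch is temporal."""
    c_temporal = sum(p * c for p, c in temporal_terms)
    c_view = sum(p * c for p, c in view_terms)
    return alpha * c_temporal + (1 - alpha) * c_view
```

Every branch returns either a constant or a constant multiplied by a 0/1 decision, which is why the objective stays linear once the cases are written with the binary variables instead of `if` statements.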

E. NP-Hardness Proof

Unfortunately, our ILP optimization problem is NP-hard. We prove this by showing that a special case of the problem can be mapped to a known NP-complete problem, bin packing [27], which is the following. Given a bin capacity , a list of items of sizes , and an integer , is there a capacity-preserving item-to-bin assignment such that or fewer bins are required?

Consider a special case of our problem where there are servers in the IMVS network, each of storage size , and where the multiview video has only one view. Thus, the client can only perform temporal switching. Suppose each chunk of chunks has size , and each chunk is requested with equal likelihood. If there is a content replication strategy that fits all chunks in the servers (which is reflected in the resulting cost, since no repository transmission is then required), then there is a capacity-preserving item-to-bin assignment that fits all items in or fewer bins. Deciding whether such a strategy exists therefore amounts to solving the NP-complete decision version of bin packing. Finally, our optimization problem is a general version of this special case; thus it is at least as hard as the NP-complete decision bin packing problem, and hence it is NP-hard.
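The special case in the proof is exactly the decision version of bin packing: servers are bins, server storage is the bin capacity, and chunks are items. The brute-force check below is a hypothetical helper, not part of the proposed algorithms. It illustrates why exact solutions do not scale: it enumerates every item-to-bin assignment, whose number grows exponentially with the number of chunks.

```python
from itertools import product


def fits_in_bins(sizes, capacity, num_bins):
    """Decision version of bin packing: can all items be packed into num_bins bins?

    Enumerates all num_bins ** len(sizes) assignments, so it is only practical
    for tiny instances.
    """
    for assignment in product(range(num_bins), repeat=len(sizes)):
        load = [0] * num_bins
        for item, b in zip(sizes, assignment):
            load[b] += item
        if max(load) <= capacity:      # every bin (server) respects its capacity
            return True
    return False
```

For example, chunks of sizes 4, 4, 3, 3 and 2 fit into two servers of storage 8 (4+4 and 3+3+2), whereas three chunks of size 5 do not, since any server would have to hold two of them.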


V. LP RELAXATION AND ROUNDING WITH MINIMUM EVICTION

In this section, we present a first algorithm, termed Minimum Eviction⁵, which provides an approximate solution to the ILP problem formulated in Section IV. We first discuss the principles of relaxing the ILP problem to an LP problem. Given the solution of the LP problem, we next discuss how Minimum Eviction rounds the fractional LP solution to integers to obtain a feasible approximate solution to the original ILP problem.

A. Principles of LP Relaxation

Though the ILP problem posed earlier has linear constraints and objective, it is difficult to solve because of the integer constraints. If we remove these integer constraints, we can solve the resulting LP problem efficiently with known algorithms [27]: interior-point methods run in polynomial time, and Simplex, although exponential in the worst case, is fast in practice. The resulting objective function value is called a super-optimal solution; i.e., , where is the true optimal solution value of the original ILP problem. The reason is that the LP problem is a relaxed version of the original ILP problem of Equation (9) with fewer constraints, so its minimum can be no larger.

If we round the LP solution so that the integer constraints are satisfied, we obtain a feasible (likely sub-optimal) solution with objective value . The approximation error from the true optimal solution is bounded as follows:

(15)

The proof is straightforward:

Then (15) follows because . Thus, the LP solution provides us with an a posteriori approximation bound to quantify the quality of our rounded solution. We discuss next how the LP solution also provides additional information so that we can perform integer rounding to obtain a good approximate solution.
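For a minimization problem, the bound follows from ordering the three objective values. The derivation below uses assumed notation, $F_{LP}$ for the LP optimum, $F^*$ for the ILP optimum and $\hat{F}$ for the rounded solution, and shows both an absolute and a relative form; the exact form of (15) may differ.

```latex
% Assumed notation (minimization): F_{LP} = LP optimum, F^* = ILP optimum,
% \hat{F} = objective value of the rounded, feasible solution.
F_{LP} \le F^* \le \hat{F}
\;\Longrightarrow\;
0 \le \hat{F} - F^* \le \hat{F} - F_{LP},
\qquad
\frac{\hat{F} - F^*}{F^*} \le \frac{\hat{F} - F_{LP}}{F_{LP}} \quad (F_{LP} > 0).
```

Both right-hand sides are computable after rounding, which is what makes the bound a posteriori: the unknown $F^*$ never has to be evaluated.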

B. Rounding Heuristic: Minimum Eviction

Given an LP solution, we can classify the storage variables ’s into two classes: 1) Primary variables, which are the fractions ’s of the same chunk that sum to one across servers, i.e., ; and 2) Secondary variables, which are the fractions ’s of the same chunk when instead. The LP solution tells us that the primary variables are more important than the secondary ones, because these chunks are stored in their entirety in the servers. The heuristic Minimum Eviction algorithm essentially tries to fit as many primary variables in server storage as possible by iteratively considering fractional chunks, starting with the largest, as follows:

⁵“Eviction” is a common term used in the caching literature to mean the removal of less useful content to make room for more useful content when the cache capacity is full.

1) Identify the storage variables ’s that are equal to 1. These are stable assignments and will not be changed further.

2) Find the target fractional primary variable representing the largest fractional chunk. Round this fraction up, and round the corresponding variables ’s in other servers down.

3) If rounding up in Step 2 results in a storage constraint violation for server , evict secondary variables in the order of decreasing fractional chunk size until: i) the constraint is met, or ii) no more secondary variables are left.

4) If the storage constraint in server is still violated, evict the unstable primary variables in server in the order of decreasing fractional chunk size until: i) the constraint is met, or ii) no more unstable primary variables are left.

5) If the storage constraint in server is still violated, evict the target unstable variable instead.

6) If there are no more fractional primary variables, then round down all remaining fractional variables and the algorithm finishes. Otherwise, go back to Step 1.

The key idea is that when we attempt to round up the storage variable with the largest fractional chunk size, that chunk is either kept in its server or removed from the servers altogether; it is never moved from one server to another.
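The six steps of Minimum Eviction can be sketched as follows. This is one possible reading, not the authors' code. We assume that the "fractional chunk size" of a variable is its fraction times the chunk size, that a chunk is primary when its fractions sum to one within a tolerance, and that the LP solution is given as nested dictionaries; all names are hypothetical. Each pass fixes the target variable at 0 or 1 and never creates a new fractional value, so the loop terminates.

```python
def minimum_eviction(x, size, cap, tol=1e-9):
    """Round a fractional LP storage solution x[s][c] to 0/1 values (sketch).

    x    : dict server -> dict chunk -> fraction in [0, 1]
    size : dict chunk -> chunk size;  cap: dict server -> storage capacity
    """
    x = {s: dict(row) for s, row in x.items()}      # work on a copy

    def total(c):                                   # fraction of chunk c stored overall
        return sum(row.get(c, 0.0) for row in x.values())

    def frac_size(s, c):                            # assumed "fractional chunk size"
        return x[s][c] * size[c]

    def fractional(s, c):
        return tol < x[s][c] < 1 - tol

    def used(s):
        return sum(frac_size(s, c) for c in x[s])

    def evict(s, candidates):                       # largest fractional chunks go first
        for c in sorted(candidates, key=lambda c: frac_size(s, c), reverse=True):
            if used(s) <= cap[s] + tol:
                break
            x[s][c] = 0.0

    while True:
        # Steps 1-2: values equal to 1 are stable; pick the largest fractional primary
        primary = [(s, c) for s in x for c in x[s]
                   if fractional(s, c) and abs(total(c) - 1) <= tol]
        if not primary:
            break
        s, c = max(primary, key=lambda sc: frac_size(*sc))
        for other in x:                             # round down in every server...
            if c in x[other]:
                x[other][c] = 0.0
        x[s][c] = 1.0                               # ...and round up in the target one
        # Step 3: evict secondary variables of server s
        evict(s, [c2 for c2 in x[s] if c2 != c and fractional(s, c2)
                  and total(c2) < 1 - tol])
        # Step 4: evict unstable primary variables of server s
        evict(s, [c2 for c2 in x[s] if c2 != c and fractional(s, c2)
                  and abs(total(c2) - 1) <= tol])
        # Step 5: still infeasible, so the target chunk leaves the servers entirely
        if used(s) > cap[s] + tol:
            x[s][c] = 0.0
    # Step 6: round down whatever is still fractional
    for s in x:
        for c in x[s]:
            if fractional(s, c):
                x[s][c] = 0.0
    return x
```

In a small example with two servers of capacity 10, chunks `a`, `b`, `c` of sizes 6, 4 and 5, and LP fractions `{a: 0.6, b: 1, c: 0.4}` at `s0` and `{a: 0.4, c: 0.2}` at `s1`, rounding up `a` at `s0` overflows that server, the secondary chunk `c` is evicted, and the result stores `a` and `b` at `s0` and nothing at `s1`.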

VI. DPLO: DYNAMIC PROGRAMMING FOLLOWED BY LAGRANGIAN OPTIMIZATION

In the previous section, we proposed a solution based on LP relaxation and rounding. Because the LP-rounding algorithm does not scale well to a large number of variables, we propose here a more scalable solution based on dynamic programming and Lagrangian optimization (DPLO), which suffers little loss in performance. DPLO has two stages. In the first stage, it uses DP to calculate the maximum replication benefit of each movie chunk in a server, subject to the server storage constraint. In the second stage, it uses Lagrangian optimization based on the replication benefits to optimally store the chunks in each server.

A. A Benefit Measure for Chunk Replication

We solve the ILP problem formulated in Section IV in the following manner. Let be the binary decision variable indicating whether server replicates the chunk . Since there is no additional benefit in replicating a chunk more than once among servers (due to the negligible cost of transfers between servers), we have

(16)

In other words, the chunk will be replicated at most once among servers.

We analyze the cost and benefit of replicating the chunk as follows. Replicating the chunk consumes server storage space. The benefit of replicating chunk among servers, when the replicated chunk of the nearest view is , can be written as

(17)

In words, the equation states that the benefit of replicating chunk is the difference in cost between a replication miss (i.e., ) and a direct hit (i.e., ) during a temporal switch from to , plus the difference in transmission cost during an inter-view switch. For inter-view switches, we divide the potential benefit into two cases: i) a view-switch from view to , and ii) a view-switch from view to some other view , , where view is used as an intermediate view during an indirect hit. For the first case, we write the benefit as :

(18)

The equation above states that if the view-switch from to target is within , or if the view of an already replicated chunk is also within of target , then the benefit is only the difference between the transmission cost of an indirect hit and that of a direct hit , i.e., . Otherwise, the benefit is the difference between the cost of a replication miss and a direct hit .

For the second case, which represents switching from view to a target view , where , we write the potential benefit of using view in an indirect hit as :

(19)

The above states that if the view of an already replicated chunk is within of target , or if the current view is within of target , then the cost of view-switching from to is already no worse than an indirect hit⁶, and replicating chunk brings no additional benefit. Otherwise, the benefit is the difference between the cost of a replication miss and the cost of an indirect hit .

Given the above cost/benefit analysis for each chunk, we can derive an algorithm that operates in two stages as follows. In the first stage, we determine how the storage space should be optimally distributed among different views of chunks of the same movie and time index . In the second stage, we determine how the storage space should be distributed among chunks of different movies and different time indices. We will demonstrate in Section VII that this “divide-and-conquer” strategy is computationally much more efficient than the Minimum Eviction algorithm discussed in Section V.

⁶If , then it is a direct hit with cost , which is smaller than that of an indirect hit .
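The two inter-view cases of the benefit, Equations (18) and (19), translate directly into conditionals. The sketch assumes integer view indices, a window test `|a - b| <= window`, `None` when no view of the chunk is replicated yet, and scalar costs for direct hits, indirect hits and replication misses; the function names are hypothetical and the temporal term of (17) is omitted.

```python
def replicated_view_benefit(j, k, r, window, c_direct, c_indirect, c_miss):
    """Eq. (18): benefit for a switch j -> k when view k itself is replicated.
    r is the view of an already replicated chunk nearest to k (None if none)."""
    if abs(k - j) <= window or (r is not None and abs(r - k) <= window):
        return c_indirect - c_direct      # k was already reachable via a differential
    return c_miss - c_direct              # otherwise the switch would have been a miss


def intermediate_view_benefit(j, k, r, window, c_indirect, c_miss):
    """Eq. (19): benefit of the replicated view serving as the intermediate view
    for a switch j -> k; the caller passes only targets k inside that view's window."""
    if (r is not None and abs(r - k) <= window) or abs(j - k) <= window:
        return 0.0                        # already no worse than an indirect hit
    return c_miss - c_indirect            # turns a miss into an indirect hit
```

Both functions depend only on the nearest replicated view `r`, which is what lets Stage 1 carry a single "most recent replicated view" in its DP state instead of the full replication set.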

B. Stage 1: Dynamic Programming to Calculate Replication Benefits of Movie Chunks

We now discuss the first stage, namely a dynamic programming (DP) algorithm to find the optimal selection of chunks of the same movie and time index , given the available server storage space. First, let be the maximum possible benefit achieved by the IMVS network by selecting which views of movie , chunk to replicate under the constraints of available storage capacities in servers. We can solve using the recursive function , which is defined to be the maximum benefit given that the optimal replication decision has been made for view and the most recent replicated view is :

(20)

The variable can be defined recursively as follows:

(21)

In words, the above equation says that is the larger of if chunk has not been replicated, and the benefit of chunk plus the future recursive benefit, if has been replicated. Note that the recursive term has a smaller remaining storage size for server , and that the most recent replicated view has been updated to . We solve with arguments , where is the total storage capacity of server .

Equation (21) can be solved using DP, where the solution to is stored in entry of a DP table, so that a future call to with the same argument can simply be looked up. The time complexity of the algorithm is then the number of steps necessary to compute each DP table entry (using Equation (21)), times the number of entries in the DP table (which is ).

Nonetheless, the size of the DP table can be large, leading to large computation costs. We can reduce the complexity by a rounding factor as follows. First, the storage sizes for the arguments of the first call are each scaled and rounded down by , i.e., . Then, in Equation (21), when the chunk is replicated in server , the reduction in size is scaled and rounded up by , i.e., . In doing so, an entry in the DP table now represents the solution if storage sizes of at least are available for chunks ’s of view , each chunk being of size no larger than . The rounding directions are chosen so that the obtained solution remains feasible in the original problem without rounding. Large rounding errors due to a large , however, mean that more storage space in servers is left unused, leading to a larger approximation error in the obtained solution. The benefit, on the other side, is a reduction in DP table size⁷ by factor , thus reducing the overall complexity of the algorithm.

We note the following two observations about the DP algorithm. First, the computation for can be carried out independently for chunks of different movies and at different time instants in stage 1 of our algorithm, leading to an efficient parallel implementation. Second, the DP tables hosting the computed results in stage 1 are only constructed once (for a given rounding factor ). In the Lagrangian optimization in stage 2 of the optimization algorithm, the same DP tables can be reused without re-computation, although the optimization has to be performed multiple times in search of the appropriate Lagrange multipliers ’s, .
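The memoized recursion of Equations (20) and (21), together with the rounding by a factor, can be sketched with `functools.lru_cache` standing in for the DP table. The state layout (view index, tuple of remaining scaled capacities, most recent replicated view), the increasing-index visiting order, and the `benefit` callback, a hypothetical stand-in for Equation (17), are assumptions rather than the paper's exact recursion.

```python
import math
from functools import lru_cache


def max_replication_benefit(chunk_sizes, capacities, benefit, delta=1):
    """Maximum benefit from replicating views of one (movie, time) chunk group.

    chunk_sizes[k] : size of the chunk of view k
    capacities     : available storage at each server
    benefit(k, r)  : hypothetical stand-in for Eq. (17); benefit of replicating
                     view k when the most recent replicated view is r (None if none)
    delta          : rounding factor; capacities are scaled down (floor) and chunk
                     sizes up (ceil), so any solution found stays feasible unscaled
    """
    num_views = len(chunk_sizes)
    scaled = [math.ceil(z / delta) for z in chunk_sizes]

    @lru_cache(maxsize=None)           # the memo table plays the role of the DP table
    def best(k, remaining, r):
        if k == num_views:
            return 0.0
        result = best(k + 1, remaining, r)          # view k not replicated
        for s, room in enumerate(remaining):         # or replicated at server s
            if scaled[k] <= room:
                left = remaining[:s] + (room - scaled[k],) + remaining[s + 1:]
                result = max(result, benefit(k, r) + best(k + 1, left, k))
        return result

    return best(0, tuple(math.floor(c / delta) for c in capacities), None)
```

With three views of size 2, one server of capacity 4, and a toy benefit of 3.0 for a view more than one index away from the previous replicated view (1.0 otherwise), the best choice is views 0 and 2, for a benefit of 6.0. With `delta=3`, the scaled capacity admits only one view (benefit 3.0), which illustrates how coarser rounding leaves storage unused in exchange for a smaller table.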

C. Stage 2: Lagrangian Optimization for Chunk Storage

We have described above how can be solved with a DP algorithm. The overall constrained optimization problem can then be written as

(22)

Instead of solving Equation (22) directly, we propose to solve its Lagrangian version using Lagrange multipliers :

(23)

We can see clearly that Equation (23) is separable, i.e., for fixed ’s we can solve for the optimal ’s for chunks of all view ’s independently of other chunks, without loss of optimality. In other words, we can independently solve the following set of problems:

(24)

This means that Equation (24) for different movies and at different time instants can also be solved independently in a parallel implementation, similarly to the DP algorithm in stage 1 of the optimization algorithm. Equation (24) for a given pair of movie index and time instant can be solved easily. The only remaining task is to find so that the operational storage sizes are as close as possible to the original constraints ’s, without exceeding them. This can be done, for example, using binary search on the positive real number line.

In summary, our DPLO optimization procedure is as follows:

1) Using as argument, construct functions for all and with Equation (20). Use to control complexity.

2) Initialize , .

3) Solve Equation (23) for given ’s. Increase if . Otherwise, decrease only if .

4) Repeat Step 3 if a has been updated.

⁷We note that integer rounding to reduce DP table size is a standard technique in polynomial-time approximation schemes (PTAS) in combinatorial optimization [27].
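Stage 2 can be sketched for the simplest setting: a single server with one multiplier, where each (movie, time) group offers a few (storage, benefit) options read from its DP table. For a fixed multiplier every group is solved independently, mirroring the separability of Equation (23), and the multiplier is found by bisection on the positive real line. The single-server restriction, the option lists and the function name are assumptions; with several servers, one multiplier per server would be adjusted as in Steps 3 and 4.

```python
def lagrangian_allocation(groups, capacity, iters=60):
    """Choose one (storage, benefit) option per group under a storage budget.

    groups   : list of option lists; each must contain the empty option (0, 0.0)
    capacity : storage budget of the (single, assumed) server
    """
    def solve(lam):
        # separable step: each group maximizes benefit - lam * storage on its own
        picks = [max(opts, key=lambda o: o[1] - lam * o[0]) for opts in groups]
        return picks, sum(o[0] for o in picks)

    picks, used = solve(0.0)
    if used <= capacity:               # budget not binding
        return picks
    lo, hi = 0.0, 1.0
    while solve(hi)[1] > capacity:     # grow hi until the budget is met
        hi *= 2.0
    for _ in range(iters):             # bisection on the multiplier
        mid = (lo + hi) / 2.0
        if solve(mid)[1] > capacity:
            lo = mid                   # too much storage: raise the multiplier
        else:
            hi = mid                   # feasible: try a smaller multiplier
    return solve(hi)[0]
```

For groups `[(0, 0.0), (2, 5.0), (4, 7.0)]` and `[(0, 0.0), (3, 4.0)]` with a budget of 5, the search settles near a multiplier of 1 and returns the options of size 2 and 3 (benefit 9.0). As with any Lagrangian method on a discrete problem, the selection at the final multiplier can leave part of the budget unused.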

VII. ILLUSTRATIVE SIMULATION RESULTS

In this section, we present simulation results to demonstrate the performance of our multiview coding and content replication strategies.

A. Simulation Environment, Comparison Schemes and Performance Metrics

In our simulation, we first run a set of experiments on the MPEG multiview videos to estimate the respective sizes of I-frames, temporal P-frames, inter-view P-frames, and merge frames. In the experiments, we use the multiview video sequences Kendo, Champagne tower and Pantomime, provided by the Tanimoto Laboratory, Nagoya University [28]. Views are coded into our proposed frame structure using an H.264 codec: I- and P-frames are coded using conventional H.264 tools, while merge frames are coded using the methodology described in [13]. The quantization parameter is fixed at 40 for all frames for constant visual quality. Note that the benefits offered by our optimized replication algorithm do not depend on the actual video coding algorithms deployed. The performance gain may vary depending on the version of the video coding standard used in the experiments, but the results would be qualitatively the same. We normalize the size of each I-frame, merge frame, and temporal and inter-view P-frame into block units. Then we randomly generate frames for the movies with sizes and distributions according to the experimental results. The size of I-frames is distributed with mean 4 units. The size of a temporal P-frame is distributed with mean 1 unit. The size of a merge frame plus an inter-view P-frame between view and view equals units. Unless otherwise stated, we use the baseline parameters shown in Table II to represent the system settings and the different costs in the IMVS system. The popularity of the movies follows the Zipf distribution with parameter (i.e., the access probability is proportional to , where is the movie index). We have also run our simulations using different popularity distributions. The results of those simulations are qualitatively the same as those based on the Zipf distribution, and hence they are not shown for the sake of brevity.

We compare our replication strategies Minimum Eviction and DPLO with the following schemes:

• Local Greedy [29]: Local greedy is a state-of-the-art replication strategy, in which each server replicates chunks with high utility. In other words, popular movies and views are most likely to be replicated. A few servers replicate moderately popular movies and views, and unpopular ones are only stored at the repository. In our implementation, we use half of the storage in each server to replicate the most popular chunks, and the other half to replicate the moderately popular ones. The unpopular ones are not replicated by the servers.


TABLE II
BASELINE PARAMETERS IN OUR SIMULATION

• Random: Random replication is a simple replication method. Each server randomly replicates chunks of different views and different movies until its available storage is fully used.
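The Zipf movie popularity and the Random baseline can be sketched as follows. The 1/m^α form is the standard Zipf law and is assumed to be the one intended (α = 0 gives equal popularity, as noted for Fig. 12); the helper names and the per-server view of Random are hypothetical.

```python
import random


def zipf_popularity(num_movies, alpha):
    """Access probability of movie m (1-based) proportional to 1 / m**alpha."""
    weights = [1.0 / m ** alpha for m in range(1, num_movies + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def random_replication(chunks, size, capacity, seed=None):
    """Random baseline for one server: shuffle the chunks and keep each chunk
    that still fits, until the storage is used up."""
    rng = random.Random(seed)
    order = list(chunks)
    rng.shuffle(order)
    stored, used = [], 0
    for c in order:
        if used + size[c] <= capacity:
            stored.append(c)
            used += size[c]
    return stored
```

A larger α concentrates probability on the first few movies, which is the regime in which popularity-aware schemes such as DPLO and Local Greedy gain over Random.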

We evaluate the performance of our proposed algorithms and compare them with the other schemes in terms of run time, switching cost and request performance.

• Run time: The run time is defined as the total number of seconds needed to compute the replication strategy. We conduct our simulations on a 64-bit desktop computer with an Intel Core i7-2600 CPU @ 3.40 GHz and 8 GB RAM running the Windows 7 operating system. We set the normalized run time unit to be 10 seconds in this experiment. We are particularly interested in the normalized run time of DPLO compared to Minimum Eviction.

• Switching cost: The switching cost is the objective metric in our problem formulation. We study the switching cost of the different schemes, as well as its sensitivity to different system parameters. We are also interested in its cost components and distribution.

• Request performance: Finally, besides the cost performance, we also study the distribution of different types of requests/transmissions, i.e., the direct hit rate (defined as the number of direct hit requests divided by the total number of requests), differential transmission rate, indirect hit rate and replication miss rate.

B. Preliminary Comparison Between Minimum Eviction and DPLO

Due to the scalability issues of Minimum Eviction, we first consider a small-scale problem with 3 servers, 14 movies and 3 chunks per movie for the comparisons in this subsection. Fig. 6 shows the total switching cost as a function of the inter-view switch probability (view-switch tendency) for different schemes. A view-switch tendency of 0 means that users only perform temporal switches, and a view-switch tendency of 1 means that users perform an inter-view switch at every switch opportunity. Super Optimal is the optimal solution to the ILP problem formulated in Section IV, but without the integer constraints. We observe that Random has the worst performance due to the lack of problem-specific optimization. When the view-switch tendency is larger than 0, DPLO outperforms Local Greedy because the servers consider view switches when they replicate chunks. Both DPLO and Minimum Eviction achieve close-to-optimal performance compared to Super Optimal.

Fig. 6. Switching cost versus view-switch tendency for different replication algorithms.

Fig. 7. Algorithm run time versus number of movies.

Fig. 7 shows the run time of DPLO and Minimum Eviction as a function of the number of movies. DPLO achieves a much lower run time than Minimum Eviction. This is because Minimum Eviction needs to solve a large-scale LP problem with a large number of variables. Recall that the computational complexity of DPLO can be tuned using the rounding parameter , which is set to 2 in this experiment.

Although Minimum Eviction achieves better performance in switching cost, it does not scale to larger problem sizes. Therefore, we focus on DPLO in the rest of the evaluation studies.

C. Computational Run Time

We now study the performance of DPLO in different experiments. Fig. 8 shows the run time of DPLO as a function of switching cost. It illustrates the “time-performance” tradeoff of our algorithm. We use a scaling parameter to control the size of the DP table, and hence the complexity of the algorithm. As shown in the figure, with (no rounding operations), DPLO gives the most accurate replication solution, which leads to the lowest switching costs. As the scaling factor increases, the run time of DPLO decreases significantly with a slight increase in switching costs. Therefore, by adjusting the scaling factor , DPLO can trade off performance against computational complexity in the optimization of the replication strategy.

Fig. 9 shows the algorithm run time as a function of the total number of servers. It demonstrates the relationship between the computation time of a replication plan and the problem complexity. The run time of Random does not increase much with the problem size, since every server just randomly replicates chunks without any collaboration or optimization. DPLO and Local Greedy behave similarly: the run time of both increases with the number of servers. In real VOD systems, multiple servers are often grouped into a single server cluster or farm, which can be modeled as one logical server in our problem. Therefore, the total number of servers is not expected to be very large.

Fig. 8. Algorithm run time versus switching cost.

Fig. 9. Switching cost versus view-switch tendency for different replication algorithms.

Fig. 10. Switching cost versus number of views for different replication algorithms.

D. Switching Cost

Fig. 10 shows the switching cost as a function of the total number of views. We observe that the switching cost increases with the number of views. This is because more views lead to a larger total number of chunks in the system, and hence a higher likelihood of replication miss. We observe again that DPLO performs better than Random and Local Greedy, especially when the number of views is large.

Fig. 11. Switching cost versus the size of redundant window for different replication algorithms.

Fig. 12. Switching cost versus the movie popularity model for different replication algorithms.

Fig. 11 shows the switching cost as a function of the size of the redundant window. The switching cost decreases as the redundant window grows. With a larger redundant window, more view switches require only the transmission of the corresponding pre-encoded differentials. As a result, the indirect hit rate increases, and the switching cost in turn decreases. On the other hand, a larger redundant window means that more redundant inter-view P-frames are generated at the head of each coding unit. This leads to a more redundant representation and a larger storage cost at the repository. Therefore, the size of the redundant window needs to be selected judiciously to trade off performance (switching cost) against repository storage cost.

Fig. 12 shows the switching cost as a function of the Zipf parameter in the movie popularity model. When the parameter is equal to zero, all movies are equally popular. As movie popularity becomes more skewed, the switching cost of DPLO and Local Greedy decreases because both schemes devote relatively more storage to replicating popular movies. The switching cost of Random is insensitive to movie popularity. We observe again that DPLO significantly outperforms Local Greedy and Random.
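The Zipf popularity model can be sketched as follows. This sketch assumes the common form in which the movie of popularity rank i is requested with probability proportional to 1/i^α. Here `alpha` is a stand-in name for the Zipf parameter, whose symbol did not survive extraction. Setting `alpha = 0` gives every movie equal popularity, and larger values concentrate requests on the top-ranked movies. This concentration is why schemes that spend more storage on popular movies benefit from skew. The helper names are hypothetical.

```python
from __future__ import annotations


def zipf_popularity(num_movies: int, alpha: float) -> list[float]:
    """Request probability of each movie, ordered by popularity rank (1 = most popular)."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_movies + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_movies_request_share(num_movies: int, alpha: float, top: int) -> float:
    """Fraction of all requests that target the `top` most popular movies."""
    return sum(zipf_popularity(num_movies, alpha)[:top])


if __name__ == "__main__":
    # alpha = 0 is the equal-popularity case; larger alpha means more skew.
    for alpha in (0.0, 0.5, 1.0, 1.5):
        share = top_movies_request_share(num_movies=100, alpha=alpha, top=10)
        print(f"alpha={alpha:.1f}  share of requests to top 10 movies={share:.3f}")
```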

E. Cost Components and Distribution

Fig. 13 shows the cost distribution of the different schemes. DPLO has a much lower replication-miss cost than Random and Local Greedy. This is because DPLO replicates chunks so as to maximize the benefit of transmitting frame differentials, instead of whole chunks, from the repository. Consequently, DPLO generates many more differential-transmission requests and indirect hits than the other two schemes, and it exploits indirect hits to lower the overall cost.
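A cost distribution such as the one in Fig. 13 is a sum over request types of (number of requests) × (cost per request). The minimal sketch below uses placeholder unit costs. The values in `UNIT_COST` are hypothetical, and only their ordering is taken from the text: a differential transmission costs less than an indirect hit, and both are cheaper than fetching a whole chunk on a replication miss. A direct hit is assumed to be the cheapest. The example shows that shifting requests from replication misses to differential transmissions and indirect hits, as DPLO does, lowers the total even when the number of direct hits is unchanged.

```python
from collections import Counter

# Hypothetical per-request costs in arbitrary units. Only the ordering
# direct_hit < differential < indirect_hit < replication_miss is assumed.
UNIT_COST = {
    "direct_hit": 0.0,
    "differential": 1.0,
    "indirect_hit": 2.0,
    "replication_miss": 10.0,
}


def cost_breakdown(request_types):
    """Return (total cost, cost per request type) for an iterable of type labels."""
    counts = Counter(request_types)
    unknown = set(counts) - set(UNIT_COST)
    if unknown:
        raise ValueError(f"unknown request types: {sorted(unknown)}")
    per_type = {t: counts[t] * cost for t, cost in UNIT_COST.items()}
    return sum(per_type.values()), per_type


if __name__ == "__main__":
    # Same 100 requests and same 50 direct hits; only the handling of the rest differs.
    misses_only = ["direct_hit"] * 50 + ["replication_miss"] * 50
    with_differentials = (["direct_hit"] * 50 + ["differential"] * 30
                          + ["indirect_hit"] * 10 + ["replication_miss"] * 10)
    for name, mix in (("misses only", misses_only), ("with differentials", with_differentials)):
        total, parts = cost_breakdown(mix)
        print(name, total, parts)
```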



Fig. 13. Cost distribution for the different replication schemes.

Fig. 14. Distribution of request types versus the size of the redundant window for algorithm DPLO.

Fig. 14 shows the fraction of different request types in DPLO as a function of the size of the redundant window. When the redundant window size is zero, there are only two types of requests, direct hits and replication misses. This is because there are no inter-view P-frames to provide representation redundancy and exploit inter-view correlation during a view switch. As the redundant window size increases, the fraction of replication misses decreases sharply, while the fractions of indirect hits and differential transmissions both increase. The servers use inter-view P-frames to decode neighboring views more often, and the cost induced by replication misses decreases significantly. When the redundant window size becomes large, the fraction of indirect hits also decreases. This is because, with a large redundant window, each view has more neighboring views. Most view-switch requests therefore lead to a differential transmission, which costs less than an indirect hit. The number of direct hits remains constant, since the redundant window size does not introduce new replicated chunks.
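The effect of the redundant window on the request mix can be illustrated with a toy classifier. This is a sketch under a hypothetical reading of the request types, not the paper's exact definitions:

- Views are indexed 0, 1, 2, … along the camera array.
- A view u can be reconstructed from a stored view v whenever 0 < |u − v| ≤ W, through a pre-encoded inter-view differential.
- A direct hit means the requested chunk is replicated on some server.
- A differential transmission means a neighboring view within W is stored at the local server, so the repository sends only the differential.
- An indirect hit means such a neighbor is stored only at another server and must be fetched along with the differential.
- Anything else is a replication miss.

The frame count assumes one redundant inter-view P-frame per ordered pair of views within distance W in each coding unit. All names are illustrative.

```python
from __future__ import annotations

from collections import Counter


def classify_request(requested: int, local: set[int], remote: set[int], window: int) -> str:
    """Classify a request for view `requested` of one segment (hypothetical model)."""
    if requested in local or requested in remote:
        return "direct_hit"  # the exact chunk is replicated on some server
    # Views that could serve `requested` through a pre-encoded inter-view differential.
    neighbors = {v for v in range(requested - window, requested + window + 1) if v != requested}
    if neighbors & local:
        return "differential"  # only the differential comes from the repository
    if neighbors & remote:
        return "indirect_hit"  # neighbor fetched from a peer server, plus the differential
    return "replication_miss"  # the whole chunk must come from the repository


def request_mix(num_views: int, local: set[int], remote: set[int], window: int) -> Counter:
    """Count request types when each view of the segment is requested once."""
    return Counter(classify_request(v, local, remote, window) for v in range(num_views))


def redundant_frames_per_unit(num_views: int, window: int) -> int:
    """Inter-view P-frames per coding unit: one per ordered view pair within distance `window`."""
    return sum(min(v, window) + min(num_views - 1 - v, window) for v in range(num_views))


if __name__ == "__main__":
    local, remote = {2}, {6}  # views of this segment stored locally / at another server
    for w in (0, 1, 2):
        print(w, dict(request_mix(10, local, remote, w)), redundant_frames_per_unit(10, w))
```

With `window = 0` the neighbor set is empty, so only direct hits and replication misses can occur, matching the zero-window case above. Widening the window converts misses into differential transmissions and indirect hits while the direct hits stay fixed. The storage side of the trade-off appears as the growing frame count.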

VIII. CONCLUSION

In interactive multiview video streaming (IMVS), users watching a multiview video may request inter-view or temporal switches at any time. In this paper, we study the design of the coding structure and its replication to support a large-scale IMVS service with distributed servers. To facilitate view switching and storage, we propose a coding structure based on redundant P-frames and merge frames. Using the redundant frame structure, the switching cost of video segments can be substantially reduced via “indirect hit”. Given a requested view and a correlated view stored locally at a server, only the pre-encoded frame differentials between the replicated view and the requested view need to be transmitted in the network.

With this coding structure, we then formulate the content replication problem to minimize the content switching cost as an integer linear programming (ILP) problem. We propose an LP-based strategy with integer rounding (called Minimum Eviction) to replicate movie contents, which achieves excellent performance. We further propose a more scalable solution based on dynamic programming and Lagrangian optimization (DPLO). Simulation results show that our schemes achieve performance very close to the optimal solution, with significantly lower cost than a state-of-the-art replication scheme and a commonly used one.

REFERENCES

[1] M. Tanimoto, M. P. Tehrani, T. Fujii, and T. Yendo, “Free-viewpoint TV,” IEEE Signal Process. Mag., vol. 28, no. 1, pp. 67–76, Jan. 2011.

[2] G. Cheung, A. Ortega, and N.-M. Cheung, “Interactive streaming of stored multiview video using redundant frame structures,” IEEE Trans. Image Process., vol. 20, no. 3, pp. 744–761, Mar. 2011.

[3] J.-G. Lou, H. Cai, and J. Li, “A real-time interactive multi-view video system,” in Proc. ACM Int. Conf. Multimedia, Singapore, Nov. 2005, pp. 161–170.

[4] X. Zhang and H. Hassanein, “Video on-demand streaming on the internet—a survey,” in Proc. 25th Biennial Symp. Commun., 2010, pp. 88–91.

[5] W. Dai, G. Cheung, N.-M. Cheung, A. Ortega, and O. Au, “Rate-distortion optimized merge frame using piecewise constant functions,” in Proc. IEEE Int. Conf. Image Process., Melbourne, Australia, Sep. 2013, pp. 1787–1791.

[6] P. Merkle, A. Smolic, K. Muller, and T. Wiegand, “Efficient prediction structures for multiview video coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 11, pp. 1461–1473, Nov. 2007.

[7] S. Shimizu, M. Kitahara, H. Kimata, K. Kamikura, and Y. Yashima, “View scalable multiview coding using 3-D warping with depth map,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 11, pp. 1485–1495, Nov. 2007.

[8] T. Fujii, K. Mori, K. Takeda, K. Mase, M. Tanimoto, and Y. Suenaga, “Multipoint measuring system for video and sound—100 camera and microphone system,” in Proc. IEEE Int. Conf. Multimedia Expo, Toronto, ON, Canada, Jul. 2006, pp. 437–440.

[9] G. Cheung, A. Ortega, N.-M. Cheung, and B. Girod, “On media data structures for interactive streaming in immersive applications,” in Proc. SPIE Vis. Commun. Image Process. Conf., Huang Shan, China, Jul. 2010.

[10] X. Xiu, G. Cheung, and J. Liang, “Delay-cognizant interactive multiview video with free viewpoint synthesis,” IEEE Trans. Multimedia, vol. 14, no. 4, pp. 1109–1126, Aug. 2012.

[11] T. Maugey, I. Daribo, G. Cheung, and P. Frossard, “Navigation domain representation for interactive multiview imaging,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3459–3472, Sep. 2013.

[12] M. Karczewicz and R. Kurceren, “The SP- and SI-frames design for H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 637–644, Jul. 2003.

[13] N.-M. Cheung, A. Ortega, and G. Cheung, “Distributed source coding techniques for interactive multiview video streaming,” in Proc. 27th Picture Coding Symp., Chicago, IL, USA, May 2009, pp. 1–4.

[14] E. Jaho, I. Koukoutsidis, I. Stavrakakis, and I. Jaho, “Cooperative content replication in networks with autonomous nodes,” Comput. Commun., vol. 35, no. 5, pp. 637–647, Mar. 2012.

[15] S.-H. G. Chan, “Operation and cost optimization of a distributed servers architecture for on-demand video services,” IEEE Commun. Lett., vol. 5, no. 9, pp. 384–386, Sep. 2001.

[16] S. Ataee, B. Garbinato, and F. Pedone, “Restream - a replication algorithm for reliable and scalable multimedia streaming,” in Proc. 21st Euromicro Int. Conf. Parallel, Distrib. and Network-Based Process., 2013, pp. 68–76.

[17] Y. Zhou, T. Fu, and D. M. Chiu, “On replication algorithm in P2P VoD,” IEEE/ACM Trans. Networking, vol. 21, no. 1, 2013.

[18] S. Ghandeharizadeh and S. Shayandeh, “Domical cooperative caching for streaming media in wireless home networks,” ACM Trans. Multimedia Computing, Communications and Applications, vol. 7, no. 4, pp. 40:1–40:17, Dec. 2011.

[19] S. Borst, V. Gupta, and A. Walid, “Self-organizing algorithms for cache cooperation in content distribution networks,” ACM SIGMETRICS Performance Eval. Rev., vol. 37, no. 2, pp. 71–72, 2009.



[20] S. Mao, X. Xheng, Y. T. Hou, H. D. Sherali, and J. H. Reed, “On joint routing and server selection for MD video streaming in ad hoc networks,” IEEE Trans. Wireless Commun., vol. 6, no. 1, pp. 338–347, Jan. 2007.

[21] M. Wang, L. Xu, and B. Ramamurthy, “Improving multi-view peer-to-peer live streaming systems with the divide-and-conquer strategy,” Comput. Netw., vol. 55, no. 18, pp. 4069–4085, Dec. 2011.

[22] S. Sedef Savas, C. Göktuğ Gürler, A. Murat Tekalp, E. Ekmekcioglu, S. Worrall, and A. Kondoz, “Adaptive streaming of multi-view video over p2p networks,” Image Commun., vol. 27, no. 5, pp. 522–531, May 2012.

[23] Y. Ding and J. Liu, “Efficient stereo segment scheduling in peer-to-peer 3D/multi-view video streaming,” in Proc. IEEE Int. Conf. Peer-to-Peer Computing, Sep. 2011, pp. 182–191.

[24] Z. Chen, L. Sun, and S. Yang, “Overcoming view switching dynamic in multi-view video streaming over P2P network,” in Proc. 3DTV-Conference: The True Vision - Capture, Transmission and Display of 3D Video, Jun. 2010, pp. 1–4.

[25] H. Huang, B. Zhang, G. Chan, G. Cheung, and P. Frossard, “Coding and caching co-design for interactive multiview video streaming,” in Proc. Mini-conf. IEEE INFOCOM, Orlando, FL, USA, Mar. 2011, pp. 3073–3077.

[26] H. Huang, S.-H. G. Chan, G. Cheung, and P. Frossard, “Near-optimal content replication for interactive multiview video streaming,” in Proc. 19th Int. Packet Video Workshop, May 2012, pp. 95–98.

[27] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. New York, NY, USA: Dover, 1998.

[28] Tanimoto Laboratory, Department of Information Electronics, Nagoya University [Online]. Available: http://www.tanimoto.nuee.nagoya-u.ac.jp

[29] S. Borst, V. Gupta, and A. Walid, “Distributed caching algorithms for content distribution networks,” in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–9.

Dongni Ren received the B.Eng. degree in computer science and engineering and the M.Phil. degree in computer science from the Hong Kong University of Science and Technology (HKUST) in 2007 and 2009, respectively. He is currently working toward the Ph.D. degree there, at the Department of Computer Science and Engineering, supervised by Prof. Gary Chan.

His research interests include video streaming networks, overlay broadcasting, Video on Demand (VOD), and multi-view/free-viewpoint video technologies.

S.-H. Gary Chan (S’89–M’98–SM’03) received the B.S.E. degree (with highest honor) in electrical engineering from Princeton University, Princeton, NJ, USA, in 1993, and the M.S.E. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1994 and 1999, respectively.

He is currently Professor and Undergraduate Programs Coordinator at the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology (HKUST), Hong Kong. He is also the Director of the Sino Software Research Institute at HKUST. His research interests include multimedia networking, wireless networks, and mobile computing.

Prof. Chan was an associate editor of the IEEE TRANSACTIONS ON MULTIMEDIA (2006–11), and a Vice-Chair of the Peer-to-Peer Networking and Communications Technical Sub-Committee of the IEEE ComSoc Emerging Technologies Committee. He has been a Guest Editor of the IEEE TRANSACTIONS ON MULTIMEDIA (2011), the IEEE Signal Processing Magazine (2011), the IEEE Communication Magazine (2007), and Springer Multimedia Tools and Applications (2007). He was the TPC chair of the IEEE Consumer Communications and Networking Conference (CCNC) 2010, the Multimedia symposium in IEEE Globecom (2007 and 2006) and IEEE ICC (2007 and 2005), and the Workshop on Advances in Peer-to-Peer Multimedia Streaming in the ACM Multimedia Conference (2005). His research projects on wireless and streaming have received several ICT (Information and Communication Technology) awards in Hong Kong, the Pan Pearl River Delta, and the Asia-Pacific region for their commercial impact on industry (2012, 2013, and 2014). He is the recipient of the Google Mobile 2014 Award (2010 and 2011) and the Silver Award of Boeing Research and Technology (2009). He has been a visiting professor or researcher at Microsoft Research (2000–11), Princeton University (2009), Stanford University (2008–09), and the University of California at Davis (1998–1999). He was a Co-director of the HKUST Risk Management and Business Intelligence program (2011–2013), and Director of the Computer Engineering Program at HKUST (2006–2008).

Gene Cheung (M’00–SM’07) received the B.S. degree in electrical engineering from Cornell University, Ithaca, NY, USA, in 1995, and the M.S. and Ph.D. degrees in electrical engineering and computer science from the University of California, Berkeley, CA, USA, in 1998 and 2000, respectively.

He was a Senior Researcher with Hewlett-Packard Laboratories Japan, Tokyo, from 2000 to 2009. He is now an associate professor at the National Institute of Informatics in Tokyo, Japan. His research interests include image and video representation, immersive visual communication, and graph signal processing. He has published over 130 international conference and journal papers.

He served as an associate editor for the IEEE TRANSACTIONS ON MULTIMEDIA from 2007 to 2011. He currently serves as associate editor for the DSP Applications Column in the IEEE Signal Processing Magazine and for the APSIPA Journal on Signal and Information Processing, and as area editor for EURASIP Signal Processing: Image Communication. He currently serves as a member of the Multimedia Signal Processing Technical Committee (MMSP-TC) in the IEEE Signal Processing Society (2012–2014). He has also served as area chair for the IEEE International Conference on Image Processing (ICIP) 2010 and 2012–2013, technical program co-chair of the International Packet Video Workshop (PV) 2010, track co-chair for the Multimedia Signal Processing track in the IEEE International Conference on Multimedia and Expo (ICME) 2011, symposium co-chair for the CSSMA Symposium in IEEE GLOBECOM 2012, and area chair for ICME 2013. He was invited as a plenary speaker for the IEEE International Workshop on Multimedia Signal Processing (MMSP) 2013 on the topic “3-D visual communication: media representation, transport and rendering”. He is a co-author of the best student paper award winner at the IEEE Workshop on Streaming and Media Communications 2011 (in conjunction with ICME 2011), of Best Paper finalists at ICME 2011 and ICIP 2011, of the Best Paper Runner-up Award at ICME 2012, and of the Best Student Paper Award at ICIP 2013.

Pascal Frossard (S’96–M’01–SM’04) received the M.S. and Ph.D. degrees, both in electrical engineering, from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, in 1997 and 2000, respectively.

Between 2001 and 2003, he was a member of the research staff at the IBM T. J. Watson Research Center, Yorktown Heights, NY, where he worked on media coding and streaming technologies. Since 2003, he has been a faculty member at EPFL, where he heads the Signal Processing Laboratory (LTS4). His research interests include image representation and coding, visual information analysis, distributed image processing and communications, and media streaming systems.

Dr. Frossard has been the General Chair of IEEE ICME 2002 and Packet Video 2007. He has been the Technical Program Chair of IEEE ICIP 2014 and EUSIPCO 2008, and a member of the organizing or technical program committees of numerous conferences. He has been an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING (2010–2013), the IEEE TRANSACTIONS ON MULTIMEDIA (2004–2012), and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2006–2011). He is the Chair of the IEEE Image, Video and Multidimensional Signal Processing Technical Committee (2014–2015), and an elected member of the IEEE Visual Signal Processing and Communications Technical Committee (2006–) and of the IEEE Multimedia Systems and Applications Technical Committee (2005–). He has served as Steering Committee Chair (2012–2014) and Vice-Chair (2004–2006) of the IEEE Multimedia Communications Technical Committee and as a member of the IEEE Multimedia Signal Processing Technical Committee (2004–2007). He received the Swiss NSF Professorship Award in 2003, the IBM Faculty Award in 2005, the IBM Exploratory Stream Analytics Innovation Award in 2008, and the IEEE TRANSACTIONS ON MULTIMEDIA Best Paper Award in 2011.

