
An HTTP/2-Based Adaptive Streaming Framework for 360° Virtual Reality Videos

Stefano Petrangeli
Ghent University - imec
[email protected]

Viswanathan Swaminathan
Adobe Research
[email protected]

Mohammad Hosseini
University of Illinois at Urbana-Champaign
[email protected]

Filip De Turck
Ghent University - imec
[email protected]

ABSTRACT
Virtual Reality (VR) devices are becoming accessible to a large public, which is going to increase the demand for 360° VR videos. VR videos are often characterized by a poor quality of experience, due to the high bandwidth required to stream the 360° video. To overcome this issue, we spatially divide the VR video into tiles, so that each temporal segment is composed of several spatial tiles. Only the tiles belonging to the viewport, the region of the video watched by the user, are streamed at the highest quality. The other tiles are instead streamed at a lower quality. We also propose an algorithm to predict the future viewport position and minimize quality transitions during viewport changes. The video is delivered using the server push feature of the HTTP/2 protocol. Instead of retrieving each tile individually, the client issues a single push request to the server, so that all the required tiles are automatically pushed back to back. This approach increases the achieved throughput, especially in mobile, high-RTT networks. In this paper, we detail the proposed framework and present a prototype developed to test its performance using real-world 4G bandwidth traces. Particularly, our approach can save up to 35% of bandwidth without severely impacting the quality viewed by the user, when compared to a traditional non-tiled VR streaming solution. Moreover, in high-RTT conditions, our HTTP/2 approach can reach 3 times the throughput of tiled streaming over HTTP/1.1 and consistently reduce freeze time. These results represent a major improvement for the efficient delivery of 360° VR videos over the Internet.

CCS CONCEPTS
• Information systems → Multimedia streaming; • Human-centered computing → Virtual reality; • Networks → Network protocols; Public Internet;

KEYWORDS
Virtual reality; HTTP Adaptive Streaming; HTTP/2; Server push; Viewport prediction; H.265; Tiling


[Figure 1 diagram labels: viewport, polar tile, equatorial tile.]

Figure 1: In tiled VR streaming, only tiles belonging to the viewport (in green) are streamed at the highest quality.

ACM Reference format:
S. Petrangeli, V. Swaminathan, M. Hosseini, F. De Turck. 2017. An HTTP/2-Based Adaptive Streaming Framework for 360° Virtual Reality Videos. In Proceedings of MM '17, October 23–27, 2017, Mountain View, CA, USA, 9 pages. DOI: https://doi.org/10.1145/3123266.3123453

1 INTRODUCTION
Recent advancements in consumer electronics have made Virtual Reality (VR) devices accessible to a large public. Consequently, the demand for 360° VR video streaming is expected to grow exponentially in the near future. Streaming VR videos over the best-effort Internet is challenged today by the high bandwidth required to stream the entire 360° video. This aspect is often responsible for low video quality and buffer starvations, two of the main factors influencing users' Quality of Experience (QoE). View-dependent solutions are ideal for saving bandwidth in VR streaming, as only the viewport, the portion of the video watched by the user, is streamed at the highest quality, while the rest of the video is streamed at a lower quality. Despite that, current solutions require storing a different version of the video for each possible position of the viewport, entailing huge CDN costs [5]. Tiling the VR video makes it possible to obtain similar results in terms of bandwidth savings, without additional storage compared to traditional streaming. In tiling, the VR video is divided into spatial regions, each encoded at different quality levels. To save bandwidth, only tiles inside the viewport are streamed at the highest quality (Figure 1). This approach can be combined with the bandwidth adaptation of HTTP Adaptive Streaming (HAS) techniques. In tiled HAS, the video is both temporally segmented and spatially tiled, so that each temporal segment of the video is composed of several video tiles.


Unfortunately, tiling the video causes a significant increase in the number of requests when HAS is used over HTTP/1.1. Each tile has to be requested independently from the others in order to create a complete temporal video segment, meaning that this approach is susceptible to high RTTs, typical of mobile networks. As an example, assuming the video is composed of 6 tiles and the RTT is 100 ms, at least 600 ms would be required to download each temporal segment. This behavior can consistently lower the achieved throughput and limit the practical applicability of tiled VR streaming. In this paper, we propose to overcome this drawback using the server-push functionality of the HTTP/2 protocol [18]. Only a single HTTP GET request is sent from the client; all the tiles are automatically pushed from the server. This approach overcomes the main drawback introduced by tiling. HTTP/2 shares the same methods, status codes and semantics with HTTP/1.1, entailing complete backward compatibility. Server push, and more generally HTTP/2, is also completely cache- and CDN-friendly.
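To make this overhead concrete, here is a minimal back-of-the-envelope sketch of the round-trip cost per segment, using the example values above (6 tiles, 100 ms RTT). The helper function is ours and only models the request round trips, not the transfer time itself.

```python
def request_overhead_s(n_tiles: int, rtt_s: float, server_push: bool) -> float:
    """Minimum time lost to request round trips for one temporal segment.

    Sequential HTTP/1.1 GETs pay one RTT per tile; a single pushed request
    pays one RTT per segment, regardless of the number of tiles.
    """
    return rtt_s if server_push else n_tiles * rtt_s

print(request_overhead_s(6, 0.100, server_push=False))  # 0.6 s over HTTP/1.1
print(request_overhead_s(6, 0.100, server_push=True))   # 0.1 s with HTTP/2 push
```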

The main contributions of this paper are three-fold. First, we propose a viewport-dependent HTTP/2-based streaming framework, where the client decides the quality of each tile based on the user viewport. The H.265 standard is used to tile the video [10], as it allows the different tiles to be decoded by a single decoder, which represents an important advantage on resource-constrained devices such as smartphones and tablets. Moreover, HTTP/2's server push can completely eliminate the overhead introduced by tiling the video, as only one HTTP request specifying the quality of the tiles is sent from the client to the server. Based on this request, the server can push all the requested tiles back to back, consequently reducing the impact of the network RTT. This approach does not require any client status to be kept on the server. A push directive, as standardized by MPEG-DASH [11], embedded in the client request specifies the tiles' qualities. By parsing this directive, the server understands which tiles to push. Second, an algorithm is proposed to predict the future viewport position, in order to minimize quality transitions during viewport changes. An estimate of the future viewport is computed based on the viewport speed, in order to anticipate the user's movements and request in advance the right portion of the video at the highest quality. This approach provides a graceful viewport transition and maximizes QoE. Third, extensive experimental results are collected to quantify the gains of the proposed framework using a prototype implemented on a Gear VR and Samsung Galaxy S7. Particularly, we show that our HTTP/2 solution can reach better quality and lower freeze time compared to standard tiled video over HTTP/1.1 in high-RTT conditions.

The remainder of this paper is structured as follows. Section 2 presents the related work on 360° VR video streaming and HTTP/2-based adaptive streaming. Section 3 describes the proposed framework in detail, both from an architectural and an algorithmic point of view, while Section 4 reports the obtained results. Section 5 concludes the paper.

2 RELATED WORK
2.1 360° Video Streaming
There is a large body of literature addressing the high bandwidth requirements of 360° videos, with tiling identified as a good candidate to alleviate this problem.

D'Acunto et al. propose an MPEG-DASH SRD client to optimize the delivery of zoomable videos, which are affected by the same bandwidth problem as 360° videos [4]. The video is spatially divided in tiles: the low-resolution tile, corresponding to the whole zoomable video, is always downloaded to avoid a black screen in case of viewpoint changes. The high-resolution tiles, corresponding to the zoomed part of the video, are downloaded afterwards. Wang et al. deliver the tiles of a zoomable video using multicast [17]. An algorithm is proposed to decide the resolution of the tiles to be multicasted and maximize the utility of all users. Lim et al. use tiling to efficiently deliver panoramic videos [9]. In these works, the video is tiled using H.264, which does not natively support tiling. This aspect complicates the synchronization of the tiles, as each tile has to be decoded independently. Le Feuvre et al. propose to use H.265 to spatially divide the 360° video [8]. They also propose a rate allocation algorithm to decide the quality of each tile based on the available bandwidth and the user viewport. The tiles are delivered using HTTP/1.1 and no prediction is proposed to anticipate the user's movements. Gaddam et al. use tiling and viewport prediction to stream interactive panoramic videos, where part of the panorama can be used to extract a virtual view [6]. Also in this case, the video is encoded in H.264 and transported over regular HTTP/1.1. Cuervo et al. investigate the use of panoramic stereo video and likelihood-based foveation to deliver the 360° video [3]. Any possible virtual view can be extracted from the panoramic video, whose quality gradually degrades based on the part of the video the user is most likely to watch in the future. Qian et al. propose a framework where only the portion of the 360° video watched by the user is actually transmitted, to save bandwidth [13]. The developed viewport prediction algorithm should therefore be extremely precise in order to avoid stalling when the user changes viewport. Moreover, results are only presented in a simulated environment. TaghaviNasrabadi et al. use scalable video coding and tiling to stream the VR video [15]. The video is divided in different tiles, and each tile is encoded at different scalable layers. As the video is delivered over standard HTTP/1.1, this approach is susceptible to high RTTs. While all of the above research is successful in addressing some of the problems affecting 360° video streaming, there is no single solution that addresses all of them. Our work is an attempt to provide a comprehensive solution for VR streaming. By using H.265 to tile the video, we can eliminate the client-side synchronization issues introduced by tiling. The tiled video is transported using HTTP/2, which eliminates the significant increase of GET requests due to the spatial partitioning of the video. Finally, viewport prediction can successfully compensate for the quality degradation introduced by assigning lower qualities to tiles outside the viewport, by anticipating the user's movements.

Budagavi et al. mainly focus on how to optimize the encoding process to reduce the bit-rate of VR videos [1]. By gradually smoothing the quality of the bottom and top parts of an equirectangular projection, they are able to reduce the bit-rate by 20%. Zare et al. propose a modified H.265 encoder to more efficiently tile a 360° video [20]. Hosseini et al. propose a new tiling structure for the 360° video, which allows saving up to 30% of the bandwidth compared to a non-tiled video [7]. The same tiling structure has also been adopted in this paper, because of its efficiency. Our approach can be considered complementary to these works, as we mostly focus on the delivery of the video, rather than its encoding and preparation.

2.2 HTTP/2-Based Adaptive Streaming
One of the new features introduced by HTTP/2 is the possibility for the server to push resources that have not been requested directly by the client. This mechanism was originally proposed to reduce the latency in web delivery, but has also been applied to the delivery of multimedia content. Wei et al. are the first to investigate how server push can improve the delivery of HAS streams [18]. They focus on the reduction of the camera-to-display delay, which is obtained by reducing the segment duration and pushing k segments after a single HTTP GET request is issued by the client. Xiao et al. extend the k-push mechanism to optimize the battery lifetime on mobile devices, by dynamically varying the value of k based on network conditions and power efficiency [19]. van der Hooft et al. also investigate the merits of server push for H.265 videos over 4G networks [16]. Segments with a sub-second duration are continuously pushed from the server to the client, in order to reduce the live delay compared to HTTP/1.1-based solutions. Cherif et al. use server push in conjunction with WebSocket to reduce the startup delay in a DASH streaming session [2]. In this work, we exploit the server push functionality to reduce the network overhead introduced by spatially dividing the video into separate tiles. Instead of pushing the segments one after the other, we use the k-push mechanism to push the tiles composing a single temporal segment back to back from the server to the client.

3 HTTP/2-BASED VR STREAMING FRAMEWORK

In this section, we describe the proposed framework for VR streaming. Our framework builds upon three components, which combined overcome the main issues affecting current VR streaming solutions, namely storage costs and bandwidth requirements. First, the VR content is encoded using the H.265 standard and divided into spatial tiles, each encoded at different quality levels (Section 3.1). Besides an encoding overhead introduced by the tiling process, this approach requires the same amount of storage as classical video streaming. Second, the video client is equipped with an algorithm that can select the best video quality for each tile based on information such as the current and predicted future viewport and the available network bandwidth (Section 3.2). By dynamically changing the viewport quality, the bandwidth required to stream the 360° video can be consistently reduced. Third, the server-push functionality of the HTTP/2 protocol eliminates the significant increase of HTTP GET requests caused by tiling the video (Section 3.3), in turn increasing the achieved throughput, especially in high-RTT networks.

3.1 H.265 Video Tiling
One of the innovations introduced by the H.265 standard is the possibility to spatially divide the video into regions, called tiles [10]. The tiles can be physically separated from each other and reconstructed in a common stream that can be decoded by a single decoder. This tiling process is extremely beneficial in VR streaming, where the user can only watch a fraction of the entire 360° video at any given point in time. In fact, only the tiles inside the viewport are streamed at the highest quality. This approach still gives the same feeling of immersion as if the entire 360° video were streamed at the highest quality, while requiring less bandwidth compared to full-quality VR streaming and less storage compared to viewport-dependent non-tiled VR streaming. The storage savings can be quantified based on the underlying 2D projection used for the VR video. We consider an equirectangular projection with width w and height h, and a viewport with width and height equal to w_p and h_p, respectively. We assume the video is composed of two quality levels, with bit-rates b_1 ≥ b_0. In our tiled approach, the total storage required to stream a single video is given by:

S_tiled = d × (b_1 + b_0) × α(n_t)

where d is the video duration and α(n_t) represents the encoding overhead introduced by the tiling process, which depends on the number of tiles n_t. Conversely, in a non-tiled approach, a different copy of the video has to be encoded for each desired viewport configuration:

S_non-tiled = d × [ (w_p × h_p)/(w × h) × b_1 + (1 − (w_p × h_p)/(w × h)) × b_0 ] × N(w, h, w_p, h_p, ξ)

The term in brackets represents the bit-rate necessary to stream a single viewport configuration and is given by the percentage of the equirectangular projection occupied by the viewport at high quality (first term) plus the remaining part of the video streamed at a lower quality (second term). N(w, h, w_p, h_p, ξ) indicates the number of different viewport configurations to encode, with ξ being the step separating the different viewports:

N(w, h, w_p, h_p, ξ) = ⌊(w − w_p)/ξ⌋ × ⌊(h − h_p)/ξ⌋

Consequently, the gain in terms of storage between a tiled and a non-tiled approach can be quantified as follows:

G = S_non-tiled / S_tiled = (b_1 − b_0)/(b_1 + b_0) × (w_p × h_p)/(w × h) × N(w, h, w_p, h_p, ξ)/α(n_t) + b_0/(b_0 + b_1) × N(w, h, w_p, h_p, ξ)/α(n_t)

The previous equation can be generalized to the case where Q quality levels are available, such that the quality is gradually degraded outside the viewport:

G = S_non-tiled / S_tiled = N(w, h, w_p, h_p, ξ)/α(n_t) × [ 1/(w × h) × Σ_{q=1}^{Q} b_q × (w_q × h_q − Σ_{j=1}^{q−1} w_j × h_j) ] / Σ_{q=1}^{Q} b_q
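As a quick sanity check of the two-quality gain formula above, the sketch below plugs in purely illustrative numbers (4K equirectangular frame, 90°×90° viewport, 30° viewport step, 20% tiling overhead, 10 and 2.5 Mbps quality levels); none of these values come from the paper, and the video duration d cancels in the ratio.

```python
from math import floor

def n_viewports(w, h, wp, hp, xi):
    """N(w, h, w_p, h_p, xi): number of viewport configurations to pre-encode."""
    return floor((w - wp) / xi) * floor((h - hp) / xi)

def storage_gain(b1, b0, w, h, wp, hp, xi, alpha):
    """G = S_non-tiled / S_tiled for two quality levels b1 >= b0 (d cancels)."""
    r = (wp * hp) / (w * h)                 # viewport share of the projection
    s_tiled = (b1 + b0) * alpha             # per-second tiled storage
    s_non_tiled = (r * b1 + (1 - r) * b0) * n_viewports(w, h, wp, hp, xi)
    return s_non_tiled / s_tiled

# Illustrative values only: the non-tiled, viewport-dependent approach comes
# out roughly 6 times larger in storage than the tiled one.
print(storage_gain(b1=10.0, b0=2.5, w=3840, h=1920,
                   wp=960, hp=960, xi=320, alpha=1.2))
```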

As for the actual tiling structure, we used the same approach as Hosseini et al. [7] (see Figure 1). In total, six tiles are created: two polar tiles and four equatorial tiles. This tiling process reduces the bandwidth needed to stream the VR video by about 30%, when compared to a full-quality non-tiled approach [7]. As HEVC tiling requires all columns to have the same number of rows, we simply take the tiles belonging to the polar region and concatenate them together, so that they can be requested as a single object by the client.

3.2 Tiles Quality Selection
An important aspect of the proposed framework is the client-based heuristic in charge of deciding the quality of the tiles. While in classical HAS this decision is mainly based on network conditions, in 360° VR streaming a new dimension is added, namely the user viewport. In this work, we consider as viewport the region centered on the fixation point with a 60-degree radius, known as the mid-peripheral region. All tiles overlapping with the viewport region are considered viewport tiles. The quality of the tiles belonging to the viewport should always be maximized in order to guarantee an immersive experience to the user. The remaining tiles can be streamed at a lower quality, to save bandwidth while guaranteeing a fast transition when the viewport changes. In order to reduce the transition time and maximize the viewed quality, our heuristic can also predict the future viewport, based on the fixation point speed. This way, the video client can request in advance the tiles belonging to the predicted viewport and therefore provide a seamless transition.

As in classical HAS, a new decision about the tiles' quality is made by the client after a segment has been completely downloaded. First, the client identifies tiles belonging to the current and future viewport. In order to compute the future viewport, we obtain the position p at instant k of the current fixation point on the underlying 2D projection of the VR video, for example, latitude and longitude for an equirectangular projection. The future fixation point, which defines the future viewport, is computed as:

p(k + Δ) = p(k) + Δ × p̂(k)

p̂(k) = ( p(k) − p(k − δ) ) / δ

where Δ is the future viewport prediction horizon, p̂(k) is the speed of the fixation point and δ is the speed measurement interval. In our work, Δ is equal to the segment duration of the video. In order to guarantee a fine-grained monitoring of the viewport speed, δ is fixed to 100 ms.
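A minimal sketch of this prediction step is given below; positions are assumed to be (latitude, longitude) pairs on the equirectangular projection, and the clamping and wrapping of the extrapolated angles are our additions, not details given in the paper.

```python
def predict_fixation(p_now, p_prev, delta_s, dt_s):
    """Linear extrapolation p(k + delta) = p(k) + delta * (p(k) - p(k - dt)) / dt."""
    lat, lon = (now + delta_s * (now - prev) / dt_s
                for now, prev in zip(p_now, p_prev))
    lat = max(-90.0, min(90.0, lat))        # clamp latitude (assumption)
    lon = (lon + 180.0) % 360.0 - 180.0     # wrap longitude to [-180, 180)
    return lat, lon

# delta = segment duration (here 1 s), dt = 100 ms speed-measurement interval.
print(predict_fixation(p_now=(10.0, 45.0), p_prev=(9.0, 42.0), delta_s=1.0, dt_s=0.1))
```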

Once the future viewport is computed, the tiles are logically divided into three categories: viewport, for tiles belonging to the current and future viewport; adjacent, for tiles immediately outside the viewport tiles; and outside, for all the remaining ones. A design choice is made regarding the polar tiles (Figure 1). Particularly, a polar tile always belongs to the outside group, unless it is part of the tiles between the current and future viewport (i.e., a viewport tile). The actual quality of the tiles is selected based on the available bandwidth, as described in Algorithm 1. The algorithm takes as inputs the available perceived bandwidth B, the bit-rates of the video b(−) and the aforementioned tiles categories. First, the lowest quality is assigned to all the tiles of the video (line 1). This initial allocation guarantees that all the tiles of the video are streamed to the user. Then, the available bandwidth budget B_budget is computed

Algorithm 1 Tiles quality selection heuristic.
Require:
  B, available perceived bandwidth (in Mbps)
  b, vector containing the bit-rates (in Mbps) of the available quality levels, from 0 (lowest) to n_q (highest)
  viewport, adjacent, outside tiles groups
  n_T, the total number of video tiles
Ensure:
  q_t(−), vector of the assigned tiles quality
1: q_t(t) = 0  ∀t ∈ {viewport, adjacent, outside}
2: B_budget = B − n_T × b(0)
3: for tiles_category in {viewport, adjacent, outside} do
4:   q = max_{q ∈ [1; n_q]} q  s.t.  b(q) ≤ B_budget / n_tiles
5:   B_budget = B_budget − n_tiles × b(q)
6:   q_t(t) = q  ∀t ∈ tiles_category
7: end for

(line 2) as the difference between the available bandwidth and the total bit-rate allocated to the tiles. Next, the highest possible quality is assigned to the tiles, given the bandwidth budget (lines 3-7), starting from the viewport tiles. We select the highest quality q such that the corresponding bit-rate b(q) is lower than the ratio between the current bandwidth budget and the number of tiles n_tiles belonging to the analyzed category (line 4). We then update the bandwidth budget (line 5) and repeat the allocation for the adjacent and outside tiles, until we run out of bandwidth. This way, we mitigate the edge effect between tiles at different qualities by gradually reducing the quality as we move away from the viewport.
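A direct Python transcription of Algorithm 1 may help clarify the allocation order; the tile identifiers and bit-rates in the usage line are made up for illustration.

```python
def select_tile_qualities(bandwidth, bitrates, viewport, adjacent, outside):
    """Assign a quality index to every tile following Algorithm 1.

    bandwidth: perceived bandwidth in Mbps; bitrates[0] is the lowest quality.
    viewport/adjacent/outside: lists of tile identifiers per category.
    """
    groups = [("viewport", viewport), ("adjacent", adjacent), ("outside", outside)]
    all_tiles = [t for _, tiles in groups for t in tiles]
    quality = {t: 0 for t in all_tiles}                  # line 1: lowest quality everywhere
    budget = bandwidth - len(all_tiles) * bitrates[0]    # line 2: remaining budget

    for _, tiles in groups:                              # line 3: viewport first
        if not tiles or budget <= 0:
            continue
        # line 4: highest quality whose bit-rate fits the per-tile budget share
        affordable = [q for q in range(1, len(bitrates))
                      if bitrates[q] <= budget / len(tiles)]
        if not affordable:
            continue
        q = max(affordable)
        budget -= len(tiles) * bitrates[q]               # line 5: update the budget
        for t in tiles:                                  # line 6: assign the quality
            quality[t] = q
    return quality

# Example: 10 Mbps budget, three quality levels, 6 tiles (identifiers assumed).
print(select_tile_qualities(10.0, [0.5, 1.0, 2.0],
                            viewport=["t2", "t3"], adjacent=["t1", "t4"],
                            outside=["t0", "t5"]))
```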

3.3 HTTP/2 Server Push for Tiled Videos
Once the tiles' quality is decided, the client issues an HTTP GET request to the server. In classical tiled streaming over HTTP/1.1, n_T HTTP GET requests have to be issued in order to retrieve a single temporal segment, with n_T equal to the number of tiles. This entails that n_T RTTs are lost, which can lower the achieved throughput in mobile, high-RTT networks. We propose to solve this issue by using the server push functionality of the HTTP/2 protocol and, particularly, the k-push approach proposed by Wei et al. [18], with k set to n_T. In this case, only one request is sent from the client to the server, specifying the qualities of the video tiles, decided as reported in Section 3.2. All the tiles are consequently pushed from the server to the client using the HTTP/2 protocol. This approach eliminates the request overhead due to tiling and results in a better bandwidth utilization, even in high-RTT networks. It is worth stressing that the k-push mechanism does not require any client status to be kept on the server. A push directive, as standardized by part 6 of the MPEG-DASH standard [11], embedded in the client request specifies the tiles' qualities. We extended the urn:mpeg:dash:fdh:2016:push-next push directive in a compatible way to allow the client to specify the tiles' qualities. By parsing this directive, the server understands which tiles to push. HTTP/2 server push is by design cache compatible, and several CDNs are starting to deploy it¹.

¹ https://blogs.akamai.com/2016/04/are-you-ready-for-http2-server-push.html
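Purely as an illustration, the single request of this section might carry the tile qualities roughly as sketched below. The :method/:path pseudo-headers are standard HTTP/2; the Accept-Push-Policy header is the carrier commonly associated with DASH-FDH push directives, but the way we append the per-tile qualities to the push-next directive is only a guess at the authors' compatible extension, and the URL is hypothetical.

```python
# One quality index per tile, as decided by Algorithm 1 (values assumed).
tile_qualities = [2, 2, 1, 1, 0, 0]
push_directive = 'urn:mpeg:dash:fdh:2016:push-next; tiles="{}"'.format(
    ",".join(str(q) for q in tile_qualities))

request = {
    ":method": "GET",
    ":scheme": "https",
    ":authority": "vr.example.com",               # hypothetical server
    ":path": "/video/segment_0042.mp4",           # hypothetical segment URL
    "accept-push-policy": push_directive,         # parsed by the server, which
}                                                 # then pushes the requested tiles
for name, value in request.items():
    print(f"{name}: {value}")
```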


[Figure 2 diagram labels: Viewport monitoring (Gear VR Framework), Viewport prediction, Tiles quality selection, Network module (okhttp: HTTP/1.1, HTTP/2, HTTP/2 server push), Tiles repackaging (MP4Box), Video player (ExoPlayer); exchanged data: current viewport, predicted viewport, buffer level, available bandwidth, tiles quality, HTTP GET to the server, video tiles from the server, video segment.]

Figure 2: Illustrative diagram of the developed prototype. Gray boxes indicate algorithmic components. In italics, the names of the used libraries.

4 PERFORMANCE EVALUATION
4.1 Experimental Setup
The proposed framework has been implemented as a prototype on a Samsung Galaxy S7 and a Gear VR [12]. A high-level description of the prototype is given in Figure 2. The Gear VR Framework² allows developing VR applications on Android devices and provides general VR functionalities. The framework is mainly used in the Viewport monitoring module, as it captures where the user is watching and enables viewport awareness. The Tiles quality selection module selects the quality of the tiles and takes as input: (i) the buffer level, (ii) the available bandwidth, (iii) the current viewport and (iv) the predicted viewport, computed by the Viewport prediction module. The tiles quality is then communicated to the Network module, implemented using the okhttp³ library, which takes care of the actual streaming of the video segments. Both regular HTTP/1.1 and HTTP/2 are supported. We also extended the okhttp library to support the server push functionality of HTTP/2. Once the tiles are downloaded from the server, the Tiles repackaging module, realized using the MP4Box⁴ library, pre-processes them before they are actually played by the Video player. This step is necessary because the ExoPlayer⁵, which implements the video playout, is not able to directly play tiled videos. Particularly, the tiles are concatenated into a single mp4 file using the cat command, and the raw HEVC stream is extracted using the raw command. Future versions of the ExoPlayer would allow this step to be eliminated. Despite this process, the latency added to the system is less than 100 ms.
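A rough sketch of this repackaging step is shown below, driving MP4Box from the command line with its -cat and -raw options; the file names and track number are assumptions, and the prototype may invoke the library differently.

```python
import subprocess

# Tiles of one temporal segment, already downloaded at the selected qualities
# (file names are assumed for illustration).
tiles = ["tile_%d.mp4" % i for i in range(6)]

# Append tiles 1..5 to tile 0, building a single MP4 for the whole segment.
for tile in tiles[1:]:
    subprocess.run(["MP4Box", "-cat", tile, tiles[0]], check=True)

# Extract the raw HEVC bitstream (track 1 assumed) so the player can decode it.
subprocess.run(["MP4Box", "-raw", "1", tiles[0]], check=True)
```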

The Jetty server has been used as the HTTP server, and was extended to implement the k-push functionality [19]. In our framework, k corresponds to the number of tiles the video is composed of. The Samsung S7 is connected to the Jetty server, hosted on a MacBook Pro Retina, via a 5 GHz ad-hoc wireless network. The wireless network presents an average RTT in idle conditions of about 50 ms (±33 ms).

² https://resources.samsungdevelopers.com/Gear_VR/020_GearVR_Framework_Project
³ http://square.github.io/okhttp/
⁴ https://gpac.wp.mines-telecom.fr/mp4box/
⁵ https://developer.android.com/guide/topics/media/exoplayer.html

Table 1: VR video characteristics. The nominal average bit-rate and its standard deviation (in brackets) are reported. All values are expressed in Mbps.

            Tiled                                 Non-tiled
         1 sec        2 sec        4 sec          1 sec        2 sec        4 sec
High     9.5 (1.34)   7.1 (1.18)   5.6 (1.05)     8.9 (1.61)   6.7 (0.92)   5.2 (1.05)
Medium   4.8 (0.45)   3.2 (0.34)   2.3 (0.32)     4.3 (0.61)   2.9 (0.34)   2.1 (0.32)
Low      2.5 (0.19)   1.6 (0.14)   1.1 (0.11)     2.3 (0.26)   1.4 (0.13)   0.9 (0.10)

The 60-second Alba 360° Timelapse, available on YouTube, is used as video content. The raw 8K video was extracted from the original clip and re-encoded using the HM encoder (version 16.4), the reference software for H.265. Three quality levels have been encoded in variable bit-rate, corresponding to QP values equal to 30, 25 and 20. The video is available in 1, 2 and 4 second segment versions, both non-tiled and tiled. Using shorter segments increases the adaptability to viewport and bandwidth changes, at the cost of an encoding overhead due to more frequent Instantaneous Decoding Refresh (IDR) frames at the beginning of each segment. Table 1 reports the bit-rates of the different video versions. As expected, tiling the video introduces an overhead compared to the non-tiled version, which varies between 6% and 22%.

To provide an extensive benchmark of the proposed framework (referred to as H2P pred in the results section), we compare its performance with that obtained using a non-tiled solution (called Non-tiled in the results section). Moreover, we also tested the performance of a tiled solution over HTTP/1.1 and over HTTP/2 server push (referred to as H1 and H2P, respectively), without the viewport prediction presented in Section 3.2. This way, it is possible to clearly identify both the gains of the prediction algorithm and those brought by server push.

To assess the performance of the viewport prediction, we recorded 10 different viewport traces from real users using our prototype, and artificially injected them during the experiments. We asked 10 users to watch the full high-quality, non-tiled version of the Alba 360° Timelapse on the developed prototype, and recorded the viewport positions. The traces are divided into two groups, slow and fast, representing the cases where the movements are rare and slow or frequent and fast, respectively. Particularly, we characterize as slow the traces whose average angular speed is less than 90 deg/s, and as fast the others. The tiling structure used in this paper is composed of 2 polar tiles and 4 equatorial tiles (Figure 1), each covering 90 degrees of the 360-degree video. This entails that in the slow traces group, viewport tiles change at the same timescale as the segment duration of the video. Therefore, even non-predictive approaches should be able to adapt the tiles quality fast enough to accommodate viewport changes.
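The slow/fast split can be reproduced with a small helper like the one below; the trace format (timestamped fixation angles in degrees) is an assumption of this sketch.

```python
def average_angular_speed(trace):
    """trace: list of (time_s, angle_deg) samples of the fixation point."""
    speeds = []
    for (t0, a0), (t1, a1) in zip(trace, trace[1:]):
        diff = abs(a1 - a0) % 360.0
        diff = min(diff, 360.0 - diff)          # shortest angular distance
        speeds.append(diff / (t1 - t0))
    return sum(speeds) / len(speeds)

def classify_trace(trace, threshold_deg_s=90.0):
    """Label a recorded viewport trace as 'slow' or 'fast' (90 deg/s threshold)."""
    return "slow" if average_angular_speed(trace) < threshold_deg_s else "fast"

print(classify_trace([(0.0, 0.0), (0.1, 2.0), (0.2, 5.0)]))   # ~25 deg/s -> slow
```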

The VR client is equipped with a video buffer whose critical threshold is 2 seconds, i.e., a new segment request is issued only when the video buffer drops below 2 seconds. This choice represents a good trade-off between avoiding buffer starvations and providing a quick response to viewport changes. Each configuration in terms of segment duration, VR solution, viewport and network configuration has been repeated 10 times to guarantee statistical significance.
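The buffer-driven request policy amounts to a loop of roughly the following shape (a minimal sketch; download_segment stands in for the push-based retrieval of Section 3.3, and the playback drain model is simplified).

```python
import time

CRITICAL_BUFFER_S = 2.0      # request a new segment only below this level
SEGMENT_DURATION_S = 1.0

def playback_loop(download_segment, total_segments):
    buffer_s, next_segment = 0.0, 0
    while next_segment < total_segments:
        if buffer_s < CRITICAL_BUFFER_S:
            download_segment(next_segment)     # single GET + pushed tiles
            buffer_s += SEGMENT_DURATION_S
            next_segment += 1
        else:
            time.sleep(0.05)                   # let playback drain the buffer
            buffer_s -= 0.05

playback_loop(lambda index: None, total_segments=5)
```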


[Figure 3 plots: time spent on each quality (High, Medium, Low) for H1, H2P, H2P pred and Non-tiled, for 1, 2 and 4 second segments; panel (a) 4G, panel (b) 5 Mbps.]

Figure 3: Viewport prediction is generally able to increase the time spent on the highest quality. Tiled solutions reach similar or better performance than the non-tiled one, when the bandwidth decreases.

[Figure 4 plots: total streamed data [Mbit] versus efficiency ratio vw for H1, H2P, H2P pred and Non-tiled; panel (a) 4G, panel (b) 5 Mbps.]

Figure 4: Tiled approaches have a higher efficiency compared to the non-tiled one, as only the viewport is streamed at the highest quality. Results are reported for the 2 seconds segments video.

4.2 Impact of Tiling and Prediction
In this section, we investigate the performance of the proposed approach. We consider two different network scenarios. In the first one, we varied the available bandwidth based on traces collected on a real 4G network [16], to assess the performance of the proposed framework under realistic network conditions. The traces present an average bandwidth equal to 21.8 Mbps (±12.3 Mbps), which is often enough to stream the highest quality. In the second scenario, the bandwidth is fixed to 5 Mbps, to clearly highlight the benefits of the proposed approach, as a classical non-tiled approach will not be able to stream the highest quality (see Table 1).

Results for the slow viewport traces are presented in Figure 3. The graphs report the percentage of time spent by the viewport on the three available quality levels, for each segment duration (1, 2, 4 seconds). The time spent on the highest quality always has to be maximized, to guarantee the best immersion to the user watching the VR video. When the viewport slowly changes, differences between a tiled and a non-tiled approach are small (Figure 3a). Viewport prediction (H2P pred) can increase the quality by about 15% compared to tiled approaches without prediction (H1 and H2P). Consequently, the difference in terms of time spent on the highest quality is very small (about 10%) compared to a non-tiled solution. The gains of tiling approaches are evident when bandwidth is limited (Figure 3b), as the non-tiled approach cannot stream the highest quality. Conversely, tiled approaches can successfully stream the highest quality of the video, up to 70% of the time in the 1 second segments case (Figure 3b). Results for the 4 seconds segments are caused by the slightly increased amount of data needed to stream the video when prediction is used. In this case, both viewport and predicted tiles are requested at higher qualities to minimize quality transitions when the viewport changes. This entails that more data is needed to stream the video (about 10%) compared to tiled non-predictive approaches. Therefore, when bandwidth is limited and segments are longer, the client tends to request the second highest quality instead of the highest one. Another important metric to consider is the amount of data needed to transfer the video. We therefore introduce an efficiency metric vw, which represents the ratio between the amount of data used to stream the viewport at the highest quality and the total amount of streamed data, computed as follows:

vw = ( Σ_{s=1}^{n_S} Σ_{t ∈ viewport} b̄_{st} ) / ( Σ_{s=1}^{n_S} Σ_{t=1}^{n_T} b_{st} )

where n_S and n_T are the number of segments and tiles the video is composed of, b_{st} is the bit-rate of tile t belonging to segment s, and b̄_{st} is the bit-rate of the highest quality if tile t is streamed at the highest quality, or zero otherwise. This metric quantifies how much bandwidth is wasted to stream tiles that are either at lower qualities or outside the viewport. Figure 4 shows the values of vw for the 2 seconds segments video. The x-axis reports the vw ratio, while the y-axis reports the total amount of streamed data, to have an absolute scale to compare the different solutions. Tiled approaches reach a better efficiency when compared to the non-tiled one. Less data is needed, as only the viewport is streamed at the highest quality. Particularly, our solution is able to increase efficiency from 25% to almost 40% (Figure 4a). Despite the overhead introduced by tiling the video (see Table 1), our tiled solution uses 35% less data to stream the video than a non-tiled one. This means that our solution can better redistribute the data needed to stream the video, by giving more importance to the portion of the video actually watched by the user. When the bandwidth is fixed to 5 Mbps, the efficiency of the non-tiled solution drops to zero, as the highest quality cannot be streamed.
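The vw metric can be computed directly from the per-segment tile decisions, as in the sketch below; the data layout and the example bit-rates are assumptions for illustration.

```python
def viewport_efficiency(segments, highest_quality):
    """vw: share of all streamed bits spent on viewport tiles at the highest quality.

    segments: list of dicts with a 'viewport' set of tile ids and a 'tiles'
    mapping tile id -> (quality_index, streamed_bitrate_mbps).
    """
    useful, total = 0.0, 0.0
    for segment in segments:
        for tile, (quality, bitrate) in segment["tiles"].items():
            total += bitrate
            if tile in segment["viewport"] and quality == highest_quality:
                useful += bitrate
    return useful / total

example = [{"viewport": {"t2", "t3"},
            "tiles": {"t0": (0, 1.1), "t1": (1, 2.3), "t2": (2, 5.6),
                      "t3": (2, 5.6), "t4": (1, 2.3), "t5": (0, 1.1)}}]
print(viewport_efficiency(example, highest_quality=2))   # ~0.62
```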

A similar analysis can be repeated for the fast viewport traces (Figure 5). In this case, tiling the video reduces the amount of time spent on the highest quality. If the user viewport is moving fast, the optimal choice to provide the best immersion would be to download the entire video at the highest quality. In the 4G network configuration, the non-predictive tiling approaches can provide the highest quality only 25% of the time, compared to 90% for the non-tiled solution (Figure 5a). Our predictive approach can consistently reduce this difference to about 30%. These results clearly show the importance of viewport prediction when the VR video is tiled. Despite these results, the non-tiled approach cannot provide the best QoE when the bandwidth is limited (Figure 5b).


[Figure 5 plots: time spent on each quality (High, Medium, Low) for H1, H2P, H2P pred and Non-tiled, for 1, 2 and 4 second segments; panel (a) 4G, panel (b) 5 Mbps.]

Figure 5: When the viewport is highly dynamic, a tiled approach cannot reach the same quality as the non-tiled one. Despite that, when the bandwidth drops, tiling outperforms a traditional approach.

As for the slow viewport traces, streaming only the tiles watched by the user at the highest quality consistently reduces the bandwidth required to stream the VR video. The efficiency metric vw also drops when the viewport is highly dynamic, as it becomes more difficult to present the right portion of the video at the highest quality. Compared to the slow viewport case, the efficiency for tiled solutions drops by about 25%. Using prediction limits this drop to only 15%. Nonetheless, also in the fast viewport scenario, tiled approaches require far less data (up to 35% less) than the non-tiled solution. Moreover, tiled approaches can still provide a better efficiency than the non-tiled one in the 5 Mbps scenario, as in this case the latter is never able to stream the highest quality.

4.3 Impact of HTTP/2 Server Push
The aim of this section is to highlight the advantages of HTTP/2 server push compared to traditional HTTP/1.1. As explained in Section 3.3, the proposed approach only needs to send a single GET request specifying the tiles' qualities, which are then pushed automatically by the server. This solution is particularly beneficial when the total time to retrieve each tile individually, as with standard HTTP/1.1, is comparable to the segment duration. This problem can arise, for example, when the network RTT is high. We analyze this scenario in the remainder of this section.

To better isolate the impact of the RTT alone, we increase the RTT of the wireless network connecting the server and the client to 100 ms and fix the bandwidth to 35 Mbps. Moreover, the slow viewport traces are used. Results of these experiments are presented in Figure 6. A high RTT has a negative influence on the perceived bandwidth of the HTTP/1.1 tiled approach (Figure 6a), as an RTT is lost to retrieve each single tile. This behavior has a direct impact on the total freeze time, which is extremely high in the HTTP/1.1 case (Figure 6a), for all video configurations. Pushing the tiles is extremely beneficial in this scenario. Particularly, we can still obtain similar performance to a non-tiled approach (Figure 6a), while keeping all the advantages described in Section 4.2.

[Figure 6 plots: (a) perceived bandwidth [Mbps] and freeze time [s] versus segment duration [s] for H1, H2P/H2P pred and Non-tiled; (b) time spent on each quality (High, Medium, Low) for 1, 2 and 4 second segments.]

Figure 6: HTTP/1.1 cannot provide good performance when the RTT is high, as each tile has to be retrieved independently. HTTP/2 push can completely eliminate this problem.

The perceived bandwidth increases by more than 3 times compared to HTTP/1.1, which allows video freezes to be consistently reduced. From a network point of view, retrieving a single non-tiled video segment is the same as pushing the tiles using HTTP/2, as only one request has to be sent in both cases. As expected, the perceived bandwidth increases with the segment duration, as the impact of the RTT on the total download time diminishes. These results have a direct impact on the viewport quality (Figure 6b). Due to the low perceived bandwidth, in HTTP/1.1 most of the time is spent on the lowest quality. HTTP/2 solutions are instead able to provide a good viewport quality. The quality drop in the non-tiled case is mainly due to the varying bandwidth caused by a high RTT. As stated in Section 4.1, the wireless network presents an average RTT of about 50 ms (±33 ms). Increasing the RTT to 100 ms causes the effective available bandwidth to fluctuate. As explained in Section 4.2, a non-tiled approach needs a higher bandwidth to stream the same quality as tiled approaches, and is therefore more susceptible to varying bandwidth conditions.

Another situation where retrieving each tile individually instead of pushing them can have a negative impact on the overall streaming performance is when the number of tiles increases. In all the previous experiments, we used the 6 tiles configuration shown in Figure 1, which is composed of 2 polar tiles and 4 equatorial tiles. We re-encoded the content in order to have 14 tiles in total, 12 equatorial and 2 polar. Increasing the number of tiles provides a better granularity in terms of viewport quality adaptation, as it is possible to better match the portion of the video actually watched by the user with the video tiles, but has a negative impact in terms of network overhead. Figure 7 reports the results of this experiment, for a network bandwidth fixed to 5 Mbps and the 1 and 2 seconds segments videos. As in the previous set of experiments, the slow viewport traces have been used. In the HTTP/1.1 case, increasing the number of tiles to 14 causes the bandwidth to drop by about 13% compared to server push, due to the increased idle time between subsequent HTTP GET requests. Despite this drop, the bandwidth is sufficiently high to provide both a good overall quality and few freezes in the 2 seconds segments video configuration (Figures 7a and 7b).


[Figure 7 plots: (a) time spent on the highest quality [%] and (b) freeze time [s] for H1, H2P and H2P pred with 6 and 14 tiles, for 1 and 2 second segments.]

Figure 7: In the 14 tiles video, HTTP/1.1 results in high freeze time when the idle time due to the increased number of GET requests approaches the segment duration (i.e., the 1 second segments video).

In the 1 second segment version instead, the decreased bandwidth and the increased idle time introduced by the subsequent GET requests cause the HTTP/1.1 client to freeze (Figure 7b). HTTP/2 push solutions are much less affected by the increased number of tiles, which is actually beneficial in terms of total freeze time, compared to the 6 tiles video. This behavior can be explained by looking at the total amount of streamed data, which decreases by about 15% and 5% on average in the 2 seconds and 1 second segments videos, respectively. In the 14 tiles scenario, it is possible to better match the user viewport with the available tiles and stream a smaller portion of the video at the highest quality, compared to the 6 tiles video. This aspect also entails a disadvantage in terms of time spent on the highest quality (Figure 7a), since it is more likely for the user to watch parts of the video at lower qualities in case of viewport changes. Predicting the user viewport becomes even more important in this scenario, as it limits the drop from 15%-20% in the non-predictive cases to only 5%-10%.

5 CONCLUSIONS
In this paper, we presented a novel framework for the efficient streaming of VR videos over the Internet, which aims to reduce the high bandwidth requirements and storage costs of current VR streaming solutions. In our framework the video is spatially divided into tiles using H.265, and only tiles belonging to the user viewport are streamed at the highest quality. A viewport prediction algorithm has been proposed to anticipate the user's movements and download in advance the part of the video that is likely going to be watched in the future. To reduce the influence of the network RTT on tiled streaming, our framework uses the server push functionality of the HTTP/2 protocol. In the evaluated streaming scenarios and in the presence of slow viewport movements, our framework is able to obtain similar quality to a non-tiled solution, while using up to 35% less data to stream the video. The gains brought by the proposed approach represent an important step toward the efficient streaming of VR videos with consistent quality.

Future work will focus on a more extensive comparison of the proposed approach with existing VR tiling solutions. Moreover, we will investigate the applicability of video quality metrics, similarly to the approach used by Alface et al. [14], to quantify the impact of tiling on user perception. To this end, a subjective user study will be carried out to further analyze the trade-off between bandwidth savings and user experience in tiled VR streaming.

REFERENCES
[1] M. Budagavi, J. Furton, G. Jin, A. Saxena, J. Wilkinson, and A. Dickerson. 2015. 360 degrees video coding using region adaptive smoothing. In 2015 IEEE International Conference on Image Processing (ICIP). 750–754. https://doi.org/10.1109/ICIP.2015.7350899
[2] Wael Cherif, Youenn Fablet, Eric Nassor, Jonathan Taquet, and Yuki Fujimori. 2015. DASH Fast Start Using HTTP/2. In Proceedings of the 25th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV '15). ACM, New York, NY, USA, 25–30. https://doi.org/10.1145/2736084.2736088
[3] Eduardo Cuervo and David Chu. 2016. Poster: Mobile Virtual Reality for Head-mounted Displays With Interactive Streaming Video and Likelihood-based Foveation. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services Companion (MobiSys '16 Companion). ACM, New York, NY, USA, 130. https://doi.org/10.1145/2938559.2938608
[4] Lucia D'Acunto, Jorrit van den Berg, Emmanuel Thomas, and Omar Niamut. 2016. Using MPEG DASH SRD for Zoomable and Navigable Video. In Proceedings of the 7th International Conference on Multimedia Systems (MMSys '16). ACM, New York, NY, USA, Article 34, 4 pages. https://doi.org/10.1145/2910017.2910634
[5] Facebook. [n. d.]. Next-generation video encoding techniques for 360 video and VR. https://code.facebook.com/posts/1126354007399553/next-generation-video-encoding-techniques-for-360-video-and-vr/
[6] V. R. Gaddam, M. Riegler, R. Eg, C. Griwodz, and P. Halvorsen. 2016. Tiling in Interactive Panoramic Video: Approaches and Evaluation. IEEE Transactions on Multimedia 18, 9 (Sept 2016), 1819–1831. https://doi.org/10.1109/TMM.2016.2586304
[7] Mohammad Hosseini and Viswanathan Swaminathan. 2016. Adaptive 360 VR Video Streaming: Divide and Conquer!. In Proceedings of the IEEE International Symposium on Multimedia (IEEE ISM 2016). 6 pages.
[8] Jean Le Feuvre and Cyril Concolato. 2016. Tiled-based Adaptive Streaming Using MPEG-DASH. In Proceedings of the 7th International Conference on Multimedia Systems (MMSys '16). ACM, New York, NY, USA, Article 41, 3 pages. https://doi.org/10.1145/2910017.2910641
[9] S. Y. Lim, J. M. Seok, J. Seo, and T. G. Kim. 2015. Tiled panoramic video transmission system based on MPEG-DASH. In 2015 International Conference on Information and Communication Technology Convergence (ICTC). 719–721. https://doi.org/10.1109/ICTC.2015.7354646
[10] K. Misra, A. Segall, M. Horowitz, S. Xu, A. Fuldseth, and M. Zhou. 2013. An Overview of Tiles in HEVC. IEEE Journal of Selected Topics in Signal Processing 7, 6 (Dec 2013), 969–977. https://doi.org/10.1109/JSTSP.2013.2271451
[11] MPEG-DASH. [n. d.]. Dynamic adaptive streaming over HTTP (DASH) – Part 6: DASH with server push and websockets. https://www.iso.org/standard/71072.html
[12] Stefano Petrangeli, Viswanathan Swaminathan, Mohammad Hosseini, and Filip De Turck. 2017. Improving Virtual Reality Streaming Using HTTP/2. In Proceedings of the 8th ACM on Multimedia Systems Conference (MMSys '17). ACM, New York, NY, USA, 225–228. https://doi.org/10.1145/3083187.3083224
[13] Feng Qian, Lusheng Ji, Bo Han, and Vijay Gopalakrishnan. 2016. Optimizing 360 Video Delivery over Cellular Networks. In Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges (ATC '16). ACM, New York, NY, USA, 1–6. https://doi.org/10.1145/2980055.2980056
[14] Patrice Rondao Alface, Jean-Francois Macq, and Nico Verzijp. 2012. Interactive Omnidirectional Video Delivery: A Bandwidth-Effective Approach. Bell Labs Technical Journal 16, 4 (2012), 135–147. https://doi.org/10.1002/bltj.20538
[15] A. TaghaviNasrabadi, A. Mahzari, J. D. Beshay, and R. Prakash. 2017. Adaptive 360-degree video streaming using layered video coding. In 2017 IEEE Virtual Reality (VR). 347–348.
[16] J. van der Hooft, S. Petrangeli, T. Wauters, R. Huysegems, P. R. Alface, T. Bostoen, and F. De Turck. 2016. HTTP/2-Based Adaptive Streaming of HEVC Video Over 4G/LTE Networks. IEEE Communications Letters 20, 11 (Nov 2016), 2177–2180. https://doi.org/10.1109/LCOMM.2016.2601087
[17] Hui Wang, Mun Choon Chan, and Wei Tsang Ooi. 2015. Wireless Multicast for Zoomable Video Streaming. ACM Trans. Multimedia Comput. Commun. Appl. 12, 1, Article 5 (Aug. 2015), 23 pages. https://doi.org/10.1145/2801123
[18] Sheng Wei and Viswanathan Swaminathan. 2014. Low Latency Live Video Streaming over HTTP 2.0. In Proceedings of Network and Operating System Support on Digital Audio and Video Workshop (NOSSDAV '14). ACM, New York, NY, USA, Article 37, 6 pages. https://doi.org/10.1145/2578260.2578277
[19] Mengbai Xiao, Viswanathan Swaminathan, Sheng Wei, and Songqing Chen. 2016. DASH2M: Exploring HTTP/2 for Internet Streaming to Mobile Devices. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, New York, NY, USA, 22–31. https://doi.org/10.1145/2964284.2964313
[20] Alireza Zare, Alireza Aminlou, Miska M. Hannuksela, and Moncef Gabbouj. 2016. HEVC-compliant Tile-based Streaming of Panoramic Video for Virtual Reality Applications. In Proceedings of the 2016 ACM on Multimedia Conference (MM '16). ACM, New York, NY, USA, 601–605. https://doi.org/10.1145/2964284.2967292

