
Markov Decision Model for Perceptually Optimized Video Scheduling

Chao Chen, Student Member, IEEE, Robert W. Heath Jr., Fellow, IEEE, Alan C. Bovik, Fellow, IEEE, and Gustavo de Veciana, Fellow, IEEE

Abstract

Transmitting video over slow fading wireless channels with good perceptual quality is a challenging

task because no time-diversity can be exploited to combat channel variations, especially when the frequency

diversity and spatial diversity are not available due to the wireless system implementation. While quality-scalable

video coding techniques make video source-rate adaptation possible, determining a good scheduling strategy

which selectively schedules video data associated with different layers is a challenging problem. For the best

performance of a wireless video system, the scheduler needs to consider the channel state, the buffer state and

the perceptual video quality at the receiver. In this paper, we propose a scheduling algorithm to optimize the

perceptual quality of scalably coded videos transmitted over slow fading channels. By modeling the dynamics

of the channel as a Markov chain, we reduce the problem of dynamic video scheduling to a tractable Markov

decision problem over a finite state space. We then employ an infinite-horizon average-reward maximization

algorithm to maximize the time-average Multi-Scale Structural SIMilarity (MS-SSIM) index which has been

shown to correlate highly with human judgments of video quality. Simulation results show that the proposed

MDP-based scheduling policy achieves significant perceptual quality improvement over scheduling methods

which do not explicitly exploit the channel dynamics. Furthermore, we propose an on-line scheduling method

which not only performs nearly as well as the MDP-based policy but also has very low implementation

complexity.

Index Terms

Videos, Scheduling algorithm, Wireless communication, Image quality.

The authors are with the Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station

C0803, Austin TX - 78712-0240, USA e-mail: [email protected] This research was supported in part by Intel Inc. and Cisco Corp.

under the VAWN program.


I. INTRODUCTION

Video transmission over wireless channels is a challenging task. The throughput of wireless channels

varies over time, making the delivery of real-time video challenging due to tight delay constraints.

In particular, if the coherence time of the channel is comparable to the delay constraint, then the

time-diversity of the channel cannot be exploited. Traditional channel coding methods cannot provide

graceful visual quality degradation of the received video signal when deep fading happens. Hence, adap-

tive transmission techniques such as multi-layer scheduling and link-adaptation should be employed.

Furthermore, video packets are structured. Due to the nature of predictive video coding algorithms, a

video frame can be decoded only when its predictors are received at the receiver. Hence, the prediction

structure of the video codec enforces a partial order on the transmissions of the video packets.

Scalable video coding (SVC) is one approach for allowing flexible video transmission over channels

with varying throughput [1], [2]. An SVC video encoder produces a layered video stream that contains

a base layer and several enhancement layers. If the throughput is low, the transmitter can choose to

transmit the base layer only, which provides a moderate, but acceptable, degree of visual quality at the

receiver. If the channel conditions improve, the transmitter can transmit one, or more, enhancement

layers to further improve the visual quality. Conceptually, SVC provides a means to adapt the data rate

for wireless video transmission. The wireless transmitter can easily adapt the data rate by selectively

scheduling video data associated with various layers rather than re-encoding the video sequence into

a suitable rate.

Designing scalable video scheduling algorithms for wireless channels is a complex task. The schedul-

ing policy depends not only on the channel condition but also on the receiver buffer state. For example,

if the receiver has successfully buffered base layer data over many frames, the scheduler could choose

to transmit some enhancement layer data to improve the video quality even if the throughput is low.

The scheduling policy also depends on the impact that a particular video data packet will have on the

perceptual video quality. The scheduler should assign higher priority to packets which could result in

higher perceptual quality improvements. An effective perceptual quality metric and an accurate rate-

quality model are important for scheduling policy design. The objective of this paper is to develop

scheduling algorithms to maximize the receiver perceptual video quality for scalable video transmission

over wireless channels.

A. Contributions

In this paper, we assume that video sequences are encoded by an H.264/SVC-compatible scalable

video encoder. We employ a finite state Markov chain (FSMC) to model the dynamics of the slow


fading channel. We also employ a rate-quality model to capture the relationship between the size

of a video data packet and its contribution to the receiver perceptual quality. We model the dynamic

video transmission system as a controlled Markov system. The visual quality measured in time-average

MS-SSIM index is maximized by optimizing the scheduling policy via value iteration (see e.g. [3]).

The specific contributions that we make are as follows:

1) A tractable MDP-based formulation is proposed to design the optimal scheduling policy. Typical

mobile users usually have available application layer storage space of several Gigabytes. Thus,

the buffer size can be regarded as infinite. Because the performance of scheduling policies

depends on the buffer state, the policies need to be optimized over an infinitely large state space.

By making some reasonable approximations, we fix the scheduling policy except on a finite set of states.

We prove that optimizing the transmission policy for this finite-state set is equivalent to solving

a semi-Markov decision problem. Based on this result, a value iteration algorithm is used to

optimize the scheduling policy.

2) Accurate prediction of visual quality is used as the optimization objective. In most video trans-

mission methods, Peak Signal to Noise Ratio (PSNR) is used as the optimization objective. It

is well known that PSNR does not accurately predict the perceptual quality of videos in many

instances [4] [5] [6]. In this paper, we employ an infinite-horizon average-reward maximization

formulation which is directly related to the time-average MS-SSIM index [7]. As shown in [8],

the time-average MS-SSIM index correlates quite well with human judgments of visual quality.

In this paper, the system state is mapped to MS-SSIM index using a simple rate-quality model.

Then, the MS-SSIM value of each state is used as the state-value in the value iteration algorithm.

3) A simple and near-optimal scheduling method is proposed. Computing the scheduling policy

based on the MDP formulation requires extra system resources. We devise a simple and near-

optimal scheduling algorithm which, similar to the MDP-based policy, proactively transmits data

for later GOPs and dynamically schedules data associated with different layers.
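To make the value iteration step mentioned above concrete, the following is a minimal sketch of textbook relative value iteration for an average-reward MDP on a small, fully enumerated state space. It is not the semi-Markov formulation developed in this paper: the function name `relative_value_iteration`, the toy transition matrices `P_toy` and the rewards `r_toy` are hypothetical and chosen only for illustration.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=10_000):
    """Average-reward (relative) value iteration for a finite MDP.

    P[a] is the |S| x |S| transition matrix under action a and
    r[a] is the length-|S| reward vector under action a.
    Returns (gain estimate, relative values, greedy policy).
    """
    P = np.asarray(P, dtype=float)    # shape (A, S, S)
    r = np.asarray(r, dtype=float)    # shape (A, S)
    h = np.zeros(P.shape[1])          # relative value of each state
    gain = 0.0
    for _ in range(max_iter):
        q = r + P @ h                 # one-step lookahead, shape (A, S)
        h_new = q.max(axis=0)
        gain = h_new[0]               # state 0 serves as the reference state
        h_new = h_new - gain          # keep the values bounded
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = (r + P @ h).argmax(axis=0)
    return gain, h, policy

# Hypothetical two-state, two-action example: a reward of 1 is earned in state 1.
P_toy = [[[0.9, 0.1], [0.1, 0.9]],    # action 0: mostly stay
         [[0.1, 0.9], [0.9, 0.1]]]    # action 1: mostly switch
r_toy = [[0.0, 1.0], [0.0, 1.0]]
gain_toy, h_toy, policy_toy = relative_value_iteration(P_toy, r_toy)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1, and the returned gain is the long-run average reward of that policy. In the setting of this paper, the reward would be the MS-SSIM value associated with each system state.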

B. Related Work

MDP-based stochastic control techniques have been proposed for video data scheduling [9]–[14].

In [9], adaptive video transmission over a packet erasure channel was studied by modeling the buffer

state as a controlled Markov chain. Later, in [10], an MDP-based scheduling algorithm was proposed

for video transmission over packet loss networks. This work was further extended for wireless video

streaming in [11]. The wireless channel was modeled as a binary symmetric channel. This channel

model is justified for fast fading channels where the coherence time is much less than the delay

4

constraint. In that case, interleaving can be applied without violating the delay constraint and the

channel appears like an i.i.d. channel. For slow fading channels, the bit error rate cannot be modeled

as a constant. In [12] [13], a reinforcement learning framework was proposed for wireless video

transmission. Their algorithm was based on MDP with discounted-reward maximization formulation.

The transmitter learns the characteristics of the channel and the video sequence during the transmission

process. The scheduling policy is updated according to the learned characteristics. In our previous

work [14], an infinite-horizon average-reward maximization formulation was proposed. A very simple

rate-quality model was employed to differentiate the importance of different layers. The difference

between frames, however, was not incorporated in the rate-quality model. In addition, PSNR, instead

of time-average MS-SSIM index, was used as the optimization objective.

MDP-based methods have been used to solve other problems in adaptive video transmission. In

[15], an MDP formulation was proposed for adaptive video playout and scheduling for single layer

videos. A two-state channel model was used to represent “good” and “bad” channel conditions. The

controller adapts the playout speed according to the receiver buffer state and channel state in order to

optimize the PSNR of the signal at the receiver. In [16], an MDP-based formulation was introduced

for the problem of real-time encoder rate-control. The derived optimal rate-control policy adapts the

encoding bit rate according to the channel condition and video rate-quality characteristics. In [17],

multi-user scheduling and rate adaptation in wireless local-area networks (WLAN) were studied. The

users in a WLAN access the available resources in a decentralized manner. The multi-user scheduling

problem was formulated as a competitive Markov decision process [18] and a Nash-equilibrium policy

was computed via a value iteration algorithm.

Among all the mentioned work, the studies most closely related to ours are [10] and [12], which focus

on single user scalable video transmission. The differences from our work are summarized as follows:

• The video scheduling algorithm proposed in [10] was developed for packet loss networks. In

a packet loss network, the throughput of each transmission slot is limited by congestion rather

than by deep signal fading in the transmission medium. Hence, in [10], the channel was assumed

to be time-invariant. Each packet was assumed to be lost or delayed independently of the other

packets. Based on these assumptions, the scheduling policies for different video packets can be

factorized. Thus the policy optimization problem was greatly simplified. For wireless channels,

the channel state is time varying and thus the packet losses are not independent across coherence

periods, hence scheduling policy factorization is not possible.

• In [12], Zhang et al. studied single user video transmission over wireless channels. The control


policy was learned on-line through reinforcement learning. The major drawback of reinforcement

learning is that it takes time to learn from the wrong scheduling actions. The scheduler may cause

bad visual quality during the learning period. As shown in [12] and [19], with an accelerated

on-line learning algorithm, the scheduling policy converges after 25 frames are transmitted. In

our work, the channel dynamics are assumed to be known. Indeed, the finite state Markov model

can be obtained analytically [20] or by measurement [21]. The channel model is combined with

a simple rate-quality model to form a model-based MDP formulation. The scheduling policy thus

can be derived off-line.

• In [12] [16] [19], a discounted-reward maximization formulation is employed to trade off the

visual quality of recent and later frames. The discounting factor needs to be chosen heuristically

and affects the performance of the derived scheduling policy. In our work, an average-reward

maximization formulation is proposed. This formulation is naturally related to the time-average

MS-SSIM index which correlates well with human subjective judgments of visual quality [8].

C. Notation Used

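$\mathbf{A}$ and $\mathbf{a}$ are examples of a matrix and a vector, respectively. $\mathcal{A}$ is a set. $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$. $\mathbf{1}$ is the all-ones vector and $\mathbf{0}$ is the zero vector. $\max\{\mathbf{a},\mathbf{b}\}$ and $\min\{\mathbf{a},\mathbf{b}\}$ are the componentwise maximum and minimum of vectors $\mathbf{a}$ and $\mathbf{b}$, respectively. $\mathbb{1}(\cdot)$ is the indicator function. $\lceil\cdot\rceil$ is the ceiling function. $\mathbb{P}(\cdot)$ is the probability measure and $\mathbb{E}[\cdot]$ is the expectation.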

II. SYSTEM MODEL

In this section, we first describe the wireless video system to be considered. Then, we present our

video codec configurations and introduce the rate-quality model based on MS-SSIM index. Finally,

we present the Markov channel model to be used in the sequel.

A. System Overview

We consider a time-slotted scalable video transmission system over slow fading wireless channels.

A video sequence is encoded with a quality-scalable video encoder and stored in a video server. The

video server transmits video data to a mobile user via a wireless transmitter. Each slot, the server

sends some video data upon the requests of a scheduler equipped on the wireless transmitter. These

data are packetized at the wireless transmitter for physical layer transmission. The scheduler operates

according to a scheduling policy which maps the channel and buffer state to the scheduling action (see

Fig. 1).


We assume that the channel between the video server and the wireless transmitter is not the

bottleneck of the link. Thus, from the perspective of the wireless transmitter, the whole video sequence

is accessible. We also assume that the physical layer channel state information is available at the

transmitter and that the modulation and coding scheme (MCS) is determined by a given physical layer

link-adaptation policy.

B. Video Codec Configuration

We assume that video sequences are encoded by an H.264/SVC-compatible scalable video encoder.

The duration of each frame ΔT is called a frame slot. The video frames are uniformly partitioned into

Groups of Pictures (GOPs). Every GOP has LGOP frames. The first frame in a GOP is an I frame

while the other frames are P frames. Every frame is encoded into L layers. The first layer is the base

layer; the other layers are enhancement layers. Every enhancement layer of a frame is predictively

encoded using the lower layers of the frame. The base layer of a P frame is predictively encoded

using the base layer of its preceding frame. The base layer of an I frame is encoded independently

(see Fig. 2).
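The prediction structure described above can be sketched as a small dependency check. This is only an illustration under stated assumptions: frames are numbered 0, 1, 2, ... in display order (rather than relative to the current frame), every frame whose index is a multiple of the GOP length is an I frame, layers are numbered from 1 (the base layer), and the helper names `predictors` and `is_decodable` are hypothetical.

```python
def predictors(frame, layer, gop_len):
    """Return the data units that (frame, layer) is directly predicted from."""
    if layer > 1:
        return [(frame, layer - 1)]   # enhancement layer: next lower layer of the same frame
    if frame % gop_len == 0:
        return []                     # base layer of an I frame: encoded independently
    return [(frame - 1, 1)]           # base layer of a P frame: base layer of the preceding frame

def is_decodable(unit, received, gop_len):
    """True if `unit` and all of its direct and indirect predictors are in `received`."""
    stack = [unit]
    while stack:
        frame, layer = stack.pop()
        if (frame, layer) not in received:
            return False
        stack.extend(predictors(frame, layer, gop_len))
    return True

# Hypothetical receiver buffer: the unit (2, 2) arrived without its predictor (2, 1).
received_units = {(0, 1), (0, 2), (1, 1), (2, 2)}
```

With this buffer, `is_decodable((1, 1), received_units, 4)` holds because the base layers of frames 0 and 1 are present, whereas `(2, 2)` is not decodable because the base layer of frame 2 is missing.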

Each frame has a playout deadline at the receiver. In the following, frames whose deadlines have

expired are called expired frames. The other frames are called unexpired frames. The first unexpired

frame is called the “current frame”. At any time, the frames are indexed relative to the current frame

as shown in Fig. 2. The video data in the $\ell$th layer of the $f$th frame is called the $(f, \ell)$th video data

unit.

We adopt the prediction structure in Fig. 2 rather than the “Hierarchical B” structure because

no structural delay is introduced [1]. Specifically, in the “Hierarchical B” prediction structure, the

encoding order differs from the display order, thus, the transmission of a frame must be delayed until

all necessary predictors are received. Also, due to the time-varying nature of the wireless channels,

the adaptive transmitter must drop some enhancement layers when channel throughput is low. So,

if the enhancement layers are used to predict other frames as is the case in the “Hierarchical B”

structure, dropped enhancement layers can give rise to error propagation and unpredictable visual

quality degradation. At the possible cost of lower compression efficiency, the prediction structure that

we use eliminates error propagation arising from enhancement layer losses, since there will be

no inter-frame prediction among enhancement layers.


C. Rate-Quality Model

The rate-quality model characterizes the relationship between the size of a video data unit and

the visual quality improvement when it is correctly received. We adopt a simple model. For each

P frame, let $r^{P}_{\ell}$ be the amount of data in the $\ell$th layer data unit. For each I frame, let $r^{I}_{\ell}$ be the

amount of data in the $\ell$th layer data unit. Define $q_{\ell}$ to be the visual quality increment after the $\ell$th

layer is correctly received, given all its predictors are also received. This model implies that the visual

quality improvement incurred by the enhancement layers in one frame does not depend on whether

the enhancement layers of other frames are received. This is true for the prediction structure given in

Section II-B since there is no error propagation due to losses of enhancement layers.

Conventional image quality measures such as the PSNR reflect absolute signal fidelity but without

accounting for perceptual visual quality. Recently, a variety of models that accurately predict perceptual

video quality have been proposed [7], [22]–[25]. In our formulation, we adopt MS-SSIM index as the

visual quality measure [7], since it has been shown to correlate quite well with perceptual visual

quality and it is of reasonable computational complexity [8].

The MS-SSIM index of a video sequence ranges from 0 to 1. The larger the index, the better

the quality. In our rate-quality model, the quality increment $q_{\ell}$ is measured using the MS-SSIM index.

Therefore, $q_{\ell} \in [0, 1]$. Larger values of $q_{\ell}$ mean that a larger quality improvement can be achieved by transmitting the $\ell$th layer data units.
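As a minimal sketch of how this additive rate-quality model can be evaluated, the snippet below sums the per-layer increments of the decodable layers of each frame and averages over frames. The increment values in `q_example` are hypothetical placeholders, not measurements from this paper, and the function names are illustrative only.

```python
def frame_quality(num_layers_decoded, q):
    """Quality of one frame: sum of the increments of its decodable layers (layers 1..k)."""
    return sum(q[:num_layers_decoded])

def time_average_quality(layers_per_frame, q):
    """Time-average quality over a sequence of frames."""
    return sum(frame_quality(k, q) for k in layers_per_frame) / len(layers_per_frame)

# Hypothetical increments for a base layer and two enhancement layers.
q_example = [0.90, 0.05, 0.03]
# Hypothetical playout record: number of decodable layers in four consecutive frames.
avg_example = time_average_quality([3, 1, 0, 2], q_example)
```

A frame with no decodable layer contributes zero in this sketch, so an interruption lowers the time average much more than a dropped enhancement layer does.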

In a real video sequence, rate-quality characteristics vary from frame to frame. For simplicity, we

use the average value of the measured rate-quality characteristics as estimates of $r^{I}_{\ell}$, $r^{P}_{\ell}$ and $q_{\ell}$. In Fig.

3, the data rates and MS-SSIM values of two video test sequences, “Foreman” and “Paris”, are shown.

These two sequences are widely used in visual quality assessment. “Foreman” has higher temporal

complexity and “Paris” has higher spatial complexity [26]. As shown in Fig. 3, our proposed model

is a good fit for the rate-quality characteristics.

D. Channel Model

We focus on scheduling for a slow fading channel. By slow fading, we mean that the coherence time

of the channel is less than the duration of a GOP and larger than the duration of a frame. Assuming

the mobile users are moving at a walking speed of 1.5 m/s and the carrier frequency is 2 GHz, the Doppler

spread is about 10 Hz and the coherence time is about 100 ms. A typical GOP duration is about 1 second

and a frame slot is about 30 ms. Hence, for pedestrian video users, wireless channels are slow fading.
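The arithmetic behind these figures can be reproduced with a few lines. The sketch assumes the maximum Doppler shift $f_D = v f_c / c$ and the common rule of thumb $T_c \approx 1/f_D$ for the coherence time; other conventions for $T_c$ differ by a constant factor. The function names are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def doppler_spread_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT

def coherence_time_s(doppler_hz):
    """Rule-of-thumb coherence time T_c ~ 1 / f_D (an assumption, not the only convention)."""
    return 1.0 / doppler_hz

f_d = doppler_spread_hz(1.5, 2e9)   # pedestrian speed, 2 GHz carrier
t_c = coherence_time_s(f_d)
frame_slot_s = 0.030                # about 30 ms, as stated in the text
gop_duration_s = 1.0                # about 1 second, as stated in the text
is_slow_fading = frame_slot_s < t_c < gop_duration_s
```

Under these assumptions the coherence time falls between the frame slot and the GOP duration, which is the slow fading regime defined above.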

As the channel state is stable during each frame slot, the scheduling decision is made on a frame-by-frame

basis. At the beginning of each frame slot, a frame is played out. Then, the wireless transmitter


schedules video data units for transmission according to a scheduling action. The scheduling action is

defined as an ordered collection of video data units

$$u = \{(f_1, \ell_1), (f_2, \ell_2), \dots, (f_{|u|}, \ell_{|u|})\}.$$

When a scheduling action $u$ is taken, the data units contained in $u$ are transmitted sequentially. At the

physical layer, each scheduled data unit is packetized into physical layer packets, and each packet is

retransmitted until it is acknowledged, i.e., whenever errors occur. The MCS used in transmitting data

packets is determined by a link-adaptation policy. In this paper, we focus on scheduling policy design

and assume that the link-adaptation policy is given.

In [20] and [27], it is shown that the first-order FSMC can be utilized to accurately describe the

first-order channel state transition probabilities for Rayleigh fading channels. First-order FSMC models

have also been validated in [21] and [28] by channel measurements of urban area wireless channels. In

this paper, we employ a first-order FSMC to describe the dynamics of the channel state. It should be

noted that, as pointed out in [29], a first-order FSMC is not sufficient to describe high-order channel state

distributions. Generally, the autocorrelation function (ACF) of a first-order FSMC is exponentially

decreasing and the ACF of a Rayleigh fading channel is a zeroth-order Bessel function of the first

kind. To model the higher order dynamics of the wireless channel, at the cost of higher complexity,

a higher order Markov channel model can be applied.

At the physical layer, in the tth frame slot, the transmission bit rate Rt is determined by the MCS

and the packet error rate pt is determined by both the channel state and the MCS. Under the given

link adaptation method, the chosen MCS is a function of the channel state. Thus, there is a one-to-one

mapping from channel state to the tuple (Rt, pt). Due to the Markov property of the channel state,

$(R_t, p_t)$ can also be modeled by an FSMC. The channel state space is $\mathcal{C} = \{C_1, \dots, C_{|\mathcal{C}|}\}$, where $C_i = (R_i, p_i)$ is the $i$th channel state. The state transition matrix $\mathbf{P}^{c}$ is a $|\mathcal{C}| \times |\mathcal{C}|$ matrix with entry $P^{c}_{i,j} = \mathbb{P}(C_j \mid C_i)$ being the transition probability from state $(R_i, p_i)$ to $(R_j, p_j)$.
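A first-order FSMC of this kind can be simulated directly from its transition matrix. The sketch below is illustrative only: the three channel states in `states_example` (bit rate in bit/s, packet error rate) and the matrix `P_example` are hypothetical values, not parameters from this paper.

```python
import numpy as np

def simulate_fsmc(P, states, n_slots, start=0, seed=0):
    """Sample a channel-state trajectory, one state (R_i, p_i) per frame slot."""
    P = np.asarray(P, dtype=float)
    if not np.allclose(P.sum(axis=1), 1.0):
        raise ValueError("each row of P must sum to 1")
    rng = np.random.default_rng(seed)
    idx = start
    path = []
    for _ in range(n_slots):
        path.append(states[idx])
        idx = rng.choice(len(states), p=P[idx])   # next state depends only on the current one
    return path

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    P = np.asarray(P, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()

# Hypothetical channel states (R_i in bit/s, p_i) and transition matrix.
states_example = [(0.5e6, 0.20), (1.0e6, 0.10), (2.0e6, 0.05)]
P_example = [[0.9, 0.1, 0.0],
             [0.1, 0.8, 0.1],
             [0.0, 0.1, 0.9]]
path_example = simulate_fsmc(P_example, states_example, n_slots=50)
pi_example = stationary_distribution(P_example)
```

Transitions only between adjacent states, as in `P_example`, are a common modeling choice for slowly varying channels, but nothing in the sketch depends on it.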

III. PROBLEM FORMULATION

In this section, we define the scheduler’s state space and the policies to be considered. Then, we show

how to simplify the scheduling problem to a finite-state Markov decision problem using reasonable

approximations. An infinite-horizon average-reward maximization MDP formulation is proposed to

optimize the scheduling policy so as to improve the time-average MS-SSIM index at the receiver.


A. Scheduling Policy and State Space

Considering all the possible scheduling actions makes defining the scheduling policy and represent-

ing the buffer state unmanageably complicated. If we do not apply any constraint on the scheduling

actions, the receiver buffer state could look like Fig. 4. On the one hand, to represent the buffer state,

the frame index and the layer index of each received data unit need to be recorded. Because the

number of received data units is not bounded, we cannot represent all possible buffer states using a

finite-dimension vector space. On the other hand, the scheduling actions which give rise to the buffer

state in Fig. 4 cannot provide optimal visual quality at the receiver. As shown in Fig. 4, some video

data units are transmitted before their predictors. If their predictors are not received before their playout

deadlines, these units are undecodable and useless. In this paper, by applying reasonable constraints

on the scheduling actions, we concentrate on those scheduling strategies that are likely to perform

well. Specifically, we consider the scheduling policies which comply with the following

constraints.

Constraint 1: The scheduler always schedules a data unit later than its predictors in the prediction

structure.

Constraint 2: The amount of video data scheduled in the $t$th slot is just larger than $R_t \times \Delta T$, i.e., the amount of data which can be transmitted in the slot.

Constraint 3: The scheduler never schedules more enhancement layer data units for later P frames

than sooner P frames in the same GOP.

Constraint 1 is applied to make sure that the transmission order is compatible with the prediction

order given in Section II-B, since a data unit can be decoded only when its predictors are received.

Constraint 2 forces the transmitter to keep busy transmitting data during the entire slot. With Constraint

3, at any time and for all the P frames within a GOP, the scheduler does not sacrifice the quality of

the frames which will be displayed sooner for the frames to be displayed later by transmitting more

enhancement layer data for the latter. Because the optimization objective is time-average MS-SSIM

index, the quality of each P frame is equally important. Transmitting more enhancement layer data

for later frames does not help to improve the time-average MS-SSIM value.

Note that, although the P frames within a GOP are equally important for time-average MS-SSIM

index, the frames in different GOPs are not. As discussed in Section I, when the channel throughput is

very low, it is beneficial to sacrifice P frames in current GOP for transmitting the I frame of the next

GOP. To differentiate the importance of current and future GOPs, we partition the data units of the

unexpired frames into three sets: Ipre, I and Ipost. The set I contains the unexpired data units of the


first unexpired I frame, $\mathcal{I}^{\mathrm{pre}}$ contains the data units before the first unexpired I frame, and $\mathcal{I}^{\mathrm{post}}$ contains the remaining unexpired data units (see Fig. 5). In the following, we define the buffer state spaces for the three sets as $\mathcal{B}^{I}$, $\mathcal{B}^{\mathrm{pre}}$ and $\mathcal{B}^{\mathrm{post}}$, respectively. The overall buffer state space is $\mathcal{B} = \mathcal{B}^{\mathrm{pre}} \times \mathcal{B}^{I} \times \mathcal{B}^{\mathrm{post}}$.

$\mathcal{B}^{I}$: At the $t$th slot, the state of $\mathcal{I}$ is defined as $B^{I}_t = (f^{I}_t, b^{I}_t)$, where $f^{I}_t \in \{1, \dots, L_{\mathrm{GOP}}\}$ is the frame index of the first unexpired I frame and $b^{I}_t$ is the number of received data units of $\mathcal{I}$.

$\mathcal{B}^{\mathrm{pre}}$: According to Constraint 3, the number of data units received in $\mathcal{I}^{\mathrm{pre}}$ is non-increasing with respect to the frame index. Hence, we only need to record the number of received data units for each layer. We define the buffer state of $\mathcal{I}^{\mathrm{pre}}$ as an $L$-dimensional vector $B^{\mathrm{pre}}_t = (b^{\mathrm{pre}}_1, b^{\mathrm{pre}}_2, \dots, b^{\mathrm{pre}}_L)$, where $b^{\mathrm{pre}}_{\ell}$ is the number of received data units in the $\ell$th layer of $\mathcal{I}^{\mathrm{pre}}$.

$\mathcal{B}^{\mathrm{post}}$: Note that Constraint 3 is only applied within each GOP. Hence, we cannot define the state space of $\mathcal{I}^{\mathrm{post}}$ in the same way as we did for $\mathcal{I}^{\mathrm{pre}}$. In principle, we need to record the number of received data units for each frame in $\mathcal{I}^{\mathrm{post}}$. In that case, we cannot define a vector space with a fixed number of dimensions to represent the state of $\mathcal{I}^{\mathrm{post}}$. Hence, for simplicity, we extend Constraint 3 to all the frames in $\mathcal{I}^{\mathrm{post}}$. In other words, the scheduler never transmits more enhancement layers for later frames than for sooner frames in $\mathcal{I}^{\mathrm{post}}$. Similar to $\mathcal{I}^{\mathrm{pre}}$, we define the buffer state of $\mathcal{I}^{\mathrm{post}}$ as an $L$-dimensional vector $B^{\mathrm{post}}_t = (b^{\mathrm{post}}_1, b^{\mathrm{post}}_2, \dots, b^{\mathrm{post}}_L)$, where $b^{\mathrm{post}}_{\ell}$ is the number of received data units in the $\ell$th layer of $\mathcal{I}^{\mathrm{post}}$.
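The reason per-layer counts suffice under Constraint 3 can be illustrated with a short sketch. It assumes a hypothetical per-frame representation (the number of received layers of each frame, in display order) and shows that, when this sequence is non-increasing, it can be recovered exactly from the $L$ per-layer counts. The function names are illustrative only.

```python
def satisfies_constraint_3(layers_per_frame):
    """True if no later frame has more received layers than a sooner frame."""
    return all(a >= b for a, b in zip(layers_per_frame, layers_per_frame[1:]))

def to_layer_counts(layers_per_frame, num_layers):
    """Per-layer counts (b_1, ..., b_L): number of frames holding at least layer l."""
    return [sum(1 for k in layers_per_frame if k >= layer)
            for layer in range(1, num_layers + 1)]

def from_layer_counts(counts, num_frames):
    """Inverse mapping, valid when the per-frame sequence is non-increasing."""
    return [sum(1 for b in counts if b > frame) for frame in range(num_frames)]

# Hypothetical buffer content for five frames and L = 3 layers.
per_frame = [3, 3, 2, 1, 0]
counts = to_layer_counts(per_frame, num_layers=3)
recovered = from_layer_counts(counts, num_frames=len(per_frame))
```

For a sequence that violates Constraint 3, such as `[1, 2]`, the round trip returns a different sequence, which is why the compact vector representation relies on the constraint.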

Remark: By extending Constraint 3 to all the GOPs in $\mathcal{I}^{\mathrm{post}}$, we actually rule out the option of transmitting more data units for the I frames in later GOPs of $\mathcal{I}^{\mathrm{post}}$. This may potentially degrade performance, but the degradation is negligible. Transmitting more data units for a later I frame is necessary only

when the channel state is bad throughout the whole GOP. However, as we assumed in Section II-D,

the coherence time of the channel is much less than the duration of a GOP. The channel is less likely

to be bad throughout the whole GOP. Hence, quality degradation is less likely to occur.

When no video data units for the current frame are received, the decoder cannot continue decoding

the frame. The video stream will experience an interruption until all the necessary base layer data

units for decoding the current frame are received. We define an interruption state $\mathcal{N}^{\mathrm{itr}}_t$ which is the set of the unreceived base layer data units for decoding the current frame at slot $t$ (see Fig. 5(b)). When

the current frame is decodable, the interruption state $\mathcal{N}^{\mathrm{itr}}_t = \emptyset$. $\mathcal{N}^{\mathrm{itr}}_t$ contains at most $L_{\mathrm{GOP}} - 1$ data units because I frames resynchronize the decoding process and terminate the interruption. Because


every data unit is transmitted only when all its predictors are received, the data units needed for

decoding the current frame must be composed of a sequence of consecutive base layer data units,

i.e., $\mathcal{N}^{\mathrm{itr}}_t = \{(-N^{\mathrm{itr}}_t + 1, 1), (-N^{\mathrm{itr}}_t + 2, 1), \dots, (0, 1)\}$. Hence, the interruption state can be simply represented by the number $N^{\mathrm{itr}}_t = |\mathcal{N}^{\mathrm{itr}}_t|$.

The system state space $\mathcal{S}$ is defined as the product of the channel state, the buffer state and the interruption state. At slot $t$, the system state is $S_t = (C_t, B_t, N^{\mathrm{itr}}_t)$, where $B_t = (B^{\mathrm{pre}}_t, B^{I}_t, B^{\mathrm{post}}_t)$. For each state $S_t \in \mathcal{S}$, we define a feasible control set $\mathcal{U}_{S_t}$ which contains all the scheduling actions complying with all three constraints. At any time, the state $S_t$ contains all the information about the receiver buffer and the channel. The transmitter must decide which action in $\mathcal{U}_{S_t}$ to take in order to maximize the time-average MS-SSIM index value. We define the scheduling policy $\mu(\cdot)$ as the mapping from the system state $S_t$ to an action in $\mathcal{U}_{S_t}$. In the following sections, we show how to optimize the scheduling policy $\mu(\cdot)$.

B. Policy Simplification

Because the receiver buffer size is regarded as essentially infinite, the state space $\mathcal{B}^{\mathrm{post}}$ is also infinite. Optimizing the scheduling policy over this infinite-state space is intractable. In the following,

we reduce the state space to a finite one by reasonable approximations.

When many frames are buffered at the receiver, the scheduler can transmit more enhancement

layers because there is enough time before the frames are played out. Based on this observation,

we define a window $\mathcal{W}$ which contains the data units within the first $W$ unexpired frames. The scheduler is restricted to work as follows: If all the data in $\mathcal{W}$ have been received and $\mathcal{N}^{\mathrm{itr}}_t = \emptyset$, the scheduler schedules as many enhancement layers as possible. Otherwise, the scheduler only focuses

on scheduling the video data in $\mathcal{W}$ and $\mathcal{N}^{\mathrm{itr}}_t$. By using the window $\mathcal{W}$, although the state space is still infinite, we fix the scheduling actions

outside a finite state set. In other words, it is only necessary to find the optimal actions for a finite-

state set. The window size W provides a tradeoff between complexity and optimality. The larger the

window, the less constrained the control policy but the higher the complexity.

It should be noted that the window defined here is different from the sliding window defined in [10]

and [19]. Our scheduling policy allows the transmitter to transmit the data units outside the window.

C. Transition Probability

Let $S_t = (C_t, B_t, N^{\mathrm{itr}}_t)$ and $\mathcal{U}_{S_t}$ be the system state and the corresponding feasible control set at slot $t$, where $C_t = (R_t, p_t)$ and $B_t = (B^{\mathrm{pre}}_t, B^{I}_t, B^{\mathrm{post}}_t)$. At the beginning of each slot, one frame is


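decoded and played out. We let $B^{+}_t = (B^{\mathrm{pre}+}_t, B^{I+}_t, B^{\mathrm{post}+}_t)$ denote the buffer state right after the first frame is displayed. If $f^{I}_t = 1$, i.e., the decoded frame is an I frame, then the frame set $\mathcal{I}$ becomes the next I frame, i.e., the $L_{\mathrm{GOP}}$th frame (see Fig. 6(a)). Hence, the buffer state is

$$B^{I+}_t = \Big(L_{\mathrm{GOP}}, \sum_{\ell=1}^{L} \mathbb{1}\big(b^{\mathrm{post}}_{\ell} \ge L_{\mathrm{GOP}}\big)\Big), \tag{1}$$

where $\sum_{\ell=1}^{L} \mathbb{1}(b^{\mathrm{post}}_{\ell} \ge L_{\mathrm{GOP}})$ is the number of received layers in the next I frame. Meanwhile, $\mathcal{I}^{\mathrm{pre}}$ becomes the first $L_{\mathrm{GOP}} - 1$ frames and $\mathcal{I}^{\mathrm{post}}$ contains the frames whose index is larger than $L_{\mathrm{GOP}}$. Thus, we have

$$B^{\mathrm{pre}+}_t = \min\{B^{\mathrm{post}}_t, (L_{\mathrm{GOP}} - 1)\mathbf{1}\} \tag{2}$$

and

$$B^{\mathrm{post}+}_t = \max\{B^{\mathrm{post}}_t - L_{\mathrm{GOP}}\mathbf{1}, \mathbf{0}\}. \tag{3}$$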

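If the decoded frame is not an I frame, the frame set $\mathcal{I}^{\mathrm{post}}$ will not be affected and the buffer state $B^{\mathrm{post}}_t$ does not change (see Fig. 6(b)). $B^{\mathrm{pre}}_t$ becomes

$$B^{\mathrm{pre}+}_t = \max\{B^{\mathrm{pre}}_t - \mathbf{1}, \mathbf{0}\}. \tag{4}$$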

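Summarizing (1), (2), (3) and (4), we have

$$B^{\mathrm{pre}+}_t = \begin{cases} \min\{B^{\mathrm{post}}_t, (L_{\mathrm{GOP}} - 1)\mathbf{1}\} & \text{if } f^{I}_t = 1, \\ \max\{B^{\mathrm{pre}}_t - \mathbf{1}, \mathbf{0}\} & \text{if } f^{I}_t \neq 1, \end{cases}$$

$$B^{I+}_t = \begin{cases} \Big(L_{\mathrm{GOP}}, \sum_{\ell=1}^{L} \mathbb{1}\big(b^{\mathrm{post}}_{\ell} \ge L_{\mathrm{GOP}}\big)\Big) & \text{if } f^{I}_t = 1, \\ (f^{I}_t - 1, \ell^{I}_t) & \text{if } f^{I}_t \neq 1, \end{cases}$$

and

$$B^{\mathrm{post}+}_t = \begin{cases} \max\{B^{\mathrm{post}}_t - L_{\mathrm{GOP}}\mathbf{1}, \mathbf{0}\} & \text{if } f^{I}_t = 1, \\ B^{\mathrm{post}}_t & \text{if } f^{I}_t \neq 1, \end{cases}$$

where $\ell^{I}_t$ denotes the second component of $B^{I}_t$, i.e., the number of received data units of $\mathcal{I}$.

After the first frame is displayed, the transmitter begins to sequentially transmit the collection of video data units indicated by the action $u_t = \mu(S_t) = \{(f_1, \ell_1), \dots, (f_{|u_t|}, \ell_{|u_t|})\}$. Let $\Delta u_t = \{(f_1, \ell_1), \dots, (f_{n_t}, \ell_{n_t})\}$ denote the completely received data units by the end of the slot, where $n_t$ is the number of received data units. Among the data units in $\Delta u_t$, let $\Delta B^{\mathrm{pre}}_t = (\Delta b^{\mathrm{pre}}_1, \Delta b^{\mathrm{pre}}_2, \dots, \Delta b^{\mathrm{pre}}_L)$ be the number of newly received data units for each layer in frame set $\mathcal{I}^{\mathrm{pre}}$. Similarly, we denote $\Delta B^{\mathrm{post}}_t = (\Delta b^{\mathrm{post}}_1, \Delta b^{\mathrm{post}}_2, \dots, \Delta b^{\mathrm{post}}_L)$ as the number of newly received data units for each layer in frame set $\mathcal{I}^{\mathrm{post}}$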

13

and Δ�I as the number of received data units for I. At the beginning of the (t + 1)th slot, we havethe following state transition relationship

Bpret+1 = Bpre+t +ΔB

pret , (5)

BIt+1 = (fI+t , �

I+t +Δ�

I), (6)

Bpostt+1 = Bpost+t +ΔB

postt . (7)
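The buffer update in (1)-(7) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `playout_update` and `arrival_update` are hypothetical, buffer states are represented as plain tuples, and the example input is the buffer state of Fig. 5(a) followed by an assumed arrival of one base-layer unit in $\mathcal{I}^{pre}$ and one unit for $\mathcal{I}$.

```python
def playout_update(b_pre, b_i, b_post, l_gop):
    """Buffer state right after the current frame is played out, following (1)-(4).

    b_pre, b_post: per-layer counts of buffered data units (tuples of length L).
    b_i: (f_i, l_i) = (position of the next I frame, number of its received layers).
    """
    f_i, l_i = b_i
    if f_i == 1:  # the displayed frame is an I frame
        new_i = (l_gop, sum(1 for b in b_post if b >= l_gop))
        new_pre = tuple(min(b, l_gop - 1) for b in b_post)
        new_post = tuple(max(b - l_gop, 0) for b in b_post)
    else:  # the displayed frame is a P frame
        new_i = (f_i - 1, l_i)
        new_pre = tuple(max(b - 1, 0) for b in b_pre)
        new_post = tuple(b_post)
    return new_pre, new_i, new_post


def arrival_update(b_pre_plus, b_i_plus, b_post_plus, d_pre, d_l_i, d_post):
    """State at the start of slot t+1, following (5)-(7)."""
    f_i, l_i = b_i_plus
    new_pre = tuple(b + d for b, d in zip(b_pre_plus, d_pre))
    new_post = tuple(b + d for b, d in zip(b_post_plus, d_post))
    return new_pre, (f_i, l_i + d_l_i), new_post


# Buffer state of Fig. 5(a): L_GOP = 8, L = 3. The arrivals are hypothetical.
state_plus = playout_update((4, 2, 1), (6, 2), (3, 3, 1), l_gop=8)
state_next = arrival_update(*state_plus, d_pre=(1, 0, 0), d_l_i=1, d_post=(0, 0, 0))
```

Representing each buffer component as a tuple keeps the elementwise min and max of (2)-(4) explicit and makes the states hashable, which is convenient if they are later used as keys of a transition table.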

As for the interruption state $N^{itr}_t$, after the first frame is played out, the second frame becomes the current frame. If the displayed frame is an I frame, $N^{itr+}_t = \mathbb{1}(b^{I} = 0)$, where the indicator $\mathbb{1}(b^{I} = 0)$ equals 1 when the base layer of the displayed frame has not been received. If the displayed frame is the last frame of a GOP, then $N^{itr+}_t = 0$. If the displayed frame is neither an I frame nor the last frame in a GOP, we have $N^{itr+}_t = N^{itr}_t + \mathbb{1}(b^{pre}_1 = 0)$. At the end of the slot, the data units in $N^{itr+}_t$ which are also in $\Delta u_t$ are removed from $N^{itr+}_t$. Thus,

$$N^{itr}_{t+1} = N^{itr+}_t - |N^{itr+}_t \cap \Delta u_t|. \tag{8}$$

The amount of video data in $\Delta u_t$, denoted by $R(B^{I}_t, \Delta u_t)$, can be estimated according to the buffer state $B^{I}_t$ and the rate-quality model introduced in Section II-C. Specifically, for each data unit in $\Delta u_t$, we first determine whether it belongs to an I frame or a P frame according to $B^{I}_t$ and then estimate the amount of data by the rate-quality model. The set $\Delta u_t$ records the completely transmitted data units up to the $(f_{n_t}, \ell_{n_t})$th data unit. However, data unit $(f_{n_t+1}, \ell_{n_t+1})$ is only partially received. Denoting the amount of data in unit $(f_{n_t+1}, \ell_{n_t+1})$ by $\tilde{R}(B^{I}_t, \Delta u_t)$, the amount of received data is at least $R(B^{I}_t, \Delta u_t)$ and at most $R(B^{I}_t, \Delta u_t) + \tilde{R}(B^{I}_t, \Delta u_t)$. Assuming the physical layer packet length is $L_{PHY}$, there are $N = \lfloor \Delta T \times R_t / L_{PHY} \rfloor$ packet transmissions during a time slot $\Delta T$. The number of successfully transmitted packets is at least $N_l = \lceil R(B^{I}_t, \Delta u_t) / L_{PHY} \rceil$ and is less than $N_h = \lceil (R(B^{I}_t, \Delta u_t) + \tilde{R}(B^{I}_t, \Delta u_t)) / L_{PHY} \rceil$. As assumed in Section II-D, the channel state is constant over each slot. Thus, the packet losses are independent within each slot, and the number of successful packet transmissions in a slot is binomially distributed. Hence, the state transition probability from $S_t = (C_t, B_t, N^{itr}_t)$ to $S_{t+1} = (C_{t+1}, B_{t+1}, N^{itr}_{t+1})$ is

$$P_\mu(S_{t+1}|S_t) = \left[\sum_{n_t = N_l}^{N_h - 1} \binom{N}{n_t} p_t^{N - n_t} (1 - p_t)^{n_t}\right] P(C_{t+1}|C_t), \tag{9}$$

where the first multiplicative term is the transition probability of the receiver buffer state from $(B_t, N^{itr}_t)$ to $(B_{t+1}, N^{itr}_{t+1})$ and the second term is the transition probability of the channel state from $C_t$ to $C_{t+1}$.
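As a numerical illustration of (9), the binomial term can be evaluated directly. The sketch below assumes that $p_t$ is the per-packet loss probability (consistent with the throughput $R_i(1 - p_i)$ used in Section V); the function names and the example numbers are hypothetical.

```python
from math import comb


def buffer_transition_probability(n_packets, n_low, n_high, loss_prob):
    """Bracketed term of (9): P(N_l <= successes < N_h) for Binomial(N, 1 - p_t)."""
    success = 1.0 - loss_prob
    return sum(
        comb(n_packets, n) * loss_prob ** (n_packets - n) * success ** n
        for n in range(n_low, min(n_high, n_packets + 1))
    )


def state_transition_probability(n_packets, n_low, n_high, loss_prob, channel_prob):
    """Full transition probability (9): buffer term times channel term P(C_{t+1}|C_t)."""
    return buffer_transition_probability(n_packets, n_low, n_high, loss_prob) * channel_prob


# Hypothetical slot: N = 2 packets, loss probability 0.5, exactly one success required.
p_buffer = buffer_transition_probability(2, 1, 2, 0.5)
p_state = state_transition_probability(2, 1, 2, 0.5, channel_prob=0.8)
```

The upper limit is clipped at $N$ because no more than $N$ packets can succeed in a slot, so a data unit that cannot fit in the slot simply contributes zero probability mass.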


D. Optimization Objective

At the beginning of each time slot $t$, the first frame in the window is played out and the MS-SSIM index is

$$Q(S_t) = \sum_{\ell=1}^{L} q_\ell \times \mathbb{1}_\ell(S_t), \tag{10}$$

where $\mathbb{1}_\ell(S_t)$ is the indicator of whether the $\ell$th layer of the displayed frame is received at state $S_t$. The quantity $q_\ell$ is the rate-quality model parameter defined in Section II-C. Because of the way that the MS-SSIM index is defined, the quantity $Q(S_t)$ is also bounded in $[0, 1]$. Our aim is to find the optimal policy $\mu^*(\cdot)$ which maximizes the time-average MS-SSIM index, i.e.,

$$J_\mu = \lim_{N \to \infty} \frac{1}{N} E_\mu\left\{\sum_{t=0}^{N-1} Q(S_t)\right\}. \tag{11}$$
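To make (10) and (11) concrete, the following sketch evaluates the per-frame quality and a finite-horizon estimate of $J_\mu$ along one sample path. The $q_\ell$ values are those of "foreman" in Table I; the reception pattern `trace` and the function names are hypothetical.

```python
def frame_quality(q, received_layers):
    """MS-SSIM of a displayed frame per (10): sum of q_l over the received layers."""
    return sum(q_l for q_l, ok in zip(q, received_layers) if ok)


def time_average_quality(q, received_per_slot):
    """Finite-horizon estimate of J_mu in (11) from one sample path."""
    total = sum(frame_quality(q, r) for r in received_per_slot)
    return total / len(received_per_slot)


# q_l of "foreman" from Table I.
Q_FOREMAN = (0.9165, 0.0256, 0.0106)
# Hypothetical reception pattern: one tuple of layer indicators per slot.
trace = [(1, 1, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1)]
j_hat = time_average_quality(Q_FOREMAN, trace)
```

With all three layers received, the per-frame quality is 0.9165 + 0.0256 + 0.0106 = 0.9527, which is the upper bound on the time average for this sequence under the adopted model.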

E. Finite State Problem Formulation

Using the window defined in Section III-B, we fix the transmission policy outside a finite state set. The buffer state space, however, is still infinite and the system state evolves in this infinite state space. In the following, we show how to simplify this infinite state space problem to a finite-state problem.

Note that we only need to optimize the policy $\mu(\cdot)$ when some of the video data in the window have not been received or $N^{itr} \neq 0$. We formally define this finite set $\mathcal{S}_W$ as follows:

$$\mathcal{S}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}(B) \not\supset \mathcal{V}_W \text{ or } N^{itr} \neq 0\}, \tag{12}$$

where $\mathcal{V}_W$ denotes the set of video data units in $W$ and $\mathcal{V}(B)$ is the set of buffered video data units. We define another subset of $\mathcal{S}$ as follows:

$$\bar{\mathcal{S}}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}_W \subset \mathcal{V}(B) \text{ and } N^{itr} = 0\}. \tag{13}$$

For all the states in $\bar{\mathcal{S}}_W$, the video data in $W$ and $N^{itr}$ are all received. Note that, because the transmitter always transmits the video data units in $W$ and $N^{itr}$ with higher priority, $\mathcal{S}_W$ and $\bar{\mathcal{S}}_W$ form a partition of the state space $\mathcal{S}$. In other words, we have $\mathcal{S}_W \cup \bar{\mathcal{S}}_W = \mathcal{S}$ and $\mathcal{S}_W \cap \bar{\mathcal{S}}_W = \emptyset$.

Given a policy $\mu(\cdot)$, the system state transits as a controlled Markov chain in set $\mathcal{S}_W$ and as an uncontrolled Markov chain in set $\bar{\mathcal{S}}_W$. Because the transmission rate is finite, the number of states in $\bar{\mathcal{S}}_W$ which can be reached from $\mathcal{S}_W$ in one step is also finite. We formally define this set of states as follows:

$$\mathcal{S}_\Delta = \{S \mid S \in \bar{\mathcal{S}}_W;\ \exists S' \in \mathcal{S}_W \text{ s.t. } P_\mu(S|S') > 0\}. \tag{14}$$

Once the system moves into the set $\bar{\mathcal{S}}_W$, the system state hits a state in $\mathcal{S}_\Delta$ and then stays in $\bar{\mathcal{S}}_W$ for some time. During this period, the decoded video quality is always $\hat{Q} = \sum_{\ell=1}^{L} q_\ell$, because all the data units in $W$ are received. The dynamics of the system when it moves into set $\bar{\mathcal{S}}_W$ affect the performance of the system. Generally, the longer it stays in $\bar{\mathcal{S}}_W$, the better the performance is. Although the scheduling policy in $\bar{\mathcal{S}}_W$ is fixed as described in Section III-B, the control policy in $\mathcal{S}_W$ determines how frequently the system state will hit $\bar{\mathcal{S}}_W$ and thus also affects the system performance.

In the following, we denote the system under a given policy $\mu$ as system $A_\mu$. Let $T_\mu(S)$ be the expected time spent by $A_\mu$ in $\bar{\mathcal{S}}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S \in \mathcal{S}_\Delta$. Let $P^T_\mu(S'|S)$ denote the probability that $A_\mu$ jumps back to $\mathcal{S}_W$ at state $S' \in \mathcal{S}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S$. To find the optimal policy, we define a finite-state system $\tilde{A}_\mu$ as follows:

Definition 1: A system $\tilde{A}_\mu$ is called the simplified system of the original system $A_\mu$ if it has the following dynamics:

1) The system is a semi-Markov process over state space $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. At any state $S \in \tilde{\mathcal{S}}$, the visual quality is $Q(S)$ as in (10). At any state in $\mathcal{S}_W$, the system acts according to the policy $\mu$;

2) When the system jumps to a state $S \in \mathcal{S}_\Delta$, it spends $T_\mu(S)$ slots in $S$ with video quality $\sum_{\ell=1}^{L} q_\ell$ for each slot. Then, the system transits to a state $S' \in \mathcal{S}_W$ with probability $P^T_\mu(S'|S)$.

It should be noted that $\tilde{A}_\mu$ is not coupled with the original system $A_\mu$. It just shares some properties with the original system (see Fig. 7). The following theorem relates the visual quality under $\tilde{A}_\mu$ and that of $A_\mu$.

Theorem 1: If the jump chain of the original system $A_\mu$ is positive recurrent, then the time-average MS-SSIM index of $A_\mu$ is the same as that of the simplified system $\tilde{A}_\mu$.

Proof Sketch: If the jump chain is positive recurrent, the jumps from $\mathcal{S}_W$ to $\mathcal{S}_\Delta$ partition the Markov process into i.i.d. segments. We only need to optimize the policy $\mu$ to maximize the average quality in each segment. Every segment consists of two consecutive subsegments. During the first subsegment, $S_t \in \bar{\mathcal{S}}_W$. In the other subsegment, $S_t \in \mathcal{S}_W$. Because every state in $\bar{\mathcal{S}}_W$ has the same visual quality $\sum_{\ell=1}^{L} q_\ell$, we can abstract the first subsegment as a single state with transition probability $P^T_\mu(S'|S)$. This simplified system provides the same average quality as the original system. For a detailed proof, see the technical report [30].

Remark The positive recurrent condition for the jump chain means that the average throughput of the

channel is neither too large nor too small relative to the average data rate of the video. If the average

throughput of the channel is very large, the receiver buffer can always buffer enough frames and the

dynamic scheduling is unnecessary. If the average channel throughput is too small, the channel cannot

support the video stream and dynamic scheduling cannot help either.

As indicated by Theorem 1, given any policy $\mu$, the visual quality of $A_\mu$ is the same as that of $\tilde{A}_\mu$. Thus, we can optimize our policy with respect to $\tilde{A}_\mu$, which has a finite state space, and a standard policy optimization algorithm can be applied.

IV. POLICY OPTIMIZATION

In the following, we show how to compute the parameters $T_\mu$ and $P^T_\mu$. Then, we present a value iteration algorithm to find the optimal scheduling policy.

A. Computation of Tμ and PTμ

Before we can apply an MDP algorithm to optimize the policy, we need to compute $T_\mu(S)$ and $P^T_\mu(S'|S)$ for every state $S \in \mathcal{S}_\Delta$ and $S' \in \mathcal{S}_W$. Both $T_\mu(S)$ and $P^T_\mu(S'|S)$ only involve the dynamics of the system when the state $S \in \bar{\mathcal{S}}_W$. Because the scheduling policy is fixed in $\bar{\mathcal{S}}_W$, we can simplify the state representation further.

Let $B^W_t$ be the number of buffered packets outside the window. Note that, when the system moves in $\bar{\mathcal{S}}_W$, the system always schedules as many enhancement layer data units as possible. In addition, when the system is in state set $\bar{\mathcal{S}}_W$, we always have $N^{itr}_t = \emptyset$. Hence, $B^W_t$ and $f^{I}_t$ contain all the information about the buffer state $(B_t, N^{itr}_t)$. We can further simplify the state representation $(C_t, B_t, N^{itr}_t)$ to $(C_t, B^W_t, f^{I}_t)$ when $S \in \bar{\mathcal{S}}_W$. All the states in $\bar{\mathcal{S}}_W$ correspond to some states with $B^W_t \ge 0$. All the states in $\mathcal{S}_W$ correspond to some states with $B^W_t < 0$.

When the system evolves in $\bar{\mathcal{S}}_W$, at the beginning of a slot $t$, the state $B^W_t$ first decreases by $\Delta B^W_d(S_t)$ when the current frame is displayed. Then, the transmitter schedules as many enhancement layer data units as possible. At the end of the slot, $B^W_t$ increases by $\Delta B^W_i(S_t)$. Because the quantity $\Delta B^W(S_t) = \Delta B^W_i(S_t) - \Delta B^W_d(S_t)$ only depends on state $S_t$, the state $B^W_t$ varies like a random walk but with Markovian step-size $\Delta B^W$. This process can be described by a quasi-birth-death process (QBDP). Hence, determining $T_\mu$ and $P^T_\mu$ is actually the hitting time problem of the QBDP. The problem for the continuous time QBDP was essentially solved in [31, p. 96]. The discrete time case can be solved similarly. Details on how to compute $T_\mu$ and $P^T_\mu$ are found in the technical report [30].
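The meaning of $T_\mu$ and $P^T_\mu$ can be illustrated on a small absorbing Markov chain. The sketch below is not the matrix-geometric QBDP method of [31]; it is a plain fixed-point computation on a hypothetical, truncated toy chain in which the transient states stand for states in $\bar{\mathcal{S}}_W$ and the absorbing states stand for the states of $\mathcal{S}_W$ at which the system re-enters. All names and numbers are assumptions made for illustration.

```python
def absorbing_chain_stats(q, r, tol=1e-12, max_iter=100_000):
    """Expected absorption time and absorption probabilities by fixed-point iteration.

    q[i][j]: probability of moving from transient state i to transient state j.
    r[i][k]: probability of moving from transient state i to absorbing state k.
    Solves t = 1 + q t (expected slots before absorption) and b = r + q b.
    """
    n, m = len(q), len(r[0])
    t = [0.0] * n
    b = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        t_new = [1.0 + sum(q[i][j] * t[j] for j in range(n)) for i in range(n)]
        b_new = [
            [r[i][k] + sum(q[i][j] * b[j][k] for j in range(n)) for k in range(m)]
            for i in range(n)
        ]
        delta = max(abs(x - y) for x, y in zip(t_new, t))
        t, b = t_new, b_new
        if delta < tol:
            break
    return t, b


# Hypothetical toy chain: two transient states, two possible re-entry states.
Q_TRANSIENT = [[0.5, 0.3], [0.4, 0.6]]
R_EXIT = [[0.15, 0.05], [0.0, 0.0]]
t_mu, p_t_mu = absorbing_chain_stats(Q_TRANSIENT, R_EXIT)
```

Here `t_mu[i]` plays the role of $T_\mu(S)$ and `p_t_mu[i]` the role of the row $P^T_\mu(\cdot|S)$ for entry state $i$. For this toy chain the exact values are 8.75 and 11.25 slots, with exit probabilities (0.75, 0.25) from either state, which the iteration reproduces.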

B. Determining Optimal Policy via Value Iteration

Given $T_\mu$ and $P^T_\mu$, the optimal policy for an MDP can be determined for the simplified system $\tilde{A}_\mu$, which is also the optimal policy of the original system $A_\mu$. Let $S_{ini}$ be any state in $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. The hitting times to state $S_{ini}$ partition the process into i.i.d. cycles. Optimizing the policy $\mu(\cdot)$ in the cycles maximizes the time-average MS-SSIM index of the system. Similar to the derivation in [3, p. 441], this is equivalent to an average-reward maximization problem with stage-reward $g(S) - \tau(S)\lambda$, where $\lambda$ is the expected average-reward of each cycle and

$$g(S) = \begin{cases} Q(S) & : S \in \mathcal{S}_W, \\ T_\mu(S) \sum_{\ell=1}^{L} q_\ell & : S \in \mathcal{S}_\Delta, \end{cases} \qquad \tau(S) = \begin{cases} 1 & : S \in \mathcal{S}_W, \\ T_\mu(S) & : S \in \mathcal{S}_\Delta. \end{cases}$$

Let us denote by $h(S)$ the average reward-to-go in each cycle when the system starts at state $S$. Then we have the following Bellman equations:

$$h(S) = g(S) - \tau(S)\lambda + \sum_{S' \in \mathcal{S}_W \cup \mathcal{S}_\Delta} P_\mu(S'|S)\, h(S'), \tag{15}$$

where $h(S_{ini}) = 0$. To find the optimal policy, the standard value iteration algorithm can be applied [3, p. 430].
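For a fixed policy, the quantity $\lambda$ in (15) can be checked numerically. Multiplying (15) by the stationary distribution $\pi$ of the embedded chain and summing over states gives $\lambda = \sum_S \pi(S) g(S) / \sum_S \pi(S) \tau(S)$. The sketch below evaluates this ratio for a hypothetical three-state simplified system; it is a policy-evaluation illustration, not the value iteration algorithm of [3], and the transition matrix, the sojourn time and the function names are assumptions.

```python
def stationary_distribution(p, iters=10_000):
    """Stationary distribution of an ergodic chain by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_quality(p, g, tau):
    """Time-average reward of a semi-Markov chain: sum(pi * g) / sum(pi * tau)."""
    pi = stationary_distribution(p)
    return sum(a * b for a, b in zip(pi, g)) / sum(a * b for a, b in zip(pi, tau))


# Hypothetical simplified system: states 0 and 1 lie in S_W, state 2 lies in S_Delta.
P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.6, 0.4, 0.0]]
Q_HAT = 0.9527  # sum of q_l for "foreman" (Table I)
T_MU = 5.0      # hypothetical expected sojourn time T_mu(S) in slots
g = [0.9165, 0.9421, T_MU * Q_HAT]
tau = [1.0, 1.0, T_MU]
lam = average_quality(P, g, tau)
```

Because the state in $\mathcal{S}_\Delta$ is weighted by its sojourn time $T_\mu(S)$ in both the numerator and the denominator, a policy that makes the system reach $\mathcal{S}_\Delta$ more often, or stay there longer, pulls $\lambda$ toward $\hat{Q}$.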

V. PERFORMANCE EVALUATION AND NEAR-OPTIMAL POLICY DESIGN

In this section, we first test the MDP-based scheduling policy by simulations and show its superiority to scheduling methods which do not explicitly exploit the buffer and channel information. Then, we propose a simple scheduling policy which achieves near-optimal performance.

A. Performance evaluation

The proposed dynamic scheduling algorithm was evaluated on the test sequences “foreman”, “bus”, “flower”, “mobile” and “Paris” [26]. These video sequences were encoded into 3 layers using the H.264/SVC reference software JSVM [32]. The GOP length was set to $L_{GOP} = 16$. The encoding parameters and rate-quality model parameters are listed in Table I. The parameters $r^{I}_\ell$ and $r^{P}_\ell$ are measured in megabits and $q_\ell$ is measured in MS-SSIM index. The quantization parameters (QPs) of the base layers were chosen such that the base layer quality is about 0.90 to 0.91 in MS-SSIM value, which is of moderate but acceptable visual quality. The QPs of the enhancement layers were chosen such that the third layer provides an MS-SSIM visual quality prediction of 0.95 to 0.96 and the bit rates of the two enhancement layers are roughly the same. The Lagrangian multipliers for motion estimation and mode decision were set as QP − 2.


We employ a 4-state Markov channel model to test the performance of the proposed scheduling algorithm. The state transition matrix is

$$P_c = \begin{bmatrix} \frac{1}{5} & \frac{4}{5} & 0 & 0 \\ \frac{1}{5} & \frac{3}{5} & \frac{1}{5} & 0 \\ 0 & \frac{3}{5} & \frac{1}{5} & \frac{1}{5} \\ 0 & 0 & \frac{4}{5} & \frac{1}{5} \end{bmatrix}$$

and the steady state distribution is

$$\pi = [0.15,\ 0.60,\ 0.20,\ 0.05].$$

Let us denote the throughput of channel state $(R_i, p_i)$ by $r_i = R_i(1 - p_i)$. The state parameters $R_i$ and $p_i$ are configured such that $r_1 < R^V_1 < r_2 < R^V_2 < r_3 < R^V_3 < r_4$, in which $R^V_i$ is the average video data rate up to the $i$th layer. Hence, the channel throughput fluctuates among the average rates of the layers. According to the steady state distribution, the average throughput of the channel is higher than the base layer rate but not enough to support the first enhancement layer.
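The stated steady state distribution can be verified exactly, and the subdominant eigenvalue of $P_c$ (used as the correlation parameter $\rho$ in Section V-B and reported as 0.5123 in Table III) can be estimated numerically. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the eigenvalue estimate relies on a simple power iteration over zero-sum row vectors rather than a general eigen-solver.

```python
from fractions import Fraction as F

# Channel transition matrix P_c as given in the text (row: current state).
P_C = [
    [F(1, 5), F(4, 5), F(0), F(0)],
    [F(1, 5), F(3, 5), F(1, 5), F(0)],
    [F(0), F(3, 5), F(1, 5), F(1, 5)],
    [F(0), F(0), F(4, 5), F(1, 5)],
]
PI = [F(15, 100), F(60, 100), F(20, 100), F(5, 100)]


def step(vec, mat):
    """Row vector times matrix: one step of the chain."""
    n = len(mat)
    return [sum(vec[i] * mat[i][j] for i in range(n)) for j in range(n)]


def is_stationary(pi, mat):
    """Exact check that pi is a probability vector with pi P = pi."""
    return sum(pi) == 1 and step(pi, mat) == pi


def subdominant_modulus(mat, iters=200):
    """Estimate the second-largest eigenvalue modulus by power iteration
    restricted to zero-sum row vectors, which excludes the eigenvalue 1."""
    m = [[float(x) for x in row] for row in mat]
    n = len(m)
    x = [1.0] + [0.0] * (n - 2) + [-1.0]  # zero-sum start vector with max-norm 1
    rho = 0.0
    for _ in range(iters):
        y = step(x, m)
        mean = sum(y) / n
        y = [v - mean for v in y]      # remove round-off drift of the sum
        rho = max(abs(v) for v in y)   # x has max-norm 1, so this is the growth ratio
        x = [v / rho for v in y]
    return rho


stationary_ok = is_stationary(PI, P_C)
rho = subdominant_modulus(P_C)
```

Exact rational arithmetic is used for the stationarity check so that the comparison $\pi P_c = \pi$ is not affected by floating-point rounding; the eigenvalue estimate converges to approximately 0.5123, in agreement with Table III.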

We simulated 40 transmissions of each sequence over this channel model. To conceal errors, every

lost frame was reconstructed by copying its preceding frame. To demonstrate the advantages of the

proposed algorithm, three scheduling algorithms were tested over the same channel realizations. They

are summarized as follows:

H1 This policy only schedules the base layer. It is the most conservative transmission strategy.

H2 This policy always schedules as many enhancement layers as possible. It preferentially

maximizes the visual quality of recent frames.

H3 This policy decides dynamically how many enhancement layers to transmit. Specifically,

when the instantaneous channel throughput is lower than the average data rate up to the first

enhancement layer, the policy behaves like H1. If the channel throughput is higher than the

average data rate up to the second enhancement layer, the policy acts like H2. Otherwise,

the policy schedules the video data up to the first enhancement layer.
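The three baseline heuristics differ only in how many layers they allow the scheduler to send. A minimal sketch for the three-layer case is given below; the function name, the argument `rate_up_to` (holding the average rates $R^V_1, R^V_2, R^V_3$) and the numerical rates in the example are hypothetical.

```python
def layers_to_schedule(policy, throughput, rate_up_to):
    """Highest layer (1..3) that a baseline heuristic allows to be scheduled.

    rate_up_to[i] is the average video data rate up to layer i + 1.
    """
    num_layers = len(rate_up_to)
    if policy == "H1":
        return 1                 # base layer only
    if policy == "H2":
        return num_layers        # as many enhancement layers as possible
    if policy == "H3":
        if throughput < rate_up_to[1]:
            return 1             # behave like H1
        if throughput > rate_up_to[2]:
            return num_layers    # behave like H2
        return 2                 # up to the first enhancement layer
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical average rates up to layers 1, 2 and 3.
RATES = (0.3, 0.6, 0.9)
h3_choices = [layers_to_schedule("H3", r, RATES) for r in (0.5, 0.7, 1.0)]
```

None of the three rules looks at the buffer state or at the channel transition probabilities, which is the information the MDP-based policy additionally exploits.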

The average visual quality measured using the MS-SSIM index is shown in Table II. It is observed that the proposed scheduling algorithm outperforms all of the other policies by 0.015 to 0.035 in MS-SSIM value, which is perceptually significant. As can be seen in Table I, increasing the MS-SSIM value by 0.02 requires approximately doubling the video bit rate. Thus, the proposed scheduling algorithm provides very significant performance improvements over other scheduling policies.


B. Near-optimal scheduling policy design

Although the MDP-based scheduling policy has the optimal performance among all the policies we consider, the off-line computation of MDP policies requires extra system resources. This motivates us to design a simple on-line scheduling policy which performs similarly to the MDP-based policy.

The simulation results show that, by dynamically scheduling data associated with different layers, H3 achieves much better performance than the other heuristics. However, for the sequence “Paris”, which has large I frames, H3's performance is much worse. This is mainly because H3 only transmits video frames sequentially without proactively transmitting data of later GOPs. In the following, we propose a scheduling scheme which, similar to the MDP-based policy, not only dynamically schedules data units associated with different layers but also allocates transmissions for later GOPs.

At the $t$th slot, the proposed policy first estimates the amount of data which can be sent before the $(t + \tau)$th time slot as $D_t = \sum_{n=1}^{\tau - 1} [r_t \rho^{n-1} + r_{avg}(1 - \rho^{n-1})]$. Here, $\rho$ is the subdominant eigenvalue of $P_c$, which represents the temporal correlation of the channel condition. Such a correlation parameter could be easily measured. The quantity $r_t$ is the throughput of the current slot and $r_{avg}$ is the average throughput of the channel. Again, these parameters can be measured. If $b^{I} = 0$, correctly sending $\mathcal{I}$ to the receiver is critical and thus we set $\tau = f^{I}$. If $b^{I} > 0$, we care about the transmissions within the coherence time and thus we set $\tau = \lceil 1 / \ln(1/\rho) \rceil$, where $1 / \ln(1/\rho)$ is roughly the relaxation time of channel variations. Let $D^{\ell}_t$ be the amount of unreceived data contained in the first $\ell$ layers of the next $\tau$ frames. The proposed scheduling policy operates as follows:

• If $D_t < D^{2}_t$, the policy only schedules the base layer data units; if $D^{2}_t \le D_t < D^{3}_t$, the policy only schedules the first two layers; otherwise, the policy schedules data units from all three layers.

• If $D_t < D^{1}_t$, the policy schedules as much of the base layer data for $\mathcal{I}$ as possible. If $D^{1}_t < D_t < D^{2}_t$, the policy schedules up to 50% of the transmissions for $\mathcal{I}$. Otherwise, the policy transmits the frames sequentially without proactively transmitting data for $\mathcal{I}$.
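The two rules above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and the string labels returned by `i_frame_mode` are hypothetical, the look-ahead is rounded up (which reproduces the pairs $\rho = 0.5123, \tau = 2$ and $\rho = 0.7232, \tau = 4$ of Tables III and IV), and the computation of the quantities $D^{\ell}_t$ from the buffer state is left to the caller.

```python
import math


def horizon(rho, b_i, f_i):
    """Look-ahead tau: f_I if the next I frame has no received layer,
    otherwise the relaxation time 1 / ln(1 / rho) rounded up."""
    if b_i == 0:
        return f_i
    return math.ceil(1.0 / math.log(1.0 / rho))


def sendable_data(r_t, r_avg, rho, tau):
    """D_t = sum_{n=1}^{tau-1} [r_t rho^(n-1) + r_avg (1 - rho^(n-1))]."""
    return sum(
        r_t * rho ** (n - 1) + r_avg * (1.0 - rho ** (n - 1)) for n in range(1, tau)
    )


def layers_allowed(d_t, d2, d3):
    """First rule: how many layers may be scheduled."""
    if d_t < d2:
        return 1
    if d_t < d3:
        return 2
    return 3


def i_frame_mode(d_t, d1, d2):
    """Second rule: how aggressively to send data of the next I frame in advance."""
    if d_t < d1:
        return "max_base_layer_for_I"
    if d1 < d_t < d2:
        return "up_to_half_for_I"
    return "sequential"


# Example with hypothetical throughputs (same unit as the D values).
tau = horizon(0.5123, b_i=1, f_i=5)
d_t = sendable_data(r_t=1.0, r_avg=0.5, rho=0.5, tau=4)
```

The estimate `sendable_data` interpolates between the current throughput $r_t$ and the long-run average $r_{avg}$: the weight $\rho^{n-1}$ on $r_t$ decays geometrically with the look-ahead distance, so a weakly correlated channel (small $\rho$) is predicted almost entirely from its average.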

The parameters of the proposed policy, i.e., ρ, rt and ravg can be easily measured and this policy

is very simple to implement. The performance of this policy was tested over the simulated Markov

channel models with different values of the temporal correlation parameter ρ. The simulation results are summarized

in Table III and Table IV. For most tested sequences, the gap between the proposed policy and the

MDP-based policy ranges from 0.003 to 0.01.


VI. CONCLUSIONS

We have developed a dynamic scheduling method for efficient scalable video transmission over wireless channels.

By modeling the wireless channel as a Markov chain, an infinite-horizon average-reward maximization

formulation is proposed to maximize the visual quality predicted by MS-SSIM index. To reduce the

state space to a finite one, we employ a window to fix the scheduling policy when all the data within

the window are received. It is shown that the scheduling policy optimization problem is equivalent

to finding the optimal control policy for a controlled semi-Markov process over a finite-state space.

Simulation results demonstrate the superiority of the scheduling policy obtained by the proposed

MDP-based formulation. Further, a simple scheduling policy is proposed that achieves near-optimal performance.

REFERENCES

[1] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE

Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sept. 2007.

[2] W. Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 11,

no. 3, pp. 301–317, Mar. 2001.

[3] D. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2005, vol. 2.

[4] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, Jan. 2009.

[5] B. Girod, “What’s wrong with mean-squared error,” Digital Images and Human Vision (A. B. Watson, ed.), pp. 207–220, 1993.

[6] A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun., vol. 43, no. 12, pp.

2959–2965, Dec. 1995.

[7] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Conference Record

of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 2003, pp. 1398–1402.

[8] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010.

[9] M. Podolsky, S. McCanne, and M. Vetterli, “Soft ARQ for layered streaming media,” Technical Report, Computer Science Division,

University of California, Berkeley, vol. UCB/CSD-98-1024, 1998.

[10] P. A. Chou and Z. Miao, “Rate-distortion optimized streaming of packetized media,” IEEE Trans. Multimedia, vol. 8, no. 2, pp.

390–404, Apr. 2006.

[11] J. Chakareski, P. A. Chou, and B. Aazhang, “Computing rate-distortion optimized policies for streaming media to wireless clients,”

in Proceedings of Data Compression Conference, 2002, pp. 53–62.

[12] Y. Zhang, F. Fu, and M. van der Schaar, “On-line learning and optimization for wireless video transmission,” IEEE Trans. Signal

Process., vol. 58, no. 6, pp. 3108–3124, Jun. 2010.

[13] F. Fu and M. van der Schaar, “A new systematic framework for autonomous cross-layer optimization,” IEEE Trans. Veh. Technol., vol. 58, no. 4, pp. 1887–1903, May 2009.

[14] C. Chen, R. W. Heath, A. C. Bovik, and G. de Veciana, “Adaptive policies for real-time video transmission: a Markov decision

process framework,” in 18th IEEE International Conference on Image Processing, Sept. 2011.


[15] Y. Li, A. Markopoulou, J. Apostolopoulos, and N. Bambos, “Content-aware playout and packet scheduling for video streaming

over wireless links,” IEEE Trans. Multimedia, vol. 10, no. 5, pp. 885–895, Aug. 2008.

[16] J. Cabrera, A. Ortega, and J. I. Ronda, “Stochastic rate-control of video coders for wireless channels,” IEEE Trans. Circuits Syst.

Video Technol., vol. 12, no. 6, pp. 496–510, Jun. 2002.

[17] J. W. Huang, H. Mansour, and V. Krishnamurthy, “A dynamical games approach to transmission-rate adaptation in multimedia

WLAN,” IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3635–3646, Jul. 2010.

[18] J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag, 1997.

[19] F. Fu and M. Van Der Schaar, “A systematic framework for dynamically optimizing multi-user wireless video transmission,” IEEE

J. Sel. Areas Commun., vol. 28, no. 3, pp. 308–320, Apr. 2010.

[20] Q. Zhang and S. A. Kassam, “Finite-state Markov model for Rayleigh fading channels,” IEEE Trans. Commun., vol. 47, no. 11,

pp. 1688–1692, Nov. 1999.

[21] H.-P. Lin and M.-J. Tseng, “Two-layer multistate Markov model for modeling a 1.8 GHz narrow-band wireless propagation channel

in urban Taipei city,” IEEE Trans. Veh. Technol., vol. 54, no. 2, pp. 435–446, Mar. 2005.

[22] M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50,

no. 3, pp. 312–322, Sept. 2004.

[23] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Trans. Image

Process., vol. 16, no. 9, pp. 2284–2298, Sept. 2007.

[24] A. K. Moorthy and A. C. Bovik, “Visual importance pooling for image quality assessment,” IEEE J. Sel. Topics Signal Process.,

vol. 3, no. 2, pp. 193–201, Apr. 2009.

[25] Z. Wang and Q. Li, “Video quality assessment using a statistical model of human visual speed perception,” Journal of the Optical Society of America A, vol. 24, pp. B61–B69, Jul. 2007.

[26] Test sequences [On line]. Available: http://trace.eas.asu.edu/yuv/.

[27] H. S. Wang and P.-C. Chang, “On verifying the first-order Markovian assumption for a Rayleigh fading channel model,” IEEE Trans. Veh. Technol., vol. 45, no. 2, pp. 353–357, May 1996.

[28] T. Su, H. Ling, and W. J. Vogel, “Markov modeling of slow fading in wireless mobile channels at 1.9 GHz,” IEEE Trans. Antennas

Propag., vol. 46, no. 6, pp. 947–948, Jun. 1998.

[29] C. C. Tan and N. C. Beaulieu, “On first-order Markov modeling for the Rayleigh fading channel,” IEEE Trans. Commun., vol. 48,

no. 12, pp. 2032–2040, Dec. 2000.

[30] Technical Report [On line]. Available: https://webspace.utexas.edu/cc39488/pdf/report.pdf.

[31] M. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. The Johns Hopkins University Press, 1981.

[32] J. Reichel, H. Schwarz, and M. Wien, “Joint scalable video model 11 (JSVM 11),” Joint Video Team, Doc. JVT-X202, Jul. 2007.


TABLE I

THE ENCODING PARAMETERS AND RATE-QUALITY MODEL PARAMETERS OF THE TESTED SEQUENCES.

          | Layer 1 (base layer)        | Layer 2                     | Layer 3
Sequence  | QP  r^I_1   r^P_1   q_1     | QP  r^I_2   r^P_2   q_2     | QP  r^I_3   r^P_3   q_3
foreman   | 34  0.0448  0.0095  0.9165  | 30  0.057   0.0144  0.0256  | 28  0.0264  0.0203  0.0106
bus       | 34  0.0942  0.0285  0.9167  | 30  0.0864  0.0397  0.0390  | 28  0.0420  0.0580  0.0126
flower    | 40  0.0846  0.0130  0.9117  | 36  0.075   0.0225  0.0400  | 35  0.028   0.0268  0.008
mobile    | 39  0.1098  0.0174  0.9021  | 35  0.0953  0.0338  0.0427  | 34  0.0356  0.0416  0.0076
Paris     | 37  0.0765  0.0079  0.9155  | 32  0.0711  0.0135  0.0375  | 30  0.0392  0.0182  0.0113

TABLE II

THE PERFORMANCE OF THE MDP-BASED POLICY.

Policy            | Paris   mobile  flower  bus     foreman
MDP-based Policy  | 0.9479  0.9437  0.9468  0.9471  0.9438
H1                | 0.9144  0.9023  0.9117  0.9167  0.9165
H2                | 0.9164  0.8585  0.9160  0.8930  0.8943
H3                | 0.8821  0.9199  0.9325  0.9337  0.9271

TABLE III

THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.5123 AND τ = 2.

Policy               | Paris   mobile  flower  bus     foreman
MDP-based Policy     | 0.9479  0.9437  0.9468  0.9471  0.9438
Near-optimal Policy  | 0.9434  0.9418  0.9444  0.9460  0.9416

TABLE IV

THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.7232 AND τ = 4.

Policy               | Paris   mobile  flower  bus     foreman
MDP-based Policy     | 0.9530  0.9461  0.9537  0.9494  0.9476
Near-optimal Policy  | 0.9431  0.9418  0.9468  0.9491  0.9414

[Figure 1: block diagram. Labels: Video Server, Scheduler, Transmitter, Markov Channel, Receiver, Requests, Requested Data, Channel and Receiver Buffer State.]

Fig. 1. Dynamic Scheduling for Video Transmission

[Figure 2: grid of data units labeled (frame, layer), from (-1,3) to (4,1), grouped into Expired Frames, Current Frame and Unexpired Frames.]

Fig. 2. Encoder Prediction Structure when L = 3

[Figure 3: six panels. (a) Foreman and (b) Paris: frame size (Mbit) versus frame index, measured data and adopted model. (c) Foreman and (d) Paris: frame size (Mbit) versus frame index for the 1st, 2nd and 3rd layers, measured data and adopted model. (e) Foreman and (f) Paris: MS-SSIM versus frame index.]

Fig. 3. Comparison between measured rate-quality characteristics and estimated rate-quality characteristics using the adopted rate-quality model. The measured rate and the estimated rate for P frames are shown in (a) and (b); the measured rate and the estimated rate for I frames are shown in (c) and (d); the measured MS-SSIM value and the modeled MS-SSIM value are shown in (e) and (f).

[Figure 4: receiver buffer diagram showing expired and unexpired frames (P and I frames) with received and unreceived data units.]

Fig. 4. An example of buffer state when there are no constraints applied on scheduling policies. $L_{GOP} = 8$, $L = 3$.

[Figure 5: two receiver buffer diagrams, (a) and (b), showing expired and unexpired frames, the frame sets pre, I and post, the interrupted frames (itr), and received and unreceived data units.]

Fig. 5. An illustration of the receiver buffer state when $L_{GOP} = 8$, $L = 3$. (a): $B^{pre}_t = (4, 2, 1)$, $B^{I}_t = (6, 2)$, $B^{post}_t = (3, 3, 1)$ and $N^{itr} = 0$; (b): $B^{pre}_t = (0, 0, 0)$, $B^{I}_t = (5, 2)$, $B^{post}_t = (4, 1, 0)$ and $N^{itr} = 2$.

[Figure 6: two buffer diagrams showing the frame sets pre, I and post before the displayed frame is played out and the sets pre+ and post+ afterwards.]

(a) The displayed frame is an I frame.

(b) The displayed frame is a P frame.

Fig. 6. Buffer state transition after a frame is played out. $L_{GOP} = 8$, $L = 3$.

[Figure 7: two state space diagrams. (a) $A_\mu$; (b) $\tilde{A}_\mu$.]

Fig. 7. The dynamics of the system $A_\mu$ and the corresponding simplified system $\tilde{A}_\mu$.
