  • 1

    Markov Decision Model for Perceptually

    Optimized Video Scheduling

    Chao Chen, Student Member, IEEE, Robert W. Heath Jr., Fellow, IEEE, Alan C.

    Bovik, Fellow, IEEE, and Gustavo de Veciana, Fellow, IEEE,

    Abstract

    Transmitting video over slow fading wireless channels with good perceptual quality is a challenging

    task because no time-diversity can be exploited to combat channel variations, especially when the frequency

    diversity and spatial diversity are not available due to the wireless system implementation. While quality-scalable

    video coding techniques make video source-rate adaptation possible, determining a good scheduling strategy

    which selectively schedules video data associated with different layers is a challenging problem. For the best

    performance of a wireless video system, the scheduler needs to consider the channel state, the buffer state and

    the perceptual video quality at the receiver. In this paper, we propose a scheduling algorithm to optimize the

    perceptual quality of scalably coded videos transmitted over slow fading channels. By modeling the dynamics

    of the channel as a Markov chain, we reduce the problem of dynamic video scheduling to a tractable Markov

    decision problem over a finite state space. We then employ an infinite-horizon average-reward maximization

    algorithm to maximize the time-average Multi-Scale Structural SIMilarity (MS-SSIM) index which has been

    shown to correlate highly with human judgments of video quality. Simulation results show that the proposed

    MDP-based scheduling policy achieves significant perceptual quality improvement over scheduling methods

    which do not explicitly exploit the channel dynamics. Furthermore, we propose an on-line scheduling method

    which not only performs nearly as well as the MDP-based policy but also has very low implementation

    complexity.

    Index Terms

    Videos, Scheduling algorithm, Wireless communication, Image quality.

    The authors are with Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station

    C0803, Austin TX - 78712-0240, USA e-mail: [email protected] This research was supported in part by Intel Inc. and Cisco Corp.

    under the VAWN program.

  • 2

    I. INTRODUCTION

    Video transmission over wireless channels is a challenging task. The throughput of wireless channels

    varies over time, making the delivery of real-time video challenging due to tight delay constraints.

    In particular, if the coherence time of the channel is comparable to the delay constraint, then the

    time-diversity of the channel cannot be exploited. Traditional channel coding methods cannot provide

    graceful visual quality degradation of the received video signal when deep fading happens. Hence, adaptive

    transmission techniques such as multi-layer scheduling and link-adaptation should be employed.

    Furthermore, video packets are structured. Due to the nature of predictive video coding algorithms, a

    video frame can be decoded only when its predictors are received at the receiver. Hence, the prediction

    structure of the video codec enforces a partial order on the transmissions of the video packets.

    Scalable video coding (SVC) is one approach for allowing flexible video transmission over channels

    with varying throughput [1], [2]. An SVC video encoder produces a layered video stream that contains

    a base layer and several enhancement layers. If the throughput is low, the transmitter can choose to

    transmit the base layer only, which provides a moderate, but acceptable, degree of visual quality at the

    receiver. If the channel conditions improve, the transmitter can transmit one, or more, enhancement

    layers to further improve the visual quality. Conceptually, SVC provides a means to adapt the data rate

    for wireless video transmission. The wireless transmitter can easily adapt the data rate by selectively

    scheduling video data associated with various layers rather than re-encoding the video sequence into

    a suitable rate.

    Designing scalable video scheduling algorithms for wireless channels is a complex task. The scheduling

    policy depends not only on the channel condition but also on the receiver buffer state. For example,

    if the receiver has successfully buffered base layer data over many frames, the scheduler could choose

    to transmit some enhancement layer data to improve the video quality even if the throughput is low.

    The scheduling policy also depends on the impact that a particular video data packet will have on the

    perceptual video quality. The scheduler should assign higher priority to packets which could result in

    higher perceptual quality improvements. An effective perceptual quality metric and an accurate rate-

    quality model are important for scheduling policy design. The objective of this paper is to develop

    scheduling algorithms to maximize the receiver perceptual video quality for scalable video transmission

    over wireless channels.

    A. Contributions

    In this paper, we assume that video sequences are encoded by an H.264/SVC-compatible scalable

    video encoder. We employ a finite state Markov chain (FSMC) to model the dynamics of the slow

  • 3

    fading channel. We also employ a rate-quality model to capture the relationship between the size

    of a video data packet and its contribution to the receiver perceptual quality. We model the dynamic

    video transmission system as a controlled Markov system. The visual quality measured in time-average

    MS-SSIM index is maximized by optimizing the scheduling policy via value iteration (see e.g. [3]).

    The specific contributions that we make are as follows:

    1) A tractable MDP-based formulation is proposed to design the optimal scheduling policy. Typical

    mobile users usually have available application layer storage space of several gigabytes. Thus,

    the buffer size can be regarded as infinite. Because the performance of scheduling policies

    depends on the buffer state, the policies need to be optimized over an infinitely large state space.

    By making some reasonable approximations, we fix the scheduling policy everywhere except on a finite set of states.

    We prove that optimizing the transmission policy for this finite-state set is equivalent to solving

    a semi-Markov decision problem. Based on this result, a value iteration algorithm is used to

    optimize the scheduling policy.

    2) Accurate prediction of visual quality is used as the optimization objective. In most video transmission

    methods, Peak Signal to Noise Ratio (PSNR) is used as the optimization objective. It

    is well known that PSNR does not accurately predict the perceptual quality of videos in many

    instances [4] [5] [6]. In this paper, we employ an infinite-horizon average-reward maximization

    formulation which is directly related to the time-average MS-SSIM index [7]. As shown in [8],

    the time-average MS-SSIM index correlates quite well with human judgments of visual quality.

    In this paper, the system state is mapped to MS-SSIM index using a simple rate-quality model.

    Then, the MS-SSIM value of each state is used as the state-value in the value iteration algorithm.

    3) A simple and near-optimal scheduling method is proposed. Computing the scheduling policy

    based on the MDP formulation requires extra system resources. We devise a simple and near-

    optimal scheduling algorithm which, similar to the MDP-based policy, proactively transmits data

    for later GOPs and dynamically schedules data associated with different layers.
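To make the average-reward formulation in the contributions above more concrete, the following is a minimal sketch of relative value iteration on a toy problem. It is not the paper's scheduler: the two states (a "bad" and a "good" channel), the two actions, the transition matrices `P`, the per-state rewards `r` (standing in for MS-SSIM values) and the function name `relative_value_iteration` are all hypothetical, and the sketch assumes a unichain, aperiodic MDP.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=10_000):
    """Average-reward value iteration with state 0 as the reference state.

    P[a] is the |S| x |S| transition matrix under action a.
    r[a] is the length-|S| vector of one-step rewards under action a.
    Returns (gain, policy): the long-run average reward and a greedy policy.
    """
    h = np.zeros(r.shape[1])              # relative (bias) values
    gain = 0.0
    for _ in range(max_iter):
        q = r + P @ h                     # q[a, s] = r[a, s] + sum_s' P[a, s, s'] * h[s']
        v = q.max(axis=0)                 # best one-step lookahead value per state
        gain, h_new = v[0], v - v[0]      # subtract the reference-state value
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = (r + P @ h).argmax(axis=0)
    return float(gain), policy

# Hypothetical toy model: the channel evolves independently of the action.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],   # action 0: send base layer only
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1: send base + enhancement
r = np.array([[0.6, 0.7],                 # reward of action 0 in states (bad, good)
              [0.5, 0.9]])                # reward of action 1 in states (bad, good)

gain, policy = relative_value_iteration(P, r)
```

In this toy case the greedy policy sends only the base layer in the bad state and adds the enhancement layer in the good state, and the gain is the average of the two resulting rewards. No discount factor has to be chosen, which is the property the average-reward formulation is selected for.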

    B. Related Work

    MDP-based stochastic control techniques have been proposed for video data scheduling [9]–[14].

    In [9], adaptive video transmission over a packet erasure channel was studied by modeling the buffer

    state as a controlled Markov chain. Later, in [10], an MDP-based scheduling algorithm was proposed

    for video transmission over packet loss networks. This work was further extended for wireless video

    streaming in [11]. The wireless channel was modeled as a binary symmetric channel. This channel

    model is justified for fast fading channels where the coherence time is much less than the delay

  • 4

    constraint. In that case, interleaving can be applied without violating the delay constraint and the

    channel will appear to be an i.i.d. channel. For slow fading channels, the bit error rate cannot be modeled

    as a constant. In [12] [13], a reinforcement learning framework was proposed for wireless video

    transmission. Their algorithm was based on MDP with discounted-reward maximization formulation.

    The transmitter learns the characteristics of the channel and the video sequence during the transmission

    process. The scheduling policy is updated according to the learned characteristics. In our previous

    work [14], an infinite-horizon average-reward maximization formulation was proposed. A very simple

    rate-quality model was employed to differentiate the importance of different layers. The difference

    between frames, however, was not incorporated in the rate-quality model. In addition, PSNR, instead

    of time-average MS-SSIM index, was used as the optimization objective.

    MDP-based methods have been used to solve other problems in adaptive video transmission. In

    [15], an MDP formulation was proposed for adaptive video playout and scheduling for single layer

    videos. A two-state channel model was used to represent “good” and “bad” channel conditions. The

    controller adapts the playout speed according to the receiver buffer state and channel state in order to

    optimize the PSNR of the signal at the receiver. In [16], an MDP-based formulation was introduced

    for the problem of real-time encoder rate-control. The derived optimal rate-control policy adapts the

    encoding bit rate according to the channel condition and video rate-quality characteristics. In [17],

    multi-user scheduling and rate adaptation in wireless local-area networks (WLAN) were studied. The

    users in a WLAN access the available resources in a decentralized manner. The multi-user scheduling

    problem was formulated as a competitive Markov decision process [18] and a Nash-equilibrium policy

    was computed via a value iteration algorithm.

    Among all the work mentioned above, the studies most closely related to ours are [10] and [12], which focus

    on single user scalable video transmission. The differences from our work are summarized as follows:

    • The video scheduling algorithm proposed in [10] was developed for packet loss networks. In

    a packet loss network, the throughput of each transmission slot is limited by congestion rather

    than by deep signal fading in the transmission medium. Hence, in [10], the channel was assumed

    to be time-invariant. Each packet was assumed to be lost or delayed independently of the other

    packets. Based on these assumptions, the scheduling policies for different video packets can be

    factorized. Thus the policy optimization problem was greatly simplified. For wireless channels,

    the channel state is time varying and thus the packet losses are not independent across coherence

    periods, hence scheduling policy factorization is not possible.

    • In [12], Zhang et al. studied single user video transmission over wireless channels. The control

  • 5

    policy was learned on-line through reinforcement learning. The major drawback of reinforcement

    learning is that it takes time to learn from the wrong scheduling actions. The scheduler may cause

    bad visual quality during the learning period. As shown in [12] and [19], with an accelerated

    on-line learning algorithm, the scheduling policy converges after 25 frames are transmitted. In

    our work, the channel dynamics are assumed to be known. Indeed, the finite state Markov model

    can be obtained analytically [20] or by measurement [21]. The channel model is combined with

    a simple rate-quality model to form a model-based MDP formulation. The scheduling policy thus

    can be derived off-line.

    • In [12] [16] [19], a discounted-reward maximization formulation is employed to trade off the

    visual quality of recent and later frames. The discounting factor needs to be chosen heuristically

    and affects the performance of the derived scheduling policy. In our work, an average-reward

    maximization formulation is proposed. This formulation is naturally related to the time-average

    MS-SSIM index which correlates well with human subjective judgments of visual quality [8].

    C. Notation Used

    $\mathbf{A}$ and $\mathbf{a}$ are examples of a matrix and a vector, respectively. $\mathcal{A}$ is a set. $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$. $\mathbf{1}$ is the all-ones vector and $\mathbf{0}$ is the zero vector. $\max\{\mathbf{a},\mathbf{b}\}$ and $\min\{\mathbf{a},\mathbf{b}\}$ are the componentwise maximum and minimum of vectors $\mathbf{a}$ and $\mathbf{b}$, respectively. $\mathbb{1}(\cdot)$ is the indicator function. $\lceil\cdot\rceil$ is the ceiling function. $P(\cdot)$ is the probability measure and $E[\cdot]$ is the expectation.

    II. SYSTEM MODEL

    In this section, we first describe the wireless video system to be considered. Then, we present our

    video codec configurations and introduce the rate-quality model based on MS-SSIM index. Finally,

    we present the Markov channel model to be used in the sequel.

    A. System Overview

    We consider a time-slotted scalable video transmission system over slow fading wireless channels.

    A video sequence is encoded with a quality-scalable video encoder and stored in a video server. The

    video server transmits video data to a mobile user via a wireless transmitter. In each slot, the server

    sends some video data at the request of a scheduler located at the wireless transmitter. These

    data are packetized at the wireless transmitter for physical layer transmission. The scheduler operates

    according to a scheduling policy which maps the channel and buffer state to the scheduling action (see

    Fig. 1).

  • 6

    We assume that the channel between the video server and the wireless transmitter is not the

    bottleneck of the link. Thus, from the perspective of the wireless transmitter, the whole video sequence

    is accessible. We also assume that the physical layer channel state information is available at the

    transmitter and that the modulation and coding scheme (MCS) is determined by a given physical layer

    link-adaptation policy.

    B. Video Codec Configuration

    We assume that video sequences are encoded by an H.264/SVC-compatible scalable video encoder.

    The duration of each frame, $\Delta T$, is called a frame slot. The video frames are uniformly partitioned into

    Groups of Pictures (GOPs). Every GOP has $L_{\mathrm{GOP}}$ frames. The first frame in a GOP is an I frame

    while the other frames are P frames. Every frame is encoded into $L$ layers. The first layer is the base

    layer; the other layers are enhancement layers. Every enhancement layer of a frame is predictively

    encoded using the lower layers of the frame. The base layer of a P frame is predictively encoded

    using the base layer of its preceding frame. The base layer of an I frame is encoded independently

    (see Fig. 2).
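The prediction structure described above can be sketched as a small dependency function. This is an illustration only: the absolute frame numbering, the `gop_start` argument and the function names `direct_predictors` and `all_predictors` are assumptions introduced here, not part of the paper. Layers are numbered from 1, with layer 1 as the base layer.

```python
def direct_predictors(f, layer, gop_start):
    """Data units that unit (f, layer) is directly predicted from.

    gop_start is the index of the I frame of the GOP that contains frame f.
    """
    if layer > 1:
        return [(f, layer - 1)]      # enhancement layer: lower layer of the same frame
    if f == gop_start:
        return []                    # base layer of an I frame: independently coded
    return [(f - 1, 1)]              # base layer of a P frame: base layer of previous frame

def all_predictors(f, layer, gop_start):
    """Every unit that must be received before (f, layer) can be decoded."""
    needed, stack = set(), [(f, layer)]
    while stack:
        for p in direct_predictors(*stack.pop(), gop_start):
            if p not in needed:
                needed.add(p)
                stack.append(p)
    return needed

# Third layer of the third frame in a GOP whose I frame is frame 0.
deps = all_predictors(2, 3, gop_start=0)
```

Note that no enhancement-layer unit of another frame ever appears in the result, which reflects the absence of inter-frame prediction among enhancement layers.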

    Each frame has a playout deadline at the receiver. In the following, frames whose deadlines have

    expired are called expired frames. The other frames are called unexpired frames. The first unexpired

    frame is called the “current frame”. At any time, the frames are indexed relative to the current frame

    as shown in Fig. 2. The video data in the $\ell$th layer of the $f$th frame is called the $(f, \ell)$th video data

    unit.

    We adopt the prediction structure in Fig. 2 rather than the “Hierarchical B” structure because

    no structural delay is introduced [1]. Specifically, in the “Hierarchical B” prediction structure, the

    encoding order differs from the display order, thus, the transmission of a frame must be delayed until

    all necessary predictors are received. Also, due to the time-varying nature of the wireless channels,

    the adaptive transmitter must drop some enhancement layers when channel throughput is low. So,

    if the enhancement layers are used to predict other frames as is the case in the “Hierarchical B”

    structure, dropped enhancement layers can give rise to error propagation and unpredictable visual

    quality degradation. At the possible cost of lower compression efficiency, the prediction structure that

    we use eliminates error propagation arising from enhancement layer losses, since there will be

    no inter-frame prediction among enhancement layers.

  • 7

    C. Rate-Quality Model

    The rate-quality model characterizes the relationship between the size of a video data unit and

    the visual quality improvement when it is correctly received. We adopt a simple model. For each

    P frame, let $r^P_\ell$ be the amount of data in the $\ell$th layer data unit. For each I frame, let $r^I_\ell$ be the

    amount of data in the $\ell$th layer data unit. Define $q_\ell$ to be the visual quality increment after the $\ell$th

    layer is correctly received, given all its predictors are also received. This model implies that the visual

    quality improvement incurred by the enhancement layers in one frame does not depend on whether

    the enhancement layers of other frames are received. This is true for the prediction structure given in

    Section II-B since there is no error propagation due to losses of enhancement layers.
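As a rough illustration of this additive model, the sketch below sums the per-layer quality increments of one frame over the layers that are received without a gap, starting from the base layer. The increment values in `q` and the function name `frame_quality` are hypothetical, and the sketch assumes the base layer of the frame is itself decodable (that is, the base layers it is predicted from have been received).

```python
def frame_quality(received_layers, q):
    """Quality of one frame under the additive model.

    received_layers: set of 1-based layer indices received for this frame.
    q: list of per-layer quality increments, q[0] for the base layer.
    A layer only contributes if every lower layer of the frame was received.
    """
    total = 0.0
    for layer, q_l in enumerate(q, start=1):
        if layer not in received_layers:
            break                    # higher layers cannot be decoded without this one
        total += q_l
    return total

q = [0.90, 0.05, 0.03]               # hypothetical increments for L = 3 layers
full = frame_quality({1, 2, 3}, q)
gap = frame_quality({1, 3}, q)       # layer 3 is useless without layer 2
none = frame_quality({2, 3}, q)      # nothing decodable without the base layer
```

Because the increments of one frame do not depend on the enhancement layers of other frames, the time-average quality is simply the mean of `frame_quality` over the played-out frames.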

    Conventional image quality measures such as the PSNR reflect absolute signal fidelity without

    accounting for perceptual visual quality. Recently, a variety of models that accurately predict perceptual

    video quality have been proposed [7], [22]–[25]. In our formulation, we adopt MS-SSIM index as the

    visual quality measure [7], since it has been shown to correlate quite well with perceptual visual

    quality and it is of reasonable computational complexity [8].

    The MS-SSIM index of a video sequence ranges from 0 to 1. The larger the index, the better

    the quality. In our rate-quality model, the quality increment $q_\ell$ is measured using the MS-SSIM index.

    Therefore, the quantity $q_\ell \in [0, 1]$. Larger values of $q_\ell$ mean larger quality improvement can be achieved by transmitting the $\ell$th layer data units.

    In a real video sequence, rate-quality characteristics vary from frame to frame. For simplicity, we

    use the average value of the measured rate-quality characteristics as estimates of $r^I_\ell$, $r^P_\ell$ and $q_\ell$. In Fig.

    3, the data rates and MS-SSIM values of two video test sequences, “Foreman” and “Paris”, are shown.

    These two sequences are widely used in visual quality assessment. “Foreman” has higher temporal

    complexity and “Paris” has higher spatial complexity [26]. As shown in Fig. 3, our proposed model

    is a good fit for the rate-quality characteristics.

    D. Channel Model

    We focus on scheduling for a slow fading channel. By slow fading, we mean that the coherence time

    of the channel is less than the duration of a GOP and larger than the duration of a frame. Assuming

    the mobile users are moving at a walking speed of 1.5 m/s and the carrier frequency is 2 GHz, the Doppler

    spread is about 10Hz. The coherence time is about 100ms. A typical GOP duration is about 1 second

    and a frame slot is about 30ms. Hence, for pedestrian video users, wireless channels are slow fading.
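The numbers in this paragraph can be reproduced with a short calculation. The sketch below assumes the maximum Doppler shift $f_d = v f_c / c$ and the common rule of thumb $T_c \approx 1/f_d$ for the coherence time; the paper does not state which approximation it uses, so the second formula in particular is an assumption, and the function names are introduced here for illustration.

```python
C_LIGHT = 3.0e8                      # speed of light in m/s (rounded)

def doppler_spread_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c."""
    return speed_mps * carrier_hz / C_LIGHT

def coherence_time_s(speed_mps, carrier_hz):
    """Rule-of-thumb coherence time T_c ~ 1 / f_d (an assumed approximation)."""
    return 1.0 / doppler_spread_hz(speed_mps, carrier_hz)

f_d = doppler_spread_hz(1.5, 2.0e9)  # pedestrian at 1.5 m/s, 2 GHz carrier
t_c = coherence_time_s(1.5, 2.0e9)

frame_slot_s, gop_s = 0.030, 1.0     # values quoted in the text
is_slow_fading = frame_slot_s < t_c < gop_s
```

With these inputs the coherence time falls between the frame slot and the GOP duration, which is the sense in which the channel is called slow fading here.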

    As the channel state is stable during each frame slot, the scheduling decision is made on a frame-by-frame

    basis. At the beginning of each frame slot, a frame is played out. Then, the wireless transmitter

  • 8

    schedules video data units for transmission according to a scheduling action. The scheduling action is

    defined as an ordered collection of video data units

    $$u = \{(f_1, \ell_1), (f_2, \ell_2), \cdots, (f_{|u|}, \ell_{|u|})\}.$$

    When a scheduling action $u$ is taken, the data units contained in $u$ are transmitted sequentially. At the

    physical layer, each scheduled data unit is packetized into physical layer packets, and each packet is

    retransmitted whenever errors occur, until it is acknowledged. The MCS used in transmitting data

    packets is determined by a link-adaptation policy. In this paper, we focus on scheduling policy design

    and assume that the link-adaptation policy is given.

    In [20] and [27], it is shown that the first-order FSMC can be utilized to accurately describe the

    first-order channel state transition probabilities for Rayleigh fading channels. First-order FSMC models

    have also been validated in [21] and [28] by channel measurements of urban area wireless channels. In

    this paper, we employ a first-order FSMC to describe the dynamics of the channel state. It should be

    noted that, as pointed out in [29], a first-order FSMC is not sufficient to describe high-order channel state

    distributions. Generally, the autocorrelation function (ACF) of a first-order FSMC is exponentially

    decreasing and the ACF of a Rayleigh fading channel is a zeroth-order Bessel function of the first

    kind. To model the higher order dynamics of the wireless channel, at the cost of higher complexity,

    a higher order Markov channel model can be applied.

    At the physical layer, in the $t$th frame slot, the transmission bit rate $R_t$ is determined by the MCS

    and the packet error rate $p_t$ is determined by both the channel state and the MCS. Under the given

    link adaptation method, the chosen MCS is a function of the channel state. Thus, there is a one-to-one

    mapping from channel state to the tuple $(R_t, p_t)$. Due to the Markov property of the channel state,

    $(R_t, p_t)$ can also be modeled by an FSMC. The channel state space is $\mathcal{C} = \{C_1, \ldots, C_{|\mathcal{C}|}\}$, where

    $C_i = (R_i, p_i)$ is the $i$th channel state. The state transition matrix $\mathbf{P}^c$ is a $|\mathcal{C}| \times |\mathcal{C}|$ matrix with entry $P^c_{i,j} = P(C_j \mid C_i)$ being the transition probability from state $(R_i, p_i)$ to $(R_j, p_j)$.
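A minimal sketch of such a channel model is given below. The three states, their rates and packet error rates, and the transition matrix `P_c` are made-up values chosen only so that the result is easy to check by hand; they are not taken from the paper. The sketch computes the stationary distribution of the chain and the resulting long-run average transmission rate.

```python
import numpy as np

# Hypothetical channel states C_i = (R_i in bit/s, p_i as packet error rate).
states = [(0.5e6, 0.10), (1.0e6, 0.05), (2.0e6, 0.01)]

# Hypothetical first-order transition matrix; row i holds P(C_j | C_i).
P_c = np.array([[0.8, 0.2, 0.0],
                [0.1, 0.8, 0.1],
                [0.0, 0.2, 0.8]])

def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 in the least-squares sense."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi = stationary_distribution(P_c)
mean_rate = float(pi @ np.array([R for R, _ in states]))
```

Each row of `P_c` sums to one, and transitions only occur between neighbouring states, a structure often used for slowly varying channels. A measured or analytically derived matrix would replace these placeholder values in practice.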

    III. PROBLEM FORMULATION

    In this section, we define the scheduler’s state space and the policies to be considered. Then, we show

    how to simplify the scheduling problem to a finite-state Markov decision problem using reasonable

    approximations. An infinite-horizon average-reward maximization MDP formulation is proposed to

    optimize the scheduling policy so as to improve the time-average MS-SSIM index at the receiver.

  • 9

    A. Scheduling Policy and State Space

    Considering all the possible scheduling actions makes defining the scheduling policy and representing

    the buffer state unmanageably complicated. If we do not apply any constraint on the scheduling

    actions, the receiver buffer state could look like Fig. 4. On the one hand, to represent the buffer state,

    the frame index and the layer index of each received data unit need to be recorded. Because the

    number of received data units is not bounded, we cannot represent all possible buffer states using a

    finite-dimension vector space. On the other hand, the scheduling actions which give rise to the buffer

    state in Fig. 4 cannot provide optimal visual quality at the receiver. As shown in Fig. 4, some video

    data units are transmitted before their predictors. If their predictors are not received before their playout

    deadlines, these units are undecodable and useless. In this paper, by applying reasonable constraints

    on the scheduling actions, we concentrate on those scheduling strategies which are likely to achieve

    good performance. Specifically, we consider the scheduling policies which comply with the following

    constraints.

    Constraint 1: The scheduler always schedules a data unit later than its predictors in the prediction

    structure.

    Constraint 2: The amount of video data scheduled in the $t$th slot is just larger than $R_t \times \Delta T$, i.e., the amount of data which can be transmitted in the slot.

    Constraint 3: The scheduler never schedules more enhancement layer data units for later P frames

    than sooner P frames in the same GOP.

    Constraint 1 is applied to make sure that the transmission order is compatible with the prediction

    order given in Section II-B, since a data unit can be decoded only when its predictors are received.

    Constraint 2 forces the transmitter to keep busy transmitting data during the entire slot. With Constraint

    3, at any time and for all the P frames within a GOP, the scheduler does not sacrifice the quality of

    the frames which will be displayed sooner for the frames to be displayed later by transmitting more

    enhancement layer data for the latter. Because the optimization objective is time-average MS-SSIM

    index, the quality of each P frame is equally important. Transmitting more enhancement layer data

    for later frames does not help to improve the time-average MS-SSIM value.
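Constraints 1 and 2 can be sketched as simple checks on an ordered scheduling action. The helper names below (`respects_prediction_order`, `fill_slot`, `toy_predictors`) and the unit sizes are hypothetical. Constraint 2 is read here as "take the shortest prefix of a candidate list whose total size exceeds the slot capacity", which is one possible interpretation of "just larger than".

```python
def respects_prediction_order(action, received, direct_predictors):
    """Constraint 1: each unit follows its direct predictors.

    A predictor counts as available if it was received earlier or appears
    earlier in the same action.
    """
    available = set(received)
    for unit in action:
        if any(p not in available for p in direct_predictors(unit)):
            return False
        available.add(unit)
    return True

def fill_slot(candidates, size_of, capacity_bits):
    """Constraint 2 (one reading): shortest prefix whose size exceeds capacity."""
    action, total = [], 0.0
    for unit in candidates:
        if total > capacity_bits:
            break                    # the slot is already fully occupied
        action.append(unit)
        total += size_of(unit)
    return action

def toy_predictors(unit):
    """Toy prediction structure for a single GOP whose I frame is frame 0."""
    f, layer = unit
    if layer > 1:
        return [(f, layer - 1)]
    return [] if f == 0 else [(f - 1, 1)]

ok = respects_prediction_order([(0, 1), (1, 1), (0, 2)], set(), toy_predictors)
bad = respects_prediction_order([(1, 1), (0, 1)], set(), toy_predictors)

# Capacity R_t * dT = 1e6 bit/s * 0.03 s = 30000 bits; every unit is 20000 bits.
action = fill_slot([(0, 1), (1, 1), (2, 1)], lambda u: 20000, 30000)
```

The second unit is included even though it does not fit entirely, so the transmitter stays busy for the whole slot; the unfinished part would carry over into the next slot.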

    Note that, although the P frames within a GOP are equally important for time-average MS-SSIM

    index, the frames in different GOPs are not. As discussed in Section I, when the channel throughput is

    very low, it is beneficial to sacrifice P frames in current GOP for transmitting the I frame of the next

    GOP. To differentiate the importance of current and future GOPs, we partition the data units of the

    unexpired frames into three sets: $\mathcal{I}^{\mathrm{pre}}$, $\mathcal{I}$ and $\mathcal{I}^{\mathrm{post}}$. The set $\mathcal{I}$ contains the unexpired data units of the

  • 10

    first unexpired I frame, $\mathcal{I}^{\mathrm{pre}}$ contains data units before the first unexpired I frame, and $\mathcal{I}^{\mathrm{post}}$ contains the remaining unexpired data units (see Fig. 5). In the following, we define the buffer state spaces for the three sets as $\mathcal{B}^{I}$, $\mathcal{B}^{\mathrm{pre}}$ and $\mathcal{B}^{\mathrm{post}}$, respectively. The overall buffer state space is $\mathcal{B} = \mathcal{B}^{\mathrm{pre}} \times \mathcal{B}^{I} \times \mathcal{B}^{\mathrm{post}}$.

    $\mathcal{B}^{I}$: At the $t$th slot, the state of $\mathcal{I}$ is defined as $B^{I}_t = (f^{I}_t, b^{I}_t)$, where $f^{I}_t \in \{1, \cdots, L_{\mathrm{GOP}}\}$ is the frame index of the first unexpired I frame and $b^{I}_t$ is the number of the received data units of $\mathcal{I}$.

    $\mathcal{B}^{\mathrm{pre}}$: According to Constraint 3, the number of data units received in $\mathcal{I}^{\mathrm{pre}}$ is non-increasing with respect to the frame index. Hence, we only need to record the number of received data units for each layer. We define the buffer state of $\mathcal{I}^{\mathrm{pre}}$ as an $L$-dimensional vector $B^{\mathrm{pre}}_t = (b^{\mathrm{pre}}_1, b^{\mathrm{pre}}_2, \cdots, b^{\mathrm{pre}}_L)$, where $b^{\mathrm{pre}}_\ell$ is the number of the received data units in the $\ell$th layer of $\mathcal{I}^{\mathrm{pre}}$.

    $\mathcal{B}^{\mathrm{post}}$: It is noted that Constraint 3 is only applied within each GOP. Hence, we cannot define the state space of $\mathcal{I}^{\mathrm{post}}$ in the same way as we did for $\mathcal{I}^{\mathrm{pre}}$. In principle, we need to record the number of received data units for each frame in $\mathcal{I}^{\mathrm{post}}$. In that case, we cannot define a vector space with a fixed number of dimensions to represent the state of $\mathcal{I}^{\mathrm{post}}$. Hence, for simplicity, we extend Constraint 3 to all the frames in $\mathcal{I}^{\mathrm{post}}$. In other words, the scheduler never transmits more enhancement layers for later frames than sooner frames in $\mathcal{I}^{\mathrm{post}}$. Similar to $\mathcal{I}^{\mathrm{pre}}$, we define the buffer state of $\mathcal{I}^{\mathrm{post}}$ as an $L$-dimensional vector $B^{\mathrm{post}}_t = (b^{\mathrm{post}}_1, b^{\mathrm{post}}_2, \cdots, b^{\mathrm{post}}_L)$, where $b^{\mathrm{post}}_\ell$ is the number of the received data units in the $\ell$th layer of $\mathcal{I}^{\mathrm{post}}$.

    Remark: By extending Constraint 3 to all the GOPs in $\mathcal{I}^{\mathrm{post}}$, we actually rule out the option of transmitting more data units for the I frames in later GOPs of $\mathcal{I}^{\mathrm{post}}$. This may potentially degrade performance, but the degradation is negligible. Transmitting more data units for a later I frame is necessary only

    when the channel state is bad throughout the whole GOP. However, as we assumed in Section II-D,

    the coherence time of the channel is much less than the duration of a GOP. The channel is therefore unlikely

    to be bad throughout the whole GOP. Hence, quality degradation is unlikely to occur.

    When no video data units for the current frame are received, the decoder cannot continue decoding

    the frame. The video stream will experience an interruption until all the necessary base layer data

    units for decoding the current frame are received. We define an interruption state $\mathcal{N}^{\mathrm{itr}}_t$, which is the set of the unreceived base layer data units for decoding the current frame at slot $t$ (see Fig. 5(b)). When

    the current frame is decodable, the interruption state $\mathcal{N}^{\mathrm{itr}}_t = \emptyset$. $\mathcal{N}^{\mathrm{itr}}_t$ contains at most $L_{\mathrm{GOP}} - 1$ data units because I frames resynchronize the decoding process and terminate the interruption. Because

  • 11

    every data unit is transmitted only when all its predictors are received, the data units needed for

    decoding the current frame must be composed of a sequence of consecutive base layer data units,

    i.e., $\mathcal{N}^{\mathrm{itr}}_t = \{(-N^{\mathrm{itr}}_t + 1, 1), (-N^{\mathrm{itr}}_t + 2, 1), \cdots, (0, 1)\}$. Hence, the interruption state can be simply represented by the number $N^{\mathrm{itr}}_t = |\mathcal{N}^{\mathrm{itr}}_t|$.

    The system state space $\mathcal{S}$ is defined as the product of the channel state, the buffer state and the interruption state. At slot $t$, the system state is $S_t = (C_t, B_t, N^{\mathrm{itr}}_t)$, where $B_t = (B^{\mathrm{pre}}_t, B^{I}_t, B^{\mathrm{post}}_t)$. For each state $S_t \in \mathcal{S}$, we define a feasible control set $\mathcal{U}_{S_t}$ which contains all the scheduling actions complying with all three constraints. At any time, the state $S_t$ contains all the information about the receiver buffer and the channel. The transmitter must decide which action in $\mathcal{U}_{S_t}$ to take in order to maximize the time-average MS-SSIM index value. We define the scheduling policy $\mu(\cdot)$ as the mapping from the system state $S_t$ to an action in $\mathcal{U}_{S_t}$. In the following sections, we show how to optimize the scheduling policy $\mu(\cdot)$.

    B. Policy Simplification

    Because the receiver buffer size is regarded as essentially infinite, the state space $\mathcal{B}^{post}$ is infinite. Optimizing the scheduling policy over this infinite state space is intractable. In the following,

    we reduce the state space to a finite one by reasonable approximations.

    When many frames are buffered at the receiver, the scheduler can transmit more enhancement

    layers because there is enough time before the frames are played out. Based on this observation,

    we define a window $\mathcal{W}$ which contains the data units within the first $W$ unexpired frames. The scheduler is restricted to work as follows: if all the data in $\mathcal{W}$ are received and $\mathcal{N}^{itr}_t = \emptyset$, the scheduler schedules as many enhancement layers as possible. Otherwise, the scheduler only focuses on scheduling the video data in $\mathcal{W}$ and $\mathcal{N}^{itr}_t$.

    By using the window $\mathcal{W}$, although the state space is still infinite, we fix the scheduling actions

    outside a finite state set. In other words, it is only necessary to find the optimal actions for a finite-

    state set. The window size W provides a tradeoff between complexity and optimality. The larger the

    window, the less constrained the control policy but the higher the complexity.

    It should be noted that the window defined here is different from the sliding window defined in [10]

    and [19]. Our scheduling policy allows the transmitter to transmit the data units outside the window.
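The window rule described above amounts to a two-mode switch. The following is a minimal Python sketch of that switch, assuming data units are represented as `(frame, layer)` tuples; the function name and the two mode labels are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

DataUnit = Tuple[int, int]  # (frame index, layer index); an assumed representation


def scheduling_mode(window_units: Set[DataUnit],
                    received_units: Set[DataUnit],
                    n_itr: int) -> str:
    """Return which of the two restricted behaviours applies in the current slot."""
    window_complete = window_units.issubset(received_units)
    if window_complete and n_itr == 0:
        # All window data are buffered and there is no interruption:
        # schedule as many enhancement layers as possible.
        return "greedy_enhancement"
    # Otherwise concentrate on the window and on the interruption set.
    return "focus_on_window"
```

Only the second mode requires optimization; the first mode is fixed, which is what makes the later reduction to a finite state set possible.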

    C. Transition Probability

    Let $S_t = (C_t, B_t, N^{itr}_t)$ and $\mathcal{U}_{S_t}$ be the system state and the corresponding feasible control set at slot $t$, where $C_t = (R_t, p_t)$ and $B_t = (B^{pre}_t, B^{I}_t, B^{post}_t)$. At the beginning of each slot, one frame is decoded and played out. We let $B^{+}_t = (B^{pre+}_t, B^{I+}_t, B^{post+}_t)$ denote the buffer state right after the first frame is displayed. If $f^{I}_t = 1$, i.e., the decoded frame is an I frame, then the frame set $\mathcal{I}$ becomes the next I frame, i.e., the $L_{GOP}$th frame (see Fig. 6(a)). Hence, the buffer state is

    $$B^{I+}_t = \left(L_{GOP}, \sum_{\ell=1}^{L} \mathbb{1}(b^{post}_\ell \geq L_{GOP})\right), \quad (1)$$

    where $\sum_{\ell=1}^{L} \mathbb{1}(b^{post}_\ell \geq L_{GOP})$ is the number of received layers in the next I frame. Meanwhile, $\mathcal{I}^{pre}$ becomes the first $L_{GOP} - 1$ frames and $\mathcal{I}^{post}$ contains the frames whose index is larger than $L_{GOP}$. Thus, we have

    $$B^{pre+}_t = \min\{B^{post}_t, (L_{GOP} - 1)\mathbf{1}\} \quad (2)$$

    and

    $$B^{post+}_t = \max\{B^{post}_t - L_{GOP}\mathbf{1}, \mathbf{0}\}. \quad (3)$$

    If the decoded frame is not an I frame, the frame set $\mathcal{I}^{post}$ is not affected and the buffer state $B^{post}_t$ does not change (see Fig. 6(b)). $B^{pre}_t$ becomes

    $$B^{pre+}_t = \max\{B^{pre}_t - \mathbf{1}, \mathbf{0}\}. \quad (4)$$
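The playout updates (1) to (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the list/tuple representation of the buffer state are assumptions, with `b_pre` and `b_post` holding one count per layer and `b_i = (f_i, layers_i)` holding the I-frame index and its number of received layers.

```python
from typing import List, Tuple


def advance_buffer_after_playout(
    b_pre: List[int],
    b_i: Tuple[int, int],
    b_post: List[int],
    l_gop: int,
) -> Tuple[List[int], Tuple[int, int], List[int]]:
    """Buffer state right after one frame is displayed, following (1)-(4)."""
    f_i, layers_i = b_i
    if f_i == 1:
        # The displayed frame is an I frame: the next I frame is l_gop frames ahead.
        new_b_i = (l_gop, sum(1 for b in b_post if b >= l_gop))   # (1)
        new_pre = [min(b, l_gop - 1) for b in b_post]             # (2)
        new_post = [max(b - l_gop, 0) for b in b_post]            # (3)
    else:
        # The displayed frame is a P frame: only the "pre" part shrinks.
        new_b_i = (f_i - 1, layers_i)
        new_pre = [max(b - 1, 0) for b in b_pre]                  # (4)
        new_post = list(b_post)
    return new_pre, new_b_i, new_post
```

For example, with the buffer state of Fig. 5(a), i.e., `b_pre = [4, 2, 1]`, `b_i = (6, 2)`, `b_post = [3, 3, 1]` and `l_gop = 8`, the P-frame branch applies and the sketch returns `([3, 1, 0], (5, 2), [3, 3, 1])`.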

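    Summarizing (1), (2), (3) and (4), we have

    $$B^{pre+}_t = \begin{cases} \min\{B^{post}_t, (L_{GOP} - 1)\mathbf{1}\} & \text{if } f^{I}_t = 1, \\ \max\{B^{pre}_t - \mathbf{1}, \mathbf{0}\} & \text{if } f^{I}_t \neq 1, \end{cases}$$

    $$B^{I+}_t = \begin{cases} \left(L_{GOP}, \sum_{\ell=1}^{L} \mathbb{1}(b^{post}_\ell \geq L_{GOP})\right) & \text{if } f^{I}_t = 1, \\ (f^{I}_t - 1, \ell^{I}_t) & \text{if } f^{I}_t \neq 1, \end{cases}$$

    and

    $$B^{post+}_t = \begin{cases} \max\{B^{post}_t - L_{GOP}\mathbf{1}, \mathbf{0}\} & \text{if } f^{I}_t = 1, \\ B^{post}_t & \text{if } f^{I}_t \neq 1. \end{cases}$$

    After the first frame is displayed, the transmitter begins to sequentially transmit the collection of video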

    data units indicated by the action $u_t = \mu(S_t) = \{(f_1, \ell_1), \cdots, (f_{|u_t|}, \ell_{|u_t|})\}$. Let $\Delta u_t = \{(f_1, \ell_1), \cdots, (f_{n_t}, \ell_{n_t})\}$ denote the data units completely received by the end of the slot, where $n_t$ is the number of received data units. Among the data units in $\Delta u_t$, let $\Delta B^{pre}_t = (\Delta b^{pre}_1, \Delta b^{pre}_2, \cdots, \Delta b^{pre}_L)$ be the number of newly received data units for each layer in frame set $\mathcal{I}^{pre}$. Similarly, we denote by $\Delta B^{post}_t = (\Delta b^{post}_1, \Delta b^{post}_2, \cdots, \Delta b^{post}_L)$ the number of newly received data units for each layer in frame set $\mathcal{I}^{post}$ and by $\Delta \ell^{I}$ the number of received data units for $\mathcal{I}$. At the beginning of the $(t + 1)$th slot, we have the following state transition relationship:

    $$B^{pre}_{t+1} = B^{pre+}_t + \Delta B^{pre}_t, \quad (5)$$

    $$B^{I}_{t+1} = (f^{I+}_t, \ell^{I+}_t + \Delta \ell^{I}), \quad (6)$$

    $$B^{post}_{t+1} = B^{post+}_t + \Delta B^{post}_t. \quad (7)$$

    As for the interruption state $N^{itr}_t$, after the first frame is played out, the second frame becomes the current frame. If the displayed frame is an I frame, $N^{itr+}_t = \mathbb{1}(b^{I} = 0)$. Here, $\mathbb{1}(b^{I} = 0)$ indicates whether the base layer of the displayed frame is received. If the displayed frame is the last frame of a GOP, then $N^{itr+}_t = 0$. If the displayed frame is neither an I frame nor the last frame in a GOP, we have $N^{itr+}_t = N^{itr}_t + \mathbb{1}(b^{pre}_1 = 0)$. At the end of the slot, the data units in $\mathcal{N}^{itr+}_t$ which are also in $\Delta u_t$ are removed from $\mathcal{N}^{itr+}_t$. Thus,

    $$N^{itr}_{t+1} = N^{itr+}_t - |\mathcal{N}^{itr+}_t \cap \Delta u_t|. \quad (8)$$

    The amount of video data in $\Delta u_t$, denoted by $R(B^{I}_t, \Delta u_t)$, can be estimated according to the buffer state $B^{I}_t$ and the rate-quality model introduced in Section II-C. Specifically, for each data unit in $\Delta u_t$, we first determine whether it belongs to an I frame or a P frame according to $B^{I}_t$ and then estimate the amount of data by the rate-quality model. The set $\Delta u_t$ records the completely transmitted data units up to the $(f_{n_t}, \ell_{n_t})$th data unit. However, data unit $(f_{n_t+1}, \ell_{n_t+1})$ is only partially received. Denoting the amount of data in unit $(f_{n_t+1}, \ell_{n_t+1})$ by $\tilde{R}(B^{I}_t, \Delta u_t)$, the amount of received data is at least $R(B^{I}_t, \Delta u_t)$ and at most $R(B^{I}_t, \Delta u_t) + \tilde{R}(B^{I}_t, \Delta u_t)$. Assuming the physical layer packet length is $L_{PHY}$, there are $N = \lfloor \Delta T \times R_t / L_{PHY} \rfloor$ packet transmissions during a time slot of duration $\Delta T$. The number of successfully transmitted packets is at least $N_l = \lceil R(B^{I}_t, \Delta u_t) / L_{PHY} \rceil$ and is less than $N_h = \lceil (R(B^{I}_t, \Delta u_t) + \tilde{R}(B^{I}_t, \Delta u_t)) / L_{PHY} \rceil$. As assumed in Section II-D, the channel state is constant over each slot. Thus, the packet losses are independent within each slot, and the number of successful packet transmissions in a slot is binomially distributed. Hence, the state transition probability from $S_t = (C_t, B_t, N^{itr}_t)$ to $S_{t+1} = (C_{t+1}, B_{t+1}, N^{itr}_{t+1})$ is

    $$P_\mu(S_{t+1}|S_t) = \left[\sum_{n=N_l}^{N_h-1} \binom{N}{n} p_t^{N-n} (1 - p_t)^{n}\right] P(C_{t+1}|C_t), \quad (9)$$

    where the first multiplicative term is the transition probability of the receiver buffer state from $(B_t, N^{itr}_t)$ to $(B_{t+1}, N^{itr}_{t+1})$ and the second term is the transition probability of the channel state from $C_t$ to $C_{t+1}$.
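The bracketed binomial term in (9) can be evaluated directly. The sketch below is illustrative only: the function name and argument names are assumptions, `slot_capacity` stands for $\Delta T \times R_t$, and the upper summation limit is additionally capped at $N$ so that the binomial coefficient stays well defined.

```python
from math import ceil, comb, floor


def buffer_transition_probability(r_done: float, r_partial: float,
                                  slot_capacity: float, l_phy: float,
                                  p_loss: float) -> float:
    """Probability that exactly the units in the action prefix are completed.

    r_done    : data amount of the completely received units, R
    r_partial : data amount of the next, only partially received unit, R-tilde
    p_loss    : packet error probability p_t of the current channel state
    """
    n_total = floor(slot_capacity / l_phy)           # N
    n_low = ceil(r_done / l_phy)                     # N_l
    n_high = ceil((r_done + r_partial) / l_phy)      # N_h
    upper = min(n_high - 1, n_total)
    return sum(
        comb(n_total, n) * p_loss ** (n_total - n) * (1.0 - p_loss) ** n
        for n in range(n_low, upper + 1)
    )
```

Multiplying the returned value by the channel transition probability $P(C_{t+1}|C_t)$ gives the full transition probability in (9).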


    D. Optimization Objective

    At the beginning of each time slot $t$, the first frame in the window is played out and the MS-SSIM index is

    $$Q(S_t) = \sum_{\ell=1}^{L} q_\ell \times \mathbb{1}_\ell(S_t), \quad (10)$$

    where $\mathbb{1}_\ell(S_t)$ is the indicator of whether the $\ell$th layer of the displayed frame is received at state $S_t$. The quantity $q_\ell$ is the rate-quality model parameter defined in Section II-C. Because of the way that the MS-SSIM index is defined, the quantity $Q(S_t)$ is also bounded in $[0, 1]$. Our aim is to find the optimal policy $\mu^*(\cdot)$ which maximizes the time-average MS-SSIM index, i.e.,

    $$J_\mu = \lim_{N \to \infty} \frac{1}{N} E_\mu\left\{\sum_{t=0}^{N-1} Q(S_t)\right\}. \quad (11)$$
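As a small numerical illustration of (10), the sketch below evaluates the per-frame quality from the layer-reception indicators and averages it over a finite sequence of displayed frames. The function names are hypothetical, the $q_\ell$ values are those listed for "foreman" in Table I, and the finite-horizon average is only an empirical stand-in for the limit in (11).

```python
from typing import Sequence


def frame_quality(q: Sequence[float], received: Sequence[bool]) -> float:
    """Q(S_t) in (10): sum of q_l over the layers of the displayed frame that arrived."""
    return sum(q_l for q_l, got in zip(q, received) if got)


def time_average_quality(q: Sequence[float],
                         trajectory: Sequence[Sequence[bool]]) -> float:
    """Empirical time average of Q over a finite sequence of displayed frames."""
    return sum(frame_quality(q, r) for r in trajectory) / len(trajectory)


Q_FOREMAN = (0.9165, 0.0256, 0.0106)  # q_1, q_2, q_3 from Table I
```

With all three layers received the quality is about 0.9527, and with only the base layer it is 0.9165; a two-frame trajectory with one frame of each kind averages to about 0.9346.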

    E. Finite State Problem Formulation

    Using the window defined in Section III-B, we fix the transmission policy outside a finite state set.

    The buffer state space, however, is still infinite and the system state evolves in this infinite state space.

    In the following, we show how to simplify this infinite state space problem to a finite-state problem.

    Note that we only need to optimize the policy $\mu(\cdot)$ when some of the video data in the window have not been received or $N^{itr} \neq 0$. We formally define this finite set $\mathcal{S}_W$ as follows:

    $$\mathcal{S}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}(B) \nsupseteq \mathcal{V}_W \text{ or } N^{itr} \neq 0\}, \quad (12)$$

    where $\mathcal{V}_W$ denotes the set of video data units in $\mathcal{W}$ and $\mathcal{V}(B)$ is the set of buffered video data units. We define another subset of $\mathcal{S}$ as follows:

    $$\bar{\mathcal{S}}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}_W \subset \mathcal{V}(B) \text{ and } N^{itr} = 0\}. \quad (13)$$

    For all the states in $\bar{\mathcal{S}}_W$, the video data in $\mathcal{W}$ and $\mathcal{N}^{itr}$ are all received. Note that, because the transmitter always transmits the video data units in $\mathcal{W}$ and $\mathcal{N}^{itr}$ with higher priority, $\mathcal{S}_W$ and $\bar{\mathcal{S}}_W$ form a partition of the state space $\mathcal{S}$. In other words, we have $\mathcal{S}_W \cup \bar{\mathcal{S}}_W = \mathcal{S}$ and $\mathcal{S}_W \cap \bar{\mathcal{S}}_W = \emptyset$.

    Given a policy $\mu(\cdot)$, the system state transits as a controlled Markov chain in set $\mathcal{S}_W$ and as an uncontrolled Markov chain in set $\bar{\mathcal{S}}_W$. Because the transmission rate is finite, the number of states in $\bar{\mathcal{S}}_W$ which can be reached from $\mathcal{S}_W$ in one step is also finite. We formally define this set of states as follows:

    $$\mathcal{S}_\Delta = \{S \mid S \in \bar{\mathcal{S}}_W;\ \exists S' \in \mathcal{S}_W \text{ s.t. } P_\mu(S|S') > 0\}. \quad (14)$$

    Once the system moves into the set $\bar{\mathcal{S}}_W$, the system state hits a state in $\mathcal{S}_\Delta$ and then stays in $\bar{\mathcal{S}}_W$ for some time. During this period, the decoded video quality is always $\hat{Q} = \sum_{\ell=1}^{L} q_\ell$, because all the data units in $\mathcal{W}$ are received. The dynamics of the system when it moves into set $\bar{\mathcal{S}}_W$ affect the performance of the system. Generally, the longer it stays in $\bar{\mathcal{S}}_W$, the better the performance is. Although the scheduling policy in $\bar{\mathcal{S}}_W$ is fixed as described in Section III-B, the control policy in $\mathcal{S}_W$ determines how frequently the system state will hit $\bar{\mathcal{S}}_W$ and thus also affects the system performance.

    In the following, we denote the system under a given policy $\mu$ as system $A_\mu$. Let $T_\mu(S)$ be the expected time spent by $A_\mu$ in $\bar{\mathcal{S}}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S \in \mathcal{S}_\Delta$. Let $P^{T}_\mu(S'|S)$ denote the probability that $A_\mu$ jumps back to $\mathcal{S}_W$ at state $S' \in \mathcal{S}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S$. To find the optimal policy, we define a finite-state system $\tilde{A}_\mu$ as follows:

    Definition 1: A system $\tilde{A}_\mu$ is called the simplified system of the original system $A_\mu$ if it has the following dynamics:

    1) The system is a semi-Markov process over state space $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. At any state $S \in \tilde{\mathcal{S}}$, the visual quality is $Q(S)$ as in (10). At any state in $\mathcal{S}_W$, the system acts according to the policy $\mu$;

    2) When the system jumps to a state $S \in \mathcal{S}_\Delta$, it spends $T_\mu(S)$ slots in $S$ with video quality $\sum_{\ell=1}^{L} q_\ell$ for each slot. Then, the system transits to a state $S' \in \mathcal{S}_W$ with probability $P^{T}_\mu(S'|S)$.

    It should be noted that $\tilde{A}_\mu$ is not coupled with the original system $A_\mu$. It just shares some properties with the original system (see Fig. 7). The following theorem relates the visual quality under $\tilde{A}_\mu$ and that of $A_\mu$.

    Theorem 1: If the jump chain of the original system $A_\mu$ is positive recurrent, then the time-average MS-SSIM index of $A_\mu$ is the same as that of the simplified system $\tilde{A}_\mu$.

    Proof Sketch: If the jump chain is positive recurrent, the jumps from $\mathcal{S}_W$ to $\mathcal{S}_\Delta$ partition the Markov process into i.i.d. segments. We only need to optimize the policy $\mu$ to maximize the average quality in each segment. Every segment consists of two consecutive subsegments. During the first subsegment, $S_t \in \bar{\mathcal{S}}_W$. In the other subsegment, $S_t \in \mathcal{S}_W$. Because every state in $\bar{\mathcal{S}}_W$ has the same visual quality $\sum_{\ell=1}^{L} q_\ell$, we can abstract the first subsegment as a single state with transition probability $P^{T}_\mu(S'|S)$. This simplified system provides the same average quality as the original system. For a detailed proof, see the technical report [30].
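One way to read the proof sketch, offered here as an illustration rather than as the authors' proof, is through the renewal-reward identity. The symbols $\theta_1$ and $\theta_2$ are hypothetical (they do not appear in the paper) and denote the number of slots a segment spends in $\bar{\mathcal{S}}_W$ and in $\mathcal{S}_W$, respectively; $\hat{Q} = \sum_{\ell=1}^{L} q_\ell$; and the segments are assumed i.i.d. with finite mean length, as stated in the sketch.

```latex
J_\mu
  = \frac{E_\mu\!\left[\hat{Q}\,\theta_1
          + \sum_{t \,\in\, \text{second subsegment}} Q(S_t)\right]}
         {E_\mu\!\left[\theta_1 + \theta_2\right]}
```

Under these assumptions the first subsegment enters the ratio only through the expectation of $\theta_1$, which equals $T_\mu(S)$ given the entry state $S$, and through the distribution of the state at which the system returns to $\mathcal{S}_W$, which is $P^{T}_\mu(S'|S)$. Replacing the random sojourn by $T_\mu(S)$ slots of quality $\hat{Q}$ followed by a jump drawn from $P^{T}_\mu$ therefore leaves the ratio unchanged.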

    Remark The positive recurrent condition for the jump chain means that the average throughput of the

    channel is neither too large nor too small relative to the average data rate of the video. If the average

    throughput of the channel is very large, the receiver buffer can always buffer enough frames and the

    dynamic scheduling is unnecessary. If the average channel throughput is too small, the channel cannot

    support the video stream and dynamic scheduling cannot help either.

    As indicated by Theorem 1, given any policy $\mu$, the visual quality of $A_\mu$ is the same as that of $\tilde{A}_\mu$. Thus, we can optimize our policy with respect to $\tilde{A}_\mu$, which has a finite state space, and a standard policy optimization algorithm can be applied.

    IV. POLICY OPTIMIZATION

    In the following, we show how to compute the parameters $T_\mu$ and $P^{T}_\mu$. Then, we present a value iteration algorithm to find the optimal scheduling policy.

    A. Computation of Tμ and PTμ

    Before we can apply an MDP algorithm to optimize the policy, we need to compute $T_\mu(S)$ and $P^{T}_\mu(S'|S)$ for every state $S \in \mathcal{S}_\Delta$ and $S' \in \mathcal{S}_W$. Both $T_\mu(S)$ and $P^{T}_\mu(S'|S)$ only involve the dynamics of the system when the state $S \in \bar{\mathcal{S}}_W$. Because the scheduling policy is fixed in $\bar{\mathcal{S}}_W$, we can simplify the state representation further.

    Let $B^{W}_t$ be the number of buffered packets outside the window. Note that, when the system moves in $\bar{\mathcal{S}}_W$, it always schedules as many enhancement layer data units as possible. In addition, when the system is in state set $\bar{\mathcal{S}}_W$, we always have $\mathcal{N}^{itr}_t = \emptyset$. Hence, $B^{W}_t$ and $f^{I}_t$ contain all the information about the buffer state $(B_t, N^{itr}_t)$. We can therefore simplify the state representation $(C_t, B_t, N^{itr}_t)$ to $(C_t, B^{W}_t, f^{I}_t)$ when $S \in \bar{\mathcal{S}}_W$. All the states in $\bar{\mathcal{S}}_W$ correspond to some states with $B^{W}_t \geq 0$. All the states in $\mathcal{S}_W$ correspond to some states with $B^{W}_t < 0$.

    When the system evolves in $\bar{\mathcal{S}}_W$, at the beginning of a slot $t$, the state $B^{W}_t$ first decreases by $\Delta B^{W}_d(S_t)$ when the current frame is displayed. Then, the transmitter schedules as many enhancement layer data units as possible. At the end of the slot, $B^{W}_t$ increases by $\Delta B^{W}_i(S_t)$. Because the quantity $\Delta B^{W}(S_t) = \Delta B^{W}_i(S_t) - \Delta B^{W}_d(S_t)$ only depends on state $S_t$, the state $B^{W}_t$ varies like a random walk but with Markovian step size $\Delta B^{W}$. This process can be described by a quasi-birth-death process (QBDP). Hence, determining $T_\mu$ and $P^{T}_\mu$ is a hitting time problem for the QBDP. The problem for continuous-time QBDPs was essentially solved in [31, p. 96]. The discrete-time case can be solved similarly. Details on how to compute $T_\mu$ and $P^{T}_\mu$ are found in the technical report [30].
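To give a feel for what an expected sojourn time such as $T_\mu(S)$ is, the sketch below applies first-step analysis to a small finite chain. It is not the matrix-geometric method of [31] or the procedure of the technical report [30]; it assumes the state set has been truncated to finitely many states, and the function name and the input `p_inside` (the one-step transition probabilities restricted to the states inside the set) are hypothetical.

```python
from typing import List


def expected_sojourn_times(p_inside: List[List[float]],
                           tol: float = 1e-12,
                           max_iter: int = 100000) -> List[float]:
    """Expected number of slots spent inside a set before leaving it.

    Solves t_i = 1 + sum_j p_inside[i][j] * t_j by fixed-point iteration.
    Rows of p_inside may sum to less than one; the missing mass is the
    probability of leaving the set in that slot.
    """
    n = len(p_inside)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [1.0 + sum(p_inside[i][j] * t[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    return t
```

For a single state that is left with probability 0.5 in each slot, the expected sojourn time is 2 slots, which is what the iteration converges to. The same first-step equations, with an indicator of the exit state in place of the constant 1, give exit probabilities analogous to $P^{T}_\mu$.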

    B. Determining Optimal Policy via Value Iteration

    Given $T_\mu$ and $P^{T}_\mu$, the optimal policy can be determined for the simplified system $\tilde{A}_\mu$; by Theorem 1, it is also the optimal policy of the original system $A_\mu$. Let $S_{ini}$ be any state in $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. The hitting times to state $S_{ini}$ partition the process into i.i.d. cycles. Optimizing the policy $\mu(\cdot)$ in the cycles maximizes the time-average MS-SSIM index of the system. Similar to the derivation in [3, p. 441], this is equivalent to an average-reward maximization problem with stage-reward $g(S) - \tau(S)\lambda$, where $\lambda$ is the expected average-reward of each cycle and

    $$g(S) = \begin{cases} Q(S) & : S \in \mathcal{S}_W, \\ T_\mu(S) \sum_{\ell=1}^{L} q_\ell & : S \in \mathcal{S}_\Delta, \end{cases}$$

    $$\tau(S) = \begin{cases} 1 & : S \in \mathcal{S}_W, \\ T_\mu(S) & : S \in \mathcal{S}_\Delta. \end{cases}$$

    Let us denote by $h(S)$ the average reward-to-go in each cycle when the system starts at state $S$. Then we have the following Bellman equation:

    $$h(S) = g(S) - \tau(S)\lambda + \sum_{S' \in \mathcal{S}_W \cup \mathcal{S}_\Delta} P_\mu(S'|S) h(S'), \quad (15)$$

    where $h(S_{ini}) = 0$. To find the optimal policy, the standard value iteration algorithm can be applied [3, p. 430].
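For orientation, the following is a minimal sketch of relative value iteration for an average-reward MDP. It is not the paper's exact algorithm: it assumes unit stage durations ($\tau(S) = 1$ for every state), a unichain and aperiodic model, and a tiny hypothetical two-state example. The semi-Markov stage durations $T_\mu(S)$ at states in $\mathcal{S}_\Delta$ would need the additional treatment described in [3].

```python
from typing import List, Tuple

Action = Tuple[float, List[float]]  # (stage reward, next-state probabilities)


def backup(action: Action, h: List[float]) -> float:
    """One-step lookahead value of an action under differential values h."""
    reward, probs = action
    return reward + sum(p * hv for p, hv in zip(probs, h))


def relative_value_iteration(model: List[List[Action]], ref: int = 0,
                             tol: float = 1e-10, max_iter: int = 100000):
    """Return (average reward, differential values, greedy policy).

    model[s] lists the actions available in state s; h[ref] is pinned to 0,
    mirroring the normalization h(S_ini) = 0.
    """
    h = [0.0] * len(model)
    gain = 0.0
    for _ in range(max_iter):
        best = [max(backup(a, h) for a in actions) for actions in model]
        gain = best[ref]                      # current estimate of lambda
        new_h = [b - gain for b in best]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    policy = [
        max(range(len(actions)), key=lambda i, acts=actions: backup(acts[i], h))
        for actions in model
    ]
    return gain, h, policy


# Hypothetical two-state example: state 0 has two actions, state 1 has one.
EXAMPLE_MODEL = [
    [(1.0, [0.9, 0.1]), (0.0, [0.1, 0.9])],
    [(2.0, [0.5, 0.5])],
]
```

In the example, forgoing the immediate reward in state 0 to reach state 1 more often yields the higher long-run average (9/7 per stage versus 7/6), which illustrates why a myopic choice is not always optimal.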

    V. PERFORMANCE EVALUATION AND NEAR-OPTIMAL POLICY DESIGN

    In this section, we first test the MDP-based scheduling policy by simulations and show its superiority to scheduling methods which do not explicitly exploit the buffer and channel information. Then, we propose a simple scheduling policy which achieves near-optimal performance.

    A. Performance evaluation

    The proposed dynamic scheduling algorithm was evaluated on the test sequences “foreman”, “bus”, “flower”, “mobile” and “Paris” [26]. These video sequences were encoded into 3 layers using the H.264/SVC reference software JSVM [32]. The GOP length was set to $L_{GOP} = 16$. The encoding parameters and rate-quality model parameters are listed in Table I. The parameters $r^{I}_\ell$ and $r^{P}_\ell$ are measured in megabits and $q_\ell$ is measured in MS-SSIM index. The quantization parameters (QP) for the base layers were chosen such that the base layer quality is about 0.90 to 0.91 in MS-SSIM value, which is of moderate but acceptable visual quality. The QPs of the enhancement layers were chosen such that the third layer provides an MS-SSIM visual quality prediction of 0.95 to 0.96 and the bit rates of the two enhancement layers are roughly the same. The Lagrangian multipliers for motion estimation and mode decision were set as QP − 2.


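    We employ a 4-state Markov channel model to test the performance of the proposed scheduling algorithm. The state transition matrix is

    $$P_c = \begin{bmatrix} \frac{1}{5} & \frac{4}{5} & 0 & 0 \\ \frac{1}{5} & \frac{3}{5} & \frac{1}{5} & 0 \\ 0 & \frac{3}{5} & \frac{1}{5} & \frac{1}{5} \\ 0 & 0 & \frac{4}{5} & \frac{1}{5} \end{bmatrix}$$

    and the steady state distribution is

    $$\pi = [0.15, 0.60, 0.20, 0.05].$$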
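The stated steady state distribution can be checked numerically from the transition matrix. The sketch below uses plain power iteration; the function name and the iteration count are arbitrary choices made for illustration.

```python
from typing import List


def stationary_distribution(p: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the stationary row vector pi = pi * P by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


# Transition matrix of the 4-state channel model used in the simulations.
P_C = [
    [0.2, 0.8, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0],
    [0.0, 0.6, 0.2, 0.2],
    [0.0, 0.0, 0.8, 0.2],
]
```

Calling `stationary_distribution(P_C)` converges to approximately `[0.15, 0.60, 0.20, 0.05]`, in agreement with the distribution given above.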

    Let us denote the throughput of channel state $(R_i, p_i)$ by $r_i = R_i(1 - p_i)$. The state parameters $R_i$ and $p_i$ are configured such that $r_1 < R^{V}_1 < r_2 < R^{V}_2 < r_3 < R^{V}_3 < r_4$, in which $R^{V}_i$ is the average video data rate up to the $i$th layer. Hence, the channel throughput fluctuates among the average rates of the layers. According to the steady state distribution, the average throughput of the channel is higher than the rate of the base layer but not enough to support the first enhancement layer.

    We simulated 40 transmissions of each sequence over this channel model. To conceal errors, every

    lost frame was reconstructed by copying its preceding frame. To demonstrate the advantages of the

    proposed algorithm, three scheduling algorithms were tested over the same channel realizations. They

    are summarized as follows:

    H1 This policy only schedules the base layer. It is the most conservative transmission strategy.

    H2 This policy always schedules as many enhancement layers as possible. It preferentially

    maximizes the visual quality of recent frames.

    H3 This policy decides dynamically how many enhancement layers to transmit. Specifically,

    when the instantaneous channel throughput is lower than the average data rate up to the first

    enhancement layer, the policy behaves like H1. If the channel throughput is higher than the

    average data rate up to the second enhancement layer, the policy acts like H2. Otherwise,

    the policy schedules the video data up to the first enhancement layer.
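The threshold rule of H3 can be written compactly. In the sketch below the function name is hypothetical and `rate_up_to_layer` is assumed to hold the average video data rates $R^{V}_1, R^{V}_2, R^{V}_3$ up to each layer; the return value is the highest layer that the heuristic schedules.

```python
from typing import Sequence


def h3_highest_layer(throughput: float, rate_up_to_layer: Sequence[float]) -> int:
    """Highest layer scheduled by heuristic H3 for the current channel throughput."""
    _, rate_two_layers, rate_three_layers = rate_up_to_layer
    if throughput < rate_two_layers:
        return 1  # behave like H1: base layer only
    if throughput > rate_three_layers:
        return 3  # behave like H2: as many enhancement layers as possible
    return 2      # base layer plus the first enhancement layer
```

The rule looks only at the instantaneous throughput, so it has no notion of how long a good or bad channel state is likely to last.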

    The average visual quality measured using the MS-SSIM index is shown in Table II. It is observed that the proposed scheduling algorithm outperforms all of the other policies by 0.015 to 0.035 in MS-SSIM value, which is perceptually significant. As can be seen in Table I, increasing the MS-SSIM value by 0.02 requires approximately doubling the video bit rate. Thus, the proposed scheduling algorithm provides very significant performance improvements over the other scheduling policies.


    B. Near-optimal scheduling policy design

    Although the MDP-based scheduling policy has the optimal performance among all the policies we consider, the off-line computation of MDP policies requires extra system resources. This motivates us to design a simple on-line scheduling policy whose performance is similar to that of the MDP-based policy.

    The simulation results show that, by dynamically scheduling data associated with different layers, H3 achieves much better performance than the other heuristics. However, for the sequence “Paris”, which has large I frames, H3's performance is much worse. This is mainly because H3 only transmits video frames sequentially without proactively transmitting data of later GOPs. In the following, we propose a scheduling scheme which, similar to the MDP-based policy, not only dynamically schedules data units associated with different layers but also allocates transmissions for later GOPs.

    At the $t$th slot, the proposed policy first estimates the amount of data which can be sent before the $(t + \tau)$th time slot as $D_t = \sum_{n=1}^{\tau-1} [r_t \rho^{n-1} + r_{avg}(1 - \rho^{n-1})]$. Here, $\rho$ is the subdominant eigenvalue of $P_c$, which represents the temporal correlation of the channel condition. Such a correlation parameter could be easily measured. The quantity $r_t$ is the throughput of the current slot and $r_{avg}$ is the average throughput of the channel. Again, these parameters can be measured. If $b^{I} = 0$, correctly sending $\mathcal{I}$ to the receiver is critical and thus we set $\tau = f^{I}$. If $b^{I} > 0$, we care about the transmissions within the coherence time and thus we set $\tau = \lceil 1/\ln(1/\rho) \rceil$, where $1/\ln(1/\rho)$ is roughly the relaxation time of channel variations. Let $D^{\ell}_t$ be the amount of unreceived data contained in the first $\ell$ layers of the next $\tau$ frames. The proposed scheduling policy operates as follows:

    • If $D_t < D^{2}_t$, the policy only schedules the base layer data units; if $D^{2}_t \leq D_t < D^{3}_t$, the policy only schedules the first two layers; otherwise, the policy schedules data units from all three layers.

    • If $D_t < D^{1}_t$, the policy schedules as much of the base layer data for $\mathcal{I}$ as possible. If $D^{1}_t < D_t < D^{2}_t$, the policy schedules up to 50% of the transmissions for $\mathcal{I}$. Otherwise, the policy transmits the frames sequentially without proactively transmitting data for $\mathcal{I}$.
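The quantities used by the on-line policy are simple to compute. The sketch below covers the look-ahead horizon for the case $b^{I} > 0$, the estimate $D_t$, and the layer-selection rule of the first bullet; the function names are hypothetical and the demand values $D^{2}_t$ and $D^{3}_t$ are assumed to be supplied by the caller.

```python
from math import ceil, log


def lookahead_horizon(rho: float) -> int:
    """tau = ceil(1 / ln(1 / rho)), used when the base layer of the I frame is buffered."""
    return ceil(1.0 / log(1.0 / rho))


def deliverable_data(r_t: float, r_avg: float, rho: float, tau: int) -> float:
    """D_t: throughput forecast that decays from r_t towards r_avg at rate rho."""
    return sum(
        r_t * rho ** (n - 1) + r_avg * (1.0 - rho ** (n - 1))
        for n in range(1, tau)  # n = 1, ..., tau - 1
    )


def layers_to_schedule(d_t: float, d2: float, d3: float) -> int:
    """Number of layers scheduled, given demands d2 and d3 for the first 2 and 3 layers."""
    if d_t < d2:
        return 1
    if d_t < d3:
        return 2
    return 3
```

With the correlation parameters of Tables III and IV, `lookahead_horizon(0.5123)` gives 2 and `lookahead_horizon(0.7232)` gives 4, matching the values of $\tau$ reported there.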

    The parameters of the proposed policy, i.e., ρ, rt and ravg can be easily measured and this policy

    is very simple to implement. The performance of this policy was tested over the simulated Markov

    channel models with different temporal correlation parameter ρ. The simulation results are summarized

    in Table III and Table IV. For most tested sequences, the gap between the proposed policy and the

    MDP-based policy ranges from 0.003 to 0.01.


    VI. CONCLUSIONS

    We have developed dynamic scheduling for efficient scalable video transmission in wireless channels.

    By modeling the wireless channel as a Markov chain, an infinite-horizon average-reward maximization

    formulation is proposed to maximize the visual quality predicted by MS-SSIM index. To reduce the

    state space to a finite one, we employ a window to fix the scheduling policy when all the data within

    the window are received. It is shown that the scheduling policy optimization problem is equivalent

    to finding the optimal control policy for a controlled semi-Markov process over a finite-state space.

    Simulation results demonstrate the superiority of the scheduling policy obtained by the proposed MDP-based formulation. Further, a simple on-line scheduling policy is proposed which achieves near-optimal performance.

    REFERENCES

    [1] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE

    Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sept. 2007.

    [2] W. Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 11,

    no. 3, pp. 301–317, Mar. 2001.

    [3] D. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2005, vol. 2.

    [4] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Process.

    Mag., vol. 26, no. 1, pp. 98 –117, Jan. 2009.

    [5] B. Girod, “What’s wrong with mean-squared error,” Digital Images and Human Vision (A. B. Watson, ed.), pp. 207–220, 1993.

    [6] A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun., vol. 43, no. 12, pp.

    2959–2965, Dec. 1995.

    [7] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Conference Record

    of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 2003, pp. 1398–1402.

    [8] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, “Study of subjective and objective quality assessment of

    video,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427 –1441, Jun. 2010.

    [9] M. Podolsky, S. McCanne, and M. Vetterli, “Soft ARQ for layered streaming media,” Technical Report, Computer Science Division,

    University of California, Berkeley, vol. UCB/CSD-98-1024, 1998.

    [10] P. A. Chou and Z. Miao, “Rate-distortion optimized streaming of packetized media,” IEEE Trans. Multimedia, vol. 8, no. 2, pp.

    390–404, Apr. 2006.

    [11] J. Chakareski, P. A. Chou, and B. Aazhang, “Computing rate-distortion optimized policies for streaming media to wireless clients,”

    in Proceedings of Data Compression Conference, 2002, pp. 53–62.

    [12] Y. Zhang, F. Fu, and M. van der Schaar, “On-line learning and optimization for wireless video transmission,” IEEE Trans. Signal

    Process., vol. 58, no. 6, pp. 3108–3124, Jun. 2010.

    [13] F. Fu and M. van der Schaar, “A new systematic framework for autonomous cross-layer optimization,” IEEE Trans. Veh. Technol.,

    vol. 58, no. 4, pp. 1887–1903, May. 2009.

    [14] C. Chen, R. W. Heath, A. C. Bovik, and G. de Veciana, “Adaptive policies for real-time video transmission: a Markov decision

    process framework,” in 18th IEEE International Conference on Image Processing, Sept. 2011.


    [15] Y. Li, A. Markopoulou, J. Apostolopoulos, and N. Bambos, “Content-aware playout and packet scheduling for video streaming

    over wireless links,” IEEE Trans. Multimedia, vol. 10, no. 5, pp. 885–895, Aug. 2008.

    [16] J. Cabrera, A. Ortega, and J. I. Ronda, “Stochastic rate-control of video coders for wireless channels,” IEEE Trans. Circuits Syst.

    Video Technol., vol. 12, no. 6, pp. 496–510, Jun. 2002.

    [17] J. W. Huang, H. Mansour, and V. Krishnamurthy, “A dynamical games approach to transmission-rate adaptation in multimedia

    WLAN,” IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3635–3646, Jul. 2010.

    [18] J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag, 1997.

    [19] F. Fu and M. Van Der Schaar, “A systematic framework for dynamically optimizing multi-user wireless video transmission,” IEEE

    J. Sel. Areas Commun., vol. 28, no. 3, pp. 308–320, Apr. 2010.

    [20] Q. Zhang and S. A. Kassam, “Finite-state Markov model for Rayleigh fading channels,” IEEE Trans. Commun., vol. 47, no. 11,

    pp. 1688–1692, Nov. 1999.

    [21] H.-P. Lin and M.-J. Tseng, “Two-layer multistate Markov model for modeling a 1.8 GHz narrow-band wireless propagation channel

    in urban Taipei city,” IEEE Trans. Veh. Technol., vol. 54, no. 2, pp. 435–446, Mar. 2005.

    [22] M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50,

    no. 3, pp. 312–322, Sept. 2004.

    [23] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Trans. Image

    Process., vol. 16, no. 9, pp. 2284–2298, Sept. 2007.

    [24] A. K. Moorthy and A. C. Bovik, “Visual importance pooling for image quality assessment,” IEEE J. Sel. Topics Signal Process.,

    vol. 3, no. 2, pp. 193–201, Apr. 2009.

    [25] Z. Wang and Q. Li, “Video quality assessment using a statistical model of human visual speed perception,” Journal of the Optical

    Society of America, vol. A 24, B61-B69, Jul. 2007.

    [26] Test sequences [On line]. Available: http://trace.eas.asu.edu/yuv/.

    [27] H. S. Wang and P.-C. Chang, “On verifying the first-order Markovian assumption for a Rayleigh fading channel model,” IEEE

    Trans. Veh. Technol., vol. 45, no. 2, pp. 353–357, May. 1996.

    [28] T. Su, H. Ling, and W. J. Vogel, “Markov modeling of slow fading in wireless mobile channels at 1.9 GHz,” IEEE Trans. Antennas

    Propag., vol. 46, no. 6, pp. 947–948, Jun. 1998.

    [29] C. C. Tan and N. C. Beaulieu, “On first-order Markov modeling for the Rayleigh fading channel,” IEEE Trans. Commun., vol. 48,

    no. 12, pp. 2032–2040, Dec. 2000.

    [30] Technical Report [On line]. Available: https://webspace.utexas.edu/cc39488/pdf/report.pdf.

    [31] M. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithm Approach. The Johns Hopkins University Press, 1981.

    [32] J. Reichel, S. Schwarz, and W. M, “Joint scalable video model 11 (JSVM 11),” Joint Video Team, Doc. JVT-X202, Jul. 2007.


    TABLE I
    THE ENCODING PARAMETERS AND RATE-QUALITY MODEL PARAMETERS OF THE TESTED SEQUENCES.

    | Sequence | Layer 1 (base layer) QP | $r^{I}_1$ | $r^{P}_1$ | $q_1$ | Layer 2 QP | $r^{I}_2$ | $r^{P}_2$ | $q_2$ | Layer 3 QP | $r^{I}_3$ | $r^{P}_3$ | $q_3$ |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|
    | foreman | 34 | 0.0448 | 0.0095 | 0.9165 | 30 | 0.057 | 0.0144 | 0.0256 | 28 | 0.0264 | 0.0203 | 0.0106 |
    | bus | 34 | 0.0942 | 0.0285 | 0.9167 | 30 | 0.0864 | 0.0397 | 0.0390 | 28 | 0.0420 | 0.0580 | 0.0126 |
    | flower | 40 | 0.0846 | 0.0130 | 0.9117 | 36 | 0.075 | 0.0225 | 0.0400 | 35 | 0.028 | 0.0268 | 0.008 |
    | mobile | 39 | 0.1098 | 0.0174 | 0.9021 | 35 | 0.0953 | 0.0338 | 0.0427 | 34 | 0.0356 | 0.0416 | 0.0076 |
    | Paris | 37 | 0.0765 | 0.0079 | 0.9155 | 32 | 0.0711 | 0.0135 | 0.0375 | 30 | 0.0392 | 0.0182 | 0.0113 |

    TABLE II
    THE PERFORMANCE OF THE MDP-BASED POLICY.

    | Policy | Paris | mobile | flower | bus | foreman |
    |---|---|---|---|---|---|
    | MDP-based Policy | 0.9479 | 0.9437 | 0.9468 | 0.9471 | 0.9438 |
    | H1 | 0.9144 | 0.9023 | 0.9117 | 0.9167 | 0.9165 |
    | H2 | 0.9164 | 0.8585 | 0.9160 | 0.8930 | 0.8943 |
    | H3 | 0.8821 | 0.9199 | 0.9325 | 0.9337 | 0.9271 |

    TABLE III
    THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.5123 AND τ = 2.

    | Policy | Paris | mobile | flower | bus | foreman |
    |---|---|---|---|---|---|
    | MDP-based Policy | 0.9479 | 0.9437 | 0.9468 | 0.9471 | 0.9438 |
    | Near-optimal Policy | 0.9434 | 0.9418 | 0.9444 | 0.9460 | 0.9416 |

    TABLE IV
    THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.7232 AND τ = 4.

    | Policy | Paris | mobile | flower | bus | foreman |
    |---|---|---|---|---|---|
    | MDP-based Policy | 0.9530 | 0.9461 | 0.9537 | 0.9494 | 0.9476 |
    | Near-optimal Policy | 0.9431 | 0.9418 | 0.9468 | 0.9491 | 0.9414 |

    Fig. 1. Dynamic Scheduling for Video Transmission. (Block diagram omitted; its labels are: Video Server, Requests, Requested Data, Scheduler, Transmitter, Markov Channel, Receiver, Channel and Receiver Buffer State.)

    Fig. 2. Encoder Prediction Structure when L = 3. (Diagram omitted; it shows data units labeled (frame, layer) from (−1, 1) to (4, 3), with the expired frames, the current frame and the unexpired frames marked.)

    Fig. 3. Comparison between measured rate-quality characteristics and estimated rate-quality characteristics using the adopted rate-quality model. The measured rate and the estimated rate for P frames are shown in (a) and (b); the measured rate and the estimated rate for I frames are shown in (c) and (d); the measured MS-SSIM value and the modeled MS-SSIM value are shown in (e) and (f). (Plots omitted. Panels (a), (c) and (e): Foreman; panels (b), (d) and (f): Paris. Panels (a) to (d) plot frame size in Mbit against frame index, with (c) and (d) split into the 1st, 2nd and 3rd layers; panels (e) and (f) plot MS-SSIM against frame index. Each panel compares the measured data with the adopted model.)

    Fig. 4. An example of the buffer state when no constraints are applied to the scheduling policies. $L_{GOP} = 8$, $L = 3$. (Diagram omitted; it marks expired and unexpired frames, I and P frames, and received and unreceived data units.)

    Fig. 5. An illustration of the receiver buffer state when $L_{GOP} = 8$, $L = 3$. (a): $B^{pre}_t = (4, 2, 1)$, $B^{I}_t = (6, 2)$, $B^{post}_t = (3, 3, 1)$ and $N^{itr} = 0$; (b): $B^{pre}_t = (0, 0, 0)$, $B^{I}_t = (5, 2)$, $B^{post}_t = (4, 1, 0)$ and $N^{itr} = 2$. (Diagrams omitted; they mark expired and unexpired frames, the frame sets pre, I and post, the interruption set, and received and unreceived data units.)

    Fig. 6. Buffer state transition after a frame is played out. $L_{GOP} = 8$, $L = 3$. (a) The displayed frame is an I frame. (b) The displayed frame is a P frame. (Diagrams omitted.)

    Fig. 7. The dynamics of the system $A_\mu$ and the corresponding simplified system $\tilde{A}_\mu$. (a) $A_\mu$; (b) $\tilde{A}_\mu$. (Diagrams omitted.)

