Markov Decision Model for Perceptually
Optimized Video Scheduling
Chao Chen, Student Member, IEEE, Robert W. Heath Jr., Fellow, IEEE, Alan C.
Bovik, Fellow, IEEE, and Gustavo de Veciana, Fellow, IEEE
Abstract
Transmitting video over slow fading wireless channels with good perceptual quality is a challenging
task because no time-diversity can be exploited to combat channel variations, especially when frequency
diversity and spatial diversity are not available due to the wireless system implementation. While quality-scalable
video coding techniques make video source-rate adaptation possible, determining a good scheduling strategy
which selectively schedules video data associated with different layers is a challenging problem. For the best
performance of a wireless video system, the scheduler needs to consider the channel state, the buffer state and
the perceptual video quality at the receiver. In this paper, we propose a scheduling algorithm to optimize the
perceptual quality of scalably coded videos transmitted over slow fading channels. By modeling the dynamics
of the channel as a Markov chain, we reduce the problem of dynamic video scheduling to a tractable Markov
decision problem over a finite state space. We then employ an infinite-horizon average-reward maximization
algorithm to maximize the time-average Multi-Scale Structural SIMilarity (MS-SSIM) index which has been
shown to correlate highly with human judgments of video quality. Simulation results show that the proposed
MDP-based scheduling policy achieves significant perceptual quality improvement over scheduling methods
which do not explicitly exploit the channel dynamics. Furthermore, we propose an on-line scheduling method
which not only performs nearly as well as the MDP-based policy but also has very low implementation
complexity.
Index Terms
Videos, Scheduling algorithm, Wireless communication, Image quality.
The authors are with the Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station
C0803, Austin TX - 78712-0240, USA e-mail: [email protected] This research was supported in part by Intel Inc. and Cisco Corp.
under the VAWN program.
I. INTRODUCTION
Video transmission over wireless channels is a challenging task. The throughput of wireless channels
varies over time, making the delivery of real-time video challenging due to tight delay constraints.
In particular, if the coherence time of the channel is comparable to the delay constraint, then the
time-diversity of the channel cannot be exploited. Traditional channel coding methods cannot provide
graceful visual quality degradation of the received video signal when deep fading happens. Hence, adap-
tive transmission techniques such as multi-layer scheduling and link-adaptation should be employed.
Furthermore, video packets are structured. Due to the nature of predictive video coding algorithms, a
video frame can be decoded only when its predictors are received at the receiver. Hence, the prediction
structure of the video codec enforces a partial order on the transmissions of the video packets.
Scalable video coding (SVC) is one approach for allowing flexible video transmission over channels
with varying throughput [1], [2]. An SVC video encoder produces a layered video stream that contains
a base layer and several enhancement layers. If the throughput is low, the transmitter can choose to
transmit the base layer only, which provides a moderate, but acceptable, degree of visual quality at the
receiver. If the channel conditions improve, the transmitter can transmit one, or more, enhancement
layers to further improve the visual quality. Conceptually, SVC provides a means to adapt the data rate
for wireless video transmission. The wireless transmitter can easily adapt the data rate by selectively
scheduling video data associated with various layers rather than re-encoding the video sequence into
a suitable rate.
Designing scalable video scheduling algorithms for wireless channels is a complex task. The schedul-
ing policy depends not only on the channel condition but also on the receiver buffer state. For example,
if the receiver has successfully buffered base layer data over many frames, the scheduler could choose
to transmit some enhancement layer data to improve the video quality even if the throughput is low.
The scheduling policy also depends on the impact that a particular video data packet will have on the
perceptual video quality. The scheduler should assign higher priority to packets which could result in
higher perceptual quality improvements. An effective perceptual quality metric and an accurate rate-
quality model are important for scheduling policy design. The objective of this paper is to develop
scheduling algorithms to maximize the receiver perceptual video quality for scalable video transmission
over wireless channels.
A. Contributions
In this paper, we assume that video sequences are encoded by an H.264/SVC-compatible scalable
video encoder. We employ a finite state Markov chain (FSMC) to model the dynamics of the slow
fading channel. We also employ a rate-quality model to capture the relationship between the size
of a video data packet and its contribution to the receiver perceptual quality. We model the dynamic
video transmission system as a controlled Markov system. The visual quality measured in time-average
MS-SSIM index is maximized by optimizing the scheduling policy via value iteration (see e.g. [3]).
The specific contributions that we make are as follows:
1) A tractable MDP-based formulation is proposed to design the optimal scheduling policy. Typical
mobile users usually have available application layer storage space of several Gigabytes. Thus,
the buffer size can be regarded as infinite. Because the performance of a scheduling policy
depends on the buffer state, the policy needs to be optimized over an infinitely large state space.
By making some reasonable approximations, we fix the scheduling policy outside a finite set of states.
We prove that optimizing the transmission policy for this finite-state set is equivalent to solving
a semi-Markov decision problem. Based on this result, a value iteration algorithm is used to
optimize the scheduling policy.
2) Accurate prediction of visual quality is used as the optimization objective. In most video trans-
mission methods, Peak Signal to Noise Ratio (PSNR) is used as the optimization objective. It
is well known that PSNR does not accurately predict the perceptual quality of videos in many
instances [4] [5] [6]. In this paper, we employ an infinite-horizon average-reward maximization
formulation which is directly related to the time-average MS-SSIM index [7]. As shown in [8],
the time-average MS-SSIM index correlates quite well with human judgments of visual quality.
In this paper, the system state is mapped to MS-SSIM index using a simple rate-quality model.
Then, the MS-SSIM value of each state is used as the state-value in the value iteration algorithm.
3) A simple and near-optimal scheduling method is proposed. Computing the scheduling policy
based on the MDP formulation requires extra system resources. We devise a simple and near-
optimal scheduling algorithm which, similar to the MDP-based policy, proactively transmits data
for later GOPs and dynamically schedules data associated with different layers.
B. Related Work
MDP-based stochastic control techniques have been proposed for video data scheduling [9]–[14].
In [9], adaptive video transmission over a packet erasure channel was studied by modeling the buffer
state as a controlled Markov chain. Later, in [10], an MDP-based scheduling algorithm was proposed
for video transmission over packet loss networks. This work was further extended for wireless video
streaming in [11]. The wireless channel was modeled as a binary symmetric channel. This channel
model is justified for fast fading channels where the coherence time is much less than the delay
constraint. In that case, interleaving can be applied without violating the delay constraint and the
channel appears as an i.i.d. channel. For slow fading channels, the bit error rate cannot be modeled
as a constant. In [12] [13], a reinforcement learning framework was proposed for wireless video
transmission. Their algorithm was based on MDP with discounted-reward maximization formulation.
The transmitter learns the characteristics of the channel and the video sequence during the transmission
process. The scheduling policy is updated according to the learned characteristics. In our previous
work [14], an infinite-horizon average-reward maximization formulation was proposed. A very simple
rate-quality model was employed to differentiate the importance of different layers. The difference
between frames, however, was not incorporated in the rate-quality model. In addition, PSNR, instead
of time-average MS-SSIM index, was used as the optimization objective.
MDP-based methods have been used to solve other problems in adaptive video transmission. In
[15], an MDP formulation was proposed for adaptive video playout and scheduling for single layer
videos. A two-state channel model was used to represent “good” and “bad” channel conditions. The
controller adapts the playout speed according to the receiver buffer state and channel state in order to
optimize the PSNR of the signal at the receiver. In [16], an MDP-based formulation was introduced
for the problem of real-time encoder rate-control. The derived optimal rate-control policy adapts the
encoding bit rate according to the channel condition and video rate-quality characteristics. In [17],
multi-user scheduling and rate adaptation in wireless local-area networks (WLAN) were studied. The
users in a WLAN access the available resources in a decentralized manner. The multi-user scheduling
problem was formulated as a competitive Markov decision process [18] and a Nash-equilibrium policy
was computed via a value iteration algorithm.
Among the work mentioned above, the works most closely related to ours are [10] and [12], which focus
on single user scalable video transmission. The differences from our work are summarized as follows:
• The video scheduling algorithm proposed in [10] was developed for packet loss networks. In
a packet loss network, the throughput of each transmission slot is limited by congestion rather
than by deep signal fading in the transmission medium. Hence, in [10], the channel was assumed
to be time-invariant. Each packet was assumed to be lost or delayed independently of the other
packets. Based on these assumptions, the scheduling policies for different video packets can be
factorized. Thus the policy optimization problem was greatly simplified. For wireless channels,
the channel state is time varying and thus the packet losses are not independent across coherence
periods, hence scheduling policy factorization is not possible.
• In [12], Zhang et al. studied single user video transmission over wireless channels. The control
policy was learned on-line through reinforcement learning. The major drawback of reinforcement
learning is that it takes time to learn from the wrong scheduling actions. The scheduler may cause
bad visual quality during the learning period. As shown in [12] and [19], with an accelerated
on-line learning algorithm, the scheduling policy converges after 25 frames are transmitted. In
our work, the channel dynamics are assumed to be known. Indeed, the finite state Markov model
can be obtained analytically [20] or by measurement [21]. The channel model is combined with
a simple rate-quality model to form a model-based MDP formulation. The scheduling policy thus
can be derived off-line.
• In [12] [16] [19], a discounted-reward maximization formulation is employed to trade off the
visual quality of recent and later frames. The discounting factor needs to be chosen heuristically
and affects the performance of the derived scheduling policy. In our work, an average-reward
maximization formulation is proposed. This formulation is naturally related to the time-average
MS-SSIM index, which correlates well with human subjective judgments of visual quality [8].
C. Notation Used
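$\mathbf{A}$ and $\mathbf{a}$ are examples of a matrix and a vector, respectively. $\mathcal{A}$ is a set. $|\mathcal{A}|$ is the cardinality of set $\mathcal{A}$. $\mathbf{1}$ is the all-ones vector and $\mathbf{0}$ is the zero vector. $\max\{\mathbf{a}, \mathbf{b}\}$ and $\min\{\mathbf{a}, \mathbf{b}\}$ are the componentwise maximum and minimum of vectors $\mathbf{a}$ and $\mathbf{b}$, respectively. $\mathbb{1}(\cdot)$ is the indicator function. $\lceil \cdot \rceil$ is the ceiling function. $P(\cdot)$ is the probability measure and $E[\cdot]$ is the expectation.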
II. SYSTEM MODEL
In this section, we first describe the wireless video system to be considered. Then, we present our
video codec configurations and introduce the rate-quality model based on MS-SSIM index. Finally,
we present the Markov channel model to be used in the sequel.
A. System Overview
We consider a time-slotted scalable video transmission system over slow fading wireless channels.
A video sequence is encoded with a quality-scalable video encoder and stored in a video server. The
video server transmits video data to a mobile user via a wireless transmitter. In each slot, the server
sends video data at the request of a scheduler located at the wireless transmitter. These
data are packetized at the wireless transmitter for physical layer transmission. The scheduler operates
according to a scheduling policy which maps the channel and buffer state to the scheduling action (see
Fig. 1).
We assume that the channel between the video server and the wireless transmitter is not the
bottleneck of the link. Thus, from the perspective of the wireless transmitter, the whole video sequence
is accessible. We also assume that the physical layer channel state information is available at the
transmitter and that the modulation and coding scheme (MCS) is determined by a given physical layer
link-adaptation policy.
B. Video Codec Configuration
We assume that video sequences are encoded by an H.264/SVC-compatible scalable video encoder.
The duration of each frame ΔT is called a frame slot. The video frames are uniformly partitioned into
Groups of Pictures (GOPs). Every GOP has $L_{\mathrm{GOP}}$ frames. The first frame in a GOP is an I frame
while the other frames are P frames. Every frame is encoded into $L$ layers. The first layer is the base
layer; the other layers are enhancement layers. Every enhancement layer of a frame is predictively
encoded using the lower layers of the frame. The base layer of a P frame is predictively encoded
using the base layer of its preceding frame. The base layer of an I frame is encoded independently
(see Fig. 2).
Each frame has a playout deadline at the receiver. In the following, frames whose deadlines have
expired are called expired frames. The other frames are called unexpired frames. The first unexpired
frame is called the “current frame”. At any time, the frames are indexed relative to the current frame
as shown in Fig. 2. The video data in the $\ell$th layer of the $f$th frame is called the $(f, \ell)$th video data
unit.
We adopt the prediction structure in Fig. 2 rather than the “Hierarchical B” structure because
no structural delay is introduced [1]. Specifically, in the “Hierarchical B” prediction structure, the
encoding order differs from the display order, thus, the transmission of a frame must be delayed until
all necessary predictors are received. Also, due to the time-varying nature of the wireless channels,
the adaptive transmitter must drop some enhancement layers when channel throughput is low. So,
if the enhancement layers are used to predict other frames as is the case in the “Hierarchical B”
structure, dropped enhancement layers can give rise to error propagation and unpredictable visual
quality degradation. At the possible cost of lower compression efficiency, the prediction structure that
we use eliminates error propagation arising from enhancement layer losses, since there will be
no inter-frame prediction among enhancement layers.
C. Rate-Quality Model
The rate-quality model characterizes the relationship between the size of a video data unit and
the visual quality improvement when it is correctly received. We adopt a simple model. For each
P frame, let $r^{P}_{\ell}$ be the amount of data in the $\ell$th layer data unit. For each I frame, let $r^{I}_{\ell}$ be the
amount of data in the $\ell$th layer data unit. Define $q_{\ell}$ to be the visual quality increment after the $\ell$th
layer is correctly received, given all its predictors are also received. This model implies that the visual
quality improvement incurred by the enhancement layers in one frame does not depend on whether
the enhancement layers of other frames are received. This is true for the prediction structure given in
Section II-B since there is no error propagation due to losses of enhancement layers.
Conventional image quality measures such as the PSNR reflect absolute signal fidelity but do not
account for perceptual visual quality. Recently, a variety of models that accurately predict perceptual
video quality have been proposed [7], [22]–[25]. In our formulation, we adopt MS-SSIM index as the
visual quality measure [7], since it has been shown to correlate quite well with perceptual visual
quality and it is of reasonable computational complexity [8].
The MS-SSIM index of a video sequence ranges from 0 to 1. The larger the index, the better
the quality. In our rate-quality model, the quality increment $q_{\ell}$ is measured using the MS-SSIM index.
Therefore, $q_{\ell} \in [0, 1]$. Larger values of $q_{\ell}$ mean that a larger quality improvement can be achieved by transmitting the $\ell$th layer data units.
In a real video sequence, rate-quality characteristics vary from frame to frame. For simplicity, we
use the average value of the measured rate-quality characteristics as estimates of $r^{I}_{\ell}$, $r^{P}_{\ell}$ and $q_{\ell}$. In Fig.
3, the data rates and MS-SSIM values of two video test sequences, “Foreman” and “Paris”, are shown.
These two sequences are widely used in visual quality assessment. “Foreman” has higher temporal
complexity and “Paris” has higher spatial complexity [26]. As shown in Fig. 3, our proposed model
is a good fit for the rate-quality characteristics.
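The rate-quality model above can be sketched as a small lookup structure. The following is a minimal illustration, assuming that the quality of a decodable frame is the sum of the increments $q_{\ell}$ of its received layers; the class name `RateQualityModel`, its field names and the numerical values are hypothetical and are not measurements from the test sequences.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RateQualityModel:
    """Per-layer averages; tuple index 0 is the base layer (layer 1 in the text)."""
    r_i: Tuple[float, ...]  # r^I_l: average bits in layer l of an I frame
    r_p: Tuple[float, ...]  # r^P_l: average bits in layer l of a P frame
    q: Tuple[float, ...]    # q_l: MS-SSIM increment contributed by layer l

    def unit_size(self, layer: int, is_i_frame: bool) -> float:
        """Estimated size in bits of the data unit in `layer` (1-indexed)."""
        sizes = self.r_i if is_i_frame else self.r_p
        return sizes[layer - 1]

    def frame_quality(self, layers_received: int) -> float:
        """Estimated MS-SSIM of a frame whose first `layers_received` layers
        (and all of their predictors) were received. Summing the increments
        is an assumption of this sketch."""
        return sum(self.q[:layers_received])


# Hypothetical three-layer stream (illustrative numbers only).
model = RateQualityModel(
    r_i=(40000.0, 20000.0, 20000.0),
    r_p=(8000.0, 6000.0, 6000.0),
    q=(0.90, 0.05, 0.03),
)
base_only = model.frame_quality(1)
all_layers = model.frame_quality(3)
```

Because the model stores only per-layer averages, a scheduler can estimate the cost and benefit of any data unit from its layer index and frame type alone.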
D. Channel Model
We focus on scheduling for a slow fading channel. By slow fading, we mean that the coherence time
of the channel is less than the duration of a GOP and larger than the duration of a frame. Assuming
the mobile users are moving at a walking speed of 1.5 m/s and the carrier frequency is 2 GHz, the Doppler
spread is about 10 Hz. The coherence time is then about 100 ms. A typical GOP duration is about 1 second
and a frame slot is about 30ms. Hence, for pedestrian video users, wireless channels are slow fading.
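The numbers in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes the maximum Doppler shift $f_D = v f_c / c$ and the common rule of thumb $T_c \approx 1/f_D$ for the coherence time; other definitions of coherence time differ by a constant factor. The function names are illustrative.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s (rounded)


def doppler_spread_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


def coherence_time_s(doppler_hz):
    """Rule-of-thumb coherence time T_c ~ 1 / f_D (an assumption)."""
    return 1.0 / doppler_hz


f_d = doppler_spread_hz(1.5, 2.0e9)   # pedestrian at 1.5 m/s, 2 GHz carrier
t_c = coherence_time_s(f_d)

frame_slot_s = 0.030                  # about 30 ms per frame
gop_s = 1.0                           # about 1 s per GOP
# Slow fading in the sense used here: frame slot < coherence time < GOP.
is_slow_fading = frame_slot_s < t_c < gop_s
```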
As the channel state is stable during each frame slot, the scheduling decision is made on a
frame-by-frame basis. At the beginning of each frame slot, a frame is played out. Then, the wireless transmitter
schedules video data units for transmission according to a scheduling action. The scheduling action is
defined as an ordered collection of video data units
$$u = \{(f_1, \ell_1), (f_2, \ell_2), \cdots, (f_{|u|}, \ell_{|u|})\}.$$
When a scheduling action u is taken, the data units contained in u are transmitted sequentially. At
the physical layer, each scheduled data unit is packetized into physical layer packets, and each packet is
retransmitted whenever an error occurs, until it is acknowledged. The MCS used in transmitting data
packets is determined by a link-adaptation policy. In this paper, we focus on scheduling policy design
and assume that the link-adaptation policy is given.
In [20] and [27], it is shown that the first-order FSMC can be utilized to accurately describe the
first-order channel state transition probabilities for Rayleigh fading channels. First-order FSMC models
have also been validated in [21] and [28] by channel measurements of urban area wireless channels. In
this paper, we employ a first-order FSMC to describe the dynamics of the channel state. It should be
noted that, as pointed out in [29], a first-order FSMC is not sufficient to describe high-order channel state
distributions. Generally, the autocorrelation function (ACF) of a first-order FSMC is exponentially
decreasing and the ACF of a Rayleigh fading channel is a zeroth-order Bessel function of the first
kind. To model the higher order dynamics of the wireless channel, at the cost of higher complexity,
a higher order Markov channel model can be applied.
At the physical layer, in the tth frame slot, the transmission bit rate Rt is determined by the MCS
and the packet error rate pt is determined by both the channel state and the MCS. Under the given
link adaptation method, the chosen MCS is a function of the channel state. Thus, there is a one-to-one
mapping from channel state to the tuple (Rt, pt). Due to the Markov property of the channel state,
$(R_t, p_t)$ can also be modeled by an FSMC. The channel state space is $\mathcal{C} = \{C_1, \ldots, C_{|\mathcal{C}|}\}$, where
$C_i = (R_i, p_i)$ is the $i$th channel state. The state transition matrix $\mathbf{P}^{c}$ is a $|\mathcal{C}| \times |\mathcal{C}|$ matrix with entry
$P^{c}_{i,j} = P(C_j \mid C_i)$ being the transition probability from state $(R_i, p_i)$ to $(R_j, p_j)$.
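A first-order FSMC of this kind is fully described by its state list and transition matrix. The following sketch, using only the Python standard library, simulates a hypothetical three-state channel and approximates its stationary distribution by power iteration. The states, rates, error rates and transition probabilities are invented for illustration and are not taken from [20], [21] or from the channel used later in the paper.

```python
import random

# Hypothetical channel states C_i = (R_i in bit/s, packet error rate p_i).
STATES = [(0.5e6, 0.20), (1.0e6, 0.10), (2.0e6, 0.05)]
# P_C[i][j] = P(C_j | C_i); each row sums to 1. Only transitions between
# neighbouring states are allowed, a common choice for slow fading.
P_C = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]


def next_state(i, rng):
    """Draw the next channel state index given the current index i."""
    return rng.choices(range(len(P_C)), weights=P_C[i])[0]


def simulate(n_slots, start=0, seed=0):
    """Return the sequence of channel state indices over n_slots frame slots."""
    rng = random.Random(seed)
    path, i = [], start
    for _ in range(n_slots):
        path.append(i)
        i = next_state(i, rng)
    return path


def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution pi = pi P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


path = simulate(1000)
pi = stationary_distribution(P_C)
```

For this particular matrix the balance equations give a stationary distribution of (0.25, 0.5, 0.25), which the power iteration reproduces.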
III. PROBLEM FORMULATION
In this section, we define the scheduler’s state space and the policies to be considered. Then, we show
how to simplify the scheduling problem to a finite-state Markov decision problem using reasonable
approximations. An infinite-horizon average-reward maximization MDP formulation is proposed to
optimize the scheduling policy so as to improve the time-average MS-SSIM index at the receiver.
A. Scheduling Policy and State Space
Considering all the possible scheduling actions makes defining the scheduling policy and represent-
ing the buffer state unmanageably complicated. If we do not apply any constraint on the scheduling
actions, the receiver buffer state could look like Fig. 4. On the one hand, to represent the buffer state,
the frame index and the layer index of each received data unit need to be recorded. Because the
number of received data units is not bounded, we cannot represent all possible buffer states using a
finite-dimension vector space. On the other hand, the scheduling actions which give rise to the buffer
state in Fig. 4 cannot provide optimal visual quality at the receiver. As shown in Fig. 4, some video
data units are transmitted before their predictors. If their predictors are not received before their playout
deadlines, these units are undecodable and useless. In this paper, by applying reasonable constraints
on the scheduling actions, we concentrate on those scheduling strategies that can potentially achieve
good performance. Specifically, we consider the scheduling policies which comply with the following
constraints.
Constraint 1: The scheduler always schedules a data unit later than its predictors in the prediction
structure.
Constraint 2: The amount of video data scheduled in the $t$th slot is just larger than $R_t \times \Delta T$, i.e.,
the amount of data which can be transmitted in the slot.
Constraint 3: The scheduler never schedules more enhancement layer data units for later P frames
than sooner P frames in the same GOP.
Constraint 1 is applied to make sure that the transmission order is compatible with the prediction
order given in Section II-B, since a data unit can be decoded only when its predictors are received.
Constraint 2 forces the transmitter to keep busy transmitting data during the entire slot. With Constraint
3, at any time and for all the P frames within a GOP, the scheduler does not sacrifice the quality of
the frames which will be displayed sooner for the frames to be displayed later by transmitting more
enhancement layer data for the latter. Because the optimization objective is time-average MS-SSIM
index, the quality of each P frame is equally important. Transmitting more enhancement layer data
for later frames does not help to improve the time-average MS-SSIM value.
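Constraint 1 can be checked mechanically for any ordered scheduling action. The sketch below assumes the prediction structure of Section II-B: an enhancement layer depends on the next lower layer of the same frame, the base layer of a P frame depends on the base layer of the preceding frame, and the base layer of an I frame has no predictor. The function name and the `is_i_frame` callback are hypothetical helpers introduced for illustration.

```python
def satisfies_constraint_1(action, received, is_i_frame):
    """Return True if every unit in `action` is sent after its predictors.

    action:     ordered list of (frame, layer) units, layers 1-indexed
    received:   iterable of (frame, layer) units already at the receiver
    is_i_frame: callable frame -> bool (hypothetical helper)
    """
    available = set(received)
    for frame, layer in action:
        if layer > 1:
            predictor = (frame, layer - 1)   # next lower layer, same frame
        elif is_i_frame(frame):
            predictor = None                 # I-frame base layer: no predictor
        else:
            predictor = (frame - 1, 1)       # base layer of preceding frame
        if predictor is not None and predictor not in available:
            return False
        available.add((frame, layer))        # this unit may now serve as predictor
    return True


# Toy usage: a GOP of 4 frames whose I frame sits at index 0.
def toy_is_i_frame(frame):
    return frame % 4 == 0


ok = satisfies_constraint_1([(0, 1), (1, 1), (0, 2)], [], toy_is_i_frame)
bad = satisfies_constraint_1([(0, 2), (0, 1)], [], toy_is_i_frame)
```

Requiring only the immediately lower layer is sufficient here, because that layer can itself be available only if its own predictors were sent earlier.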
Note that, although the P frames within a GOP are equally important for time-average MS-SSIM
index, the frames in different GOPs are not. As discussed in Section I, when the channel throughput is
very low, it is beneficial to sacrifice P frames in the current GOP in order to transmit the I frame of the next
GOP. To differentiate the importance of current and future GOPs, we partition the data units of the
unexpired frames into three sets: $\mathcal{I}^{\mathrm{pre}}$, $\mathcal{I}$ and $\mathcal{I}^{\mathrm{post}}$. The set $\mathcal{I}$ contains the unexpired data units of the
first unexpired I frame, $\mathcal{I}^{\mathrm{pre}}$ contains data units before the first unexpired I frame, and $\mathcal{I}^{\mathrm{post}}$ contains
the remaining unexpired data units (see Fig. 5). In the following, we define the buffer state spaces for
the three sets as $\mathcal{B}^{I}$, $\mathcal{B}^{\mathrm{pre}}$ and $\mathcal{B}^{\mathrm{post}}$, respectively. The overall buffer state space is $\mathcal{B} = \mathcal{B}^{\mathrm{pre}} \times \mathcal{B}^{I} \times \mathcal{B}^{\mathrm{post}}$.
$\mathcal{B}^{I}$: At the $t$th slot, the state of $\mathcal{I}$ is defined as $B^{I}_t = (f^{I}_t, b^{I}_t)$, where $f^{I}_t \in \{1, \cdots, L_{\mathrm{GOP}}\}$ is the
frame index of the first unexpired I frame and $b^{I}_t$ is the number of received data units
of $\mathcal{I}$.
$\mathcal{B}^{\mathrm{pre}}$: According to Constraint 3, the number of data units received in $\mathcal{I}^{\mathrm{pre}}$ is non-increasing with
respect to the frame index. Hence, we only need to record the number of received data
units for each layer. We define the buffer state of $\mathcal{I}^{\mathrm{pre}}$ as an $L$-dimensional vector
$B^{\mathrm{pre}}_t = (b^{\mathrm{pre}}_1, b^{\mathrm{pre}}_2, \cdots, b^{\mathrm{pre}}_L)$, where $b^{\mathrm{pre}}_{\ell}$ is the number of received data units in the $\ell$th layer of
$\mathcal{I}^{\mathrm{pre}}$.
$\mathcal{B}^{\mathrm{post}}$: Constraint 3 is only applied within each GOP. Hence, we cannot define the
state space of $\mathcal{I}^{\mathrm{post}}$ in the same way as we did for $\mathcal{I}^{\mathrm{pre}}$. In principle, we need to record the
number of received data units for each frame in $\mathcal{I}^{\mathrm{post}}$. In that case, we cannot define a vector
space with a fixed number of dimensions to represent the state of $\mathcal{I}^{\mathrm{post}}$. Hence, for simplicity,
we extend Constraint 3 to all the frames in $\mathcal{I}^{\mathrm{post}}$. In other words, the scheduler never transmits
more enhancement layers for later frames than sooner frames in $\mathcal{I}^{\mathrm{post}}$. Similar to $\mathcal{I}^{\mathrm{pre}}$, we
define the buffer state of $\mathcal{I}^{\mathrm{post}}$ as an $L$-dimensional vector $B^{\mathrm{post}}_t = (b^{\mathrm{post}}_1, b^{\mathrm{post}}_2, \cdots, b^{\mathrm{post}}_L)$,
where $b^{\mathrm{post}}_{\ell}$ is the number of received data units in the $\ell$th layer of $\mathcal{I}^{\mathrm{post}}$.
Remark: By extending Constraint 3 to all the GOPs in $\mathcal{I}^{\mathrm{post}}$, we actually rule out the option of
transmitting more data units for the I frames in later GOPs of $\mathcal{I}^{\mathrm{post}}$. This may degrade
performance, but the degradation is negligible. Transmitting more data units for a later I frame is necessary only
when the channel state is bad throughout the whole GOP. However, as we assumed in Section II-D,
the coherence time of the channel is much less than the duration of a GOP. The channel is unlikely
to be bad throughout the whole GOP. Hence, quality degradation is unlikely to occur.
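The per-layer buffer vectors defined above are a lossless encoding of the per-frame layer counts whenever those counts are non-increasing in the frame index, which is what Constraints 1 and 3 ensure. The sketch below illustrates the two directions of this mapping; the function names are hypothetical, and frames are indexed 1, 2, ... within the set for simplicity.

```python
def layers_to_counts(layers_per_frame, num_layers):
    """Per-layer counts b_l = number of frames holding at least l layers.

    layers_per_frame[k] is the number of received layers of the (k+1)th
    frame of the set; it must be non-increasing in k.
    """
    if any(a < b for a, b in zip(layers_per_frame, layers_per_frame[1:])):
        raise ValueError("layer counts must be non-increasing in frame index")
    return [sum(1 for k in layers_per_frame if k >= layer)
            for layer in range(1, num_layers + 1)]


def counts_to_layers(counts, num_frames):
    """Inverse map: frame f (1-indexed) holds sum_l 1(b_l >= f) layers."""
    return [sum(1 for b in counts if b >= f) for f in range(1, num_frames + 1)]


per_frame = [3, 3, 2, 1, 0]               # received layers of five frames
counts = layers_to_counts(per_frame, 3)   # (b_1, b_2, b_3)
recovered = counts_to_layers(counts, 5)
```

The inverse map uses the same indicator sum that appears in Eq. (1), where the number of received layers of a given frame is obtained by counting the layers whose per-layer count reaches that frame.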
When no video data units for the current frame are received, the decoder cannot continue decoding
the frame. The video stream will experience an interruption until all the necessary base layer data
units for decoding the current frame are received. We define an interruption state $\mathcal{N}^{\mathrm{itr}}_t$, which is the set
of the unreceived base layer data units for decoding the current frame at slot $t$ (see Fig. 5(b)). When
the current frame is decodable, the interruption state $\mathcal{N}^{\mathrm{itr}}_t = \emptyset$. $\mathcal{N}^{\mathrm{itr}}_t$ contains at most $L_{\mathrm{GOP}} - 1$ data
units because I frames resynchronize the decoding process and terminate the interruption. Because
every data unit is transmitted only when all its predictors are received, the data units needed for
decoding the current frame must be composed of a sequence of consecutive base layer data units,
i.e., $\mathcal{N}^{\mathrm{itr}}_t = \{(-N^{\mathrm{itr}}_t + 1, 1), (-N^{\mathrm{itr}}_t + 2, 1), \cdots, (0, 1)\}$. Hence, the interruption state can be simply
represented by the number $N^{\mathrm{itr}}_t = |\mathcal{N}^{\mathrm{itr}}_t|$.
The system state space $\mathcal{S}$ is defined as the product of the channel state space, the buffer state space and the interruption
state space. At slot $t$, the system state is $S_t = (C_t, B_t, N^{\mathrm{itr}}_t)$, where $B_t = (B^{\mathrm{pre}}_t, B^{I}_t, B^{\mathrm{post}}_t)$. For each state
$S_t \in \mathcal{S}$, we define a feasible control set $\mathcal{U}_{S_t}$ which contains all the scheduling actions complying with
all three constraints. At any time, the state $S_t$ contains all the information about the receiver buffer
and the channel. The transmitter must decide which action in $\mathcal{U}_{S_t}$ to take in order to maximize the
time-average MS-SSIM index value. We define the scheduling policy $\mu(\cdot)$ as the mapping from the
system state $S_t$ to an action in $\mathcal{U}_{S_t}$. In the following sections, we show how to optimize the scheduling
policy $\mu(\cdot)$.
B. Policy Simplification
Because the receiver buffer size is regarded as essentially infinite, the state space $\mathcal{B}^{\mathrm{post}}$ is therefore
infinite. Optimizing the scheduling policy over this infinite-state space is intractable. In the following,
we reduce the state space to a finite one by reasonable approximations.
When many frames are buffered at the receiver, the scheduler can transmit more enhancement
layers because there is enough time before the frames are played out. Based on this observation,
we define a window $\mathcal{W}$ which contains the data units within the first $W$ unexpired frames. The
scheduler is restricted to work as follows: If all the data in $\mathcal{W}$ are received and $\mathcal{N}^{\mathrm{itr}}_t = \emptyset$, the
scheduler schedules as many enhancement layers as possible. Otherwise, the scheduler only focuses
on scheduling the video data in $\mathcal{W}$ and $\mathcal{N}^{\mathrm{itr}}_t$.
By using the window $\mathcal{W}$, although the state space is still infinite, we fix the scheduling actions
outside a finite state set. In other words, it is only necessary to find the optimal actions for a finite-
state set. The window size W provides a tradeoff between complexity and optimality. The larger the
window, the less constrained the control policy but the higher the complexity.
It should be noted that the window defined here is different from the sliding window defined in [10]
and [19]. Our scheduling policy allows the transmitter to transmit the data units outside the window.
C. Transition Probability
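Let $S_t = (C_t, B_t, N^{\mathrm{itr}}_t)$ and $\mathcal{U}_{S_t}$ be the system state and the corresponding feasible control set at
slot $t$, where $C_t = (R_t, p_t)$ and $B_t = (B^{\mathrm{pre}}_t, B^{I}_t, B^{\mathrm{post}}_t)$. At the beginning of each slot, one frame is
decoded and played out. We let $B^{+}_t = (B^{\mathrm{pre}+}_t, B^{I+}_t, B^{\mathrm{post}+}_t)$ denote the buffer state right after the first
frame is displayed. If $f^{I}_t = 1$, i.e., the decoded frame is an I frame, then the frame set $\mathcal{I}$ becomes
the next I frame, i.e., the $L_{\mathrm{GOP}}$th frame (see Fig. 6(a)). Hence, the buffer state is
$$B^{I+}_t = \Big(L_{\mathrm{GOP}},\ \sum_{\ell=1}^{L} \mathbb{1}\big(b^{\mathrm{post}}_{\ell} \geq L_{\mathrm{GOP}}\big)\Big), \tag{1}$$
where $\sum_{\ell=1}^{L} \mathbb{1}(b^{\mathrm{post}}_{\ell} \geq L_{\mathrm{GOP}})$ is the number of received layers in the next I frame. Meanwhile, $\mathcal{I}^{\mathrm{pre}}$
becomes the first $L_{\mathrm{GOP}} - 1$ frames and $\mathcal{I}^{\mathrm{post}}$ contains the frames whose index is larger than $L_{\mathrm{GOP}}$.
Thus, we have
$$B^{\mathrm{pre}+}_t = \min\big\{B^{\mathrm{post}}_t,\ (L_{\mathrm{GOP}} - 1)\mathbf{1}\big\} \tag{2}$$
and
$$B^{\mathrm{post}+}_t = \max\big\{B^{\mathrm{post}}_t - L_{\mathrm{GOP}}\mathbf{1},\ \mathbf{0}\big\}. \tag{3}$$
If the decoded frame is not an I frame, the frame set $\mathcal{I}^{\mathrm{post}}$ will not be affected and the buffer state
$B^{\mathrm{post}}_t$ does not change (see Fig. 6(b)). $B^{\mathrm{pre}}_t$ becomes
$$B^{\mathrm{pre}+}_t = \max\big\{B^{\mathrm{pre}}_t - \mathbf{1},\ \mathbf{0}\big\}. \tag{4}$$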
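Summarizing (1), (2), (3) and (4), we have
$$B^{\mathrm{pre}+}_t = \begin{cases} \min\big\{B^{\mathrm{post}}_t,\ (L_{\mathrm{GOP}} - 1)\mathbf{1}\big\} & \text{if } f^{I}_t = 1, \\ \max\big\{B^{\mathrm{pre}}_t - \mathbf{1},\ \mathbf{0}\big\} & \text{if } f^{I}_t \neq 1, \end{cases}$$
$$B^{I+}_t = \begin{cases} \Big(L_{\mathrm{GOP}},\ \sum_{\ell=1}^{L} \mathbb{1}\big(b^{\mathrm{post}}_{\ell} \geq L_{\mathrm{GOP}}\big)\Big) & \text{if } f^{I}_t = 1, \\ \big(f^{I}_t - 1,\ b^{I}_t\big) & \text{if } f^{I}_t \neq 1, \end{cases}$$
and
$$B^{\mathrm{post}+}_t = \begin{cases} \max\big\{B^{\mathrm{post}}_t - L_{\mathrm{GOP}}\mathbf{1},\ \mathbf{0}\big\} & \text{if } f^{I}_t = 1, \\ B^{\mathrm{post}}_t & \text{if } f^{I}_t \neq 1. \end{cases}$$
After the first frame is displayed, the transmitter begins to sequentially transmit the collection of video
data units indicated by the action $u_t = \mu(S_t) = \{(f_1, \ell_1), \cdots, (f_{|u_t|}, \ell_{|u_t|})\}$. Let $\Delta u_t = \{(f_1, \ell_1), \cdots, (f_{n_t}, \ell_{n_t})\}$
denote the completely received data units by the end of the slot, where $n_t$ is the number
of received data units. Among the data units in $\Delta u_t$, let $\Delta B^{\mathrm{pre}}_t = (\Delta b^{\mathrm{pre}}_1, \Delta b^{\mathrm{pre}}_2, \cdots, \Delta b^{\mathrm{pre}}_L)$ be the
number of newly received data units for each layer in frame set $\mathcal{I}^{\mathrm{pre}}$. Similarly, we denote $\Delta B^{\mathrm{post}}_t =
(\Delta b^{\mathrm{post}}_1, \Delta b^{\mathrm{post}}_2, \cdots, \Delta b^{\mathrm{post}}_L)$ as the number of newly received data units for each layer in frame set $\mathcal{I}^{\mathrm{post}}$
and $\Delta b^{I}$ as the number of received data units for $\mathcal{I}$. At the beginning of the $(t+1)$th slot, we have
the following state transition relationship
$$B^{\mathrm{pre}}_{t+1} = B^{\mathrm{pre}+}_t + \Delta B^{\mathrm{pre}}_t, \tag{5}$$
$$B^{I}_{t+1} = \big(f^{I+}_t,\ b^{I+}_t + \Delta b^{I}\big), \tag{6}$$
$$B^{\mathrm{post}}_{t+1} = B^{\mathrm{post}+}_t + \Delta B^{\mathrm{post}}_t. \tag{7}$$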
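The buffer recursion in Eqs. (1)-(7) translates directly into two small update functions. The following is a minimal sketch in which the per-layer vectors are plain Python lists and the state of $\mathcal{I}$ is a pair; the function and argument names are hypothetical, and the interruption state is not modeled here.

```python
def playout_update(b_pre, b_i, b_post, l_gop):
    """Buffer state right after the current frame is played out.

    b_pre, b_post: per-layer counts of received units in I^pre and I^post
    b_i:           (f_i, n_i) = (index of the first unexpired I frame,
                                 number of its received data units)
    """
    f_i, n_i = b_i
    if f_i == 1:
        # The displayed frame is an I frame: the sets shift by one GOP.
        new_i = (l_gop, sum(1 for b in b_post if b >= l_gop))  # Eq. (1)
        new_pre = [min(b, l_gop - 1) for b in b_post]          # Eq. (2)
        new_post = [max(b - l_gop, 0) for b in b_post]         # Eq. (3)
    else:
        # A P frame was displayed: only I^pre loses one frame.
        new_pre = [max(b - 1, 0) for b in b_pre]               # Eq. (4)
        new_i = (f_i - 1, n_i)
        new_post = list(b_post)
    return new_pre, new_i, new_post


def arrival_update(b_pre, b_i, b_post, d_pre, d_i, d_post):
    """Eqs. (5)-(7): add the units completely received during the slot."""
    next_pre = [b + d for b, d in zip(b_pre, d_pre)]
    next_i = (b_i[0], b_i[1] + d_i)
    next_post = [b + d for b, d in zip(b_post, d_post)]
    return next_pre, next_i, next_post


# Toy usage with L_GOP = 4 and L = 2 layers.
after_i = playout_update([0, 0], (1, 2), [5, 2], 4)
after_p = playout_update([2, 0], (3, 1), [1, 0], 4)
next_state = arrival_update(*after_p, [1, 1], 1, [0, 0])
```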
As for the interruption state $N^{\mathrm{itr}}_t$, after the first frame is played out, the second frame becomes the
current frame. If the displayed frame is an I frame, $N^{\mathrm{itr}+}_t = \mathbb{1}(b^{I}_t = 0)$. Here, $\mathbb{1}(b^{I}_t = 0)$ equals one
when the base layer of the displayed frame has not been received. If the displayed frame is the last frame of
a GOP, then $N^{\mathrm{itr}+}_t = 0$. If the displayed frame is neither an I frame nor the last frame in a GOP, we
have $N^{\mathrm{itr}+}_t = N^{\mathrm{itr}}_t + \mathbb{1}(b^{\mathrm{pre}}_1 = 0)$. At the end of the slot, the data units in $\mathcal{N}^{\mathrm{itr}+}_t$ which are also in $\Delta u_t$
are removed from $\mathcal{N}^{\mathrm{itr}+}_t$. Thus,
$$N^{\mathrm{itr}}_{t+1} = N^{\mathrm{itr}+}_t - \big|\mathcal{N}^{\mathrm{itr}+}_t \cap \Delta u_t\big|. \tag{8}$$
The amount of video data in $\Delta u_t$, denoted by $R(B^I_t, \Delta u_t)$, can be estimated according to the buffer state $B^I_t$ and the rate-quality model introduced in Section II-C. Specifically, for each data unit in $\Delta u_t$, we first determine whether it belongs to an I frame or a P frame according to $B^I_t$ and then estimate the amount of data by the rate-quality model. The set $\Delta u_t$ records the completely transmitted data units up to the $(f_{n_t}, \ell_{n_t})$th data unit, whereas data unit $(f_{n_t+1}, \ell_{n_t+1})$ is only partially received. Denoting the amount of data in unit $(f_{n_t+1}, \ell_{n_t+1})$ by $\tilde{R}(B^I_t, \Delta u_t)$, the amount of received data is at least $R(B^I_t, \Delta u_t)$ and at most $R(B^I_t, \Delta u_t) + \tilde{R}(B^I_t, \Delta u_t)$. Assuming the physical layer packet length is $L_{PHY}$, there are $N = \lfloor \Delta T \times R_t / L_{PHY} \rfloor$ packet transmissions during a time slot of duration $\Delta T$. The number of successfully transmitted packets is at least $N_l = \lceil R(B^I_t, \Delta u_t) / L_{PHY} \rceil$ and is less than $N_h = \lceil (R(B^I_t, \Delta u_t) + \tilde{R}(B^I_t, \Delta u_t)) / L_{PHY} \rceil$. As assumed in Section II-D, the channel state is constant over each slot. Thus, the packet losses within each slot are independent with the same loss probability $p_t$, and the number of successful packet transmissions in a slot is binomially distributed. Hence, the state transition probability from $S_t = (C_t, B_t, N^{itr}_t)$ to $S_{t+1} = (C_{t+1}, B_{t+1}, N^{itr}_{t+1})$ is
$$P_\mu(S_{t+1} \mid S_t) = \left[\sum_{n_t = N_l}^{N_h - 1} \binom{N}{n_t} p_t^{N - n_t} (1 - p_t)^{n_t}\right] P(C_{t+1} \mid C_t), \tag{9}$$
where the first multiplicative term is the transition probability of the receiver buffer state from $(B_t, N^{itr}_t)$ to $(B_{t+1}, N^{itr}_{t+1})$ and the second term is the transition probability of the channel state from $C_t$ to $C_{t+1}$.
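As a rough numerical illustration of (9), the binomial sum can be evaluated directly. The sketch below is hypothetical: the function name, the argument names, and the cap of the upper summation limit at $N + 1$ are assumptions introduced here; data amounts and the packet length only need to share the same unit.

```python
from math import ceil, comb, floor


def transition_probability(rate: float, slot: float, packet_len: float,
                           loss_prob: float, data_complete: float,
                           data_partial: float, channel_prob: float) -> float:
    """Evaluate Eq. (9) for one pair of states (S_t, S_{t+1}).

    data_complete plays the role of R(B^I_t, du_t), data_partial the role of
    the partially received unit, and channel_prob the channel transition term.
    """
    n_total = floor(slot * rate / packet_len)                     # N
    n_low = ceil(data_complete / packet_len)                      # N_l
    n_high = ceil((data_complete + data_partial) / packet_len)    # N_h
    # Assumption: there cannot be more successes than transmissions.
    n_high = min(n_high, n_total + 1)
    buffer_prob = sum(
        comb(n_total, n) * loss_prob ** (n_total - n) * (1 - loss_prob) ** n
        for n in range(n_low, n_high)
    )
    return buffer_prob * channel_prob
```

With four packet transmissions per slot and a loss probability of 0.5, receiving exactly two packets has probability $\binom{4}{2} / 2^4 = 0.375$, which is what `transition_probability(4, 1, 1, 0.5, 2, 1, 1.0)` evaluates.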
D. Optimization Objective
At the beginning of each time slot $t$, the first frame in the window is played out and the MS-SSIM index is
$$Q(S_t) = \sum_{\ell=1}^{L} q_\ell \times \mathbb{1}_\ell(S_t), \tag{10}$$
where $\mathbb{1}_\ell(S_t)$ is the indicator of whether the $\ell$th layer of the displayed frame is received at state $S_t$. The quantity $q_\ell$ is the rate-quality model parameter defined in Section II-C. Because of the way that the MS-SSIM index is defined, the quantity $Q(S_t)$ is also bounded in $[0, 1]$. Our aim is to find the optimal policy $\mu^*(\cdot)$ which maximizes the time-average MS-SSIM index, i.e.,
$$J_\mu = \lim_{N \to \infty} \frac{1}{N} E_\mu\left\{\sum_{t=0}^{N-1} Q(S_t)\right\}. \tag{11}$$
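A small sketch may help make (10) and the finite-horizon version of (11) concrete. The function names and the boolean-tuple encoding of the layer indicators are assumptions made for illustration; the $q_\ell$ values are the "foreman" entries of Table I.

```python
from typing import Sequence

# q_1, q_2, q_3 for "foreman" as listed in Table I.
Q_FOREMAN = (0.9165, 0.0256, 0.0106)


def frame_quality(q: Sequence[float], received: Sequence[bool]) -> float:
    """Eq. (10): sum of q_l over the layers received for the displayed frame."""
    return sum(q_l for q_l, got in zip(q, received) if got)


def time_average_quality(q: Sequence[float],
                         trace: Sequence[Sequence[bool]]) -> float:
    """Empirical counterpart of Eq. (11) over a finite trace of slots."""
    return sum(frame_quality(q, received) for received in trace) / len(trace)
```

For example, a frame displayed with all three layers scores `frame_quality(Q_FOREMAN, (True, True, True))`, i.e., the sum of the three $q_\ell$ values, while a frame with only the base layer scores $q_1$.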
E. Finite State Problem Formulation
Using the window defined in Section III-B, we fix the transmission policy outside a finite state set. The buffer state space, however, is still infinite and the system state evolves in this infinite state space. In the following, we show how to simplify this infinite state space problem to a finite-state problem.
Note that we only need to optimize the policy $\mu(\cdot)$ when some of the video data in the window have not been received or $N^{itr} \neq 0$. We formally define this finite set $\mathcal{S}_W$ as follows:
$$\mathcal{S}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}(B) \not\supset \mathcal{V}_W \text{ or } N^{itr} \neq 0\}, \tag{12}$$
where $\mathcal{V}_W$ denotes the set of video data units in $W$ and $\mathcal{V}(B)$ is the set of buffered video data units. We define another subset of $\mathcal{S}$ as follows:
$$\bar{\mathcal{S}}_W = \{(C, B, N^{itr}) \mid (C, B, N^{itr}) \in \mathcal{S},\ \mathcal{V}_W \subset \mathcal{V}(B) \text{ and } N^{itr} = 0\}. \tag{13}$$
For all the states in $\bar{\mathcal{S}}_W$, the video data in $W$ and $N^{itr}$ are all received. Note that, because the transmitter always transmits the video data units in $W$ and $N^{itr}$ with higher priority, $\mathcal{S}_W$ and $\bar{\mathcal{S}}_W$ form a partition of the state space $\mathcal{S}$. In other words, we have $\mathcal{S}_W \cup \bar{\mathcal{S}}_W = \mathcal{S}$ and $\mathcal{S}_W \cap \bar{\mathcal{S}}_W = \emptyset$.
Given a policy $\mu(\cdot)$, the system state transits as a controlled Markov chain in set $\mathcal{S}_W$ and as an uncontrolled Markov chain in set $\bar{\mathcal{S}}_W$. Because the transmission rate is finite, the number of states in $\bar{\mathcal{S}}_W$ which can be reached from $\mathcal{S}_W$ in one step is also finite. We formally define this set of states as follows:
$$\mathcal{S}_\Delta = \{S \mid S \in \bar{\mathcal{S}}_W;\ \exists S' \in \mathcal{S}_W \text{ s.t. } P_\mu(S \mid S') > 0\}. \tag{14}$$
Once the system moves into the set $\bar{\mathcal{S}}_W$, the system state hits a state in $\mathcal{S}_\Delta$ and then stays in $\bar{\mathcal{S}}_W$ for some time. During this period, the decoded video quality is always $\hat{Q} = \sum_{\ell=1}^{L} q_\ell$, because all the data units in $W$ are received. The dynamics of the system when it moves into set $\bar{\mathcal{S}}_W$ affect the performance of the system. Generally, the longer it stays in $\bar{\mathcal{S}}_W$, the better the performance is. Although the scheduling policy in $\bar{\mathcal{S}}_W$ is fixed as described in Section III-B, the control policy in $\mathcal{S}_W$ determines how frequently the system state will hit $\bar{\mathcal{S}}_W$ and thus also affects the system performance.
In the following, we denote the system under a given policy $\mu$ as system $A_\mu$. Let $T_\mu(S)$ be the expected time spent by $A_\mu$ in $\bar{\mathcal{S}}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S \in \mathcal{S}_\Delta$. Let $P^T_\mu(S' \mid S)$ denote the probability that $A_\mu$ jumps back to $\mathcal{S}_W$ at state $S' \in \mathcal{S}_W$ after it enters $\bar{\mathcal{S}}_W$ at state $S$. To find the optimal policy, we define a finite-state system $\tilde{A}_\mu$ as follows:
Definition 1: A system $\tilde{A}_\mu$ is called the simplified system of the original system $A_\mu$ if it has the following dynamics:
1) The system is a semi-Markov process over state space $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. At any state $S \in \tilde{\mathcal{S}}$, the visual quality is $Q(S)$ as in (10). At any state in $\mathcal{S}_W$, the system acts according to the policy $\mu$;
2) When the system jumps to a state $S \in \mathcal{S}_\Delta$, it spends $T_\mu(S)$ slots in $S$ with video quality $\sum_{\ell=1}^{L} q_\ell$ for each slot. Then, the system transits to a state $S' \in \mathcal{S}_W$ with probability $P^T_\mu(S' \mid S)$.
It should be noted that $\tilde{A}_\mu$ is not coupled with the original system $A_\mu$. It just shares some properties with the original system (see Fig. 7). The following theorem relates the visual quality under $\tilde{A}_\mu$ and that of $A_\mu$.
Theorem 1: If the jump chain of the original system $A_\mu$ is positive recurrent, then the time-average MS-SSIM index of $A_\mu$ is the same as that of the simplified system $\tilde{A}_\mu$.
Proof Sketch: If the jump chain is positive recurrent, the jumps from $\mathcal{S}_W$ to $\mathcal{S}_\Delta$ partition the Markov process into i.i.d. segments. We only need to optimize the policy $\mu$ to maximize the average quality in each segment. Every segment consists of two consecutive subsegments. During the first subsegment, $S_t \in \bar{\mathcal{S}}_W$. In the other subsegment, $S_t \in \mathcal{S}_W$. Because every state in $\bar{\mathcal{S}}_W$ has the same visual quality $\sum_{\ell=1}^{L} q_\ell$, we can abstract the first subsegment as a single state with transition probability $P^T_\mu(S' \mid S)$. This simplified system provides the same average quality as the original system. For a detailed proof, see the technical report [30].
Remark: The positive recurrent condition for the jump chain means that the average throughput of the
channel is neither too large nor too small relative to the average data rate of the video. If the average
throughput of the channel is very large, the receiver buffer can always buffer enough frames and the
dynamic scheduling is unnecessary. If the average channel throughput is too small, the channel cannot
support the video stream and dynamic scheduling cannot help either.
As indicated by Theorem 1, given any policy $\mu$, the visual quality of $A_\mu$ is the same as that of $\tilde{A}_\mu$. Thus, we can optimize our policy with respect to $\tilde{A}_\mu$, which has a finite state space, and a standard policy optimization algorithm can be applied.
IV. POLICY OPTIMIZATION
In the following, we show how to compute the parameters $T_\mu$ and $P^T_\mu$. Then, we present a value iteration algorithm to find the optimal scheduling policy.
A. Computation of $T_\mu$ and $P^T_\mu$
Before we can apply an MDP algorithm to optimize the policy, we need to compute $T_\mu(S)$ and $P^T_\mu(S' \mid S)$ for every state $S \in \mathcal{S}_\Delta$ and $S' \in \mathcal{S}_W$. Both $T_\mu(S)$ and $P^T_\mu(S' \mid S)$ only involve the dynamics of the system when the state $S \in \bar{\mathcal{S}}_W$. Because the scheduling policy is fixed in $\bar{\mathcal{S}}_W$, we can further simplify the state representation.
Let $B^W_t$ be the number of buffered packets outside the window. Note that, when the system moves in $\bar{\mathcal{S}}_W$, the system always schedules as many enhancement layer data units as possible. In addition, when the system is in state set $\bar{\mathcal{S}}_W$, we always have $N^{itr}_t = \emptyset$. Hence, $B^W_t$ and $f^I_t$ contain all the information about the buffer state $(B_t, N^{itr}_t)$. We can further simplify the state representation $(C_t, B_t, N^{itr}_t)$ to $(C_t, B^W_t, f^I_t)$ when $S \in \bar{\mathcal{S}}_W$. All the states in $\bar{\mathcal{S}}_W$ correspond to some states with $B^W_t \ge 0$. All the states in $\mathcal{S}_W$ correspond to some states with $B^W_t < 0$.
When the system evolves in $\bar{\mathcal{S}}_W$, at the beginning of a slot $t$, the state $B^W_t$ first decreases by $\Delta B^W_d(S_t)$ when the current frame is displayed. Then, the transmitter schedules as many enhancement layer data units as possible. At the end of the slot, $B^W_t$ increases by $\Delta B^W_i(S_t)$. Because the quantity $\Delta B^W(S_t) = \Delta B^W_i(S_t) - \Delta B^W_d(S_t)$ only depends on state $S_t$, the state $B^W_t$ varies like a random walk but with Markovian step-size $\Delta B^W$. This process can be described by a quasi-birth-death process (QBDP). Hence, determining $T_\mu$ and $P^T_\mu$ is actually the hitting time problem of the QBDP. The problem for the continuous-time QBDP was essentially solved in [31, p. 96]. The discrete-time case can be solved similarly. Details on how to compute $T_\mu$ and $P^T_\mu$ are found in the technical report [30].
B. Determining Optimal Policy via Value Iteration
Given $T_\mu$ and $P^T_\mu$, the optimal policy for an MDP can be determined for the simplified system $\tilde{A}_\mu$, which is also the optimal policy for the original system $A_\mu$. Let $S_{ini}$ be any state in $\tilde{\mathcal{S}} = \mathcal{S}_W \cup \mathcal{S}_\Delta$. The hitting times to state $S_{ini}$ partition the process into i.i.d. cycles. Optimizing the policy $\mu(\cdot)$ in the cycles maximizes the time-average MS-SSIM index of the system. Similar to the derivation in [3, p. 441], this is equivalent to an average-reward maximization problem with stage-reward $g(S) - \tau(S)\lambda$, where $\lambda$ is the expected average-reward of each cycle and
$$g(S) = \begin{cases} Q(S) & : S \in \mathcal{S}_W, \\ T_\mu(S) \sum_{\ell=1}^{L} q_\ell & : S \in \mathcal{S}_\Delta, \end{cases} \qquad \tau(S) = \begin{cases} 1 & : S \in \mathcal{S}_W, \\ T_\mu(S) & : S \in \mathcal{S}_\Delta. \end{cases}$$
Let us denote by $h(S)$ the average reward-to-go in each cycle when the system starts at state $S$. Then we have the following array of Bellman's equations:
$$h(S) = g(S) - \tau(S)\lambda + \sum_{S' \in \mathcal{S}_W \cup \mathcal{S}_\Delta} P_\mu(S' \mid S)\, h(S'), \tag{15}$$
where $h(S_{ini}) = 0$. To find the optimal policy, the standard value iteration algorithm can be applied [3, p. 430].
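To make (15) concrete for a fixed policy, the sketch below solves the linear system for $\lambda$ and $h(\cdot)$ directly. Note that this is plain policy evaluation, not the value iteration algorithm cited above, and it assumes the embedded chain has a single recurrent class so that the system is nonsingular. The function names, the dense list-of-lists transition matrix, and the two-state toy numbers are assumptions made for illustration.

```python
from typing import List, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def evaluate_policy(p: List[List[float]], g: List[float], tau: List[float],
                    ini: int = 0) -> Tuple[float, List[float]]:
    """Solve Eq. (15), h = g - tau * lam + P h with h[ini] = 0, for (lam, h)."""
    n = len(g)
    # Unknowns are h[0], ..., h[n-1] followed by lam.
    rows = [
        [(1.0 if s == j else 0.0) - p[s][j] for j in range(n)] + [tau[s]]
        for s in range(n)
    ]
    rows.append([1.0 if j == ini else 0.0 for j in range(n)] + [0.0])
    sol = solve_linear(rows, list(g) + [0.0])
    return sol[n], sol[:n]
```

As a toy check, take one state in $\mathcal{S}_W$ with $g = 0.9$, $\tau = 1$ and one state in $\mathcal{S}_\Delta$ with $T_\mu = 3$ and $g = 3.0$. With transition rows `[0.5, 0.5]` and `[1.0, 0.0]`, the average reward per slot works out by hand to $\lambda = 2.4 / 2.5 = 0.96$.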
V. PERFORMANCE EVALUATION AND NEAR-OPTIMAL POLICY DESIGN
In this section, we first test the MDP-based scheduling policy by simulations and show its superiority to scheduling methods which do not explicitly exploit the buffer-channel information. Then, we propose a simple scheduling policy which achieves near-optimal performance.
A. Performance evaluation
The proposed dynamic scheduling algorithm was evaluated on the test sequences “foreman”, “bus”, “flower”, “mobile” and “Paris” [26]. These video sequences were encoded into 3 layers using the H.264/SVC reference software JSVM [32]. The GOP length was set to $L_{GOP} = 16$. The encoding parameters and rate-quality model parameters are listed in Table I. The parameters $r^I_\ell$ and $r^P_\ell$ are measured in megabits and $q_\ell$ is measured in MS-SSIM index. The quantization parameters (QPs) for the base layers were chosen such that the base layer quality is about 0.90 to 0.91 in MS-SSIM value, which is of moderate but acceptable visual quality. The QPs of the enhancement layers were chosen such that the third layer provides an MS-SSIM visual quality prediction of 0.95 to 0.96 and the bit rates of the two enhancement layers are roughly the same. The Lagrangian multipliers for motion estimation and mode decision were set as $QP - 2$.
We employ a 4-state Markov channel model to test the performance of the proposed scheduling
algorithm. The state transition matrix is
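$$P_c = \begin{bmatrix} \frac{1}{5} & \frac{4}{5} & 0 & 0 \\ \frac{1}{5} & \frac{3}{5} & \frac{1}{5} & 0 \\ 0 & \frac{3}{5} & \frac{1}{5} & \frac{1}{5} \\ 0 & 0 & \frac{4}{5} & \frac{1}{5} \end{bmatrix}$$
and the steady state distribution is
$$\pi = [0.15,\ 0.60,\ 0.20,\ 0.05].$$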
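The stated steady state distribution can be checked exactly with rational arithmetic, and the decay rate of the chain toward it gives a numerical estimate of the subdominant eigenvalue of $P_c$. The sketch below is only an illustration: the helper names and the ratio-of-gaps estimator are assumptions made here and not the authors' procedure.

```python
from fractions import Fraction as F

P_C = [
    [F(1, 5), F(4, 5), F(0), F(0)],
    [F(1, 5), F(3, 5), F(1, 5), F(0)],
    [F(0), F(3, 5), F(1, 5), F(1, 5)],
    [F(0), F(0), F(4, 5), F(1, 5)],
]
PI = [F(15, 100), F(60, 100), F(20, 100), F(5, 100)]


def step(dist, p):
    """One step of the chain: returns the row vector dist * p."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def is_stationary(dist, p):
    """True if dist is a probability vector satisfying dist * p == dist."""
    return sum(dist) == 1 and step(dist, p) == dist


def subdominant_estimate(p, pi, steps=40):
    """Estimate the second-largest eigenvalue magnitude of p.

    Starting in state 1, the L1 distance to pi shrinks by roughly this
    factor per step once the faster-decaying modes have died out.
    """
    dist = [F(1)] + [F(0)] * (len(p) - 1)
    prev_gap = None
    ratio = None
    for _ in range(steps):
        gap = sum(abs(d - s) for d, s in zip(dist, pi))
        if prev_gap:
            ratio = gap / prev_gap
        prev_gap = gap
        dist = step(dist, p)
    return float(ratio)
```

Here `is_stationary(PI, P_C)` holds exactly, and `subdominant_estimate(P_C, PI)` is close to the value $\rho = 0.5123$ reported in Table III.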
Let us denote the throughput of channel state $(R_i, p_i)$ by $r_i = R_i(1 - p_i)$. The state parameters $R_i$ and $p_i$ are configured such that $r_1 < R^V_1 < r_2 < R^V_2 < r_3 < R^V_3 < r_4$, in which $R^V_i$ is the average video data rate up to the $i$th layer. Hence, the channel throughput fluctuates among the average rates of the layers. According to the steady state distribution, the average throughput of the channel is higher than the average rate of the base layer but not enough to support the first enhancement layer.
We simulated 40 transmissions of each sequence over this channel model. To conceal errors, every
lost frame was reconstructed by copying its preceding frame. To demonstrate the advantages of the
proposed algorithm, three scheduling algorithms were tested over the same channel realizations. They
are summarized as follows:
H1 This policy only schedules the base layer. It is the most conservative transmission strategy.
H2 This policy always schedules as many enhancement layers as possible. It preferentially
maximizes the visual quality of recent frames.
H3 This policy decides dynamically how many enhancement layers to transmit. Specifically,
when the instantaneous channel throughput is lower than the average data rate up to the first
enhancement layer, the policy behaves like H1. If the channel throughput is higher than the
average data rate up to the second enhancement layer, the policy acts like H2. Otherwise,
the policy schedules the video data up to the first enhancement layer.
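The decision rule of H3 can be written as a short function. This is a sketch under stated assumptions: the function name, the return convention (the highest layer to schedule, with 1 meaning base layer only) and the argument layout are introduced here for illustration.

```python
from typing import Sequence


def h3_highest_layer(throughput: float, rate_up_to: Sequence[float]) -> int:
    """Highest layer scheduled by heuristic H3 for a 3-layer stream.

    rate_up_to[i] is the average video data rate up to layer i + 1,
    i.e., (R^V_1, R^V_2, R^V_3), in the same unit as throughput.
    """
    if throughput < rate_up_to[1]:
        return 1   # behave like H1: base layer only
    if throughput > rate_up_to[2]:
        return 3   # behave like H2: all enhancement layers
    return 2       # up to the first enhancement layer
```

For example, with hypothetical rates `(1.0, 2.0, 3.0)`, a throughput of 2.5 falls between $R^V_2$ and $R^V_3$, so the function returns 2.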
The average visual quality measured using the MS-SSIM index is shown in Table II. It is observed that the proposed scheduling algorithm outperforms all of the other policies by 0.015 to 0.035 in MS-SSIM value, which is perceptually significant. As can be seen in Table I, increasing the MS-SSIM value by 0.02 requires approximately doubling the video bit rate. Thus, the proposed scheduling algorithm provides very significant performance improvements over the other scheduling policies.
B. Near-optimal scheduling policy design
Although the MDP-based scheduling policy has the optimal performance among all the policies we consider, the off-line computation of MDP policies requires extra system resources. This motivates us to design a simple on-line scheduling policy which performs similarly to the MDP-based policy.
The simulation results show that, by dynamically scheduling data associated with different layers, H3 achieves much better performance than the other heuristics. However, for the sequence “Paris”, which has large I frames, H3’s performance is much worse. This is mainly because H3 only transmits video frames sequentially without proactively transmitting data of later GOPs. In the following, we propose a scheduling scheme which, similar to the MDP-based policy, not only dynamically schedules data units associated with different layers but also allocates transmissions for later GOPs.
At the $t$th slot, the proposed policy first estimates the amount of data which can be sent before the $(t + \tau)$th time slot as $D_t = \sum_{n=1}^{\tau - 1} [r_t \rho^{n-1} + r_{avg}(1 - \rho^{n-1})]$. Here, $\rho$ is the subdominant eigenvalue of $P_c$, which represents the temporal correlation of the channel condition. Such a correlation parameter could be easily measured. The quantity $r_t$ is the throughput of the current slot and $r_{avg}$ is the average throughput of the channel. Again, these parameters can be measured. If $b^I = 0$, correctly sending $I$ to the receiver is critical and thus we set $\tau = f^I$. If $b^I > 0$, we care about the transmissions within the coherence time and thus we set $\tau = \lceil 1 / \ln(1/\rho) \rceil$, where $1 / \ln(1/\rho)$ is roughly the relaxation time of channel variations. Let $D^\ell_t$ be the amount of unreceived data contained in the first $\ell$ layers of the next $\tau$ frames. The proposed scheduling policy operates as follows:
• If $D_t < D^2_t$, the policy only schedules the base layer data units; if $D^2_t \le D_t < D^3_t$, the policy only schedules the first two layers; otherwise, the policy schedules data units from all three layers.
• If $D_t < D^1_t$, the policy schedules as much of the base layer data for $I$ as possible. If $D^1_t < D_t < D^2_t$, the policy schedules up to 50% of the transmissions for $I$. Otherwise, the scheduler transmits the frames sequentially without proactively transmitting data for $I$.
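The two rules above, together with the estimate $D_t$, can be sketched as follows. Everything named here is an assumption for illustration: the function names, the encoding of the second rule as a fraction of transmissions reserved for $I$ (1.0 standing for "as much as possible"), and the treatment of the boundary case $D_t = D^1_t$, which the text leaves open and which is assigned to the 50% branch.

```python
from math import ceil, log
from typing import Tuple


def lookahead_slots(rho: float, f_i: int, base_of_i_received: bool) -> int:
    """Horizon tau: f^I if the base layer of I is missing, else ceil(1/ln(1/rho))."""
    if not base_of_i_received:          # b^I = 0
        return f_i
    return ceil(1.0 / log(1.0 / rho))


def predicted_data(r_t: float, r_avg: float, rho: float, tau: int) -> float:
    """D_t: sum over n = 1, ..., tau - 1 of r_t rho^(n-1) + r_avg (1 - rho^(n-1))."""
    return sum(
        r_t * rho ** (n - 1) + r_avg * (1.0 - rho ** (n - 1))
        for n in range(1, tau)
    )


def near_optimal_decision(d: float, d1: float, d2: float,
                          d3: float) -> Tuple[int, float]:
    """Return (highest layer to schedule, share of transmissions reserved for I)."""
    if d < d2:
        top_layer = 1
    elif d < d3:
        top_layer = 2
    else:
        top_layer = 3
    if d < d1:
        share_for_i = 1.0   # as much base layer data for I as possible
    elif d < d2:
        share_for_i = 0.5   # up to 50% of the transmissions for I
    else:
        share_for_i = 0.0   # sequential transmission only
    return top_layer, share_for_i
```

With $\rho = 0.5123$ and $\rho = 0.7232$, `lookahead_slots` returns 2 and 4 respectively when the base layer of $I$ has been received, which agrees with the values of $\tau$ listed in Tables III and IV.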
The parameters of the proposed policy, i.e., $\rho$, $r_t$ and $r_{avg}$, can be easily measured and the policy is very simple to implement. The performance of this policy was tested over simulated Markov channel models with different values of the temporal correlation parameter $\rho$. The simulation results are summarized in Table III and Table IV. For most tested sequences, the gap between the proposed policy and the MDP-based policy ranges from 0.003 to 0.01.
VI. CONCLUSIONS
We have developed a dynamic scheduling method for efficient scalable video transmission over wireless channels. By modeling the wireless channel as a Markov chain, we formulate an infinite-horizon average-reward maximization problem to maximize the visual quality predicted by the MS-SSIM index. To reduce the state space to a finite one, we employ a window to fix the scheduling policy when all the data within the window are received. It is shown that the scheduling policy optimization problem is equivalent to finding the optimal control policy for a controlled semi-Markov process over a finite state space. Simulation results demonstrate the superiority of the scheduling policy obtained by the proposed MDP-based formulation. Further, a simple scheduling policy is proposed that achieves near-optimal performance.
REFERENCES
[1] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE
Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sept. 2007.
[2] W. Li, “Overview of fine granularity scalability in MPEG-4 video standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 11,
no. 3, pp. 301–317, Mar. 2001.
[3] D. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Athena Scientific, 2005, vol. 2.
[4] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at signal fidelity measures,” IEEE Signal Process.
Mag., vol. 26, no. 1, pp. 98–117, Jan. 2009.
[5] B. Girod, “What’s wrong with mean-squared error,” Digital Images and Human Vision (A. B. Watson, ed.), pp. 207–220, 1993.
[6] A. M. Eskicioglu and P. S. Fisher, “Image quality measures and their performance,” IEEE Trans. Commun., vol. 43, no. 12, pp.
2959–2965, Dec. 1995.
[7] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Conference Record
of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, vol. 2, Nov. 2003, pp. 1398–1402.
[8] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, “Study of subjective and objective quality assessment of
video,” IEEE Trans. Image Process., vol. 19, no. 6, pp. 1427–1441, Jun. 2010.
[9] M. Podolsky, S. McCanne, and M. Vetterli, “Soft ARQ for layered streaming media,” Technical Report, Computer Science Division,
University of California, Berkeley, vol. UCB/CSD-98-1024, 1998.
[10] P. A. Chou and Z. Miao, “Rate-distortion optimized streaming of packetized media,” IEEE Trans. Multimedia, vol. 8, no. 2, pp.
390–404, Apr. 2006.
[11] J. Chakareski, P. A. Chou, and B. Aazhang, “Computing rate-distortion optimized policies for streaming media to wireless clients,”
in Proceedings of Data Compression Conference, 2002, pp. 53–62.
[12] Y. Zhang, F. Fu, and M. van der Schaar, “On-line learning and optimization for wireless video transmission,” IEEE Trans. Signal
Process., vol. 58, no. 6, pp. 3108–3124, Jun. 2010.
[13] F. Fu and M. van der Schaar, “A new systematic framework for autonomous cross-layer optimization,” IEEE Trans. Veh. Technol.,
vol. 58, no. 4, pp. 1887–1903, May 2009.
[14] C. Chen, R. W. Heath, A. C. Bovik, and G. de Veciana, “Adaptive policies for real-time video transmission: a Markov decision
process framework,” in 18th IEEE International Conference on Image Processing, Sept. 2011.
[15] Y. Li, A. Markopoulou, J. Apostolopoulos, and N. Bambos, “Content-aware playout and packet scheduling for video streaming
over wireless links,” IEEE Trans. Multimedia, vol. 10, no. 5, pp. 885–895, Aug. 2008.
[16] J. Cabrera, A. Ortega, and J. I. Ronda, “Stochastic rate-control of video coders for wireless channels,” IEEE Trans. Circuits Syst.
Video Technol., vol. 12, no. 6, pp. 496–510, Jun. 2002.
[17] J. W. Huang, H. Mansour, and V. Krishnamurthy, “A dynamical games approach to transmission-rate adaptation in multimedia
WLAN,” IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3635–3646, Jul. 2010.
[18] J. Filar and K. Vrieze, Competitive Markov Decision Processes. New York: Springer-Verlag, 1997.
[19] F. Fu and M. Van Der Schaar, “A systematic framework for dynamically optimizing multi-user wireless video transmission,” IEEE
J. Sel. Areas Commun., vol. 28, no. 3, pp. 308–320, Apr. 2010.
[20] Q. Zhang and S. A. Kassam, “Finite-state Markov model for Rayleigh fading channels,” IEEE Trans. Commun., vol. 47, no. 11,
pp. 1688–1692, Nov. 1999.
[21] H.-P. Lin and M.-J. Tseng, “Two-layer multistate Markov model for modeling a 1.8 GHz narrow-band wireless propagation channel
in urban Taipei city,” IEEE Trans. Veh. Technol., vol. 54, no. 2, pp. 435–446, Mar. 2005.
[22] M. H. Pinson and S. Wolf, “A new standardized method for objectively measuring video quality,” IEEE Trans. Broadcast., vol. 50,
no. 3, pp. 312–322, Sept. 2004.
[23] D. M. Chandler and S. S. Hemami, “VSNR: A wavelet-based visual signal-to-noise ratio for natural images,” IEEE Trans. Image
Process., vol. 16, no. 9, pp. 2284–2298, Sept. 2007.
[24] A. K. Moorthy and A. C. Bovik, “Visual importance pooling for image quality assessment,” IEEE J. Sel. Topics Signal Process.,
vol. 3, no. 2, pp. 193–201, Apr. 2009.
[25] Z. Wang and Q. Li, “Video quality assessment using a statistical model of human visual speed perception,” Journal of the Optical
Society of America, vol. A 24, B61-B69, Jul. 2007.
[26] Test sequences [On line]. Available: http://trace.eas.asu.edu/yuv/.
[27] H. S. Wang and P.-C. Chang, “On verifying the first-order Markovian assumption for a Rayleigh fading channel model,” IEEE
Trans. Veh. Technol., vol. 45, no. 2, pp. 353–357, May 1996.
[28] T. Su, H. Ling, and W. J. Vogel, “Markov modeling of slow fading in wireless mobile channels at 1.9 GHz,” IEEE Trans. Antennas
Propag., vol. 46, no. 6, pp. 947–948, Jun. 1998.
[29] C. C. Tan and N. C. Beaulieu, “On first-order Markov modeling for the Rayleigh fading channel,” IEEE Trans. Commun., vol. 48,
no. 12, pp. 2032–2040, Dec. 2000.
[30] Technical Report [On line]. Available: https://webspace.utexas.edu/cc39488/pdf/report.pdf.
[31] M. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithm Approach. The Johns Hopkins University Press, 1981.
[32] J. Reichel, S. Schwarz, and W. M, “Joint scalable video model 11 (JSVM 11),” Joint Video Team, Doc. JVT-X202, Jul. 2007.
TABLE I
THE ENCODING PARAMETERS AND RATE-QUALITY MODEL PARAMETERS OF THE TESTED SEQUENCES.
            Layer 1 (base layer)              Layer 2                           Layer 3
Sequence    QP   r^I_1    r^P_1    q_1        QP   r^I_2    r^P_2    q_2        QP   r^I_3    r^P_3    q_3
foreman     34   0.0448   0.0095   0.9165     30   0.057    0.0144   0.0256     28   0.0264   0.0203   0.0106
bus         34   0.0942   0.0285   0.9167     30   0.0864   0.0397   0.0390     28   0.0420   0.0580   0.0126
flower      40   0.0846   0.0130   0.9117     36   0.075    0.0225   0.0400     35   0.028    0.0268   0.008
mobile      39   0.1098   0.0174   0.9021     35   0.0953   0.0338   0.0427     34   0.0356   0.0416   0.0076
Paris       37   0.0765   0.0079   0.9155     32   0.0711   0.0135   0.0375     30   0.0392   0.0182   0.0113
TABLE II
THE PERFORMANCE OF THE MDP-BASED POLICY.
Policy               Paris    mobile   flower   bus      foreman
MDP-based Policy     0.9479   0.9437   0.9468   0.9471   0.9438
H1                   0.9144   0.9023   0.9117   0.9167   0.9165
H2                   0.9164   0.8585   0.9160   0.8930   0.8943
H3                   0.8821   0.9199   0.9325   0.9337   0.9271
TABLE III
THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.5123 AND τ = 2.
Policy                Paris    mobile   flower   bus      foreman
MDP-based Policy      0.9479   0.9437   0.9468   0.9471   0.9438
Near-optimal Policy   0.9434   0.9418   0.9444   0.9460   0.9416
TABLE IV
THE PERFORMANCE OF THE NEAR-OPTIMAL POLICY. ρ = 0.7232 AND τ = 4.
Policy                Paris    mobile   flower   bus      foreman
MDP-based Policy      0.9530   0.9461   0.9537   0.9494   0.9476
Near-optimal Policy   0.9431   0.9418   0.9468   0.9491   0.9414
[Figure: block diagram. Labels: Video Server, Scheduler, Transmitter, Markov Channel, Receiver, Requests, Requested Data, Channel and Receiver Buffer State.]
Fig. 1. Dynamic Scheduling for Video Transmission
[Figure: grid of data units labeled (frame, layer) from (−1, 1) to (4, 3), with regions marked Expired Frames, Current Frame and Unexpired Frames.]
Fig. 2. Encoder Prediction Structure when $L = 3$
[Figure: six panels, each comparing measured data with the adopted model. (a) Foreman and (b) Paris: frame size (Mbit) versus frame index. (c) Foreman and (d) Paris: frame size (Mbit) of the 1st, 2nd and 3rd layers versus frame index. (e) Foreman and (f) Paris: MS-SSIM versus frame index.]
Fig. 3. Comparison between measured rate-quality characteristics and estimated rate-quality characteristics using the adopted rate-quality model. The measured rate and the estimated rate for P frames are shown in (a) and (b); the measured rate and the estimated rate for I frames are shown in (c) and (d); the measured MS-SSIM value and the modeled MS-SSIM value are shown in (e) and (f).
[Figure: buffer diagram of expired and unexpired I and P frames, with received and unreceived data units marked.]
Fig. 4. An example of the buffer state when no constraints are applied to the scheduling policies. $L_{GOP} = 8$, $L = 3$.
[Figure: two buffer diagrams, (a) and (b), showing expired and unexpired I and P frames, the frame sets labeled pre and post, the data units labeled itr in (b), and received versus unreceived data units.]
Fig. 5. An illustration of the receiver buffer state when $L_{GOP} = 8$, $L = 3$. (a): $B^{pre}_t = (4, 2, 1)$, $B^I_t = (6, 2)$, $B^{post}_t = (3, 3, 1)$ and $N^{itr} = 0$; (b): $B^{pre}_t = (0, 0, 0)$, $B^I_t = (5, 2)$, $B^{post}_t = (4, 1, 0)$ and $N^{itr} = 2$.
[Figure: two diagrams of the frame sets before (pre, post) and after (pre+, post+) the displayed frame is played out. (a) The displayed frame is an I frame. (b) The displayed frame is a P frame.]
Fig. 6. Buffer state transition after a frame is played out. $L_{GOP} = 8$, $L = 3$.
[Figure: state-space diagrams of (a) $A_\mu$ and (b) $\tilde{A}_\mu$; panel (b) is annotated with $T_\mu$ and $P^T_\mu(S' \mid S)$.]
Fig. 7. The dynamics of the system $A_\mu$ and the corresponding simplified system $\tilde{A}_\mu$.