
Prediction of Transmission Distortion for Wireless Video Communication: Analysis

Zhifeng Chen and Dapeng Wu, Department of Electrical and Computer Engineering, University of Florida, Gainesville, Florida 32611

Abstract—Transmitting video over wireless is a challenging problem since video may be seriously distorted due to packet errors caused by wireless channels. The capability of predicting transmission distortion (i.e., video distortion caused by packet errors) can assist in designing video encoding and transmission schemes that achieve maximum video quality or minimum end-to-end video distortion. This paper is aimed at deriving formulae for predicting transmission distortion. The contribution of this paper is twofold. First, we identify the governing law that describes how the transmission distortion process evolves over time, and analytically derive the transmission distortion formula as a closed-form function of video frame statistics, channel error statistics, and system parameters. Second, we identify, for the first time, two important properties of transmission distortion. The first property is that the clipping noise, produced by non-linear clipping, causes decay of propagated error. The second property is that the correlation between motion vector concealment error and propagated error is negative, and has a dominant impact on transmission distortion compared to other correlations. Due to these two properties and an elegant error/distortion decomposition, our formula provides not only more accurate prediction but also lower complexity than the existing methods.

Index Terms—Wireless video, transmission distortion, clipping noise, slice data partitioning (SDP), unequal error protection (UEP), time-varying channel.

    I. INTRODUCTION

Both multimedia technology and mobile communications have experienced massive growth and commercial success in recent years. As the two technologies converge, wireless video, such as video phone and mobile TV in 3G/4G systems, is expected to achieve unprecedented growth and worldwide success. However, different from the traditional video coding system, transmitting video over wireless channels with good quality or low end-to-end distortion is particularly challenging since the received video is subject to not only quantization error but also transmission error. In a wireless video communication system, end-to-end distortion consists of two parts: quantization distortion and transmission distortion. Quantization distortion is caused by quantization errors during the encoding process, and has been extensively studied in rate distortion theory [1], [2]. Transmission distortion is caused by packet errors during the transmission of a video sequence, and it is the major part of the end-to-end distortion in delay-sensitive wireless video communication¹ under high packet error probability (PEP), e.g., in a wireless fading channel.

Please direct all correspondence to Prof. Dapeng Wu, University of Florida, Dept. of Electrical & Computer Engineering, P.O. Box 116130, Gainesville, FL 32611, USA. Tel. (352) 392-4954. Fax (352) 392-0044. Email: [email protected]. Homepage: http://www.wu.ece.ufl.edu. This work was supported in part by the US National Science Foundation under grant ECCS-1002214. Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected].

The capability of predicting transmission distortion at the transmitter can assist in designing video encoding and transmission schemes that achieve maximum video quality under resource constraints. Specifically, transmission distortion prediction can be used in the following three applications in video encoding and transmission: 1) mode decision, which is to find the best intra/inter-prediction mode for encoding a macroblock (MB) with the minimum rate-distortion (R-D) cost given the instantaneous PEP; 2) cross-layer encoding rate control, which is to control the instantaneously encoded bit rate for a real-time encoder to minimize the frame-level end-to-end distortion given the instantaneous PEP, e.g., in video conferencing; 3) packet scheduling, which chooses a subset of packets of the pre-coded video to transmit and intentionally discards the remaining packets to minimize the group of picture (GOP)-level end-to-end distortion given the average PEP and average burst length, e.g., in streaming pre-coded video over networks. All three applications require a formula for predicting how transmission distortion is affected by their respective control policy, in order to choose the optimal mode, encoding rate, or transmission schedule.

However, predicting transmission distortion poses a great challenge due to the spatio-temporal correlation inside the input video sequence, the nonlinearity of both the encoder and the decoder, and the varying PEP in time-varying channels. In a typical video codec, the temporal correlation among consecutive frames and the spatial correlation among the adjacent pixels of one frame are exploited to improve the coding efficiency. Nevertheless, such a coding scheme brings much difficulty to predicting transmission distortion because a packet error will degrade not only the video quality of the current frame but also that of the following frames due to error propagation. In addition, as we will see in Section III, the nonlinearity of both the encoder and the decoder makes the instantaneous transmission distortion not equal to the sum of the distortions caused by individual error events. Furthermore, in a wireless fading channel, the PEP is time-varying, which makes the error process a non-stationary random process; hence, as a function of the error process, the distortion process is also a non-stationary random process.

¹Delay-sensitive wireless video communication usually does not allow retransmission to correct packet errors since retransmission may cause long delay.

According to the aforementioned three applications, the existing algorithms for estimating transmission distortion can be categorized into the following three classes: 1) pixel-level or block-level algorithms (applied to mode decision), e.g., the Recursive Optimal Per-pixel Estimate (ROPE) algorithm [3] and the Law of Large Numbers (LLN) algorithm [4], [5]; 2) frame-level or packet-level or slice-level algorithms (applied to cross-layer encoding rate control) [6], [7], [8], [9], [10]; 3) GOP-level or sequence-level algorithms (applied to packet scheduling) [11], [12], [13], [14], [15]. Although the existing distortion estimation algorithms work at different levels, they share some common properties, which come from the inherent characteristics of a wireless video communication system, that is, spatio-temporal correlation, nonlinear codec, and time-varying channel. However, none of the existing works analyzed the effect of non-linear clipping noise on the transmission distortion, and therefore they cannot provide accurate distortion estimation.

In this paper, we derive the transmission distortion formulae for wireless video communication systems. With consideration of spatio-temporal correlation, nonlinear codec and time-varying channel, our distortion prediction formulae improve the accuracy of distortion estimation over existing works. Besides that, our formulae support, for the first time, the following capabilities: 1) prediction at different levels (e.g., pixel/frame/GOP level), 2) prediction for multi-reference motion compensation, 3) prediction under slice data partitioning (SDP) [16], 4) prediction under arbitrary slice-level packetization with the flexible macroblock ordering (FMO) mechanism [17], [18], 5) prediction under time-varying channels, 6) one unified formula for both I-MB and P-MB, and 7) prediction for both low motion and high motion video sequences. In addition, this paper also identifies two important properties of transmission distortion for the first time: 1) clipping noise, produced by non-linear clipping, causes decay of propagated error; 2) the correlation between motion vector concealment error and propagated error is negative, and has a dominant impact on transmission distortion, among all the correlations between any two of the four components in transmission error. Due to the page limit, we move most of the experimental results to our sequel paper [19], which 1) verifies the accuracy of the formulae derived in this paper and compares it to existing models, 2) discusses the algorithms designed based on the formulae, 3) applies our algorithms in practical video codec design, and 4) compares the R-D performance between our algorithms and existing estimation algorithms.

The rest of the paper is organized as follows. Section II presents the preliminaries of our system model under study to facilitate the derivations in the later sections, and illustrates the limitations of existing transmission distortion models. In Section III, we derive the transmission distortion formula as a function of frame statistics, channel condition, and system parameters. Section IV concludes the paper.

    II. SYSTEM DESCRIPTION

    A. Structure of a Wireless Video Communication System

Fig. 1 shows the structure of a typical wireless video communication system. It consists of an encoder, two channels, and a decoder, where the residual channel and the motion vector (MV) channel may be either the same channel or different channels. If residual packets or MV packets are erroneous, the error concealment module will be activated. In typical video standards such as H.263/H.264 and MPEG-2/4, the functional blocks of an encoder can be divided into two classes: 1) basic parts, such as predictive coding, transform, quantization, entropy coding, motion compensation, and clipping; and 2) performance-enhancing parts, such as interpolation filtering, deblocking filtering, B-frames, multi-reference prediction, etc. Although up-to-date video standards, e.g., the emerging HEVC standard, include more and more performance-enhancing parts, the basic parts do not change. In this paper, we analyze the transmission distortion for the structure with the basic parts in Fig. 1.

[Fig. 1. System structure, where T, Q, Q⁻¹, and T⁻¹ denote transform, quantization, inverse quantization, and inverse transform, respectively. The block diagram shows video capture, T/Q and Q⁻¹/T⁻¹, motion estimation, motion compensation, clipping, and memory at the encoder; the residual channel and the MV channel; and Q⁻¹/T⁻¹, motion compensation, residual and MV error concealment, clipping, memory, and video display at the decoder.]

Note that in this system, both the residual channel and the MV channel are application-layer channels; specifically, both channels consist of entropy coding and entropy decoding, networking layers², and the physical layer (including channel encoding, modulation, wireless fading channel, demodulation, and channel decoding). Although the residual channel and the MV channel usually share the same physical-layer channel, the two application-layer channels may have different parameter settings (e.g., different channel code-rates) for different SDP packets under unequal error protection (UEP) consideration.

Table I lists the notations used in this paper. All vectors are in bold font. Note that the encoder needs to reconstruct the compressed video for predictive coding; hence the encoder and the decoder have a similar structure for pixel value reconstruction. To distinguish the variables in the reconstruction module of the encoder from those in the reconstruction module of the decoder, we add a hat (^) on top of the variables at the encoder and a tilde (~) on top of the variables at the decoder.

B. Clipping Noise

In this subsection, we examine the effect of clipping noise on the reconstructed pixel value along each pixel trajectory over time (frames). All pixel positions in a video sequence form a three-dimensional spatio-temporal domain, i.e., two dimensions in the spatial domain and one dimension in the temporal domain.

²Here, networking layers can include any layers other than the physical layer.


TABLE I
SUMMARY OF NOTATIONS

$\mathbf{u}^k$ : A pixel with position $\mathbf{u}$ in the $k$-th frame
$f_{\mathbf{u}}^k$ : Value of the pixel $\mathbf{u}^k$
$e_{\mathbf{u}}^k$ : Residual of the pixel $\mathbf{u}^k$
$mv_{\mathbf{u}}^k$ : MV of the pixel $\mathbf{u}^k$
$\Delta_{\mathbf{u}}^k$ : Clipping noise of the pixel $\mathbf{u}^k$
$\varepsilon_{\mathbf{u}}^k$ : Residual concealment error of the pixel $\mathbf{u}^k$
$\xi_{\mathbf{u}}^k$ : MV concealment error of the pixel $\mathbf{u}^k$
$\zeta_{\mathbf{u}}^k$ : Transmission reconstructed error of the pixel $\mathbf{u}^k$
$S_{\mathbf{u}}^k$ : Error state of the pixel $\mathbf{u}^k$
$P_{\mathbf{u}}^k$ : Error probability of the pixel $\mathbf{u}^k$
$D_{\mathbf{u}}^k$ : Transmission distortion of the pixel $\mathbf{u}^k$
$D^k$ : Transmission distortion of the $k$-th frame
$\mathcal{V}^k$ : Set of all the pixels in the $k$-th frame
$|\mathcal{V}^k|$ : Number of elements in set $\mathcal{V}^k$ (cardinality of $\mathcal{V}^k$)
$\alpha^k$ : Propagation factor of the $k$-th frame
$\beta^k$ : Percentage of I-MBs in the $k$-th frame
$\lambda^k$ : Correlation ratio of the $k$-th frame

Each pixel can be uniquely represented by $\mathbf{u}^k$ in this three-dimensional time-space, where $k$ means the $k$-th frame in the temporal domain and $\mathbf{u}$ is a two-dimensional vector in the spatial domain, i.e., a position in the $k$-th frame. The philosophy behind inter-prediction of a video sequence is to represent the video sequence by virtual motion of each pixel, i.e., each pixel recursively moves from position $\mathbf{v}$ in the $(k-1)$-th frame, i.e., $\mathbf{v}^{k-1}$, to position $\mathbf{u}^k$. The difference between these two positions is a two-dimensional vector called the MV of pixel $\mathbf{u}^k$, i.e., $mv_{\mathbf{u}}^k = \mathbf{v}^{k-1} - \mathbf{u}^k$. The difference between the pixel values of these two positions is called the residual of pixel $\mathbf{u}^k$, that is, $e_{\mathbf{u}}^k = f_{\mathbf{u}}^k - \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$. Recursively, each inter-predicted pixel in the $k$-th frame has one and only one reference pixel trajectory backward towards the latest I-block.³

At the encoder, after transform, quantization, inverse quantization, and inverse transform for the residual, the reconstructed pixel value for $\mathbf{u}^k$ may be out-of-range and should be clipped as

$\hat{f}_{\mathbf{u}}^k = \Gamma(\hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \hat{e}_{\mathbf{u}}^k),$   (1)

where $\Gamma(\cdot)$ is a clipping function defined by

$\Gamma(x) = \begin{cases} \gamma_l, & x < \gamma_l \\ x, & \gamma_l \le x \le \gamma_h \\ \gamma_h, & x > \gamma_h, \end{cases}$   (2)

where $\gamma_l$ and $\gamma_h$ are user-specified low and high thresholds, respectively. Usually, $\gamma_l = 0$ and $\gamma_h = 255$.
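For concreteness, a minimal Python sketch of the clipping operation in (1) and (2), using the usual 8-bit thresholds; the function name and example values are ours, not from the paper:

```python
def clip(x, gamma_l=0, gamma_h=255):
    """Clipping function Gamma(x) of Eq. (2): saturate x into [gamma_l, gamma_h]."""
    return max(gamma_l, min(x, gamma_h))

# Encoder-side reconstruction of Eq. (1): clip the motion-compensated
# reference value plus the dequantized residual.
f_hat_ref = 250   # reconstructed reference pixel value (made-up example)
e_hat = 20        # dequantized residual (made-up example)
f_hat = clip(f_hat_ref + e_hat)   # 270 is clipped to 255
```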

The residual and MV at the decoder may be different from their counterparts at the encoder because of channel impairments. Denote $\tilde{mv}_{\mathbf{u}}^k$ and $\tilde{e}_{\mathbf{u}}^k$ the MV and residual at the decoder, respectively. Then, the reference pixel position for $\mathbf{u}^k$ at the decoder is $\tilde{\mathbf{v}}^{k-1} = \mathbf{u}^k + \tilde{mv}_{\mathbf{u}}^k$, and the reconstructed pixel value for $\mathbf{u}^k$ at the decoder is

$\tilde{f}_{\mathbf{u}}^k = \Gamma(\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{e}_{\mathbf{u}}^k).$   (3)

In error-free channels, the reconstructed pixel value at the receiver is exactly the same as the reconstructed pixel value at the transmitter, because there is no transmission error and hence no transmission distortion. However, in error-prone channels, we know from (3) that $\tilde{f}_{\mathbf{u}}^k$ is a function of three factors: the received residual $\tilde{e}_{\mathbf{u}}^k$, the received MV $\tilde{mv}_{\mathbf{u}}^k$, and the propagated error $\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$. The received residual $\tilde{e}_{\mathbf{u}}^k$ depends on three factors, namely, 1) the transmitted residual $\hat{e}_{\mathbf{u}}^k$, 2) the residual packet error state, which depends on the instantaneous residual channel condition, and 3) the residual error concealment algorithm if the received residual packet is erroneous. Similarly, the received MV $\tilde{mv}_{\mathbf{u}}^k$ depends on 1) the transmitted $mv_{\mathbf{u}}^k$, 2) the MV packet error state, which depends on the instantaneous MV channel condition, and 3) the MV error concealment algorithm if the received MV packet is erroneous. The propagated error $\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$ includes the error propagated from the reference frames, and therefore depends on all samples in the previous frames indexed by $i$, where $1 \le i < k$, and on their reception error states as well as error concealment algorithms. In this paper, we consider temporal error concealment [20], [21] in deriving the transmission distortion formulae.

³We will also discuss intra-predicted pixels in Section III.

The non-linear clipping function within the pixel trajectory makes the distortion estimation more challenging. However, it is interesting to observe that clipping actually reduces transmission distortion. In Section III, we will quantify the effect of clipping on transmission distortion.

    C. Definition of Transmission Distortion

In a video sequence, all pixel positions in the $k$-th frame form a two-dimensional vector set $\mathcal{V}^k$, and we denote the number of elements in set $\mathcal{V}^k$ by $|\mathcal{V}^k|$. So, for any pixel at position $\mathbf{u}$ in the $k$-th frame, i.e., $\mathbf{u} \in \mathcal{V}^k$, its reference pixel position is chosen from set $\mathcal{V}^{k-1}$ for single-reference motion compensation.

Given the joint probability mass function (PMF) of $\hat{f}_{\mathbf{u}}^k$ and $\tilde{f}_{\mathbf{u}}^k$, we define the pixel-level transmission distortion (PTD) for pixel $\mathbf{u}^k$ by

$D_{\mathbf{u}}^k \triangleq E[(\hat{f}_{\mathbf{u}}^k - \tilde{f}_{\mathbf{u}}^k)^2],$   (4)

where $E[\cdot]$ represents expectation and the randomness comes from both the random video input and the random channel error state. Then, we define the frame-level transmission distortion (FTD) for the $k$-th frame by

$D^k \triangleq E\Big[\frac{1}{|\mathcal{V}^k|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} (\hat{f}_{\mathbf{u}}^k - \tilde{f}_{\mathbf{u}}^k)^2\Big].$   (5)

It is easy to prove that the relationship between FTD and PTD is characterized by

$D^k = \frac{1}{|\mathcal{V}^k|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k.$   (6)

In fact, (6) is a general form for distortions of all levels. If $|\mathcal{V}^k| = 1$, (6) reduces to (4). For slice/packet-level distortion, $\mathcal{V}^k$ is the set of the pixels contained in a slice/packet. For GOP-level distortion, $\mathcal{V}^k$ could be replaced by the set of the pixels contained in a GOP. In this paper, we only show how to derive formulae for PTD and FTD. Our methodology is also applicable to deriving formulae for slice/packet/GOP-level distortion by using an appropriate $\mathcal{V}^k$.


    D. Limitations of the Existing Transmission Distortion Models

We define the clipping noise for pixel $\mathbf{u}^k$ at the encoder as

$\hat{\Delta}_{\mathbf{u}}^k \triangleq (\hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \hat{e}_{\mathbf{u}}^k) - \Gamma(\hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \hat{e}_{\mathbf{u}}^k),$   (7)

and the clipping noise for pixel $\mathbf{u}^k$ at the decoder as

$\tilde{\Delta}_{\mathbf{u}}^k \triangleq (\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{e}_{\mathbf{u}}^k) - \Gamma(\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{e}_{\mathbf{u}}^k).$   (8)

Using (1), Eq. (7) becomes

$\hat{f}_{\mathbf{u}}^k = \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \hat{e}_{\mathbf{u}}^k - \hat{\Delta}_{\mathbf{u}}^k,$   (9)

and using (3), Eq. (8) becomes

$\tilde{f}_{\mathbf{u}}^k = \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{e}_{\mathbf{u}}^k - \tilde{\Delta}_{\mathbf{u}}^k,$   (10)

where $\hat{\Delta}_{\mathbf{u}}^k$ only depends on the video content and encoder structure, e.g., motion estimation, quantization, mode decision and clipping function; and $\tilde{\Delta}_{\mathbf{u}}^k$ depends on not only the video content and encoder structure, but also the channel conditions and decoder structure, e.g., error concealment and clipping function.

In most existing works [3], [7], [9], [10], [15], both $\hat{\Delta}_{\mathbf{u}}^k$ and $\tilde{\Delta}_{\mathbf{u}}^k$ are neglected, i.e., these works assume $\hat{f}_{\mathbf{u}}^k = \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \hat{e}_{\mathbf{u}}^k$ and $\tilde{f}_{\mathbf{u}}^k = \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{e}_{\mathbf{u}}^k$. However, this assumption is only valid for stored video or error-free communication, where $\tilde{\Delta}_{\mathbf{u}}^k = \hat{\Delta}_{\mathbf{u}}^k$, since $\hat{\Delta}_{\mathbf{u}}^k = 0$ with very high probability. For error-prone communication, the decoder clipping noise $\tilde{\Delta}_{\mathbf{u}}^k$ has a significant impact on transmission distortion and hence should not be neglected. Without taking $\tilde{\Delta}_{\mathbf{u}}^k$ into consideration, the estimated distortion can be much larger than the true distortion [22].

    III. TRANSMISSION DISTORTION FORMULAE

In this section, we derive formulae for PTD and FTD. The section is organized as follows: Section III-A presents an overview of our approach to analyzing PTD and FTD. Then, we elaborate on the derivation details in Section III-B through Section III-E. Specifically, Section III-B quantifies the effect of residual concealment error (RCE) on transmission distortion; Section III-C quantifies the effect of motion vector concealment error (MVCE) on transmission distortion; Section III-D quantifies the effect of propagated error and clipping noise on transmission distortion; Section III-E quantifies the effect of correlations (between any two of the error sources) on transmission distortion. Finally, Section III-F summarizes the key results of this paper, i.e., the formulae for PTD and FTD.

    A. Overview of the Approach to Analyzing PTD and FTD

To analyze PTD and FTD, we take a divide-and-conquer approach. We first divide the transmission reconstructed error into four components: three random errors (RCE, MVCE and propagated error) due to their different physical causes, and clipping noise, which is a non-linear function of these three random errors. This error decomposition allows us to further decompose transmission distortion into four terms, i.e., distortion caused by 1) RCE, 2) MVCE, 3) propagated error plus clipping noise, and 4) correlations between any two of the error sources, respectively. This distortion decomposition facilitates the derivation of a simple and accurate closed-form formula for each of the four distortion terms. Next, we elaborate on the error decomposition and the distortion decomposition.

TABLE II
DEFINITIONS

RCE : residual concealment error
MVCE : motion vector concealment error
PTD : pixel-level transmission distortion
FTD : frame-level transmission distortion
XEP : pixel error probability
PEP : packet error probability
FMO : flexible macroblock ordering
UEP : unequal error protection
SDP : slice data partitioning
PMF : probability mass function

Define the transmission reconstructed error for pixel $\mathbf{u}^k$ by $\tilde{\zeta}_{\mathbf{u}}^k \triangleq \hat{f}_{\mathbf{u}}^k - \tilde{f}_{\mathbf{u}}^k$. From (9) and (10), we obtain

$\tilde{\zeta}_{\mathbf{u}}^k = (\hat{e}_{\mathbf{u}}^k + \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} - \hat{\Delta}_{\mathbf{u}}^k) - (\tilde{e}_{\mathbf{u}}^k + \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} - \tilde{\Delta}_{\mathbf{u}}^k) = (\hat{e}_{\mathbf{u}}^k - \tilde{e}_{\mathbf{u}}^k) + (\hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} - \hat{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}) + (\hat{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} - \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}) - (\hat{\Delta}_{\mathbf{u}}^k - \tilde{\Delta}_{\mathbf{u}}^k).$   (11)

Define RCE $\tilde{\varepsilon}_{\mathbf{u}}^k$ by $\tilde{\varepsilon}_{\mathbf{u}}^k \triangleq \hat{e}_{\mathbf{u}}^k - \tilde{e}_{\mathbf{u}}^k$, and define MVCE $\tilde{\xi}_{\mathbf{u}}^k$ by $\tilde{\xi}_{\mathbf{u}}^k \triangleq \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} - \hat{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$. Note that $\hat{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} - \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} = \tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$, which is the transmission reconstructed error of the concealed reference pixel in the reference frame; we call $\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$ the propagated error. As mentioned in Section II-D, we assume $\hat{\Delta}_{\mathbf{u}}^k = 0$. Therefore, (11) becomes

$\tilde{\zeta}_{\mathbf{u}}^k = \tilde{\varepsilon}_{\mathbf{u}}^k + \tilde{\xi}_{\mathbf{u}}^k + \tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k.$   (12)

Eq. (12) is our proposed error decomposition. In Table II, we list the abbreviations that will be used frequently in the following sections.
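The decomposition (11)-(12) can be checked numerically; a minimal sketch with made-up pixel values, chosen so that the encoder does not clip (so $\hat{\Delta}_{\mathbf{u}}^k = 0$):

```python
def clip(x, lo=0.0, hi=255.0):   # clipping function Gamma of Eq. (2)
    return max(lo, min(x, hi))

# Made-up example values (not from the paper):
f_hat_ref  = 200.0   # encoder reconstruction at u + mv (true reference)
f_hat_conc = 230.0   # encoder reconstruction at u + mv_tilde (concealed reference)
f_til_conc = 250.0   # decoder reconstruction at u + mv_tilde
e_hat, e_til = 30.0, 20.0

f_hat = clip(f_hat_ref + e_hat)    # Eq. (1): 230, no clipping at the encoder
f_til = clip(f_til_conc + e_til)   # Eq. (3): 270 is clipped to 255

zeta      = f_hat - f_til                     # transmission reconstructed error
eps       = e_hat - e_til                     # RCE
xi        = f_hat_ref - f_hat_conc            # MVCE
zeta_prev = f_hat_conc - f_til_conc           # propagated error
delta     = (f_til_conc + e_til) - f_til      # decoder clipping noise, Eq. (8)

assert zeta == eps + xi + zeta_prev + delta   # Eq. (12): -25 = 10 - 30 - 20 + 15
```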

Combining (4) and (12), we have

$D_{\mathbf{u}}^k = E[(\tilde{\varepsilon}_{\mathbf{u}}^k + \tilde{\xi}_{\mathbf{u}}^k + \tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)^2] = E[(\tilde{\varepsilon}_{\mathbf{u}}^k)^2] + E[(\tilde{\xi}_{\mathbf{u}}^k)^2] + E[(\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)^2] + 2E[\tilde{\varepsilon}_{\mathbf{u}}^k \cdot \tilde{\xi}_{\mathbf{u}}^k] + 2E[\tilde{\varepsilon}_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)] + 2E[\tilde{\xi}_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)].$   (13)

Denote $D_{\mathbf{u}}^k(r) \triangleq E[(\tilde{\varepsilon}_{\mathbf{u}}^k)^2]$, $D_{\mathbf{u}}^k(m) \triangleq E[(\tilde{\xi}_{\mathbf{u}}^k)^2]$, $D_{\mathbf{u}}^k(P) \triangleq E[(\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)^2]$, and $D_{\mathbf{u}}^k(c) \triangleq 2E[\tilde{\varepsilon}_{\mathbf{u}}^k \cdot \tilde{\xi}_{\mathbf{u}}^k] + 2E[\tilde{\varepsilon}_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)] + 2E[\tilde{\xi}_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)]$. Then, (13) becomes

$D_{\mathbf{u}}^k = D_{\mathbf{u}}^k(r) + D_{\mathbf{u}}^k(m) + D_{\mathbf{u}}^k(P) + D_{\mathbf{u}}^k(c).$   (14)

Eq. (14) is our proposed distortion decomposition for PTD. The reason why we combine propagated error and clipping noise into one term (called clipped propagated error) is that clipping noise is mainly caused by propagated error, and such a decomposition simplifies the formulae.

There are three major reasons for our decompositions in (12) and (14). First, if we directly substitute the terms in (4) by (9) and (10), it will produce 5 second moments and 10 cross-correlation terms (assuming $\hat{\Delta}_{\mathbf{u}}^k = 0$); since there are 8 possible error events due to the three individual random errors, there are a total of $8 \times (5 + 10) = 120$ terms for PTD, making the analysis highly complicated. In contrast, our decompositions in (12) and (14) significantly simplify the analysis. Second, each term in (12) and (14) has a clear physical meaning, which lessens the requirement for the joint PMF of $\hat{f}_{\mathbf{u}}^k$ and $\tilde{f}_{\mathbf{u}}^k$ and leads to accurate estimation algorithms with low complexity. Third, such decompositions allow our formulae to be easily extended to support advanced video codecs with more performance-enhancing parts, e.g., multi-reference prediction [22] and interpolation filtering in fractional-pel motion estimation [23].

To derive the formula for FTD, from (6) and (14), we obtain

$D^k = D^k(r) + D^k(m) + D^k(P) + D^k(c),$   (15)

where

$D^k(r) = \frac{1}{|\mathcal{V}|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k(r),$   (16)

$D^k(m) = \frac{1}{|\mathcal{V}|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k(m),$   (17)

$D^k(P) = \frac{1}{|\mathcal{V}|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k(P),$   (18)

$D^k(c) = \frac{1}{|\mathcal{V}|} \cdot \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k(c).$   (19)

Eq. (15) is our proposed distortion decomposition for FTD. Usually, the cardinality, i.e., the number of elements, of set $\mathcal{V}^k$ in a video sequence is the same for all frames. That is, $|\mathcal{V}^1| = \cdots = |\mathcal{V}^k| = |\mathcal{V}|$ for all $k \ge 1$.⁴ Hence, we remove the frame index $k$ and denote $|\mathcal{V}^k|$ for all $k \ge 1$ by $|\mathcal{V}|$. Note that in a video codec, e.g., H.264/AVC [16], a reference pixel may be in a position out of the picture boundary; however, the cardinality of the set consisting of reference pixels, although larger than the cardinality of the input pixel set $|\mathcal{V}|$, is still the same for all frames.

⁴Note that although they have the same cardinality, different sets are very different, i.e., $\mathcal{V}^{k-1} \ne \mathcal{V}^k$.

    B. Analysis of Distortion Caused by RCE

In this subsection, we first derive the pixel-level residual caused distortion $D_{\mathbf{u}}^k(r)$. Then, we derive the frame-level residual caused distortion $D^k(r)$.

1) Pixel-level Distortion Caused by RCE: We denote $S_{\mathbf{u}}^k$ as the state indicator of whether there is a transmission error for pixel $\mathbf{u}^k$ after channel decoding. Note that as mentioned in Section II-A, both the residual channel and the MV channel contain channel decoding; hence in this paper, a transmission error in the residual channel or the MV channel means an uncorrectable error after channel decoding. To distinguish the residual error state from the MV error state, here we use $S_{\mathbf{u}}^k(r)$ to denote the residual error state for pixel $\mathbf{u}^k$. That is, $S_{\mathbf{u}}^k(r) = 1$ if $\hat{e}_{\mathbf{u}}^k$ is received with error, and $S_{\mathbf{u}}^k(r) = 0$ if $\hat{e}_{\mathbf{u}}^k$ is received without error. At the receiver, if there is no residual transmission error for pixel $\mathbf{u}^k$, $\tilde{e}_{\mathbf{u}}^k$ is equal to $\hat{e}_{\mathbf{u}}^k$. However, if the residual packets are received with error, we need to conceal the residual error at the receiver. Denote $\check{e}_{\mathbf{u}}^k$ the concealed residual when $S_{\mathbf{u}}^k(r) = 1$; then we have

$\tilde{e}_{\mathbf{u}}^k = \begin{cases} \check{e}_{\mathbf{u}}^k, & S_{\mathbf{u}}^k(r) = 1 \\ \hat{e}_{\mathbf{u}}^k, & S_{\mathbf{u}}^k(r) = 0. \end{cases}$   (20)

Note that $\check{e}_{\mathbf{u}}^k$ depends on $\hat{e}_{\mathbf{u}}^k$ and the residual concealment method, but does not depend on the channel condition. From the definition of $\tilde{\varepsilon}_{\mathbf{u}}^k$ and (20), we have

$\tilde{\varepsilon}_{\mathbf{u}}^k = (\hat{e}_{\mathbf{u}}^k - \check{e}_{\mathbf{u}}^k) \cdot S_{\mathbf{u}}^k(r) + (\hat{e}_{\mathbf{u}}^k - \hat{e}_{\mathbf{u}}^k) \cdot (1 - S_{\mathbf{u}}^k(r)) = (\hat{e}_{\mathbf{u}}^k - \check{e}_{\mathbf{u}}^k) \cdot S_{\mathbf{u}}^k(r).$   (21)

$\hat{e}_{\mathbf{u}}^k$ depends on the input video sequence and the encoder structure, while $S_{\mathbf{u}}^k(r)$ depends on the random multiplicative and additive noises in the wireless channel. Under our framework shown in Fig. 1, the input video sequence and the encoder structure are independent of the communication system parameters. Therefore, we make the following assumption.

Assumption 1: $S_{\mathbf{u}}^k(r)$ is independent of $\hat{e}_{\mathbf{u}}^k$.

Denote $\varepsilon_{\mathbf{u}}^k \triangleq \hat{e}_{\mathbf{u}}^k - \check{e}_{\mathbf{u}}^k$; we have $\tilde{\varepsilon}_{\mathbf{u}}^k = \varepsilon_{\mathbf{u}}^k \cdot S_{\mathbf{u}}^k(r)$. Denote $P_{\mathbf{u}}^k(r)$ as the residual pixel error probability (XEP) for pixel $\mathbf{u}^k$, that is, $P_{\mathbf{u}}^k(r) \triangleq P\{S_{\mathbf{u}}^k(r) = 1\}$.⁵ Then, given $P_{\mathbf{u}}^k(r)$, from (21) and Assumption 1, we have

$D_{\mathbf{u}}^k(r) = E[(\tilde{\varepsilon}_{\mathbf{u}}^k)^2] = E[(\varepsilon_{\mathbf{u}}^k)^2] \cdot E[(S_{\mathbf{u}}^k(r))^2] = E[(\varepsilon_{\mathbf{u}}^k)^2] \cdot (1 \cdot P_{\mathbf{u}}^k(r)) = E[(\varepsilon_{\mathbf{u}}^k)^2] \cdot P_{\mathbf{u}}^k(r).$   (22)

Hence, our formula for the pixel-level residual caused distortion is

$D_{\mathbf{u}}^k(r) = E[(\varepsilon_{\mathbf{u}}^k)^2] \cdot P_{\mathbf{u}}^k(r).$   (23)

Note that we may also generalize (23) to I-MBs. For pixels in an I-MB, if the packet containing those pixels has an error, $\check{e}_{\mathbf{u}}^k$ is still available since all the erroneous pixels will be concealed in the same way. However, since there is no $\hat{e}_{\mathbf{u}}^k$ available, in order to use (23) to predict the transmission distortion, we may need to find the best reference, in terms of R-D cost, for the reconstructed I-MB by doing a virtual motion estimation and then calculating $\hat{e}_{\mathbf{u}}^k$ for (23). The estimated $mv_{\mathbf{u}}^k$ can be used to predict $D_{\mathbf{u}}^k(m)$ for I-MBs in the next subsection. An alternative method to calculate $\hat{e}_{\mathbf{u}}^k$ for an I-MB is to use the same position in the previous frame as reference, i.e., assuming $mv_{\mathbf{u}}^k = 0$. Note that if the packet containing those pixels in an I-MB is correctly received, $D_{\mathbf{u}}^k(r) = 0$.

2) Frame-level Distortion Caused by RCE: To derive the frame-level residual caused distortion, the encoder needs to know the second moment of RCE for each pixel in that frame. In most, if not all, existing distortion models [3], [7], [9], [10], [15], the residual error concealment method is to let $\check{e}_{\mathbf{u}}^k = 0$ for all erroneous pixels. However, as long as $\hat{e}_{\mathbf{u}}^k$ and $\check{e}_{\mathbf{u}}^k$ satisfy some properties, we can derive a formula for more general residual error concealment methods instead of assuming $\check{e}_{\mathbf{u}}^k = 0$. We make the following assumption for $\hat{e}_{\mathbf{u}}^k$ and $\check{e}_{\mathbf{u}}^k$.

⁵$P_{\mathbf{u}}^k(r)$ depends on the communication system parameters such as delay bound, channel coding rate, transmission power, and channel gain of the wireless channel.


Assumption 2: The residual $\hat{e}_{\mathbf{u}}^k$ is stationary with respect to the 2D variable $\mathbf{u}$ in the same frame. In addition, $\check{e}_{\mathbf{u}}^k$ only depends on $\{\hat{e}_{\mathbf{v}}^k : \mathbf{v} \in N_{\mathbf{u}}\}$, where $N_{\mathbf{u}}$ is a fixed neighborhood of $\mathbf{u}$.

In other words, Assumption 2 assumes that 1) $\hat{e}_{\mathbf{u}}^k$ is a 2D stationary stochastic process and the distribution of $\hat{e}_{\mathbf{u}}^k$ is the same for all $\mathbf{u} \in \mathcal{V}^k$, and 2) $\check{e}_{\mathbf{u}}^k$ is also a 2D stationary stochastic process since it only depends on the neighboring $\hat{e}_{\mathbf{u}}^k$. Hence, $\hat{e}_{\mathbf{u}}^k - \check{e}_{\mathbf{u}}^k$ is also a 2D stationary stochastic process, and its second moment $E[(\hat{e}_{\mathbf{u}}^k - \check{e}_{\mathbf{u}}^k)^2] = E[(\varepsilon_{\mathbf{u}}^k)^2]$ is the same for all $\mathbf{u} \in \mathcal{V}^k$. Therefore, we can drop $\mathbf{u}$ from the notation and let $E[(\varepsilon^k)^2] = E[(\varepsilon_{\mathbf{u}}^k)^2]$ for all $\mathbf{u} \in \mathcal{V}^k$.

Denote $N_i^k(r)$ as the number of pixels contained in the $i$-th residual packet of the $k$-th frame; denote $P_i^k(r)$ as the PEP of the $i$-th residual packet of the $k$-th frame; denote $N^k(r)$ as the total number of residual packets of the $k$-th frame. Since for all pixels in the same packet the residual XEP is equal to its PEP, from (16) and (23), we have

$D^k(r) = \frac{1}{|\mathcal{V}|} \sum_{\mathbf{u}\in\mathcal{V}^k} E[(\varepsilon_{\mathbf{u}}^k)^2] \cdot P_{\mathbf{u}}^k(r)$   (24)

$= \frac{1}{|\mathcal{V}|} \sum_{\mathbf{u}\in\mathcal{V}^k} E[(\varepsilon^k)^2] \cdot P_{\mathbf{u}}^k(r)$   (25)

$\stackrel{(a)}{=} \frac{E[(\varepsilon^k)^2]}{|\mathcal{V}|} \sum_{i=1}^{N^k(r)} (P_i^k(r) \cdot N_i^k(r))$   (26)

$\stackrel{(b)}{=} E[(\varepsilon^k)^2] \cdot \bar{P}^k(r),$   (27)

where (a) is due to $P_{\mathbf{u}}^k(r) = P_i^k(r)$ for pixel $\mathbf{u}$ in the $i$-th residual packet, and (b) is due to

$\bar{P}^k(r) \triangleq \frac{1}{|\mathcal{V}|} \sum_{i=1}^{N^k(r)} (P_i^k(r) \cdot N_i^k(r)).$   (28)

$\bar{P}^k(r)$ is a weighted average over the PEPs of all residual packets in the $k$-th frame, in which different packets may contain different numbers of pixels. Hence, given the PEPs of all residual packets in the $k$-th frame, our formula for the frame-level residual caused distortion is

$D^k(r) = E[(\varepsilon^k)^2] \cdot \bar{P}^k(r).$   (29)

Note that with the FMO mechanism, many neighboring pixels may be encoded into different slices and transmitted in different packets. Since each packet may experience a different PEP, especially in a fast fading channel, even neighboring pixels may have very different XEPs. Therefore, (29) remains valid under FMO. This situation is taken into consideration throughout this paper.
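A minimal sketch of (28) and (29), assuming the per-packet PEPs and pixel counts of frame $k$ are known (names are ours):

```python
def frame_rce_distortion(eps_sq, packet_pep, packet_npix):
    """Frame-level RCE distortion of Eq. (29).

    eps_sq      -- E[(eps^k)^2], second moment of the residual concealment error
    packet_pep  -- P_i^k(r), PEP of each residual packet of frame k
    packet_npix -- N_i^k(r), number of pixels carried by each residual packet
    """
    n_pixels = sum(packet_npix)   # |V|
    # Eq. (28): pixel-count-weighted average PEP over all residual packets
    pep_bar = sum(p * n for p, n in zip(packet_pep, packet_npix)) / n_pixels
    return eps_sq * pep_bar       # Eq. (29)
```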

    C. Analysis of Distortion Caused by MVCE

Similar to the derivations in Section III-B1, in this subsection we derive the formula for the pixel-level MV caused distortion $D_{\mathbf{u}}^k(m)$, and the frame-level MV caused distortion $D^k(m)$.

1) Pixel-level Distortion Caused by MVCE: Denote the MV error state for pixel $\mathbf{u}^k$ by $S_{\mathbf{u}}^k(m)$, and denote the concealed MV by $\check{mv}_{\mathbf{u}}^k$ for general temporal error concealment methods when $S_{\mathbf{u}}^k(m) = 1$. Therefore, we have

$\tilde{mv}_{\mathbf{u}}^k = \begin{cases} \check{mv}_{\mathbf{u}}^k, & S_{\mathbf{u}}^k(m) = 1 \\ mv_{\mathbf{u}}^k, & S_{\mathbf{u}}^k(m) = 0. \end{cases}$   (30)

Denote $\xi_{\mathbf{u}}^k \triangleq \hat{f}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} - \hat{f}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}$, where $\xi_{\mathbf{u}}^k$ depends on the accuracy of MV concealment, and on the spatial correlation between the reference pixel and the concealed reference pixel at the encoder. A more comprehensive analysis of the effect of inaccurate MV estimation on $\xi_{\mathbf{u}}^k$ can be found in Ref. [24], which is then extended to support multihypothesis motion-compensated prediction [25] and to derive a rate-distortion model taking into account the temporal prediction distance [26].

We also make the following assumption.

Assumption 3: $S_{\mathbf{u}}^k(m)$ is independent of $\xi_{\mathbf{u}}^k$.

Denote $P_{\mathbf{u}}^k(m)$ as the MV XEP for pixel $\mathbf{u}^k$, that is, $P_{\mathbf{u}}^k(m) \triangleq P\{S_{\mathbf{u}}^k(m) = 1\}$. Note that it is possible that $P_{\mathbf{u}}^k(m) \ne P_{\mathbf{u}}^k(r)$ if SDP and UEP are applied. Given $P_{\mathbf{u}}^k(m)$, following the same derivation process as in Section III-B1, we can obtain

$D_{\mathbf{u}}^k(m) = E[(\xi_{\mathbf{u}}^k)^2] \cdot P_{\mathbf{u}}^k(m).$   (31)

Also note that in the H.264/AVC specification [16], there is no SDP for an instantaneous decoding refresh (IDR) frame; so $S_{\mathbf{u}}^k(r) = S_{\mathbf{u}}^k(m)$ in an IDR-frame and hence $P_{\mathbf{u}}^k(r) = P_{\mathbf{u}}^k(m)$. This is also true for MBs without SDP. For P-MBs with SDP in H.264/AVC, $S_{\mathbf{u}}^k(r)$ and $S_{\mathbf{u}}^k(m)$ are dependent. In other words, if the MV packet is lost, the corresponding residual packet cannot be decoded even if it is correctly received, since there is no slice header in the residual packet. Therefore, the residual channel and the MV channel in Fig. 1 are actually dependent if the encoder follows the H.264/AVC specification. In this paper, we study transmission distortion in a more general case where $S_{\mathbf{u}}^k(r)$ and $S_{\mathbf{u}}^k(m)$ can be either independent or dependent.⁶

2) Frame-level Distortion Caused by MVCE: To derive the frame-level MV caused distortion, we also make the following assumption.

Assumption 4: The second moment of $\xi_{\mathbf{u}}^k$ is the same for all $\mathbf{u} \in \mathcal{V}^k$.

Under Assumption 4, we can drop $\mathbf{u}$ from the notation and let $E[(\xi^k)^2] = E[(\xi_{\mathbf{u}}^k)^2]$ for all $\mathbf{u} \in \mathcal{V}^k$. Denote $N_i^k(m)$ as the number of pixels contained in the $i$-th MV packet of the $k$-th frame; denote $P_i^k(m)$ as the PEP of the $i$-th MV packet of the $k$-th frame; denote $N^k(m)$ as the total number of MV packets of the $k$-th frame. Then, given the PEPs of all MV packets in the $k$-th frame, following the same derivation process as in Section III-B2, we obtain the frame-level MV caused distortion for the $k$-th frame as

$D^k(m) = E[(\xi^k)^2] \cdot \bar{P}^k(m),$   (32)

⁶To achieve this, we add side information to the H.264/AVC reference code JM14.0 by allowing residual packets to be used by the decoder without the corresponding MV packets being correctly received; that is, $\hat{e}_{\mathbf{u}}^k$ can be used to reconstruct $\tilde{f}_{\mathbf{u}}^k$ even if $mv_{\mathbf{u}}^k$ is not correctly received.


where $\bar{P}^k(m) \triangleq \frac{1}{|\mathcal{V}|} \sum_{i=1}^{N^k(m)} (P_i^k(m) \cdot N_i^k(m))$ is a weighted average over the PEPs of all MV packets in the $k$-th frame, in which different packets may contain different numbers of pixels.
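Eq. (32) has the same shape as (29), with $E[(\xi^k)^2]$ and the MV packets' statistics in place of the residual ones; a usage sketch with illustrative numbers (all values are ours):

```python
# Frame-level MVCE distortion of Eq. (32) (all numbers are illustrative):
xi_sq   = 40.0                      # E[(xi^k)^2]
mv_pep  = [0.05, 0.02, 0.10]        # P_i^k(m) for each MV packet of frame k
mv_npix = [25344, 25344, 50688]     # N_i^k(m) pixels carried by each MV packet
pep_bar_m = sum(p * n for p, n in zip(mv_pep, mv_npix)) / sum(mv_npix)
d_k_m = xi_sq * pep_bar_m           # D^k(m) = E[(xi^k)^2] * P_bar^k(m)
```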

D. Analysis of Distortion Caused by Propagated Error Plus Clipping Noise

In this subsection, we derive the distortion caused by error propagation in a non-linear decoder with clipping. We first derive the pixel-level propagation and clipping caused distortion $D_{\mathbf{u}}^k(P)$. Then, we derive the frame-level propagation and clipping caused distortion $D^k(P)$.

1) Pixel-level Distortion Caused by Propagated Error Plus Clipping Noise: First, we analyze the pixel-level propagation and clipping caused distortion $D_{\mathbf{u}}^k(P)$ in P-MBs. From the definition, we know $D_{\mathbf{u}}^k(P)$ depends on propagated error and clipping noise; and clipping noise is a function of RCE, MVCE and propagated error. Hence, $D_{\mathbf{u}}^k(P)$ depends on RCE, MVCE and propagated error. Let $r, m, p$ denote the events of occurrence of RCE, MVCE and propagated error, respectively, and let $\bar{r}, \bar{m}, \bar{p}$ denote the logical NOT of $r, m, p$, respectively (indicating no error). We use a triplet to denote the joint event of the three types of error; e.g., $\{r, m, p\}$ denotes the event that all three types of errors occur, and $\mathbf{u}^k\{\bar{r}, \bar{m}, \bar{p}\}$ denotes the pixel $\mathbf{u}^k$ experiencing none of the three types of errors.

When several error events may occur, the notation can be simplified by the principles of formal logic. For example, $\tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, \bar{m}\}$ denotes the clipping noise under the condition that there is neither RCE nor MVCE for pixel $\mathbf{u}^k$, while it is not certain whether the reference pixel has an error. Correspondingly, denote $P_{\mathbf{u}}^k\{\bar{r}, \bar{m}\}$ as the probability of event $\{\bar{r}, \bar{m}\}$, that is, $P_{\mathbf{u}}^k\{\bar{r}, \bar{m}\} = P\{S_{\mathbf{u}}^k(r) = 0 \text{ and } S_{\mathbf{u}}^k(m) = 0\}$. From the definition of $P_{\mathbf{u}}^k(r)$, the marginal probability $P_{\mathbf{u}}^k\{r\} = P_{\mathbf{u}}^k(r)$ and the marginal probability $P_{\mathbf{u}}^k\{\bar{r}\} = 1 - P_{\mathbf{u}}^k(r)$. Similarly, $P_{\mathbf{u}}^k\{m\} = P_{\mathbf{u}}^k(m)$ and $P_{\mathbf{u}}^k\{\bar{m}\} = 1 - P_{\mathbf{u}}^k(m)$.

Define $D_{\mathbf{u}}^k(p) \triangleq E[(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, \bar{m}\})^2]$, and define $\alpha_{\mathbf{u}}^k \triangleq \frac{D_{\mathbf{u}}^k(p)}{D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}}$, which is called the propagation factor for pixel $\mathbf{u}^k$. The propagation factor $\alpha_{\mathbf{u}}^k$ defined in this paper is different from the propagation factor [10], leakage [7], or attenuation factor [15], which are modeled as the effect of spatial filtering or intra update; our propagation factor $\alpha_{\mathbf{u}}^k$ is also different from the fading factor [8], which is modeled as the effect of using a fraction of referenced pixels in the reference frame for motion prediction. Note that $D_{\mathbf{u}}^k(p)$ is only a special case of $D_{\mathbf{u}}^k(P)$ under the error event $\{\bar{r}, \bar{m}\}$ for pixel $\mathbf{u}^k$. However, most existing models inappropriately use their propagation factor, obtained under the error event $\{\bar{r}, \bar{m}\}$, to replace $D_{\mathbf{u}}^k(P)$ directly.

To calculate $E[(\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)^2]$ in (13), we need to analyze $\tilde{\Delta}_{\mathbf{u}}^k$ in four different error events for pixel $\mathbf{u}^k$: 1) both residual and MV are erroneous, denoted by $\mathbf{u}^k\{r, m\}$; 2) residual is erroneous but MV is correct, denoted by $\mathbf{u}^k\{r, \bar{m}\}$; 3) residual is correct but MV is erroneous, denoted by $\mathbf{u}^k\{\bar{r}, m\}$; and 4) both residual and MV are correct, denoted by $\mathbf{u}^k\{\bar{r}, \bar{m}\}$. So,

$D_{\mathbf{u}}^k(P) = P_{\mathbf{u}}^k\{r, m\} \cdot E[(\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{r, m\})^2] + P_{\mathbf{u}}^k\{r, \bar{m}\} \cdot E[(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{r, \bar{m}\})^2] + P_{\mathbf{u}}^k\{\bar{r}, m\} \cdot E[(\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\})^2] + P_{\mathbf{u}}^k\{\bar{r}, \bar{m}\} \cdot E[(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, \bar{m}\})^2].$   (33)

Note that the concealed pixel value should be within the clipping function range, that is, $\Gamma(\tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \check{e}_{\mathbf{u}}^k) = \tilde{f}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \check{e}_{\mathbf{u}}^k$, so $\tilde{\Delta}_{\mathbf{u}}^k\{r\} = 0$. Also note that if the MV channel is independent of the residual channel, we have $P_{\mathbf{u}}^k\{r, m\} = P_{\mathbf{u}}^k(r) \cdot P_{\mathbf{u}}^k(m)$. However, as mentioned in Section III-C1, in the H.264/AVC specification these two channels are dependent. In other words, $P_{\mathbf{u}}^k\{\bar{r}, m\} = 0$ and $P_{\mathbf{u}}^k\{\bar{r}, \bar{m}\} = P_{\mathbf{u}}^k\{\bar{r}\}$ for P-MBs with SDP in H.264/AVC.⁷ In such a case, (33) simplifies to

$D_{\mathbf{u}}^k(P) = P_{\mathbf{u}}^k\{r, m\} \cdot D_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + P_{\mathbf{u}}^k\{r, \bar{m}\} \cdot D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + P_{\mathbf{u}}^k\{\bar{r}\} \cdot D_{\mathbf{u}}^k(p).$   (34)

Note that for P-MBs without SDP, we have $P_{\mathbf{u}}^k\{r, \bar{m}\} = P_{\mathbf{u}}^k\{\bar{r}, m\} = 0$, $P_{\mathbf{u}}^k\{r, m\} = P_{\mathbf{u}}^k\{r\} = P_{\mathbf{u}}^k\{m\} = P_{\mathbf{u}}^k$, and $P_{\mathbf{u}}^k\{\bar{r}, \bar{m}\} = P_{\mathbf{u}}^k\{\bar{r}\} = P_{\mathbf{u}}^k\{\bar{m}\} = 1 - P_{\mathbf{u}}^k$. Therefore, (34) can be further simplified to

$D_{\mathbf{u}}^k(P) = P_{\mathbf{u}}^k \cdot D_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + (1 - P_{\mathbf{u}}^k) \cdot D_{\mathbf{u}}^k(p).$   (35)

Also note that for an I-MB, there will be no transmission distortion if it is correctly received, that is, $D_{\mathbf{u}}^k(p) = 0$. So (35) can be further simplified to

$D_{\mathbf{u}}^k(P) = P_{\mathbf{u}}^k \cdot D_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}.$   (36)

⁷In a more general case, where $P_{\mathbf{u}}^k\{\bar{r}, m\} \ne 0$, Eq. (34) can be used as an approximation. This is because $E[(\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\})^2]$ only arises under SDP, where the probability of an MV packet error is usually less than that of a residual packet error, and the probability of the event that a residual packet is correctly received but the corresponding MV packet is in error, i.e., $P_{\mathbf{u}}^k\{\bar{r}, m\}$, is very small. In addition, since $\tilde{\Delta}_{\mathbf{u}}^k\{r\} = 0$, among the four error events in (33), $\tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\}$ is much more similar to $\tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, \bar{m}\}$ than to $\tilde{\Delta}_{\mathbf{u}}^k\{r, m\}$ and $\tilde{\Delta}_{\mathbf{u}}^k\{r, \bar{m}\}$. Therefore, we may approximate the last two terms in (33) by $P_{\mathbf{u}}^k\{\bar{r}\} \cdot E[(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, \bar{m}\})^2]$, i.e., $P_{\mathbf{u}}^k\{\bar{r}\} \cdot D_{\mathbf{u}}^k(p)$.

Comparing (36) with (35), we see that an I-MB is a special case of a P-MB with $D_{\mathbf{u}}^k(p) = 0$, that is, the propagation factor $\alpha_{\mathbf{u}}^k = 0$ according to the definition. It is important to note that $D_{\mathbf{u}}^k(P) > 0$ for an I-MB since $P_{\mathbf{u}}^k \ne 0$. In other words, an I-MB also contains the distortion caused by propagated error, and it can be predicted by (36). However, existing linear time-invariant (LTI) models [7], [8] assume that there is no distortion caused by propagated error for I-MBs, which underestimates the transmission distortion.
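A minimal sketch of the per-pixel recursion (35) for a P-MB without SDP, assuming the previous frame's pixel distortion map and the propagation factor are available (names are ours):

```python
def pixel_propagation_distortion(pep, d_prev_concealed, alpha, d_prev_ref):
    """Eq. (35): D_u^k(P) for a P-MB without SDP.

    pep              -- P_u^k, probability that the pixel's packet is in error
    d_prev_concealed -- D^{k-1} at u + mv_check (concealed reference pixel)
    alpha            -- propagation factor alpha_u^k (see Eq. (38))
    d_prev_ref       -- D^{k-1} at u + mv (true reference pixel)
    """
    d_p = alpha * d_prev_ref   # D_u^k(p) = alpha_u^k * D^{k-1}_{u+mv}
    return pep * d_prev_concealed + (1.0 - pep) * d_p
```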

In the remainder of this subsection, we derive the propagation factor $\alpha_{\mathbf{u}}^k$ for P-MBs and prove some important properties of clipping noise. To derive $\alpha_{\mathbf{u}}^k$, we first give Lemma 1 below.

Lemma 1: Given the PMF of the random variable $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ and the value of $\hat{f}_{\mathbf{u}}^k$, $D_{\mathbf{u}}^k(p)$ can be calculated at the encoder by $D_{\mathbf{u}}^k(p) = E[\Phi^2(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}, \hat{f}_{\mathbf{u}}^k)]$, where $\Phi(x, y)$ is called the error reduction function and is defined by

$\Phi(x, y) \triangleq y - \Gamma(y - x) = \begin{cases} y - \gamma_l, & y - x < \gamma_l \\ x, & \gamma_l \le y - x \le \gamma_h \\ y - \gamma_h, & y - x > \gamma_h. \end{cases}$   (37)

Lemma 1 is proved in Appendix A. In fact, we have found in our experiments that in any error event, $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ approximately follows a Laplacian distribution with zero mean. If we assume $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ follows a Laplacian distribution with zero mean, the calculation of $D_{\mathbf{u}}^k(p)$ becomes simpler since the only unknown parameter of the PMF of $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ is its variance. Under this assumption, we have the following proposition.

Proposition 1: The propagation factor $\alpha$ for a propagated error with Laplacian distribution of zero mean and variance $\sigma^2$ is given by

$\alpha = 1 - \frac{1}{2} e^{-\frac{y-\gamma_l}{b}} \Big(\frac{y-\gamma_l}{b} + 1\Big) - \frac{1}{2} e^{-\frac{\gamma_h-y}{b}} \Big(\frac{\gamma_h-y}{b} + 1\Big),$   (38)

where $y$ is the reconstructed pixel value, and $b = \frac{\sqrt{2}}{2}\sigma$.
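A direct transcription of (38); the propagated-error variance $\sigma^2$ and the reconstructed value $y$ are inputs, and we assume $\sigma^2 > 0$ and $\gamma_l \le y \le \gamma_h$:

```python
import math

def propagation_factor(y, sigma2, gamma_l=0.0, gamma_h=255.0):
    """Propagation factor alpha of Eq. (38) for a zero-mean Laplacian
    propagated error with variance sigma2; y is the reconstructed pixel value."""
    b = math.sqrt(sigma2 / 2.0)   # Laplacian scale b = (sqrt(2)/2) * sigma
    tl = (y - gamma_l) / b
    th = (gamma_h - y) / b
    return 1.0 - 0.5 * math.exp(-tl) * (tl + 1.0) - 0.5 * math.exp(-th) * (th + 1.0)
```

Consistent with Proposition 2 below, this value never exceeds 1: for a mid-range $y$ and small variance it approaches 1, while for $y$ at a clipping threshold it drops to about 0.5, reflecting the error energy removed by clipping.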

Proposition 1 is proved in Appendix B. In the zero-mean Laplacian case, $\alpha_{\mathbf{u}}^k$ is only a function of $\hat{f}_{\mathbf{u}}^k$ and the variance of $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$, which is equal to $D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ in this case. Since $D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ has already been calculated during the phase of predicting the $(k-1)$-th frame transmission distortion, $D_{\mathbf{u}}^k(p)$ can be calculated by $D_{\mathbf{u}}^k(p) = \alpha_{\mathbf{u}}^k \cdot D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ via the definition of $\alpha_{\mathbf{u}}^k$. Then, we can recursively calculate $D_{\mathbf{u}}^k(P)$ in (34) since both $D_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}$ and $D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ have been calculated previously for the $(k-1)$-th frame.

Next, we prove an important property of the non-linear clipping function in Proposition 2. To prove Proposition 2, we need the following lemma.

Lemma 2: The error reduction function $\Phi(x, y)$ satisfies $\Phi^2(x, y) \le x^2$ for any $\gamma_l \le y \le \gamma_h$.

Lemma 2 is proved in Appendix C. From Lemma 2, we know that the function $\Phi(x, y)$ reduces the energy of the propagated error. This is the reason why we call it the error reduction function. With Lemma 1, it is straightforward to prove that whatever the PMF of $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ is,

$D_{\mathbf{u}}^k(p) = E[\Phi^2(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}, \hat{f}_{\mathbf{u}}^k)] \le E[(\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1})^2] = D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1},$   (39)

i.e., $\alpha_{\mathbf{u}}^k \le 1$. In other words, we have the following proposition.

Proposition 2: Clipping reduces propagated error, that is, $D_{\mathbf{u}}^k(p) \le D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$, or $\alpha_{\mathbf{u}}^k \le 1$.

Proposition 2 tells us that if there is no newly induced error in the $k$-th frame, the transmission distortion decreases from the $(k-1)$-th frame to the $k$-th frame. Fig. 2 shows the experimental result of transmission distortion propagation for the 'bus' sequence in cif format, where the third frame is lost at the decoder and all other frames are correctly received.⁸

⁸Since showing the experimental results for all trajectories is almost impossible in the paper, we just show the result as the mean square error (MSE) over all pixels in the same frame.

[Fig. 2. The effect of clipping noise on distortion propagation: MSE distortion versus frame index for the 'bus' sequence, where only the third frame is lost.]

The experiment setup for Fig. 2, Fig. 3, Fig. 4, Fig. 5 and Fig. 6 is as follows: the JM14.0 [27] encoder and decoder are used; the first frame is an I-frame, and the subsequent frames are all P-frames without I-MBs; for temporal error concealment, MV error concealment is the default frame copy in the JM14.0 decoder due to its simplicity; residual packets can be used by the decoder without the corresponding MV packets being correctly received, as aforementioned; the interpolation filter and deblocking filter are disabled. That is, the error reduction is caused only by the clipping noise.

In fact, if we consider the more general case where there may be a new error induced in the $k$-th frame, we can still prove that $E[(\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k)^2] \le E[(\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1})^2]$, as shown in (60) during the proof of the following corollary.

Corollary 1: The correlation coefficient between $\tilde{\zeta}_{\mathbf{u}+\tilde{mv}_{\mathbf{u}}^k}^{k-1}$ and $\tilde{\Delta}_{\mathbf{u}}^k$ is non-positive. Specifically, they are negatively correlated under the condition $\{\bar{r}, p\}$, and uncorrelated under other conditions.

Corollary 1 is proved in Appendix D. This property is very important for designing a low complexity algorithm to estimate the propagation and clipping caused distortion in PTD, which is presented in the sequel paper [19].

2) Frame-level Distortion Caused by Propagated Error Plus Clipping Noise: Define $D^k(p)$ as the mean of $D_{\mathbf{u}}^k(p)$ over all $\mathbf{u} \in \mathcal{V}^k$, i.e., $D^k(p) \triangleq \frac{1}{|\mathcal{V}|} \sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}}^k(p)$; the formula for the frame-level propagation and clipping caused distortion is given in Lemma 3.

Lemma 3: The frame-level propagation and clipping caused distortion in the $k$-th frame is

$D^k(P) = D^{k-1} \cdot \bar{P}^k(r) + D^k(p) \cdot (1 - \bar{P}^k(r))(1 - \beta^k),$   (40)

where $\bar{P}^k(r)$ is defined in (28); $\beta^k$ is the percentage of I-MBs in the $k$-th frame; $D^{k-1}$ is the transmission distortion of the $(k-1)$-th frame.

Lemma 3 is proved in Appendix F. Define the propagation factor for the $k$-th frame as $\alpha^k \triangleq \frac{D^k(p)}{D^{k-1}}$; then we have $\alpha^k = \frac{1}{|\mathcal{V}| \, D^{k-1}} \sum_{\mathbf{u}\in\mathcal{V}^k} \alpha_{\mathbf{u}}^k \cdot D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$. As explained in Appendix F, when the number of pixels in the $(k-1)$-th frame is sufficiently large, the average of $D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ over all the pixels in the $(k-1)$-th frame will converge to $D^{k-1}$ due to the randomness of $mv_{\mathbf{u}}^k$. Therefore, we have $\alpha^k \approx \frac{\sum_{\mathbf{u}\in\mathcal{V}^k} \alpha_{\mathbf{u}}^k \cdot D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}}{\sum_{\mathbf{u}\in\mathcal{V}^k} D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}}$, which is a weighted average of $\alpha_{\mathbf{u}}^k$ with the weight being $D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$. As a result, $D^k(p) \le D^k(P)$ with high probability.⁹ However, most existing works directly use $D^k(P) = D^k(p)$ in predicting transmission distortion. This is another reason why LTI models [7], [8] underestimate transmission distortion when there is no MV error.

E. Analysis of Correlation Caused Distortion

In this subsection, we first derive the pixel-level correlation caused distortion $D_{\mathbf{u}}^k(c)$. Then, we derive the frame-level correlation caused distortion $D^k(c)$.

1) Pixel-level Correlation Caused Distortion: We analyze the correlation caused distortion $D_{\mathbf{u}}^k(c)$ at the decoder in four different cases: i) for $\mathbf{u}^k\{\bar{r}, \bar{m}\}$, both $\tilde{\varepsilon}_{\mathbf{u}}^k = 0$ and $\tilde{\xi}_{\mathbf{u}}^k = 0$, so $D_{\mathbf{u}}^k(c) = 0$; ii) for $\mathbf{u}^k\{r, \bar{m}\}$, $\tilde{\xi}_{\mathbf{u}}^k = 0$ and $D_{\mathbf{u}}^k(c) = 2E[\varepsilon_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{r, \bar{m}\})]$; iii) for $\mathbf{u}^k\{\bar{r}, m\}$, $\tilde{\varepsilon}_{\mathbf{u}}^k = 0$ and $D_{\mathbf{u}}^k(c) = 2E[\xi_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\})]$; iv) for $\mathbf{u}^k\{r, m\}$, $D_{\mathbf{u}}^k(c) = 2E[\varepsilon_{\mathbf{u}}^k \cdot \xi_{\mathbf{u}}^k] + 2E[\varepsilon_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{r, m\})] + 2E[\xi_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{r, m\})]$. From Section III-D1, we know $\tilde{\Delta}_{\mathbf{u}}^k\{r\} = 0$. So, we obtain

$D_{\mathbf{u}}^k(c) = P_{\mathbf{u}}^k\{r, \bar{m}\} \cdot 2E[\varepsilon_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}] + P_{\mathbf{u}}^k\{\bar{r}, m\} \cdot 2E[\xi_{\mathbf{u}}^k \cdot (\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} + \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\})] + P_{\mathbf{u}}^k\{r, m\} \cdot (2E[\varepsilon_{\mathbf{u}}^k \cdot \xi_{\mathbf{u}}^k] + 2E[\varepsilon_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}] + 2E[\xi_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}]).$   (41)

In our experiments, we find that in the trajectory of pixel $\mathbf{u}^k$: 1) the residual $\hat{e}_{\mathbf{u}}^k$ is almost uncorrelated with the residual in all other frames $\hat{e}_{\mathbf{v}}^i$ where $i \ne k$, i.e., their correlation coefficient is almost zero, as shown in Fig. 3;¹⁰ and 2) the residual $\hat{e}_{\mathbf{u}}^k$ is also almost uncorrelated with the MVCE of the corresponding pixel, i.e., $\xi_{\mathbf{u}}^k$, and with the MVCE in all previous frames, i.e., $\xi_{\mathbf{v}}^i$ where $i < k$, as shown in Fig. 4. Based on the above observations, we further assume that for any $i < k$, $\hat{e}_{\mathbf{u}}^k$ is uncorrelated with $\hat{e}_{\mathbf{v}}^i$ and $\xi_{\mathbf{v}}^i$ if $\mathbf{v}^i$ is not in the trajectory of pixel $\mathbf{u}^k$, and we make the following assumption.

Assumption 5: $\hat{e}_{\mathbf{u}}^k$ is uncorrelated with $\xi_{\mathbf{u}}^k$, and is uncorrelated with both $\hat{e}_{\mathbf{v}}^i$ and $\xi_{\mathbf{v}}^i$ for any $i < k$.

Since $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ and $\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}$ are the transmission reconstructed errors accumulated from all the frames before the $k$-th frame, $\varepsilon_{\mathbf{u}}^k$ is uncorrelated with $\tilde{\zeta}_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ and $\tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}$ due to Assumption 5. Thus, (41) becomes

$D_{\mathbf{u}}^k(c) = 2P_{\mathbf{u}}^k\{m\} \cdot E[\xi_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}] + 2P_{\mathbf{u}}^k\{\bar{r}, m\} \cdot E[\xi_{\mathbf{u}}^k \cdot \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\}].$   (42)

⁹When the number of reference pixels in the $(k-1)$-th frame is small, $\frac{1}{|\mathcal{V}|}\sum_{\mathbf{u}\in\mathcal{V}^k} \alpha_{\mathbf{u}}^k \cdot D_{\mathbf{u}+mv_{\mathbf{u}}^k}^{k-1}$ may be larger than $D^{k-1}$ in case the reference pixels with high distortion are used more often than the reference pixels with low distortion.

¹⁰Fig. 3, Fig. 4, Fig. 5 and Fig. 6 are plotted for a low motion sequence, e.g., 'foreman', and a high motion sequence, e.g., 'stefan', in cif format. All other sequences show similar statistics.

[Fig. 3. Temporal correlation between residuals in one trajectory: correlation coefficient versus frame indices for (a) foreman-cif and (b) stefan-cif.]

However, we observe that in the trajectory of pixel $\mathbf{u}^k$: 1) $\hat{e}_{\mathbf{u}}^k$ is correlated with $\xi_{\mathbf{v}}^i$ when $i > k$, and especially when $i = k + 1$ there are peaks, as seen in Fig. 4; and 2) $\xi_{\mathbf{u}}^k$ is highly correlated with $\xi_{\mathbf{v}}^i$, as shown in Fig. 5. These interesting statistical relationships could be exploited by an error concealment algorithm, e.g., finding a concealed MV for pixel $\mathbf{v}^i$ with proper $\xi_{\mathbf{v}}^i$ given $\hat{e}_{\mathbf{u}}^k$ or $\xi_{\mathbf{u}}^k$; this is subject to our future study.

As mentioned in Section III-D1, for P-MBs with SDP in H.264/AVC, $P_{\mathbf{u}}^k\{\bar{r}, m\} = 0$. So, (42) becomes

$D_{\mathbf{u}}^k(c) = 2P_{\mathbf{u}}^k\{m\} \cdot E[\xi_{\mathbf{u}}^k \cdot (\hat{f}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1} - \tilde{f}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1})].$   (43)

Note that in the more general case where $P_{\mathbf{u}}^k\{\bar{r}, m\} \ne 0$, Eq. (43) can be used as an approximation since in (42), $E[\xi_{\mathbf{u}}^k \cdot \tilde{\Delta}_{\mathbf{u}}^k\{\bar{r}, m\}]$ is much smaller than $E[\xi_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}]$ and $P_{\mathbf{u}}^k\{\bar{r}, m\}$ is much smaller than $P_{\mathbf{u}}^k\{m\}$.

For MBs without SDP, since $P_{\mathbf{u}}^k\{r, \bar{m}\} = P_{\mathbf{u}}^k\{\bar{r}, m\} = 0$ and $P_{\mathbf{u}}^k\{r, m\} = P_{\mathbf{u}}^k\{r\} = P_{\mathbf{u}}^k\{m\} = P_{\mathbf{u}}^k$ as mentioned in Section III-D1, (41) can be simplified to

$D_{\mathbf{u}}^k(c) = P_{\mathbf{u}}^k \cdot (2E[\varepsilon_{\mathbf{u}}^k \cdot \xi_{\mathbf{u}}^k] + 2E[\varepsilon_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}] + 2E[\xi_{\mathbf{u}}^k \cdot \tilde{\zeta}_{\mathbf{u}+\check{mv}_{\mathbf{u}}^k}^{k-1}]).$   (44)

Under Assumption 5, (44) reduces to (43).


Fig. 4. Temporal correlation between residual and concealment error in one trajectory (correlation coefficient vs. residual frame index and MV frame index): (a) foreman-cif, (b) stefan-cif.

Define $\lambda_u^k \triangleq \frac{E[\xi_u^k \cdot \tilde{f}^{k-1}_{u+\check{mv}_u^k}]}{E[\xi_u^k \cdot \hat{f}^{k-1}_{u+\check{mv}_u^k}]}$; $\lambda_u^k$ is a correlation ratio, that is, the ratio of the correlation between the MVCE and the concealed reference pixel value at the decoder to the correlation between the MVCE and the concealed reference pixel value at the encoder. $\lambda_u^k$ quantifies the effect of the correlation between the MVCE and the propagated error on transmission distortion.
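In practice, the two expectations in the definition of $\lambda_u^k$ can be replaced by sample means over a set of co-located pixels. The following is a minimal sketch of such an estimator under that assumption; the three input arrays (MVCE samples and the concealed reference pixel values at the decoder and at the encoder) are hypothetical inputs to be supplied by an instrumented codec.

```python
import numpy as np

def estimate_lambda(xi, f_dec, f_enc):
    """Sample-mean estimate of the correlation ratio lambda.

    xi    : MVCE samples xi_u^k
    f_dec : concealed reference pixel values at the decoder
    f_enc : concealed reference pixel values at the encoder
    Returns E[xi * f_dec] / E[xi * f_enc] with expectations
    replaced by averages over the supplied pixel set.
    """
    xi = np.asarray(xi, dtype=float)
    num = np.mean(xi * np.asarray(f_dec, dtype=float))
    den = np.mean(xi * np.asarray(f_enc, dtype=float))
    return num / den
```

Since the decoder-side values are not available at the encoder, an estimator of this form is only usable offline; online prediction instead relies on the bounds in (45) below.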

Note that although we do not know the exact value of $\lambda_u^k$ at the encoder, its range is characterized by the XEP of all pixels in the trajectory $T$ that passes through the pixel $u^k$, as

$$\prod_{i=1}^{k-1} P^i_{T(i)}\{\bar{r},\bar{m}\} \leq \lambda_u^k \leq 1, \qquad (45)$$

where $T(i)$ is the reference pixel position in the $i$-th frame for the trajectory $T$. For example, $T(k-1) = u + mv_u^k$ and $T(k-2) = T(k-1) + mv^{k-1}_{T(k-1)}$. The left inequality in (45) holds in the extreme case that any error in the trajectory causes $\xi_u^k$ and $\tilde{f}^{k-1}_{u+\check{mv}_u^k}$ to be uncorrelated, which is usually true for high motion video. The right inequality in (45) holds in the other extreme case that no error in the trajectory affects the correlation between $\xi_u^k$ and $\tilde{f}^{k-1}_{u+\check{mv}_u^k}$, that is, $E[\xi_u^k \cdot \tilde{f}^{k-1}_{u+\check{mv}_u^k}] \approx E[\xi_u^k \cdot \hat{f}^{k-1}_{u+\check{mv}_u^k}]$, which is usually true for low motion video. The details on how to estimate $\lambda_u^k$ are presented in the sequel paper [19].
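As a rough illustration of (45), the lower bound is simply the probability that the whole trajectory is error-free; a minimal sketch, assuming the per-frame XEPs along the trajectory are known, could read:

```python
import numpy as np

def lambda_bounds(p_error_along_trajectory):
    """Bounds on lambda_u^k from (45).

    p_error_along_trajectory: per-frame probabilities that the
    trajectory's reference pixel in frame i suffers a residual or
    MV error, so 1 - p is P^i_{T(i)}{no residual, no MV error}.
    """
    p = np.asarray(p_error_along_trajectory, dtype=float)
    lower = float(np.prod(1.0 - p))
    return lower, 1.0

print(lambda_bounds([0.01] * 20))  # 20-frame trajectory at 1% XEP:
                                   # roughly (0.818, 1.0)
```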

Fig. 5. Temporal correlation between concealment errors in one trajectory (correlation coefficient vs. frame-index pair): (a) foreman-cif, (b) stefan-cif.

Using the definition of $\lambda_u^k$, we have the following lemma.

Lemma 4:

$$D_u^k(c) = (\lambda_u^k - 1) \cdot D_u^k(m). \qquad (46)$$

Lemma 4 is proved in Appendix G.

If we assume $E[\xi_u^k] = 0$, we may further derive the correlation coefficient between $\xi_u^k$ and $\hat{f}^{k-1}_{u+\check{mv}_u^k}$. Denoting their correlation coefficient by $\rho$, from (70) we have

$$\rho = \frac{E[\xi_u^k \cdot \hat{f}^{k-1}_{u+\check{mv}_u^k}] - E[\xi_u^k] \cdot E[\hat{f}^{k-1}_{u+\check{mv}_u^k}]}{\sigma_{\xi_u^k} \cdot \sigma_{\hat{f}_u^k}} = -\frac{E[(\xi_u^k)^2]}{2 \sigma_{\xi_u^k} \cdot \sigma_{\hat{f}_u^k}} = -\frac{\sigma_{\xi_u^k}}{2 \sigma_{\hat{f}_u^k}}. \qquad (47)$$

Similarly, it is easy to prove that the correlation coefficient between $\xi_u^k$ and $\hat{f}^{k-1}_{u+mv_u^k}$ is $\frac{\sigma_{\xi_u^k}}{2\sigma_{\hat{f}_u^k}}$. This agrees well with the experimental results shown in Fig. 6. Via the same derivation process, one can obtain the correlation coefficient between $\hat{e}_u^k$ and $\hat{f}^{k-1}_{u+mv_u^k}$, and between $\hat{e}_u^k$ and $\hat{f}_u^k$. One possible application of these correlation properties is error concealment with partial information available.
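The derivation of (47) uses only $E[\xi_u^k] = 0$ and the equal second moments assumed in Appendix G, so it can be checked against any stationary surrogate signal. Below is a small Monte Carlo sketch; the smoothed-noise "pixel row" and the offset d standing in for the MV/concealed-MV displacement are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200_000, 2   # d: illustrative offset between mv and concealed mv

# Illustrative stationary "pixel row": smoothed noise around mid-gray.
row = 128 + np.convolve(rng.normal(0, 20, n + d), np.ones(8) / 8, 'same')
f_mv, f_cmv = row[d:], row[:-d]   # equal second moments by stationarity
xi = f_mv - f_cmv                 # MVCE as a pixel-value difference

measured = np.corrcoef(xi, f_cmv)[0, 1]
predicted = -np.std(xi) / (2 * np.std(f_cmv))
print(measured, predicted)        # the two values agree closely
```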


Fig. 6. Comparison between measured and estimated correlation coefficients of $\xi_u^k$ with $\hat{f}^{k-1}_{u+mv_u^k}$ and with $\hat{f}^{k-1}_{u+\check{mv}_u^k}$: (a) foreman-cif, (b) stefan-cif.

2) Frame-Level Correlation Caused Distortion: Denote $V_i^k\{m\}$ the set of pixels in the $i$-th MV packet of the $k$-th frame. From (19), (71) and Assumption 4, we obtain

$$D^k(c) = \frac{E[(\xi^k)^2]}{|V|} \sum_{u \in V^k} (\lambda_u^k - 1) \cdot P_u^k(m) = \frac{E[(\xi^k)^2]}{|V|} \sum_{i=1}^{N^k(m)} \Big\{ P_i^k(m) \sum_{u \in V_i^k\{m\}} (\lambda_u^k - 1) \Big\}. \qquad (48)$$

Define $\lambda^k \triangleq \frac{1}{|V|}\sum_{u \in V^k} \lambda_u^k$; $\frac{1}{N_i^k(m)}\sum_{u \in V_i^k\{m\}} \lambda_u^k$ will converge to $\lambda^k$ for any packet that contains a sufficiently large number of pixels. By rearranging (48), we obtain

$$D^k(c) = \frac{E[(\xi^k)^2]}{|V|} \sum_{i=1}^{N^k(m)} \left\{ P_i^k(m) \cdot N_i^k(m) \cdot (\lambda^k - 1) \right\} = (\lambda^k - 1) \cdot E[(\xi^k)^2] \cdot \bar{P}^k(m). \qquad (49)$$

From (32), we know that $E[(\xi^k)^2] \cdot \bar{P}^k(m)$ is exactly equal to $D^k(m)$. Therefore, (49) is further simplified to

$$D^k(c) = (\lambda^k - 1) \cdot D^k(m). \qquad (50)$$
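A short sketch of the aggregation in (48)-(50), assuming the per-packet MV-loss probabilities and per-pixel $\lambda$ values are available (the packetization itself is illustrative):

```python
import numpy as np

def frame_correlation_distortion(E_xi2, packet_p, packet_lambdas, V):
    """Frame-level correlation-caused distortion D^k(c) via (48).

    E_xi2          : frame-level MVCE power E[(xi^k)^2]
    packet_p       : per-packet MV error probabilities P_i^k(m)
    packet_lambdas : one array of lambda_u^k values per packet
    V              : number of pixels in the frame, |V|
    """
    acc = sum(p * np.sum(np.asarray(lams) - 1.0)
              for p, lams in zip(packet_p, packet_lambdas))
    return E_xi2 / V * acc
```

When every packet is large enough for its mean $\lambda$ to approach $\lambda^k$, this reduces to $(\lambda^k - 1) \cdot E[(\xi^k)^2] \cdot \bar{P}^k(m)$, i.e., to (50).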

F. Summary

In Section III-A, we decomposed transmission distortion into four terms, and we derived a formula for each term in Sections III-B through III-E. In this section, we combine the formulae for the four terms into a single formula.

1) Pixel-Level Transmission Distortion:

Theorem 1: Under single-reference motion compensation, the PTD of pixel $u^k$ is

$$D_u^k = D_u^k(r) + \lambda_u^k \cdot D_u^k(m) + P_u^k\{r,m\} \cdot D^{k-1}_{u+\check{mv}_u^k} + P_u^k\{r,\bar{m}\} \cdot D^{k-1}_{u+mv_u^k} + P_u^k\{\bar{r}\} \cdot \alpha_u^k \cdot D^{k-1}_{u+mv_u^k}. \qquad (51)$$

Proof: (51) can be obtained by plugging (23), (31), (34), and (71) into (14).

Corollary 2: Under single-reference motion compensation and no SDP, (51) is simplified to

$$D_u^k = P_u^k \cdot \left(E[(\varepsilon_u^k)^2] + \lambda_u^k \cdot E[(\xi_u^k)^2] + D^{k-1}_{u+\check{mv}_u^k}\right) + (1 - P_u^k) \cdot \alpha_u^k \cdot D^{k-1}_{u+mv_u^k}. \qquad (52)$$
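Corollary 2 is directly implementable as a per-pixel forward recursion. A minimal sketch (no SDP) follows, with every input assumed to be supplied by the encoder's statistics-gathering stage:

```python
def pixel_ptd(P, E_eps2, lam, E_xi2, D_prev_cmv, alpha, D_prev_mv):
    """Pixel-level transmission distortion via (52), no SDP.

    P          : pixel error probability P_u^k
    E_eps2     : residual concealment error power E[(eps_u^k)^2]
    lam        : correlation ratio lambda_u^k
    E_xi2      : MV concealment error power E[(xi_u^k)^2]
    D_prev_cmv : distortion of the concealed reference pixel, frame k-1
    alpha      : propagation (clipping decay) factor alpha_u^k
    D_prev_mv  : distortion of the true reference pixel, frame k-1
    """
    return (P * (E_eps2 + lam * E_xi2 + D_prev_cmv)
            + (1.0 - P) * alpha * D_prev_mv)
```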

2) Frame-Level Transmission Distortion:

Theorem 2: Under single-reference motion compensation, the FTD of the $k$-th frame is

$$D^k = D^k(r) + \lambda^k \cdot D^k(m) + \bar{P}^k(r) \cdot D^{k-1} + (1 - \bar{P}^k(r)) \cdot D^k(p) \cdot (1 - \beta^k). \qquad (53)$$

Proof: (53) can be obtained by plugging (29), (32), (40) and (50) into (15).

Corollary 3: Under single-reference motion compensation and no SDP, the FTD of the $k$-th frame is simplified to

$$D^k = \bar{P}^k \cdot \left(E[(\varepsilon^k)^2] + \lambda^k \cdot E[(\xi^k)^2] + D^{k-1}\right) + (1 - \bar{P}^k) \cdot \alpha^k \cdot D^{k-1} \cdot (1 - \beta^k). \qquad (54)$$
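At the frame level, (54) is a one-line recursion that can be rolled forward over a GOP from an error-free intra frame ($D^0 = 0$). The sketch below is an illustration only; the per-frame statistics fed to it are assumed known, and the numbers in the example are made up.

```python
def gop_ftd(P_bar, E_eps2, lam, E_xi2, alpha, beta):
    """Frame-level transmission distortion over a GOP via (54), no SDP.

    All arguments are equal-length per-frame sequences: frame-average
    PEP P_bar^k, residual power E[(eps^k)^2], correlation ratio
    lambda^k, MVCE power E[(xi^k)^2], propagation factor alpha^k,
    and the frame-level factor beta^k defined in Section III.
    Returns the list of predicted D^k, starting from D^0 = 0.
    """
    D, out = 0.0, []
    for P, e2, lm, x2, a, b in zip(P_bar, E_eps2, lam, E_xi2, alpha, beta):
        D = P * (e2 + lm * x2 + D) + (1.0 - P) * a * D * (1.0 - b)
        out.append(D)
    return out

# Example: 30 frames, 2% frame-average PEP, mild clipping decay.
print(gop_ftd([0.02] * 30, [20.0] * 30, [0.8] * 30,
              [50.0] * 30, [0.95] * 30, [0.0] * 30)[-1])
```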

Following the same derivation process, it is not difficult to obtain the distortion prediction formulae for the multi-reference case. Due to space limitations, in this paper we present only the formulae for distortion estimation in the single-reference case. Interested readers may refer to Ref. [22] for the analysis of the multi-reference case. In Ref. [22], we also identify the relationship between our result and existing models, and specify the conditions under which those models are accurate.

IV. CONCLUSION

In this paper, we derived transmission distortion formulae for wireless video communication systems. By taking into account spatio-temporal correlation, the nonlinear codec and time-varying channels, our distortion prediction formulae improve the accuracy of distortion estimation over existing works. Besides that, our formulae support, for the first time, the following capabilities: 1) prediction at different levels (e.g., pixel/frame/GOP level), 2) prediction for multi-reference motion compensation, 3) prediction under SDP, 4) prediction under arbitrary slice-level packetization with the FMO mechanism, 5) prediction under time-varying channels, 6) one unified formula for both I-MBs and P-MBs, and 7) prediction for both low motion and high motion video sequences. In addition, this paper also identified two important properties of transmission distortion for the first time: 1) clipping noise, produced by non-linear clipping, causes decay of propagated error; 2) the correlation between motion vector concealment error and propagated error is negative, and has a dominant impact on transmission distortion among all the correlations between any two of the four components of the transmission error.

In the sequel paper [19], we use the formulae derived in this paper to design algorithms for estimating pixel-level and frame-level transmission distortion and apply the algorithms to video codec design; we also verify the accuracy of the formulae derived in this paper through experiments. The application of these formulae shows superior performance over existing models.

ACKNOWLEDGMENTS

This work was supported in part by an Intel gift and the US National Science Foundation under grants CNS-0643731 and ECCS-1002214. The authors would like to thank Jun Xu and Qian Chen for many fruitful discussions related to this work and suggestions that helped to improve the presentation of this paper. The authors would also like to thank the anonymous reviewers for their valuable comments, which improved the quality of this paper.

APPENDIX

A. Proof of Lemma 1

Proof: From (10) and (12), we obtain $\tilde{f}^{k-1}_{u+\widetilde{mv}_u^k} + \tilde{e}_u^k = \hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k - \tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}$. Together with (8), we obtain

$$\tilde{\Delta}_u^k = (\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k - \tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}) - \Gamma(\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k - \tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}). \qquad (55)$$

So, $\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k} + \tilde{\Delta}_u^k = (\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k) - \Gamma(\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k - \tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k})$, and

$$D_u^k(P) = E[(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k} + \tilde{\Delta}_u^k)^2] = E[\Phi^2(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}, \hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k)]. \qquad (56)$$

We know from the definition that $D_u^k(p)$ is a special case of $D_u^k(P)$ under the condition $\{\bar{r},\bar{m}\}$, which means $\tilde{e}_u^k = \hat{e}_u^k$, i.e., $\tilde{\varepsilon}_u^k = 0$, and $\widetilde{mv}_u^k = mv_u^k$, i.e., $\tilde{\xi}_u^k = 0$. Therefore, we obtain

$$D_u^k(p) = E[\Phi^2(\tilde{\zeta}^{k-1}_{u+mv_u^k}, \hat{f}_u^k)]. \qquad (57)$$

B. Proof of Proposition 1

Proof: The probability density function of a random variable having a Laplacian distribution is $f(x|\mu,b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$. Since $\mu = 0$, we have $E[x^2] = 2b^2$, and from (37), we obtain

$$E[x^2] - E[\Phi^2(x,y)] = \int_{y-\gamma_l}^{+\infty} \left(x^2 - (y-\gamma_l)^2\right) \frac{1}{2b} e^{-\frac{x}{b}} \, dx + \int_{-\infty}^{y-\gamma_h} \left(x^2 - (y-\gamma_h)^2\right) \frac{1}{2b} e^{\frac{x}{b}} \, dx = e^{-\frac{y-\gamma_l}{b}}\left((y-\gamma_l) \cdot b + b^2\right) + e^{-\frac{\gamma_h-y}{b}}\left((\gamma_h-y) \cdot b + b^2\right). \qquad (58)$$

From the definition of the propagation factor, we obtain

$$\alpha = \frac{E[\Phi^2(x,y)]}{E[x^2]} = 1 - \frac{1}{2} e^{-\frac{y-\gamma_l}{b}}\left(\frac{y-\gamma_l}{b} + 1\right) - \frac{1}{2} e^{-\frac{\gamma_h-y}{b}}\left(\frac{\gamma_h-y}{b} + 1\right).$$

Fig. 7. Comparison of $\Phi^2(x,y)$ and $x^2$ for $y = 100$, $\gamma_H = 255$, $\gamma_L = 0$.
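Proposition 1's closed form is easy to sanity-check against a direct Monte Carlo average of $\Phi^2$, using the form $\Phi(x,y) = y - \Gamma(y - x)$ implied by the proof of Lemma 1. A sketch, with an illustrative operating point near the lower clipping bound where the decay is pronounced:

```python
import numpy as np

def alpha_closed_form(y, b, gl=0.0, gh=255.0):
    """Propagation factor alpha from Proposition 1 for a zero-mean
    Laplacian propagated error with scale b and pixel value y."""
    t_l, t_h = (y - gl) / b, (gh - y) / b
    return (1.0 - 0.5 * np.exp(-t_l) * (t_l + 1.0)
                - 0.5 * np.exp(-t_h) * (t_h + 1.0))

def alpha_monte_carlo(y, b, gl=0.0, gh=255.0, n=10**6, seed=0):
    x = np.random.default_rng(seed).laplace(0.0, b, n)
    phi = y - np.clip(y - x, gl, gh)   # Phi(x, y): error left after clipping
    return np.mean(phi**2) / np.mean(x**2)

# Near the lower bound, clipping removes a large share of the error.
print(alpha_closed_form(10.0, 10.0), alpha_monte_carlo(10.0, 10.0))
# both values are close to 1 - exp(-1), roughly 0.632
```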

C. Proof of Lemma 2

Proof: From the definition in (37), we obtain

$$\Phi^2(x,y) - x^2 = \begin{cases} (y-\gamma_l)^2 - x^2, & x > y-\gamma_l \\ 0, & y-\gamma_h \leq x \leq y-\gamma_l \\ (y-\gamma_h)^2 - x^2, & x < y-\gamma_h. \end{cases} \qquad (59)$$

Since $y \geq \gamma_l$, we obtain $(y-\gamma_l)^2 < x^2$ when $x > y-\gamma_l$. Similarly, since $y \leq \gamma_h$, we obtain $(y-\gamma_h)^2 < x^2$ when $x < y-\gamma_h$. Therefore, $\Phi^2(x,y) - x^2 \leq 0$ for $\gamma_l \leq y \leq \gamma_h$. Fig. 7 shows a pictorial example of the case where $\gamma_h = 255$, $\gamma_l = 0$ and $y = 100$.
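Lemma 2 states that clipping can only shrink the propagated error whenever the pixel value is within the dynamic range; a quick numerical spot-check of this inequality, again with $\Phi(x,y) = y - \Gamma(y-x)$:

```python
import numpy as np

gl, gh = 0.0, 255.0
rng = np.random.default_rng(2)
x = rng.uniform(-300.0, 300.0, 10_000)  # propagated error samples
y = rng.uniform(gl, gh, 10_000)         # in-range reconstructed pixels
phi2 = (y - np.clip(y - x, gl, gh)) ** 2
assert np.all(phi2 <= x**2 + 1e-9)      # Phi^2(x, y) <= x^2 per Lemma 2
print("Lemma 2 inequality holds on all samples")
```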

D. Proof of Corollary 1

Proof: From (55), we obtain $\tilde{\Delta}_u^k\{\bar{p}\} = (\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k) - \Gamma(\hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k)$. Together with Lemma 5, which is presented and proved in Appendix E, we have $\gamma_l \leq \hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k \leq \gamma_h$. From Lemma 2, we have $\Phi^2(x,y) \leq x^2$ for any $\gamma_l \leq y \leq \gamma_h$; therefore, $E[\Phi^2(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}, \hat{f}_u^k - \tilde{\xi}_u^k - \tilde{\varepsilon}_u^k)] \leq E[(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k})^2]$. Together with (56), it is straightforward to prove that

$$E[(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k} + \tilde{\Delta}_u^k)^2] \leq E[(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k})^2]. \qquad (60)$$

By expanding $E[(\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k} + \tilde{\Delta}_u^k)^2]$, we obtain

$$E[\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k} \cdot \tilde{\Delta}_u^k] \leq -\frac{1}{2} E[(\tilde{\Delta}_u^k)^2] \leq 0. \qquad (61)$$

The physical meaning of (61) is that $\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}$ and $\tilde{\Delta}_u^k$ are negatively correlated if $\tilde{\Delta}_u^k \neq 0$. Since $\tilde{\Delta}_u^k\{r\} = 0$ as noted in Section III-D1 and $\tilde{\Delta}_u^k\{\bar{p}\} = 0$ as proved in Lemma 5, we know that $\tilde{\Delta}_u^k \neq 0$ is possible only for the error events $\{\bar{r},m,p\}$ and $\{\bar{r},\bar{m},p\}$, and $\tilde{\Delta}_u^k = 0$ for any other error event. In other words, $\tilde{\zeta}^{k-1}_{u+\widetilde{mv}_u^k}$ and $\tilde{\Delta}_u^k$ are negatively correlated under the condition $\{\bar{r},p\}$, and they are uncorrelated under other conditions.

E. Lemma 5 and Its Proof

Before presenting the proof, we first give the definition of an ideal codec.

Definition 1 (Ideal Codec): Both the true MV and the concealed MV are within the search range, and the position pointed to by the true MV, i.e., $u + mv_u^k$, is the best reference pixel, under the MMSE criterion, for pixel $u^k$ within the whole search range $V^{k-1}_{SR}$, that is, $v = \arg\min_{v \in V^{k-1}_{SR}} \{(\hat{f}_u^k - \hat{f}_v^{k-1})^2\}$.

To prove Corollary 1, we need the following lemma.

Lemma 5: In an ideal codec, $\tilde{\Delta}_u^k\{\bar{p}\} = 0$. In other words, if there is no propagated error, the clipping noise for the pixel $u^k$ at the decoder is always zero no matter what kind of error event occurs in the $k$-th frame.

Proof: In an ideal codec, we have $(\hat{e}_u^k)^2 = (\hat{f}_u^k - \hat{f}^{k-1}_{u+mv_u^k})^2 \leq (\hat{f}_u^k - \hat{f}^{k-1}_{u+\check{mv}_u^k})^2$. Due to the spatial and temporal continuity of natural video, we can prove by contradiction that in an ideal codec $\hat{f}_u^k - \hat{f}^{k-1}_{u+mv_u^k}$ and $\hat{f}_u^k - \hat{f}^{k-1}_{u+\check{mv}_u^k}$ have the same sign, that is, either

$$\hat{f}_u^k - \hat{f}^{k-1}_{u+\check{mv}_u^k} \geq \hat{e}_u^k \geq 0, \quad \text{or} \quad \hat{f}_u^k - \hat{f}^{k-1}_{u+\check{mv}_u^k} \leq \hat{e}_u^k \leq 0. \qquad (62)$$

If the signs of $\hat{f}_u^k - \hat{f}^{k-1}_{u+mv_u^k}$ and $\hat{f}_u^k - \hat{f}^{k-1}_{u+\check{mv}_u^k}$ differ, then due to the spatial and temporal continuity of the input video, there exists a better position $v \in V^{k-1}$ between $mv_u^k$ and $\check{mv}_u^k$, and therefore within the search range, such that $(\hat{e}_u^k)^2 \geq (\hat{f}_u^k - \hat{f}_v^{k-1})^2$. In this case, the encoder would choose $v$ as the best reference pixel within the search range. This contradicts the assumption that the best reference pixel within the search range is $u + mv_u^k$.

Therefore, from (62), we obtain

$$\hat{f}_u^k \geq \hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k \geq \hat{f}^{k-1}_{u+\check{mv}_u^k}, \quad \text{or} \quad \hat{f}_u^k \leq \hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k \leq \hat{f}^{k-1}_{u+\check{mv}_u^k}. \qquad (63)$$

Since both $\hat{f}_u^k$ and $\hat{f}^{k-1}_{u+\check{mv}_u^k}$ are reconstructed pixel values, they are within the range $\gamma_h \geq \hat{f}_u^k, \hat{f}^{k-1}_{u+\check{mv}_u^k} \geq \gamma_l$. From (63), we have $\gamma_h \geq \hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k \geq \gamma_l$, and thus $\Gamma(\hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k) = \hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k$. As a result, we obtain $\tilde{\Delta}_u^k\{\bar{r},m,\bar{p}\} = (\hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k) - \Gamma(\hat{f}^{k-1}_{u+\check{mv}_u^k} + \hat{e}_u^k) = 0$.

Since $\tilde{\Delta}_u^k\{\bar{r},\bar{m},\bar{p}\} = \hat{\Delta}_u^k = 0$, and from Section III-D1 we know that $\tilde{\Delta}_u^k\{r,\bar{p}\} = 0$, we obtain $\tilde{\Delta}_u^k\{\bar{p}\} = 0$.

Remark 1: Note that Lemma 5 is proved under the assumption of pixel-level motion estimation. In a practical encoder, block-level motion estimation is adopted, with the criterion of minimizing the MSE of the whole block, e.g., in H.263, or minimizing the cost of residual bits and MV bits, e.g., in H.264/AVC. Therefore, some reference pixels in the block may not be the best reference pixels within the search range. On the other hand, Rate Distortion Optimization (RDO) as used in H.264/AVC may also cause some reference pixels not to be the best reference pixels. However, the experimental results for all the test video sequences show that the probability of $\tilde{\Delta}_u^k\{\bar{r},m,\bar{p}\} \neq 0$ is negligible.

F. Proof of Lemma 3

Proof: For P-MBs with SDP, from (18) and (34) we obtain

$$D^k(P) = \frac{1}{|V|} \sum_{u \in V^k} \left(P_u^k\{r,m\} \cdot D^{k-1}_{u+\check{mv}_u^k}\right) + \frac{1}{|V|} \sum_{u \in V^k} \left(P_u^k\{r,\bar{m}\} \cdot D^{k-1}_{u+mv_u^k}\right) + \frac{1}{|V|} \sum_{u \in V^k} \left(P_u^k\{\bar{r}\} \cdot D_u^k(p)\right). \qquad (64)$$

Denote $V_i^k\{r,\bar{m}\}$ the set of pixels in the $k$-th frame with the same XEP $P_i^k\{r,\bar{m}\}$; denote $N_i^k\{r,\bar{m}\}$ the number of pixels in $V_i^k\{r,\bar{m}\}$; denote $N^k\{r,\bar{m}\}$ the number of sets with different XEP $P_i^k\{r,\bar{m}\}$ in the $k$-th frame. Although $D^{k-1}_{u+mv_u^k}$ may be very different for different pixels $u+mv_u^k$ in the $(k-1)$-th frame, e.g., under a fast fading channel with the FMO mechanism, for large $N_i^k\{r,\bar{m}\}$, $\frac{1}{N_i^k\{r,\bar{m}\}}\sum_{u \in V_i^k\{r,\bar{m}\}} D^{k-1}_{u+mv_u^k}$ converges to $D^{k-1}$ (see Footnote 11). Therefore,

$$\frac{1}{|V|} \sum_{u \in V^k} \left(P_u^k\{r,\bar{m}\} \cdot D^{k-1}_{u+mv_u^k}\right) = \frac{1}{|V|} \sum_{i=1}^{N^k\{r,\bar{m}\}} \Big(P_i^k\{r,\bar{m}\} \sum_{u \in V_i^k\{r,\bar{m}\}} D^{k-1}_{u+mv_u^k}\Big) = \frac{1}{|V|} \sum_{i=1}^{N^k\{r,\bar{m}\}} \left(P_i^k\{r,\bar{m}\} \cdot N_i^k\{r,\bar{m}\} \cdot D^{k-1}\right) = D^{k-1} \cdot \bar{P}^k\{r,\bar{m}\}, \qquad (65)$$

where $\bar{P}^k\{r,\bar{m}\} = \frac{1}{|V|}\sum_{i=1}^{N^k\{r,\bar{m}\}} \left(P_i^k\{r,\bar{m}\} \cdot N_i^k\{r,\bar{m}\}\right)$.

Following the same process, we obtain the first term on the right-hand side of (64) as $D^{k-1} \cdot \bar{P}^k\{r,m\}$, where $\bar{P}^k\{r,m\} = \frac{1}{|V|}\sum_{i=1}^{N^k\{r,m\}} \left(P_i^k\{r,m\} \cdot N_i^k\{r,m\}\right)$.

Footnote 11: According to the definition, for any given $u \in V^{k-1}$, $D_u^{k-1}$ is an expected value, that is, it is not a random variable. However, due to the randomness of $mv_u^k$, each pixel in the $(k-1)$-th frame can be used as a reference for multiple pixels in the $k$-th frame. In other words, $\frac{1}{N_i^k\{r,\bar{m}\}}\sum_{u \in V_i^k\{r,\bar{m}\}} D^{k-1}_{u+mv_u^k}$ can be described as simple random sampling with replacement (SRSWR) followed by taking the average. On the other hand, according to (6), $D^{k-1}$ is the mean of $D_u^{k-1}$ over all $u \in V^{k-1}$. Therefore, using Theorem 5.2.6 in Ref. [28], it is easy to prove that the expectation of $D^{k-1}_{u+mv_u^k}$ is exactly equal to $D^{k-1}$; and using Theorem 5.5.2 in Ref. [28], it is also easy to prove that $\frac{1}{N_i^k\{r,\bar{m}\}}\sum_{u \in V_i^k\{r,\bar{m}\}} D^{k-1}_{u+mv_u^k}$ converges in probability to $D^{k-1}$. Note again that the randomness of $D^{k-1}_{u+mv_u^k}$ is caused by $mv_u^k$.


In addition, we have

$$\frac{1}{|V|} \sum_{u \in V^k} \left(P_u^k\{\bar{r}\} \cdot D_u^k(p)\right) = \frac{1}{|V|} \sum_{i=1}^{N^k\{\bar{r}\}} \Big(P_i^k\{\bar{r}\} \sum_{u \in V_i^k\{\bar{r}\}} D_u^k(p)\Big). \qquad (66)$$

For large $N_i^k\{\bar{r}\}$, $\frac{1}{N_i^k\{\bar{r}\}}\sum_{u \in V_i^k\{\bar{r}\}} D_u^k(p)$ converges to $D^k(p)$, so the third term on the right-hand side of (64) is $D^k(p) \cdot (1 - \bar{P}^k(r))$.

Note that $P_i^k\{r,m\} + P_i^k\{r,\bar{m}\} = P_i^k\{r\}$ and $N_i^k\{r,m\} = N_i^k\{r,\bar{m}\}$. So, we obtain

$$D^k(P) = D^{k-1} \cdot \bar{P}^k(r) + D^k(p) \cdot (1 - \bar{P}^k(r)). \qquad (67)$$

For P-MBs without SDP, it is straightforward to obtain (67) from (35). For I-MBs, from (36), it is also easy to obtain $D^k(P) = D^{k-1} \cdot \bar{P}^k(r)$. So, together with (67), we obtain (40).

G. Proof of Lemma 4

Proof: Using the definition of $\lambda_u^k$, (43) becomes

$$D_u^k(c) = 2P_u^k\{m\} \cdot (1 - \lambda_u^k) \cdot E[\xi_u^k \cdot \hat{f}^{k-1}_{u+\check{mv}_u^k}]. \qquad (68)$$

Under the condition that the distance between $mv_u^k$ and $\check{mv}_u^k$ is small, for example, inside the same MB, the statistics of $\hat{f}^{k-1}_{u+\check{mv}_u^k}$ and $\hat{f}^{k-1}_{u+mv_u^k}$ are almost the same. Therefore, we may assume $E[(\hat{f}^{k-1}_{u+\check{mv}_u^k})^2] = E[(\hat{f}^{k-1}_{u+mv_u^k})^2]$.

Since $\xi_u^k = \hat{f}^{k-1}_{u+mv_u^k} - \hat{f}^{k-1}_{u+\check{mv}_u^k}$, we have

$$E[(\hat{f}^{k-1}_{u+\check{mv}_u^k})^2] = E[(\hat{f}^{k-1}_{u+mv_u^k})^2] = E[(\xi_u^k + \hat{f}^{k-1}_{u+\check{mv}_u^k})^2], \qquad (69)$$

and therefore

$$E[\xi_u^k \cdot \hat{f}^{k-1}_{u+\check{mv}_u^k}] = -\frac{E[(\xi_u^k)^2]}{2}. \qquad (70)$$

Note that following the same derivation process, we can prove $E[\xi_u^k \cdot \hat{f}^{k-1}_{u+mv_u^k}] = \frac{E[(\xi_u^k)^2]}{2}$.

Therefore, (68) can be simplified to

$$D_u^k(c) = (\lambda_u^k - 1) \cdot E[(\xi_u^k)^2] \cdot P_u^k(m). \qquad (71)$$

From (31), we know that $E[(\xi_u^k)^2] \cdot P_u^k(m)$ is exactly equal to $D_u^k(m)$. Therefore, (71) is further simplified to (46).

REFERENCES

[1] C. E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE Nat. Conv. Rec., Part 4, pp. 142-163, 1959.
[2] T. Berger and J. Gibson, "Lossy source coding," IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2693-2723, 1998.
[3] R. Zhang, S. L. Regunathan, and K. Rose, "Video coding with optimal inter/intra-mode switching for packet loss resilience," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 966-976, Jun. 2000.
[4] T. Stockhammer, M. Hannuksela, and T. Wiegand, "H.264/AVC in wireless environments," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 657-673, 2003.
[5] T. Stockhammer, T. Wiegand, and S. Wenger, "Optimized transmission of H.26L/JVT coded video over packet-lossy networks," in IEEE ICIP, 2002.
[6] M. Sabir, R. Heath, and A. Bovik, "Joint source-channel distortion modeling for MPEG-4 video," IEEE Transactions on Image Processing, vol. 18, no. 1, pp. 90-105, 2009.
[7] K. Stuhlmuller, N. Farber, M. Link, and B. Girod, "Analysis of video transmission over lossy channels," IEEE Journal on Selected Areas in Communications, vol. 18, pp. 1012-1032, Jun. 2000.
[8] J. U. Dani, Z. He, and H. Xiong, "Transmission distortion modeling for wireless video communication," in Proceedings of IEEE Global Telecommunications Conference (GLOBECOM'05), 2005.
[9] Z. He, J. Cai, and C. W. Chen, "Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding," IEEE Transactions on Circuits and Systems for Video Technology, special issue on wireless video, vol. 12, pp. 511-523, Jun. 2002.
[10] Y. Wang, Z. Wu, and J. M. Boyce, "Modeling of transmission-loss-induced distortion in decoded video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 6, pp. 716-732, Jun. 2006.
[11] J. Chakareski, J. Apostolopoulos, W.-T. Tan, S. Wee, and B. Girod, "Distortion chains for predicting the video distortion for general packet loss patterns," in Proc. ICASSP, 2004.
[12] J. Chakareski, J. Apostolopoulos, S. Wee, W.-T. Tan, and B. Girod, "Rate-distortion hint tracks for adaptive video streaming," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 10, pp. 1257-1269, 2005.
[13] C. Zhang, H. Yang, S. Yu, and X. Yang, "GOP-level transmission distortion modeling for mobile streaming video," Signal Processing: Image Communication, 2007.
[14] M. T. Ivrlač, L. U. Choi, E. Steinbach, and J. A. Nossek, "Models and analysis of streaming video transmission over wireless fading channels," Signal Processing: Image Communication, vol. 24, no. 8, pp. 651-665, Sep. 2009.
[15] Y. J. Liang, J. G. Apostolopoulos, and B. Girod, "Analysis of packet loss for compressed video: Effect of burst losses and correlation between error frames," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 7, pp. 861-874, Jul. 2008.
[16] ITU-T Series H: Audiovisual and Multimedia Systems, Advanced video coding for generic audiovisual services, Nov. 2007.
[17] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, Jul. 2003.
[18] P. Lambert, W. De Neve, Y. Dhondt, and R. Van de Walle, "Flexible macroblock ordering in H.264/AVC," Journal of Visual Communication and Image Representation, vol. 17, no. 2, pp. 358-375, 2006.
[19] Z. Chen and D. Wu, "Prediction of transmission distortion for wireless video communication: Algorithm and application," Journal of Visual Communication and Image Representation, vol. 21, no. 8, pp. 948-964, 2010.
[20] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: a review," Proceedings of the IEEE, vol. 86, no. 5, pp. 974-997, 1998.
[21] D. Agrafiotis, D. R. Bull, and C. N. Canagarajah, "Enhanced error concealment with mode selection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 8, pp. 960-973, Aug. 2006.
[22] Z. Chen and D. Wu, "Prediction of transmission distortion for wireless video communication: Part I: Analysis," 2010, http://www.wu.ece.ufl.edu/mypapers/journal-1.pdf.
[23] Z. Chen, P. Pahalawatta, A. M. Tourapis, and D. Wu, "The ERMPC algorithm for error resilient rate distortion optimization in video coding," IEEE Transactions on Circuits and Systems for Video Technology, 2011, accepted.
[24] B. Girod, "The efficiency of motion-compensating prediction for hybrid coding of video sequences," IEEE Journal on Selected Areas in Communications, vol. 5, no. 7, pp. 1140-1154, Aug. 1987.
[25] B. Girod, "Efficiency analysis of multihypothesis motion-compensated prediction for video coding," IEEE Transactions on Image Processing, vol. 9, no. 2, pp. 173-183, 2000.
[26] A. Leontaris and P. Cosman, "Compression efficiency and delay tradeoffs for hierarchical B-pictures and pulsed-quality frames," IEEE Transactions on Image Processing, vol. 16, no. 7, pp. 1726-1740, 2007.
[27] "H.264/AVC reference software JM14.0," May 2008. [Online]. Available: http://iphome.hhi.de/suehring/tml/download
[28] G. Casella and R. L. Berger, Statistical Inference, 2nd ed. Duxbury Press, 2001.


Zhifeng Chen received the Ph.D. degree in Electrical and Computer Engineering from the University of Florida, Gainesville, Florida, in 2010. He joined InterDigital Inc. in 2010, where he is currently a staff engineer working on video coding research.

Dapeng Wu (S'98-M'04-SM'06) received the Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University, Pittsburgh, PA, in 2003. Currently, he is a professor in the Electrical and Computer Engineering Department at the University of Florida, Gainesville, FL.

