EMBEDDED SYSTEM DESIGN AND

POWER-RATE-DISTORTION OPTIMIZATION FOR

VIDEO ENCODING UNDER ENERGY CONSTRAINTS

A Thesis presented to the Faculty of the Graduate School

University of Missouri-Columbia

In Partial Fulfillment

Of the Requirement for the Degree

Master of Science

by WENYE CHENG

Prof. Zhihai He, Thesis Supervisor

AUGUST 2007

The undersigned, appointed by the Dean of the Graduate School, have examined the thesis entitled

EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION OPTIMIZATION FOR VIDEO ENCODING UNDER ENERGY CONSTRAINTS

Presented by Wenye Cheng

A candidate for the degree of Master of Science

And hereby certify that in their opinion it is worthy of acceptance.

Professor Zhihai He

Professor Justin Legarsky

Professor Ye Duan

To my wife Ming Qian and daughter Xi Cheng.

Without their love, support and sacrifices, this work could not have been accomplished.

ACKNOWLEDGEMENTS

I would like to wholeheartedly thank my advisor, Prof. Zhihai He, for providing excellent guidance throughout the course of this work.

I would like to thank the members of my thesis committee, Prof. Justin Legarsky and Prof. Ye Duan, for their thorough review of this thesis. I express my deepest and most sincere gratitude to my committee members.

I would also like to thank Mr. Jim Fischer for his assistance in setting up the experimental environment for this thesis. He was very patient and cooperative every time I asked for his help.

To my colleagues, graduate students Xi Chen, Jay Eggert, and Xiwen Zhao, I would like to express my thanks for their help with the experiments. In particular, I thank Xi Chen for his great help in our collaboration on the energy-aware video encoder design.

I would also like to thank York Chung for his help during my thesis writing. Typing the thesis would have taken much longer without his help.


Abstract

Wireless video communication over portable devices has become the driving technology of many important applications, experiencing dramatic market growth and promising revolutionary experiences in personal communication, gaming, entertainment, military, security, environment monitoring, and more. Portable devices are powered by batteries. Video encoding schemes are often computationally intensive and energy-demanding, even after being fully optimized with existing software and hardware energy-minimization techniques. As a result, the operational lifetime of current portable video systems, such as handheld video devices, is still very short, mostly in the range of a few hours. Therefore, one of the central challenges in portable video communication system design is to minimize the energy consumption of video encoding so as to extend the operational lifetime of devices. In this work, we develop an operational power-rate-distortion (P-R-D) approach to minimizing the video encoding energy under rate-distortion constraints. We will demonstrate that extending the traditional rate-distortion analysis to P-R-D analysis gives us another dimension of flexibility in resource allocation and performance optimization for wireless video communication over portable devices. Theoretically, we will analyze the energy saving gain of P-R-D optimization. Practically, we will develop an adaptive scheme to estimate the P-R-D model parameters and perform on-the-fly energy optimization for real-time video compression. Our results show that, for typical videos with non-stationary statistics, the proposed P-R-D optimization technology can significantly reduce the encoder energy consumption. This has many important applications in energy-efficient portable video communication system design.


Contents

ACKNOWLEDGEMENTS
Abstract
List of Figures
List of Tables
1 Introduction
    1.1 Major Contributions
    1.2 Thesis Organization
2 Background and Related Work
    2.1 Video Coding Complexity
    2.2 Dynamic Scalable Voltage
    2.3 Joint Hardware and Application Adaptation
    2.4 Importance of This Work
3 Energy-Scalable Video Encoder Design and Operational P-R-D Analysis
    3.1 An Operational Approach to P-R-D Analysis
    3.2 Complexity Control Parameters
        3.2.1 The Complexity Control Parameter νX
        3.2.2 The Complexity Control Parameter νY
    3.3 P-R-D Modeling for Energy-Aware Video Encoding
    3.4 Analytical P-R-D Models
4 Energy Saving Analysis
    4.1 Theoretical Analysis
    4.2 Training-Classification Approach to P-R-D Video Encoder Control
5 Power-Rate-Distortion Control for Energy-Aware Video Encoding
    5.1 A Training-Classification Approach to P-R-D Video Encoder Control
    5.2 Lagrangian Optimization
    5.3 P-R-D Optimization
    5.4 Summary of Algorithm
6 Joint Hardware and Video Encoder Adaptation
    6.1 Background
    6.2 System Model
        6.2.1 CPU Model
        6.2.2 Task Model
    6.3 Application-Layer Video Encoder Adaptation
        6.3.1 Practical Dynamic Voltage Scaling (PDVS)
        6.3.2 Cross-layer Adaptation for Single-Stream Video Encoding
7 Energy-Aware Embedded Video Encoding System Design
    7.1 Tier architecture
    7.2 System Design
        7.2.1 Hardware
        7.2.2 Software
    7.3 Power Management
        7.3.1 Signal Design
        7.3.2 State Control
8 Conclusion and Future Work
    8.1 Conclusion
    8.2 Future Work
References


List of Figures

2.1 Generalized block diagram of a hybrid video encoder
2.2 The framework of the GRACE
3.1 Power consumption model with DVS
3.2 (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.3 Akiyo: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.4 News: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.5 Salesman: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.6 Car: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.7 Foreman: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.8 Coastguard: (a) The P-R-D curve; (b) The D-P curves at different bit rates
3.9 Football: (a) The P-R-D curve; (b) The D-P curves at different bit rates
4.1 The relationship between λ and λF
4.2 The common video scene estimation parameter λF
4.3 A part of the common video scene estimation parameter λF
4.4 A part of the common video scene estimation parameter λF
4.5 The distribution of scene activity parameters of all video segments
5.1 The discrete Lagrangian optimization principle
5.2 The multiplier λ, rate constraints and complexity at D = 38 dB
5.3 The multiplier λ, rate constraints and complexity at D = 37 dB
5.4 The multiplier λ, rate constraints and complexity at D = 36 dB
5.5 The multiplier λ, rate constraints and complexity at D = 35 dB
5.6 The multiplier λ, rate constraints and complexity at D = 34 dB
5.7 The multiplier λ, rate constraints and complexity at D = 33 dB
5.8 The multiplier λ, rate constraints and complexity at D = 32 dB
5.9 The multiplier λ, rate constraints and complexity at D = 31 dB
5.10 The multiplier λ, rate constraints and complexity at D = 30 dB
5.11 The multiplier λ, rate constraints and complexity at D = 29 dB
5.12 Comparison of the non-scaling coding and scaling coding at D = 37 dB, λ = 300
5.13 Comparison of the non-scaling coding and scaling coding at D = 37 dB, λ = 2000
5.14 Comparison of the non-scaling coding and scaling coding at D = 34 dB, λ = 300
5.15 Comparison of the non-scaling coding and scaling coding at D = 34 dB, λ = 2000
5.16 Comparison of the non-scaling coding and scaling coding at D = 30 dB, λ = 300
5.17 Comparison of the non-scaling coding and scaling coding at D = 30 dB, λ = 2000
7.1 Tier Architecture
7.2 The Video Sensor Tiers
7.3 Software structure
7.4 Definition of the Connector Pins


List of Tables

2.1 CPU Occupancy (In Percentage) of the Major Encoding Functions
4.1 The point value for categorizing Video Segment
5.1 Sample choices of multiplier λ
5.2 The Probability of the Fi
5.3 The Lagrangian Multiplier λ and corresponding QoS and rate constraint
5.4 The Lagrangian Multiplier λ and corresponding PSNR ≈ 37 dB and rate constraint
5.5 The Lagrangian Multiplier λ and corresponding PSNR ≈ 34 dB and rate constraint
5.6 The Lagrangian Multiplier λ and corresponding PSNR ≈ 34 dB and rate constraint
5.7 The Energy Saving Ratio
6.1 Stargate Power consumption
6.2 The Energy Saving Ratio 1
6.3 The Energy Saving Ratio 2
6.4 The Energy Saving Ratio 3


Chapter 1

Introduction

Wireless video communication over portable devices has become the driving technology of many important applications, experiencing dramatic market growth and promising revolutionary experiences in personal communication, gaming, entertainment, military, security, environment monitoring, and more [37,38]. Portable devices are powered by batteries. Video encoding schemes are often computationally intensive and energy-demanding, even after being fully optimized with existing software and hardware energy-minimization techniques [14, 63]. As a result, the operational lifetime of current portable video systems, such as handheld video devices, is still short, mostly in the range of a few hours. This has become a bottleneck for technological progress in portable video electronics.

Video encoding is computationally intensive and energy-consuming. However, a mobile video application system, powered by batteries, has a limited energy supply for video data processing. One of the central challenges in such system design is to minimize the energy consumption of video data processing so as to extend the operational lifetime of the system while supporting multimedia Quality of Service (QoS) requirements. In energy-aware video encoding system design, the energy consumption of video data compression should be minimized while satisfying the same QoS requirements.

In the age of desktop computing and wired communication, people worried about bits, that is, storage space and transmission bandwidth. Therefore, the ultimate goal in this type of video communication system design is to optimize video quality under rate constraints. To analyze, model, control, and optimize the performance of a signal processing and communication system under rate constraints, rate-distortion (R-D) theories and algorithms have been developed [5,36,50,65]. With recent technological advances in circuit design and wireless communication, storage space and network bandwidth have experienced dramatic growth, improving by a factor of hundreds during the past decade. Currently, in many portable communication applications, energy has become a much scarcer and more critical resource than bandwidth or storage space. Therefore, how to incorporate energy consumption into the existing R-D performance analysis framework, so as to optimize video communication system performance under rate and energy constraints, emerges as a new research task.

In this work, we study the energy consumption of video encoders and incorporate the third dimension of power consumption into the existing rate-distortion (R-D) analysis framework so as to establish a power-rate-distortion (P-R-D) analysis framework for energy-aware video encoding. More specifically, we first develop an energy-scalable video encoder which is fully scalable in energy consumption. We then develop an operational approach to study its P-R-D behavior. Both theoretically and experimentally, we demonstrate that with the proposed energy-scalable video encoder design and P-R-D optimization, we are able to significantly reduce the energy consumption of video encoding over portable video devices.

1.1 Major Contributions

The major contributions of this work include:

1. Developing an energy-aware video encoding system for wildlife behavior monitoring. Studying the power consumption of video encoding systems and developing a video encoding scheme which is fully scalable in energy consumption.

2. Developing an operational approach to modeling the P-R-D behavior of an energy-scalable video encoder. Developing an analytical P-R-D model. Developing a scheme to estimate P-R-D model parameters from video encoding statistics.

3. Studying the characteristics of P-R-D functions of various test video segments. Developing a training-classification approach for real-time P-R-D control using P-R-D clustering.

1.2 Thesis Organization

The rest of the thesis is organized as follows:

Chapter 2 reviews the existing research work related to energy-aware video encoding over portable communication devices.

Chapter 3 introduces a fully energy-scalable video encoder with the P-R-D framework and presents our operational approach to P-R-D modeling. An analytical P-R-D model is then established. We also develop a scheme to estimate the P-R-D model parameter from video encoder statistics.

Chapter 4 studies the problem of energy saving using P-R-D optimization. Theoretically, we demonstrate that the energy consumption of a video encoder can be reduced if the scene activity of the input video is non-stationary. We also study how the complexity control parameters of the video encoder can be configured to achieve this energy saving.

Chapter 5 optimizes the P-R-D computational complexity scaling parameters with a discrete version of the Lagrangian optimization algorithm.

Chapter 6 discusses how the P-R-D optimal scalable encoder can be used in practical energy-aware embedded video system design. A joint hardware and video encoder adaptation technique is discussed.

Chapter 7 discusses energy-aware video encoding in practical system design and explains how the proposed technologies can be used in a practical setting.

Chapter 8 concludes the thesis. Future work is also discussed in this chapter.


Chapter 2

Background and Related Work

In this chapter, we review existing research work related to energy-aware video

coding and communication system design. We then justify the importance of this

work.

2.1 Video Coding Complexity

There are two types of portable video devices: encoders (e.g., video cell phones and wireless video cameras) and players (e.g., iPod video). In this research, we focus on energy minimization for portable video encoding devices. This is because, on portable video devices, the fraction of energy consumed by video encoding (typically 60-85%) is much higher than that of video decoding [31].

For video encoding, video data is voluminous and has to be compressed very efficiently. Otherwise, the amount of transmission energy or required storage space will be tremendous. During the past decades, many video compression algorithms and international standards, such as MPEG-2, H.263, MPEG-4, and H.264 [54,55], have been developed for efficient video compression. An efficient video compression system is often computationally intensive and energy-consuming, since it involves many sophisticated operations in spatiotemporal prediction, transform, quantization, mode selection, and entropy coding [66]. Recent studies [39, 63] and our experimental analysis show that, in typical scenarios of video communication over portable devices, video encoding consumes a significant portion (up to 40-60%) of the total energy. Standardized video encoding techniques such as H.263, H.264/AVC, and MPEG-1, 2, and 4 are based on hybrid video coding [31], as shown in Fig. 2.1.

Figure 2.1: Generalized block diagram of a hybrid video encoder

The input image is divided into macroblocks. Each macroblock consists of the three components Y, Cr and Cb. Y is the luminance component, which represents the brightness information. Cr and Cb represent the color information. A macroblock consists of one block of 16 by 16 picture elements for the luminance component and two blocks of 8 by 8 picture elements for the color components. The encoder compresses the video one macroblock at a time. In this view, the macroblock is the basic coding unit [5]. The behavior of the video encoder has been theoretically analyzed in [66]. Typical video encoders, including all the standard video encoding systems, such as MPEG-2 [1], H.263 [26], MPEG-4 [54], and H.264/AVC [31], employ a hybrid motion-compensated DCT encoding scheme. Specifically, as shown in Fig. 2.1, they have the following major encoding modules: motion estimation (ME) and motion compensation (COMP), DCT, quantization (QUANT), entropy encoding (ENC) of the quantized DCT coefficients, inverse quantization (DQUANT), inverse DCT (IDCT), picture reconstruction (RECON), and interpolation (INTERP) [54]. For ease of exposition, the DCT, IDCT, QUANT, DQUANT and RECON modules are collectively referred to as PRECODING. In this way, the video encoder has only three major modules: ME, PRECODING, and ENC. PRECODING can be considered as the data representation module. The run-time complexity of the major encoding modules is shown in Table 2.1, where the percentages of CPU occupancy for the major encoding modules are listed. It can be seen that ME is the most computation-intensive module, consuming about one-third of the processor cycles. The PRECODING modules collectively consume about 50% of the total processor cycles. The ENC module uses a relatively small amount of the total CPU time, especially at low coding bit rates.
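The macroblock layout described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the frame is taken to be in 4:2:0 format (chroma planes at half the luminance resolution in both directions), the frame size is QCIF (176 by 144), and the function name `split_into_macroblocks` is hypothetical. None of these details are taken from the encoder used in this thesis.

```python
def split_into_macroblocks(y, cb, cr):
    """Split a 4:2:0 frame into macroblocks (illustrative sketch).

    y is a list of rows of luminance samples; cb and cr are the chroma
    planes at half resolution in both directions. Width and height are
    assumed to be multiples of 16.
    """
    height, width = len(y), len(y[0])
    if height % 16 or width % 16:
        raise ValueError("frame dimensions must be multiples of 16")
    macroblocks = []
    for row in range(0, height, 16):
        for col in range(0, width, 16):
            macroblocks.append({
                # one 16x16 luminance block
                "Y": [r[col:col + 16] for r in y[row:row + 16]],
                # two 8x8 chroma blocks covering the same picture area
                "Cb": [r[col // 2:col // 2 + 8] for r in cb[row // 2:row // 2 + 8]],
                "Cr": [r[col // 2:col // 2 + 8] for r in cr[row // 2:row // 2 + 8]],
            })
    return macroblocks


# Assumed QCIF frame (176 x 144) filled with dummy sample values.
y = [[0] * 176 for _ in range(144)]
cb = [[128] * 88 for _ in range(72)]
cr = [[128] * 88 for _ in range(72)]
mbs = split_into_macroblocks(y, cb, cr)
print(len(mbs))  # 11 * 9 = 99 macroblocks
```

Each dictionary in the returned list corresponds to one basic coding unit: a 16 by 16 luminance block and two 8 by 8 chroma blocks.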

During the past decades, many algorithms, software and hardware techniques

have been developed to reduce the computational complexity of video encoding,

speed up the video encoder, and reduce its energy consumption. For example, a

statistical modeling approach is proposed in [21] to predict the zero DCT coefficients

after quantization. Based on the prediction, the DCT computation for those zero

coefficients can be saved. Fast and low-power motion estimation algorithms have

been developed to reduce the computational complexity of motion estimation [64].


Component   Akiyo    News     Carphone
ME          30.4%    32.6%    33.1%
COMP         9.1%     8.4%     8.7%
DCT         10.5%     9.2%     9.2%
QUANT        4.9%     4.6%     5.1%
ENC          4.7%     5.4%     3%
DQUANT       1.9%     1.5%     2.0%
IDCT         2.3%     2.9%     2.6%
RECON        7.5%     6.9%     7.2%
INTERP      14.3%    12.8%    13.2%
RC           7.4%     7.9%     7.6%
other        6.5%     7.3%     6.7%

Table 2.1: CPU Occupancy (In Percentage) of the Major Encoding Functions

Since there is no motion estimation for INTRA macroblocks (MBs), the INTRA ratio parameter, which is the fraction of INTRA MBs in the video frame, can be used to control the motion estimation complexity in the video encoder [50]. A parametric scheme for scalable motion estimation and DCT has been proposed in [60]. Hardware implementation technologies have also been developed to improve the video encoding speed [5, 47]. Recently, researchers at the University of Illinois at Urbana-Champaign and IBM have recognized the importance of power-aware computing for video data compression and are investigating software and hardware techniques to reduce the energy consumption of video compression [14]. However, little research has been done to establish a theoretical framework for modeling and minimizing the energy consumption of video compression for battery-powered communication devices.

2.2 Dynamic Scalable Voltage

Many algorithms developed in the literature are able to control or optimize the computational complexity of the video encoder. To translate complexity control and reduction into energy control and saving, we need to consider energy-scaling technologies in hardware design. To dynamically control the energy consumption of microprocessors on portable devices, a CMOS circuit design technology named dynamic voltage scaling (DVS) has recently been developed [29, 42]. In CMOS circuits, the power consumption P is given by

$$P = V^2 \cdot f_{CLK} \cdot C_{EFF}, \tag{2.1}$$

where $V$, $f_{CLK}$, and $C_{EFF}$ are the supply voltage, clock frequency, and effective switched capacitance of the circuits, respectively [51]. Since energy is power multiplied by time, and the time to finish an operation is inversely proportional to the clock frequency, the energy per operation $E_{op}$ is proportional to $V^2$ ($E_{op} \propto V^2$). This implies that lowering the supply voltage reduces the energy consumption of the system quadratically. However, lowering the supply voltage also decreases the maximum achievable clock speed. More specifically, it has been observed that $f_{CLK}$ is approximately linearly proportional to $V$ [65]. Therefore, the result is

$$P \propto f_{CLK}^3, \quad \text{and} \quad E_{op} \propto f_{CLK}^2. \tag{2.2}$$

It can be seen that the CPU can reduce its energy consumption substantially by running more slowly. However, it must not run so slowly that the real-time requirement of video coding is violated.

This is the key idea behind the DVS technology. Various chip makers, including AMD [22] and Intel [23], have recently announced and sold processors with this energy-scaling feature. In conventional system design with fixed supply voltage and clock frequency, clock cycles, and hence energy, are wasted when the CPU workload is light and the processor becomes idle. Reducing the supply voltage in conjunction with the clock frequency eliminates the idle cycles and saves a significant amount of energy.
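As a rough numerical illustration of Eq. (2.2), the following Python sketch computes how power and energy per operation scale when the clock is slowed to the lowest frequency that still meets a per-frame deadline. The function names and the workload figures (8 million cycles per frame, 25 frames per second, a 400 MHz nominal clock) are hypothetical values chosen for the example. The sketch also assumes the idealized cubic and quadratic scaling of Eq. (2.2) rather than measurements of a real processor.

```python
def dvs_scaling(freq_ratio):
    """Return (relative power, relative energy per operation) at a clock
    running freq_ratio times the nominal frequency, assuming the supply
    voltage scales linearly with frequency as in Eq. (2.2)."""
    if not 0.0 < freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be in (0, 1]")
    rel_power = freq_ratio ** 3          # P is proportional to f^3
    rel_energy_per_op = freq_ratio ** 2  # E_op is proportional to f^2
    return rel_power, rel_energy_per_op


def slowest_feasible_ratio(cycles_per_frame, frame_rate, nominal_hz):
    """Lowest frequency ratio that still finishes each frame on time."""
    required_hz = cycles_per_frame * frame_rate
    if required_hz > nominal_hz:
        raise ValueError("workload exceeds the nominal clock frequency")
    return required_hz / nominal_hz


# Hypothetical workload: 8M cycles per frame at 25 fps on a 400 MHz CPU.
ratio = slowest_feasible_ratio(8_000_000, 25, 400_000_000)
power, energy = dvs_scaling(ratio)
print(ratio, power, energy)  # 0.5 0.125 0.25
```

In this example the processor would otherwise be idle half of the time. Halving the clock frequency removes the idle cycles and, under the assumed model, reduces the energy per operation to one quarter.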

2.3 Joint Hardware and Application Adaptation

Achieving high QoS requirements with low energy consumption is challenging. Cross-layer adaptation provides an efficient way to address such issues. First, system resources are being designed with the ability to trade off performance for energy. For example, mobile processors on the market today (such as the Intel XScale PXA255 [24], Intel Pentium-M [25] and AMD Athlon [3]) can already change their speed and power at runtime using DVS. Second, multimedia applications can gracefully adapt to resource changes while maintaining acceptable service quality. That is, multimedia applications allow a tradeoff between output quality and resource demands. Finally, the operating system can also provide flexible resource management to support the tradeoff between QoS and resource demands or to balance the demands on different resources (e.g., CPU time and network bandwidth).

Researchers have proposed adaptation approaches to address the high QoS and low energy challenge in mobile devices. Adaptation can happen in different layers, from hardware to operating system to applications. The hardware adaptation dynamically reconfigures hardware resources such as the processor to save energy while providing the requested resource service and performance [6,10,34,45,46]. The operating system adaptation changes the policies of allocation and scheduling in response to application and resource variations [18, 20, 28, 44, 48, 61]. The application layer adaptation, possibly with the support of the operating system or middleware, changes the QoS parameters such as rate to trade off output quality for resource usage or to balance usage of different resources [8, 13, 17, 35, 57, 66].

The above adaptation approaches have been shown to be effective for both QoS

provisioning and energy saving. However, most of them adapt only a single layer or

two joint layers (e.g., the operating system and applications [9,43] or the operating

system and hardware [30, 41, 59]).

Figure 2.2: The framework of the GRACE

The Global Resource Adaptation through Cooperation (GRACE) project [49] aims to develop and demonstrate an integrated cross-layer adaptive system where hardware and all software layers cooperatively adapt to changing system resources and application demands, seeking to maximize user satisfaction while meeting energy, time, and bandwidth constraints. The centerpiece of GRACE is a cross-layer adaptation framework that enables coordination of the adaptations of the different system layers for the best QoS possible. Fig. 2.2 shows the framework of GRACE. The key characteristics of the coordinator are:

1. Individual application components adapt locally, without knowledge of the internals of other parts of the system.

2. Each software configuration is characterized by its cost (to represent its resource usage) and its utility (to represent user satisfaction). An application software component uses these metrics to drive its local adaptations (with the goal of minimizing cost for maximal utility).

3. A software configuration determines its cost using dynamic feedback from the hardware and possibly other application components (e.g., the network).

4. Since all resources are capable of adaptation, each resource offers multiple operating points and hence multiple possible costs (e.g., multiple combinations of execution time and energy) for a given software configuration.

5. The resource manager receives requests from multiple applications with multiple associated costs and utilities. The resource manager selects the software and hardware configurations that will maximize overall system utility and meet the system constraints. Thus, the resource manager is not concerned with how an optimal configuration is reached within a layer, but is simply a mediator for ensuring that the selected configurations maximize overall system performance within the given constraints.

6. Once the resource manager allocates a reservation, different system components are free to adapt locally without going through the resource manager, as long as they do not exceed the provided reservations.

GRACE benefits from the real-time and dynamic nature of the applications, which often results in some computational slack. It also makes it possible to trade off quality requirements against resource usage.
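The selection step described in item 5 can be sketched as a small constrained search. The following Python code is a minimal illustration and not the actual GRACE algorithm. It assumes each application offers a short list of configurations given as hypothetical (utility, energy, time) tuples, and it uses exhaustive search, which is only practical for a handful of applications.

```python
from itertools import product


def select_configurations(apps, energy_budget, time_budget):
    """Pick one configuration per application to maximize total utility.

    apps maps an application name to a list of (utility, energy, time)
    tuples. Returns (choice, utility), or (None, -inf) if no combination
    fits within both budgets. Illustrative exhaustive search only.
    """
    names = list(apps)
    best_choice, best_utility = None, float("-inf")
    for combo in product(*(apps[name] for name in names)):
        utility = sum(cfg[0] for cfg in combo)
        energy = sum(cfg[1] for cfg in combo)
        time = sum(cfg[2] for cfg in combo)
        feasible = energy <= energy_budget and time <= time_budget
        if feasible and utility > best_utility:
            best_choice, best_utility = dict(zip(names, combo)), utility
    return best_choice, best_utility


# Hypothetical configurations: (utility, energy cost, time cost).
apps = {
    "encoder": [(10, 8, 6), (7, 5, 4), (4, 2, 2)],
    "network": [(5, 4, 3), (3, 2, 1)],
}
choice, utility = select_configurations(apps, energy_budget=9, time_budget=8)
print(choice, utility)
```

With these made-up numbers, the highest-utility encoder configuration does not fit the energy budget together with any network configuration, so the search settles on the middle encoder configuration and the first network configuration, for a total utility of 12.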

2.4 Importance of This Work

Recently, many algorithms and software and hardware energy-minimization techniques have been developed, including low-complexity encoder design [16, 21, 33], low-power embedded video encoding [27, 32], adaptive power control [39, 60, 63], and joint encoder and hardware adaptation [13, 14, 56]. These algorithms focus on encoder complexity (and power consumption) reduction through heuristic adaptation or control instead of systematic energy optimization. This is because they lack an analytic model to characterize the optimum trade-off between energy saving and encoding performance [66]. In addition, even with existing energy saving technologies, the operational lifetime of portable video electronics is still very short, which has become one of the biggest impediments to future technological progress. Therefore, developing new energy optimization methods has become an urgent research task. In this work, we propose to develop a new P-R-D analysis framework to characterize the inherent relationship between energy consumption and encoder R-D performance. This will enable us to perform systematic energy minimization. More importantly, given a video encoder which has already been fully optimized using existing software and hardware energy optimization technologies, the P-R-D analysis framework will enable us to achieve significant additional energy saving.


Chapter 3

Energy-Scalable Video Encoder Design and Operational P-R-D Analysis

In this chapter, we introduce our scheme for energy-scalable video encoder design,

present our operational P-R-D analysis framework, introduce the analytical P-R-D

model, and explain how the model parameter can be estimated from video encoding

statistics.

3.1 An Operational Approach to P-R-D Analysis

In the original work of P-R-D analysis [66], we have introduced a set of complexity

control parameters into an MPEG-4 encoder, studied the R-D behavior of each

parameter, and obtained the P-R-D function for a simple MPEG-4 encoder. We

observe that this analytical approach is not easily extendable to other video encoders,


such as H.264 video coding [55], and the direct R-D analysis of complexity control

parameters becomes very difficult when the video encoding mechanism becomes

more sophisticated. In this work, we propose an operational approach for offline

P-R-D analysis and modeling which can be applied to generic video encoders.

The operational P-R-D modeling has the following three major steps. In the

first step, we group the encoding operations into several modules, such as motion

prediction, pre-coding (transform and quantization), mode decision, and entropy

coding, and then introduce a set of control parameters $\Gamma = [\gamma_1, \gamma_2, \cdots, \gamma_L]$ to control the power consumption of these modules. The encoder complexity $C$ is then a function of these control parameters, denoted by $C(\gamma_1, \gamma_2, \cdots, \gamma_L)$. Within the DVS (dynamic voltage scaling) design framework [40], the microprocessor power consumption, denoted by $P$, is a function of the computational complexity $C$ and therefore also a function of $\Gamma$, denoted by $P = \Phi(C) = P(\gamma_1, \gamma_2, \cdots, \gamma_L)$, where $\Phi(\cdot)$ is the power consumption model of the microprocessor. For example, according to our measurement, the power consumption model of the Intel PXA255 XScale processor used in our DeerCam system is depicted in Fig. 3.1 (solid line). It can be well approximated by the following expression

$$P = \Phi(C) = \beta \times C^{\gamma}, \quad \gamma = 2.5, \tag{3.1}$$

where β is a constant. In the second step, we execute the video encoder using dif-

ferent configurations of complexity control parameters and obtain the corresponding

R-D data, denoted by D(R; γ1, γ2, · · ·, γL). Note that this step is computationally

intensive and is intended for offline analysis to obtain the P-R-D model only. Once

the model is established, in Section 3.4, we will discuss how the model parameter


can be estimated online during video encoding.

Figure 3.1: Power consumption model with DVS. [Plot: computing energy (W) versus work load (M cycles); curves: actual and approximation.]

In the third step, we perform optimum configuration of the power control pa-

rameters to maximize the video quality (or minimize the video distortion) under the

power constraints. This optimization problem can be mathematically formulated as

follows:

$$\min_{\gamma_1, \gamma_2, \cdots, \gamma_L} D = D(R; \gamma_1, \gamma_2, \cdots, \gamma_L), \quad \text{s.t.} \quad P(\gamma_1, \gamma_2, \cdots, \gamma_L) \le P, \tag{3.2}$$

where $P$ is the available power consumption for video encoding. Given the R-D data set $D(R; \gamma_1, \gamma_2, \cdots, \gamma_L)$, the optimum solution, denoted by $D(R, P)$, describes the P-R-D behavior of the video encoder. The corresponding optimum complexity control parameters are denoted by $\gamma_i^*(R, P)$, $1 \le i \le L$.
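The third step can be illustrated with a minimal sketch, assuming the offline R-D data are available as a discrete set of configurations. The function names, the toy complexity and distortion functions, and the value of `BETA` are hypothetical placeholders, not measurements from this work; only the form of the power model follows (3.1).

```python
from itertools import product

BETA = 1.0    # hypothetical scale constant of the power model (3.1)
GAMMA = 2.5   # exponent of the power model (3.1)


def power(complexity):
    """Power model (3.1): P = beta * C**gamma."""
    return BETA * complexity ** GAMMA


def best_configuration(configs, complexity_of, distortion_of, power_budget):
    """Exhaustive form of (3.2): minimize D over configurations with P <= budget."""
    feasible = [c for c in configs if power(complexity_of(c)) <= power_budget]
    if not feasible:
        return None  # no configuration fits the power budget
    best = min(feasible, key=distortion_of)
    return best, distortion_of(best)


# Toy stand-ins for the offline measurements (illustrative assumptions only).
def toy_complexity(config):
    return 0.5 * (config[0] + config[1])


def toy_distortion(config):
    return 100.0 / (1.0 + 4.0 * config[0] + 2.0 * config[1])


levels = [0.25, 0.5, 0.75, 1.0]
configs = list(product(levels, repeat=2))
result = best_configuration(configs, toy_complexity, toy_distortion, 0.5)
```

Repeating this search for each rate and power budget of interest yields discrete samples of $D(R, P)$.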

In the following, we explain how to define the complexity control parameters,


design an energy-scalable video encoder, and obtain the P-R-D function using the

above operational P-R-D analysis approach.

3.2 Complexity Control Parameters

In this work, two complexity control parameters, $\nu_X$ and $\nu_Y$, are introduced into the video encoder. The complexity control parameter for the motion compensation module is the number of SAD (sum of absolute difference) computations per frame, denoted by $\nu_X$. The parameter $\nu_Y$ is the complexity control parameter of the pre-coding module, which is the number of non-zero MB's in the video frame. Here, “non-zero” means the MB has non-zero DCT coefficients after quantization.

3.2.1 The Complexity Control Parameter νX

The complexity control parameter for the motion prediction and compensation mod-

ule is νX . This is based on the observation that the ME process is simply a sequence

of SAD (sum of absolute difference) computations to find the MB position of the

minimum SAD. Therefore, the computational complexity of ME, denoted by $C_{ME}$, is simply given by

$$C_{ME} = \nu_X \cdot C_{SAD}, \tag{3.3}$$

where $C_{SAD}$ is the computational complexity of one MB SAD computation. The existing video encoder uses a block-based motion prediction scheme. The objective of motion estimation is to find the best match in the reference frame for every MB in the current frame. The search for the SAD-optimal motion vector can be formulated as

$$(x_0, y_0) = \arg\min_{(x, y)} SAD(x, y), \tag{3.4}$$


where SAD(x, y) represents the sum of absolute difference (SAD) between the cur-

rent MB and the reference MB at a relative position of (x, y). We can see that the

ME process is simply a sequence of SAD computations to find the motion vector

which has the minimum SAD. It is assumed that the computational complexity of

each MB SAD is a constant. Therefore, the overall computational complexity of the

ME module is linearly proportional to the number of SAD computations $\nu_X$. At the frame level, the $\nu_X$ SAD computations are allocated among the MB's in the video frame to optimize the picture quality.

Dynamic allocation of SAD computations is used to make motion estimation complexity-scalable. It is well known that moving objects in the video scene contribute most to the overall visual quality. This suggests that in motion estimation under energy constraints, we need to allocate the available $\nu_X$ SAD computations among the MB's according to their motion characteristics to optimize the overall picture quality. Let $(mv_x, mv_y)$ be the motion vector of the MB. The block motion activity (BMA) factor of the MB, denoted by $m_a$, is defined as

$$m_a = |mv_x| + |mv_y|. \tag{3.5}$$

At the frame level, we introduce a motion history matrix (MHM), denoted by $M = [m_{ij}]_{M_R \times M_C}$, where $M_R$ and $M_C$ are the numbers of MB's per row and per column, respectively. Initially, we set $m_{ij} = 1$. After a frame is coded, each entry is updated as follows:

$$m_{ij} = \begin{cases} m_{ij} + 1 & \text{if } m_a = 0; \\ 0 & \text{else.} \end{cases} \tag{3.6}$$

Here, $m_a$ is the BMA factor of the $(i, j)$-th MB in the coded frame. The larger the value of $m_{ij}$, the higher the probability that this MB is a static block, and the fewer SAD computations need to be allocated to it. Note that each entry of the MHM is linearly scaled and represented by the gray level of a MB, ranging from 0 to 255. We can see that the MHM captures not only the motion history but also the locations of the object motion. Most importantly, this MHM approach has very low computation overhead and is very cost-effective in practice. Using the MHM, we can allocate the $\nu_X$ SAD computations among the MB's. The number of SAD computations allocated to the $(i, j)$-th MB, denoted by $n^{sad}_{ij}$, is determined by

$$n^{sad}_{ij} = \frac{1}{N - 1} \left[ 1 - \frac{m_{ij}}{\sum_{(k,l) \ge (i,j)} m_{kl}} \right] \cdot N_{sad}, \tag{3.7}$$

where $N$ is the number of MB's left so far that need to perform the motion estimation, and $N_{sad}$ is the available number of SAD computations. Initially, $N_{sad}$ is set to be $\nu_X$. Suppose the motion search range is $SR$. If $n^{sad}_{ij} \ge (2 \cdot SR + 1)^2$, the computational budget is enough to perform a full search for this block. Otherwise, the diamond motion search algorithm in [4] is used to find the motion vector, whose complexity, indicated by the number of search layers, is controlled by $n^{sad}_{ij}$.
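The MHM update (3.6) and the allocation rule (3.7) can be sketched as follows. This is a minimal illustration, not the encoder implementation: the function names are hypothetical, and three choices are assumptions not stated above. The allocation is capped at the full-search cost, the last remaining MB receives the whole remaining budget (where (3.7) would divide by zero), and the budget is split equally when all remaining MHM entries are zero.

```python
def update_mhm(mhm, bma):
    """Eq. (3.6): increment the entry of a static MB, reset it to 0 otherwise."""
    return [[m + 1 if a == 0 else 0 for m, a in zip(m_row, a_row)]
            for m_row, a_row in zip(mhm, bma)]


def allocate_sad(mhm, nu_x, search_range):
    """Apply Eq. (3.7) to the MBs in raster order, updating the remaining budget."""
    flat = [m for row in mhm for m in row]
    full_search = (2 * search_range + 1) ** 2
    remaining = nu_x
    plan = []
    for idx, m in enumerate(flat):
        n_left = len(flat) - idx   # N: MBs still waiting for motion estimation
        tail = sum(flat[idx:])     # sum of m_kl over the remaining MBs
        if n_left == 1:
            share = remaining                  # assumption: last MB takes the rest
        elif tail == 0:
            share = remaining / n_left         # assumption: equal split
        else:
            share = (1 - m / tail) * remaining / (n_left - 1)
        n_sad = min(int(share), full_search)   # assumption: cap at full search
        plan.append(n_sad)
        remaining -= n_sad
    return plan


# A 2x2 frame in which only the top-right MB moved in the last coded frame.
mhm = update_mhm([[1, 1], [1, 1]], [[0, 3], [0, 0]])
plan = allocate_sad(mhm, nu_x=600, search_range=7)
```

In this toy frame, the moving MB has the smallest MHM entry and therefore receives the largest share of the SAD budget.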

3.2.2 The Complexity Control Parameter νY

By analyzing the encoding architecture of the video encoding system, we find that it is possible to control the computational complexity of all the pre-coding modules using one single parameter $\nu_Y$, which is the number of non-zero MB's in the video frame. Here, “non-zero” means the MB has non-zero DCT coefficients after quantization. Let $C_{NZMB}$ and $C_{PRE}$ be the pre-coding computational complexity of one non-zero MB (NZMB) and the whole video frame, respectively. Then

$$C_{PRE} = \nu_Y \cdot C_{NZMB}. \tag{3.8}$$

The parameter $\nu_Y$ thus collectively controls the computational complexity of the pre-coding modules, namely, the DCT, QUANT, DQUANT, IDCT, and RECON modules.

In typical video encoding, as illustrated in Fig. 2.1, DCT is applied to

the difference MB after motion estimation and compensation, or the original MB

if its coding mode is INTRA. After the DCT coefficients are quantized, DQUANT,

IDCT, and RECON are performed to reconstruct the MB for motion prediction of

the next frame. In transform coding of videos, especially at low coding bit rates,

the DCT coefficients in the MB might become all zeros after quantization. We

refer to this MB as an all-zero MB (AZMB). Otherwise, it is called a non-zero MB

(NZMB). In international standards for video encoding, such as MPEG-2, H.263, and MPEG-4, “non-zero” also means that the CBP (coded block pattern) value of the MB is non-zero. If we can predict an MB to be an AZMB, all the above pre-coding

operations can be skipped, because the output of DQUANT and IDCT of an AZMB

is still an AZMB, and the reconstructed MB is exactly the reference MB used in

motion estimation and compensation. Therefore, the encoder can simply copy over

the reference MB to reconstruct the current MB. This is a unique property of the

AZMB, which can be used to reduce the computational complexity of the video

encoder [7].

The unique property of the AZMB is used to design a complexity scalability scheme for the pre-coding modules. Let $\{x_{nk} \mid 0 \le n, k \le 7\}$ be the coefficients in the difference MB after motion estimation. For INTRA MB's, $x_{nk}$ are the original pixels in the video frame. Let $\{y_{ij} \mid 0 \le i, j \le 7\}$ be the DCT coefficients. According to the definition of DCT, we have

$$y_{ij} = \frac{1}{4} C_i C_j \sum_{n=0}^{7} \sum_{k=0}^{7} x_{nk} \cos\left( i\pi \frac{2n + 1}{16} \right) \cos\left( j\pi \frac{2k + 1}{16} \right), \tag{3.9}$$

where

$$C_i = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } i = 0, \\ 1 & \text{else,} \end{cases} \qquad C_j = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } j = 0, \\ 1 & \text{else.} \end{cases} \tag{3.10}$$

We can see that

$$|y_{ij}|^2 \le \sum_{n=0}^{7} \sum_{k=0}^{7} |x_{nk}|^2. \tag{3.11}$$

Note that the right-hand side is the SSD of the difference MB, which is already computed during the motion estimation. This suggests that the SSD could be an efficient and low-cost measure to predict the AZMB. After motion estimation and compensation, let $\{SSD_i \mid 1 \le i \le M\}$ be the SSD values of the $M$ MB's in the video frame sorted in ascending order. In the proposed complexity scalability scheme for pre-coding, we force the first $M - \nu_Y$ MB's to be AZMB's, and treat the remaining $\nu_Y$ MB's as NZMB's to which the pre-coding operations are applied.
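The selection rule above can be written as a short sketch. The function name and the example SSD values are hypothetical; the only logic taken from the text is that the $M - \nu_Y$ MB's with the smallest SSD are forced to be AZMB's.

```python
def select_nonzero_mbs(ssd_values, nu_y):
    """Return the indices of the MBs that receive pre-coding (the NZMBs).

    The M - nu_y MBs with the smallest SSD are forced to be all-zero MBs
    and are skipped; the remaining nu_y MBs are pre-coded.
    """
    order = sorted(range(len(ssd_values)), key=lambda i: ssd_values[i])
    n_skip = max(len(ssd_values) - nu_y, 0)
    return set(order[n_skip:])


# Six MBs with illustrative SSD values; only two may be pre-coded.
ssd = [50, 5, 900, 120, 0, 310]
nzmb = select_nonzero_mbs(ssd, nu_y=2)
```

With these values, the two MB's with the largest SSD (indices 2 and 5) are pre-coded, and the other four are reconstructed by copying the reference MB.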

3.3 P-R-D Modeling for Energy-Aware Video Encoding

As discussed above, we start from a conventional video encoder, denoted by $\Pi_A$, and introduce a set of complexity control parameters to control its computational complexity and, in turn, its power consumption. We refer to this complexity-scalable or power-scalable video encoder as the C-R-D encoder, denoted by $\Pi_B$.

It should be noted that introducing the control parameters $\Gamma$ into the existing video encoder $\Pi_A$ to obtain $\Pi_B$ will only scale down its computational complexity and power consumption. Let $P^A$ and $P$ be the power consumption of encoders $\Pi_A$ and $\Pi_B$, respectively. Obviously, $0 \le P \le P^A$. Therefore, we can normalize the power consumption $P$ of the C-R-D encoder by $P^A$. After this normalization, the value of $P$ is between 0 and 1. Fig. 3.2(a) shows the P-R-D curve $D(R; P)$ with normalized power $P$. Fig. 3.2(b) shows the D-P curves at different bit rates $R$.

The specific procedure to insert the complexity control parameters into the en-

coder may vary from one encoder to another. For example, in an MPEG-4 video

encoder, we can use the number of SAD (sum of absolute difference) computations

to control the complexity of its motion estimation module, and use the number of

non-zero blocks to control the complexity of DCT and quantization modules. Similar

procedures can be applied to MPEG-2 or H.263 encoders. For H.264 video encoders,

we can use the number of SAD computations and reference frames, and the number

of coding modes to control its complexity. It is not important how the complexity

control parameters Γ = [ν1, ν2, . . . , νL] are inserted into the video encoding modules.

Our C-R-D video encoder design only requires that:

• Compatible. The existing non-energy-scalable video encoder $\Pi_A$ is a special case of the P-R-D encoder $\Pi_B$ when $\nu_i = 1$ or $P = 1$. From the R-D analysis perspective,

$$D(R, P)|_{P=1} = D(R), \tag{3.12}$$

where $D(R, P)$ is the P-R-D function of encoder $\Pi_B$, and $D(R)$ is the R-D function of encoder $\Pi_A$. In other words, when the power supply is sufficient and full encoding power is used, the energy-scalable P-R-D video encoder $\Pi_B$ becomes the traditional video encoder $\Pi_A$.

• Scalable. When we reduce the values of the complexity control parameters, the power consumption $P$ of the P-R-D encoder decreases.

Figure 3.2: (a) The P-R-D curve; (b) The D-P curves at different bit rates.

3.4 Analytical P-R-D Models

In order to obtain an analytical P-R-D model for energy-aware video encoding, we perform the operational P-R-D analysis procedure over a wide range of test video sequences and determine the common characteristics of their P-R-D functions. The test video sequences used in this work include the Akiyo, News, Salesman, Carphone, Foreman, Coastguard, and Football QCIF (176×144) video sequences. The C-R-D curves and the D-P curves for these test video sequences at different bit rates are shown in Fig. 3.3 to Fig. 3.9.

It should be noted that these C-R-D curves at different complexity control

schemes share a similar pattern: as the bit rate and power consumption increase,

the distortion decreases exponentially. More specifically, we can draw the following

observations about the C-R-D functions:

1. When P = 0 and R = 0, the coding distortion should be the variance of the input video. This is because the encoder does not have any bit or energy resources to compress the video data, and the decoder has no choice but to display blank pictures. In this case, the coding distortion, measured by the MSE (mean squared error) between the input video and the reconstructed one, is the variance of the input video.

Figure 3.3: Akiyo (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.4: News (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.5: Salesman (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.6: Car (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.7: Foreman (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.8: Coastguard (a) The P-R-D curve; (b) The D-P curves at different bit rates.

Figure 3.9: Football (a) The P-R-D curve; (b) The D-P curves at different bit rates.

[In each figure, panel (a) plots the distortion D (×10^6) against rate and normalized power, and panel (b) plots D (×10^6) against normalized power.]

2. As we can see from Fig.3.2, the relationship between the distortion D and

power consumption P is approximately exponential.

3. As suggested by the classical R-D models, the relationship between the coding

bit rate R and distortion D is also exponential.

Based on the above observations, we propose the following analytic expression for the P-R-D model:

$$D(R, P) = \sigma^2 2^{-\lambda R \cdot g(P)}, \tag{3.13}$$

where $\sigma^2$ represents the picture variance of the video segment after motion compensation and $\lambda$ represents the resource utilization efficiency of the video encoder. $g(P)$ is the normalized power consumption model of the microprocessor with $g(0) = 0$ and $g(1) = 1$. In general, $g(P)$ is a monotonically increasing function. The analysis in [6] suggests that

$$g(P) = P^{\frac{1}{\gamma}}. \tag{3.14}$$

Model (3.13) describes the relationship among $P$, $R$, and $D$. It is convenient to analyze the power consumption behavior of the video encoder with this model.
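The model in (3.13) and (3.14) can be evaluated directly, and it can also be inverted to find the power required for a target distortion at a given rate. The sketch below is a minimal illustration; the function names and the numerical values of $\sigma^2$ and $\lambda$ are hypothetical, and $\gamma = 2.5$ is taken from (3.1).

```python
import math


def prd_distortion(rate, power, sigma2, lam, gamma=2.5):
    """P-R-D model (3.13) with g(P) = P**(1/gamma) from (3.14)."""
    g = power ** (1.0 / gamma)
    return sigma2 * 2.0 ** (-lam * rate * g)


def required_power(rate, distortion, sigma2, lam, gamma=2.5):
    """Invert (3.13): normalized power needed to reach `distortion` at `rate`."""
    return (math.log2(sigma2 / distortion) / (lam * rate)) ** gamma


# Illustrative values only: sigma^2 = 100, lambda = 0.05.
d_full = prd_distortion(rate=10.0, power=1.0, sigma2=100.0, lam=0.05)
d_half = prd_distortion(rate=10.0, power=0.5, sigma2=100.0, lam=0.05)
p_back = required_power(rate=10.0, distortion=d_half, sigma2=100.0, lam=0.05)
```

At $R = 0$ or $P = 0$ the model returns $\sigma^2$, in agreement with the first observation above, and the distortion at half power is larger than at full power for the same rate.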


Chapter 4

Energy Saving Analysis

In this chapter, we will study how the energy consumption of the video encoder

can be minimized using P-R-D optimization. We will then present a training-

classification approach to configure the complexity control parameters of the video

encoder so as to achieve the energy saving.

4.1 Theoretical Analysis

Let $D_A(R)$ be the R-D function of encoder $\Pi_A$, and $D_B(R, P)$ the P-R-D function of encoder $\Pi_B$. As explained in Chapter 3, we have

$$D_B(R, P)|_{P=1} = D_A(R), \quad \text{and} \quad D_B(R, P) \ge D_A(R), \tag{4.1}$$

where $0 \le P \le 1$. Let $S$ be the video sequence to be encoded. The time duration of $S$ is denoted by $T$, which corresponds to the operational lifetime of the device. We partition $S$ into a number of segments, denoted by $\{S_n \mid 1 \le n \le N\}$. Suppose the picture variance and encoder efficiency in the P-R-D model (3.13) for each video segment are $\sigma_n^2$ and $\lambda_n$, respectively. Let $R_n$ and $P_n$ be the number of bits and power used by video encoder $\Pi_B$ to compress $S_n$. Let $R_T$ be the total number of bits that can be generated by the video encoder. We assume that the video sequence is encoded at a constant quality $D$. In general, this assumption is reasonable because most applications require that the videos are encoded at a target quality. Rewriting model (3.13) for video segment $S_n$, we have

$$P_n = \frac{a_n}{R_n^{\gamma}}, \tag{4.2}$$

where

$$a_n = \left( \frac{1}{\lambda_n} \log_2 \frac{\sigma_n^2}{D} \right)^{\gamma} \tag{4.3}$$

is called the scene activity level. We can see that within the P-R-D analysis framework, the encoder power consumption $P_n$ depends on the encoding bit rate $R_n$. Our objective is to minimize the total power consumption of encoder $\Pi_B$ in compressing all video segments. The energy minimization problem can be formulated as follows:

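$$\min_{R_n} P = \sum_{n=1}^{N} P_n = \sum_{n=1}^{N} \frac{a_n}{R_n^{\gamma}}, \quad \text{s.t.} \quad \sum_{n=1}^{N} R_n = R_T. \tag{4.4}$$

Lemma 1. The solution to the minimization problem is given by

$$R_n = \frac{(\gamma a_n)^{\frac{1}{\gamma+1}}}{\sum_{i=1}^{N} (\gamma a_i)^{\frac{1}{\gamma+1}}} \cdot R_T, \tag{4.5}$$

and the minimum power is given by

$$P = \frac{\left( \sum_{n=1}^{N} a_n^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{R_T^{\gamma}}. \tag{4.6}$$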


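Proof. We solve the energy minimization problem using a Lagrange multiplier approach. Let

$$J = \sum_{n=1}^{N} \frac{a_n}{R_n^{\gamma}} + \beta \left( \sum_{n=1}^{N} R_n - R_T \right), \tag{4.7}$$

where $\beta$ is the Lagrange multiplier. For the minimum solution, the following condition holds:

$$\frac{\partial J}{\partial R_n} = -\gamma \frac{a_n}{R_n^{\gamma+1}} + \beta = 0. \tag{4.8}$$

That is,

$$R_n = \left( \frac{\gamma \cdot a_n}{\beta} \right)^{\frac{1}{\gamma+1}}. \tag{4.9}$$

Since $\sum_{n=1}^{N} R_n = R_T$, we have

$$\beta^{\frac{1}{\gamma+1}} = \frac{\sum_{n=1}^{N} (\gamma \cdot a_n)^{\frac{1}{\gamma+1}}}{R_T}. \tag{4.10}$$

From equation (4.9), we have

$$R_n = \frac{(\gamma \cdot a_n)^{\frac{1}{\gamma+1}}}{\sum_{i=1}^{N} (\gamma \cdot a_i)^{\frac{1}{\gamma+1}}} \cdot R_T, \tag{4.11}$$

which is the optimum bit allocation. The minimum power consumption is then given by

$$\begin{aligned}
P &= \sum_{n=1}^{N} P_n = \sum_{n=1}^{N} \frac{a_n}{R_n^{\gamma}} \\
&= \frac{1}{R_T^{\gamma}} \sum_{n=1}^{N} a_n \frac{\left( \sum_{i=1}^{N} (\gamma \cdot a_i)^{\frac{1}{\gamma+1}} \right)^{\gamma}}{(\gamma \cdot a_n)^{\frac{\gamma}{\gamma+1}}} \\
&= \frac{1}{R_T^{\gamma}} \left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma} \sum_{n=1}^{N} \frac{a_n \gamma^{\frac{\gamma}{\gamma+1}}}{(\gamma \cdot a_n)^{\frac{\gamma}{\gamma+1}}} \\
&= \frac{1}{R_T^{\gamma}} \left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma} \sum_{n=1}^{N} a_n^{\frac{1}{\gamma+1}} \\
&= \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{R_T^{\gamma}},
\end{aligned} \tag{4.12}$$

which proves Lemma 1.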

Lemma 2. The P-R-D video encoder $\Pi_B$ consumes no more energy than the standard non-energy-scalable video encoder $\Pi_A$, and the energy saving ratio $\Lambda_e$ is

$$\Lambda_e = \frac{P_T^B}{P_T^A} = \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\gamma} \cdot N} \le 1, \tag{4.13}$$

where $P_T^A$ and $P_T^B$ are the power consumption of encoders $\Pi_A$ and $\Pi_B$, respectively. The equality holds if the scene activity level $a_i$ for each video segment is constant (stationary).

Proof. In the P-R-D video encoder design, encoder $\Pi_A$ is a special case of the P-R-D video encoder $\Pi_B$ when full encoder power is used. In other words, the power consumption of encoder $\Pi_A$ on video segment $S_i$, denoted by $P_i^A$, is equal to 1. Let $D$ be the target video encoding quality. According to equation (4.2), the corresponding number of encoding bits of $\Pi_A$, denoted by $R_i^A$, is given by $R_i^A = a_i^{\frac{1}{\gamma}}$. Therefore, the total number of bits generated by encoder $\Pi_A$ is

$$R_T = \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}}, \tag{4.14}$$

and the total power consumption of encoder $\Pi_A$ is given by

$$P_T^A = \sum_{i=1}^{N} P_i^A = \sum_{i=1}^{N} 1 = N. \tag{4.15}$$

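For the same number of encoding bits $R_T$, according to equations (4.6) and (4.14), the minimum power consumption of the P-R-D encoder $\Pi_B$ is

$$P_T^B = \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{R_T^{\gamma}} = \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\gamma}}. \tag{4.16}$$

The energy saving ratio of the P-R-D video encoder $\Pi_B$ over the existing video encoder $\Pi_A$ is given by

$$\Lambda_e = \frac{P_T^B}{P_T^A} = \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\gamma} \cdot N}. \tag{4.17}$$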

To prove that $\Lambda_e \le 1$, we use Hölder's inequality. Given two $N$-dimensional vectors $(x_1, x_2, \ldots, x_N)$ and $(y_1, y_2, \ldots, y_N)$, according to Hölder's inequality, we have

$$\sum_{i=1}^{N} |x_i \cdot y_i| \le \left( \sum_{i=1}^{N} |x_i|^p \right)^{\frac{1}{p}} \left( \sum_{i=1}^{N} |y_i|^q \right)^{\frac{1}{q}}, \tag{4.18}$$

where $p, q > 1$ and

$$\frac{1}{p} + \frac{1}{q} = 1. \tag{4.19}$$

The equality holds when

$$(x_1, x_2, \cdots, x_N) = \alpha \cdot (y_1, y_2, \cdots, y_N), \tag{4.20}$$

where $\alpha$ is a constant. We let

$$x_i = a_i^{\frac{1}{\gamma+1}}, \quad y_i = 1, \tag{4.21}$$

$$p = \frac{\gamma + 1}{\gamma}, \quad q = \gamma + 1. \tag{4.22}$$

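Using Hölder's inequality, we have

$$\sum_{i=1}^{N} \left| a_i^{\frac{1}{\gamma+1}} \cdot 1 \right| \le \left( \sum_{i=1}^{N} \left( a_i^{\frac{1}{\gamma+1}} \right)^{\frac{\gamma+1}{\gamma}} \right)^{\frac{\gamma}{\gamma+1}} \cdot \left( \sum_{i=1}^{N} (1)^{\gamma+1} \right)^{\frac{1}{\gamma+1}} = \left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\frac{\gamma}{\gamma+1}} \cdot N^{\frac{1}{\gamma+1}}, \tag{4.23}$$

that is,

$$\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1} \le \left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\gamma} \cdot N. \tag{4.24}$$

Therefore

$$\Lambda_e = \frac{P_T^B}{P_T^A} = \frac{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma+1}} \right)^{\gamma+1}}{\left( \sum_{i=1}^{N} a_i^{\frac{1}{\gamma}} \right)^{\gamma} \cdot N} \le 1. \tag{4.25}$$

According to equation (4.20), the equality holds when

$$(a_1, a_2, \cdots, a_N) = \alpha (1, 1, \cdots, 1), \tag{4.26}$$

which implies that the scene activity level $a_i$ of each video segment is constant. This proves Lemma 2.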
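Lemma 1 and Lemma 2 can be checked numerically with a short sketch. The function names and the activity levels used in the example are hypothetical; the formulas are those of (4.5), (4.2), and (4.13), with $\gamma = 2.5$ taken from (3.1).

```python
def optimal_rates(a, total_rate, gamma=2.5):
    """Optimum bit allocation of Lemma 1, Eq. (4.5)."""
    weights = [(gamma * a_n) ** (1.0 / (gamma + 1.0)) for a_n in a]
    weight_sum = sum(weights)
    return [total_rate * w / weight_sum for w in weights]


def total_power(a, rates, gamma=2.5):
    """Total power sum_n a_n / R_n**gamma, using Eq. (4.2) per segment."""
    return sum(a_n / r ** gamma for a_n, r in zip(a, rates))


def energy_saving_ratio(a, gamma=2.5):
    """Energy saving ratio of Lemma 2, Eq. (4.13)."""
    num = sum(a_n ** (1.0 / (gamma + 1.0)) for a_n in a) ** (gamma + 1.0)
    den = sum(a_n ** (1.0 / gamma) for a_n in a) ** gamma * len(a)
    return num / den


GAMMA = 2.5
activity = [1.0, 8.0, 27.0]                       # illustrative a_n values
baseline = [a_n ** (1.0 / GAMMA) for a_n in activity]   # rates of encoder A
r_total = sum(baseline)                           # Eq. (4.14)
rates = optimal_rates(activity, r_total, GAMMA)
p_a = total_power(activity, baseline, GAMMA)      # equals N, Eq. (4.15)
p_b = total_power(activity, rates, GAMMA)         # Eq. (4.16)
ratio = energy_saving_ratio(activity, GAMMA)
```

For non-constant activity levels such as these, the ratio `p_b / p_a` agrees with (4.13) and is strictly below 1; for constant activity levels the ratio equals 1.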

From Lemma 2, we can see that incorporating another dimension, the power consumption $P$, into the traditional R-D analysis gives us another degree of freedom in resource allocation and performance optimization. Accordingly, the P-R-D video encoder is able to save energy by intelligently allocating its bit and energy resources. Furthermore, the result from Lemma 2 shows that energy saving is possible if the scene activity of the input video is non-stationary, i.e., $a_i$ is time-varying. In practice, a typical video application system will be able to operate for hours. Within this long operational time period, the video content captured by the system often exhibits a large variation in its scene activity levels. In this case, the P-R-D video encoder is able to save energy significantly.

4.2 Training-Classification Approach to P-R-D Video Encoder Control

The activity parameter $a_i$ is very important. It includes the resource utilization efficiency parameter $\lambda$ of the P-R-D video encoder, the picture variance $\sigma^2$, which represents the amount of scene activity, and the target video quality $D$ set by the application requirement. In real-time video encoding, the picture variance $\sigma^2$ can be obtained directly from the video encoder. Therefore, we only need to estimate $\lambda$. The P-R-D model is rewritten as:

$$D(R, P) = \sigma^2 2^{-\lambda R \cdot C^{\gamma}}, \quad 1 \le \gamma \le 3. \tag{4.27}$$

Considering $P \propto f_{CLK}^3$ and $C \propto f_{CLK}$, it can be observed that $C \propto P^{\frac{1}{3}}$. Hence, from the P-R-D model

$$D(R, P) = \sigma^2 2^{-\lambda R \cdot P^{\frac{1}{\gamma}}}, \quad 1 \le \gamma \le 3, \tag{4.28}$$

the C-R-D model (4.27) is obtained.

With the C-R-D model, $\lambda$ can be estimated. In Chapter 3, we presented an operational approach to obtain the P-R-D curve. Based on these P-R-D curves, we can determine the value of $\lambda$ using statistical fitting. This approach is computationally intensive and only suitable for offline C-R-D analysis. In real-time video compression, it is desirable to develop a low-complexity scheme which is able to estimate $\lambda$ from statistics of current or previous video frames, which are directly available from the video encoder. We define the following encoder statistic $\lambda_F$:

$$\lambda_F = \frac{1}{R \cdot C_F^{\gamma}} \log_2 \frac{\sigma_i^2}{D_i}, \tag{4.29}$$

where $C_F = C(\nu_1 = 1, \nu_2 = 1, \ldots, \nu_L = 1)$ represents the encoder complexity when no complexity control is applied. Fig. 4.1 shows that the P-R-D model parameter $\lambda$ is highly correlated with the existing R-D model parameter $\lambda_F$. The video scene activity can therefore be identified with the parameter $\lambda_F$, and computing $\lambda_F$ does not add further complexity.

In this work, we assume that the scene activity parameter does not change significantly within the video segment. In practice, with proper scene change detection, a long non-stationary video input can often be partitioned into multiple video segments with relatively stationary scene activities. Fig. 4.2 illustrates the scene activity estimation parameter $\lambda_F$ in the standard test video sequences. It shows that the time-varying video sequence can be partitioned into video segments, each with a relatively stationary scene activity level.

Figure 4.1: The relationship of λ and λF. [Plot: λ versus λF; data points with a linear fit.]

Level   λF       Range
1       0.0472   0.045
2       0.0331   0.045-0.030
3       0.0246   0.030-0.022
4       0.0132   0.022-0.012
5       0.0097   0.012-0.009
6       0.0070   0.009-0.006
7       0.0048   0.006

Table 4.1: The point value for categorizing Video Segment
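The statistic in (4.29) and the classification by Table 4.1 can be sketched as follows. The function names are hypothetical. The treatment of the range boundaries is an assumption: a value on a boundary is assigned to the lower-numbered level, level 1 covers everything at or above 0.045, and level 7 covers everything below 0.006, since the table does not state these details.

```python
import math

# Lower range boundaries of levels 1..6 from Table 4.1; level 7 is the rest.
LEVEL_LOWER_BOUNDS = [0.045, 0.030, 0.022, 0.012, 0.009, 0.006]


def lambda_f(rate, full_complexity, sigma2, distortion, gamma):
    """Encoder statistic of Eq. (4.29), computed with no complexity control."""
    return math.log2(sigma2 / distortion) / (rate * full_complexity ** gamma)


def activity_level(lam_f):
    """Map lambda_F to a level 1..7 (boundary handling is an assumption)."""
    for level, lower in enumerate(LEVEL_LOWER_BOUNDS, start=1):
        if lam_f >= lower:
            return level
    return 7


# The representative lambda_F values of Table 4.1 fall into their own levels.
table_values = [0.0472, 0.0331, 0.0246, 0.0132, 0.0097, 0.0070, 0.0048]
levels = [activity_level(v) for v in table_values]
```

Because both functions use only quantities already produced by the encoder, the classification adds a single logarithm and a few comparisons per video segment.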

We implement and test the proposed P-R-D analysis procedure over an input video sequence which has 3590 frames and is partitioned into 57 segments. The results are shown in Fig. 4.5. In our experiments, we classify the input video sequence into seven clusters, each of which shares similar P-R-D behaviors. The values of the classification feature $\lambda_F$ are listed in Table 4.1. The result shows that the P-R-D model parameters of different video segments exhibit a strong clustering behavior. Based on this property, we will develop a training-classification approach to P-R-D video encoder control in the next chapter.

Figure 4.2: The common video scene estimation parameter λF. [Plot "The activity of a video sequence": model parameter λF versus frame number, frames 0 to 3500.]

Figure 4.3: A part of the common video scene estimation parameter λF (1). [Plot "The activity of a video sequence": model parameter λF versus frame number, frames 600 to 900.]

Figure 4.4: A part of the common video scene estimation parameter λF (2). [Plot "The activity of a video sequence": model parameter λF versus frame number, frames 1500 to 2000.]

Figure 4.5: The distribution of scene activity parameters of all video segments. [Histogram "The frequency of the activity level": frequency versus activity level, levels 1 to 7.]

Chapter 5

Power-Rate-Distortion Control for Energy-Aware Video Encoding

In this chapter, we study how the constrained optimization for P-R-D analysis in

(3.2) can be solved and how the complexity control parameters of video encoders

can be configured during real-time video encoding to achieve the optimized P-R-D

performance.

5.1 A Training-Classification Approach to P-R-D

Video Encoder Control

In Chapter 3, we introduce two complexity control parameters, νX and νY , to con-

trol the motion prediction and pre-coding modules of the video encoder so as to

establish a complexity-scalable or equivalently energy-scalable video encoder. The

encoding bit rate R, video distortion D, and power consumption P are all functions

of these two complexity control parameters. In energy-aware video encoding, these


two parameters should be optimally selected so as to minimize the total power con-

sumption under rate and distortion constraints or minimize the total video coding

distortion under rate and power constraints, as shown in (3.2).

As discussed in Chapter 4, the input video segments exhibit a strong cluster-

ing behavior in their P-R-D model parameters. Based on this observation, we have

propose a training-classification approach to P-R-D control and optimization during

real-time video encoding. First, in the training stage, we cluster the training video

segments according to their scene activity parameter. During the training stage, we

collect a set of training video segments with a wide range of scene activities. We

partition them into a number of clusters according to their scene activity param-

eter λF . According to our simulation experience, 5 to 7 clusters will be sufficient.

For each cluster of video segments, we find their average P-R-D function and opti-

mum encoder complexity control parameters by solving the constrained optimization

problem. In this chapter, we use Lagrangian optimization to obtain the optimum

control parameters νX and νY of each cluster. These optimum encoder complexity

control parameters for all clusters are then stored in a database. During real-time

video encoding, we compute the scene activity parameter λF of the video segment,

determine its cluster based on the value of σ2m, and then use the average optimum

encoder complexity control parameters of that cluster to control the video encoder.
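The run-time half of this approach reduces to a table lookup. The following Python sketch illustrates it under stated assumptions: the cluster boundaries on the scene activity parameter and the stored parameter pairs are hypothetical placeholders, not values from our training experiments.

```python
from bisect import bisect_right

# Hypothetical cut points on the scene activity parameter.
# Six cut points partition the axis into seven clusters (0..6).
ACTIVITY_BOUNDS = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

# Hypothetical database built in the training stage:
# one (nu_x, nu_y) pair of complexity control parameters per cluster.
PARAM_DB = [
    (0.1, 0.1), (0.1, 0.2), (0.2, 0.2), (0.5, 0.4),
    (0.8, 0.6), (1.0, 0.8), (1.0, 1.0),
]

def classify(activity: float) -> int:
    """Return the index of the cluster that contains this activity value."""
    return bisect_right(ACTIVITY_BOUNDS, activity)

def encoder_params(activity: float) -> tuple:
    """Look up the stored control parameters for a segment's cluster."""
    return PARAM_DB[classify(activity)]

if __name__ == "__main__":
    print(encoder_params(5.0))
```

The expensive optimization stays in the training stage; the encoder only pays for one search over the cluster boundaries per video segment.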

5.2 Lagrangian Optimization

In Chapter 4, we have proposed to partition a non-stationary input video sequence

into multiple video segments which have relatively stationary scene activities. These

video segments are then classified into several clusters and the distribution of these


video segments in the cluster seems to follow a Gaussian distribution, as shown

in Fig. 4.5. For video segments in each cluster, we use a discrete Lagrangian

optimization approach [5] to obtain the optimum complexity control parameters,

since during our operational P-R-D analysis, what we have obtained are discrete P-

R-D points, D = D(R, P |Γ(ν1, ν2, . . . , νL)) at different configurations of complexity

control parameters.

Figure 5.1: The discrete Lagrangian Optimization principle

The basic idea of discrete Lagrangian optimization is as follows. We start with a simple rate-distortion optimization, without considering the third dimension of power consumption, to demonstrate the basic procedure of discrete Lagrangian optimization. We first introduce a Lagrange multiplier λ ≥ 0, a non-negative real number, and consider the Lagrangian cost Jij(λ) = dij + λ · rij for our P-R-D analysis. Refer to Fig. 5.1 for a graphical interpretation of the Lagrangian cost. Here, i is the index of the video segment and j is the index of the complexity control parameter. As the parameter j increases, the coding bit rate rij decreases and the coding distortion dij increases. The Lagrange multiplier allows us to select specific trade-off points. Minimizing the Lagrangian cost Jij = dij + λ · rij when λ = 0 is equivalent to minimizing the coding distortion. In other words, it selects the point closest to the x-axis in Fig. 5.1. Conversely, minimizing the Lagrangian cost function when λ becomes arbitrarily large is equivalent to minimizing the coding bit rate, and thus finding the point closest to the y-axis in Fig. 5.1. Intermediate values of the multiplier λ determine intermediate operating points. Please see [5] for details. Consider the following R-D optimization problem:

\[ \min_{j \in \Omega} J = \sum_{i=1}^{N} \left( d_{ij} + \lambda \cdot r_{ij} \right), \quad \text{s.t.} \quad \sum_{i=1}^{N} r_{ij} \le R_T, \tag{5.1} \]

where j is a parameter for coding control. If the mapping j = x∗(i), for i = 1, 2, . . . , N , minimizes

\[ \sum_{i=1}^{N} \left( d_{i x(i)} + \lambda \cdot r_{i x(i)} \right), \tag{5.2} \]

then it is also the optimal solution to problem (5.1) for the particular case where the total budget is

\[ R_T = R(\lambda) = \sum_{i=1}^{N} r_{i x^{*}(i)}, \tag{5.3} \]

so that

\[ D(\lambda) = \sum_{i=1}^{N} d_{i x^{*}(i)} \le \sum_{i=1}^{N} d_{i x(i)}, \tag{5.4} \]

for any x satisfying the rate constraint in (5.1) with RT given by (5.3). Since the budget constraint has been removed, for a given multiplier λ, (5.1) can be rewritten as

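\[ \min_{j \in \Omega} \left( \sum_{i=1}^{N} d_{ij} + \lambda \cdot r_{ij} \right) = \sum_{i=1}^{N} \min_{j \in \Omega} \left( d_{ij} + \lambda \cdot r_{ij} \right). \tag{5.5} \]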

In this case, the minimum point can be computed independently for each coding

unit. Note that for each coding unit i, the point on the R-D characteristic that

minimizes dij + λ · rij is the point at which the line of absolute slope λ is tangent

to the convex hull of the R-D characteristic. For this reason we normally refer to λ

as the slope, and since λ is the same for every coding unit in the sequence, we can

refer to this algorithm as a constant slope optimization [5].
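The decomposition in (5.5) can be sketched in a few lines of Python. This is an illustration only: the operating points below are made-up (rate, distortion) pairs, and the bisection search on λ is one common way to meet a rate budget, assumed here rather than taken from [5].

```python
def best_points(units, lam):
    """For each coding unit, pick the (rate, distortion) point minimizing d + lam * r."""
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in units]

def constant_slope(units, rate_budget, lam_hi=1e6, iters=60):
    """Bisect on lam to find the smallest slope whose total rate fits the budget."""
    lam_lo = 0.0
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        total_rate = sum(r for r, _ in best_points(units, lam))
        if total_rate > rate_budget:
            lam_lo = lam   # over budget: penalize rate more strongly
        else:
            lam_hi = lam   # within budget: try a smaller slope
    return lam_hi, best_points(units, lam_hi)

# Hypothetical operating points (rate, distortion) for two coding units.
UNITS = [
    [(10, 1.0), (6, 3.0), (2, 9.0)],
    [(8, 2.0), (4, 5.0), (1, 12.0)],
]

if __name__ == "__main__":
    lam, chosen = constant_slope(UNITS, rate_budget=10)
    print(lam, chosen)
```

Because the same λ is applied to every unit, each inner minimization is independent, which is what makes the per-unit search cheap.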

5.3 P-R-D Optimization

In the following, we apply discrete Lagrangian optimization to solve the P-R-D optimization problem. (5.1) can be rewritten as

\[ \min_{[\nu_1, \nu_2, \ldots, \nu_L] \in \Omega} J = D + \lambda_1 \cdot R + \lambda_2 \cdot C, \tag{5.6} \]

where J is the Lagrangian cost, and λ1 and λ2 are the multipliers for the bit rate constraint R and the complexity constraint C, respectively. The objective of (5.6) is to obtain the best video quality of service under rate and complexity constraints. Note that the power consumption P is a function of the computational complexity C, which is related to DVS. In this chapter, for ease of discussion, we use the C-R-D model instead of the P-R-D model, since they are equivalent under DVS. Within a cluster of video segments, as mentioned in Chapter 4, the Lagrangian cost function can be written as

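\[ J = \sum_{i=1}^{N} \left( D_i + \lambda_{1i} \cdot R_i + \lambda_{2i} \cdot C_i \right), \quad \text{and} \quad \sum_{i=1}^{N} R_i \le R, \quad \sum_{i=1}^{N} C_i \le C, \tag{5.7} \]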

where N is the number of video segments. There are two multipliers in the Lagrangian cost function, so it is more difficult to determine the optimum values of both. In order to simplify the optimization procedure, we assume that the video quality is constant. This assumption is reasonable, since in many practical applications, users often specify a target video quality for the video encoding and communication task. In this case, we set Di = D, where D is the target video quality set by users. Now, the cost function becomes

\[ J = \sum_{i=1}^{N} \left( C_i + \lambda_i \cdot R_i \right), \quad \text{and} \quad \sum_{i=1}^{N} R_i \le R, \quad D_i \approx D. \tag{5.8} \]

As mentioned in Chapter 4, the input video segments are classified into seven clusters. In this case, the cost function can be re-written as

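\[ \min \left( \sum_{k=1}^{N} \left( C_k + \lambda_k \cdot R_k \right) \right) = \sum_{i=1}^{7} \sum_{j=1}^{N_i} \left( C_{ij(i)} + \lambda_i \cdot R_{ij(i)} \right), \tag{5.9} \]

under \( \sum_{i=1}^{N} R_i \le R \), \( D_i \approx D \).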


Note that each video segment in the cluster is independent, so the optimization procedure can be performed on each cluster. Using the constant slope optimization algorithm [5], Eq. (5.9) is rewritten as

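\[ \sum_{i=1}^{7} \sum_{j=1}^{N_i} \left( C_{ij(i)} + \lambda \cdot R_{ij(i)} \right), \quad N = \sum_{i=1}^{7} N_i, \tag{5.10} \]

under \( \sum_{i=1}^{N} R_i \le R \), \( D_i \approx D \).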

Based on this formulation, the optimization problem can be solved using the following three steps.

Step 1: Calculating the minimum Lagrangian cost Ji with different λ ∈ [λ1, λ2, . . . , λM ] for each cluster. We run the P-R-D scalable encoder mentioned in Chapter 4 at different configurations of the encoding parameters [X, Y, Q], and the corresponding encoding results [C, R, D] are recorded. Here, [X, Y, Q] are the scaling parameters of the motion estimation search, the pre-coding, and the quantization step size. [C, R, D] are the encoding complexity, coding bit rate, and video distortion, respectively. Note that we assume the video coding distortion is constant. We group the results [C, R, D] based on their distortion values. For each group, we apply the optimization procedure outlined in (5.11):

\[ \min_{[X, Y, Q]} J_i = C_i + \lambda \cdot R_i, \quad i = 1, 2, \ldots, 7, \tag{5.11} \]

under \( \sum_{i=1}^{N} R_i \le R \), \( D_i \approx D \).


The multiplier λ controls the optimization to match the rate budget. Some sample

choices of λ are listed in Table 5.1.

λ    300    400    500    600    700    800    1000    2000    3000
λ    4000   5000   6000   7000   8000   9000   10000   12000   14000

Table 5.1: Sample choices of the multiplier λ.

Fig. 5.2 to Fig. 5.11 show the minimum Lagrangian cost at different values of λ for each of the seven clusters. From these results, we can see that each optimal pair of rate and complexity corresponds to a multiplier λ. This pair of rate and complexity corresponds to a group of the scaling parameters [X, Y, Q]. With the parameters [X, Y, Q], the optimal coding can be performed.
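Step 1 amounts to a small search over recorded encoder runs. The sketch below shows the idea in Python; the records are hypothetical placeholders for one cluster (they are not measured results), and the PSNR tolerance used to group runs by distortion is an assumption.

```python
# Hypothetical encoder runs for one cluster: configuration [X, Y, Q],
# complexity C (cycles), rate R (bits) and PSNR (dB) of each run.
RECORDS = [
    {"cfg": (1.0, 0.8, 3), "C": 9.5e8, "R": 1.15e6, "psnr": 37.1},
    {"cfg": (0.5, 0.4, 3), "C": 8.5e8, "R": 1.21e6, "psnr": 36.9},
    {"cfg": (0.1, 0.2, 3), "C": 7.5e8, "R": 1.30e6, "psnr": 37.0},
    {"cfg": (0.1, 0.2, 8), "C": 7.0e8, "R": 0.60e6, "psnr": 33.9},
]

def best_config(records, target_psnr, lam, tol=0.5):
    """Among runs whose PSNR is within tol of the target, minimize C + lam * R."""
    group = [r for r in records if abs(r["psnr"] - target_psnr) <= tol]
    return min(group, key=lambda r: r["C"] + lam * r["R"])

if __name__ == "__main__":
    for lam in (300, 2000):
        print(lam, best_config(RECORDS, 37.0, lam)["cfg"])
```

With these placeholder numbers, a small λ favors the low-complexity configuration, while a large λ favors the low-rate one, which is the trade-off that the multiplier is meant to control.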

[Figures 5.2 to 5.11: plots not reproduced. Each figure has two panels, rate versus λ and complexity versus λ, for λ from 0 to 14000 at the stated PSNR.]

Figure 5.2: The multiplier λ, rate constraints and complexity at D = 38 dB

Figure 5.3: The multiplier λ, rate constraints and complexity at D = 37 dB

Figure 5.4: The multiplier λ, rate constraints and complexity at D = 36 dB

Figure 5.5: The multiplier λ, rate constraints and complexity at D = 35 dB

Figure 5.6: The multiplier λ, rate constraints and complexity at D = 34 dB

Figure 5.7: The multiplier λ, rate constraints and complexity at D = 33 dB

Figure 5.8: The multiplier λ, rate constraints and complexity at D = 32 dB

Figure 5.9: The multiplier λ, rate constraints and complexity at D = 31 dB

Figure 5.10: The multiplier λ, rate constraints and complexity at D = 30 dB

Figure 5.11: The multiplier λ, rate constraints and complexity at D = 29 dB

Step 2: Predicting the scene activity parameter of the input video segment. In Step 1, each cluster’s minimum Lagrangian cost is calculated at different values of λ. This implies that the minimum computational complexity is obtained under a certain rate constraint for a specific value of λ. As discussed in Chapter 4, each

video segment is classified into a cluster according to its scene activity parameter. Hence, if the distribution of the clusters is known, the Lagrangian multiplier λ can be fixed for a given rate budget. Fortunately, the distribution of the clusters of the video segments in terms of scene activity can often be considered to be a Gaussian distribution. Let x be the scene activity parameter. Let f(x) be the probability density function of x among these video clusters. Let r(x) be the corresponding bit rate obtained from optimization. Therefore, the expected value of the output bit rate is given by

\[ \int_{-\infty}^{\infty} f(x) \cdot r(x) \, dx. \tag{5.12} \]

Consider the discrete case of this problem. Let xi be the scene activity parameter of the i-th cluster. Let Ω = [xi | i = 1, 2, . . . , 7]. Let Fi be the probability of the i-th cluster; the values of Fi are listed in Table 5.2. Let r(ji) be the rate of the j-th video segment that is classified into the i-th cluster. In this case, the expected bit rate of the j-th video segment is predicted by

\[ \sum_{i=1}^{7} F_i \cdot r(i|\lambda). \tag{5.13} \]

The overall coding rate Rp is predicted by

\[ R_p = \sum_{j=1}^{N} \sum_{i=1}^{7} F_i \cdot r(i|\lambda) = \sum_{i=1}^{7} (N \cdot F_i) \cdot r(i|\lambda) = \sum_{i=1}^{7} N_i \cdot r(i|\lambda), \tag{5.14} \]

where Ni is the number of video segments that are classified into the i-th cluster.

Cluster Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7

Prob. 0.053 0.105 0.140 0.281 0.211 0.140 0.070

Table 5.2: The probability Fi of each cluster
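The prediction in (5.13) and (5.14) is a weighted sum over the cluster probabilities. A short Python sketch follows, using the probabilities of Table 5.2 and N = 57 segments; the per-cluster rates passed to the function in the example are hypothetical.

```python
N_SEGMENTS = 57

# Cluster probabilities F_i from Table 5.2.
F = [0.053, 0.105, 0.140, 0.281, 0.211, 0.140, 0.070]

def predicted_total_rate(rate_per_cluster, n_segments=N_SEGMENTS, probs=F):
    """R_p = N * sum_i F_i * r(i | lambda), following (5.14)."""
    return n_segments * sum(f * r for f, r in zip(probs, rate_per_cluster))

# N_i = N * F_i, rounded to whole segments.
SEGMENTS_PER_CLUSTER = [round(N_SEGMENTS * f) for f in F]

if __name__ == "__main__":
    print(SEGMENTS_PER_CLUSTER)
    # Hypothetical per-cluster rates r(i | lambda), in bits per segment.
    print(predicted_total_rate([12e3, 15e3, 18e3, 20e3, 22e3, 25e3, 30e3]))
```

The rounded values of N · Fi add up to 57 segments, which suggests that the probabilities in Table 5.2 are the per-cluster segment counts divided by N.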

Step 3: Fix the Lagrangian multiplier λ and obtain the optimal encoder control parameters [X, Y, Q]. In Step 1, the minimum Lagrangian cost of a video segment in the i-th cluster is calculated at different values of λ. In other words, the minimum computational complexity is obtained for a given rate constraint and a specific value of λ. As discussed in Chapter 4, an input video segment is classified into a cluster based on its scene activity parameter. The distribution of the clusters in the video sequence can be predicted by Step 2. Accordingly, the Lagrangian multiplier λ can be fixed for a given rate budget. For example, we can consider the video quality levels of 37 dB, 34 dB, and 30 dB. In order to obtain λ, we first list the relationship between the expected rate and λ in Table 5.3. Then, the average rate budget for compressing one video segment is calculated from the total rate budget as

\[ r = \frac{R}{N}. \tag{5.15} \]

PSNR     Quantity   λ = 300         λ = 600         λ = 2000        λ = 7000        λ = 14000
37 dB    RP         1264347.14      1209230.67      1162620.71      1153689.24      1153165.68
         CP         826501723.62    844671507.38    902107565.82    940163519.19    946846809.28
         Rr         1.2381e+006     1.2112e+006     1.1217e+006     1.1130e+006     1.1127e+006
         Cr         972837842       990056977       1.0679e+009     1.1161e+009     1.1217e+009
34 dB    RP         802745.57       705773.85       602751.87       593037.21       592451.51
         CP         663130614.32    706261434.62    855651058.05    900865197.85    908359867.79
         Rr         8.0704e+005     7.3882e+005     6.1917e+005     6.1253e+005     6.1233e+005
         Cr         788791942       850711799       1.0121e+009     1.0609e+009     1.0750e+009
30 dB    RP         390217.42       331753.54       266147.48       259710.16       255222.04
         CP         572981357.51    597513566.14    709343178.71    738226926.15    783359359.26
         Rr         4.1154e+005     3.6938e+005     3.0327e+005     3.0192e+005     2.9966e+005
         Cr         680497548       701901814       854763964       878845555       935804923

Table 5.3: The Lagrangian multiplier λ and corresponding QoS and rate constraint

The encoder control parameters [X, Y, Q] are obtained at the minimum La-

grangian cost. In this case, (5.11) can be rewritten as

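\[ \min_{[X, Y, Q]} J = \sum_{i=1}^{7} \sum_{j=1}^{N_i} \min \left( C_{ij} + \lambda \cdot R_{ij} \right), \quad N_i = N \cdot F_i, \tag{5.16} \]

under \( R = N \cdot \sum_{i=1}^{7} F_i \cdot r_i \le R \), \( D \approx D \).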

Here, Fi is the probability of the i-th cluster. The overall computational complexity

of a video sequence for a given rate constraint is

\[ C_p = \sum_{i=1}^{7} \sum_{j=1}^{N_i} C_{ij}, \quad \text{and} \quad R = N \cdot \sum_{i=1}^{7} F_i \cdot r_i \le R, \quad D \approx D, \tag{5.17} \]

where Cp is the total minimum computational complexity and R is the total rate budget.


5.4 Summary of Algorithm

We implement the optimization on a common video sequence, which has 3590 frames and is partitioned into 57 segments. The following steps are performed:

1. Assume the video QoS levels and rate evaluations listed in Table 5.3. N is 57.

2. Calculate the corresponding expected rate rre with equation (5.13). Note that the probabilities Fi are given in Table 5.2.

3. Obtain the corresponding scaling parameters from Table 5.4.

PSNR: 37 dB
λ       Param   Clust1   Clust2   Clust3   Clust4   Clust5   Clust6   Clust7
300     X       0.20     0.10     0.10     1.00     1.00     0.50     0.10
        Y       0.20     0.40     0.20     0.40     0.60     0.80     0.80
        Q       5        4        3        3        3        3        3
600     X       0.20     0.10     0.10     1.00     1.00     1.00     0.10
        Y       0.20     0.40     0.20     0.40     0.60     0.80     0.80
        Q       5        4        3        3        3        3        3
2000    X       0.20     0.10     0.10     0.50     1.00     1.00     1.00
        Y       0.20     0.40     0.20     0.80     0.60     0.80     0.80
        Q       5        4        3        4        3        3        3
7000    X       0.20     0.10     1.00     0.60     1.00     1.00     1.00
        Y       0.20     0.40     0.20     0.85     0.60     0.80     0.80
        Q       5        4        3        4        3        3        3
14000   X       1.00     0.10     1.00     0.60     1.00     1.00     1.00
        Y       0.20     0.40     0.20     0.85     0.60     0.80     0.80
        Q       5        4        3        4        3        3        3

Table 5.4: The Lagrangian multiplier λ and corresponding PSNR ≈ 37 dB and rate constraint

PSNR: 34 dB
λ       Param   Clust1   Clust2   Clust3   Clust4   Clust5   Clust6   Clust7
300     X       0.10     0.10     0.10     0.10     0.80     0.10     0.10
        Y       0.10     0.10     0.10     0.40     0.40     0.70     0.70
        Q       8        5        3        6        4        5        4
600     X       0.10     0.10     0.10     0.10     0.80     0.80     0.10
        Y       0.10     0.10     0.20     0.40     0.40     0.70     0.90
        Q       8        5        6        6        4        5        5
2000    X       0.10     0.10     0.10     0.10     1.00     1.00     0.80
        Y       0.10     0.10     0.20     0.40     0.80     0.75     1.00
        Q       8        5        6        6        6        5        5
8000    X       0.10     0.50     1.00     1.00     1.00     1.00     0.80
        Y       0.10     0.20     0.20     0.40     0.80     0.75     1.00
        Q       8        6        6        6        6        5        5
14000   X       1.00     0.50     1.00     1.00     1.00     1.00     0.80
        Y       0.10     0.20     0.20     0.40     0.80     0.75     1.00
        Q       8        6        6        6        6        5        5

Table 5.5: The Lagrangian multiplier λ and corresponding PSNR ≈ 34 dB and rate constraint

PSNR: 30 dB
λ       Param   Clust1   Clust2   Clust3   Clust4   Clust5   Clust6   Clust7
300     X       0.10     0.10     0.10     0.10     0.80     0.10     0.10
        Y       0.10     0.10     0.10     0.20     0.20     0.40     0.90
        Q       18       12       10       10       8        8        8
500     X       0.10     0.10     0.10     0.10     1.00     0.10     0.10
        Y       0.10     0.10     0.10     0.20     0.20     0.40     0.90
        Q       18       12       10       10       8        8        8
3000    X       0.10     0.10     0.10     0.90     1.00     0.80     0.30
        Y       0.10     0.20     0.10     0.20     0.20     0.70     0.95
        Q       18       14       10       12       8        10       8
8000    X       0.10     0.10     0.10     1.00     1.00     1.00     0.30
        Y       0.10     0.20     0.20     0.20     0.20     0.80     0.95
        Q       18       14       12       12       8        10       8
12000   X       0.10     0.10     0.90     1.00     1.00     1.00     0.30
        Y       0.10     0.20     0.20     0.20     0.40     0.80     0.95
        Q       18       14       12       12       10       10       8

Table 5.6: The Lagrangian multiplier λ and corresponding PSNR ≈ 30 dB and rate constraint

The multiplier λ can be obtained for a given QoS and rate constraint. For example, if the required QoS is 37 dB and the average bit rate must be less than 1162620.71/57 = 20396.85 bits, λ is chosen as 2000. With the QoS being 37 dB and λ being 2000, the scaling parameters are obtained from Table 5.4:

[0.20, 0.20, 5.00], [0.10, 0.40, 4.00], [0.10, 0.20, 3.00], [0.50, 0.80, 4.00], [1.00, 0.60,

3.00], [1.00, 0.80, 3.00], [1.00, 0.80, 3.00]. They correspond to Cluster1, Cluster2, Cluster3, Cluster4, Cluster5, Cluster6, and Cluster7, respectively. Hence, the coding is performed with these scaling parameters. In this way, fifteen individual video encodings are performed, and their actual results are listed in Table 5.3. Rp represents the rate constraint and Cp denotes the corresponding complexity.
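The selection of λ and the parameter lookup in this example can be sketched as follows. The numbers are the 37 dB entries of Tables 5.3 and 5.4; the rule of taking the smallest tabulated λ that meets the budget is our reading of the example (the tabulated complexity grows with λ), and the function name is hypothetical.

```python
N_SEGMENTS = 57

# Predicted total rate R_P at 37 dB for each tabulated lambda (Table 5.3).
PREDICTED_RATE_37DB = {
    300: 1264347.14, 600: 1209230.67, 2000: 1162620.71,
    7000: 1153689.24, 14000: 1153165.68,
}

# Scaling parameters (X, Y, Q) for Cluster1..Cluster7 at 37 dB, lambda = 2000 (Table 5.4).
PARAMS_37DB_L2000 = [
    (0.20, 0.20, 5), (0.10, 0.40, 4), (0.10, 0.20, 3), (0.50, 0.80, 4),
    (1.00, 0.60, 3), (1.00, 0.80, 3), (1.00, 0.80, 3),
]

def choose_lambda(avg_rate_budget):
    """Smallest tabulated lambda whose predicted average rate fits the budget.

    Returns None when no tabulated lambda satisfies the budget.
    """
    for lam in sorted(PREDICTED_RATE_37DB):
        if PREDICTED_RATE_37DB[lam] / N_SEGMENTS <= avg_rate_budget:
            return lam
    return None

if __name__ == "__main__":
    lam = choose_lambda(20400.0)
    print(lam, PARAMS_37DB_L2000[3])  # parameters for Cluster4
```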

In order to compare with the non-scalable encoder, we encode the same video sequence with no complexity control under a given QoS requirement. Fig. 5.12 to Fig. 5.17 show the reconstruction results of the video sequence with both the scalable and the non-scalable encoder at QoS = 37 dB, 34 dB, and 30 dB.

[Figures 5.12 to 5.17: plots not reproduced. Each figure plots PSNR versus GoP number for the "no scaling" and "scaling" encoders at the stated λ.]

Figure 5.12: Comparison of the non scaling coding and scaling coding at D = 37 dB, λ = 300

Figure 5.13: Comparison of the non scaling coding and scaling coding at D = 37 dB, λ = 2000

Figure 5.14: Comparison of the non scaling coding and scaling coding at D = 34 dB, λ = 300

Figure 5.15: Comparison of the non scaling coding and scaling coding at D = 34 dB, λ = 2000

Figure 5.16: Comparison of the non scaling coding and scaling coding at D = 30 dB, λ = 300

Figure 5.17: Comparison of the non scaling coding and scaling coding at D = 30 dB, λ = 2000

In Fig. 5.12 and Fig. 5.13, the QoS is 37 dB. Fig. 5.14 and Fig. 5.15 correspond to a QoS of 34 dB, and Fig. 5.16 and Fig. 5.17 correspond to a QoS of 30 dB. The results are listed in Table 5.7.

PSNR     Non-scalable                      Scaling1                                    Scaling2
         R               C                 R               C               Ratio       R               C               Ratio
37 dB    8.7077 × 10^5   1.3049 × 10^9     1.2381 × 10^6   9.7284 × 10^8   0.41        1.1217 × 10^6   1.0676 × 10^9   0.55
34 dB    4.7797 × 10^5   1.3003 × 10^9     8.0704 × 10^5   7.8879 × 10^8   0.22        6.1917 × 10^5   1.0121 × 10^9   0.47
30 dB    2.1925 × 10^5   1.2944 × 10^9     4.1154 × 10^5   6.8049 × 10^8   0.15        3.0327 × 10^5   8.5476 × 10^8   0.29

Table 5.7: The Energy Saving Ratio


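The energy saving ratio is calculated as the cube of the ratio between the scaled and the non-scalable complexity:

\[ \text{Ratio}_{11} = \left( \frac{0.9728 \times 10^{9}}{1.3049 \times 10^{9}} \right)^{3} \approx 0.41, \quad \text{Ratio}_{12} = \left( \frac{1.0679 \times 10^{9}}{1.3049 \times 10^{9}} \right)^{3} \approx 0.55, \quad \text{PSNR} = 37\ \text{dB} \]

\[ \text{Ratio}_{21} = \left( \frac{0.7888 \times 10^{9}}{1.3003 \times 10^{9}} \right)^{3} \approx 0.22, \quad \text{Ratio}_{22} = \left( \frac{1.0121 \times 10^{9}}{1.3003 \times 10^{9}} \right)^{3} \approx 0.47, \quad \text{PSNR} = 34\ \text{dB} \]

\[ \text{Ratio}_{31} = \left( \frac{0.6805 \times 10^{9}}{1.2944 \times 10^{9}} \right)^{3} \approx 0.15, \quad \text{Ratio}_{32} = \left( \frac{0.8548 \times 10^{9}}{1.2944 \times 10^{9}} \right)^{3} \approx 0.29, \quad \text{PSNR} = 30\ \text{dB} \]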

Note that the results in Table 5.7 are obtained by theoretical evaluation.

The achievable energy saving, in a practical setting, is discussed in Chapter 6.
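The ratios in Table 5.7 can be reproduced with a few lines of Python. The complexity values are those used in the calculation above; the cubic relation is the idealized assumption that power scales with the cube of the CPU speed.

```python
# Complexity of the non-scalable encoder at each PSNR level (dB).
NON_SCALABLE_C = {37: 1.3049e9, 34: 1.3003e9, 30: 1.2944e9}

# Complexity of the two scaled encodings (Scaling1, Scaling2) at each PSNR level.
SCALED_C = {
    37: (0.9728e9, 1.0679e9),
    34: (0.7888e9, 1.0121e9),
    30: (0.6805e9, 0.8548e9),
}

def energy_ratio(c_scaled, c_full):
    """Cube of the complexity ratio, under the idealized cubic power model."""
    return (c_scaled / c_full) ** 3

RATIOS = {
    psnr: tuple(round(energy_ratio(c, NON_SCALABLE_C[psnr]), 2) for c in cs)
    for psnr, cs in SCALED_C.items()
}

if __name__ == "__main__":
    for psnr, pair in RATIOS.items():
        print(psnr, pair)
```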


Chapter 6

Joint Hardware and Video

Encoder Adaptation

In this chapter, we study how the hardware-layer system scheduling and application-

layer P-R-D control can be jointly designed to achieve energy saving in practical

portable video communication systems.

6.1 Background

In Chapters 2 to 4, we have developed a P-R-D analysis framework, which extends

the traditional R-D analysis by considering the third dimension of power consump-

tion. We have demonstrated that the video encoding energy can be significantly

saved using P-R-D optimization. To realize this energy saving in practical sys-

tem design, we need to consider the power consumption behavior of the hardware

computing platform and study joint hardware-application adaptation. The DVS

provides a mechanism to adapt the CPU operating voltage and clock frequency


according to the work load of the P-R-D video encoder. It translates complexity-

scalability and encoder complexity reduction into energy-scalability and encoder

energy reduction. The encoder complexity minimization in previous chapters is per-

formed at the application layer and does not consider the hardware-layer system

scheduling. In cross-layer adaptation, the CPU clock frequency, circuit voltage sup-

ply in DVS, encoding bit rate, video encoding distortion and power consumption,

are jointly optimized.

6.2 System Model

In order to perform cross-layer adaptation for the complexity-scalable encoder to save power, system models at the hardware and application layers are needed.

6.2.1 CPU Model

In the hardware layer, we consider the video application system with a single adap-

tive CPU that supports multiple speeds, f1, f2, . . . , fk, trading off performance for

energy. At a lower speed, the CPU consumes less power. However, a lower speed

may increase the power consumption of other resources such as memory. Two major

observations regarding CPU energy can be made here. First, the CPU is one of the

most energy-consuming components in a portable video device. For example, depending on the application workload, the CPU might consume up to 52% of the total energy in the Stargate [58]. Second, many processors used in embedded system design today, for example, Intel XScale PXA255 [24], Intel Pentium-M [25], and AMD

Athlon [3], allow software to change the speed through the Advanced Configuration

and Power Interface (ACPI) standard [26], thereby enabling a cross-layer system


control.

In general, there are two approaches to reduce CPU energy consumption. The

first one is dynamic power management (DPM) [34], which puts the idle processor

into the lower-power sleep state. The second approach is dynamic frequency /

voltage scaling (DVS) [53], which reduces the operating speed and voltage of the

active processor. DPM will be used in a later chapter for more energy saving in our embedded system design. In this chapter, we focus on hardware adaptation through DVS.

The CPU power consumption typically consists of three major parts: the dy-

namic power, short circuit power, and leakage power, as shown in the following:

\[ \underbrace{C_{cp} \times f \times V^{2}}_{\text{dynamic power}} + \underbrace{V \times I_{sc}}_{\text{short circuit power}} + \underbrace{V \times I_{leak}}_{\text{leakage power}}, \tag{6.1} \]

where Ccp is the loading capacitance, f is the speed, V is the voltage, Isc is the

short circuit current, and Ileak is the leakage current [2]. When the speed decreases,

the CPU can operate at a lower voltage and thus reduce its power consumption.

Furthermore, the CPU power consumption model is generally a convex function

of the speed. Consequently, the CPU energy (i.e., the product of the power and

time) also decreases as the speed decreases even at the cost of longer execution

time. In particular, for ideal processors, we assume that their power is dominated by the dynamic power and that the voltage is proportional to the speed; that is, the CPU power is proportional to the cube of the speed. The power saving results are calculated under this assumption. Unfortunately, this assumption does not hold for real embedded systems such as Stargate, which uses the Intel XScale PXA255, whose power is not proportional to the cube of the speed (see Table 6.1). Furthermore, a lower speed may increase the


energy consumption of other resources, such as memory and display. For example,

when the CPU operates at a lower speed, an application needs to run for a longer

time; consequently, it needs to use the memory for a longer time, often increasing

the memory energy consumption. Since our goal is to save the total energy of the

whole device, we are more interested in the total power consumption of the device.

In general, the relationship between the clock speed f and the total system power P(f) can be obtained via measurements such as the data in Table 6.1. Therefore, we can assume, without loss of generality, that the total power decreases as the CPU speed decreases. This assumption holds for the Stargate system used in our project.

Freq. (MHz)   400    300    200    100
Power (W)     1.89   1.67   1.62   1.45

Table 6.1: Stargate Power consumption.
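To see how far the measurements in Table 6.1 are from the idealized cubic model, the two can be compared directly. The sketch below uses only the values of Table 6.1; the function names are hypothetical.

```python
# Measured total system power of the Stargate (Table 6.1): MHz -> W.
MEASURED_POWER_W = {400: 1.89, 300: 1.67, 200: 1.62, 100: 1.45}

def cubic_ratio(f_mhz, f_max_mhz=400):
    """Power relative to full speed if power scaled with the cube of the speed."""
    return (f_mhz / f_max_mhz) ** 3

def measured_ratio(f_mhz, f_max_mhz=400):
    """Measured power relative to the measured power at full speed."""
    return MEASURED_POWER_W[f_mhz] / MEASURED_POWER_W[f_max_mhz]

if __name__ == "__main__":
    for f in sorted(MEASURED_POWER_W, reverse=True):
        print(f, round(cubic_ratio(f), 3), round(measured_ratio(f), 3))
```

At 100 MHz the cubic model predicts under 2% of the full-speed power, whereas the measured total system power is still about 77% of it, which is why the theoretical ratios of Chapter 5 cannot be carried over directly to the real device.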

Besides the fact that system power consumption is not proportional to the cubic

of the clock speed, in practice, the clock speed of a real CPU cannot be adjusted

continuously. Instead, the CPU often has several candidate clock speeds for us to

choose. In this case, the CPU clock speed cannot be adjusted to the frequency that exactly matches the computational complexity that the video application job demands.

6.2.2 Task Model

The video application task is periodic in nature and releases a job every period. For example, the new complexity-controlled encoder sends the video segment coding jobs periodically. Each job has a soft deadline, typically defined

as the end of the period. Each job consumes a certain amount of CPU cycles.

Different jobs of the same task, like the video segments mentioned in the previous

chapter, will need different amount of CPU cycles. In practice, the instantaneous


cycle demand for individual video segments may change significantly over the task

period. However, the probability distribution of cycle demand of the video segment

coding is often stable because of both the periodic nature of its jobs and the probability distribution of a video segment being classified into a cluster. In other words, the probability distribution of cycle demand of each video segment coding

can be estimated based on previous encoding statistics. The probability that a job

demands no more than a certain amount of cycles in a period is denoted by

Pr(X ≤ x), (6.2)

where X is the random variable associated with the number of cycles demanded

by each job. In Chapter 4, an input video segment is classified into one of the seven clusters with probabilities Fi, i = 1, 2, . . . , 7, and the video segments in the i-th cluster require Ci operation cycles. Assume that the time interval of each video segment is approximately TP . The probability Pr then becomes

\[ \Pr\left( X \le \sum_{i=1}^{j} \frac{C_i}{T_P} \right) = \sum_{i=1}^{j} F_i. \tag{6.3} \]

Table 5.2 lists the probability distribution of these clusters.

6.3 Application-Layer Video Encoder Adaptation

In system scheduling, we need to make sure that the performance and CPU requirements of each task are met. In particular, the scheduler uses a so-called soft real-time

algorithm to periodically allocate cycles to each task based on its statistical cycle

demand, e.g., the 95th percentile of cycle demand of all of its jobs. The statistical


demand can be specified based on the task’s characteristics. The purpose of this

statistical, rather than the worst-case based, allocation is to improve CPU utiliza-

tion while providing soft (statistical) performance guarantees. The scheduler then

enforces the allocation through an earliest deadline first (EDF) based algorithm.

This algorithm performs admission control to ensure that the total CPU utilization (at the highest CPU speed fk) of all concurrent tasks is no more than one, i.e.,

\[ \sum_{i=1}^{n} \frac{C_i / f_k}{T_{pi}} \le 1. \tag{6.4} \]

Here, the system has n tasks and the i-th task demands Ci cycles per period Tpi.

Among all tasks, the scheduler first executes the task with the earliest deadline

and positive allocation. As the task is executed, its cycle allocation is decreased

by the number of cycles it consumes. When its allocation is exhausted, the task is

preempted to run in best-effort mode for overrun protection.
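The admission test in (6.4) can be written down directly. The following Python sketch assumes that each task is described by a hypothetical (cycles per period, period in seconds) pair and that the highest speed is given in cycles per second; the task values are illustrative only.

```python
def utilization(tasks, f_max):
    """Total CPU utilization at the highest speed, as in (6.4).

    tasks: list of (cycles_per_period, period_seconds) pairs.
    f_max: highest CPU speed in cycles per second.
    """
    return sum((cycles / f_max) / period for cycles, period in tasks)

def admit(tasks, f_max):
    """Admit the task set only if its total utilization does not exceed one."""
    return utilization(tasks, f_max) <= 1.0

if __name__ == "__main__":
    f_max = 400e6  # 400 MHz, the highest speed in Table 6.1
    tasks = [(100e6, 0.5), (60e6, 1.0)]  # hypothetical cycle demands
    print(utilization(tasks, f_max), admit(tasks, f_max))
```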

This algorithm only protects the system against overruns by falling back to best-effort mode; it does not provide a mechanism for adjusting the CPU frequency to minimize power consumption. An improved algorithm called practical dynamic voltage scaling (PDVS) is used to minimize the total energy of such a multimedia system [62].

6.3.1 Practical Dynamic Voltage Scaling (PDVS)

The goal of PDVS is to minimize the total energy of the mobile device while provid-

ing soft performance guarantees to each multimedia task. For this, PDVS extends

the soft real-time scheduling algorithm by changing the CPU speed of task execu-

tion. The purpose is to save energy since the CPU may run slower without affecting

application performance. Assume that there are n tasks and each task is allocated


Ci cycles per period Tpi and all concurrent tasks run at a uniform CPU frequency. The total CPU demand of all tasks satisfies

\[ \frac{1}{f_k} \sum_{i=1}^{n} \frac{C_i}{T_{pi}} \le 1. \tag{6.5} \]

The uniform frequency can be the lowest frequency that is no less than the total demand. It is

\[ \min \left\{ f : f \in \{ f_1, f_2, \ldots, f_k \} \ \text{and} \ f \ge \sum_{i=1}^{n} \frac{C_i}{T_{pi}} \right\}. \tag{6.6} \]
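The selection rule in (6.6) is a minimum over the discrete speed set. A minimal Python sketch follows, assuming the four Stargate speeds of Table 6.1 and hypothetical task demands given as (cycles per period, period in seconds) pairs.

```python
def uniform_frequency(tasks, speeds):
    """Lowest available speed that is no less than the total demand, as in (6.6).

    tasks: list of (cycles_per_period, period_seconds) pairs.
    speeds: available CPU speeds in cycles per second.
    Returns None if even the highest speed cannot serve the demand.
    """
    demand = sum(cycles / period for cycles, period in tasks)
    feasible = [f for f in speeds if f >= demand]
    return min(feasible) if feasible else None

if __name__ == "__main__":
    speeds = [100e6, 200e6, 300e6, 400e6]  # speeds listed in Table 6.1
    tasks = [(100e6, 0.5), (60e6, 1.0)]    # hypothetical: 260e6 cycles/s in total
    print(uniform_frequency(tasks, speeds))
```

Because the speed set is discrete, the selected frequency is generally higher than the demand (300 MHz for a demand of 260 MHz in this example), so some idle time remains.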

If each task used its allocated cycles exactly, this uniform speed would minimize

CPU energy due to the convex nature of the frequency-power relationship [15, 52].

Some energy is still wasted even though the CPU frequency is adapted, because the cycle allocation of a task is based on a percentile of the number of cycles demanded by its jobs. For example, if this percentile is 95%, then about 95% of its jobs will complete early. Early completion may eventually cause the CPU to be idle, which, in turn, results in energy waste since the device still consumes energy during the idle time.

Note that more energy can be saved by avoiding or reducing the idle time.

Therefore, PDVS dynamically adapts the CPU speed of each job execution in a way

that minimizes the total energy consumed during the job execution while bounding

the job’s execution time. The reason for bounding the execution time is not to miss

the deadline of the job or other jobs executed after the job. In other words, each

job should finish within a certain amount of time. Therefore the time budget is

allocated to each job. Specifically, if there are n concurrent tasks and each task is

73

Page 87: EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION ...

allocated Ci cycles per period Tpi, then the i-th task is allocated

Ti =Ci

∑ni=1

Ci

Tpi

, (6.7)

time units per period $T_{p_i}$ (i.e., for each of its jobs). That is, we distribute the time
among all tasks in proportion to their cycle demands. Note that if there is only a single
task, its time budget equals its period. Using joint allocation of cycles and
time, we can adapt the execution speed for each job as long as the job can use its
allocated cycles within its allocated time.
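As a small illustration of the time allocation in (6.7), the following Python sketch computes the per-job time budget of each task. The function name and the numbers are hypothetical.

```python
from typing import List, Sequence


def time_budgets(cycles: Sequence[float], periods: Sequence[float]) -> List[float]:
    """Per-job time budget T_i = C_i / sum_j (C_j / T_pj), following (6.7)."""
    total_demand = sum(c / t for c, t in zip(cycles, periods))
    return [c / total_demand for c in cycles]


# Single task: the budget should equal the task's period.
single = time_budgets([5.0e6], [0.050])

# Two hypothetical tasks sharing the CPU.
shared = time_budgets([4.0e6, 9.0e6], [0.040, 0.100])
print(single, shared)
```

The single-task case reproduces the remark in the text that the time budget then equals the period.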

How is the execution speed dynamically adapted for the jobs of each individual
task? Without loss of generality, we consider a specific task with cycle allocation $C$
and time allocation $T$ per period $T_p$. The basic idea behind PDVS is to minimize the
total energy consumed during each job execution while keeping the job's execution
time within its time budget. To do this, PDVS sets a speed for each of the cycles
allocated to the job. If a cycle $x$, $1 \le x \le C$, is executed at speed $f(x)$, its execution
time is $1/f(x)$. The energy consumed by the system during this time interval is

\[
\frac{1}{f(x)} \times p(f(x)) = \frac{p(f(x))}{f(x)}, \tag{6.8}
\]

where $p(f(x))$ is the total power consumed by the whole device at speed $f(x)$.
Because multimedia tasks demand cycles statistically, the cycle $x$ is executed only with
some probability, and its expected energy is

\[
F(x) \frac{p(f(x))}{f(x)}, \tag{6.9}
\]

where $F(x)$ is the tail of the distribution function of the number of cycles $X$ demanded by a job:

\[
F(x) = 1 - \Pr(X \le x). \tag{6.10}
\]
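The quantities in (6.9) and (6.10) can be estimated from observed per-job cycle counts. The sketch below is only an illustration: the sample values, the function names, and the cubic power model passed in as `power` are assumptions, not measurements from this work.

```python
from typing import Callable, Sequence


def tail_probability(samples: Sequence[float], x: float) -> float:
    """Empirical F(x) = 1 - Pr(X <= x), estimated from observed cycle demands."""
    return sum(1 for s in samples if s > x) / len(samples)


def expected_cycle_energy(samples: Sequence[float], x: float, speed: float,
                          power: Callable[[float], float]) -> float:
    """Expected energy of cycle x executed at `speed`, following (6.9)."""
    return tail_probability(samples, x) * power(speed) / speed


# Hypothetical cycle demands of four past jobs and a made-up power model p(f) = f^3.
demands = [10, 20, 30, 40]
energy = expected_cycle_energy(demands, x=20, speed=2.0, power=lambda f: f ** 3)
print(tail_probability(demands, 20), energy)
```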

By setting a speed for each of the cycles allocated to the job, the minimization of the
total energy consumed during the job execution, while bounding its execution time,
can be stated more formally as the speed adaptation problem

\[
\min \sum_{x=1}^{C} F(x) \frac{p(f(x))}{f(x)} + \left( T - \sum_{x=1}^{C} F(x) \frac{1}{f(x)} \right) \cdot P_{idle}, \tag{6.11}
\]

\[
\text{s.t.} \quad \sum_{x=1}^{C} \frac{1}{f(x)} \le T, \qquad f(x) \in \{f_1, f_2, \ldots, f_k\},\ 1 \le x \le C,
\]

where $T$ is the time budget allocated to the job and $P_{idle}$ is the device power when
the CPU is idle at the lowest speed.

In practice, solving the constrained optimization (6.11) directly is often not feasible,
for two major reasons. First, the number of needed cycles, $C$, may be very large for
multimedia tasks (e.g., in the millions). Consequently, the computational overhead of
solving the optimization problem may be unacceptably large. Second, we cannot set
a speed for individual cycles, since each speed change may take tens of microseconds
[62]. To reduce the cost, a piece-wise approximation technique is applied that groups
the allocated cycles and sets the same speed within each group. Specifically, we first
use a set of points, $b_1, b_2, \ldots, b_{k+1}$, to divide the allocated cycles into $k$ groups, each
with size $g_i$, $1 \le i \le k$. We then find a speed $f(b_i)$ for each group $[b_i, b_{i+1})$. In this
way, we rewrite the above constrained optimization as

\[
\min \sum_{i=1}^{k} g_i F(b_i) \frac{p(f(b_i))}{f(b_i)} + \left( T - \sum_{i=1}^{k} \frac{g_i F(b_i)}{f(b_i)} \right) \cdot P_{idle}, \tag{6.12}
\]

\[
\text{s.t.} \quad \sum_{i=1}^{k} g_i \frac{1}{f(b_i)} \le T, \qquad f(b_i) \in \{f_1, f_2, \ldots, f_k\},\ 1 \le i \le k.
\]

This leads to an optimization problem over a discrete set, which can always be solved
by brute-force search. Faster algorithms, such as gradient search,
may also be applied.
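The brute-force search mentioned above can be sketched in a few lines of Python. This is an illustrative enumeration of the grouped problem (6.12), not the implementation used in [62]; the function name, the cubic power model, and all numbers are assumptions.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def pdvs_brute_force(group_sizes: Sequence[float], tail_probs: Sequence[float],
                     frequencies: Sequence[float], power: Callable[[float], float],
                     time_budget: float, idle_power: float
                     ) -> Tuple[Optional[Tuple[float, ...]], Optional[float]]:
    """Enumerate every speed assignment and keep the feasible one with the least energy."""
    best_speeds, best_energy = None, None
    for speeds in product(frequencies, repeat=len(group_sizes)):
        # Constraint of (6.12): all allocated cycles must fit in the time budget.
        if sum(g / f for g, f in zip(group_sizes, speeds)) > time_budget:
            continue
        groups = list(zip(group_sizes, tail_probs, speeds))
        active = sum(g * F * power(f) / f for g, F, f in groups)
        busy_time = sum(g * F / f for g, F, f in groups)
        energy = active + (time_budget - busy_time) * idle_power
        if best_energy is None or energy < best_energy:
            best_speeds, best_energy = speeds, energy
    return best_speeds, best_energy


# Hypothetical example: two groups, two speeds, p(f) = f^3, no idle power.
speeds, energy = pdvs_brute_force(
    group_sizes=[2, 2], tail_probs=[1.0, 0.5], frequencies=[1.0, 2.0],
    power=lambda f: f ** 3, time_budget=3.0, idle_power=0.0,
)
print(speeds, energy)
```

The enumeration visits every combination of speeds, so its cost grows exponentially with the number of groups; it is practical only because the number of groups and speed levels is small.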

6.3.2 Cross-layer Adaptation for Single-Stream Video Encoding

In the following, we consider PDVS adaptation for a single video encoding task. In
Section 6.3.1, the PDVS adaptation algorithm was considered for the more generic
adaptation problem with multiple CPU tasks. Within the context of single-stream video
encoding, this PDVS adaptation problem can be further simplified. First, in our
system, video encoding is the dominant CPU task, while other system management
tasks require a very small number of cycles and can be considered constant
over the system running time. Second, because no delay is allowed, the time budget
for this single task is fixed, and the adaptation is performed at a coarse time
granularity equal to the encoding time of one video segment, denoted $T_{GoP}$. In this
way, Eq. (6.12) can be simplified as

\[
g_j \frac{p(f(b_i))}{f(b_i)} + \left( T_{GoP} - g_j \frac{1}{f(b_i)} \right) \cdot P_{idle}, \tag{6.13}
\]

\[
\text{s.t.} \quad g_j \frac{1}{f(b_i)} \le T_{GoP},\ j = 1, 2, \ldots, 7, \qquad f(b_i) \in \{f_1, f_2, \ldots, f_k\},\ 1 \le i \le k.
\]

Here $T_{GoP}$ is the time budget, which equals the total encoding time of the
video segment, and $g_j$ is the average computational complexity of the $j$-th cluster of video
segments. This is the solution for the single-task, no-delay video coding system. Note
that (6.13) represents the average energy consumption for encoding the
video segments that are classified into the $j$-th cluster. The total energy consumption
of the whole video sequence is given by

\[
P_e^B = \sum_{j=1}^{7} P_j \left( g_j \frac{p(f(b_i))}{f(b_i)} + \left( T_{GoP} - g_j \frac{1}{f(b_i)} \right) \cdot P_{idle} \right), \tag{6.14}
\]

\[
\text{s.t.} \quad g_j \frac{1}{f(b_i)} \le T_{GoP},\ j = 1, 2, \ldots, 7, \qquad f(b_i) \in \{f_1, f_2, \ldots, f_k\},\ 1 \le i \le k,
\]

where $P_j$ is the probability of the $j$-th cluster.

This is the energy consumption $P_e^B$ with cross-layer adaptation. If cross-layer
adaptation is not used, the expected energy consumption $P_e^A$ is given by

\[
P_e^A = \sum_{j=1}^{7} P_j \left( \frac{g_k^m}{f_k} \cdot p(f_k) \right) = T_{GoP}\, p(f_k), \quad j = 1, 2, \ldots, 7, \tag{6.15}
\]

\[
\text{s.t.} \quad \frac{g_k^m}{f_k} \approx T_{GoP}.
\]

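Let $g_i^m$ denote the maximum number of cycles in the period $T_{GoP}$ under clock speed $f_i$. The
energy saving ratio between the two cases, with and without cross-layer adaptation,
is

\[
\begin{aligned}
\frac{P_e^B}{P_e^A}
&= \frac{\sum_{j=1}^{7} P_j \left( g_j \frac{p(f(b_i))}{f(b_i)} + \left( T_{GoP} - g_j \frac{1}{f(b_i)} \right) \cdot P_{idle} \right)}{T_{GoP} \cdot p(f_k)} \\
&= \frac{1}{T_{GoP}} \sum_{j=1}^{7} P_j \left( \frac{g_j}{f(b_i)} \cdot \frac{p(f(b_i))}{p(f_k)} + \left( T_{GoP} - \frac{g_j}{f(b_i)} \right) \cdot \frac{P_{idle}}{p(f_k)} \right) \\
&= \sum_{j=1}^{7} P_j \left( \frac{g_j}{g_i^m} \cdot \frac{p(f(b_i))}{p(f_k)} + \left( 1 - \frac{g_j}{g_i^m} \right) \cdot \frac{P_{idle}}{p(f_k)} \right),
\end{aligned}
\tag{6.16}
\]

\[
\text{s.t.} \quad g_i^m \frac{1}{f(b_i)} \approx T_{GoP},\ j = 1, 2, \ldots, 7, \qquad f(b_i) \in \{f_1, f_2, \ldots, f_k\},\ 1 \le i \le k.
\]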

Consider the test video sequence used in Chapter 5, which has 3585
frames partitioned into 57 video segments. It is compressed with the P-R-D optimal
scaling encoder, which runs on a Stargate microprocessor. The Stargate's power
consumption model is listed in Table 6.1, and the cluster probabilities are shown in
Table 5.2. With the encoding results in Table 5.7 of Chapter 5, the real energy
saving ratio can be computed with (6.16), assuming that the power consumption
levels of the Stargate in running and idle modes are the same. The energy saving results
are shown in Tables 6.2 to 6.4. From these results, we can see that the real energy saving
ratio is not as good as the theoretical evaluation in Chapter 5. This is because the
system power consumption includes not only the CPU energy consumption but also the
energy consumed by other system components, which cannot be scaled by DVS.
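To make the last line of (6.16) concrete, the following Python sketch evaluates the ratio from per-cluster quantities. The load and power ratios are copied from the 37 dB (1) rows of Table 6.2; the uniform cluster probabilities are a placeholder assumption, because Table 5.2 is not reproduced in this chapter, and the treatment of idle power reflects one reading of the assumption stated above. The result is therefore illustrative and is not expected to match the R-Ratio column.

```python
from typing import Sequence


def energy_ratio(cluster_probs: Sequence[float], load_ratios: Sequence[float],
                 power_ratios: Sequence[float], idle_ratios: Sequence[float]) -> float:
    """Last line of (6.16): sum_j P_j * (load * power + (1 - load) * idle).

    load_ratios[j]  = g_j / g_i^m
    power_ratios[j] = p(f(b_i)) / p(f_k)
    idle_ratios[j]  = P_idle / p(f_k)
    """
    return sum(
        prob * (load * power + (1.0 - load) * idle)
        for prob, load, power, idle
        in zip(cluster_probs, load_ratios, power_ratios, idle_ratios)
    )


# Values from the 37 dB (1) rows of Table 6.2.
load_ratios = [0.856, 0.689, 0.844, 0.923, 0.796, 0.812, 0.995]
power_ratios = [0.857, 0.884, 0.857, 0.884, 1.0, 1.0, 0.884]
# ASSUMPTION: uniform probabilities stand in for the values of Table 5.2.
cluster_probs = [1.0 / 7] * 7
# ASSUMPTION: idle power equals running power at the selected speed.
idle_ratios = power_ratios

ratio = energy_ratio(cluster_probs, load_ratios, power_ratios, idle_ratios)
print(f"{ratio:.4f}")
```

Under the equal idle and running power assumption, the load ratio cancels out and the result reduces to the probability-weighted average of the power ratios.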


Case       Quantity    Cluster1       Cluster2       Cluster3       Cluster4       Cluster5       Cluster6       Cluster7       R-Ratio
37 dB (1)  g_j         9.1741 × 10^6  1.1255 × 10^7  9.0274 × 10^6  1.5083 × 10^7  1.7511 × 10^7  1.7847 × 10^7  1.6248 × 10^7
           g_j/g_i^m   0.856          0.689          0.844          0.923          0.796          0.812          0.995          0.9193
           f_k (MHz)   200            300            200            300            400            400            300
           p_i/p_k     0.857          0.884          0.857          0.884          1              1              0.884
37 dB (2)  g_j         9.1741 × 10^6  1.1255 × 10^7  9.0274 × 10^6  1.7524 × 10^7  1.7511 × 10^7  2.0118 × 10^7  2.0844 × 10^7
           g_j/g_i^m   0.856          0.689          0.844          0.797          0.796          0.915          0.948          0.9602
           f_k (MHz)   200            300            200            400            400            400            400
           p_i/p_k     0.857          0.884          0.8571         1              1              1              1

Table 6.2: The Energy Saving Ratio 1

Case       Quantity    Cluster1       Cluster2       Cluster3       Cluster4       Cluster5       Cluster6       Cluster7       R-Ratio
34 dB (1)  g_j         7.7220 × 10^6  7.7990 × 10^6  7.8556 × 10^6  1.1267 × 10^7  1.4371 × 10^7  1.4754 × 10^7  1.4893 × 10^7
           g_j/g_i^m   0.721          0.729          0.735          0.690          0.880          0.903          0.911          0.8760
           f_k (MHz)   200            200            200            300            300            300            300
           p_i/p_k     0.857          0.857          0.857          0.884          0.884          0.884          0.884
34 dB (2)  g_j         7.7220 × 10^6  7.7990 × 10^6  8.8660 × 10^6  1.4609 × 10^7  1.9672 × 10^7  1.9632 × 10^7  2.1972 × 10^7
           g_j/g_i^m   0.721          0.729          0.828          0.894          0.895          0.893          0.999          0.9248
           f_k (MHz)   200            200            200            300            400            400            400
           p_i/p_k     0.857          0.857          0.857          0.884          1              1              1

Table 6.3: The Energy Saving Ratio 2

Case       Quantity    Cluster1       Cluster2       Cluster3       Cluster4       Cluster5       Cluster6       Cluster7       R-Ratio
30 dB (1)  g_j         7.8002 × 10^6  8.3190 × 10^7  7.8587 × 10^6  8.9538 × 10^6  1.2106 × 10^7  1.1623 × 10^7  1.3818 × 10^7
           g_j/g_i^m   0.729          0.778          0.791          0.836          0.742          0.712          0.847          0.8684
           f_k (MHz)   200            200            200            200            300            300            300
           p_i/p_k     0.857          0.857          0.857          0.857          0.884          0.884          0.884
30 dB (2)  g_j         7.8002 × 10^6  8.3190 × 10^7  7.8587 × 10^6  1.0655 × 10^7  1.3056 × 10^7  1.4888 × 10^7  1.7642 × 10^7
           g_j/g_i^m   0.729          0.778          0.791          0.996          0.799          0.911          0.802          0.8764
           f_k (MHz)   200            200            200            200            300            300            400
           p_i/p_k     0.857          0.857          0.857          0.857          0.884          0.884          1

Table 6.4: The Energy Saving Ratio 3


Chapter 7

Energy-Aware Embedded Video Encoding System Design

In this chapter, we describe the design of an energy-aware embedded video encoding
system, named the DeerNet video sensor system. An embedded video communication
system for wildlife activity monitoring is expected to operate over an extended period
of time, from a few weeks to several months. Energy minimization of video encoding is
therefore critical. In this chapter, we introduce the tier architecture used in our
design. With this tier architecture, the energy spent on non-reducible power can be reduced.

In Chapter 6, we explained how the video encoding energy can be saved
through optimum complexity control of video encoders and cross-layer adaptation.
It should be noted that the results obtained in Chapter 5 and Chapter 6 differ
somewhat. This is because the former results are obtained in an ideal case, which assumes
that the circuit voltage and clock speed can be adjusted continuously for the whole
system and that the power consumed by the circuit is proportional to the
cube of the running speed. In fact, practical systems such as the Stargate consist of
many modules that implement a variety of system functions. Not all of these modules
can have their voltage and/or speed adjusted. Moreover, even a module
that can be adapted has some static power consumption. For these two reasons,
the power consumption of a real system is not proportional to the cube of the
running speed. The way the system is organized therefore offers another route
to more efficient power consumption, specifically for embedded
systems. The tier system architecture is one possible approach to saving even
more power.

7.1 Tier Architecture

We observe that the power consumption in a real system can be categorized into
two classes: reducible power and non-reducible power. Reducible power is the
amount of power that can be eliminated from a running system while maintaining
the ability to do computation. Non-reducible power, on the other hand,
cannot be eliminated from a running system, and it dominates the lifetime of the
battery (see the results in Chapter 5). Common sources of non-reducible power include
the power supply, on-board oscillators, memory and I/O buses, and the limited
range of frequency and voltage scaling [19].

The amount of non-reducible power, however, varies across platforms. Typically,
platforms are carefully optimized to provide their promised functionality at
the lowest possible energy cost, and platforms that provide less functionality have
smaller non-reducible power. For example, a laptop provides more functions than
a PDA, and the power consumption of the laptop is also higher than that of the PDA.
Fortunately, there is significant overlap in the functionality provided by high-power and
low-power platforms. Therefore, if the tasks running on the system have different
computational complexity, they can be assigned to different platforms, which
have different functionalities and different non-reducible power. The system is organized in
tiers: it is composed of a set of tiers, each with a set of capabilities and
a set of power modes. The system as a whole executes tasks by waking the tier that has
the capabilities to execute the task in the most efficient manner. Fig. 7.1 shows the
tier system architecture.

Figure 7.1: Tier Architecture

More than two tiers are possible; in our system, two different platforms,
Stargate and MICA, are combined into one system. The power consumption of such a
two-tier system is analyzed here. The system works as follows: the Stargate is
woken up by the MICA mote to capture video and then goes back to sleep, while the
MICA runs all the time to determine whether it needs to wake up the Stargate.

If $f_A^S$ is the fraction of time the Stargate tier spends awake, $P_A^S$ is the power it
expends while awake, $1 - f_A^S$ is the fraction of time it spends asleep, and
$P_S^S$ is the power it expends while suspended, and likewise $f_A^M$ is the fraction of
time the MICA tier spends awake, $P_A^M$ is the power it expends while awake, $1 - f_A^M$ is
the fraction of time it spends asleep, and $P_S^M$ is the power it expends while suspended,
then the power consumption of the tier system is

\[
P_T = f_A^S \cdot P_A^S + (1 - f_A^S) \cdot P_S^S + f_A^M \cdot P_A^M. \tag{7.1}
\]

Assuming a system that consists of only the Stargate, the power consumption of the
system is

\[
P_S = f_A^S \cdot P_A^S. \tag{7.2}
\]

The power saving ratio is

\[
\text{Ratio} = \frac{P_T}{P_S} = \frac{1600 \times 1/3 + 107 \times 2/3 + 45 \times 1}{1600 \times 1} \approx 0.406. \tag{7.3}
\]

Here the average power consumption of the Stargate is about 1600 mW in active
mode and 107 mW during sleep, and the average power consumption of the lower-tier MICA
is about 45 mW. We assume that the video device captures 8 hours of video data
per day, so the Stargate is awake for one third of the time.
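The arithmetic behind (7.1) to (7.3) can be checked with a short Python sketch. The function and variable names are illustrative; the power figures and the 8-hour duty cycle are the ones quoted in the text.

```python
def tier_power(f_awake_hi: float, p_awake_hi: float, p_sleep_hi: float,
               f_awake_lo: float, p_awake_lo: float) -> float:
    """Average power of the two-tier system, following (7.1), in mW."""
    return (f_awake_hi * p_awake_hi
            + (1.0 - f_awake_hi) * p_sleep_hi
            + f_awake_lo * p_awake_lo)


# Stargate: 1600 mW awake, 107 mW asleep, awake 8 of 24 hours.
# MICA: 45 mW, always awake.
p_tier = tier_power(8 / 24, 1600.0, 107.0, 1.0, 45.0)

# Stargate-only system, following (7.2), which has to stay awake all the time.
p_single = 1.0 * 1600.0

ratio = p_tier / p_single
print(f"tier system: {p_tier:.1f} mW, ratio: {ratio:.3f}")
```

With these figures the two-tier system draws roughly 650 mW on average, about 0.41 of the power of the always-on Stargate.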


7.2 System Design

The design of a power-aware embedded system, e.g., DeerNet video sensor, is com-

posed of three parts: the hardware, the underlying system architecture, and the

model for distributing applications across the tiers. In general, the design is similar

to many distributed systems; each tier is under autonomous control while decisions

are made in a distributed manner. Client applications reside at the most powerful

tier, and tasks that support those applications are distributed among the various

tiers.

7.2.1 Hardware

The DeerNet video sensor is designed in a strictly hierarchical manner, and the
higher tier is more powerful than the lower tier. The two tiers communicate with each
other via a local port, and both are connected to a
common power source. Moreover, the lower tier has the ability to draw its superior tier
out of suspended mode. An overview diagram of our design is shown in Fig. 7.2.

The higher tier is based on the Crossbow’s Stargate platform, which has an

XScale PXA255 CPU (400 MHz) with 32MB flash memory and 64MB SDRAM.

PCMCIA and Compact Flash connectors are available on the main board. The

Stargate also has a daughter board with Ethernet, USB and serial connectors. A

Logitech QuickCam Pro 4000 webcam is connected through a USB connection for

video capture [11].

The MICA plays the lower-tier role and is built around an ATmega128L
microcontroller. Data, measurements, and other user-defined information are
stored in a 4-Mbit serial flash (Atmel AT45DB041), which is connected to one of the
USARTs on the ATmega128L [12]. This is a very low-power platform; its average
power consumption is about 45 mW.

Figure 7.2: The Video Sensor Tiers

The two tiers are connected through a 51-pin port, which includes a local link
and a wakeup control (described later).

7.2.2 Software

In the higher tier, the Stargate, the Linux operating system manages all the tier resources,
including power management and video capture. The video capture module
runs the new P-R-D optimized encoder, which compresses the video
sequence and stores it on the CF card. The power manager can put the Stargate to sleep
when necessary. Once the Stargate is in the sleep state, it waits for the wakeup
signal from the MICA at the lower tier.

The MICA at the lower tier performs the following tasks: determining the motion signal that
reflects the deer's motion state, recording the deer's motion state, communicating with
the higher tier, and sending the wakeup signal to the higher-tier Stargate. All tasks running
on the MICA are controlled by an operating system called TinyOS. The software diagram
is shown in Fig. 7.3.

Figure 7.3: Software Structure

7.3 Power Management

7.3.1 Signal Design

The goal of power management is to put the higher-tier Stargate to sleep at the proper
time or in the proper situation. What is a proper time or situation? The DeerNet video
sensor is used to track a deer's actions, and some situations do not need to be recorded,
such as after the deer goes to sleep, when it repeats the same action, and at night. In
these situations, the sensor needs to go to sleep. A timer is implemented on the
Stargate; it tracks the current time and how long the current period will last.
Once the timer indicates that night is coming, the Stargate is put
to sleep. Another signal comes from the MICA, which determines the motion state. If it
determines that the deer is not moving, the MICA sends a signal that tells the
Stargate to stop capturing video and go to sleep.
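The sleep decision described above can be summarized in a small Python sketch. It is purely illustrative: the function name and the night-time boundaries (20:00 to 06:00) are assumptions, since the text does not specify them, and the repeated-action case is left out.

```python
def stargate_should_sleep(hour: int, deer_moving: bool,
                          night_start: int = 20, night_end: int = 6) -> bool:
    """Return True when the higher tier should be put to sleep.

    hour        -- current hour (0 to 23) from the Stargate timer
    deer_moving -- motion state reported by the MICA
    night_start, night_end -- hypothetical night boundaries, not from the thesis
    """
    is_night = hour >= night_start or hour < night_end
    return is_night or not deer_moving


print(stargate_should_sleep(12, True))   # daytime, deer moving
print(stargate_should_sleep(12, False))  # daytime, deer at rest
print(stargate_should_sleep(23, True))   # night
```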

The clock signal can be obtained by the Stargate itself; it can be implemented with a
read-clock command. The motion signal comes from the MICA through the local link,
which is a UART carried on the 51-pin connector that connects the Stargate and the
MICA. The signal is transmitted to the Stargate as a message. The video capture signal
triggers the start of capture. The signals described so far are used to put the Stargate to sleep.

How is the wakeup signal generated? It must come from the MICA, because the
Stargate is sleeping. Waking up the Stargate takes two steps. The first step is to set
a Stargate GPIO port to an interrupt-enabled state. Next, the MICA
triggers this GPIO port. Once the GPIO port is triggered by the MICA, an interrupt
routine is executed to wake up the Stargate. Fig. 7.4 shows the 51-pin connector
definition.

The local link is provided by a UART port on pins 19 and 20, which communicates
with the MICA. Pin 5 connects to the PXA255 general-purpose input/output port 1 (GPIO1),
which can generate an interrupt signal to wake up the Stargate.

7.3.2 State Control

Controlling the Stargate state requires initializing the corresponding PXA255 registers, e.g.,
the interrupt control registers (ICMR, ICLR, ICCR, ICIP, ICPR) and the power manager
control registers (PMCR, PSSR, PWER, PRER, etc.) [24]. It is difficult to read
or write such registers directly. Fortunately, the operating system provides such
control commands through the /proc file system. In this way, sleep and wakeup
control become convenient.

Figure 7.4: Definition of the Connector Pins

Putting the Stargate to sleep: the command is

    echo 1 > /proc/sys/pm/suspend

This command puts the Stargate to sleep, where it waits to be woken up. In
order to wake up a sleeping Stargate, the wakeup signal needs to be configured before the
Stargate goes to sleep. Because we use pin 5 (GP1 M-INT1) as the wakeup signal, the
corresponding register has to be set with the command

    echo w1 > /proc/platx/gpio/GPCTL

We also set a timer to guard against failure of the wakeup signal. If the wakeup signal
fails, the timer can still wake up the Stargate. The command to set the timer is

    echo CONFIG_WAKEUPTIME=$time > /proc/platx/config
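The three commands can be combined into one sequence. The Python sketch below writes the same values to the same /proc entries in the order implied by the text (wakeup source first, fallback timer next, suspend last). The function name and the `root` parameter are hypothetical; `root` exists only so that the sequence can be tried against a scratch directory instead of a real Stargate.

```python
from pathlib import Path


def arm_and_suspend(wakeup_time: int, root: Path = Path("/")) -> None:
    """Arm the GPIO wakeup, set the fallback timer, then suspend.

    wakeup_time -- value written as CONFIG_WAKEUPTIME (units as expected by the platform)
    root        -- hypothetical prefix for dry runs; "/" targets the real /proc tree
    """
    def write(relative_path: str, value: str) -> None:
        # Mirrors `echo value > path`: echo appends a trailing newline.
        (root / relative_path).write_text(value + "\n")

    write("proc/platx/gpio/GPCTL", "w1")                          # arm pin 5 as wakeup source
    write("proc/platx/config", f"CONFIG_WAKEUPTIME={wakeup_time}")  # fallback timer
    write("proc/sys/pm/suspend", "1")                             # suspend last


# Example call on the device (not executed here): arm_and_suspend(600)
```

Suspending last matters because, as noted above, the wakeup sources have to be configured before the Stargate goes to sleep.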


Chapter 8

Conclusion and Future Work

8.1 Conclusion

In this work, based on power-rate-distortion (P-R-D) optimization, we develop a

new approach for energy minimization for delay-tolerant video communication over

portable devices. We have developed a parametric video encoding architecture which

is fully scalable in power consumption. We have successfully extended the tradi-

tional R-D analysis by considering another dimension, the power consumption, and

established the P-R-D analysis framework for embedded video encoding and com-

munication under energy constraints. Using the P-R-D model, given a power supply

level and a bit rate, the power-scalable video encoder is able to find the best config-

uration of complexity control parameters to maximize the video quality. The P-R-D

analysis establishes a theoretical basis and provides a practical guideline in system

design and performance optimization for wireless video communication under energy

constraints.

Theoretically, we demonstrated that extending the traditional rate-distortion (R-D)
analysis to P-R-D analysis gives us another dimension of flexibility in resource

allocation and performance optimization. We have analyzed the energy saving per-

formance of P-R-D optimization. We have also developed an adaptive scheme to

estimate the P-R-D model parameters and perform online energy optimization and

control for real-time video compression. Our simulation results show that, for typical

videos with non-stationary scene statistics, using the proposed P-R-D optimization

technology, the video encoding energy can be significantly reduced, especially for

delay-tolerant video communication applications where the per-bit energy cost of

wireless transmission is relatively low. This has a significant impact on energy-

efficient portable video communication system design.

8.2 Future Work

In our future work, we shall study and evaluate resource allocation and energy
minimization for real-time video encoding and communication on embedded computing
platforms with dynamic voltage control. We shall also study how the P-R-D analysis
could be used for joint hardware and encoder energy minimization. In addition,
the following issues need to be further investigated: (1) The training-classification
approach does not work well for video segments with significant changes in scene
activity. (2) We should exploit the communication delay as a system resource to
further reduce the overall energy consumption. With such delay, multiple video
encoding tasks coexist in the system, which might allow the system to operate in a more
efficient state, because it can then be optimized with (6.12).


References

[1] MPEG-2 video test model 5. ISO/IEC JTC1/SC29/WG11 MPEG93/457 (Apr 1993).

[2] A. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits Vol. 27 (Apr 1992), 473–484.

[3] AMD. Mobile AMD Athlon 4 processor model 6 CPGA data sheet. http://www.amd.com (Nov 2001).

[4] A. M. Tourapis, O. C. Au, and M. L. Liou. Predictive motion vector field adaptive search technique (PMVFAST) - enhancing block based motion estimation. Proceedings of Visual Communications and Image Processing 2001 (Jan 2001).

[5] A. Ortega and K. Ramchandran. Rate-distortion methods for image and video compression: An overview. IEEE Signal Processing Magazine Vol. 15 No. 6 (Nov 1998), 23–50.

[6] A. R. Lebeck, X. Fan, H. Zeng, and C. S. Ellis. Power aware page allocation. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge MA (Nov 2000).

[7] B. Erol, F. Kossentini, and H. Alnuweiri. Efficient coding and mapping algorithms for software-only real-time video coding at low bit rates. IEEE Transactions on Circuits and Systems for Video Technology Vol. 10 (Sep 2000), 843–856.

[8] B. Li and K. Nahrstedt. A control-based middleware framework for quality of service adaptations. IEEE J. Select. Areas Commun. Vol. 17 (Sep 1999), 1632–1650.

[9] C. Efstratiou, A. Friday, N. Davies, and K. Cheverst. A platform supporting coordinated adaptation in mobile systems. In Proceedings of 4th IEEE Workshop on Mobile Computing Systems and Applications, Callicoon NY (Jun 2003).

[10] C. Hughes, J. Srinivasan, and S. Adve. Saving energy with architectural and frequency adaptations for multimedia applications.

[11] Crossbow Technology, Inc. Stargate developer's guide.

[12] Crossbow Technology, Inc. MPR-MIB users manual.

[13] D. G. Sachs, S. Adve, and D. L. Jones. Cross-layer adaptive video coding to reduce energy on general-purpose processors. Proceedings of the International Conference on Image Processing (ICIP '03), Barcelona Spain (Sep 2003).

[14] D. G. Sachs, W. Yuan, C. J. Hughes, A. F. Harris, S. V. Adve, D. L. Jones, R. H. Kravets, and K. Nahrstedt. GRACE: A cross-layer adaptation framework for saving energy. Sidebar, IEEE Computer, special issue on Power-Aware Computing Vol. 10 (Dec 2003), 50–51.

[15] D. Mosse, H. R., and P. Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proc. of 22nd IEEE Real-Time Systems Symposium (Dec 2001).

[16] D. S. Turaga, M. van der Schaar, and B. Pesquet-Popescu. Complexity scalable motion compensated wavelet video encoding. IEEE Transactions on Circuits and Systems for Video Technology Vol. 15 (Aug 2005), 982–993.

[17] B. et al. Agile application-aware adaptation for mobility. In Proceedings of 16th Symposium on Operating Systems Principles, Saint Malo France (Dec 1997).

[18] P. et al. The emergence of networking abstractions and techniques in TinyOS. In Proceedings of First Symposium on Networked System Design and Implementation, San Francisco CA (Mar 2004).

[19] G. Chinn, S. Desai, E. Distefano, K. Ravichandran, and S. Thakkar. Mobile PC platforms enabled with Intel Centrino mobile technology. Intel Technology Journal Vol. 7 (May 2003).

[20] H. Zeng, X. Fan, C. Ellis, A. Lebeck, and A. Vahdat. ECOSystem: Managing energy as a first class operating system resource. In Proceedings of 10th Intl. Conf. on ASPLOS, San Jose CA (Oct 2002).

[21] I. M. Pao and M. T. Sun. Statistical computation of discrete cosine transform in video encoders. Journal of Visual Communication and Image Representation Vol. 9, No. 2 (Jun 1998), 163–170.

[22] AMD, Inc. AMD PowerNow!(TM) technology platform design guide for embedded processors. http://www.amd.com/epd/processors.

[23] Intel, Inc. Intel XScale technology.

[24] Intel. Intel PXA255 processor: Developer manual.

[25] Intel. Pentium M processor. http://developer.intel.com/design/mobile/datashts/261203.pdf (Apr 2004).

[26] ITU-T. Video coding for low bit rate communications. ITU-T Recommendation H.263 version 1 and version 2 (Jan 1998).

[27] J. Chen and K. J. R. Liu. Low-power architectures for compressed domain video coding co-processor. IEEE Transactions on Multimedia Vol. 2 (Jun 2000), 111–128.

[28] J. Flinn, E. de Lara, M. Satyanarayanan, D. S. Wallach, and W. Zwaenepoel. Reducing the energy usage of office applications. In Proceedings of Middleware 2001, Heidelberg Germany (Nov 2001).

[29] J. Lorch and A. Smith. Improving dynamic voltage scaling algorithms with PACE. Proceedings of the ACM SIGMETRICS 2001 Conference (Jun 2001).

[30] J. Lorch and A. Smith. Operating system modifications for task-based speed and voltage scheduling. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, San Francisco CA (May 2003).

[31] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi. Video coding with H.264/AVC: Tools, performance, and complexity. IEEE Circuits and Systems Magazine (First Quarter 2004).

[32] J. Villasenor, C. Jones, and B. Schoner. Video communications using rapidly reconfigurable hardware.

[33] K. Hyungjoon, N. Kamaci, and Y. Altunbasak. Low-complexity rate-distortion optimal macroblock mode selection and motion estimation for MPEG-like video coders. IEEE Transactions on Circuits and Systems for Video Technology Vol. 15 (Jul 2005), 823–834.

[34] L. Benini, A. Bogliolo, and G. D. Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on VLSI Systems Vol. 8 (Jun 2000).

[35] M. Mesarina and Y. Turner. Reduced energy decoding of MPEG streams. In Proceedings of SPIE Multimedia Computing and Networking Conference, San Jose CA (Jan 2002).

[36] P. A. Chou and A. Sehgal. Rate-distortion optimized receiver-driven streaming over best-effort networks. Packet Video Workshop, Pittsburgh PA (Apr 2002).

[37] Web page. http://www.abiresearch.com/products/market_research/mobile_broadcast_video.

[38] Web page. Panel report of NSF workshop on sensors for environmental observatories. Seattle, WA, Nov. 30 to Dec. 2, 2004. http://www.wtec.org/seo.

[39] P. Agrawal, J-C. Chen, S. Kishore, P. Ramanathan, and K. Sivalingam. Battery power sensitive video processing in wireless networks. Proceedings IEEE PIMRC'98, Boston (Sep 1998).

[40] P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power embedded operating system. Proceedings of 18th Symposium on Operating System Principles, Banff, Canada (Oct 2001).

[41] P. Pillai and K. G. Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of 18th Symposium on Operating Systems Principles, Banff, Canada (Oct 2001).

[42] R. Min, T. Furrer, and A. Chandrakasan. Dynamic voltage scaling techniques for distributed microsensor networks. IEEE Computer Society Workshop on VLSI (Apr 2000), 43–46.

[43] R. Rajkumar, C. Lee, J. Lehoczky, and D. Siewiorek. A resource allocation model for QoS management. In Proceedings of 18th IEEE Real-Time Systems Symposium, San Francisco CA (Dec 1997).

[44] S. Banachowski and S. Brandt. The best scheduler for integrated processing of best-effort and soft real-time processes.

[45] S. Gurumurthi, A. Sivasubramaniam, and M. Kandemir. DRPM: Dynamic speed control for power management in server class disks. In Proceedings of 30th Annual International Symposium on Computer Architecture, San Diego CA (Jun 2003).

[46] S. Iyer, L. Luo, R. Mayo, and P. Ranganathan. Energy-adaptive display system designs for future mobile environments. In Proceedings of International Conference on Mobile Systems, Applications and Services, San Francisco CA (May 2003).

[47] S.M.Akramullah, I.Ahmad, and M.L.Liou. Optimization of h.263 videoencoding using a single processor computer: performance tradeoffs and bench-marking. IEEE Transaction on Circuits and System for Video TechnologyVol.11 (Aug 2001), 901 – 915.

[48] S.Mohapatra, and N.Venkatasubtramanian. Power-aware reconfiguremiddleware. In Proceedings of IEEE 23nd International Conference on Dis-tributed Computing Systems Providence RI (May 2003).

[49] S.V.Adve, A.F.Harris, C.J.Hughes, D.L.Jones, R.H.Kravets,

K.Nahrstedt, D.G.Sachs, R.Sasanka, J.Srinivasan, and W.Yuan.The illinois grace project: Global resource adaptation through cooperation.Proceedings of the Workshop on self-Healin ,Adaptive and self-managed System(June 2002).

[50] T.Berger. Rate Distortion Theory. Prentice Hall, Englewood Cliffs, NJ, 1984.

[51] T.Burd, and R.Broderson. Processor design for portable systems. Journalof VLSI Signal Processing Vol.13 No.2 (Aug 1996), 203–222.

[52] T.Ishihara, and H.Yasuura. Voltage scheduling problem for dynamicallyvariable voltage processors. In Proc. of Intl. Symp. on Low-Power Electronicsand Design (1998).

[53] T.Pering, T.Burd, and R.Brodersen. Voltage scheduling in the lparm microprocessor system.

[54] T.Sikora. The mpeg-4 video standard verification model. IEEE Transactions on Circuits and Systems for Video Technology Vol.7 (Feb 1997), 19–31.

[55] T.Wiegand. Text of committee draft of joint video specification (itu-t rec. h.264 | iso/iec 14496-10 avc). Document JVT-C167 3rd JVT Meeting Fairfax Virginia USA (May 2002).

[56] V.Akella, M.van der Schaar, and W.-F.Kao. Proactive energy optimization algorithms for wavelet-based video codecs on power-aware processors. Proceedings of IEEE International Conference on Multimedia and Expo (Jul 2005), 566–569.

[57] V.Grassi, and R.Mirandola. Derivation of markov models for effectiveness analysis of adaptable software architectures for mobile computing. IEEE Transactions on Mobile Computing vol.2 (Jun 2003).


[58] V.Raghunathan, M.Srivastava, T.Pering, and R.Want. Stargate: Energy management techniques. Network and Embedded System Lab (NESL) and Ubiquirt SRP, Intel Research.

[59] V.Raghunathan, P.Spanos, and M.Srivastava. Adaptive power-fidelity in energy aware wireless embedded systems. In Proceedings of IEEE Real Time Systems Symposium London UK (Dec 2001).

[60] W.P.Burleson, P.Jain, and S.Venkatraman. Dynamically parameterized architecture for power-aware video coding: Motion estimation and dct. Proceedings of the Second USF International Workshop on Digital and Computational Video (2001).

[61] W.Yuan, and K.Nahrstedt. Integration of dynamic voltage scaling and soft real-time scheduling for open mobile systems. In Proceedings of 12th International Workshop on Network and OS Support for Digital Audio and Video Miami Beach FL (May 2002).

[62] W.Yuan, and K.Nahrstedt. Practical voltage scaling for mobile multimedia device. MM’04 (Oct 2004).

[63] X.Lu, Y.Wang, and E.Erkip. Power efficient h.263 video transmission over wireless channels. Proceedings of 2002 International Conference on Image Processing (Sep 2002).

[64] Z.-L.He, C.-Y.Tsui, K.-K.Chan, and M.L.Liou. Low-power vlsi design for motion estimation using adaptive pixel truncation. IEEE Transactions on Circuits and Systems for Video Technology vol.10 (Aug 2000).

[65] Z.He, and S.K.Mitra. A unified rate-distortion analysis framework for transform coding. IEEE Transactions on Circuits and Systems for Video Technology vol.11 (Dec 2001), 1221–1236.

[66] Z.He, Y.Liang, L.Chen, I.Ahmad, and D.Wu. Power-rate-distortion analysis for wireless video communication under energy constraint. IEEE Transactions on Circuits and Systems for Video Technology (May 2005).
