EMBEDDED SYSTEM DESIGN AND
POWER-RATE-DISTORTION OPTIMIZATION FOR
VIDEO ENCODING UNDER ENERGY CONSTRAINTS
A Thesis presented to the Faculty of the Graduate School
University of Missouri-Columbia
In Partial Fulfillment
Of the Requirement for the Degree
Master of Science
by
WENYE CHENG
Prof. Zhihai He, Thesis Supervisor
AUGUST 2007
The undersigned, appointed by the Dean of the Graduate School, have examined the thesis entitled
EMBEDDED SYSTEM DESIGN AND POWER-RATE-DISTORTION OPTIMIZATION FOR VIDEO
ENCODING UNDER ENERGY CONSTRAINTS
Presented by Wenye Cheng
A candidate for the degree of Master of Science
And hereby certify that in their opinion it is worthy of acceptance.
Professor Zhihai He
Professor Justin Legarsky
Professor Ye Duan
To my wife Ming Qian and daughter Xi Cheng.
—–Without their love, support and sacrifices, this work could not have been
accomplished.
ACKNOWLEDGEMENTS
I would hereby like to whole-heartedly thank my advisor, Prof. Zhihai He, for
providing excellent guidance throughout the course of this work.
I would like to thank the members of my thesis committee, Prof. Justin Legarsky and
Prof. Ye Duan, for their thorough review of this thesis. I express my deepest and
most sincere gratitude to my committee members.
I would also like to thank Mr. Jim Fischer for his assistance in setting up the
experiment environment of this thesis. He was very patient and cooperative every
time I asked for his help.
To my colleagues, graduate students Xi Chen, Jay Eggert, and Xiwen Zhao, I
would like to express my thanks for their help in experiments. In particular, I express
my thanks to Xi Chen for his great help in our collaboration on the energy-aware
video encoder design.
I would also like to thank York Chung for his help during my thesis writing. With-
out his help, typesetting the thesis would have taken much longer.
Abstract
Wireless video communication over portable devices has become the driving tech-
nology of many important applications, experiencing dramatic market growth and
promising revolutionary experiences in personal communication, gaming, entertain-
ment, military, security, environment monitoring, and more. Portable devices are
powered by batteries. Video encoding schemes are often computationally intensive
and energy-demanding, even after being fully optimized with existing software and
hardware energy-minimization techniques. As a result, the operational lifetime of
current portable video systems, such as handheld video devices, is still very short,
mostly in the range of a few hours. Therefore, one of the central challenging issues in
portable video communication system design is to minimize the energy consumption
of video encoding so as to extend the operational lifetime of devices. In this work,
we develop an operational power-rate-distortion (P-R-D) approach to minimizing
the video encoding energy under rate-distortion constraints. We will demonstrate
that extending the traditional rate-distortion analysis to P-R-D analysis will give
us another dimension of flexibility in resource allocation and performance optimiza-
tion for wireless video communication over portable devices. Theoretically, we will
analyze the energy saving gain of P-R-D optimization. Practically, we will develop
an adaptive scheme to estimate the P-R-D model parameters and perform on-the-
fly energy optimization for real-time video compression. Our results show that, for
typical videos with non-stationary statistics, using the proposed P-R-D optimiza-
tion technology, the encoder energy consumption can be significantly reduced. This
has many important applications in energy-efficient portable video communication
system design.
Contents
ACKNOWLEDGEMENTS ii
Abstract iii
List of Figures viii
List of Tables xi
1 Introduction 1
1.1 Major Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Background and Related Work 5
2.1 Video Coding Complexity . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Dynamic Scalable Voltage . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Joint Hardware and Application Adaptation . . . . . . . . . . . . . . 10
2.4 Importance of This Work . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Energy-Scalable Video Encoder Design and Operational P-R-D
Analysis 14
3.1 An Operational Approach to P-R-D Analysis . . . . . . . . . . . . . . 14
3.2 Complexity Control Parameters . . . . . . . . . . . . . . . . . . . . . 17
3.2.1 The Complexity Control Parameter νX . . . . . . . . . . . . . 17
3.2.2 The Complexity Control Parameter νY . . . . . . . . . . . . . 19
3.3 P-R-D Modeling for Energy-Aware Video Encoding . . . . . . . . . . 21
3.4 Analytical P-R-D Models . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 Energy Saving Analysis 33
4.1 Theoretical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2 Training-Classification Approach to P-R-D Video Encoder Control . . . 39
5 Power-Rate-Distortion Control for Energy-Aware Video Encoding 45
5.1 A Training-Classification Approach to P-R-D Video Encoder Control 45
5.2 Lagrangian Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 P-R-D Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.4 Summary of Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6 Joint Hardware and Video Encoder Adaptation 67
6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.2 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.1 CPU Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.2.2 Task Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.3 Application-Layer Video Encoder Adaptation . . . . . . . . . . . . . 71
6.3.1 Practical Dynamic Voltage Scaling (PDVS) . . . . . . . . . . 72
6.3.2 Cross-layer Adaptation for Single-Stream Video Encoding . . 76
7 Energy-Aware Embedded Video Encoding System Design 80
7.1 Tier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.2.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.2.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.3 Power Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.3.1 Signal Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.3.2 State Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8 Conclusion and Future Work 90
8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
References 92
List of Figures
2.1 Generalized block diagram of a hybrid video encoder . . . . . . . . . 6
2.2 The framework of the GRACE . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Power consumption model with DVS. . . . . . . . . . . . . . . . . . . 16
3.2 (a)The P-R-D curve; (b)The D-P curves at different bit rates. . . . . 23
3.3 Akiyo (a)The P-R-D curve; (b)The D-P curves at different bit rates. . 25
3.4 News (a)The P-R-D curve; (b)The D-P curves at different bit rates. . 26
3.5 Salesman (a)The P-R-D curve; (b)The D-P curves at different bit rates. 27
3.6 Car (a)The P-R-D curve; (b)The D-P curves at different bit rates. . . 28
3.7 Foreman (a)The P-R-D curve; (b)The D-P curves at different bit rates. 29
3.8 Coastguard (a)The P-R-D curve; (b)The D-P curves at different bit
rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.9 Football (a)The P-R-D curve; (b)The D-P curves at different bit rates. 31
4.1 The relationship of λ and λF . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 The common video scene estimation parameter λF . . . . . . . . . . . 42
4.3 A part of the common video scene estimation parameter λF . . . . . . 42
4.4 A part of the common video scene estimation parameter λF . . . . . . 43
4.5 The distribution of scene activity parameters of all video segments. . 44
5.1 The discrete Lagrangian Optimization principle . . . . . . . . . . . . 47
5.2 The multiplier λ, rate constraints and complexity at D = 38 dB . . . . 52
5.3 The multiplier λ, rate constraints and complexity at D = 37 dB . . . . 53
5.4 The multiplier λ, rate constraints and complexity at D = 36 dB . . . . 53
5.5 The multiplier λ, rate constraints and complexity at D = 35 dB . . . . 54
5.6 The multiplier λ, rate constraints and complexity at D = 34 dB . . . . 54
5.7 The multiplier λ, rate constraints and complexity at D = 33 dB . . . . 55
5.8 The multiplier λ, rate constraints and complexity at D = 32 dB . . . . 55
5.9 The multiplier λ, rate constraints and complexity at D = 31 dB . . . . 56
5.10 The multiplier λ, rate constraints and complexity at D = 30 dB . . . . 56
5.11 The multiplier λ, rate constraints and complexity at D = 29 dB . . . . 57
5.12 Comparison of non-scaling and scaling coding at D = 37 dB, λ = 300 . 62
5.13 Comparison of non-scaling and scaling coding at D = 37 dB, λ = 2000 . 63
5.14 Comparison of non-scaling and scaling coding at D = 34 dB, λ = 300 . 63
5.15 Comparison of non-scaling and scaling coding at D = 34 dB, λ = 2000 . 64
5.16 Comparison of non-scaling and scaling coding at D = 30 dB, λ = 300 . 64
5.17 Comparison of non-scaling and scaling coding at D = 30 dB, λ = 2000 . 65
7.1 Tier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.2 The Video Sensor Tiers . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.3 Software structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 Definition of the Connector Pins . . . . . . . . . . . . . . . . . . . . . 88
List of Tables
2.1 CPU Occupancy (In Percentage) of the Major Encoding Function . . 8
4.1 The point value for categorizing Video Segment . . . . . . . . . . . . 41
5.1 Sample choices of the multiplier λ. . . . . . . . . . . . . . . . . . . . . 52
5.2 The Probability of the Fi . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3 The Lagrangian multiplier λ and corresponding QoS and rate constraint 59
5.4 The Lagrangian multiplier λ and corresponding PSNR ≈ 37 dB and
rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.5 The Lagrangian multiplier λ and corresponding PSNR ≈ 34 dB and
rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.6 The Lagrangian multiplier λ and corresponding PSNR ≈ 34 dB and
rate constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.7 The Energy Saving Ratio . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1 Stargate Power consumption. . . . . . . . . . . . . . . . . . . . . . . 70
6.2 The Energy Saving Ratio 1 . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3 The Energy Saving Ratio 2 . . . . . . . . . . . . . . . . . . . . . . . . 79
6.4 The Energy Saving Ratio 3 . . . . . . . . . . . . . . . . . . . . . . . . 79
Chapter 1
Introduction
Wireless video communication over portable devices has become the driving tech-
nology of many important applications, experiencing dramatic market growth and
promising revolutionary experiences in personal communication, gaming, entertain-
ment, military, security, environment monitoring, and more [37,38]. Portable devices
are powered by batteries. Video encoding schemes are often computationally inten-
sive and energy-demanding, even after being fully optimized with existing software
and hardware energy-minimization techniques [14, 63]. As a result, the operational
lifetime of current portable video systems, such as handheld video devices, is still
short, mostly in the range of a few hours. This has become a bottleneck for techno-
logical progress in portable video electronics.
Video encoding is computationally intensive and energy-consuming. However, a
mobile video application system, powered by batteries, has limited energy supply
for the video data processing. One of the central challenging issues in such system
design is to minimize the energy consumption of video data processing so as to
extend the operational lifetime of the system while supporting multimedia Quality
of Service (QoS) requirements. In energy-aware video encoding system design, the
energy consumption of video data compression should be minimized while satisfying
the same QoS requirements.
In the age of desktop computing and wired communication, people worried about
bits, the storage space or transmission bandwidth. Therefore, the ultimate goal in
this type of video communication system design is to optimize video quality under
rate constraints. To analyze, model, control, and optimize the performance of a
signal processing and communication system under rate constraints, rate-distortion
(R-D) theories and algorithms have been developed [5,36,50,65]. With recent tech-
nological advances in circuit design and wireless communication, storage space and
network bandwidth have experienced dramatic growth, having been improved by
hundreds of times during the past decade. Currently, in many portable communica-
tion applications, energy has become a much more scarce and critical resource than
bandwidth or storage space. Therefore, how to incorporate the energy consumption
into the existing R-D performance analysis framework so as to optimize the video
communication system performance under rate and energy constraints emerges as a
new research task.
In this work, we study the energy consumption of video encoders and incor-
porate the third dimension of power consumption into existing rate-distortion (R-
D) analysis framework so as to establish a power-rate-distortion (P-R-D) analysis
framework for energy-aware video encoding. More specifically, we first develop an
energy-scalable video encoder which is fully scalable in energy consumption. We
then develop an operational approach to study its P-R-D behavior. Both theoret-
ically and practically, we demonstrate that with the proposed energy-scalable
video encoder design and P-R-D optimization, we are able to significantly reduce
the energy consumption of video encoding over portable video devices.
1.1 Major Contributions
The major contributions of this work include:
1. Developing an energy-aware video encoding system for wildlife behavior mon-
itoring. Studying the power consumption of video encoding systems and devel-
oping a video encoding scheme which is fully scalable in energy consumption.
2. Developing an operational approach to modeling the P-R-D behavior of an
energy-scalable video encoder. Developing an analytical P-R-D model. De-
veloping a scheme to estimate P-R-D model parameters from video encoding
statistics.
3. Studying the characteristics of P-R-D functions of various test video segments.
Developing a training-classification approach for real-time P-R-D control using
P-R-D clustering.
1.2 Thesis Organization
The rest of the thesis is organized as follows:
Chapter 2 reviews the existing research work related to energy-aware video en-
coding over portable communication devices.
Chapter 3 introduces a fully energy-scalable video encoder with the P-R-D frame-
work and presents our operational approach to P-R-D modeling. An analytical P-
R-D model is then established. We also develop a scheme to estimate the P-R-D
model parameter from video encoder statistics.
Chapter 4 studies the problem of energy-saving using P-R-D optimization. The-
oretically, we demonstrate that the energy consumption of a video encoder can be
saved if the scene activity of input video is non-stationary. We also study how the
complexity control parameters of the video encoder can be configured to achieve
this energy saving.
Chapter 5 optimizes the P-R-D computational complexity scaling parameters
with the discrete version Lagrangian optimization algorithm.
Chapter 6 discusses how the P-R-D optimal scalable encoder can be used in
practical energy-aware embedded video system design. Joint hardware and video
encoder adaptation technique is discussed.
Chapter 7 discusses energy-aware video encoding in practical system design and
explained how the proposed technologies can be used in a practical setting.
Chapter 8 concludes the thesis. Future work is also discussed in this chapter.
Chapter 2
Background and Related Work
In this chapter, we review existing research work related to energy-aware video
coding and communication system design. We then justify the importance of this
work.
2.1 Video Coding Complexity
There are two types of portable video devices: encoders (e.g., video cell phones,
wireless video cameras, etc.) and players (e.g., iPod video). In this research, we fo-
cus on energy minimization for portable video encoding devices. This is because,
on portable video devices, the fraction of energy consumption by video encoding
(typically 60-85%) is much higher than that of video decoding [31].
Video data is voluminous and has to be very efficiently compressed.
Otherwise, the amount of transmission energy or required storage space
will be tremendous. During the past decades, many video compression algorithms
and international standards, such as MPEG-2, H.263, MPEG-4, and H.264 [54,55],
have been developed for efficient video compression. An efficient video compression
system is often computationally intensive and energy-consuming, since it involves
many sophisticated operations in spatiotemporal prediction, transform, quantiza-
tion, mode selection, and entropy coding [66]. Recent studies [39, 63] and our ex-
perimental analysis show that, in typical scenarios of video communication over
portable devices, video encoding consumes a significant portion (up to 40-60%) of
the total energy. Standardized video encoding techniques like H.263, H.264/AVC,
MPEG-1, -2, and -4 are based on hybrid video coding [31], as shown in Fig. 2.1.
Figure 2.1: Generalized block diagram of a hybrid video encoder
The input image is divided into macroblocks. Each macroblock consists of the
three components Y, Cr and Cb. Y is the luminance component, which represents the
brightness information. Cr and Cb represent the color information. A macroblock
consists of one block of 16 by 16 picture elements for the luminance component
and of two blocks of 8 by 8 picture elements for the color components. The encoder
compresses the video with each macroblock. In this view, the macroblock is the basic
coding unit [5]. The behavior of the video encoder has been theoretically analyzed
in [66]. Typical video encoders, including all the standard video encoding systems,
such as MPEG-2 [1], H.263 [26], MPEG-4 [54], and H.264/AVC [31], employ a hybrid
motion compensated DCT encoding scheme. Specifically, as shown in Fig. 2.1, they
have the following major encoding modules: motion estimation (ME) and motion
compensation (COMP), DCT, quantization (QUANT), entropy encoding (ENC)
of the quantized DCT coefficients, inverse quantization (DQUANT), inverse DCT
(IDCT), picture reconstruction (RECON), and interpolation (INTERP) [54]. For
the ease of exposition, the DCT, IDCT, QUANT, DQUANT and RECON modules
are collectively referred to as PRECODING. In this way, the video encoder has only
three major modules: ME, PRECODING, and ENC. The PRECODING can be
considered as the data representation module. The run-time complexity of major
encoding modules is shown in Tab.2.1, where the percentages of CPU occupancy
for the major encoding modules are listed. It can be seen that ME is the most
computation-intensive module, consuming about one-third of the processor cycles.
The PRECODING modules collectively consume about 50% of the total processor
cycles. The ENC module uses a relatively small amount of the total CPU time,
especially at low coding bit rates.
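As a small numerical illustration of the macroblock structure described earlier (the frame size below is an assumed example, not taken from the text):

```python
# Sketch: macroblock layout of a 4:2:0 frame. Each macroblock covers a
# 16x16 block of luminance (Y) samples plus one 8x8 block each for the
# Cb and Cr color components. The frame size is an illustrative assumption.
def macroblock_layout(width, height):
    mbs_per_row = width // 16
    mbs_per_col = height // 16
    samples_per_mb = 16 * 16 + 2 * (8 * 8)   # 384 samples per macroblock
    return mbs_per_row * mbs_per_col, samples_per_mb

# A 176x144 (QCIF) frame: 11 x 9 = 99 macroblocks of 384 samples each.
num_mbs, samples_per_mb = macroblock_layout(176, 144)
```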
During the past decades, many algorithms, software and hardware techniques
have been developed to reduce the computational complexity of video encoding,
speed up the video encoder, and reduce its energy consumption. For example, a
statistical modeling approach is proposed in [21] to predict the zero DCT coefficients
after quantization. Based on the prediction, the DCT computation for those zero
coefficients can be saved. Fast and low-power motion estimation algorithms have
been developed to reduce the computational complexity of motion estimation [64].
Component   Akiyo    News    Carphone
ME          30.4%    32.6%   33.1%
COMP         9.1%     8.4%    8.7%
DCT         10.5%     9.2%    9.2%
QUANT        4.9%     4.6%    5.1%
ENC          4.7%     5.4%    3.0%
DQUANT       1.9%     1.5%    2.0%
IDCT         2.3%     2.9%    2.6%
RECON        7.5%     6.9%    7.2%
INTERP      14.3%    12.8%   13.2%
RC           7.4%     7.9%    7.6%
Other        6.5%     7.3%    6.7%
Table 2.1: CPU Occupancy (In Percentage) of the Major Encoding Function
Since there is no motion estimation for INTRA macroblocks (MB’s), the INTRA
ratio parameter, which is the fraction of INTRA MB's in the video frame, can be used
to control the motion estimation complexity in the video encoder [50]. A parametric
scheme for scalable motion estimation and DCT has been proposed in [60]. Hardware
implementation technologies have also been developed to improve the video encoding
speed [47] [5]. Recently, researchers at University of Illinois at Urbana-Champaign
and IBM have realized the importance of power aware computing for video data
compression and are investigating software and hardware techniques to reduce the
energy consumption of video compression [14]. However, little research has been
done to establish a theoretical framework for modeling and minimizing the energy
consumption of video compression for battery-powered communication devices.
2.2 Dynamic Scalable Voltage
Many algorithms developed in the literature are able to control or optimize the
computational complexity of the video encoder. To translate the complexity control
and reduction into energy control and saving, we need to consider energy-scaling
technologies in hardware design. To dynamically control the energy consumption of
microprocessors on the portable device, a CMOS circuit design technology, named
dynamic voltage scaling (DVS), has been recently developed [29, 42]. In CMOS
circuits, the power consumption P is given by
P = V^2 · f_CLK · C_EFF,  (2.1)
where V, f_CLK, and C_EFF are the supply voltage, clock frequency, and effective
switch capacitance of the circuits, respectively [51]. Since energy is power multiplied
by time, and the time to finish an operation is inversely proportional to the clock
frequency, the energy per operation E_op is proportional to V^2
(E_op ∝ V^2). This implies that lowering the supply voltage will reduce the energy con-
sumption of the system in a quadratic fashion. However, lowering the supply voltage
also decreases the maximum achievable clock speed. More specifically, it has been
observed that f_CLK is approximately linearly proportional to V [65]. Therefore, the
result is
P ∝ f_CLK^3  and  E_op ∝ f_CLK^2.  (2.2)
It can be seen that the CPU can reduce its energy consumption substantially by
running more slowly. However, the clock cannot be lowered arbitrarily, since the
encoder must still meet the real-time deadlines of video coding.
This is the key idea behind the DVS technology. Various chip makers, including
AMD [22] and Intel [23], have recently announced and sold processors with this
energy-scaling feature. In conventional system design with fixed supply voltage and
clock frequency, clock cycles, and hence energy, are wasted when the CPU workload
is light and the processor becomes idle. Reducing the supply voltage in conjunction
with the clock frequency eliminates the idle cycles and saves the energy significantly.
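A minimal sketch of the DVS relations in Eqs. (2.1)-(2.2); the capacitance and voltage-frequency constants below are illustrative assumptions, not measured values:

```python
# Sketch of DVS scaling: P = V^2 * f_clk * C_eff (Eq. 2.1), with V assumed
# linearly proportional to f_clk, giving P ~ f^3 and E_op ~ f^2 (Eq. 2.2).
C_EFF = 1.0e-9   # effective switch capacitance (assumed value)
K_VF = 1.0e-8    # assumed proportionality constant: V = K_VF * f_clk

def power(f_clk):
    v = K_VF * f_clk              # supply voltage scales with clock frequency
    return v * v * f_clk * C_EFF  # Eq. (2.1)

def energy_per_op(f_clk):
    # One operation takes time ~ 1/f_clk, so E_op = P / f_clk.
    return power(f_clk) / f_clk

# Halving the clock: power drops 8x (cubic), energy per operation drops 4x.
f = 2.0e8
power_ratio = power(f) / power(f / 2)                    # -> 8
energy_ratio = energy_per_op(f) / energy_per_op(f / 2)   # -> 4
```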
2.3 Joint Hardware and Application Adaptation
Achieving high QoS requirements with low energy consumption is challenging.
Cross-layer adaptation provides an efficient way to address such issues. First, system
resources are being designed with the ability to trade off performance for energy. For
example, mobile processors on the market today (such as Intel XScale PAX255 [24],
Intel Pentium-M [25] and AMDAthlon [3]) can already change the speed and power
at runtime using DVS. Second, multimedia applications can gracefully adapt to re-
source changes while maintaining acceptable service quality. That is, multimedia
applications allow a tradeoff between output quality and resource demands. Finally,
the operating system can also provide flexible resource management to support the
tradeoff between QoS and resource demands or to balance the demands on different
resources (e.g., CPU time and network bandwidth).
Researchers have proposed adaptation approaches to address the high QoS and
low energy challenge in mobile devices. Adaptation can happen in different lay-
ers from hardware to operating system to applications. The hardware adaptation
dynamically reconfigures hardware resources such as the processor to save energy
while providing the requested resource service and performance [6,10,34,45,46]. The
operating system adaptation changes the policies of allocation and scheduling in re-
sponse to application and resource variations [18, 20, 28, 44, 48, 61]. The application
layer adaptation, possibly with the support of the operating system or middleware,
changes the QoS parameters such as rate to trade off output quality for resource
usage or to balance usage of different resources [8, 13, 17, 35, 57, 66].
The above adaptation approaches have been shown to be effective for both QoS
provisioning and energy saving. However, most of them adapt only a single layer or
two joint layers (e.g., the operating system and applications [9,43] or the operating
system and hardware [30, 41, 59]).
Figure 2.2: The framework of the GRACE
The Global Resource Adaptation through Cooperation (GRACE) project [49] aims
to develop and demonstrate an integrated cross-layer adaptive system where hard-
ware and all software layers cooperatively adapt to changing system resources and
application demands, seeking to maximize user satisfaction while meeting resource
constraints on energy, time, and bandwidth. The centerpiece of GRACE is
a cross-layer adaptation framework that enables coordination of the adaptations of
the different system layers for the best QoS possible. Fig. 2.2 shows the framework
of GRACE. The key characteristics of the coordinator are:
1. Individual application components adapt locally, without knowledge of the
internals of other parts of the system.
2. Each software configuration is characterized by its cost (to represent its re-
source usage) and its utility (to represent user satisfaction). An application software
component uses these metrics to drive its local adaptations (with the goal of mini-
mizing cost for maximal utility).
3. A software configuration determines its cost using dynamic feedback from the
hardware and possibly other application components (e.g., the network).
4. Since all resources are capable of adaptation, each resource offers multiple
operating points and hence multiple possible costs (e.g., multiple combinations of
execution time and energy) for a given software configuration.
5. The resource manager receives requests from multiple applications with mul-
tiple associated costs and utilities. The resource manager selects the software and
hardware configurations that will maximize overall system utility and meet the sys-
tem constraints. Thus, the resource manager is not concerned with how an optimal
configuration is reached within a layer, but is simply a mediator for ensuring that
the selected configurations maximize overall system performance within the given
constraints.
6. Once the resource manager allocates a reservation, different system compo-
nents are free to adapt locally without going through the resource manager, as long
as they do not exceed the provided reservations.
GRACE benefits from the real-time and dynamic nature of the applications,
which often results in some computational slack. It also provides the possibility of
trading off quality requirements against resource usage.
2.4 Importance of This Work
Recently, many algorithms and software and hardware energy-minimization tech-
niques have been developed, including low-complexity encoder design [16, 21, 33],
low-power embedded video encoding [27, 32], adaptive power control [39, 60, 63],
and joint encoder and hardware adaptation [13, 14, 56]. These algorithms fo-
cus on encoder complexity (and power consumption) reduction through heuristic
adaptation or control instead of systematic energy optimization. This is because
they lack an analytic model to characterize the optimum trade-off between energy
saving and encoding performance [66]. In addition, even with existing energy sav-
ing technologies, the operational lifetime of portable video electronics is still very
short, which has become one of the biggest impediments to future technological progress.
Therefore, developing new energy optimization methods has become an urgent re-
search task. In this work, we propose to develop a new P-R-D analysis framework
to characterize the inherent relationship between energy consumption and encoder
R-D performance. This will enable us to perform systematic energy minimization.
More importantly, given a video encoder, which has already been fully optimized
using existing software and hardware energy optimization technologies, the P-R-D
analysis framework will enable us to achieve additional significant energy saving.
Chapter 3
Energy-Scalable Video Encoder
Design and Operational P-R-D
Analysis
In this chapter, we introduce our scheme for energy-scalable video encoder design,
present our operational P-R-D analysis framework, introduce the analytical P-R-D
model, and explain how the model parameter can be estimated from video encoding
statistics.
3.1 An Operational Approach to P-R-D Analysis
In the original work of P-R-D analysis [66], we have introduced a set of complexity
control parameters into an MPEG-4 encoder, studied the R-D behavior of each
parameter, and obtained the P-R-D function for a simple MPEG-4 encoder. We
observe that this analytical approach is not easily extendable to other video encoders,
such as H.264 video coding [55], and the direct R-D analysis of complexity control
parameters becomes very difficult when the video encoding mechanism becomes
more sophisticated. In this work, we propose an operational approach for offline
P-R-D analysis and modeling which can be applied to generic video encoders.
The operational P-R-D modeling has the following three major steps. In the
first step, we group the encoding operations into several modules, such as motion
prediction, pre-coding (transform and quantization), mode decision, and entropy
coding, and then introduce a set of control parameters Γ = [γ1, γ2, · · ·, γL] to control
the power consumption of these modules. Therefore, the encoder complexity C is
then a function of these control parameters, denoted by C(γ1, γ2, · · ·, γL). Within
the DVS (dynamic voltage scaling) design framework [40], the microprocessor power
consumption, denoted by P , is a function of computational complexity C, therefore,
also a function of Γ, denoted by P = Φ(C) = P (γ1, γ2, · · ·, γL), where Φ(·) is the
power consumption model of the microprocessor. For example, according to our
measurement, the power consumption model of the Intel PXA255 XScale processor
used in our DeerCam system is depicted in Fig. 3.1 (solid line). It can be well
approximated by the following expression
P = Φ(C) = β × C^γ,  γ = 2.5,  (3.1)
where β is a constant. In the second step, we execute the video encoder using dif-
ferent configurations of complexity control parameters and obtain the corresponding
R-D data, denoted by D(R; γ1, γ2, · · ·, γL). Note that this step is computationally
intensive and is intended for offline analysis to obtain the P-R-D model only. Once
the model is established, we will discuss in Section 3.4 how the model parameter
can be estimated online during video encoding.
Figure 3.1: Power consumption model with DVS.
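The DVS power model of Eq. (3.1) can be sketched as follows. This is a minimal illustration only: the constant `beta` and the workload values are hypothetical placeholders, not measured PXA255 data.

```python
# Sketch of the DVS power model in Eq. (3.1): P = beta * C**gamma, gamma = 2.5.
# beta and the workloads below are illustrative, not measured values.

def power_consumption(c_mcycles, beta=1e-7, gamma=2.5):
    """Approximate computing power for a workload of c_mcycles (M cycles)."""
    return beta * c_mcycles ** gamma

# Under DVS, the superlinear exponent means halving the workload cuts power
# by much more than half:
p_full = power_consumption(400.0)
p_half = power_consumption(200.0)
assert abs(p_half / p_full - 0.5 ** 2.5) < 1e-12
```

This superlinear relation is what makes complexity scaling attractive: a modest reduction in encoding workload yields a disproportionately large power saving.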
In the third step, we perform optimum configuration of the power control pa-
rameters to maximize the video quality (or minimize the video distortion) under the
power constraints. This optimization problem can be mathematically formulated as
follows:
min_{γ1, γ2, ..., γL} D = D(R; γ1, γ2, ..., γL), s.t. P(γ1, γ2, ..., γL) ≤ P, (3.2)

where P is the available power consumption for video encoding. Given the R-D data
set D(R; γ1, γ2, ..., γL), the optimum solution, denoted by D(R, P), describes
the P-R-D behavior of the video encoder. The corresponding optimum complexity
control parameters are denoted by γ*_i(R, P), 1 ≤ i ≤ L.
In the following, we explain how to define the complexity control parameters,
design an energy-scalable video encoder, and obtain the P-R-D function using the
above operational P-R-D analysis approach.
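The third step, the constrained optimization of (3.2) over the offline-collected operational points, can be sketched as a simple search. This is a minimal sketch; the R-D data points and parameter configurations below are illustrative, not measured encoder statistics.

```python
# Sketch of the constrained optimization in Eq. (3.2): among the operational
# R-D points collected offline, pick the configuration with the lowest
# distortion whose power does not exceed the budget. Data are illustrative.

def optimize_prd(points, power_budget):
    """points: list of (distortion, power, params). Returns best feasible point."""
    feasible = [pt for pt in points if pt[1] <= power_budget]
    return min(feasible, key=lambda pt: pt[0]) if feasible else None

points = [
    (40.0, 0.2, (0.25, 0.25)),  # (D, P, (gamma1, gamma2))
    (25.0, 0.5, (0.50, 0.50)),
    (18.0, 0.8, (0.75, 0.75)),
    (15.0, 1.0, (1.00, 1.00)),
]
best = optimize_prd(points, power_budget=0.6)
assert best == (25.0, 0.5, (0.50, 0.50))
```

In practice the search runs over every configuration of [γ1, ..., γL] evaluated offline; the envelope of the selected points is the operational P-R-D function D(R, P).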
3.2 Complexity Control Parameters
In this work, two complexity control parameters, νX and νY, are introduced in
the video encoder. The complexity control parameter for the motion compensation
module, νX, is the number of SAD (sum of absolute difference) computations per
frame. The parameter νY controls the pre-coding module; it is the number of
non-zero MB's in the video frame. Here, "non-zero" means the MB has non-zero
DCT coefficients after quantization.
3.2.1 The Complexity Control Parameter νX
The complexity control parameter for the motion prediction and compensation mod-
ule is νX . This is based on the observation that the ME process is simply a sequence
of SAD (sum of absolute difference) computations to find the MB position of the
minimum SAD. Therefore, the computational complexity of ME, denoted by CME ,
is simply given by
CME = νX · CSAD. (3.3)
The existing video encoder uses a block-based motion prediction scheme. The ob-
jective of motion estimation is to find the best match in the reference frame for every
MB in the current frame. The search for the SAD-optimal motion vector problem
can be formulated as
(x0, y0) = arg min_{(x,y)} SAD(x, y), (3.4)
where SAD(x, y) represents the sum of absolute difference (SAD) between the cur-
rent MB and the reference MB at a relative position of (x, y). We can see that the
ME process is simply a sequence of SAD computations to find the motion vector
which has the minimum SAD. It is assumed that the computational complexity of
each MB SAD is a constant. Therefore, the overall computational complexity of the
ME module is linearly proportional to the number of SAD computations νX . At the
frame-level, the νX SAD computations are allocated among the MB's in the video
frame to optimize the picture quality.
Dynamic allocation of SAD computations is used to make motion estimation
complexity-scalable. It is well known that moving objects in the video scene
contribute most to the overall visual quality. This suggests that in motion estimation
under energy constraints, we need to allocate the available νX SAD computations
among the MB's according to their motion characteristics to optimize the overall
picture quality. Let (mvx, mvy) be the motion vector of the MB. The block motion
activity (BMA) factor of the MB, denoted by ma, is defined as

ma = |mvx| + |mvy|. (3.5)
At the frame level, we introduce a motion history matrix (MHM), denoted by M =
[mij ]MR×MC , where MR and MC are the numbers of MB’s per row and per column,
respectively. Initially, we set mij = 1. After a frame is coded, each entry is updated
as follows:

mij = { mij + 1, if ma = 0;
        0,       otherwise, (3.6)

where ma is the BMA factor of the (i, j)-th MB in the coded frame. The larger
the value of mij, the higher the probability that this MB is a static block, and
the fewer SAD computations need to be allocated to it. Note that each entry of the
MHM is linearly scaled and represented by the gray level of a MB, ranging from 0
to 255. We can see that the MHM captures not only the motion history but also
the locations of the object motion. Most importantly, this MHM approach has very
low computation overhead and is very cost-effective in practice. Using the MHM,
we can allocate the νX SAD computations among the MB's. The number of SAD
computations allocated to the (i, j)-th MB, denoted by nsad_ij, is determined by

nsad_ij = (1 / (N − 1)) · [1 − mij / Σ_{(k,l)≥(i,j)} mkl] · Nsad, (3.7)

where N is the number of MB's that still need to perform motion estimation, and
Nsad is the available number of SAD computations. Initially, Nsad is set to νX.
Suppose the motion search range is SR. If nsad_ij ≥ (2 · SR + 1)^2, the
computational power is enough to perform a full search for this block. Otherwise,
the diamond motion search algorithm in [4] is used to find the motion vector, whose
complexity, indicated by the number of search layers, is controlled by nsad_ij.
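The MHM update of Eq. (3.6) and the allocation rule of Eq. (3.7) can be sketched together. This is a minimal sketch over a hypothetical 2×2-MB frame; the motion vectors and SAD budget are illustrative.

```python
# Sketch of the MHM update in Eq. (3.6) and the SAD-budget allocation in
# Eq. (3.7). The frame size, motion vectors, and budget are illustrative.

def update_mhm(mhm, motion_vectors):
    """mhm, motion_vectors: dicts keyed by (i, j) MB position."""
    for pos, (mvx, mvy) in motion_vectors.items():
        ma = abs(mvx) + abs(mvy)              # block motion activity, Eq. (3.5)
        mhm[pos] = mhm[pos] + 1 if ma == 0 else 0

def allocate_sad(mhm, remaining_mbs, n_sad):
    """Eq. (3.7): static blocks (large m_ij) get fewer SAD computations."""
    total = sum(mhm.values())
    return {pos: (1.0 / (remaining_mbs - 1)) * (1.0 - m / total) * n_sad
            for pos, m in mhm.items()}

mhm = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
update_mhm(mhm, {(0, 0): (0, 0), (0, 1): (2, 1), (1, 0): (0, 0), (1, 1): (0, 0)})
# The moving MB (0, 1) has m = 0, so it receives the largest budget share.
alloc = allocate_sad(mhm, remaining_mbs=4, n_sad=900)
assert alloc[(0, 1)] == max(alloc.values())
```

Note that the shares of Eq. (3.7) sum to the total budget Nsad, so no SAD computations are lost in the allocation.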
3.2.2 The Complexity Control Parameter νY
By analyzing the encoding architecture of the video encoding system, we find that
it is possible to control the computational complexity of all the pre-coding modules
using one single parameter, which is the number of non-zero MB's in the
video frame. Here, "non-zero" means the MB has non-zero DCT coefficients after
quantization. Let CNZMB and CPRE be the pre-coding computational complexity
of one non-zero MB (NZMB) and of the whole video frame, respectively. We will see
that

CPRE = νY · CNZMB. (3.8)

The complexity control parameter νY collectively controls the computational
complexity of the pre-coding modules, namely the DCT, QUANT, DQUANT, IDCT,
and RECON modules.
In typical video encoding, as illustrated in Fig. 2.1, DCT is applied to
the difference MB after motion estimation and compensation, or the original MB
if its coding mode is INTRA. After the DCT coefficients are quantized, DQUANT,
IDCT, and RECON are performed to reconstruct the MB for motion prediction of
the next frame. In transform coding of videos, especially at low coding bit rates,
the DCT coefficients in the MB might become all zeros after quantization. We
refer to this MB as an all-zero MB (AZMB). Otherwise, it is called a non-zero MB
(NZMB). In international standards for video encoding, such as MPEG-2, H.263,
and MPEG-4, "non-zero" also means the CBP (coded block pattern) value of the
MB is non-zero. If we can predict an MB to be an AZMB, all the above pre-coding
operations can be skipped, because the output of DQUANT and IDCT of an AZMB
is still an AZMB, and the reconstructed MB is exactly the reference MB used in
motion estimation and compensation. Therefore, the encoder can simply copy over
the reference MB to reconstruct the current MB. This is a unique property of the
AZMB, which can be used to reduce the computational complexity of the video
encoder [7].
The unique property of the AZMB is used to design a complexity scalability
scheme for the pre-coding modules. Let {xnk | 0 ≤ n, k ≤ 7} be the coefficients in
the difference MB after motion estimation. For INTRA MB's, xnk are the original
pixels in the video frame. Let {yij | 0 ≤ i, j ≤ 7} be the DCT coefficients. According
to the definition of DCT, we have
yij = (1/4) · Ci · Cj · Σ_{n=0}^{7} Σ_{k=0}^{7} xnk · cos((2n+1)iπ / 16) · cos((2k+1)jπ / 16), (3.9)

where

Ci = 1/√2 if i = 0, and Ci = 1 otherwise; Cj = 1/√2 if j = 0, and Cj = 1 otherwise. (3.10)

We can see that

|yij|^2 ≤ Σ_{n=0}^{7} Σ_{k=0}^{7} |xnk|^2. (3.11)
Note that the right-hand side is the SSD of the difference MB, which is already
computed during the motion estimation. This suggests that the SSD could be
an efficient and low-cost measure to predict the AZMB. After motion estimation
and compensation, let {SSDi | 1 ≤ i ≤ M} be the SSD values of the M MB's in the
video frame, sorted in ascending order. In the proposed complexity scalability
scheme for pre-coding, we force the first M − νY MB's to be AZMB's, and treat the
remaining νY MB's as NZMB's, to which the pre-coding operations are applied.
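The SSD-based selection above can be sketched as a simple sort-and-threshold. This is a minimal sketch; the SSD values are illustrative, not encoder output.

```python
# Sketch of the pre-coding complexity scalability scheme of Section 3.2.2:
# sort MBs by their motion-estimation SSD and force the M - nu_Y smallest
# ones to be all-zero MBs (pre-coding skipped). SSD values are illustrative.

def select_nzmbs(ssd_values, nu_y):
    """Return the indices of the nu_y largest-SSD MBs kept as NZMBs."""
    order = sorted(range(len(ssd_values)), key=lambda i: ssd_values[i])
    return set(order[len(ssd_values) - nu_y:])

ssd = [120.0, 3500.0, 40.0, 980.0, 2100.0, 15.0]
nzmb = select_nzmbs(ssd, nu_y=2)
assert nzmb == {1, 4}   # only the two highest-SSD MBs go through pre-coding
```

All other MBs are reconstructed by copying the reference MB, which is exactly the AZMB shortcut described above.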
3.3 P-R-D Modeling for Energy-Aware Video Encoding
As discussed above, we start from a conventional video encoder, denoted by
ΠA, and introduce a set of complexity control parameters to control its computational
complexity and, in turn, its power consumption. We refer to this complexity-scalable,
or power-scalable, video encoder as the C-R-D encoder, denoted by ΠB.
It should be noted that introducing the control parameters Γ into the existing
video encoder ΠA to form ΠB only scales down its computational complexity and
power consumption. Let PA and P be the power consumption of encoders ΠA and
ΠB, respectively. Obviously, 0 ≤ P ≤ PA. Therefore, we can normalize the power
consumption P of the C-R-D encoder by PA. After this normalization, the value of
P is between 0 and 1. Fig. 3.2(a) shows the P-R-D curve D(R, P) with normalized
power P. Fig. 3.2(b) shows the D-P curves at different bit rates R.
The specific procedure to insert the complexity control parameters into the en-
coder may vary from one encoder to another. For example, in an MPEG-4 video
encoder, we can use the number of SAD (sum of absolute difference) computations
to control the complexity of its motion estimation module, and use the number of
non-zero blocks to control the complexity of DCT and quantization modules. Similar
procedures can be applied to MPEG-2 or H.263 encoders. For H.264 video encoders,
we can use the number of SAD computations and reference frames, and the number
of coding modes to control its complexity. It is not important how the complexity
control parameters Γ = [ν1, ν2, . . . , νL] are inserted into the video encoding modules.
Our C-R-D video encoder design only requires that:
• Compatible. The existing non-energy-scalable video encoder ΠA is a special
case of the P-R-D encoder ΠB when νi = 1 or P = 1. From the R-D analysis
perspective
D(R, P )|P=1 = D(R), (3.12)
where D(R, P ) is the P-R-D function of encoder ΠB, and D(R) is the R-D
Figure 3.2: (a)The P-R-D curve; (b)The D-P curves at different bit rates.
function of encoder ΠA. In other words, when the power supply is sufficient
and a full encoding power is used, the energy-scalable P-R-D video encoder
ΠB becomes the traditional video encoder ΠA.
• Scalable. When we reduce the values of the complexity control parameters, the
power consumption P of the P-R-D encoder decreases.
3.4 Analytical P-R-D Models
In order to obtain an analytical P-R-D model for energy-aware video encoding, we
perform the operational P-R-D analysis procedure over a wide range of test video
sequences and determine the common characteristics of their P-R-D functions.
The test video sequences used in this work include the Akiyos, News, Salesman,
Carphone, Foreman, Coastguard, and Football QCIF (176×144) video sequences. The
C-R-D curves and the D-P curves for these test video sequences at different bit rates
are shown in Fig. 3.3-Fig. 3.9.
It should be noted that these C-R-D curves under different complexity control
schemes share a similar pattern: as the bit rate and power consumption increase,
the distortion decreases exponentially. More specifically, we can draw the following
observations about the C-R-D functions:
1. When P = 0 and R = 0, the coding distortion should be the variance of the
input video. This is because the encoder does not have any bit and energy
resources to compress the video data, and the decoder has no choice but to display
blank pictures. In this case, the coding distortion, measured by the MSE
(mean squared error) between the input video and the reconstructed one, is
the variance of the input video.
Figure 3.3: Akiyos (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.4: News (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.5: Salesman (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.6: Car (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.7: Foreman (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.8: Coastguard (a)The P-R-D curve; (b)The D-P curves at different bit rates.
Figure 3.9: Football (a)The P-R-D curve; (b)The D-P curves at different bit rates.
2. As we can see from Fig.3.2, the relationship between the distortion D and
power consumption P is approximately exponential.
3. As suggested by the classical R-D models, the relationship between the coding
bit rate R and distortion D is also exponential.
Based on the above observations, we propose the following analytic expression
for the P-R-D model:
D(R, P) = σ^2 · 2^{−λ·R·g(P)}, (3.13)

where σ^2 represents the picture variance of the video segment after motion
compensation, and λ represents the resource utilization efficiency of the video encoder.
g(P) is the normalized power consumption model of the microprocessor, with g(0) = 0
and g(1) = 1. In general, g(P) is a monotonically increasing function. The analysis
in [6] suggests that

g(P) = P^{1/γ}. (3.14)

Model (3.13) describes the relationship between P, R, and D. It is convenient to
analyze the power consumption behavior of the video encoder with this model.
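The analytic model of Eqs. (3.13)-(3.14) can be evaluated directly. This is a minimal sketch; the values of σ^2, λ, and γ below are illustrative, not fitted parameters.

```python
# Sketch of the analytic P-R-D model of Eqs. (3.13)-(3.14):
# D(R, P) = sigma^2 * 2**(-lam * R * g(P)), with g(P) = P**(1/gamma).
# sigma2, lam, and gamma are illustrative, not fitted values.

def prd_distortion(rate, power, sigma2=2500.0, lam=2.0, gamma=2.5):
    g = power ** (1.0 / gamma)          # normalized power model, Eq. (3.14)
    return sigma2 * 2.0 ** (-lam * rate * g)

# Boundary behaviors noted in Section 3.4:
assert prd_distortion(0.0, 0.0) == 2500.0   # no bits, no power: D = sigma^2
assert prd_distortion(1.0, 1.0) == 625.0    # full power: D(R) = sigma^2 * 2^(-lam*R)
assert prd_distortion(1.0, 0.5) > 625.0     # reduced power raises distortion
```

The three assertions mirror the observations above: at P = 0 and R = 0 the distortion equals the input variance, and at P = 1 the model reduces to the classical exponential R-D model.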
Chapter 4
Energy Saving Analysis
In this chapter, we will study how the energy consumption of the video encoder
can be minimized using P-R-D optimization. We will then present a training-
classification approach to configure the complexity control parameters of the video
encoder so as to achieve energy savings.
4.1 Theoretical Analysis
Let DA(R) be the R-D function of encoder ΠA, and DB(R, P) the P-R-D function
of encoder ΠB. As explained in Chapter 3, we have

DB(R, P)|_{P=1} = DA(R), and DB(R, P) ≤ DA(R), (4.1)

where 0 ≤ P ≤ 1. Let S be the video sequence to be encoded. The time duration
of S is denoted by T, which corresponds to the operational lifetime of the device. We
partition S into a number of segments, denoted by {Sn | 1 ≤ n ≤ N}. Suppose
the picture variance and encoder efficiency in the P-R-D model (3.13) for each video
segment are σ_n^2 and λn, respectively. Let Rn and Pn be the number of bits and the
power used by video encoder ΠB to compress Sn. Let RT be the total number of bits
that can be generated by the video encoder. We assume that the video sequence is
encoded at a constant quality D. In general, this assumption is reasonable because
most applications require that the videos are encoded at a target quality. Rewriting
model (3.13), for video segment Sn, we have
Pn = an / Rn^γ, (4.2)

where

an = ((1/λn) · log2(σ_n^2 / D))^γ (4.3)

is called the scene activity level. We can see that, within the P-R-D analysis
framework, the encoder power consumption Pn depends on the encoding bit rate Rn. Our
objective is to minimize the total power consumption of encoder ΠB in compressing
all video segments. The energy minimization problem can be formulated as follows
min_{Rn} P = Σ_{n=1}^{N} Pn = Σ_{n=1}^{N} an / Rn^γ, s.t. Σ_{n=1}^{N} Rn = RT. (4.4)

Lemma 1. The solution to the minimization problem is given by

Rn = ((γ·an)^{1/(γ+1)} / Σ_{i=1}^{N} (γ·ai)^{1/(γ+1)}) · RT, (4.5)

and the minimum power is given by

P = (Σ_{n=1}^{N} an^{1/(γ+1)})^{γ+1} / RT^γ. (4.6)
Proof. We solve the energy minimization problem using a Lagrange multiplier
approach. Let
J = Σ_{n=1}^{N} an / Rn^γ + β · (Σ_{n=1}^{N} Rn − RT), (4.7)

where β is the Lagrange multiplier. For the minimum solution, the following
condition holds:

∂J/∂Rn = −γ · an / Rn^{γ+1} + β = 0. (4.8)

That is,

Rn = (γ · an / β)^{1/(γ+1)}. (4.9)

Since Σ_{n=1}^{N} Rn = RT, we have

β^{1/(γ+1)} = Σ_{n=1}^{N} (γ · an)^{1/(γ+1)} / RT. (4.10)

From equation (4.9), we have

Rn = ((γ · an)^{1/(γ+1)} / Σ_{i=1}^{N} (γ · ai)^{1/(γ+1)}) · RT, (4.11)

which is the optimum bit allocation. The minimum power consumption is then given
by

P = Σ_{n=1}^{N} Pn = Σ_{n=1}^{N} an / Rn^γ
  = (1/RT^γ) · Σ_{n=1}^{N} an · (Σ_{i=1}^{N} (γ · ai)^{1/(γ+1)})^γ / (γ · an)^{γ/(γ+1)}
  = (1/RT^γ) · (Σ_{i=1}^{N} ai^{1/(γ+1)})^γ · Σ_{n=1}^{N} an^{1/(γ+1)}
  = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / RT^γ, (4.12)
which proves Lemma 1.
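Lemma 1 can be checked numerically. This is a minimal sketch: the activity levels and bit budget are illustrative, and the check compares the closed-form allocation of Eq. (4.5) against the formula of Eq. (4.6) and against a uniform allocation.

```python
# Numerical check of Lemma 1 (values illustrative): the closed-form bit
# allocation of Eq. (4.5) achieves the minimum power of Eq. (4.6), and never
# consumes more power than uniform allocation.

def total_power(rates, activities, gamma=2.5):
    return sum(a / r ** gamma for a, r in zip(activities, rates))  # Eq. (4.4)

def optimal_rates(activities, r_total, gamma=2.5):
    w = [(gamma * a) ** (1.0 / (gamma + 1.0)) for a in activities]
    s = sum(w)
    return [wi / s * r_total for wi in w]                          # Eq. (4.5)

a = [4.0, 1.0, 0.25]      # scene activity levels a_n
r_total = 6.0
r_opt = optimal_rates(a, r_total)
p_opt = total_power(r_opt, a)
p_uniform = total_power([r_total / 3] * 3, a)
# Closed-form minimum power, Eq. (4.6):
p_formula = sum(ai ** (1.0 / 3.5) for ai in a) ** 3.5 / r_total ** 2.5
assert abs(p_opt - p_formula) < 1e-9
assert p_opt <= p_uniform
```

High-activity segments receive more bits, which lowers their 1/R^γ power term where it matters most.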
Lemma 2. The P-R-D video encoder ΠB consumes less energy than the standard
non-energy-scalable video encoder ΠA, and the energy saving ratio Λe is

Λe = PT^B / PT^A = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / ((Σ_{i=1}^{N} ai^{1/γ})^γ · N) ≤ 1, (4.13)

where PT^A and PT^B are the power consumption of encoders ΠA and ΠB, respectively.
The equality holds if the scene activity level ai for each video segment is constant
(stationary).
Proof. In the P-R-D video encoder design, encoder ΠA is a special case of the P-R-D
video encoder ΠB when the full encoder power is used. In other words, the power
consumption of encoder ΠA on video segment Si, denoted by Pi^A, is equal to 1. Let D
be the target video encoding quality. According to equation (4.2), the corresponding
number of encoding bits of ΠA, denoted by Ri^A, is given by Ri^A = ai^{1/γ}. Therefore,
the total number of bits generated by encoder ΠA is

RT = Σ_{i=1}^{N} ai^{1/γ}, (4.14)

and the total power consumption of encoder ΠA is given by

PT^A = Σ_{i=1}^{N} Pi^A = Σ_{i=1}^{N} 1 = N. (4.15)
For the same number of encoding bits RT, according to equations (4.6) and (4.14), the
minimum power consumption of the P-R-D encoder ΠB is

PT^B = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / RT^γ = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / (Σ_{i=1}^{N} ai^{1/γ})^γ. (4.16)
The energy saving ratio of the P-R-D video encoder ΠB over the existing video
encoder ΠA is given by

Λe = PT^B / PT^A = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / ((Σ_{i=1}^{N} ai^{1/γ})^γ · N). (4.17)

To prove that Λe ≤ 1, we use Hölder's inequality. Given two N-dimensional
vectors (x1, x2, ..., xN) and (y1, y2, ..., yN), according to Hölder's inequality, we have

Σ_{i=1}^{N} |xi · yi| ≤ (Σ_{i=1}^{N} xi^p)^{1/p} · (Σ_{i=1}^{N} yi^q)^{1/q}, (4.18)

where p, q > 1 and

1/p + 1/q = 1. (4.19)

The equality holds when

(x1, x2, ..., xN) = α · (y1, y2, ..., yN), (4.20)

where α is a constant. We let

xi = ai^{1/(γ+1)}, yi = 1, (4.21)

p = (γ + 1)/γ, q = γ + 1. (4.22)
Using Hölder's inequality, we have

Σ_{i=1}^{N} |ai^{1/(γ+1)} · 1| ≤ (Σ_{i=1}^{N} (ai^{1/(γ+1)})^{(γ+1)/γ})^{γ/(γ+1)} · (Σ_{i=1}^{N} 1^{γ+1})^{1/(γ+1)}
                              = (Σ_{i=1}^{N} ai^{1/γ})^{γ/(γ+1)} · N^{1/(γ+1)}, (4.23)

that is,

(Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} ≤ (Σ_{i=1}^{N} ai^{1/γ})^γ · N. (4.24)

Therefore,

Λe = PT^B / PT^A = (Σ_{i=1}^{N} ai^{1/(γ+1)})^{γ+1} / ((Σ_{i=1}^{N} ai^{1/γ})^γ · N) ≤ 1. (4.25)
According to equation (4.20), the equality holds when

(a1, a2, ..., aN) = α · (1, 1, ..., 1), (4.26)

which implies that the scene activity level ai of each video segment is constant. This
proves Lemma 2.
From Lemma 2, we can see that incorporating another dimension, the power
consumption P, into the traditional R-D analysis gives us an additional degree of
freedom in resource allocation and performance optimization. Accordingly, the P-R-D
video encoder is able to save energy by intelligently allocating its bit and energy
resources. Furthermore, the result of Lemma 2 shows that energy saving is
possible if the scene activity of the input video is non-stationary, i.e., ai is
time-varying. In practice, a typical video application system operates for hours.
Within this long operational period, the video content captured by the system often
exhibits a large variation in its scene activity levels. In this case, the P-R-D
video encoder is able to save energy significantly.
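The energy saving ratio of Eq. (4.13) can be checked numerically. This is a minimal sketch; the activity levels are illustrative.

```python
# Numerical check of Lemma 2 (activity levels illustrative): the energy
# saving ratio of Eq. (4.13) is at most 1, with equality for stationary scenes.

def energy_saving_ratio(activities, gamma=2.5):
    n = len(activities)
    num = sum(a ** (1.0 / (gamma + 1.0)) for a in activities) ** (gamma + 1.0)
    den = sum(a ** (1.0 / gamma) for a in activities) ** gamma * n
    return num / den

assert energy_saving_ratio([8.0, 1.0, 0.5, 0.1]) < 1.0          # non-stationary
assert abs(energy_saving_ratio([2.0, 2.0, 2.0]) - 1.0) < 1e-9   # stationary
```

As the proof predicts, the more the activity levels vary across segments, the further the ratio drops below 1, i.e., the larger the energy saving.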
4.2 Training-Classification Approach to P-R-D Video Encoder Control
The activity parameter ai is very important. It includes the resource utilization
efficiency parameter λ of the P-R-D video encoder, the picture variance σ^2, which
represents the amount of scene activity, and the target video quality D set by the
application requirement. In real-time video encoding, the picture variance σ^2 can
be obtained directly from the video encoder. Therefore, we only need to estimate
λ. The P-R-D model is rewritten as the C-R-D model:

D(R, C) = σ^2 · 2^{−λ·R·C^γ}, 1 ≤ γ ≤ 3. (4.27)
Considering P ∝ f_CLK^3 and C ∝ f_CLK, we observe that C ∝ P^{1/3}. Hence, from
the P-R-D model

D(R, P) = σ^2 · 2^{−λ·R·P^{1/γ}}, 1 ≤ γ ≤ 3, (4.28)

the C-R-D model (4.27) is obtained.
With the C-R-D model, λ can be estimated. In Chapter 3, we have presented
an operational approach to obtain the P-R-D curve. Based on these P-R-D
curves, we can determine the value of λ using statistical fitting. This approach is
computationally intensive and only suitable for offline C-R-D analysis. In real-time
video compression, it is desirable to develop a low-complexity scheme which is able
to estimate λ from statistics of current or previous video frames, which are directly
available from the video encoder. We define the following encoder statistic λF:

λF = (1 / (R · CF^γ)) · log2(σ_i^2 / Di), (4.29)

where CF = C(ν1 = 1, ν2 = 1, ..., νL = 1) represents the encoder complexity when
no complexity control is applied. Fig. 4.1 shows that the P-R-D model parameter
λ is highly correlated with the encoder statistic λF. Thus, the video scene
activity can be identified with the parameter λF, and computing λF adds little
further complexity.
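The encoder statistic of Eq. (4.29) can be computed directly from quantities available during encoding. This is a minimal sketch; the rate, variance, and distortion values are illustrative.

```python
# Sketch of the encoder statistic lambda_F of Eq. (4.29), computed from
# quantities directly available during encoding. All values are illustrative.
import math

def lambda_f(rate, c_full, sigma2, distortion, gamma=2.5):
    """lambda_F = log2(sigma^2 / D) / (R * C_F^gamma), Eq. (4.29)."""
    return math.log2(sigma2 / distortion) / (rate * c_full ** gamma)

lf = lambda_f(rate=0.8, c_full=1.0, sigma2=2500.0, distortion=156.25)
assert abs(lf - 5.0) < 1e-9   # log2(16) / 0.8 = 5
```

Because λF is strongly correlated with λ, it serves as a low-cost online proxy for the offline-fitted model parameter.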
Figure 4.1: The relationship between λ and λF.

In this work, we assume that the scene activity parameter does not change
significantly within the video segment. In practice, with proper scene change
detection, a long non-stationary video input can often be partitioned into multiple video
segments with relatively stationary scene activities. Fig. 4.2 illustrates the scene
activity estimation parameter λF in the standard test video sequences. It shows
that the time-varying video sequence can be partitioned into video segments, each
with a relatively stationary scene activity level.
Level  | 1      | 2           | 3           | 4           | 5           | 6           | 7
λF     | 0.0472 | 0.0331      | 0.0246      | 0.0132      | 0.0097      | 0.0070      | 0.0048
Range  | >0.045 | 0.045-0.030 | 0.030-0.022 | 0.022-0.012 | 0.012-0.009 | 0.009-0.006 | <0.006

Table 4.1: The point values for categorizing video segments.
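The classification implied by Table 4.1 can be sketched as a threshold lookup. This is a minimal sketch; the handling of values exactly on a boundary is an assumption, since the table does not specify it.

```python
# Sketch of classifying a video segment into one of the seven activity levels
# using the lambda_F ranges of Table 4.1. Boundary handling (>=) is assumed.

def activity_level(lf):
    thresholds = [0.045, 0.030, 0.022, 0.012, 0.009, 0.006]  # Table 4.1 ranges
    for level, t in enumerate(thresholds, start=1):
        if lf >= t:
            return level
    return 7

assert activity_level(0.0472) == 1   # high activity
assert activity_level(0.0246) == 3
assert activity_level(0.0048) == 7   # low activity
```

Each level corresponds to a cluster of segments with similar P-R-D behavior, which is the basis of the training-classification control developed in the next chapter.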
We implement and test the proposed P-R-D analysis procedure on an input
video sequence which has 3590 frames and is partitioned into 57 segments. The results
are shown in Fig. 4.5. In our experiments, we classify the input video sequence
into seven clusters, each sharing similar P-R-D behavior. The
Figure 4.2: The common video scene estimation parameter λF
Figure 4.3: A part of the video scene estimation parameter λF.
Figure 4.4: A part of the video scene estimation parameter λF.
classification features λF are listed in Table 4.1. The result shows that the P-R-D
model parameters of different video segments exhibit a strong clustering behavior.
Based on this property, we will develop a training-classification approach to P-R-D
video encoder control in the next chapter.
Figure 4.5: The distribution of scene activity parameters of all video segments.
Chapter 5
Power-Rate-Distortion Control for
Energy-Aware Video Encoding
In this chapter, we study how the constrained optimization for P-R-D analysis in
(3.2) can be solved and how the complexity control parameters of video encoders
can be configured during real-time video encoding to achieve the optimized P-R-D
performance.
5.1 A Training-Classification Approach to P-R-D Video Encoder Control
In Chapter 3, we introduced two complexity control parameters, νX and νY, to
control the motion prediction and pre-coding modules of the video encoder, so as to
establish a complexity-scalable, or equivalently energy-scalable, video encoder. The
encoding bit rate R, video distortion D, and power consumption P are all functions
of these two complexity control parameters. In energy-aware video encoding, these
two parameters should be optimally selected so as to minimize the total power con-
sumption under rate and distortion constraints or minimize the total video coding
distortion under rate and power constraints, as shown in (3.2).
As discussed in Chapter 4, the input video segments exhibit a strong clustering
behavior in their P-R-D model parameters. Based on this observation, we propose
a training-classification approach to P-R-D control and optimization during
real-time video encoding. In the training stage, we collect a set of training video
segments with a wide range of scene activities and partition them into a number of
clusters according to their scene activity parameter λF. According to our simulation
experience, 5 to 7 clusters are sufficient. For each cluster of video segments, we find
their average P-R-D function and optimum encoder complexity control parameters
by solving the constrained optimization problem. In this chapter, we use Lagrangian
optimization to obtain the optimum control parameters νX and νY of each cluster.
These optimum encoder complexity control parameters for all clusters are then
stored in a database. During real-time video encoding, we compute the scene activity
parameter λF of the video segment, determine its cluster based on the value of λF,
and then use the average optimum encoder complexity control parameters of that
cluster to control the video encoder.
5.2 Lagrangian Optimization
In Chapter 4, we have proposed to partition a non-stationary input video sequence
into multiple video segments which have relatively stationary scene activities. These
video segments are then classified into several clusters and the distribution of these
video segments in the cluster seems to follow a Gaussian distribution, as shown
in Fig. 4.5. For video segments in each cluster, we use a discrete Lagrangian
optimization approach [5] to obtain the optimum complexity control parameters,
since during our operational P-R-D analysis, what we have obtained are discrete P-
R-D points, D = D(R, P |Γ(ν1, ν2, . . . , νL)) at different configurations of complexity
control parameters.
Figure 5.1: The discrete Lagrangian Optimization principle
The basic idea of discrete Lagrangian optimization is as follows. We start with a
simple rate-distortion optimization, without considering the third dimension of power
consumption, to demonstrate the basic procedure. We first introduce a Lagrange
multiplier λ ≥ 0, a non-negative real number, and consider the Lagrangian cost
Jij(λ) = dij + λ·rij for our P-R-D analysis. Refer to Fig.
5.1 for a graphical interpretation of the Lagrangian cost. Here, i is the index of video
segment and j is the index of the complexity control parameter. As the parameter
j increases, the coding bit rate rij decreases and the coding distortion dij increases.
The Lagrange multiplier allows us to select specific trade-off points. Minimizing the
Lagrangian cost Jij = dij + λ·rij when λ = 0 is equivalent to minimizing the coding
distortion; in other words, it selects the point closest to the x-axis in Fig. 5.1.
Conversely, minimizing the Lagrangian cost function when λ becomes arbitrarily
large is equivalent to minimizing the coding bit rate, and thus finding the point
closest to the y-axis in Fig. 5.1. Intermediate values of the multiplier λ determine
intermediate operating points. Please see [5] for details. Consider the following R-D
optimization problem:
optimization problem:
min_{j∈Ω} J = Σ_{i=1}^{N} (dij + λ · rij), s.t. Σ_{i=1}^{N} rij ≤ RT, (5.1)

where j is a parameter for coding control. If the mapping j = x*(i), i = 1, 2, ..., N,
minimizes

Σ_{i=1}^{N} (d_{ix*(i)} + λ · r_{ix*(i)}), (5.2)

then it is also the optimal solution to problem (5.1), for the particular case where
the total budget is

RT = R(λ) = Σ_{i=1}^{N} r_{ix*(i)}, (5.3)
so that
D(λ) = Σ_{i=1}^{N} d_{ix*(i)} ≤ Σ_{i=1}^{N} d_{ix(i)}, (5.4)

for any mapping x satisfying the rate constraint in (5.1) with RT given by (5.3).
Since the budget constraint has been removed, for a given multiplier λ, (5.1) can be
rewritten as
min_{j∈Ω} Σ_{i=1}^{N} (dij + λ · rij) = Σ_{i=1}^{N} min_{j∈Ω} (dij + λ · rij). (5.5)
In this case, the minimum point can be computed independently for each coding
unit. Note that for each coding unit i, the point on the R-D characteristic that
minimizes dij + λ·rij is the point at which the line of absolute slope λ is tangent
to the convex hull of the R-D characteristic. For this reason we normally refer to λ
as the slope, and since λ is the same for every coding unit in the sequence, we
refer to this algorithm as constant slope optimization [5].
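The independent per-unit minimization of Eq. (5.5) can be sketched as follows. This is a minimal sketch; the R-D points for each coding unit are illustrative.

```python
# Sketch of the constant-slope optimization of Eq. (5.5): for a fixed
# multiplier lambda, each coding unit independently picks the operating
# point minimizing d + lambda * r. The R-D points are illustrative.

def constant_slope_select(rd_points_per_unit, lam):
    """rd_points_per_unit: list (one per unit) of lists of (d, r) points."""
    choices = []
    for points in rd_points_per_unit:
        choices.append(min(points, key=lambda p: p[0] + lam * p[1]))
    return choices

unit_points = [
    [(50.0, 0.1), (20.0, 0.4), (10.0, 0.9)],
    [(80.0, 0.2), (30.0, 0.5), (12.0, 1.1)],
]
# A large multiplier penalizes rate, so the cheapest points are selected:
assert constant_slope_select(unit_points, 1000.0) == [(50.0, 0.1), (80.0, 0.2)]
# A small multiplier favors low distortion:
assert constant_slope_select(unit_points, 1.0) == [(10.0, 0.9), (12.0, 1.1)]
```

Sweeping λ and summing the selected rates traces out R(λ) of Eq. (5.3), from which the multiplier matching a given budget RT can be found.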
5.3 P-R-D Optimization
In the following, we apply discrete Lagrangian optimization to solve the P-R-D
optimization problem (3.2), which can be rewritten as

min_{[ν1, ν2, ..., νL]∈Ω} J = D + λ1 · R + λ2 · C, (5.6)

where J is the Lagrangian cost, and λ1 and λ2 are the multipliers for the bit rate
constraint R and the complexity constraint C, respectively. For (5.6), the objective
is to obtain the best video quality under rate and complexity constraints. Note
that the power consumption P is a function of the computational complexity C
which is related to DVS. In this chapter, for ease of discussion, we use the C-R-D
model instead of the P-R-D model, since they are equivalent under DVS. Within a
cluster of video segments, as mentioned in Chapter 4, the Lagrangian cost function
can be written as

J = Σ_{i=1}^{N} (Di + λ1i · Ri + λ2i · Ci), s.t. Σ_{i=1}^{N} Ri ≤ R, Σ_{i=1}^{N} Ci ≤ C, (5.7)
where N is the number of video segments. There are two multipliers in the
Lagrangian cost function, which makes it difficult to determine the optimum values of
both. In order to simplify the optimization procedure, we assume that the video
quality is constant. This assumption is reasonable, since in many practical
applications, users often specify a target video quality for the video encoding and
communication task. In this case, we set Di = D, where D is the target video quality set
by users. Now, the cost function becomes

J = Σ_{i=1}^{N} (Ci + λi · Ri), s.t. Σ_{i=1}^{N} Ri ≤ R, Di ≈ D. (5.8)
As mentioned in Chapter 4, the input video segments are classified into seven
clusters. In this case, the cost function can be rewritten as

min Σ_{k=1}^{N} (Ck + λk · Rk) = Σ_{i=1}^{7} Σ_{j=1}^{Ni} (Cij + λi · Rij), (5.9)

s.t. Σ_{i=1}^{N} Ri ≤ R, Di ≈ D.
Note that each video segment in a cluster is independent, so the optimization
procedure can be performed on each cluster separately. Using the constant slope
optimization algorithm [5], Eq. (5.9) is rewritten as

Σ_{i=1}^{7} Σ_{j=1}^{Ni} (Cij + λ · Rij), N = Σ_{i=1}^{7} Ni, (5.10)

s.t. Σ_{i=1}^{N} Ri ≤ R, Di ≈ D.
Based on this formulation, the optimization problem can be solved using the
following three steps.
Step 1: Calculating the minimum Lagrangian cost J_i at different values of
λ ∈ [λ_1, λ_2, ..., λ_M] for each cluster. We run the P-R-D scalable encoder
described in Chapter 4 at different configurations of the encoding parameters
[X, Y, Q], and the corresponding encoding results [C, R, D] are recorded. Here,
[X, Y, Q] are the scalable parameters of motion estimation search, pre-coding,
and quantization step size, and [C, R, D] are the encoding complexity, coding
bit rate, and video distortion, respectively. Note that we assume the video
coding distortion is constant. We group the results [C, R, D] based on their
distortion values. For each group, we apply the optimization procedure outlined
in (5.11):

    min_{[X,Y,Q]}  j_i = C_i + λ·R_i,  i = 1, 2, ..., 7,    (5.11)

    under  ∑_{i=1}^{N} R_i ≤ R,  D_i ≈ D.
The multiplier λ controls the optimization to match the rate budget. Some
sample choices of λ are listed in Table 5.1.

    λ:  300  400  500  600  700  800  1000  2000  3000
        4000  5000  6000  7000  8000  9000  10000  12000  14000

Table 5.1: Sample choices of the multiplier λ.
Fig. 5.2 to Fig. 5.11 show the minimum Lagrangian cost at different values of λ
for each of the seven clusters. From these results, we can see that each
optimal pair of rate and complexity corresponds to a multiplier λ, and that
this pair in turn corresponds to a group of the scalable parameters [X, Y, Q].
With the parameters [X, Y, Q], the optimal coding can be performed.
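The per-cluster selection behind these curves is essentially a table lookup:
for each λ, pick the recorded configuration with the smallest Lagrangian cost.
A minimal sketch in Python, with hypothetical candidate measurements (the
[C, R] values below are illustrative, not from the experiments):

```python
# Constant-slope selection for one cluster: given a multiplier lam, choose the
# encoder configuration [X, Y, Q] minimizing the Lagrangian cost C + lam * R
# among candidates recorded at (approximately) the target distortion.

def select_config(candidates, lam):
    """candidates: dicts with keys X, Y, Q, C (complexity), and R (rate)."""
    return min(candidates, key=lambda c: c["C"] + lam * c["R"])

# Hypothetical [C, R] measurements for one cluster at a fixed PSNR target.
cands = [
    {"X": 0.2, "Y": 0.2, "Q": 5, "C": 9.0e8, "R": 1.3e6},
    {"X": 0.5, "Y": 0.4, "Q": 4, "C": 8.2e8, "R": 1.4e6},
    {"X": 1.0, "Y": 0.8, "Q": 3, "C": 7.5e8, "R": 1.6e6},
]

# A small lam puts most weight on complexity; a large lam on bit rate.
print(select_config(cands, 300))
print(select_config(cands, 14000))
```

Sweeping λ over the values of Table 5.1 traces out one (rate, complexity) pair
per multiplier, which is exactly what Figs. 5.2 to 5.11 plot.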
Figure 5.2: The multiplier λ, rate constraint, and complexity at D = 38 dB.
Step 2: Predicting the scene activity parameter of the input video segment. In
Step 1, each cluster's minimum Lagrangian cost is calculated at different
values of λ. This implies that the minimum computational complexity is obtained
under a certain rate constraint for a specific value of λ. As discussed in
Chapter 4, each
Figure 5.3: The multiplier λ, rate constraint, and complexity at D = 37 dB.
Figure 5.4: The multiplier λ, rate constraint, and complexity at D = 36 dB.
Figure 5.5: The multiplier λ, rate constraint, and complexity at D = 35 dB.
Figure 5.6: The multiplier λ, rate constraint, and complexity at D = 34 dB.
Figure 5.7: The multiplier λ, rate constraint, and complexity at D = 33 dB.
Figure 5.8: The multiplier λ, rate constraint, and complexity at D = 32 dB.
Figure 5.9: The multiplier λ, rate constraint, and complexity at D = 31 dB.
Figure 5.10: The multiplier λ, rate constraint, and complexity at D = 30 dB.
Figure 5.11: The multiplier λ, rate constraint, and complexity at D = 29 dB.
video segment is classified into a cluster according to its scene activity
parameter. Hence, if the distribution of the clusters is known, the Lagrangian
multiplier λ can be fixed for a given rate budget. Fortunately, the
distribution of the video segments over the clusters, in terms of scene
activity, can often be considered Gaussian. Let x be the scene activity
parameter, f(x) the probability density function of x among these video
clusters, and r(x) the corresponding bit rate obtained from optimization. The
expected value of the output bit rate is then given by

    ∫_{-∞}^{∞} f(x)·r(x) dx.    (5.12)

Consider the discrete case of this problem. Let x_i be the scene activity
parameter of the i-th cluster, Ω = {x_i | i = 1, 2, ..., 7}, and F_i the
probability of sample i, as listed in Table 5.2. Let r(j_i) be the rate of the
j-th video segment that is classified into the i-th cluster. In this case, the
expected bit rate of the j-th video segment is predicted by
    ∑_{i=1}^{7} F_i · r(i|λ).    (5.13)
The overall coding rate R_p is predicted by

    R_p = ∑_{j=1}^{N} ∑_{i=1}^{7} F_i·r(i|λ) = ∑_{i=1}^{7} (N·F_i)·r(i|λ) = ∑_{i=1}^{7} N_i·r(i|λ),    (5.14)

where N_i is the number of video segments that are classified into the i-th
cluster.
    Cluster    1      2      3      4      5      6      7
    Prob. F_i  0.053  0.105  0.140  0.281  0.211  0.140  0.070

Table 5.2: The probability F_i of each cluster.
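The prediction in (5.14) is a probability-weighted sum over the clusters. A
sketch using the Table 5.2 probabilities, with hypothetical per-cluster rates
r(i|λ) standing in for the Step-1 results:

```python
# Predict the overall coding rate Rp = sum_i (N * F_i) * r(i | lam), Eq. (5.14),
# from the cluster probabilities F_i (Table 5.2) and the per-cluster optimal
# rates r(i | lam) found in Step 1 (the rates below are hypothetical).

F = [0.053, 0.105, 0.140, 0.281, 0.211, 0.140, 0.070]   # cluster probabilities
r = [12000, 15000, 17000, 20000, 23000, 26000, 30000]   # hypothetical r(i|lam)
N = 57                                                  # number of segments

Rp = sum(N * Fi * ri for Fi, ri in zip(F, r))
print(round(Rp))  # predicted total bits for the whole sequence
```

Repeating this for every λ in Table 5.1 yields the R_p column of Table 5.3.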
Step 3: Fixing the Lagrangian multiplier λ and obtaining the optimal encoder
control parameters [X, Y, Q]. In Step 1, the minimum Lagrangian cost of a video
segment in the i-th cluster is calculated at different values of λ. In other
words, the minimum computational complexity is obtained for a given rate
constraint and a specific value of λ. As discussed in Chapter 4, an input video
segment is classified into a cluster based on its scene activity parameter. The
distribution of the clusters in the video sequence can be predicted by Step 2.
Accordingly, the Lagrangian multiplier λ can be fixed for a given rate budget.
For example, we can consider the video quality levels of 37 dB, 34 dB, and 30
dB. In order to obtain λ, we first list the relationship between the expected
rate and λ in Table 5.3. Then, the average rate budget for each video segment
is calculated from the total rate budget as
    r = R / N.    (5.15)
           λ     300           600           2000          7000          14000
    37 dB  R_p   1264347.14    1209230.67    1162620.71    1153689.24    1153165.68
           C_p   826501723.62  844671507.38  902107565.82  940163519.19  946846809.28
           R_r   1.2381e+006   1.2112e+006   1.1217e+006   1.1130e+006   1.1127e+006
           C_r   972837842     990056977     1.0679e+009   1.1161e+009   1.1217e+009
    34 dB  R_p   802745.57     705773.85     602751.87     593037.21     592451.51
           C_p   663130614.32  706261434.62  855651058.05  900865197.85  908359867.79
           R_r   8.0704e+005   7.3882e+005   6.1917e+005   6.1253e+005   6.1233e+005
           C_r   788791942     850711799     1.0121e+009   1.0609e+009   1.0750e+009
    30 dB  R_p   390217.42     331753.54     266147.48     259710.16     255222.04
           C_p   572981357.51  597513566.14  709343178.71  738226926.15  783359359.26
           R_r   4.1154e+005   3.6938e+005   3.0327e+005   3.0192e+005   2.9966e+005
           C_r   680497548     701901814     854763964     878845555     935804923

Table 5.3: The Lagrangian multiplier λ and the corresponding QoS and rate constraint.
The encoder control parameters [X, Y, Q] are obtained at the minimum Lagrangian
cost. In this case, (5.11) can be rewritten as

    min_{[X,Y,Q]}  J = ∑_{i=1}^{7} ∑_{j=1}^{N_i} min(C_{ij} + λ·R_{ij}),  N_i = N·F_i,    (5.16)

    under  R_p = N · ∑_{i=1}^{7} F_i·r_i ≤ R,  D_i ≈ D.
Here, F_i is the probability of the i-th cluster. The overall computational
complexity of a video sequence for a given rate constraint is

    C_p = ∑_{i=1}^{7} ∑_{j=1}^{N_i} C_{ij},  with  R_p = N · ∑_{i=1}^{7} F_i·r_i ≤ R,  D_i ≈ D,    (5.17)

where C_p is the total minimum computational complexity and R is the total rate
budget.
5.4 Summary of Algorithm
We apply the optimization to a common video sequence which has 3590 frames
partitioned into 57 segments, and the following steps are performed:

1. Assume the video QoS and rate evaluation listed in Table 5.3; N is 57.

2. Calculate the corresponding expected rate with Eq. (5.13). Note that the
probabilities F_i are given in Table 5.2.

3. Obtain the corresponding scaling parameters from Table 5.4.
    PSNR: 37 dB
    λ          Clust1  Clust2  Clust3  Clust4  Clust5  Clust6  Clust7
    300    X   0.20    0.10    0.10    1.00    1.00    0.50    0.10
           Y   0.20    0.40    0.20    0.40    0.60    0.80    0.80
           Q   5       4       3       3       3       3       3
    600    X   0.20    0.10    0.10    1.00    1.00    1.00    0.10
           Y   0.20    0.40    0.20    0.40    0.60    0.80    0.80
           Q   5       4       3       3       3       3       3
    2000   X   0.20    0.10    0.10    0.50    1.00    1.00    1.00
           Y   0.20    0.40    0.20    0.80    0.60    0.80    0.80
           Q   5       4       3       4       3       3       3
    7000   X   0.20    0.10    1.00    0.60    1.00    1.00    1.00
           Y   0.20    0.40    0.20    0.85    0.60    0.80    0.80
           Q   5       4       3       4       3       3       3
    14000  X   1.00    0.10    1.00    0.60    1.00    1.00    1.00
           Y   0.20    0.40    0.20    0.85    0.60    0.80    0.80
           Q   5       4       3       4       3       3       3

Table 5.4: The Lagrangian multiplier λ and the corresponding scaling parameters at PSNR ≈ 37 dB.
The multiplier λ can be obtained under given QoS and rate constraints. For
example, if the required QoS is 37 dB and the average bit rate must be less
than 1162620.71/57 = 20396.85 bits, λ is chosen as 2000. With the QoS being
37 dB and λ being 2000, the scaling parameters are obtained from Table 5.4:
    PSNR: 34 dB
    λ          Clust1  Clust2  Clust3  Clust4  Clust5  Clust6  Clust7
    300    X   0.10    0.10    0.10    0.10    0.80    0.10    0.10
           Y   0.10    0.10    0.10    0.40    0.40    0.70    0.70
           Q   8       5       3       6       4       5       4
    600    X   0.10    0.10    0.10    0.10    0.80    0.80    0.10
           Y   0.10    0.10    0.20    0.40    0.40    0.70    0.90
           Q   8       5       6       6       4       5       5
    2000   X   0.10    0.10    0.10    0.10    1.00    1.00    0.80
           Y   0.10    0.10    0.20    0.40    0.80    0.75    1.00
           Q   8       5       6       6       6       5       5
    8000   X   0.10    0.50    1.00    1.00    1.00    1.00    0.80
           Y   0.10    0.20    0.20    0.40    0.80    0.75    1.00
           Q   8       6       6       6       6       5       5
    14000  X   1.00    0.50    1.00    1.00    1.00    1.00    0.80
           Y   0.10    0.20    0.20    0.40    0.80    0.75    1.00
           Q   8       6       6       6       6       5       5

Table 5.5: The Lagrangian multiplier λ and the corresponding scaling parameters at PSNR ≈ 34 dB.
    PSNR: 30 dB
    λ          Clust1  Clust2  Clust3  Clust4  Clust5  Clust6  Clust7
    300    X   0.10    0.10    0.10    0.10    0.80    0.10    0.10
           Y   0.10    0.10    0.10    0.20    0.20    0.40    0.90
           Q   18      12      10      10      8       8       8
    500    X   0.10    0.10    0.10    0.10    1.00    0.10    0.10
           Y   0.10    0.10    0.10    0.20    0.20    0.40    0.90
           Q   18      12      10      10      8       8       8
    3000   X   0.10    0.10    0.10    0.90    1.00    0.80    0.30
           Y   0.10    0.20    0.10    0.20    0.20    0.70    0.95
           Q   18      14      10      12      8       10      8
    8000   X   0.10    0.10    0.10    1.00    1.00    1.00    0.30
           Y   0.10    0.20    0.20    0.20    0.20    0.80    0.95
           Q   18      14      12      12      8       10      8
    12000  X   0.10    0.10    0.90    1.00    1.00    1.00    0.30
           Y   0.10    0.20    0.20    0.20    0.40    0.80    0.95
           Q   18      14      12      12      10      10      8

Table 5.6: The Lagrangian multiplier λ and the corresponding scaling parameters at PSNR ≈ 30 dB.
[0.20, 0.20, 5.00], [0.10, 0.40, 4.00], [0.10, 0.20, 3.00], [0.50, 0.80, 4.00],
[1.00, 0.60, 3.00], [1.00, 0.80, 3.00], [1.00, 0.80, 3.00]. They correspond to
Cluster 1 through Cluster 7, respectively. Hence, the coding is performed with
these scaling parameters. In this way, the fifteen individual video encodings
are performed, and their actual results are listed in Table 5.3, where R_p and
C_p denote the predicted rate and complexity, and R_r and C_r denote the
corresponding actual rate and complexity.
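The λ selection itself can be sketched as a lookup over the Step-1 predictions:
among the multipliers whose predicted total rate fits the budget, the smallest
one yields the lowest encoding complexity. The R_p values below are those of
Table 5.3 for the 37 dB quality level:

```python
# Choose the Lagrange multiplier from the Step-1 table: among all lam whose
# predicted total rate Rp(lam) fits the budget, take the smallest lam, which
# gives the lowest encoding complexity (values from Table 5.3, PSNR = 37 dB).

Rp_37dB = {300: 1264347.14, 600: 1209230.67, 2000: 1162620.71,
           7000: 1153689.24, 14000: 1153165.68}

def choose_lambda(table, rate_budget):
    feasible = [lam for lam, Rp in table.items() if Rp <= rate_budget]
    return min(feasible) if feasible else max(table)  # fall back to max lam

print(choose_lambda(Rp_37dB, 1162620.71))  # matches the lam = 2000 example
```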
In order to compare with the non-scalable encoder, we encode the same video
sequence without complexity control under the same QoS requirement. Fig. 5.12
to Fig. 5.17 show the reconstruction results of the video sequence with both
the scalable and the non-scalable encoder at QoS = 37 dB, 34 dB, and 30 dB.
Figure 5.12: Comparison of the non-scaling and scaling coding at D = 37 dB, λ = 300.
In Fig. 5.12 and Fig. 5.13, the QoS is 37 dB. Figs. 5.14 and 5.15 and Figs.
5.16 and 5.17 correspond to a QoS of 34 dB and 30 dB, respectively. The results
are listed in Table 5.7.
Figure 5.13: Comparison of the non-scaling and scaling coding at D = 37 dB, λ = 2000.
Figure 5.14: Comparison of the non-scaling and scaling coding at D = 34 dB, λ = 300.
Figure 5.15: Comparison of the non-scaling and scaling coding at D = 34 dB, λ = 2000.
Figure 5.16: Comparison of the non-scaling and scaling coding at D = 30 dB, λ = 300.
Figure 5.17: Comparison of the non-scaling and scaling coding at D = 30 dB, λ = 2000.
    PSNR    Non-scalable                 Scaling 1                               Scaling 2
            R            C               R            C            Ratio         R            C            Ratio
    37 dB   8.7077×10^5  1.3049×10^9     1.2381×10^6  9.7284×10^8  0.41          1.1217×10^6  1.0676×10^9  0.55
    34 dB   4.7797×10^5  1.3003×10^9     8.0704×10^5  7.8879×10^8  0.22          6.1917×10^5  1.0121×10^9  0.47
    30 dB   2.1925×10^5  1.2944×10^9     4.1154×10^5  6.8049×10^8  0.15          3.0327×10^5  8.5476×10^8  0.29

Table 5.7: The energy saving ratio.
The energy saving ratio is calculated as:

    Ratio_11 = (0.9728×10^9 / 1.3049×10^9)^3 ≈ 0.41,  Ratio_12 = (1.0679×10^9 / 1.3049×10^9)^3 ≈ 0.55,  PSNR = 37 dB,
    Ratio_21 = (0.7888×10^9 / 1.3003×10^9)^3 ≈ 0.22,  Ratio_22 = (1.0121×10^9 / 1.3003×10^9)^3 ≈ 0.47,  PSNR = 34 dB,
    Ratio_31 = (0.6805×10^9 / 1.2944×10^9)^3 ≈ 0.15,  Ratio_32 = (0.8548×10^9 / 1.2944×10^9)^3 ≈ 0.29,  PSNR = 30 dB.
Note that the results in Table 5.7 are obtained through theoretical evaluation.
The achievable energy saving in a practical setting is discussed in Chapter 6.
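The cube law behind these ratios follows from the ideal DVS assumption: if the
clock (and voltage) can be scaled in proportion to complexity, energy scales
with the cube of the complexity ratio. A one-line check against the Table 5.7
numbers:

```python
# Theoretical energy saving ratio under ideal DVS: power ~ f^3 and the clock
# scales with complexity, so E_scaled / E_nonscaled is the cube of the
# complexity ratio (complexity values from Table 5.7, PSNR = 37 dB).

def energy_ratio(c_scaled, c_nonscaled):
    return (c_scaled / c_nonscaled) ** 3

print(round(energy_ratio(0.9728e9, 1.3049e9), 2))  # Ratio_11 ~ 0.41
print(round(energy_ratio(1.0679e9, 1.3049e9), 2))  # Ratio_12 ~ 0.55
```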
Chapter 6

Joint Hardware and Video Encoder Adaptation
In this chapter, we study how the hardware-layer system scheduling and application-
layer P-R-D control can be jointly designed to achieve energy saving in practical
portable video communication systems.
6.1 Background
In Chapters 2 to 4, we have developed a P-R-D analysis framework, which extends
the traditional R-D analysis by considering the third dimension of power consump-
tion. We have demonstrated that the video encoding energy can be significantly
saved using P-R-D optimization. To realize this energy saving in practical
system design, we need to consider the power consumption behavior of the
hardware computing platform and study joint hardware-application adaptation.
DVS provides a mechanism to adapt the CPU operating voltage and clock frequency
according to the workload of the P-R-D video encoder. It translates
complexity-scalability and encoder complexity reduction into energy-scalability
and encoder energy reduction. The encoder complexity minimization in the
previous chapters is performed at the application layer and does not consider
the hardware-layer system scheduling. In cross-layer adaptation, the CPU clock
frequency, the circuit supply voltage in DVS, the encoding bit rate, the video
encoding distortion, and the power consumption are jointly optimized.
6.2 System Model
In order to perform cross-layer adaptation for the complexity-scalable encoder
to save power, system models at the hardware and application layers are needed.
6.2.1 CPU Model
At the hardware layer, we consider a video application system with a single
adaptive CPU that supports multiple speeds, f_1, f_2, ..., f_k, trading off
performance for energy. At a lower speed, the CPU consumes less power. However,
a lower speed may increase the power consumption of other resources such as
memory. Two major observations regarding CPU energy can be made here. First,
the CPU is one of the most energy-consuming components in a portable video
device; for example, depending on the application workload, the CPU might
consume up to 52% of the total energy in the Stargate [58]. Second, many
processors used in embedded system design today, for example, the Intel XScale
PXA255 [24], Intel Pentium-M [25], and AMD Athlon [3], allow software to change
the speed through the Advanced Configuration and Power Interface (ACPI)
standard [26], thereby enabling cross-layer system control.
In general, there are two approaches to reducing CPU energy consumption. The
first is dynamic power management (DPM) [34], which puts the idle processor
into a lower-power sleep state. The second is dynamic frequency/voltage scaling
(DVS) [53], which reduces the operating speed and voltage of the active
processor. DPM will be used in Chapter 7 for additional energy saving in our
embedded system design. In this chapter, we focus on hardware adaptation
through DVS.
The CPU power consumption typically consists of three major parts: the dynamic
power, the short-circuit power, and the leakage power, as shown in the
following:

    C_cp·f·V²   +   V·I_sc    +   V·I_leak,    (6.1)
    (dynamic)   (short circuit)   (leakage)

where C_cp is the loading capacitance, f is the speed, V is the voltage, I_sc
is the short-circuit current, and I_leak is the leakage current [2]. When the
speed decreases,
the CPU can operate at a lower voltage and thus reduce its power consumption.
Furthermore, the CPU power consumption model is generally a convex function of
the speed. Consequently, the CPU energy (i.e., the product of the power and
time) also decreases as the speed decreases, even at the cost of longer
execution time. In particular, for ideal processors, we assume that the power
is dominated by the dynamic power and that the voltage is proportional to the
speed; that is, the CPU power is proportional to the cube of the speed, which
is the basis on which the power saving results were calculated. Unfortunately,
such an assumption does not hold for real embedded systems such as the
Stargate, which uses the Intel XScale PXA255, whose power is not proportional
to the cube of the speed (see Table 6.1). Furthermore, a lower speed may
increase the
energy consumption of other resources, such as memory and display. For example,
when the CPU operates at a lower speed, an application needs to run for a longer
time; consequently, it needs to use the memory for a longer time, often increasing
the memory energy consumption. Since our goal is to save the total energy of the
whole device, we are more interested in the total power consumption of the device.
In general, the relationship between the clock speed f and the total system
power P(f) can be obtained via measurements such as the data in Table 6.1.
Therefore, we assume, without loss of generality, that the total power
decreases as the CPU speed decreases. This assumption holds for the Stargate
system used in our project.
    Freq. (MHz)  400   300   200   100
    Power (W)    1.89  1.67  1.62  1.45

Table 6.1: Stargate power consumption.
Besides the fact that the system power consumption is not proportional to the
cube of the clock speed, in practice the clock speed of a real CPU cannot be
adjusted continuously. Instead, the CPU offers several candidate clock speeds
to choose from. In this case, the CPU clock speed cannot be adjusted to a
frequency that exactly matches the computational complexity that the video
application demands.
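The gap between the ideal cubic model and the measured behavior can be seen by
comparing Table 6.1 against a cube-law prediction anchored at the highest
speed; the cubic model here is an idealization, not a fitted measurement:

```python
# Measured Stargate system power (Table 6.1) versus an ideal cubic model
# P(f) = P(f_max) * (f / f_max)^3 anchored at the highest speed.  Measured
# power at low speeds sits far above the cubic prediction, because components
# other than the CPU core do not scale down with DVS.

measured = {400: 1.89, 300: 1.67, 200: 1.62, 100: 1.45}  # MHz -> watts

def cubic_model(f_mhz, f_max=400, p_max=1.89):
    return p_max * (f_mhz / f_max) ** 3

for f in (400, 300, 200, 100):
    print(f, measured[f], round(cubic_model(f), 3))
```

At 100 MHz the cubic model predicts about 0.03 W, while the measured system
power is 1.45 W, which is why the practical savings fall short of Chapter 5.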
6.2.2 Task Model
The video application task is periodic in nature, releasing a job every period.
For example, the new complexity-controlled encoder issues the video segment
coding jobs periodically. Each job has a soft deadline, typically defined as
the end of the period. Each job consumes a certain amount of CPU cycles.
Different jobs of the same task, like the video segments mentioned in the
previous chapter, will need different amounts of CPU cycles. In practice, the
instantaneous
cycle demand for individual video segments may change significantly over the
task period. However, the probability distribution of the cycle demand of the
video segment coding is often stable, because of both the periodic nature of
its jobs and the probability distribution of a video segment being classified
into a cluster. In other words, the probability distribution of the cycle
demand of each video segment coding can be estimated based on previous encoding
statistics. The probability that a job demands no more than a certain amount of
cycles in a period is denoted by

    Pr(X ≤ x),    (6.2)

where X is the random variable associated with the number of cycles demanded by
each job. In Chapter 4, an input video segment is classified into one of the
seven clusters F_i, i = 1, 2, ..., 7, and the video segments in the i-th
cluster require C_i operation cycles. Assume that the time interval of each
video segment is approximately T_P. The probability Pr becomes
    Pr( X ≤ ∑_{i=1}^{j} C_i / T_P ) = ∑_{i=1}^{j} F_i.    (6.3)

Table 5.2 lists the probability distribution of these clusters.
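The right-hand side of Eq. (6.3) is a cumulative sum over the cluster
probabilities of Table 5.2; a minimal sketch:

```python
# Discrete cycle-demand distribution, Eq. (6.3): with segments falling into
# clusters with probabilities F_i (Table 5.2), the probability that a job's
# demand stays within the first j cluster levels is the cumulative sum of F_i.

F = [0.053, 0.105, 0.140, 0.281, 0.211, 0.140, 0.070]

def demand_cdf(j):
    """Cumulative probability over the first j clusters, 1 <= j <= 7."""
    return sum(F[:j])

print(round(demand_cdf(4), 3))  # probability of staying within clusters 1-4
print(round(demand_cdf(7), 3))  # over all seven clusters this reaches 1.0
```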
6.3 Application-Layer Video Encoder Adaptation
In system scheduling, we need to make sure that the performance and CPU
requirements of each task are met. In particular, the scheduler uses a
so-called soft real-time algorithm to periodically allocate cycles to each task
based on its statistical cycle demand, e.g., the 95th percentile of the cycle
demand of all of its jobs. The statistical
demand can be specified based on the task's characteristics. The purpose of
this statistical, rather than worst-case, allocation is to improve CPU
utilization while providing soft (statistical) performance guarantees. The
scheduler then enforces the allocation through an earliest deadline first (EDF)
based algorithm. This algorithm performs admission control to ensure that the
total CPU utilization (at the highest CPU speed f_k) of all concurrent tasks is
no more than one, i.e.,

    ∑_{i=1}^{n} (C_i / f_k) / T_pi ≤ 1.    (6.4)
Here, the system has n tasks and the i-th task demands C_i cycles per period
T_pi. Among all tasks, the scheduler first executes the task with the earliest
deadline and positive allocation. As a task is executed, its cycle allocation
is decreased by the number of cycles it consumes. When its allocation is
exhausted, the task is preempted to run in best-effort mode for overrun
protection.
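The admission test of Eq. (6.4) can be sketched directly; the task parameters
below are hypothetical, chosen only to illustrate the check:

```python
# Admission control for the soft real-time scheduler, Eq. (6.4): the task set
# is accepted only if the total utilization at the highest CPU speed f_k does
# not exceed one.

def admissible(tasks, f_k):
    """tasks: list of (C_i cycles per period, T_pi period in seconds)."""
    return sum((C / f_k) / T for C, T in tasks) <= 1.0

# Hypothetical task set: (cycles per period, period in seconds).
tasks = [(1.5e7, 0.1), (8.0e6, 0.05)]
print(admissible(tasks, f_k=400e6))  # utilization 0.375 + 0.40 = 0.775
print(admissible(tasks, f_k=200e6))  # utilization 1.55: rejected
```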
This algorithm only protects against system overrun in a best-effort mode
rather than providing a mechanism to adjust the CPU frequency for minimum power
consumption. An improved algorithm called practical dynamic voltage scaling
(PDVS) is used to minimize the total energy of such a multimedia system [62].
6.3.1 Practical Dynamic Voltage Scaling (PDVS)
The goal of PDVS is to minimize the total energy of the mobile device while provid-
ing soft performance guarantees to each multimedia task. For this, PDVS extends
the soft real-time scheduling algorithm by changing the CPU speed of task execu-
tion. The purpose is to save energy since the CPU may run slower without affecting
application performance. Assume that there are n tasks, each task is allocated
C_i cycles per period T_pi, and all concurrent tasks run at a uniform CPU
frequency.
The total CPU demand of all tasks is

    (1/f_k) ∑_{i=1}^{n} C_i / T_pi ≤ 1.    (6.5)
The uniform frequency can be chosen as the lowest frequency that is no less
than the total demand, i.e.,

    min f :  f ∈ {f_1, f_2, ..., f_k}  and  f ≥ ∑_{i=1}^{n} C_i / T_pi.    (6.6)
If each task used its allocated cycles exactly, this uniform speed would
minimize CPU energy due to the convex nature of the frequency-power
relationship [15, 52]. However, even with frequency adaptation, some energy is
still wasted, because the cycle allocation of a task is based on some
percentile of the number of cycles demanded by its jobs. For example, if this
percentile is 95%, then about 95% of its jobs will complete early. Early
completion may eventually cause the CPU to be idle, which, in turn, results in
energy waste, since the device still consumes energy during the idle time.
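The uniform-speed rule of Eq. (6.6) picks the smallest available frequency that
covers the aggregate demand; a sketch with a hypothetical speed set and task
mix:

```python
# Eq. (6.6): choose the lowest available CPU speed that still covers the total
# statistical cycle demand (cycles per second) of all admitted tasks.

def uniform_speed(tasks, speeds):
    """tasks: (C_i, T_pi) pairs; speeds: available clock frequencies in Hz."""
    demand = sum(C / T for C, T in tasks)        # total cycles per second
    feasible = [f for f in speeds if f >= demand]
    if not feasible:
        raise ValueError("demand exceeds the highest CPU speed")
    return min(feasible)

speeds = [100e6, 200e6, 300e6, 400e6]            # Stargate-like speed set
tasks = [(1.5e7, 0.1), (8.0e6, 0.05)]            # hypothetical demands
print(uniform_speed(tasks, speeds) / 1e6, "MHz")  # demand: 310 Mcycles/s
```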
Note that more energy can be saved by avoiding or reducing the idle time.
Therefore, PDVS dynamically adapts the CPU speed of each job execution in a way
that minimizes the total energy consumed during the job execution while
bounding the job's execution time. The reason for bounding the execution time
is not to miss the deadline of the job or of other jobs executed after it. In
other words, each job should finish within a certain amount of time. Therefore,
a time budget is allocated to each job. Specifically, if there are n concurrent
tasks and each task is
allocated C_i cycles per period T_pi, then the i-th task is allocated

    T_i = C_i / ( ∑_{i=1}^{n} C_i / T_pi )    (6.7)

time units per period T_pi (i.e., for each of its jobs). That is, we distribute
the time among all tasks based on their cycle demands. Note that if there is
only a single task, then its time budget equals its period. Using joint
allocation of cycles and time, we can adapt the execution speed for each job,
as long as the job can use its allocated cycles within its allocated time.
How is the execution speed dynamically adapted for the jobs of each individual
task? Without loss of generality, we consider a specific task with cycle
allocation C and time allocation T per period T_p. The basic idea behind PDVS
is to minimize the total energy consumed during each job execution while
limiting the job's execution time within its time budget. To do this, PDVS sets
a speed for each of the cycles allocated to the job. If a cycle x, 1 ≤ x ≤ C,
is executed at speed f(x), its execution time is 1/f(x). The energy consumed by
the system during this time interval is

    (1/f(x)) × p(f(x)) = p(f(x)) / f(x),    (6.8)
where p(f(x)) is the total power consumed by the whole device at speed f(x).
Because multimedia tasks demand cycles statistically, the cycle x is executed
with a certain probability, and its expected energy is

    F(x) · p(f(x)) / f(x),    (6.9)
where F(x) is the tail of the distribution function:

    F(x) = 1 − Pr(X ≤ x).    (6.10)
By setting a speed for each of the cycles allocated to the job, the
minimization of the total energy consumed during the job execution, while
bounding its execution time, can be formulated as the following speed
adaptation problem:

    min  ∑_{x=1}^{C} F(x)·p(f(x))/f(x) + ( T − ∑_{x=1}^{C} F(x)/f(x) ) · P_idle,    (6.11)

    s.t.  ∑_{x=1}^{C} 1/f(x) ≤ T,
          f(x) ∈ {f_1, f_2, ..., f_k},  1 ≤ x ≤ C,

where T is the time budget allocated to the job and P_idle is the device power
when the CPU is idle at the lowest speed.
In practice, using the constrained optimization (6.11) directly is often not
feasible for two major reasons. First, the number of allocated cycles, C, may
be very large for multimedia tasks (e.g., in the millions); consequently, the
computational overhead for solving the optimization problem may be unacceptably
large. Second, we cannot set a speed for individual cycles, since each speed
change may take tens of microseconds [62]. To reduce the cost, a piecewise
approximation technique is applied that groups the allocated cycles and sets
the same speed within each group. Specifically, we first use a set of points,
b_1, b_2, ..., b_{k+1}, to divide the allocated cycles into k groups, each with
size g_i, 1 ≤ i ≤ k. We then find a speed f(b_i) for each group [b_i, b_{i+1}).
In this
way, we rewrite the above constrained optimization as

    min  ∑_{i=1}^{k} g_i·F(b_i)·p(f(b_i))/f(b_i) + ( T − ∑_{i=1}^{k} g_i·F(b_i)/f(b_i) ) · P_idle,    (6.12)

    s.t.  ∑_{i=1}^{k} g_i / f(b_i) ≤ T,
          f(b_i) ∈ {f_1, f_2, ..., f_k},  1 ≤ i ≤ k.
This leads to an optimization problem over a discrete set, which can be solved,
at a minimum, by brute-force search. Certainly, faster algorithms, such as
gradient search, may also be applied.
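A brute-force solver for (6.12) can be sketched as follows; the power model,
group sizes, tail probabilities, and time budget are hypothetical stand-ins,
not measured values:

```python
# Brute-force solution of the grouped speed-adaptation problem, Eq. (6.12):
# enumerate every assignment of an available speed to each cycle group and
# keep the cheapest one whose worst-case execution time fits the budget T.
from itertools import product

def pdvs_schedule(groups, tails, speeds, power, T, p_idle):
    """groups: cycles per group; tails: tail probabilities F(b_i) per group."""
    best, best_e = None, float("inf")
    for assign in product(speeds, repeat=len(groups)):
        if sum(g / f for g, f in zip(groups, assign)) > T:
            continue  # violates the worst-case time-budget constraint
        busy = sum(g * Fb * power(f) / f
                   for g, Fb, f in zip(groups, tails, assign))
        expected_busy = sum(g * Fb / f
                            for g, Fb, f in zip(groups, tails, assign))
        total = busy + (T - expected_busy) * p_idle
        if total < best_e:
            best, best_e = assign, total
    return best, best_e

# Hypothetical convex power model: static term plus a cubic dynamic term.
def power(f):
    return 0.5 + 1.4 * (f / 400e6) ** 3

sched, energy = pdvs_schedule(groups=[1.0e7, 0.5e7], tails=[1.0, 0.4],
                              speeds=[100e6, 200e6, 300e6, 400e6],
                              power=power, T=0.1, p_idle=0.3)
print([f / 1e6 for f in sched])  # speed (MHz) chosen for each cycle group
```

With k groups and k speeds the search space is k^k, so for realistic sizes the
faster methods mentioned above would replace the exhaustive loop.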
6.3.2 Cross-layer Adaptation for Single-Stream Video Encoding
In the following, we consider PDVS adaptation for a single video encoding task.
In Section 6.3.1, the PDVS adaptation algorithm was presented for the more
generic adaptation problem with multiple CPU tasks. Within the context of
single-stream video encoding, this PDVS adaptation problem can be further
simplified. First, in our system, video encoding is the dominant CPU task,
while other system management tasks require a very small number of cycles and
can be considered constant over the system running time. Second, since no delay
is allowed, the time budget for this single task is fixed, and the adaptation
is performed at a coarse time granularity equal to the encoding time of one
video segment, denoted T_GoP. In this way, Eq. (6.12) can be simplified as
    min  g_j·p(f(b_i))/f(b_i) + ( T_GoP − g_j/f(b_i) ) · P_idle,    (6.13)

    s.t.  g_j / f(b_i) ≤ T_GoP,  j = 1, 2, ..., 7,
          f(b_i) ∈ {f_1, f_2, ..., f_k},  1 ≤ i ≤ k.
Here, T_GoP is the time budget, which equals the total encoding time of the
video segment, and g_j is the average computational complexity of the j-th
cluster of video segments. This is the solution for a single-task, no-delay
video coding system. Note that (6.13) represents the average energy consumption
for encoding the video segments that are classified into the j-th cluster. The
total energy consumption of the whole video sequence is given by
    P_e^B = ∑_{j=1}^{7} P_j · ( g_j·p(f(b_i))/f(b_i) + ( T_GoP − g_j/f(b_i) ) · P_idle ),    (6.14)

    s.t.  g_j / f(b_i) ≤ T_GoP,  j = 1, 2, ..., 7,
          f(b_i) ∈ {f_1, f_2, ..., f_k},  1 ≤ i ≤ k.
This is the power consumption P_e^B with cross-layer adaptation. If cross-layer
adaptation is not used, the expected power consumption P_e^A is given by

    P_e^A = ∑_{j=1}^{7} P_j · ( g_k^m / f_k ) · p(f_k) = T_GoP · p(f_k),    (6.15)

    s.t.  g_k^m / f_k ≈ T_GoP.
Let g_i^m denote the maximum number of cycles in the period T_GoP under clock
speed f_i. The energy saving ratio between these two cases, with and without
cross-layer adaptation,
is

    P_e^B / P_e^A
      = [ ∑_{j=1}^{7} P_j · ( g_j·p(f(b_i))/f(b_i) + ( T_GoP − g_j/f(b_i) ) · P_idle ) ] / ( T_GoP · p(f_k) )
      = (1/T_GoP) ∑_{j=1}^{7} P_j · ( (g_j/f(b_i)) · p(f(b_i))/p(f_k) + ( T_GoP − g_j/f(b_i) ) · P_idle/p(f_k) )
      = ∑_{j=1}^{7} P_j · ( (g_j/g_i^m) · p(f(b_i))/p(f_k) + ( 1 − g_j/g_i^m ) · P_idle/p(f_k) ),    (6.16)

    s.t.  g_i^m / f(b_i) ≈ T_GoP,  j = 1, 2, ..., 7,
          f(b_i) ∈ {f_1, f_2, ..., f_k},  1 ≤ i ≤ k.
Consider the test video sequence used in Chapter 5, which has 3585 frames
partitioned into 57 video segments. It is compressed with the P-R-D optimal
scaling encoder running on a Stargate microprocessor. The Stargate's power
consumption model is listed in Table 6.1, and the cluster probabilities are
shown in Table 5.2. With the encoding results in Table 5.7 of Chapter 5, the
actual energy saving ratio can be computed with (6.16), assuming that the power
consumption levels of the Stargate in the running and idle modes are the same.
The energy saving results are shown in Tables 6.2 to 6.4. From these results,
we can see that the actual energy saving ratio is not as good as the
theoretical evaluation in Chapter 5. This is because the system power
consumption includes not only the CPU energy consumption, but also the energy
consumption of other system components, which cannot be scaled by DVS.
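Under the equal running/idle power assumption, each term of (6.16) collapses to
P_j · p(f)/p(f_k), the weighted ratio of the system power at the cluster's
selected speed to the power at full speed. This reproduces the R-Ratio of Table
6.2, case (1), directly from Table 6.1 and Table 5.2:

```python
# With running and idle power assumed equal, each term of Eq. (6.16) collapses
# to P_j * p(f) / p(f_k): the probability-weighted ratio of the system power
# at the cluster's selected clock speed to the power at full speed (400 MHz).

power = {100: 1.45, 200: 1.62, 300: 1.67, 400: 1.89}   # Table 6.1 (W)
F = [0.053, 0.105, 0.140, 0.281, 0.211, 0.140, 0.070]  # Table 5.2
f_sel = [200, 300, 200, 300, 400, 400, 300]            # Table 6.2, case (1)

ratio = sum(Fj * power[f] / power[400] for Fj, f in zip(F, f_sel))
print(round(ratio, 4))  # -> 0.9193, the R-Ratio reported in Table 6.2
```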
    37 dB (1)
    Cluster      1          2          3          4          5          6          7         R-Ratio
    g_j          9.1741e6   1.1255e7   9.0274e6   1.5083e7   1.7511e7   1.7847e7   1.6248e7
    g_j/g_i^m    0.856      0.689      0.844      0.923      0.796      0.812      0.995     0.9193
    f (MHz)      200        300        200        300        400        400        300
    p_i/p_k      0.857      0.884      0.857      0.884      1          1          0.884

    37 dB (2)
    g_j          9.1741e6   1.1255e7   9.0274e6   1.7524e7   1.7511e7   2.0118e7   2.0844e7
    g_j/g_i^m    0.856      0.689      0.844      0.797      0.796      0.915      0.948     0.9602
    f (MHz)      200        300        200        400        400        400        400
    p_i/p_k      0.857      0.884      0.8571     1          1          1          1

Table 6.2: The energy saving ratio 1 (PSNR = 37 dB).
    34 dB (1)
    Cluster      1          2          3          4          5          6          7         R-Ratio
    g_j          7.7220e6   7.7990e6   7.8556e6   1.1267e7   1.4371e7   1.4754e7   1.4893e7
    g_j/g_i^m    0.721      0.729      0.735      0.690      0.880      0.903      0.911     0.8760
    f (MHz)      200        200        200        300        300        300        300
    p_i/p_k      0.857      0.857      0.857      0.884      0.884      0.884      0.884

    34 dB (2)
    g_j          7.7220e6   7.7990e6   8.8660e6   1.4609e7   1.9672e7   1.9632e7   2.1972e7
    g_j/g_i^m    0.721      0.729      0.828      0.894      0.895      0.893      0.999     0.9248
    f (MHz)      200        200        200        300        400        400        400
    p_i/p_k      0.857      0.857      0.857      0.884      1          1          1

Table 6.3: The energy saving ratio 2 (PSNR = 34 dB).
At 30 dB:

Case (1):
  Cluster      1            2            3            4            5            6            7            R-Ratio
  g_j          7.8002×10^6  8.3190×10^7  7.8587×10^6  8.9538×10^6  1.2106×10^7  1.1623×10^7  1.3818×10^7
  g_j/g_mi     0.729        0.778        0.791        0.836        0.742        0.712        0.847        0.8684
  f_k (MHz)    200          200          200          200          300          300          300
  p_i/p_k      0.857        0.857        0.857        0.857        0.884        0.884        0.884

Case (2):
  Cluster      1            2            3            4            5            6            7            R-Ratio
  g_j          7.8002×10^6  8.3190×10^7  7.8587×10^6  1.0655×10^7  1.3056×10^7  1.4888×10^7  1.7642×10^7
  g_j/g_mi     0.729        0.778        0.791        0.996        0.799        0.911        0.802        0.8764
  f_k (MHz)    200          200          200          200          300          300          400
  p_i/p_k      0.857        0.857        0.857        0.857        0.884        0.884        1

Table 6.4: The Energy Saving Ratio 3
Chapter 7
Energy-Aware Embedded Video
Encoding System Design
In this chapter, we describe the design of an energy-aware embedded video encoding
system. An energy-aware embedded system, named the DeerNet video sensor system,
has been designed. In an embedded video communication system for wildlife activity
monitoring, the system is expected to operate over an extended period of time, say
a few weeks or even months. Therefore, energy minimization of video encoding is
very critical. In this chapter, we introduce the tier architecture used in our design.
Based on this tier architecture, even the otherwise non-reducible power can be reduced.
In Chapter 6, we explained how the video encoding energy can be saved
through optimum complexity control of video encoders and cross-layer adaptation.
It should be noted that the results obtained in Chapter 5 and Chapter 6 differ
slightly. This is because the former results are obtained in an ideal case, which
assumes that the circuit voltage and clock speed can be adjusted continuously for
the whole system, and that the power consumed by the circuit is proportional to
the cube of the running speed. In fact, practical systems, e.g., the Stargate, consist
of many modules which implement a variety of system functions, and not all of
these modules can be adjusted in voltage and/or speed. The other reason is that
the modules which can be adapted still have some static power consumption. These
two reasons cause the power consumption of a real system to deviate from the cubic
dependence on running speed. Therefore, the way the system is organized provides
another avenue toward more efficient power consumption, specifically for embedded
systems. The tier system architecture is one possible approach to saving power even
more.
7.1 Tier Architecture
We observe that the power consumption of a real system can be categorized into
two classes: reducible power and non-reducible power. The reducible power is the
amount of power that can be eliminated from a running system while maintaining
its ability to do computation. The non-reducible power, on the other hand, cannot
be eliminated from a running system, and it dominates the lifetime of the battery
(see the results in Chapter 5). Common sources of non-reducible power include
the power supply, on-board oscillators, memory and I/O buses, and the limited
range of frequency and voltage scaling [19].
The amount of non-reducible power, however, varies across platforms. Typically,
platforms are carefully optimized to provide their promised functionality at
the lowest possible energy cost, and platforms that provide less functionality have
smaller non-reducible power. For example, a laptop provides more functions than a
PDA, and the power consumption of the laptop is also higher than that of the PDA.
Fortunately, there is significant overlap in the functionality provided by high-power
and low-power platforms. Therefore, if the tasks running on the system have different
computational complexities, the tasks can be assigned to different platforms which
have different functionalities and different non-reducible power. Such a system is
organized in tiers: it is composed of a set of tiers, each with a set of capabilities and
a set of power modes. The system as a whole executes tasks by waking the tier that
has the capabilities to execute the task in the most efficient manner. Fig. 7.1 shows
the tier system architecture.
Figure 7.1: Tier Architecture
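The tier-dispatch idea above can be sketched in a few lines: each task runs on the lowest-power tier that has the capabilities it needs. The tier names and power figures below follow the text; the task/capability model itself is a hypothetical illustration, not part of the DeerNet implementation.

```python
# Minimal sketch of tier dispatch: tiers are ordered from lowest to highest
# power, so the first tier that supports a capability is the cheapest one.
TIERS = [
    {"name": "MICA", "power_mw": 45,
     "capabilities": {"motion_sensing"}},
    {"name": "Stargate", "power_mw": 1600,
     "capabilities": {"motion_sensing", "video_capture", "video_encoding"}},
]

def dispatch(required_capability):
    """Return the most power-efficient tier able to execute the task."""
    for tier in TIERS:  # first match wins because tiers are sorted by power
        if required_capability in tier["capabilities"]:
            return tier["name"]
    raise ValueError("no tier supports " + required_capability)

print(dispatch("motion_sensing"))   # -> MICA
print(dispatch("video_encoding"))   # -> Stargate
```

Because motion sensing matches the low-power MICA, the Stargate stays suspended until a task arrives that only it can handle.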
There can be more than two tiers; in our system, two different platforms, the
Stargate and the MICA, are combined into a whole system. For such a two-tier
system, the nature of the power consumption is analyzed here. The system works
as follows: the Stargate is woken up by the MICA mote to capture video, then
it goes back to sleep, while the MICA runs all the time to determine whether it
needs to wake up the Stargate.
Let f_A^S be the fraction of time the Stargate tier spends awake, P_A^S the power
it expends while awake, 1 - f_A^S the fraction of time it spends asleep, and P_S^S
the power it expends while suspended. Similarly, let f_A^M be the fraction of time
the MICA tier spends awake, P_A^M the power it expends while awake, 1 - f_A^M
the fraction of time it spends asleep, and P_S^M the power it expends while
suspended. The power consumption of the tier system is

    P_T = f_A^S · P_A^S + (1 - f_A^S) · P_S^S + f_A^M · P_A^M.    (7.1)
Assuming a system consisting of the Stargate only, the power consumption of the
system is

    P_S = f_A^S · P_A^S.    (7.2)
The power saving ratio is

    Ratio = P_T / P_S = (1600 × 1/3 + 107 × 2/3 + 45 × 1) / (1600 × 1) = 0.409.    (7.3)
The average power consumption of the Stargate is about 1600 mW in active
mode and 107 mW during sleep. The lower-tier MICA's average power consumption
is about 45 mW; since it runs all the time, f_A^M = 1. We assume that the video
device captures 8 hours of video data per day, i.e., f_A^S = 1/3, while a Stargate-only
system would have to stay awake continuously to detect motion itself.
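The calculation in Eqs. (7.1)-(7.3) can be reproduced numerically. The sketch below assumes the approximate power figures quoted in the text (so the result differs from the quoted 0.409 only by rounding of those inputs):

```python
# Numeric sketch of the two-tier power model, Eqs. (7.1)-(7.3).
P_AWAKE_STARGATE = 1600.0  # Stargate active power, mW
P_SLEEP_STARGATE = 107.0   # Stargate suspended power, mW
P_AWAKE_MICA = 45.0        # MICA power, mW (awake all the time, f_A^M = 1)

def tier_power(f_awake):
    """Eq. (7.1): average power of the two-tier system."""
    return (f_awake * P_AWAKE_STARGATE
            + (1.0 - f_awake) * P_SLEEP_STARGATE
            + 1.0 * P_AWAKE_MICA)

def saving_ratio(f_awake):
    """Eq. (7.3): tiered power over a Stargate-only system, which must stay
    awake continuously (f_A^S = 1) to detect motion itself."""
    return tier_power(f_awake) / (1.0 * P_AWAKE_STARGATE)

# 8 hours of capture per day gives a duty cycle of 1/3.
print(round(saving_ratio(1.0 / 3.0), 2))  # ~0.41
```

The tiered system thus consumes roughly 40% of the power of an always-on Stargate under this workload.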
7.2 System Design
The design of a power-aware embedded system, e.g., DeerNet video sensor, is com-
posed of three parts: the hardware, the underlying system architecture, and the
model for distributing applications across the tiers. In general, the design is similar
to many distributed systems; each tier is under autonomous control while decisions
are made in a distributed manner. Client applications reside at the most powerful
tier, and tasks that support those applications are distributed among the various
tiers.
7.2.1 Hardware
The DeerNet video sensor is designed in a strictly hierarchical manner, where the
higher tier is more powerful than the lower tier. The two tiers can communicate with
each other; communication occurs via a local port, and the tiers are connected to a
common power source. Moreover, the lower tier has the ability to bring its superior
tier out of a suspended mode. An overview diagram of our design is shown in Fig. 7.2.
The higher tier is based on the Crossbow’s Stargate platform, which has an
XScale PXA255 CPU (400 MHz) with 32MB flash memory and 64MB SDRAM.
PCMCIA and Compact Flash connectors are available on the main board. The
Stargate also has a daughter board with Ethernet, USB and serial connectors. A
Logitech QuickCam Pro 4000 webcam is connected through a USB connection for
video capture [11].
Figure 7.2: The Video Sensor Tiers

The MICA plays the lower-tier role; it is built around an Atmega128L micro-controller.
The data, measurements, and other user-defined information are stored in a 4-Mbit
serial flash (Atmel AT45DB041), which is connected to one of the USARTs on the
ATMega128L [12]. This is a very low-power platform; its average power consumption
is about 45 mW.
The two tiers are connected through a 51-pin port, which includes a local link
and a wake-up control (described later).
7.2.2 Software
In the higher-tier Stargate, the Linux operating system manages all the tier's
resources, including power management and video capture. The video capture
module runs the new P-R-D optimized encoder, which compresses the video
sequence onto a CF card. The power management module can put the Stargate to
sleep when necessary. Once the Stargate is in the sleep state, it waits for the
wake-up signal from the MICA at the lower tier.
The MICA at the lower tier performs the following tasks: determining the motion
signal that reflects the deer's motion state, recording the motion state, communicating
with the higher tier, and sending the wake-up signal to the higher-tier Stargate. All
tasks running on the MICA are controlled by the TinyOS operating system. The
software diagram is shown in Fig. 7.3.
Figure 7.3: Software Structure
7.3 Power Management
7.3.1 Signal Design
The goal of power management is to put the higher-tier Stargate to sleep at the
proper time or in the proper situation. What is a proper time or situation? The
DeerNet video sensor is used to track deer activity, and some situations do not need
to be recorded, such as when the deer is asleep, when it repeats the same action,
or at night. In these situations, the sensor should go to sleep. A timer is implemented
on the Stargate; it keeps track of the current time and how long the current state
will last. Once the timer indicates that night is coming, the Stargate is put into
sleep. Another signal comes from the MICA, which determines the motion state. If
it determines that the deer is not moving, the MICA sends a signal that tells the
Stargate to stop capturing video and go to sleep.
The clock signal can be obtained on the Stargate via a read-clock command. The
motion wake-up signal comes from the MICA through the local link, a UART carried
by the 51-pin connector that joins the Stargate and the MICA. The signal is
transmitted to the Stargate as a message. The video capture signal triggers the start
of capture. The signals discussed so far are used to put the Stargate to sleep.
How is the wake-up signal generated? It must come from the MICA, because the
Stargate is sleeping. Waking up the Stargate takes two steps. The first step is to set
a Stargate GPIO port into an interrupt-enabled state. Next, the MICA triggers this
GPIO port. Once the GPIO port is triggered by the MICA, an interrupt routine
runs to wake up the Stargate. Fig. 7.4 shows the 51-pin connector definition.
The local link is provided by a UART port on pins 19 and 20, which communicates
with the MICA. Pin 5 connects to the PXA255 general-purpose input/output port 1
(GPIO1), which can generate an interrupt signal to wake up the Stargate.
7.3.2 State Control
Controlling the Stargate state requires initializing the corresponding PXA255
registers, e.g., the interrupt control registers (ICMR, ICLR, ICCR, ICIP, ICPR) and
the power manager control registers (PMCR, PSSR, PWER, PRER, etc.) [24]. It is
difficult to read or write such registers directly. Fortunately, the operating system
provides control commands through the /proc file system. In this way, sleep and
wake-up control becomes convenient.
Figure 7.4: Definition of the Connector Pins

Putting the Stargate to sleep: the command is

echo 1 > /proc/sys/pm/suspend
This command puts the Stargate to sleep, where it waits to be woken up. In
order to wake up a sleeping Stargate, the wake-up source needs to be set before the
Stargate goes to sleep. Because we use pin 5 (GP1 M-INT1) as the wake-up signal,
the corresponding register has to be set with the command

echo w1 > /proc/platx/gpio/GPCTL
We also set a timer to guard against wake-up signal failure: if the wake-up signal
fails, the timer wakes up the Stargate instead. The command to set the timer is

echo CONFIG_WAKEUPTIME=$time > /proc/platx/config
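Putting the pieces together, the full sleep sequence arms both wake-up sources before suspending, issuing the same /proc writes as the commands above. The `proc_root` parameter is an assumption added here so the sequence can be exercised off-device; on the real platform it would be left as "/".

```python
# Sketch of the complete suspend sequence described above.
import os

def suspend_stargate(wake_timeout, proc_root="/"):
    """Arm the GPIO wake-up source and the fallback timer, then suspend."""
    def proc_write(rel_path, value):
        with open(os.path.join(proc_root, rel_path), "w") as f:
            f.write(value)

    # 1. arm pin 5 (GP1 M-INT1) as the GPIO wake-up source before sleeping
    proc_write("proc/platx/gpio/GPCTL", "w1")
    # 2. arm the fallback timer in case the GPIO wake-up signal fails
    proc_write("proc/platx/config", "CONFIG_WAKEUPTIME=%s" % wake_timeout)
    # 3. suspend; execution resumes here after wake-up
    proc_write("proc/sys/pm/suspend", "1")
```

Ordering matters: the wake-up sources must be armed before the write to `suspend`, since nothing else can revive a sleeping Stargate.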
Chapter 8
Conclusion and Future Work
8.1 Conclusion
In this work, based on power-rate-distortion (P-R-D) optimization, we develop a
new approach for energy minimization for delay-tolerant video communication over
portable devices. We have developed a parametric video encoding architecture which
is fully scalable in power consumption. We have successfully extended the tradi-
tional R-D analysis by considering another dimension, the power consumption, and
established the P-R-D analysis framework for embedded video encoding and com-
munication under energy constraints. Using the P-R-D model, given a power supply
level and a bit rate, the power-scalable video encoder is able to find the best config-
uration of complexity control parameters to maximize the video quality. The P-R-D
analysis establishes a theoretical basis and provides a practical guideline in system
design and performance optimization for wireless video communication under energy
constraints.
Theoretically, we demonstrated that extending the traditional rate-distortion (R-D)
analysis to P-R-D analysis gives us another dimension of flexibility in resource
allocation and performance optimization. We have analyzed the energy saving per-
formance of P-R-D optimization. We have also developed an adaptive scheme to
estimate the P-R-D model parameters and perform online energy optimization and
control for real-time video compression. Our simulation results show that, for typical
videos with non-stationary scene statistics, using the proposed P-R-D optimization
technology, the video encoding energy can be significantly reduced, especially for
delay-tolerant video communication applications where the per-bit energy cost of
wireless transmission is relatively low. This has a significant impact on energy-
efficient portable video communication system design.
8.2 Future Work
In our future work, we shall study and evaluate resource allocation and energy min-
imization for real-time video encoding and communication on embedded computing
platforms with dynamic voltage control. We shall also study how the P-R-D analy-
sis could be used for joint hardware and encoder energy minimization. In addition,
the following issues need to be further investigated: (1) The training-classification
approach does not work well with some video segments with significant changes in
scene activity. (2) We should exploit the communication delay as a system resource
to further reduce the overall energy consumption. With such a delay, multiple video
encoding tasks coexist in the system, which may allow the system to operate in a
more efficient state, since the system can be optimized with Equation (6.13).
References
[1] MPEG-2 video test model 5. ISO/IEC JTC1/SC29/WG11 MPEG93/457 (Apr 1993).
[2] A.Chandrakasan, S.Sheng, and R.W.Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits, Vol.27 (Apr 1992), 473-484.
[3] AMD. Mobile AMD Athlon 4 processor model 6 CPGA data sheet. http://www.amd.com (Nov 2001).
[4] A.M.Tourapis, O.C.Au, and M.L.Liou. Predictive motion vector field adaptive search technique (PMVFAST) - enhancing block based motion estimation. Proceedings of Visual Communications and Image Processing 2001 (Jan 2001).
[5] A.Ortega and K.Ramchandran. Rate-distortion methods for image and video compression: An overview. IEEE Signal Processing Magazine, Vol.15, No.6 (Nov 1998), 23-50.
[6] A.R.Lebeck, X.Fan, H.Zeng, and C.S.Ellis. Power aware page allocation. In Proceedings of International Conference on Architectural Support for Programming Languages and Operating Systems, Cambridge MA (Nov 2000).
[7] B.Erol, F.Kossentini, and H.Alnuweiri. Efficient coding and mapping algorithms for software-only real-time video coding at low bit rates. IEEE Transactions on Circuits and Systems for Video Technology, Vol.10 (Sep 2000), 843-856.
[8] B.Li and K.Nahrstedt. A control-based middleware framework for quality of service adaptations. IEEE J. Select. Areas Commun., Vol.17 (Sep 1999), 1632-1650.
[9] C.Efstratiou, A.Friday, N.Davies, and K.Cheverst. A platform supporting coordinated adaptation in mobile systems. In Proceedings of 4th IEEE Workshop on Mobile Computing Systems and Applications, Callicoon NY (Jun 2003).
[10] C.Hughes, J.Srinivasan, and S.Adve. Saving energy with architectural and frequency adaptations for multimedia applications.
[11] Crossbow Technology, Inc. Stargate developer's guide.
[12] Crossbow Technology, Inc. MPR-MIB users manual.
[13] D.G.Sachs, S.Adve, and D.L.Jones. Cross-layer adaptive video coding to reduce energy on general-purpose processors. Proceedings of the International Conference on Image Processing ICIP'03, Barcelona Spain (Sep 2003).
[14] D.G.Sachs, W.Yuan, C.J.Hughes, A.F.Harris, S.V.Adve, D.L.Jones, R.H.Kravets, and K.Nahrstedt. GRACE: A cross-layer adaptation framework for saving energy. IEEE Computer, special issue on Power-Aware Computing (Dec 2003), 50-51.
[15] H.Aydin, R.Melhem, D.Mosse, and P.Mejia-Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proc. of 22nd IEEE Real-Time Systems Symposium (Dec 2001).
[16] D.S.Turaga, M.van der Schaar, and B.Pesquet-Popescu. Complexity scalable motion compensated wavelet video encoding. IEEE Transactions on Circuits and Systems for Video Technology, Vol.15 (Aug 2005), 982-993.
[17] B.D.Noble, et al. Agile application-aware adaptation for mobility. In Proceedings of 16th Symposium on Operating Systems Principles, Saint Malo France (Dec 1997).
[18] P.Levis, et al. The emergence of networking abstractions and techniques in TinyOS. In Proceedings of First Symposium on Networked Systems Design and Implementation, San Francisco CA (Mar 2004).
[19] G.Chinn, S.Desai, E.DiStefano, K.Ravichandran, and S.Thakkar. Mobile PC platforms enabled with Intel Centrino mobile technology. Intel Technology Journal, Vol.7 (May 2003).
[20] H.Zeng, X.Fan, C.Ellis, A.Lebeck, and A.Vahdat. ECOSystem: Managing energy as a first class operating system resource. In Proceedings of 10th Intl. Conf. on ASPLOS, San Jose CA (Oct 2002).
[21] I.M.Pao and M.T.Sun. Statistical computation of discrete cosine transform in video encoders. Journal of Visual Communication and Image Representation, Vol.9, No.2 (Jun 1998), 163-170.
[22] AMD, Inc. AMD PowerNow! technology platform design guide for embedded processors. http://www.amd.com/epd/processors.
[23] Intel, Inc. Intel XScale technology.
[24] Intel. Intel PXA255 processor: Developer manual.
[25] Intel. Pentium M processor. http://developer.intel.com/design/mobile/datashts/261203.pdf (Apr 2004).
[26] ITU-T. Video coding for low bit rate communications. ITU-T Recommendation H.263, version 1 and version 2 (Jan 1998).
[27] J.Chen and K.J.R.Liu. Low-power architectures for compressed domain video coding co-processor. IEEE Transactions on Multimedia, Vol.2 (Jun 2000), 111-128.
[28] J.Flinn, E.de Lara, M.Satyanarayanan, D.S.Wallach, and W.Zwaenepoel. Reducing the energy usage of office applications. In Proceedings of Middleware 2001, Heidelberg Germany (Nov 2001).
[29] J.Lorch and A.Smith. Improving dynamic voltage scaling algorithms with PACE. Proceedings of the ACM SIGMETRICS 2001 Conference (Jun 2001).
[30] J.Lorch and A.Smith. Operating system modifications for task-based speed and voltage scheduling. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, San Francisco CA (May 2003).
[31] J.Ostermann, J.Bormans, P.List, D.Marpe, M.Narroschke, F.Pereira, T.Stockhammer, and T.Wedi. Video coding with H.264/AVC: Tools, performance, and complexity. IEEE Circuits and Systems Magazine (First Quarter 2004).
[32] J.Villasenor, C.Jones, and B.Schoner. Video communications using rapidly reconfigurable hardware.
[33] K.Hyungjoon, N.Kamaci, and Y.Altunbasak. Low-complexity rate-distortion optimal macroblock mode selection and motion estimation for MPEG-like video coders. IEEE Transactions on Circuits and Systems for Video Technology, Vol.15 (Jul 2005), 823-834.
[34] L.Benini, A.Bogliolo, and G.D.Micheli. A survey of design techniques for system-level dynamic power management. IEEE Transactions on VLSI Systems, Vol.8 (Jun 2000).
[35] M.Mesarina and Y.Turner. Reduced energy decoding of MPEG streams. In Proceedings of SPIE Multimedia Computing and Networking Conference, San Jose CA (Jan 2002).
[36] P.A.Chou and A.Sehgal. Rate-distortion optimized receiver-driven streaming over best-effort networks. Packet Video Workshop, Pittsburgh PA (Apr 2002).
[37] Web page. http://www.abiresearch.com/products/market research/mobile broadcast video.
[38] Web page. Panel report of NSF workshop on sensors for environmental observatories. Seattle WA (Nov 30 - Dec 2, 2004), http://www.wtec.org/seo.
[39] P.Agrawal, J-C.Chen, S.Kishore, P.Ramanathan, and K.Sivalingam. Battery power sensitive video processing in wireless networks. Proceedings of IEEE PIMRC'98, Boston (Sep 1998).
[40] P.Pillai and K.G.Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. Proceedings of 18th Symposium on Operating Systems Principles, Banff Canada (Oct 2001).
[41] P.Pillai and K.G.Shin. Real-time dynamic voltage scaling for low-power embedded operating systems. In Proceedings of 18th Symposium on Operating Systems Principles, Banff Canada (Oct 2001).
[42] R.Min, T.Furrer, and A.Chandrakasan. Dynamic voltage scaling techniques for distributed microsensor networks. IEEE Computer Society Workshop on VLSI (Apr 2000), 43-46.
[43] R.Rajkumar, C.Lee, J.Lehoczky, and D.Siewiorek. A resource allocation model for QoS management. In Proceedings of 18th IEEE Real-Time Systems Symposium, San Francisco CA (Dec 1997).
[44] S.Banachowski and S.Brandt. The BEST scheduler for integrated processing of best-effort and soft real-time processes.
[45] S.Gurumurthi, A.Sivasubramaniam, and M.Kandemir. DRPM: Dynamic speed control for power management in server class disks. In Proceedings of 30th Annual International Symposium on Computer Architecture, San Diego CA (Jun 2003).
[46] S.Iyer, L.Luo, R.Mayo, and P.Ranganathan. Energy-adaptive display system designs for future mobile environments. In Proceedings of International Conference on Mobile Systems, Applications and Services, San Francisco CA (May 2003).
[47] S.M.Akramullah, I.Ahmad, and M.L.Liou. Optimization of H.263 video encoding using a single processor computer: performance tradeoffs and benchmarking. IEEE Transactions on Circuits and Systems for Video Technology, Vol.11 (Aug 2001), 901-915.
[48] S.Mohapatra and N.Venkatasubramanian. Power-aware reconfigurable middleware. In Proceedings of IEEE 23rd International Conference on Distributed Computing Systems, Providence RI (May 2003).
[49] S.V.Adve, A.F.Harris, C.J.Hughes, D.L.Jones, R.H.Kravets, K.Nahrstedt, D.G.Sachs, R.Sasanka, J.Srinivasan, and W.Yuan. The Illinois GRACE project: Global resource adaptation through cooperation. Proceedings of the Workshop on Self-Healing, Adaptive and Self-Managed Systems (Jun 2002).
[50] T.Berger. Rate Distortion Theory. Prentice Hall, Englewood Cliffs NJ, 1984.
[51] T.Burd and R.Brodersen. Processor design for portable systems. Journal of VLSI Signal Processing, Vol.13, No.2 (Aug 1996), 203-222.
[52] T.Ishihara and H.Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Proc. of Intl. Symp. on Low-Power Electronics and Design (1998).
[53] T.Pering, T.Burd, and R.Brodersen. Voltage scheduling in the lpARM microprocessor system.
[54] T.Sikora. The MPEG-4 video standard verification model. IEEE Transactions on Circuits and Systems for Video Technology, Vol.7 (Feb 1997), 19-31.
[55] T.Wiegand. Text of committee draft of joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC). Document JVT-C167, 3rd JVT Meeting, Fairfax Virginia USA (May 2002).
[56] V.Akella, M.van der Schaar, and W.-F.Kao. Proactive energy optimization algorithms for wavelet-based video codecs on power-aware processors. Proceedings of IEEE International Conference on Multimedia and Expo (Jul 2005), 566-569.
[57] V.Grassi and R.Mirandola. Derivation of Markov models for effectiveness analysis of adaptable software architectures for mobile computing. IEEE Transactions on Mobile Computing, Vol.2 (Jun 2003).
[58] V.Raghunathan, M.Srivastava, T.Pering, and R.Want. Stargate: Energy management techniques. Network and Embedded Systems Lab (NESL) and Ubiquity SRP, Intel Research.
[59] V.Raghunathan, P.Spanos, and M.Srivastava. Adaptive power-fidelity in energy aware wireless embedded systems. In Proceedings of IEEE Real-Time Systems Symposium, London UK (Dec 2001).
[60] W.P.Burleson, P.Jain, and S.Venkatraman. Dynamically parameterized architecture for power-aware video coding: Motion estimation and DCT. Proceedings of the Second USF International Workshop on Digital and Computational Video (2001).
[61] W.Yuan and K.Nahrstedt. Integration of dynamic voltage scaling and soft real-time scheduling for open mobile systems. In Proceedings of 12th International Workshop on Network and OS Support for Digital Audio and Video, Miami Beach FL (May 2002).
[62] W.Yuan and K.Nahrstedt. Practical voltage scaling for mobile multimedia devices. MM'04 (Oct 2004).
[63] X.Lu, Y.Wang, and E.Erkip. Power efficient H.263 video transmission over wireless channels. Proceedings of 2002 International Conference on Image Processing (Sep 2002).
[64] Z.-L.He, C.-Y.Tsui, K.-K.Chan, and M.Liou. Low-power VLSI design for motion estimation using adaptive pixel truncation. IEEE Trans. on Circuits and Systems for Video Technology, Vol.10 (Aug 2000).
[65] Z.He and S.K.Mitra. A unified rate-distortion analysis framework for transform coding. IEEE Transactions on Circuits and Systems for Video Technology, Vol.11 (Dec 2001), 1221-1236.
[66] Z.He, Y.Liang, L.Chen, I.Ahmad, and D.Wu. Power-rate-distortion analysis for wireless video communication under energy constraint. IEEE Transactions on Circuits and Systems for Video Technology (May 2005).