ISLPED – Portland, OR – August 28, 2007
A 0.4-V UWB Baseband Processor
Vivienne Sze
Anantha P. Chandrakasan
Massachusetts Institute of Technology
Outline
• UWB Specifications and System Architecture
• Baseband Algorithm and Architecture
• Parallelism for Energy Efficiency
  - aggressive voltage scaling
  - reduced receiver on-time
• Chip Measurements
• Conclusions
Ultra-wideband (UWB) Radio
[Figure: application map. Axes: distance (1 m, 10 m, 100 m) versus data rate (500 kb, 5 Mb, 50 Mb, 500 Mb). Regions: WLAN; Wireless USB & Multimedia; Locationing/Tagging]
• Advantages of UWB communications include
  - High Data Rate
  - Excellent Multi-path Resolution
  - Low Interference
• Integrate UWB radios on battery-operated devices
• Need an energy-efficient UWB system
[Figure captions: Possible UWB Applications; UWB versus Narrowband (narrowband and UWB signals compared in frequency and in time)]
UWB System Architecture
[Figure: 14 Channel Frequency Plan. x-axis: Frequency [GHz], 3.1 to 10.6; y-axis: Power density [dBm], ticks at -61.3, -51.3, -41.3. Figure courtesy of D. Wentzloff]
Sampling Rate: 500 MSPS
Data Rate: 100 Mbps
[F.S. Lee et al., ICUWB 2006]
Packet Structure (PHY layer)
• Goal: Reduce overhead energy (acquisition)
• Majority of acquisition energy is spent on computation of the cross-correlation
[Figure: packet timeline. PREAMBLE (repeated 31-bit Gold Codes) followed by PAYLOAD (payload bits). Receiver states shown: Acquisition, Channel Estimation, Demodulation. Markers: Packet Begins, Receiver Turns ON, Receiver Turns OFF. Timing labels: 40 ns, 10 ns]
Cross-Correlation Computation
$y[n] = \sum_{k=0}^{619} h[k] \times x[k-n]$
• Points can be computed independently
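The independence of the points can be illustrated with a minimal Python sketch. It uses a toy 3-tap template rather than the 620 taps on the slide, and it treats samples outside the input buffer as zero; both choices, and the function name `correlation_point`, are assumptions made for illustration.

```python
def correlation_point(h, x, n):
    """One point y[n] = sum_k h[k] * x[k - n].

    Samples of x outside the buffer are treated as zero (an assumption).
    """
    return sum(h[k] * x[k - n] for k in range(len(h)) if 0 <= k - n < len(x))


h = [1, -1, 1]            # toy template (the slide uses 620 taps)
x = [0, 0, 1, -1, 1, 0]   # template embedded in x starting at index 2

lags = [-4, -3, -2, -1, 0]

# Each point depends only on h, x and its own n, so evaluation order is free.
in_order = {n: correlation_point(h, x, n) for n in lags}
reordered = {n: correlation_point(h, x, n) for n in reversed(lags)}

# The peak marks the lag at which the template lines up with the input.
peak_lag = max(in_order, key=in_order.get)
```

Because no point reads another point's result, the same function could be evaluated by separate hardware units at the same time, which is the property the parallel correlator bank relies on.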
Baseband Architecture
[Figure: baseband architecture block diagram with an example bit stream, 10100110]
• Cross-correlation requires a fixed number of operations
• Reduce energy of each operation in order to reduce baseband energy
• Map operations to an architecture that reduces system energy
Energy Efficiency Using Parallelism
• Ultra-Low Voltage Operation - Maintain Throughput (L)
  - Reduce supply voltage → Minimize energy of baseband processor
• Reducing Acquisition Time - Parallelized Computation (M)
  - Reduce receiver on-time → Minimize energy of entire receiver (system)
Exploit TWO forms of parallelism in Correlator Bank
Baseband Energy Savings
• Correlators compute the cross-correlation function
• Voltage scaling to reduce energy per operation
• Parallelize to maintain throughput of 500 MSPS
• Designed and simulated in a 90-nm process
[Figure: Correlator Architecture, computing $y[n] = \sum_{k} h[k] \times x[k-n]$]
[Plot: normalized energy per operation (log scale, 10^-3 to 10^0) versus VDD (0 to 1 V). Curves: Leakage Energy, Dynamic Energy, Total Energy. Marked: Minimum Energy Point]
Baseband Energy Savings
• At the minimum energy point of 0.3 V → 9X energy reduction
• Set clock frequency to 25 MHz (preamble PRF)
• Parallelize by L = 20 to maintain 500 MSPS throughput
• Need to raise voltage to 0.4 V to achieve 25 MHz
• At 0.4 V, energy per operation is reduced by 5.8X
[Plot: Total Energy, Active Energy, and Leakage Energy curves, with the Minimum Energy Point marked]
$E_{total} = E_{active} + E_{leakage}$
$E_{active} \propto C_{eff} V_{DD}^2$
$E_{leakage} \propto T_{period} V_{DD} I_{leak}$
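As a rough cross-check of the quoted savings, the sketch below evaluates only the active-energy term, which scales with the square of the supply voltage. It assumes a 1 V reference supply (the starting voltage named in the conclusions) and ignores leakage entirely, so it gives an upper bound rather than a prediction; the function name is hypothetical.

```python
def active_energy_ratio(v_ref, v_scaled):
    """Ratio of active energy per operation at v_ref to that at v_scaled.

    Uses only E_active proportional to C_eff * VDD^2; leakage is ignored.
    """
    return (v_ref / v_scaled) ** 2


# Assumed reference supply of 1 V.
ratio_at_0v4 = active_energy_ratio(1.0, 0.4)   # about 6.25
ratio_at_0v3 = active_energy_ratio(1.0, 0.3)   # about 11.1
```

Both bounds sit above the reductions quoted on the slides (5.8X at 0.4 V and 9X at 0.3 V), which is consistent with the leakage term taking back part of the active-energy saving at low voltage.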
Summary of Methodology
• Select the optimal degree of parallelism that
  - minimizes energy consumption
  - meets performance constraints
1. Determine VDDMEP (at minimum energy point)
2. Determine delay and throughput at VDDMEP
3. Divide required throughput by the throughput at VDDMEP to obtain the necessary degree of parallelism (L)
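Step 3 can be sketched in Python using the numbers given elsewhere in the talk: a required throughput of 500 MSPS and a 25 MHz clock per correlator at the scaled supply. The function name is hypothetical, and rounding up to a whole number of units is an assumption for cases where the division is not exact.

```python
import math


def degree_of_parallelism(required_throughput, unit_throughput):
    """Number of parallel units needed to meet the required throughput."""
    return math.ceil(required_throughput / unit_throughput)


# 500 MSPS required; each unit handles 25 M samples per second at low voltage.
L = degree_of_parallelism(500e6, 25e6)
```

With these inputs the result is L = 20, matching the degree of parallelism used for the correlators.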
System Energy Savings
• Trade off area for time by mapping to a parallel architecture
• Reducing acquisition time allows fewer Gold Code repetitions in the preamble
• RF front-end and ADC can be turned off earlier
• Energy savings across the entire system
Acquisition Energy Reduction
[Bar chart: Acquisition Energy. x-axis: Parallelism (M) = 1, 4, 8, 11, 16, 31; y-axis: Synchronization energy (normalized), 0.0 to 1.2. Bars are broken down into Digital Baseband, ADCs, Baseband Amplifiers, and RF front end]
Reduce RF front-end and ADC energy! 14.7X overall reduction
[V. Sze, R. Blázquez, M. Bhardwaj, A. Chandrakasan, ICASSP 2006.]
Maximum Ratio Combiner (MRC)
• Demodulation uses a 5-finger RAKE receiver
• A hard decision is made at the MRC output to resolve a bit
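A minimal Python sketch of maximum ratio combining followed by a hard decision is given below. The slides do not state the modulation or the exact combiner arithmetic, so this assumes antipodal signaling (+1 for bit 1, -1 for bit 0) and complex channel estimates for the five fingers; the function name and the example tap values are hypothetical.

```python
def mrc_hard_decision(fingers, channel_taps):
    """Combine RAKE fingers weighted by conjugate channel estimates, then slice.

    Assumes antipodal signaling: a positive combined value decodes as bit 1.
    """
    combined = sum(r * c.conjugate() for r, c in zip(fingers, channel_taps))
    return 1 if combined.real >= 0 else 0


# Hypothetical channel estimates for the 5 strongest multipath components.
taps = [1 + 0j, 0.5j, -0.3 + 0j, 0.2 - 0.1j, 0.1 + 0j]

# Noise-free finger outputs for a transmitted +1 and a transmitted -1.
fingers_for_one = [t * 1 for t in taps]
fingers_for_zero = [t * -1 for t in taps]

bit_a = mrc_hard_decision(fingers_for_one, taps)
bit_b = mrc_hard_decision(fingers_for_zero, taps)
```

Weighting each finger by the conjugate of its channel estimate aligns the phases and gives stronger paths more influence, so the sign of the combined value is the natural place to make the hard decision.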
Parallelized Baseband Architecture
L = 20; M = 31
Total number of correlators = L × M = 620
[Block diagram: 5-bit complex input from ADCs → Serial to Parallel → Correlator Bank → Retiming Block → Peak Detector (output: npeak, the peak of the Cross-Correlation Function over n) and Demodulation. The Correlator Bank holds M sub-banks of L correlators each: Sub-bank 1 (Correlators 1 to L), Sub-bank 2 (Correlators L+1 to 2L), ..., Sub-bank M (Correlators (M-1)L+1 to ML). Demodulation uses four 5-finger RAKE MRCs followed by a Bit Decoder that outputs the Demodulated Bits. Other blocks: Synchronization/Timing Control, Channel Estimation]
Energy-Area Tradeoff
[Plot: Energy-Area Tradeoff for Digital Baseband Processor. x-axis: Normalized Baseband Processor Area (log scale, 1 to 1000); y-axis: Normalized Baseband Processor Energy (0 to 8). Annotations: This Design, 5.8x, 9.6x]
Energy-Area Tradeoff
[Plot: Receiver Energy - Digital Baseband Area Tradeoff. x-axis: Normalized Digital Baseband Processor Area (0 to 10); y-axis: Normalized Receiver Energy (0 to 16). Annotations: This Design, 14.7x, 8.6x]
400-mV Baseband Processor
• STMicroelectronics standard-VT 90-nm CMOS process
• 281,260 gates
• Includes 620 correlators and 4 Maximum Ratio Combiners
• Die area: 10.94 mm²
• Active area: 23%
[Die photo: 3.3 mm × 3.3 mm]
Correct Operation at 400 mV
• Oscilloscope plot shows correct functionality at 400 mV
  - Note: I/O has a 1-V power supply
• Operating frequency of 25 MHz
• Four bits demodulated in parallel every 40-ns cycle
[Oscilloscope traces: Data Ready, Output Clock, Output Data [1-0]]
Energy Per Bit
[Bar chart: Breakdown of Baseband Processor's Energy Per Bit. x-axis: Size of Packet (bits) = 512b, 1kb, 2kb, 4kb, 8kb; y-axis: Energy per bit (pJ), 0 to 50. Components: Acquisition Energy, Demodulation Energy. Annotation: 16.8 pJ/bit]
• Power Measurements
  - Acquisition: 7 mW
  - Demodulation: 1.7 mW
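The measured demodulation power can be related to the per-bit figures with a short Python sketch. It assumes that acquisition energy is a fixed cost per packet amortized over the payload bits, and that a 4-kb packet means 4096 bits; the acquisition energy used here is inferred from the 20 pJ/bit figure quoted for a 4-kb packet in the conclusions and is not a measured value. All names are hypothetical.

```python
DATA_RATE_BPS = 100e6   # 100 Mbps data rate from the system specification
P_DEMOD_W = 1.7e-3      # measured demodulation power

# Demodulation energy per bit implied by power / data rate, in pJ.
demod_pj_from_power = P_DEMOD_W / DATA_RATE_BPS * 1e12


def energy_per_bit_pj(packet_bits, acq_energy_pj, demod_pj=16.8):
    """Per-bit energy: demodulation cost plus amortized acquisition cost."""
    return demod_pj + acq_energy_pj / packet_bits


# Assumption: infer the per-packet acquisition energy from 20 pJ/bit at 4096 bits.
acq_energy_pj = (20.0 - 16.8) * 4096

per_bit_512 = energy_per_bit_pj(512, acq_energy_pj)
per_bit_8k = energy_per_bit_pj(8192, acq_energy_pj)
```

The power-based estimate of about 17 pJ/bit agrees with the 16.8 pJ/bit demodulation figure, and the amortization model reproduces the trend in the chart: short packets are dominated by acquisition energy, while long packets approach the demodulation cost.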
Power Gating
• Reduce leakage power using a high-VT “sleep” transistor to gate the leakage current when the block is idle
• Breakeven time: 137 µs
[Plot: Instantaneous Power. x-axis: Time (µs), 0 to 400; y-axis: Power (mW), -5 to 20. Intervals: Sleep = 1, Sleep = 0, Sleep = 1. Phases: Recovery, Acquisition, Demodulation]
Conclusions
• Reduce energy to receive a UWB packet by
  - Scaling to optimum supply voltage
  - Mapping algorithm to parallel architecture
• Voltage scaling to ultra-low voltage (1 V → 0.4 V)
  - 5.8X reduction in energy per operation of correlators
• Reduced acquisition time
  - 14.7X reduction in receiver acquisition energy
• 400-mV 100 Mbps UWB Baseband Processor
  - 16.8 pJ/bit for demodulation
  - 20 pJ/bit for a 4-kb packet
• Demonstrates high performance at ultra-low voltage
• Can be applied to other high-performance communication and signal processing applications
Acknowledgements
• Funding: DARPA and NSERC Fellowship
• Chip Fabrication: STMicroelectronics