Date post: 21-Dec-2015
Bridging the Energy Gap in Size, Weight and Power Constrained Software Defined Radio: Agile Baseband Processing as a Key Enabler

Bruno Bougard, Min Li, David Novo, Liesbet Van der Perre and Francky Catthoor

Bruno Bougard et al.Athens, May 2008 2

The number of standards to implement in a single handset increases dramatically

[Figure: wireless standards positioned by mobility (stationary, walking, driving) and data rate (0.1, 1, 10, 100 Mbps): DECT, GSM/GPRS, EDGE, UMTS, HSxPA, 3G LTE, Bluetooth, WLAN (IEEE 802.11b), WLAN (IEEE 802.11a/g/n), IEEE 802.16a,d, IEEE 802.16e]

All cost factors point towards high-volume programmable solutions wherever possible

[source: ICERA]

Two barriers remain

• The energy gap

• The exploding complexity

[Figure labels: 2G, 3G, 3G+, LTE?; .11b, .11g, .11n, MIMO]

Most SWPC SDR research focuses on more energy efficient processor architectures

[Figure: flexibility versus efficiency for ASICs, ASIPs, VLIW/DSPs, FPGAs, RISCs and GPPs, with a question mark over the VLIW/DSP, ASIP and FPGA region]

• NXP onDSP, EVP

• Sandbridge Sandblaster SB3011

• SiliconHive CSP2200

• Infineon MUSIC

• Icera DXP

• Nokia VectorASIP

• UMich SODA

• ULinkoping/CORESONICS BBE2

• TUDresden SAMIRA

• …


Radio Baseband Platform Requirements

Low Cost

Energy Aware

Spectrum Agile

Long HW lifespan

Short SW deployment time

Scalable HW/SW

Energy aware HW

Energy aware algorithms

Energy aware protocols

Techno-aware power management

Versatile RX digital front end

Versatile TX digital front end

Powerful MAC/RLC/QoS Ctrl


Outline

• IMEC SDR Baseband Platform

• Wanted: Platform Aware Signal Processing

• Case study 1: OFDMA transmitter

• Case study 2: OFDMA receiver

• Case study 3: Dynamic fixed-point format assignment


Where do you need flexibility? Where do you need energy efficiency?

[Figure: baseband functions placed on two axes, duty cycle and diversity/versatility, relabelled on the next build as need in energy efficiency and need in flexibility. Functions shown: FE steering, signal detection, synchronization, modulation/demodulation (inner modem), forward error correction (outer modem)]

IMEC MIMO-capable SDR baseband platform

Flexible platform

• Up to 3 antennas

• Up to 200Mbps

• <500mW

Standards shown: 802.11n and next gen., 802.16e and next gen., 3GPP LTE, DVB-H/T

Two Programmable CGA Processor Cores at its heart

• 32KB I$

• 128KB IMEM

• 128-entry CMEM

• 64KB L1 data scratchpad

• TSMC 90G

• Dual VT and substrate biasing for leakage reduction in sleep mode

• Clock rate 400MHz WCC

• Total area: 6 sqmm

• Power consumption

– Active TC VLIW 75mW

– Active TC CGA 300mW

– Leakage @ T=65C 25mW

– Leakage in standby <10mW

• 4x4 64-bit 4-way SIMD CGA

• VLIW and CGA modes of operation

• C-programmable

• 25 (theoretical) GOPS

• 46MOPS/mW

[Figure: core floorplan, 2.55 mm x 2.27 mm, showing the L1 scratchpad, I$, two CGA config memories, AHB and core logic (including register files)]
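The 25 (theoretical) GOPS figure is consistent with a simple peak-throughput count. A minimal sketch, assuming every functional unit of the 4x4 array issues one 4-way SIMD operation per cycle at the 400MHz clock; the issue model and the helper name `peak_gops` are assumptions made for illustration, not taken from the slides.

```python
def peak_gops(rows, cols, simd_lanes, clock_hz):
    """Peak throughput in GOPS if every unit issues one SIMD op per cycle."""
    ops_per_cycle = rows * cols * simd_lanes  # 4 * 4 * 4 = 64 for this core
    return ops_per_cycle * clock_hz / 1e9


# 4x4 array, 4-way SIMD, 400MHz clock (values from the slide)
print(peak_gops(4, 4, 4, 400e6))
```

Under these assumptions the count comes to roughly 25.6 GOPS, in line with the rounded figure on the slide.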

200 Mbps+ SDR application driver

• IEEE 802.11n digital inner modem receiver

– Channel bonding 40MHz

– 2 antennas MIMO SDM OFDM

# ant.   mod. scheme   cod. rate   SNR [dB] @ BER = 10^-3
1        BPSK          1/2         3.0
1        QPSK          1/2         6.5
1        16QAM         1/2         12.5
1        64QAM         3/4         22.3
2        BPSK          1/2         5.5
2        QPSK          1/2         11.5
2        16QAM         1/2         18.0
2        64QAM         3/4         34.0

Profiling for SDR benchmarks and the full OFDM application proves real-time operation @100Mbps

[Figure: execution time @ 400MHz (µs), axis 0 to 18, for 2-antenna SDM-OFDM @100Mbps. Bars: acorr, fshift, xcorr, fft (2x), freq offset estim., freq offset comp, SDM MMSE (2x), tracking, QAM demap, total preamble processing, total per symbol processing]

Great benefit in area but power higher than dedicated hardware solutions

Active power VLIW: 75mW. Active power CGA: 300mW. Leakage power: 25mW.

[Figure: CGA-mode power breakdown: CGA interconnect, mux and pipeline 38%, FU CGA 25%, CMEM 13%, DMEM 10%, VLIW reg 6%, FU VLIW 4%, CGA reg 2%, I$ 1%, peripherals 1%, VLIW ctrl 0%]

[Figure: bar charts comparing SDR (IMEC), Reconf. (Intel), ASIC (source: Intel) and ASIC (Atheros); category labels 802.11n, 802.16e, DVB-H, 11n&16e, all; axis ranges 0 to 4 and 0 to 400]

The interconnection network dominates the power consumption in VLIW and CGA modes

Component                       VLIW mode   CGA mode
VLIW ctrl                       0%          0%
FU VLIW                         22%         4%
FU CGA                          2%          25%
VLIW reg                        21%         6%
CGA reg                         2%          2%
CMEM                            0%          13%
DMEM                            13%         10%
I$                              10%         1%
peripherals                     2%          1%
Interconnect + mux              28%         -
CGA intercon - mux - pipeline   -           38%

Active power                    75mW        300mW
Leakage power                   25mW        25mW
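To see what the percentages mean in absolute terms, the shares can be multiplied by the active power of each mode. A minimal sketch using only the numbers on the slide; the dictionary and function names are illustrative, and leakage is left out because the slide reports it separately.

```python
# Percentage shares of active power per component, as listed on the slide.
VLIW_SHARE = {
    "VLIW ctrl": 0, "FU VLIW": 22, "FU CGA": 2, "VLIW reg": 21, "CGA reg": 2,
    "CMEM": 0, "DMEM": 13, "I$": 10, "peripherals": 2, "interconnect": 28,
}
CGA_SHARE = {
    "VLIW ctrl": 0, "FU VLIW": 4, "FU CGA": 25, "VLIW reg": 6, "CGA reg": 2,
    "CMEM": 13, "DMEM": 10, "I$": 1, "peripherals": 1, "interconnect": 38,
}


def absolute_mw(shares_percent, active_mw):
    """Convert percentage shares into milliwatts for a given active power."""
    return {name: active_mw * pct / 100.0 for name, pct in shares_percent.items()}


vliw_mw = absolute_mw(VLIW_SHARE, 75.0)   # VLIW mode: 75mW active
cga_mw = absolute_mw(CGA_SHARE, 300.0)    # CGA mode: 300mW active
print(vliw_mw["interconnect"], cga_mw["interconnect"])
```

With these figures the interconnect accounts for about 21mW in VLIW mode and about 114mW in CGA mode, which is the largest single share in both breakdowns.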


Wanted: SDR-Platform Aware Signal Processing

[Images: Elephant as Platform; Horse as Platform]

Dynamic signal processing implementation

[Figure: 3GPP channel response and cycle count on SoA processor, both plotted over time]

Wanted: SDR-Platform Aware Signal Processing

ASIC as platform

• Requires simple control flow

• Requires manifest and regular computation structures

• Maximum functional reuse is a must

• Minimum data wordwidth

• Accommodates high computation loads

• Highest energy efficiency

SDR as platform

• Accommodates more complex control flows

• Accommodates complex and irregular computation structures

• Functional reuse not a must (reuse memory footprint only)

• Aligned data wordwidth

• Limited maximum computation load

• Lower energy efficiency

Algorithm-Architecture Co-Design

• Make algorithm compatible with architecture/compiler constraints

• Exploit opportunities of programmable architecture

[Figure: Architecture/Compiler and Algorithm/Software]

Observation

• Wireless baseband processing implies high dynamics

• Wireless baseband processing tolerates inaccuracy

• This is already considered at system level (X-layer), but what about in the signal processing implementation?

[Figure: channel]

The opportunity

• Two viewpoints toward complexity

– Computation complexity and memory complexity

– Structure complexity (control flow, heterogeneity, etc.)

• Wireless systems can cope with inaccuracy (“scalable” QoS)

• On SDR

– Computation complexity is much more costly than in ASIC

– Memory complexity is as costly as in ASIC

– Structure complexity is much less costly than in ASIC

• What can we do? Increase the structure complexity of baseband processing to reduce the average computation and memory complexity, by enabling run-time adaptation of the algorithm implementations to the dynamics in QoS requirements, environment (and platform)

[Figure: baseband ASIC with low structure complexity versus SDR baseband with high structure complexity]


Motivation: OFDMA Modulation Error requirements vary

WiMAX Specification

Modulation accuracy can be relaxed for lower order modulation


RCE (relative constellation error) relaxation can be exploited by a scalable digital OFDMA modulator

• Original: a large-size (e.g., 1024) IFFT-based non-scalable modulator

• Transformed: a scalable OFDMA modulator with 3 cascaded components

The interpolation factor can be used as a knob to adjust the accuracy and computation load to the RCE requirement
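Using the interpolation factor as a knob can be pictured as a table lookup at run time. A minimal sketch, assuming a design-time characterisation that records, for each interpolation factor, the achieved RCE and the normalized cycle count. Every number in the table is a hypothetical placeholder (including the direction of the trend), and `pick_interpolation_factor` is an illustrative name, not something defined in the slides.

```python
# Hypothetical characterisation: factor -> (achieved RCE in dB, normalized cycles).
# Placeholder values only; a real table would come from design-time profiling.
CHARACTERISATION = {
    1: (-40.0, 1.00),
    2: (-32.0, 0.70),
    4: (-25.0, 0.50),
    8: (-17.0, 0.35),
}


def pick_interpolation_factor(required_rce_db, table=CHARACTERISATION):
    """Return the cheapest factor whose achieved RCE meets the requirement.

    RCE is an error measure, so a more negative value in dB is more accurate.
    """
    feasible = [
        (cycles, factor)
        for factor, (rce_db, cycles) in table.items()
        if rce_db <= required_rce_db
    ]
    if not feasible:
        raise ValueError("no characterised factor meets the RCE requirement")
    return min(feasible)[1]


# A relaxed requirement (lower order modulation) allows a cheaper setting.
print(pick_interpolation_factor(-30.0), pick_interpolation_factor(-16.0))
```

The point of the sketch is only the control decision: the modulator itself is unchanged, and the requirement attached to the current modulation order selects how much computation is spent.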

Computation load scales smoothly with the interpolation factor

[Figure: normalized cycle count versus interpolation factor]


OFDMA mod./demod. requires (I)FFT with Partial input/output

The position and number of bins change dynamically


Efficient Partial FFT on ILP Architectures

• Exploit the partial input/output to reduce active instructions and memory accesses

• 30 years of theoretical research on PFFT but few implementations

• We propose a generic and efficient scheme for PFFT on ILP architectures

– Any pattern of bin distribution can be implemented
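The idea of skipping work for unused bins can be illustrated with a textbook output-pruned FFT. This is a minimal sketch of recursive radix-2 decimation-in-time pruning, assuming a power-of-two input length; it is not the authors' scheme and makes no attempt at the ILP-oriented scheduling the slide refers to. The function name `pruned_fft` is illustrative.

```python
import cmath


def pruned_fft(x, wanted):
    """Compute only the DFT bins in `wanted` (a set of indices) of sequence x.

    len(x) must be a power of two. Returns a dict {bin index: value}.
    Butterflies that feed no wanted bin are never evaluated.
    """
    n = len(x)
    if n == 1:
        return {0: complex(x[0])} if wanted else {}
    half = n // 2
    # Each wanted bin k needs bin (k mod n/2) of both half-size transforms.
    sub_wanted = {k % half for k in wanted}
    even = pruned_fft(x[0::2], sub_wanted)
    odd = pruned_fft(x[1::2], sub_wanted)
    out = {}
    for k in wanted:
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[k % half] + twiddle * odd[k % half]
    return out


# Only two of the eight bins are computed.
bins = pruned_fft([1.0, 2.0, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5], {1, 5})
print(sorted(bins))
```

Because the set of wanted bins is an argument, any bin pattern can be requested, and the amount of work shrinks as fewer bins are needed.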

The proposed scheme brings important gains in almost all implementation cost factors and scales smoothly with the number of sub-carriers to be processed


The price to pay is a higher instruction cache miss rate (acceptable)


State-of-the-art

• Automatic floating-point to fixed-point conversion (>30 years of work)

– Commercial products: Catalytic Inc. & Mathworks

– Recent academic contributions:

• Simulation-based: Seoul National Univ. (‘95)

• Analytical methods: Aachen (‘98), Northwest Univ. (‘01)

• Hybrid methods: Imperial College (‘03), Berkeley (‘04) and ENSSAT (‘05)

• Run-time word-length selection: receiver VLSI architecture based on a control feedback loop. Hokkaido University (‘06) [Yoshizawa, S. et al. ISCAS’06]

Modeling of the fixed-point communication system

• Performance of the communication system as a function of the receiver SNR

– BER = f(SNR)

• Fixed-point refined system includes quantization noise

– BER = f(SNR, na, nb, …) = f’(SNR) ≈ f(SNR’)

• Implementation scenarios defined and optimized at design time

[Figure: signal-flow fragment with signals a, b, c and quantization noise sources na, nb, nc]

[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for BPSK 1/2, QPSK 1/2, 16QAM 1/2 and 64QAM 2/3, with scenario regions A, B, C, D]
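The approximation BER ≈ f(SNR’) can be made concrete with the standard uniform-quantization noise model. A minimal sketch, assuming each noise source (na, nb, …) is independent, has variance Δ²/12 for a step Δ = 2^(-fractional bits), and reaches the output with unity gain, so that 1/SNR’ = 1/SNR + (sum of quantization noise powers)/(signal power). These modelling choices and the function names are assumptions, not the authors' stated model.

```python
import math


def quant_noise_power(frac_bits):
    """Variance of uniform rounding noise for a step of 2**-frac_bits."""
    step = 2.0 ** (-frac_bits)
    return step * step / 12.0


def effective_snr_db(snr_db, frac_bits_per_source, signal_power=1.0):
    """Effective SNR' in dB once quantization noise is added to channel noise."""
    channel_noise = signal_power / (10.0 ** (snr_db / 10.0))
    quant_noise = sum(quant_noise_power(b) for b in frac_bits_per_source)
    return 10.0 * math.log10(signal_power / (channel_noise + quant_noise))


# Three noise sources at a 20 dB channel SNR: generous vs. aggressive word lengths.
print(effective_snr_db(20.0, [16, 16, 16]), effective_snr_db(20.0, [4, 4, 4]))
```

With generous word lengths SNR’ stays essentially equal to the channel SNR, while short word lengths pull it down; a design-time scenario can then be read as a word-length assignment that keeps this loss acceptable for a given SNR range.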

Opportunity: application dynamics and tolerance to inaccuracy can be propagated to the implementation

• Multiple link parameters trade off noise/interference robustness versus data rate

• Different system configurations have different requirements in [digital] signal processing accuracy, so they can use different implementations

• We adapt the application fixed-point mapping at run-time

• By switching between the “mappings”, the average load is reduced

[Figure: throughput [Mbps] (0 to 120) versus SNR [dB] (0 to 40) for BPSK 1/2, QPSK 1/2, 16QAM 1/2 and 64QAM 2/3, with scenario regions A, B, C, D]

[Figure: system level (TX, channel with additive noise, RX; SNR) related to implementation level (RX analog FE and digital DSP; #bits)]
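Why switching mappings lowers the average load is a matter of weighting each mapping's load by how often its scenario occurs. A minimal sketch with hypothetical numbers: the time fractions and normalized loads for scenarios A to D below are placeholders chosen for illustration, not measurements from the slides.

```python
# Hypothetical values: scenario -> (fraction of time, load normalized to the
# most precise mapping). Placeholders only.
SCENARIOS = {
    "A": (0.25, 0.4),
    "B": (0.25, 0.6),
    "C": (0.25, 0.8),
    "D": (0.25, 1.0),
}


def average_load(scenarios):
    """Time-weighted load when the mapping follows the current scenario."""
    total_time = sum(fraction for fraction, _ in scenarios.values())
    weighted = sum(fraction * load for fraction, load in scenarios.values())
    return weighted / total_time


def static_load(scenarios):
    """Load of a single worst-case mapping that must serve every scenario."""
    return max(load for _, load in scenarios.values())


print(average_load(SCENARIOS), static_load(SCENARIOS))
```

With these placeholder values the adaptive average is about 0.7 of the worst-case load; the actual saving depends on how much time the link spends in the less demanding scenarios.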

SDR enables more agile signal processing implementations

• Several SW implementations of the same functionality with different precision/computation load

• Monotonic relation between precision and load

• One can switch between SW implementations in a few cycles

[Figure: run-time controller that receives the QoS requirement and monitoring info (channel attenuation over time and frequency) and adapts the data format of the DSP implementation]
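Holding several implementations of one function and swapping between them can be sketched as a dispatch table. This is a minimal illustration in Python, assuming a dot product as the kernel and a hypothetical assignment of fractional bits to link modes; `quantize`, `make_dot`, `VARIANTS` and `RuntimeController` are names introduced here. It models precision only; the load difference on a real processor would come from packing narrower words into SIMD lanes.

```python
def quantize(value, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (fixed-point model)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def make_dot(frac_bits):
    """Build a dot-product variant that works at the given precision."""
    def dot(xs, ys):
        acc = sum(quantize(x, frac_bits) * quantize(y, frac_bits)
                  for x, y in zip(xs, ys))
        return quantize(acc, frac_bits)
    return dot


# Hypothetical mapping from link mode to fractional bits.
VARIANTS = {
    "BPSK 1/2": make_dot(4),
    "QPSK 1/2": make_dot(6),
    "16QAM 1/2": make_dot(8),
    "64QAM 2/3": make_dot(12),
}


class RuntimeController:
    """Keeps a reference to the variant that matches the current link mode."""

    def __init__(self, variants, initial_mode):
        self.variants = variants
        self.active = variants[initial_mode]

    def on_mode_change(self, mode):
        # Switching is a single reference update, no reconfiguration step.
        self.active = self.variants[mode]


controller = RuntimeController(VARIANTS, "BPSK 1/2")
coarse = controller.active([0.3, -0.7, 0.55], [0.9, 0.2, -0.41])
controller.on_mode_change("64QAM 2/3")
fine = controller.active([0.3, -0.7, 0.55], [0.9, 0.2, -0.41])
print(coarse, fine)
```

The caller always invokes `controller.active`, so the signal processing chain does not need to know which precision is currently selected.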

Dynamic fixed-point format assignment increases energy efficiency in situations requiring lower performance

Increase in scalability

• Energy efficiency increased at lower rate modes

• Average energy consumption is reduced


Conclusions

• Energy efficiency of flexible implementations gets closer to that of their dedicated hardware counterparts:

– Has the potential to continuously best-fit the dynamism.

• Does not rely on hypothetical provisions in the standards:

– Implementation centric

– Applicable to any functional-level algorithmic solution

• Wireless systems context today, but also other domains tomorrow:

– Digital signal processing with an SNR-type constraint and dynamic data resolution variation

– Biomedical signal processing, multimedia, etc.