Implementation Issues for Channel Estimation and Detection Algorithms for W-CDMA Sridhar Rajagopal...

Post on 25-Dec-2015


Implementation Issues for Channel Estimation and Detection Algorithms for W-CDMA

Sridhar Rajagopal and Joseph Cavallaro

ECE Dept.

Contents

Introduction

W-CDMA

Channel Estimation and Detection

DSP Implementation

ASIC Implementation

Other Current Projects

Future Work

The CDMA Research Group

We cover the entire (spread) spectrum!

– Algorithms

– Implementation issues


Implementation Issues

Important because

– Real-time

– Low Power

– Mobility / Size

DSPs

– Signal Processing, Communications

ASICs / FPGAs

– Speed / Size

W-CDMA

CDMA : Code Division Multiple Access

W-CDMA : Wideband CDMA (5 MHz)

– Next Generation Communication Systems

– Integrating Multimedia Capabilities

– QoS /Multi-rate Services

– Higher Data Rates: 2048, 384, 144 kbps

Uplink - Async, Multiuser

[Figure: User 1 and User 2 transmit to the Base Station over a Direct Path and Reflected Paths; Noise + MAI is added to the received signal.]

Downlink - Sync, Single User

[Figure: the Base Station transmits the User 1 and User 2 signals to User 1 over a Direct Path and Reflected Paths; Noise + MAI is added to the received signal.]

Determining the Channel

Channel Estimation

– Need to know the Channel for proper detection

• Delays and Amplitudes : Multiuser/path

– Send sequence of known bits (Pilot / Preamble)

– 2 types

• Code Multiplexed with Data

• Time Multiplexed with Data

Detection

– Use knowledge of channel for detection of Data bits

W-CDMA Standards

Not fixed yet…

Uplink

– Channel Estimation - Time Multiplexed

– Multiuser Detection

Downlink

– Channel Estimation - Common Pilot

– Detection : Rake Receivers / Equalizers

Channel Estimation

Uplink

– Time Multiplexed

• Maximum Likelihood

• Subspace

Downlink

– Continuous

• LMS Based Adaptive

Multiuser Detection

Optimal

– MLSE (Viterbi)

Sub-optimal

– Linear

• MAI Whitening

• Decorrelating

• MMSE

– Interference Cancellation

• Serial (SIC)

• Parallel (PIC)

– Neural Network

Base-Station Receiver

[Figure: block diagram with Antenna, Demodulator, Demux (Pilot / Data), Channel Estimator (Estimated Amplitudes & Delays), Multiuser Detector and Decoder.]

CDMA Uplink System

[Figure: block diagram. The data of Users 1 to K (d1, d2, ..., dK) each pass through a Channel Encoder and Spreading; the signals are summed (+) with AWGN to give R(t). The receiver has a bank of Matched Filters (outputs y1, y2, ..., yK), a Channel Estimator, a Multi-User Detector, a Demux and a Channel Decoder, which produce the detected data d1', d2', ..., dK'.]

Maximum Likelihood Channel Estimation

Send a time-multiplexed Preamble (Pilot).

Channel properties extracted

Compare with known pilot and estimate.

Keep estimate for remaining data bits (static).

Repeat preamble every frame, if no tracking.

The Maximum Likelihood Algorithm

Compute the correlation matrices $R_{rr}$, $R_{br}$ and $R_{bb}$.

Compute the channel estimate $Y = R_{br} R_{bb}^{-1}$.

Calculate the noise covariance matrix K.

Calculate the channel impulse response vector z.

Extract the amplitudes and delays using least squares fit.
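
The correlation and estimate steps can be illustrated with a small sketch. It is not the authors' implementation. It assumes a noise-free model r_i = H b_i with two users, where `pilot_bits` holds the known ±1 preamble vectors and `received` holds the chip-rate received vectors. The function names and the matrix dimensions (R_br as N×K, R_bb as K×K) are illustrative choices.

```python
def correlate(pilot_bits, received):
    """Return (R_br, R_bb), averaged over the L pilot bit vectors."""
    L = len(pilot_bits)
    K = len(pilot_bits[0])
    N = len(received[0])
    # Real K x K correlation of the known pilot bits.
    r_bb = [[sum(b[i] * b[j] for b in pilot_bits) / L for j in range(K)]
            for i in range(K)]
    # Complex N x K cross-correlation of received chips and pilot bits.
    r_br = [[sum(r[n] * b[k] for b, r in zip(pilot_bits, received)) / L
             for k in range(K)] for n in range(N)]
    return r_br, r_bb


def invert_2x2(m):
    """Closed-form inverse; enough for the two-user illustration."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("pilot correlation matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def ml_channel_estimate(pilot_bits, received):
    """Y = R_br * R_bb^{-1} for K = 2 users."""
    r_br, r_bb = correlate(pilot_bits, received)
    inv = invert_2x2(r_bb)
    return [[sum(row[k] * inv[k][j] for k in range(2)) for j in range(2)]
            for row in r_br]


# Hypothetical 3-chip channel response for 2 users (columns = users).
channel = [[1 + 0.5j, 0.5], [0.25, -1 + 0j], [0, 2j]]
pilots = [(1, 1), (1, -1), (1, 1)]
received = [[sum(channel[n][k] * b[k] for k in range(2)) for n in range(3)]
            for b in pilots]
estimate = ml_channel_estimate(pilots, received)
```

Without noise the estimate reproduces `channel` up to rounding. Since R_bb depends only on the known preamble, its inverse can be prepared ahead of time.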

The ML Algorithm Complexity

Complex-Real Dot Product: $R_{br} = \frac{1}{L}\, b \cdot r$

Complex-Real Matrix Product: $Y = R_{br} R_{bb}^{-1}$

Complex-Real Product: $z_k$, formed from $y_k$ and the matrices $U_k^{L}$ and $U_k^{R}$ [equation not recoverable from the transcript]

Real Square roots.

– Solving quadratic equation for least squares fit.

Critical code : Matrix-vector / Dot Product

Assuming Unity Noise Covariance

Offline

Differencing Multistage Multiuser Detection

Based on the principle of Parallel Interference Cancellation (PIC)

Cross-correlation information used to remove interference of other users

Repeated iterations for convergence

Differencing techniques to improve performance

The Differencing Multistage Detector

Split the cross correlation matrix into lower, upper and the diagonal matrix:

$R = D + S + S^T$

Calculate impulse response:

$z^{(l)} = z^{(l-1)} - (S + S^T) A\, \hat{x}^{(l-1)}$

where

$\hat{x}^{(l-1)} = \hat{d}^{(l-1)} - \hat{d}^{(l-2)}, \qquad \hat{x}_k \in \{0, 2, -2\}$

$\hat{x}$ is called the differencing vector.
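
A small sketch can show why the differencing update gives the same result as recomputing the interference from scratch at every stage. It is only an illustration under stated assumptions: real-valued matched-filter outputs `y`, hard decisions by sign, and a matrix `B` that stands for the off-diagonal cross-correlations scaled by the user amplitudes. All function and variable names are hypothetical.

```python
def sign(v):
    return 1 if v >= 0 else -1


def interference_matrix(R, amps):
    """B = (off-diagonal part of R) * diag(amps)."""
    K = len(R)
    return [[0.0 if i == j else R[i][j] * amps[j] for j in range(K)]
            for i in range(K)]


def conventional_pic(y, B, stages):
    """Each stage recomputes z = y - B d from the full decision vector."""
    K = len(y)
    z = list(y)
    d = [sign(v) for v in y]
    for _ in range(stages):
        z = [y[i] - sum(B[i][j] * d[j] for j in range(K)) for i in range(K)]
        d = [sign(v) for v in z]
    return z, d


def differencing_pic(y, B, stages):
    """Each stage corrects z only with the decisions that changed."""
    K = len(y)
    z = list(y)
    d_prev = [0] * K                 # no decisions exist before stage 0
    d = [sign(v) for v in y]
    for _ in range(stages):
        x = [d[j] - d_prev[j] for j in range(K)]   # differencing vector
        z = [z[i] - sum(B[i][j] * x[j] for j in range(K) if x[j] != 0)
             for i in range(K)]
        d_prev, d = d, [sign(v) for v in z]
    return z, d


# Hypothetical 3-user example: y = R * diag(amps) * true_bits, no noise.
R = [[1.0, 0.3, -0.2], [0.3, 1.0, 0.4], [-0.2, 0.4, 1.0]]
amps = [1.0, 2.0, 0.5]
true_bits = [1, -1, 1]
y = [sum(R[i][j] * amps[j] * true_bits[j] for j in range(3)) for i in range(3)]
B = interference_matrix(R, amps)
z_conv, d_conv = conventional_pic(y, B, stages=2)
z_diff, d_diff = differencing_pic(y, B, stages=2)
```

After the first stage most entries of the differencing vector are zero once the decisions settle, so the inner sum skips them. That skipped work is what the differencing method saves.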

Multistage Detector Complexity

Matrix Multiplication: $B = (S + S^T) A$

– Computed only once for one frame

Dot Product: $z_k^{(l)} = z_k^{(l-1)} - \sum_j B_{kj}\, \hat{x}_j^{(l-1)}$

– Computed iteratively

Critical code: Dot Product

TI Tools Used

Evaluation Modules (EVM) for C6201 and C6701 fixed and floating point DSPs

– 64 KB each internal program & data memory

– 256 KB SBSRAM, 8 MB SDRAM (external)

C Compiler ver 3.0 from Code Generation Tools

Code Composer ver 4.02 for profiling

DSP Implementation: Channel Estimation

Floating point implementation found more feasible due to matrix inversions and square-roots.

Code optimized for the DSP

Use of Specialized approximate instructions

– Approximate reciprocal square roots

– Approximate reciprocals

Use of Assembly Code for critical part.

– TI's C67 floating point benchmarks for Matrix-Vector Multiplication & Dot Product

Data Memory requirements for Channel Estimation
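
An approximate reciprocal or reciprocal square root is normally used as a starting point and then sharpened with a few Newton-Raphson steps. The sketch below illustrates that general technique only. The helper `coarse` imitates a low-precision hardware estimate by keeping 8 mantissa bits; both the helper and the 8-bit figure are assumptions made for illustration.

```python
import math


def coarse(value, bits=8):
    """Imitate a low-precision estimate by truncating the mantissa."""
    mantissa, exponent = math.frexp(value)
    scale = 1 << bits
    return math.ldexp(math.floor(mantissa * scale) / scale, exponent)


def refine_reciprocal(a, seed, iterations=2):
    """Newton-Raphson for 1/a: x <- x * (2 - a * x)."""
    x = seed
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return x


def refine_rsqrt(a, seed, iterations=2):
    """Newton-Raphson for 1/sqrt(a): x <- x * (1.5 - 0.5 * a * x * x)."""
    x = seed
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


a = 3.0
recip = refine_reciprocal(a, coarse(1.0 / a))
rsqrt = refine_rsqrt(a, coarse(1.0 / math.sqrt(a)))
```

Each step roughly squares the relative error, so two steps take a seed that is good to a few bits close to single-precision accuracy using only multiplies and subtractions.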

Approximate Instructions & Assembly

L = 150, P = 3, N = 31, SNR = 5 dB, SINR = -10 dB

TMS320C67x                                    DSP Cycles
Approx. FP Reciprocal instruction             1
FP Reciprocal function                        28
Approx. FP Reciprocal Sq. root instruction    1
FP Reciprocal Sq. root instruction            34

[Figure: Use of specialized instructions and assembly code on C6701 DSP. Execution time (in milliseconds, axis 0 to 140) versus number of users (0 to 15) for three cases: C6701 original, C6701 with intrinsics, C6701 with assembly. Annotations: 10% improvement, 100% improvement.]

Data Memory Requirements

Data to be placed in External memory

1306

DSP Implementation: Multistage Detection

16-bit Fixed Point C Code

Code optimized for the DSP

Use of Assembly Code for critical part

– TI's C62 fixed point assembly benchmarks for Dot Product

Data memory requirements for Multistage Detection
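
A 16-bit fixed-point dot product can be modelled in a few lines. The sketch below uses the common Q15 convention (values in [-1, 1) stored as 16-bit integers, products accumulated in a 32-bit register). The Q15 format and the function names are assumptions for illustration, not details taken from the slides.

```python
Q15_ONE = 1 << 15


def to_q15(x):
    """Quantize a value in [-1, 1) to a saturated 16-bit Q15 integer."""
    v = int(round(x * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, v))


def from_q15(v):
    return v / Q15_ONE


def q15_dot(a, b):
    """Dot product of two Q15 vectors, returned in Q15."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                      # 16 x 16 -> 32-bit product (Q30)
    # Saturate to the 32-bit accumulator range, then drop 15 fraction bits.
    acc = max(-(1 << 31), min((1 << 31) - 1, acc))
    return acc >> 15


a = [to_q15(v) for v in (0.5, -0.25, 0.125)]
b = [to_q15(v) for v in (0.5, 0.5, -0.5)]
result = from_q15(q15_dot(a, b))
```

Keeping the running sum at full product precision and scaling only once at the end limits the rounding error to a single truncation.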

Data Memory Requirements

Data can be placed completely in Internal memory

Flops Count

[Figure: number of flops (×10^4, axis 0 to 14) versus total number of iterations (1 to 8) for the Conventional Method and the Differencing Method. Users: K = 15, SNR = 6 dB.]

2X speedup for a three-stage detector

Real-Time Requirements

Real-Time capability by C6201 DSP

[Figure: maximum bit rate per user (kb/s, axis 50 to 350) versus number of users (8 to 14) for the Conventional Method and the Differencing Method. SNR = 10 dB, Window Size = 12. Marked point: 12 users at 150 kb/s.]
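
The marked operating point can be turned into a rough cycle budget. The sketch below assumes that a single DSP serves all users and does nothing else, and it uses the 200 MHz C6201 clock quoted in the DSP-ASIC comparison. The function name is hypothetical.

```python
def cycles_per_bit(clock_hz, users, bit_rate_bps):
    """Clock cycles available per detected bit when one DSP serves all users."""
    return clock_hz / (users * bit_rate_bps)


# 12 users at 150 kb/s each on a 200 MHz device.
budget = cycles_per_bit(200e6, 12, 150e3)
```

Under these assumptions the detector has a little over 110 cycles for each bit decision, which is why the inner dot product is the critical code.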

Trends in Recent DSPs

More internal memory and higher clock speeds

– C6203 : 512 KB data, 384 KB program, 250 MHz

– useful for uplink channel estimation algorithms.

Specialized Blocks in the DSP Core.

– Viterbi decoding in C54.

Lower Voltage operation

– 1.2 V in C5402, useful for saving power consumption in the mobile.

ASIC Implementation

MOSIS Tiny-Chip (40-pin DIP)

– 8 synchronous users

– 12-bit fixed point implementation

– 6000 transistors

– 1.2 µm CMOS technology

– 190 kb/s for each user (@ 12.5 MHz)

– 3-stage cascade delay < 15 µs

Advantages of ASICs

Highly parallel instructions: 4 RISC IPC (instructions per cycle)

– accumulating while shifting, loading and storing

– recoding while loading

Application specific architecture

– faster I/O

– smaller on chip memory

– smaller ALU

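Chip (Single Stage) Architecture

[Figure: single-stage datapath with SHIFT, ALU, RECODER, REG and Control Logic blocks, using the coefficients (L+L')A. Signals shown: $\hat{d}^{(l-1)}$, $\hat{d}^{(l)}$, $z^{(l)}$, $z^{(l+1)}$. Internal signals and External signals are marked separately.]

$z^{(l+1)} = z^{(l)} - (L + L^T) A\, \hat{x}^{(l)}$, where $\hat{x}^{(l)} = \hat{d}^{(l)} - \hat{d}^{(l-1)}$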
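
Because each entry of the differencing vector is 0, +2 or -2, the stage update needs no multiplier: a non-zero entry selects a coefficient, shifts it left by one bit, and adds or subtracts it. The sketch below is one plausible reading of how the SHIFT and RECODER blocks could cooperate; it is not a description of the actual chip. `coeff` stands for an integer-quantized (L + L')A matrix, and all names are hypothetical.

```python
def stage_update(z, coeff, d_new, d_old):
    """One multistage update using only shifts, adds and subtracts."""
    z_next = list(z)
    for j, (dn, do) in enumerate(zip(d_new, d_old)):
        x = dn - do                  # recoded value: -2, 0 or +2
        if x == 0:
            continue                 # unchanged decision, nothing to cancel
        for i in range(len(z)):
            term = coeff[i][j] << 1  # multiply by 2 as a left shift
            if x > 0:
                z_next[i] -= term
            else:
                z_next[i] += term
    return z_next


# Hypothetical integer example with 3 users.
coeff = [[0, 6, -1], [3, 0, 2], [-2, 8, 0]]
z = [8, -16, 5]
d_old = [1, -1, -1]
d_new = [1, -1, 1]
z_next = stage_update(z, coeff, d_new, d_old)
```

Only the third decision changed in this example, so a single coefficient column is touched and the other two are skipped.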

Chip Layout

[Figure: chip layout with labeled regions: 12-bit ALU, Soft Decisions, Cross-Correlation, Recoding logic. Dimension label: 2.0 mm.]

The Actual Chip Photograph

3-stage Cascade Mode

[Figure: three identical stages connected in cascade. Each stage has the pins Sin, Hin, Fin, Load, CLK, Sout, Hout and Fout, and carries a "1/2" label. External signals: Matched Filter Output, Detector Output, HandShaking, Load R, Clock, Output Valid.]

System Timing

[Figure: timing diagram with the phases Load R, 1st Stage, 2nd Stage, 3rd Stage and Final Output. The Interference Cancellation panel shows the bit patterns 10100000, 00100000, 00100000.]

Scalable ASIC Design

                     Current Tiny-Chip                        Chip in the future
Features             8 synchronous users,                     30 asynchronous users,
                     12-bit fixed point                       16-bit fixed point
Clock Rate           12.5 MHz                                 100 MHz
Internal registers   0.3 kb                                   8 kb
ALU                  12-bit partial carry look-ahead adder    Three 16-bit full carry look-ahead adders
Transistors          6K                                       100K
Output bandwidth     1.5 Mb/s                                 3.0 Mb/s
Design method        layout                                   VHDL synthesis

Xilinx FPGA XC4000: 500k gates, 96 MHz

DSP-ASIC Comparison

8 users         DSP (C6201)                   ASIC (Tiny Chip)
Clock           200 MHz                       12.5 MHz
Precision       16-bit                        12-bit
Speed           300 kb/s/user                 190 kb/s/user
Complexity      ~10M transistors (0.25 µm)    6K transistors (1.2 µm)
Design Cycle    short                         long

• TI’s ‘C54xx: General purpose DSP core + ASIC
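
The comparison figures can be normalized by clock rate to see how much work each design does per cycle. The sketch below uses only the numbers quoted in the slides; the function name and the choice of "kb/s per user per MHz" as a metric are illustrative assumptions.

```python
def rate_per_mhz(rate_kbps_per_user, clock_mhz):
    """Per-user throughput normalized by clock frequency."""
    return rate_kbps_per_user / clock_mhz


dsp = rate_per_mhz(300, 200)      # C6201: 300 kb/s/user at 200 MHz
asic = rate_per_mhz(190, 12.5)    # Tiny Chip: 190 kb/s/user at 12.5 MHz
ratio = asic / dsp

# Aggregate ASIC output for 8 users, in kb/s.
asic_total_kbps = 8 * 190
```

By this measure the ASIC delivers roughly ten times the per-user throughput per MHz, and its aggregate of 1520 kb/s agrees with the 1.5 Mb/s output bandwidth listed for the Tiny-Chip.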

Other Current Projects

Simulation Testbed

– Entire Chain of Algorithms

• Simulink - RTW

• Rapid Prototyping

• Matlab to DSP

Copper Contest

– Implementation of Multistage Detector using 0.15 micron Copper Technology

Wireless LAN Project

Home Area Wireless LAN

High Speed Office Wireless LAN

Outdoor CDMA Cellular Network

Future Work

Fixed Point Implementations on DSPs/ASICs

– Uplink & Downlink Algorithms

Approximations using Linear Algebra

Support Long Codes and Fading

Multistage Detector

– Execution time Predictability

– Increase Efficiency

GPP Comparisons : Praful, Partha, Dr.Adve

Effect of DMA and Caches