Low Complexity Design for Decoding LDPC...


Low Complexity Design for Decoding LDPC Codes

EE5327 VLSI Design Lab. Project

Apr/18/2006

Contents

Introduction to LDPC codes

Iterative message passing algorithm

- sum-product algorithm (SPA)

Proposed architecture for decoding LDPC codes

Design & verification procedure

Synthesis & optimization

Current status

Future work

Conclusion

Low-Density Parity-Check (LDPC) Codes

More than 40 years of research (1948-1994) centered around:

- the weight of errors that a code is guaranteed to correct, $t \le \lfloor (d_{\min} - 1)/2 \rfloor$

- “bounded distance decoding”, which cannot achieve the Shannon limit

Trade off minimum distance for efficient decoding

Low-Density Parity-Check (LDPC) Codes

Gallager 1963, Tanner 1984, MacKay 1996

1. Linear block codes with a sparse (small fraction of ones) parity-check matrix

2. Natural representation in terms of bipartite graphs

3. Simple and efficient iterative decoding in the form of belief propagation (Pearl, 1980-1990)
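To make property 1 concrete, here is a minimal Python sketch of the parity-check condition H·cᵀ = 0 (mod 2) that every codeword satisfies. The 3 × 6 matrix `H` and the names `syndrome` and `is_codeword` are hypothetical illustrations, not the (1728, 864) code used in this project.

```python
# Hypothetical toy parity-check matrix (3 checks x 6 bits), one row per check.
# A real LDPC matrix is much larger and far sparser.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]


def syndrome(H, c):
    """Return H * c^T over GF(2): one parity bit per check (row of H)."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]


def is_codeword(H, c):
    """A word is a codeword exactly when every parity check is satisfied."""
    return not any(syndrome(H, c))


print(syndrome(H, [1, 1, 0, 0, 1, 0]))  # [0, 0, 0]
print(syndrome(H, [0, 1, 0, 0, 1, 0]))  # [1, 0, 1]: flipping bit 0 breaks checks 0 and 2
```

The nonzero positions of the syndrome point at the failed checks, which is the information an iterative decoder works from.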

Representations of LDPC Codes

[Figure: a sparse parity check matrix and the equivalent bipartite graph, with N bit (variable) nodes on one side and check nodes on the other]
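The matrix and the graph describe the same object: check node c is joined to bit node v exactly when H[c][v] = 1. The Python sketch below builds the neighbor sets N(c) and M(v) that message passing iterates over; the function name `tanner_graph` and the toy matrix are our own illustrations.

```python
def tanner_graph(H):
    """Return the neighbor lists of the bipartite (Tanner) graph of H.

    N[c]: bit (variable) nodes connected to check node c  (ones in row c)
    M[v]: check nodes connected to bit node v             (ones in column v)
    """
    n_checks, n_bits = len(H), len(H[0])
    N = [[v for v in range(n_bits) if H[c][v]] for c in range(n_checks)]
    M = [[c for c in range(n_checks) if H[c][v]] for v in range(n_bits)]
    return N, M


# Hypothetical toy matrix: 3 check nodes, 6 bit nodes, 9 edges.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]
N, M = tanner_graph(H)
print(N)  # [[0, 1, 3], [1, 2, 4], [0, 4, 5]]
print(M)  # [[0, 2], [0, 1], [1], [0], [1, 2], [2]]
```

Each edge appears once in `N` and once in `M`, so the number of graph edges equals the number of ones in H; sparsity of H therefore directly bounds the work per decoding iteration.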

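Iterative message passing algorithm - Sum-product algorithm (SPA)

Check Node Update:

$$L_{cv} = -S_{cv} \cdot \Psi\Big(\sum_{n \in N(c)\setminus v} \Psi(L_{nc})\Big), \qquad S_{cv} = \prod_{n \in N(c)\setminus v} \operatorname{sign}(L_{nc}), \qquad \Psi(x) = \log\big(\tanh(|x|/2)\big)$$

Variable Node Update:

$$L_{vc} = \frac{2 r_v}{\sigma^2} + \sum_{m \in M(v)\setminus c} L_{mv}$$

where $L_{vc}$ is the message from variable node v to check node c, $L_{cv}$ the message from check node c to variable node v, N(c) the set of variable nodes connected to check node c, M(v) the set of check nodes connected to variable node v, $r_v$ the received data, and $\sigma^2$ the estimated noise variance.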
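A floating-point sketch of these two updates in Python, assuming BPSK with bit 0 sent as +1 and bit 1 as -1, a flooding schedule (all check nodes, then all variable nodes), and early stopping once the hard decisions satisfy every parity check. The function names, the dictionary-based message storage, and the toy example are hypothetical; the default of 20 iterations follows the simulation setup used in this project. Ψ is clipped away from 0 so the logarithm stays finite.

```python
import math


def psi(x):
    """Psi(x) = log(tanh(|x|/2)) <= 0; |x| is clipped so the log stays finite."""
    a = min(max(abs(x), 1e-12), 50.0)
    return math.log(math.tanh(a / 2.0))


def neighbors(H):
    """N[c]: variable nodes of check c; M[v]: check nodes of variable v."""
    N = [[v for v, h in enumerate(row) if h] for row in H]
    M = [[c for c, row in enumerate(H) if row[v]] for v in range(len(H[0]))]
    return N, M


def check_node_update(N, L_vc):
    """L_cv = -S_cv * Psi(sum over n in N(c)\\v of Psi(L_nc))."""
    L_cv = {}
    for c, vs in enumerate(N):
        for v in vs:
            others = [L_vc[(n, c)] for n in vs if n != v]
            s = 1
            for m in others:  # S_cv: product of the signs of the other inputs
                if m < 0:
                    s = -s
            L_cv[(c, v)] = -s * psi(sum(psi(m) for m in others))
    return L_cv


def variable_node_update(M, r, sigma2, L_cv):
    """L_vc = 2 r_v / sigma^2 + sum over m in M(v)\\c of L_mv; also hard decisions."""
    L_vc, bits = {}, []
    for v, cs in enumerate(M):
        total = 2.0 * r[v] / sigma2 + sum(L_cv[(m, v)] for m in cs)
        for c in cs:
            L_vc[(v, c)] = total - L_cv[(c, v)]  # leave out the message from c itself
        bits.append(0 if total >= 0 else 1)      # bit 0 <-> +1, bit 1 <-> -1
    return L_vc, bits


def spa_decode(H, r, sigma2, max_iter=20):
    """Flooding-schedule SPA with early stop when all parity checks are satisfied."""
    N, M = neighbors(H)
    L_vc = {(v, c): 2.0 * r[v] / sigma2 for v, cs in enumerate(M) for c in cs}
    bits = [0 if x >= 0 else 1 for x in r]
    for _ in range(max_iter):
        L_cv = check_node_update(N, L_vc)
        L_vc, bits = variable_node_update(M, r, sigma2, L_cv)
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in N):
            break
    return bits


# Hypothetical toy code; codeword 110010 sent as BPSK, last sample has the wrong sign.
H = [[1, 1, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1]]
r = [-0.9, -1.1, 0.8, 1.2, -1.0, -0.2]
print(spa_decode(H, r, sigma2=0.5))  # [1, 1, 0, 0, 1, 0]
```

Because Ψ is negative, the outer minus sign in the check node update restores a positive magnitude, and the sign of the result comes from $S_{cv}$ alone; this is the property that lets the hardware handle sign and magnitude in separate paths.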

SPA hardware architecture I (degree of variable node: 3, degree of check node: 6)

[Figure: block diagrams of the Check Node Update (CNU) and Variable Node Update (VNU) units, with messages α and β (e.g. α_c6, β_c6), (q-1)-bit magnitude paths, and the sign sgn(β_v)]

[Figure: curve of the PSI function vs. input x (0 to 4), implemented as a look-up table (LUT)]

Finite precision analysis

Quantization scheme for passed messages (q:f)

• q: total number of bits used for quantization

• f: number of bits used for the fractional part

• e.g. (6:3) covers the range -4.0 to 3.875 (step size: 0.125)

Reasonable distribution of data (BPSK, AWGN channel)

• Input range = -4.0 to 4.0

• Output range = -4.0 to 4.0

Best trade-off between hardware complexity and performance: the (6:3) quantization scheme

• Sign: 1 bit, integer: 2 bits, fraction: 3 bits

• LUT size: 5 × 2^(6-1) = 160 bits (excluding the sign bit)
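The uniform (q:f) scheme and the Ψ LUT can be sketched in Python as follows. Assumptions (ours, not stated on the slide): values are rounded to the nearest step and saturated at the range ends, and each LUT entry evaluates |Ψ(x)| = -log(tanh(x/2)) at the midpoint of its input bin, which is what the input column of the (6:3) LUT table suggests. With these choices the 32 generated 5-bit entries reproduce that table's uniform LUT codes.

```python
import math


def quantize(x, q=6, f=3):
    """Round x to the (q:f) fixed-point grid and saturate.

    Returns the signed integer code k; the represented value is k * 2**-f.
    For (6:3): k in [-32, 31], i.e. -4.0 to 3.875 in steps of 0.125.
    """
    step = 2.0 ** -f
    lo, hi = -(2 ** (q - 1)), 2 ** (q - 1) - 1
    k = math.floor(x / step + 0.5)  # round to the nearest step
    return max(lo, min(hi, k))


def psi_lut(q=6, f=3):
    """Magnitude LUT for |Psi(x)| = -log(tanh(x/2)), sign bit excluded.

    One entry per (q-1)-bit magnitude code k; the input is taken at the bin
    midpoint (k + 0.5) * step, and the output is quantized with the same step.
    """
    step = 2.0 ** -f
    max_code = 2 ** (q - 1) - 1
    lut = []
    for k in range(2 ** (q - 1)):
        x = (k + 0.5) * step
        out = -math.log(math.tanh(x / 2.0))
        lut.append(min(max_code, math.floor(out / step + 0.5)))
    return lut


lut = psi_lut()
print([format(v, "05b") for v in lut[:4]])  # ['11100', '10011', '01111', '01100']
print(len(lut) * 5, "bits")                 # 160 bits, i.e. 5 x 2^(6-1)
```

The steep part of the curve (small x) consumes most of the distinct output codes, while for x above about 2.4 the output collapses to 0.125 or 0; that imbalance is what motivates the non-uniform scheme.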

Non-uniform quantization scheme using a compressed LUT (CLUT)

Almost no performance degradation

Reduced memory word length: q -> (q-1) bits

Reduced LUT size: 2^(q-1)*q -> 2^(q-2)*q bits

[Figure: original curve vs. quantized curve of the PSI function with (6:3) bits (including sign bit), plotted against input x from 0 to 4]

Non-uniform steps:

- 2^(q-3) steps for 0.0 < x < 1.0

- 2^(q-4) steps for 1.0 < x < 2.0

- 2^(q-4) steps for 2.0 < x < 4.0

[Figure: original curve vs. quantized curve of the PSI function with (6:3) bits (including sign bit), plotted against input x from 0 to 4; legend: original values, quantized values, modified quantized values]

Proposed non-uniform quantization, e.g. (6:3): 8 steps, 4 steps, 4 steps

Proposed Non-uniform (6:3) Quantization LUT (excluding the sign bit)

| Input (Quan_val) | Output | Quan_val | LUT | Mod | Mod LUT |
|---|---|---|---|---|---|
| 0.0625 | 3.4661 | 3.500 | 11100 | 3.500 | 11100 |
| 0.1875 | 2.3700 | 2.375 | 10011 | 2.375 | 10011 |
| 0.3125 | 1.8644 | 1.875 | 01111 | 1.875 | 01111 |
| 0.4375 | 1.5356 | 1.500 | 01100 | 1.500 | 01100 |
| 0.5625 | 1.2944 | 1.250 | 01010 | 1.250 | 01010 |
| 0.6875 | 1.1062 | 1.125 | 01001 | 1.125 | 01001 |
| 0.8125 | 0.9538 | 1.000 | 01000 | 1.000 | 01000 |
| 0.9375 | 0.8274 | 0.875 | 00111 | 0.875 | 00111 |
| 1.0625 | 0.7209 | 0.750 | 00110 | 0.625 | 00101 |
| 1.1875 | 0.6300 | 0.625 | 00101 | 0.625 | 00101 |
| 1.3125 | 0.5519 | 0.500 | 00100 | 0.500 | 00100 |
| 1.4375 | 0.4843 | 0.500 | 00100 | 0.500 | 00100 |
| 1.5625 | 0.4255 | 0.375 | 00011 | 0.375 | 00011 |
| 1.6875 | 0.3743 | 0.375 | 00011 | 0.375 | 00011 |
| 1.8125 | 0.3294 | 0.375 | 00011 | 0.250 | 00010 |
| 1.9375 | 0.2901 | 0.250 | 00010 | 0.250 | 00010 |
| 2.0625 | 0.2557 | 0.250 | 00010 | 0.250 | 00010 |
| 2.1875 | 0.2253 | 0.250 | 00010 | 0.250 | 00010 |
| 2.3125 | 0.1987 | 0.250 | 00010 | 0.250 | 00010 |
| 2.4375 | 0.1752 | 0.125 | 00001 | 0.250 | 00010 |
| 2.5625 | 0.1545 | 0.125 | 00001 | 0.125 | 00001 |
| 2.6875 | 0.1363 | 0.125 | 00001 | 0.125 | 00001 |
| 2.8125 | 0.1203 | 0.125 | 00001 | 0.125 | 00001 |
| 2.9375 | 0.1061 | 0.125 | 00001 | 0.125 | 00001 |
| 3.0625 | 0.0936 | 0.125 | 00001 | 0.125 | 00001 |
| 3.1875 | 0.0826 | 0.125 | 00001 | 0.125 | 00001 |
| 3.3125 | 0.0729 | 0.125 | 00001 | 0.125 | 00001 |
| 3.4375 | 0.0643 | 0.125 | 00001 | 0.125 | 00001 |
| 3.5625 | 0.0568 | 0.000 | 00000 | 0.000 | 00000 |
| 3.6875 | 0.0501 | 0.000 | 00000 | 0.000 | 00000 |
| 3.8125 | 0.0442 | 0.000 | 00000 | 0.000 | 00000 |
| 3.9375 | 0.0402 | 0.000 | 00000 | 0.000 | 00000 |
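The modified (Mod) values in the table above take only 16 distinct steps (8 + 4 + 4), so a 4-bit index is enough to address them. The Python sketch below stores those 16 values and maps a 5-bit uniform magnitude code to a 4-bit non-uniform index. The mapping in the hypothetical `compress_index` (one-to-one below 1.0, pairs of codes from 1.0 to 2.0, groups of four from 2.0 to 4.0) is our assumption about how the steps are assigned, not the project's documented address logic.

```python
# |Psi| values of the modified (non-uniform) LUT, copied from the Mod column:
# one value per non-uniform step of the (6:3) scheme.
MOD_LUT = [
    3.5, 2.375, 1.875, 1.5, 1.25, 1.125, 1.0, 0.875,  # 0.0 <= x < 1.0 (8 steps of 0.125)
    0.625, 0.5, 0.375, 0.25,                          # 1.0 <= x < 2.0 (4 steps of 0.25)
    0.25, 0.125, 0.125, 0.0,                          # 2.0 <= x < 4.0 (4 steps of 0.5)
]


def compress_index(k):
    """Map a 5-bit uniform magnitude code k (x in [k/8, (k+1)/8)) to a 4-bit index.

    Hypothetical address mapping: codes 0-7 map one-to-one, codes 8-15 in pairs,
    codes 16-31 in groups of four.
    """
    if k < 8:
        return k
    if k < 16:
        return 8 + (k - 8) // 2
    return 12 + (k - 16) // 4


def clut_lookup(k):
    """Return the 5-bit output code (value * 8) for the 5-bit input code k."""
    return format(int(MOD_LUT[compress_index(k)] * 8), "05b")


print(len(MOD_LUT), "entries instead of 32")             # 16 entries instead of 32
print(clut_lookup(0), clut_lookup(8), clut_lookup(31))   # 11100 00101 00000
```

Halving the number of entries from 2^(q-1) to 2^(q-2) is where the LUT saving comes from; the small changes between the Quan_val and Mod columns (at x = 1.0625, 1.8125, and 2.4375) are exactly what make each non-uniform step map to a single output.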

[Figure legend: Input, LUT #1, LUT #3, Original]

[Figure: simulation results vs. Eb/No (dB) from 1.0 to 2.8 dB for the (3,6) regular rate-1/2 LDPC code, block size N = 1728, max iterations = 20. Curves: SPA with no quantization; SPA with uniform (7:4), (6:3), and (5:2) quantization; SPA with variable (6:3) quantization]

Simulation results

Regular (1728, 864) ½ rate LDPC code

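SPA hardware architecture for LDPC decoder

Check Node Update (6 degree) Variable Node Update (3 degree)

[Figure: block diagrams of the CNU (degree 6) and VNU (degree 3), with messages α and β (e.g. α_c6, β_c6), bus widths of q, (q-1), and (q-2) bits, and the sign sgn(β_v)]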

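| 5-bit input | Mod Quan_val | LUT | 4-bit RLUT input | RLUT output |
|---|---|---|---|---|
| 00000 | 3.500 | 11100 | 0000 | 11100 |
| 00001 | 2.375 | 10011 | 0001 | 10011 |
| 00010 | 1.875 | 01111 | 0010 | 01111 |
| 00011 | 1.500 | 01100 | 0011 | 01100 |
| 00100 | 1.250 | 01010 | 0100 | 01010 |
| 00101 | 1.125 | 01001 | 0101 | 01001 |
| 00110 | 1.000 | 01000 | 0110 | 01000 |
| 00111 | 0.875 | 00111 | 0111 | 00111 |
| 01000 | 0.625 | 00101 | | |
| 01001 | 0.625 | 00101 | | |
| 01010 | 0.500 | 00100 | | |
| 01011 | 0.500 | 00100 | | |
| 01100 | 0.375 | 00011 | | |
| 01101 | 0.375 | 00011 | | |
| 01110 | 0.250 | 00010 | | |
| 01111 | 0.250 | 00010 | | |
| 10000 | 0.250 | 00010 | | |
| 10001 | 0.250 | 00010 | | |
| 10010 | 0.250 | 00010 | | |
| 10011 | 0.250 | 00010 | | |
| 10100 | 0.125 | 00001 | | |
| 10101 | 0.125 | 00001 | | |
| 10110 | 0.125 | 00001 | | |
| 10111 | 0.125 | 00001 | | |
| 11000 | 0.125 | 00001 | | |
| 11001 | 0.125 | 00001 | | |
| 11010 | 0.125 | 00001 | | |
| 11011 | 0.125 | 00001 | | |
| 11100 | 0.000 | 00000 | | |
| 11101 | 0.000 | 00000 | | |
| 11110 | 0.000 | 00000 | | |
| 11111 | 0.000 | 00000 | | |

[Figure: LUT #3 logic mapping the 5-bit input (bits v) to the 4-bit input (bits b3 b2 b1 b0)]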

Overall structure

[Figure: overall decoder structure with DVPU units (labeled DVPU#1) and submatrix memories labeled M(i,j)#s, where i is the block row, j the block column, and s a number attached to each block. Memory labels shown in the figure:]

M(12,1)#10, M(12,5)#21, M(12,7)#18, M(11,1)#5, M(12,8)#8, M(10,9)#15, M(12,24)#0, M(11,11)#11, M(12,12)#1, M(11,23)#0, M(11,24)#0, M(10,10)#17, M(10,23)#0, M(9,1)#33, M(9,7)#16, M(8,1)#2, M(8,7)#27, M(7,10)#35, M(8,10)#25, M(8,11)#18, M(10,22)#0, M(7,12)#27, M(9,12)#30, M(9,22)#0, M(9,21)#0, M(5,1)#5, M(6,10)#8, M(6,13)#0, M(8,20)#0, M(5,10)#2, M(7,19)#0, M(7,20)#0, M(8,21)#0, M(4,1)#43, M(4,8)#0, M(4,10)#41, M(4,11)#0, M(6,19)#0, M(3,10)#38, M(3,12)#0, M(2,1)#29, M(1,1)#0, M(1,10)#0, M(1,13)#1, M(2,3)#0, M(1,2)#0, M(6,2)#46, M(5,3)#1, M(9,2)#35, M(8,3)#44, M(9,4)#29, M(10,4)#4, M(11,4)#19, M(7,4)#9, M(3,4)#21, M(4,4)#30, M(2,4)#26, M(1,4)#0, M(1,6)#0, M(2,7)#0, M(2,9)#2, M(3,5)#0, M(3,7)#17, M(5,6)#20, M(5,7)#35, M(6,7)#22, M(6,9)#40, M(10,5)#4, M(11,6)#14, M(7,8)#13, M(7,7)#18, M(6,18)#0, M(5,18)#0, M(5,17)#0, M(4,17)#0, M(4,16)#0, M(3,15)#0, M(2,14)#0, M(2,15)#0, M(3,16)#0, M(1,14)#0
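The block labels are consistent with a quasi-cyclic parity-check matrix: 12 block rows and 24 block columns of 72 × 72 blocks give exactly 864 × 1728. Assuming (our interpretation, not stated on the slide) that M(i,j)#s stands for the 72 × 72 identity matrix cyclically shifted by s, the positions of the ones in H could be expanded as in this Python sketch; the shift direction and the function name `expand_qc` are hypothetical.

```python
def expand_qc(blocks, z):
    """Expand quasi-cyclic block labels into the (row, col) positions of ones in H.

    blocks: dict {(i, j): s} with 1-indexed block row i and block column j, as in
            the labels M(i,j)#s; s is assumed to be a cyclic shift in [0, z).
    z:      block (circulant) size.
    Each block is taken to be the z x z identity shifted right by s, so row r of
    the block has its one in column (r + s) mod z of that block.
    """
    ones = []
    for (i, j), s in sorted(blocks.items()):
        for r in range(z):
            ones.append(((i - 1) * z + r, (j - 1) * z + (r + s) % z))
    return ones


# A few labels from the figure: M(12,1)#10, M(11,1)#5, M(1,1)#0.
blocks = {(12, 1): 10, (11, 1): 5, (1, 1): 0}
ones = expand_qc(blocks, z=72)
print(len(ones))         # 216 ones: 72 per block
print(ones[0], ones[1])  # (0, 0) (1, 1): M(1,1)#0 is the identity
```

Under this reading, a decoder only needs to store one shift value per block, and address generation reduces to a modulo-72 counter offset by s, which fits the memory-per-block organization in the figure.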

Memory Bank

[Figure: test data flow: 864 binary information bits, 1728 binary coded bits, and 1728 corrupted values at Eb/No = 2.0 dB, decoded back to 864 binary bits]
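The test vectors come from BPSK transmission over an AWGN channel. Below is a Python sketch of that step, assuming unit-energy BPSK (bit 0 sent as +1, bit 1 as -1) and code rate R = 1/2, so that the noise variance is σ² = 1/(2·R·Eb/No); it also forms the channel LLRs 2r/σ² commonly used to initialize SPA decoding. The function name `awgn_channel` is hypothetical, and the project's own MATLAB scripts are not reproduced here.

```python
import math
import random


def awgn_channel(codeword, ebno_db, rate=0.5, seed=None):
    """BPSK-modulate a binary codeword and add white Gaussian noise.

    Assumes unit symbol energy, so sigma^2 = 1 / (2 * rate * Eb/No).
    Returns (received samples r, noise variance sigma2, channel LLRs 2r/sigma2).
    """
    rng = random.Random(seed)
    ebno = 10.0 ** (ebno_db / 10.0)  # dB -> linear
    sigma2 = 1.0 / (2.0 * rate * ebno)
    sigma = math.sqrt(sigma2)
    r = [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in codeword]  # 0 -> +1, 1 -> -1
    llr = [2.0 * x / sigma2 for x in r]
    return r, sigma2, llr


# The all-zero word is a codeword of every linear code, a convenient first test.
r, sigma2, llr = awgn_channel([0] * 1728, ebno_db=2.0, seed=1)
print(round(sigma2, 4))  # 0.631
```

Fixing the seed makes the corrupted block reproducible, so the same samples can be fed to both the reference model and the HDL testbench.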

Algorithm verification in MATLAB

- generating the encoder output

- adding AWGN to the encoded data fed to the LDPC decoder

- checking the error-correcting capability of the LDPC decoder

HDL design

- describing the design in structural style

- bottom-up design approach

Functional simulation

- making a testbench file for checking the error-correcting capability of the LDPC decoder

- using the VCS Verilog simulator

- compiling our Verilog source code

- running the simulation

- viewing the generated waveform

Verilog HDL Design

[Figure: design flow]

1. LDPC H matrix construction: H = (1728, 864) for 802.11n

2. Submodule Verilog HDL design: VNUs, CNUs, memory block, address generator, etc.

3. Partial functional verification using a simple testbench

4. Partial Synopsys synthesis (1st optimization)

5. Overall Verilog HDL design: integration of submodules, clock, FSM

6. Overall functional verification using test data made by MATLAB

7. Overall Synopsys synthesis: optimization for area, timing, and power

8. Design complete & report

- using script commands to synthesize and optimize our design

- bottom-up optimization using the dont_touch attribute

- analyzing area and timing

- if the results do not meet the design constraints, going back to the RTL description level

Verilog HDL Design

Synthesis Results for VPU3_Block

| Optimization method | Area | Data required time - data arrival time |
|---|---|---|
| Area | 1074.00 | 35 - 25.79 = 9.21 |
| Timing-driven structuring | 1043.25 | 35 - 26.55 = 8.45 |
| Timing-driven structuring with flattening | 1043.25 | 35 - 26.42 = 8.58 |
| Timing-driven structuring with flattening and Boolean optimization | 1033.50 | 35 - 26.42 = 8.58 |
| Timing-driven structuring with high mapping effort and flattening | 1017.75 | 35 - 26.38 = 8.62 |
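The last column is the timing slack (required minus arrival time; a positive value means the constraint is met). This short Python sketch recomputes the slack and the area saved relative to the area-optimized first row; the method names are abbreviated, and the area unit is whatever the synthesis library reports.

```python
# Rows of the VPU3_Block synthesis table: (method, area, required time, arrival time).
RESULTS = [
    ("Area", 1074.00, 35.0, 25.79),
    ("Timing-driven structuring", 1043.25, 35.0, 26.55),
    ("... with flattening", 1043.25, 35.0, 26.42),
    ("... with flattening and Boolean optimization", 1033.50, 35.0, 26.42),
    ("... with high mapping effort and flattening", 1017.75, 35.0, 26.38),
]


def summarize(results):
    """Return (method, slack, area saving in % relative to the first row)."""
    base_area = results[0][1]
    rows = []
    for name, area, required, arrival in results:
        slack = required - arrival                       # positive slack: timing is met
        saving = 100.0 * (base_area - area) / base_area  # area saved vs. the first row
        rows.append((name, round(slack, 2), round(saving, 2)))
    return rows


for row in summarize(RESULTS):
    print(row)
```

The last option saves about 5.2% area while keeping a positive slack of 8.62, so every listed option still meets the 35-unit timing constraint.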

Computer simulation in MATLAB

- Completed the MATLAB program for LDPC decoding.

- Confirmed the decoding performance of our proposed algorithm.

- Extracted the encoded output data and the AWGN-corrupted data.

HDL design in Verilog

- Completed the HDL description of the submodules.

- Still checking the functional simulation using a simple testbench.

Current Status

HDL design in Verilog

- Complete the HDL description by integrating the submodules.

- Compare & check the full functional simulation using the data extracted from MATLAB.

- Perform synthesis and optimization.

- Write a final report.

Future work

In this project:

- Understand LDPC codes.

- Propose a low-complexity architecture for partially parallel decoding of LDPC codes.

- Use Verilog HDL for the hardware implementation.

- Study synthesis & optimization techniques using several tools.

- Learn the overall design procedure with VHDL or Verilog descriptions.

Conclusion

Thank you
