Post on 31-Mar-2020
Low Complexity Design for Decoding LDPC Codes
EE5327 VLSI Design Lab. Project
Apr/18/2006
Contents
Introduction of LDPC codes
Iterative message passing algorithm
- sum-product algorithm (SPA)
Proposed architecture for decoding LDPC codes
Design & Verification procedure
Synthesis & Optimization
Current status
Future works
Conclusion
Low-Density Parity-Check (LDPC) Codes
More than 40 years of research (1948-1994) centered around
Weights of errors that a code is guaranteed to correct
“Bounded distance decoding” cannot achieve Shannon limit
Trade-off minimum distance for efficient decoding
Low-Density Parity-Check (LDPC) Codes
Gallager 1963, Tanner 1984, MacKay 1996
1. Linear block codes with sparse (small fraction of ones) parity-check matrix
2. Have natural representation in terms of bipartite graphs
3. Simple and efficient iterative decoding in the form of belief propagation (Pearl, 1980-1990)
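Guaranteed correctable error weight: $\le \lfloor (d_{\min} - 1)/2 \rfloor$, where $d_{\min}$ is the minimum distance of the code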
Representations of LDPC Codes
[Figure: parity-check matrix H (M rows) and the corresponding bipartite graph with N bit (variable) nodes and check nodes]
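As a rough illustration of the two representations, the sketch below builds the bipartite-graph neighbour lists from a small parity-check matrix and evaluates the parity checks. The 3 x 6 matrix `H` is a made-up toy example, not the matrix from the slide, and the function names are assumptions chosen for this sketch.

```python
# Toy parity-check matrix (hypothetical; NOT the H shown on the slide).
# Rows are check nodes, columns are bit (variable) nodes.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def tanner_graph(H):
    """Return neighbour lists: bits attached to each check, checks attached to each bit."""
    n_checks, n_bits = len(H), len(H[0])
    check_nbrs = [[v for v in range(n_bits) if H[c][v]] for c in range(n_checks)]
    bit_nbrs = [[c for c in range(n_checks) if H[c][v]] for v in range(n_bits)]
    return check_nbrs, bit_nbrs

def syndrome(H, word):
    """Evaluate every parity check modulo 2; all zeros means 'word' is a codeword."""
    return [sum(h * x for h, x in zip(row, word)) % 2 for row in H]

check_nbrs, bit_nbrs = tanner_graph(H)
print(check_nbrs)                       # bits taking part in each check
print(syndrome(H, [1, 0, 0, 1, 0, 1]))  # a valid codeword of the toy code
print(syndrome(H, [1, 1, 0, 1, 0, 1]))  # same word with bit 1 flipped
```

Each 1 in `H` corresponds to one edge of the graph, which is why a sparse matrix gives a sparse graph.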
Iterative message passing algorithm - Sum-product algorithm (SPA)
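Check Node Update:
$$R_{cv} = -S_{cv} \cdot \Psi\Big\{ \sum_{n \in N(c) \setminus v} \Psi(L_{cn}) \Big\}$$
$$S_{cv} = \prod_{n \in N(c) \setminus v} \mathrm{sign}(L_{cn})$$
$$\Psi(x) = \log(\tanh(|x/2|))$$

Variable Node Update:
$$L_{cv} = \sum_{n \in M(v) \setminus c} R_{nv} - z_v$$
$$z_v = \frac{2}{\sigma^2} \cdot r_v$$
$r_v$: received data
$\sigma^2$: estimated noise variance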
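The two updates can be sketched in floating-point Python as follows. This is only an illustrative model, not the hardware design: the function names are assumptions, the sign convention (subtracting z_v) simply follows the slide, and the small clamp inside `psi` is added here to avoid taking the logarithm of zero.

```python
import math

def psi(x):
    """Psi(x) = log(tanh(|x| / 2)); the result is always <= 0."""
    ax = max(abs(x), 1e-12)  # clamp (an addition for this sketch) to avoid log(0)
    return math.log(math.tanh(ax / 2.0))

def check_node_update(L_in):
    """L_in holds L_cn for every bit node n in N(c); returns R_cv for each v."""
    out = []
    for v in range(len(L_in)):
        others = L_in[:v] + L_in[v + 1:]          # N(c) \ v
        sign = 1.0
        for L in others:                           # S_cv: product of signs
            if L < 0:
                sign = -sign
        total = sum(psi(L) for L in others)        # sum of Psi(L_cn)
        out.append(-sign * psi(total))             # R_cv = -S_cv * Psi{...}
    return out

def variable_node_update(R_in, z_v):
    """R_in holds R_nv for every check n in M(v); returns L_cv for each c."""
    total = sum(R_in)
    return [total - R - z_v for R in R_in]         # sum over M(v) \ c, minus z_v

R = check_node_update([1.0, -2.0, 3.0])
L = variable_node_update([0.5, -1.0, 2.0], 0.25)
```

Because Psi is negative, the inner sum is negative and the outer Psi is negative again, so the leading minus sign leaves a magnitude equal to 2*atanh of the product of tanh(|L|/2) terms, which is the usual sum-product check-node rule.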
SPA hardware architecture I (Degree of Variable Node: 3, Degree of Check Node: 6)
[Figure: Check Node Update (CNU) block with message ports β_c1 to β_c6 and α_c1 to α_c6 on (q-1)-bit datapaths; Variable Node Update (VNU) block with ports β_c1 to β_c3, α_c1 to α_c3, z_v, and sgn(β_v) on q-bit and (q-1)-bit datapaths]
Look Up Table (LUT) Implementation
[Figure: curve of the PSI function; x-axis: input x (0 to 4); y-axis: output |y| of the PSI function (0 to 6)]
Finite precision analysis
Quantization scheme of passing message (q:f)
• q : total number of bits used for quantization
• f : number of bits used for the fractional part
• e.g. the (6:3) range is -4.0 ~ 3.875 (step size: 0.125)
Reasonable distribution of data (BPSK, AWGN channel)
• Input range = -4.0 ~ 4.0
• Output range = -4.0 ~ 4.0
Best choice between hardware complexity and performance: (6:3) quantization scheme
• Sign: 1 bit, Integer: 2 bits, Fraction: 3 bits
• LUT size : 5 x 2^(6-1) bits (except sign bit)
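A minimal sketch of a (q:f) quantizer is given below. It assumes two's-complement codes, round-to-nearest, and saturation at the range limits; the slides state only the bit split and the resulting range, so these choices and the function names are assumptions.

```python
import math

def fixed_point_range(q, f):
    """Return (min, max, step) of a signed (q:f) format, assuming two's complement."""
    step = 2.0 ** -f
    return (-(2 ** (q - 1)) * step, (2 ** (q - 1) - 1) * step, step)

def quantize(x, q=6, f=3):
    """Round x to the nearest (q:f) value and saturate at the range limits."""
    step = 2.0 ** -f
    code = int(math.floor(x / step + 0.5))                   # round to nearest code
    code = max(-(2 ** (q - 1)), min(2 ** (q - 1) - 1, code))  # saturate
    return code * step

def lut_size_bits(q):
    """LUT size without the sign bit: (q-1)-bit words, 2^(q-1) entries."""
    return (q - 1) * 2 ** (q - 1)

print(fixed_point_range(6, 3))   # (-4.0, 3.875, 0.125)
print(quantize(3.4661))          # 3.5
print(lut_size_bits(6))          # 160
```

For (6:3) this reproduces the range -4.0 ~ 3.875 with step 0.125 and the 5 x 2^(6-1) = 160-bit LUT size quoted on the slide.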
Non-uniform quantization scheme using compressed LUT (CLUT)
Almost no performance degradation
Reduced memory word lengths: q -> (q-1)bits
Reduced LUT size: 2^(q-1)*q -> 2^(q-2)*q bits
[Figure: original curve vs quantized curve of the PSI function with 6:3 bit quantization (including sign bit); x-axis: input x (0 to 4); y-axis: output |y| of the PSI function]
Non-uniform steps
- 2^(q-3) steps for 0.0 < x < 1.0
- 2^(q-4) steps for 1.0 < x < 2.0
- 2^(q-4) steps for 2.0 < x < 4.0
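The step counts above can be turned into an index mapping as in the following sketch. The function name and the exact boundary handling (half-open intervals, saturation at the top) are assumptions made for illustration; the slides give only the number of steps per region.

```python
def nonuniform_index(x, q=6):
    """Map a magnitude 0 <= x < 4 to a compressed LUT index in 0 .. 2^(q-2) - 1."""
    n1 = 2 ** (q - 3)   # steps covering [0, 1)
    n2 = 2 ** (q - 4)   # steps covering [1, 2)
    n3 = 2 ** (q - 4)   # steps covering [2, 4)
    if x < 1.0:
        return int(x * n1)
    if x < 2.0:
        return n1 + int((x - 1.0) * n2)
    return n1 + n2 + min(int((x - 2.0) / 2.0 * n3), n3 - 1)

# Midpoints of the 32 uniform (6:3) magnitude steps collapse onto 16 indices.
indices = [nonuniform_index((i + 0.5) * 0.125) for i in range(32)]
print(indices)
print(len(set(indices)))   # 16
```

With q = 6 the regions hold 8, 4, and 4 steps, so the 32 uniform magnitude levels share 2^(q-2) = 16 table entries.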
[Figure: original curve vs quantized curve of the PSI function with 6:3 bit quantization (including sign bit); legend: original value, quantized values, modified quantized values; x-axis: input x (0 to 4); y-axis: output |y| of the PSI function]
Proposed non-uniform quantization
e.g. (6:3): 8 steps, 4 steps, 4 steps
Proposed Non-uniform (6:3) Quantization LUT (except sign bit)

          Original   LUT #1               LUT #3
Input     Value      Quan_val   Output    Quan_val   Mod LUT
0.0625    3.4661     3.500      11100     3.500      11100
0.1875    2.3700     2.375      10011     2.375      10011
0.3125    1.8644     1.875      01111     1.875      01111
0.4375    1.5356     1.500      01100     1.500      01100
0.5625    1.2944     1.250      01010     1.250      01010
0.6875    1.1062     1.125      01001     1.125      01001
0.8125    0.9538     1.000      01000     1.000      01000
0.9375    0.8274     0.875      00111     0.875      00111
1.0625    0.7209     0.750      00110     0.625      00101
1.1875    0.6300     0.625      00101     0.625      00101
1.3125    0.5519     0.500      00100     0.500      00100
1.4375    0.4843     0.500      00100     0.500      00100
1.5625    0.4255     0.375      00011     0.375      00011
1.6875    0.3743     0.375      00011     0.375      00011
1.8125    0.3294     0.375      00011     0.250      00010
1.9375    0.2901     0.250      00010     0.250      00010
2.0625    0.2557     0.250      00010     0.250      00010
2.1875    0.2253     0.250      00010     0.250      00010
2.3125    0.1987     0.250      00010     0.250      00010
2.4375    0.1752     0.125      00001     0.250      00010
2.5625    0.1545     0.125      00001     0.125      00001
2.6875    0.1363     0.125      00001     0.125      00001
2.8125    0.1203     0.125      00001     0.125      00001
2.9375    0.1061     0.125      00001     0.125      00001
3.0625    0.0936     0.125      00001     0.125      00001
3.1875    0.0826     0.125      00001     0.125      00001
3.3125    0.0729     0.125      00001     0.125      00001
3.4375    0.0643     0.125      00001     0.125      00001
3.5625    0.0568     0.000      00000     0.000      00000
3.6875    0.0501     0.000      00000     0.000      00000
3.8125    0.0442     0.000      00000     0.000      00000
3.9375    0.0402     0.000      00000     0.000      00000
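The table entries can be regenerated with a short script, which may help when checking the LUT contents. LUT #1 follows from rounding -log(tanh(x/2)) at each step midpoint to the nearest multiple of 0.125. For LUT #3, the slides do not say how the modified values were chosen; the sketch assumes each merged group (pairs for 1 < x < 2, groups of four for 2 < x < 4) is quantized at the midpoint of the group, a guess that happens to reproduce the tabulated values. All names are illustrative.

```python
import math

STEP = 0.125  # step size of the (6:3) format

def psi_mag(x):
    """Magnitude of the PSI function, -log(tanh(x / 2)), for x > 0."""
    return -math.log(math.tanh(x / 2.0))

def round_to_step(y):
    """Round y to the nearest multiple of STEP."""
    return math.floor(y / STEP + 0.5) * STEP

def group_bounds(i):
    """Uniform indices [a, b) sharing one LUT #3 entry: singles, pairs, then quads."""
    if i < 8:
        return i, i + 1
    if i < 16:
        a = 8 + 2 * ((i - 8) // 2)
        return a, a + 2
    a = 16 + 4 * ((i - 16) // 4)
    return a, a + 4

inputs = [(i + 0.5) * STEP for i in range(32)]          # 0.0625 ... 3.9375
lut1 = [round_to_step(psi_mag(x)) for x in inputs]      # uniform quantization
lut3 = [round_to_step(psi_mag(sum(group_bounds(i)) / 2.0 * STEP))  # ASSUMED midpoint rule
        for i in range(32)]

for x, a, b in zip(inputs, lut1, lut3):
    print(f"{x:.4f}  {a:.3f} {int(a / STEP):05b}  {b:.3f} {int(b / STEP):05b}")
```

The printed 5-bit codes are simply the quantized value divided by the step size, for example 3.500 / 0.125 = 28 = 11100.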
Simulation results
Regular (1728, 864) ½ rate LDPC code
[Figure: BER/FER (10^-7 to 10^0) versus Eb/No (1.0 to 2.8 dB) for the (3,6) regular 1/2 rate LDPC code, block size N=1728, max iteration = 20. Curves: SPA - No Quantization; SPA - Uniform (7:4) Quantization; SPA - Uniform (6:3) Quantization; SPA - Uniform (5:2) Quantization; SPA - Variable (6:3) Quantization]
SPA hardware architecture for LDPC decoder
[Figure: Check Node Update (degree 6) with message ports β_c1 to β_c6 and α_c1 to α_c6 on (q-1)-bit and (q-2)-bit datapaths, including an RLUT block; Variable Node Update (degree 3) with ports β_c1 to β_c3, α_c1 to α_c3, z_v, and sgn(β_v) on q-bit, (q-1)-bit, and (q-2)-bit datapaths]
LUT #3 (5-bit input)

Input    Quan_val   Mod LUT
00000    3.500      11100
00001    2.375      10011
00010    1.875      01111
00011    1.500      01100
00100    1.250      01010
00101    1.125      01001
00110    1.000      01000
00111    0.875      00111
01000    0.625      00101
01001    0.625      00101
01010    0.500      00100
01011    0.500      00100
01100    0.375      00011
01101    0.375      00011
01110    0.250      00010
01111    0.250      00010
10000    0.250      00010
10001    0.250      00010
10010    0.250      00010
10011    0.250      00010
10100    0.125      00001
10101    0.125      00001
10110    0.125      00001
10111    0.125      00001
11000    0.125      00001
11001    0.125      00001
11010    0.125      00001
11011    0.125      00001
11100    0.000      00000
11101    0.000      00000
11110    0.000      00000
11111    0.000      00000

RLUT (4-bit input)

Input    Output
0000     11100
0001     10011
0010     01111
0011     01100
0100     01010
0101     01001
0110     01000
0111     00111
1000     00101
1001     00100
1010     00011
1011     00010
1100     00010
1101     00001
1110     00001
1111     00000

[Figure: COMP logic mapping the 5-bit input (v4 v3 v2 v1 v0) to the 4-bit RLUT input (b3 b2 b1 b0)]
Overall structure
[Figure: overall decoder structure. A Memory Bank of blocks labeled M(i,j)#k (for example M(12,1)#10, M(12,5)#21, M(1,1)#0), connected to DHPU, HPU8, DVPU, and DVPU#1 processing units, with additional blocks marked Spare]
Design & Verification procedure
[Figure: test data flow: binary 864 bits, binary 1728 bits, corrupted 1728 data (Eb/No = 2.0 dB), binary 864 bits]
Algorithm verification in MATLAB
- generating the encoder output
- adding AWGN to the LDPC decoder input
- checking the error-correcting capability of the LDPC decoder
HDL design
- describing in structural style
- bottom-to-top design approach
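The channel step of the verification flow can be sketched as below. This is a Python stand-in, not the authors' MATLAB program. It assumes unit-energy BPSK with the mapping 0 -> +1, 1 -> -1, the usual relation sigma^2 = 1 / (2 * R * Eb/No), and an all-zero codeword (which is a codeword of any linear code) in place of real encoder output. The function names are illustrative.

```python
import math
import random

def noise_variance(ebno_db, rate):
    """Noise variance per real dimension for unit-energy BPSK at the given Eb/No."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return 1.0 / (2.0 * rate * ebno)

def awgn_channel(bits, ebno_db, rate, rng):
    """BPSK-modulate (0 -> +1, 1 -> -1, an assumed mapping) and add Gaussian noise."""
    sigma2 = noise_variance(ebno_db, rate)
    sigma = math.sqrt(sigma2)
    received = [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in bits]
    return received, sigma2

rng = random.Random(0)                       # fixed seed for repeatable test data
codeword = [0] * 1728                        # all-zero codeword as a stand-in
r, sigma2 = awgn_channel(codeword, 2.0, 0.5, rng)
z = [2.0 * rv / sigma2 for rv in r]          # decoder input z_v = 2 * r_v / sigma^2
raw_errors = sum(1 for rv in r if rv < 0)    # hard-decision errors before decoding
print(sigma2, raw_errors)
```

The list `z` is what a decoder model would consume, and `raw_errors` gives a baseline against which the error-correcting capability can be judged.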
Design & Verification procedure
Functional Simulation
- making a test-bench file for checking the error-correcting capability of the LDPC decoder
- using the VCS Verilog simulator
- compiling our Verilog source code
- running the simulation
- viewing the generated waveform
Design & Verification procedure
[Flow chart: Verilog HDL design flow]
1. LDPC H matrix construction: H=(1728, 864) for 802.11n WLAN
2. Submodule Verilog HDL design: VNUs, CNUs, Memory Block, Address Gen, etc.
3. Partial functional verification: using a simple testbench
4. Partial Synopsys synthesis: 1st optimization
5. Overall Verilog HDL design: integration of submodules, clock, FSM, etc.
6. Overall functional verification: using test data made by Matlab
7. Overall Synopsys synthesis: optimization for area, timing, and power
8. Design complete & report
Synthesis & Optimization
- using script commands to synthesize and optimize our design
- bottom-to-top optimization using the don't touch attribute
- analyzing area and timing
- if the results do not meet the design constraints, go back to the RTL description level (Verilog HDL design)
Synthesis & Optimization
Synthesis Results for VPU3_Block

Optimization Method                                                            Area          Data required time - data arrival time
Area                                                                           1074.000000   35 - 25.79 = 9.21
Timing Driven Structuring                                                      1043.250000   35 - 26.55 = 8.45
Timing Driven Structuring With Flattening                                      1043.250000   35 - 26.42 = 8.58
Timing Driven Structuring With Flattening and Boolean Optimization             1033.500000   35 - 26.42 = 8.58
Timing Driven Structuring With High Mapping Effort and Flattening              1017.750000   35 - 26.38 = 8.62
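The last column is the timing slack (data required time minus data arrival time). The short sketch below recomputes it from the reported figures and picks the smallest-area result; the tuple layout and variable names are illustrative only, and the numbers are copied from the table as reported.

```python
# (method, area, data required time, data arrival time), values as reported above
results = [
    ("Area", 1074.00, 35.0, 25.79),
    ("Timing Driven Structuring", 1043.25, 35.0, 26.55),
    ("Timing Driven Structuring With Flattening", 1043.25, 35.0, 26.42),
    ("Timing Driven Structuring With Flattening and Boolean Optimization",
     1033.50, 35.0, 26.42),
    ("Timing Driven Structuring With High Mapping Effort and Flattening",
     1017.75, 35.0, 26.38),
]

# Slack = required - arrival; a positive slack means the timing constraint is met.
slacks = [round(required - arrival, 2) for _, _, required, arrival in results]
smallest = min(results, key=lambda row: row[1])

for (name, area, _, _), slack in zip(results, slacks):
    print(f"{area:10.2f}  slack {slack:5.2f}  {name}")
print("smallest area:", smallest[0])
```

All five runs leave positive slack, so in this table the choice between them comes down to area.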
Current Status
Computer simulation by Matlab
- Completed the Matlab program for LDPC decoding.
- Confirmed the decoding performance of our proposed algorithm.
- Extracted the encoded output data and the data corrupted by AWGN.
HDL design by Verilog
- Completed the HDL description of the submodules.
- Still checking the functional simulation using a simple testbench.
Future works
HDL design by Verilog
- Complete the HDL description by integrating the submodules.
- Compare & check the full functional simulation using the data extracted from Matlab.
- Perform synthesis and optimization.
- Make a final report.
Conclusion
In this project:
- Understand LDPC codes.
- Propose a low-complexity architecture for partially parallel decoding of LDPC codes.
- Use Verilog HDL for hardware implementation.
- Study synthesis & optimization techniques using several tools.
- Learn the overall design procedure with VHDL or Verilog descriptions.
Thank you