Design and Implementation of Turbo Decoder for 4G standards IEEE 802.16e and LTE
Syed Z. Gilani
Motivation
• Conventional serial decoding architectures can be a performance bottleneck
– 6144-bit block, 8 iterations @ 250 MHz, 1 bit processed per cycle => data rate < 6144 / (6144 * 8 * 4 ns)
– ~31 Mbps
• Data rates for LTE can be 100 Mbps to 300 Mbps
• A parallel architecture is necessary to support high-throughput decoding
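The serial bound above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name `serial_throughput_bps` and its parameters are illustrative and not taken from the slides, and it assumes, as the slide does, that each iteration costs one cycle per bit.

```python
def serial_throughput_bps(block_bits, iterations, clock_hz, bits_per_cycle=1):
    """Upper bound on decoded bits per second for a serial decoder.

    Assumes every iteration takes block_bits / bits_per_cycle cycles,
    which is the simplification used on the slide.
    """
    cycles_per_block = block_bits * iterations / bits_per_cycle
    seconds_per_block = cycles_per_block / clock_hz
    return block_bits / seconds_per_block


# Slide values: 6144-bit block, 8 iterations, 250 MHz (4 ns cycle).
rate = serial_throughput_bps(6144, 8, 250e6)
print(f"{rate / 1e6:.2f} Mbps")
```

Note that the block length cancels out: under this model the bound is simply the clock rate times bits per cycle, divided by the iteration count.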
• Maximum a posteriori (MAP) algorithm
– Alpha
– Beta
– Gamma
– LLR
• (De)Interleaver
– LTE: P(i) = (f1*i + f2*i^2) mod N
– WiMAX: switch (i mod 4)
case 0: P(i) = (P0*i + 1) mod N
case 1: P(i) = (P0*i + 1 + N/2 + P1) mod N
case 2: P(i) = (P0*i + 1 + P2) mod N
case 3: P(i) = (P0*i + 1 + N/2 + P3) mod N
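The two interleaving functions above can be written down directly. The sketch below is only a reference model using multiplication and modulo; the function names are hypothetical, and the parameter sets (N = 40 with f1 = 3, f2 = 10 for LTE; N = 24 with P0 = 5, P1 = P2 = P3 = 0 for WiMAX) are illustrative assumptions chosen because they yield valid permutations, not values quoted from the slides.

```python
def lte_interleave(i, f1, f2, n):
    """Quadratic interleaver: P(i) = (f1*i + f2*i^2) mod N."""
    return (f1 * i + f2 * i * i) % n


def wimax_interleave(i, p0, p1, p2, p3, n):
    """Four-case interleaver selected by i mod 4, as on the slide."""
    offsets = (0, n // 2 + p1, p2, n // 2 + p3)
    return (p0 * i + 1 + offsets[i % 4]) % n


# Illustrative parameters (assumptions, see the text above).
lte_perm = [lte_interleave(i, 3, 10, 40) for i in range(40)]
wimax_perm = [wimax_interleave(i, 5, 0, 0, 0, 24) for i in range(24)]

# A usable interleaver must visit every address exactly once.
print(sorted(lte_perm) == list(range(40)))
print(sorted(wimax_perm) == list(range(24)))
```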
Turbo Decoder Overview
Optimizations
• Resource sharing
• Retiming
• Look-ahead transformation
• Variable and adaptive parallelism
• Multiplierless interleaver
Parallelization
[Figure: states vs. time (cycles); PE 1, PE 2, PE 3, PE 4]
Variable Parallelization
[Figure: parallel interleaver with memory banks Bank 0 and Bank 1; labels: Coded Bits, Decoded Bits]
Variable Parallelization
[Figure: parallel interleaver with memory banks Bank 0, Bank 1, Bank 2 and Bank 3; labels: Coded Bits, Decoded Bits]
Interleaver Optimization
• Interleaving functions
– LTE: P(i) = (f1*i + f2*i^2) mod N
– WiMAX: switch (i mod 4)
case 0: P(i) = (P0*i + 1) mod N
case 1: P(i) = (P0*i + 1 + N/2 + P1) mod N
case 2: P(i) = (P0*i + 1 + P2) mod N
case 3: P(i) = (P0*i + 1 + N/2 + P3) mod N
• Unoptimized memory requirements
– Don’t want to use multipliers and dividers
– Storing all memory addresses in RAM
– LTE alone supports 40 different block lengths with different interleaving parameters
– Block lengths vary from 40 bits to 6144 bits
Interleaver Optimization
• On-the-fly address generation
• LTE Interleaving Function
P(i) = (f1*i + f2*i^2) mod N
P(i+1) = (f1*(i+1) + f2*(i+1)^2) mod N = (P(i) + f1 + f2 + 2*f2*i) mod N
• WiMAX Interleaving Function
switch (i mod 4)
case 0: P(i) = (P0*i + 1) mod N
case 1: P(i) = (P0*i + 1 + N/2 + P1) mod N
case 2: P(i) = (P0*i + 1 + P2) mod N
case 3: P(i) = (P0*i + 1 + N/2 + P3) mod N
– P(i+1) = (P0*i + P0 + constant factor) mod N, where the constant factor depends on (i+1) mod 4
• Replace the sum by its residue (subtract N) whenever the sum reaches N or more, which avoids a general mod N operation
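The recursions above can be sketched as address generators that use only additions and one conditional subtraction per step. This is a software model under stated assumptions, not the presented hardware: the names `add_mod`, `lte_on_the_fly` and `wimax_on_the_fly` are hypothetical, the running increment g(i) = (f1 + f2 + 2*f2*i) mod N is updated by adding 2*f2 each step (one way to avoid the multiplication by i), and the parameter values are illustrative.

```python
def add_mod(a, b, n):
    """Add two residues in [0, n) with a conditional subtraction, no '%'."""
    s = a + b
    return s - n if s >= n else s


def lte_on_the_fly(f1, f2, n):
    """Yield P(0..n-1) for P(i) = (f1*i + f2*i^2) mod n, assuming f1, f2 < n."""
    p = 0                        # P(0) = 0
    g = add_mod(f1, f2, n)       # g(0) = (f1 + f2) mod n
    step = add_mod(f2, f2, n)    # g(i+1) - g(i) = 2*f2 mod n
    for _ in range(n):
        yield p
        p = add_mod(p, g, n)     # P(i+1) = P(i) + g(i) mod n
        g = add_mod(g, step, n)


def wimax_on_the_fly(p0, p1, p2, p3, n):
    """Yield P(0..n-1) for the four-case interleaver, assuming p0 < n."""
    # One-time setup of the four constants (1 + offset) mod n.
    consts = [(1 + c) % n for c in (0, n // 2 + p1, p2, n // 2 + p3)]
    base = 0                     # running value of (p0 * i) mod n
    for i in range(n):
        yield add_mod(base, consts[i & 3], n)
        base = add_mod(base, p0, n)


# Reference check against the direct formulas (illustrative parameters).
lte_ref = [(3 * i + 10 * i * i) % 40 for i in range(40)]
offs = (0, 12, 0, 12)
wimax_ref = [(5 * i + 1 + offs[i % 4]) % 24 for i in range(24)]
print(list(lte_on_the_fly(3, 10, 40)) == lte_ref)
print(list(wimax_on_the_fly(5, 0, 0, 0, 24)) == wimax_ref)
```

Because every operand stays in [0, N), a sum is always below 2N, so a single subtraction is enough to bring it back into range.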
Interleaver Optimization
PE   i      P(i)   Bank Add.   Bit Add.
0    0      1      0           1
1    300    1501   5           1
2    600    601    2           1
3    900    2101   7           1
4    1200   1201   4           1
5    1500   301    1           1
6    1800   1801   6           1
7    2100   901    3           1

PE   i      P(i)   Bank Add.   Bit Add.
0    1      1320   4           120
1    301    420    1           120
2    601    1920   6           120
3    901    1020   3           120
4    1201   120    0           120
5    1501   1620   5           120
6    1801   720    2           120
7    2101   2220   7           120
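The bank and bit addresses in the tables above are consistent with splitting each interleaved address by a window of 300 (8 PEs, with i advancing in steps of 300). The sketch below checks this and confirms that the eight PEs hit eight different banks in each step. The window size is inferred from the tables rather than stated on the slide, and the function names are hypothetical.

```python
WINDOW = 300  # inferred from the tables: PE k starts at i = 300 * k


def bank_and_offset(address, window=WINDOW):
    """Split an interleaved address into (bank address, bit address)."""
    return divmod(address, window)


def is_contention_free(addresses, window=WINDOW):
    """True if no two PEs access the same bank in the same cycle."""
    banks = [bank_and_offset(a, window)[0] for a in addresses]
    return len(set(banks)) == len(banks)


# P(i) columns copied from the two tables (one entry per PE).
step0 = [1, 1501, 601, 2101, 1201, 301, 1801, 901]
step1 = [1320, 420, 1920, 1020, 120, 1620, 720, 2220]

for name, step in (("first table", step0), ("second table", step1)):
    print(name, [bank_and_offset(a) for a in step], is_contention_free(step))
```

In both tables all PEs share the same bit address, so a single address counter can serve every bank in this example.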
Lookahead Transformation
[Figure: trellis state transitions (states 0 to 7) over one step, tk to tk+1, and over two steps, tk to tk+2]
• 16 comparisons required for lookahead transformation in duo-binary WiMAX turbo codes
• Increases throughput by 2x
• Maximum clock rate decreases from 500 MHz to ~300 MHz, along with a significant increase in area
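The idea of stepping from tk to tk+2 in one update can be illustrated in the max-log domain, where a forward recursion is a max-plus matrix-vector product. This is a sketch under assumptions: the slides do not state which MAP variant is used, the 4-state trellis and its integer branch metrics are made up for illustration, and the function names are hypothetical.

```python
NEG_INF = float("-inf")  # marks a transition that does not exist


def alpha_step(alpha, gamma):
    """One forward step: alpha'[t] = max over s of alpha[s] + gamma[s][t]."""
    n = len(alpha)
    return [max(alpha[s] + gamma[s][t] for s in range(n)) for t in range(n)]


def lookahead_gamma(gamma_a, gamma_b):
    """Merge two trellis steps into one two-step branch metric table."""
    n = len(gamma_a)
    return [[max(gamma_a[s][m] + gamma_b[m][t] for m in range(n))
             for t in range(n)] for s in range(n)]


# Made-up 4-state example with two outgoing branches per state.
alpha0 = [0, -3, -5, -1]
g1 = [[1, NEG_INF, 2, NEG_INF],
      [0, NEG_INF, -1, NEG_INF],
      [NEG_INF, 3, NEG_INF, -2],
      [NEG_INF, 1, NEG_INF, 4]]
g2 = [[2, NEG_INF, -1, NEG_INF],
      [-2, NEG_INF, 3, NEG_INF],
      [NEG_INF, 0, NEG_INF, 1],
      [NEG_INF, -4, NEG_INF, 2]]

two_single_steps = alpha_step(alpha_step(alpha0, g1), g2)
one_lookahead_step = alpha_step(alpha0, lookahead_gamma(g1, g2))
print(two_single_steps == one_lookahead_step)
```

The merged step produces the same state metrics while taking the recursion two trellis stages forward per update, at the cost of more candidates to compare per state, which matches the area and clock-rate trade-off noted above.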
Results
No. of Iterations   Number of PEs   Throughput   Serial throughput
2                   2               490 Mbps     243 Mbps
2                   4               909 Mbps     243 Mbps
2                   8               1666 Mbps    243 Mbps
4                   2               245 Mbps     122 Mbps
4                   4               455 Mbps     122 Mbps
4                   8               833 Mbps     122 Mbps
8                   2               122 Mbps     60 Mbps
8                   4               228 Mbps     60 Mbps
8                   8               417 Mbps     60 Mbps
@ 500 MHz
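To read the results more easily, the speedup over the serial decoder and the per-PE efficiency can be derived from the table. The sketch below only post-processes the reported numbers; `summarize` and the efficiency definition (speedup divided by the number of PEs) are assumptions introduced here, not metrics from the slides.

```python
# (iterations, PEs, throughput in Mbps, serial throughput in Mbps)
RESULTS = [
    (2, 2, 490, 243), (2, 4, 909, 243), (2, 8, 1666, 243),
    (4, 2, 245, 122), (4, 4, 455, 122), (4, 8, 833, 122),
    (8, 2, 122, 60), (8, 4, 228, 60), (8, 8, 417, 60),
]


def summarize(rows):
    """Return (iterations, PEs, speedup, efficiency) for each table row."""
    out = []
    for iterations, pes, parallel, serial in rows:
        speedup = parallel / serial
        out.append((iterations, pes, speedup, speedup / pes))
    return out


for iterations, pes, speedup, efficiency in summarize(RESULTS):
    print(f"{iterations} iterations, {pes} PEs: "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```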
Questions
Outline
• Motivation
• Turbo Encoding
• Turbo Decoding
• Optimizations
– Look-ahead transformation
– Variable and adaptive parallelism
– Multiplierless interleaver
• Results
• Summary
Turbo Encoder
[Figures: LTE Turbo Encoding; WiMAX Turbo Encoding]
Parallelization
• Example: 4-state trellis
• 1 decoded symbol per cycle
[Figure: states vs. time (cycles)]