
A Parallelized Layered QC-LDPC Decoder for IEEE 802.11ad

A. Balatsoukas-Stimming*, N. Preyss*, A. Cevrero*, A. Burg*, C. Roth†

*Department of Electrical Engineering, EPFL, Lausanne, Switzerland
†Integrated Systems Laboratory, ETHZ, Zurich, Switzerland
E-mail: {alexios.balatsoukas, nicholas.preyss, alessandro.cevrero, andreas.burg}@epfl.ch, [email protected]

IEEE 802.11ad: Multi-gigabit throughput for wireless LAN

More than 10x higher throughput offers new wireless opportunities:
• Raw HD streaming
• Instant media library sync
• Ultra-high throughput IP links

IEEE 802.11ad requires high-speed baseband signal processing at low power consumption

Challenges:
• Complex channel conditions due to high delay spread
• Large device variations of analog front-ends
• Gbit/s bit rate (1.54 Gbps mandatory, 3.08 & 6.16 Gbps optional)

Layered Decoding Schedule
• Performance highly affected by message-passing schedule
• Flooding Schedule: all variable-to-check messages updated, then all check-to-variable messages updated. Highly parallelizable, slow convergence
• Layered Schedule: variable-to-check and check-to-variable messages for 1st check node, then 2nd, etc. Fast convergence, low parallelism
• Twofold reduction in number of iterations ≈ twofold reduction in energy consumption
• But: very challenging to achieve multi-gigabit throughput
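
To make the layered schedule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the poster's hardware: the function names, the toy parity checks in `rows`, and the sign convention (positive LLR means bit 0) are all hypothetical. Each check node (layer) is processed in turn and the posterior LLRs are updated immediately, so later layers in the same iteration already see the new values. A flooding schedule would instead compute every message from the previous iteration's values.

```python
def min_sum(q):
    """Min-sum rule: each output uses the sign product and the
    minimum magnitude of all the *other* inputs of the check node."""
    out = []
    for i in range(len(q)):
        others = q[:i] + q[i + 1:]
        sign = -1 if sum(x < 0 for x in others) % 2 else 1
        out.append(sign * min(abs(x) for x in others))
    return out


def layered_decode(rows, channel_llr, max_iters=5):
    """Layered min-sum decoding with early termination.

    rows        -- list of check nodes, each a list of bit indices
    channel_llr -- channel LLRs (assumption: positive means bit 0)
    max_iters   -- iteration cap (the poster fixes I=5)
    """
    post = list(channel_llr)                # running posterior LLR per bit
    c2v = [[0.0] * len(r) for r in rows]    # stored check-to-variable messages
    bits = [1 if x < 0 else 0 for x in post]
    for it in range(1, max_iters + 1):
        for c, row in enumerate(rows):      # one layer at a time
            # Remove this check's old contribution to get v2c messages.
            v2c = [post[v] - c2v[c][k] for k, v in enumerate(row)]
            c2v[c] = min_sum(v2c)
            # Write the refreshed posteriors back immediately.
            for k, v in enumerate(row):
                post[v] = v2c[k] + c2v[c][k]
        bits = [1 if x < 0 else 0 for x in post]
        # Early termination: stop once every parity check is satisfied.
        if all(sum(bits[v] for v in row) % 2 == 0 for row in rows):
            return bits, it
    return bits, max_iters


# Hypothetical toy code with three checks on seven bits (not from 802.11ad).
rows = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
# All-zero codeword sent; bit 0 received with a weak wrong-sign LLR.
llr = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
bits, iters_used = layered_decode(rows, llr)
```

In this toy case the weak error on bit 0 is corrected during the first iteration, so the early-termination check stops the decoder before the iteration cap is reached.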

Parallelized Decoder Architecture
[Figure: Detailed view of COMB unit]

802.11ad Channel Coding: QC-LDPC Codes

Application to IEEE 802.11ad requires:
• Very high throughput
• Low power
Highly conflicting requirements!

Solution:
• Layered decoding & early termination → low power
• Additional parallelization → high throughput

Control Sequence Optimization
• Z=42 and N=16 are fixed by IEEE 802.11ad
• I=5 (number of iterations) is fixed to satisfy QoS requirements
• L (sequence length) can be optimized

Optimization Method
• Re-arrange rows and columns of parity-check matrix to minimize pipeline stalls → higher throughput
• (Almost) free lunch: only LLR access order changes
• Parallelization overhead: ~10%
• Average length reduction: ~13%
• Reduction in max. length: ~13%
• Result: 3.12 Gbps min. throughput
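
The poster does not spell out how the re-arrangement is found, so the following Python sketch only illustrates the general idea with a toy cost model that is entirely an assumption: a stall is counted whenever two consecutively processed layers touch the same block column, and the best layer order is found by exhaustive search. The names `stall_cost` and `best_order` and the example `layers` are hypothetical.

```python
from itertools import permutations


def stall_cost(order, layers):
    """Toy cost: number of block columns shared by consecutive layers.

    The order is treated as cyclic because the same sequence of layers
    repeats in every decoding iteration.
    """
    total = 0
    for a, b in zip(order, order[1:] + order[:1]):
        total += len(layers[a] & layers[b])
    return total


def best_order(layers):
    """Exhaustive search over layer orders (fine for a handful of layers)."""
    n = len(layers)
    # Layer 0 is pinned first: rotations of a cyclic order cost the same.
    candidates = ((0,) + p for p in permutations(range(1, n)))
    return min(candidates, key=lambda o: stall_cost(o, layers))


# Hypothetical layers, each given as the set of block columns it uses.
layers = [{0, 1, 2}, {1, 2, 3}, {4, 5, 6}, {0, 5, 6}]
natural = (0, 1, 2, 3)
best = best_order(layers)
natural_cost = stall_cost(natural, layers)
best_cost = stall_cost(best, layers)
```

Reordering changes only the sequence in which LLR memory is accessed, which matches the "(almost) free lunch" remark above. A real control sequence would also have to model pipeline depth and column ordering within a layer, which this sketch ignores.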

Conclusion

Low-power layered LDPC decoder is feasible when multi-gigabit throughput is required

Careful assignment of processing units to parity-check matrix blocks leads to very efficient parallelization


• Doubly parallelized architecture:
  1. Two blocks of every row of H processed simultaneously
  2. COMB unit combines partial results to ensure proper operation
• Processing units and shifters doubled
• No additional memory required
• Simple routing preserved

• Throughput:
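
Returning to the COMB unit in the doubly parallelized architecture: the poster does not describe its internals, so the Python sketch below is one plausible reading for Min-Sum decoding, not the authors' design. It assumes each of the two processing paths tracks a partial result (smallest magnitude, second smallest magnitude, position of the smallest, sign product) over its share of a row, and that combining means merging two such partial results. The names `partial_min` and `comb` are hypothetical.

```python
import math


def partial_min(v2c, offset=0):
    """Scan part of a row: (min1, min2, index of min1, sign product).

    `offset` shifts the local index so that it refers to the position
    within the full row.
    """
    min1 = min2 = math.inf
    idx = -1
    sign = 1
    for i, x in enumerate(v2c):
        m = abs(x)
        if m < min1:
            min1, min2, idx = m, min1, offset + i
        elif m < min2:
            min2 = m
        if x < 0:
            sign = -sign
    return min1, min2, idx, sign


def comb(a, b):
    """Merge two partial results into the result for the union."""
    a1, a2, ai, a_sign = a
    b1, b2, bi, b_sign = b
    if a1 <= b1:
        # a holds the overall minimum; the runner-up is a's own
        # second minimum or b's minimum, whichever is smaller.
        return a1, min(a2, b1), ai, a_sign * b_sign
    return b1, min(b2, a1), bi, a_sign * b_sign


# Hypothetical row of four variable-to-check messages, split in two halves.
row = [2.0, -0.5, 3.0, -1.5]
merged = comb(partial_min(row[:2], 0), partial_min(row[2:], 2))
serial = partial_min(row, 0)
```

The merge needs only a few comparisons, so processing two blocks per cycle yields the same check-node state as the serial scan, consistent with the statement that no additional memory is required.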

Synthesis Results

• Parity-check matrix
  1. Consists of 42x42 cyclic permutation matrices
  2. Illustrates parity constraints imposed on bits by the code
  3. Is used to decode codewords via Min-Sum (MS) message-passing
  4. Represents graph in which columns are variable nodes, rows are check nodes
  5. Various coding rates are used depending on channel conditions

[Figure: Parity-check matrix of rate ½ code]
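
As a rough illustration of how a quasi-cyclic parity-check matrix is assembled from cyclic permutation blocks, the Python sketch below expands a small base matrix of shift values into a full binary H. The block size Z=42 comes from the poster; everything else is an assumption for illustration: the helper names, the convention that -1 denotes an all-zero block, and the `toy_base` shift values, which are not taken from the IEEE 802.11ad standard.

```python
import numpy as np

Z = 42  # circulant block size used by IEEE 802.11ad (from the poster)


def circulant(shift, z=Z):
    """z x z identity matrix with columns cyclically shifted by `shift`.

    A negative shift denotes an all-zero block (assumed convention).
    """
    if shift < 0:
        return np.zeros((z, z), dtype=np.uint8)
    return np.roll(np.eye(z, dtype=np.uint8), shift, axis=1)


def expand(base, z=Z):
    """Expand a base matrix of shift values into the binary matrix H."""
    return np.block([[circulant(s, z) for s in row] for row in base])


# Hypothetical 2 x 4 base matrix; real 802.11ad codes use 16 block columns.
toy_base = [[0, 5, -1, 7],
            [3, -1, 11, 0]]
H = expand(toy_base)

# Each column of H is a variable node and each row a check node.
check_degrees = H.sum(axis=1)
```

Because every non-zero block is a permutation matrix, the Z rows inside one block row never share a column, which is what allows Z check nodes to be processed independently and simultaneously.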

Reference Architecture

• MIN and SEL units perform basic functions of MS decoding on Z independent rows of H simultaneously
• Parity-check matrix blocks are processed serially in a pipeline
• Memory reads/writes dictated by control sequence, data dependencies avoided by pipeline stalling
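
The basic Min-Sum functions mentioned above can be sketched for a single check node as follows. The split into a minimum-finding step and a selection step is an assumption about what the MIN and SEL units do, based on the usual two-minimum formulation of Min-Sum; the function names are hypothetical. In hardware, Z copies of this logic would run side by side, one per row of the current block.

```python
def min_unit(v2c):
    """Find the two smallest magnitudes, the position of the smallest,
    and the product of all signs among the incoming messages."""
    mags = [abs(x) for x in v2c]
    idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx]
    min2 = min(m for k, m in enumerate(mags) if k != idx)
    sign = -1 if sum(x < 0 for x in v2c) % 2 else 1
    return min1, min2, idx, sign


def sel_unit(v2c, state):
    """Select the outgoing check-to-variable message for each edge.

    The edge that supplied the minimum receives the second minimum;
    every other edge receives the minimum. Multiplying the total sign
    by the edge's own sign removes that edge from the sign product.
    """
    min1, min2, idx, sign = state
    return [(sign if x >= 0 else -sign) * (min2 if k == idx else min1)
            for k, x in enumerate(v2c)]


# Hypothetical incoming variable-to-check messages for one check node.
v2c = [2.0, -0.5, 3.0, -1.5]
state = min_unit(v2c)
c2v = sel_unit(v2c, state)
```

Storing only the four-value state per check node, rather than one message per edge, is a common way to keep the memory footprint of a Min-Sum decoder small.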
