Aggregated Circulant Matrix Based LDPC Codes
Yuming Zhu and Chaitali Chakrabarti
Department of Electrical Engineering, Arizona State University, Tempe
Outline
Introduction to LDPC codes
Iterative decoding of LDPC codes
Aggregated Circulant Matrix (ACM) based LDPC codes
  Construction algorithm
  BER performance
  Decoder architecture
Concluding remarks
Introduction to LDPC codes
LDPC codes are linear block codes with a sparse parity-check matrix (low complexity).
They can be represented by a bipartite (Tanner) graph.
LDPC codes were proposed by Gallager in 1962 and rediscovered in the 1990s following the success of Turbo codes.
Adopted in IEEE 802.16e (WiMAX), 10GBASE-T, and DVB-S2; under consideration for IEEE 802.11n.
Introduction (contd.)
LDPC codes approach the Shannon limit.
  Chung (Trans. IT, Feb 2001) reported an irregular LDPC code within 0.0045 dB of the Shannon limit on the AWGN channel.
Very simple data path.
Potential to achieve massive parallelism and high throughput.
  1 Gb/s LDPC decoder (Blanksby and Howland 2001)
Related Work
High-speed LDPC decoder architectures
  C. Howland 2001 (fully parallel)
  T. Zhang 2001 (partially parallel)
  M.M. Mansour 2003 (partially parallel)
  D.E. Hocevar 2004 (partially parallel)
Partially parallel decoding for sub-matrices with multiple shifted identity matrices
  Z. Wang 2005 (with restrictions on the shift values; for geometrical LDPC codes)
Code graph and decoding
Iterative decoding
  Belief Propagation (BP)
  Bit nodes send their belief information (likelihood ratios, usually in the log domain).
  Check nodes gather the information and update the corresponding bit nodes.
[Figure: example of a (2,3) regular LDPC code]
Iterative decoding algorithm
BP: [equation not recovered]
Min-Sum: [equation not recovered]
λ-Min: [equation not recovered]
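For reference, the textbook log-domain forms of the BP and min-sum updates are sketched below. They use the message names from this deck (In, Ln, Lnm, Enm). The neighbor sets N(m) (bit nodes connected to check node m) and M(n) (check nodes connected to bit node n) are notation assumed here, and the slide's own equations may have been written differently.

```latex
% Bit-node side (common to BP and min-sum)
L_n = I_n + \sum_{m \in M(n)} E_{nm}, \qquad L_{nm} = L_n - E_{nm}

% Check-node update, BP (sum-product)
E_{nm} = 2 \tanh^{-1}\!\Bigl( \prod_{n' \in N(m) \setminus \{n\}} \tanh\bigl( L_{n'm} / 2 \bigr) \Bigr)

% Check-node update, min-sum approximation
E_{nm} = \Bigl( \prod_{n' \in N(m) \setminus \{n\}} \operatorname{sign}(L_{n'm}) \Bigr)
         \cdot \min_{n' \in N(m) \setminus \{n\}} \lvert L_{n'm} \rvert
```

The λ-Min variant is usually described as applying the BP check-node rule to only the λ smallest-magnitude incoming messages, which places it between min-sum and full BP in both complexity and accuracy.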
Circulant matrix based LDPC code
Partially parallel implementation with ordered sub-matrices in the H matrix.
Scheduling of the belief information update:
  Check node based
  Variable node based
Each element of the Hb matrix is a circularly shifted version of the identity matrix (Tanner 2004).
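A minimal sketch of how a base matrix Hb of shift values can be expanded into the full parity-check matrix H. The function names, the sub-matrix size `z`, and the use of -1 to mark an all-zero block are illustrative assumptions, not taken from the slides.

```python
def shifted_identity(z, shift):
    """z x z identity matrix with every row cyclically shifted right by `shift`."""
    return [[1 if c == (r + shift) % z else 0 for c in range(z)] for r in range(z)]


def expand(hb, z):
    """Expand base matrix `hb` into H.

    Each entry of hb is a shift value, or -1 for an all-zero z x z block
    (the -1 convention is an assumption made for this sketch).
    """
    rows = []
    for block_row in hb:
        blocks = [shifted_identity(z, s) if s >= 0 else [[0] * z for _ in range(z)]
                  for s in block_row]
        for r in range(z):
            # Concatenate row r of every block in this block row.
            rows.append([bit for blk in blocks for bit in blk[r]])
    return rows


# Hypothetical 2 x 3 base matrix with 3 x 3 sub-matrices.
hb = [[0, 1, -1],
      [2, -1, 0]]
H = expand(hb, 3)
```

Because each non-zero block is a permutation matrix, the z rows inside one block row never touch the same bit node, which is what allows them to be processed in parallel.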
Layered BP algorithm
The layered BP algorithm schedules updates in row order.
The rows in one block row can be processed in parallel.
Pipelining is a common technique to increase the throughput.
However, the throughput gain from pipelining is limited if there are data dependencies.
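The row-ordered schedule can be illustrated with a small software model. This is a sketch only: it uses the min-sum check-node rule in place of full BP, loops over the rows of a layer sequentially (equivalent to parallel processing when the rows of a layer share no bit nodes), and the function name and toy code are assumptions made for illustration.

```python
def layered_min_sum(H, llr, layers, max_iter=10):
    """Layered min-sum decoding (sketch).

    H      : parity-check matrix as a list of 0/1 rows (every row needs degree >= 2)
    llr    : intrinsic LLRs I_n (positive means bit 0 is more likely)
    layers : list of lists of row indices, processed in the given order
    """
    cols = [[n for n, h in enumerate(row) if h] for row in H]
    L = list(llr)                                  # L_n, starts at I_n
    E = [dict.fromkeys(c, 0.0) for c in cols]      # E_nm, one dict per check m
    hard = [1 if v < 0 else 0 for v in L]
    for _ in range(max_iter):
        for layer in layers:
            for m in layer:
                # Bit-to-check messages: L_nm = L_n - E_nm (old E_nm removed).
                Lnm = {n: L[n] - E[m][n] for n in cols[m]}
                for n in cols[m]:
                    others = [Lnm[k] for k in cols[m] if k != n]
                    negatives = sum(1 for v in others if v < 0)
                    sign = -1.0 if negatives % 2 else 1.0
                    E[m][n] = sign * min(abs(v) for v in others)
                    L[n] = Lnm[n] + E[m][n]        # updated L_n is visible to later layers
        hard = [1 if v < 0 else 0 for v in L]
        if all(sum(hard[n] for n in c) % 2 == 0 for c in cols):
            break                                  # all parity checks satisfied
    return hard


# Toy example: 3-bit repetition code, all-zero codeword, bit 1 received unreliably.
H = [[1, 1, 0],
     [0, 1, 1]]
decoded = layered_min_sum(H, [2.0, -0.5, 1.5], layers=[[0], [1]])
```

The data dependency is visible in the line that writes `L[n]`: a later layer reads the value an earlier layer has just written, so in a pipelined decoder the next layer cannot start until that write completes.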
Aggregated Circulant Matrix (ACM) based LDPC
Idea: Remove the data dependency between the block rows in the parity check matrix.
Method: Perform aggregation to reduce the number of non-zero sub-matrices within each block column.
Outcome: Throughput is doubled with a small increase in the data path.
Construction algorithm for ACM
First, construct the Hb matrix with the designed degree distribution.
Aggregation: it does not change the degree distribution.
For Hb of size M x N, make sure entries (i,j) and (i+M/2,j) do not both contain I.
The decoding of block rows is scheduled as: 0, M/2, 1, M/2+1, ..., M/2-1, M-1.
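The pairing constraint and the interleaved block-row schedule can be sketched as follows. The sketch assumes block rows are indexed 0 to M-1 (so the last pair is M/2-1 and M-1), one shift value per Hb entry, and -1 as the marker for an all-zero block; the function names are hypothetical.

```python
def acm_schedule(m):
    """Interleave block rows i and i + m/2: 0, m/2, 1, m/2+1, ..., m/2-1, m-1."""
    if m % 2:
        raise ValueError("number of block rows must be even")
    half = m // 2
    order = []
    for i in range(half):
        order += [i, i + half]
    return order


def pairing_violations(hb, zero=-1):
    """Positions (i, j) where entries (i, j) and (i + M/2, j) are both non-zero."""
    half = len(hb) // 2
    return [(i, j)
            for i in range(half)
            for j in range(len(hb[0]))
            if hb[i][j] != zero and hb[i + half][j] != zero]


# Hypothetical 4 x 2 base matrix that satisfies the constraint.
hb_ok = [[0, -1],
         [-1, 2],
         [-1, 1],
         [3, -1]]
order = acm_schedule(len(hb_ok))
violations = pairing_violations(hb_ok)
```

An empty violation list means that block rows i and i+M/2 never occupy the same block column, so consecutive entries in the schedule do not touch the same bit nodes.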
Hb matrix of ACM based LDPC
[Figure: Hb matrix in three forms: original, aggregated, reordered]
BER performance of ACM
Bit Update Algorithm
In parallel decoding of regular circulant matrix based LDPC, each bit node is updated only once.
However, in the parallel processing of ACM LDPC, the bit update information could come from different rows that are being processed simultaneously.
Some mechanism is needed to combine the multiple bit update information.
[Figure: regular sub-matrix vs. ACM sub-matrix; node labels n, m, m1, m2]
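One plausible way to combine several simultaneous updates is to add every check node's message change to the old bit value. The sketch below illustrates that idea only; it is not necessarily the rule used in the ACM decoder, and the function and argument names are hypothetical.

```python
def combine_bit_update(l_old, e_old, e_new):
    """Combine updates from all check nodes that touch bit node n in one step.

    l_old : L_n before the step
    e_old : old check-to-bit messages E_nm, one per check node m in this step
    e_new : new check-to-bit messages E_nm, in the same order
    """
    # Each check node contributes the change in its own message.
    return l_old + sum(new - old for old, new in zip(e_old, e_new))


# Regular sub-matrix: a single check node m updates bit node n.
single = combine_bit_update(1.0, [0.5], [1.0])
# ACM sub-matrix: two check nodes m1 and m2 update the same bit node n.
double = combine_bit_update(1.0, [0.5, -0.25], [1.0, 0.25])
```

With a single check node this reduces to the usual layered update, so the extra cost in the two-node case is the additional additions needed to merge the second contribution.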
Bit Update for ACM
ACM LDPC decoder architecture
Synthesis result
The decoder was synthesized with the TSMC 90 nm library.
The decoder achieved 930 Mb/s throughput when clocked at 300 MHz for a rate 4/5 code.
Compared with the regular LDPC code, the data path of the ACM LDPC decoder increased by only 20%, while the throughput increased by a factor of 2.
The data-path contributed 25.6% of the area of the overall decoder.
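A quick back-of-the-envelope check of these figures. The area estimate rests on two assumptions that the slides do not state: that the 25.6% share refers to the ACM decoder after the 20% data-path increase, and that the rest of the decoder is unchanged.

```python
throughput_bps = 930e6      # reported throughput
clock_hz = 300e6            # reported clock frequency
datapath_share = 0.256      # data-path share of total decoder area
datapath_growth = 0.20      # reported data-path increase over the regular decoder

# Decoded bits delivered per clock cycle.
bits_per_cycle = throughput_bps / clock_hz

# Fraction of total area attributable to the extra data path
# (assumes the 25.6% share already includes the 20% increase).
extra_area_share = datapath_share * (1 - 1 / (1 + datapath_growth))

print(f"{bits_per_cycle:.2f} bits per cycle")
print(f"{extra_area_share:.1%} of total area from the data-path increase")
```

Under these assumptions the doubling in throughput costs only a few percent of the total decoder area, since the data path is a minority of the area to begin with.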
Concluding Remarks
The proposed ACM LDPC code has performance comparable to that of regular LDPC codes.
Its advantage is that it can be decoded with a two-fold increase in throughput at the expense of only a 20% increase in data-path complexity.
Efficient implementation of the aggregation algorithm with more than 2 identity matrices is still an open problem.
Thank You!!
Questions?
Aggregation example
The example shows the case of aggregating (i,j), (i,k), (i+M/2,j), (i+M/2,k).
If the difference in row index is not M/2, the shift values may vary.
Ln includes the intrinsic information (In) and all the check-to-bit information (Enm) from the check nodes connected to n.
Enm is the check-to-bit information from check node m to bit node n.
Lnm is the bit-to-check information from bit node n to check node m.
Some data dependencies can be resolved by block-row reordering, while others cannot. For example: [figure not recovered]
Compared with the regular bit update rule, the extra additions in the bit update rule for ACM come from the calculation of Tnm and Ln^new.
Area decomposition: Interconnection 56.3%, Data path 25.6%, Control 19.1%.
Memory requirement: (a) Enm: 23.3 Kb (b) Lnm: 7.3 Kb