FPGA-based Evaluation of LDPC Codes

Date post:02-Mar-2018
Transcript:
  • FPGA-based Evaluation of LDPC Codes

    Prof. Vijayakumar Bhagavatula

  • Acknowledgements

    Dr. Hongwei Song

    Dr. Zongwang Li

    Dr. Lingyan Sun

    Xinde Hu

  • Outline

    Motivation for using low-density parity-check (LDPC) codes in data storage systems

    Structured LDPC codes

    Soft-output Viterbi algorithm (SOVA)

    Implementation on FPGA hardware

    LDPC code evaluation for magnetic recording channel models

    Summary

  • Digital Data Storage Channel

    [Figure: block diagram of a storage channel, illustrated with a tape cartridge and drive. Blocks: user bits, encoding, write current, write head, media, read head, read equalization, timing recovery, detection, decoding.]

    Digital data is represented by analog media:

    magnetization changes (magnetic disk, tape, magneto-optic recording)

    pits and lands (CD/DVD)

    phase change (DVR)

    refractive index changes (holographic storage)

  • Hard Disk Drive Signal Processing Trends

    [Figure: density versus time, moving from analog to digital signal processing. Labels: peak detect; MFM, (2,7) and (1,7) codes; d=0 or d=1; PR4; EPR4; E^nPR4, GE^nPR4; NPML with parity; turbo/LDPC codes; d=0. Courtesy: H. Thapar]

    Normalized densities > 3

    Low SNRs; 6 to 10 dB range?

    Higher data rates; need faster detection

  • Partial Response Maximum Likelihood (PRML)

    [Figure: channel, sampling at period T, PR equalizer and Viterbi detector in cascade; the data symbols are labeled a_k at the input and at the detector output.]

    Forcing ISI to zero causes noise amplification.

    PR equalization leaves a controlled amount of ISI behind.

    Maximum likelihood (ML) detection uses the Viterbi algorithm (VA) to unravel the original bits from noisy samples containing controlled ISI.

    HDDs employ PRML.
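
    As a minimal sketch of what "controlled ISI" means, the snippet below passes a short bit sequence through a noiseless partial-response target. It assumes the EPR4 polynomial 1 + D - D^2 - D^3 quoted elsewhere in this talk; the function name `pr_channel`, the 0/1 to -1/+1 mapping, and the treatment of samples before the first bit as absent are illustrative assumptions, not taken from the slides.

```python
EPR4_TAPS = [1, 1, -1, -1]  # coefficients of 1 + D - D^2 - D^3


def pr_channel(bits, taps=EPR4_TAPS):
    """Noiseless PR channel output for bits in {0, 1} mapped to -1/+1."""
    symbols = [2 * b - 1 for b in bits]
    out = []
    for k in range(len(symbols)):
        acc = 0
        for i, tap in enumerate(taps):
            if k - i >= 0:  # symbols before the start are treated as absent
                acc += tap * symbols[k - i]
        out.append(acc)
    return out


samples = pr_channel([1, 0, 1, 1, 0])
print(samples)  # [1, 0, -1, 2, 0]
```

    Each output sample depends on up to four neighboring bits, which is the known interference structure a Viterbi detector exploits.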

  • Motivation

    Hard disk drives have enjoyed compound annual density growth rates as high as 100%, but are reaching limits due to effects such as the super-paramagnetic effect.

    Advanced coding and signal processing are needed to cope with the lower SNRs and higher inter-symbol interference (ISI) of future systems.

    Simulations of iterative soft detection/decoding using LDPC codes exhibit 3 to 5 dB coding gain over an uncoded partial response maximum likelihood (PRML) system at a bit error rate (BER) of about $10^{-5}$.

    Performance of LDPC codes at BER

  • LDPC Codes

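    X is a valid codeword if $S = HX = 0$:

    \[
    \begin{bmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{bmatrix}
    =
    \begin{bmatrix}
    0 & 1 & 1 & 0 & 0 & 0 \\
    0 & 0 & 1 & 1 & 0 & 0 \\
    0 & 1 & 1 & 0 & 1 & 0 \\
    1 & 0 & 0 & 0 & 0 & 1
    \end{bmatrix}
    \begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
    \]

    [Figure: Tanner graph with check nodes c0 to c3 and bit nodes b0 to b5]

    Rate, block length, minimum distance

    Girth: the shortest cycle length

    Good structure of the H matrix facilitates LDPC decoding and makes analysis easier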

  • Structured LDPC Codes

    Regular codes: all columns have the same number of ones

    Structured codes: the parity check matrix or generator matrix has a structure that offers implementation advantages

    Disjoint Difference Set (DDS) codes

    Array codes

    Euclidean geometry codes

    Quasi-cyclic codes

  • Disjoint Difference Set (DDS) Codes

    Difference set: a set $\{a_1, a_2, \ldots, a_j\}$ of distinct residues (mod v) is called a difference set (v, j) if no two of the j(j-1) ordered differences $a_m - a_n$ ($m \neq n$) modulo v are identical.

    D={0, 1, 3} is a difference set (7, 3), with differences {1 2 3 4 5 6}

    D={{0, 1, 4}, {0, 2, 7}} is a set of disjoint difference sets (DDS) with v=13:

    {0, 1, 4}: differences {1 3 4 9 10 12}

    {0, 2, 7}: differences {2 5 6 7 8 11}

    [Figure: parity check matrix $H = [H_1 \; H_2]$, where $H_1$ and $H_2$ are circulant blocks whose ones are placed according to {0, 1, 4} and {0, 2, 7}.]

    *H. Song, "Iterative soft detection and decoding for data storage channels," Ph.D. dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002
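
    The difference-set examples on this slide can be checked mechanically. The sketch below is only an illustration; the helper names `differences` and `is_disjoint` are made up here and do not come from the talk.

```python
from itertools import permutations


def differences(block, v):
    """All ordered differences a_m - a_n (m != n) modulo v, sorted."""
    return sorted((a - b) % v for a, b in permutations(block, 2))


def is_disjoint(blocks, v):
    """True if no residue appears twice among the differences of all blocks."""
    seen = [d for block in blocks for d in differences(block, v)]
    return len(seen) == len(set(seen))


print(differences([0, 1, 3], 7))    # [1, 2, 3, 4, 5, 6]
print(differences([0, 1, 4], 13))   # [1, 3, 4, 9, 10, 12]
print(differences([0, 2, 7], 13))   # [2, 5, 6, 7, 8, 11]
print(is_disjoint([[0, 1, 4], [0, 2, 7]], 13))  # True
```

    Because no difference repeats, two rows of the resulting circulant blocks can overlap in at most one column, which is what rules out length-4 cycles in the Tanner graph.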

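  • Array-based LDPC codes

    \[
    H = \begin{bmatrix}
    I & I & I & I & \cdots & I \\
    I & \alpha & \alpha^{2} & \alpha^{3} & \cdots & \alpha^{p-1} \\
    I & \alpha^{2} & \alpha^{4} & \alpha^{6} & \cdots & \alpha^{2(p-1)} \\
    \vdots & \vdots & \vdots & \vdots & & \vdots \\
    I & \alpha^{j-1} & \alpha^{2(j-1)} & \alpha^{3(j-1)} & \cdots & \alpha^{(j-1)(p-1)}
    \end{bmatrix}
    \]

    Here $I$ is the $p \times p$ identity matrix and $\alpha$ is the $p \times p$ cyclic permutation matrix.

    [Figure: example H built from three block rows $H_1$, $H_2$, $H_3$ of shifted identity matrices.]

    *H. Song, "Iterative soft detection and decoding for data storage channels," Ph.D. dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002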
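
    A minimal sketch of how an array-code parity check matrix can be assembled from powers of a cyclic permutation matrix. The function names, the shift direction chosen for the permutation, the use of only the first k block columns, and the small parameters (p=5, j=3, k=5) are assumptions made for illustration.

```python
def shift_power(p, e):
    """p x p cyclic permutation matrix raised to the power e (list of rows)."""
    return [[1 if (r + e) % p == c else 0 for c in range(p)] for r in range(p)]


def array_code_H(p, j, k):
    """j block rows and k block columns; block (r, c) is the shift by r*c."""
    H = []
    for r in range(j):
        blocks = [shift_power(p, (r * c) % p) for c in range(k)]
        for row in range(p):
            H.append([x for blk in blocks for x in blk[row]])
    return H


H = array_code_H(p=5, j=3, k=5)
print(len(H), len(H[0]))            # 15 25
print({sum(row) for row in H})      # {5}  (row weight k)
print({sum(col) for col in zip(*H)})  # {3}  (column weight j)
```

    With p prime and k no larger than p, any two rows share at most one column, so the graph has no length-4 cycles.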

  • Quasi-Cyclic (QC) LDPC Code

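    \[
    H = \begin{bmatrix}
    H_{1,1} & H_{1,2} & \cdots & H_{1,t} \\
    H_{2,1} & H_{2,2} & \cdots & H_{2,t} \\
    \vdots & \vdots & & \vdots \\
    H_{c,1} & H_{c,2} & \cdots & H_{c,t}
    \end{bmatrix}
    \]

    Parity check matrix of a QC-LDPC code in circulant form

    Each $H_{i,j}$ is either an all-zero matrix or a circulant matrix, for example (circulant size 5x5):

    \[
    H_{i,j} = \begin{bmatrix}
    0 & 1 & 0 & 0 & 0 \\
    0 & 0 & 1 & 0 & 0 \\
    0 & 0 & 0 & 1 & 0 \\
    0 & 0 & 0 & 0 & 1 \\
    1 & 0 & 0 & 0 & 0
    \end{bmatrix}
    \]

    n: codeword length, 4608

    k: information bit length, 4096

    ql: circulant size ql x ql, 128

    Low routing congestion in ASIC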
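
    A short sketch of why the circulant form is cheap to store: a whole block is determined by its first row. The helper name `circulant` is illustrative, and the block counts assume H has exactly n - k rows (full rank), which the slide does not state.

```python
def circulant(first_row):
    """Square circulant: each row is the previous row shifted right by one."""
    q = len(first_row)
    rows = []
    for r in range(q):
        rows.append(first_row[q - r:] + first_row[:q - r])
    return rows


for row in circulant([0, 1, 0, 0, 0]):
    print(row)

# Block dimensions for the parameters on the slide (assuming n - k check rows).
n, k, q = 4608, 4096, 128
print(n // q, (n - k) // q)  # 36 4
```

    The 5x5 example reproduces the matrix shown on the slide, and the last line suggests a 4 x 36 array of 128 x 128 blocks under the stated assumption.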

  • Soft Information

    Log-Likelihood Ratio: $\mathrm{LLR} = \log_e \dfrac{\Pr(\text{bit} = +1)}{\Pr(\text{bit} = -1)}$

    Larger LLR magnitudes represent more confidence

    [Figure: LLR (-2 to 2) versus sample index (0 to 10), with samples annotated "High Confidence of (-1)", "Low Confidence of (+1)" and "High Confidence of (+1)".]
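
    The LLR definition on this slide maps directly to a few lines of code. This is an illustrative sketch; the function names are not from the talk.

```python
import math


def llr(p_plus):
    """LLR = ln(P(bit = +1) / P(bit = -1)) for 0 < p_plus < 1."""
    return math.log(p_plus / (1.0 - p_plus))


def prob_plus(value):
    """Invert an LLR back to P(bit = +1)."""
    return 1.0 / (1.0 + math.exp(-value))


for p in (0.5, 0.6, 0.9, 0.1):
    print(p, round(llr(p), 3))
```

    The sign of the LLR gives the hard decision and the magnitude gives the confidence: p = 0.5 maps to 0, while probabilities near 0 or 1 map to large negative or positive values.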

  • LDPC Decoding

    [Figure: message passing on the Tanner graph, showing the bit-to-check update, the check-to-bit update, and the LLR values.]
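
    The two updates named on this slide can be sketched with the min-sum approximation. This is a swapped-in, simplified rule chosen for brevity: the slide does not say which update rule is used (a later slide mentions scaled min-sum, which corresponds to `scale` < 1 here). The function name, the convention that a positive LLR means bit 0, and the requirement that every check involves at least two bits are assumptions.

```python
def min_sum_decode(H, channel_llr, max_iter=20, scale=1.0):
    """Hard decisions (0/1) from channel LLRs; positive LLR means bit 0."""
    m, n = len(H), len(H[0])
    rows = [[c for c in range(n) if H[r][c]] for r in range(m)]
    c2b = {(r, c): 0.0 for r in range(m) for c in rows[r]}  # check-to-bit
    total = list(channel_llr)
    hard = [0 if t >= 0 else 1 for t in total]
    for _ in range(max_iter):
        # Bit-to-check update: total belief minus what that check contributed.
        b2c = {(r, c): total[c] - c2b[(r, c)] for (r, c) in c2b}
        # Check-to-bit update: sign product and minimum magnitude of the others.
        for r in range(m):
            for c in rows[r]:
                others = [b2c[(r, o)] for o in rows[r] if o != c]
                sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
                c2b[(r, c)] = scale * sign * min(abs(v) for v in others)
        # Posterior LLR per bit and tentative hard decision.
        total = list(channel_llr)
        for (r, c), msg in c2b.items():
            total[c] += msg
        hard = [0 if t >= 0 else 1 for t in total]
        if all(sum(hard[c] for c in rows[r]) % 2 == 0 for r in range(m)):
            break  # all parity checks satisfied
    return hard


# Small (7,4) Hamming parity check matrix, used only as a toy example.
H_toy = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
decoded = min_sum_decode(H_toy, [2, 2, 2, 2, 2, 2, -0.5])
print(decoded)  # [0, 0, 0, 0, 0, 0, 0]
```

    In the toy run the all-zero codeword is sent and the last bit arrives with a weakly wrong sign; one iteration of message passing is enough to correct it.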

  • Iterative Detection and Decoding

    Soft detection and decoding

    Exchanging soft information between channel detector and LDPC decoder

    3 to 5 dB gain in simulations over an uncoded PRML system at BER $10^{-5}$

    [Figure: the readback signal enters the soft channel detector, which exchanges soft information with the LDPC decoder in a loop (channel iteration); the LDPC decoder outputs the decoded bits.]

  • Example Performance Gain

    EPR4 ($1 + D - D^2 - D^3$)

    AWGN

    Codeword length 4608

    Code rate 8/9

    10 channel iterations

    1 LDPC iteration

    [Figure: BER ($10^{-6}$ to $10^{-1}$) versus SNR (4.5 to 9.5) for PRML and LDPC(10,1).]

  • Problem of Error Floor

    Error floor is a major problem in applying LDPC codes to the magnetic recording channel

    [Figure: bit error rate ($10^{-11}$ to $10^{-1}$) versus signal to noise ratio (5 to 11 dB), with annotations "Error Floor" and "C limitation".]

  • LDPC Code Evaluation Approach

    Approach

    FPGA platform

    Code evaluation platform for AWGN channel

    Code evaluation platform for ideal PR channel

    Error analysis using FPGA

    Challenges

    Hardware throughput and software flexibility

    Tradeoff between throughput and logic consumption

  • Reasons for FPGA

    [Figure: Comparison of Hardware Platforms for DSP Algorithm, placing FPGA, uPs and ASIC by speed, cost and programmability.]

    uPs: General Purpose Microprocessors / Digital Signal Processors

    ASIC: Application Specific Integrated Circuit

    EPR4 target, depth=15, TS-SOVA

    Operations/bit ~350

    Assume 1 GHz PC, 3 clock cycles/operation: $10^{13}$ bits need 122 days

    At 100 Mbps, $10^{13}$ bits need 1.15 days

    High speed: required to achieve BER
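
    The run-time figures on this slide follow from simple arithmetic, reproduced below as a sketch. The function names are illustrative; the parameter values are the ones quoted on the slide.

```python
SECONDS_PER_DAY = 86_400


def days_on_pc(bits, ops_per_bit=350, cycles_per_op=3, clock_hz=1e9):
    """Days needed when every bit costs ops_per_bit * cycles_per_op cycles."""
    return bits * ops_per_bit * cycles_per_op / clock_hz / SECONDS_PER_DAY


def days_at_rate(bits, bits_per_second=100e6):
    """Days needed at a fixed processing rate in bits per second."""
    return bits / bits_per_second / SECONDS_PER_DAY


print(round(days_on_pc(1e13), 1))    # 121.5
print(round(days_at_rate(1e13), 2))  # 1.16
```

    The results agree with the roughly 122 days and 1.15 days quoted on the slide, a speed-up of about two orders of magnitude.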

  • First Platform - System View

    Matlab, C

    PCI Port

    PCI (33MHz) between PC and FPGA

    Simulation controller: Matlab and C program

    FPGA chip: Xilinx Virtex II 6000

    6M gates

    144 Memory blocks

  • Hardware / Software Co-design

    Task partition, first option:

    PC: generate samples, collect errors

    FPGA: detection/decoding

    Needs high bandwidth over the PCI port

    Task partition, second option:

    PC: collect errors

    FPGA: generate samples, detection/decoding

    Needs high speed sample generation

  • Soft Channel Detectors

    Provide soft information for the LDPC decoder

    Maximum a posteriori probability (MAP) detectors: MAP, Max-Log-MAP, Log-MAP

    Maximum likelihood sequence detectors (MLSD): classic SOVA, two-step SOVA, bi-directional SOVA

  • Soft Channel Detector Comparison

    MAP, Max-Log-MAP, Log-MAP, Bi-SOVA, TS-SOVA

    Window length 32

    EPR4 channel

    AWGN

    Floating point

    N=4608, j=3, rate 8/9

    10 LDPC iterations

    0 channel iterations

    [Figure: BER ($10^{-4}$ to $10^{-1}$) versus SNR (5 to 6.5 dB) for MAP, LOG_MAP, MAX_LOG_MAP, BI_SOVA and SOVA.]

  • LDPC Decoder Implementations

    Fully Parallel Implementation

    Directly map the algorithm to hardware implementation

    Highest throughput (64 Gbps/iteration at 64 MHz using a 0.16 um standard cell CMOS process with 5 metal layers*)

    Complicated wiring dominates the chip area

    Design not scalable

    [Figure: interconnected processing elements (PE)]

    * C. Howland, A. Blanksby, "Parallel decoding architectures for low density parity check codes," The 2001 IEEE International Symposium on Circuits and Systems, vol. 4, pp. 742-745, 2001.

  • Shared Memory Architecture

    [Figure: processing elements (PE) connected to shared memory]

    Problems:

    Code specific design

    Different H matrix structures give different interconnections

    Large amount of memory consumption

    Requirements:

    A single decoder for a broad class of codes

    Column weight, code rate, block length, H matrix structure

    Fit the decoder in a single FPGA

    High throughput

  • Generalized LDPC Decoder Implementation

    Set H matrix constraint: H consists of regular sub-matrices

    \[
    H = \begin{bmatrix}
    H_{0,0} & H_{0,1} & \cdots & H_{0,k-1} \\
    H_{1,0} & H_{1,1} & \cdots & H_{1,k-1} \\
    \vdots & \vdots & & \vdots \\
    H_{j-1,0} & H_{j-1,1} & \cdots & H_{j-1,k-1}
    \end{bmatrix}
    \]

    Physical node to virtual node mapping

    [Figure: Tanner graph with physical bit nodes b0 to b11 and check nodes c0 to c7, mapped to virtual bit nodes b'0 to b'2 and virtual check nodes c'0 and c'1, for the example matrix below.]

    1 1 0 0 0 0 0 0 1 0 0 0

    0 1 1 0 0 0 0 0 0 0 1 0

    1 0 0 1 0 0 0 0 0 1 0 0

    0 0 1 1 0 0 0 0 0 0 0 1

    0 0 0 1 1 1 1 0 0 0 1 1

    1 0 0 0 1 0 1 1 0 1 1 0

    0 0 1 0 0 1 1 1 1 1 0 0

    0 1 0 0 1 1 0 1 1 0 0 1

  • AWGN Channel Evaluation Platform

    [Figure: FPGA blocks AWGN, LLR Cal, DEC, Error Analyzer and PCI_IF (PCI handshaking logic), attached to the PCI port. Control inputs: start, stop, noise variance, max error. Reported results: block error, bit error.]

    Fully reconfigurable: iteration numbers, bit width, block length, column weight of code

    High throughput

    Small size to fit in one chip

  • Error floor of DDS code in AWGN

    [Figure: BER and BLER ($10^{-10}$ to $10^{0}$) versus SNR (3.5 to 6.5). DDS j=5, N=4923, M=547, Iterations=50.]

    *H. Song, "Iterative soft detection and decoding for data storage channels," Ph.D. dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002

  • DDS Codes in AWGN Channel

    [Figure: BER/BLER ($10^{-10}$ to $10^{0}$) versus SNR (3 to 7) for DDS j=3, j=4 and j=5; N=4923, M=547, Iterations=50.]

    AWGN channel, 6 bit LLR, 50 iterations

    J=3: error floor at BLER=1e-3

    J=4: error floor at BLER=1e-4

    J=5: error floor at BLER=1e-6

    *H. Song, "Iterative soft detection and decoding for data storage channels," Ph.D. dissertation, Dept. of ECE, Carnegie Mellon University, Pittsburgh, USA, Dec. 2002

  • Error floor of Array code in AWGN

    [Figure: BER and BLER ($10^{-12}$ to $10^{0}$) versus SNR (3.5 to 6). Array j=5, N=4635, M=515, Iterations=50.]

  • Array Code Evaluation in AWGN Channel

    [Figure: BER/BLER ($10^{-12}$ to $10^{0}$) versus SNR (3 to 7), ARRAY j=3,4,5, Iterations=50. Curves: ARRAY j=3, N=4671, M=519; ARRAY j=4, N=4716, M=524; ARRAY j=5, N=4635, M=515.]

    AWGN channel, 6 bit LLR, 50 iterations

    J=3: error floor at BLER=1e-5

    J=4: error floor at BLER=1e-6

    J=5: error floor at BLER=1e-9

    * J. Fan, "Array codes as Low-Density Parity Check Codes," 2nd International Symposium on Turbo Codes and Related Topics (Brest, France), September 2000.

  • PR Channel Evaluation Platform

    Main challenges

    Data flow control

    Buffer and memory requirement

    Clock skew in large design

    [Figure: FPGA blocks RDG, ENC, FIFO, interleaver, precoder, PR+AWGN, SOVA, de-interleaver, decoder, interleaver, FIFO, error count, error location and error distribution, with PCI (33.33 MHz) handshaking. Control inputs: start, stop, noise variance, max error. Reported results: error location, error number, error distribution.]

  • Error floor of DDS code (column weight 3) in PR channel

    [Figure: BER and BLER ($10^{-8}$ to $10^{0}$) versus SNR (4 to 7.5). DDS J=3, N=4923, M=547. Curves: AWGN channel result at 50 LDPC iterations; PR channel with 25 LDPC iterations and 2 channel iterations; C simulation.]

    EPR4 channel

    7 bit SOVA

    6 bit LDPC decoder

    25 LDPC iterations

    2 channel iterations

  • Error Floor of DDS Code (column weight 5) in PR Channel

    [Figure: BER and BLER ($10^{-10}$ to $10^{0}$) versus SNR (4 to 7.5). DDS J=5, N=4923, M=547. Curves: AWGN channel at 50 LDPC iterations; PR channel with 25 LDPC iterations and 2 channel iterations.]

    EPR4 channel

    7 bit SOVA

    6 bit LDPC decoder

    25 LDPC iterations

    2 channel iterations

  • Error Floor of Array Code (column weight 3) in PR Channel

    [Figure: BER and BLER ($10^{-9}$ to $10^{0}$) versus SNR (4 to 8). ARRAY j=3, N=4671, M=519. Curves: AWGN channel result at 50 LDPC iterations; PR channel with 25 LDPC iterations and 2 channel iterations.]

    EPR4 channel

    7 bit SOVA

    6 bit LDPC decoder

    25 LDPC iterations

    2 channel iterations

  • Rate 8/9 QC-LDPC Code

    Code length: 5760

    Message length: 5120

    Code rate: 8/9

    Column weight: 3, 4, 5

    Row weight: 27, 36, 45

    Max. # iterations: 15

    [Figure: BER/BLER (1.0E-11 to 1.0E+00) versus Eb/N0 (2.5 to 5). Curves: BER and BLER measured on FPGA for deg 3, deg 4 and deg 5.]

  • Rate PEG QC-LDPC Code with Block Length of 32768 bits

    [Figure: BER/BLER (1.0E-11 to 1.0E+00) versus Eb/N0 (0 to 3 dB). Curves: BLER (30 iterations, scaled min-sum, FPGA); BER (30 iterations, scaled min-sum, FPGA).]

    Low BERs possible thanks to FPGA

  • SOVA+LDPC FPGA Simulator

    [Figure: write processor (RDG, ENC, interleaver), magnetic recording channel, and read processor (SOVA, deinterleaver, DEC, interleaver, ERR ANY).]

  • Transition Noise

    [Figure: transition jitter and width variation]

    Transitions between bit cells are not straight

    The zigzags result in transition shift (transition jitter) and pulse width variation

  • Media Nonlinearities

    Partial Erasure (PE)

    Each bit cell is partially erased by the neighboring field (when a transition occurs)

    Results in an amplitude loss in the readback signal

    Nonlinear Transition Shift (NLTS)

    The previously written bit cells interfere with the current writing field

    Results in a shift of the transition location

    [Figure: partial erasure, showing the original and actual write bubbles; NLTS, showing the writing head, the interference field, and the original and actual transition positions.]

  • Magnetic Recording Channel Model

    [Figure: the readback channel between an input buffer (from encoder) and the output (to decoder). Data dependent noise: transition noise (jitter noise, width variation), with NLTS factor, partial erasure factor and ISI factor. Data independent noise: electronic noise. Also shown: three AWGN generators and the EPR4 equalizer.]

  • Hardware Resource Usage

    The resource utilization (Xilinx Virtex II 8000 -4)

    Resource                      Used     Available   Utilization
    Number of GCLKs               1        16          6%
    Number of MULT18X18s          56       168         33%
    Number of 4 input LUTs        27308    93184       29%
    Number of Slice Flip Flops    15936    93184       17%
    Number of Slices              17897    46592       38%

    Total equivalent gate count for design: 3,696,944

    Processing speed and throughput

    Clock speed: 40 MHz

    Throughput: 180 Mbits/s per iteration, 3.6 Mbits/s with 50 iterations

    Time required to reach BER of $10^{-11}$: FPGA 30 hours; C simulation 3,000 hours
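
    The throughput and speed-up figures on this slide can be cross-checked with a few lines. This sketch assumes throughput scales inversely with the number of iterations per codeword; the function name is illustrative.

```python
def effective_throughput(per_iteration_mbps, iterations):
    """Decoded throughput when each codeword needs `iterations` passes."""
    return per_iteration_mbps / iterations


print(effective_throughput(180, 50))  # 3.6
print(3000 / 30)                      # 100.0
```

    Under that assumption, 180 Mbits/s per iteration gives 3.6 Mbits/s at 50 iterations, and the quoted run times correspond to a 100x speed-up of the FPGA over the C simulation.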

  • Signal-to-Noise Ratio (SNR)

    Includes both AWGN and transition noise

    SNR is defined as the ratio of the power of a single pulse to the height of the noise power spectral density:

    \[
    \mathrm{SNR} = \frac{E_t / N_0}{1 + \dfrac{4 \sigma_j^2 E_t}{PW_{50}^2 N_0}}
    \]

    The transition noise is the dominant part of the noise (80% in power)

  • Simulation Settings

    FPGA chip: Xilinx Virtex II 8000

    PCI adaptor: AlphaData ADM-XRC-II

    Number of iterations: 25 LDPC (outer) iterations, 2 channel (inner) iterations

    SNR: 12 dB to 22 dB

    PR target: EPR4

    LDPC codes: code length 4932, rate 8/9 LDPC code

    Hardware programming language: VHDL

  • BER for Different ChannelsBER for Different Channels

    6 dB due to realistic channel modeling of Lorentzian pulse shape and the non-adaptive equalizer

    2.5 dB due to PR channel

    3 dB due to the presence of media nonlinearities

  • Partial Erasure and NLTS

    [Figure: two panels, "PE Effect" and "NLTS Effect"]

    When the PE ratio reaches 0.7, the loss in BER is significant.

    For the NLTS effect, the threshold is around 0.15T. When the shift exceeds 0.15T, a precompensator is necessary for the system.

  • Iteration Efficiency

    [Figure: iteration efficiency curves across three regions: inefficient region, waterfall region, error floor]

    Each curve represents the ratio of errors corrected during the past 5 iterations.

    In the low SNR region, the noise level is too high for the LDPC code to be effective.

    In the waterfall region, the error rate drops dramatically as SNR increases. Each iteration is able to correct a large portion of the remaining error bits.

    In the error floor region, only the early iterations in each of the two channel iterations are effective. Some iterations introduce more errors than they correct, shown as a negative ratio.

  • BER with Fewer Iterations

    [Figure: BER curves with a 0.6 dB gap marked]

    Reduce the total number of iterations from 50 to 10 (in 2 channel iterations)

    The loss in BER is 0.6 dB in the waterfall region

    The guaranteed throughput is increased by a factor of 5

  • FPGA LDPC Code Platform Evolution

    AWGN channel + LDPC ENDEC

    Idealized EPR4 + LDPC

    Equalized EPR4 + LDPC

    Lorentzian readback signal (on FPGA) + equalized EPR4 + LDPC, with AWGN, transition noise, nonlinear transition shift and partial erasure

    Perpendicular recording channel model (on FPGA)

  • Perpendicular Recording

    Step response modeled by error function:

    \[
    g(t) = E_t \, \mathrm{erf}\!\left( \frac{\sqrt{\ln 16}\; t}{PW_{50}} \right)
    \]

    Transition noise

    Media nonlinearities: nonlinear transition shift (NLTS), partial erasure (PE)

    Inter-symbol interference

    Electronic noise
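
    A small numerical sketch of an error-function step response, assuming the common form g(t) = A erf(sqrt(ln 16) t / PW50). The amplitude parameter, the function names and the finite-difference step are illustrative assumptions. The check confirms that the derivative of this step (the transition pulse) falls to half its peak at t = PW50/2, so its width at 50% amplitude is PW50.

```python
import math


def step_response(t, pw50, amplitude=1.0):
    """g(t) = amplitude * erf(sqrt(ln 16) * t / pw50)."""
    return amplitude * math.erf(math.sqrt(math.log(16.0)) * t / pw50)


def pulse(t, pw50, dt=1e-6):
    """Central-difference derivative of the step response."""
    return (step_response(t + dt, pw50) - step_response(t - dt, pw50)) / (2 * dt)


pw50 = 2.0
ratio = pulse(pw50 / 2, pw50) / pulse(0.0, pw50)
print(round(ratio, 4))  # 0.5
```

    The factor sqrt(ln 16) = 2 sqrt(ln 2) is exactly what makes the Gaussian-shaped pulse have full width PW50 at half maximum.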

  • LDPC Code with Perpendicular Recording

    [Figure: BER and FER curves]

    The effects of major impairments in perpendicular recording systems are similar to those in longitudinal recording, except that NLTS does not appear to impact perpendicular recording as much.

    At a normalized recording density of 2 (PW50/T = 2), the BER reaches $10^{-12}$ at an SNR of 21.5 dB.

  • Summary

    Magnetic recording channel models (for both longitudinal and perpendicular recording systems) include transition noise, nonlinear transition shift, partial erasure and electronic noise

    FPGA allows us to reach BERs as low as $10^{-12}$

    At a normalized recording density of 2, simulation exhibits $10^{-12}$ BER for an SNR of around 21 dB

    The individual impact of different impairments on coding performance can be investigated

FPGA-based Evaluation of LDPC Codes, Prof. Vijayakumar Bhagavatula