
CHAPTER I

INTRODUCTION

1.1 Introduction to LDPC

Due to their near Shannon limit performance and inherently parallelizable decoding scheme, low-density parity-check (LDPC) codes have been extensively investigated in research and practical applications. Recently, LDPC codes have been considered for many industrial standards of next generation communication systems such as DVB-S2, WLAN (802.11n), WiMAX (802.16e), and 10GBase-T (802.3an).

For high throughput applications, the decoding parallelism is usually very high.

Hence, a complex interconnect network is required which consumes a significant

amount of silicon area and power.

A message broadcasting technique was proposed in earlier work to reduce the routing congestion in a fully parallel LDPC decoder. Because all check nodes and variable nodes are directly mapped to hardware, the implementation cost is very high. Such decoders are targeted at specific LDPC codes which have very simple interconnections between check nodes and variable nodes. The constraints placed on the H matrix structure to reduce routing complexity unavoidably limit the performance of the LDPC codes. Another LDPC decoder proposed in the literature is based on the two-phase message-passing (TPMP) decoding scheme.

Recently, the layered decoding approach has been of great interest in LDPC decoder design because it converges much faster than the TPMP decoding approach. The 4.6 Gb/s LDPC decoder presented in prior work adopts the layered decoding approach. However, it is only suited to array LDPC codes, which can be viewed as a sub-class of LDPC codes. It should be noted that a shuffled iterative decoding algorithm based on vertical partitioning of the parity-check matrix can also speed up LDPC decoding in principle.

In practice, LDPC codes have attracted considerable attention due to their

excellent error correction performance and the regularity in their parity check

matrices which is well suited for VLSI implementation. In this paper, we present a

high-throughput low-cost layered decoding architecture for generic QC-LDPC codes.

A row permutation approach is proposed to significantly reduce the

implementation complexity of shuffle network in the LDPC decoder. An approximate

layered decoding approach is explored to increase clock speed and hence to increase

the decoding throughput. An efficient implementation technique which is based on

Min-Sum algorithm is employed to minimize the hardware complexity. The

computation core is further optimized to reduce the computation delay.

Low-density parity-check (LDPC) codes were invented by R. G. Gallager

(Gallager 1963; Gallager 1962) in 1962. He discovered an iterative decoding

algorithm which he applied to a new class of codes. He named these codes low-

density parity-check (LDPC) codes since the parity-check matrices had to be sparse to

perform well. Yet, LDPC codes were ignored for a long time, due mainly to the high computational complexity required when very long codes are considered. In 1993, C. Berrou et al. invented the turbo codes (Berrou, Glavieux, and Thitimajshima 1993) and their associated iterative decoding algorithm. The remarkable performance observed with the turbo codes raised many questions and much interest toward iterative techniques. In 1995, D. J. C. MacKay and R. M. Neal (MacKay and Neal 1995; MacKay and Neal 1996; MacKay 1999) rediscovered the LDPC codes, and established a link between their iterative algorithm and Pearl's belief propagation algorithm (Pearl 1988) from the artificial intelligence community (Bayesian networks). At the same time, M. Sipser and D. A. Spielman (Sipser and Spielman 1996) used the first decoding algorithm of R. G. Gallager (algorithm A) to decode expander codes.

1.2 Objectives:

The objective of this project is to develop a high-throughput decoder architecture for generic quasi-cyclic low-density parity-check (QC-LDPC) codes. Various

optimizations are employed to increase the clock speed. A row permutation scheme is

proposed to significantly simplify the implementation of the shuffle network in LDPC

decoder. An approximate layered decoding approach is explored to reduce the critical

path of the layered LDPC decoder. The computation core is further optimized to

reduce the computation delay. It is estimated that 4.7 Gb/s decoding throughput can

be achieved at 15 iterations using the current technology.


Low-density parity-check (LDPC) codes, which offer channel-capacity-approaching performance, were first invented by Gallager in 1962 and rediscovered by MacKay in 1996 as a class of linear block codes. LDPC codes show excellent error

correction capability even for low signal-to-noise ratio applications. Also, the inherent independence of the parity checks in the parity-check matrix enables parallel decoding and thus makes high-speed LDPC decoders possible. Hence, LDPC codes have been suggested in

many recent wire-line and wireless communication standards such as IEEE 802.11n,

DVB-S2 and IEEE 802.16e (WiMAX). LDPC codes can be effectively decoded by the standard belief propagation (BP) algorithm, which is also called the sum-product algorithm (SPA). Later, the min-sum algorithm (MSA) was introduced to reduce the computational complexity of the check node processing in the SPA, which makes this algorithm suitable for VLSI implementation. VLSI implementation of LDPC decoders has attracted attention from researchers in the past few years, including fully parallel

architecture and partly parallel architecture. Fully parallel architecture directly maps

standard BP algorithm into hardware by specifying connections between check nodes

and variable nodes. However, the interconnections become more complex as the

block length increases, which leads to large chip area and power consumption. Partly

parallel architecture can effectively balance the hardware complexity and system

throughput by employing architecture-aware LDPC (AA-LDPC) codes that have

regularly-constructed parity-check matrix. However, the decoder complexity is still a

great challenge for LDPC codes that have irregular parity check matrix, such as the

codes used in the IEEE 802.16e standard for WiMAX systems. The decoding

throughput for irregular LDPC codes will decrease due to the irregular parity check

matrix which destroys the inherent parallelism in partly parallel decoding

architectures. Layered decoding algorithm (LDA), either by horizontal partitioning or

vertical partitioning, uses the newest data from the current iteration rather than data

from the previous iteration and thus can double the convergence speed. Conventional LDA processes messages serially, from the first layer to the last, leading to limited

decoding throughput. Grouped layered decoding can improve the throughput but

employs more hardware resources. In this paper, we introduce a new parallel layered

decoding architecture (PLDA) to enable different layers to operate concurrently.

Precisely scheduled message passing paths among different layers guarantee that

newly calculated messages can be delivered to their designated locations before they


are used by the next layer. By adding offsets to the permutation values of the sub-

matrices in the base parity check matrix, time intervals among different layers become

large enough for message passing. In PLDA, the decoding latency per iteration can be

reduced greatly and hence the decoding throughput is improved. The remainder of

this paper is organized as follows. Section II introduces the code structure used in WiMAX, the MSA, and the LDA. The corresponding hardware implementation of the PLDA and the

message passing network are presented in Section III. Section IV shows experimental

results of the proposed decoder, including FPGA implementation results, ASIC

implementation results and comparisons with existing WiMax LDPC decoders.


CHAPTER II

2.1 Turbo codes

Turbo Coding is an iterated soft-decoding scheme that combines two or more

relatively simple convolutional codes and an interleaver to produce a block code that

can perform to within a fraction of a decibel of the Shannon limit. Predating LDPC

codes in terms of practical application, they now provide similar performance.

One of the earliest commercial applications of turbo coding was the

CDMA2000 1x (TIA IS-2000) digital cellular technology developed by Qualcomm

and sold by Verizon Wireless, Sprint, and other carriers. It is also used for the

evolution of CDMA2000 1x specifically for Internet access, 1xEV-DO (TIA IS-856).

Like 1x, EV-DO was developed by Qualcomm, and is sold by Verizon Wireless,

Sprint, and other carriers (Verizon's marketing name for 1xEV-DO is Broadband

Access, Sprint's consumer and business marketing names for 1xEV-DO are Power

Vision and Mobile Broadband, respectively).

2.1.1 Characteristics of Turbo Codes

1) Turbo codes have extraordinary performance at low SNR.

a) Very close to the Shannon limit.

b) Due to a low multiplicity of low weight code words.

2) However, turbo codes have a BER “floor”.

- This is due to their low minimum distance.

3) Performance improves for larger block sizes.

a) Larger block sizes mean more latency (delay).

b) However, larger block sizes are not more complex to decode.

c) The BER floor is lower for larger frame/interleaver sizes


4) The decoding complexity of a constraint length K_TC turbo code is about the same as that of a K = K_CC convolutional code, where K_CC ≈ 2 + K_TC + log2(number of decoder iterations).

2.2 Performance of Error Correcting Codes

The performances of error correcting codes are compared with each other by

referring to their gap to the Shannon limit, as mentioned in section 1.1.1. This section

aims at defining exactly what the Shannon limit is, and what exactly is measured when the gap to the Shannon bound is referred to. It is important to know exactly what is measured, since many "near Shannon limit" codes have now been discovered. The results hereafter are classical in information theory and may be found in

a lot of references. Yet, the first part is inspired by the work of (Schlegel 1997).

Forward Error Correction (FEC) is an important feature of most modern

communication systems, including wired and wireless systems. Communication

systems use a variety of FEC coding techniques to permit correction of bit errors in

transmitted symbols.

2.2.1 Forward error correction

In telecommunication and information theory, forward error correction (FEC)

(also called channel coding) is a system of error control for data transmission,

whereby the sender adds (carefully selected) redundant data to its messages, also

known as an error-correcting code. This allows the receiver to detect and correct

errors (within some bound) without the need to ask the sender for additional data. The

advantages of forward error correction are that a back-channel is not required and

retransmission of data can often be avoided (at the cost of higher bandwidth

requirements, on average). FEC is therefore applied in situations where

retransmissions are relatively costly or impossible. In particular, FEC information is

usually added to most mass storage devices to protect against damage to the stored

data.

FEC processing often occurs in the early stages of digital processing after a

signal is first received. That is, FEC circuits are often an integral part of the analog-to-


digital conversion process, also involving digital modulation and demodulation, or

line coding and decoding. Many FEC coders can also generate a bit-error rate (BER)

signal which can be used as feedback to fine-tune the analog receiving electronics.

Soft-decision algorithms, such as the Viterbi decoder, can take (quasi-)analog data in,

and generate digital data on output.

The maximum fraction of errors that can be corrected is determined in

advance by the design of the code, so different forward error correcting codes are

suitable for different conditions.

How it works:

FEC is accomplished by adding redundancy to the transmitted information

using a predetermined algorithm. Each redundant bit is invariably a complex function

of many original information bits. The original information may or may not appear in

the encoded output; codes that include the unmodified input in the output are

systematic, while those that do not are nonsystematic.

An extremely simple example would be an analog to digital converter that

samples three bits of signal strength data for every bit of transmitted data. If the three

samples are mostly zero, the transmitted bit was probably a zero, and if three samples

are mostly one, the transmitted bit was probably a one. The simplest example of error

correction is for the receiver to assume the correct output is given by the most

frequently occurring value in each group of three.

Triplet received    Interpreted as
000                 0
001                 0
010                 0
100                 0
111                 1
110                 1
101                 1
011                 1

This allows an error in any one of the three samples to be corrected by

"democratic voting". This is a highly inefficient FEC, but it does illustrate the

principle. In practice, FEC codes typically examine the last several dozen, or even the


last several hundred, previously received bits to determine how to decode the current

small handful of bits (typically in groups of 2 to 8 bits).

Such triple modular redundancy, the simplest form of forward error correction,

is widely used.
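As an illustration, the following short Python sketch implements the triple-repetition scheme described above (the function names are ours, chosen for illustration):

    def encode_triple(bits):
        """Repeat every information bit three times."""
        return [b for b in bits for _ in range(3)]

    def decode_triple(received):
        """Majority vote over each group of three received bits."""
        decoded = []
        for i in range(0, len(received), 3):
            triplet = received[i:i + 3]
            decoded.append(1 if sum(triplet) >= 2 else 0)
        return decoded

    msg = [1, 0, 1]
    tx = encode_triple(msg)           # [1, 1, 1, 0, 0, 0, 1, 1, 1]
    tx[1] ^= 1                        # corrupt one sample of the first triplet
    assert decode_triple(tx) == msg   # the single error is corrected

Any single error within a triplet is corrected, but two errors in the same triplet defeat the vote, which is why this rate-1/3 code is so inefficient.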

Averaging noise to reduce errors:

FEC could be said to work by "averaging noise"; since each data bit affects

many transmitted symbols, the corruption of some symbols by noise usually allows

the original user data to be extracted from the other, uncorrupted received symbols

that also depend on the same user data.

Because of this "risk-pooling" effect, digital communication systems that use

FEC tend to work well above a certain minimum signal-to-noise ratio and not

at all below it.

This all-or-nothing tendency, called the cliff effect, becomes more pronounced as

stronger codes are used that more closely approach the theoretical limit

imposed by the Shannon limit.

Interleaving FEC coded data can reduce the all or nothing properties of

transmitted FEC codes. However, this method has limits; it is best used on

narrowband data.

Most telecommunication systems use a fixed channel code designed to tolerate

the expected worst-case bit error rate, and then fail to work at all if the bit error rate is

ever worse. However, some systems adapt to the given channel error conditions:

Hybrid automatic repeat-request uses a fixed FEC method as long as the FEC can

handle the error rate, then switches to ARQ when the error rate gets too high; adaptive

modulation and coding uses a variety of FEC rates, adding more error-correction bits

per packet when there are higher error rates in the channel, or taking them out when

they are not needed.

Types of FEC:

The two main categories of FEC codes are block codes and convolutional

codes.


Block codes work on fixed-size blocks (packets) of bits or symbols of

predetermined size. Practical block codes can generally be decoded in time polynomial in their block length.

Convolutional codes work on bit or symbol streams of arbitrary length. They

are most often decoded with the Viterbi algorithm, though other algorithms are

sometimes used. Viterbi decoding allows asymptotically optimal decoding

efficiency with increasing constraint length of the convolutional code, but at

the expense of exponentially increasing complexity. A convolutional code can

be turned into a block code, if desired.

There are many types of block codes, but among the classical ones the most

notable is Reed-Solomon coding because of its widespread use on the Compact disc,

the DVD, and in hard disk drives. Golay, BCH, Multidimensional parity, and

Hamming codes are other examples of classical block codes.

Hamming ECC is commonly used to correct NAND flash memory errors. This

provides single-bit error correction and 2-bit error detection. Hamming codes are only

suitable for more reliable single level cell (SLC) NAND. Denser multi level cell

(MLC) NAND requires stronger multi-bit correcting ECC such as BCH or Reed-Solomon. Classical block codes are usually implemented using hard-decision

algorithms, which means that for every input and output signal a hard decision is

made whether it corresponds to a one or a zero bit. In contrast, soft-decision

algorithms like the Viterbi decoder process (discretized) analog signals, which allow

for much higher error-correction performance than hard-decision decoding. Nearly all

classical block codes apply the algebraic properties of finite fields.

Concatenated FEC codes for improved performance:

Classical (algebraic) block codes and convolutional codes are frequently

combined in concatenated coding schemes in which a short constraint-length Viterbi-

decoded convolutional code does most of the work and a block code (usually Reed-

Solomon) with larger symbol size and block length "mops up" any errors made by the

convolutional decoder.


Concatenated codes have been standard practice in satellite and deep space

communications since Voyager 2 first used the technique in its 1986 encounter with

Uranus.

Low-density parity-check (LDPC):

Low-Density Parity- Check (LDPC) codes are a class of recently re-

discovered highly efficient linear block codes. They can provide performance very

close to the channel capacity (the theoretical maximum) using an iterated soft-

decision decoding approach, at linear time complexity in terms of their block length.

Practical implementations can draw heavily from the use of parallelism.

LDPC codes were first introduced by Robert G. Gallager in his PhD thesis in 1960, but due to the computational effort of implementing encoders and decoders and the introduction of Reed-Solomon codes, they were mostly ignored until recently.

LDPC codes are now used in many recent high-speed communication

standards, such as DVB-S2 (Digital video broadcasting), Wi-MAX (IEEE 802.16e

standard for microwave communications), High-Speed Wireless LAN (IEEE

802.11n), 10GBase-T Ethernet (802.3an) and G.hn/G.9960 (ITU-T Standard for

networking over power lines, phone lines and coaxial cable).

Channel Capacity:

Stated by Claude Shannon in 1948, the theorem describes the maximum

possible efficiency of error-correcting methods versus levels of noise interference and

data corruption. The theory doesn't describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem

has wide-ranging applications in both communications and data storage. This theorem

is of foundational importance to the modern field of information theory. Shannon only

gave an outline of the proof. The first rigorous proof is due to Amiel Feinstein in

1954.

The Shannon theorem states that given a noisy channel with channel capacity

C and information transmitted at a rate R, then if R < C there exist codes that allow

the probability of error at the receiver to be made arbitrarily small. This means that,


theoretically, it is possible to transmit information nearly without error at any rate

below a limiting rate, C.

The converse is also important. If R > C, an arbitrarily small probability of

error is not achievable. All codes will have a probability of error greater than a certain

positive minimal level, and this level increases as the rate increases. So, information

cannot be guaranteed to be transmitted reliably across a channel at rates beyond the

channel capacity. The theorem does not address the rare situation in which rate and

capacity are equal.

Simple schemes such as "send the message 3 times and use a best 2 out of 3

voting scheme if the copies differ" are inefficient error-correction methods, unable to

asymptotically guarantee that a block of data can be communicated free of error.

Advanced techniques such as Reed–Solomon codes and, more recently, turbo codes

come much closer to reaching the theoretical Shannon limit, but at a cost of high

computational complexity. Using low-density parity-check (LDPC) codes or turbo

codes and with the computing power in today's digital signal processors, it is now

possible to reach very close to the Shannon limit. In fact, it was shown that LDPC

codes can reach within 0.0045 dB of the Shannon limit (for very long block lengths).

Mathematical statement:

Theorem (Shannon, 1948):

1. For every discrete memoryless channel, the channel capacity C = max_{p_X} I(X; Y) has the following property: for any ε > 0 and R < C, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε.


2. If a probability of bit error pb is acceptable, rates up to R(pb) are achievable, where R(pb) = C / (1 − H2(pb)) and H2(pb) is the binary entropy function H2(pb) = −pb log2(pb) − (1 − pb) log2(1 − pb).

3. For any pb, rates greater than R(pb) are not achievable.

(MacKay (2003), p. 162; cf Gallager (1968), ch.5; Cover and Thomas (1991),

p. 198; Shannon (1948) thm. 11)
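As a small worked example of statement 2, the following Python snippet evaluates the binary entropy function and the limiting rate R(pb) (the numerical values are illustrative only):

    import math

    def h2(p):
        """Binary entropy function H2(p), in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def max_rate(capacity, pb):
        """R(pb) = C / (1 - H2(pb)): highest achievable rate when a
        residual bit error probability pb is acceptable."""
        return capacity / (1 - h2(pb))

    # For a channel of capacity C = 0.5 bit per use, tolerating pb = 1e-3:
    print(max_rate(0.5, 1e-3))   # about 0.506, slightly above C

Tolerating a small residual bit error rate thus buys only a modest increase of the achievable rate above the capacity C.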

Figure 1: Error Correction in Communication systems

2.3 Row Permutation of Parity Check Matrix of LDPC Codes


The parity check matrix of an LDPC code is an array of circulant submatrices. To achieve very high decoding throughput, an array of cyclic shifters is

needed to shuffle soft messages corresponding to multiple submatrices for check

nodes and variable nodes. In order to reduce the VLSI implementation complexity for

the shuffle network, the shifting structure in circulant matrices is extensively

exploited. Suppose the parity check matrix H of an LDPC code is a J×C array of P×P circulant submatrices. With row permutation, it can be converted to a form as shown in Fig. 2.

Figure 2: Array of circulant sub matrices

Figure 3: Permuted Matrix

Here each submatrix shown in the figure is a P×P permutation matrix representing a single left or right cyclic shift; each submatrix can be obtained by cyclically shifting the preceding submatrix


for a single step. Ai is a J×P matrix determined by the shift offsets of the circulant submatrices in block column i (i = 1, 2, ..., C), and m is an integer such that P is divisible by m.

For example, the matrix Ha shown in Fig. 2 is a 2×3 array of 8×8 cyclically shifted identity submatrices. With the row permutation described in the following, a new matrix Hb can be obtained, which has the form shown in (1). First, the first four rows of the first block row of Ha are distributed to the four block rows of Hb in a round-robin fashion (i.e., rows 1-4 of Ha are distributed to rows 1, 5, 9, and 13 of Hb). Then the second four rows are distributed in the same way. The permutation is continued until all rows in the first block row of matrix Ha have been moved to matrix Hb. Then the rows of the second block row of Ha are distributed in the same way. It can be seen from Fig. 3 that Hb has the form shown in (1). In the previous example, the row distribution started from the first row of each block row. In general, the distribution can start from any row of a block row. To minimize the data dependency between two adjacent block rows, an optimum row distribution scheme is desired.

For an LDPC decoder which can process all messages corresponding to

the 1-components in an entire block row of matrix Hp (e.g., Hb in Fig.2), the shuffle

network for LDPC decoding can be implemented with very simple data shifters.
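The row permutation described above is easy to express in code. The following Python sketch (hypothetical function name; numpy assumed) reproduces the round-robin distribution of the Ha → Hb example, where consecutive groups of rows are scattered across the new block rows:

    import numpy as np

    def round_robin_permute(H, num_new_block_rows):
        """Scatter consecutive groups of num_new_block_rows rows of H
        round-robin over the new block rows (cf. the Ha -> Hb example:
        rows 1-4 go to rows 1, 5, 9, 13, and so on)."""
        total, B = H.shape[0], num_new_block_rows
        assert total % B == 0
        rows_per_block = total // B
        perm = [0] * total
        for r in range(total):
            group, offset = divmod(r, B)    # offset selects the new block row
            perm[offset * rows_per_block + group] = r
        return H[perm, :]

For a 16-row Ha and num_new_block_rows = 4, old rows 1-4 land on new rows 1, 5, 9, and 13, exactly as in the example above.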


LDPC Codes:

An LDPC code is defined by a binary matrix called the parity check matrix H.

Rows define parity check equations (constraints) between encoded symbols in a code word, and the number of columns defines the length of the code.

V is a valid code word if H·V^T = 0.

The decoder in the receiver checks whether the condition H·V^T = 0 holds.


Example: Parity check matrix for a (9, 5) LDPC code, row weight = 3, column weight = 2. The columns correspond to the code word bits v9, v8, ..., v1:

H =
1 0 0 0 0 1 0 1 0
0 1 0 1 0 0 0 0 1
0 0 1 0 1 0 1 0 0
0 0 1 1 0 0 0 1 0
1 0 0 0 1 0 0 0 1
0 1 0 0 0 1 1 0 0

H·V^T ≠ 0 (there is an error); H·V^T = 0 (there is no error)
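The membership test H·V^T = 0 can be checked directly. A minimal Python sketch using the 6×9 matrix above:

    import numpy as np

    H = np.array([[1,0,0,0,0,1,0,1,0],
                  [0,1,0,1,0,0,0,0,1],
                  [0,0,1,0,1,0,1,0,0],
                  [0,0,1,1,0,0,0,1,0],
                  [1,0,0,0,1,0,0,0,1],
                  [0,1,0,0,0,1,1,0,0]])

    def is_codeword(H, v):
        """v is a valid code word iff every parity check is satisfied,
        i.e. H.v^T = 0 over GF(2)."""
        return not np.any(H @ v % 2)

    v = np.zeros(9, dtype=int)      # the all-zero word satisfies every check
    print(is_codeword(H, v))        # True: no error
    v[0] ^= 1                       # flip one bit
    print(is_codeword(H, v))        # False: nonzero syndrome signals an error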

CHAPTER III

CHANNEL CODING

This chapter introduces the channel code decoding issue and the problem of optimal code decoding in the case of linear block codes. First, the main notations

used in the thesis are presented and especially those related to the graph

representation of the linear block codes. Then the optimal decoding is discussed: it is

shown that under the cycle-free hypothesis, the optimal decoding can be processed

using an iterative algorithm. Finally, the performance of error correcting codes is also

discussed.

3.1 Optimal decoding:

Communication over noisy channels can be improved by the use of a channel

code C, as demonstrated by C. E. Shannon for its famous channel coding theorem

“Let a discrete channel have the capacity C and a discrete source the entropy per

second H. If H ≤ C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors. If H > C

it is possible to encode the source so that the equivocation is less than H.” This

theorem states that below a maximum rate R, which is equal to the capacity of the

channel, it is possible to find error correction codes to achieve any given probability

of error. Since this theorem does not explain how to make such a code, it has been the

kick-off for a lot of activities in the coding theory community. When Shannon

announced his theory in the July and October issues of the Bell System Technical

Journal in 1948, the largest communications cable in operation at that time carried

1800 voice conversations. Twenty-five years later, the highest capacity cable was

carrying 230000 simultaneous conversations. Today a single optical fiber as thin as a

human hair can carry more than 6.4 million conversations. In the quest for capacity-achieving codes, the performance of the codes is measured by their gap to the

capacity. For a given code, the smallest gap is obtained by an optimal decoder: the

maximum a-posteriori (MAP) decoder. Before dealing with the optimal decoding,

some notations within a model of the communication scheme are presented hereafter.


Figure 4: Basic scheme for channel code encoding/decoding.

3.1.1 Communication model

Figure 4 depicts a classical communication scheme. The source block delivers information by means of sequences, which are row vectors x of length K. The encoder block delivers the codeword c of length N, which is the coded version of x.

The code rate is defined by the ratio R = K/N. The codeword c is sent over the

channel and the vector y is the received word: a distorted version of c. The matched

filters, the modulator and the demodulator, and the synchronization is supposed to

work perfectly. Hence, the channel is represented by a discrete time equivalent model.

The channel is a non-deterministic mapper between its input c and its output y. We

assume that y depends on c via a conditional probability density function (pdf)

p(y|c). We assume also that the channel is memoryless, i.e., p(y|c) = ∏ p(y_i|c_i), where the product runs over the N symbols of the codeword. For example, if the channel is the binary-input additive white Gaussian noise (BI-AWGN) channel and the modulation is binary phase shift keying (BPSK), each bit c_i is mapped to x_i = (−1)^{c_i} and the received sample is y_i = x_i + n_i, with n_i white Gaussian noise.
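A minimal Python sketch of this BI-AWGN/BPSK model (parameters are illustrative): each bit is mapped to ±1, independent Gaussian noise is added per symbol (the memoryless property), and the channel log-likelihood ratios follow as 2y/σ²:

    import numpy as np

    rng = np.random.default_rng(0)

    c = rng.integers(0, 2, size=8)        # codeword bits
    x = 1 - 2 * c                         # BPSK: 0 -> +1, 1 -> -1
    sigma = 0.8                           # noise standard deviation
    y = x + sigma * rng.normal(size=8)    # memoryless: one noise sample per symbol

    # Channel LLR of each bit: log p(y|c=0)/p(y|c=1) = 2y/sigma^2
    llr = 2 * y / sigma**2

These LLRs are the soft inputs consumed by the iterative decoders discussed later.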

On Figure 4, two types of decoder are depicted: decoders of type 1 have to

compute the best estimation ˆx of the source word x; decoders of type 2 compute the


best estimation ˆc of the sent codeword c. In this case, ˆx is extracted from ˆc by a post

processing (reverse processing of the encoding) when the code is non-systematic.

Both decoders can perform two types of decoding.

3.2 Classes of LDPC codes:

R. Gallager defined an (N, j, k) LDPC code as a block code of length N

having a small fixed number (j) of ones in each column of the parity check H, and a

small fixed number (k) of ones in each row of H. This class of codes is then to be

decoded by the iterative algorithm described in chapter 1. This algorithm computes

exact a posteriori probabilities, provided that the Tanner graph of the code is cycle

free. Generally, LDPC codes do have cycles. The sparseness of the parity check

matrix aims at reducing the number of cycles and at increasing the size of the cycles.

Moreover, as the length N of the code increases, the cycle free hypothesis becomes

more and more realistic. The iterative algorithm is processed on these graphs.

Although it is not optimal, it performs quite well. Since then, the LDPC code class has been enlarged to all sparse parity check matrices, thus creating a very wide class of codes, including the extension to codes over GF(q) and irregular LDPC codes.

Irregularity:

In the Gallager’s original LDPC code design, there is a fixed number of ones

in both the rows (k) and the columns (j) of the parity check matrix: it means that each

bit is implied in j parity check constraints and that each parity check constraint is the

exclusive-OR (XOR) of k bits. This class of codes is referred to as regular LDPC

codes. On the contrary, irregular LDPC codes do not have a constant number of non-

zero entries in the rows or in the columns of H. They are specified by the degree distribution of the bits, λ(x), and of the parity check constraints, ρ(x), where, denoting by λi the proportion of columns having weight i, λ(x) = Σ_i λi x^(i−1).


Similarly, denoting by ρi the proportion of rows having weight i: ρ(x) = Σ_i ρi x^(i−1).

Code rate:

The rate R of LDPC codes is defined by R = K/N, and Rd = 1 − M/N is the design code rate. Rd = R if the parity check matrix has full rank. It has been shown that as N increases, the parity-check matrix is almost sure to be full rank. Hereafter, we will assume that R = Rd unless the contrary is mentioned. The rate R is then linked to the other parameters of the class by R = 1 − j/k. Note that in general, for random constructions, the parity check matrix can be full rank when j is odd, whereas when j is even the rows of H sum to zero, so the rank is at most M − 1 and R is slightly larger than Rd.

3.2.1 Optimization of LDPC codes:

The bounds and performance of LDPC codes are derived from their

parameters set. The large number of independent parameters makes it possible to tune them to fit some external constraint, such as a particular channel. Two

algorithms can be used to design a class of irregular LDPC codes under some channel

constraints: the density evolution algorithm and the extrinsic information transfer

(EXIT) charts.

Density evolution algorithm:

Richardson designed capacity approaching irregular codes with the density

evolution (DE) algorithm. This algorithm tracks the probability density function (pdf)

of the messages through the graph nodes under the assumption that the cycle free

hypothesis is verified. It is a kind of belief propagation algorithm with pdf messages

instead of log likelihood ratios messages. Density evolution is processed on the

asymptotical performance of the class of LDPC codes. It means that an infinite number of iterations is processed on an infinite code-length LDPC code: if the length of the

code tends to infinity, the probability that a randomly chosen node belongs to a cycle

of a given length tends towards zero.


Usually, either the channel threshold or the code rates are optimized under the

constraints of the degree distributions and of the SNR. The threshold of the channel is

the value of the channel parameter above which the probability tends towards zero if

the iterations are infinite (and the code length also). Optimization tries to lower the threshold or to raise the rate as much as possible. In one example, the authors designed rate-1/2 irregular LDPC codes for binary-input AWGN channels that approach the Shannon limit very closely (up to 0.0045 dB). Optimizations based on the DE algorithm are often processed by means of a differential evolution algorithm when they are non-linear, as for example where the authors optimize an irregular LDPC code for uncorrelated flat Rayleigh fading channels. The Gaussian

approximation in the DE algorithm can also be used: the probability density functions of the messages are assumed to be Gaussian, and the only parameter that has to be tracked in the nodes is the mean.

EXIT chart:

Extrinsic information transfer (EXIT) charts are 2D graphs on which the mutual information transfers through the two constituent codes of a turbo code are superposed. EXIT charts have also been transposed to LDPC code optimization.

3.2.2 Regular vs. Irregular LDPC codes:

• An LDPC code is regular if the rows and columns of H have uniform weight, i.e. all rows have the same number of ones (dc) and all columns have the same number of ones (dv)

– The codes of Gallager and MacKay were regular (or as close as

possible)

– Although regular codes had impressive performance, they are still

about 1 dB from capacity and generally perform worse than turbo

codes

• An LDPC code is irregular if the rows and columns have non-uniform weight

– Irregular LDPC codes tend to outperform turbo codes for block lengths of about n > 10^5


• The degree distribution pair (λ, ρ) for an LDPC code is defined as λ(x) = Σ_i λi x^(i−1) and ρ(x) = Σ_i ρi x^(i−1)

• λi, ρi represent the fraction of edges emanating from variable (check) nodes of degree i
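From an edge-perspective pair (λ, ρ) the design rate follows as Rd = 1 − (Σi ρi/i)/(Σi λi/i). A small Python check (the distributions below are illustrative):

    def design_rate(lam, rho):
        """lam, rho: dicts mapping degree i to the fraction of edges
        attached to variable/check nodes of that degree."""
        v_nodes = sum(f / i for i, f in lam.items())   # proportional to N
        c_nodes = sum(f / i for i, f in rho.items())   # proportional to M
        return 1 - c_nodes / v_nodes

    # Regular (3, 6) code: every edge sees a degree-3 variable node
    # and a degree-6 check node.
    print(design_rate({3: 1.0}, {6: 1.0}))   # 0.5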

3.2.3 Constructing Regular LDPC Codes:

• Around 1996, MacKay and Neal described methods for constructing sparse H

matrices

• The idea is to randomly generate a M × N matrix H with weight dv columns

and weight dc rows, subject to some constraints

• Construction 1A: Overlap between any two columns is no greater than 1 (see the sketch after this list)

– This avoids length 4 cycles

• Construction 2A: M/2 columns have dv =2, with no overlap between any pair

of columns. Remaining columns have dv =3. As with 1A, the overlap between

any two columns is no greater than 1

• Construction 1B and 2B: Obtained by deleting select columns from 1A and 2A

– Can result in a higher rate code
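A minimal Python sketch of Construction 1A under the stated overlap constraint (rejection-based and simplified; the actual construction also balances the row weights, which is omitted here):

    import numpy as np

    rng = np.random.default_rng(1)

    def construct_1a(M, N, dv, max_tries=1000):
        """Draw weight-dv columns one at a time, keeping a column only if
        it overlaps each previously kept column in at most one row,
        which rules out length-4 cycles."""
        H = np.zeros((M, N), dtype=int)
        for j in range(N):
            for _ in range(max_tries):
                col = np.zeros(M, dtype=int)
                col[rng.choice(M, size=dv, replace=False)] = 1
                if j == 0 or (H[:, :j].T @ col).max() <= 1:
                    H[:, j] = col
                    break
            else:
                raise RuntimeError("stuck; restart with other random draws")
        return H

    H = construct_1a(M=6, N=9, dv=2)   # cf. the (9, 5) example of Chapter II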

3.2.4 Constructing Irregular LDPC Codes:

• Luby developed LDPC codes based on irregular LDPC Tanner graphs

• Message and check nodes have conflicting requirements

– Message nodes benefit from having a large degree


– LDPC codes perform better with check nodes having low degrees

• Irregular LDPC codes help balance these competing requirements

– High degree message nodes converge to the correct value quickly

– This increases the quality of information passed to the check nodes,

which in turn helps the lower degree message nodes to converge

• Check node degree kept as uniform as possible and variable node degree is

non-uniform

– Code 14: Check node degree =14, Variable node degree =5, 6, 21, 23

• No attempt made to optimize the degree distribution for a given code rate

3.3 Constructions of LDPC codes:

By constructions of LDPC codes, we mean the construction, or design, of a

particular LDPC parity check matrix H. The design of H is the moment when the

asymptotical constraints (the parameters of the class you designed, like the degree

distribution, the rate) have to meet the practical constraints (finite dimension, girths).

Hereafter are described some recipes taking into account some practical constraints.

Two techniques exist in the literature: random and deterministic ones. The design compromise is that to increase the girth, the density has to be decreased, yielding poor code performance due to a low minimum distance. On the contrary, for a high minimum distance, the density has to be increased, yielding short girths (due to the fact that the dimensions of H are finite) and thus a poor convergence of the belief propagation algorithm.

3.3.1 Random based construction:

The first constructions of LDPC codes were random ones. The parity check

matrix is the concatenation and/or superposition of sub-matrices; these sub-matrices

are created by processing some permutations on a particular (random or not) sub-

matrix which usually has a column weight of 1. R. Gallager’s construction for

example is based on a short matrix H0. Then j matrices Πi(H0) are vertically stacked on H0, where Πi(H0) denotes a column permutation of H0 (see figure 5).


Regular and irregular codes can also be constructed by creating the two sets of nodes, each node appearing as many times as its degree's value, and then randomly mapping a one-to-one association between the nodes of the two sets, as illustrated on figure 2.3. D. MacKay compares random constructions of regular and irregular LDPC codes: small girths have to be avoided, especially between low weight variables.

All the constructions described above should be constrained by the girth's value. Yet, increasing the girth from 4 to 6 and above is not trivial; some random constructions specifically address this issue, generating a parity check matrix that optimizes the length of the girth or the rate of the code when M is fixed.

Figure 5: Some random constructions of regular LDPC parity check matrices based on Gallager's (a) and MacKay's constructions (b, c) (MacKay, Wilson, and Davey 1999). Example of a regular (3, 4) LDPC code of length N = 12. Girths of length 4 have not been avoided. The permutations can be either column permutations (a, b) or row permutations (c).


3.3.2 Deterministic based construction:

Random constructions don't have too many constraints: they can fit quite well the parameters of the desired class. The problem is that they do not guarantee that the girth will be large enough. So either post-processing or more constraints are added to the random design, sometimes yielding much complexity. To circumvent the girth problem, deterministic constructions have been developed. Moreover, explicit constructions can lead to easier encoding, and can also be easier to handle in hardware. Two branches of combinatorial mathematics are involved in such designs: finite geometry and Balanced Incomplete Block Designs (BIBDs). They seem to be more efficient than a previous algebraic construction which was based on expander graphs. The authors designed high rate LDPC codes based on Steiner systems.

Their conclusion was that the minimum distance was not high enough and that

difference set cyclic (DSC) codes should outperform them when combined with one-step majority logic decoding. The authors present LDPC code

constructions based on finite geometry, like in (Johnson and Weller 2003a) for

constructing very high rate LDPC codes. Balanced incomplete block designs (BIBDs) have also been used.

The major drawback for deterministic constructions of LDPC codes is that

they exist with a few combinations of parameters. So it may be difficult to find one

that fits the specifications of a given system.

3.4 Code construction

For large block sizes, LDPC codes are commonly constructed by first studying

the behaviour of decoders. As the block size tends to infinity, LDPC decoders can be

shown to have a noise threshold below which decoding is reliably achieved, and

above which decoding is not achieved. This threshold can be optimized by finding the

best proportion of arcs from check nodes and arcs from variable nodes. An

approximate graphical approach to visualizing this threshold is an EXIT chart. The construction of a specific LDPC code after this optimization falls into two main types of techniques:


Pseudo-random approaches

Combinatorial approaches

Construction by a pseudo-random approach builds on theoretical results that,

for large block size, a random construction gives good decoding performance. In

general, pseudo-random codes have complex encoders; however pseudo-random

codes with the best decoders can have simple encoders. Various constraints are often

applied to help ensure that the desired properties expected at the theoretical limit of

infinite block size occur at a finite block size.

Combinatorial approaches can be used to optimize properties of small block-

size LDPC codes or to create codes with simple encoders.

3.4.1 Random number generation

A random number generator (often abbreviated as RNG) is a computational or

physical device designed to generate a sequence of numbers or symbols that lack any

pattern, i.e. appear random.

The many applications of randomness have led to the development of several

different methods for generating random data. Many of these have existed since

ancient times, including dice, coin flipping, the shuffling of playing cards, the use of

yarrow stalks (by divination) in the I Ching, and many other techniques. Because of

the mechanical nature of these techniques, generating large amounts of sufficiently

random numbers (important in statistics) required a lot of work and/or time. Thus,

results would sometimes be collected and distributed as random number tables. Nowadays, after the advent of computational random number generators, a

growing number of government-run lotteries, and lottery games, are using RNGs

instead of more traditional drawing methods, such as using ping-pong or rubber balls.

RNGs are also used today to determine the odds of modern slot machines.

Several computational methods for random number generation exist, but often

fall short of the goal of true randomness — though they may meet, with varying

success, some of the statistical tests for randomness intended to measure how

unpredictable their results are (that is, to what degree their patterns are discernible).


Only in 2010 was the first truly random computational number generator produced, resorting to principles of quantum physics.

3.4.2 Pseudorandom number generator

A pseudorandom number generator (PRNG), also known as a deterministic

random bit generator (DRBG), is an algorithm for generating a sequence of numbers

that approximates the properties of random numbers. The sequence is not truly

random in that it is completely determined by a relatively small set of initial values,

called the PRNG's state. Although sequences that are closer to truly random can be

generated using hardware random number generators, pseudorandom numbers are

important in practice for simulations (e.g., of physical systems with the Monte Carlo

method), and are central in the practice of cryptography and procedural generation.

Common classes of these algorithms are linear congruential generators, Lagged

Fibonacci generators, linear feedback shift registers, feedback with carry shift

registers, and generalized feedback shift registers. Recent instances of pseudorandom

algorithms include Blum Blum Shub, Fortuna, and the Mersenne twister.

Careful mathematical analysis is required to have any confidence a PRNG

generates numbers that are sufficiently "random" to suit the intended use. Robert R. Coveyou of Oak Ridge National Laboratory once titled an article, "The generation

of random numbers is too important to be left to chance." As John von Neumann

joked, "Anyone who considers arithmetical methods of producing random digits is, of

course, in a state of sin."

Periodicity:

A PRNG can be started from an arbitrary starting state using a seed state. It

will always produce the same sequence thereafter when initialized with that state. The

maximum length of the sequence before it begins to repeat is determined by the size

of the state, measured in bits. However, since the length of the maximum period

potentially doubles with each bit of 'state' added, it is easy to build PRNGs with

periods long enough for many practical applications.

If a PRNG's internal state contains n bits, its period can be no longer than 2^n results. For some PRNGs the period length can be calculated without walking through


the whole period. Linear Feedback Shift Registers (LFSRs) are usually chosen to have periods of exactly 2^n − 1. Linear congruential generators have periods that can be calculated by factoring. Mixes (no restrictions) have periods of about 2^(n/2) on average, usually after walking through a non-repeating starting sequence. Mixes that are reversible (permutations) have periods of about 2^(n−1) on average, and the period will always include the original internal state. Although PRNGs will repeat their results after they reach the end of their period, a repeated result does not imply that the end of the period has been reached, since its internal state may be larger than its output; this is particularly obvious with PRNGs with a 1-bit output.
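A maximal-length LFSR illustrates the period bound concretely. The Python sketch below (our own illustrative tap choice, realizing the primitive polynomial x^4 + x + 1) steps a 4-bit register until the initial state reappears:

    def lfsr_period(taps, state, nbits):
        """Step a Fibonacci LFSR until it returns to its start state and
        return the number of steps taken."""
        start, period = state, 0
        while True:
            fb = 0
            for t in taps:                      # XOR the tapped bits
                fb ^= (state >> t) & 1
            state = ((state << 1) | fb) & ((1 << nbits) - 1)
            period += 1
            if state == start:
                return period

    # Feedback from bits 3 and 2 gives a maximal-length 4-bit register.
    print(lfsr_period(taps=(3, 2), state=0b0001, nbits=4))   # 15 == 2**4 - 1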

Most pseudorandom generator algorithms produce sequences which are

uniformly distributed by any of several tests. It is an open question, and one central to

the theory and practice of cryptography, whether there is any way to distinguish the

output of a high-quality PRNG from a truly random sequence without knowing the

algorithm(s) used and the state with which it was initialized. The security of most

cryptographic algorithms and protocols using PRNGs is based on the assumption that

it is infeasible to distinguish use of a suitable PRNG from use of a truly random

sequence. The simplest examples of this dependency are stream ciphers, which (most

often) work by exclusive or-ing the plaintext of a message with the output of a PRNG,

producing cipher text. The design of cryptographically adequate PRNGs is extremely difficult because they must meet additional criteria. The size of its period

is an important factor in the cryptographic suitability of a PRNG, but not the only one.

3.4.3 "True" random numbers vs. pseudorandom numbers:

There are two principal methods used to generate random numbers. One

measures some physical phenomenon that is expected to be random and then

compensates for possible biases in the measurement process. The other uses

computational algorithms that produce long sequences of apparently random results,

which are in fact completely determined by a shorter initial value, known as a seed or

key. The latter types are often called pseudorandom number generators.

A "random number generator" based solely on deterministic computation

cannot be regarded as a "true" random number generator, since its output is inherently

predictable. How to distinguish a "true" random number from the output of a pseudo-


random number generator is a very difficult problem. However, carefully chosen

pseudo-random number generators can be used instead of true random numbers in

many applications. Rigorous statistical analysis of the output is often needed to have

confidence in the algorithm.

3.5 Encoding of LDPC codes:

• A linear block code is encoded by performing the matrix multiplication c =

uG

• A common method for finding G from H is to first make the code systematic by adding rows and exchanging columns to get the H matrix in the form H = [P^T I]

– Then G = [I P]

– However, the result of the row reduction is a non-sparse P matrix

– The multiplication c = [u uP] is therefore very complex (see the sketch after this list)

• As an example, for a (10000, 5000) code, P is 5000 by 5000

– Assuming the density of 1's in P is 0.5, then 0.5 × (5000)^2 additions are required per codeword

• This is especially problematic since we are interested in large n (> 10^5)

• An often used approach is to use the all-zero codeword in simulations.

• Richardson and Urbanke show that even for large n, the encoding complexity

can be (almost) linear function of ‘n’

- “Efficient encoding of low-density parity-check codes”, IEEE Trans. Inf.

Theory, Feb., 2001

• Using only row and column permutations, H is converted to an approximately

lower triangular matrix

- Since only permutations are used, H is still sparse


- The resulting encoding complexity is almost linear as a function of n

- An alternative involving a sparse-matrix multiply followed by

differential encoding has been proposed by Ryan, Yang, & Li….

- “Lowering the error-rate floors of moderate-length high-rate irregular

LDPC codes,” ISIT, 2003
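The systematic method above, G = [I P] from H = [P^T I], is easy to verify numerically. A toy Python sketch (the matrix P is illustrative, not an optimized LDPC matrix):

    import numpy as np

    P = np.array([[1, 0, 1],
                  [0, 1, 1]])                       # K x M, here K=2, M=3
    K, M = P.shape
    H = np.hstack([P.T, np.eye(M, dtype=int)])      # H = [P^T  I]
    G = np.hstack([np.eye(K, dtype=int), P])        # G = [I  P]

    u = np.array([1, 1])                            # message
    c = u @ G % 2                                   # codeword c = [u  uP]
    assert not np.any(H @ c % 2)                    # H.c^T = 0: valid codeword

Because G·H^T = P + P = 0 over GF(2), every encoded word satisfies all parity checks; the complexity issue is that for a real LDPC code P is large and dense, which is what the Richardson-Urbanke method avoids.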

The weak point of LDPC codes is their encoding process: a sparse parity

check matrix does not have necessarily a sparse generator matrix. Moreover, it

appears to be particularly dense. So encoding by a G multiplication yields an N^2 complexity processing. A first encoding scheme is to deal with lower-triangular shaped

parity check matrices. The other encoding schemes are mainly to deal with cyclic

parity check matrices.

3.5.1 Lower-triangular shape based encoding:

A first approach is to create a parity check matrix with an almost lower-triangular shape, as depicted on figure 2.4(a). The performance is a little bit affected by the lower-triangular shape constraint. Instead of computing the product c = u·G, the equation H·c^T = 0 is solved, where c is the unknown variable. The encoding is systematic. The last equations, those outside the triangular part, have to be solved without reduced complexity. Thus, the higher M1 is, the less complex the encoding is. T.

Richardson and R. Urbanke propose an efficient encoding of a parity check matrix H.

It is based on the shape depicted on figure 2.4-(b). They also propose some “greedy”

algorithms which transform any parity check matrix H into an equivalent parity check matrix H0 using column and row permutations, minimizing the gap g. Since only permutations are used, H0 is still sparse.

The encoding complexity scales in O(N + g^2), where g is a small fraction of N. As a particular case, some authors construct parity check matrices of the same shape with g = 0.

3.5.2 Other encoding schemes:

Iterative encoding

The authors derived a class of parity check codes which can be iteratively

encoded using the same graph-based algorithm as the decoder. But for irregular cases,

the code does not seem to perform as well as random ones.

Low-density generator matrices:

The generator matrices of LDPC codes are usually not sparse, because of the inversion. But if H = [P^T I] is constructed both sparse and systematic, then G = [I P], where G is a sparse generator matrix (LDGM): they correspond to parallel concatenated codes. They seem to have high error floors (asymptotically bad codes).

Yet, some authors carefully choose and concatenate the constituent codes to lower the error floor. Note that this may be a drawback for applications with high rate codes.

Cyclic parity-check matrices:

The most popular codes that can be easily encoded are the cyclic or pseudo-cyclic ones. A Gallager-like construction using cyclic shifts makes a cyclic-based encoder possible. Finite geometry or BIBD constructed LDPC codes are also cyclic or pseudo-cyclic. Table 2 gives a summary of the different encoding schemes.


Table 2: Summary of the different LDPC encoding schemes

3.6 Decoding LDPC codes:

Like Turbo codes, LDPC can be decoded iteratively

– Instead of a trellis, the decoding takes place on a Tanner graph

– Messages are exchanged between the v-nodes and c-nodes

– Edges of the graph act as information pathways

Hard decision decoding

– Bit-flipping algorithm

Soft decision decoding

– Sum-product algorithm

• Also known as message passing/ belief propagation algorithm

– Min-sum algorithm

• Reduced complexity approximation to the sum-product

algorithm


In general, the per-iteration complexity of LDPC codes is less than it is for turbo codes

– However, many more iterations may be required (max ≈ 100; avg ≈ 30)

Thus, overall complexity can be higher than turbo.
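For concreteness, here is a Python sketch of the min-sum check node update mentioned above: each outgoing message takes the product of the signs and the minimum magnitude of all the other incoming variable-to-check LLRs (so only the two smallest magnitudes are needed):

    import numpy as np

    def min_sum_check_update(llrs):
        """One check node update of the min-sum algorithm."""
        llrs = np.asarray(llrs, dtype=float)
        signs = np.sign(llrs)
        total_sign = np.prod(signs)
        mags = np.abs(llrs)
        order = np.argsort(mags)
        min1, min2 = mags[order[0]], mags[order[1]]
        # Exclude each edge's own contribution: its outgoing magnitude is
        # min1, unless it carries min1 itself, in which case it is min2.
        out_mags = np.where(mags == min1, min2, min1)
        return out_mags * total_sign * signs

    print(min_sum_check_update([2.0, -0.5, 1.5]))   # [-0.5  1.5 -0.5]

This avoids the hyperbolic-tangent computations of the full sum-product update, which is why the MSA is preferred for VLSI implementation.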


CHAPTER IV

TERMINOLOGY


WiMAX (Worldwide Interoperability for Microwave Access) is a telecommunications protocol that provides fixed and fully mobile internet access. The current WiMAX revision provides up to 40 Mbit/s, with the IEEE 802.16m update expected to offer up to 1 Gbit/s fixed speeds. The name "WiMAX" was created by the WiMAX Forum, which was formed in June 2001 to promote conformity and interoperability of the standard. The forum describes WiMAX as "a standards-based technology enabling the delivery of last mile wireless broadband access as an alternative to cable and DSL".

WiMAX base station equipment with a sector antenna and wireless modem on

top

A pre-WiMAX CPE of a 26 km (16 mi) connection mounted 13 metres (43 ft)

above the ground (2004, Lithuania).



WiMAX refers to interoperable implementations of the IEEE 802.16 wireless-

networks standard (ratified by the WiMAX Forum), in similarity with Wi-Fi, which

refers to interoperable implementations of the IEEE 802.11 Wireless LAN standard

(ratified by the Wi-Fi Alliance). The WiMAX Forum certification allows vendors to

sell their equipment as WiMAX (Fixed or Mobile) certified, thus ensuring a level of

interoperability with other certified products, as long as they fit the same profile.

The IEEE 802.16 standard forms the basis of 'WiMAX' and is sometimes referred

to colloquially as "WiMAX", "Fixed WiMAX", "Mobile WiMAX", "802.16d" and

"802.16e." Clarification of the formal names is as follow:

802.16-2004 is also known as 802.16d, which refers to the working party that

has developed that standard. It is sometimes referred to as "Fixed WiMAX,"

since it has no support for mobility.

802.16e-2005, often abbreviated to 802.16e, is an amendment to 802.16-2004.

It introduced support for mobility, among other things and is therefore also

known as "Mobile WiMAX".

Mobile WiMAX is the WiMAX incarnation that has the most commercial interest

to date and is being actively deployed in many countries. Mobile WiMAX is also the

basis of future revisions of WiMAX. As such, references to and comparisons with "WiMAX" in the remainder of this chapter mean "Mobile WiMAX".

Uses:

The bandwidth and range of WiMAX make it suitable for the following

potential applications:


Providing portable mobile broadband connectivity across cities and countries

through a variety of devices.

Providing a wireless alternative to cable and DSL for "last mile" broadband

access.

Providing data, telecommunications (VoIP) and IPTV services (triple play).

Providing a source of Internet connectivity as part of a business continuity

plan.

Providing a network to facilitate machine to machine communications, such as

for Smart Metering.

Broadband:

Companies are deploying WiMAX to provide mobile broadband or at-home

broadband connectivity across whole cities or countries. In many cases this has

resulted in competition in markets which typically only had access to broadband

through an existing incumbent DSL (or similar) operator.

Additionally, given the relatively low cost to deploy a WiMAX network (in

comparison to GSM, DSL or fiber-optic networks), it is now possible to provide broadband in places where it previously may not have been economically viable.

[Figure: A WiMAX USB modem for mobile Internet access.]

There are numerous devices on the market that provide connectivity to a

WiMAX network. These are known as the "subscriber unit" (SU).


There is an increasing focus on portable units. This includes handsets (similar

to cellular smart phones); PC peripherals (PC Cards or USB dongles); and embedded

devices in laptops, which are now available for Wi-Fi services. In addition, there is

much emphasis by operators on consumer electronics devices such as Gaming

consoles, MP3 players and similar devices. It is notable that WiMAX is more similar

to Wi-Fi than to 3G cellular technologies.

The WiMAX Forum website provides a list of certified devices. However, this

is not a complete list of devices available as certified modules are embedded into

laptops, MIDs (Mobile internet devices), and other private labeled devices.

WiMAX Gateways:

WiMAX gateway devices are available as both indoor and outdoor versions from

several manufacturers. Many of the WiMAX gateways that are offered by

manufacturers such as ZyXEL, Motorola, and Greenpacket are stand-alone self-install

indoor units. Such devices typically sit near the customer's window with the best

WiMAX signal, and provide:

An integrated Wi-Fi access point to provide the WiMAX Internet connectivity

to multiple devices throughout the home or business.

Ethernet ports should you wish to connect directly to your computer or DVR

instead.

One or two PSTN telephone jacks to connect your land-line phone and take

advantage of VoIP.

Indoor gateways are convenient, but radio losses mean that the subscriber may

need to be significantly closer to the WiMAX base station than with professionally-

installed external units.

Outdoor units are roughly the size of a laptop PC, and their installation is

comparable to the installation of a residential satellite dish. A higher-gain directional

outdoor unit will generally result in greatly increased range and throughput but with

the obvious loss of practical mobility of the unit.


WiMAX Mobiles:

HTC announced the first WiMAX-enabled mobile phone, the Max 4G, on November 12, 2008. The device was only available in certain markets in Russia on the Yota network.

HTC released the second WiMAX-enabled mobile phone, the EVO 4G, on March 23, 2010 at the CTIA conference in Las Vegas. The device, made available on June 4, 2010, is capable of EV-DO (3G) and WiMAX (4G) as well as simultaneous data and voice sessions. The device also has a front-facing camera enabling video conversations. A number of WiMAX mobiles are expected to hit the US market in 2010.

Technical information:

[Figure: Illustration of a WiMAX MIMO board.]

4.1 WiMAX and the IEEE 802.16 Standard:

The current WiMAX revision is based upon IEEE Std 802.16e-2005, approved

in December 2005. It is a supplement to the IEEE Std 802.16-2004, and so the actual

standard is 802.16-2004 as amended by 802.16e-2005. Thus, these specifications need

to be considered together.

IEEE 802.16e-2005 improves upon IEEE 802.16-2004 by:


Adding support for mobility (soft and hard handover between base stations).

This is seen as one of the most important aspects of 802.16e-2005, and is the

very basis of Mobile WiMAX.

Scaling of the Fast Fourier Transform (FFT) to the channel bandwidth in order to keep the carrier spacing constant across different channel bandwidths (typically 1.25 MHz, 5 MHz, 10 MHz or 20 MHz). Constant carrier spacing results in higher spectrum efficiency in wide channels and a cost reduction in narrow channels. This is also known as Scalable OFDMA (SOFDMA); a worked spacing example follows this list. Other bands not multiples of 1.25 MHz are defined in the standard, but because the allowed FFT subcarrier numbers are only 128, 512, 1024 and 2048, other frequency bands will not have exactly the same carrier spacing, which might not be optimal for implementations.

Advanced antenna diversity schemes, and hybrid automatic repeat-request (HARQ)

Adaptive Antenna Systems (AAS) and MIMO technology

Denser sub-channelization, thereby improving indoor penetration

Introducing Turbo Coding and Low-Density Parity Check (LDPC) coding

Introducing downlink sub-channelization, allowing administrators to trade

coverage for capacity or vice versa


Adding an extra QoS class for VoIP applications.
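As a worked example of the constant carrier spacing mentioned above (assuming the 28/25 oversampling factor commonly quoted for Mobile WiMAX): the subcarrier spacing is $(28/25 \times 10\,\text{MHz})/1024 \approx 10.94$ kHz, and the same value results for 1.25 MHz/128, 5 MHz/512 and 20 MHz/2048, which is exactly why scaling the FFT size with the channel bandwidth keeps the spacing constant.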

SOFDMA (used in 802.16e-2005) and OFDM256 (802.16d) are not compatible, so equipment will have to be replaced if an operator moves to the later standard (e.g., from Fixed WiMAX to Mobile WiMAX).

Physical layer:

The original version of the standard on which WiMAX is based (IEEE 802.16) specified a physical layer operating in the 10 to 66 GHz range. 802.16a, updated in 2004 to 802.16-2004, added specifications for the 2 to 11 GHz range. 802.16-2004 was updated by 802.16e-2005 in 2005 and uses Scalable Orthogonal Frequency-Division Multiple Access (SOFDMA), as opposed to the fixed Orthogonal Frequency-Division Multiple Access (OFDM) version with 256 sub-carriers (of which 200 are used) in 802.16d. More advanced versions, including 802.16e, also bring multiple antenna support through MIMO (see WiMAX MIMO). This brings potential benefits in terms of coverage, self-installation, power consumption, frequency re-use and bandwidth efficiency.

MAC (data link) layer:

The WiMAX MAC uses a scheduling algorithm in which the subscriber station needs to compete only once for initial entry into the network. After network

entry is allowed, the subscriber station is allocated an access slot by the base station.

The time slot can enlarge and contract, but remains assigned to the subscriber station,

which means that other subscribers cannot use it.

In addition to being stable under overload and over-subscription, the

scheduling algorithm can also be more bandwidth efficient. The scheduling algorithm

also allows the base station to control Quality of Service (QoS) parameters by

balancing the time-slot assignments among the application needs of the subscriber

stations.

Spectrum allocation:

There is no uniform global licensed spectrum for WiMAX; however, the WiMAX Forum has published three licensed spectrum profiles: 2.3 GHz, 2.5 GHz and 3.5 GHz, in an effort to drive standardization and decrease cost.

In the USA, the biggest segment available is around 2.5 GHz, and is already assigned, primarily to Sprint Nextel and Clearwire. Elsewhere in the world, the most

likely bands used will be the Forum approved ones, with 2.3 GHz probably being

most important in Asia. Some countries in Asia like India and Indonesia will use a

mix of 2.5 GHz, 3.3 GHz and other frequencies. Pakistan's Wateen Telecom uses

3.5 GHz.

Analog TV bands (700 MHz) may become available for WiMAX usage, but their use awaits the complete roll-out of digital TV, and other uses have been suggested for that spectrum. In the USA, the FCC auction for this spectrum began in January 2008 and, as a result, the biggest share of the spectrum went to Verizon Wireless and the next biggest to AT&T. Both of these companies have stated their intention of supporting LTE, a technology which competes directly with WiMAX. EU commissioner Viviane Reding has suggested re-allocation of 500–800 MHz spectrum for wireless communication, including WiMAX.

WiMAX profiles define channel size, TDD/FDD and other necessary

attributes in order to have inter-operating products. The current fixed profiles are

defined for both TDD and FDD profiles. At this point, all of the mobile profiles are

TDD only. The fixed profiles have channel sizes of 3.5 MHz, 5 MHz, 7 MHz and

10 MHz. The mobile profiles are 5 MHz, 8.75 MHz and 10 MHz. (Note: the 802.16

standard allows a far wider variety of channels, but only the above subsets are

supported as WiMAX profiles.)

Since October 2007, the Radiocommunication Sector of the International Telecommunication Union (ITU-R) has decided to include WiMAX technology in the IMT-2000 set of standards. This enables spectrum owners (specifically in the 2.5–2.69 GHz band at this stage) to use WiMAX equipment in any country that recognizes IMT-2000.

Spectral efficiency:

One of the significant advantages of advanced wireless systems such as WiMAX is spectral efficiency. For example, 802.16-2004 (fixed) has a spectral efficiency of 3.7 (bit/s)/Hz, and other 3.5–4G wireless systems offer spectral efficiencies that are similar to within a few tenths of a percent. Combining SOFDMA with smart antenna technologies multiplies the effective spectral efficiency through frequency reuse and smart network deployment topologies. The direct use of frequency-domain organization simplifies designs using MIMO-AAS compared to CDMA/WCDMA methods, resulting in more effective systems.

Inherent Limitations:

A commonly held misconception is that WiMAX will deliver 70 Mbit/s over 50 kilometers. Like all wireless technologies, WiMAX can operate either at higher bitrates or over longer distances, but not both: operating at the maximum range of 50 km (31 miles) increases the bit error rate and thus results in a much lower bitrate. Conversely, reducing the range (to under 1 km) allows a device to operate at higher bitrates.

A recent city-wide deployment of WiMAX in Perth, Australia, has demonstrated that customers at the cell edge with an indoor CPE typically obtain speeds of around 1–4 Mbit/s, with users closer to the cell tower obtaining speeds of up to 30 Mbit/s.

Like all wireless systems, available bandwidth is shared between users in a

given radio sector, so performance could deteriorate in the case of many active users

in a single sector. However, with adequate capacity planning and the use of WiMAX's

Quality of Service, a minimum guaranteed throughput for each subscriber can be put

in place. In practice, most users will receive services in the 4–8 Mbit/s range, and additional radio cards will be added to the base station to increase the number of users that may be served as required.

Silicon implementations:

A critical requirement for the success of a new technology is the availability of

low-cost chipsets and silicon implementations.

WiMAX has a strong silicon ecosystem with a number of specialized

companies producing baseband ICs and integrated RFICs for implementing full-

featured WiMAX Subscriber Stations in the 2.3, 2.5 and 3.5 GHz bands (refer to 'Spectrum allocation' above). It is notable that most of the major semiconductor companies have not developed WiMAX chipsets of their own and have instead chosen to invest in and/or utilize the well-developed products from smaller specialists or start-up suppliers. These companies include, but are not limited to, Beceem, Sequans and picoChip. The chipsets from these companies are used in the majority of

WiMAX devices.

Intel Corporation is a leader in promoting WiMAX, but has limited its

WiMAX chipset development and instead chosen to invest in these specialized


companies producing silicon compatible with the various WiMAX deployments

throughout the globe.

Comparison with Wi-Fi:

Comparisons and confusion between WiMAX and Wi-Fi are frequent because

both are related to wireless connectivity and Internet access.

WiMAX is a long-range system, covering many kilometres, that uses licensed or unlicensed spectrum to deliver connection to a network, in most cases the Internet.

Wi-Fi uses unlicensed spectrum to provide access to a local network.

Wi-Fi is more popular in end-user devices.

Wi-Fi uses CSMA/CA as its media access control protocol, which is connectionless and contention based, whereas WiMAX runs a connection-oriented MAC.

WiMAX and Wi-Fi have quite different quality of service (QoS) mechanisms:

o WiMAX uses a QoS mechanism based on connections between the

base station and the user device. Each connection is based on specific

scheduling algorithms.

o Wi-Fi uses contention access - all subscriber stations that wish to pass

data through a wireless access point (AP) are competing for the AP's

attention on a random interrupt basis. This can cause subscriber

stations distant from the AP to be repeatedly interrupted by closer

stations, greatly reducing their throughput.

Both 802.11 and 802.16 define Peer-to-Peer (P2P) and ad hoc networks, where an end user communicates with users or servers on another Local Area Network (LAN) using its access point or base station. However, 802.11 also supports direct ad hoc or peer-to-peer networking between end-user devices without an access point, while 802.16 end-user devices must be in range of the base station.


Wi-Fi and WiMAX are complementary. WiMAX network operators typically

provide a WiMAX Subscriber Unit which connects to the metropolitan WiMAX

network and provides Wi-Fi within the home or business for local devices (e.g., laptops, Wi-Fi handsets, smartphones) for connectivity. This enables the user to

place the WiMAX Subscriber Unit in the best reception area (such as a window), and

still be able to use the WiMAX network from any place within their residence.

WiMAX Forum:

The WiMAX Forum is a non-profit organization formed to promote the

adoption of WiMAX compatible products and services.

A major role for the organization is to certify the interoperability of WiMAX

products. Those that pass conformance and interoperability testing achieve the

"WiMAX Forum Certified" designation, and can display this mark on their products

and marketing materials. Some vendors claim that their equipment is "WiMAX-ready", "WiMAX-compliant", or "pre-WiMAX", if they are not officially WiMAX Forum Certified.

Another role of the WiMAX Forum is to promote the spread of knowledge

about WiMAX. In order to do so, it has a certified training program that is currently

offered in English and French. It also offers a series of member events and endorses

some industry events.

WiMAX Spectrum Owners Alliance:


WiSOA was the first global organization composed exclusively of owners of WiMAX

spectrum with plans to deploy WiMAX technology in those bands. WiSOA focussed

on the regulation, commercialisation, and deployment of WiMAX spectrum in the

2.3–2.5 GHz and the 3.4–3.5 GHz ranges. WiSOA merged with the Wireless

Broadband Alliance in April 2008.


CHAPTER V

Approximate Layered Decoding Approach for Pipelined Decoding

Recently, the layered decoding approach has been found to converge much faster than the conventional TPMP decoding approach. With layered decoding, the parity check matrix $H$ of an LDPC code is partitioned into $L$ layers $H_1, H_2, \ldots, H_L$. Each layer $H_t$ defines a supercode $C_t$, and the original LDPC code is the intersection of all supercodes: $C = C_1 \cap C_2 \cap \cdots \cap C_L$. The column weight of each layer is at most 1.

Let $R_{c,v}$ denote the check-to-variable message from check node $c$ to variable node $v$, and $Q_{v,c}$ the variable-to-check message from variable node $v$ to check node $c$. In the $k$-th iteration, the log-likelihood ratio (LLR) message passed from layer $t$ to the next layer for variable node $v$ is denoted $L_v^{(k,t)}$, where $1 \le t \le L$. Writing $c_t$ for the check node of layer $t$ connected to $v$, layered message passing with the Min-Sum algorithm can be formulated as (2)–(4):

$$Q_{v,c_t}^{(k)} = L_v^{(k,t-1)} - R_{c_t,v}^{(k-1)} \quad (2)$$

$$R_{c_t,v}^{(k)} = \prod_{v' \in N(c_t) \setminus v} \operatorname{sign}\!\left(Q_{v',c_t}^{(k)}\right) \cdot \min_{v' \in N(c_t) \setminus v} \left|Q_{v',c_t}^{(k)}\right| \quad (3)$$

$$L_v^{(k,t)} = Q_{v,c_t}^{(k)} + R_{c_t,v}^{(k)} \quad (4)$$

In (3), $N(c_t) \setminus v$ denotes the set of variable nodes connected to the check node $c_t$ excluding the variable node $v$. In an LDPC decoder, the check node unit (CNU) performs the computation shown in (3) and the variable node unit (VNU) performs (2) and (4). In the case that all soft messages corresponding to the 1-components in an entire block row of the parity check matrix are processed in one clock period, the computations shown in (2)–(4) are performed sequentially, and the long computation delay in the CNU inevitably limits the maximum achievable clock speed.
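To make the dataflow of (2)–(4) concrete, the following minimal Python sketch performs one exact layered Min-Sum update for a single layer. All function and variable names are our own, messages are modeled as floating-point values, and quantization and the scaling factor are omitted:

    def update_layer(L, R, layer):
        # One exact layered Min-Sum update, eqs. (2)-(4).
        # L     : list of LLR messages, one entry per variable node
        # R     : dict {check c: {variable v: R_cv}} of old CTV messages
        # layer : list of check nodes forming one layer (column weight <= 1)
        for c in layer:
            nbrs = list(R[c].keys())                           # N(c)
            Q = {v: L[v] - R[c][v] for v in nbrs}              # eq. (2)
            sign_all = 1.0
            for v in nbrs:
                sign_all *= 1.0 if Q[v] >= 0 else -1.0
            mags = sorted((abs(Q[v]), v) for v in nbrs)
            min1, v_min = mags[0]
            min2 = mags[1][0]
            for v in nbrs:
                mag = min2 if v == v_min else min1             # eq. (3):
                sgn = sign_all * (1.0 if Q[v] >= 0 else -1.0)  # exclude v itself
                R_new = sgn * mag
                L[v] = Q[v] + R_new                            # eq. (4)
                R[c][v] = R_new
        return L, R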

Pipelining techniques can usually be utilized to reduce the critical path in the computing units. However, due to the data dependency between two consecutive layers in layered decoding, pipelining cannot be applied directly. For instance, suppose that one pipelining latch stage is introduced into every CNU. To compute the messages corresponding to the third block row of $H$, the $L_v^{(k,2)}$ messages are needed, which cannot be determined until the $R_{c_2,v}^{(k)}$ messages are computed with (3). Due to the one-clock delay caused by the pipelining stage in the CNUs, the $L_v^{(k,2)}$ messages are not available in the required clock cycle. The data dependency between layer 3 and layer 2 occurs at columns 4, 8, 9, and 13, as marked by bold squares in Fig. 2. To enable pipelined decoding, we propose an approximation of the layered decoding approach. Substituting (2) into (4), the LLR update can be rewritten as

$$L_v^{(k,t)} = L_v^{(k,t-1)} + \left(R_{c_t,v}^{(k)} - R_{c_t,v}^{(k-1)}\right) \quad (5)$$

where $V_t = N(c_t)$ is the variable node set of layer $t$. The data dependency between layer $t$ and layer $t+1$ occurs in the column positions corresponding to the variable node set $S_t = V_t \cap V_{t+1}$. For the variable nodes belonging to the variable node set $S_t$, the following equation is satisfied:

$$Q_{v,c_{t+1}}^{(k)} = L_v^{(k,t-1)} + \left(R_{c_t,v}^{(k)} - R_{c_t,v}^{(k-1)}\right) - R_{c_{t+1},v}^{(k-1)} \quad (6)$$

The item $\left(R_{c_t,v}^{(k)} - R_{c_t,v}^{(k-1)}\right)$ in (6) is the incremental change of the $L_v$ message contributed by layer $t$ in the $k$-th decoding iteration, where $c_t$ represents the check node in layer $t$ that connects to $v$. In iterative LDPC decoding, (6) can be approximated using (7) if this increment is used for updating $L_v$ at a later layer instead of immediately:

$$Q_{v,c_{t+1}}^{(k)} \approx L_v^{(k,t-1)} - R_{c_{t+1},v}^{(k-1)} \quad (7)$$

Thus, by slightly changing the updating order of the $L_v$ messages corresponding to the variable node set $S_t$, (2) can be approximated by (7).

Based on the previous considerations, an approximate layered decoding approach is formulated as (8)–(10):

$$Q_{v,c_t}^{(k)} = L_v^{(k,t-1)} - R_{c_t,v}^{(k-1)} \quad (8)$$

$$R_{c_t,v}^{(k)} = \prod_{v' \in N(c_t) \setminus v} \operatorname{sign}\!\left(Q_{v',c_t}^{(k)}\right) \cdot \min_{v' \in N(c_t) \setminus v} \left|Q_{v',c_t}^{(k)}\right| \quad (9)$$

$$L_v^{(k,t)} = L_v^{(k,t-1)} + \left(R_{c_{t-\delta},v}^{(k)} - R_{c_{t-\delta},v}^{(k-1)}\right) \quad (10)$$

where $\delta$ is a small integer equal to the number of CNU pipeline stages, so the LLR increment of a layer is folded in $\delta$ layers late. In order to demonstrate the decoding performance of the proposed approach, a (3456, 1728), (3, 6), rate-0.5 QC-LDPC code


constructed with the progressive edge-growth (PEG) approach [12] is used. Its parity check matrix is permuted as discussed in Section II. The number of rows in each layer is 144. The parameter $\delta$ in (8) and (10) is set to 2 to enable two pipeline stages, and the maximum iteration number is set to 15. It can be observed from Fig. 3 that the proposed approach has about 0.05 dB performance degradation compared with the standard layered decoding scheme, while the conventional TPMP approach has about 0.2 dB performance loss compared with the standard layered decoding scheme because of its slow convergence speed. It should be noted that, by increasing the maximum iteration number, the performance gap among the three decoding schemes decreases; however, the achievable decoding throughput is then reduced.
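Under the reconstruction of (8)–(10) given above, the $\delta$-delayed update can be modeled behaviorally as follows (a sketch only, with our own names; the FIFO of pending LLR increments plays the role of the $\delta$ CNU pipeline stages):

    def approx_layered_iteration(L, R, layers, delta=2):
        # One iteration of the approximate layered schedule, eqs. (8)-(10):
        # LLR increments produced by a layer are folded into L only delta
        # layers later, modeling the delta-stage CNU pipeline latency.
        fifo = [dict() for _ in range(delta)]    # pending increments
        for layer in layers:
            for v, inc in fifo.pop(0).items():   # eq. (10): late increments
                L[v] += inc
            produced = {}
            for c in layer:
                nbrs = list(R[c].keys())
                Q = {v: L[v] - R[c][v] for v in nbrs}   # eq. (8)
                sign_all = 1.0
                for v in nbrs:
                    sign_all *= 1.0 if Q[v] >= 0 else -1.0
                mags = sorted((abs(Q[v]), v) for v in nbrs)
                min1, v_min = mags[0]
                min2 = mags[1][0]
                for v in nbrs:                          # eq. (9)
                    mag = min2 if v == v_min else min1
                    sgn = sign_all * (1.0 if Q[v] >= 0 else -1.0)
                    R_new = sgn * mag
                    produced[v] = produced.get(v, 0.0) + (R_new - R[c][v])
                    R[c][v] = R_new
            fifo.append(produced)
        for pending in fifo:                     # flush at iteration end
            for v, inc in pending.items():
                L[v] += inc
        return L, R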

5. 1 Decoder Architecture with Layered Decoding Approach

5.1.1 Overall Decoder Architecture:

The proposed decoder computes the check-to-variable messages, variable-to-check messages, and LLR messages corresponding to an entire block row of the $H$ matrix in one clock cycle. The decoder architecture is shown in Fig. 6. It consists of the following five portions.

1) $L$ layers of R-register arrays. Each layer is used to store the check-to-variable messages corresponding to the 1-components in a block row of matrix $H$. At each clock cycle, the messages in one layer are vertically shifted down to the adjacent layer.

2) A check node unit (CNU) array for generating the $R$ messages for one layer of the R-register array in a clock cycle. The dashed lines in the CNU array denote two pipeline stages.

3) LLR-register arrays. Each LLR-register array stores the LLR messages corresponding to a block column of matrix $H$.

4) Variable node unit (VNU) arrays. Each VNU array is used for computing the variable-to-check messages and LLR messages corresponding to a block column of matrix $H$. Each VNU is composed of two adders.

5) Data shifters. The messages corresponding to a block column of matrix $H$ are shifted one step by a data shifter array.

Figure 6: Decoder Architecture

In Fig. 6, each VNU, MUX, and data shifter symbol represents an array of computing units.

In the decoding initialization, the intrinsic messages are transferred to the LLR-register arrays via the MUX1 arrays. During the first clock cycles, $R$ messages are not available due to the $\delta$ pipeline stages in the CNU array; therefore, the MUX2 arrays are needed to prevent the LLR-registers from being updated. In one clock cycle, only a portion of the LLR messages is updated. The updated LLR messages, which correspond to the 1-components in the current layer of matrix $H$, are sent to the data shifter via the computation path; the remaining LLR messages are sent to the data shifter directly from the LLR-register array.


5.1.2 Critical path of the Proposed Architecture:

The computation path of the proposed architecture is shown in Fig. 5. The equations shown in (8)–(10) are performed sequentially. The computation results of (8) are represented in two's complement format, while it is convenient to use the sign-magnitude representation for the computation expressed in (9); thus, a two's complement to sign-magnitude data conversion is needed before data are sent to the CNU. The $R$ messages from the CNU array and the R-register arrays are kept in a compressed form to reduce the memory requirement (more details are explained in the next paragraph). To recover the individual $R$ messages, a data distributor is needed. The messages sent out by the data distributor are in sign-magnitude representation; consequently, a sign-magnitude to two's complement conversion is needed before data are sent to the VNU. In this design, the computation path is divided into three segments. The implementation of the SM-to-2'S unit and the adder in segment-1 can be optimized by merging the adder into the SM-to-2'S unit to reduce the computation delay.

With the Min-Sum algorithm, the critical task of a CNU is to find the two smallest magnitudes among all input data and to identify the position of the input with the smallest magnitude. An efficient CNU implementation approach has been proposed in the literature, so the dataflow in a CNU is only briefly discussed in this paper. Since six inputs are considered in this design, four computation steps are needed in a CNU. The first step is compare-and-swap; then two pseudo rank order filter (PROF) stages follow. In the last step, the two smallest magnitudes are corrected using a scaling factor (usually set to 3/4). In this way, the messages output by a CNU are in a compressed form with four elements: the smallest magnitude, the second smallest magnitude, the index of the smallest magnitude, and the signs of all messages. It can be observed that the critical path of segment-1 is three adders and four multiplexers, while the longest logic path of segment-2 includes three adders and two multiplexers. The adder in the last stage can be implemented with a [4:2] compressor and a fast adder. The data shifter can be implemented with one level of multiplexers, as detailed in the Data Shifter subsection below; thus, the computation delay of segment-3 is less than that of either segment-1 or segment-2. By inserting two pipeline stages among the three segments, the critical path of the overall decoder architecture is reduced to three adders and four 2:1 multiplexers.
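The compressed message format produced by a CNU can be mimicked in software as follows (a sketch with our own names; the expand step corresponds to the data distributor, and the 3/4 scaling is applied at compression time):

    def cnu_compress(q, scale=0.75):
        # Compress one check node's outputs: keep only the two smallest
        # magnitudes (scaled), the index of the smallest, and all sign bits.
        mags = [abs(x) for x in q]
        idx = mags.index(min(mags))
        min1 = mags[idx]
        min2 = min(m for i, m in enumerate(mags) if i != idx)
        signs = [x < 0 for x in q]
        return scale * min1, scale * min2, idx, signs

    def distribute(comp, i):
        # Recover the R message for input i from the compressed form.
        min1, min2, idx, signs = comp
        mag = min2 if i == idx else min1
        neg = (sum(signs) - signs[i]) % 2   # sign product excluding input i
        return -mag if neg else mag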


Data Shifter:

It can be seen from Fig. 2 that, after a single left cyclic shift, each circulant block of one layer becomes identical to the corresponding block of the next layer. Therefore, repeated single-step left cyclic-shift operations can ensure the message alignment for all layers in a decoding iteration. After the messages corresponding to the last block row are processed, a reverse cyclic-shift operation is needed for the next decoding iteration. Based on this observation, only the edges of the Tanner graph for the first layer of matrix $H$ are mapped to fixed hardware interconnections in the proposed decoder. A very simple data shifter, composed of one level of two-input one-output multiplexers, is utilized to perform the shifting operation for one block column of matrix $H$. Fig. 7 shows the structure of a data shifter. When the value of the control signal is 1, the shifting network performs a single-step left cyclic shift; if it is set to 0, the reverse cyclic shift is performed.
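Behaviorally, one level of 2:1 multiplexers implements the two shift settings as below (a sketch with our own names; ctrl = 1 selects the single-step left cyclic shift, and the reverse operation is modeled here as a single right step, whereas in the actual decoder the reverse connection restores the first-layer alignment):

    def shift_step(block, ctrl):
        # One data-shifter stage over a block column of messages.
        n = len(block)
        if ctrl:  # single-step left cyclic shift
            return [block[(i + 1) % n] for i in range(n)]
        return [block[(i - 1) % n] for i in range(n)]  # reverse shift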

Hardware Requirement and Throughput Estimation:

The hardware requirement of the decoder for the example LDPC code is estimated, excluding the control block and parity check block. In Table 5.1, the gate count of the computing blocks is provided; each MUX stands for a 1-bit 2-to-1 multiplexer, and each XOR represents a 1-bit two-input XOR logic unit. The register requirement is estimated in Table 5.2. In the two tables, the two word-length parameters represent the word lengths of each $R$ message and each LLR message, respectively. As analyzed above, the critical path of the proposed decoder is three adders and four multiplexers.

In the decoder architecture presented in [6], each soft message is represented with 4 bits. The critical path consists of an R-select unit, two adders, a CNU, a shifting unit, and a MUX. The computation path of a CNU has a 2'S-to-SM unit, a two-least-minimum computation unit, an offset computation unit, an SM-to-2'S unit stage, and an R-selector unit. The overall critical path is longer than ten 4-bit adders and seven multiplexers, and the post-routing frequency is 100 MHz in 0.13-µm CMOS technology. Because the critical path of the proposed decoder architecture is about one third of that of the architecture presented in [6], using 4 bits for each soft message, the clock speed of the proposed decoder architecture is estimated to be 250 MHz in the same CMOS technology. In a decoding iteration, the required number of clock cycles is 12, so a decoding process of 15 iterations needs 183 clock cycles: among them, one cycle is needed for initialization and two cycles are due to pipeline latency. Thus, the throughput of the layered decoding architecture is at least 4.7 Gb/s. Because a real design using the proposed architecture has not been completed, we can only provide a rough comparison with other designs.
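As a rough sanity check of these figures for the example (3456, 1728) code: $12 \times 15 + 1 + 2 = 183$ clock cycles per codeword, and $3456 \times 250\,\text{MHz} / 183 \approx 4.72$ Gb/s of coded throughput, consistent with the 4.7 Gb/s claim (the exact figure depends on whether coded or information bits are counted).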

Lin et al. [4] designed an LDPC decoder for a (1200, 720) code; the decoder achieves 3.33 Gb/s throughput with 8 iterations. Sha et al. [5] proposed a 1.8 Gb/s decoder with 20 iterations, targeted at a (8192, 7168) LDPC code. The decoding throughput of both decoders is less than that of the proposed architecture with 15 iterations. Gunnam et al. [6] presented an LDPC decoder architecture for (2082, 1041) array LDPC codes. With 15 iterations, it can achieve 4.6 Gb/s decoding throughput. The numbers of CNUs and VNUs are 347 and 2082, respectively.

Figure 7: Structure of a data shifter

Table 5.1: Gate count estimation for computing blocks


Table 5.2: Storage requirement estimate

It can be seen from Table 5.1 that fewer than half the computing units are needed in our pipelined architecture compared with [6]. The register requirement in our design is larger than that in [6] because an LDPC code with a larger block length, for better decoding performance, is considered in our design; the two pipeline stages in the CNU array also require additional registers. The design in [6] is only suitable for array LDPC codes, whereas the proposed decoder architecture works for generic QC-LDPC codes. We would like to mention that the proposed architecture is scalable: for example, the LDPC code considered in this paper can be partitioned into 8, 12, or 18 layers for different trade-offs between hardware cost and decoding throughput.

PLDA Architecture:

The architecture of the PLDA-based LDPC decoder is shown in Fig. 8. As described above, the APP messages, instead of the CTV messages, are passed among different layers. Therefore, for each sub-matrix, two memory blocks are needed: one to store the APP messages and another to store the CTV messages. The memory blocks are dual-port RAMs, because at every clock cycle the decoder must not only fetch messages to facilitate the variable node and check node processing, but also receive messages from other connected layers. The VNU performs the variable node processing to calculate the VTC messages using data from the APP memories and CTV message memories. The CNU performs check node processing to calculate the new CTV messages. These newly updated CTV messages are then stored back to the same locations in the CTV memories, while the updated APP messages are passed to the APP memories in other connected layers. The VNU architecture is a simple adder that updates the VTC message by subtracting the CTV value from the APP message. Several VNUs in the same row operate in parallel so that the newly calculated VTC messages can be used immediately to complete the horizontal processing. The CNU architecture performs the check node processing as in the MSA. Each CNU contains six or seven comparators, which find the minimum and second-minimum values; we adopt this comparator structure here. The absolute value of the VTC message is then compared with the minimum value from the comparator: if the absolute value of the VTC message is larger than the minimum value, the compare-and-select unit outputs the minimum value; otherwise it outputs the second-minimum value. The dedicated message passing paths among different layers are pre-determined by the modified permutation values described above. Since the message passing paths are fixed, only static wiring connections are necessary to connect the APP memories among different layers, instead of an l × l switching network. Therefore, the area and power consumption of the message passing network in PLDA become minimal. In addition, mode switching in this decoder becomes much easier. The depth of the APP memories and CTV memories is designed to be 96, which fully accommodates the maximum size of the sub-matrices; different modes can then be set by adjusting the operating period to the actual size of the sub-matrix.

Figure 8: Parallel layered decoding architecture

To evaluate the performance of our proposed PLDA, a rate-1/2 WiMax LDPC decoder with 19 different modes is synthesized and implemented on both FPGA and ASIC platforms. Implemented on a Xilinx XC2VP30 device, the maximum frequency is 66.4 MHz, corresponding to a decoding throughput of 160 Mbps with a maximum of 10 iterations. The same architecture is implemented and synthesized using TSMC 90 nm ASIC technology; in total, 152 dual-port RAMs, each of size 96 × 6 bits, are used to store the APP messages and CTV messages. The synthesized decoder can achieve a maximum throughput of 492 Mbps at 10 decoding iterations. PLDA needs only (l + 1) × Iter clock cycles for the convergence of the decoding process, where l is the size of the sub-matrix; for two comparable decoders in the literature, this number rises to l × (4 × Iter + 1) and l × (5 × Iter) + 12 clock cycles, respectively. Hence, under the same number of iterations, PLDA can reduce the decoding latency by approximately 75%. The core area of the decoder is 2.1 mm² and the power consumption is 61 mW at the maximum frequency of 204 MHz. Fig. 9 shows the layout of the decoder. The dual-port memories are generated by the Synopsys DesignWare tool and thus flattened during synthesis and place-and-route.
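As a quick numerical check of these formulas for the largest mode ($l = 96$) with Iter $= 10$: PLDA needs $(96 + 1) \times 10 = 970$ cycles, versus $96 \times (4 \times 10 + 1) = 3936$ and $96 \times (5 \times 10) + 12 = 4812$ cycles for the two reference schedules, i.e., latency reductions of roughly 75% and 80%.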

Table 5.3 shows the decoder implementation results compared with other LDPC decoders for WiMax presented in the literature. Our proposed decoder can achieve significantly higher decoding throughput with comparable chip area and power dissipation. The energy efficiency is measured by the energy required to decode per bit per iteration (pJ/bit/iter). As shown in Table 5.3, the energy efficiency of the proposed decoder is improved by about an order of magnitude compared with other existing designs.

Table 5.3: Overall comparison between the proposed decoder and other existing LDPC decoders for WiMax


Figure 9: Layout of the proposed decoder chip

5.1.3 Generic Implementation of an LDPC Decoder: About Genericity:

The main specification of the LDPC decoder described in this chapter is genericity. Genericity here has two meanings:

The decoder should be generic in the sense that it should be able to decode any LDPC code, provided it has the same size as the one fixed by the parameters of the architecture. This means that any degree distribution of the variable and check nodes should be allowed, given N, M, and P.

The decoder should also be generic in the sense that many parameters and components can be modified: for example, the parity-check matrix size, the decoding algorithm (BP, λ-min, BP-based), and the dynamic range and number of bits used for the fixed-point coding. Modifying these parameters, however, requires a new hardware synthesis.

Both of these goals are very challenging, since LDPC decoders have so far always been designed for a particular class of parity-check matrices. Moreover, genericity in the architecture description increases the complexity level. Our motivation for designing such a decoder is the possibility of running simulations much faster on an FPGA than on a PC. Simulations give, in the end, the final judgment in the comparison of the codes and of their associated architectures.

5.2 Code Structure and Decoding Algorithms:

Code Structure of WiMax. The IEEE 802.16e standard for WiMax systems uses an irregular LDPC code as the error-correction code because of its competitive bit error performance compared with regular LDPC codes. It is a quasi-cyclic LDPC code whose parity check matrix can be decomposed into several sub-matrices, each of which is either an identity matrix or a transformation of one. An Mb × Nb base parity check matrix of the rate-1/2 WiMax code is defined, where Mb is 12 and Nb is 24. The parity check matrix H is generated by expanding blank entries into l × l zero matrices and non-blank entries into l × l circularly right-shifted identity matrices. As there are 19 different modes with the sub-matrix size l ranging from 24 to 96, the shift value of each sub-matrix can be expressed as

$$p_l(i,j) = \left\lfloor \frac{p(i,j) \cdot l}{96} \right\rfloor \quad (5.1)$$

where p(i, j) is the permutation value defined for the maximum sub-matrix size l = 96 (blank entries remain blank for all l).
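As an illustration of this expansion rule, a small sketch follows (hypothetical function name; base-matrix entries use -1 for blank, as in common listings of the WiMax base matrices):

    import numpy as np

    def expand_base_matrix(Hb, l):
        # Expand a WiMax base matrix Hb (Mb x Nb) into the full parity
        # check matrix H. Entries: -1 -> l x l zero block; p >= 0 ->
        # identity cyclically right-shifted by floor(p * l / 96), eq. (5.1).
        Mb, Nb = Hb.shape
        H = np.zeros((Mb * l, Nb * l), dtype=np.uint8)
        I = np.eye(l, dtype=np.uint8)
        for i in range(Mb):
            for j in range(Nb):
                p = Hb[i, j]
                if p < 0:
                    continue
                shift = (p * l) // 96
                H[i*l:(i+1)*l, j*l:(j+1)*l] = np.roll(I, shift, axis=1)
        return H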

5.2.1 Min-Sum Decoding Algorithm:

Before presenting the MSA, we first make some definitions. Let $c_n$ denote the $n$-th bit of a codeword and $y_n$ the corresponding received value from the channel. Let $r_{mn}[k]$ be the check-to-variable (CTV) message and $q_{mn}[k]$ the variable-to-check (VTC) message between check node $m$ and variable node $n$ at the $k$-th iteration. Let $N(m)$ denote the set of variable nodes that participate in check $m$, and $M(n)$ the set of check nodes in which variable $n$ participates. The set $N(m)$ without variable $n$ is denoted $N(m) \setminus n$, and the set $M(n)$ without check $m$ is denoted $M(n) \setminus m$. The detailed steps of the MSA are described below:

1. Initialization:

Under the assumption of equal a-priori probabilities, compute the channel LLR $p_n$ (intrinsic information) of variable node $n$; for BPSK transmission over an AWGN channel with noise variance $\sigma^2$,

$$p_n = \frac{2 y_n}{\sigma^2} \quad (5.2)$$

The CTV messages $r_{mn}$ are initialized to zero.

2. Iterative Decoding:

At the $k$-th iteration, for variable node $n$, calculate the VTC message $q_{mn}[k]$ by

$$q_{mn}[k] = p_n + \sum_{m' \in M(n) \setminus m} r_{m'n}[k-1] \quad (5.3)$$

Meanwhile, the decoder can make a hard decision by calculating the APP (a-posteriori probability) value

$$\lambda_n = p_n + \sum_{m \in M(n)} r_{mn}[k-1] \quad (5.4)$$

and deciding the $n$-th bit of the decoded codeword as $x_n = 0$ if $\lambda_n > 0$ and $x_n = 1$ otherwise. The decoding process terminates when the entire codeword $x = [x_1, x_2, \ldots, x_N]$ satisfies all $M$ parity check equations, $Hx = 0$, or when the preset maximum number of iterations is reached. If the decoding process does not stop, calculate the CTV message $r_{mn}[k]$ for check node $m$ by

$$r_{mn}[k] = \alpha \prod_{n' \in N(m) \setminus n} \operatorname{sign}\!\left(q_{mn'}[k]\right) \cdot \min_{n' \in N(m) \setminus n} \left|q_{mn'}[k]\right| \quad (5.5)$$

Here, the normalization factor $\alpha$ is introduced to compensate for the performance loss of the Min-Sum algorithm compared to the standard BP algorithm. In this paper, $\alpha$ is set to be 0.75.
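Putting (5.2)–(5.5) together, a compact floating-point model of the MSA might look as follows (a behavioral sketch with our own names, using a dense H for clarity rather than efficiency):

    import numpy as np

    def minsum_decode(H, y, sigma2, alpha=0.75, max_iter=15):
        # Normalized Min-Sum decoding, eqs. (5.2)-(5.5).
        M, N = H.shape
        p = 2.0 * y / sigma2                     # (5.2) channel LLRs (BPSK/AWGN)
        r = np.zeros((M, N))                     # CTV messages r_mn
        x = (p < 0).astype(np.uint8)
        for _ in range(max_iter):
            lam = p + r.sum(axis=0)              # (5.4) a-posteriori LLRs
            x = (lam < 0).astype(np.uint8)       # x_n = 0 if lam_n > 0, else 1
            if not ((H @ x) % 2).any():          # stop when H x = 0
                break
            q = np.where(H == 1, lam - r, 0.0)   # (5.3) VTC messages
            for m in range(M):                   # (5.5) normalized Min-Sum CTV
                idx = np.flatnonzero(H[m])
                qm = q[m, idx]
                sign_all = np.prod(np.where(qm < 0, -1.0, 1.0))
                a = np.abs(qm)
                order = np.argsort(a)
                min1, min2 = a[order[0]], a[order[1]]
                for t, n in enumerate(idx):
                    mag = min2 if t == order[0] else min1
                    sgn = sign_all * (-1.0 if qm[t] < 0 else 1.0)
                    r[m, n] = alpha * sgn * mag
        return x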

5.2.2 Layered Decoding Algorithm:

In the BP algorithm and the MSA, CTV messages are updated during the horizontal step using VTC messages received from the previous iteration. During the vertical step, all VTC messages are updated with the newly obtained CTV messages from the current iteration. In other words, these two decoding steps execute iteratively with no overlapped period between them. LDA enables the check node updating process to be finished by each individual layer; therefore, VTC messages can be updated using CTV messages from the current iteration instead of old values from the previous iteration. The selected H base matrix in WiMax is well suited for horizontal LDA implementation, as it can be decomposed into 12 rows and each row can be treated as a horizontal layer. Different layers have some vertically overlapped positions, and at these positions APP messages, instead of CTV messages, can be passed from an upper layer to a lower layer within the same iteration. Recent work by E. Sharon [7] has theoretically shown that such a layered decoding algorithm, either horizontal or vertical, doubles the convergence speed in comparison with the BP algorithm.


CHAPTER VI

Xilinx ISE Tool for Synthesis and Place and Route

6.1 Xilinx ISE Tool Flow:

The Integrated Software Environment (ISE) is the Xilinx design software

suite that allows you to take the design from design entry through Xilinx device

programming. The ISE Project Navigator manages and processes the design through

the following steps in the ISE design flow.

Design Entry:

Design entry is the first step in the ISE design flow. During design entry, you

create the source files based on the design objectives. You can create the top-level

design file using a Hardware Description Language (HDL), such as VHDL, Verilog,

or ABEL, or using a schematic. You can use multiple formats for the lower-level

source files in the design. If work starts with a synthesized EDIF or NGC/NGO file, the design entry and synthesis steps can be skipped, and the flow can start with the implementation process.

Synthesis:

After design entry and optional simulation, you run synthesis. During this

step, VHDL, Verilog, or mixed language designs become netlist files that are

accepted as input to the implementation step.

Implementation:

After synthesis, you run design implementation, which converts the logical

design into a physical file format that can be downloaded to the selected target

device. From Project Navigator, you can run the implementation process in one step,

or you can run each of the implementation processes separately. Implementation

processes vary depending on whether you are targeting a Field Programmable Gate

Array (FPGA) or a Complex Programmable Logic Device (CPLD).

Verification:


You can verify the functionality of the design at several points in the design

flow. You can use simulator software to verify the functionality and timing of the

design or a portion of the design. The simulator interprets VHDL or Verilog code

into circuit functionality and displays logical results of the described HDL to

determine correct circuit operation. Simulation allows you to create and verify

complex functions in a relatively small amount of time. You can also run in-circuit

verification after programming the device.

Device Configuration:

After generating a programming file, you configure the device. During

configuration, you generate configuration files and download the programming files

from a host computer to a Xilinx device. This project uses the Xilinx ISE tool for synthesis and FPGA implementation of the LDPC decoder. The device chosen is the Spartan-3 XC3S400.

Synthesis using Xilinx ISE can be done with the following steps (a rough command-line equivalent is sketched after this list):

1) Create a new project and select the Spartan-3 family, device XC3S400.

2) Add the source code.

3) Go to Implementation.

4) Run Synthesize-XST.

5) Change to Behavioral Simulation.

6) Run the simulation of the source code.
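For reference, the GUI steps above drive the same underlying ISE command-line tools, roughly in this order: xst (synthesis), ngdbuild (translate), map, par (place and route), and bitgen (programming file generation). Exact options and file names vary with the ISE version, so this sequence is an orientation only, not a verified script.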

The figure below shows the ISE tool flow.


Figure 10: ISE Simulation Flow

CHAPTER VII


RESULTS

7.1 Simulation Results:


7.2 Schematic:


Internal Schematic:


CHAPTER VIII

APPLICATIONS

In 2003, an LDPC code beat six turbo codes to become the error correcting

code in the new DVB-S2 standard for the satellite transmission of digital television.

In 2008, LDPC beat convolutional turbo codes as the FEC scheme for the

ITU-T G.hn standard. G.hn chose LDPC over turbo codes because of its lower

decoding complexity (especially when operating at data rates close to 1 Gbit/s) and

because the proposed turbo codes exhibited a significant error floor at the desired

range of operation. LDPC is also used for 10GBase-T Ethernet, which sends data at

10 gigabits per second over twisted-pair cables.


CHAPTER IX

CONCLUSION

In this project, a high-throughput low-complexity decoder architecture for

generic LDPC codes has been presented. To enable pipelining technique for layered

decoding approach, an approximate layered decoding approach has been explored. It

has been estimated that the proposed decoder can achieve more than 4.7 Gb/s

decoding throughput at 15 iterations.

9.1 Future Scope:

The IEEE 802.16m standard is the core technology for the proposed WiMAX Release 2, which enables more efficient, faster, and more converged data communications. The IEEE 802.16m standard has been submitted to the ITU for IMT-Advanced standardization, and it is one of the major candidates for IMT-Advanced technologies selected by the ITU. Among many enhancements, IEEE 802.16m systems can provide data speeds four times faster than the current WiMAX Release 1 based on IEEE 802.16e technology.

WiMAX Release 2 will provide strong backward compatibility with Release 1

solutions. It will allow current WiMAX operators to migrate their Release 1 solutions

to Release 2 by upgrading channel cards or software of their systems. Also, the

subscribers who use currently available WiMAX devices can communicate with new

WiMAX Release 2 systems without difficulty.

It is anticipated that in a practical deployment, using 4X2 MIMO in the urban

microcell scenario with only a single 20-MHz TDD channel available system wide,

the 802.16m system can support both 120 Mbit/s downlink and 60 Mbit/s uplink per

site simultaneously. It is expected that the WiMAX Release 2 will be available

commercially in the 2011-2012 timeframe.


REFERENCES

[1] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inf. Theory, vol. IT-8, pp. 21–28, Jan. 1962.

[2] A. J. Blanksby and C. J. Howland, "A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity check code decoder," IEEE J. Solid-State Circuits, vol. 37, no. 3, pp. 404–412, Mar. 2002.

[3] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Multi-Gbit/sec low density

parity check decoders with reduced interconnect complexity,” in Proc. ISCAS, May

2005, vol. 5, pp. 5194–5197.

[4] C. Lin, K. Lin, H. Chang, and C. Lee, "A 3.33 Gb/s (1200, 720) low-density parity check code decoder," in Proc. ESSCIRC, Sep. 2005, pp. 211–214.

[5] J. Sha, M. Gao, Z. Zhang, L. Li, and Z. Wang, “Efficient decoder implementation

for QC-LDPC codes,” in Proc. ICCCAS, Jun. 2006, vol. 4, pp. 2498–2502.

[6] K. K. Gunnam, G. S. Choi, and M. B. Yeary, "A parallel VLSI architecture for layered decoding for array LDPC codes," in Proc. VLSID, Jan. 2007, pp. 738–743.

[7] E. Sharon, S. Litsyn, and J. Goldberger, “An efficient message-passing schedule

for LDPC decoding,” in Proc. 23rd IEEE Convention Elect. Electron. Eng. Israel,

Sep. 2004, pp. 223–226.

[8] D. E. Hocevar, “A reduced complexity decoder architecture via layered decoding

of LDPC codes,” in Proc. IEEE Workshop Signal Process. Syst., 2004, pp. 107–112.

[9] L. Chen, J. Xu, I. Djurdjevic, and S. Lin, "Near-Shannon-limit quasi-cyclic low-density parity-check codes," IEEE Trans. Commun., vol. 52, no. 7, pp. 1038–1042, Jul. 2004.

[10] Y. Zhang, W. E. Ryan, and Y. Li, “Structured eIRA codes with low floors,” in

Proc. Int. Symp. Inf. Theory, Sep. 2005, pp. 174–178.

[11] Z. Wang and Z. Cui, "A memory efficient partially parallel decoder architecture for QC-LDPC codes," in Proc. 39th Asilomar Conf. Signals, Syst. Comput., 2005, pp. 729–733.


[12] X.-Y. Hu, E. Eleftheriou, and D. M. Arnold, "Regular and irregular progressive edge-growth Tanner graphs," IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 386–398, Jan. 2005.

[13] J. Zhang and M. P. C. Fossorier, “Shuffled iterative decoding,” IEEE Trans.

Commun., vol. 53, no. 2, pp. 209–213, Feb. 2005.

[14] M. Ardakani and F. R. Kschischang, "Designing irregular LDPC codes using EXIT charts based on message error rate," in Proc. IEEE Int. Symp. Information Theory, Jul. 2002.

[15] M. Ardakani, T. H. Chan, and F. R. Kschischang, "Properties of the EXIT chart for one-dimensional LDPC decoding schemes," in Proc. CWIT, May 2003.

[16] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. 20, pp. 284–287, Mar. 1974.

[17] J. R. Barry, "Low-density parity-check codes," Oct. 2001. Available at http://www.ece.gatech.edu/~barry/6606/handsout/ldpc.pdf.

[18] G. Battail and A. H. M. El-Sherbini, "Coding for radio channels," Annales des Télécommunications, vol. 37, pp. 75–96, 1982.
