TELE4652 Mobile and Satellite Communication Systems

Lecture 7 – Equalisation, Diversity, and Channel Coding In this lecture we'll look at three complementary technologies that allow us to obtain high quality transmission over the radio interface, and whose development was crucial to the success of modern digital cellular networks. The first of these is equalisation, a family of techniques that implement an adaptive filter at the receiver to compensate for the inter-symbol interference (ISI) introduced by the multipath delay spread in a high speed mobile channel. Then we will consider diversity, a set of techniques to identify the individual and independent multipath components and combine them in such a way as to obtain a stronger signal at the receiver. Finally, we will take a brief excursion into the vast field of channel coding, where additional parity bits are inserted into the transmitted data stream to facilitate error detection and error correction. Without these three techniques to improve the performance of the radio link, it is inconceivable that mobile cellular networks would have reached the high level of sophistication and performance that they have today.

Equalisation As we discussed in the lecture on radio channel modelling, whenever the RMS delay spread due to multi-pathing is larger than the symbol period, $\sigma_\tau > T_s$, there will be inter-symbol interference. We classify such a channel as frequency-selective fading, since the coherence bandwidth of the channel is smaller than the signal bandwidth, so the various frequency components in the signal will be attenuated by different amounts in their passage through the channel. With the growing demand for ever increasing data rates over the air interface our channels are inevitably frequency-selective, and the resultant inter-symbol interference is something that we must live with. The effect of this ISI on the ability of the receiver to correctly recover the data can be disastrous. The overlap of data symbols produces an irreducible noise and error floor at the detector, and equalisation is the term coined for a collection of techniques to remove this ISI and as a result improve the receiver's noise performance. The name comes from the analogous operation in audio engineering, since the equaliser can be thought of as a filter which re-balances the frequency components in the signal, whose relative amplitudes have been distorted by the frequency selective channel, back to the original transmitted 'spectral balance'. The diagram illustrates the equaliser as a channel inverse filter. There are a couple of ideas that we can take from this representation. The first is that, since the mobile channel is time-varying, understood from the Doppler spread and quantified as the channel coherence time, the equaliser must be adaptive too. Thus, all practical


equalisers are adaptive equalisers, where the equaliser filter changes in response to changes in the radio channel.

The standard way to realise an adaptive equaliser is through training and tracking. First, the transmitter sends a known, pre-defined sequence, called a training sequence, to the receiver. This is called training. The receiver obtains the training sequence after it has passed over the radio channel, and by comparing the received training sequence to what it knows was sent, it can determine the properties of the channel and update its equaliser filter accordingly. This is called tracking. The basic structure of an adaptive equaliser is shown in the diagram below.

The characteristics of the radio channel, expressed as the coherence time and coherence bandwidth, determine how adaptive equalisation must be performed. The channel coherence time determines how often the training sequence must be sent and the equaliser filter updated, as the channel coherence time quantifies the rate at which the mobile channel changes. The rms delay spread determines how long the training sequence must be – it must be long enough to see the longest multipath component of the channel, so that the equaliser can compensate for its effect. Clearly adaptive equalisation involves a cost in terms of radio resources – some bandwidth must be sacrificed and devoted to this training sequence. However, the performance gain from removing this ISI at the receiver makes this cost in resources more than worthwhile.


The other important insight is that simply implementing an inverse filter for the radio channel would be disastrous in terms of noise enhancement. To see this, let $S(f)$ be the spectrum of the transmitted signal and $H(f)$ be the channel transfer function. Then the spectrum of the received signal, along with additive noise, could be written as

$$Y(f) = H(f)S(f) + N(f)$$

The inverse channel filter equaliser is then

$$H_{eq}(f) = \frac{1}{H(f)}$$

The output of the equaliser is then

$$H_{eq}(f)Y(f) = S(f) + N'(f)$$

where the noise at the output of the equaliser is coloured, with power spectral density

$$S_{N'}(f) = \frac{N_0/2}{\left| H(f) \right|^2}$$

The output noise power will be very large at any frequencies for which the channel has spectral nulls. Thus, implementing a successful equaliser is a little more complicated than simply inverting the channel at the receiver. A trade-off must be found between ISI removal and noise performance. In modern digital communication systems equalisation is performed digitally, acting as a digital filter on a symbol-by-symbol basis. In its simplest manifestation it is an FIR filter at the receiver. The input to the equaliser is the received symbol sequence, $\{y_k\}$, fed from the A/D converter and demodulator, but prior to the decision maker. Due to ISI each of these received symbols will contain contributions from earlier, and possibly later, transmitted symbols. The equaliser can then remove this ISI by subtracting a linear combination of earlier and later received symbols,

$$\hat{x}_n = \sum_{k=-L}^{M} w_k y_{n-k}$$

where the length of the filter, $L + M + 1$ taps, is determined by the multipath spread, and the filter tap weights $\{w_k\}$ are determined by some adaptive algorithm based on the received training sequence. Note that causality is not necessarily required here, as long as we are prepared to accept a delay at the receiver output while we wait for the later symbols to arrive (which may or may not be acceptable, depending on the application). The transfer function of the equaliser's FIR filter is then

$$H(z) = \sum_{k=-L}^{M} w_k z^{-k}$$
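To make the structure concrete, here is a minimal Python/numpy sketch of this non-causal FIR equaliser. The two-ray channel, the hand-picked tap values, and the helper name fir_equalise are illustrative assumptions of ours, not anything specified in the lecture:

import numpy as np

def fir_equalise(y, w, L):
    # Apply x_hat[n] = sum_{k=-L}^{M} w[k] * y[n-k].
    # w holds the taps ordered [w_{-L}, ..., w_0, ..., w_M], so taps with
    # negative index k use *later* received symbols (non-causal part).
    y = np.asarray(y, dtype=float)
    x_hat = np.zeros_like(y)
    M = len(w) - L - 1
    for n in range(len(y)):
        for k in range(-L, M + 1):
            if 0 <= n - k < len(y):
                x_hat[n] += w[k + L] * y[n - k]
    return x_hat

# Toy two-ray channel y[n] = x[n] + 0.5 x[n-1], BPSK symbols
x = np.array([1, -1, 1, 1, -1, 1, -1, -1], dtype=float)
y = x + 0.5 * np.concatenate(([0.0], x[:-1]))
# Taps w_{-1}..w_2: a truncated series approximation to 1/(1 + 0.5 z^-1)
w = np.array([0.0, 1.0, -0.5, 0.25])
print(np.sign(fir_equalise(y, w, L=1)))      # recovers the transmitted signs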

There are three basic characteristics used to describe and classify practical adaptive equalisers. The first is the type of equalisation performed: whether it acts symbol by symbol, involves feedback of past decisions, or acts on the received sequence as a whole. The second is how the equaliser filter is implemented, usually either a transverse or a lattice structure. The final characteristic of an equaliser is the adaptive algorithm employed to track the channel changes. The diagram below illustrates the main types of equalisers.


Beginning with the second point, merely as it is the one for which the least needs to be said: the following diagrams show the FIR filter implemented first as a transverse structure and then as a lattice structure. The main issue here is that, while the lattice structure is a more complex, recursive realisation, it has several practical advantages over the basic transverse filter implementation. The lattice implementation of the FIR filter offers superior numerical stability, with regard to quantisation noise and rounding errors from finite precision arithmetic. It also facilitates faster convergence in the determination of the filter tap weights, and is more suited to dynamic length assignment. For this course we'll imagine the equaliser filter to be transverse, for algebraic simplicity. Students should be aware, however, that both structures are possible and in a sense equivalent (in the sense that, given a transverse filter, one could construct an equivalent lattice filter with the same transfer function).


As for equaliser types, the linear symbol-by-symbol equaliser is the simplest to understand. It merely consists of the FIR filter, implemented either in lattice or transverse form, with coefficients determined by an adaptive algorithm, operating to equalise each received symbol, one at a time. More sophisticated structures are Maximum Likelihood Sequence Estimation (MLSE) and Decision Feedback Equalisation (DFE). MLSE is the optimal implementation of the equaliser structure, and in fact the optimal structure of a receiver in general for a channel with memory. Rather than try to implement an inverse equaliser filter acting on a symbol-by-symbol basis, the MLSE instead makes a decision on the transmitted sequence as a whole. That is, it waits and decides on a group of symbols together. The basic structure of the MLSE is an iterative estimation loop, whereby a channel estimator is implemented to mimic the action of the channel. The channel estimate and the current symbol estimates can be compared to the received symbols, and based on the difference the channel and symbol estimates can be iteratively refined.


The major issue here is that the MLSE is very computationally intensive, particularly when the channel delay spread is large, so that the MLSE must act on a large number of symbols at one time. Often an MLSE equaliser is implemented along with the Viterbi algorithm to perform the search. We will soon study this algorithm in a different guise, to decode convolutionally encoded data. In high data rate applications the computational overheads and the delay introduced mean that MLSE is often not the chosen equaliser type, and a simpler, though suboptimal, equaliser is implemented. A popular choice of equaliser is the Decision Feedback Equaliser (DFE). It is not as complex or computationally intensive as the MLSE, though it can produce more than adequate performance. The basic idea of the DFE is that the prior estimated symbols can be used, along with an estimate of the channel, to estimate the ISI affecting the current symbol. A feedback filter can then be used to subtract off this estimated ISI. The symbol estimate is thus

$$\hat{d}_k = \sum_{i=0}^{N_1} w_i y_{k-i} - \sum_{i=1}^{N_2} v_i \hat{d}_{k-i}$$

where the first (feedforward) sum filters the received symbols and the second (feedback) sum subtracts the ISI reconstructed from the past decisions $\hat{d}_{k-i}$.

Neither the DFE nor the MLSE suffers from the noise enhancement problem, since rather than attempting to implement an inverse channel filter they come at the problem from the other direction: they determine an estimate of the channel and account for the ISI from this.
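As a sketch of the feedback idea (our own illustration, assuming BPSK symbols, a hypothetical two-ray channel, and hard decisions from a simple slicer):

import numpy as np

def dfe(y, w, v):
    # Decision-feedback equaliser sketch: a feedforward filter w acts on the
    # received samples, a feedback filter v subtracts the ISI re-created from
    # past hard decisions.
    d_hat = []
    for k in range(len(y)):
        # feedforward part: sum_i w[i] * y[k-i]
        ff = sum(w[i] * y[k - i] for i in range(len(w)) if k - i >= 0)
        # feedback part: sum_i v[i] * d_hat[k-i], past decisions only
        fb = sum(v[i - 1] * d_hat[k - i] for i in range(1, len(v) + 1) if k - i >= 0)
        d_hat.append(np.sign(ff - fb))       # slicer makes the symbol decision
    return np.array(d_hat)

# Two-ray channel y[k] = d[k] + 0.5 d[k-1]: with w = [1] and v = [0.5] the
# feedback path cancels the ISI exactly (assuming correct past decisions)
d = np.array([1, -1, -1, 1, -1, 1, 1, -1], dtype=float)
y = d + 0.5 * np.concatenate(([0.0], d[:-1]))
print(dfe(y, w=[1.0], v=[0.5]))              # reproduces d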

The final issue in equalisation is the technique used to estimate the adaptive filter tap weights. This can be done in a single shot, by the zero-forcing (ZF) or MMSE approaches, or, as is most common, by an iterative technique such as the LMS or RLS algorithms. The zero-forcing (ZF) approach is to determine the best FIR approximation to the channel inverse filter. It is seldom used in practice, because of the noise enhancement problem, though it is perhaps the easiest to understand. The function of the training sequence is to allow the receiver to estimate the channel impulse response, and hence the channel transfer function $H(z)$. This can be done by a deconvolution technique or similar. The zero-forcing equaliser is then defined to be

$$H_{ZF}(z) = \frac{1}{H(z)}$$


Since $H(z)$ will be FIR (even if the channel impulse response is not finite, we will only be able to measure it over a finite number of symbols anyway), the desired equaliser is typically IIR. Thus the best FIR approximation of a given order is sought. The aim is to choose the equaliser coefficients $\{w_{-L}, \dots, w_M\}$ to minimise

$$\left| \frac{1}{H(z)} - \left( w_{-L} z^{L} + \cdots + w_M z^{-M} \right) \right|^2$$

or some similar metric. A more popular solution, with significantly superior noise performance, is the Minimum Mean Square Error (MMSE) algorithm. The idea is to choose the equaliser coefficients that produce the smallest square difference between the equaliser output and the known training symbols. It has a lot in common with the familiar Wiener filter from signal processing and the Kalman filter from digital control theory. To construct the MMSE solution, denote by $\mathbf{y}_n = [y_n, y_{n-1}, \dots, y_{n-M}]^T$ a vector formed from the current and previous M received symbols, corresponding to the known training symbols $\{x_n, x_{n-1}, \dots, x_{n-M}\}$ being sent down the channel. The equaliser filter is of order M, and denote the vector formed by its coefficients as $\mathbf{w} = [w_0, w_1, \dots, w_M]^T$. This allows us to express the filter output in vector notation,

$$\hat{x}_n = \sum_{i=0}^{M} w_i y_{n-i} = \mathbf{w}^T \mathbf{y}_n$$

The error of the equaliser is then the difference between the actual equaliser output and the known training symbol,

$$e_k = x_k - \hat{x}_k = x_k - \mathbf{w}^T \mathbf{y}_k$$

The natural quantity of interest is the mean square error,

$$\zeta = \text{MSE} = E\left[ e_k^2 \right] = E\left[ x_k^2 \right] - 2\mathbf{p}^T \mathbf{w} + \mathbf{w}^T R \mathbf{w}$$

where

$$\mathbf{p} = E\left[ x_k \mathbf{y}_k \right]$$

is the correlation vector, which measures the commonality between the received symbols at each time and the current known training symbol, and

$$R = E\left[ \mathbf{y}_k \mathbf{y}_k^T \right]$$

is the covariance matrix of the received symbols. These quantities could be determined from a statistical model of the channel, though in practice they are calculated as averages over the received training symbols. We seek the equaliser coefficients that minimise this mean square error. Taking the vector derivative of $\zeta$ with respect to $\mathbf{w}$,

$$\nabla\zeta = \left[ \frac{\partial\zeta}{\partial w_0}, \frac{\partial\zeta}{\partial w_1}, \dots, \frac{\partial\zeta}{\partial w_M} \right] = \mathbf{0}$$

gives

$$\nabla\zeta = 2R\mathbf{w} - 2\mathbf{p} = \mathbf{0}$$

and the optimal equaliser, optimal in the MMSE sense, is

$$\mathbf{w}_{MMSE} = R^{-1}\mathbf{p}$$
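A minimal sketch of this computation, estimating R and p as sample averages over a hypothetical training burst (the channel, noise level, and function name are our own illustrative choices):

import numpy as np

def mmse_taps(y, x, M):
    # Estimate the MMSE equaliser taps w = R^{-1} p from a received training
    # burst y and the known transmitted training symbols x, with R and p
    # accumulated as sample averages as described above.
    R = np.zeros((M + 1, M + 1))
    p = np.zeros(M + 1)
    n_terms = 0
    for k in range(M, len(y)):
        yk = y[k - M:k + 1][::-1]            # [y_k, y_{k-1}, ..., y_{k-M}]
        R += np.outer(yk, yk)                # accumulate E[y_k y_k^T]
        p += x[k] * yk                       # accumulate E[x_k y_k]
        n_terms += 1
    return np.linalg.solve(R / n_terms, p / n_terms)   # w = R^{-1} p

# Hypothetical training burst over a two-ray channel with a little noise
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=200)
y = x + 0.5 * np.concatenate(([0.0], x[:-1])) + 0.05 * rng.standard_normal(200)
print(np.round(mmse_taps(y, x, M=4), 3))     # roughly 1, -0.5, 0.25, ...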


It is possible to show that this equaliser doesn't suffer from the usual noise enhancement problem. For a channel with transfer function $H(z)$ and AWGN, it can be shown that the MMSE solution has transfer function

$$H_{MMSE}(z) = \frac{1}{H(z) + N_0}$$

The weakness of the MMSE approach lies in the inversion of the covariance matrix, R. For large equalisers this is quite a big matrix, and the inversion procedure can be very computationally intensive and numerically unstable (the rows can be almost linearly dependent, depending on the signal-to-noise level). Thus, in practice iterative approaches are used to avoid the need for matrix inversion. To understand where the iterative approaches come from, first observe that the selection of the optimal filter is really the solution of a simple, convex, M-dimensional optimisation problem. We seek to choose the set $\mathbf{w}$ to minimise the quadratic form,

$$J(\mathbf{w}) = \zeta = E\left[ x_k^2 \right] - 2\mathbf{p}^T \mathbf{w} + \mathbf{w}^T R \mathbf{w}$$

One way to find the global minimum is to move on this M-dimensional surface from our current 'guess' in the direction of steepest descent. We would then update our equaliser coefficients as

$$\mathbf{w}^{(j+1)} = \mathbf{w}^{(j)} + \frac{\alpha}{2}\left[ -\nabla J\left( \mathbf{w}^{(j)} \right) \right]$$

where $\alpha$ is the step size, which represents how far we move on each iteration. If $\alpha$ is small we converge slowly, though we find the eventual solution quite accurately. On the other hand, if $\alpha$ is large we move rapidly on the surface but run the danger of instability – essentially we continually hop over the desired solution on successive iterations. Here,

$$\nabla J(\mathbf{w}) = 2\left( R\mathbf{w} - \mathbf{p} \right)$$

and the steepest descent algorithm updates the equaliser coefficients iteratively as

$$\mathbf{w}^{(j+1)} = \mathbf{w}^{(j)} + \alpha\left[ \mathbf{p} - R\mathbf{w}^{(j)} \right]$$

Notice that there is no need for matrix inversion here. In general, though, we do not even need to determine the correlation vector and covariance matrix; simpler, more quickly computed approximations suffice. A simple and popular iterative algorithm is the Least Mean Square (LMS) algorithm, which updates the approximate solution in proportion to the current error. The algorithm, which can be run on a symbol-by-symbol basis, updates the equaliser as follows. Iterating over j:

1. Find the current equaliser output, $\hat{x}_k^{(j)} = \mathbf{w}^{(j)T} \mathbf{y}_k$

2. Calculate the current error, $e_k^{(j)} = x_k - \hat{x}_k^{(j)}$

3. Update the equaliser coefficients as $\mathbf{w}^{(j+1)} = \mathbf{w}^{(j)} + \alpha e_k^{(j)} \mathbf{y}_k$

For stability we require that the step size satisfy


$$0 < \alpha < \frac{2}{\sum_{i=0}^{M} \lambda_i}$$

where $\{\lambda_i\}$ are the eigenvalues of the covariance matrix, R. Since the sum of the eigenvalues equals the trace of R, this bound is typically determined in proportion to the received signal power.
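The three LMS steps above translate almost line for line into code. A minimal sketch, with an illustrative two-ray channel and step size of our own choosing:

import numpy as np

def lms_equaliser(y, x_train, M, alpha):
    # LMS adaptive equaliser run over a training burst:
    # 1) filter output  x_hat = w^T y_k
    # 2) error          e = x_k - x_hat
    # 3) tap update     w <- w + alpha * e * y_k
    w = np.zeros(M + 1)
    for k in range(M, len(x_train)):
        yk = y[k - M:k + 1][::-1]        # [y_k, y_{k-1}, ..., y_{k-M}]
        e = x_train[k] - w @ yk          # error against the known symbol
        w += alpha * e * yk              # step towards the MMSE solution
    return w

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=2000)
y = x + 0.5 * np.concatenate(([0.0], x[:-1])) + 0.05 * rng.standard_normal(2000)
# alpha = 0.01 is comfortably below 2 / trace(R) for this signal power
print(np.round(lms_equaliser(y, x, M=4, alpha=0.01), 3))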

The LMS is quite simple to implement but its convergence is very slow. A more sophisticated algorithm is the RLS (Recursive Least Squares). It is designed to iteratively minimise the cumulative square error,

$$J(n) = \sum_{i=0}^{n} \lambda^{n-i} e_i^2$$

where $\lambda$ is the forgetting factor ($0 < \lambda < 1$). The RLS procedure is:

1. Initialise $\mathbf{w}(0) = \mathbf{k}(0) = \mathbf{0}$, and $R^{-1}(0) = \delta I_{M \times M}$ for some large positive $\delta$.

2. Obtain a new input sample at a time, and iterate for that sample. Iterating over j:

3. Find the current output, $\hat{x}_k^{(j)} = \mathbf{w}^{(j)T} \mathbf{y}_k$, and the current error, $e_k^{(j)} = x_k - \hat{x}_k^{(j)}$.


4. Update the Kalman gain,

$$\mathbf{k}^{(j)} = \frac{\left( R^{(j-1)} \right)^{-1} \mathbf{y}_k}{\lambda + \mathbf{y}_k^T \left( R^{(j-1)} \right)^{-1} \mathbf{y}_k}$$

and the inverse correlation matrix,

$$\left( R^{(j)} \right)^{-1} = \frac{1}{\lambda} \left[ \left( R^{(j-1)} \right)^{-1} - \mathbf{k}^{(j)} \mathbf{y}_k^T \left( R^{(j-1)} \right)^{-1} \right]$$

5. Finally, update the equaliser coefficients, $\mathbf{w}^{(j+1)} = \mathbf{w}^{(j)} + e_k^{(j)} \mathbf{k}^{(j)}$.

The RLS is obviously more complex than the LMS, but it is found in practice to converge much more quickly to the optimal solution. In the selection of an iterative technique to determine the coefficients of the adaptive filter, the major issues are:

1. Computational complexity: the number of multiplications to be performed on each iteration.

2. Rate of convergence: how fast the algorithm locks on to the optimal solution.

3. Misalignment and error: how closely the algorithm output approaches the optimum solution, and how robust the solution is to noise.

4. Numerical properties: sensitivity to finite precision rounding errors.

In general there is a trade-off between computational complexity and rate of convergence. The table below summarises the typical performance of popular algorithms. In the table, N represents the number of taps in the equaliser. We have not discussed the last two algorithms – the interested reader can find the requisite information in the corresponding textbooks and research papers.

Algorithm            No. of multiplications per iteration   Complexity   Convergence speed   Tracking
LMS                  2N + 1                                 Low          Slow (>10N)         Poor
MMSE                 2N to 3N                               Very high    Fast (~N)           Good
RLS                  2.5N^2 + 4.5N                          High         Fast (~N)           Good
Fast Kalman DFE      20N + 5                                Low          Fast (~N)           Good
Square-root LS DFE   1.5N^2 + 6.5N                          High         Fast (~N)           Good

Diversity One might imagine that the natural characteristic of the multipath radio channel, providing the receiver with multiple independent copies of the data stream, could be turned to our advantage. If one of the components undergoes a deep fade, then it is unlikely that another component will too, if they are truly independent. Diversity techniques aim to make this a reality: identify the individual multipath components, and somehow combine them to improve the performance of our communication system. There are two basic aspects of diversity. Microdiversity considers techniques to combat the effects of small-scale fading. Macrodiversity, on the other hand, looks at ways to mitigate the effects of large scale shadowing due to buildings and other obstructions. Macrodiversity is commonly implemented at the network level by combining the signals received at different base stations. The principles


behind each type of diversity are the same; however, macrodiversity is implemented only at the higher network layers. We'll generally focus here on microdiversity techniques – the extension to macrodiversity situations is easily made.

The aim of diversity is to obtain at the receiver as many independent versions of the received signal as possible, each called a diversity 'branch'. By having multiple independent copies of the data signal, the probability of an outage, that is, that our received signal is below the acceptable threshold SNR, will be reduced. The questions then are: how can we obtain diversity branches in practice? How can we combine these multiple independent signals effectively at the receiver? What is the performance improvement that results? It is important to first appreciate, though, that the best performance comes when the branches are independent. Thinking in terms of elementary probability theory, with two branches whose individual success events are denoted A and B, the probability that the joint communication link is successful is

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

which is larger the smaller the joint probability $P(A \cap B)$ is relative to $P(A)$ and $P(B)$. For independent branches $P(A \cap B) = P(A)P(B)$, whereas for fully correlated branches $P(A \cap B)$ approaches $P(A)$ and there is no gain. There are many ways that diversity can be achieved in a practical cellular system. A base station can use several separate antennae, and as long as the antennae are spaced by a sufficient distance, the signals received at each antenna can be assumed to fade independently. This is called space diversity. Another solution is to use polarised antennae at the base station, since the two different polarisation components of the RF signal will propagate through the radio channel in very different ways. This is polarisation diversity. If we transmit the signal at two different frequencies separated by more than the channel coherence bandwidth, then these copies of the signal will fade independently


and we get frequency diversity. Finally, if we send the same signal at two different times separated by more than the channel coherence time, we obtain time diversity. Having obtained these independent copies of the signal at the receiver, there are three common ways to combine them to get a stronger resultant signal. The first is called Selection Combining (SC), where we simply select the branch with the strongest SNR to be the output signal. All this requires is that the receiver monitor and measure the signal strength on each branch – no complicated co-phasing is required. The second technique is called Equal Gain Combining (EGC). Here the receiver co-phases the signals (compensates for the different time delays) and sums them. The final technique, Maximum Ratio Combining (MRC), is the optimal one. In MRC the receiver co-phases the signals and sums them together, weighting each branch with a gain proportional to the amplitude of the received signal on that branch. All of these techniques can be represented with the same structure. The receiver weights the signal on each branch with a complex gain,

$$a_i = \alpha_i e^{-j\theta_i}$$

where the factor $\theta_i$ does the co-phasing, essentially compensating for the phase of the signal received on that branch. The output combined signal is

$$r_\Sigma(t) = \sum_{i=1}^{M} a_i r_i(t)$$

This is illustrated in the diagram below.

There are two main quantities used to assess the performance improvement from diversity. The first is the average increase in output SNR, called the array gain,

$$A_g = \frac{\bar{\gamma}_\Sigma}{\bar{\gamma}} = \frac{\text{average combined SNR}}{\text{average branch SNR}}$$

The second is the decrease in the output symbol error rate, called the diversity gain. In general when M diversity branches are used, one finds that the symbol error rate can be approximated as


$$P_s \approx c\, \bar{\gamma}^{-\chi}$$

where c depends on the type of modulation and detection used, and $\chi \le M$ is called the diversity order. The maximum diversity order that can be achieved with M diversity branches is M, when MRC is used. Let's first consider Selection Combining (SC). The output is simply the strongest branch, so the gains are all zero except for that of the strongest branch. Exact expressions for the performance improvement of this SC diversity scheme can be obtained, given models of the channels on the constituent branches. The simplest model is to assume each branch is an identical and independent Rayleigh fading channel, each having the same mean SNR, $\bar{\gamma}$. As such, the probability density of the SNR on a branch is

$$P(\gamma) = \frac{1}{\bar{\gamma}} e^{-\gamma/\bar{\gamma}}$$

For the probability of outage on a branch, we find the probability that the SNR is below some threshold level $\gamma_0$:

$$P_{out} = P(\gamma \le \gamma_0) = \int_0^{\gamma_0} P(\gamma)\, d\gamma = 1 - e^{-\gamma_0/\bar{\gamma}}$$

Now the probability of an outage in the selection diversity system is the probability that all branch SNRs are below the threshold level,

$$P_{out} = P(\gamma_\Sigma \le \gamma_0) = \left[ 1 - e^{-\gamma_0/\bar{\gamma}} \right]^M$$

This decreases as M increases – the diversity gain. We can find the probability distribution of the output SNR by differentiating the outage probability (which is the cumulative distribution function):

$$P(\gamma_\Sigma) = \frac{M}{\bar{\gamma}} \left[ 1 - e^{-\gamma_\Sigma/\bar{\gamma}} \right]^{M-1} e^{-\gamma_\Sigma/\bar{\gamma}}$$
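Before moving on to the array gain, here is a quick Monte-Carlo check of the outage expression above. It assumes i.i.d. Rayleigh branches, so each branch SNR is exponentially distributed; all numerical values are illustrative:

import numpy as np

rng = np.random.default_rng(2)
gamma_bar, gamma_0, M, trials = 10.0, 5.0, 4, 200_000

gamma = rng.exponential(gamma_bar, size=(trials, M))
gamma_sc = gamma.max(axis=1)              # SC outputs the strongest branch

simulated = np.mean(gamma_sc < gamma_0)   # all M branches below threshold
predicted = (1 - np.exp(-gamma_0 / gamma_bar)) ** M
print(simulated, predicted)               # both approximately 0.024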

The array gain is then obtained from the average output SNR,

$$\bar{\gamma}_\Sigma = \bar{\gamma} \sum_{k=1}^{M} \frac{1}{k}$$

Note here the diminishing returns in the output signal level as more and more branches are added. The diversity gain is considerably more difficult to calculate, as it requires assumptions regarding not only the channel model but also the modulation and detection used. Moreover, exact expressions are often not easily obtained. For the case of binary DPSK with non-coherent detection, an exact form can be found. The average BER with SC over M branches is

$$\bar{P}_b = \int_0^\infty \frac{1}{2} e^{-\gamma} P(\gamma_\Sigma)\, d\gamma = \frac{M}{2} \sum_{m=0}^{M-1} {}^{M-1}C_m \frac{(-1)^m}{1 + m + \bar{\gamma}}$$

SC falls well short of full diversity order. A simpler practical implementation of SC is called threshold combining. In this case we only switch branches if the SNR on the current output branch drops below a certain threshold.


The performance of threshold combining is very similar to SC. It is illustrated in the diagram below for two branches.

The optimum choice of branch weights is MRC, where each branch weight is proportional to the signal level on that branch. The output is, after co-phasing,

$$r_\Sigma(t) = \sum_{i=1}^{M} a_i r_i(t)$$

where $a_i \propto r_i(t)$. This is optimal in the sense of maximising the output SNR. The analysis is the same as that for Selection Combining above, though more algebraically complex, as the output SNR is found to follow a $\chi^2$ distribution. The main results are as follows. Array gain:

$$\bar{\gamma}_\Sigma = M \bar{\gamma}$$

Note that there are no diminishing returns as we add more diversity branches. Outage probability:

$$P_{out} = 1 - e^{-\gamma_0/\bar{\gamma}} \sum_{k=1}^{M} \frac{\left( \gamma_0/\bar{\gamma} \right)^{k-1}}{(k-1)!}$$

And the general form of the symbol error probability is

$$P_s \approx a \left( b\, \bar{\gamma} \right)^{-M}$$

where the constants a and b depend on the modulation and detection used, indicating that full diversity order is attained. Equal Gain Combining (EGC) weights all of the branches equally, along with co-phasing: $a_1 = a_2 = \cdots = a_M$. Its performance is only slightly worse than MRC's, and given its simplicity, it is a popular choice in practical systems. The above discussion has all been for diversity at the receiver. It is also possible to consider diversity at the transmitter, where the transmitter attempts to improve the communication link by transmitting to the receiver on different diversity branches (say, from separate TX antennae). When the transmitter has knowledge of the state of the channels it has access to, the problem is identical to the receive diversity schemes we have been considering above: SC, EGC, or MRC can be used by the transmitter to improve link performance.
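A short simulation comparing the array gains of the three combining rules on i.i.d. Rayleigh branches illustrates these results (the scale factor, trial count, and noise model are illustrative assumptions of ours):

import numpy as np

rng = np.random.default_rng(3)
M, trials = 4, 200_000
r = rng.rayleigh(scale=np.sqrt(0.5), size=(trials, M))   # amplitudes, E[r^2] = 1

snr_branch = r ** 2                          # per-branch SNR, gamma_bar = 1
snr_sc  = snr_branch.max(axis=1)             # SC: strongest branch only
snr_mrc = snr_branch.sum(axis=1)             # MRC: a_i proportional to r_i
snr_egc = (r.sum(axis=1) ** 2) / M           # EGC: equal weights, noise powers add

for name, s in [("SC", snr_sc), ("EGC", snr_egc), ("MRC", snr_mrc)]:
    print(name, round(s.mean(), 3))
# MRC gives M * gamma_bar = 4 exactly; SC gives sum(1/k) ~ 2.08;
# EGC comes out around 3.36, just below MRC, as claimed above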


It is still possible to achieve full diversity order when the channel state information is unknown at the transmitter (but known at the receiver). This was demonstrated by Alamouti in a famous paper, and is known as space-time (block) coding (STBC). Space-Time coding is an important aspect of MIMO (Multiple Input – Multiple Output) antenna systems, and is a central component of the planned 4G cellular networks. We will discuss these ideas in a later chapter.

RAKE Receiver A very important example of diversity principles applied to a cellular network is the RAKE receiver – a feature of CDMA networks. By their very nature, direct sequence spread spectrum signals are susceptible to ISI caused by multi-pathing, since the transmitted bandwidth is large compared to the channel coherence bandwidth. However, by selecting spreading sequences with low autocorrelation, the multipath 'echoes' will have little impact on the recovered, de-spread signal. The idea in a RAKE receiver is to identify each of the strongest multipath components by trawling through the received signal with time-shifted versions of the spreading sequence. Once a multipath component is identified it can be co-phased and combined to produce a stronger output signal, using one of the previously described diversity techniques (usually MRC, for obvious reasons).

Note also that the application of this RAKE receiver makes the CDMA system particularly well suited to soft, clean hand-offs. Two or more base stations can simultaneously transmit the same signal to a MS, and the signals from these different base stations will appear as separate RAKE components and be added to produce a stronger resultant signal. Another very important example of time diversity is interleaving. Interleaving is a systematic re-ordering of the transmitted bit sequence. When combined with channel coding, which we will discuss in the next section, this produces a very efficient and high performance communication system over the mobile fading environment. At a fundamental level, we can design channel codes that correct random bit errors distributed uniformly over the bit stream very effectively. It is much more difficult to


design and implement channel codes that correct burst errors – long consecutive sequences of bit errors due to channel fading events. With interleaving we essentially distribute the burst errors randomly over the transmitted data stream, enabling the channel code to correct them. The diagram below shows a simple example of a block interleaving scheme on a communication channel. The general aim of interleaving is to distribute neighbouring bits of the encoded sequence across the transmitted bit stream at times separated by more than the channel coherence time, so they experience essentially independent channels. The trade-off, though, is that the greater the time over which we interleave, the greater the delay at the receiver (since it must wait until all of the locally interleaved bits have been received to re-order and then decode them). This is an issue when we consider the communication of real-time data, such as voice transmission in a phone conversation.
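A minimal sketch of a rows-by-columns block interleaver of the kind described above (the dimensions, function names, and stand-in data are illustrative):

def block_interleave(bits, rows, cols):
    # Write the bit stream into a rows x cols array row by row, read it out
    # column by column. After de-interleaving, a burst of up to `rows`
    # consecutive channel errors lands at least `cols` positions apart, so a
    # random-error-correcting code can handle it.
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(bits, rows, cols):
    # Inverse operation: write column by column, read row by row
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = bits[i]
            i += 1
    return out

data = list(range(12))                       # stand-in for coded bits
tx = block_interleave(data, rows=3, cols=4)
print(tx)                                    # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(block_deinterleave(tx, 3, 4) == data)  # True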


Channel Coding The role of channel coding in communication is to insert redundant information into the transmitted bit stream to facilitate the detection and correction of errors that naturally occur in transmission over the harsh radio channel. As illustrated in the diagram below, coding lowers the bit error rate for a given signal to noise ratio, significantly improving the link performance.

Channel coding, and information theory in general, is an enormous field and we can only attempt a very brief summary here. We will first introduce the ideas of block codes, and in particular cyclic codes. These are primarily used for error detection in cellular and satellite systems. Then we will treat convolutional codes and turbo codes, which are the popular choice to provide efficient error correction in cellular networks. The basic idea of a block code is to map k input symbols into n output symbols, with $n > k$, by inserting $n - k$ parity bits to allow us to detect and correct errors that occur on the channel. We will restrict our attention to binary block codes, where both the input and output alphabets are $\{0, 1\}$, the binary field. It is possible to conceive of block codes over non-binary alphabets, and in fact the sizes of the input and output alphabets need not even be the same. Some of the most important block codes are constructed over non-binary alphabets, particularly the Reed-Solomon family of burst error correcting codes. Nevertheless, the analysis and results that we obtain for binary block codes are easy to generalise to non-binary alphabets, and we will free ourselves of this added complexity in our presentation. For our binary block code, the k input bits correspond to $2^k$ possible input binary words, and these map to $2^k$ distinct, unique codewords. We call this an $(n, k)$ code, and denote the ratio $R = k/n$ the code rate; its inverse gives the factor by which the bit stream is expanded on application of the code, due to the addition of the parity bits.


We can imagine these codewords as $2^k$ binary vectors embedded in a space of $2^n$ points. Communication of the codewords over a noisy channel will naturally result in some of the bits being received incorrectly. The error resilience of the block code, and its ability to correct errors, is fundamentally related to the distance between codewords in the code space, where the code space is the binary space of dimension n, consisting of $2^n$ discrete points.

[Diagram: a block encoder maps each message to a codeword $\mathbf{x}_m = [x_{m,1}, \dots, x_{m,N}]$, which passes through the channel $f_{Y|X}$; given the received vector $\mathbf{y} = [y_1, \dots, y_N]$, the block decoder selects $\arg\max_m \prod_{n=1}^{N} f_{Y|X}(y_n, x_{m,n})$.]

To measure the distance between points in the code space we introduce the Hamming distance. The Hamming distance between two codewords is simply the number of bit positions in which the two codewords differ. It is trivial to determine the Hamming distance between two codewords of a binary code using the Hamming weight, $w(\mathbf{c}_i)$, of a binary vector $\mathbf{c}_i$. The Hamming weight of a codeword $\mathbf{c}_i = (c_{i1}, c_{i2}, \dots, c_{in})$, with $c_{ij} \in \{0, 1\}$, is the number of '1's in the binary codeword:

$$w(\mathbf{c}_i) = \sum_{j=1}^{n} c_{ij}$$

The Hamming distance between two codewords is then found by taking the Hamming weight of the sum or difference of the codewords (note that, over the binary field, addition and subtraction are equivalent: 0 + 0 = 0 − 0 = 0; 1 + 1 = 1 − 1 = 0; 0 + 1 = 1 + 0 = 0 − 1 = 1 − 0 = 1). The addition or subtraction of two codewords yields a '0' wherever the codewords agree and a '1' wherever they disagree.

$$d_{Ham}(\mathbf{c}_i, \mathbf{c}_j) = w(\mathbf{c}_i + \mathbf{c}_j) = w(\mathbf{c}_i - \mathbf{c}_j)$$

The Hamming distance of a code is defined to be the minimum Hamming distance between any two codewords of the code,

$$d_{min} = \min_{\mathbf{c}_i, \mathbf{c}_j \in \text{Code},\ \mathbf{c}_i \neq \mathbf{c}_j} d_{Ham}(\mathbf{c}_i, \mathbf{c}_j)$$

The Hamming distance of a code ultimately determines its ability to correct errors. We can conceive of a simple conceptual model for decoding and error correction of the block code. We surround each code vector by a sphere of radius t, such that these spheres are non-overlapping – called Hamming spheres. Our received binary word $\mathbf{r}$ must be some point in the code space, and we decode by selecting the codeword $\mathbf{c}_i$ in whose Hamming sphere $\mathbf{r}$ lies. The Hamming sphere, containing all points within a Hamming distance of t from the codeword, corresponds to all binary vectors that differ from the codeword in up to t bit positions. Thus, our code is able to correct t bit errors in the codewords.
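Computing the Hamming distance of a small code is a one-liner over all codeword pairs. A sketch, using an illustrative set of four length-5 codewords (the same set reappears as a (5,2) block code example later):

from itertools import combinations

def hamming_distance(a, b):
    # Number of bit positions in which two equal-length words differ
    return sum(x != y for x, y in zip(a, b))

def code_min_distance(codewords):
    # d_min of a block code: minimum pairwise Hamming distance
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

code = [(0,0,0,0,0), (0,1,0,1,1), (1,0,1,0,1), (1,1,1,1,0)]
d_min = code_min_distance(code)
t = (d_min - 1) // 2                 # correctable errors, from d_min >= 2t + 1
print(d_min, t)                      # 3 1 -> corrects any single bit error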


The minimum distance between codewords in the codespace, the Hamming distance of the code $d_{min}$, naturally determines the number of correctable errors, t. To make the Hamming spheres disjoint we require

$$d_{min} \ge 2t + 1$$

Moreover, we can distinguish between the number of errors a code can correct, $t_c$, and the number of errors a code can detect, $t_d$, though not necessarily correct – merely that the receiver can identify that a received binary word with $t_d$ errors is not itself a codeword. Clearly we must have $t_d \ge t_c$. The number of detectable errors must satisfy

$$d_{min} \ge t_d + 1$$

We could then conceive of a code that can correct $t_c$ errors and detect $t_d$ errors if and only if it satisfies

$$d_{min} \ge 2t_c + 1 \quad \text{and} \quad d_{min} \ge t_c + t_d + 1$$

The Hamming distance of a code and the number of errors it can correct can be used to put bounds on the size of the code and the number of parity bits required. The number of points within the Hamming sphere of a codeword is

$$\sum_{j=0}^{t} \binom{n}{j} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{t}$$

as there is the codeword itself, then there are $\binom{n}{1} = {}^{n}C_1 = n$ ('n choose 1') vectors that differ from the codeword in one bit position, $\binom{n}{2} = {}^{n}C_2$ vectors that differ from the codeword in exactly two bit positions, and so on. There are $2^k$ codewords in the code space, and as there are $2^n$ points in the codespace, we must have

$$2^k \sum_{j=0}^{t} \binom{n}{j} \le 2^n$$

The above argument gives us the Hamming bound on the number of correctable errors for a block code,

$$\sum_{j=0}^{t} \binom{n}{j} \le 2^{n-k}$$

A code that achieves equality in the Hamming bound is known as a ‘perfect code’. A perfect code has the property that every point in the codespace lies within the Hamming sphere of some codeword. In a sense there are no wasted points in the codespace, and the decoder can make a decision about every single possible received


binary word. In a code that is not perfect, there are some points that are equidistant from two or more codewords, and for these the receiver cannot decide on the corresponding codeword. There are three types of perfect codes: binary repetition codes, Hamming codes, and the Golay codes. We will discuss each of these later. Perfect codes are not of great practical interest, however, since, as we said before, the main interest in coding is to be able to build large block codes while still maintaining finite complexity in the encoding and decoding operations. Perfect codes are thus not considered the 'best' codes in practice.
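The perfect-code condition is easy to check numerically. A sketch of the Hamming bound test for a few (n, k, t) triples (the third check uses the (5,2) code introduced below):

from math import comb

def hamming_bound(n, k, t):
    # Check sum_{j=0}^{t} C(n, j) <= 2^(n-k); equality means a perfect code
    lhs = sum(comb(n, j) for j in range(t + 1))
    return lhs, 2 ** (n - k), lhs == 2 ** (n - k)

print(hamming_bound(7, 4, 1))    # (8, 8, True)       -> (7,4) Hamming, perfect
print(hamming_bound(23, 12, 3))  # (2048, 2048, True) -> binary Golay, perfect
print(hamming_bound(5, 2, 1))    # (6, 8, False)      -> (5,2) example, not perfect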

[Diagram: two codewords $\mathbf{c}_1$ and $\mathbf{c}_2$ separated by $d_{min} = 5$, each surrounded by a disjoint Hamming sphere; a sphere of radius 1 contains $1 + \binom{n}{1}$ points, a sphere of radius 2 contains $1 + \binom{n}{1} + \binom{n}{2}$ points.]

A simple example of a (5,2) block code is:

Input binary word    Codeword
(0,0)                (0,0,0,0,0)
(0,1)                (0,1,0,1,1)
(1,0)                (1,0,1,0,1)
(1,1)                (1,1,1,1,0)

The Hamming distance for this code is seen to be $d_{min} = 3$, which means this code of rate 2/5 can correct a single bit error. The code is not perfect, since

$$1 + 5 = 6 < 2^{5-2} = 8$$

For example, if the receiver obtained (1,0,0,0,0) it would decode this as (0,0), since it differs from the first codeword in a single bit position, and from all other codewords in more than two bit positions. An example of an ambiguous received vector is (1,1,0,0,0), as it differs from the first codeword in two bit positions, but also from the fourth codeword in two bit positions. The receiver has no way of deciding between the two and correcting the double error that must have occurred. From a practical standpoint we usually restrict our discussion to systematic linear block codes. A systematic (n,k) code is one for which the first k bits of the n-bit codeword correspond exactly to the k input bits. An example of a systematic code is the (5,2) code presented earlier:

Input binary word    Codeword
(0,0)                (0,0,0,0,0)
(0,1)                (0,1,0,1,1)
(1,0)                (1,0,1,0,1)
(1,1)                (1,1,1,1,0)


Notice that the first two bits of each codeword are the same as the input bits. This systematic property is an important structural element in an efficient decoder. The first operation of the decoder is to establish whether or not an error occurred in the codeword. If not, then the decoder simply decodes by taking the first k of the received n bits. If an error is detected, then an algorithm can be invoked to identify and correct it. This systematic property, particularly in situations with low error rates, can lead to considerable savings in computation in the decoder. A linear code has the property that the sum of any two codewords is itself another codeword. If we denote the set of codewords as C, then

if $\mathbf{c}_i, \mathbf{c}_j \in C$, then $\mathbf{c}_i + \mathbf{c}_j \in C$

You might note that the above (5,2) code is also linear. The first useful characteristic of a linear code is the existence of a generator matrix for the code, G. The generator matrix gives an efficient technique for performing the encoding: simply multiply the input binary word $\mathbf{b}$ by the generator matrix to obtain the codeword $\mathbf{c}$,

$$\mathbf{c} = \mathbf{b}G$$

This is possible since a matrix is a linear mapping. If we take a basis for the input binary space (the obvious one is $\{(1,0,0,\dots), (0,1,0,\dots), (0,0,1,\dots), \dots\}$), then an arbitrary input can be expressed as a linear combination of these basis vectors. Thus, using the generator matrix, the resultant codeword must be the corresponding linear combination of the codewords of the basis vectors. A generator matrix can easily be constructed by considering what happens to the basis vectors. For the (5,2) example, the generator matrix is

$$G = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 1 \end{pmatrix}$$

simply the codewords corresponding to the input binary words (1,0) and (0,1). The generator matrix of a systematic code will always have the form

$$G = \left( I_{k \times k} \mid P_{k \times (n-k)} \right)$$

where P is known as the parity generating matrix, since it tells us how to determine the parity bits for a given codeword. For a linear block code, the Hamming distance of the code equals the smallest Hamming weight of a non-zero codeword (since all sums and differences of codewords are themselves codewords). Equivalently, it is the smallest number of '1's that can be produced by any linear combination of rows of the generator matrix. The most important feature of a linear code is the ability to perform syndrome decoding. To do this we define the parity check matrix,

$$H = \begin{pmatrix} P \\ I \end{pmatrix}$$

The parity check matrix has the property that acting on any codeword it must give 0:

$$\mathbf{y} = \mathbf{c}H = \mathbf{b}GH = \mathbf{b} \left( I \mid P \right) \begin{pmatrix} P \\ I \end{pmatrix} = \mathbf{b}P + \mathbf{b}P = \mathbf{0}$$


since 0 + 0 = 0 and 1 + 1 = 0, so any binary vector added to itself is equal to the zero vector. After communication over a noisy channel, the received vector can be written as the transmitted codeword plus an error vector $\mathbf{e}$, which has a '1' at any position that is in error and a '0' at all positions that are not:

$$\mathbf{r} = \mathbf{c} + \mathbf{e}$$

The decoder acts on the received vector with the parity check matrix H to determine the syndrome,

$$\mathbf{y} = \mathbf{r}H = \mathbf{c}H + \mathbf{e}H = \mathbf{e}H$$

The syndrome only depends on the error that has occurred. If there was no error then $\mathbf{y} = \mathbf{0}$; we assume that the codeword was received correctly and decode our systematic code by taking the first k bits. Otherwise the syndrome identifies the error that has occurred, independent of the codeword sent. A look-up table is commonly employed to correct the error from the syndrome. For each error that the code can correct, the associated syndrome is found by multiplying the error by the parity check matrix, $\mathbf{y} = \mathbf{e}H$. The decoder stores in memory a table of syndromes and the associated errors; if it calculates a particular syndrome it looks this up in the table, grabs the corresponding error vector, and corrects the error by adding this vector to the received vector (so switching the erroneous bits back to their original values). For the (5,2) code we have been considering, the parity check matrix is

$$H = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The code can correct single bit errors, so there are 5 error vectors that the code can correct: (1,0,0,0,0), (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), and (0,0,0,0,1). The syndromes corresponding to each of these errors, $\mathbf{y} = \mathbf{e}H$, form the look-up table:

Syndrome    Corresponding error
(1,0,1)     (1,0,0,0,0)
(0,1,1)     (0,1,0,0,0)
(1,0,0)     (0,0,1,0,0)
(0,1,0)     (0,0,0,1,0)
(0,0,1)     (0,0,0,0,1)

If the receiver obtained (1,1,1,1,0), the decoder calculates the syndrome as (0,0,0), meaning the codeword is correct and there was no error. If, for example, the second bit is in error, then (1,0,1,1,0) is received. The syndrome in this case is (0,1,1). The associated error vector from the look-up table is (0,1,0,0,0), allowing the received word to be corrected to (1,1,1,1,0). This syndrome decoding with a look-up table can be practically implemented for reasonably large code sizes.
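The whole syndrome-decoding procedure for this (5,2) code fits in a few lines. A sketch, with binary arithmetic done mod 2 in numpy:

import numpy as np

P = np.array([[1, 0, 1],
              [0, 1, 1]])                    # parity generator, G = (I | P)
H = np.vstack([P, np.eye(3, dtype=int)])     # parity check matrix, P over I

# Build the look-up table: syndrome -> single-bit error vector
table = {}
for i in range(5):
    e = np.zeros(5, dtype=int)
    e[i] = 1
    table[tuple(e @ H % 2)] = e

r = np.array([1, 0, 1, 1, 0])                # codeword 11110 with bit 2 flipped
s = tuple(r @ H % 2)                         # syndrome (0, 1, 1)
if any(s):
    r = (r + table[s]) % 2                   # add the error back to correct it
print(r[:2])                                 # systematic decode, first k bits: [1 1]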


The only real weakness is that the matrix computations, even in binary, can get quite tedious for very large matrices. Next we will see a way of circumventing this problem. The final structural property we impose on block codes to ease implementation is to require them to be cyclic. Cyclic codes are ones for which any cyclic permutation of a codeword is itself another codeword. A simple example of a cyclic code is the (6,2) repetition code:

Input binary word    Codeword
(0,0)                (0,0,0,0,0,0)
(0,1)                (0,1,0,1,0,1)
(1,0)                (1,0,1,0,1,0)
(1,1)                (1,1,1,1,1,1)

Notice that any cyclic shift of a codeword always gives another codeword. For instance, if we take 010101 and shift all bits one place to the right, we get the third codeword, 101010. The important thing about cyclic codes is that they facilitate a very efficient polynomial representation. A binary vector of length n can be mapped to a polynomial of degree n − 1 whose coefficients are over the binary field: the coordinates of the binary vector become the coefficients of the polynomial. For instance, (1,0,1,0,1,0) as a binary polynomial is

$$1 \cdot p^5 + 0 \cdot p^4 + 1 \cdot p^3 + 0 \cdot p^2 + 1 \cdot p + 0 \equiv p^5 + p^3 + p$$

A cyclic shift of the bits can easily be implemented in the polynomial representation by multiplication by an appropriate power of p. The attraction of cyclic codes is that this binary polynomial representation naturally maps to implementation at the microprocessor level, in terms of bit-shift and add operations. This will be made apparent in the example to follow. First, let's discuss the encoding and decoding procedures. Cyclic codes are characterised by a generator polynomial, $g(p)$, that defines the rule for how codewords are found from input binary words. To generate codewords in a systematic way, the input bits are represented by a binary polynomial $b(p)$. This polynomial is multiplied by $p^{n-k}$, in effect a shift of the bits $n - k$ places to the left. We then divide $p^{n-k} b(p)$ by $g(p)$ to find the remainder polynomial, $\rho(p)$. The coefficients of the remainder polynomial are the parity check bits for the codeword. The resultant codeword polynomial can then be written as

$$c(p) = p^{n-k} b(p) + \rho(p)$$

Note that this is a multiple of the generator polynomial, since by the law of division,

$$p^{n-k} b(p) = q(p) g(p) + \rho(p)$$

and all coefficients are binary. Thus, if the codeword polynomial is divided by the generator polynomial the remainder is zero. In general, after a noisy channel, the received polynomial will differ from the codeword polynomial by some error polynomial,

$$r(p) = c(p) + e(p)$$

The remainder upon dividing the received polynomial by the code generator polynomial must depend only on the error, independent of the codeword sent.


Syndrome ideas and look-up tables can then be used to find the error from this remainder, which is known as the syndrome. As an example of the encoding process, consider the (7,4) Hamming code. This is a single error correcting code, which can be described by the generator polynomial $g(p) = 1 + p^2 + p^3$. To encode (1,0,1,0), we first form

$$p^{n-k} b(p) = p^3 \left( p^3 + p \right) = p^6 + p^4$$

The parity check bits are found by dividing this by the generator polynomial. Binary polynomial long division gives quotient $p^3 + p^2 + 1$ and remainder 1:

$$p^6 + p^4 = \left( p^3 + p^2 + 1 \right)\left( p^3 + p^2 + 1 \right) + 1$$

The remainder polynomial is $\rho(p) = 1$. The transmitted codeword is then 1010001, the remainder giving the parity check bits. The above example of binary polynomial long division should make it clear how easy this procedure is to implement in a microprocessor using bit-shift and add operations. It is now time to take a look at some practical block coding schemes. Hamming Codes: Hamming codes are a class of single-error correcting perfect binary codes. A perfect single-error correcting binary code satisfies

$$1 + n = 2^{n-k}$$

Some (n,k) values that satisfy this relationship are (3,1), (7,4), (15,11), and (31,26), and Hamming codes exist for all of these (n,k) values. Constructing generator polynomials for Hamming codes relies on the observation that the generator polynomial $g(p)$ must be a factor of $p^n + 1$. Fundamentally this makes sure that we can generate codewords, multiples of $g(p)$, that give syndromes of zero. For example, the (7,4) Hamming code can be constructed from the factors of $p^7 + 1$. Factorising, we find

$$p^7 + 1 = (1 + p)\left( 1 + p + p^3 \right)\left( 1 + p^2 + p^3 \right)$$

This gives us two possible choices for the generator polynomial – either the second or the third factor. Encoding and decoding Hamming codes typically follows the syndrome technique described in earlier sections.
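The systematic cyclic encoding described above reduces to shifts and XORs, as promised. A sketch for the (7,4) example, representing each binary polynomial as an integer whose bit i is the coefficient of p^i (function names are our own):

def poly_divmod(dividend, divisor):
    # Long division of binary polynomials using shift and XOR, exactly as in
    # the worked example above (XOR = addition = subtraction over GF(2))
    q = 0
    while dividend.bit_length() >= divisor.bit_length():
        shift = dividend.bit_length() - divisor.bit_length()
        dividend ^= divisor << shift     # subtract a shifted copy of g(p)
        q |= 1 << shift
    return q, dividend                   # quotient, remainder

def cyclic_encode(b, g, n, k):
    # Systematic encoding: c(p) = p^(n-k) b(p) + remainder
    shifted = b << (n - k)               # multiply the message by p^(n-k)
    _, rho = poly_divmod(shifted, g)
    return shifted | rho

g = 0b1101                               # g(p) = p^3 + p^2 + 1
c = cyclic_encode(0b1010, g, n=7, k=4)   # message (1,0,1,0) -> b(p) = p^3 + p
print(format(c, '07b'))                  # 1010001, as in the worked example
print(poly_divmod(c, g)[1])              # 0: every codeword is a multiple of g(p)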


Cyclic Redundancy Check Codes

These codes, known as CRC codes, are widely used for error detection. They have applications in communication systems, typically at the physical layer in conjunction with ARQ schemes. They are a common feature of many international standards in many diverse systems, from error detection in serial communication to error detection in memory reading and writing at a hardware level. Some common CRC standards are:

Code           Generator polynomial                 n − k
CRC-12         p^12 + p^11 + p^3 + p^2 + p + 1      12
CRC-16 (USA)   p^16 + p^15 + p^2 + 1                16
CRC-16 (ITU)   p^16 + p^12 + p^5 + 1                16

The first example, the CRC-12 code, is well suited to systems built on 6-bit words, while the latter pair are suited to 8-bit word (byte-based) systems, as the CRC bits then represent two additional bytes at the end of the message. The implementation of CRC codes is the same as we have addressed in earlier sections, though it should be emphasised that these codes are only suitable for error detection, and cannot perform forward error correction (they require Automatic Repeat Request (ARQ)). This is commonly their application in cellular networks, where their role is to detect failures of the convolutional or turbo code used for error correction.

Bose-Chaudhuri-Hocquenghem (BCH) Codes

BCH codes are a family of binary cyclic codes with a wide variety of parameter choices, and as such are quite popular. The BCH codes are characterised by a positive integer m > 2, with the ability to correct t errors, where t < (2^m − 1)/2. The associated parameters are then:

Block length:            n = 2^m − 1
Number of message bits:  k ≥ n − mt
Hamming distance:        d_min ≥ 2t + 1

BCH codes can correct up to t random errors in a codeword. In fact, Hamming codes are a special case of BCH codes with t = 1. We will not go into how to construct generator polynomials for BCH codes, as it is a little too mathematically involved for this course. One should point out that most codes of reasonable block size have already been discovered, and in practice the engineer will select an existing code with the desired parameters and find its generator polynomial in a textbook or research paper. For example, choosing m = 4 we can design a code that corrects t = 2 random errors per codeword. The above relationships imply that we need a (15,7) block code with d_min = 5. A generator polynomial for this code, taken from the appropriate reference, is

g(p) = p^8 + p^7 + p^6 + p^4 + 1
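For a code this small the quoted minimum distance can be verified by brute force. Since the code is linear, d_min equals the smallest weight of a non-zero codeword, and every codeword is a GF(2) multiple of g(p); the following sketch (my own check, not part of the notes) enumerates all 2^7 − 1 non-zero messages.

    def gf2_mul(a, b):
        """Carry-less (GF(2)) multiplication of two bit-packed polynomials."""
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            b >>= 1
        return result

    g = 0b111010001                      # p^8 + p^7 + p^6 + p^4 + 1
    d_min = min(bin(gf2_mul(m, g)).count("1") for m in range(1, 1 << 7))
    assert d_min == 5                    # matches d_min = 2t + 1 with t = 2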

The main structural feature of BCH codes that makes them so popular is that we need not rely on syndrome decoding and the use of look-up tables. For large block sizes and large error-correcting capability, the associated look-up tables can get very large


and be quite impractical to implement. BCH codes lend themselves to two alternative algorithms for error location, and once an error is located, correction for a binary code is easy. The first algorithm is known as the Berlekamp-Massey algorithm, while the second is built on the famous Euclid's algorithm. The Berlekamp-Massey algorithm is superior, but due to patent issues Euclid's algorithm is more widely used.

Reed-Solomon Codes

RS codes are an important class of non-binary BCH codes. RS codes differ from binary codes in that they map a sequence of k symbols to a set of n encoded symbols. The symbols come from a set of size 2^m, where m corresponds to the number of bits per symbol. Alternatively, we could consider an RS code as mapping km input bits to nm encoded bits. A popular choice of m is 8, in which case we can consider an RS code as mapping k input bytes to n encoded bytes, by appending n − k parity bytes. The RS code can correct t symbol errors, no matter where they occur in the codeword. The important parameters for an RS code are:

Block length:       n = 2^m − 1 symbols
Message size:       k symbols
Parity check size:  n − k = 2t symbols
Hamming distance:   d_min = 2t + 1 symbols

For example, for m = 8 and the case of input bytes, the block length is n = 255 symbols. To correct 16 symbol errors we need d_min = 33, and find k = 223. We would require a (255,223) RS code to correct 16 symbol errors. Notice how close the code rate is to 1, even for this large error-correcting capacity. RS codes offer particularly good performance in burst error correction. In the above example the (255,223) RS code involves the transmission of 8×255 = 2040 bits per codeword. The RS decoder can correct 16 symbol errors, no matter where they occur in the codeword. Suppose these 16 symbol errors are consecutive, corresponding to 16×8 = 128 consecutive bits in error. This means the RS code can correct bursts of up to 128 consecutive bits (when the burst is aligned with symbol boundaries). This has made RS codes attractive for application in space communications, and the above RS code is at the heart of the NASA/ESA deep space coding standard (CCSDS). The weakness of the (255,223) RS code is that, while it can correct 128 consecutive bit errors, the code is broken by only 17 random bit errors affecting 17 different symbols distributed throughout the codeword. The common way around this problem is to protect the RS code by concatenating it with an inner code to correct these random bit errors. The most common choice for the inner code is a convolutional code, and this is the case in the CCSDS standard. In fact, Reed-Solomon codes concatenated with convolutional codes are still among the most successful channel coding options, for a given computational implementation complexity, even after the discovery of LDPC and turbo codes. The inner convolutional code protects against random bit errors, while the outer RS code protects against burst errors and failures of the inner code. RS codes can be represented and implemented as polynomials over a non-binary field. For the case above, the coefficients of the polynomial are taken from the Galois field GF(256). Efficient encoder structures have been designed for these codes for a variety


of microprocessor architectures. The decoding technique for RS codes is also very efficient, and follows a similar procedure to that of BCH codes, with the important distinction being once again that these operate over non-binary fields.
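The RS parameter relationships above are simple enough to capture in a few lines. The sketch below (an illustration only) reproduces the numbers quoted for the (255,223) code, including the length of the longest symbol-aligned burst the code can correct.

    def rs_parameters(m, t):
        """Parameters of a Reed-Solomon code over GF(2^m) correcting t symbol errors."""
        n = 2**m - 1              # block length in symbols
        k = n - 2 * t             # message symbols (n - k = 2t parity symbols)
        d_min = 2 * t + 1         # minimum distance in symbols
        burst_bits = t * m        # longest symbol-aligned correctable burst, in bits
        return n, k, d_min, burst_bits

    assert rs_parameters(m=8, t=16) == (255, 223, 33, 128)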

Convolutional Codes

An (n,k,m) convolutional encoder maps k input bits to n output bits, using the m previous inputs to the encoder. The difference between this and traditional block codes is the use of memory bits, whose extent is encapsulated by m. The general form of a convolutional encoder is shown in the diagram below.

[Figure: general structure of an (n,k,m) convolutional encoder. The current input vector b_l and the m previous input vectors b_{l−1}, …, b_{l−m} are held in memory; each is weighted by the generator coefficients g_j^{(l)}[p] and summed modulo 2 to form the output bits c_l^{(0)}, c_l^{(1)}, …, c_l^{(n−1)}.]

We represent the k input bits by a vector b_i = (b_i^{(1)}, b_i^{(2)}, …, b_i^{(k)}); the output c_i = (c_i^{(1)}, c_i^{(2)}, …, c_i^{(n)}) is then determined by this input and the last m input vectors, b_{i−1}, b_{i−2}, …, b_{i−m}. The encoder thus needs to remember mk input bits, which at a basic level could be implemented using mk flip-flops. The code rate of the encoder is k/n. The code is linear, as the mapping is performed modulo two, using a set of weights g_j^{(p)} = (g_j^{(p),1}, g_j^{(p),2}, …, g_j^{(p),k}), with each g_j^{(p),l} ∈ {0,1}, to determine the contribution of the pth previous input vector, b_{i−p}, to the jth output, c_i^{(j)}:

c_i^{(j)} = \sum_{p=0}^{m} \sum_{l=1}^{k} g_j^{(p),l} b_{i-p}^{(l)}   (mod 2)

It is evident that the sum of two input signals produces an output that is the sum of the two individual outputs, and that an all-zero input produces an all-zero output. The above expression also makes it clear where the term 'convolutional encoder' comes from, as the implementation is very similar to discrete-time convolution, (g ∗ h)_i = \sum_j g_j h_{i-j}.
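For the common k = 1 case the defining double sum collapses to a single discrete convolution per output. The sketch below (an illustrative implementation under that simplification, not code from the notes) encodes a bit stream against an arbitrary set of generator sequences, then checks the linearity property just stated; the particular generator sequences are chosen purely as an example.

    def conv_encode(bits, generators):
        """Rate-1/n convolutional encoder (k = 1):
        c_i^(j) = sum_p g^(j)[p] * b_(i-p)  (mod 2)."""
        m = len(generators[0]) - 1
        padded = [0] * m + list(bits)            # all-zero initial state
        return [tuple(sum(g[p] * padded[i - p] for p in range(m + 1)) % 2
                      for g in generators)
                for i in range(m, len(padded))]

    # Linearity check: encoding the sum (XOR) of two inputs equals the
    # XOR of their individual encodings.
    gens = [[1, 0, 1], [1, 1, 1]]
    a, b = [1, 0, 1, 1, 0], [0, 1, 1, 0, 1]
    lhs = conv_encode([x ^ y for x, y in zip(a, b)], gens)
    rhs = [tuple(u ^ v for u, v in zip(p, q))
           for p, q in zip(conv_encode(a, gens), conv_encode(b, gens))]
    assert lhs == rhs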


The behaviour of the encoder is well thought of as a state machine, where the states of the encoder correspond to the bits held in memory, which are the m previous input vectors to the encoder. The total number of different states of the encoder is thus 2^{mk}. It is important when a convolutional encoder is used to 'flush' the encoder, which means returning it to the all-zero state by giving it m input vectors of zeros. The convolutional encoder must initially be in the all-zero state each time we use it, as otherwise the receiver will not be able to decode the sequence, since it does not know the initial state of the encoder. The need to return the convolutional encoder to its initial state after each use allows us to interpret it as a block code. We can consider giving the encoder L input vectors, followed by a string of m zero-valued input vectors to flush the encoder. This is effectively an input block of (L + m)k input bits, of which the information part is Lk bits. In response to this input block the encoder produces an output block of (L + m)n output bits. Thus, the convolutional encoder looks a lot like an ((L + m)n, Lk) block code. In practice we can easily run the convolutional encoder for a long time, producing large block codes in a structurally simple way, and ensuring that the code rate is effectively k/n (for large L the flush overhead is negligible). The behaviour of convolutional coders is usually made clear by studying a particular encoder. The convolutional encoder we'll focus on in these notes is the famous (2,1,2) convolutional encoder shown in the diagram below.

[Figure: the (2,1,2) convolutional encoder. The input bit b_l and the two memory bits b_{l−1} and b_{l−2} feed two modulo-2 adders, producing the outputs c_l^{(0)} and c_l^{(1)}.]

The encoder above produces two output bits (c_i^{(0)}, c_i^{(1)}) for every single input bit b_i, dependent on the previous two input bits b_{i−1} and b_{i−2}. The relationships determining these output bits are

c_i^{(0)} = b_i + b_{i-2}   (mod 2)
c_i^{(1)} = b_i + b_{i-1} + b_{i-2}   (mod 2)

Following the notation above, we can describe this convolutional encoder by a set of filter weights, also called the generator sequences, g_j^{(0)} = {1, 0, 1} and g_j^{(1)} = {1, 1, 1}, such that

c_i^{(0)} = \sum_{j=0}^{m} g_j^{(0)} b_{i-j}   and   c_i^{(1)} = \sum_{j=0}^{m} g_j^{(1)} b_{i-j}   (mod 2)

We are really taking the convolution of the input bit sequence with these filter weights to determine the output bit sequence. It is common to denote these convolutional encoders in octal form, where we simply express every three bits of the generator sequences as the equivalent octal digit. This encoder is thus a {g_j^{(0)}, g_j^{(1)}} = {101, 111} ≡ {5, 7} encoder, expressed in octal.
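A direct shift-register rendering of this encoder in Python (a sketch for illustration; the notes specify only the equations) shows how little hardware it requires. Encoding the input sequence used in the example that follows reproduces the output pairs derived there.

    def encode_2_1_2(bits):
        """The (5,7) octal, (2,1,2) encoder:
        c0 = b + b2, c1 = b + b1 + b2 (mod 2)."""
        b1 = b2 = 0                        # encoder starts in the all-zero state
        out = []
        for b in bits:
            out.append((b ^ b2, b ^ b1 ^ b2))
            b1, b2 = b, b1                 # shift the register
        return out

    pairs = encode_2_1_2([1, 0, 0, 1, 1])
    assert pairs == [(1, 1), (0, 1), (1, 1), (1, 1), (1, 0)]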


Consider the following input bit sequence, b_i = 10011. The above relationships allow us to determine the output sequence in a straightforward manner:

(c_i^{(0)}, c_i^{(1)}) = {(1,1), (0,1), (1,1), (1,1), (1,0)}

This output bit sequence is usually transmitted as a single sequence with a predefined order (here c^{(1)} before c^{(0)} in each pair): 1110111101. Note that this output sequence can be formed by taking the convolution of the generator sequences, g_j^{(0)} = {1, 0, 1} and g_j^{(1)} = {1, 1, 1}, with the input bit stream. A very useful way of understanding the behaviour of convolutional encoders is to think of them as state machines. A convolutional encoder is well modelled as a state machine, since the output at any time is determined by both the input and the current state of the encoder, where we naturally interpret the state of the encoder as the previous input bits currently stored in the encoder's memory. The input and the current state together determine the next state of the encoder. A convolutional encoder has in general 2^{mk} states, representing all the possible combinations of the bits currently stored in memory. Our (2,1,2) convolutional encoder thus has 2^2 = 4 states. In the diagram, we denote these states as (b_{i−1}, b_{i−2}). The transitions between these states are labelled b_i (c_i^{(0)}, c_i^{(1)}), where b_i is the current input bit, which ultimately decides the next encoder state (since the current input will become the previous input at the next clock cycle), and (c_i^{(0)}, c_i^{(1)}) are the output bits for this state and input. We will find the state diagram very useful for analysing the error performance of the convolutional encoder. A different, though less compact, representation is useful for understanding the optimal decoding algorithm. This is the trellis diagram, which shows the possible states of the encoder at each time instant, and how the encoder can change to each different state in the next time interval.

[Figure: state diagram of the (2,1,2) encoder. The four states (b_{i−1}, b_{i−2}) are (0,0), (0,1), (1,0), and (1,1); each transition is labelled b_i (c_i^{(0)}, c_i^{(1)}).]


[Figure: trellis diagram of the (2,1,2) encoder. Each column shows the four possible states at one time instant, with branches labelled b_i (c_i^{(0)}, c_i^{(1)}); the final two stages show the flush transitions back to state (0,0).]

Shown on the trellis diagram in red/bold is the path the encoder follows through the trellis in response to the input sequence given earlier, 10011, followed by two 0s to flush the encoder. The trellis diagram shows very clearly that the encoder can be in any one of four possible states at each time instant, depending on the input sequence. It also illustrates that each input bit sequence produces a different path through the trellis. If we give the encoder an input bit sequence of length L, then there are 2^L distinct paths through the trellis. The task of our decoder now becomes: given the received bit sequence, determine the most likely path through the trellis, and hence decode the message. This trellis idea is important, and is the source of the error robustness of the convolutional encoder. Not all transitions are possible in the trellis, and this additional structure helps us identify and correct errors. The optimal algorithm for determining the most likely trellis path the encoder followed, given the received bit sequence, will now be described. This is known as the Viterbi algorithm. A naïve implementation of the optimal decoder would be to wait until the entire bit sequence has been received, and then compare this bit sequence to all 2^{kL} possible sequences to find the one that is closest to the received sequence. Naturally this is completely impractical for any relatively long input sequence, and long input sequences are what we are after to approach channel capacity. The Viterbi algorithm, developed by Andrew Viterbi in the late 1960s, allows us to perform this comparison in an iterative and efficient way. The Viterbi algorithm records a best path for each of the states of the trellis, and updates these paths for each received bit group corresponding to a single operation of the convolutional encoder. The key observation is that the current received bits cannot change the best paths ending at each state, only how these best paths will be updated in the next iteration. This greatly simplifies our work, as for each received encoder output we only need to perform a number of correlations equal to the number of states of the encoder, and not the number of paths through the trellis. Let's formulate the algorithm, and then illustrate it with an example using our (2,1,2) convolutional encoder. We can denote a path through the trellis in one of two ways:


either by the sequence of inputs, b_{0:l}^{(i)} = (b_0^{(i)}, b_1^{(i)}, …, b_l^{(i)}), that causes that path through the trellis; or by c_{0:l}^{(i)} = (c_0^{(i)}, c_1^{(i)}, …, c_l^{(i)}), the sequence of encoder outputs that would be transmitted corresponding to that trellis path. The index i refers to the ith path through the trellis. After l inputs there should be 2^{kl} different paths, but as we mentioned earlier we will only remember 2^{km} paths, one for each state of our trellis. The important point then is what we mean by 'best path'. There are two conceivable metrics for us to consider, depending on whether we implement soft-decision or hard-decision decoding at the receiver. For hard-decision decoding, we can compare trellis paths with the received sequence using the Hamming distance. Our 'best path' will be the one that has the smallest Hamming distance from the received sequence – that is, the path that differs from the received bit sequence in the least number of bits. The metric for hard-decision decoding of a path up to length l is

d(c_{0:l}^{(i)}, r_{0:l}) = \sum_{j=0}^{l} d_hamm(c_j^{(i)}, r_j)

where r_{0:l} = (r_0, r_1, …, r_l) is the received bit sequence, after hard-decision demodulation. The important observation is that we can update this distance for the next step of the path, at l + 1, by adding to the existing path distance the distance between the new received bits and the codeword bits that extend the path:

d(c_{0:l+1}^{(j)}, r_{0:l+1}) = d(c_{0:l}^{(i)}, r_{0:l}) + d_hamm(c_{l→l+1}^{(i→j)}, r_{l+1})

where c_{l→l+1}^{(i→j)} denotes the codeword bits that would be sent if the ith path after l inputs is

extended to be the jth path after l + 1 inputs. The procedure of the Viterbi decoding algorithm is then as follows. We remember the best path – the one that is closest to the received sequence – for each state of the encoder. For these paths we record the input bit sequence that corresponds to that path, {b_{0:l}^{(i)}}_{i=1}^{2^{km}}, the corresponding output bit sequence from the encoder, {c_{0:l}^{(i)}}_{i=1}^{2^{km}} (though we will see that we do not need to keep this in memory), and the distance of this path from the received sequence, {d(c_{0:l}^{(i)})}_{i=1}^{2^{km}}. We then receive the next group of bits, r_{l+1}, corresponding to what is received when the encoder output at time l + 1 is sent over the channel. We iterate through each state of the encoder, and update the paths to find a new set of best paths for each state, {b_{0:l+1}^{(i)}, c_{0:l+1}^{(i)}, d(c_{0:l+1}^{(i)})}_{i=1}^{2^{km}}, where the path metrics are updated via

d(c_{0:l+1}^{(j)}, r_{0:l+1}) = d(c_{0:l}^{(i)}, r_{0:l}) + d_hamm(c_{l→l+1}^{(i→j)}, r_{l+1})

for hard-decision decoding. There are 2^k possible paths that could be updated into each possible state of the encoder, so for each received group there will be 2^k × 2^{km} correlations and comparisons to perform to obtain the 2^{km} new best paths. Let's work through an example of the Viterbi algorithm for the (2,1,2) convolutional encoder that we have been studying. Let the received bit sequence be 10 10 11 01 01 01. Now, inspection of the trellis indicates that after two iterations there is a single path that leads to each of the four possible states of the encoder. Thus, we can


begin our decoding algorithm at the second stage. The table below gives our path table, listing for each of the four states of the encoder the path expressed in both input and output bits, and in how many bits each path differs from the first two received bit pairs, 10 10. Here we've assumed hard-decision decoding, so the distance shown is the Hamming distance.

State   b_{0:l}^{(i)}   c_{0:l}^{(i)}   d(c_{0:l}^{(i)})
00      00              00 00           2
01      10              11 10           1
10      01              00 11           2
11      11              11 01           3

From this path table we can immediately see that at least one of the first four received bits is in error, as all four possible paths that could thus far have been followed through the trellis differ from the received bits in at least one bit position. Our next step in the Viterbi algorithm is to update this path table, given the next received bit pair, 11. We begin with state 00. According to the trellis there are only two ways we can enter state 00: we could come from state 00, if the input bit was 0, in which case 00 would be sent; or we could come from state 01, with input 0, in which case 11 would be the codeword bits. This means that there are two paths we could potentially update to end at state 00 – the path that led to 00 and the path that led to 01 – and our aim is to determine which is the better one. Now, the path that ended at 00 had a Hamming distance of 2 from the previous received sequence, and the encoded bits 00 differ in an additional 2 bits from what we received, 11. Hence, updating this path would result in a distance of 4 from the received sequence. Compare this to updating the 01 path to end at 00: the path to 01 had a Hamming distance of 1 from the received sequence, and the codeword bits 11 match the received bits, so the total distance of this path is only 1. Thus, the best path into 00 is found by extending the best path to 01 one step. We then proceed with three similar path comparisons for states 01, 10, and 11, and update our path table to

State   b_{0:l}^{(i)}   c_{0:l}^{(i)}   d(c_{0:l}^{(i)})
00      100             11 10 11        1
01      010             00 11 10        3
10      001             00 00 11        2
11      011             00 11 01        3

The next received bits are 01, and we update our path table given this received bit pair. Looking again at the situation for state 00, we consider updating either the path from 00 or the path from 01. The path from 00 differs thus far in 1 bit from the received sequence, and this next step would introduce a further bit difference, comparing the codeword bits 00 to the received bits 01; thus it would have a total distance of 2. On the other hand, the path from 01 differs from what has been previously received in 3 bits, and there is an additional bit difference for this extra hop (codeword 11 compared to the received 01). Thus, we choose to update the path


from 00 to 00 in this next iteration. The full updated path list, after these received bits, is

State   b_{0:l}^{(i)}   c_{0:l}^{(i)}        d(c_{0:l}^{(i)})
00      1000            11 10 11 00          2
01      0110            00 11 01 01          3
10      1001            11 10 11 11          2
11      0011            00 00 11 01          2

Updating this again for the fifth received bit pair, 01, the path table becomes

State   b_{0:l}^{(i)}   c_{0:l}^{(i)}        d(c_{0:l}^{(i)})
00      10000           11 10 11 00 00       3
01      00110           00 00 11 01 01       2
10      10001           11 10 11 00 11       3
11      10011           11 10 11 11 01       2

There must therefore have been at least two bit errors within the first ten received bits. We could continue updating these paths for more and more received bits. If we were to make a decision here, after only ten received bits, we would have two equally likely paths – the path ending at state 01 and the path ending at state 11. In some respects this means our algorithm has not yet successfully decoded the bit sequence. Updating the table once more for the next received bit pair, 01, we obtain

State   b_{0:l}^{(i)}   c_{0:l}^{(i)}           d(c_{0:l}^{(i)})
00      001100          00 00 11 01 01 11       3
01      100110          11 10 11 11 01 01       2
10      001101          00 00 11 01 01 00       3
11      100011          11 10 11 00 11 01       3

Now we have a clear winner, and would decode the received sequence as 100110. Note that we have been able to correct two bit errors, as we believe that 11 10 11 11 01 01 was sent while 10 10 11 01 01 01 was received. The ability to correct two errors in twelve bits at a code rate of 1/2 shows the good performance of the convolutional code – it would be hard to build a block code with these constraints of equivalent performance. Indeed, convolutional codes are very powerful at correcting random bit errors in a binary sequence. In practice we do not need to remember the codeword sequences, {c_{0:l}^{(i)}}, as the only important quantity is their distance from the received sequence. As we were beginning to see in the above example, after a while the beginnings of all the best paths look the same, and the paths differ only in the last few bit positions. In practice one would make a decision about these earlier bits in the sequence, so that the decoder does not need to wait until the entire sequence has been received before passing decoded bits to the output. This greatly reduces throughput delays in a system in which the coder is used. A general rule of thumb is that we need only remember sequences up to P steps into the past, where P ≥ 5m, and can decide on the earlier


part of the sequence and pass these decoded bits to the output. This also has the advantage of reducing the memory needed to store the sequences. For our (2,1,2) convolutional encoder this would mean that we need only remember paths to lengths of approximately 10 codeword pairs, or 10 bits (as we remember each path as its input bits), and can make decisions on paths beyond this length. One should point out that this is only a rough rule of thumb, and needs to be tweaked slightly in practice. For example, the IS-95 CDMA standard for mobile communications specified a (3,1,9) convolutional encoder for the reverse link. Such an encoder has 2^9 = 512 states, so the decoder is required to remember 512 paths of approximately 45 bits each, requiring 2880 bytes. For each received bit triple the decoder has to perform 2^{10} = 1024 correlations and comparisons. Calculations of this order can easily be performed with the computing power available today, even for soft-decision decoding.
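The complete hard-decision Viterbi decoder for our (2,1,2) code fits comfortably in a page of Python. The sketch below (an illustration of the algorithm as formulated above, with bit pairs taken in the transmitted c^{(1)}-before-c^{(0)} order) reproduces the worked example, recovering 100110 from the received sequence 10 10 11 01 01 01.

    def viterbi_2_1_2(received_pairs):
        """Hard-decision Viterbi decoding for the (2,1,2) encoder.
        received_pairs: received bit pairs in transmitted order (c1, c0).
        Returns the most likely input bit sequence."""
        # Survivor per state (b1, b2): (Hamming metric, input history).
        # The encoder is known to start in the all-zero state.
        best = {(0, 0): (0, [])}
        for r1, r0 in received_pairs:
            new_best = {}
            for (b1, b2), (metric, history) in best.items():
                for b in (0, 1):                    # try both input bits
                    c0, c1 = b ^ b2, b ^ b1 ^ b2    # branch output bits
                    d = metric + (c1 ^ r1) + (c0 ^ r0)
                    nxt = (b, b1)                   # next encoder state
                    if nxt not in new_best or d < new_best[nxt][0]:
                        new_best[nxt] = (d, history + [b])
            best = new_best
        return min(best.values())[1]                # smallest metric survives

    rx = [(1, 0), (1, 0), (1, 1), (0, 1), (0, 1), (0, 1)]
    assert viterbi_2_1_2(rx) == [1, 0, 0, 1, 1, 0]  # two bit errors corrected

The intermediate contents of new_best match the path tables built by hand above; only the metric and the input history are stored, since the codeword sequence can always be regenerated from the inputs.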

Turbo Codes

Turbo codes were stumbled upon in 1993 by C. Berrou, A. Glavieux, and P. Thitimajshima, and their discovery surprised the communications research community by providing, for the first time, structurally simple codes that could approach Shannon's capacity more closely than ever before. The basic structure of a turbo coder is shown in the diagram below. This is the structure of a (3,1,m) turbo encoder. It consists of two parallel and identical Recursive Systematic Convolutional (RSC) encoders – one acting directly on the input bit stream, and the other on a randomly interleaved version of the bit stream. An RSC encoder is similar to the traditional convolutional encoder we have been considering, except that its structure is altered to make one of the outputs systematic. A simple (2,1,3) RSC encoder is shown in the diagram below. RSC encoders by themselves do not perform as well as traditional convolutional encoders against random channel errors; however, they are an integral part of turbo encoders. The idea behind the use of an interleaver in the turbo encoder is that errors that occur in one of the bit streams are unlikely to also occur in the equivalent bits of the interleaved copy. This gives the encoder greater error robustness and increased diversity, and indeed the interleaver is found to be integral to the performance of a turbo code. Turbo decoding is typically performed using the BCJR algorithm (Bahl, Cocke, Jelinek, and Raviv). The BCJR algorithm differs from the simple Viterbi algorithm in being a soft-input, soft-output algorithm. A turbo decoder employs two of these BCJR decoders in parallel, one acting on the received sequence and the other on the interleaved sequence, exchanging soft bit estimates with each other as they go. This structure is reminiscent of a turbocharger in a motor, hence the name 'turbo codes'. It is also very important that turbo decoders act on 'soft decisions', where the value of each binary bit is not decided as '0' or '1' (a hard decision), but rather left as a continuous (or, in reality, finite-precision) demodulated amplitude or bit probability estimate. The requirement for soft-decision decoding naturally adds significantly to the computational overhead.
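Since the RSC diagram did not survive reproduction here, the following sketch shows the general idea with an illustrative m = 2 generator pair, G = (1, (1 + p^2)/(1 + p + p^2)) – a common textbook choice, not necessarily the taps of the original figure – together with the (3,1,m) parallel-concatenation structure described above. The interleave argument is any permutation of the input indices.

    def rsc_encode(bits):
        """Recursive systematic convolutional encoder, m = 2, with
        feedback 1 + p + p^2 and feedforward 1 + p^2 (illustrative taps)."""
        s1 = s2 = 0
        out = []
        for b in bits:
            a = b ^ s1 ^ s2            # feedback (recursive) sum
            parity = a ^ s2            # feedforward combination
            out.append((b, parity))    # systematic bit plus parity bit
            s1, s2 = a, s1             # shift the feedback bit into the register
        return out

    def turbo_encode(bits, interleave):
        """(3,1,m) turbo encoder skeleton: the systematic bit plus the
        parity streams of two identical RSCs, the second fed with an
        interleaved copy of the input."""
        parity1 = [p for _, p in rsc_encode(bits)]
        parity2 = [p for _, p in rsc_encode([bits[i] for i in interleave])]
        return list(zip(bits, parity1, parity2))

    # Example: a fixed (normally pseudo-random) interleaver over 8 bits.
    out = turbo_encode([1, 0, 1, 1, 0, 0, 1, 0], [3, 0, 6, 1, 7, 4, 2, 5])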


Turbo codes, along with LDPC codes and Reed-Solomon codes concatenated with convolutional codes, are our best-performing channel codes at present. The main drawback of turbo codes at present is that they require a fair degree of computational overhead, as the soft-decision algorithms are computationally expensive. As more and more processing power becomes available, turbo codes are being increasingly employed in practical systems. As we shall see, turbo codes are a feature of 3G cellular standards, as a powerful alternative to traditional convolutional codes.

