
THE IBY AND ALADAR FLEISCHMAN FACULTY OF ENGINEERING

The Zandman-Slaner Graduate School of Engineering

ON DECODING TANNER CODES WITH LOCAL-OPTIMALITY GUARANTEES

By

Nissim Halabi

THESIS SUBMITTED TO THE SENATE OF TEL-AVIV UNIVERSITY

in partial fulfillment of the requirements for the degree of

"DOCTOR OF PHILOSOPHY"

October 2012

This research work was carried out at Tel-Aviv University,

at the Faculty of Engineering

Under the supervision of Prof. Guy Even


Acknowledgments

First and foremost, I would like to express my deepest gratitude to my advisor, Prof. Guy Even, with whom I have had the great pleasure to work. As evident from Figure 1, my correctness-proving abilities have developed considerably under his guidance since my senior year, some ten years ago.

I am grateful to Dr. Pascal O. Vontobel for stimulating discussions over the past few years, and for many helpful suggestions that improved the outcome of my research.

I thank my fellow students and friends at the Algorithms Lab and the Systems department in the School of Electrical Engineering, for making my years at Tel-Aviv University very enjoyable.

I give my final words of gratitude to my family for being as they are. Thank you so much.

This research was supported in part by fellowships from the Advanced Communication Center at Tel-Aviv University and the Yitzhak and Chaya Weinstein Research Institute for Signal Processing at Tel-Aviv University.

(a) A trial to design a synchronous combinational circuit.

(b) Failed to persuade Guy that the design is correct.

Figure 1: A mid-term exam in the course “Introduction to Digital Computers” taught by Guy back in 2000. My first record of acquaintance with Guy, when I was 19 years old in my sophomore year.


Abstract

This thesis deals with two approaches to the problem of suboptimal decoding of finite-length error-correcting codes under a probabilistic channel model. One approach, initiated by Gallager in 1963, is based on iterative message-passing decoding algorithms. The second approach, introduced by Feldman, Wainwright and Karger in 2003, is based on linear programming (LP). We consider families of error-correcting codes defined on graphs, called Tanner codes. The probabilistic channel model is any memoryless binary-input output-symmetric (MBIOS) channel.

Given a channel observation y and a codeword x, we are interested in a one-sided error test that answers the questions: Is x optimal with respect to y? Is it unique? A positive answer for such a test is called a certificate for the optimality of a codeword. We deal with certificates that are based on combinatorial characterizations of local-optimality. Local-optimality certificates guarantee both maximum-likelihood (ML) optimality and LP-optimality. That is, if a certificate exists for some codeword x, then LP-decoding finds x, and x is guaranteed to be the ML-codeword.

We present certificates for three families of Tanner codes. (i) We start with a simple certificate based on simple cycles in the Tanner graph for a family of “even” Tanner codes. (ii) We proceed with certificates for regular low-density parity-check (LDPC) codes introduced by Arora, Daskalakis and Steurer [2009], building on work by Koetter and Vontobel [2006]. This certificate is based on weighted skinny trees in the Tanner graph, the height of which is bounded by half of the girth of the Tanner graph. (iii) We conclude with a certificate for any linear Tanner code. The certificate is based on trees in computation trees of the Tanner graph. These trees may have any finite height h (even greater than the girth of the Tanner graph). In addition, the degrees of local-code nodes are not restricted to two (i.e., the trees are denser than skinny trees).

Based on the combinatorial characterization of the certificates, we present bounds on the word error probability for LP-decoding in MBIOS channels. Upper bounds on the word error probability are obtained from lower bounds on the probability that a certificate exists.

We present a new message-passing iterative decoding algorithm, called normalized weighted min-sum (nwms). The nwms algorithm is a BP-type algorithm that applies to any Tanner code with single parity-check local codes (e.g., LDPC and high-density parity-check codes). We prove that if a locally-optimal codeword for depth h exists, then the nwms algorithm finds it in h iterations. Hence, if successful, the nwms algorithm has an ML-certificate for any bounded number of iterations. Furthermore, since the depth h is not bounded, the guarantee for successful decoding by nwms holds even if the number of iterations h exceeds the girth of the Tanner graph.

Finally, we present hierarchies on the combinatorial characterization of local-optimality. These hierarchies explain: (i) the improvement in the probability that a local-optimality certificate exists when the certificate's height or density is increased; (ii) the improvement of iterative decoding (e.g., nwms) when the number of iterations is increased, even when the number of iterations exceeds the girth.


Contents

1 Introduction
1.1 Summary of Results
1.1.1 Bounds on the Word Error Probability for LP-Decoding of “Even” Tanner Codes
1.1.2 LP Decoding of Regular LDPC codes in Memoryless Channels
1.1.3 Local-Optimality Certificates for ML-Decoding and LP-decoding of Tanner Codes
1.1.4 Message-Passing Decoding with Local-Optimality Guarantee for Tanner Codes with Single Parity-Check Local Codes
1.1.5 Bounds on the Word Error Probability for LP-Decoding of Regular Tanner Codes
1.1.6 Hierarchies of Local-Optimality
1.2 Organization

2 Preliminaries
2.1 Graph Terminology and Algebraic Notation
2.2 Communicating over Noisy Channel
2.3 Tanner Codes and Tanner Graph Representation
2.4 Linear Programming Decoding of Tanner Codes over Memoryless Channels
2.5 Symmetry of LP Decoding and the All-Zero Codeword Assumption
2.6 Graph Cover Pseudo-Codewords and the Generalized Fundamental Polytope
2.6.1 Covering Maps, Liftings, and Cover Codes
2.6.2 Graph Cover Pseudo-Codewords and the Generalized Fundamental Polytope

3 Bounds on the Word Error Probability for LP-Decoding of “Even” Tanner Codes
3.1 Introduction
3.2 Local-Optimality Certificates Based on Simple Cycles
3.2.1 Local-Optimality Implies ML-Optimality
3.2.2 Local-Optimality Implies LP-Optimality
3.3 Bounds on the Word Error Probability Using Cycle-Based Local-Optimality
3.4 Discussion

4 LP Decoding of Regular LDPC codes in Memoryless Channels
4.1 Introduction
4.2 On the Connections between Local Optimality, Global Optimality, and LP Optimality
4.3 Proving Error Bounds Using Local Optimality
4.3.1 Bounding Processes on Trees
4.3.2 Analysis for BI-AWGN Channel
4.4 Discussion
4.A Computing the Evolution of Probability Densities over Trees
4.A.1 Properties of Random Variables
4.A.2 Computing Distributions of $X_l$ and $Y_l$
4.A.3 Estimating $\min_{t>0} \mathbb{E}e^{-tX_s}$

5 Local-Optimality Certificates for ML-Decoding and LP-decoding of Tanner Codes
5.1 Introduction
5.2 A Combinatorial Certificate for an ML Codeword
5.3 Local Optimality Implies LP-Optimality
5.4 Verifying Local Optimality
5.5 Constructing Codewords from Weighted Trees Projections

6 Message-Passing Decoding with Local-Optimality Guarantee for Tanner Codes with Single Parity-Check Local Codes
6.1 Introduction
6.2 Message-Passing Decoding with ML Guarantee for Irregular LDPC Codes
6.2.1 Proof of Theorem 6.1 - NWMS Finds the Locally Optimal Codeword
6.3 Discussion
6.A Optimal Valid Subconfigurations in the Execution of NWMS2

7 Bounds on the Word Error Probability for LP-Decoding of Regular Tanner Codes
7.1 Introduction
7.2 Bounds on the Error Probability of LP-Decoding Using Local-Optimality
7.2.1 Bounding Processes on Trees
7.2.2 Analysis for Binary Symmetric Channel
7.2.3 Analysis for MBIOS Channels
7.3 Discussion
7.A Proof of Lemma 7.6
7.B Proof of Lemma 7.9

8 Hierarchies of Local-Optimality
8.1 Introduction
8.2 Trimming Subtrees from a Path-Prefix Tree
8.3 Degree Hierarchy of Local-Optimality
8.4 Height Hierarchy of Strong Local-Optimality
8.5 Numerical Results
8.6 Discussion
8.A Proof of Lemma 8.1

List of Figures

1 A mid-term exam in the course “Introduction to Digital Computers” taught by Guy back in 2000. My first record of acquaintance with Guy, when I was 19 years old in my sophomore year.

2.1 Channel Coding.

2.2 Examples of MBIOS channels.

2.3 Tanner Graph of a Tanner Code.

2.4 $M$-cover of base graph $G$: Fiber $\pi^{-1}(v_i)$ contains $M$ copies of $v_i$, with a matching between fibers $\pi^{-1}(v_i)$ and $\pi^{-1}(v_j)$ if $(v_i, v_j)$ is an edge in $G$.

2.5 Connecting edges in the proof of Lemma 2.18 between fiber $\pi^{-1}(C_j)$ and every fiber $\pi^{-1}(v_i)$ such that $v_i \in \mathcal{N}_G(C_j)$. For example, we assume that $\mathcal{N}_G(C_j) = \{v_{i_1}, v_{i_2}, v_{i_3}\}$ and $x'^{(j)}_1|_{\mathcal{N}(C_j)} = (0, 1, 1)$, $x'^{(j)}_2|_{\mathcal{N}(C_j)} = (1, 0, 1)$, and $x'^{(j)}_M|_{\mathcal{N}(C_j)} = (0, 1, 1)$.

4.1 Probability density functions of $X_0$ and $Y_0$ for $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

4.2 Probability density functions of $X_l$ for $l = 0, \ldots, 4$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

4.3 Probability density functions of $Y_l$ for $l = 0, \ldots, 4$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$.

4.4 $\ln\left(\mathbb{E}e^{-tX_s}\right)$ as a function of $t$ for $s = 4, 6, 8, 10, 12$, $(d_L, d_R) = (3, 6)$ and $\sigma = 0.7$. Plot (b) is an enlargement of the rectangle depicted in plot (a).

4.5 (a) Region for which $5e^{-\frac{1}{2\sigma^2}}\,\mathbb{E}e^{-tX_4} < 1$ as a function of $t$ and $\sigma$ for $(d_L, d_R) = (3, 6)$. Note that the maximal value of $\sigma$ contained in that region yields the estimate $\sigma_0 = 0.685$ in the entry $s = 4$ in Table 4.1. (b) Constant $\alpha$ in Corollary 4.17 as a function of $\sigma$ in the case where $s = 4$ and $t = 0.11$, i.e., the value of $\alpha$ over the cut of the $(t, \sigma)$-plane in plot (a) at $t = 0.11$ (depicted by a thick solid line).

5.1 Set of all backtrackless paths $\mathcal{P}_\ell(v)$ as augmentation of the set $\mathcal{P}_{\ell-1}(v)$ as viewed by the path-suffix tree of height $\ell$ rooted at $v$, in the proof of Lemma 5.11.

6.1 Substructures of a path-prefix tree $T_r^{2h}(G)$ in a dynamic programming algorithm that computes optimal configurations in $T_r^{2h}(G)$.

6.2 $T^{2l+2}_{C \to v}$ as a substructure isomorphic to a subtree of the path-prefix tree $T_r^{2h}$.

8.1 Trimmed tree of $T$ induced by $q$.

8.2 Decomposition of a reduced $d$-tree $T$ of height $2kh$ to a set of subtrees $\{T_j\}$ that are reduced $d$-trees of height $2h$.

8.3 Growth of strong local-optimality and local-optimality as a function of the height $h$. $|\Lambda_p| = 5000$ for $p \in \{0.04, 0.05, 0.06\}$.

List of Tables

4.1 Computed values of $\sigma_0$ for finite $s$ in Corollary 4.17, and their corresponding $E_b/N_0$ SNR measure in dB.

7.1 Computed values of $p_0$ for finite $d_0 < d^*$ in Theorem 7.1. Values are presented for a $(2, 16)$-Tanner code with rate at least 0.375 when using $[16, 11, 4]$-extended Hamming codes as local codes. (a) Finite-length bound: for all $p \le p_0$, a bound on the word error probability that is inverse doubly-exponential in the girth of the Tanner graph. (b) Asymptotic bound: for $g = \Omega(\log N)$ sufficiently large, the LP decoder succeeds with probability at least $1 - \exp(-N^{\delta})$ for some constant $0 < \delta < 1$, provided that $p \le p_0(d_0)$.

7.2 Computed values of $p_0$ for finite $s$ in Corollary 7.10. Values are presented for $(d_L, d_R) = (2, 16)$ and $d = 3$.

7.3 Computed values of $p_0$ for finite $s$ in Corollary 7.11. Values are presented for $(d_L, d_R) = (2, 16)$ and $d = 4$.

List of Algorithms

5.1 verify-lo$(x, \lambda, h, w, d)$ - An iterative verification algorithm. Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote a Tanner graph. Given an LLR vector $\lambda \in \mathbb{R}^{|\mathcal{V}|}$, a codeword $x \in \mathcal{C}(G)$, level weights $w \in \mathbb{R}_+^h$, and a degree $d \in \mathbb{N}_+$, outputs “true” if $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$; otherwise, outputs “false”.

6.1 nwms$(\lambda, h, w)$ - An iterative normalized weighted min-sum decoding algorithm. Given an LLR vector $\lambda \in \mathbb{R}^N$ and level weights $w \in \mathbb{R}_+^h$, outputs a binary string $x \in \{0, 1\}^N$.

6.2 nwms2$(\lambda^{(0)}, \lambda^{(1)}, h, w)$ - An iterative normalized weighted min-sum decoding algorithm. Given log-likelihood vectors $\lambda^{(a)} \in \mathbb{R}^N$ for $a \in \{0, 1\}$ and level weights $w \in \mathbb{R}_+^h$, outputs a binary string $x \in \{0, 1\}^N$.

Chapter 1

Introduction

Reliable transmission of information over unreliable (noisy) channels is achieved by a process of channel coding [Sha48]. The study of error-correcting codes began with the work of Hamming [Ham50], and is central to the field of information theory. In general, the theory of coding can be classified into two main classes: (i) classical coding theory, and (ii) modern coding theory.

Classical coding theory mainly deals with finding codes with desired properties (e.g., large minimum distance); efficient algorithms for optimal decoding of these codes are then designed (see e.g., [Rot06]). Modern coding theory deals with designing good error-correcting codes for a given class of efficient decoders that are not necessarily optimal (see e.g., [RU08]).

Solving the maximum-likelihood (ML) decoding problem for general linear codes is computationally intractable [BMvT78]. Suboptimal decoders may fail to correct errors that are corrected by an optimal ML decoder (worst-case analysis). However, in the average case, efficient suboptimal decoders may approach the performance of the ML decoder (see e.g., [CRU01, BZ04, FS05]).
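For a memoryless channel and equiprobable codewords, the ML decoding problem can be written as minimizing a linear cost over the code. The short derivation below is a standard identity, stated in terms of the LLR vector $\lambda$ with $\lambda_i = \ln\frac{\Pr(y_i \mid x_i = 0)}{\Pr(y_i \mid x_i = 1)}$; the uniform prior on codewords is an assumption of this sketch.

```latex
\begin{align*}
\hat{x}^{\mathrm{ML}}(y)
  &= \arg\max_{x \in \mathcal{C}} \Pr(y \mid x)
   = \arg\max_{x \in \mathcal{C}} \sum_{i=1}^{N} \ln \Pr(y_i \mid x_i) \\
  &= \arg\max_{x \in \mathcal{C}} \sum_{i=1}^{N} \bigl( \ln \Pr(y_i \mid 0) - x_i \lambda_i \bigr)
   = \arg\min_{x \in \mathcal{C}} \sum_{i=1}^{N} \lambda_i x_i .
\end{align*}
```

The third equality uses $\ln \Pr(y_i \mid x_i) = \ln \Pr(y_i \mid 0) - x_i \lambda_i$ for $x_i \in \{0, 1\}$, and the last one drops $\sum_i \ln \Pr(y_i \mid 0)$, which does not depend on $x$.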

In this thesis we study two approaches to the problem of suboptimal decoding of finite-length error-correcting codes under a probabilistic model of the channel noise. One approach, initiated by Gallager [Gal63], relies on iterative message-passing decoding algorithms based on belief propagation (see e.g., [Gal63, BGT93, Mac99, LMSS01, RU01, KFL01]). The second approach, introduced by Feldman, Wainwright and Karger [Fel03, FK04, FWK05], is based on linear programming (LP).

Low-density parity-check (LDPC) codes were invented by Gallager [Gal63] in 1963. Gallager also invented the first type of message-passing iterative decoding algorithm, known today as the sum-product algorithm for a-posteriori probability (APP) decoding. Until the 1990s, iterative decoding systems were largely forgotten, with a few exceptions such as the landmark paper of Tanner [Tan81] in 1981. Tanner [Tan81] introduced graph representations of linear codes based on bipartite graphs over variable nodes and constraint nodes, and viewed iterative decoding as message-passing algorithms over the edges of the Tanner graph. In the standard setting, constraint nodes compute the parity function. In the generalized setting, constraint nodes use a local error-correcting code. One may view a constraint node with a linear local code as a coalescing of multiple single parity-check nodes. Therefore, a code may have a sparser and smaller representation when represented as a Tanner code in the generalized setting.
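To make the generalized setting concrete, the following Python sketch represents a Tanner code by listing, for every constraint node $j$, its neighborhood $\mathcal{N}(j)$ and a parity-check matrix $H_j$ of its local code. A word is a codeword iff every local view $x_{\mathcal{N}(j)}$ is a local codeword. Splitting each local-code node into one single parity-check node per row of $H_j$ gives an equivalent ordinary Tanner graph, which is one way to read the coalescing remark above. The toy graph, its local codes, and all function names are hypothetical choices for illustration only.

```python
import itertools

# Hypothetical toy Tanner code on N = 6 variable nodes with two local-code nodes.
# Each constraint node is (neighbourhood N(j), parity-check matrix H_j of its local code).
LOCAL_CODES = [
    ([0, 1, 2, 3], [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]),  # [4,1] repetition code
    ([2, 3, 4, 5], [[1, 1, 1, 1]]),                             # [4,3] single parity-check code
]


def in_tanner_code(x, local_codes):
    """x is a codeword iff H_j x_{N(j)} = 0 (mod 2) for every constraint node j."""
    for nbrs, H in local_codes:
        view = [x[i] for i in nbrs]
        if any(sum(h * b for h, b in zip(row, view)) % 2 for row in H):
            return False
    return True


def expand_to_spc(local_codes):
    """Replace every local-code node by one single parity-check node per row of H_j."""
    return [[nbrs[k] for k, h in enumerate(row) if h]
            for nbrs, H in local_codes for row in H]


def in_spc_code(x, checks):
    """Membership test for an ordinary Tanner graph with single parity-check nodes."""
    return all(sum(x[i] for i in c) % 2 == 0 for c in checks)


spc_checks = expand_to_spc(LOCAL_CODES)
words = list(itertools.product((0, 1), repeat=6))
assert all(in_tanner_code(x, LOCAL_CODES) == in_spc_code(x, spc_checks) for x in words)
print(len(LOCAL_CODES), "local-code nodes vs", len(spc_checks), "single parity-check nodes")
# 2 local-code nodes vs 4 single parity-check nodes
print([x for x in words if in_tanner_code(x, LOCAL_CODES)])
# [(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1), (1, 1, 1, 1, 0, 0), (1, 1, 1, 1, 1, 1)]
```

Both descriptions define the same four codewords; the generalized one simply uses fewer constraint nodes.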


LDPC codes were rediscovered [MN96] after the discovery of turbo-codes [BGT93]. LDPC codes have attracted a lot of research attention since empirical studies demonstrated excellent decoding performance using iterative decoding methods. Among the main results is the density-evolution technique for analyzing and designing asymptotic LDPC codes [RU01]. A density-evolution analysis computes a threshold for the noise: if the noise in the channel is below that threshold, then the decoding error diminishes as the block length grows. The threshold results of [RU01] hold for a random code from an ensemble of LDPC codes.
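Density evolution is simplest on the binary erasure channel (BEC), where each message density collapses to a single erasure probability. The sketch below is an illustration under that assumption (the BEC rather than a general MBIOS channel, and a $(d_L, d_R)$-regular ensemble with $(d_L, d_R) = (3, 6)$ by default): it iterates the standard recursion $x_{\ell+1} = \varepsilon\,\bigl(1 - (1 - x_\ell)^{d_R - 1}\bigr)^{d_L - 1}$ and bisects on the erasure probability $\varepsilon$ to locate the threshold. The function names and the iteration cap are hypothetical.

```python
def de_converges(eps, dl=3, dr=6, max_iters=5000, tol=1e-10):
    """Return True if BEC density evolution drives the message erasure probability to zero."""
    x = eps  # erasure probability of variable-to-check messages in round 0
    for _ in range(max_iters):
        x = eps * (1.0 - (1.0 - x) ** (dr - 1)) ** (dl - 1)
        if x < tol:
            return True
    return False


def bec_threshold(dl=3, dr=6, steps=30):
    """Bisection on eps; a finite iteration cap can only bias the estimate downwards."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if de_converges(mid, dl, dr):
            lo = mid
        else:
            hi = mid
    return lo


print(round(bec_threshold(), 3))  # about 0.429 for the (3, 6)-regular ensemble
```

Below the printed value the decoding error of a typical long code from the ensemble vanishes; above it, density evolution stalls at a nonzero fixed point.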

Wiberg et al. [WLK95, Wib96] developed the use of graphical models for systematically describing instances of known decoding algorithms. In particular, the “sum-product” algorithm and the “min-sum” algorithm are presented as generic iterative message-passing decoding algorithms that apply to any graph realization of a Tanner code. Wiberg et al. proved that the min-sum algorithm can be viewed as a dynamic programming algorithm that computes the ML-codeword if the Tanner graph is a tree. This framework was further generalized to represent global functions using factor graphs over local-function nodes and variable nodes [AM00, KFL01, For01]. The framework of factor graphs leads to a large class of iterative message-passing algorithms aimed at efficiently computing the global function represented by the factor graph.
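The tree case can be checked numerically. The sketch below runs a standard flooding min-sum decoder in the LLR domain (a check-to-variable message is the product of the signs times the minimum magnitude of the other incoming messages) on a small, hypothetical cycle-free Tanner graph with single parity-check nodes, and compares its output with brute-force ML decoding, i.e., the codeword minimizing $\sum_i \lambda_i x_i$ with $\lambda_i = \ln\frac{\Pr(y_i \mid 0)}{\Pr(y_i \mid 1)}$. The graph, the iteration count, and the Gaussian LLRs are assumptions for illustration; this is plain min-sum, not the nwms algorithm.

```python
import itertools
import random

# Hypothetical cycle-free Tanner graph: 7 variable nodes, 4 single parity-check nodes.
TREE_CHECKS = [[0, 1, 2], [2, 3, 4], [4, 5], [1, 6]]


def min_sum(llr, checks, iters):
    """Flooding min-sum in the LLR domain; returns hard decisions (1 iff total LLR < 0)."""
    n = len(llr)
    var_checks = [[c for c, vs in enumerate(checks) if v in vs] for v in range(n)]
    c2v = {(c, v): 0.0 for c, vs in enumerate(checks) for v in vs}
    for _ in range(iters):
        # variable-to-check: channel LLR plus all other incoming check messages
        v2c = {(c, v): llr[v] + sum(c2v[(d, v)] for d in var_checks[v] if d != c)
               for (c, v) in c2v}
        # check-to-variable: product of signs times minimum magnitude of the others
        new_c2v = {}
        for c, vs in enumerate(checks):
            for v in vs:
                others = [v2c[(c, u)] for u in vs if u != v]
                sign = -1.0 if sum(m < 0 for m in others) % 2 else 1.0
                new_c2v[(c, v)] = sign * min(abs(m) for m in others)
        c2v = new_c2v
    totals = [llr[v] + sum(c2v[(c, v)] for c in var_checks[v]) for v in range(n)]
    return [1 if t < 0 else 0 for t in totals]


def ml_decode(llr, checks):
    """Brute-force ML: the codeword minimizing sum_i llr[i] * x[i]."""
    best_cost, best_x = None, None
    for x in itertools.product((0, 1), repeat=len(llr)):
        if all(sum(x[v] for v in vs) % 2 == 0 for vs in checks):
            cost = sum(l * b for l, b in zip(llr, x))
            if best_cost is None or cost < best_cost:
                best_cost, best_x = cost, list(x)
    return best_x


rng = random.Random(1)
for _ in range(200):
    llr = [rng.gauss(0.5, 1.0) for _ in range(7)]
    # the tree has 11 nodes, so 11 flooding iterations exceed its diameter
    assert min_sum(llr, TREE_CHECKS, iters=11) == ml_decode(llr, TREE_CHECKS)
print("min-sum agreed with ML on all 200 trials")
```

On a graph with cycles the same message rules are still well defined, but the agreement with ML is no longer guaranteed.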

Most results on iterative message-passing decoding algorithms address asymptotic performance, namely, performance when the block length tends to infinity. From a practical point of view, it is important to analyze the performance of specific finite-length codes, and asymptotic analysis does not resolve this problem. In this thesis, we focus on such finite-length analysis for specific codes.

Feldman, Karger and Wainwright [Fel03, FK04, FWK05] introduced an interesting suboptimal decoding algorithm for linear codes that is based on linear programming. Initially, this idea seems counter-intuitive, since codes are over $\{0, 1\}^N$, whereas linear programming is over $\mathbb{R}^N$. Following ideas from optimization theory and approximation algorithms, the linear program (LP) is regarded as a fractional relaxation of an integer program that models the problem of maximum-likelihood decoding. One can distinguish between integral and non-integral vertices of the LP. The integral vertices correspond to codewords, whereas the non-integral vertices are not codewords and are thus called pseudocodewords. This algorithm, called LP-decoding, has two main advantages: (i) it runs in polynomial time, and (ii) when LP-decoding outputs an integral solution, it provides an ML-certificate, i.e., a proof that its outcome agrees with ML decoding.
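In symbols, with the LLR vector $\lambda$, $\lambda_i = \ln\frac{\Pr(y_i \mid 0)}{\Pr(y_i \mid 1)}$, ML decoding minimizes the linear cost $\langle \lambda, x \rangle$ over the code, and LP-decoding replaces the code by a relaxed polytope. The block below sketches the standard relaxation: for a Tanner code with local codes $\mathcal{C}_j$, one intersects the convex hulls of the local codes, and for single parity-check local codes this intersection has Feldman's explicit odd-set description. The symbol $\mathcal{P}$ is introduced only for this sketch.

```latex
\begin{align*}
\text{ML:}\quad & \min_{x \in \mathcal{C}} \ \langle \lambda, x \rangle
  \;=\; \min_{x \in \operatorname{conv}(\mathcal{C})} \ \langle \lambda, x \rangle, \\
\text{LP:}\quad & \min_{f \in \mathcal{P}} \ \langle \lambda, f \rangle, \qquad
  \mathcal{P} = \bigl\{ f \in [0,1]^N : f_{\mathcal{N}(j)} \in \operatorname{conv}(\mathcal{C}_j)
  \ \text{for every constraint node } j \bigr\}, \\
\text{SPC local codes:}\quad & f_{\mathcal{N}(j)} \in \operatorname{conv}(\mathcal{C}_j)
  \iff \sum_{i \in S} f_i - \sum_{i \in \mathcal{N}(j) \setminus S} f_i \le |S| - 1
  \quad \text{for all } S \subseteq \mathcal{N}(j),\ |S| \text{ odd}.
\end{align*}
```

Since $\mathcal{P} \cap \{0,1\}^N = \mathcal{C}$, an integral optimum of the LP is an ML codeword, which is the ML-certificate property; a non-integral optimal vertex is a pseudocodeword.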

LP decoding has been applied to several families of codes, among them cycle codes, turbo-like codes and RA codes [FK04, HE05, GB11], LDPC codes [FMS+07, DDKW08, KV06, ADS09, HE11a], expander codes [FS05, Ska11], and polar codes [GKG10]. Experiments indicate that max-product (message-passing) decoding is likely to fail if LP-decoding fails [Fel03, VK05].

Koetter and Vontobel showed that LP-decoding is equivalent to graph cover decoding [KV03, VK05]. Abstractly, graph cover decoding proceeds as follows. Given a received word, graph cover decoding considers all possible M-covers of the Tanner graph of the code (for every integer M). For every M-cover graph, the received word is lifted by assigning each of the M copies of a variable node the channel observation of the original variable. ML-decoding is applied to obtain a codeword in the code corresponding to the M-cover graph. The “best” ML-decoding result (i.e., the one with the smallest cost normalized by M) is selected among all covers. This lifted codeword is then projected (via averaging over the M copies of each variable node) to the base Tanner graph. This averaging might yield a non-integral solution, namely, a pseudocodeword, as in the case of LP-decoding. Graph cover decoding provides a combinatorial characterization of LP-decoding and pseudocodewords.

1.1 Summary of Results

Our work is motivated by the problem of finite-length and average-case analysis of successful LP-decoding and iterative message-passing decoding for families of Tanner codes.

Combinatorial characterizations of sufficient conditions for successful decoding are based on so-called “certificates”. That is, given a channel observation y and a codeword x, we are interested in a one-sided error test that answers the questions: Is x optimal with respect to y? Is it unique? Note that the test may answer “no” for a positive instance. A positive answer for such a test is called a certificate for the optimality of a codeword.
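As a minimal illustration of a one-sided test (and not one of the local-optimality certificates studied in this thesis), consider the hard-decision test sketched below. Assume LLRs $\lambda_i = \ln\frac{\Pr(y_i \mid 0)}{\Pr(y_i \mid 1)}$. If no LLR vanishes and the bitwise hard decision $z$ (with $z_i = 1$ iff $\lambda_i < 0$) is a codeword, then $z$ uniquely minimizes $\sum_i \lambda_i x_i$ over all of $\{0, 1\}^N$, hence over the code, so $z$ is the unique ML codeword. When $z$ is not a codeword the test answers “no”, although an ML codeword always exists. The toy code and the function names are hypothetical.

```python
import itertools
import random

CHECKS = [[0, 1, 2], [2, 3, 4], [0, 4, 5]]  # hypothetical toy code with single parity checks


def is_codeword(x, checks):
    return all(sum(x[i] for i in c) % 2 == 0 for c in checks)


def hard_decision_certificate(llr, checks):
    """One-sided test: return z if it is certified as the unique ML codeword, else None."""
    if any(l == 0 for l in llr):
        return None
    z = [1 if l < 0 else 0 for l in llr]  # unconstrained minimizer of sum_i llr[i] * x[i]
    return z if is_codeword(z, checks) else None


def ml_decode(llr, checks):
    """Brute-force ML decoding for comparison."""
    codewords = (list(x) for x in itertools.product((0, 1), repeat=len(llr))
                 if is_codeword(x, checks))
    return min(codewords, key=lambda x: sum(l * b for l, b in zip(llr, x)))


rng = random.Random(0)
certified = 0
for _ in range(500):
    llr = [rng.gauss(1.0, 1.0) for _ in range(6)]
    z = hard_decision_certificate(llr, CHECKS)
    if z is not None:  # a "yes" answer is never wrong
        certified += 1
        assert z == ml_decode(llr, CHECKS)
print(certified, "of 500 channel outputs certified")

# A "no" answer on a positive instance: the all-zero word is the ML codeword here.
print(hard_decision_certificate([1.0, -0.5, 1.0, 1.0, 1.0, 1.0], CHECKS))  # None
```

The local-optimality certificates discussed next are far stronger tests of the same one-sided kind: they certify codewords that differ from the hard decision.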

We deal with certificates that are based on combinatorial characterizations of local-optimality. The certificates are efficiently computed by dynamic programming algorithms based on optimal local computations; hence, they are called local-optimality certificates. The local-optimality certificates guarantee both ML-optimality and LP-optimality. That is, if a certificate exists for some codeword, then the LP-decoder finds it, and it is guaranteed to be the ML-codeword.

We present certificates for three families of Tanner codes. (i) We start with a simple certificate based on simple cycles in the Tanner graph for a family of “even” Tanner codes. (ii) We proceed with a certificate for regular LDPC codes introduced by Arora, Daskalakis and Steurer [ADS09], building on work by Koetter and Vontobel [KV06]. This certificate is based on weighted skinny trees in the Tanner graph, the height of which is bounded by half of the girth of the Tanner graph. (iii) We conclude with a certificate for any linear Tanner code. The certificate is based on weighted normalized trees in computation trees of the Tanner graph. These trees may have any finite height h (even greater than the girth of the Tanner graph). In addition, the degrees of local-code nodes are not restricted to two (i.e., the trees are denser than skinny trees).

Based on the combinatorial characterization of the certificates, we present bounds on the word error probability for LP-decoding in memoryless binary-input output-symmetric (MBIOS) channels. Upper bounds on the word error probability are obtained from lower bounds on the probability that a certificate exists.
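The reduction behind these bounds fits in one line. As a sketch, assume that codeword $x$ was transmitted, that $\lambda$ is the received LLR vector, and that a certificate for $x$ implies that LP-decoding returns $x$ as its unique optimum. Then LP-decoding can fail only when no certificate exists:

```latex
\Pr\{\text{LP-decoding fails}\}
  \;\le\; \Pr\{\text{no local-optimality certificate exists for } x \text{ w.r.t. } \lambda\}
  \;=\; 1 - \Pr\{x \text{ is locally optimal w.r.t. } \lambda\}.
```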

We present a new message-passing iterative decoding algorithm, called normalized weighted min-sum (nwms). The nwms algorithm is a BP-type algorithm that applies to any Tanner code with single parity-check local codes (e.g., LDPC codes). We prove that if a locally-optimal codeword with height parameter h exists, then the nwms algorithm finds it in h iterations. Hence, the nwms algorithm has an ML-certificate for any bounded number of iterations. Furthermore, since the height parameter h is not bounded, the guarantee for successful decoding by nwms holds even if the number of iterations h exceeds the girth of the Tanner graph.

Finally, we present hierarchies on the combinatorial characterization of local-optimality. These hierarchies explain: (i) the improvement in the probability that a local-optimality certificate exists when the certificate's height or density is increased; (ii) the improvement of iterative decoding (e.g., nwms) when the number of iterations is increased, even when the number of iterations exceeds the girth.

In the following subsections we present, in detail, previous results and our contributions.

1.1.1 Bounds on the Word Error Probability for LP-Decoding of “Even” Tanner Codes

In Chapter 3 (based on [HE06], extended in [HE12b, HE12e]), we deal with a simple, yet effective, example of an analysis based on a local-optimality characterization. In this case, the local-optimality characterization is based on simple cycles in the Tanner graph. The local-optimality certificate guarantees ML-optimality and LP-optimality. Moreover, a simple analysis of this cycle-based local-optimality yields analytical bounds on the success probability of LP-decoding.

Previous Results. Feldman and Karger [FK02] first introduced the novel concept of linear-programming based decoding for repeat-accumulate RA(q) turbo-like codes. For RA(2) codes over the binary symmetric channel (BSC), for a certain constant threshold on the noise, they proved that the word error probability of LP-decoding is bounded by an inverse polynomial in the code length. A similar claim was also shown for the binary-input additive white Gaussian noise (BI-AWGN) channel.

RA(2) codes are a family of cycle codes, i.e., Tanner codes with single parity-check local codes such that the degree of every variable node equals 2. In fact, the analysis presented by Feldman and Karger [FK02, FK04] holds for any cycle code whose Tanner graph has logarithmic girth. Hence, LP-decoding of cycle codes has a word error probability of at most $N^{-\varepsilon}$ for any $\varepsilon > 0$, provided that the noise is bounded by a certain function of the constant $\varepsilon$ (independent of $N$). Moreover, in the case of cycle codes, LP-decoding is equivalent to the iterative min-sum decoder [FWK05].

The analysis of Feldman and Karger [FK04] was improved in [HE05], where an exact combinatorial characterization of irreducible pseudocodewords w.r.t. the LLR vector received from the channel is presented. LP-decoding fails iff there exists an irreducible pseudocodeword with negative cost w.r.t. the LLR vector received from the channel. Hence, a careful combinatorial analysis of such irreducible pseudocodewords w.r.t. MBIOS channels resulted in improved bounds.

Recently, Goldenberg and Burshtein [GB11] generalized the analysis of Feldman and Karger [FK04] to RA(q) codes with even repetition q ≥ 4. For this family of codes, they proved bounds on the word error probability of LP-decoding that are inverse polynomial in the code length. These bounds are based on an analysis of hyperpromenades defined by Feldman and Karger [FK04]. We note that the family of even Tanner codes contains the family of repeat-accumulate codes with even repetitions studied in [GB11].

Contributions. We study a family of Tanner codes in which (i) the degrees of the variable nodes all equal $d_L$, for an even number $d_L$, and (ii) local codes contain only words with even weight. We call this family of codes even Tanner codes. We present a combinatorial characterization of local-optimality of a codeword in even Tanner codes with respect to any MBIOS channel. The local-optimality characterization is based on simple cycles in the Tanner graph. We prove that cycle-based local-optimality implies ML-optimality and LP-optimality. For even Tanner codes whose Tanner graphs have logarithmic girth, we prove bounds on the word error probability of LP-decoding that are inverse polynomial in the code length.

1.1.2 LP Decoding of Regular LDPC codes in Memoryless Channels

In Chapter 4 (based on [HE10, HE11a]), we deal with regular low-density parity-check (LDPC) codes, i.e., Tanner codes based on sparse regular Tanner graphs with single parity-check local codes.

Previous Results. Feldman et al. [FMS+07] were the first to show that LP-decoding corrects a constant fraction of errors for expander codes over an adversarial bit-flipping channel. For example, for a specific family of rate-1/2 LDPC expander codes, they proved that LP-decoding can correct $0.000175N$ errors. This kind of analysis is worst-case in nature, and the implied results are quite far from the performance of LDPC codes observed in practice over binary symmetric channels (BSC). Daskalakis et al. [DDKW08] initiated an average-case analysis of LP-decoding for LDPC codes over a probabilistic bit-flipping channel. For a certain family of LDPC expander codes over a BSC with bit-flipping probability p, they proved that LP-decoding recovers the transmitted codeword with high probability up to a noise threshold of p = 0.002. This proved threshold for LP-decoding is rather weak compared to thresholds proved for belief-propagation (BP) decoding over the BSC. For example, even for (3, 6)-regular LDPC codes, the BP threshold is p = 0.084. Therefore, one would expect LDPC expander codes to have much better thresholds than p = 0.002 under LP-decoding. Both of the results in [FMS+07] and [DDKW08] were proved by analysis of the dual LP solution based on expansion arguments. Extensions of [FMS+07] to a larger class of channels (e.g., the truncated AWGN channel) were discussed in [FKV05].

Koetter and Vontobel [KV06] analyzed LP-decoding of regular LDPC codes using girth arguments and the dual LP solution. They proved a lower bound on the threshold of LP-decoding for regular LDPC codes whose Tanner graphs have logarithmic girth over any memoryless channel. This bound on the threshold depends only on the degree of the variable nodes. For noise below the threshold, the decoding errors decrease doubly-exponentially in the girth of the factor graph. This was the first threshold result presented for LP-decoding of LDPC codes over memoryless channels other than the BSC. When applied to LP-decoding of (3, 6)-regular LDPC codes over a BSC with crossover probability p, they achieved a lower bound of p = 0.01 on the threshold. For the binary-input additive white Gaussian noise channel with noise variance $\sigma^2$ (BI-AWGN(σ)), they achieved a lower bound of σ = 0.5574 on the threshold (equivalent to an upper bound of $E_b/N_0 = 5.07$ dB). The question of closing the gap to σ = 0.82 (1.7 dB) [WA01], which is the threshold of the max-product (min-sum) decoding algorithm for the same family of codes over BI-AWGN(σ), remains open.

Recently, Arora et al. [ADS09] presented a novel probabilistic analysis of the primal solution of LP-decoding for regular LDPC codes over a BSC using girth arguments. They proved error bounds that are inverse doubly-exponential in the girth of the Tanner graph and lower bounds on thresholds that are much closer to the performance of BP-based decoding. For example, for a family of (3, 6)-regular LDPC codes whose Tanner graphs have logarithmic girth over a BSC with crossover probability p, they proved a lower bound of p = 0.05 on the threshold of LP-decoding. Their technique is based on a weighted decomposition of every codeword and pseudocodeword into a finite set of structured trees. Based on this decomposition, they proved a sufficient condition, called local-optimality, for the optimality of a decoded codeword. They use a min-sum process on trees to bound the probability that local-optimality holds: a probabilistic analysis of the min-sum process, applied to the structured trees of the decomposition, yields error bounds for LP-decoding.

Contributions. We extend the analysis in [ADS09] from the BSC to any memoryless binary-input output-symmetric (MBIOS) channel. We prove bounds on the word error probability that are inverse doubly-exponential in the girth of the factor graph for LP-decoding of regular LDPC codes over MBIOS channels. We also prove lower bounds on the threshold of $(d_L, d_R)$-regular LDPC codes whose Tanner graphs have logarithmic girth under LP-decoding in binary-input AWGN channels. Note that regular Tanner graphs with logarithmic girth can be constructed explicitly (see e.g. [Gal63]). Specifically, in a finite-length analysis of LP-decoding over BI-AWGNC(σ), we prove that for (3, 6)-regular LDPC codes the decoding errors for σ < 0.605 ($E_b/N_0$ > 4.36 dB) decrease doubly-exponentially in the girth of the factor graph. In an asymptotic analysis, we prove a lower bound of σ = 0.735 (an upper bound of $E_b/N_0$ = 2.67 dB) on the threshold of (3, 6)-regular LDPC codes under LP-decoding, thus decreasing the gap to the asymptotic threshold of BP-based decoding.

In our analysis we utilize the combinatorial interpretation of LP-decoding via graph covers [VK05] to simplify some of the proofs in [ADS09]. Specifically, using the equivalence of graph-cover decoding and LP-decoding in [VK05], we obtain a simpler proof that local-optimality suffices for LP-optimality.

1.1.3 Local-Optimality Certificates for ML-Decoding and LP-Decoding of Tanner Codes

In Chapter 5 (based on [HE11b, HE12c]) we deal with a new combinatorial characterization for local-optimality of a codeword in irregular Tanner codes with respect to any MBIOS channel. We prove that local-optimality in this new characterization implies ML-optimality and LP-optimality. Given a codeword and the channel output, we also show how to efficiently recognize whether the codeword is locally optimal by a dynamic programming algorithm.

Previous Results. In Section 1.1.2, we discussed the analysis of the local-optimality characterization based on weighted skinny trees in the Tanner graph.

Vontobel [Von10] extended the decomposition of a codeword (and pseudocodeword) to skinny trees in graph covers. This enabled Vontobel to mitigate the limitation that the girth imposes on the height of the trees. The decomposition is obtained by a random walk, and also applies to irregular Tanner graphs.


Contributions. We present a new combinatorial characterization for local-optimality of a codeword in irregular Tanner codes with respect to any memoryless binary-input output-symmetric (MBIOS) channel. Local-optimality is characterized via costs of deviations based on subtrees in computation trees of the Tanner graph. Consider a computation tree with height 2h rooted at some variable node. A deviation is based on a subtree such that (i) the degree of a variable node equals its degree in the computation tree, and (ii) the degree of a local-code node equals some constant $d \geq 2$, provided that d is at most the minimum distance of the local codes. Furthermore, level weights $w \in \mathbb{R}^h_+$ are assigned to the levels of the tree. Hence, a deviation is a combinatorial structure that has three main parameters: deviation height h, deviation level weights $w \in \mathbb{R}^h_+$, and deviation “degree” d. Therefore, the new definition of local-optimality is based on three parameters: $h \in \mathbb{N}$, $w \in \mathbb{R}^h_+$, and $d \geq 2$.

This characterization extends the notion of deviations in local-optimality in four ways: (i) no restrictions are applied to the degrees of the nodes in the Tanner graph, (ii) arbitrary local linear codes may be associated with constraint nodes, (iii) deviations are subtrees in the computation tree and no limitation is set on the height of the deviations; in particular, their height may exceed the girth of the Tanner graph, (iv) minimal deviations may have a degree d > 2 in the check nodes (as opposed to skinny trees in previous analyses), provided that d is at most the minimum distance of the local codes. We prove that local-optimality in this characterization implies ML-optimality. We utilize the equivalence of graph-cover decoding and LP-decoding for Tanner codes, implied by Vontobel and Koetter [VK05], to prove that local-optimality also suffices for LP-optimality. We present an efficient dynamic programming algorithm that computes a local-optimality certificate for a codeword with respect to a given channel output.

1.1.4 Message-Passing Decoding with Local-Optimality Guarantee for Tanner Codes with Single Parity-Check Local Codes

In Chapter 6 (based on [HE11b, HE12f]), we present a new message-passing iterative decoding algorithm, called normalized weighted min-sum (nwms). The nwms algorithm is a BP-type algorithm that applies to any irregular Tanner code with single parity-check local codes (e.g., LDPC codes and high-density parity-check codes). The decoding guarantee of nwms applies whenever there exists a locally optimal codeword. We prove that if a locally optimal codeword with respect to height parameter h exists, then nwms-decoding finds it in h iterations. This decoding guarantee holds for every finite value of h and is not limited by the girth. Because local-optimality of a codeword implies that it is the unique ML-codeword, the decoding guarantee also has an ML-certificate.

Previous Results. Arora et al. [ADS09] pointed out that it is possible to design a re-weighted version of the min-sum decoder for regular codes that finds the locally-optimal codeword, if one exists, for trees whose height is at most half of the girth.

Various iterative message-passing decoding algorithms are derived from the belief propagation algorithm (e.g., max-product [WLK95], attenuated max-product [FK00], tree-reweighted belief-propagation [WJW05], etc.). The convergence of these BP-based iterative decoding algorithms to an optimum solution has been studied extensively in various settings (see e.g., [WLK95, FK00, WF01, CF02, CDE+05, WJW05, RU01, JP11]). However, bounds on the time and message complexity of these algorithms are not considered. The analyses in these works often rely on the existence of a single optimal solution in addition to other conditions such as: single-loop graphs, large girth, large reweighting coefficients, consistency conditions, etc.

Jian and Pfister [JP11] analyzed a special case of the attenuated max-product decoder [FK00] for regular LDPC codes. They considered skinny trees in the computation tree, the height of which is greater than the girth of the Tanner graph. Using contraction properties and consistency conditions, they proved sufficient conditions under which the message-passing decoder converges to a locally optimal codeword. This convergence also implies convergence to the LP-optimum and therefore to the ML-codeword.

Contributions. We present a new message-passing iterative decoding algorithm, called the normalized weighted min-sum (nwms) algorithm. The nwms algorithm applies to irregular Tanner codes with single parity-check (SPC) local codes (e.g., LDPC codes and high-density parity-check codes). The characterization of local-optimality for Tanner codes with SPC local codes has two parameters: (i) a certificate height parameter h, and (ii) a vector of layer weights $w \in \mathbb{R}^h_+ \setminus \{0^h\}$. (Note that the local codes are single parity checks, and therefore the deviation degree d equals 2.) We prove that, for any finite h, the nwms decoder is guaranteed to compute the ML codeword in h iterations if an h-locally-optimal codeword exists. Furthermore, the output of nwms can be efficiently ML-certified. The number of iterations, h, may exceed the girth. Because local-optimality is a purely combinatorial property, the results do not rely on convergence. The time and message complexity of nwms is O(|E| · h), where |E| is the number of edges in the Tanner graph. Local-optimality, as discussed in Section 1.1.3, is a sufficient condition for successful decoding by our BP-based algorithm in loopy graphs.

Previous lower bounds on the probability that a local-optimality certificate exists [ADS09, HE11a] hold for regular LDPC codes. The same bounds also hold for successful decoding by nwms. These bounds are based on proving that a local-optimality certificate exists with high probability for noise thresholds close to the BP threshold. Specifically, noise thresholds of p∗ > 0.05 in the case of the BSC [ADS09], and σ∗ > 0.735 in the case of the BI-AWGN channel [HE11a] have been proven.

1.1.5 Bounds on the Word Error Probability for LP-Decoding of Regular Tanner Codes

In Chapter 7 (based on [HE11b]), we prove new bounds on the word error probability for LP-decoding of regular Tanner codes.

Previous Results. Most of the research on the error correction of Tanner codes deals with two main families of codes: (i) expander Tanner codes, and (ii) repeat-accumulate codes.

Sipser and Spielman [SS96] studied Tanner codes based on expander graphs and analyzed a simple bit-flipping iterative decoding algorithm. Specifically, their main results were stated for (2, ∆)-regular Tanner codes whose Tanner graphs are edge-vertex incidence graphs of ∆-regular expander graphs. Their novel scheme was later improved, and it was shown that expander Tanner codes can even asymptotically achieve capacity in the BSC with an iterative bit-flipping decoding scheme [Z01, BZ02, BZ04]. The latter works were stated for (2, ∆)-regular Tanner codes whose Tanner graphs are edge-vertex incidence graphs of bipartite ∆-regular expander graphs. In addition, these works also present a worst-case analysis (for a bit-flipping adversarial channel). Skachek and Roth [SR03, Ska07] presented a generalized minimum distance iterative decoding algorithm that attains the best worst-case analysis to date for iterative decoding of such expander codes.

Feldman and Stein [FS05] studied LP-decoding of expander codes. They proved that LP-decoding can asymptotically achieve the capacity of bounded-LLR MBIOS channels with a special family of (2, ∆)-regular expander Tanner codes. They also presented a worst-case analysis for bit-flipping adversarial channels.

The error-correction capability of expander codes depends on the expansion; thus fairly large degrees and very long block lengths are required to achieve good error correction.

Repeat-Accumulate (RA) codes were introduced by Divsalar, Jin and McEliece [DJM98] as a simple non-trivial example of turbo codes. Divsalar et al. proved bounds on the word error probability of ML decoding for RA codes that are inverse polynomial in the block length.

RA codes may be represented as Tanner codes using two types of simple local codes: (i) repetition codes, and (ii) single parity-check codes. The simplicity of the local codes and the structure of the Tanner graph make RA codes amenable to asymptotic analysis using density evolution techniques (see e.g., [JKM00]). However, there are still no proven finite-length bounds on the word error probability of RA codes for iterative message-passing decoders (for repetition q > 2).

Recently, Goldenberg and Burshtein [GB11] generalized the analysis of Feldman and Karger [FK02] to RA codes with even repetition q > 4. For this family of codes, they proved bounds on the word error probability of LP-decoding that are inverse polynomial in the block length.

Contributions. We present new bounds on the word error probability for LP-decoding of regular Tanner codes in MBIOS channels. In Section 1.1.3, we discussed a new local-optimality characterization for Tanner codes.

Because trees in our new characterization may have degrees larger than two, they contain more vertices. Hence this characterization leads to improved bounds for successful decoding of regular Tanner codes. These bounds extend the probabilistic analysis of the min-sum process by Arora et al. [ADS09] to a sum-min-sum process on regular trees. For regular Tanner codes, we prove bounds on the word error probability of LP-decoding under MBIOS channels that are inverse doubly-exponential in the girth of the Tanner graph. We also prove bounds on the threshold of regular Tanner codes whose Tanner graphs have logarithmic girth. This means that if the noise in the channel is below that threshold, then the decoding error diminishes as the block length grows. Note that Tanner graphs with logarithmic girth can be constructed explicitly (see e.g., [Gal63]).


1.1.6 Hierarchies of Local-Optimality

In Chapter 8 (based on [HE12a, HE12d]), we present hierarchies of locally-optimal codewords with respect to two parameters. One parameter is related to the minimum distance of the local codes in Tanner codes. The second parameter is related to the finite number of iterations used in iterative decoding, even when the number of iterations exceeds the girth of the Tanner graph. We show that these hierarchies satisfy inclusion properties as these parameters are increased. In particular, this implies that a codeword that is decoded with a certificate using an iterative decoder (nwms) after h iterations is decoded with a certificate after k · h iterations, for every positive integer k.

Previous Results. Suboptimal decoding of expander Tanner codes was analyzed in many works (see e.g., [SS96, BZ04, FS05]). The results in these analyses rely on: (i) the expansion properties of the Tanner graph, and (ii) constant relative minimum distances of the local codes. The error-correcting guarantees in these analyses improve as the expansion factor and relative minimum distance increase. In the first part of this chapter we focus on the effect of increasing the minimum distance of the local codes on the error-correcting guarantees of Tanner codes under ML-decoding and LP-decoding.

Density Evolution (DE) is used to study the asymptotic performance of decoding algorithms based on Belief-Propagation (BP) (see e.g., [RU01, CF02]). Convergence of BP-based decoding algorithms to some fixed point was studied in [FK00, WF01, WJW05, JP11]. However, convergence guarantees do not imply successful decoding after a finite number of iterations. Korada and Urbanke [KU11] provide an asymptotic analysis of iterative decoding “beyond” the girth. Specifically, they prove that one may exchange the order of the limits in DE-analysis of BP-decoding under certain conditions (i.e., variable node degree at least 5 and bounded LLRs). On the other hand, in the second part of this chapter we focus on properties of iterative decoding of finite-length codes using a finite number of iterations.

The characterization of local-optimality for Tanner codes has three parameters: (i) a height $h \in \mathbb{N}$, (ii) a vector of level weights $w \in \mathbb{R}^h_+$, and (iii) a degree $2 \leq d \leq d^*$, where $d^*$ is the minimum local distance. We define hierarchies of local-optimality with respect to the degree and height parameters. These hierarchies provide a partial explanation of two questions about successful decoding with ML-certificates: (1) What is the effect of increasing the minimum distance of the local codes in Tanner codes? (2) What is the effect of increasing the number of iterations beyond the girth in iterative decoding?

Contributions. To obtain one of the hierarchy results, we needed a new definition of local-optimality called strong local-optimality. We prove that if a codeword is strongly locally-optimal, then it is also locally-optimal. Hence, the results for local-optimality discussed in Section 1.1.3 also hold for strong local-optimality.

We present two combinatorial hierarchies:

1. A hierarchy of local-optimality based on degrees. The degree hierarchy states that a locally-optimal codeword x with degree parameter d is also locally-optimal with respect to any degree parameter d′ > d. The degree hierarchy implies that the occurrence of local-optimality does not decrease as the degree parameter increases.


2. A hierarchy of strong local-optimality based on height. The height hierarchy states that if a codeword x is strongly locally-optimal with respect to height parameter h, then it is also strongly locally-optimal with respect to every height parameter that is an integer multiple of h. The height hierarchy proves, for example, that the performance of iterative decoding with an ML-certificate (e.g., nwms) of finite-length Tanner codes with SPC local codes does not degrade as the number of iterations grows, even beyond the girth of the Tanner graph.

1.2 Organization

In Chapter 2 we review some basics of modern coding theory for linear-programming decoding, and establish notation that is used throughout the thesis.

In Chapter 3 we present local-optimality certificates for “even” Tanner codes. We prove inverse polynomial bounds in the code length on the word error probability for LP-decoding of Tanner codes whose Tanner graphs have logarithmic girth.

In Chapter 4 we extend the analysis in [ADS09] from the BSC to any MBIOS channel. We prove bounds on the word error probability for LP-decoding of regular LDPC codes that are inverse doubly-exponential in the girth of the Tanner graph.

In Chapter 5 we present a new combinatorial characterization of local-optimality for irregular Tanner codes. We prove that local-optimality in this characterization implies both ML-optimality and LP-optimality. We present an efficient dynamic-programming algorithm that computes a local-optimality certificate for a codeword with respect to a given channel output.

In Chapter 6 we present a new iterative message-passing decoding algorithm (nwms). The nwms algorithm is a BP-type algorithm that applies to any Tanner code with single parity-check local codes (e.g., LDPC codes). We prove that nwms is guaranteed to efficiently compute the locally-optimal codeword if one exists.

In Chapter 7 we present bounds on the word error probability for LP-decoding of regular Tanner codes in MBIOS channels.

We conclude in Chapter 8 with two hierarchies on the combinatorial characterization of local-optimality. These hierarchies explain: (i) the improvement in the probability that a local-optimality certificate exists when increasing the certificate’s height or density, and (ii) the improvement of iterative decoding (e.g., nwms) when increasing the number of iterations, even when the number of iterations exceeds the girth.


Chapter 2

Preliminaries

In this chapter we review some basics of modern coding theory for linear-programming decoding, and establish notation that is used throughout the thesis. We discuss the problem of communication over a noisy channel via channel coding. We review Tanner codes, the graph representation of Tanner codes, and some basic properties. We then formulate ML-decoding as a convex optimization problem, and define linear-programming decoding of Tanner codes. We present a symmetry property for decoding algorithms and show that LP-decoding of Tanner codes is symmetric. We conclude this chapter with the characterization of pseudocodewords of LP-decoding via graph covers.

2.1 Graph Terminology and Algebraic Notation

Algebraic notation. Let $x, y \in \mathbb{R}^N$. Let $\langle x, y \rangle \triangleq \sum_i x_i \cdot y_i$ denote the inner product of vectors x and y. Denote by $\|x\|_1 \triangleq \sum_i |x_i|$ the $\ell_1$ norm of x. The support supp(x) of a vector is the set of indices where x is nonzero. For a set of vectors $S \subseteq \mathbb{R}^N$, let conv(S) denote the convex hull of the set S, namely, $\mathrm{conv}(S) = \{\sum_i \lambda_i x_i \mid x_i \in S,\ \lambda_i \geq 0,\ \sum_i \lambda_i = 1\}$. The cardinality of a set S is denoted by |S|. We denote by [N] the set {1, 2, ..., N} for $N \in \mathbb{N}_+$. For a set $S \subseteq [N]$, we denote by $x_S \in \mathbb{R}^{|S|}$ the projection of the vector x onto the indices in S.

A vector is rational if all its components are rational. Similarly, a vector is integral if all its components are integral. A vector is fractional if at least one of its components is fractional.

Graph terminology. Let $G = (V, E)$ denote an undirected graph. Let $N_G(v)$ denote the set of neighbors of node $v \in V$. Let $\deg_G(v)$ denote the edge degree of vertex v in a graph G, i.e., $\deg_G(v)$ equals the number of edges incident to v in G with multiplicities. In the case where G is simple, it holds that $\deg_G(v) = |N_G(v)|$. For a set $S \subseteq V$, let $N_G(S) \triangleq \bigcup_{v \in S} N_G(v)$. A path $p = (v, \ldots, u)$ in G is a sequence of vertices such that there exists an edge between every two consecutive nodes in the sequence p. Let s(p) denote the first vertex of path p (source), and let t(p) denote the last vertex of path p (target). If s(p) = t(p), then the path is closed. A simple path is a path with no repeated vertex. A simple cycle is a closed path where the only repeated vertex is the first and last vertex. A path p is backtrackless if every three consecutive vertices along p are distinct (i.e., a subpath (u, v, u) is not allowed). Let |p| denote the length of a path p, i.e., the number of edges in p. Let $d_G(r, v)$ denote the distance (i.e., the length of a shortest path) between nodes r and v in G, and let girth(G) denote the length of the shortest cycle in G. Let p and q denote two paths in a graph G such that t(p) = s(q). The path obtained by concatenating the paths p and q is denoted by $p \circ q$.

Figure 2.1: Channel Coding. [Block diagram: information word u → channel encoder → codeword c → noisy channel → channel output y → channel decoder → estimates ĉ, û.]

An induced subgraph is a subgraph obtained by deleting a set of vertices. The subgraph of G induced by $S \subseteq V$ consists of S and all edges in E both endpoints of which are contained in S. Let $G_S$ denote the subgraph of G induced by S.

A graph $G = (V, E)$ is bipartite if V is the union of two disjoint independent sets, i.e., no two vertices within the same set are adjacent. We denote by $G = (\mathcal{V} \cup \mathcal{J}, E)$ a bipartite graph if $\mathcal{V} \cap \mathcal{J} = \emptyset$, and $\mathcal{V}$ and $\mathcal{J}$ are independent sets in G.
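Most bounds in this thesis are stated in terms of girth(G) of a bipartite Tanner graph. As an illustration only (not code from the thesis; the helper names `tanner_adjacency` and `girth` are hypothetical), the following Python sketch builds the Tanner graph of a small parity-check matrix, with variable node i as ("v", i) and check node j as ("c", j), and computes its girth by a breadth-first search from every vertex.

```python
from collections import deque


def tanner_adjacency(H):
    """Adjacency sets of the Tanner graph of a binary parity-check matrix H.

    Variable node i is ("v", i); check (local-code) node j is ("c", j).
    """
    adj = {}
    for j, row in enumerate(H):
        for i, h in enumerate(row):
            if h:
                adj.setdefault(("v", i), set()).add(("c", j))
                adj.setdefault(("c", j), set()).add(("v", i))
    return adj


def girth(adj):
    """Length of a shortest cycle in a simple undirected graph (inf if acyclic).

    A BFS is run from every vertex. A non-tree edge (u, w) together with the two
    BFS-tree paths back to the root contains a cycle of length at most
    dist[u] + dist[w] + 1; when the root lies on a shortest cycle this value
    equals the girth, so the minimum over all roots is exact.
    """
    best = float("inf")
    for root in adj:
        dist = {root: 0}
        parent = {root: None}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:
                    best = min(best, dist[u] + dist[w] + 1)
    return best


H_HEXAGON = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]            # a single 6-cycle
H_HAMMING = [[1, 0, 1, 0, 1, 0, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]
print(girth(tanner_adjacency(H_HEXAGON)), girth(tanner_adjacency(H_HAMMING)))
```

For the [7, 4, 3] Hamming parity-check matrix the Tanner graph has girth 4, because columns 3 and 7 share the first two checks; the quadratic cost of one BFS per vertex is acceptable only for toy graphs.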

2.2 Communicating over a Noisy Channel

Error control coding deals with reliable transmission of information over an unreliable channel. Figure 2.1 depicts a simplified abstraction of a communication system. Our goal is to transfer an information word $u \in \mathcal{X}^k$ reliably over a noisy channel. A discrete communication channel is defined by the following three components: (i) an input alphabet $\mathcal{X}$, (ii) an output alphabet $\mathcal{Y}$, and (iii) a conditional probability density function $f(y \mid c = x)$ for every pair $(x \in \mathcal{X}, y \in \mathcal{Y})$. We denote the channel input by c and the channel output by y.

Reliable transmission of information over a noisy channel can be achieved by a process of channel coding. A code $\mathcal{C}$ of length N over an alphabet $\mathcal{X}$ is a collection of elements from $\mathcal{X}^N$. The elements of the code are called codewords. The rate of a code is given by $R(\mathcal{C}) \triangleq \frac{1}{N} \log_{|\mathcal{X}|}(|\mathcal{C}|)$. The sender encodes an information word $u \in \mathcal{X}^k$ to a codeword $c \in \mathcal{X}^N$ by adding redundant symbols according to an injective encoding function $E : \mathcal{X}^k \to \mathcal{X}^N$. The set of all codewords $\mathcal{C} \triangleq \{c \mid c = E(u),\ u \in \mathcal{X}^k\}$ forms a code $\mathcal{C}$. A codeword c is then transmitted over a noisy channel that outputs a word $y \in \mathcal{Y}^N$. Given the channel output y, the decoder makes an estimate $\hat{u}$ of the original information word u according to a decoding function $D : \mathcal{Y}^N \to \mathcal{X}^k$. Decoding succeeds if $\hat{u} = u$ and fails if $\hat{u} \neq u$. Because the encoding function is injective, we may assume that the decoder outputs an estimate $\hat{c}$ of the transmitted codeword c instead of the corresponding information word.

In this thesis we restrict our discussion to a class of channels called memoryless binary-input output-symmetric (MBIOS) channels. Let $f(y \mid c = x)$ denote the conditional probability density function that defines a channel. A memoryless channel satisfies $f(y \mid c = x) = \prod_{i=1}^{N} f(y_i \mid c_i = x_i)$ for every pair of channel input $x \in \mathcal{X}^N$ and channel output $y \in \mathcal{Y}^N$. For a binary-input channel, the input alphabet is $\mathcal{X} = \{0, 1\}$ (or $\mathcal{X} = \{-1, 1\}$). A binary-input output-symmetric channel satisfies $f(y_i \mid c_i = 0) = f(-y_i \mid c_i = 1)$. In MBIOS channels, the log-likelihood ratio (LLR) vector $\lambda \in \mathbb{R}^N$ is defined by
$$\lambda_i(y_i) \triangleq \ln\left(\frac{f(y_i \mid c_i = 0)}{f(y_i \mid c_i = 1)}\right)$$
for every output symbol $y_i$.

Figure 2.2: Examples of MBIOS channels. (a) Binary symmetric channel, BSC(p): each input bit $c_i \in \{0, 1\}$ is received correctly with probability $1 - p$ and flipped with probability p. (b) Binary-input additive white Gaussian noise channel, BI-AWGNC(σ): $y_i = c_i + \phi_i$, where $c_i \in \{\pm 1\}$ and $\phi_i \sim \mathcal{N}(0, \sigma^2)$.

Following are two common examples of MBIOS channels that we refer to.

1. The binary symmetric channel with crossover probability $p < \frac{1}{2}$, denoted by BSC(p), is defined as follows. The input and output alphabets are $\mathcal{X} = \mathcal{Y} = \{0, 1\}$, where each input bit is flipped with probability p (see Figure 2.2(a)). The LLR function of BSC(p) is given by
$$\lambda_i(y_i) = \begin{cases} \ln\left(\frac{1-p}{p}\right) & \text{if } y_i = 0,\\ -\ln\left(\frac{1-p}{p}\right) & \text{if } y_i = 1. \end{cases}$$

2. The binary-input additive white Gaussian noise channel with noise variance $\sigma^2$, denoted by BI-AWGNC(σ), is defined as follows. The input alphabet is $\mathcal{X} = \{\pm 1\}$ and the output alphabet is $\mathcal{Y} = \mathbb{R}$. Typically, a bit $b \in \{0, 1\}$ is mapped to $(-1)^b$. Given a channel input $c_i \in \{\pm 1\}$, the channel outputs $y_i = c_i + \phi_i$, where $\phi_i \sim \mathcal{N}(0, \sigma^2)$ (see illustration in Figure 2.2(b)). For BI-AWGNC(σ), the LLR function is given by $\lambda_i(y_i) = \frac{2 y_i}{\sigma^2}$.
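The two LLR formulas can be evaluated directly. The Python sketch below is illustrative only (the function names are ours, not the thesis's) and assumes, as in the text, that bit 0 is sent as +1 and bit 1 as −1 over BI-AWGNC(σ). The last helper recomputes the BI-AWGNC LLR from the two Gaussian densities, which confirms the closed form since $((y+1)^2 - (y-1)^2)/(2\sigma^2) = 2y/\sigma^2$.

```python
import math


def llr_bsc(y, p):
    """LLR vector of BSC(p) for a binary output word y (entries 0 or 1), 0 < p < 1/2."""
    a = math.log((1 - p) / p)
    return [a if yi == 0 else -a for yi in y]


def llr_biawgn(y, sigma):
    """LLR vector of BI-AWGNC(sigma), assuming bit b is transmitted as (-1)**b."""
    return [2 * yi / sigma ** 2 for yi in y]


def llr_biawgn_from_density(yi, sigma):
    """The same LLR computed as ln(f(y|c=0) / f(y|c=1)) from Gaussian densities.

    Bit 0 is sent as +1 and bit 1 as -1, so the densities are centred at +1 and -1.
    """
    def f(mean):
        return (math.exp(-(yi - mean) ** 2 / (2 * sigma ** 2))
                / math.sqrt(2 * math.pi * sigma ** 2))
    return math.log(f(+1.0) / f(-1.0))


print(llr_bsc([0, 1, 0], 0.1))          # +-ln(9) per bit
print(llr_biawgn([0.5, -1.0], 1.0))     # [1.0, -2.0]
```

A positive LLR favours bit 0 and a negative LLR favours bit 1, in both channels.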

Let $\mathcal{C}$ denote the set of all codewords, i.e., $\mathcal{C} = \{c \mid c = E(u),\ u \in \{0, 1\}^k\}$. Assume that a codeword $c \in \mathcal{C}$ is transmitted and the channel outputs a word y. The maximum a posteriori (MAP) criterion minimizes the block error probability, i.e., the probability of the event $D(y) \neq c$. That is, MAP decoding is defined by
$$x^{map}(y) \triangleq \arg\max_{x \in \mathcal{C}} f(x \mid y). \tag{2.1}$$
According to Bayes’ rule, if the prior probability is uniform (i.e., $\Pr(c = x) = \frac{1}{|\mathcal{C}|}$), then maximizing the posterior probability $f(x \mid y)$ is equivalent to maximizing the likelihood $f(y \mid x)$. In this case block MAP decoding is called maximum-likelihood (ML) decoding, and is given by
$$x^{ml}(y) \triangleq \arg\max_{x \in \mathcal{C}} f(y \mid x). \tag{2.2}$$


$x^{ml}$ is referred to as the ML codeword. The following claim reformulates ML-decoding as a linear optimization problem over the codewords in the code $\mathcal{C}$.

Claim 2.1 For a code $\mathcal{C} \subseteq \{0, 1\}^N$, maximum-likelihood decoding is equivalent to
$$x^{ml}(y) = \arg\min_{x \in \mathcal{C}} \langle \lambda(y), x \rangle. \tag{2.3}$$

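Proof: Since the channel is memoryless,
$$\begin{aligned}
x^{ml} &= \arg\max_{x \in \mathcal{C}} \prod_{i=1}^{N} f(y_i \mid c_i = x_i)\\
&= \arg\min_{x \in \mathcal{C}} -\ln\Big(\prod_{i=1}^{N} f(y_i \mid c_i = x_i)\Big)\\
&= \arg\min_{x \in \mathcal{C}} \sum_{i=1}^{N} -\ln\big(f(y_i \mid c_i = x_i)\big)\\
&= \arg\min_{x \in \mathcal{C}} \sum_{i=1}^{N} \Big[\ln\big(f(y_i \mid c_i = 0)\big) - \ln\big(f(y_i \mid c_i = x_i)\big)\Big]\\
&= \arg\min_{x \in \mathcal{C}} \sum_{i : x_i = 1} \ln\left(\frac{f(y_i \mid c_i = 0)}{f(y_i \mid c_i = 1)}\right)\\
&= \arg\min_{x \in \mathcal{C}} \langle \lambda(y), x \rangle.
\end{aligned}$$
The fourth equality adds the term $\sum_{i=1}^{N} \ln f(y_i \mid c_i = 0)$, which does not depend on x; in the fifth equality, the summands with $x_i = 0$ vanish. □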

Note that the optimal ML solution is invariant under positive scaling of the LLR vector λ.
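Claim 2.1 can be checked numerically on a toy example. The following Python sketch (illustrative; the helper names are ours) decodes every possible BSC(0.1) output of the length-3 repetition code in two ways, by maximizing the likelihood product as in (2.2) and by minimizing $\langle \lambda(y), x \rangle$ as in (2.3), and also checks the invariance under positive scaling of λ. The repetition code is chosen because no output is equidistant from both codewords, so the two argmax/argmin rules never face ties.

```python
import itertools
import math


def llr_bsc(y, p):
    """LLR vector of BSC(p)."""
    a = math.log((1 - p) / p)
    return [a if yi == 0 else -a for yi in y]


def ml_by_likelihood(code, y, p):
    """argmax over codewords of prod_i f(y_i | c_i = x_i) on BSC(p), cf. (2.2)."""
    def likelihood(x):
        return math.prod((1 - p) if yi == xi else p for yi, xi in zip(y, x))
    return max(code, key=likelihood)


def ml_by_llr(code, llr):
    """argmin over codewords of <llr, x>, cf. (2.3)."""
    return min(code, key=lambda x: sum(li * xi for li, xi in zip(llr, x)))


REPETITION_3 = [(0, 0, 0), (1, 1, 1)]
p = 0.1
for y in itertools.product((0, 1), repeat=3):
    lam = llr_bsc(y, p)
    assert ml_by_likelihood(REPETITION_3, y, p) == ml_by_llr(REPETITION_3, lam)
    # Positive scaling of the LLR vector does not change the minimiser.
    assert ml_by_llr(REPETITION_3, [5 * li for li in lam]) == ml_by_llr(REPETITION_3, lam)
print("both decoding rules agree on all 8 outputs")
```

Brute-force enumeration of the code is exponential in k, so this is only a sanity check of the identity, not a decoder.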

2.3 Tanner Codes and Tanner Graph Representation

Minimum distance and linear codes. Let $x, y \in \{0, 1\}^N$. The Hamming weight of a word x, denoted by $w_H(x)$, equals the number of nonzero entries in x, i.e., $w_H(x) \triangleq \|x\|_1$. The Hamming distance between two vectors x and y, denoted by $d(x, y)$, is the number of indices in which they differ, i.e., $d(x, y) \triangleq \|x - y\|_1$. The minimum distance of a code $\mathcal{C}$, denoted by $d(\mathcal{C})$, is defined by $d(\mathcal{C}) \triangleq \min\{d(x, y) \mid x, y \in \mathcal{C},\ x \neq y\}$. This is also referred to simply as the distance of the code.

An [N, k, d] binary linear code is a k-dimensional vector subspace of the vector space $\{0, 1\}^N$ over GF(2) with minimum distance d. A parity-check matrix for an [N, k, d] binary linear code $\mathcal{C}$ is an $m \times N$ matrix H with $\mathrm{rank}(H) = N - k \leq m$ whose rows span the space of vectors orthogonal to $\mathcal{C}$.
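To make these definitions concrete, the Python sketch below (illustrative; the function names are ours) enumerates the code defined by a parity-check matrix H, i.e., all x with Hx = 0 over GF(2), and computes its minimum distance. For a linear code, the sum of two codewords is again a codeword, so $d(\mathcal{C})$ equals the minimum Hamming weight of a nonzero codeword. The example matrix is a standard parity-check matrix of the [7, 4, 3] Hamming code, whose column i is the binary representation of i (least significant bit in the first row).

```python
import itertools


def codewords_from_parity_check(H):
    """All x in {0,1}^N with H x = 0 over GF(2); brute force, so small N only."""
    n = len(H[0])
    return [x for x in itertools.product((0, 1), repeat=n)
            if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]


def minimum_distance(code):
    """d(C) of a linear code: the minimum Hamming weight of a nonzero codeword."""
    return min(sum(x) for x in code if any(x))


# Parity-check matrix of the [7, 4, 3] Hamming code (rank 3, so k = 7 - 3 = 4).
H_HAMMING = [[1, 0, 1, 0, 1, 0, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]
code = codewords_from_parity_check(H_HAMMING)
print(len(code), minimum_distance(code))  # 16 3
```

The distance is 3 because all columns of H are distinct and nonzero (so no codeword has weight 1 or 2), while columns 1, 2, and 3 sum to zero.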

Tanner codes. Tanner [Tan81] introduced a class of codes that generalizes the low-density parity-check codes of Gallager [Gal63]. Tanner used a bipartite graph to represent a long “global” code that is built from one or more short “local” codes. That is, a new code is defined explicitly by its decomposition into shorter sub-codes. We refer to this family of codes as Tanner codes based on Tanner graphs.

Figure 2.3: Tanner Graph of a Tanner Code. [Variable nodes $v_1, \ldots, v_N$ (the set $\mathcal{V}$), assigned the values $x_1, \ldots, x_N$, are connected to local-code nodes $C_1, \ldots, C_J$ (the set $\mathcal{J}$); $n_j$ denotes the degree of $C_j$.]

Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote an edge-labeled bipartite graph, where $\mathcal{V} = \{v_1, \ldots, v_N\}$ is a set of N vertices called variable nodes, and $\mathcal{J} = \{C_1, \ldots, C_J\}$ is a set of J vertices called local-code nodes. We denote the degree of $C_j$ by $n_j$ (see illustration in Figure 2.3).

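We associate with each local-code node $C_j$ a linear code $\overline{\mathcal{C}}^j$ of length $\deg_G(C_j)$. Let $\overline{\mathcal{C}}^{\mathcal{J}} \triangleq \{\overline{\mathcal{C}}^j : \overline{\mathcal{C}}^j \text{ is an } [n_j, k_j, d_j] \text{ code},\ j \in [J]\}$ denote the set of J local codes, one for each local-code node. We say that $v_i$ participates in $\overline{\mathcal{C}}^j$ if $(v_i, C_j)$ is an edge in E.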

A word $x = (x_1, \ldots, x_N) \in \{0, 1\}^N$ is an assignment to the variable nodes in $\mathcal{V}$, where $x_i$ is assigned to $v_i$. Let $\mathcal{V}_j$ denote the set $N_G(C_j)$ ordered according to the labels of the edges incident to $C_j$. Denote by $x_{\mathcal{V}_j} \in \{0, 1\}^{\deg_G(C_j)}$ the projection of the word $x = (x_1, \ldots, x_N)$ onto the entries associated with $\mathcal{V}_j$.

The Tanner code $\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ based on the labeled Tanner graph G is the set of vectors $x \in \{0, 1\}^N$ such that $x_{\mathcal{V}_j}$ is a codeword in $\overline{\mathcal{C}}^j$ for every $j \in [J]$.

Let $d_j$ denote the minimum distance of the local code $\overline{\mathcal{C}}^j$. The minimum local distance $d^*$ of a Tanner code $\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ is defined by $d^* = \min_j d_j$.

If the bipartite graph is $(d_L, d_R)$-regular, i.e., the vertices in $\mathcal{V}$ have degree $d_L$ and the vertices in $\mathcal{J}$ have degree $d_R$, then the graph defines a $(d_L, d_R)$-regular Tanner code.

If the Tanner graph is sparse, i.e., $|E| = O(N)$, then it defines a low-density Tanner code. A single parity-check code is the code that contains all binary words with even Hamming weight. Tanner codes with single parity-check local codes that are based on sparse Tanner graphs are called low-density parity-check (LDPC) codes.


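Consider a Tanner code $\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$. We say that a word $x = (x_1, \ldots, x_N)$ satisfies the local code $\overline{\mathcal{C}}^j$ if $x_{\mathcal{V}_j} \in \overline{\mathcal{C}}^j$. The set of words x that satisfy the local code $\overline{\mathcal{C}}^j$ is denoted by $\mathcal{C}^j$, i.e., $\mathcal{C}^j = \{x \in \{0, 1\}^N : x_{\mathcal{V}_j} \in \overline{\mathcal{C}}^j\}$. The resulting code $\mathcal{C}^j$ is the extension of the local code $\overline{\mathcal{C}}^j$ from length $n_j$ to length N. It holds that
$$\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}}) = \bigcap_{j \in [J]} \mathcal{C}^j. \tag{2.4}$$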
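Equation (2.4) suggests a direct, if exponential, way to list a small Tanner code: build each extension $\mathcal{C}^j$ and intersect. The Python sketch below does this for a hypothetical toy graph (not taken from the thesis) with N = 4 variable nodes and two local-code nodes with ordered neighborhoods $\mathcal{V}_1 = (0, 1, 2)$ and $\mathcal{V}_2 = (1, 2, 3)$ (0-based indices), each carrying the single parity-check code of length 3.

```python
import itertools


def extension(N, Vj, local_code):
    """The extended code C^j = {x in {0,1}^N : x_{V_j} in local_code}."""
    return {x for x in itertools.product((0, 1), repeat=N)
            if tuple(x[i] for i in Vj) in local_code}


def tanner_code(N, neighborhoods, local_codes):
    """Brute-force Tanner code as the intersection (2.4) of the extended local codes.

    neighborhoods[j] is the ordered tuple V_j of variable indices adjacent to C_j,
    and local_codes[j] is the local code of C_j given as a set of tuples.
    """
    code = set(itertools.product((0, 1), repeat=N))
    for Vj, local_code in zip(neighborhoods, local_codes):
        code &= extension(N, Vj, local_code)
    return code


# Single parity-check code of length 3: all binary triples of even weight.
SPC3 = {t for t in itertools.product((0, 1), repeat=3) if sum(t) % 2 == 0}
C = tanner_code(4, [(0, 1, 2), (1, 2, 3)], [SPC3, SPC3])
print(sorted(C))  # [(0, 0, 0, 0), (0, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 1)]
```

Each extension here has 8 words (4 even-weight triples times 2 free values of the remaining bit), while their intersection has 4 words, i.e., a [4, 2] code.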

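Rate and distance properties of Tanner codes. Tanner proved the following lower bounds on the rate and distance of a Tanner code $\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$.

Theorem 2.2 ([Tan81]) Let $\mathcal{C} = \mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ denote a Tanner code with rate R. Assume that the variable nodes in G have degree m. If all local codes $\overline{\mathcal{C}}^j \in \overline{\mathcal{C}}^{\mathcal{J}}$ are linear codes with rate r, then
$$R \geq 1 - (1 - r)m.$$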

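Theorem 2.3 (Tree Bound on Minimum Distance [Tan81]) Let $\mathcal{C} = \mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ denote a Tanner code with distance D. Assume that the variable nodes in G have degree m, and let $g = \mathrm{girth}(G)$. If all local codes $\overline{\mathcal{C}}^j \in \overline{\mathcal{C}}^{\mathcal{J}}$ have distance d, then
$$D \geq d\,\frac{[(d-1)(m-1)]^{(g-2)/4} - 1}{(d-1)(m-1) - 1} + \frac{d}{m}\,[(d-1)(m-1)]^{(g-2)/4} \quad \text{for } \tfrac{g}{2} \text{ odd},$$
and
$$D \geq d\,\frac{[(d-1)(m-1)]^{g/4} - 1}{(d-1)(m-1) - 1} \quad \text{for } \tfrac{g}{2} \text{ even}.$$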

For example, if we consider some trivial Tanner code with the [7, 4, 3] Hamming code as local codes and variable nodes of degree m = 2 in a Tanner graph with girth $g = \Theta(\log_7 N)$, then Theorem 2.2 and Theorem 2.3 imply that the resulting code has rate at least $\frac{1}{7}$ and distance $\Omega(N^{1/12})$.
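The two bounds are easy to evaluate exactly. The Python sketch below (illustrative; the function names are ours) computes Theorem 2.2 and Theorem 2.3 with exact rational arithmetic for the Hamming example above, assuming variable-node degree m = 2, which is the degree that makes $1 - (1 - \frac{4}{7})m$ equal $\frac{1}{7}$. The geometric-series quotient in Theorem 2.3 is evaluated as the sum $1 + q + \cdots + q^{k-1}$ with $q = (d-1)(m-1)$, which agrees with the quotient whenever $q \neq 1$.

```python
from fractions import Fraction


def tanner_rate_lower_bound(r, m):
    """Theorem 2.2: R >= 1 - (1 - r) m (local codes of rate r, variable degree m)."""
    return 1 - (1 - Fraction(r)) * m


def tanner_tree_bound(d, m, g):
    """Theorem 2.3: lower bound on the distance D (local distance d,
    variable degree m, even girth g)."""
    if g % 2:
        raise ValueError("the girth of a bipartite graph is even")
    q = (d - 1) * (m - 1)
    if (g // 2) % 2 == 1:                  # g/2 odd
        k = (g - 2) // 4
        tail = Fraction(d, m) * q ** k
    else:                                  # g/2 even
        k = g // 4
        tail = Fraction(0)
    # d * (q^k - 1) / (q - 1) written as the geometric sum d * (1 + q + ... + q^(k-1)).
    return d * sum(q ** t for t in range(k)) + tail


# [7, 4, 3] Hamming local codes, variable-node degree m = 2.
print(tanner_rate_lower_bound(Fraction(4, 7), 2))   # 1/7
print([int(tanner_tree_bound(3, 2, g)) for g in (6, 8, 10, 12)])   # [6, 9, 15, 21]
```

With q = 2 in this example, the distance bound roughly doubles every time the girth grows by 4, which is the source of the polynomial-in-N distance when the girth is logarithmic in N.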

Sipser and Spielman [SS96] proved a stronger lower bound on the distance of $\mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ in the case where the variable nodes in G have degree 2. The bound is based on the vertex expansion characteristics of the graph G and properties of the local codes in $\overline{\mathcal{C}}^{\mathcal{J}}$.

Theorem 2.4 (Expander-Based Bound on Minimum Distance [SS96]) Let $\mathcal{C} = \mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ denote a Tanner code of length N and distance D. Assume that all variable nodes in G have degree 2. Also assume that all local codes $\overline{\mathcal{C}}^j \in \overline{\mathcal{C}}^{\mathcal{J}}$ are [n, k] codes with rate $r = \frac{k}{n}$ and distance d. Then,
$$D \geq N \left(\frac{d}{n}\right)^2 (1 - \varepsilon),$$
where ε depends on d and the expansion of G.

Note that $\left(\frac{d}{n}\right)^2 (1 - \varepsilon) > \alpha$ for some constant $0 < \alpha < 1$. Hence, the minimum distance of such a code is linear in the block length (for codes with sufficiently large block length).


2.4 Linear Programming Decoding of Tanner Codes over Memoryless Channels

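The minimum of a linear function over the convex hull of a finite set is attained at a vertex of the convex hull, and every such vertex belongs to the set. Therefore, following Claim 2.1, for a code $\mathcal{C}$, maximum-likelihood (ML) decoding is equivalent to
$$x^{ml}(y) = \arg\min_{x \in \mathrm{conv}(\mathcal{C})} \langle \lambda(y), x \rangle, \tag{2.5}$$
where $\mathrm{conv}(\mathcal{C})$ denotes the convex hull of the set $\mathcal{C}$.

In general, solving the optimization problem in (2.5) for linear codes is intractable [BMvT78]. Feldman et al. [Fel03, FWK05] introduced a linear programming relaxation for the problem of ML decoding of Tanner codes whose local codes are single parity-check codes. This definition is based on a fundamental polytope that corresponds to the Tanner graph G. We consider an extension of this definition to the case in which the local codes are arbitrary, as follows. The generalized fundamental polytope $\mathcal{P} \triangleq \mathcal{P}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ of a Tanner code $\mathcal{C} = \mathcal{C}(G, \overline{\mathcal{C}}^{\mathcal{J}})$ is defined by
$$\mathcal{P} \triangleq \bigcap_{\overline{\mathcal{C}}^j \in \overline{\mathcal{C}}^{\mathcal{J}}} \mathrm{conv}(\mathcal{C}^j). \tag{2.6}$$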

Note that a Tanner code may have multiple representations by a Tanner graph and local codes. Moreover, different representations $(G,\mathcal{C}^J)$ of the same Tanner code $C$ may yield different generalized fundamental polytopes $\mathcal{P}(G,\mathcal{C}^J)$. If the degree of each local-code node is constant, then the generalized fundamental polytope can be represented by $O(N+|\mathcal{J}|)$ variables and $O(|\mathcal{J}|)$ constraints. If, in addition, the Tanner graph is sparse, then $|\mathcal{J}| = O(N)$, and the generalized fundamental polytope has an efficient representation. Such Tanner codes are often called generalized low-density parity-check codes.
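Because the minimum in (2.5) is attained at a codeword, ML decoding of a toy Tanner code can be sketched by brute force over $C$. The sketch assumes, for illustration only, the cycle code of the complete graph $K_4$: its 6 edges are the variable nodes and its 4 vertices are the local-code nodes, each carrying the even-weight code of length 3. The helper names are hypothetical, and the enumeration is exponential in $N$.

```python
from itertools import product

def tanner_codewords(n, checks, local_codes):
    """Enumerate C(G, C^J): all x in {0,1}^n whose projection onto the neighborhood
    checks[j] of every local-code node j lies in the local code local_codes[j].
    Exponential in n, so only suitable for toy examples."""
    for x in product((0, 1), repeat=n):
        if all(tuple(x[i] for i in nbrs) in code
               for nbrs, code in zip(checks, local_codes)):
            yield x

def ml_decode(llr, codewords):
    """(2.5): the minimum over conv(C) is attained at a codeword, so scan C."""
    return min(codewords, key=lambda x: sum(l * xi for l, xi in zip(llr, x)))

# Cycle code of K4: variables = the 6 edges, local-code nodes = the 4 vertices.
K4_CHECKS = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
EVEN3 = {(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)}
code = list(tanner_codewords(6, K4_CHECKS, [EVEN3] * 4))
x_ml = ml_decode([-1, -1, 1, -1, 1, 1], code)
```

With this LLR vector the ML codeword is the triangle formed by edges 0, 1 and 3, with cost $-3$.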

Given an LLR vector $\lambda$ for a received word $y$, LP-decoding is defined by the following linear program:

$$x^{\mathrm{lp}}(y)\triangleq\arg\min_{x\in\mathcal{P}(G,\mathcal{C}^J)}\langle\lambda(y), x\rangle. \tag{2.7}$$

If the LP solution is integral, then the LP decoder outputs $x^{\mathrm{lp}}$. In contrast, if the LP solution is fractional, then the decoder outputs “error”. As in the case of ML-decoding, the optimal LP solution is invariant under positive scaling of the LLR vector $\lambda$.

The difference between ML-decoding and LP-decoding is that the generalized fundamental polytope $\mathcal{P}(G,\mathcal{C}^J)$ may strictly contain the convex hull of $C$. Nevertheless, the integral points in $\mathcal{P}(G,\mathcal{C}^J)$ are exactly the codewords of $C$, i.e., $\mathcal{P}(G,\mathcal{C}^J)\cap\{0,1\}^N = C$; a polytope with this property is called proper. In particular, every codeword in $C$ is a vertex of $\mathcal{P}(G,\mathcal{C}^J)$. Vertices of $\mathcal{P}(G,\mathcal{C}^J)$ that are not codewords of $C$ must have fractional components and are called pseudocodewords. Because $\mathrm{conv}(C)\subseteq\mathcal{P}(G,\mathcal{C}^J)$, the LP decoder has the ML certificate property: if $x^{\mathrm{lp}}$ is a codeword, then it is guaranteed to be the ML-codeword.
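To see that $\mathcal{P}(G,\mathcal{C}^J)$ can be strictly larger than $\mathrm{conv}(C)$, consider a cycle code. For a single parity-check local code, $\mathrm{conv}(C_j)$ is described by the odd-set inequalities used by Feldman et al. [Fel03, FWK05]: for every odd-size subset $S$ of the neighbors of $C_j$, $\sum_{i\in S}f_i - \sum_{i\notin S}f_i \leq |S|-1$, together with $0\leq f\leq 1$. The sketch assumes, as an illustrative base graph, the barbell graph (two triangles joined by a bridge edge); all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def in_parity_polytope(f, nbrs):
    """Odd-set inequalities: conv of the even-weight code on the coordinates nbrs."""
    for k in range(1, len(nbrs) + 1, 2):  # odd subset sizes
        for S in combinations(nbrs, k):
            rest = [i for i in nbrs if i not in S]
            if sum(f[i] for i in S) - sum(f[i] for i in rest) > k - 1:
                return False
    return True

def in_fundamental_polytope(f, checks):
    """Membership in P(G, C^J) when every local code is a single parity-check code."""
    return all(0 <= v <= 1 for v in f) and all(in_parity_polytope(f, c) for c in checks)

# Barbell graph: variables are the edges ab, bc, ca, cd (bridge), de, ef, fd;
# local-code nodes are the vertices a, b, c, d, e, f.
BARBELL_CHECKS = [(0, 2), (0, 1), (1, 2, 3), (3, 4, 6), (4, 5), (5, 6)]
HALF = Fraction(1, 2)
PSEUDO = [HALF, HALF, HALF, Fraction(1), HALF, HALF, HALF]

codewords = [x for x in product((0, 1), repeat=7)
             if all(sum(x[i] for i in c) % 2 == 0 for c in BARBELL_CHECKS)]
```

The point `PSEUDO` (1/2 on every triangle edge, 1 on the bridge) lies in the polytope, yet every codeword has a 0 on the bridge, so the point lies outside $\mathrm{conv}(C)$.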


2.5 Symmetry of LP Decoding and the All-Zero Codeword Assumption

In order to simplify the probabilistic analysis of algorithms for decoding linear codes over symmetric channels, we would like to assume without loss of generality that the all-zero codeword was transmitted, i.e., $c = 0^N$. However, the validity of restricting the analysis to the all-zero codeword depends on the decoding algorithm employed. In this section we define a symmetry property of a decoding algorithm. Given that a decoding algorithm is symmetric, we prove the validity of the all-zero codeword assumption over symmetric channels.

For two vectors $y,z\in\mathbb{R}^N$, let “$*$” denote coordinatewise multiplication, i.e., $y*z\triangleq(y_1\cdot z_1,\ldots,y_N\cdot z_N)$.

Definition 2.5 (symmetry of decoding algorithm) Let $x\in C$ denote a codeword and let $b\in\{\pm1\}^N$ denote the vector defined by $b_i = (-1)^{x_i}$. Let $\lambda$ denote an LLR vector. A decoding algorithm $\mathrm{dec}(\lambda)$ is symmetric with respect to the code $C$ if

$$\forall x\in C.\quad x\oplus\mathrm{dec}(\lambda) = \mathrm{dec}(b*\lambda). \tag{2.8}$$
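Definition 2.5 can be spot-checked for ML decoding, which is symmetric with respect to every linear code. The sketch below assumes the $[7,4,3]$ Hamming code, defined by the parity-check matrix whose $i$-th column is the binary expansion of $i$, and checks (2.8) for every codeword and a few random Gaussian LLR vectors (ties occur with probability zero). The helper names are hypothetical.

```python
import random
from functools import reduce
from itertools import product

# [7,4,3] Hamming code: x is a codeword iff the XOR of (i + 1) over all i with x_i = 1 is 0.
HAMMING = [x for x in product((0, 1), repeat=7)
           if reduce(lambda s, i: s ^ (i + 1), (i for i in range(7) if x[i]), 0) == 0]

def ml_decode(llr):
    """Brute-force ML: the codeword minimizing <llr, x>."""
    return min(HAMMING, key=lambda x: sum(l * xi for l, xi in zip(llr, x)))

def symmetric_on(llr):
    """Check (2.8): x XOR dec(llr) == dec(b * llr) with b_i = (-1)^{x_i}, for all x in C."""
    base = ml_decode(llr)
    for x in HAMMING:
        flipped = [(-1) ** xi * l for xi, l in zip(x, llr)]
        if tuple(a ^ c for a, c in zip(x, base)) != ml_decode(flipped):
            return False
    return True

rng = random.Random(0)
results = [symmetric_on([rng.gauss(1.0, 1.0) for _ in range(7)]) for _ in range(20)]
```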

The following proposition follows from the symmetry of a decoding algorithm and the symmetry of an MBIOS channel.

Proposition 2.6 (All-zero codeword assumption) Consider a linear code $C\subseteq\{0,1\}^N$ and an MBIOS channel with LLR output $\lambda\in\mathbb{R}^N$. If a decoding algorithm $\mathrm{dec}$ is symmetric with respect to $C$, then the probability that $\mathrm{dec}$ fails is independent of the transmitted codeword. That is, for every $x\in C$,

$$\Pr\{\mathrm{dec}(\lambda)\neq x \mid c = x\} = \Pr\{\mathrm{dec}(\lambda)\neq 0^N \mid c = 0^N\}.$$

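Proof: By Definition 2.5, $\mathrm{dec}(b*\lambda) = x\oplus\mathrm{dec}(\lambda)$, which differs from $0^N$ if and only if $\mathrm{dec}(\lambda)\neq x$. Hence, for every $x\in C$,

$$\Pr\{\mathrm{dec}(\lambda)\neq x \mid c = x\} = \Pr\{\mathrm{dec}(b*\lambda)\neq 0^N \mid c = x\}.$$

For MBIOS channels, $\Pr(\lambda_i \mid c_i = 0) = \Pr(-\lambda_i \mid c_i = 1)$. Therefore, the mapping $(x,\lambda)\mapsto(0^N, b*\lambda)$ preserves probability measure. We apply this mapping to $(x, b*\lambda)\mapsto(0^N, b*b*\lambda)$; since $b*b*\lambda = \lambda$, we conclude that

$$\Pr\{\mathrm{dec}(b*\lambda)\neq 0^N \mid c = x\} = \Pr\{\mathrm{dec}(\lambda)\neq 0^N \mid c = 0^N\}.$$

□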
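Proposition 2.6 can be illustrated exactly on a small example. Assuming the $[7,4,3]$ Hamming code, ML decoding, and a BSC with crossover probability $p = 1/10$, the sketch sums over all $2^7$ error patterns and computes the failure probability separately for each transmitted codeword. Because the Hamming code is perfect, the nearest codeword is always unique, so no tie-breaking rule is needed; the BSC LLRs are scaled to $\pm 1$, which does not change the ML decision. The helper names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import product

HAMMING = [x for x in product((0, 1), repeat=7)
           if reduce(lambda s, i: s ^ (i + 1), (i for i in range(7) if x[i]), 0) == 0]

def ml_decode(llr):
    return min(HAMMING, key=lambda x: sum(l * xi for l, xi in zip(llr, x)))

def failure_probability(x, p):
    """Exact Pr{dec(lambda) != x | c = x} over a BSC(p), summing over all error patterns."""
    total = Fraction(0)
    for e in product((0, 1), repeat=7):
        y = [xi ^ ei for xi, ei in zip(x, e)]
        llr = [1 if yi == 0 else -1 for yi in y]  # BSC LLRs, positively scaled to +-1
        if ml_decode(llr) != x:
            w = sum(e)
            total += p ** w * (1 - p) ** (7 - w)
    return total

p = Fraction(1, 10)
probabilities = {failure_probability(x, p) for x in HAMMING}
```

All 16 codewords yield the same value, $1-(1-p)^7-7p(1-p)^6$, i.e., the probability of two or more bit errors.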

ML-decoding is symmetric with respect to every linear code $C$. Hence, one may assume that the all-zero codeword is transmitted when analyzing ML-decoding of linear codes over symmetric channels. In order to prove that LP-decoding is symmetric, Feldman et al. [Fel03, FWK05] define a symmetry property of the fundamental polytope with respect to a code $C$. They show that if an LP decoder is based on a fundamental polytope $\mathcal{P}$ that is symmetric with respect to $C$, then the LP decoder is symmetric with respect to $C$. We prove that the generalized fundamental polytope $\mathcal{P}(G,\mathcal{C}^J)$ is symmetric if the local-codes in $\mathcal{C}^J$ are linear. Therefore, the all-zero codeword assumption is valid for LP decoding of Tanner codes.

The following definition extends the notion of addition of codewords over $\{0,1\}^N$ to the case where one of the vectors is real.

Definition 2.7 ([Fel03]) Given a codeword $x\in\{0,1\}^N$ and a point $f\in[0,1]^N$, the relative point $x\oplus f\in[0,1]^N$ is defined by $(x\oplus f)_i = |x_i - f_i|$.

Note that $(x\oplus f)_i = x_i + (-1)^{x_i}\cdot f_i$. Hence, for a fixed $x\in\{0,1\}^N$, $x\oplus f$ is an affine function of $f$. It follows that for any distribution over vectors $f\in[0,1]^N$, we have $\mathbb{E}[x\oplus f] = x\oplus\mathbb{E}[f]$.

Definition 2.8 ($C$-symmetry of a Polytope [Fel03]) Consider a polytope $\mathcal{P}\subseteq[0,1]^N$ and a code $C$. For every $x\in C$, define $\pi_x:\mathcal{P}\to[0,1]^N$ by $\pi_x(f)\triangleq x\oplus f$. The polytope $\mathcal{P}$ is $C$-symmetric if $\pi_x$ is an automorphism of $\mathcal{P}$ for every $x\in C$.

Theorem 2.9 ($C$-symmetry of $\mathcal{P}(G,\mathcal{C}^J)$) Let $C(G,\mathcal{C}^J)$ denote a generalized Tanner code. If every local code $C_j\in\mathcal{C}^J$ is linear, then the polytope $\mathcal{P}(G,\mathcal{C}^J)$ is $C$-symmetric.

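Proof: Consider a codeword $x\in C(G,\mathcal{C}^J)$. If $\pi_x(f) = \pi_x(f')$, then $f = f'$. Hence, $\pi_x$ is injective. Moreover, $\pi_x$ is an involution, since $x\oplus(x\oplus f) = f$; hence, once we show that $\pi_x$ maps $\mathcal{P}$ into $\mathcal{P}$, it follows that $\pi_x$ is a bijection of $\mathcal{P}$ onto itself. It remains to show that if $f\in\mathcal{P}$, then $\pi_x(f)\in\mathcal{P}$.

From Equation (2.6), it follows that $f\in\mathrm{conv}(C_j)$ for every $C_j\in\mathcal{C}^J$. Therefore, we can express $f$ as a convex combination of some $N'\in\mathbb{N}$ affinely independent codewords in $C_j$. That is, for every $C_j\in\mathcal{C}^J$ there is a subset $\mathcal{X}^{(j)} = \{x^{(j)}_l\}_{l=1}^{N'}\subseteq C_j$ of vectors in $\{0,1\}^N$ such that

$$f = \sum_{l=1}^{N'}\alpha_l\cdot x^{(j)}_l, \tag{2.9}$$

where $\sum_{l=1}^{N'}\alpha_l = 1$, and $\alpha_l > 0$ for every $l\in[N']$. Because $x\oplus(\cdot)$ is affine and the coefficients sum to 1,

$$\pi_x(f) = x\oplus\Big(\sum_{l=1}^{N'}\alpha_l\cdot x^{(j)}_l\Big) = \sum_{l=1}^{N'}\alpha_l\cdot\big(x\oplus x^{(j)}_l\big).$$

Because $C_j$ is linear, $(x\oplus x^{(j)}_l)\in C_j$ for every $l\in[N']$. Therefore, $\pi_x(f)$ is also a convex combination of $N'$ codewords in $C_j$. We conclude that $\pi_x(f)\in\mathrm{conv}(C_j)$ for every local-code of $C(G,\mathcal{C}^J)$, and therefore $\pi_x(f)\in\mathcal{P}$. □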
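The key step of this proof, that $x\oplus f$ stays in $\mathrm{conv}(C_j)$ when $C_j$ is linear, can be spot-checked for a single local code. The sketch assumes $C_j$ is the even-weight code of length 4, whose convex hull is described by the odd-set inequalities, draws random rational points of $\mathrm{conv}(C_j)$, and checks that their relative points (Definition 2.7) with respect to every local codeword remain inside. This is a randomized sanity check, not a proof; the names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations, product

EVEN4 = [w for w in product((0, 1), repeat=4) if sum(w) % 2 == 0]  # local code C_j

def in_parity_polytope(f):
    """conv(EVEN4) via the odd-set inequalities plus 0 <= f_i <= 1."""
    n = len(f)
    if not all(0 <= v <= 1 for v in f):
        return False
    for k in range(1, n + 1, 2):
        for S in combinations(range(n), k):
            if sum(f[i] for i in S) - sum(f[i] for i in range(n) if i not in S) > k - 1:
                return False
    return True

def relative_point(x, f):
    """Definition 2.7: (x XOR f)_i = |x_i - f_i|."""
    return [abs(xi - fi) for xi, fi in zip(x, f)]

rng = random.Random(1)
all_inside = True
for _ in range(50):
    weights = [rng.randint(1, 9) for _ in EVEN4]
    f = [Fraction(sum(w * c[i] for w, c in zip(weights, EVEN4)), sum(weights))
         for i in range(4)]
    all_inside &= in_parity_polytope(f)
    all_inside &= all(in_parity_polytope(relative_point(x, f)) for x in EVEN4)
```

Linearity matters: for the non-codeword $x = (1,0,0,0)$, the relative point of $f = 0^4$ is $(1,0,0,0)$, which violates an odd-set inequality.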

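Proposition 2.10 Let $x\in\{0,1\}^N$, and let $b\in\{\pm1\}^N$ denote the vector defined by $b_i\triangleq(-1)^{x_i}$. For every $\lambda\in\mathbb{R}^N$ and every $f\in[0,1]^N$,

$$\langle b*\lambda, f\rangle = \langle\lambda, x\oplus f\rangle - \langle\lambda, x\rangle. \tag{2.10}$$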


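Proof: For $f\in[0,1]^N$, it holds that $\langle\lambda, x\oplus f\rangle = \langle\lambda, x\rangle + \sum_{i=1}^N(-1)^{x_i}\lambda_i f_i$. Hence,

$$\langle\lambda, x\oplus f\rangle - \langle\lambda, x\rangle = \sum_{i=1}^N(-1)^{x_i}\lambda_i f_i = \langle b*\lambda, f\rangle.$$

□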
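Identity (2.10) is easy to test with exact rational arithmetic. The sketch below draws random $x\in\{0,1\}^N$, points $f\in[0,1]^N$ with coordinates that are multiples of $1/8$, and integer LLR vectors $\lambda$, and compares both sides; $N = 6$ and the sampling ranges are arbitrary choices.

```python
import random
from fractions import Fraction

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def relative_point(x, f):
    """Definition 2.7: (x XOR f)_i = |x_i - f_i|."""
    return [abs(xi - fi) for xi, fi in zip(x, f)]

rng = random.Random(2)
N = 6
holds = True
for _ in range(100):
    x = [rng.randint(0, 1) for _ in range(N)]
    f = [Fraction(rng.randint(0, 8), 8) for _ in range(N)]
    lam = [Fraction(rng.randint(-5, 5)) for _ in range(N)]
    b = [(-1) ** xi for xi in x]
    lhs = inner([bi * li for bi, li in zip(b, lam)], f)      # <b * lambda, f>
    rhs = inner(lam, relative_point(x, f)) - inner(lam, x)   # <lambda, x XOR f> - <lambda, x>
    holds = holds and lhs == rhs
```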

Corollary 2.11 (Symmetry of LP-decoding) If a polytope $\mathcal{P}$ is $C$-symmetric, then the LP decoder based on $\mathcal{P}$ is symmetric with respect to $C$.

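Proof: Let $x\in C$ and let $b_i = (-1)^{x_i}$. By Proposition 2.10, $\langle b*\lambda, f\rangle = \langle\lambda, x\oplus f\rangle - \langle\lambda, x\rangle$ for every $f\in\mathcal{P}$. Because $\mathcal{P}$ is $C$-symmetric, the map $f\mapsto x\oplus f$ is a bijection of $\mathcal{P}$ onto itself, and it is an involution since $x\oplus(x\oplus f) = f$. Since $\langle\lambda, x\rangle$ does not depend on $f$, it follows that

$$f \text{ minimizes } \langle b*\lambda,\cdot\rangle \text{ over } \mathcal{P} \iff x\oplus f \text{ minimizes } \langle\lambda,\cdot\rangle \text{ over } \mathcal{P}. \tag{2.11}$$

Hence, $x^{\mathrm{lp}}(b*\lambda) = x\oplus x^{\mathrm{lp}}(\lambda)$. Moreover, $x\oplus f$ is integral if and only if $f$ is integral, so the decoder outputs “error” given $b*\lambda$ if and only if it outputs “error” given $\lambda$. □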

Following Proposition 2.6, Theorem 2.9 and Corollary 2.11, we conclude that the all-zero codeword assumption is valid for LP-decoding of Tanner codes. Because the all-zero codeword has zero cost, we have

$$\Pr\{\text{LP decoding fails}\} = \Pr\big\{\exists f\in\mathcal{P}(G,\mathcal{C}^J)\setminus\{0^N\}.\ \langle\lambda, f\rangle\leq 0 \,\big|\, c = 0^N\big\}. \tag{2.12}$$

2.6 Graph Cover Pseudo-Codewords and the Generalized Fundamental Polytope

Koetter and Vontobel [KV03, VK05] introduced a combinatorial concept called graph-cover decoding for decoding codes on graphs, and showed its equivalence to LP-decoding. The characterization of graph-cover decoding provides a useful tool for the analysis of LP-decoding and its connections to iterative message-passing decoding algorithms. In this section we study pseudocodewords of Tanner codes via graph covers. We define pseudocodewords of Tanner codes based on the definition of Koetter and Vontobel [KV03, VK05]. We conclude the section with a proof that the set of all graph-cover pseudocodewords of a Tanner code equals the set of rational vectors in the generalized fundamental polytope.

2.6.1 Covering Maps, Liftings, and Cover Codes

Let $\tilde{G} = (\tilde{V},\tilde{E})$ and $G = (V,E)$ be finite graphs and let $\pi:\tilde{G}\to G$ be a graph homomorphism, namely, $\forall\tilde{u},\tilde{v}\in\tilde{V}: (\tilde{u},\tilde{v})\in\tilde{E}\Rightarrow(\pi(\tilde{u}),\pi(\tilde{v}))\in E$. A homomorphism $\pi$ is a covering map if for every $\tilde{v}\in\tilde{V}$ the restriction of $\pi$ to the neighbors of $\tilde{v}$ is a bijection to the neighbors of $\pi(\tilde{v})$. The pre-image $\pi^{-1}(v)$ of a node $v$ is called a fiber. It is easy to see that all the fibers have the same cardinality if $G$ is connected. This common cardinality is called the degree or fold number of the covering map. If $\pi:\tilde{G}\to G$ is a covering map, we call $G$ the base graph and $\tilde{G}$ a cover of $G$ (see Figure 2.4). In the case where the fold number of the covering map is $M$, we say that $\tilde{G}$ is an $M$-cover of $G$.

[Figure 2.4, panels: (a) Base graph $G$ with nodes $v_1,\ldots,v_4$; (b) An $M$-cover of $G$ with fibers $\pi^{-1}(v_1),\ldots,\pi^{-1}(v_4)$.]
Figure 2.4: $M$-cover of base graph $G$: fiber $\pi^{-1}(v_i)$ contains $M$ copies of $v_i$, and there is a matching between fibers $\pi^{-1}(v_i)$ and $\pi^{-1}(v_j)$ if $(v_i,v_j)$ is an edge in $G$.

Given a base graph $G$ and a natural fold number $M$, an $M$-cover $\tilde{G}$ and a covering map $\pi:\tilde{G}\to G$ can be constructed in the following way. Let $\tilde{V} = V\times[M]$, and map every vertex $(v,i)\in\tilde{V}$ (where $i\in[M]$) to $v\in V$, i.e., $\pi(v,i) = v$. The edges in $\tilde{E}$ are obtained by specifying a matching of $M$ edges between $\pi^{-1}(u)$ and $\pi^{-1}(v)$ for every $(u,v)\in E$.
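This construction can be sketched with one random permutation of $[M]$ per base edge, which defines the matching between the two fibers. The function names are hypothetical, the base graph is assumed to be simple with integer vertex labels, and the checker only verifies the local-bijection property of the covering map.

```python
import random
from collections import defaultdict

def random_m_cover(edges, m, rng):
    """Cover vertices are pairs (v, i); each base edge (u, v) becomes a random
    perfect matching between the fibers {(u, i)} and {(v, i)}."""
    cover_edges = []
    for u, v in edges:
        perm = list(range(m))
        rng.shuffle(perm)
        cover_edges.extend(((u, i), (v, perm[i])) for i in range(m))
    return cover_edges

def is_covering_map(base_edges, cover_edges):
    """Check that pi(v, i) = v maps the neighbors of every cover vertex
    bijectively onto the neighbors of its image."""
    base_nbrs = defaultdict(list)
    for u, v in base_edges:
        base_nbrs[u].append(v)
        base_nbrs[v].append(u)
    cover_nbrs = defaultdict(list)
    for a, b in cover_edges:
        cover_nbrs[a].append(b)
        cover_nbrs[b].append(a)
    return all(sorted(w for w, _ in nbrs) == sorted(base_nbrs[v])
               for (v, _), nbrs in cover_nbrs.items())

K4_EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cover = random_m_cover(K4_EDGES, 5, random.Random(3))
```

Removing any single cover edge breaks the covering property, since its two endpoints lose a neighbor.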

Note that the term ‘covering’ originates from covering maps in algebraic topology, as opposed to other notions of ‘coverings’ in graphs or codes (e.g., vertex covers or covering codes).

We now define assignments to variable nodes in an $M$-cover of a Tanner graph. The assignment is induced by the covering map and an assignment to the variable nodes in the base graph.

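Definition 2.12 (lift, [VK05]) Consider a bipartite graph $G = (\mathcal{V}\cup\mathcal{J}, E)$ and an arbitrary $M$-cover $\tilde{G} = (\tilde{\mathcal{V}}\cup\tilde{\mathcal{J}},\tilde{E})$ of $G$. The $M$-lift of a vector $x\in\mathbb{R}^N$ is the assignment $\tilde{x}\in\mathbb{R}^{N\cdot M}$ to the nodes in $\tilde{\mathcal{V}}$ that is defined by the assignment $x\in\mathbb{R}^N$ to the nodes in $\mathcal{V}$ and the covering map $\pi:\tilde{G}\to G$ as follows: every $\tilde{v}\in\pi^{-1}(v)$ is assigned by $\tilde{x}$ the value assigned to $v$ by $x$.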

We now extend the notion of an $M$-cover of a graph to $M$-cover codes of Tanner codes. Consider a Tanner code $C(G,\mathcal{C}^J)$ of length $N$. Let $\tilde{G}$ denote an $M$-cover of $G$, where $\pi:\tilde{G}\to G$ is an $M$-covering map. The $M$-cover code $C(\tilde{G},\mathcal{C}^J_M)$ is obtained by associating the local code $C_j$ with each node in the fiber $\pi^{-1}(C_j)$. Hence, $\mathcal{C}^J_M$ is the multiset obtained by repeating each local-code $C_j\in\mathcal{C}^J$ $M$ times. Note that the $M$-cover code $C(\tilde{G},\mathcal{C}^J_M)$ is of length $M\cdot N$.

2.6.2 Graph Cover Pseudo-Codewords and the Generalized Fundamental Polytope

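Let $C\triangleq C(G,\mathcal{C}^J)$ denote a Tanner code of length $N$. For any positive integer $M$, let $\tilde{G} = (\tilde{\mathcal{V}}\cup\tilde{\mathcal{J}},\tilde{E})$ be an $M$-cover of $G$, and let $\tilde{C}\triangleq C(\tilde{G},\mathcal{C}^J_M)$ denote an $M$-cover code of $C$.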


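Definition 2.13 (pseudo-codeword, [VK05]) The (scaled) pseudo-codeword $\zeta(\tilde{x})\in\mathbb{Q}^N$ associated with the binary vector $\tilde{x} = \{\tilde{x}_{\tilde{v}}\}_{\tilde{v}\in\tilde{\mathcal{V}}}\in\tilde{C}$ of length $N\cdot M$ is the rational vector $\zeta(\tilde{x})\triangleq(\zeta_1(\tilde{x}),\zeta_2(\tilde{x}),\ldots,\zeta_N(\tilde{x}))$ defined by

$$\zeta_i(\tilde{x})\triangleq\frac{1}{M}\cdot\sum_{\tilde{v}\in\pi^{-1}(v_i)}\tilde{x}_{\tilde{v}}, \tag{2.13}$$

where the sum is taken in $\mathbb{R}$ (not in $\mathbb{F}_2$). We call the vector $M\cdot\zeta(\tilde{x})$ the unscaled pseudo-codeword associated with $\tilde{x}$. Additionally, we define $\zeta(\tilde{C})$ to be the set of scaled pseudo-codewords

$$\zeta(\tilde{C})\triangleq\{\zeta(\tilde{x}) : \tilde{x}\in\tilde{C}\}. \tag{2.14}$$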
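Equation (2.13) amounts to counting ones in each fiber. In the sketch below, a cover assignment is stored as a dictionary keyed by (variable index, copy index), which is an assumed representation, and the function name is hypothetical. The example assignment lives on a 2-cover of the barbell graph (two triangles joined by a bridge edge, variable 3 being the bridge): it puts a 1 on one of the two copies of each triangle edge and on both copies of the bridge.

```python
from fractions import Fraction

def pseudo_codeword(assignment, n, m):
    """(2.13): zeta_i = (1/M) * (number of ones among the M copies of v_i), summed over R."""
    return [Fraction(sum(assignment[(i, c)] for c in range(m)), m) for i in range(n)]

assignment = {(i, c): 0 for i in range(7) for c in range(2)}
for i in (0, 1, 2, 4, 5, 6):  # triangle edges: one copy holds a 1
    assignment[(i, 1)] = 1
assignment[(3, 0)] = assignment[(3, 1)] = 1  # bridge: both copies hold a 1
zeta = pseudo_codeword(assignment, 7, 2)
```

The result is the fractional vector $(\frac12,\frac12,\frac12,1,\frac12,\frac12,\frac12)$.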

Given a Tanner code $C(G,\mathcal{C}^J)$, we would like to characterize the set of all (scaled) pseudo-codewords obtained from all finite covers of the Tanner graph $G$. We follow the characterization of Vontobel and Koetter [VK05] for binary linear codes defined by a parity-check matrix, and generalize it to the class of Tanner codes.

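Consider a Tanner code $C(G,\mathcal{C}^J)$ of length $N$. Denote by $\tilde{\mathcal{Q}}(G,\mathcal{C}^J)$ the set of all triplets $(M,\tilde{G},\tilde{x})$ such that $\tilde{x}$ is a codeword of $C(\tilde{G},\mathcal{C}^J_M)$, where $\tilde{G}$ is a finite $M$-cover of $G$. That is,

$$\tilde{\mathcal{Q}}(G,\mathcal{C}^J)\triangleq\bigcup_{M\in\mathbb{N}^+}\ \bigcup_{\tilde{G}:\ \tilde{G}\text{ is an }M\text{-cover of }G}\ \bigcup_{\tilde{x}\in C(\tilde{G},\mathcal{C}^J_M)}\big\{(M,\tilde{G},\tilde{x})\big\}. \tag{2.15}$$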

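Denote by $\mathcal{Q}(G,\mathcal{C}^J)$ the “projection” of the set $\tilde{\mathcal{Q}}(G,\mathcal{C}^J)$ onto the set of (scaled) pseudo-codewords of length $N$, namely,

$$\mathcal{Q}(G,\mathcal{C}^J)\triangleq\bigcup_{(M,\tilde{G},\tilde{x})\in\tilde{\mathcal{Q}}(G,\mathcal{C}^J)}\big\{\zeta(\tilde{x})\big\}. \tag{2.16}$$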

We are now ready to state the theorem that characterizes the set $\mathcal{Q}(G,\mathcal{C}^J)$ using the generalized fundamental polytope $\mathcal{P}(G,\mathcal{C}^J)$ for a given Tanner code $C(G,\mathcal{C}^J)$. The following theorem is a generalization of [VK05] to the case where the local-codes are not necessarily parity codes (the local-codes may even be non-linear). The theorem states that the set of all scaled pseudo-codewords equals the set of all rational vectors in the generalized fundamental polytope.

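Theorem 2.14 Let $C = C(G,\mathcal{C}^J)$ denote a Tanner code. Then,

$$\mathcal{Q}(G,\mathcal{C}^J) = \mathcal{P}(G,\mathcal{C}^J)\cap\mathbb{Q}^N, \tag{2.17}$$

and

$$\mathcal{P}(G,\mathcal{C}^J) = \mathrm{closure}\big(\mathcal{Q}(G,\mathcal{C}^J)\big), \tag{2.18}$$

where the closure of the set $\mathcal{Q}$ is with respect to the space $\mathbb{R}^N$. Moreover, all vertices of the polytope $\mathcal{P}(G,\mathcal{C}^J)$ are in the set $\mathcal{Q}(G,\mathcal{C}^J)$.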


Proof: Theorem 2.14 follows directly from Lemmas 2.16, 2.17 and 2.18. □

Corollary 2.15 For every rational vector $\psi\in\mathcal{P}$ there exist $M$, an $M$-cover $\tilde{G}$ of $G$, and a corresponding codeword $\tilde{x}\in C(\tilde{G},\mathcal{C}^J_M)$ such that $\psi = \zeta(\tilde{x})$.

Lemma 2.16 $\mathcal{Q}(G,\mathcal{C}^J)\subseteq\mathcal{P}(G,\mathcal{C}^J)$.

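Proof: Consider a pseudo-codeword $\zeta(\tilde{x})\in\mathcal{Q}(G,\mathcal{C}^J)$, where $\tilde{x}$ is a codeword of an $M$-cover code $C(\tilde{G},\mathcal{C}^J_M)$. It suffices to show that $\zeta(\tilde{x})\in\mathrm{conv}(C_j)$ for every $C_j\in\mathcal{C}^J$.

We prove that $\zeta(\tilde{x})$ is an average of $M$ codewords in $C_j$. By Definition 2.13, every component $\zeta_i(\tilde{x})$ is the average of the values in $\tilde{x}$ assigned to the $M$ variable nodes in the fiber $\pi^{-1}(v_i)$. The projection of $\tilde{x}$ onto the coordinates of the neighbors of each of the $M$ local-code nodes in the fiber $\pi^{-1}(C_j)$ is a local codeword in $C_j$. For every $v_i\in\mathcal{N}_G(C_j)$, the edges between $\pi^{-1}(v_i)$ and $\pi^{-1}(C_j)$ form a perfect matching, so each node in $\pi^{-1}(v_i)$ is counted by exactly one of these $M$ local codewords. Since every local code in the fiber $\pi^{-1}(C_j)$ is $C_j$, we conclude that $\zeta(\tilde{x})$ is an average of $M$ codewords in the extended code $C_j$. □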

Lemma 2.17 The vertices of the generalized fundamental polytope $\mathcal{P}(G,\mathcal{C}^J)$ are rational.

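Proof: The vertices of the polytope $\mathrm{conv}(C_j)$ are integral. Therefore, the facets of $\mathrm{conv}(C_j)$ are contained in hyperplanes defined by linear constraints with rational coefficients. Note that $\bigcap_j\mathrm{conv}(C_j)$ can be described by the union of the above linear constraints. Every vertex of $\bigcap_j\mathrm{conv}(C_j)$ is the unique solution of a system of $N$ linearly independent equations taken from these constraints. Since all coefficients are rational, Cramer's rule implies that this solution is rational. □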

Lemma 2.18 All rational vectors in the generalized fundamental polytope $\mathcal{P}(G,\mathcal{C}^J)$ are in the set $\mathcal{Q}(G,\mathcal{C}^J)$.

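Proof: Let $C = C(G,\mathcal{C}^J)$ denote a generalized Tanner code of length $N$. Consider an arbitrary rational vector $\zeta\in\mathbb{Q}^N$ in $\mathcal{P}(G,\mathcal{C}^J)$. We need to prove that there is an $M$-cover $\tilde{G} = (\tilde{\mathcal{V}}\cup\tilde{\mathcal{J}},\tilde{E})$ of $G$ and a 0-1 assignment $\tilde{x}$ to its variable nodes $\tilde{\mathcal{V}}$ such that $\tilde{x}\in C(\tilde{G},\mathcal{C}^J_M)$ and $\zeta = \zeta(\tilde{x})$.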

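From Equation (2.6), it follows that $\zeta\in\mathrm{conv}(C_j)$ for every $C_j\in\mathcal{C}^J$. Using Carathéodory's Theorem (see e.g., [Sch98]), we can express $\zeta$ as a convex combination of $N'\leq N+1$ affinely independent codewords in $C_j$. That is, for every $C_j\in\mathcal{C}^J$ there is a subset $\mathcal{X}^{(j)} = \{x^{(j)}_l\}_{l=1}^{N'}\subseteq C_j$ of vectors in $\{0,1\}^N$ such that

$$\zeta = \sum_{l=1}^{N'}\lambda^{(j)}_l\cdot x^{(j)}_l, \tag{2.19}$$

where $\sum_{l=1}^{N'}\lambda^{(j)}_l = 1$, and $\lambda^{(j)}_l > 0$ for every $l\in[N']$. Since the codewords $x^{(j)}_l$ are affinely independent, the coefficient vector $\lambda^{(j)} = (\lambda^{(j)}_1,\ldots,\lambda^{(j)}_{N'})$ is the unique solution of Equation (2.19) together with $\sum_l\lambda^{(j)}_l = 1$. This is a linear system with rational coefficients (because $\zeta$ is rational and every $x^{(j)}_l$ is integral); hence, $\lambda^{(j)}$ is rational.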


Let $M$ denote the least common denominator (LCD) of the set $\{\lambda^{(j)}_l : j\in[J],\ l\in[N']\}$. We rewrite each coefficient as $\lambda^{(j)}_l\triangleq\frac{a^{(j)}_l}{M}$, where $a^{(j)}_l\in\mathbb{Z}_{>0}$. Therefore, for every $j\in[J]$ we can write

$$\zeta = \frac{1}{M}\sum_{l=1}^{N'}a^{(j)}_l\cdot x^{(j)}_l. \tag{2.20}$$

For every set $\mathcal{X}^{(j)}$, define the multiset $\mathcal{X}'^{(j)}$ of cardinality $M$ in which every member $x^{(j)}_l\in\mathcal{X}^{(j)}$ is repeated $a^{(j)}_l$ times. Formally,

$$\mathcal{X}'^{(j)} = \Big\{x'^{(j)}_m : x'^{(j)}_m = x^{(j)}_l \text{ for } \sum_{t\in[l-1]}a^{(j)}_t < m\leq\sum_{t\in[l]}a^{(j)}_t\Big\}. \tag{2.21}$$

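We can therefore write, for every $j\in[J]$,

$$\zeta = \frac{1}{M}\sum_{l=1}^{N'}a^{(j)}_l\cdot x^{(j)}_l \tag{2.22}$$

$$\phantom{\zeta} = \frac{1}{M}\sum_{l=1}^{N'}\sum_{t=1}^{a^{(j)}_l}x^{(j)}_l \tag{2.23}$$

$$\phantom{\zeta} = \frac{1}{M}\sum_{l=1}^{M}x'^{(j)}_l. \tag{2.24}$$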

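Note that Equation (2.24) implies that, for every $j$, the rational vector $\zeta$ is the average of $M$ codewords (with repetitions) in $C_j$. Since $x'^{(j)}_l$ is integral for every $j\in[J]$ and $l\in[M]$, it follows that $\zeta\cdot M$ is also integral.

We are now ready to describe a construction of a finite $M$-cover $\tilde{G} = (\tilde{\mathcal{V}}\cup\tilde{\mathcal{J}},\tilde{E})$ of $G$ and a proper assignment $\tilde{x}\in\{0,1\}^{N\cdot M}$ such that $\tilde{x}\in C(\tilde{G},\mathcal{C}^J_M)$ and $\zeta = \zeta(\tilde{x})$. Let $V(\tilde{G}) = \tilde{\mathcal{V}}\cup\tilde{\mathcal{J}} = \mathcal{V}\times[M]\cup\mathcal{J}\times[M]$. That is, for each vertex $v$ of $G$ (both variable nodes and local-code nodes), create a fiber $\pi^{-1}(v)$ of cardinality $M$ in $\tilde{G}$. Let $\tilde{x}\in\{0,1\}^{N\cdot M}$ denote the following assignment to the variable nodes in $\tilde{\mathcal{V}}$. For every $v_i\in\mathcal{V}$ and $l\in[M]$,

$$\tilde{x}_{(v_i,l)} = \begin{cases}1 & \text{if } 1\leq l\leq\zeta_i\cdot M,\\ 0 & \text{if } \zeta_i\cdot M < l\leq M.\end{cases} \tag{2.25}$$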

It remains to describe the set of edges $\tilde{E}$. Note that if $\pi:\tilde{G}\to G$ is an $M$-covering, then for every edge $e = (v_i,C_j)\in E$, $\pi^{-1}(e)$ is a perfect matching between $\pi^{-1}(v_i)$ and $\pi^{-1}(C_j)$. In fact, we construct $\pi$ by defining the matchings in $\tilde{G}$ corresponding to the edges in $G$. Consider a fiber $\pi^{-1}(C_j)$ of a local-code node $C_j\in\mathcal{J}$. We construct $|\mathcal{N}_G(C_j)|$ matchings, each of size $M$, between the fiber $\pi^{-1}(C_j)$ and the neighboring fibers $\pi^{-1}(v)$ such that $v\in\mathcal{N}_G(C_j)$. A local-code node $(C_j,l)$ is connected to one variable node in each fiber neighboring the fiber $\pi^{-1}(C_j) = \{(C_j,l)\}_{l=1}^M$. The connection is to a variable node assigned a one or a zero according to the codeword $x'^{(j)}_l$ (see Figure 2.5). Namely, the neighbors of $(C_j,l)$ in $\tilde{G}$ hold the projection of the codeword $x'^{(j)}_l$ onto the bits of $C_j$.

We need to show that the connections can be made without “running out” of zeros or ones. Recall that each variable node in a fiber neighboring the fiber $\{(C_j,l)\}_{l=1}^M$ should be connected to exactly one node in the fiber $\{(C_j,l)\}_{l=1}^M$. Since $\zeta\cdot M$ is integral, Equation (2.24) implies that there are exactly $\zeta_i\cdot M$ codewords of $C_j$ in the multiset $\mathcal{X}'^{(j)}$ whose $i$th entry equals 1. In each fiber $\pi^{-1}(v_i)$ of a variable node $v_i\in\mathcal{V}$ there are exactly $\zeta_i\cdot M$ nodes assigned the value 1 by $\tilde{x}$. Therefore, the connecting edges between variable nodes and local-code nodes in the $M$-cover $\tilde{G}$ “hit” every variable node in the fiber $\pi^{-1}(v)$ (where $v\in\mathcal{N}_G(C_j)$) exactly once. By the construction of $\tilde{x}$ and the $M$-cover $\tilde{G}$, we conclude that the projection of $\tilde{x}$ onto the coordinates of $\mathcal{N}_{\tilde{G}}((C_j,l))$ is a codeword in $C_j$. Since this holds for every $j\in[J]$ and $l\in[M]$, it follows that $\tilde{x}\in C(\tilde{G},\mathcal{C}^J_M)$, as required. □

[Figure 2.5: the fiber $\pi^{-1}(C_j) = \{(C_j,1),(C_j,2),\ldots,(C_j,M)\}$ and the fibers $\pi^{-1}(v_{i_1})$, $\pi^{-1}(v_{i_2})$, $\pi^{-1}(v_{i_3})$; in each variable fiber the first $\zeta_{i_k}\cdot M$ nodes hold ‘1’ and the remaining nodes hold ‘0’.]
Figure 2.5: Connecting edges in the proof of Lemma 2.18 between the fiber $\pi^{-1}(C_j)$ and every fiber $\pi^{-1}(v_i)$ such that $v_i\in\mathcal{N}_G(C_j)$. For example, we assume that $\mathcal{N}_G(C_j) = \{v_{i_1},v_{i_2},v_{i_3}\}$ and $x'^{(j)}_1|_{\mathcal{N}(C_j)} = (0,1,1)$, $x'^{(j)}_2|_{\mathcal{N}(C_j)} = (1,0,1)$, and $x'^{(j)}_M|_{\mathcal{N}(C_j)} = (0,1,1)$.
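The edge construction in this proof can be sketched for single parity-check local codes. For every local-code node $C_j$ and every neighbor $v_i$, the copies of $v_i$ holding a 1 are matched to the copies $(C_j,l)$ whose local codeword $x'^{(j)}_l$ has a 1 in the position of $v_i$, and likewise for zeros. The data layout (neighborhoods as index tuples, multisets as lists of tuples), the function name, and the barbell example (two triangles joined by a bridge edge, with the point that is $\frac12$ on triangle edges and 1 on the bridge) are illustrative assumptions; copy indices are 0-based.

```python
from fractions import Fraction

def build_cover(checks, zeta, m, multisets):
    """Construct the M-cover of Lemma 2.18 (sketch).

    checks[j]    -- variable indices adjacent to local-code node j in the base graph
    zeta         -- rational point (Fractions) with zeta[i] * m integral
    multisets[j] -- M local codewords (tuples ordered like checks[j]) averaging to zeta
    Returns the cover assignment {(i, l): bit} as in (2.25) and, for every cover
    local-code node (j, l), the list of cover variable nodes matched to it.
    """
    # (2.25) with 0-based copies: copy l holds a 1 iff l + 1 <= zeta_i * M.
    assignment = {(i, l): int(l < zeta[i] * m) for i in range(len(zeta)) for l in range(m)}
    cover_checks = {(j, l): [] for j in range(len(checks)) for l in range(m)}
    for j, nbrs in enumerate(checks):
        for pos, i in enumerate(nbrs):
            # Copies of v_i holding a one / a zero, not yet matched to fiber j.
            ones = [(i, l) for l in range(m) if assignment[(i, l)] == 1]
            zeros = [(i, l) for l in range(m) if assignment[(i, l)] == 0]
            for l, word in enumerate(multisets[j]):
                pool = ones if word[pos] == 1 else zeros
                if not pool:
                    raise ValueError("ran out of ones or zeros: inconsistent input")
                cover_checks[(j, l)].append(pool.pop())
            if ones or zeros:
                raise ValueError(f"some copy of v_{i} is unmatched")
    return assignment, cover_checks

# Barbell graph: triangle edges 0, 1, 2 and 4, 5, 6; bridge edge 3.
BARBELL_CHECKS = [(0, 2), (0, 1), (1, 2, 3), (3, 4, 6), (4, 5), (5, 6)]
HALF = Fraction(1, 2)
ZETA = [HALF, HALF, HALF, Fraction(1), HALF, HALF, HALF]
MULTISETS = [
    [(0, 0), (1, 1)],
    [(0, 0), (1, 1)],
    [(1, 0, 1), (0, 1, 1)],
    [(1, 1, 0), (1, 0, 1)],
    [(0, 0), (1, 1)],
    [(0, 0), (1, 1)],
]
assignment, cover_checks = build_cover(BARBELL_CHECKS, ZETA, 2, MULTISETS)
```

Every cover local-code node then sees an even-weight word, so the assignment is a codeword of the 2-cover code, and counting ones per fiber returns `ZETA`.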


Chapter 3

Bounds on the Word Error Probability for LP-Decoding of “Even” Tanner Codes

In this chapter¹ we deal with a family of Tanner codes in which (i) the degrees of the variable nodes all equal $d_L$, for an even number $d_L$, and (ii) local-codes contain only words with even weight. We call this family of codes even Tanner codes. We present a combinatorial characterization of local-optimality of a codeword in even Tanner codes with respect to any MBIOS channel. The local-optimality characterization is based on simple cycles in the Tanner graph. We prove that cycle-based local-optimality implies ML-optimality and LP-optimality. We prove inverse polynomial bounds in the code length on the word error probability for LP-decoding of even Tanner codes whose Tanner graphs have logarithmic girth.

This chapter presents a simple yet effective example of an analysis based on local-optimality. The analysis provides a guarantee for ML-optimality and LP-optimality. Moreover, the analysis of local-optimality implies bounds on the success probability of LP-decoding. Various methods that extend the local-optimality characterization and its analysis are presented throughout this thesis.

3.1 Introduction

Feldman and Karger [FK02] first introduced the novel concept of linear-programming-based decoding for repeat-accumulate RA(q) turbo-like codes. For RA(2) codes over the BSC, given a certain constant threshold on the noise, they proved that the word error probability of LP-decoding is bounded by an inverse polynomial in the code length. A similar claim was also shown for the BI-AWGN channel.

RA(2) codes are a family of cycle codes, i.e., Tanner codes with single parity-check local codes such that the degree of every variable node equals 2. In fact, the analysis presented by Feldman and Karger [FK02, FK04] holds for any cycle code whose Tanner graph has a logarithmic girth. Hence, LP-decoding of cycle codes has a word error probability of at most $N^{-\varepsilon}$ for any $\varepsilon > 0$, provided that the noise is bounded by a certain function of the constant $\varepsilon$ (independent of $N$). Moreover, in the case of cycle codes, LP-decoding is equivalent to the iterative min-sum decoder [FWK05].

¹ The research work presented in this chapter is based on [HE06]. An extension of the work presented in this chapter that includes the case of even Tanner codes with irregular left degrees appears in [HE12b, HE12e].

The analysis of Feldman and Karger [FK04] was improved in [HE05], where an exact combinatorial characterization of irreducible pseudocodewords w.r.t. the LLR vector received from the channel is presented. LP-decoding fails iff there exists an irreducible pseudocodeword with negative cost w.r.t. the LLR vector received from the channel. Hence, a careful combinatorial analysis of such irreducible pseudocodewords w.r.t. MBIOS channels resulted in improved bounds.

Recently, Goldenberg and Burshtein [GB11] generalized the analysis of Feldman and Karger [FK04] to RA(q) codes with even repetition $q\geq 4$. For this family of codes, they proved inverse polynomial bounds in the code length on the word error probability of LP-decoding. These bounds are based on an analysis of hyperpromenades, which were defined by Feldman and Karger [FK04]. We note that the family of even Tanner codes contains the family of repeat-accumulate codes with even repetitions studied in [GB11].

Contributions. We study pseudocodewords of Tanner codes for which (i) the Tanner graph is left regular with even degree, and (ii) local-codes contain only words with even weight. We call these codes even Tanner codes. We present a combinatorial characterization of local-optimality of a codeword in even Tanner codes with respect to any MBIOS channel. Local-optimality in this chapter is characterized via costs of simple cycles in the Tanner graph w.r.t. the LLR vector received from the channel. We prove that local-optimality implies ML-optimality and LP-optimality. Based on the local-optimality characterization, we prove inverse polynomial bounds in the code length on the word error probability for LP-decoding of even Tanner codes whose Tanner graphs have logarithmic girth.

3.2 Local-Optimality Certificates Based on Simple Cycles

In this chapter we consider a family of even Tanner codes.

Definition 3.1 Let $C(G,\mathcal{C}^J)$ denote a Tanner code based on a Tanner graph $G = (\mathcal{V}\cup\mathcal{J},E)$ with: (1) $\forall v\in\mathcal{V}.\ \deg_G(v) = d_L$, i.e., $G$ is left regular; (2) $d_L$ is even; (3) every local-code $C_j\in\mathcal{C}^J$ contains only words with even weight. We call this family of codes even Tanner codes.

The characteristic vector $\chi_G(p)\in\mathbb{N}^{|\mathcal{V}|}$ of a path $p$ in a Tanner graph $G$ is defined as follows. For every $v\in\mathcal{V}$, the component $[\chi_G(p)]_v$ equals the multiplicity of $v$ in $p$. If $p$ is closed (a cycle), then we count the last and first vertices in $p$ as one occurrence.

Let $\mathcal{B}$ denote the set of characteristic vectors of all simple cycles in $G$. Formally,

$$\mathcal{B}\triangleq\{\chi_G(p) \mid p \text{ is a simple cycle in } G\}.$$

Note that $\mathcal{B}\subseteq\{0,1\}^{|\mathcal{V}|}$ because every vertex appears at most once in a simple cycle.


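Definition 3.2 (cycle-based local-optimality) Let $C(G)\subset\{0,1\}^N$ denote an even Tanner code. A codeword $x\in C(G)$ is locally optimal with respect to $\lambda\in\mathbb{R}^N$ if for all vectors $\beta\in\mathcal{B}$,

$$\langle\lambda, x\oplus\beta\rangle > \langle\lambda, x\rangle. \tag{3.1}$$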
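For small graphs, the set $\mathcal{B}$ and Definition 3.2 can be evaluated by brute force. The sketch below enumerates the simple cycles of the Tanner graph by depth-first search (each cycle is grown from its smallest vertex, found in both directions, and deduplicated by its characteristic vector) and then tests (3.1). The example, the cycle code of $K_4$ (6 edge-variables, 4 vertex-checks), and all function names are illustrative assumptions; the enumeration is exponential in general.

```python
def simple_cycle_vectors(n, checks):
    """Return B: characteristic vectors (over variable nodes 0..n-1) of all simple cycles.

    Variable nodes are 0..n-1; local-code node j is n + j, adjacent to checks[j].
    """
    adj = {v: set() for v in range(n + len(checks))}
    for j, nbrs in enumerate(checks):
        for i in nbrs:
            adj[i].add(n + j)
            adj[n + j].add(i)
    found = set()

    def extend(start, path):
        # Grow simple paths whose vertices all exceed `start`; close them at `start`.
        for nxt in adj[path[-1]]:
            if nxt == start and len(path) >= 4:  # bipartite: a cycle has >= 4 nodes
                found.add(frozenset(v for v in path if v < n))
            elif nxt > start and nxt not in path:
                path.append(nxt)
                extend(start, path)
                path.pop()

    for s in adj:
        extend(s, [s])
    return [tuple(int(i in cyc) for i in range(n)) for cyc in found]

def is_locally_optimal(x, llr, cycle_vectors):
    """Definition 3.2: <llr, x XOR beta> > <llr, x> for every beta in B."""
    base = sum(l * xi for l, xi in zip(llr, x))
    return all(sum(l * (xi ^ bi) for l, xi, bi in zip(llr, x, beta)) > base
               for beta in cycle_vectors)

# Cycle code of K4: variables are the 6 edges, local-code nodes are the 4 vertices.
K4_CHECKS = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]
B = simple_cycle_vectors(6, K4_CHECKS)
```

$K_4$ has 7 simple cycles (4 triangles and 3 four-cycles). For example, $0^6$ is not locally optimal for $\lambda = (1,1,1,1,1,-5)$, since the triangle through edge 5 has negative cost.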

For two vectors $y,z\in\mathbb{R}^N$, let “$*$” denote coordinatewise multiplication, i.e., $(y*z)_i\triangleq y_i\cdot z_i$. For a word $x\in\{0,1\}^N$, let $b\in\{\pm1\}^N$ denote the vector defined by $b_i\triangleq(-1)^{x_i}$. The following proposition states that the mapping $(x,\lambda)\mapsto(0^N, b*\lambda)$ preserves local optimality.

Proposition 3.3 (symmetry of local-optimality) For every $x\in C$, $x$ is locally optimal w.r.t. $\lambda$ if and only if $0^N$ is locally optimal w.r.t. $b*\lambda$.

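Proof: By Proposition 2.10, $\langle\lambda, x\oplus\beta\rangle - \langle\lambda, x\rangle = \langle b*\lambda,\beta\rangle$ for every $\beta\in\mathcal{B}$, and $\langle b*\lambda,\beta\rangle = \langle b*\lambda, 0^N\oplus\beta\rangle - \langle b*\lambda, 0^N\rangle$ is exactly the quantity that determines whether $0^N$ is locally optimal w.r.t. $b*\lambda$. □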

3.2.1 Local-Optimality Implies ML-Optimality

In this subsection we show that local-optimality is sufficient for ML-optimality. We first prove the following decomposition lemma.

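Lemma 3.4 Let $C(G)$ denote an even Tanner code. For every codeword $x\neq 0^N$, there exist a distribution $\rho$ over simple cycles in $G$ and a constant $\alpha\geq 1$ such that

$$x = \alpha\cdot\mathbb{E}_{\beta\sim\rho}[\beta],$$

where $\beta$ ranges over the characteristic vectors in $\mathcal{B}$.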

Proof: Let $\mathcal{V}_x\triangleq\{v \mid x_v = 1\}$, and let $G_x$ denote the subgraph of the Tanner graph $G$ induced by $\mathcal{V}_x\cup\mathcal{N}_G(\mathcal{V}_x)$. Because $x$ is a codeword in an even Tanner code, the degree of every node (both variable nodes and local-code nodes) in $G_x$ is even. Therefore, every connected component of $G_x$ has an Eulerian cycle. Denote by $G^{(j)}_x$ the connected components of $G_x$, and denote by $\psi^{(j)}$ the corresponding Eulerian cycles.

Let $d_L$ denote the degree of the variable nodes in $G$. It holds that $\deg_{G_x}(v) = d_L$ for every variable node $v\in\mathcal{V}_x$. Consider a variable node $v$ in the connected component $G^{(j)}_x$; then the multiplicity of $v$ in $\psi^{(j)}$ equals $\frac{d_L}{2}$. Therefore, $x = \frac{2}{d_L}\cdot\sum_j\chi_G(\psi^{(j)})$.

Every Eulerian cycle $\psi^{(j)}$ can be decomposed into edge-disjoint simple cycles. Let $\Gamma$ denote the set of simple cycles in $G$ obtained by decomposing $G_x$ into simple cycles. Therefore,

$$x = \frac{2}{d_L}\cdot|\Gamma|\cdot\sum_{\gamma\in\Gamma}\frac{1}{|\Gamma|}\chi_G(\gamma).$$

Each variable node $v\in\mathcal{V}_x$ has degree $d_L$ in $G_x$, and each simple cycle through $v$ uses exactly two of its edges, so $v$ lies on exactly $\frac{d_L}{2}$ cycles of $\Gamma$. Hence, $|\Gamma|\geq\frac{d_L}{2}$, i.e., $\frac{2}{d_L}\cdot|\Gamma|\geq 1$. The lemma follows with $\rho$ uniform over $\Gamma$ and $\alpha = \frac{2}{d_L}\cdot|\Gamma|\geq 1$. □

Theorem 3.5 (local-optimality is sufficient for ML) Let $C(G)$ denote an even Tanner code. Let $\lambda\in\mathbb{R}^N$ denote the LLR vector received from the channel. If $x$ is a locally optimal codeword w.r.t. $\lambda$, then $x$ is also the unique maximum-likelihood codeword w.r.t. $\lambda$.


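Proof: [based on [ADS09, HE11a]] We use the decomposition proved in Lemma 3.4 to show that for every codeword $x'\neq x$, $\langle\lambda, x'\rangle > \langle\lambda, x\rangle$. Let $z\triangleq x\oplus x'$. By linearity, $z\in C(G)$. Moreover, $z\neq 0^N$ because $x\neq x'$. By Lemma 3.4, there exist a distribution $\rho$ over $\mathcal{B}$ and a constant $\alpha'\geq 1$ such that $z = \alpha'\cdot\mathbb{E}_{\beta\sim\rho}[\beta]$; hence, $\mathbb{E}_{\beta\sim\rho}[\beta] = \alpha\cdot z$, where $\alpha\triangleq 1/\alpha'\in(0,1]$. Let $f:[0,1]^N\to\mathbb{R}$ be the affine function defined by $f(\beta)\triangleq\langle\lambda, x\oplus\beta\rangle = \langle\lambda, x\rangle + \sum_{i=1}^N(-1)^{x_i}\lambda_i\beta_i$. Then,

$$\begin{aligned}
\langle\lambda, x\rangle &< \mathbb{E}_{\beta\sim\rho}\langle\lambda, x\oplus\beta\rangle && \text{(by local-optimality of } x)\\
&= \langle\lambda, x\oplus\mathbb{E}_{\beta\sim\rho}[\beta]\rangle && \text{(by linearity of } f \text{ and linearity of expectation)}\\
&= \langle\lambda, x\oplus\alpha z\rangle && \text{(by Lemma 3.4)}\\
&= \langle\lambda, (1-\alpha)x + \alpha(x\oplus z)\rangle && (\text{since } 0<\alpha\leq 1)\\
&= \langle\lambda, (1-\alpha)x + \alpha x'\rangle\\
&= (1-\alpha)\langle\lambda, x\rangle + \alpha\langle\lambda, x'\rangle,
\end{aligned}$$

which implies, since $\alpha > 0$, that $\langle\lambda, x'\rangle > \langle\lambda, x\rangle$, as desired. □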
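Theorem 3.5 can be spot-checked on a toy even Tanner code in which not every nonzero codeword is a simple cycle. The sketch assumes the cycle code of the barbell graph (two triangles joined by a bridge edge; $d_L = 2$, single parity-check local codes): its codewords are $0^7$, the two triangles, and their union, while $\mathcal{B}$ contains only the two triangles. For random integer LLR vectors (exact arithmetic), every locally optimal codeword is checked to be the unique minimizer of $\langle\lambda, x\rangle$. The names and sampling range are illustrative.

```python
import random
from itertools import product

# Barbell graph: variables are edges ab, bc, ca, cd (bridge), de, ef, fd.
BARBELL_CHECKS = [(0, 2), (0, 1), (1, 2, 3), (3, 4, 6), (4, 5), (5, 6)]
CODE = [x for x in product((0, 1), repeat=7)
        if all(sum(x[i] for i in c) % 2 == 0 for c in BARBELL_CHECKS)]
B = [(1, 1, 1, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1, 1)]  # the two triangles

def cost(llr, x):
    return sum(l * xi for l, xi in zip(llr, x))

def locally_optimal(x, llr):
    """Definition 3.2: every simple-cycle flip strictly increases the cost."""
    return all(cost(llr, [a ^ b for a, b in zip(x, beta)]) > cost(llr, x) for beta in B)

rng = random.Random(4)
certified = violations = 0
for _ in range(2000):
    llr = [rng.randint(-3, 5) for _ in range(7)]
    for x in CODE:
        if locally_optimal(x, llr):
            certified += 1
            # Theorem 3.5: x must have strictly smaller cost than every other codeword.
            if any(cost(llr, y) <= cost(llr, x) for y in CODE if y != x):
                violations += 1
```

Flipping both triangles is never tested directly by Definition 3.2, yet it is always costlier, because the two triangles are disjoint and their cost increments add, in line with the decomposition of Lemma 3.4.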

3.2.2 Local-Optimality Implies LP-Optimality

In this subsection we show that local-optimality is sufficient for LP-optimality. We consider graph-cover decoding introduced by Vontobel and Koetter [VK05] (see Section 2.6). Let $\tilde{G}$ denote an $M$-cover of $G$. Let $\tilde{x} = x^{\uparrow M}\in C(\tilde{G})$ and $\tilde{\lambda} = \lambda^{\uparrow M}\in\mathbb{R}^{N\cdot M}$ denote the $M$-lifts of $x$ and $\lambda$, respectively.

Proposition 3.6 (local-optimality of the all-zero codeword is preserved by $M$-lifts) $0^N$ is a locally optimal codeword w.r.t. $\lambda\in\mathbb{R}^N$ if and only if $0^{N\cdot M}$ is a locally optimal codeword w.r.t. $\tilde{\lambda}$.

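Proof: Consider the surjection $\varphi$ of simple cycles in $\tilde{G}$ to simple cycles in $G$. This surjection is based on the covering map between $\tilde{G}$ and $G$. Given a simple cycle $\tilde{\gamma}$ in $\tilde{G}$, let $\gamma\triangleq\varphi(\tilde{\gamma})$. Let $\beta\triangleq\chi_G(\gamma)$ and $\tilde{\beta}\triangleq\chi_{\tilde{G}}(\tilde{\gamma})$. The proposition follows because $\langle\lambda,\beta\rangle = \langle\tilde{\lambda},\tilde{\beta}\rangle$. □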

The following lemma states that local-optimality is preserved by lifting to an M-cover.

Lemma 3.7 $x$ is locally optimal w.r.t. $\lambda$ if and only if $\tilde{x}$ is locally optimal w.r.t. $\tilde{\lambda}$.

Proof: Assume that $\tilde{x}$ is a locally optimal codeword for $\tilde{\lambda}$. By Proposition 3.3, $0^{N\cdot M}$ is locally optimal w.r.t. $(-1)^{\tilde{x}}*\tilde{\lambda}$, which equals the $M$-lift $(b*\lambda)^{\uparrow M}$. By Proposition 3.6, $0^N$ is locally optimal w.r.t. $b*\lambda$. By Proposition 3.3, $x$ is locally optimal w.r.t. $\lambda$. Each of these implications is necessary and sufficient, and the lemma follows. □

The following theorem is obtained as a corollary of Theorem 3.5 and Lemma 3.7. The proof is based on arguments utilizing properties of graph-cover decoding (Theorem 2.14).

Theorem 3.8 (local optimality is sufficient for LP optimality) If $x$ is a locally optimal codeword w.r.t. $\lambda$, then $x$ is also the unique optimal LP solution given $\lambda$.


Proof: [based on [HE11a]] Suppose that $x$ is a locally optimal codeword w.r.t. $\lambda\in\mathbb{R}^N$. Theorem 2.14 implies that for every basic feasible solution $z\in[0,1]^N$ of the LP, there exist an $M$-cover $\tilde{G}$ of $G$ and an assignment $\tilde{z}\in\{0,1\}^{N\cdot M}$ such that $\tilde{z}\in C(\tilde{G})$ and $z = \zeta(\tilde{z})$, where $\zeta(\tilde{z})$ is the scaled projection of $\tilde{z}$ onto $G$ (i.e., the pseudo-codeword associated with $\tilde{z}$, see Definition 2.13). Moreover, since the number of basic feasible solutions is finite, we conclude that there exists a finite $M$-cover $\tilde{G}$ such that every basic feasible solution of the LP admits a valid assignment in $\tilde{G}$.

Let $z^*$ denote an optimal LP solution given $\lambda$. Without loss of generality, $z^*$ is a basic feasible solution. Let $\tilde{z}^*\in\{0,1\}^{N\cdot M}$ denote the 0-1 assignment in the $M$-cover $\tilde{G}$ that corresponds to $z^*\in[0,1]^N$. For every $\tilde{z}\in C(\tilde{G})$, we have $\langle\lambda^{\uparrow M},\tilde{z}\rangle = M\cdot\langle\lambda,\zeta(\tilde{z})\rangle$, and $\zeta(\tilde{z})\in\mathcal{P}(G,\mathcal{C}^J)$ by Theorem 2.14. Hence, by the optimality of $z^*$, $\tilde{z}^*$ is a codeword in $C(\tilde{G})$ that minimizes $\langle\lambda^{\uparrow M},\tilde{z}\rangle$ over $\tilde{z}\in C(\tilde{G})$; namely, $\tilde{z}^*$ is the ML codeword in $C(\tilde{G})$ w.r.t. $\lambda^{\uparrow M}$.

Let $\tilde{x} = x^{\uparrow M}$ denote the $M$-lift of the locally optimal codeword $x$. Note that because $x$ is a codeword, i.e., $x\in\{0,1\}^N$, there is a unique pre-image of $x$ in $\tilde{G}$, which is the $M$-lift of $x$. Lemma 3.7 implies that $\tilde{x}$ is a locally optimal codeword for $\lambda^{\uparrow M}$. By Theorem 3.5, $\tilde{x}$ is the ML codeword in $C(\tilde{G})$ w.r.t. $\lambda^{\uparrow M}$. Moreover, Theorem 3.5 guarantees the uniqueness of an ML optimal solution. Thus, $\tilde{x} = \tilde{z}^*$. By projection onto $G$, since $\tilde{x} = \tilde{z}^*$, we get that $x = z^*$, and uniqueness follows, as required. □

3.3 Bounds on the Word Error Probability Using Cycle-Based Local-Optimality

In the previous section, we showed that LP-decoding succeeds if a locally optimal codeword exists w.r.t. the LLR vector received from the channel. In this section we analyze the probability that a cycle-based locally optimal codeword exists for even Tanner codes in MBIOS channels. In order to simplify the probabilistic analysis of algorithms for decoding linear codes over symmetric channels, we apply the assumption that the all-zero codeword is transmitted, i.e., $c = 0^N$ (see Section 2.5). Consider the set $\mathcal{B}$ of characteristic vectors of all simple cycles in $G$. Then

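$$\begin{aligned}
\Pr\{\text{LP decoding fails}\} &\triangleq \Pr\{x\neq x^{\mathrm{lp}}(\lambda) \mid c = x\}\\
&\overset{(1)}{\leq} \Pr\big\{\exists\beta\in\mathcal{B} \text{ such that } \langle\lambda, x\oplus\beta\rangle\leq\langle\lambda, x\rangle \,\big|\, c = x\big\}\\
&\overset{(2)}{=} \Pr\big\{\exists\beta\in\mathcal{B} \text{ such that } \langle b*\lambda,\beta\rangle\leq 0 \,\big|\, c = x\big\}\\
&\overset{(3)}{=} \Pr\big\{\exists\beta\in\mathcal{B} \text{ such that } \langle\lambda,\beta\rangle\leq 0 \,\big|\, c = 0^N\big\}.
\end{aligned} \tag{3.2}$$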

Inequality (1) is the contrapositive of Theorem 3.8: if LP-decoding fails, then $x$ is not locally optimal. Equality (2) follows from Proposition 3.3. For MBIOS channels, $\Pr(\lambda_i \mid c_i = 0) = \Pr(-\lambda_i \mid c_i = 1)$. Therefore, the mapping $(x,\lambda)\mapsto(0^N, b*\lambda)$, where $b_i\triangleq(-1)^{x_i}$, preserves probability measure. Equality (3) follows by applying this mapping to $(x, b*\lambda)\mapsto(0^N, b*b*\lambda)$ and using $b*b*\lambda = \lambda$.

Therefore, our goal is to prove an upper bound on the probability that there exists a simple cycle in $G$ with non-positive cost w.r.t. $\lambda$. However, there are many simple cycles with varying lengths. Cycles in $G$ have lengths ranging from the girth (which is at most logarithmic in $N$) up to linear in $N$. We use a method suggested by Feldman and Karger [FK04] to bypass the problem of cycles of varying sizes by bounding the number of simple paths of length $g\triangleq\mathrm{girth}(G)$ with non-positive cost w.r.t. $\lambda$. This helps because if a non-positive cost simple cycle exists, then there exists a non-positive cost path of length $g$.

The bounds on the word error probability of the LP decoder are therefore based on bounding the probability that a non-positive cost cycle exists. Cycles with many variable nodes are unlikely to have non-positive cost. This suggests that the Tanner graph should have high girth. Note that Tanner graphs with logarithmic girth can be constructed explicitly (see e.g. [Gal63]).

For a path (or a cycle) $\psi$, let $\mathrm{cost}(\psi)\triangleq\langle\lambda,\chi_G(\psi)\rangle$ denote the cost of $\psi$ w.r.t. $\lambda$. The following lemma relaxes the problem of finding non-positive cost simple cycles to finding non-positive cost simple paths.

Lemma 3.9 Let $G$ denote a Tanner graph, and let $g = \mathrm{girth}(G)$. If there exists a simple cycle $\gamma$ in $G$ such that $\mathrm{cost}(\gamma) \leqslant 0$, then there exists a simple path (or cycle) $\psi$ in $G$ of length $g$ such that $\mathrm{cost}(\psi) \leqslant 0$.

Proof: Let $\gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{\ell-1}, \gamma_\ell = \gamma_0)$ be a simple cycle in $G$ of length $\ell$ where $\mathrm{cost}(\gamma) \leqslant 0$. We prove that there exists a simple path (or cycle) of length $g$ with non-positive cost. The proof utilizes an averaging argument over all segments of length $g$ in $\gamma$.

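Since $\gamma$ is a cycle, we have that $\ell \geqslant g$. Let $\psi_i = (\gamma_i, \gamma_{i+1}, \ldots, \gamma_{i+g})$ denote the segment of $\gamma$ that starts at vertex $\gamma_i$ and contains $g$ edges (indices are taken modulo $\ell$). Because $g$ is the girth of $G$, no segment $\psi_i$ has repeated vertices, except that its first and last vertices coincide when $\psi_i$ is itself a simple cycle of length $g$. We note that

$$\mathrm{cost}(\gamma) = \frac{1}{g} \cdot \sum_{i=0}^{\ell-1} \mathrm{cost}(\psi_i),$$

since every occurrence of a variable node in $\gamma$ is counted in the sum exactly $g$ times. Because $\mathrm{cost}(\gamma) \leqslant 0$, at least one segment $\psi_{i^*}$ must have non-positive cost, i.e., $\mathrm{cost}(\psi_{i^*}) \leqslant 0$. □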
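The averaging identity in the proof of Lemma 3.9 can be sanity-checked on a toy cycle. The following Python sketch is only an illustration under stated assumptions: vertex costs are listed in cyclic order ($\lambda_v$ for variable nodes, 0 for local-code nodes), and each segment is taken half-open, i.e., as the $g$ vertices $\gamma_i, \ldots, \gamma_{i+g-1}$, which is one way to make every vertex counted exactly $g$ times. The helper name `segment_costs` is hypothetical.

```python
from typing import Sequence


def segment_costs(cycle_costs: Sequence[float], g: int) -> list[float]:
    """Costs of the l cyclic segments of g consecutive vertices of a cycle.

    Assumption (illustration only): cycle_costs[j] is the cost of vertex
    gamma_j (lambda_v for a variable node, 0 for a local-code node), and
    segment i covers gamma_i, ..., gamma_{i+g-1} (indices modulo the cycle
    length), so every vertex lies in exactly g segments.
    """
    ell = len(cycle_costs)
    if not 0 < g <= ell:
        raise ValueError("segment length must satisfy 0 < g <= cycle length")
    return [sum(cycle_costs[(i + k) % ell] for k in range(g)) for i in range(ell)]


# A cycle of length 8 alternating variable nodes (nonzero costs) and
# local-code nodes (cost 0); its total cost is -1 <= 0.
costs = [1.0, 0.0, -2.0, 0.0, 1.0, 0.0, -1.0, 0.0]
segs = segment_costs(costs, g=4)
print(segs)  # [-1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```

The sum of the segment costs is $-4 = 4 \cdot \mathrm{cost}(\gamma)$, and the minimum segment cost is non-positive, as the lemma asserts.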

Let $d_R^{\max}$ denote the maximum degree of a local-code node $C \in \mathcal{J}$, and let $D \triangleq d_L \cdot d_R^{\max}$. By [Gal63], there exist explicit constructions of Tanner codes whose Tanner graphs have logarithmic girth. In particular, we assume that $\mathrm{girth}(G) = \log_D(N) = O(\log N)$. Therefore, with the above graph construction, by Equation (3.2) and a union bound over the paths of length $\log_D(N)$ in $G$, we derive an analytical bound on the word error probability of the LP decoder on the BSC.

Theorem 3.10 Let $\mathcal{C}(G)$ denote an even Tanner code of length $N$ such that $g = \log_D(N)$, where $D \triangleq d_L \cdot d_R^{\max}$. Consider a BSC with crossover probability $p$. For any $\varepsilon > 0$, if $4p < D^{-4(\varepsilon + \frac{3}{2})}$, then the LP decoder fails to decode the transmitted codeword with probability at most $N^{-\varepsilon}$.

For example, if $p \leqslant \frac{D^{-8}}{4}$, then the word error probability of LP-decoding is at most $\frac{1}{\sqrt{N}}$.


Proof: By Equation (3.2) we assume that the all-zero codeword is transmitted, i.e., $c = 0^N$. Hence, after scaling the LLR by the positive constant $1/\ln\frac{1-p}{p}$ (for $p < \frac{1}{2}$), $\lambda_v = 1$ w.p. $1-p$ and $\lambda_v = -1$ w.p. $p$. Along with Lemma 3.9, we bound the word error probability $P_w$ using a union bound over all the events where simple paths of length $g$ have non-positive cost in $G$.

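Let $\psi$ be a particular path of length $g$. Each variable node in the path has cost $+1$ with probability $1-p$ and $-1$ with probability $p$. At least half of the variable nodes in $\psi$ must have cost $-1$ in order for $\mathrm{cost}(\psi) \leqslant 0$. Hence $\Pr\{\mathrm{cost}(\psi) \leqslant 0\} \leqslant \binom{g/2}{g/4} p^{g/4}$. There are at most $|\mathcal{V}| \cdot D^{g/2}$ different simple paths of length $g$ in $G$. By the union bound,

$$
\begin{aligned}
P_w &\leqslant |\mathcal{V}| \cdot D^{\frac{g}{2}} \cdot \binom{g/2}{g/4} \cdot p^{\frac{g}{4}} \\
&\leqslant N \cdot D^{\frac{1}{2}\log_D(N)} \cdot \binom{\frac{1}{2}\log_D(N)}{\frac{1}{4}\log_D(N)} \cdot p^{\frac{1}{4}\log_D(N)} \\
&\leqslant N \cdot N^{\frac{1}{2}} \cdot 2^{\frac{1}{2}\log_D(N)} \cdot N^{\frac{1}{4}\log_D(p)} \\
&= N^{\frac{3}{2} + \frac{1}{2}\log_D(2) + \frac{1}{4}\log_D(p)} \\
&\leqslant N^{-\varepsilon}.
\end{aligned}
\tag{3.3}
$$

□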
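To see how the chain (3.3) behaves numerically, the sketch below evaluates its first line and its closed-form exponent for concrete parameters. Assumptions (ours, for illustration): $|\mathcal{V}| = N$, $N = D^g$ exactly, and $g$ divisible by 4 so that $\binom{g/2}{g/4}$ is an ordinary binomial coefficient; the function name `bsc_union_bound` is hypothetical. With $D = 12$ and $g = 8$, the choice $p = D^{-8}/4$ is the boundary case of Theorem 3.10 for $\varepsilon = \frac{1}{2}$.

```python
import math


def bsc_union_bound(N: int, D: int, p: float) -> tuple[float, float]:
    """Evaluate two steps of the chain (3.3) for the BSC.

    Returns (union, closed_form), where
      union       = N * D**(g/2) * C(g/2, g/4) * p**(g/4)   (first line of (3.3))
      closed_form = N**(3/2 + log_D(2)/2 + log_D(p)/4)      (line before <= N^-eps)
    with g = log_D(N). Illustration only: |V| = N is assumed, and g must be
    a multiple of 4 so that the binomial coefficient is defined.
    """
    g = round(math.log(N, D))
    if D ** g != N or g % 4 != 0:
        raise ValueError("sketch assumes N = D**g with g divisible by 4")
    union = N * D ** (g / 2) * math.comb(g // 2, g // 4) * p ** (g / 4)
    exponent = 1.5 + 0.5 * math.log(2, D) + 0.25 * math.log(p, D)
    return union, N ** exponent


D, N = 12, 12 ** 8           # girth g = log_D(N) = 8
p = D ** -8 / 4              # 4p = D^{-4(eps + 3/2)} with eps = 1/2
union, closed = bsc_union_bound(N, D, p)
print(union <= closed, closed, N ** -0.5)
```

Here the union bound equals $\binom{4}{2}/2^4 = 0.375$ times the closed form, because the proof bounds the binomial coefficient by $2^{g/2}$.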

We derive a bound on the word error probability for the BI-AWGN channel, also using a union bound.

Theorem 3.11 Let $\mathcal{C}(G)$ denote an even Tanner code of length $N$ such that $g = \log_D(N)$, where $D \triangleq d_L \cdot d_R^{\max}$. Consider a BI-AWGN channel with variance $\sigma^2$. For any $\varepsilon > 0$, if $\sigma^2 < \frac{\log_D(e)}{6 + 4\varepsilon}$, then the LP decoder fails to decode the transmitted codeword with probability at most $\frac{\sigma}{\sqrt{\pi \log_D(N)}} \cdot N^{-\varepsilon}$.

Proof: By Equation (3.2) we assume that the all-zero codeword is transmitted, i.e., $c = 0^N$. Hence, up to a positive scaling factor that does not change the sign of any cost, $\lambda_v = 1 + \phi_v$, where $\phi_v \sim \mathcal{N}(0, \sigma^2)$ is a zero-mean Gaussian random variable with variance $\sigma^2$. Along with Lemma 3.9, we bound the word error probability $P_w$ using a union bound over all the events where simple paths of length $g$ have non-positive cost in $G$.

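Let $\psi$ be a particular path of length $g$. By definition, $\mathrm{cost}(\psi)$ is the sum of the costs of the $g/2$ variable nodes in $\psi$. Hence,

$$\Pr\{\mathrm{cost}(\psi) \leqslant 0\} = \Pr\Big\{\sum_{i=1}^{g/2} (1 + \phi_i) \leqslant 0\Big\} = \Pr\Big\{\sum_{i=1}^{g/2} \phi_i \leqslant -\frac{g}{2}\Big\}. \tag{3.4}$$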

The sum of independent zero-mean Gaussian random variables (RVs) is a zero-mean Gaussian RV whose variance equals the sum of the variances of the summands. Let $\Phi \triangleq \sum_{i=1}^{g/2} \phi_i$; then $\Phi \sim \mathcal{N}(0, \sigma^2 \cdot \frac{g}{2})$ is a zero-mean Gaussian RV with variance $\sigma^2 \cdot \frac{g}{2}$. Moreover, the Gaussian distribution is symmetric around 0.

Therefore,

$$\Pr\{\mathrm{cost}(\psi) \leqslant 0\} = \Pr\Big\{\Phi \leqslant -\frac{g}{2}\Big\} = \Pr\Big\{\Phi \geqslant \frac{g}{2}\Big\}. \tag{3.5}$$

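For a Gaussian RV $\phi \sim \mathcal{N}(0, \sigma^2)$ with zero mean and variance $\sigma^2$, the inequality

$$\Pr\{\phi \geqslant x\} \leqslant \frac{\sigma}{x\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \tag{3.6}$$

holds for every $x > 0$ [Fel68]. Applying (3.6) to $\Phi$, whose variance is $\sigma^2 \cdot \frac{g}{2}$, with $x = \frac{g}{2}$, we conclude that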

$$\Pr\{\mathrm{cost}(\psi) \leqslant 0\} \leqslant \frac{\sigma}{\sqrt{\pi g}}\, e^{-\frac{g}{4\sigma^2}}. \tag{3.7}$$

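There are at most $|\mathcal{V}| \cdot D^{g/2}$ different simple paths of length $g$ in $G$. By the union bound,

$$
\begin{aligned}
P_w &\leqslant |\mathcal{V}| \cdot D^{\frac{g}{2}} \cdot \Pr\{\mathrm{cost}(\psi) \leqslant 0\}
\leqslant N \cdot D^{\frac{g}{2}} \cdot \frac{\sigma}{\sqrt{\pi g}}\, e^{-\frac{g}{4\sigma^2}} \\
&\leqslant N^{\frac{3}{2}} \cdot \frac{\sigma}{\sqrt{\pi \log_D(N)}}\, e^{-\frac{\log_D(N)}{4\sigma^2}}
= \frac{\sigma}{\sqrt{\pi \log_D(N)}}\, N^{\frac{3}{2} - \frac{1}{4\sigma^2}\log_D(e)} \\
&\leqslant \frac{\sigma}{\sqrt{\pi \log_D(N)}} \cdot N^{-\varepsilon}.
\end{aligned}
\tag{3.8}
$$

□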
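The tail estimate (3.6)-(3.7) and the union bound (3.8) can be compared against the exact Gaussian tail. The following sketch is an illustration, not part of the thesis: the function names are hypothetical, it assumes $|\mathcal{V}| = N$, and it allows a non-integer $g = \log_D(N)$ in the closed-form expressions. The exact tail uses $\Pr\{\Phi \geqslant x\} = \frac{1}{2}\operatorname{erfc}\big(x / (\sqrt{2}\, \mathrm{std}(\Phi))\big)$.

```python
import math


def path_failure_exact(sigma: float, g: float) -> float:
    """Exact Pr{Phi >= g/2} for Phi ~ N(0, sigma^2 * g/2), cf. (3.5)."""
    std = sigma * math.sqrt(g / 2)
    return 0.5 * math.erfc((g / 2) / (std * math.sqrt(2)))


def path_failure_bound(sigma: float, g: float) -> float:
    """Right-hand side of (3.7): sigma / sqrt(pi g) * exp(-g / (4 sigma^2))."""
    return sigma / math.sqrt(math.pi * g) * math.exp(-g / (4 * sigma ** 2))


def awgn_union_bound(N: int, D: int, sigma: float) -> float:
    """Second expression in (3.8): N * D^(g/2) * (3.7), with g = log_D(N), |V| = N."""
    g = math.log(N, D)
    return N * D ** (g / 2) * path_failure_bound(sigma, g)


D, N, eps = 12, 10 ** 6, 0.5
sigma = 0.2  # sigma^2 = 0.04 < log_D(e) / (6 + 4 eps), about 0.0503 for D = 12
g = math.log(N, D)
target = sigma / math.sqrt(math.pi * g) * N ** -eps  # bound stated in Theorem 3.11
print(awgn_union_bound(N, D, sigma) <= target)  # True
```

The ratio of the union bound to the target is $N^{2 - \log_D(e)/(4\sigma^2)}$, which is below 1 exactly when $\sigma^2 < \log_D(e)/8$, matching the condition of Theorem 3.11 for $\varepsilon = \frac{1}{2}$.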

3.4 Discussion

In this chapter we presented a combinatorial characterization of local-optimality for even Tanner codes with respect to any MBIOS channel. The local-optimality characterization is based on the decomposition of every codeword (and pseudocodeword) into a set of simple cycles in the Tanner graph. We prove that local-optimality in this characterization implies ML-optimality and LP-optimality. Based on the local-optimality characterization, we prove inverse polynomial bounds in the code length on the word error probability for LP-decoding of even Tanner codes whose Tanner graphs have logarithmic girth.

In the definition of even Tanner codes we required that the Tanner graph be left-regular for ease of presentation. By considering even Tanner codes whose left degree is regular, the concept of cycle decomposition for local-optimality is neatly presented. We note that the characterization and the analysis presented in this chapter can be extended to even Tanner codes with irregular left degrees by introducing degree normalization to the characteristic vector of a cycle (see Chapter 5 and [HE12b, HE12e]).

In [HE05], we presented algorithms that compute bounds on the word error probability of LP-decoding for given instances of cycle codes. The algorithms for computing the bounds are based on a "cycle" analysis. The analysis presented in this chapter is based on a combinatorial characterization of pseudocodewords by simple cycles in the Tanner graph. Therefore, trivial modifications of the algorithms presented in [HE05] also bound the word error probability of LP-decoding for specific instances of even Tanner codes. Because the minimum distance of cycle codes is at most logarithmic in the size of the Tanner graph, the careful analysis in [HE05] led to upper bounds that are roughly equal to the lower bounds for these codes. However, even Tanner codes in general (e.g., RA(q) codes for even $q > 4$) may exhibit empirical word error probabilities that decay faster than $N^{-\varepsilon}$.

Consider the following problem formulation for even Tanner codes. Assume that there exists a cycle-based locally optimal codeword w.r.t. an LLR vector $\lambda$. Given this assumption, the following task is equivalent to maximum-likelihood decoding: find a 0-1 assignment $x \in \{0,1\}^N$ to the variable nodes such that $0^N$ is a cycle-based locally optimal codeword w.r.t. $(-1)^x * \lambda$, that is, there are no simple cycles in $G$ with non-positive cost w.r.t. $(-1)^x * \lambda$. By Theorem 3.8, LP-decoding computes such a valid assignment $x$. In Chapter 6 we present an iterative message-passing decoding algorithm that computes such a valid assignment $x$, even in a more general setting.


Chapter 4

LP Decoding of Regular LDPC codes in Memoryless Channels

In this chapter¹ we deal with regular low-density parity-check (LDPC) codes, i.e., Tanner codes based on sparse regular Tanner graphs with single parity-check local codes. Local-code nodes $C \in \mathcal{J}$ in this case are called check nodes.

We study error bounds for linear programming decoding of regular LDPC codes. For memoryless binary-input output-symmetric channels, we prove bounds on the word error probability that are inverse doubly-exponential in the girth of the Tanner graph. For the memoryless binary-input AWGN channel, we prove lower bounds on the threshold under LP-decoding for regular LDPC codes whose factor graphs have logarithmic girth. In particular, we prove a lower bound of $\sigma = 0.735$ (upper bound of $\frac{E_b}{N_0} = 2.67$ dB) on the threshold of $(3,6)$-regular LDPC codes whose factor graphs have logarithmic girth.

Our proof is an extension of a recent work by Arora, Daskalakis, and Steurer [ADS09], who presented a novel probabilistic analysis of LP decoding over a binary symmetric channel. Their analysis is based on the primal LP representation and has an explicit connection to message-passing algorithms. We extend this analysis to any MBIOS channel.

4.1 Introduction

LP-decoding has been applied to several families of codes, among them: RA codes, turbo-like codes, LDPC codes, and expander codes. Decoding failures have been characterized, and these characterizations enabled proving word error bounds for RA codes, LDPC codes, and expander codes (see e.g., [FK04, HE05, KV06, FS05, FMS+07, DDKW08, ADS09]). Experiments indicate that message-passing decoding is likely to fail if LP-decoding fails [Fel03, VK05].

Feldman et al. [FMS+07] were the first to show that LP-decoding corrects a constant fraction of errors for expander codes over an adversarial bit flipping channel. For example, for a specific family of rate-$\frac{1}{2}$ LDPC expander codes, they proved that LP-decoding can correct $0.000175N$ errors. This kind of analysis is worst-case in nature, and the implied results are quite far from the performance of LDPC codes observed in practice over binary symmetric channels (BSC). Daskalakis et al. [DDKW08] initiated an average-case analysis of LP-decoding for LDPC codes over a probabilistic bit flipping channel. For a certain family of LDPC expander codes over a BSC with bit flipping probability $p$, they proved that LP-decoding recovers the transmitted codeword with high probability up to a noise threshold of $p = 0.002$. This proved threshold for LP-decoding is rather weak compared to thresholds proved for belief propagation (BP) decoding over the BSC. For example, even for $(3,6)$-regular LDPC codes, the BP threshold is $p = 0.084$. Therefore, one would expect LDPC expander codes to have much better thresholds than $p = 0.002$ under LP-decoding. Both of the results in [FMS+07] and [DDKW08] were proved by analysis of the dual LP solution based on expansion arguments. Extensions of [FMS+07] to a larger class of channels (e.g., truncated AWGN channel) were discussed in [FKV05].

¹The research work presented in this chapter is based on [HE11a].

Koetter and Vontobel [KV06] analyzed LP-decoding of regular LDPC codes using girth arguments and the dual LP solution. They proved a lower bound on the threshold of LP-decoding for regular LDPC codes whose Tanner graphs have logarithmic girth over any memoryless channel. This bound on the threshold depends only on the degree of the variable nodes. The decoding errors for noise below the threshold decrease doubly-exponentially in the girth of the factor graph. This was the first threshold result presented for LP-decoding of LDPC codes over memoryless channels other than the BSC. When applied to LP-decoding of $(3,6)$-regular LDPC codes over a BSC with crossover probability $p$, they achieved a lower bound of $p = 0.01$ on the threshold. For the binary-input additive white Gaussian noise channel with noise variance $\sigma^2$ (BI-AWGN($\sigma$)), they achieved a lower bound of $\sigma = 0.5574$ on the threshold (equivalent to an upper bound of $\frac{E_b}{N_0} = 5.07$ dB). The question of closing the gap to $\sigma = 0.82$ (1.7 dB) [WA01], which is the threshold of the max-product (min-sum) decoding algorithm for the same family of codes over a BI-AWGNC($\sigma$), remains open.

Recently, Arora et al. [ADS09] presented a novel probabilistic analysis of the primal solution of LP-decoding for regular LDPC codes over a BSC using girth arguments. They proved error bounds that are inverse doubly-exponential in the girth of the Tanner graph and lower bounds on thresholds that are much closer to the performance of BP-based decoding. For example, for a family of $(3,6)$-regular LDPC codes whose Tanner graphs have logarithmic girth over a BSC with crossover probability $p$, they proved a lower bound of $p = 0.05$ on the threshold of LP-decoding. Their technique is based on a weighted decomposition of every codeword and pseudo-codeword to a finite set of structured trees. They proved a sufficient condition, called local-optimality, for the optimality of a decoded codeword based on this decomposition. They use a min-sum process on trees to bound the probability that local-optimality holds. A probabilistic analysis of the min-sum process is applied to the structured trees of the decomposition, and yields error bounds for LP-decoding.

Our Contribution. In this chapter, we extend the analysis in [ADS09] from the BSC to any memoryless binary-input output-symmetric (MBIOS) channel. We prove bounds on the word error probability that are inverse doubly-exponential in the girth of the factor graph for LP-decoding of regular LDPC codes over MBIOS channels. We also prove lower bounds on the threshold of $(d_L,d_R)$-regular LDPC codes whose Tanner graphs have logarithmic girth under LP-decoding in binary-input AWGN channels. Note that regular Tanner graphs with logarithmic girth can be constructed explicitly (see e.g. [Gal63]). Specifically, in a finite length analysis of LP-decoding over BI-AWGN($\sigma$), we prove that for $(3,6)$-regular LDPC codes the decoding errors for $\sigma < 0.605$ ($\frac{E_b}{N_0} > 4.36$ dB) decrease doubly-exponentially in the girth of the factor graph. In an asymptotic case analysis, we prove a lower bound of $\sigma = 0.735$ (upper bound of $\frac{E_b}{N_0} = 2.67$ dB) on the threshold of $(3,6)$-regular LDPC codes under LP-decoding, thus decreasing the gap to the BP-based decoding asymptotic threshold.

In our analysis we utilize the combinatorial interpretation of LP-decoding via graph covers [VK05] to simplify some of the proofs in [ADS09]. Specifically, using the equivalence of graph cover decoding and LP-decoding in [VK05], we obtain a simpler proof that local-optimality suffices for LP optimality.

Our main result:

Theorem 4.1 Let $G$ denote a $(d_L, d_R)$-regular bipartite graph with girth $g$, and let $\mathcal{C}(G) \subset \{0,1\}^N$ denote the low-density parity-check code defined by $G$. Let $x \in \mathcal{C}(G)$ be a codeword. Consider the BI-AWGNC($\sigma$), and suppose that $y \in \mathbb{R}^N$ is the word obtained from the channel given $x$. Then,

1) [finite length bound] For $(d_L, d_R) = (3,6)$ and $\sigma \leqslant 0.605$ ($\frac{E_b}{N_0} \geqslant 4.36$ dB), $x$ is the unique optimal solution to the LP decoder with probability at least
$$1 - \frac{1}{125}\, e^{\frac{3}{2\sigma^2}}\, N \cdot \alpha^{2^{\lfloor \frac{1}{4} g \rfloor}}$$
for some constant $\alpha < 1$.

2) [asymptotic bound] For $(d_L, d_R) = (3,6)$ and $g = \Omega(\log N)$ sufficiently large, $x$ is the unique optimal solution to the LP decoder with probability at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$, provided that $\sigma \leqslant 0.735$ ($\frac{E_b}{N_0} \geqslant 2.67$ dB).

3) For any $(d_L, d_R)$, $x$ is the unique optimal solution to the LP decoder with probability at least $1 - N \cdot \alpha^{(d_L-1)^{\lfloor \frac{1}{4} g \rfloor}}$ for some constant $\alpha < 1$, provided that
$$\min_{t > 0} \Big( (d_R - 1)\, e^{-t} \int_{-\infty}^{\infty} \big(1 - F_{\mathcal{N}}(z)\big)^{d_R-2} f_{\mathcal{N}}(z)\, e^{-tz}\, dz \Big) \cdot \Big( (d_R - 1)\, e^{\frac{1}{2}t^2\sigma^2 - t} \Big)^{1/(d_L-2)} < 1,$$
where $f_{\mathcal{N}}(\cdot)$ and $F_{\mathcal{N}}(\cdot)$ denote the p.d.f. and c.d.f. of a Gaussian random variable with zero mean and standard deviation $\sigma$, respectively.

Theorem 4.1 generalizes to MBIOS channels as follows.

Theorem 4.2 Let $G$ denote a $(d_L,d_R)$-regular bipartite graph with girth $\Omega(\log N)$, and let $\mathcal{C}(G) \subset \{0,1\}^N$ denote the low-density parity-check code defined by $G$. Consider an MBIOS channel, and suppose that $y \in \mathbb{R}^N$ is the word obtained from the channel given $x = 0^N$. Let $\lambda \in \mathbb{R}^N$ denote the log-likelihood ratio of the received channel observations, and let $f_\lambda(\cdot)$ and $F_\lambda(\cdot)$ denote the p.d.f. and c.d.f. of $\lambda(y_i)$, respectively. Then, LP-decoding succeeds with probability at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$, provided that
$$\min_{t > 0} \Big( (d_R - 1) \int_{-\infty}^{\infty} \big(1 - F_\lambda(z)\big)^{d_R-2} f_\lambda(z)\, e^{-tz}\, dz \Big) \cdot \Big( (d_R - 1)\, \mathbb{E}\, e^{-t\lambda} \Big)^{1/(d_L-2)} < 1.$$


Organization. The remainder of this chapter is organized as follows. Section 4.2 presents a combinatorial characterization of a sufficient condition for LP-decoding success for regular LDPC codes in memoryless channels. In Section 4.3 we use the combinatorial characterization to bound the error probability of LP-decoding and provide lower bounds on the threshold, thus proving Theorems 4.1 and 4.2. We conclude with a discussion in Section 4.4.

4.2 On the Connections between Local Optimality, Global Optimality, and LP Optimality

Let $x \in \mathcal{C}(G)$ denote a codeword and $\lambda(y) \in \mathbb{R}^N$ denote an LLR vector for a received word $y \in \mathbb{R}^N$. Following [ADS09], we consider two questions: (i) does $x$ equal $x^{\mathrm{ml}}(y)$? and (ii) does $x$ equal $x^{\mathrm{lp}}(y)$, and is it the unique solution? Arora et al. [ADS09] presented a certificate based on local structures both for $x^{\mathrm{ml}}(y)$ and $x^{\mathrm{lp}}(y)$ over a binary symmetric channel. In this section we adapt the definitions and certificates to the case of memoryless binary-input output-symmetric (MBIOS) channels.

Notation: Let $y \in \mathbb{R}^N$ denote the received word. Let $\lambda = \lambda(y)$ denote the LLR vector for $y$. Let $x \in \mathcal{C}(G)$ be a candidate for $x^{\mathrm{ml}}(y)$ and $x^{\mathrm{lp}}(y)$. $G = (\mathcal{V} \cup \mathcal{J}, E)$ is a $(d_L,d_R)$-regular bipartite factor graph. For two vertices $u$ and $v$, denote by $d(u,v)$ the distance between $u$ and $v$ in $G$. Denote by $\mathcal{N}(v)$ the set of neighbors of a node $v$ in $G$, and let $B(u,t)$ denote the set of vertices at distance at most $t$ from $u$.

Following Arora et al., we consider neighborhoods $B(i_0, 2h)$ where $i_0 \in \mathcal{V}$ and $h < \frac{1}{4}\mathrm{girth}(G)$. Note that the induced graph on $B(i_0, 2h)$ is a tree.

Definition 4.3 (Minimal Local Deviation, [ADS09]) An assignment $\beta \in \{0,1\}^N$ is a valid deviation of depth $h$ at $i_0 \in \mathcal{V}$ or, in short, an $h$-local deviation at $i_0$, if $\beta_{i_0} = 1$ and $\beta$ satisfies all parity checks in $B(i_0, 2h)$,
$$\forall j \in \mathcal{J} \cap B(i_0, 2h): \quad \sum_{i \in \mathcal{N}(j)} \beta_i \equiv 0 \mod 2.$$

An $h$-local deviation $\beta$ at $i_0$ is minimal if $\beta_i = 0$ for every $i \notin B(i_0, 2h)$, and every check node $j$ in $B(i_0, 2h)$ has at most two neighbors with value 1 in $\beta$. A minimal $h$-local deviation at $i_0$ can be seen as a subtree of $B(i_0, 2h)$ of height $2h$ rooted at $i_0$, where every variable node has full degree and every check node has degree 2. Such a tree is called a skinny tree. An assignment $\beta \in \{0,1\}^N$ is a minimal $h$-local deviation if it is a minimal $h$-local deviation at some $i_0$. Note that given $\beta$ there is a unique such $i_0 \triangleq \mathrm{root}(\beta)$.

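If $w = (w_1, \ldots, w_h) \in [0,1]^h$ is a weight vector and $\beta$ is a minimal $h$-local deviation, then $\beta^{(w)}$ denotes the $w$-weighted deviation
$$\beta^{(w)}_i = \begin{cases} w_t \beta_i & \text{if } d(\mathrm{root}(\beta), i) = 2t \text{ and } 1 \leqslant t \leqslant h, \\ 0 & \text{otherwise.} \end{cases}$$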
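The weighting rule for $\beta^{(w)}$ can be made concrete with a breadth-first search from the root. The sketch below assumes a hypothetical adjacency-dictionary representation of the tree induced by $B(\mathrm{root}(\beta), 2h)$ and stores $w = (w_1, \ldots, w_h)$ 0-based; it only applies the weighting rule and does not check that $\beta$ is a minimal $h$-local deviation. The toy example is a path rather than a regular neighborhood; it merely shows that the root receives weight 0 and a node at distance $2t$ receives $w_t$.

```python
from collections import deque


def weighted_deviation(adj: dict, root, beta: set, w: list[float]) -> dict:
    """Return the nonzero entries of beta^(w) as {node: value}.

    Illustration only: `adj` maps each node of the tree to its neighbors,
    `beta` is the set of variable nodes with beta_i = 1, and w[t-1] = w_t.
    """
    h = len(w)
    dist = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search yields d(root, i) in the tree
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    out = {}
    for i in beta:
        d = dist.get(i)
        # weight w_t applies only at even distance 2t with 1 <= t <= h
        if d is not None and d % 2 == 0 and 1 <= d // 2 <= h:
            out[i] = w[d // 2 - 1]
    return out


adj = {"v0": ["c1"], "c1": ["v0", "v1"], "v1": ["c1", "c2"],
       "c2": ["v1", "v2"], "v2": ["c2"]}
print(weighted_deviation(adj, "v0", {"v0", "v1", "v2"}, [0.5, 0.25]))
```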

Given a log-likelihood ratio vector $\lambda$, the cost of a $w$-weighted minimal $h$-local deviation $\beta$ is defined by $\langle \lambda, \beta^{(w)} \rangle$. The following definition extends local-optimality from the BSC to general LLR vectors.


Definition 4.4 (local-optimality following [ADS09]) A codeword $x \in \{0,1\}^N$ is $(h,w)$-locally optimal w.r.t. $\lambda \in \mathbb{R}^N$ if for all minimal $h$-local deviations $\beta$,
$$\langle \lambda, x \oplus \beta^{(w)} \rangle > \langle \lambda, x \rangle.$$

Since $\beta^{(w)} \in [0,1]^N$, we consider only weight vectors $w \in [0,1]^h \setminus \{0^h\}$. Koetter and Vontobel [KV06] proved for $w = 1^h$ that a locally optimal codeword $x$ w.r.t. $\lambda$ is also globally optimal, i.e., the ML codeword. Moreover, they also showed that a locally optimal codeword $x$ w.r.t. $\lambda$ is also the unique optimal LP solution given $\lambda$. Arora et al. [ADS09] used a different technique to prove that local-optimality is sufficient both for global optimality and LP optimality with general weights in the case of a binary symmetric channel. We extend the results of Arora et al. [ADS09] to the case of MBIOS channels. Specifically, we prove for MBIOS channels that local-optimality implies LP optimality (Theorem 4.8). We first show how to extend the proof that local-optimality implies global optimality in the case of MBIOS channels.

Theorem 4.5 (local-optimality is sufficient for ML) Let $h < \frac{1}{4}\mathrm{girth}(G)$ and $w \in [0,1]^h$. Let $\lambda \in \mathbb{R}^N$ denote the log-likelihood ratio for the received word, and suppose that $x \in \{0,1\}^N$ is an $(h,w)$-locally optimal codeword in $\mathcal{C}(G)$ w.r.t. $\lambda$. Then $x$ is also the unique maximum-likelihood codeword for $\lambda$.

The proof for MBIOS channels is a straightforward modification of the proof in [ADS09]. We include it for the sake of self-containment. The following lemma is the key structural lemma in the proof of Theorem 4.5.

Lemma 4.6 ([ADS09]) Let $h < \frac{1}{4}\mathrm{girth}(G)$. Then, for every codeword $z \neq 0^N$, there exists a distribution over minimal $h$-local deviations $\beta$ such that, for every weight vector $w \in [0,1]^h$, there exists an $\alpha \in (0,1]$ such that
$$\mathbb{E}_\beta \beta^{(w)} = \alpha z.$$

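Proof of Theorem 4.5. We want to show that for every codeword $x' \neq x$, $\langle \lambda, x' \rangle > \langle \lambda, x \rangle$. Since $z \triangleq x \oplus x'$ is a codeword, by Lemma 4.6 there exists a distribution over minimal $h$-local deviations $\beta$ such that $\mathbb{E}_\beta \beta^{(w)} = \alpha z$. Let $f : [0,1]^N \to \mathbb{R}$ be the affine function defined by $f(u) \triangleq \langle \lambda, x \oplus u \rangle = \langle \lambda, x \rangle + \sum_{i=1}^N (-1)^{x_i} \lambda_i u_i$. Then,
$$
\begin{aligned}
\langle \lambda, x \rangle &< \mathbb{E}_\beta \langle \lambda, x \oplus \beta^{(w)} \rangle && \text{(by local-optimality of } x\text{)} \\
&= \langle \lambda, x \oplus \mathbb{E}_\beta \beta^{(w)} \rangle && \text{(since } f \text{ is affine, and by linearity of expectation)} \\
&= \langle \lambda, x \oplus \alpha z \rangle && \text{(by Lemma 4.6)} \\
&= (1 - \alpha)\langle \lambda, x \rangle + \alpha \langle \lambda, x \oplus z \rangle && \text{(since } f \text{ is affine)} \\
&= (1 - \alpha)\langle \lambda, x \rangle + \alpha \langle \lambda, x' \rangle.
\end{aligned}
$$
Since $\alpha > 0$, it follows that $\langle \lambda, x \rangle < \langle \lambda, x' \rangle$, as required. □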
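The final steps of the proof of Theorem 4.5 use only that $f$ is affine. The sketch below checks numerically that $f(\alpha z) = (1-\alpha) f(0^N) + \alpha f(z)$ and that $f(z) = \langle \lambda, x' \rangle$ for $z = x \oplus x'$. We assume, consistently with the formula for $f$ above, that $x \oplus u$ for fractional $u$ means the vector with entries $|x_i - u_i| = x_i + (-1)^{x_i} u_i$; the helper names are hypothetical and the LLR values are random placeholders.

```python
import random


def xor_frac(x: list[int], u: list[float]) -> list[float]:
    """x (+) u for x in {0,1}^N and u in [0,1]^N, read as |x_i - u_i| (assumption)."""
    return [abs(xi - ui) for xi, ui in zip(x, u)]


def inner(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def f(lam, x, u) -> float:
    """f(u) = <lambda, x (+) u>, as in the proof of Theorem 4.5."""
    return inner(lam, xor_frac(x, u))


rng = random.Random(0)
N = 6
lam = [rng.gauss(1.0, 0.8) for _ in range(N)]   # placeholder LLR values
x = [0, 1, 1, 0, 1, 0]
x_prime = [1, 0, 1, 0, 1, 1]
z = [a ^ b for a, b in zip(x, x_prime)]         # z = x (+) x'
alpha = 0.3
lhs = f(lam, x, [alpha * zi for zi in z])
rhs = (1 - alpha) * f(lam, x, [0] * N) + alpha * f(lam, x, z)
print(lhs, rhs)
```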



In order to prove a sufficient condition for LP optimality, we consider graph cover decoding introduced by Vontobel and Koetter [VK05]. We use the terms and notation of Vontobel and Koetter [VK05] in the statement of Lemma 4.7 and the proof of Theorem 4.8 (see Chapter 2.6). The following lemma shows that local-optimality is preserved after lifting to an $M$-cover. Note that the weight vector must be scaled by the cover degree $M$.


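Lemma 4.7 Let $h < \frac{1}{4}\mathrm{girth}(G)$ and $w \in [0, \frac{1}{M}]^h \setminus \{0^h\}$. Let $\tilde{G}$ denote any $M$-cover of $G$. Suppose that $x \in \mathcal{C}(G)$ is an $(h,w)$-locally optimal codeword w.r.t. $\lambda \in \mathbb{R}^N$. Let $\tilde{x} = x^{\uparrow M} \in \mathcal{C}(\tilde{G})$ and $\tilde{\lambda} = \lambda^{\uparrow M} \in \mathbb{R}^{N \cdot M}$ denote the $M$-lifts of $x$ and $\lambda$, respectively. Then $\tilde{x}$ is an $(h, M \cdot w)$-locally optimal codeword w.r.t. $\tilde{\lambda}$.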

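Proof: Assume that $\tilde{x} = x^{\uparrow M}$ is not an $(h, M \cdot w)$-locally optimal codeword w.r.t. $\tilde{\lambda} = \lambda^{\uparrow M}$. Then, there exists a minimal $h$-local deviation $\tilde{\beta} \in \{0,1\}^{N \cdot M}$ such that
$$\langle \tilde{\lambda}, \tilde{x} \oplus \tilde{\beta}^{(M \cdot w)} \rangle \leqslant \langle \tilde{\lambda}, \tilde{x} \rangle. \tag{4.1}$$
Note that for $\tilde{x} \in \{0,1\}^{N \cdot M}$ and its projection $x = \zeta(\tilde{x}) \in \mathbb{R}^N$, it holds that
$$\frac{1}{M} \langle \tilde{\lambda}, \tilde{x} \rangle = \langle \lambda, x \rangle, \quad \text{and} \tag{4.2}$$
$$\frac{1}{M} \langle \tilde{\lambda}, \tilde{x} \oplus \tilde{\beta}^{(M \cdot w)} \rangle = \langle \lambda, x \oplus \beta^{(w)} \rangle, \tag{4.3}$$
where $\beta$ is the support of the projection of $\tilde{\beta}$ onto the base graph. It holds that $\beta$ is an $h$-local deviation because $h < \frac{1}{4}\mathrm{girth}(G) \leqslant \frac{1}{4}\mathrm{girth}(\tilde{G})$. From (4.1), (4.2), and (4.3) we get that $\langle \lambda, x \rangle \geqslant \langle \lambda, x \oplus \beta^{(w)} \rangle$, contradicting our assumption on the $(h,w)$-local optimality of $x$. Therefore, $\tilde{x}$ is an $(h, M \cdot w)$-locally optimal codeword w.r.t. $\tilde{\lambda}$ in $\mathcal{C}(\tilde{G})$. □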
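Equation (4.2) says that averaging over the $M$ copies commutes with the inner product. A minimal sketch, assuming a hypothetical coordinate ordering in which the $M$ copies of variable node $i$ occupy positions $iM, \ldots, iM+M-1$ of the cover: `lift` implements $v \mapsto v^{\uparrow M}$ and `project` implements the scaled projection $\zeta$. The example checks (4.2) for an arbitrary 0-1 assignment in a 2-cover, and that projecting the lift of a codeword recovers it.

```python
def lift(v: list[float], M: int) -> list[float]:
    """M-lift v^{up M}: copy every coordinate to its M copies in the cover."""
    return [vi for vi in v for _ in range(M)]


def project(v_cover: list[float], M: int) -> list[float]:
    """Scaled projection zeta: average the M copies of each coordinate."""
    if len(v_cover) % M != 0:
        raise ValueError("length must be a multiple of M")
    return [sum(v_cover[i * M:(i + 1) * M]) / M for i in range(len(v_cover) // M)]


def inner(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


M = 2
lam = [0.5, -1.0, 2.0]
x_cover = [1, 0, 1, 1, 0, 1]               # arbitrary 0-1 assignment in the 2-cover
lhs = inner(lift(lam, M), x_cover) / M     # (1/M) <lambda~, x~>
rhs = inner(lam, project(x_cover, M))      # <lambda, zeta(x~)>
print(lhs, rhs)
```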

Arora et al. [ADS09] proved the following theorem for a BSC and $w \in [0,1]^h$. The proof can be extended to the case of MBIOS channels with $w \in [0,1]^h$ using the same technique of Arora et al. A simpler proof is achieved for $w \in [0, \frac{1}{M}]^h$ for some finite $M$. The proof is based on arguments utilizing properties of graph cover decoding [VK05], and follows as a corollary of Theorem 4.5 and Lemma 4.7.

Theorem 4.8 (local-optimality is sufficient for LP optimality) For every factor graph $G$, there exists a constant $M$ such that, if

1. $h < \frac{1}{4}\mathrm{girth}(G)$,

2. $w \in [0, \frac{1}{M}]^h \setminus \{0^h\}$, and

3. $x$ is an $(h,w)$-locally optimal codeword w.r.t. $\lambda \in \mathbb{R}^N$,

then $x$ is also the unique optimal LP solution given $\lambda$.

Proof: Suppose that $x$ is an $(h,w)$-locally optimal codeword w.r.t. $\lambda \in \mathbb{R}^N$. Theorem 2.14 implies that for every basic feasible solution $z \in [0,1]^N$ of the LP, there exists an $M$-cover $\tilde{G}$ of $G$ and an assignment $\tilde{z} \in \{0,1\}^{N \cdot M}$ such that $\tilde{z} \in \mathcal{C}(\tilde{G})$ and $z = \zeta(\tilde{z})$, where $\zeta(\tilde{z})$ is the image of the scaled projection of $\tilde{z}$ in $G$ (i.e., the pseudo-codeword associated with $\tilde{z}$, see Definition 2.13). Moreover, since the number of basic feasible solutions is finite, we conclude that there exists a finite $M$-cover $\tilde{G}$ such that every basic feasible solution of the LP admits a valid assignment in $\tilde{G}$.

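Let $z^*$ denote an optimal LP solution given $\lambda$. Without loss of generality, $z^*$ is a basic feasible solution. Let $\tilde{z}^* \in \{0,1\}^{N \cdot M}$ denote the 0-1 assignment in the $M$-cover $\tilde{G}$ that corresponds to $z^* \in [0,1]^N$. By Theorem 2.14, Equation (4.2), and the optimality of $z^*$, it follows that $\tilde{z}^*$ is a codeword in $\mathcal{C}(\tilde{G})$ that minimizes $\langle \tilde{\lambda}, \tilde{z} \rangle$ for $\tilde{z} \in \mathcal{C}(\tilde{G})$, namely $\tilde{z}^* = x^{\mathrm{ml}}(y^{\uparrow M})$.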

Let $\tilde{x} = x^{\uparrow M}$ denote the $M$-lift of $x$. Note that because $x$ is a codeword, i.e., $x \in \{0,1\}^N$, there is a unique pre-image of $x$ in $\tilde{G}$, which is the $M$-lift of $x$. Lemma 4.7 implies that $\tilde{x}$ is an $(h, M \cdot w)$-locally optimal codeword w.r.t. $\tilde{\lambda}$, where $M \cdot w \in [0,1]^h$. By Theorem 4.5, we also get that $\tilde{x} = x^{\mathrm{ml}}(y^{\uparrow M})$. Moreover, Theorem 4.5 guarantees the uniqueness of an ML optimal solution. Thus, $\tilde{x} = \tilde{z}^*$. By projection to $G$, since $\tilde{x} = \tilde{z}^*$, we get that $x = z^*$, and uniqueness follows, as required. □

From this point, let $M$ denote the constant whose existence is guaranteed by Theorem 4.8.

4.3 Proving Error Bounds Using Local Optimality

In order to simplify the probabilistic analysis of algorithms for decoding linear codes over symmetric channels, one can assume without loss of generality that the all-zero codeword was transmitted, i.e., $c = 0^N$ (see Chapter 2.5). The following lemma gives a structural characterization for the event of LP-decoding failure if $c = 0^N$.

Lemma 4.9 Let $h < \frac{1}{4}\mathrm{girth}(G)$. Assume that the all-zero codeword was transmitted, and let $\lambda \in \mathbb{R}^N$ denote the log-likelihood ratio for the received word. If the LP decoder fails to decode to the all-zero codeword, then for every $w \in \mathbb{R}^h_+$ there exists a minimal $h$-local deviation $\beta$ such that $\langle \lambda, \beta^{(w)} \rangle \leqslant 0$.

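Proof: Consider the event where the LP decoder fails to decode the all-zero codeword, i.e., $0^N$ is not a unique optimal LP solution. Fix $w \in \mathbb{R}^h_+$ and let $w' \triangleq \frac{1}{M \cdot \|w\|_\infty} \cdot w$, so that $w' \in [0, \frac{1}{M}]^h \setminus \{0^h\}$. Theorem 4.8 implies that the all-zero codeword is not $(h, w')$-locally optimal w.r.t. $\lambda$. That is, there exists a minimal $h$-local deviation $\beta$ such that $\langle \lambda, \beta^{(w')} \rangle \leqslant 0$. Since $\beta^{(w)} = M \cdot \|w\|_\infty \cdot \beta^{(w')}$ is a positive multiple of $\beta^{(w')}$, the cost $\langle \lambda, \beta^{(w)} \rangle$ is also non-positive, as required. □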

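We therefore have, for a fixed $h < \frac{1}{4}\mathrm{girth}(G)$ and $w \in \mathbb{R}^h_+$, that
$$\Pr\{\text{LP decoding fails}\} \leqslant \Pr\big\{\exists \beta \text{ such that } \langle \lambda, \beta^{(w)} \rangle \leqslant 0 \,\big|\, c = 0^N\big\}. \tag{4.4}$$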

4.3.1 Bounding Processes on Trees

Using the terminology of (4.4), Arora et al. [ADS09] suggested a recursive method for bounding the probability $\Pr\{\exists \beta \text{ such that } \langle \lambda, \beta^{(w)} \rangle \leqslant 0 \mid c = 0^N\}$ for a BSC. We extend this method to MBIOS channels and apply it to a BI-AWGN channel.

Let $G$ be a $(d_L,d_R)$-regular bipartite factor graph, and fix $h < \frac{1}{4}\mathrm{girth}(G)$. Let $T_{v_0}$ denote the subgraph induced by $B(v_0, 2h)$ for a variable node $v_0$. Since $h < \frac{1}{4}\mathrm{girth}(G)$, it follows that $T_{v_0}$ is a tree. We direct the edges of $T_{v_0}$ so that it is an out-branching directed at the root $v_0$ (i.e., a rooted spanning tree with directed paths from the root $v_0$ to all the nodes). For $l \in \{0, \ldots, 2h\}$, denote by $V_l$ the set of vertices of $T_{v_0}$ at height $l$ (the leaves have height 0 and the root has height $2h$). Let $\tau \subseteq V(T_{v_0})$ denote the vertex set of a skinny tree rooted at $v_0$.

Definition 4.10 ($(h,\omega)$-Process on a $(d_L,d_R)$-Tree, [ADS09]) Let $\omega \in \mathbb{R}^h_+$ denote a weight vector. Let $\lambda$ denote an assignment of real values to the variable nodes of $T_{v_0}$. We define the $\omega$-weighted value of a skinny tree $\tau$ by
$$\mathrm{val}_\omega(\tau; \lambda) \triangleq \sum_{l=0}^{h-1} \sum_{v \in \tau \cap V_{2l}} \omega_l \cdot \lambda_v.$$
That is, $\mathrm{val}_\omega(\tau;\lambda)$ is the sum of the values of the variable nodes in $\tau$, weighted according to their height. Given a probability distribution over assignments $\lambda$, we are interested in the probability
$$\Pi_{\lambda,d_L,d_R}(h,\omega) \triangleq \Pr_\lambda\Big\{ \min_{\tau \subset T_{v_0}} \mathrm{val}_\omega(\tau;\lambda) \leqslant 0 \Big\}. \tag{4.5}$$
In other words, $\Pi_{\lambda,d_L,d_R}(h,\omega)$ is the probability that the minimum value over all skinny trees of height $2h$ rooted in some variable node $v_0$ in a $(d_L,d_R)$-bipartite graph $G$ is non-positive. Since the trees $T_{v_0}$ and $T_{v_1}$ are isomorphic for every two roots $v_0$ and $v_1$, it follows that $\Pi_{\lambda,d_L,d_R}(h,\omega)$ does not depend on the root $v_0$.

Since $\lambda$ is a random assignment of values to variable nodes in $T_{v_0}$, Arora et al. refer to $\min_{\tau \subset T_{v_0}} \mathrm{val}_\omega(\tau;\lambda)$ as a random process. With this notation, we apply a union bound utilizing Lemma 4.9, as follows.

Lemma 4.11 Let $G$ be a $(d_L,d_R)$-regular bipartite graph and $w \in \mathbb{R}^h_+$ be a weight vector with $h < \frac{1}{4}\mathrm{girth}(G)$. Suppose that $\lambda \in \mathbb{R}^N$ is the log-likelihood ratio of the word received from the channel. Then, the transmitted codeword $c = 0^N$ is $(h, \alpha \cdot w)$-locally optimal for $\alpha \triangleq (M \cdot \|w\|_\infty)^{-1}$ with probability at least
$$1 - N \cdot \Pi_{\lambda,d_L,d_R}(h,\omega), \quad \text{where } \omega_l = w_{h-l},$$
and with at least the same probability, $c = 0^N$ is also the unique optimal LP solution given $\lambda$.

Note the two different weight notations: (i) $w$ denotes the weight vector in the context of weighted deviations, and (ii) $\omega$ denotes the weight vector in the context of skinny subtrees in the $(h,\omega)$-process. A one-to-one correspondence between these two vectors is given by $\omega_l = w_{h-l}$ for $0 \leqslant l < h$. From this point on, we use only $\omega$.

Following Lemma 4.11, it is sufficient to estimate the probability $\Pi_{\lambda,d_L,d_R}(h,\omega)$ for a given weight vector $\omega$, a distribution of a random vector $\lambda$, and degrees $(d_L,d_R)$. We overview the recursion presented in [ADS09] for estimating and bounding the probability of the existence of a skinny tree with non-positive value in an $(h,\omega)$-process.

Let $\gamma$ denote an ensemble of i.i.d. random variables. Define random variables $X_0, \ldots, X_{h-1}$ and $Y_0, \ldots, Y_{h-1}$ with the following recursion:
$$
\begin{aligned}
Y_0 &= \omega_0 \gamma && \text{(4.6)} \\
X_l &= \min\{Y_l^{(1)}, \ldots, Y_l^{(d_R-1)}\} \qquad (0 \leqslant l < h) && \text{(4.7)} \\
Y_l &= \omega_l \gamma + X_{l-1}^{(1)} + \ldots + X_{l-1}^{(d_L-1)} \qquad (0 < l < h) && \text{(4.8)}
\end{aligned}
$$
The notation $X^{(1)}, \ldots, X^{(d)}$ and $Y^{(1)}, \ldots, Y^{(d)}$ denotes $d$ mutually independent copies of the random variables $X$ and $Y$, respectively. Each instance of $Y_l$, $0 \leqslant l < h$, uses an independent instance of a random variable $\gamma$.

Consider a directed tree $T = T_{v_0}$ of height $2h$, rooted at node $v_0$. Associate variable nodes of $T$ at height $2l$ with copies of $Y_l$, and check nodes at height $2l+1$ with copies of $X_l$, for $0 \leqslant l < h$. Note that any realization of the random variables $\gamma$ to variable nodes in $T$ can be viewed as an assignment $\lambda$. Thus, the minimum value of a skinny tree of $T$ equals $\sum_{i=1}^{d_L} X_{h-1}^{(i)}$. This implies that the recursion in (4.6)-(4.8) defines a dynamic programming algorithm for computing $\min_{\tau \subset T} \mathrm{val}_\omega(\tau;\lambda)$. Now, let the components of the LLR vector $\lambda$ be i.i.d. random variables distributed identically to $\gamma$; then
$$\Pi_{\lambda,d_L,d_R}(h,\omega) = \Pr\Big\{ \sum_{i=1}^{d_L} X_{h-1}^{(i)} \leqslant 0 \Big\}. \tag{4.9}$$
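The dynamic program described by (4.6)-(4.9) can be sketched directly: each recursive call corresponds to one node of $T_{v_0}$ and draws a fresh channel value, so the copies are independent as the recursion requires. The sketch is ours: the function names, the Monte Carlo estimator, and the use of scaled BI-AWGN values $1 + \phi$ (as in Section 4.3.2) are illustrative choices. Its cost grows like $((d_L-1)(d_R-1))^h$ per sample, so it is only practical for small $h$; computing the distributions of $X_l$ and $Y_l$ (Appendix 4.A.1) scales much better.

```python
import random
from typing import Callable, Sequence


def min_skinny_tree_value(omega: Sequence[float], dL: int, dR: int,
                          sample_gamma: Callable[[], float]) -> float:
    """One realization of min_tau val_omega(tau; lambda) via recursion (4.6)-(4.9).

    Every call to sample_gamma() draws the value of a fresh variable node.
    The root itself carries no weight, matching the definition of beta^(w).
    """
    h = len(omega)

    def Y(l: int) -> float:
        # (4.6)/(4.8): weighted own value plus the dL-1 check-node children below
        val = omega[l] * sample_gamma()
        if l > 0:
            val += sum(X(l - 1) for _ in range(dL - 1))
        return val

    def X(l: int) -> float:
        # (4.7): a check node keeps the cheapest of its dR-1 variable children
        return min(Y(l) for _ in range(dR - 1))

    # (4.9): the root includes all dL of its check-node children
    return sum(X(h - 1) for _ in range(dL))


def estimate_Pi(omega: Sequence[float], dL: int, dR: int, sigma: float,
                trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of Pi_{lambda,dL,dR}(h, omega) with lambda_i = 1 + phi_i."""
    rng = random.Random(seed)

    def gamma() -> float:
        return 1.0 + rng.gauss(0.0, sigma)

    fails = sum(min_skinny_tree_value(omega, dL, dR, gamma) <= 0 for _ in range(trials))
    return fails / trials


print(min_skinny_tree_value([1.0, 2.0], 3, 6, lambda: 1.0))  # 12.0
print(estimate_Pi([1.0, 2.0], 3, 6, sigma=0.6))
```

With $\lambda \equiv 1$ and $\omega = (1, 2)$, i.e., $\omega_l = (d_L-1)^l$ for $d_L = 3$, the minimum value is 12, with each of the two levels contributing 6; this illustrates the equal-contribution property of the weights suggested in [ADS09].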

Given a distribution of $\gamma$ and a finite "height" $h$, it is possible to compute the distribution of $X_l$ and $Y_l$ according to the recursion in (4.6)-(4.8) using properties of a sum of random variables and a minimum of random variables (see Appendix 4.A.1). The following two lemmas play a major role in proving bounds on $\Pi_{\lambda,d_L,d_R}(h,\omega)$.

Lemma 4.12 ([ADS09]) For every $t > 0$,
$$\Pi_{\lambda,d_L,d_R}(h,\omega) \leqslant \big(\mathbb{E}\, e^{-tX_{h-1}}\big)^{d_L}.$$

Let $d_L' \triangleq d_L - 1$ and $d_R' \triangleq d_R - 1$.

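Lemma 4.13 ([ADS09]) For $0 \leqslant s < l < h$, we have
$$\mathbb{E}\, e^{-tX_l} \leqslant \big(\mathbb{E}\, e^{-tX_s}\big)^{d_L'^{\,l-s}} \cdot \prod_{k=0}^{l-s-1} \big(d_R' \, \mathbb{E}\, e^{-t\omega_{l-k}\gamma}\big)^{d_L'^{\,k}}.$$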
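For the reader's convenience, here is a short sketch of why Lemmas 4.12 and 4.13 hold; it is our reconstruction from the recursion (4.6)-(4.9) via standard Chernoff-type steps, not a quotation of the proofs in [ADS09]. For $t > 0$, Markov's inequality applied to $e^{-tS}$ with $S = \sum_{i} X_{h-1}^{(i)}$, together with the independence of the copies, gives Lemma 4.12. For Lemma 4.13, bound the exponential of a minimum by the sum of the exponentials and use the independence in (4.8):

$$\Pi_{\lambda,d_L,d_R}(h,\omega) = \Pr\Big\{\sum_{i=1}^{d_L} X_{h-1}^{(i)} \leqslant 0\Big\} \leqslant \mathbb{E}\, e^{-t\sum_{i=1}^{d_L} X_{h-1}^{(i)}} = \big(\mathbb{E}\, e^{-tX_{h-1}}\big)^{d_L},$$

$$\mathbb{E}\, e^{-tX_l} \leqslant \sum_{j=1}^{d_R'} \mathbb{E}\, e^{-tY_l^{(j)}} = d_R' \, \mathbb{E}\, e^{-t\omega_l\gamma} \, \big(\mathbb{E}\, e^{-tX_{l-1}}\big)^{d_L'} \qquad (0 < l < h).$$

Iterating the second inequality from $l$ down to $s+1$ yields exactly the bound stated in Lemma 4.13.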

Based on these bounds, in the following subsection we present concrete bounds on $\Pi_{\lambda,d_L,d_R}(h,\omega)$ for the BI-AWGN channel.

4.3.2 Analysis for BI-AWGN Channel

Consider the binary-input additive white Gaussian noise channel with noise variance $\sigma^2$, denoted by BI-AWGNC($\sigma$). In the case that the all-zero codeword is transmitted, the channel input is $X_i = +1$ for every $i$. Hence, $\lambda_i^{\mathrm{BI\text{-}AWGNC}(\sigma)} = \frac{2}{\sigma^2}(1 + \phi_i)$, where $\phi_i \sim \mathcal{N}(0, \sigma^2)$. Since $\Pi_{\lambda,d_L,d_R}(h,\omega)$ is invariant under positive scaling of the vector $\lambda$, we consider in the following analysis the scaled vector $\lambda$ in which $\lambda_i = 1 + \phi_i$ with $\phi_i \sim \mathcal{N}(0, \sigma^2)$.

Following [ADS09], we apply a simple analysis for BI-AWGNC(σ) with uniform weightvector ω. Then, we present improved bounds by using a non-uniform weight vector.

Uniform Weights

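Consider the case where $\omega = 1^h$. Let $\alpha_1 \triangleq \mathbb{E}\, e^{-tX_0}$ and $\alpha_2 \triangleq d_R' \, \mathbb{E}\, e^{-t\gamma}$, where $\gamma \overset{\text{i.i.d.}}{\sim} \lambda_i$, and define $\alpha \triangleq \alpha_1 \cdot \alpha_2^{1/(d_L-2)}$. By substituting $\alpha_1$ and $\alpha_2$ into Lemmas 4.12 and 4.13, Arora et al. [ADS09] proved that if $\alpha < 1$, then
$$\Pi_{\lambda,d_L,d_R}(h, 1^h) \leqslant \alpha^{d_L \cdot d_L'^{\,h-1} - d_L}.$$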

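To analyze parameters for which $\Pi_{\lambda,d_L,d_R}(h, 1^h) \to 0$, we need to compute $\alpha_1$ and $\alpha_2$ as functions of $\sigma$, $d_L$ and $d_R$. Note that
$$X_0 = \min_{i \in \{1, \ldots, d_R'\}} \lambda_i = 1 + \min_{i \in \{1, \ldots, d_R'\}} \phi_i, \quad \text{where } \phi_i \sim \mathcal{N}(0, \sigma^2) \text{ i.i.d.}$$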

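Denote by $f_{\mathcal{N}}(\cdot)$ and $F_{\mathcal{N}}(\cdot)$ the p.d.f. and c.d.f. of a Gaussian random variable with zero mean and standard deviation $\sigma$, respectively. Since $d_R' \big(1 - F_{\mathcal{N}}(x)\big)^{d_R'-1} f_{\mathcal{N}}(x)$ is the p.d.f. of the minimum of $d_R'$ i.i.d. $\mathcal{N}(0,\sigma^2)$ variables, and since $\mathbb{E}\, e^{-t(1+\phi)} = e^{\frac{1}{2}t^2\sigma^2 - t}$ for $\phi \sim \mathcal{N}(0,\sigma^2)$, we therefore have
$$\alpha_1(\sigma, d_L, d_R) = d_R' e^{-t} \int_{-\infty}^{\infty} \big(1 - F_{\mathcal{N}}(x)\big)^{d_R'-1} f_{\mathcal{N}}(x)\, e^{-tx}\, dx, \quad \text{and} \tag{4.10}$$
$$\alpha_2(\sigma, d_L, d_R) = d_R' e^{\frac{1}{2}t^2\sigma^2 - t}. \tag{4.11}$$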

The above calculations give the following bound on $\Pi_{\lambda,d_L,d_R}(h, 1^h)$.

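Lemma 4.14 If $\sigma > 0$ and $d_L, d_R > 2$ satisfy the condition
$$\alpha = \min_{t>0} \Big( \underbrace{d_R' e^{-t} \int_{-\infty}^{\infty} \big(1 - F_{\mathcal{N}}(x)\big)^{d_R'-1} f_{\mathcal{N}}(x)\, e^{-tx}\, dx}_{\alpha_1} \Big) \cdot \Big( \underbrace{d_R' e^{\frac{1}{2}t^2\sigma^2 - t}}_{\alpha_2} \Big)^{1/(d_L-2)} < 1,$$
then for $h \in \mathbb{N}$ and $\omega = 1^h$, we have
$$\Pi_{\lambda,d_L,d_R}(h,\omega) \leqslant \alpha^{d_L \cdot d_L'^{\,h-1} - d_L}.$$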

For (3,6)-regular graphs, we obtain by numeric calculations the following corollary.

Corollary 4.15 Let $\sigma < 0.59$, $d_L = 3$, and $d_R = 6$. Then, there exists a constant $\alpha < 1$ such that for every $h \in \mathbb{N}$ and $\omega = 1^h$,

$$\Pi_{\lambda,d_L,d_R}(h,\omega) \le \alpha^{2^h}.$$
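The numeric calculation behind Corollary 4.15 amounts to minimizing $\alpha_1 \cdot \alpha_2^{1/(d_L-2)}$ over t, with $\alpha_1$ from (4.10) and $\alpha_2$ from (4.11). The following Python sketch does this under our own assumptions: a plain Riemann sum on the truncated interval $[-10\sigma, 10\sigma]$ for the integral in (4.10), and a grid search over $t \in \{0.05, 0.10, \dots, 6\}$. The function names and grid parameters are hypothetical and are not taken from the thesis.

```python
import math

def gauss_pdf(x, sigma):
    """Density of N(0, sigma^2) at x (f_N in the text)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def gauss_cdf(x, sigma):
    """Distribution function of N(0, sigma^2) at x (F_N in the text)."""
    return 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * sigma)))

def uniform_weight_alpha(sigma, dL=3, dR=6, step=5e-3, width=10.0, t_max=6.0, t_step=0.05):
    """Approximate alpha = min_t alpha1(t) * alpha2(t)**(1/(dL-2)) of Lemma 4.14.

    alpha1 (4.10) is a Riemann sum on [-width*sigma, width*sigma];
    alpha2 (4.11) is in closed form; the minimum is taken over a grid of t values.
    """
    dRp = dR - 1
    n = int(2.0 * width * sigma / step)
    xs = [-width * sigma + k * step for k in range(n + 1)]
    # Part of the integrand of (4.10) that does not depend on t (already multiplied by the step).
    base = [(1.0 - gauss_cdf(x, sigma)) ** (dRp - 1) * gauss_pdf(x, sigma) * step for x in xs]
    best = math.inf
    for j in range(1, int(round(t_max / t_step)) + 1):
        t = j * t_step
        alpha1 = dRp * math.exp(-t) * sum(b * math.exp(-t * x) for b, x in zip(base, xs))
        alpha2 = dRp * math.exp(0.5 * t * t * sigma * sigma - t)
        best = min(best, alpha1 * alpha2 ** (1.0 / (dL - 2)))
    return best

for sigma in (0.5, 0.6, 0.7):
    print(sigma, round(uniform_weight_alpha(sigma), 3))
```

With these choices the returned value should be below 1 for σ = 0.5 and above 1 for σ = 0.7, bracketing the constant 0.59 of Corollary 4.15; locating the crossing itself would call for a finer search in σ.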

Note that $\Pi_{\lambda,d_L,d_R}(h,1^h)$ decreases doubly-exponentially as a function of h.

Improved Bounds Using Non-Uniform Weights

The following lemma implies an improved bound for $\Pi_{\lambda,d_L,d_R}(h,\omega)$ using a non-uniform weight vector ω.

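Lemma 4.16 Let $\sigma > 0$ and $d_L, d_R > 2$. Suppose that for some $s \in \mathbb{N}$ and some weight vector $\omega \in \mathbb{R}^s_+$,

$$\min_{t>0} \mathbb{E}\, e^{-tX_s} < \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{-\frac{1}{d_L-2}}. \tag{4.12}$$

Let $\omega(\rho) \in \mathbb{R}^h_+$ denote the concatenation of the vector $\omega \in \mathbb{R}^s_+$ and the vector $(\rho, \dots, \rho) \in \mathbb{R}^{h-s}_+$. Then, for every $h > s$ there exist constants $\alpha < 1$ and $\rho > 0$ such that

$$\Pi_{\lambda,d_L,d_R}(h,\omega(\rho)) \le \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{-\frac{d_L}{d_L-2}} \cdot \alpha^{d_L \cdot {d'_L}^{\,h-s-1}}.$$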


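Proof: By Lemma 4.13, we have

$$\begin{aligned}
\mathbb{E}\, e^{-tX_{h-1}} &\le \left(\mathbb{E}\, e^{-tX_s}\right)^{(d_L-1)^{h-s-1}} \left((d_R-1)\, \mathbb{E}\, e^{-t\rho(1+\phi)}\right)^{\sum_{k=0}^{h-s-2}(d_L-1)^k} \\
&= \left(\mathbb{E}\, e^{-tX_s}\right)^{(d_L-1)^{h-s-1}} \left((d_R-1)\, \mathbb{E}\, e^{-t\rho(1+\phi)}\right)^{\frac{(d_L-1)^{h-s-1}-1}{d_L-2}}.
\end{aligned}$$

Note that $\mathbb{E}\, e^{-t\rho(1+\phi)} = e^{-t\rho + \frac{1}{2}t^2\rho^2\sigma^2}$ is minimized when $t\rho = \sigma^{-2}$, where it equals $e^{-\frac{1}{2\sigma^2}}$. By setting $\rho = \frac{1}{t\sigma^2}$, we obtain

$$\begin{aligned}
\mathbb{E}\, e^{-tX_{h-1}} &\le \left(\mathbb{E}\, e^{-tX_s}\right)^{(d_L-1)^{h-s-1}} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{\frac{(d_L-1)^{h-s-1}-1}{d_L-2}} \\
&= \left(\mathbb{E}\, e^{-tX_s} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{\frac{1}{d_L-2}}\right)^{(d_L-1)^{h-s-1}} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{-\frac{1}{d_L-2}}.
\end{aligned}$$

Let $\alpha \triangleq \min_{t>0} \mathbb{E}\, e^{-tX_s} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{\frac{1}{d_L-2}}$. By (4.12), $\alpha < 1$. Let $t^* = \arg\min_{t>0} \mathbb{E}\, e^{-tX_s}$; then, with $\rho = \frac{1}{t^*\sigma^2}$,

$$\mathbb{E}\, e^{-t^*X_{h-1}} \le \alpha^{(d_L-1)^{h-s-1}} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{-\frac{1}{d_L-2}}.$$

Using Lemma 4.12, we conclude that

$$\Pi_{\lambda,d_L,d_R}(h,\omega(\rho)) \le \alpha^{d_L(d_L-1)^{h-s-1}} \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{-\frac{d_L}{d_L-2}},$$

and the lemma follows. □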

Arora et al. [ADS09] suggested using a weight vector ω with components $\omega_l = (d_L-1)^l$. This weight vector has the effect that if λ assigns the same value to every variable node, then every level in a skinny tree τ contributes equally to $\mathrm{val}_\omega(\tau;\lambda)$. For $h > s$, consider a weight vector $\omega(\rho) \in \mathbb{R}^h_+$ defined by

$$\omega(\rho)_l = \begin{cases} \omega_l & \text{if } 0 \le l < s, \\ \rho & \text{if } s \le l < h. \end{cases}$$

Note that the first s components of ω(ρ) are non-uniform while the other components are uniform.

For given σ, $d_L$, and $d_R$, and for a concrete value of s, we can compute the distribution of $X_s$ using the recursion in (4.6)-(4.8). Moreover, we can also compute the value $\min_{t>0} \mathbb{E}\, e^{-tX_s}$. Computing the distribution and the Laplace transform of $X_s$ is not a trivial task when the components of λ have a continuous density function. However, since the Gaussian density function is smooth and most of its mass is concentrated in a bounded interval, it is possible to "simulate" the evolution of the density functions of the random variables $X_l$ and $Y_l$ for $l \le s$. We use a numerical method based on quantization in order to represent and evaluate the functions $f_{X_l}(\cdot)$, $F_{X_l}(\cdot)$, $f_{Y_l}(\cdot)$, and $F_{Y_l}(\cdot)$. This computation follows methods used in the implementation of the density evolution technique (see e.g. [RU08]). A specific method for the computation is described in Appendix 4.A and exemplified for (3,6)-regular graphs.

For (3, 6)-regular bipartite graphs we obtain the following corollary.


s  | σ0    | Eb/N0(σ0) [dB]
0  | 0.605 | 4.36
1  | 0.635 | 3.94
2  | 0.66  | 3.60
3  | 0.675 | 3.41
4  | 0.685 | 3.28
6  | 0.7   | 3.09
8  | 0.71  | 2.97
10 | 0.715 | 2.91
12 | 0.72  | 2.85
14 | 0.725 | 2.79
18 | 0.73  | 2.73
22 | 0.735 | 2.67

Table 4.1: Computed values of $\sigma_0$ for finite s in Corollary 4.17, and their corresponding $E_b/N_0$ SNR measure in dB.
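The $E_b/N_0$ column of Table 4.1 can be recovered from $\sigma_0$ under the usual BI-AWGN convention $E_b/N_0 = 1/(2R\sigma^2)$, assuming the design rate $R = 1 - d_L/d_R = 1/2$ of (3,6)-regular codes. The text does not state the conversion, so this is our assumption; the sketch below recomputes the column, and the computed values agree with the listed ones up to truncation to two decimals.

```python
import math

# (s, sigma0, Eb/N0 [dB]) as listed in Table 4.1.
TABLE_4_1 = [
    (0, 0.605, 4.36), (1, 0.635, 3.94), (2, 0.66, 3.60), (3, 0.675, 3.41),
    (4, 0.685, 3.28), (6, 0.7, 3.09), (8, 0.71, 2.97), (10, 0.715, 2.91),
    (12, 0.72, 2.85), (14, 0.725, 2.79), (18, 0.73, 2.73), (22, 0.735, 2.67),
]

def ebn0_db(sigma, dL=3, dR=6):
    """Eb/N0 in dB for noise std sigma, assuming Eb/N0 = 1/(2 R sigma^2) with R = 1 - dL/dR."""
    rate = 1.0 - dL / dR
    return 10.0 * math.log10(1.0 / (2.0 * rate * sigma * sigma))

for s, sigma0, listed in TABLE_4_1:
    print(f"s={s:2d}  sigma0={sigma0:.3f}  computed={ebn0_db(sigma0):.3f} dB  listed={listed:.2f} dB")
```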

Corollary 4.17 Let $\sigma < \sigma_0$, $d_L = 3$, and $d_R = 6$. For each pair of values of $\sigma_0$ and s in Table 4.1, there exists a constant $\alpha < 1$ such that for every $h > s$,

$$\Pi_{\lambda,d_L,d_R}(h,\omega) \le \frac{1}{125}\, e^{\frac{3}{2\sigma^2}} \cdot \alpha^{2^{h-s}}.$$

Note that for a fixed s, the probability $\Pi_{\lambda,d_L,d_R}(h,\omega)$ decreases doubly-exponentially as a function of h. Since the analysis requires $s < h < \frac{1}{4}\mathrm{girth}(G)$, Corollary 4.17 applies only to codes whose Tanner graphs have girth larger than 4h, and in particular larger than 4s.
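Because $\alpha^{2^{h-s}}$ underflows in floating point already for moderate h, it is convenient to evaluate the bounds of Lemma 4.16 and Corollary 4.17 in the log domain. A small sketch with hypothetical function names; α and σ must be supplied from elsewhere (e.g., from the computation in Appendix 4.A), and the numbers in the usage loop are illustrative only.

```python
import math

def log_bound_lemma_4_16(sigma, alpha, h, s, dL=3, dR=6):
    """ln of the right-hand side of Lemma 4.16 (assumes h > s and 0 < alpha < 1)."""
    base = (dR - 1) * math.exp(-1.0 / (2.0 * sigma * sigma))
    return -(dL / (dL - 2)) * math.log(base) + dL * (dL - 1) ** (h - s - 1) * math.log(alpha)

def log_bound_corollary_4_17(sigma, alpha, h, s):
    """ln of (1/125) * exp(3/(2 sigma^2)) * alpha**(2**(h-s)), i.e., Corollary 4.17."""
    return -math.log(125.0) + 3.0 / (2.0 * sigma * sigma) + 2 ** (h - s) * math.log(alpha)

# Hypothetical numbers for illustration only: sigma = 0.7, s = 4, alpha = 0.9.
for h in range(5, 11):
    print(h, log_bound_lemma_4_16(0.7, 0.9, h, 4), log_bound_corollary_4_17(0.7, 0.9, h, 4))
```

For $d_L = 3$ the constants coincide, since $(5e^{-1/(2\sigma^2)})^{-3} = e^{3/(2\sigma^2)}/125$, and the exponent of α in Lemma 4.16 is $3 \cdot 2^{h-s-1} \ge 2^{h-s}$, so the lemma's bound never exceeds the corollary's.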

Theorem 4.1 follows from Lemma 4.11, Lemma 4.14, and Corollary 4.17 as follows. The first part, which states a finite-length result, follows from Lemma 4.11 and Corollary 4.17 by taking $s = 0 < h < \frac{1}{4}\mathrm{girth}(G)$, which holds for any Tanner graph G.

The second part, which deals with an asymptotic result, follows from Lemma 4.11 and Corollary 4.17 by fixing s = 22 and taking $g = \Omega(\log N)$ sufficiently large such that $s < h = \Theta(\log N) < \frac{1}{4}\mathrm{girth}(G)$. It therefore provides a lower bound on the threshold of LP-decoding. The third part, which states a finite-length result for any $(d_L,d_R)$-regular LDPC code, follows from Lemma 4.11 and Lemma 4.14. Theorem 4.2 is obtained in the same manner after a straightforward modification of Lemma 4.14 to MBIOS channels.

Remark 4.18 Following [ADS09], the contribution $\omega_h \cdot \lambda_{v_0}$ of the root of $T_{v_0}$ is not included in the definition of $\mathrm{val}_\omega(\tau;\lambda)$. The effect of this contribution on $\Pi_{\lambda,d_L,d_R}(h,\omega)$ is bounded by a multiplicative factor, as implied by the proof of Lemma 4.12. The multiplicative factor is bounded by $\mathbb{E}\, e^{-t\omega_h\lambda_{v_0}}$, which may be regarded as a constant since it does not depend on the code parameters (in particular, the code length N). Therefore, we can set $\omega_h = 0$ without loss of generality for these asymptotic considerations.

4.4 Discussion

In this chapter we extended the analysis of Arora et al. [ADS09] for LP-decoding over a BSC to any MBIOS channel. We proved bounds on the word error probability that are inverse doubly-exponential in the girth of the factor graph for LP-decoding of regular LDPC codes over MBIOS channels. We also proved lower bounds on the threshold of regular LDPC codes whose Tanner graphs have logarithmic girth under LP-decoding in the binary-input AWGN channel.

Although thresholds are regarded as an asymptotic result, the analysis presented by Arora et al. [ADS09], as well as its extension presented in this chapter, yields both asymptotic and finite-length results. An interesting tradeoff between these two perspectives is shown by the formulation of the results. We regard the goal of achieving the highest possible thresholds as an asymptotic goal, and as such we may compare the achieved thresholds to the asymptotic BP-based thresholds. Note that the obtained lower bound on the threshold increases up to a certain ceiling value (which we conjecture is below the LP threshold) as the assumed girth increases. Thus, an asymptotic result is obtained.

However, in the case of finite-length codes, the analysis cannot be based on an infinite girth in the limit. Two phenomena occur in the analysis of finite codes: (i) the size of the interval $[0, \sigma_0]$ for which the error bound holds increases as a function of the girth (as shown in Table 4.1), and (ii) the decoding error probability decreases exponentially as a function of the gap $\sigma_0 - \sigma$ (as implied by Figure 4.5(b)). We demonstrated the power of the analysis for the finite-length case by presenting error bounds for any (3,6)-regular LDPC code as a function of the girth of the Tanner graph, provided that $\sigma \le 0.605$. Assuming that the girth of the Tanner graph is greater than 88, an error bound is presented provided that $\sigma \le 0.735$. This proof also shows that 0.735 is a lower bound on the threshold in the asymptotic case.

In the proof of LP optimality (Lemma 4.7 and Theorem 4.8) we used the combinatorial interpretation of LP-decoding via graph covers [VK05] to infer a reduction to conditions of ML optimality. That is, the decomposition of codewords presented by Arora et al. [ADS09] leads to a decomposition for fractional LP solutions. This method of reducing combinatorial characterizations of LP-decoding to combinatorial characterizations of ML decoding is based on graph cover decoding.

Future directions: The technique for proving error bounds for the BI-AWGN channel described in Section 4.3 and in Appendix 4.A is based on a min-sum probabilistic process on a tree. The process is characterized by an evolution of probability density functions. Computing the evolving densities in the analysis of AWGN channels is not a trivial task. As indicated by our numeric computations, the evolving density functions in the case of the AWGN channel visually resemble Gaussian probability density functions (see Figures 4.2 and 4.3). Chung et al. [CRU01] presented a method for estimating thresholds of belief propagation decoding according to density evolution using a Gaussian approximation. Applying an appropriate Gaussian approximation technique to our analysis may result in analytic asymptotic approximate thresholds of LP-decoding for regular LDPC codes over AWGN channels.

Feldman et al. [FKV05] observed that for high SNRs, truncating the LLRs of the BI-AWGNC surprisingly assists LP-decoding. They proved that for certain families of regular LDPC codes and large enough SNRs (i.e., small σ), it is advantageous to truncate the LLRs before passing them to the LP decoder. The method presented in Appendix 4.A for computing densities evolving on trees using quantization and truncation of the LLRs can be applied to this case. It is interesting to see whether this unexpected phenomenon of LP-decoding also occurs for larger values of σ (i.e., lower SNRs).


4.A Computing the Evolution of Probability Densities over Trees

In this appendix we present a computational method for estimating $\min_{t>0} \mathbb{E}\, e^{-tX_s}$ for some concrete s. The random variable $X_s$ is defined by the recursion in (4.6)-(4.8). Let γ denote an ensemble of i.i.d. continuous random variables with probability density function (p.d.f.) $f_\gamma(\cdot)$ and cumulative distribution function (c.d.f.) $F_\gamma(\cdot)$.

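We demonstrate the method for computing $\min_{t>0} \mathbb{E}\, e^{-tX_s}$ for the case where $d_L = 3$, $d_R = 6$, $\omega_l = (d_L-1)^l = 2^l$, σ = 0.7, and γ = 1 + φ where $\phi \sim \mathcal{N}(0,\sigma^2)$. In this case,

$$f_\gamma(x) = f_{\mathcal{N}}(x-1) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-1)^2}{2\sigma^2}}, \quad\text{and}$$

$$F_\gamma(x) = F_{\mathcal{N}}(x-1) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{x-1}{\sqrt{2}\,\sigma}\right)\right],$$

where $\mathrm{erf}(x) \triangleq \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$ denotes the error function.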

The actual computation of the evolution of density functions via the recursion equations requires a numeric implementation. Finding an efficient and stable implementation is nontrivial. We follow methods used in the computation of the variable-node update process in the implementation of density evolution analysis (see e.g. [RU08]).

We first state two properties of random variables for the evolving process defined in the recursion. We then show a method for computing a proper representation of the probability density function of $X_s$ for the purpose of finding $\min_{t>0} \mathbb{E}\, e^{-tX_s}$.

4.A.1 Properties of Random Variables

Sum of Random Variables. Let Φ denote a random variable that equals the sum of n independent random variables $\{\phi_i\}_{i=1}^n$, i.e., $\Phi = \sum_{i=1}^n \phi_i$. Denote by $f_{\phi_i}(\cdot)$ the p.d.f. of $\phi_i$. Then, the p.d.f. of Φ is given by

$$f_\Phi = f_{\phi_1} \star f_{\phi_2} \star \cdots \star f_{\phi_n}, \tag{4.13}$$

where ⋆ denotes the standard convolution operator over ℝ or over ℤ.
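On a quantized grid, (4.13) becomes a discrete convolution multiplied by the step size δ (a Riemann sum for the convolution integral). A direct $O(n^2)$ sketch follows; the helper name and the grid parameters are our own illustrative choices.

```python
import math

def convolve_quantized(f, g, delta):
    """Quantized version of (4.13) for two densities sampled with step delta.

    If f is sampled on delta*[Mf, Lf] and g on delta*[Mg, Lg], the result is sampled
    on delta*[Mf+Mg, Lf+Lg]; the factor delta turns the discrete sum into a Riemann sum.
    """
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return [delta * v for v in out]

delta, M, L = 0.02, -400, 400                 # grid delta*[M, L] = [-8, 8]
phi = [math.exp(-(delta * k) ** 2 / 2) / math.sqrt(2 * math.pi) for k in range(M, L + 1)]
conv = convolve_quantized(phi, phi, delta)    # lives on the grid delta*[2M, 2L]
print(conv[-2 * M])   # density of N(0, 2) at 0; should be close to 1/sqrt(4*pi) ≈ 0.2821
```

The FFT-based evaluation described in Appendix 4.A.2 replaces the double loop; the direct version only makes the indexing of the supports explicit.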

Minimum of Random Variables. Let Φ denote a random variable that equals the minimum of n i.i.d. random variables $\{\phi_i\}_{i=1}^n$, i.e., $\Phi = \min_{1\le i\le n} \phi_i$. Denote by $f_\phi(\cdot)$ and $F_\phi(\cdot)$ the p.d.f. and c.d.f. of $\phi \sim \phi_i$, respectively. Then, the p.d.f. and c.d.f. of Φ are given by

$$f_\Phi(x) = n \cdot \left(1 - F_\phi(x)\right)^{n-1} f_\phi(x), \quad\text{and} \tag{4.14}$$

$$F_\Phi(x) = 1 - \left(1 - F_\phi(x)\right)^{n}. \tag{4.15}$$
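Equations (4.14)-(4.15) act pointwise, so on a quantized grid they are a one-line transformation of the sampled p.d.f. and c.d.f. The sketch below (the helper name `min_of_iid` is ours) applies them to five standard Gaussians; the mean of the minimum of five i.i.d. N(0,1) variables is approximately −1.163, which the Riemann sum reproduces.

```python
import math

def min_of_iid(f, F, n):
    """Pointwise (4.14)-(4.15): p.d.f. and c.d.f. samples of the minimum of n i.i.d. copies."""
    f_min = [n * (1.0 - Fx) ** (n - 1) * fx for fx, Fx in zip(f, F)]
    F_min = [1.0 - (1.0 - Fx) ** n for Fx in F]
    return f_min, F_min

delta = 0.01
xs = [delta * k for k in range(-1000, 1001)]        # grid on [-10, 10]
f = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in xs]
F = [0.5 * (1 + math.erf(x / math.sqrt(2))) for x in xs]
f5, F5 = min_of_iid(f, F, 5)
mean_min = delta * sum(x * p for x, p in zip(xs, f5))
print(round(mean_min, 3))   # -1.163
```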

4.A.2 Computing Distributions of Xl and Yl

The base case of the recursion in (4.6)-(4.8) is given by $Y_0$. Let $g_{\omega_l}(\cdot)$ denote the p.d.f. of the scaled random variable $\omega_l\gamma$, i.e.,

$$g_{\omega_l}(y) = \frac{1}{\omega_l}\, f_\gamma\!\left(\frac{y}{\omega_l}\right). \tag{4.16}$$


Then, the p.d.f. of $Y_0$ is simply written as

$$f_{Y_0}(y) = g_{\omega_0}(y). \tag{4.17}$$

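In the case where $\gamma = 1 + \mathcal{N}(0,\sigma^2)$, Equation (4.17) simplifies to

$$f_{Y_0}(y) = \frac{1}{\omega_0}\, f_{\mathcal{N}}\!\left(\frac{y}{\omega_0} - 1\right), \quad\text{and} \tag{4.18}$$

$$F_{Y_0}(y) = F_{\mathcal{N}}\!\left(\frac{y}{\omega_0} - 1\right). \tag{4.19}$$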

Let $f^{\star d}(\cdot)$ denote the d-fold convolution of a function $f(\cdot)$, i.e., the convolution of d copies of $f(\cdot)$. Following (4.13)-(4.15), the recursion equations for the p.d.f. and c.d.f. of $X_l$ and $Y_l$ are given by

$$f_{X_l}(x) = (d_R-1)\left(1 - F_{Y_l}(x)\right)^{d_R-2} f_{Y_l}(x), \tag{4.20}$$

$$F_{X_l}(x) = 1 - \left(1 - F_{Y_l}(x)\right)^{d_R-1}, \tag{4.21}$$

$$f_{Y_l}(y) = \left(g_{\omega_l} \star f_{X_{l-1}}^{\star(d_L-1)}\right)(y), \quad\text{and} \tag{4.22}$$

$$F_{Y_l}(y) = \int_{-\infty}^{y} f_{Y_l}(t)\, dt. \tag{4.23}$$

Since we cannot solve (4.20)-(4.23) analytically, we use a numeric method based on quantization in order to represent and evaluate the functions $f_{X_l}(\cdot)$, $F_{X_l}(\cdot)$, $f_{Y_l}(\cdot)$, and $F_{Y_l}(\cdot)$. As suggested in [RU08], we compute a uniform sample of the functions, i.e., we consider the functions over the set δℤ, where δ denotes the quantization step size. Moreover, for practical reasons we restrict the functions to a finite support, namely, $\{\delta k\}_{k=M}^{L}$ for some integers $M < L$. We denote the set $\{\delta k\}_{k=M}^{L}$ by δ[M, L]. Obviously, the choice of δ, M, and L determines the precision of our target computation. Depending on the quantized function, it is also common to consider point masses at points not in δ[M, L]. For example, if the density function has a heavy tail above δL, we may place the mass of that tail at +∞ as an additional quantization point. The same applies analogously to a heavy tail below δM.

A Gaussian-like (bell-shaped) function is bounded and continuous, and so are its derivatives. The area beneath its tails decays exponentially and becomes negligible a few standard deviations away from the mean. Thus, Gaussian-like functions are amenable to quantization and truncation of the tails. We therefore choose to zero the density functions outside the interval [δM, δL]. The parameters M and L are symmetric around the mean and, together with δ, are chosen to make the error of a Riemann integral negligible. As we demonstrate by computations, the density functions $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ are indeed bell-shaped, justifying the quantization. Figure 4.1 illustrates the p.d.f. of $X_0$ (here $X_0$ equals the minimum of $d_R - 1 = 5$ instances of $Y_0$). Note that by definition, $Y_0$ is a Gaussian random variable.

Figure 4.1: Probability density functions of $X_0$ and $Y_0$ for $(d_L, d_R) = (3, 6)$ and σ = 0.7.

Computing $f_{Y_l}(\cdot)$ given $f_{X_{l-1}}(\cdot)$ requires the convolution of functions. However, the restriction of the density functions to a bounded support δ[M, L] is not invariant under convolution. That is, if the function f is supported by δ[M, L], then $f \star f$ is supported by δ[2M, 2L]. In the quantized computations of $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$, our numeric calculations show that the mean and standard deviation of the random variables $X_l$ and $Y_l$ increase exponentially in l, as illustrated in Figures 4.2 and 4.3. Therefore, the maximal slopes of the density functions $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ decrease with l. This property allows us to double the quantization step δ as l increases by one.² Thus, the size of the support used for $f_{X_l}(\cdot)$ and $f_{Y_l}(\cdot)$ does not grow. Specifically, the interval δ[M, L] doubles, but the doubling of δ keeps the number of points fixed. This method keeps the computation tractable while keeping the error small.
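The text does not say how the samples are transferred from the grid δℤ to the grid 2δℤ when the step is doubled. One simple mass-preserving choice, assumed here purely for illustration, is to take at each even index a (1/4, 1/2, 1/4) weighted average of the fine samples around it; the helper name `double_step` is hypothetical.

```python
import math

def double_step(f, M, delta):
    """Resample a density from the grid delta*[M, M+len(f)-1] onto the coarser grid 2*delta*Z.

    Each coarse sample at 2*delta*k is a (1/4, 1/2, 1/4) average of the fine samples around
    it, which preserves the Riemann-sum mass delta*sum(f) up to the grid endpoints.
    Returns (coarse samples, new left index, new step).
    """
    lo = M if M % 2 == 0 else M + 1          # first even fine index inside the grid
    hi = M + len(f) - 1

    def at(k):                               # fine sample at index k, zero outside the grid
        return f[k - M] if M <= k <= hi else 0.0

    coarse = [0.25 * at(k - 1) + 0.5 * at(k) + 0.25 * at(k + 1) for k in range(lo, hi + 1, 2)]
    return coarse, lo // 2, 2 * delta

delta, M = 0.01, -1000
fine = [math.exp(-(delta * k) ** 2 / 2) / math.sqrt(2 * math.pi) for k in range(M, 1001)]
coarse, M2, delta2 = double_step(fine, M, delta)
print(len(fine), len(coarse), M2, delta2)
```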

For two quantized functions f and g, the computation of $f \star g$ can be performed efficiently using the Fast Fourier Transform (FFT). First, in order to prevent aliasing, extend the support with zeros (i.e., zero padding) so that it equals the support of $f \star g$. Then, $f \star g = \mathrm{IFFT}(\mathrm{FFT}(f) * \mathrm{FFT}(g))$, where $*$ denotes coordinate-wise multiplication. The outcome is scaled by the quantization step size δ. In fact, the evaluation of $f_{Y_l}(\cdot)$ requires $d_L - 1$ convolutions and is performed in the frequency domain (without returning to the time domain in between) by a proper zero padding prior to performing the FFT.
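The frequency-domain evaluation described above can be sketched as follows. To keep the block dependency-free it uses a textbook iterative radix-2 Cooley-Tukey FFT written in pure Python (in practice one would call a library FFT). All k inputs are zero-padded to one power-of-two length that is at least the length of their full linear convolution, multiplied in the frequency domain, and transformed back once; the result is scaled by $\delta^{k-1}$, one factor of δ per convolution. The function names are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Iterative radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    a = [complex(x) for x in a]
    n = len(a)
    rev = 0
    for i in range(1, n):                    # bit-reversal permutation
        bit = n >> 1
        while rev & bit:
            rev ^= bit
            bit >>= 1
        rev |= bit
        if i < rev:
            a[i], a[rev] = a[rev], a[i]
    length = 2
    while length <= n:                       # butterflies of increasing size
        w_len = cmath.exp((2j if invert else -2j) * cmath.pi / length)
        half = length // 2
        for start in range(0, n, length):
            w = 1 + 0j
            for k in range(half):
                u, v = a[start + k], a[start + k + half] * w
                a[start + k], a[start + k + half] = u + v, u - v
                w *= w_len
        length <<= 1
    if invert:
        a = [x / n for x in a]
    return a

def fft_convolve_many(funcs, delta):
    """delta**(k-1) times the linear convolution of k sampled functions, with one inverse FFT."""
    out_len = sum(len(f) for f in funcs) - (len(funcs) - 1)
    n = 1
    while n < out_len:                       # zero padding prevents circular wrap-around (aliasing)
        n <<= 1
    spectrum = [1 + 0j] * n
    for f in funcs:
        F = fft(list(f) + [0.0] * (n - len(f)))
        spectrum = [s * x for s, x in zip(spectrum, F)]
    result = fft(spectrum, invert=True)
    scale = delta ** (len(funcs) - 1)
    return [scale * z.real for z in result[:out_len]]

print(fft_convolve_many([[1, 2, 3], [0, 1, 0.5], [2, 1]], 0.5))
```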

Note that when γ is a discrete random variable with bounded support (as in [ADS09]), a precise computation of the probability distribution function of $X_s$ is obtained by following (4.20)-(4.23).

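4.A.3 Estimating $\min_{t>0} \mathbb{E}\, e^{-tX_s}$

After obtaining a proper discretized representation of the p.d.f. of $X_s$, we approximate $\mathbb{E}\, e^{-tX_s}$ for a given t by

$$\mathbb{E}\, e^{-tX_s} \approx \sum_{k=M}^{L} \delta \cdot f_{X_s}(\delta k) \cdot e^{-t\delta k}.$$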
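The Riemann sum above, combined with a grid search over t > 0 as in the text, can be sketched as follows; the function names and the t-grid are our own choices. As a check, for X ∼ N(1,1) one has $\mathbb{E}\, e^{-tX} = e^{-t + t^2/2}$, whose minimum $e^{-1/2}$ is attained at t = 1.

```python
import math

def laplace_quantized(f, M, delta, t):
    """Riemann-sum approximation of E exp(-t X) from samples f[i] = f_X(delta*(M + i))."""
    return sum(delta * fk * math.exp(-t * delta * (M + i)) for i, fk in enumerate(f))

def min_laplace(f, M, delta, t_grid):
    """Grid search for min_{t>0} E exp(-t X); returns (minimum value, minimizing t)."""
    return min((laplace_quantized(f, M, delta, t), t) for t in t_grid)

delta, M = 0.01, -900                                   # grid delta*[M, 1100] = [-9, 11]
f = [math.exp(-(delta * k - 1) ** 2 / 2) / math.sqrt(2 * math.pi) for k in range(M, 1101)]  # N(1, 1)
value, t_star = min_laplace(f, M, delta, [0.01 * j for j in range(1, 301)])
print(value, t_star)   # analytically, the minimum is exp(-1/2) at t = 1
```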

² Doubling applies to the demonstrated parameters, i.e., $d_L = 3$ and $\omega_l = 2^l$.

Figure 4.2: Probability density functions of $X_l$ for l = 0, ..., 4, $(d_L, d_R) = (3, 6)$ and σ = 0.7.

Figure 4.3: Probability density functions of $Y_l$ for l = 0, ..., 4, $(d_L, d_R) = (3, 6)$ and σ = 0.7.

Figure 4.4: $\ln(\mathbb{E}\, e^{-tX_s})$ as a function of t for s = 4, 6, 8, 10, 12, $(d_L, d_R) = (3, 6)$ and σ = 0.7. Plot (b) is an enlargement of the rectangle depicted in plot (a).

We then estimate the minimum value by searching over values of t > 0. Figure 4.4 depicts $\ln(\mathbb{E}\, e^{-tX_s})$ as a function of $t \in (0, 0.5]$ for s = 4, 6, 8, 10, 12. The numeric calculations show that as t grows from zero, the function $\mathbb{E}\, e^{-tX_s}$ decreases to a minimum value and then increases rapidly. We can also observe that both $\min_{t>0} \mathbb{E}\, e^{-tX_s}$ and $\arg\min_{t>0} \mathbb{E}\, e^{-tX_s}$ decrease as functions of s.

Following Lemma 4.16, we are interested in the maximum value of σ for which (4.12) holds for a given s. That is,

$$\sigma_0 \triangleq \sup\left\{ \sigma > 0 \;\middle|\; \min_{t>0} \mathbb{E}\, e^{-tX_s} \cdot \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{\frac{1}{d_L-2}} < 1 \right\}. \tag{4.24}$$
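Definition (4.24) suggests computing $\sigma_0$ by bisection on σ, assuming (as stated below) that the set is an interval $(0, \sigma_0)$. The sketch below does this for the simplest case s = 0, where $X_0$ is the minimum of $d_R - 1$ copies of $1 + \mathcal{N}(0,\sigma^2)$ and (4.20) gives its density in closed form; the weight $\omega_0$ only rescales t and therefore does not affect the minimum. The grids, the bracket [0.5, 0.7], and the function names are our own assumptions, and the output should only be expected to match the s = 0 entry of Table 4.1 to within the accuracy of these grids.

```python
import math

def lhs_4_12_s0(sigma, dL=3, dR=6, delta=0.01):
    """min_t E exp(-t X_0) * ((dR-1) exp(-1/(2 sigma^2)))**(1/(dL-2)), i.e., the quantity in (4.24) for s = 0."""
    ks = range(int(-10 * sigma / delta), int(6 * sigma / delta) + 1)
    xs = [1.0 + delta * k for k in ks]               # grid around the mean of Y_0 = 1 + N(0, sigma^2)
    weights = []
    for x in xs:
        z = (x - 1.0) / sigma
        pdf = math.exp(-z * z / 2) / (math.sqrt(2 * math.pi) * sigma)
        cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
        weights.append(delta * (dR - 1) * (1 - cdf) ** (dR - 2) * pdf)   # delta * f_{X_0}(x), by (4.20)
    laplace_min = min(sum(wt * math.exp(-t * x) for wt, x in zip(weights, xs))
                      for t in (0.05 * j for j in range(1, 101)))
    return laplace_min * ((dR - 1) * math.exp(-1 / (2 * sigma ** 2))) ** (1 / (dL - 2))

def sigma0(condition, lo=0.5, hi=0.7, iters=14):
    """Bisection for sup{sigma > 0 : condition(sigma) < 1}, assuming that set is an interval (0, sigma0)."""
    if not (condition(lo) < 1 <= condition(hi)):
        raise ValueError("the bracket [lo, hi] does not contain the boundary")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if condition(mid) < 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s0 = sigma0(lhs_4_12_s0)
print(f"estimated sigma0 for s = 0: {s0:.4f}")
```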

Note that if the set in (4.24) is not empty, then it is an open interval $(0, \sigma_0) \subset \mathbb{R}_+$. Figure 4.5(a) illustrates the region in the (t, σ) plane for which (4.12) holds with s = 4.

Let $t^*$ denote the value of t that achieves the supremum $\sigma_0$. For every $\sigma \in (0, \sigma_0)$, we may set the value of the constant α in Corollary 4.17 as

$$\alpha = \mathbb{E}\, e^{-t^*X_s} \cdot \left((d_R-1)\, e^{-\frac{1}{2\sigma^2}}\right)^{\frac{1}{d_L-2}}.$$

Figure 4.5(b) illustrates the value of the constant α in Corollary 4.17 (labeled c in the plot) as a function of σ in the case where s = 4.

Figure 4.5: (a) Region for which $5e^{-\frac{1}{2\sigma^2}}\, \mathbb{E}\, e^{-tX_4} < 1$ as a function of t and σ for $(d_L, d_R) = (3, 6)$. Note that the maximal value of σ contained in that region results in the estimate $\sigma_0 = 0.685$ in the entry s = 4 of Table 4.1. (b) Constant α in Corollary 4.17, plotted as $c = 5e^{-\frac{1}{2\sigma^2}}\, \mathbb{E}\, e^{-0.11 \cdot X_4}$, as a function of σ in the case where s = 4 and t = 0.11, i.e., the value of α over the cut of the (t, σ)-plane in plot (a) at t = 0.11 (depicted by a thick solid line).


Chapter 5

Local-Optimality Certificates for ML-Decoding and LP-decoding of Tanner Codes

In this chapter¹, we present a new combinatorial characterization for local-optimality of a codeword in irregular Tanner codes with respect to any MBIOS channel. We prove that local-optimality in this new characterization implies Maximum-Likelihood (ML) optimality and LP-optimality. Given a codeword and the channel output, we also show how to efficiently recognize if the codeword is locally optimal by a dynamic programming algorithm.

The results in Chapters 6-8 are based on the local-optimality characterization presented in this chapter.

5.1 Introduction

Combinatorial characterizations of sufficient conditions for successful decoding are based on so-called "certificates". That is, given a channel observation y and a codeword x, we are interested in a one-sided error test that answers the questions: Is x optimal with respect to y? Is it unique? Note that the test may answer "no" for a positive instance. A positive answer for such a test is called a certificate for the optimality of a codeword. Upper bounds on the word error probability are obtained by lower bounds on the probability that a certificate exists.

Wiberg [Wib96] studied representations of codes using factor graphs. He used these representations to analyze message-passing decoding algorithms. The analysis uses minimal combinatorial structures (now also known as skinny trees) to characterize decoding errors when using message-passing decoding algorithms.

Koetter and Vontobel [KV06] analyzed LP-decoding of regular LDPC codes. Their analysis is based on decomposing each codeword (and pseudocodeword) into a finite set of minimal structured trees (i.e., skinny trees) with uniform vertex weights. Arora et al. [ADS09] extended the work in [KV06] by introducing nonuniform weights to the vertices in the skinny trees, and by defining local-optimality. For a BSC, Arora et al. proved that local optimality implies both ML-optimality and LP-optimality. They presented an analysis technique that performs a finite-length density evolution of a min-sum process to prove bounds on the probability of a decoding error. Arora et al. also pointed out that it is possible to design a re-weighted version of the min-sum decoder for regular codes that finds the locally-optimal codeword, if such exists, for trees whose height is at most half of the girth. This work was further extended in [HE11a] (see Chapter 4) to memoryless channels. The analyses presented in these works [KV06, ADS09, HE11a] are limited to skinny trees, the height of which is bounded by half of the girth of the Tanner graph.

¹ The research work presented in this chapter is based on [HE11b, HE12c].

Vontobel [Von10] extended the decomposition of a codeword (and pseudocodeword) to skinny trees in graph covers. This enabled Vontobel to mitigate the limitation on the height imposed by the girth. The decomposition is obtained by a random walk, and applies also to irregular Tanner graphs.

Contributions. We present a new combinatorial characterization for local-optimality of a codeword in irregular Tanner codes with respect to any memoryless binary-input output-symmetric (MBIOS) channel (Definition 5.4). Local optimality is characterized via costs of deviations based on subtrees in computation trees of the Tanner graph. Consider a computation tree with height 2h rooted at some variable node. A deviation is based on a subtree such that (i) the degree of a variable node equals its degree in the computation tree, and (ii) the degree of a local-code node equals some constant $d \ge 2$, provided that d is at most the minimum distance of the local codes. Furthermore, level weights $w \in \mathbb{R}^h_+$ are assigned to the levels of the tree. Hence, a deviation is a combinatorial structure that has three main parameters: deviation height h, deviation level weights $w \in \mathbb{R}^h_+$, and deviation "degree" d. Therefore, the new definition of local-optimality is based on three parameters: $h \in \mathbb{N}$, $w \in \mathbb{R}^h_+$, and $d \ge 2$.

This characterization extends the notion of deviations in local-optimality in four ways: (i) no restrictions are applied to the degrees of the nodes in the Tanner graph, (ii) arbitrary local linear codes may be associated with constraint nodes, (iii) deviations are subtrees in the computation tree and no limitation is set on the height of the deviations; in particular, their height may exceed the girth of the Tanner graph, and (iv) minimal deviations may have a degree d > 2 in the check nodes (as opposed to skinny trees in previous analyses), provided that d is at most the minimum distance of the local codes. We prove that local-optimality in this characterization implies ML-optimality (Theorem 5.5). We utilize the equivalence of graph cover decoding and LP-decoding for Tanner codes, implied by Vontobel and Koetter [VK05], to prove that local-optimality suffices also for LP-optimality (Theorem 5.10). We present an efficient dynamic programming algorithm that computes a local-optimality certificate for a codeword with respect to a given channel output (Algorithm 5.1).

Organization. The remainder of this chapter is organized as follows. Section 5.2 presents a combinatorial certificate that applies to ML-decoding for codewords of Tanner codes. In Section 5.3, we prove that the certificate applies also to LP-decoding for codewords of Tanner codes. In Section 5.4, we present an efficient certification algorithm for local-optimality. In Section 5.5, we prove a structural decomposition for codewords of Tanner codes that is used as a key element in the proof of the main theorem of Section 5.2.


5.2 A Combinatorial Certificate for an ML Codeword

In this section we present combinatorial certificates for codewords of Tanner codes that apply both to ML-decoding and LP-decoding. A certificate is a proof that a given codeword is the unique solution of maximum-likelihood decoding and linear-programming decoding. The certificate is based on combinatorial weighted structures in the Tanner graph, referred to as local configurations. These local configurations generalize the minimal configurations (skinny trees) presented by Vontobel [Von10] as an extension of Arora et al. [ADS09]. We note that for Tanner codes, the support of each weighted local configuration is not necessarily a valid local configuration. For a given codeword, the certificate is computed by a dynamic-programming algorithm on the Tanner graph of the code (see Section 5.4).

Notation: Let $y \in \mathbb{R}^N$ denote the word received from the channel. Let $\lambda = \lambda(y)$ denote the LLR vector for y. Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote a Tanner graph, and let C(G) denote a Tanner code based on G with minimum local distance $d^*$. Let $x \in C(G)$ be a candidate for $x^{\mathrm{ml}}(y)$ and $x^{\mathrm{lp}}(y)$.

Definition 5.1 (Path-Prefix Tree) Consider a graph G = (V, E) and a node r ∈ V. Let $\hat{V}$ denote the set of all backtrackless paths in G with length at most h that start at node r, and let

$$\hat{E} \triangleq \left\{ (p_1, p_2) \in \hat{V} \times \hat{V} \;\middle|\; p_1 \text{ is a prefix of } p_2,\ |p_1| + 1 = |p_2| \right\}.$$

We identify the empty path in $\hat{V}$ with (r). Denote by $T_r^h(G) \triangleq (\hat{V}, \hat{E})$ the path-prefix tree of G rooted at node r with height h.
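Definition 5.1 translates directly into a breadth-first enumeration. A minimal sketch, assuming a simple graph given as an adjacency dictionary (a hypothetical input format) and representing a path as a tuple of vertices, so that |p| = len(p) − 1 and the one-element tuple (r,) stands for the empty path:

```python
def path_prefix_tree(adj, r, h):
    """Vertices and edges of T_r^h(G): backtrackless paths from r with at most h edges.

    Each path is a tuple of vertices starting at r; edges join a path to its one-edge extensions.
    """
    vertices, edges = [(r,)], []
    frontier = [(r,)]
    for _ in range(h):
        next_frontier = []
        for p in frontier:
            for u in adj[p[-1]]:
                if len(p) >= 2 and u == p[-2]:    # backtrackless: never return along the last edge
                    continue
                q = p + (u,)
                vertices.append(q)
                edges.append((p, q))
                next_frontier.append(q)
        frontier = next_frontier
    return vertices, edges

# A 4-cycle 0-1-2-3-0: the computation tree keeps unrolling the cycle.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
V, E = path_prefix_tree(adj, 0, 4)
print(len(V), len(E))
```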

Path-prefix trees of G that are rooted in variable nodes are often called computation trees. We also consider path-prefix trees of subgraphs that may be rooted either at a variable node or at a local-code node.

We use the following notation. Because vertices in $T_r^h(G)$ are paths in G, we denote vertices in path-prefix trees by p and q. Vertices in G are denoted by u, v, r. For a path $p \in \hat{V}$, let s(p) denote the first vertex (source) of path p, and let t(p) denote the last vertex (target) of path p. Denote by $\mathrm{Prefix}^+(p)$ the set of proper prefixes of the path p, i.e.,

$$\mathrm{Prefix}^+(p) = \left\{ q \;\middle|\; q \text{ is a prefix of } p,\ 1 \le |q| < |p| \right\}.$$

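Consider a Tanner graph $G = (\mathcal{V} \cup \mathcal{J}, E)$ and let $T_r^h(G) = (\hat{V}, \hat{E})$ denote a path-prefix tree of G. Let $\hat{\mathcal{V}} \triangleq \{p \mid p \in \hat{V},\ t(p) \in \mathcal{V}\}$ and $\hat{\mathcal{J}} \triangleq \{p \mid p \in \hat{V},\ t(p) \in \mathcal{J}\}$. Paths in $\hat{\mathcal{V}}$ are called variable paths, and paths in $\hat{\mathcal{J}}$ are called local-code paths.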

The following definitions extend the combinatorial notion of minimal valid deviations [Wib96] and weighted minimal local-deviations (skinny trees) [ADS09, Von10] to the case of Tanner codes.

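Definition 5.2 (d-tree) Consider a Tanner graph $G = (\mathcal{V} \cup \mathcal{J}, E)$. Denote by $T_r^{2h}(G) = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$ the path-prefix tree of G rooted at node $r \in \mathcal{V}$. A subtree $\mathcal{T} \subseteq T_r^{2h}(G)$ is a d-tree if: (i) $\mathcal{T}$ is rooted at (r), (ii) for every local-code path $p \in \mathcal{T} \cap \hat{\mathcal{J}}$, $\deg_{\mathcal{T}}(p) = d$, and (iii) for every variable path $p \in \mathcal{T} \cap \hat{\mathcal{V}}$, $\deg_{\mathcal{T}}(p) = \deg_{T_r^{2h}(G)}(p)$.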


Note that the leaves of a d-tree are variable paths because a d-tree is rooted in a variable node and has an even height. Let $\mathcal{T}[r, 2h, d](G)$ denote the set of all d-trees rooted at r that are subtrees of $T_r^{2h}(G)$.

In the following definition we use "level" weights $w = (w_1, \dots, w_h)$ that are assigned to variable paths in a subtree of a path-prefix tree of height 2h.

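Definition 5.3 (w-weighted subtree) Let $\mathcal{T} = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$ denote a subtree of $T_r^{2h}(G)$, and let $w = (w_1, \dots, w_h) \in \mathbb{R}^h_+$ denote a non-negative weight vector. Let $w_{\mathcal{T}} : \hat{\mathcal{V}} \to \mathbb{R}$ denote a weight function based on the weight vector w for variable paths $p \in \hat{\mathcal{V}}$, defined as follows. If p is an empty variable path, then $w_{\mathcal{T}}(p) = 0$. Otherwise,

$$w_{\mathcal{T}}(p) \triangleq \frac{w_\ell}{\|w\|_1} \cdot \frac{1}{\deg_G(t(p))} \cdot \prod_{q \in \mathrm{Prefix}^+(p)} \frac{1}{\deg_{\mathcal{T}}(q) - 1}, \tag{5.1}$$

where $\ell = \lceil |p|/2 \rceil$. We refer to $w_{\mathcal{T}}$ as a w-weighted subtree.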

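For any w-weighted subtree $w_{\mathcal{T}}$ of $T_r^{2h}(G)$, let $\pi_{G,\mathcal{T},w} : \mathcal{V} \to \mathbb{R}$ denote a function whose values correspond to the projection of $w_{\mathcal{T}}$ onto the Tanner graph G. That is, for every variable node v in G,

$$\pi_{G,\mathcal{T},w}(v) \triangleq \sum_{p \in \mathcal{T} : t(p) = v} w_{\mathcal{T}}(p). \tag{5.2}$$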
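For small examples, (5.1) and (5.2) can be evaluated literally. The sketch below assumes a subtree 𝒯 given as a prefix-closed set of vertex tuples rooted at (r,) and a dictionary of degrees in G (both hypothetical input formats). It recomputes $\deg_{\mathcal{T}}(q)$ by scanning the whole tree, which is quadratic and meant only for illustration.

```python
from collections import defaultdict
from math import ceil

def tree_degree(tree, q):
    """deg_T(q): children of q inside the subtree, plus one for the parent edge unless q is the root."""
    children = sum(1 for p in tree if len(p) == len(q) + 1 and p[:-1] == q)
    return children + (1 if len(q) > 1 else 0)

def path_weight(tree, p, w, deg_G):
    """w_T(p) of (5.1) for a variable path p (a tuple of vertices); w = (w_1, ..., w_h)."""
    length = len(p) - 1                       # |p|, the number of edges
    if length == 0:
        return 0.0                            # the empty path (the root) has weight 0
    ell = ceil(length / 2)
    value = w[ell - 1] / sum(w) / deg_G[p[-1]]
    for m in range(2, len(p)):                # proper prefixes q = p[:m] with 1 <= |q| < |p|
        value /= tree_degree(tree, p[:m]) - 1
    return value

def projection(tree, w, deg_G, variable_nodes):
    """pi_{G,T,w}(v) of (5.2): sum of w_T(p) over variable paths p in T that end at v."""
    pi = defaultdict(float)
    for p in tree:
        if p[-1] in variable_nodes:
            pi[p[-1]] += path_weight(tree, p, w, deg_G)
    return dict(pi)

# Toy example (hypothetical): variable nodes v0, v1, v2 of degree 1, all attached to local-code node c0.
deg_G = {"v0": 1, "v1": 1, "v2": 1, "c0": 3}
tree3 = {("v0",), ("v0", "c0"), ("v0", "c0", "v1"), ("v0", "c0", "v2")}   # a 3-tree with h = 1
tree2 = {("v0",), ("v0", "c0"), ("v0", "c0", "v1")}                      # a 2-tree with h = 1
print(projection(tree3, (1.0,), deg_G, {"v0", "v1", "v2"}))
print(projection(tree2, (1.0,), deg_G, {"v0", "v1", "v2"}))
```

In the 3-tree, each leaf receives weight $\frac{1}{1} \cdot \frac{1}{1} \cdot \frac{1}{3-1} = \frac{1}{2}$, so the projection takes values in [0, 1] and sums to 1.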

We remark that: (i) If no variable path in $\mathcal{T}$ ends in v, then $\pi_{G,\mathcal{T},w}(v) = 0$. (ii) If $h < \mathrm{girth}(G)/4$, then every node v is an endpoint of at most one variable path in $T_r^{2h}(G)$, and the projection is trivial. However, we deal with arbitrary heights h, and the projection deals with many variable paths that end at the same variable node v. (iii) $\pi_{G,\mathcal{T},w}(v) \in [0,1]$ for every weight vector $w \in \mathbb{R}^h_+ \setminus \{0^h\}$.

For a Tanner code C(G), let $\mathcal{B}^{(w)}_d \subseteq [0,1]^N$ denote the set of all projections of w-weighted d-trees to G. That is,

$$\mathcal{B}^{(w)}_d \triangleq \left\{ \pi_{G,\mathcal{T},w} \;\middle|\; \mathcal{T} \in \bigcup_{r \in \mathcal{V}} \mathcal{T}[r, 2h, d](G) \right\}.$$

Vectors in $\mathcal{B}^{(w)}_d$ are referred to as deviations.

For two vectors $x \in \{0,1\}^N$ and $f \in [0,1]^N$, let $x \oplus f \in [0,1]^N$ denote the relative point defined by $(x \oplus f)_i \triangleq |x_i - f_i|$ [Fel03]. The following definition is an extension of local-optimality [ADS09, Von10] to Tanner codes on memoryless channels.

Definition 5.4 (local-optimality) Let $C(G) \subset \{0,1\}^N$ denote a Tanner code with minimum local distance $d^*$. Let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ denote a non-negative weight vector of length h and let $2 \le d \le d^*$. A codeword $x \in C(G)$ is (h, w, d)-locally optimal with respect to $\lambda \in \mathbb{R}^N$ if for all vectors $\beta \in \mathcal{B}^{(w)}_d$,

$$\langle \lambda, x \oplus \beta \rangle > \langle \lambda, x \rangle. \tag{5.3}$$
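When a finite collection of deviations is available explicitly (for instance, from enumerating d-trees in a toy example), condition (5.3) can be checked by brute force. A sketch with hypothetical helper names; for realistic codes $\mathcal{B}^{(w)}_d$ is far too large to enumerate, which is why Section 5.4 relies on dynamic programming.

```python
def relative_point(x, f):
    """(x XOR f)_i = |x_i - f_i| for x in {0,1}^N and f in [0,1]^N."""
    return [abs(xi - fi) for xi, fi in zip(x, f)]

def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def is_locally_optimal(x, llr, deviations):
    """Check (5.3), <llr, x XOR beta> > <llr, x>, against an explicitly given list of deviations."""
    base = inner(llr, x)
    return all(inner(llr, relative_point(x, beta)) > base for beta in deviations)

print(is_locally_optimal([0, 0, 0], [1.0, 1.0, 1.0], [[0.0, 0.5, 0.5]]))
print(is_locally_optimal([0, 0, 0], [1.0, -2.0, 1.0], [[0.0, 0.5, 0.5]]))
```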


Based on random walks on the Tanner graph, Vontobel showed that (h, w, 2)-local optimality is sufficient both for ML-optimality and LP-optimality. The random walks are defined in terms derived from the generalized fundamental polytope. We extend the results of Vontobel [Von10] to "thicker" subtrees by using probabilistic combinatorial arguments on graphs and the properties of graph cover decoding [VK05]. Specifically, we prove that (h, w, d)-local optimality, for any $2 \le d \le d^*$, implies both ML- and LP-optimality (Theorems 5.5 and 5.10). Given the decomposition of Lemma 5.11 proved in Section 5.5, the following theorem is obtained by a modification of the proof of [ADS09, Theorem 2] or [HE11a, Theorem 6] (see Theorem 4.5).

Theorem 5.5 (local-optimality is sufficient for ML) Let C(G) denote a Tanner code with minimum local distance $d^*$. Let h be some positive integer and $w = (w_1, \dots, w_h) \in \mathbb{R}^h_+$ denote a non-negative weight vector. Let $\lambda \in \mathbb{R}^N$ denote the LLR vector received from the channel. If x is an (h, w, d)-locally optimal codeword w.r.t. λ and some $2 \le d \le d^*$, then x is also the unique maximum-likelihood codeword w.r.t. λ.

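Proof: We use the decomposition proved in Section 5.5 to show that for every codeword $x' \neq x$, $\langle \lambda, x' \rangle > \langle \lambda, x \rangle$. Let $z \triangleq x \oplus x'$. By linearity, $z \in C(G)$. Moreover, $z \neq 0^N$ because $x \neq x'$. Because $d^* \ge 2$, it follows that $\|z\|_1 \ge 2$. By Lemma 5.11 there exists a distribution over the set $\mathcal{B}^{(w)}_d$ such that $\mathbb{E}_{\beta \in \mathcal{B}^{(w)}_d}\, \beta = \frac{z}{\|z\|_1}$. Let $\alpha \triangleq \frac{1}{\|z\|_1} < 1$. Let $f : [0,1]^N \to \mathbb{R}$ be the affine linear function defined by $f(\beta) \triangleq \langle \lambda, x \oplus \beta \rangle = \langle \lambda, x \rangle + \sum_{i=1}^N (-1)^{x_i} \lambda_i \beta_i$. Then,

$$\begin{aligned}
\langle \lambda, x \rangle &< \mathbb{E}_{\beta \in \mathcal{B}^{(w)}_d} \langle \lambda, x \oplus \beta \rangle && \text{(by local-optimality of } x\text{)}\\
&= \langle \lambda, x \oplus \mathbb{E}_{\beta \in \mathcal{B}^{(w)}_d}\, \beta \rangle && \text{(by linearity of } f \text{ and linearity of expectation)}\\
&= \langle \lambda, x \oplus \alpha z \rangle && \text{(by Lemma 5.11)}\\
&= (1-\alpha)\langle \lambda, x \rangle + \alpha \langle \lambda, x' \rangle && \text{(since } f \text{ is affine and } x \oplus z = x'\text{)}.
\end{aligned}$$

Hence $\alpha \langle \lambda, x \rangle < \alpha \langle \lambda, x' \rangle$, i.e., $\langle \lambda, x' \rangle > \langle \lambda, x \rangle$, and x is the unique ML codeword. □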

5.3 Local Optimality Implies LP-Optimality

In order to prove a sufficient condition for LP optimality, we consider graph cover decoding, introduced by Vontobel and Koetter [VK05] (see Chapter 2.6). Let $\tilde{G}$ denote an M-cover of G. Let $\tilde{x} = x^{\uparrow M} \in C(\tilde{G})$ and $\tilde{\lambda} = \lambda^{\uparrow M} \in \mathbb{R}^{N \cdot M}$ denote the M-lifts of x and λ, respectively.

In this section we consider the following setting. Let C(G) denote a Tanner code with minimum local distance $d^*$. Let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ for some positive integer h, and let $2 \le d \le d^*$.

Proposition 5.6 (local-optimality of all-zero codeword is preserved by M-lifts) $0^N$ is an (h, w, d)-locally optimal codeword w.r.t. $\lambda \in \mathbb{R}^N$ if and only if $0^{N \cdot M}$ is an (h, w, d)-locally optimal codeword w.r.t. $\tilde{\lambda}$.


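Proof: Consider the surjection φ of d-trees in the path-prefix tree of $\tilde{G}$ to d-trees in the path-prefix tree of G. This surjection is based on the covering map between $\tilde{G}$ and G. Given a deviation $\tilde{\beta} \triangleq \pi_{\tilde{G},\mathcal{T},w}$, let $\beta \triangleq \pi_{G,\varphi(\mathcal{T}),w}$. The proposition follows because $\langle \tilde{\lambda}, \tilde{\beta} \rangle = \langle \lambda, \beta \rangle$. □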

For two vectors $y, z \in \mathbb{R}^N$, let "∗" denote coordinate-wise multiplication, i.e., $y * z \triangleq (y_1 \cdot z_1, \dots, y_N \cdot z_N)$. For a word $x \in \{0,1\}^N$, let $b \in \{\pm 1\}^N$ denote the vector defined by $b_i \triangleq (-1)^{x_i}$.

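Proposition 5.7 For every $\lambda \in \mathbb{R}^N$ and every $\beta \in [0,1]^N$,

$$\langle b * \lambda, \beta \rangle = \langle \lambda, x \oplus \beta \rangle - \langle \lambda, x \rangle. \tag{5.4}$$

Proof: For $\beta \in [0,1]^N$, it holds that $\langle \lambda, x \oplus \beta \rangle = \langle \lambda, x \rangle + \sum_{i=1}^{N} (-1)^{x_i} \lambda_i \beta_i$. Hence,

$$\langle \lambda, x \oplus \beta \rangle - \langle \lambda, x \rangle = \sum_{i=1}^{N} (-1)^{x_i} \lambda_i \beta_i = \langle b * \lambda, \beta \rangle.$$

□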
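Equation (5.4) can be checked numerically. The minimal Python sketch below assumes the coordinatewise definition $(x \oplus \beta)_i = |x_i - \beta_i|$ for binary $x$ and fractional $\beta$, which reproduces the expansion $\langle \lambda, x \oplus \beta \rangle = \langle \lambda, x \rangle + \sum_i (-1)^{x_i} \lambda_i \beta_i$ used in the proof; the function names are illustrative.

```python
import random

def frac_xor(x, beta):
    # Assumed definition: (x ⊕ β)_i = β_i if x_i = 0, and 1 - β_i if x_i = 1.
    return [abs(xi - bi) for xi, bi in zip(x, beta)]

def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def prop_5_7_sides(x, lam, beta):
    """Return (<b*λ, β>, <λ, x⊕β> - <λ, x>), the two sides of Eq. (5.4)."""
    b = [(-1) ** xi for xi in x]                  # b_i = (-1)^{x_i}
    b_lam = [bi * li for bi, li in zip(b, lam)]   # b * λ, coordinatewise
    lhs = inner(b_lam, beta)
    rhs = inner(lam, frac_xor(x, beta)) - inner(lam, x)
    return lhs, rhs

random.seed(0)
N = 8
x = [random.randint(0, 1) for _ in range(N)]
lam = [random.gauss(0.0, 1.0) for _ in range(N)]
beta = [random.random() for _ in range(N)]
lhs, rhs = prop_5_7_sides(x, lam, beta)
print(abs(lhs - rhs) < 1e-12)  # True
```

The identity is exactly what lets local optimality of $x$ w.r.t. $\lambda$ be restated as local optimality of $0^N$ w.r.t. $b * \lambda$.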

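The following proposition states that the mapping $(x, \lambda) \mapsto (0^N, b * \lambda)$ preserves local optimality.

Proposition 5.8 (symmetry of local-optimality) For every $x \in C$, $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$ if and only if $0^N$ is $(h, w, d)$-locally optimal w.r.t. $b * \lambda$.

Proof: By Prop. 5.7, $\langle \lambda, x \oplus \beta \rangle - \langle \lambda, x \rangle = \langle b * \lambda, \beta \rangle$ for every deviation $\beta$. Hence $\langle \lambda, x \oplus \beta \rangle > \langle \lambda, x \rangle$ if and only if $\langle b * \lambda, 0^N \oplus \beta \rangle > \langle b * \lambda, 0^N \rangle$. □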

The following lemma states that local-optimality is preserved by lifting to an M-cover.

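Lemma 5.9 $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$ if and only if $\tilde{x}$ is $(h, w, d)$-locally optimal w.r.t. $\tilde{\lambda}$.

Proof: Assume that $\tilde{x}$ is an $(h, w, d)$-locally optimal codeword w.r.t. $\tilde{\lambda}$. By Prop. 5.8, $0^{N \cdot M}$ is $(h, w, d)$-locally optimal w.r.t. $(-1)^{\tilde{x}} * \tilde{\lambda}$. Because $(-1)^{\tilde{x}} * \tilde{\lambda} = (b * \lambda)^{\uparrow M}$, Prop. 5.6 implies that $0^N$ is $(h, w, d)$-locally optimal w.r.t. $b * \lambda$. By Prop. 5.8, $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$. Each of these implications is necessary and sufficient, and the lemma follows. □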
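The proof of Lemma 5.9 moves from the cover to the base graph via the fact that flipping signs by a lifted codeword commutes with lifting, i.e., $(-1)^{\tilde{x}} * \tilde{\lambda} = (b * \lambda)^{\uparrow M}$. A tiny Python sketch of this identity, assuming a hypothetical indexing in which the $M$ copies of variable node $i$ occupy consecutive positions $iM, \ldots, iM + M - 1$ (the function `lift` is illustrative, not notation from the thesis):

```python
def lift(vec, M):
    # Hypothetical indexing: the M copies of coordinate i sit at i*M, ..., i*M + M - 1.
    return [vi for vi in vec for _ in range(M)]

def flip(x, lam):
    # Coordinatewise (-1)^{x_i} * λ_i.
    return [(-1) ** xi * li for xi, li in zip(x, lam)]

x = [1, 0, 1]
lam = [0.5, -1.2, 2.0]
M = 3
x_lift, lam_lift = lift(x, M), lift(lam, M)
print(flip(x_lift, lam_lift) == lift(flip(x, lam), M))  # True
```

Because every copy of node $i$ carries the same bit and the same LLR, sign-flipping before or after lifting gives the same vector.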

The following theorem is obtained as a corollary of Theorem 5.5 and Lemma 5.9. The proof is based on arguments utilizing properties of graph cover decoding. Those arguments are used for a reduction from ML-optimality to LP-optimality similar to the reduction presented in the proof of [HE11a, Theorem 8] (see Theorem 4.8).

Theorem 5.10 (local optimality is sufficient for LP optimality) If $x$ is an $(h, w, d)$-locally optimal codeword w.r.t. $\lambda$, then $x$ is also the unique optimal LP solution given $\lambda$.


5.4 Verifying Local Optimality

In this section we address the problem of how to verify whether a codeword $x$ is $(h, w, d)$-locally optimal with respect to $\lambda$. By Prop. 5.8, this is equivalent to verifying whether $0^N$ is $(h, w, d)$-locally optimal with respect to $b * \lambda$, where $b_i \triangleq (-1)^{x_i}$.

The verification algorithm is listed as Algorithm 5.1. It applies dynamic programming to find, for every variable node $v$, a $d$-tree $\mathcal{T}_v$ rooted at $v$ that minimizes the cost $\langle b * \lambda, \pi_{G, \mathcal{T}_v, w} \rangle$. The algorithm returns false if and only if it finds a deviation with non-positive cost.

The algorithm is presented as a message-passing algorithm. In every step, a node propagates to its parent the minimum cost of the $d$-subtree that hangs from it, based on the minimum values received from its children. The time and message complexity of Algorithm 5.1 is $O(|E| \cdot h)$, where $E$ denotes the edge set of the Tanner graph.

The following notation is used in Line 8 of the algorithm. For a set $S$ of real values, let $\min^{[i]} S$ denote the $i$th smallest member in $S$.

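Algorithm 5.1 verify-lo$(x, \lambda, h, w, d)$ - An iterative verification algorithm. Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote a Tanner graph. Given an LLR vector $\lambda \in \mathbb{R}^{|\mathcal{V}|}$, a codeword $x \in C(G)$, level weights $w \in \mathbb{R}^h_+$, and a degree $d \in \mathbb{N}_+$, outputs "true" if $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$; otherwise, outputs "false".

1: Initialize: $\forall v \in \mathcal{V}$: $\lambda'_v \leftarrow \lambda_v \cdot (-1)^{x_v}$
2: $\forall C \in \mathcal{J}, \forall v \in \mathcal{N}(C)$: $\mu^{(-1)}_{C \to v} \leftarrow 0$
3: for $l = 0$ to $h - 1$ do
4:   for all $v \in \mathcal{V}$, $C \in \mathcal{N}(v)$ do
5:     $\mu^{(l)}_{v \to C} \leftarrow \frac{w_{h-l}}{\deg_G(v)} \lambda'_v + \frac{1}{\deg_G(v) - 1} \sum_{C' \in \mathcal{N}(v) \setminus \{C\}} \mu^{(l-1)}_{C' \to v}$
6:   end for
7:   for all $C \in \mathcal{J}$, $v \in \mathcal{N}(C)$ do
8:     $\mu^{(l)}_{C \to v} \leftarrow \frac{1}{d-1} \cdot \sum_{i=1}^{d-1} \min^{[i]} \big\{ \mu^{(l)}_{v' \to C} : v' \in \mathcal{N}(C) \setminus \{v\} \big\}$
9:   end for
10: end for
11: for all $v \in \mathcal{V}$ do
12:   $\mu_v \leftarrow \sum_{C \in \mathcal{N}(v)} \mu^{(h-1)}_{C \to v}$
13:   if $\mu_v \le 0$ then ▷ min-cost $w$-weighted $d$-tree rooted at $v$ has non-positive value
14:     return false;
15:   end if
16: end for
17: return true;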
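A minimal Python sketch of Algorithm 5.1 follows. It assumes the Tanner graph is given as a list of check nodes, each listing its variable-node neighbours, and that every check node has at least $d$ neighbours so that Line 8 has $d-1$ values to average. The name `verify_lo`, the list-based encoding, and the toy code built on the six edges of the complete graph $K_4$ (four degree-3 single parity checks, codewords are even subgraphs) are illustrative choices, not part of the thesis. A variable node of degree 1 is given no extrinsic term, an assumption made to avoid dividing by $\deg_G(v) - 1 = 0$.

```python
def verify_lo(checks, x, llr, h, w, d):
    """Sketch of verify-lo. checks[c] lists N(C); w[0..h-1] stores (w_1, ..., w_h)."""
    n = len(llr)
    var_nbrs = [[] for _ in range(n)]                          # N(v) for each variable
    for c, vs in enumerate(checks):
        for v in vs:
            var_nbrs[v].append(c)
    lam = [llr[v] * (-1) ** x[v] for v in range(n)]            # Line 1: λ' = b * λ
    c2v = {(c, v): 0.0 for c, vs in enumerate(checks) for v in vs}  # Line 2
    for l in range(h):                                         # Line 3
        v2c = {}
        for v in range(n):                                     # Lines 4-6
            deg = len(var_nbrs[v])
            for c in var_nbrs[v]:
                ext = sum(c2v[(cp, v)] for cp in var_nbrs[v] if cp != c)
                extrinsic = ext / (deg - 1) if deg > 1 else 0.0
                v2c[(v, c)] = w[h - l - 1] / deg * lam[v] + extrinsic
        new_c2v = {}
        for c, vs in enumerate(checks):                        # Lines 7-9
            for v in vs:
                # Average of the d-1 smallest incoming values (min^[1], ..., min^[d-1]).
                smallest = sorted(v2c[(u, c)] for u in vs if u != v)[: d - 1]
                new_c2v[(c, v)] = sum(smallest) / (d - 1)
        c2v = new_c2v
    for v in range(n):                                         # Lines 11-16
        mu_v = sum(c2v[(c, v)] for c in var_nbrs[v])
        if mu_v <= 0:  # some w-weighted d-tree rooted at v has non-positive cost
            return False
    return True                                                # Line 17

# Toy example (hypothetical): variables are the edges of K4, checks are its vertices.
K4_CHECKS = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
TRIANGLE = [1, 1, 0, 1, 0, 0]  # edges (0,1), (0,2), (1,2): an even subgraph

print(verify_lo(K4_CHECKS, [0] * 6, [1.0] * 6, h=3, w=[1.0] * 3, d=2))   # True
print(verify_lo(K4_CHECKS, [0] * 6, [-1.0] * 6, h=3, w=[1.0] * 3, d=2))  # False
```

With all LLRs positive every deviation around $0^N$ has positive cost, so the check succeeds; with all LLRs negative every $d$-tree has negative cost and the algorithm returns false at the first variable node.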

5.5 Constructing Codewords from Weighted Trees Projections

In this section we prove Lemma 5.11, the key structural lemma in the proof of Theorem 5.5. This lemma states that every codeword of a Tanner code is a finite sum of projections of weighted trees in the computation trees of $G$.


Throughout this section, let $C(G)$ denote a Tanner code with minimum local distance $d^*$, let $x$ denote a nonzero codeword, let $h$ denote some positive integer, and let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ denote level weights.

Lemma 5.11 Consider a codeword $x \neq 0^N$. Then, for every $2 \le d \le d^*$, there exists a distribution $\rho$ over $d$-trees of $G$ of height $2h$ such that for every weight vector $w \in \mathbb{R}^h_+ \setminus \{0^h\}$, it holds that

$$x = \|x\|_1 \cdot \mathbb{E}_{\rho}\big[\pi_{G, \mathcal{T}, w}\big].$$

The proof of Lemma 5.11 is based on Lemmas 5.12-5.13 and Corollary 5.14. Lemma 5.12 states that every codeword $x \in C(G)$ can be decomposed into a set of weighted path-prefix trees. The number of trees in the decomposition equals $\|x\|_1$. Lemma 5.13 states that every weighted path-prefix tree is a convex combination of weighted $d$-trees. This lemma implies that the projection of a weighted path-prefix tree equals the expectation of projections of weighted $d$-trees.

For a codeword $x \in C(G) \subset \{0,1\}^N$, let $\mathcal{V}_x \triangleq \{v \in \mathcal{V} \mid x_v = 1\}$. Let $G_x$ denote the subgraph of the Tanner graph $G$ induced by $\mathcal{V}_x \cup \mathcal{N}(\mathcal{V}_x)$.

Lemma 5.12 For every codeword $x \neq 0^N$, for every weight vector $w \in \mathbb{R}^h_+$, and for every variable node $v \in \mathcal{V}$,

$$x_v = \sum_{r \in \mathcal{V}_x} \pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v).$$

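Proof: If $x_v = 0$, then $\pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v) = 0$. It remains to show that equality holds for variable nodes $v \in \mathcal{V}_x$.

Consider an all-one weight vector $\eta = 1^h$. Construct a path-suffix tree rooted at $v$. Level $\ell$ of this path-suffix tree consists of all backtrackless paths in $G_x$ of length $\ell$ that end at node $v$ (see Figure 5.1). We denote this level by $\mathcal{P}_\ell(v)$.

We claim that for every $v \in \mathcal{V}_x$ and $1 \le \ell \le 2h$,

$$\sum_{p \in \mathcal{P}_\ell(v)} \eta_{\mathcal{T}^{2h}_{s(p)}}(p) = \frac{1}{h}. \tag{5.5}$$

The proof is by induction on $\ell$. The induction basis, for $\ell = 1$, holds because $|\mathcal{P}_1(v)| = \deg_G(v)$ and $\eta_{\mathcal{T}^{2h}_{s(p)}}(p) = \frac{1}{h \cdot \deg_G(v)}$ for every $p \in \mathcal{P}_1(v)$. The induction step is proven as follows. For each $p \in \mathcal{P}_\ell(v)$, let $\mathrm{aug}(p) \triangleq \{q \in \mathcal{P}_{\ell+1}(v) \mid p \text{ is a suffix of } q\}$. Note that $|\mathrm{aug}(p)| = \deg_{G_x}(s(p)) - 1$. Moreover, for each $q \in \mathrm{aug}(p)$,

$$\frac{\eta_{\mathcal{T}^{2h}_{s(q)}}(q)}{\eta_{\mathcal{T}^{2h}_{s(p)}}(p)} = \frac{1}{\deg_{G_x}(s(p)) - 1}. \tag{5.6}$$

Hence, $\sum_{q \in \mathrm{aug}(p)} \eta_{\mathcal{T}^{2h}_{s(q)}}(q) = \eta_{\mathcal{T}^{2h}_{s(p)}}(p)$.

Finally, $\mathcal{P}_{\ell+1}(v)$ is the disjoint union of $\{\mathrm{aug}(p) \mid p \in \mathcal{P}_\ell(v)\}$. It follows that

$$\sum_{q \in \mathcal{P}_{\ell+1}(v)} \eta_{\mathcal{T}^{2h}_{s(q)}}(q) = \sum_{p \in \mathcal{P}_\ell(v)} \eta_{\mathcal{T}^{2h}_{s(p)}}(p). \tag{5.7}$$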


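[Figure 5.1 graphic: path-suffix tree rooted at $v = t(p)$, showing a path $p$ starting at $s(p)$ at level $\ell$ and the levels $\mathcal{P}_{\ell-1}(v)$ and $\mathcal{P}_\ell(v)$.]

Figure 5.1: Set of all backtrackless paths $\mathcal{P}_\ell(v)$ as augmentation of the set $\mathcal{P}_{\ell-1}(v)$ as viewed by the path-suffix tree of height $\ell$ rooted at $v$, in proof of Lemma 5.11.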

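By the induction hypothesis we conclude that $\sum_{q \in \mathcal{P}_{\ell+1}(v)} \eta_{\mathcal{T}^{2h}_{s(q)}}(q) = 1/h$, as required.

Note that the sum of weights induced by $\eta$ on each level is $1/h$, both for levels of paths beginning in variable nodes and in local-code nodes. In the rest of the proof we focus only on even levels, which start at variable nodes. We now claim that

$$\sum_{p \in \mathcal{P}_{2\ell}(v)} w_{\mathcal{T}^{2h}_{s(p)}}(p) = \frac{w_\ell}{\|w\|_1}. \tag{5.8}$$

Indeed, by Definition 5.3 it holds that $w_{\mathcal{T}^{2h}_{s(p)}}(p) = \eta_{\mathcal{T}^{2h}_{s(p)}}(p) \cdot \frac{w_\ell}{\|w\|_1} \cdot h$ for every $p \in \mathcal{P}_{2\ell}(v)$. Therefore, Equation (5.8) follows from Equation (5.5). The lemma follows because for every $v \in \mathcal{V}_x$,

$$\sum_{r \in \mathcal{V}_x} \pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v) = \sum_{\ell=1}^{h} \sum_{p \in \mathcal{P}_{2\ell}(v)} w_{\mathcal{T}^{2h}_{s(p)}}(p) = \sum_{\ell=1}^{h} \frac{w_\ell}{\|w\|_1} = 1.$$

□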

Lemma 5.13 Consider a subgraph $G_x$ of a Tanner graph $G$, where $x \in C(G) \setminus \{0^N\}$. Then, for every variable node $r \in G_x$, every positive integer $h$, every $2 \le d \le d^*$, and every weight vector $w \in \mathbb{R}^h_+$, it holds that

$$w_{\mathcal{T}^{2h}_r(G_x)} = \mathbb{E}_{\rho_r}\big[w_{\mathcal{T}}\big],$$

where $\rho_r$ is the uniform distribution over $\mathcal{T}[r, 2h, d](G_x)$.

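Proof: Let $G_x = (\mathcal{V}_x \cup \mathcal{J}_x, E_x)$ and let $w_{\mathcal{T}^{2h}_r(G_x)}$ denote a $w$-weighted path-prefix tree rooted at node $r$ with height $2h$. We claim that if a $d$-tree $\mathcal{T} \in \mathcal{T}[r, 2h, d](G_x)$ is chosen uniformly at random, then the expectation of the $w$-weighted $d$-tree $w_{\mathcal{T}}$ equals $w_{\mathcal{T}^{2h}_r(G_x)}$.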


A random $d$-tree in $\mathcal{T}[r, 2h, d](G_x)$ is obtained as follows. Start from the root $r$. For each variable path, take all its augmentations, and for each local-code path choose $d - 1$ distinct augmentations uniformly at random. Let $\mathcal{T} \in \mathcal{T}[r, 2h, d](G_x)$ denote such a random $d$-tree, and consider a variable path $p \in \mathcal{T}^{2h}_r(G_x)$. Then,

$$\Pr_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}(p \in \mathcal{T}) = \prod_{q \in \mathrm{Prefix}^+(p) : t(q) \in \mathcal{J}_x} \frac{d-1}{\deg_{G_x}(t(q)) - 1}. \tag{5.9}$$

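Note the following two observations: (i) if $p \notin \mathcal{T}$, then $w_{\mathcal{T}}(p) = 0$, and (ii) if $p \in \mathcal{T}$, then the value of $w_{\mathcal{T}}(p)$ is constant and does not depend on $\mathcal{T}$. From these two observations we have

$$\mathbb{E}_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}\big[w_{\mathcal{T}}(p)\big] = w_{\mathcal{T}}(p) \cdot \Pr_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}(p \in \mathcal{T}). \tag{5.10}$$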

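By Definition 5.3, for $p \in \mathcal{T}$ we have

$$w_{\mathcal{T}}(p) = \frac{w_{|p|/2}}{\|w\|_1} \cdot \frac{1}{\deg_{G_x}(t(p))} \cdot \frac{1}{(d-1)^{|p|/2}} \cdot \prod_{q \in \mathrm{Prefix}^+(p) : t(q) \in \mathcal{V}_x} \frac{1}{\deg_{G_x}(t(q)) - 1}. \tag{5.11}$$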

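By substituting Equations (5.9) and (5.11) in Equation (5.10), we conclude that

$$\mathbb{E}_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}\big[w_{\mathcal{T}}(p)\big] = \frac{w_{|p|/2}}{\|w\|_1} \cdot \frac{1}{\deg_{G_x}(t(p))} \cdot \prod_{q \in \mathrm{Prefix}^+(p)} \frac{1}{\deg_{G_x}(t(q)) - 1} = w_{\mathcal{T}^{2h}_r(G_x)}(p).$$

□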
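The factor $\frac{d-1}{\deg_{G_x}(t(q)) - 1}$ in Equation (5.9) is the probability that a fixed augmentation survives when $d-1$ of the $k = \deg_{G_x}(t(q)) - 1$ augmentations of a local-code path are chosen uniformly without replacement; the choices at different local-code paths are independent, so the factors multiply along the path. A small Python sketch confirming this by exhaustive enumeration with exact fractions (the helper names are hypothetical):

```python
from fractions import Fraction
from itertools import combinations

def prob_child_kept(k, d):
    """Exact probability that child 0 is among d-1 children drawn uniformly
    without replacement from k children (k = deg_{G_x}(t(q)) - 1)."""
    subsets = list(combinations(range(k), d - 1))
    hits = sum(1 for s in subsets if 0 in s)
    return Fraction(hits, len(subsets))

def path_probability(check_degrees, d):
    """Eq. (5.9): product over the local-code prefixes q of p of
    (d-1) / (deg(t(q)) - 1); check_degrees lists deg_{G_x}(t(q))."""
    prob = Fraction(1)
    for deg in check_degrees:
        prob *= prob_child_kept(deg - 1, d)
    return prob

for k in range(1, 7):
    for d in range(2, k + 2):  # need d - 1 <= k augmentations to choose from
        assert prob_child_kept(k, d) == Fraction(d - 1, k)
print(path_probability([4, 3], 2), path_probability([4, 3], 3))  # 1/6 2/3
```

For $d = 2$ exactly one augmentation is kept per local-code path, so a path through local codes of degrees 4 and 3 survives with probability $\frac{1}{3} \cdot \frac{1}{2}$.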

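Corollary 5.14 For every positive integer $h$, every $2 \le d \le d^*$, and every weight vector $w \in \mathbb{R}^h_+$, it holds that $\pi_{G, \mathcal{T}^{2h}_r(G_x), w} = \mathbb{E}_{\rho_r}\big[\pi_{G, \mathcal{T}, w}\big]$.

Proof: By definition of $\pi_{G, \mathcal{T}^{2h}_r(G_x), w}$, we have

$$\pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v) = \sum_{p \in \mathcal{T}^{2h}_r(G_x) : t(p) = v} w_{\mathcal{T}^{2h}_r(G_x)}(p). \tag{5.12}$$

By Lemma 5.13 and linearity of expectation we have

$$\sum_{p \in \mathcal{T}^{2h}_r(G_x) : t(p) = v} w_{\mathcal{T}^{2h}_r(G_x)}(p) = \sum_{p \in \mathcal{T}^{2h}_r(G_x) : t(p) = v} \mathbb{E}_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}\big[w_{\mathcal{T}}(p)\big] = \mathbb{E}_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}\Big[\sum_{p \in \mathcal{T}^{2h}_r(G_x) : t(p) = v} w_{\mathcal{T}}(p)\Big]. \tag{5.13}$$

Now, for variable paths $p$ that are not in a $d$-tree $\mathcal{T}$, $w_{\mathcal{T}}(p) = 0$. Hence, if a $d$-tree $\mathcal{T}$ is a subtree of $\mathcal{T}^{2h}_r(G_x)$, then

$$\sum_{p \in \mathcal{T}^{2h}_r(G_x) : t(p) = v} w_{\mathcal{T}}(p) = \sum_{p \in \mathcal{T} : t(p) = v} w_{\mathcal{T}}(p) \triangleq \pi_{G, \mathcal{T}, w}(v). \tag{5.14}$$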


From Equations (5.12)-(5.14) we conclude that for every $v \in \mathcal{V}$,

$$\pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v) = \mathbb{E}_{\mathcal{T} \in \mathcal{T}[r,2h,d](G_x)}\big[\pi_{G, \mathcal{T}, w}(v)\big].$$

□

Before proving Lemma 5.11, we state a proposition from probability theory.

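Proposition 5.15 Let $\{\rho_r\}_{r=1}^{K}$ denote $K$ probability distributions. Let $\rho \triangleq \frac{1}{K} \sum_{r=1}^{K} \rho_r$. Then,

$$\sum_{r=1}^{K} \mathbb{E}_{\rho_r}[x] = K \cdot \mathbb{E}_{\rho}[x].$$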
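In the proof of Lemma 5.11 the $\rho_r$ are the uniform distributions over $d$-trees rooted at the nodes $r \in \mathcal{V}_x$, and $K = \|x\|_1$. Proposition 5.15 is linearity of expectation over a uniform mixture; a minimal exact check with Python's `fractions` module on three made-up discrete distributions (the helper names are illustrative):

```python
from fractions import Fraction

def expectation(dist):
    """dist maps an outcome (a number) to its probability."""
    return sum(p * x for x, p in dist.items())

def uniform_mixture(dists):
    """rho = (1/K) * sum_r rho_r, represented over the union of the supports."""
    K = len(dists)
    mix = {}
    for dist in dists:
        for x, p in dist.items():
            mix[x] = mix.get(x, Fraction(0)) + p / K
    return mix

rhos = [
    {0: Fraction(1, 2), 3: Fraction(1, 2)},
    {1: Fraction(1, 4), 2: Fraction(3, 4)},
    {3: Fraction(1)},
]
rho = uniform_mixture(rhos)
lhs = sum(expectation(dist) for dist in rhos)
rhs = len(rhos) * expectation(rho)
print(lhs, rhs)  # 25/4 25/4
```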

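Proof of Lemma 5.11. By Lemma 5.12 and Corollary 5.14 we have for every $v \in \mathcal{V}$

$$x_v = \sum_{r \in \mathcal{V}_x} \pi_{G, \mathcal{T}^{2h}_r(G_x), w}(v) = \sum_{r \in \mathcal{V}_x} \mathbb{E}_{\rho_r}\big[\pi_{G, \mathcal{T}, w}(v)\big]. \tag{5.15}$$

Let $\rho$ denote the distribution defined by $\rho \triangleq \frac{1}{\|x\|_1} \cdot \sum_{r \in \mathcal{V}_x} \rho_r$. By Proposition 5.15 (with $K = |\mathcal{V}_x| = \|x\|_1$) and Equation (5.15), $x_v = \|x\|_1 \cdot \mathbb{E}_{\rho}\big[\pi_{G, \mathcal{T}, w}(v)\big]$, as required. □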

Chapter 6

Message-Passing Decoding with Local-Optimality Guarantee for Tanner Codes with Single Parity-Check Local Codes

In this chapter¹, we present a new message-passing iterative decoding algorithm, called normalized weighted min-sum (nwms). The nwms algorithm is a BP-type algorithm that applies to any irregular Tanner code with single parity-check local codes (e.g., LDPC codes and HDPC codes). The decoding guarantee of nwms applies whenever there exists a locally optimal codeword. We prove that if a locally optimal codeword with respect to height parameter $h$ exists, then nwms-decoding finds it in $h$ iterations. This decoding guarantee holds for every finite value of $h$ and is not limited by the girth. Because local optimality of a codeword implies that it is the unique ML-codeword, the decoding guarantee also has an ML-certificate.

The results presented in this chapter are based on the definitions in Chapters 2 and 5.

6.1 Introduction

Wiberg et al. [WLK95, Wib96] developed the use of graphical models for systematically describing instances of known decoding algorithms. In particular, the "sum-product" algorithm and the "min-sum" algorithm are presented as generic iterative message-passing decoding algorithms that apply to any graph realization of a Tanner code. Wiberg et al. proved that the min-sum algorithm can be viewed as a dynamic programming algorithm that computes the ML-codeword if the Tanner graph is a tree. For LDPC codes, Wiberg [Wib96] characterized a necessary condition for decoding failures of the min-sum algorithm by "negative" cost trees, called minimal deviations.

Arora et al. [ADS09] also pointed out that it is possible to design a re-weighted version of the min-sum decoder for regular codes that finds the locally-optimal codeword, if such a codeword exists, for trees whose height is at most a quarter of the girth.

¹The research work presented in this chapter is based on [HE11b, HE12f].


Various iterative message-passing decoding algorithms are derived from the belief propagation algorithm (e.g., max-product [WLK95], attenuated max-product [FK00], tree-reweighted belief-propagation [WJW05], etc.). The convergence of these BP-based iterative decoding algorithms to an optimum solution has been studied extensively in various settings (see, e.g., [WLK95, FK00, WF01, CF02, CDE+05, WJW05, RU01, JP11]). However, bounds on the time and message complexity of these algorithms are not considered. The analyses in these works often rely on the existence of a single optimal solution in addition to other conditions such as single-loop graphs, large girth, large reweighting coefficients, consistency conditions, etc.

Jian and Pfister [JP11] analyzed a special case of the attenuated max-product decoder [FK00] for regular LDPC codes. They considered skinny trees in the computation tree, the height of which is greater than the girth of the Tanner graph. Using contraction properties and consistency conditions, they proved sufficient conditions under which the message-passing decoder converges to a locally optimal codeword. This convergence also implies convergence to the LP-optimum and therefore to the ML-codeword.

We note that aside from message-passing algorithms, other decoding algorithms for LDPC codes have been suggested as alternatives to LP decoding. We briefly mention two such alternatives below. (i) Vontobel and Koetter [VK06, VK07] presented an iterative decoder based on a Gauss-Seidel-type algorithm for the dual LP. In a following work, Burshtein [Bur09] presented a scheduling scheme for the iterative algorithm and proved that for bounded LLR channels this algorithm is a fully polynomial-time approximation scheme (FPTAS) with running time $O(N^4/\varepsilon^3)$. (ii) Taghavi et al. [TS08, TSS11] studied cutting-plane algorithms as an alternative to direct implementation of LP decoding. These algorithms, called adaptive LP decoding, solve a hierarchy of simpler LP decoders using sparse implementations of interior-point algorithms. This method is observed, in practice, to be more efficient than solving the LP-decoding problem directly.

Contributions. We present a new message-passing iterative decoding algorithm, called the normalized weighted min-sum (nwms) algorithm (Algorithm 6.1). The nwms algorithm applies to any irregular Tanner code with single parity-check (SPC) local codes (e.g., LDPC codes and HDPC codes). The characterization of local-optimality for Tanner codes with SPC local codes has two parameters: (i) a certificate height $h$, and (ii) a vector of layer weights $w \in \mathbb{R}^h_+ \setminus \{0^h\}$. (Note that the local codes are single parity checks, and therefore the deviation degree $d$ equals 2.) We prove that, for any finite $h$, the nwms decoder is guaranteed to compute the ML codeword in $h$ iterations if an $h$-locally-optimal codeword exists (Theorem 6.1). Furthermore, the output of nwms can be efficiently ML-certified. The number of iterations, $h$, may exceed the girth. Because local-optimality is a purely combinatorial property, the results do not rely on convergence. The time and message complexity of nwms is $O(|E| \cdot h)$, where $|E|$ is the number of edges in the Tanner graph. Local optimality, as defined in Chapter 5, is a sufficient condition for successful decoding by our BP-based algorithm in loopy graphs.

Previous bounds on the probability that a local-optimality certificate exists [ADS09, HE11a] hold for regular LDPC codes. The same bounds also hold for successful decoding by nwms. These bounds are based on proving that a local-optimality certificate exists with high probability for noise thresholds close to the BP threshold. Specifically, noise thresholds of $p^* \ge 0.05$ in the case of the BSC [ADS09], and $\sigma^* \ge 0.735$ in the case of the BI-AWGN channel [HE11a] are proven.

Organization. The remainder of this chapter is organized as follows. Section 6.2 presents the nwms iterative decoding algorithm for irregular Tanner codes with single parity-check local codes, followed by a proof that nwms finds the locally-optimal codeword. Finally, a discussion of the new message-passing decoding algorithm appears in Section 6.3.

6.2 Message-Passing Decoding with ML Guarantee for Irregular LDPC Codes

In this section we present a weighted min-sum decoder (called nwms) for irregular LDPC codes over any memoryless binary-input output-symmetric channel. In Section 6.2.1 we prove that the decoder computes the maximum-likelihood (ML) codeword if a locally-optimal codeword exists (Theorem 6.1). Note that Algorithm nwms is not presented as a min-sum process. However, in Section 6.2.1, an equivalent min-sum version is presented.

We deal with Tanner codes based on Tanner graphs $G = (\mathcal{V} \cup \mathcal{J}, E)$ with single parity-check local codes. Local-code nodes $C \in \mathcal{J}$ in this case are called check nodes. The graph $G$ may be either regular or irregular. All the results in this section hold for every Tanner graph, regardless of its girth, degrees, or density.

Many works deal with message-passing decoding. We point out three works that can be viewed as precursors to our decoding algorithm. Gallager [Gal63] presented the sum-product iterative decoding algorithm for LDPC codes. Tanner [Tan81] viewed iterative decoding algorithms as message-passing algorithms over the edges of the Tanner graph. Wiberg [Wib96] characterized decoding failures of the min-sum iterative decoding algorithm by negative cost trees. Message-passing decoding algorithms proceed by iterations of "ping-pong" messages between the variable nodes and the local-code nodes in the Tanner graph. These messages are sent only along the edges.

Algorithm description. Algorithm nwms$(\lambda, h, w)$, listed as Algorithm 6.1, is a normalized $w$-weighted version of the min-sum algorithm for decoding Tanner codes with single parity-check local codes. The input to algorithm nwms consists of an LLR vector $\lambda \in \mathbb{R}^N$, an integer $h > 0$ that determines the number of iterations, and a nonnegative weight vector $w \in \mathbb{R}^h_+$. For each edge $(v, C)$, each iteration consists of one message from the variable node $v$ to the check node $C$ (that is, the "ping" message), and one message from $C$ to $v$ (that is, the "pong" message). Hence, the time and message complexity of Algorithm 6.1 is $O(|E| \cdot h)$.

Let $\mu^{(l)}_{v \to C}$ denote the "ping" message from a variable node $v \in \mathcal{V}$ to an adjacent check node $C \in \mathcal{J}$ in iteration $l$ of the algorithm. Similarly, let $\mu^{(l)}_{C \to v}$ denote the "pong" message from $C \in \mathcal{J}$ to $v \in \mathcal{V}$ in iteration $l$. Denote by $\mu_v$ the final value computed by variable node $v \in \mathcal{V}$. Note that, for ease of presentation, nwms does not add $w_0 \lambda_v$ in the computation of $\mu_v$ in Line 11². The output of the algorithm $x \in \{0,1\}^N$ is computed locally by each variable node in Line 12.

Algorithm nwms may be applied to any memoryless binary-input output-symmetric channel (e.g., BEC, BSC, AWGN, etc.) because its input is the LLR vector.

²Adding $w_0 \lambda_v$ to $\mu_v$ in Line 11 requires changing the definition of deviations so that they also include the root of each $d$-tree.

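Algorithm 6.1 nwms$(\lambda, h, w)$ - An iterative normalized weighted min-sum decoding algorithm. Given an LLR vector $\lambda \in \mathbb{R}^N$ and level weights $w \in \mathbb{R}^h_+$, outputs a binary string $x \in \{0,1\}^N$.

1: Initialize: $\forall C \in \mathcal{J}, \forall v \in \mathcal{N}(C)$: $\mu^{(-1)}_{C \to v} \leftarrow 0$
2: for $l = 0$ to $h - 1$ do
3:   for all $v \in \mathcal{V}$, $C \in \mathcal{N}(v)$ do ▷ “PING”
4:     $\mu^{(l)}_{v \to C} \leftarrow \frac{w_{h-l}}{\deg_G(v)} \lambda_v + \frac{1}{\deg_G(v) - 1} \sum_{C' \in \mathcal{N}(v) \setminus \{C\}} \mu^{(l-1)}_{C' \to v}$
5:   end for
6:   for all $C \in \mathcal{J}$, $v \in \mathcal{N}(C)$ do ▷ “PONG”
7:     $\mu^{(l)}_{C \to v} \leftarrow \Big(\prod_{u \in \mathcal{N}(C) \setminus \{v\}} \mathrm{sign}\big(\mu^{(l)}_{u \to C}\big)\Big) \cdot \min_{u \in \mathcal{N}(C) \setminus \{v\}} \big|\mu^{(l)}_{u \to C}\big|$
8:   end for
9: end for
10: for all $v \in \mathcal{V}$ do ▷ Decision
11:   $\mu_v \leftarrow \sum_{C \in \mathcal{N}(v)} \mu^{(h-1)}_{C \to v}$
12:   $x_v \leftarrow 0$ if $\mu_v > 0$, and $x_v \leftarrow 1$ otherwise.
13: end for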
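A minimal Python sketch of Algorithm 6.1 follows, together with a numerical check of the symmetry property stated later in Lemma 6.10. It assumes the Tanner graph is given as a list of check nodes, each listing its variable-node neighbours, and that every check node has degree at least 2. The names `nwms`, `K4_CHECKS`, and `TRIANGLE`, the list encoding, and the toy code on the six edges of $K_4$ are illustrative assumptions; a variable node of degree 1 is given no extrinsic term to avoid dividing by $\deg_G(v) - 1 = 0$.

```python
import random

def _sign(m):
    return (m > 0) - (m < 0)

def nwms(checks, llr, h, w):
    """Sketch of nwms. checks[c] lists N(C); w[0..h-1] stores (w_1, ..., w_h)."""
    n = len(llr)
    var_nbrs = [[] for _ in range(n)]
    for c, vs in enumerate(checks):
        for v in vs:
            var_nbrs[v].append(c)
    c2v = {(c, v): 0.0 for c, vs in enumerate(checks) for v in vs}  # Line 1
    for l in range(h):
        v2c = {}  # Lines 3-5: "ping" messages
        for v in range(n):
            deg = len(var_nbrs[v])
            for c in var_nbrs[v]:
                ext = sum(c2v[(cp, v)] for cp in var_nbrs[v] if cp != c)
                extrinsic = ext / (deg - 1) if deg > 1 else 0.0
                v2c[(v, c)] = w[h - l - 1] / deg * llr[v] + extrinsic
        c2v = {}  # Lines 6-8: "pong" messages, sign product times minimum magnitude
        for c, vs in enumerate(checks):
            for v in vs:
                others = [v2c[(u, c)] for u in vs if u != v]
                sgn = 1
                for m in others:
                    sgn *= _sign(m)
                c2v[(c, v)] = sgn * min(abs(m) for m in others)
    mu = [sum(c2v[(c, v)] for c in var_nbrs[v]) for v in range(n)]  # Line 11
    return [0 if m > 0 else 1 for m in mu]                          # Line 12

# Toy code (hypothetical): the edges of K4 are the variables, its vertices are
# single parity checks, so the codewords are the even subgraphs of K4.
K4_CHECKS = [[0, 1, 2], [0, 3, 4], [1, 3, 5], [2, 4, 5]]
TRIANGLE = [1, 1, 0, 1, 0, 0]  # edges (0,1), (0,2), (1,2)

print(nwms(K4_CHECKS, [1.0] * 6, h=4, w=[1.0] * 4))  # [0, 0, 0, 0, 0, 0]

random.seed(3)
W5 = [1.0, 0.5, 2.0, 1.0, 1.0]
lam = [random.uniform(-1, 1) for _ in range(6)]
b = [(-1) ** xi for xi in TRIANGLE]
lhs = [xi ^ yi for xi, yi in zip(TRIANGLE, nwms(K4_CHECKS, lam, 5, W5))]
rhs = nwms(K4_CHECKS, [bi * li for bi, li in zip(b, lam)], 5, W5)
print(lhs == rhs)  # True, as Lemma 6.10 predicts (barring an exact tie mu_v = 0)
```

Flipping the sign of $\lambda_v$ on the support of a codeword flips every message exactly, so the symmetry survives floating-point arithmetic except when some $\mu_v$ is exactly zero.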

The following theorem states that nwms$(\lambda, h, w)$ computes an $(h, w, 2)$-locally optimal codeword w.r.t. $\lambda$ if such a codeword exists. Hence, Theorem 6.1 provides a sufficient condition for successful iterative decoding for any finite number $h$ of iterations. In particular, the number of iterations may exceed the girth. Theorem 6.1 implies an alternative proof of the uniqueness of an $(h, w, 2)$-locally optimal codeword, which is proved in Theorem 5.5. The proof appears in Section 6.2.1.

Theorem 6.1 (NWMS finds the locally optimal codeword) Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote a Tanner graph and let $C(G) \subset \{0,1\}^N$ denote the corresponding Tanner code with single parity-check local codes. Let $h \in \mathbb{N}_+$ and let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ denote a non-negative weight vector. Let $\lambda \in \mathbb{R}^N$ denote the LLR vector of the channel output. If $x \in C(G)$ is an $(h, w, 2)$-locally optimal codeword w.r.t. $\lambda$, then nwms$(\lambda, h, w)$ outputs $x$.

The nwms decoding algorithm has an ML-certificate. Namely, if a locally-optimal codeword exists, the dynamic programming algorithm verify-lo described in Section 5.4 can be used to verify whether nwms$(\lambda, h, w)$ outputs an $(h, w, 2)$-locally optimal codeword. If so, then, by Theorem 5.5, the output of nwms$(\lambda, h, w)$ is the unique ML-codeword.

Corollary 6.11 states that for MBIOS channels, the probability that nwms fails is independent of the transmitted codeword. Hence, the following corollary is a contrapositive of Theorem 6.1 under the all-zero codeword assumption.


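Corollary 6.2 Assume that the all-zero codeword is transmitted, and let $\lambda \in \mathbb{R}^N$ denote the log-likelihood ratio for the received word. If nwms$(\lambda, h, w)$ fails to decode the all-zero codeword for $w \in \mathbb{R}^h_+ \setminus \{0^h\}$, then there exists a deviation $\beta \in \mathcal{B}^{(w)}_2$ such that $\langle \lambda, \beta \rangle \le 0$.

Hence, for a fixed $h$ and $w \in \mathbb{R}^h_+ \setminus \{0^h\}$,

$$\Pr\big\{\text{nwms}(\lambda, h, w) \text{ fails} \mid c = 0^N\big\} \le \Pr\big\{\exists \beta \in \mathcal{B}^{(w)}_2 \text{ such that } \langle \lambda, \beta \rangle \le 0 \mid c = 0^N\big\}. \tag{6.1}$$

Bounds on the existence of a non-positive weighted deviation (i.e., the right-hand side in Equation (6.1)) are discussed in Section 6.3.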

6.2.1 Proof of Theorem 6.1 - NWMS Finds the Locally Optimal Codeword

Proof outline. The proof of Theorem 6.1 is based on two observations. (1) We present an equivalent algorithm, called nwms2 (Section 6.2.1), and prove that Algorithm nwms2 outputs the all-zero codeword if $0^N$ is locally optimal (Sections 6.2.1-6.2.1). (2) In Lemma 6.10 we prove that algorithm nwms is symmetric (Section 6.2.1). This symmetry is with respect to the mapping of a pair $(x, \lambda)$ of a codeword and an LLR vector to a pair $(0^N, \lambda^0)$ of the all-zero codeword and a corresponding LLR vector $\lambda^0 \triangleq (-1)^x * \lambda$ (recall that "$*$" denotes coordinatewise vector multiplication).

To prove Theorem 6.1, we prove the contrapositive statement, that is, if $x \neq$ nwms$(\lambda, h, w)$, then $x$ is not $(h, w, 2)$-locally optimal w.r.t. $\lambda$. Let $x$ denote a codeword, and define $b \in \{\pm 1\}^N$ by $b_i \triangleq (-1)^{x_i}$. Define $\lambda^0 \triangleq b * \lambda$. By definition, $\lambda = b * \lambda^0$. The proof is obtained by the following derivations:

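$$
\begin{aligned}
x \neq \text{nwms}(\lambda, h, w)
&\iff x \neq x \oplus \text{nwms}(\lambda^0, h, w) && \text{[Lemma 6.10, symmetry of nwms]}\\
&\iff 0^N \neq \text{nwms}(\lambda^0, h, w)\\
&\implies \exists \beta \in \mathcal{B}^{(w)}_2 .\ \langle \lambda^0, \beta \rangle \le 0 && \text{[Lemma 6.7, local optimality (LO)]}\\
&\iff x \text{ is not } (h, w, 2)\text{-locally optimal for } \lambda. && \text{[Prop. 5.8, symmetry of LO]}
\end{aligned}
$$

Note that the only step that is sufficient but not necessary is the implication based on Lemma 6.7.

□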

We now prove the two lemmas used in the foregoing proof.

NWMS2 : An Equivalent Version

The input to Algorithm nwms is the log-likelihood ratio. We refer to this algorithm as a min-sum algorithm in light of the general description of Wiberg [Wib96] in the log-domain. In Wiberg's description, every check node finds a minimum value from a set of functions on the incoming messages, and every variable node computes the sum of the incoming messages and its corresponding channel observation. Hence the name "min-sum".


Let $y \in \mathbb{R}^N$ denote channel observations. For $a \in \{0,1\}$, define the log-likelihood of $y_i$ by $\lambda_i(a) \triangleq -\log\big(\Pr(y_i \mid c_i = a)\big)$. Note that the log-likelihood ratio $\lambda_i$ for $y_i$ equals $\lambda_i(1) - \lambda_i(0)$.
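As a concrete instance of these definitions, the Python sketch below computes $\lambda_i(0)$, $\lambda_i(1)$, and $\lambda_i = \lambda_i(1) - \lambda_i(0)$ for a BI-AWGN channel. It assumes the BPSK mapping $0 \mapsto +1$, $1 \mapsto -1$ and noise variance $\sigma^2$; under that assumption the LLR simplifies to $2y_i/\sigma^2$, a positive LLR favours the bit 0 (matching the decision rule of nwms), and $\lambda(-y) = -\lambda(y)$ reflects the channel symmetry. The function names are illustrative.

```python
import math

def neg_log_likelihood_awgn(y, a, sigma):
    """lambda_i(a) = -log Pr(y_i | c_i = a) for a Gaussian channel.
    Assumed BPSK mapping for illustration: bit 0 -> +1, bit 1 -> -1."""
    s = 1.0 if a == 0 else -1.0
    return (y - s) ** 2 / (2 * sigma ** 2) + math.log(math.sqrt(2 * math.pi) * sigma)

def llr_awgn(y, sigma):
    # lambda_i = lambda_i(1) - lambda_i(0); the normalization constants cancel.
    return neg_log_likelihood_awgn(y, 1, sigma) - neg_log_likelihood_awgn(y, 0, sigma)

y, sigma = 0.3, 0.8
print(llr_awgn(y, sigma), 2 * y / sigma ** 2)  # equal up to rounding
```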

Algorithm nwms2$(\lambda(0), \lambda(1), h, w)$, listed as Algorithm 6.2, is a normalized $w$-weighted min-sum algorithm. Algorithm nwms2 computes separate "reliabilities" for "0" and "1". Namely, $\mu^{(l)}_{v \to C}(a)$ and $\mu^{(l)}_{C \to v}(a)$ denote the messages corresponding to the assumption that node $v$ is assigned the value $a$ (for $a \in \{0,1\}$). The higher the values of these messages, the lower the likelihood of the event $x_v = a$.

The main difference between the presentations of Algorithm 6.1 and Algorithm 6.2 is in Line 7. Consider a check node $C$ and a valid assignment $x \in \{0,1\}^{\deg(C)}$ to the variable nodes adjacent to $C$, i.e., an assignment with even weight. For every such assignment $x$ in which $x_v = a$, the check node $C$ computes the sum of the incoming messages $\mu^{(l)}_{u \to C}(x_u)$ from the neighboring nodes $u \in \mathcal{N}(C) \setminus \{v\}$. The message $\mu^{(l)}_{C \to v}(a)$ equals the minimum value over these valid summations.

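Algorithm 6.2 nwms2$(\lambda(0), \lambda(1), h, w)$ - An iterative normalized weighted min-sum decoding algorithm. Given log-likelihood vectors $\lambda(a) \in \mathbb{R}^N$ for $a \in \{0,1\}$ and level weights $w \in \mathbb{R}^h_+$, outputs a binary string $x \in \{0,1\}^N$.

1: Initialize: $\forall C \in \mathcal{J}, \forall v \in \mathcal{N}(C), \forall a \in \{0,1\}$: $\mu^{(-1)}_{C \to v}(a) \leftarrow 0$
2: for $l = 0$ to $h - 1$ do
3:   for all $v \in \mathcal{V}$, $C \in \mathcal{N}(v)$, $a \in \{0,1\}$ do ▷ “PING”
4:     $\mu^{(l)}_{v \to C}(a) \leftarrow \frac{w_{h-l}}{\deg_G(v)} \lambda_v(a) + \frac{1}{\deg_G(v) - 1} \sum_{C' \in \mathcal{N}(v) \setminus \{C\}} \mu^{(l-1)}_{C' \to v}(a)$
5:   end for
6:   for all $C \in \mathcal{J}$, $v \in \mathcal{N}(C)$, $a \in \{0,1\}$ do ▷ “PONG”
7:     $\mu^{(l)}_{C \to v}(a) \leftarrow \min_{x \in \{0,1\}^{\deg(C)} :\ \|x\|_1 \text{ is even and } x_v = a} \ \sum_{u \in \mathcal{N}(C) \setminus \{v\}} \mu^{(l)}_{u \to C}(x_u)$
8:   end for
9: end for
10: for all $v \in \mathcal{V}$ do ▷ Decision
11:   $\mu_v(a) \leftarrow \sum_{C \in \mathcal{N}(v)} \mu^{(h-1)}_{C \to v}(a)$
12:   $x_v \leftarrow 0$ if $\big(\mu_v(1) - \mu_v(0)\big) > 0$, and $x_v \leftarrow 1$ otherwise.
13: end for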

Following Wiberg [Wib96, Appendix A.3], we claim that Algorithms 6.1 and 6.2 are equivalent.

Claim 6.3 Let $\lambda$, $\lambda(0)$, and $\lambda(1)$ in $\mathbb{R}^N$ denote the LLR vector and the two log-likelihood vectors for a channel output $y \in \mathbb{R}^N$. Then, for every $h \in \mathbb{N}_+$ and $w \in \mathbb{R}^h_+$, the following equalities hold:

1. $\mu^{(l)}_{v \to C} = \mu^{(l)}_{v \to C}(1) - \mu^{(l)}_{v \to C}(0)$ and $\mu^{(l)}_{C \to v} = \mu^{(l)}_{C \to v}(1) - \mu^{(l)}_{C \to v}(0)$ in every iteration $l$.

2. $\mu_v = \mu_v(1) - \mu_v(0)$. Hence nwms$(\lambda, h, w)$ and nwms2$(\lambda(0), \lambda(1), h, w)$ output the same vector $x$.
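Item 1 of Claim 6.3 hinges on the check-node step: the minimum over even-weight assignments in Line 7 of nwms2, differenced between $a = 1$ and $a = 0$, should coincide with the sign-times-minimum rule in Line 7 of nwms applied to the differences $\delta_u = \mu_{u \to C}(1) - \mu_{u \to C}(0)$. A small Python sketch that checks this for a single check node on random messages (the function names are ours, for illustration only):

```python
import itertools
import math
import random

def pong_min_sum_pair(msgs, a):
    """Line 7 of nwms2 for one check node C and target bit a = x_v.
    msgs: list of (mu_{u->C}(0), mu_{u->C}(1)) for the neighbours u != v.
    The other bits must have parity a so that the whole assignment is even."""
    best = math.inf
    for bits in itertools.product((0, 1), repeat=len(msgs)):
        if sum(bits) % 2 == a:
            best = min(best, sum(m[bit] for m, bit in zip(msgs, bits)))
    return best

def pong_sign_min(deltas):
    """Line 7 of nwms: product of signs times the minimum magnitude."""
    s = 1
    for d in deltas:
        s *= (d > 0) - (d < 0)
    return s * min(abs(d) for d in deltas)

random.seed(1)
msgs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(4)]
diff = pong_min_sum_pair(msgs, 1) - pong_min_sum_pair(msgs, 0)
deltas = [m1 - m0 for m0, m1 in msgs]
print(math.isclose(diff, pong_sign_min(deltas), abs_tol=1e-9))  # True
```

The unconstrained minimum sets $x_u = 1$ exactly when $\delta_u < 0$; if that choice has the wrong parity, the cheapest repair flips the coordinate with the smallest $|\delta_u|$, which is where the sign product and the minimum magnitude come from.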


NWMS2 as a Dynamic Programming Algorithm

In Lemma 6.4 we prove that Algorithm nwms2 is a dynamic programming algorithm that computes, for every variable node $v$, two min-weight valid configurations. One configuration is 0-rooted and the other configuration is 1-rooted. Algorithm nwms2 decides $x_v = 0$ if the min-weight valid configuration rooted at $v$ is 0-rooted; otherwise, it decides $x_v = 1$. We now elaborate on the definition of valid configurations and their weight.

Valid configurations and their weight. Fix a variable node $r \in \mathcal{V}$. We refer to $r$ as the root. Consider the path-prefix tree $\mathcal{T}^{2h}_r(G)$ rooted at $r$, consisting of all the paths of length at most $2h$ starting at $r$. Denote the vertices of $\mathcal{T}^{2h}_r$ by $\hat{\mathcal{V}} \cup \hat{\mathcal{J}}$, where the paths in $\hat{\mathcal{V}} = \{p \in \mathcal{T}^{2h}_r \mid t(p) \in \mathcal{V}\}$ are variable paths, and the paths in $\hat{\mathcal{J}} = \{p \in \mathcal{T}^{2h}_r \mid t(p) \in \mathcal{J}\}$ are parity-check paths. Denote by $(r)$ the empty path, i.e., the path consisting of only the root $r$.

A binary word $z \in \{0,1\}^{|\hat{\mathcal{V}}|}$ is interpreted as an assignment to variable paths $p \in \hat{\mathcal{V}}$, where $z_p$ is assigned to $p$. We say that $z$ is a valid configuration if it satisfies all parity-check paths in $\hat{\mathcal{J}}$. Namely, for every check path $q \in \hat{\mathcal{J}}$, the assignment to its neighbors has an even number of ones. We denote the set of valid configurations of $\mathcal{T}^{2h}_r$ by $\mathrm{config}(\mathcal{T}^{2h}_r)$.

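The weight $\mathcal{W}_{\mathcal{T}^{2h}_r}(z)$ of a valid configuration $z$ is defined by weights $\mathcal{W}_{\mathcal{T}^{2h}_r}(p)$ that are assigned to variable paths $p \in \hat{\mathcal{V}}$ as follows. We start with "level" weights $w = (w_1, \ldots, w_h) \in \mathbb{R}^h_+$ that are assigned to levels of variable paths in $\mathcal{T}^{2h}_r$. Define the weight of a variable path $p \in \hat{\mathcal{V}}$ with respect to $w$ by³

$$\mathcal{W}_{\mathcal{T}^{2h}_r}(p) \triangleq \frac{w_{|p|/2}}{\deg_G(t(p))} \cdot \prod_{q \in \mathrm{Prefix}^+(p) \cap \hat{\mathcal{V}}} \frac{1}{\deg_G(t(q)) - 1}.$$

There is a difference between Definition 5.3 and $\mathcal{W}_{\mathcal{T}^{2h}_r}(p)$. In Definition 5.3 the product is taken over all paths in $\mathrm{Prefix}^+(p)$. However, in $\mathcal{W}_{\mathcal{T}^{2h}_r}(p)$ the product is taken only over variable paths in $\mathrm{Prefix}^+(p)$.

The weight of a valid configuration $z \in \{0,1\}^{|\hat{\mathcal{V}}|}$ is defined by

$$\mathcal{W}_{\mathcal{T}^{2h}_r}(z) \triangleq \sum_{p \in \hat{\mathcal{V}} \setminus \{(r)\}} \lambda_{t(p)}(z_p) \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p).$$

Given a variable node $r \in \mathcal{V}$ and a bit $a \in \{0,1\}$, our goal is to compute the value of a min-weight valid configuration $\mathcal{W}_{\min}(\mathcal{T}^{2h}_r, a)$ defined by

$$\mathcal{W}_{\min}(\mathcal{T}^{2h}_r, a) \triangleq \min\big\{\mathcal{W}_{\mathcal{T}^{2h}_r}(z) \mid z \in \mathrm{config}(\mathcal{T}^{2h}_r),\ z_{(r)} = a\big\}. \tag{6.2}$$

In the following lemma we show that nwms2 computes $\mathcal{W}_{\min}(\mathcal{T}^{2h}_r, a)$ for every $r \in \mathcal{V}$ and $a \in \{0,1\}$. The proof is based on interpreting nwms2 as a dynamic programming algorithm. See Appendix 6.A for details.

Lemma 6.4 Consider an execution of nwms2$(\lambda(0), \lambda(1), h, w)$. For every variable node $r$, $\mu_r(a) = \mathcal{W}_{\min}(\mathcal{T}^{2h}_r, a)$.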

³We use the same notation as in Definition 5.3.


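From Line 12 in Algorithm nwms2 we obtain the following corollary that characterizes nwms2 as a computation of min-weight configurations.

Corollary 6.5 Let $x$ denote the output of nwms2$(\lambda(0), \lambda(1), h, w)$. For every variable node $r$,

$$x_r = \begin{cases} 0 & \text{if } \mathcal{W}_{\min}(\mathcal{T}^{2h}_r, 1) > \mathcal{W}_{\min}(\mathcal{T}^{2h}_r, 0),\\ 1 & \text{otherwise.} \end{cases}$$

Define the $\mathcal{W}^*$ cost of a configuration $z$ in $\mathcal{T}^{2h}_r$ by

$$\mathcal{W}^*_{\mathcal{T}^{2h}_r}(z) \triangleq \sum_{p \in \hat{\mathcal{V}}} \lambda_{t(p)} \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p) \cdot z_p.$$

Note that $\mathcal{W}^*_{\mathcal{T}^{2h}_r}(z)$ uses the LLR vector $\lambda$ (i.e., $\lambda_v = \lambda_v(1) - \lambda_v(0)$).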

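Corollary 6.6 Let $x$ denote the output of nwms$(\lambda, h, w)$. Let $z^*$ denote a valid configuration in $\mathcal{T}^{2h}_r$ with minimum $\mathcal{W}^*$ cost. Then, $x_r = z^*_{(r)}$.

Proof: The derivation in Equation (6.3) shows that the valid configuration $z^*$ that minimizes the $\mathcal{W}^*$ cost also minimizes the $\mathcal{W}$ cost.

$$
\begin{aligned}
\arg\min_{\text{valid } z \in \mathcal{T}^{2h}_r} \mathcal{W}_{\mathcal{T}^{2h}_r}(z)
&\overset{(1)}{=} \arg\min_{\text{valid } z \in \mathcal{T}^{2h}_r} \Big\{\mathcal{W}_{\mathcal{T}^{2h}_r}(z) - \mathcal{W}_{\mathcal{T}^{2h}_r}\big(0^{|\hat{\mathcal{V}}|}\big)\Big\}\\
&\overset{(2)}{=} \arg\min_{\text{valid } z \in \mathcal{T}^{2h}_r} \Big\{\sum_{p \in \hat{\mathcal{V}} : z_p = 1} \lambda_{t(p)}(1) \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p) - \sum_{p \in \hat{\mathcal{V}} : z_p = 1} \lambda_{t(p)}(0) \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p)\Big\}\\
&\overset{(3)}{=} \arg\min_{\text{valid } z \in \mathcal{T}^{2h}_r} \sum_{p \in \hat{\mathcal{V}}} \lambda_{t(p)} \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p) \cdot z_p = \arg\min_{\text{valid } z \in \mathcal{T}^{2h}_r} \mathcal{W}^*_{\mathcal{T}^{2h}_r}(z).
\end{aligned} \tag{6.3}
$$

Equality (1) relies on the fact that $\mathcal{W}_{\mathcal{T}^{2h}_r}(0^{|\hat{\mathcal{V}}|})$ is a constant. The terms $\lambda_{t(p)}(z_p) \cdot \mathcal{W}_{\mathcal{T}^{2h}_r}(p)$ in $\mathcal{W}_{\mathcal{T}^{2h}_r}(z)$ with $z_p = 0$ cancel against the same terms in $\mathcal{W}_{\mathcal{T}^{2h}_r}(0^{|\hat{\mathcal{V}}|})$, which leaves in Equality (2) only the terms that correspond to bits $z_p = 1$. Equality (3) is obtained by the LLR definition $\lambda_{t(p)} = \lambda_{t(p)}(1) - \lambda_{t(p)}(0)$.

Let $x = $ nwms$(\lambda, h, w)$ and $y = $ nwms2$(\lambda(0), \lambda(1), h, w)$. By Corollary 6.5 and Equation (6.3), $y_r = z^*_{(r)}$. By Claim 6.3, $x_r = y_r$, and the corollary follows. □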
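Equalities (1)-(3) in (6.3) amount to the identity $\mathcal{W}(z) - \mathcal{W}(0) = \mathcal{W}^*(z)$ for every configuration $z$, so both costs induce the same minimizer over any set of valid configurations. The toy Python check below replaces the path-prefix tree by a hypothetical list of five variable paths with made-up endpoints $t(p)$, made-up path weights standing in for $\mathcal{W}(p)$, and random log-likelihoods; "valid" is modelled, purely for illustration, as even weight.

```python
import itertools
import math
import random

random.seed(2)
P = 5                                            # toy number of variable paths
node = [0, 1, 2, 1, 0]                           # hypothetical endpoints t(p)
wp = [random.random() for _ in range(P)]         # stand-ins for the weights W(p)
lam0 = [random.gauss(0, 1) for _ in range(3)]    # lambda_v(0)
lam1 = [random.gauss(0, 1) for _ in range(3)]    # lambda_v(1)

def W(z):
    # W(z) = sum_p lambda_{t(p)}(z_p) * W(p)
    return sum((lam1 if zp else lam0)[node[p]] * wp[p] for p, zp in enumerate(z))

def W_star(z):
    # W*(z) = sum_p lambda_{t(p)} * W(p) * z_p, with lambda_v = lambda_v(1) - lambda_v(0)
    return sum((lam1[node[p]] - lam0[node[p]]) * wp[p] * zp for p, zp in enumerate(z))

zero = (0,) * P
configs = list(itertools.product((0, 1), repeat=P))
assert all(math.isclose(W(z) - W(zero), W_star(z), abs_tol=1e-12) for z in configs)

valid = [z for z in configs if sum(z) % 2 == 0]  # toy stand-in for config(T)
print(min(valid, key=W) == min(valid, key=W_star))  # True
```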

Connections to Local Optimality

The following lemma states that the nwms algorithm computes the all-zero codeword if $0^N$ is locally optimal.

Lemma 6.7 Let $x$ denote the output of nwms$(\lambda, h, w)$. If $0^N$ is $(h, w, 2)$-locally optimal w.r.t. $\lambda$, then $x = 0^N$.

Lemma 6.8 Let $x$ denote the output of nwms$(\lambda, h, w)$. If $x_v = 1$, then there exists a deviation $\beta \in \mathcal{B}^{(w)}_2$ corresponding to a $w$-weighted 2-tree such that $\langle \lambda, \beta \rangle \le 0$.


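Proof: We prove the contrapositive statement. Assume that $x \neq 0^N$. Hence, there exists a variable node $v$ for which $x_v = 1$. Consider $\mathcal{T}^{2h}_v = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$. Then, by Corollary 6.6, there exists a valid configuration $z^* \in \{0,1\}^{|\hat{\mathcal{V}}|}$ in $\mathcal{T}^{2h}_v$ with $z^*_{(v)} = 1$ that satisfies

$$\forall \text{ valid configuration } y \in \mathcal{T}^{2h}_v:\quad \mathcal{W}^*_{\mathcal{T}^{2h}_v}(z^*) \le \mathcal{W}^*_{\mathcal{T}^{2h}_v}(y). \tag{6.4}$$

Let $\mathcal{T}(z^*)$ denote the subgraph of $\mathcal{T}^{2h}_v$ induced by $\hat{\mathcal{V}}(z^*) \cup \mathcal{N}\big(\hat{\mathcal{V}}(z^*)\big)$, where $\hat{\mathcal{V}}(z^*) = \{p \in \hat{\mathcal{V}} \mid z^*_p = 1\}$. Note that $\mathcal{T}(z^*)$ is a forest. Because $z^*_{(v)} = 1$ and $z^*$ is a valid configuration, the forest $\mathcal{T}(z^*)$ must contain a 2-tree of height $2h$ rooted at the node $v$; denote this tree by $\mathcal{T}$. Let $\tau \in \{0,1\}^{|\hat{\mathcal{V}}|}$ denote the support of $\mathcal{T}$, and let $z_0 \in \{0,1\}^{|\hat{\mathcal{V}}|}$ denote the support of $\mathcal{T}(z^*) \setminus \mathcal{T}$. Then, $z^* = \tau + z_0$, where $z_0$ is also necessarily a valid configuration. By linearity and disjointness of $\tau$ and $z_0$, we have

$$\mathcal{W}^*_{\mathcal{T}^{2h}_v}(z^*) = \mathcal{W}^*_{\mathcal{T}^{2h}_v}(\tau + z_0) = \mathcal{W}^*_{\mathcal{T}^{2h}_v}(\tau) + \mathcal{W}^*_{\mathcal{T}^{2h}_v}(z_0). \tag{6.5}$$

Because $z_0$ is a valid configuration, by Equation (6.4), we have $\mathcal{W}^*_{\mathcal{T}^{2h}_v}(z^*) \le \mathcal{W}^*_{\mathcal{T}^{2h}_v}(z_0)$. By Equation (6.5), $\mathcal{W}^*_{\mathcal{T}^{2h}_v}(\tau) \le 0$.

Let $w^*_\tau \in \mathbb{R}^{|\hat{\mathcal{V}}|}$ denote the vector whose component indexed by $p \in \hat{\mathcal{V}}$ equals $\mathcal{W}_{\mathcal{T}^{2h}_v}(p) \cdot \tau_p$. The vector $w^*_\tau$ equals the $w$-weighted 2-tree $w_{\mathcal{T}}$ according to Definition 5.3. Hence, $\beta = \pi_{G, \mathcal{T}, w} \in \mathcal{B}^{(w)}_2$ satisfies $\langle \lambda, \beta \rangle = \mathcal{W}^*_{\mathcal{T}^{2h}_v}(\tau) \le 0$. We therefore conclude that $0^N$ is not $(h, w, 2)$-locally optimal w.r.t. $\lambda$, and the lemma follows. □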

Symmetry and the All-Zero Codeword Assumption

We define symmetric decoding algorithms (see [RU08, Definition 4.81] for a discussion of symmetry in message-passing algorithms).

Definition 6.9 (symmetry of decoding algorithm) Let $x \in C$ denote a codeword and let $b \in \{\pm 1\}^N$ denote a vector defined by $b_i = (-1)^{x_i}$. Let $\lambda$ denote an LLR vector. A decoding algorithm, dec$(\lambda)$, is symmetric with respect to code $C$ if

$$\forall x \in C.\quad x \oplus \mathrm{dec}(\lambda) = \mathrm{dec}(b * \lambda). \tag{6.6}$$

The following lemma states that the nwms algorithm is symmetric. The proof is by induction on the number of iterations.

Lemma 6.10 (symmetry of NWMS) Fix $h \in \mathbb{N}_+$ and $w \in \mathbb{R}^h_+$. Consider $\lambda \in \mathbb{R}^N$ and a codeword $x \in C(G)$. Let $b \in \{\pm 1\}^N$ denote a vector defined by $b_i = (-1)^{x_i}$. Then,

$$x \oplus \text{nwms}(\lambda, h, w) = \text{nwms}(b * \lambda, h, w). \tag{6.7}$$

Proof: Let $\mu^{(l)}_{v \to C}[\lambda]$ denote the message sent from $v$ to $C$ in iteration $l$ given an input $\lambda$. Let $\mu^{(l)}_{C \to v}[\lambda]$ denote the corresponding message from $C$ to $v$. From the decision of nwms in Line 12, it is sufficient to prove that $\mu^{(l)}_{v \to C}[\lambda] = (-1)^{x_v} \cdot \mu^{(l)}_{v \to C}[b * \lambda]$ and $\mu^{(l)}_{C \to v}[\lambda] = (-1)^{x_v} \cdot \mu^{(l)}_{C \to v}[b * \lambda]$ for every $0 \le l \le h - 1$. The proof is by induction on $l$. The induction basis, for $l = -1$, holds because $\mu^{(-1)}_{C \to v}[\lambda] = (-1)^{x_v} \cdot \mu^{(-1)}_{C \to v}[b * \lambda] = 0$ for every codeword $x$.


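The induction step is proven as follows. By the induction hypothesis, and since $(b * \lambda)_v = (-1)^{x_v}\lambda_v$, we have

$$\begin{aligned}
\mu^{(l)}_{v \to C}[\lambda] &= \frac{w_{h-l}}{\deg_G(v)}\,\lambda_v + \frac{1}{\deg_G(v)-1} \sum_{C' \in \mathcal{N}(v)\setminus\{C\}} \mu^{(l-1)}_{C' \to v}[\lambda] \\
&= (-1)^{x_v} \cdot \Big( \frac{w_{h-l}}{\deg_G(v)}\,(-1)^{x_v}\lambda_v + \frac{1}{\deg_G(v)-1} \sum_{C' \in \mathcal{N}(v)\setminus\{C\}} \mu^{(l-1)}_{C' \to v}[b * \lambda] \Big) \\
&= (-1)^{x_v} \cdot \mu^{(l)}_{v \to C}[b * \lambda].
\end{aligned}$$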

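For check-to-variable messages we have, by the induction hypothesis,

$$\begin{aligned}
\mu^{(l)}_{C \to v}[\lambda] &= \Big( \prod_{u \in \mathcal{N}(C)\setminus\{v\}} \mathrm{sign}\big(\mu^{(l)}_{u \to C}[\lambda]\big) \Big) \cdot \min_{u \in \mathcal{N}(C)\setminus\{v\}} \big| \mu^{(l)}_{u \to C}[\lambda] \big| \\
&= \Big( \prod_{u \in \mathcal{N}(C)\setminus\{v\}} \mathrm{sign}\big((-1)^{x_u} \cdot \mu^{(l)}_{u \to C}[b * \lambda]\big) \Big) \cdot \min_{u \in \mathcal{N}(C)\setminus\{v\}} \big| (-1)^{x_u} \cdot \mu^{(l)}_{u \to C}[b * \lambda] \big| \\
&= \Big( \prod_{u \in \mathcal{N}(C)\setminus\{v\}} (-1)^{x_u} \Big) \cdot \mu^{(l)}_{C \to v}[b * \lambda].
\end{aligned}$$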

Because $x$ is a codeword, every single parity check $C$ satisfies $\prod_{u \in \mathcal{N}(C)\setminus\{v\}} (-1)^{x_u} = (-1)^{x_v}$. Therefore, $\mu^{(l)}_{C \to v}[\lambda] = (-1)^{x_v} \cdot \mu^{(l)}_{C \to v}[b * \lambda]$, and the claim follows. □
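The induction in the proof of Lemma 6.10 can be checked numerically. The Python sketch below implements only the two message-update rules that appear in the proof (the nwms decision rule of Line 12 is not reproduced) on a small hypothetical Tanner graph with three single parity checks in which every variable node has degree 2. The names `CHECKS`, `run_messages`, and `messages_are_symmetric` are illustrative, not from the thesis. For a codeword $x$ it verifies that every message computed on $\lambda$ equals $(-1)^{x_v}$ times the corresponding message computed on $b * \lambda$.

```python
import math
import random

# Hypothetical toy Tanner graph (not from the thesis): three single parity
# checks over six variable nodes; every variable node has degree 2.
CHECKS = [(0, 1, 2, 3), (2, 3, 4, 5), (0, 1, 4, 5)]
N = 6
VAR_NBRS = {v: [c for c, nbrs in enumerate(CHECKS) if v in nbrs] for v in range(N)}


def run_messages(lam, w):
    """Messages (mu_vc, mu_cv) for iterations l = 0, ..., h-1, where w = (w_1, ..., w_h)."""
    h = len(w)
    # Induction basis (l = -1): every check-to-variable message is zero.
    mu_cv = {(c, v): 0.0 for c, nbrs in enumerate(CHECKS) for v in nbrs}
    history = []
    for l in range(h):
        mu_vc = {}
        for v in range(N):
            deg = len(VAR_NBRS[v])
            for c in VAR_NBRS[v]:
                incoming = sum(mu_cv[(c2, v)] for c2 in VAR_NBRS[v] if c2 != c)
                # w_{h-l} sits at Python index h - l - 1.
                mu_vc[(v, c)] = w[h - l - 1] / deg * lam[v] + incoming / (deg - 1)
        mu_cv = {}
        for c, nbrs in enumerate(CHECKS):
            for v in nbrs:
                others = [mu_vc[(u, c)] for u in nbrs if u != v]
                sign = math.prod(math.copysign(1.0, m) for m in others)
                mu_cv[(c, v)] = sign * min(abs(m) for m in others)
        history.append((mu_vc, mu_cv))
    return history


def messages_are_symmetric(x, lam, w, tol=1e-12):
    """Check mu[lam] == (-1)^{x_v} * mu[b * lam] for every message."""
    b = [(-1) ** xi for xi in x]
    flipped = [bi * li for bi, li in zip(b, lam)]
    for (vc, cv), (vc_f, cv_f) in zip(run_messages(lam, w), run_messages(flipped, w)):
        if any(abs(m - b[v] * vc_f[(v, c)]) > tol for (v, c), m in vc.items()):
            return False
        if any(abs(m - b[v] * cv_f[(c, v)]) > tol for (c, v), m in cv.items()):
            return False
    return True


rng = random.Random(1)
lam = [rng.gauss(1.0, 1.0) for _ in range(N)]
w = [1.0, 0.5, 0.25]
print(messages_are_symmetric((1, 1, 0, 0, 0, 0), lam, w))  # codeword -> True
print(messages_are_symmetric((1, 0, 0, 0, 0, 0), lam, w))  # not a codeword -> False
```

The second call fails because the parity identity $\prod_{u \neq v} (-1)^{x_u} = (-1)^{x_v}$ used in the last step of the proof does not hold for a non-codeword.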

The following corollary follows from Lemma 6.10 and the symmetry of an MBIOS channel.

Corollary 6.11 (All-zero codeword assumption) Fix $h \in \mathbb{N}_+$ and $w \in \mathbb{R}^N_+$. For MBIOS channels, the probability that nwms fails is independent of the transmitted codeword. That is,

$$\Pr\{\text{nwms decoding fails}\} = \Pr\{\mathrm{nwms}(\lambda, h, w) \neq 0^N \mid c = 0^N\}.$$

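Proof: By Lemma 6.10, for every codeword $x$,

$$\Pr\{\mathrm{nwms}(\lambda, h, w) \neq x \mid c = x\} = \Pr\{\mathrm{nwms}(b * \lambda, h, w) \neq 0^N \mid c = x\}.$$

For MBIOS channels, $\Pr(\lambda_i \mid 0) = \Pr(-\lambda_i \mid 1)$. Therefore, the mapping $(x, \lambda) \mapsto (0^N, b * \lambda)$ preserves the probability measure. Applying this mapping to $(x, b * \lambda) \mapsto (0^N, b * b * \lambda)$, where $b * b * \lambda = \lambda$ because $b_i^2 = 1$, we conclude that

$$\Pr\{\mathrm{nwms}(b * \lambda, h, w) \neq 0^N \mid c = x\} = \Pr\{\mathrm{nwms}(\lambda, h, w) \neq 0^N \mid c = 0^N\}.$$

□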

6.3 Discussion

We present a new message-passing algorithm, called nwms, with the following properties: (1) nwms applies to regular and irregular Tanner codes with SPC local codes. (2) nwms has a decoding guarantee stated as follows: if there exists a locally optimal codeword, then nwms finds it (recall that a locally optimal codeword has an ML-certificate). (3) The decoding guarantee of nwms is not bounded by the girth. Namely, the height parameter in local optimality and the number of iterations in the decoding are arbitrary. (4) The decoding guarantee of nwms is not asymptotic and does not rely on convergence. Namely, it applies to finite codes and to decoding with a finite number of iterations. (5) nwms applies to any MBIOS channel (including unbounded LLRs).

In the special case of regular LDPC codes, the nwms algorithm generalizes BP-based decoding algorithms simply by the choice of the level weights. This includes the min-sum algorithm [WLK95, Wib96], attenuated max-product [FK00], and normalized BP-based decoding [CF02, CDE+05]. Hence, ML-certificates for these decoding algorithms follow from nwms.

Many works on BP-based decoding algorithms study the convergence of message-passing algorithms (e.g., [WF01, WJW05, RU01, JP11]). In particular, bounds on the time and message complexity are not considered. The analyses in these works rely on the existence of a single optimal solution in addition to other conditions such as a single cycle, large girth, a large reweighing coefficient, consistency conditions, etc. In contrast, the nwms algorithm is guaranteed to compute the ML codeword within $h$ iterations if a locally optimal certificate with height parameter $h$ exists for some codeword. Moreover, the certificate can be computed efficiently (see Algorithm 5.1).

In previous works [ADS09, HE11a], the probability that a locally optimal certificate with height parameter $h$ exists for some codeword was investigated for regular LDPC codes with $h < \frac{1}{4}\,\mathrm{girth}(G)$. Consider a $(d_L, d_R)$-regular LDPC code whose Tanner graph $G$ has logarithmic girth, let $h < \frac{1}{4}\,\mathrm{girth}(G)$, and define a constant weight vector $w \triangleq 1^h$. In that case, normalizing the messages by the variable node degrees has the effect that each level of variable nodes in a 2-tree contributes equally to the $w$-weighted value of the 2-tree. Hence, the set $\mathcal{B}^{(w)}_2$ of deviations equals the set of $(d_L - 1)$-exponentially weighted skinny trees [ADS09, HE11a]. Following Equation (6.1), we conclude that the previous bounds on the probability that a locally optimal certificate exists [ADS09, HE11a] also apply to the probability that nwms decoding succeeds.

Consider $(3, 6)$-regular LDPC codes whose Tanner graphs $G$ have logarithmic girth, and let $h = \frac{1}{4}\,\mathrm{girth}(G)$ and $w = 1^h$. Then $\mathrm{nwms}(\lambda, h, w)$ recovers the transmitted codeword with probability at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$ in the following cases: (1) a BSC with crossover probability $p < 0.05$ (implied by [ADS09, Theorem 5]); (2) a BI-AWGN channel with $E_b/N_0 > 2.67$ dB (implied by [HE11a, Theorem 1], see Theorem 4.1).

It remains to explore good weighting schemes (choices of the vector $w$) for specific families of irregular LDPC codes, and to prove that a locally optimal codeword exists with high probability provided that the noise is bounded. Such a result would imply that the nwms decoding algorithm is a good, efficient replacement for LP-decoding.

6.A Optimal Valid Subconfigurations in the Execution of NWMS2

The description of algorithm nwms2 as a dynamic program deals with the computation of optimal valid configurations and subconfigurations. In this appendix we define optimal valid subconfigurations and prove invariants for the messages of Algorithm nwms2.

(a) Substructure $\mathcal{T}^{2l+2}_{C \to v}$ for $0 \leq l \leq h-1$. (b) Substructure $\mathcal{T}^{2l+1}_{v \to C}$ for $0 \leq l \leq h-1$.

Figure 6.1: Substructures of a path-prefix tree $\mathcal{T}^{2h}_r(G)$ in a dynamic program that computes optimal configurations in $\mathcal{T}^{2h}_r(G)$.

Denote by $\mathcal{T}^{2l+2}_{C \to v}$ a path-prefix tree of $G$ rooted at node $v$ with height $2l+2$ such that all paths must start with edge $(v, C)$ (see Figure 6.1(a)). Denote by $\mathcal{T}^{2l+1}_{v \to C}$ a path-prefix tree of $G$ rooted at node $C$ with height $2l+1$ such that all paths must start with edge $(C, v)$ (see Figure 6.1(b)).

Consider the message $\mu^{(l)}_{C \to v}$. It is determined by the messages sent along the edges of $\mathcal{T}^{2l+2}_v(G)$ that hang from the edge $(v, C)$. We introduce the following notation for this subtree (see Figure 6.2). Consider a path-prefix tree $\mathcal{T}^{2h}_r(G)$ and a variable path $p$ such that (1) $p$ is a path from the root $r$ to a variable node $v$, (2) the last edge in $p$ is $(C', v)$ for $C' \neq C$, and (3) the length of $p$ is $2(h-l-1)$. In such a case, $\mathcal{T}^{2l+2}_{C \to v}$ is isomorphic to the subtree of $\mathcal{T}^{2h}_r$ hanging from $p$ along the edge $(p, p \circ (v, C))$. Hence, we say that $\mathcal{T}^{2l+2}_{C \to v}$ is a substructure of $\mathcal{T}^{2h}_r(G)$. Similarly, if there exists a backtrackless path $q$ in $G$ from $r$ to $C$ of length $2(h-l)-1$ that does not end with edge $(v, C)$, we say that $\mathcal{T}^{2l+1}_{v \to C}$ is a substructure of $\mathcal{T}^{2h}_r(G)$.

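Let $\mathcal{T}_{\mathrm{sub}}$ denote a substructure $\mathcal{T}^{2l+2}_{C \to v}$ or $\mathcal{T}^{2l+1}_{v \to C}$. A binary assignment $z \in \{0,1\}^{|\mathcal{V}(\mathcal{T}_{\mathrm{sub}})|}$ to the variable paths $\mathcal{V}(\mathcal{T}_{\mathrm{sub}})$ is a valid subconfiguration if it satisfies every parity-check path $q \in \mathcal{T}_{\mathrm{sub}}$ such that $|q| \geq 1$. We denote the set of valid subconfigurations of $\mathcal{T}_{\mathrm{sub}}$ by $\mathrm{config}(\mathcal{T}_{\mathrm{sub}})$.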

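Define the weight of a variable path $q \in \mathcal{V}(\mathcal{T}_{\mathrm{sub}})$ with respect to level weights $w = (w_1, \ldots, w_h) \in \mathbb{R}^h_+$ by

$$\mathcal{W}_{\mathrm{sub}}(\mathcal{T}_{\mathrm{sub}}, q) \triangleq \frac{w_{h-l-1+\lceil |q|/2 \rceil}}{\deg_G(t(q))} \cdot \prod_{q' \in \mathrm{Prefix}^+(q) \cap \mathcal{V}(\mathcal{T}_{\mathrm{sub}})} \frac{1}{\deg_G(t(q')) - 1}.$$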

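The weight of a valid subconfiguration $z$ for a substructure $\mathcal{T}_{\mathrm{sub}}$ is defined by

$$\mathcal{W}_{\mathrm{sub}}(\mathcal{T}_{\mathrm{sub}}, z) \triangleq \sum_{q \in \mathcal{V}(\mathcal{T}_{\mathrm{sub}}) :\, |q| \geq 1} \lambda_{t(q)}(z_q) \cdot \mathcal{W}_{\mathrm{sub}}(\mathcal{T}_{\mathrm{sub}}, q).$$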


Figure 6.2: $\mathcal{T}^{2l+2}_{C \to v}$ as a substructure isomorphic to a subtree of the path-prefix tree $\mathcal{T}^{2h}_r$.

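Define the minimum weight of the substructures $\mathcal{T}^{2l+1}_{v \to C}$ and $\mathcal{T}^{2l+2}_{C \to v}$ for $a \in \{0,1\}$ as follows:

$$\mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+1}_{v \to C}, a) \triangleq \min\big\{\mathcal{W}_{\mathrm{sub}}(\mathcal{T}^{2l+1}_{v \to C}, z) \;\big|\; z \in \mathrm{config}(\mathcal{T}^{2l+1}_{v \to C}),\ z_{(C,v)} = a\big\}, \text{ and}$$

$$\mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+2}_{C \to v}, a) \triangleq \min\big\{\mathcal{W}_{\mathrm{sub}}(\mathcal{T}^{2l+2}_{C \to v}, z) \;\big|\; z \in \mathrm{config}(\mathcal{T}^{2l+2}_{C \to v}),\ z_{(v)} = a\big\}.$$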

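The minimum weight substructures satisfy the following recurrences.

Proposition 6.12 Let $a \in \{0,1\}$. Then

1. for every $1 \leq l \leq h-1$,
$$\mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+1}_{v \to C}, a) = \frac{w_{h-l}}{\deg_G(v)} \cdot \lambda_v(a) + \frac{1}{\deg_G(v)-1} \cdot \sum_{C' \in \mathcal{N}(v)\setminus\{C\}} \mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2(l-1)+2}_{C' \to v}, a);$$

2. for every $0 \leq l \leq h-1$,
$$\mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+2}_{C \to v}, a) = \min\Big\{ \sum_{u \in \mathcal{N}(C)\setminus\{v\}} \mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+1}_{u \to C}, x_u) \;\Big|\; x \in \{0,1\}^{\deg_G(C)},\ \|x\|_1 \text{ even},\ x_v = a \Big\}.$$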
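Part 2 of Proposition 6.12 minimizes over all even-weight local assignments, but these need not be enumerated explicitly. The Python sketch below is an illustration, not the thesis's nwms2 pseudocode. It computes that minimum with a two-state parity recursion, checks the result against brute force, and also spells out part 1. The names `var_to_check`, `check_to_var`, and `check_to_var_bruteforce` are hypothetical, and every cost is passed as a pair indexed by the bit value $a \in \{0,1\}$.

```python
import itertools
import math
import random


def var_to_check(w_hl, deg_v, lam_v, incoming):
    """Part 1 (assumes deg_v >= 2): lam_v[a] plays the role of lambda_v(a) and
    incoming[j][a] is the minimum weight of T^{2(l-1)+2}_{C'->v} for the j-th C' != C."""
    return tuple(w_hl / deg_v * lam_v[a] + sum(m[a] for m in incoming) / (deg_v - 1)
                 for a in (0, 1))


def check_to_var(costs):
    """Part 2: costs[j][x] is the minimum weight of T^{2l+1}_{u_j->C} with x_{u_j} = x,
    over the neighbours u_j != v of C. Returns (value for a = 0, value for a = 1)."""
    best = [0.0, math.inf]  # best[q]: cheapest partial assignment whose bits have parity q
    for c0, c1 in costs:
        best = [min(best[0] + c0, best[1] + c1), min(best[0] + c1, best[1] + c0)]
    # ||x||_1 is even iff the other bits have parity a = x_v.
    return best[0], best[1]


def check_to_var_bruteforce(costs):
    """Same quantity by enumerating all assignments of the other neighbours."""
    return tuple(
        min(sum(c[x] for c, x in zip(costs, xs))
            for xs in itertools.product((0, 1), repeat=len(costs))
            if (sum(xs) + a) % 2 == 0)
        for a in (0, 1))


rng = random.Random(7)
costs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
print(check_to_var(costs), check_to_var_bruteforce(costs))
```

The recursion keeps one running minimum per parity class, so a check node of degree $\deg_G(C)$ costs $O(\deg_G(C))$ operations per outgoing message instead of $2^{\deg_G(C)-1}$.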

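The following claim states an invariant over the messages $\mu^{(l)}_{C \to v}(a)$ and $\mu^{(l)}_{v \to C}(a)$ that holds during the execution of nwms2.

Claim 6.13 Consider an execution of $\mathrm{nwms2}(\lambda(0), \lambda(1), h, w)$. Then, for every $0 \leq l \leq h-1$,

$$\mu^{(l)}_{v \to C}(a) = \mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+1}_{v \to C}, a), \quad\text{and}\quad \mu^{(l)}_{C \to v}(a) = \mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{2l+2}_{C \to v}, a).$$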


Proof: The proof is by induction on $l$. The induction basis, for $l = 0$, holds because $\mu^{(0)}_{v \to C}(a) = \mathcal{W}^{\min}_{\mathrm{sub}}(\mathcal{T}^{1}_{v \to C}, a) = \frac{w_h}{\deg_G(v)}\lambda_v(a)$ for every edge $(v, C)$ of $G$. The induction step follows directly from the induction hypothesis and Proposition 6.12. □


Chapter 7

Bounds on the Word Error Probability for LP-Decoding of Regular Tanner Codes

In this chapter¹, we prove new bounds on the word error probability for LP-decoding of regular Tanner codes. These bounds are based on the local-optimality characterization introduced in Chapter 5.

The results presented in this chapter are based on the definitions in Chapters 2 and 5.

7.1 Introduction

Most of the research on the error correction of Tanner codes deals with two main families of codes: (i) expander Tanner codes, and (ii) repeat-accumulate codes.

Sipser and Spielman [SS96] studied Tanner codes based on expander graphs and analyzed a simple bit-flipping iterative decoding algorithm. Specifically, their main results were stated for $(2, \Delta)$-regular Tanner codes whose Tanner graphs are edge-vertex incidence graphs of $\Delta$-regular expander graphs. Their novel scheme was later improved, and it was shown that expander Tanner codes can even asymptotically achieve capacity in the BSC with an iterative bit-flipping decoding scheme [Z01, BZ02, BZ04]. The latter works were stated for $(2, \Delta)$-regular Tanner codes whose Tanner graphs are edge-vertex incidence graphs of bipartite $\Delta$-regular expander graphs. In addition, these works also present a worst-case analysis (for a bit-flipping adversarial channel). Skachek and Roth [SR03, Ska07] presented a generalized minimum distance iterative decoding algorithm that attains the best worst-case analysis to date for iterative decoding of such expander codes.

Feldman and Stein [FS05] studied LP-decoding of expander codes. They proved that LP-decoding can asymptotically achieve the capacity of bounded-LLR MBIOS channels with a special family of $(2, \Delta)$-regular expander Tanner codes. They also presented a worst-case analysis for bit-flipping adversarial channels.

The error correction capability of expander codes depends on the expansion; thus, a fairly large degree and huge block lengths are required to achieve good error correction.

¹ The research work presented in this chapter is based on [HE11b].


Repeat-Accumulate (RA) codes were introduced by Divsalar, Jin and McEliece [DJM98] as a simple non-trivial example of turbo codes. Divsalar et al. proved bounds on the word error probability of ML decoding for RA codes that are inverse polynomial in the block length.

RA codes may be represented as Tanner codes using two types of simple local codes: (i) repetition codes, and (ii) single parity-check codes. The simplicity of the local codes and the structure of the Tanner graph make RA codes amenable to asymptotic analysis using density evolution techniques (see, e.g., [JKM00]). However, there are still no proven finite-length bounds on the word error probability of RA codes for iterative message-passing decoders (for repetition $q > 2$).

Recently, Goldenberg and Burshtein [GB11] generalized the analysis of Feldman and Karger [FK02] to RA codes with even repetition $q > 4$. For this family of codes, they proved bounds on the word error probability of LP-decoding that are inverse polynomial in the block length. This result is similar to the analysis presented in Chapter 3.

Contributions. We present new bounds on the word error probability for LP decoding of regular Tanner codes in MBIOS channels. In Chapter 5 we presented a new local-optimality characterization for Tanner codes and proved that local-optimality in this characterization implies both ML-optimality (Theorem 5.5) and LP-optimality (Theorem 5.10).

Because trees in our new characterization may have degrees larger than two, they contain more vertices. Hence, this characterization leads to improved bounds for successful decoding of regular Tanner codes (Theorems 7.1 and 7.12). These bounds extend the probabilistic analysis of the min-sum process by Arora et al. [ADS09] to a sum-min-sum process on regular trees. For regular Tanner codes, we prove bounds on the word error probability of LP-decoding under MBIOS channels that are inverse doubly-exponential in the girth of the Tanner graph. We also prove bounds on the threshold of regular Tanner codes whose Tanner graphs have logarithmic girth. This means that if the noise in the channel is below that threshold, then the decoding error diminishes as a function of the block length. Note that Tanner graphs with logarithmic girth can be constructed explicitly (see, e.g., [Gal63]).

Organization. In Section 7.2, we use the combinatorial characterization of local-optimality to bound the error probability of LP decoding for regular Tanner codes. A discussion of the new bounds appears in Section 7.3.

7.2 Bounds on the Error Probability of LP-Decoding Using Local-Optimality

In this section we analyze the probability that a local-optimality certificate for regular Tanner codes exists, and therefore that LP decoding succeeds. The analysis is based on the study of a "sum-min-sum" process that characterizes $d$-trees of a regular Tanner graph. We prove upper bounds on the error probability of LP decoding of regular Tanner codes in memoryless channels. The upper bounds on the error probability imply lower bounds on the threshold of LP decoding. We apply the analysis to the binary symmetric channel and compare our results with previous results on expander codes. The analysis presented in this section generalizes the probabilistic analysis of Arora et al. [ADS09] from 2-trees (skinny trees) to $d$-trees for any $d > 2$.

In the remainder of this section, we restrict our discussion to $(d_L, d_R)$-regular Tanner codes with minimum local distance $d^*$. Let $d$ denote a parameter such that $2 \leq d \leq d^*$.

Theorem 7.1 summarizes the main results presented in this section for binary symmetric channels; it generalizes to any MBIOS channel as described in Section 7.2.3. Concrete bounds are given for a $(2, 16)$-regular Tanner code with code rate at least 0.375 when using $[16, 11, 4]$-extended Hamming codes as local codes.
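The rate bound quoted for this example follows from counting constraints. A $(d_L, d_R)$-regular Tanner graph on $N$ variable nodes has $N d_L / d_R$ local codes, each of length $d_R$ and dimension $k$, so the code is defined by at most $N d_L (d_R - k)/d_R$ parity constraints. A short Python check with exact fractions follows; the helper name is ours, not the thesis's.

```python
from fractions import Fraction


def tanner_rate_lower_bound(d_L, d_R, k_local):
    """Design-rate lower bound 1 - d_L * (1 - k_local / d_R) for a
    (d_L, d_R)-regular Tanner code with [d_R, k_local] local codes."""
    return 1 - d_L * (1 - Fraction(k_local, d_R))


print(tanner_rate_lower_bound(2, 16, 11))  # [16, 11, 4] local codes -> 3/8
```

With single parity-check local codes ($k = d_R - 1$) the same formula gives the familiar design rate $1 - d_L/d_R$ of a regular LDPC code.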

Theorem 7.1 Let $G$ denote a $(d_L, d_R)$-regular bipartite graph with girth $g$, and let $\mathcal{C}(G)$ denote a Tanner code based on $G$ with minimum local distance $d^*$. Let $x \in \mathcal{C}(G)$ be a codeword. Suppose that $y \in \{0,1\}^N$ is obtained from $x$ by flipping every bit independently with probability $p$. Then,

1. [finite-length bound] Let $d = d_0$, $p \leq p_0$, $(d_L, d_R) = (2, 16)$, and $d^* = 4$. For the values of $d_0$ and $p_0$ in Table 7.1(a), $x$ is the unique optimal solution to the LP decoder with probability at least
$$\Pr\big(x_{\mathrm{lp}}(y) = x\big) \geq 1 - N \cdot \alpha^{(d-1)^{\lfloor g/4 \rfloor}}$$
for some constant $\alpha < 1$.

2. [asymptotic bound] Let $d = d_0$, $(d_L, d_R) = (2, 16)$, $d^* = 4$, and $g = \Omega(\log N)$ sufficiently large. For the values of $d_0$ and $p_0$ in Table 7.1(b), $x$ is the unique optimal solution to the LP decoder with probability at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$, provided that $p \leq p_0(d_0)$.

3. Let $d' \triangleq d - 1$, $d'_L \triangleq d_L - 1$, and $d'_R \triangleq d_R - 1$. For any $(d_L, d_R)$ and $2 \leq d \leq d^*$ such that $d'_L \cdot d' \geq 2$, the codeword $x$ is the unique optimal solution to the LP decoder with probability at least $1 - N \cdot \alpha^{(d'_L \cdot d')^{\lfloor g/4 \rfloor}}$ for some constant $\alpha < 1$, provided that
$$\min_{t > 0} \big(\alpha_1(p, d, d_L, d_R, t)\big) \cdot \big(\alpha_2(p, d, d_L, d_R, t)\big)^{1/(d'_L \cdot d' - 1)} < 1,$$
where
$$\alpha_1(p, d, d_L, d_R, t) = \sum_{k=0}^{d'-1} \binom{d'_R}{k} p^k (1-p)^{d'_R - k} e^{-t(d' - 2k)} + \Big( \sum_{k=d'}^{d'_R} \binom{d'_R}{k} p^k (1-p)^{d'_R - k} \Big) e^{t d'},$$
$$\alpha_2(p, d, d_L, d_R, t) = \binom{d'_R}{d'} \big( (1-p) e^{-t} + p e^{t} \big)^{d'}.$$


bound          d0   p0       d0   p0

“finite”       3    0.0086   4    0.0218

“asymptotic”   3    0.019    4    0.044

Table 7.1: Computed values of $p_0$ for $d_0 \leq d^*$ in Theorem 7.1. Values are presented for a $(2, 16)$-regular Tanner code with rate at least 0.375 when using $[16, 11, 4]$-extended Hamming codes as local codes. (a) Finite-length bound: for every $p \leq p_0$, a bound on the word error probability that is inverse doubly-exponential in the girth of the Tanner graph. (b) Asymptotic bound: for $g = \Omega(\log N)$ sufficiently large, the LP decoder succeeds w.p. at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$, provided that $p \leq p_0(d_0)$.

Proof: [Proof outline] Theorem 7.1 follows from Lemma 7.4, Lemma 7.7, Corollary 7.10, and Corollary 7.11 as follows. The first part, which states a finite-length result, follows from Lemma 7.4 and Corollaries 7.10 and 7.11 by taking $s = 0 < h < \frac{1}{4}\,\mathrm{girth}(G)$, which holds for any Tanner graph $G$. The second part, which deals with an asymptotic result, follows from Lemma 7.4 and Corollaries 7.10 and 7.11 by fixing $s = 10$ and taking $g = \Omega(\log N)$ sufficiently large such that $s < h = \Theta(\log N) < \frac{1}{4}\,\mathrm{girth}(G)$; it therefore provides a lower bound on the threshold of LP-decoding. The third part, which states a finite-length result for any $(d_L, d_R)$-regular Tanner code, follows from Lemma 7.4 and Lemma 7.7. □

We refer the reader to Section 7.3 for a discussion of the results stated in Theorem 7.1. We now provide more details and prove the lemmas and corollaries used in the proof of Theorem 7.1.

In order to simplify the probabilistic analysis of algorithms for decoding linear codes over symmetric channels, we apply the assumption that the all-zero codeword is transmitted, i.e., $c = 0^N$ (see Section 2.5 for details). The following corollary is the contrapositive statement of Theorem 5.10 given $c = 0^N$.

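Corollary 7.2 For every fixed $h \in \mathbb{N}$, $w \in \mathbb{R}^h_+ \setminus \{0^h\}$, and $2 \leq d \leq d^*$,

$$\Pr\{\text{LP decoding fails}\} \leq \Pr\big\{\exists \beta \in \mathcal{B}^{(w)}_d \text{ such that } \langle \lambda, \beta \rangle \leq 0 \;\big|\; c = 0^N\big\}.$$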

7.2.1 Bounding Processes on Trees

Let $G$ be a $(d_L, d_R)$-regular Tanner graph, and fix $h < \frac{1}{4}\,\mathrm{girth}(G)$. Let $\mathcal{T}^{2h}_{v_0}(G)$ denote the path-prefix tree rooted at a variable node $v_0$ with height $2h$. Since $h < \frac{1}{4}\,\mathrm{girth}(G)$, the projection of $\mathcal{T}^{2h}_{v_0}(G)$ to $G$ is a tree. We direct the edges of $\mathcal{T}^{2h}_{v_0}$ so that it is an in-branching directed toward the root $v_0$ (i.e., paths from all nodes are directed toward $v_0$). For $l \in \{0, \ldots, 2h\}$, denote by $V_l$ the set of vertices of $\mathcal{T}^{2h}_{v_0}$ at height $l$ (the leaves have height 0 and the root has height $2h$). Let $\tau \subseteq V(\mathcal{T}^{2h}_{v_0})$ denote the vertex set of a $d$-tree rooted at $v_0$.


Definition 7.3 ($(h, \omega, d)$-Process on a $(d_L, d_R)$-Tree) Let $\omega \in \mathbb{R}^h_+$ denote a weight vector. Let $\lambda$ denote an assignment of real values to the variable nodes of $\mathcal{T}_{v_0}$. We define the $\omega$-weighted value of a $d$-tree $\tau$ by

$$\mathrm{val}_\omega(\tau; \lambda) \triangleq \sum_{l=0}^{h-1} \sum_{v \in \tau \cap V_{2l}} \omega_l \cdot \lambda_v,$$

namely, the sum of the values of the variable nodes in $\tau$, weighted according to their height. Given a probability distribution over assignments $\lambda$, we are interested in the probability

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega) \triangleq \Pr_\lambda\Big\{ \min_{\tau \in \mathcal{T}[v_0, 2h, d]} \mathrm{val}_\omega(\tau; \lambda) \leq 0 \Big\}. \qquad (7.1)$$

In other words, $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ is the probability that the minimum value over all $d$-trees of height $2h$ rooted at some variable node $v_0$ in a $(d_L, d_R)$-bipartite graph $G$ is non-positive. For every two roots $v_0$ and $v_1$, the trees $\mathcal{T}^{2h}_{v_0}$ and $\mathcal{T}^{2h}_{v_1}$ are isomorphic; hence, $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ does not depend on the root $v_0$.

With this notation, the following lemma connects the $(h, \omega, d)$-process on $(d_L, d_R)$-trees with the event that the all-zero codeword is $(h, w, d)$-locally optimal. We apply a union bound utilizing Corollary 7.2, as follows.

Lemma 7.4 Let $G$ be a $(d_L, d_R)$-regular bipartite graph and $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ be a weight vector with $h < \frac{1}{4}\,\mathrm{girth}(G)$. Assume that the all-zero codeword is transmitted, and let $\lambda \in \mathbb{R}^N$ denote the LLR vector received from the channel. Then, $0^N$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$ with probability at least

$$1 - N \cdot \Pi_{\lambda,d,d_L,d_R}(h, \omega), \quad\text{where } \omega_l = w_{h-l} \cdot d_L^{-1} \cdot (d_L - 1)^{l-h+1} \cdot (d-1)^{h-l},$$

and with at least the same probability, $0^N$ is also the unique optimal LP solution given $\lambda$.

Note the two different weight notations that we use for consistency with [ADS09]: (i) $w$ denotes a weight vector in the context of an $(h, w, d)$-local-optimality certificate, and (ii) $\omega$ denotes a weight vector in the context of $d$-trees in the $(h, \omega, d)$-process. A one-to-one correspondence between these two vectors is given by $\omega_l = w_{h-l} \cdot d_L^{-1} \cdot (d_L - 1)^{l-h+1} \cdot (d-1)^{h-l}$ for $0 \leq l < h$. From this point on, we use only $\omega$ in this section.

Following Lemma 7.4, it suffices to estimate the probability $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ for a given weight vector $\omega$, a distribution of a random vector $\lambda$, a constant $2 \leq d \leq d^*$, and degrees $(d_L, d_R)$. Arora et al. [ADS09] introduced a recursion for estimating and bounding the probability that a 2-tree (skinny tree) with non-positive value exists in an $(h, \omega, 2)$-process. We generalize the recursion and its analysis to $d$-trees with $2 \leq d \leq d^*$.
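The correspondence between $w$ and $\omega$ is a per-level rescaling, so converting between the two notations is mechanical. A small Python sketch follows; the function names are illustrative. It stores $w = (w_1, \ldots, w_h)$ 0-based and uses exact fractions so that the round trip is exact.

```python
from fractions import Fraction


def omega_from_w(w, d, d_L):
    """omega_l = w_{h-l} / d_L * (d_L - 1)^(l-h+1) * (d - 1)^(h-l), for 0 <= l < h."""
    h = len(w)
    return [Fraction(w[h - l - 1]) / d_L * Fraction(d_L - 1) ** (l - h + 1)
            * Fraction(d - 1) ** (h - l) for l in range(h)]


def w_from_omega(omega, d, d_L):
    """Inverse map: w_j = omega_{h-j} * d_L * (d_L - 1)^(j-1) / (d - 1)^j, 1 <= j <= h."""
    h = len(omega)
    return [Fraction(omega[h - j]) * d_L * Fraction(d_L - 1) ** (j - 1)
            / Fraction(d - 1) ** j for j in range(1, h + 1)]


print(omega_from_w([1, 1], d=3, d_L=2))  # [Fraction(2, 1), Fraction(1, 1)]
```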

For a set $S$ of real values, let $\min^{[i]} S$ denote the $i$th smallest member of $S$. Let $\gamma$ denote an ensemble of i.i.d. random variables. Define random variables $X_0, \ldots, X_{h-1}$ and $Y_0, \ldots, Y_{h-1}$ by the following recursion:

$$Y_0 = \omega_0 \gamma \qquad (7.2)$$

$$X_l = \sum_{i=1}^{d-1} \min^{[i]} \big\{ Y^{(1)}_l, \ldots, Y^{(d_R - 1)}_l \big\} \qquad (0 \leq l < h) \qquad (7.3)$$

$$Y_l = \omega_l \gamma + X^{(1)}_{l-1} + \ldots + X^{(d_L - 1)}_{l-1} \qquad (0 < l < h) \qquad (7.4)$$


The notation $X^{(1)}, \ldots, X^{(k)}$ and $Y^{(1)}, \ldots, Y^{(k)}$ denotes $k$ mutually independent copies of the random variables $X$ and $Y$, respectively. Each instance of $Y_l$, $0 \leq l < h$, uses an independent instance of the random variable $\gamma$. Note that for every $0 \leq l < h$, the $d-1$ order-statistic random variables $\{\min^{[i]}\{Y^{(1)}_l, \ldots, Y^{(d_R-1)}_l\} \mid 1 \leq i \leq d-1\}$ in Equation (7.3) are dependent.

Consider a directed tree $T = T_{v_0}$ of height $2h$, rooted at node $v_0$. Associate variable nodes of $T$ at height $2l$ with copies of $Y_l$, and check nodes at height $2l+1$ with copies of $X_l$, for $0 \leq l < h$. Note that any realization of the random variables $\gamma$ at the variable nodes of $T$ can be viewed as an assignment $\lambda$. Thus, the minimum value of a $d$-tree of $T$ equals $\sum_{i=1}^{d_L} X^{(i)}_{h-1}$. This implies that the recursion in (7.2)-(7.4) defines a dynamic programming algorithm for computing $\min_{\tau \in \mathcal{T}[v_0, 2h, d]} \mathrm{val}_\omega(\tau; \lambda)$. Now, let the components of the LLR vector $\lambda$ be i.i.d. random variables distributed identically to $\gamma$. Then

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega) = \Pr\Big\{ \sum_{i=1}^{d_L} X^{(i)}_{h-1} \leq 0 \Big\}. \qquad (7.5)$$
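The recursion (7.2)-(7.4) also gives a direct way to sample $\min_\tau \mathrm{val}_\omega(\tau; \lambda)$, and hence a Monte Carlo estimate of $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ via (7.5). The Python sketch below assumes the scaled BSC model used in Section 7.2.2 ($\gamma = -1$ w.p. $p$, $+1$ otherwise). `sample_X` and `estimate_pi` are illustrative names. The sketch is only practical for small $h$, because the work per sample grows roughly like $((d_L - 1)(d_R - 1))^h$. It is a simulation, not the analytic bound developed below.

```python
import random


def sample_X(l, omega, d, d_L, d_R, draw_gamma):
    """Draw one sample of X_l following (7.2)-(7.4)."""
    ys = []
    for _ in range(d_R - 1):
        y = omega[l] * draw_gamma()              # omega_l * gamma
        if l > 0:                                # + X_{l-1}^{(1)} + ... + X_{l-1}^{(d_L-1)}
            y += sum(sample_X(l - 1, omega, d, d_L, d_R, draw_gamma)
                     for _ in range(d_L - 1))
        ys.append(y)
    ys.sort()
    return sum(ys[:d - 1])                       # sum of the d-1 smallest values


def estimate_pi(h, omega, d, d_L, d_R, p, trials=200, seed=0):
    """Monte Carlo estimate of Pr{ sum_{i=1}^{d_L} X_{h-1}^{(i)} <= 0 }, Eq. (7.5)."""
    rng = random.Random(seed)

    def draw_gamma():
        return -1.0 if rng.random() < p else 1.0

    failures = 0
    for _ in range(trials):
        root = sum(sample_X(h - 1, omega, d, d_L, d_R, draw_gamma) for _ in range(d_L))
        if root <= 0:
            failures += 1
    return failures / trials


print(estimate_pi(h=2, omega=[1.0, 1.0], d=3, d_L=2, d_R=16, p=0.02))
```

Sorting and summing the $d-1$ smallest child values is exactly the "min" step of the sum-min-sum process; the "sum" steps are the additions over the $d_L - 1$ child checks and over the $d_L$ checks at the root.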

Given a distribution of $\gamma$ and a finite "height" $h$, the challenge is to compute the distribution of $X_l$ and $Y_l$ according to the recursion in (7.2)-(7.4). The following two lemmas play a major role in proving bounds on $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$.

Lemma 7.5 ([ADS09]) For every $t > 0$,

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega) \leq \big(\mathbb{E}\, e^{-tX_{h-1}}\big)^{d_L}.$$

Let $d' \triangleq d - 1$, $d'_L \triangleq d_L - 1$, and $d'_R \triangleq d_R - 1$.

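Lemma 7.6 (following [ADS09]) For $0 \leq s < l < h$, we have

$$\mathbb{E}\, e^{-tX_l} \leq \big(\mathbb{E}\, e^{-tX_s}\big)^{(d'_L \cdot d')^{l-s}} \cdot \prod_{k=0}^{l-s-1} \Big( \binom{d'_R}{d'} \big(\mathbb{E}\, e^{-t\omega_{l-k}\gamma}\big)^{d'} \Big)^{(d'_L \cdot d')^k}.$$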

Proof: See Appendix 7.A. □

In the following subsection we present concrete bounds on $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ for the BSC. The bounds are based on Lemmas 7.5 and 7.6. The technique used to derive concrete bounds for the BSC may be applied to other MBIOS channels; for example, concrete bounds for the BI-AWGN channel can be derived by generalizing the analysis presented in Chapter 4.

7.2.2 Analysis for Binary Symmetric Channel

Consider the binary symmetric channel with crossover probability $p$, denoted by BSC($p$). When the all-zero codeword is transmitted, the channel input is $c_i = 0$ for every $i$. Hence, $\Pr\big(\lambda_i = -\log\frac{1-p}{p}\big) = p$ and $\Pr\big(\lambda_i = +\log\frac{1-p}{p}\big) = 1 - p$. Since $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ is invariant under positive scaling of the vector $\lambda$, we consider in the following analysis the scaled vector $\lambda$ in which $\lambda_i = -1$ w.p. $p$, and $\lambda_i = +1$ w.p. $1 - p$.

Following the analysis of Arora et al. [ADS09], we first apply a simple analysis in the case of a uniform weight vector $\omega$. Then, we present improved bounds by using a non-uniform weight vector.


Uniform Weights

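Consider the case where $\omega = 1^h$. Let $\alpha_1 \triangleq \mathbb{E}\, e^{-tX_0}$ and $\alpha_2 \triangleq \binom{d'_R}{d'} \big(\mathbb{E}\, e^{-t\gamma}\big)^{d'}$, where $\gamma$ is distributed identically to the i.i.d. components $\lambda_i$, and define $\alpha \triangleq \min_{t > 0} \alpha_1 \cdot \alpha_2^{1/(d'_L \cdot d' - 1)}$. Note that $\alpha_1 \leq \alpha_2$ (see Equation (7.12)). We consider the case where $\alpha < 1$. By substituting the notations $\alpha_1$ and $\alpha_2$ in Lemma 7.6 for $s = 0$, we have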

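$$\begin{aligned}
\mathbb{E}\, e^{-tX_l} &\leq \big(\mathbb{E}\, e^{-tX_0}\big)^{(d'_L d')^l} \cdot \prod_{k=0}^{l-1} \Big( \binom{d'_R}{d'} \big(\mathbb{E}\, e^{-t\gamma}\big)^{d'} \Big)^{(d'_L d')^k} \\
&= \alpha_1^{(d'_L d')^l} \cdot \prod_{k=0}^{l-1} \alpha_2^{(d'_L d')^k}
= \alpha_1^{(d'_L d')^l} \cdot \alpha_2^{\sum_{k=0}^{l-1} (d'_L d')^k}
= \alpha_1^{(d'_L d')^l} \cdot \alpha_2^{\frac{(d'_L d')^l - 1}{d'_L d' - 1}} \\
&= \Big( \alpha_1 \cdot \alpha_2^{\frac{1}{d'_L d' - 1}} \Big)^{(d'_L d')^l} \cdot \alpha_2^{-\frac{1}{d'_L d' - 1}}
\leq \alpha^{(d'_L d')^l - 1}.
\end{aligned}$$

Here $t$ is the minimizer in the definition of $\alpha$. The last inequality holds because $\alpha_2^{-1/(d'_L d' - 1)} = \alpha_1/\alpha$ and $\alpha_1 < 1$; the latter follows from $\alpha_1 \leq \alpha_2$, which gives $\alpha \geq \alpha_1^{1 + 1/(d'_L d' - 1)}$, together with $\alpha < 1$.

By Lemma 7.5, we conclude that

$$\Pi_{\lambda,d,d_L,d_R}(h, 1^h) \leq \alpha^{d_L \cdot (d'_L d')^{h-1} - d_L}.$$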
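The exponent bookkeeping in this chain can be sanity-checked numerically in log space. In the Python sketch below, which is our own check and not part of the thesis, `D` stands for $d'_L \cdot d'$ and the triples $(\alpha_1, \alpha_2, D)$ are hypothetical values satisfying $\alpha_1 \leq \alpha_2$ and $\alpha < 1$. The gap between the logarithms of the two sides of the last inequality works out to exactly $\ln \alpha_1$, so the inequality holds whenever $\alpha_1 \leq 1$.

```python
import math


def log_chain_sides(a1, a2, D, l):
    """Return (log LHS, log RHS) of the last inequality in the chain, where
    LHS = a1^{D^l} * a2^{(D^l - 1)/(D - 1)} and RHS = alpha^{D^l - 1},
    with alpha = a1 * a2^{1/(D - 1)}."""
    alpha = a1 * a2 ** (1.0 / (D - 1))
    lhs = D**l * math.log(a1) + (D**l - 1) / (D - 1) * math.log(a2)
    rhs = (D**l - 1) * math.log(alpha)
    return lhs, rhs


# Hypothetical values with alpha_1 <= alpha_2 and alpha < 1.
for a1, a2, D in [(0.2, 3.0, 2), (0.3, 2.5, 3), (0.05, 40.0, 4)]:
    for l in range(6):
        lhs, rhs = log_chain_sides(a1, a2, D, l)
        # In exact arithmetic the gap equals log(a1).
        assert math.isclose(lhs - rhs, math.log(a1), abs_tol=1e-8)
        assert lhs <= rhs
print("exponent bookkeeping consistent")
```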

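To analyze the parameters for which $\Pi_{\lambda,d,d_L,d_R}(h, 1^h) \to 0$, we need to compute $\alpha_1$ and $\alpha_2$ as functions of $p$, $d$, $d_L$, and $d_R$. Let $k$ denote the number of $-1$ values among the $d'_R$ i.i.d. copies of $\gamma$ that enter (7.3) for $l = 0$. Then

$$X_0 = \begin{cases} d' - 2k & \text{w.p. } \binom{d'_R}{k} p^k (1-p)^{d'_R - k}, \quad \forall k:\ 0 \leq k < d', \\ -d' & \text{w.p. } \sum_{k=d'}^{d'_R} \binom{d'_R}{k} p^k (1-p)^{d'_R - k}. \end{cases} \qquad (7.6)$$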
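For $h = 1$ (so $\omega = (1)$ and no recursion is needed), the distribution (7.6) makes it possible to evaluate $\Pi_{\lambda,d,d_L,d_R}(1, 1^1)$ exactly and compare it with the Chernoff-type bound of Lemma 7.5. The Python sketch below does this for the scaled BSC. The function names and the grid search over $t$ are our own choices; since Lemma 7.5 holds for every $t$, the grid minimum is still a valid (possibly slightly loose) bound.

```python
import math
from collections import defaultdict


def x0_pmf(p, d, d_R):
    """Distribution of X_0 in (7.6) for the scaled BSC (gamma = -1 w.p. p)."""
    dp, dRp = d - 1, d_R - 1
    pmf = defaultdict(float)
    for k in range(dRp + 1):  # k = number of -1 values among the d'_R copies
        prob = math.comb(dRp, k) * p**k * (1 - p) ** (dRp - k)
        pmf[dp - 2 * k if k < dp else -dp] += prob
    return dict(pmf)


def pi_h1_exact(p, d, d_L, d_R):
    """Pr{X_0^(1) + ... + X_0^(d_L) <= 0} by direct convolution."""
    dist = {0: 1.0}
    base = x0_pmf(p, d, d_R)
    for _ in range(d_L):
        new = defaultdict(float)
        for s, ps in dist.items():
            for x, px in base.items():
                new[s + x] += ps * px
        dist = new
    return sum(pr for s, pr in dist.items() if s <= 0)


def chernoff_h1(p, d, d_L, d_R, t_max=10.0, steps=5000):
    """Grid estimate of min_t (E e^{-t X_0})^{d_L}, i.e. Lemma 7.5 with h = 1."""
    base = x0_pmf(p, d, d_R)
    return min(sum(px * math.exp(-t * x) for x, px in base.items()) ** d_L
               for t in (t_max * i / steps for i in range(steps + 1)))


for p in (0.005, 0.02, 0.05):
    exact, bound = pi_h1_exact(p, 3, 2, 16), chernoff_h1(p, 3, 2, 16)
    print(f"p={p}: Pi(1) = {exact:.3e} <= bound {bound:.3e}")
```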

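Therefore,

$$\alpha_1(p, d, d_L, d_R, t) = \sum_{k=0}^{d'-1} \binom{d'_R}{k} p^k (1-p)^{d'_R - k} e^{-t(d' - 2k)} \qquad (7.7)$$

$$\qquad\qquad + \Big( \sum_{k=d'}^{d'_R} \binom{d'_R}{k} p^k (1-p)^{d'_R - k} \Big) e^{t d'}, \quad\text{and} \qquad (7.8)$$

$$\alpha_2(p, d, d_L, d_R, t) = \binom{d'_R}{d'} \big( (1-p) e^{-t} + p e^{t} \big)^{d'}. \qquad (7.9)$$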

The above calculations give the following bound on $\Pi_{\lambda,d,d_L,d_R}(h, 1^h)$.

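Lemma 7.7 Let $p \in (0, \frac{1}{2})$ and let $d, d_L, d_R \geq 2$ such that $d'_L \cdot d' \geq 2$. Denote by $\alpha_1$ and $\alpha_2$ the functions defined in (7.7)-(7.9), and let

$$\alpha = \min_{t > 0} \big(\alpha_1(p, d, d_L, d_R, t)\big) \cdot \big(\alpha_2(p, d, d_L, d_R, t)\big)^{1/(d'_L \cdot d' - 1)}.$$

Then, for $h \in \mathbb{N}$ and $\omega = 1^h$, we have

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega) \leq \alpha^{d_L \cdot (d'_L \cdot d')^{h-1} - d_L}.$$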


Note that $\Pi_{\lambda,d,d_L,d_R}(h, 1^h)$ decreases doubly-exponentially as a function of $h$.
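The constant $\alpha$ of Lemma 7.7 can be evaluated numerically from (7.7)-(7.9). The Python sketch below approximates the minimum over $t$ by a grid search, which is an assumption of this sketch; any one-dimensional minimizer would do. A grid can only overestimate the true minimum, so a grid value below 1 already certifies $\alpha < 1$. `alpha_terms` and `alpha_uniform` are illustrative names.

```python
import math


def alpha_terms(p, d, d_L, d_R, t):
    """(alpha_1, alpha_2) from (7.7)-(7.9) for the scaled BSC."""
    dp, dRp = d - 1, d_R - 1

    def binom_pmf(k):
        return math.comb(dRp, k) * p**k * (1 - p) ** (dRp - k)

    a1 = sum(binom_pmf(k) * math.exp(-t * (dp - 2 * k)) for k in range(dp))
    a1 += sum(binom_pmf(k) for k in range(dp, dRp + 1)) * math.exp(t * dp)
    a2 = math.comb(dRp, dp) * ((1 - p) * math.exp(-t) + p * math.exp(t)) ** dp
    return a1, a2


def alpha_uniform(p, d, d_L, d_R, t_max=10.0, steps=10_000):
    """Grid estimate of min_t alpha_1 * alpha_2^{1/(d'_L d' - 1)} (Lemma 7.7)."""
    D = (d_L - 1) * (d - 1)
    if D < 2:
        raise ValueError("Lemma 7.7 requires d'_L * d' >= 2")
    best = math.inf
    for i in range(1, steps + 1):
        a1, a2 = alpha_terms(p, d, d_L, d_R, t_max * i / steps)
        best = min(best, a1 * a2 ** (1.0 / (D - 1)))
    return best


for p in (0.002, 0.004, 0.006):
    print(p, alpha_uniform(p, d=3, d_L=2, d_R=16))
```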

For $(2, 16)$-regular graphs and $d \in \{3, 4\}$, we obtain the following corollary.

Corollary 7.8 Let $d_L = 2$ and $d_R = 16$.

1. Let $d = 3$ and $p \leq 0.0067$. Then, there exists a constant $\alpha < 1$ such that for every $h \in \mathbb{N}$ and $\omega = 1^h$,
$$\Pi_{\lambda,d,d_L,d_R}(h, 1^h) \leq \alpha^{2^{h-1}}.$$

2. Let $d = 4$ and $p \leq 0.0165$. Then, there exists a constant $\alpha < 1$ such that for every $h \in \mathbb{N}$ and $\omega = 1^h$,
$$\Pi_{\lambda,d,d_L,d_R}(h, 1^h) \leq \alpha^{3^{h-1}}.$$

The bound on $p$ for which Corollary 7.8 applies grows with $d$. This confirms that an analysis based on denser trees, i.e., $d$-trees with $d > 2$ instead of skinny trees, implies better bounds on the error probability and higher lower bounds on the threshold. Also, for $d > 2$, we may apply the analysis to $(2, d_R)$-regular codes, a case to which the analysis of Arora et al. [ADS09] does not apply.

Improved Bounds Using Non-Uniform Weights

The following lemma implies an improved bound on $\Pi_{\lambda,d,d_L,d_R}(h, \omega)$ using a non-uniform weight vector $\omega$.

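Lemma 7.9 Let $p \in (0, \frac{1}{2})$ and let $d, d_L, d_R \geq 2$ such that $d'_L \cdot d' \geq 2$. For $s \in \mathbb{N}$ and a weight vector $\omega \in \mathbb{R}^s_+$, let

$$\alpha = \min_{t > 0} \mathbb{E}\, e^{-tX_s} \cdot \Big( \binom{d'_R}{d'} \big(2\sqrt{p(1-p)}\big)^{d'} \Big)^{\frac{1}{d'_L \cdot d' - 1}}. \qquad (7.10)$$

Let $\omega^{(\rho)} \in \mathbb{R}^h_+$ denote the concatenation of the vector $\omega \in \mathbb{R}^s_+$ and the vector $(\rho, \ldots, \rho) \in \mathbb{R}^{h-s}_+$. Then, for every $h > s$ there exists a constant $\rho > 0$ such that

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega^{(\rho)}) \leq \Big( \binom{d'_R}{d'} \big(2\sqrt{p(1-p)}\big)^{d'} \Big)^{-\frac{d'_L}{d'_L \cdot d' - 1}} \cdot \alpha^{d_L \cdot (d'_L \cdot d')^{h-s-1}}.$$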

Proof: See Appendix 7.B. □

Consider a weight vector $\omega$ with components $\omega_l = ((d_L - 1)(d - 1))^l$. This weight vector has the effect that every level in a $d$-tree $\tau$ contributes equally to $\mathrm{val}_\omega(\tau; |\lambda|)$ (note that $|\lambda| \equiv \vec{1}$). For $h > s$, consider a weight vector $\omega^{(\rho)} \in \mathbb{R}^h_+$ defined by

$$\omega^{(\rho)}_l = \begin{cases} \omega_l & \text{if } 0 \leq l < s, \\ \rho & \text{if } s \leq l < h. \end{cases}$$

Note that the first $s$ components of $\omega^{(\rho)}$ are geometric, while the other components are uniform.


s p0 s p0

0 0.0086 4 0.0164

1 0.011 5 0.0171

2 0.0139 6 0.0177

3 0.0154 10 0.0192

Table 7.2: Computed values of $p_0$ for finite $s$ in Corollary 7.10. Values are presented for $(d_L, d_R) = (2, 16)$ and $d = 3$.

s p0 s p0

0 0.0218 4 0.039

1 0.0305 5 0.0405

2 0.0351 6 0.0415

3 0.0375 10 0.044

Table 7.3: Computed values of $p_0$ for finite $s$ in Corollary 7.11. Values are presented for $(d_L, d_R) = (2, 16)$ and $d = 4$.

For given $p$, $d$, $d_L$, and $d_R$, and for a concrete value $s$, we can compute the distribution of $X_s$ using the recursion in (7.2)-(7.4). Moreover, we can also compute the value $\min_{t > 0} \mathbb{E}\, e^{-tX_s}$. For $(2, 16)$-regular graphs we obtain the following corollaries: Corollary 7.10 is stated for the case $d = 3$, and Corollary 7.11 for the case $d = 4$.
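For $s = 0$, the only distribution needed in (7.10) is that of $X_0$, which (7.6) gives when $\omega_0 = 1$. Taking $\omega_0 = 1$ loses no generality, because $\min_t \mathbb{E}\, e^{-t \rho X}$ does not depend on a positive scale $\rho$. The Python sketch below evaluates $\alpha$ of (7.10) in this case by a grid search over $t$; `alpha_s0` is an illustrative name. For $d = 3$ and $p = 0.0086$ it gives a value just below 1, which is consistent with the $s = 0$ row of Table 7.2.

```python
import math


def alpha_s0(p, d, d_L, d_R, t_max=10.0, steps=10_000):
    """Grid estimate of alpha in (7.10) for s = 0 on the scaled BSC."""
    dp, dLp, dRp = d - 1, d_L - 1, d_R - 1
    D = dLp * dp
    if D < 2:
        raise ValueError("Lemma 7.9 requires d'_L * d' >= 2")
    # Second factor of (7.10); it does not depend on t.
    c = math.comb(dRp, dp) * (2 * math.sqrt(p * (1 - p))) ** dp
    pmf = [math.comb(dRp, k) * p**k * (1 - p) ** (dRp - k) for k in range(dRp + 1)]

    def mgf_x0(t):  # E exp(-t X_0), with X_0 distributed as in (7.6)
        head = sum(pmf[k] * math.exp(-t * (dp - 2 * k)) for k in range(dp))
        return head + sum(pmf[dp:]) * math.exp(t * dp)

    best = min(mgf_x0(t_max * i / steps) for i in range(1, steps + 1))
    return best * c ** (1.0 / (D - 1))


print(alpha_s0(0.0086, d=3, d_L=2, d_R=16))  # just below 1 (cf. Table 7.2, s = 0)
```

Unlike Lemma 7.7, the two factors are optimized separately here: the $\rho$-scaled levels contribute the $t$-independent factor $\binom{d'_R}{d'}(2\sqrt{p(1-p)})^{d'}$, which is the minimum of $\binom{d'_R}{d'}(\mathbb{E}\, e^{-t\gamma})^{d'}$ over $t$.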

Corollary 7.10 Let $p \leq p_0$, $d = 3$, $d_L = 2$, and $d_R = 16$. For the values of $p_0$ and $s$ in Table 7.2, there exist constants $\rho > 0$ and $\alpha < 1$ such that for every $h > s$,

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega^{(\rho)}) \leq \frac{1}{420} \big(p(1-p)\big)^{-1} \cdot \alpha^{2^{h-s}}.$$

Corollary 7.11 Let $p \leq p_0$, $d = 4$, $d_L = 2$, and $d_R = 16$. For the values of $p_0$ and $s$ in Table 7.3, there exist constants $\rho > 0$ and $\alpha < 1$ such that for every $h > s$,

$$\Pi_{\lambda,d,d_L,d_R}(h, \omega^{(\rho)}) \leq \frac{1}{60} \big(p(1-p)\big)^{-\frac{3}{4}} \cdot \alpha^{3^{h-s}}.$$

Note that for a fixed $s$, the probability $\Pi_{\lambda,d,d_L,d_R}(h, \omega^{(\rho)})$ decreases doubly-exponentially as a function of $h$.

7.2.3 Analysis for MBIOS Channels

Theorem 7.1 generalizes to MBIOS channels as follows.

Theorem 7.12 Let $G$ denote a $(d_L, d_R)$-regular bipartite graph with girth $\Omega(\log N)$, and let $\mathcal{C}(G) \subset \{0,1\}^N$ denote a Tanner code based on $G$ with minimum local distance $d^*$. Consider an MBIOS channel, and let $\lambda \in \mathbb{R}^N$ denote the LLR vector received from the channel given $c = 0^N$. Let $\gamma \in \mathbb{R}$ denote a random variable distributed identically to the (i.i.d.) components of $\lambda$. Then, for any $(d_L, d_R)$ and $2 \leq d \leq d^*$ such that $(d_L - 1)(d - 1) \geq 2$, LP-decoding succeeds with probability at least $1 - \exp(-N^\delta)$ for some constant $0 < \delta < 1$, provided that

$$\min_{t > 0} \mathbb{E}\, e^{-tX_0} \cdot \Big( \binom{d_R - 1}{d - 1} \big(\mathbb{E}\, e^{-t\gamma}\big)^{d-1} \Big)^{\frac{1}{(d_L - 1)(d - 1) - 1}} < 1,$$

where $X_0 = \sum_{i=1}^{d-1} \min^{[i]}\{\gamma^{(1)}, \ldots, \gamma^{(d_R - 1)}\}$ and the random variables $\gamma^{(i)}$ are independent and distributed identically to $\gamma$.

7.3 Discussion

In this chapter, new bounds on LP-decoding failure are proved for regular Tanner codes. In particular, we considered a concrete example of $(2, 16)$-regular Tanner codes with $[16, 11, 4]$-extended Hamming codes as local codes and Tanner graphs with logarithmic girth. The rate of such codes is at least 0.375. For a BSC with crossover probability $p$, we prove a lower bound of $p^* = 0.044$ on the noise threshold. Below that threshold, the word error probability decreases doubly-exponentially in the girth of the Tanner graph.

Most of the research on the error correction of Tanner codes deals with families of expander Tanner codes. How do the bounds presented in Section 7.2 compare with results on expander Tanner codes? The error correction capability of expander codes depends on the expansion; thus, a fairly large degree and huge block lengths are required to achieve good error correction. Our example, for which results are stated in Theorems 7.1(1) and 7.1(2), relies only on a 16-regular graph with logarithmic girth. Sipser and Spielman [SS96] studied Tanner codes based on expander graphs and analyzed a simple bit-flipping iterative decoding algorithm. Their novel scheme was later improved, and it was shown that expander Tanner codes can even asymptotically achieve capacity in the BSC with an iterative bit-flipping decoding scheme [Z01, BZ02, BZ04]. In these works, a worst-case analysis (for an adversarial channel) was performed as well.

The best result for iterative decoding of such expander codes, reported by Skachek and Roth [SR03], implies a lower bound of $p^* = 0.0016$ on the threshold of a certain iterative decoder for rate-0.375 codes. Feldman and Stein [FS05] proved that LP-decoding can asymptotically achieve capacity with a special family of expander Tanner codes. They also presented a worst-case analysis, which, for a code rate of 0.375, proves that LP decoding can recover any pattern of at most $0.0008N$ bit flips. This implies a lower bound of $p^* = 0.0008$ on the noise threshold. These analyses yield overly pessimistic predictions for the average case (i.e., the BSC). Theorem 7.1(2) deals with average-case analysis and implies that LP-decoding can correct up to $0.044N$ bit flips with high probability. Furthermore, previous iterative decoding algorithms for expander Tanner codes deal only with bit-flipping channels. Our analysis for LP-decoding applies to any MBIOS channel; in particular, it can be applied to the BI-AWGN channel.

However, the lower bounds on the noise threshold proved for Tanner codes do not improve the best previous bounds for regular LDPC codes. An open question is whether using deviations denser than skinny trees for Tanner codes can beat the best previous bounds for regular LDPC codes [ADS09, HE11a]. In particular, for a concrete family of Tanner codes with rate 1/2, it would be interesting to prove lower bounds on the threshold of LP-decoding that are larger than p∗ = 0.05 in the case of the BSC, and σ∗ = 0.735 in the case of the BI-AWGN channel.

7.A Proof of Lemma 7.6

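Proof: We prove the lemma by induction on the difference l − s. We first derive an equality for $\mathbb{E}e^{-tY_l}$ and a bound for $\mathbb{E}e^{-tX_l}$. Since $Y_l$ is the sum of mutually independent variables,

$$\mathbb{E}e^{-tY_l} = \left(\mathbb{E}e^{-t\omega_l\gamma}\right)\left(\mathbb{E}e^{-tX_{l-1}}\right)^{d'_L}. \tag{7.11}$$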

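By definition of $X_l$, we have the following bound:

$$e^{-tX_l} = e^{-t\sum_{j=1}^{d'}\min^{[j]}\{Y^{(i)}_l \,:\, 1\le i\le d'_R\}} = \prod_{j=1}^{d'} e^{-t\min^{[j]}\{Y^{(i)}_l \,:\, 1\le i\le d'_R\}} \le \sum_{S\subseteq[d'_R]\,:\,|S|=d'}\ \prod_{i\in S} e^{-tY^{(i)}_l}.$$

The inequality holds because the product equals the summand indexed by the set of the $d'$ smallest variables, and all summands are positive.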

Therefore, from linearity of expectation and since $\{Y^{(i)}_l\}_{i=1}^{d'_R}$ are mutually independent variables, we have

$$\mathbb{E}e^{-tX_l} \le \binom{d'_R}{d'}\left(\mathbb{E}e^{-tY_l}\right)^{d'}. \tag{7.12}$$

By substituting (7.11) in (7.12), we get

$$\mathbb{E}e^{-tX_l} \le \left(\mathbb{E}e^{-tX_{l-1}}\right)^{d'_L\cdot d'}\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\omega_l\gamma}\right)^{d'}, \tag{7.13}$$

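which proves the induction basis where s = l − 1. Suppose, therefore, that the lemma holds for l − s = i; we now prove it for l − (s − 1) = i + 1. By substituting (7.13), with l replaced by s, in the induction hypothesis, we have

$$\begin{aligned}
\mathbb{E}e^{-tX_l} &\le \left(\mathbb{E}e^{-tX_s}\right)^{(d'_L\cdot d')^{l-s}} \cdot \prod_{k=0}^{l-s-1}\left(\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\omega_{l-k}\gamma}\right)^{d'}\right)^{(d'_L\cdot d')^{k}}\\
&\le \left[\left(\mathbb{E}e^{-tX_{s-1}}\right)^{d'_L\cdot d'}\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\omega_s\gamma}\right)^{d'}\right]^{(d'_L\cdot d')^{l-s}} \cdot \prod_{k=0}^{l-s-1}\left(\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\omega_{l-k}\gamma}\right)^{d'}\right)^{(d'_L\cdot d')^{k}}\\
&= \left(\mathbb{E}e^{-tX_{s-1}}\right)^{(d'_L\cdot d')^{l-s+1}} \cdot \prod_{k=0}^{l-s}\left(\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\omega_{l-k}\gamma}\right)^{d'}\right)^{(d'_L\cdot d')^{k}},
\end{aligned}$$

which concludes the induction step for a difference of l − s + 1. □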
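The induction only unrolls the one-step bound (7.13). As a sanity check on the exponent bookkeeping, here is a minimal Python sketch (our illustration, not part of the thesis) that iterates (7.13) in the log domain and compares the result with the closed form of Lemma 7.6. The symbols are passed in directly (`K` for $d'_L\cdot d'$, `log_C` for $\log\binom{d'_R}{d'}$, `dp` for $d'$); the example's mapping $d'_L = d_L-1$, $d'_R = d_R-1$, $d' = d-1$ and all numeric inputs are assumptions chosen only for illustration.

```python
import math
from typing import Sequence


def log_bound_iterated(m_s: float, g: Sequence[float], s: int, l: int,
                       K: int, log_C: float, dp: int) -> float:
    """Apply the one-step bound (7.13) l - s times, in the log domain.

    m_s   : log E[exp(-t X_s)]
    g[j]  : log E[exp(-t * omega_j * gamma)] for level j
    K     : d'_L * d'
    log_C : log binom(d'_R, d')
    dp    : d'
    """
    m = m_s
    for level in range(s + 1, l + 1):
        # log of (E e^{-tX_{level-1}})^K * binom(d'_R, d') * (E e^{-t omega_level gamma})^{d'}
        m = K * m + log_C + dp * g[level]
    return m


def log_bound_closed_form(m_s: float, g: Sequence[float], s: int, l: int,
                          K: int, log_C: float, dp: int) -> float:
    """Log of the right-hand side of Lemma 7.6."""
    total = K ** (l - s) * m_s
    for k in range(l - s):
        total += K ** k * (log_C + dp * g[l - k])
    return total


# Hypothetical example: (2,16)-regular Tanner code with degree parameter d = 4,
# assuming d'_L = d_L - 1, d'_R = d_R - 1 and d' = d - 1.
d_L, d_R, d = 2, 16, 4
K = (d_L - 1) * (d - 1)
log_C = math.log(math.comb(d_R - 1, d - 1))
g = [-0.3 - 0.01 * j for j in range(12)]  # arbitrary per-level values
a = log_bound_iterated(-2.0, g, 0, 11, K, log_C, d - 1)
b = log_bound_closed_form(-2.0, g, 0, 11, K, log_C, d - 1)
print(math.isclose(a, b, rel_tol=1e-9))  # True
```

Working in the log domain keeps the doubly exponential exponents $(d'_L\cdot d')^{k}$ from overflowing floating-point numbers.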


7.B Proof of Lemma 7.9

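Proof: By Lemma 7.6, we have

$$\mathbb{E}e^{-tX_{h-1}} \le \left(\mathbb{E}e^{-tX_s}\right)^{(d'_L\cdot d')^{h-s-1}} \cdot \left(\binom{d'_R}{d'}\left(\mathbb{E}e^{-t\rho\eta}\right)^{d'}\right)^{\frac{(d'_L\cdot d')^{h-s-1}-1}{d'_L\cdot d'-1}},$$

where the last exponent is the geometric sum $\sum_{k=0}^{h-s-2}(d'_L\cdot d')^{k}$. Note that $\mathbb{E}e^{-t\rho\eta}$ is minimized for $e^{t\rho} = \sqrt{(1-p)/p}$, where it equals $2\sqrt{p(1-p)}$. Hence,

$$\begin{aligned}
\mathbb{E}e^{-tX_{h-1}} &\le \left(\mathbb{E}e^{-tX_s}\right)^{(d'_L\cdot d')^{h-s-1}} \cdot \left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{\frac{(d'_L\cdot d')^{h-s-1}-1}{d'_L\cdot d'-1}}\\
&\le \left[\mathbb{E}e^{-tX_s}\left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{\frac{1}{d'_L\cdot d'-1}}\right]^{(d'_L\cdot d')^{h-s-1}} \cdot \left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{-\frac{1}{d'_L\cdot d'-1}}.
\end{aligned}$$

Let $\alpha \triangleq \min_{t>0} \mathbb{E}e^{-tX_s}\left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{\frac{1}{d'_L\cdot d'-1}}$. By (7.10), α < 1. Let $t^* = \arg\min_{t>0}\mathbb{E}e^{-tX_s}$; then

$$\mathbb{E}e^{-t^*X_{h-1}} \le \alpha^{(d'_L\cdot d')^{h-s-1}} \cdot \left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{-\frac{1}{d'_L\cdot d'-1}}.$$

Using Lemma 7.5, we conclude that

$$\Pi_{\lambda,d,d_L,d_R}(h,\omega(\rho)) \le \alpha^{d_L(d'_L\cdot d')^{h-s-1}} \cdot \left(\binom{d'_R}{d'}\left(2\sqrt{p(1-p)}\right)^{d'}\right)^{-\frac{d_L}{d'_L\cdot d'-1}}.$$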
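The Chernoff-type step above, minimizing $\mathbb{E}e^{-t\rho\eta}$ over t, can be checked numerically. The sketch below is our own illustration and assumes (this normalization is not stated in this section) that for the BSC the variable η equals +1 with probability 1 − p and −1 with probability p; `rho`, `p` and the grid are illustrative values.

```python
import math


def mgf_bsc(t: float, rho: float, p: float) -> float:
    """E[exp(-t*rho*eta)], assuming eta = +1 w.p. 1-p and eta = -1 w.p. p."""
    return (1 - p) * math.exp(-t * rho) + p * math.exp(t * rho)


def chernoff_t(rho: float, p: float) -> float:
    """The t > 0 with exp(t*rho) = sqrt((1-p)/p); requires 0 < p < 1/2 and rho > 0."""
    return math.log((1 - p) / p) / (2 * rho)


p, rho = 0.04, 0.7
t_star = chernoff_t(rho, p)
# Value at the minimizer vs. the closed form 2*sqrt(p(1-p)); equal up to rounding.
print(mgf_bsc(t_star, rho, p), 2 * math.sqrt(p * (1 - p)))
# A coarse grid search over t > 0 does no better than t_star.
grid_min = min(mgf_bsc(0.01 * i, rho, p) for i in range(1, 1000))
print(grid_min >= mgf_bsc(t_star, rho, p))  # True
```

Since the moment generating function is convex in t, the stationary point found in closed form is the global minimum, which is what the grid search confirms.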



Chapter 8

Hierarchies of Local-Optimality

In this chapter¹, we present hierarchies of locally-optimal codewords with respect to two parameters. One parameter is related to the minimum distance of the local codes in Tanner codes. The second parameter is related to the finite number of iterations used in iterative decoding, even when the number of iterations exceeds the girth of the Tanner graph. We show that these hierarchies satisfy inclusion properties as these parameters are increased. In particular, this implies that a codeword that is decoded with a certificate using an iterative decoder (nwms) after h iterations is decoded with a certificate after k · h iterations, for every positive integer k.

The results presented in this chapter are based on the definitions in Chapters 2 and 5. In the following, let G = (V ∪ J, E) denote a Tanner graph and let $C(G) \subset \{0,1\}^N$ denote a Tanner code with minimum local distance d∗. Let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ denote a non-negative weight vector of length h ∈ ℕ. Unless stated otherwise, let 2 ≤ d ≤ d∗.

8.1 Introduction

Suboptimal decoding of expander Tanner codes was analyzed in many works (see, e.g., [SS96, BZ04, FS05]). The results in these analyses rely on: (i) the expansion properties of the Tanner graph, and (ii) constant relative minimum distances of the local codes. The error-correcting guarantees in these analyses improve as the expansion factor and the relative minimum distance increase. In Section 8.3 we focus on the effect of increasing the minimum distance of the local codes on the error-correcting guarantees of Tanner codes under ML-decoding and LP-decoding.

Density Evolution (DE) is used to study the asymptotic performance of decoding algorithms based on Belief-Propagation (BP) (see, e.g., [RU01, CF02]). Convergence of BP-based decoding algorithms to some fixed point was studied in [FK00, WF01, WJW05, JP11]. However, convergence guarantees do not imply successful decoding after a finite number of iterations. Korada and Urbanke [KU11] provide an asymptotic analysis of iterative decoding “beyond” the girth. Specifically, they prove that one may exchange the order of the limits in DE-analysis of BP-decoding under certain conditions (i.e., variable node degree at least 5 and bounded LLRs). In contrast, in Section 8.4 we focus on properties of iterative decoding of finite-length codes using a finite number of iterations.

The characterization of local-optimality for Tanner codes has three parameters: (i) a height h ∈ ℕ, (ii) a vector of level weights $w \in \mathbb{R}^h_+$, and (iii) a degree 2 ≤ d ≤ d∗, where d∗ is the minimum local distance. We define hierarchies of local-optimality with respect to the degree and height parameters. These hierarchies provide a partial explanation for two questions about successful decoding with ML-certificates: (1) What is the effect of increasing the minimum distance of the local codes in Tanner codes? (2) What is the effect of increasing the number of iterations beyond the girth in iterative decoding?

¹The research work presented in this chapter is based on [HE12a, HE12d].

Contributions. To obtain one of the hierarchy results, we needed a new definition of local-optimality called strong local-optimality. We prove that if a codeword is strongly locally-optimal, then it is also locally-optimal (Lemma 8.5). Hence, the results proved for local-optimality in Sections 5.2–5.3 hold also for strong local-optimality.

We present two combinatorial hierarchies:

1. A hierarchy of local-optimality based on degrees. The degree hierarchy states that a locally-optimal codeword x with degree parameter d is also locally-optimal with respect to any degree parameter d′ with d < d′ ≤ d∗. The degree hierarchy implies that the occurrence of local-optimality does not decrease as the degree parameter increases.

2. A hierarchy of strong local-optimality based on height. The height hierarchy states that if a codeword x is strongly locally-optimal with respect to height parameter h, then it is also strongly locally-optimal with respect to every height parameter that is an integer multiple of h. The height hierarchy proves, for example, that the performance of iterative decoding with an ML-certificate (e.g., nwms) of finite-length Tanner codes with SPC local codes does not degrade as the number of iterations grows, even beyond the girth of the Tanner graph.

Organization. In Section 8.2 we introduce a key trimming procedure used in the proofs of the hierarchies. In Section 8.3 we prove that the degree-based hierarchy induces a chain of inclusions of locally-optimal codewords and LLRs. In Section 8.4 we prove a height-based hierarchy over strong local-optimality, and we show that strong local-optimality implies local-optimality. Numerical results of strong local-optimality and local-optimality with respect to the height hierarchy are presented in Section 8.5. We conclude with a discussion in Section 8.6.

8.2 Trimming Subtrees from a Path-Prefix Tree

In this section we use some basic definitions of the local-optimality characterization. We refer the reader to Section 5.2 for the definitions of a path-prefix tree (Definition 5.1), a d-tree (Definition 5.2), and a w-weighted subtree (Definition 5.3).

Let $T_q$ denote the subtree of a path-prefix tree T hanging from path q, i.e., the subtree induced by $V_q \triangleq \{p \in V \cup J \mid q \in \mathrm{Prefix}^+(p) \text{ or } p = q\}$ (see Figure 8.1). Let $\mathrm{Trim}(T, q)$ denote the subtree of T obtained by deleting the subtree $T_q$ from T. Formally, $\mathrm{Trim}(T, q)$ is the path-prefix subtree of T induced by $(V \cup J) \setminus V_q$. Note that if q′ is a sibling of q (i.e., q′ differs from q only in the last edge), then the degree of the parent of q and q′ decreases by one as a result of trimming $V_q$. Hence, $w_T(q'') < w_{\mathrm{Trim}(T,q)}(q'')$ for every variable path $q'' \in V_{q'}$.

Figure 8.1: Trimmed tree of T induced by q. (The figure marks q, its sibling q′, a path q′′ below q′, and the two parts Trim(T, q) and T_q.)
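To make $V_q$, Trim and the degree drop concrete, here is a small Python sketch (our own illustration). It assumes a path is represented as the tuple of node labels it visits, so a path-prefix subtree is just a set of tuples; the node labels in the toy tree are hypothetical.

```python
from typing import FrozenSet, Tuple

Path = Tuple[str, ...]  # a path of G, written as the sequence of node labels it visits


def hanging_subtree(T: FrozenSet[Path], q: Path) -> FrozenSet[Path]:
    """V_q restricted to T: the paths of T that have q as a prefix (including q itself)."""
    return frozenset(p for p in T if p[:len(q)] == q)


def trim(T: FrozenSet[Path], q: Path) -> FrozenSet[Path]:
    """Trim(T, q): delete the subtree T_q hanging from q."""
    return T - hanging_subtree(T, q)


def degree(T: FrozenSet[Path], p: Path) -> int:
    """Degree of p in T: its children, plus the edge to its parent unless p is the root."""
    children = sum(1 for c in T if len(c) == len(p) + 1 and c[:-1] == p)
    return children + (1 if len(p) > 1 else 0)


# Toy path-prefix subtree rooted at the (hypothetical) variable node "v0".
T = frozenset({
    ("v0",), ("v0", "c0"),
    ("v0", "c0", "v1"), ("v0", "c0", "v2"), ("v0", "c0", "v3"),
    ("v0", "c0", "v1", "c1"), ("v0", "c0", "v1", "c1", "v4"),
})
q = ("v0", "c0", "v1")
T_trimmed = trim(T, q)
print(degree(T, ("v0", "c0")), degree(T_trimmed, ("v0", "c0")))  # 4 3
```

Trimming q removes q together with everything below it, and the common parent ("v0", "c0") of q and its siblings loses exactly one unit of degree, which is what raises the weights of the paths below the siblings.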

The proofs of the hierarchies presented in this chapter are based on the following lemma.

Lemma 8.1 Let T denote a subtree of a path-prefix tree $T^{2h}_r(G)$. For every path p ∈ T with at least two children in T, there exists at least one child p′ of p such that

$$\langle\lambda, \pi_{G,T,w}\rangle \ge \langle\lambda, \pi_{G,\mathrm{Trim}(T,p'),w}\rangle.$$

Proof: See Appendix 8.A. □

8.3 Degree Hierarchy of Local-Optimality

Let Λ ⊆ ℝ^N denote a set of LLR vectors. Denote by $\mathrm{lo}_{C,\Lambda}(h, w, d)$ the set of pairs (x, λ) ∈ C × Λ such that x is (h, w, d)-locally optimal w.r.t. λ. Formally,

$$\mathrm{lo}_{C,\Lambda}(h, w, d) \triangleq \{(x, \lambda) \in C \times \Lambda \mid x \text{ is } (h, w, d)\text{-locally optimal w.r.t. } \lambda\}. \tag{8.1}$$

The following theorem derives a hierarchy on the “density” of deviations in the local-optimality characterization.

Theorem 8.2 (d-Hierarchy of local-optimality) Let 2 ≤ d < d∗. For every Λ ⊆ ℝ^N,

$$\mathrm{lo}_{C,\Lambda}(h, w, d) \subseteq \mathrm{lo}_{C,\Lambda}(h, w, d+1).$$


Proof: We prove the contrapositive statement. Assume that x is not (h, w, d + 1)-locally optimal w.r.t. λ. By Proposition 5.8, $0^N$ is not (h, w, d + 1)-locally optimal w.r.t. $\lambda^0 \triangleq (-1)^x * \lambda$. Hence, there exists a deviation $\beta = \pi_{G,T,w} \in \mathcal{B}^{(w)}_{d+1}$ such that $\langle\lambda^0, \beta\rangle \le 0$. Let T denote the (d + 1)-tree that corresponds to the deviation β.

Consider the following iterative trimming process. Start with the (d + 1)-tree T and let T′ ← T. While there exists a local-code path p ∈ T′ such that $\deg_{T'}(p) = d + 1$, do: $T' \leftarrow \mathrm{Trim}(T', q)$, where q is a child of p such that $\langle\lambda^0, \pi_{G,T',w}\rangle \ge \langle\lambda^0, \pi_{G,\mathrm{Trim}(T',q),w}\rangle$.

Lemma 8.1 guarantees that such a child q exists in every iteration (each such p has d ≥ 2 children), so the iterative trimming process halts with a d-tree T′ whose corresponding deviation $\beta' = \pi_{G,T',w}$ satisfies $\langle\lambda^0, \beta'\rangle \le \langle\lambda^0, \beta\rangle \le 0$. We conclude by Proposition 5.8 that x is not (h, w, d)-locally optimal w.r.t. λ, as required. □

We conclude that for every 2 ≤ d < d∗,

$$\Pr_\lambda\{x \text{ is } (h, w, d+1)\text{-locally optimal w.r.t. } \lambda\} \ge \Pr_\lambda\{x \text{ is } (h, w, d)\text{-locally optimal w.r.t. } \lambda\}.$$

In Chapter 7 we prove upper bounds on the probability that a codeword x is not locally-optimal for regular Tanner codes over MBIOS channels. Specifically, we present an analysis for the BSC. Indeed, the bounds in Theorem 7.1 improve as d increases.
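The iterative trimming process in the proof of Theorem 8.2 can be sketched in Python on a toy tree. This is our own illustration, not code from the thesis: the tree shape, the labels, the LLR values and especially the weight rule are assumptions. Each variable path gets a per-level weight times the product, over its strict ancestors a, of 1/(number of children of a); this only mimics the structure of Definition 5.3 (a factor 1/(deg_T(a) − 1) for non-root ancestors). Under that rule each trimming step is exactly the averaging step of Proposition 8.8, so the cost never increases.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    kind: str                                   # "var" (variable path) or "chk" (local-code path)
    label: int = -1                             # t(q): variable-node index, for "var" nodes only
    children: List["Node"] = field(default_factory=list)


def cost(node: Node, lam: Dict[int, float], level_w: List[float],
         factor: float = 1.0, depth: int = 0) -> float:
    """Sum of lam[t(q)] * w(q) over the variable paths q in the subtree of `node`."""
    total = lam[node.label] * level_w[depth // 2] * factor if node.kind == "var" else 0.0
    k = len(node.children)
    for child in node.children:
        total += cost(child, lam, level_w, factor / k, depth + 1)
    return total


def find_check(node: Node, n_children: int, depth: int = 0) -> Optional[Tuple[Node, int]]:
    """Pre-order search for a local-code node with exactly `n_children` children."""
    if node.kind == "chk" and len(node.children) == n_children:
        return node, depth
    for child in node.children:
        hit = find_check(child, n_children, depth + 1)
        if hit is not None:
            return hit
    return None


def trim_to_d_tree(root: Node, lam: Dict[int, float], level_w: List[float], d: int) -> List[float]:
    """Trim a (d+1)-tree down to a d-tree; return the cost after every step.

    A non-root local-code node of degree d + 1 has d children; we delete the child
    whose hanging subtree has maximum cost (q_max in the proof of Lemma 8.1).
    """
    history = [cost(root, lam, level_w)]
    while (hit := find_check(root, d)) is not None:
        node, depth = hit
        # Siblings share the same ancestor factor, so local costs give the same argmax.
        idx = max(range(len(node.children)),
                  key=lambda i: cost(node.children[i], lam, level_w, 1.0, depth + 1))
        del node.children[idx]
        history.append(cost(root, lam, level_w))
    return history


def build_tree(rng: random.Random, d: int, height: int, n_labels: int = 8) -> Node:
    """Hypothetical (d+1)-tree: root has 2 local-code children, other variable paths 1."""
    def var(depth: int, n_chk: int) -> Node:
        node = Node("var", rng.randrange(n_labels))
        if depth < height:
            for _ in range(n_chk):
                chk = Node("chk")
                chk.children = [var(depth + 2, 1) for _ in range(d)]
                node.children.append(chk)
        return node
    return var(0, 2)


def check_child_counts(node: Node) -> Set[int]:
    """Set of child counts over all local-code nodes in the tree."""
    counts = {len(node.children)} if node.kind == "chk" else set()
    for child in node.children:
        counts |= check_child_counts(child)
    return counts


rng = random.Random(1)
d = 3
root = build_tree(rng, d, height=4)
lam = {v: rng.uniform(-1.0, 1.0) for v in range(8)}
level_w = [1.0, 1.0, 1.0]  # hypothetical unit level weights, indexed by depth // 2
history = trim_to_d_tree(root, lam, level_w, d)
print(all(b <= a + 1e-12 for a, b in zip(history, history[1:])))  # True
print(check_child_counts(root))  # {2}: every local-code path now has degree d
```

Only the removal rule matters for the argument: trimming the maximum-cost child and renormalizing its siblings can only lower the inner product, so the final d-tree still certifies a non-positive cost.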

8.4 Height Hierarchy of Strong Local-Optimality

In this section we introduce a new combinatorial characterization named strong local-optimality. We prove that if a codeword is strongly locally-optimal, then it is also locally-optimal. The converse is not true in general. We prove a hierarchy on strong local-optimality based on the height parameter. In Section 8.6 we discuss the implications of the height hierarchy for iterative message-passing decoding of Tanner codes.

Definition 8.3 (reduced d-tree) Denote by $T^{2h}_r(G) = (V \cup J, E)$ the path-prefix tree of a Tanner graph G rooted at node r ∈ V. A subtree $T \subseteq T^{2h}_r(G)$ is a reduced d-tree if: (i) T is rooted at r, (ii) $\deg_T((r)) = \deg_G(r) - 1$, (iii) for every local-code path p ∈ T ∩ J, $\deg_T(p) = d$, and (iv) for every non-empty variable path p ∈ T ∩ V, $\deg_T(p) = \deg_{T^{2h}_r(G)}(p)$.

The only difference between a d-tree (Definition 5.2) and a reduced d-tree is that the degree of the root in a reduced d-tree is smaller by 1 (as if the root itself hangs from an edge)².

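Let $T^{\mathrm{red}}[r, 2h, d](G)$ denote the set of all reduced d-trees rooted at r that are subtrees of $T^{2h}_r(G)$. For a Tanner code C(G), let $\widetilde{\mathcal{B}}^{(w)}_d \subseteq [0, 1]^N$ denote the set of all projections of w-weighted reduced d-trees on G. That is,

$$\widetilde{\mathcal{B}}^{(w)}_d \triangleq \left\{\pi_{G,T,w} \;\middle|\; T \in \bigcup_{r\in V} T^{\mathrm{red}}[r, 2h, d](G)\right\}. \tag{8.2}$$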

Vectors in $\widetilde{\mathcal{B}}^{(w)}_d$ are referred to as reduced deviations. The following definition is analogous to Definition 5.4 (local-optimality), using reduced deviations instead of deviations.

²This difference is analogous to the “edge” versus “node” perspectives of tree ensembles in the book Modern Coding Theory [RU08].


Definition 8.4 (strong local-optimality) Let $C(G) \subset \{0,1\}^N$ denote a Tanner code. Let $w \in \mathbb{R}^h_+ \setminus \{0^h\}$ denote a non-negative weight vector of length h and let d ≥ 2. A codeword x ∈ C(G) is (h, w, d)-strongly locally-optimal with respect to λ ∈ ℝ^N if for all vectors $\beta \in \widetilde{\mathcal{B}}^{(w)}_d$,

$$\langle\lambda, x \oplus \beta\rangle > \langle\lambda, x\rangle. \tag{8.3}$$

Denote by $\mathrm{slo}_{C,\Lambda}(h, w, d)$ the set of pairs (x, λ) ∈ C × Λ such that x is (h, w, d)-strongly locally-optimal w.r.t. λ. Formally,

$$\mathrm{slo}_{C,\Lambda}(h, w, d) \triangleq \{(x, \lambda) \in C \times \Lambda \mid x \text{ is } (h, w, d)\text{-strongly locally-optimal w.r.t. } \lambda\}. \tag{8.4}$$

The following lemma states that if a codeword x is strongly locally-optimal w.r.t. λ, then x is locally-optimal w.r.t. λ.

Lemma 8.5 For every Λ ⊆ ℝ^N,

$$\mathrm{slo}_{C,\Lambda}(h, w, d) \subseteq \mathrm{lo}_{C,\Lambda}(h, w, d).$$

Proof: We prove the contrapositive statement. Assume that x is not (h, w, d)-locally optimal w.r.t. λ. By Proposition 5.8, $0^N$ is not (h, w, d)-locally optimal w.r.t. $\lambda^0 \triangleq (-1)^x * \lambda$. Hence, there exists a deviation $\beta = \pi_{G,T,w} \in \mathcal{B}^{(w)}_d$ such that $\langle\lambda^0, \beta\rangle \le 0$. Let T denote the d-tree that corresponds to the deviation β. Denote by (r) the root of T. By Lemma 8.1, the root (r) has a child q such that $\langle\lambda^0, \pi_{G,T,w}\rangle \ge \langle\lambda^0, \pi_{G,\mathrm{Trim}(T,q),w}\rangle$. Note that Trim(T, q) is a reduced d-tree rooted at r. Moreover, the corresponding reduced deviation $\beta' = \pi_{G,\mathrm{Trim}(T,q),w}$ satisfies $\langle\lambda^0, \beta'\rangle \le \langle\lambda^0, \beta\rangle \le 0$. We conclude by Proposition 5.8 that x is not (h, w, d)-strongly locally-optimal w.r.t. λ, as required. □

By Lemma 8.5 and Theorems 5.5 and 5.10, we obtain the following corollary.

Corollary 8.6 (strong local-optimality is sufficient for both ML and LP) Let C(G) denote a Tanner code with minimum local distance d∗. Let $h \in \mathbb{N}_+$ and $w \in \mathbb{R}^h_+$. Let λ ∈ ℝ^N denote the LLR vector received from the channel. If x is an (h, w, d)-strongly locally-optimal codeword w.r.t. λ for some 2 ≤ d ≤ d∗, then (1) x is the unique maximum-likelihood codeword w.r.t. λ, and (2) x is the unique solution of LP-decoding given λ.

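Consider a weight vector $\bar{w} \in \mathbb{R}^{k\cdot h}$, and let $\bar{w} = (\bar{w}_1, \bar{w}_2, \ldots, \bar{w}_k)$ denote its decomposition into k blocks $\bar{w}_i \in \mathbb{R}^h$. We say that $\bar{w} \in \mathbb{R}^{k\cdot h}$ is a k-legal extension of $w \in \mathbb{R}^h$ if there exists a vector $\alpha \in \mathbb{R}^k$ such that $\bar{w}_i = \alpha_i \cdot w$ for every i. Note that if $\bar{w}$ is geometric, then it is a k-legal extension of the first block $\bar{w}_1$ in its decomposition.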
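The k-legal extension condition is easy to test mechanically: cut the long vector into k blocks and check that each block is a scalar multiple of w. The Python sketch below is our own illustration; it takes "geometric" to mean entries proportional to $c^0, c^1, \ldots$ (an assumption about the indexing) and assumes w is not the all-zeros vector, as in the standing assumption $w \ne 0^h$.

```python
from typing import List, Sequence


def is_k_legal_extension(w_bar: Sequence[float], w: Sequence[float], k: int,
                         tol: float = 1e-12) -> bool:
    """True if w_bar (length k*h) splits into k blocks, each equal to alpha_i * w."""
    h = len(w)
    if len(w_bar) != k * h:
        return False
    j = next(i for i, x in enumerate(w) if x != 0)  # a coordinate where w is nonzero
    for b in range(k):
        block = w_bar[b * h:(b + 1) * h]
        alpha_b = block[j] / w[j]  # the only candidate for alpha_b
        if any(abs(y - alpha_b * x) > tol for y, x in zip(block, w)):
            return False
    return True


def geometric(c: float, n: int) -> List[float]:
    """Geometric weight vector (c^0, c^1, ..., c^(n-1)); the indexing is our assumption."""
    return [c ** i for i in range(n)]


h, k = 4, 3
w_bar = geometric(0.9, k * h)
print(is_k_legal_extension(w_bar, w_bar[:h], k))                        # True
print(is_k_legal_extension([1, 1, 1, 1, 2, 1, 1, 1], [1, 1, 1, 1], 2))  # False
```

For the geometric vector, block i equals $c^{(i-1)h}$ times the first block, so $\alpha_i = c^{(i-1)h}$; the second example fails because its second block is not proportional to w.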

The following theorem derives a hierarchy on the height of reduced deviations in the strong local-optimality characterization.

Theorem 8.7 (h-Hierarchy of strong LO) For every Λ ⊆ ℝ^N, if $\bar{w} \in \mathbb{R}^{k\cdot h}$ is a k-legal extension of $w \in \mathbb{R}^h$, then

$$\mathrm{slo}_{C,\Lambda}(h, w, d) \subseteq \mathrm{slo}_{C,\Lambda}(k\cdot h, \bar{w}, d).$$


Figure 8.2: Decomposition of a reduced d-tree T of height 2kh into a set of subtrees {T_j} that are reduced d-trees of height 2h. (The figure shows T, of total height 2·k·h, cut into layers of height 2h; a subtree T_j is rooted at p_j.)

Proof: We prove the contrapositive statement. Assume that x is not $(k\cdot h, \bar{w}, d)$-strongly locally-optimal w.r.t. λ. Proposition 5.7 implies that $0^N$ is not $(k\cdot h, \bar{w}, d)$-strongly locally-optimal w.r.t. $\lambda^0 \triangleq (-1)^x * \lambda$. Hence, there exists a reduced deviation $\beta = \pi_{G,T,\bar{w}} \in \widetilde{\mathcal{B}}^{(\bar{w})}_d$ such that $\langle\lambda^0, \beta\rangle \le 0$. Let T denote the reduced d-tree that corresponds to the reduced deviation β.

Let {T_j} denote a decomposition of T into reduced d-trees of height 2h, as shown in Figure 8.2, where the leaves of one subtree are the roots of other subtrees. Let $p_j$ denote the root of the reduced d-tree $T_j$ in the decomposition of T. For each subtree $T_j$, let $\ell(T_j)$ denote its “level”, namely, $\ell(T_j) \triangleq \lfloor |p_j| / h \rfloor$. Then,

$$\pi_{G,T,\bar{w}} = \sum_{T_j} \alpha_{\ell(T_j)} \cdot \pi_{G,T_j,w}.$$

Because $\langle\lambda^0, \beta\rangle \le 0$, we conclude by averaging that there exists at least one reduced d-tree $T^* \in \{T_j\}$ of height 2h such that $\langle\lambda^0, \pi_{G,T^*,w}\rangle \le 0$. Hence, $0^N$ is not (h, w, d)-strongly locally-optimal w.r.t. $\lambda^0$. We apply Proposition 5.7 again and conclude that x is not (h, w, d)-strongly locally-optimal w.r.t. λ, as required. □

8.5 Numerical Results

We conducted simulations to demonstrate two phenomena. First, we checked the difference between strong local-optimality and local-optimality. Second, we checked the effect of increasing the number of iterations on successful decoding with ML-certificates (based on local-optimality).


We chose a (3, 6)-regular LDPC code with blocklength N = 1008 and girth g = 6 [Mac]. For each p ∈ {0.04, 0.05, 0.06}, we randomly picked a set $\Lambda_p$ of 5000 LLR vectors corresponding to the all-zeros codeword with respect to a BSC with crossover probability p. We used unit level weights, i.e., $w = 1^h$, for the definition of local-optimality.

Let $\mathrm{slo}_{0^N,\Lambda_p}(h, w, 2)$ (resp., $\mathrm{lo}_{0^N,\Lambda_p}(h, w, 2)$) denote the set of LLR vectors λ ∈ Λ_p such that $0^N$ is strongly locally-optimal (resp., locally-optimal) w.r.t. λ. Figure 8.3 depicts the cardinality of $\mathrm{slo}_{0^N,\Lambda_p}(h, w, 2)$ and $\mathrm{lo}_{0^N,\Lambda_p}(h, w, 2)$ as a function of h, for three values of p. The results suggest that, in this setting, the sets $\mathrm{slo}_{0^N,\Lambda_p}(h, w, 2)$ and $\mathrm{lo}_{0^N,\Lambda_p}(h, w, 2)$ coincide as h grows, i.e., for finite-length codes and large height h, strong local-optimality is very close to local-optimality. For example, in our simulation for p = 0.04 and h = 320, $|\mathrm{lo}_{0^N,\Lambda_p}(h, w, 2)| = 4868$ and $|\mathrm{slo}_{0^N,\Lambda_p}(h, w, 2)| = 4859$ (i.e., only 9 LLR vectors out of 5000 are in lo but not in slo for height parameter h = 320).
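The sampling step of this experiment can be sketched as follows. This is a minimal illustration, not the simulation code used for the thesis: it assumes the LLR convention $\lambda_i = \log\frac{P(y_i \mid 0)}{P(y_i \mid 1)}$, and the function name and seed handling are hypothetical. Testing (strong) local-optimality of each sampled vector requires the machinery of Chapters 5 and 6 and is not sketched here.

```python
import math
import random
from typing import List


def bsc_llr_vectors(p: float, n: int, count: int, seed: int = 0) -> List[List[float]]:
    """Sample `count` channel LLR vectors for the all-zeros codeword over a BSC(p).

    Each bit is flipped independently with probability p. With the convention
    lambda_i = log(P(y_i | 0) / P(y_i | 1)), a received 0 gives +log((1-p)/p)
    and a received 1 gives -log((1-p)/p).
    """
    rng = random.Random(seed)
    L = math.log((1 - p) / p)
    return [[-L if rng.random() < p else L for _ in range(n)] for _ in range(count)]


# A small sample; the experiment described above uses N = 1008 and 5000 vectors per p.
sample = bsc_llr_vectors(0.04, 1008, 20, seed=1)
flips = sum(1 for vec in sample for x in vec if x < 0)
print(flips / (20 * 1008))  # close to p = 0.04
```

For scale, the gap reported above for p = 0.04 and h = 320 is 4868 − 4859 = 9 vectors, i.e., 0.18% of the 5000 vectors in $\Lambda_p$.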

Iterative decoding (nwms) is guaranteed to succeed after h iterations if the transmitted codeword is (h, w, 2)-strongly locally-optimal w.r.t. λ. Hence, the results also suggest that the number of iterations needed to obtain reasonable decoding with ML-certificates is far greater than the girth. Clearly, the “tree property” that DE analysis relies on does not hold for so many iterations in finite-length codes. Indeed, the simulated crossover probabilities are in the “waterfall” region of the word error rate curve with respect to nwms decoding. We are not aware of any analytic explanation of the phenomenon that iterative decoding of finite-length codes requires so many iterations in the “waterfall” region.

Another observation from the simulation (for which we do not provide a proof) is that $\mathrm{slo}_{0^N,\Lambda_p}(h, w, 2) \subseteq \mathrm{slo}_{0^N,\Lambda_p}(h + 1, w, 2)$. Namely, once a codeword is strongly locally-optimal w.r.t. λ with height h, it is also strongly locally-optimal for any height h′ ≥ h (and not only for multiples of h, as proved in Theorem 8.7). We point out that such a strengthening of the height hierarchy is not true in general.

8.6 Discussion

We present hierarchies of local-optimality with respect to two parameters of the local-optimality characterization for Tanner codes. One hierarchy is based on the local-code node degrees in the deviations. We prove containment, namely, the set of locally-optimal codewords with respect to degree d + 1 is a superset of the set of locally-optimal codewords with respect to degree d.

The second hierarchy is based on the height of the deviations. We prove that, for geometric level weights, a codeword that is strongly locally-optimal for height h is also strongly locally-optimal for every height k · h, and hence infinitely often. In particular, a codeword that is decoded with a certificate using the iterative decoder nwms after h iterations is decoded with a certificate after k · h iterations, for every positive integer k.

The degree hierarchy and probability of successful decoding of Tanner codes. The degree hierarchy supports the improvement in the lower bounds on the threshold value of the crossover probability p of successful LP-decoding over a $\mathrm{BSC}_p$ as a function of the degree parameter d (see Theorem 7.1). These lower bounds are proved by analyzing the probability that a codeword is locally-optimal as a function of p and the degree parameter d. For example, consider any (2, 16)-regular Tanner code with minimum local distance d∗ = 4 whose Tanner graph has girth logarithmic in the blocklength. The bounds in Chapter 7 imply a lower bound on the threshold of $p_0 = 0.019$ with respect to degree parameter d = 3. In contrast, the lower bound on the threshold increases to $p_0 = 0.044$ with respect to degree parameter d = 4. Note, however, that the degree hierarchy holds for local-optimality with any height parameter h, while the probabilistic analysis presented in Chapter 7 bounds the height parameter h by a quarter of the girth of the Tanner graph.

Figure 8.3: Growth of strong local-optimality and local-optimality as a function of the height h. |Λ_p| = 5000 for p ∈ {0.04, 0.05, 0.06}. (Curves: $|\mathrm{LO}_{0^N,\Lambda_p}(h, 1^h, 2)|$ and $|\mathrm{SLO}_{0^N,\Lambda_p}(h, 1^h, 2)|$ for each p; horizontal axis h from 1 to 320 on a logarithmic scale; vertical axis from 0 to 5000; annotations mark girth = 6 and h = 3.)

The height hierarchy of strong local-optimality and iterative decoding. The motivation for considering the height hierarchy comes from an iterative message-passing algorithm (nwms, Algorithm 6.1) that is guaranteed to successfully decode a locally-optimal codeword in h iterations (Theorem 6.1). Consider a Tanner code with single parity-check local codes. Assume that x is a codeword that is strongly locally-optimal w.r.t. λ for height parameter h. Our results imply that: (i) x is also strongly locally-optimal w.r.t. λ for any height parameter k · h, where $k \in \mathbb{N}_+$ (this is implied by the height hierarchy in Theorem 8.7), and (ii) x is also locally-optimal (this is implied by Lemma 8.5). Therefore, x is also locally-optimal w.r.t. λ for any height parameter k · h, where $k \in \mathbb{N}_+$. Thus, nwms decoding is guaranteed to decode x after k · h iterations (Theorem 6.1). This gives the following new insight into convergence: if a codeword x is decoded after h iterations and is certified to be strongly locally-optimal (and hence ML-optimal), then x is the outcome of nwms infinitely many times (i.e., whenever the number of iterations is a multiple of h).

Richardson and Urbanke proved a monotonicity property w.r.t. iterations for belief propagation decoding of LDPC codes based on a tree-like setting and channel degradation [RU08, Lemma 4.107]. Such a monotonicity property does not hold in general for suboptimal iterative decoders. In particular, the standard min-sum algorithm is not monotone for LDPC codes. The height hierarchy implies a monotonicity property w.r.t. iterations for nwms decoding with strong local-optimality certificates, even without assuming the tree-like setting and channel degradation. That is, the performance of strongly locally-optimal nwms decoding of finite-length Tanner codes with SPC local codes does not degrade as the number of iterations increases, even beyond the girth of the Tanner graph. Proving an analogous non-probabilistic combinatorial height hierarchy for BP is an interesting open question.

8.A Proof of Lemma 8.1

Let us first introduce the following averaging proposition.

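Proposition 8.8 Let $x_1, \ldots, x_k$ denote k ≥ 2 real numbers. Define $k_{\max} \triangleq \arg\max_{1\le i\le k} x_i$, and

$$x'_i \triangleq \begin{cases} 0 & \text{if } i = k_{\max},\\ \frac{k}{k-1}\cdot x_i & \text{otherwise.}\end{cases}$$

Then, $\sum_{i=1}^k x_i \ge \sum_{i=1}^k x'_i$.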

Proof: It holds that

$$\sum_{i=1}^k x'_i = \sum_{i\ne k_{\max}} \frac{k}{k-1}\cdot x_i = \sum_{i=1}^k x_i + \sum_{i\ne k_{\max}} \frac{1}{k-1}\cdot x_i - x_{k_{\max}}.$$

Therefore, it is sufficient to show that $x_{k_{\max}} \ge \sum_{i\ne k_{\max}} \frac{1}{k-1}\cdot x_i$. The proposition follows because $x_{k_{\max}}$ is indeed greater than or equal to the average of the other k − 1 numbers. □
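Proposition 8.8 is the averaging step that drives Lemma 8.1: the $x_i$ play the role of the costs of the subtrees hanging from the children of p, with k = deg_T(p) − 1. A minimal Python sketch (our own illustration) that computes the $x'_i$ and spot-checks the inequality on random inputs:

```python
import random
from typing import List


def trim_and_rescale(xs: List[float]) -> List[float]:
    """The numbers x'_i of Proposition 8.8: zero out a maximal x_i, scale the others by k/(k-1)."""
    k = len(xs)
    if k < 2:
        raise ValueError("Proposition 8.8 needs k >= 2")
    k_max = max(range(k), key=lambda i: xs[i])
    return [0.0 if i == k_max else k / (k - 1) * x for i, x in enumerate(xs)]


print(trim_and_rescale([3.0, -1.0, 1.0]))  # [0.0, -1.5, 1.5]

# Random spot-check of sum(x) >= sum(x'), allowing for floating-point rounding.
rng = random.Random(0)
for _ in range(1000):
    xs = [rng.uniform(-5, 5) for _ in range(rng.randint(2, 10))]
    assert sum(xs) >= sum(trim_and_rescale(xs)) - 1e-9
```

In the example the sum drops from 3 to 0: removing the largest entry costs more than the rescaling of the remaining entries can give back.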

Proof: [Proof of Lemma 8.1] Consider a path p ∈ T, and let p′ denote a child of p (i.e., p′ is an augmentation of p by a single edge). We split the inner products $\langle\lambda, \pi_{G,T,w}\rangle$ and $\langle\lambda, \pi_{G,\mathrm{Trim}(T,p'),w}\rangle$ into sums over the variable paths in $V \setminus V_p$ and in $V \cap V_p$ as follows.

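$$\langle\lambda, \pi_{G,T,w}\rangle = \underbrace{\sum_{q\in V\setminus V_p} \lambda_{t(q)}\cdot w_T(q)}_{(a)} + \underbrace{\sum_{q\in V\cap V_p} \lambda_{t(q)}\cdot w_T(q)}_{(b)}. \tag{8.5}$$

$$\langle\lambda, \pi_{G,\mathrm{Trim}(T,p'),w}\rangle = \underbrace{\sum_{q\in V\setminus V_p} \lambda_{t(q)}\cdot w_{\mathrm{Trim}(T,p')}(q)}_{(a')} + \underbrace{\sum_{q\in V\cap V_p} \lambda_{t(q)}\cdot w_{\mathrm{Trim}(T,p')}(q)}_{(b')}. \tag{8.6}$$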


It is sufficient to show: (i) for every child p′ of p, Term (8.5.a) = Term (8.6.a′), and (ii) there exists a child p′ of p such that Term (8.5.b) ≥ Term (8.6.b′).

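First, we deal with the equality Term (8.5.a) = Term (8.6.a′). Let p′ denote a child of p. For each q ∈ V \ V_p, it holds that $w_T(q) = w_{\mathrm{Trim}(T,p')}(q)$, because trimming $T_{p'}$ changes only the degree of p, and p is not a prefix of q. Therefore,

$$\sum_{q\in V\setminus V_p} \lambda_{t(q)}\cdot w_T(q) = \sum_{q\in V\setminus V_p} \lambda_{t(q)}\cdot w_{\mathrm{Trim}(T,p')}(q). \tag{8.7}$$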

Hence, Term (8.5.a) remains unchanged by trimming $T_{p'}$ from T, for every child p′ of p. For a path q, let $\mathrm{cost}_T(T_q) \triangleq \sum_{q'\in V\cap V_q} \lambda_{t(q')}\, w_T(q')$ denote the cost of $T_q$ with respect to T. Note that Term (8.5.b) equals $\mathrm{cost}_T(T_p)$. We may reformulate Term (8.5.b) as follows:

$$\mathrm{cost}_T(T_p) = \begin{cases} \lambda_{t(p)}\, w_T(p) + \sum_{q\in N_T(p)\,:\,|q|=|p|+1} \mathrm{cost}_T(T_q) & \text{if } t(p) \in V,\\ \sum_{q\in N_T(p)\,:\,|q|=|p|+1} \mathrm{cost}_T(T_q) & \text{if } t(p) \in J.\end{cases} \tag{8.8}$$

Consider two children $q_1$ and $q_2$ of p. By Definition 5.3, for every variable path $q \in T_{q_2}$,

$$(\deg_T(p) - 1)\cdot w_T(q) = (\deg_T(p) - 2)\cdot w_{\mathrm{Trim}(T,q_1)}(q). \tag{8.9}$$

Hence, by summing over all the variable paths in $T_{q_2}$, we obtain

$$(\deg_T(p) - 1)\cdot \mathrm{cost}_T(T_{q_2}) = (\deg_T(p) - 2)\cdot \mathrm{cost}_{\mathrm{Trim}(T,q_1)}(T_{q_2}). \tag{8.10}$$

Therefore,

$$\frac{\mathrm{cost}_T(T_{q_2})}{\mathrm{cost}_{\mathrm{Trim}(T,q_1)}(T_{q_2})} = \frac{\deg_T(p) - 2}{\deg_T(p) - 1} \le 1. \tag{8.11}$$

Let $q_{\max}$ denote a child of p for which the subtree hanging from it has maximum cost. Formally, $q_{\max} \triangleq \arg\max\{\mathrm{cost}_T(T_q) \mid q \in N_T(p),\ |q| = |p| + 1\}$. We apply Proposition 8.8 as follows. Let $k = \deg_T(p) - 1$, and let $x_i = \mathrm{cost}_T(T_{q_i})$, where $q_i$ denotes the i-th child of p. Notice that by Equation (8.11), $x'_i = \mathrm{cost}_{\mathrm{Trim}(T,q_{\max})}(T_{q_i})$. It follows that

$$\sum_{q\in N_T(p)\,:\,|q|=|p|+1} \mathrm{cost}_T(T_q) \ge \sum_{q\in \{q'\in N_T(p)\,:\,|q'|=|p|+1\}\setminus\{q_{\max}\}} \mathrm{cost}_{\mathrm{Trim}(T,q_{\max})}(T_q). \tag{8.12}$$

Because $\lambda_{t(p)} w_T(p)$ is unchanged by trimming a child of p, it follows from Equations (8.8) and (8.12) that

$$\mathrm{cost}_T(T_p) \ge \mathrm{cost}_{\mathrm{Trim}(T,q_{\max})}(T_p). \tag{8.13}$$

Hence, we conclude that Term (8.5.b) ≥ Term (8.6.b′) for p′ = q_max, and the lemma follows. □


Bibliography

[ADS09] S. Arora, C. Daskalakis, and D. Steurer, “Message passing algorithms and improved LP decoding,” in Proc. of the 41st Annual ACM Symp. Theory of Computing (STOC’09), Bethesda, MD, USA, May 31–June 2, 2009, pp. 3–12.

[AM00] S. M. Aji and R. J. McEliece, “The generalized distributive law,” IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000.

[Bur09] D. Burshtein, “Iterative approximate linear programming decoding of LDPC codes with linear complexity,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 4835–4859, Nov. 2009.

[BGT93] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. IEEE Int. Conf. on Communications (ICC’93), Geneva, Switzerland, vol. 2, pp. 1064–1070, 1993.

[BMvT78] E. Berlekamp, R. McEliece, and H. van Tilborg, “On the inherent intractability of certain coding problems,” IEEE Trans. Inf. Theory, vol. 24, no. 3, pp. 384–386, May 1978.

[BZ02] A. Barg and G. Zemor, “Error exponents of expander codes,” IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1725–1729, Jun. 2002.

[BZ04] A. Barg and G. Zemor, “Error exponents of expander codes under linear-complexity decoding,” SIAM J. Discr. Math., vol. 17, no. 3, pp. 426–445, 2004.

[CDE+05] J. Chen, A. Dholakia, E. Eleftheriou, M. P. C. Fossorier, and X.-Y. Hu, “Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1288–1299, Aug. 2005.

[CF02] J. Chen and M. P. C. Fossorier, “Density evolution for two improved BP-based decoding algorithms of LDPC codes,” IEEE Commun. Lett., vol. 6, no. 5, pp. 208–210, May 2002.

[CRU01] S.-Y. Chung, T. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 657–670, Feb. 2001.


[DDKW08] C. Daskalakis, A. G. Dimakis, R. M. Karp, and M. J. Wainwright, “Probabilistic analysis of linear programming decoding,” IEEE Trans. Inf. Theory, vol. 54, no. 8, pp. 3565–3578, Aug. 2008.

[DJM98] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-like’ codes,” in Proc. 36th Allerton Conf. Communication, Control, and Computing, Monticello, IL, Sep. 1998, pp. 201–210.

[Fel68] W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed., New York: Wiley, 1968, vol. I.

[Fel03] J. Feldman, “Decoding error-correcting codes via linear programming,” Ph.D. dissertation, MIT, Cambridge, MA, 2003.

[FK00] B. J. Frey and R. Koetter, “Exact inference using the attenuated max-product algorithm,” in Advanced Mean Field Methods: Theory and Practice, Cambridge, MA: MIT Press, 2000.

[FK02] J. Feldman and D. R. Karger, “Decoding turbo-like codes via linear programming,” in Proc. 43rd Annu. IEEE Symp. Foundations of Computer Science (FOCS), Nov. 2002.

[FK04] J. Feldman and D. R. Karger, “Decoding turbo-like codes via linear programming,” J. Comput. Syst. Sci., vol. 68, no. 4, pp. 733–752, June 2004.

[FKV05] J. Feldman, R. Koetter, and P. O. Vontobel, “The benefit of thresholding in LP decoding of LDPC codes,” in Proc. IEEE Int. Symp. Information Theory (ISIT’05), Adelaide, Australia, Sep. 4–9, 2005, pp. 307–311.

[FMS+07] J. Feldman, T. Malkin, R. A. Servedio, C. Stein, and M. J. Wainwright, “LP decoding corrects a constant fraction of errors,” IEEE Trans. Inf. Theory, vol. 53, no. 1, pp. 82–89, Jan. 2007.

[For01] G. D. Forney, Jr., “Codes on graphs: normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.

[FS05] J. Feldman and C. Stein, “LP decoding achieves capacity,” in Proc. Symp. Discrete Algorithms (SODA’05), Vancouver, Canada, Jan. 2005, pp. 460–469.

[FWK05] J. Feldman, M. J. Wainwright, and D. R. Karger, “Using linear programming to decode binary linear codes,” IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005.

[Gal63] R. G. Gallager, Low-Density Parity-Check Codes. MIT Press, Cambridge, MA, 1963.

[GB11] I. Goldenberg and D. Burshtein, “Error bounds for repeat-accumulate codes decoded via linear programming,” Adv. Math. Commun., vol. 5, no. 4, pp. 555–570, Nov. 2011.


[GKG10] N. Goela, S. B. Korada, and M. Gastpar, “On LP decoding of polar codes,” in Proc. IEEE Information Theory Workshop (ITW’10), pp. 1–5, 2010.

[Ham50] R. W. Hamming, “Error detecting and error correcting codes,” Bell Syst. Tech. J., vol. 29, no. 2, pp. 147–160, Apr. 1950.

[HE05] N. Halabi and G. Even, “Improved bounds on the word error probability of RA(2) codes with linear-programming-based decoding,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 265–280, Jan. 2005.

[HE06] N. Halabi and G. Even, “On graph cover decoding and LP decoding of generalized Tanner codes,” Manuscript, Dec. 2006.

[HE10] N. Halabi and G. Even, “LP decoding of regular LDPC codes in memoryless channels,” in Proc. IEEE Int. Symp. Information Theory (ISIT’10), pp. 744–748, Jun. 2010.

[HE11a] N. Halabi and G. Even, “LP decoding of regular LDPC codes in memoryless channels,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 887–897, Feb. 2011.

[HE11b] N. Halabi and G. Even, “On decoding irregular Tanner codes with local-optimality guarantees,” CoRR, http://arxiv.org/abs/1107.2677, Jul. 2011.

[HE12a] N. Halabi and G. Even, “Hierarchies of local-optimality characterizations in decoding Tanner codes,” CoRR, http://arxiv.org/abs/1202.2251, Feb. 2012.

[HE12b] N. Halabi and G. Even, “Local-optimality guarantees for optimal decoding based on paths,” CoRR, http://arxiv.org/abs/1203.1854, Mar. 2012.

[HE12c] N. Halabi and G. Even, “Linear-programming decoding of Tanner codes with local-optimality certificates,” in Proc. IEEE Int. Symp. Information Theory (ISIT’12), pp. 2686–2690, Jul. 2012.

[HE12d] N. Halabi and G. Even, “Hierarchies of local-optimality characterizations in decoding Tanner codes,” in Proc. IEEE Int. Symp. Information Theory (ISIT’12), pp. 2691–2695, Jul. 2012.

[HE12e] N. Halabi and G. Even, “Local-optimality guarantees for optimal decoding based on paths,” in Proc. 7th Int. Symp. Turbo Codes and Iterative Information Processing, Gothenburg, Sweden, Aug. 2012.

[HE12f] N. Halabi and G. Even, “Message-passing decoding beyond the girth with local-optimality guarantees,” accepted to IEEEI’12.

[JKM00] H. Jin, A. Khandekar, and R. J. McEliece, “Irregular repeat-accumulate codes,” in Proc. 2nd Int. Symp. Turbo Codes, pp. 1–8, 2000.

[JP11] Y.-Y. Jian and H. D. Pfister, “Convergence of weighted min-sum decoding via dynamic programming on coupled trees,” CoRR, http://arxiv.org/abs/1107.3177, Jul. 2011.


[KFL01] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and thesum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

[KU11] S. B. Korada and R. L. Urbanke, “Exchange of limits: Why iterative decoding works,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2169–2187, Apr. 2011.

[KV03] R. Koetter and P. O. Vontobel, “Graph-covers and iterative decoding of finite length codes,” in Proc. 3rd Int. Symp. Turbo Codes, pp. 75–82, Sep. 2003.

[KV06] R. Koetter and P. O. Vontobel, “On the block error probability of LP decoding of LDPC codes,” in Proc. Inaugural Workshop of the Center for Information Theory and its Applications, La Jolla, CA, USA, Feb. 2006.

[LMSS01] M. G. Luby, M. Mitzenmacher, M. A. Shokrollahi, and D. A. Spielman, “Improved low-density parity-check codes using irregular graphs,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 585–598, Feb. 2001.

[Mac] D. MacKay, Encyclopedia of Sparse Graph Codes. Available online: http://www.inference.phy.cam.ac.uk/mackay/codes/

[Mac99] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.

[MN96] D. J. MacKay and R. M. Neal, “Near Shannon limit performance of low-density parity check codes,” Electron. Lett., vol. 33, pp. 457–458, Mar. 1997.

[Rot06] R. Roth, Introduction to Coding Theory. Cambridge University Press, New York, NY, 2006.

[RU01] T. Richardson and R. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.

[RU08] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, New York, NY, 2008.

[Sch98] A. Schrijver, Theory of Linear and Integer Programming. John Wiley & Sons, Inc., New York, NY, USA, 1998.

[Sha48] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, and no. 4, pp. 623–656, 1948.

[Ska07] V. Skachek, “Low-density parity-check codes: Constructions and bounds,” Ph.D. dissertation, Technion, Haifa, Israel, 2007.

[Ska11] V. Skachek, “Correcting a fraction of errors in nonbinary expander codes with linear programming,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3698–3706, Jun. 2011.


[SR03] V. Skachek and R. M. Roth, “Generalized minimum distance iterative decoding of expander codes,” in Proc. IEEE Information Theory Workshop (ITW’03), pp. 245–248, 2003.

[SS96] M. Sipser and D. A. Spielman, “Expander codes,” IEEE Trans. Inf. Theory, vol. 42, no. 6, pp. 1710–1722, Nov. 1996.

[TSS11] M. H. Taghavi, A. Shokrollahi, and P. H. Siegel, “Efficient implementation of linear programming decoding,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5960–5982, Sep. 2011.

[TS08] M. H. Taghavi and P. H. Siegel, “Adaptive methods for linear programming decoding,” IEEE Trans. Inf. Theory, vol. 54, no. 12, pp. 5396–5410, Dec. 2008.

[Tan81] R. M. Tanner, “A recursive approach to low-complexity codes,” IEEE Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, Sep. 1981.

[VK05] P. O. Vontobel and R. Koetter, “Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes,” CoRR, http://www.arxiv.org/abs/cs.IT/0512078, Dec. 2005.

[VK06] P. O. Vontobel and R. Koetter, “Towards low-complexity linear-programming decoding,” in Proc. 4th Int. Symp. Turbo Codes and Related Topics, Munich, Germany, Apr. 2006.

[VK07] P. O. Vontobel and R. Koetter, “On low-complexity linear-programming decoding of LDPC codes,” Eur. Trans. Telecommun., vol. 18, no. 5, pp. 509–517, Aug. 2007.

[Von10] P. Vontobel, “A factor-graph-based random walk, and its relevance for LP decoding analysis and Bethe entropy characterization,” in Proc. Information Theory and Applications Workshop, UC San Diego, La Jolla, CA, USA, Jan. 31–Feb. 5, 2010.

[WA01] X. Wei and A. Akansu, “Density evolution for low-density parity-check codes under Max-Log-MAP decoding,” Electron. Lett., vol. 37, no. 18, pp. 1125–1126, Aug. 2001.

[WF01] Y. Weiss and W. T. Freeman, “On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 736–744, Feb. 2001.

[Wib96] N. Wiberg, “Codes and decoding on general graphs,” Ph.D. dissertation, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1996.

[WJW05] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, “MAP estimation via agreement on trees: message-passing and linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 11, pp. 3697–3717, Nov. 2005.


[WLK95] N. Wiberg, H.-A. Loeliger, and R. Kötter, “Codes and iterative decoding on general graphs,” Eur. Trans. Telecommun., vol. 6, no. 5, pp. 513–525, 1995.

[Z01] G. Zémor, “On expander codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 835–837, Feb. 2001.


Table of Contents

1 Introduction

1.1 Summary of results

1.1.1 Bounds on the word error probability of LP decoding of "even" Tanner codes

1.1.2 LP decoding of regular LDPC codes in memoryless channels

1.1.3 Local-optimality certificates for ML decoding and LP decoding of Tanner codes

1.1.4 Message-passing decoding with local-optimality guarantees for Tanner codes with parity-check constraints

1.1.5 Bounds on the word error probability of LP decoding of regular Tanner codes

1.1.6 Hierarchies of local-optimality

1.2 Organization

2 Background

2.1 Graph terminology and algebraic notation

2.2 Communication over a noisy channel

2.3 Tanner codes and Tanner graph representation

2.4 LP decoding of Tanner codes over memoryless channels

2.5 Symmetry of LP decoding and the all-zero codeword assumption

2.6 Graph-cover-based pseudocodewords and the generalized fundamental polytope

2.6.1 Covering maps, lifts, and covering codes

2.6.2 Graph-cover-based pseudocodewords and the generalized fundamental polytope

3 Bounds on the word error probability of LP decoding of "even" Tanner codes

3.1 Introduction

3.2 Local-optimality certificates based on simple cycles

3.2.1 Local-optimality implies global optimality (ML)

3.2.2 Local-optimality implies LP-optimality

3.3 Bounds on the word error probability using a cycle-based characterization of local-optimality

3.4 Discussion

4 LP decoding of regular LDPC codes in memoryless channels

4.1 Introduction

4.2 On the relations between local-optimality, global optimality, and LP-optimality

4.3 Proving bounds based on local-optimality

4.3.1 Bounding processes on trees

4.3.2 Analysis for the binary-input additive white Gaussian noise (BIAWGN) channel

4.4 Discussion

4.A Appendix: Computing the evolution of probability density functions on trees

4.A.1 Properties of random variables

4.A.2 Computing the distributions of $X_l$ and $Y_l$

4.A.3 Estimating $\min_{t \geq 0} \mathbb{E}\, e^{t X_s}$

5 Local-optimality certificates for ML decoding and LP decoding of Tanner codes

5.1 Introduction

5.2 A combinatorial certificate for a globally optimal (ML) codeword

5.3 Local-optimality implies LP-optimality

5.4 Verifying a local-optimality certificate

5.5 Constructing codewords from projections of weighted trees

6 Message-passing decoding with local-optimality guarantees for Tanner codes with parity-check constraints

6.1 Introduction

6.2 Message-passing decoding with a global-optimality guarantee for irregular Tanner codes with parity-check constraints

6.2.1 Proof of Theorem 6.1: NWMS computes the locally optimal codeword

6.3 Discussion

6.A Appendix: Optimal valid sub-configurations in the execution of NWMS2

7 Bounds on the word error probability of LP decoding of regular Tanner codes

7.1 Introduction

7.2 Bounds on the word error probability of LP decoding based on local-optimality

7.2.1 Bounding processes on trees

7.2.2 Analysis for the binary symmetric channel (BSC)

7.2.3 Analysis for memoryless binary-input output-symmetric channels

7.3 Discussion

7.A Appendix: Proof of Lemma 7.6

7.B Appendix: Proof of Lemma 7.9

8 Hierarchies of local-optimality

8.1 Introduction

8.2 Trimming subtrees from a path-prefix tree

8.3 A degree hierarchy of local-optimality

8.4 A height-based hierarchy of near local-optimality

8.5 Numerical results

8.6 Discussion

8.A Appendix: Proof of Lemma 8.1

Bibliography

Abstract

This thesis deals with two approaches to suboptimal decoding of finite-length error-correcting codes under a probabilistic channel model. One approach, first introduced by Gallager in 1963, is based on an iterative decoding algorithm. The second approach, introduced by Feldman, Wainwright, and Karger in 2003, is based on linear programming. We consider families of graph-based error-correcting codes called Tanner codes. The probabilistic channel model includes every memoryless binary-input output-symmetric channel.

Given a channel output y and a codeword x, we are interested in a test with one-sided error that answers two questions: Is x an optimal estimate with respect to the channel output y? Is it unique? A positive answer of the test, called a certificate, is a witness that certifies the optimality of the codeword x. We study tests based on combinatorial characterizations of local-optimality. A local-optimality certificate guarantees both global optimality (maximum likelihood) and optimality with respect to linear programming. That is, if there exists a witness that certifies the local optimality of some codeword x, then an LP decoder computes x, and x is guaranteed to be the globally optimal codeword as well.
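The two questions above can be made concrete with a brute-force reference check. The sketch below is not the local-optimality certificate studied in this thesis; it is a hypothetical exhaustive stand-in whose running time is exponential in the block length. It assumes the standard log-likelihood-ratio formulation: for a memoryless binary-input channel, a codeword maximizes $P(y \mid x)$ exactly when it minimizes $\sum_i \lambda_i x_i$, where $\lambda_i = \log\bigl(P(y_i \mid 0)/P(y_i \mid 1)\bigr)$. The function names, the toy parity-check matrix `H`, and the LLR values are illustrative assumptions.

```python
from itertools import product


def codewords(H):
    """Yield every binary vector x with H x = 0 over GF(2) (exhaustive: 2**n candidates)."""
    n = len(H[0])
    for x in product((0, 1), repeat=n):
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H):
            yield x


def cost(llr, x):
    """ML cost <llr, x>: a smaller cost means a larger likelihood P(y | x)."""
    return sum(l for l, xi in zip(llr, x) if xi == 1)


def is_unique_ml(H, llr, x):
    """True iff x is a codeword and every other codeword has strictly larger cost."""
    x = tuple(x)
    words = list(codewords(H))
    if x not in words:
        return False
    c = cost(llr, x)
    return all(cost(llr, z) > c for z in words if z != x)


# Hypothetical toy example: the length-3 repetition code, with codewords 000 and 111.
H = [[1, 1, 0], [0, 1, 1]]
llr = [1.0, -0.5, 2.0]  # hypothetical channel LLRs
print(is_unique_ml(H, llr, (0, 0, 0)))  # True: cost 0.0 versus 2.5 for 111
print(is_unique_ml(H, llr, (1, 1, 1)))  # False
```

This check enumerates the whole code, so it is feasible only for toy examples. A local-optimality certificate is one-sided: a positive answer certifies the optimality of x (both ML and LP), while a negative answer does not imply that x is suboptimal.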

We present certificates based on local-optimality for three families of Tanner codes. (i) We begin with a simple certificate based on simple cycles in the Tanner graph of the code, for the family of "even" Tanner codes. (ii) We continue with a local-optimality certificate introduced by Arora, Daskalakis, and Steurer [2009] for regular LDPC codes (codes with a sparse parity-check matrix). This certificate is based on skinny, weighted subtrees in the Tanner graph of the code whose height is bounded by half the girth of the Tanner graph. (iii) We present a certificate for every linear Tanner code. The certificate is based on subtrees of computation trees of the Tanner graph of the code. These subtrees may have any finite height h (even greater than the girth of the Tanner graph). In addition, the degrees of the constraint nodes in the subtrees are not limited to two. Hence, these subtrees are denser and contain more nodes than skinny trees.

Based on the combinatorial characterization of the certificates, we present bounds on the word error probability of LP decoding over memoryless binary-input output-symmetric channels. Upper bounds on the word error probability are obtained from lower bounds on the probability that a local-optimality certificate exists.

We present a new iterative message-passing decoding algorithm called NWMS. NWMS is a belief-propagation (BP)-type algorithm that applies to every Tanner code with parity-check constraints (e.g., LDPC codes and HDPC codes). We prove that if a locally optimal codeword exists for a given depth h, then NWMS computes that codeword in h iterations. Moreover, since the depth h is not bounded, the guarantee of successful decoding by NWMS holds even if the number of iterations h exceeds the girth of the Tanner graph of the code.

In the last chapter we present hierarchies for the combinatorial characterizations of local-optimality. These hierarchies explain two phenomena: (i) the probability that a local-optimality certificate exists improves as the height or the density of the certificate increases, and (ii) the performance of iterative decoding improves as the number of iterations increases, even when the number of iterations exceeds the girth of the graph.

This work was carried out under the supervision of

Prof. Guy Even

The Iby and Aladar Fleischman Faculty of Engineering

The Zandman-Slaner Graduate School of Engineering

Decoding Graph-Based Codes with Local-Optimality Guarantees

Nissim Halabi

Thesis submitted for the degree "Doctor of Philosophy"

Submitted to the Senate of Tel-Aviv University

This research work was carried out at Tel-Aviv University, at the Faculty of Engineering

Under the supervision of Prof. Guy Even

Tishrei 5773

