Topics in joint source-channel coding and multiuser detection
Adria Tauste Campo
Department of Engineering
University of Cambridge
Supervisor: Dr. Albert Guillen i Fabregas
This dissertation was submitted for the degree of Doctor of Philosophy
June 2011
To my grandparents Basi and Jose. To my uncle Adolfo, always remembered.
Declaration
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text.
Declaration
This dissertation does not exceed the word limit for the Department of Engineering Degree Committee.
Contents
Contents 4
List of Figures 7
List of Tables 10

I Non-asymptotic joint source-channel coding 7

1 Introduction 8
1.1 Joint source-channel coding as a paradigm model . . . 8
1.1.1 A finite-length perspective . . . 11
1.2 Definitions and notation . . . 12
1.2.1 Discrete memoryless systems . . . 20
1.3 Previous work . . . 21
1.3.1 The separation theorem . . . 21
1.3.2 Bounds on the error probability . . . 22
1.3.3 Refined analysis of non-asymptotic channel coding . . . 26

2 Joint source-channel coding for arbitrary blocklength 28
2.1 Presentation . . . 28
2.2 Achievability results . . . 28
2.2.1 MAP decoding upper bounds . . . 28
2.2.1.1 Proof of Theorem 2.2.3 . . . 34
2.2.1.2 Proof of Corollary 2.2.3 . . . 37
2.2.1.3 Discussion . . . 39
2.2.2 Threshold decoding upper bounds . . . 39
2.2.2.1 An exponent for threshold decoding . . . 42
2.3 Converse bounds . . . 43
2.3.1 Set-dependent lower bounds . . . 47
2.3.2 ε_{k,n}-transmissibility . . . 49
2.4 Comparison with separate source-channel coding . . . 50
2.4.1 System model . . . 51
2.4.2 Upper bounds on the probability of error . . . 51
2.4.3 A lower bound on the probability of error . . . 54
2.4.4 Error exponent of separate source and channel coding . . . 55
2.4.5 A comparison between JSCC and SSCC via DT(k, n) . . . 56
2.5 Application to binary memoryless sources and channels . . . 57
2.5.1 Numerical results . . . 57
2.5.1.1 Joint source-channel coding bounds . . . 58
2.5.1.2 Joint vs. separate source-channel coding . . . 60
2.6 Chapter review and conclusions . . . 62

II Large-system analysis of multiuser detection with an unknown number of users 65

3 Introduction to large-system analysis of multiuser detection 66
3.1 Introduction . . . 66
3.2 Notation and model . . . 67
3.2.1 Statistical-physics approach to large-system analysis of classic MUD . . . 70
3.2.2 A note on the validity of the replica method . . . 72

4 High-SNR analysis of optimum multiuser detection in the large-system regime 74
4.1 Large-system multiuser efficiency . . . 74
4.2 Maximum system load and related considerations . . . 77
4.2.1 Solutions to the large-system fixed-point equation . . . 79
4.2.2 System load and the space of fixed-point solutions . . . 81
4.2.2.1 Case α ∈ (0, 1) . . . 83
4.2.2.2 Case α = 1 . . . 85
4.2.3 Maximum system load with error probability constraints . . . 88
4.3 Chapter review and conclusions . . . 90

5 Iterative multiuser decoding with an unknown number of users in the large-system regime 91
5.1 Introduction . . . 91
5.2 Encoding of data and activity . . . 92
5.2.1 Optimum detection . . . 94
5.3 Iterative joint decoding under belief propagation . . . 95
5.4 Performance analysis . . . 97
5.4.1 Large-system analysis . . . 97
5.4.2 Concentration . . . 99
5.4.2.1 Case ρ < ρ_th. Decoupling of extrinsic messages . . . 101
5.4.2.2 Case ρ ≥ ρ_th. Lower bound to η(ℓ) . . . 102
5.4.3 Approximations . . . 104
5.5 Numerical results . . . 105
5.6 Chapter review and conclusions . . . 109

6 Summary and lines of future research 110
6.1 Main contributions . . . 110
6.1.1 Part I . . . 110
6.1.2 Part II . . . 111
6.2 Future research . . . 111

A 113
A.1 Types: definitions and properties . . . 113
A.2 Proof of Theorem 1.3.3 . . . 114
A.3 Source and channel error exponents, and Csiszar's original JSCC theorem . . . 116
A.4 Proof of the lower bound in Theorem 1.3.4 using random coding over fixed-composition codes with MAP decoding . . . 118

B 123
B.1 Application to BMS and BSC source-channel pair . . . 123
B.1.1 Achievability JSCC bounds for BMS and BSC . . . 123
B.1.2 Error exponents for BMS and BSC . . . 124
B.2 Application to BMS and BEC source-channel pair . . . 125
B.2.1 Achievability JSCC bounds for BMS and BEC . . . 125
B.2.2 Error exponents for BMS and BEC . . . 126

C 127
C.1 Proof of Corollary 4.1.1 . . . 127
C.2 Proof of Theorem 4.1.1 . . . 128
C.3 Proof of Theorem 4.2.1 . . . 133

D 134
D.1 Proof of Claim 5.4.1 . . . 134
D.2 Proof of Lemma 5.4.1 . . . 139
D.3 Proof of Proposition 5.4.1 . . . 142
D.4 Proof of Corollary 5.4.1 . . . 144

References 146
List of Figures
1.1 Communication problem. . . . 8
1.2 Diagram of joint source-channel coding. . . . 9
1.3 Diagram of source coding and channel coding. . . . 10
1.4 JSCC achievability bound and SSCC converse bound for a system formed by a binary memoryless source (with parameter δ = 0.05) and a binary symmetric channel with crossover probability ξ = 0.11, with transmission rate t = 1. . . . 13
1.5 Block diagram of joint source-channel coding. . . . 14
1.6 Diagram of MAP decoding. The source vector with the highest posterior probability is decoded. . . . 15
1.7 Diagram of threshold decoding. In general, the source output that is decoded is not the one with the highest posterior probability but the first on the list whose posterior probability is above a given threshold. . . . 16
1.8 Csiszar's upper and lower bounds on the JSCC error exponent with transmission rate t = 1 for a binary symmetric channel with crossover probability ξ = 0.002, as a function of the source entropy of a binary memoryless source. . . . 24
2.1 Random-coding construction: different classes Ai are associated with different codeword distributions P_X^(i), i = 1, . . . , N = 4. . . . 35
2.2 Bank of decision rules to test the channel between PXY and QXY. . . . 45
2.3 Max upper bound and converse for a binary memoryless source with δ = 0.05 and a binary symmetric channel with crossover probability ξ = 0.11. . . . 50
2.4 Structure of a separate source-channel coding system. . . . 51
2.5 Upper and lower bounds for a BMS with δ = 0.05 and a BSC with crossover probability ξ = 0.11. . . . 58
2.6 Error exponents for a BSC with crossover probability ξ = 0.11 as a function of the BMS entropy, where H(0.05) ≈ 0.29. . . . 59
2.7 Upper and lower bounds for a BMS with δ = 0.05 and a BEC with erasure probability ξ = 0.5. . . . 60
2.8 Error exponents for a BMS-BEC pair, where the BEC has erasure probability ξ = 0.5, as a function of the BMS entropy, where H(0.05) ≈ 0.29. . . . 61
2.9 Achievability and converse bounds for SSCC and JSCC for a BMS with δ = 0.05 and a BSC with crossover probability 0.11. The dashed and solid lines represent the corresponding RCU and converse bounds. . . . 62
2.10 Maximum entropy allowed to transmit with SSCC and JSCC over a BSC with crossover probability ξ = 0.11. The achievability bound for JSCC is the RCU bound, while the converse bound for SSCC is given by equation (2.152). . . . 63
3.1 Randomly spread CDMA system with optimum multiuser detector. . . . 68
3.2 Bank of K equivalent single-user Gaussian channels. . . . 73
4.1 Large-system multiuser efficiency of the user-and-data detector under MAP with prior knowledge of α and β = 3/7. . . . 76
4.2 A comparison of the exact MMSE value with its upper and lower bounds for α = 0.5 and ηγ ∈ [10, 20] dB. . . . 78
4.3 A comparison of the exact MMSE value with its upper and lower bounds for ηγ = 20 dB and α ∈ [0, 1]. . . . 78
4.4 Fixed-point solutions (marked by circles) for different values of β and fixed α = 0.5 and γ = 18 dB. . . . 80
4.5 System load function in the multiuser efficiency domain for α = 0.5 and γ = 18 dB. . . . 82
4.6 Upper and lower bounds on the numerical spinodal lines (thick line) for α = 0.5. . . . 86
4.7 Comparison of upper bounds on the spinodal lines for α = 1.0 (left) and α = 0.5 (right). . . . 87
4.8 Critical system load for different uncoded error probabilities and activity rates. Thicker lines represent numerical results, whereas regular lines show the corresponding lower bounds. Lines with circle markers: Pe = 10^-3 and α = 0.99. Lines with cross markers: Pe = 10^-3 and α = 0.1. Lines with star markers: Pe = 10^-3 and α = 0.5. Lines with square markers: Pe = 10^-5 and α = 0.5. . . . 89
5.1 Block diagram of the transmission. . . . 93
5.2 Modified trellis structure of a convolutional code (5, 7)8. . . . 94
5.3 Block diagram of an iterative multiuser joint decoder. . . . 96
5.4 Numerical results for Corollary 5.4.1: mapping function Ψ(η, β, α) with α = 1.0 and α = 0.5 for ρ = 0.0, 0.01, 0.5, at Eb/N0 = 6 dB and β = 4.5. Solid lines represent the density evolution with the (5, 7)8 convolutional code for a codeword length of L = 4000 and 100 realizations for α = 1.0 and for α = 0.5 with ρ = 0.0. For α = 0.5 and ρ = 0.01 (dash-dotted line), the lower bound on Ψ(η, β, α) is shown for η ∈ (0, 0.1] and the exact mapping for η ∈ (0.1, 1]. For α = 0.5 and ρ = 0.5, the lower bound is shown for all η ∈ (0, 1]. . . . 107
5.5 Numerical results for Section 5.4.3 using the Gaussian approximation at the SISO decoders for α = 0.5 and ρ > 0: mapping function Ψ(η, β, α) with ρ = 0.05 and ρ = 0.5, at Eb/N0 = 6 dB and β = 4.5. Solid lines represent the density evolution with the (5, 7)8 convolutional code, and dash-dotted lines plot the Gaussian approximation. . . . 108
List of Tables
2.1 Summary of upper and lower bounds. . . . . . . . . . . . . . . . . 49
Acknowledgements
This thesis is the result of an adventurous trip with base camp in Cambridge and stages in Stanford, Barcelona and Mallorca. In the next few lines I would like to acknowledge the people who helped me reach the final destination.
First, I am deeply thankful to my supervisor Prof. Albert Guillen iFabregas who offered me the opportunity to do a PhD in a uniqueplace like Cambridge and always gave me his unconditional supportin all circumstances. Beyond particular thoughts and ideas, our dis-cussions over all these years have taught me to think with a balancedsense of criticism and passion.
I am indebted to Prof. Ezio Biglieri, who was my Masters advisor in Barcelona and pedagogically introduced me to the second part of this dissertation. I also want to express my gratitude to Prof. Guillem Femenies and Dr. Felip Riera at the Universitat de les Illes Balears for their hospitality during my days in Mallorca, and to Dr. Alberto Lopez Toledo for being my host and advisor during my internship at Telefonica research labs.
The work that I present in this thesis has mostly benefited from the interaction with my colleagues in Cambridge. I am first indebted to Dr. Alfonso Martinez for giving me his unconventional and stimulating point of view. I would also like to thank Dr. Gonzalo Vazquez-Vilar for his collaboration while he visited our group, Dr. Tobias Koch for his valuable comments beyond technical aspects, and Dr. Jossy Sahir for his support and contagious joy. Finally, I am very happy and honoured to be part of a research group with Taufiq Asyhari, Li Peng, Jing Guo and Dr. Alex Alvarado.
Doubtless, this work would not have been possible without the “contribution” of my home friends Miguel, Marin, Berni, Berta, Ton, Diana, Susi, Virginia, the “pistol” tandem Nai & Xous and many others. I have also had the chance to meet extraordinary people during my PhD years with whom I share unforgettable moments: Jaime, Graham, Fabio, Andrea, Fernando, Carlos, Amir, Vijay, Michele, Moses and Adit, among others.
During this time, I have felt the constant care, from a distance, of my parents, brother, aunt and grandparents. If this trip has reached a successful end, it is also thanks to them and to Inma, who has beaten oceans, time zones and uncountable “byes” at airports to be at my side, illuminating the route.
Topics in joint source-channel coding and multiuser detection
Adria Tauste Campo
Department of Engineering
University of Cambridge
This dissertation was submitted for the degree of Doctor of Philosophy
June 2011
In the first part of this dissertation, we study the joint source-channel coding problem in the non-asymptotic regime. In particular, we derive new achievability and converse bounds on the error probability that characterize the performance limits of such a model for arbitrary blocklengths. We also extend the random-coding analysis to study the JSCC error exponent and provide an achievability formula that holds for arbitrarily generated codewords and recovers all known results in the literature. Finally, our results enable us to numerically quantify the minimum gain of joint over separate source and channel coding.
In the second part, we analyze multiuser detection under the assumption that the number of users accessing the channel is unknown to the receiver. Our main goal is to determine the performance loss caused by the need for estimating the identities of active users, which are not known a priori. We examine the performance of multiuser detectors when the number of potential users is large. Statistical-physics methodologies are used to determine the macroscopic performance of the detector in terms of its multiuser efficiency. Special attention is paid to the fixed-point equation whose solution yields the multiuser efficiency of the optimal (maximum a posteriori) detector in the large signal-to-noise ratio regime. Our analysis yields closed-form approximate bounds on the minimum mean-squared error in this regime. These illustrate the set of solutions of the fixed-point equation, and their relationship with the maximum system load.
We also study iterative multiuser joint decoding in large randomly spread code division multiple access systems under the same assumptions. In particular, we focus on the factor graph representation and iterative algorithms based on belief propagation. We study a suboptimal iterative scheme that jointly detects the encoded data and the users' activity. Using density evolution, we provide a fixed-point equation of the overall iterative system where the nature of the exchanged probabilities depends on the users' activity. Finally, when the number of users and the blocklength scale appropriately, we show that in the large-system limit a simple structure on the users' codes yields a multiuser-efficiency fixed-point equation that is equivalent to the case of all-active users with a system load scaled by the activity rate.
Preliminaries
In this preliminary chapter we provide an outline of every chapter in the dis-sertation and summarize the work that has been published and submitted forpublication.
Outline
Part I
Part I studies the joint source-channel coding problem for arbitrary blocklength. In Chapter 1, we introduce the notation and main definitions for Part I. We also provide a review of previous work and establish links between classical and recent results.
In Chapter 2 we formulate new sufficient and necessary conditions for transmissibility assuming a fixed average error probability, channel blocklength and transmission rate. To do this, we derive new tight achievability bounds and extend some of the bounds appearing in [49]. These bounds are obtained via random coding and can be used to prove the direct part of the joint source-channel coding theorem. Furthermore, they suggest that the random-coding average error probability can be optimized by drawing the codewords with a source-dependent distribution. We explore this aspect by studying the JSCC error exponent, obtaining a novel formulation of the JSCC exponent via the idea of partitioning the source into classes and generating codewords according to the classes. The new form particularizes to the results by Csiszar [14] and Gallager [23] when the partition is chosen appropriately. We strengthen this analysis by investigating the analogy between binary hypothesis testing and the JSCC problem, which helps us derive a new converse result on the average error probability. Furthermore, we particularize our bounds to obtain achievability and converse results for separate source and channel coding (SSCC) that provide a guideline to compare the system performance of JSCC and SSCC in the finite-length regime. We finally illustrate our main results with numerical examples of binary memoryless systems.
Part II
Part II studies multiuser detection with an unknown number of users in the large-system regime. In Chapter 3, we introduce the multiuser detection model anddescribe the statistical-physics approach to study the asymptotic performance ofan optimum detector.
In Chapter 4 we focus on the degradation of multiuser efficiency when the uncertainty about the users' activity grows and the SNR is sufficiently large. We go one step beyond the application of the large-system decoupling principle [28, 57] and provide a new high-SNR analysis of the space of fixed-point solutions, showing explicitly its interplay with the system load for a non-uniform ternary, parameter-dependent input distribution. By expanding the minimum mean-square error for large SNR, we obtain tight closed-form bounds that describe the large CDMA system as a function of the SNR, the activity factor and the system load. In addition, some trade-off results between these quantities are derived. Of special novelty here is the study of the impact of the activity factor on the CDMA performance measures (minimum mean-square error and multiuser efficiency). In particular, we provide necessary and sufficient conditions for the existence of single or multiple fixed-point solutions as a function of the system load and SNR. Finally, we analytically identify the region of “meaningful” multiuser-efficiency solutions with their associated maximum system loads, and derive consequences for engineering problems of practical interest.
In Chapter 5, we study iterative multiuser joint decoding in large randomly spread CDMA under the assumption that the number of users accessing the channel is unknown to the receiver. In this scenario, the large-system approach allows one to characterize the system's performance by an iterative updating rule on the multiuser efficiency derived via the replica method. In particular, when the scaling between the logarithm of the number of users and the blocklength is below a threshold, we show that in the large-system limit a simple structure on the users' codes yields a multiuser-efficiency fixed-point equation that is equivalent to the case of all-active users with a system load scaled by the activity rate.
Note on published and submitted work
Part I has led to the following journal article:
• A. Tauste Campo, Gonzalo Vazquez-Vilar, A. Guillen i Fabregas, TobiasKoch and Alfonso Martinez, “Joint source-channel coding revisited: Ran-dom coding bounds and error exponents”. Submitted to IEEE Transactionson Information Theory.
and the presentation of the following peer-reviewed conference papers:
• A. Tauste Campo, Gonzalo Vazquez-Vilar, Alfonso Martinez, A. Guilleni Fabregas and Tobias Koch, “Achieving Csiszar’s source-channel codingexponent with product distributions”, In Proceedings 2012 IEEE Interna-tional Symposium on Information Theory, Cambridge, MA, July 2012.
• A. Tauste Campo, Gonzalo Vazquez-Vilar, A. Guillen i Fabregas, TobiasKoch and Alfonso Martinez, “Random-coding joint source-channel bounds”.In Proceedings 2011 IEEE International Symposium on Information Theory,Saint Petersburg, Russia, July-August 2011.
Part II has led to the publication and submission of the following journalarticles:
• A. Tauste Campo, A. Guillen i Fabregas and E. Biglieri, “Large-systemanalysis of multiuser detection with an unknown number of users”, IEEETransactions on Information Theory, vol. 57, no. 6, pp. 3416-3428, June2011.
• A. Tauste Campo and A. Guillen i Fabregas, “Iterative multiuser jointdecoding with an unknown number of users: Large-system analysis”, sub-mitted to IEEE Transactions on Communications, 2010.
and the presentation of the following peer-reviewed conference papers:
• A. Tauste Campo and A. Guillen i Fabregas, “Large system analysis of iterative multiuser joint decoding with an uncertain number of users”. In Proceedings 2010 IEEE International Symposium on Information Theory, pp. 2103-2107, Austin, TX, USA, June 2010.
• A. Tauste Campo, A. Guillen i Fabregas and E. Biglieri, “High-SNR analysis of optimum multiuser detection with an unknown number of users”. In Proceedings of the Information Theory Workshop (ITW), pp. 283-287, Taormina, Italy, 11-16 October 2009.
Part I
Non-asymptotic joint source-channel coding
Chapter 1
Introduction
1.1 Joint source-channel coding as a paradigm model
The communication problem can be described in a simple form: sender A wishes to transmit a certain message m to receiver B over a noisy medium W. Under what conditions is this possible?
Figure 1.1: Communication problem.
The path to obtaining answers to fundamental questions often becomes as important as the answers themselves, and the figure of Claude E. Shannon (1916-2001) made that happen in the context of communications engineering. While addressing the above question, Shannon built [54] a formal model for the communication problem and developed mathematical methods whose scope went beyond the original problem. In doing so, Shannon helped shape a scientific discipline, known as information theory, whose main purpose is to study communications engineering questions as well-posed mathematical problems.
In the first part of this thesis, our aim is to study a model inspired by Shannon's work, known as joint source-channel coding (JSCC), that formally covers a wide class of communication problems. The model is composed of the following parts (see the scheme at the top of Fig. 1.2):
Figure 1.2: Diagram of joint source-channel coding.
• A source that generates messages with a certain probability distribution over a countable alphabet. For ease of analysis we will assume that the messages can be described by sequences of k symbols (these symbols are binary in Fig. 1.2).
• An encoder that maps the messages into sequences of n symbols called codewords, where n is known as the channel blocklength (the number of channel uses). The codewords are binary strings in Fig. 1.2.
• A channel that describes the noisy medium by randomly mapping input sequences into output sequences of the same length according to a certain conditional distribution.
• A decoder that estimates the original message based on the output sequence of the channel.
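As a concrete (and deliberately naive) illustration of these four blocks, the sketch below simulates the chain for a binary memoryless source over a binary symmetric channel. All specifics here are assumptions made for illustration, not taken from the text: the encoder is the identity map (n = k, i.e. no coding at all), and decoding is symbol-wise MAP using the source prior.

```python
import random

def bms(k, delta, rng):
    """Binary memoryless source: each symbol is 0 w.p. delta, 1 w.p. 1 - delta."""
    return [0 if rng.random() < delta else 1 for _ in range(k)]

def bsc(x, xi, rng):
    """Binary symmetric channel: flips each input symbol independently w.p. xi."""
    return [b ^ (rng.random() < xi) for b in x]

def map_bit(y, delta, xi):
    """Symbol-wise MAP decision for the identity encoder:
    pick the bit b maximising prior(b) * likelihood(y | b)."""
    score0 = delta * (xi if y == 1 else 1 - xi)
    score1 = (1 - delta) * (xi if y == 0 else 1 - xi)
    return 0 if score0 > score1 else 1

rng = random.Random(0)
delta, xi, k, trials = 0.05, 0.11, 20, 2000  # running example of the chapter
errors = 0
for _ in range(trials):
    s = bms(k, delta, rng)                      # source
    y = bsc(s, xi, rng)                         # identity "encoder" + channel
    s_hat = [map_bit(b, delta, xi) for b in y]  # decoder
    errors += (s != s_hat)
print(f"sequence error rate ~ {errors / trials:.2f}")
```

With this prior (δ = 0.05) and channel (ξ = 0.11), the prior term dominates the likelihood, so the symbol-wise MAP rule outputs 1 regardless of the observation and the sequence error rate approaches 1 − (1 − δ)^k ≈ 0.64: a blunt reminder that without redundancy (n > k) even the optimal decoding rule cannot help, which is precisely the job of the encoder block.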
The above model is the meeting point of the well-known source coding and channel coding problems. Indeed, as shown in Figure 1.3, the JSCC model can be viewed as a source coding model by using the identity function as the channel mapping and indexing the encoded sequences as {1, . . . , Ms}. Similarly, one recovers the channel-coding model when the source of the JSCC model generates equiprobable messages from a set {1, . . . , Mc}.
Since the source and the channel are fixed entities in a communication problem, it is natural to ask which is the best possible encoder/decoder pair in Fig. 1.2 (also referred to as a code). A reasonable interpretation of the best code is that it achieves the maximum decoding success rate with minimum usage of the channel. This in turn leads to studying when errors occur, i.e., when noisy observations lead the decoder to an incorrect decision. For instance, by fixing a code we can compute the probability of the error event by averaging over all possible source sequences and noise realizations. If we assume that the decoding rule can be optimized (e.g., using Bayesian estimation), searching for the best code
Figure 1.3: Diagram of source coding and channel coding.
is a priori an NP-hard problem since one needs to check the error probability forevery possible encoding mapping (also called codebook) and channel blocklength.
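For toy parameters this exhaustive search is actually feasible, which makes the blow-up tangible. The sketch below is an illustration not drawn from the text: the tiny sizes k = 2, n = 3 are hypothetical, chosen only so the search terminates quickly, while δ and ξ follow the chapter's running BMS/BSC example. It enumerates every codebook and computes the exact error probability of optimal MAP decoding via the identity P(correct) = Σ_y max_m P(m) P(y | x_m).

```python
from itertools import product

delta, xi, k, n = 0.05, 0.11, 2, 3  # BMS(0.05) over BSC(0.11), toy sizes

def p_source(s):
    """BMS prior: each symbol is 0 w.p. delta, 1 w.p. 1 - delta."""
    p = 1.0
    for b in s:
        p *= delta if b == 0 else 1 - delta
    return p

def p_channel(y, x):
    """BSC likelihood: each symbol flipped independently w.p. xi."""
    p = 1.0
    for yb, xb in zip(y, x):
        p *= xi if yb != xb else 1 - xi
    return p

messages = list(product((0, 1), repeat=k))   # 2^k source sequences
outputs = list(product((0, 1), repeat=n))    # 2^n channel outputs
codewords = list(product((0, 1), repeat=n))  # 2^n candidate codewords

best_pe, n_books = 1.0, 0
for book in product(codewords, repeat=len(messages)):  # (2^n)^(2^k) codebooks
    n_books += 1
    # exact MAP success probability: sum over y of the largest joint mass
    p_ok = sum(max(p_source(s) * p_channel(y, x)
                   for s, x in zip(messages, book))
               for y in outputs)
    best_pe = min(best_pe, 1.0 - p_ok)

print(f"checked {n_books} codebooks; best MAP error probability = {best_pe:.4f}")
```

Even at this scale the search visits 4096 codebooks; one extra source bit (k = 3) already pushes it to 8^8 ≈ 1.7 × 10^7, and realistic blocklengths are hopeless, which motivates the existence-style argument discussed next.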
Shannon circumvented the search problem by proving a more general statement, known as the JSCC theorem, which ensures the existence of asymptotically optimal codes as long as certain conditions on the statistics of the source and channel are met.
The statement of the JSCC theorem can be summarized into three parts:
1. A sufficient (or achievability) condition for a source to be transmissible overa channel at a given error probability ε and blocklength n.
2. A necessary (or converse) condition for a source to be transmissible over a channel at error probability ε′ and blocklength n′.
3. Asymptotic analysis of each condition for arbitrarily large blocklength (n, n′ →∞) and small error probability (ε, ε′ → 0).
Shannon’s original proof applies to sources and channels described by sta-tionary memoryless processes [54]. The direct part states that transmission ofa source over a channel with vanishing error probability is possible if a certainsource parameter is strictly below a channel one. The converse part states thattransmission of a source over a channel with vanishing error probability is notpossible when the source parameter is strictly larger than the channel one [13,p. 220][62]. For the direct part, Shannon proposed a suboptimal JSCC encod-ing/decoding scheme formed by two independent blocks: first, a source codingstage and then a channel coding one. Strikingly enough, the converse part ofthe theorem tells us that this design is asymptotically optimal: separate source
and channel coding (SSCC) achieves reliable transmission under the same condi-tions of the best source-channel construction. This crucial remark has been thehistorical reason why the above result is widely known as the separation theorem.
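For the stationary memoryless case, the "source parameter" and "channel parameter" of the theorem specialize (in their standard form, stated here as a sketch rather than quoted from the text) to the source entropy times the transmission rate, t·H(P), and the channel capacity C, both in bits per channel use. The check is then a one-liner for the chapter's running BMS/BSC example:

```python
from math import log2

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Running example: BMS with delta = 0.05 over a BSC with xi = 0.11, rate t = 1.
delta, xi, t = 0.05, 0.11, 1.0
source_rate = t * h2(delta)  # bits that must be conveyed per channel use
capacity = 1.0 - h2(xi)      # BSC capacity, bits per channel use

print(f"t * H(source) = {source_rate:.4f} bits/use")  # ~0.2864, i.e. H(0.05) ~ 0.29
print(f"C(channel)    = {capacity:.4f} bits/use")     # ~0.5001
print("asymptotically transmissible:", source_rate < capacity)
```

The comfortable margin 0.29 < 0.50 is what the finite-length analysis of Chapter 2 refines: the separation theorem says nothing about how fast the error probability falls at, say, blocklength 1000.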
Since the early 1950s, the JSCC theorem has been generalized in various ways. On the one hand, the result has been extended to general classes of sources and channels (including non-stationary, non-ergodic, with memory, etc.) [64, 65], refining the sufficient and necessary conditions for a source to be transmissible over a channel [19, 33, 62]. The most general conditions [33] are based on the tail distribution of specific random variables called information spectrum measures (see [32] for a review of information spectrum methods).
On the other hand, the approach based on the reliability function (or error exponent) pioneered by Gallager [23] and further elaborated by Csiszar [14, 15] gives us further information about the JSCC theorem by describing the exponential decay rate of the error probability when the conditions of the separation theorem are met. As separation is a particular case of JSCC, an optimized joint scheme should in general obtain a better performance in terms of error probability than a separate one. Regarding this issue, classical results on the reliability function suggest that JSCC leads to a greater error exponent than separate source and channel coding [14, 23]. This has been corroborated by recent work in [69], where the authors report a systematic comparison between the error exponents of both schemes, suggesting a significant gain of JSCC over separation at finite lengths.
1.1.1 A finite-length perspective
The asymptotic analysis is a cornerstone of the JSCC theorem. Nevertheless, when we drop the infinite-blocklength assumption the picture changes completely. As reviewed above, it is well known that in this case JSCC outperforms separation for finite blocklengths [14, 23, 69]. Motivated by this fact, much research has been devoted to studying specific joint source-channel coding schemes [2, 30, 48]. However, less attention has been paid to the critical role played by the channel blocklength. In fact, the assumption of infinite blocklength in practice requires allowing infinite delay and complexity in the encoder/decoder. This is particularly critical when designing modern real-time processing systems with very low delay demands [11, 51]. Therefore, one needs to look at the JSCC problem from a finite-length perspective. In this context, a number of questions arise: what is the minimum error probability achieved by a JSCC code using a few hundred bits? How much can we gain by using JSCC over separate source and channel coding?
The actual limits of JSCC are not known for all blocklengths, and there is no general characterization of the finite-length gains of JSCC over separation. However, studying the JSCC problem in the finite-blocklength regime could have
a large impact on practical applications by quantifying the gain of joint over separate source-channel coding in terms of blocklength (i.e., complexity and delay of the decoders) and error probability (i.e., quality of service). This would provide a rigorous trade-off between the JSCC gain and the simplicity of a separate design.

In the first part of this thesis, we study the above problem from various angles. We are first concerned with finding new achievability and converse conditions for the transmission of sources over channels at fixed blocklength and error probability. Then, we provide an analytical comparison between JSCC and separation in the finite-blocklength regime. The new results can be used to illustrate the intrinsic gain of JSCC over separation from different views. As an example, Fig. 1.4 shows the average error probability when transmitting a binary memoryless source (BMS) with probabilities $\{P_0 = 0.05, P_1 = 0.95\}$ over a binary symmetric channel (BSC) with crossover probability $\xi = 0.11$. For instance, the plot reports that using 1000 bits, there exists a joint source-channel code that achieves an average error probability below $10^{-7}$, whereas every separate source-channel code has an average error probability greater than $9 \times 10^{-5}$. The relationship between the two bounds is of the order of $x^{1.75}$, which is consistent with previous studies [69].
Most of our work is based on the derivation of new tight achievability and converse bounds on the average error probability and a novel analysis of the JSCC reliability function for general sources and channels. The new bounds are inspired by earlier works [14, 23, 49], strengthen classical results [22, 23, 55, 65], and can be applied to prove the direct and converse parts of the general JSCC theorem [33] when we take the limit in the channel blocklength.

This chapter is divided into the following parts. In Section 1.2 we describe the problem formulation, the model and the notation used throughout Chapters 1 and 2. In Section 1.3 we review classical and recent results on the topic.
1.2 Definitions and notation
To fix notation for this part of the dissertation, we shall use $\boldsymbol{\mathcal{X}}$ to denote an alphabet that is a Cartesian product of possibly different sets, and $\mathcal{X}^n$ when we wish to specify the number of such sets. The elements of $\boldsymbol{\mathcal{X}}$ will be called sequences or vectors and denoted by $\boldsymbol{x}$. If the alphabet is a 1-fold set it will be denoted by $\mathcal{X}$, its elements will be called symbols and will be denoted by $x$. When $\boldsymbol{\mathcal{X}}$ (respectively $\mathcal{X}$) is finite, its size will be given by $|\boldsymbol{\mathcal{X}}|$ (resp. $|\mathcal{X}|$). For any set $\mathcal{A}$, we shall denote by $\mathcal{A}^c$ its complement.

We shall use $\boldsymbol{X}$ to denote a random variable (r.v.) defined on $\boldsymbol{\mathcal{X}}$, and $X^n$ when $\mathcal{X}^n$ is specified. Unless otherwise stated, the definitions for $\boldsymbol{X}$ particularize to the one-dimensional random variable $X$. The probability mass function (p.m.f.) when
Figure 1.4: JSCC achievability bound and SSCC converse bound for a system formed by a binary memoryless source (with parameter $\delta = 0.05$) and a binary symmetric channel with crossover probability $\xi = 0.11$, with transmission rate $t = 1$. (The plot shows the error bounds, from $10^{-8}$ to $10^{0}$, against the blocklength $n$, from 100 to 1000; curves: SSCC converse and JSCC achievability.)
$X$ is discrete, or the probability density function (p.d.f.) when it is continuous, will be denoted by $P_X$. In contrast, $P_X(X)$ will denote a r.v. with underlying probability $X \sim P_X$. We shall use $\Pr\{\cdot\}$ to denote the probability of an event and $\mathbb{E}[\cdot]$ to denote the expectation of a random variable, when the underlying probability distribution is clear from the context. For two functions $f$ and $g$, $f \circ g$ shall denote the composition of $f$ with $g$.

The main ingredients of our analysis are sources and channels. In the following, we provide general definitions of both terms, following the approach of [32, 33]. For given $k, n \in \mathbb{N}$, consider $\mathcal{V}^k$ to be a finite or countably infinite alphabet and $\mathcal{X}^n$ and $\mathcal{Y}^n$ to be arbitrary abstract sets with elements $\boldsymbol{v}, \boldsymbol{x}, \boldsymbol{y}$, respectively.

Definition 1.2.1 A source $\mathbf{V}$ is a sequence of pairs $\mathbf{V} = (\mathcal{V}^k, P_{V^k})_{k \geq 1}$, indexed by the source blocklength $k$, where $P_{V^k}$ is a probability distribution defined on $\mathcal{V}^k$.
Definition 1.2.2 A channel $\mathbf{W}$ is a sequence of triplets $(\mathcal{X}^n, \mathcal{Y}^n, P_{Y^n|X^n})_{n \geq 1}$, in-
Figure 1.5: Block diagram of joint source-channel coding: the source emits $\boldsymbol{v} \sim P_V$, the source-channel encoder maps it to $\boldsymbol{x}$, the channel $P_{Y|X}$ outputs $\boldsymbol{y}$, and the source-channel decoder delivers $\hat{\boldsymbol{v}}$ to the destination.
dexed by the channel blocklength $n$, where $P_{Y^n|X^n}$ is a channel transition probability; specifically, $P_{Y^n|X^n}(\boldsymbol{y}|\boldsymbol{x})$ ($\boldsymbol{x} \in \mathcal{X}^n$, $\boldsymbol{y} \in \mathcal{Y}^n$) is the conditional probability of $\boldsymbol{y}$ given $\boldsymbol{x}$.

The above definitions are general enough to cover a wide range of cases, including nonstationary and nonergodic sources, and channels with arbitrary memory structures. The following definition formally characterizes the encoder and decoder blocks of the JSCC scheme in Fig. 1.2.

Definition 1.2.3 A joint source-channel (JSC) code with transmission rate $t$ over $\mathcal{V}^k \times \mathcal{X}^n \times \mathcal{Y}^n$ is a pair $(\phi_k, \psi_n)$ with $t \triangleq \frac{k}{n}$, where $\phi_k$, the encoder, is a mapping described by
\[
\phi_k : \mathcal{V}^k \to \mathcal{X}^n \qquad (1.1)
\]
and $\psi_n$, the decoder, is a possibly random transformation given by
\[
\psi_n : \mathcal{Y}^n \to \mathcal{V}^k. \qquad (1.2)
\]
Then, we define the codebook as the set of sequences $\mathcal{C}(\phi) = \{\boldsymbol{x} \in \boldsymbol{\mathcal{X}} \,|\, \boldsymbol{x} = \phi_k(\boldsymbol{v}), \boldsymbol{v} \in \boldsymbol{\mathcal{V}}\}$.

If not stated otherwise, we shall assume hereafter that $k$ and $n$ are fixed (and thus $t$), and we shall drop the blocklength subindices and use $\boldsymbol{V}$, $\boldsymbol{X}$, $\boldsymbol{Y}$, $P_V$, $P_{Y|X}$, $\phi$ and $\psi$ without ambiguity.

We can represent the relationships induced by the use of the encoder and decoder of a JSC code via a Markov chain. If $\boldsymbol{X} = \phi(\boldsymbol{V})$ is the encoded $n$-dimensional vector (associated to the $k$-dimensional source vector $\boldsymbol{V}$), $\boldsymbol{Y}$ the output vector corresponding to the input $\boldsymbol{X}$, and $\hat{\boldsymbol{V}} = \psi(\boldsymbol{Y})$ the decoded vector, the joint source-channel code is characterized by the following Markov-chain relation (see Fig. 1.5):
\[
\boldsymbol{V} \xrightarrow{\ \phi\ } \boldsymbol{X} \to \boldsymbol{Y} \xrightarrow{\ \psi\ } \hat{\boldsymbol{V}}. \qquad (1.3)
\]

In particular, we can recover the definitions of source code and channel code [13] when the variables $\boldsymbol{V}$ and $\boldsymbol{X}$ are defined conveniently. For instance, when $\boldsymbol{X} = \boldsymbol{Y}$ and $\boldsymbol{\mathcal{X}} = \{1, \dots, M_s\}$, where $M_s$ is the number of source codewords,
$(\phi, \psi)$ corresponds to a source code of rate $R_s \triangleq \frac{\log M_s}{k}$. Analogously, when $\boldsymbol{V}$ is an equiprobable variable over $\{1, \dots, M_c\}$, where $M_c$ is the number of channel codewords, $(\phi, \psi)$ is a channel code of rate $R_c \triangleq \frac{\log M_c}{n}$.

We introduce here the definitions of the two main classes of decoders, namely the MAP decoder and the threshold decoder, which are used to derive classical results as well as the new bounds presented in this dissertation.
Definition 1.2.4 Given a received sequence $\boldsymbol{y}$ and a source-channel encoder $\phi$, the MAP decoder $\psi_{\mathrm{MAP}}$ chooses one source output $\hat{\boldsymbol{v}}$ with equal probability among the vectors $\boldsymbol{v}'$ that maximize the posterior probability $P_{V|Y}(\boldsymbol{v}|\boldsymbol{y})$. That is, for a given $\boldsymbol{y}$, the MAP decoder $\psi_{\mathrm{MAP}}$ equiprobably chooses a $\hat{\boldsymbol{v}}$ such that
\[
\hat{\boldsymbol{v}} \in \mathcal{D}_\phi(\boldsymbol{y}) \triangleq \Big\{\boldsymbol{v}' \in \boldsymbol{\mathcal{V}} : \boldsymbol{v}' = \arg\max_{\boldsymbol{v} \in \boldsymbol{\mathcal{V}}} P_V(\boldsymbol{v})\, P_{Y|X}(\boldsymbol{y}|\phi(\boldsymbol{v})) \Big\}. \qquad (1.4)
\]

Figure 1.6: Diagram of MAP decoding. The source vector with highest posterior probability $P_{V|Y}(\boldsymbol{v}|\boldsymbol{y})$ is decoded.

Definition 1.2.5 Let the source vectors be listed and a decoding metric be defined by $f : \boldsymbol{\mathcal{V}} \times \boldsymbol{\mathcal{Y}} \to \mathbb{R}$. Given a received sequence $\boldsymbol{y}$ and a source-channel encoder $\phi$, a threshold decoder $\psi_{\mathrm{TH}}$ with threshold $\gamma(\boldsymbol{v}, \boldsymbol{y})$ chooses the first vector $\boldsymbol{v}$ on the list such that $f(\boldsymbol{v}, \boldsymbol{y}) > \gamma(\boldsymbol{v}, \boldsymbol{y})$.
We shall clarify which decoder is used to derive each result where appropriate. As introduced earlier, an important class of joint source-channel codes are those where the source (resp. channel) encoder and decoder do not take
Figure 1.7: Diagram of threshold decoding. In general, the source output that is decoded is not the one with highest posterior probability, but the first on the list whose posterior probability exceeds a given threshold $\gamma(\boldsymbol{v}, \boldsymbol{y})$.

into account information on the statistics of the channel (resp. source). Hence, they can be divided into a source encoder/decoder block and a channel encoder/decoder block, allowing an independent optimization of each block. These codes will be referred to as separate source-channel codes.
Definition 1.2.6 A separate source-channel code with $M$ codewords over $\boldsymbol{\mathcal{V}} \times \boldsymbol{\mathcal{X}} \times \boldsymbol{\mathcal{Y}}$ is a JSC code formed by the pair $(\phi^{(c)} \circ \phi^{(s)}, \psi^{(s)} \circ \psi^{(c)})$, where
\[
\phi^{(s)} : \boldsymbol{\mathcal{V}} \to \{1, \dots, M\} \qquad (1.5)
\]
and
\[
\phi^{(c)} : \{1, \dots, M\} \to \boldsymbol{\mathcal{X}} \qquad (1.6)
\]
are encoding mappings, and
\[
\psi^{(c)} : \boldsymbol{\mathcal{Y}} \to \{1, \dots, M\} \qquad (1.7)
\]
and
\[
\psi^{(s)} : \{1, \dots, M\} \to \boldsymbol{\mathcal{V}} \qquad (1.8)
\]
are random transformations.

The measure that we consider to evaluate the performance of JSC codes is the error probability.
Definition 1.2.7 The average error probability (or error probability) of a JSC code $(\phi, \psi)$, denoted by $\varepsilon_{k,n}(\phi, \psi)$, is the probability of error induced after decoding a noisy observation:
\[
\varepsilon_{k,n}(\phi, \psi) \triangleq \Pr\{\boldsymbol{V} \neq \hat{\boldsymbol{V}}\}. \qquad (1.9)
\]

In the following, a JSC code $(\phi, \psi)$ with average error probability equal to $\varepsilon_{k,n} > 0$ will be called an $\varepsilon_{k,n}$-code. When the joint source-channel code particularizes into a source code (respectively, channel code) with average error probability $\varepsilon_k$ (resp. $\varepsilon_n$), we shall call it an $\varepsilon_k$-code (resp. $\varepsilon_n$-code).

The definitions of JSC code and average error probability bring us to two central concepts in Shannon theory: the minimum achievable source coding rate and the channel capacity. These two concepts arise as fundamental limits in source coding and channel coding, respectively, and are tied together in the JSCC theorem, as we shall see in the next section. To start with, we introduce the notion of reliable transmission, or transmissibility, of a source over a channel.
Definition 1.2.8 A source is transmissible over a channel if there exists a sequence of $\varepsilon_{k,n}$-codes satisfying
\[
\lim_{n \to \infty} \varepsilon_{k,n} = 0. \qquad (1.10)
\]

The above definition can be particularized for source and channel coding as follows.

Definition 1.2.9 Reliable source compression (resp. channel transmission) is possible at a rate $R \geq 0$ if there exists a sequence of $\varepsilon_k$-codes (resp. $\varepsilon_n$-codes) with $M_s$ (resp. $M_c$) codewords satisfying
\[
R > \frac{\log M_s}{k} \quad \Big(\text{resp. } R < \frac{\log M_c}{n}\Big) \qquad (1.11)
\]
and
\[
\lim_{k \to \infty} \varepsilon_k = 0 \quad \Big(\text{resp. } \lim_{n \to \infty} \varepsilon_n = 0\Big). \qquad (1.12)
\]

Hence, the following concepts are defined as the extreme rates for which the above definitions hold.

Definition 1.2.10 The minimum achievable rate $R_f(\mathbf{V})$ of a source $\mathbf{V}$ is the infimum of all rates for which (source) compression is possible.

Definition 1.2.11 The capacity $C(\mathbf{W})$ of the channel $\mathbf{W}$ is the supremum of all rates at which reliable (channel) transmission is possible.
For the sake of completeness, we provide a refined definition of channel capacity based on the limit inferior of the average error probability [65]. This notion of capacity allows us to characterize the converse part of the general version of the separation theorem [33, 62].

Definition 1.2.12 The optimistic capacity $\overline{C}(\mathbf{W})$ of the channel $\mathbf{W}$ is defined as the supremum of all rates $R$ for which there exists a sequence of $\varepsilon_n$-codes with $M_c$ codewords such that
\[
R < \frac{\log M_c}{n} \qquad (1.13)
\]
and
\[
\liminf_{n \to \infty} \varepsilon_n = 0. \qquad (1.14)
\]

As discussed earlier, the above definitions are meaningful for sufficiently large $n$. When considering JSCC at finite blocklength, we need to reformulate the transmissibility condition.

Definition 1.2.13 A source is $\varepsilon_{k,n}$-transmissible over a channel if there exists an $\varepsilon_{k,n}$-code.

The definitions for source and channel coding are analogous. Most of our work in Chapter 2 is concerned with finding sufficient and necessary conditions for $\varepsilon_{k,n}$-transmissibility.

When studying the average error probability at finite blocklength, it is often insightful to express it in terms of specific random variables that are based on the source and channel statistics and are known in the literature as information-spectrum variables [32, 64] (see [32] for a review of information-spectrum methods). As we shall see in the next section, the distribution, or spectrum, of these variables provides a simple characterization of upper and lower bounds on the average error probability [33].
Definition 1.2.14 Let $P_V$ be a source distribution. We will call the r.v.
\[
h(\boldsymbol{V}) \triangleq \frac{1}{k} \log \frac{1}{P_V(\boldsymbol{V})} \qquad (1.15)
\]
the entropy density rate, and its distribution the entropy spectrum.

Notice that for a given $k$ and $P_V$, the entropy density rate can be seen as the (deterministic) transformation $f(\cdot) = -\log P_V(\cdot)$ of the variable $\boldsymbol{V}$, scaled by $\frac{1}{k}$. Further, the expected value of $h(\boldsymbol{V})$ equals $\frac{1}{k} H(\boldsymbol{V})$, where $H(\boldsymbol{V})$ stands for the entropy of $\boldsymbol{V}$ [13]:
\[
H(\boldsymbol{V}) = -\sum_{\boldsymbol{v}} P_V(\boldsymbol{v}) \log P_V(\boldsymbol{v}). \qquad (1.16)
\]
Definition 1.2.15 Let $P_{Y|X}$ be a channel transition probability. We will call the r.v.
\[
i(\boldsymbol{X}, \boldsymbol{Y}) \triangleq \frac{1}{n} \log \frac{P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})}{P_Y(\boldsymbol{Y})} \qquad (1.17)
\]
the information density rate, and its distribution the information spectrum.

Hence, the information density rate is a random variable fully determined by the channel transition probability $P_{Y|X}$ and the channel output distribution $P_Y$ induced by $P_X$ as $P_Y(\boldsymbol{Y}) = \sum_{\boldsymbol{x}} P_X(\boldsymbol{x}) P_{Y|X}(\boldsymbol{Y}|\boldsymbol{x})$. Similarly, it is easy to see that the expected value of $i(\boldsymbol{X}, \boldsymbol{Y})$ is given by $\frac{1}{n} I(\boldsymbol{X}; \boldsymbol{Y})$, where $I(\boldsymbol{X}; \boldsymbol{Y})$ is the mutual information between $\boldsymbol{X}$ and $\boldsymbol{Y}$ [13]:
\[
I(\boldsymbol{X}; \boldsymbol{Y}) = \sum_{\boldsymbol{y}} \sum_{\boldsymbol{x}} P_X(\boldsymbol{x}) P_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) \log \frac{P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})}{P_Y(\boldsymbol{y})}. \qquad (1.18)
\]

By convention, we will assume that $h(\boldsymbol{V}) = \infty$ and $i(\boldsymbol{X}, \boldsymbol{Y}) = \infty$ for $\boldsymbol{v}$ and $\boldsymbol{y}$ such that $P_V(\boldsymbol{v}) = 0$ and $P_Y(\boldsymbol{y}) = 0$, respectively, and $i(\boldsymbol{X}, \boldsymbol{Y}) = -\infty$ for all $(\boldsymbol{x}, \boldsymbol{y})$ such that $P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})/P_Y(\boldsymbol{y}) = 0$.
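For discrete memoryless pairs, both spectra can be computed in closed form. As an illustration (a numerical sketch that is not part of the dissertation; the helper names and the use of natural logarithms are our own assumptions), the following evaluates the entropy spectrum of a binary memoryless source and the information spectrum of a binary symmetric channel with uniform inputs:

```python
import numpy as np
from math import comb

def entropy_spectrum_bms(delta, k):
    """Entropy density rate h(V) of a BMS with P(1) = delta (values in nats).

    A vector with w ones has probability delta^w (1-delta)^(k-w), so the
    spectrum is supported on k + 1 points indexed by the weight w."""
    w = np.arange(k + 1)
    log_pv = w * np.log(delta) + (k - w) * np.log(1 - delta)
    probs = np.array([comb(k, int(j)) for j in w], dtype=float) * np.exp(log_pv)
    return -log_pv / k, probs

def information_spectrum_bsc(xi, n):
    """Information density rate i(X,Y) of a BSC(xi) with uniform inputs.

    The ratio P_{Y|X}(y|x)/P_Y(y) depends only on the number d of channel
    flips, since P_Y is uniform over {0,1}^n."""
    d = np.arange(n + 1)
    vals = np.log(2) + (d / n) * np.log(xi) + (1 - d / n) * np.log(1 - xi)
    probs = np.array([comb(n, int(j)) for j in d], dtype=float) \
            * xi**d * (1 - xi)**(n - d)
    return vals, probs

vals_h, p_h = entropy_spectrum_bms(0.05, 100)
vals_i, p_i = information_spectrum_bsc(0.11, 100)
# Sanity checks: E[h(V)] = H(V) and E[i(X,Y)] = log 2 - H2(xi), both in nats.
print(np.dot(vals_h, p_h))   # ~0.1985
print(np.dot(vals_i, p_i))   # ~0.3466
```

As expected, the means recover the source entropy and the BSC capacity, while the spread of each spectrum around its mean governs the finite-blocklength behaviour discussed below.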
A central concept in our analysis is the error exponent (also known as the reliability function) [23]. The error exponent gives the asymptotic exponential decay rate of the error probability with the channel blocklength. Although the concept is defined asymptotically, it provides an accurate characterization of the average error probability for sufficiently large finite blocklengths and small error probabilities.

Definition 1.2.16 Let $\mathbf{V}$ and $\mathbf{W}$ be the source and the channel of a JSCC scheme. The reliability function (or error exponent) is the largest number $E_J \geq 0$ for which there exist $\varepsilon_{k,n}$-codes such that
\[
E_J < \liminf_{n \to \infty} -\frac{1}{n} \log \varepsilon_{k,n}. \qquad (1.19)
\]

The above definition gives the best exponential decay of the error probability that a JSC code can achieve for a given source and channel. In other words, for any $\varepsilon_{k,n}$-code satisfying
\[
\varepsilon_{k,n} \leq \exp(-n E_1), \qquad (1.20)
\]
it must hold that $E_1 \leq E_J$. With some abuse of notation, we shall denote the vector form of the JSCC error exponent by $E_J(P_V, P_{Y|X})$. One can similarly define the source coding error exponent at a rate $R \geq 0$, denoted by $e(R, P_V)$ [17, 36], and the channel coding error exponent at a rate $R$, denoted by $E(R, P_{Y|X})$ [17, 23].
1.2.1 Discrete memoryless systems
We now define a widely studied class of sources and channels [13], to which we apply most of our results in the last section of Chapter 2.

Definition 1.2.17 Let $\boldsymbol{\mathcal{V}} = \mathcal{V}^k$ be the $k$-fold Cartesian product of a discrete alphabet $\mathcal{V}$. A discrete memoryless source (DMS) over $\boldsymbol{\mathcal{V}}$ is described by the distribution
\[
P_V(\boldsymbol{v}) = \prod_{\ell=1}^{k} P_V(v_\ell), \qquad \boldsymbol{v} = (v_1, \dots, v_k) \in \mathcal{V}^k. \qquad (1.21)
\]
Let $\boldsymbol{\mathcal{X}}$ and $\boldsymbol{\mathcal{Y}}$ be the $n$-fold Cartesian products of the discrete alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. A discrete memoryless channel (DMC) over $\boldsymbol{\mathcal{X}} \times \boldsymbol{\mathcal{Y}}$ is described by the transition probabilities
\[
P_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) = \prod_{\ell=1}^{n} P_{Y|X}(y_\ell|x_\ell), \qquad \boldsymbol{x} = (x_1, \dots, x_n) \in \mathcal{X}^n,\ \boldsymbol{y} = (y_1, \dots, y_n) \in \mathcal{Y}^n. \qquad (1.22)
\]

The definitions of $R_f(\mathbf{V})$ and $C(\mathbf{W})$ presented above particularize to well-known expressions for discrete memoryless systems [13], as a consequence of the source coding and channel coding theorems.

Theorem 1.2.1 ([23], [13]) The minimum achievable source coding rate equals the entropy of $P_V$, i.e.,
\[
R_f(\mathbf{V}) = H(V), \qquad (1.23)
\]
defined in (1.16). On the other hand, for a DMC with transition probability $P_{Y|X}$, the channel capacity $C(\mathbf{W}) = C$ equals
\[
C = \sup_{P_X} I(X; Y), \qquad (1.24)
\]
where $I(X; Y)$ is defined in (1.18).

We now present a methodology developed by Csiszar that provides a natural framework for studying the properties of DMSs and DMCs. It is called the method of types [16], and its applications range from information theory to related fields such as large-deviation theory and hypothesis testing [17]. The idea behind the method of types is the following: one divides the set of source vectors into classes, named types, according to the empirical distribution of the symbols appearing in the sequence. The key property of such classes is that their number grows sub-exponentially with the source blocklength $k$. Hence, we can compute the probability of any event that is exponential in $k$ (such as the error event) by intersecting it with the various types and summing each contribution. To further illustrate this method, we define in Appendix A.1 the concepts of type, joint type and conditional type, together with some important facts about them.
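The counting argument behind the method of types can be made concrete for a binary alphabet, where a type is simply the fraction of ones in a sequence. The sketch below (illustrative only; the helper names are hypothetical) contrasts the exact exponent of a type-class probability with its first-order large-deviation approximation, the binary KL divergence:

```python
from math import comb, log

def type_class_log_prob(k, w, delta):
    """log-probability of the binary type class with w ones under Bern(delta)."""
    return log(comb(k, w)) + w * log(delta) + (k - w) * log(1 - delta)

def binary_kl(p, q):
    """Binary KL divergence D(p || q) in nats."""
    out = 0.0
    if p > 0:
        out += p * log(p / q)
    if p < 1:
        out += (1 - p) * log((1 - p) / (1 - q))
    return out

# There are only k + 1 binary types versus 2^k sequences: the number of types
# grows polynomially, while the probability of each type class decays
# exponentially with exponent D(w/k || delta), up to a sub-exponential factor.
k, delta = 200, 0.05
for w in (10, 20, 40):
    exact = -type_class_log_prob(k, w, delta) / k   # exact exponent
    approx = binary_kl(w / k, delta)                # large-deviation exponent
    print(w, round(exact, 4), round(approx, 4))
```

The gap between the two columns is the $O\!\left(\frac{\log k}{k}\right)$ cost of the polynomial number of types, which vanishes as the blocklength grows.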
1.3 Previous work
1.3.1 The separation theorem
The JSCC theorem, stated by Shannon in his landmark paper [54], can be formulated as follows.

Theorem 1.3.1 (Shannon [54]) Let a source and a channel be described by stationary memoryless processes, and let $H(V)$ be the entropy of the source and $C$ the channel capacity. If $tH(V) < C$, the source is transmissible over the channel. Conversely, if $tH(V) > C$, the source is not transmissible over the channel.

The above statement can be viewed as a generalization of the channel coding theorem to general distributions $P_V$ over the messages. However, there is a more valuable message implicit in it: excluding the case $tH(V) = C$, separate source and channel coding achieves asymptotically vanishing error probability under the same condition as any jointly optimized scheme does. In fact, Shannon originally proved the direct part of Theorem 1.3.1 using a concatenation of a source code and a channel code. More specifically, he showed that reliable transmission is possible by compressing the source at the (source coding) rate $R_s = H(V)$ and encoding the resulting indices at the (channel coding) rate $R_c = t R_s$, as long as $R_c \leq C$. The proof relies on the so-called random-coding method. This method uses the fact that a bound on an ensemble average must be satisfied by at least one element of the ensemble. Shannon applied this idea in a very elegant way: by defining a distribution over the ensemble of codebooks, he bounded the ensemble-average error probability by a sequence of real numbers approaching 0, and thus proved the existence of at least one sequence of codes that asymptotically achieves zero error probability.

Shannon's theorem has been extended to the general definitions of sources and channels [64, 65] introduced in Section 1.2. In this context, the separation theorem has also been revisited [33, 62], yielding new sufficient and necessary conditions for the reliable transmission of a source over a channel. In particular, the authors of [62] describe an example where a generalized version of the separation theorem does not hold, i.e., the minimum achievable rate of the source is larger than the capacity of the channel and reliable transmission is still possible. This gives rise to a revised version of the JSCC theorem, which uses the concept of optimistic capacity in the converse part.

Theorem 1.3.2 (Vembu-Verdu-Steinberg [62]) A source $\mathbf{V}$ is transmissible over a channel $\mathbf{W}$ if $R_f(\mathbf{V}) < C(\mathbf{W})$. Conversely, a source is not transmissible over $\mathbf{W}$ if $R_f(\mathbf{V}) > \overline{C}(\mathbf{W})$.
1.3.2 Bounds on the error probability
One of the first characterizations of the JSCC error probability was proposed by Gallager [23, Prob. 5.16] as a take-home exam problem and is based on the aforementioned random-coding method.

Hereafter, for a given pair $(k, n)$, we assume that codebooks are generated by independently drawing codewords with probability $P_{X|V}$. Then, the probability of the codebook defined by $\phi(\cdot)$ is
\[
\Pr\{\mathcal{C}(\phi)\} = \prod_{\boldsymbol{v}} P_{X|V}(\phi(\boldsymbol{v})|\boldsymbol{v}). \qquad (1.25)
\]
According to (1.25), the random-coding average error probability $\mathbb{E}[\varepsilon_{k,n}]$ under a given decoder $\psi$ is given by
\[
\mathbb{E}[\varepsilon_{k,n}] = \sum_{\phi} \Pr\{\mathcal{C}(\phi)\}\, \varepsilon_{k,n}(\phi, \psi). \qquad (1.26)
\]
Hence, an upper bound on $\mathbb{E}[\varepsilon_{k,n}]$ ensures that there exists at least one codebook whose average error probability satisfies the bound. The next random-coding result, due to Gallager, is derived using $P_{X|V} = P_X$.
Theorem 1.3.3 (Gallager [23]) Let $(k, n)$ be fixed. For any $\rho \in [0, 1]$ and $P_X$, the average error probability over the ensemble of joint source-channel random codes satisfies
\[
\mathbb{E}[\varepsilon_{k,n}] \leq \Bigg(\sum_{\boldsymbol{v}} P_V(\boldsymbol{v})^{\frac{1}{1+\rho}}\Bigg)^{1+\rho} \sum_{\boldsymbol{y}} \Bigg(\sum_{\boldsymbol{x}} P_X(\boldsymbol{x})\, P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})^{\frac{1}{1+\rho}}\Bigg)^{1+\rho}. \qquad (1.27)
\]
Proof See Appendix A.2.
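For a discrete memoryless pair, both factors in (1.27) factorize into per-letter terms, so the bound can be evaluated numerically. The sketch below (our own illustration, not taken from the dissertation; natural logarithms, uniform channel inputs, and a hypothetical function name) does so for the BMS-BSC pair of Fig. 1.4, with a grid search over $\rho \in [0, 1]$:

```python
import numpy as np

def gallager_jscc_bound(delta, xi, k, n):
    """Optimized Gallager random-coding bound (1.27) for a BMS(delta)
    transmitted over a BSC(xi) with uniform inputs, in nats.

    For this memoryless pair the bound factorizes per letter, giving
    E[eps] <= exp(-n (E0(rho) - t Es(rho))) with t = k / n."""
    t = k / n
    rho = np.linspace(0.0, 1.0, 1001)
    a = 1.0 / (1.0 + rho)
    # Single-letter source and channel functions at s = 1/(1+rho)
    Es = (1 + rho) * np.log(delta**a + (1 - delta)**a)
    E0 = rho * np.log(2) - (1 + rho) * np.log(xi**a + (1 - xi)**a)
    exponent = float(np.max(E0 - t * Es))     # optimize over rho in [0, 1]
    return float(np.exp(-n * exponent))

print(gallager_jscc_bound(delta=0.05, xi=0.11, k=1000, n=1000))
```

For $\delta = 0.05$, $\xi = 0.11$ and $k = n = 1000$, the optimized bound is already below $10^{-5}$, consistent with the orders of magnitude reported in Fig. 1.4.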
Let $P$ be the distribution of the r.v. $A$ on the alphabet $\mathcal{A}$, and let $W$ be a conditional distribution of a r.v. $B$ given $A$, defined on the alphabet $\mathcal{B}$. Defining the functions
\[
E_s(\rho, s, P) \triangleq \log\Bigg(\sum_{a \in \mathcal{A}} P(a)^{1-s\rho} \Bigg(\sum_{a' \in \mathcal{A}} P(a')^{s}\Bigg)^{\rho}\Bigg) \qquad (1.28)
\]
\[
E_0(\rho, s, P, W) \triangleq -\log \mathbb{E}\Bigg[\Bigg(\frac{\mathbb{E}_P[W(B|A)^s]}{W(B|A)^s}\Bigg)^{\rho}\Bigg] \qquad (1.29)
\]
for $\rho \geq 0$ and $s > 0$, known in the literature as Gallager's source and channel functions [23], we can express the vector form of equation (1.27) as
\[
\mathbb{E}[\varepsilon_{k,n}] \leq e^{-E_0(\rho, \frac{1}{1+\rho}, P_X, P_{Y|X}) + E_s(\rho, \frac{1}{1+\rho}, P_V)}, \qquad \rho \in [0, 1]. \qquad (1.30)
\]
We shall use $E_s(\rho, P) \equiv E_s\big(\rho, \frac{1}{1+\rho}, P\big)$ and $E_0(\rho, Q, W) \equiv E_0\big(\rho, \frac{1}{1+\rho}, Q, W\big)$ hereafter, unless otherwise stated. Then, observe that (1.30) can be regarded as a lower bound on the actual JSCC error exponent:
\[
E_J(P_V, P_{Y|X}) \geq \liminf_{n \to \infty} \frac{1}{n}\Big(E_0(\rho, P_X, P_{Y|X}) - E_s(\rho, P_V)\Big), \qquad \rho \in [0, 1]. \qquad (1.31)
\]
In particular, for discrete memoryless systems, the functions (1.28) and (1.29) factorize in $k$ and $n$, respectively, and (1.31) is equivalent to
\[
E_J(P_V, P_{Y|X}) \geq E_0(\rho, P_X, P_{Y|X}) - t\, E_s(\rho, P_V). \qquad (1.32)
\]
Besides, by differentiating the right-hand side of (1.32) with respect to $\rho$, it can be shown that $E_J(P_V, P_{Y|X})$ is strictly positive when $tH(V) < C$, which provides an alternative proof of the JSCC theorem.
Csiszar tightened Gallager's bound (1.32), and also provided an upper bound on the error exponent [14, 15], using the method of types [16, 17]. For the lower bound, Csiszar used a maximum mutual information decoder at the receiver and bounded the error probability of a code whose codewords belong to a specific type. This class of codes is known in the literature as fixed-composition codes [17, 24]. Here, however, we present Csiszar's result in an equivalent form derived in [69] that is easier to compute. To do so, we define
\[
E_0(\rho, P_{Y|X}) \triangleq \max_{P_X} E_0(\rho, P_X, P_{Y|X}) \qquad (1.33)
\]
and denote by $\overline{E}_0(\rho, P_{Y|X})$ the concave hull of the function $E_0(\rho, P_{Y|X})$ over the appropriate domain of $\rho$.

Theorem 1.3.4 (Csiszar [14], Zhong et al. [69]) The JSCC error exponent $E_J(P_V, P_{Y|X})$ for a DMS-DMC pair is bounded as
\[
\max_{\rho \in [0,1]} \big\{\overline{E}_0(\rho, P_{Y|X}) - t\, E_s(\rho, P_V)\big\} \;\leq\; E_J(P_V, P_{Y|X}) \;\leq\; \sup_{\rho \geq 0} \big\{\overline{E}_0(\rho, P_{Y|X}) - t\, E_s(\rho, P_V)\big\}. \qquad (1.34)
\]

Proof In Appendix A.4, we provide a novel proof of Csiszar's lower bound using random coding and a MAP decoder at the receiver.
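For symmetric channels the hull operation is inactive (the maximizing input is uniform for every $\rho$, so $E_0$ is already concave), and the two sides of (1.34) reduce to optimizing the same function over different ranges of $\rho$. A numerical sketch (ours, not the dissertation's; natural logarithms; the function name and the truncation of the supremum at a finite $\rho$ are implementation assumptions) for the setting of Fig. 1.8:

```python
import numpy as np

def exponent_bounds(delta, xi, t=1.0, rho_max=10.0, npts=20001):
    """Evaluate both sides of (1.34) for a BMS(delta)-BSC(xi) pair, in nats.

    Assumes a BSC, for which the uniform input maximizes E0 for every rho,
    so E0(rho, P_{Y|X}) coincides with its hull."""
    rho = np.linspace(0.0, rho_max, npts)
    a = 1.0 / (1.0 + rho)
    Es = (1 + rho) * np.log(delta**a + (1 - delta)**a)
    E0 = rho * np.log(2) - (1 + rho) * np.log(xi**a + (1 - xi)**a)
    f = E0 - t * Es
    lower = float(np.max(f[rho <= 1.0]))   # lower bound: max over rho in [0, 1]
    upper = float(np.max(f))               # upper bound: sup over rho >= 0
    return lower, upper

lo, hi = exponent_bounds(delta=0.05, xi=0.002)
print(lo, hi)   # both strictly positive, since t H(V) < C here
```

When the unconstrained maximizer already lies in $[0, 1]$ the two values coincide, which is exactly the matching condition discussed below for Fig. 1.8.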
The above bounds are plotted in Fig. 1.8 as a function of the source entropy of a binary memoryless source, for a binary symmetric channel with crossover probability $\xi = 0.002$ and channel capacity $C = 0.9792$. According to Theorem 1.3.4, both bounds must coincide whenever the maximizing $\rho$ of the upper bound
Figure 1.8: Csiszar's upper and lower bounds on the JSCC error exponent $E_J$ with transmission rate $t = 1$ for a binary symmetric channel with crossover probability $\xi = 0.002$, as a function of the source entropy $H(V)$ of a binary memoryless source.
satisfies $\rho \in [0, 1]$. The plot also illustrates that the bounds are strictly positive for source entropies below the channel capacity.

One can observe in (1.34) that Csiszar's lower bound strengthens the optimized version of Gallager's bound (1.32) via the inequality
\[
\overline{E}_0(\rho, P_{Y|X}) \geq E_0(\rho, P_{Y|X}), \qquad (1.35)
\]
which holds with equality when $E_0(\rho, P_{Y|X})$ is concave in $\rho \in [0, 1]$ [69]. In particular, when the maximizing $P_X$ in Gallager's bound is independent of $\rho$, as is the case for symmetric channels [23], $E_0(\rho, P_{Y|X})$ is concave and both bounds are equal. In fact, they coincide for many source-channel pairs of interest, despite having been derived using very different techniques. In Chapter 2, we investigate this striking fact and provide a unified method to derive both results.
As discussed in Section 1.1, error exponents provide a method to compare the performance of JSCC with that of separation at finite blocklength. In general,
we can upper-bound the error probability of separate source and channel coding, denoted by $\varepsilon^{\mathrm{sep}}_{k,n}$, as [14]
\[
\varepsilon^{\mathrm{sep}}_{k,n} \leq \varepsilon_k + \varepsilon_n, \qquad (1.36)
\]
where $\varepsilon_k$ and $\varepsilon_n$ are the source and channel coding error probabilities. This in turn gives the achievable exponent
\[
E_{\mathrm{sep}}(P_V, P_{Y|X}) = \max_{R \in (R_f(\mathbf{V}), C(\mathbf{W}))} \min\big(e(R, P_V),\, E(R, P_{Y|X})\big). \qquad (1.37)
\]
Based on this observation and Csiszar’s bounds for DMS and DMC, the authorsin [69] derive the following result.
Theorem 1.3.5 ([69]) For a DMS and a DMC, the JSCC error exponent can be at most double the exponent (1.37), i.e.,
\[
E_J(P_V, P_{Y|X}) \leq 2\, E_{\mathrm{sep}}(P_V, P_{Y|X}). \qquad (1.38)
\]

In particular, in cases where (1.38) holds with equality, the above result can be interpreted through the relation
\[
\varepsilon_{k,n} \approx \big(\varepsilon^{\mathrm{sep}}_{k,n}\big)^2, \qquad (1.39)
\]
which means that SSCC approximately needs double the blocklength (and hence double the delay) of JSCC to achieve the same error probability. In Chapter 2 we refine this comparison with tight achievability and converse bounds on $\varepsilon^{\mathrm{sep}}_{k,n}$.

A more general approach [32] to computing error bounds is based on the spectra of $h(\boldsymbol{V})$ (1.15) and $i(\boldsymbol{X}, \boldsymbol{Y})$ (1.17), and is at the core of the proof of the general separation theorem (Theorem 1.3.2) [33]. The bounds are a generalization of classical results in channel coding to non-uniform source distributions. One example is the extension of Feinstein's Lemma [22] to JSCC due to Han [33].
Theorem 1.3.6 (Generalization of Feinstein's Lemma [33]) For any $P_{X|V}$ and every pair $(k, n)$, there exists an $\varepsilon_{k,n}$-code with threshold decoding such that
\[
\varepsilon_{k,n} \leq \Pr\{i(\boldsymbol{X}, \boldsymbol{Y}) \leq t\, h(\boldsymbol{V}) + \gamma\} + e^{-n\gamma}, \qquad \forall \gamma > 0. \qquad (1.40)
\]

Similarly, the dual form of the original Feinstein's Lemma, the Verdu-Han Lemma [65], can be extended to JSCC with the distributions $P_{X|V}$ and $P_Y$ induced by a given code.

Theorem 1.3.7 (Verdu-Han Lemma for JSCC [33]) The average error probability of a JSC code $(\phi, \psi)$, where $\boldsymbol{X} = \phi(\boldsymbol{V})$ and $\boldsymbol{Y}$ is obtained via $P_{Y|X}$, is lower bounded as
\[
\varepsilon_{k,n} \geq \Pr\{i(\boldsymbol{X}, \boldsymbol{Y}) \leq t\, h(\boldsymbol{V}) - \gamma\} - e^{-n\gamma}, \qquad \forall \gamma > 0. \qquad (1.41)
\]

The direct and converse parts of Theorem 1.3.2 are based on the asymptotic analysis of bounds (1.40) and (1.41).
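Bound (1.40) can be evaluated exactly for the BMS-BSC example of Fig. 1.4, since under i.i.d. random coding with inputs independent of the source, $h(\boldsymbol{V})$ and $i(\boldsymbol{X},\boldsymbol{Y})$ are independent with binomial-indexed spectra. A sketch (ours; assumptions: $t = 1$ so $k = n$, uniform channel inputs, natural logarithms, hypothetical helper name):

```python
import numpy as np
from math import comb

def feinstein_jscc_bound(delta, xi, n, gamma):
    """Right-hand side of (1.40) for a BMS(delta) over a BSC(xi), t = 1.

    h(V) is indexed by the source weight, i(X,Y) by the number of channel
    flips; the two are independent, so the probability is a double sum."""
    w = np.arange(n + 1)
    def binom_pmf(p):
        return np.array([comb(n, int(j)) for j in w], dtype=float) \
               * p**w * (1 - p)**(n - w)
    h = -(w * np.log(delta) + (n - w) * np.log(1 - delta)) / n
    ph = binom_pmf(delta)
    i = np.log(2) + (w / n) * np.log(xi) + (1 - w / n) * np.log(1 - xi)
    pi = binom_pmf(xi)
    # Pr{ i(X,Y) <= h(V) + gamma } over the product of the two spectra
    prob = np.sum(pi[:, None] * ph[None, :] * (i[:, None] <= h[None, :] + gamma))
    return float(prob + np.exp(-n * gamma))

print(feinstein_jscc_bound(delta=0.05, xi=0.11, n=1000, gamma=0.05))
```

The threshold $\gamma$ trades the spectrum probability against the $e^{-n\gamma}$ penalty; a moderate $\gamma$ already makes the second term negligible at $n = 1000$.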
1.3.3 Refined analysis of non-asymptotic channel coding
The finite-length perspective has recently been adopted in the context of channel coding [34, 41, 49], providing a framework to study more general problems such as JSCC or multi-terminal communications. We review below a few results from [41, 49] that inspire part of our work in Chapter 2.

Let a channel over alphabets $\boldsymbol{\mathcal{X}}$ and $\boldsymbol{\mathcal{Y}}$ be fixed for a given $n$. The first result tightens the union bound on the random-coding average error probability.
Theorem 1.3.8 (Random Coding Union (RCU) bound [49]) For any distribution $P_X$, there exists an $\varepsilon_n$-code with $M$ codewords, decoded with maximum likelihood, satisfying
\[
\varepsilon_n \leq \mathbb{E}\Big[\min\Big(1,\, (M - 1) \Pr\big\{P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}}) \geq P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X}) \,\big|\, \boldsymbol{X}\boldsymbol{Y}\big\}\Big)\Big], \qquad (1.42)
\]
where $\bar{\boldsymbol{X}} \sim P_X$ is independent of $\boldsymbol{Y}$.
Gallager's bound for channel coding [23] can be obtained from (1.42) using Markov's inequality and $\min(1, x) \leq x^\rho$, $\rho \in [0, 1]$ [41]:
\[
\varepsilon_n \leq e^{-E_0(\rho, \frac{1}{1+\rho}, P_X, P_{Y|X}) + \rho n R}, \qquad (1.43)
\]
where $R = \frac{\log M}{n}$.

More generally, the use of Markov's inequality yields a family of bounds parameterized by $s \geq 0$.
Theorem 1.3.9 (Tilted Random Coding Union bound [41]) For any distribution $P_X$, there exists an $\varepsilon_n$-code with $M$ codewords, decoded with maximum likelihood, satisfying
\[
\varepsilon_n \leq \mathbb{E}\Big[e^{-n\big|i_s(\boldsymbol{X};\boldsymbol{Y}) - \frac{1}{n}\log(M-1)\big|^+}\Big], \qquad (1.44)
\]
where
\[
i_s(\boldsymbol{X};\boldsymbol{Y}) \triangleq \frac{1}{n} \log \frac{P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})^s}{\mathbb{E}\big[P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s \,\big|\, \boldsymbol{Y}\big]}. \qquad (1.45)
\]

The above results are all derived using MAP decoding. On the other hand, one can also obtain bounds assuming threshold decoding at the receiver. In particular, the authors of [49] provide a refinement of Feinstein's result (1.40).
Theorem 1.3.10 (Dependence Testing (DT) bound [49]) For any distribution $P_X$, there exists an $\varepsilon_n$-code with $M$ codewords, decoded with threshold decoding, satisfying
\[
\varepsilon_n \leq \mathbb{E}\Big[e^{-n\big|i(\boldsymbol{X},\boldsymbol{Y}) - \frac{1}{n}\log\frac{M-1}{2}\big|^+}\Big], \qquad (1.46)
\]
where $|x|^+ = \max(x, 0)$.
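Like (1.40), the DT bound is a simple functional of the information spectrum and can be evaluated exactly for a BSC. A sketch (ours; uniform inputs and natural logarithms assumed, function name hypothetical):

```python
import numpy as np
from math import comb

def dt_bound_bsc(xi, n, M):
    """Dependence testing bound (1.46) for a BSC(xi) with uniform inputs,
    evaluated exactly through the information spectrum (natural logs)."""
    d = np.arange(n + 1)
    pd = np.array([comb(n, int(j)) for j in d], dtype=float) \
         * xi**d * (1 - xi)**(n - d)
    i = np.log(2) + (d / n) * np.log(xi) + (1 - d / n) * np.log(1 - xi)
    exponent = np.maximum(i - np.log((M - 1) / 2) / n, 0.0)
    return float(np.sum(pd * np.exp(-n * exponent)))

# e.g. M = 2^150 codewords over n = 500 uses (rate 0.3 bits/use, below the
# BSC(0.11) capacity of about 0.5 bits/use)
print(dt_bound_bsc(xi=0.11, n=500, M=2**150))
```

The bound degrades gracefully as $M$ approaches the capacity limit, which is what makes it useful in the finite-blocklength regime.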
The analogy between hypothesis testing and the channel coding problem has been recognized since the early work of Blahut [5]. Of special relevance is the relationship between the error of a specific test and the average error probability. The Neyman-Pearson Lemma proves to be a useful tool, as it sets the performance limit of a binary hypothesis test in terms of the decision error.

We now revisit this lemma. Consider first a test between two hypotheses $H_0$ and $H_1$ based on the distribution of a random variable $A$ taking values in $\mathcal{A}$. We say that $H_0$ holds if $A$ is distributed according to $P_0$, and that $H_1$ holds if $A$ is distributed according to $P_1$. Given an observation of $A$, we can construct any decision rule by defining the region of $\mathcal{A}$ in which we accept $H_0$.
Theorem 1.3.11 (Neyman-Pearson Lemma [45]) For any $\alpha \in [0, 1]$, there exist $\gamma \geq 0$ and $\tau \in [0, 1)$ such that the decision rule with regions
\[
\mathcal{R}_{H_0} = \Big\{a \in \mathcal{A} \;\Big|\; \frac{P_0(a)}{P_1(a)} > \gamma\Big\} \qquad (1.47)
\]
\[
\mathcal{R}_{H_1} = \Big\{a \in \mathcal{A} \;\Big|\; \frac{P_0(a)}{P_1(a)} \leq \gamma\Big\} \qquad (1.48)
\]
satisfies
\[
\Pr\Big\{\frac{P_0(A)}{P_1(A)} > \gamma \;\Big|\; A \sim P_0\Big\} + \tau \Pr\Big\{\frac{P_0(A)}{P_1(A)} = \gamma \;\Big|\; A \sim P_0\Big\} = \alpha \qquad (1.49)
\]
and achieves the minimum error $\Pr\{H_0 \,|\, H_1 \text{ is true}\}$ among the tests with
\[
\Pr\{H_0 \,|\, H_0 \text{ is true}\} \geq \alpha. \qquad (1.50)
\]
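For finite alphabets, the optimal trade-off in the lemma is easy to compute: sort the atoms by likelihood ratio, fill the $H_0$-acceptance region until the type-I constraint is met, and randomize on the boundary atom. A sketch (ours; hypothetical helper name; assumes $P_1 > 0$ on every atom):

```python
import numpy as np
from math import comb

def beta_alpha(p0, p1, alpha):
    """Minimum type-II error of the Neyman-Pearson test accepting H0 with
    probability at least alpha under P0 (assumes p1 > 0 on every atom)."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    order = np.argsort(-(p0 / p1))       # decreasing likelihood ratio P0/P1
    acc0, beta = 0.0, 0.0
    for idx in order:
        if acc0 + p0[idx] < alpha:
            acc0 += p0[idx]              # include atom fully in the H0 region
            beta += p1[idx]
        else:                            # randomize on the boundary atom
            tau = (alpha - acc0) / p0[idx]
            return float(beta + tau * p1[idx])
    return float(beta)

# Example: P0 = Binomial(n, 0.5) vs P1 = Binomial(n, 0.11) -- the sufficient
# statistic (number of ones) of an i.i.d. test between two coins.
n = 20
w = np.arange(n + 1)
binom = np.array([comb(n, int(j)) for j in w], dtype=float)
p0 = binom * 0.5**n
p1 = binom * 0.11**w * 0.89**(n - w)
print(beta_alpha(p0, p1, alpha=0.9))
```

This is exactly the quantity $\beta_\alpha$ that appears in the meta-converse bound discussed next.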
The relationship between the Neyman-Pearson Lemma and the channel coding problem was analyzed by Blahut [5] and refined in [49]. In particular, the authors of [49] derive a converse bound on the number of codewords that it is possible to transmit at a fixed $\varepsilon_n$.

Theorem 1.3.12 (Meta-converse bound [49]) Every $\varepsilon_n$-code with $M$ codewords satisfies
\[
M \geq \sup_{P_X} \inf_{Q_Y} \frac{1}{\beta_{1-\varepsilon_n}\big(P_X \times P_{Y|X},\, P_X \times Q_Y\big)}, \qquad (1.51)
\]
where $\beta_\alpha\big(P_X \times P_{Y|X}, P_X \times Q_Y\big)$ is the minimum type-II error given by the Neyman-Pearson Lemma when testing a joint distribution on $(\boldsymbol{X}, \boldsymbol{Y})$ between $P_X \times P_{Y|X}$ and $P_X \times Q_Y$, with maximum type-I error $1 - \alpha$.
Chapter 2
Joint-source channel coding for arbitrary blocklength
2.1 Presentation
In this chapter we derive achievability and converse results on the JSCC error probability for arbitrary blocklength. The chapter is structured as follows. In Section 2.2 we use the random-coding method to derive a number of achievability bounds on the average error probability that provide sufficient conditions for $\varepsilon_{k,n}$-transmissibility. These bounds translate into lower bounds on the JSCC error exponent. In particular, we elaborate a general version of the random-coding error exponent that particularizes to known results in the literature [14, 69].

Inspired by previous work [5, 49], we draw an analogy between a binary hypothesis test and the JSCC problem in Section 2.3 and obtain a novel converse bound on the average error probability.
2.2 Achievability results
In the following, we present upper (achievability) random-coding bounds on the error probability under MAP and threshold decoding. To that purpose, we assume that codewords are independently generated according to a distribution $P_{X|V}$. As we shall see, each bound follows from successive relaxations of the exact random-coding error probability $\mathbb{E}[\varepsilon_{k,n}]$.
2.2.1 MAP decoding upper bounds
We start this section by defining the MAP decoder. Given a received sequence $\boldsymbol{y}$ and a source-channel encoder $\phi$, the MAP decoder chooses one source output $\hat{\boldsymbol{v}}$
with equal probability among the vectors $\boldsymbol{v}'$ that maximize the posterior probability $P_{V|Y}(\boldsymbol{v}|\boldsymbol{y})$. That is, for a given $\boldsymbol{y}$, the MAP decoder $\psi_{\mathrm{MAP}}$ equiprobably chooses a $\hat{\boldsymbol{v}}$ such that
\[
\hat{\boldsymbol{v}} \in \mathcal{D}_\phi(\boldsymbol{y}) \triangleq \Big\{\boldsymbol{v}' \in \boldsymbol{\mathcal{V}} : \boldsymbol{v}' = \arg\max_{\boldsymbol{v} \in \boldsymbol{\mathcal{V}}} P_V(\boldsymbol{v})\, P_{Y|X}(\boldsymbol{y}|\phi(\boldsymbol{v})) \Big\}. \qquad (2.1)
\]

In the following random-coding analysis of the decoder, the candidate codeword assigned to the transmitted message $\boldsymbol{v}$ will be denoted by $\boldsymbol{X}$, and the candidate codewords for the non-transmitted $\bar{\boldsymbol{v}} \neq \boldsymbol{v}$ will be denoted by a r.v. $\bar{\boldsymbol{X}}$ independent of $\boldsymbol{X}$. In this setting, we can distinguish two error events for every wrong candidate $\bar{\boldsymbol{v}}$:
\[
\mathcal{E}_1(\bar{\boldsymbol{v}}) \triangleq \big\{P_V(\bar{\boldsymbol{v}}) P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}}) > P_V(\boldsymbol{V}) P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X}) \,\big|\, \boldsymbol{V}\boldsymbol{X}\boldsymbol{Y}\big\} \qquad (2.2)
\]
\[
\mathcal{E}_2(\bar{\boldsymbol{v}}) \triangleq \big\{P_V(\bar{\boldsymbol{v}}) P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}}) = P_V(\boldsymbol{V}) P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X}) \cap \{\bar{\boldsymbol{v}} \text{ is decoded}\} \,\big|\, \boldsymbol{V}\boldsymbol{X}\boldsymbol{Y}\big\}. \qquad (2.3)
\]
We can therefore evaluate $\mathbb{E}[\varepsilon_{k,n}]$ as
\[
\mathbb{E}[\varepsilon_{k,n}] = \mathbb{E}\Bigg[\Pr\Bigg\{\bigcup_{\bar{\boldsymbol{v}} \neq \boldsymbol{V}} \mathcal{E}_1(\bar{\boldsymbol{v}}) \cup \mathcal{E}_2(\bar{\boldsymbol{v}}) \,\Bigg|\, \boldsymbol{V}\boldsymbol{X}\boldsymbol{Y}\Bigg\}\Bigg] \qquad (2.4)
\]
\[
= 1 - \sum_{\boldsymbol{v}} P_V(\boldsymbol{v})\, \mathbb{E}\Bigg[\Pr\Bigg\{\bigcap_{\bar{\boldsymbol{v}} \neq \boldsymbol{V}} \mathcal{E}_1^c(\bar{\boldsymbol{v}}) \cap \mathcal{E}_2^c(\bar{\boldsymbol{v}}) \,\Bigg|\, \boldsymbol{V}\boldsymbol{X}\boldsymbol{Y}\Bigg\}\Bigg], \qquad (2.5)
\]
where $\bar{\boldsymbol{v}} \neq \boldsymbol{V}$ is an abuse of notation denoting that $\bar{\boldsymbol{v}}$ is different from a given realization of $\boldsymbol{V}$.
We now provide an explicit computation of (2.5) by conditioning on the num-ber of ties that the decoder handles. For a particular y let Dφ(y) be the set ofsource outputs whose MAP decoding metric is maximum and |Dφ(y)| = ` itscardinality. For a given v, let the ensemble of subsets of V◦(v) , V \ {v} withcardinality `, i.e, the ensemble of all possible combinations of ` decoding ties, bedefined as P(V◦(v), `) , {A ⊆ V◦(v), |A| = `}, and be denoted by P(V◦(V ), `)when V is not specified. Then, simple combinatorial arguments yield the follow-ing result.
Theorem 2.2.1 The random-coding average error probability of JSCC under MAP decoding and arbitrary P_{X|V} is given by
\[ \varepsilon^{\mathrm{MAP}}_{k,n} \triangleq 1 - \sum_{\ell \geq 0} \frac{1}{\ell + 1}\, \mathbb{E}\bigl[\Pr\{\ell \mid V, X, Y\}\bigr], \tag{2.6} \]
where
\[ \Pr\{\ell \mid V, X, Y\} \triangleq \sum_{\mathcal{A} \in \mathcal{P}(\mathcal{V}_\circ(V), \ell)} \prod_{\bar{v} \in \mathcal{A}} \Pr\bigl\{ P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) = P_V(V) P_{Y|X}(Y|X) \,\big|\, V, X, Y \bigr\} \times \prod_{v' \in \mathcal{V}_\circ(V) \setminus \mathcal{A}} \Pr\bigl\{ P_V(v') P_{Y|X}(Y|\bar{X}) < P_V(V) P_{Y|X}(Y|X) \,\big|\, V, X, Y \bigr\}. \tag{2.7} \]
The computation of the exact formula (2.6) is challenging even for small (k, n). Instead, we can obtain a looser version by assuming that the decoder always resolves ties in error:
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \mathbb{E}\Biggl[\Pr\biggl\{ \bigcup_{\bar{v} \neq V} P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) \geq P_V(V) P_{Y|X}(Y|X) \,\bigg|\, V, X, Y \biggr\}\Biggr]. \tag{2.8} \]
By using the following identity for a sequence of real-valued r.v.'s \{A_i\},
\[ \Pr\Bigl\{ \bigcup_i \{A_i \geq 1\} \Bigr\} = \Pr\Bigl\{ \max_i A_i \geq 1 \Bigr\}, \tag{2.9} \]
we obtain the next upper bound.

Theorem 2.2.2 (Max-bound) For every (k, n) and arbitrary P_{X|V}, there exists an \varepsilon_{k,n}-code such that
\[ \varepsilon_{k,n} \leq \mathrm{MAX}(k,n) \triangleq \mathbb{E}\Bigl[\Pr\Bigl\{ \max_{\bar{v} \neq V} P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) \geq P_V(V) P_{Y|X}(Y|X) \,\Big|\, V, X, Y \Bigr\}\Bigr]. \tag{2.10} \]
In practice, the computation of the bound (2.10) typically involves an exponential number of products, which requires a high degree of numerical precision, as each factor can be fairly small. In order to obtain simpler bounds, we can use the union bound
\[ \Pr\Bigl\{ \max_i A_i \geq 1 \Bigr\} \leq \min\Bigl\{ 1, \sum_i \Pr\{A_i \geq 1\} \Bigr\}. \tag{2.11} \]
Based on (2.11), we can extend the so-called random-coding union (RCU) bound for channel coding [49, Th. 16] to JSCC.
Corollary 2.2.1 (RCU bound) For an arbitrary P_{X|V}, there exists an \varepsilon_{k,n}-code such that
\[ \varepsilon_{k,n} \leq \mathrm{RCU}(k,n) \triangleq \mathbb{E}\Biggl[\min\biggl\{ 1, \sum_{\bar{v} \neq V} \Pr\bigl\{ P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) \geq P_V(V) P_{Y|X}(Y|X) \,\big|\, V, X, Y \bigr\} \biggr\}\Biggr]. \tag{2.12} \]
The bound in (2.12) turns out to be a good bound, especially when the error probability is small, since most of the pairwise error probabilities are then very small. Furthermore, we can relax (2.12) by applying Markov's inequality to each term in the sum [41], which yields
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \mathbb{E}\Biggl[\min\biggl\{ 1, \sum_{\bar{v} \neq V} \frac{\mathbb{E}\bigl[P_V(\bar{v})^s P_{Y|X}(Y|\bar{X})^s \,\big|\, Y\bigr]}{P_V(V)^s P_{Y|X}(Y|X)^s} \biggr\}\Biggr]. \tag{2.13} \]
By using the identity \min\{1, x\} = \exp(-|\log x^{-1}|^+) for all x \in \mathbb{R}^+, where |x|^+ \triangleq \max(x, 0), a new family of upper bounds parameterized by s > 0 follows.

Corollary 2.2.2 (Tilted RCU bounds) For s > 0 and arbitrary P_{X|V}, there exists an \varepsilon_{k,n}-code such that
\[ \varepsilon_{k,n} \leq \mathrm{RCU}_s(k,n) \triangleq \mathbb{E}\bigl[ e^{-n |j_s(V,X,Y)|^+} \bigr], \tag{2.14} \]
where
\[ j_s(V,X,Y) \triangleq \frac{1}{n} \log \frac{P_V(V)^s P_{Y|X}(Y|X)^s}{\sum_{\bar{v}} P_V(\bar{v})^s \sum_{\bar{x}} P_{X|V}(\bar{x}|\bar{v}) P_{Y|X}(Y|\bar{x})^s}. \tag{2.15} \]
In particular, when s = 1, equation (2.15) takes the form j_1(V,X,Y) = i(X,Y) - t\,h(V), where i(X,Y) and h(V) are the information and entropy density rates, respectively [32]. Hence, Theorem 2.2.2 can be used to prove the direct part of the general JSCC theorem [33]. For ease of notation, we shall use j(V,X,Y) to denote j_1(V,X,Y). Moreover, when X is independent of V, j_s(V,X,Y) decomposes into a source and a channel term
\[ j_s(V,X,Y) = i_s(X,Y) - t\,h_s(V), \tag{2.16} \]
where
\[ i_s(X,Y) \triangleq \frac{1}{n} \log \frac{P_{Y|X}(Y|X)^s}{\mathbb{E}\bigl[P_{Y|X}(Y|\bar{X})^s \,\big|\, Y\bigr]} \quad \text{and} \quad h_s(V) \triangleq \frac{1}{k} \log \frac{\sum_{\bar{v}} P_V(\bar{v})^s}{P_V(V)^s}. \]
Further, Gallager's upper bound (Theorem 1.3.3, Eq. (1.27)) can be obtained from (2.13) by assuming that V and X are independent, using \min(1, x) \leq x^\rho for 0 \leq \rho \leq 1, and letting s = \frac{1}{1+\rho}:
\[ \mathbb{E}[\varepsilon_{k,n}] \leq e^{-E_0(\rho, P_X, P_{Y|X}) + E_s(\rho, P_V)}. \tag{2.17} \]
In the context of discrete memoryless systems, and for P_X being a product distribution P_X(x) = \prod_{i=1}^n P_X(x_i), x \in \mathcal{X}^n, Gallager's upper bound specializes to
\[ \varepsilon \leq e^{-n \bigl( E_0(\rho, P_{Y|X}, P_X) - t E_s(\rho, P_V) \bigr)}. \tag{2.18} \]
Thus, the probability of error \varepsilon vanishes exponentially in n with the error exponent E_0(\rho, P_{Y|X}, P_X) - t E_s(\rho, P_V). By maximizing over P_X and \rho, this gives rise to a lower bound on the error exponent E_J(P_V, P_{Y|X}) of the best JSCC scheme:
\[ E_J(P_V, P_{Y|X}) \geq E^{\mathrm{G}}_J \triangleq \max_{\rho \in [0,1]} \bigl\{ E_0(\rho, P_{Y|X}) - t E_s(\rho, P_V) \bigr\}, \tag{2.19} \]
where we define E_0(\rho, P_{Y|X}) \triangleq \max_{P_X} E_0(\rho, P_{Y|X}, P_X). As introduced in Chapter 1, (2.19) was strengthened by Csiszár (Theorem 1.3.4) [14] to
\[ E^{\mathrm{Cs}}_J = \min_{R > 0} \Bigl\{ E_r(R, P_{Y|X}) + t\, e\bigl(\tfrac{R}{t}, P_V\bigr) \Bigr\}. \tag{2.20} \]
Even though the difference between E^{\mathrm{Cs}}_J and E^{\mathrm{G}}_J is typically small [69], the methods used to derive each exponent are conceptually different. Firstly, Csiszár considers fixed-composition codes rather than codes generated by product distributions; such codes are constructed by mapping messages within a source type onto sequences within a channel-input type. Secondly, a suboptimal maximum mutual information decoder is used at the receiver. This decoder first decides on the source type being transmitted, and then on the source message within the type. In fact, one may wonder whether we can still obtain Csiszár's exponent by using the tools introduced above, i.e., by essentially considering random-coding techniques, MAP decoding at the receiver, and bounding techniques based on Markov's inequality [23]. We partially addressed these questions in Chapter 1 by showing that (2.20) can be obtained using the random-coding technique and MAP decoding at the receiver. However, some questions remain open:

• Are types and fixed-composition codes needed to derive Csiszár's exponent?

• Can we obtain Csiszár's exponent using random coding and Gallager's bounding techniques?
We answer the above points by proposing a new random-coding lower bound on the JSCC error exponent that holds for discrete sources with arbitrary alphabets and general channels. The method that we follow to derive our main result is inspired by [14, 23] and can be summarized in three steps. First, we consider a set of partitions of the source output space \mathcal{V}^k into classes and assign a channel-input distribution to each class. Second, for every partition, we randomly and independently generate every codeword according to the class of the transmitted message. Finally, we choose the partition that maximizes the error exponent.
More explicitly, the derivation of the new bound involves the following steps:

1. Define a partition \mathcal{P}_k of the message set \mathcal{V}^k into N_k disjoint subsets \mathcal{A}_1, \ldots, \mathcal{A}_{N_k} satisfying \bigcup_{i=1}^{N_k} \mathcal{A}_i = \mathcal{V}^k. We shall refer to these subsets as classes. A channel-input distribution P^{(i)}_X is assigned to each class \mathcal{A}_i.

2. For each source message v \in \mathcal{A}_i, randomly and independently generate codewords x(v) \in \mathcal{X}^n according to P^{(i)}_X.

3. Upper-bound the probability of error using Gallager's bounding techniques [23].
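The construction in steps 1 and 2 can be sketched in code. The snippet below is purely illustrative (the class-dependent Bernoulli parameters are an arbitrary choice, not the thesis's): it partitions binary length-k messages into weight (type) classes and draws each codeword i.i.d. from the distribution P_X^{(i)} assigned to the class of its message.

```python
# Sketch of the partition-based random-coding construction (illustrative):
# classes A_i are the Hamming-weight type classes of {0,1}^k, and each class
# gets its own i.i.d. Bernoulli codeword distribution P_X^{(i)}.
import itertools
import random

def build_codebook(k, n, px_per_class, seed=0):
    """px_per_class[w] is the Bernoulli parameter used for class A_w
    (all messages of weight w); returns {message: codeword}."""
    rng = random.Random(seed)
    codebook = {}
    for v in itertools.product((0, 1), repeat=k):
        w = sum(v)                       # class index: the type (weight) of v
        p = px_per_class[w]              # channel-input distribution P_X^{(w)}
        codebook[v] = tuple(int(rng.random() < p) for _ in range(n))
    return codebook

# arbitrary example: bias the codeword distribution with the class index
k, n = 4, 8
cb = build_codebook(k, n, px_per_class=[0.3 + 0.1 * w for w in range(k + 1)])
assert len(cb) == 2 ** k and all(len(x) == n for x in cb.values())
```

Step 3 then bounds the error probability of this ensemble; the point of the construction is only that codewords of different classes follow different input distributions.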
For future use we define, for i = 1, \ldots, N_k,
\[ E^{(i)}_s(\rho, P_V) \triangleq \log \Biggl( \sum_{v \in \mathcal{A}_i} P_V(v)^{\frac{1}{1+\rho}} \Biggr)^{1+\rho}, \tag{2.21} \]
and the probability mass function
\[ P^{(i)}_V(v) \triangleq \begin{cases} \dfrac{P_V(v)}{\sum_{v' \in \mathcal{A}_i} P_V(v')}, & v \in \mathcal{A}_i, \\ 0, & v \notin \mathcal{A}_i. \end{cases} \tag{2.22} \]
Theorem 2.2.3 For every partition \mathcal{P}_k, every set of channel-input distributions P^{(1)}_X, \ldots, P^{(N_k)}_X, and every set of parameters \rho_1, \ldots, \rho_{N_k} \in [0,1], the average probability of error is upper-bounded by
\[ \varepsilon \leq \varepsilon_B(\mathcal{P}_k) \triangleq h(k) \sum_{i=1}^{N_k} \exp\Bigl( -E_0\bigl(\rho_i, P_{Y|X}, P^{(i)}_X\bigr) + E^{(i)}_s(\rho_i, P_V) \Bigr), \tag{2.23} \]
where
\[ h(k) \triangleq \frac{3 N_k - 1}{2}. \tag{2.24} \]
If we choose the sequence of partitions \mathcal{P}_k such that N_k = 1 and \mathcal{A}_1 = \mathcal{V}^k for k \geq 1, then E^{(1)}_s(\rho, P_V) = E_s(\rho, P_V). If we optimize the bound (2.23) for this choice of \mathcal{P}_k over product distributions P_X(x) = \prod_{i=1}^n P_X(x_i), x \in \mathcal{X}^n, and over \rho_1 \in [0,1], we obtain
\[ \varepsilon \leq h(k) \exp\Bigl( -n \max_{\rho_1 \in [0,1]} \bigl\{ E_0(\rho_1, P_{Y|X}) - t E_s(\rho_1, P_V) \bigr\} \Bigr). \tag{2.25} \]
Hence, for the trivial partition with N_k = 1, Theorem 2.2.3 recovers Gallager's bound on the error exponent (2.19), since \log h(k)/n \to 0 as n \to \infty:
\[ \lim_{n \to \infty} -\frac{1}{n} \log \varepsilon_B(\mathcal{P}_k) = E^{\mathrm{G}}_J. \tag{2.26} \]
With a more judicious choice of \mathcal{P}_k, the upper bound (2.23) also recovers Csiszár's lower bound on the error exponent (2.20). Specifically, (2.20) can be achieved by identifying the classes \mathcal{A}_1, \ldots, \mathcal{A}_{N_k} with the source-type classes \mathcal{T}_1, \ldots, \mathcal{T}_{N_k}. Moreover, Csiszár's bound can also be recovered if we restrict ourselves to product distributions, as shown next.
Corollary 2.2.3 Let the classes \mathcal{A}_1, \ldots, \mathcal{A}_{N_k} of the partition \mathcal{P}_k be the source-type classes \mathcal{T}_1, \ldots, \mathcal{T}_{N_k}. Then, optimizing (2.23) over product distributions P^{(i)}_X(x) = \prod_{i'=1}^n P^{(i)}_X(x_{i'}), x \in \mathcal{X}^n, and over \rho_i \in [0,1], i = 1, \ldots, N_k, we obtain
\[ \varepsilon \leq h(k) \sum_{i=1}^{N_k} \exp\Bigl( -\max_{\rho_i \in [0,1]} \Bigl\{ n \max_{P^{(i)}_X} E_0(\rho_i, P_{Y|X}, P^{(i)}_X) - E^{(i)}_s(\rho_i, P_V) \Bigr\} \Bigr), \tag{2.27} \]
and
\[ \liminf_{n \to \infty} -\frac{1}{n} \log \varepsilon_B(\mathcal{P}_k) \geq E^{\mathrm{Cs}}_J. \tag{2.28} \]
2.2.1.1 Proof of Theorem 2.2.3
To prove Theorem 2.2.3, we apply the RCU bound (2.12) together with the proposed code construction to obtain
\[ \varepsilon \leq \sum_{i=1}^{N_k} \sum_{v \in \mathcal{A}_i} P_V(v) \sum_{x, y} P^{(i)}_X(x) P_{Y|X}(y|x) \min\Biggl\{ 1, \sum_{j=1}^{N_k} \sum_{\bar{v} \in \mathcal{A}_j} \sum_{\bar{x} :\, P_V(\bar{v}) P_{Y|X}(y|\bar{x}) \geq P_V(v) P_{Y|X}(y|x)} P^{(j)}_X(\bar{x}) \Biggr\} \tag{2.29} \]
\[ \leq \sum_{i=1}^{N_k} \sum_{v \in \mathcal{A}_i} P_V(v) \sum_{x, y} P^{(i)}_X(x) P_{Y|X}(y|x) \min\Biggl\{ 1, \sum_{j=1}^{N_k} \sum_{\bar{v} \in \mathcal{A}_j} \sum_{\bar{x}} P^{(j)}_X(\bar{x}) \frac{\bigl( P_V(\bar{v}) P_{Y|X}(y|\bar{x}) \bigr)^{s_j}}{\bigl( P_V(v) P_{Y|X}(y|x) \bigr)^{s_j}} \Biggr\}, \tag{2.30} \]
Figure 2.1: Random-coding construction: different classes \mathcal{A}_i are associated with different codeword distributions P^{(i)}_X, i = 1, \ldots, N, with N = 4.
where (2.30) follows from Markov's inequality for s_j \in [1/2, 1], j = 1, \ldots, N_k [23]. Using the inequality \min\{1, A + B\} \leq \min\{1, A\} + \min\{1, B\} for A, B \geq 0, we upper-bound (2.30) as
\[ \varepsilon \leq \sum_{i,j=1}^{N_k} \varepsilon(i,j), \tag{2.31} \]
where \varepsilon(i,j) bounds the cross probability of error from class i to class j, i.e.,
\[ \varepsilon(i,j) \triangleq \sum_{v \in \mathcal{A}_i} P_V(v) \sum_{x, y} P^{(i)}_X(x) P_{Y|X}(y|x) \min\Biggl\{ 1, \sum_{\bar{v} \in \mathcal{A}_j} \sum_{\bar{x}} P^{(j)}_X(\bar{x}) \frac{\bigl( P_V(\bar{v}) P_{Y|X}(y|\bar{x}) \bigr)^{s_j}}{\bigl( P_V(v) P_{Y|X}(y|x) \bigr)^{s_j}} \Biggr\}. \tag{2.32} \]
Using the inequality \min\{1, A\} \leq A^\rho for A \geq 0 and 0 \leq \rho \leq 1, we have that, for \rho_{ij} \in [0,1], i, j = 1, \ldots, N_k, each \varepsilon(i,j) can be individually bounded as
\[ \varepsilon(i,j) \leq \sum_{v \in \mathcal{A}_i} P_V(v) \sum_{x, y} P^{(i)}_X(x) P_{Y|X}(y|x) \Biggl( \sum_{\bar{v} \in \mathcal{A}_j} \sum_{\bar{x}} P^{(j)}_X(\bar{x}) \frac{\bigl( P_V(\bar{v}) P_{Y|X}(y|\bar{x}) \bigr)^{s_j}}{\bigl( P_V(v) P_{Y|X}(y|x) \bigr)^{s_j}} \Biggr)^{\rho_{ij}} \tag{2.33} \]
\[ = \sum_{y} \sum_{v \in \mathcal{A}_i} P_V(v)^{1 - \rho_{ij} s_j} \sum_{x} P^{(i)}_X(x) P_{Y|X}(y|x)^{1 - \rho_{ij} s_j} \Biggl( \sum_{\bar{v} \in \mathcal{A}_j} P_V(\bar{v})^{s_j} \sum_{\bar{x}} P^{(j)}_X(\bar{x}) P_{Y|X}(y|\bar{x})^{s_j} \Biggr)^{\rho_{ij}}. \tag{2.34} \]
Choosing \rho_{ij} = \frac{1 - s_i}{s_j} \in [0,1] for s_i, s_j \in [\tfrac{1}{2}, 1], and substituting (2.34) into (2.31), then yields
\[ \varepsilon \leq \sum_{i,j=1}^{N_k} \sum_{y} G_i(y)^{s_i} G_j(y)^{1 - s_i}, \tag{2.35} \]
where
\[ G_i(y) \triangleq \Biggl( \sum_{v \in \mathcal{A}_i} P_V(v)^{s_i} \Biggr)^{\frac{1}{s_i}} \Biggl( \sum_{x} P^{(i)}_X(x) P_{Y|X}(y|x)^{s_i} \Biggr)^{\frac{1}{s_i}}. \tag{2.36} \]
From (2.35) we have that
\[ \varepsilon \leq \sum_{i,j=1}^{N_k} \sum_{y} G_i(y)^{s_i} G_j(y)^{1 - s_i} \tag{2.37} \]
\[ \leq \sum_{i,j=1}^{N_k} \Biggl( \sum_{y} G_i(y) \Biggr)^{s_i} \Biggl( \sum_{y} G_j(y) \Biggr)^{1 - s_i} \tag{2.38} \]
\[ \leq \sum_{i,j=1}^{N_k} \Biggl( s_i \sum_{y} G_i(y) + (1 - s_i) \sum_{y} G_j(y) \Biggr) \tag{2.39} \]
\[ \leq \sum_{i=1}^{N_k} \sum_{y} G_i(y) + \sum_{\substack{i,j=1 \\ i \neq j}}^{N_k} \Biggl( \sum_{y} G_i(y) + \frac{1}{2} \sum_{y} G_j(y) \Biggr) \tag{2.40} \]
\[ = \frac{3 N_k - 1}{2} \sum_{i=1}^{N_k} \sum_{y} G_i(y), \tag{2.41} \]
where in (2.38) we applied Hölder's inequality \|fg\|_1 \leq \|f\|_p \|g\|_q with p = \frac{1}{s_i} and q = \frac{1}{1 - s_i}, so that p, q \geq 1 and \frac{1}{p} + \frac{1}{q} = 1; (2.39) follows from the inequality between the arithmetic and geometric means; and in (2.40) we used the bounds \frac{1}{2} \leq s_i \leq 1 in the terms of the sum for which i \neq j. The result follows from (2.41) by using the definition of G_i(y), i = 1, \ldots, N_k.
Remark Theorem 2.2.3 holds for general discrete sources and channels, not necessarily memoryless. Furthermore, it naturally extends to continuous sources and channels by following the same arguments as those extending Gallager's exponent for channel coding.
2.2.1.2 Proof of Corollary 2.2.3
Let the classes \mathcal{A}_1, \ldots, \mathcal{A}_{N_k} of the partition \mathcal{P}_k be the source-type classes \mathcal{T}_1, \ldots, \mathcal{T}_{N_k}. We first note that, by the type counting lemma,
\[ N_k \leq (k + 1)^{|\mathcal{V}|}. \tag{2.42} \]
We further note that P_V(\cdot) is constant within each source-type class \mathcal{A}_i = \mathcal{T}_i. Therefore, for every i = 1, \ldots, N_k we have that
\[ P^{(i)}_V(v) = \frac{1}{|\mathcal{A}_i|}, \quad v \in \mathcal{A}_i, \tag{2.43} \]
which implies
\[ E_s(\rho_i, P^{(i)}_V) = \rho_i \log |\mathcal{A}_i|, \quad i = 1, \ldots, N_k. \tag{2.44} \]
Let V_i be a random variable whose distribution is the type P_i associated with \mathcal{T}_i. Then, if we define R_i \triangleq t H(V_i), where H(V_i) denotes the entropy of V_i, we have the following inequalities using [17, Lemmas 2.3 & 2.6]:
\[ \frac{\log |\mathcal{A}_i|}{n} \leq R_i, \tag{2.45} \]
\[ \log \Biggl( \sum_{v \in \mathcal{A}_i} P_V(v) \Biggr) \leq -k D(P_i \| P_V) \tag{2.46} \]
\[ \leq -k \min_{\substack{j = 1, \ldots, N_k: \\ H(P_j) \geq H(P_i)}} D(P_j \| P_V) \tag{2.47} \]
\[ \leq -k\, e\Bigl( \frac{R_i}{t}, P_V \Bigr), \tag{2.48} \]
where in (2.48) we have used the definitions of R_i and of the source reliability function (A.24).
Using (2.44), inequalities (2.45) and (2.46)-(2.48), and the fact that t = k/n, we can upper-bound (2.23), after appropriate optimization over P^{(i)}_X and \rho_i, i = 1, \ldots, N_k, as
\[ \varepsilon_B(\mathcal{P}_k) \leq h(k) \sum_{i=1}^{N_k} e^{-n \bigl( \max_{\rho_i \in [0,1]} \{ E_0(\rho_i, P_{Y|X}) - \rho_i R_i \} + t\, e(\frac{R_i}{t}, P_V) \bigr)} \tag{2.49} \]
\[ = h(k) \sum_{i=1}^{N_k} e^{-n \bigl( E_r(R_i, P_{Y|X}) + t\, e(\frac{R_i}{t}, P_V) \bigr)} \tag{2.50} \]
\[ \leq h(k)\, N_k\, e^{-n \min_{i=1,\ldots,N_k} \{ E_r(R_i, P_{Y|X}) + t\, e(\frac{R_i}{t}, P_V) \}} \tag{2.51} \]
\[ \leq h(k)\, N_k\, e^{-n \min_{R > 0} \{ E_r(R, P_{Y|X}) + t\, e(\frac{R}{t}, P_V) \}}, \tag{2.52} \]
where in (2.50) we have used the definition of the random-coding channel exponent (A.29), and (2.52) follows from enlarging the set \{R_i\}, i = 1, \ldots, N_k, over which the minimization is performed to all possible values of R > 0.

By the type counting lemma [17, Lemma 2.2], N_k is polynomial in k. Hence, (2.52) yields the error exponent
\[ \liminf_{n \to \infty} -\frac{1}{n} \log \varepsilon_B(\mathcal{P}_k) \geq \min_{R > 0} \Bigl\{ E_r(R, P_{Y|X}) + t\, e\Bigl( \frac{R}{t}, P_V \Bigr) \Bigr\}. \tag{2.53} \]
Since E_r(R, P_{Y|X}) + t\, e(R/t, P_V) is a decreasing function of R for 0 < R \leq t H(V), and since e(R/t, P_V) is infinite for R > t \log |\mathcal{V}|, it follows that the right-hand side of (2.53) is equal to E^{\mathrm{Cs}}_J, thus proving Corollary 2.2.3.
2.2.1.3 Discussion
We have derived a novel upper bound on the error probability of joint source-channel coding by means of an extension of Gallager's random-coding method. We first partition the source message set into subsets and generate codewords with a distribution that depends on the index of the message subset. We then bound the probability of the error events and obtain a closed-form overall exponent in terms of the minimum of the error exponents within each subset. This expression, valid for any set of input distributions, allows us to show that Csiszár's form of the error exponent for almost-lossless source-channel coding can be derived using product distributions, proving that fixed-composition codes are not required to achieve the best known random-coding exponent.
2.2.2 Threshold decoding upper bounds
We now study upper bounds to the error probability using (suboptimal) threshold decoders. We assume that the source vectors v_1, v_2, \ldots are indexed according to their corresponding probabilities, such that P_V(v_1) \geq P_V(v_2) \geq \cdots. In case of equality between source probabilities, the indexes are set arbitrarily. For a given encoder \phi(\cdot), we define the decoding function T_\gamma : \mathcal{V} \times \mathcal{Y} \to \{0, 1\} by
\[ T_\gamma(v_m, y) = \mathbb{1}\bigl\{ P_V(v_m) P_{Y|X}(y|\phi(v_m)) > \gamma(v_m, y) \bigr\}, \tag{2.54} \]
where \mathbb{1}\{\cdot\} is the indicator function. Given a transmitted codeword x and a received sequence y, the threshold decoder \psi_\gamma returns the lowest index m (associated with the most probable source vector) for which T_\gamma(v_m, y) = 1. Notice that the encoder \phi and the function T_\gamma induce a set \mathcal{T}_\gamma defined by
\[ \mathcal{T}_\gamma = \bigl\{ (v, y) \in \mathcal{V} \times \mathcal{Y} : P_V(v) P_{Y|X}(y|\phi(v)) > \gamma(v, y) \bigr\}. \tag{2.55} \]
We can also evaluate the random-coding error probability of a threshold decoder by independently generating codewords with a distribution P_{X|V}. In general, the exact expression is as follows:
\[ \mathbb{E}[\varepsilon^{\mathrm{TH}}_{k,n}] = \Pr\{\mathrm{Error} \mid (V, Y) \in \mathcal{T}_\gamma\} \Pr\{\mathcal{T}_\gamma\} + \Pr\{\mathrm{Error} \mid (V, Y) \in \mathcal{T}^c_\gamma\} \Pr\{\mathcal{T}^c_\gamma\}. \tag{2.56} \]
In particular, an exact evaluation of (2.56) is given in the next result.
Theorem 2.2.4 The average random-coding error probability for a finite-alphabet source and arbitrary P_{X|V} under threshold decoding is given by
\[ \mathbb{E}[\varepsilon^{\mathrm{TH}}_{k,n}] = 1 - \sum_{m \geq 1} P_V(v_m) \Pr\{\psi(Y) = v_m\}, \tag{2.57} \]
where
\[ \Pr\{\psi(Y) = v_m\} \triangleq \mathbb{E}\Biggl[ \Pr\bigl\{ P_V(v_m) P_{Y|X}(Y|X) > \gamma(v_m, Y) \,\big|\, Y \bigr\} \prod_{l=1}^{m-1} \Pr\bigl\{ P_V(v_l) P_{Y|X}(Y|\bar{X}) \leq \gamma(v_l, Y) \,\big|\, Y \bigr\} \Biggr], \tag{2.58} \]
and the probabilities in the product assume that \bar{X} and Y are independent.
Optimizing (2.56) over all possible source-dependent thresholds is computationally challenging. In order to overcome this issue, we follow the arguments in [49] and apply the union bound to (2.56):
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \Pr\{\mathcal{T}^c_\gamma\} + \Pr\{\mathrm{Error} \mid (V, Y) \in \mathcal{T}_\gamma\} \tag{2.59} \]
\[ \leq \Pr\bigl\{ P_V(V) P_{Y|X}(Y|X) \leq \gamma(V, Y) \bigr\} \tag{2.60} \]
\[ + \sum_{v} P_V(v) \sum_{\bar{v} :\, P_V(\bar{v}) \geq P_V(v)} \Pr\bigl\{ P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) > \gamma(\bar{v}, Y) \bigr\}. \tag{2.61} \]
The next result gives an optimized version of (2.60)-(2.61), following similar arguments as in [49, Th. 17].
Theorem 2.2.5 (Optimized DT Bound) For an arbitrary P_{X|V}, there exists an \varepsilon_{k,n}-code that satisfies
\[ \varepsilon_{k,n} \leq \mathrm{DT}(k,n) \triangleq \mathbb{E}\biggl[ \min\biggl( 1, \frac{\Pr\{S(V)\}}{P_V(V)} \frac{P_Y(Y)}{P_{Y|X}(Y|X)} \biggr) \biggr], \tag{2.62} \]
where S(V) is a random set defined by
\[ S(v) \triangleq \{ \bar{v} \in \mathcal{V} : P_V(\bar{v}) \leq P_V(v) \}, \quad v \in \mathcal{V}. \tag{2.63} \]
Proof We first claim that (2.61) is equivalent to
\[ \sum_{\bar{v}} \Pr\{S(\bar{v})\} \Pr\bigl\{ P_V(\bar{v}) P_{Y|X}(Y|\bar{X}) > \gamma(\bar{v}, Y) \bigr\}, \tag{2.64} \]
where Y is independent of \bar{X}. To see this, let the source outputs be indexed v_1, v_2, \ldots and define p_l \triangleq P_V(v_l) and
\[ q_l \triangleq \Pr\bigl\{ P_V(v_l) P_{Y|X}(Y|X) > \gamma(v_l, Y) \bigr\} \tag{2.65} \]
\[ \equiv \Pr\bigl\{ P_V(v_l) P_{Y|X}(\bar{Y}|\bar{X}) > \gamma(v_l, \bar{Y}) \bigr\}, \tag{2.66} \]
where \bar{Y} is independent of \bar{X} in (2.66). Then, (2.64) follows from the identity
\[ \sum_{l} p_l \Bigl( \sum_{m \leq l} q_m \Bigr) = \sum_{m} q_m \Bigl( \sum_{l \geq m} p_l \Bigr). \tag{2.67} \]
Then, using (2.60)-(2.61) and (2.67), we obtain
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \sum_{v} P_V(v) \Biggl( \Pr\bigl\{ P_V(v) P_{Y|X}(Y|X) \leq \gamma(v, Y) \bigr\} \tag{2.68} \]
\[ + \frac{\Pr\{S(v)\}}{P_V(v)} \Pr\bigl\{ P_V(v) P_{Y|X}(Y|\bar{X}) > \gamma(v, Y) \bigr\} \Biggr). \tag{2.69} \]
We can optimize the threshold \gamma(V, Y) in equation (2.69) by conditioning on (v, x):
\[ \Pr\bigl\{ P_V(v) P_{Y|X}(Y|x) \leq \gamma(v, Y) \bigr\} + \frac{\Pr\{S(v)\}}{P_V(v)} \Pr\bigl\{ P_V(v) P_{Y|X}(\bar{Y}|x) > \gamma(v, \bar{Y}) \bigr\}. \tag{2.70} \]
As noted in [49], when \gamma(v, Y) = \gamma'(v) P_Y(Y), equation (2.70) can be regarded as a weighted sum of the two types of error of a binary hypothesis test between the probabilities P_{Y|X}(Y|x) and P_Y(Y). More specifically, for (v, x),
\[ \Pr\biggl\{ \frac{P_{Y|X}(Y|x)}{P_Y(Y)} \leq \frac{\gamma'(v)}{P_V(v)} \biggr\} + \frac{\Pr\{S(v)\}}{P_V(v)} \Pr\biggl\{ \frac{P_{Y|X}(\bar{Y}|x)}{P_Y(\bar{Y})} > \frac{\gamma'(v)}{P_V(v)} \biggr\} \tag{2.71} \]
is the average error probability in a Bayesian hypothesis-testing problem between P_{Y|X}(Y|x) and P_Y(Y). This test can be optimized by choosing \gamma'(v) = \Pr\{S(v)\}, which yields
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \mathbb{E}\Biggl[ \mathbb{1}\biggl\{ \frac{P_{Y|X}(Y|X)}{P_Y(Y)} \leq \frac{\Pr\{S(V)\}}{P_V(V)} \biggr\} \tag{2.72} \]
\[ + \frac{\Pr\{S(V)\}\, P_Y(Y)}{P_{Y|X}(Y|X)\, P_V(V)} \mathbb{1}\biggl\{ \frac{P_{Y|X}(Y|X)}{P_Y(Y)} > \frac{\Pr\{S(V)\}}{P_V(V)} \biggr\} \Biggr] \tag{2.73} \]
\[ = \mathbb{E}\biggl[ \min\biggl( 1, \frac{\Pr\{S(V)\}\, P_Y(Y)}{P_V(V)\, P_{Y|X}(Y|X)} \biggr) \biggr]. \tag{2.74} \]
We note that the above bound can be rewritten as
\[ \mathbb{E}\bigl[ e^{-n |j(V,X,Y) - \gamma(V)|^+} \bigr], \tag{2.75} \]
where \gamma(V) \triangleq \frac{1}{n} \log \Pr\{S(V)\} and j(V,X,Y) is defined in (2.15) for s = 1. The particularization of equation (2.75) to channel coding,
\[ \sum_{m=1}^{M} \frac{1}{M} \mathbb{E}\Bigl[ e^{-n |i(X,Y) - \frac{1}{n} \log(M - m)|^+} \Bigr], \tag{2.76} \]
leads to a marginally tighter result than the original DT bound (Theorem 1.3.10). This can be proved by noting that \exp(-|\log x^{-1} + a|^+) is a concave function of x for all a \in \mathbb{R}, and using Jensen's inequality [41]:
\[ \sum_{m=1}^{M} \frac{1}{M} \mathbb{E}\Bigl[ e^{-n |i(X,Y) - \frac{1}{n} \log(M - m)|^+} \Bigr] \tag{2.77} \]
\[ \leq \mathbb{E}\Bigl[ e^{-n |i(X,Y) - \frac{1}{n} \log( \frac{1}{M} \sum_{m=1}^{M} (M - m) )|^+} \Bigr] \tag{2.78} \]
\[ = \mathbb{E}\Bigl[ e^{-n |i(X,Y) - \frac{1}{n} \log \frac{M - 1}{2} |^+} \Bigr], \tag{2.79} \]
where equation (2.79) is the DT bound for channel coding [49].

Finally, we wish to recover Theorem 1.3.6 (Feinstein's bound) from our error-probability analysis. To do this, we fix \gamma(V, Y) = e^{n\gamma} P_Y(Y) in equation (2.69), with \gamma > 0, and upper-bound the second term using Markov's inequality as
\[ \mathbb{E}[\varepsilon_{k,n}] \leq \Pr\{ j(V,X,Y) \leq \gamma \} + \sum_{v} \Pr\{S(v)\} \Pr\bigl\{ P_V(v) P_{Y|X}(Y|\bar{X}) > e^{n\gamma} P_Y(Y) \bigr\} \tag{2.80} \]
\[ \leq \Pr\{ j(V,X,Y) \leq \gamma \} + e^{-n\gamma} \sum_{v} \Pr\{S(v)\}\, P_V(v)\, \mathbb{E}\Biggl[ \frac{\mathbb{E}\bigl[ P_{Y|X}(Y|\bar{X}) \mid Y \bigr]}{P_Y(Y)} \Biggr] \tag{2.81} \]
\[ = \Pr\{ j(V,X,Y) \leq \gamma \} + e^{-n\gamma} \sum_{v} P_V(v) \Pr\{S(v)\} \tag{2.82} \]
\[ \leq \Pr\{ j(V,X,Y) \leq \gamma \} + e^{-n\gamma}, \tag{2.83} \]
where (2.83) corresponds to Feinstein’s bound.
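Like the RCU bound, the DT bound (2.62) is straightforward to estimate for simple source-channel pairs. The Monte Carlo sketch below is an assumed Bernoulli(δ) source / BSC(ξ) setup with i.i.d. equiprobable codewords (so that P_Y is uniform); it is our own illustration, not code from the thesis. Every factor in the bound is determined by the source weight and the channel noise weight.

```python
# Monte Carlo sketch of the DT bound (2.62); illustrative assumptions:
# Bernoulli(delta) source, BSC(xi), equiprobable codewords (P_Y uniform).
import math
import random
from math import comb

def dt_bound(k, n, delta, xi, trials=2000, seed=1):
    rng = random.Random(seed)
    # total probability of each source weight class
    pw = [comb(k, w) * delta ** w * (1 - delta) ** (k - w) for w in range(k + 1)]
    total = 0.0
    for _ in range(trials):
        w = sum(rng.random() < delta for _ in range(k))     # source weight
        pv = delta ** w * (1 - delta) ** (k - w)            # P_V(v)
        # Pr{S(v)}: mass of source words no more likely than v
        ps = sum(pw[u] for u in range(k + 1)
                 if delta ** u * (1 - delta) ** (k - u) <= pv + 1e-15)
        d = sum(rng.random() < xi for _ in range(n))        # noise weight
        pyx = xi ** d * (1 - xi) ** (n - d)                 # P_{Y|X}(y|x)
        py = 0.5 ** n                                       # P_Y uniform
        total += min(1.0, ps * py / (pv * pyx))
    return total / trials

print(dt_bound(8, 16, 0.05, 0.11))
```

Since δ < 1/2, the set S(v) here is simply the set of words with weight at least that of v, but the code computes it from the probability comparison so that the role of (2.63) is explicit.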
2.2.2.1 An exponent for threshold decoding
We prove here a Chernoff bound on (2.62) that yields a lower bound on the JSCC error exponent under threshold decoding. Beforehand, let E^{\mathrm{DT}}(P_V, P_{Y|X}) denote the error exponent of \mathrm{DT}(k,n).

Theorem 2.2.6 For every \rho \in [0,1],
\[ E^{\mathrm{DT}}(P_V, P_{Y|X}) \geq \liminf_{n \to \infty} \frac{1}{n} \Bigl( E_0(\rho, 1, P_X, P_{Y|X}) - E_s\bigl(\rho, \tfrac{1}{1+\rho}, P_V\bigr) \Bigr). \tag{2.84} \]
Proof According to (2.62), we can further upper-bound \mathbb{E}[\varepsilon_{k,n}] for s \in [0,1] and \rho \in [0,1] as
\[ \mathbb{E}[\varepsilon_{k,n}] = \mathbb{E}\biggl[ \min\biggl( 1, \frac{\Pr\{S(V)\}}{P_V(V)} \frac{P_Y(Y)}{P_{Y|X}(Y|X)} \biggr) \biggr] \tag{2.85} \]
\[ \leq \mathbb{E}\Biggl[ \min\Biggl( 1, \sum_{\bar{v} :\, P_V(\bar{v}) \leq P_V(V)} \frac{P_V(\bar{v})^s}{P_V(V)^s} \frac{P_Y(Y)}{P_{Y|X}(Y|X)} \Biggr) \Biggr] \tag{2.86} \]
\[ \leq \mathbb{E}\Biggl[ \min\Biggl( 1, \sum_{\bar{v}} \frac{P_V(\bar{v})^s}{P_V(V)^s} \frac{P_Y(Y)}{P_{Y|X}(Y|X)} \Biggr) \Biggr] \tag{2.87} \]
\[ \leq \mathbb{E}\Biggl[ \Biggl( \sum_{\bar{v}} \frac{P_V(\bar{v})^s}{P_V(V)^s} \frac{P_Y(Y)}{P_{Y|X}(Y|X)} \Biggr)^{\rho} \Biggr] \tag{2.88} \]
\[ = \sum_{v} P_V(v)^{1 - s\rho} \Biggl( \sum_{\bar{v}} P_V(\bar{v})^s \Biggr)^{\rho} \mathbb{E}\Biggl[ \frac{P_Y(Y)^\rho}{P_{Y|X}(Y|X)^\rho} \Biggr]. \tag{2.89} \]
By setting s = \frac{1}{1+\rho}, we finally obtain
\[ \mathbb{E}[\varepsilon_{k,n}] \leq e^{-E_0(\rho, 1, P_X, P_{Y|X}) + E_s(\rho, \frac{1}{1+\rho}, P_V)}. \tag{2.90} \]
We can compare this lower bound on the exponent under threshold decoding with that obtained via MAP decoding. More specifically, recalling equation (2.14),
\[ \mathrm{RCU}_s(k,n) \leq e^{-E_0(\rho, \frac{1}{1+\rho}, P_X, P_{Y|X}) + E_s(\rho, \frac{1}{1+\rho}, P_V)}, \tag{2.91} \]
whereas for threshold decoding we obtain
\[ \mathrm{DT}(k,n) \leq e^{-E_0(\rho, 1, P_X, P_{Y|X}) + E_s(\rho, \frac{1}{1+\rho}, P_V)}. \tag{2.92} \]
Notice that the sub-optimality of threshold decoding is reflected in the term E_0(\rho, 1, P_X, P_{Y|X}), which cannot be optimized with respect to s > 0 in (2.92).
2.3 Converse bounds
In this section we derive a converse result in the form of a lower bound on the average error probability of every JSC code. To do this, we refine a method based on binary hypothesis testing. As we shall see, the key argument is to relate the error probability conditioned on a source vector being transmitted, \varepsilon_{k,n}(v), to the error of the following binary test on the joint distribution of (X, Y):
\[ H_0 : P_{XY} = P_X \times P_{Y|X} \]
\[ H_1 : Q_{XY} = P_X \times Q_Y, \tag{2.93} \]
where Q_Y is an arbitrary output distribution. For a given P_X, notice that the problem (2.93) is equivalent to testing between the conditional distributions P_{Y|X} and Q_Y. In other words, the above problem can be viewed as a test on the distribution of an unknown channel, which is either P_{Y|X} or Q_Y.
We wish to study a decision rule for this test. For this purpose, we index the source outputs \{v_l\}_{l \geq 1} and proceed as follows (see also Fig. 2.2):

• We fix a certain code to communicate over the unknown channel.

• We transmit a vector v_l and observe the output of the decoder, \hat{v}_l.

• If \hat{v}_l = v_l, we declare that the channel is described by P_{Y|X}; otherwise, we declare Q_Y.
Without loss of generality, we consider that the encoding mapping is generated with a distribution P_{X|V} and that the (possibly randomized) decoder is specified by the conditional distribution P_{Z|Y}(v_l|y), v_l \in \mathcal{V}, y \in \mathcal{Y}. Hence, the set of decision rules for the test (2.93) can be defined by the indicator function
\[ D_{H_0} = \mathbb{1}\{\hat{v}_l = v_l\}, \quad l \in \mathbb{N}, \tag{2.94} \]
which specifies the acceptance regions
\[ \mathcal{R}_{H_0}(v_l, P_{X|V}, P_{Z|Y}) = \{(x, y) \in \mathcal{X} \times \mathcal{Y} : \hat{v}_l = v_l\}, \quad l \in \mathbb{N}, \]
\[ \mathcal{R}_{H_1}(v_l, P_{X|V}, P_{Z|Y}) = \{(x, y) \in \mathcal{X} \times \mathcal{Y} : \hat{v}_l \neq v_l\}, \quad l \in \mathbb{N}. \tag{2.95} \]
We now analyze the test errors for every decision rule. For a given v_l, the probability of accepting Q_Y under P_{Y|X} (type I error) with the decision rule D_{H_0} in (2.94) equals
\[ \Pr\bigl\{ \hat{v}_l \neq v_l \,\big|\, P_{Y|X} \bigr\} = 1 - \sum_{x} \sum_{y} P_{X|V}(x|v_l) P_{Y|X}(y|x) P_{Z|Y}(v_l|y) \tag{2.96} \]
\[ = \varepsilon_{k,n}(v_l), \tag{2.97} \]
where we note that \varepsilon_{k,n}(v_l) corresponds to the error probability when v_l is transmitted and the code specified by (P_{X|V}, P_{Z|Y}) is used.
Similarly, the probability of accepting P_{Y|X} under Q_Y (type II error) with the decision rule D_{H_0} equals
\[ \Pr\{ \hat{v}_l = v_l \mid Q_Y \} = \sum_{x} \sum_{y} P_{X|V}(x|v_l) Q_Y(y) P_{Z|Y}(v_l|y) \tag{2.98} \]
\[ = \sum_{y} Q_Y(y) P_{Z|Y}(v_l|y) = Q_Z(v_l). \tag{2.99} \]
Figure 2.2: Bank of decision rules to test the channel between P_{XY} and Q_{XY}.
Our set of decision rules for the test (2.93) has been deliberately defined to obtain \varepsilon_{k,n}(v_l) as the type I error for every l \geq 1. We can relate this quantity to the type II error (2.98) via the Neyman-Pearson lemma (Theorem 1.3.11). As introduced in Chapter 1, the Neyman-Pearson lemma gives the test that achieves the minimum type II error among the tests whose type I error satisfies an upper-bound constraint. For the test under consideration (2.93), we denote this minimum by \beta_\alpha(P_X P_{Y|X}, P_X Q_Y) when the type I error is upper-bounded by 1 - \alpha. Hence, according to the errors computed in (2.97) and (2.98), we obtain the following remark.

Remark 2.3.1
\[ \beta_{1 - \varepsilon_{k,n}(v_l)}(P_X P_{Y|X}, P_X Q_Y) \leq Q_Z(v_l), \quad \forall l \geq 1. \tag{2.100} \]
For ease of notation, we shall use \beta_\alpha to refer to \beta_\alpha(P_X P_{Y|X}, P_X Q_Y) as a function of \alpha when the rest of the variables are assumed to be fixed. Since \beta_\alpha is a piecewise-linear convex function [50], it is also strictly monotone in the region [\alpha_0, 1], where
\[ \alpha_0 \triangleq \max\{ \alpha : \beta_\alpha = 0 \}. \tag{2.101} \]
Then, we can define the inverse of \beta_\alpha in this region as
\[ \beta^{-1}_{\beta'}(P_X P_{Y|X}, P_X Q_Y) = \{ \alpha \in [\alpha_0, 1] : \beta_\alpha(P_X P_{Y|X}, P_X Q_Y) = \beta' \}, \tag{2.102} \]
which is increasing in \beta' \in [0,1]. Consequently, for every \beta' such that \beta' \geq \beta_{1 - \varepsilon_{k,n}(v_l)}, the function \beta^{-1}_{\beta'} gives a value 1 - \varepsilon' such that 1 - \varepsilon' \geq 1 - \varepsilon_{k,n}(v_l). Thus, any upper bound on \beta_{1 - \varepsilon_{k,n}(v_l)} corresponds to a lower bound
\[ \varepsilon_{k,n}(v_l) \geq 1 - \beta^{-1}_{\beta'}(P_X P_{Y|X}, P_X Q_Y). \tag{2.103} \]
In particular, on account of (2.103), Remark 2.3.1 implies
\[ \varepsilon_{k,n}(v_l) \geq 1 - \beta^{-1}_{Q_Z(v_l)}(P_X P_{Y|X}, P_X Q_Y). \tag{2.104} \]
Then, the next result follows from taking the average over V on both sides of (2.104) and optimizing the bound appropriately.

Theorem 2.3.1 Given a finitely countable source \mathcal{V} and a general channel \mathcal{W}, the average error probability \varepsilon_{k,n} of every JSC code satisfies
\[ \varepsilon_{k,n} \geq \mathrm{CNV}(k,n) \triangleq 1 - \sup_{P_X, P_{Z|Y}} \inf_{Q_Y} \sum_{v} P_V(v)\, \beta^{-1}_{Q_Z(v)}(P_X P_{Y|X}, P_X Q_Y). \tag{2.105} \]
The above result requires, a priori, a challenging optimization. However, a number of observations facilitate its computation. First, note that any choice of Q_Y still gives a converse, as Q_Y is independent of \varepsilon_{k,n}. Besides, P_{Z|Y} can be set to define a MAP decoder, which ensures that the error probability under any decoder is lower-bounded.
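For finite alphabets, the Neyman-Pearson quantity β_α(P, Q) appearing in (2.105) can be computed exactly by the standard likelihood-ratio construction. The sketch below is illustrative (the single-use BSC distributions at the end are an arbitrary example): it includes outcomes in decreasing order of the ratio P/Q, randomizing on the last one so that the acceptance probability under P is exactly α.

```python
# Sketch of the Neyman-Pearson function beta_alpha(P, Q) for finite
# alphabets: minimum type-II error among randomized tests whose probability
# of accepting H0 under P is at least alpha.
def beta(alpha, P, Q):
    # sort outcomes by likelihood ratio P/Q, largest first
    items = sorted((w for w in P if P[w] > 0),
                   key=lambda w: P[w] / Q[w] if Q[w] > 0 else float('inf'),
                   reverse=True)
    acc, b = 0.0, 0.0
    for w in items:
        take = min(1.0, (alpha - acc) / P[w])   # randomize on the last outcome
        acc += take * P[w]                      # P-mass accepted so far
        b += take * Q.get(w, 0.0)               # accumulated type-II error
        if acc >= alpha - 1e-12:
            break
    return b

# arbitrary example: one channel use, P = P_X x P_{Y|X} for a BSC(0.11),
# Q = P_X x Q_Y with Q_Y uniform
xi = 0.11
P = {(x, y): 0.5 * (xi if x != y else 1 - xi) for x in (0, 1) for y in (0, 1)}
Q = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(beta(0.95, P, Q))
```

Inverting β numerically over α then gives the quantity β^{-1} needed in (2.105); for product distributions the sorting can be done on log-likelihood-ratio values to avoid enumerating the full alphabet.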
We now claim that our approach gives a tighter bound than using a single decision rule based on transmitting a random message [49]. To see this, consider the random-message decision rule
\[ D_{H_0}(P_{X|V}, P_{Z|Y}, P_V) = \mathbb{1}\{\hat{V} = V\}, \tag{2.106} \]
with type I error probability equal to
\[ \Pr\bigl\{ \hat{V} \neq V \,\big|\, P_{Y|X} \bigr\} = 1 - \sum_{v} \sum_{x} \sum_{y} P_V(v) P_{X|V}(x|v) P_{Y|X}(y|x) P_{Z|Y}(v|y) \tag{2.107} \]
\[ = \varepsilon_{k,n}, \tag{2.108} \]
and type II error equal to
\[ \Pr\bigl\{ \hat{V} = V \,\big|\, Q_Y \bigr\} = \sum_{v} \sum_{x} \sum_{y} P_V(v) P_{X|V}(x|v) Q_Y(y) P_{Z|Y}(v|y) \tag{2.109} \]
\[ = \sum_{v} P_V(v) Q_Z(v). \tag{2.110} \]
We justify our claim using the following lemma.
Lemma 2.3.1 The function \beta^{-1}_{\beta'} is concave on [\alpha_0, 1].

Proof Let \beta', \beta'' \in [0,1] with \alpha' = \beta^{-1}_{\beta'}, \alpha'' = \beta^{-1}_{\beta''} \in [\alpha_0, 1]. Since \beta_\alpha is convex, for all \lambda \in [0,1],
\[ \beta\bigl( \lambda \alpha' + (1 - \lambda) \alpha'' \bigr) \leq \lambda \beta' + (1 - \lambda) \beta''. \tag{2.111} \]
Since \beta^{-1} is increasing, applying it to both sides of (2.111) yields
\[ \lambda \beta^{-1}_{\beta'} + (1 - \lambda) \beta^{-1}_{\beta''} \leq \beta^{-1}\bigl( \lambda \beta' + (1 - \lambda) \beta'' \bigr), \tag{2.112} \]
which concludes the proof.

Then, by using Jensen's inequality [13], it follows that
\[ \varepsilon_{k,n} \geq 1 - \sum_{v} P_V(v)\, \beta^{-1}_{Q_Z(v)} \geq 1 - \beta^{-1}_{\sum_{v} P_V(v) Q_Z(v)}, \tag{2.113} \]
and we conclude that the choice of the decision rule (2.106) gives a weaker converse bound than Theorem 2.3.1.
2.3.1 Set-dependent lower bounds
In the remainder of this part, we provide an alternative approach to derive lower bounds on the error probability in the form of the Verdú-Han lemma for JSCC (Theorem 1.3.7). To do this, define an arbitrary set \mathcal{G}, and upper-bound its probability as
\[ \Pr\{\mathcal{G}\} = \Pr\{\mathcal{G} \cap \mathrm{Error}\} + \Pr\{\mathcal{G} \cap \mathrm{no\ Error}\} \tag{2.114} \]
\[ = \Pr\{\mathcal{G} \mid \mathrm{Error}\}\, \varepsilon_{k,n} + \Pr\{\mathcal{G} \cap \mathrm{no\ Error}\} \tag{2.115} \]
\[ \leq \varepsilon_{k,n} + \Pr\{\mathcal{G} \cap \mathrm{no\ Error}\}. \tag{2.116} \]
In particular, the choice
\[ \mathcal{G} = \bigl\{ (v, y) : P_V(v) P_{Y|X}(y|x(v)) \leq \gamma(y) \bigr\} \tag{2.117} \]
with \gamma(y) \geq 0 yields the following general result.

Theorem 2.3.2 (Generalized Verdú-Han lower bound) For a given source and channel, a (\phi, \psi)-code satisfies
\[ \varepsilon_{k,n} \geq \Pr\bigl\{ P_V(V) P_{Y|X}(Y|\phi(V)) \leq \gamma(Y) \bigr\} - \sum_{y} \gamma(y). \tag{2.118} \]
Proof Given a code (\phi, \psi), let \mathcal{A}_{\phi,\psi}(v) \subset \mathcal{Y} be the decoding region for the source output v \in \mathcal{V}. Then we upper-bound \Pr\{\mathcal{G} \cap \mathrm{no\ Error}\} as
\[ \Pr\{\mathcal{G} \cap \mathrm{no\ Error}\} = \sum_{(v,y) \in \{\mathcal{G} \cap \mathrm{no\ Error}\}} P_V(v) P_{Y|X}(y|\phi(v)) \tag{2.119} \]
\[ \leq \sum_{(v,y) \in \{\mathcal{G} \cap \mathrm{no\ Error}\}} \gamma(y) \tag{2.120} \]
\[ \leq \sum_{(v,y) \in \{\mathrm{no\ Error}\}} \gamma(y) \tag{2.121} \]
\[ = \sum_{v} \sum_{y \in \mathcal{A}_{\phi,\psi}(v)} \gamma(y) \tag{2.122} \]
\[ = \sum_{y} \gamma(y), \tag{2.123} \]
where we used that the decoding regions must be disjoint when no errors are encountered.
It is easy to see that we can recover the Verdú-Han lemma by setting \gamma(y) = P_Y(y)\gamma', with \gamma' > 0 and P_Y(y) the output distribution induced by a given code. Computing (2.118) for the best code is in general intractable. In any event, one has the freedom to choose Q_Y such that Y is independent of X = \phi(V). Since P_{Y|X}(y|\phi(v)) is the only term in (2.118) that depends on the code choice, we can minimize the lower bound over the set of possible codewords in \mathcal{X}. This argument yields the next (possibly looser) result, which can be viewed as a generalization of Wolfowitz's bound [68, Th. 7.8.1] to JSCC.
Corollary 2.3.1 Let the set of possible codewords be denoted by \mathcal{C} \subseteq \mathcal{X}. Then, every \varepsilon_{k,n} satisfies
\[ \varepsilon_{k,n} \geq \sum_{v} P_V(v) \inf_{x \in \mathcal{C}} \Pr\biggl\{ \frac{P_V(v) P_{Y|X}(Y|x)}{Q_Y(Y)} \leq \gamma' \biggr\} - \gamma', \tag{2.124} \]
for an arbitrary Q_Y and \gamma' > 0.

Instead of searching for the best codeword performance, one can also average (2.118) over the set of randomly selected codes with probability P_{X'} and obtain a lower bound on the random-coding average error probability \varepsilon_{k,n}. In particular, the random-coding lower bound when choosing Q_Y(y) = \sum_{x} P_{X'}(x) P_{Y|X}(y|x) is given by
\[ \mathbb{E}[\varepsilon_{k,n}] \geq \mathrm{LB}(k, n, \gamma') \triangleq \Pr\{ j(V,X,Y) \leq -\gamma' \} - e^{-n\gamma'}, \tag{2.125} \]
where j(V,X,Y) is as defined in (2.15) for s = 1. We remark that the above bound cannot be used to prove a converse, as it is only a lower bound on the random-coding error probability.
2.3.2 ε_{k,n}-transmissibility
Imagine that we are engineers working for an aerospace company, and our goal is to guarantee the transmission of k-length data packets between a terrestrial station and a satellite with maximum delay n and target average error probability ε. Before thinking of any transmitter, receiver, or coding technique, we need to ask ourselves: is our goal achievable given the statistics of the source and the medium? In other words, is ε_{k,n}-transmissibility possible in this case?
    Bound                            Notation      Parameters        Exponent
    MAP decoding upper bounds        MAX(k,n)      P_{X|V}           -
                                     RCU(k,n)      P_{X|V}           E_J
                                     RCU_s(k,n)    P_{X|V}, s > 0    -
    Threshold decoding upper bound   DT(k,n)       P_{X|V}           E^{DT}
    Test-dependent lower bound       CNV(k,n)      Q_Y               -
    Set-dependent lower bound        LB(k,n,γ')    γ' > 0            -

Table 2.1: Summary of upper and lower bounds.
The results presented in this chapter shed light on this type of question. More specifically, our achievability and converse bounds, summarized in Table 2.1, tell us when ε_{k,n}-transmissibility is possible. For instance, an achievability result can be interpreted as a sufficient condition for being ε_{k,n}-transmissible, whereas a converse result states a necessary condition for it. In particular, our tightest achievability and converse results bound the minimum ε_{k,n} for which a source is ε_{k,n}-transmissible over a channel.

Corollary 2.3.2 (ε_{k,n}-transmissibility) Let \mathcal{V} and \mathcal{W} be the source and channel of a communications model. For every k and n, \mathcal{V} is ε_{k,n}-transmissible over \mathcal{W} if \varepsilon_{k,n} \geq \mathrm{MAX}(k,n). Conversely, if \mathcal{V} is ε_{k,n}-transmissible over \mathcal{W}, then \varepsilon_{k,n} \geq \mathrm{CNV}(k,n).

Further, let
\[ \varepsilon^{\star}_{k,n}(P_V, P_{Y|X}) \triangleq \min_{(\phi, \psi)} \{ \varepsilon_{k,n}(\phi, \psi) \} \tag{2.126} \]
be the minimum average error probability achieved by a JSC code. Then,
\[ \mathrm{CNV}(k,n) \leq \varepsilon^{\star}_{k,n} \leq \mathrm{MAX}(k,n), \tag{2.127} \]
where \mathrm{CNV}(k,n) and \mathrm{MAX}(k,n) are given by (2.105) and (2.10), respectively.
Figure 2.3: MAX upper bound and CNV converse bound for a binary memoryless source with δ = 0.05 and a binary symmetric channel with crossover probability ξ = 0.11.
Figure 2.3 illustrates the above result for a binary memoryless source-channel pair for blocklengths of up to 100 bits. The shaded area corresponds to the region where \varepsilon^{\star}_{k,n}(P_V, P_{Y|X}) lies.

In channel and source coding, the concept of maximum or minimum achievable rate allows one to describe the fundamental performance limits. However, we believe that it is the concept of ε_{k,n}-transmissibility introduced in this dissertation that provides the right guideline for studying the problem of joint source-channel coding at arbitrary blocklength.
2.4 Comparison with separate source-channel coding
In the previous sections we have derived a set of tools that allow us to tightly characterize the performance of a JSCC scheme. We now wish to assess the advantage of JSCC over a separate source-channel code (SSCC). As SSCC is a particular case of JSCC, we here present particularizations of our achievability and converse results for a separate scheme.
Figure 2.4: Structure of a separate source-channel coding system.
2.4.1 System model
A general separate source-channel encoder is composed of:

• A source encoder \phi_s that maps source vectors v into messages m belonging to a set \mathcal{M} = \{1, 2, \ldots, M\}.

• A channel encoder \phi_c that maps messages m into channel codewords x(m).

Similarly, the decoding process is divided into two steps:

• Estimation of the message from the channel output, \hat{m} = \psi_c(y).

• Recovery of the source vector from the estimated message, \hat{v} = \psi_s(\hat{m}).
We can formally represent the encoding and decoding process by the following Markov chain (see Fig. 2.4):
\[ V \xrightarrow{\phi_s} M \xrightarrow{\phi_c} X \to Y \xrightarrow{\psi_c} \hat{M} \xrightarrow{\psi_s} \hat{V}, \tag{2.128} \]
where the distribution P_{X|M} can be set arbitrarily. Further, we choose without loss of generality the uniform distribution over \mathcal{M}: P_M(m) = \frac{1}{M} for all m \in \mathcal{M}.
2.4.2 Upper bounds on the probability of error
The average error probability of an SSCC scheme ε^sep_{k,n} with M messages can be decomposed as

ε^sep_{k,n} = Pr{V̂ ≠ V}   (2.129)
= Pr{(V̂ ≠ V) ∩ (M̂ ≠ M)} + Pr{(V̂ ≠ V) ∩ (M̂ = M)}   (2.130)
= Pr{V̂ ≠ V | M̂ ≠ M} Pr{M̂ ≠ M}   (2.131)
+ Pr{V̂ ≠ V | M̂ = M} Pr{M̂ = M}   (2.132)
≤ Pr{M̂ ≠ M} + Pr{V̂ ≠ V | M̂ = M},   (2.133)
where ε_n ≜ Pr{M̂ ≠ M} and ε_k ≜ Pr{V̂ ≠ V | M̂ = M} correspond to the channel coding and source coding error probabilities, respectively. Hence, one can always bound the average error probability by the sum of the two individual errors, i.e.,

ε^sep_{k,n} ≤ ε_n + ε_k.   (2.134)
We now particularize our JSCC upper bounds to source and channel coding to obtain bounds on ε^sep_{k,n} via the inequality (2.134). In the next results, the absence of a noisy channel is characterized by X = Y, and P_X is assumed to be uniform over the codeword set {1, . . . , M_s}. We show in the following that Corollary 2.2.1 and Theorem 2.2.5 particularize to the same upper bound; thus, the performance of threshold decoding is the same as that of MAP decoding when using source codes.
Corollary 2.4.1 (RCU-DT bound for source coding) There exists an ε_k-source code with M_s distinct codewords and MAP decoding that satisfies

ε_k ≤ E[ min( 1, Pr{S(V)} / (M_s P_V(V)) ) ],   (2.135)

where S(V) = {v′ : P_V(v′) ≥ P_V(V)}.
Proof We first recall the RCU bound for JSCC, which reads

E[ε_{k,n}] ≤ E[ min{ 1, Σ_{v ≠ V} Pr{ P_V(v) P_{Y|X}(Y|X̄) ≥ P_V(V) P_{Y|X}(Y|X) | V, X, Y } } ],   (2.136)

where X̄ denotes the random codeword assigned to the competing vector v. Given the above considerations, for (x̄, y) with x̄ ≠ y the following holds:

P_V(v) P_{Y|X}(y|x̄) = 0,   (2.137)

whereas for x̄ = y,

P_V(v) P_{Y|X}(y|x̄) = P_V(v)   (2.138)

with probability 1/M_s. Consequently, equation (2.136) gives an upper bound on the random-coding error probability of source coding, E[ε_k]:

E[ε_k] ≤ E[ min( 1, Σ_{v′} (1/M_s) Pr{ P_V(V) ≤ P_V(v′) } ) ],   (2.139)
which can be rewritten, using the notation of Section 2.2, as

E[ε_k] ≤ E[ min( 1, Pr{S(V)} / (M_s P_V(V)) ) ].   (2.140)

In order to see that DT(k, n) coincides with (2.140) when both are applied to source coding, one needs to substitute P_Y(Y) = 1/M_s and P_{Y|X}(Y|X) = 1 in equation (2.62) of Theorem 2.2.5.
The above result is derived using random coding and in general does not achieve the optimal source error exponent [23]. Alternatively, we provide an upper bound, derived by Jelinek [36] for a specific code, that does achieve the source error exponent.
Theorem 2.4.1 [36] There exists an ε_k-code with M_s codewords that satisfies

ε_k ≤ Pr{ P_V(V) ≤ 1/M_s }.   (2.141)
Proof Let A ∈ (0, 1) and define the set C ⊆ V as

C = {v ∈ V : P_V(v) > A}.   (2.142)

Then, every source vector v ∈ C is uniquely mapped into a codeword m. On the other hand, all vectors falling outside C are encoded into a dummy codeword m_0. By using the condition (2.142), we note that the number of codewords M_s ≜ |C| can be upper-bounded as

M_s ≤ Σ_{v ∈ C} P(v)/A   (2.143)
≤ Σ_{v ∈ C} P(v)/A + Σ_{v ∉ C} P(v)/A = 1/A,   (2.144)

and thus A ≤ 1/M_s. Observe that the error probability is the probability of drawing an element of C^c. This yields

ε_k = Pr{ P_V(V) ≤ A }
    ≤ Pr{ P_V(V) ≤ 1/M_s }.
Unlike the bound (2.135), the upper bound (2.141) does achieve the exact source error exponent [36], which implies that a source-coding scheme defined by (2.142) performs optimally as n grows large. We shall recall this code construction in the derivation of a new converse for SSCC in Section 2.4.3.
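To make the bound (2.141) concrete, the following sketch evaluates it for the binary memoryless source used in Section 2.5, where P_V(v) depends on v only through its Hamming weight; the function name and parameters are illustrative, not part of the original development.

```python
from math import comb

def jelinek_bound(k, delta, Ms):
    """Evaluate (2.141), Pr{P_V(V) <= 1/Ms}, for a BMS with bit-1 probability delta.

    P_V(v) depends on v only through its Hamming weight w, so the probability
    is a sum of binomial terms over the weights w for which
    delta^w (1 - delta)^(k - w) <= 1/Ms.
    """
    total = 0.0
    for w in range(k + 1):
        p = delta**w * (1 - delta)**(k - w)   # probability of any vector of weight w
        if p <= 1.0 / Ms:
            total += comb(k, w) * p
    return total

# Example: k = 20 source bits, delta = 0.05, codebook of Ms = 2**10 codewords.
print(jelinek_bound(20, 0.05, 2**10))
```

As expected, the bound decreases as the codebook size M_s grows, and equals 1 for the degenerate case M_s = 1.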
As far as channel coding bounds are concerned, the original RCU bound is given in Theorem 1.3.8, whereas the particularization of DT(k, n) is indicated in (2.76). By combining these upper bounds with the expressions given in (2.135) and (2.141), one obtains achievability results for the error probability of SSCC. As an example, we provide the result associated with the RCU bound.
Corollary 2.4.2 (RCU bound for separate source and channel coding) The error probability of the best separated scheme with intermediate message set of cardinality M_c is upper-bounded by

ε^sep_{k,n} ≤ E[ min( 1, (M_c − 1) Pr{ P_{Y|X}(Y|X̄) ≥ P_{Y|X}(Y|X) | X, Y } ) ]   (2.145)
+ E[ min( 1, Pr{S(V)} / (M_c P_V(V)) ) ],   (2.146)

where X̄ is independent of Y.
Proof It follows directly from equation (2.134), Theorem 1.3.8 and Corollary 2.4.1.
2.4.3 A lower bound on the probability of error
The achievability bounds presented in the previous section give insight into the performance of SSCC, but they do not tell us how well the best separate coding scheme can perform. Such a performance limit is conceptually important in order to characterize the gain of JSCC over SSCC, since it allows one to quantify the difference between an achievable JSCC error probability and the best error probability obtainable with a separate scheme. For this reason, we provide in this section a converse bound on the error probability of SSCC using Theorem 1.3.12. To do so, we use Jelinek's definition of the source encoding set (see Theorem 2.4.1), which is proven to achieve the best error probability [36]. Then, without loss of optimality, we define the source set for a given M ∈ N as
B(M) = {v : P_V(v) ≥ λ_M} ⊆ V,  λ_M ∈ [0, 1],   (2.147)

ensuring that |B(M)| = M. Then, we assume that the source code uniquely maps an index m ∈ {1, . . . , M} to every v ∈ B(M) and assigns m_0 otherwise. Let B(M)^c denote the complementary set
of B(M). Then, for every M ≤ |V| the error probability of separation reads

ε_{k,n} = Pr{error | V ∈ B(M)} Pr{B(M)}   (2.148)
+ Pr{error | V ∉ B(M)} Pr{B(M)^c}   (2.149)
= Pr{channel coding error} Pr{B(M)}   (2.150)
+ ( 1 − (1 − Pr{channel coding error}) / |B(M)^c| ) Pr{B(M)^c},   (2.151)
where the factor (1 − Pr{channel coding error}) / |B(M)^c| in the second term is the probability of guessing correctly an element of B(M)^c from m_0. Moreover, we can lower-bound the channel coding error, e_ch, in (2.150) using Theorem 1.3.12 [49] (or, alternatively, via the weaker bound (2.113) when P_V is uniform). This yields the following converse result.
Corollary 2.4.3 The error probability of any separate source-channel code is lower-bounded as

ε^sep_{k,n} ≥ min_M inf_{P_X} { e_ch Pr{B(M)} + ( 1 − (1 − e_ch)/|B(M)^c| ) Pr{B(M)^c} },   (2.152)

where

e_ch ≜ 1 − β^{-1}(1/M)   (2.153)

and β(α) ≡ β_α(P_X P_{Y|X}, P_X Q_Y) is the minimum type II error when testing between P_{XY} = P_X P_{Y|X} and P_{XY} = P_X Q_Y with type I error equal to 1 − α.
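The quantity β_α in (2.153) can be computed exactly for finite alphabets via the Neyman-Pearson lemma: accept outcomes in decreasing order of likelihood ratio, randomizing on the boundary outcome. A generic sketch (the interface is illustrative; the two distributions are given as probability vectors over a common alphabet):

```python
def beta_alpha(P, Q, alpha):
    """Minimum type-II error of a (possibly randomized) test between P and Q
    whose acceptance probability under P is at least alpha
    (i.e., type-I error at most 1 - alpha), by the Neyman-Pearson lemma."""
    # Sort outcomes by decreasing likelihood ratio P/Q.
    order = sorted(range(len(P)),
                   key=lambda i: (P[i] / Q[i]) if Q[i] > 0 else float('inf'),
                   reverse=True)
    acc_p = 0.0   # probability mass accepted under P so far
    beta = 0.0    # type-II error accumulated under Q
    for i in order:
        if acc_p + P[i] < alpha:
            acc_p += P[i]
            beta += Q[i]
        else:
            # randomize on the boundary outcome to meet the constraint exactly
            frac = (alpha - acc_p) / P[i] if P[i] > 0 else 0.0
            return beta + frac * Q[i]
    return beta
```

For instance, testing P = (0.5, 0.5) against Q = (0.9, 0.1) with α = 0.5 accepts half of the second outcome (the one with the larger likelihood ratio), giving β = 0.05/0.5 × 0.1 contribution structure typical of such boundary randomization.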
2.4.4 Error exponent of separate source and channel coding

We recall here the definition of the separate error exponent when concatenating optimal source and channel codes [14], which stems from inequality (2.134):

E_sep ≜ sup_R min{ t e(R/t, P_V), E(R, P_{Y|X}) },   (2.154)

where e(R/t, P_V) and E(R, P_{Y|X}) are the source and channel error exponents defined in Appendix A.3. For discrete memoryless systems, equation (2.154) can be rewritten as [69]

E_sep = t e(R_0/t, P_V) = E(R_0, P_{Y|X}),   (2.155)

where

R_0 ≜ { R such that e(R/t, P_V) = E(R, P_{Y|X}),  if t e(log|V|, P_V) ≥ E(t log|V|, P_{Y|X}),
      { t log|V|,                                  if t e(log|V|, P_V) < E(t log|V|, P_{Y|X}).   (2.156)
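For the BMS-BSC pair of Section 2.5, (2.155)-(2.156) can be evaluated numerically. The sketch below (natural logarithms; all names illustrative) uses the standard forms of the two exponents: e(R, P_V) = D(δ_R‖δ) with H(δ_R) = R for the source, and the random-coding exponent E_r(R) = max_{0≤ρ≤1} E_0(ρ) − ρR as a stand-in for E(R, P_{Y|X}), with which it coincides above the critical rate; R_0 is found by bisection on the crossing condition.

```python
import math

def bin_entropy(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def bin_divergence(p, q):
    """Binary divergence D(p || q) in nats."""
    out = 0.0
    if p > 0:
        out += p * math.log(p / q)
    if p < 1:
        out += (1 - p) * math.log((1 - p) / (1 - q))
    return out

def source_exponent(R, delta):
    """e(R, P_V) = D(delta_R || delta) with H(delta_R) = R, delta <= delta_R <= 1/2."""
    lo, hi = delta, 0.5          # H is increasing on [delta, 1/2] for delta < 1/2
    for _ in range(200):         # bisection for H(delta_R) = R
        mid = 0.5 * (lo + hi)
        if bin_entropy(mid) < R:
            lo = mid
        else:
            hi = mid
    return bin_divergence(0.5 * (lo + hi), delta)

def channel_exponent(R, xi):
    """Random-coding exponent E_r(R) = max_{0<=rho<=1} E0(rho) - rho*R for a BSC(xi)."""
    def E0(rho):
        s = xi ** (1 / (1 + rho)) + (1 - xi) ** (1 / (1 + rho))
        return rho * math.log(2) - (1 + rho) * math.log(s)
    return max(E0(i / 1000.0) - (i / 1000.0) * R for i in range(1001))

# Bisection for the crossing rate R0 of (2.156) with t = 1; the curves cross
# here because e(log 2, P_V) > 0 = E(log 2, P_{Y|X}).
delta, xi = 0.05, 0.11
lo, hi = bin_entropy(delta), math.log(2)
for _ in range(60):
    R = 0.5 * (lo + hi)
    if source_exponent(R, delta) < channel_exponent(R, xi):
        lo = R
    else:
        hi = R
E_sep = source_exponent(0.5 * (lo + hi), delta)
print("E_sep (nats):", E_sep)
```

The increasing source exponent and decreasing channel exponent guarantee a unique crossing, so plain bisection suffices.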
2.4.5 A comparison between JSCC and SSCC via DT(k, n)
As introduced above, a rigorous method to analytically quantify the gain of JSCC over separation is to compare an achievability bound for JSCC with a converse bound for separation. This gives a lower bound on the minimum achievable gain of a joint scheme over separation. In Subsection 2.5.1.2 we numerically illustrate this gain using the RCU bound for JSCC and the converse for separation given in Corollary 2.4.3. In this section, however, we provide a less rigorous but insightful characterization of the gain via our achievability bounds. To this end, consider first the threshold decoding bound DT(k, n) given in Theorem 2.2.5:
E[ min( 1, (Pr{S(V)} / P_V(V)) · (P_Y(Y) / P_{Y|X}(Y|X)) ) ]   (2.157)

and the following upper bound for every M > 0:

E[ min( 1, (Pr{S(V)} / P_V(V)) · (P_Y(Y) / P_{Y|X}(Y|X)) ) ]
≤ E[ min( 1, Pr{S(V)} / (M P_V(V)) ) ]   (2.158)
+ E[ min( 1, M P_Y(Y) / P_{Y|X}(Y|X) ) ],   (2.159)
where we used the inequality min(1, AB) ≤ min(1, A) + min(1, B) for A, B ≥ 0. When M is appropriately chosen, the right-hand side corresponds to the sum of the (optimized) DT bound for source coding (2.135) and the DT bound for channel coding with γ(V) = 1. According to (2.134), this constitutes an achievability bound for SSCC, as both codes use the same rate. Let us denote the sum by DTsep. Hence,
DT(k, n) ≤ DTsep(k, n) (2.160)
which shows that the achievability bound for separation upper-bounds the one for joint source-channel coding. Further, when each bound is nontrivial (i.e., smaller than 1) and P_X is taken such that X is independent of V, we have the following inequality:
DT(k, n) = E[ (Pr{S(V)} / P_V(V)) · (P_Y(Y) / P_{Y|X}(Y|X)) ]   (2.161)
= E[ Pr{S(V)} / (M P_V(V)) ] · E[ M P_Y(Y) / P_{Y|X}(Y|X) ]   (2.162)
≤ ( E[ Pr{S(V)} / (M P_V(V)) ] + E[ M P_Y(Y) / P_{Y|X}(Y|X) ] )² / 2   (2.163)
= (DTsep(k, n))² / 2.   (2.164)
Hence, when the bounds are nontrivial we can approximate the JSCC average error probability as

ε^JSCC_{k,n} ≈ (ε^SSCC_{k,n})² / 2,   (2.165)
which is consistent with the relationship found via error exponents [69].
2.5 Application to binary memoryless sources
and channels
In this section, we illustrate our main results, namely the upper and lower bounds and the exponents, with some source-channel pair examples. In these examples, we consider a binary memoryless source (BMS) transmitted over two example channels: a binary symmetric channel (BSC) and a binary erasure channel (BEC).
Let d_w(x) denote the Hamming weight of sequence x, i.e., the number of non-zero elements of x. A binary memoryless source with probability of bit 1 equal to δ is defined by an alphabet V = {0, 1}^k and the distribution

P_V(v) = δ^{d_w(v)} (1 − δ)^{k − d_w(v)},  v ∈ {0, 1}^k.   (2.166)
A binary symmetric channel (BSC) with crossover probability ξ < 1/2 is defined by input and output alphabets X = Y = {0, 1}^n and the conditional distribution

P_{Y|X}(y|x) = ξ^{d_w(y − x)} (1 − ξ)^{n − d_w(y − x)},  x, y ∈ {0, 1}^n.   (2.167)
Let now e be a symbol denoting an erasure and d_e(y) be the number of erasures in sequence y. A binary erasure channel (BEC) with erasure probability ξ < 1 is characterized by an input alphabet X = {0, 1}^n, an output alphabet {0, 1, e}^n and the conditional distribution

P_{Y|X}(y|x) = { ξ^{d_e(y)} (1 − ξ)^{n − d_e(y)},  if x and y coincide in the non-erased positions,
              { 0,                                otherwise.   (2.168)
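The three distributions (2.166)-(2.168) are simple enough to code directly; a sketch (sequences represented as tuples, erasures marked by the string 'e'; function names are illustrative):

```python
def p_source(v, delta):
    """P_V(v) for a BMS with bit-1 probability delta -- equation (2.166)."""
    w = sum(v)
    return delta**w * (1 - delta)**(len(v) - w)

def p_bsc(y, x, xi):
    """P_{Y|X}(y|x) for a BSC with crossover probability xi -- equation (2.167)."""
    d = sum(a != b for a, b in zip(x, y))
    return xi**d * (1 - xi)**(len(x) - d)

def p_bec(y, x, xi):
    """P_{Y|X}(y|x) for a BEC with erasure probability xi -- equation (2.168)."""
    if any(b != 'e' and a != b for a, b in zip(x, y)):
        return 0.0                       # mismatch in a non-erased position
    d = sum(b == 'e' for b in y)
    return xi**d * (1 - xi)**(len(x) - d)
```

Each conditional distribution sums to one over its output alphabet, which is easy to verify for small n.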
2.5.1 Numerical results
In the following, we numerically compare the upper and lower bounds for the source-channel pairs BMS-BSC and BMS-BEC, as well as some classical results. For the sake of simplicity we consider n = k, i.e., transmission rate t = 1, so that the bounds only depend on n. We evaluate our bounds over the ensemble of random source-channel codes generated by the capacity-achieving distribution P_X(x) = 2^{-n}.

Figure 2.5: Upper and lower bounds for a BMS with δ = 0.05 and a BSC with crossover probability ξ = 0.11.

The expressions corresponding to these particular cases can be found in Appendices B.1 and B.2, respectively, except for the exponent of separation, which is computed according to (2.155).
2.5.1.1 Joint source-channel coding bounds
We now compare the bounds and error exponents for JSCC presented in Sections 2.2 and 2.3. For each source-channel pair, the tilted RCU bound is computed with the value s ≥ 0 that optimizes Gallager's upper bound (2.17), denoted s_G, and the resulting bound is denoted RCUsG(n).
Fig. 2.5 shows upper and lower bounds for a BMS with δ = 0.05 transmitted over a BSC with crossover probability ξ = 0.11. More specifically, we plot RCU(n) (2.12), RCUs(n) (2.14), DT(n) (2.62) and a relaxation of CNV(n) (2.105), denoted in the figure as CNV(n) for ease of notation. RCUsG(n) is obtained with s_G = 0.8382 and ρ = 0.1930. The three new achievability bounds are tighter than Gallager's (2.83) and Feinstein's (2.83) bounds over the blocklength range in
Figure 2.6: Error exponents for a BSC with crossover probability ξ = 0.11 as a function of the BMS entropy, where H(0.05) ≈ 0.29.
Fig. 2.5. RCU(n) and RCUsG(n) have Gallager's exponent, which coincides with Csiszár's since the channel is symmetric [69]. On the other hand, the threshold decoding bound DT(n) does not have the optimal exponent and is expected to perform worse than Gallager's bound for sufficiently large blocklength. The converse bound also appears to have the optimal exponent, and it tightly bounds the best performance region. Notice that one cannot find JSCC codes with error probability below CNV(n). We validate these observations by fixing the channel and plotting (see Fig. 2.6) lower bounds on the error exponent of each bound as a function of the source entropy H(δ), parametrized by δ. First, we verify that the Gallager-Csiszár exponent is tight at least for entropies larger than or equal to H(0.05), because it coincides with the (sphere-packing) upper bound given in Theorem A.3.3. Besides, E_DT is well approximated by the lower bound given in (2.84). Consequently, Gallager's exponent is strictly larger than that of DT(n) on account of Theorem 2.2.6. This is illustrated in Fig. 2.6 at H(0.05) = 0.2864.
Fig. 2.7 shows error bounds for the same BMS transmitted over a BEC with erasure probability ξ = 0.5. In this case, RCUsG(n) is obtained with s_G = 0.770 and ρ = 0.2970, and we again plot a relaxation of CNV(n) (2.105). In particular, RCU(n) and DT(n) exhibit an approximately constant gap with respect to RCUsG(n). Fig. 2.8 shows that the exponent of the three bounds is the actual JSCC error exponent
Figure 2.7: Upper and lower bounds for a BMS with δ = 0.05 and a BEC with erasure probability ξ = 0.5.
as it coincides with the sphere-packing exponent. In this case, the E_0 function of the BEC is independent of s_G and hence E_DT equals Gallager's exponent, on account of equations (2.91) and (2.92).
2.5.1.2 Joint vs. Separate source-channel coding
Adding the results from Section 2.4, Fig. 2.9 compares the error probability of the best JSCC and SSCC schemes for a BMS with δ = 0.05 and a BSC with crossover probability 0.11. In particular, we present the respective upper bounds based on the RCU bound (see (2.135) for SSCC) and lower bounds based on the converse results (2.105) and (2.152). One can observe that for blocklengths as short as n = 100 the advantage of JSCC over a separate scheme is already noticeable. This gain can be quantified in error probability when the blocklength is fixed, or in delay when a target error probability is given. For instance, a JSCC scheme requires 400 channel uses to achieve an error probability of 10^{-4}, whereas an SSCC scheme requires nearly 800. This is also validated by Fig. 2.6, where the relationship
Figure 2.8: Error exponents for a BMS-BEC pair, where the BEC has erasure probability ξ = 0.5, as a function of the BMS entropy, where H(0.05) ≈ 0.29.
between the exponents is approximately E_J(P_V, P_{Y|X}) ≈ 2 E_sep(P_V, P_{Y|X}), which translates into ε_{k,n} ≈ (ε^sep_{k,n})² as n grows large.

In Fig. 2.10, we compare both schemes from a different perspective. Here, we fix the error probability to be 10^{-3} for the same BMS-BSC source-channel pair and leave the parametrization of the BMS (δ ∈ [0, 1]) free. The figure shows a lower bound on the maximum entropy of a BMS that can be transmitted with JSCC at the aforementioned average error probability. Conversely, the figure plots an upper bound on the maximum entropy of a BMS that can be transmitted with SSCC at target error probability 10^{-3}. The gap between the two curves (shaded area) represents the gain of JSCC over separation in terms of transmissible entropy. The gap decreases with the blocklength, since the maximum allowed entropy converges to capacity for both schemes as n goes to infinity. However, as seen in the previous plots, when the target error probability is small, the gain of JSCC over SSCC can be significant at moderate blocklengths.
Figure 2.9: Achievability and converse bounds for SSCC and JSCC for a BMSwith δ = 0.05 and a BSC with crossover probability 0.11. The dashed and solidlines represent the corresponding RCU and converse bounds.
2.6 Chapter review and conclusions
In this chapter, we have studied the performance limits of joint source-channelcoding for arbitrary blocklengths.
To do this, we have divided our work into three parts:
• Achievability/direct part: Study of upper bounds on the minimumachievable error probability.
• Converse part: Study of lower bounds on every achievable error proba-bility.
• Comparison JSCC vs. SSCC: Particularization of the JSCC boundsto source and channel coding enabling a comparison between joint andseparate source-channel coding.
In this process, we have followed two methods:
Figure 2.10: Maximum entropy that can be transmitted with SSCC and JSCC over a BSC with crossover probability ξ = 0.11. The achievability bound for JSCC is the RCU bound, while the converse bound for SSCC is given by equation (2.152).
• Random-coding analysis (Achievability part).
• Binary hypothesis testing and the Neyman-Pearson Lemma (Con-verse part).
In the first part, we have extended the random-coding approach to JSCC initiated by Gallager [23] by deriving new achievability bounds based on MAP and threshold decoding. Remarkably, the tightest achievable exponent for JSCC (due to Csiszár [14]) had been derived for a fixed-composition code and a suboptimal decoder. In this context, we have derived a new random-coding exponent with MAP decoding and product distributions that recovers Gallager's and Csiszár's exponents as special cases.
For the converse part, we have investigated the connection between the chan-nel coding problem and binary hypothesis testing to characterize a lower boundon the error probability for JSCC. We have proposed a method to derive this lower
bound by defining a decision rule for each transmitted message based on the output of the decoder. The type II error of each decision rule gives a lower bound on the conditional (on the message) error via the Neyman-Pearson lemma. When the messages are not uniform, we have shown that this method is tighter than considering a single test based on the average output of the decoder. Unlike previous hypothesis-testing bounds, the distribution of the decoder appears explicitly in the converse bound as an optimization parameter. This has implications in related problems such as mismatched decoding. Finally, a problem for future research is the derivation of a general (sphere-packing) upper bound on the JSCC exponent based on our lower bound on the error probability.
The solutions proposed in this chapter for each part share a central idea: we allow the main parameters, namely the codeword distribution (for the achievability part) and the testing decision rule (for the converse part), to be source-dependent, which improves the bounding analysis and thus the finite-length characterization of the problem. As introduced in Chapter 1, JSCC is a paradigm model that covers many problems in information theory, including multi-terminal channels with correlated sources. This source-dependent notion can therefore potentially be extended to those scenarios with an appropriate definition of each parameter. In particular, our JSCC results can be used to obtain bounds for source and channel coding. From this perspective, we have used our bounds to compare the performance of JSCC with respect to a two-stage scheme formed by the concatenation of a source and a channel code (SSCC). In this context, the separation theorem states that using SSCC is optimal if we assume infinite blocklength [13]. However, JSCC outperforms SSCC at finite blocklength. For instance, in practical implementations, where the blocklength is of the order of hundreds or thousands, there is an intrinsic loss in dividing the transmitter and receiver into two independent blocks. We have provided two ways to quantify this loss. First, our bounds allow a numerical quantification of the minimum achievable gain of JSCC over SSCC: one can evaluate any JSCC upper bound against the converse bound for separation given in Corollary 2.4.3. Alternatively, we have characterized this gain by comparing an upper bound for JSCC with its particular form for source and channel coding. Although only approximate, the final expression suggests that ε^JSCC_{k,n} ≈ γ (ε^SSCC_{k,n})², with γ a positive constant, which is consistent with previous work [69].
Part II
Large-system analysis of multiuser detection with an unknown number of users
Chapter 3
Introduction to large-system analysis of multiuser detection
3.1 Introduction
In multiple-access communication, the evolution of user activity may play animportant role. From one time instant to the next, some new users may becomeactive and some users inactive, while the parameters of the active users, suchas power or location, may vary. Now, most of the available multiuser detection(MUD) theory is based on the assumption that the number of active users isconstant, known at the receiver, and equal to the maximum number of usersentitled to access the system [63]. If this assumption does not hold, the receivermay exhibit a serious performance loss [31, 35].
In reference [4], the more realistic scenario in which the number of active users is unknown a priori, and varies with time with known statistics, is the basis of a new approach to detector design. This part of the dissertation presents a large-system analysis of this new type of detector for Code Division Multiple Access (CDMA) when data and user activity are estimated jointly.
Our main goal is to determine the performance loss caused by the need forestimating the identities of active users, which are not known a priori. In Chapter4 we restrict our analysis to a worst-case scenario, where detection cannot improvethe performance from past experience due to a degeneration of the activity model(for instance, assuming a Markovian evolution of the number of active users[12, 47]) into an independent process [3]. The same analysis applies to systemswhere the input symbols accounting for data and activity are interleaved beforedetection. To prevent a loss of optimality, we assume that identities and dataare estimated jointly, rather than in two separate steps. In Chapter 5 we studythe case where active users encode their data during a block time and study a
specific iterative decoder for data and activity in this scenario.
3.2 Notation and model
We consider a CDMA system with an unknown number of users [4], and examinethe optimum user-and-data detector in a multiple-access link (e.g. at a basestation). In particular, we study the canonical model for randomly spread direct-sequence (DS) CDMA with a maximum of K active users and length N signatures(spreading sequences) assigned to each user, given by
yt = SAbt + zt, (3.1)
where
• yt ∈ RN is the received signal at time t.
• S ∈ R^{N×K} is a matrix whose columns are the user spreading sequences s_k, whose components are i.i.d. random variables with a symmetric zero-mean, unit-variance distribution.
• A = diag(a1, . . . , aK) ∈ RK×K is the diagonal matrix of the users’ signalamplitudes.
• b_t = (b_t^1, . . . , b_t^K) ∈ R^K is the users' data vector, where each b_t^k belongs to a set X.
• zt is an additive white Gaussian noise vector with i.i.d. entries ∼ N(0, 1).
The main difference with the classical randomly spread CDMA models [57, 63] is that we define a system activity rate α ≜ Pr{user k is active}, 1 ≤ k ≤ K, and assume that active users employ binary phase-shift keying (BPSK) with equiprobable symbols. In particular, we shall use the above model for two communication settings: uncoded transmission (Chapter 4), where users are assumed to be active at each time instant with probability α, and coded transmission (Chapter 5), where active users are assumed to encode their messages with blocklength L or stay inactive. In the latter case we will analyze (3.1) over a time window.
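The uncoded setting is straightforward to simulate. In the sketch below (all names illustrative), signatures are i.i.d. binary scaled by 1/√N so that each has unit energy, a common large-system convention, and the amplitude is derived from an SNR given in dB; both are modeling choices of this example, not prescribed by the text.

```python
import numpy as np

def cdma_channel(K, N, alpha, snr_db, T, seed=0):
    """Simulate T uses of the randomly spread CDMA channel (3.1).

    Active users (activity rate alpha) send equiprobable BPSK symbols +-1;
    an inactive user is modeled by the symbol 0.  Returns (y, S, A, b).
    """
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(N, K)) / np.sqrt(N)   # unit-energy signatures
    amplitude = 10.0 ** (snr_db / 20.0)                     # equal power across users
    A = amplitude * np.eye(K)
    active = rng.random((K, T)) < alpha
    data = rng.choice([-1.0, 1.0], size=(K, T))
    b = np.where(active, data, 0.0)                         # 0 encodes inactivity
    z = rng.standard_normal((N, T))                         # unit-variance AWGN
    y = S @ (A @ b) + z
    return y, S, A, b
```

A call such as `cdma_channel(K=8, N=16, alpha=0.5, snr_db=10, T=100)` produces a frame on which detectors can be tested.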
In a static channel model, the detector operation remains invariant along a data frame, indexed by t, but we often omit the time index for the sake of simplicity. Assuming that the receiver knows S and A, the a posteriori probability (APP) of the transmitted data has the form

p(b | y, S, A) = (1/√(2π)) e^{−‖y − SAb‖²/2} p(b) / p(y | S, A).   (3.2)
Hence, the maximum a posteriori (MAP) joint activity-and-data multiuser detector solves

b̂ = arg max_{b ∈ X^K} p(b | y, S, A).   (3.3)

Similarly, individually optimum (IO) detection of single-user data and activity is obtained by marginalizing over the undesired users as follows:

b̂_k = arg max_{b_k} Σ_{b \ b_k} p(b | y, S, A).   (3.4)
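For small K, the marginalization in (3.4) can be carried out by brute force over X^K with X = {−1, 0, +1}, where 0 encodes inactivity. A sketch (the prior splitting between activity and data, and the interface, are illustrative assumptions):

```python
import itertools
import numpy as np

def io_detect(y, S, A, alpha):
    """Individually optimum joint activity-and-data detection, eq. (3.4),
    by exhaustive marginalization over X^K, X = {-1, 0, +1} with 0
    encoding inactivity.  Exponential in K, so only for small systems."""
    K = S.shape[1]
    symbols = (-1, 0, 1)
    # prior: a user is active w.p. alpha and then sends +-1 equiprobably
    prior = {-1: alpha / 2, 0: 1 - alpha, 1: alpha / 2}
    marginals = [dict.fromkeys(symbols, 0.0) for _ in range(K)]
    for b in itertools.product(symbols, repeat=K):
        b_vec = np.array(b, dtype=float)
        p_b = 1.0
        for s in b:
            p_b *= prior[s]
        # unnormalized posterior p(b | y, S, A) from eq. (3.2)
        post = np.exp(-0.5 * np.sum((y - S @ (A @ b_vec)) ** 2)) * p_b
        for k in range(K):
            marginals[k][b[k]] += post
    return [max(symbols, key=m.get) for m in marginals]
```

On a noiseless toy system with orthogonal signatures and high amplitude, the detector recovers the transmitted activity-and-data vector exactly.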
Figure 3.1 illustrates the multiuser scheme with a transmission block based onthe randomly spread CDMA model and the receiver based on optimum (MAP)detection.
Figure 3.1: Randomly spread CDMA system with optimum multiuser detector.
According to (3.1), each user transmits a data vector over a multiuser channelunder the effect of interference and background Gaussian noise, which may causedetection errors at the receiver. We introduce a few definitions that help analyzethe detector performance in the presence of these two phenomena.
Definition 3.2.1 The signal-to-noise ratio (SNR) is the ratio of the transmittedsignal power to the background Gaussian noise variance.
For ease of analysis, we shall consider in Chapter 4 and Chapter 5 that every usertransmits with the same power and thus, the SNR will be constant across users.
In practical settings we are interested in evaluating the performance loss interms of uncoded error probability.
Definition 3.2.2 The uncoded error probability or error probability Pe is theaverage probability of erroneously detecting a transmitted symbol.
However, in the context of large-system analysis the effect of the interferenceis better captured by the concept of multiuser efficiency, which plays a centralrole in the development of this second part of the dissertation. The multiuserefficiency is a system parameter that describes the performance of every useraccessing the channel (3.1) in terms of the performance achieved under Gaussiannoise and no interference. The formal definition is given as follows.
Definition 3.2.3 Let the error of detecting user k accessing channel (3.1) be Pe.The multiuser efficiency ηk ∈ [0, 1] is defined as the inverse of the noise varianceof the white Gaussian noise channel under which the error of detecting user kwould be Pe.
As we shall see, the notion of multiuser efficiency is especially meaningful whenwe perform large-system analysis. Finally, we define a parameter describing thetrade-off between the dimensions of the CDMA system.
Definition 3.2.4 The system load β is defined as the ratio of the maximum number of users K to the length of the spreading sequences N, i.e., β ≜ K/N.
In general, evaluating the performance of a detector for a finite number of users is challenging. In contrast, it has been shown that randomly spread CDMA systems [21, 59] enjoy self-averaging properties when the natural dimensions (number of users K and spreading gain N) tend to infinity while the system load is kept fixed. Moreover, the general results derived from asymptotic analysis can be validated by simulations run for a limited number of users [27]. Hence, adopting large-system analysis is a reasonable approach to studying our user-and-data detector. Of late, the theory of multiuser detection has received important contributions stemming from the large-system approach. Some of them are due to Tanaka [57], who used statistical-physics concepts and methodologies in multiuser detection to obtain the large-system uncoded minimum bit error probability (BER) and the spectral efficiency with equal-power binary inputs. Tanaka's pioneering approach inspired additional work. For instance, Reference [44] studies
the channel capacity, while [28] presents a unified treatment of Gaussian CDMA channels and multiuser detection in the large-system limit with arbitrary input distributions and flat fading. An extensive survey on the topic can be found in [27]. In particular, we are interested in the optimal maximum a posteriori (MAP) multiuser detector, for which there exist asymptotic methods from statistical physics [26, 27, 28, 42, 46, 57].
Of special relevance to our analysis is the replica method, which enables one to prove, under some assumptions, that the multiuser channel (3.1) asymptotically decomposes into K scalar Gaussian channels for optimum detection purposes [28]. This phenomenon is also referred to as the decoupling principle and provides the framework in which we develop our work in Chapters 4 and 5.
In the next section, we overview some basic concepts of statistical physics tointroduce the context in which the replica method is used and as a consequence,the decoupling principle can be proved.
3.2.1 Statistical-physics approach to large-system analysis of classic MUD
The core of the application of statistical physics to communication problems liesin the concept of free energy. In statistical physics, the free energy F (X) (whereX is the state variable) relates the energy ε(X) and the entropy H(X) of aphysical system of particles in the following way:
F (X) = ε(X)− TH(X), (3.5)
where T is the temperature of the system. At thermal equilibrium the free energy (3.5) is minimized, since the entropy is maximized as time evolves, following the second law of thermodynamics. Under these conditions, the free energy can also be expressed as
F (X) = −T lnZ(X), (3.6)
where Z(X) = Σ_x exp(−ε(x)/T) is called the partition function, x denotes each microscopic state, ε(·) is the energy operator, and the temperature T ≥ 0 reflects the energy constraint. A key point here is that the free energy, after normalization by the dimension of the system, is a self-averaging quantity, which means that it tends to its expectation as the system dimension grows to infinity. The free energy is the starting point for calculating macroscopic properties of a thermodynamic system.
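As a toy illustration of (3.5)-(3.6), consider a system with a finite list of energy levels (the example values are illustrative):

```python
import math

def free_energy(energies, T):
    """F = -T ln Z with Z = sum_x exp(-eps(x)/T), as in eq. (3.6)."""
    Z = sum(math.exp(-e / T) for e in energies)
    return -T * math.log(Z)

# Two-level toy system: as T -> 0, F approaches the ground-state energy
# (the energy term dominates); as T grows, the entropy term -T*H lowers F.
levels = [0.0, 1.0]
print(free_energy(levels, 0.01), free_energy(levels, 10.0))
```

The two regimes, energy-dominated at low temperature and entropy-dominated at high temperature, mirror the trade-off in (3.5).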
In communications systems, the role of the energy function can be playedby the metric of a detector in a multiuser channel. Therefore, by assumingthe self-averaging property, any detector parametrized by a certain metric can
be analyzed with the tools of statistical mechanics. Hence, the calculation ofthe limiting free energy at equilibrium, i.e., the logarithm of the inverse of thepartition function (3.6), leads to the characterization of the large-system detectorperformance in terms of its macroscopic parameters such as multiuser and spectralefficiency.
In a communications scheme such as the one modeled by (3.1), the goal of themultiuser detector is to infer the information-bearing symbols given the receivedsignal y and the knowledge about the channel state. This leads naturally tothe choice of the partition function Z(y,S) = p(y | S). The corresponding freeenergy, normalized by the number of users and with the choice T = 1, becomes
F_K ≜ −(1/K) ln p(y | S). (3.7)

To calculate this expression we make the self-averaging assumption, which states that the randomness of (3.7) vanishes as K → ∞. This is tantamount to saying that the free energy per user F_K converges in probability to its expected value over the distribution of the random variables y and S, denoted by

F ≜ lim_{K→∞} E{ −(1/K) ln p(y | S) }. (3.8)
Evaluation of (3.8) is made possible by the replica method [42, 46], which consists of introducing n independent replicas of the input variables, with corresponding density p^n(y | S), and computing F as follows:

F = −lim_{n→0} (∂/∂n) ( lim_{K→∞} (1/K) ln E{ p^n(y | S) } ). (3.9)
The equivalence between (3.8) and (3.9) uses the following equality:
lim_{n→0} (∂/∂n) ln E{Z^n} = E{ln Z}, (3.10)

under the assumption that Z^n = p^n(y | S) is valid for all real-valued n in the vicinity of n = 0, and that the limits in K and n can be interchanged.
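The equality (3.10) follows from a small-n expansion, which we sketch for completeness (this step is standard in the replica literature and is not spelled out in the text):

```latex
\mathbb{E}\{Z^n\} = \mathbb{E}\{e^{n\ln Z}\}
                 = 1 + n\,\mathbb{E}\{\ln Z\} + O(n^2)
\qquad\Longrightarrow\qquad
\ln\mathbb{E}\{Z^n\} = n\,\mathbb{E}\{\ln Z\} + O(n^2),
```

so that differentiating with respect to n and letting n → 0 leaves exactly E{ln Z}.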
To compute (3.9), one of the cornerstones of large deviations theory, Varadhan's theorem [20], is invoked to transform the calculation of the limiting free energy into a simplified optimization problem, whose solution is assumed to exhibit symmetry among its replicas. More specifically, in the case of a MAP individually optimum detector, the optimization yields a fixed-point equation of an equivalent Gaussian channel [28], whose unknown variable corresponds to the multiuser efficiency. Due to the structure of the optimization problem, the multiuser efficiency must minimize the free energy. The above is tantamount to formulating the decoupling principle:
Claim 3.2.1 [27, 28] Given the (instantaneous) multiuser channel (3.1), the output b̂_k of the IO detector (3.4), conditioned on b_k being transmitted with amplitude √γ, converges to the distribution of the posterior mean estimate of the single-user Gaussian channel

y = √γ b_k + (1/√η) z, (3.11)
where z ∼ N(0, 1), and η, the multiuser efficiency, is the solution of the following fixed-point equation:
η⁻¹ = 1 + β E_γ[ γ MMSE(ηγ, α) ]. (3.12)
If (3.12) admits more than one solution, we must choose the one that minimizes the free energy function

F = −E[ ∫ p(y | b_k) ln p(y | b_k) dy ] − (1/2) ln(2πe/η) + (1/(2β)) η ln(2π/η). (3.13)
In (3.12) and (3.13), p(y | b_k) is the transition probability of the (large-system) equivalent single-user Gaussian channel described by (3.11), and

MMSE(ηγ, α) ≜ E[ (b_k − b̂_k)² ] (3.14)

denotes the minimum mean-square error in estimating b_k in Gaussian noise with amplitude equal to √γ, where b̂_k = E[b_k | y] is the posterior mean estimate, which is known to minimize the mean-square error [50].
Hence, the decoupling principle states that under random spreading an optimum (Bayes) detector asymptotically estimates users' transmitted symbols as if they were transmitted over a scalar white Gaussian noise channel. The parameter that specifies this equivalent Gaussian channel is the multiuser efficiency. A schematic plot of the decoupling picture is shown in Fig. 3.2.
Claim 3.2.1 can be derived by minimizing the free energy (as discussed above) or, more formally, by showing the convergence in joint moments of the random variables (b_k, b̂_k) under the (vector) multiuser channel (3.1) to (b_k, b̂_k) under the (scalar) single-user channel (3.11). We provide a derivation of the decoupling principle in Appendix D.1 in the context of iterative multiuser decoding.
3.2.2 A note on the validity of the replica method
The replica method is known to accurately approximate experimental data and is consistent with previous theoretical work [59, 60, 66]. The replica method
Figure 3.2: Bank of K equivalent single-user Gaussian channels.
analysis relies on four unproven assumptions: i) the self-averaging property of the free energy, ii) the replica symmetry of the fixed-point solution, iii) the exchange of the order of limits, and iv) the analytic continuation of the replica exponent to real values. Although a fully rigorous justification of these assumptions remains an open problem, there has been some recent progress in this direction [18, 37, 38, 43, 56].
Chapter 4
High-SNR analysis of optimum multiuser detection in the large-system regime
In this chapter we present our main results on the asymptotic performance of multiuser detection in the high-SNR region. The chapter is structured as follows: Section 4.1 derives the large-system central fixed-point equation and analytical bounds on the MMSE. Based on these results, Section 4.2 discusses the interplay of maximum system load and multiuser efficiency. Finally, Section 4.3 draws some concluding remarks. Proofs of some results can be found in Appendix C.
4.1 Large-system multiuser efficiency
We illustrate here the behavior of multiuser efficiency and system load in the high-SNR region corresponding to detection with an unknown number of users. We start by shaping our problem into the statistical-physics framework [28, 57]. As mentioned earlier, the multiuser detector metric is regarded as the energy of a system of particles at state X. Therefore, the partition function Z(X) = Σ_x exp(−ε(x)/T) corresponds to the output density given the channel information, i.e., p(y | S) = (2π)^{−1/2} Σ_b p(b) exp(−‖y − SAb‖²/2).
The energy operator ε(·), as derived from the free energy, is related to the logarithm of the joint distribution p(y | b, S) p(b):

ε(b) = ‖y − SAb‖² − 2 ln p(b). (4.1)
We can now invoke the decoupling principle (Claim 3.2.1) in the multiuser system (3.1), so as to use its single-user characterization. By doing this, the
system's performance can be characterized by that of a bank of K scalar Gaussian channels (3.11), where K represents the maximum number of users. The input distribution for an arbitrary BPSK user k takes values in X = {−1, 0, +1} with probabilities α/2, 1 − α, and α/2, respectively, where α is the activity rate. Moreover, the signal amplitudes from matrix A are assumed to be constant, i.e., a_k = √γ ∀k, where γ is the SNR per active user (referred to as SNR), and the inverse noise variance equals the multiuser efficiency η. Hence, η is the solution of the fixed-point equation (3.12) that minimizes (3.13), where the MMSE is given by (3.14). More generally, the analysis presented in this chapter can easily be extended to a_k coefficients with different statistics, such as those induced by Rayleigh fading.
By applying Claim 3.2.1 in [28], which holds under the assumptions of the replica method, the fixed-point equation of the user-and-data detector can be stated as follows:
Corollary 4.1.1 Given a randomly spread DS-CDMA system with constant equal power per user, the large-system multiuser efficiency of an individually optimum detector that performs MAP estimation of users' identities and their data under BPSK transmission is the solution of the following fixed-point equation

η = 1 / ( 1 + βγ [ α − ∫ (1/√(2π)) e^{−y²/2} · ( α² sinh(ηγ − y√(ηγ)) ) / ( α cosh(ηγ − y√(ηγ)) + (1 − α) e^{ηγ/2} ) dy ] ) (4.2)
that minimizes the free energy (3.13).
Proof See Appendix C.1.
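As an illustration, the fixed-point equation (4.2) can be solved numerically by damped fixed-point iteration. The sketch below (plain NumPy; the quadrature grid, damping factor, and starting point are our own choices, not part of the thesis) evaluates the right-hand side of (4.2) and iterates it:

```python
import numpy as np

def rhs(eta, gamma, beta, alpha, y=np.linspace(-10.0, 10.0, 20001)):
    """Right-hand side of the fixed-point equation (4.2)."""
    a = eta * gamma - y * np.sqrt(eta * gamma)
    num = alpha**2 * np.sinh(a)
    den = alpha * np.cosh(a) + (1.0 - alpha) * np.exp(eta * gamma / 2.0)
    w = np.exp(-y**2 / 2.0) / np.sqrt(2.0 * np.pi)   # Gaussian weight
    integral = np.sum(w * num / den) * (y[1] - y[0]) # Riemann-sum quadrature
    return 1.0 / (1.0 + beta * gamma * (alpha - integral))

def solve_eta(gamma, beta, alpha, eta0=0.9, iters=500, damp=0.5):
    """Damped iteration eta <- (1-damp)*eta + damp*rhs(eta, ...)."""
    eta = eta0
    for _ in range(iters):
        eta = (1.0 - damp) * eta + damp * rhs(eta, gamma, beta, alpha)
    return eta
```

Note that for α = 1 the integrand reduces to the Gaussian-weighted tanh of the classical fixed-point equation, as expected. With multiple coexisting solutions the iteration converges to a stable one determined by the starting point; selecting the free-energy-minimizing solution requires evaluating (3.13) separately.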
Our approach differs from that in [28, 44, 57], as the fixed-point equation (4.2) also includes the prior distribution on the users' activity in a static channel. Under MAP estimation, detection requires knowledge not only of the prior information of the data, but also of the activity rate α. Thus, the fixed-point equations depend on the MMSE, the SNR, and the system load. Numerical solutions vs. SNR at a load β = 3/7 are shown in Fig. 4.1. Plots like this one illustrate how the multiuser efficiency is affected by the level of noise and interference, and by the uncertainty in the users' activity rate. For low SNR, noise dominates, and the performance of the MMSE and the multiuser efficiency is degraded as α grows, since the presence of more active users adds more noise to the system. On the other hand, as we shall discuss later, for high SNR the MMSE strongly depends on the minimum distance between the transmitted symbols, and the activity rate here plays a secondary role. Hence, the gap between the multiuser efficiencies with α = 1 and α ≠ 1 for larger SNR is due to the fact that the former constellation has
Figure 4.1: Large-system multiuser efficiency η vs. γ (dB) of the user-and-data detector under MAP with prior knowledge of α (curves for α = 0.2, 0.6, 0.9, 1.0), at β = 3/7.
twice the minimum distance of the latter. We can clearly observe the transition behavior from low to high SNR for values of α approaching 1. Moreover, when α = 1, (4.2) reduces to the fixed-point equation for the classical assumption, in which all users are active and transmit a binary antipodal constellation [28]:
η⁻¹ = 1 + βγ [ 1 − ∫ (1/√(2π)) e^{−y²/2} tanh(ηγ − y√(ηγ)) dy ]. (4.3)
In this case, it can be shown that, for high SNR, we have MMSE(ηγ, α = 1) ≈ √(2/(πηγ)) e^{−ηγ/2}. In fact, the following general result holds:
Lemma 4.1.1 [40] For large output SNR, the MMSE (3.14) of a system transmitting an equiprobable M-ary normalized constellation with minimum Euclidean distance d in a Gaussian channel with noise variance 1/η is

MMSE(ηγ, α = 1) = κ(ηγ) e^{−d²ηγ/8} (4.4)

with κ₁(ηγ) ≤ κ(ηγ) ≤ κ₂, where κ₁(ηγ) = O(1/√(ηγ)) and κ₂ is a constant, given by the maximum distance between neighboring symbols.
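The exponential rate in (4.4) is easy to verify numerically for BPSK (d = 2, so the predicted decay rate is ηγ/2). The sketch below (our own quadrature check, not thesis code) uses the identity MMSE(ρ) = 1 − E[tanh(ρ + √ρ Z)], Z ∼ N(0, 1), and estimates the empirical decay rate between two SNR values:

```python
import numpy as np

def mmse_bpsk(rho, z=np.linspace(-40.0, 10.0, 2_000_001)):
    """MMSE of +/-1 signaling at SNR rho: 1 - E[tanh(rho + sqrt(rho) Z)]."""
    w = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return np.sum(w * (1.0 - np.tanh(rho + np.sqrt(rho) * z))) * (z[1] - z[0])

rho1, rho2 = 60.0, 80.0
# Empirical exponential decay rate per unit rho; Lemma 4.1.1 with d = 2
# predicts a rate close to 1/2 (up to the slowly varying prefactor kappa).
rate = np.log(mmse_bpsk(rho1) / mmse_bpsk(rho2)) / (rho2 - rho1)
```

The estimated rate is slightly above 1/2 because the prefactor κ(ρ) also decays, consistently with κ₁(ρ) = O(1/√ρ).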
For the entire range of activity rates, i.e., α ∈ [0, 1], we can derive lower and upper bounds illustrating analytically the transition between the classical
assumption (α = 1) and the cases where the activity is also detected (α < 1) for large SNR. Our calculations bring about a new rigorous analytical framework for large-system analysis, as we will see in the next section. Our bounds are consistent with Lemma 4.1.1, and the lower bound includes the case α = 1. The general result is stated as follows, where we use no assumptions beyond those of Corollary 4.1.1.
Theorem 4.1.1 [10, Th. 3.3] The MMSE of joint user identification and data detection in a large system with an unknown number of users has the following behavior, valid for sufficiently large values of the product ηγ:

MMSE(ηγ, α) ≥ 2 √( α(1 − α)/(πηγ) ) e^{−ηγ/8}

MMSE(ηγ, α) ≤ 2α e^{−ηγ/2} + √( πα(1 − α)/(ηγ) ) e^{−ηγ/8}. (4.5)
Proof See Appendix C.2.
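The bounds (4.5) can be checked numerically. The sketch below (our own quadrature-based evaluation, not the thesis code) computes the exact MMSE of the ternary input {−1, 0, +1} with priors (α/2, 1 − α, α/2) observed at effective SNR ηγ, and compares it against the two sides of (4.5); the operating point ηγ = 100 (20 dB), α = 0.5 matches Fig. 4.2:

```python
import numpy as np

def mmse_ternary(snr, alpha, y=np.linspace(-30.0, 30.0, 600_001)):
    """Exact MMSE for b in {-1,0,+1}, priors (a/2, 1-a, a/2), y = sqrt(snr)*b + z."""
    s = np.sqrt(snr)
    syms = np.array([-1.0, 0.0, 1.0])
    pri = np.array([alpha / 2, 1 - alpha, alpha / 2])
    lik = np.exp(-(y[None, :] - s * syms[:, None])**2 / 2) / np.sqrt(2 * np.pi)
    bhat = (pri * syms) @ lik / (pri @ lik)          # posterior mean E[b|y]
    err = (syms[:, None] - bhat[None, :])**2 * lik   # (b - bhat)^2 p(y|b)
    return np.sum(pri @ err) * (y[1] - y[0])

snr, alpha = 100.0, 0.5   # eta*gamma = 20 dB
lower = 2 * np.sqrt(alpha * (1 - alpha) / (np.pi * snr)) * np.exp(-snr / 8)
upper = (2 * alpha * np.exp(-snr / 2)
         + np.sqrt(np.pi * alpha * (1 - alpha) / snr) * np.exp(-snr / 8))
exact = mmse_ternary(snr, alpha)
```

At this operating point all three quantities are of order 10⁻⁷, reflecting the e^{−ηγ/8} decay driven by the unit distance between the inactivity symbol and the data symbols.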
The bounds in (4.5) describe explicitly, in the high-SNR region, the relationship between the MMSE, the users' activity rate, and the effective SNR ηγ. In Fig. 4.2 these bounds are compared to the true MMSE values as a function of ηγ for fixed α. It can be seen that the uncertainty about the users' activity substantially modifies the exponential decay of the MMSE for high SNR. In fact, a value of α different from 1 causes the MMSE to decay as exp(−ηγ/8), rather than as exp(−ηγ/2), which would be the case when all users are active. Furthermore, we can observe that, for sufficiently large effective SNR, the behavior vs. α of the optimal detector is symmetric with respect to α = 1/2, which corresponds to the maximum uncertainty about the activity rate. Figure 4.3 shows that, for large values of the product ηγ, the MMSE essentially depends on the minimum distance between the inactivity symbol {0} and the data symbols {−1, 1}, and thus user identification prevails over data detection. Summarizing, the dependence of the MMSE must be symmetric with respect to α = 1/2, since it reflects the impact of prior knowledge about the users' activity on the estimation.
4.2 Maximum system load and related considerations
Recall the definition of the maximum system load β = K/N, where K is the maximum number of users accessing the multiuser channel. When the number of active
Figure 4.2: A comparison of the exact MMSE value with its upper and lower bounds for α = 0.5 and ηγ ∈ [10, 20] dB.
Figure 4.3: A comparison of the exact MMSE value with its upper and lower bounds for ηγ = 20 dB and α ∈ [0, 1].
users is unknown, and there is a priori knowledge of the activity rate, the actual system load is β′ = αβ. In this section, we focus on β and study some of its properties. Notice that, given an activity rate, results for the actual system load follow trivially.
4.2.1 Solutions to the large-system fixed-point equation
We characterize the behavior of the maximum system load subject to quality-of-service constraints. This helps shed light on the nature of the solutions of the fixed-point equation (4.2). In particular, there might be cases where (4.2) has multiple solutions. These solutions correspond to the solutions appearing in simple mathematical models of magnetism based on the evaluation of the free energy with the fixed-point method [46]. They represent what in statistical-physics parlance is called phase coexistence (for example, water coexists in its ice and liquid phases at 0 °C). In particular, at low temperatures, the magnetic system might have three solutions 0 ≤ Ψ₁ < Ψ₂ < Ψ₃ ≤ 1. Solutions Ψ₁ and Ψ₃
are stable: one of them is globally stable (it actually minimizes the free energy), whereas the other is metastable, a local minimum. Solution Ψ₂ is always unstable, since it is a local maximum. The "true" solution is therefore the one of Ψ₁ and Ψ₃ for which the free energy is minimum. The same consideration also applies to our multiuser detection problem, where the multiuser efficiencies of the IO detector might vary significantly depending on the value of the system load and the SNR. More specifically, for sufficiently large SNR, stable solutions may switch between a region that approaches single-user performance (η = 1 − ε₁) and a region approaching the worst performance (η = ε₀), for 0 < ε₁, ε₀ ≪ 1. Following previous literature [57], we shall call the former solutions good and the latter bad. When the solution is unique, due to low or high system load, the multiuser efficiency is a globally stable solution that lies in either the good or the bad solution region. Then, for given system parameters, the set of operational (or globally stable) solutions is formed by solutions that are part of these sets and minimize the free energy.
The existence of good and bad solutions is critical in our problem. From a computational perspective, we are particularly interested in single solutions, either bad or good, which surely avoid metastability and instability. These solutions belong to a specific subregion within the bad and good regions, and appear for low and high SNR, respectively.
From an information-theoretic perspective, it might seem that the true solutions should capture all our attention. However, it has been shown that metastable solutions appear in suboptimal belief-propagation-based multiuser detectors, where the system is easily attracted into the bad-solution region (corresponding to low multiuser efficiency), due to initial configurations that are far from the true solution [27]. Moreover, the region of good solutions is of interest in the high-SNR analysis because, for a given system load, it can be observed that the multiuser efficiency tends to 1, consistently with previous theoretical results [60].
Of special interest is the case of single good solutions, for which the results arising from the replica method can be rigorously validated [38, 43]. In what follows, we provide an analysis of the boundaries of the stable-solution regions, as well as of their single-solution subregions, which are of practical interest in the low- and high-SNR regimes.
Figure 4.4: Fixed-point solutions (marked by circles) of W(γ, η, α) vs. η for β ∈ {3/7, 1, 10, 30}, fixed α = 0.5 and γ = 18 dB.
A quantitative illustration of the above considerations is provided by plotting the left- and right-hand sides of (4.2) to obtain fixed points for constant values of amplitude and activity rate, and as a function of the system load. The solutions of (4.2) are found at the intersection of the curve corresponding to the right-hand side with the line y = η. Fig. 4.4 plots the right-hand side of (4.2) for increasing system load, α = 0.5, and γ = 18 dB:
W(γ, η, α) ≜ 1 / ( 1 + βγ MMSE(ηγ, α) ).
Notice first that the structure of the fixed-point equation in general does not allow the solution η = 0, and for finite γ and β, η = 1 is not a solution. In fact, the
latter is an asymptotic solution for large SNR and certain system loads, as the MMSE decays exponentially to 0. From Fig. 4.4, one can observe the presence of phase transitions and the coexistence of multiple solutions. In particular, we observe that for β = 3/7 the good solution is computationally feasible. On the other hand, for β = 1 and β = 10 the system has three solutions, where the true solution belongs to either the bad or the good solution region. When the system load reaches β = 30, the curve intersects the identity line only near 0, and the operational solution is unique and lies in a subregion of bad solutions.
4.2.2 System load and the space of fixed-point solutions
Even in the case of good solutions, the multiuser efficiency can be greatly degraded by the joint effect of the activity rate and the maximum system load. In order to analyze the fixed-point equation (4.2) from a different perspective and shed light on the interplay between these parameters, we express the maximum system load as the following function, derived from (4.2):
Υβ(γ, η, α) ≜ (1 − η) / ( ηγ MMSE(ηγ, α) ). (4.6)
Since the MMSE is a continuous function of η [29], Υβ is also continuous on any compact set in the domain η ∈ (0, 1] for given SNR and activity rate. It is also easy to observe that, for small values of η, Υβ tends to infinity regardless of γ and α, whereas in the high-η region, which is of interest here, it decays to 0. Before analyzing the behavior of (4.6), we introduce a few definitions that help describe the boundaries between the regions with and without coexistence (in the statistical-physics literature, these boundaries are called spinodal lines [57]). We also define appropriately the regions of potentially stable solutions introduced before.
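The non-monotonic shape of Υβ described here can be reproduced numerically. The sketch below is our own (the exact-MMSE quadrature and the sample points are not thesis code); γ = 18 dB and α = 0.5 match the setting of Fig. 4.5:

```python
import numpy as np

def mmse_ternary(snr, alpha, y=np.linspace(-30.0, 30.0, 200_001)):
    """Exact MMSE of b in {-1,0,+1} with priors (a/2, 1-a, a/2) at SNR snr."""
    s, syms = np.sqrt(snr), np.array([-1.0, 0.0, 1.0])
    pri = np.array([alpha / 2, 1 - alpha, alpha / 2])
    lik = np.exp(-(y[None, :] - s * syms[:, None])**2 / 2) / np.sqrt(2 * np.pi)
    bhat = (pri * syms) @ lik / (pri @ lik)
    return np.sum(pri @ ((syms[:, None] - bhat[None, :])**2 * lik)) * (y[1] - y[0])

def upsilon(gamma, eta, alpha):
    """System-load function (4.6)."""
    return (1.0 - eta) / (eta * gamma * mmse_ternary(eta * gamma, alpha))

gamma = 10**1.8                      # 18 dB
dip = upsilon(gamma, 0.07, 0.5)      # near the local minimum
peak = upsilon(gamma, 0.86, 0.5)     # near the local maximum
```

For these parameters the function is large for very small η, dips, rises to a pronounced peak, and finally collapses towards 0 as η → 1, which is exactly the shape that produces the coexistence region.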
Definition The critical system load β^⋆(γ, α) is the maximum load at which a stable good solution of (4.2) exists.
Definition The transition system load β_⋆(γ, α) is the minimum load at which the true solution η⋆ of (4.2) coexists with other solutions η′.
Definition The good-solution region corresponds to the domain of (4.6) formed by the maximum η in every set of pre-images of Υβ below the critical system load:

Rg = { η ∈ [0, 1] : η = max{Υβ^{−1}(β)}, ∀β ∈ [0, β^⋆] }. (4.7)

Similarly, the bad-solution region corresponds to the domain of (4.6) formed by the minimum η in every set of pre-images of Υβ above the transition system load:

Rb = { η ∈ [0, 1] : η = min{Υβ^{−1}(β)}, ∀β ∈ [β_⋆, +∞) }. (4.8)
Definition The single-good-solution region Rgc ⊂ Rg corresponds to the domain of (4.6) formed by the pre-image of Υβ below the transition system load:

Rgc = { η ∈ [0, 1] : η = Υβ^{−1}(β), ∀β ∈ [0, β_⋆] }, (4.9)

and the single-bad-solution region Rbc ⊂ Rb corresponds to the domain of (4.6) formed by the pre-image of Υβ above the critical system load:

Rbc = { η ∈ [0, 1] : η = Υβ^{−1}(β), ∀β ∈ [β^⋆, +∞) }. (4.10)
Figure 4.5: System load function Υβ in the multiuser-efficiency domain for α = 0.5 and γ = 18 dB, with the good- and bad-solution regions Rg and Rb marked.
Fig. 4.5 illustrates Υβ (for fixed SNR and activity rate) and shows the regions defined by the aforementioned parameters. It is important to remark that the two system loads defined above separate the region with phase coexistence (β_⋆ ≤ β ≤ β^⋆) from the areas where there is one solution (β > β^⋆
or β < β_⋆). Additionally, Fig. 4.5 illustrates the set of solutions that satisfy conditions (4.7) and (4.8).
Fig. 4.5 suggests that it is useful to define analytically the domain where stable solutions can be found. Before doing so, we distinguish for convenience the case with an unknown number of users, α ∈ (0, 1), from the case where all users are active (α = 1). We do not consider the case α = 0.
4.2.2.1 Case α ∈ (0, 1)
In order to analyze the conditions on the system load, SNR, and activity rate under which we can find a good solution, we use the asymptotic results on the MMSE (4.5), yielding lower and upper bounds L(·) ≤ Υβ(γ, η, α) ≤ U(·) for large enough ηγ, where

L(γ, η, α) ≜ (1 − η) e^{ηγ/8} / √( πηγ α(1 − α) ) (4.11)

U(γ, η, α) ≜ ((1 − η)/2) √( π / ( ηγ α(1 − α) ) ) e^{ηγ/8}. (4.12)
Although not exact for low SNR, the η-dependence of the upper and lower bounds provides a good approximation to that of Υβ for large SNR and given α. Hence, by using U(·) and L(·), we obtain necessary and sufficient conditions that determine the regions of stable solutions, and provide analytical expressions for the transition and critical system loads. The main result for α ∈ (0, 1) follows:
Theorem 4.2.1 [10, Th. 4.5] Given the range of activity rates α ∈ (0, 1), a necessary condition for phase coexistence is

γ ≥ 4(3 + 2√2) (γ ≥ 13.67 dB). (4.13)

Moreover, for high SNR the condition is met, and the transition system load is bounded by

L(γ, ηm, α) < β_⋆(γ, α) < U(γ, ηm, α), (4.14)

while the critical system load is bounded by

L(γ, ηM, α) < β^⋆(γ, α) < U(γ, ηM, α), (4.15)

where ηm and ηM are given by

ηm ≜ (γ/2 − 2 − 4Δ(γ))/γ
ηM ≜ (γ/2 − 2 + 4Δ(γ))/γ
with Δ(γ) = √( (γ/8)² − 3γ/8 + 1/4 ).

Hence, the bad-solution region is given by Rb = (0, ηm], whereas the good-solution region is Rg = [ηM, 1]. Similarly, the subregion of single bad solutions, Rbc = (0, ηbc) ⊂ Rb, and that of single good solutions, Rgc = (ηgc, 1] ⊂ Rg, satisfy

ηbc = min{Υβ^{−1}(β^⋆)} > η⋆bc
ηgc = max{Υβ^{−1}(β_⋆)} < η⋆gc

where η⋆bc ≜ min{L^{−1}(β^⋆)} and η⋆gc ≜ max{U^{−1}(β_⋆)} are obtained from the bounds.
Proof See Appendix C.3.
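The threshold (4.13) is precisely the SNR at which the discriminant under Δ(γ) becomes non-negative, so that ηm and ηM are real. The quick check below illustrates this (the sample SNR values are our own choices):

```python
import numpy as np

def delta_sq(gamma):
    """Discriminant under Delta(gamma) in Theorem 4.2.1."""
    return (gamma / 8.0)**2 - 3.0 * gamma / 8.0 + 0.25

def eta_m_M(gamma):
    """Boundary multiuser efficiencies eta_m, eta_M of Theorem 4.2.1."""
    d = np.sqrt(delta_sq(gamma))
    return (gamma / 2 - 2 - 4 * d) / gamma, (gamma / 2 - 2 + 4 * d) / gamma

g_th = 4.0 * (3.0 + 2.0 * np.sqrt(2.0))  # root of the discriminant
g_th_db = 10.0 * np.log10(g_th)
```

At γ = g_th the discriminant vanishes (ηm = ηM, the spinodal branches meet), and for any larger SNR the two boundaries separate, opening the coexistence region.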
The above result provides the general boundaries of the space of solutions of our problem. It is important to note that, for high SNR, ηm and ηM are very good approximations to the positions of the minimum and maximum observed in Fig. 4.5, which determine the transition and critical system loads. As a consequence, Theorem 4.2.1 analytically gives the range of β's for which there are either single or multiple solutions, based on the up-to-a-constant approximation of Υβ by (4.11) and (4.12). Similarly, η⋆bc and η⋆gc bound the boundaries of the single-solution regions as tightly as U(·) and L(·) bound Υβ(·). Note also that the activity rate affects the system-load boundaries in the same symmetric manner as it does the MMSE (i.e., the worst case here also corresponds to α = 0.5), but has no impact on the regions of solutions, which are only reduced in size by increasing the SNR. In particular, these regions are characterized, in the limit of high SNR, as follows:
Corollary 4.2.1 In the limit of high SNR, Rg → {1}, Rb → {0}, and consequently Rgc → {1} and Rbc → {0}.
Proof The above corollary results from

lim_{γ→∞} ηM = lim_{γ→∞} ( γ/2 − 2 + 4√( (γ/8)² − 3γ/8 + 1/4 ) ) / γ = 1

lim_{γ→∞} ηm = lim_{γ→∞} ( γ/2 − 2 − 4√( (γ/8)² − 3γ/8 + 1/4 ) ) / γ = 0.
Note that, given a system load β with β_⋆ > β, for sufficiently large SNR the unique true (large-system) solution is η = 1, which corroborates the main result in [60]. Moreover, the analytical description of the single good solutions allows the computation of a sufficient condition on the system load to guarantee a given multiuser efficiency in practical implementations. More specifically, we use the aforementioned lower bound on Υβ to state that any system load below L(·) guarantees that the given multiuser efficiency is achieved. The result is stated as follows:
Corollary 4.2.2 The maximum system load β_{α,η} for a given activity rate and multiuser-efficiency requirement η = 1 − ε, where 0 < ε ≪ 1, that lies in Rgc is lower-bounded in the high-SNR region by

β_{α,η} > ε e^{(1−ε)γ/8} / √( πηγ α(1 − α) ). (4.16)
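For concreteness, the guaranteed load (4.16) is straightforward to evaluate; the sample values of γ, α, and ε below are our own choices, not thesis figures:

```python
import math

def load_guarantee(gamma, alpha, eps):
    """Lower bound (4.16) on the admissible system load for eta = 1 - eps."""
    eta = 1.0 - eps
    return (eps * math.exp((1.0 - eps) * gamma / 8.0)
            / math.sqrt(math.pi * eta * gamma * alpha * (1.0 - alpha)))

b18 = load_guarantee(10**1.8, 0.5, 0.01)  # gamma = 18 dB, 99% efficiency
```

The exponential factor e^{(1−ε)γ/8} dominates, so the guaranteed load grows rapidly with SNR, consistent with the asymptotic collapse of Rg onto {1}.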
In Fig. 4.6 we show the numerical values of the transition and critical system loads as a function of the SNR in the (γ, β) plane. We also use the asymptotic expansion to derive upper and lower bounds, respectively. The plotted curves are the spinodal lines, which mark the boundary between the regions with and without solution coexistence. The lower branch, β_⋆, delimits the region where the bad solution disappears, whereas the upper branch, β^⋆, contains the bifurcation points at which any good solution disappears. The intersection point of the two branches corresponds to the SNR threshold (4.13), which provides the necessary condition for solution coexistence.
4.2.2.2 Case α = 1
We now apply the same reasoning to the "classical" approach to multiuser detection, corresponding to activity rate 1. In this case, using the approximation in [40], the system load function can be lower-bounded by

Υβ(γ, η, 1) = (1 − η) / ( ηγ MMSE(ηγ, 1) ) > (1 − η) e^{ηγ/2} / √(πηγ). (4.17)
Hence, we can derive the following spinodal lines.
Corollary 4.2.3 Given α = 1, a necessary condition for phase coexistence is

γ ≥ 3 + 2√2 (γ ≥ 7.65 dB).

Moreover, for high SNR the condition is met, and the transition system load is upper-bounded by

β_⋆ < ((1 − ηm′)/2) √( π / (ηm′ γ) ) e^{ηm′γ/2} (4.18)
Figure 4.6: Upper and lower bounds on the numerical spinodal lines (thick line) for α = 0.5. Three solutions (one operational) coexist between the branches (β_⋆ < β < β^⋆, γ > 13.67 dB); the solution is unique outside.
and the critical system load is upper-bounded by

β^⋆ < ((1 − ηM′)/2) √( π / (ηM′ γ) ) e^{ηM′γ/2}, (4.19)
where ηm′ and ηM′ are given by

ηm′ ≜ (γ/2 − 1/2 − Λ(γ))/γ
ηM′ ≜ (γ/2 − 1/2 + Λ(γ))/γ

with Λ(γ) = √( (γ/2)² − 3γ/2 + 1/4 ).

Hence, the bad-solution region is given by Rb = (0, ηm′], whereas the good-solution region is Rg = [ηM′, 1].
Proof The proof is analogous to that of Theorem 4.2.1.
The same consequence for the asymptotic operational region holds here.
Corollary 4.2.4 In the limit of high SNR, Rg → {1} and Rb → {0}.
Proof This corollary results from

lim_{γ→∞} ηM′ = lim_{γ→∞} ( γ/2 − 1/2 + √( (γ/2)² − 3γ/2 + 1/4 ) ) / γ = 1

lim_{γ→∞} ηm′ = lim_{γ→∞} ( γ/2 − 1/2 − √( (γ/2)² − 3γ/2 + 1/4 ) ) / γ = 0.
Figure 4.7: Comparison of upper bounds on the spinodal lines for α = 1.0 (left) and α = 0.5 (right).
In Fig. 4.7, one can observe a 6 dB difference between the spinodal lines corresponding to α = 0.5 and to α = 1.0. This is due to the minimum distance of the underlying constellations, which causes the MMSE to have different exponential decays. This can be interpreted by saying that adding activity detection to data detection requires a 6 dB increase in SNR to achieve the same system-load performance. Moreover, with α < 1, the transition system load is lower than in the case where all users are active, and therefore computationally good solutions correspond to lower values of the maximum system load.
4.2.3 Maximum system load with error probability constraints
A natural application of the above results to practical designs appears when the quality-of-service requirements of the system are specified in terms of uncoded error probability. Such an application can provide some extra insight into the plausible values of β with joint activity and data detection for the efficient design of large CDMA systems. Once a multiuser-efficiency requirement is assigned, the corresponding probability of error follows naturally. Note first that, in order to detect the activity as well as the transmitted data, our model deals with a ternary constellation {−1, 0, 1}. When any of these symbols is transmitted by each user with constant SNR = γ through a bank of large-system equivalent white Gaussian noise channels with variance 1/η, the probability of error over X depends on the prior probabilities as well as on the Euclidean distances between the symbols. The error probability implied by the replica analysis is
Pe(η, γ, α) = 2(1 − α) Q( √(ηγ)/2 + λ_α/√(ηγ) ) + α Q( √(ηγ)/2 − λ_α/√(ηγ) ), (4.20)

where Q(x) ≜ (1/√(2π)) ∫_x^∞ e^{−t²/2} dt is the Gaussian tail function, and λ_α ≜ ln( 2(1 − α)/α ).
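Equation (4.20) is straightforward to evaluate with the complementary error function, since Q(x) = erfc(x/√2)/2. The sketch below is ours, and the sample operating point (η = 0.9, γ = 18 dB, α = 0.5) is an assumption for illustration:

```python
import math

def q_func(x):
    """Gaussian tail function Q(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pe(eta, gamma, alpha):
    """Uncoded error probability (4.20) of the equivalent single-user channel."""
    s = math.sqrt(eta * gamma)
    lam = math.log(2.0 * (1.0 - alpha) / alpha)
    return (2.0 * (1.0 - alpha) * q_func(s / 2.0 + lam / s)
            + alpha * q_func(s / 2.0 - lam / s))

p = pe(0.9, 10**1.8, 0.5)
```

The term λ_α/√(ηγ) is the prior-induced shift of the MAP decision thresholds between the inactivity symbol 0 and the data symbols ±1; it vanishes as the effective SNR grows.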
The relationship between η and Pe for our particular case can be used to reformulate the bounds on the function Υβ in terms of error probability.
Corollary 4.2.5 The maximum system load Υβ(η, γ, α) for a given error probability Pe, SNR γ, and activity rate is bounded for high SNR by

L(γ, ηmax, α) < Υβ(ηmax, γ, α) < U(γ, ηmax, α), (4.21)

where ηmax ≜ max{ηP, ηgc}, and ηP is the pre-image of Pe under (4.20) for the given (γ, α).
Proof The result is obtained by noticing that the multiuser-efficiency requirement extracted from Pe must lie in the subregion (ηgc, 1].
Notice that, if the error probability satisfies ηP ≤ ηgc = ηmax, then the constraint is described by the bounds on the transition load. However, if ηgc < ηP = ηmax, then, by Corollary 4.2.2, the maximum system load can also be easily bounded. Fig. 4.8 plots the critical system load for two different error-probability requirements and three different activity rates.
Figure 4.8: Critical system load for different uncoded error probabilities and activity rates. Thicker lines represent numerical results, whereas regular lines show the corresponding lower bounds. Circle markers: Pe = 10⁻³ and α = 0.99. Cross markers: Pe = 10⁻³ and α = 0.1. Star markers: Pe = 10⁻³ and α = 0.5. Square markers: Pe = 10⁻⁵ and α = 0.5.
4.3 Chapter review and conclusions
We have analyzed multiuser detectors for CDMA where the fraction of active users is unknown and must be estimated in a tracking phase. Using a large-system approach and statistical-physics tools, we have derived a fixed-point equation for the optimal user-and-data detector, and provided asymptotic bounds for the corresponding MMSE. Further, we have described the space of stable solutions of the fixed-point equation, and derived explicit bounds on the critical and transition system loads for all users' activity rates. These are consistent with the results obtained under the classic multiuser-detection assumption (α = 1.0) made in the literature. The study of the so-called spinodal lines allowed us to determine the regions of stable good and bad solutions, including subregions of single solutions (which are also computationally feasible), in the system load vs. SNR parameter space of our model. Our results show that, for a user-and-data detector, the boundaries of the space of solutions do depend on the activity rate, whereas the regions of stable solutions (good and bad) are only affected by the SNR. Hence, the overall system-load performance keeps a symmetric behavior with respect to α. In practical implementations with high quality-of-service demands, we are interested in maximizing the critical system load while keeping the optimal detector in the feasible subregion of good multiuser efficiencies, so that a wider range of potential users can successfully access the channel at a given rate. By increasing the SNR, this goal can be achieved; for limited SNR, however, certainty about the users' activity allows allocating more users for a given spreading length. A relevant example corresponds to a system with a given error-probability requirement. For this case, we have shown that, for sufficiently large SNR, we can choose the minimum multiuser efficiency in the domain of feasible good solutions, and maximize the critical system load regardless of the error-probability target.
One of the assumptions of this chapter is to model the activity as an i.i.d. process. Extensions to non-i.i.d. scenarios can be found in the next chapter, where users transmit encoded messages and the activity is correlated over the coded blocks of each user [8], and in [3], where the users' activity evolves according to a Markov process.
Chapter 5
Iterative multiuser decoding with an unknown number of users in the large-system regime
5.1 Introduction
The interplay between multiuser detection (MUD) and channel coding in multiple-access channels has recently been studied from different angles. From an information-theoretic perspective, the capacity region of the Gaussian multiple-access channel is known to be achievable by successive interference cancellation (IC) and single-user decoding [13, 25]. Practical approaches based on code-division multiple access (CDMA) and iterative joint decoding have also been studied [6, 7, 52, 67].
The authors in [6] provide a unified framework to analyze the performance of iterative multiuser joint decoding with CDMA in the limit of large blocklength and system dimensions. Their approach is based on a factor-graph representation of the a posteriori probabilities (APP's) of the information symbols using the belief propagation or sum-product algorithm [39]. This characterization allows the derivation of iterative algorithms that approximate optimal maximum a posteriori (MAP) decoding. The asymptotic performance of belief propagation can be analyzed by using density evolution techniques [53]. Based on results from linear MUD for uncoded systems [59], [6] characterized the performance of large multiuser systems using suboptimal iterative IC and decoding with linear filtering.
Recalling the main result for the large-system analysis of optimal MUD for uncoded systems (Proposition 1, [57]), the authors in [7] provide a modified version, obtained via the replica method, that characterizes the large-system performance of the non-linear iterative joint data decoder. In contrast with previous work on uncoded CDMA [28], the density-evolution system performance is determined by a dynamic fixed-point equation on the multiuser efficiency. In particular, when the decoder messages are approximated using an equivalent Gaussian channel, density evolution with a Gaussian approximation (DE-GA) can be described as a one-dimensional dynamical system.
As introduced earlier, we analyze in this chapter the performance of an iterative joint decoding scheme with an unknown number of users in the large-system regime. To do so, we use density evolution as a method to describe the dynamics of the multiuser efficiency and derive the corresponding fixed-point equations based on the replica method.
This chapter is organized as follows. Section 5.2 introduces the encoding method and the main notation used throughout. Section 5.3 describes the iterative MUD factor graph and the belief propagation decoding algorithm. Section 5.4 presents the main results on density evolution and the Gaussian approximation, showing the corresponding dynamic fixed-point equations. Finally, in Section 5.5 we illustrate our main results with some numerical examples. Proofs can be found in Appendix D.
5.2 Encoding of data and activity
We introduce here our method to encode data and activity in the system model (3.1). Let B = (b1, . . . , bK)^T ∈ R^{K×B} be the information matrix of all users, where B is the length of an information message and bk = (b_{k,1}, . . . , b_{k,B})^T is the k-th user's information vector. Whenever user k is active, bk ∈ {0,+1}^B; otherwise we set b_{k,i} = 2, i = 1, . . . , B, for the sake of analysis.^{5.1} We assume that active users appear with probability α and that their information vectors are encoded independently. The inactive message appears with probability 1 − α. The nature of the information messages varies significantly from active to inactive users. For instance, it is easy to see that in the case of inactive users, the information symbols are not independent: if the first one is represented by 2, all the rest are 2 as well. In order to incorporate the activity of users into the decoding process, we add one symbol at the beginning of each coded block, which is one or zero depending on whether the user is active or not, respectively. Hence, if user k is active, we can define an encoding function φk over the user's message set Mk ⊆ {0,+1}^B, φk : Mk → {−1,+1}^{L−1}, such that
φk(mk) = (xk,2, . . . , xk,L) (5.1)
5.1 Note that this is simply a convention to represent inactivity. It therefore does not affect the results.
where mk ∈ Mk is an input message. The code Ck of an active user is then defined as
Ck = {x ∈ {−1,+1}^L : x = (+1, φk(mk)), ∀mk ∈ Mk}.    (5.2)
If user k is inactive, the code Ck reduces to the all-zero codeword:
Ck = {x = 0 = (0, . . . , 0) ∈ R^L}.    (5.3)
Note that this is equivalent to considering a code Ck that incorporates the all-zero codeword (no signal is transmitted) to represent inactivity. While the presentation given in this chapter is general, we will focus our examples on trellis codes. We differentiate two cases: if a user is active, its BPSK stream is interleaved across time; if the user is not active, the all-zero codeword is transmitted. We remark that the resulting vectors accessing the channel are independent, but their components might be correlated, due to the temporal correlation induced by the inactive users transmitting the all-zero codeword. The interleaved signals are then spread and transmitted over the channel. The overall transmission scheme is depicted in Fig. 5.1.
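To make the two cases of (5.2)–(5.3) concrete, the construction can be sketched as follows. This is a minimal Python sketch under illustrative assumptions: the inner encoder `phi` is a hypothetical placeholder, where a real system would use, e.g., the convolutional code of Fig. 5.2.

```python
L = 8  # code blocklength, including the leading activity symbol

def phi(message_bits):
    # Hypothetical stand-in for the user's encoder phi_k: a trivial BPSK
    # mapping padded to length L-1. A real system would use, e.g., the
    # (5,7)_8 convolutional code of Fig. 5.2.
    coded = [1 if b else -1 for b in message_bits]
    return coded + [1] * (L - 1 - len(coded))

def transmit_block(active, message_bits=None):
    # Active users prepend the activity symbol +1 to their BPSK codeword
    # (5.2); inactive users send the all-zero codeword (5.3).
    if active:
        return [+1] + phi(message_bits)
    return [0] * L

x_active = transmit_block(True, [1, 0, 1, 1])
x_inactive = transmit_block(False)
print(x_active, x_inactive)
```

The leading symbol is what lets the decoder fold activity detection into the same trellis as data decoding.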
Figure 5.1: Block diagram of the transmission.
If the codes Ck are convolutional codes, the above considerations result in a trellis that combines the activity and encoding functions. This is shown in Fig. 5.2 for the first stages of a (5, 7)_8 convolutional code. We represent inactivity with the upper all-zero branch, while users' activity corresponds to the lower branch that contains the code structure. The overall trellis can be decoded with the forward-backward algorithm [1]. The two parts are linked in the initial state S0 and become independent after the first symbol.
Figure 5.2: Modified trellis structure of a convolutional code (5, 7)_8.
5.2.1 Optimum detection
Let X ≜ {+1} × {−1,+1}^{L−1} ∪ {0} denote the codeword space of our coding construction, where L is the code blocklength, and let xk be the codeword assigned to user k. Assuming that the receiver knows S and A, the a posteriori probability (APP) of the transmitted data has the form

p(X|Y, S, A) = (2π)^{−NL/2} e^{−(1/2)‖Y − SAX‖²} p(X) / p(Y|S, A),    (5.4)
where ‖·‖ is the Frobenius norm. Hence, the maximum a posteriori (MAP) joint activity-and-data multiuser detector solves
X̂ = arg max_{X ∈ X^K} p(X|Y, S, A),    (5.5)
where K is the maximum number of users. Similarly, optimum detection of single-user data and activity is obtained by marginalizing over the undesired users as follows:
x̂k = arg max_x ∑_{X ∈ X^K : xk = x} p(X|Y, S, A).    (5.6)
5.3 Iterative joint decoding under belief propagation
Our goal is to compute the a posteriori p.m.f. of the information symbols:
Pr(b1, . . . , bK | Y, S, A).    (5.7)
However, the computation of (5.7) by brute force is infeasible, even for a small maximum number of users, due to the large dimensions involved. In order to obtain a low-complexity detector, we resort to the canonical factor-graph representation of a multiuser coded system [6] and consider the application of the well-known sum-product algorithm [39]. Although in the model presented above symbols are correlated during the inactive stream, we study symbol-by-symbol belief propagation (BP) decoding as a suboptimal mismatched strategy to iteratively approximate the marginal probabilities of (5.7). This method would asymptotically replicate large-system optimal detection [43, 57] in the case of a system with collocated users, where coded sequences could be interleaved across the user dimension.
The application of BP to our model results in message passing between the individually optimum multiuser detector (IO-MUD) and the users' soft-input soft-output (SISO) decoders. In this case, our suboptimal detector assumes that each coded symbol can take values in the ternary constellation X ≜ {−1, 0, +1}. We thus use a three-dimensional probability vector to describe the messages exchanged between the two blocks. Hence, the outgoing messages from the IO-MUD at iteration ℓ ∈ N, for user k = 1, . . . , K and time l = 1, . . . , L, are denoted by q_{k,l}^{(ℓ)} = (q_{k,l}^{(ℓ)}(−1), q_{k,l}^{(ℓ)}(0), q_{k,l}^{(ℓ)}(1)), where q_{k,l}^{(ℓ)} stands for the extrinsic probability of the symbol x_{k,l} given the channel observation. The outgoing messages from the SISO decoder are denoted by p_{k,l}^{(ℓ)} = (p_{k,l}^{(ℓ)}(−1), p_{k,l}^{(ℓ)}(0), p_{k,l}^{(ℓ)}(1)), and are the extrinsic probabilities of the coded symbols [53]. When the users' codes are convolutional codes, the messages p_{k,l}^{(ℓ)} are obtained by applying the forward-backward algorithm to the combined trellis. Note that the above algorithm is suboptimal, since it ignores the correlation introduced by the inactive users, exchanging different messages over a ternary constellation for every time instance l = 1, . . . , L.
According to [6, 39], the sum-product rules that relate the probabilities q_{k,l}^{(ℓ)} and p_{k,l}^{(ℓ)} are stated as follows:
q_{k,l}^{(ℓ)}(x) ∝ ∑_{x ∈ X^K : x_{k,l} = x} exp( −(1/2) | y_l − ∑_{j=1}^{K} s_j a_j x_{j,l} |² ) ∏_{j ≠ k} p_{j,l}^{(ℓ−1)}(x_{j,l}),  for x ∈ {−1, 0, 1},    (5.8)

p_{k,l}^{(ℓ)}(x) ∝ ∑_{x ∈ Ck : x_{k,l} = x} ∏_{j ≠ l} q_{k,j}^{(ℓ)}(x_{k,j}),  for x ∈ {−1, 0, 1},    (5.9)
for ℓ ≥ 1. We assume that the initial message is p_{k,l}^{(0)} = (α/2, 1 − α, α/2) for k = 1, . . . , K and time l = 1, . . . , L, so that the receiver knows the statistics of user activity. Note that the above messages can be viewed as random variables, which depend on both the channel and code parameters.
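For intuition, the IO-MUD update (5.8) can be evaluated by brute force in a toy setting. The sketch below uses illustrative assumptions (K = 2 users, one chip per symbol with unit spreading and amplitudes, and the prior messages p^{(0)} as the incoming extrinsic inputs); it is not the large-system setting, only a direct transcription of the sum-product rule:

```python
import itertools
import math

K = 2                          # toy number of users
X = (-1, 0, +1)                # ternary constellation
s = [1.0, 1.0]                 # one-chip "spreading" (illustrative)
a = [1.0, 1.0]                 # amplitudes (illustrative)
alpha = 0.5
prior = {-1: alpha / 2, 0: 1 - alpha, +1: alpha / 2}  # p^(0) messages
y = 0.7                        # channel observation at time l

def extrinsic_q(k):
    # Unnormalized sum-product message (5.8) for user k, then normalized.
    q = {}
    for x in X:
        total = 0.0
        for xs in itertools.product(X, repeat=K):
            if xs[k] != x:
                continue
            mean = sum(s[j] * a[j] * xs[j] for j in range(K))
            likelihood = math.exp(-0.5 * (y - mean) ** 2)
            extrinsic = math.prod(prior[xs[j]] for j in range(K) if j != k)
            total += likelihood * extrinsic
        q[x] = total
    Z = sum(q.values())
    return {x: v / Z for x, v in q.items()}

q0 = extrinsic_q(0)
print(q0)
```

With a positive observation y, the message correctly tilts toward the symbol +1; the exponential sum over X^K is what makes this approach infeasible beyond toy sizes.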
Finally, the approximation of the APP of any information symbol b resulting from the above belief propagation algorithm is given by
APP_{k,l}^{(ℓ)}(b) ∝ ∑_{x = φk(b) : b_{k,l} = b} ∏_{j=1}^{L} q_{k,j}^{(ℓ)}(x_{k,j}).    (5.10)
Figure 5.3 illustrates the above method by showing the message exchanges between the MUD detector and the SISO decoders to estimate the l-th symbol of every user information vector bk.
Figure 5.3: Block diagram of an iterative multiuser joint decoder.
5.4 Performance analysis
In order to analyze the performance over the iterations, we are interested in studying the evolution of the p.d.f. of the above messages. This can be done by means of a procedure termed density evolution. Density evolution has been applied to study low-density parity-check (LDPC) codes [53] as well as iterative MUD [6, 7]. Density evolution is based on the principle that, when the length of the codes is sufficiently large, the p.d.f. of the messages exchanged at each iteration converges to a deterministic one. In this section, we study density evolution for our joint multiuser decoder with an unknown number of users. In particular, we study the large-system limit, i.e., when the number of users and the spreading sequence dimension grow large, but their ratio is kept fixed. More specifically, we employ statistical-physics techniques to characterize the nature of the messages q_{k,l}^{(ℓ)}.
5.4.1 Large-system analysis
In general, the analysis with finite K and N can be complicated [6]. On the other hand, as we introduced in Chapter 3, large-system analysis is remarkably simpler and accurately mimics the behavior of the system for not-so-large dimensions [6, 7]. In particular, the decoupling principle also holds in the context of iterative decoding.
Theorem 5.4.1 [6, Prop. 2] Let K, N → ∞ keeping their ratio, the system load β = K/N, fixed. The marginal probabilities (5.8) computed at the IO-MUD correspond to those of an equivalent scalar Gaussian channel Y = X + Z, where the noise is distributed as N(0, 1/(γ_k η^{(ℓ)})), k = 1, . . . , K, and η^{(ℓ)} is the multiuser efficiency at iteration ℓ.
The above result applies to a wide range of detectors, not necessarily optimal ones. In particular, for optimum detection, the multiuser efficiency at every iteration ℓ can be found using the replica method [7]. More interestingly, the decoupling principle can be generalized to the case of coded CDMA systems where the users' symbols are not independent from one time instant to another [58]. In particular, in the large-system limit, for every time index l = 1, . . . , L we obtain a set of K parallel additive white Gaussian noise channels with time-varying variances given by (γη_l)^{−1}, l = 1, . . . , L, where η = (η_1, . . . , η_L) is the multiuser efficiency vector, characterizing the multiuser efficiency at every symbol instance.
According to this generalization, we now present our main result, which describes the dynamical-system behavior by updating the distribution of the IO-MUD messages at each iteration. Note that due to the large-system approach, the extrinsic probabilities p_{k,l}^{(ℓ)} provided by the SISO decoders are independent of k, hence resulting in a 3 × L matrix denoted by P_{ext}^{(ℓ)}. Generalizing [6, 7], the multiuser efficiency is given in terms of an L-dimensional fixed-point equation η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ), which characterizes the density evolution mapping as η evolves through the iterations.
Claim 5.4.1 Consider an iterative MUD system where the number of users is unknown and parameterized by a Bernoulli variable A_α with success probability α. Then the multiuser efficiency at iteration ℓ, η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ) ∈ R^L, of a belief-propagation iterative joint multiuser decoder is given by the globally stable solutions of the following fixed-point equations:
η_l^{(ℓ)} = (1 + β E_{A_α, P_{ext}^{(ℓ−1)}, z, x, γ}[ γ (x_l − x̂_l)² ])^{−1}    (5.11)

for l = 1, . . . , L, where x̂_l(η^{(ℓ)}, η^{(ℓ−1)}, γ) is the l-th entry of the MMSE symbol estimate x̂ = E[x | y, γ, P_{ext}^{(ℓ−1)}] over the single-user equivalent vector Gaussian channel

y = √γ x + z,    (5.12)

where x, y ∈ R^L, x ∼ P_{ext}^{(ℓ−1)}, and z ∼ N(0, Σ) with Σ = diag(η_1^{−1}, . . . , η_L^{−1}).
Proof See Appendix D.1.
The above result implicitly defines Ψ and describes the performance of an iterative joint decoder under belief propagation, assuming that the symbols at times l = 1, . . . , L can be correlated. In terms of convergence, if the above density evolution algorithm has a unique fixed point η = (1, . . . , 1), the system approaches single-user performance. On the other hand, if at any iteration ℓ, Ψ has a fixed point in (0, 1) in any of its L components, i.e., there exists at least one l such that η_l^{(ℓ)} = η_l^{(ℓ−1)} < 1, then the multiuser detector cannot remove the interference at time l. Otherwise, the multiuser efficiency η converges to 1 ∈ R^L through the iterations.
Equation (5.11) can also be interpreted as an analogue of [61], where the fading distribution takes a specific form characterized by A_α. Notice, however, that the activity factor comes into play in the encoding mapping and is independent of the nature of the channel. Therefore, (5.11) can be extended to fading channels with soft channel-estimation feedback, allowing for a further generalization. Furthermore, (5.11) could potentially encompass correlated and non-ergodic fading, generalizing the study of [61] for i.i.d. fading to more general fading distributions.
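To make the MMSE estimate in Claim 5.4.1 concrete, the following sketch computes the scalar posterior mean E[x | y] for the equivalent Gaussian channel y = √γ x + z with the ternary prior (α/2, 1 − α, α/2) and noise variance 1/η; all parameter values are illustrative:

```python
import math

def mmse_estimate(y, gamma, eta, alpha):
    # Posterior mean E[x|y] for y = sqrt(gamma)*x + z, z ~ N(0, 1/eta),
    # with the ternary prior P(x=0) = 1 - alpha, P(x=-1) = P(x=+1) = alpha/2.
    prior = {-1: alpha / 2, 0: 1 - alpha, +1: alpha / 2}
    # Gaussian likelihoods up to a common normalization factor.
    w = {x: p * math.exp(-0.5 * eta * (y - math.sqrt(gamma) * x) ** 2)
         for x, p in prior.items()}
    return sum(x * v for x, v in w.items()) / sum(w.values())

xhat = mmse_estimate(y=1.2, gamma=4.0, eta=0.8, alpha=0.5)
print(xhat)
```

By symmetry of the prior, the estimate vanishes at y = 0 and shrinks toward 0 as the noise variance 1/η grows, which is exactly the soft behavior averaged in (5.11).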
5.4.2 Concentration
The concentration theorem in [6] for coded CDMA refers to the existence of a limiting distribution of the output messages p_{k,l}^{(ℓ)} for L → ∞ under some uniformity conditions on the user codes Ck. However, our codebook does not enjoy these conditions in general. Although data and activity are jointly decoded, we note that the scheme allows a simple separate error-probability analysis. To see this, let Q*_e and P*_e be the Bayes error probabilities for data and activity detection, respectively. Then the optimal error ε on any user's symbol can be upper-bounded as
ε ≤ α( Q*_{(e|A_α=1)} + P*_{(e|A_α=1)} ) + (1 − α) P*_{(e|A_α=0)}    (5.13)
  = α P*_{(e|A_α=1)} + (1 − α) P*_{(e|A_α=0)} + α Q*_{(e|A_α=1)}    (5.14)
  = P*_e + α Q*_{(e|A_α=1)}.    (5.15)
Hence, if the activity-detection error P*_e vanishes for sufficiently large L, the error will only be due to data detection, and thus the distribution of the p_{k,l}^{(ℓ)} will concentrate with an appropriate choice of the users' code [6]. We will show in the following that this holds as long as the maximum number of users scales appropriately with the blocklength. To see this, we first derive an exact characterization of P*_{(e|A_α=1)} and P*_{(e|A_α=0)} in the context of large-system analysis. For that purpose, notice from Section 5.2 that the representation of activity/inactivity can be characterized as a repetition code of rate 1/L.
Lemma 5.4.1 The Bayes-optimal error probabilities of making an incorrect activity decision on user k under the channel model (5.12) are
2^{−(L−1)R} ∑_{j=1}^{2^{(L−1)R}} Pr{ min_{c_i} { (√γ(c_j − c_i) + z)^T Σ^{−1} (√γ(c_j − c_i) + z) } ≥ (√γ c_j + z)^T Σ^{−1} (√γ c_j + z) + 2 log( α/(1 − α) ) − 2(L−1)R log 2 }    (5.16)

when an active codeword has been transmitted, and

Pr{ min_{c_i} { (z − √γ c_i)^T Σ^{−1} (z − √γ c_i) } ≤ z^T Σ^{−1} z + 2 log( α/(1 − α) ) − 2(L−1)R log 2 }    (5.17)

when an inactive codeword has been transmitted, where {c_i}, c_i ∈ {−1, 1}^L, i = 1, . . . , 2^{(L−1)R}, is the set of active codewords with rate R of any user k, and 0 ∈ R^L denotes the inactivity codeword.
Proof See Appendix D.2.
Our iterative joint decoding scheme approximates (5.16) and (5.17) at any iteration ℓ via P_{(e|A_α=1)}^{(ℓ)} and P_{(e|A_α=0)}^{(ℓ)}, leading to an average error P_e^{(ℓ)} ≜ α P_{(e|A_α=1)}^{(ℓ)} + (1 − α) P_{(e|A_α=0)}^{(ℓ)}, but those approximations are in general challenging to compute, even for small values of L. However, we can find simple upper bounds on both quantities by making two considerations:
• We take the (one-dimensional) multiuser efficiency to be

η^{(ℓ)} ≜ min_{1≤l≤L} η_l^{(ℓ)}.    (5.18)

Thus, instead of (5.12), we consider the large-system single-user equivalent white-noise Gaussian channel y = √(ηγ) x + z, where z ∼ N(0, I).

• We consider two (sub-optimal) activity decoding regions.
The following result is derived according to both observations.
Proposition 5.4.1 Under the single-user equivalent Gaussian vector channel (5.12) at iteration ℓ, with η^{(ℓ)} = (η_1^{(ℓ)}, . . . , η_L^{(ℓ)}), the probabilities P_{(e|A_α=1)}^{(ℓ)} and P_{(e|A_α=0)}^{(ℓ)} can be upper-bounded for sufficiently large L as

P_{(e|A_α=1)}^{(ℓ)} ≤ 2 √((1−α)/α) e^{−L η^{(ℓ)} γ / 8},  and  P_{(e|A_α=0)}^{(ℓ)} ≤ √(α/(1−α)) e^{−L η^{(ℓ)} γ / 8},    (5.19)

where η^{(ℓ)} ≜ min_{1≤l≤L} η_l^{(ℓ)}.
Proof See Appendix D.3.
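The bounds (5.19) are straightforward to evaluate numerically; the following sketch (with illustrative parameter values) shows their exponential decay in the blocklength L:

```python
import math

def activity_error_bounds(L, eta, gamma, alpha):
    # Upper bounds (5.19) on the per-user activity-detection error
    # probabilities at a given iteration.
    decay = math.exp(-L * eta * gamma / 8)
    p_active = 2 * math.sqrt((1 - alpha) / alpha) * decay
    p_inactive = math.sqrt(alpha / (1 - alpha)) * decay
    return p_active, p_inactive

for L in (10, 100, 1000):
    print(L, activity_error_bounds(L, eta=0.5, gamma=4.0, alpha=0.5))
```

Note that for α = 0.5 the two prefactors reduce to 2 and 1, so the active-user bound is exactly twice the inactive-user bound at every L.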
Proposition 5.4.1 tells us that we generally require large blocklengths to ensure reliable activity detection. One may wonder how large the blocklength needs to be, i.e., how L needs to scale asymptotically with the rest of the parameters to achieve such performance. For that purpose, consider that the blocklength depends on K, L = L(K), and let L(K) be such that

lim_{K→∞} (log K) / L(K) = ρ,    (5.20)

where ρ ≥ 0 expresses the tradeoff between the blocklength and the logarithm of the maximum number of users as K grows large. We remark that ρ = 0 implies that the blocklength grows faster than log K.
Let now P_{(E|A_α=1)} and P_{(E|A_α=0)} be the probabilities of incorrectly decoding the activity of at least one user. Assuming that all users transmit their codewords independently, we can use the union bound to bound these quantities as

P_{(E|A_α=1)} ≤ 2 √((1−α)/α) e^{−L(ηγ/8 − ρ)},    (5.21)
P_{(E|A_α=0)} ≤ √(α/(1−α)) e^{−L(ηγ/8 − ρ)}.    (5.22)
According to (5.21) and (5.22), the overall error probability for activity detection,

P_E ≜ α P_{(E|A_α=1)} + (1 − α) P_{(E|A_α=0)},    (5.23)

converges to 0 as L → ∞ for all ρ < ρ_th(η, γ), where

ρ_th(η, γ) ≜ ηγ/8,    (5.24)
whereas for ρ ≥ ρ_th(η, γ) this is not true in general. We remark that ρ_th is fully determined by the actual SNR of the equivalent large-system Gaussian channel, i.e., by increasing γη the range of feasible scaling factors can be increased. In particular, for fixed γ, ρ_th depends linearly on the level of interference η and achieves its maximum value when the interference is removed (i.e., at η = 1).
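The role of the threshold (5.24) can be checked numerically: below ρ_th the union bound on P_E decays with L, while above it the bound grows. A sketch with illustrative parameter values:

```python
import math

def pe_union_bound(L, eta, gamma, alpha, rho):
    # Union bound on the overall activity-error probability P_E obtained by
    # combining (5.21)-(5.23).
    decay = math.exp(-L * (eta * gamma / 8 - rho))
    p1 = 2 * math.sqrt((1 - alpha) / alpha) * decay   # bound (5.21)
    p0 = math.sqrt(alpha / (1 - alpha)) * decay       # bound (5.22)
    return alpha * p1 + (1 - alpha) * p0              # average (5.23)

eta, gamma, alpha = 0.5, 4.0, 0.5
rho_th = eta * gamma / 8                              # threshold (5.24)
below = [pe_union_bound(L, eta, gamma, alpha, rho=0.1) for L in (10, 100)]
above = [pe_union_bound(L, eta, gamma, alpha, rho=0.4) for L in (10, 100)]
print(rho_th, below, above)
```

Here ρ_th = 0.25, so ρ = 0.1 sits in the reliable regime and ρ = 0.4 does not.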
5.4.2.1 Case ρ < ρth. Decoupling of extrinsic messages
When ρ < ρ_th(η, γ), the density-evolution assumption (L → ∞) allows us to conclude that correct activity detection is achieved, with error probability P_E → 0. This implies that for every inactive user, p_{k,l}^{(ℓ)} → (0, 1, 0) in the limit of large codeword length. We therefore consider a compound of two types of message probabilities that switch depending on whether soft decoding operates on an active or an inactive coded block. Consequently, the limiting distribution of the messages over an active block exists under the same conditions as in the general case [6], whereas the limiting distribution of the messages over an inactive one exists since it concentrates all the probability on the symbol 0.
As a result of the decoupling structure in the distribution of p_{k,l}^{(ℓ)} for L → ∞, the application of Claim 5.4.1 to our system yields a simplified one-dimensional fixed-point equation. To simplify notation, we define p_{ext}^{(ℓ)} ≡ p_{k,l}^{(ℓ)} ∈ R³ as the columns of P_{ext}^{(ℓ)}. In fact, since the activity is perfectly detected after the first iteration for arbitrary SNR and L → ∞, belief propagation can approximate the optimal detection of the interleaved active-user codewords [43]. Symbols are no longer correlated, due to the effect of the interleaver, and the resulting density-evolution mapping takes a unique one-dimensional form η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ, ρ).
5.4.2.2 Case ρ ≥ ρth. Lower bound to η(`)
When ρ ≥ ρ_th, instead, less is known about the actual limiting distribution of p_{k,l}^{(ℓ)} as L → ∞, since the multiuser efficiencies η_l^{(ℓ)} are in general different across time. However, we can lower-bound every η_l^{(ℓ)} in η^{(ℓ)} by the following argument. Recall first that A_{α,k} is the binary random variable denoting the activity of user k's block. Then consider the scenario where the activity is detected by another binary variable Â_{α,k} using only the prior probabilities on A_{α,k}, so that the extrinsic probabilities do not incorporate any further information on A_{α,k} along the iterations other than the activity rate α. Thus, Pr{Â_α = 1} = α and Pr{Â_α = 0} = 1 − α, and the weak extrinsic probabilities p_{k,l}^{(ℓ)} can be decomposed as

p_{k,l}^{(ℓ)} = α p_{(k,l|A_α=1)}^{(ℓ)} + (1 − α) p_{(k,l|A_α=0)}^{(ℓ)}.    (5.25)
Thus, we compute the worst-case extrinsic probabilities p_{(ext|Â_α=0)}^{(ℓ)} ≡ p_{(k,l|Â_α=0)}^{(ℓ)} ∈ R³ of every symbol in an inactive block by conditioning (5.25) on Â_α = 0:

p_{(ext|Â_α=0)}^{(ℓ)} = [α/2, 1 − α, α/2]^T,  ∀ℓ ≥ 0,    (5.26)

where we use that p_{(k,l|A_α=0,Â_α=0)}^{(ℓ)} = [0, 1, 0]^T and [1/2, 0, 1/2]^T is taken to be the worst-case scenario among all p_{(k,l|A_α=1,Â_α=0)}^{(ℓ)}. Notice here that p_{(ext|Â_α=0)}^{(ℓ)} ≡ p_{(ext|Â_α=0)}^{(0)} for all ℓ ≥ 0, and consequently the extrinsic probabilities applying to any inactive block remain the same along the iterations.

On the other hand, we compute the worst-case extrinsic probabilities p_{(ext|Â_α=1)}^{(ℓ)} ≡ p_{(k,l|Â_α=1)}^{(ℓ)} ∈ R³ for a detected active codeword by conditioning (5.25) on Â_α = 1:

p_{(ext|Â_α=1)}^{(ℓ)} = α p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)} + (1 − α)[0, 1, 0]^T,    (5.27)

taking [0, 1, 0]^T to be the worst-case scenario among the p_{(k,l|A_α=0,Â_α=1)}^{(ℓ)}'s. Moreover, notice that

p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)} = [p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)}(−1), 0, p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)}(1)]^T,    (5.28)

where p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)}(−1) and p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)}(1) correspond to the extrinsic probabilities of x = −1 and x = 1 when an active codeword has been transmitted and its activity has been successfully decoded. Hence, in the large-system limit the extrinsic probabilities for Â_α = 1 satisfy

p_{(ext|Â_α=1)}^{(ℓ)} = α p_{(k,l|A_α=1,Â_α=1)}^{(ℓ)} + (1 − α)[0, 1, 0]^T ∈ R³,  ∀ℓ ≥ 0,    (5.29)

and consequently its limiting distribution exists under the same conditions as that of a system with a known number of users [7].
Further, since the message probabilities p_{ext}^{(ℓ)} are the same for all l = 1, . . . , L, the resulting density-evolution mapping again takes a unique one-dimensional form η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ, ρ). More specifically, the general result on the mapping η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ) that incorporates the cases ρ ≥ ρ_th is given in the following corollary:
Corollary 5.4.1 The large-system fixed-point equation of a system with an unknown number of equal-power users that perfectly estimates their activity at a particular iteration ℓ ≥ 1 with η^{(ℓ−1)} > 0, due to ρ < ρ_th(η^{(ℓ−1)}, γ), converges with probability 1 to
η^{(ℓ)} = (1 + β′γ (1 − E_{(p_ext, z, x|A_α=1)}[x̂²]))^{−1},    (5.30)

where β′ = βα and x̂(η^{(ℓ−1)}, η^{(ℓ)}, γ) = E[x | y, γ, p_{ext}^{(ℓ−1)}] is the MMSE estimate for the single-user scalar Gaussian channel y = √γ x + z, where x, y ∈ R, x ∼ p_{ext}^{(ℓ−1)}, p_{ext}^{(ℓ−1)} ∈ R³, and z ∼ N(0, 1/η^{(ℓ)}).
For ρ ≥ ρ_th(η^{(ℓ−1)}, γ), the mapping η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ) of (5.11), with η^{(ℓ)} = (η_1, . . . , η_L), is lower-bounded by

η_l ≥ η^{(ℓ)} = Ψ(η^{(ℓ−1)}, β, α, γ, ρ),  l = 1, . . . , L,    (5.31)

which is the solution of the fixed-point equation

η^{(ℓ)} = (1 + β′γ (1 − E_{(p_{ext}^{(ℓ−1)}, z, x|A_α=1)}[x̂²]) + (β − β′)γ E_{(p_{ext}^{(ℓ−1)}, z, x|A_α=0)}[x̂²])^{−1},    (5.32)

where the MMSE estimate x̂(η^{(ℓ−1)}, η^{(ℓ)}, γ) now depends on x ∼ p_{(ext|Â_α)}^{(ℓ−1)} ∈ R³ given by equations (5.26) and (5.29).
Proof See Appendix D.4.
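As an illustration of how the one-dimensional recursion (5.30) behaves, the following sketch iterates η^{(ℓ)} using the uncoded-BPSK MMSE as a stand-in for the SISO extrinsics: for BPSK over y = √s x + z, the quantity 1 − E[x̂²] equals the classical Gaussian average 1 − E[tanh(s + √s z)], estimated here by plain Monte Carlo. All parameters are illustrative; an actual evaluation would use the SISO transfer characteristic of the users' code.

```python
import math
import random

random.seed(0)

def bpsk_mmse(snr, n=20000):
    # Monte Carlo estimate of the BPSK MMSE 1 - E[tanh(snr + sqrt(snr)*z)]
    # for y = sqrt(snr)*x + z, z ~ N(0, 1); this equals 1 - E[xhat^2].
    acc = 0.0
    for _ in range(n):
        z = random.gauss(0.0, 1.0)
        acc += math.tanh(snr + math.sqrt(snr) * z)
    return 1.0 - acc / n

def iterate_eta(beta, alpha, gamma, iters=30):
    # Fixed-point iteration of (5.30) with effective load beta' = beta*alpha,
    # using the uncoded-BPSK MMSE in place of the SISO extrinsics.
    eta = 0.01
    for _ in range(iters):
        eta = 1.0 / (1.0 + beta * alpha * gamma * bpsk_mmse(eta * gamma))
    return eta

eta_star = iterate_eta(beta=1.0, alpha=0.5, gamma=4.0)
print(eta_star)
```

The iteration makes the role of the scaled load visible: the larger βα, the more the MMSE term depresses the multiuser efficiency at the fixed point.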
Since ρ_th is a function of η and η ∈ (0, 1], the impact of ρ_th on any system with tradeoff ρ can be better analyzed through the threshold value

η_th(ρ, γ) ≜ min{ 8ρ/γ, 1 },    (5.33)

which establishes the minimum multiuser efficiency above which the system's activity detection satisfies P_E → 0 as L → ∞ and, consequently, the fixed-point equation (5.30) holds.
We remark that, according to Corollary 5.4.1, all systems that encounter a fixed point Ψ(η^{(ℓ*)}) = η^{(ℓ*)} above η_th for some iteration ℓ* (i.e., η^{(ℓ*)} > η_th) undergo two different phases. The first phase occurs for ρ ≥ ρ_th during the initial iterations of the decoder, where η^{(ℓ−1)} ∈ (0, η_th] for ℓ < ℓ*. As a result, the error due to activity detection cannot be neglected and leads to a performance loss bounded by the uncoded-activity scenario (5.32). The second phase corresponds to η^{(ℓ−1)} being sufficiently large such that η^{(ℓ−1)} > η_th, ℓ ≤ ℓ*. Then, over the following iterations, equation (5.30) provides the fixed-point equation of a system with a known number of users and load β′ = βα [7]. Notice that the same arguments on the convergence of Ψ_l, l = 1, . . . , L, in the general case hold here for Ψ.
5.4.3 Approximations
The fixed-point equations (5.30) and (5.32) can be further developed by using some approximations of the probabilities p_ext when an active block is transmitted and its activity is successfully detected. Based on common empirical observations, we can approximate the outgoing message from the active-user SISO decoders as the output of a virtual equivalent Gaussian channel where the noise is modelled as N(0, 1/μ^{(ℓ)}). The approximation allows some degrees of freedom in the choice of μ at each iteration ℓ. As in [7], we choose the matching

μ^{(ℓ)} = ( Q^{−1}( P_e(γη^{(ℓ)}) ) )²,    (5.34)

where P_e(ϱ) is the symbol error probability when decisions are made from the extrinsic probabilities of the SISO decoders, for a general input with SNR ϱ. This characteristic can be obtained by simple simulation over the AWGN channel, or by a combination of simulation and bounding techniques [6].
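As a sanity check on the matching rule (5.34): for uncoded BPSK transmission the error probability is P_e(ϱ) = Q(√ϱ), so μ = (Q^{−1}(P_e(ϱ)))² should recover ϱ itself. The sketch below implements Q via the complementary error function and inverts it by bisection (the parameter values are illustrative):

```python
import math

def Q(x):
    # Gaussian tail probability.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p):
    # Inverse of Q by bisection (Q is strictly decreasing).
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mu_from_pe(pe):
    # Matching rule (5.34): mu = (Q^{-1}(Pe))^2.
    return Q_inv(pe) ** 2

snr = 2.5
mu = mu_from_pe(Q(math.sqrt(snr)))  # uncoded BPSK: Pe(snr) = Q(sqrt(snr))
print(mu)
```

For a coded system, P_e would instead come from simulating the SISO decoder over the AWGN channel, and μ would then differ from the raw SNR.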
In particular, via the above Gaussian approximation we can recover the analytical result of [7] from the first statement (equation (5.30)) of Corollary 5.4.1:

Corollary 5.4.2 Assume the Gaussian approximation postulated above for the SISO decoders. Then, when ρ < ρ_th(η^{(ℓ−1)}, γ), the fixed-point equation for ℓ > 1 converges with probability 1 to
η^{(ℓ)} = (1 + β′γ (1 − E_{(p_{ext}^{(ℓ−1)}, z, x|A_α=1)}[x̂²]))^{−1},    (5.35)

where E_{(p_{ext}^{(ℓ−1)}, z, x|A_α=1)}[x̂²] is approximated by

∫_{R²} (1/2π) e^{−(y²+w²)/2} tanh( η^{(ℓ)}γ + μ(η^{(ℓ−1)}γ) − √(η^{(ℓ)}γ) y − √(μ(η^{(ℓ−1)}γ)) w ) dy dw.    (5.36)
Proof We use here the aforementioned Gaussian approximation of the probability messages, which states that when there is an active user (A_α = 1, with probability α), the virtual channel at the SISO decoder can be approximated using the noise distribution δ ∼ N(0, 1/μ(γη^{(ℓ−1)})). Hence, the extrinsic probabilities can be computed as APP's of a virtual Gaussian channel w = x + δ:

(p_ext | A_α = 1) = (p(−1), p(0), p(1)) = (1/(f(w)√(2π))) ( (1/2) e^{−(w−√(μ(ηγ)))²/2}, 0, (1/2) e^{−(w+√(μ(ηγ)))²/2} ),    (5.37)

where f(w) is the p.d.f. of the variable w. Further development of E_{p_ext,z,x}[x̂²] = E_{w,y,x|y}[x̂²] in (5.30) gives equation (5.35).
5.5 Numerical results
The above results imply that, under some conditions, the analysis of a coded multiuser system with user-and-data detection can be converted into the analysis of a standard multiuser system where the number of users is fixed and known. More specifically, the activity can be detected perfectly after a few iterations, and the behavior of the dynamical fixed-point equation has the form of a data detector with a scaled system load.
In Fig. 5.4 we illustrate Corollary 5.4.1. To that end, we simulate a (5, 7)_8 convolutional code with the trellis structure shown in Fig. 5.2. We first show the density evolution mapping function corresponding to an overloaded system with β = 4.5 and a 0.02 resolution grid on the η-axis for the standard MUD case (all users active, α = 1, thinner solid line) and a case where all users are active with probability α = 0.5 and ρ = 0.0 (η_th = 0, solid line). We also plot two lower bounds based on density evolution for α = 0.5 with a non-vanishing tradeoff between the blocklength and the logarithm of the number of users: one with ρ = 0.05 (η_th = 0.1, dash-dotted line) and another with ρ = 0.5 (η_th = 1, dotted line). The codeword length used here is L = 4000, at E_b/N_0 = 6 dB, and the number of realizations is 100.

Observe that for α = 1, the system converges to a fixed point at very low multiuser efficiency, so the detector fails to remove the interference. On the other hand, when users are active with probability α = 0.5 and ρ = 0.0, we have ρ < ρ_th(η, γ) for all η ∈ (0, 1], the unique solution of Ψ(η) = η is η = 1, and the system converges to single-user performance. The same convergence is achieved when the system has ρ = 0.05. In this case, the true system performance is lower-bounded by (5.32) for η ∈ (0, 0.1], and coincides with the case ρ = 0.0 for η ∈ (0.1, 1], since perfect activity detection is achieved thanks to ρ < ρ_th. Note that in both cases the activity rate scales the system load after the first iteration, and thus the curve with α = 0.5 significantly improves on the performance of the α = 1 case. However, when α = 0.5 and ρ = 0.5, we have ρ ≥ ρ_th for all η ∈ (0, 1], since max_η ρ_th = ρ_th(1, γ) = 0.49, and the system only experiences the stage where the lower bound holds.
Notice that the scaling on the system load for ρ = 0.0 (η_th = 0) cannot reproduce the value at the very first half-iteration (the first output of the IO-MUD), since the detection there is done as if the system were uncoded. It is easy to see that the right limit of the scaling curve as η → 0⁺ does not coincide with the point Ψ(0). The right limit is the fixed-point equation of a data detector with scaled system load αβ:
η_c^{(0)} = 1 / ( 1 + αβγ ( 1 − ∫ (e^{−y²/2}/√(2π)) tanh( η^{(0)}γ − √(η^{(0)}γ) y ) dy ) ),    (5.38)
whereas the point Ψ(0) is given by the user-and-data detection curve with the prior probabilities {α/2, 1 − α, α/2} [9]:
η_u^{(0)} = 1 / ( 1 + βγ ( α − ∫ (e^{−y²/2}/√(2π)) · [ α² sinh( η^{(0)}γ − y√(η^{(0)}γ) ) / ( α cosh( η^{(0)}γ − y√(η^{(0)}γ) ) + (1 − α) e^{η^{(0)}γ/2} ) ] dy ) ),    (5.39)
which in general yields different solutions. On the other hand, the same right limit and the point Ψ(0) of the lower bound on the density evolution for ρ > 0 do coincide at η_u^{(0)}, since the system scales above η_th > 0. For the setup of Fig. 5.4, we find that η_c^{(0)} = 0.16 and η_u^{(0)} = 0.122.
In Fig. 5.5, we compare the lower bounds on the density-evolution mappings (solid lines) for the cases (α = 0.5, ρ = 0.05) and (α = 0.5, ρ = 0.5), also shown in Fig. 5.4, with the corresponding Gaussian approximations (dash-dotted lines). We observe that the Gaussian approximation also mimics the system's behavior described by the lower bound (5.32), not only in (η_th, 1], where it takes the form of a scaled system [7], but also for η ∈ (0, η_th].
5.5 Numerical results
Figure 5.4: Numerical results for Corollary 5.4.1: mapping function Ψ(η, β, α) with α = 1.0 and α = 0.5 for ρ = 0.0, 0.05, 0.5, at Eb/N0 = 6 dB and β = 4.5. Solid lines represent the density evolution with the (5, 7)₈ convolutional code for a codeword length of L = 4000 and 100 realizations, for α = 1.0 and for α = 0.5 with ρ = 0.0. For α = 0.5 and ρ = 0.05 (dash-dotted line), the lower bound on Ψ(η, β, α) is shown for η ∈ (0, 0.1] together with the exact mapping for η ∈ (0.1, 1]. For α = 0.5 and ρ = 0.5, the lower bound is shown for all η ∈ (0, 1].
Figure 5.5: Numerical results for Section 5.4.3 using the Gaussian approximation at the SISO decoders for α = 0.5 and ρ > 0: mapping function Ψ(η, β, α) with ρ = 0.05 and ρ = 0.5, at Eb/N0 = 6 dB and β = 4.5. Solid lines represent the density evolution with the (5, 7)₈ convolutional code and dash-dotted lines plot the Gaussian approximation.
5.6 Chapter review and conclusions
We have studied the large-system performance of iterative multiuser joint decoding under belief propagation using density evolution when the number of active users accessing the channel is unknown at the receiver. Since inactive users transmit the all-zero codeword, the channel model is no longer memoryless: symbols are correlated over time. We employ a low-complexity symbol-by-symbol iterative multiuser detector that ignores this correlation and analyze its performance in the large-system limit. In particular, using the replica method, we obtain the multidimensional fixed-point equation of the multiuser efficiency for finite blocklength. We then study the limiting performance for large blocklength using density evolution techniques. In this case, we first show that, in the limit of large blocklength, the system effectively performs perfect activity detection when the maximum number of users and the blocklength scale appropriately below a threshold. Otherwise, we provide a lower bound on the performance based on uncoded activity detection. In both cases, the original multidimensional fixed-point equation reduces to a one-dimensional fixed-point equation. In particular, when perfect activity detection is achieved, the fixed-point equation is equivalent to that of a system where the number of users is fixed and known, but with a scaled system load. We have also discussed the corresponding Gaussian approximations for both cases and verified that these approximations give very accurate characterizations of the overall behavior of the iterative multiuser joint decoder.
Chapter 6
Summary and lines of future research
We summarize in this chapter the main contributions of this dissertation anddescribe some lines of future research.
6.1 Main contributions
The main contributions of our work are summarized as follows:
6.1.1 Part I
• Extension of the random-coding method [23] to joint source-channel coding in the form of achievability error bounds for MAP and threshold decoding.
• Formulation of a lower bound on the JSCC error exponent based on source-dependent random-coding generation. The final result recovers Csiszár's and Gallager's exponents, and the method has potential applications in other related problems where the message set is not homogeneous.
• Derivation of a converse bound for JSCC with potential application in otherinformation theory areas like mismatch decoding or multi-terminal commu-nications.
• Analytical approximation and numerical evaluation of the gain of joint source-channel coding over separation.
6.1.2 Part II
• Analytical description of the high-SNR operational regimes of an optimum activity-and-data detector when the number of users is potentially large.
• Design and analysis of an iterative multiuser decoder of data and activity.
• Identification of a condition for which the number of users scales with the activity rate using an iterative multiuser decoder.
6.2 Future research
• Extreme value approach to study the optimal limits of finite-length information theory
The error probability of the optimal decoder is generally determined by the probability of extreme events. For the purpose of analysis, this naturally leads to the branch of statistics called extreme value theory, which deals with extreme deviations from the median of probability distributions. Traditionally, researchers in information theory have avoided the statistics of extreme values because of their apparent difficulty, and have instead preferred simpler techniques (Chernoff bounds, c.d.f.'s of certain random variables, etc.) to bound the optimal error probability. In fact, converse and achievability bounds are no more than lower and upper bounds on the optimal error probability, respectively. However, the majority of these results are only tight when we assume sufficiently large blocklengths and/or small error probabilities. In other cases, the gap between the optimal performance and that of the bounds can be significant, leading to imprecise guidance on the actual performance limit. Based on the notion of extreme values, we derived in Chapter 2 a new achievability bound (MAX(k, n)) for joint source-channel coding that outperforms previous results. Beyond this result, there is room to further explore the insight that extreme value theory provides into the optimal performance of general communication schemes.
• Joint source-channel codes with a fidelity criterion. A natural and non-trivial extension of this work, with applications in video and image processing, is to study the finite-length rate-distortion performance of sources that are transmitted over channels under a given threshold fidelity criterion. This threshold is formally set over distortion functions (typically the Hamming distortion for discrete alphabets and the squared-error distortion for continuous alphabets). Classical rate-distortion theorems are
proved using the idea of “covering” (essentially, computing the minimum number of encoded sequences that are jointly typical with the source) and hold in the limit of large blocklength by the law of large numbers. Nevertheless, when we drop the assumption of infinite blocklength, this is no longer an optimal strategy.
Appendix A
A.1 Types: definitions and properties
The following concepts help us quantify the probability of every type in a given set [13].
Definition A.1.1 The relative entropy (or Kullback-Leibler divergence, or divergence) between measures P and Q defined on X is given by
$$D(P\|Q) = \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)}. \quad (A.1)$$
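As an illustration of Definition A.1.1, the divergence over a finite alphabet is a one-line computation (a sketch; probability vectors are assumed given as lists that sum to one):

```python
import math

def kl_divergence(P, Q):
    """D(P||Q) = sum_x P(x) log(P(x)/Q(x)), in nats (Definition A.1.1).
    Terms with P(x) = 0 contribute 0; requires Q(x) > 0 wherever P(x) > 0."""
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log 2
```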
The following property of the divergence is central to the derivation of many results in information theory:
$$D(P\|Q) \geq 0,$$
with equality if and only if P = Q.
For a sequence x, let N(x|x) be the number of occurrences of symbol x in x. The type of x is defined by the relative frequency (or empirical distribution) of the symbols x in the sequence x:
$$P_{\boldsymbol{x}}(x) = \frac{N(x|\boldsymbol{x})}{n}, \quad x\in\mathcal{V}, \quad (A.2)$$
Then a type T (PX) is the set of sequences x ∈ X whose type Px equals PX . Theset of probabilities PX which are types of sequences in V will be denoted by P(V).
Joint and conditional types can be defined along similar lines. A joint type T(P_{XY}) is formed by all sequence pairs (x, y) ∈ X × Y such that the joint empirical distribution of their symbols,
$$P_{\boldsymbol{x}\boldsymbol{y}}(x,y) = \frac{N(x,y|\boldsymbol{x},\boldsymbol{y})}{n}, \quad (x,y)\in\mathcal{X}\times\mathcal{Y}, \quad (A.3)$$
equals P_{XY}.
113
A.2 Proof of Theorem 1.3.3
A conditional type Tx(PY |X) is formed by all sequences y ∈ Y such that
Pxy(x, y) = Px(x)PY |X(y|x) ∀x ∈ X, y ∈ Y. (A.4)
The set of conditional types given x ∈ T (PX) will be denoted by P(Y|PX). Fur-ther, P(Y|X) = ∪PX∈P(X)P(Y|PX).
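The definitions of types and joint types above are simple empirical counts; a minimal sketch (using exact rational arithmetic so that equality tests between types are exact):

```python
from collections import Counter
from fractions import Fraction

def type_of(x, alphabet):
    """Type (empirical distribution) P_x of a sequence x, as in (A.2)."""
    n = len(x)
    counts = Counter(x)
    return {a: Fraction(counts[a], n) for a in alphabet}

def joint_type(x, y, ax, ay):
    """Joint type P_xy of a pair of sequences of equal length, as in (A.3)."""
    n = len(x)
    counts = Counter(zip(x, y))
    return {(a, b): Fraction(counts[(a, b)], n) for a in ax for b in ay}

# two sequences lie in the same type class iff their types are equal
print(type_of("aab", "ab") == type_of("aba", "ab"))  # True
```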
We now review basic properties of types [13, 16, 17] that are frequently used when applying this method. In this context, for ease of notation we write H(P_X), H(P_{Y|X}|P_X) and I(P_X; P_{Y|X}) to denote the entropy H(X), the conditional entropy H(Y|X) and the mutual information I(X;Y).
Upper bound on the cardinality of the set of types/conditional types:
$$|\mathcal{P}(\mathcal{V})| \leq (k+1)^{|\mathcal{V}|} \quad (A.5)$$
$$|\mathcal{P}(\mathcal{Y}|\mathcal{X})| \leq (n+1)^{|\mathcal{X}||\mathcal{Y}|} \quad (A.6)$$
Bounds on the cardinality of a type/conditional type:
$$(n+1)^{-|\mathcal{X}|}\,e^{nH(P_X)} \leq |T(P_X)| \leq e^{nH(P_X)}, \quad P_X\in\mathcal{P}(\mathcal{X}) \quad (A.7)$$
$$(n+1)^{-|\mathcal{X}||\mathcal{Y}|}\,e^{nH(P_{Y|X}|P_X)} \leq \left|T_{\boldsymbol{x}}(P_{Y|X})\right| \leq e^{nH(P_{Y|X}|P_X)}, \quad \boldsymbol{x}\in T(P_X),\ P_{Y|X}\in\mathcal{P}(\mathcal{Y}|P_X) \quad (A.8)$$
Bounds on the probability of a type/conditional type:
$$(k+1)^{-|\mathcal{V}|}\,e^{-kD(P'_V\|P_V)} \leq \sum_{\boldsymbol{v}\in T(P'_V)} P_V(\boldsymbol{v}) \leq e^{-kD(P'_V\|P_V)}, \quad T(P'_V)\in\mathcal{P}(\mathcal{V}) \quad (A.9)$$
$$(n+1)^{-|\mathcal{X}||\mathcal{Y}|}\,e^{-nD(P'_{Y|X}\|P_{Y|X}|P_X)} \leq \sum_{\boldsymbol{y}\in T_{\boldsymbol{x}}(P'_{Y|X})} P_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) \leq e^{-nD(P'_{Y|X}\|P_{Y|X}|P_X)} \quad (A.10)$$
$$\boldsymbol{x}\in T(P_X), \quad P'_{Y|X}\in\mathcal{P}(\mathcal{Y}|P_X). \quad (A.11)$$
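These bounds are easy to check numerically in small cases. The sketch below verifies the type-class cardinality bound (A.7) for a binary type; the exact size |T(P_X)| is a multinomial coefficient.

```python
import math

def type_class_size(n, counts):
    """|T(P_X)| = n! / prod_a N(a|x)!, the multinomial coefficient."""
    size = math.factorial(n)
    for c in counts:
        size //= math.factorial(c)
    return size

# binary type P = (2/3, 1/3) at blocklength n = 12
n, counts = 12, (8, 4)
H = -sum(c / n * math.log(c / n) for c in counts)       # entropy H(P_X), nats
lower = (n + 1) ** (-len(counts)) * math.exp(n * H)     # (A.7), left side
upper = math.exp(n * H)                                 # (A.7), right side
print(lower <= type_class_size(n, counts) <= upper)     # True
```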
A.2 Proof of Theorem 1.3.3
Let k and n be fixed and assume that each codeword is generated with $P_X$, where X is independent of V. Let the decoder be maximum a posteriori (MAP), i.e., a decoder that, given a received sequence $\boldsymbol{y}$, outputs $\hat{\boldsymbol{v}} = \arg\max_{\boldsymbol{v}} P(\boldsymbol{v}|\boldsymbol{y})$. We compute the random-coding error probability $\mathbb{E}[\varepsilon_{k,n}]$, expressed as
$$\mathbb{E}[\varepsilon_{k,n}] = \sum_{\boldsymbol{v}} P_V(\boldsymbol{v})\,\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})], \quad (A.12)$$
where $\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})]$ is the random-coding error probability conditioned on $\boldsymbol{v}$ being transmitted.
Throughout the following analysis we assume that the decoder always resolves ties in error. Then, for a given source vector $\boldsymbol{v}$, its codeword $\boldsymbol{x}$ and the resulting channel output $\boldsymbol{y}$, the associated error event is characterized by those $\bar{\boldsymbol{v}} \neq \boldsymbol{v}$, encoded into $\bar{\boldsymbol{X}}$, such that
$$\bigcup_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}}\left\{P_V(\bar{\boldsymbol{v}})P_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{X}}) \geq P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})\right\}, \quad (A.13)$$
where $\bar{\boldsymbol{X}}\sim P_X$ is independent of Y. Hence, $\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})]$ can be upper-bounded as
$$\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})] = \mathbb{E}\left[\min\left(1, \Pr\left\{\bigcup_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}}\left\{P_V(\bar{\boldsymbol{v}})P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}}) \geq P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})\right\}\,\Big|\,\boldsymbol{X},\boldsymbol{Y}\right\}\right)\right] \quad (A.14)$$
$$\leq \mathbb{E}\left[\min\left(1, \sum_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}} \Pr\left\{P_V(\bar{\boldsymbol{v}})P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}}) \geq P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})\,\Big|\,\boldsymbol{X},\boldsymbol{Y}\right\}\right)\right], \quad (A.15)$$
where the union bound is applied in (A.15). We now bound each probability in (A.15) using Markov's inequality with s > 0:
$$\Pr\left\{P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s \geq \left(\frac{P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})}{P_V(\bar{\boldsymbol{v}})}\right)^{\!s}\right\} \leq \frac{P_V(\bar{\boldsymbol{v}})^s\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s\,|\,\boldsymbol{Y}\right]}{P_V(\boldsymbol{v})^s P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})^s}, \quad (A.16)$$
which yields
$$\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})] \leq \mathbb{E}\left[\min\left(1, \sum_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}} \frac{P_V(\bar{\boldsymbol{v}})^s\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s\,|\,\boldsymbol{Y}\right]}{P_V(\boldsymbol{v})^s P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})^s}\right)\right]. \quad (A.17)$$
Then, by applying the inequality min(1, x) ≤ x^ρ for x ≥ 0, ρ ∈ [0, 1], to (A.17) we obtain
$$\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})] \leq \mathbb{E}\left[\left(\sum_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}} \frac{P_V(\bar{\boldsymbol{v}})^s\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s\,|\,\boldsymbol{Y}\right]}{P_V(\boldsymbol{v})^s P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})^s}\right)^{\!\rho}\right]. \quad (A.18)$$
Then, $\mathbb{E}[\varepsilon_{k,n}]$ can be upper-bounded, on account of (A.12), as
$$\mathbb{E}[\varepsilon_{k,n}] \leq \sum_{\boldsymbol{v}} P_V(\boldsymbol{v})\,\mathbb{E}\left[\left(\sum_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}} \frac{P_V(\bar{\boldsymbol{v}})^s\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{Y}|\bar{\boldsymbol{X}})^s\,|\,\boldsymbol{Y}\right]}{P_V(\boldsymbol{v})^s P_{Y|X}(\boldsymbol{Y}|\boldsymbol{X})^s}\right)^{\!\rho}\right] \quad (A.19)$$
$$\leq \sum_{\boldsymbol{v}} P_V(\boldsymbol{v}) \sum_{\boldsymbol{x},\boldsymbol{y}} P_X(\boldsymbol{x})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) \left(\sum_{\bar{\boldsymbol{v}}} \frac{P_V(\bar{\boldsymbol{v}})^s}{P_V(\boldsymbol{v})^s}\,\frac{\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{X}})^s\right]}{P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})^s}\right)^{\!\rho} \quad (A.20)$$
$$= \sum_{\boldsymbol{v}} P_V(\boldsymbol{v})^{1-s\rho} \sum_{\boldsymbol{x},\boldsymbol{y}} P_X(\boldsymbol{x})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})^{1-s\rho} \left(\sum_{\bar{\boldsymbol{v}}} P_V(\bar{\boldsymbol{v}})^s\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{X}})^s\right]\right)^{\!\rho}, \quad (A.21)$$
where the independence of X and V is used to obtain (A.21). Finally, substituting s = 1/(1 + ρ) in (A.21) [23] yields
$$\mathbb{E}[\varepsilon_{k,n}] \leq \sum_{\boldsymbol{v}} P_V(\boldsymbol{v})^{\frac{1}{1+\rho}} \sum_{\boldsymbol{x},\boldsymbol{y}} P_X(\boldsymbol{x})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})^{\frac{1}{1+\rho}} \left(\sum_{\bar{\boldsymbol{v}}} P_V(\bar{\boldsymbol{v}})^{\frac{1}{1+\rho}}\,\mathbb{E}\!\left[P_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{X}})^{\frac{1}{1+\rho}}\right]\right)^{\!\rho} \quad (A.22)$$
$$= \left(\sum_{\boldsymbol{v}} P_V(\boldsymbol{v})^{\frac{1}{1+\rho}}\right)^{\!1+\rho} \sum_{\boldsymbol{y}} \left(\sum_{\boldsymbol{x}} P_X(\boldsymbol{x})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})^{\frac{1}{1+\rho}}\right)^{\!1+\rho}. \quad (A.23)$$
A.3 Source and channel error exponents, and
Csiszar’s original JSCC theorem
In this section, we review known exact formulas and bounds for the source and channel coding error exponents. The expressions are given in terms of the divergence (see Definition A.1.1) as well as in a parametric compact form. We note that the latter form can be derived from the former using the method of Lagrange multipliers [5].
Theorem A.3.1 [5, 17, 36] The source error exponent e(R, P_V) is given by
$$e(R, P_V) = \begin{cases} 0 & 0 < R < H(V) \\ \min_{U:\,H(U)\geq R} D(U\|V) & H(V) \leq R \leq \log|\mathcal{V}| \\ \infty & R > \log|\mathcal{V}|, \end{cases} \quad (A.24)$$
which admits the parametric form
$$e(R, P_V) \triangleq \sup_{0\leq\rho<\infty}\ \rho R - E_s(\rho, P_V), \quad (A.25)$$
where Es(ρ, PV ) is Gallager’s source function (1.28).
Theorem A.3.2 [5, 17, 23] The channel coding error exponent E(R, P_{Y|X}) is bounded as
$$E_r(R, P_{Y|X}) \leq E(R, P_{Y|X}) \leq E_{sp}(R, P_{Y|X}), \quad (A.26)$$
where
$$E_r(R, P_{Y|X}) \triangleq \sup_{P_X}\ \min_{P'_{Y|X}}\ D(P'_{Y|X}\|P_{Y|X}|P_X) + \left|I(P_X; P'_{Y|X}) - R\right|^+ \quad (A.27)$$
is the channel random-coding exponent, and
$$E_{sp}(R, P_{Y|X}) \triangleq \sup_{P_X}\ \min_{P'_{Y|X}:\, I(P_X; P'_{Y|X})\leq R}\ D(P'_{Y|X}\|P_{Y|X}|P_X) \quad (A.28)$$
is the channel sphere-packing exponent. Similarly, E_r(R, P_{Y|X}) and E_{sp}(R, P_{Y|X}) admit parametric forms given by
$$E_r(R, P_{Y|X}) = \sup_{P_X}\max_{\rho\in[0,1]}\left(E_0(\rho, P_X, P_{Y|X}) - \rho R\right) \quad (A.29)$$
and
$$E_{sp}(R, P_{Y|X}) = \sup_{P_X}\sup_{\rho\geq 0}\left(E_0(\rho, P_X, P_{Y|X}) - \rho R\right), \quad (A.30)$$
respectively, where E_0(ρ, P_X, P_{Y|X}) is Gallager's channel function (1.29).
Based on the parametric forms of E_r and E_{sp}(R, P_{Y|X}), it can easily be proven that the two bounds are tight and thus give the actual error exponent,
$$E(R, P_{Y|X}) = E_r(R, P_{Y|X}) = E_{sp}(R, P_{Y|X}), \quad (A.31)$$
for rates R > 0 strictly larger than the channel critical rate R_{cr}, defined by
$$R_{cr} \triangleq \left.\frac{\partial E_0(\rho, P_X, P_{Y|X})}{\partial\rho}\right|_{\rho=1}. \quad (A.32)$$
We remark that for a DMS and a DMC with blocklengths k and n, the above expressions scale as k e(R, P_V), n E_r(R, P_{Y|X}) and n E_{sp}(R, P_{Y|X}), respectively.
Based on the source and channel error exponents, Csiszár formulated his upper and lower bounds on the JSCC exponent.
Theorem A.3.3 (Csiszár [14]) The JSCC error exponent E_J(P_V, P_{Y|X}) for a DMS-DMC pair is bounded as
$$E_J(P_V, P_{Y|X}) \geq \min_{tH(V)\leq R\leq t\log|\mathcal{V}|}\ t\,e\!\left(\frac{R}{t}, P_V\right) + E_r(R, P_{Y|X}) \quad (A.33)$$
and
$$E_J(P_V, P_{Y|X}) \leq \min_{tH(V)\leq R\leq t\log|\mathcal{V}|}\ t\,e\!\left(\frac{R}{t}, P_V\right) + E_{sp}(R, P_{Y|X}). \quad (A.34)$$
A.4 Proof of the lower bound in Theorem 1.3.4
using random coding over fixed-composition
codes with MAP decoding
Let the set of source sequences be divided into $m_k$ types $T(P_{V_1}), \ldots, T(P_{V_{m_k}}) \in \mathcal{P}(\mathcal{V})$, associated with distributions $P_{V_1}, \ldots, P_{V_{m_k}}$ over $\mathcal{V}$, respectively. For each source type $T(P_{V_i})$, $i = 1, \ldots, m_k$, and channel blocklength $n$, we consider a type in $\mathcal{X}$, $T(P_{X_i}) \in \mathcal{P}(\mathcal{X})$, such that each message $\boldsymbol{v} \in T(P_{V_i})$ is encoded into a codeword $\boldsymbol{x} \in T(P_{X_i})$ that is independently chosen with equal probability among the elements of $T(P_{X_i})$ [24].
We wish to compute the exponent of the random-coding average error probability $\mathbb{E}[\varepsilon_{k,n}]$ when MAP decoding is used at the receiver. Then, for a given source vector $\boldsymbol{v}\in T(P_{V_i})$, its assigned codeword $\boldsymbol{x}\in T(P_{X_i})$ and the resulting channel output $\boldsymbol{y}$, the error event is characterized by those $\bar{\boldsymbol{v}}\neq\boldsymbol{v}$, encoded into $\bar{\boldsymbol{X}}$, such that
$$\bigcup_{\bar{\boldsymbol{v}}\neq\boldsymbol{v}}\left\{P_V(\bar{\boldsymbol{v}})P_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{X}}) \geq P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})\right\}, \quad (A.35)$$
where $\bar{\boldsymbol{X}}\sim P_X$ is independent of Y. In order to evaluate the probability of (A.35), we first define the following conditional type:
$$T_{\boldsymbol{y}}(P_{X_j}, \hat{P}_{Y|X}) = \left\{\boldsymbol{x}\in T(P_{X_j}) : (\boldsymbol{x},\boldsymbol{y})\in T(P_{X_j}\times\hat{P}_{Y|X})\right\}, \quad (A.36)$$
where $T(P_{X_j}\times\hat{P}_{Y|X})$ is the joint type associated with the joint distribution $P_{X_j}\times\hat{P}_{Y|X}$. Then, for $(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y})$, let $P'_{Y|X}\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$ be such that $\boldsymbol{x}\in T_{\boldsymbol{y}}(P_{X_i}, P'_{Y|X})$.
An error occurs if some $\bar{\boldsymbol{v}}\in T(P_{V_j})$ is encoded into a codeword $\bar{\boldsymbol{x}}\in T(P_{X_j})$ such that $\bar{\boldsymbol{x}}\in T_{\boldsymbol{y}}(P_{X_j}, \hat{P}_{Y|X})$ for a $\hat{P}_{Y|X}\in\mathcal{P}(\mathcal{Y}|\mathcal{X})$ satisfying (A.35), i.e.,
$$P_V(\bar{\boldsymbol{v}})\hat{P}_{Y|X}(\boldsymbol{y}|\bar{\boldsymbol{x}}) \geq P_V(\boldsymbol{v})P'_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) \quad (A.37)$$
$$P_Y(\boldsymbol{y}) = \sum_{\boldsymbol{x}} P_{X_j}(\boldsymbol{x})\hat{P}_{Y|X}(\boldsymbol{y}|\boldsymbol{x}) = \sum_{\boldsymbol{x}} P_{X_i}(\boldsymbol{x})P'_{Y|X}(\boldsymbol{y}|\boldsymbol{x}), \quad (A.38)$$
which can be rewritten in terms of the distributions of the types as (see Section A.1):
$$\exp\left[k\sum_{v} P_{V_j}(v)\log P_V(v) + n\sum_{x,y} P_{X_j}(x)\hat{P}_{Y|X}(y|x)\log P_{Y|X}(y|x)\right] \geq \exp\left[k\sum_{v} P_{V_i}(v)\log P_V(v) + n\sum_{x,y} P_{X_i}(x)P'_{Y|X}(y|x)\log P_{Y|X}(y|x)\right] \quad (A.39)$$
$$\sum_{x} P_{X_j}(x)\hat{P}_{Y|X}(y|x) = \sum_{x} P_{X_i}(x)P'_{Y|X}(y|x) \quad \text{for each } y\in\mathcal{Y}. \quad (A.40)$$
Denote now by $\mathcal{W}(P'_{Y|X})\subseteq\mathcal{P}(\mathcal{Y}|\mathcal{X})$ the set of noise types $\hat{P}_{Y|X}$ such that (A.39) and (A.40) hold for a given $P'_{Y|X}$. Then, by recalling equations (A.14)-(A.15) in
Appendix A.2, we can rewrite the upper bound on $\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})]$, for $\boldsymbol{v}\in T(P_{V_i})$, as
$$\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})] = |T(P_{X_i})|^{-1} \sum_{\boldsymbol{x}\in T(P_{X_i})} \sum_{P'_{Y|X}\in\mathcal{P}(\mathcal{Y}|P_{X_i})} \sum_{\boldsymbol{y}\in T_{\boldsymbol{x}}(P'_{Y|X})} P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})\,\epsilon(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y}) \quad (A.41)$$
where
$$\epsilon(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y}) \leq \min\left(1, \sum_{j=1}^{m_k} \sum_{\bar{\boldsymbol{v}}\in T(P_{V_j})} \sum_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})} \Pr\left\{\bar{\boldsymbol{x}}\in T_{\boldsymbol{y}}(P_{X_j}, \hat{P}_{Y|X})\right\}\right). \quad (A.42)$$
Since $\bar{\boldsymbol{x}}$ is randomly selected for $\bar{\boldsymbol{v}}$ with equal probability over all sequences in $T(P_{X_j})$, then
$$\Pr\left\{\bar{\boldsymbol{x}}\in T_{\boldsymbol{y}}(P_{X_j}, \hat{P}_{Y|X})\right\} = \frac{|T_{\boldsymbol{y}}(P_{X_j}, \hat{P}_{Y|X})|}{|T(P_{X_j})|} \leq \exp\left[-nI(P_{X_j}; \hat{P}_{Y|X})\right] \quad (A.43)$$
where the upper bound follows from inequalities (A.7) and (A.8). For a given $(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y})$, we can upper-bound (A.42) as
$$\epsilon(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y}) \leq \min\left(1, \sum_{j=1}^{m_k} \sum_{\bar{\boldsymbol{v}}\in T(P_{V_j})} \sum_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})} \exp\left[-nI(P_{X_j}; \hat{P}_{Y|X})\right]\right) \quad (A.44)$$
$$\leq \min\left(1, (n+1)^{|\mathcal{X}||\mathcal{Y}|} \sum_{j=1}^{m_k} \sum_{\bar{\boldsymbol{v}}\in T(P_{V_j})} \max_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})} \exp\left[-nI(P_{X_j}; \hat{P}_{Y|X})\right]\right) \quad (A.45)$$
$$= \sum_{j=1}^{m_k} \exp\left[-n\min_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})}\left[I(P_{X_j}; \hat{P}_{Y|X}) - R_j - \delta_N\right]^+\right], \quad (A.46)$$
where (A.45) follows from inequality (A.6), inequality (A.7) is used to obtain (A.46), and $\delta_N \triangleq |\mathcal{X}||\mathcal{Y}|\frac{\log(n+1)}{n}$, $R_j \triangleq H(V_j)$.
Denote the expression (A.46) by $\bar{\epsilon}(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y})$. Based on equation (A.41), we can now upper-bound $\mathbb{E}[\varepsilon_{k,n}(\boldsymbol{v})]$ as
$$|T(P_{X_i})|^{-1} \sum_{\boldsymbol{x}\in T(P_{X_i})} \sum_{P'_{Y|X}\in\mathcal{P}(\mathcal{Y}|P_{X_i})} \sum_{\boldsymbol{y}\in T_{\boldsymbol{x}}(P'_{Y|X})} P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})\,\bar{\epsilon}(\boldsymbol{v},\boldsymbol{x},\boldsymbol{y}) \quad (A.47)$$
$$\leq \sum_{j=1}^{m_k} \exp\left[-n\left(\min_{P'_{Y|X}}\min_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})} D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_j}; \hat{P}_{Y|X}) - R_j - 2\delta_N\right]^+\right)\right] \quad (A.48)$$
where inequalities (A.10)-(A.11) are combined with (A.6) to obtain (A.48). By using (A.9), we can finally upper-bound $\mathbb{E}[\varepsilon_{k,n}]$ as follows:
$$\mathbb{E}[\varepsilon_{k,n}] \leq \sum_{i=1}^{m_k}\sum_{j=1}^{m_k} \exp\left[-nE_{ij}\right], \quad (A.49)$$
where
$$E_{ij} \triangleq \min_{P'_{Y|X}}\min_{\hat{P}_{Y|X}\in\mathcal{W}(P'_{Y|X})}\ t\,D(V_i\|V) + D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_j}; \hat{P}_{Y|X}) - R_j - 2\delta_N\right]^+ \quad (A.50)$$
and $t = \frac{k}{n}$ is the transmission rate.
We will now show that $E_{ij} \geq E_{ii}$ for $i, j = 1, \ldots, m_k$. To begin with, if $I(P_{X_j}; \hat{P}_{Y|X}) - R_j \geq I(P_{X_i}; P'_{Y|X}) - R_i$ is satisfied for a pair $(i, j)$, then
$$E_{ij} \geq \min_{P'_{Y|X}}\ t\,D(V_i\|V) + D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_i}; P'_{Y|X}) - R_i - 2\delta_N\right]^+. \quad (A.51)$$
Otherwise, if there exists a pair $(i, j)$ for which $I(P_{X_j}; \hat{P}_{Y|X}) - R_j < I(P_{X_i}; P'_{Y|X}) - R_i$, this is equivalent to
$$\sum_{x,y} P_{X_j}(x)\hat{P}_{Y|X}(y|x)\log\hat{P}_{Y|X}(y|x) + t\sum_{v} P_{V_j}(v)\log P_{V_j}(v) < \sum_{x,y} P_{X_i}(x)P'_{Y|X}(y|x)\log P'_{Y|X}(y|x) + t\sum_{v} P_{V_i}(v)\log P_{V_i}(v), \quad (A.52)$$
which yields the following relationship (unifying the dummy variables $\bar{\boldsymbol{x}} = \boldsymbol{x}$ and $\bar{\boldsymbol{v}} = \boldsymbol{v}$):
$$t\,D(V_i\|V) + D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) \quad (A.53)$$
$$= t\sum_{v} P_{V_i}(v)\log\frac{P_{V_i}(v)}{P_V(v)} + \sum_{x,y} P_{X_i}(x)P'_{Y|X}(y|x)\log\frac{P'_{Y|X}(y|x)}{P_{Y|X}(y|x)} \quad (A.54)$$
$$\stackrel{(a)}{\geq} t\sum_{v} P_{V_i}(v)\log P_{V_i}(v) - t\sum_{v} P_{V_j}(v)\log P_V(v) + \sum_{x,y} P_{X_i}(x)P'_{Y|X}(y|x)\log P'_{Y|X}(y|x) - \sum_{x,y} P_{X_j}(x)\hat{P}_{Y|X}(y|x)\log P_{Y|X}(y|x) \quad (A.55)\text{-}(A.56)$$
$$\stackrel{(b)}{\geq} t\sum_{v} P_{V_j}(v)\log P_{V_j}(v) - t\sum_{v} P_{V_j}(v)\log P_V(v) + \sum_{x,y} P_{X_j}(x)\hat{P}_{Y|X}(y|x)\log\hat{P}_{Y|X}(y|x) - \sum_{x,y} P_{X_j}(x)\hat{P}_{Y|X}(y|x)\log P_{Y|X}(y|x) \quad (A.57)\text{-}(A.58)$$
$$= t\,D(V_j\|V) + D(\hat{P}_{Y|X}\|P_{Y|X}|P_{X_j}), \quad (A.59)$$
where the MAP decoding rule (A.39)-(A.40) is used in (a), and inequality (A.52) in (b). Thus, in this case the exponent $E_{ij}$ is lower-bounded as
$$E_{ij} \geq \min_{\hat{P}_{Y|X}}\ t\,D(V_j\|V) + D(\hat{P}_{Y|X}\|P_{Y|X}|P_{X_j}) + \left[I(P_{X_j}; \hat{P}_{Y|X}) - R_j - 2\delta_N\right]^+. \quad (A.60)$$
Notice that unifying $P'_{Y|X} = \hat{P}_{Y|X}$ and $i = j$ in (A.51) and (A.60) yields
$$E_{ij} \geq \min_{P'_{Y|X}}\ t\,D(V_i\|V) + D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_i}; P'_{Y|X}) - R_i - 2\delta_N\right]^+ \quad (A.61)$$
for every $i, j = 1, \ldots, m_k$. By taking the limit in $n$ and optimizing (A.61) with respect to $P_{X_i}$, we finally obtain
$$\lim_{\substack{n\to\infty\\ t=k/n}}\ \min_{1\leq i\leq m_k}\ t\,D(V_i\|V) + \min_{P'_{Y|X}} D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_i}; P'_{Y|X}) - R_i - 2\delta_N - 2\delta_K\right]^+ \quad (A.62)$$
$$= \min_{1\leq i\leq m_k}\ t\,D(V_i\|V) + \min_{P'_{Y|X}} D(P'_{Y|X}\|P_{Y|X}|P_{X_i}) + \left[I(P_{X_i}; P'_{Y|X}) - R_i\right]^+ \quad (A.63)$$
$$\geq \min_{1\leq i\leq m_k}\ \min_{U:\,H(U)\geq H(V_i)} t\,D(U\|V) + E_r(R_i, P_{Y|X}) \quad (A.64)$$
$$= \min_{1\leq i\leq m_k}\ t\,e\!\left(\frac{R_i}{t}, P_V\right) + E_r(R_i, P_{Y|X}) \quad (A.65)$$
$$\geq \min_{H(V)\leq R\leq\log|\mathcal{V}|}\ t\,e\!\left(\frac{R}{t}, P_V\right) + E_r(R, P_{Y|X}), \quad (A.66)$$
where the definition of the channel random-coding exponent (A.27) is used to obtain (A.64) and the definition of the source error exponent (A.24) is used to obtain (A.65).
Appendix B
B.1 Application to BMS and BSC source chan-
nel pair
We assume here a binary memoryless source (BMS) with probability of bit 1 equal to δ < 0.5 and a binary symmetric channel (BSC) with crossover probability ξ < 0.5. We consider n = k, i.e., transmission rate t = 1, and evaluate our bounds over the ensemble of random source-channel codes generated by the capacity-achieving distribution $P_X(\boldsymbol{x}) = 2^{-n}$.
B.1.1 Achievability JSCC bounds for BMS and BSC
Let $\boldsymbol{v}$ have probability $P(\boldsymbol{v}) = \delta^l(1-\delta)^{n-l}$, for some $0\leq l\leq n$, and the likelihood of its codeword $\boldsymbol{x}$ be $P(\boldsymbol{y}|\boldsymbol{x}) = \xi^w(1-\xi)^{n-w}$ for some $0\leq w\leq n$. Then, for any $\bar{\boldsymbol{v}}\neq\boldsymbol{v}$ with probability $P(\bar{\boldsymbol{v}}) = \delta^{\bar{l}}(1-\delta)^{n-\bar{l}}$, we define
$$w^\star \triangleq w + (l - \bar{l})\,\frac{\log\left(\frac{\delta}{1-\delta}\right)}{\log\left(\frac{\xi}{1-\xi}\right)} \quad (B.1)$$
such that $\delta^l(1-\delta)^{n-l}\xi^w(1-\xi)^{n-w} = \delta^{\bar{l}}(1-\delta)^{n-\bar{l}}\xi^{w^\star}(1-\xi)^{n-w^\star}$.
Using (B.1), the following expressions are direct applications of Theorem 2.2.2, Corollaries 2.2.1 and 2.2.2, and Theorem 2.2.5 to a BMS-BSC pair.
Max bound
$$1 - \sum_{l=0}^{n}\sum_{w=0}^{n}\binom{n}{l}\binom{n}{w}\delta^l(1-\delta)^{n-l}\xi^w(1-\xi)^{n-w} \prod_{l'=0}^{n}\left(\sum_{\ell=\lceil w^\star\rceil}^{n}\binom{n}{\ell}2^{-n}\right)^{\binom{n}{l'}} \quad (B.2)$$
123
B.1 Application to BMS and BSC source channel pair
RCU bound
$$\sum_{l=0}^{n}\sum_{w=0}^{n}\binom{n}{l}\binom{n}{w}\delta^l(1-\delta)^{n-l}\xi^w(1-\xi)^{n-w} \min\left(1, \sum_{l'=0}^{n}\binom{n}{l'}\sum_{\ell=0}^{\lfloor w^\star\rfloor}\binom{n}{\ell}2^{-n}\right) \quad (B.3)$$
Tilted RCU bound
$$\sum_{l=0}^{n}\sum_{w=0}^{n}\binom{n}{l}\binom{n}{w}\delta^l(1-\delta)^{n-l}\xi^w(1-\xi)^{n-w} \min\left(1, \frac{2^{-n}\left[(\xi^s + (1-\xi)^s)(\delta^s + (1-\delta)^s)\right]^n}{\delta^{sl}(1-\delta)^{s(n-l)}\xi^{ws}(1-\xi)^{s(n-w)}}\right) \quad (B.4)$$
Optimized DT bound
$$\sum_{l=0}^{n}\sum_{w=0}^{n}\binom{n}{l}\binom{n}{w}\min\left(\delta^l(1-\delta)^{n-l}\xi^w(1-\xi)^{n-w},\ 2^{-n}\sum_{l'=0}^{l}\binom{n}{l'}\delta^{l'}(1-\delta)^{n-l'}\right) \quad (B.5)$$
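The four expressions above are finite sums and can be evaluated directly. As a sketch, the RCU bound (B.3) translates into a triple loop over (l, w, l′), with w⋆ taken from (B.1) with l̄ = l′ (the parameter values below are illustrative assumptions):

```python
from math import comb, floor, log

def rcu_bms_bsc(n, delta, xi):
    """RCU bound (B.3) for a BMS(delta)-BSC(xi) pair with t = 1 and
    random codewords drawn uniformly, P_X(x) = 2^{-n}."""
    r = log(delta / (1 - delta)) / log(xi / (1 - xi))  # slope in (B.1)
    total = 0.0
    for l in range(n + 1):
        for w in range(n + 1):
            p = (comb(n, l) * comb(n, w) * delta**l * (1 - delta)**(n - l)
                 * xi**w * (1 - xi)**(n - w))
            inner = 0.0
            for lp in range(n + 1):
                wstar = w + (l - lp) * r               # (B.1) with lbar = l'
                kmax = min(n, floor(wstar))
                if kmax >= 0:
                    inner += (comb(n, lp)
                              * sum(comb(n, m) for m in range(kmax + 1)) * 2.0**(-n))
            total += p * min(1.0, inner)
    return total

print(rcu_bms_bsc(16, 0.05, 0.11))
```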
B.1.2 Error exponents for BMS and BSC
The Gallager-Csiszár and sphere-packing exponents for the BMS-BSC pair are given by
$$E_{GC} = \max_{\rho\in[0,1]} E_0\!\left(\rho, \frac{1}{1+\rho}, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right) \quad (B.6)$$
$$E_{sp} = \max_{\rho\geq 0} E_0\!\left(\rho, \frac{1}{1+\rho}, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right), \quad (B.7)$$
respectively, where
$$E_0\!\left(\rho, s, 2^{-n}, P_{Y|X}\right) = -\log\left[\left((1-\xi)^{1-\rho s} + \xi^{1-\rho s}\right) 2^{-\rho}\left(\xi^s + (1-\xi)^s\right)^\rho\right] \quad (B.9)$$
and
$$E_s(\rho, s, P_V) = \log\left(\delta^{1-\rho s} + (1-\delta)^{1-\rho s}\right) + \rho\log\left(\delta^s + (1-\delta)^s\right). \quad (B.10)$$
Regarding threshold decoding, the Chernoff bound of DT(k, n) (2.84) reads
$$E_{DT^\star} = \max_{\rho\in[0,1]} E_0\!\left(\rho, 1, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right). \quad (B.11)$$
B.2 Application to BMS and BEC source chan-
nel pair
Consider now that the BMS is transmitted over a binary erasure channel (BEC) with erasure probability ξ < 1.0. We consider n = k, i.e., transmission rate t = 1, and evaluate our bounds over the ensemble of random source-channel codes generated by the capacity-achieving distribution $P_X(\boldsymbol{x}) = 2^{-n}$.
For a source vector $\boldsymbol{v}$ with Hamming weight $w_H(\boldsymbol{v}) = l$ and a received vector $\boldsymbol{y}$ with $e$ erasures, the following equality holds:
$$\frac{P_V(\boldsymbol{v})P_{Y|X}(\boldsymbol{y}|\boldsymbol{x})}{P_Y(\boldsymbol{y})} = \delta^l(1-\delta)^{n-l}\,2^{n-e}. \quad (B.12)$$
B.2.1 Achievability JSCC bounds for BMS and BEC
The following expressions are direct applications of Theorem 2.2.2, Corollaries 2.2.1 and 2.2.2, and Theorem 2.2.5 to a BMS-BEC source-channel pair.
Max bound
$$1 - \sum_{l=0}^{n}\sum_{e=0}^{n}\binom{n}{l}\binom{n}{e}\delta^l(1-\delta)^{n-l}\xi^e(1-\xi)^{n-e} \prod_{l'=0}^{l}\left(1 - 2^{e-n}\right)^{\binom{n}{l'}} \quad (B.13)$$
RCU bound
$$\sum_{l=0}^{n}\sum_{e=0}^{n}\binom{n}{l}\binom{n}{e}\delta^l(1-\delta)^{n-l}\xi^e(1-\xi)^{n-e}\min\left(1,\ 2^{e-n}\sum_{l'=0}^{l}\binom{n}{l'}\right) \quad (B.14)$$
Tilted RCU bound
$$\sum_{l=0}^{n}\sum_{e=0}^{n}\binom{n}{l}\binom{n}{e}\delta^l(1-\delta)^{n-l}\xi^e(1-\xi)^{n-e}\min\left(1,\ \frac{\left(\delta^s + (1-\delta)^s\right)^n 2^{e-n}}{\delta^{sl}(1-\delta)^{s(n-l)}}\right) \quad (B.15)$$
Optimized DT bound
$$\sum_{l=0}^{n}\sum_{e=0}^{n}\binom{n}{l}\binom{n}{e}\min\left(\delta^l(1-\delta)^{n-l}\xi^e(1-\xi)^{n-e},\ \xi^e\left(\frac{1-\xi}{2}\right)^{n-e}\sum_{l'=0}^{l}\binom{n}{l'}\delta^{l'}(1-\delta)^{n-l'}\right) \quad (B.16)$$
B.2.2 Error exponents for BMS and BEC
The Gallager-Csiszár and sphere-packing exponents for the BMS-BEC pair are given by
$$E_{GC} = \max_{\rho\in[0,1]} E_0\!\left(\rho, \frac{1}{1+\rho}, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right) \quad (B.17)$$
$$E_{sp} = \max_{\rho\geq 0} E_0\!\left(\rho, \frac{1}{1+\rho}, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right), \quad (B.18)$$
respectively, where
$$E_0\!\left(\rho, s, 2^{-n}, P_{Y|X}\right) = -\log\left(2^{-\rho}(1-\xi) + \xi\right) \quad (B.19)$$
and
$$E_s(\rho, s, P_V) = \log\left(\delta^{1-\rho s} + (1-\delta)^{1-\rho s}\right) + \rho\log\left(\delta^s + (1-\delta)^s\right). \quad (B.20)$$
Finally, the Chernoff bound of DT(k, n) (2.84) reads
$$E_{DT^\star} = \max_{\rho\in[0,1]} E_0\!\left(\rho, 1, 2^{-n}, P_{Y|X}\right) - E_s\!\left(\rho, \frac{1}{1+\rho}, P_V\right). \quad (B.21)$$
Appendix C
C.1 Proof of Corollary 4.1.1
We first derive the MMSE for our particular ternary input distribution in thegeneral fixed-point equation (3.12):
$$\mathrm{MMSE}(\eta\gamma, \alpha) = \mathbb{E}\left[\left(b_k - \mathbb{E}\{b_k|\boldsymbol{S}, \boldsymbol{y}\}\right)^2\right] \quad (C.1)$$
$$= \alpha - \int \frac{\mathbb{E}^2_{b_k}\{b_k P(y|\eta, b_k, \boldsymbol{S})\}}{\mathbb{E}_{b_k}\{P(y|\eta, b_k, \boldsymbol{S})\}}\,\mathrm{d}y \quad (C.2)$$
$$= \alpha - \int \frac{\left(\sqrt{\frac{\eta}{2\pi}}\,\frac{\alpha}{2}\left[e^{-\frac{\eta}{2}(y-\sqrt{\gamma})^2} - e^{-\frac{\eta}{2}(y+\sqrt{\gamma})^2}\right]\right)^2}{\sqrt{\frac{\eta}{2\pi}}\left(\frac{\alpha}{2}\left[e^{-\frac{\eta}{2}(y-\sqrt{\gamma})^2} + e^{-\frac{\eta}{2}(y+\sqrt{\gamma})^2}\right] + (1-\alpha)e^{-\frac{\eta}{2}y^2}\right)}\,\mathrm{d}y \quad (C.3)$$
$$= \alpha - \frac{1}{2}\sqrt{\frac{\eta}{2\pi}}\left[\int e^{-\frac{\eta}{2}(y-\sqrt{\gamma})^2}\,\frac{\alpha^2\sinh(\eta y\sqrt{\gamma})}{\alpha\cosh(\eta y\sqrt{\gamma}) + (1-\alpha)e^{\frac{\eta\gamma}{2}}}\,\mathrm{d}y - \int e^{-\frac{\eta}{2}(y'+\sqrt{\gamma})^2}\,\frac{\alpha^2\sinh(\eta y'\sqrt{\gamma})}{\alpha\cosh(\eta y'\sqrt{\gamma}) + (1-\alpha)e^{\frac{\eta\gamma}{2}}}\,\mathrm{d}y'\right]. \quad (C.4)\text{-}(C.5)$$
After an appropriate change of variables, the MMSE is
$$\mathrm{MMSE}(\eta\gamma, \alpha) = \alpha - \int \frac{1}{\sqrt{2\pi}}\,e^{-\frac{y^2}{2}}\,\frac{\alpha^2\sinh\left[\eta\gamma - y\sqrt{\eta\gamma}\right]}{\alpha\cosh\left[\eta\gamma - y\sqrt{\eta\gamma}\right] + (1-\alpha)e^{\eta\gamma/2}}\,\mathrm{d}y. \quad (C.6)$$
Since the SNR is constant among users, it follows that the large-system fixed-point equation is given by (4.2).
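Equation (C.6) is a one-dimensional Gaussian average and is straightforward to evaluate numerically. A minimal quadrature sketch (the grid limits are an assumption; valid for moderate ηγ, before cosh overflows):

```python
import numpy as np

def mmse_ternary(eta, gamma, alpha):
    """Numerical evaluation of (C.6): the MMSE under the ternary prior
    {alpha/2, 1 - alpha, alpha/2} at effective SNR eta*gamma."""
    y = np.linspace(-10.0, 10.0, 8001)
    # N(0,1) quadrature weights on a uniform grid
    w = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi) * (y[1] - y[0])
    a = eta * gamma - y * np.sqrt(eta * gamma)
    integrand = alpha**2 * np.sinh(a) / (alpha * np.cosh(a)
                                         + (1 - alpha) * np.exp(eta * gamma / 2))
    return alpha - np.sum(w * integrand)
```

As expected, the result approaches the prior second moment α as the SNR vanishes, and decreases monotonically with ηγ.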
C.2 Proof of Theorem 4.1.1
From now on, we omit the explicit indication of the arguments of the MMSE function. The lower bound is obtained by noting that, for large SNR, the general MMSE, denoted $\mathrm{MMSE}_\alpha$, is lower-bounded by $\mathrm{MMSE}_{\{0,1\}}$, which describes the detection performance when the transmitted symbols are $\{0, 1\}$ with probabilities $\{1-\alpha, \alpha\}$. In this case the MMSE has the following form:
$$\mathrm{MMSE}_{\{0,1\}} = \alpha - \int \frac{\left(\sqrt{\frac{\eta}{2\pi}}\,\alpha\,e^{-\frac{\eta}{2}(y-\sqrt{\gamma})^2}\right)^2}{\sqrt{\frac{\eta}{2\pi}}\left(\alpha\,e^{-\frac{\eta}{2}(y-\sqrt{\gamma})^2} + (1-\alpha)e^{-\frac{\eta}{2}y^2}\right)}\,\mathrm{d}y \quad (C.7)$$
where $\lambda_\alpha \triangleq \frac{1}{2}\ln\left(\frac{1-\alpha}{\alpha}\right)$. After some manipulation and appropriate changes of variables, we obtain
$$\mathrm{MMSE}_{\{0,1\}} = \alpha - \alpha\,e^{-\frac{3\eta\gamma}{8}-\lambda_\alpha}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left(2z-\sqrt{\eta\gamma}+\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)^2}\,\mathrm{sech}\left(z\sqrt{\eta\gamma}\right)\mathrm{d}z. \quad (C.8)$$
Use now the following asymptotic expansion for sech(z) as $|z|\to\infty$:
$$\mathrm{sech}(z) = 2e^{-|z|}\left(1 + \sum_{\ell=1}^{\infty}(-1)^\ell e^{-2\ell|z|}\right) \quad (C.9)$$
and obtain
MMSE{0,1}
= α− αe− 3ηγ8−λα
√1
2π
[∫ 0
−∞e− 1
2(2z−√ηγ+ 2λα√
ηγ)2
2ez√ηγ
(1 +
∞∑
`=1
(−1)`e2`z√ηγ
)dz
+
∫ ∞
0
e− 1
2(2z−√ηγ+ 2λα√
ηγ)2
2e−z√ηγ
(1 +
∞∑
`=1
(−1)`e−2`z√ηγ
)dz
].
Expanding the squares and collecting terms yields
$$\mathrm{MMSE}_{\{0,1\}} = \alpha - \alpha\,e^{-\eta\gamma/8+\lambda_\alpha-\frac{2\lambda_\alpha^2}{\eta\gamma}}\left[\sum_{\ell=0}^{\infty}(-1)^\ell\sqrt{\frac{2}{\pi}}\int_{-\infty}^{0} e^{-2z^2+\left((3+2\ell)\sqrt{\eta\gamma}-\frac{4\lambda_\alpha}{\sqrt{\eta\gamma}}\right)z}\,\mathrm{d}z \right. \quad (C.10)$$
$$\left. +\ \sum_{\ell=0}^{\infty}(-1)^\ell\sqrt{\frac{2}{\pi}}\int_{0}^{\infty} e^{-2z^2+\left((1-2\ell)\sqrt{\eta\gamma}-\frac{4\lambda_\alpha}{\sqrt{\eta\gamma}}\right)z}\,\mathrm{d}z\right]. \quad (C.11)$$
128
C.2 Proof of Theorem 4.1.1
Express now the integrals in terms of the Q-function, $Q(x) \triangleq \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-\frac{t^2}{2}}\,\mathrm{d}t$:
$$\mathrm{MMSE}_{\{0,1\}} = \alpha\,e^{-\eta\gamma/8+\lambda_\alpha-\frac{2\lambda_\alpha^2}{\eta\gamma}}\left[e^{\frac{1}{2}\left(\frac{\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right) - e^{\frac{1}{2}\left(\frac{3\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{3\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)\right.$$
$$- \sum_{\ell=1}^{\infty}(-1)^\ell\left[e^{\frac{1}{2}\left(\frac{(2\ell-1)\sqrt{\eta\gamma}}{2}+\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(2\ell-1)\sqrt{\eta\gamma}}{2}+\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)\right.$$
$$\left.\left. +\ e^{\frac{1}{2}\left(\frac{(3+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(3+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\lambda_\alpha}{\sqrt{\eta\gamma}}\right)\right]\right].$$
Next, use the asymptotic expansion of the Q-function [63],
$$Q(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}\left(1 + \sum_{\ell=1}^{\infty}\frac{(-1)^\ell\prod_{q=1}^{\ell}(2q-1)}{x^{2\ell}}\right), \quad (C.12)$$
to obtain
$$\mathrm{MMSE}_{\{0,1\}} = 2\sqrt{\frac{\alpha(1-\alpha)}{2\pi\eta\gamma}}\,e^{-\eta\gamma/8-\frac{2\lambda_\alpha^2}{\eta\gamma}}\left[\frac{1}{1-\frac{4\lambda_\alpha}{\eta\gamma}} + \frac{1}{1+\frac{4\lambda_\alpha}{\eta\gamma}} - \frac{1}{3-\frac{4\lambda_\alpha}{\eta\gamma}} - \frac{1}{3+\frac{4\lambda_\alpha}{\eta\gamma}} + \cdots + O\!\left(\frac{1}{\sqrt{\eta\gamma}}\right)\right] \quad (C.13)\text{-}(C.15)$$
where the linear term in $\lambda_\alpha$ is absorbed into the common factor. Assuming a large value of $\eta\gamma$, and using the series identity
$$2\sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1} = \frac{\pi}{2}, \quad (C.16)$$
we obtain the following lower bound:
$$\mathrm{MMSE}_{\{0,1\}} > 2\sqrt{\frac{\alpha(1-\alpha)}{2\pi\eta\gamma}}\,e^{-\eta\gamma/8}\,\sqrt{2} = 2\sqrt{\frac{\alpha(1-\alpha)}{\pi\eta\gamma}}\,e^{-\eta\gamma/8}. \quad (C.17)$$
As far as the upper bound is concerned, we relate the general MMSE, denoted $\mathrm{MMSE}_\alpha$, to its particular case when all users are assumed to be active, denoted $\mathrm{MMSE}_1$. Hence, we express the corresponding integrals in an analogous manner:
$$\zeta_\alpha = \int \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}\,\frac{\alpha\sinh(y\sqrt{\eta\gamma})}{\alpha\cosh(y\sqrt{\eta\gamma}) + (1-\alpha)e^{\frac{\eta\gamma}{2}}}\,\mathrm{d}y, \quad (C.18)$$
$$\zeta_1 = \int \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}\,\tanh(y\sqrt{\eta\gamma})\,\mathrm{d}y. \quad (C.19)$$
We now obtain
$$\mathrm{MMSE}_\alpha = \alpha(1-\zeta_\alpha) = \alpha(1+\zeta_1-\zeta_1-\zeta_\alpha) = \alpha\left((1-\zeta_1) + (\zeta_1-\zeta_\alpha)\right) \quad (C.20)$$
$$= \alpha\left(\mathrm{MMSE}_1 + (\zeta_1-\zeta_\alpha)\right). \quad (C.21)$$
Next, we expand $\alpha(\zeta_1-\zeta_\alpha)$, which yields
$$\alpha(\zeta_1-\zeta_\alpha) = (1-\alpha)e^{\eta\gamma/2}\int \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}\,\frac{\alpha\sinh[y\sqrt{\eta\gamma}]}{\alpha\cosh^2[y\sqrt{\eta\gamma}] + (1-\alpha)e^{\eta\gamma/2}\cosh(y\sqrt{\eta\gamma})}\,\mathrm{d}y \quad (C.22)$$
$$= (1-\alpha)e^{\eta\gamma/2}\int \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}\,\frac{\tanh[y\sqrt{\eta\gamma}]}{\cosh[y\sqrt{\eta\gamma}] + e^{\eta\gamma/2+\ln\left(\frac{1-\alpha}{\alpha}\right)}}\,\mathrm{d}y. \quad (C.23)$$
Consider now the following inequalities:
$$\tanh(z) \leq 1 \quad\text{and}\quad \cosh(z) \geq \frac{e^z}{2}. \quad (C.24)$$
After substitution and manipulation of the denominator of (C.23), we obtain
$$\alpha(\zeta_1-\zeta_\alpha) \leq (1-\alpha)e^{\eta\gamma/2}\int_{-\infty}^{\infty} \frac{e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}}{\sqrt{2\pi}}\,\frac{1}{\frac{e^{y\sqrt{\eta\gamma}}}{2} + e^{\eta\gamma/2+\ln\left(\frac{1-\alpha}{\alpha}\right)}}\,\mathrm{d}y \quad (C.25)$$
$$= (1-\alpha)e^{\eta\gamma/2}\int_{-\infty}^{\infty} \frac{e^{-\frac{(y-\sqrt{\eta\gamma})^2}{2}}}{\sqrt{2\pi}}\,\frac{e^{-\frac{y\sqrt{\eta\gamma}}{2}-\frac{\eta\gamma}{4}-\phi_\alpha}}{\cosh\left(\frac{y\sqrt{\eta\gamma}}{2}-\frac{\eta\gamma}{4}-\phi_\alpha\right)}\,\mathrm{d}y, \quad (C.26)$$
where $\phi_\alpha \triangleq \frac{1}{2}\ln\left(\frac{2(1-\alpha)}{\alpha}\right)$. We rearrange terms to express the integral in a convenient form:
$$\alpha(\zeta_1-\zeta_\alpha) \leq (1-\alpha)e^{-\eta\gamma/8-\phi_\alpha}\int_{-\infty}^{\infty} \frac{e^{-\frac{1}{2}\left(y-\frac{\sqrt{\eta\gamma}}{2}\right)^2}}{\sqrt{2\pi}}\,\mathrm{sech}\left(\frac{y\sqrt{\eta\gamma}}{2}-\frac{\eta\gamma}{4}-\phi_\alpha\right)\mathrm{d}y. \quad (C.27)$$
We now use the transformation $y = 2z + \frac{\sqrt{\eta\gamma}}{2} + \frac{2\phi_\alpha}{\sqrt{\eta\gamma}}$ in the variable of integration,
$$\alpha(\zeta_1-\zeta_\alpha) \leq 2(1-\alpha)e^{\frac{7\eta\gamma}{8}-\phi_\alpha}\int_{-\infty}^{0} \frac{e^{-\frac{1}{2}\left(2z-2\sqrt{\eta\gamma}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2}}{\sqrt{2\pi}}\,\mathrm{sech}\left(z\sqrt{\eta\gamma}\right)\mathrm{d}z \quad (C.28)$$
$$+\ 2(1-\alpha)e^{-\eta\gamma/8-\phi_\alpha}\int_{0}^{\infty} \frac{e^{-\frac{1}{2}\left(2z+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2}}{\sqrt{2\pi}}\,\mathrm{sech}\left(z\sqrt{\eta\gamma}\right)\mathrm{d}z \quad (C.29)$$
and the asymptotic expansion for sech(z) in the above derivation,
$$\mathrm{sech}(z) = 2e^{-|z|}\left(1 + \sum_{\ell=1}^{\infty}(-1)^\ell e^{-2\ell|z|}\right). \quad (C.30)$$
This yields
$$\alpha(\zeta_1-\zeta_\alpha) \leq 4(1-\alpha)e^{-\eta\gamma/8-\phi_\alpha-\frac{2\phi_\alpha^2}{\eta\gamma}}\sum_{\ell=0}^{\infty}(-1)^\ell\int_{0}^{\infty} \frac{e^{-2z^2+\left((1+2\ell)\sqrt{\eta\gamma}-\frac{4\phi_\alpha}{\sqrt{\eta\gamma}}\right)z}}{\sqrt{2\pi}}\,\mathrm{d}z \quad (C.31)$$
$$+\ 4(1-\alpha)e^{-\eta\gamma/8-\phi_\alpha-\frac{2\phi_\alpha^2}{\eta\gamma}}\sum_{\ell=0}^{\infty}(-1)^\ell\int_{0}^{\infty} \frac{e^{-2z^2-\left((1+2\ell)\sqrt{\eta\gamma}+\frac{4\phi_\alpha}{\sqrt{\eta\gamma}}\right)z}}{\sqrt{2\pi}}\,\mathrm{d}z. \quad (C.32)$$
Finally, expressing the integrals in terms of the Q-function,
$$\alpha(\zeta_1-\zeta_\alpha) \leq 2(1-\alpha)e^{-\eta\gamma/8-\phi_\alpha-\frac{2\phi_\alpha^2}{\eta\gamma}}\left[\sum_{\ell=0}^{\infty}(-1)^\ell\,e^{\frac{1}{2}\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)\right. \quad (C.33)\text{-}(C.34)$$
$$\left. +\ \sum_{\ell=0}^{\infty}(-1)^\ell\,e^{\frac{1}{2}\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)\right] \quad (C.35)$$
and manipulating the expansion,
$$\alpha(\zeta_1-\zeta_\alpha) \leq 2(1-\alpha)e^{-\eta\gamma/8-\phi_\alpha-\frac{2\phi_\alpha^2}{\eta\gamma}}\left[e^{\frac{1}{2}\left(\frac{\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)\right. \quad (C.36)$$
$$+\ e^{\frac{1}{2}\left(\frac{\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right) \quad (C.37)$$
$$+\ \sum_{\ell=1}^{\infty}(-1)^\ell\,e^{\frac{1}{2}\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}-\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right) \quad (C.38)$$
$$\left. +\ \sum_{\ell=1}^{\infty}(-1)^\ell\,e^{\frac{1}{2}\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)^2} Q\!\left(\frac{(1+2\ell)\sqrt{\eta\gamma}}{2}+\frac{2\phi_\alpha}{\sqrt{\eta\gamma}}\right)\right], \quad (C.39)$$
we obtain, after using the series expansion (C.16),
$$\alpha(\zeta_1-\zeta_\alpha) \leq 2\sqrt{\frac{(1-\alpha)\alpha}{\pi\eta\gamma}}\,e^{-\eta\gamma/8}\left[\frac{1}{1-\frac{4\phi_\alpha}{\eta\gamma}} + \frac{1}{1+\frac{4\phi_\alpha}{\eta\gamma}} - \frac{1}{3-\frac{4\phi_\alpha}{\eta\gamma}} - \frac{1}{3+\frac{4\phi_\alpha}{\eta\gamma}} + \cdots + O\!\left(\frac{1}{\sqrt{\eta\gamma}}\right)\right], \quad (C.40)\text{-}(C.42)$$
where the linear term in $\phi_\alpha = \frac{1}{2}\ln\left(\frac{2(1-\alpha)}{\alpha}\right)$ is absorbed into the common factor and quadratic terms are neglected.
Using the same result as before on the series in (C.16), we obtain the upper bound
$$\alpha(\zeta_1-\zeta_\alpha) < \sqrt{\frac{\pi\alpha(1-\alpha)}{\eta\gamma}}\,e^{-\eta\gamma/8}. \quad (C.43)$$
Finally, using the upper bound given in Lemma 4.1.1 for BPSK, the overall MMSE can be upper-bounded by
$$\mathrm{MMSE}_\alpha \leq 2\alpha\,e^{-\eta\gamma/2} + \sqrt{\frac{\pi\alpha(1-\alpha)}{\eta\gamma}}\,e^{-\eta\gamma/8}. \quad (C.44)$$
C.3 Proof of Theorem 4.2.1
We analyze the function
$$G(\eta) = (1-\eta)\,\frac{e^{u\eta}}{\sqrt{\eta}}, \quad (C.45)$$
where $u$ is a constant; this function entirely describes the dependence of $\Upsilon_\beta$ on $\eta$ for sufficiently large $\eta\gamma$. By differentiating (C.45), it is easy to observe that $G$ has critical points in the domain $(0, 1]$ if and only if $u \geq (3+\sqrt{8})/2$. These points are
$$\eta_m = \frac{u - \frac{1}{2} - \sqrt{u^2 - 3u + \frac{1}{4}}}{2u} \quad (C.46)$$
$$\eta_M = \frac{u - \frac{1}{2} + \sqrt{u^2 - 3u + \frac{1}{4}}}{2u} \quad (C.47)$$
and lie in the domain (0, 1]. In fact, note that u2−3u+1/4 < (u−3/2)2, and thusit can be verified that 0 < 1/2u < ηm < ηM < 1− 1/u < 1. By using the secondderivative of (C.45) we observe that these solutions correspond to a local mini-mum and a local maximum, respectively. Let us now study the function (C.45) tojustify the range of values of the critical system load. G(η) is a continuous functionin (0, 1] that takes positive values. Since G(η) tends to 0 as η approaches 1, andtends to infinity as η approaches 0, it can be concluded that the range for whichG(η) has only one pre-image is (0, G(ηm)) ∪ (G(ηM),∞). Hence, there are sin-gle pre-images in the ranges (0,min{G−1(G(ηM))}) and (max{G−1(G(ηm))}, 1].For G(ηm) < G(.) < G(ηM), there are three pre-images and for G(ηm) andG(ηM) there are exactly two due to the local minimum and maximum (See Fig.4.5). Then, the smallest pre-image among them lies on [min{G−1(G(ηM))}, ηm]whereas the largest lies on [ηM ,max{G−1(G(ηm))}]. In conclusion, the overallsmallest pre-images belong to
R1 = (0,min{G−1(G(ηM))}) ∪ [min{G−1(G(ηM))}, ηm] = (0, ηm], (C.48)
whereas the largest pre-images belong to
R2 = [ηM ,max{G−1(G(ηm))}] ∪ [max{G−1(G(ηm))}, 1] = [ηM , 1]. (C.49)
In particular, $R_{12} \triangleq (0, \min\{G^{-1}(G(\eta_M))\}) \subset R_1$ and $R_{22} \triangleq (\max\{G^{-1}(G(\eta_m))\}, 1] \subset R_2$. By bounding the MMSE using (4.5), replacing $u = \gamma/8$, and denoting $R_b = R_1$, $R_{bc} = R_{12}$, $R_g = R_2$, $R_{gc} = R_{22}$, we obtain the desired result.
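The critical points (C.46)-(C.47) are easy to verify numerically. A sketch that computes them and checks that the derivative of G indeed vanishes there (the value u = 5 is an arbitrary test point satisfying u > (3 + √8)/2):

```python
import math

def G(eta, u):
    """The function (C.45)."""
    return (1 - eta) * math.exp(u * eta) / math.sqrt(eta)

def critical_points(u):
    """eta_m, eta_M from (C.46)-(C.47); None when u < (3 + sqrt(8))/2."""
    disc = u * u - 3 * u + 0.25
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (u - 0.5 - root) / (2 * u), (u - 0.5 + root) / (2 * u)

eta_m, eta_M = critical_points(5.0)
print(eta_m, eta_M)  # both inside (1/(2u), 1 - 1/u), as claimed above
```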
Appendix D
D.1 Proof of Claim 5.4.1
The proof relies on the so-called replica method, which is a common tool in statistical physics and has proved to be a powerful technique in detection analysis. We generalize the application of this method in [28, 57] to multiuser detection in various ways. Firstly, the iterative decoding framework described above requires a generalization of classic uncoded detection to the case of arbitrary and possibly correlated symbol prior probabilities [58], under the assumption that in the large-system limit the empirical distribution of these priors converges almost everywhere to some deterministic function (concentration). The proof also extends the results for log-ratio prior probabilities in [7] to the case of arbitrary extrinsic message probabilities (p_ext), whose distribution is in turn governed by a Bernoulli variable A_α with mean equal to the activity rate.
We start by analyzing an optimal generic multiuser iterative detector for coded CDMA. The receiver postulates an AWGN channel with noise variance σ² and prior probability p_X, whereas, without loss of generality, the true noise variance is σ₀² = 1 and the true prior probability is p_{X₀}. The replica method consists of adding n further K × L inputs X₁, ..., Xₙ, with corresponding postulated channels, to the true one, whose input is X₀. Note that the replicas X = (X₀, ..., Xₙ) are independent but in general have different prior probabilities due to the random extrinsic information provided by the decoders.
By applying Varadhan's lemma and assuming replica symmetry [46] among the solutions of the resulting optimization problem, the free energy can be expressed as
\[
\mathcal{F} = \lim_{n\to 0}\frac{\partial}{\partial n}\left(\inf_{\{c,d,f,g\}}\;\sup_{\{r,p,m,q\}}\left\{\beta^{-1}G_n - I_n\right\}\right) \qquad \text{(D.1)}
\]
where $c=(c_1,\ldots,c_L)$, $d=(d_1,\ldots,d_L)$, $f=(f_1,\ldots,f_L)$, $g=(g_1,\ldots,g_L)$, $r=(r_1,\ldots,r_L)$, $p=(p_1,\ldots,p_L)$, $m=(m_1,\ldots,m_L)$, $q=(q_1,\ldots,q_L)\in\mathbb{R}^L$,
\[
G_n = \frac{1}{2}\sum_{l=1}^{L}\log\frac{\left(1+\frac{\beta}{\sigma^2}(p_l-q_l)\right)^{1-n}}{1+\frac{\beta}{\sigma^2}(p_l-q_l)+\frac{n}{\sigma^2}\left(1+\beta(r_l-2m_l+q_l)\right)} - \frac{nL}{2}\log(2\pi\sigma^2) \qquad \text{(D.2)}
\]
and
\[
I_n = \sum_{l=1}^{L}\left(r_l c_l + n\,p_l g_l + 2n\,m_l d_l + n(n-1)\,q_l f_l\right) - \lim_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\log\Lambda_k^n \qquad \text{(D.3)}
\]
is the rate function obtained from the application of the Gartner-Ellis theorem [20], where $\Lambda_k^n(c,d,f,g)$ is the moment generating function, which in turn can be developed as
\[
\Lambda_k^n = \mathbb{E}_{X,\gamma}\left[\exp\left(\gamma\sum_{l=1}^{L}\bigl(x_k^{(l)}\bigr)^{\mathsf T} Q_l\, x_k^{(l)}\right)\right] \qquad \text{(D.4)}
\]
where $\sum_{l=1}^{L}\bigl(x_k^{(l)}\bigr)^{\mathsf T} Q_l\, x_k^{(l)}$ equals
\[
\sum_{l=1}^{L}\left(2d_l\sum_{a=1}^{n} x_{k,l,0}\,x_{k,l,a} + 2f_l\sum_{1\le a<b\le n} x_{k,l,a}\,x_{k,l,b} + c_l\,(x_{k,l,0})^2 + g_l\sum_{a=1}^{n}(x_{k,l,a})^2\right) \qquad \text{(D.5)}
\]
where $x_k^{(l)} = (x_{k,l,0}, x_{k,l,1},\ldots,x_{k,l,n})$ is the vector formed by concatenating the $n+1$ replicas for user $k$ and codeword position $l$, and $\mathbf{Q} = (Q_1,\ldots,Q_L)$, with each $Q_l$ being an $(n+1)\times(n+1)$ matrix of parameters, which under replica symmetry reads
\[
Q_{00,l}=c_l,\qquad Q_{0a,l}=d_l,\qquad Q_{aa,l}=g_l,\qquad Q_{ab,l}=f_l. \qquad \text{(D.6)}
\]
The overall computation of the free energy is equivalent to the optimization of $H(\mathbf{Q}) \triangleq \beta^{-1}G_n - I_n$ over the symmetric replicas at $n=0$ [27]. Hence, we immediately obtain
\[
\lim_{n\to 0}\frac{\partial H}{\partial r_l}=0 \;\Rightarrow\; c_l = 0 \qquad \text{(D.7)}
\]
\[
\lim_{n\to 0}\frac{\partial H}{\partial m_l}=0 \;\Rightarrow\; d_l = \frac{1}{2(\sigma^2+\beta p_l)} \qquad \text{(D.8)}
\]
\[
\lim_{n\to 0}\frac{\partial H}{\partial q_l}=0 \;\Rightarrow\; f_l = \frac{1+\beta(r_l-2m_l+q_l)}{2(\sigma^2+\beta p_l)} \qquad \text{(D.9)}
\]
\[
\lim_{n\to 0}\frac{\partial H}{\partial d_l}=0 \;\Rightarrow\; g_l = f_l - d_l \qquad \text{(D.10)}
\]
for $l=1,\ldots,L$. The rest of the parameters are found by differentiating the moment generating function (D.4) with respect to $c, d, f, g$:
\[
\lim_{n\to 0}\frac{\partial H}{\partial(\cdot)} = \lim_{n\to 0}\lim_{K\to\infty}\frac{1}{K}\sum_{k=1}^{K}\frac{\partial\bigl(\log\Lambda_k^n\bigr)}{\partial(\cdot)} = 0 \qquad \text{(D.11)}
\]
Since we assume independence between users' codewords, we apply the law of large numbers to the quantities $\bigl\{\partial(\log\Lambda_k^n)/\partial(\cdot)\bigr\}_k$, which are i.i.d. with respect to the measure induced by the random variables $\{A_\alpha, p_{\mathrm{ext}}, \gamma\}$:
\[
r_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{\gamma\, x_{l,0}^2\,\exp\bigl(\sum_{l=1}^{L}(x^{(l)})^{\mathsf T}Q_l\,x^{(l)}\bigr)}{\Lambda_k^n}\right] \qquad \text{(D.12)}
\]
\[
n\,d_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{\gamma\sum_{a=1}^{n} x_{l,a}^2\,\exp\bigl(\sum_{l=1}^{L}(x^{(l)})^{\mathsf T}Q_l\,x^{(l)}\bigr)}{\Lambda_k^n}\right] \qquad \text{(D.13)}
\]
\[
2n\,m_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{2\gamma\sum_{a=1}^{n} x_{l,0}\,x_{l,a}\,\exp\bigl(\sum_{l=1}^{L}(x^{(l)})^{\mathsf T}Q_l\,x^{(l)}\bigr)}{\Lambda_k^n}\right] \qquad \text{(D.14)}
\]
\[
n(n-1)\,q_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{2\gamma\sum_{1\le a<b\le n} x_{l,a}\,x_{l,b}\,\exp\bigl(\sum_{l=1}^{L}(x^{(l)})^{\mathsf T}Q_l\,x^{(l)}\bigr)}{\Lambda_k^n}\right] \qquad \text{(D.15)}
\]
for $l=1,\ldots,L$, where $\Lambda_k^n = \mathbb{E}_{X,\gamma}\bigl[\exp\bigl(\gamma\sum_{l=1}^{L}(x_k^{(l)})^{\mathsf T}Q_l\,x_k^{(l)}\bigr)\bigr]$. We can evaluate (D.4) by using the unit-area property of the Gaussian density
\[
e^{t^2} = \sqrt{\frac{\eta}{2\pi}}\int \exp\left[-\frac{\eta}{2}y^2 + \sqrt{2\eta}\,t\,y\right]\mathrm{d}y, \qquad \forall t,\;\eta>0, \qquad \text{(D.16)}
\]
which yields
\[
\Lambda_k^n = \mathbb{E}_{X,\gamma}\Biggl[\prod_{l=1}^{L}\sqrt{\frac{(d_l)^2}{f_l\,\pi}}\int \exp\Biggl[\sum_{l=1}^{L}\Bigl(-\frac{(d_l)^2}{f_l}\bigl(y_l-\sqrt{\gamma}\,x_{l,0}\bigr)^2 + c_l\,\gamma\, x_{l,0}^2\Bigr)\Biggr]\cdot \prod_{a=1}^{n}\mathbb{E}_{x_a}\Biggl[\exp\Biggl[\sum_{l=1}^{L}\Bigl(2d_l\sqrt{\gamma}\,x_{l,a}\,y_l + (g_l-f_l)\,\gamma\,(x_{l,a})^2\Bigr)\Biggr]\,\Bigg|\,\gamma\Biggr]\,\mathrm{d}y\Biggr] \qquad \text{(D.17)}
\]
where $x_a$ is the corresponding transmitted codeword for replica $a$ and $y\in\mathbb{R}^L$. Notice that $c_l = 0$ and $d_l = f_l - g_l$, and thus (D.17) can be written as:
\[
\Lambda_k^n = \mathbb{E}_{X,\gamma}\Biggl[\prod_{l=1}^{L}\sqrt{\frac{\eta_l}{2\pi}}\int \exp\Biggl[-\sum_{l=1}^{L}\frac{\eta_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_{l,0}\bigr)^2\Biggr]\cdot\Biggl\{\mathbb{E}_{x}\Biggl[\exp\Biggl[\sum_{l=1}^{L}\Bigl(\frac{\xi_l\,(y_l)^2}{2}-\frac{\xi_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_l\bigr)^2\Bigr)\Biggr]\,\Bigg|\,\gamma\Biggr]\Biggr\}^{n}\,\mathrm{d}y\Biggr] \qquad \text{(D.18)}
\]
where $\eta,\xi\in\mathbb{R}^L$, with $\eta_l = 2(d_l)^2/f_l$ and $\xi_l = 2d_l$, $l=1,\ldots,L$, the multiuser efficiencies (for each symbol) of the original and postulated channels, respectively. Then, it is easy to see that
\[
\lim_{n\to 0}\Lambda_k^n = 1. \qquad \text{(D.19)}
\]
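The Gaussian linearization (D.16) used in the steps above can be sanity-checked numerically; the quadrature below is an illustrative sketch (grid sizes and test values are arbitrary), not part of the proof.

```python
import numpy as np

def hs_rhs(t, eta, n_pts=200001, half_width=60.0):
    """Right-hand side of (D.16): sqrt(eta/2pi) * int exp(-eta*y^2/2 + sqrt(2*eta)*t*y) dy."""
    y = np.linspace(-half_width, half_width, n_pts)
    dy = y[1] - y[0]
    integrand = np.exp(-0.5 * eta * y**2 + np.sqrt(2.0 * eta) * t * y)
    return np.sqrt(eta / (2.0 * np.pi)) * integrand.sum() * dy

# check e^{t^2} against the Gaussian integral for a few (t, eta) pairs
for t in (0.0, 0.7, -1.3):
    for eta in (0.5, 2.0):
        assert abs(np.exp(t**2) - hs_rhs(t, eta)) < 1e-5 * np.exp(t**2)
```

The completion of the square behind (D.16) is what allows the replicated quadratic form in (D.17) to factor over the replica index.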
The parameter $r_l$ can therefore be computed straightforwardly:
\[
\lim_{n\to 0}\frac{\partial H}{\partial c_l}=0 \;\Rightarrow\; r_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{(x_{l,0})^2\,\gamma\,\exp\bigl(\sum_{l=1}^{L}(x_k^{(l)})^{\mathsf T}Q_l\,x_k^{(l)}\bigr)}{\Lambda_k^n}\right] = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\bigl[\gamma\,x_{l,0}^2\bigr] \qquad \text{(D.20)}
\]
for $l=1,\ldots,L$. Note that for $d_l$ we have
\[
\lim_{n\to 0}\frac{\partial H}{\partial g_l}=0 \;\Rightarrow\; n\,d_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{\gamma\sum_{a=1}^{n}(x_{l,a})^2\,\exp\bigl(\sum_{l=1}^{L}(x_k^{(l)})^{\mathsf T}Q_l\,x_k^{(l)}\bigr)}{\Lambda_k^n}\right] = \lim_{n\to 0} n\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{\gamma\,(x_l)^2\,\exp\bigl(\sum_{l=1}^{L}(x_k^{(l)})^{\mathsf T}Q_l\,x_k^{(l)}\bigr)}{\Lambda_k^n}\right]
\]
where the symmetry among the replicas is used. Then,
\[
d_l = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},x,\gamma}\bigl[\gamma\,(x_l)^2\bigr] \qquad \text{(D.21)}
\]
for $l=1,\ldots,L$.
Similarly, we have
\[
\lim_{n\to 0}\frac{\partial H}{\partial d_l}=0 \;\Rightarrow\; 2n\,m_l = \lim_{n\to 0}\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\frac{2\gamma\sum_{a=1}^{n} x_{l,0}\,x_{l,a}\,\exp\bigl(\sum_{l=1}^{L}(x_k^{(l)})^{\mathsf T}Q_l\,x_k^{(l)}\bigr)}{\Lambda_k^n}\right] \qquad \text{(D.22)}
\]
\[
= \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[\prod_{l=1}^{L}\sqrt{\frac{\eta_l}{2\pi}}\int \exp\left[-\sum_{l=1}^{L}\frac{\eta_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_{l,0}\bigr)^2\right] 2\gamma\sum_{a=1}^{n} x_{l,0}\,x_{l,a}\;\mathrm{d}y\right] \qquad \text{(D.23)}
\]
\[
= \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},X,\gamma}\left[2\gamma\,x_{l,0}\prod_{l=1}^{L}\sqrt{\frac{\eta_l}{2\pi}}\int \exp\left[-\sum_{l=1}^{L}\frac{\eta_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_{l,0}\bigr)^2\right]\sum_{a=1}^{n}\frac{x_{l,a}\,\sqrt{\frac{\xi_l}{2\pi}}\,\exp\bigl[-\frac{\xi_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_{l,a}\bigr)^2\bigr]}{\mathbb{E}_{x}\Bigl[\sqrt{\frac{\xi_l}{2\pi}}\,\exp\bigl[-\frac{\xi_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_l\bigr)^2\bigr]\,\Big|\,\gamma\Bigr]}\;\mathrm{d}y\right] \qquad \text{(D.24)-(D.25)}
\]
\[
= \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},x,\gamma}\left[2\gamma\,x_{l,0}\prod_{l=1}^{L}\sqrt{\frac{\eta_l}{2\pi}}\int \exp\left[-\sum_{l=1}^{L}\frac{\eta_l}{2}\bigl(y_l-\sqrt{\gamma}\,x_{l,0}\bigr)^2\right]\sum_{a=1}^{n}\bar{x}_{l,a}\;\mathrm{d}y\right] \qquad \text{(D.26)}
\]
where $\sqrt{\frac{\xi_l}{2\pi}}\exp\bigl[-\frac{\xi_l}{2}(y_l-\sqrt{\gamma}\,x_{l,a})^2\bigr]$ is the p.d.f. of an auxiliary Gaussian channel and $\bar{x}_a$ is the posterior mean estimate of the replica component $x_{l,a}$ given the entire received codeword.
Notice that the above expression can be expressed as an expectation over the variables $\gamma$ and $x, z\in\mathbb{R}^L$ given the following equivalent single-user channel:
\[
y = \gamma x + z \qquad \text{(D.27)}
\]
where $z\sim\mathcal{N}(0,\Sigma)$, with $\Sigma = \mathrm{diag}(\eta_1^{-1},\ldots,\eta_L^{-1})$. Remark that Equation (D.27) shows that the optimum detector statistically decomposes the system into a bank of single-user channels [27].
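As an illustration of this decomposition, the per-symbol MMSE estimate on one decoupled channel can be computed by direct Bayes weighting. The BPSK alphabet, the prior, and all scalar values below are assumptions made for the example; they are not taken from the thesis.

```python
import numpy as np

def posterior_mean(y, gamma, noise_var, prior):
    """Posterior-mean (MMSE) estimate of x from y = gamma*x + z, z ~ N(0, noise_var).
    prior: dict mapping symbol value -> prior probability (here an assumed BPSK example)."""
    xs = np.array(list(prior.keys()), dtype=float)
    ps = np.array(list(prior.values()), dtype=float)
    # likelihood of each candidate symbol under the Gaussian channel, weighted by the prior
    w = ps * np.exp(-(y - gamma * xs) ** 2 / (2.0 * noise_var))
    w /= w.sum()
    return float(np.dot(w, xs))

# Example: equiprobable BPSK, unit channel gain, noise variance 1/eta with eta = 2
xhat = posterior_mean(y=0.8, gamma=1.0, noise_var=0.5, prior={+1: 0.5, -1: 0.5})
# for equiprobable BPSK this reduces to tanh(gamma*y/noise_var)
assert abs(xhat - np.tanh(1.0 * 0.8 / 0.5)) < 1e-12
```

With non-uniform (extrinsic) priors the same weighting applies, which is how the extrinsic message probabilities $p_{\mathrm{ext}}$ enter the estimator.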
If $y$ is the received sequence after a codeword $x\sim P_{\mathrm{ext}}$ with SNR $\gamma$ has been transmitted, (D.22) can be rewritten as
\[
2n\,m_l = 2\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x_0,\gamma}\left[\gamma\,x_{l,0}\sum_{a=1}^{n}\bar{x}_{l,a}\right] = 2n\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x_0,\gamma}\left[\gamma\,x_{l,0}\,\bar{x}_{l}\right] \;\Rightarrow\; m_l = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x_0,\gamma}\left[\gamma\,x_{l,0}\,\bar{x}_l\right] \qquad \text{(D.28)}
\]
where the second equality uses that, by replica symmetry, each $\bar{x}_{l,a}$ has the same distribution,
for $l=1,\ldots,L$, where $x$ and $y=\gamma x + z$ denote the transmitted and received codewords once the replicas are cleared out. Notice also that the expectation of all $x_{l,a}$ over $P_{\mathrm{ext}}|A_\alpha$ yields the same result. In fact, the random variable $x_l|A_\alpha$ has the same distribution as $x_{l,a}|A_\alpha$, $\forall a\in\{1,\ldots,n\}$.
Following an analogous procedure, we obtain
\[
q_l = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(\bar{x}_l)^2\bigr] \qquad \text{(D.29)}
\]
for $l=1,\ldots,L$. By simple combination of the replicas, it is easy to see that for each $l=1,\ldots,L$ we have
\[
r_l - 2m_l + q_l = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_{l,0}-\bar{x}_l)^2\bigr] \qquad \text{(D.30)}
\]
\[
d_l - q_l = \mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_l-\bar{x}_l)^2\bigr] \qquad \text{(D.31)}
\]
which in turn leads to the set of fixed-point equations
\[
(\eta_l)^{-1} = 1 + \beta\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_{l,0}-\bar{x}_l)^2\bigr] \qquad \text{(D.32)}
\]
\[
(\xi_l)^{-1} = \sigma^2 + \beta\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_l-\bar{x}_l)^2\bigr] \qquad \text{(D.33)}
\]
using (D.7)-(D.10) and the definitions of $(\eta_l,\xi_l)$. For the case of interest here, i.e., individually optimum detection, the postulated noise variance $\sigma^2$ coincides with the true variance $\sigma_0^2 = 1$, so that the replica solution $\eta=\xi$ is chosen and the fixed-point equation reduces to
\[
(\eta_l)^{-1} = 1 + \beta\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_l-\bar{x}_l)^2\bigr] \qquad \text{(D.34)}
\]
Since $P_{\mathrm{ext}}$ depends on the state of the channel in the previous iteration, we can express the result recursively in the following manner:
\[
\eta_l^{(\ell)} = \Bigl(1 + \beta\,\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma\,(x_l-\bar{x}_l)^2\bigr]\Bigr)^{-1} \qquad \text{(D.35)}
\]
where the MMSE estimate can be expressed as
\[
\bar{x}_l = \bar{x}_l\bigl(\eta^{(\ell)},P_{\mathrm{ext}},\gamma\bigr) = \bar{x}_l\bigl(\eta^{(\ell)},\eta^{(\ell-1)},\gamma\bigr). \qquad \text{(D.36)}
\]
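A recursion of this type can be iterated numerically. The sketch below assumes uncoded, equiprobable BPSK on the decoupled channel (so that the MMSE term takes the classical Gaussian-average form $1-\int\tanh(s+\sqrt{s}\,z)\,\phi(z)\,\mathrm{d}z$ at effective SNR $s=\eta\gamma$), sets the activity rate to $1$, and uses arbitrary illustrative values of $\beta$ and $\gamma$; it is not the thesis's density-evolution recursion itself.

```python
import numpy as np

def bpsk_mmse(snr, n_pts=20001, half_width=10.0):
    """MMSE of equiprobable BPSK in Gaussian noise at the given SNR:
    1 - E_z[tanh(snr + sqrt(snr)*z)], z ~ N(0,1), via a simple quadrature."""
    z = np.linspace(-half_width, half_width, n_pts)
    dz = z[1] - z[0]
    phi = np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return 1.0 - float(np.sum(np.tanh(snr + np.sqrt(snr) * z) * phi) * dz)

def fixed_point_eta(beta, gamma, n_iter=100):
    """Iterate eta <- 1 / (1 + beta*gamma*mmse(eta*gamma)), a sketch of (D.35)."""
    eta = 1.0
    for _ in range(n_iter):
        eta = 1.0 / (1.0 + beta * gamma * bpsk_mmse(eta * gamma))
    return eta

eta_star = fixed_point_eta(beta=0.5, gamma=4.0)
assert 0.0 < eta_star <= 1.0  # multiuser efficiency stays in (0, 1]
```

Since the MMSE lies in $[0,1]$, every iterate remains in $[1/(1+\beta\gamma),\,1]$, which is why the multiuser efficiency is a well-defined quantity in $(0,1]$.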
D.2 Proof of Lemma 5.4.1
Notice first that there are $2^{(L-1)R}$ active codewords instead of $2^{LR}$ because we keep the initial symbol to encode the activity. Let $P^\star_e$ be the probability of error under Bayesian detection.
Then, by recalling that all active codewords are equiprobable, $P^\star_{(e|A_\alpha=1)}$ can be decomposed as
\[
P^\star_{(e|A_\alpha=1)} = 2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}} P^\star_{(e|A_\alpha=1,\,c_j)} \qquad \text{(D.37)}
\]
where $P^\star_{(e|A_\alpha=1,\,c_j)} \triangleq P^\star_{(e|A_\alpha=1,\,y=\sqrt{\gamma}c_j+z)}$ corresponds to the error of activity detection when the codeword $c_j$ has been transmitted. Thus, each term in (D.37) can be computed as follows:
\[
P^\star_{(e|A_\alpha=1,\,c_j)} = \Pr\left[\,\bigcap_{i=1}^{2^{(L-1)R}}\bigl\{P(0\,|\,y=\sqrt{\gamma}c_j+z) \ge P(c_i\,|\,y=\sqrt{\gamma}c_j+z)\bigr\}\right], \qquad \text{(D.38)}
\]
where $P\bigl(0\,|\,y=\sqrt{\gamma}c_j+z\bigr)$ and $P\bigl(c_i\,|\,y=\sqrt{\gamma}c_j+z\bigr)$, $i=1,\ldots,2^{(L-1)R}$, are the posterior probabilities given the received sequence $y$. Then, we can decompose (D.38) as
\[
P^\star_{(e|A_\alpha=1,\,c_j)} = \Pr\left[\,\bigcap_{i=1}^{2^{(L-1)R}}\bigl\{P(0)\,P(\sqrt{\gamma}c_j+z\,|\,0) \ge P(c_i)\,P(\sqrt{\gamma}c_j+z\,|\,c_i)\bigr\}\right] \qquad \text{(D.39)}
\]
\[
= \Pr\left[\,\bigcap_{i=1}^{2^{(L-1)R}}\bigl\{(1-\alpha)\,P(\sqrt{\gamma}c_j+z\,|\,0) \ge \alpha\,2^{-(L-1)R}\,P(\sqrt{\gamma}c_j+z\,|\,c_i)\bigr\}\right] \qquad \text{(D.40)}
\]
\[
= \Pr\left[\max_{c_i}\bigl\{\alpha\,2^{-(L-1)R}\,P(\sqrt{\gamma}c_j+z\,|\,c_i)\bigr\} \le (1-\alpha)\,P(\sqrt{\gamma}c_j+z\,|\,0)\right], \qquad \text{(D.41)}
\]
where we used that $P(0)=1-\alpha$, $P(c_i)=\alpha\,2^{-(L-1)R}$, $\forall i=1,\ldots,2^{(L-1)R}$, and the following equivalence for every set of real values $A_1,\ldots,A_M$ and $\gamma\in\mathbb{R}$:
\[
\Pr\left[\,\bigcap_{i=1}^{M}\{A_i\le\gamma\}\right] = \Pr\left[\max_{i=1,\ldots,M}\{A_i\}\le\gamma\right]. \qquad \text{(D.42)}
\]
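The equivalence (D.42) simply states that the intersection of the events $\{A_i\le\gamma\}$ is exactly the event that the maximum does not exceed $\gamma$. A Monte Carlo illustration (the distribution, sample size and threshold are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
M, trials, thresh = 8, 100_000, 0.5
A = rng.normal(size=(trials, M))  # arbitrary real values A_1, ..., A_M per trial

p_intersection = np.mean(np.all(A <= thresh, axis=1))  # Pr{ intersection of {A_i <= thresh} }
p_max = np.mean(A.max(axis=1) <= thresh)               # Pr{ max_i A_i <= thresh }
assert p_intersection == p_max  # identical events, hence identical empirical frequencies
```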
Hence, by expressing the channel transition probabilities $P(y\,|\,\cdot)$ under the large-system equivalent multivariate Gaussian channel, (D.41) can be further developed as:
\[
\Pr\left[\max_{c_i}\bigl\{-(\sqrt{\gamma}(c_i-c_j)+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}(c_i-c_j)+z)\bigr\} \le 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr) + 2(L-1)R\log 2 - (\sqrt{\gamma}c_j+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}c_j+z)\right] \qquad \text{(D.43)}
\]
\[
= \Pr\left[\min_{c_i}\bigl\{(\sqrt{\gamma}(c_i-c_j)+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}(c_i-c_j)+z)\bigr\} \ge 2\log\Bigl(\frac{\alpha}{1-\alpha}\Bigr) - 2(L-1)R\log 2 + (\sqrt{\gamma}c_j+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}c_j+z)\right], \qquad \text{(D.44)}
\]
where the last equality results from exchanging the sign on both sides of the inequality inside the probability in (D.43).
By adding all the terms (D.44), $\forall j=1,\ldots,2^{(L-1)R}$, in equation (D.37), we finally obtain that $P^\star_{(e|A_\alpha=1)}$ equals
\[
2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}}\Pr\left[\min_{c_i}\bigl\{(\sqrt{\gamma}(c_i-c_j)+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}(c_i-c_j)+z)\bigr\} \ge (\sqrt{\gamma}c_j+z)^{\mathsf T}\Sigma^{-1}(\sqrt{\gamma}c_j+z) + 2\log\Bigl(\frac{\alpha}{1-\alpha}\Bigr) - 2(L-1)R\log 2\right]. \qquad \text{(D.45)}
\]
On the other hand, the probability of detecting activity at any iteration when no user is active, $P^\star_{(e|A_\alpha=0)} \triangleq P^\star_{(e|A_\alpha=0,\,y=z)}$, can be expressed as
\[
P^\star_{(e|A_\alpha=0)} = \Pr\left[\,\bigcup_{i=1}^{2^{(L-1)R}}\{P(c_i\,|\,y=z)\ge P(0\,|\,y=z)\}\right] \qquad \text{(D.46)}
\]
\[
= \Pr\left[\,\bigcup_{i=1}^{2^{(L-1)R}}\{P(c_i)\,P(z\,|\,c_i)\ge P(0)\,P(z\,|\,0)\}\right] \qquad \text{(D.47)}
\]
\[
= \Pr\left[\,\bigcup_{i=1}^{2^{(L-1)R}}\bigl\{\alpha\,2^{-(L-1)R}\,P(z\,|\,c_i)\ge(1-\alpha)\,P(z\,|\,0)\bigr\}\right] \qquad \text{(D.48)}
\]
\[
= \Pr\left[\max_{c_i}\bigl\{\alpha\,2^{-(L-1)R}\,P(z\,|\,c_i)\bigr\}\ge(1-\alpha)\,P(z\,|\,0)\right], \qquad \text{(D.49)}
\]
where we used the following equivalence for every set of real values $A_1,\ldots,A_M$ and $\gamma\in\mathbb{R}$:
\[
\Pr\left[\,\bigcup_{i=1}^{M}\{A_i\ge\gamma\}\right] = \Pr\left[\max_{i=1,\ldots,M}\{A_i\}\ge\gamma\right].
\]
We can further develop (D.49) using the large-system equivalent multivariate Gaussian channel:
\[
\Pr\left[\max_{c_i}\Bigl\{\exp\Bigl(\log\alpha-(L-1)R\log 2-\tfrac{1}{2}(z-\sqrt{\gamma}c_i)^{\mathsf T}\Sigma^{-1}(z-\sqrt{\gamma}c_i)\Bigr)\Bigr\}\ge \exp\Bigl(\log(1-\alpha)-\tfrac{1}{2}\,z^{\mathsf T}\Sigma^{-1}z\Bigr)\right] \qquad \text{(D.50)}
\]
\[
= \Pr\left[\max_{c_i}\Bigl\{2\log\Bigl(\frac{\alpha}{1-\alpha}\Bigr)-2(L-1)R\log 2-(z-\sqrt{\gamma}c_i)^{\mathsf T}\Sigma^{-1}(z-\sqrt{\gamma}c_i)\Bigr\}\ge -z^{\mathsf T}\Sigma^{-1}z\right] \qquad \text{(D.51)}
\]
Finally, by exchanging the sign inside the probability in (D.51), we obtain:
\[
P^\star_{(e|A_\alpha=0)} = \Pr\left[\min_{c_i}\bigl\{(z-\sqrt{\gamma}c_i)^{\mathsf T}\Sigma^{-1}(z-\sqrt{\gamma}c_i)\bigr\} \le z^{\mathsf T}\Sigma^{-1}z + 2\log\Bigl(\frac{\alpha}{1-\alpha}\Bigr) - 2(L-1)R\log 2\right] \qquad \text{(D.52)}
\]
D.3 Proof of Proposition 5.4.1
Based on the exact error probabilities given in Lemma 5.4.1, we define the following (suboptimal) decoding regions, where the looseness results from ignoring the time correlation by taking $\Sigma = \eta\gamma I$, and omitting the term $-2(L-1)R\log 2$ on the right-hand side of each inequality.
\[
\mathcal{D}(c\neq 0) = \Bigl\{y\in\mathbb{R}^L : \|y\|^2 \ge \min_{c_i}\bigl\|y-\sqrt{\eta^{(\ell)}\gamma}\,c_i\bigr\|^2 + 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr)\Bigr\} \qquad \text{(D.53)}
\]
\[
\mathcal{D}(0) = \Bigl\{y\in\mathbb{R}^L : \|y\|^2 < \min_{c_i}\bigl\|y-\sqrt{\eta^{(\ell)}\gamma}\,c_i\bigr\|^2 + 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr)\Bigr\} \qquad \text{(D.54)}
\]
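Regions of this form translate directly into a threshold test on the received vector. The sketch below implements such a test for a small random BPSK codebook; the codebook, the dimensions, and the values of $\eta^{(\ell)}\gamma$ and $\alpha$ are illustrative assumptions only, not parameters from the thesis.

```python
import numpy as np

def is_active(y, codebook, eta_gamma, alpha):
    """Decide y in D(c != 0), cf. (D.53):
    ||y||^2 >= min_i ||y - sqrt(eta*gamma) c_i||^2 + 2 log((1 - alpha)/alpha)."""
    dists = np.sum((y[None, :] - np.sqrt(eta_gamma) * codebook) ** 2, axis=1)
    return np.sum(y ** 2) >= dists.min() + 2.0 * np.log((1.0 - alpha) / alpha)

rng = np.random.default_rng(0)
L, n_codewords, eta_gamma, alpha = 16, 8, 4.0, 0.1
codebook = rng.choice([-1.0, 1.0], size=(n_codewords, L))  # random BPSK codewords

# Noiseless sanity checks: a scaled codeword is declared active, the zero vector is not.
assert is_active(np.sqrt(eta_gamma) * codebook[0], codebook, eta_gamma, alpha)
assert not is_active(np.zeros(L), codebook, eta_gamma, alpha)
```

Note how a small activity rate $\alpha$ makes the log-prior term positive, biasing the test toward declaring inactivity, as the Bayes rule requires.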
Denote by $P'^{(\ell)}_e \triangleq \alpha P'^{(\ell)}_{(e|A_\alpha=1)} + (1-\alpha)P'^{(\ell)}_{(e|A_\alpha=0)}$ the average activity-detection error, and by $P'^{(\ell)}_{(e|A_\alpha=1)}$ and $P'^{(\ell)}_{(e|A_\alpha=0)}$ the conditional error probabilities under the decoding regions (D.53) and (D.54). Thus, by recalling our suboptimality arguments, we have that $P^{(\ell)}_e \le P'^{(\ell)}_e$. Further, we can develop an upper bound on $P'^{(\ell)}_{(e|A_\alpha=1)} \triangleq 2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}}\Pr\bigl\{y\in\mathcal{D}(0)\,\big|\,y=\sqrt{\eta^{(\ell)}\gamma}\,c_j+z\bigr\}$ as follows:
\[
P'^{(\ell)}_{(e|A_\alpha=1)} = 2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}}\Pr\Bigl\{\bigl\|\sqrt{\eta^{(\ell)}\gamma}\,c_j+z\bigr\|^2 < \min_{c_i}\bigl\|\sqrt{\eta^{(\ell)}\gamma}\,(c_j-c_i)+z\bigr\|^2 + 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr)\Bigr\} \qquad \text{(D.55)}
\]
\[
\le 2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}}\Pr\Bigl\{\bigl\|\sqrt{\eta^{(\ell)}\gamma}\,c_j+z\bigr\|^2 < \|z\|^2 + 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr)\Bigr\} \qquad \text{(D.56)}
\]
\[
= 2^{-(L-1)R}\sum_{j=1}^{2^{(L-1)R}}\Pr\Bigl\{-z^{\mathsf T}\sqrt{\eta^{(\ell)}\gamma}\,c_j > \frac{\eta^{(\ell)}\gamma L - 2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)}{2}\Bigr\} \qquad \text{(D.57)}
\]
where (D.56) results from choosing (in general suboptimally) $c_i = c_j$ in the right-hand-side minimization. Notice that
\[
-z^{\mathsf T}\sqrt{\eta^{(\ell)}\gamma}\,c_j = -\sqrt{\eta^{(\ell)}\gamma}\sum_{l=1}^{L}c_j(l)\,z_l \qquad \text{(D.58)}
\]
is a Gaussian-distributed random variable with zero mean and variance equal to $\eta^{(\ell)}\gamma L$. Since each codeword is modulated with BPSK, (D.57) is equivalent to
\[
\Pr\Bigl\{t > \frac{\eta^{(\ell)}\gamma L - 2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)}{2\sqrt{\eta^{(\ell)}\gamma L}}\Bigr\} = Q\Bigl(\frac{\eta^{(\ell)}\gamma L - 2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)}{2\sqrt{\eta^{(\ell)}\gamma L}}\Bigr) \qquad \text{(D.59)}
\]
where $t\sim\mathcal{N}(0,1)$ and $Q(\cdot)$ is its tail function. By using the upper bound $Q(x)\le e^{-x^2/2}$, valid for $x\ge 0$, we finally obtain
\[
P'^{(\ell)}_{(e|A_\alpha=1)} \le \exp\left(-\frac{\Bigl(\eta^{(\ell)}\gamma L - 2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)\Bigr)^2}{8\,\eta^{(\ell)}\gamma L}\right) \le \sqrt{\frac{1-\alpha}{\alpha}}\;e^{-\frac{\eta^{(\ell)}\gamma L}{8}} \qquad \text{(D.60)}
\]
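Two steps are combined in (D.59)-(D.60): the bound $Q(x)\le e^{-x^2/2}$ for $x\ge 0$, and the expansion $(a-c)^2/(8a) = a/8 - c/4 + c^2/(8a)$ with $a=\eta^{(\ell)}\gamma L$ and $c=2\log\frac{1-\alpha}{\alpha}$, which gives $\exp(-(a-c)^2/(8a)) = \sqrt{\tfrac{1-\alpha}{\alpha}}\,e^{-a/8}\,e^{-c^2/(8a)} \le \sqrt{\tfrac{1-\alpha}{\alpha}}\,e^{-a/8}$. A numerical check of both steps (the values of $\alpha$ and $a$ are illustrative):

```python
import math

def Q(x):
    """Standard Gaussian tail function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Chernoff-type bound Q(x) <= exp(-x^2/2) for x >= 0
for x in [0.0, 0.5, 1.0, 2.5, 5.0]:
    assert Q(x) <= math.exp(-x * x / 2.0)

# Expansion used in (D.60), with a = eta*gamma*L and c = 2*log((1-alpha)/alpha)
alpha, a = 0.2, 30.0
c = 2.0 * math.log((1.0 - alpha) / alpha)
lhs = math.exp(-(a - c) ** 2 / (8.0 * a))
rhs = math.sqrt((1.0 - alpha) / alpha) * math.exp(-a / 8.0)
assert lhs <= rhs                                              # the bound in (D.60)
assert abs(lhs - rhs * math.exp(-c * c / (8.0 * a))) < 1e-12   # the exact expansion
```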
Given the decoding region (D.53), notice first that $P^{(\ell)}_{(e|A_\alpha=0)} \le P'^{(\ell)}_{(e|A_\alpha=0)}$ by using $\Sigma = \eta\gamma I$, with $\eta \triangleq \min_{l=1,\ldots,L}\eta_l$, and ignoring the term $-2(L-1)R\log 2$
in the corresponding Bayes error probability (5.17). Moreover, $P'^{(\ell)}_{(e|A_\alpha=0)}$ can be computed as follows:
\[
P'^{(\ell)}_{(e|A_\alpha=0)} = \Pr\bigl\{y\in\mathcal{D}(c\neq 0)\,\big|\,y=z\bigr\} = \Pr\Bigl\{\|z\|^2 \ge \min_{c_i}\bigl\|z-\sqrt{\eta^{(\ell)}\gamma}\,c_i\bigr\|^2 + 2\log\Bigl(\frac{1-\alpha}{\alpha}\Bigr)\Bigr\} = \Pr\Bigl\{z^{\mathsf T}\sqrt{\eta^{(\ell)}\gamma}\,c^\star \ge \frac{\eta^{(\ell)}\gamma L + 2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)}{2}\Bigr\}, \qquad \text{(D.61)}
\]
where $c^\star(z)\in\mathbb{F}_2^L$ is the codeword that minimizes $\bigl\{\|z-\sqrt{\eta^{(\ell)}\gamma}\,c_i\|^2\bigr\}$. Remark that (D.61) only differs from (D.57) by a sign flip in the term $2\log\bigl(\frac{1-\alpha}{\alpha}\bigr)$. Hence, (D.61) can be similarly upper-bounded as $P'^{(\ell)}_{(e|A_\alpha=0)} \le \sqrt{\frac{\alpha}{1-\alpha}}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}}$. As a result, we obtain, on the one hand, that $P^{(\ell)}_{(e|A_\alpha=0)} \le \sqrt{\frac{\alpha}{1-\alpha}}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}}$, using $P^{(\ell)}_{(e|A_\alpha=0)} \le P'^{(\ell)}_{(e|A_\alpha=0)}$, and, on the other hand, an upper bound on the average error probability via $P'^{(\ell)}_e$:
\[
P^{(\ell)}_e \le P'^{(\ell)}_e \le \alpha\sqrt{\frac{1-\alpha}{\alpha}}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}} + (1-\alpha)\sqrt{\frac{\alpha}{1-\alpha}}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}} = 2\sqrt{\alpha(1-\alpha)}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}}. \qquad \text{(D.62)}
\]
Then, by using (D.62), we can also find an upper bound on $P^{(\ell)}_{(e|A_\alpha=1)}$, as shown below:
\[
\alpha P^{(\ell)}_{(e|A_\alpha=1)} + (1-\alpha)P^{(\ell)}_{(e|A_\alpha=0)} \le 2\sqrt{\alpha(1-\alpha)}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}} \qquad \text{(D.63)}
\]
\[
\Rightarrow\; P^{(\ell)}_{(e|A_\alpha=1)} \le \frac{2\sqrt{\alpha(1-\alpha)}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}} - (1-\alpha)P^{(\ell)}_{(e|A_\alpha=0)}}{\alpha} \le 2\sqrt{\frac{1-\alpha}{\alpha}}\,e^{-\frac{\eta^{(\ell)}\gamma L}{8}}. \qquad \text{(D.64)}
\]
D.4 Proof of Corollary 5.4.1
As shown in Section 5.4.2, when $\rho<\rho_{\mathrm{th}}$ the extrinsic probabilities for the inactive users converge with probability 1 to
\[
(p_{\mathrm{ext}}\,|\,A_\alpha=0)\to(0,1,0) \qquad \text{(D.65)}
\]
for every $l=1,\ldots,L$ and SNR. Hence, the $L$-dimensional fixed-point equation (5.11) can be simplified by noticing that
\[
\mathbb{E}_{A_\alpha,P_{\mathrm{ext}},z,x,\gamma}\bigl[\gamma(x_l-\bar{x}_l)^2\bigr] = \alpha\,\mathbb{E}_{(P_{\mathrm{ext}},z,x,\gamma|A_\alpha=1)}\bigl[\gamma(x_l-\bar{x}_l)^2\bigr] + (1-\alpha)\,\mathbb{E}_{(P_{\mathrm{ext}},z,x,\gamma|A_\alpha=0)}\bigl[\gamma(x_l-\bar{x}_l)^2\bigr]. \qquad \text{(D.66)}
\]
We now focus on the second term in (D.66). Since the extrinsic probabilities are deterministic, the corresponding MMSE estimate will also be deterministic, with value $\bar{x}_l = x_l$ for $l=1,\ldots,L$. Therefore, the second term cancels and we finally have
\[
\eta^{(\ell)}_l = \Bigl(1+\beta\alpha\gamma\,\mathbb{E}_{(P^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=1)}\bigl[(x_l-\bar{x}_l)^2\bigr]\Bigr)^{-1} \qquad \text{(D.67)}
\]
\[
= \Bigl(1+\beta\alpha\gamma\Bigl(1-\mathbb{E}_{(P^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=1)}\bigl[(\bar{x}_l)^2\bigr]\Bigr)\Bigr)^{-1} \qquad \text{(D.68)}
\]
Note here that, in the case of active users, coded symbols are independent due to the effect of the interleaver. Hence, we can omit the time subscripts, leading to the fixed-point equation (5.30), where $\bar{x} = \bar{x}(\eta^{(\ell-1)},\eta^{(\ell)},\gamma)$ is the MMSE estimate of the equivalent single-user Gaussian channel (5.12) [28] and $p_{\mathrm{ext}}$ is the extrinsic prior of a given symbol.
When $\rho \ge \rho_{\mathrm{th}}$, the lower bound (5.32) follows from substituting the message probabilities $p^{(\ell-1)}_{(\mathrm{ext}|A_\alpha=1)}, p^{(\ell-1)}_{(\mathrm{ext}|A_\alpha=0)} \in \mathbb{R}^3$ given in (5.29) and (5.26) into the fixed-point equation (5.11):
\[
\eta^{(\ell)}_l = \Bigl(1+\beta\alpha\gamma\Bigl(1-\mathbb{E}_{(p^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=1)}\bigl[(\bar{x}_l)^2\bigr]\Bigr) + \beta(1-\alpha)\gamma\,\mathbb{E}_{(p^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=0)}\bigl[(\bar{x}_l)^2\bigr]\Bigr)^{-1} \qquad \text{(D.69)}
\]
Since the $p_{\mathrm{ext}}$ are i.i.d. processes across the time domain, (D.69) is equivalent to the fixed-point equation (5.32), $\forall l=1,\ldots,L$, which leads to a lower bound on the density-evolution mapping $\Psi$ (5.11) based on the MMSE inequalities $1-\mathbb{E}_{(p^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=1)}[\bar{x}^2] \ge 1-\mathbb{E}_{(P^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=1)}[(\bar{x}_l)^2] \ge 0$ and $\mathbb{E}_{(p^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=0)}[\bar{x}^2] \ge \mathbb{E}_{(P^{(\ell-1)}_{\mathrm{ext}},z,x|A_\alpha=0)}[(\bar{x}_l)^2] \ge 0$.
References
[1] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv. Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans. Inf. Theory, 20(2):284-287, Mar. 1974.
[2] V. B. Balakirsky. Joint source-channel coding with variable length codes. In Proc. 1997 IEEE Int. Symp. on Inf. Theory, page 419, 1997.
[3] E. Biglieri, E. Grossi, M. Lops, and A. Tauste Campo. Large-system analysis of a dynamic CDMA system under a Markovian input process. In Proc. IEEE Int. Symp. on Inf. Theory, Toronto, Canada, July 2008.
[4] E. Biglieri and M. Lops. Multiuser detection in a dynamic environment - Part I: User identification and data detection. IEEE Trans. Inf. Theory, 53(9):3158-3170, Sept. 2007.
[5] R. E. Blahut. Hypothesis testing and information theory. IEEE Trans. Inf. Theory, IT-20(4):405-417, 1974.
[6] J. Boutros and G. Caire. Iterative multiuser joint decoding. IEEE Trans. Inf. Theory, 48(7):1772-1793, Jul. 2002.
[7] G. Caire, R. R. Muller, and T. Tanaka. Iterative multiuser joint decoding: Optimal power allocation and low-complexity implementation. IEEE Trans. Inf. Theory, 50(9):1950-1973, Sep. 2004.
[8] A. Tauste Campo and A. Guillen i Fabregas. Large system analysis of iterative multiuser joint decoding with an uncertain number of users. In Proc. IEEE Int. Symp. on Inf. Theory, pages 2103-2107, Austin, US, June 2010.
[9] A. Tauste Campo, A. Guillen i Fabregas, and E. Biglieri. Large-system analysis of multiuser detection with an unknown number of users. Technical Report CUED/F-INFENG/TR.601, Department of Engineering, University of Cambridge, May 2008. ISSN 0951-9211.
[10] A. Tauste Campo, A. Guillen i Fabregas, and E. Biglieri. Large-system analysis of multiuser detection with an unknown number of users: A high-SNR approach. IEEE Trans. Inf. Theory, 57(6):3416-3428, June 2011.
[11] C. Chatellier, H. Boeglen, C. Perrine, C. Olivier, and O. Haeberle. A robust joint source channel coding scheme for image transmission over the ionospheric channel. Sig. Process.: Image Commun., 22(6):543-556, 2007.
[12] B. Chen and L. Tong. Traffic-aided multiuser detection for random-access CDMA networks. IEEE Trans. Sig. Process., 49(7):1570-1580, July 2001.
[13] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, NJ, 2006.
[14] I. Csiszar. Joint source-channel error exponent. Probl. Contr. Inf. Theory, 9:315-328, 1980.
[15] I. Csiszar. On the error exponent of source-channel transmission with a distortion threshold. IEEE Trans. Inf. Theory, IT-28(6):823-828, Nov. 1982.
[16] I. Csiszar. The method of types. IEEE Trans. Inf. Theory, 44(6):2505-2523, 1998.
[17] I. Csiszar and J. Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011.
[18] D. Guo and C. C. Wang. Random sparse linear systems observed via arbitrary channels: A decoupling principle. In Proc. IEEE Int. Symp. on Inf. Theory, Nice, France, June 2007.
[19] R. L. Dobrushin. Mathematical problems in the Shannon theory of optimal coding of information. In Proc. 4th Berkeley Symp. Math., Statist., Probabil., vol. 1, pages 211-252, 1961.
[20] R. S. Ellis. Entropy, Large Deviations, and Statistical Mechanics, vol. 271 of A Series of Comprehensive Studies in Mathematics. Springer-Verlag, 1985.
[21] J. Evans and D. Tse. Large system performance of linear multiuser receivers in multipath fading channels. IEEE Trans. Inf. Theory, 46(6):2059-2078, Sept. 2000.
[22] A. Feinstein. A new basic theorem of information theory. IRE Trans. Inf. Theory, IT-4:2-22, 1954.
[23] R. G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, Inc., New York, 1968.
[24] R. G. Gallager. A random coding bound on fixed composition codes. Class notes, http://www.rle.mit.edu/rgallager/documents/notes6.pdf, 1982.
[25] A. Grant and C. Schlegel. Coordinated Multiuser Communications. Springer, New York, 2006.
[26] D. Guo, D. Baron, and S. Shamai. A single-letter characterization of optimal noisy compressed sensing. In 47th Ann. Allerton Conf. on Comm., Control, and Computing, pages 52-59, 2009.
[27] D. Guo and T. Tanaka. Generic multiuser detection and statistical physics. In Advances in Multiuser Detection, M. L. Honig, Ed. John Wiley & Sons, 2008.
[28] D. Guo and S. Verdu. Randomly spread CDMA: Asymptotics via statistical physics. IEEE Trans. Inf. Theory, 51(6):1983-2007, June 2005.
[29] D. Guo, Y. Wu, and S. Verdu. Estimation in Gaussian noise: Properties of the minimum mean-square error. IEEE Trans. Inf. Theory, 57(4):2371-2385, 2011.
[30] A. Guyader, E. Fabre, C. Guillemot, and M. Robert. Joint source-channel turbo decoding of entropy-coded sources. IEEE Journal on Sel. Areas in Commun., 19(9):1680-1696, 2001.
[31] K. Halford and M. Brandt-Pearce. New-user identification in a CDMA system. IEEE Trans. Commun., 46:144-155, Jan. 1998.
[32] T. S. Han. Information-Spectrum Methods in Information Theory. Springer-Verlag, Berlin, Germany, 2003.
[33] T. S. Han. Joint source-channel coding revisited: Information-spectrum approach. arXiv preprint arXiv:0712.2959v1, 2007.
[34] M. Hayashi. Information spectrum approach to second-order coding rate in channel coding. IEEE Trans. Inf. Theory, 55(11):4947-4966, Nov. 2009.
[35] M. L. Honig and H. V. Poor. Adaptive interference suppression in wireless communication systems. In Wireless Communications: Signal Processing Perspectives, H. V. Poor and G. W. Wornell, Eds. Englewood Cliffs, NJ: Prentice Hall, 1998.
[36] F. Jelinek. Probabilistic Information Theory. New York: McGraw-Hill, 1968.
[37] S. B. Korada and N. Macris. Tight bounds on the capacity of binary input random CDMA systems. IEEE Trans. Inf. Theory, 56(11):5590-5613, Nov. 2010.
[38] S. B. Korada and A. Montanari. Applications of the Lindeberg principle in communications and statistical learning. IEEE Trans. Inf. Theory, 57(4):2440-2450, Apr. 2011.
[39] F. Kschischang, B. Frey, and H. A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Trans. Inf. Theory, 47(2):498-519, Feb. 2001.
[40] A. Lozano, A. M. Tulino, and S. Verdu. Optimum power allocation for parallel Gaussian channels with arbitrary input distributions. IEEE Trans. Inf. Theory, 52(7):3033-3051, July 2006.
[41] A. Martinez and A. Guillen i Fabregas. Saddlepoint approximation of random-coding bounds. In Information Theory and Applications Workshop (ITA), pages 1-6, 2011.
[42] M. Mezard and A. Montanari. Information, Physics, and Computation. Oxford University Press, 2009.
[43] A. Montanari and D. Tse. Analysis of belief propagation for non-linear problems: The example of CDMA (or: How to prove Tanaka's formula). In Proc. IEEE Inf. Theory Workshop, Punta del Este, Uruguay, March 2006.
[44] R. R. Muller. Channel capacity and minimum probability of error in large dual antenna array systems with binary modulation. IEEE Trans. Sig. Process., 51(11):2821-2828, Nov. 2003.
[45] J. Neyman and E. S. Pearson. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231(694-706):289, 1933.
[46] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, 2001.
[47] T. Oskiper and H. V. Poor. Online activity detection in a multiuser environment using the matrix CUSUM algorithm. IEEE Trans. Inf. Theory, 48(2):477-493, Feb. 2002.
[48] B. D. Pettijohn, M. W. Hoffman, and K. Sayood. Joint source/channel coding using arithmetic codes. IEEE Trans. on Comms., 49(5):826-836, 2001.
[49] Y. Polyanskiy, H. V. Poor, and S. Verdu. Channel coding rate in the finite blocklength regime. IEEE Trans. Inf. Theory, 56(5):2307-2359, 2010.
[50] H. V. Poor. An Introduction to Signal Detection and Estimation. Springer-Verlag, New York, 1988.
[51] K. Ramchandran, A. Ortega, K. M. Uz, and M. Vetterli. Multiresolution broadcast for digital HDTV using joint source/channel coding. IEEE Journal on Sel. Areas in Commun., 11(1):6-23, 2002.
[52] M. Reed, C. Schlegel, P. Alexander, and J. Asenstorfer. Iterative multiuser detection for CDMA with FEC: Near single-user performance. IEEE Trans. Commun., 46:1693-1699, Dec. 1998.
[53] T. Richardson and R. L. Urbanke. Modern Coding Theory. Cambridge University Press, New York, 2008.
[54] C. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27:379-423 and 623-656, July and Oct. 1948.
[55] C. E. Shannon. Certain results on the error probability of channel coding. Inf. Contr., 10(1):65-103, 1957.
[56] M. Talagrand. Spin Glasses: A Challenge for Mathematicians. Springer, 2003.
[57] T. Tanaka. A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors. IEEE Trans. Inf. Theory, 48(11):2888-2910, Nov. 2002.
[58] T. Tanaka, M. Yano, and D. Saad. A mean field theory of coded CDMA. J. Phys. A: Math. Theor., 41, 2008.
[59] D. Tse and S. Hanly. Linear multiuser receivers: Effective interference, effective bandwidth, and user capacity. IEEE Trans. Inf. Theory, 45(2):641-657, March 1999.
[60] D. Tse and S. Verdu. Optimum asymptotic multiuser efficiency of randomly spread CDMA. IEEE Trans. Inf. Theory, 46(11):2718-2722, Nov. 2000.
[61] M. Vehkapera, K. Takeuchi, R. R. Muller, and T. Tanaka. Asymptotic analysis of iterative channel estimation and multiuser detection with soft feedback in multipath channels. In Proc. Europ. Sig. Process. Conf. (EUSIPCO'08), Lausanne, Switzerland, Aug. 2008.
[62] S. Vembu, S. Verdu, and Y. Steinberg. The source-channel separation theorem revisited. IEEE Trans. Inf. Theory, 41(1):44-54, 1995.
[63] S. Verdu. Multiuser Detection. Cambridge University Press, 1998.
[64] S. Verdu and T. S. Han. Approximation theory of output statistics. IEEE Trans. Inf. Theory, 39(3):752-772, May 1993.
[65] S. Verdu and T. S. Han. A general formula for channel capacity. IEEE Trans. Inf. Theory, 40(4):1147-1157, July 1994.
[66] S. Verdu and S. Shamai (Shitz). Spectral efficiency of CDMA with random spreading. IEEE Trans. Inf. Theory, 45(2):622-640, March 1999.
[67] X. Wang and H. V. Poor. Iterative (turbo) soft interference cancellation and decoding for coded CDMA. IEEE Trans. Commun., 47:1047-1061, Jul. 1999.
[68] J. Wolfowitz. The coding of messages subject to chance errors. Illinois J. Math., 1:591-606, 1957.
[69] Y. Zhong, F. Alajaji, and L. L. Campbell. On the joint source-channel coding error exponent for discrete memoryless systems. IEEE Trans. Inf. Theory, 52(4):1450-1468, April 2006.