
A Proof of the Converse for the Capacity of

Gaussian MIMO Broadcast Channels

Mehdi Mohseni and John M. Cioffi

Department of Electrical Engineering
Stanford University, Stanford, CA 94305-9510
{mmohseni, cioffi}@stanford.edu

Abstract

The paper provides a proof of the converse for the capacity region of the Gaussian MIMO broadcast channel. The proof uses several ideas from earlier works on the problem, including the recent converse proof by Weingarten, Steinberg and Shamai. First, the duality between Gaussian multiple access and broadcast channels is used to show that every point on the boundary of the dirty paper coding region can be represented as the optimal solution to a convex optimization problem. Using the optimality conditions for this convex problem, a degraded broadcast channel is constructed for each point. It is then shown that the capacity region of this degraded broadcast channel contains the capacity region of the original channel. Moreover, the same point lies on the boundary of the dirty paper coding region for this degraded channel. Finally, the standard entropy power inequality is used to show that this point lies on the boundary of the capacity region of the degraded channel as well, and consequently it is on the boundary of the capacity region of the original channel.

Index Terms–Broadcast channel (BC), multiple access channel (MAC), capacity region, dirty paper coding (DPC) region, duality, convex optimization, Karush-Kuhn-Tucker (KKT) optimality conditions, entropy power inequality (EPI).

1 Introduction

Consider a memoryless Gaussian multiple-input multiple-output (MIMO) broadcast channel (BC) with K ≥ 2 receivers. Assume that the transmitter has t antennas and each receiver has r antennas. An equal number of receive antennas is chosen to simplify notation; the proof readily applies to the case of receivers with different numbers of antennas. The received symbols of user k = 1, . . . , K at transmission i can be expressed in terms of the transmitted symbols and channel coefficients as,

y_k(i) = H_k x(i) + z_k(i), \qquad (1)


Figure 1: Gaussian MIMO broadcast channel. (The encoder maps the messages W_1 ∈ {1, . . . , e^{nR_1}} and W_2 ∈ {1, . . . , e^{nR_2}} to the codeword x^n(W_1, W_2); receiver k observes y_k^n through channel H_k with noise z_k^n and produces the estimate Ŵ_k(y_k^n).)

where x(i) ∈ R^t is the vector of transmitted symbols and y_k(i) ∈ R^r is the vector of received symbols. The noise vectors z_k(i) for k = 1, . . . , K and i = 1, 2, . . . are i.i.d. white Gaussian noise with identity covariance matrix, I_r. The matrices H_k ∈ R^{r×t}, k = 1, . . . , K, represent the channel gains, where the entry H_k(i, j) denotes the channel gain from transmit antenna j to receive antenna i of user k.

A code with rates (R_1, . . . , R_K) and block length n, denoted by C(e^{nR_1}, . . . , e^{nR_K}, n), consists of K message sets W_k = {1, . . . , e^{nR_k}} that contain the intended messages for user k = 1, . . . , K, an encoding function x^n(W_1, . . . , W_K) that maps the messages (W_1, . . . , W_K) ∈ W_1 × · · · × W_K into the transmitted codewords, and K decoding functions Ŵ_k(y_k^n) that assign the messages Ŵ_k ∈ W_k to the received codewords y_k^n for k = 1, . . . , K. The average probability of decoding error, P_e^{(n)}, is defined as the probability that any of the transmitted messages W_k is decoded erroneously. For the moment, as is customary for Gaussian channels, a total average power constraint is assumed on the transmitted codewords, i.e., for every codeword,

\frac{1}{n} \sum_{i=1}^{n} x(i)^T x(i) \le P.

More general constraints on the covariance matrix of the transmitted codewords will be considered later in the paper. The rate-tuple (R_1, . . . , R_K) is said to be achievable if there exists a sequence of C(e^{nR_1}, . . . , e^{nR_K}, n) codes such that the average probability of error goes to zero as the block length n goes to infinity. The capacity region is then the convex hull of the union of all achievable rate-tuples. The capacity region of the Gaussian MIMO BC under total average transmit power constraint P is referred to as C_BC(P). A two-user Gaussian MIMO BC is shown in Figure 1.

Unlike the scalar BC with t = r = 1, the Gaussian MIMO BC in (1) is not degraded in general; hence, the superposition coding and successive decoding of the scalar case are not applicable to the MIMO channel. In the pioneering works [5], [6], Caire and Shamai used Costa's "writing on dirty paper" result [2] to establish an achievable rate region for the Gaussian MIMO BC, commonly referred to as the "Dirty Paper Coding" (DPC) region. They showed that their proposed scheme achieves the sum-rate capacity of a 2-user Gaussian MIMO BC with 2 transmit antennas and one receive antenna at each receiver, and conjectured that this achievable rate region is the capacity region. Independent works presented in [8], [9] and [10] further established the optimality of the DPC scheme for the sum-rate. Progress towards establishing this conjecture in general was made in [11] and [12]: by introducing the Degraded Same Marginal (DSM) outer bound, the proof of the conjecture was reduced to that for a degraded Gaussian MIMO BC. The conjecture was finally settled in [13], where the DPC region was proven to be equal to the capacity region. In this paper, an alternative proof of the aforementioned conjecture is provided. The contribution of this paper is to combine several ideas from previous works to provide a more intuitive and much simpler converse proof. While the proof employs some previously used ideas, as will become clear later in the paper, it has several key differences from the recent converse of [13].

The rest of this paper is organized as follows. In Section 2, the DPC region is revisited. Based on the duality between Gaussian multiple access and broadcast channels, every boundary point of the DPC region is represented as the solution to a convex optimization problem. The proof of the converse for K = 2 users is given in Section 3. The proof is extended to more than two users in Section 3.1. In addition, the optimality of the DPC scheme is proven under any arbitrary compact and convex constraint on the transmit covariance matrix in Section 3.2. Section 4 summarizes the paper.

The following notation and abbreviations will be used throughout the paper. Upper case letters denote matrices and boldface letters denote vectors. The ith element of a vector a is denoted by a_i. The (i, j) entry of a matrix A is denoted by A(i, j). A^T is the transpose of A and |A| is its determinant. An identity matrix of size n × n is denoted by I_n. E(·) and tr(·) denote the expectation and trace operations, respectively. For a symmetric matrix A, A ⪰ 0 and A ≻ 0 mean that A is positive semi-definite and positive definite, respectively. The abbreviations BC, MAC, DPC and EPI are used for broadcast channel, multiple access channel, dirty paper coding and entropy power inequality, respectively.

2 Dirty Paper Coding Region

The dirty paper coding region is constructed based on a surprising result on the capacity of channels with non-causal transmitter side information. Consider a Gaussian MIMO channel given by,

y = x + s + z,

where x ∈ R^t is the transmitted signal vector; s and z ∈ R^t are the zero-mean Gaussian interference vector with covariance matrix S_s and the Gaussian noise vector with covariance matrix S_z, respectively, and are independent of each other. Assume the interference sequence s^n is completely known at the transmitter but unknown to the receiver. Therefore, to encode the message W, the encoder can choose the transmitted codeword according to W and the interference sequence s^n as x^n(s^n, W). Also assume that there is an average power constraint on each transmitted codeword. It was shown in [3], [4] that the capacity of this channel is


the same as if the interference s does not exist, i.e.,

C = \max_{\mathrm{tr}(S_x) \le P} \; \frac{1}{2} \log \frac{|S_x + S_z|}{|S_z|}. \qquad (2)

In other words, interference can be pre-subtracted at the transmitter without any increase in transmit power. This result, known as "writing on dirty paper", can be considered a generalization of Costa's work [2], where a similar result was obtained for the capacity of a Gaussian scalar channel with i.i.d. Gaussian interference. By subtracting interference at the transmitter instead of the receiver, superposition coding can be used in non-degraded Gaussian MIMO BCs. Caire and Shamai [5], [6] used this "writing on dirty paper" idea to establish an achievable rate region for a 2-user Gaussian MIMO BC with 2 transmit antennas and one receive antenna per user. This achievable rate region was referred to as the DPC region, and it has been generalized to Gaussian MIMO BCs with an arbitrary number of users and antennas [16].

In the DPC scheme, users' messages are encoded successively and the corresponding codewords are added together to form the transmitted codeword. Figure 2 illustrates the DPC scheme for a 2-user Gaussian MIMO BC. Assume the message for user 1 is encoded first by using Gaussian codewords with covariance matrix S_1 = E(x_1 x_1^T). Consequently, the codeword of user 1, x_1^n(W_1), can be viewed as a Gaussian interference for user 2 that is completely known to the encoder. Therefore, by the writing on dirty paper result, the encoder can pre-subtract this interference at the transmitter without increase in transmit power. Moreover, the codewords for user 2, x_2^n(x_1^n, W_2), are also Gaussian codewords and are statistically independent of x_1^n(W_1). Let S_2 = E(x_2 x_2^T) denote the covariance matrix for these codewords. By completely treating the Gaussian interference from user 2 as noise, receiver 1 can achieve the rate R_1, and in the absence of interference from user 1, receiver 2 can achieve the rate R_2, as given below:

R_1 = \frac{1}{2} \log \frac{\bigl| H_1 S_1 H_1^T + H_1 S_2 H_1^T + I_r \bigr|}{\bigl| H_1 S_2 H_1^T + I_r \bigr|},

R_2 = \frac{1}{2} \log \bigl| H_2 S_2 H_2^T + I_r \bigr|.

Since x = x_1 + x_2 and the codewords are independent, the transmit covariance matrix S = E(x x^T) is given by S_1 + S_2. Hence, by using various code-books with covariance matrices S_1, S_2 ⪰ 0 that satisfy the average power constraint, tr(S_1 + S_2) ≤ P, an achievable rate region can be constructed. This region can be expanded further by using the other encoding order. The following lemma summarizes the DPC scheme for a K-user Gaussian MIMO BC.

Lemma 2.1 Given a permutation π on {1, . . . , K} and a set of positive semi-definite matrices S_k, k = 1, . . . , K, such that tr(∑_k S_k) ≤ P, any rate-tuple in the set F(π, {H_k}, {S_k}) given below is achievable for the Gaussian MIMO BC in (1):

F(\pi, \{H_k\}, \{S_k\}) = \bigl\{ R \in \mathbb{R}_+^K : R_k \le \bar{R}_k, \; k = 1, . . . , K \bigr\}, \qquad (3)


Figure 2: Dirty paper coding for the Gaussian MIMO BC. (The message W_1 is encoded as x_1^n(W_1); the message W_2 is encoded as x_2^n(x_1^n, W_2) with x_1^n known to the encoder; the sum x^n is transmitted over channels H_1 and H_2 with noises z_1^n, z_2^n, and receiver k forms the estimate Ŵ_k(y_k^n).)

where for k = 1, . . . , K, R̄_k is defined as,

\bar{R}_{\pi(k)} = \frac{1}{2} \log \frac{\Bigl| H_{\pi(k)} \Bigl( \sum_{i=k}^{K} S_{\pi(i)} \Bigr) H_{\pi(k)}^T + I_r \Bigr|}{\Bigl| H_{\pi(k)} \Bigl( \sum_{i=k+1}^{K} S_{\pi(i)} \Bigr) H_{\pi(k)}^T + I_r \Bigr|}. \qquad (4)

In this lemma, the permutation π determines the encoding order: the message for user π(k) is encoded after all messages for the preceding users π(i), i < k, have been encoded. Moreover, S_k is the covariance matrix of the transmitted codewords for user k. The proof for the K-user case is a straightforward extension of the 2-user case.
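The rate vertex of equation (4) is easy to evaluate numerically. The sketch below is illustrative only (the dimensions, random channels, and the helper name `dpc_rates` are choices made for this example, not from the paper), assuming numpy is available:

```python
import numpy as np

def dpc_rates(H, S, pi):
    """Evaluate the DPC rate vertex of equation (4).

    H  : list of K channel matrices, each r x t
    S  : list of K transmit covariance matrices, each t x t, PSD
    pi : encoding order; user pi[k] is encoded (k+1)-th (0-indexed)
    Returns the rates indexed by user.
    """
    K, r, t = len(H), H[0].shape[0], H[0].shape[1]
    rates = [0.0] * K
    for k in range(K):
        # Users encoded after pi[k] act as interference for user pi[k].
        sig = sum((S[pi[i]] for i in range(k, K)), np.zeros((t, t)))
        intf = sum((S[pi[i]] for i in range(k + 1, K)), np.zeros((t, t)))
        Hk = H[pi[k]]
        num = np.linalg.slogdet(Hk @ sig @ Hk.T + np.eye(r))[1]
        den = np.linalg.slogdet(Hk @ intf @ Hk.T + np.eye(r))[1]
        rates[pi[k]] = 0.5 * (num - den)
    return rates

rng = np.random.default_rng(0)
t, r, K = 3, 2, 2
H = [rng.standard_normal((r, t)) for _ in range(K)]
S = [(lambda a: a @ a.T)(rng.standard_normal((t, t))) for _ in range(K)]
R = dpc_rates(H, S, [0, 1])          # user 0 encoded first
```

With encoding order [0, 1], the last-encoded user sees no interference, so R[1] reduces to the single-user rate (1/2) log|H_2 S_2 H_2^T + I_r|, matching the 2-user expressions for R_1 and R_2 given above.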

The DPC achievable rate region is formally defined in the following.

Definition 2.1 The DPC achievable rate region of the Gaussian MIMO BC in (1) with average power constraint P, R_DPC(P), is the convex hull of the union of all sets F(π, {H_k}, {S_k}) over all permutations and admissible covariance matrices, i.e.,

\mathcal{R}_{\mathrm{DPC}}(P) = \mathrm{Conv} \Biggl( \bigcup_{\pi, \{S_k\} : \, S_k \succeq 0 \; \forall k, \; \sum_k \mathrm{tr}(S_k) \le P} F(\pi, \{H_k\}, \{S_k\}) \Biggr). \qquad (5)

Except for a few special cases, it is almost impossible to characterize the DPC region by examining all possible permutations and admissible transmit covariance matrices. Hence, one might think of using the convexity of this region to characterize its boundary points. However, the rate terms R̄_k in (4) are not concave functions of {S_k}, and therefore it is very difficult to directly characterize the boundary points of the region R_DPC(P). In the following section, it is shown how this difficulty can be overcome using duality theory.


2.1 Alternative Representation of the Dirty Paper Coding Region via Duality

The Gaussian multiple access-broadcast channel duality was observed independently by the authors of [7], [8] and [10]. The approach introduced in [7] and [8] is employed for the purpose of this section. There, it was shown that for any permutation π and admissible set of covariance matrices {S_k}, the rates R̄_k in (4) are achievable in a Gaussian multiple access channel that is obtained from the broadcast channel by reversing the roles of the transmitter and the receivers. Specifically, the output of this dual MAC is given by,

y = \sum_{k=1}^{K} H_k^T x_k + z, \qquad (6)

where y ∈ R^t is the received vector, x_k ∈ R^r is the transmitted vector of user k, and z is the receiver's Gaussian noise with covariance matrix I_t. In this dual MAC, the matrices H_k are the same as the ones in (1). Here, H_k(i, j) is the channel coefficient from transmit antenna i of user k to receive antenna j.

The duality result states that for any π and any set of t × t covariance matrices {S_k} used in the DPC scheme, there exists a set of r × r covariance matrices {S̄_k} for the dual MAC such that ∑_k tr(S̄_k) = ∑_k tr(S_k) and, for k = 1, . . . , K,

\frac{1}{2} \log \frac{\Bigl| H_{\pi(k)} \Bigl( \sum_{i \ge k} S_{\pi(i)} \Bigr) H_{\pi(k)}^T + I_r \Bigr|}{\Bigl| H_{\pi(k)} \Bigl( \sum_{i > k} S_{\pi(i)} \Bigr) H_{\pi(k)}^T + I_r \Bigr|} = \frac{1}{2} \log \frac{\Bigl| \sum_{i \le k} H_{\pi(i)}^T \bar{S}_{\pi(i)} H_{\pi(i)} + I_t \Bigr|}{\Bigl| \sum_{i < k} H_{\pi(i)}^T \bar{S}_{\pi(i)} H_{\pi(i)} + I_t \Bigr|}. \qquad (7)

Recall that the left-hand side expression is R̄_{π(k)} when the covariance matrices of the codewords in the DPC scheme are equal to {S_k}. Moreover, a close look at the right-hand side expression reveals that it is equal to I(x_{π(k)}; y | x_{π(k+1)}, . . . , x_{π(K)}) in the dual MAC in (6) when the users exploit Gaussian code-books with covariance matrices {S̄_k}. Therefore, the rates R̄_k in the DPC region of the broadcast channel are achieved by successive decoding in the dual MAC, with the users' codewords decoded in the opposite order of π. Conversely, for any given π and any set of covariance matrices {S̄_k} such that ∑_k tr(S̄_k) ≤ P, there exists a set of covariance matrices {S_k} that satisfies the equalities in (7) with ∑_k tr(S_k) = ∑_k tr(S̄_k). Given that Gaussian code-books are optimal for a Gaussian multiple access channel, it can be concluded that the DPC region in (5) is equal to the capacity region of the dual MAC under sum power constraint P. Let this region be denoted by C_MAC^sum(P). In short, by duality,

R_DPC(P) = C_MAC^sum(P).

In order to describe C_MAC^sum(P), recall that by using a Gaussian code-book with covariance matrix S_k for user k, the following set of rates is achievable in the dual multiple access channel [17], [15]:

G(\{H_k^T\}, \{S_k\}) = \Biggl\{ R \in \mathbb{R}_+^K : \sum_{k \in J} R_k \le \frac{1}{2} \log \Bigl| \sum_{k \in J} H_k^T S_k H_k + I_t \Bigr|, \; \forall \, J \subseteq \{1, . . . , K\} \Biggr\}. \qquad (8)


In effect, the capacity region of the dual MAC in (6) under sum power constraint P can be expressed as,

C_{\mathrm{MAC}}^{\mathrm{sum}}(P) = \bigcup_{\{S_k\} : \, S_k \succeq 0 \; \forall k, \; \sum_k \mathrm{tr}(S_k) \le P} G(\{H_k^T\}, \{S_k\}). \qquad (9)

It is not hard to check that this region is closed. Moreover, the following proposition states that it is also convex, with no convex hull operation needed.

Proposition 2.1 The capacity region of the dual MAC under sum power constraint, as given in (9), is convex.

The proof of this proposition is given in Appendix A. In addition to the closedness and convexity properties of the set C_MAC^sum(P), according to the following lemma, each of the constituting sets G({H_k^T}, {S_k}) has a particular feature that becomes very handy in characterizing the boundary points of the set C_MAC^sum(P).

Lemma 2.2 For a fixed set of covariance matrices {S_k} and any µ_1, . . . , µ_K ≥ 0, the maximum in the optimization problem,

\text{Maximize} \quad \sum_{k=1}^{K} \mu_k R_k \quad \text{Subject to} \quad R \in G(\{H_k^T\}, \{S_k\}),

is attained by a permutation π over {1, 2, . . . , K} and a vertex R^π defined as,

R^{\pi}_{\pi(k)} = \frac{1}{2} \log \frac{\Bigl| \sum_{i=1}^{k} H_{\pi(i)}^T S_{\pi(i)} H_{\pi(i)} + I_t \Bigr|}{\Bigl| \sum_{i=1}^{k-1} H_{\pi(i)}^T S_{\pi(i)} H_{\pi(i)} + I_t \Bigr|}, \qquad k = 1, . . . , K,

where π is such that µ_{π(1)} ≥ µ_{π(2)} ≥ · · · ≥ µ_{π(K)}.

The proof of this lemma is based on the polymatroid structure of the set G({H_k^T}, {S_k}) for a fixed set of {S_k} and is provided in [15] and the references therein. Note that R^π is achievable by a successive decoding scheme in the dual MAC with the decoding order determined by the permutation π: the message of user π(K) is decoded first and the message of user π(1) is decoded last.
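For small K, the greedy ordering of Lemma 2.2 can be verified against a brute-force search over all permutations. The following sketch (arbitrary random instance, helper name `vertex` chosen for the example) assumes numpy is available:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
t, r, K = 3, 2, 3
H = [rng.standard_normal((r, t)) for _ in range(K)]
S = [(lambda a: a @ a.T)(rng.standard_normal((r, r))) for _ in range(K)]

def vertex(pi):
    """Vertex R^pi of Lemma 2.2; pi[0] = pi(1) is decoded last."""
    acc = np.eye(t)
    R = np.zeros(K)
    for k in pi:
        nxt = acc + H[k].T @ S[k] @ H[k]
        R[k] = 0.5 * (np.linalg.slogdet(nxt)[1] - np.linalg.slogdet(acc)[1])
        acc = nxt
    return R

mu = np.array([0.7, 2.0, 1.1])       # arbitrary distinct positive weights
greedy = tuple(np.argsort(-mu))      # pi(1) carries the largest weight
best = max(mu @ vertex(p) for p in permutations(range(K)))
```

The greedy order attains the brute-force maximum, and, since the per-vertex sum-rate telescopes, every decoding order yields the same total rate.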

All these properties, together with the fact that each set G({H_k^T}, {S_k}) is expressed by concave functions of {S_k}, make the dual representation of the DPC region an easier set to describe. Therefore, in the following, the DPC region is characterized by finding the boundary points of the set C_MAC^sum(P). Since the set C_MAC^sum(P) is a closed and convex set in R_+^K, any point on its boundary (also known as a Pareto optimal point) can be found by maximizing a weighted sum of the rates [18]. More specifically, any boundary point is a solution to the following optimization problem for some weights µ_1, . . . , µ_K ≥ 0:

\text{Maximize} \quad \sum_{k=1}^{K} \mu_k R_k \qquad (10)

\text{Subject to} \quad R \in C_{\mathrm{MAC}}^{\mathrm{sum}}(P).

Conversely, the solutions corresponding to all possible selections of the weights constitute the boundary. Without loss of generality, assume all the weights are positive: if µ_k = 0 for some k, the resulting solution can be viewed as a boundary point of the DPC region obtained by positive weights in a broadcast channel derived from the original channel after removing the users with µ_k = 0. Therefore, assume 0 < µ_1 ≤ µ_2 ≤ · · · ≤ µ_K. In general, two possible selections for the weights are feasible: the case where no pair of the weights is equal, i.e., 0 < µ_1 < µ_2 < · · · < µ_K, and the case where some of the weights are equal, e.g., 0 < µ_1 < · · · < µ_l = µ_{l+1} = · · · = µ_{l+m} < · · · < µ_K. Characterizing the boundary points corresponding to these two cases is slightly different, so they are considered separately. Lemma 2.3 further simplifies the optimization problem in (10) to characterize the boundary points corresponding to some given weights µ_1, . . . , µ_K.

Lemma 2.3 The boundary point R* maximizing ∑_{k=1}^{K} µ_k R_k over C_MAC^sum(P) for the two possible selections of µ_1, . . . , µ_K is characterized as follows:

(i) 0 < µ_1 < µ_2 < · · · < µ_K: in this case R* is unique and is given by,

R^{*}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{K} H_j^T S_j^* H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}, \qquad k = 1, . . . , K, \qquad (11)

where S_k^* for k = 1, . . . , K are optimal solutions to the following optimization problem (conventionally, µ_0 = 0):

\text{Maximize} \quad \sum_{k=1}^{K} (\mu_k - \mu_{k-1}) \frac{1}{2} \log \Bigl| \sum_{j=k}^{K} H_j^T S_j H_j + I_t \Bigr| \qquad (12)

\text{Subject to} \quad \sum_{k=1}^{K} \mathrm{tr}(S_k) \le P, \qquad (13)

\qquad\qquad\quad S_k \succeq 0, \quad k = 1, . . . , K. \qquad (14)

Furthermore, for any optimal S_k^*, k = 1, . . . , K, we have ∑_{k=1}^{K} tr(S_k^*) = P, and there exist λ* ∈ R_+ and positive semi-definite matrices Φ_k^* that jointly satisfy the following Karush-Kuhn-Tucker (KKT) optimality conditions for k = 1, . . . , K:

H_k \sum_{j=1}^{k} (\mu_j - \mu_{j-1}) \frac{1}{2} \Bigl( \sum_{i=j}^{K} H_i^T S_i^* H_i + I_t \Bigr)^{-1} H_k^T + \Phi_k^* - \lambda^* I_r = 0, \qquad (15)

\mathrm{tr}(\Phi_k^* S_k^*) = 0. \qquad (16)


(ii) 0 < µ_1 < · · · < µ_l = µ_{l+1} = · · · = µ_{l+m} < · · · < µ_K: in this case, the boundary point R* may not be unique. In fact, R* can be any point in the convex hull of the vertices R^{σ_i} given below for all permutations σ_i, i = 1, . . . , (m + 1)!, on the set I = {l, . . . , l + m}:

R^{\sigma_i}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{K} H_j^T S_j^* H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}, \qquad k \in \{1, . . . , K\} \setminus I, \qquad (17)

R^{\sigma_i}_{\sigma_i(k)} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{l+m} H_{\sigma_i(j)}^T S_{\sigma_i(j)}^* H_{\sigma_i(j)} + \sum_{j=l+m+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{l+m} H_{\sigma_i(j)}^T S_{\sigma_i(j)}^* H_{\sigma_i(j)} + \sum_{j=l+m+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}, \qquad k \in I, \qquad (18)

where S_k^* for k = 1, . . . , K are solutions to the same optimization problem (12) and satisfy the same KKT optimality conditions (15)-(16). Moreover, any such R* satisfies the following equalities:

R^{*}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{K} H_j^T S_j^* H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}, \qquad k \in \{1, . . . , K\} \setminus I, \qquad (19)

\sum_{k=l}^{l+m} R^{*}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=l}^{K} H_j^T S_j^* H_j + I_t \Bigr|}{\Bigl| \sum_{j=l+m+1}^{K} H_j^T S_j^* H_j + I_t \Bigr|}. \qquad (20)

Proof: First consider the case 0 < µ_1 < · · · < µ_K. Since the set C_MAC^sum(P) as defined in (9) is closed and convex, each of its boundary points must belong to a set G({H_k^T}, {S_k}) for some covariance matrices {S_k}. Using the result of Lemma 2.2, the vertex given by

R_k = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{K} H_j^T S_j H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{K} H_j^T S_j H_j + I_t \Bigr|}, \qquad k = 1, . . . , K,

maximizes ∑_k µ_k R_k over G({H_k^T}, {S_k}). Moreover, this point is achievable by successive decoding in the dual MAC where the message for user k is decoded kth in the order. Substituting these rate terms for R_k in (10) and including the sum power constraint, the optimization problem in (12) is obtained.

It is easy to show that the cost function of this optimization problem is continuous in {S_k} for any norm on the space of symmetric matrices. As is shown in [18], this function is actually differentiable with respect to these variables. Moreover, the optimization domain defined by the constraints (13) and (14) is closed and bounded, hence compact. Therefore, by the Weierstrass theorem [19], there exist S_k^* for k = 1, . . . , K that achieve the maximum. In addition, this optimization problem is convex, since it has a concave cost function and convex constraints on the covariance matrices {S_k} as in (13) and (14). Also, the Slater condition holds, since the feasible region has an interior point for any P > 0. Thus, any optimal solution of (12) must satisfy the Karush-Kuhn-Tucker (KKT) optimality conditions and vice versa [18]. To obtain the KKT conditions, let λ ≥ 0 be the dual variable associated with the sum power constraint in (13) and the matrix Φ_k ⪰ 0 be the dual variable associated with the positive semi-definite constraint on S_k in (14) for k = 1, . . . , K. Then the Lagrangian for this optimization problem can be expressed as,

L(\{S_k\}, \{\Phi_k\}, \lambda) = \sum_{k=1}^{K} (\mu_k - \mu_{k-1}) \frac{1}{2} \log \Bigl| \sum_{j=k}^{K} H_j^T S_j H_j + I_t \Bigr| + \sum_{k=1}^{K} \mathrm{tr}(S_k \Phi_k) - \lambda \Bigl( \sum_{k=1}^{K} \mathrm{tr}(S_k) - P \Bigr).

Taking the derivative of the Lagrangian with respect to S_k, for some norm on the space of symmetric matrices, yields the left-hand side expression in (15) for k = 1, . . . , K [18]. The conditions in (16) are known as the complementary slackness conditions. Since the optimum value is achieved, there exist feasible S_k^*, Φ_k^* for k = 1, . . . , K and λ* that satisfy the KKT conditions (15)-(16). The S_k^* are referred to as the primal optimal solutions, while Φ_k^* and λ* are referred to as the dual optimal solutions. Note that since, for any k, log|∑_{j=k}^{K} H_j^T (αS_j) H_j + I_t| is a strictly increasing function of α ∈ R_+, the power constraint must hold with equality, i.e., ∑_k tr(S_k^*) = P; otherwise, all the S_k^* could be scaled up by a factor α > 1 to increase the cost function while still satisfying the trace constraint. Also, all the matrix terms on the left-hand side of (15) are positive semi-definite, and the matrix terms involving the S_k^* cannot all be zero. Therefore, the optimal λ* must be positive. For this choice of the weights, it can be shown that the boundary point R* is also unique.¹

Next consider the case 0 < µ_1 < · · · < µ_l = µ_{l+1} = · · · = µ_{l+m} < · · · < µ_K. As in the case of unequal weights, each boundary point must belong to a set G({H_k^T}, {S_k}) for some feasible covariance matrices {S_k}. According to Lemma 2.2, ∑_k µ_k R_k is maximized over G({H_k^T}, {S_k}) by all the vertices that have the following decoding orders: user k, for k ∉ {l, . . . , l + m}, is decoded kth in the order. However, since µ_l = · · · = µ_{l+m}, the lemma does not specify any decoding order for the users in {l, . . . , l + m}. Therefore, by choosing the various decoding orders specified by the permutations σ_i, i = 1, . . . , (m + 1)!, on these m + 1 users, all the vertices R^{σ_i} that maximize ∑_k µ_k R_k can be found, as given in (17)-(18). Recall that there is no other vertex that maximizes ∑_k µ_k R_k. Referring to the polymatroid structure of the set G({H_k^T}, {S_k}) for a fixed set of {S_k} [15], the convex hull of these (m + 1)! vertices constitutes an m-dimensional boundary surface of G({H_k^T}, {S_k}), and clearly any point on this convex hull maximizes ∑_k µ_k R_k. Note that some or all of these points may coincide, resulting in a smaller-dimensional surface (possibly a single point); in general, however, they produce an m-dimensional surface. It is easy to verify that for all

¹Uniqueness of this point is a direct consequence of the strict concavity of the log|·| function; however, it is of little significance to the proof and is included here for the sake of completeness.

the vertices R^{σ_i}, i = 1, . . . , (m + 1)!,

R^{\sigma_i}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=k}^{K} H_j^T S_j H_j + I_t \Bigr|}{\Bigl| \sum_{j=k+1}^{K} H_j^T S_j H_j + I_t \Bigr|}, \qquad k \in \{1, . . . , K\} \setminus I,

\sum_{k=l}^{l+m} R^{\sigma_i}_{k} = \frac{1}{2} \log \frac{\Bigl| \sum_{j=l}^{K} H_j^T S_j H_j + I_t \Bigr|}{\Bigl| \sum_{j=l+m+1}^{K} H_j^T S_j H_j + I_t \Bigr|}.

Since µ_l = · · · = µ_{l+m}, after substituting the rate terms for these vertices in ∑_k µ_k R_k and taking into account the second equality above, the same optimization problem in (12) is obtained. The solution to this problem may not be unique; however, any set of optimal {S_k^*} satisfies the KKT conditions (15)-(16) and identifies the vertices R^{σ_i}. The boundary point R* may be chosen as any point in the convex hull of these vertices and clearly satisfies the equalities in (19) and (20). Note that this boundary point R* may no longer be achievable by the successive decoding scheme in the dual MAC; hence, it may not be achievable by only using the DPC scheme in the broadcast channel. Nevertheless, the DPC scheme achieves each of the vertices R^{σ_i}, and by time sharing among the codes achieving these vertices, R* can be achieved.

This result will be exploited in the next section to prove the optimality of the DPC scheme.

Figure 3 sketches the DPC region and shows a boundary point (R_1^*, R_2^*) for some µ_1, µ_2.

3 Optimality of the Dirty Paper Coding Scheme

Theorem 3.1 R_DPC(P) is the capacity region, C_BC(P).

To prove this theorem, the same approach proposed in [11], [12] and [13] is used: the boundary of the DPC region is partitioned into several segments (possibly single points), and the converse is proven independently for each segment. The main idea of this approach is to exploit the known results on the capacity of degraded broadcast channels. By partitioning the boundary of the DPC region into several segments, a degraded broadcast channel is constructed for each boundary segment, B, with the following properties. First, it has the same segment B on the boundary of its DPC region; in other words, for that particular segment, the DPC scheme performs at best the same in the degraded channel as in the original channel. Second, the capacity region of the degraded channel contains the capacity region of the original channel. Then, by using the known results on the capacity of degraded broadcast channels, it is shown that the DPC scheme is optimal for the degraded channel; hence, this channel has the segment B on the boundary of its capacity region. Since the capacity region of the degraded channel contains the capacity region of the original channel, it is concluded that B is also on the boundary of the capacity region of the original channel. The same argument is used for all the boundary segments to prove the optimality of the DPC scheme for the whole region.

Figure 3: Characterizing the boundary points of the DPC region via duality. (The figure shows a two-user DPC region, the set G({H_k^T}, {S_k^*}), and the boundary point (R_1^*, R_2^*) supported by a line of slope −µ_1/µ_2.)

The authors of [11] and [12] successfully constructed these degraded broadcast channels for each boundary segment; however, they failed to prove the optimality of the DPC scheme for these degraded broadcast channels. Obviously, the choice of these degraded channels has a significant effect on the simplicity of the converse proof.

In this section, the converse proof is given for each boundary point of R_DPC(P). A degraded broadcast channel for the boundary point R* that maximizes ∑_{k=1}^{K} µ_k R_k, for given µ_1, . . . , µ_K ≥ 0, is defined based on the optimality conditions given in Lemma 2.3. In Lemma 3.1, it is shown that R* is on the boundary of the DPC region of this degraded channel (first property). Furthermore, it is proven that this degraded channel has a larger capacity region than the original channel (second property). Using the entropy power inequality, it is then proven that R* also lies on the boundary of the capacity region of the degraded broadcast channel (Lemma 3.2).

While the proof presented in this section borrows its main idea from the previous works, including the recent converse by Weingarten, Steinberg and Shamai (WSS) [13], it has several key differences. The converse by WSS is first proven for a particular class of degraded MIMO broadcast channels referred to as Aligned Degraded Broadcast Channels (ADBC), while the proof proposed here applies directly to general Gaussian MIMO broadcast channels. Moreover, in [13], the converse for ADBC channels is proven through the definition of an enhanced ADBC channel with certain properties that make it possible to employ the entropy power inequality. Although this enhanced ADBC channel also shows up in the proposed proof, the way it is obtained is entirely different from the approach of WSS: here, the channel is defined directly from the optimality conditions of a convex optimization problem and turns out to have all the properties of the enhanced channel, whereas in [13] the existence of such a channel is proven mainly using non-convex optimization techniques. As the last step of the WSS proof, the converse is generalized to the larger class of Aligned Multiple-input Multiple-output Broadcast Channels (AMBC). Using the result for AMBC channels, the proof is then extended to the general Gaussian MIMO BC by showing that the capacity region of a MIMO BC can be expressed as the limit of the capacity regions of a sequence of AMBC channels as some of the eigenvalues of the noise covariance matrices go to infinity. No such limiting argument is needed in the converse proof of this section.

To focus on the key steps of the proof rather than the details, and to simplify the presentation, the optimality of the DPC scheme is first proven for K = 2 users. The general case of K > 2 users is postponed to Section 3.1.

Now the details of the proof are given. Consider the boundary point R∗ of RDPC(P) corresponding to given 0 < µ1 < µ2 as characterized in Lemma 2.3. Recall that for K = 2, the choices of µ1 = 0 or µ2 = 0 and µ1 = µ2 correspond to the capacities of the individual users and the maximum sum-rate point of the DPC region, respectively. These points are of no interest since optimality of the DPC scheme is already known for them (see [8], [9] and [10]). Thus, it only remains to consider the cases 0 < µ1 < µ2 or 0 < µ2 < µ1. Since the proof for the latter case is identical to the proof for the former case, it is sufficient to only consider 0 < µ1 < µ2. For these given weights, define the t × t symmetric matrices,

$$Q_1 = \frac{\mu_1}{2\lambda^*}\left(H_1^T S_1^* H_1 + H_2^T S_2^* H_2 + I_t\right)^{-1}, \qquad (21)$$

$$Q_2 = \frac{\mu_1}{2\lambda^*}\left(H_1^T S_1^* H_1 + H_2^T S_2^* H_2 + I_t\right)^{-1} + \frac{\mu_2-\mu_1}{2\lambda^*}\left(H_2^T S_2^* H_2 + I_t\right)^{-1}, \qquad (22)$$

where S∗1, S∗2 and λ∗ are, respectively, the primal and dual optimal solutions of the optimization problem (12) in Lemma 2.3. Clearly, these matrices are positive definite. Note that both Q1 and Q2 depend on the weight vector µ = (µ1, µ2). This dependency is not explicitly included for notational simplicity. In the following, a degraded MIMO BC is defined corresponding to the boundary point of RDPC(P) under consideration.

Definition 3.1 For a given weight vector µ and its corresponding boundary point R∗ of RDPC(P), define the DBC(µ) channel as,

$$y_k = x + z_k, \qquad k = 1, 2, \qquad (23)$$

where x, y1 and y2 ∈ Rt are the channel input and output vectors, respectively, and z1, z2 are Gaussian noise vectors with covariance matrices Q1 and Q2, respectively. Further assume the same total average transmit power P for this channel.

It is immediate from the definition of Q1 and Q2 that 0 ≺ Q1 ≺ Q2. This choice of Q1 and Q2 ensures that DBC(µ) is statistically degraded, a property that will be used later to establish its capacity region. The following lemma shows that the boundary of the DPC region of DBC(µ) is tangent to the boundary of RDPC(P) at R∗; hence, this point is achievable by the DPC scheme in this channel.
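As a quick numerical sanity check of this ordering (a sketch using randomly generated stand-ins for H1, H2, S∗1, S∗2 and λ∗, since the true values are the optimal solutions of problem (12)), the definitions (21)-(22) indeed produce 0 ≺ Q1 ≺ Q2 whenever 0 < µ1 < µ2:

```python
import numpy as np

rng = np.random.default_rng(0)
t, r = 4, 3
mu1, mu2, lam = 1.0, 2.0, 0.5      # stand-ins: 0 < mu1 < mu2, dual variable lambda* > 0
H1, H2 = rng.standard_normal((r, t)), rng.standard_normal((r, t))
A1, A2 = rng.standard_normal((r, r)), rng.standard_normal((r, r))
S1, S2 = A1 @ A1.T, A2 @ A2.T      # stand-in PSD matrices playing the role of S1*, S2*

M12 = H1.T @ S1 @ H1 + H2.T @ S2 @ H2 + np.eye(t)
M2 = H2.T @ S2 @ H2 + np.eye(t)
Q1 = mu1 / (2 * lam) * np.linalg.inv(M12)              # equation (21)
Q2 = Q1 + (mu2 - mu1) / (2 * lam) * np.linalg.inv(M2)  # equation (22)

def is_pd(X):
    """Positive definiteness via the smallest eigenvalue of the symmetric part."""
    return bool(np.linalg.eigvalsh((X + X.T) / 2).min() > 0)

print(is_pd(Q1), is_pd(Q2 - Q1))   # True True: 0 < Q1 < Q2 in the PSD order
```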

Lemma 3.1 The point R∗ maximizes µ1R1 + µ2R2 over the DPC region of DBC(µ), denoted by $R^{DBC(\mu)}_{DPC}(P)$.

The proof of this lemma is given in Appendix B. Figure 4 shows the DPC regions for both the original channel and the degraded channel, DBC(µ), defined for the point R∗.

Before proceeding to prove that the point R∗ also lies on the boundary of the capacity region of DBC(µ), consider the transmit covariance matrices for the DBC(µ) channel that achieve R∗ by the DPC scheme. Denote these matrices by Γ∗1 and Γ∗2. Lemma 3.3 of Section 3.1 shows that these matrices are given by,

$$\Gamma_1^* = \frac{\mu_1}{2\lambda^*}\left(H_2^T S_2^* H_2 + I_t\right)^{-1} - Q_1 = \frac{\mu_1}{2\lambda^*}\left(\left(H_2^T S_2^* H_2 + I_t\right)^{-1} - \left(H_1^T S_1^* H_1 + H_2^T S_2^* H_2 + I_t\right)^{-1}\right), \qquad (24)$$

$$\Gamma_2^* = \frac{\mu_2}{2\lambda^*} I_t - \Gamma_1^* - Q_2 = \frac{\mu_2}{2\lambda^*}\left(I_t - \left(H_2^T S_2^* H_2 + I_t\right)^{-1}\right), \qquad (25)$$

[Figure 4 — axes R1 (horizontal) and R2 (vertical); regions RDPC(P) and $R^{DBC(\mu)}_{DPC}(P)$; a supporting line of slope −µ1/µ2 touches both boundaries at (R∗1, R∗2).]

Figure 4: The point (R∗1, R∗2) lies on the boundaries of the DPC regions for the original broadcast channel and the degraded channel, DBC(µ).

[Figure 5 — block diagram: x plus z1 ∼ N(0, Q1) gives y1; y1 plus z′2 ∼ N(0, Q2 − Q1) gives y2.]

Figure 5: Physically degraded broadcast channel with the same capacity region as DBC(µ).

where S∗1, S∗2 and λ∗ are the optimal solutions to the optimization problem in (12). Moreover, this lemma expresses R∗1, R∗2 in terms of Γ∗1 and Γ∗2 as,

$$R_1^* = \frac{1}{2}\log\frac{\left|\Gamma_1^* + Q_1\right|}{\left|Q_1\right|}, \qquad R_2^* = \frac{1}{2}\log\frac{\left|\Gamma_1^* + \Gamma_2^* + Q_2\right|}{\left|\Gamma_1^* + Q_2\right|}.$$

Lemma 3.2 The point R∗, which was shown to be on the boundary of the DPC region of DBC(µ), is also on the boundary of its capacity region.

Proof: The method of proof by contradiction is employed to verify that (R∗1, R∗2) is on the boundary of the capacity region of DBC(µ). The steps of the proof are very similar to Bergmans' converse given for the scalar case [1]. Since Q1 ≺ Q2, DBC(µ) has the same marginal transition probability distributions, and therefore the same capacity region, as a physically degraded broadcast channel given by,

y1 = x + z1,

y2 = y1 + z′2,

where z1 and z′2 are independent Gaussian noises with covariance matrices equal to Q1 and Q2 − Q1, respectively (see Figure 5). To be able to use the entropy power inequality, this degraded version of DBC(µ) is used for the capacity analysis. First assume (R∗1, R∗2) is not on the boundary and lies within the capacity region of DBC(µ). Then there exists a rate-pair (R1, R2) in the capacity region and an arbitrarily small δ > 0 such that R∗k + 2δ ≤ Rk for k = 1, 2. Consider C(enR1, enR2, n), an arbitrary sequence of codes each with block length n and rates (R1, R2) such that the average probability of decoding error, $P_e^{(n)}$, vanishes as n → ∞. Let $x^n$ denote the nt by 1 stacked vector of the transmitted symbols, $x^n = [x(1)^T \cdots x(n)^T]^T$, and define the noise vectors $z_1^n$, $z_2'^n$ and corresponding output vectors $y_1^n$, $y_2^n$ similarly. By Fano's inequality, for the codes under consideration with large enough block-length n, the following inequalities hold:

$$R_1^* + \delta \le R_1 - \delta \le \frac{1}{n} I(W_1; y_1^n \mid W_2), \qquad (26)$$

$$R_2^* + \delta \le R_2 - \delta \le \frac{1}{n} I(W_2; y_2^n), \qquad (27)$$

where W1, W2 are the intended messages for user 1 and user 2, respectively. By expanding the mutual information term, (26) is reduced to,

$$R_1^* + \delta \le \frac{1}{n} h(y_1^n \mid W_2) - \frac{1}{n} h(y_1^n \mid W_1, W_2),$$

where h(·) is the differential entropy function. Note that $y_1^n$, $y_2^n$ are obtained by addition of two independent Gaussian noise vectors to $x^n$; hence, they both have densities. Since $h(y_1^n \mid W_1, W_2) = h(z_1^n) = \frac{n}{2}\log(2\pi e)^t |Q_1|$ and $R_1^* = \frac{1}{2}\log\left|\Gamma_1^* + Q_1\right| - \frac{1}{2}\log|Q_1|$, from these equalities the following lower bound on $h(y_1^n \mid W_2)$ can be established:

$$\frac{1}{n} h(y_1^n \mid W_2) \ge \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + Q_1\right| + \delta. \qquad (28)$$

Now since $z_2'^n$ is independent of $(W_1, W_2, z_1^n)$, and conditioned on $W_2$, $y_2^n = y_1^n + z_2'^n$ and $y_1^n$ have densities, the entropy power inequality [17] can be applied to obtain,

$$\exp\left(\frac{2}{nt} h(y_2^n \mid W_2)\right) \ge \exp\left(\frac{2}{nt} h(y_1^n \mid W_2)\right) + \exp\left(\frac{2}{nt} h(z_2'^n)\right). \qquad (29)$$
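For a Gaussian vector with covariance Q, the entropy power exp(2h/t) equals 2πe|Q|^{1/t}, so for Gaussian terms (29) reduces to Minkowski's determinant inequality, |A + B|^{1/t} ≥ |A|^{1/t} + |B|^{1/t} for positive semi-definite A and B. A small randomized check of that inequality (stand-in matrices, not the quantities of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
t = 5
for _ in range(200):
    X, Y = rng.standard_normal((t, t)), rng.standard_normal((t, t))
    A, B = X @ X.T, Y @ Y.T                       # random PSD matrices
    lhs = np.linalg.det(A + B) ** (1 / t)
    rhs = np.linalg.det(A) ** (1 / t) + np.linalg.det(B) ** (1 / t)
    assert lhs >= rhs - 1e-8                      # Minkowski determinant inequality
print("ok")
```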

Employing the lower bound obtained in (28) in (29) and substituting $h(z_2'^n) = \frac{n}{2}\log(2\pi e)^t |Q_2 - Q_1|$, the inequality (29) is reduced to,

$$\exp\left(\frac{2}{nt} h(y_2^n \mid W_2)\right) \ge 2\pi e\left(\left|\Gamma_1^* + Q_1\right|^{1/t} + \left|Q_2 - Q_1\right|^{1/t}\right) + \delta',$$

for some small δ′ > 0. However, as illustrated in the following, the expressions for Γ∗1, Q1 and Q2 in (24), (21) and (22) reveal that the two matrix expressions on the right-hand side, (Γ∗1 + Q1) and (Q2 − Q1), are scaled versions of each other:

$$\Gamma_1^* + Q_1 = \frac{\mu_1}{2\lambda^*}\left(H_2^T S_2^* H_2 + I_t\right)^{-1}, \qquad Q_2 - Q_1 = \frac{\mu_2 - \mu_1}{2\lambda^*}\left(H_2^T S_2^* H_2 + I_t\right)^{-1}.$$

In effect, they satisfy the following equality:

$$\left|\Gamma_1^* + Q_1\right|^{1/t} + \left|Q_2 - Q_1\right|^{1/t} = \left|\Gamma_1^* + Q_2\right|^{1/t}.$$
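This is precisely the equality case of Minkowski's determinant inequality: Γ∗1 + Q1 and Q2 − Q1 are positive multiples of one common matrix, so the entropy power step is tight. A quick numerical confirmation with an arbitrary positive definite M and scale factors standing in for µ1/2λ∗ and (µ2 − µ1)/2λ∗:

```python
import numpy as np

rng = np.random.default_rng(2)
t = 4
X = rng.standard_normal((t, t))
M = X @ X.T + np.eye(t)        # an arbitrary positive definite matrix
c1, c2 = 0.7, 1.3              # stand-ins for mu1/(2 lam) and (mu2 - mu1)/(2 lam)
A, B = c1 * M, c2 * M          # scaled versions, like Gamma1* + Q1 and Q2 - Q1

lhs = np.linalg.det(A) ** (1 / t) + np.linalg.det(B) ** (1 / t)
rhs = np.linalg.det(A + B) ** (1 / t)
print(bool(np.isclose(lhs, rhs)))   # True: |A|^(1/t) + |B|^(1/t) = |A + B|^(1/t)
```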

This equality yields a lower bound on $h(y_2^n \mid W_2)$ as given below:

$$\frac{1}{n} h(y_2^n \mid W_2) \ge \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + Q_2\right| + \delta'', \qquad (30)$$

where δ′′ > 0 is some small positive constant. The lower bound (30) can be used in Fano's inequality (27) to obtain a lower bound on $h(y_2^n)$ as

$$\frac{1}{n} h(y_2^n) \ge R_2^* + \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + Q_2\right| + \delta + \delta'' \ge \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + \Gamma_2^* + Q_2\right| + \delta + \delta'',$$

where the second term is obtained by substituting $R_2^* = \frac{1}{2}\log\left|\Gamma_1^* + \Gamma_2^* + Q_2\right| - \frac{1}{2}\log\left|\Gamma_1^* + Q_2\right|$. However, $y_2^n = x^n + z_1^n + z_2'^n$ is the transmitted codeword corrupted by an additive Gaussian noise that is i.i.d. over transmissions and on each transmission has covariance matrix Q2. In other words, $z_1^n + z_2'^n$ has an nt by nt block-diagonal covariance matrix with Q2 on each diagonal block:

$$\begin{bmatrix} Q_2 & 0 & \cdots & 0 \\ 0 & Q_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Q_2 \end{bmatrix}.$$
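This block-diagonal structure can be written compactly as the Kronecker product $I_n \otimes Q_2$; a minimal illustration with a stand-in Q2:

```python
import numpy as np

n, t = 3, 2
Q2 = np.array([[2.0, 0.5],
               [0.5, 1.0]])                 # stand-in per-transmission noise covariance
cov = np.kron(np.eye(n), Q2)                # nt x nt block-diagonal, Q2 on each block

assert cov.shape == (n * t, n * t)
assert np.allclose(cov[t:2 * t, t:2 * t], Q2)   # second diagonal block is Q2
assert np.allclose(cov[:t, t:2 * t], 0)         # off-diagonal blocks are zero
```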

Moreover, there is an average power constraint on $x^n$,

$$\sum_{i=1}^{n} E\left(x(i)^T x(i)\right) = E\left(x^{nT} x^n\right) \le nP.$$

Therefore, by the water-filling conditions [17], $h(y_2^n)$ is maximized by a random vector $x^n$ with i.i.d. Gaussian x(i) for i = 1, . . . , n. In addition, each x(i) has zero mean and a covariance matrix that has the same eigenvectors as the noise covariance matrix Q2 and whose eigenvalues water-fill the eigenvalues of Q2. It is not hard to show that the optimal positive semi-definite covariance matrix $\Sigma = E(x(i)x(i)^T)$ that maximizes $h(y_2^n)$ subject to the power constraint must satisfy the water-filling conditions given below,

$$(\Sigma + Q_2)^{-1} + \Theta = \alpha I_t, \qquad \operatorname{tr}(\Sigma) = P, \qquad \operatorname{tr}(\Sigma\Theta) = 0,$$

where Θ is a t × t positive semi-definite matrix and α is a positive real number. From (25), it can be seen that $\Gamma_1^* + \Gamma_2^* + Q_2 = \frac{\mu_2}{2\lambda^*} I_t$, and furthermore, in the proof of Lemma 3.3, it is shown that $\operatorname{tr}(\Gamma_1^* + \Gamma_2^*) = P$. Therefore, the transmit covariance matrix $\Gamma_1^* + \Gamma_2^*$ satisfies the water-filling conditions for Θ = 0 and α = 2λ∗/µ2 and maximizes $h(y_2^n)$. In effect, $\frac{1}{n} h(y_2^n) \le \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + \Gamma_2^* + Q_2\right|$, which contradicts the previous inequality. Therefore (R∗1, R∗2) lies on the boundary of the capacity region of DBC(µ).
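The water-filling step can be reproduced numerically: diagonalize Q2, pour the power over its eigenvalues, and check the stated conditions with Θ = 0 (which applies when P is large enough that every eigenmode receives power; Q2 and P below are stand-ins, not the matrices of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)
t, P = 4, 50.0
X = rng.standard_normal((t, t))
Q2 = X @ X.T / t + np.eye(t)          # stand-in noise covariance
q, U = np.linalg.eigh(Q2)             # Q2 = U diag(q) U^T

# With P large enough, every mode is active and the water level is
# nu = (P + sum(q)) / t, so that sum_i (nu - q_i) = P.
nu = (P + q.sum()) / t
assert np.all(nu > q)                 # all modes active -> Theta = 0
Sigma = U @ np.diag(nu - q) @ U.T     # water-filling transmit covariance

# Water-filling conditions: (Sigma + Q2)^{-1} = alpha I_t and tr(Sigma) = P.
alpha = 1.0 / nu
assert np.allclose(np.linalg.inv(Sigma + Q2), alpha * np.eye(t))
assert np.isclose(np.trace(Sigma), P)
```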

To complete the proof and show that R∗ is on the boundary of CBC(P), it remains to show that the capacity region of DBC(µ) contains the capacity region of the original broadcast channel. Since R∗ is on the boundary of the capacity region of DBC(µ), which contains CBC(P), it must be on the boundary of CBC(P) as well.

Let (R1, R2) be a rate-pair in CBC(P). To prove that CBC(P) is contained in the capacity region of DBC(µ), it is sufficient to show that any code achieving the rate-pair (R1, R2) in the original broadcast channel with arbitrarily small probability of decoding error can be used in DBC(µ) to achieve the same rates with the same probability of decoding error. Consider a code for the original broadcast channel with rates (R1, R2) and arbitrarily small probability of error, and assume this code is used in DBC(µ). To be able to decode each codeword, an appropriate decoder is constructed for receiver k of DBC(µ) as described in the following:

1. Receiver k multiplies each received symbol yk(i) by Hk.

2. Receiver k adds an i.i.d. Gaussian noise vector with covariance matrix $I_r - H_k Q_k H_k^T$ to the resulting symbols, with Qk as given in (21)-(22).

3. The resulting symbols are decoded using the same decoding rule as in the original broadcast channel.

After step 1, receiver k obtains a codeword that is statistically the same as the one received in a Gaussian MIMO BC with channel matrix Hk and noise covariance matrix $H_k Q_k H_k^T$ for receiver k. Note that the equalities given in (15) ensure that $H_k Q_k H_k^T \preceq I_r$ for k = 1, 2. Therefore, there exists a Gaussian noise vector with covariance matrix as given in step 2. By step 2, the resulting codeword is statistically the same as one passed through a broadcast channel with channel matrix Hk and noise covariance matrix $H_k Q_k H_k^T + I_r - H_k Q_k H_k^T = I_r$ for receiver k. Thus, the same decoding functions as in the original broadcast channel can be used to decode each message with the same probability of error. This completes the converse proof for K = 2 users.
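The three decoding steps are, in essence, a covariance-matching argument: after step 2, the effective noise at receiver k has covariance $H_k Q_k H_k^T + (I_r - H_k Q_k H_k^T) = I_r$, the noise covariance of the original channel. A sketch with stand-in matrices (here Qk is simply rescaled so that $H_k Q_k H_k^T \preceq I_r$, whereas in the proof this follows from (15)):

```python
import numpy as np

rng = np.random.default_rng(4)
t, r = 4, 3
Hk = rng.standard_normal((r, t))
X = rng.standard_normal((t, t))
Qk = X @ X.T + np.eye(t)
# Rescale so that Hk Qk Hk^T <= I_r (in the proof this is guaranteed by (15)).
Qk /= 2 * np.linalg.eigvalsh(Hk @ Qk @ Hk.T).max()

HQH = Hk @ Qk @ Hk.T
# Step 2 is feasible: I_r - Hk Qk Hk^T is a valid (PSD) covariance matrix.
assert np.linalg.eigvalsh(np.eye(r) - HQH).min() >= 0
# Post-step-1 noise covariance plus the injected noise covariance gives I_r.
assert np.allclose(HQH + (np.eye(r) - HQH), np.eye(r))
```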

3.1 Extension to More than Two Users

In this section, the converse proof is extended to more than two users. Except for some technical details, the extended proof follows exactly the same line of reasoning as the two-user case. Let R∗ be the boundary point of RDPC(P) corresponding to some given weights µ1, . . . , µK ≥ 0. For the converse proof, it is sufficient to prove the optimality of the DPC scheme for the boundary points that correspond to positive weights. If µk = 0 for some k, then R∗ will essentially lie on the boundary of the DPC region of a K − 1 user broadcast channel obtained by removing user k from the original channel. Hence, by induction on K and assuming that the converse holds for K − 1 users, this point will be on the boundary of the capacity region of this K − 1 user broadcast channel, which is in fact the boundary segment of the capacity region of the K user original channel corresponding to Rk = 0. Consequently, R∗ will lie on the boundary of the capacity region of the K user broadcast channel. Therefore, without loss of generality assume 0 < µ1 ≤ · · · ≤ µK and consider the corresponding boundary point R∗ as specified in Lemma 2.3.

Parallel to Section 3, for the boundary point R∗ define the matrices $Q_k \in \mathbb{S}^t$ as,

$$Q_k = \frac{1}{2\lambda^*} \sum_{j=1}^{k} (\mu_j - \mu_{j-1}) \left(\sum_{i=j}^{K} H_i^T S_i^* H_i + I_t\right)^{-1}, \qquad k = 1, \ldots, K, \qquad (31)$$

where µ0 = 0.

Also let the degraded broadcast channel DBC(µ) be defined as before:

$$y_k = x + z_k, \qquad k = 1, \ldots, K,$$

where zk is a white Gaussian noise vector with covariance matrix equal to Qk. Note that 0 ≺ Q1 ⪯ Q2 ⪯ · · · ⪯ QK and DBC(µ) is a statistically degraded broadcast channel. As in the 2-user case, according to Lemma 3.1, the boundary of the DPC region of this channel is tangent to the boundary of the DPC region of the original channel at the point R∗.

Furthermore, the transmit covariance matrices that achieve R∗ by the DPC scheme in DBC(µ) are provided in the following lemma.

Lemma 3.3 For the case 0 < µ1 < · · · < µK, the transmit covariance matrices of DBC(µ) that achieve the boundary point R∗ as given in (11) by the DPC scheme satisfy the following recursive formulas:

$$\Gamma_k^* = \frac{\mu_k}{2\lambda^*}\left(\sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} - \sum_{j=1}^{k-1} \Gamma_j^* - Q_k, \qquad k = 1, \ldots, K, \qquad (32)$$

and can be expressed as given below:

$$\Gamma_k^* = \frac{\mu_k}{2\lambda^*}\left(\sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} - \frac{\mu_k}{2\lambda^*}\left(\sum_{j=k}^{K} H_j^T S_j^* H_j + I_t\right)^{-1}, \qquad k = 1, \ldots, K, \qquad (33)$$

where S∗k and λ∗ are the optimal solutions to the optimization problem (12). Furthermore, for the case 0 < µ1 < · · · < µl = · · · = µl+m < · · · < µK, these covariance matrices achieve Rσ1 as given in (17)-(18) by the DPC scheme, where σ1 is the identity permutation on {l, . . . , l + m}.

Proof: Starting from k = 1 and substituting the expressions for the covariance matrices into the recursive formulas, Γ∗k as given in (33) are obtained. First, it is shown that Γ∗k, k = 1, . . . , K, are feasible and satisfy the sum power constraint P. Note that for k = 1, . . . , K,

$$\left(\sum_{j=k}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} \preceq \left(\sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t\right)^{-1}.$$

Therefore, the Γ∗k are positive semi-definite. Moreover, using the identity $I - (I + A)^{-1} = (I + A)^{-1} A$ for any A ⪰ 0, the expressions for Γ∗k can be simplified to,

$$\Gamma_k^* = \frac{\mu_k}{2\lambda^*}\left(\left(\sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} - I_t\right) + \frac{\mu_k}{2\lambda^*}\left(I_t - \left(\sum_{j=k}^{K} H_j^T S_j^* H_j + I_t\right)^{-1}\right)$$

$$= \frac{\mu_k}{2\lambda^*}\left(\sum_{j=k}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} \sum_{j=k}^{K} H_j^T S_j^* H_j - \frac{\mu_k}{2\lambda^*}\left(\sum_{j=k+1}^{K} H_j^T S_j^* H_j + I_t\right)^{-1} \sum_{j=k+1}^{K} H_j^T S_j^* H_j,$$

where the term $\frac{\mu_k}{2\lambda^*} I_t$ is added to and subtracted from the original expression for Γ∗k. Adding up Γ∗k for k = 1, . . . , K yields,

k for k = 1, . . . , K yields,

$$\sum_{k=1}^{K} \Gamma_k^* = \sum_{k=1}^{K} \frac{\mu_k - \mu_{k-1}}{2\lambda^*}\left(\sum_{i=k}^{K} H_i^T S_i^* H_i + I_t\right)^{-1} \sum_{j=k}^{K} H_j^T S_j^* H_j = \sum_{j=1}^{K} \sum_{k=1}^{j} \frac{\mu_k - \mu_{k-1}}{2\lambda^*}\left(\sum_{i=k}^{K} H_i^T S_i^* H_i + I_t\right)^{-1} H_j^T S_j^* H_j = \sum_{j=1}^{K} Q_j H_j^T S_j^* H_j,$$

where the second equality is obtained by interchanging the summations over j and k, and the third equality follows from the definition of Qk in (31). Hence,

$$\sum_{k=1}^{K} \operatorname{tr}(\Gamma_k^*) = \sum_{k=1}^{K} \operatorname{tr}\left(H_k Q_k H_k^T S_k^*\right) = \sum_{k=1}^{K} \operatorname{tr}\left(\left(I_r - \frac{1}{\lambda^*}\Phi_k^*\right) S_k^*\right) = \sum_{k=1}^{K} \operatorname{tr}(S_k^*) = P,$$

where the second equality follows from the optimality conditions given in (15) and the definition of Qk, while the third equality follows from the complementary slackness conditions in (16). Therefore, Γ∗k as given in (33) satisfy the power constraint P.
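The algebraic identity used in the simplification above, $I - (I + A)^{-1} = (I + A)^{-1} A$ for A ⪰ 0, is easily confirmed numerically (random PSD stand-in for A):

```python
import numpy as np

rng = np.random.default_rng(5)
t = 4
X = rng.standard_normal((t, t))
A = X @ X.T                            # random positive semi-definite matrix
I = np.eye(t)
inv = np.linalg.inv(I + A)
assert np.allclose(I - inv, inv @ A)   # I - (I + A)^{-1} = (I + A)^{-1} A
```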

Referring to Lemma 3.1, for the case 0 < µ1 < · · · < µK, R∗ is achievable in DBC(µ) by the DPC scheme with an encoding order that is the reverse of the decoding order achieving this point in the dual MAC. Hence, starting from user K, user k is encoded after users k + 1, . . . , K. On the other hand, according to the DPC rates given in (4), the following rates are achievable in DBC(µ) using Γ∗k and the mentioned encoding order:

$$R_k = \frac{1}{2}\log\frac{\left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right|}{\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|}, \qquad k = 1, \ldots, K.$$

From the recursive expressions in (32), it is easy to verify that Rk = R∗k for R∗ as specified in (11). Therefore,

$$R_k^* = \frac{1}{2}\log\frac{\left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right|}{\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|}, \qquad k = 1, \ldots, K, \qquad (34)$$

and is achievable by the DPC scheme using the covariance matrices Γ∗k in DBC(µ). Recall that for the case 0 < µ1 < · · · < µl = · · · = µl+m < · · · < µK, as explained in Lemma 2.3, the boundary point R∗ is in general not necessarily achievable by the successive encoding scheme alone, and further time-sharing may be required. However, following the same line of reasoning as given above, it can be shown that the vertex Rσ1 as given in (17)-(18) is achievable by the DPC scheme using the covariance matrices Γ∗k in DBC(µ).


The fact that the capacity region of DBC(µ) contains the capacity region of the original channel follows directly from the inequalities $H_k Q_k H_k^T \preceq I_r$, exactly in the same manner as for the two-user case. Finally, by showing that the point R∗ also lies on the boundary of the capacity region of DBC(µ), the converse proof is completed. This is proven by contradiction in the following.

As was mentioned earlier, DBC(µ) is a degraded broadcast channel and its capacity region is equal to the capacity region of a physically degraded broadcast channel given as below,

$$y_1 = x + z_1, \qquad y_k = y_{k-1} + z_k', \qquad k = 2, \ldots, K,$$

where $z_1 \in \mathbb{R}^t$ and $z_k' \in \mathbb{R}^t$ for k = 2, . . . , K are independent white Gaussian noise vectors with covariance matrices Q1 and Qk − Qk−1, respectively. For the capacity analysis, this physically degraded version of DBC(µ) is considered.

Assume R∗ is not on the boundary of the capacity region of DBC(µ). Then there must exist a rate-tuple R and an arbitrarily small δ > 0 such that R∗k + 2δ ≤ Rk for k = 1, . . . , K. Consider C(enR1, enR2, . . . , enRK, n), any sequence of block-length n codes for DBC(µ) with rates (R1, . . . , RK) such that the average probability of decoding error, $P_e^{(n)}$, goes to zero as n → ∞. By Fano's inequality, for sufficiently large n, the following inequalities hold:

$$R_k^* + \delta \le \frac{1}{n} I(W_k; y_k^n \mid W_{k+1}, \ldots, W_K), \qquad k = 1, \ldots, K. \qquad (35)$$

In Lemma 3.4 at the end of this section, it is shown that for sufficiently small εk > 0, the inequality,

$$\frac{1}{n} h(y_k^n \mid W_{k+1}, W_{k+2}, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right| + \varepsilon_k, \qquad (36)$$

holds for k = 1, . . . , K in the case 0 < µ1 < · · · < µK, and it holds for k ∈ {1, . . . , K} \ {l, . . . , l + m − 1} in the case 0 < µ1 < · · · < µl = · · · = µl+m < · · · < µK. As a result, for k = K, it follows from (36) that,

$$\frac{1}{n} h(y_K^n) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{K} \Gamma_j^* + Q_K\right| + \varepsilon_K.$$

However, $y_K^n = x^n + z_K^n$, where $z_K^n$ consists of i.i.d. Gaussian noise vectors with covariance matrix QK for each transmission, and there is an average power constraint nP on the transmitted codewords, i.e., $E(x^{nT} x^n) \le nP$. Therefore, i.i.d. Gaussian x(i) for i = 1, . . . , n that satisfy the water-filling conditions maximize $h(y_K^n)$. Note that from Lemma 3.3, $\sum_{k=1}^{K} \operatorname{tr}(\Gamma_k^*) = P$, and from the recursive formulas in (32) for k = K, it follows that,

$$\sum_{j=1}^{K} \Gamma_j^* + Q_K = \frac{\mu_K}{2\lambda^*} I_t.$$

Therefore, i.i.d. Gaussian x(i), for i = 1, . . . , n, with covariance matrix $E(x(i)x(i)^T) = \sum_{k=1}^{K} \Gamma_k^*$ satisfy the water-filling conditions. Consequently, $\frac{1}{n} h(y_K^n)$ is bounded from above as given below:

$$\frac{1}{n} h(y_K^n) \le \frac{1}{2}\log(2\pi e)^t \left|\sum_{k=1}^{K} \Gamma_k^* + Q_K\right|,$$

which contradicts the previous inequality. Therefore, the point R∗ must be on the boundary of the capacity region of DBC(µ).

Lemma 3.4 For the case 0 < µ1 < · · · < µK, the following inequalities hold for k = 1, . . . , K:

$$\frac{1}{n} h(y_k^n \mid W_{k+1}, W_{k+2}, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right| + \varepsilon_k,$$

where εk > 0 are arbitrarily small positive constants. The aforementioned inequalities hold for 1 ≤ k ≤ l − 1 and l + m ≤ k ≤ K for the case 0 < µ1 < · · · < µl = µl+1 = · · · = µl+m < · · · < µK.

Proof: These inequalities are proven by induction on k. First consider the case 0 < µ1 < · · · < µK. For k = 1, from Fano's inequality in (35), the rate expression for R∗1 in (34), and $h(y_1^n \mid W_1, \ldots, W_K) = h(z_1^n) = \frac{n}{2}\log(2\pi e)^t |Q_1|$, it follows that,

$$\frac{1}{n} h(y_1^n \mid W_2, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t |Q_1| + \frac{1}{2}\log\frac{\left|\Gamma_1^* + Q_1\right|}{|Q_1|} + \delta = \frac{1}{2}\log(2\pi e)^t \left|\Gamma_1^* + Q_1\right| + \delta,$$

which is the desired inequality with ε1 = δ > 0. Now assume the inequality holds for k − 1. Recall that $y_{k-1}^n$ and $z_k'^n$ are independent given Wk, . . . , WK, and $y_k^n = y_{k-1}^n + z_k'^n$; hence, the conditional entropy power inequality can be employed to obtain,

$$\exp\left(\frac{2}{nt} h(y_k^n \mid W_k, \ldots, W_K)\right) \ge \exp\left(\frac{2}{nt} h(y_{k-1}^n \mid W_k, \ldots, W_K)\right) + \exp\left(\frac{2}{t} h(z_k')\right)$$

$$\ge 2\pi e \left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_{k-1}\right|^{1/t} + 2\pi e \left|Q_k - Q_{k-1}\right|^{1/t} + \varepsilon'_{k-1},$$

where the second inequality follows from the induction assumption for k − 1 and sufficiently small ε′k−1 > 0. However, the recursive expression in (32) for Γ∗k−1 reveals that,

$$\sum_{j=1}^{k-1} \Gamma_j^* + Q_{k-1} = \frac{\mu_{k-1}}{2\lambda^*}\left(\sum_{j=k}^{K} H_j^T S_j^* H_j + I_t\right)^{-1}.$$

Therefore, $\sum_{j=1}^{k-1} \Gamma_j^* + Q_{k-1}$ and $Q_k - Q_{k-1}$ are scaled versions of each other, which yields,

$$\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_{k-1}\right|^{1/t} + \left|Q_k - Q_{k-1}\right|^{1/t} = \left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|^{1/t}.$$

This equality further simplifies the lower bound on $h(y_k^n \mid W_k, \ldots, W_K)$ as,

$$\frac{1}{n} h(y_k^n \mid W_k, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right| + \varepsilon''_{k-1},$$

for some sufficiently small ε′′k−1 > 0. By employing this lower bound in Fano's inequality (35) for user k and replacing the rate expression for R∗k, the desired inequality is obtained for k:

$$\frac{1}{n} h(y_k^n \mid W_{k+1}, \ldots, W_K) \ge R_k^* + \frac{1}{n} h(y_k^n \mid W_k, \ldots, W_K) + \delta$$

$$\ge \frac{1}{2}\log\frac{\left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right|}{\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|} + \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right| + \varepsilon''_{k-1} + \delta$$

$$= \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right| + \varepsilon_k.$$

Next consider the case 0 < µ1 < · · · < µl = · · · = µl+m < · · · < µK. In this case, since Ql = · · · = Ql+m in DBC(µ), the received vectors $y_l^n, \ldots, y_{l+m}^n$ are statistically the same. Therefore, for k = l, . . . , l + m, the Fano inequalities in (35) can be written as,

$$R_k^* + \delta \le \frac{1}{n} I(W_k; y_k^n \mid W_{k+1}, \ldots, W_K) = \frac{1}{n} I(W_k; y_{l+m}^n \mid W_{k+1}, \ldots, W_K).$$

Adding up these inequalities for k = l, . . . , l + m and using the chain rule for mutual information provides the following inequality, which will be used subsequently:

$$\sum_{k=l}^{l+m} R_k^* + (m+1)\delta \le \frac{1}{n} I(W_l, \ldots, W_{l+m}; y_{l+m}^n \mid W_{l+m+1}, \ldots, W_K). \qquad (37)$$

According to Lemma 2.3, for this choice of weights, the point R∗, which is on the boundary of the DPC regions of both DBC(µ) and the original channel, lies on the convex hull of the vertices Rσi for i = 1, . . . , (m+1)! as given in (17)-(18). Moreover, all these vertices, and hence R∗, satisfy the equalities in (19)-(20). Therefore, $R_k^* = R_k^{\sigma_i}$ for k ∈ {1, . . . , K} \ {l, . . . , l + m} and $\sum_{k=l}^{l+m} R_k^* = \sum_{k=l}^{l+m} R_k^{\sigma_i}$. In particular, these equalities hold for Rσ1, where σ1 is the identity permutation on {l, . . . , l + m}. On the other hand, in Lemma 3.3 it is shown that the vertex Rσ1 is achievable in DBC(µ) by using the DPC scheme with covariance matrices Γ∗k for k = 1, . . . , K, and $R_k^{\sigma_1}$ satisfies the rate expression in (34). Hence, any boundary point R∗ satisfies,

$$R_k^* = \frac{1}{2}\log\frac{\left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right|}{\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|}, \qquad k \in \{1, \ldots, K\} \setminus \{l, \ldots, l+m\}, \qquad (38)$$

$$\sum_{k=l}^{l+m} R_k^* = \sum_{k=l}^{l+m} \frac{1}{2}\log\frac{\left|\sum_{j=1}^{k} \Gamma_j^* + Q_k\right|}{\left|\sum_{j=1}^{k-1} \Gamma_j^* + Q_k\right|} = \frac{1}{2}\log\frac{\left|\sum_{j=1}^{l+m} \Gamma_j^* + Q_{l+m}\right|}{\left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_l\right|}, \qquad (39)$$

where the second equality in (39) is obtained from the fact that for µl = · · · = µl+m, Ql = · · · = Ql+m. Since the rate expressions for R∗k, k = 1, . . . , l − 1, in (38) are the same as in the case 0 < µ1 < · · · < µK, identical induction arguments prove the desired inequalities for k = 1, . . . , l − 1 in the same way as given before. Now assume the inequality holds for l − 1, i.e.,

$$\frac{1}{n} h(y_{l-1}^n \mid W_l, W_{l+1}, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_{l-1}\right| + \varepsilon_{l-1}.$$

In the degraded version of DBC(µ), $y_{l+m}^n = y_{l-1}^n + z_l'^n$ since Ql = · · · = Ql+m. The conditional entropy power inequality can be applied to $y_{l+m}^n$ to obtain,

$$\exp\left(\frac{2}{nt} h(y_{l+m}^n \mid W_l, \ldots, W_K)\right) \ge \exp\left(\frac{2}{nt} h(y_{l-1}^n \mid W_l, \ldots, W_K)\right) + \exp\left(\frac{2}{t} h(z_l')\right)$$

$$\ge 2\pi e \left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_{l-1}\right|^{1/t} + 2\pi e \left|Q_l - Q_{l-1}\right|^{1/t} + \varepsilon'_{l-1},$$

where the second inequality follows from the induction assumption for k = l − 1 and arbitrarily small ε′l−1 > 0. Again, the two expressions on the right-hand side are scaled versions of each other and further simplify the lower bound on $h(y_{l+m}^n \mid W_l, \ldots, W_K)$ as,

$$\frac{1}{n} h(y_{l+m}^n \mid W_l, \ldots, W_K) \ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_l\right| + \varepsilon''_{l-1},$$

where ε′′l−1 > 0 is sufficiently small. By employing this lower bound in the inequality (37) and replacing the expression for $\sum_{k=l}^{l+m} R_k^*$ from (39), the desired inequality for k = l + m is obtained as below:

$$\frac{1}{n} h(y_{l+m}^n \mid W_{l+m+1}, \ldots, W_K) \ge \frac{1}{n} h(y_{l+m}^n \mid W_l, \ldots, W_K) + \sum_{k=l}^{l+m} R_k^* + (m+1)\delta$$

$$\ge \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_l\right| + \varepsilon''_{l-1} + \frac{1}{2}\log\frac{\left|\sum_{j=1}^{l+m} \Gamma_j^* + Q_{l+m}\right|}{\left|\sum_{j=1}^{l-1} \Gamma_j^* + Q_l\right|} + (m+1)\delta$$

$$= \frac{1}{2}\log(2\pi e)^t \left|\sum_{j=1}^{l+m} \Gamma_j^* + Q_{l+m}\right| + \varepsilon_{l+m}.$$

For k = l + m + 1, . . . , K, the inequality holds by the same induction steps given previously.


3.2 Extension to General Convex Constraints on the Transmit Covariance Matrix

In this section, the capacity region of the Gaussian MIMO BC under any convex constraint on the transmit covariance matrix is investigated. Consider a norm² on the space of symmetric t × t matrices, $\mathbb{S}^t$, and let $S \subseteq \mathbb{S}^t$ be a compact (closed and bounded) and convex set with respect to this norm. For the Gaussian MIMO BC defined in (1), instead of the total average power constraint, assume the following constraint is imposed on each codeword of block-length n:

$$\frac{1}{n}\sum_{i=1}^{n} x(i)x(i)^T \in S. \qquad (40)$$

The capacity of the Gaussian MIMO BC under these types of transmit covariance matrix constraints is practically interesting. For instance, consider the downlink transmission in a wireless system where each antenna at the base station has its individual average power constraint. These individual power constraints can be imposed by the RF amplifiers connected to each antenna. The problem of determining the data rates supportable for each user in this scenario is particularly important in wireless communications. These per antenna power constraints fall naturally under the category of the transmit covariance matrix constraints given in (40).
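For example, per antenna average power constraints are exactly of the affine form that will be introduced in (42): taking $A_i = e_i e_i^T$ gives $\operatorname{tr}(A_i S) = S_{ii}$, the power radiated from antenna i. A minimal membership check (the budgets b below are hypothetical):

```python
import numpy as np

t = 3
b = np.array([1.0, 2.0, 0.5])                        # hypothetical per antenna budgets
E = np.eye(t)
A = [np.outer(E[i], E[i]) for i in range(t)]         # A_i = e_i e_i^T

def in_M(S):
    """Membership in M = {S >= 0 : tr(A_i S) <= b_i, i = 1..m}."""
    psd = np.linalg.eigvalsh((S + S.T) / 2).min() >= -1e-9
    return bool(psd and all(np.trace(Ai @ S) <= bi + 1e-9 for Ai, bi in zip(A, b)))

print(in_M(np.diag([0.9, 1.5, 0.4])))   # True: every diagonal entry within budget
print(in_M(np.diag([1.2, 1.5, 0.4])))   # False: antenna 1 exceeds its 1.0 budget
```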

The main objective of this section is to show that the DPC scheme achieves the capacity of the Gaussian MIMO BC under general convex constraints on the transmit covariance matrix. This result was first obtained in [13] directly, without any hint of the duality concept. At first glance, duality may appear to be inappropriate for tackling this problem, since the early versions of the MAC-BC duality introduced in [7], [8] and [10] only observe this duality under a total power constraint. However, by using more general duality concepts, in particular the one introduced in [14], optimality of the DPC scheme can be established under these kinds of covariance constraints.

The following theorem formally states the main result of this section.

Theorem 3.2 The capacity region of the Gaussian MIMO BC in (1) under the transmit covariance matrix constraint in (40), denoted by CBC(S), is equal to

$$R_{DPC}(S) = \mathrm{Conv}\left(\bigcup_{\pi,\,\{S_k\}:\ S_k \succeq 0\ \forall k,\ \sum_k S_k \in S} F(\pi, \{H_k\}, \{S_k\})\right), \qquad (41)$$

where π is a permutation on {1, . . . , K}, $S_k = E(x_k x_k^T)$ is the transmit covariance matrix for user k, and the set F(π, {Hk}, {Sk}) is defined as in (3).

This theorem is initially established for a particular class of sets S that are specified by an arbitrary number of affine constraints on the transmit covariance matrix $S = \sum_k S_k$.

²Spectral norm is an example. For any matrix $S \in \mathbb{S}^t$, $\|S\| = \max_i |\lambda_i|$, where λi for i = 1, . . . , t are the eigenvalues of S.

Then the result is generalized to compact and convex sets in the positive semi-definite cone, $\mathbb{S}^t_+$. Let M denote a general set in $\mathbb{S}^t_+$ that is specified by a number of affine constraints as given below:

$$M = \left\{S : S \succeq 0,\ \operatorname{tr}(A_i S) \le b_i,\ \text{for } i = 1, \ldots, m\right\}, \qquad (42)$$

where Ai and bi, i = 1, . . . , m, are an arbitrary number of t × t symmetric matrices and real numbers, respectively. Clearly, the set M depends on the matrices Ai and the constants bi, i = 1, . . . , m; however, this dependency is not included in the notation for simplicity. It is only reasonable to consider sets M that are non-empty and bounded, otherwise the corresponding capacity region would be empty or unbounded. Hence, assume Ai and bi, i = 1, . . . , m, are such that the set M is non-empty and bounded. Moreover, without loss of generality, it can be assumed that $\sum_{i=1}^m A_i \succ 0$. Notice that by adding another

∑mi=1 Ai � 0. Notice that by adding another

linear constraint tr(Am+1S) ≤ bm+1 for Am+1 = −∑m

i=1 Ai and bm+1 = maxS∈M tr(Am+1S),the matrices {Ai}

m+1i=1 satisfy

∑m+1i=1 Am � 0 and the set M is not altered. Also, bm+1

is bounded because for matrix B ∈ St, over the bounded set M, maxS∈M tr(BS) is also

bounded. Since m is arbitrary in definition of this class of covariance constraints specifiedby M, there is no loss of generality in assuming

∑mi=1 Ai � 0. The assumption that M

is non-empty together with∑m

i=1 Ai � 0 imply that∑m

i=1 bi ≥ 0, since for any positivesemi-definite matrix S ∈ M,

$$0 \le \operatorname{tr}\left(\left(\sum_{i=1}^{m} A_i\right) S\right) \le \sum_{i=1}^{m} b_i.$$

To prove Theorem 3.2 for the set M, the notion of MAC-BC duality introduced in [14] is employed. By using the Lagrange dual problem, the authors of [14] have established another notion of MAC-BC duality, summarized in the following lemma.

Lemma 3.5 The boundary point R∗ that maximizes $\sum_{k=1}^{K} \mu_k R_k$ over the DPC region RDPC(M), for some given weights µ1, . . . , µK ≥ 0, can be obtained by maximizing the same weighted sum of the rates over the capacity region of a Gaussian multiple access channel described as below:

$$y = \sum_{k=1}^{K} H_k^T x_k + z.$$

In this dual MAC, the Gaussian noise vector $z \in \mathbb{R}^t$ has a covariance matrix Sz ⪰ 0 that is determined by the weights µ1, . . . , µK and the parameters {Ai}, {bi} and has the following general form:

is determined by the weights µ1, . . . , µK and the parameters {Ai}, {bi} and has the followinggeneral form:

Sz =m∑

i=1

αiAi, (43)

for some αi ∈ R+, i = 1, . . . ,m, that satisfy,

m∑

i=1

αibi ≤m∑

i=1

bi. (44)

Furthermore, the dual MAC has a sum power constraint equal to∑m

i=1 bi.

27

The notion of MAC-BC duality considered in Section 2.1 is different from the one introduced in Lemma 3.5 in the sense that in the latter, a specific MAC is defined for each boundary point of the DPC region, whereas in the former, a single MAC is defined for the whole DPC region. In fact, in the duality notion of [14], the boundary of the capacity region of each dual channel is tangent to the boundary of the DPC region at the point for which the dual channel is defined. However, the capacity region of the dual MAC corresponding to that point may not be equal to the DPC region. It should be mentioned that the duality in [14] was originally established for per antenna power constraints. Although the derivation there does not explicitly include the types of constraints given by the set M, with some minor modifications the duality can be easily extended to this larger class of linear constraints on the transmit covariance matrix. The only subtle point that requires some attention is the choice of {Ai} and {bi} that define the set M. For an improper choice of {Ai} and {bi}, the set M may be empty or unbounded. Furthermore, there may not exist a noise covariance matrix that satisfies (43) and (44), and finally the power constraint in the dual channel, $\sum_{i=1}^m b_i$, may be negative. Nevertheless, for the non-empty and bounded sets M considered in this section, $\sum_{i=1}^m A_i \succ 0$; therefore (43) and (44) have at least the solution αi = 1, i = 1, . . . , m, and $\sum_{i=1}^m b_i \ge 0$.

The same approach used in Section 3 is employed to show that CBC(M) is equal to RDPC(M). First, each boundary point of RDPC(M) is characterized by maximizing $\sum_{k=1}^{K} \mu_k R_k$ for some weights µ1, . . . , µK ≥ 0. Then for each boundary point, a degraded broadcast channel with the usual total average power constraint is constructed. It is shown that the same point lies on the boundary of both the DPC region and the capacity region of this degraded channel. Finally, it is shown that the capacity region of this degraded channel contains CBC(M); hence, the boundary point of RDPC(M) lies on the boundary of CBC(M). By the same arguments, all the boundary points of the DPC region lie on the boundary of the capacity region, and the two regions are equal.

To focus on the key steps of the proof and avoid unnecessary details, the proof is given for the case where the stacked channel matrix [H_1^T H_2^T ⋯ H_K^T]^T is full column-rank. As shown in the following, the dual MAC corresponding to each boundary point of the DPC region of such a broadcast channel has a positive definite noise covariance matrix, S_z ≻ 0. This assumption is made only to simplify the proof; the proof can easily be extended to general rank-deficient channel matrices by some additional steps required for handling the singularities of the channel and the noise covariance matrix.

As the first step, the boundary point R* that maximizes ∑_{k=1}^K µ_k R_k over R_DPC(M) for some weights µ_1, …, µ_K ≥ 0 is characterized. As mentioned in Section 3, it is sufficient to consider only positive weights µ_1, …, µ_K > 0. According to Lemma 3.5, this maximization can be performed over the capacity region of the dual MAC. Like the dual MAC defined in (6), the dual multiple access channel of Lemma 3.5 has a sum power constraint; in contrast, however, its noise covariance matrix does not have the identity structure. The noise at the receiver of this channel can nevertheless be whitened to transform it into the form of the MAC defined in (6). It is easy to verify that the noise covariance matrix S_z of the dual MAC in Lemma 3.5 is positive definite for weights µ_1, …, µ_K > 0, given that the matrix [H_1^T ⋯ H_K^T]^T is full column-rank. This can be verified by contradiction. Assume S_z is not positive definite; then there exists w ∈ R^t such that w^T S_z w = 0. However, w must be in the null-space of [H_1^T ⋯ H_K^T]^T, that is, w^T [H_1^T ⋯ H_K^T] must equal the all-zero vector. Otherwise, in the dual MAC, the non-zero element of w^T [H_1^T ⋯ H_K^T] would have infinite signal-to-noise ratio and the corresponding user could achieve arbitrarily large rates; since µ_k > 0 for all k, this would make ∑_k µ_k R_k unbounded above. On the other hand, [H_1^T ⋯ H_K^T]^T is full column-rank and has a trivial null-space. Therefore, w cannot be in the null-space of [H_1^T ⋯ H_K^T]^T, and S_z must be positive definite.

Given S_z ≻ 0, the noise at the receiver of the dual MAC in Lemma 3.5 can be whitened by multiplying the output vector by S_z^{-1/2}, where S_z^{-1/2} is the square root matrix of S_z^{-1}. This whitening transforms the channel into the form of the dual MAC defined in (6), with channel matrices H̄_k^T = S_z^{-1/2} H_k^T for k = 1, …, K and sum power constraint ∑_{i=1}^m b_i, and does not alter the capacity region. Therefore, Lemma 2.3 can be employed for this equivalent channel to characterize the boundary point R* of R_DPC(M) that maximizes ∑_{k=1}^K µ_k R_k.
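As a sanity check on the whitening step, the following sketch builds a random positive definite matrix standing in for S_z (none of these matrices come from the paper), forms the symmetric square root of its inverse, and verifies that the transformed noise covariance is the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
t, r, K = 4, 2, 3

# Hypothetical stand-in for the dual-MAC noise covariance of Lemma 3.5:
# a random positive definite t x t matrix.
A = rng.standard_normal((t, t))
Sz = A @ A.T + np.eye(t)

# Symmetric square root of Sz^{-1} via the eigendecomposition of Sz.
w, V = np.linalg.eigh(Sz)
Sz_inv_half = V @ np.diag(w ** -0.5) @ V.T

# Whitening: multiplying the output by Sz^{-1/2} turns the noise
# covariance into the identity, leaving the capacity region unchanged.
whitened = Sz_inv_half @ Sz @ Sz_inv_half
assert np.allclose(whitened, np.eye(t))

# The equivalent channel matrices are Sz^{-1/2} H_k^T, one per user.
H = [rng.standard_normal((r, t)) for _ in range(K)]
H_whitened = [Sz_inv_half @ Hk.T for Hk in H]
assert all(M.shape == (t, r) for M in H_whitened)
```

The eigendecomposition route also makes the symmetry of the square root explicit, matching the symmetric square roots used throughout the paper.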

The next couple of steps are exactly the same as in Sections 3 and 3.1. For the boundary point R* of R_DPC(M), as characterized in Lemma 2.3 for the channel matrices H̄_k^T = S_z^{-1/2} H_k^T, k = 1, …, K, define the degraded broadcast channel DBC(µ) exactly as before. Assume that for user k, this channel has the identity channel matrix and the noise covariance matrix Q_k given by the same expression as in (31), except with all the terms related to the channel matrices H_k^T replaced by the corresponding terms for H̄_k^T. Consistent with the sum power constraint of the dual MAC, assume this channel has a total average power constraint ∑_{i=1}^m b_i. Lemma 3.1 holds immediately, and R* lies on the boundary of the DPC region of DBC(µ). Also, following the same line of reasoning as before, it can be shown that R* lies on the boundary of the capacity region of DBC(µ). Therefore, the only remaining step is to show that the capacity region of the degraded broadcast channel DBC(µ) contains the capacity region of the original channel, C_BC(M). Recall that DBC(µ) has the ordinary total average power constraint, while the original channel is under the covariance constraint of the form given in (40). The following lemma proves this claim and completes the proof of Theorem 3.2 for the class of covariance constraints specified by the set M.

Lemma 3.6 The capacity region of DBC(µ) contains C_BC(M).

Proof: To prove that C_BC(M) is contained in the capacity region of DBC(µ), it is sufficient to show that any code achieving the rate-tuple R in the original broadcast channel with arbitrarily small probability of decoding error can be used in DBC(µ) to achieve the same rates with no larger probability of decoding error. Consider a code for the original broadcast channel with rates (R_1, …, R_K), arbitrarily small probability of error, and codewords denoted by x^n. To use this code in DBC(µ), assume each x(i), i = 1, …, n, is multiplied by S_z^{1/2} prior to transmission in this channel, where S_z^{1/2} is the square root matrix of S_z. After this multiplication, each transmitted codeword has total average power given by

(1/n) ∑_{i=1}^n x^T(i) S_z x(i) = tr(S_z S),

where S = (1/n) ∑_{i=1}^n x(i) x(i)^T is the transmit covariance matrix of that codeword. However, since x^n is a codeword for the original broadcast channel, it must satisfy the covariance constraint S ∈ M, i.e.,

tr(A_i S) ≤ b_i for i = 1, …, m.

Hence,

tr(S_z S) = ∑_{i=1}^m α_i tr(A_i S) ≤ ∑_{i=1}^m α_i b_i ≤ ∑_{i=1}^m b_i,

where the equality follows from the form of S_z given in (43), the first inequality from the covariance constraint, and the second inequality from (44). In effect, the transmitted codeword in DBC(µ) satisfies the total average power constraint for this channel.
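The chain of trace relations above is easy to verify numerically. In this sketch, the PSD matrices A_i, the weights α_i ∈ [0, 1] (standing in for a solution of (43) and (44)), the covariance S, and the bounds b_i are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
t, m = 3, 4

# Made-up PSD constraint matrices A_i and weights alpha_i in [0, 1].
A = [M @ M.T for M in (rng.standard_normal((t, t)) for _ in range(m))]
alpha = rng.uniform(0.0, 1.0, size=m)
Sz = sum(a * Ai for a, Ai in zip(alpha, A))      # Sz = sum_i alpha_i A_i, as in (43)

# A PSD transmit covariance S, and bounds b_i chosen so tr(A_i S) <= b_i.
B = rng.standard_normal((t, t))
S = B @ B.T
b = [np.trace(Ai @ S) + rng.uniform(0.0, 1.0) for Ai in A]

# tr(Sz S) = sum_i alpha_i tr(A_i S) <= sum_i alpha_i b_i <= sum_i b_i
lhs = np.trace(Sz @ S)
assert np.isclose(lhs, sum(a * np.trace(Ai @ S) for a, Ai in zip(alpha, A)))
assert lhs <= sum(a * bi for a, bi in zip(alpha, b)) + 1e-9
assert lhs <= sum(b) + 1e-9
```

The last inequality uses α_i ≤ 1 and b_i ≥ 0, which is exactly the role played by (44) in the proof.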

On the decoder side, receiver k of DBC(µ) multiplies its received signal by H̄_k = H_k S_z^{-1/2}, adds an i.i.d. Gaussian noise vector with covariance matrix I_r − H̄_k Q_k H̄_k^T, and uses the same decoding rule as in the original broadcast channel to decode the transmitted codeword. After these procedures, receiver k obtains the transmitted codeword x^n passed through the channel matrix H̄_k S_z^{1/2} = H_k and corrupted by a Gaussian noise vector with covariance matrix H̄_k Q_k H̄_k^T + I_r − H̄_k Q_k H̄_k^T = I_r. Note that the same equalities given in (15) ensure that H̄_k Q_k H̄_k^T ⪯ I_r for the channel matrices H̄_k. Therefore, the resulting received signal is statistically the same as the one observed through the original channel, and the same decoding functions as in the original broadcast channel can be used to decode each user's message with no larger probability of decoding error.
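The receiver-side construction can be sketched as follows. The matrices below are randomly generated stand-ins (Q is rescaled to force H̄QH̄^T ⪯ I_r, which in the paper is guaranteed by (15) rather than by scaling):

```python
import numpy as np

rng = np.random.default_rng(2)
t, r = 4, 2

# Random stand-ins for Hbar_k = H_k Sz^{-1/2} and the noise covariance Q_k.
Hbar = rng.standard_normal((r, t))
C = rng.standard_normal((t, t))
Q = C @ C.T
Q /= np.linalg.eigvalsh(Hbar @ Q @ Hbar.T).max() + 1.0  # force Hbar Q Hbar^T < I_r
M = Hbar @ Q @ Hbar.T

# The receiver adds artificial noise with covariance I_r - Hbar Q Hbar^T,
# which is PSD here, so the total effective noise covariance is exactly I_r.
added_cov = np.eye(r) - M
assert np.linalg.eigvalsh(added_cov).min() >= -1e-9
assert np.allclose(M + added_cov, np.eye(r))
```

Because the effective noise covariance comes out exactly I_r, the processed observation has the same statistics as the output of the original channel.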

In the following, the proof of Theorem 3.2 is extended to any compact and convex set in the cone of t × t positive semi-definite matrices, S_+^t. The extension makes use of the fact that each closed and convex set can be expressed as the intersection of all closed half-spaces containing it [18]. Recall that any closed half-space in S^t is expressed by tr(AS) ≤ b for some matrix A ∈ S^t and some real number b. Therefore, any compact and convex set S in S_+^t can be expressed as the intersection of a possibly infinite number of sets M_n such that for each n, M_n has the form given in (42) and contains the set S:

S = ⋂_n M_n.

Note that since S is bounded, each of the sets M_n can be chosen to be bounded, although in general they need not be.

Now consider a sequence of sets {M_n}_{n=1}^∞ such that M_1 has the form given in (42) and contains the set S, and for each n, the set M_n is obtained from M_{n−1} by adding another half-space that contains S. Therefore, S = ⋂_n M_n and

M_m ⊆ M_n for all m ≥ n and for all n.

The rest of this section is dedicated to proving that R_DPC(⋂_n M_n) = C_BC(⋂_n M_n). Clearly, R_DPC(⋂_n M_n) ⊆ C_BC(⋂_n M_n); thus, it only remains to show that C_BC(⋂_n M_n) ⊆ R_DPC(⋂_n M_n). For each m, ⋂_n M_n ⊆ M_m; therefore, it follows immediately that C_BC(⋂_n M_n) ⊆ C_BC(M_m) and consequently C_BC(⋂_n M_n) ⊆ ⋂_n C_BC(M_n). However, by Theorem 3.2, which was proven for the sets M_n earlier in this section, R_DPC(M_n) = C_BC(M_n) for all n, which yields

C_BC(⋂_n M_n) ⊆ ⋂_n C_BC(M_n) = ⋂_n R_DPC(M_n).

In the following, it is shown that ⋂_n R_DPC(M_n) = R_DPC(⋂_n M_n), which establishes the relation C_BC(⋂_n M_n) ⊆ R_DPC(⋂_n M_n) and completes the proof.

The tricky part in showing the equality ⋂_n R_DPC(M_n) = R_DPC(⋂_n M_n) is to verify that ⋂_n R_DPC(M_n) ⊆ R_DPC(⋂_n M_n); verifying the other direction, R_DPC(⋂_n M_n) ⊆ ⋂_n R_DPC(M_n), is straightforward. First, for any compact and convex set M ⊆ S^t, let the set R̃_DPC(M) denote the set of all rate-tuples in R_DPC(M) that are achievable by the DPC scheme alone, without any time-sharing, i.e.,

R̃_DPC(M) = ⋃_{π, {S_k}: S_k ⪰ 0 ∀k, ∑_k S_k ∈ M} F(π, {H_k}, {S_k}). (45)

Clearly, R_DPC(M) = conv(R̃_DPC(M)). Now, consider a rate-tuple R^o ∈ ⋂_n R̃_DPC(M_n). R^o must belong to all the sets R̃_DPC(M_n), and since each of these contains only the rate-tuples achievable by the DPC scheme without time-sharing, for each n there must exist a permutation π_n on {1, …, K} and a set of covariance matrices (S_1^n, S_2^n, …, S_K^n) ∈ (S_+^t)^K such that ∑_{k=1}^K S_k^n ∈ M_n and these covariance matrices achieve the point R^o by the successive encoding scheme with the order specified by π_n. Given that there are only a finite number of possible permutations on {1, …, K}, there must exist a permutation π_o and an infinite sub-sequence of permutations {π_{n_i}}_{i=1}^∞ that are all equal to π_o. On the other hand, for each n, M_n ⊆ M_1, and each set of covariance matrices (S_1^n, S_2^n, …, S_K^n) belongs to the set K defined below:

K = { (S_1, S_2, …, S_K) ∈ (S_+^t)^K : ∑_{k=1}^K S_k ∈ M_1 }.

On the vector space (S^t)^K, define the norm ‖(S_1, S_2, …, S_K)‖ = max_k ‖S_k‖, where the norm on the right-hand side is the same norm on S^t considered earlier in this section. Since M_1 is a compact set, K is also a compact subset of (S_+^t)^K under this norm. Therefore, the infinite sub-sequence {(S_1^{n_i}, S_2^{n_i}, …, S_K^{n_i})}_{i=1}^∞ must have an infinite sub-sequence that converges to a limiting point (S_1^o, S_2^o, …, S_K^o) ∈ K, where n_i, i = 1, 2, …, are the same indexes for which π_o = π_{n_1} = π_{n_2} = ⋯. Since M_1 ⊇ M_2 ⊇ ⋯ ⊇ M_n ⊇ ⋯ and, for each n, ∑_{k=1}^K S_k^n ∈ M_n, it is not hard to show that the limiting point satisfies ∑_{k=1}^K S_k^o ∈ ⋂_n M_n. Furthermore, it is shown in [18] that the function log|I_r + X| is continuous for X ∈ S_+^r under any norm on the space of symmetric matrices. As a result, the DPC achievable rates obtained by the covariance matrices (S_1^{n_i}, S_2^{n_i}, …, S_K^{n_i}) and the encoding order π_o,

R^o_{π_o(k)} = (1/2) log ( |H_{π_o(k)} (∑_{j≥k} S^{n_i}_{π_o(j)}) H_{π_o(k)}^T + I_r| / |H_{π_o(k)} (∑_{j>k} S^{n_i}_{π_o(j)}) H_{π_o(k)}^T + I_r| ), k = 1, …, K, (46)

converge to the corresponding rate terms for (S_1^o, S_2^o, …, S_K^o) and π_o as the covariance matrices (S_1^{n_i}, S_2^{n_i}, …, S_K^{n_i}) converge to (S_1^o, S_2^o, …, S_K^o). Recall that the rates on the left-hand side of the expressions in (46) are equal to R_k^o, since (S_1^{n_i}, S_2^{n_i}, …, S_K^{n_i}) and π_o achieve R^o. Consequently, R^o is achievable by (S_1^o, S_2^o, …, S_K^o) with the order π_o, and since ∑_{k=1}^K S_k^o ∈ ⋂_n M_n, R^o lies in the set R̃_DPC(⋂_n M_n).
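The continuity argument behind (46) can be illustrated numerically. The sketch below uses a made-up channel H, a made-up limiting covariance pair, and perturbations (1/n)I converging to zero; the computed rates approach the limiting rate:

```python
import numpy as np

def dpc_rate(H, S_sig, S_int, r):
    # 0.5 * log of a determinant ratio, mirroring the form of (46)
    num = np.linalg.slogdet(H @ (S_sig + S_int) @ H.T + np.eye(r))[1]
    den = np.linalg.slogdet(H @ S_int @ H.T + np.eye(r))[1]
    return 0.5 * (num - den)

rng = np.random.default_rng(3)
t, r = 3, 2
H = rng.standard_normal((r, t))
B1, B2 = rng.standard_normal((t, t)), rng.standard_normal((t, t))
S_sig, S_int = B1 @ B1.T, B2 @ B2.T              # made-up limiting covariances

base = dpc_rate(H, S_sig, S_int, r)
rates = [dpc_rate(H, S_sig + np.eye(t) / n, S_int + np.eye(t) / n, r)
         for n in (1, 10, 1000)]
# By continuity of log|I + X| on the PSD cone, the perturbed rates
# converge to the limiting rate as the perturbation vanishes.
assert abs(rates[-1] - base) < 0.05
```

This is only a one-dimensional family of perturbations, but the same continuity holds along any sequence of PSD matrices converging in norm, which is the property the proof uses.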

So far, it has been shown that ⋂_n R̃_DPC(M_n) ⊆ R̃_DPC(⋂_n M_n), where R̃_DPC denotes the DPC region achievable without time-sharing. Therefore, the convex-hull of the set ⋂_n R̃_DPC(M_n) is also a subset of the convex-hull of the set R̃_DPC(⋂_n M_n), which is equal to R_DPC(⋂_n M_n). The following lemma shows that ⋂_n R_DPC(M_n) = conv(⋂_n R̃_DPC(M_n)), which in effect proves that ⋂_n R_DPC(M_n) ⊆ R_DPC(⋂_n M_n).

Lemma 3.7 Let the decreasing sets A_1 ⊇ A_2 ⊇ ⋯ ⊇ A_n ⊇ ⋯ be compact and connected subsets of d-dimensional Euclidean space, and let Ā_n be the convex-hull of the set A_n. Then,

⋂_n Ā_n = conv( ⋂_n A_n ).

This lemma is proven in Appendix C.


4 Summary

In this paper, the capacity region of the Gaussian MIMO BC was considered. The DPC achievable rate region for this class of broadcast channels was reviewed. By introducing the dual MAC of a Gaussian MIMO BC and using the MAC-BC duality result, the DPC region was represented alternatively as the capacity region of the dual MAC under a sum power constraint. It was shown that the dual representation of the DPC region has several key advantages over the original representation that can be exploited to characterize this region more efficiently. Using convex optimization techniques, each point on the boundary surface of the DPC region was characterized as the solution to a convex optimization problem.

After characterizing the DPC region, it was proven that this region is the capacity region of the Gaussian MIMO BC under the ordinary total average power constraint. In the converse proof, several ideas from previous works were combined with duality theory to prove that each point on the boundary of the DPC region lies on the boundary of the capacity region. Finally, by using a more comprehensive notion of MAC-BC duality, the optimality of the DPC scheme was proven under any general convex constraint on the transmit covariance matrix.

Acknowledgment

The authors would like to thank Professor El Gamal for his careful comments on an earlier draft of this paper.

References

[1] P. P. Bergmans, "A simple converse for broadcast channels with additive white Gaussian noise (Corresp.)," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 279-280, Mar. 1974.

[2] M. Costa, "Writing on dirty paper," IEEE Trans. Inform. Theory, vol. 29, pp. 439-441, May 1983.

[3] A. Cohen and A. Lapidoth, "The Gaussian watermarking game: Part I," IEEE Trans. Inform. Theory, vol. 47, no. 1, pp. 211-219, Jan. 2001.

[4] W. Yu, A. Sutivong, D. Julian, T. M. Cover and M. Chiang, "Writing on colored paper," in Proc. IEEE Int. Symp. Inform. Theory (ISIT), Jun. 2001.

[5] G. Caire and S. Shamai, "On achievable rates in a multi-antenna broadcast downlink," in Proc. 38th Annual Allerton Conf. on Commun., Control and Computing, Oct. 2000.

[6] G. Caire and S. Shamai, "On the achievable throughput of a multiantenna Gaussian broadcast channel," IEEE Trans. Inform. Theory, vol. 49, no. 7, pp. 1691-1706, Jul. 2003.

[7] N. Jindal, S. Vishwanath and A. Goldsmith, "On the duality of Gaussian multiple-access and broadcast channels," IEEE Trans. Inform. Theory, vol. 50, no. 5, pp. 768-783, May 2004.

[8] S. Vishwanath, N. Jindal and A. Goldsmith, "Duality, achievable rates, and sum-rate capacity of Gaussian MIMO broadcast channels," IEEE Trans. Inform. Theory, vol. 49, no. 10, pp. 2658-2668, Oct. 2003.

[9] W. Yu and J. M. Cioffi, "Sum capacity of Gaussian vector broadcast channels," IEEE Trans. Inform. Theory, vol. 50, no. 9, pp. 1875-1892, Sept. 2004.

[10] P. Viswanath and D. Tse, "Sum capacity of the vector Gaussian broadcast channel and uplink-downlink duality," IEEE Trans. Inform. Theory, vol. 49, no. 8, pp. 1912-1921, Aug. 2003.

[11] P. Viswanath and D. Tse, "On the capacity of the multiple antenna broadcast channel," in Proc. DIMACS Workshop on Signal Processing for Wireless Transmission, Oct. 2002.

[12] S. Vishwanath, G. Kramer, S. Shamai, S. Jafar and A. Goldsmith, "Outer bounds for multi-antenna broadcast channels," in Proc. DIMACS Workshop on Signal Processing for Wireless Transmission, Oct. 2002.

[13] H. Weingarten, Y. Steinberg and S. Shamai, "The capacity region of the Gaussian MIMO broadcast channel," in Proc. Conf. on Inform. Sciences and Systems (CISS), 2004.

[14] W. Yu and T. Lan, "Transmitter optimization for the multi-antenna downlink with per-antenna power constraints," submitted to IEEE Trans. Signal Processing, Dec. 2005.

[15] D. Tse and S. Hanly, "Multi-access fading channels-Part I: Polymatroid structure, optimal resource allocation and throughput capacities," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 2796-2815, Nov. 1998.

[16] W. Yu, Competition and Cooperation in Multi-User Communication Environments, Ph.D. dissertation, Dept. of Elect. Eng., Stanford Univ., Stanford, CA, Jun. 2002.

[17] T. Cover and J. Thomas, Elements of Information Theory, New York: Wiley, 1991.

[18] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[19] D. G. Luenberger, Optimization by Vector Space Methods, New York: Wiley, 1969.


A Proof of Proposition 2.1

In this appendix, it is shown that the capacity region of the dual MAC under sum power constraint, C^sum_MAC(P), as given in (9), is convex.

Proof: Assume the rate-tuples R^(1) and R^(2) belong to C^sum_MAC(P). Hence, there must exist two sets of positive semi-definite covariance matrices {S_k^(1)} and {S_k^(2)} such that ∑_k tr(S_k^(i)) ≤ P and R^(i) ∈ G({H_k^T}, {S_k^(i)}) for i = 1, 2. For any given α ∈ [0, 1] and any J ⊆ {1, …, K}, the following inequalities hold:

∑_{k∈J} (α R_k^(1) + ᾱ R_k^(2)) ≤ (α/2) log |∑_{k∈J} H_k^T S_k^(1) H_k + I_t| + (ᾱ/2) log |∑_{k∈J} H_k^T S_k^(2) H_k + I_t|
≤ (1/2) log |∑_{k∈J} H_k^T (α S_k^(1) + ᾱ S_k^(2)) H_k + I_t|,

where ᾱ = 1 − α, the first inequality follows from the definition of the set G({H_k^T}, {S_k}), and the second from the concavity of the log|·| function. As a result, the rate-tuple α R^(1) + ᾱ R^(2) belongs to the set G({H_k^T}, {α S_k^(1) + ᾱ S_k^(2)}). Since ∑_k tr(α S_k^(1) + ᾱ S_k^(2)) ≤ P holds as well for the given α, the set of covariance matrices {α S_k^(1) + ᾱ S_k^(2)} satisfies the power constraint and the region C^sum_MAC(P) is convex.
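The concavity step used in the proof above can be checked numerically; the matrices and the weight α below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
t = 3

def logdet(X):
    # log-determinant via the numerically stable slogdet
    return np.linalg.slogdet(X)[1]

B1, B2 = rng.standard_normal((t, t)), rng.standard_normal((t, t))
X1, X2 = B1 @ B1.T, B2 @ B2.T        # arbitrary PSD stand-ins for the H^T S H terms
alpha = 0.3

# Concavity of log|.| over the PSD cone: the averaged log-dets are
# dominated by the log-det of the averaged matrix.
lhs = alpha * logdet(X1 + np.eye(t)) + (1 - alpha) * logdet(X2 + np.eye(t))
rhs = logdet(alpha * X1 + (1 - alpha) * X2 + np.eye(t))
assert lhs <= rhs + 1e-9
```

The same inequality, applied term by term over the subsets J, is what makes the time-shared rate-tuple achievable by a single averaged set of covariances.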


B Proof of Lemma 3.1

In this appendix, Lemma 3.1 is proven for the general K-user case. It is shown that the point R* as given by Lemma 2.3 maximizes ∑_{k=1}^K µ_k R_k over the DPC region of DBC(µ), denoted by R^{DBC(µ)}_DPC(P).

Proof: By means of duality, the optimization can be performed over the capacity region of the dual MAC of DBC(µ) under sum power constraint P. Recall that the MAC-BC duality considered in Section 2.1 applies to channels with spatially white Gaussian noise. Let Q_k^{-1/2} be the symmetric square root matrix of the inverse of Q_k, i.e., Q_k^{-1/2} = (Q_k^{-1/2})^T and Q_k^{-1/2} Q_k^{-1/2} = Q_k^{-1}, k = 1, …, K. The noise at receiver k of DBC(µ) can be whitened by multiplying the channel output y_k by Q_k^{-1/2}. This process results in an equivalent broadcast channel with channel matrix Q_k^{-1/2} and white Gaussian receiver noise z_k with covariance matrix I_t for user k. Note that this transformation whitens the noises and affects neither the capacity region nor the DPC region of DBC(µ). Therefore, the dual MAC of DBC(µ) has channel matrix Q_k^{-1/2} for user k and white Gaussian noise z with covariance matrix I_t, and is given by

y = ∑_{k=1}^K Q_k^{-1/2} x_k + z. (47)

Figure 6 depicts DBC(µ) together with its dual MAC for K = 2 users. By duality, the DPC region of DBC(µ) is equal to the capacity region of its dual MAC given in (47) under sum power constraint P. Hence, the weighted sum rate maximization can be performed over the capacity region of this dual MAC. Let Γ_k denote the transmit covariance matrix of user k in this dual channel for k = 1, …, K. For a given set of covariance matrices {Γ_k} and for all permutations σ_i, i = 1, …, (m+1)!, on the set I = {l, …, l+m}, define the rate-tuple R^{σ_i} as

R^{σ_i}_k = (1/2) log ( |∑_{j=k}^K Q_j^{-1/2} Γ_j Q_j^{-1/2} + I_t| / |∑_{j=k+1}^K Q_j^{-1/2} Γ_j Q_j^{-1/2} + I_t| ), k ∈ {1, …, K} \ I,

R^{σ_i}_{σ_i(k)} = (1/2) log ( |∑_{j=k}^{l+m} Q_{σ_i(j)}^{-1/2} Γ_{σ_i(j)} Q_{σ_i(j)}^{-1/2} + ∑_{j=l+m+1}^K Q_j^{-1/2} Γ_j Q_j^{-1/2} + I_t| / |∑_{j=k+1}^{l+m} Q_{σ_i(j)}^{-1/2} Γ_{σ_i(j)} Q_{σ_i(j)}^{-1/2} + ∑_{j=l+m+1}^K Q_j^{-1/2} Γ_j Q_j^{-1/2} + I_t| ), k ∈ I.

Figure 6: Degraded Gaussian MIMO BC DBC(µ) in (a) and its dual multiple access channel in (b).

Also let σ_1 be the identity permutation, i.e., σ_1(k) = k for all k ∈ I. Following the same argument given in the proof of Lemma 2.3, for the case 0 < µ_1 < ⋯ < µ_K, R^{σ_1} maximizes ∑_{k=1}^K µ_k R_k over the DPC region of DBC(µ) for transmit covariance matrices Γ_k*, k = 1, …, K, that are the optimal solutions to the following optimization problem:

Maximize ∑_{k=1}^K (µ_k − µ_{k−1}) (1/2) log |∑_{j=k}^K Q_j^{-1/2} Γ_j Q_j^{-1/2} + I_t| (48)
Subject to ∑_{k=1}^K tr(Γ_k) ≤ P,
Γ_k ⪰ 0, k = 1, …, K.

This optimization problem is exactly the same as the one given in (12), except that the channel matrices H_k^T of (6) are replaced by the channel matrices of (47), Q_k^{-1/2}. Hence, the optimal solution must satisfy the KKT conditions of Lemma 2.3 with H_k^T replaced by Q_k^{-1/2}, and any solution that satisfies these KKT conditions is optimal. Let γ and Ψ_k for k = 1, …, K be the dual variables associated with the sum power and the positive semi-definite constraints of the optimization problem (48). Set γ* = λ*, Ψ_k* = 0 for all k, and

Γ_k* = Q_k^{1/2} H_k^T S_k* H_k Q_k^{1/2}, k = 1, …, K, (49)

where S_k* and λ* are the primal and dual optimal solutions of (12). In the following, it is shown that this specific choice of primal and dual variables satisfies the KKT conditions for the optimization problem in (48) and hence is optimal. Clearly Γ_k* ⪰ 0 for all k and γ* > 0. Also note that

∑_{k=1}^K tr(Γ_k*) = ∑_{k=1}^K tr(H_k Q_k H_k^T S_k*) = ∑_{k=1}^K tr((I_r − (1/λ*) Φ_k*) S_k*) = ∑_{k=1}^K tr(S_k*) = P,

where the second equality follows from the optimality conditions given in (15) and the definition of Q_k, while the third equality follows from the complementary slackness conditions in (16).
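The first equality in the trace computation above relies only on the cyclic property of the trace. A quick numerical check with randomly generated stand-ins for Q_k, H_k, and S_k*:

```python
import numpy as np

rng = np.random.default_rng(5)
t, r = 4, 2

# Random stand-ins; only the algebraic identity is being checked.
C = rng.standard_normal((t, t))
Q = C @ C.T
w, V = np.linalg.eigh(Q)
Q_half = V @ np.diag(np.sqrt(w)) @ V.T           # symmetric square root of Q
H = rng.standard_normal((r, t))
B = rng.standard_normal((r, r))
S = B @ B.T

# Gamma* = Q^{1/2} H^T S* H Q^{1/2} as in (49); by the cyclic property,
# tr(Gamma*) = tr(H Q H^T S*).
Gamma = Q_half @ H.T @ S @ H @ Q_half
assert np.isclose(np.trace(Gamma), np.trace(H @ Q @ H.T @ S))
```

Summing this identity over k is exactly how the sum power of the dual-MAC covariances is tied back to the power P of the original problem.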


Hence, Γ_k* and Ψ_k* for k = 1, …, K and γ* as defined are feasible and satisfy the complementary slackness conditions, i.e., tr(Γ_k* Ψ_k*) = 0 for all k. It only remains to show that the derivatives of the Lagrangian with respect to Γ_k, k = 1, …, K, are equal to zero at these given values. In other words, they satisfy the counterparts of equation (15) for the MAC in (47):

Q_k^{-1/2} ∑_{j=1}^k (µ_j − µ_{j−1}) (1/2) ( ∑_{i=j}^K Q_i^{-1/2} Γ_i* Q_i^{-1/2} + I_t )^{-1} Q_k^{-1/2} + Ψ_k* − γ* I_r = 0,

for k = 1, …, K. These equations are obtained from equations (15) by replacing H_k^T with Q_k^{-1/2}. Using the definition of Q_k, it is not hard to show that the chosen values for Γ_k*, Ψ_k*, and γ* satisfy these equations and are optimal. By substituting the optimal Γ_k* from (49) into the expression for R^{σ_1}, the expression for the optimal rate-tuple R* given in (11) is obtained. Therefore, R* is on the boundary of the DPC region of DBC(µ). Similarly, for the case 0 < µ_1 < ⋯ < µ_l = ⋯ = µ_{l+m} < ⋯ < µ_K, each point on the convex-hull of the vertices R^{σ_i}, i = 1, …, (m+1)!, maximizes ∑_{k=1}^K µ_k R_k over the DPC region of DBC(µ) for the optimal transmit covariance matrices Γ_k* obtained from (48). By the same arguments used for R^{σ_1}, it can be shown that the boundary point R^{σ_i} of R^{DBC(µ)}_DPC(P) coincides with the vertex R^{σ_i} of R_DPC(P) given in (17)-(18) for i = 1, …, (m+1)!. Therefore, the convex-hulls of these vertices are the same and the DPC regions of the original channel and DBC(µ) have this surface in common. Recall that the same weights µ_1, …, µ_K are used to find the boundary point or surface of both R^{DBC(µ)}_DPC(P) and R_DPC(P); hence, the two boundaries are tangent at this boundary point or surface.


C Proof of Lemma 3.7

In this appendix, it is shown that for the decreasing, compact and connected subsets A_1 ⊇ A_2 ⊇ ⋯ ⊇ A_n ⊇ ⋯ of d-dimensional Euclidean space,

⋂_n Ā_n = conv( ⋂_n A_n ),

where Ā_n denotes the convex-hull of the set A_n.

Proof: It is easy to show that conv(⋂_n A_n) ⊆ ⋂_n Ā_n. In the following, it is proven that ⋂_n Ā_n ⊆ conv(⋂_n A_n). Consider a point p ∈ ⋂_n Ā_n. By the Caratheodory theorem [17], the point p in the convex-hull of the connected and compact set A_n can be represented as a convex combination of at most d points in the set A_n. Therefore, for each n, there exist d points b_1^n, b_2^n, …, b_d^n ∈ A_n and a vector α^n = (α_1^n, …, α_d^n) ∈ R_+^d such that ∑_{i=1}^d α_i^n = 1 and p = ∑_{i=1}^d α_i^n b_i^n. Note that some of the α_i^n may be zero. Since A_n ⊆ A_1 for each n, (b_1^n, b_2^n, …, b_d^n) ∈ A_1^d, and these tuples form an infinite sequence in the compact set A_1^d. Hence, there exists a sub-sequence {(b_1^{n_i}, b_2^{n_i}, …, b_d^{n_i})}_{i=1}^∞ that converges to a point (b_1^o, b_2^o, …, b_d^o), and each of the points b_j^o must be in the set ⋂_n A_n for j = 1, …, d, since for each m, b_j^{n_i} ∈ A_{n_i} ⊆ A_m for all n_i ≥ m and A_m is closed. Recall that the vectors α^n also form an infinite sequence in the compact set [0, 1]^d. Therefore, the sub-sequence α^{n_i} for the indexes n_i, i = 1, 2, …, specified before has a sub-sequence that converges to a limiting point α^o ∈ [0, 1]^d. Since for each n_i, ∑_{j=1}^d α_j^{n_i} = 1 and p = ∑_{j=1}^d α_j^{n_i} b_j^{n_i}, these equalities must hold for the limiting points as well, i.e.,

∑_{j=1}^d α_j^o = 1, p = ∑_{j=1}^d α_j^o b_j^o.

Hence, p lies in the convex-hull of the set ⋂_n A_n.
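The limiting argument can be illustrated in two dimensions with made-up points: a convex combination built from points b_j^n converging to limit points b_j^o converges to the same combination of the limits:

```python
import numpy as np

# Made-up 2-D example: p is a fixed convex combination of two sequences of
# points that approach the limit points (0,0) and (1,1) as n grows.
p = np.array([0.5, 0.5])
alphas = np.array([0.5, 0.5])                    # convex weights, sum to 1
for n in (1, 10, 100, 1000):
    b1 = np.array([0.0, 0.0]) + 1.0 / n          # b_1^n -> b_1^o = (0, 0)
    b2 = np.array([1.0, 1.0]) - 1.0 / n          # b_2^n -> b_2^o = (1, 1)
    pn = alphas[0] * b1 + alphas[1] * b2
    assert np.allclose(pn, p)                    # the combination equals p for every n

# Hence p is also a convex combination of the limit points themselves.
b1o, b2o = np.array([0.0, 0.0]), np.array([1.0, 1.0])
assert np.allclose(alphas[0] * b1o + alphas[1] * b2o, p)
```

In the proof, the weights also vary with n, but compactness of [0, 1]^d supplies a convergent sub-sequence of weights in exactly the same way.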
