
MRC Technical Summary Report #2506

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

AD A29170

Mathematics Research Center
University of Wisconsin-Madison
610 Walnut Street
Madison, Wisconsin 53706

April 1983

(Received March 23, 1983)

Approved for public release
Distribution unlimited

Sponsored by

U. S. Army Research Office
P. O. Box 12211
Research Triangle Park
North Carolina 27709

National Science Foundation
Washington, DC 20550

UNIVERSITY OF WISCONSIN - MADISON
MATHEMATICS RESEARCH CENTER

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

Technical Summary Report #2506

April 1983

ABSTRACT

Csiszar (1975) has shown that Kullback-Leibler information numbers possess some geometrical properties much like those in Euclidean geometry. This paper extends these results by characterizing the shortest line between two distributions as well as the midpoint of the line. It turns out that the distributions comprising the line have applications to the problem of testing separate families of hypotheses.

AMS (MOS) Subject Classifications: Primary - 60E05, 62B10; Secondary - 62F03

Key Words: Kullback-Leibler information, geometry of probability distributions, minimax, embedding

Work Unit Number 4 - Statistics and Probability

Department of Statistics and Mathematics Research Center, University of Wisconsin, Madison, WI 53705.

Sponsored by the United States Army under Contract No. DAAG29-80-C-0041. This material is based upon work supported by the National Science Foundation under Grant No. MCS-7927062, Mod. 2 and Nos. MCS-7825301 and MCS-7903716.

SIGNIFICANCE AND EXPLANATION

The Kullback-Leibler information number is a well-known measure of statistical distance between probability distributions. Previous authors have shown that when endowed with this distance measure, the space of probability distributions possesses geometrical properties analogous to Euclidean geometry. This paper proves a new geometrical property by showing that one can in fact define the shortest line between two probability distributions as well as its mid-point.

It turns out that the probability distributions comprising this line have long been used as a tool in the important problem of testing statistical hypotheses involving nuisance parameters. Apart from pure mathematical convenience, there has been little justification for their use. The results in this paper are the first attempt at such an explanation.

The responsibility for the wording and views expressed in this descriptive summary lies with MRC, and not with the author of this report.

A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS

Wei-Yin Loh

1. Introduction

Csiszar (1975) has shown that if we use the Kullback-Leibler information number as a measure of distance between (probability) distributions, certain analogies exist between the properties of distributions and Euclidean geometry. In particular, he proved an analogue of Pythagoras' theorem. In this note we extend these geometrical properties by defining the "shortest line" between two distributions and the "mid-point" of the line. It turns out that the distributions comprising such a line are precisely those whose densities are exponential linear combinations of the densities of the two distributions at the end-points.

The idea of taking exponential linear combinations of densities is not new. For example, it appears in Cox (1961), Atkinson (1970) and Brown (1971) as a mathematically convenient means of embedding two families of distributions into a larger family. Our results in section 4 show that in fact there is a deeper mathematical property behind this choice of embedding, namely that the distributions in the embedding are really those distributions that are closest (in the Kullback-Leibler sense) to the two original families.


2. Notations and definitions

Recall that if $F$ and $G$ are two distributions on the same measurable space, the Kullback-Leibler information number $K(F,G)$ is defined as

$$K(F,G) = \begin{cases} \int \log(dF/dG)\,dF, & \text{if } F \ll G, \\ +\infty, & \text{otherwise,} \end{cases}$$

where "$F \ll G$" means that $F$ is absolutely continuous with respect to $G$. It is well known that $K(F,G)$ is well-defined, nonnegative, and is equal to zero if and only if $F(B) = G(B)$ for all measurable sets $B$.
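For distributions on a finite set the definition reduces to a sum, which is convenient for checking the later results numerically. The following is a minimal sketch under that assumption; the function name `kl` and the sample vectors are illustrative and do not come from the paper.

```python
import math

def kl(p, q):
    """K(P, Q) for discrete distributions given as equal-length probability lists.

    Returns math.inf when P is not absolutely continuous with respect to Q.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue            # a point with no P-mass contributes nothing
        if qi == 0.0:
            return math.inf     # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total

# Illustrative distributions on a three-point set (assumed values).
F = [0.5, 0.3, 0.2]
G = [0.2, 0.3, 0.5]
H = [0.5, 0.5, 0.0]             # F is not dominated by H

print(kl(F, G), kl(F, F), kl(F, H))
```

The three printed values illustrate, in order, a strictly positive number for distinct distributions, zero for identical ones, and infinity when absolute continuity fails.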

We need the following definitions in the rest of this paper.

Definition 2.1. A distribution $P$ is closer to $F$ and $G$ than $Q$ is if

$K(P,F) \le K(Q,F)$ and $K(P,G) \le K(Q,G)$

with at least one inequality being strict. In symbols we write $P <_{F,G} Q$ (or $P < Q$ if it is clear from the context what $F$ and $G$ are).

Definition 2.2. $P$ is a mid-point of $F$ and $G$ if $K(P,F) = K(P,G)$ and there does not exist $Q$ for which $Q < P$.

Definition 2.3. $P$ is minimax for $F$ and $G$ if $\max\{K(P,F), K(P,G)\} = \min_Q \max\{K(Q,F), K(Q,G)\}$, where the min is taken over the space of all distributions.

Throughout this paper, $\mu$ denotes a measure that dominates both $F$ and $G$, and $f(x)$, $g(x)$ are their respective densities relative to $\mu$. For convenience, we let $A$ denote the set

(2.1)  $A = \{x : f(x)g(x) > 0\}$,

and let $P_\lambda$ ($0 \le \lambda \le 1$) be the distribution with density (with respect to $\mu$) given by

(2.2)  $p_\lambda(x) = k_\lambda\, g^\lambda(x) f^{1-\lambda}(x)$ on $A$, and $p_\lambda(x) = 0$ otherwise,

where $k_\lambda^{-1} = \int_A g^\lambda f^{1-\lambda}\,d\mu$, and

(2.3)  $\mathcal{P} = \{P_\lambda : 0 \le \lambda \le 1\} \cup \{F, G\}$.

(Note that if $F$ and $G$ are mutually absolutely continuous, $P_0 = F$, $P_1 = G$ and $\mathcal{P}$ is an exponential family.) Finally we need the notation

(2.4)  $J(\lambda) = \int_A \log(g/f)\, g^\lambda f^{1-\lambda}\,d\mu$.

We will often abbreviate $K(P_\lambda, \cdot)$ to $K(\lambda, \cdot)$.
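The construction in (2.1)-(2.4) can be sketched for distributions on a finite set. The sketch assumes the reading in which $k_\lambda$ is the reciprocal of $\sum_A g^\lambda f^{1-\lambda}$ and $J(\lambda)$ is the unnormalised sum of $\log(g/f)\, g^\lambda f^{1-\lambda}$ over $A$. The function name `embedding` and the sample vectors are hypothetical.

```python
import math

def embedding(f, g, lam):
    """Return (k, p, J) for discrete densities f, g and 0 <= lam <= 1.

    k is the normalising constant k_lam, p the density p_lam of (2.2),
    and J the quantity J(lam) of (2.4), under the conventions stated above.
    """
    A = [i for i, (fi, gi) in enumerate(zip(f, g)) if fi > 0 and gi > 0]  # set A of (2.1)
    if not A:
        raise ValueError("F and G are mutually singular: A is empty")
    w = {i: g[i] ** lam * f[i] ** (1 - lam) for i in A}   # g^lam * f^(1-lam) on A
    k = 1.0 / sum(w.values())
    p = [k * w.get(i, 0.0) for i in range(len(f))]        # zero off A
    J = sum(math.log(g[i] / f[i]) * w[i] for i in A)
    return k, p, J

# Illustrative pair that is neither mutually singular nor mutually
# absolutely continuous: the common support A is the middle two points.
f = [0.5, 0.3, 0.2, 0.0]
g = [0.0, 0.2, 0.3, 0.5]

k0, p0, J0 = embedding(f, g, 0.0)
k_half, p_half, J_half = embedding(f, g, 0.5)
print(p0)
print(p_half)
```

In this example $P_0$ is $F$ conditioned on $A$ rather than $F$ itself, which is why $F$ and $G$ are adjoined separately in (2.3).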


3. Preliminary lemmas

We will assume throughout that $\mu$ is a $\sigma$-finite measure and $F$ and $G$ are two (fixed) distributions, not necessarily mutually absolutely continuous.

Lemma 3.1. Suppose that $\mu(A) > 0$. Then $k_\lambda$ and $J(\lambda)$ are both differentiable in $(0,1)$ and continuous at $\lambda = 0, 1$ (with $J(0)$ and $J(1)$ possibly infinite).

Proof. Since $k_\lambda^{-1} = \int_A \exp(\lambda \log(g/f))\, f\,d\mu$ and $J(\lambda)$ is its first derivative, differentiability in $(0,1)$ follows from a well-known result on integrals of exponential densities (see e.g. Lehmann (1959)). To see that the functions are continuous at the end-points, split $A$ into the sets $A(f) = A \cap \{f \ge g\}$ and $A(g) = A \cap \{f < g\}$, and use dominated convergence to obtain the result for $k_\lambda$. To prove the same for $J(\lambda)$, first observe that nonnegativity of $K(G,F)$ implies that

$\int_A [\log(g/f)]^- g\,d\mu < \infty$.

Therefore we may take limits as $\lambda \to 0$ in the inequality

$\int_A [\log(g/f)]^- g^\lambda f^{1-\lambda}\,d\mu \le \left\{\int_A [\log(g/f)]^- g\,d\mu\right\}^{\lambda} \left\{\int_A [\log(g/f)]^- f\,d\mu\right\}^{1-\lambda}$

to obtain

(3.1)  $\lim_{\lambda \to 0} \int_A [\log(g/f)]^- g^\lambda f^{1-\lambda}\,d\mu \le \int_A [\log(g/f)]^- f\,d\mu$.

Fatou's lemma shows that the reverse inequality holds, so in fact exact equality obtains in (3.1). Now by monotone convergence

$\lim_{\lambda \to 0} \int_A [\log(g/f)]^+ g^\lambda f^{1-\lambda}\,d\mu = \int_A [\log(g/f)]^+ f\,d\mu$.

This proves that $J(\lambda)$ is continuous at $\lambda = 0$. A similar argument does it for $\lambda = 1$.

Lemma 3.2. As functions of $\lambda$, both $K(\lambda,F)$ and $K(\lambda,G)$ are differentiable in $(0,1)$ and continuous at $\lambda = 0, 1$. $K(\lambda,F)$ is non-decreasing and $K(\lambda,G)$ is non-increasing in $[0,1]$.

Proof. The first assertion follows from the preceding lemma and the relations

(3.2)  $K(\lambda,F) = \log k_\lambda + \lambda k_\lambda J(\lambda)$,

(3.3)  $K(\lambda,G) = \log k_\lambda - (1-\lambda) k_\lambda J(\lambda)$.

Differentiation yields, for $0 < \lambda < 1$,

(3.4)  $\lambda^{-1}(d/d\lambda)K(\lambda,F) = -(1-\lambda)^{-1}(d/d\lambda)K(\lambda,G) = \mathrm{var}_\lambda(\log[g(X)/f(X)]) \ge 0$,

where the variance is taken under $P_\lambda$. This proves the second assertion. It is easy to see that strict inequality holds in (3.4) for some $0 < \lambda < 1$ if and only if it holds for all $0 < \lambda < 1$.
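Relations (3.2) and (3.3) and the monotonicity asserted in the lemma can be checked numerically on a finite set. This is only a sketch: it assumes the conventions $k_\lambda^{-1} = \sum_A g^\lambda f^{1-\lambda}$ and $J(\lambda) = \sum_A \log(g/f)\, g^\lambda f^{1-\lambda}$, and the helper names and sample vectors are illustrative.

```python
import math

def kl(p, q):
    """K(P, Q) for discrete distributions; inf if P is not dominated by Q."""
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def embedding(f, g, lam):
    """Return (k, p, J): normaliser k_lam, density p_lam, and J(lam)."""
    w = [gi ** lam * fi ** (1 - lam) if fi > 0 and gi > 0 else 0.0
         for fi, gi in zip(f, g)]
    k = 1.0 / sum(w)
    J = sum(wi * math.log(gi / fi) for wi, fi, gi in zip(w, f, g) if wi > 0)
    return k, [k * wi for wi in w], J

f = [0.5, 0.3, 0.2]
g = [0.2, 0.3, 0.5]

max_err = 0.0
to_F, to_G = [], []
for lam in [i / 10 for i in range(1, 10)]:
    k, p, J = embedding(f, g, lam)
    to_F.append(kl(p, f))
    to_G.append(kl(p, g))
    max_err = max(max_err,
                  abs(to_F[-1] - (math.log(k) + lam * k * J)),        # relation (3.2)
                  abs(to_G[-1] - (math.log(k) - (1 - lam) * k * J)))  # relation (3.3)

print(max_err)   # discrepancy between direct computation and (3.2)-(3.3)
print(to_F)      # should not decrease along the grid
print(to_G)      # should not increase along the grid
```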

Lemma 3.3. Suppose that $\mu(A) > 0$. Let $Q$ be such that $K(Q,F)$ and $K(Q,G)$ are both finite and define

(3.5)  $r(\lambda) = \int \log(p_\lambda/f)\,dQ$,  $s(\lambda) = \int \log(p_\lambda/g)\,dQ$.

Then (i) $r(\lambda)$ and $s(\lambda)$ are finite and continuous in $[0,1]$, and (ii) if for some $0 < \lambda \le 1$,

(3.6)  $r(\lambda) = K(\lambda,F)$,

then $s(\lambda) = K(\lambda,G)$.

Proof. The finiteness of $K(Q,F)$ and $K(Q,G)$ means that $Q$ is absolutely continuous with respect to $P_\lambda$ for all $\lambda$ in $[0,1]$. Therefore we may write

$r(\lambda) = \log k_\lambda + \lambda\,(K(Q,F) - K(Q,G))$,

$s(\lambda) = \log k_\lambda - (1-\lambda)(K(Q,F) - K(Q,G))$.

Assertion (i) now follows from Lemma 3.1. To get (ii) use the fact that

$K(\lambda,F) = \log k_\lambda + \lambda\,(K(\lambda,F) - K(\lambda,G))$

and

$K(\lambda,G) = \log k_\lambda - (1-\lambda)(K(\lambda,F) - K(\lambda,G))$.

The proof of the next lemma is trivial. A more general version appears in Csiszar (1975).

Lemma 3.4. Let $P$, $Q$, $R$ be three distinct distributions such that $P \ll R$ and $K(Q,P) < \infty$. Then

$\int \log(dP/dR)\,dQ \ge K(P,R)$

if and only if

$K(Q,R) \ge K(Q,P) + K(P,R)$.

A similar result holds if both $\ge$ signs are replaced with $=$ signs.
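The lemma rests on the identity $K(Q,R) = K(Q,P) + \int \log(dP/dR)\,dQ$, which holds whenever the terms are finite. A minimal numerical sketch on a three-point set follows; the distributions and the helper name `cross_term` are illustrative assumptions.

```python
import math

def kl(p, q):
    """K(P, Q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def cross_term(p, r, q):
    """Integral of log(dP/dR) with respect to Q, for strictly positive p and r."""
    return sum(qi * math.log(pi / ri) for pi, ri, qi in zip(p, r, q))

# Illustrative distributions (assumed values).
P = [0.4, 0.4, 0.2]
Q = [0.6, 0.3, 0.1]
R = [0.2, 0.3, 0.5]

lhs = kl(Q, R)
rhs = kl(Q, P) + cross_term(P, R, Q)
print(abs(lhs - rhs))                      # zero up to rounding

# The two conditions of the lemma therefore hold or fail together.
first = cross_term(P, R, Q) >= kl(P, R)
second = kl(Q, R) >= kl(Q, P) + kl(P, R)
print(first, second)
```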

4. Main results

We now prove our main theorem, which says that the exponential embedding $\mathcal{P}$ in (2.3) is in some sense "complete".

Theorem 4.1. For any $Q$ not belonging to $\mathcal{P}$, there is $P$ in $\mathcal{P}$ such that $P < Q$.

Proof. The result is easy if $F$ and $G$ are mutually singular since then $K(F,G) = K(G,F) = \infty$ and we may take $P = F$ if $K(Q,G) = \infty$ and $P = G$ otherwise. So suppose $\mu(A) > 0$, and without loss of generality further assume that both $K(Q,F)$ and $K(Q,G)$ are finite. Then $Q \ll P_\lambda$ for all $0 \le \lambda \le 1$. Let $r$ and $s$ be defined as in (3.5). By Lemmas 3.2 and 3.3, $K(\lambda,F)$ and $r(\lambda)$ are continuous functions of $\lambda$ in $[0,1]$. We consider three cases according to whether these two graphs intersect.

(i). Suppose $r(\lambda) = K(\lambda,F)$ for some $0 < \lambda < 1$. Then

(4.1)  $K(Q,P_\lambda) = \lambda K(Q,G) + (1-\lambda)K(Q,F) - \log k_\lambda < \infty$

and Lemma 3.4 implies that

$K(Q,F) = K(Q,P_\lambda) + K(\lambda,F) > K(\lambda,F)$.

Further, by Lemma 3.3, $s(\lambda) = K(\lambda,G)$. Reversing the roles of $r$ and $s$, and $F$ and $G$, we also get $K(Q,G) > K(\lambda,G)$. Hence $P_\lambda < Q$.

(ii). Suppose $r(\lambda) > K(\lambda,F)$ for all $0 < \lambda < 1$. Continuity yields $r(1) \ge K(1,F)$ and since $K(Q,P_1) < \infty$ by (4.1), we can use Lemma 3.4 to deduce that

$K(Q,F) \ge K(Q,P_1) + K(1,F) > K(1,F)$.

Since also $K(Q,G) = K(Q,P_1) + K(1,G) > K(1,G)$, it follows that $P_1 < Q$.

(iii). The case $r(\lambda) < K(\lambda,F)$ for all $0 < \lambda < 1$ is similar to (ii).
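The three cases of the proof suggest a simple numerical recipe on a finite set when $F$ and $G$ have full common support. Since $r(\lambda) - K(\lambda,F) = \lambda\,(c - m(\lambda))$ with $c$ the $Q$-mean and $m(\lambda)$ the $P_\lambda$-mean of $\log(g/f)$, and $m$ is non-decreasing by (3.4), the crossing point can be located by bisection. The sketch below is an illustration under these assumptions, not a procedure given in the paper; all names and sample vectors are hypothetical.

```python
import math

def kl(p, q):
    """K(P, Q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def p_lambda(f, g, lam):
    """Density (2.2), assuming f and g are strictly positive on a common finite set."""
    w = [gi ** lam * fi ** (1 - lam) for fi, gi in zip(f, g)]
    total = sum(w)
    return [wi / total for wi in w]

def mean_log_ratio(p, f, g):
    """Expectation of log(g/f) under p."""
    return sum(pi * math.log(gi / fi) for pi, fi, gi in zip(p, f, g))

def closer_member(f, g, q, iters=60):
    """Return lam in [0, 1] such that P_lam is closer to F and G than Q."""
    c = mean_log_ratio(q, f, g)                  # equals K(Q,F) - K(Q,G)

    def m(lam):                                  # equals K(lam,F) - K(lam,G)
        return mean_log_ratio(p_lambda(f, g, lam), f, g)

    if c <= m(0.0):
        return 0.0                               # case (iii): use P_0
    if c >= m(1.0):
        return 1.0                               # case (ii): use P_1
    lo, hi = 0.0, 1.0                            # case (i): solve m(lam) = c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if m(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f = [0.5, 0.3, 0.2]
g = [0.2, 0.3, 0.5]
q = [0.1, 0.6, 0.3]                              # a distribution outside the embedding

lam = closer_member(f, g, q)
p = p_lambda(f, g, lam)
print(lam)
print(kl(p, f), kl(q, f))
print(kl(p, g), kl(q, g))
```

For this choice of `q` the crossing occurs strictly inside $(0,1)$, so the example exercises case (i).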

According to Definition 2.2, the above theorem implies that the mid-point $M$ of $F$ and $G$ belongs to $\mathcal{P}$ whenever the former exists. The following corollaries give conditions for the existence of $M$.

Corollary 4.1. (i) If $F$ and $G$ are mutually absolutely continuous, $M$ exists and equals $P_\lambda$ for some unique $\lambda$ in $(0,1)$. (ii) If $F$ and $G$ are mutually singular, $M$ does not exist. (iii) $M$ is unique whenever it exists.

Proof. Assertion (i) follows from the fact that if $F$ and $G$ are mutually absolutely continuous and distinct from each other, then $K(0,F) = K(1,G) = 0$, and both $K(\lambda,F)$ and $K(\lambda,G)$ are strictly monotone for $0 \le \lambda \le 1$. Assertion (ii) is immediate from Theorem 4.1 since $\mathcal{P} = \{F,G\}$ if $F$ and $G$ are mutually singular. To prove assertion (iii), suppose that $F$ and $G$ are not mutually singular and $M$ exists. If there are $\lambda_1 \ne \lambda_2$ in $[0,1]$ such that $P_{\lambda_1}$ and $P_{\lambda_2}$ are both mid-points of $F$ and $G$, then

$K(\lambda_1,F) = K(\lambda_1,G) = K(\lambda_2,F) = K(\lambda_2,G)$

and it follows from (3.4) that $g(x)/f(x)$ is constant a.e. ($\mu$) on $A$. This implies that $P_\lambda = P_0$ for all $0 \le \lambda \le 1$ and hence that $M$ is unique.

Corollary 4.2. Suppose $F$ and $G$ are not mutually singular. Then the mid-point $M$ exists if and only if

(4.2)  $J(\lambda) = 0$ for some $0 \le \lambda \le 1$,

in which case $M = P_\lambda$.

Proof. According to Theorem 4.1, $M$ exists if and only if

(4.3)  $K(\lambda,F) = K(\lambda,G) < \infty$ for some $0 \le \lambda \le 1$.

It is clear from (3.2) and (3.3) that this is equivalent to (4.2).
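When $F$ and $G$ are distinct and mutually absolutely continuous on a finite set, condition (4.3) can be solved numerically. The difference $K(\lambda,F) - K(\lambda,G)$ is non-decreasing in $\lambda$, equals $-K(F,G)$ at 0 and $K(G,F)$ at 1, so bisection applies. The following is a sketch under those assumptions with hypothetical names and sample values.

```python
import math

def kl(p, q):
    """K(P, Q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def p_lambda(f, g, lam):
    """Density (2.2) for strictly positive f and g on a common finite set."""
    w = [gi ** lam * fi ** (1 - lam) for fi, gi in zip(f, g)]
    total = sum(w)
    return [wi / total for wi in w]

def gap(f, g, lam):
    """K(lam, F) - K(lam, G); this vanishes exactly where J(lam) = 0."""
    p = p_lambda(f, g, lam)
    return kl(p, f) - kl(p, g)

def midpoint_lambda(f, g, iters=60):
    """Bisection for the root of gap on [0, 1]; assumes F != G."""
    lo, hi = 0.0, 1.0            # gap(0) = -K(F,G) < 0 and gap(1) = K(G,F) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(f, g, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f = [0.7, 0.2, 0.1]
g = [0.1, 0.3, 0.6]

lam_mid = midpoint_lambda(f, g)
M = p_lambda(f, g, lam_mid)
print(lam_mid, M)
print(kl(M, f), kl(M, g))        # the two distances agree at the mid-point
```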

Corollary 4.1 states that mutual singularity of $F$ and $G$ is a sufficient condition for the non-existence of the mid-point. The following example shows that the condition is not necessary.

Example 4.1. Let $F$ be the uniform distribution on $(0,3)$ and $G$ be uniform on $(1,2)$. Then $P_\lambda = G$ for all $0 \le \lambda \le 1$ and (4.3) does not hold for any $\lambda$. There is thus no mid-point.

The $P_\lambda$ (or $G$) in this example is "minimax" according to Definition 2.3. It turns out that minimax distributions always exist. Uniqueness may be lost but only in trivial cases. This is made explicit in the next corollary.

Corollary 4.3. (i) A minimax distribution always exists. (ii) If $F$ and $G$ are not mutually singular, the minimax distribution is unique. (iii) If $F$ and $G$ are mutually singular, every distribution is minimax. (iv) Every mid-point is unique minimax.

Proof. Since every mid-point is minimax by definition, assertion (iv) is immediate from Corollary 4.1. It remains to prove assertions (i) - (iii) only for the case when the mid-point does not exist. To prove assertion (ii), suppose that $F$ and $G$ are not mutually singular. It is clear from (4.3) that the mid-point does not exist if and only if the graphs of $K(\lambda,F)$ and $K(\lambda,G)$ fail to intersect in $[0,1]$. But from (3.4) either

(4.4)  $(d/d\lambda)K(\lambda,F) > 0$ and $(d/d\lambda)K(\lambda,G) < 0$ for all $0 < \lambda < 1$

or

(4.5)  $(d/d\lambda)K(\lambda,F) = (d/d\lambda)K(\lambda,G) = 0$ for all $0 < \lambda < 1$.

Therefore either $K(0,F) > K(0,G)$ or $K(1,F) < K(1,G)$. Assume, without loss of generality, that $K(0,F) > K(0,G)$. First suppose that (4.4) holds. Then for all $0 < \lambda \le 1$

(4.6)  $\max(K(0,F), K(0,G)) < \max(K(\lambda,F), K(\lambda,G))$.

If $K(G,F) < \infty$, then $G \ll F$, $P_1 = G$ and (4.6) yields

(4.7)  $\max(K(0,F), K(0,G)) < \max(K(G,F), K(G,G))$.

Clearly (4.7) is trivially true also if $K(G,F) = \infty$. A similar argument shows that

(4.8)  $\max(K(0,F), K(0,G)) \le \max(K(F,F), K(F,G))$

with equality if and only if $P_0 = F$. Now (4.6) - (4.8) show that $P_0$ uniquely minimises $\max(K(P,F), K(P,G))$ over all $P \in \mathcal{P}$. We conclude from Theorem 4.1 that $P_0$ is unique minimax for $F$ and $G$. If instead (4.5) obtains, then as the proof of Corollary 4.1 shows, $P_\lambda = P_0$ for all $0 \le \lambda \le 1$. Further $K(0,F) > K(0,G) \ge 0$. Since (4.7) and (4.8) are trivially satisfied, $P_0$ is again unique minimax. This completes the proof of assertion (ii). Assertion (iii) follows from observing that if $F$ and $G$ are mutually singular, then at least one of $K(Q,F)$ and $K(Q,G)$ is infinite for any distribution $Q$. Assertion (i) is a consequence of (ii) - (iv).

5. Examples. We end this discussion with two examples.

Example 5.1 (Binomial). Let $F$ be $\mathrm{Bin}(n,p_1)$ (binomial with $n$ trials and success probability $p_1$) and $G$ be $\mathrm{Bin}(n,p_2)$. Write $q_i = 1 - p_i$. Then every member in $\mathcal{P}$ is binomial and the mid-point $M$ is $\mathrm{Bin}(n,p)$ where $p = \log(q_2/q_1)/\log(p_1 q_2/p_2 q_1)$. This formula applies and yields $p$ between $p_1$ and $p_2$ only when neither $p_1$ nor $p_2$ is 0 or 1. If $p_1 = 0$ and $0 < p_2 < 1$ for example, the formula gives $p = 0$. The reason for this strange result is that here there is no mid-point since $\mathcal{P} = \{F\}$. It can be shown that if both $p_1$ and $p_2$ are neither 0 nor 1, then $p$ lies between $p_1$ and $p_2$, as expected. The formula for $p$ suggests a new way of "scaling" the binomial family.
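The formula for $p$ can be checked against the definition of the mid-point. The sketch below relies on the standard closed form $K(\mathrm{Bin}(n,p), \mathrm{Bin}(n,p_i)) = n\,[p \log(p/p_i) + q \log(q/q_i)]$, which is not stated in the text above; the function names and the values of `n`, `p1`, `p2` are illustrative.

```python
import math

def binomial_kl(n, p, p0):
    """K(Bin(n, p), Bin(n, p0)) for 0 < p, p0 < 1 (standard closed form)."""
    return n * (p * math.log(p / p0) + (1 - p) * math.log((1 - p) / (1 - p0)))

def binomial_midpoint(p1, p2):
    """Success probability of the mid-point, from the formula in Example 5.1."""
    q1, q2 = 1 - p1, 1 - p2
    return math.log(q2 / q1) / math.log(p1 * q2 / (p2 * q1))

n, p1, p2 = 10, 0.2, 0.6
p = binomial_midpoint(p1, p2)

print(p)                                              # lies between p1 and p2
print(binomial_kl(n, p, p1), binomial_kl(n, p, p2))   # equal distances to F and G
```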

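Example 5.2 (Normal). Let $F$ be $N(\theta_1, \sigma_1^2)$ (normal with mean $\theta_1$ and variance $\sigma_1^2$) and $G$ be $N(\theta_2, \sigma_2^2)$. Then the members of $\mathcal{P}$ are also normal distributions. If $\sigma_1 = \sigma_2 = \sigma$, $M$ is $N(\tfrac{1}{2}(\theta_1 + \theta_2), \sigma^2)$, and if $\theta_1 = \theta_2 = \theta$, $M$ is $N(\theta, \sigma^2)$ where

$\sigma^2 = \sigma_1^2 \sigma_2^2 \log(\sigma_2^2/\sigma_1^2)/(\sigma_2^2 - \sigma_1^2)$.

It can be verified that $\sigma$ always lies between $\sigma_1$ and $\sigma_2$.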
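The equal-mean case can be verified numerically in the same spirit. The sketch assumes the standard closed form $K(N(\theta,\sigma^2), N(\theta,\sigma_i^2)) = \tfrac{1}{2}[\sigma^2/\sigma_i^2 - 1 - \log(\sigma^2/\sigma_i^2)]$ and the mid-point variance $\sigma_1^2\sigma_2^2\log(\sigma_2^2/\sigma_1^2)/(\sigma_2^2-\sigma_1^2)$; the function names and sample variances are illustrative.

```python
import math

def normal_kl_same_mean(var, var0):
    """K(N(theta, var), N(theta, var0)); standard closed form for equal means."""
    ratio = var / var0
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def midpoint_variance(var1, var2):
    """Variance of the mid-point when the means coincide (requires var1 != var2)."""
    return var1 * var2 * math.log(var2 / var1) / (var2 - var1)

var1, var2 = 1.0, 4.0
var = midpoint_variance(var1, var2)

print(var)                                            # lies between var1 and var2
print(normal_kl_same_mean(var, var1), normal_kl_same_mean(var, var2))
```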

Acknowledgement. The author is grateful to E. L. Lehmann for many helpful comments.

REFERENCES

[1] Atkinson, A. C. (1970). A method for discriminating between models. J. Roy. Statist. Soc. (B) 32, 323-353.

[2] Brown, L. D. (1971). Non-local asymptotic optimality of appropriate likelihood ratio tests. Ann. Math. Statist. 42, 1206-1240.

[3] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc. Fourth Berkeley Symp. 1, 105-123.

[4] Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probability 3, 146-158.

[5] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.
