AD-A129 170  A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS (U)  WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER  W. LOH  APR 83  MRC-TSR-2506  DAAG29-80-C-0041
MRC Technical Summary Report #2506
A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS
Wei-Yin Loh
Mathematics Research Center
University of Wisconsin-Madison
610 Walnut Street
Madison, Wisconsin 53706
April 1983
(Received March 23, 1983)
Approved for public release
Distribution unlimited
Sponsored by
U. S. Army Research Office, P. O. Box 12211, Research Triangle Park, North Carolina 27709
National Science Foundation, Washington, DC 20550
UNIVERSITY OF WISCONSIN - MADISON
MATHEMATICS RESEARCH CENTER
A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS
Wei-Yin Loh
Technical Summary Report #2506
April 1983
ABSTRACT
Csiszar (1975) has shown that Kullback-Leibler information numbers
possess some geometrical properties much like those in Euclidean geometry.
This paper extends these results by characterizing the shortest line between
two distributions as well as the midpoint of the line. It turns out that the
distributions comprising the line have applications to the problem of testing
separate families of hypotheses.
AMS (MOS) Subject Classifications: Primary - 60E05, 62B10; Secondary - 62F03
Key Words: Kullback-Leibler information, geometry of probability distributions, minimax, embedding
Work Unit Number 4 - Statistics and Probability
Department of Statistics and Mathematics Research Center, University of
Wisconsin, Madison, WI 53705.
Sponsored by the United States Army under Contract No. DAAG29-80-C-0041. This material is based upon work supported by the National Science Foundation under Grant No. MCS-7927062, Mod. 2 and Nos. MCS-7825301 and MCS-7903716.
SIGNIFICANCE AND EXPLANATION
The Kullback-Leibler information number is a well-known measure of
statistical distance between probability distributions. Previous authors have
shown that when endowed with this distance measure, the space of probability
distributions possesses geometrical properties analogous to Euclidean
geometry. This paper proves a new geometrical property by showing that one
can in fact define the shortest line between two probability distributions as
well as its mid-point.
It turns out that the probability distributions comprising this line have
long been used as a tool in the important problem of testing statistical
hypotheses involving nuisance parameters. Apart from pure mathematical
convenience, there has been little justification for their use. The results in
this paper are the first attempt at such an explanation.
The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the author of this report.
A NOTE ON THE GEOMETRY OF KULLBACK-LEIBLER INFORMATION NUMBERS
Wei-Yin Loh
1. Introduction
Csiszar (1975) has shown that if we use the Kullback-Leibler information
number as a measure of distance between (probability) distributions, certain
analogies exist between the properties of distributions and Euclidean
geometry. In particular, he proved an analogue of Pythagoras' theorem. In
this note we extend these geometrical properties by defining the "shortest
line" between two distributions and the "mid-point" of the line. It turns out
that the distributions comprising such a line are precisely those whose
densities are exponential linear combinations of the densities of the two
distributions at the end-points.
The idea of taking exponential linear combinations of densities is not
new. For example, it appears in Cox (1961), Atkinson (1970) and Brown (1971)
as a mathematically convenient means of embedding two families of
distributions into a larger family. Our results in section 4 show that in
fact there is a deeper mathematical property behind this choice of embedding,
namely that the distributions in the embedding are really those distributions
that are closest (in the Kullback-Leibler sense) to the two original families.
2. Notations and definitions
Recall that if $F$ and $G$ are two distributions on the same measurable
space, the Kullback-Leibler information number $K(F,G)$ is defined as
$$K(F,G) = \begin{cases} \int \log(dF/dG)\,dF, & \text{if } F \ll G \\ +\infty, & \text{otherwise,} \end{cases}$$
where "$F \ll G$" means that $F$ is absolutely continuous with respect to $G$.
It is well known that $K(F,G)$ is well-defined, nonnegative, and is equal to
zero if and only if $F(B) = G(B)$ for all measurable sets $B$.
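The definition can be tried out on a finite sample space, where the integral becomes a sum. The following is a minimal sketch under that assumption; the function name `kl` and the example vectors `F`, `G`, `H` are illustrative and do not come from the report.

```python
import math

def kl(p, q):
    """K(P, Q) = sum_i p_i log(p_i / q_i); +infinity unless P << Q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # a point with no P-mass contributes nothing
        if qi == 0.0:
            return math.inf   # P puts mass where Q has none, so P is not << Q
        total += pi * math.log(pi / qi)
    return total

# Hypothetical distributions on a three-point space.
F = [0.5, 0.3, 0.2]
G = [0.2, 0.3, 0.5]
H = [0.5, 0.5, 0.0]

print(kl(F, G), kl(F, F), kl(F, H), kl(H, F))
```

Here `kl(F, F)` is zero, `kl(F, H)` is infinite because `F` is not absolutely continuous with respect to `H`, while `kl(H, F)` is finite: the information number is not symmetric.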
We need the following definitions in the rest of this paper.
Definition 2.1. A distribution $P$ is closer to $F$ and $G$ than $Q$ is if
$$K(P,F) \le K(Q,F) \quad\text{and}\quad K(P,G) \le K(Q,G)$$
with at least one inequality being strict. In symbols we write $P <_{F,G} Q$ (or
$P < Q$ if it is clear from the context what $F$ and $G$ are).
Definition 2.2. $P$ is a mid-point of $F$ and $G$ if $K(P,F) = K(P,G)$ and
there does not exist $Q$ for which $Q < P$.
Definition 2.3. $P$ is minimax for $F$ and $G$ if
$$\max\{K(P,F), K(P,G)\} = \min_Q \max\{K(Q,F), K(Q,G)\},$$
where the min is taken over the space of all distributions.
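Throughout this paper, $\mu$ denotes a measure that dominates both $F$
and $G$, and $f(x)$, $g(x)$ are their respective densities relative to $\mu$. For
convenience, we let $A$ denote the set
$$A = \{x : f(x)g(x) > 0\}, \tag{2.1}$$
and let $P_\lambda$ $(0 \le \lambda \le 1)$ be the distribution with density (with respect to $\mu$)
given by
$$p_\lambda(x) = \begin{cases} k_\lambda\, g^{\lambda}(x) f^{1-\lambda}(x) & \text{on } A \\ 0 & \text{otherwise,} \end{cases} \tag{2.2}$$
where $k_\lambda^{-1} = \int_A g^{\lambda} f^{1-\lambda}\,d\mu$, and
$$\mathcal{P} = \{P_\lambda,\ 0 \le \lambda \le 1\} \cup \{F, G\}. \tag{2.3}$$
(Note that if $F$ and $G$ are mutually absolutely continuous, $P_0 = F$,
$P_1 = G$ and $\mathcal{P}$ is an exponential family.) Finally we need the notation
$$J(\lambda) = \int_A [\log(g/f)]\, g^{\lambda} f^{1-\lambda}\,d\mu. \tag{2.4}$$
We will often abbreviate $K(P_\lambda, F)$ to $K(\lambda, F)$.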
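For a finite sample space with counting measure as the dominating measure, the quantities in (2.1), (2.2) and (2.4) reduce to finite sums. The sketch below assumes that setting; the helper name `embed` and the example densities are hypothetical.

```python
import math

def embed(f, g, lam):
    """Return (p_lam, k_lam, J_lam) for discrete densities f and g."""
    A = [i for i in range(len(f)) if f[i] * g[i] > 0]        # the set A of (2.1)
    w = {i: g[i] ** lam * f[i] ** (1 - lam) for i in A}      # g^lam f^(1-lam) on A
    k = 1.0 / sum(w.values())                                 # k_lam as in (2.2)
    p = [k * w[i] if i in w else 0.0 for i in range(len(f))]  # density p_lam
    J = sum(math.log(g[i] / f[i]) * w[i] for i in A)          # J(lam) of (2.4)
    return p, k, J

# Hypothetical example: F puts no mass on the last point, so A = {0, 1, 2}
# and F, G are not mutually absolutely continuous.
f = [0.5, 0.3, 0.2, 0.0]
g = [0.2, 0.3, 0.4, 0.1]

p_half, k_half, J_half = embed(f, g, 0.5)
p_zero, _, _ = embed(f, g, 0.0)
p_one, _, _ = embed(f, g, 1.0)
print(p_half, k_half, J_half)
```

In this example `p_zero` coincides with `f` because `F` is concentrated on `A`, whereas `p_one` is `g` renormalized to `A` and therefore differs from `g`. This is why (2.3) adjoins `F` and `G` to the family explicitly.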
3. Preliminary lemmas
We will assume throughout that $\mu$ is a $\sigma$-finite measure and $F$ and $G$
are two (fixed) distributions, not necessarily mutually absolutely continuous.
Lemma 3.1. Suppose that $\mu(A) > 0$. Then $k_\lambda$ and $J(\lambda)$ are both
differentiable in $(0,1)$ and continuous at $\lambda = 0, 1$ (with $J(0)$ and $J(1)$
possibly infinite).
Proof. Since $k_\lambda^{-1} = \int_A \exp\{\lambda \log(g/f)\} f\,d\mu$ and $J(\lambda)$ is its first
derivative, differentiability in $(0,1)$ follows from a well-known result on
integrals of exponential densities (see e.g. Lehmann (1959)). To see that the
functions are continuous at the end-points, split $A$ into the sets $A(f) =
A \cap \{f \ge g\}$ and $A(g) = A \cap \{f < g\}$, and use dominated convergence to obtain
the result for $k_\lambda$. To prove the same for $J(\lambda)$, first observe that
nonnegativity of $K(G,F)$ implies that
$$\int_A [\log(g/f)]^- g\,d\mu < \infty.$$
Therefore we may take limits as $\lambda \to 0$ in the inequality
$$\int_A [\log(g/f)]^- g^{\lambda} f^{1-\lambda}\,d\mu \le \Big\{\int_A [\log(g/f)]^- g\,d\mu\Big\}^{\lambda} \Big\{\int_A [\log(g/f)]^- f\,d\mu\Big\}^{1-\lambda}$$
to obtain
$$\lim_{\lambda \to 0} \int_A [\log(g/f)]^- g^{\lambda} f^{1-\lambda}\,d\mu \le \int_A [\log(g/f)]^- f\,d\mu. \tag{3.1}$$
Fatou's lemma shows that the reverse inequality holds, so in fact exact
equality obtains in (3.1). Now by monotone convergence
$$\int_A [\log(g/f)]^+ g^{\lambda} f^{1-\lambda}\,d\mu \to \int_A [\log(g/f)]^+ f\,d\mu.$$
This proves that $J(\lambda)$ is continuous at $\lambda = 0$. A similar argument does it
for $\lambda = 1$.
Lemma 3.2. As functions of $\lambda$, both $K(\lambda,F)$ and $K(\lambda,G)$ are differentiable
in $(0,1)$ and continuous at $\lambda = 0, 1$. $K(\lambda,F)$ is non-decreasing and $K(\lambda,G)$
is non-increasing in $[0,1]$.
Proof. The first assertion follows from the preceding lemma and the relations
$$K(\lambda,F) = \log k_\lambda + \lambda k_\lambda J(\lambda), \tag{3.2}$$
$$K(\lambda,G) = \log k_\lambda - (1-\lambda) k_\lambda J(\lambda). \tag{3.3}$$
Differentiation yields, for $0 < \lambda < 1$,
$$\lambda^{-1} (d/d\lambda) K(\lambda,F) = -(1-\lambda)^{-1} (d/d\lambda) K(\lambda,G) = \mathrm{var}_\lambda\{\log[g(X)/f(X)]\} \ge 0. \tag{3.4}$$
This proves the second assertion. It is easy to see that strict inequality
holds in (3.4) for some $0 < \lambda < 1$ if and only if it holds for all $0 < \lambda < 1$.
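Relations (3.2) and (3.3) and the monotonicity in Lemma 3.2 can be checked numerically. The sketch below assumes a finite sample space with strictly positive densities; the vectors `f`, `g` and the grid are arbitrary illustrative choices.

```python
import math

# Hypothetical strictly positive densities on a three-point space.
f = [0.5, 0.3, 0.2]
g = [0.1, 0.3, 0.6]

def quantities(lam):
    """Return (k_lam, J(lam), K(lam, F), K(lam, G)), the last two computed directly."""
    w = [gi ** lam * fi ** (1 - lam) for fi, gi in zip(f, g)]
    k = 1.0 / sum(w)
    p = [k * wi for wi in w]
    J = sum(math.log(gi / fi) * wi for fi, gi, wi in zip(f, g, w))
    KF = sum(pi * math.log(pi / fi) for pi, fi in zip(p, f))
    KG = sum(pi * math.log(pi / gi) for pi, gi in zip(p, g))
    return k, J, KF, KG

grid = [i / 20 for i in range(21)]
rows = [quantities(lam) for lam in grid]

# Largest discrepancy between the direct values and formulas (3.2), (3.3).
max_err = max(
    max(abs(KF - (math.log(k) + lam * k * J)),
        abs(KG - (math.log(k) - (1 - lam) * k * J)))
    for lam, (k, J, KF, KG) in zip(grid, rows)
)
KF_nondecreasing = all(a[2] <= b[2] + 1e-12 for a, b in zip(rows, rows[1:]))
KG_nonincreasing = all(a[3] >= b[3] - 1e-12 for a, b in zip(rows, rows[1:]))
print(max_err, KF_nondecreasing, KG_nonincreasing)
```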
Lemma 3.3. Suppose that $\mu(A) > 0$. Let $Q$ be such that $K(Q,F)$ and
$K(Q,G)$ are both finite and define
$$r(\lambda) = \int \log(p_\lambda/f)\,dQ, \qquad s(\lambda) = \int \log(p_\lambda/g)\,dQ. \tag{3.5}$$
Then (i) $r(\lambda)$ and $s(\lambda)$ are finite and continuous in $[0,1]$, and (ii) if
for some $0 < \lambda \le 1$,
$$r(\lambda) = K(\lambda,F) \tag{3.6}$$
then $s(\lambda) = K(\lambda,G)$.
Proof. The finiteness of $K(Q,F)$ and $K(Q,G)$ means that $Q$ is absolutely
continuous with respect to $P_\lambda$ for all $\lambda$ in $[0,1]$. Therefore we may write
$$r(\lambda) = \log k_\lambda + \lambda\,(K(Q,F) - K(Q,G)),$$
$$s(\lambda) = \log k_\lambda - (1-\lambda)\,(K(Q,F) - K(Q,G)).$$
Assertion (i) now follows from Lemma 3.1. To get (ii) use the fact that
$$K(\lambda,F) = \log k_\lambda + \lambda\,(K(\lambda,F) - K(\lambda,G))$$
and
$$K(\lambda,G) = \log k_\lambda - (1-\lambda)\,(K(\lambda,F) - K(\lambda,G)).$$
The proof of the next lemma is trivial. A more general version appears
in Csiszar (1975).
Lemma 3.4. Let $P$, $Q$, $R$ be three distinct distributions such that $P \ll R$
and $K(Q,P) < \infty$. Then
$$\int \log(dP/dR)\,dQ \ge K(P,R)$$
if and only if
$$K(Q,R) \ge K(Q,P) + K(P,R).$$
A similar result holds if both $\ge$ signs are replaced with $=$ signs.
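The one-line computation behind Lemma 3.4 can be written out as follows. It is a sketch of the standard argument rather than a quotation from the report, and it uses only that $K(Q,P) < \infty$ gives $Q \ll P \ll R$, so the chain rule for densities applies.

```latex
\begin{align*}
K(Q,R) &= \int \log\frac{dQ}{dR}\,dQ
        = \int \log\frac{dQ}{dP}\,dQ + \int \log\frac{dP}{dR}\,dQ \\
       &= K(Q,P) + \int \log\frac{dP}{dR}\,dQ .
\end{align*}
```

Since $K(Q,P)$ is finite, comparing $\int \log(dP/dR)\,dQ$ with $K(P,R)$ is the same as comparing $K(Q,R)$ with $K(Q,P) + K(P,R)$, for inequality and for equality alike.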
4. Main results
We now prove our main theorem, which says that the exponential embedding
$\mathcal{P}$ in (2.3) is in some sense "complete".
Theorem 4.1. For any $Q$ not belonging to $\mathcal{P}$, there is $P$ in $\mathcal{P}$ such that
$P < Q$.
Proof. The result is easy if $F$ and $G$ are mutually singular since then
$K(F,G) = K(G,F) = \infty$ and we may take $P = F$ if $K(Q,G) = \infty$ and $P = G$
otherwise. So suppose $\mu(A) > 0$, and without loss of generality further
assume that both $K(Q,F)$ and $K(Q,G)$ are finite. Then $Q \ll P_\lambda$ for all
$0 \le \lambda \le 1$. Let $r$ and $s$ be defined as in (3.5). By Lemmas 3.2 and 3.3,
$K(\lambda,F)$ and $r(\lambda)$ are continuous functions of $\lambda$ in $[0,1]$. We consider
three cases according to whether these two graphs intersect.
(i). Suppose $r(\lambda) = K(\lambda,F)$ for some $0 < \lambda < 1$. Then
$$K(Q,P_\lambda) = \lambda K(Q,G) + (1-\lambda) K(Q,F) - \log k_\lambda < \infty \tag{4.1}$$
and Lemma 3.4 implies that
$$K(Q,F) = K(Q,P_\lambda) + K(\lambda,F) > K(\lambda,F).$$
Further, by Lemma 3.3, $s(\lambda) = K(\lambda,G)$. Reversing the roles of $r$ and $s$,
and $F$ and $G$, we also get $K(Q,G) > K(\lambda,G)$. Hence $P_\lambda < Q$.
(ii). Suppose $r(\lambda) > K(\lambda,F)$ for all $0 < \lambda < 1$. Continuity yields $r(1) \ge
K(1,F)$ and since $K(Q,P_1) < \infty$ by (4.1), we can use Lemma 3.4 to deduce that
$$K(Q,F) \ge K(Q,P_1) + K(1,F) > K(1,F).$$
Since (4.1) and (3.3) with $\lambda = 1$ give $K(Q,G) = K(Q,P_1) + K(1,G)$,
it follows that $P_1 < Q$.
(iii). The case $r(\lambda) < K(\lambda,F)$ for all $0 < \lambda < 1$ is similar to (ii).
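Theorem 4.1 can be illustrated numerically by scanning a grid of $\lambda$ values for members of the embedding that are closer to $F$ and $G$ than a given $Q$ in the sense of Definition 2.1. This is a sketch only: it assumes a finite sample space with strictly positive densities, and the names `closer_members`, `f`, `g`, `q` are hypothetical.

```python
import math

def kl(p, q):
    """K(P, Q) on a finite space, assuming q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def p_lam(f, g, lam):
    """Density of P_lam in (2.2) when f and g are strictly positive."""
    w = [gi ** lam * fi ** (1 - lam) for fi, gi in zip(f, g)]
    s = sum(w)
    return [wi / s for wi in w]

def closer_members(f, g, q, steps=100):
    """Grid values of lam for which P_lam is closer to F and G than Q."""
    kqf, kqg = kl(q, f), kl(q, g)
    found = []
    for i in range(steps + 1):
        lam = i / steps
        p = p_lam(f, g, lam)
        kpf, kpg = kl(p, f), kl(p, g)
        if kpf <= kqf and kpg <= kqg and (kpf < kqf or kpg < kqg):
            found.append(lam)
    return found

f = [0.5, 0.3, 0.2]
g = [0.1, 0.3, 0.6]
q = [0.2, 0.6, 0.2]   # an arbitrary distribution outside the embedding
found = closer_members(f, g, q)
print(found[0], found[-1], len(found))
```

A grid search is a crude stand-in for the argument in the proof, which locates the right $\lambda$ from the intersection of $r(\lambda)$ and $K(\lambda,F)$; it is used here only because it needs no root-finding.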
According to Definition 2.2, the above theorem implies that the mid-point
$M$ of $F$ and $G$ belongs to $\mathcal{P}$ whenever the former exists. The
following corollaries give conditions for the existence of $M$.
Corollary 4.1. (i) If $F$ and $G$ are mutually absolutely continuous, $M$
exists and equals $P_\lambda$ for some unique $\lambda$ in $(0,1)$. (ii) If $F$ and $G$ are
mutually singular, $M$ does not exist. (iii) $M$ is unique whenever it exists.
Proof. Assertion (i) follows from the fact that if $F$ and $G$ are mutually
absolutely continuous and distinct from each other, then $K(0,F) = K(1,G) = 0$,
and both $K(\lambda,F)$ and $K(\lambda,G)$ are strictly monotone for $0 \le \lambda \le 1$.
Assertion (ii) is immediate from Theorem 4.1 since $\mathcal{P} = \{F,G\}$ if $F$ and $G$
are mutually singular. To prove assertion (iii), suppose that $F$ and $G$ are
not mutually singular and $M$ exists. If there are $\lambda_1 \ne \lambda_2$ in $[0,1]$ such
that $P_{\lambda_1}$ and $P_{\lambda_2}$ are both mid-points of $F$ and $G$, then
$$K(\lambda_1,F) = K(\lambda_1,G) = K(\lambda_2,F) = K(\lambda_2,G)$$
and it follows from (3.4) that $g(x)/f(x)$ is constant a.e. ($\mu$) on $A$. This
implies that $P_\lambda = P_0$ for all $0 \le \lambda \le 1$ and hence that $M$ is unique.
Corollary 4.2. Suppose $F$ and $G$ are not mutually singular. Then the mid-point
$M$ exists if and only if
$$J(\lambda) = 0 \quad\text{for some } 0 \le \lambda \le 1, \tag{4.2}$$
in which case $M = P_\lambda$.
Proof. According to Theorem 4.1, $M$ exists if and only if
$$K(\lambda,F) = K(\lambda,G) < \infty \quad\text{for some } 0 \le \lambda \le 1. \tag{4.3}$$
It is clear from (3.2) and (3.3) that this is equivalent to (4.2).
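Condition (4.2) suggests a simple way to locate the mid-point numerically: find the zero of $J(\lambda)$. The sketch below uses bisection and assumes a finite sample space with strictly positive, distinct densities, so that $J(0) < 0 < J(1)$ and the sign of $J$ changes exactly once by (3.4); the function names and example vectors are illustrative.

```python
import math

def J(f, g, lam):
    """J(lam) of (2.4) for strictly positive discrete densities."""
    return sum(math.log(gi / fi) * gi ** lam * fi ** (1 - lam)
               for fi, gi in zip(f, g))

def midpoint_lambda(f, g, tol=1e-12):
    """Bisection for the root of J on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if J(f, g, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kl_to_ends(f, g, lam):
    """Return (K(lam, F), K(lam, G)) computed directly from p_lam."""
    w = [gi ** lam * fi ** (1 - lam) for fi, gi in zip(f, g)]
    s = sum(w)
    p = [wi / s for wi in w]
    KF = sum(pi * math.log(pi / fi) for pi, fi in zip(p, f))
    KG = sum(pi * math.log(pi / gi) for pi, gi in zip(p, g))
    return KF, KG

# A symmetric pair, for which the mid-point should sit at lam = 1/2.
lam_sym = midpoint_lambda([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])

# An asymmetric pair.
f, g = [0.5, 0.3, 0.2], [0.1, 0.3, 0.6]
lam_mid = midpoint_lambda(f, g)
KF_mid, KG_mid = kl_to_ends(f, g, lam_mid)
print(lam_sym, lam_mid, KF_mid, KG_mid)
```

At the computed $\lambda$ the two information numbers agree to numerical precision, as (4.3) requires.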
Corollary 4.1 states that mutual singularity of $F$ and $G$ is a
sufficient condition for the non-existence of the mid-point. The following
example shows that the condition is not necessary.
Example 4.1. Let $F$ be the uniform distribution on $(0,3)$ and $G$ be
uniform on $(1,2)$. Then $P_\lambda = G$ for all $0 \le \lambda \le 1$ and (4.3) does not hold
for any $\lambda$. There is thus no mid-point.
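The claim in Example 4.1 can be verified from (3.2) and (3.3) in closed form. The sketch below takes Lebesgue measure as the dominating measure, so that $f = 1/3$ and $g = 1$ on $A = (1,2)$; the function name `example_4_1` is hypothetical.

```python
import math

def example_4_1(lam):
    """Return (K(lam, F), K(lam, G)) for F = U(0, 3), G = U(1, 2)."""
    # On A = (1, 2): f = 1/3, g = 1, and A has Lebesgue measure 1.
    w = 1.0 ** lam * (1.0 / 3.0) ** (1 - lam)   # g^lam f^(1-lam), constant on A
    k = 1.0 / w                                  # k_lam of (2.2)
    J = math.log(3.0) * w                        # J(lam) = log(g/f) * w * |A|
    K_F = math.log(k) + lam * k * J              # relation (3.2)
    K_G = math.log(k) - (1 - lam) * k * J        # relation (3.3)
    return K_F, K_G

values = [example_4_1(lam) for lam in (0.0, 0.3, 0.7, 1.0)]
print(values)
```

For every $\lambda$ the pair equals $(\log 3, 0)$ up to rounding, so the two graphs never meet and (4.3) fails.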
The $P_\lambda$ (or $G$) in this example is "minimax" according to Definition
2.3. It turns out that minimax distributions exist always. Uniqueness may be
lost but only in trivial cases. This is made explicit in the next corollary.
Corollary 4.3. (i) A minimax distribution always exists. (ii) If $F$ and
$G$ are not mutually singular, the minimax distribution is unique. (iii) If
$F$ and $G$ are mutually singular, every distribution is minimax. (iv) Every
mid-point is unique minimax.
Proof. Since every mid-point is minimax by definition, assertion (iv) is
immediate from Corollary 4.1. It remains to prove assertions (i) - (iii) only
for the case when the mid-point does not exist. To prove assertion (ii),
suppose that $F$ and $G$ are not mutually singular. It is clear from (4.3)
that the mid-point does not exist if and only if the graphs of $K(\lambda,F)$ and
$K(\lambda,G)$ fail to intersect in $[0,1]$. But from (3.4) either
$$(d/d\lambda)K(\lambda,F) > 0 \text{ and } (d/d\lambda)K(\lambda,G) < 0 \quad\text{for all } 0 < \lambda < 1 \tag{4.4}$$
or
$$(d/d\lambda)K(\lambda,F) = (d/d\lambda)K(\lambda,G) = 0 \quad\text{for all } 0 < \lambda < 1. \tag{4.5}$$
Therefore either $K(0,F) > K(0,G)$ or $K(1,F) < K(1,G)$. Assume, without loss
of generality, that $K(0,F) > K(0,G)$. First suppose that (4.4) holds. Then
for all $0 < \lambda \le 1$
$$\max(K(0,F), K(0,G)) < \max(K(\lambda,F), K(\lambda,G)). \tag{4.6}$$
If $K(G,F) < \infty$, then $G \ll F$, $P_1 = G$ and (4.6) yields
$$\max(K(0,F), K(0,G)) < \max(K(G,F), K(G,G)). \tag{4.7}$$
Clearly (4.7) is trivially true also if $K(G,F) = \infty$. A similar argument shows
that
$$\max(K(0,F), K(0,G)) \le \max(K(F,F), K(F,G)) \tag{4.8}$$
with equality if and only if $P_0 = F$. Now (4.6) - (4.8) shows that $P_0$
uniquely minimizes $\max(K(P,F), K(P,G))$ over all $P \in \mathcal{P}$. We conclude from
Theorem 4.1 that $P_0$ is unique minimax for $F$ and $G$. If instead (4.5)
obtains, then as the proof of Corollary 4.1 shows, $P_\lambda = P_0$ for all
$0 \le \lambda \le 1$. Further $K(0,F) > K(0,G) \ge 0$. Since (4.7) and (4.8) are trivially
satisfied, $P_0$ is again unique minimax. This completes the proof of
assertion (ii). Assertion (iii) follows from observing that if $F$ and $G$
are mutually singular, then at least one of $K(Q,F)$ and $K(Q,G)$ is infinite
for any distribution $Q$. Assertion (i) is a consequence of (ii) - (iv).
5. Examples. We end this discussion with two examples.
Example 5.1 (Binomial). Let $F$ be $\mathrm{Bin}(n,p_1)$ (binomial with $n$ trials and
success probability $p_1$) and $G$ be $\mathrm{Bin}(n,p_2)$. Write $q_i = 1 - p_i$. Then every
member in $\mathcal{P}$ is binomial and the mid-point $M$ is $\mathrm{Bin}(n,p)$ where
$$p = \log(q_2/q_1) / \log(p_1 q_2 / (p_2 q_1)).$$
This formula applies and yields $p$ between $p_1$
and $p_2$ only when neither $p_1$ nor $p_2$ is 0 or 1. If $p_1 = 0$
and $0 < p_2 < 1$ for example, the formula gives $p = 0$. The reason for this
strange result is that here there is no mid-point since $P_\lambda = F$ for every $\lambda$. It can be
shown that if both $p_1$ and $p_2$ are neither 0 nor 1, then $p$ lies between $p_1$ and $p_2$,
as expected. The formula for $p$ suggests a new way of "scaling" the
binomial family.
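The mid-point formula of Example 5.1 is easy to check numerically. In the sketch below the helper names are hypothetical, and the information number is computed for a single trial, since the $n$-trial value is $n$ times as large and the factor cancels in the mid-point condition.

```python
import math

def binomial_midpoint(p1, p2):
    """Success probability p of the mid-point; needs 0 < p1, p2 < 1 and p1 != p2."""
    q1, q2 = 1 - p1, 1 - p2
    return math.log(q2 / q1) / math.log(p1 * q2 / (p2 * q1))

def kl_bernoulli(p, r):
    """K(Bin(1, p), Bin(1, r)) for 0 < p, r < 1."""
    return p * math.log(p / r) + (1 - p) * math.log((1 - p) / (1 - r))

p1, p2 = 0.2, 0.7
p = binomial_midpoint(p1, p2)
gap = kl_bernoulli(p, p1) - kl_bernoulli(p, p2)   # should vanish at the mid-point
p_sym = binomial_midpoint(0.3, 0.7)               # symmetric case
print(p, gap, p_sym)
```

For the symmetric pair $p_1 = 0.3$, $p_2 = 0.7$ the formula returns one half, as symmetry demands.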
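Example 5.2 (Normal). Let $F$ be $N(\theta_1, \sigma_1^2)$ (normal with mean $\theta_1$ and
variance $\sigma_1^2$) and $G$ be $N(\theta_2, \sigma_2^2)$. Then the members of $\mathcal{P}$ are also normal
distributions. If $\sigma_1 = \sigma_2 = \sigma$, $M$ is $N(\tfrac{1}{2}(\theta_1 + \theta_2), \sigma^2)$, and if $\theta_1 = \theta_2 = \theta$,
$M$ is $N(\theta, \sigma^2)$ where
$$\sigma^2 = \sigma_1^2 \sigma_2^2 \log(\sigma_2^2/\sigma_1^2) / (\sigma_2^2 - \sigma_1^2).$$
It can be verified that $\sigma$ always lies between $\sigma_1$ and $\sigma_2$.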
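Both cases of Example 5.2 can be checked against the closed-form information number between two normal distributions. The sketch below uses the standard formula for that quantity; the function names and the numerical values are illustrative.

```python
import math

def kl_normal(m0, v0, m1, v1):
    """K(N(m0, v0), N(m1, v1)), where v0 and v1 are variances."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)

def midpoint_variance(v1, v2):
    """Variance of the mid-point when the two means coincide and v1 != v2."""
    return v1 * v2 * math.log(v2 / v1) / (v2 - v1)

# Equal means, unequal variances.
v1, v2 = 1.0, 4.0
v = midpoint_variance(v1, v2)
gap_var = kl_normal(0.0, v, 0.0, v1) - kl_normal(0.0, v, 0.0, v2)

# Equal variances, unequal means: the mid-point mean is the average.
t1, t2, s2 = 0.0, 3.0, 2.0
gap_mean = kl_normal(0.5 * (t1 + t2), s2, t1, s2) - kl_normal(0.5 * (t1 + t2), s2, t2, s2)
print(v, gap_var, gap_mean)
```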
Acknowledgement. The author is grateful to E. L. Lehmann for many helpful
comments.
REFERENCES
[1] Atkinson, A. C. (1970). A method for discriminating between models. J.
Roy. Statist. Soc. (B) 32, 323-353.
[2] Brown, L. D. (1971). Non-local asymptotic optimality of appropriate
likelihood ratio tests. Ann. Math. Statist. 42, 1206-1240.
[3] Cox, D. R. (1961). Tests of separate families of hypotheses. Proc.
Fourth Berkeley Symp. 1, 105-123.
[4] Csiszar, I. (1975). I-divergence geometry of probability distributions
and minimization problems. Ann. Probability 3, 146-158.
[5] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York.