Journal of Statistical Planning and Inference 150 (2014) 66–83
Inference on the eigenvalues of the covariance matrix of a multivariate normal distribution—Geometrical view
Yo Sheena, Faculty of Economics, Shinshu University, Japan
Article info
Article history: Received 22 October 2013; Received in revised form 5 March 2014; Accepted 11 March 2014; Available online 20 March 2014
MSC: Primary 62H05; Secondary 62F12
Keywords: Curved exponential family; Information loss; Fisher information metric; Embedding curvature; Affine connection; Positive definite matrix
http://dx.doi.org/10.1016/j.jspi.2014.03.004
Abstract

We consider inference on the eigenvalues of the covariance matrix of a multivariate normal distribution. The family of multivariate normal distributions with a fixed mean is seen as a Riemannian manifold with the Fisher information metric. Two submanifolds naturally arise: one is given by fixing the eigenvectors of the covariance matrix; the other by fixing the eigenvalues. We analyze the geometrical structures of these manifolds, such as the metric and the embedding curvature under the e-connection or m-connection. Based on these results, we study (1) the bias of the sample eigenvalues, (2) the asymptotic variance of estimators, (3) the asymptotic information loss caused by neglecting the sample eigenvectors, and (4) the derivation of a new estimator that is natural from a geometrical point of view.

© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Consider a normal distribution with zero mean and an unknown covariance matrix, $N(0,\Sigma)$. Denote the eigenvalues of $\Sigma$ by
$$\lambda = (\lambda_1,\ldots,\lambda_p), \qquad \lambda_1 > \cdots > \lambda_p,$$
and the eigenvector matrix by $\Gamma$; hence we have the spectral decomposition
$$\Sigma = \Gamma \Lambda \Gamma^t, \qquad \Lambda = \mathrm{diag}(\lambda), \tag{1}$$
where $\mathrm{diag}(\lambda)$ denotes the diagonal matrix whose $i$th diagonal element is $\lambda_i$. Needless to say, inference on $\Sigma$ is an important task in many practical situations across fields such as engineering, biology, chemistry, finance, and psychology. In particular, we often encounter cases where the property of interest depends on $\Sigma$ only through its eigenvalues $\lambda$. We treat an inference problem on the eigenvalues $\lambda$ from a geometrical point of view.
Treating the family of normal distributions $N(\mu,\Sigma)$ ($\mu$ not necessarily zero) as a Riemannian manifold has been done by several authors; see, for example, Fletcher and Joshi (2007), Lenglet et al. (2006), Skovgaard (1984), Smith (2005), and Yoshizawa and Tanabe (1999). When $\mu$ equals zero, the family of normal distributions $N(0,\Sigma)$ can be taken as a manifold (say $S$) with a single coordinate system $\Sigma$. Hence $S$ is identified with the space of symmetric positive definite matrices. Geometrically analyzing the space of symmetric positive definite matrices has been an interesting topic from a mathematical or engineering point of view. Refer to Moakher and Zéraï (2011), Ohara et al. (1996) and Zhang et al. (2009) as well as the above literature.
In this paper, we analyze $S$ from the standpoint of information geometry while focusing on inference on the eigenvalues of $\Sigma$. The paper aims to contribute in two regards: (1) the geometrical structure of $S$ is analyzed in view of the eigenvalues and eigenvectors of $\Sigma$; (2) some statistical problems on the inference for $\lambda$ are explained in geometrical terms.
We summarize the inference problem for $\lambda$. Based on $n$ independent samples $x_i = (x_{i1},\ldots,x_{ip})^t$, $i=1,\ldots,n$, from $N(0,\Sigma)$, we want to make inference on the unknown $\lambda$. We confine ourselves to the classical case where $n \ge p$. It is well known that the product-sum matrix
$$S = \sum_{i=1}^{n} x_i x_i^t$$
is a sufficient statistic for both the unknown $\lambda$ and $\Gamma$. The spectral decomposition of $S$ is given by
$$S = H L H^t, \qquad L = \mathrm{diag}(l),$$
where
$$l = (l_1,\ldots,l_p), \qquad l_1 > \cdots > l_p > 0 \quad \text{a.e.}$$
are the eigenvalues of $S$, and $H$ is the corresponding eigenvector matrix. This decomposition gives us two available statistics, namely the sample eigenvalues $l$ and the sample eigenvectors $H$. However, it is almost customary to use only the sample eigenvalues, discarding the information contained in $H$. In the past literature on inference for the population eigenvalues, every notable estimator is based simply on the sample eigenvalues. See Takemura (1984), Dey and Srinivasan (1985), Haff (1991), and Yang and Berger (1994) for orthogonally invariant estimators of $\Sigma$; Dey (1988), Hydorn and Muirhead (1999), Jin (1993), and Sheena and Takemura (2011) for direct estimators of $\lambda$. Since we do not have enough space to state the concrete form of each estimator, we just mention Stein's estimator as a pioneering "shrinkage" estimator of $\Sigma$. In general, an orthogonally invariant estimator of $\Sigma$ is given by
$$\hat\Sigma = H \Phi H^t, \qquad \Phi = \mathrm{diag}(\phi_1(l),\ldots,\phi_p(l)). \tag{2}$$
The estimator of $\lambda$ is given by the eigenvalues of $\hat\Sigma$, that is, $(\phi_1(l),\ldots,\phi_p(l))$. The sample covariance matrix (the M.L.E.) $\bar S := n^{-1}S$ gives the estimator of $\lambda$ as $\phi_i(l) = n^{-1} l_i$, $i=1,\ldots,p$, while Stein's "shrinkage" estimator gives
$$\phi_i(l) = l_i/(n+p+1-2i), \qquad i=1,\ldots,p. \tag{3}$$
Stein's estimator assigns a lighter (heavier) weight to the larger (smaller) sample eigenvalues; hence the dispersion of $l$ is shrunk. This estimator is quite simple and performs much better than the M.L.E. (see Dey and Srinivasan, 1985). Unlike Stein's estimator, many estimators in the above literature are not explicitly given, or are too complicated for immediate use. Nonetheless they all share one common feature: the derived estimators of $\lambda$ depend only on $l$.
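As an illustration, Stein's weights in (3) are straightforward to compute from data. The following is a minimal sketch (the function name and the simulated data are ours, not the paper's):

```python
import numpy as np

def stein_eigenvalue_estimates(X):
    """Eigenvalue estimates from Stein's orthogonally invariant
    estimator, Eq. (3): phi_i = l_i / (n + p + 1 - 2i)."""
    n, p = X.shape
    S = X.T @ X                               # product-sum matrix
    l = np.sort(np.linalg.eigvalsh(S))[::-1]  # l_1 > ... > l_p
    i = np.arange(1, p + 1)
    return l / (n + p + 1 - 2 * i)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(2), np.diag([4.0, 1.0]), size=50)
print(stein_eigenvalue_estimates(X))
```

Because the denominator decreases with $i$, the ratio of the largest to the smallest estimate is strictly smaller than for the raw eigenvalues, which is exactly the shrinkage effect described above.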
In a sense it is natural to implicitly associate the sample eigenvalues with the population eigenvalues, and the sample eigenvectors with their population counterpart. However, the sample eigenvalues alone are not sufficient for the unknown population eigenvalues. Therefore it is important to evaluate how much information is lost by neglecting the sample eigenvectors. Following Amari (1982), we gain an understanding of the asymptotic information loss in geometric terms such as the Fisher information metric and embedding curvatures.
Another statistically interesting topic is the bias of $n^{-1}l$. It is well known that $n^{-1}l$ is heavily biased, and the estimators mentioned above are all modifications of $n^{-1}l$ that correct the bias, that is, "shrinkage estimators." We show that the bias is closely related to the embedding curvatures. Moreover, the geometric structure of $S$ naturally leads us to a new estimator, which is also a shrinkage estimator.
The organization of this paper is as follows. In the former part (Sections 2 and 3), we describe the geometrical structure of $S$ in view of the spectral decomposition (1). In Section 2, we observe $S$ as a Riemannian manifold endowed with the Fisher information metric. In Section 3, we treat two submanifolds of $S$: a submanifold given by the fixed eigenvectors and one given by the fixed eigenvalues. The embedding curvatures of these submanifolds are explicitly given. We show that the bias of $l$ is closely related to the curvatures. In the latter part (Sections 4 and 5), we consider the estimation problem of $\lambda$. In Section 4, we describe the asymptotic variance of estimators when $\Gamma$ is known (Section 4.1) and the asymptotic information loss caused by discarding the sample eigenvectors $H$ (Section 4.2). The asymptotic information loss can be measured by the difference in asymptotic variance between two certain estimators. In Section 5, for the case when $\Gamma$ is unknown, we propose a new estimator of $\lambda$, which is naturally derived from a geometric point of view. In the last section, some comments are made on further research. All proofs are collected in the Appendix.
Unfortunately we do not have enough space to explain the geometrical concepts used in this paper. Refer to Boothby (2002), Amari (1985), and Amari and Nagaoka (2000).
2. Riemannian manifold and metric
The density of the normal distribution $N(0,\Sigma)$ is given by
$$f_\Sigma(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left(-\tfrac12 x^t \Sigma^{-1} x\right), \qquad x = (x_1,\ldots,x_p) \in \mathbb{R}^p.$$
If we let $\sigma_{ij}$ and $\sigma^{ij}$ denote the $(i,j)$ elements of $\Sigma$ and $\Sigma^{-1}$ respectively, then the log-likelihood equals
$$\log f_\Sigma(x) = \sum_i x_i^2 (-\sigma^{ii}/2) + \sum_{i<j} x_i x_j (-\sigma^{ij}) - (p/2)\log 2\pi - (1/2)\log|\Sigma| = \sum_i y_{ii}\theta^{ii} + \sum_{i<j} y_{ij}\theta^{ij} - \psi(\Theta) \quad (\text{say } l(y,\Theta)), \tag{4}$$
where $\Theta = (\theta^{ij})_{i\le j}$ and $y = (y_{ij})_{i\le j}$ are given by
$$\theta^{ii} = (-1/2)\sigma^{ii}, \quad i=1,\ldots,p; \qquad \theta^{ij} = -\sigma^{ij}, \quad 1\le i<j\le p;$$
$$y_{ii} = x_i^2, \quad i=1,\ldots,p; \qquad y_{ij} = x_i x_j, \quad 1\le i<j\le p, \tag{5}$$
and
$$\psi(\Theta) = (p/2)\log 2\pi + (1/2)\log|\Sigma(\Theta)|. \tag{6}$$
The summations $\sum_i$ and $\sum_{i<j}$ in Eq. (4) are abbreviations for $\sum_{i=1}^p$ and $\sum_{1\le i<j\le p}$ respectively, and we will use these kinds of notations implicitly hereafter.
The expression (4) gives the natural coordinate system $\Theta$ of the manifold $S$ as a full exponential family. Another coordinate system, the so-called expectation parameters, is also useful; it is defined as
$$\sigma_{ij} = E(y_{ij}), \qquad 1\le i\le j\le p. \tag{7}$$
For the analysis of the information carried by $l$ and $H$, we need to prepare another coordinate system. The matrix exponential expression of an orthogonal matrix $O$ is given by
$$O = \exp U = I_p + U + \tfrac{1}{2}U^2 + \tfrac{1}{3!}U^3 + \cdots, \tag{8}$$
where $I_p$ is the $p$-dimensional unit matrix and $U$ is a skew-symmetric matrix parametrized by $u = (u_{ij})_{1\le i<j\le p}$ as
$$(U)_{ij} = \begin{cases} u_{ij} & \text{if } 1\le i<j\le p,\\ -u_{ij} & \text{if } 1\le j<i\le p,\\ 0 & \text{if } 1\le i=j\le p. \end{cases}$$
The function $\exp U$ is diffeomorphic, and $u$ gives a "normal coordinate" for the group of orthogonal matrices (see Boothby, 2002, (6.7), or Muirhead, 1982, Theorem A9.11). We can use this coordinate as a local system around $I_p$ and construct an atlas for the entire space of $p$-dimensional orthogonal matrices (note that this space is compact); for each $\Gamma$, there exist an open neighborhood and some open ball $B$ in $\mathbb{R}^{p(p-1)/2}$ around the origin such that these spaces are diffeomorphic under the function $\Gamma \exp U(u)$ on $B$.
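The parametrization (8) is easy to realize numerically. Below is a minimal sketch assuming NumPy; `skew_from_u` and `orthogonal_from_u` are hypothetical helper names, and the exponential series is simply truncated:

```python
import numpy as np

def skew_from_u(u, p):
    """Skew-symmetric U of Eq. (8) built from coordinates u = (u_ij), i < j."""
    U = np.zeros((p, p))
    U[np.triu_indices(p, k=1)] = u
    return U - U.T

def orthogonal_from_u(u, p, terms=30):
    """exp U via the truncated series I_p + U + U^2/2! + ... of Eq. (8)."""
    U = skew_from_u(u, p)
    O = np.eye(p)
    term = np.eye(p)
    for k in range(1, terms):
        term = term @ U / k
        O = O + term
    return O
```

For $p=2$ the result is the plane rotation by the single coordinate $u_{(1,2)}$, and $O O^t = I_2$ up to truncation error, illustrating that $\exp U$ is orthogonal.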
We will use $(\lambda, u)$ as the third coordinate system of $S$ and call it the "spectral coordinate (system)". Notice that this coordinate system is associated with the following submanifolds of $S$. If we fix $\Gamma$ in (1), we get a submanifold $M(\Gamma)$ embedded in $S$ with coordinate system $\lambda$. This is a subfamily of $N(0,\Sigma)$, called a curved exponential family. Its log-likelihood is expressed, emphasizing it as a function of $\lambda$, as
$$l(y,\Theta(\lambda)) = \sum_i y_{ii}\theta^{ii}(\lambda) + \sum_{i<j} y_{ij}\theta^{ij}(\lambda) - \psi(\Theta(\lambda)). \tag{9}$$
Conversely, if we fix $\lambda$ in (1), we get another submanifold $A(\lambda)$ of $S$, whose coordinate system is given by $u$ in a neighborhood of each point of $A(\lambda)$. Its log-likelihood expression is given by
$$l(y,\Theta(u)) = \sum_i y_{ii}\theta^{ii}(u) + \sum_{i<j} y_{ij}\theta^{ij}(u) - \psi(\Theta(u)). \tag{10}$$
First we consider a metric, that is, a field of symmetric, positive definite, bilinear forms on $S$. The statistically most natural metric is the Fisher information metric. Suppose $\{f(x,\theta)\}$ is a parametric family of probability density functions whose coordinate as a manifold is given by $\theta = (\theta_1,\ldots,\theta_p)$. Then the $(i,j)$ component of the Fisher information metric with respect to $\theta$ is given by
$$E_\theta\left[\frac{\partial}{\partial\theta_i}\log f(x,\theta)\,\frac{\partial}{\partial\theta_j}\log f(x,\theta)\right].$$
For the multivariate normal distribution family $N(\mu,\Sigma)$ (with the mean parameter $\mu$ also included), Skovgaard (1984) gives a clear form of the Fisher information metric. The tangent vector space at a fixed point $\Sigma$ w.r.t. the $(\sigma_{ij})_{i\le j}$ coordinate can be identified with the space of symmetric matrices. For any symmetric matrices $A$, $B$, the metric with respect to the $\Sigma = (\sigma_{ij})$ coordinate system is given by
$$\tfrac12 \operatorname{tr}\left(\Sigma^{-1} A\, \Sigma^{-1} B\right). \tag{11}$$
We are interested in the Fisher information metric with respect to the spectral coordinate $(\lambda, u)$. Let $\partial_a, \partial_b, \ldots$ denote the tangent vectors w.r.t. the $\lambda$ coordinate, and $\partial_{(s,t)}, \partial_{(u,v)}, \ldots$ the tangent vectors w.r.t. the $u$ coordinate; namely
$$\partial_a := \frac{\partial}{\partial\lambda_a}, \qquad \partial_{(s,t)} := \frac{\partial}{\partial u_{st}}.$$
These tangent vectors (strictly speaking, vector fields) are invariant with respect to any orthogonal transformation of $\Sigma$. For an orthogonal matrix $O$, an orthogonal transformation $F$ of $S$ is defined as
$$F(\Sigma) = O \Sigma O^t. \tag{12}$$
For any $O$,
$$F_*(\partial_a) = \partial_a, \qquad 1\le a\le p, \tag{13}$$
$$F_*(\partial_{(s,t)}) = \partial_{(s,t)}, \qquad 1\le s<t\le p, \tag{14}$$
where $F_*$ is the derivative of $F$.
Proposition 1. Let $\langle\cdot,\cdot\rangle$ denote the Fisher information metric based on $x \sim N(0,\Sigma)$. Then the components of the metric with respect to $(\lambda,u)$ are given as follows:
$$g_{ab} := \langle\partial_a,\partial_b\rangle = (1/2)\lambda_a^{-2}\,\delta(a=b), \qquad 1\le a,b\le p,$$
$$g_{a(s,t)} := \langle\partial_a,\partial_{(s,t)}\rangle = 0, \qquad 1\le a\le p,\ 1\le s<t\le p,$$
$$g_{(s,t)(u,v)} := \langle\partial_{(s,t)},\partial_{(u,v)}\rangle = (\lambda_s-\lambda_t)^2 \lambda_s^{-1}\lambda_t^{-1}\,\delta((s,t)=(u,v)), \qquad 1\le s<t\le p,\ 1\le u<v\le p.$$
Here $\delta(\cdot)$ equals one if the statement inside the parentheses is true, and zero otherwise.
There are two remarkable properties of the metric in the spectral coordinate. First, since the matrix of metric components is diagonal, $(\lambda,u)$ is an orthogonal coordinate system; in particular, the submanifolds $M(\Gamma)$ and $A(\lambda)$ are orthogonal to each other for any $\lambda$ and $\Gamma$. Second, the metric is independent of $\Gamma$; hence it stays constant under the orthogonal transformation $F$ in (12) for any orthogonal matrix $O$. (The second property is instantly derived from the expression (11).)
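The components in Proposition 1 can be checked numerically from the matrix form (11) of the metric, using the derivatives of $\Sigma = (\Gamma \exp U)\,\Lambda\,(\Gamma \exp U)^t$ at $u=0$ with $\Gamma = I$ as tangent directions. A sketch for $p=2$ (function and variable names are ours):

```python
import numpy as np

def fisher_metric(Sigma, A, B):
    """Fisher information metric (11): (1/2) tr(Sigma^{-1} A Sigma^{-1} B)."""
    Si = np.linalg.inv(Sigma)
    return 0.5 * np.trace(Si @ A @ Si @ B)

lam = np.array([3.0, 1.0])
Sigma = np.diag(lam)                       # the point (lambda, Gamma) with Gamma = I

# Tangent vector d_1: derivative of Sigma w.r.t. lambda_1
dLam1 = np.diag([1.0, 0.0])
# Tangent vector d_(1,2): derivative of (exp U) Lambda (exp U)^t at u = 0,
# which equals U Lambda - Lambda U for U = [[0, 1], [-1, 0]]
U = np.array([[0.0, 1.0], [-1.0, 0.0]])
dU = U @ Sigma - Sigma @ U

g11 = fisher_metric(Sigma, dLam1, dLam1)   # Proposition 1: (1/2) lambda_1^{-2}
g_uu = fisher_metric(Sigma, dU, dU)        # (lambda_1 - lambda_2)^2 / (lambda_1 lambda_2)
g_cross = fisher_metric(Sigma, dLam1, dU)  # 0: the coordinates are orthogonal
```

The three computed values agree with the three formulas of Proposition 1 at this point.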
Theoretically, other metrics could naturally be implemented. Calvo and Oller (1990) introduced the Siegel metric. Lovrić et al. (2000) considered the natural invariant metric from the standpoint of Riemannian symmetric spaces. The concrete forms of both metrics are given by (3.4) and (3.2) in Lovrić et al. (2000). (The information metric (11) corresponds to (3.3) in Lovrić et al., 2000; see also Theorem 1 of Zhang et al., 2009.)
Once a metric is given on the manifold $S$, a connection is needed for further geometrical analysis. A connection is an important "rule" that defines how a tangent space is shifted under an infinitesimal move in a differentiable manifold. Although connections admit infinite variety, the most commonly used one is the Levi-Civita connection. It is characterized as the unique torsion-free, metric-preserving connection. This connection is essential for considering a distance function on the manifold. Skovgaard (1984), Calvo and Oller (1990), Fletcher and Joshi (2007), Lenglet et al. (2006), Lovrić et al. (2000), and Moakher and Zéraï (2011) analyze the manifold of normal distributions under the Levi-Civita connection.
On the other hand, Amari (1982) showed that the "α-connection" is suitable for statistical manifolds in general. He also found that the e-connection (α=1) and the m-connection (α=−1) are especially important for the asymptotic analysis of information loss in a curved exponential family. Amari and Kumon (1983), Kumon and Amari (1983) and Eguchi (1985) gave further developments along this line. Specifically in relation to the multivariate normal distribution or $S$, Ohara et al. (1996), Yoshizawa and Tanabe (1999) and Zhang et al. (2009) considered the dual geometry (α- and −α-connections) of the manifolds. Notice that the Levi-Civita connection is the 0-connection, the "mean" of the e-connection and m-connection. Therefore, using the results on the geometric properties of $S$ under the e-connection and m-connection, we can also derive those under the Levi-Civita connection.
Since this paper is aimed at statistical inference on $\Sigma$, we adopt α-connections, especially the e- and m-connections, hereafter. We conclude this section by mentioning the important fact that $S$ is e-flat and m-flat, with corresponding affine coordinates given respectively by $(\sigma^{ij})$ and $(\sigma_{ij})$.
3. Embedding curvatures
Curvature, an important property for geometrical analysis, is defined based on a given connection. A submanifold has both intrinsic and extrinsic curvatures. The latter describes how the submanifold is placed in the ambient manifold, and is called an embedding curvature or the second fundamental form. (The first fundamental form is the metric.)
In this section, we observe the embedding curvatures of $M$ and $A$ for the analysis of the distribution of $(l,H)$. Specifically, we consider the following embedding curvatures:
1. The embedding curvature of $M$ with respect to the e-connection or m-connection. Its components w.r.t. the spectral coordinate are given by
$$H^e_{ab(s,t)} := \langle \nabla^e_{\partial_a}\partial_b,\ \partial_{(s,t)}\rangle, \qquad H^m_{ab(s,t)} := \langle \nabla^m_{\partial_a}\partial_b,\ \partial_{(s,t)}\rangle, \tag{15}$$
where $\nabla^e_{\partial_a}\partial_b$ is the covariant derivative of $\partial_b$ in the direction of $\partial_a$ with respect to the e-connection; $\nabla^m_{\partial_a}\partial_b$ is similarly defined.

2. The embedding curvature of $A$ with respect to the m-connection. Its components w.r.t. the spectral coordinate are given by
$$H^m_{(s,t)(u,v)a} := \langle \nabla^m_{\partial_{(s,t)}}\partial_{(u,v)},\ \partial_a\rangle, \tag{16}$$
where $\nabla^m_{\partial_{(s,t)}}\partial_{(u,v)}$ is the covariant derivative of $\partial_{(u,v)}$ in the direction of $\partial_{(s,t)}$ with respect to the m-connection.
For these curvatures at the point $(\lambda,\Gamma)$, we have the following results.
Proposition 2. For $1\le a,b\le p$, $1\le s<t\le p$,
$$H^e_{ab(s,t)} = H^m_{ab(s,t)} = 0. \tag{17}$$
For $1\le a\le p$, $1\le s<t\le p$, $1\le u<v\le p$,
$$H^m_{(s,t)(u,v)a} = \begin{cases} \lambda_a^{-2}(\lambda_t-\lambda_a) & \text{if } s=u=a,\ t=v,\\ \lambda_a^{-2}(\lambda_s-\lambda_a) & \text{if } s=u,\ t=v=a,\\ 0 & \text{otherwise}. \end{cases} \tag{18}$$
Another expression of the embedding curvature of $A$ is given by
$$H^{am}_{(s,t)(u,v)} := \sum_b H^m_{(s,t)(u,v)b}\, g^{ba}. \tag{19}$$
With this notation, the orthogonal projection of the covariant derivative $\nabla^m_{\partial_{(s,t)}}\partial_{(u,v)}$ onto the tangent space of $M$ is given by
$$\sum_a H^{am}_{(s,t)(u,v)}\,\partial_a.$$
From Propositions 1 and 2, we have
$$H^{am}_{(s,t)(u,v)} = 2(\lambda_t-\lambda_a)\,\delta(s=u=a,\ t=v) + 2(\lambda_s-\lambda_a)\,\delta(s=u,\ t=v=a); \tag{20}$$
hence
$$\sum_a H^{am}_{(s,t)(u,v)}\,\partial_a = \begin{cases} 2(\lambda_t-\lambda_s)\partial_s + 2(\lambda_s-\lambda_t)\partial_t & \text{if } (s,t)=(u,v),\\ 0 & \text{otherwise}. \end{cases}$$
Similarly, another embedding curvature component $H^{(s,t)e}_{ab}$ is defined as
$$H^{(s,t)e}_{ab} = \sum_{u<v} H^e_{ab(u,v)}\, g^{(u,v)(s,t)}, \tag{21}$$
and it actually vanishes:
$$H^{(s,t)e}_{ab} = 0, \qquad 1\le a,b\le p,\ 1\le s<t\le p. \tag{22}$$
An embedding curvature has full information about the "extrinsic curvature" of the embedded submanifold in any direction. Sometimes it is convenient to compress it into a scalar measure of curvature. The "statistical curvature" of Efron (see Efron, 1975; Murray and Rice, 1993) is such a measure. For $A$, it is defined by (see Amari, 1985, p. 159)
$$\gamma(A) := \sum_{1\le a,b\le p}\ \sum_{s<t,\ u<v,\ o<p,\ q<r} H^m_{(s,t)(u,v)a}\, H^m_{(o,p)(q,r)b}\, g^{(s,t)(o,p)}\, g^{(u,v)(q,r)}\, g^{ab},$$
which attains the following value at the point $(\lambda,\Gamma)$.
Corollary 1.
$$\gamma(A) = 2 \sum_{a<b} \frac{\lambda_a^2 + \lambda_b^2}{(\lambda_a - \lambda_b)^2}.$$
From these results, we notice that if $S$ is endowed with the m-connection, then (1) the embedding curvatures and the statistical curvature of $A$ are independent of $\Gamma$; (2) any one-parameter curve $(\lambda,\Gamma(u))$ parametrized by $u_{(s,t)}$, $s<t$, with $\lambda$ and the other elements of $u$ fixed, is curved in the direction of $\partial_t - \partial_s$ and contained in a two-dimensional plane spanned by $\partial_{(s,t)}$ and $\partial_t - \partial_s$; (3) the statistical curvature of $A$ can be quite large when the $\lambda$'s are close to each other, while $M$ is flat everywhere.
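Corollary 1 is easy to evaluate numerically; the sketch below (our naming) shows how $\gamma(A)$ blows up as the eigenvalues approach each other:

```python
import numpy as np

def statistical_curvature(lam):
    """gamma(A) from Corollary 1: 2 * sum_{a<b} (lam_a^2 + lam_b^2)/(lam_a - lam_b)^2."""
    lam = np.asarray(lam, dtype=float)
    p = len(lam)
    return 2.0 * sum((lam[a] ** 2 + lam[b] ** 2) / (lam[a] - lam[b]) ** 2
                     for a in range(p) for b in range(a + 1, p))

# gamma(A) diverges as the eigenvalues approach one another
print(statistical_curvature([1.0, 0.5]), statistical_curvature([1.0, 0.9]))
```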
Here we introduce another submanifold $\tilde A$, contrasting with $A$ in the sense that $\tilde A$ is flat with respect to the m-connection. For a point $(\lambda,\Gamma)$, let
$$\tilde A(\lambda,\Gamma) := \{\Sigma \in S \mid (\Gamma^t \Sigma \Gamma)_{ii} = \lambda_i,\ 1\le \forall i\le p\}.$$
We easily notice that $\tilde A$ consists of the minimum-distance points with respect to the Kullback-Leibler divergence. That is,
$$\tilde A(\lambda,\Gamma) = \{\Sigma \in S \mid \arg\min_{\tilde\lambda} KL(\Sigma,\ \Gamma\,\mathrm{diag}(\tilde\lambda_1,\ldots,\tilde\lambda_p)\,\Gamma^t) = \lambda\},$$
where $KL(\Sigma,\tilde\Sigma)$ is the Kullback-Leibler divergence between $N(0,\Sigma)$ and $N(0,\tilde\Sigma)$, which is specifically given by
$$\operatorname{tr}(\Sigma\tilde\Sigma^{-1}) - \log|\Sigma\tilde\Sigma^{-1}| - p.$$
The minimum-distance points with respect to the Kullback-Leibler divergence consist of all points on the m-geodesics which pass through the point $(\lambda,\Gamma)$ and are orthogonal to $M(\Gamma)$ at that point. (See the Theorem in A2 of Amari, 1982.)
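The divergence above and the projection property defining $\tilde A$ can be checked numerically; a sketch (our naming, with $\Gamma = I$ for simplicity; the stated form is twice the usual Kullback-Leibler divergence between $N(0,\Sigma)$ and $N(0,\tilde\Sigma)$):

```python
import numpy as np

def kl_divergence(Sigma, Sigma_t):
    """The paper's divergence: tr(Sigma Sigma_t^{-1}) - log|Sigma Sigma_t^{-1}| - p."""
    p = Sigma.shape[0]
    M = Sigma @ np.linalg.inv(Sigma_t)
    return np.trace(M) - np.log(np.linalg.det(M)) - p

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
# With Gamma = I, the minimizer over diag(lam_tilde) should be the diagonal of Sigma,
# i.e. lam_tilde_i = (Gamma^t Sigma Gamma)_ii, as in the definition of A-tilde.
lam_star = np.diag(Sigma).copy()
base = kl_divergence(Sigma, np.diag(lam_star))
```

Perturbing the candidate $\tilde\lambda$ away from the diagonal entries strictly increases the divergence, confirming the minimum-distance characterization.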
We can visualize the structure of $S$ endowed with the m-connection in the two-dimensional case. See Fig. 1, where $M_i := M(\Gamma_i)$, $i=1,\ldots,3$, $A_i := A(\lambda^i)$, $i=1,2$, and $\tilde A_1 := \tilde A(\lambda^1,\Gamma_1)$ are drawn. When $p=2$, $M$ is a two-dimensional autoparallel submanifold with the affine coordinate $(\lambda_1,\lambda_2)$, while $A$ is a one-dimensional submanifold with the coordinate $u_{(1,2)}$. As seen in Proposition 1, the tangent vectors $\partial_1 (:= \partial/\partial\lambda_1)$, $\partial_2 (:= \partial/\partial\lambda_2)$, $\partial_{(1,2)} (:= \partial/\partial u_{(1,2)})$ are all orthogonal to each other. $\tilde A$ is a "straight" line which is also orthogonal to $M$. The arrow on $M$ is the line $\{\lambda \mid \lambda_1+\lambda_2 \text{ is constant}\}$, and the arrowhead indicates the direction in which $c := \lambda_2/\lambda_1$ increases. The statistical curvature turns out to be an increasing function of $c$:
$$\gamma(A) = 2\,\frac{1+c^2}{(1-c)^2}.$$
We can analyze the bias of $\bar l_i := n^{-1}l_i$, $i=1,\ldots,p$, from the geometrical structure of $S$. It is well known that $E[\bar l_i]$ $(i=1,\ldots,p)$ majorizes $\lambda_i$ $(i=1,\ldots,p)$, that is,
$$\sum_{i=1}^{j} E[\bar l_i] \ge \sum_{i=1}^{j} \lambda_i, \quad 1\le \forall j\le p-1; \qquad \sum_{i=1}^{p} E[\bar l_i] = \sum_{i=1}^{p} \lambda_i. \tag{23}$$
The bias of $\bar l_i$ is quite large when $n$ is small and the $\lambda_i$'s are close to each other (see Lawley, 1956; Anderson, 1965). For the case $p=2$,
$$E[\bar l_1] \ge \lambda_1, \qquad E[\bar l_2] \le \lambda_2, \qquad E[\bar l_1] + E[\bar l_2] = \lambda_1 + \lambda_2. \tag{24}$$
Suppose a sample $\bar S := n^{-1}S$ takes a value at a point $s \in S$. Let $s_1$ denote the point on $M(\Gamma)$ designated by the eigenvalues of $\bar S$, namely $\bar l := (\bar l_1, \bar l_2)$. The curve $A(\bar l)$ connects $s$ and $s_1$. If we define $s_2$ as the point on $M(\Gamma)$ designated by $\bar\lambda := (\bar\lambda_1,\bar\lambda_2) := ((\Gamma^t \bar S\, \Gamma)_{11}, (\Gamma^t \bar S\, \Gamma)_{22})$, then $\tilde A(\bar\lambda,\Gamma)$ connects $s$ and $s_2$. The three points $s$, $s_1$ and $s_2$ are on the same plane, and if we move from $s_1$ in the direction of $s_2$, the statistical curvature of $A$ increases (see Fig. 2). If we estimate $(\lambda_1,\lambda_2)$ by $\bar l$, the estimate is the point $s_1$, while for the unbiased estimator $\bar\lambda$, the estimate is the point $s_2$. Since the $c$-coordinate of $s_1$ is always smaller than that of $s_2$, the estimator $(\bar l_1, \bar l_2)$ is likely to estimate $\lambda_1$ and $\lambda_2$ too far apart, which causes the bias (24). It is also seen that the bias gets larger as $c$ approaches one, that is, as $\lambda_1$ and $\lambda_2$ get closer to each other.

Fig. 1. Submanifolds of $S$ when $p=2$: $M$, $A$ and $\tilde A$.

Fig. 2. Horizontal perspective of $A$ and $\tilde A$ on the plane $M$ when $p=2$.
Though the exact magnitude of the bias $E(\bar l_a) - \lambda_a$ is hard to evaluate, the asymptotic bias can be evaluated. It can also be described with embedding curvatures (see Amari, 1985, (5.4)):
$$E[\bar l_a - \lambda_a] = -\frac{1}{2n}\, C_a + O(n^{-3/2}),$$
where
$$C_a = \sum_{c,d} \Gamma^{am}_{cd}\, g^{cd} + \sum_{s<t,\ u<v} H^{am}_{(s,t)(u,v)}\, g^{(s,t)(u,v)},$$
and $\Gamma^{am}_{cd}$ are the m-connection coefficients of $M$, defined by
$$\Gamma^{am}_{cd} = \Gamma^m_{cdb}\, g^{ba}, \qquad \Gamma^m_{cdb} := \langle \nabla^m_{\partial_c}\partial_d,\ \partial_b\rangle. \tag{25}$$
Since $M$ is autoparallel in the m-flat $S$,
$$\Gamma^{am}_{cd} = \Gamma^m_{cdb} = 0, \qquad 1\le a,b,c,d\le p. \tag{26}$$
Hence we have the following equation from Proposition 1 and (20):
$$C_a(\lambda) = \sum_{a<t} H^{am}_{(a,t)(a,t)}\, g^{(a,t)(a,t)} + \sum_{s<a} H^{am}_{(s,a)(s,a)}\, g^{(s,a)(s,a)} = 2 \sum_{t\ne a} \frac{\lambda_a \lambda_t}{\lambda_t - \lambda_a}. \tag{27}$$
This bias was originally derived by the perturbation method in Lawley (1956).
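The asymptotic bias formula (27) can be turned into a quick numerical check (our naming; the sign convention follows $E[\bar l_a] - \lambda_a \approx -C_a/(2n)$):

```python
import numpy as np

def asymptotic_bias_term(lam):
    """C_a(lambda) of Eq. (27): C_a = 2 * sum_{t != a} lam_a lam_t / (lam_t - lam_a)."""
    lam = np.asarray(lam, dtype=float)
    p = len(lam)
    C = np.zeros(p)
    for a in range(p):
        for t in range(p):
            if t != a:
                C[a] += 2.0 * lam[a] * lam[t] / (lam[t] - lam[a])
    return C

n = 50
approx_bias = -asymptotic_bias_term([2.0, 1.0]) / (2 * n)   # E[lbar_a] - lam_a
```

The largest eigenvalue is biased upward and the smallest downward, and the biases cancel in the sum, consistent with the trace equality in (23) and with (24).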
4. Estimation of λ when Γ is known
We consider the estimation problem when $\Gamma$ is known to be $\Gamma_0$. From a practical point of view, the case when $\Gamma$ is known is not of much interest compared to the general case where both $\Gamma$ and $\lambda$ are unknown. However, as we show in this section, the asymptotic information loss caused by discarding the sample eigenvectors (Section 4.2) is closely related to the difference in asymptotic variance between two certain estimators (Section 4.1). Both the asymptotic variance and the information loss are described in geometrical terms.
4.1. Asymptotic variance of the estimators of λ
In general terms, the subfamily (submanifold) $M(\Gamma_0)\ (:= \{\Sigma\in S \mid \Gamma(\Sigma)=\Gamma_0\})$ in $S$ is a "curved" exponential family, since it is a subfamily of the exponential family $S$. In the usual case a subfamily is not "flat," hence the term "curved." However, as can be seen from (17), $M(\Gamma_0)$ is autoparallel in the m(e)-flat $S$, and intrinsically m(e)-flat (see e.g. Amari and Nagaoka, 2000, Theorem 1.1).
We are to estimate the unknown coordinate $\lambda$ of $M(\Gamma_0)$ using an estimator $\hat\lambda = (\hat\lambda_1,\ldots,\hat\lambda_p)$ of some kind. An estimator $\hat\lambda(\bar S)$ is specified by its inverse image $\hat\lambda^{-1}(\lambda)$:
$$A(\lambda) := \hat\lambda^{-1}(\lambda) = \{\Sigma\in S \mid \hat\lambda(\Sigma)=\lambda\}. \tag{28}$$
This is another submanifold of $S$, on which we will use $u$ as a coordinate system. A consistent estimator $\hat\lambda$ is called first-order (Fisher) efficient if the first-order term (i.e. the $O(n^{-1})$ term) in the asymptotic expansion of the variance (covariance) in $n$ is minimized among all (regular) estimators. Correct the bias of the first-order efficient estimator $\hat\lambda$ up to the term of order $n^{-1}$, and denote the result by $\hat\lambda^* := (\hat\lambda^*_1,\ldots,\hat\lambda^*_p)$. Amari showed (see e.g. Amari and Nagaoka, 2000, Theorem 4.4) that its asymptotic variance can be described by geometrical properties such as the metric and the embedding curvatures of $M(\Gamma_0)$ and $A$. For $1\le a,b\le p$,
$$E[(\hat\lambda^*_a - \lambda_a)(\hat\lambda^*_b - \lambda_b)] = \frac{1}{n}\, g^{ab} + \frac{1}{2n^2}\left\{(\Gamma^m_M)^2_{ab} + 2(H^e_M)^2_{ab} + (H^m_A)^2_{ab}\right\} + O(n^{-3}), \tag{29}$$
where
$$(\Gamma^m_M)^2_{ab} = \sum_{c,d,e,f} \Gamma^{am}_{cd}\, \Gamma^{bm}_{ef}\, g^{ce} g^{df},$$
$$(H^e_M)^2_{ab} = \sum_{c,d,e,f,\ s<t,\ u<v} H^{(s,t)e}_{ce}\, H^{(u,v)e}_{df}\, g_{(s,t)(u,v)}\, g^{cd} g^{ea} g^{fb},$$
$$(H^m_A)^2_{ab} = \sum_{s<t,\ u<v,\ o<p,\ q<r} H^{am}_{(s,t)(u,v)}\, H^{bm}_{(o,p)(q,r)}\, g^{(s,t)(o,p)} g^{(u,v)(q,r)}.$$
$\Gamma^{am}_{cd}$ and $H^{(s,t)e}_{ce}$ were already defined in the previous section as the connection coefficients (see (25)) and the embedding curvature components (see (21)) of $M$; they are defined independently of the particular estimator. $H^{am}_{(s,t)(u,v)}$ are the components of the embedding m-curvature of $A$, which differ among estimators.
We apply this formula to the following two estimators: $l^* = (l^*_1,\ldots,l^*_p)$ and $\bar\lambda = (\bar\lambda_1,\ldots,\bar\lambda_p)$. The former is the bias-corrected sample eigenvalues, given, using (27), by
$$l^*_a = \bar l_a + \frac{1}{2n}\, C_a(\bar l) = \bar l_a + \frac{1}{n} \sum_{t\ne a} \frac{\bar l_a \bar l_t}{\bar l_t - \bar l_a}, \qquad a=1,\ldots,p, \tag{30}$$
and the latter is defined by
$$\bar\lambda_a = ((\Gamma_0)^t \bar S\, \Gamma_0)_{aa}, \qquad a=1,\ldots,p, \tag{31}$$
which is (exactly) unbiased. In fact $\bar\lambda$ is the maximum likelihood estimator for the case where $\Gamma$ is known. Notice that for $l^*$ the inverse image (28) is $A(\lambda)$, while for $\bar\lambda$ it is $\tilde A(\lambda,\Gamma_0)$. The first-order efficiency of both estimators is guaranteed by the orthogonality to $M(\Gamma_0)$ of $A(\lambda)$ and $\tilde A(\lambda,\Gamma_0)$.
The terms $(\Gamma^m_M)^2_{ab}$ and $(H^e_M)^2_{ab}$, which are related to the submanifold $M$ and hence common to both estimators, vanish because of (22) and (26). The term $(H^m_A)^2_{ab}$ differs between the two estimators. As we observed in the previous section, $A(\lambda)$ is not autoparallel in $S$ (see (18)). On the other hand, $\tilde A(\lambda,\Gamma_0)$ is autoparallel in $S$, hence $(H^m_A)^2_{ab}$ vanishes. Consequently the following results are obtained.
Proposition 3. For $1\le a,b\le p$,
$$E[(l^*_a-\lambda_a)(l^*_b-\lambda_b)]\ (:= V_{ab}(l^*)) = \begin{cases} \dfrac{2}{n}\lambda_a^2 + \dfrac{2}{n^2}\displaystyle\sum_{t\ne a} \dfrac{\lambda_a^2\lambda_t^2}{(\lambda_t-\lambda_a)^2} + O(n^{-3}) & \text{if } a=b,\\[2ex] -\dfrac{2}{n^2}\, \dfrac{\lambda_a^2\lambda_b^2}{(\lambda_a-\lambda_b)^2} + O(n^{-3}) & \text{if } a\ne b; \end{cases} \tag{32}$$
$$E[(\bar\lambda_a-\lambda_a)(\bar\lambda_b-\lambda_b)]\ (:= V_{ab}(\bar\lambda)) = \begin{cases} \dfrac{2}{n}\lambda_a^2 + O(n^{-5/2}) & \text{if } a=b,\\[1ex] O(n^{-5/2}) & \text{if } a\ne b. \end{cases} \tag{33}$$
This result says that $\bar\lambda$ is second-order efficient (among the bias-corrected first-order efficient estimators), but the bias-corrected sample eigenvalues are not. The difference in asymptotic performance between the two estimators is due to the fact that $l^*$ does not use the prior information $\Gamma=\Gamma_0$, while $\bar\lambda$ does. In contrast to $l^*$, which does not use $H$, $\bar\lambda$ incorporates the information of $H$ with the aid of the prior knowledge $\Gamma=\Gamma_0$. In fact, as we will see in the next subsection, the difference between (32) and (33) is closely related to the asymptotic information loss caused by discarding $H$.
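Both estimators of this subsection are simple to implement; below is a sketch for $p=2$ with simulated data (function names are ours, and taking $\Gamma_0 = I$ is an assumption of the example):

```python
import numpy as np

def bias_corrected_l(lbar, n):
    """l* of Eq. (30): lbar_a + (1/n) * sum_{t != a} lbar_a lbar_t / (lbar_t - lbar_a)."""
    lbar = np.asarray(lbar, dtype=float)
    out = lbar.copy()
    p = len(lbar)
    for a in range(p):
        for t in range(p):
            if t != a:
                out[a] += lbar[a] * lbar[t] / ((lbar[t] - lbar[a]) * n)
    return out

def known_gamma_mle(Sbar, Gamma0):
    """lambda-bar of Eq. (31): the exactly unbiased MLE when Gamma = Gamma0 is known."""
    return np.diag(Gamma0.T @ Sbar @ Gamma0).copy()

rng = np.random.default_rng(1)
n, lam_true = 100, np.array([3.0, 1.0])
X = rng.multivariate_normal(np.zeros(2), np.diag(lam_true), size=n)
Sbar = X.T @ X / n                        # sample covariance (zero-mean model)
lbar = np.sort(np.linalg.eigvalsh(Sbar))[::-1]
print(bias_corrected_l(lbar, n), known_gamma_mle(Sbar, np.eye(2)))
```

The bias correction pulls the sample eigenvalues toward each other while preserving their sum, in line with (23).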
4.2. Asymptotic information loss
In this subsection, we consider the asymptotic information loss caused by ignoring $H$ in the estimation of $\lambda$. The information loss matrix $(\Delta g_{ab}(l))$, $1\le a,b\le p$, at a fixed point $\Sigma=(\lambda,\Gamma)$ is given by
$$\Delta g_{ab}(l) := E[g_{ab}(S\mid l)] = g_{ab}(S) - g_{ab}(l),$$
where $g_{ab}(S)$, $g_{ab}(l)$, $g_{ab}(S\mid l)$ are the components of the metrics w.r.t. $\partial_a$ and $\partial_b$ based on, respectively, the distribution of $S$, that of $l$, and the conditional distribution of $S$ given $l$, all measured at the point $\Sigma=(\lambda,\Gamma)$.
Amari (1982) found that the asymptotic information loss can be expressed in terms of the metric and the embedding curvatures:
$$\Delta g_{ab}(l) = n \sum_{s<t,\ u<v} g_{a(s,t)}\, g_{b(u,v)}\, g^{(s,t)(u,v)} + \sum_{c,d,\ s<t,\ u<v} H^e_{ac(s,t)}\, H^e_{bd(u,v)}\, g^{cd} g^{(s,t)(u,v)} + (1/2) \sum_{s<t,\ u<v,\ o<p,\ q<r} H^m_{(s,t)(u,v)a}\, H^m_{(o,p)(q,r)b}\, g^{(s,t)(o,p)} g^{(u,v)(q,r)} + O(n^{-1}). \tag{34}$$
Straightforward calculation leads us to the following result.

Proposition 4.
$$\Delta g_{ab}(l) = B_{ab} + O(n^{-1}),$$
where
$$B_{ab} = \begin{cases} \dfrac{1}{2\lambda_a^2} \displaystyle\sum_{t\ne a} \dfrac{\lambda_t^2}{(\lambda_t-\lambda_a)^2} & \text{if } a=b,\\[2ex] -\dfrac{1}{2(\lambda_a-\lambda_b)^2} & \text{if } a\ne b. \end{cases}$$
$B_{ab}$ at the point $(\lambda,\Gamma)$ depends only on $\lambda$. When the information loss of a statistic has order $O(n^{-q+1})$, we call the statistic $q$th-order sufficient. Consequently, the statistic $l$ is first-order sufficient, but not second-order sufficient.
$B_{ab}$, the information loss in the second-order term ($O(1)$), can be quite large when the population eigenvalues are close to each other. Note that the information carried by $l$ is given by the formula
$$g_{ab}(l) = g_{ab}(S) - \Delta g_{ab}(l) = n\, g_{ab}(x) - \Delta g_{ab}(l) = (n/2)\lambda_a^{-2}\,\delta(a=b) - \Delta g_{ab}(l).$$
Since $(g_{ab}(l))$ is positive definite, $\mathrm{diag}(n2^{-1}\lambda_1^{-2},\ldots,n2^{-1}\lambda_p^{-2}) > (\Delta g_{ab})$. This holds true even in the neighborhood of a point with $\lambda_1=\cdots=\lambda_p$, where $B_{ab}$ diverges. This indicates that the $O(n^{-1})$ term in $\Delta g_{ab}(l)$ is also unbounded in such a neighborhood. Hence the expansion of the information loss with respect to $n$ is not useful when the population eigenvalues are close to each other.
Except for the case where the population eigenvalues are close to each other, Proposition 4 tells us approximately how much information is lost by ignoring the sample eigenvectors in inference on the population eigenvalues. If we contract $\Delta g_{ab}$, we get a scalar measure of the information loss:
$$IL := \sum_{a,b} g^{ab}\, \Delta g_{ab} = \sum_a 2\lambda_a^2 B_{aa} + O(n^{-1}) = \sum_{a<b} \frac{\lambda_a^2+\lambda_b^2}{(\lambda_a-\lambda_b)^2} + O(n^{-1}).$$
The asymptotic information loss is closely related to the asymptotic variance of the two estimators $l^*$ and $\bar\lambda$ of the previous subsection. Actually, if we contract the asymptotic performance difference between the two estimators, $V_{ab}(l^*) - V_{ab}(\bar\lambda)$, it equals $n^{-2}IL$; that is,
$$\sum_{a,b} \left(V_{ab}(l^*) - V_{ab}(\bar\lambda)\right) g_{ab} = 2^{-1} E\left[\sum_a (l^*_a/\lambda_a - 1)^2\right] - 2^{-1} E\left[\sum_a (\bar\lambda_a/\lambda_a - 1)^2\right] = n^{-2} \sum_{a<b} \frac{\lambda_a^2+\lambda_b^2}{(\lambda_a-\lambda_b)^2} + O(n^{-3}) = n^{-2} IL. \tag{35}$$
As a numerical example, we ran a simulation for the case $p=2$, $n=20$. Taking the relationship (35) into account, we can measure the information loss as the normalized quadratic risk difference between $l^*$ and $\bar\lambda$. We randomly generated a two-dimensional normal vector under the conditions $\Sigma=\mathrm{diag}(1.0,\, c)$, $c=0.2, 0.4, 0.6, 0.8, 1.0$, repeated this $10^8$ times, and took the average for each condition. Table 1 shows the result. (Note: (1) The risk of $\bar\lambda$ theoretically equals 0.4. (2) The simulated risk of $l^*$ is quite unstable, as its large s.d. shows.) We notice that the information loss is not negligible. The risk of $l^*$ is larger than that of $\bar\lambda$ by 24–111%. The risk difference is quite large especially when the population eigenvalues are close to each other.

Table 1. Simulated risk of $l^*$ when $p=2$ as $c$ varies.

  c: second eigenvalue                             1.0    0.8    0.6    0.4    0.2
  Simulated risk of $l^*$                          0.85   0.83   0.70   0.60   0.50
  Standard deviation                               0.24   0.48   0.15   0.09   0.22
  100 x (risk difference / risk of $\bar\lambda$)  111    107    75     49     24

Fig. 3. The shrinkage effect of the projection $(\hat\lambda,\hat\Gamma)$ onto $M_i$, $i=1,2$.
5. Estimation of λ when Γ is unknown
In this section, we consider the more practical case where $\Gamma$ is unknown. The derivation of a new estimator for this case is done in view of correcting the bias of $\bar l$. Indeed, almost all the literature on the estimation of $\lambda$ we mentioned in Section 1 modifies the bias of $\bar l$ by the so-called "shrinkage" method, that is, by decreasing the dispersion of $\bar l$. Though the concrete shrinkage methods differ for each estimator, they are proposed mainly from analytical motivations. Here we consider another shrinkage estimator, from a geometrical point of view.
Suppose that we have a sample $\bar S := n^{-1}S$ which takes the point $(\hat\lambda,\hat\Gamma)$ in $S$, that is, $\hat\lambda = \bar l$, $\hat\Gamma = H$. (See Fig. 3.) Take the orthogonal projection of this point onto the submanifold $M(\Gamma_i) := M_i$ ($i=1,2$), where the projected point $(\hat\lambda^i,\Gamma_i)$ is given by $\hat\lambda^i = ((\Gamma_i^t \bar S\, \Gamma_i)_{11},\ldots,(\Gamma_i^t \bar S\, \Gamma_i)_{pp})$. As we mentioned in Section 3, $(\hat\lambda^i,\Gamma_i)$ is the minimum-distance point on $M_i$ from $(\hat\lambda,\hat\Gamma)$ with respect to the Kullback-Leibler divergence. It is clearly understood that this projection has a shrinkage effect. If we have an appropriate probability measure for $\Gamma$ on the group of $p$-dimensional orthogonal matrices $O(p)$, the expectation of $(\Gamma^t \bar S\, \Gamma)_{ii}$, $i=1,\ldots,p$, under that measure would give birth to a natural shrinkage estimator.
We choose, as the probability measure on $O(p)$, the conditional distribution of $H$ given $l$. Since $S = HLH^t$ is distributed as a Wishart matrix $W_p(n,\Sigma)$, its density w.r.t. the uniform probability measure $d\mu(H)$ on $O(p)$ equals
$$f(H \mid l, \Sigma) = K(l,\Sigma)^{-1} \exp(-(1/2)\operatorname{tr} H L H^t \Sigma^{-1}), \tag{36}$$
where the normalizing constant $K(l,\Sigma)$ is given by
$$K(l,\Sigma) = \int_{O(p)} \exp(-(1/2)\operatorname{tr} H L H^t \Sigma^{-1})\, d\mu(H).$$
This conditional distribution depends on $\Sigma$. If we substitute an estimator $\hat\Sigma(\bar S)$ for $\Sigma$, it gives a distribution on $O(p)$ whose density with respect to $d\mu(\Gamma)$ is given by
$$f(\Gamma \mid l, \hat\Sigma) = K(l,\hat\Sigma)^{-1} \exp(-(1/2)\operatorname{tr}\Gamma L \Gamma^t \hat\Sigma^{-1}), \tag{37}$$
where
$$K(l,\hat\Sigma) = \int_{O(p)} \exp(-(1/2)\operatorname{tr}\Gamma L \Gamma^t \hat\Sigma^{-1})\, d\mu(\Gamma).$$
Taking the expectation of $(\Gamma^t S \Gamma)_{ii}$ w.r.t. the density (37), we have
$$\lambda^{*}_i \triangleq K(l, \hat{\Sigma})^{-1} \int_{O(p)} (\Gamma^t S \Gamma)_{ii} \exp\bigl(-(1/2)\,\mathrm{tr}\,\Gamma L \Gamma^t \hat{\Sigma}^{-1}\bigr)\,d\mu(\Gamma), \qquad i = 1, \ldots, p. \tag{38}$$
We propose $\lambda^{*} \triangleq (\lambda^{*}_1, \ldots, \lambda^{*}_p)$ as a new estimator of $\lambda$.
Fig. 4. Risks of the three estimators as c changes.
If $\hat{\Sigma}$ is given by an orthogonally invariant estimator (2), $\lambda^{*}_i$ can be described more specifically. Let $L$ denote $\mathrm{diag}(l)$. Because of the invariance of $d\mu$, it turns out that
$$\lambda^{*}_i = K(l)^{-1} \int_{O(p)} (\Gamma^t H L H^t \Gamma)_{ii} \exp\bigl(-(1/2)\,\mathrm{tr}\,L\Gamma^t H \Phi^{-1} H^t \Gamma\bigr)\,d\mu(\Gamma)$$
$$= K(l)^{-1} \int_{O(p)} (\Gamma^t L \Gamma)_{ii} \exp\bigl(-(1/2)\,\mathrm{tr}\,L\Gamma^t \Phi^{-1} \Gamma\bigr)\,d\mu(\Gamma), \tag{39}$$
where
$$K(l) \triangleq \int_{O(p)} \exp\bigl(-(1/2)\,\mathrm{tr}\,L\Gamma^t \Phi^{-1} \Gamma\bigr)\,d\mu(\Gamma). \tag{40}$$
The analytic evaluation of this estimator's performance seems difficult even in the large-sample case. Instead, we show numerical results comparing $l$, $\lambda^{*}$ and Stein's estimator (3). Our new estimator $\lambda^{*}_i$ is equipped with the same $\phi$'s as in (3). We simulated the risks of the three estimators for the case $p = 2$, $n = 10$ w.r.t. the K–L loss, which is given by
$$\sum_{i=1}^{p} \tilde{\lambda}_i \lambda_i^{-1} - \sum_{i=1}^{p} \log\bigl(\tilde{\lambda}_i \lambda_i^{-1}\bigr) - p,$$
where $\tilde{\lambda}_i = l_i,\ \lambda^{*}_i,\ \phi_i$, $i = 1, \ldots, p$. Since all the estimators are functions of $l$ and scale invariant, it is enough to measure the risks for $\Sigma = \mathrm{diag}(1, c)$, $0 < c \le 1$. We varied $c$ from 0.04 to 1.00 in increments of 0.04, and for each $c$ we repeated the risk evaluation $10^5$ times and took the average. For the integral calculation of (39) and (40), we picked 50 equidistant points from $O(2)$. Fig. 4 shows the result. The new estimator performs better than $l$, especially when the components of $\lambda$ are close to each other, though it seems that $\lambda^{*}$ does not dominate $l$ as Stein's estimator does. Unfortunately, we do not have any theoretical explanation of the risk behavior of the new estimator. We can only guess that the shrinkage effect works well when $c$ is close to one, while it is too strong elsewhere. We also simulated the risk of the new estimator equipped with the M.L.E. instead of Stein's estimator. Since its performance is almost the same as that of the above new estimator, we omit the result.
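For $p = 2$ the integrals in (39) and (40) reduce to a one-dimensional quadrature over rotation angles. The sketch below (the function name, the use of rotations only, and the plugged-in $\phi$ values are our own illustrative choices) approximates $\lambda^{*}$; note that $\lambda^{*}_1 + \lambda^{*}_2 = l_1 + l_2$ automatically, since $\mathrm{tr}(\Gamma^t L \Gamma) = \mathrm{tr}\,L$ for every $\Gamma$:

```python
import numpy as np

def lambda_star(l, phi, n_points=50):
    """Approximate the shrinkage estimator of Eqs. (39)-(40) for p = 2.

    l   : sample eigenvalues (l[0] >= l[1] > 0)
    phi : diagonal of the plugged-in orthogonally invariant estimator
          (e.g. Stein-type modified eigenvalues; illustrative here)
    O(2) is replaced by n_points equidistant rotation angles."""
    L = np.diag(l)
    Phi_inv = np.diag(1.0 / np.asarray(phi, dtype=float))
    num = np.zeros(2)
    den = 0.0
    for theta in np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        G = np.array([[c, -s], [s, c]])                     # a rotation in O(2)
        w = np.exp(-0.5 * np.trace(L @ G.T @ Phi_inv @ G))  # weight from Eq. (40)
        num += w * np.diag(G.T @ L @ G)                     # integrand of Eq. (39)
        den += w
    return num / den
```

With $l = (2, 1)$ and $\phi = (2, 1)$, for instance, the estimate keeps $\lambda^{*}_1 + \lambda^{*}_2 = 3$ while pulling the two components toward each other, which is exactly the shrinkage effect discussed above.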
6. Remark
1. We treated the estimation problem of the eigenvalues $\lambda$ in the latter half of the paper. The estimation of the eigenvectors $\Gamma$ seems rather untouched in the classical situation $n \ge p$. Corollary 1 on the statistical curvature of $\mathcal{A}$, or (27) on the asymptotic bias, tells us that a point where $\lambda$ has some multiplicity is a statistically singular point. Around these points, inference on $\Gamma$ is considered to need subtle treatment. In particular, the eigenvectors are not well identified around a multiplicity point, hence the information contained in $H$ vanishes there (see $g_{(s,t)(u,v)}$ in Proposition 1). This indicates that inference using only $H$ is not appropriate.
2. We proposed a new estimator for $\lambda$ in Section 5. However, it belongs to the same category as most estimators in the past literature in that it uses the sample eigenvalues $l$ only. It is still unclear how we can use the sample eigenvectors $H$ for the inference of $\lambda$.
Acknowledgment
The author is grateful to Dr. M. Kumon for kindly answering his question on a basic fact about the information loss. He also expresses deep gratitude to the anonymous referees for their valuable comments, which improved the quality of the paper.
Appendix A
A.1. Proof of Proposition 1
As a basis for the vector space of real symmetric matrices, we consider $E_{ij}\ (1 \le i \le j \le p)$, the $p \times p$ matrix defined by
$$E_{ij} = \begin{cases} I_{ii} & \text{if } i = j,\\ I_{ij} + I_{ji} & \text{if } i < j, \end{cases}$$
where $I_{ij}\ (1 \le i, j \le p)$ is the $p \times p$ matrix whose $(i,j)$ element equals one and all of whose other elements are zero. The one-to-one correspondence
$$\partial_{(i,j)} \triangleq \frac{\partial}{\partial s_{ij}} \longleftrightarrow E_{ij}, \qquad 1 \le i \le j \le p,$$
gives the component expression of (11):
$$\langle \partial_{(i,j)}, \partial_{(k,l)} \rangle = \frac{1}{2}\,\mathrm{tr}\bigl(\Sigma^{-1} E_{ij} \Sigma^{-1} E_{kl}\bigr), \qquad 1 \le i \le j \le p,\ 1 \le k \le l \le p.$$
Since
$$\partial_a \triangleq \frac{\partial}{\partial \lambda_a} = \sum_{i \le j} \frac{\partial s_{ij}}{\partial \lambda_a}\,\frac{\partial}{\partial s_{ij}} = \sum_{i \le j} \frac{\partial s_{ij}}{\partial \lambda_a}\,\partial_{(i,j)}, \qquad 1 \le a \le p, \tag{41}$$
$$\partial_{(s,t)} \triangleq \frac{\partial}{\partial u_{st}} = \sum_{i \le j} \frac{\partial s_{ij}}{\partial u_{st}}\,\frac{\partial}{\partial s_{ij}} = \sum_{i \le j} \frac{\partial s_{ij}}{\partial u_{st}}\,\partial_{(i,j)}, \qquad 1 \le s < t \le p, \tag{42}$$
we have the following relations:
$$g_{ab} = \frac{1}{2}\,\mathrm{tr}\Biggl\{\Sigma^{-1}\Biggl(\sum_{i \le j} \frac{\partial s_{ij}}{\partial \lambda_a} E_{ij}\Biggr)\Sigma^{-1}\Biggl(\sum_{k \le l} \frac{\partial s_{kl}}{\partial \lambda_b} E_{kl}\Biggr)\Biggr\}, \tag{43}$$
$$g_{a(s,t)} = \frac{1}{2}\,\mathrm{tr}\Biggl\{\Sigma^{-1}\Biggl(\sum_{i \le j} \frac{\partial s_{ij}}{\partial \lambda_a} E_{ij}\Biggr)\Sigma^{-1}\Biggl(\sum_{k \le l} \frac{\partial s_{kl}}{\partial u_{st}} E_{kl}\Biggr)\Biggr\}, \tag{44}$$
$$g_{(s,t)(u,v)} = \frac{1}{2}\,\mathrm{tr}\Biggl\{\Sigma^{-1}\Biggl(\sum_{i \le j} \frac{\partial s_{ij}}{\partial u_{st}} E_{ij}\Biggr)\Sigma^{-1}\Biggl(\sum_{k \le l} \frac{\partial s_{kl}}{\partial u_{uv}} E_{kl}\Biggr)\Biggr\}, \tag{45}$$
where $1 \le a, b \le p$, $1 \le s < t \le p$, $1 \le u < v \le p$.

For the first-order derivative at $u = 0$, we only have to consider $\Sigma$ up to the first-power term w.r.t. $u$; hence we write $\Sigma(\lambda, u)$ as
$$\Sigma(\lambda, u) = \Gamma(I_p + U)\Lambda(I_p + U)^t\Gamma^t + O(\|u\|^2) = \Gamma\Lambda\Gamma^t + \Gamma\Lambda U^t\Gamma^t + \Gamma U\Lambda\Gamma^t + O(\|u\|^2). \tag{46}$$
Therefore we have
$$s_{ij} = \sum_k \gamma_{ik}\gamma_{jk}\lambda_k + \sum_{k,l} \gamma_{ik}\gamma_{jl}(\lambda_k u_{lk} + \lambda_l u_{kl}) + O(\|u\|^2), \qquad 1 \le i \le j \le p,$$
where $u_{ii} \triangleq 0\ (1 \le i \le p)$, $u_{ij} \triangleq -u_{ji}\ (1 \le j < i \le p)$, which leads to
$$\left.\frac{\partial s_{ij}}{\partial \lambda_a}\right|_{u=0} = \gamma_{ia}\gamma_{ja}, \tag{47}$$
and
$$\left.\frac{\partial s_{ij}}{\partial u_{st}}\right|_{u=0} = \lambda_t\gamma_{it}\gamma_{js} - \lambda_s\gamma_{is}\gamma_{jt} + \lambda_t\gamma_{is}\gamma_{jt} - \lambda_s\gamma_{it}\gamma_{js}. \tag{48}$$
From (47) and (48), we have the following results on tangent vectors:
$$\sum_{i \le j} \frac{\partial s_{ij}}{\partial \lambda_a} E_{ij} = \sum_{i \le j} \gamma_{ia}\gamma_{ja} E_{ij} = \gamma_a\gamma_a^t, \tag{49}$$
where $\gamma_a$ is the $a$th column of $\Gamma$, and
$$\sum_{i \le j} \frac{\partial s_{ij}}{\partial u_{st}} E_{ij} = \lambda_t\gamma_t\gamma_s^t - \lambda_s\gamma_s\gamma_t^t + \lambda_t\gamma_s\gamma_t^t - \lambda_s\gamma_t\gamma_s^t. \tag{50}$$
If we substitute (49) and (50) into (43), (44), and (45), we get the results as follows:
$$2g_{ab} = \mathrm{tr}\bigl(\Sigma^{-1}\gamma_a\gamma_a^t\,\Sigma^{-1}\gamma_b\gamma_b^t\bigr) = \mathrm{tr}\bigl\{(\gamma_b^t\Sigma^{-1}\gamma_a)(\gamma_a^t\Sigma^{-1}\gamma_b)\bigr\} = \lambda_a^{-1}\delta(a=b)\,\lambda_b^{-1}\delta(a=b) = \lambda_a^{-2}\,\delta(a=b);$$
$$2g_{a(s,t)} = \mathrm{tr}\bigl\{\Sigma^{-1}\gamma_a\gamma_a^t\,\Sigma^{-1}(\lambda_t\gamma_t\gamma_s^t - \lambda_s\gamma_s\gamma_t^t + \lambda_t\gamma_s\gamma_t^t - \lambda_s\gamma_t\gamma_s^t)\bigr\} = (\lambda_t\lambda_a^{-2} - \lambda_s\lambda_a^{-2} + \lambda_t\lambda_a^{-2} - \lambda_s\lambda_a^{-2})\,\delta(a=s=t) = 0,$$
since $\delta(a = s = t)$ never holds for $s < t$;
$$2g_{(s,t)(u,v)} = \mathrm{tr}\bigl\{\Sigma^{-1}(\lambda_t\gamma_t\gamma_s^t - \lambda_s\gamma_s\gamma_t^t + \lambda_t\gamma_s\gamma_t^t - \lambda_s\gamma_t\gamma_s^t)\,\Sigma^{-1}(\lambda_v\gamma_v\gamma_u^t - \lambda_u\gamma_u\gamma_v^t + \lambda_v\gamma_u\gamma_v^t - \lambda_u\gamma_v\gamma_u^t)\bigr\}.$$
Expanding the sixteen products and using $\mathrm{tr}(\Sigma^{-1}\gamma_i\gamma_j^t\,\Sigma^{-1}\gamma_k\gamma_l^t) = \lambda_i^{-1}\lambda_k^{-1}\delta(j=k)\,\delta(l=i)$, every surviving term carries the factor $\delta(s=u,\,t=v)$ (recall $s < t$ and $u < v$), and we obtain
$$\begin{aligned}
2g_{(s,t)(u,v)} &= \bigl(-1 + \lambda_t\lambda_s^{-1} - 1 + \lambda_s\lambda_t^{-1} + \lambda_t\lambda_s^{-1} - 1 + \lambda_s\lambda_t^{-1} - 1\bigr)\,\delta(s=u,\,t=v)\\
&= 2\bigl(\lambda_s^{-1}(\lambda_t - \lambda_s) + \lambda_t^{-1}(\lambda_s - \lambda_t)\bigr)\,\delta(s=u,\,t=v)\\
&= 2(\lambda_t - \lambda_s)(\lambda_s^{-1} - \lambda_t^{-1})\,\delta(s=u,\,t=v)\\
&= 2(\lambda_t - \lambda_s)^2(\lambda_s\lambda_t)^{-1}\,\delta(s=u,\,t=v).
\end{aligned}$$
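The closed forms just obtained can be checked numerically: plugging the tangent vectors (49) and (50) into the metric $\langle X, Y \rangle = \frac{1}{2}\mathrm{tr}(\Sigma^{-1}X\Sigma^{-1}Y)$ of (11) should reproduce $g_{aa} = \frac{1}{2}\lambda_a^{-2}$, $g_{a(s,t)} = 0$ and $g_{(s,t)(s,t)} = (\lambda_t - \lambda_s)^2/(\lambda_s\lambda_t)$. A sketch with arbitrary eigenvalues and a random orthogonal frame (the helper names are ours):

```python
import numpy as np

def metric(X, Y, Sigma_inv):
    # Fisher metric (11): <X, Y> = (1/2) tr(Sigma^{-1} X Sigma^{-1} Y)
    return 0.5 * np.trace(Sigma_inv @ X @ Sigma_inv @ Y)

lam = np.array([3.0, 1.5, 0.5])                        # distinct eigenvalues
rng = np.random.default_rng(0)
Gamma, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random frame
Sigma_inv = Gamma @ np.diag(1.0 / lam) @ Gamma.T

def d_lam(a):
    # Eq. (49): tangent vector of d/d(lambda_a) is gamma_a gamma_a^t
    g = Gamma[:, [a]]
    return g @ g.T

def d_u(s, t):
    # Eq. (50): tangent vector of d/d(u_st)
    gs, gt = Gamma[:, [s]], Gamma[:, [t]]
    return (lam[t] * gt @ gs.T - lam[s] * gs @ gt.T
            + lam[t] * gs @ gt.T - lam[s] * gt @ gs.T)
```

For instance $\langle \partial_1, \partial_1 \rangle$ returns $\frac{1}{2}\lambda_1^{-2} = 1/18$ here, while all mixed components $g_{a(s,t)}$ vanish, matching Proposition 1.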
A.2. Proof of Proposition 2
Note that $\Sigma^{-1} = \Gamma\Lambda^{-1}\Gamma^t$; hence
$$\theta_{ij} = \begin{cases} -\sum_k \gamma_{ik}\gamma_{jk}\lambda_k^{-1} & \text{if } i < j,\\ -2^{-1}\sum_k \gamma_{ik}^2\lambda_k^{-1} & \text{if } i = j. \end{cases}$$
This means that $\mathcal{M}$ is an affine subspace of $\mathcal{S}$ w.r.t. $\Theta$, which is an affine coordinate system of $\mathcal{S}$ under the e-connection. Consequently $\mathcal{M}$ is e-flat, i.e. $H^e_{ab(s,t)} = 0$; $H^m_{ab(s,t)} = 0$ is proved similarly. See Theorem 1.1 in Amari and Nagaoka (2000).
Now we consider $H^m_{(s,t)(u,v)a}$. Using (4.14) in Amari (1985), it is calculated as
$$H^m_{(s,t)(u,v)a} = \sum_{i \le j} \left.\frac{\partial^2 s_{ij}}{\partial u_{st}\,\partial u_{uv}}\right|_{u=0}\left.\frac{\partial \theta_{ij}}{\partial \lambda_a}\right|_{u=0} = -2^{-1}\sum_{1 \le i, j \le p} \left.\frac{\partial^2 s_{ij}}{\partial u_{st}\,\partial u_{uv}}\right|_{u=0}\left.\frac{\partial s^{ij}}{\partial \lambda_a}\right|_{u=0} = -2^{-1}\,\mathrm{tr}(AB),$$
where $s^{ij}$ denotes the $(i,j)$ element of $\Sigma^{-1}$, and the $p \times p$ matrices $A$, $B$ are given by
$$(A)_{ij} \triangleq \left.\frac{\partial^2 s_{ij}}{\partial u_{st}\,\partial u_{uv}}\right|_{u=0}, \qquad (B)_{ij} \triangleq \left.\frac{\partial s^{ij}}{\partial \lambda_a}\right|_{u=0}, \qquad 1 \le i, j \le p.$$
In order to calculate $A$, we only have to consider $\Sigma$ up to the second-power terms w.r.t. $u$:
$$\Sigma = \Gamma(I_p + U + 2^{-1}U^2)\Lambda(I_p + U + 2^{-1}U^2)^t\Gamma^t + O(\|u\|^3) = \Gamma\Lambda\Gamma^t + \Gamma(U\Lambda + \Lambda U^t)\Gamma^t + 2^{-1}\Gamma\bigl(U^2\Lambda + \Lambda(U^2)^t\bigr)\Gamma^t + \Gamma U\Lambda U^t\Gamma^t + O(\|u\|^3).$$
Therefore $s_{ij}$ is expressed as
$$s_{ij} = 2^{-1}\sum_{k,l}\gamma_{ik}\gamma_{jl}\Bigl(\bigl(U^2\Lambda + \Lambda(U^2)^t\bigr)_{kl} + 2(U\Lambda U^t)_{kl}\Bigr) + R_{ij} + O(\|u\|^3), \tag{51}$$
where $R = \Gamma\Lambda\Gamma^t + \Gamma(U\Lambda + \Lambda U^t)\Gamma^t$. Since
$$\bigl(U^2\Lambda + \Lambda(U^2)^t\bigr)_{kl} = (U^2\Lambda)_{kl} + (U^2\Lambda)_{lk} = \sum_b u_{kb}u_{bl}\lambda_l + \sum_b u_{lb}u_{bk}\lambda_k, \qquad 2(U\Lambda U^t)_{kl} = 2\sum_b u_{kb}u_{lb}\lambda_b,$$
(51) turns out to be
$$s_{ij} = 2^{-1}\sum_{k,l,b}\gamma_{ik}\gamma_{jl}\bigl(u_{kb}u_{bl}\lambda_l + u_{lb}u_{bk}\lambda_k + 2u_{kb}u_{lb}\lambda_b\bigr) + R_{ij} + O(\|u\|^3). \tag{52}$$
From this we have
$$2\left.\frac{\partial^2 s_{ij}}{\partial u_{st}\,\partial u_{uv}}\right|_{u=0} = \sum_{k,l,b}\bigl(a^{(1)}_{ij} + a^{(1)}_{ji} + a^{(2)}_{ij} + a^{(2)}_{ji} + a^{(3)}_{ij} + a^{(4)}_{ij}\bigr), \tag{53}$$
where
$$\begin{aligned}
a^{(1)}_{ij} &= \gamma_{is}\gamma_{jv}\lambda_v\,\delta\{(k,b)=(s,t),\,(b,l)=(u,v),\,(s,t)\ne(u,v)\} - \gamma_{it}\gamma_{jv}\lambda_v\,\delta\{(k,b)=(t,s),\,(b,l)=(u,v),\,(s,t)\ne(u,v)\}\\
&\quad - \gamma_{is}\gamma_{ju}\lambda_u\,\delta\{(k,b)=(s,t),\,(b,l)=(v,u),\,(s,t)\ne(u,v)\} + \gamma_{it}\gamma_{ju}\lambda_u\,\delta\{(k,b)=(t,s),\,(b,l)=(v,u),\,(s,t)\ne(u,v)\}\\
&\quad + \gamma_{iu}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(u,v),\,(b,l)=(s,t),\,(s,t)\ne(u,v)\} - \gamma_{iv}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(v,u),\,(b,l)=(s,t),\,(s,t)\ne(u,v)\}\\
&\quad - \gamma_{iu}\gamma_{js}\lambda_s\,\delta\{(k,b)=(u,v),\,(b,l)=(t,s),\,(s,t)\ne(u,v)\} + \gamma_{iv}\gamma_{js}\lambda_s\,\delta\{(k,b)=(v,u),\,(b,l)=(t,s),\,(s,t)\ne(u,v)\},\\[1mm]
a^{(2)}_{ij} &= 2\gamma_{is}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(s,t),\,(b,l)=(s,t),\,(s,t)=(u,v)\} + 2\gamma_{it}\gamma_{js}\lambda_s\,\delta\{(k,b)=(t,s),\,(b,l)=(t,s),\,(s,t)=(u,v)\}\\
&\quad - 2\gamma_{is}\gamma_{js}\lambda_s\,\delta\{(k,b)=(s,t),\,(b,l)=(t,s),\,(s,t)=(u,v)\} - 2\gamma_{it}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(t,s),\,(b,l)=(s,t),\,(s,t)=(u,v)\}\\
&= -2\gamma_{is}\gamma_{js}\lambda_s\,\delta\{(k,b)=(s,t),\,(b,l)=(t,s),\,(s,t)=(u,v)\} - 2\gamma_{it}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(t,s),\,(b,l)=(s,t),\,(s,t)=(u,v)\}\\
&\qquad\text{(the first two terms vanish since they require $s = t$),}\\[1mm]
a^{(3)}_{ij} &= 2\gamma_{is}\gamma_{ju}\lambda_t\,\delta\{(k,b)=(s,t),\,(l,b)=(u,v),\,(s,t)\ne(u,v)\} - 2\gamma_{it}\gamma_{ju}\lambda_s\,\delta\{(k,b)=(t,s),\,(l,b)=(u,v),\,(s,t)\ne(u,v)\}\\
&\quad - 2\gamma_{is}\gamma_{jv}\lambda_t\,\delta\{(k,b)=(s,t),\,(l,b)=(v,u),\,(s,t)\ne(u,v)\} + 2\gamma_{it}\gamma_{jv}\lambda_s\,\delta\{(k,b)=(t,s),\,(l,b)=(v,u),\,(s,t)\ne(u,v)\}\\
&\quad + 2\gamma_{iu}\gamma_{js}\lambda_t\,\delta\{(k,b)=(u,v),\,(l,b)=(s,t),\,(s,t)\ne(u,v)\} - 2\gamma_{iv}\gamma_{js}\lambda_t\,\delta\{(k,b)=(v,u),\,(l,b)=(s,t),\,(s,t)\ne(u,v)\}\\
&\quad - 2\gamma_{iu}\gamma_{jt}\lambda_s\,\delta\{(k,b)=(u,v),\,(l,b)=(t,s),\,(s,t)\ne(u,v)\} + 2\gamma_{iv}\gamma_{jt}\lambda_s\,\delta\{(k,b)=(v,u),\,(l,b)=(t,s),\,(s,t)\ne(u,v)\},\\[1mm]
a^{(4)}_{ij} &= 4\gamma_{is}\gamma_{js}\lambda_t\,\delta\{(k,b)=(l,b)=(s,t)=(u,v)\} + 4\gamma_{it}\gamma_{jt}\lambda_s\,\delta\{(k,b)=(l,b)=(t,s)=(v,u)\}\\
&\quad - 4\gamma_{is}\gamma_{jt}\lambda_t\,\delta\{(k,b)=(s,t),\,(l,b)=(t,s),\,(s,t)=(u,v)\} - 4\gamma_{it}\gamma_{js}\lambda_s\,\delta\{(k,b)=(t,s),\,(l,b)=(s,t),\,(s,t)=(u,v)\}\\
&= 4\gamma_{is}\gamma_{js}\lambda_t\,\delta\{(k,b)=(l,b)=(s,t)=(u,v)\} + 4\gamma_{it}\gamma_{jt}\lambda_s\,\delta\{(k,b)=(l,b)=(t,s)=(v,u)\}.
\end{aligned}$$
Furthermore we have
$$2A = A^{(1)} + (A^{(1)})^t + A^{(2)} + (A^{(2)})^t + A^{(3)} + A^{(4)}, \tag{54}$$
where
$$\begin{aligned}
A^{(1)} &= \gamma_s\gamma_v^t\lambda_v\,\delta(t=u) + \gamma_t\gamma_u^t\lambda_u\,\delta(s=v) + \gamma_u\gamma_t^t\lambda_t\,\delta(s=v) + \gamma_v\gamma_s^t\lambda_s\,\delta(u=t)\\
&\quad - \gamma_t\gamma_v^t\lambda_v\,\delta(s=u,\,t\ne v) - \gamma_s\gamma_u^t\lambda_u\,\delta(t=v,\,s\ne u) - \gamma_v\gamma_t^t\lambda_t\,\delta(u=s,\,t\ne v) - \gamma_u\gamma_s^t\lambda_s\,\delta(t=v,\,s\ne u),\\
A^{(2)} &= -2\bigl(\gamma_s\gamma_s^t\lambda_s + \gamma_t\gamma_t^t\lambda_t\bigr)\,\delta(s=u,\,t=v),\\
A^{(3)} &= 2\bigl(\gamma_s\gamma_u^t\lambda_t\,\delta(t=v,\,s\ne u) + \gamma_t\gamma_v^t\lambda_s\,\delta(s=u,\,t\ne v) + \gamma_u\gamma_s^t\lambda_t\,\delta(v=t,\,s\ne u) + \gamma_v\gamma_t^t\lambda_s\,\delta(u=s,\,t\ne v)\bigr)\\
&\quad - 2\bigl(\gamma_t\gamma_u^t\lambda_s\,\delta(s=v) + \gamma_s\gamma_v^t\lambda_t\,\delta(t=u) + \gamma_v\gamma_s^t\lambda_t\,\delta(u=t) + \gamma_u\gamma_t^t\lambda_s\,\delta(s=v)\bigr),\\
A^{(4)} &= 4\bigl(\gamma_s\gamma_s^t\lambda_t + \gamma_t\gamma_t^t\lambda_s\bigr)\,\delta(s=u,\,t=v).
\end{aligned}$$
Since
$$\left.\frac{\partial s^{ij}}{\partial \lambda_a}\right|_{u=0} = -\lambda_a^{-2}\gamma_{ia}\gamma_{ja},$$
we have
$$B = -\lambda_a^{-2}\gamma_a\gamma_a^t. \tag{55}$$
From (54) and (55), we have
$$H^m_{(s,t)(u,v)a} = -4^{-1}\,\mathrm{tr}(2AB) = -4^{-1}\,\mathrm{tr}\bigl\{\bigl(A^{(1)} + (A^{(1)})^t + A^{(2)} + (A^{(2)})^t + A^{(3)} + A^{(4)}\bigr)B\bigr\} = 4^{-1}\lambda_a^{-2}\,\mathrm{tr}\bigl\{\bigl(A^{(1)} + (A^{(1)})^t + A^{(2)} + (A^{(2)})^t + A^{(3)} + A^{(4)}\bigr)\gamma_a\gamma_a^t\bigr\}.$$
The following equalities hold:
$$\begin{aligned}
\mathrm{tr}(A^{(1)}\gamma_a\gamma_a^t) &= \lambda_a\delta(s=v=a,\,t=u) + \lambda_a\delta(t=u=a,\,s=v) + \lambda_a\delta(t=u=a,\,s=v) + \lambda_a\delta(s=v=a,\,t=u)\\
&\quad - \lambda_a\delta(t=v=a,\,s=u,\,t\ne v) - \lambda_a\delta(s=u=a,\,t=v,\,s\ne u)\\
&\quad - \lambda_a\delta(t=v=a,\,s=u,\,t\ne v) - \lambda_a\delta(s=u=a,\,t=v,\,s\ne u) = 0,\\
\mathrm{tr}\bigl((A^{(1)})^t\gamma_a\gamma_a^t\bigr) &= 0,\\
\mathrm{tr}(A^{(2)}\gamma_a\gamma_a^t) &= -2\bigl(\lambda_a\delta(s=u=a,\,t=v) + \lambda_a\delta(s=u,\,t=v=a)\bigr),\\
\mathrm{tr}\bigl((A^{(2)})^t\gamma_a\gamma_a^t\bigr) &= -2\bigl(\lambda_a\delta(s=u=a,\,t=v) + \lambda_a\delta(s=u,\,t=v=a)\bigr),\\
\mathrm{tr}(A^{(3)}\gamma_a\gamma_a^t) &= 2\bigl\{\lambda_t\delta(s=u=a)\delta(t=v,\,s\ne u) + \lambda_s\delta(t=v=a)\delta(s=u,\,t\ne v)\\
&\qquad + \lambda_t\delta(s=u=a)\delta(t=v,\,s\ne u) + \lambda_s\delta(t=v=a)\delta(s=u,\,t\ne v)\bigr\}\\
&\quad - 2\bigl\{\lambda_s\delta(t=u=a)\delta(s=v) + \lambda_t\delta(s=v=a)\delta(t=u)\\
&\qquad + \lambda_t\delta(s=v=a)\delta(t=u) + \lambda_s\delta(u=t=a)\delta(s=v)\bigr\} = 0,\\
\mathrm{tr}(A^{(4)}\gamma_a\gamma_a^t) &= 4\lambda_t\delta(s=u=a,\,t=v) + 4\lambda_s\delta(t=v=a,\,s=u).
\end{aligned}$$
Consequently
$$H^m_{(s,t)(u,v)a} = -\lambda_a^{-1}\delta(s=u=a,\,t=v) - \lambda_a^{-1}\delta(s=u,\,t=v=a) + \lambda_a^{-2}\lambda_t\,\delta(s=u=a,\,t=v) + \lambda_a^{-2}\lambda_s\,\delta(t=v=a,\,s=u)$$
$$= \begin{cases} \lambda_a^{-2}(\lambda_t - \lambda_a) & \text{if } s=u=a,\ t=v,\\ \lambda_a^{-2}(\lambda_s - \lambda_a) & \text{if } s=u,\ t=v=a,\\ 0 & \text{otherwise.} \end{cases}$$
A.3. Proof of Corollary 1
As we will see in the next subsection,
$$\sum_{s<t,\,u<v,\,o<p,\,q<r} H^m_{(s,t)(u,v)a} H^m_{(o,p)(q,r)b}\, g^{(s,t)(o,p)} g^{(u,v)(q,r)} = \begin{cases} \dfrac{1}{\lambda_a^2}\displaystyle\sum_{t \ne a} \dfrac{\lambda_t^2}{(\lambda_t - \lambda_a)^2} & \text{if } a = b,\\[3mm] -\dfrac{1}{(\lambda_a - \lambda_b)^2} & \text{if } a \ne b. \end{cases}$$
Combining this with Proposition 1, we have
$$\gamma(\mathcal{A}) = 2\sum_a \sum_{t \ne a} \frac{\lambda_t^2}{(\lambda_t - \lambda_a)^2} = 2\sum_{a<b} \frac{\lambda_a^2 + \lambda_b^2}{(\lambda_a - \lambda_b)^2}.$$
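The closed form makes the singular behaviour at eigenvalue multiplicities explicit: $\gamma(\mathcal{A})$ diverges as any two eigenvalues coalesce. A tiny numerical illustration (the function name is ours):

```python
from itertools import combinations

def stat_curvature(lams):
    # gamma(A) = 2 * sum_{a < b} (lambda_a^2 + lambda_b^2) / (lambda_a - lambda_b)^2
    return 2.0 * sum((la**2 + lb**2) / (la - lb)**2
                     for la, lb in combinations(lams, 2))
```

For $\lambda = (2, 1)$ this gives $2(4+1)/1 = 10$; shrinking the gap to $(1.01, 1)$ inflates the curvature by more than three orders of magnitude, the quantitative counterpart of Remark 1 in Section 6.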
A.4. Proof of Proposition 3
We calculate each term in (29). $g^{ab} = 2\lambda_a^2\,\delta(a=b)$ from Proposition 1. Because of (22) and (26),
$$(\Gamma^m_M)^2_{ab} = (H^e_M)^2_{ab} = 0.$$
We next calculate $(H^m_A)^2_{ab}$, with indices raised by $g^{ab}$, i.e. $H^{m\,a}_{(s,t)(u,v)} \triangleq g^{ab}H^m_{(s,t)(u,v)b}$:
$$\begin{aligned}
(H^m_A)^2_{ab} &= \sum_{s<t,\,u<v,\,o<p,\,q<r} H^{m\,a}_{(s,t)(u,v)} H^{m\,b}_{(o,p)(q,r)}\, g^{(s,t)(o,p)} g^{(u,v)(q,r)}\\
&= \sum_{t>a,\,p>b} H^{m\,a}_{(a,t)(a,t)} H^{m\,b}_{(b,p)(b,p)} \bigl(g^{(a,t)(b,p)}\bigr)^2 + \sum_{t>a,\,p<b} H^{m\,a}_{(a,t)(a,t)} H^{m\,b}_{(p,b)(p,b)} \bigl(g^{(a,t)(p,b)}\bigr)^2\\
&\quad + \sum_{t<a,\,p>b} H^{m\,a}_{(t,a)(t,a)} H^{m\,b}_{(b,p)(b,p)} \bigl(g^{(t,a)(b,p)}\bigr)^2 + \sum_{t<a,\,p<b} H^{m\,a}_{(t,a)(t,a)} H^{m\,b}_{(p,b)(p,b)} \bigl(g^{(t,a)(p,b)}\bigr)^2. \tag{56}
\end{aligned}$$
If $a = b$, then the r.h.s. of (56) equals
$$\sum_{t>a} \bigl(H^{m\,a}_{(a,t)(a,t)}\bigr)^2 \bigl(g^{(a,t)(a,t)}\bigr)^2 + \sum_{t<a} \bigl(H^{m\,a}_{(t,a)(t,a)}\bigr)^2 \bigl(g^{(t,a)(t,a)}\bigr)^2 = \sum_{t>a} \bigl(2(\lambda_t - \lambda_a)\bigr)^2 \left(\frac{\lambda_a\lambda_t}{(\lambda_a - \lambda_t)^2}\right)^2 + \sum_{t<a} \bigl(2(\lambda_t - \lambda_a)\bigr)^2 \left(\frac{\lambda_a\lambda_t}{(\lambda_a - \lambda_t)^2}\right)^2 = 4\sum_{t \ne a} \frac{\lambda_a^2\lambda_t^2}{(\lambda_a - \lambda_t)^2}.$$
If $a \ne b$ (say $a < b$), then the r.h.s. of (56) equals
$$H^{m\,a}_{(a,b)(a,b)} H^{m\,b}_{(a,b)(a,b)} \bigl(g^{(a,b)(a,b)}\bigr)^2 = 4(\lambda_b - \lambda_a)(\lambda_a - \lambda_b)\left(\frac{\lambda_a\lambda_b}{(\lambda_a - \lambda_b)^2}\right)^2 = -\frac{4\lambda_a^2\lambda_b^2}{(\lambda_a - \lambda_b)^2}.$$
A.5. Proof of Proposition 4
The term of order $n$ in (34) vanishes, since $g_{a(s,t)}$ equals zero for $1 \le a \le p$, $1 \le s < t \le p$. We consider the term of order $O(1)$. Since $H^e_{ac(s,t)}$ also vanishes for $1 \le a, c \le p$, $1 \le s < t \le p$, we only have to consider the term
$$(1/2)\sum_{s<t,\,u<v,\,o<p,\,q<r} H^m_{(s,t)(u,v)a} H^m_{(o,p)(q,r)b}\, g^{(s,t)(o,p)} g^{(u,v)(q,r)}.$$
Because of (18), the above term equals
$$\begin{aligned}
&2^{-1}\sum_{t>a,\,p>b} H^m_{(a,t)(a,t)a} H^m_{(b,p)(b,p)b} \bigl(g^{(a,t)(b,p)}\bigr)^2 + 2^{-1}\sum_{t>a,\,p<b} H^m_{(a,t)(a,t)a} H^m_{(p,b)(p,b)b} \bigl(g^{(a,t)(p,b)}\bigr)^2\\
&\quad + 2^{-1}\sum_{t<a,\,p>b} H^m_{(t,a)(t,a)a} H^m_{(b,p)(b,p)b} \bigl(g^{(t,a)(b,p)}\bigr)^2 + 2^{-1}\sum_{t<a,\,p<b} H^m_{(t,a)(t,a)a} H^m_{(p,b)(p,b)b} \bigl(g^{(t,a)(p,b)}\bigr)^2. \tag{57}
\end{aligned}$$
If $a = b$, then (57) equals
$$\begin{aligned}
&2^{-1}\sum_{t>a} \bigl(H^m_{(a,t)(a,t)a}\bigr)^2 \bigl(g^{(a,t)(a,t)}\bigr)^2 + 2^{-1}\sum_{t<a} \bigl(H^m_{(t,a)(t,a)a}\bigr)^2 \bigl(g^{(t,a)(t,a)}\bigr)^2\\
&= 2^{-1}\left\{\sum_{t>a} \bigl(\lambda_a^{-2}(\lambda_t - \lambda_a)\bigr)^2 \left(\frac{\lambda_a\lambda_t}{(\lambda_a - \lambda_t)^2}\right)^2 + \sum_{t<a} \bigl(\lambda_a^{-2}(\lambda_t - \lambda_a)\bigr)^2 \left(\frac{\lambda_a\lambda_t}{(\lambda_a - \lambda_t)^2}\right)^2\right\}\\
&= 2^{-1}\sum_{t \ne a} \frac{\lambda_t^2}{\lambda_a^2(\lambda_a - \lambda_t)^2}.
\end{aligned}$$
If $a < b$, then (57) equals
$$2^{-1} H^m_{(a,b)(a,b)a} H^m_{(a,b)(a,b)b} \bigl(g^{(a,b)(a,b)}\bigr)^2 = 2^{-1}\lambda_a^{-2}(\lambda_b - \lambda_a)\,\lambda_b^{-2}(\lambda_a - \lambda_b)\left(\frac{\lambda_a\lambda_b}{(\lambda_a - \lambda_b)^2}\right)^2 = -\frac{1}{2(\lambda_a - \lambda_b)^2}.$$
References
Amari, S., 1982. Differential geometry of curved exponential families—curvature and information loss. Ann. Statist. 10, 357–385.
Amari, S., 1985. Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer-Verlag, Berlin, Heidelberg.
Amari, S., Nagaoka, H., 2000. Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. American Mathematical Society, Providence.
Amari, S., Kumon, M., 1983. Differential geometry of Edgeworth expansions in curved exponential family. Ann. Inst. Statist. Math. 35, 1–24.
Anderson, G.A., 1965. An asymptotic expansion for the distribution of the latent roots of the estimated covariance matrix. Ann. Math. Statist. 36, 1153–1173.
Boothby, W.M., 2002. An Introduction to Differentiable Manifolds and Riemannian Geometry, revised 2nd ed. Academic Press, San Diego.
Calvo, M., Oller, J.M., 1990. A distance between multivariate normal distributions based on an embedding into Siegel group. J. Multivariate Anal. 35, 223–242.
Dey, D.K., 1988. Simultaneous estimation of eigenvalues. Ann. Inst. Statist. Math. 40, 137–147.
Dey, D.K., Srinivasan, C., 1985. Estimation of a covariance matrix under Stein's loss. Ann. Statist. 13, 1581–1591.
Efron, B., 1975. Defining the curvature of a statistical problem (with application to second order efficiency) (with discussion). Ann. Statist. 3, 1189–1242.
Eguchi, S., 1985. A differential geometric approach to statistical inference on the basis of contrast functionals. Hiroshima Math. J. 15, 341–391.
Fletcher, P.T., Joshi, S., 2007. Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Process. 87, 250–262.
Haff, L.R., 1991. The variational form of certain Bayes estimators. Ann. Statist. 19, 1163–1190.
Hydorn, D.L., Muirhead, R.J., 1999. Polynomial estimation of eigenvalues. Commun. Statist. Theory Methods 28, 581–596.
Jin, C., 1993. A note on simultaneous estimation of eigenvalues of a multivariate normal covariance matrix. Statist. Probab. Lett. 16, 197–203.
Kumon, M., Amari, S., 1983. Geometrical theory of higher-order asymptotics of test, interval estimator and conditional inference. Proc. Roy. Soc. London A 387, 429–458.
Lawley, D.N., 1956. Test of significance for the latent roots of covariance and correlation matrices. Biometrika 43, 128–136.
Lenglet, C., Rousson, M., Deriche, R., Faugeras, O., 2006. Statistics on the manifold of multivariate normal distributions: theory and application to diffusion tensor MRI processing. J. Math. Imaging Vis. 25, 423–444.
Lovrić, M., Min-Oo, M., Ruh, E.A., 2000. Multivariate normal distributions parametrized as a Riemannian symmetric space. J. Multivariate Anal. 74, 36–48.
Moakher, M., Zéraï, M., 2011. The Riemannian geometry of the space of positive-definite matrices and its application to the regularization of positive-definite matrix-valued data. J. Math. Imaging Vis. 40, 171–187.
Muirhead, R.J., 1982. Aspects of Multivariate Statistical Theory. Wiley, New York.
Murray, M.K., Rice, J.W., 1993. Differential Geometry and Statistics. Chapman & Hall/CRC, Boca Raton.
Ohara, A., Suda, N., Amari, S., 1996. Dualistic differential geometry of positive definite matrices and its applications to related problems. Linear Algebra Appl. 247, 31–53.
Sheena, Y., Takemura, A., 2011. Admissible estimator of the eigenvalues of the variance–covariance matrix for multivariate normal distributions. J. Multivariate Anal. 102, 801–815.
Skovgaard, L.T., 1984. A Riemannian geometry of the multivariate normal model. Scand. J. Statist. 11, 211–233.
Smith, S.T., 2005. Covariance, subspace, and intrinsic Cramér–Rao bounds. IEEE Trans. Signal Process. 53, 1610–1630.
Takemura, A., 1984. An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population. Tsukuba J. Math. 8, 367–376.
Yang, R., Berger, J.O., 1994. Estimation of a covariance matrix using the reference prior. Ann. Statist. 22, 1195–1221.
Yoshizawa, S., Tanabe, K., 1999. Dual differential geometry associated with the Kullback–Leibler information on the Gaussian distributions and its 2-parameter deformations. SUT J. Math. 35, 113–137.
Zhang, S., Sun, H., Li, C., 2009. Information geometry of positive definite matrices. J. Beijing Inst. Technol. 18, 484–487.