Statistical Inference
Luo Shan
Department of Mathematics
Shanghai Jiao Tong University
March 31st, 2015
Luo Shan (SJTU), March 31st, 2015
Lecture 5
1 What is in the last lecture
2 Asymptotic Criteria and Inference
    Consistency
    Asymptotic bias, variance and mse
3 UMVUE
    Sufficient and complete statistics
    A necessary and sufficient condition
    Information inequality
What is in the last lecture
Sufficiency, minimal sufficiency, complete statistics, statistical inference.
Today's homework 1: Exercises 34, 47, 97 (pages 147-156).
Asymptotic Criteria and Inference: Consistency
Definition 1 (consistency of point estimators)
Let X1, X2, · · · , Xn be a sample from P ∈ P and Tn(X) be a point estimator of ν for every n.
(i) Tn(X) is called consistent for ν iff Tn(X) →p ν w.r.t. any P ∈ P (weak consistency).
(ii) Let {an} be a sequence of positive constants diverging to ∞. Tn(X) is called an-consistent for ν iff an(Tn(X) − ν) = Op(1) w.r.t. any P ∈ P.
(iii) Tn(X) is called strongly consistent for ν iff Tn(X) →a.s. ν w.r.t. any P ∈ P.
(iv) Tn(X) is called Lr-consistent for ν iff Tn(X) →Lr ν w.r.t. any P ∈ P for some fixed r > 0. (When r = 2, it is also called consistency in mse.)
The LLN, the CLT, Slutsky's theorem and the continuous mapping theorem are typically applied to establish consistency of point estimators.
Example 2
Let X1, X2, · · · , Xn be i.i.d. from P ∈ P.
- If ν = µ = EX1 is finite, then by the SLLN, X̄ is strongly consistent for µ.
- If σ² = var(X1) is finite, then by the CLT, X̄ is √n-consistent for µ; S² is strongly consistent for σ² by the SLLN.
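The two claims above can be illustrated numerically. The following sketch (our own illustration with arbitrary µ, σ, not part of the lecture) estimates the average error of X̄ at two sample sizes and checks the 1/√n decay implied by √n-consistency.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0

# Average |Xbar - mu| over many replications: consistency says it shrinks,
# and sqrt(n)-consistency says it shrinks at the rate 1/sqrt(n).
def mean_abs_error(n, reps=500):
    x = rng.normal(mu, sigma, size=(reps, n))
    return np.abs(x.mean(axis=1) - mu).mean()

e1, e2 = mean_abs_error(100), mean_abs_error(10_000)
print(e1, e2, e1 / e2)  # ratio should be near sqrt(10_000/100) = 10
```

Increasing n by a factor of 100 shrinks the average error by roughly a factor of 10, the √n rate.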
Example 3
Let X1, X2, · · · , Xn be i.i.d. from an unknown P with a continuous c.d.f. F satisfying F(θ) = 1 for some θ ∈ R and F(x) < 1 for any x < θ (e.g., U(0, θ)). For any ε > 0, F(θ − ε) < 1 and P(|X(n) − θ| ≥ ε) = [F(θ − ε)]^n, which is summable in n; by the Borel-Cantelli lemma, X(n) is strongly consistent for θ. If we assume that F^(i)(θ−) vanishes for i ≤ m and F^(m+1)(θ−) ≠ 0, then

1 − F(X(n)) = [(−1)^m F^(m+1)(θ−)/(m + 1)!] (θ − X(n))^(m+1) + o(|θ − X(n)|^(m+1)) a.s. (1)

Note that P(n[1 − F(X(n))] ≥ s) = (1 − s/n)^n → e^(−s), which implies n[1 − F(X(n))] = Op(1) and hence n(θ − X(n))^(m+1) = Op(1). If m = 0, then X(n) is n-consistent; if m = 1, then it is √n-consistent.
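For the U(0, θ) case of Example 3 (where m = 0), the n-consistency of X(n) can be checked by simulation. This sketch (ours, with illustrative parameter values) shows that the scaled gap n(θ − X(n)) stays on a fixed scale as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 5.0, 500

# For U(0, theta), theta - X(n) has mean theta/(n+1), so the scaled gap
# n*(theta - X(n)) should hover near theta for every n (Op(1) behaviour).
def scaled_gap(n):
    xmax = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
    return (n * (theta - xmax)).mean()

g1, g2 = scaled_gap(100), scaled_gap(10_000)
print(g1, g2)  # both near theta = 5
```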
Remark: In statistical theory, inconsistent estimators should not be used, but a consistent estimator is not necessarily good. Consistency should be used together with one or a few more criteria.
Asymptotic Criteria and Inference: Asymptotic bias, variance and mse
(i) In some cases, there is no unbiased estimator. For example, assume X follows the binomial distribution Bi(n, p), where n is known and p ∈ (0, 1) is unknown; then there is no unbiased estimator for ν = p^(−1).
(ii) There are many reasonable point estimators whose expectations are not well defined. For example, if (X1, Y1), · · · , (Xn, Yn) are i.i.d. from a bivariate normal distribution, the expectation of the ratio estimator Tn = X̄/Ȳ of ν = µx/µy is not defined for any n.
Definition 4
- If E(Tn) exists for every n and lim_{n→∞} E(Tn − ν) = 0 for any P ∈ P, then Tn is said to be approximately unbiased.
- Let ξ1, ξ2, · · · be random variables and {an} be a sequence of positive numbers satisfying an → ∞ or an → a > 0. If an ξn →d ξ and E|ξ| < ∞, then Eξ/an is called an asymptotic expectation of ξn.
- Let Tn be a point estimator of ν for every n. An asymptotic expectation of Tn − ν, if it exists, is called an asymptotic bias of Tn and is denoted by b_Tn(P); if lim_{n→∞} b_Tn(P) = 0 for any P ∈ P, then Tn is said to be asymptotically unbiased.
- Suppose that there is a sequence of random variables {ηn} such that an ηn →d Y and an²(Tn − ν − ηn) →d W, where Y, W are random variables with finite means, EY = 0 and EW ≠ 0. Then we may define an^(−2) to be the order of b_Tn(P), or define EW/an² to be the an^(−2) order asymptotic bias of Tn.
Example 5
Let X1, X2, · · · , Xn be i.i.d. random k-vectors with finite Σ = var(X1). Let X̄ = n^(−1) Σ_{i=1}^n Xi and Tn = g(X̄), where g is a function on R^k that is second-order differentiable at µ = EX1 ∈ R^k. Consider Tn as an estimator of ν = g(µ). By Taylor's expansion,
Tn − ν = [∇g(µ)]^τ (X̄ − µ) + (1/2)(X̄ − µ)^τ ∇²g(µ)(X̄ − µ) + op(1/n).
Since n(X̄ − µ)^τ ∇²g(µ)(X̄ − µ) →d ZΣ^τ ∇²g(µ) ZΣ, where ZΣ ∼ Nk(0, Σ), E(ZΣ^τ ∇²g(µ) ZΣ)/(2n) is the n^(−1) order asymptotic bias of Tn = g(X̄).
Definition 6
Let Tn be an estimator of ν for every n and {an} be a sequence of positive numbers satisfying an → ∞ or an → a > 0. Assume that an(Tn − ν) →d Y with 0 < EY² < ∞.
- The asymptotic mean squared error of Tn, denoted by amse_Tn(P) (or amse_Tn(θ) if P is in a parametric family indexed by θ), is defined to be the asymptotic expectation of (Tn − ν)², i.e., EY²/an². The asymptotic variance of Tn is defined to be var(Y)/an².
- Let T′n be another estimator of ν. The asymptotic relative efficiency of T′n w.r.t. Tn is defined to be e_{T′n,Tn}(P) = amse_Tn(P)/amse_T′n(P).
- Tn is said to be asymptotically more efficient than T′n iff lim sup_n e_{T′n,Tn}(P) ≤ 1 for any P and < 1 for some P.
UMVUE
Unbiased estimation
Unbiased or asymptotically unbiased estimation plays an important role in point estimation theory. How do we derive and find the best unbiased estimators?
Definition 7
X is a sample from an unknown population P ∈ P, and ν is a real-valued parameter related to P. An estimator T(X) of ν is unbiased iff ET(X) = ν for any P ∈ P. If there exists an unbiased estimator of ν, then ν is called an estimable parameter.
Definition 8 (UMVUE)
An unbiased estimator T(X) of ν is called the uniformly minimum variance unbiased estimator (UMVUE) iff var(T(X)) ≤ var(U(X)) for any P ∈ P and any other unbiased estimator U(X) of ν.
Theorem 9 (Rao-Blackwell Theorem)
Assume that A is a convex subset of R^k and that, for any P ∈ P, L(P, a) is a convex function of a. Suppose also that T is a sufficient statistic for P ∈ P and that T0 is a nonrandomized decision rule satisfying E‖T0‖ < ∞, and let T1 = E(T0(X)|T). Then T1 is R-equivalent to or R-better than T0 for any P ∈ P.
The derivation of a UMVUE is relatively simple if there exists a sufficient and complete statistic for P ∈ P.
Theorem 10 (Lehmann-Scheffé Theorem)
Suppose that there exists a sufficient and complete statistic T(X) for P ∈ P. If ν is estimable, then there is a unique unbiased estimator of ν that is of the form h(T) with a Borel function h. Furthermore, h(T) is the unique UMVUE of ν. (Two estimators that are equal a.s. P are treated as one estimator.)
Remark: Both of these theorems can be considered as applications of Jensen's inequality.
UMVUE: Sufficient and complete statistics
How to derive a UMVUE when a sufficient and complete statistic T is available?
The first method: directly solving for h
- Need the distribution of T.
- Try some function h to see if E[h(T)] is related to ν.
- If E[h(T)] = ν for all P, what should h be?
Example 11
Let X1, X2, · · · , Xn be i.i.d. from the uniform distribution on (0, θ), θ > 0. Consider ν = θ. The sufficient statistic X(n) has the Lebesgue p.d.f. nθ^(−n) x^(n−1) I(0,θ)(x). Suppose Ef(X(n)) = 0; then nθ^(−n) ∫_0^θ f(x) x^(n−1) dx = 0 for all θ > 0. Differentiating w.r.t. θ gives f(θ)θ^(n−1) = 0 for a.e. θ > 0, so f = 0 a.e. Therefore, X(n) is also a complete statistic.
Example 11 continued.
Consider ν = g(θ), where g is a differentiable function on (0, ∞). An unbiased estimator h(X(n)) of ν must satisfy g(θ) = nθ^(−n) ∫_0^θ h(x) x^(n−1) dx. Differentiating both sides of the equation leads to ng(θ) + θg′(θ) = nh(θ) for all θ > 0. Hence, the UMVUE of ν is h(X(n)) = g(X(n)) + n^(−1) X(n) g′(X(n)). For example, if g(θ) = θ, the UMVUE of θ is (n + 1)X(n)/n.
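A quick Monte Carlo check of Example 11 (our own sketch, with illustrative θ and n): (n + 1)X(n)/n averages to θ, while X(n) itself is biased low.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 10, 200_000

xmax = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
umvue = (n + 1) * xmax / n        # UMVUE of theta from Example 11

print(xmax.mean())   # near n*theta/(n+1) = 30/11, biased low
print(umvue.mean())  # near theta = 3, unbiased
```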
Example 12
Let X1, X2, · · · , Xn be i.i.d. from the Poisson distribution P(θ) with an unknown parameter θ > 0.
- The joint p.d.f. is exp(−nθ + (Σ_{i=1}^n xi) ln θ)(Π_{i=1}^n xi!)^(−1); T(X) = Σ_{i=1}^n Xi is sufficient and complete for θ > 0 and has the Poisson distribution P(nθ) (by comparing their ch.f.'s).
- Suppose that ν = g(θ), where g is a smooth function such that g(x) = Σ_{j=0}^∞ aj x^j, x > 0. An unbiased estimator h(T) of ν must satisfy Σ_{t=0}^∞ e^(−nθ) (nθ)^t h(t)/t! = Σ_{j=0}^∞ aj θ^j for all θ > 0, i.e.,

Σ_{t=0}^∞ (nθ)^t h(t)/t! = [Σ_{i=0}^∞ (nθ)^i/i!][Σ_{j=0}^∞ aj θ^j].

By comparing the coefficients of θ^t for all t ≥ 0, we obtain h(t) = (t!/n^t) Σ_{i,j: i+j=t} n^i aj/i! for any nonnegative integer t. For example, if ν = θ^r for r ≥ 1, then
h(t) = 0 for t < r, and h(t) = t!/[n^r (t − r)!] for t ≥ r.
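For r = 2 the formula above reduces to h(T) = T(T − 1)/n², a falling factorial divided by n^r. A simulation (our own sketch, illustrative θ and n) confirms unbiasedness for θ².

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 5, 400_000

# T = sum of X_i ~ Poisson(n*theta); the UMVUE of theta^2 is T(T-1)/n^2,
# the r = 2 case of h(t) = t!/(n^r (t-r)!)  (which is 0 for t < r).
t = rng.poisson(n * theta, size=reps)
h = t * (t - 1) / n**2
print(h.mean())  # near theta^2 = 4
```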
Example 13
Let X1, X2, · · · , Xn be i.i.d. from a power series distribution, i.e., P(Xi = x) = γ(x)θ^x/c(θ), x = 0, 1, 2, · · · , with a known function γ(x) ≥ 0 and an unknown parameter θ > 0.
- T(X) = Σ_{i=1}^n Xi is sufficient and complete for θ > 0, and it is also in a power series distribution family: P(T(X) = x) = γn(x)θ^x/[c(θ)]^n, where [c(θ)]^n = Σ_{x=0}^∞ γn(x)θ^x.
- Consider ν = g(θ) = θ^r/[c(θ)]^p, where r, p are nonnegative integers. Suppose the UMVUE is h(T); then Σ_{x=0}^∞ γn(x)θ^x h(x)/[c(θ)]^n = θ^r/[c(θ)]^p, i.e., Σ_{x=0}^∞ γn(x)h(x)θ^x = Σ_{x=0}^∞ γ_{n−p}(x)θ^(x+r). Hence
h(x) = 0 for x < r, and h(x) = γ_{n−p}(x − r)/γn(x) for x ≥ r.
Example 14
Let X1, X2, · · · , Xn be i.i.d. from an unknown population P in a nonparametric family P. Suppose the vector of order statistics, T = (X(1), X(2), · · · , X(n)), is sufficient and complete (for example, when P is the family of distributions on R having Lebesgue p.d.f.'s). Since an estimator ϕ(X1, · · · , Xn) is a function of T iff the function ϕ is symmetric in its n arguments, a symmetric unbiased estimator of any estimable ν is the UMVUE. For example, X̄ is the UMVUE of ν = EX1; S² is the UMVUE of var(X1); Fn(t) is the UMVUE of P(X1 ≤ t) for any fixed t.
Remark: If T is not sufficient and complete for P ∈ P (for example, if n > 2 and P contains all symmetric distributions having Lebesgue p.d.f.'s and finite means), then there is no UMVUE for ν = EX1. (Today's homework 1)
The second method: conditioning
- Find an unbiased estimator of ν, say U(X).
- Condition on a sufficient and complete statistic T(X): then E(U(X)|T(X)) is the UMVUE of ν.
Remark: By the uniqueness of the UMVUE, it does not matter which U(X) is used; thus, we should choose U(X) so as to make the calculation of E(U(X)|T(X)) as easy as possible.
Example 15
Let X1, X2, · · · , Xn be i.i.d. from the exponential distribution E(0, θ), Fθ(x) = (1 − e^(−x/θ)) I(0,∞)(x). Consider ν = 1 − Fθ(t) = e^(−t/θ) for a fixed t > 0.
X̄ is sufficient and complete for θ, and U(X) = I(t,∞)(X1) is an unbiased estimator of ν; hence E(U(X)|X̄) = P(X1 > t|X̄) is the UMVUE of ν. Since X1/X̄ is an ancillary statistic for θ (scale family), by Basu's theorem, X1/X̄ and X̄ are independent, so P(X1 > t|X̄ = x̄) equals P(X1/X̄ > t/x̄). Note that Σ_{i=2}^n Xi is independent of X1 and follows the distribution Γ(n − 1, θ), whose Lebesgue p.d.f. is x^(n−2) θ^(−(n−1))/Γ(n − 1) · exp(−x/θ). For nx̄ > t,

P(X1/X̄ > t/x̄) = P((nx̄ − t)X1 > t Σ_{i=2}^n Xi)
= ∫_0^∞ exp(−tx/((nx̄ − t)θ)) · x^(n−2)/(Γ(n − 1)θ^(n−1)) · exp(−x/θ) dx
= (1 − t/(nx̄))^(n−1).

The UMVUE of ν is (1 − t/(nX̄))^(n−1) (interpreted as 0 on the event nX̄ ≤ t, where P(X1 > t|X̄) = 0).
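A Monte Carlo check of Example 15 (our own sketch, illustrative θ, n, t): the estimator (1 − t/(nX̄))^(n−1) averages to e^(−t/θ).

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, t, reps = 2.0, 8, 1.0, 200_000

x = rng.exponential(theta, size=(reps, n))
nxbar = x.sum(axis=1)                      # n * Xbar
# UMVUE of P(X1 > t) = exp(-t/theta); set to 0 on the event n*Xbar <= t.
umvue = np.where(nxbar > t, (1 - t / nxbar) ** (n - 1), 0.0)
print(umvue.mean(), np.exp(-t / theta))  # both near 0.6065
```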
Example 16
Let X1, X2, · · · , Xn be i.i.d. from N(µ, σ²) with unknown µ and σ² > 0. T = (X̄, S²) is sufficient and complete for θ = (µ, σ²). Note that: (i) X̄ and S² are independent; (ii) X̄ ∼ N(µ, σ²/n); (iii) (n − 1)S²/σ² ∼ χ²(n − 1), whose Lebesgue p.d.f. is x^((n−1)/2−1) e^(−x/2) 2^(−(n−1)/2)/Γ((n − 1)/2), where Γ(t) = ∫_0^∞ x^(t−1) e^(−x) dx for t > 0. Moreover, for m > (1 − n)/2,

E[((n − 1)S²/σ²)^m] = 2^m Γ((n − 1)/2 + m)/Γ((n − 1)/2). (2)

Denote k_{n,r} = n^(r/2) Γ(n/2)/(2^(r/2) Γ((n + r)/2)); then σ^r = k_{n−1,r} E(S^r) for r > 1 − n.
By solving for h directly, we can find that: (i) the UMVUE of µ is X̄; (ii) the UMVUE of µ² is X̄² − S²/n; (iii) the UMVUE of σ^r with r > 1 − n is k_{n−1,r} S^r; (iv) the UMVUE of µ/σ is k_{n−1,−1} X̄/S, if n > 2; (v) if ν satisfies P(X1 ≤ ν) = p with a fixed p ∈ (0, 1), let Φ be the c.d.f. of N(0, 1); then ν = µ + σΦ^(−1)(p) and its UMVUE is X̄ + k_{n−1,1} S Φ^(−1)(p).
Example 16 cont.
Let c be a fixed constant and ν = P(X1 ≤ c) = Φ((c − µ)/σ). Choose U(X) = I(−∞,c)(X1); then E(U(X)|T) = P(X1 ≤ c|T) is the UMVUE of ν. Z(X) = (X1 − X̄)/S is an ancillary statistic, and by Basu's theorem it is independent of T. P(X1 ≤ c|T = (x̄, s)) = P(Z(X) ≤ (c − x̄)/s). The Lebesgue p.d.f. of Z is

f(z) = [√n Γ((n − 1)/2) / (√π (n − 1) Γ((n − 2)/2))] [1 − nz²/(n − 1)²]^((n/2)−2) I(0,(n−1)/√n)(|z|). (3)

(Today's homework 2: prove (3))
Hence the UMVUE of ν is P(X1 ≤ c|T) = ∫_{−(n−1)/√n}^{(c−X̄)/S} f(z) dz.
Example 16 cont.
Let ν = σ^(−1) Φ′((c − µ)/σ), where Φ′ is the first-order derivative of Φ; ν is the Lebesgue p.d.f. of X1 evaluated at a fixed c. The conditional p.d.f. of X1 given T = (X̄, S²) = (x̄, s²) is s^(−1) f((x − x̄)/s), with f as in (3). Hence the UMVUE of ν is (1/S) f((c − X̄)/S).
Remark: The UMVUE does not necessarily have the minimum MSE. Let h(X) = (1/n) Σ_{i=1}^n (Xi − X̄)²; then E(h(X) − σ²)² ≤ var(S²).
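The remark can be checked by simulation (our own sketch, normal data with σ² = 1 and n = 5): the biased (1/n)-divisor estimator has strictly smaller MSE than the unbiased S².

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2, n, reps = 1.0, 5, 400_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)   # unbiased S^2
h = x.var(axis=1, ddof=0)    # (1/n) * sum (Xi - Xbar)^2, biased
mse_s2 = ((s2 - sigma2) ** 2).mean()
mse_h = ((h - sigma2) ** 2).mean()
print(mse_s2, mse_h)  # exact values: 2/(n-1) = 0.5 and (2n-1)/n^2 = 0.36
```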
Example 17
Let X1, X2, · · · , Xn be i.i.d. with Lebesgue p.d.f. fθ(x) = θx^(−2) I(θ,∞)(x), where θ > 0 is unknown.
- X(1) is sufficient and complete for θ. The Lebesgue p.d.f. of X(1) is nθ^n x^(−(n+1)) I(θ,∞)(x), from which completeness can be verified.
- Suppose ν = P(X1 > t) for a constant t > 0. P(X1 > t|X(1)) is the UMVUE of ν, and P(X1 > t|X(1) = x(1)) = P(X1/X(1) > t/x(1)) (Basu's theorem). If t ≤ x(1), then X1 ≥ X(1) = x(1) ≥ t, so P(X1 > t|X(1) = x(1)) = 1. Otherwise, let s = t/x(1) > 1 and, without loss of generality, assume θ = 1. Then

P(X1/X(1) > s) = (n − 1) P(X1/X(1) > s, X(1) = Xn)
= (n − 1) ∫_{x1 > s·xn, x2 > xn, · · · , x_{n−1} > xn} Π_{i=1}^n (1/xi²) dx1 · · · dxn
= (n − 1)/(ns) = (n − 1)x(1)/(nt).

The UMVUE of ν is
h(X(1)) = (n − 1)X(1)/(nt) for X(1) < t, and h(X(1)) = 1 for X(1) ≥ t.
UMVUE: A necessary and sufficient condition
Finding a UMVUE can be difficult if a sufficient and complete statistic is not available. In some cases, the following results can be applied if we have knowledge about the unbiased estimators of 0.
Theorem 18
Let U be the set of all unbiased estimators of 0 with finite variances and T be an unbiased estimator of ν with ET² < ∞.
(i) A necessary and sufficient condition for T(X) to be a UMVUE of ν is that E[T(X)U(X)] = 0 for any U ∈ U and P ∈ P.
(ii) Suppose that T = h(T̃), where T̃ is a sufficient statistic for P ∈ P and h is a Borel function. Let U_T̃ be the subset of U consisting of Borel functions of T̃. Then a necessary and sufficient condition for T to be a UMVUE of ν is that E[T(X)U(X)] = 0 for any U ∈ U_T̃ and P ∈ P.
Proof of Theorem 18.
(i) If T is a UMVUE of ν, then Tc = T + cU, where U ∈ U and c is a fixed constant, is also an unbiased estimator of ν, so var(Tc) ≥ var(T) for all c ∈ R and P ∈ P. Since var(Tc) − var(T) = c²var(U) + 2c·E(TU), this can hold for every c only if E[T(X)U(X)] = 0 for any U ∈ U and P ∈ P. For the converse, let T0 be another unbiased estimator of ν with var(T0) < ∞. Then T − T0 ∈ U, so E[T(T − T0)] = 0 for all P ∈ P. Combined with the fact that ET = ET0, this gives var(T) = cov(T, T0), and hence, by the Cauchy-Schwarz inequality, var(T) ≤ var(T0) for any P ∈ P.
(ii) It suffices to show that E[T(X)U(X)] = 0 for any U ∈ U_T̃ and P ∈ P implies E[T(X)U(X)] = 0 for any U ∈ U and P ∈ P. Let U ∈ U. Then E(U|T̃) ∈ U_T̃, and E(TU) = E[E(TU|T̃)] = E[h(T̃)E(U|T̃)] = 0. □
Remark: If there is a sufficient statistic, then by the Rao-Blackwell theorem we only need to consider functions of the sufficient statistic, so (ii) in Theorem 18 is more convenient to use.
Application I of Theorem 18: to find a UMVUE
Example 19
Let X1, X2, · · · , Xn be i.i.d. from the uniform distribution on the interval (0, θ), with Θ = [1, ∞). X(n) is sufficient for θ but not complete. Its Lebesgue p.d.f. is nθ^(−n) x^(n−1) I(0,θ)(x). Suppose EU(X(n)) = 0; then ∫_0^θ U(x) x^(n−1) dx = 0 for all θ ≥ 1. This implies that U(x) = 0 a.e. Lebesgue measure on [1, ∞) and ∫_0^1 U(x) x^(n−1) dx = 0. Let ν = θ and consider T = h(X(n)); to have E(TU) = 0, we must have ∫_0^1 h(x) U(x) x^(n−1) dx = 0. Consider
h(x) = c for 0 ≤ x ≤ 1, and h(x) = bx for x > 1,
where c, b are some constants. ET = θ implies θ = cP(X(n) ≤ 1) + bE[X(n) I(1,∞)(X(n))] = cθ^(−n) + bn(θ − θ^(−n))/(n + 1) for all θ ≥ 1. Hence c = 1 and b = (n + 1)/n. The UMVUE of θ is then
h(X(n)) = 1 for 0 ≤ X(n) ≤ 1, and h(X(n)) = (n + 1)X(n)/n for X(n) > 1.
Application II of Theorem 18: to show the nonexistence of any UMVUE
Example 20
Let X be a sample (of size 1) from the uniform distribution U(θ − 1/2, θ + 1/2), θ ∈ R. We can show that there is no UMVUE of ν = g(θ) for any nonconstant function g. Any U ∈ U must satisfy ∫_{θ−1/2}^{θ+1/2} U(x) dx = 0 for all θ ∈ R. Differentiating both sides leads to U(x) = U(x + 1) a.e. m, where m is the Lebesgue measure on R. If T is a UMVUE, then E(TU) = 0 for all such U, which forces T(x)U(x) = T(x + 1)U(x + 1) a.e. m and hence T(x) = T(x + 1) a.e. m. Since g(θ) = ∫_{θ−1/2}^{θ+1/2} T(x) dx for all θ ∈ R, we get g′(θ) = T(θ + 1/2) − T(θ − 1/2) = 0 a.e. m, which contradicts the fact that g is a nonconstant function.
Corollary 21
(i) Let Tj be a UMVUE of νj, j = 1, 2, · · · , k, where k is a fixed positive integer. Then Σ_{j=1}^k cj Tj is a UMVUE of ν = Σ_{j=1}^k cj νj for any constants c1, c2, · · · , ck.
(ii) Let T1, T2 be two UMVUEs of ν. Then T1 = T2 a.s. P for any P ∈ P.
These are consequences of Theorem 18.
UMVUE: Information inequality
Theorem 22 (Cramér-Rao lower bound)
Let X = (X1, X2, · · · , Xn) be a sample from P ∈ P = {Pθ : θ ∈ Θ}, where Θ is an open set in R^k. Suppose that T(X) is an estimator with ET(X) = g(θ) being a differentiable function of θ; that Pθ has a p.d.f. fθ w.r.t. a measure ν for all θ ∈ Θ; and that fθ is differentiable as a function of θ and satisfies

(∂/∂θ) ∫ h(x) fθ(x) dν = ∫ h(x) (∂/∂θ) fθ(x) dν, θ ∈ Θ, (4)

for h(x) ≡ 1 and h(x) = T(x). Then

var(T(X)) ≥ [(∂/∂θ)g(θ)]^τ [I(θ)]^(−1) (∂/∂θ)g(θ) (information inequality), (5)

where I(θ) = E{(∂/∂θ) log fθ(X) [(∂/∂θ) log fθ(X)]^τ} is assumed to be positive definite for any θ ∈ Θ. It is called the Fisher information matrix. The greater I(θ) is, the easier it is to distinguish θ from neighboring values, and the more accurately θ can be estimated.
Proof of Theorem 22.
Taking h(x) ≡ 1 in (4) gives E[(∂/∂θ) log fθ(X)] = 0; taking h(x) = T(x) gives E[T(X)(∂/∂θ) log fθ(X)] = (∂/∂θ)g(θ). Denote qθ(X) = (∂/∂θ) log fθ(X). It suffices to show that

var(T(X)) ≥ [cov(T(X), qθ(X))]^τ [var(qθ(X))]^(−1) cov(T(X), qθ(X)). (6)

Denote Uθ(X) = T(X) − [cov(T(X), qθ(X))]^τ [var(qθ(X))]^(−1) qθ(X); then (6) follows from observing that var(Uθ(X)) ≥ 0. □
Remark: I(θ) depends on the particular parametrization. If θ = ψ(η) and ψ(η) is differentiable, then the Fisher information that X contains about η is (∂/∂η)ψ(η) I(ψ(η)) [(∂/∂η)ψ(η)]^τ.
Proposition
(i) If X, Y are independent with Fisher information matrices IX(θ), IY(θ), then the Fisher information about θ contained in (X, Y) is IX(θ) + IY(θ). In particular, if X1, X2, · · · , Xn are i.i.d. and I1(θ) is the Fisher information about θ contained in X1, then the Fisher information about θ contained in X1, X2, · · · , Xn is nI1(θ).
(ii) Suppose that X has the p.d.f. fθ that is twice differentiable in θ and that (4) holds with h(x) ≡ 1 and fθ replaced by ∂fθ/∂θ. Then I(θ) = −E[(∂²/∂θ∂θ^τ) log fθ(X)].
Example 23
Let X1, X2, · · · , Xn be i.i.d. with the Lebesgue p.d.f. (1/σ) f((x − µ)/σ), where f(x) > 0 and f′(x) exists for all x ∈ R, µ ∈ R, σ > 0. Let θ = (µ, σ). Then the Fisher information about θ contained in X1, X2, · · · , Xn is

I(θ) = (n/σ²) ×
  [ ∫ [f′(x)]²/f(x) dx                    ∫ f′(x)[xf′(x) + f(x)]/f(x) dx ]
  [ ∫ f′(x)[xf′(x) + f(x)]/f(x) dx        ∫ [xf′(x) + f(x)]²/f(x) dx    ]    (7)

For example, let f(x) = (1/√(2π)) exp(−x²/2). If θ = (µ, σ), then
I(θ) = (n/σ²) [ 1  0 ; 0  2 ];
if θ = (µ, σ²), then
I(θ) = (n/σ²) [ 1  0 ; 0  1/(2σ²) ].
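The entries of (7) for the standard normal f can be verified numerically (our own sketch): the three integrals equal 1, 0 and 2, giving I(θ) = (n/σ²) diag(1, 2) for θ = (µ, σ).

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 400_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
fp = -x * f                                   # its derivative f'(x)

a11 = np.sum(fp**2 / f) * dx                  # -> 1  (mu, mu entry)
a12 = np.sum(fp * (x * fp + f) / f) * dx      # -> 0  (off-diagonal)
a22 = np.sum((x * fp + f) ** 2 / f) * dx      # -> 2  (sigma, sigma entry)
print(a11, a12, a22)
```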
Proposition
Suppose that the distribution of X is from an exponential family {fθ : θ ∈ Θ}, i.e., the p.d.f. of X w.r.t. a σ-finite measure is

fθ(x) = exp{[η(θ)]^τ T(x) − ξ(θ)} c(x), (8)

where Θ is an open subset of R^k.
(i) The regularity condition (4) is satisfied for any h with E|h(X)| < ∞, and I(θ) = −E[(∂²/∂θ∂θ^τ) log fθ(X)].
(ii) If Ī(η) is the Fisher information matrix for the natural parameter η, then the variance-covariance matrix var(T) = Ī(η).
(iii) If I(ν) is the Fisher information matrix for the parameter ν = ET(X), then var(T) = [I(ν)]^(−1).
UMVUE Information inequality
Proof.
(ii) fη(x) = exp{ητT (x)− ζ(η)}c(x),∂
∂ηlog fη(x) = T (x)− ∂
∂ηζ(η) =
T (x)− ET (X ), var(T ) =∂2
∂η∂ητ.
(iii) I¯(η) =
∂ν
∂ηI (ν)(
∂ν
∂η)τ =
∂2
∂η∂ητI (ν)[
∂2
∂η∂ητ]τ
�
Remark: for exponential family, the variance of any linear function of Tattains the Cramer-Rao lower bound. The following result gives anecessary condition for var(U(X )) of an estimator U(X ) to attain theCramer-Rao lower bound.
Proposition. Assume that the conditions in Theorem 22 hold with T(X) replaced by U(X) and that Θ ⊂ R.
(i) If var(U(X)) of an estimator U(X) attains the Cramér-Rao lower bound, then a(θ)[U(X) − g(θ)] = g′(θ)(∂/∂θ) log fθ(X) a.s. Pθ for some function a(θ), θ ∈ Θ.
(ii) Let fθ and T be given in (8). If var(U(X)) of an estimator U(X) attains the Cramér-Rao lower bound, then U(X) is a linear function of T(X) a.s. Pθ, θ ∈ Θ.
Let X1, X2, · · · , Xn be i.i.d. from N(µ, σ²), where µ is unknown and σ² is known. X̄ is sufficient and complete, and I(µ) = n/σ².
- For ν = µ: g′ = 1 and var(X̄) = σ²/n = [I(µ)]^(−1), so X̄ attains the Cramér-Rao lower bound.
- For ν = µ²: g′(µ) = 2µ, so the Cramér-Rao lower bound is 4µ²σ²/n. The UMVUE of ν is h(X̄) = X̄² − σ²/n, with var(h(X̄)) = 4µ²σ²/n + 2σ⁴/n², which exceeds the lower bound.
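A final simulation check (our own sketch, with illustrative µ, σ, n): X̄² − σ²/n is unbiased for µ², and its variance sits strictly above the Cramér-Rao bound.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n, reps = 1.5, 2.0, 10, 400_000

xbar = rng.normal(mu, sigma / np.sqrt(n), size=reps)  # Xbar ~ N(mu, sigma^2/n)
h = xbar**2 - sigma**2 / n                            # UMVUE of mu^2
crlb = 4 * mu**2 * sigma**2 / n                       # Cramer-Rao bound: 3.6

print(h.mean(), mu**2)   # unbiased: both near 2.25
print(h.var(), crlb)     # variance near 3.92, strictly above 3.6
```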