Chapter 7: Point Estimation (Part II)

STK4011/9011: Statistical Inference Theory

Johan Pensar

Overview

1. Methods of Evaluating Estimators
   - Mean Squared Error
   - Best Unbiased Estimators
   - Sufficiency and Unbiasedness

Covers Sec 7.3.1–7.3.3 in CB.

Mean Squared Error (MSE)

Definition 7.3.1: The mean squared error (MSE) of an estimator W of a parameter θ is the function of θ defined by Eθ([W − θ]²).

The MSE is tractable analytically and it has a natural interpretation in terms of variance and bias:

Eθ([W − θ]²) = Varθ(W) + (Eθ(W − θ))².
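
A quick way to verify the decomposition is to add and subtract Eθ(W) inside the square:

Eθ([W − θ]²) = Eθ([W − Eθ(W)]²) + 2 Eθ(W − Eθ(W))(Eθ(W) − θ) + (Eθ(W) − θ)² = Varθ(W) + (Eθ(W) − θ)²,

since the cross term vanishes (Eθ(W − Eθ(W)) = 0 and Eθ(W) − θ is a constant).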

Definition 7.3.2: The bias of a point estimator W of a parameter θ is

Biasθ(W) = Eθ(W − θ) = Eθ(W) − θ.

An estimator for which Biasθ(W ) = 0 (that is, Eθ(W ) = θ) for all θ is called unbiased.

MSE, Bias, and Variance

An estimator with good MSE needs to control both variance (random error) and bias (systematic error).

Unbiased estimators are optimal in terms of bias, and the MSE is then equal to the variance:

Eθ([W − θ]²) = Varθ(W).

However, there is typically a bias-variance tradeoff; sometimes, a small increase in bias can be traded for a larger decrease in variance, and thereby a lower MSE.

Example: Normal MSE
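
The worked example itself is not included in the transcript. As an illustrative sketch (my own code, not from the slides), the simulation below compares the unbiased estimator S² of a normal variance with the slightly biased estimator that divides by n instead of n − 1; the small bias is traded for a smaller variance and, here, a smaller MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000   # illustrative sample size, true variance, replications

# Simulate many normal samples and compute both variance estimators.
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)       # unbiased estimator S^2 (divides by n-1)
mle = x.var(axis=1, ddof=0)      # biased estimator (n-1)S^2/n (divides by n)

for name, est in [("S^2", s2), ("MLE", mle)]:
    bias = est.mean() - sigma2
    mse = np.mean((est - sigma2) ** 2)
    print(f"{name}: bias={bias:+.3f}  var={est.var():.3f}  mse={mse:.3f}")
# The divide-by-n estimator has nonzero bias but a smaller variance and a smaller MSE.
```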

Unbiased Estimators

The notion of finding the “best MSE” estimator is problematic in the sense that no such estimator exists in general.

Example: the constant estimator W ≡ 17 has zero MSE when θ = 17, but is a terrible estimator in general.

One way to make the problem tractable is to consider a limited class of estimators.

We are going to focus on the class of unbiased estimators, for which the MSE equals the variance of the estimator, and we choose the estimator with the smallest variance.

In particular, if we can find an unbiased estimator with uniformly smallest variance, we have an optimal unbiased estimator w.r.t. the MSE.

Best Unbiased Estimator

Definition 7.3.7: An estimator W∗ is a best unbiased estimator of τ(θ) if it satisfies

Eθ(W∗) = τ(θ) for all θ,

and for any other estimator W with Eθ(W) = τ(θ), we have that

Varθ(W∗) ≤ Varθ(W) for all θ.

W∗ is also called a uniform minimum variance unbiased estimator (UMVUE) of τ(θ).

Finding a Best Unbiased Estimator

Finding a best unbiased estimator (or UMVUE), if one exists, is not an easy task.

Example: Let X1, . . . ,Xn be iid Poisson(λ).

X̄ and S² are unbiased estimators of λ.

It can be shown that Varλ(X̄) ≤ Varλ(S²) for all λ.
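
A quick simulation sketch of this comparison (my own illustration, with arbitrary choices of λ and n, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 3.0, 20, 200_000   # illustrative lambda, sample size, replications

x = rng.poisson(lam, size=(reps, n))
xbar = x.mean(axis=1)             # sample mean, unbiased for lambda
s2 = x.var(axis=1, ddof=1)        # sample variance, also unbiased for lambda

print("means     :", xbar.mean(), s2.mean())   # both approximately lambda
print("variances :", xbar.var(), s2.var())     # Var(xbar) is the smaller of the two
```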

But, what about other unbiased estimators?

One technique for finding a best unbiased estimator is to bound the variance from below, and find an unbiased estimator whose variance equals the bound.

The Cramer-Rao Lower Bound

Theorem 7.3.9: Let X1, . . . , Xn be a sample with pdf f(x | θ), and let W(X) = W(X1, . . . , Xn) be any estimator satisfying

d/dθ Eθ(W(X)) = ∫_X ∂/∂θ [W(x) f(x | θ)] dx and Varθ(W(X)) < ∞.

Then,

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / Eθ([∂/∂θ log f(X | θ)]²).

NOTE: The quantity Eθ([∂/∂θ log f(X | θ)]²) is known as the Fisher information, and it measures the amount of information a random sample X carries about θ.

The Cramer-Rao Lower Bound - IID Case

Corollary 7.3.10: If the assumptions of Theorem 7.3.9 are satisfied and, additionally, X1, . . . , Xn are iid with pdf f(x | θ), then

Varθ(W(X)) ≥ [d/dθ Eθ(W(X))]² / (n Eθ([∂/∂θ log f(X | θ)]²)).

NOTE: The C-R lower bound also applies to discrete variables, but the key condition is modified to enable interchange of summation and differentiation (it assumes that the pmf is differentiable in θ, which is the case for most common pmfs).

A Useful Result For Calculating the Fisher Information

Lemma 7.3.11: If f(x | θ) satisfies

d/dθ Eθ(∂/∂θ log f(X | θ)) = ∫ ∂/∂θ [(∂/∂θ log f(x | θ)) f(x | θ)] dx

(which is true for an exponential family), then

Eθ([∂/∂θ log f(X | θ)]²) = −Eθ(∂²/∂θ² log f(X | θ)).

Example: Poisson Best Unbiased Estimator
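
The worked example itself is not in the transcript; the standard calculation presumably runs along the following lines. For X1, . . . , Xn iid Poisson(λ), log f(x | λ) = −λ + x log λ − log x!, so ∂²/∂λ² log f(x | λ) = −x/λ² and, by Lemma 7.3.11, the Fisher information of a single observation is Eλ(X)/λ² = 1/λ. For any unbiased estimator W of λ, Corollary 7.3.10 then gives

Varλ(W) ≥ 1 / (n · 1/λ) = λ/n.

Since Varλ(X̄) = λ/n, the sample mean attains the bound and is therefore a best unbiased estimator of λ.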

Attainment of the C-R Lower Bound

There is in general no guarantee that the C-R bound is sharp, that is, it may be strictly smaller than the variance of any unbiased estimator.

Corollary 7.3.15: Let X1, . . . , Xn be iid with f(x | θ), which satisfies the conditions listed in Theorem 7.3.9. If W(X) = W(X1, . . . , Xn) is any unbiased estimator of τ(θ), then W(X) attains the C-R lower bound iff

a(θ)[W(x) − τ(θ)] = ∂/∂θ log L(θ | x)

for some function a(θ).

NOTE: In addition to checking if the bound can be reached, the above result also implicitly gives a way to find a best unbiased estimator.
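
As a concrete illustration (my own, continuing the Poisson example): for X1, . . . , Xn iid Poisson(λ),

∂/∂λ log L(λ | x) = Σ xi/λ − n = (n/λ)(x̄ − λ),

which is exactly of the form a(λ)[W(x) − τ(λ)] with a(λ) = n/λ, W(x) = x̄ and τ(λ) = λ, so X̄ attains the bound, in agreement with the Fisher information calculation above.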

Example: Normal Variance Bound
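
The worked example is not included in the transcript; it presumably concerns the following standard facts. For X1, . . . , Xn iid N(μ, σ²) with both parameters unknown, the C-R lower bound for unbiased estimators of σ² is 2σ⁴/n, whereas Varσ²(S²) = 2σ⁴/(n − 1). The bound is therefore not attained by S² (which can nevertheless be shown to be best unbiased via the completeness argument later in this chapter).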

Sufficiency and Unbiased Estimators

The C-R theorem cannot be used for finding a best unbiased estimator if:

- f(x | θ) does not satisfy the assumptions required by the theorem, or

- the bound is unattainable by the considered class of estimators.

As an alternative to the C-R approach, we are going to use the concept of sufficiency in our search for best unbiased estimators.

The main theorem is a clever application of the following results:

E(X) = E[E(X | Y)] (Thm 4.4.3)

Var(X) = Var[E(X | Y)] + E[Var(X | Y)] (Thm 4.4.7)

The Rao-Blackwell Theorem

Theorem 7.3.17: Let W be any unbiased estimator of τ(θ), let T be a sufficient statistic for θ, and define φ(T) = E(W | T). Then, for all θ:

Eθ(φ(T)) = τ(θ) and Varθ(φ(T)) ≤ Varθ(W).

In other words, φ(T ) is a uniformly better unbiased estimator of τ(θ).
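
A minimal simulation sketch of Rao-Blackwellization (my own illustration, assuming a Bernoulli(p) sample): start from the crude unbiased estimator W = X1 and condition on the sufficient statistic T = ΣXi, for which E(X1 | T) = T/n = X̄.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 15, 200_000     # illustrative success probability, sample size, replications

x = rng.binomial(1, p, size=(reps, n))
w = x[:, 0].astype(float)         # crude unbiased estimator W = X1
phi = x.mean(axis=1)              # phi(T) = E(X1 | T) = T/n, the Rao-Blackwellized estimator

print("means     :", w.mean(), phi.mean())   # both approximately p (unbiasedness is preserved)
print("variances :", w.var(), phi.var())     # Var(phi) = p(1-p)/n is far below Var(W) = p(1-p)
```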

Proof of Thm 7.3.17

Towards a Characterization of Best Unbiased Estimators

By Thm 7.3.17, we only need to consider estimators that are functions of a sufficient statistic in our search for best unbiased estimators.

Moreover, we have that the best unbiased estimator is unique.

Theorem 7.3.19: If W is a best unbiased estimator of τ(θ), then W is unique.

But, if E(φ) = τ(θ) and φ is based on a sufficient statistic T, i.e. E(φ | T) = φ, how do we know that φ is best unbiased for τ(θ) (if it does not attain the C-R lower bound)?

Improving Upon an Unbiased Estimator

Idea: To check if an estimator is best unbiased, see if it can be improved upon:

Let W and U be two estimators for which Eθ(W ) = τ(θ) and Eθ(U) = 0 for all θ.

Consider the unbiased estimator φa = W + aU, for which

Varθ(φa) = Varθ(W) + 2a Covθ(W, U) + a² Varθ(U).

If, for some θ0, we have that Covθ0(W, U) ≠ 0, we can choose a value of a such that

2a Covθ0(W, U) + a² Varθ0(U) < 0 ⇒ Varθ0(φa) < Varθ0(W),

meaning that W cannot be best unbiased.
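
An explicit choice (a routine completion of the argument, not spelled out on the slide): taking a = −Covθ0(W, U)/Varθ0(U) gives

Varθ0(φa) = Varθ0(W) − Covθ0(W, U)²/Varθ0(U) < Varθ0(W).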

The relationship of W with unbiased estimators of 0 can be used to characterize best unbiasedness.

A Characterization of Best Unbiased Estimators

Theorem 7.3.20: If Eθ(W) = τ(θ), then W is a best unbiased estimator of τ(θ) iff W is uncorrelated with all unbiased estimators of 0.

NOTE: An unbiased estimator of 0 is essentially random noise (the most sensible estimator of 0 is 0).

The practical usefulness of Thm 7.3.20 is limited in general, since characterizing all unbiased estimators of 0 is typically very difficult, requiring conditions on the pdf/pmf.

Completeness

Consider a family of pdfs/pmfs with the property that there are no unbiased estimators of 0 other than 0 itself (recall completeness), and note that Covθ(W, 0) = 0.

Theorem 7.3.23: Let T be a complete sufficient statistic for a parameter θ, and let φ(T) be any estimator based only on T. Then φ(T) is the unique best unbiased estimator of its expected value.

NOTE: If T is a complete sufficient statistic for a parameter θ and h(X) is any unbiased estimator of τ(θ), then φ(T) = E(h(X) | T) is the best unbiased estimator of τ(θ).
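
A simulation sketch of this recipe (my own illustration, assuming a Poisson(λ) sample and the target τ(λ) = e^(−λ) = Pλ(X = 0)): start from the unbiased h(X) = 1{X1 = 0}; since T = ΣXi is complete sufficient and X1 | T = t ~ Binomial(t, 1/n), conditioning gives φ(T) = ((n − 1)/n)^T.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, reps = 2.0, 10, 200_000   # illustrative lambda, sample size, replications
target = np.exp(-lam)             # tau(lambda) = P_lambda(X = 0)

x = rng.poisson(lam, size=(reps, n))
h = (x[:, 0] == 0).astype(float)  # crude unbiased estimator h(X) = 1{X1 = 0}
t = x.sum(axis=1)                 # complete sufficient statistic T = sum(Xi)
phi = ((n - 1) / n) ** t          # E(h(X) | T) = ((n-1)/n)^T, the best unbiased estimator

print("target    :", target)
print("means     :", h.mean(), phi.mean())   # both approximately exp(-lambda)
print("variances :", h.var(), phi.var())     # conditioning on T sharply reduces the variance
```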
