The likelihood approach to statistical decision problems
Marco Cattaneo
Department of Statistics, LMU Munich
30 January 2014
introduction
- the classical and Bayesian approaches to statistics are unified and generalized by the corresponding decision theories
- the likelihood approach to statistics is extremely successful in practice, but it is not unified and generalized by a decision theory
- is such a likelihood decision theory possible?
- in statistics, L usually denotes:
  - likelihood function (here λ)
  - loss function (here W)
- statistical model: (Ω, F, P_θ) with θ ∈ Θ (where Θ is a nonempty set) and random variables X : Ω → 𝒳 and X_n : Ω → 𝒳_n
Marco Cattaneo @ LMU Munich The likelihood approach to statistical decision problems 2/10
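To fix ideas, the notation above can be made concrete in a minimal sketch (a hypothetical Bernoulli model, not taken from the talk):

```python
# Hypothetical illustration of the notation (Θ, 𝒳, P_θ): a Bernoulli model,
# where θ ranges over Θ = (0, 1) and X takes values in the sample space {0, 1}.
Theta = (0.0, 1.0)          # parameter space Θ (an open interval)
sample_space = {0, 1}       # 𝒳, the set of possible observations

def P(theta, x):
    """P_θ(X = x) for the Bernoulli model."""
    return theta if x == 1 else 1.0 - theta

# e.g. under θ = 0.3, observing x = 1 has probability 0.3
assert P(0.3, 1) == 0.3
```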
loss function
- a statistical decision problem is described by a loss function

  W : Θ × D → [0, +∞[,

  where D is a nonempty set
- intended as a unification (and generalization) of statistical inference, in particular of:
  - point estimation (e.g., with D = Θ)
  - hypothesis testing (e.g., with D = {H_0, H_1})
- most successful general methods:
  - point estimation: maximum likelihood estimators
  - hypothesis testing: likelihood ratio tests
- these methods do not fit well in the setting of statistical decision theory: here they are unified (and generalized) in likelihood decision theory
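As an illustration of how a single loss function W : Θ × D → [0, +∞[ covers both problems, a small sketch (the set H and the constants c, c′ are hypothetical choices, not from the talk):

```python
# Hypothetical loss functions for the two inference problems above.

def W_estimation(theta, d):
    """Squared-error loss for point estimation (D = Θ)."""
    return (theta - d) ** 2

H = {0.2}  # hypothesized subset of Θ for testing H0: θ ∈ H

def W_testing(theta, d, c=1.0, c_prime=0.5):
    """Loss for testing: cost c for deciding H1 when θ ∈ H,
    cost c_prime for deciding H0 when θ ∉ H (with c >= c_prime)."""
    if d == "H1":
        return c if theta in H else 0.0
    return c_prime if theta not in H else 0.0
```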
likelihood function
- λ_x : Θ → [0, 1] is the (relative) likelihood function given X = x, when

  sup_{θ ∈ Θ} λ_x(θ) = 1 and λ_x(θ) ∝ P_θ(X = x)

  (with λ_x(θ) ∝ f_θ(x) as an approximation for continuous X)
- λ_x describes the relative plausibility of the possible values of θ in the light of the observation X = x, and can thus be used as a basis for post-data decision making
- prior information can be described by a prior likelihood function: if X_1 and X_2 are independent, then λ_{(x1,x2)} ∝ λ_{x1} λ_{x2}; that is, when X_2 = x_2 is observed, the prior λ_{x1} is updated to the posterior λ_{(x1,x2)}
- strong similarity with the Bayesian approach (both satisfy the likelihood principle): a fundamental advantage of the likelihood approach is the possibility of not using prior information (since λ_{x1} ≡ 1 describes complete ignorance)
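The sup-normalization and the pointwise-product update can be illustrated numerically (a hypothetical example: Bernoulli data on a grid over Θ):

```python
# Hypothetical numeric example: relative likelihood λ_x for n Bernoulli
# trials with s successes, on a grid over Θ = (0, 1), normalized so that
# sup_θ λ_x(θ) = 1.
thetas = [i / 100 for i in range(1, 100)]

def relative_likelihood(s, n):
    lik = [t ** s * (1 - t) ** (n - s) for t in thetas]
    m = max(lik)
    return [l / m for l in lik]          # sup-normalization

lam = relative_likelihood(s=7, n=10)
# the maximum (value 1) is attained at the maximum likelihood value θ = 0.7
assert thetas[lam.index(1.0)] == 0.7

# updating: for independent samples the likelihoods multiply pointwise
# (then renormalize), mirroring λ_{(x1,x2)} ∝ λ_{x1} λ_{x2}
prior = relative_likelihood(s=3, n=5)
post = [a * b for a, b in zip(prior, lam)]
m = max(post)
post = [p / m for p in post]
# the combined data have s = 10, n = 15, so the peak moves to ≈ 10/15
assert abs(thetas[post.index(1.0)] - 10 / 15) < 0.01
```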
likelihood decision criteria
- likelihood decision criterion: minimize V(W(·, d), λ_x) over d ∈ D, where the functional V must satisfy the following three properties, for all functions w, w′ : Θ → [0, +∞[ and all likelihood functions λ, λ_n : Θ → [0, 1]
  - monotonicity: w ≤ w′ (pointwise) ⇒ V(w, λ) ≤ V(w′, λ) (implied by the meaning of W)
  - parametrization invariance: b : Θ → Θ bijection ⇒ V(w ∘ b, λ ∘ b) = V(w, λ) (excludes the Bayesian criteria V(w, λ) = (∫ w λ dµ) / (∫ λ dµ) for infinite Θ)
  - consistency: H ⊆ Θ with lim_{n→∞} sup_{θ ∈ Θ\H} λ_n(θ) = 0 ⇒ lim_{n→∞} V(c I_H + c′ I_{Θ\H}, λ_n) = c for all constants c, c′ ∈ [0, +∞[ (excludes the minimax criterion V(w, λ) = sup_{θ ∈ Θ} w(θ); implies calibration: V(c, λ) = c)
- likelihood decision function: δ : 𝒳 → D such that δ(x) minimizes V(W(·, d), λ_x) over d ∈ D
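Some of these properties can be checked numerically for the sup-based functional V(w, λ) = sup_θ w(θ)λ(θ) (the MPL criterion of a later slide); the finite example below is hypothetical:

```python
# Hypothetical finite example checking properties of the sup-based
# functional V(w, λ) = sup_θ w(θ) λ(θ).
Theta = [0, 1, 2]
w = {0: 3.0, 1: 1.0, 2: 2.0}            # a loss profile W(·, d)
lam = {0: 1.0, 1: 0.5, 2: 0.25}         # sup-normalized likelihood
b = {0: 2, 1: 0, 2: 1}                  # a bijection b : Θ → Θ

def V_sup(w, lam):
    return max(w[t] * lam[t] for t in Theta)

# parametrization invariance: V(w ∘ b, λ ∘ b) = V(w, λ)
w_b = {t: w[b[t]] for t in Theta}
lam_b = {t: lam[b[t]] for t in Theta}
assert V_sup(w_b, lam_b) == V_sup(w, lam)

# calibration: V(c, λ) = c for constant loss, since sup_θ λ(θ) = 1
c = 4.0
const = {t: c for t in Theta}
assert V_sup(const, lam) == c

# consistency: λ_n → 0 outside H forces V towards the loss level c on H,
# even though the loss outside H is larger (minimax would give 5 instead)
H = {0}
c_H, c_out = 2.0, 5.0
w2 = {t: (c_H if t in H else c_out) for t in Theta}
for n in (1, 10, 50):
    lam_n = {t: (1.0 if t in H else 0.5 ** n) for t in Theta}
    v = V_sup(w2, lam_n)
assert abs(v - c_H) < 1e-9
```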
properties
- likelihood decision criteria have the advantages of post-data methods:
  - independence from the choice of possible alternative observations
  - direct interpretation
  - simpler problems
- likelihood decision criteria also have important pre-data properties:
  - equivariance: for invariant decision problems, the likelihood decision functions are equivariant
  - (strong) consistency: under some regularity conditions, the likelihood decision functions δ_n : 𝒳_1 × · · · × 𝒳_n → D satisfy

    lim_{n→∞} W(θ, δ_n(X_1, . . . , X_n)) = inf_{d ∈ D} W(θ, d)   P_θ-a.s.
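The consistency statement can be probed with a small simulation (a hypothetical setup, not from the talk: Bernoulli data, squared-error loss, and the sup-based criterion δ_n = argmin_d sup_θ (d − θ)² λ_n(θ) on a parameter grid):

```python
import math
import random

# Hypothetical simulation: for Bernoulli(θ0) data the realized loss
# W(θ0, δ_n) should approach inf_d W(θ0, d) = 0 as n grows.
random.seed(0)
theta0 = 0.3
grid = [i / 200 for i in range(1, 200)]

def decision(s, n):
    loglik = [s * math.log(t) + (n - s) * math.log(1 - t) for t in grid]
    m = max(loglik)
    lam = [math.exp(l - m) for l in loglik]       # sup-normalized likelihood
    return min(grid, key=lambda d: max((d - t) ** 2 * l
                                       for t, l in zip(grid, lam)))

s = sum(1 for _ in range(2000) if random.random() < theta0)
d_n = decision(s, 2000)
loss = (theta0 - d_n) ** 2
assert loss < 0.01        # already close to the infimum 0 at n = 2000
```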
MPL criterion
- MPL criterion: minimize sup_{θ ∈ Θ} W(θ, d) λ_x(θ), which corresponds to

  V(w, λ) = sup_{θ ∈ Θ} w(θ) λ(θ)

  (a nonadditive integral of w with respect to H ↦ sup_{θ ∈ H} λ(θ))
- point estimation:
  - D = Θ finite
  - W(θ, θ̂) = I_{θ̂ ≠ θ} (simple loss function)
  - the maximum likelihood estimator (when well-defined) is the likelihood decision function resulting from the MPL criterion
- hypothesis testing:
  - D = {H_0, H_1} with H_0 : θ ∈ H and H_1 : θ ∈ Θ \ H
  - W(θ, H_1) = c I_{θ ∈ H} and W(θ, H_0) = c′ I_{θ ∈ Θ\H} with c ≥ c′
  - the likelihood ratio test with critical value c′/c is the likelihood decision function resulting from the MPL criterion
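Both reductions can be verified on a tiny discrete model (the values of λ, c, and c′ below are hypothetical):

```python
# Tiny discrete check that the MPL criterion reproduces the maximum
# likelihood estimator and the likelihood ratio test.
Theta = ["a", "b", "c"]
lam = {"a": 1.0, "b": 0.4, "c": 0.1}      # sup-normalized likelihood λ_x

def mpl(decisions, W):
    """argmin_d sup_θ W(θ, d) λ_x(θ)"""
    return min(decisions, key=lambda d: max(W(t, d) * lam[t] for t in Theta))

# point estimation with the simple loss W(θ, d) = I_{d ≠ θ}:
est = mpl(Theta, lambda t, d: 0.0 if d == t else 1.0)
assert est == "a"                         # the maximum likelihood estimate

# testing H0: θ ∈ H against H1: θ ∈ Θ \ H with c = 1 and c' = 0.5:
H = {"b", "c"}
c, c_prime = 1.0, 0.5

def W_test(t, d):
    if d == "H1":
        return c if t in H else 0.0
    return c_prime if t not in H else 0.0

# sup_{θ∈H} λ_x(θ) = 0.4 < c'/c = 0.5, so the test rejects H0
assert mpl(["H0", "H1"], W_test) == "H1"
```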
a simple example
I X1, . . . , Xn i.i.d. ∼ N(θ, σ²) with Θ = ]0, +∞[ (that is, θ positive and σ known)
I estimation of θ with squared error loss:
I D = Θ with W(θ̂, θ) = (θ̂ − θ)²
I no unbiased estimator, the maximum likelihood estimator is not well-defined, and there is no standard (proper) Bayesian prior
I likelihood decision function resulting from the MPL criterion:
I scale invariance and sufficiency: θ̂(x1, . . . , xn) = g(x̄ / (σ/√n)) σ/√n
I consistency and asymptotic efficiency: θ̂(x1, . . . , xn) = x̄ when x̄ ≥ √2 σ/√n
[Figure: the function g (values 0 to 5) plotted against x from −4 to 4, and the MSE (values 0.5 to 1) plotted against θ from 0 to 5]
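The stated form of this likelihood decision function can be checked numerically. Working in standardized units t = θ/(σ/√n), d = θ̂/(σ/√n), z = x̄/(σ/√n), the relative likelihood is exp(−(z − t)²/2), and the MPL criterion minimizes sup_{t>0} (d − t)² exp(−(z − t)²/2) over d > 0. A minimal grid-search sketch (the grid bounds and resolution are arbitrary choices made for illustration):

```python
import numpy as np

def mpl_estimate(z, grid):
    """Grid search for argmin over d of sup_{t>0} (d - t)^2 * exp(-(z - t)^2 / 2)."""
    lik = np.exp(-0.5 * (z - grid) ** 2)              # relative likelihood at each t
    risks = [np.max((d - grid) ** 2 * lik) for d in grid]
    return grid[int(np.argmin(risks))]

grid = np.linspace(1e-6, 12.0, 4001)                  # positive values of t and d
for z in (2.0, 3.0, 5.0):                             # z >= sqrt(2): estimate should be ~ z
    est = mpl_estimate(z, grid)                       # i.e. xbar in the original units
```

For z ≥ √2 the grid search returns a value within grid resolution of z itself, matching the slide's claim that θ̂ = x̄ when x̄ ≥ √2 σ/√n.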
conclusion
I this work:
I fills a gap in the likelihood approach to statistics
I introduces an alternative to classical and Bayesian decision making
I offers a new perspective on likelihood methods
I likelihood decision making:
I is post-data and equivariant
I is consistent and asymptotically efficient
I does not need prior information