On The Foundations Of Statistical Inference
by ALLAN BIRNBAUM
LI Chenlu
2013.01.22
Content
1 Introduction
2 Part 1
   Statistical Evidence
   The Principle of Sufficiency
   The Principle of Conditionality
   The Likelihood Principle
3 Part 2
   Binary Experiments
   Finite Parameter Spaces
   More General Parameter Spaces
   Bayesian Methods: An Interpretation of the Principle of Insufficient Reason
4 Conclusion
Introduction
The paper studies the likelihood principle (LP) and how the likelihood function can be used to measure the evidence in the data about an unknown parameter.
• The main aim of the paper is to show and discuss the implications of the fact that the LP is a consequence of the concepts of conditional frames of reference and sufficiency.
• The second aim of the paper is to describe how and why these principles are appropriate ways to characterize statistical evidence in parametric models for inference purposes.
Part1
1 Statistical Evidence
2 The Principle of Sufficiency
3 The Principle of Conditionality
4 The Likelihood Principle
Statistical Evidence
An experiment E is defined as E = (Ω, S, f(x,θ)), where f is a density, θ is the unknown parameter, Ω is the parameter space, and S is the sample space of outcomes x of E. The likelihood function determined by an observed outcome x is Lx(θ) = f(x,θ).

Birnbaum states that the central purpose of the paper is to clarify the essential structure and properties of statistical evidence, termed the evidential meaning of (E,x) and denoted by Ev(E,x), in various instances.

Ev(E,x) is the evidence about θ supplied by E and x.
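As a concrete hypothetical instance of these definitions, a binomial experiment sketched in Python (the names f, x_obs, and the parameter grid are my own, not from the paper):

```python
from math import comb

def f(x, theta, n=10):
    """Binomial density f(x, theta): probability of x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Likelihood function L_x(theta) = f(x, theta) determined by the observed
# outcome x = 7, evaluated on a grid over the parameter space Omega = [0, 1].
x_obs = 7
grid = [i / 10 for i in range(11)]
L = {theta: f(x_obs, theta) for theta in grid}
best = max(L, key=L.get)   # grid value with maximal likelihood
```

Here the whole dictionary L, not just its maximizer, is what (L) takes to carry the evidence in (E, x).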
The Principle of Sufficiency
The Principle of Sufficiency (S)
Let E be any experiment with sample space S, and let t(x) be any sufficient statistic. Let E′ denote the derived experiment, having the same parameter space, such that when any outcome x of E is observed, the corresponding outcome t = t(x) of E′ is observed. Then for each x, Ev(E, x) = Ev(E′, t), where t = t(x).

∗ If t(x) is a sufficient statistic for θ, then any inference about θ should depend on the sample x only through the value t(x).
The Principle of Sufficiency
If x is any specified outcome of any specified experiment E, the likelihood function determined by x is the function of θ: cf(x,θ), where c is any positive constant.

If for some positive constant c we have f(x,θ) = cg(y,θ) for all θ, then x and y are said to determine the same likelihood function.

If two outcomes x, x′ of one experiment determine the same likelihood function, f(x,θ) = cf(x′,θ) for all θ, then there exists a sufficient statistic t such that t(x) = t(x′).
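A minimal sketch of the last remark, assuming a Bernoulli sequence with the usual sufficient statistic t(x) = Σxᵢ (the particular sequences are my own example):

```python
def bernoulli_lik(xs, theta):
    """Joint density of an observed Bernoulli sequence xs at parameter theta."""
    p = 1.0
    for xi in xs:
        p *= theta if xi == 1 else 1 - theta
    return p

x = [1, 0, 1, 1]   # t(x) = sum(x) = 3 successes in 4 trials
y = [1, 1, 1, 0]   # t(y) = 3 as well

# Both outcomes give likelihood theta^3 * (1 - theta), so they determine the
# same likelihood function (with c = 1), and indeed t(x) = t(y).
for theta in (0.1, 0.5, 0.9):
    assert abs(bernoulli_lik(x, theta) - bernoulli_lik(y, theta)) < 1e-12
```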
The Principle of Sufficiency
Lemma 1
If two outcomes x, x′ of any experiment E determine the same likelihood function, then they have the same evidential meaning: Ev(E, x) = Ev(E, x′).
The Principle of Conditionality
The definition of a mixture experiment
An experiment E is called a mixture, with components Eh, if it is mathematically equivalent to a two-stage experiment of the following form:
1 An observation h is taken on a random variable H having a fixed and known distribution G (G does not depend on unknown parameter values).
2 The corresponding component experiment Eh is carried out, yielding an outcome xh.
Thus each outcome of E is a pair (Eh, xh).
The Principle of Conditionality
The Principle of Conditionality (C)
If an experiment E is a mixture G of components Eh, with possible outcomes (Eh, xh), then
Ev(E, (Eh, xh)) = Ev(Eh, xh)
That is, the evidential meaning of any outcome (Eh, xh) of any experiment E having a mixture structure is the same as the evidential meaning of the corresponding outcome xh of the corresponding component experiment Eh, ignoring otherwise the over-all structure of the original experiment E.
The Principle of Conditionality
Example
Suppose that two instruments (h = 1 or 2) are available for use in an experiment, with respective probabilities p1 = 0.73, p2 = 0.27 of being selected for use. Each instrument gives an observation y = 1 or y = 0.
Consider the assertion Ev(E, (E1, 1)) = Ev(E1, 1). By accepting the experimental conditions, suppose that E leads to selection of the first instrument (h = 1). In this hypothetical situation, one would be prepared to report either (E1, 0) or (E1, 1) as a complete description of the statistical evidence obtained.
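The two-stage structure of this mixture can be sketched numerically. The per-instrument error rates alpha1, alpha2 below are my own illustrative assumptions; the slide specifies only the selection probabilities p1 = 0.73, p2 = 0.27:

```python
import random

def mixture_trial(theta, rng, alpha1=0.05, alpha2=0.25):
    """One outcome (E_h, x_h) of the mixture experiment E.

    Stage 1 draws the instrument label h from the fixed, known distribution G
    (independent of theta); stage 2 runs the selected component experiment E_h.
    """
    h = 1 if rng.random() < 0.73 else 2       # G: p1 = 0.73, p2 = 0.27
    alpha = alpha1 if h == 1 else alpha2      # assumed per-instrument error rates
    y = theta if rng.random() >= alpha else 1 - theta   # observation y in {0, 1}
    return h, y

rng = random.Random(1)
outcomes = [mixture_trial(1, rng) for _ in range(20000)]
share_h1 = sum(1 for h, _ in outcomes if h == 1) / len(outcomes)
```

The point of (C) is that once h is known, only the pair (E_h, x_h) matters for the evidence; the mixing distribution G is evidentially irrelevant.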
The Principle of Conditionality
Example
For purposes of informative inference, if y = 1 is observed with the first instrument, then the report (E1, 1) seems to be an appropriate and complete description of the statistical evidence obtained, and the "more complete" report (E, (E1, 1)) seems to differ from it only by the addition of recognizably redundant elements irrelevant to the evidential meaning and evidential interpretation of this outcome of E.
The Likelihood Principle
The Likelihood Principle (L)
If E and E′ are any two experiments with a common parameter space, and if x and y are any respective outcomes which determine likelihood functions satisfying f(x,θ) = cg(y,θ) for some positive constant c = c(x,y) and all θ, then
Ev(E, x) = Ev(E′, y)
That is, the evidential meaning Ev(E, x) of any outcome x of any experiment E is characterized completely by the likelihood function cf(x,θ), and is otherwise independent of the structure of (E, x).
The Likelihood Principle
Lemma 2
(S) and (C) ⇐⇒ (L)
Proof of ⇐:
• That (L) implies (C) follows immediately from the fact that in all cases the likelihood functions determined respectively by (E, (Eh, xh)) and (Eh, xh) are proportional.
• That (L) implies (S) follows immediately from Lemma 1.
The Likelihood Principle
Lemma 2
(S) and (C) ⇐⇒ (L)
Proof of ⇒:
Let E and E′ denote any two experiments having the same parameter space Ω = {θ}, represented by probability density functions f(x,θ), g(y,θ) on their respective sample spaces S = {x}, S′ = {y}. Consider the mixture experiment E∗ whose components are just E and E′, taken with equal probabilities. Let z denote the sample point of E∗, and let C denote any set of points z; then C = A ∪ B, where A ⊂ S and B ⊂ S′, and

Prob(Z ∈ C | θ) = (1/2) Prob(A | θ, E) + (1/2) Prob(B | θ, E′)

The probability density function representing E∗ is

h(z, θ) = (1/2) f(x, θ)   if z = x ∈ S,
h(z, θ) = (1/2) g(y, θ)   if z = y ∈ S′

From (C), it follows that
Ev(E∗, (E, x)) = Ev(E, x), for each x ∈ S,
Ev(E∗, (E′, y)) = Ev(E′, y), for each y ∈ S′.   (a)
The Likelihood Principle
Proof of ⇒ (continued):
Let x, y be any two outcomes of E, E′ respectively which determine the same likelihood function: f(x,θ) = cg(y,θ) for all θ, where c is some positive constant. Then h(x,θ) = ch(y,θ) for all θ, so the two outcomes (E, x), (E′, y) of E∗ determine the same likelihood function. It then follows from (S) and Lemma 1 that
Ev(E∗, (E, x)) = Ev(E∗, (E′, y))   (b)
From (a) and (b) it follows that
Ev(E, x) = Ev(E′, y)
The consequence states that any two outcomes x, y of any two experiments E, E′ (with the same parameter space) have the same evidential meaning if they determine the same likelihood function.
The Likelihood Principle
Impact of the principle
• The implication ⇒ is the most important part of the equivalence, because it means that if you do not accept (L), you have to discard either (S) or (C), two widely accepted principles.
• The most important consequence of (L) seems to be that evidential measures based on a specific experimental frame of reference (like p-values and confidence levels) are somewhat unsatisfactory.
• In other words, (L) eliminates the need to consider the sample space, or any part of it, once the data are observed. Lemma 2 truly was a "breakthrough" in the foundations of statistical inference and made (L) stand on its own ground, independent of any Bayesian argument.
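A classic illustration of the second point (not from the slides): binomial and negative binomial sampling can yield proportional likelihoods, hence by (L) the same evidence, even though frame-dependent quantities such as p-values generally differ between the two designs. The particular numbers here are my own:

```python
from math import comb

x = 9   # observed number of successes in both experiments

def binom_lik(theta, n=12):
    """Binomial design: fix n = 12 trials, observe x successes."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def negbinom_lik(theta, r=3):
    """Negative binomial design: sample until r = 3 failures, observing x successes."""
    return comb(x + r - 1, x) * theta**x * (1 - theta)**r

# The likelihoods are proportional in theta, with constant c = 220/55 = 4,
# so by (L) the two outcomes have the same evidential meaning about theta.
c = binom_lik(0.5) / negbinom_lik(0.5)
for theta in (0.2, 0.5, 0.8):
    assert abs(binom_lik(theta) - c * negbinom_lik(theta)) < 1e-12
```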
Part2
1 Binary Experiments
2 Finite Parameter Spaces
3 More General Parameter Spaces
4 Bayesian Methods
Binary Experiments
Let Ω = (θ1, θ2). In this case, (L) means that all the information lies in the likelihood ratio λ(x) = f(x,θ2)/f(x,θ1).
The question now is what evidential meaning we can attach to the number λ(x). To answer this, Birnbaum first considers a binary experiment in which the sample space has only two points, denoted (+) and (−), and such that p(+|θ1) = p(−|θ2) = α for an α ≤ 1/2. Such an experiment is called a symmetric simple binary experiment and is characterized by the "error" probability α.
Binary Experiments
For such an experiment, λ(+) = (1−α)/α ≥ 1, α = 1/(1+λ(+)), and λ(−) = α/(1−α) ≤ 1. The important point now is that, according to (L), two experiments with the same value of λ have the same evidential meaning. Therefore, the evidential meaning of λ(x) ≥ 1 from any binary experiment E is the same as the evidential meaning of the (+) outcome from a symmetric simple binary experiment with α(x) = 1/(1+λ(x)). α(x) is called the intrinsic significance level and is a measure of evidence that satisfies (L).
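A minimal sketch of this mapping (function name is my own):

```python
def intrinsic_alpha(lam):
    """Intrinsic significance level: the error probability alpha of the
    symmetric simple binary experiment whose (+) outcome has ratio lam."""
    assert lam >= 1, "use the (+) orientation, lambda(x) >= 1"
    return 1 / (1 + lam)

# lambda(+) = (1 - alpha)/alpha, and alpha = 1/(1 + lambda(+)) inverts it:
alpha = 0.05
lam_plus = (1 - alpha) / alpha     # approximately 19
assert abs(intrinsic_alpha(lam_plus) - alpha) < 1e-12
```

Larger likelihood ratios map to smaller intrinsic significance levels, matching the intuition that stronger evidence corresponds to a smaller "error" probability.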
Finite Parameter Spaces
If E is any experiment with a parameter space containing only a finite number k of points, θ = i = 1, 2, ..., k, then any observed outcome x of E determines a likelihood function L(i) = cf(x,i), i = 1, ..., k. We can assume that ∑i L(i) = 1, summing over i = 1, ..., k.

Any experiment E with a finite sample space j = 1, ..., m and finite parameter space is represented by a stochastic matrix

E = (pij) =
| p11 ... p1m |
|  .       .  |
| pk1 ... pkm |

where ∑j pij = 1 and pij = Prob[j|i] for each i, j. Here the i-th row is the discrete probability distribution of j given parameter value i, and the j-th column is proportional to the likelihood function L(i) = L(i|j) = c·pij, i = 1, ..., k, determined by outcome j.
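A small hypothetical instance of this matrix representation (the 2×3 matrix is my own example, not from the paper):

```python
# Hypothetical stochastic matrix E = (p_ij) with k = 2 parameter values (rows)
# and m = 3 outcomes (columns); each row is a probability distribution over j.
E = [
    [0.5, 0.3, 0.2],   # distribution of j given i = 1
    [0.1, 0.6, 0.3],   # distribution of j given i = 2
]

for row in E:                      # rows sum to 1
    assert abs(sum(row) - 1) < 1e-12

# Likelihood function determined by outcome j = 2 (second column),
# normalized so that sum_i L(i) = 1:
col = [row[1] for row in E]
L = [p / sum(col) for p in col]
```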
Finite Parameter Spaces:Qualitative evidential interpretation
Example 1:
Consider an experiment with a sample space of only two points, j = 1, 2. We can define Prob[j=1|i] = L(i) and Prob[j=2|i] = 1 − L(i), for i = 1, ..., k. For example, the likelihood function L(i) = 1/3, i = 1, 2, 3, represents the possible outcome j = 1 of the experiment

E =
| 1/3 2/3 |
| 1/3 2/3 |
| 1/3 2/3 |

• Since this experiment gives the same distribution on the two-point sample space under each hypothesis, it is completely uninformative. According to the likelihood principle, we can therefore conclude that the given likelihood function has a simple evidential interpretation, regardless of the structure of the experiment: it represents a completely uninformative outcome.
Finite Parameter Spaces:Qualitative evidential interpretation
Example 2:
Consider the likelihood function (1/2, 1/2, 0) (that is, L(1) = L(2) = 1/2, L(3) = 0, on the 3-point parameter space i = 1, 2, 3). This represents the possible outcome j = 1 of the experiment

E =
| 1/2 1/2 |
| 1/2 1/2 |
|  0   1  |

• This outcome of E is impossible under i = 3, and hence supports without risk of error the conclusion that i ≠ 3.
• E prescribes identical distributions under i = 1 and i = 2, and hence the experiment E, and each of its possible outcomes, is completely uninformative as between i = 1 and i = 2.
Finite Parameter Spaces:Qualitative evidential interpretation
Example 3: Some likelihood functions on a given parameter space can be compared and ordered in a natural way.
Consider the likelihood functions (0.8, 0.1, 0.1) and (0.45, 0.275, 0.275). The interpretation that the first is more informative than the second is supported as follows:

E =
| 0.8 0.2 |
| 0.1 0.9 |
| 0.1 0.9 |
= (pij)

When outcome j = 2 of E is observed, we report w = 1 with probability 1/2 and w = 2 with probability 1/2; when outcome j = 1 of E is observed, the report w = 1 is given. This randomization yields

E′ =
| 0.9  0.1  |
| 0.55 0.45 |
| 0.55 0.45 |
= (p′iw)

The experiment E′ is less informative than E.
Finite Parameter Spaces:Intrinsic confidence methods
Example 4:
Consider the likelihood function (0.9, 0.09, 0.01) defined on the parameter space i = 1, 2, 3. This represents the possible outcome j = 1 of the experiment

E =
| 0.9  0.01 0.09 |
| 0.09 0.9  0.01 |
| 0.01 0.09 0.9  |
= (pij)

• In this experiment, a confidence set estimator of the parameter i is given by taking, for each possible outcome j, the two values of i having greatest likelihoods L(i|j).
• We can verify that under each value of i, the probability is 0.99 that the confidence set determined in this way contains i; these sets therefore have confidence coefficient 0.99.
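The 0.99 coverage claim of Example 4 can be checked directly (indices are 0-based here, so parameter i = 1, 2, 3 becomes i = 0, 1, 2):

```python
# The 3x3 experiment of Example 4: rows index the parameter i, columns the outcome j.
E = [
    [0.90, 0.01, 0.09],
    [0.09, 0.90, 0.01],
    [0.01, 0.09, 0.90],
]

def conf_set(j):
    """For outcome j, the two values of i with greatest likelihood L(i|j)."""
    ranked = sorted(range(3), key=lambda i: E[i][j], reverse=True)
    return set(ranked[:2])

# Under each true value of i, the sets cover i with probability 0.99:
for i in range(3):
    coverage = sum(E[i][j] for j in range(3) if i in conf_set(j))
    assert abs(coverage - 0.99) < 1e-12
```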
Finite Parameter Spaces:Intrinsic confidence methods
The general form of the intrinsic confidence methods
For any likelihood function L(i) defined on a finite parameter space i = 1, ..., k, and such that ∑i L(i) = 1:
If there is a unique least likely value i1 of i, let c1 = 1 − L(i1). Then the remaining (k−1) parameter points will be called an intrinsic confidence set with intrinsic confidence coefficient c1. If there is a pair of values of i, say i1, i2, with likelihoods strictly smaller than those of the remaining (k−2) points, call the latter set of points an intrinsic confidence set with intrinsic confidence coefficient c2 = 1 − L(i1) − L(i2); and so on. Such a likelihood function represents outcome j = 1 of the experiment

E = (pij) =
| L(1) L(k)   L(k−1) ... |
| L(2) L(1)   L(k)   ... |
| L(3) L(2)   L(1)   ... |
|  .     .      .        |
| L(k) L(k−1) L(k−2) ... |
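The construction above can be written as a short routine. For simplicity this sketch drops the least likely points one at a time without enforcing the strict-inequality (uniqueness) conditions of the text, and numbers the parameter points from 0:

```python
def intrinsic_confidence_sets(L):
    """For a normalized likelihood L on a finite parameter space, drop the
    least likely points one at a time; after dropping mass m, the remaining
    points form an intrinsic confidence set with coefficient 1 - m."""
    order = sorted(range(len(L)), key=lambda i: L[i])   # least likely first
    sets, dropped = [], 0.0
    for r in range(1, len(L)):
        dropped += L[order[r - 1]]
        sets.append((1 - dropped, sorted(order[r:])))
    return sets

# For the likelihood of Example 4, this yields the set {0, 1} at coefficient
# 0.99 and the set {0} at coefficient 0.9:
sets = intrinsic_confidence_sets([0.9, 0.09, 0.01])
```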
Finite Parameter Spaces
On finite parameter spaces:
• For finite parameter spaces, significance levels, confidence sets, and confidence levels can be based on the observed Lx(θ), hence satisfying (L): they are defined as the regular such methods and concepts for a constructed experiment with a likelihood function identical to Lx(θ).
• Therefore, in the case of finite parameter spaces, a clear and logical evidential interpretation of the likelihood function can be given through intrinsic methods and concepts.
More General Parameter Spaces
• This section deals mainly with the case where Ω is the real line. Given E, x, and Lx(θ), a hypothetical experiment E′ consisting of a single observation of Y with density g(y,θ) = cLx(θ−y) is then constructed.
• Then (E, x) has the same likelihood function as (E′, 0), and (L) implies that the same inference should be used in (E, x) as in (E′, 0). For example, if a regular (1−α) confidence interval in E′ is used, then this interval estimate (for y = 0) should be the one used also for (E, x), and is called a (1−α) intrinsic confidence interval for (E, x).
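A sketch of this construction for a single normal observation; the choice of an N(θ, 1) likelihood and the value x_obs are my own assumptions:

```python
from math import exp, pi, sqrt

x_obs = 1.7   # hypothetical observed outcome of E

def L_x(theta):
    """Observed likelihood: here from a single N(theta, 1) observation at x_obs."""
    return exp(-0.5 * (theta - x_obs) ** 2) / sqrt(2 * pi)

def g(y, theta):
    """Density of the constructed experiment E': g(y, theta) = c * L_x(theta - y)."""
    return L_x(theta - y)   # c = 1, since L_x already integrates to 1 in theta

# (E, x) and (E', 0) determine the same likelihood function:
assert all(abs(g(0.0, t) - L_x(t)) < 1e-15 for t in (-1.0, 0.0, 2.5))

# In E', a regular 95% interval for theta at y = 0 is x_obs +/- 1.96, which
# becomes the 95% intrinsic confidence interval for (E, x).
lo, hi = x_obs - 1.96, x_obs + 1.96
```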
More General Parameter Spaces
As a general comment, Birnbaum emphasizes that intrinsic methods and concepts can, in light of (L), be nothing more than ways of expressing evidential meaning already implicit in Lx(θ) itself. In the discussion, Birnbaum does not recommend intrinsic methods as statistical methods in practice. The value of these methods is conceptual, and the main use of intrinsic concepts is to show that likelihood functions as such are evidentially meaningful.
Bayesian Methods:An Interpretation of the principle ofInsufficient Reason
Birnbaum views the Bayes approach as not directed to informative inference, but rather as a way to determine an appropriate final synthesis of available information, based on prior information and data. It is observed that in determining the posterior distribution, the contribution of the data and E is Lx(θ) only, so the Bayes approach implies (L).
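This observation can be checked numerically: under a uniform ("insufficient reason") prior, the posterior depends on the data only through the likelihood function, so two proportional likelihoods give identical posteriors. A sketch using the binomial/negative-binomial pair (the grid and data are my own):

```python
from math import comb

thetas = [i / 20 for i in range(1, 20)]        # grid over (0, 1)
prior = [1 / len(thetas)] * len(thetas)        # uniform prior (insufficient reason)

def posterior(lik):
    """Grid posterior proportional to prior(theta) * lik(theta)."""
    w = [p * lik(t) for p, t in zip(prior, thetas)]
    s = sum(w)
    return [v / s for v in w]

# Two likelihoods proportional in theta (binomial vs negative binomial designs,
# both with 9 successes and 3 failures) yield identical posteriors:
post1 = posterior(lambda t: comb(12, 9) * t**9 * (1 - t)**3)
post2 = posterior(lambda t: comb(11, 9) * t**9 * (1 - t)**3)
assert all(abs(a - b) < 1e-12 for a, b in zip(post1, post2))
```

The proportionality constant cancels in the normalization, which is exactly why the Bayes approach satisfies (L).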
Conclusion
• Birnbaum's main result, that the LP follows from sufficiency and conditionality principles that most statisticians accept, must be regarded as one of the deepest theorems of theoretical statistics, yet the proof is unbelievably simple.
• The result had a decisive influence on how many statisticians came to view the likelihood function as a basic quantity in statistical analysis.
• It has also affected in a general way how we view the science of statistics. Birnbaum introduced principles of equivalence within and between experiments, showing various relationships between these principles. This made it possible to discuss the different concepts from alternative viewpoints.
References
Allan Birnbaum, "On the Foundations of Statistical Inference: Binary Experiments", Institute of Mathematical Sciences, New York University.
Daniel Steel, "Bayesian Confirmation Theory and the Likelihood Principle", Michigan State University.
Royall, R., "Statistical Evidence: A Likelihood Paradigm", Chapman and Hall, London.
Jan F. Bjørnstad, "Breakthroughs in Statistics, Volume I: Foundations and Basic Theory", The University of Trondheim.
The end! Thank you!