Comparing Classical and Bayesian Approaches to Hypothesis Testing James O. Berger Institute of...

Comparing Classical and Bayesian Approaches to Hypothesis Testing

James O. Berger

Institute of Statistics and Decision Sciences

Duke University

www.stat.duke.edu

Outline

• The apparent overuse of hypothesis testing

• When is point null testing needed?

• The misleading nature of P-values

• Bayesian and conditional frequentist testing of plausible hypotheses

• Advantages of Bayesian testing

• Conclusions

I. The apparent overuse of hypothesis testing

• Tests are often performed when they are irrelevant.

• Rejection by an irrelevant test is sometimes viewed as a “license” to forget statistics in further analysis.

Prototypical example

Habitat Type   Rank   Observed Usage
A              1      3.8
B              2      3.6
C              3      2.8
D              4      1.8
E              5      1.5
F              6      0.7

Hypothesis: H0: "mean usage is equal for all habitats". Rejected (P < .025).

Statistical mistakes in the example

• The hypothesis is not plausible; testing serves no purpose.

• The observed usage levels are given without confidence sets.

• The rankings are based only on observed means, and are given without uncertainties. (For instance, perhaps Pr(A > B) = 0.6 only.)

Note that, while $H_0: \theta = \theta_0$ is typically not plausible, it is a good approximation to $H_0: |\theta - \theta_0| < \epsilon$ as long as $\epsilon < \sigma/(4\sqrt{n})$ (assuming Gaussian observations with standard deviation $\sigma$).
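As a quick arithmetic check of this rule of thumb, the sketch below computes the largest half-width $\epsilon$ for which the point null is treated as a good stand-in for the interval null. The function names and the example values (σ = 1, n = 100) are illustrative assumptions, not taken from the slides.

```python
import math


def max_interval_halfwidth(sigma: float, n: int) -> float:
    """Largest epsilon for which H0: theta = theta0 approximates
    H0: |theta - theta0| < epsilon, using the rule epsilon < sigma / (4 sqrt(n))
    for Gaussian observations with standard deviation sigma."""
    return sigma / (4 * math.sqrt(n))


def point_null_ok(epsilon: float, sigma: float, n: int) -> bool:
    """True if an interval null of half-width epsilon falls within the rule."""
    return epsilon < max_interval_halfwidth(sigma, n)


# Hypothetical numbers: unit standard deviation, 100 observations.
print(max_interval_halfwidth(1.0, 100))
print(point_null_ok(0.01, 1.0, 100))
print(point_null_ok(0.05, 1.0, 100))
```

With σ = 1 and n = 100 the threshold is 0.025. Because it shrinks like $1/\sqrt{n}$, a fixed real difference that is negligible in a small study can make the point-null approximation inappropriate in a large one.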

II. When is testing of a point null hypothesis needed?

Answer: When the hypothesis is plausible, to some degree.

Examples of hypotheses that are not realistically plausible

• H0: small mammals are as abundant on livestock grazing land as on non-grazing land

• H0: survival rates of brood mates are independent

• H0: bird abundance does not depend on the type of forest habitat they occupy

• H0: cottontail choice of habitat does not depend on the season

Examples of hypotheses that may be plausible, to at least some degree:

• H0: Males and females of a species are the same in terms of characteristic A.

• H0: Proximity to logging roads does not affect ground nest predation.

• H0: Pollutant A does not affect Species B.

Example: Experimental drugs D1, D2, D3, . . . are to be tested. Each test:

H0: Di has negligible effect

H1: Di is effective

Typical Bayesian Answer: The probability that H0 is true is 0.06.

Classical Answer (P-value): If H0 were true, the probability of observing hypothetical data as or more "extreme" than the actual data is 0.06.

III. For plausible hypotheses, P-values are misleading as measures of evidence

DRUG D1 D2 D3 D4 D5 D6

P-VALUE 0.41 0.04 0.32 0.94 0.01 0.28

P-VALUE 0.11 0.05 0.65 0.009 0.09 0.66

Question: How strongly do we believe that Drug i has a nonnegligible effect when (i) the P-value is approximately 0.05? (ii) the P-value is approximately 0.01?

A Surprising Fact: Suppose it is known that, a priori, about 50% of the drugs will have negligible effect. Then,

(i) of the drugs for which the P-value is approximately 0.05, at least 25% (and typically over 50%) will have negligible effect;

(ii) of the drugs for which the P-value is approximately 0.01, at least 7% (and typically over 15%) will have negligible effect.
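The surprising fact can be explored by simulation. The sketch below is a hypothetical setup, not Berger's calculation: half the drugs have exactly zero effect, the others have a standardized effect drawn from a normal distribution with standard deviation 2 (an arbitrary choice, `effect_sd`), each drug yields one z-statistic, and a two-sided P-value is computed from it. It then asks, among drugs whose P-value landed near 0.05 or near 0.01, what fraction actually had no effect.

```python
import math
import random


def simulate_drugs(num_drugs=200_000, frac_null=0.5, effect_sd=2.0, seed=1):
    """Simulate one z-statistic per drug.

    Hypothetical model: a fraction `frac_null` of drugs have zero effect, so z ~ N(0, 1);
    the rest have a standardized effect drawn from N(0, effect_sd**2), and z ~ N(effect, 1).
    Returns a list of (is_null, two_sided_p_value) pairs.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_drugs):
        is_null = rng.random() < frac_null
        effect = 0.0 if is_null else rng.gauss(0.0, effect_sd)
        z = rng.gauss(effect, 1.0)
        p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
        results.append((is_null, p))
    return results


def fraction_null_near(results, lo, hi):
    """Among drugs whose P-value lies in (lo, hi], the fraction with no real effect."""
    selected = [is_null for is_null, p in results if lo < p <= hi]
    return sum(selected) / len(selected), len(selected)


results = simulate_drugs()
frac05, count05 = fraction_null_near(results, 0.04, 0.05)
frac01, count01 = fraction_null_near(results, 0.005, 0.01)
print(f"P-value in (0.04, 0.05]:  {frac05:.2f} had no effect ({count05} drugs)")
print(f"P-value in (0.005, 0.01]: {frac01:.2f} had no effect ({count01} drugs)")
```

With these settings the no-effect fraction near P = 0.05 comes out at roughly 0.3, and near P = 0.01 at roughly 0.1. Changing `effect_sd` changes these numbers; the slide's "at least" figures are stated as lower bounds, whereas this simulation fixes one particular distribution of effects.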

IV. Bayesian testing of point hypotheses

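Data and Model: $X$ has density

$$f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x},$$

where $X$ = # of eggs hatched out of $n$ eggs in a recently polluted area (so $X$ is binomial, and $\theta$ is the true proportion that would hatch).

To Test: $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$, where $\theta_0$ is the historically known proportion of eggs that hatch in the area.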

The prior distribution

Let $\Pr(H_0)$ and $\Pr(H_1)$ be the prior probabilities of $H_0$ and $H_1$. (The usual default choice is $\Pr(H_0) = \Pr(H_1) = 1/2$.)

Under $H_1$, let $\pi(\theta)$ be the density representing information concerning the location of $\theta$. (The usual default choice for the binomial problem is $\pi(\theta) = 1$.)

There are two schools of Bayesian statistics: the subjective school, where the prior distribution reflects real extraneous information, and the objective school, where the prior is chosen in a default fashion.

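Posterior probability that H0 is true, given the data (from Bayes' theorem):

$$\Pr(H_0 \mid \text{data } x) = \frac{\Pr(H_0)\, f(x \mid \theta_0)}{\Pr(H_0)\, f(x \mid \theta_0) + \Pr(H_1) \int f(x \mid \theta)\,\pi(\theta)\,d\theta}$$

$$= \left[1 + \frac{\Pr(H_1)}{\Pr(H_0)} \cdot \frac{\mathrm{Beta}(x+1,\; n-x+1)}{\theta_0^{x}(1-\theta_0)^{n-x}}\right]^{-1}$$

(for the binomial testing problem)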

Note: Some prefer to use the Bayes factor (or weighted likelihood ratio) of $H_0$ to $H_1$,

$$B(x) = \frac{\text{likelihood of data under } H_0}{\text{average likelihood of data under } H_1} = \frac{f(x \mid \theta_0)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta},$$

since this does not involve the prior probabilities of the hypotheses.

Example: Suppose $x = 40$ eggs hatched out of $n = 100$. Then $\Pr(H_0 \mid \text{data } x) = 0.52$ and $B(x) = \ldots$ (Here a classical test would yield P-value $= \ldots$.)
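The binomial formula can be checked numerically for the egg example. The sketch below assumes $\theta_0 = 0.5$ (the historical hatching rate does not survive in the text, so this value is a hypothetical choice), the default uniform prior $\pi(\theta) = 1$ under $H_1$, and equal prior probabilities for the two hypotheses; the function names are illustrative.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_binomial(x: int, n: int, theta0: float) -> float:
    """B(x) for H0: theta = theta0 versus H1: theta ~ Uniform(0, 1), binomial data.

    The binomial coefficient appears in both likelihoods and cancels, leaving
    theta0^x (1 - theta0)^(n - x) / Beta(x + 1, n - x + 1).
    """
    log_num = x * math.log(theta0) + (n - x) * math.log(1.0 - theta0)
    return math.exp(log_num - log_beta(x + 1, n - x + 1))


def posterior_h0(x: int, n: int, theta0: float, prior_h0: float = 0.5) -> float:
    """Pr(H0 | x) = [1 + (Pr(H1) / Pr(H0)) / B(x)]^(-1)."""
    b = bayes_factor_binomial(x, n, theta0)
    return 1.0 / (1.0 + (1.0 - prior_h0) / prior_h0 / b)


# Egg example with the hypothetical historical rate theta0 = 0.5.
x, n, theta0 = 40, 100, 0.5
print(f"B(x)        = {bayes_factor_binomial(x, n, theta0):.3f}")
print(f"Pr(H0 | x)  = {posterior_h0(x, n, theta0):.3f}")
```

Under these assumptions the code gives $B(x) \approx 1.10$ and $\Pr(H_0 \mid x) \approx 0.52$, consistent with the posterior probability reported for this example. Working on the log scale with `math.lgamma` avoids overflow in the binomial coefficients for larger $n$.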

Conditional frequentist interpretation of the posterior probability of H0

$\Pr(H_0 \mid \text{data } x)$ is also the frequentist type I error probability, conditional on observing data of the same "strength of evidence" as the actual data. (The classical type I error probability makes the mistake of reporting the error averaged over data of very different strengths.)
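This reading can be illustrated by simulation in the binomial setting. The sketch below rests on hypothetical choices ($\theta_0 = 0.5$, $n = 100$, equal prior probabilities, $\theta \sim$ Uniform(0, 1) under $H_1$): it generates many experiments from exactly that prior, groups together those whose data give a similar $\Pr(H_0 \mid x)$ (a band from 0.4 to 0.65, used here as a simple stand-in for "the same strength of evidence"), and compares how often $H_0$ was actually true in that group with the average posterior probability. The formal conditional frequentist construction conditions on a specific statistic measuring strength of evidence, so treat this as an illustration of the idea rather than that construction.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_h0(x, n, theta0=0.5):
    """Pr(H0 | x) with Pr(H0) = Pr(H1) = 1/2 and a Uniform(0, 1) prior under H1."""
    log_b = (x * math.log(theta0) + (n - x) * math.log(1 - theta0)
             - log_beta(x + 1, n - x + 1))
    b = math.exp(log_b)
    return b / (1 + b)


def simulate(num_experiments=30_000, n=100, theta0=0.5, seed=2):
    """Draw each experiment from the prior: H0 with probability 1/2, else theta ~ U(0, 1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(num_experiments):
        h0_true = rng.random() < 0.5
        theta = theta0 if h0_true else rng.random()
        x = sum(1 for _ in range(n) if rng.random() < theta)  # binomial draw
        out.append((h0_true, posterior_h0(x, n, theta0)))
    return out


sims = simulate()
# Condition on data of similar strength of evidence (hypothetical band of posterior values).
band = [(h0, p) for h0, p in sims if 0.4 <= p <= 0.65]
observed_rate = sum(h0 for h0, _ in band) / len(band)
average_posterior = sum(p for _, p in band) / len(band)
print(f"{len(band)} experiments in band")
print(f"H0 actually true in {observed_rate:.2f} of them; mean Pr(H0 | x) = {average_posterior:.2f}")
```

The two printed rates should agree up to simulation noise: when the experiments really are drawn from the prior, the fraction of true nulls among data of a given strength matches the posterior probability, which is the sense in which $\Pr(H_0 \mid \text{data } x)$ is an actual error rate.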

V. Advantages of Bayesian testing

• Pr (H0 | data x) reflects real expected error rates: P-values do not.

• A default formula exists for all situations:

Pr( | )( , ) ( , ) ( , )

( , ) ( , )

H data xf x f x f x dx d

f x f x d

where is independent (unobserved) data of the smallest

size such that the above integrals exist.

• Posterior probabilities allow for incorporation of personal opinion, if desired. Indeed, if the published default posterior probability of H0 is P*, and your prior probability of H0 is P0, then your posterior probability of H0 is

$$\Pr(H_0 \mid \text{data } x) = \left[1 + \frac{1-P_0}{P_0} \cdot \frac{1-P^*}{P^*}\right]^{-1}.$$

Example: In the binomial example, recall $\Pr(H_0 \mid \text{data } x) = P^* = 0.52$.

A "skeptic" has $P_0 = 0.9$; hence $\Pr(H_0 \mid \text{data } x) = 0.91$.

A "believer" has $P_0 = 0.1$; hence $\Pr(H_0 \mid \text{data } x) = 0.11$.
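This update is Bayes' theorem in odds form: if the default analysis used $\Pr(H_0) = 1/2$, the published $P^*$ fixes the Bayes factor $B(x) = P^*/(1 - P^*)$, and any other prior probability $P_0$ is then applied to it. A minimal sketch under that assumption, using $P^* = 0.52$ from the binomial example; the function name is illustrative.

```python
def personal_posterior(p_star: float, p0: float) -> float:
    """Posterior probability of H0 for a reader whose prior probability of H0 is p0,
    given a published default posterior p_star computed with Pr(H0) = 1/2.

    Equivalent to [1 + ((1 - p0) / p0) * ((1 - p_star) / p_star)] ** -1.
    """
    bayes_factor = p_star / (1.0 - p_star)  # default prior odds are 1, so posterior odds = B(x)
    posterior_odds = bayes_factor * p0 / (1.0 - p0)
    return posterior_odds / (1.0 + posterior_odds)


for p0 in (0.9, 0.5, 0.1):
    print(f"prior Pr(H0) = {p0}: posterior Pr(H0 | x) = {personal_posterior(0.52, p0):.2f}")
```

Prior probabilities of 0.9, 0.5 and 0.1 give posterior probabilities of about 0.91, 0.52 and 0.11, so publishing the default $P^*$ (or $B(x)$) lets each reader supply their own prior.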

• Posterior probabilities are not affected by the reason for stopping experimentation, and hence do not require rigid experimental designs (as do classical testing measures).
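The stopping-rule point can be seen in a small textbook-style illustration (hypothetical data, not from the slides): 9 eggs hatch and 3 fail. If $n = 12$ was fixed in advance, the number hatched is binomial; if observation instead stopped at the 3rd failure, it is negative binomial. The two designs give different one-sided P-values for $H_0: \theta = 1/2$ against larger $\theta$, but their likelihoods differ only by a constant factor, so the posterior probability of $H_0$ (equal prior probabilities, uniform prior under $H_1$) is the same. Function names are illustrative.

```python
import math


def binomial_p_value(successes: int, n: int) -> float:
    """One-sided P-value P(X >= successes) with n fixed in advance and theta0 = 1/2."""
    return sum(math.comb(n, k) for k in range(successes, n + 1)) / 2**n


def negative_binomial_p_value(successes: int, failures: int) -> float:
    """One-sided P-value when sampling stops at the `failures`-th failure, theta0 = 1/2.

    P(at least `successes` successes before the `failures`-th failure)
    = P(at most failures - 1 failures in the first successes + failures - 1 trials).
    """
    m = successes + failures - 1
    return sum(math.comb(m, j) for j in range(failures)) / 2**m


def posterior_h0(successes: int, failures: int, coef: float) -> float:
    """Pr(H0: theta = 1/2 | data), equal prior odds, theta ~ U(0, 1) under H1.

    `coef` is the combinatorial constant of the sampling design; it multiplies both
    the H0 likelihood and the H1 average likelihood, so it cancels.
    """
    like_h0 = coef * 0.5 ** (successes + failures)
    # Beta(successes + 1, failures + 1) for integer arguments.
    beta = (math.factorial(successes) * math.factorial(failures)
            / math.factorial(successes + failures + 1))
    like_h1 = coef * beta
    return like_h0 / (like_h0 + like_h1)


successes, failures = 9, 3
n = successes + failures
p_binom = binomial_p_value(successes, n)
p_negbin = negative_binomial_p_value(successes, failures)
post_binom = posterior_h0(successes, failures, math.comb(n, successes))
post_negbin = posterior_h0(successes, failures, math.comb(n - 1, successes))
print(f"P-value, n fixed:            {p_binom:.4f}")
print(f"P-value, stop at 3rd failure: {p_negbin:.4f}")
print(f"Pr(H0 | data), both designs: {post_binom:.4f} and {post_negbin:.4f}")
```

Here the P-values are $299/4096 \approx 0.073$ and $67/2048 \approx 0.033$, on opposite sides of 0.05, while both posterior probabilities equal $2860/6956 \approx 0.41$.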

• Posterior probabilities can be used for multiple models or hypotheses.

Example:

$H_0$: pollutant A has no effect on species B

$H_1$: pollutant A decreases abundance of species B

$H_2$: pollutant A increases abundance of species B

$\Pr(H_0 \mid \text{data}) = 0.30$, $\Pr(H_1 \mid \text{data}) = 0.68$, $\Pr(H_2 \mid \text{data}) = 0.02$.
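Posterior probabilities for several hypotheses follow the same recipe as for two: multiply each hypothesis's prior probability by its (average) likelihood and normalize. The data behind the pollutant example are not given, so the sketch below reuses the binomial egg setting as a hypothetical stand-in: $x = 40$ of $n = 100$, $\theta_0 = 0.5$, $H_0: \theta = \theta_0$, $H_1$: $\theta$ uniform on $(0, \theta_0)$ ("decrease"), $H_2$: $\theta$ uniform on $(\theta_0, 1)$ ("increase"), and equal prior probabilities of 1/3. Its output is not meant to reproduce the slide's numbers.

```python
import math


def binom_lik(theta: float, x: int, n: int) -> float:
    """Binomial likelihood f(x | theta)."""
    return math.comb(n, x) * theta**x * (1.0 - theta) ** (n - x)


def avg_lik_uniform(x: int, n: int, lo: float, hi: float, steps: int = 20_000) -> float:
    """Average of f(x | theta) over theta ~ Uniform(lo, hi), by the midpoint rule."""
    h = (hi - lo) / steps
    total = sum(binom_lik(lo + (i + 0.5) * h, x, n) for i in range(steps))
    return total * h / (hi - lo)


def posterior_probs(marginals, priors):
    """Normalize prior probability * marginal likelihood across hypotheses."""
    weights = [m * p for m, p in zip(marginals, priors)]
    total = sum(weights)
    return [w / total for w in weights]


x, n, theta0 = 40, 100, 0.5
marginals = [
    binom_lik(theta0, x, n),             # H0: no effect
    avg_lik_uniform(x, n, 0.0, theta0),  # H1: decrease
    avg_lik_uniform(x, n, theta0, 1.0),  # H2: increase
]
post = posterior_probs(marginals, [1 / 3, 1 / 3, 1 / 3])
for name, p in zip(["H0", "H1", "H2"], post):
    print(f"Pr({name} | data) = {p:.2f}")
```

Splitting the alternative into "decrease" and "increase" lets the data favor a direction directly, which a single two-sided P-value does not do.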

An aside: integrating science and statistics via the Bayesian paradigm

• Any scientific question can be asked (e.g., What is the probability that switching to management plan A will increase species abundance by 20% more than will plan B?)

• Models can be built that simultaneously incorporate known science and statistics.

• If desired, expert opinion can be built into the analysis.

Conclusions

• Hypothesis testing is overutilized while (Bayesian) statistics is underutilized.

• Hypothesis testing is needed only when testing a “plausible” hypothesis (and this may be a rare occurrence in wildlife studies).

• The Bayesian approach to hypothesis testing has considerable advantages in terms of interpretability (actual error rates), general applicability, and flexible experimentation.
