A Review of Multiple Comparisons:
Procedures and Practice
2/8/06
Objectives
• Basic overview of error control
• Highlight various errors that can be
controlled
• Provide selected notes on various
methods
• Find out how error control is viewed in
psychology
Error Rates
Comparisonwise Type I error rate (α): Probability of claiming a difference exists in a particular comparison when there truly is no difference.
Familywise Type I error rate (α_FW): Probability of claiming that any difference exists in a set of comparisons when there are truly no differences.
Studywise Type I error rate: Probability of claiming that any difference exists across all of the comparisons executed for a particular study.
The Problem: Inflated
Familywise Error Rate
Under the null hypothesis, the probability of incorrectly
“rejecting” at least one null hypothesis will be greater
than the nominal alpha. This is a family-wise error rate
issue.
If, and only if, the family-wise error rate is to be directly
controlled should a multiple comparison procedure be used.
When should researchers do this?
Typical Situation: After ANOVA
• With 60 treatments, comparing the largest and smallest responses is not a single independent comparison: implicitly, all 60(60-1)/2 pairwise comparisons are in play.
• If the global null is true and familywise error is not controlled, the probability of making at least one Type I error is at least 0.952.
• Comparing the second-largest difference implicitly involves 59(59-1)/2 comparisons.
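As a quick check of the inflation, the familywise rate can be computed directly. This is a minimal sketch, assuming independent tests each run at a per-comparison alpha of 0.05:

```python
# Familywise error inflation for m independent comparisons, each tested
# at a per-comparison alpha of 0.05 (independence is an assumption):
# P(at least one false rejection) = 1 - (1 - alpha)^m
alpha = 0.05
for m in (1, 10, 60, 60 * 59 // 2):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:4d}: familywise error rate = {fwer:.3f}")
```

With m = 60 the rate is already about 0.954, consistent with the "at least 0.952" figure above; with all 1,770 pairwise comparisons it is essentially 1.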
Multiple Comparisons (MC)
• A multiple comparison procedure is a
method for controlling a “Type I error” rate
other than the per comparison error rate.
– Multiple pairwise comparisons, sequential tests,
multiple tested models, and interim-analysis
situations.
• Step one: What is the family? Is it truly
studywise error control?
Familywise Error Rate
• Variations
– False Discovery Rate (FDR): Controls the expected fraction of falsely claimed differences.
With FDR set to .05, you are allowed one false rejection for 19 correct rejections.
– Strong Familywise Error Rate: Controls the probability that any false rejection is made.
An Issue for the Researcher
• “The choices of error rate and family of
tests are not purely statistical” (Oehlert, 2000)
• Statisticians can help control the error
rate, but researchers must decide which
rate is to be controlled.
Studywise Control
• Critical p-value adjustment approaches.
Flexible; can be applied in most situations.
– Bonferroni: Easy to use.
– Holm: More powerful than Bonferroni.
– Hochberg: The most powerful of the three.
How Bonferroni Works
• Partition alpha throughout the family
• Consider m t-tests.
Without familywise error control:
t*_crit = t(α, df)
With a Bonferroni correction:
t*_crit = t(α_FW/m, df)
By the union bound,
P(t1 > t*_crit) + P(t2 > t*_crit) + … + P(tm > t*_crit) ≤ α_FW
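A minimal numeric sketch of the partition (the p-values below are hypothetical): each test is run at α_FW/m, so only p-values under that stricter cutoff lead to rejection.

```python
# Bonferroni: partition alpha_fw equally across m tests.
alpha_fw = 0.05
m = 5
per_test_alpha = alpha_fw / m  # each comparison is tested at 0.01

p_values = [0.003, 0.012, 0.040, 0.200, 0.600]  # hypothetical p-values
rejected = [p < per_test_alpha for p in p_values]
print(rejected)  # only 0.003 clears the stricter 0.01 cutoff
```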
[Figure (Data #1): "Bonferroni Test with Two Correlated Endpoints". Familywise Type I error rate (y-axis, 0 to 0.06) versus correlation between the endpoints (x-axis, 0 to 1).]
Holm Correction
1. Order the p-values for the m tests from smallest to largest.
2. Reject the null hypothesis corresponding to p(j) if p(j) < α_FW/(m-j+1), for j = 1, …, m.
3. Start with the smallest p-value and stop at the first non-significant result.
Although more complicated than Bonferroni, power is gained because only the smallest p-value is compared to α_FW/m.
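The steps above can be sketched as a short function; this is a hedged illustration, not any particular package's implementation, and the p-values are hypothetical:

```python
def holm_reject(p_values, alpha_fw=0.05):
    """Holm step-down: compare the j-th smallest p-value to
    alpha_fw / (m - j + 1) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for j, i in enumerate(order, start=1):
        if p_values[i] < alpha_fw / (m - j + 1):
            rejected[i] = True
        else:
            break  # stop at the first non-significant result
    return rejected

print(holm_reject([0.011, 0.020, 0.300]))
```

This illustrates the power gain: plain Bonferroni would compare 0.020 to 0.05/3 ≈ 0.0167 and retain it, while Holm compares it to 0.05/2 = 0.025 and rejects.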
Hochberg Method
1. Sort the p-values from smallest to largest.
2. Reject the null hypothesis corresponding to p(j) if p(j) < jα_FW/m, for j = 1, …, m.
3. Start with the largest p-value and work down; once one test passes, reject it and all hypotheses with smaller p-values.
This approach has relatively high power, but it controls the FDR rather than the strong familywise rate.
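A sketch of this step-up rule as described on the slide (the p(j) < jα_FW/m criterion is the Benjamini-Hochberg-style FDR rule); the p-values are hypothetical:

```python
def step_up_reject(p_values, alpha_fw=0.05):
    """Step-up rule: find the largest j with p(j) < j * alpha_fw / m,
    then reject the hypotheses with the j smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for j, i in enumerate(order, start=1):
        if p_values[i] < j * alpha_fw / m:
            k = j  # largest index passing its threshold so far
    rejected = [False] * m
    for j, i in enumerate(order, start=1):
        rejected[i] = j <= k
    return rejected

print(step_up_reject([0.010, 0.020, 0.040, 0.300]))
```

Here the thresholds are 0.0125, 0.025, 0.0375, 0.05; the largest passing index is j = 2, so the two smallest p-values are rejected.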
Studywise Control
• Bonferroni provides TIs (test intervals); the others do not.
• The same approaches can be used on sets of
pairwise comparisons.

Power ordering: Bonferroni < Holm < Hochberg.
Bonferroni and Holm give strong familywise error control; Hochberg gives FDR control.
Multiple Comparisons:
After ANOVA
• A natural family of comparisons is
established.
• An ANOVA is not a necessary step before
using most of the following approaches.
Comparison of Methods
Method          Application       Error Control
Scheffe         All contrasts     SFW
Protected LSD   Pairwise comps    SFW
LSD             Pairwise comps    Comparisonwise
Tukey HSD       Pairwise comps    SFW
SNK             Pairwise comps    FDR
Ryan            Pairwise comps    SFW
Duncan          Control comps     FW
Hsu's MCB       Control comps     FW
…               …                 …
Comparison of Methods
Method          Note
Scheffe         Rarely interested in all possible contrasts
Protected LSD   t-tests after a significant ANOVA
LSD             Does not control the familywise error rate
Tukey HSD       Based on the Studentized range
SNK             Step-down method using the Studentized range
Ryan            Extension of SNK
Duncan          Control comparisons only; rarely of interest
Hsu's MCB       Control comparisons only; rarely of interest
…
Comparison of Methods
Method          Strength / Weakness
Protected LSD   Powerful / no protection for the true portion of H0
Tukey's HSD     Mitigation and understanding of assumption violations; provides TIs
Ryan            More power than HSD / no TIs
SNK             More power than Ryan / no TIs
When Should Researchers Control Family-wise
Alpha?
• Should familywise error rates be controlled after ANOVA?
• When should researchers be more concerned about false discovery rates?
• When should parameter estimates be reported as test intervals or confidence intervals?
• When should studywise control be implemented?
Context of the Development of
Multiple Comparison
Procedures
• “The catch is that Neyman and Pearson developed their statistical tests to aid decision making, not to assess evidence in data.” (Perneger 1998)
• Researchers requiring a yes or no determination should be concerned about the multiple comparison issue.
• Example: Does drug A provide clinical benefits?
When Should Researchers Control
Familywise Error?
Causal relationship underlying the endpoints:
– Different → apply a multiplicity adjustment.
– Same → use a composite endpoint and a global test; no adjustment.
Adapted from Dmitrienko 2006
Examples
• R
• Stata
Comment
A distinction needs to be made between tests of multiple hypotheses and multiple tests of the same hypothesis. If a study of treatment effectiveness uses multiple endpoints, then finding a significant effect for any of the endpoints will lead to the conclusion that the treatment is effective. For this reason, protection against a false positive conclusion has to be based on keeping the familywise error rate to 5%.
However, many studies simultaneously test several hypotheses. I see no reason why we should penalise these designs. (Conroy 2002)