A Review of Multiple Comparisons:
Procedures and Practice
2/8/06
Objectives
• Basic overview of error control
• Highlight various errors that can be
controlled
• Provide selected notes on various
methods
• Find out how error control is viewed in
psychology
Error Rates
Comparisonwise Type I error rate (α): Probability of claiming a difference exists in a particular comparison when there truly is no difference.
Familywise Type I error rate (α_FW): Probability of claiming that any difference exists in a set of comparisons when there are truly no differences.
Studywise Type I error rate: Probability of claiming that any difference exists across all of the comparisons executed for a particular study.
The Problem: Inflated
Familywise Error Rate
Under the null hypothesis, the probability of incorrectly
“rejecting” at least one null hypothesis will be greater
than the nominal alpha. This is a family-wise error rate
issue.
If, and only if, the family-wise error rate is to be directly
controlled should a multiple comparison procedure be used.
When should researchers do this?
Typical Situation: After ANOVA
• With 60 treatments, comparing the largest and smallest responses is not a single independent comparison: implicitly, all 60(60-1)/2 pairwise comparisons are in play.
• If the global null is true and familywise error is not controlled, the probability of making at least one Type I error is at least 0.952.
• Comparing the second-largest difference implicitly involves 59(59-1)/2 comparisons.
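As a quick check of the inflation, the familywise rate can be computed directly. This is a minimal sketch, assuming independent tests each run at a per-comparison alpha of 0.05:

```python
# Familywise error inflation for m independent comparisons, each tested
# at a per-comparison alpha of 0.05 (independence is an assumption):
# P(at least one false rejection) = 1 - (1 - alpha)^m
alpha = 0.05
for m in (1, 10, 60, 60 * 59 // 2):
    fwer = 1 - (1 - alpha) ** m
    print(f"m = {m:4d}: familywise error rate = {fwer:.3f}")
```

With m = 60 the rate is already about 0.954, consistent with the "at least 0.952" figure above; with all 1,770 pairwise comparisons it is essentially 1.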
Multiple Comparisons (MC)
• A multiple comparison procedure is a
method for controlling a “Type I error” rate
other than the per comparison error rate.
– Multiple pairwise comparisons, sequential tests,
multiple tested models, and interim-analysis
situations.
• Step one: What is the family? Is it truly
studywise error control?
Familywise Error Rate
• Variations
– False Discovery Rate (FDR): Controls the expected fraction of falsely claimed differences.
With FDR set to .05, you are allowed one false rejection for 19 correct rejections.
– Strong Familywise Error Rate: Controls the probability that any false rejection is made.
An Issue for the Researcher
• “The choices of error rate and family of
tests are not purely statistical” (Oehlert, 2000)
• Statisticians can help control the error
rate, but researchers must decide which
rate is to be controlled.
Studywise Control
• Critical p-value adjustment approaches.
Flexible; can be applied in most situations.
– Bonferroni: Easy to use.
– Holm: More powerful than Bonferroni.
– Hochberg: The most powerful of the three.
How Bonferroni Works
• Partition alpha throughout the family
• Consider m t-tests.
Without familywise error control:
t*_crit = t(α, df)
With a Bonferroni correction:
t*_crit = t(α_FW/m, df)
By the union bound,
P(t1 > t*_crit) + P(t2 > t*_crit) + … + P(tm > t*_crit) ≤ α_FW
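A minimal numeric sketch of the partition (the p-values below are hypothetical): each test is run at α_FW/m, so only p-values under that stricter cutoff lead to rejection.

```python
# Bonferroni: partition alpha_fw equally across m tests.
alpha_fw = 0.05
m = 5
per_test_alpha = alpha_fw / m  # each comparison is tested at 0.01

p_values = [0.003, 0.012, 0.040, 0.200, 0.600]  # hypothetical p-values
rejected = [p < per_test_alpha for p in p_values]
print(rejected)  # only 0.003 clears the stricter 0.01 cutoff
```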
[Figure (Data #1): "Bonferroni Test with Two Correlated Endpoints". Familywise Type I error rate (y-axis, 0 to 0.06) versus correlation between the endpoints (x-axis, 0 to 1).]
Holm Correction
1. Order the p-values for the m tests from smallest to largest.
2. Reject the null hypothesis corresponding to p(j) if p(j) < α_FW/(m-j+1), for j = 1, …, m.
3. Start with the smallest p-value and stop at the first non-significant result.
Although more complicated than Bonferroni, power is gained because only the smallest p-value is compared to α_FW/m.
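The steps above can be sketched as a short function; this is a hedged illustration, not any particular package's implementation, and the p-values are hypothetical:

```python
def holm_reject(p_values, alpha_fw=0.05):
    """Holm step-down: compare the j-th smallest p-value to
    alpha_fw / (m - j + 1) and stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for j, i in enumerate(order, start=1):
        if p_values[i] < alpha_fw / (m - j + 1):
            rejected[i] = True
        else:
            break  # stop at the first non-significant result
    return rejected

print(holm_reject([0.011, 0.020, 0.300]))
```

This illustrates the power gain: plain Bonferroni would compare 0.020 to 0.05/3 ≈ 0.0167 and retain it, while Holm compares it to 0.05/2 = 0.025 and rejects.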
Hochberg Method
1. Sort the p-values from smallest to largest.
2. Reject the null hypothesis corresponding to p(j) if p(j) < jα_FW/m, for j = 1, …, m.
3. Start with the largest p-value and work down; once one test passes, reject it and all hypotheses with smaller p-values.
This approach has relatively high power, but it controls the FDR rather than the strong familywise rate.
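A sketch of this step-up rule as described on the slide (the p(j) < jα_FW/m criterion is the Benjamini-Hochberg-style FDR rule); the p-values are hypothetical:

```python
def step_up_reject(p_values, alpha_fw=0.05):
    """Step-up rule: find the largest j with p(j) < j * alpha_fw / m,
    then reject the hypotheses with the j smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for j, i in enumerate(order, start=1):
        if p_values[i] < j * alpha_fw / m:
            k = j  # largest index passing its threshold so far
    rejected = [False] * m
    for j, i in enumerate(order, start=1):
        rejected[i] = j <= k
    return rejected

print(step_up_reject([0.010, 0.020, 0.040, 0.300]))
```

Here the thresholds are 0.0125, 0.025, 0.0375, 0.05; the largest passing index is j = 2, so the two smallest p-values are rejected.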
Studywise Control
• Bonferroni provides TIs (test intervals); the others do not.
• The same approaches can be used on sets of
pairwise comparisons.

Power ordering: Bonferroni < Holm < Hochberg.
Bonferroni and Holm give strong familywise error control; Hochberg gives FDR control.
Multiple Comparisons:
After ANOVA
• A natural family of comparisons is
established.
• An ANOVA is not a necessary step before
using most of the following approaches.
Comparison of Methods
Method          Application       Error Control
Scheffe         All contrasts     SFW
Protected LSD   Pairwise comps    SFW
LSD             Pairwise comps    Comparisonwise
Tukey HSD       Pairwise comps    SFW
SNK             Pairwise comps    FDR
Ryan            Pairwise comps    SFW
Duncan          Control comps     FW
Hsu's MCB       Control comps     FW
…               …                 …
Comparison of Methods
Method          Note
Scheffe         Rarely interested in all possible contrasts
Protected LSD   t-tests after a significant ANOVA
LSD             Does not control the familywise error rate
Tukey HSD       Based on the Studentized range
SNK             Step-down method using the Studentized range
Ryan            Extension of SNK
Duncan          Control comparisons only; rarely of interest
Hsu's MCB       Control comparisons only; rarely of interest
…
Comparison of Methods
Method          Strength / Weakness
Protected LSD   Powerful / no protection for the true portion of H0
Tukey's HSD     Mitigation and understanding of assumption violations; provides TIs
Ryan            More power than HSD / no TIs
SNK             More power than Ryan / no TIs
When Should Researchers Control Family-wise
Alpha?
• Should familywise error rates be controlled after ANOVA?
• When should researchers be more concerned about false discovery rates?
• When should parameter estimates be reported as test intervals or confidence intervals?
• When should studywise control be implemented?
Context of the Development of
Multiple Comparison
Procedures
• “The catch is that Neyman and Pearson developed their statistical tests to aid decision making, not to assess evidence in data.” (Perneger 1998)
• Researchers requiring a yes or no determination should be concerned about the multiple comparison issue.
• Example: Does drug A provide clinical benefits?
When Should Researchers Control
Familywise Error?
Causal relationship underlying the endpoints:
– Different → apply a multiplicity adjustment.
– Same → use a composite endpoint and a global test; no adjustment.
Adapted from Dmitrienko 2006
Examples
• R
• Stata
Comment
A distinction needs to be made between tests of multiple hypotheses and multiple tests of the same hypothesis. If a study of treatment effectiveness uses multiple endpoints, then finding a significant effect for any of the endpoints will lead to the conclusion that the treatment is effective. For this reason, protection against a false positive conclusion has to be based on keeping the familywise error rate to 5%.
However, many studies simultaneously test several hypotheses. I see no reason why we should penalise these designs. (Conroy 2002)