Basic statistical tests in R
Anja Bråthen Kristoffersen
Biomedical Research Group
Outline
• Example: Tea testing
– Hypothesis testing, type I and type II error
• More tests and how to find and read help files for different tests in R
2014.01.15 2
Famous hypothesis test example:
The Design of Experiments (1935), Sir Ronald A. Fisher
– A tea party in Cambridge in the 1920s
– A lady claims that she can taste whether the milk was poured into the cup before or after the tea
– All professors agree: impossible
– Fisher: this is statistically interesting!
He organized a test
The lady tasting tea
• Test with 8 trials, 2 cups in each trial
– In each trial: guess which cup had the milk poured in first
• Binomial experiment
– Independent trials
– Two possible outcomes, she guesses right cup (success), wrong cup (failure)
– Constant probability of success in each trial
• X = number of correct guesses in 8 trials, each with probability of success p
– X is binomially distributed: X ~ Binomial(8, p)
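As a small sketch of this model in R, dbinom() gives the probability of each possible number of correct guesses. The value p = 0.5 (pure guessing) is an assumption used here only for illustration.

```r
# Distribution of X = number of correct guesses in 8 trials,
# assuming (for illustration) that she is only guessing, p = 0.5
n <- 8
p <- 0.5
x <- 0:n
probs <- dbinom(x, size = n, prob = p)
names(probs) <- x
print(round(probs, 4))
# The probabilities over all possible outcomes sum to 1
print(sum(probs))
```

With p = 0.5 most of the probability sits on 3, 4 and 5 correct guesses.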
The lady tasting tea, cont.
• The null (conservative) hypothesis, H0: p = 0.5
– The one we initially believe in
– She has no special ability to taste the difference
• The alternative hypothesis, H1: p > 0.5
– The new claim we wish to test
– She has a special ability to taste the difference
How many right to be convinced?
• We expect maybe 3, 4 or 5 correct guesses if she has no special ability
• Assume 7 correct guesses
– Is there enough evidence to claim that she has a special ability?
– With 8 correct guesses this would have been even more obvious!
• What if only 6 correct guesses?
– Then it is not so easy to answer YES or NO
• Need a rule that says something about what it takes to be convinced.
How many right to be convinced?
• Rule: We reject H0 if the observed data have a small probability under H0 (given H0 is true).
• Compute the p-value.
– The probability to obtain the observed value or something more extreme, given that H0 is true
– NB! The p-value is NOT the probability that H0 is true
Small p-value: reject the null hypothesis
Large p-value: keep the null hypothesis
The lady tasting tea, cont.
• Say: she identified 6 cups correctly
• P-value
– The probability to obtain the observed value or something more extreme, given that H0 is true
P(X ≥ 6 | H0 true) = P(X = 6 | p = 0.5) + P(X = 7 | p = 0.5) + P(X = 8 | p = 0.5)
= dbinom(6, 8, 0.5) + dbinom(7, 8, 0.5) + dbinom(8, 8, 0.5)
= sum(dbinom(6:8, 8, 0.5)) = 0.1445
• Is this enough to be convinced?
• Need a limit.
– We must know about the types of errors we can make.
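The same one-sided p-value can be obtained in several ways in R. The sketch below compares the manual sum with the upper tail from pbinom() and with the built-in exact test binom.test(); the variable names are only illustrative.

```r
# One-sided p-value for 6 correct guesses out of 8 under H0: p = 0.5
p_manual <- sum(dbinom(6:8, size = 8, prob = 0.5))
# Same upper tail via the cumulative distribution: P(X >= 6) = P(X > 5)
p_tail <- pbinom(5, size = 8, prob = 0.5, lower.tail = FALSE)
# Built-in exact binomial test with a one-sided alternative
p_test <- binom.test(6, 8, p = 0.5, alternative = "greater")$p.value
print(c(manual = p_manual, tail = p_tail, binom.test = p_test))
```

All three should agree, since they compute the same probability P(X ≥ 6 | p = 0.5).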
Two types of error
• Type I error is the most serious
– Wrongly rejecting the null hypothesis
– Example:
• H0: person is not guilty
• H1: person is guilty
• To say a person is guilty when he is not is far more serious than to say he is not guilty when he is.
            H0 true        H1 true
Accept H0   OK             Type II error
Accept H1   Type I error   OK
When to reject
• Decide on the hypothesis’ level of significance
– Choose a level of significance α
– This guarantees P(type I error) ≤ α
– Example
• Level of significance at 0.05 gives at most 5 % probability of rejecting a true H0
• Reject H0 if P-value is less than α
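A simulation sketch of what the level of significance guarantees, using the tea experiment (8 trials, H0: p = 0.5). The number of simulated experiments and the seed are arbitrary assumptions.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n_sim <- 100000                  # number of simulated experiments (assumption)
alpha <- 0.05
# Simulate the number of correct guesses when H0 is true (p = 0.5)
x_sim <- rbinom(n_sim, size = 8, prob = 0.5)
# One-sided p-value P(X >= x | p = 0.5) for each simulated experiment
p_values <- pbinom(x_sim - 1, size = 8, prob = 0.5, lower.tail = FALSE)
# Proportion of experiments where a true H0 is rejected
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```

The proportion of wrong rejections should stay below α. Because X is discrete it does not have to reach α exactly.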
Important parameters in hypothesis testing
• Null hypothesis
• Alternative hypothesis
• Level of significance
Must be decided upon before we know the results of the experiment
The lady tasting tea, cont.
• Choose 5 % level of significance
• Conduct the experiment
– Say: she identified 6 cups correctly
– Is this evidence enough?
• P-value
– The probability to obtain the observed value or something more extreme, given that H0 is true
P(X ≥ 6 | H0 true) = sum(dbinom(6:8, 8, 0.5)) = 0.1445
The lady tasting tea, cont.
• We obtained a p-value of 0.1445
• The rejection rule says
– Reject H0 if p-value is less than the level of significance α
– Since 0.1445 > α = 0.05 we do NOT reject H0
Small p-value: reject the null hypothesis
Large p-value: keep the null hypothesis
The lady tasting tea, cont.
• In the tea party in Cambridge:
– The lady got every trial correct!
• Comment:
– Why does it taste different?
• Pouring hot tea into cold milk makes the milk curdle, but pouring cold milk into hot tea does not*
*http://binomial.csuhayward.edu/applets/appletNullHyp.html
Curdle = to separate (Norwegian: å skille seg)
Area of rejection
• Reject H0 if p-value ≤ α
• Reject H0 if observed value (x) ≥ critical value (xc)
• P(type I error) = P(reject H0 | H0 true) = P(X ≥ xc | p = 0.5)
– xc = 7 → sum(dbinom(7:8, 8, 0.5)) = 0.035 ≤ 0.05
– xc = 6 → sum(dbinom(6:8, 8, 0.5)) = 0.145 > 0.05
Area of rejection: {x: x ≥ xc} → {x: x ≥ 7}
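The search for the critical value can be sketched in R by computing P(X ≥ xc | p = 0.5) for every candidate xc and picking the smallest one that keeps the type I error at or below α. The names xc_candidates and type1 are only illustrative.

```r
n <- 8
alpha <- 0.05
xc_candidates <- 0:n
# P(type I error) for each candidate critical value: P(X >= xc | p = 0.5)
type1 <- pbinom(xc_candidates - 1, size = n, prob = 0.5, lower.tail = FALSE)
# Smallest critical value that keeps P(type I error) at or below alpha
xc <- min(xc_candidates[type1 <= alpha])
print(xc)
print(type1[xc_candidates == xc])
```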
Type II error
P(type I error) ≤ α
P(type II error) = β
• Want both errors as small as possible
– especially type I.
• β is not explicitly given, depends on H1
• There is one β for each possible value of p under H1
            H0 true        H1 true
Accept H0   OK             Type II error
Accept H1   Type I error   OK
Example: type II error
• P(type II error) = P(not reject H0 | H1 true)
– p = 0.7:
P(not reject H0 | p = 0.7) = 1 - P(reject H0 | p = 0.7)
= 1 - P(X ≥ 7 | p = 0.7) = 1 - (1 - P(X < 7 | p = 0.7))
= P(X ≤ 6 | p = 0.7) = sum(dbinom(0:6, 8, 0.7)) = 0.745
p = 0.7: H0 will wrongly be accepted in 74.5 % of the tests
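Since there is one β for each value of p under H1, it can help to compute several at once. A minimal sketch using the cumulative distribution function pbinom(); the values in p_alt are illustrative choices, not taken from the slides.

```r
# P(type II error) = P(X <= 6 | p) when H0 is rejected for X >= 7
p_alt <- c(0.6, 0.7, 0.8, 0.9)   # illustrative values of p under H1
beta <- pbinom(6, size = 8, prob = p_alt)
names(beta) <- p_alt
print(round(beta, 3))
```

The further p is from 0.5, the smaller β becomes.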
Power of the test
• The probability that a false H0 is rejected
P(reject H0 | H1 true) = 1 - P(accept H0 | H1 true) = 1 - β
• A test with large power has:
– larger probability to draw the right conclusion
– larger probability to reject a false null hypothesis
than a test with low power.
• α and β are connected:
– Decreasing α will give an increased β which again will decrease the power of the test
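The connection between α and β can be illustrated with the tea experiment by comparing two rejection rules, X ≥ 7 and X ≥ 8. The alternative p = 0.7 is an assumption chosen for illustration.

```r
# Compare two rejection rules at p = 0.7 (an illustrative value under H1)
xc <- c(7, 8)                                        # reject H0 if X >= xc
alpha <- pbinom(xc - 1, 8, 0.5, lower.tail = FALSE)  # P(type I error)
beta  <- pbinom(xc - 1, 8, 0.7)                      # P(type II error)
power <- 1 - beta
print(data.frame(xc, alpha, beta, power))
```

Moving the critical value from 7 to 8 lowers α but raises β, so the power drops.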
Example: power function
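# Power of the test (n = 8, reject H0 if X >= 7) for values of p under H1
p <- seq(0.6, 1, 0.01)
antall <- length(p)  # antall = number of p values
beta8 <- rep(NA, antall)
for(i in 1:antall){
  # beta = P(X <= 6 | p); 0:6 so that the outcome X = 0 is included
  beta8[i] <- sum(dbinom(0:6, 8, p[i]))
}
power8 <- 1 - beta8
plot(p, power8, type = "l")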
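The power curve can also be computed without a loop, since pbinom() accepts a vector of probabilities. This is a sketch of an alternative, with power8_vec as an illustrative name.

```r
p <- seq(0.6, 1, 0.01)
# Power = P(X >= 7 | p) = P(X > 6 | p), computed for all p at once
power8_vec <- pbinom(6, size = 8, prob = p, lower.tail = FALSE)
# Power at p = 0.7, for comparison with 1 - beta from the type II example
power_at_07 <- power8_vec[which.min(abs(p - 0.7))]
print(round(power_at_07, 3))
```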
chisq.test()
Reject the null hypothesis and conclude that there are differences between the groups
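A minimal sketch of a chisq.test() call. The 2 x 2 table below is hypothetical, made up for illustration; it is not the data behind the slide.

```r
# Hypothetical 2 x 2 table of counts (NOT the data from the slides)
counts <- matrix(c(30, 10,
                   15, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))
# Chi-squared test of independence between group and outcome
res <- chisq.test(counts)
print(res)
# Reject H0 (no difference between the groups) if the p-value is small
print(res$p.value < 0.05)
# In an interactive session, ?chisq.test opens the help page
```

For 2 x 2 tables chisq.test() applies Yates' continuity correction by default (argument correct = TRUE).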
Multiple hypothesis testing
• A test is designed to have a fixed expected proportion of incorrectly rejected true null hypotheses (the level of significance); most often this level is 5 %.
• When many tests are done, the probability of falsely rejecting at least one null hypothesis increases; hence we correct the p-values according to how many tests are done.
Example: 10 000 genes
• Q: is gene g, g = 1, …, 10 000, differentially expressed?
• Gives 10 000 null hypotheses: H0^(1), H0^(2), …, H0^(10000)
– H0^(1): gene 1 not differentially expressed
– …
• Assume: no genes differentially expressed
– H0^(g) true for all g
• Significance level α = 0.01
– The probability to incorrectly conclude that a given gene is differentially expressed is 0.01, i.e. 0.01 * 10000 = 100 expected wrong rejections of H0^(g)
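A simulation sketch of this situation. It assumes that the p-value of each gene is uniformly distributed on [0, 1] when its null hypothesis is true, which holds for continuous test statistics; the seed is arbitrary.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
n_genes <- 10000
alpha <- 0.01
# Under a true H0 the p-value is uniformly distributed on [0, 1]
# (an idealisation that holds for continuous test statistics)
p_values <- runif(n_genes)
# Number of genes wrongly declared differentially expressed
n_false <- sum(p_values < alpha)
print(n_false)         # should be close to alpha * n_genes = 100
```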
Need to control the risk of false positives (type I errors)
• Corrected p-value:
– The original p-values do not tell the full story.
– Instead of using the original p-values for decision making, we should use corrected ones.
Different correction methods
• Bonferroni (1935)
– Just multiply all the p-values by the number of tests (capped at 1)
– Too conservative
• need very small p-value to reject 𝐻0
• gives very little power
• Methods that control the family-wise error rate (FWER).
• Methods that control the false discovery rate (FDR).
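The Bonferroni correction can be sketched by hand and compared with p.adjust() from base R. The raw p-values below are made up for illustration.

```r
# Hypothetical raw p-values from 5 tests (made up for illustration)
p_raw <- c(0.001, 0.008, 0.02, 0.03, 0.3)
m <- length(p_raw)
# Bonferroni: multiply by the number of tests, capped at 1
p_bonf_manual <- pmin(1, p_raw * m)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
print(rbind(raw = p_raw, bonferroni = p_bonf))
# Number of rejections at the 5 % level before and after correction
print(c(raw = sum(p_raw < 0.05), bonferroni = sum(p_bonf < 0.05)))
```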
Family-Wise Error Rate (FWER)
• Control type I errors at a level α
– Bonferroni
– Sidak
– Bonferroni-Holm
– Westfall & Young
• Use one of these if you are most afraid of getting stuff on your significant list that should not have been there
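A sketch comparing three of these FWER methods on made-up p-values. Bonferroni and Bonferroni-Holm are available through p.adjust(); the Sidak adjustment is written out by hand here as 1 - (1 - p)^m, and Westfall & Young is left out since it requires resampling.

```r
# Hypothetical raw p-values from 5 tests (made up for illustration)
p_raw <- c(0.001, 0.008, 0.02, 0.03, 0.3)
m <- length(p_raw)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")
# Sidak adjustment written out by hand: 1 - (1 - p)^m
p_sidak <- 1 - (1 - p_raw)^m
print(round(rbind(raw = p_raw, bonferroni = p_bonf,
                  sidak = p_sidak, holm = p_holm), 4))
```

The Holm and Sidak adjusted p-values are never larger than the Bonferroni ones, so these methods are at least as powerful.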
False Discovery Rate (FDR)
• Control the expected proportion of type I errors among the rejected hypotheses
• Technique that applies to a set of p-values
– Benjamini & Hochberg
– Different newer variants of Benjamini & Hochberg
• Use one of these if you are most afraid of missing out on interesting stuff
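A sketch of the Benjamini & Hochberg correction with p.adjust(method = "BH"), next to the FWER-controlling Holm method for comparison. The raw p-values are made up for illustration.

```r
# Hypothetical raw p-values from 5 tests (made up for illustration)
p_raw <- c(0.001, 0.008, 0.02, 0.03, 0.3)
p_holm <- p.adjust(p_raw, method = "holm")   # controls the FWER
p_bh   <- p.adjust(p_raw, method = "BH")     # controls the FDR
print(round(rbind(raw = p_raw, holm = p_holm, BH = p_bh), 4))
# Number of rejections at the 5 % level for each method
print(c(holm = sum(p_holm < 0.05), BH = sum(p_bh < 0.05)))
```

On these values the FDR method keeps more hypotheses on the significant list than the FWER method does.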