BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings
Yujin Chung
October 4th, 2016
Fall 2016
Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30
Previous
Two types of statistical inferences:
Estimation: concerned with estimating the values of specific population parameters. These specific values are referred to as point estimates. Sometimes, interval estimation is carried out to specify an interval which likely includes the parameter values.
Hypothesis testing: concerned with testing whether the value of a population parameter is equal to some specific value.
Hypothesis testing
Philosophy: prove a claim by contradiction.
Analogy: “dependent love” story
Claim: You don’t love me.
Reasoning: If you loved me, you would take the trash out every week and put your socks away.
Data: Some weeks you don’t take the trash out or leave your socks where they fall.
Conclusion: You don’t love me.
Statistical hypothesis testing
The hypothesis-testing framework specifies two hypotheses: the null and the alternative hypothesis.
- The null hypothesis (H0) is often an initial claim that researchers specify using previous research or knowledge. Typically it is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value.
- The alternative hypothesis (H1) is what you might believe to be true or hope to prove true.
H0 : you love me vs. H1 : you don’t love me
Hypothesis testing provides an objective framework for making decisions using probabilistic methods, rather than relying on subjective impressions.
Examples
The average cholesterol level in children is 175 mg/dL. A group of men who have died from heart disease within the past year are identified, and the cholesterol levels of their offspring are measured. (1) Is the average cholesterol level of these children larger than 175 mg/dL? (2) Is the average cholesterol level different from that of children whose fathers do not have a history of heart disease?
- µ1: the population mean of cholesterol level in the case group
- µ2: the population mean of cholesterol level in the control group
- (1) H0: µ1 = 175 vs. H1: µ1 > 175
- (2) H0: µ1 = µ2 vs. H1: µ1 ≠ µ2
Are the IQ and the number of finger-wrist taps (fwt) of children in the lead-exposed group different from those of children in the control group?
- H0: the population means of fwt in the two groups are the same vs. H1: the means are different
Four possible outcomes in hypothesis testing
            Do not reject H0                     Reject H0
H0 true     true negative (1 − α)                false positive: Type I error (α)
H1 true     false negative: Type II error (β)    true positive: Power (1 − β)
Two possible errors:
- Type I error (α): Pr(Reject H0 | H0 true), commonly referred to as the significance level of a test.
- Type II error (β): Pr(Do not reject H0 | H1 true)
The power of a test: 1 − β = Pr(Reject H0 | H1 true)
We prefer a test with small α and large power (1 − β).
Statistical hypothesis test: choose the test with the greatest power (1 − β) among all possible tests with a given type I error α.
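To make α, β, and power concrete, here is a minimal Python sketch. It assumes a one-sided z-test with known σ (a simplification not yet introduced at this point in the lecture), and the values µ0 = 175, µ1 = 200, σ = 50, n = 10 are hypothetical inputs chosen for illustration. The function name `z_test_power` is also an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: mu = mu0 vs. H1: mu > mu0
    when the true mean is mu1 and sigma is treated as known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # reject H0 when Z > z_crit
    shift = (mu1 - mu0) / (sigma / sqrt(n))    # mean of Z when the true mean is mu1
    return 1 - std_normal.cdf(z_crit - shift)  # Pr(Reject H0 | true mean = mu1)

alpha = 0.05
power = z_test_power(mu0=175, mu1=200, sigma=50, n=10, alpha=alpha)
beta = 1 - power  # type II error probability
print(f"alpha = {alpha}, power = {power:.3f}, beta = {beta:.3f}")
```

Setting `mu1 = mu0` returns α itself, which matches the definition Pr(Reject H0 | H0 true).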
t-Test for the Mean
We assume the cholesterol levels in children follow $N(\mu, \sigma^2)$. We wish to test whether the mean cholesterol level of children with a family history of heart disease is equal to 175 mg/dL, the average for children without a family history, or larger than 175 mg/dL.
Hypotheses: H0 : µ = 175 vs. H1 : µ > 175.
Logic:
1. Assume H0 is true.
2. If x̄ is too large, it contradicts the assumption that H0 is true.
t-test for the mean: critical-value method
H0 : µ = 175 vs. H1 : µ > 175.
The distribution of X̄ under H0: since $X_1, \ldots, X_n \sim N(175, \sigma^2)$,
$$t = \frac{\bar{X} - 175}{S/\sqrt{n}} \sim t_{n-1}.$$
[Figure: Critical-value method for H0: µ = 175 vs. H1: µ > 175. Density function of $t_{n-1}$, with the acceptance region below the critical value $t_{n-1,1-\alpha}$ and the rejection region above it.]
Critical value: $t_{n-1,1-\alpha}$
If $t > t_{n-1,1-\alpha}$, reject H0; if $t \le t_{n-1,1-\alpha}$, do not reject H0.
Type I error: $\Pr(t > t_{n-1,1-\alpha} \mid H_0) = \alpha$.
t-Test for the Mean: p-value method
H0 : µ = 175 vs H1 : µ > 175.
Test statistic: $t = \dfrac{\bar{X} - 175}{S/\sqrt{n}}$.
Under H0: test statistic $t \sim t_{n-1}$
p-value: $\Pr(t > t^{(obs)} \mid H_0)$, the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic value, given that H0 is true.
[Figure: p-value for the test H0: µ = 175 vs. H1: µ > 175. Density function of $t_{n-1}$, marking the critical value $t_{n-1,1-\alpha}$, the observed statistic $t^{(obs)}$, the rejection region, and the p-value $\Pr(t > t^{(obs)})$.]
If p-value < α, reject H0; if p-value ≥ α, do not reject H0.
Significance level: α
H0 is rejected if $t > t_{n-1,1-\alpha}$ or p-value < α.
α: significance level (the type I error probability), typically set to 0.05
Guidelines for judging the significance of a p-value
If p ≥ 0.05, then the results are considered not statistically significant
If 0.01 ≤ p < 0.05, then the results are statistically significant
If 0.001 ≤ p < 0.01, then the results are highly significant
If p < 0.001, then the results are very highly significant
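The guidelines above amount to a simple lookup on the p-value. A small Python sketch of that mapping follows. The function name `significance_label` is hypothetical, and the thresholds are taken directly from the list.

```python
def significance_label(p):
    """Map a p-value to the conventional label from the guidelines."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie in [0, 1]")
    if p >= 0.05:
        return "not statistically significant"
    if p >= 0.01:
        return "statistically significant"
    if p >= 0.001:
        return "highly significant"
    return "very highly significant"

# Print the label next to the exact p-value, not instead of it.
for p in (0.074, 0.03, 0.004, 0.0002):
    print(f"p = {p}: {significance_label(p)}")
```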
Report an exact p-value!
The p-value indicates exactly how significant the results are without performing repeated significance tests at different α levels.
The p-value indicates how close to statistical significance the results have come even when they are not statistically significant.
Example: the cholesterol of children
Suppose the mean cholesterol level of 10 children whose fathers died from heart disease is 200 mg/dL and the sample standard deviation is 50 mg/dL. The average cholesterol level in children is known to be 175 mg/dL. Is the average cholesterol level of these children larger than 175 mg/dL?
Let µ be the population mean cholesterol level of children whose fathers died from heart disease. The hypotheses are H0 : µ = 175 vs. H1 : µ > 175.
The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and follows $t_{n-1} = t_9$ under H0. The observed test statistic is $t^{(obs)} = \dfrac{200 - 175}{50/\sqrt{10}} = 1.58$.
Critical-value method: At the significance level 5%, the critical value is $t_{n-1,1-\alpha} = t_{9,0.95} = 1.833$ and the rejection region is $t > 1.833$. Since $t^{(obs)} = 1.58 < 1.833$, we cannot reject H0 at significance level 5%.
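The critical-value decision for this example can be reproduced with a few lines of Python. This is only a sketch: the helper name `one_sample_t` is hypothetical, and the critical value 1.833 is copied from the text rather than computed.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

t_obs = one_sample_t(xbar=200, mu0=175, s=50, n=10)
t_crit = 1.833            # t_{9, 0.95}, quoted from the example (not computed here)
reject = t_obs > t_crit   # one-sided rejection region: t > t_crit
print(f"t(obs) = {t_obs:.2f}, reject H0 at the 5% level: {reject}")
```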
p-value method: The p-value is $p = \Pr(t > t^{(obs)} \mid H_0) = \Pr(t > 1.58 \mid H_0) = 0.074$. Since p > 0.05, we cannot reject H0 at significance level 5%.
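The one-sided p-value can be checked numerically without a statistics package. The sketch below integrates the $t_9$ density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and the numerical integration is a stand-in for a library routine, not the method used in the lecture.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Pr(T > t), using Simpson's rule on [0, |t|] and the symmetry of the density."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # Pr(0 < T < |t|)
    upper = 0.5 - area            # Pr(T > |t|)
    return upper if t >= 0 else 1 - upper

t_obs = (200 - 175) / (50 / math.sqrt(10))
p_value = t_upper_tail(t_obs, df=9)
print(f"t(obs) = {t_obs:.2f}, one-sided p-value = {p_value:.3f}")
```

Subtracting the integral from 0.5 loses precision for very small tail probabilities, so this sketch is only suitable for moderate values of t.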
One-tailed t-test for the mean
A one-tailed test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis (µ0) but not both.
H0 : µ = µ0
Test statistic: $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

H1         Rejection region            p-value
µ > µ0     $t > t_{n-1,1-\alpha}$      $\Pr(t > t^{(obs)} \mid H_0)$
µ < µ0     $t < t_{n-1,\alpha}$        $\Pr(t < t^{(obs)} \mid H_0)$
Two-sided alternatives
The test for H0 : µ = µ0 vs. H1 : µ ≠ µ0 is based on $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
Critical-value method
- Rejection region: If $|t| > t_{n-1,1-\alpha/2}$, then H0 is rejected.
- Acceptance region: If $|t| \le t_{n-1,1-\alpha/2}$, then H0 is NOT rejected.
- Type I error: $\Pr(|t| > t_{n-1,1-\alpha/2} \mid H_0) = \alpha$.
p-value method: the p-value is $\Pr(|t| > |t^{(obs)}| \mid H_0)$. If p < 0.05, then H0 is rejected at significance level 5%.
Example: two-sided alternatives
(continued from the cholesterol data) Is the average cholesterol level of children whose fathers had heart disease different from the US average cholesterol level (175) of children?
The hypotheses are H0 : µ = 175 vs. H1 : µ ≠ 175. The observed test statistic is $t^{(obs)} = 1.58$.
Critical-value method: At significance level 5%, the rejection region is $t > 2.262$ or $t < -2.262$. Since the observed test statistic satisfies $-2.262 < 1.58 < 2.262$, we cannot reject H0 at significance level 5%.
p-value method: $p = \Pr(|t| > |t^{(obs)}| \mid H_0) = \Pr(t > 1.58 \mid H_0) + \Pr(t < -1.58 \mid H_0) = 0.149$. Since p > 0.05, we cannot reject H0 at significance level 5%.
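The Relationship Between Hypothesis Testing and Confidence Intervals
Suppose we are testing H0 : µ = µ0 vs. H1 : µ ≠ µ0.
H0 is rejected at significance level α if and only if the 100% × (1 − α) CI for µ does not contain µ0.
Recall that the 100%(1 − α) CI for µ is $\bar{x} \pm t_{n-1,1-\alpha/2}\, s/\sqrt{n}$.
If H0 is rejected,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,1-\alpha/2} \quad \text{or} \quad t > t_{n-1,1-\alpha/2}$$
$$\Rightarrow\ \bar{x} - \mu_0 < -t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad \text{or} \quad \bar{x} - \mu_0 > t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$
$$\Rightarrow\ \mu_0 > \bar{x} + t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad \text{or} \quad \mu_0 < \bar{x} - t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$
Therefore, the 100%(1 − α) CI does not include µ0. Similarly, the converse can be proved.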
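This equivalence can be checked on the cholesterol example with a short Python sketch. The helper name `t_confidence_interval` is hypothetical, and the critical value $t_{9,0.975} = 2.262$ is taken from the two-sided example rather than computed.

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Two-sided CI for the mean: xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

mu0 = 175
t_crit = 2.262  # t_{9, 0.975}, quoted from the two-sided example
lower, upper = t_confidence_interval(xbar=200, s=50, n=10, t_crit=t_crit)

t_obs = (200 - mu0) / (50 / sqrt(10))
reject = abs(t_obs) > t_crit              # two-sided test at the 5% level
ci_contains_mu0 = lower <= mu0 <= upper   # does the 95% CI contain mu0?

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print(f"CI contains mu0: {ci_contains_mu0}, reject H0: {reject}")
```

The interval contains 175 exactly when the test fails to reject, as the derivation states.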
z-test for the Mean with Known Variance
Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known.
The test for H0 : µ = µ0 is based on the test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, which follows $N(0, 1)$ under H0.

H1         Rejection region          p-value
µ > µ0     $z > z_{1-\alpha}$        $\Pr(Z > z^{(obs)} \mid H_0)$
µ < µ0     $z < z_{\alpha}$          $\Pr(Z < z^{(obs)} \mid H_0)$
µ ≠ µ0     $|z| > z_{1-\alpha/2}$    $\Pr(|Z| > |z^{(obs)}| \mid H_0)$
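The three rows of the table can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `z_test` and the `alternative` labels are assumptions of this sketch. The inputs reuse the cholesterol numbers, with σ = 50 treated as known purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """z statistic and p-value for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":      # H1: mu > mu0
        p = 1 - phi(z)
    elif alternative == "less":       # H1: mu < mu0
        p = phi(z)
    elif alternative == "two-sided":  # H1: mu != mu0
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p

z_obs, p_greater = z_test(200, 175, 50, 10, alternative="greater")
_, p_less = z_test(200, 175, 50, 10, alternative="less")
_, p_two_sided = z_test(200, 175, 50, 10, alternative="two-sided")
print(f"z(obs) = {z_obs:.2f}, one-sided p = {p_greater:.3f}, two-sided p = {p_two_sided:.3f}")
```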
Tests for Binomial probability p
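Let $X \sim \text{Binomial}(n, p)$. We want to test $H_0: p = p_0$ vs. $H_1: p \ne p_0$.
The test statistic is $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{p} = X/n$. If $np_0(1 - p_0) \ge 5$, $Z \sim N(0, 1)$ approximately under H0.
Rejection region: $z^{(obs)} < z_{\alpha/2}$ or $z^{(obs)} > z_{1-\alpha/2}$
p-value:
- If $\hat{p} \le p_0$, then $p = 2 \times \Pr(Z < z^{(obs)} \mid H_0)$
- If $\hat{p} > p_0$, then $p = 2 \times \Pr(Z > z^{(obs)} \mid H_0)$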
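A minimal Python sketch of this test follows. The data (60 successes out of 100 trials, p0 = 0.5) are hypothetical, and the function name `proportion_z_test` is an assumption. The p-value rule follows the two cases listed above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Two-sided normal-approximation test of H0: p = p0 for X ~ Binomial(n, p)."""
    if n * p0 * (1 - p0) < 5:
        raise ValueError("normal approximation not recommended: n*p0*(1-p0) < 5")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf
    # Double the tail on the side where p_hat falls relative to p0.
    p_value = 2 * phi(z) if p_hat <= p0 else 2 * (1 - phi(z))
    return z, p_value

z_obs, p_value = proportion_z_test(x=60, n=100, p0=0.5)  # hypothetical data
print(f"z(obs) = {z_obs:.2f}, p-value = {p_value:.4f}")
```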
Two-sample case
In a two-sample hypothesis-testing problem, the underlying parameters of two different populations are compared.
- fwt left and right
- maxfwt in the control group and exposed group
Independent samples: when the data points in one sample are unrelated to the data points in the second sample.
The paired sample: Paired t-test
Paired sample: when each data point in the first sample is matched and is related to a unique data point in the second sample. Paired samples may represent two sets of measurements on the same people or on different people who are chosen on an individual basis using matching criteria, such as age and sex, to be very similar to each other.
LEAD data: The numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l, respectively) were observed for each of 124 children. We want to test whether the number of finger-wrist taps differs between the right hand and the left hand.
The paired sample: paired t-test
Consider two samples: $(X_{1,1}, X_{2,1}), \ldots, (X_{1,n}, X_{2,n})$, where $E(X_{1,i}) = \mu_1$ and $E(X_{2,i}) = \mu_2$ for all $i = 1, \ldots, n$. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$.
Let $\Delta = \mu_1 - \mu_2$. Then, $H_0: \Delta = 0$ vs. $H_1: \Delta \ne 0$.
To get rid of the correlation between $X_{1,i}$ and $X_{2,i}$, we consider the differences $d_i = X_{1,i} - X_{2,i}$ for $i = 1, \ldots, n$.
We assume $d_1, \ldots, d_n \sim N(\Delta, \sigma_d^2)$. It is a one-sample t-test problem.
Test statistic: $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, respectively.
Under H0: $t \sim t_{n-1}$
p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$
CI: $\bar{d} \pm t_{n-1,1-\alpha/2}\, s_d/\sqrt{n}$
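The reduction to a one-sample problem can be sketched in Python as follows. The five pairs of measurements are hypothetical values made up for illustration, and the helper name `paired_t_statistic` is an assumption. The sketch stops at the test statistic and its degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x1, x2):
    """Paired t statistic d_bar / (s_d / sqrt(n)) and its degrees of freedom n - 1."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x1, x2)]  # within-pair differences d_i
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                       # sample s.d. (divides by n - 1)
    return d_bar / (s_d / sqrt(n)), n - 1

# Hypothetical paired measurements on five subjects
x1 = [52, 60, 47, 58, 61]
x2 = [48, 55, 45, 50, 57]
t_obs, df = paired_t_statistic(x1, x2)
print(f"t(obs) = {t_obs:.3f} on {df} degrees of freedom")
```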
Example: Lead data
The numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l, respectively) were observed for each of 124 children. We want to test whether the number of finger-wrist taps differs between the right hand and the left hand.
Since fwt_r and fwt_l are not independent but form a paired sample, we consider their differences.
$H_0: \Delta = 0$ vs. $H_1: \Delta \ne 0$
Mean of the differences: $\bar{d} = 5.919$; s.d.: $s_d = 6.711$
Test statistic: $t = \dfrac{5.919}{6.711/\sqrt{124}} = 9.8206$
The distribution of the test statistic under H0: $t_{124-1}$
p-value: $2 \times \Pr(t > 9.8206) = 3.699 \times 10^{-17}$
At significance level 5%, we reject the null. Right- and left-hand fwt are significantly different.
Two independent samples: equal variances
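Consider two independent samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$.
We assume $\sigma^2 = \sigma_1^2 = \sigma_2^2$.
$\bar{X}_1 \sim N(\mu_1, \sigma^2/n_1)$, $\bar{X}_2 \sim N(\mu_2, \sigma^2/n_2)$
$\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma^2/n_1 + \sigma^2/n_2)$
The pooled variance estimate of $\sigma^2$:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
a weighted average of $s_1^2$ and $s_2^2$.
Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$ under H0.
Rejection region: $t^{(obs)} > t_{n_1+n_2-2,1-\alpha/2}$ or $t^{(obs)} < -t_{n_1+n_2-2,1-\alpha/2}$
p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$
CI: $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\, s\sqrt{1/n_1 + 1/n_2}$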
Example: Lead data
We now assume the variances of maxfwt in the control ($n_1 = 78$) and exposed group ($n_2 = 46$) are the same. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$.
Sample means: $\bar{x}_1 = 62.44$, $\bar{x}_2 = 59.76$; sample variances: $s_1^2 = 415.18$ and $s_2^2 = 625.43$
Pooled variance: $s^2 = \dfrac{(78 - 1)\,415.18 + (46 - 1)\,625.43}{78 + 46 - 2} = 492.734$
Test statistic: $t = \dfrac{62.44 - 59.76}{\sqrt{492.734\,(1/78 + 1/46)}} = 0.6482$
Under H0, $t \sim t_{122}$ (df = 78 + 46 − 2 = 122)
p-value: $2 \times \Pr(t > 0.6482) = 0.518$
At significance level 5%, we cannot reject the null hypothesis. There is no evidence of different means.
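The pooled-variance calculation can be reproduced from the summary statistics with a short Python sketch. The helper name `pooled_t` is hypothetical. Because the inputs are the rounded values printed in the example, the results agree with the reported 492.734 and 0.6482 only up to rounding.

```python
from math import sqrt

def pooled_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2):
    """Pooled variance, equal-variance two-sample t statistic, and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = (xbar1 - xbar2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, t, df

# Rounded summary statistics from the lead example (control vs. exposed)
pooled_var, t_obs, df = pooled_t(62.44, 415.18, 78, 59.76, 625.43, 46)
print(f"pooled variance = {pooled_var:.2f}, t(obs) = {t_obs:.3f}, df = {df}")
```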
Two independent samples: different variances
Two samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$.
$\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$
Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Under H0: the test statistic approximately follows a t-distribution with d.f.
$$d' = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$$
Rejection region: $t^{(obs)} > t_{d',1-\alpha/2}$ or $t^{(obs)} < -t_{d',1-\alpha/2}$
p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$
CI: $(\bar{x}_1 - \bar{x}_2) \pm t_{d',1-\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
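As an illustration of the unequal-variance statistic and the approximate degrees of freedom d′, the Python sketch below plugs in the rounded maxfwt summary statistics from the lead data. Applying the unequal-variance test to these data is an assumption of this sketch, not an analysis from the lecture, and the helper name `welch_t` is hypothetical.

```python
from math import sqrt

def welch_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2):
    """Unequal-variance t statistic and approximate degrees of freedom d'."""
    v1 = s1_sq / n1  # estimated variance of the first sample mean
    v2 = s2_sq / n2  # estimated variance of the second sample mean
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Rounded summary statistics: control (n1 = 78) and exposed (n2 = 46)
t_obs, df = welch_t(62.44, 415.18, 78, 59.76, 625.43, 46)
print(f"t(obs) = {t_obs:.3f}, approximate d.f. = {df:.1f}")
```

The approximate degrees of freedom are generally not an integer and are never larger than $n_1 + n_2 - 2$.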
F-test for the Equal Variances
Two samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \ne \sigma_2^2$. In other words, $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_1: \sigma_1^2/\sigma_2^2 \ne 1$.
$(n_1 - 1)S_1^2/\sigma_1^2 \sim \chi^2_{n_1-1}$ and $(n_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2_{n_2-1}$
Test statistic: $F = \dfrac{S_1^2}{S_2^2} \sim F_{n_1-1,n_2-1}$ under H0
Rejection region: $F^{(obs)} > F_{n_1-1,n_2-1,1-\alpha/2}$ or $F^{(obs)} < F_{n_1-1,n_2-1,\alpha/2}$
p-value:
- If $F^{(obs)} \ge 1$, then $p = 2 \times \Pr(F > F^{(obs)} \mid H_0)$
- If $F^{(obs)} < 1$, then $p = 2 \times \Pr(F < F^{(obs)} \mid H_0)$
Example: Lead data
Test whether the variances of maxfwt in the control ($n_1 = 78$) and exposed group ($n_2 = 46$) are the same or not.
$H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \ne \sigma_2^2$
Sample variances: $s_1^2 = 415.18$ and $s_2^2 = 625.43$
Test statistic: $F = s_1^2/s_2^2 = 415.18/625.43 = 0.6638$
The distribution of the test statistic under H0: $F_{77,45}$
p-value: $2 \times \Pr(F < 0.6638) = 0.1133$
At significance level 5%, we cannot reject the null hypothesis. There is no evidence that the variances are different.
Overlapping Confidence Intervals and Statistical Significance
Can we judge whether two statistics are significantly different depending on whether or not their confidence intervals overlap? The answer is: not always.
If two statistics have non-overlapping confidence intervals, they are significantly different.
If they have overlapping confidence intervals, it is not necessarily true that they are not significantly different.
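Overlapping Confidence Intervals and Statistical Significance
Assume $\bar{x}_1 - \bar{x}_2 \ge 0$ without loss of generality.
The means are significantly different if $(\bar{x}_1 - \bar{x}_2) > 1.96\sqrt{SE_1^2 + SE_2^2}$.
CIs do not overlap if $\bar{x}_1 - 1.96\,SE_1 > \bar{x}_2 + 1.96\,SE_2$, which implies $(\bar{x}_1 - \bar{x}_2) > 1.96\,(SE_1 + SE_2)$.
Since $\sqrt{SE_1^2 + SE_2^2} \le SE_1 + SE_2$, non-overlapping CIs imply a significant difference, but a significant difference does not imply non-overlapping CIs.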
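A small numerical illustration of the two criteria above, as a Python sketch: the means and standard errors are hypothetical, and the helper name `compare_means` is an assumption. The normal critical value 1.96 is used as on the slide.

```python
from math import sqrt

def compare_means(xbar1, se1, xbar2, se2, z=1.96):
    """Return (significant, cis_overlap) for two estimates with standard errors."""
    diff = abs(xbar1 - xbar2)
    significant = diff > z * sqrt(se1 ** 2 + se2 ** 2)  # test on the difference
    cis_overlap = diff <= z * (se1 + se2)               # individual 95% CIs overlap
    return significant, cis_overlap

# Hypothetical case 1: CIs (8.04, 11.96) and (5.04, 8.96) overlap,
# yet the difference 3 exceeds 1.96 * sqrt(2), about 2.77.
case1 = compare_means(10, 1, 7, 1)
# Hypothetical case 2: CIs do not overlap, so the difference is significant.
case2 = compare_means(10, 1, 5, 1)
print(f"case 1: significant={case1[0]}, CIs overlap={case1[1]}")
print(f"case 2: significant={case2[0]}, CIs overlap={case2[1]}")
```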
Summary
1. What are your hypotheses?
2. Identify data type and test statistic
   - t-test, z-test, χ²-test, F-test
   - one-sample or two-sample (paired or independent)
3. Perform a test
4. Go back to the numerical and/or graphical summary and confirm your test result matches your data