
BIO5312 Biostatistics Lecture 6: Statistical hypothesis testings

Yujin Chung

October 4th, 2016

Fall 2016

Yujin Chung Lec6: Statistical hypothesis testings Fall 2016 1/30

Previous

Two types of statistical inferences:

Estimation: concerned with estimating the values of specific population parameters. These specific values are referred to as point estimates. Sometimes, interval estimation is carried out to specify an interval which likely includes the parameter values.

Hypothesis testing: concerned with testing whether the value of a population parameter is equal to some specific value.


Hypothesis testing

Philosophy: prove a claim by contradiction.

Analogy: “dependent love” story

Claim: You don’t love me.

Reasoning: If you loved me, you would take the trash out every week and put your socks away.

Data: Some weeks you don’t take the trash out or leave your socks where they fall.

Conclusion: You don’t love me.


Statistical hypothesis testing

The hypothesis-testing framework specifies two hypotheses: the null hypothesis and the alternative hypothesis.

- The null hypothesis ($H_0$) is often an initial claim that researchers specify using previous research or knowledge. Typically it is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value.

- The alternative hypothesis ($H_1$) is what you might believe to be true or hope to prove true.

$H_0$: you love me vs. $H_1$: you don’t love me

Hypothesis testing provides an objective framework for making decisions using probabilistic methods, rather than relying on subjective impressions.


Examples

The average cholesterol level in children is 175 mg/dL. A group of men who have died from heart disease within the past year are identified, and the cholesterol levels of their offspring are measured. (1) Is the average cholesterol level of these children larger than 175 mg/dL? (2) Is the average cholesterol level different from that of children whose fathers do not have a history of heart disease?

- $\mu_1$: the population mean cholesterol level in the case group
- $\mu_2$: the population mean cholesterol level in the control group
- (1) $H_0: \mu_1 = 175$ vs. $H_1: \mu_1 > 175$
- (2) $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$

Are the IQ and the number of finger-wrist taps (fwt) of children in the lead-exposed group different from those of children in the control group?

- $H_0$: the population means of fwt in the two groups are the same vs. $H_1$: the means are different


Four possible outcomes in hypothesis testing

              Do not reject $H_0$                       Reject $H_0$

$H_0$ true    true negative ($1-\alpha$)                false positive: Type I error ($\alpha$)

$H_1$ true    false negative: Type II error ($\beta$)   true positive: Power ($1-\beta$)

Two possible errors:

- Type I error ($\alpha$): $\Pr(\text{reject } H_0 \mid H_0 \text{ true})$, commonly referred to as the significance level of a test.
- Type II error ($\beta$): $\Pr(\text{do not reject } H_0 \mid H_1 \text{ true})$

The power of a test: $1-\beta = \Pr(\text{reject } H_0 \mid H_1 \text{ true})$

We prefer a test with small $\alpha$ and large power ($1-\beta$).

Statistical hypothesis test: for a given type I error $\alpha$, we look for the test with the greatest power ($1-\beta$) among all possible tests.
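
The two error rates can be illustrated by simulation. The sketch below is only an illustration under assumed settings that are not from the lecture: a one-sided z-test with known $\sigma = 1$, $n = 25$, $\mu_0 = 0$, and a hypothetical true mean of 0.5 under $H_1$. The function name `rejection_rate` is made up for this example.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=10000, seed=0):
    """Fraction of simulated samples in which H0: mu = mu0 is rejected
    in favour of H1: mu > mu0 by a one-sided z-test with known sigma."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value z_{1-alpha}
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if z > z_crit:
            rejections += 1
    return rejections / trials

# All numbers below are hypothetical settings chosen for the illustration.
type_1_rate = rejection_rate(true_mu=0.0)   # H0 true: estimates alpha
power = rejection_rate(true_mu=0.5)         # H1 true: estimates 1 - beta
print(type_1_rate, power)
```

When $H_0$ is true the rejection rate should be close to the chosen $\alpha$; when the true mean moves away from $\mu_0$ the same rate is the power of the test.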


t-Test for the Mean

We assume the cholesterol levels in children follow $N(\mu, \sigma^2)$. We wish to test whether the mean cholesterol level of children with a family history of heart disease is the same as 175 mg/dL, the average cholesterol level of children without such a family history, or larger than 175 mg/dL.

Hypotheses: $H_0: \mu = 175$ vs. $H_1: \mu > 175$.

Logic:
1. Assume $H_0$ is true.
2. If $\bar{x}$ is too large, it contradicts the assumption that $H_0$ is true.


t-test for the mean: critical-value method

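$H_0: \mu = 175$ vs. $H_1: \mu > 175$.

The distribution of $\bar{X}$ under $H_0$: since $X_1, \ldots, X_n \sim N(175, \sigma^2)$,

$$t = \frac{\bar{X} - 175}{S/\sqrt{n}} \sim t_{n-1}.$$

[Figure: density function of $t_{n-1}$ for the critical-value method ($H_0: \mu = 175$ vs. $H_1: \mu > 175$), with the acceptance region below the critical value $t_{n-1,1-\alpha}$ and the rejection region above it.]

Critical value: $t_{n-1,1-\alpha}$. If $t > t_{n-1,1-\alpha}$, reject $H_0$; if $t \le t_{n-1,1-\alpha}$, do not reject $H_0$.

Type I error: $\Pr(t > t_{n-1,1-\alpha} \mid H_0) = \alpha$.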


t-Test for the Mean: p-value method

$H_0: \mu = 175$ vs. $H_1: \mu > 175$.

Test statistic: $t = \dfrac{\bar{X} - 175}{S/\sqrt{n}}$.

Under $H_0$: the test statistic $t \sim t_{n-1}$.

p-value: $\Pr(t > t^{(obs)} \mid H_0)$, the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic value, given that $H_0$ is true.

[Figure: density function of $t_{n-1}$ for the test $H_0: \mu = 175$ vs. $H_1: \mu > 175$, marking $t_{n-1,1-\alpha}$, $t^{(obs)}$, the rejection region, and the p-value $\Pr(t > t^{(obs)})$.]

If p-value $< \alpha$, reject $H_0$; if p-value $\ge \alpha$, do not reject $H_0$.

Significance level: α

$H_0$ is rejected if $t > t_{n-1,1-\alpha}$ or p-value $< \alpha$.

α: significance level, type-I error, typically set to 0.05

Guidelines for judging the significance of a p-value

If p ≥ 0.05, then the results are considered not statistically significant

If 0.01 ≤ p < 0.05, then the results are statistically significant

If 0.001 ≤ p < 0.01, then the results are highly significant

If p < 0.001, then the results are very highly significant

Report an exact p-value!

The p-value indicates exactly how significant the results are without performing repeated significance tests at different α levels.

The p-value indicates how close to statistical significance the results have come even when they are not statistically significant.


Example: the cholesterol of children

Suppose the mean cholesterol level of 10 children whose fathers died from heart disease is 200 mg/dL and the sample standard deviation is 50 mg/dL. The average cholesterol level in children is known to be 175 mg/dL. Is the average cholesterol level of these children larger than 175 mg/dL?

Let $\mu$ be the population mean cholesterol level of children whose fathers died from heart disease. The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu > 175$.

The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and follows $t_{n-1}$ under $H_0$. The observed test statistic is $t^{(obs)} = \dfrac{200 - 175}{50/\sqrt{10}} = 1.58$.

Critical-value method: At the significance level 5%, the critical value is $t_{n-1,1-\alpha} = t_{9,0.95} = 1.833$ and the rejection region is $t > 1.833$. Since $t^{(obs)} = 1.58 < 1.833$, we cannot reject $H_0$ at significance level 5%.
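
The critical-value decision for this example can be reproduced in a few lines. This is a minimal sketch: the helper name `one_sample_t` is made up here, and the critical value 1.833 is copied from the slide rather than computed.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Summary statistics from the cholesterol example.
t_obs = one_sample_t(xbar=200, mu0=175, s=50, n=10)

t_crit = 1.833                 # t_{9, 0.95}, taken from the slide
reject_h0 = t_obs > t_crit     # one-sided rejection region: t > t_crit

print(round(t_obs, 2), reject_h0)
```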


Example: the cholesterol of children

(Same data and hypotheses as on the previous slide: $n = 10$, $\bar{x} = 200$ mg/dL, $s = 50$ mg/dL, $H_0: \mu = 175$ vs. $H_1: \mu > 175$.)

The test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ and follows $t_{n-1} = t_9$ under $H_0$. The observed test statistic is $t^{(obs)} = \dfrac{200 - 175}{50/\sqrt{10}} = 1.58$.

p-value method: The p-value is $p = \Pr(t > t^{(obs)} \mid H_0) = \Pr(t > 1.58 \mid H_0) = 0.074$. Since $p > 0.05$, we cannot reject $H_0$ at significance level 5%.
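
The tail probability can be checked numerically without statistical tables. The sketch below integrates the $t$ density with Simpson's rule using only the Python standard library. The helper names `t_pdf` and `t_upper_tail` are made up for this example, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Pr(T > t) for T ~ t_df, via Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3       # integral of the density from 0 to |t|
    tail = 0.5 - area          # by symmetry, Pr(T > |t|)
    return tail if t >= 0 else 1 - tail

t_obs = (200 - 175) / (50 / sqrt(10))    # cholesterol example
p_value = t_upper_tail(t_obs, df=9)
print(round(p_value, 3))
```

The result should agree with the p-value of about 0.074 reported for this example, up to rounding.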


One-tailed t-test for the mean

A one-tailed test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis ($\mu_0$), but not both.

$H_0: \mu = \mu_0$

Test statistic: $t = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

$H_1$             Rejection region            p-value

$\mu > \mu_0$     $t > t_{n-1,1-\alpha}$      $\Pr(t > t^{(obs)} \mid H_0)$

$\mu < \mu_0$     $t < t_{n-1,\alpha}$        $\Pr(t < t^{(obs)} \mid H_0)$


Two-sided alternatives

The test for $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ is based on $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.

Critical-value method:
- Rejection region: If $|t| > t_{n-1,1-\alpha/2}$, then $H_0$ is rejected.
- Acceptance region: If $|t| \le t_{n-1,1-\alpha/2}$, then $H_0$ is NOT rejected.
- Type I error: $\Pr(|t| > t_{n-1,1-\alpha/2} \mid H_0) = \alpha$.

p-value method: the p-value is $\Pr(|t| > |t^{(obs)}| \mid H_0)$. If $p < 0.05$, then $H_0$ is rejected at significance level 5%.


Example: two-sided alternatives

(continued from the cholesterol data) Is the average cholesterol level of children whose fathers had heart disease different from the US average cholesterol level (175) of children?

The hypotheses are $H_0: \mu = 175$ vs. $H_1: \mu \neq 175$. The observed test statistic is $t^{(obs)} = 1.58$.

Critical-value method: At significance level 5%, the rejection region is $t > 2.262$ or $t < -2.262$. Since the observed test statistic satisfies $-2.262 < 1.58 < 2.262$, we cannot reject $H_0$ at significance level 5%.

p-value method: $p = \Pr(|t| > |t^{(obs)}| \mid H_0) = \Pr(t > 1.58 \mid H_0) + \Pr(t < -1.58 \mid H_0) = 0.149$. Since $p > 0.05$, we cannot reject $H_0$ at significance level 5%.


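The Relationship Between Hypothesis Testing and Confidence Intervals

Suppose we are testing $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$.

$H_0$ is rejected at significance level $\alpha$ if and only if the $100\% \times (1-\alpha)$ CI for $\mu$ does not contain $\mu_0$.

Recall that the $100\% \times (1-\alpha)$ CI for $\mu$ is $\bar{x} \pm t_{n-1,1-\alpha/2}\, s/\sqrt{n}$.

If $H_0$ is rejected,

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} < -t_{n-1,1-\alpha/2} \quad \text{or} \quad t > t_{n-1,1-\alpha/2}$$

$$\Rightarrow \bar{x} - \mu_0 < -t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad \text{or} \quad \bar{x} - \mu_0 > t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$

$$\Rightarrow \mu_0 > \bar{x} + t_{n-1,1-\alpha/2}\, s/\sqrt{n} \quad \text{or} \quad \mu_0 < \bar{x} - t_{n-1,1-\alpha/2}\, s/\sqrt{n}$$

Therefore, the $100\% \times (1-\alpha)$ CI does not include $\mu_0$. The converse can be proved similarly.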
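
The equivalence can be checked on the cholesterol example. The sketch below is illustrative only: the function names are made up, and the critical value $t_{9,0.975} = 2.262$ is copied from the slides rather than computed.

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Two-sided CI: xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

def two_sided_reject(xbar, mu0, s, n, t_crit):
    """True if |t| exceeds the two-sided critical value."""
    t = (xbar - mu0) / (s / sqrt(n))
    return abs(t) > t_crit

xbar, s, n, mu0 = 200, 50, 10, 175    # cholesterol example
t_crit = 2.262                        # t_{9, 0.975}, taken from the slides

low, high = t_confidence_interval(xbar, s, n, t_crit)
rejected = two_sided_reject(xbar, mu0, s, n, t_crit)
ci_excludes_mu0 = not (low <= mu0 <= high)

# The two decisions always agree for a two-sided test at the same alpha.
print(round(low, 1), round(high, 1), rejected, ci_excludes_mu0)
```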


z-test for the Mean with Known Variance

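Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known.

The test for $H_0: \mu = \mu_0$ is based on the test statistic $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, which follows $N(0, 1)$ under $H_0$.

$H_1$               Rejection region           p-value

$\mu > \mu_0$       $z > z_{1-\alpha}$         $\Pr(Z > z^{(obs)} \mid H_0)$

$\mu < \mu_0$       $z < z_{\alpha}$           $\Pr(Z < z^{(obs)} \mid H_0)$

$\mu \neq \mu_0$    $|z| > z_{1-\alpha/2}$     $\Pr(|Z| > |z^{(obs)}| \mid H_0)$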
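
The three rows of the table can be sketched as one small function built on the standard library's `statistics.NormalDist`. The function name `z_test` and its `alternative` argument are made up for this example. The usage line reuses the cholesterol numbers and, purely as an assumption for illustration, treats the value 50 as a known $\sigma$.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """z statistic and p-value for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()          # N(0, 1)
    if alternative == "greater":       # H1: mu > mu0
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":        # H1: mu < mu0
        p = std_normal.cdf(z)
    elif alternative == "two-sided":   # H1: mu != mu0
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Assumption for illustration only: sigma = 50 is treated as known.
z, p = z_test(xbar=200, mu0=175, sigma=50, n=10, alternative="greater")
print(round(z, 2), round(p, 3))
```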


Tests for Binomial probability p

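Let $X \sim \text{Binomial}(n, p)$. We want to test $H_0: p = p_0$ vs. $H_1: p \neq p_0$.

The test statistic is $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$, where $\hat{p} = X/n$. If $np_0(1 - p_0) \ge 5$, then $Z \sim N(0, 1)$ approximately under $H_0$.

Rejection region: $z^{(obs)} < z_{\alpha/2}$ or $z^{(obs)} > z_{1-\alpha/2}$

p-value:
- If $\hat{p} \le p_0$, then $p = 2 \times \Pr(Z < z^{(obs)} \mid H_0)$
- If $\hat{p} > p_0$, then $p = 2 \times \Pr(Z > z^{(obs)} \mid H_0)$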
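
A minimal sketch of this test, using the normal approximation described above. The function name `proportion_z_test` and the data ($x = 60$ successes out of $n = 100$, $p_0 = 0.5$) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Two-sided normal-approximation test of H0: p = p0."""
    if n * p0 * (1 - p0) < 5:
        raise ValueError("normal approximation not recommended: n*p0*(1-p0) < 5")
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    # Double the tail on the side where the estimate fell.
    p_value = 2 * cdf if p_hat <= p0 else 2 * (1 - cdf)
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, null value 0.5.
z, p_value = proportion_z_test(x=60, n=100, p0=0.5)
print(round(z, 2), round(p_value, 4))
```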


Two-sample case

In a two-sample hypothesis-testing problem, the underlying parameters of two different populations are compared.

fwt left and right

maxfwt in the control group and exposed group

Independent samples: when the data points in one sample are unrelated to the data points in the second sample.


The paired sample: Paired t-test

Paired sample: when each data point in the first sample is matched and is related to a unique data point in the second sample. Paired samples may represent two sets of measurements on the same people or on different people who are chosen on an individual basis using matching criteria, such as age and sex, to be very similar to each other.

LEAD data: The numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l, respectively) were observed for each of 124 children. We want to test whether the number of finger-wrist taps differs between the right hand and the left hand.


The paired sample: paired t-test

Consider two samples: $(X_{1,1}, X_{2,1}), \ldots, (X_{1,n}, X_{2,n})$, where $E(X_{1,i}) = \mu_1$ and $E(X_{2,i}) = \mu_2$ for all $i = 1, \ldots, n$. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

Let $\Delta = \mu_1 - \mu_2$. Then, $H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$.

To get rid of the correlation between $X_{1,i}$ and $X_{2,i}$, we consider the differences $d_i = X_{1,i} - X_{2,i}$ for $i = 1, \ldots, n$.

We assume $d_1, \ldots, d_n \sim N(\Delta, \sigma_d^2)$. It is a one-sample t-test problem.

Test statistic: $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the differences, respectively.

Under $H_0$: $t \sim t_{n-1}$

p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$

CI: $\bar{d} \pm t_{n-1,1-\alpha/2}\, s_d/\sqrt{n}$


Example: Lead data

The numbers of right-hand and left-hand finger-wrist taps (fwt_r and fwt_l, respectively) were observed for each of 124 children. We want to test whether the number of finger-wrist taps differs between the right hand and the left hand.

Since fwt_r and fwt_l are paired rather than independent samples, we consider their difference.

$H_0: \Delta = 0$ vs. $H_1: \Delta \neq 0$

mean of the differences: $\bar{d} = 5.919$ and s.d. $s_d = 6.711$

test statistic: $t = \dfrac{5.919}{6.711/\sqrt{124}} = 9.8206$

The distribution of the test statistic: $t_{124-1}$

p-value: $2 \times \Pr(t > 9.8206) = 3.699 \times 10^{-17}$

At significance level 5%, we reject the null hypothesis. Right-hand and left-hand fwt are significantly different.
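
The paired test reduces to a one-sample computation on the differences, which can be sketched as follows. The function names are made up for this example and the small data set is hypothetical. Because the summary statistics above are rounded, the statistic recomputed from them may differ from the reported 9.8206 in the later decimal places.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic and degrees of freedom from raw paired data."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))   # stdev uses the n - 1 divisor
    return t, n - 1

def paired_t_from_summary(d_bar, s_d, n):
    """Same statistic from the mean and s.d. of the differences."""
    return d_bar / (s_d / sqrt(n)), n - 1

# Summary statistics from the Lead data example.
t_lead, df_lead = paired_t_from_summary(d_bar=5.919, s_d=6.711, n=124)

# Hypothetical toy measurements, only to show the raw-data path.
t_toy, df_toy = paired_t([10, 12, 9, 11], [8, 11, 7, 10])

print(round(t_lead, 2), df_lead, round(t_toy, 3), df_toy)
```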


Two independent samples: equal variances

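Consider two independent samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

We assume $\sigma^2 = \sigma_1^2 = \sigma_2^2$.

$\bar{X}_1 \sim N(\mu_1, \sigma^2/n_1)$, $\bar{X}_2 \sim N(\mu_2, \sigma^2/n_2)$

$\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma^2/n_1 + \sigma^2/n_2)$

The pooled variance estimate of $\sigma^2$: $s^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, a weighted average of $s_1^2$ and $s_2^2$.

Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$ under $H_0$.

Rejection region: $t^{(obs)} > t_{n_1+n_2-2,1-\alpha/2}$ or $t^{(obs)} < -t_{n_1+n_2-2,1-\alpha/2}$

p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$

CI: $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,1-\alpha/2}\, s\sqrt{1/n_1 + 1/n_2}$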


Example: Lead data

We now assume the variances of maxfwt in the control group ($n_1 = 78$) and the exposed group ($n_2 = 46$) are the same. We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

sample means: $\bar{x}_1 = 62.44$, $\bar{x}_2 = 59.76$; sample variances: $s_1^2 = 415.18$ and $s_2^2 = 625.43$

pooled variance: $s^2 = \dfrac{(78 - 1)415.18 + (46 - 1)625.43}{78 + 46 - 2} = 492.734$

test statistic: $t = \dfrac{62.44 - 59.76}{\sqrt{492.734(1/78 + 1/46)}} = 0.6482$

Under $H_0$, $t \sim t_{122}$ (df = 78 + 46 − 2 = 122)

p-value: $2 \times \Pr(t > 0.6482) = 0.518$

At significance level 5%, we cannot reject the null hypothesis. There is no evidence that the means are different.
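
The pooled computation can be retraced from the summary statistics alone. This is a sketch with a made-up function name, `pooled_t`. Because the inputs are the rounded values shown above, the recomputed numbers may differ slightly from the slide's 492.734 and 0.6482, which were presumably obtained from the unrounded data.

```python
from math import sqrt

def pooled_t(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2):
    """Pooled variance, t statistic and d.f. for the equal-variance t-test."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    t = (x1_bar - x2_bar) / sqrt(s_sq * (1 / n1 + 1 / n2))
    return s_sq, t, df

# Rounded summary statistics from the Lead data example.
s_sq, t, df = pooled_t(x1_bar=62.44, s1_sq=415.18, n1=78,
                       x2_bar=59.76, s2_sq=625.43, n2=46)
print(round(s_sq, 2), round(t, 3), df)
```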


Two independent samples: different variances

Two samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$.

$\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \sigma_1^2/n_1 + \sigma_2^2/n_2)$

Test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

Under $H_0$: the test statistic approximately follows a t-distribution with d.f. $d' = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$

Rejection region: $t^{(obs)} > t_{d',1-\alpha/2}$ or $t^{(obs)} < -t_{d',1-\alpha/2}$

p-value $= 2 \times \Pr(t > |t^{(obs)}| \mid H_0)$

CI: $(\bar{x}_1 - \bar{x}_2) \pm t_{d',1-\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
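
As an illustration of the unequal-variance formulas, the sketch below applies them to the rounded maxfwt summary statistics from the Lead data (control: mean 62.44, variance 415.18, $n_1 = 78$; exposed: mean 59.76, variance 625.43, $n_2 = 46$). The lecture does not report this particular calculation, so the output is only a worked instance of the formulas; the function name `unequal_variance_t` is made up.

```python
from math import sqrt

def unequal_variance_t(x1_bar, s1_sq, n1, x2_bar, s2_sq, n2):
    """t statistic and approximate d.f. d' for the unequal-variance t-test."""
    v1 = s1_sq / n1                    # estimated variance of the first mean
    v2 = s2_sq / n2                    # estimated variance of the second mean
    t = (x1_bar - x2_bar) / sqrt(v1 + v2)
    d_prime = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, d_prime

# Rounded summary statistics for maxfwt in the Lead data.
t, d_prime = unequal_variance_t(62.44, 415.18, 78, 59.76, 625.43, 46)
print(round(t, 3), round(d_prime, 1))
```

The approximate degrees of freedom $d'$ are generally not an integer and never exceed $n_1 + n_2 - 2$.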


F-test for the Equal Variances

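Two samples: $X_{1,1}, \ldots, X_{1,n_1} \sim N(\mu_1, \sigma_1^2)$ (sample size $n_1$) and $X_{2,1}, \ldots, X_{2,n_2} \sim N(\mu_2, \sigma_2^2)$ (sample size $n_2$). We want to test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$. In other words, $H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_1: \sigma_1^2/\sigma_2^2 \neq 1$.

$(n_1 - 1)S_1^2/\sigma_1^2 \sim \chi^2_{n_1-1}$ and $(n_2 - 1)S_2^2/\sigma_2^2 \sim \chi^2_{n_2-1}$

Test statistic: $F = \dfrac{S_1^2}{S_2^2} \sim F_{n_1-1,n_2-1}$ under $H_0$

Rejection region: $F^{(obs)} > F_{n_1-1,n_2-1,1-\alpha/2}$ or $F^{(obs)} < F_{n_1-1,n_2-1,\alpha/2}$

p-value:
- If $F^{(obs)} \ge 1$, then $p = 2 \times \Pr(F > F^{(obs)} \mid H_0)$
- If $F^{(obs)} < 1$, then $p = 2 \times \Pr(F < F^{(obs)} \mid H_0)$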


Example: Lead data

Test whether the variances of maxfwt in the control group ($n_1 = 78$) and the exposed group ($n_2 = 46$) are the same or not.

$H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$

sample variances: $s_1^2 = 415.18$ and $s_2^2 = 625.43$

test statistic: $F = s_1^2/s_2^2 = 415.18/625.43 = 0.6638$

The distribution of the test statistic under $H_0$: $F_{77,45}$

p-value: $2 \times \Pr(F < 0.6638) = 0.1133$

At significance level 5%, we cannot reject the null hypothesis. There is no evidence that the variances are different.
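
The statistic and its degrees of freedom follow directly from the two sample variances. The sketch below stops short of the p-value, which requires the F distribution function and is left to statistical software; the function name `f_statistic` and the returned tail label are made up for this example.

```python
def f_statistic(s1_sq, n1, s2_sq, n2):
    """Variance ratio F = s1^2 / s2^2 with its numerator and denominator d.f.

    Also reports which tail is doubled for the two-sided p-value:
    the upper tail if F >= 1, the lower tail if F < 1.
    """
    f = s1_sq / s2_sq
    tail = "upper" if f >= 1 else "lower"
    return f, n1 - 1, n2 - 1, tail

# Sample variances of maxfwt from the Lead data example.
f_obs, df1, df2, tail = f_statistic(415.18, 78, 625.43, 46)
print(round(f_obs, 4), df1, df2, tail)
```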


Overlapping Confidence Intervals and Statistical Significance

Can we judge whether two statistics are significantly different depending on whether or not their confidence intervals overlap? The answer is: not always.

If two statistics have non-overlapping confidence intervals, they are significantly different.

If they have overlapping confidence intervals, it is not necessarily true that they are not significantly different.


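Overlapping Confidence Intervals and Statistical Significance

Assume $\bar{x}_1 - \bar{x}_2 \ge 0$ without loss of generality.

The means are significantly different if $(\bar{x}_1 - \bar{x}_2) > 1.96\sqrt{SE_1^2 + SE_2^2}$.

CIs do not overlap if $\bar{x}_1 - 1.96\,SE_1 > \bar{x}_2 + 1.96\,SE_2$, which implies $(\bar{x}_1 - \bar{x}_2) > 1.96(SE_1 + SE_2)$.

Since $\sqrt{SE_1^2 + SE_2^2} \le SE_1 + SE_2$, non-overlapping CIs imply a significant difference, but a significant difference does not imply that the CIs do not overlap.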
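
A small numerical case makes the gap between the two criteria concrete. The means and standard errors below are hypothetical, and the function name `compare_means` is made up for this sketch.

```python
from math import sqrt

def compare_means(x1_bar, se1, x2_bar, se2, z=1.96):
    """Return (significant, cis_overlap) for two estimated means."""
    diff = abs(x1_bar - x2_bar)
    # Test criterion: difference exceeds z * sqrt(SE1^2 + SE2^2).
    significant = diff > z * sqrt(se1 ** 2 + se2 ** 2)
    # The two CIs overlap unless the difference exceeds z * (SE1 + SE2).
    cis_overlap = diff <= z * (se1 + se2)
    return significant, cis_overlap

# Hypothetical values: means 13 and 10, both standard errors equal to 1.
significant, cis_overlap = compare_means(13.0, 1.0, 10.0, 1.0)
print(significant, cis_overlap)
```

Here the difference of 3 exceeds $1.96\sqrt{2} \approx 2.77$ but not $1.96 \times 2 = 3.92$, so the means differ significantly even though the two intervals overlap.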


Summary

1. What are your hypotheses?
2. Identify the data type and test statistic
   - t-test, z-test, $\chi^2$-test, F-test
   - one-sample or two-sample (paired or independent)
3. Perform a test
4. Go back to the numerical and/or graphical summary and confirm your test result matches your data

