P-values and statistical tests
3. t-test
Hand-outs available at http://is.gd/statlec
Marek Gierliński
Division of Computational Biology
Statistical test
• Null hypothesis H0: no effect
• Significance level $\alpha = 0.05$
• Data → statistic T → p-value
• $p < \alpha$: reject H0
• $p \geq \alpha$: insufficient evidence
One-sample t-test
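Null hypothesis: the sample came from a population with mean $\mu = 20$ g

t-statistic
• Sample $x_1, x_2, \ldots, x_n$
  $M$ - mean
  $SD$ - standard deviation
  $SE = SD/\sqrt{n}$ - standard error
• From these we can find
$$t = \frac{M - \mu}{SE}$$
• More generic form:
$$t = \frac{\text{deviation}}{\text{standard error}}$$
[Figure: $\mu$ with 2 $SE$ marked on either side]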
Student’s t-distribution
• The t-statistic is distributed with a t-distribution
• Standardized
• One parameter: degrees of freedom, $\nu$
• For large $\nu$ it approaches a Gaussian
[Figure: t-distributions for several degrees of freedom, compared with a Gaussian]
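The approach to a Gaussian can be checked numerically. The sketch below uses base R's `qt` and `qnorm` to compare the two-sided 5% critical value of the t-distribution with the Gaussian one; the degrees of freedom listed are arbitrary choices for illustration, not values from these slides.

```r
# Two-sided 5% critical values of the t-distribution for a few
# (arbitrarily chosen) degrees of freedom, compared with the Gaussian.
nu     <- c(4, 10, 30, 1000)
t_crit <- qt(0.975, df = nu)   # upper 2.5% point of t(nu)
z_crit <- qnorm(0.975)         # upper 2.5% point of Gaussian(0, 1)

print(round(data.frame(nu = nu, t_crit = t_crit, z_crit = z_crit), 3))
```

The critical value shrinks towards the Gaussian one as $\nu$ grows, so for small samples the t-distribution demands a larger deviation before the null hypothesis is rejected.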
William Gosset
• Brewer and statistician
• Developed Student’s t-distribution
• Worked for Guinness, who prohibited employees from publishing any papers
• Published as “Student”
• Worked with Fisher and developed the t-statistic in its current form
• Always worked with experimental data
• Progenitor bioinformatician?
William Sealy Gosset (1876-1937)
Null distribution for the deviation of the mean
Population of mice: $\mu = 20$ g, $\sigma = 5$ g
Select a sample of size 5
$$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$$
$$t = \frac{M - \mu}{SD/\sqrt{n}}$$
Repeat many times and build distributions of $M$, $Z$ and $t$
[Figure, four panels: original distribution, Gaussian($\mu$, $\sigma$); distribution of $M$, Gaussian($\mu$, $\sigma/\sqrt{n}$); distribution of $Z$, Gaussian(0, 1); distribution of $t$, compared with Gaussian(0, 1) and t-distribution($\nu$)]
[Figure, two panels: distribution of $Z$ with Gaussian(0, 1); distribution of $t$ with Gaussian(0, 1) and t-distribution($\nu$)]
$$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$$
$\sigma$ - population parameter (unknown)
$$t = \frac{M - \mu}{SD/\sqrt{n}} = \frac{M - \mu}{SE}$$
$SD$ - sample estimator (known)
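The thought experiment above can be reproduced in a few lines of R. This is a minimal sketch, not the code behind the original figures: the number of repetitions (1e5) and the random seed are assumptions chosen to keep the run short.

```r
# Sketch of the simulation: draw many samples of n = 5 mice from
# Gaussian(mu, sigma) and build the distributions of M, Z and t.
# n_sim and the seed are assumptions, not values from the slides.
set.seed(1)
mu <- 20; sigma <- 5; n <- 5
n_sim <- 1e5

# Each row is one simulated sample of n mice
x <- matrix(rnorm(n_sim * n, mean = mu, sd = sigma), nrow = n_sim)

M  <- rowMeans(x)                          # sample means
SD <- sqrt(rowSums((x - M)^2) / (n - 1))   # sample standard deviations

Z      <- (M - mu) / (sigma / sqrt(n))     # uses the population sigma
t_stat <- (M - mu) / (SD / sqrt(n))        # uses the sample SD

# Fraction of simulated values beyond the Gaussian two-sided 5% limits
frac_Z <- mean(abs(Z) > qnorm(0.975))
frac_t <- mean(abs(t_stat) > qnorm(0.975))
# Expected fraction for a t-distribution with n - 1 degrees of freedom
frac_t_theory <- 2 * pt(-qnorm(0.975), df = n - 1)

print(c(Z = frac_Z, t = frac_t, t_theory = frac_t_theory))
```

Replacing the population $\sigma$ with the sample $SD$ widens the tails: the simulated $t$ exceeds the Gaussian limits more often than $Z$ does, in line with the t-distribution with $n - 1$ degrees of freedom.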
One-sample t-test
• Consider a sample of $n$ measurements
  ◦ $M$ - sample mean
  ◦ $SD$ - sample standard deviation
  ◦ $SE = SD/\sqrt{n}$ - sample standard error
• Null hypothesis: the sample comes from a population with mean $\mu$
• Test statistic
$$t = \frac{M - \mu}{SE}$$
• It is distributed with a t-distribution with $n - 1$ degrees of freedom
[Figure: null distribution, t-distribution with 4 d.o.f.]
One-sample t-test: example
• H0: $\mu = 20$ g
• 5 mice with body mass (g): 19.5, 26.7, 24.5, 21.9, 22.0
$M = 22.92$ g, $SD = 2.76$ g, $SE = 1.23$ g
$$t = \frac{22.92 - 20}{1.23} = 2.37$$
$\nu = 4$
$p = 0.04$
[Figure: null distribution with the observation marked, $p = 0.04$]
> mass = c(19.5, 26.7, 24.5, 21.9, 22.0)
> M = mean(mass)
> n = length(mass)
> SE = sd(mass) / sqrt(n)
> t = (M - 20) / SE
> t
[1] 2.36968
> 1 - pt(t, n - 1)
[1] 0.03842385
Normality of data
[Figure: original distribution; distribution of $t$]
Sidedness
One-sided test, H1: $M > \mu$, $p_1 = 0.04$
Two-sided test, H2: $M \neq \mu$, $p_2 = 0.08$
$$p_2 = 2 p_1$$
[Figure: t-distribution with the observation marked]
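Both p-values can be obtained from the same t-statistic. The sketch below reuses the mouse body masses from the one-sample example; the variable names are illustrative only.

```r
# One-sided and two-sided p-values for the one-sample example
mass <- c(19.5, 26.7, 24.5, 21.9, 22.0)
mu   <- 20
n    <- length(mass)
t_stat <- (mean(mass) - mu) / (sd(mass) / sqrt(n))

# One-sided: probability of a t at least this large in the upper tail
p_one <- pt(t_stat, df = n - 1, lower.tail = FALSE)
# Two-sided: both tails, using the symmetry of the t-distribution
p_two <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(c(one_sided = p_one, two_sided = p_two))
```

With `t.test`, the same choice is made through the `alternative` argument (`"greater"`, `"less"` or `"two.sided"`).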
One-sample t-test: summary
Input            Sample of $n$ measurements; theoretical value $\mu$ (population mean)
Assumptions      Observations are random and independent; data are normally distributed
Usage            Examine if the sample is consistent with the population mean
Null hypothesis  Sample came from a population with mean $\mu$
Comments         Limited usage (e.g. SILAC); works well for non-normal distribution, as long as it is symmetric
How to do it in R?
# One-sided t-test
> mass = c(19.5, 26.7, 24.5, 21.9, 22.0)
> t.test(mass, mu=20, alternative="greater")
One Sample t-test
data: mass
t = 2.3697, df = 4, p-value = 0.03842
alternative hypothesis: true mean is greater than 20
95 percent confidence interval: 20.29307 Inf
sample estimates:
mean of x
22.92
Two-sample t-test
Two samples
• Consider two samples (different sizes)
• Are they different?
• Are their means different?
• Do they come from populations with different means?
English mice: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g
Scottish mice: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g
The null distribution for the deviation between means
Population of British mice: normal, $\mu = 20$ g, $\sigma = 5$ g
Select two samples, of size 12 and 9, with means $M_E$ and $M_S$
$$t = \frac{M_E - M_S}{SE}$$
Repeat ×1,000,000
Build distributions of $\Delta M$ and $t$
Two-sample t-test
• Null hypothesis: both samples come from populations of the same mean
• H0: $\mu_1 = \mu_2$
• Test statistic
$$t = \frac{M_1 - M_2}{SE}$$
• It is distributed with a t-distribution with $\nu$ degrees of freedom
[Figure: distribution of $\Delta M$; distribution of $t$]
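Case 1: equal variances
• Assume that both distributions have the same variance (or standard deviation)
• Use the pooled variance estimator:
$$SD_{1,2}^2 = \frac{(n_1 - 1) SD_1^2 + (n_2 - 1) SD_2^2}{n_1 + n_2 - 2}$$
• The standard error and the number of degrees of freedom are then
$$SE = SD_{1,2} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$\nu = n_1 + n_2 - 2$$
In the case of equal sample sizes, $n_1 = n_2 = n$, these equations simplify:
$$SD_{1,2}^2 = \frac{SD_1^2 + SD_2^2}{2}$$
$$SE = SD_{1,2} \sqrt{\frac{2}{n}} = \sqrt{\frac{SD_1^2 + SD_2^2}{n}}$$
$$\nu = 2n - 2$$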
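Case 1: equal variances, example
English mice: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g
Scottish mice: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g
$SD_{1,2} = 4.5$ g
$SE = 1.98$ g
$\nu = 19$
$t = 2.499$
$p = 0.011$ (one-sided)
$p = 0.022$ (two-sided)
[Figure: t-distribution with the observation marked, $p = 0.011$]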
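The numbers in this example can be reproduced step by step from the pooled variance formulas. The sketch below uses the English and Scottish mouse data listed in the R section of these slides; the variable names are illustrative only.

```r
# Equal-variance two-sample t-test computed by hand
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)
n1 <- length(English)
n2 <- length(Scottish)

# Pooled variance estimator
var_pooled <- ((n1 - 1) * var(English) + (n2 - 1) * var(Scottish)) / (n1 + n2 - 2)

# Standard error and degrees of freedom
SE <- sqrt(var_pooled) * sqrt(1 / n1 + 1 / n2)
nu <- n1 + n2 - 2

# Test statistic (negative here, because English - Scottish) and p-values
t_stat <- (mean(English) - mean(Scottish)) / SE
p_one  <- pt(abs(t_stat), df = nu, lower.tail = FALSE)
p_two  <- 2 * p_one

print(c(SD_pooled = sqrt(var_pooled), SE = SE, nu = nu,
        t = t_stat, p_one = p_one, p_two = p_two))
```

The sign of $t$ depends only on the order of the two samples; the slide quotes its absolute value.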
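Case 2: unequal variances
• Assume that the distributions have different variances
• Welch’s t-test
• Find the individual standard errors (squared):
$$SE_1^2 = \frac{SD_1^2}{n_1}, \qquad SE_2^2 = \frac{SD_2^2}{n_2}$$
• Find the common standard error:
$$SE = \sqrt{SE_1^2 + SE_2^2}$$
• Number of degrees of freedom
$$\nu \approx \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}}$$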
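Case 2: unequal variances, example
English mice: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g
Scottish mice: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g
$SE_E^2 = 1.8$ g²
$SE_S^2 = 2.1$ g²
$SE = 1.96$ g
$\nu = 18$
$t = 2.524$
$p = 0.011$ (one-sided)
$p = 0.021$ (two-sided)
[Figure: t-distribution with the observation marked, $p = 0.011$]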
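Welch's test can be worked through by hand in the same way. The sketch below uses the English and Scottish mouse data listed in the R section of these slides; the variable names are illustrative only.

```r
# Welch's (unequal variance) two-sample t-test computed by hand
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)
n1 <- length(English)
n2 <- length(Scottish)

# Individual squared standard errors
SE1_sq <- var(English) / n1
SE2_sq <- var(Scottish) / n2

# Common standard error
SE <- sqrt(SE1_sq + SE2_sq)

# Approximate number of degrees of freedom (generally not an integer)
nu <- (SE1_sq + SE2_sq)^2 / (SE1_sq^2 / (n1 - 1) + SE2_sq^2 / (n2 - 1))

t_stat <- (mean(English) - mean(Scottish)) / SE
p_two  <- 2 * pt(abs(t_stat), df = nu, lower.tail = FALSE)

print(c(SE1_sq = SE1_sq, SE2_sq = SE2_sq, SE = SE, nu = nu, t = t_stat, p_two = p_two))
```

The slide rounds $\nu$ to 18; R keeps the fractional value when it evaluates the t-distribution.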
What if variances are not equal?
• Say, our samples come from two populations:
  ◦ English: $\mu = 20$ g, $\sigma = 5$ g
  ◦ Scottish: $\mu = 20$ g, $\sigma = 2.5$ g
• The ‘equal variance’ t-statistic does not represent the null hypothesis
• t-test on samples:
  ◦ ‘equal’: $t = 4.17$, $p = 2.8 \times 10^{-4}$
  ◦ ‘unequal’: $t = 4.56$, $p = 1.4 \times 10^{-4}$
• Unless you are certain that the variances are equal, use Welch’s test
[Figure: simulated equal-variance and unequal-variance t-statistics, compared with the t-distribution]
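One way to see the mismatch is to simulate the null hypothesis (equal means, unequal variances) and count how often each test rejects at $\alpha = 0.05$. This is a rough sketch rather than the simulation behind the original figure: the number of repetitions, the seed and the helper `row_var` are assumptions made for illustration.

```r
# Null simulation with equal means but unequal variances.
# n_sim, the seed and row_var() are assumptions for this sketch.
set.seed(1)
n_sim <- 20000
n1 <- 12; n2 <- 9
x <- matrix(rnorm(n_sim * n1, mean = 20, sd = 5),   nrow = n_sim)  # "English"
y <- matrix(rnorm(n_sim * n2, mean = 20, sd = 2.5), nrow = n_sim)  # "Scottish"

# Sample variance of every row (one row = one simulated sample)
row_var <- function(m) rowSums((m - rowMeans(m))^2) / (ncol(m) - 1)
dM <- rowMeans(x) - rowMeans(y)
v1 <- row_var(x)
v2 <- row_var(y)

# Equal-variance (pooled) test, two-sided
se_eq <- sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
p_eq  <- 2 * pt(abs(dM / se_eq), df = n1 + n2 - 2, lower.tail = FALSE)

# Welch test, two-sided
se_w <- sqrt(v1 / n1 + v2 / n2)
nu_w <- (v1 / n1 + v2 / n2)^2 / ((v1 / n1)^2 / (n1 - 1) + (v2 / n2)^2 / (n2 - 1))
p_w  <- 2 * pt(abs(dM / se_w), df = nu_w, lower.tail = FALSE)

# Fraction of false positives at alpha = 0.05
rate_eq <- mean(p_eq < 0.05)
rate_w  <- mean(p_w < 0.05)
print(c(equal = rate_eq, welch = rate_w))
```

If the test statistic follows its assumed null distribution, the false positive rate should sit close to 0.05. In this setup the Welch rate should do so, while the equal-variance rate drifts away from it.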
P-values vs. effect size
$n_E = 12$, $n_S = 9$: $\log FC = 0.33$, $p = 0.022$
$n_E = 100$, $n_S = 100$: $\log FC = 0.11$, $p = 0.022$
P-value is not a measure of biological significance
Two-sample t test: summary
Input            Two samples of $n_1$ and $n_2$ measurements
Assumptions      Observations are random and independent (no before/after data); data are normally distributed
Usage            Compare sample means
Null hypothesis  Samples came from populations with the same means
Comments         Works well for non-normal distribution, as long as it is symmetric
How to do it in R?
# Two-sided t-test, equal variances
> English = c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
> Scottish = c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)
> t.test(English, Scottish, var.equal=T)
Two Sample t-test
data: English and Scottish
t = -2.4993, df = 19, p-value = 0.02177
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -9.0903223 -0.8041221
sample estimates:
mean of x mean of y
19.04167 23.98889
# Two-sided t-test, unequal variances
> t.test(English, Scottish, var.equal=F)
Welch Two Sample t-test
data: English and Scottish
t = -2.5238, df = 17.969, p-value = 0.02125
Paired t-test
Paired t-test
• Samples are paired
• For example: mouse weight before and after obesity treatment
• Null hypothesis: there is no difference between before and after
• $M_\Delta$ - the mean of the individual differences
• Example: mouse body mass (g)
Before: 21.4 20.2 23.5 17.5 18.6 17.0 18.9 19.2
After: 22.6 20.9 23.8 18.0 18.4 17.9 19.3 19.1
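Paired t-test
• Samples are paired
• Find the differences:
$$\Delta_i = x_i - y_i$$
then
  $M_\Delta$ - mean
  $SD_\Delta$ - standard deviation
  $SE_\Delta = SD_\Delta/\sqrt{n}$ - standard error
• The test statistic is
$$t = \frac{M_\Delta}{SE_\Delta}$$
• t-distribution with $n - 1$ degrees of freedom
Non-paired t-test (Welch): $M_A - M_B = 0.46$ g, $SE = 1.08$ g, $t = 0.426$, $p = 0.34$ (one-sided)
Paired test: $M_\Delta = 0.46$ g, $SE_\Delta = 0.17$ g, $t = 2.75$, $p = 0.03$ (two-sided)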
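The paired statistic reduces to a one-sample t-test on the differences. The sketch below works through it by hand with the before/after body masses from the example; the variable names are illustrative only.

```r
# Paired t-test computed by hand from the individual differences
before <- c(21.4, 20.2, 23.5, 17.5, 18.6, 17.0, 18.9, 19.2)
after  <- c(22.6, 20.9, 23.8, 18.0, 18.4, 17.9, 19.3, 19.1)

delta <- before - after          # individual differences
n     <- length(delta)

M_delta  <- mean(delta)          # mean difference
SE_delta <- sd(delta) / sqrt(n)  # standard error of the mean difference

t_stat <- M_delta / SE_delta
p_two  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(c(M_delta = M_delta, SE_delta = SE_delta, t = t_stat, p_two = p_two))
```

The mean difference equals the difference of the two sample means, but its standard error is much smaller than in the non-paired test, because the mouse-to-mouse variation cancels within each pair.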
How to do it in R?
# Paired t-test
> before = c(21.4, 20.2, 23.5, 17.5, 18.6, 17.0, 18.9, 19.2)
> after = c(22.6, 20.9, 23.8, 18.0, 18.4, 17.9, 19.3, 19.1)
> t.test(before, after, paired=T)
Paired t-test
data: before and after
t = -2.7545, df = 7, p-value = 0.02832
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -0.85953136 -0.06546864
sample estimates:
mean of the differences
-0.4625
F-test
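Variance
• One sample of size $n$
• Sample variance
$$SD_{n-1}^2 = \frac{1}{n - 1} \sum_i (x_i - M)^2$$
• Generalized variance: mean square
$$MS = \frac{SS}{\nu}$$
• where
  ◦ $SS$ - sum of squared residuals
  ◦ $\nu$ - number of degrees of freedom
[Figure: data points with the sample mean and a residual marked]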
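For a single sample the mean square is the ordinary sample variance. The sketch below checks this against R's `var`, reusing the five body masses from the one-sample example as illustrative data.

```r
# Sample variance as a mean square: MS = SS / nu
mass <- c(19.5, 26.7, 24.5, 21.9, 22.0)

M   <- mean(mass)
res <- mass - M          # residuals about the sample mean
SS  <- sum(res^2)        # sum of squared residuals
nu  <- length(mass) - 1  # degrees of freedom
MS  <- SS / nu           # mean square

print(c(SS = SS, nu = nu, MS = MS, var = var(mass)))
```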
Comparison of variance
• Consider two samples
  ◦ English mice, $n_E = 12$
  ◦ Scottish mice, $n_S = 9$
• We want to test if they come from populations with the same variance, $\sigma^2$
• Null hypothesis: $\sigma_1^2 = \sigma_2^2$
• We need a test statistic with known distribution
English mice: $n_E = 12$, $SD_E^2 = 21$ g²
Scottish mice: $n_S = 9$, $SD_S^2 = 19$ g²
Gedankenexperiment
Population of British mice: $\mu = 20$ g, $\sigma = 5$ g
Select two samples, of size 12 and 9
$$F = \frac{SD_E^2}{SD_S^2}$$
Build distribution of $F$
[Figure: null distribution of $F$]
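This Gedankenexperiment is easy to run. The sketch below is only an approximation of it: the number of repetitions, the seed and the helper `row_var` are assumptions, and the observed ratio 1.139 is the value quoted for the mouse data in these slides.

```r
# Null distribution of F = SD_E^2 / SD_S^2 by simulation.
# n_sim, the seed and row_var() are assumptions for this sketch.
set.seed(1)
n_sim <- 1e5
n1 <- 12; n2 <- 9

# Both samples come from the same population, Gaussian(20, 5)
x <- matrix(rnorm(n_sim * n1, mean = 20, sd = 5), nrow = n_sim)
y <- matrix(rnorm(n_sim * n2, mean = 20, sd = 5), nrow = n_sim)

# Sample variance of every row (one row = one simulated sample)
row_var <- function(m) rowSums((m - rowMeans(m))^2) / (ncol(m) - 1)
F_sim <- row_var(x) / row_var(y)

# Compare the simulated tail with the F-distribution
F_obs    <- 1.139
p_sim    <- mean(F_sim > F_obs)
p_theory <- pf(F_obs, n1 - 1, n2 - 1, lower.tail = FALSE)

print(c(simulated = p_sim, theory = p_theory))
```

The simulated tail fraction should agree with the F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom, which is what makes $F$ usable as a test statistic.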
Test to compare two variances
• Consider two samples, sized $n_1$ and $n_2$
• Null hypothesis: they come from distributions with the same variance
• H0: $\sigma_1^2 = \sigma_2^2$
• Test statistic:
$$F = \frac{SD_1^2}{SD_2^2}$$
• It is distributed with an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom
[Figure: F-distribution, $\nu_1 = 11$, $\nu_2 = 8$]
Reminder: test statistic for the two-sample t-test:
$$t = \frac{M_1 - M_2}{SE}$$
F-test
• English mice: $SD_E = 4.61$ g, $n_E = 12$
• Scottish mice: $SD_S = 4.32$ g, $n_S = 9$
• Null hypothesis: they come from distributions with the same variance
• Test statistic:
$$F = \frac{4.61^2}{4.32^2} = 1.139$$
$\nu_E = 11$, $\nu_S = 8$
$p = 0.44$
[Figure: F-distribution, $\nu_1 = 11$, $\nu_2 = 8$, with the observation marked, $p = 0.44$]
> 1 - pf(1.139, 11, 8)
[1] 0.4375845
Two-sample variance test (F-test): summary
Input            Two samples of $n_1$ and $n_2$ measurements
Usage            Compare sample variances
Null hypothesis  Samples came from populations with the same variance
Comments         Requires normality of data. Right now, it might look pointless, but it is necessary in ANOVA. Very important test!
How to do it in R?
# Two-sample variance test
> var.test(English, Scottish, alternative="greater")
F test to compare two variances
data: English and Scottish
F = 1.1389, num df = 11, denom df = 8, p-value = 0.4376
alternative hypothesis: true ratio of variances is greater than 1
95 percent confidence interval:
0.3437867 Inf
sample estimates:
ratio of variances
1.138948
Hand-outs available at http://tiny.cc/statlec