3. t-test - University of Dundee · 2017-04-18 · How to do it in R? # Two-sided t-test, equal...

P-values and statistical tests
3. t-test

Hand-outs available at http://is.gd/statlec

Marek Gierliński
Division of Computational Biology

Statistical test

Null hypothesis $H_0$: no effect

Significance level $\alpha = 0.05$

Data

Statistic T

p-value

$p < \alpha$: reject $H_0$

$p \geq \alpha$: insufficient evidence

One-sample t-test

Null hypothesis: the sample came from a population with mean $\mu = 20$ g

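t-statistic

- Sample $x_1, x_2, \ldots, x_n$
  - $M$ - mean
  - $SD$ - standard deviation
  - $SE = SD/\sqrt{n}$ - standard error
- From these we can find

$$t = \frac{M - \mu}{SE}$$

- More generic form:

$$t = \frac{\text{deviation}}{\text{standard error}}$$

[Figure: $\mu$ with $2\,SE$ marked on either side]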

Student's t-distribution

- The t-statistic is distributed with a t-distribution
- Standardized
- One parameter: degrees of freedom, $\nu$
- For large $\nu$ it approaches a Gaussian

[Figure: t-distributions for several degrees of freedom, compared with a Gaussian]
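The approach to the Gaussian for large $\nu$ can be checked numerically. The sketch below compares the two-sided 95% critical value of the t-distribution with that of Gaussian(0, 1). The particular values of $\nu$ are illustrative choices, not taken from the slides.

```r
# Two-sided 95% critical values: t-distribution vs. standard Gaussian.
# The degrees of freedom below are illustrative choices.
nu <- c(2, 4, 10, 30, 100, 1000)

t_crit <- qt(0.975, df = nu)   # 97.5th percentile of the t-distribution
z_crit <- qnorm(0.975)         # 97.5th percentile of Gaussian(0, 1)

# The t critical value shrinks towards the Gaussian one as nu grows
print(data.frame(nu = nu, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3)))
```

For small $\nu$ the heavier tails of the t-distribution push the critical value well above the Gaussian one, which is why the Gaussian should not be used with small samples when $\sigma$ is estimated from the data.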

William Gosset

- Brewer and statistician
- Developed Student's t-distribution
- Worked for Guinness, who prohibited employees from publishing any papers
- Published as "Student"
- Worked with Fisher and developed the t-statistic in its current form
- Always worked with experimental data
- Progenitor bioinformatician?

William Sealy Gosset (1876-1937)


Null distribution for the deviation of the mean

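Population of mice: $\mu = 20$ g, $\sigma = 5$ g

Select a sample of size 5

$$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$$

$$t = \frac{M - \mu}{SD/\sqrt{n}}$$

Repeat many times and build the distributions of $M$, $Z$ and $t$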


[Figure panels]

Original distribution: Gaussian($\mu$, $\sigma$)

Distribution of $M$: Gaussian($\mu$, $\sigma/\sqrt{n}$)

Distribution of $Z$: Gaussian(0, 1)

Distribution of $t$: t-distribution($\nu$), compared with Gaussian(0, 1)


Distribution of $Z$: Gaussian(0, 1)

$$Z = \frac{M - \mu}{\sigma/\sqrt{n}}$$

$\sigma$ - population parameter (unknown)

Distribution of $t$: t-distribution($\nu$), compared with Gaussian(0, 1)

$$t = \frac{M - \mu}{SD/\sqrt{n}} = \frac{M - \mu}{SE}$$

$SD$ - sample estimator (known)
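The thought experiment with the population of mice ($\mu = 20$ g, $\sigma = 5$ g, sample size 5) can be sketched as a short simulation. The number of repetitions and the random seed are assumptions made here for illustration. The threshold 1.96 is simply the Gaussian two-sided 95% point.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
mu <- 20; sigma <- 5        # population of mice from the slide
n <- 5                      # sample size from the slide
reps <- 20000               # number of simulated samples (an assumption)

# Each row is one simulated sample of n mice
x <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps)

M  <- rowMeans(x)                           # sample means
SD <- sqrt(rowSums((x - M)^2) / (n - 1))    # sample standard deviations

Z     <- (M - mu) / (sigma / sqrt(n))       # uses the population sigma
t_sim <- (M - mu) / (SD / sqrt(n))          # uses the sample SD instead

# Fraction of simulated statistics beyond +/- 1.96
frac_Z <- mean(abs(Z) > 1.96)
frac_t <- mean(abs(t_sim) > 1.96)

# Tail area predicted by the t-distribution with n - 1 degrees of freedom
frac_t_theory <- 2 * pt(-1.96, df = n - 1)

print(c(Z = frac_Z, t = frac_t, t_theory = frac_t_theory))
```

Replacing the unknown $\sigma$ with the sample estimate $SD$ adds variability, so the simulated $t$ should have visibly heavier tails than $Z$ and should follow the t-distribution with $n - 1$ degrees of freedom.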

One-sample t-test

- Consider a sample of $n$ measurements
  - $M$ - sample mean
  - $SD$ - sample standard deviation
  - $SE = SD/\sqrt{n}$ - sample standard error
- Null hypothesis: the sample comes from a population with mean $\mu$
- The test statistic

$$t = \frac{M - \mu}{SE}$$

- is distributed with a t-distribution with $n - 1$ degrees of freedom

[Figure: null distribution, t-distribution with 4 d.o.f.]

One-sample t-test: example

- $H_0$: $\mu = 20$ g
- 5 mice with body mass (g): 19.5, 26.7, 24.5, 21.9, 22.0

$M = 22.92$ g, $SD = 2.76$ g, $SE = 1.23$ g

$$t = \frac{22.92 - 20}{1.23} = 2.37$$

$\nu = 4$

$p = 0.04$

[Figure: null distribution with the observation marked, $p = 0.04$]

> mass = c(19.5, 26.7, 24.5, 21.9, 22.0)
> M = mean(mass)
> n = length(mass)
> SE = sd(mass) / sqrt(n)
> t = (M - 20) / SE
> t
[1] 2.36968
> 1 - pt(t, n - 1)   # one-sided p-value
[1] 0.03842385

Normality of data

[Figure: original distribution and the corresponding distribution of $t$]

Sidedness

One-sided test, $H_1$: $M > \mu$, $p_1 = 0.04$

Two-sided test, $H_2$: $M \neq \mu$, $p_2 = 0.08$

$$p_2 = 2 p_1$$

[Figure: null distribution with the observation marked]
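A minimal sketch of both p-values for the five-mouse example, computed directly from the t-distribution. The variable names are illustrative. `pt` with `lower.tail = FALSE` returns the upper tail area.

```r
mass <- c(19.5, 26.7, 24.5, 21.9, 22.0)   # data from the one-sample example
mu0  <- 20                                 # mean under the null hypothesis

n     <- length(mass)
t_obs <- (mean(mass) - mu0) / (sd(mass) / sqrt(n))

# One-sided: area of the upper tail only
p1 <- pt(t_obs, df = n - 1, lower.tail = FALSE)
# Two-sided: both tails, which is twice the one-sided value by symmetry
p2 <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)

print(c(one_sided = p1, two_sided = p2))
```

The two-sided value is what `t.test(mass, mu = 20)` reports by default. The one-sided value corresponds to `alternative = "greater"`.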

One-sample t-test: summary

Input: sample of $n$ measurements; theoretical value $\mu$ (population mean)

Assumptions: observations are random and independent; data are normally distributed

Usage: examine if the sample is consistent with the population mean

Null hypothesis: sample came from a population with mean $\mu$

Comments: limited usage (e.g. SILAC); works well for non-normal distributions, as long as they are symmetric

How to do it in R?

# One-sided t-test

> mass = c(19.5, 26.7, 24.5, 21.9, 22.0)

> t.test(mass, mu=20, alternative="greater")

One Sample t-test

data: mass

t = 2.3697, df = 4, p-value = 0.03842

alternative hypothesis: true mean is greater than 20

95 percent confidence interval:

20.29307 Inf

sample estimates:

mean of x

22.92

Two-sample t-test

Two samples

- Consider two samples (different sizes)
- Are they different?
- Are their means different?
- Do they come from populations with different means?

English: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g

Scottish: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g

The null distribution for the deviation between means

Population of British mice (normal population): $\mu = 20$ g, $\sigma = 5$ g

Select two samples of size 12 and 9, with means $M_E$ and $M_S$

$$t = \frac{M_E - M_S}{SE}$$

Repeat 1,000,000 times and build the distributions of $\Delta M$ and $t$

Two-sample t-test

- Null hypothesis: both samples come from populations with the same mean
- $H_0$: $\mu_1 = \mu_2$
- The test statistic

$$t = \frac{M_1 - M_2}{SE}$$

is distributed with a t-distribution with $\nu$ degrees of freedom

[Figure: distribution of $\Delta M$; distribution of $t$]

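Case 1: equal variances

- Assume that both distributions have the same variance (or standard deviation)
- Use the pooled variance estimator:

$$SD_{1,2}^2 = \frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}$$

- The standard error and the number of degrees of freedom are then

$$SE = SD_{1,2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$\nu = n_1 + n_2 - 2$$

In the case of equal sample sizes, $n_1 = n_2 = n$, these equations simplify. Here $SD_{1,2}^2$ is redefined as the sum of the two variances, that is, twice the pooled variance, so that $SE$ is unchanged:

$$SD_{1,2}^2 = SD_1^2 + SD_2^2$$

$$SE = \frac{SD_{1,2}}{\sqrt{n}}$$

$$\nu = 2n - 2$$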

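Case 1: equal variances, example

English: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g

Scottish: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g

$SD_{1,2} = 4.5$ g

$SE = 1.98$ g

$\nu = 19$

$t = 2.499$

$p = 0.011$ (one-sided)

$p = 0.022$ (two-sided)

[Figure: t-distribution with the observation marked, $p = 0.011$]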
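The equal-variance calculation can be reproduced step by step. The sketch below uses the English and Scottish mouse masses listed in the R example of this lecture. Taking the difference as Scottish minus English, so that $t$ is positive as on the slide, is a choice made here.

```r
# Data from the two-sample R example in this lecture
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)

n1 <- length(English)
n2 <- length(Scottish)

# Pooled variance estimator
SD12_sq <- ((n1 - 1) * var(English) + (n2 - 1) * var(Scottish)) / (n1 + n2 - 2)

# Standard error of the difference and degrees of freedom
SE <- sqrt(SD12_sq) * sqrt(1 / n1 + 1 / n2)
nu <- n1 + n2 - 2

# Scottish minus English, so that t is positive
t_obs <- (mean(Scottish) - mean(English)) / SE

p_one <- pt(t_obs, df = nu, lower.tail = FALSE)   # one-sided
p_two <- 2 * p_one                                # two-sided

print(c(SD12 = sqrt(SD12_sq), SE = SE, nu = nu, t = t_obs, p_one = p_one, p_two = p_two))
```

Up to the sign of $t$, this should agree with `t.test(English, Scottish, var.equal = TRUE)`.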

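Case 2: unequal variances

- Assume that the distributions have different variances
- Welch's t-test
- Find the individual standard errors (squared):

$$SE_1^2 = \frac{SD_1^2}{n_1}, \qquad SE_2^2 = \frac{SD_2^2}{n_2}$$

- Find the common standard error:

$$SE = \sqrt{SE_1^2 + SE_2^2}$$

- Number of degrees of freedom:

$$\nu \approx \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}}$$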

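Case 2: unequal variances, example

English: $n_E = 12$, $M_E = 19.0$ g, $SD_E = 4.6$ g

Scottish: $n_S = 9$, $M_S = 24.0$ g, $SD_S = 4.3$ g

$SE_E^2 = 1.8$ g$^2$

$SE_S^2 = 2.1$ g$^2$

$SE = 1.96$ g

$\nu = 18$

$t = 2.524$

$p = 0.011$ (one-sided)

$p = 0.021$ (two-sided)

[Figure: t-distribution with the observation marked, $p = 0.011$]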
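Welch's formulas can be followed in the same way. This sketch again uses the English and Scottish data from the R example of this lecture. As before, the difference is taken as Scottish minus English so that $t$ is positive, which is a choice made here.

```r
# Data from the two-sample R example in this lecture
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)

n1 <- length(English)
n2 <- length(Scottish)

# Individual squared standard errors
SE1_sq <- var(English) / n1
SE2_sq <- var(Scottish) / n2

# Common standard error
SE <- sqrt(SE1_sq + SE2_sq)

# Approximate (non-integer) number of degrees of freedom
nu <- (SE1_sq + SE2_sq)^2 / (SE1_sq^2 / (n1 - 1) + SE2_sq^2 / (n2 - 1))

t_obs <- (mean(Scottish) - mean(English)) / SE
p_two <- 2 * pt(abs(t_obs), df = nu, lower.tail = FALSE)

print(c(SE1_sq = SE1_sq, SE2_sq = SE2_sq, SE = SE, nu = nu, t = t_obs, p_two = p_two))
```

Note that $\nu$ is generally not an integer. `pt` accepts fractional degrees of freedom, so no rounding is needed.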

What if variances are not equal?

- Say, our samples come from two populations:
  - English: $\mu = 20$ g, $\sigma = 5$ g
  - Scottish: $\mu = 20$ g, $\sigma = 2.5$ g
- The 'equal variance' t-statistic does not represent the null hypothesis
- t-test on samples:
  - 'equal': $t = 4.17$, $p = 2.8 \times 10^{-4}$
  - 'unequal': $t = 4.56$, $p = 1.4 \times 10^{-4}$
- Unless you are certain that the variances are equal, use Welch's test

[Figure: simulated equal-variance and unequal-variance t-statistics compared with the t-distribution]
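A rough simulation sketch of this comparison follows. Both populations have the same mean, so the null hypothesis is true, and the question is how often each test rejects at $\alpha = 0.05$. Several things are assumptions made here and not taken from the slides: the sample sizes (12 English, 9 Scottish), the number of repetitions, the seed and the helper `row_var`.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
reps <- 20000                     # number of simulated experiments (an assumption)
n1 <- 12; n2 <- 9                 # assumed sample sizes: English, Scottish
mu <- 20; sigma1 <- 5; sigma2 <- 2.5

# Each row is one simulated experiment
x <- matrix(rnorm(reps * n1, mu, sigma1), nrow = reps)
y <- matrix(rnorm(reps * n2, mu, sigma2), nrow = reps)

# Hypothetical helper: sample variance of every row
row_var <- function(m) rowSums((m - rowMeans(m))^2) / (ncol(m) - 1)

v1 <- row_var(x)
v2 <- row_var(y)
dM <- rowMeans(x) - rowMeans(y)

# Equal-variance t-test (pooled variance)
sp2  <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_eq <- dM / sqrt(sp2 * (1 / n1 + 1 / n2))
p_eq <- 2 * pt(abs(t_eq), df = n1 + n2 - 2, lower.tail = FALSE)

# Welch's t-test
se1  <- v1 / n1
se2  <- v2 / n2
t_w  <- dM / sqrt(se1 + se2)
nu_w <- (se1 + se2)^2 / (se1^2 / (n1 - 1) + se2^2 / (n2 - 1))
p_w  <- 2 * pt(abs(t_w), df = nu_w, lower.tail = FALSE)

# Fraction of experiments rejecting a true null at alpha = 0.05
rate_eq <- mean(p_eq < 0.05)
rate_w  <- mean(p_w < 0.05)
print(c(equal = rate_eq, welch = rate_w))
```

Under these assumptions Welch's test should reject close to 5% of the time, while the equal-variance test drifts away from the nominal level. The direction of that drift depends on which sample is larger: when the smaller sample comes from the more variable population, the equal-variance test rejects too often.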

P-values vs. effect size

$n_E = 12$, $n_S = 9$, $\log FC = 0.33$, $p = 0.022$

$n_E = 100$, $n_S = 100$, $\log FC = 0.11$, $p = 0.022$

P-value is not a measure of biological significance

Two-sample t test: summary

Input: two samples of $n_1$ and $n_2$ measurements

Assumptions: observations are random and independent (no before/after data); data are normally distributed

Usage: compare sample means

Null hypothesis: samples came from populations with the same means

Comments: works well for non-normal distributions, as long as they are symmetric

How to do it in R?

# Two-sided t-test, equal variances

> English = c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)

> Scottish = c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)

> t.test(English, Scottish, var.equal=T)

Two Sample t-test

data: English and Scottish

t = -2.4993, df = 19, p-value = 0.02177

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval: -9.0903223 -0.8041221

sample estimates:

mean of x mean of y

19.04167 23.98889

# Two-sided t-test, unequal variances

> t.test(English, Scottish, var.equal=F)

Welch Two Sample t-test


t = -2.5238, df = 17.969, p-value = 0.02125


Paired t-test

Paired t-test

- Samples are paired
- For example: mouse weight before and after obesity treatment
- Null hypothesis: there is no difference between before and after
- $M_\Delta$ - the mean of the individual differences
- Example: mouse body mass (g)

Before: 21.4 20.2 23.5 17.5 18.6 17.0 18.9 19.2

After: 22.6 20.9 23.8 18.0 18.4 17.9 19.3 19.1

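Paired t-test

- Samples are paired
- Find the differences:

$$\Delta_i = x_i - y_i$$

then

  - $M_\Delta$ - mean
  - $SD_\Delta$ - standard deviation
  - $SE_\Delta = SD_\Delta/\sqrt{n}$ - standard error

- The test statistic is

$$t = \frac{M_\Delta}{SE_\Delta}$$

- t-distribution with $n - 1$ degrees of freedom

Non-paired t-test (Welch): $M_A - M_B = 0.46$ g, $SE = 1.08$ g, $t = 0.426$, $p = 0.34$ (one-sided)

Paired test: $M_\Delta = 0.46$ g, $SE_\Delta = 0.17$ g, $t = 2.75$, $p = 0.03$ (two-sided)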
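The paired test is a one-sample t-test on the differences. A minimal sketch using the before/after body masses from the example, with $x$ taken as "before" and $y$ as "after" (the same order that `t.test(before, after, paired = TRUE)` uses):

```r
# Mouse body mass (g) from the paired example
before <- c(21.4, 20.2, 23.5, 17.5, 18.6, 17.0, 18.9, 19.2)
after  <- c(22.6, 20.9, 23.8, 18.0, 18.4, 17.9, 19.3, 19.1)

delta <- before - after            # individual differences, x_i - y_i
n     <- length(delta)

M_delta  <- mean(delta)            # mean of the differences
SE_delta <- sd(delta) / sqrt(n)    # standard error of the differences

t_obs <- M_delta / SE_delta
p_two <- 2 * pt(abs(t_obs), df = n - 1, lower.tail = FALSE)

print(c(M_delta = M_delta, SE_delta = SE_delta, t = t_obs, p_two = p_two))
```

Because each mouse serves as its own control, the large mouse-to-mouse variation cancels in the differences. This is why $SE_\Delta$ is so much smaller than the standard error of the non-paired test.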

How to do it in R?

# Paired t-test

> before = c(21.4, 20.2, 23.5, 17.5, 18.6, 17.0, 18.9, 19.2)

> after = c(22.6, 20.9, 23.8, 18.0, 18.4, 17.9, 19.3, 19.1)

> t.test(before, after, paired=T)

Paired t-test

data: before and after

t = -2.7545, df = 7, p-value = 0.02832

alternative hypothesis: true difference in means is not equal to 0

95 percent confidence interval:

-0.85953136 -0.06546864

sample estimates:

mean of the differences

-0.4625

F-test

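Variance

- One sample of size $n$
- Sample variance

$$SD_{n-1}^2 = \frac{1}{n - 1}\sum_i (x_i - M)^2$$

- Generalized variance: mean square

$$MS = \frac{SS}{\nu}$$

- where
  - $SS$ - sum of squared residuals
  - $\nu$ - number of degrees of freedom

[Figure: data points with the sample mean and a residual marked]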
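As a small check of the mean-square form, the sketch below computes $SS$ and $\nu$ by hand and compares $MS$ with R's built-in `var`. Reusing the five mouse masses from the one-sample example is a choice made here for illustration.

```r
x <- c(19.5, 26.7, 24.5, 21.9, 22.0)   # mouse masses from the one-sample example

M  <- mean(x)
SS <- sum((x - M)^2)     # sum of squared residuals
nu <- length(x) - 1      # degrees of freedom: one is used up by estimating M
MS <- SS / nu            # mean square, i.e. the sample variance

print(c(SS = SS, nu = nu, MS = MS, var = var(x), SD = sqrt(MS)))
```

`var(x)` uses the same $n - 1$ denominator, so it should match $MS$ exactly.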

Comparison of variance

- Consider two samples
  - English mice, $n_E = 12$
  - Scottish mice, $n_S = 9$
- We want to test if they come from populations with the same variance, $\sigma^2$
- Null hypothesis: $\sigma_1^2 = \sigma_2^2$
- We need a test statistic with a known distribution

English: $n_E = 12$, $SD_E^2 = 21$ g$^2$

Scottish: $n_S = 9$, $SD_S^2 = 19$ g$^2$

Gedankenexperiment

Population of British mice: $\mu = 20$ g, $\sigma = 5$ g

Select two samples of size 12 and 9

$$F = \frac{SD_E^2}{SD_S^2}$$

Build the distribution of $F$

[Figure: null distribution of $F$]

Test to compare two variances

- Consider two samples, sized $n_1$ and $n_2$
- Null hypothesis: they come from distributions with the same variance
- $H_0$: $\sigma_1^2 = \sigma_2^2$
- The test statistic

$$F = \frac{SD_1^2}{SD_2^2}$$

is distributed with an F-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom

[Figure: F-distribution, $\nu_1 = 11$, $\nu_2 = 8$]

Reminder: the test statistic for the two-sample t-test is

$$t = \frac{M_1 - M_2}{SE}$$

F-test

- English mice: $SD_E = 4.61$ g, $n_E = 12$
- Scottish mice: $SD_S = 4.32$ g, $n_S = 9$
- Null hypothesis: they come from distributions with the same variance
- Test statistic:

$$F = \frac{4.61^2}{4.32^2} = 1.139$$

$\nu_E = 11$, $\nu_S = 8$

$p = 0.44$

[Figure: F-distribution, $\nu_1 = 11$, $\nu_2 = 8$, with the observation marked, $p = 0.44$]

> 1 - pf(1.139, 11, 8)
[1] 0.4375845
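The same calculation can be done from the raw data, without rounding the standard deviations first. This sketch uses the English and Scottish samples from the two-sample R example in this lecture. The variable names are illustrative.

```r
# Data from the two-sample R example in this lecture
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)

# Ratio of sample variances
F_obs <- var(English) / var(Scottish)

# Degrees of freedom of numerator and denominator
nu1 <- length(English) - 1
nu2 <- length(Scottish) - 1

# One-sided p-value: upper tail of the F-distribution
p_one <- pf(F_obs, nu1, nu2, lower.tail = FALSE)

print(c(F = F_obs, nu1 = nu1, nu2 = nu2, p = p_one))
```

`pf(..., lower.tail = FALSE)` is equivalent to `1 - pf(...)` and is numerically safer when the p-value is very small.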

Two-sample variance test (F-test): summary

Input: two samples of $n_1$ and $n_2$ measurements

Usage: compare sample variances

Null hypothesis: samples came from populations with the same variance

Comments: requires normality of data. Right now it might look pointless, but it is necessary in ANOVA. Very important test!

How to do it in R?

# Two-sample variance test

> var.test(English, Scottish, alternative="greater")

F test to compare two variances


F = 1.1389, num df = 11, denom df = 8, p-value = 0.4376

alternative hypothesis: true ratio of variances is greater than 1

95 percent confidence interval:

0.3437867 Inf

sample estimates:

ratio of variances

1.138948


Hand-outs available at http://tiny.cc/statlec

Date post:	14-Jul-2020