P-values and statistical tests
7. Statistical power
Hand-outs available at http://is.gd/statlec
Marek Gierliński
Division of Computational Biology
Statistical power: what is it about?
[Figure: two populations (alternative hypothesis) separated by an effect size; two samples of a given sample size are tested for statistical significance]
How does our ability to call a change "significant" depend on the effect size and the sample size?
Effect size
Effect size describes the alternative hypothesis
Effect size
$$\frac{\mu_1 - \mu_2}{\sigma}$$
Effect size for two sample means
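Cohen's d
$$d = \frac{M_1 - M_2}{SD}$$
where $SD$ is the pooled standard deviation
$$SD = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$$
The test statistic is
$$t = \frac{M_1 - M_2}{SE}, \qquad SE = SD\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
so that
$$d = t\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
[Figure: example with $d = 1.1$]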
Effect size for two sample means
Cohen, J. (1988). Statistical power analysis for the behavioral sciences
Effect size depends on the standard deviation
[Figure: fold change = 2]
Effect size does not depend on the sample size
[Figure: effect size = 0.8]
Effect size describes the alternative hypothesis
Effect size in ANOVA
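For the purpose of this calculation we only consider groups of equal size, $n$
[Figure: example with $f = 1$]
Test statistic
$$F = \frac{MS_B}{MS_W}$$
H0: $MS_B = MS_W$
H1: $MS_B = MS_W + n\,MS_A$, where $MS_A$ is the added variance
Cohen's f
$$f^2 = \frac{MS_A}{MS_W}$$
$$f^2 = \frac{F - 1}{n}$$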
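The relation between the F statistic and Cohen's f can be checked numerically. The following is a minimal sketch on simulated data; the group means, standard deviation, group size and seed are hypothetical values chosen only for illustration.

```r
# Simulated balanced one-way design (all numbers here are hypothetical)
set.seed(1)
k <- 4                          # number of groups
n <- 10                         # observations per group (equal sizes)
group_means <- c(20, 20, 25, 25)
dat <- data.frame(
  group = factor(rep(seq_len(k), each = n)),
  value = rnorm(k * n, mean = rep(group_means, each = n), sd = 5)
)

# One-way ANOVA table: first row is between groups, second row is within groups
tab <- summary(aov(value ~ group, data = dat))[[1]]
ms_b <- tab[["Mean Sq"]][1]
ms_w <- tab[["Mean Sq"]][2]
f_stat <- tab[["F value"]][1]

# Cohen's f from the F statistic; max() guards against F < 1
cohen_f <- sqrt(max(f_stat - 1, 0) / n)
print(c(F = f_stat, MS_B_over_MS_W = ms_b / ms_w, f = cohen_f))
```

The guard with `max()` is a choice made for this sketch: when the observed F is below 1 the formula would otherwise give the square root of a negative number.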
Effect size in ANOVA
[Figure: two examples, both with $f = 1$]
Effect size in frequency tables: odds ratio
Counts
          Dead    Alive   Total
Drug A      68       12      80
Drug B      70       30     100
Total      138       42     180
p = 0.013

Proportions
          Dead           Alive          Total
Drug A    p_A = 0.85     q_A = 0.15     1
Drug B    p_B = 0.70     q_B = 0.30     1

Difference in proportions (not useful for small proportions)
$$q_B - q_A = 0.30 - 0.15 = 0.15$$
Odds of survival
$$\frac{q_A}{p_A} = \frac{0.15}{0.85} = 0.18 : 1$$
$$\frac{q_B}{p_B} = \frac{0.30}{0.70} = 0.43 : 1$$
Odds ratio
$$\omega = \frac{q_B/p_B}{q_A/p_A} = \frac{0.43}{0.18} = 2.4$$
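The odds ratio arithmetic can be reproduced from the table of counts in base R. This is a small sketch; the variable names are introduced here for illustration. Note that `fisher.test` reports a conditional maximum-likelihood estimate of the odds ratio, which is close to but not identical to the sample odds ratio computed by hand.

```r
# Counts from the drug table: rows are drugs, columns are outcomes
counts <- matrix(c(68, 12,
                   70, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Drug A", "Drug B"), c("Dead", "Alive")))

# Row proportions: p = proportion dead, q = proportion alive
props <- prop.table(counts, margin = 1)

# Odds of survival for each drug, q / p
odds <- props[, "Alive"] / props[, "Dead"]

# Sample odds ratio: odds for drug B relative to drug A
odds_ratio <- odds[["Drug B"]] / odds[["Drug A"]]

# Fisher's exact test on the same table
fisher_res <- fisher.test(counts)

print(round(props, 2))
print(round(odds, 2))
print(odds_ratio)
print(fisher_res$p.value)
```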
Effect size
Data                                             Statistical test               Effect size    Formula
Two sets, sizes $n_1$ and $n_2$                  t-test                         Cohen's $d$    $d = t\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$
$k$ groups of $n$ points each                    ANOVA                          Cohen's $f$    $f = \sqrt{\frac{F - 1}{n}}$
2×2 contingency table                            Fisher's exact                 Odds ratio     $\omega = \frac{q_B/p_B}{q_A/p_A}$
Paired data $x_1, \dots, x_n$, $y_1, \dots, y_n$ Significance of correlation    Pearson's $r$  $r = \frac{1}{n - 1}\sum_{i=1}^{n} \frac{x_i - M_x}{SD_x}\,\frac{y_i - M_y}{SD_y}$
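The formula for Pearson's r in the last row can be compared against the built-in `cor` function. A minimal sketch follows; the paired data are simulated and entirely hypothetical.

```r
# Hypothetical paired data, simulated for illustration only
set.seed(42)
x <- rnorm(20, mean = 10, sd = 2)
y <- 0.5 * x + rnorm(20, sd = 1)
n <- length(x)

# Pearson's r as the mean product of standardised deviations (divisor n - 1)
r_manual <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (n - 1)

# Built-in correlation and its significance test
r_builtin <- cor(x, y)
r_test <- cor.test(x, y)

print(c(manual = r_manual, builtin = r_builtin, p.value = r_test$p.value))
```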
How to do it in R?
> library(MBESS)
# Mouse body weight data
> English = c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
> Scottish = c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)
> n1 = length(English)
> n2 = length(Scottish)
# t-test with equal variances, extract test statistic
> test = t.test(English, Scottish, var.equal=TRUE)
> t = test$statistic[['t']]
# confidence limits on the non-centrality parameter (t in this case)
> nct.limits = conf.limits.nct(t, n1 + n2 - 2)
# find Cohen's d and its limits
> sn = sqrt((n1 + n2) / (n1 * n2))
> d = t * sn
> d.lower = nct.limits$Lower.Limit * sn
> d.upper = nct.limits$Upper.Limit * sn
> d
[1] -1.102067
> d.lower
[1] -2.021337
> d.upper
[1] -0.1579345
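As a cross-check that needs only base R, Cohen's d can also be computed directly from the pooled standard deviation and compared with the value obtained from the t statistic. This sketch reuses the mouse body weight data; the variable names are introduced here.

```r
# Mouse body weight data from the example
English  <- c(16.5, 21.3, 12.4, 11.2, 23.7, 20.2, 17.4, 23, 15.6, 26.5, 21.8, 18.9)
Scottish <- c(19.7, 29.3, 27.1, 24.8, 22.4, 27.6, 25.7, 23.9, 15.4)
n1 <- length(English)
n2 <- length(Scottish)

# Pooled standard deviation (denominator n1 + n2 - 2)
sd_pooled <- sqrt(((n1 - 1) * var(English) + (n2 - 1) * var(Scottish)) / (n1 + n2 - 2))

# Cohen's d from its definition
d_direct <- (mean(English) - mean(Scottish)) / sd_pooled

# Cohen's d from the equal-variance t statistic
t_stat <- t.test(English, Scottish, var.equal = TRUE)$statistic[["t"]]
d_from_t <- t_stat * sqrt((n1 + n2) / (n1 * n2))

print(c(direct = d_direct, from_t = d_from_t))
```

Both routes should agree with the value of d printed in the session.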
Statistical power
t-test
Statistical testing
Inputs: data, statistical model, null hypothesis H0 (no effect), all other assumptions
Statistical test against H0
Significance level $\alpha = 0.05$
p-value: probability of observing an effect at least as large as the one in the data if H0 is true
$p < \alpha$: reject H0 (at your own risk), the effect is real
$p \ge \alpha$: accept H0 (!!!)
This table
                          H0 is true (no effect)               H0 is false (effect)
H0 rejected (positive)    type I error α (false positive)      correct decision (true positive)
H0 accepted (negative)    correct decision (true negative)     type II error β (false negative)
Gedankenexperiment
Draw 100,000 pairs of samples $(X, Y)$ of size $n = 5$
Find $t = (M_1 - M_2)/SE$ for each pair
Build the sampling distribution of $t$
H0: there is no effect. $X$ from $\mu_1 = 20$ g, $Y$ from $\mu_2 = 20$ g
H1: there is an effect. $X$ from $\mu_1 = 20$ g, $Y$ from $\mu_2 = 30$ g
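The thought experiment can be run as a simulation. The sketch below assumes normally distributed populations with a common standard deviation of 4.08 g; that value is an assumption (the slide above does not state it), picked so that a 10 g difference corresponds to an effect size near 2.45. The function name `sim_t` and the seed are also introduced here.

```r
set.seed(1)
n_sim <- 100000   # number of sample pairs
n     <- 5        # size of each sample
sigma <- 4.08     # ASSUMED population SD in grams, not given on the slide

# Simulate n_sim pairs of samples and return the equal-variance t statistic of each
sim_t <- function(mu1, mu2) {
  x <- matrix(rnorm(n_sim * n, mean = mu1, sd = sigma), nrow = n_sim)
  y <- matrix(rnorm(n_sim * n, mean = mu2, sd = sigma), nrow = n_sim)
  m1 <- rowMeans(x)
  m2 <- rowMeans(y)
  v1 <- rowSums((x - m1)^2) / (n - 1)
  v2 <- rowSums((y - m2)^2) / (n - 1)
  se <- sqrt((v1 + v2) / n)   # pooled SE for two samples of equal size n
  (m1 - m2) / se
}

t_h0 <- sim_t(20, 20)   # H0: no effect
t_h1 <- sim_t(20, 30)   # H1: 10 g difference

# Two-sided critical value at alpha = 0.05
t_crit <- qt(0.975, df = 2 * n - 2)

alpha_hat <- mean(abs(t_h0) > t_crit)    # fraction of false positives under H0
beta_hat  <- mean(abs(t_h1) <= t_crit)   # fraction of false negatives under H1
print(c(alpha = alpha_hat, beta = beta_hat, power = 1 - beta_hat))
```

Histograms of `t_h0` and `t_h1` give the two sampling distributions of t.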
One alternative hypothesis
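[Figure: sampling distributions of t under the null and the alternative hypothesis]
Null hypothesis: $\alpha = 0.05$, acceptance region $1 - \alpha$, rejection regions $\alpha/2$ in each tail
Alternative hypothesis: $\beta = 0.08$ inside the acceptance region, $1 - \beta$ outside it
            H0 true    H0 false
reject      FP (α)     TP
accept      TN         FN (β)
Power of the test
$$P = 1 - \beta$$
Probability that we correctly reject H0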
Statistical power
The probability of correctly rejecting the null hypothesis
(choosing the alternative, when it is true)
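For a two-sample t-test the power can be computed from the non-central t distribution instead of by simulation. This is a sketch under the assumption of two groups of n = 5 and an effect size d = 2.45; the result is compared with `power.t.test` using `strict=TRUE`, which counts rejections in both tails.

```r
n     <- 5      # observations per group
d     <- 2.45   # Cohen's d under the alternative hypothesis
alpha <- 0.05

df     <- 2 * n - 2
ncp    <- d * sqrt(n / 2)           # non-centrality parameter for equal group sizes
t_crit <- qt(1 - alpha / 2, df)     # two-sided critical value

# beta: probability that t lands in the acceptance region when H1 is true
beta  <- pt(t_crit, df, ncp = ncp) - pt(-t_crit, df, ncp = ncp)
power <- 1 - beta

# Reference value from the built-in power calculation
power_ref <- power.t.test(n = n, delta = d, sd = 1, sig.level = alpha,
                          type = "two.sample", alternative = "two.sided",
                          strict = TRUE)$power
print(c(beta = beta, power = power, power_ref = power_ref))
```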
Multiple alternative hypotheses
[Figure: sampling distributions of t for several alternative hypotheses, with the acceptance region $1 - \alpha$ marked]
$\mu_1$     $d$      $\beta$
22 g      0.49     0.90
24 g      0.98     0.72
26 g      1.47     0.47
28 g      1.96     0.23
30 g      2.45     0.08
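The type II error for a range of effect sizes can be tabulated with `power.t.test`. The sketch assumes two groups of n = 5, as in the Gedankenexperiment, and should give values close to those listed for each alternative hypothesis.

```r
n <- 5                                       # observations per group
d_values <- c(0.49, 0.98, 1.47, 1.96, 2.45)  # effect sizes of the alternatives

# beta = 1 - power for each effect size (two-sided, alpha = 0.05)
beta <- sapply(d_values, function(d) {
  1 - power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05,
                   type = "two.sample", alternative = "two.sided")$power
})

print(data.frame(d = d_values, beta = round(beta, 2), power = round(1 - beta, 2)))
```

Plotting `1 - beta` against `d_values` gives one power curve for this sample size.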
Power curve
[Figure: power curve, with the acceptance region $1 - \alpha$ marked]
$\beta$: type II error (false negative) probability
Power $= 1 - \beta$
Power curves
[Figure: power curves, with reference lines at $P = 0.8$ and $P = 0.95$]
How to do it in R?
# Find sample size required to detect the effect size d = 1 (delta = d when sd = 1)
> power.t.test(delta=1, sig.level=0.05, power=0.8, type="two.sample", alternative="two.sided")
     Two-sample t test power calculation
              n = 16.71477
          delta = 1
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
> power.t.test(delta=1, sig.level=0.05, power=0.95, type="two.sample", alternative="two.sided")
     Two-sample t test power calculation
              n = 26.98922
          delta = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided
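The sample size returned by `power.t.test` is fractional, so in practice it is rounded up. A short sketch of that step, which also checks the power actually achieved with the rounded group size (variable names are introduced here):

```r
# Required sample size per group for d = 1, power = 0.8
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.8,
                    type = "two.sample", alternative = "two.sided")

# Round up to a whole number of replicates per group
n_needed <- ceiling(res$n)

# Power achieved with the rounded sample size
achieved <- power.t.test(n = n_needed, delta = 1, sd = 1, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided")$power

print(c(n_exact = res$n, n_needed = n_needed, achieved_power = achieved))
```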
Statistical power
ANOVA
One alternative hypothesis
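[Figure: sampling distributions of F under the null and the alternative hypothesis]
Null hypothesis: sampling distribution of F for $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 20$ g; $\alpha = 0.05$, acceptance region $1 - \alpha$, rejection region $\alpha$
Alternative hypothesis: sampling distribution of F for $\mu_1 = \mu_2 = 20$ g, $\mu_3 = \mu_4 = 25$ g; $\beta = 0.20$ inside the acceptance region, $1 - \beta$ outside it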
Multiple alternative hypotheses
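[Figure: sampling distributions of F for several alternative hypotheses, with the acceptance region $1 - \alpha$ marked]
$f = 0.1$, $\beta = 0.92$
$f = 0.2$, $\beta = 0.83$
$f = 0.3$, $\beta = 0.65$
$f = 0.5$, $\beta = 0.20$
$f = 1$, $\beta$ = 3×10ef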
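For a balanced one-way ANOVA the type II error follows from the non-central F distribution with non-centrality parameter $\lambda = k n f^2$. The sketch below uses k = 4 groups as in the example; the group size n = 12 is a hypothetical choice, since the group size behind the figure is not stated, so the printed values need not match the figure exactly. The function name `anova_beta` is introduced here.

```r
# Type II error of a balanced one-way ANOVA with k groups of n points each
anova_beta <- function(f, k, n, alpha = 0.05) {
  df1    <- k - 1
  df2    <- k * (n - 1)
  lambda <- k * n * f^2                 # non-centrality parameter
  f_crit <- qf(1 - alpha, df1, df2)     # edge of the acceptance region
  pf(f_crit, df1, df2, ncp = lambda)    # probability of staying inside it under H1
}

f_values <- c(0.1, 0.2, 0.3, 0.5, 1)
beta <- sapply(f_values, anova_beta, k = 4, n = 12)   # n = 12 is ASSUMED

print(data.frame(f = f_values, beta = signif(beta, 2)))
```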
Power curves
[Figure: power curves, with reference lines at $P = 0.8$ and $P = 0.95$]
How to do it in R?
> library(pwr)
# Find sample size required to detect a "large" effect size f = 0.4
> pwr.anova.test(k=4, f=0.4, sig.level=0.05, power=0.8)
     Balanced one-way analysis of variance power calculation
              k = 4
              n = 18.04262
              f = 0.4
      sig.level = 0.05
          power = 0.8
NOTE: n is number in each group
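The same sample size can be obtained in base R by solving for the group size at which the non-central F power reaches the target. This is a sketch of that idea and not the internals of the `pwr` package as such; the function name `anova_power` and the search interval are choices made here.

```r
# Power of a balanced one-way ANOVA with k groups of n points each
anova_power <- function(n, k, f, alpha = 0.05) {
  df1    <- k - 1
  df2    <- k * (n - 1)
  lambda <- k * n * f^2
  pf(qf(1 - alpha, df1, df2), df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Find n where power equals 0.8 for k = 4 groups and f = 0.4
sol <- uniroot(function(n) anova_power(n, k = 4, f = 0.4) - 0.8,
               interval = c(2, 1000), tol = 1e-8)
n_required <- sol$root

print(c(n_exact = n_required, n_rounded_up = ceiling(n_required)))
```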
Worked example
Example: how toxicity affects rat brains
Samson et al. (2016) DOI: 10.1038/srep33746
$k = 5$ chambers, $n = 6$ replicates in each
Pilot experiment
  Connected neurons in 5 chambers
  Put neurotoxin in C3
  Count dead and alive cells
  See how it spreads
Power analysis
How many replicates do we need to...
1) detect a 10% difference between chambers? (power in t-test)
2) detect the observed C1-C5 effect in ANOVA? (power in ANOVA)
How many replicates to detect a difference of 0.1 between chambers?
Assess your data variability based on the pilot
[Figure: variability of the pilot data, with $SD = 0.1$ and $SD = 0.15$ marked]
Standard error of SD
$$SE_{SD} = \frac{SD}{\sqrt{2(n - 1)}}$$
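To see how uncertain an SD estimate from a small pilot is, the standard error formula can be evaluated directly. The sketch assumes n = 6 replicates, as in the pilot experiment, and the two SD values considered here.

```r
n <- 6                        # replicates per chamber in the pilot
sd_pilot <- c(0.10, 0.15)     # the two standard deviations considered

# Standard error of the standard deviation
se_sd <- sd_pilot / sqrt(2 * (n - 1))

print(data.frame(SD = sd_pilot, SE_SD = round(se_sd, 3),
                 relative_error = round(se_sd / sd_pilot, 2)))
```

With six replicates the relative uncertainty of SD is about one third, which is why two scenarios are worth considering.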
Better scenario: $SD = 0.1$
Cohen's d:
$$d = \frac{\Delta M}{SD} = \frac{0.1}{0.1} = 1$$
> power.t.test(delta=1, sig.level=0.05, power=0.8, type="two.sample", alternative="two.sided")
     Two-sample t test power calculation
              n = 16.71477
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8
Worse scenario: $SD = 0.15$
Cohen's d:
$$d = \frac{\Delta M}{SD} = \frac{0.1}{0.15} \approx 0.67$$
> power.t.test(delta=0.67, sig.level=0.05, power=0.8, type="two.sample", alternative="two.sided")
     Two-sample t test power calculation
              n = 35.95548
          delta = 0.67
             sd = 1
      sig.level = 0.05
          power = 0.8
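Both scenarios can be handled in one pass by giving `power.t.test` the raw difference and the standard deviation instead of a precomputed d. In this sketch the worse scenario uses the unrounded d = 0.1/0.15, so its sample size comes out slightly different from a calculation with d rounded to 0.67; the variable names are introduced here.

```r
delta_m <- 0.1                                   # difference we want to detect
sd_scenarios <- c(better = 0.10, worse = 0.15)   # SD scenarios from the pilot

# Required sample size per group for each scenario (power 0.8, alpha 0.05)
n_required <- sapply(sd_scenarios, function(s) {
  power.t.test(delta = delta_m, sd = s, sig.level = 0.05, power = 0.8,
               type = "two.sample", alternative = "two.sided")$n
})

print(data.frame(SD = sd_scenarios,
                 d = round(delta_m / sd_scenarios, 2),
                 n_exact = round(n_required, 2),
                 n_replicates = ceiling(n_required)))
```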
How many replicates to detect the observed C1-C5 effect in ANOVA?
Power in ANOVA
$$f = \sqrt{\frac{F - 1}{n}} = 0.38$$
How many replicates do we need?
> library(pwr)
> rat = read.table('http://tiny.cc/rat_toxicity', header=TRUE)
# Here n = 6 and k = 4
> n = 6
> k = 4
> rat.aov = aov(Proportion ~ Chamber, data=rat)
# Extract F value
> Fval = summary(rat.aov)[[1]]$F[1]
# Effect size: Cohen's f
> f = sqrt((Fval - 1)/n)
# What is the power of this experiment?
> pwr.anova.test(k=4, n=6, f=f, sig.level=0.05)
              k = 6
              n = 5
              f = 0.3760972
      sig.level = 0.05
          power = 0.2507655
# How many replicates to get power of 0.8?
> pwr.anova.test(k=4, f=f, sig.level=0.05, power=0.8)
              k = 6
              n = 16.06243
              f = 0.3760972
      sig.level = 0.05
          power = 0.8
Hand-outs available at http://tiny.cc/statlec