Post on 14-Jul-2015
transcript
Statistical Power
Statistical power of a study (i.e., a test of statistical significance)
- Probability that it will correctly reject a false null hypothesis
- Probability that it will correctly detect an effect/difference
Why calculate statistical power?
- Perhaps you want to know in advance the minimum sample size necessary to have a reasonable chance of detecting an effect
- Alternatively, if you found out that your (costly) study only had power = 0.3, would you proceed with the study?
              Fail to reject H0      Reject H0
H0 is true    Confidence level       Type I error (α)
H0 is false   Type II error (β)      Power
Calculating Statistical Power
Power Calculator: http://www.statisticalsolutions.net/pss_calc.php
You hypothesize that your weight loss drug helps people lose 2 kg over a month. Assuming σ = 8, α = 0.05, and power = 0.8, what is the minimum sample size required to detect an effect?
You realize that you only have budget for 30 participants to conduct your trial. Assuming the same parameters as above and with n = 30, what is the power of your trial?
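The linked calculator will do this for you, but the arithmetic can also be sketched directly. The sketch below assumes a one-sample, two-sided z-test (a normal approximation; the calculator's exact model may differ), with standardized effect size d = 2 / 8 = 0.25:

```python
import math
from scipy.stats import norm

# Parameters from the example: 2 kg effect, sigma = 8, alpha = 0.05, target power = 0.8
effect, sigma, alpha, target_power = 2.0, 8.0, 0.05, 0.8
d = effect / sigma                      # standardized effect size (0.25)

z_alpha = norm.ppf(1 - alpha / 2)       # critical value for a two-sided test
z_beta = norm.ppf(target_power)

# Minimum n to reach the target power (one-sample, two-sided z-test approximation)
n_required = math.ceil(((z_alpha + z_beta) / d) ** 2)

# Achieved power if the budget only allows n = 30
n = 30
achieved_power = norm.cdf(d * math.sqrt(n) - z_alpha)
```

Under this approximation the required sample size comes out to roughly 126 participants, and with only 30 participants the power drops to roughly 0.28, close to the power = 0.3 scenario mentioned above.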
Parameters of Statistical Power
- Significance level / p-value (α)
- Sample size (n)
- Effect size (Cohen's d)
- Distribution variance (σ²)
(Figure: four pairs of panels, A–H, each pair varying one parameter of power)
- A vs. B (α = 0.01 vs. α = 0.1): tests with smaller significance levels are more "rigorous" and require more power. Increasing α from 0.01 to 0.1 means you will be rejecting H0 more often (99% vs. 90% confidence level), so there is a greater chance of rejecting H0 in B relative to A.
- C vs. D (sample size): the bars show the 95% CI. In C, the sample sizes are small and thus the CIs are wide; in contrast, D has larger sample sizes and thus narrower CIs. As a result, it is easier to detect the difference in D relative to C.
- E vs. F (effect size): given that the size of the difference (effect size) in F is much larger than in E, a statistical test finds it easier to detect the difference in F.
- G vs. H (variance): as the distribution in H has less variance than in G, there is less overlap in their CIs; thus, it is easier to detect the difference in H.
Comparing more than two means
T-test
- Only two groups/levels are involved
- Dependent t-test: whether McDonalds makes you gain weight (before vs. after)
- Independent t-test: whether McDonalds or KFC makes you gain more weight
What if we have more than two levels?
- Analysis of Variance (ANOVA)
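As a rough illustration of the two t-test variants (the data here is simulated, not from an actual trial):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Dependent (paired) t-test: the same 10 people before and after a month of McDonalds
# (weights in kg are simulated for illustration)
before = rng.normal(70, 8, size=10)
after = before + rng.normal(2, 1, size=10)   # simulate an average 2 kg gain
t_dep, p_dep = stats.ttest_rel(before, after)

# Independent t-test: weight gains of two separate groups (McDonalds vs. KFC)
mcd_gain = rng.normal(2.0, 1.5, size=15)
kfc_gain = rng.normal(2.5, 1.5, size=15)
t_ind, p_ind = stats.ttest_ind(mcd_gain, kfc_gain)
```

The paired test compares each person with themselves, which removes between-person variability; the independent test compares two unrelated samples.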
Hypothesis testing for ANOVA
Null hypothesis H0
- The mean outcome is the same across all categories
- μ1 = μ2 = … = μk, where μi = mean of the outcome for observations in category i and k = number of groups
Alternative hypothesis (HA)
- At least one pair of means is different from each other
Example: Is there a difference between the average weight gain from consuming three types of fast food?
- Categories: (i) no fast food/control, (ii) McDonalds, (iii) KFC, (iv) Subway
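This hypothesis can be tested with scipy's one-way ANOVA; the group means and sample sizes below are made-up values for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated monthly weight gain (kg) for the four categories in the example
control   = rng.normal(0.0, 1.0, size=30)
mcdonalds = rng.normal(2.0, 1.0, size=30)
kfc       = rng.normal(2.5, 1.0, size=30)
subway    = rng.normal(0.5, 1.0, size=30)

# H0: mu_control = mu_mcdonalds = mu_kfc = mu_subway
f_stat, p_value = stats.f_oneway(control, mcdonalds, kfc, subway)
```

A small p-value here only tells us that at least one pair of means differs, not which pair.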
Variability partitioning in ANOVA
ANOVA allows us to separate out variability due to conditions/levels
Total variability in weight gain =
- Between-group variability: variability due to food type
- Within-group variability: variability due to other factors
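This partition can be checked numerically: the total sum of squared deviations equals the between-group plus the within-group sums of squares. A minimal sketch with made-up numbers:

```python
import numpy as np

# Toy weight-gain data for three food types (made-up numbers)
groups = [np.array([0.5, 1.0, 1.5]),
          np.array([2.0, 2.5, 3.0]),
          np.array([3.5, 4.0, 4.5])]
all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

ss_total   = ((all_obs - grand_mean) ** 2).sum()                         # total variability
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # due to food type
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)            # other factors
```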
t test vs. ANOVA
t test
- Compares means from two groups
- Are they so far apart that the difference cannot be attributed to sampling variability (i.e., randomness)?
- H0: μ1 = μ2
- Test statistic: t = ((x̄1 − x̄2) − (μ1 − μ2)) / SE(x̄1 − x̄2)
ANOVA
- Compares means from more than two groups
- Are they so far apart that the difference cannot be attributed to sampling variability (i.e., randomness)?
- H0: μ1 = μ2 = … = μk
- Test statistic: F = (variability between groups) / (variability within groups)
Large test statistics lead to small p-values. If the p-value is small enough, H0 is rejected and we conclude that the data provides evidence of a difference in the population means.
F Distribution
Probability distribution associated with the F statistic
- To be able to reject H0, we need a small p-value, which requires a large F statistic
- To get a large F statistic, the variability between sample means needs to be greater than the variability within sample means
p-value
- Probability of observing as large a ratio between the "between" and "within" group variabilities, if in fact the means of all groups are equal
(Figure: F distribution; fail to reject H0 in the body of the distribution, reject H0 in the right tail)
F = (variability between groups) / (variability within groups)
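Computing the F statistic by hand and reading the p-value off the F distribution's right tail should match scipy's built-in one-way ANOVA; a sketch with made-up data:

```python
import numpy as np
from scipy import stats

# Hypothetical weight-gain data for three groups (values made up for illustration)
groups = [np.array([1.2, 0.8, 1.5, 0.9]),
          np.array([2.1, 2.6, 1.9, 2.4]),
          np.array([3.0, 3.4, 2.8, 3.3])]
all_obs = np.concatenate(groups)
k, n = len(groups), len(all_obs)
grand_mean = all_obs.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within  = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_between = ss_between / (k - 1)           # mean square (group)
ms_within  = ss_within / (n - k)            # mean square (error)

f_stat = ms_between / ms_within
p_value = stats.f.sf(f_stat, k - 1, n - k)  # right-tail area under the F distribution
```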
Interpreting the ANOVA table (sum of squares)
Sum of squares (total)
- Measures the total variability
- Calculated very similarly to variance, except not scaled by the sample size
Sum of squares (group)
- Measures the variability between groups
- Deviation of each group mean from the overall mean, weighted by sample size
Sum of squares (error)
- Measures the variability within groups
- Variability unexplained by the group variable
(ANOVA table excerpt — Total row: df = 119, Sum Sq = 501.9)
Interpreting the ANOVA table (degrees of freedom)
Degrees of freedom (total)
- n − 1, where n = number of observations
Degrees of freedom (group)
- k − 1, where k = number of groups
Degrees of freedom (error)
- df (total) − df (group)
Interpreting the ANOVA table (mean squares)
Mean squares (group)
- Average variability between groups
- Total variability (sum sq) scaled by the associated df: sum of squares (group) / degrees of freedom (group)
Mean squares (error)
- Average variability within groups
- Total variability (sum sq) scaled by the associated df: sum of squares (error) / degrees of freedom (error)
Interpreting the ANOVA table (F statistic & p)
F statistic
- Ratio of the between-group and within-group variability
- Mean square (group) / mean square (error)
p-value
- Probability of observing as large a ratio between the "between" and "within" group variabilities, if in fact the means of all groups are equal
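The table's arithmetic can be sketched as follows. Only the Total row (df = 119, Sum Sq = 501.9) survives in this transcript, so the group sum of squares (ss_group = 80.0) and the number of groups (k = 4) below are hypothetical values chosen for illustration:

```python
from scipy import stats

# Total row from the slide's table: df = 119 (so n = 120), Sum Sq = 501.9.
# ss_group and k are hypothetical, since the group rows did not survive extraction.
n, k = 120, 4
ss_total, ss_group = 501.9, 80.0
ss_error = ss_total - ss_group

df_group = k - 1                  # degrees of freedom (group)
df_error = (n - 1) - df_group     # degrees of freedom (error)

ms_group = ss_group / df_group    # mean square (group)
ms_error = ss_error / df_error    # mean square (error)

f_stat = ms_group / ms_error
p_value = stats.f.sf(f_stat, df_group, df_error)
```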
Interpreting the ANOVA table (p-value)
If the p-value is small (less than α), reject H0
- The data provides evidence that at least one pair of means is different from each other
- But we can't tell which pair
If the p-value is large (greater than α), fail to reject H0
- The data does not provide evidence that any pair of means is different from each other
- The observed differences could be due to chance
Conditions for ANOVA
Independence
- Within groups: sampled observations must be independent
- Between groups: groups must be independent of each other
Approximate normality
- Within each group, distributions should be nearly normal
Equal variance (homoscedasticity)
- Groups should have roughly equal variability
So how do we find out which means differ?
We conduct independent t-tests for differences between each possible pair of groups (multiple comparisons)
- However, with multiple t-tests, there is an inflated Type I error rate
Thus, we use a modified significance level, which ranges from the most liberal to the most conservative
- Most liberal: no correction
- Most conservative: Bonferroni correction
Bonferroni correction
The Bonferroni correction suggests that a more stringent significance level is appropriate for multiple comparisons
- Thus, we adjust α by the number of comparisons considered
α* = α / K
where K = number of comparisons = k(k − 1) / 2, with k = number of groups
Bonferroni Correction
In our example, the fast food variable has 4 levels: (i) control, (ii) McDonalds, (iii) KFC, (iv) Subway. If α = 0.05, what should the modified significance level be?
k = number of levels = 4
K = 4 × (4 − 1) / 2 = 6
α* = α / K = 0.05 / 6 ≈ 0.0083
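The correction, plus the follow-up pairwise t-tests it applies to, can be sketched as follows (the group data is simulated for illustration):

```python
from itertools import combinations

import numpy as np
from scipy import stats

alpha, k = 0.05, 4
K = k * (k - 1) // 2          # 6 pairwise comparisons
alpha_star = alpha / K        # modified significance level, ~0.0083

rng = np.random.default_rng(1)
samples = {
    "control":   rng.normal(0.0, 1.0, 30),
    "mcdonalds": rng.normal(2.0, 1.0, 30),
    "kfc":       rng.normal(2.5, 1.0, 30),
    "subway":    rng.normal(0.5, 1.0, 30),
}

# Pairwise independent t-tests, each judged against the corrected level
results = {}
for (a, b) in combinations(samples, 2):
    _, p = stats.ttest_ind(samples[a], samples[b])
    results[(a, b)] = p < alpha_star   # True means "significantly different"
```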
Types of ANOVA
One-way ANOVA
- Between-groups
- Repeated measures
Factorial ANOVA
- Two or more independent variables
- Allows for examination of interaction effects
The t-test should suffice for most of your hypothesis testing needs
- For our understanding though, what other forms of hypothesis tests are there?
- Chi-square
  - Independent variable: gender (proportion in the general population)
  - Dependent variable: gender (proportion in the engineering faculty)
- Linear regression
  - Independent variable: age
  - Dependent variable: income
- Logistic regression
  - Independent variable: age
  - Dependent variable: marital status
What other kinds of statistical tests are there?
                                 Dependent Variable
                            Continuous           Categorical
Independent   Continuous    Linear Regression    Logistic Regression
Variable      Categorical   t-test               Chi-square test
Time for practice
In this lab session we will cover:
- ANOVA
- Bonferroni correction
GitHub repository: https://github.com/eugeneyan/Statistical-Inference