Statistical inference: Statistical Power, ANOVA, and Post Hoc tests

Post on 14-Jul-2015


Statistical Inference, Week 4: Statistical power, ANOVA, and post hoc tests

Statistical Power

Statistical power of a study (i.e., a test of statistical significance)
- Probability that it will correctly reject a false null hypothesis
- Probability that it will correctly detect an effect/difference

Why calculate statistical power?
- Perhaps you want to know in advance the minimum sample size necessary to have a reasonable chance of detecting an effect
- Alternatively, if you found out that your (costly) study only had power = 0.3, would you proceed with the study?

                Fail to reject H0          Reject H0
H0 is true      Confidence level (1 − α)   Type I error (α)
H0 is false     Type II error (β)          Power (1 − β)

Calculating Statistical Power

Power Calculator: http://www.statisticalsolutions.net/pss_calc.php

You hypothesize that your weight loss drug helps people lose 2 kg over a month. Assuming σ = 8, α = 0.05, and power = 0.8, what is the minimum sample size required to detect the effect?

You realize that you only have budget for 30 participants to conduct your trial. Assuming the same parameters as above and with n = 30, what is the power of your trial?
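As a rough sketch of what such a calculator does, both questions can be answered with a two-sided z-test approximation using only the standard library. The 2 kg effect and σ = 8 from the slide give an effect size of d = 0.25; an exact t-based calculator will give slightly larger n:

```python
from math import ceil, sqrt
from statistics import NormalDist

def min_sample_size(d, alpha=0.05, power=0.8):
    """Smallest n that detects standardized effect d (two-sided z-test approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.8
    return ceil(((z_alpha + z_beta) / d) ** 2)

def achieved_power(d, n, alpha=0.05):
    """Power of a two-sided z-test with n observations (ignoring the far tail)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_alpha - d * sqrt(n))

d = 2 / 8                         # effect size: 2 kg loss / sigma of 8 kg
n_needed = min_sample_size(d)     # ~126 under the normal approximation
power_30 = achieved_power(d, 30)  # ~0.28: the 30-participant trial is underpowered
```

With only 30 participants the power is well below the conventional 0.8 target, which is exactly the "would you proceed?" situation from the slide.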

Parameters of Statistical Power

๐‘†๐‘–๐‘”๐‘›๐‘–๐‘“๐‘–๐‘๐‘Ž๐‘›๐‘๐‘’ ๐ฟ๐‘’๐‘ฃ๐‘’๐‘™ / ๐‘ โˆ’ ๐‘ฃ๐‘Ž๐‘™๐‘ข๐‘’ (๐›ผ) ๐‘†๐‘Ž๐‘š๐‘๐‘™๐‘’ ๐‘†๐‘–๐‘ง๐‘’ (๐‘)

๐ธ๐‘“๐‘“๐‘’๐‘๐‘ก ๐‘†๐‘–๐‘ง๐‘’ (๐ถ๐‘œโ„Ž๐‘’๐‘›โ€ฒ๐‘  ๐‘‘) ๐ท๐‘–๐‘ ๐‘ก๐‘Ÿ๐‘–๐‘๐‘ข๐‘ก๐‘–๐‘œ๐‘› ๐œŽ2

Tests with smaller significance levels are more "rigorous": all else equal, they have less power.
- Increasing α from 0.01 to 0.1 means that you will be rejecting H0 more often (confidence level of 99% vs. 90%)
- There is a greater chance of rejecting H0 in B (α = 0.1) relative to A (α = 0.01)

[Figure: panels A-H showing how α (A vs. B), sample size (C vs. D), effect size (E vs. F), and variance (G vs. H) affect the ability to detect a difference]

- C vs. D: the bars show the 95% CI. In C, the sample sizes are small and thus the CIs are wide; in contrast, D has larger sample sizes and thus narrower CIs. As a result, it is easier to detect the difference in D relative to C.
- E vs. F: given that the size of the difference (effect size) in F is much larger than in E, a statistical test would find it easier to detect the difference in F.
- G vs. H: as the distributions in H have less variance than those in G, there is less overlap in their CIs. Thus, it is easier to detect the difference in H.

Comparing more than two means

T-test
- Only two groups/levels are involved
- Dependent t-test: whether McDonalds makes you gain weight (before vs. after)
- Independent t-test: whether McDonalds or KFC makes you gain more weight

What if we have more than two levels?
- Analysis of Variance (ANOVA)

Hypothesis testing for ANOVA

Null hypothesis (H0)
- The mean outcome is the same across all categories
- μ1 = μ2 = … = μk, where μi = mean of the outcome for observations in category i and k = number of groups

Alternative hypothesis (Ha)
- At least one pair of means differ from each other

Is there a difference in the average weight gain from consuming different types of fast food?
- Categories: (i) no fast food/control, (ii) McDonalds, (iii) KFC, (iv) Subway

Variability partitioning in ANOVA

ANOVA allows us to separate out the variability due to conditions/levels:
- Total variability in weight gain
- Between-group variability: variability due to food type
- Within-group variability: variability due to other factors
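The partition can be verified numerically. A minimal sketch with made-up weight-gain numbers (not from the slides), showing that total variability splits exactly into between-group and within-group pieces:

```python
from statistics import mean

# Hypothetical weight-gain data (kg) for three groups -- invented numbers
groups = {
    "control":   [1, 2, 3],
    "mcdonalds": [2, 3, 4],
    "kfc":       [6, 7, 8],
}

all_obs = [x for g in groups.values() for x in g]
grand_mean = mean(all_obs)

# Total variability: squared deviations of every observation from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

# Between-group variability: group means vs. grand mean, weighted by group size
ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Within-group variability: observations vs. their own group mean
ss_error = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

# The partition holds exactly: SS(total) = SS(group) + SS(error)
```

For this data the split is 48 = 42 + 6: most of the variability is between groups, which is what a large F statistic will reflect later.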

t test vs. ANOVA

t-test
- Compares means from two groups
- Are they so far apart that the difference cannot be attributed to sampling variability (i.e., randomness)?
- H0: μ1 = μ2

Test statistic:

t = ((x̄1 − x̄2) − (μ1 − μ2)) / SE(x̄1 − x̄2)
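A minimal sketch of this statistic, using a pooled standard error and setting (μ1 − μ2) = 0 under H0. The two samples are invented for illustration:

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(a, b):
    """Independent two-sample t statistic with pooled variance, under H0: mu1 == mu2."""
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # SE(xbar1 - xbar2)
    return (mean(a) - mean(b)) / se

mcd = [1, 2, 3, 4, 5]  # invented weight-gain samples (kg)
kfc = [3, 4, 5, 6, 7]
t = two_sample_t(mcd, kfc)  # -2.0 for this data
```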

ANOVA
- Compares means from more than two groups
- Are they so far apart that the difference cannot be attributed to sampling variability (i.e., randomness)?
- H0: μ1 = μ2 = … = μk

Test statistic:

F = (variability between groups) / (variability within groups)

Large test statistics lead to small p-values. If the p-value is small enough, H0 is rejected and we conclude that the data provides evidence of a difference in the population means.

F Distribution

Probability distribution associated with the F statistic
- In order to reject H0, we need a small p-value, which requires a large F statistic
- To get a large F statistic, the variability between sample means needs to be greater than the variability within samples

p-value
- Probability of observing as large a ratio between the 'between' and 'within' group variabilities, if in fact the means of all groups are equal

[Figure: F distribution for F = (variability between groups) / (variability within groups), with the fail-to-reject region on the left and the reject-H0 region in the right tail]

Interpreting the ANOVA table (sum of squares)

Sum of squares (total)
- Measures the total variability
- Calculated very similarly to variance, except not scaled by sample size

Sum of squares (group)
- Measures the variability between groups
- Deviation of group means from the overall mean, weighted by sample size

Sum of squares (error)
- Measures the variability within groups
- Variability unexplained by the group variable

[ANOVA table excerpt; total row: df = 119, Sum Sq = 501.9]

Interpreting the ANOVA table (degrees of freedom)

Degrees of freedom (total)
- n − 1, where n = number of observations

Degrees of freedom (group)
- k − 1, where k = number of groups

Degrees of freedom (error)
- Degrees of freedom (total) − degrees of freedom (group)


Interpreting the ANOVA table (mean squares)

Mean squares (group)
- Average variability between groups
- Total variability (Sum Sq) scaled by the associated df
- Sum of squares (group) / degrees of freedom (group)

Mean squares (error)
- Average variability within groups
- Total variability (Sum Sq) scaled by the associated df
- Sum of squares (error) / degrees of freedom (error)


Interpreting the ANOVA table (F statistic & p-value)

F statistic
- Ratio of the between-group and within-group variability
- Mean square (group) / mean square (error)

p-value
- Probability of observing as large a ratio between the 'between' and 'within' group variabilities, if in fact the means of all groups are equal

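Putting the table together numerically: the sketch below builds each quantity (Sum Sq, df, Mean Sq, F, p) from invented data, and, assuming SciPy is available, cross-checks the F statistic against scipy.stats.f_oneway:

```python
from statistics import mean
from scipy import stats

# Invented samples for three groups (not the slides' data)
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
obs = [x for g in groups for x in g]
n, k = len(obs), len(groups)

grand = mean(obs)
ss_group = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between groups
ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)     # within groups

df_group, df_error = k - 1, n - k      # 2 and 6 here
ms_group = ss_group / df_group         # mean square (group)
ms_error = ss_error / df_error         # mean square (error)
f_stat = ms_group / ms_error           # 21.0 for this data

# p-value: upper tail of the F distribution with (df_group, df_error) df
p_value = stats.f.sf(f_stat, df_group, df_error)

# Cross-check against SciPy's one-way ANOVA
f_check, p_check = stats.f_oneway(*groups)
```

Here F = 21 on (2, 6) degrees of freedom gives p ≈ 0.002, so H0 would be rejected at α = 0.05.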

Interpreting the ANOVA table (p-value)

If the p-value is small (less than α), reject H0
- The data provides evidence that at least one pair of means differ from each other
- But we can't tell which pair

If the p-value is large (greater than α), fail to reject H0
- The data does not provide evidence that any pair of means differ from each other
- The observed differences could be due to chance


Conditions for ANOVA

Independence
- Within groups: sampled observations must be independent
- Between groups: groups must be independent of each other

Approximate normality
- Within each group, distributions should be nearly normal

Equal variance (homoscedasticity)
- Groups should have roughly equal variability
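Assuming SciPy is available, the last two conditions can be probed with standard diagnostics: Shapiro-Wilk for normality within each group, and Levene's test for equal variance across groups. A sketch on simulated data:

```python
import random
from scipy import stats

random.seed(0)
# Simulated weight-gain samples for three groups (invented, roughly normal,
# equal spread) -- so neither diagnostic should raise an alarm here
groups = [[random.gauss(mu, 2.0) for _ in range(20)] for mu in (0.0, 1.0, 3.0)]

# Approximate normality: Shapiro-Wilk within each group
# (a small p-value would suggest a departure from normality)
shapiro_ps = [stats.shapiro(g).pvalue for g in groups]

# Equal variance: Levene's test across all groups
# (a small p-value would suggest unequal variances)
levene_p = stats.levene(*groups).pvalue
```

Independence, by contrast, is a property of the study design and sampling procedure; no test on the data alone can establish it.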

So how do we find out which means differ?

We conduct independent t-tests for differences between each possible pair of groups (multiple comparisons)
- However, with multiple t-tests, there is an inflated Type I error rate

Thus, we use a modified significance level, which ranges from the most liberal to the most conservative
- Most liberal: no correction
- Most conservative: Bonferroni correction

Bonferroni correction

The Bonferroni correction suggests that a more stringent significance level is appropriate for multiple comparisons
- Thus, we adjust α by the number of comparisons considered

α* = α / K

where K = number of comparisons = k(k − 1) / 2, and k = number of groups

Bonferroni Correction

In our example, the fast food variable has 4 levels: (i) control, (ii) McDonalds, (iii) KFC, (iv) Subway. If α = 0.05, what should the modified significance level be?

Number of levels: k = 4
K = 4 × (4 − 1) / 2 = 6
α* = α / K = 0.05 / 6 ≈ 0.0083
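The same arithmetic in code (pure stdlib; the four level names are from the example), which also enumerates the six pairs that would each be tested at α*:

```python
from itertools import combinations
from math import comb

levels = ["control", "mcdonalds", "kfc", "subway"]
alpha = 0.05

K = comb(len(levels), 2)  # number of pairwise comparisons: k(k-1)/2 = 6
alpha_star = alpha / K    # Bonferroni-adjusted significance level, ~0.0083

# The six pairs, each compared with an independent t-test at alpha_star
pairs = list(combinations(levels, 2))
```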

Types of ANOVA

One-way ANOVA
- Between-groups
- Repeated measures

Factorial ANOVA
- Two or more independent variables
- Allows for examination of interaction effects

The t-test should suffice for most of your hypothesis testing needs
- For our understanding though, what other forms of hypothesis tests are there?
- Chi-square
  - Independent variable: gender (proportion in general population)
  - Dependent variable: gender (proportion in engineering faculty)
- Linear regression
  - Independent variable: age
  - Dependent variable: income
- Logistic regression
  - Independent variable: age
  - Dependent variable: marital status

What other kinds of statistical tests are there?

                       Dependent variable
Independent variable   Continuous           Categorical
Continuous             Linear regression    Logistic regression
Categorical            t-test               Chi-square test

Time for practice

In this lab session we will cover:
- ANOVA
- Bonferroni correction

GitHub repository: https://github.com/eugeneyan/Statistical-Inference

Thank you for your attention!

Eugene Yan