Posted on 11-Jan-2016
F-Test (ANOVA) & Two-Way ANOVA
Hande Ürkmeyen
Tourism & Hotel Management
What is an F-test?
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled.
The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.
One common use of the F-test is to test whether two population variances are equal. It does this by taking the ratio of the two sample variances: if the population variances are equal, this ratio should be close to 1.
There are several different F-tables, one for each level of significance. So, first find the table for the chosen level of significance, and then look up the numerator degrees of freedom and the denominator degrees of freedom to find the critical value.
The numerator degrees of freedom will be the degrees of freedom for whichever sample has the larger variance (since it is in the numerator) and the denominator degrees of freedom will be the degrees of freedom for whichever sample has the smaller variance (since it is in the denominator).
In ANOVA, the F-test statistic is found by dividing the between-group variance by the within-group variance. The degrees of freedom for the numerator are the degrees of freedom for the between group (k - 1), and the degrees of freedom for the denominator are the degrees of freedom for the within group (N - k), where k is the number of groups and N is the total sample size.
Characteristics of the F Distribution
• The values of F cannot be negative.
• The distribution is positively skewed.
• The mean value of F is approximately equal to 1; more precisely, it is d2/(d2 - 2) for denominator degrees of freedom d2 > 2, which approaches 1 as d2 grows.
• The F distribution is a family of curves based on the degrees of freedom of the variance in the numerator and in the denominator.
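These characteristics can be checked by simulation. The sketch below uses the standard construction of an F variable as the ratio of two chi-square variables, each divided by its degrees of freedom, with each chi-square built as a sum of squared standard normal draws. The function name `simulate_f`, the seed and the number of trials are arbitrary illustrative choices; the degrees of freedom (3, 8) match the training-method example later in this deck.

```python
import random
import statistics


def simulate_f(d1, d2, trials, seed=0):
    """Draw F values as (chi-square_d1 / d1) / (chi-square_d2 / d2).

    Each chi-square value is a sum of squared standard normal draws.
    """
    rng = random.Random(seed)

    def chi_square(df):
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    return [(chi_square(d1) / d1) / (chi_square(d2) / d2) for _ in range(trials)]


d1, d2 = 3, 8  # numerator and denominator degrees of freedom
draws = simulate_f(d1, d2, trials=20000)
print(min(draws) >= 0)                                       # F is never negative
print(statistics.fmean(draws) > statistics.median(draws))    # right skew pulls the mean above the median
print(statistics.fmean(draws), d2 / (d2 - 2))                # simulated mean vs theoretical d2/(d2 - 2)
```

With d2 = 8 the theoretical mean is 8/6, about 1.33, so "approximately 1" only becomes a good description as the denominator degrees of freedom grow.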
How to use the F-test?
1) State the null hypothesis that the two variances being compared come from the same population (i.e., they are not statistically different).
2) Calculate the F value (the ratio of the two variances, with the larger variance in the numerator).
3) Look up the table value of F for the degrees of freedom used to calculate both variances, at the chosen significance level.
4) If the calculated F is greater than the table value, reject the null hypothesis. Otherwise, the two variances could have come from the same population of measurements.
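Steps 2 to 4 can be sketched in a few lines, assuming the critical value has already been read from an F-table (the tables themselves are not reproduced here). The function `variance_f_test` is a hypothetical helper, not a library routine. The usage reuses methods M1 and M2 from the training example later in this deck; 19.00 is the upper 5% point of the F distribution with (2, 2) degrees of freedom.

```python
import statistics


def variance_f_test(sample_1, sample_2, f_table):
    """Two-sample F-test for equal variances, following steps 1-4.

    f_table is the critical value looked up in an F-table for the chosen
    significance level and the degrees of freedom returned below.
    Returns (F, (numerator df, denominator df), reject_h0).
    """
    var_1 = statistics.variance(sample_1)  # sample variance (n - 1 divisor)
    var_2 = statistics.variance(sample_2)
    # Step 2: the larger variance goes in the numerator, so its df is the numerator df.
    if var_1 >= var_2:
        f = var_1 / var_2
        dfs = (len(sample_1) - 1, len(sample_2) - 1)
    else:
        f = var_2 / var_1
        dfs = (len(sample_2) - 1, len(sample_1) - 1)
    # Step 4: reject H0 (equal variances) only if F exceeds the table value.
    return f, dfs, f > f_table


f, dfs, reject = variance_f_test([10, 9, 5], [11, 16, 9], f_table=19.00)
print(f, dfs, reject)  # F is about 1.86 with (2, 2) df, so H0 is not rejected
```

With only three observations per sample the critical value is very large, which is why such a small test rarely rejects equal variances.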
Assumptions for the F-test
• Samples are independent random samples.
• The distribution of the response variable is a normal curve within each population.
• Different populations may have different means.
• All populations have the same standard deviation, σ.
Conditions for Using the F-Test
• The F-statistic can be used if the data are not extremely skewed, there are no extreme outliers, and the group standard deviations are not markedly different.
• Tests based on the F-statistic remain valid for data with some skewness or outliers if the sample sizes are large.
• A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest of the sample standard deviations.
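The rough criterion on standard deviations is easy to automate. This is a minimal sketch with a hypothetical helper name `sds_roughly_equal`; it uses sample standard deviations (n - 1 divisor) and is applied to the training-method data from the example later in this deck.

```python
import statistics


def sds_roughly_equal(groups):
    """Rough criterion: the largest sample SD is at most twice the smallest."""
    sds = [statistics.stdev(g) for g in groups]
    return max(sds) <= 2 * min(sds)


training = [[10, 9, 5], [11, 16, 9], [13, 8, 9], [18, 23, 25]]
print(sds_roughly_equal(training))  # True: the SDs range from about 2.65 to 3.61
```

It is only a screening rule; it does not replace looking at the data for skewness and outliers.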
What is ANOVA?
• Analysis of variance: a tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors.
If one categorical variable: one-way ANOVA
If two categorical variables: two-way ANOVA
F-Tests and ANOVA
• The ‘groups’ are from the independent variable’s values we wish to compare
• An estimate of the variance between groups is compared to an estimate of the variance within groups – this is the core concept of Analysis of variance (ANOVA)
• This is also called an F-test, since the main output statistic is F
ANOVA F-Test
$$F = \frac{\text{Variation between groups}}{\text{Variation within groups}} = \frac{MSG}{MSE}$$
Variation within groups small compared with variation between groups → Large F
Variation within groups large compared with variation between groups → Small F
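The two cases above can be seen by holding the group means fixed and changing only the spread inside each group. The data here are hypothetical and the helper `f_ratio` is a compact sketch of F = MSG / MSE for any number of groups.

```python
import statistics


def f_ratio(groups):
    """F = between-group mean square / within-group mean square (MSG / MSE)."""
    k, n = len(groups), sum(len(g) for g in groups)
    grand = statistics.fmean(x for g in groups for x in g)
    msg = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups) / (k - 1)
    mse = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups) / (n - k)
    return msg / mse


# Hypothetical data: in both cases the group means are 5 and 8.
tight = [[4, 5, 6], [7, 8, 9]]    # little variation within groups
spread = [[1, 5, 9], [4, 8, 12]]  # much more variation within groups
print(f_ratio(tight))   # 13.5: small within-group variation, large F
print(f_ratio(spread))  # 0.84375: large within-group variation, small F
```

The between-group mean square is 13.5 in both cases; only the within-group mean square changes (1 versus 16).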
What does ANOVA do?
At its simplest ANOVA tests the following hypotheses:
H0: The means of all the groups are equal.
Ha: Not all the means are equal
• The alternative doesn’t say how or which means differ.
• Can follow up with “multiple comparisons”
Note: we usually refer to the sub-populations as “groups” when doing ANOVA.
ANOVA Null and Alternative Hypotheses
Say the sample contains K independent groups
• ANOVA tests the null hypothesis
H0: μ1 = μ2 = … = μK
– That is, “the group means are all equal”
• The alternative hypothesis is
H1: μi ≠ μj for some i, j
– or, “the group means are not all equal”
Assumptions of ANOVA
• Each group is approximately normal
· check this by looking at histograms and/or normal quantile plots, or use assumptions
· can handle some nonnormality, but not severe outliers
• Standard deviations of each group are approximately equal
• The populations all have the same variance
One-way ANOVA Test Requirements
• There are k simple random samples from k populations
• The k samples are independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group
• The populations are normally distributed
• The populations have the same variance; that is, each treatment group has a population variance σ²
Computing the F-test Statistic for ANOVA
1. Compute the sample mean of the combined data set, x̄.
2. Find the sample mean of each treatment (sample), x̄i.
3. Find the sample variance of each treatment (sample), si².
4. Compute the mean square due to treatment, MST.
5. Compute the mean square due to error, MSE.
6. Compute the F-test statistic:
$$F = \frac{\text{mean square due to treatment}}{\text{mean square due to error}} = \frac{MST}{MSE}$$
where
$$MST = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2}{k - 1}, \qquad MSE = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{n - k}$$
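The six steps translate directly into code. This is a sketch with a hypothetical function name, `one_way_anova`, using only the formulas for MST and MSE given above; the usage runs it on the training-method data (M1 to M4) from the Microsoft example in this deck.

```python
import statistics


def one_way_anova(groups):
    """Return (MST, MSE, F) for a list of samples, one sample per treatment."""
    k = len(groups)                                       # number of treatments
    n = sum(len(g) for g in groups)                       # total sample size
    grand_mean = sum(sum(g) for g in groups) / n          # step 1: mean of combined data
    means = [statistics.mean(g) for g in groups]          # step 2: treatment means x̄i
    variances = [statistics.variance(g) for g in groups]  # step 3: treatment variances si²
    mst = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, means)) / (k - 1)   # step 4
    mse = sum((len(g) - 1) * v
              for g, v in zip(groups, variances)) / (n - k)  # step 5
    return mst, mse, mst / mse                            # step 6: F = MST / MSE


print(one_way_anova([[10, 9, 5], [11, 16, 9], [13, 8, 9], [18, 23, 25]]))
# (116.0, 10.0, 11.6)
```

Working from the group means and variances, rather than from every raw deviation, mirrors how the slide lists the steps.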
What are MSE and MST?
● MSE - mean square due to error, measures how different the observations within each sample are from each other.
It compares only observations within the same sample.
Larger values correspond to more spread within the samples.
This mean square is approximately the same as the population variance.
● MST - mean square due to treatment, measures how different the samples are from each other.
It compares the different sample means.
Larger values correspond to more spread sample means.
Under the null hypothesis, this mean square is approximately the same as the population variance.
General format of the ANOVA (Analysis of Variance) table:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-test Statistic | F Critical Value
Treatment | SST = Σ ni(x̄i - x̄)² | k - 1 | MST = SST / (k - 1) | MST / MSE | F(α; k - 1, n - k)
Error | SSE = Σ (ni - 1)si² | n - k | MSE = SSE / (n - k) | |
Total | SST + SSE | n - 1 | | |
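The table can be assembled row by row. In this sketch (hypothetical helper `anova_table`), the Total sum of squares is computed directly from deviations about the grand mean, so one can check that it equals the treatment SS plus the error SS and that the degrees of freedom add up, (k - 1) + (n - k) = n - 1. The usage again takes the training-method data from the example in this deck.

```python
def anova_table(groups):
    """Return one-way ANOVA rows: (source, sum of squares, df, mean square, F)."""
    k = len(groups)                               # number of treatments
    n = sum(len(g) for g in groups)               # total sample size
    grand_mean = sum(x for g in groups for x in g) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Treatment SS: sum of ni (x̄i - x̄)²
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error SS: sum of (ni - 1) si², i.e. squared deviations from each group's own mean
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total SS computed independently from the grand mean, as a check on the Total row
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)

    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n - k)
    return [
        ("Treatment", ss_treatment, k - 1, ms_treatment, ms_treatment / ms_error),
        ("Error", ss_error, n - k, ms_error, None),
        ("Total", ss_total, n - 1, None, None),
    ]


rows = anova_table([[10, 9, 5], [11, 16, 9], [13, 8, 9], [18, 23, 25]])
for row in rows:
    print(row)
```

The F critical value column is left out because it comes from an F-table rather than from the data.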
One-Way ANOVA Test Statistic
• Test statistic: F = MSA / MSW
• MSA is Mean Square Among (between-groups variance)
• MSW is Mean Square Within (error variance)
• Degrees of freedom
– df1 = c - 1
– df2 = n - c
• c = number of columns (populations, groups, or levels)
• n = total sample size
One-Way ANOVA Summary Table
• The F-statistic is the ratio of the Among estimate of the variance to the Within estimate of the variance. Because both are variance estimates, it can never be negative.
• If the null hypothesis of equal means is true, then this ratio should be close to 1.
• The degrees of freedom in the denominator (n - c) will typically be large, while the degrees of freedom in the numerator (c - 1) will be small. If the null hypothesis is false, the numerator is expected to be greater than the denominator, giving a large F.
One-Way ANOVA Example
• You’re a trainer for Microsoft Corp. Is there a difference in the mean learning times of 12 people using 4 different training methods (α = .05)?
M1 | M2 | M3 | M4
10 | 11 | 13 | 18
9 | 16 | 8 | 23
5 | 9 | 9 | 25
One-Way ANOVA Solution Template
• H0:
• H1:
• α =
• dfB =
• dfW =
• Critical Value(s):
Test Statistic:
Decision:
Conclusion:
One-Way ANOVA
• H0: μ1 = μ2 = μ3 = μ4
• H1: Not all equal
• α = .05
• dfB = 3 (k - 1 = 4 - 1)
• dfW = 8 (N - k = 12 - 4)
• Critical Value(s): F = 4.07 (reject H0 if F > 4.07)
Test Statistic:
F = MSA / MSW = 116 / 10 = 11.6
where MSA = SSA / dfB and MSW = SSW / dfW
Decision:
Reject H0 at α = .05, since 11.6 > 4.07.
Conclusion:
There is evidence that the population means are different.
Summary Table Solution
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square (Variance) | F
Among (Methods) | 4 - 1 = 3 | 348 | 116 | 11.6
Within (Error) | 12 - 4 = 8 | 80 | 10 |
Total | 12 - 1 = 11 | 428 | |
Two-Way Analysis of Variance (Two-Way ANOVA)
The two-way analysis of variance is an extension of the one-way analysis of variance. There are two independent variables (hence the name two-way).
• Assumptions
o The populations from which the samples were obtained must be normally or approximately normally distributed.
o The samples must be independent.
o The variances of the populations must be equal.
o The groups must have the same sample size.
Two-Way ANOVA
• We are interested in the effect of two categorical factors on the response.
• We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
– An interaction effect means that the effect of one factor on the response depends on the level of the other factor.
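For a design with two levels of each factor, the definition of interaction can be turned into a single number: the effect of factor A at the high level of B minus the effect of A at the low level of B (a "difference of differences" of cell means). The helper name `interaction_contrast` and the cell means below are hypothetical, chosen only to show one case with no interaction and one with interaction.

```python
def interaction_contrast(cell_means):
    """2 x 2 design: change in the effect of factor A between the two levels of factor B.

    cell_means maps (level of A, level of B) to the mean response in that cell.
    A result of 0 means the effect of A is the same at both levels of B (no interaction).
    """
    effect_a_when_b_low = cell_means[("high", "low")] - cell_means[("low", "low")]
    effect_a_when_b_high = cell_means[("high", "high")] - cell_means[("low", "high")]
    return effect_a_when_b_high - effect_a_when_b_low


# Hypothetical cell means: (A level, B level) -> mean response.
no_interaction = {("low", "low"): 10, ("high", "low"): 14, ("low", "high"): 12, ("high", "high"): 16}
interaction = {("low", "low"): 10, ("high", "low"): 14, ("low", "high"): 12, ("high", "high"): 22}
print(interaction_contrast(no_interaction))  # 0: raising A adds 4 at both levels of B
print(interaction_contrast(interaction))     # 6: raising A adds 4 when B is low, 10 when B is high
```

With real samples this contrast is almost never exactly 0; the interaction F-test in the two-way ANOVA judges whether it is larger than chance variation would explain.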
Why Two-Way ANOVA?
• Examines the effect of:
– Two factors on the dependent variable
e.g., percent carbonation and line speed in a soft drink bottling process
– Interaction between the different levels of these two factors
e.g., Does the effect of one particular percentage of carbonation depend on the level at which the line speed is set?
Interaction
[Figure: two panels plotting Response against Factor A (Low, High), each with one line for Factor B Low and one for Factor B High. Left panel: "No Interaction". Right panel: "Interaction".]
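Two-Way ANOVA Model
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad \varepsilon_{ijk} \sim N(0, \sigma^2)$$
$$i = 1, \ldots, a; \quad j = 1, \ldots, b; \quad k = 1, \ldots, n$$
Where
$y_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level
$\mu$ is the overall mean
$\alpha_i$ is the main effect of the ith level of factor A
$\beta_j$ is the main effect of the jth level of factor B
$(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B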
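Hypothesis Testing Model for Two-Way ANOVA
Hypothesis testing can be done in this way (shown for factor A):
• $H_0^A: \alpha_1^* = \alpha_2^* = \cdots = \alpha_a^*$
• $H_a^A$: At least two of the $\alpha_i^*$'s are different.
Analogous hypotheses are tested for factor B and for the A × B interaction, each with its own F-test.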
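A sketch of how the F ratios for factor A, factor B and the interaction can be computed for a balanced design (the same number of observations n in every cell, as the assumptions require). The slides do not give these formulas; the code uses the standard balanced-design sums of squares built from cell means, marginal means and the grand mean, and forms each F as that effect's mean square divided by the error mean square. The function name `two_way_anova` and the data are hypothetical.

```python
import statistics


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of n observations at level i of factor A and level j
    of factor B. Returns {source: (SS, df, MS, F)}; F is None for the Error row.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[statistics.fmean(data[i][j]) for j in range(b)] for i in range(a)]
    grand = statistics.fmean(x for row in data for c in row for x in c)
    mean_a = [statistics.fmean(cell[i]) for i in range(a)]                        # factor A level means
    mean_b = [statistics.fmean(cell[i][j] for i in range(a)) for j in range(b)]  # factor B level means

    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum((cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell[i][j]) ** 2
               for i in range(a) for j in range(b) for x in data[i][j])

    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_e}
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ms = {src: ss[src] / df[src] for src in ss}
    return {src: (ss[src], df[src], ms[src],
                  ms[src] / ms["Error"] if src != "Error" else None) for src in ss}


# Hypothetical 2 x 2 design with n = 2 observations per cell.
data = [[[10, 12], [14, 16]],
        [[13, 15], [22, 24]]]
table = two_way_anova(data)
for source, (ss, df, ms, f) in table.items():
    print(source, ss, df, ms, f)
```

Each F would then be compared with the table value F(α; effect df, error df). In a balanced design the four sums of squares add up to the total sum of squares about the grand mean.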
Advantages of the two-way ANOVA
• Usually requires a smaller total sample size, since you’re studying two things at once [rat diet example]
• Removes some of the random variability (some of the random variability is now explained by the second factor, so you can more easily find significant differences)
• We can look at interactions between factors (a significant interaction means the effect of one variable changes depending on the level of the other factor).
Why Not Two One-Way ANOVAs?
• Why not do two one-way ANOVAs instead of a two-way?
– Two one-way ANOVAs can only stand in for a two-way ANOVA when the factors have completely independent effects on the response (no interaction), and even then they cannot test for an interaction.
As a result:
• ANOVA can in fact be n-way. Any good statistics package can compute an ANOVA with more than two factors, with F-tests and a general ANOVA table.
Thank you…