F-Test ( ANOVA ) & Two-Way ANOVA Hande Ürkmeyen Tourism & Hotel Management.


F-Test ( ANOVA ) & Two-Way ANOVA

Hande Ürkmeyen

Tourism & Hotel Management

What is F-test?

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled.

The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.

The F-test is designed to test if two population variances are equal. It does this by comparing the ratio of two variances. So, if the variances are equal, the ratio of the variances will be 1.

There are several different F-tables. Each one has a different level of significance. So, find the correct level of significance first, and then look up the numerator degrees of freedom and the denominator degrees of freedom to find the critical value.

The numerator degrees of freedom will be the degrees of freedom for whichever sample has the larger variance (since it is in the numerator) and the denominator degrees of freedom will be the degrees of freedom for whichever sample has the smaller variance (since it is in the denominator).

The F-test statistic is found by dividing the between group variance by the within group variance. The degrees of freedom for the numerator are the degrees of freedom for the between group (k-1) and the degrees of freedom for the denominator are the degrees of freedom for the within group (N-k).

Characteristics of the F Distribution

• The values of F cannot be negative.
• The distribution is positively skewed.
• The mean value of F is approximately equal to 1.
• The F distribution is a family of curves based on the degrees of freedom of the variance of the numerator and denominator.
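As a short aside on why the mean is only approximately 1: writing $d_1$ and $d_2$ for the numerator and denominator degrees of freedom (symbols assumed here, not used on the slides), the mean of the F distribution depends only on $d_2$:

```latex
E[F] = \frac{d_2}{d_2 - 2}, \qquad d_2 > 2
% d_2 = 8   gives E[F] = 8/6   \approx 1.33
% d_2 = 100 gives E[F] = 100/98 \approx 1.02
```

So the mean is always slightly above 1 and approaches 1 as the denominator degrees of freedom grow.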

How to use the F-test?

1) Invoke the null hypothesis that states that the two variances we are comparing are from the same population. (i.e., they are not statistically different)

2) Calculate the F value (the ratio of the two variances)

3) Look up the table value of F for the degrees of freedom used to calculate both variances and for a given confidence level.

4) If the calculated F is greater than the table value, reject the null hypothesis. Otherwise, the two variances could have come from the same population of measurements.
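The four steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names and the two samples are hypothetical, and the table value 6.39 is assumed to be the upper 5% point of F with (4, 4) degrees of freedom as read from a standard F-table.

```python
from statistics import variance


def variance_ratio_f(sample_a, sample_b):
    """Return (F, df_numerator, df_denominator), larger sample variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    df_a, df_b = len(sample_a) - 1, len(sample_b) - 1
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


def reject_equal_variances(f_value, table_value):
    """Step 4: reject the null hypothesis when F exceeds the table value."""
    return f_value > table_value


# Hypothetical measurements (not from the slides).
sample_a = [10, 12, 14, 16, 18]  # sample variance 10
sample_b = [11, 12, 13, 14, 15]  # sample variance 2.5

f_value, df_num, df_den = variance_ratio_f(sample_a, sample_b)
table_value = 6.39  # assumed: 5% F-table entry for (4, 4) degrees of freedom
decision = reject_equal_variances(f_value, table_value)
print(f_value, df_num, df_den, decision)
```

Here F = 10 / 2.5 = 4.0, which is below the table value, so the null hypothesis of equal variances is not rejected.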

Assumptions for the F-test

• Samples are independent random samples.
• Distribution of response variable is a normal curve within each population.
• Different populations may have different means.
• All populations have the same standard deviation, σ.

Conditions for Using the F-Test

• F-statistic can be used if data are not extremely skewed, there are no extreme outliers, and group standard deviations are not markedly different.

• Tests based on F-statistic are valid for data with skewness or outliers if sample sizes are large.

• A rough criterion for standard deviations is that the largest of the sample standard deviations should not be more than twice as large as the smallest of the sample standard deviations.
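The rough criterion for standard deviations can be checked in a few lines. This is only a sketch: the function name `sd_ratio_ok` and the sample groups are hypothetical.

```python
from statistics import stdev


def sd_ratio_ok(groups, max_ratio=2.0):
    """True if the largest sample SD is at most max_ratio times the smallest."""
    sds = [stdev(group) for group in groups]
    return max(sds) <= max_ratio * min(sds)


# Hypothetical groups (not from the slides).
similar_spread = [[4, 6, 8], [3, 6, 9]]                 # SDs 2 and 3
unequal_spread = [[4, 6, 8], [1, 5, 9], [10, 11, 12]]   # SDs 2, 4 and 1

print(sd_ratio_ok(similar_spread))
print(sd_ratio_ok(unequal_spread))
```

The first set passes (3 is less than twice 2); the second fails (4 is more than twice 1).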

What is ANOVA?

• Analysis of variance: tool for analyzing how the mean value of a quantitative response variable is affected by one or more categorical explanatory factors.

If one categorical variable: one-way ANOVA

If two categorical variables: two-way ANOVA

F-Tests and ANOVA

• The ‘groups’ are from the independent variable’s values we wish to compare

• An estimate of the variance between groups is compared to an estimate of the variance within groups – this is the core concept of Analysis of variance (ANOVA)

• This is also called an F test, since that’s the name of the main output variable

ANOVA F-Test

$F = \dfrac{\text{Variation between groups}}{\text{Variation within groups}}$

Variation within groups small compared with variation between groups → Large F

Variation within groups large compared with variation between groups → Small F

What does ANOVA do?

At its simplest ANOVA tests the following hypotheses:

H0: The means of all the groups are equal.

Ha: Not all the means are equal

• ANOVA doesn’t say how or which ones differ.
• Can follow up with “multiple comparisons”.

Note: we usually refer to the sub-populations as “groups” when doing ANOVA.

ANOVA Null and Alternative Hypotheses

Say the sample contains K independent groups

• ANOVA tests the null hypothesis

H0: μ1 = μ2 = … = μK

– That is, “the group means are all equal”

• The alternative hypothesis is

H1: μi ≠ μj for some i, j

– or, “the group means are not all equal”

Assumptions of ANOVA

• Each group is approximately normal
  · check this by looking at histograms and/or normal quantile plots, or use assumptions
  · can handle some nonnormality, but not severe outliers
• Standard deviations of each group are approximately equal
• The populations all have the same variance

One-way ANOVA Test Requirements

• There are k simple random samples from k populations

• The k samples are independent of each other; that is, the subjects in one group cannot be related in any way to subjects in a second group

• The populations are normally distributed

• The populations have the same variance; that is, each treatment group has a population variance σ²

Computing the F-test Statistic for ANOVA

1. Compute the sample mean of the combined data set, $\bar{x}$

2. Find the sample mean of each treatment (sample), $\bar{x}_i$

3. Find the sample variance of each treatment (sample), $s_i^2$

4. Compute the mean square due to treatment, MST

5. Compute the mean square due to error, MSE

6. Compute the F-test statistic:

$$F = \frac{\text{mean square due to treatment}}{\text{mean square due to error}} = \frac{MST}{MSE}$$

$$MST = \frac{\sum n_i(\bar{x}_i - \bar{x})^2}{k - 1}, \qquad MSE = \frac{\sum (n_i - 1)s_i^2}{n - k}$$
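The six steps can be followed literally in code. The sketch below uses only the Python standard library; the function name `one_way_anova_f` and the three small groups are hypothetical and chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (MST, MSE, F) for a list of samples, one list per treatment."""
    k = len(groups)                          # number of treatments
    n = sum(len(group) for group in groups)  # total sample size
    grand_mean = mean([x for group in groups for x in group])  # step 1
    group_means = [mean(group) for group in groups]            # step 2
    group_vars = [variance(group) for group in groups]         # step 3
    ss_treatment = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    ss_error = sum(
        (len(group) - 1) * v for group, v in zip(groups, group_vars)
    )
    mst = ss_treatment / (k - 1)             # step 4
    mse = ss_error / (n - k)                 # step 5
    return mst, mse, mst / mse               # step 6


# Hypothetical data (not from the slides).
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
mst, mse, f_stat = one_way_anova_f(groups)
print(mst, mse, f_stat)
```

For this data the grand mean is 4 and the group means are 2, 3 and 7, so MST = 3(4 + 1 + 9) / 2 = 21, MSE = 6 / 6 = 1 and F = 21.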

What is MSE and MST?

● MSE - mean square due to error, measures how different the observations, within each sample, are from each other
  – It compares only observations within the same sample
  – Larger values correspond to more spread within the samples
  – This mean square is approximately the same as the population variance

● MST - mean square due to treatment, measures how different the samples are from each other
  – It compares the different sample means
  – Larger values correspond to more spread sample means
  – Under the null hypothesis, this mean square is approximately the same as the population variance

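General format of the ANOVA (Analysis of Variance) table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F-test Statistic | F Critical Value |
|---|---|---|---|---|---|
| Treatment | $\sum n_i(\bar{x}_i - \bar{x})^2$ | k - 1 | MST | MST/MSE | $F_{\alpha,\,k-1,\,n-k}$ |
| Error | $\sum (n_i - 1)s_i^2$ | n - k | MSE | | |
| Total | SST + SSE | n - 1 | | | |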

One-Way ANOVA Test Statistic

• Test statistic
  – F = MSA / MSW

• MSA is Mean Square Among or Between Variance

• MSW is Mean Square Within or Error Variance

• Degrees of freedom
  – df1 = c - 1
  – df2 = n - c

• c = # Columns (populations, groups, or levels)

• n = Total sample size

One-Way ANOVA Summary Table

• The F-statistic is the ratio of the Among estimate of the variance and the Within estimate of the variance. Therefore, it can never be negative.

• If the null hypothesis of equal means is true, then this ratio should be close to 1.

• The degrees of freedom in the denominator will typically be large (n - c), while the degrees of freedom in the numerator will be small (c - 1). If the group means differ, the numerator is expected to be greater than the denominator.

One-Way ANOVA Example

• You’re a trainer for Microsoft Corp. Is there a difference in mean learning times of 12 people using 4 different training methods (α = .05)?

| M1 | M2 | M3 | M4 |
|---|---|---|---|
| 10 | 11 | 13 | 18 |
| 9 | 16 | 8 | 23 |
| 5 | 9 | 9 | 25 |

One-Way ANOVA Solution Template

• H0:
• H1:
• α =
• dfB =
• dfW =
• Critical Value(s):
• Test Statistic:
• Decision:
• Conclusion:


Summary Table (Partially Completed)

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square (Variance) | F |
|---|---|---|---|---|
| Among (Methods) | | | | |
| Within (Error) | | | | |

One-Way ANOVA

• H0: μ1 = μ2 = μ3 = μ4
• H1: Not all equal
• α = .05
• dfB = 3 (k - 1 = 4 - 1)
• dfW = 8 (N - k = 12 - 4)
• Critical Value(s): F = 4.07

Test Statistic:

F = MSA / MSW = 116 / 10 = 11.6

where MSA = SSA / dfB and MSW = SSW / dfW

Decision:

Reject H0 at α = .05, since 11.6 > 4.07

Conclusion:

There is evidence that the population means are different.

Summary Table Solution

| Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square (Variance) | F |
|---|---|---|---|---|
| Among (Methods) | 4 - 1 = 3 | 348 | 116 | 11.6 |
| Within (Error) | 12 - 4 = 8 | 80 | 10 | |
| Total | 12 - 1 = 11 | 428 | | |
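For readers who want to check the table by hand, the arithmetic can be written out as follows, reading the example data as M1 = (10, 9, 5), M2 = (11, 16, 9), M3 = (13, 8, 9), M4 = (18, 23, 25). The symbols $\bar{x}_j$ for the method means and $\bar{x}$ for the overall mean are notation assumed for this sketch.

```latex
\bar{x}_1 = 8, \quad \bar{x}_2 = 12, \quad \bar{x}_3 = 10, \quad \bar{x}_4 = 22, \quad \bar{x} = \frac{156}{12} = 13

SSA = 3\left[(8-13)^2 + (12-13)^2 + (10-13)^2 + (22-13)^2\right] = 3(25 + 1 + 9 + 81) = 348

SSW = (4+1+9) + (1+16+9) + (9+4+1) + (16+1+9) = 14 + 26 + 14 + 26 = 80

MSA = \frac{348}{3} = 116, \qquad MSW = \frac{80}{8} = 10, \qquad F = \frac{116}{10} = 11.6
```

The total sum of squares is 348 + 80 = 428, matching the last row of the table.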

Two-Way Analysis of Variance (Two-Way ANOVA)

The two-way analysis of variance is an extension to the one-way analysis of variance. There are two independent variables (hence the name two-way).

• Assumptions
  o The populations from which the samples were obtained must be normally or approximately normally distributed.
  o The samples must be independent.
  o The variances of the populations must be equal.
  o The groups must have the same sample size.

Two-Way ANOVA

• We are interested in the effect of two categorical factors on the response.

• We are interested in whether either of the two factors has an effect on the response and whether there is an interaction effect.
  – An interaction effect means that the effect on the response of one factor depends on the level of the other factor.
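To make the idea of an interaction concrete, the sketch below compares the effect of factor A at each level of factor B in a 2 x 2 layout. The function name and both sets of cell means are hypothetical; a value of zero means the effect of A is the same at both levels of B (no interaction in the cell means).

```python
def interaction_contrast(cell_means):
    """Difference of differences for a 2 x 2 table of cell means.

    cell_means[a][b] is the mean response at level a of factor A
    and level b of factor B (index 0 = low, 1 = high).
    """
    effect_of_a_at_b_low = cell_means[1][0] - cell_means[0][0]
    effect_of_a_at_b_high = cell_means[1][1] - cell_means[0][1]
    return effect_of_a_at_b_high - effect_of_a_at_b_low


# Hypothetical cell means (not from the slides).
no_interaction = [[10, 14], [13, 17]]    # A raises the response by 3 at both B levels
with_interaction = [[10, 14], [13, 22]]  # A raises it by 3 at B low, by 8 at B high

print(interaction_contrast(no_interaction))
print(interaction_contrast(with_interaction))
```

Whether a non-zero contrast is statistically significant is what the interaction F-test in the two-way ANOVA table decides; this sketch only shows the quantity being tested.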

Why Two-Way ANOVA?

• Examines the Effect of:
  – Two Factors on the Dependent Variable
    e.g., Percent Carbonation and Line Speed on Soft Drink Bottling Process
  – Interaction Between the Different Levels of these Two Factors
    e.g., Does the effect of one particular percentage of Carbonation depend on which level the line speed is set?

Interaction

[Figure: two plots of the response against Factor A (Low, High), each with one line for Factor B Low and one line for Factor B High. Panel titles: "No Interaction" and "Interaction".]

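Two-Way ANOVA Model

$$x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1, \dots, a; \quad j = 1, \dots, b; \quad k = 1, \dots, n$$

• $x_{ijk}$ is the response of the kth trial on the ith factor A level and the jth factor B level
• $\mu$ is the overall mean
• $\alpha_i$ is the main effect of the ith level of factor A
• $\beta_j$ is the main effect of the jth level of factor B
• $(\alpha\beta)_{ij}$ is the interaction effect of the ith level of factor A and the jth level of factor B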
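The terms of the model (overall mean, main effects, interaction effects) can be estimated from cell means when every cell has the same number of trials. The sketch below assumes such a balanced design; the function name `two_way_effects` and the data are hypothetical.

```python
from statistics import mean


def two_way_effects(data):
    """Estimate the model terms for a balanced two-factor layout.

    data[i][j] is the list of responses at level i of factor A and
    level j of factor B. Returns (mu, alpha, beta, interaction).
    """
    a, b = len(data), len(data[0])
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    mu = mean([cell[i][j] for i in range(a) for j in range(b)])
    alpha = [mean(cell[i]) - mu for i in range(a)]
    beta = [mean([cell[i][j] for i in range(a)]) - mu for j in range(b)]
    interaction = [
        [cell[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return mu, alpha, beta, interaction


# Hypothetical balanced data: 2 levels of A, 2 levels of B, 2 trials per cell.
data = [
    [[9, 11], [13, 15]],   # A level 1: cell means 10 and 14
    [[12, 14], [21, 23]],  # A level 2: cell means 13 and 22
]
mu, alpha, beta, interaction = two_way_effects(data)
print(mu, alpha, beta, interaction)
```

With this data the overall mean is 14.75, the main effects of A are -2.75 and 2.75, those of B are -3.25 and 3.25, and the interaction terms are plus or minus 1.25. Each set of effects sums to zero, which is the usual constraint on these estimates.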

Hypothesis Testing Model for Two-Way ANOVA

Hypothesis testing can be done in this way:

• For $H_0^A$: $\alpha_1^* = \alpha_2^* = \dots = \alpha_a^*$

• For $H_a^A$: At least two of the $\alpha_i^*$’s are different

Advantages of the two-way ANOVA

• Usually have a smaller total sample size, since you’re studying two things at once [rat diet example]

• Removes some of the random variability (some of the random variability is now explained by the second factor, so you can more easily find significant differences)

• We can look at interactions between factors (a significant interaction means the effect of one variable changes depending on the level of the other factor).

Why Not Two One-Way ANOVAs?

• Why not do two one-way ANOVAs instead of a two-way?

– The results of two one-way ANOVAs will be identical to the two-way when the factors have completely independent results from each other.

As a result:

• Actually, ANOVA can be n-way. Any good statistics package can compute ANOVA with more than 2 factors with F-test and general ANOVA table.

Hande Ürkmeyen

Tourism & Hotel Management

Thank you…
