Date post: 17-Jan-2016
Upload: aron-watson
The Analysis of Variance
Page 1: The Analysis of Variance. One-Way ANOVA  We use ANOVA when we want to look at statistical relationships (difference in means for example) between more.

The Analysis of Variance

One-Way ANOVA

We use ANOVA when we want to look at statistical relationships (difference in means for example) between more than 2 populations or samples

ANOVA is a natural extension of ideas used in 2-pop t-tests and other methods we have explored

Trouble on the School Board!

Despite the school board’s best efforts – sensitive test score data for a large urban school district was leaked to the press!

The issue is a long-standing argument that children in the inner city do not receive the same quality of education as children in the suburban parts of the city. This could be very embarrassing for both the board and the mayor!

Here’s the data

A school board official states: “The data is roughly normally distributed and is what you would expect for a random sample of 90 students – 30 from each of the East, Central and West districts”

NOT SO FAST! Take a closer look at the data – check for “structure”

Our investigative reporter took Stats 300 in college! Here is what she did:

Sort the data into East, Central and West “bins”

The box plot suggests a cover-up!

Digging further… the full set becomes

Further tests…Thanks StatsMan!

Summary of the 3 data sets:

Is there a statistical hypothesis lurking about?

The Hypotheses

Let μ1, μ2, and μ3 be the mean scores for the three populations:

Pop 1 = East, Pop 2 = Central, Pop 3 = West

Ho: μ1 = μ2 = μ3

Ha: ?

The null hypothesis is pretty straightforward

Why is this a problem?

Could we do this with pairwise t-tests (one two-sample t-test for each pair of districts)?

YES!

What does this imply?

We have good evidence to reject the null hypothesis – the Central district scores are significantly lower than those of the other two districts.

Could we just use pairwise t-tests?

If we had 12 school districts to test in the same way as in the previous case, how would the analysis change? How many pairs would there be? How many false positives would we expect at a 95% confidence level?
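
The arithmetic behind these questions can be sketched in a few lines. This is an illustration only: the function name is made up for the example, the expected count assumes every null hypothesis is actually true, and the family-wise figure additionally assumes the tests are independent, which pairwise tests that share data are not, so it should be read as an approximation.

```python
from math import comb

def pairwise_test_burden(groups, alpha=0.05):
    """Count pairwise comparisons and the false positives they invite."""
    pairs = comb(groups, 2)                  # number of distinct pairs of groups
    expected_false_positives = pairs * alpha  # expected count if every Ho is true
    # Chance of at least one false positive, assuming independent tests
    familywise_rate = 1 - (1 - alpha) ** pairs
    return pairs, expected_false_positives, familywise_rate

pairs, expected_fp, fwer = pairwise_test_burden(12)
print(pairs, round(expected_fp, 2), round(fwer, 3))
```

For 12 districts this gives 66 pairs and about 3.3 expected false positives, which is why testing pair by pair becomes unreliable as the number of groups grows.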

Why we shouldn’t just run multiple pairwise t-tests, and why we should consider the entire set:

1. As the number of pairs increases, the chance of a false positive (an erroneous rejection of a true null hypothesis) increases

2. By pooling all of the information (not just pairs) we get a much more precise value for the standard deviation in the population

3. By treating all of the data together we can potentially detect interesting correlations between subgroups – these could easily be overlooked if we approached the data in a pair-wise fashion.

• Decreases the chance of false positives

• Pooling gives more precision in statistics

• Detect interesting correlations

Setting up for ANOVA

You guessed it – yet more terminology!

In 12.1 and 12.2 we will introduce:

• A method to get an estimate for the standard deviation for the entire population (Pooled Estimator)

• A new spin on degrees of freedom (df)

• A new test for significance – the F-test

Pooled Estimator for σ

This is a generalization of the pooled method we used in two-sample t-tests:

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_I - 1)s_I^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_I - 1)} $$

This expression begins to measure the total variation in a population. Each $s_i^2$ term measures variation within a given sample. “I” represents the total number of independent SRS’s

Sigma Rule…

If the largest standard deviation in a set of I SRS’s is less than twice as large as the smallest then we can approximate the standard deviation by using the pooled estimator.

Example: What is the pooled estimate for sigma for the 3 school districts?

I = 3 (East, Central, West are SRS’s), n1 = n2 = n3 = 30

$$ s_p^2 = \frac{(30-1)(35.04)^2 + (30-1)(33.56)^2 + (30-1)(26.13)^2}{(30-1) + (30-1) + (30-1)} $$

$$ s_p^2 = 1012.28 \quad\Rightarrow\quad s_p \approx 31.8 $$
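
The same calculation can be checked with a short script. This is a minimal sketch: the function names `pooled_variance` and `sigma_rule_ok` are invented for the example, and the inputs are the three sample sizes and standard deviations quoted above.

```python
from math import sqrt

def pooled_variance(sizes, sds):
    """Pooled estimate of the variance: weighted average of the sample variances."""
    numerator = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    denominator = sum(n - 1 for n in sizes)  # equals N - I
    return numerator / denominator

def sigma_rule_ok(sds):
    """Sigma Rule: largest standard deviation is less than twice the smallest."""
    return max(sds) < 2 * min(sds)

sizes = [30, 30, 30]
sds = [35.04, 33.56, 26.13]

sp2 = pooled_variance(sizes, sds)
print(sigma_rule_ok(sds), round(sp2, 2), round(sqrt(sp2), 1))
```

With equal sample sizes the pooled variance is just the plain average of the three sample variances, which is why the result sits between the largest and smallest of them.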

Part II – Developing the F-Test

Conceptual Model

A collection of SRS’s drawn from a larger population illustrates two different kinds of variation:

• Internal variation around a sample mean within a given SRS

• Variation of the SRS means around the overall population mean

ANOVA compares the two kinds of variability

The null hypothesis often is equivalent to saying that the populations overlap (have the same mean for example)

Another way of saying this is that the SRS’s share the “grand mean” of the entire population

This could happen if the individual SRS’s have large variation internally but not externally

We need a way to quantify this

Ways of quantifying variation

The F-Value

We can compare variation between samples with the variation within samples by calculating the Mean Square of the error in both cases.

This is expressed as:

$$ F = \frac{MS(\text{between})}{MS(\text{within})} $$

We will get to F-distributions in a few moments

Mean Square Error – MSE(within)

This is what the pooled estimator determines:

$$ s_p^2 = MSE(\text{within}) $$

This means that our school board data has an internal MSE of $(31.8)^2$

Mean Square Error – MSE(between)

To determine this we need the “grand mean” for all of the data:
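
One standard way to write the grand mean uses the sample sizes $n_i$ and the sample means $\bar{x}_i$. The symbol $\bar{x}_{grand}$ is a notational assumption made here. The numbers are the three district means used in the school board example; because the three samples are the same size, the grand mean reduces to the plain average of the sample means.

```latex
\bar{x}_{grand} = \frac{\sum_{i=1}^{I} n_i \bar{x}_i}{N},
\qquad N = \sum_{i=1}^{I} n_i

\bar{x}_{grand} = \frac{30(649) + 30(548) + 30(635)}{90}
                = \frac{649 + 548 + 635}{3} \approx 611
```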

Mean Square Error – MSE(between)

Define as:

$$ MS(\text{between}) = \frac{\sum_i n_i (\bar{x}_i - \bar{x}_{grand})^2}{df(\text{between})} = \frac{\sum_i n_i (\bar{x}_i - \bar{x}_{grand})^2}{I - 1} $$

A new application of the idea of degrees of freedom

Example – school board data:

$$ MS(\text{between}) = \frac{30(649-611)^2 + 30(548-611)^2 + 30(635-611)^2}{3-1} = 89835 $$

We can now determine the “F-Value” for this data:

$$ F = \frac{MS_b}{MS_w} = \frac{89835}{(31.8)^2} \approx 88.8 $$
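
As a cross-check, the whole calculation can be run from the summary statistics alone. This is a sketch under stated assumptions: the function name `f_from_summary` is invented for the example, the means and standard deviations are taken in the order they appear in the worked examples, and the within-groups degrees of freedom are taken as N − I, the denominator of the pooled estimator.

```python
def f_from_summary(sizes, means, sds):
    """One-way ANOVA F-value from group sizes, means and standard deviations."""
    n_total = sum(sizes)
    k = len(sizes)  # number of groups, I in the slides
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Variation of the group means around the grand mean
    ms_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(sizes, means)) / (k - 1)
    # Pooled within-group variance
    ms_within = sum((n - 1) * s ** 2
                    for n, s in zip(sizes, sds)) / (n_total - k)
    return grand_mean, ms_between, ms_within, ms_between / ms_within

grand, ms_b, ms_w, f_value = f_from_summary(
    [30, 30, 30], [649, 548, 635], [35.04, 33.56, 26.13]
)
print(round(grand, 1), round(ms_b), round(ms_w, 2), round(f_value, 1))
```

Because the script keeps the unrounded grand mean (about 610.7 rather than 611) and the unrounded pooled variance, its results differ slightly from the hand calculation: MS(between) comes out near 89830 and F near 88.7. The conclusion is unchanged.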

I Don’t Get It!

Confused? We are almost there.

We now know how to quantify the variation within SRS’s (MSw) and the variation between the means of the SRS’s (MSb)

The “F-ratio” can be compared against tables just like we did for z-tests and t-tests
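
To make the two kinds of variation concrete, here is a minimal sketch that computes MSb, MSw and their ratio directly from raw data. The function name `one_way_f` and the tiny data set are hypothetical, chosen only so the arithmetic is easy to follow by hand; they are not the school board scores. The within-groups degrees of freedom are taken as N − I.

```python
def one_way_f(groups):
    """F-ratio for a one-way ANOVA computed from raw observations."""
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between: how far each group mean sits from the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within: how far each observation sits from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return ms_between / ms_within

# Hypothetical toy data: three groups of three observations each
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_toy = one_way_f(toy_groups)
print(round(f_toy, 3))
```

In the toy data every group has the same internal spread, so a large ratio can only come from the group means sitting far apart, which is exactly what the F-ratio is designed to detect.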

How to Use an “F-ratio”

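You need to know some important numbers:

The number of SRS’s (I); from this we form the degrees of freedom for the MSb term: dfb = I − 1

The total number of data points (the pooled data) = N; the degrees of freedom for the MSw term is dfw = N − I, the same denominator that appears in the pooled estimator

The F-ratio tests the null hypothesis (i.e., that the means are equal)

If Ho is true, then F ≈ 1

$$ F = \frac{MS_b}{MS_w} $$

MSb is the numerator (with dfb degrees of freedom); MSw is the denominator (with dfw degrees of freedom).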

Testing the School Board’s Claim

The school board’s claim was that there was no difference between the three districts’ mean test scores.

Since there were 90 students (N = 90) and 3 groups (I = 3), we should use the F(I−1, N−I) = F(2, 87) distribution

So … use Table E with F = 88.8 and (2, 87) degrees of freedom. Since 87 is not listed we need to approximate. You should be able to bracket the p-value between an upper and a lower bound. (The p-value is the probability of seeing an F-ratio at least this large if Ho is true, not the probability that Ho is true.)

With an F-ratio as big as 88.8 you really don’t normally need to look it up – you know Ho is false!
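
When there are exactly three groups, the numerator has 2 degrees of freedom and the upper-tail probability of the F distribution has a simple closed form, so the p-value can be sketched without a table. This is an illustration for that special case only; the function name is invented for the example, and the within-groups degrees of freedom are taken as N − I = 90 − 3 = 87.

```python
def f_upper_tail_two_df(f_value, df_within):
    """P(F > f_value) for an F distribution with (2, df_within) degrees of freedom.

    Valid only when the numerator has exactly 2 degrees of freedom,
    i.e. when three groups are being compared.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)

p_value = f_upper_tail_two_df(88.8, 87)
print(p_value)
```

The result is many orders of magnitude below any conventional significance level, which matches the remark that an F-ratio this large hardly needs looking up. For other numbers of groups a statistics package or table is needed.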

Use Minitab or EXCEL

Life is short! ANOVA is a complex (number intensive) process. Let’s look at two approaches: Minitab

Next lecture …

We will spend next lecture working through several examples of ANOVA

When doing this keep in mind what it is that you are calculating

Don’t get overwhelmed by the detail!

