
Chapter 9 An Introduction to Analysis of Variance

Date post: 31-Dec-2015
Page 1: Chapter 9 An Introduction to  Analysis of Variance

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

Chapter 9
An Introduction to Analysis of Variance

Terry Dielman
Applied Regression Analysis: A Second Course in Business and Economic Statistics, fourth edition

Page 2: Chapter 9 An Introduction to  Analysis of Variance

ANOVA

Analysis of variance was a term used in regression to describe how we split the variation in our sample into "explained" and "unexplained" parts.

In this chapter we will look at some other ANOVA procedures where the model doing the "explaining" is different.

Page 3: Chapter 9 An Introduction to  Analysis of Variance

9.1 One-Way Analysis of Variance

Consider a problem that has K populations. We can write:

$$y_{ij} = \mu_i + e_{ij}$$

The notation:

$y_{ij}$ is the jth observation in population i
$\mu_i$ is the mean for population i
$e_{ij}$ is a random disturbance

The population index i ranges from 1 to K and the observation index j from 1 to $n_i$.

Page 4: Chapter 9 An Introduction to  Analysis of Variance

Sample Sizes

The subscript on $n_i$ implies that the sample sizes can differ, although it is often better if they are about equal in size.

Our combined (overall) sample size will be denoted without a subscript as just n.

It is the sum of the K individual $n_i$.

Page 5: Chapter 9 An Introduction to  Analysis of Variance

Assumptions About Disturbances

We make the same assumptions as in regression analysis:

1. The $e_{ij}$ have mean 0.

2. The $e_{ij}$ have constant variance $\sigma_e^2$.

3. The $e_{ij}$ are normally distributed.

Page 6: Chapter 9 An Introduction to  Analysis of Variance

ANOVA Terminology

ANOVA has its own terminology. The dependent variable y is said to differ due to factors (here, different populations).

A level of a factor is a particular population.

In one-way ANOVA we often refer to the factor levels as treatments.

Page 7: Chapter 9 An Introduction to  Analysis of Variance

Alternative Representation

We can rewrite the model to show the treatment effects $\tau_i$.

Suppose we let the overall mean be denoted $\mu$. The alternate form is:

$$y_{ij} = \mu + \tau_i + e_{ij}$$

A factor-level mean is $\mu_i = \mu + \tau_i$

Page 8: Chapter 9 An Introduction to  Analysis of Variance

Hypothesis Test

The question we want to answer is "are all the population means equal?"

The hypotheses for this are:

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_K$

$H_a$: At least one $\mu_i$ is different

An equivalent null hypothesis is that all the treatment effects are the same, that is, all zero.

Page 9: Chapter 9 An Introduction to  Analysis of Variance

The F Test

As in regression, we perform the test by partitioning the variation in the sample.

We have unexplained variation (SSE) and explained variation (SSTR), which is a function of the difference in treatment means.

After dividing by the appropriate degrees of freedom, the F statistic is a ratio of mean squares:

F = MSTR/MSE

Page 10: Chapter 9 An Introduction to  Analysis of Variance

Computing SSTR

Compute an overall mean and a mean for each sample:

$$\bar{y}_{..} = \frac{1}{n}\sum_{i=1}^{K}\sum_{j=1}^{n_i} y_{ij} \quad \text{and} \quad \bar{y}_{i.} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$$

Next compute the treatment sum of squares:

$$SSTR = \sum_{i=1}^{K} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$$

Page 11: Chapter 9 An Introduction to  Analysis of Variance

Computing SSE

As in regression we compute fit errors, but now we use the treatment means as predictors:

$$\hat{e}_{ij} = y_{ij} - \bar{y}_{i.}$$

The error sum of squares is thus:

$$SSE = \sum_{i=1}^{K}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2$$

SSTR has (K-1) degrees of freedom and SSE has (n-K).
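The SSTR and SSE computations can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_anova` and the three small samples are made up for the example and do not come from the text.

```python
def one_way_anova(groups):
    """Return SSTR, SSE, their degrees of freedom and F for a list of samples."""
    K = len(groups)                              # number of treatments
    n = sum(len(g) for g in groups)              # overall sample size
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Treatment sum of squares: n_i * (treatment mean - overall mean)^2
    sstr = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Error sum of squares: squared fit errors around each treatment mean
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)

    mstr = sstr / (K - 1)
    mse = sse / (n - K)
    return {"SSTR": sstr, "SSE": sse,
            "df_tr": K - 1, "df_e": n - K, "F": mstr / mse}


# Hypothetical data with unequal sample sizes (not from the text)
samples = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0]]
result = one_way_anova(samples)
print(result)
```

The sample sizes are deliberately unequal to show that the formulas do not require a balanced design.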

Page 12: Chapter 9 An Introduction to  Analysis of Variance

Example 9.1 Automobile Injuries

The file INJURY9 contains data on injury claims involving 112 models of 1984-86 cars.

The variable INJURIES is the number of claims for each model and the variable CARCLAS indicates which category (small 2-door, small 4-door, etc.) the car falls into.

Page 13: Chapter 9 An Introduction to  Analysis of Variance

Minitab Output

Analysis of Variance for INJURIES
Source     DF     SS     MS      F      P
CARCLAS     8  54762   6845  22.80  0.000
Error     103  30917    300
Total     111  85679

Individual 95% CIs For Mean Based on Pooled StDev
Level    N    Mean  StDev
1       20  127.10  21.89
2       13  105.00  16.29
3        4   68.25   9.07
4       18  124.22  25.58
5       23   94.57  13.38
6        6   66.33   3.72
7        7   88.29   9.64
8       13   82.62  14.21
9        8   60.00   6.23

Pooled StDev = 17.33

[Interval plot: individual 95% confidence interval for each level mean, horizontal axis from 50 to 125]

Page 14: Chapter 9 An Introduction to  Analysis of Variance

The F Test

The F ratio has 8 numerator and 103 denominator degrees of freedom.

At a 5% significance level, the critical value is 2.10.

From the output, SS(CARCLAS) is 54762, so MSTR is 54762/8 = 6845.

MSE = 300 and F = 6845/300 = 22.8.

We reject the hypothesis that all types of cars have the same mean number of injury claims.
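The arithmetic behind this test can be checked directly from the sums of squares in the Minitab table. The sketch below uses only the figures printed in the output; the variable names are ours, and the critical value 2.10 is simply the one quoted above rather than something the code computes.

```python
import math

# Values read from the Minitab ANOVA table for INJURIES
ss_treatment, df_treatment = 54762, 8
ss_error, df_error = 30917, 103

mstr = ss_treatment / df_treatment   # mean square for treatments
mse = ss_error / df_error            # mean square error
f_ratio = mstr / mse
pooled_sd = math.sqrt(mse)           # matches "Pooled StDev" in the output

critical_value = 2.10                # 5% critical value quoted in the text
reject_h0 = f_ratio > critical_value

print(f"MSTR = {mstr:.0f}, MSE = {mse:.0f}, F = {f_ratio:.1f}")
print(f"Pooled StDev = {pooled_sd:.2f}, reject H0: {reject_h0}")
```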

Page 15: Chapter 9 An Introduction to  Analysis of Variance

Which are higher or lower?

The Minitab output below the ANOVA table presents some information that helps us figure that out.

The intervals for $\mu_1$ and $\mu_4$ are distinctly higher than the other types and there is a lot of overlap among the others.

At minimum, we can say that category 1 (small 2-door) and category 4 (small 4-door) had significantly more injury claims.

We will look at more precise ways to do comparisons in the next example.

Page 16: Chapter 9 An Introduction to  Analysis of Variance

Comments on the Comparisons

The data represent the number of injury claims, not a rate, so it is possible that these two categories are high just because more small cars are out there.

It is also possible that these small cars provide less protection, so more injuries occur during accidents.

Page 17: Chapter 9 An Introduction to  Analysis of Variance

Planned Experiments

ANOVA is often used to analyze data collected during a designed experiment.

The term design refers to the plan for conducting the experiment.

The researcher can assign the objects in the experiment to specific treatments, often to achieve a balanced experiment with equal $n_i$.

We had no control like this in the injury analysis.

Page 18: Chapter 9 An Introduction to  Analysis of Variance

Example 9.2 Computer Sales

We are studying three different approaches for selling computers.

Fifteen different salespeople are randomly assigned to the sales methods, five to each approach.

At the end of a month, we collected sales figures from each.

Page 19: Chapter 9 An Introduction to  Analysis of Variance

Minitab Output

Analysis of Variance for Sales
Source     DF     SS     MS     F      P
Approach    2  384.5  192.3  8.47  0.005
Error      12  272.4   22.7
Total      14  656.9

Individual 95% CIs For Mean Based on Pooled StDev
Level   N    Mean  StDev
1       5  15.600  3.578
2       5  21.600  5.727
3       5  28.000  4.743

Pooled StDev = 4.764

[Interval plot: individual 95% confidence interval for each level mean, horizontal axis from 12.0 to 30.0]

Approaches are significantly different
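The ANOVA table for this example can be rebuilt from the summary statistics alone, because each sample's squared deviations sum to $(n_i - 1)s_i^2$. The following is a minimal sketch under that identity; the function name `anova_from_summary` is ours, and the inputs are the N, Mean and StDev columns of the Minitab output.

```python
import math

def anova_from_summary(ns, means, sds):
    """One-way ANOVA quantities from per-sample sizes, means and std devs."""
    n, K = sum(ns), len(ns)
    grand_mean = sum(ni * m for ni, m in zip(ns, means)) / n
    sstr = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(ns, means))
    sse = sum((ni - 1) * s ** 2 for ni, s in zip(ns, sds))
    mstr, mse = sstr / (K - 1), sse / (n - K)
    return {"SSTR": sstr, "SSE": sse, "F": mstr / mse,
            "pooled_sd": math.sqrt(mse)}

# N, Mean and StDev for the three sales approaches
table = anova_from_summary([5, 5, 5],
                           [15.6, 21.6, 28.0],
                           [3.578, 5.727, 4.743])
print({k: round(v, 3) for k, v in table.items()})
```

Up to rounding in the printed standard deviations, the results agree with the SS, F and Pooled StDev entries in the output.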

Page 20: Chapter 9 An Introduction to  Analysis of Variance

Pairwise Comparisons

A better way to compare any two means for differences is a variation of our two-sample interval from Chapter 2.

For comparing approach i to approach j, compute the interval:

$$(\bar{y}_{i.} - \bar{y}_{j.}) \pm t_{n-K}\, S_e \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

$S_e$ is the pooled 3-sample standard deviation, the square root of MSE.

Page 21: Chapter 9 An Introduction to  Analysis of Variance

Comparing A to C

For A to C:

$$(\bar{y}_A - \bar{y}_C) \pm t_{n-K}\, S_e \sqrt{\frac{1}{n_A} + \frac{1}{n_C}} = (15.6 - 28.0) \pm 2.179(4.76)\sqrt{\frac{1}{5} + \frac{1}{5}}$$

or (-18.96, -5.84).

We can claim the people using approach C sell from $5,840 to $18,960 more than those using approach A.
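The A to C interval is easy to reproduce. The helper below is a sketch with a name of our own choosing (`pairwise_interval`); the multiplier 2.179 is the t value used above and is passed in rather than computed.

```python
import math

def pairwise_interval(mean_i, mean_j, n_i, n_j, s_e, t_mult):
    """Interval for the difference of two treatment means, using pooled S_e."""
    diff = mean_i - mean_j
    half_width = t_mult * s_e * math.sqrt(1 / n_i + 1 / n_j)
    return diff - half_width, diff + half_width

# Approach A versus approach C, with S_e = 4.76 and t = 2.179 (12 error df)
lower, upper = pairwise_interval(15.6, 28.0, 5, 5, s_e=4.76, t_mult=2.179)
print(f"({lower:.2f}, {upper:.2f})")
```

Because the whole interval lies below zero, the mean for A is significantly lower than the mean for C at this confidence level.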

Page 22: Chapter 9 An Introduction to  Analysis of Variance

Multiple Comparisons

The confidence level for this single comparison is 95%, but if you did many such comparisons, "overall" confidence will be lower.

If we compared each approach to the others, that would be three 95% intervals.

The overall confidence is roughly 85%.
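Two quick calculations show where a figure of roughly 85% can come from. This sketch rests on assumptions that are ours, not the text's: the first number treats the three intervals as if they were independent (they are not exactly, since they share the same data), and the second is the Bonferroni lower bound, which needs no independence assumption.

```python
g = 3          # number of pairwise comparisons
level = 0.95   # confidence level of each individual interval

# If the g intervals were independent, all would cover with this probability
independent_case = level ** g

# Bonferroni lower bound on the joint coverage, valid without independence
bonferroni_bound = 1 - g * (1 - level)

print(f"independent: {independent_case:.3f}, lower bound: {bonferroni_bound:.2f}")
```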

Page 23: Chapter 9 An Introduction to  Analysis of Variance

Bonferroni Method

The Bonferroni approach is a method for performing comparisons that are planned in advance.

It essentially controls overall confidence by using a larger t multiplier.

If there are g 95% comparisons planned, find the t value that has tail probability of (.025/g).

Page 24: Chapter 9 An Introduction to  Analysis of Variance

All Three Comparisons

If ahead of time we knew we wanted to compare each sales method to the other two, we have g=3.

Find the t value (with 12 error degrees of freedom) that has (.025/3) = .008 tail probability.

Using Excel's or Minitab's probability function, you can find that t=2.802.

If you had no other way to find out what this is, to be safe use the tabled value at .005, which is 3.055.

Page 25: Chapter 9 An Introduction to  Analysis of Variance

Comparing All 3

The first interval is:

$$(\bar{y}_A - \bar{y}_B) \pm B\, S_e \sqrt{\frac{1}{n_A} + \frac{1}{n_B}} = (15.6 - 21.6) \pm 3.055(4.76)\sqrt{\frac{1}{5} + \frac{1}{5}}$$

or (-15.20, 3.20): no difference

A to C is: (-21.60, -3.20): C is better

B to C is: (-15.60, 2.80): no difference
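All three Bonferroni intervals share the same plus-or-minus amount because the sample sizes are equal, so they can be generated in a loop. The sketch below uses the conservative tabled multiplier 3.055 from the text; the dictionary layout and variable names are illustrative choices of ours.

```python
import math
from itertools import combinations

means = {"A": 15.6, "B": 21.6, "C": 28.0}   # treatment means from the output
n_per_group = 5
s_e = 4.76                                   # pooled standard deviation
multiplier = 3.055                           # tabled t value used in the text

half_width = multiplier * s_e * math.sqrt(2 / n_per_group)

intervals = {}
for a, b in combinations(means, 2):
    diff = means[a] - means[b]
    intervals[(a, b)] = (diff - half_width, diff + half_width)

for (a, b), (lo, hi) in intervals.items():
    # An interval that excludes zero indicates a significant difference
    verdict = "different" if lo > 0 or hi < 0 else "no difference"
    print(f"{a} to {b}: ({lo:.2f}, {hi:.2f}) {verdict}")
```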

Page 26: Chapter 9 An Introduction to  Analysis of Variance

The Tukey Procedure

The Bonferroni approach can be used for more than pairwise comparisons.

For example, we could compare method A to the average of methods B and C.

If you only plan on the pairwise comparisons, the Tukey procedure is more efficient.

Page 27: Chapter 9 An Introduction to  Analysis of Variance

Tukey Intervals

When sample sizes are equal:

$$(\bar{y}_{i.} - \bar{y}_{j.}) \pm q(p,v)\,\frac{S_e}{\sqrt{n_i}}$$

If they differ:

$$(\bar{y}_{i.} - \bar{y}_{j.}) \pm \frac{q(p,v)}{\sqrt{2}}\, S_e \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

q is the critical value of the Studentized range (Appendix B), p is the number of treatments and v=n-K is the error df.

Page 28: Chapter 9 An Introduction to  Analysis of Variance

Tukey Calculations

Since we have equal sample sizes, we use the first formula. The ± amount will be the same for all:

$$q(p,v)\,\frac{S_e}{\sqrt{n_i}} = q(3,12)\,\frac{4.76}{\sqrt{5}} = 3.77(2.1287) = 8.025$$

A to B: (15.6-21.6) ± 8.025 = (-14.03, 2.03)
A to C: (15.6-28.0) ± 8.025 = (-20.43, -4.37)
B to C: (21.6-28.0) ± 8.025 = (-14.43, 1.63)

We still have the same results but got a little closer to a significant difference on the A to B and B to C comparisons.
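A short check of the Tukey arithmetic, written as a sketch. The function `tukey_half_width` is a name we introduce for the unequal-sample-size formula; with equal sizes it should collapse to the simpler form, and the code verifies that. The value q(3,12) = 3.77 is taken from the text, not computed.

```python
import math

def tukey_half_width(q, s_e, n_i, n_j):
    """Plus-or-minus amount from the general (unequal n) Tukey formula."""
    return (q / math.sqrt(2)) * s_e * math.sqrt(1 / n_i + 1 / n_j)

q_crit, s_e, n = 3.77, 4.76, 5          # q(3,12), pooled std dev, n per group

general = tukey_half_width(q_crit, s_e, n, n)
equal_n = q_crit * s_e / math.sqrt(n)   # equal-sample-size formula
print(f"half-width: {general:.3f} (equal-n formula gives {equal_n:.3f})")

means = {"A": 15.6, "B": 21.6, "C": 28.0}
tukey = {
    pair: (means[pair[0]] - means[pair[1]] - general,
           means[pair[0]] - means[pair[1]] + general)
    for pair in ["AB", "AC", "BC"]
}
for pair, (lo, hi) in tukey.items():
    print(f"{pair[0]} to {pair[1]}: ({lo:.2f}, {hi:.2f})")
```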

Page 29: Chapter 9 An Introduction to  Analysis of Variance

Minitab Output With Tukey Option

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of APPROACH

Individual confidence level = 97.94%

APPROACH = 1 subtracted from:

APPROACH   Lower  Center   Upper
2         -2.033   6.000  14.033
3          4.367  12.400  20.433

APPROACH = 2 subtracted from:

APPROACH   Lower  Center   Upper
3         -1.633   6.400  14.433

[Interval plots for each difference, horizontal axis from -10 to 20]

Page 30: Chapter 9 An Introduction to  Analysis of Variance

9.2 ANOVA Using A Randomized Block Design

Consider again the computer sales problem; one thing affecting the results is that some people are better salespersons regardless of what approach they are using.

If we had a way of including that information, we could "block out" the talent effect and get a better idea about which sales method is better.

This is what a randomized block design does.

Page 31: Chapter 9 An Introduction to  Analysis of Variance

Repeated Measures Designs

One way to incorporate talent is to just use 5 salespersons and have each use a different method each month.

To minimize any effects of time order, we would randomly assign the order in which they use the approaches.

When we are all done we can compute each person's average sales to measure relative talent.

Page 32: Chapter 9 An Introduction to  Analysis of Variance

Example 9.6 Cereal Package Design

We have four different designs of cereal packages and want to determine which one sells better.

We have 20 stores to use in the study.

A potential confounding factor is that more sales will occur in larger stores.

We can block this out by dividing the stores up into five size groups of four stores each. Each package design will be used in one store in each group.
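One way to carry out this assignment is to shuffle the four designs separately within each size group. The sketch below is only illustrative: the store and design labels are hypothetical, and the fixed seed is there just to make the example repeatable.

```python
import random

designs = ["Design 1", "Design 2", "Design 3", "Design 4"]

# Hypothetical store labels: five size groups (blocks) of four stores each
blocks = {
    f"size group {g}": [f"store {g}-{s}" for s in range(1, 5)]
    for g in range(1, 6)
}

rng = random.Random(0)   # fixed seed so the assignment can be reproduced
assignment = {}
for block, stores in blocks.items():
    # Random order of designs within the block: each design used exactly once
    shuffled = rng.sample(designs, k=len(designs))
    assignment[block] = dict(zip(stores, shuffled))

for block, plan in assignment.items():
    print(block, plan)
```

Randomizing within blocks rather than across all 20 stores is what guarantees every design appears once in every size group.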

Page 33: Chapter 9 An Introduction to  Analysis of Variance

The Randomized Block Model

Our model is:

$$y_{ij} = \mu + \tau_i + B_j + e_{ij}$$

$y_{ij}$ is the single observation for treatment i in block j
$\mu$ is the overall mean
$\tau_i$ is the effect of the ith treatment
$B_j$ is the effect of the jth block
$e_{ij}$ is a random disturbance

Page 34: Chapter 9 An Introduction to  Analysis of Variance

One Observation per Cell

We assume here that there is only a single observation per combination of block and treatment.

Repeats can be handled, but we would have to make some adjustments to some formulas and add another subscript.

Leave those to the computer.

Page 35: Chapter 9 An Introduction to  Analysis of Variance

F Tests

Another row gets added to the ANOVA table.

Our sources of variation are now error, blocks and treatments.

We can perform an F test for block effects.

Our main interest is the test for the treatment effects.

Page 36: Chapter 9 An Introduction to  Analysis of Variance

F Ratios

For the block test, F = MSBL/MSE

There are b block levels so the numerator has (b-1) degrees of freedom.

For treatments, use F = MSTR/MSE

The numerator has (K-1) d.f.

MSE has (b-1)(K-1) d.f.
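The sums of squares behind these ratios are not written out above, so the following sketch uses the standard decomposition for one observation per cell, which is an assumption on our part: SSTR is b times the squared deviations of the treatment means, SSBL is K times the squared deviations of the block means, and SSE is what remains of the total. The function names and the sales figures are hypothetical.

```python
def randomized_block_anova(y):
    """y[i][j] is the single observation for treatment i in block j."""
    K, b = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (K * b)
    treat_means = [sum(row) / b for row in y]
    block_means = [sum(y[i][j] for i in range(K)) / K for j in range(b)]

    sstr = b * sum((m - grand) ** 2 for m in treat_means)
    ssbl = K * sum((m - grand) ** 2 for m in block_means)
    sst = sum((v - grand) ** 2 for row in y for v in row)
    return {"SSTR": sstr, "SSBL": ssbl, "SSE": sst - sstr - ssbl,
            "df_tr": K - 1, "df_bl": b - 1, "df_e": (b - 1) * (K - 1)}

def f_ratios(t):
    """F for treatments and for blocks, each against MSE."""
    mse = t["SSE"] / t["df_e"]
    return {"F_treatments": (t["SSTR"] / t["df_tr"]) / mse,
            "F_blocks": (t["SSBL"] / t["df_bl"]) / mse}

# Hypothetical data: 3 treatments (rows) observed once in each of 4 blocks
sales = [[10, 14, 18, 21],
         [12, 15, 21, 24],
         [11, 17, 20, 26]]
table = randomized_block_anova(sales)
print(table)
print(f_ratios(table))
```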

Page 37: Chapter 9 An Introduction to  Analysis of Variance

Output for Cereal Package Design Analysis

Analysis of Variance for SALES
Source   DF      SS     MS     F      P
SIZE      4  1521.7  380.4  4.17  0.024
DESIGN    3    31.0   10.3  0.11  0.951
Error    12  1093.5   91.1
Total    19  2646.2

Individual 95% CI
SIZE   Mean
1      28.3
2      35.5
3      39.8
4      46.5
5      53.5

[Interval plot for SIZE means, horizontal axis from 24.0 to 60.0]

Individual 95% CI
DESIGN   Mean
1        40.4
2        40.0
3        39.6
4        42.8

[Interval plot for DESIGN means, horizontal axis from 36.0 to 54.0]
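The mean squares and F ratios in this table follow directly from the SS and DF columns, which makes for a quick consistency check. The sketch below only re-does that arithmetic with the printed values; the variable names are ours.

```python
# (DF, SS) for the block factor and the treatment factor, from the output
sources = {"SIZE": (4, 1521.7), "DESIGN": (3, 31.0)}
df_error, ss_error = 12, 1093.5

mse = ss_error / df_error
f_values = {name: (ss / df) / mse for name, (df, ss) in sources.items()}

# The sources should add up to the Total row
ss_total = sum(ss for _, ss in sources.values()) + ss_error
df_total = sum(df for df, _ in sources.values()) + df_error

print(f"MSE = {mse:.1f}")
for name, f in f_values.items():
    print(f"{name}: F = {f:.2f}")
print(f"Total: DF = {df_total}, SS = {ss_total:.1f}")
```

Note that the error DF of 12 equals (b-1)(K-1) = (5-1)(4-1) for five size groups and four designs.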

Page 38: Chapter 9 An Introduction to  Analysis of Variance

Analyzing the Results

First, the F test for the store size effect is significant (F=4.17 has a p-value of 0.024). The means plot below the ANOVA table shows that sales do increase with size.

The F test for package design is not significant (F=0.11 has p=.951). Thus, it does not appear that any one design works better than others.

Page 39: Chapter 9 An Introduction to  Analysis of Variance

9.3 Two-Way ANOVA

In this situation, there are two factors or explanatory variables.

For example, suppose a company is going to experiment with two price levels and three types of advertising.

Now a "treatment" is considered a price-advertising combination, of which there are 6.

Page 40: Chapter 9 An Introduction to  Analysis of Variance

Factorial Designs

This type of problem is called a factorial design.

When all possible combinations of the two factors are used, it is a complete factorial experiment.

We will assume that all treatments have the same number of observations, although it is possible to do factorial designs without equal samples.
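The treatments of a complete factorial experiment are simply the Cartesian product of the factor levels. As a small sketch for the price and advertising example, with level labels that are hypothetical placeholders:

```python
from itertools import product

# Hypothetical level labels for the two factors
price_levels = ["low price", "high price"]
advertising_types = ["ad type 1", "ad type 2", "ad type 3"]

# Every price-advertising combination is one treatment
treatments = list(product(price_levels, advertising_types))

for number, (price, ad) in enumerate(treatments, start=1):
    print(number, price, ad)
```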

Page 41: Chapter 9 An Introduction to  Analysis of Variance

ANOVAANOVA 4141 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.

The Two-Way ANOVA Model

Our model is:

$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}$

$y_{ijk}$ is the $k$th observation at factor level $i$ for factor A and factor level $j$ for factor B

$\mu$ is the overall mean

$\alpha_i$ is the effect of factor A at level $i$

$\beta_j$ is the effect of factor B at level $j$

$(\alpha\beta)_{ij}$ is the interaction between factors

$e_{ijk}$ is a random disturbance
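One way to make the terms of the model concrete is to estimate them from the cell means of a balanced data set. The sketch below assumes the usual sum-to-zero constraints on the effects (an assumption, not something stated on the slide) and uses made-up numbers; `estimate_effects` is a hypothetical helper name.

```python
from statistics import mean

# Made-up balanced data: data[i][j] holds the r observations for
# level i of factor A and level j of factor B.
data = [
    [[10.0, 12.0], [14.0, 16.0], [9.0, 11.0]],
    [[8.0, 10.0], [13.0, 13.0], [12.0, 12.0]],
]


def estimate_effects(data):
    """Estimate the model terms under sum-to-zero constraints (balanced data)."""
    a, b = len(data), len(data[0])
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    grand = mean(cell[i][j] for i in range(a) for j in range(b))
    # Main effects: level mean minus overall mean.
    alpha = [mean(cell[i]) - grand for i in range(a)]
    beta = [mean(cell[i][j] for i in range(a)) - grand for j in range(b)]
    # Interaction: what is left of the cell mean after both main effects.
    inter = [
        [cell[i][j] - grand - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return grand, alpha, beta, inter, cell


grand, alpha, beta, inter, cell = estimate_effects(data)
print("overall mean:", round(grand, 3))
print("factor A effects:", [round(x, 3) for x in alpha])
print("factor B effects:", [round(x, 3) for x in beta])
```

Adding the overall mean, the two main effects and the interaction term for a cell gives back that cell's mean, which is the sense in which the model splits each treatment mean into parts.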

Page 42: Chapter 9 An Introduction to  Analysis of Variance


Hypothesis Tests

The tests for the effects of factor A and factor B are called tests for the main effects, and these are what we are mainly interested in.

You should first test for interaction. Interaction means that the effect of factor A may depend on the level of factor B.

If there is no interaction, the main effects are independent of each other.

Page 43: Chapter 9 An Introduction to  Analysis of Variance


Degrees of Freedom

Assume that factor A has $n_1$ levels and factor B has $n_2$ levels, and that we have $r$ observations for each treatment.

The four sources of variation and their associated degrees of freedom:

factor A       $(n_1 - 1)$

factor B       $(n_2 - 1)$

interaction    $(n_1 - 1)(n_2 - 1)$

error          $n_1 n_2 (r - 1)$

These four add up to the total degrees of freedom, $r n_1 n_2 - 1$.
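The bookkeeping can be checked with a few lines of code. This is a minimal sketch in which the error degrees of freedom are taken as n1 * n2 * (r - 1), so that the four sources add up to the total r * n1 * n2 - 1; `two_way_df` is a hypothetical helper name.

```python
def two_way_df(n1, n2, r):
    """Degrees of freedom for a balanced two-way ANOVA with interaction."""
    df = {
        "factor A": n1 - 1,
        "factor B": n2 - 1,
        "interaction": (n1 - 1) * (n2 - 1),
        "error": n1 * n2 * (r - 1),
    }
    df["total"] = r * n1 * n2 - 1
    # The four sources must account for all of the total degrees of freedom.
    assert sum(v for k, v in df.items() if k != "total") == df["total"]
    return df


# Three levels of one factor, two of the other, two observations per treatment.
example = two_way_df(n1=3, n2=2, r=2)
for source, value in example.items():
    print(f"{source:12s}{value:3d}")
```

With three advertising types, two prices and two runs per combination, this gives 2, 1, 2 and 6 degrees of freedom with a total of 11, the same values that appear in the DF column of the printer sales output.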

Page 44: Chapter 9 An Introduction to  Analysis of Variance


Means Plot

A good exploratory tool is to plot the average value of y that occurs at each treatment.

The average y goes on the vertical axis and one of the factors on the horizontal axis.

Use lines to connect the means for the other factor.

If the lines are roughly parallel, it is a signal that there is no interaction.
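The "roughly parallel" check can also be done numerically before drawing anything. The sketch below uses made-up observations, computes the treatment means that would be plotted, and then measures the vertical gap between the two lines at each level of the horizontal-axis factor. The names `cell_means` and `line_gaps` are hypothetical.

```python
from statistics import mean

# Made-up observations keyed by (line factor level, axis factor level).
obs = {
    ("low", 1): [15.0, 17.0], ("low", 2): [12.0, 14.0], ("low", 3): [9.0, 11.0],
    ("high", 1): [13.0, 15.0], ("high", 2): [10.0, 12.0], ("high", 3): [10.0, 12.0],
}


def cell_means(obs):
    """Average response for each treatment: the points on the means plot."""
    return {key: mean(values) for key, values in obs.items()}


def line_gaps(means, line_a, line_b, axis_levels):
    """Vertical gap between two lines of the plot at each axis level."""
    return [means[(line_a, x)] - means[(line_b, x)] for x in axis_levels]


means = cell_means(obs)
gaps = line_gaps(means, "low", "high", [1, 2, 3])
print(gaps)
```

Parallel lines would give the same gap at every level. In this toy data the gap is constant over the first two levels and then changes sign, which is the kind of pattern that hints at interaction.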

Page 45: Chapter 9 An Introduction to  Analysis of Variance


Example 9.8 Printer Sales

The company is experimenting with the price of its top-of-the-line printer and how it is advertised.

They set the price at either $600 (1) or $700 (2), and the advertising was either by television (1), radio (2) or newspaper (3).

They recorded the sales for one month, and each combination was run twice, so we have 12 observations.

Page 46: Chapter 9 An Introduction to  Analysis of Variance


Means Plot

[Figure: means plot of average sales (vertical axis, scale 8 to 18) against advertising type (horizontal axis, levels 1, 2, 3), with one line for each price level, 600 and 700.]

Page 47: Chapter 9 An Introduction to  Analysis of Variance


Tentative Findings

In general, sales were higher at the lower price.

They were highest for TV advertising and next for radio.

A mild potential interaction is present because the higher price did better when using newspaper advertising.

Page 48: Chapter 9 An Introduction to  Analysis of Variance


ANOVA Output

Two-way ANOVA: SALES versus ADV, PRICE

Analysis of Variance for SALES
Source        DF       SS      MS      F      P
ADV            2  103.752  51.876  75.82  0.000
PRICE          1    7.521   7.521  10.99  0.016
Interaction    2    8.032   4.016   5.87  0.039
Error          6    4.105   0.684
Total         11  123.409
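The MS and F columns follow directly from the DF and SS columns: each mean square is SS divided by DF, and each F ratio is a mean square divided by the error mean square. A minimal sketch that recomputes them from the printed values (`anova_table` is a hypothetical helper name):

```python
# DF and SS values copied from the output above.
rows = {
    "ADV": (2, 103.752),
    "PRICE": (1, 7.521),
    "Interaction": (2, 8.032),
    "Error": (6, 4.105),
}


def anova_table(rows, error_key="Error"):
    """Return {source: (mean square, F ratio)}; F is None for the error row."""
    df_error, ss_error = rows[error_key]
    ms_error = ss_error / df_error
    table = {}
    for source, (df, ss) in rows.items():
        ms = ss / df
        f_ratio = None if source == error_key else ms / ms_error
        table[source] = (ms, f_ratio)
    return table


table = anova_table(rows)
ss_total = sum(ss for _, ss in rows.values())

for source, (ms, f_ratio) in table.items():
    f_text = "" if f_ratio is None else f"{f_ratio:8.2f}"
    print(f"{source:12s}{ms:9.3f}{f_text}")
print(f"{'Total SS':12s}{ss_total:9.3f}")
```

The recomputed total sum of squares differs from the printed 123.409 only in the last decimal place, which is consistent with rounding in the printed SS values.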

Page 49: Chapter 9 An Introduction to  Analysis of Variance


Test for Interaction

The test for interaction is significant. The F ratio is 5.87, compared to a 5% critical value of $F_{2,6}$ = 5.14. The p-value is .039.

If strong interaction is present, it means it may be hard to sort out the main effects (and you may not even want to test for them).

The interaction is not very strong, so we will test for main effects.
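The critical value and p-value quoted for the interaction test can be reproduced without tables, because an F distribution with 2 numerator degrees of freedom has a closed-form tail probability: P(F > f) = (1 + 2f/df2)^(-df2/2). The sketch below relies on that special case only; the function names are hypothetical, and a general F test would need a statistics library instead.

```python
def f_tail_df1_is_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


def f_critical_df1_is_2(alpha, df2):
    """Value exceeded with probability alpha (inverse of the tail formula)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


p_interaction = f_tail_df1_is_2(5.87, df2=6)
critical_5pct = f_critical_df1_is_2(0.05, df2=6)

print(f"p-value for F = 5.87: {p_interaction:.3f}")
print(f"5% critical value:    {critical_5pct:.2f}")
```

Since 5.87 exceeds the 5% critical value but not by a wide margin, the p-value lands a little below .05, which fits the description of an interaction that is significant but not very strong.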

Page 50: Chapter 9 An Introduction to  Analysis of Variance


Main Effects

Test for Selling Price

The test is significant (F=10.99 has p=.016), so selling price does matter.

Test for Advertising Effect

This is even more significant (F=75.82 has p=.000).
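Both p-values can be checked with the standard library alone. For the price test, an F statistic with 1 numerator degree of freedom is the square of a t statistic, so its p-value is a two-sided t tail area, obtained here by Simpson's rule. For the advertising test, the closed form for 2 numerator degrees of freedom applies. This is a sketch for these two special cases, with hypothetical function names, not a general F-test routine.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)


def f_pvalue_df1_is_1(f_stat, df2, steps=2000):
    """P(F > f_stat) with 1 and df2 df, via the two-sided t tail area."""
    t = math.sqrt(f_stat)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


def f_pvalue_df1_is_2(f_stat, df2):
    """P(F > f_stat) with 2 and df2 df (closed form)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


p_price = f_pvalue_df1_is_1(10.99, df2=6)
p_adv = f_pvalue_df1_is_2(75.82, df2=6)

print(f"PRICE: p = {p_price:.3f}")
print(f"ADV:   p = {p_adv:.6f}")
```

The advertising p-value is not exactly zero; it is simply smaller than .0005, so it prints as .000 when rounded to three decimals.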

Page 51: Chapter 9 An Introduction to  Analysis of Variance


The Optimal Policy

Sales are certainly highest when the price is set at $600 and television advertising is used.

These are not surprising results, and they also represent the most costly combination.

If we had profit information instead of revenue information, we might have a different opinion on what to do.

Page 52: Chapter 9 An Introduction to  Analysis of Variance


9.4 Analysis of Covariance

A close relative to ANOVA is ANCOVA, where the model contains a mixture of quantitative and qualitative predictors.

These are often analyzed by a General Linear Model procedure.

In these procedures you specify the y variable and its predictors. You then indicate which are factors (qualitative) and which are covariates (quantitative).

In essence, it is just regression analysis with both continuous and indicator variables.

Using a GLM procedure often makes it easier to specify the model form, particularly when interaction is involved.
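To see what "regression with both continuous and indicator variables" looks like in practice, the sketch below builds the rows of a regression design matrix by hand for one factor and one covariate. The records are made up, the covariate is hypothetical, and `design_row` is an illustrative helper; a GLM procedure would normally do this coding for you.

```python
def design_row(factor_level, covariate, levels):
    """Intercept, indicators for all but the first level, then the covariate."""
    # The first level acts as the baseline, so it gets no indicator column.
    indicators = [1.0 if factor_level == lvl else 0.0 for lvl in levels[1:]]
    return [1.0] + indicators + [float(covariate)]


levels = ["television", "radio", "newspaper"]

# Made-up records: (advertising type, value of a hypothetical covariate).
records = [("television", 12.5), ("radio", 8.0), ("newspaper", 10.0)]

X = [design_row(level, x, levels) for level, x in records]
for row in X:
    print(row)
```

Each row has an intercept, two indicator columns for the three-level factor and one column for the covariate. Interaction between the factor and the covariate would add products of the indicator and covariate columns, which is the part that a GLM procedure makes easier to specify.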

