EXAMPLE 1 – Butter Fat Content in Cow Milkcourse1.winona.edu/bdeppa/STAT 600/Handouts/Section...

Date post: 11-Jan-2020
STAT 602: 3 – One-way ANOVA (Completely Randomized Designs) Spring 2017 (Comparing several population means)

Introduction

Suppose we wish to compare k population means (k ≥ 2). This situation can arise in two ways. If the study is observational, we are obtaining independently drawn samples from k distinct populations and we wish to compare the population means for some numerical response of interest. If the study is experimental, then we are using a completely randomized design to obtain our data from k distinct treatment groups. In a completely randomized design the experimental units are randomly assigned to one of k treatments and the response value from each unit is obtained. The mean of the numerical response of interest is then compared across the different treatment groups.

There are two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ and how much do they differ by?

More formally:

Ho: μ1 = μ2 = ... = μk, i.e. all the population means are equal and have a common mean (μ)
Ha: μi ≠ μj for some i ≠ j, i.e. at least two population means differ.

If we reject the null then we use comparative methods to answer question 2 above.

Basic Idea:
The test procedure compares the variation in observations between samples to the variation within samples. If the variation between samples is large relative to the variation within samples we are likely to conclude that the population means are not all equal. The diagrams below illustrate this idea...

[Figure: two diagrams. In the first, Between Group Variation >> Within Group Variation (conclude the population means differ). In the second, the Between Group Variation is not large relative to the Within Group Variation (fail to conclude the population means differ).]

Analysis of variance gets its name because we are using variation to decide whether the population means differ.

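Computational Details for Equal Sample Size Case (n1 = n2 = ⋯ = nk = n)

Preliminary notation:

$$Y_{ij} = j\text{th obs. from population } i, \quad i = 1, \dots, k, \quad j = 1, \dots, n$$

$$\bar{Y}_{i\cdot} = \text{sample mean of the obs. from population } i$$

$$\bar{Y}_{\cdot\cdot} = \text{grand mean of all obs.} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n} Y_{ij}}{N} = \frac{\sum_{i=1}^{k} \bar{Y}_{i\cdot}}{k}$$

N = n1 + n2 + ... + nk, i.e. N is the total sample size for the experiment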

TWO ESTIMATES OF THE COMMON VARIANCE (σ²)

Estimate 1: Using Between Group Variation

If the null hypothesis is true, each of the $\bar{Y}_{i\cdot}$'s represents a randomly selected observation from a normal distribution with mean μ (common mean) and std. deviation = σ/√n, i.e. the standard error of the mean. In other words, each comes from the sampling distribution $\bar{Y} \sim N(\mu, \sigma/\sqrt{n})$.

Thus if we find the sample standard deviation of the $\bar{Y}_{i\cdot}$'s we get an estimate of σ/√n, or in terms of the sample variance we have,

$$\text{Sample variance of the } \bar{Y}_{i\cdot}\text{'s} = \frac{\sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}{k-1} \quad \text{is an estimate of } \frac{\sigma^2}{n}.$$

Therefore if we multiply this sample variance by n,

$$\frac{n \sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}{k-1}$$

we obtain an estimate of σ², but only if the null hypothesis is true. If the alternative hypothesis is true, this estimate of σ² will be too BIG. This formula will be large when there is substantial between group variation. This measure of between group variation is called the Mean Square for Treatments and is denoted MSTreat.

Estimate 2: Using Within Group Variation

Another estimate of the common variance (σ²) can be found by looking at the variation of the observations within each of the k treatment groups. By extending the pooled variance from the two population case we have the following:

$$\text{Pooled estimate of } \sigma^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2 + \dots + (n_k-1)s_k^2}{n_1 + n_2 + \dots + n_k - k}$$

which simplifies to the mean of the k sample variances when the sample sizes are all equal.

Another way to write this is as follows:

$$\text{Pooled estimate of } \sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{N-k} = \frac{SS_{Error}}{N-k} = MS_{Error}$$

This will be an estimate of the common population variance (σ²) regardless of whether the null hypothesis is true or not. This is called the Mean Square for Error (MSError).

Thus if the null hypothesis is true we have two estimates of the common variance (σ²), namely the Mean Square for Treatments (MSTreat) and the Mean Square for Error (MSError).

If MSTreat >> MSError we reject Ho, i.e. the between group variation is large relative to the within group variation.

If MSTreat ≈ MSError we fail to reject Ho, i.e. the between group variation is NOT large relative to the within group variation.

Test Statistic

The test statistic compares the mean squares for treatment and error in the form of a ratio,

$$F = \frac{MS_{Treat}}{MS_{Error}} \sim F\text{-distribution with numerator } df = k-1 \text{ and denominator } df = N-k$$

Large values for F give small p-values and lead to rejection of the null hypothesis.
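The two variance estimates and their ratio can be sketched in a few lines of Python for the equal sample size case. This is only an illustration: the function name `one_way_anova` and the data are hypothetical (not taken from any data file in these notes), and the p-value is left to software such as JMP.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (MSTreat, MSError, F, (df1, df2)) for equal-sized samples."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch covers the equal sample size case only")
    N = k * n
    group_means = [mean(g) for g in groups]
    # Estimate 1: n times the sample variance of the k group means
    ms_treat = n * variance(group_means)
    # Estimate 2: the pooled variance, which equals the mean of the
    # k sample variances when all sample sizes are equal
    ms_error = mean([variance(g) for g in groups])
    return ms_treat, ms_error, ms_treat / ms_error, (k - 1, N - k)

# Hypothetical data: k = 3 groups with n = 3 observations each
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0],
]
ms_treat, ms_error, f_stat, df = one_way_anova(groups)
print(ms_treat, ms_error, f_stat, df)
```

Here the group means are 5, 7 and 10 and every sample variance is 1, so MSTreat is about 19 while MSError is 1, giving an F ratio of about 19 on 2 and 6 degrees of freedom.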

Example 3.1 - Weight Gain in Anorexia Patients
Data File: Anorexia.JMP available on the course website

These data give the pre- and post-weights of patients being treated for anorexia nervosa. There are actually three different treatment plans being used in this study, and we wish to compare their performance.

The variables in the data file are:
Treatment – Family, Standard, Behavioral
prewt - weight at the beginning of treatment
postwt - weight at the end of the study period
Weight Gain - weight gained (or lost) during treatment (postwt-prewt)

We begin our analysis by examining comparative displays for the weight gained across the three treatment methods. To do this select Fit Y by X from the Analyze menu and place the grouping variable, group, in the X box and place the response, Weight Gain, in the Y box and click OK. Here boxplots, mean diamonds, normal quantile plots, and comparison circles have been added.

Things to consider from this graphical display:

Do there appear to be differences in the mean weight gain?

Are the weight changes normally distributed?

Is the variation in weight gain equal across therapies?

Checking the Equality of Variance Assumption

To test whether it is reasonable to assume the population variances are equal for these three therapies select UnEqual Variances from the Oneway Analysis pull-down menu.

We have no evidence to conclude that the variances/standard deviations of the weight gains for the different treatment programs differ (p >> .05).

ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS

To test the null hypothesis that the mean weight gain is the same for each of the therapy methods we will perform the standard one-way ANOVA test. To do this in JMP select Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the test are shown in the Analysis of Variance box.

The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at the .05 level and conclude that there are statistically significant differences in the mean weight gain experienced by patients in the different therapy groups.

Equality of Variance Test
Ho: σ1² = σ2² = ⋯ = σk²
Ha: the population variances are not all equal

MSTreat = 307.22 and MSError = 56.68. The mean square for treatments is 5.42 times larger than the mean square for error! This provides strong evidence against the null hypothesis.

MULTIPLE COMPARISONS

Because we have concluded that the mean weight gains across treatment method are not all equal it is natural to ask the secondary question:

Which means are significantly different from one another?

We could consider performing a series of two-sample t-Tests and constructing confidence intervals for independent samples to compare all possible pairs of means; however, if the number of treatment groups is large we will almost certainly find two treatment means as being significantly different. Why? Consider a situation where we have k = 7 different treatments that we wish to compare. To compare all possible pairs of means (1 vs. 2, 1 vs. 3, …, 6 vs. 7) would require performing a total of k(k − 1)/2 = 21 two-sample t-Tests. If we used α = P(Type I Error) = .05 for each test we expect to make 21(.05) ≈ 1 Type I Error, i.e. we expect to find one pair of means as being significantly different when in fact they are not. This problem only becomes worse as the number of groups, k, gets larger.

Experiment-wise Error Rate (EER)

Another way to think about this is to consider the probability of making no Type I Errors when making our pair-wise comparisons. When k = 7 for example, and treating the 21 tests as independent, the probability of making no Type I Errors is (.95)^21 = .3406, i.e. the probability that we make at least one Type I Error is therefore .6594 or a 65.94% chance. Certainly this is unacceptable! Why would you conduct a statistical analysis when you know that you have a 66% chance of making an error in your conclusions? This probability is called the experiment-wise error rate.
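The arithmetic behind the experiment-wise error rate is easy to reproduce. The sketch below assumes, as the calculation above does, that the pairwise tests behave as if they were independent; the function name `experimentwise_error_rate` is hypothetical.

```python
from math import comb

def experimentwise_error_rate(k, alpha=0.05):
    """Number of pairwise comparisons among k groups and the chance of
    at least one Type I Error, treating the tests as independent."""
    m = comb(k, 2)                 # k(k - 1)/2 pairwise comparisons
    eer = 1 - (1 - alpha) ** m     # 1 - P(no Type I Errors)
    return m, eer

m, eer = experimentwise_error_rate(7)
print(m, round(eer, 4))            # 21 0.6594
```

Trying other values of k shows how quickly the error rate grows as groups are added.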

Bonferroni Correction

There are several different ways to control the experiment-wise error rate (EER). One of the easiest ways to control the experiment-wise error rate is to use the Bonferroni Correction. If we plan on making m comparisons or conducting m significance tests, the Bonferroni Correction is to simply use α/m as our significance level rather than α. This simple correction guarantees that our experiment-wise error rate will be no larger than α. This correction implies that our p-values will have to be less than α/m, rather than α, to be considered statistically significant.
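As a small illustration of the correction, the sketch below compares a set of p-values to α/m. The p-values and the function name `bonferroni_significant` are made up for the example and do not come from any analysis in these notes.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold alpha/m and, for each p-value,
    whether it falls below that threshold."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p < threshold for p in p_values]

# Hypothetical p-values from m = 4 pairwise comparisons
p_values = [0.001, 0.012, 0.030, 0.200]
threshold, flags = bonferroni_significant(p_values)
print(threshold, flags)
```

With m = 4 the threshold is .05/4 = .0125, so only the first two comparisons are declared significant, even though the third p-value (.030) is below the unadjusted .05 level.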

Multiple Comparison Procedures for Pairwise Comparisons of k Pop. Means

When performing pair-wise comparisons of population means in ANOVA there are several different methods that can be employed. These methods depend on the types of pair-wise comparisons we wish to perform. The different types available in JMP are summarized briefly below:

Compare each pair using the usual two-sample t-Test for independent samples. This choice does not provide any experiment-wise error rate protection! (DON’T USE)

Compare all pairs using Tukey’s Honest Significant Difference (HSD) approach. This is the best choice if you are interested in comparing each possible pair of treatments.

Compare the means to the “Best” using Hsu’s method. The best mean can either be the minimum (if smaller is better for the response) or maximum (if bigger is better for the response).

Compare each mean to a control group using Dunnett’s method. Compares each treatment mean to a control group only. You must identify the control group in JMP by clicking on an observation in your comparative plot corresponding to the control group before selecting this option.

Multiple Comparison Options in JMP

Example 3.1 - Weight Gain in Anorexia Patients (cont’d)

For these data we are probably interested in comparing each of the treatments to one another. For this we will use Tukey’s multiple comparison procedure for comparing all pairs of population means. Select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. Beside the graph you will now notice there are circles plotted. There is one circle for each group and each circle is centered at the mean for the corresponding group. The size of each circle is inversely related to the sample size, thus larger circles will be drawn for groups with smaller sample sizes. These circles are called comparison circles and can be used to see which pairs of means are significantly different from each other. To do this, click on the circle for one of the treatments. Notice that the treatment group selected will appear in the plot window and its circle will become red and bold. The means that are significantly different from the treatment group selected will have circles that are gray. These color differences will also be conveyed in the group labels on the horizontal axis. In the plot below we have selected the Standard treatment group.

The results of the pairwise comparisons are also contained in the output window (shown below). The matrix labeled Comparison for all pairs using Tukey-Kramer HSD identifies pairs of means that are significantly different using positive entries in this matrix. Here we see only treatments 2 and 3 significantly differ. I generally turn this option off as the other methods for displaying significant differences are better.

The next table conveys the same information by using different letters to represent populations that have significantly different means. Notice treatments 2 and 3 are not connected by the same letter so they are significantly different.

Finally the CI’s in the Ordered Differences section give estimates for the differences in the population means. Here we see that mean weight gain for patients in treatment 3 is estimated to be between 2.09 lbs. and 13.34 lbs. larger than the mean weight gain for patients receiving treatment 2 (see highlighted section below).

Example 3.2 – Butter Fat Content in Cow Milk
Data File: Butterfat-cows.JMP

This data set gives the butter fat content of milk from five breeds of dairy cows.

The variables in the data file are:
Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey
Age Group – two different age classes of cows (1 = younger, 2 = older)
Butterfat - % butter fat found in the milk sample

We begin our analysis by examining comparative displays for the butter fat content across the five breeds. To do this select Fit Y by X from the Analyze menu and place the grouping variable, Breed, in the X box and place the response, Butterfat, in the Y box and click OK. The resulting plot simply shows a scatter plot of butter fat content versus breed. In many cases there will be numerous data points on top of each other in such a display, making the plot harder to read. To help alleviate this problem we can stagger or jitter the points a bit from vertical by selecting Jitter Points from the Display Options menu. To help visualize breed differences we could add quantile boxplots, mean diamonds, or means with standard error bars to the graph. To do any or all of these select the appropriate options from the Display Options menu. The display below has quantile boxplots, sample means with error bars, and normal quantile plots added.

We can clearly see the butter fat content is largest for Jersey and Guernsey cows and lowest for Holstein-Fresian. There also appears to be some difference in the variation of butter fat content across breeds. The summary statistics on the following page confirm our findings based on the graph above. To obtain these summary statistics select the Quantiles and Means, Std Dev, Std Err options from the Oneway Analysis menu at the top of the window.

Summary Statistics for Butter Fat Content by Breed

To test the null hypothesis that the butter fat content is the same for each of the breeds we can perform a one-way ANOVA test.

Ho: μAyshire = μCanadian = ... = μJersey
Ha: at least two breeds have different mean percent butter fat content in their milk

Again the assumptions for one-way ANOVA are as follows:
1.) The samples are drawn independently or come from using a completely randomized design. Here we can assume the cows sampled from each breed were independently sampled.
2.) The variable of interest is normally distributed for each population.
3.) The population variances are equal across groups. Here this means: σ²Ayshire = σ²Canadian = ... = σ²Jersey. If this assumption is violated we can use Welch’s ANOVA, which allows for inequality of the population variances, or we can transform the response (see below).

To check these assumptions in JMP select UnEqual Variances and Normal Quantile Plot > Actual by Quantile (the 1st option). To conduct the one-way ANOVA test select Means, Anova/t-Test from the Oneway Analysis pull-down menu.

The graphical results and the results of the test are shown below.

Butter Fat Content by Breed

Normality appears to be satisfied

The equality of variance test results are shown below.

Because we have strong evidence against equality of the population variances we could use Welch’s ANOVA to test the equality of the means. This test allows for the population variances/standard deviations to differ when comparing the population means. For Welch’s ANOVA we see that the p-value < .0001, therefore we have strong evidence against the equality of the means and we conclude these dairy breeds differ in the mean butter fat content of their milk.
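For readers curious about what Welch’s ANOVA computes, here is a minimal sketch of the usual textbook form of Welch’s statistic, in which each group mean is weighted by n_i / s_i². The function name `welch_anova` and the data are hypothetical, this is not a reproduction of JMP’s implementation, and the p-value (from an F-distribution with the returned degrees of freedom) is left to software.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's F statistic with its numerator and denominator df.
    Assumes every group has at least 2 observations and nonzero variance."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    v = [variance(g) for g in groups]
    w = [ni / vi for ni, vi in zip(n, v)]        # weights n_i / s_i^2
    W = sum(w)
    weighted_mean = sum(wi * mi for wi, mi in zip(w, m)) / W
    lam = sum((1 - wi / W) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (mi - weighted_mean) ** 2
                    for wi, mi in zip(w, m)) / (k - 1)
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    df2 = (k ** 2 - 1) / (3 * lam)               # usually not an integer
    return numerator / denominator, k - 1, df2

# Hypothetical data whose spread grows with the mean
groups = [
    [4.0, 4.2, 4.1, 4.3],
    [4.5, 5.0, 5.5, 4.0],
    [5.0, 6.5, 5.8, 7.1],
]
f_stat, df1, df2 = welch_anova(groups)
print(f_stat, df1, df2)
```

With only two groups this statistic reduces to the square of the unequal-variance two-sample t statistic, which is a convenient sanity check on the formula.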

Variance Stabilizing Transformations

We have strong evidence against the equality of the population variances/standard deviations. (Use Welch’s ANOVA)

We conclude at least two means differ.

Another approach that is often used when there is evidence against the equality of variance assumption is to transform the response. Often we find that the variation appears to increase as the mean increases. For these data this is the case: the estimated standard deviations increase as the estimated means increase. To correct this we generally consider taking the log of the response. Sometimes the square root and reciprocal are used, but these transformations do not allow for any meaningful interpretation in the original scale, whereas the log transform does.

The results of the analysis using the log response are shown below.

Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds

As we can see, as the sample means for the groups increase so do the sample standard deviations. To stabilize the variances/standard deviations we can try a log transformation.

Because we have concluded that the mean/median log butter fat content differs across breed (p < .0001), it is natural to ask the secondary question: Which breeds have typical values that are significantly different from one another? To do this we again use multiple comparison procedures. In JMP select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. The results of Tukey’s HSD are shown below.

Below is the plot showing the results when clicking on the circle for Guernsey’s.

We can see that the mean log-Butterfat Content for Guernsey’s and Jersey’s do not significantly differ from each other, but they both differ significantly from the other three breeds.

Normality is still satisfied, there is no evidence against the equality of the variances, and we have strong evidence that at least two means differ.

Here we see only Guernsey and Jersey do not significantly differ. The table using letters conveys the same information, using different letters to represent populations that have significantly different means. The CI’s in the Ordered Differences section give estimates for the differences in the population means/medians in the log scale. As was the case with using the log transformation with two populations, we interpret the results in the original scale in terms of ratios of medians. For example, back transforming the interval comparing Jersey to Holstein-Fresian, we estimate that the median butter fat content found in Jersey milk is between 1.33 and 1.55 times larger than the median butter fat content in Holstein-Fresian milk. We could also state this in terms of percentages as follows: we estimate that the typical butter fat content found in Jersey milk is between 33% and 55% higher than the butter fat content found in Holstein-Fresian milk.
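The back-transformation step can be sketched as follows. The log-scale endpoints used here are hypothetical values chosen to land near the ratios quoted above; they are not copied from the JMP output, and the sketch assumes the natural log was used for the transformation.

```python
from math import exp

def ratio_interval(log_lower, log_upper):
    """Back-transform a CI for a difference in mean log response into a
    CI for the ratio of medians, also expressed as percent higher."""
    lo, hi = exp(log_lower), exp(log_upper)
    return lo, hi, 100 * (lo - 1), 100 * (hi - 1)

# Hypothetical log-scale interval endpoints (natural log assumed)
lo, hi, pct_lo, pct_hi = ratio_interval(0.285, 0.438)
print(round(lo, 2), round(hi, 2), round(pct_lo), round(pct_hi))
```

Exponentiating a difference of logs gives a ratio, which is why the interpretation changes from "so many units larger" to "so many times larger".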

ONE-WAY ANOVA MODEL (FYI only)

In statistics we often use models to represent the mechanism that generates the observed response values. Before we introduce the one-way ANOVA model we first must introduce some notation.

Let,
ni = sample size from group i (i = 1, ..., k)
Yij = the jth response value for the ith group (j = 1, ..., ni)
μ = mean common to all groups assuming the null hypothesis is true
τi = shift in the mean due to the fact the observation came from group i
εij = random error for the jth observation from the ith group

Note: The test procedure we use requires that the random errors are normally distributed with mean 0 and variance σ². Equivalently the test procedure requires that the response is normally distributed with a common standard deviation σ for all k groups.

One-way ANOVA Model:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \dots, k \text{ and } j = 1, \dots, n_i$$

The null hypothesis is equivalent to saying that the τi are all 0, and the alternative says that at least one τi ≠ 0, i.e. Ho: τi = 0 for all i vs. Ha: τi ≠ 0 for some i. We must decide on the basis of the data whether we have evidence against the null. We display this graphically as follows:

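Estimates of the Model Parameters:

$$\hat{\mu} = \bar{Y}_{\cdot\cdot} \quad \text{(the grand mean)}$$

$$\hat{\tau}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot} \quad \text{(the estimated } i\text{th treatment effect)}$$

$$\hat{Y}_{ij} = \hat{\mu} + \hat{\tau}_i = \bar{Y}_{i\cdot} \quad \text{(these are called the fitted values)}$$

$$\hat{\varepsilon}_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot} \quad \text{(these are called the residuals)}$$

Note:

$$Y_{ij} = \hat{\mu} + \hat{\tau}_i + \hat{\varepsilon}_{ij}$$

$$Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$$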
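The estimates above can be computed directly from the raw data. The sketch below uses hypothetical data with unequal group sizes and a hypothetical helper `fit_one_way`; it takes the grand mean to be the mean of all N observations.

```python
from statistics import mean

def fit_one_way(groups):
    """Return the grand mean, estimated treatment effects, fitted values
    (the group means) and residuals for a one-way layout."""
    all_obs = [y for g in groups for y in g]
    grand_mean = mean(all_obs)                        # estimate of mu
    group_means = [mean(g) for g in groups]           # fitted values
    effects = [m - grand_mean for m in group_means]   # estimates of tau_i
    residuals = [[y - m for y in g] for g, m in zip(groups, group_means)]
    return grand_mean, effects, group_means, residuals

# Hypothetical data: k = 3 groups of sizes 2, 3 and 3
groups = [[3.0, 5.0], [6.0, 8.0, 10.0], [1.0, 2.0, 3.0]]
grand_mean, effects, fitted, residuals = fit_one_way(groups)
print(grand_mean)   # 4.75
print(effects)      # [-0.75, 3.25, -2.75]
print(residuals)    # [[-1.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
```

Adding the grand mean, the group's effect and the residual recovers each original observation, which is exactly the decomposition written in the note.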

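An Alternative Derivation of the Test Statistic Based on the Model

Starting with

$$Y_{ij} = \hat{\mu} + \hat{\tau}_i + \hat{\varepsilon}_{ij}$$

$$Y_{ij} = \bar{Y}_{\cdot\cdot} + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$$

we obtain

$$(Y_{ij} - \bar{Y}_{\cdot\cdot}) = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$$

After squaring both sides we have,

$$(Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + (Y_{ij} - \bar{Y}_{i\cdot})^2 + 2(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})(Y_{ij} - \bar{Y}_{i\cdot})$$

Finally, after summing over all of the observations and simplifying we obtain

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$$

The cross-product term drops out because, within each group i, the deviations from the group mean sum to zero, i.e. $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$.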

These measures of variation are called “Sum of Squares” and are denoted SS. The above expression can be written

$$SS_{Total} = SS_{Treat} + SS_{Error}$$

The degrees of freedom associated with each of these SS’s have the same relationship and are given below:

df total = df treatment + df error, i.e. (N – 1) = (k – 1) + (N – k), where N is the total sample size.

The mean squares (i.e. variance estimates) discussed above are found by taking the sum of squares and dividing by their associated degrees of freedom, i.e.

$$MS_{Treat} = \frac{SS_{Treat}}{k-1} = \frac{\sum_{i=1}^{k} n_i \hat{\tau}_i^2}{k-1}$$

which is an estimate of the common pop. variance (σ²) only if Ho is true, and

$$MS_{Error} = \frac{SS_{Error}}{N-k} = \hat{\sigma}^2$$

which is an estimate of the common population variance σ² regardless of Ho.

Thus when testing Ho: τi = 0 for all i vs. Ha: τi ≠ 0 for some i, we reject Ho when MSTreat >> MSError, i.e. when the estimated treatment effects are large!
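The sum of squares identity and the resulting mean squares can be checked numerically. The following sketch uses hypothetical data with unequal group sizes; the function name `sums_of_squares` is an assumption made for the example.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return SSTotal, SSTreat, SSError and their degrees of freedom."""
    all_obs = [y for g in groups for y in g]
    N, k = len(all_obs), len(groups)
    grand = mean(all_obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    ss_treat = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return ss_total, ss_treat, ss_error, (N - 1, k - 1, N - k)

# Hypothetical data: k = 3 groups of sizes 2, 3 and 3
groups = [[2.0, 4.0], [5.0, 6.0, 7.0], [9.0, 10.0, 11.0]]
ss_total, ss_treat, ss_error, (df_total, df_treat, df_error) = sums_of_squares(groups)
ms_treat = ss_treat / df_treat
ms_error = ss_error / df_error
print(ss_total, ss_treat, ss_error)     # 67.5 61.5 6.0
print(ms_treat, ms_error, ms_treat / ms_error)
```

For these data SSTotal = 67.5 splits into SSTreat = 61.5 and SSError = 6.0, with degrees of freedom 7 = 2 + 5, so MSTreat = 30.75 is far larger than MSError = 1.2.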

Using the Fit Model approach to perform One-way ANOVA in JMP

Example 3.1 – Treatment Methods for Anorexia Nervosa (cont’d)

We again examine the data from the anorexia weight gain study, but this time we will conduct our analysis using the Fit Model approach in JMP. The Fit Model option will allow us to assess model assumptions using residuals from the fitted model. The other main advantage of using the Fit Model approach is that it will allow us to examine random model effects and, more importantly, to look at analyses with more than one factor of interest, i.e. factorial experiments (e.g. two-way ANOVA). We can also use this approach to analyze data from randomized block design experiments or, in general, where our experiment involves blocking.

The results are shown on the following page.

We again see that treatment effect is significant, indicating the mean weight gains differ significantly across therapy method. We can again conduct Tukey’s HSD analysis to determine which means significantly differ and we can quantify the significant differences.

The Residual by Predicted Plot shows the residuals from the fitted model plotted vs. the predicted values. The residuals (eij) and fitted values (ŷij) from a one-way ANOVA model are given by:

$$\hat{y}_{ij} = \bar{y}_{i\cdot}$$

$$e_{ij} = y_{ij} - \bar{y}_{i\cdot}$$

The spread of the residuals should be equal across the range of the fitted values. If the variation differs, it can indicate problems with the equality of variance assumption. Here we see no evidence of a problem with the residuals. Another assumption “required” for conducting a one-way ANOVA is that the response is normally distributed for each treatment level. This can be assessed by looking at the normal quantile plots of the response values for each level of the factor as we did above; however, when we have a small number of observations per treatment level these plots will not show much. Another way to assess the normality assumption is to construct a normal quantile plot of the residuals from the experiment.
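A rough numerical companion to the Residual by Predicted Plot is to list, for each treatment level, the fitted value alongside the standard deviation of its residuals. The sketch below uses made-up data in which the spread grows with the mean, the pattern that would cast doubt on the equal variance assumption; `residual_spread` is a hypothetical helper, not a JMP feature.

```python
from statistics import mean, stdev

def residual_spread(groups):
    """For each group return (fitted value, SD of the residuals)."""
    summary = []
    for g in groups:
        fitted = mean(g)                      # fitted value = group mean
        residuals = [y - fitted for y in g]   # e_ij = y_ij - group mean
        summary.append((fitted, stdev(residuals)))
    return summary

# Hypothetical data where the variation increases with the mean
groups = [[9.0, 10.0, 11.0], [18.0, 20.0, 22.0], [36.0, 40.0, 44.0]]
for fitted, sd in residual_spread(groups):
    print(fitted, sd)
```

Here the residual standard deviations are 1, 2 and 4 at fitted values 10, 20 and 40, so a plot of residuals versus predicted values would fan out to the right.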

First we save the residuals from the model fit to our spreadsheet as shown above and then we can use Distribution to examine the residuals.

Random Effects One-way ANOVA

Example 3.3 – Variation in Looms in Textile Manufacturing

How much variation in fabric strength is due to the fact that different looms are used to produce the fabric? To answer this question, plant engineers randomly sampled four looms (from many) at the plant and tested the fabric strength of n = 4 fabric samples from each. The data entered into JMP are shown below.

The random effects model for these data is given by:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1,2,3,4 \text{ (looms)}, \quad j = 1,2,3,4 \text{ (replicates)}$$


where we assume,

$$\tau_i \sim N(0, \sigma_\tau^2) \quad \text{and} \quad \varepsilon_{ij} \sim N(0, \sigma^2)$$

This implies that the total variation in the fabric strengths is

$$\mathrm{Var}(y_{ij}) = \sigma_{\text{Total}}^2 = \sigma^2 + \sigma_\tau^2$$

We want to estimate both variance components from the data, and determine what percentage of the total variation can be attributed to the fact that different looms are used to produce fabric in the factory.
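A small simulation can make the variance decomposition concrete. The sketch below is an illustration under assumed values: the two variance components are set to the estimates this handout reports for the loom data, and the overall mean `MU` is hypothetical. Each simulated observation draws a fresh loom effect, so the sample variance should be close to the sum of the two components.

```python
import random
from statistics import pvariance

# Assumed values for illustration only: variance components are set to the
# handout's reported estimates; the overall mean MU is hypothetical.
MU = 95.0
SIGMA2_LOOM = 6.96   # loom-to-loom variance, sigma_tau^2
SIGMA2_ERROR = 1.90  # within-loom (error) variance, sigma^2

def simulate_strengths(n_draws, seed=1):
    """Draw y = mu + tau + eps, sampling a new random loom for every observation."""
    rng = random.Random(seed)
    sd_loom = SIGMA2_LOOM ** 0.5
    sd_error = SIGMA2_ERROR ** 0.5
    return [
        MU + rng.gauss(0.0, sd_loom) + rng.gauss(0.0, sd_error)
        for _ in range(n_draws)
    ]

strengths = simulate_strengths(20_000)
total_variance = pvariance(strengths)
loom_share = SIGMA2_LOOM / (SIGMA2_LOOM + SIGMA2_ERROR)

print(f"simulated Var(y) = {total_variance:.2f}")
print(f"theoretical Var(y) = {SIGMA2_LOOM + SIGMA2_ERROR:.2f}")
print(f"share of variation due to looms = {100 * loom_share:.1f}%")
```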

Analysis in JMP

Select Analyze > Fit Model, put Strength in the Y box, and put Loom in the model effects box. The critical step is to highlight Loom in the model effects box and select Random Effect from the Attributes pull-down menu as shown below.

Results of E(MS) Method for Estimating Variance Components

We have two methods of estimation at our disposal: the E(MS) approach and the REML (restricted maximum likelihood) method. First we will examine the E(MS) method, which stands for the expected mean square method.


If we use the REML method we get basically the same estimate, along with a 95% CI for $\sigma_{\text{Loom}}^2$.

The CI is too wide to be useful! To get precise estimates of variance components much larger sample sizes are needed.

Confidence Interval for Percent of Variation Due to an Effect (assuming equal replicates)

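A $100(1-\alpha)\%$ CI for $\dfrac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2}$ is given by:

$$\frac{L}{1+L} \le \frac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2} \le \frac{U}{1+U}$$

where,

$$L = \frac{1}{n}\left(\frac{MS_{\text{Treatment}}}{MS_E}\cdot\frac{1}{F_{\alpha/2,\,k-1,\,N-k}} - 1\right) \quad \text{and} \quad U = \frac{1}{n}\left(\frac{MS_{\text{Treatment}}}{MS_E}\cdot\frac{1}{F_{1-(\alpha/2),\,k-1,\,N-k}} - 1\right)$$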

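Notes on the E(MS) output:

p-value = .0002. Variation in the response due to looms is statistically significant.

E(MS) Table:

$$E(MS_{\text{Loom}}) = \sigma^2 + 4\sigma_{\text{Loom}}^2$$

$$E(MS_{\text{Error}}) = \sigma^2$$

Variance component estimates:

$$\hat{\sigma}_{\text{Loom}}^2 = 6.96, \qquad \hat{\sigma}^2 = 1.90$$

$$\hat{\sigma}_{\text{Loom}}^2 = \frac{MS_{\text{Loom}} - MS_{\text{Error}}}{n} = \frac{29.72 - 1.90}{4} = 6.96$$

This estimate comes directly from the E(MS) above, thus the name of the estimation method.

% variation due to looms = 78.6%

% variation due to error = 21.4%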
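The E(MS) arithmetic can be sketched as a short Python function. This is a minimal illustration for a balanced one-way random effects design; the function name and the truncation of a negative estimate at zero are assumptions made here, not something taken from JMP or the handout.

```python
def ems_variance_components(ms_treatment, ms_error, n_reps):
    """E(MS) (method-of-moments) estimates for a balanced one-way random effects model.

    Solves E(MS_treatment) = sigma^2 + n * sigma_tau^2 and E(MS_error) = sigma^2.
    """
    sigma2_error = ms_error
    sigma2_treatment = (ms_treatment - ms_error) / n_reps
    # If MS_treatment < MS_error the estimate is negative; truncating at zero
    # is a common convention and is an assumption of this sketch.
    sigma2_treatment = max(sigma2_treatment, 0.0)
    total = sigma2_treatment + sigma2_error
    return {
        "sigma2_treatment": sigma2_treatment,
        "sigma2_error": sigma2_error,
        "pct_treatment": 100.0 * sigma2_treatment / total,
        "pct_error": 100.0 * sigma2_error / total,
    }

# Mean squares for the loom data as reported in the handout, n = 4 replicates per loom.
result = ems_variance_components(ms_treatment=29.72, ms_error=1.90, n_reps=4)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because the inputs are rounded mean squares, the percentages may differ from the JMP output in the last displayed digit.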


To construct a 95% CI for $\dfrac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2}$ for the loom data we first need the F-quantiles, which can be found using either the file F-quantile Calculator.JMP on the class server or the F-table in the appendix of your text. To find the upper F-quantile using the table you have to use the fact that

$$F_{1-(\alpha/2),\,a-1,\,N-a} = \frac{1}{F_{\alpha/2,\,N-a,\,a-1}}$$

where $a = k = 4$ is the number of looms and $N = 16$ is the total number of observations. Thus

$$F_{\alpha/2,\,a-1,\,N-a} = F_{.025,\,3,\,12} = 4.4742 \quad \text{and} \quad F_{1-(\alpha/2),\,a-1,\,N-a} = F_{.975,\,3,\,12} = .06975$$

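Thus we have,

$$L = \frac{1}{4}\left(\frac{29.73}{1.90}\cdot\frac{1}{4.47} - 1\right) = .625 \quad \text{and} \quad U = \frac{1}{4}\left(\frac{29.73}{1.90}\cdot\frac{1}{.06975} - 1\right) = 55.633$$

which gives,

$$\frac{.625}{1 + .625} \le \frac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2} \le \frac{55.633}{1 + 55.633}$$

and finally

$$.38 \le \frac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2} \le .98$$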

So looms account for between 38% and 98% of the total variation in fabric strength. Again, this interval is very wide because of the small number of replicates; however, we can be confident that loom-to-loom variation is not negligible.
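The interval calculation can be reproduced with a short Python sketch. The function name is hypothetical, and the F-quantiles are passed in as inputs (taken from the handout's values) rather than computed, so no statistical library is required. Because the mean squares are rounded, the intermediate values of L and U may differ slightly from those printed above, but the final bounds still round to about .38 and .98.

```python
def variance_share_ci(ms_treatment, ms_error, n_reps, f_upper, f_lower):
    """CI for sigma_tau^2 / (sigma^2 + sigma_tau^2) in a balanced one-way random effects model.

    f_upper is the upper-tail quantile F_{alpha/2, k-1, N-k};
    f_lower is the lower-tail quantile F_{1-alpha/2, k-1, N-k}.
    Both are supplied by the caller (e.g. from an F-table).
    """
    ratio = ms_treatment / ms_error
    lower_l = (ratio / f_upper - 1.0) / n_reps   # L in the handout's notation
    upper_u = (ratio / f_lower - 1.0) / n_reps   # U in the handout's notation
    return lower_l / (1.0 + lower_l), upper_u / (1.0 + upper_u)

# Loom data: MS_Loom = 29.73, MSE = 1.90, n = 4, with 3 and 12 degrees of freedom.
lower, upper = variance_share_ci(
    ms_treatment=29.73,
    ms_error=1.90,
    n_reps=4,
    f_upper=4.4742,   # F_{.025, 3, 12}
    f_lower=0.06975,  # F_{.975, 3, 12}
)
print(f"95% CI for the loom share of variation: ({lower:.2f}, {upper:.2f})")
```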
