SAS/STAT® 9.22 User’s Guide: Introduction to Analysis of Variance Procedures (Book Excerpt)

SAS® Documentation

This document is an individual chapter from SAS/STAT® 9.22 User’s Guide.

The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT® 9.22 User’s Guide. Cary, NC: SAS Institute Inc.

Copyright © 2010, SAS Institute Inc., Cary, NC, USA

All rights reserved. Produced in the United States of America.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st electronic book, May 2010

SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

Chapter 5

Introduction to Analysis of Variance Procedures

Contents

Overview: Analysis of Variance Procedures
    Procedures That Perform Sum of Squares Analysis of Variance
    Procedures That Perform General Analysis of Variance
Statistical Details for Analysis of Variance
    From Sums of Squares to Linear Hypotheses
    Tests of Effects Based on Expected Mean Squares
Analysis of Variance for Fixed-Effect Models
    PROC GLM for General Linear Models
    PROC ANOVA for Balanced Designs
    Comparing Group Means
    PROC TTEST for Comparing Two Groups
Analysis of Variance for Categorical Data and Generalized Linear Models
Nonparametric Analysis of Variance
Constructing Analysis of Variance Designs
References

Overview: Analysis of Variance Procedures

The statistical term “analysis of variance” is used in a variety of circumstances in statistical theory and applications. In the narrowest sense, and the original sense of the phrase, it signifies a decomposition of a variance into contributing components. This was the sense used by R. A. Fisher when he defined the term to mean the expression of genetic variance as a sum of variance components due to environment, heredity, and so forth:

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_p^2$$

In this sense of the term, the SAS/STAT procedures that fit variance component models, such as the GLIMMIX, HPMIXED, MIXED, NESTED, and VARCOMP procedures, are “true” analysis of variance procedures.

Analysis of variance methodology in a slightly broader sense, and the sense most frequently understood today, applies the idea of an additive decomposition of variance to an additive decomposition of sums of squares, whose expected values are functionally related to components of variation. A collection of sums of squares that measure and can be used for inference about meaningful features of a model is called a sum of squares analysis of variance, whether or not such a collection is an additive decomposition. In a linear model, the decomposition of sums of squares can be expressed in terms of projections onto orthogonal subspaces spanned by the columns of the design matrix $\mathbf{X}$. This is the general approach followed in the section “Analysis of Variance” on page 61 in Chapter 3, “Introduction to Statistical Modeling with SAS/STAT Software.” Depending on the statistical question at hand, the projections can be formulated based on estimable functions, with different types of estimable functions giving rise to different types of sums of squares. Note that not all sum of squares analyses necessarily correspond to additive decompositions. For example, the Type III sums of squares often test hypotheses about the model that are more meaningful than those corresponding to the Type I sums of squares. But while the Type I sums of squares additively decompose the sum of squares due to all model contributions, the Type III sums of squares do not necessarily add up to any useful quantity. The four types of estimable functions in SAS/STAT software, their interpretation, and their construction are discussed in Chapter 15, “The Four Types of Estimable Functions.” The application of sum of squares analyses is not necessarily limited to models with classification effects (factors). The methodology also applies to linear regression models that contain only continuous regressor variables.

An even broader sense of the term “analysis of variance” pertains to statistical models that contain classification effects (factors), and in particular, to models that contain only classification effects. Any statistical approach that measures features of such a model and can be used for inference is called a general analysis of variance. Thus the procedures for general analysis of variance in SAS/STAT are considered to be those that can fit statistical models containing factors, whether the data are experimental or observational. Some procedures for general analysis of variance have a statistical estimation principle that gives rise to a sum of squares analysis as discussed previously; others express a factor’s contribution to the model fit in some other form. Note that this view of analysis of variance includes, for example, maximum likelihood estimation in generalized linear models with the GENMOD procedure, restricted maximum likelihood estimation in linear mixed models with the MIXED procedure, the estimation of variance components with the VARCOMP procedure, the comparison of means of groups with the TTEST procedure, and the nonparametric analysis of rank scores with the NPAR1WAY procedure.

In summary, analysis of variance in the contemporary sense of statistical modeling and analysis is more aptly described as analysis of variation, the study of the influences on the variation of a phenomenon. This can take, for example, the following forms:

• an analysis of variance table based on sums of squares followed by more specific inquiries into the relationship among factors and their levels

• a deviance decomposition in a generalized linear model

• a series of Type III tests followed by comparisons of least squares means in a mixed model

Procedures That Perform Sum of Squares Analysis of Variance

The flagship procedure in SAS/STAT software for linear modeling with sum of squares analysis techniques is the GLM procedure. It handles most standard analysis of variance problems. The following list provides descriptions of PROC GLM and other procedures that are used for more specialized situations:

ANOVA performs analysis of variance, multivariate analysis of variance, and repeated measures analysis of variance for balanced designs. PROC ANOVA also performs multiple comparison tests on arithmetic means.

GLM performs analysis of variance, regression, analysis of covariance, repeated measures analysis, and multivariate analysis of variance. PROC GLM produces several diagnostic measures, performs tests for random effects, provides contrasts and estimates for customized hypothesis tests, provides tests for means adjusted for covariates, and performs multiple-comparison tests on both arithmetic and adjusted means.

LATTICE computes the analysis of variance and analysis of simple covariance for data from an experiment with a lattice design. PROC LATTICE analyzes balanced square lattices, partially balanced square lattices, and some rectangular lattices.

MIXED performs mixed model analysis of variance and repeated measures analysis of variance via covariance structure modeling. When you choose one of the method-of-moment estimation techniques, the MIXED procedure produces an analysis of variance table with sums of squares, mean squares, and expected mean squares. PROC MIXED constructs statistical tests and intervals, allows customized contrasts and estimates, and computes empirical Bayes predictions.

NESTED performs analysis of variance and analysis of covariance for purely nested random models.

ORTHOREG performs regression by using the Gentleman-Givens computational method. For ill-conditioned data, PROC ORTHOREG can produce more accurate parameter estimates than other procedures, such as PROC GLM. See Chapter 63, “The ORTHOREG Procedure,” for more information.

VARCOMP estimates variance components for random or mixed models. If you choose the METHOD=TYPE1 or METHOD=GRR option, the VARCOMP procedure produces an analysis of variance table with sums of squares that correspond to the random effects in your models.

TRANSREG fits univariate and multivariate linear models, optionally with spline and other nonlinear transformations. Models include ordinary regression and ANOVA, multiple and multivariate regression, metric and nonmetric conjoint analysis, metric and nonmetric vector and ideal point preference mapping, redundancy analysis, canonical correlation, and response surface regression. See Chapter 91, “The TRANSREG Procedure,” for more information.

Procedures That Perform General Analysis of Variance

Many procedures in SAS/STAT enable you to incorporate classification effects into your model and to perform statistical inferences for experimental factors and their interactions. These procedures do not necessarily rely on sums of squares decompositions to perform these inferences.

CATMOD fits linear models and performs analysis of variance and repeated measures analysis of variance for categorical responses. See Chapter 8, “Introduction to Categorical Data Analysis Procedures,” and Chapter 28, “The CATMOD Procedure,” for more information.

GENMOD fits generalized linear models. PROC GENMOD is especially suited for responses with discrete outcomes, and it performs logistic regression and Poisson regression as well as fitting generalized estimating equations for repeated measures data. Bayesian analysis capabilities for generalized linear models are also available with the GENMOD procedure. See Chapter 8, “Introduction to Categorical Data Analysis Procedures,” and Chapter 37, “The GENMOD Procedure,” for more information.

GLIMMIX fits generalized linear mixed models by likelihood-based methods. PROC GLIMMIX offers many facilities for analyzing and comparing classification effects and their levels, including multiplicity-adjusted linear estimates. See Chapter 38, “The GLIMMIX Procedure,” for more information.

LOGISTIC fits logistic models for binomial and ordinal outcomes. PROC LOGISTIC provides a wide variety of model-building methods and computes numerous regression diagnostics. See Chapter 8, “Introduction to Categorical Data Analysis Procedures,” and Chapter 51, “The LOGISTIC Procedure,” for more information.

NPAR1WAY performs nonparametric one-way analysis of rank scores.

TTEST compares the means of two groups of observations.

The following section discusses procedures in SAS/STAT that compute analysis of variance in models with classification factors in the narrow sense; that is, they produce analysis of variance tables and form F tests based on sums of squares, mean squares, and expected mean squares.

The subsequent sections discuss procedures that perform statistical inference in models with classification effects in the broader sense.

The following section also presents an overview of some of the fundamental features of analysis of variance. Subsequent sections describe how this analysis is performed with procedures in SAS/STAT software. For more detail, see the chapters for the individual procedures. Additional sources are described in the section “References” on page 123.

Statistical Details for Analysis of Variance

From Sums of Squares to Linear Hypotheses

Analysis of variance (ANOVA) is a technique for analyzing data in which one or more response (or dependent or simply Y) variables are measured under various conditions identified by one or more classification variables. The combinations of levels for the classification variables form the cells of the design for the data. This design can be the result of a controlled experiment or the result of an observational study in which you observe factors and factor level combinations in an uncontrolled environment. For example, an experiment might measure weight change (the dependent variable) for men and women who participated in three different weight-loss programs. The six cells of the design are formed by the six combinations of gender (men, women) and program (A, B, C).

In an analysis of variance, the variation in the response is separated into variation attributable to differences between the classification variables and variation attributable to random error. An analysis of variance constructs tests to determine the significance of the classification effects. A typical goal in such an analysis is to compare means of the response variable for various combinations of the classification variables.

The least squares principle is central to computing sums of squares in analysis of variance models. Suppose that you are fitting the linear model $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ and that the error terms satisfy the usual assumptions (uncorrelated, zero mean, homogeneous variance). Further, suppose that $\mathbf{X}$ is partitioned according to several model effects, $\mathbf{X} = [\mathbf{X}_1 \; \mathbf{X}_2 \; \cdots \; \mathbf{X}_k]$. If $\hat{\boldsymbol{\beta}}$ denotes the ordinary least squares solution for this model, then the sum of squares attributable to the overall model can be written as

$$\mathrm{SSM} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} = \mathbf{Y}'\mathbf{H}\mathbf{Y}$$

where $\mathbf{H}$ is the “hat” matrix $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$. (This model sum of squares is not yet corrected for the presence of an explicit or implied intercept. This adjustment would consist of subtracting $n\bar{Y}^2$ from SSM.) Because of the properties of the hat matrix $\mathbf{H}$, you can write $\mathbf{X}' = \mathbf{X}'\mathbf{H}$ and $\mathbf{H}\mathbf{X} = \mathbf{X}$. The (uncorrected) model sum of squares thus can also be written as

$$\mathrm{SSM} = \hat{\boldsymbol{\beta}}'(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}}$$

This step is significant, because it demonstrates that sums of squares can be identified with quadratic functions in the least squares coefficients. The generalization of this idea is to do the following:

• consider hypotheses of interest in an analysis of variance model

• express the hypotheses in terms of linear estimable functions of the parameters

• compute the sums of squares associated with the estimable function

• construct statistical tests based on the sums of squares
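The equivalent expressions for the uncorrected model sum of squares can be checked numerically. The following is a minimal sketch in Python (not SAS syntax), assuming a full-rank simple regression with an intercept so that the generalized inverse reduces to the ordinary 2x2 inverse; the data values and variable names are made up for illustration.

```python
# Simple linear regression with an intercept: the columns of X are [1, x].
# The data are hypothetical and serve only to illustrate the identities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Elements of X'X and X'Y for the two-column design matrix.
sx = sum(x)
sxx = sum(v * v for v in x)
sy = sum(y)
sxy = sum(a * b for a, b in zip(x, y))

# Ordinary least squares solution through the explicit 2x2 inverse of X'X.
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# Three equivalent forms of the uncorrected model sum of squares.
ssm_bxy = b0 * sy + b1 * sxy                                # beta' X'Y
fitted = [b0 + b1 * v for v in x]                           # HY
ssm_yhy = sum(f * v for f, v in zip(fitted, y))             # Y'HY
ssm_bxxb = n * b0 * b0 + 2 * sx * b0 * b1 + sxx * b1 * b1   # beta' (X'X) beta

# Correcting for the intercept subtracts n * ybar^2.
ssm_corrected = ssm_bxy - n * (sy / n) ** 2

print(ssm_bxy, ssm_yhy, ssm_bxxb, ssm_corrected)
```

The three uncorrected values agree up to floating-point rounding, and the corrected value equals the squared slope times the centered sum of squares of x, as expected for a one-regressor model.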

Decomposing a model sum of squares into sequential, additive components, testing the significance of experimental factors, comparing factor levels, and performing other statistical inferences fall within this generalization. Suppose that $\mathbf{L}\boldsymbol{\beta}$ is an estimable function (see the section “Estimable Functions” on page 64 in Chapter 3, “Introduction to Statistical Modeling with SAS/STAT Software,” and Chapter 15, “The Four Types of Estimable Functions,” for details). The sum of squares associated with the hypothesis $H\colon \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ is

$$\mathrm{SS}(H) = \mathrm{SS}(\mathbf{L}\boldsymbol{\beta} = \mathbf{0}) = \hat{\boldsymbol{\beta}}'\mathbf{L}'\left(\mathbf{L}(\mathbf{X}'\mathbf{X})^{-}\mathbf{L}'\right)^{-1}\mathbf{L}\hat{\boldsymbol{\beta}}$$

One application would be to form sums of squares associated with the different components of $\mathbf{X}$. For example, you can form a matrix $\mathbf{L}_2$ such that $\mathbf{L}_2\boldsymbol{\beta} = \mathbf{0}$ tests the effect of adding the columns for $\mathbf{X}_2$ to an empty model, or tests the effect of adding $\mathbf{X}_2$ to a model that already contains $\mathbf{X}_1$.

These sums of squares can also be expressed as the difference between two residual sums of squares, since $\mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ can be thought of as a (linear) restriction on the parameter estimates in the model:

$$\mathrm{SS}(H) = \mathrm{SSR}(\text{constrained model}) - \mathrm{SSR}(\text{full model})$$

If, in addition to the usual assumptions mentioned previously, the model errors are assumed to be normally distributed, then $\mathrm{SS}(H)$ follows a distribution that is proportional to a chi-square distribution. This fact, and the independence of $\mathrm{SS}(H)$ from the residual sum of squares, enables you to construct F tests based on sums of squares in least squares models.
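As a small numerical check of the two expressions for SS(H), the sketch below tests the hypothesis of zero slope in a simple regression, using L = [0 1]. It is written in Python rather than SAS and assumes a full-rank design, hypothetical data, and illustrative variable names; it stops at the F statistic and does not compute a p-value.

```python
# Hypothetical data for a simple regression y = b0 + b1 * x + error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

# Least squares estimates from the 2x2 normal equations.
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# H: L beta = 0 with L = [0 1] (zero slope).
# L (X'X)^{-1} L' is the lower-right element of the inverse, n / det.
l_inv_lt = n / det
ss_h_quadratic = b1 * (1.0 / l_inv_lt) * b1

# Residual sums of squares for the full model and for the constrained
# (intercept-only) model.
ssr_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ybar = sy / n
ssr_constrained = sum((yi - ybar) ** 2 for yi in y)
ss_h_difference = ssr_constrained - ssr_full

# F statistic: SS(H) on 1 degree of freedom over the residual mean square
# on n - 2 degrees of freedom.
f_stat = (ss_h_quadratic / 1) / (ssr_full / (n - 2))

print(ss_h_quadratic, ss_h_difference, f_stat)
```

Both routes give the same sum of squares up to rounding, which is why a test of a linear hypothesis can be read either as a quadratic form in the estimates or as a comparison of two fitted models.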

The extension of sum of squares analysis of variance to general analysis of variance for classification effects depends on the fact that the distributional properties of quadratic forms in normal random variables are well understood. It is not necessary to first formulate a sum of squares to arrive at an exact or even approximate F test. The generalization of the expression for $\mathrm{SS}(H)$ is to form test statistics based on quadratic forms

$$\hat{\boldsymbol{\beta}}'\mathbf{L}'\,\mathrm{Var}\!\left[\mathbf{L}\hat{\boldsymbol{\beta}}\right]^{-1}\mathbf{L}\hat{\boldsymbol{\beta}}$$

that follow a chi-square distribution if $\hat{\boldsymbol{\beta}}$ is normally distributed.

Tests of Effects Based on Expected Mean Squares

Statistical tests in analysis of variance models can be constructed by comparing independent mean squares. To test a particular null hypothesis, you compute the ratio of two mean squares that have the same expected value under that hypothesis; if the ratio is much larger than 1, then that constitutes significant evidence against the null. In particular, in an analysis of variance model with fixed effects only, the expected value of each mean square has two components: quadratic functions of fixed parameters and random variation. For example, for a fixed effect called A, the expected value of its mean square is

$$E[\mathrm{MS}(A)] = Q(\boldsymbol{\beta}) + \sigma^2$$

where $\sigma^2$ is the common variance of the $\epsilon_i$.

Under the null hypothesis of no A effect, the fixed portion $Q(\boldsymbol{\beta})$ of the expected mean square is zero. This mean square is then compared to another mean square, say MS(E), that is independent of the first and has the expected value $\sigma^2$. The ratio of the two mean squares

$$F = \frac{\mathrm{MS}(A)}{\mathrm{MS}(E)}$$

has an F distribution under the null hypothesis.

When the null hypothesis is false, the numerator term has a larger expected value, but the expected value of the denominator remains the same. Thus, large F values lead to rejection of the null hypothesis. The probability of getting an F value at least as large as the one observed, given that the null hypothesis is true, is called the significance probability value (or the p-value). A p-value of less than 0.05, for example, indicates that data with no A effect will yield F values as large as the one observed less than 5% of the time. This is usually considered moderate evidence that there is a real A effect. Smaller p-values constitute even stronger evidence. Larger p-values indicate that the effect of interest cannot be distinguished from random noise. In this case, you can conclude either that there is no effect at all or that you do not have enough data to detect the differences being tested.
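The ratio of mean squares can be traced by hand for a small one-way layout. The sketch below, in Python rather than SAS, uses three hypothetical groups of three observations each; the group labels and values are assumptions chosen so that the arithmetic is easy to follow. It computes the F value only, because the F distribution function is not part of the Python standard library.

```python
# Hypothetical balanced one-way layout with a fixed effect A at three levels.
groups = {
    "A1": [1.0, 2.0, 3.0],
    "A2": [2.0, 3.0, 4.0],
    "A3": [6.0, 7.0, 8.0],
}

all_values = [v for obs in groups.values() for v in obs]
grand_mean = sum(all_values) / len(all_values)


def mean(values):
    return sum(values) / len(values)


# Sum of squares for A: group size times squared deviation of each group mean
# from the grand mean.
ss_a = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in groups.values())

# Error sum of squares: squared deviations of observations from their own
# group mean.
ss_e = sum((v - mean(obs)) ** 2 for obs in groups.values() for v in obs)

df_a = len(groups) - 1                 # levels minus one
df_e = len(all_values) - len(groups)   # observations minus levels

ms_a = ss_a / df_a
ms_e = ss_e / df_e
f_value = ms_a / ms_e

print(ss_a, ss_e, f_value)  # 42.0 6.0 21.0
```

Here the group means are 2, 3, and 7 around a grand mean of 4, so SS(A) = 42 on 2 degrees of freedom and SS(E) = 6 on 6 degrees of freedom, giving F = 21.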

The actual pattern in expected mean squares of terms related to fixed quantities ($Q(\boldsymbol{\beta})$) and functions of variance components depends on which terms in your model are fixed effects and which terms are random effects. This has bearing on how F statistics can be constructed. In some instances, exact tests are not available, such as when a linear combination of expected mean squares is necessary to form a proper denominator for an F test and a Satterthwaite approximation is used to determine the degrees of freedom of the approximation. The GLM and MIXED procedures can generate tables of expected mean squares and compute degrees of freedom by Satterthwaite’s method. The MIXED and GLIMMIX procedures can apply Satterthwaite approximations and other degrees-of-freedom computations more widely than in analysis of variance models. See the section “Fixed, Random, and Mixed Models” on page 33 in Chapter 3, “Introduction to Statistical Modeling with SAS/STAT Software,” for a discussion of fixed versus random effects in statistical models.

Analysis of Variance for Fixed-Effect Models

PROC GLM for General Linear Models

The GLM procedure is the flagship tool for classical analysis of variance in SAS/STAT software. It performs analysis of variance by using least squares regression to fit general linear models. Among the statistical methods available in PROC GLM are regression, analysis of variance, analysis of covariance, multivariate analysis of variance, repeated measures analysis, and partial correlation analysis.

While PROC GLM can handle most common analysis of variance problems, other procedures are more efficient or have more features than PROC GLM for certain specialized analyses, or they can handle specialized models that PROC GLM cannot. Much of the rest of this chapter is concerned with comparing PROC GLM to other procedures.

PROC ANOVA for Balanced Designs

When you design an experiment, you choose how many experimental units to assign to each combination of levels (or cells) in the classification. In order to achieve good statistical properties and simplify the computations, you typically attempt to assign the same number of units to every cell in the design. Such designs are called balanced designs.

In SAS/STAT software, you can use the ANOVA procedure to perform analysis of variance for balanced data. The ANOVA procedure performs computations for analysis of variance that assume the balanced nature of the data. These computations are simpler and more efficient than the corresponding general computations performed by PROC GLM. Note that PROC ANOVA can be applied to certain designs that are not balanced in the strict sense of equal numbers of observations for all cells. These additional designs include all one-way models, regardless of how unbalanced the cell counts are, as well as Latin squares, which do not have data in all cells. In general, however, the ANOVA procedure is recommended only for balanced data. If you use ANOVA to analyze a design that is not balanced, you must assume responsibility for the validity of the output. You are responsible for recognizing incorrect results, which might include negative values reported for the sums of squares. If you are not certain that your data fit into a balanced design, then you probably need the framework of general linear models in the GLM procedure.
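Whether a data set is balanced in the strict sense can be checked by counting observations per cell. The following Python sketch is one way to do that outside of SAS; the function names `cell_counts` and `is_balanced` and the sample records (patterned on the gender-by-program example earlier in this chapter) are hypothetical.

```python
from collections import Counter
from itertools import product


def cell_counts(rows, factors):
    """Count observations in every cell of the crossed design.

    Cells that never occur in the data are reported with a count of 0.
    """
    levels = [sorted({row[f] for row in rows}) for f in factors]
    counts = Counter(tuple(row[f] for f in factors) for row in rows)
    return {cell: counts.get(cell, 0) for cell in product(*levels)}


def is_balanced(rows, factors):
    """Return True if every cell holds the same number of observations."""
    return len(set(cell_counts(rows, factors).values())) == 1


# Hypothetical records: two observations in each of the six cells.
balanced_rows = [
    {"gender": g, "program": p}
    for g in ("men", "women")
    for p in ("A", "B", "C")
    for _ in range(2)
]

# Dropping one record leaves one cell with a single observation.
unbalanced_rows = balanced_rows[:-1]

print(is_balanced(balanced_rows, ["gender", "program"]))    # True
print(is_balanced(unbalanced_rows, ["gender", "program"]))  # False
```

This check implements only the strict definition of equal cell counts. It would report one-way models with unequal group sizes and Latin squares as unbalanced, even though the text above notes that PROC ANOVA can still be applied to those designs.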

Comparing Group Means

The F test for a classification factor that has more than two levels tells you whether the level effects are significantly different from each other, but it does not tell you which levels differ from which other levels.

If the level comparisons are expressed through differences of the arithmetic cell means, you can use the MEANS statement in the GLM and ANOVA procedures for comparison. If arithmetic means are not appropriate for comparison, for example, because your data are unbalanced or means need to be adjusted for other model effects, then you can use the LSMEANS statement in the GLIMMIX, GLM, and MIXED procedures for level comparisons.

If you have specific comparisons in mind, you can use the CONTRAST statement in these procedures to make these comparisons. However, if you make many comparisons that use some given significance level (0.05, for example), you are more likely to make a Type I error (incorrectly rejecting a hypothesis that the means are equal) simply because you have more chances to make the error.

Multiple-comparison methods give you more detailed information about the differences among the means and enable you to control error rates for a multitude of comparisons. A variety of multiple-comparison methods are available with the MEANS statement in both the ANOVA and GLM procedures, as well as the LSMEANS statement in the GLIMMIX, GLM, and MIXED procedures. These are described in detail in the section “Multiple Comparisons” on page 3070 in Chapter 39, “The GLM Procedure,” and in Chapter 38, “The GLIMMIX Procedure,” and Chapter 56, “The MIXED Procedure.”
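The growth of the error rate with the number of comparisons can be made concrete with a short calculation. The Python sketch below assumes, for simplicity, that the comparisons are independent, which is generally not true for pairwise comparisons among the same group means; it also shows the Bonferroni per-comparison level as one simple adjustment. The function names are illustrative and are not SAS options.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one Type I error in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-comparison level that keeps the familywise rate at or below alpha."""
    return alpha / m


for m in (1, 3, 10):
    unadjusted = familywise_error_rate(0.05, m)
    adjusted = familywise_error_rate(bonferroni_level(0.05, m), m)
    print(m, round(unadjusted, 4), round(adjusted, 4))
```

With ten independent comparisons at the 0.05 level, the chance of at least one false rejection is about 0.40, whereas testing each comparison at 0.005 keeps the overall rate just under 0.05.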

PROC TTEST for Comparing Two Groups

If you want to perform an analysis of variance and have only one classification variable with two levels, you can use PROC TTEST. In this special case, the results generated by PROC TTEST are equivalent to the results generated by PROC ANOVA or PROC GLM.

You can use PROC TTEST with balanced or unbalanced groups. In addition to the test assuming equal variances, PROC TTEST also performs a Satterthwaite test assuming unequal variances.
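The equivalence in the two-level case follows from the fact that the squared pooled-variance t statistic equals the one-way F statistic. The Python sketch below illustrates this with two hypothetical, unequal-sized groups and also computes the Satterthwaite approximate degrees of freedom for the unequal-variance test. It illustrates the arithmetic only and does not reproduce PROC TTEST output.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Hypothetical data for two unbalanced groups.
g1 = [5.1, 4.8, 6.0, 5.5, 5.2]
g2 = [6.3, 6.9, 5.8, 7.1]
n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)
v1, v2 = sample_variance(g1), sample_variance(g2)

# Pooled-variance t statistic (equal variances assumed).
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic for the same two groups.
grand_mean = (sum(g1) + sum(g2)) / (n1 + n2)
ms_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2  # 1 df
f_value = ms_between / pooled_var

# Unequal-variance t statistic with Satterthwaite degrees of freedom.
se_sq = v1 / n1 + v2 / n2
t_unequal = (m1 - m2) / sqrt(se_sq)
df_satterthwaite = se_sq ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

print(t_pooled ** 2, f_value, t_unequal, df_satterthwaite)
```

The first two printed values coincide up to rounding. The Satterthwaite degrees of freedom always fall between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2.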

The TTEST procedure also performs equivalence tests, computes confidence limits, and supports both normal and lognormal data. If you have an AB/BA crossover design with no carryover effects, then you can use the TTEST procedure to analyze the treatment and period effects.

The NPAR1WAY procedure performs nonparametric analogues to t tests. See Chapter 16, “Introduction to Nonparametric Analysis,” for an overview and Chapter 62, “The NPAR1WAY Procedure,” for details on PROC NPAR1WAY.

Analysis of Variance for Categorical Data and Generalized Linear Models

A categorical variable is defined as one that can assume only a limited number of values. For example, a person’s gender is a categorical variable that can assume one of two values. Variables with levels that simply name a group are said to be measured on a nominal scale. Categorical variables can also be measured using an ordinal scale, which means that the levels of the variable are ordered in some way. For example, responses to an opinion poll are usually measured on an ordinal scale, with levels ranging from “strongly disagree” to “no opinion” to “strongly agree.”

For two categorical variables, one measured on an ordinal scale and one measured on a nominal scale, you can assign scores to the levels of the ordinal variable and test whether the mean scores for the different levels of the nominal variable are significantly different. This process, which can be performed by PROC CATMOD, is analogous to performing an analysis of variance on continuous data. If there are n nominal variables, rather than 1, then PROC CATMOD can perform an n-way analysis of variance of the mean scores.

For two categorical variables measured on a nominal scale, you can test whether the distribution of the first variable is significantly different for the levels of the second variable. This process is an analysis of variance of proportions, rather than means, and can be performed by PROC CATMOD. The corresponding n-way analysis of variance can also be performed by PROC CATMOD.

See Chapter 8, “Introduction to Categorical Data Analysis Procedures,” and Chapter 28, “TheCATMOD Procedure,” for more information.

The GENMOD procedure uses maximum likelihood estimation to fit generalized linear models. This family includes models for categorical data such as logistic, probit, and complementary log-log regression for binomial data and Poisson regression for count data, as well as continuous models such as ordinary linear regression, gamma, and inverse Gaussian regression models. PROC GENMOD performs analysis of variance through likelihood ratio and Wald tests of fixed effects in generalized linear models, and provides contrasts and estimates for customized hypothesis tests. It performs analysis of repeated measures data with generalized estimating equation (GEE) methods.

See Chapter 8, “Introduction to Categorical Data Analysis Procedures,” and Chapter 37, “TheGENMOD Procedure,” for more information.

Nonparametric Analysis of Variance

Analysis of variance is sensitive to the distribution of the error term. If the error term is not normally distributed, the statistics based on normality can be misleading. The traditional test statistics are called parametric tests because they depend on the specification of a certain probability distribution except for a set of free parameters. Parametric tests are said to depend on distributional assumptions. Nonparametric methods perform the tests without making any strict distributional assumptions. Even if the data are distributed normally, nonparametric methods are often almost as powerful as parametric methods.

Most nonparametric methods are based on taking the ranks of a variable and analyzing these ranks (or transformations of them) instead of the original values. The NPAR1WAY procedure performs a nonparametric one-way analysis of variance. Other nonparametric tests can be performed by taking ranks of the data (using the RANK procedure) and using a regular parametric procedure (such as GLM or ANOVA) to perform the analysis. Some of these techniques are outlined in the description of PROC RANK in SAS Language Reference: Concepts and in Conover and Iman (1981).
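The rank-based idea can be sketched in a few lines. The Python example below pools hypothetical observations from three groups, replaces them with ranks (averaging ranks for tied values), and computes the Kruskal-Wallis statistic from the rank sums. It omits the correction for ties in the statistic itself and is an illustration of the technique, not a reproduction of PROC RANK or PROC NPAR1WAY output; the function names are assumptions.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic from pooled ranks (no tie correction)."""
    pooled = [v for obs in groups for v in obs]
    ranks = midranks(pooled)
    n_total = len(pooled)
    h = 0.0
    start = 0
    for obs in groups:
        rank_sum = sum(ranks[start:start + len(obs)])
        h += rank_sum ** 2 / len(obs)
        start += len(obs)
    return 12.0 / (n_total * (n_total + 1)) * h - 3.0 * (n_total + 1)


# Hypothetical data without ties.
data = [[1.2, 2.3, 3.1], [2.9, 3.8, 4.4], [6.5, 7.0, 8.1]]
print(midranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(kruskal_wallis(data))
```

The same `midranks` output could instead be passed to an ordinary one-way analysis of variance, which corresponds to the rank-transform approach described above.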

Constructing Analysis of Variance Designs

Analysis of variance is most often used for data from designed experiments. You can use the PLAN procedure to construct designs for many experiments. For example, PROC PLAN constructs designs for completely randomized experiments, randomized blocks, Latin squares, factorial experiments, certain balanced incomplete block designs, and balanced crossover designs.
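To make one of these design types concrete, the following Python sketch builds a Latin square by cyclically shifting the list of treatments, so that each treatment appears once in every row and once in every column. This is one standard construction chosen for illustration; it is not a description of how PROC PLAN generates its designs, and the function name is hypothetical.

```python
def cyclic_latin_square(treatments):
    """Return a t x t Latin square built by cyclically shifting the treatment list."""
    t = len(treatments)
    return [[treatments[(row + col) % t] for col in range(t)] for row in range(t)]

# Hypothetical example with four treatments
square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
```

In practice the rows, columns, and treatment labels of such a square would be randomized before use.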

Randomization, or randomly assigning experimental units to cells in a design and to treatments within a cell, is another important aspect of experimental design. For either a new or an existing design, you can use PROC PLAN to randomize the experimental plan.
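The following Python sketch shows what randomization means for one common layout, a randomized complete block design: every block receives each treatment once, in an independently shuffled order. It is a minimal illustration under that assumption rather than a rendering of PROC PLAN. The function name, treatment labels, and seed value are hypothetical; the seed is there only to make the plan reproducible.

```python
import random

def randomized_complete_blocks(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)  # private generator so the plan can be reproduced
    plan = []
    for _ in range(n_blocks):
        order = list(treatments)  # copy so the caller's list is not modified
        rng.shuffle(order)
        plan.append(order)
    return plan

# Hypothetical plan: three treatments in four blocks
plan = randomized_complete_blocks(["A", "B", "C"], n_blocks=4, seed=2010)
for block, order in enumerate(plan, start=1):
    print(f"Block {block}: {' '.join(order)}")
```

Recording the seed along with the plan lets the same assignment be regenerated later, which is useful when documenting an experiment.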

Additional features for design of experiments are available in SAS/QC software. The FACTEX and OPTEX procedures can construct a wide variety of designs, including factorials, fractional factorials, and D-optimal or A-optimal designs. These procedures, as well as the ADX Interface, provide features for randomizing and replicating designs; saving the design in an output data set; and interactively changing the design by changing its size, use of blocking, or the search strategies used. For more information, see the SAS/QC User’s Guide.


References

Analysis of variance was pioneered by R. A. Fisher (1925). For a general introduction to analysis of variance, see an intermediate statistical methods textbook such as Steel and Torrie (1980), Snedecor and Cochran (1980), Milliken and Johnson (1984), Mendenhall (1968), John (1971), Ott (1977), or Kirk (1968). A classic source is Scheffé (1959). Freund, Littell, and Spector (1991) bring together a treatment of these statistical methods and SAS/STAT software procedures. Schlotzhauer and Littell (1997) cover how to perform t tests and one-way analysis of variance with SAS/STAT procedures. Texts on linear models include Searle (1971), Graybill (1976), and Hocking (1984). Kennedy and Gentle (1980) survey the computing aspects. Other references include the following:

Conover, W. J. and Iman, R. L. (1981), “Rank Transformations as a Bridge between Parametric and Nonparametric Statistics,” The American Statistician, 35, 124–129.

Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver & Boyd.

Freund, R. J., Littell, R. C., and Spector, P. C. (1991), SAS System for Linear Models, Cary, NC: SAS Institute Inc.

Graybill, F. A. (1976), Theory and Applications of the Linear Model, North Scituate, MA: Duxbury Press.

Hocking, R. R. (1984), Analysis of Linear Models, Monterey, CA: Brooks-Cole.

John, P. (1971), Statistical Design and Analysis of Experiments, New York: Macmillan.

Kennedy, W. J., Jr. and Gentle, J. E. (1980), Statistical Computing, New York: Marcel Dekker.

Kirk, R. E. (1968), Experimental Design: Procedures for the Behavioral Sciences, Monterey, CA: Brooks-Cole.

Mendenhall, W. (1968), Introduction to Linear Models and the Design and Analysis of Experiments, Belmont, CA: Duxbury Press.

Milliken, G. A. and Johnson, D. E. (1984), Analysis of Messy Data Volume I: Designed Experiments, Belmont, CA: Lifetime Learning Publications.

Ott, L. (1977), Introduction to Statistical Methods and Data Analysis, Second Edition, Belmont, CA: Duxbury Press.

Scheffé, H. (1959), The Analysis of Variance, New York: John Wiley & Sons.

Schlotzhauer, S. D. and Littell, R. C. (1997), SAS System for Elementary Statistical Analysis, Cary, NC: SAS Institute Inc.

Searle, S. R. (1971), Linear Models, New York: John Wiley & Sons.

Snedecor, G. W. and Cochran, W. G. (1980), Statistical Methods, Seventh Edition, Ames: Iowa State University Press.


Steel, R. G. D. and Torrie, J. H. (1980), Principles and Procedures of Statistics, Second Edition, New York: McGraw-Hill.


