Springer Texts in Statistics
Series Editors: G. Casella, S. Fienberg, I. Olkin
For other titles published in this series, go to http://www.springer.com/series/417

Shalabh · Helge Toutenburg

Statistical Analysis of Designed Experiments

Third Edition

Shalabh
Department of Mathematics & Statistics
Indian Institute of Technology
Kanpur-208016
India
[email protected]

Helge Toutenburg
Institut für Statistik
Ludwig-Maximilians-Universität
Akademiestraße 1
München
Germany

STS Editorial Board:
George Casella, Department of Statistics, University of Florida, Gainesville, FL 32611-8545, USA
Stephen Fienberg, Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
Ingram Olkin, Department of Statistics, Stanford University, Stanford, CA 94305, USA

ISSN 1431-875X
ISBN 978-1-4419-1147-6
e-ISBN 978-1-4419-1148-3
DOI 10.1007/978-1-4419-1148-3
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009934435

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface to the Third Edition
This book is the third revised and updated English edition of the German textbook "Versuchsplanung und Modellwahl" by Helge Toutenburg, which was based on more than 15 years' experience of lectures on the course "Design of Experiments" at the University of Munich and on interactions with statisticians from industry and from other areas of applied science and engineering. It is a resource and reference book that contains statistical methods used by researchers in applied areas. Because of the diverse examples combined with software demonstrations, it is also useful as a textbook in more advanced courses.
The applications of design of experiments have seen significant growth in the last few decades in different areas such as industry, pharmaceutical sciences, medical sciences, and engineering sciences. The second edition of this book received appreciation from academicians, teachers, students, and applied statisticians. As a consequence, Springer-Verlag invited Helge Toutenburg to revise it, and he invited Shalabh for the third edition of the book.
In our experience with students, statisticians from industry, and researchers from other fields of experimental science, we realized the importance of several topics in the design of experiments that increase the utility of this book. Moreover, we found that most of the available books explain these topics only theoretically. Students and applied statisticians generally lose their interest and patience when they have to read too much theory before they can understand a topic and use it in applications. So we decided to write up these topics and include them in the third edition of the book. We have attempted to go into theory only up to a necessary level. At several places, we have tried to explain the concepts, methodologies, and utility of the topics through particular cases of designed experiments instead of starting directly with a theoretical setup. We would like to remark that this text may not directly appeal to a reader interested only in theory. Some good references are provided which can be followed later to gain a theoretical grasp after understanding the text of this book.
We have added a new Chapter 6 on incomplete block designs. This chapter starts with an introduction to the general theory of incomplete block designs, which is necessary to understand the analysis of the balanced incomplete block design and the partially balanced incomplete block design introduced afterwards. More emphasis is given to explaining the setup, concept, methodology, and various other aspects of these designs. For the analysis part, the results from the general theory of incomplete block designs are carried over and used directly.
The chapter on "Multifactor Experiments" is extended, and topics on confounding, partial confounding, and fractional replication in factorial experiments are introduced. These topics do not start directly with the theoretical setup. We have rather considered particular cases of factorial designs to explain the intricacies of the related concepts and have developed the necessary tools stepwise. Once a reader understands these steps and gets familiar with the concepts and terminology, all the details can be extended to a general setup.
The derivations of the theoretical results are again put into an appendix so that a reader interested in the applications is not burdened unnecessarily.

We thank Dr. John Kimmel of Springer-Verlag for his help with the third edition of the book.

We invite readers to send their comments and suggestions on the contents and treatment of the topics in the book for possible improvement in future editions.
München, Germany        Helge Toutenburg
Kanpur, India           Shalabh
July 7, 2009
Preface
This book is the second English edition of my German textbook that was originally written parallel to my lecture "Design of Experiments" held at the University of Munich. It is intended as a resource and reference book that contains statistical methods used by researchers in applied areas. Because of the diverse examples, it could also be used as a textbook in more advanced undergraduate courses.
It is often called to our attention, by statisticians in the pharmaceutical industry, that there is a need for a summarizing and standardized representation of the design and analysis of experiments that includes the different aspects of classical theory for continuous response and of modern procedures for a categorical and, especially, correlated response, as well as more complex designs such as, for example, cross-over and repeated measures. The book is therefore useful for nonstatisticians, who may appreciate the versatility of methods and examples, and for statisticians, who will also find theoretical basics and extensions. Thus the book tries to bridge the gap between application and theory in methods dealing with designed experiments.
In order to illustrate the examples we decided to use the software packages SAS, SPLUS, and SPSS. Each of these has advantages over the others, and we hope to have used them in an acceptable way. Concerning the data sets, we give references where possible.
Staff and graduate students played an essential part in the preparation of the manuscript. They wrote the text with well-tried precision, worked out examples (Thomas Nittner), and prepared several sections of the book (Ulrike Feldmeier, Andreas Fieger, Christian Heumann, Sabina Illi, Christian Kastner, Oliver Loch, Thomas Nittner, Elke Ortmann, Andrea Schopp, and Irmgard Strehler).
Especially, I would like to thank Thomas Nittner, who has done a great deal of work on this second edition. We are very appreciative of the efforts of those who assisted in the preparation of the English version. In particular, we would like to thank Sabina Illi and Oliver Loch, as well as V.K. Srivastava (1943–2001), for their careful reading of the English version.
This book is organized as follows. After a short Introduction, with some examples, we give a compact survey of the comparison of two samples (Chapter 2). The well-known linear regression model is discussed in Chapter 3 with many details of a theoretical nature and with emphasis on sensitivity analysis at the end. Chapter 4 contains single-factor experiments with different kinds of factors, an overview of multiple regression, and some special cases, such as regression analysis of variance or models with random effects. More restrictive designs, like the randomized block design or Latin squares, are introduced in Chapter 5. Experiments with more than one factor are described in Chapter 7, with some basics such as, e.g., effect coding. As categorical response variables are present in Chapters 9 and 10, we have put the models for categorical response, though they are more theoretical, in Chapter 8. Chapter 9 contains repeated measures models, with their whole versatility and complexity of designs and testing procedures. A more difficult design, the cross-over, can be found in Chapter 10. Chapter 11 treats the problem of incomplete data. Apart from the basics of matrix algebra (Appendix A), the reader will find some proofs for Chapters 3 and 4 in Appendix B. Last but not least, Appendix C contains the distributions and tables necessary for a better understanding of the examples.
Of course, not all aspects can be taken into account; especially as development in the field of generalized linear models is so dynamic, it is hard to include all current tendencies. In order to keep up with this development, the book contains some more recent methods for the analysis of clusters.
To some extent, concerning linear models and designed experiments, we want to recommend the books by McCulloch and Searle (2000), Wu and Hamada (2000), and Dean and Voss (1998) for supplementary material.
Finally, we would like to thank John Kimmel, Timothy Taylor, and Brian Howe of Springer-Verlag New York for their cooperation and confidence in this book.
Universität München        Helge Toutenburg
March 25, 2002             Thomas Nittner
Contents
Preface to the Third Edition  v

Preface  vii

1 Introduction  1
  1.1 Data, Variables, and Random Processes  1
  1.2 Basic Principles of Experimental Design  3
  1.3 Scaling of Variables  5
  1.4 Measuring and Scaling in Statistical Medicine  7
  1.5 Experimental Design in Biotechnology  8
  1.6 Relative Importance of Effects—The Pareto Principle  9
  1.7 An Alternative Chart  10
  1.8 A One-Way Factorial Experiment by Example  15
  1.9 Exercises and Questions  19

2 Comparison of Two Samples  21
  2.1 Introduction  21
  2.2 Paired t-Test and Matched-Pair Design  22
  2.3 Comparison of Means in Independent Groups  25
    2.3.1 Two-Sample t-Test  25
    2.3.2 Testing H0: σ²_A = σ²_B = σ²  25
    2.3.3 Comparison of Means in the Case of Unequal Variances  26
    2.3.4 Transformations of Data to Assure Homogeneity of Variances  27
    2.3.5 Necessary Sample Size and Power of the Test  27
    2.3.6 Comparison of Means without Prior Testing H0: σ²_A = σ²_B; Cochran-Cox Test for Independent Groups  27
  2.4 Wilcoxon's Sign-Rank Test in the Matched-Pair Design  28
  2.5 Rank Test for Homogeneity of Wilcoxon, Mann and Whitney  33
  2.6 Comparison of Two Groups with Categorical Response  38
    2.6.1 McNemar's Test and Matched-Pair Design  38
    2.6.2 Fisher's Exact Test for Two Independent Groups  40
  2.7 Exercises and Questions  42

3 The Linear Regression Model  45
  3.1 Descriptive Linear Regression  45
  3.2 The Principle of Ordinary Least Squares  47
  3.3 Geometric Properties of Ordinary Least Squares Estimation  50
  3.4 Best Linear Unbiased Estimation  51
    3.4.1 Linear Estimators  52
    3.4.2 Mean Square Error  53
    3.4.3 Best Linear Unbiased Estimation  55
    3.4.4 Estimation of σ²  57
  3.5 Multicollinearity  60
    3.5.1 Extreme Multicollinearity and Estimability  60
    3.5.2 Estimation within Extreme Multicollinearity  61
    3.5.3 Weak Multicollinearity  63
  3.6 Classical Regression under Normal Errors  67
  3.7 Testing Linear Hypotheses  69
  3.8 Analysis of Variance and Goodness of Fit  73
    3.8.1 Bivariate Regression  73
    3.8.2 Multiple Regression  79
  3.9 The General Linear Regression Model  84
    3.9.1 Introduction  84
    3.9.2 Misspecification of the Covariance Matrix  85
  3.10 Diagnostic Tools  87
    3.10.1 Introduction  87
    3.10.2 Prediction Matrix  87
    3.10.3 Effect of a Single Observation on the Estimation of Parameters  91
    3.10.4 Diagnostic Plots for Testing the Model Assumptions  96
    3.10.5 Measures Based on the Confidence Ellipsoid  97
    3.10.6 Partial Regression Plots  102
    3.10.7 Regression Diagnostics by Animating Graphics  105
  3.11 Exercises and Questions  110

4 Single-Factor Experiments with Fixed and Random Effects  113
  4.1 Models I and II in the Analysis of Variance  113
  4.2 One-Way Classification for the Multiple Comparison of Means  115
    4.2.1 Representation as a Restrictive Model  117
    4.2.2 Decomposition of the Error Sum of Squares  119
    4.2.3 Estimation of σ² by MS_Error  123
  4.3 Comparison of Single Means  126
    4.3.1 Linear Contrasts  126
    4.3.2 Contrasts of the Total Response Values in the Balanced Case  129
  4.4 Multiple Comparisons  134
    4.4.1 Introduction  134
    4.4.2 Experimentwise Comparisons  135
    4.4.3 Select Pairwise Comparisons  137
  4.5 Regression Analysis of Variance  144
  4.6 One-Factorial Models with Random Effects  147
  4.7 Rank Analysis of Variance in the Completely Randomized Design  151
    4.7.1 Kruskal-Wallis Test  151
    4.7.2 Multiple Comparisons  154
  4.8 Exercises and Questions  156

5 More Restrictive Designs  159
  5.1 Randomized Block Design  159
  5.2 Latin Squares  168
    5.2.1 Analysis of Variance  169
  5.3 Rank Variance Analysis in the Randomized Block Design  175
    5.3.1 Friedman Test  175
    5.3.2 Multiple Comparisons  177
  5.4 Exercises and Questions  179

6 Incomplete Block Designs  181
  6.1 Introduction  181
  6.2 General Theory of Incomplete Block Designs  183
  6.3 Intrablock Analysis of Incomplete Block Design  185
    6.3.1 Model and Normal Equations  185
    6.3.2 Covariance Matrices of Adjusted Treatment and Block Totals  188
    6.3.3 Decomposition of Sum of Squares and Analysis of Variance  189
  6.4 Interblock Analysis of Incomplete Block Design  193
    6.4.1 Model and Normal Equations  195
    6.4.2 Use of Intrablock and Interblock Estimates  197
  6.5 Balanced Incomplete Block Design  201
    6.5.1 Interpretation of Conditions of BIBD  202
    6.5.2 Intrablock Analysis of BIBD  204
    6.5.3 Interblock Analysis and Recovery of Interblock Information in BIBD  211
  6.6 Partially Balanced Incomplete Block Designs  219
    6.6.1 Partially Balanced Association Schemes  220
    6.6.2 General Theory of PBIBD  229
    6.6.3 Conditions for PBIBD  230
    6.6.4 Interpretations of Conditions of BIBD  230
    6.6.5 Intrablock Analysis of PBIBD With Two Associates  231
  6.7 Exercises and Questions  241

7 Multifactor Experiments  245
  7.1 Elementary Definitions and Principles  245
  7.2 Two-Factor Experiments (Fixed Effects)  249
  7.3 Two-Factor Experiments in Effect Coding  254
  7.4 Two-Factorial Experiment with Block Effects  263
  7.5 Two-Factorial Model with Fixed Effects—Confidence Intervals and Elementary Tests  266
  7.6 Two-Factorial Model with Random or Mixed Effects  270
    7.6.1 Model with Random Effects  270
    7.6.2 Mixed Model  274
  7.7 Three-Factorial Designs  278
  7.8 Split-Plot Design  283
  7.9 2^k Factorial Design  287
    7.9.1 The 2² Design  288
    7.9.2 The 2³ Design  290
  7.10 Confounding  294
  7.11 Analysis of Variance in Case of Confounded Effects  303
  7.12 Partial Confounding  304
  7.13 Fractional Replications  316
  7.14 Exercises and Questions  322

8 Models for Categorical Response Variables  329
  8.1 Generalized Linear Models  329
    8.1.1 Extension of the Regression Model  329
    8.1.2 Structure of the Generalized Linear Model  331
    8.1.3 Score Function and Information Matrix  334
    8.1.4 Maximum Likelihood Estimation  335
    8.1.5 Testing of Hypotheses and Goodness of Fit  338
    8.1.6 Overdispersion  339
    8.1.7 Quasi Loglikelihood  341
  8.2 Contingency Tables  343
    8.2.1 Overview  343
    8.2.2 Ways of Comparing Proportions  344
    8.2.3 Sampling in Two-Way Contingency Tables  347
    8.2.4 Likelihood Function and Maximum Likelihood Estimates  348
    8.2.5 Testing the Goodness of Fit  350
  8.3 Generalized Linear Model for Binary Response  353
    8.3.1 Logit Models and Logistic Regression  353
    8.3.2 Testing the Model  355
    8.3.3 Distribution Function as a Link Function  356
  8.4 Logit Models for Categorical Data  357
  8.5 Goodness of Fit—Likelihood Ratio Test  358
  8.6 Loglinear Models for Categorical Variables  359
    8.6.1 Two-Way Contingency Tables  359
    8.6.2 Three-Way Contingency Tables  362
  8.7 The Special Case of Binary Response  365
  8.8 Coding of Categorical Explanatory Variables  368
    8.8.1 Dummy and Effect Coding  368
    8.8.2 Coding of Response Models  372
    8.8.3 Coding of Models for the Hazard Rate  372
  8.9 Extensions to Dependent Binary Variables  375
    8.9.1 Overview  376
    8.9.2 Modeling Approaches for Correlated Response  377
    8.9.3 Quasi-Likelihood Approach for Correlated Binary Response  378
    8.9.4 The Generalized Estimating Equation Method by Liang and Zeger  379
    8.9.5 Properties of the Generalized Estimating Equation Estimate β_G  381
    8.9.6 Efficiency of the Generalized Estimating Equation and Independence Estimating Equation Methods  383
    8.9.7 Choice of the Quasi-Correlation Matrix R_i(α)  383
    8.9.8 Bivariate Binary Correlated Response Variables  384
    8.9.9 The Generalized Estimating Equation Method  385
    8.9.10 The Independence Estimating Equation Method  386
    8.9.11 An Example from the Field of Dentistry  387
    8.9.12 Full Likelihood Approach for Marginal Models  392
  8.10 Exercises and Questions  392

9 Repeated Measures Model  395
  9.1 The Fundamental Model for One Population  395
  9.2 The Repeated Measures Model for Two Populations  398
  9.3 Univariate and Multivariate Analysis  401
    9.3.1 The Univariate One-Sample Case  401
    9.3.2 The Multivariate One-Sample Case  401
  9.4 The Univariate Two-Sample Case  406
  9.5 The Multivariate Two-Sample Case  407
  9.6 Testing of H0: Σ_x = Σ_y  407
  9.7 Univariate Analysis of Variance in the Repeated Measures Model  409
    9.7.1 Testing of Hypotheses in the Case of Compound Symmetry  409
    9.7.2 Testing of Hypotheses in the Case of Sphericity  411
    9.7.3 The Problem of Nonsphericity  415
    9.7.4 Application of Univariate Modified Approaches in the Case of Nonsphericity  416
    9.7.5 Multiple Tests  417
    9.7.6 Examples  418
  9.8 Multivariate Rank Tests in the Repeated Measures Model  424
  9.9 Categorical Regression for the Repeated Binary Response Data  429
    9.9.1 Logit Models for the Repeated Binary Response for the Comparison of Therapies  429
    9.9.2 First-Order Markov Chain Models  430
    9.9.3 Multinomial Sampling and Loglinear Models for a Global Comparison of Therapies  432
  9.10 Exercises and Questions  439

10 Cross-Over Design  441
  10.1 Introduction  441
  10.2 Linear Model and Notations  442
  10.3 2×2 Cross-Over (Classical Approach)  443
    10.3.1 Analysis Using t-Tests  444
    10.3.2 Analysis of Variance  449
    10.3.3 Residual Analysis and Plotting the Data  453
    10.3.4 Alternative Parametrizations in 2×2 Cross-Over  457
    10.3.5 Cross-Over Analysis Using Rank Tests  468
  10.4 2×2 Cross-Over and Categorical (Binary) Response  468
    10.4.1 Introduction  468
    10.4.2 Loglinear and Logit Models  473
  10.5 Exercises and Questions  485

11 Statistical Analysis of Incomplete Data  487
  11.1 Introduction  487
  11.2 Missing Data in the Response  492
    11.2.1 Least Squares Analysis for Complete Data  492
    11.2.2 Least Squares Analysis for Filled-Up Data  493
    11.2.3 Analysis of Covariance—Bartlett's Method  494
  11.3 Missing Values in the X-Matrix  495
    11.3.1 Missing Values and Loss of Efficiency  497
    11.3.2 Standard Methods for Incomplete X-Matrices  499
  11.4 Adjusting for Missing Data in 2×2 Cross-Over Designs  502
    11.4.1 Notation  502
    11.4.2 Maximum Likelihood Estimator (Rao, 1956)  504
    11.4.3 Test Procedures  505
  11.5 Missing Categorical Data  510
    11.5.1 Introduction  510
    11.5.2 Maximum Likelihood Estimation in the Complete Data Case  511
    11.5.3 Ad-Hoc Methods  511
    11.5.4 Model-Based Methods  512
  11.6 Exercises and Questions  515

A Matrix Algebra  517
  A.1 Introduction  517
  A.2 Trace of a Matrix  520
  A.3 Determinant of a Matrix  520
  A.4 Inverse of a Matrix  522
  A.5 Orthogonal Matrices  523
  A.6 Rank of a Matrix  524
  A.7 Range and Null Space  524
  A.8 Eigenvalues and Eigenvectors  525
  A.9 Decomposition of Matrices  527
  A.10 Definite Matrices and Quadratic Forms  530
  A.11 Idempotent Matrices  536
  A.12 Generalized Inverse  537
  A.13 Projections  545
  A.14 Functions of Normally Distributed Variables  546
  A.15 Differentiation of Scalar Functions of Matrices  549
  A.16 Miscellaneous Results, Stochastic Convergence  552

B Theoretical Proofs  555
  B.1 The Linear Regression Model  555
  B.2 Single-Factor Experiments with Fixed and Random Effects  578
  B.3 Incomplete Block Designs  581

C Distributions and Tables  591

References  599

Index  611
1 Introduction
This chapter gives an overview and motivation of the models discussed in this book. Basic terms and problems concerning practical work are explained, and conclusions dealing with them are given.
1.1 Data, Variables, and Random Processes
Many processes that occur in nature, the engineering sciences, and biomedical or pharmaceutical experiments cannot be characterized by theoretical or even mathematical models.
The analysis of such processes, especially the study of cause-effect relationships, may be carried out by drawing inferences from a finite number of samples. One important goal consists of designing sampling experiments that are productive, cost effective, and provide a sufficient data base in a qualitative sense. Statistical methods of experimental design aim at improving and optimizing the effectiveness and productivity of empirically conducted experiments.
An almost unlimited capacity of hardware and software facilities suggests an almost unlimited quantity of information. It is often overlooked, however, that a large amount of data does not necessarily coincide with a large amount of information. Basically, it is desirable to collect data that contain a high level of information, i.e., information-rich data. Statistical methods of experimental design offer a possibility to increase the proportion of such information-rich data.
As data serve to understand, as well as to control, processes, we may formulate several basic ideas of experimental design:
• Selection of the appropriate variables.
• Determination of the optimal range of input values.
• Determination of the optimal process regime, under restrictions or marginal conditions specific to the process under study (e.g., pressure, temperature, toxicity).
Examples:
(a) Let the response variable Y denote the flexibility of a plastic that is used in dental medicine to prepare a set of dentures. Let the binary input variable X denote whether silan is used or not. A suitably designed experiment should:

(i) confirm that the flexibility increases by using silan (cf. Table 1.1); and

(ii) in a next step, find out the optimal dose of silan that leads to an appropriate increase of flexibility.
PMMA 2.2 Vol% quartz    PMMA 2.2 Vol% quartz
without silan           with silan

 98.47                  106.75
106.20                  111.75
100.47                   96.67
 98.72                   98.70
 91.42                  118.61
108.17                  111.03
 98.36                   90.92
 92.36                  104.62
 80.00                   94.63
114.43                  110.91
104.99                  104.62
101.11                  108.77
102.94                   98.97
103.95                   98.78
 99.00                  102.65
106.05

x̄ = 100.42              ȳ = 103.91
s_x = 7.92              s_y = 7.62
n = 16                  m = 15

Table 1.1. Flexibility of PMMA with and without silan.
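The comparison of the two samples in Table 1.1 is treated properly in Chapter 2; as a preview, the summary statistics and a Welch-type statistic (which does not assume equal variances, cf. Section 2.3.3) can be sketched as follows. The Python code and variable names are ours, not part of the book's SAS/SPLUS/SPSS demonstrations.

```python
from math import sqrt
from statistics import mean, stdev

# Data transcribed from Table 1.1 (flexibility of PMMA, 2.2 Vol% quartz)
without_silan = [98.47, 106.20, 100.47, 98.72, 91.42, 108.17, 98.36, 92.36,
                 80.00, 114.43, 104.99, 101.11, 102.94, 103.95, 99.00, 106.05]
with_silan = [106.75, 111.75, 96.67, 98.70, 118.61, 111.03, 90.92, 104.62,
              94.63, 110.91, 104.62, 108.77, 98.97, 98.78, 102.65]

n, m = len(without_silan), len(with_silan)
x_bar, y_bar = mean(without_silan), mean(with_silan)
s_x, s_y = stdev(without_silan), stdev(with_silan)

# Welch-type two-sample statistic; a large |t| would support the claim
# that silan increases flexibility
t = (y_bar - x_bar) / sqrt(s_x ** 2 / n + s_y ** 2 / m)

print(f"x_bar = {x_bar:.2f}, s_x = {s_x:.2f}, n = {n}")
print(f"y_bar = {y_bar:.2f}, s_y = {s_y:.2f}, m = {m}")
print(f"t = {t:.2f}")
```

Running this reproduces the summary row of Table 1.1 up to rounding.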
(b) In metallurgy, the effect of two competing methods (oil, A; or salt water, B) to harden a given alloy had to be investigated. Some metallic pieces were hardened by Method A and some by Method B.
In both samples the average hardness, x̄_A and x̄_B, was calculated and interpreted as a measure to assess the effect of the respective method (cf. Montgomery, 1976, p. 1).
In both examples, the following questions may be of interest:
• Are all the explanatory factors that affect flexibility or hardness incorporated?

• How many workpieces have to be subjected to treatment so that possible differences are statistically significant?

• What is the smallest difference between average treatment effects that can be described as substantial?
• Which methods of data analysis should be used?
• How should treatments be randomized to units?
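The sample-size question above is answered properly in Section 2.3.5; a rough sketch uses the usual normal approximation n ≈ 2 (z_{1-α/2} + z_{1-β})² (σ/δ)² per group for a two-sided two-sample comparison of means. The function name and the conventional defaults (α = 0.05, power 0.80) are our choices for illustration.

```python
from statistics import NormalDist

def approx_n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a mean difference
    delta with standard deviation sigma (normal approximation)."""
    z = NormalDist().inv_cdf
    return 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sigma / delta) ** 2

# Detecting a hardness difference of one standard deviation needs
# roughly 16 workpieces per hardening method under these settings.
print(approx_n_per_group(delta=1.0, sigma=1.0))
```

Halving the detectable difference δ quadruples the required sample size, which is why the "smallest substantial difference" must be fixed before the experiment.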
1.2 Basic Principles of Experimental Design
This section answers parts of the above questions by formulating basic principles for designed experiments.
We shall demonstrate the basic principles of experimental design by the following example from dental medicine. Let us assume that a study is to be planned in the framework of a prophylactic program for children of preschool age. Answers to the following questions are expected:
• Do different intensity levels of instruction in dental care for preschool children differ in their effect?

• Are they substantially different from situations in which no instruction is given at all?
Before we try to answer these questions we have to discuss some topics:
(a) Exact definition of intensity levels of instruction in medical care.
Level I: Instruction by dentists and parents, and instruction to the kindergarten teacher by dentists.

Level II: As Level I, but without instruction of parents.

Level III: Instruction by dentists only.
Additionally, we define:
Level IV: No instruction at all (control group).
(b) How can we measure the effect of the instruction? As an appropriate parameter, we chose the increase in caries during the period of observation, expressed by the difference in carious teeth.
Obviously, the simplest plan is to give instruction to one child whereas another is left without advice. The criterion to quantify the effect is given by the increase in carious teeth developed during a fixed period:
Treatment                  Unit      Increase in carious teeth
A (without instruction)    1 child   Increase (a)
B (with instruction)       1 child   Increase (b)
It would be unreasonable to conclude that instruction will definitely reduce the increase in carious teeth if (b) is smaller than (a), as only one child was observed for each treatment. If more children are investigated and the difference of the average effects (a) − (b) still continues to be large, one may conclude that instruction definitely leads to improvement.
One important fact has to be mentioned at this stage. If more than one unit per group is observed, there will be some variability in the outcomes of the experiment in spite of the homogeneous experimental conditions. This phenomenon is called sampling error or natural variation.
In what follows, we will establish some basic principles to study thesampling error. If these principles hold, the chance of getting a data setor a design which could be analyzed, with less doubt about structuralnuisances, is higher as if the data was collected arbitrarily.
Principle 1 Fisher’s Principle of Replication. The experiment has to be carried out on several units (children) in order to determine the sampling error.
Principle 2 Randomization. The units have to be assigned randomly to the treatments. In our example, every level of instruction must have the same chance of being assigned. These two principles are essential to determine the sampling error correctly. Additionally, the conditions under which the treatments were given should be comparable, if not identical. Also, the units should be similar in structure. This means, for example, that the children are of almost the same age, live in the same area, or share a similar sociological environment. An appropriate set–up of a correctly designed trial would consist of blocks (defined in Principle 3), each with, for example (the minimum of), four children that have similar characteristics. The four levels of instruction are then randomly distributed to the children such that, in the end, all levels are present in every group. This is the reasoning behind the following:
Principle 3 Control of Variance. To increase the sensitivity of an experiment, one usually stratifies the units into groups with similar
(homogeneous) characteristics. These are called blocks. The criterion for stratification is often given by age, sex, risk exposure, or sociological factors.
For convenience, the experiment should be balanced: the number of units assigned to a specific treatment should be nearly the same, i.e., every instruction level occurs equally often among the children. This last principle ensures that every treatment is given as often as the others.
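The interplay of replication, randomization, blocking, and balance can be sketched in a few lines. The block names, child labels, and seed below are hypothetical, invented only for the illustration: each block of four similar children receives all four instruction levels exactly once, in random order.

```python
import random

random.seed(1)  # fixed seed only to make the illustration reproducible

levels = ["I", "II", "III", "IV"]  # instruction levels; IV is the control group

# Two hypothetical blocks of four children with similar characteristics
blocks = {
    "block_A": ["c1", "c2", "c3", "c4"],
    "block_B": ["c5", "c6", "c7", "c8"],
}

assignment = {}
for block, children in blocks.items():
    shuffled = levels[:]      # every level occurs exactly once per block (balance)
    random.shuffle(shuffled)  # random order within the block (randomization)
    for child, level in zip(children, shuffled):
        assignment[child] = level
```

Replication comes from having several blocks, while blocking keeps the comparison of the levels within homogeneous groups of children.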
Even when the analyst follows these principles to the best of his ability, further problems may still occur, for example, the scaling of the variables, which restricts the set of applicable methods. The next two sections deal with this problem.
1.3 Scaling of Variables
In general, the applicability of statistical methods depends on the scale on which the variables have been measured. Some methods, for example, assume that data may take any value within a given interval, whereas others require only an ordinal or ranked scale. The measurement scale is of particular importance, as the quality and goodness of statistical methods depend to some extent on it.
Nominal Scale (Qualitative Data)
This is the simplest scale. Each data point belongs uniquely to a specific category. These categories are often coded by numbers that have no real numeric meaning.
Examples:
• Classification of patients by sex: two categories, male and female, are possible;
• classification of patients by blood group;
• increase in carious teeth in a given period. Possible categories: 0 (no increase), 1 (one additional carious tooth), etc.;
• profession;
• race; and
• marital status.
These types of data are called nominal data. The following scale contains substantially more information.
Ordinal or Ranked Scale (Quantitative Data)
If we intend to characterize objects according to an ordering, e.g., grades or ratings, we may use an ordinal or ranked scale. Different categories now symbolize different qualities. Note that this does not mean that differences between numerical values may be interpreted.
Example: The oral hygiene index (OHI) may take the values 0, 1, 2, and 3. The OHI is 0 if the teeth are entirely free of dental plaque and 3 if more than two–thirds of the teeth are attacked. The following classification serves as an example of an ordered scale:

Group 1   0–1   Excellent hygiene
Group 2   2     Satisfactory hygiene
Group 3   3     Poor hygiene
Further examples of ordinal scaled data are:
• age groups (< 40, < 50, < 60, ≥ 60 years);
• intensity of a medical treatment (low, average, high dose); and
• preference rating of an object (low, average, high).
Metric or Interval Scale
One disadvantage of a ranked scale is that numerical differences in the data are not amenable to interpretation. In order to measure differences, we use a metric or interval scale with a defined origin and equal scaling units (e.g., temperature). An interval scale with a natural origin is called a ratio scale. Length, time, or weight measurements are examples of such ratio scales. It is convenient to treat interval and ratio scales as one scale.
Examples:
• Resistance to pressure of material.
• pH–Value in dental plaque.
• Time to produce a workpiece.
• Rates of return in per cent.
• Price of an item in dollars.
Interval data may be represented on an ordinal scale, and ordinal data on a nominal scale. In both situations there is a loss of information. Obviously, there is no way to transform data from a lower scale to a higher scale.
Advanced statistical techniques are available for all scales of data. A survey is given in Table 1.2.
Scale      Appropriate measures          Appropriate test procedures      Appropriate measures of correlation
Nominal    Absolute and relative         χ²–test                          Contingency coefficient
scale      frequency, mode
Ranked     Frequencies, mode, ranks,     χ²–test, nonparametric           Rank correlation coefficient
scale      median, quantiles,            methods based on ranks
           rank variance
Interval   Frequencies, mode, ranks,     χ²–test, nonparametric           Correlation coefficient
scale      quantiles, median,            methods, parametric methods
           skewness, x̄, s, s²            (e.g., under normality):
                                         χ²–, t–, F–tests, variance and
                                         regression analysis

Table 1.2. Measurement scales and related statistics.
It should be noted that all types of measurement scales may occur simultaneously if more than one variable is observed on a person or an object.
Examples: Typical data on registration at a hospital:
• Sex (nominal).
• Deformities: congenital/transmitted/received (nominal).
• Age (interval).
• Order of therapeutic steps (ordinal).
• OHI (ordinal).
• Time of treatment (interval).
1.4 Measuring and Scaling in Statistical Medicine
We shall briefly discuss some general measurement problems that are typical for medical data. Some variables are directly measurable, e.g., height, weight, age, or blood pressure of a patient, whereas others may be observed only via proxy variables. The latter case is called indirect measurement. Results for the variable of interest may only be derived from the results of a proxy.
Examples:
• Assessing the health of a patient by measuring the effect of a drug.
• Determining the extent of a cardiac infarction by measuring the concentration of transaminase.
An indirect measurement may be regarded as the sum of the actual effect and an additional random effect. Quantifying the actual effect may be problematic. Such an indirect measurement leads to a metric scale if:
• the indirect observation is metric;
• the actual effect is measurable by a metric variable; and
• there is a unique relation between both measurement scales.
Unfortunately, the latter case arises rarely in medicine.
Another problem arises when introducing derived scales, which are defined as a function of metric scales. Their statistical treatment is rather difficult, and more care has to be taken in analyzing such data.
Example: Heart defects are usually measured by the ratio

strain duration / time of expulsion.

For most biological variables, such a ratio Z = X/Y is unlikely to have a normal distribution.
Another important point is the scaling of an interval scale itself. If measurement units are chosen unnecessarily wide, this may lead to identical values (ties) and therefore to a loss of information.
In our opinion, it should be stressed that real interval scales are hard to justify, especially in biomedical experiments.
Furthermore, metric data are often derived by transformations, so that parametric assumptions, e.g., normality, have to be checked carefully.
In conclusion, statistical methods based on ranked or nominal data assume new importance in the analysis of biomedical data.
1.5 Experimental Design in Biotechnology
Data represent a combination of signals and noise. A signal may be defined as the effect a variable has on a process. Noise, or experimental error, covers the natural variability in the data or variables.
If a biological, clinical, or even chemical trial is repeated several times, we cannot expect the results to be identical. Response variables always show some variation that has to be analyzed by statistical methods.
There are two main sources of uncontrolled variability: a pure experimental error and a measurement error, in which possible interactions (joint variation of two factors) are also included. An experimental error is the variability of a response variable under exactly the
same experimental conditions. Measurement errors describe the variability of a response if repeated measurements are taken, i.e., if values are observed more than once for a given individual.
In practice, the experimental error is usually assumed to be much higher than the measurement error. Additionally, it is often impossible to separate the two errors, so that noise may be understood as the sum of both. As the measurement error is negligible relative to the experimental error, we have
noise ≈ experimental error.
One task of experimental design is to separate signals from noise under marginal conditions given by restrictions on material, time, or money.
Example: If a response is influenced by two variables, A and B, then one tries to quantify the effect of each variable. If the response is measured only at low or at high levels of A and B, then there is no way to isolate their effects. If measurements are taken according to the following combinations of levels, then the individual effects may be separated:
• A low, B low.
• A low, B high.
• A high, B low.
• A high, B high.
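With all four level combinations observed, the two effects can be separated by simple averaging over the levels of the other factor. A minimal sketch with hypothetical response values (the numbers are invented for the illustration):

```python
# Hypothetical mean responses at the four level combinations; 0 = low, 1 = high
y = {(0, 0): 10.0, (0, 1): 14.0, (1, 0): 13.0, (1, 1): 17.0}

# Effect of A: average response at A high minus average response at A low
effect_A = (y[(1, 0)] + y[(1, 1)]) / 2 - (y[(0, 0)] + y[(0, 1)]) / 2

# Effect of B: average response at B high minus average response at B low
effect_B = (y[(0, 1)] + y[(1, 1)]) / 2 - (y[(0, 0)] + y[(1, 0)]) / 2
```

Had the response been measured only at (A low, B low) and (A high, B high), the single difference 17 − 10 = 7 would mix both effects (here 3 and 4) without any way to separate them.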
1.6 Relative Importance of Effects—The ParetoPrinciple
The analysis of models of the form
response = f(X1, . . . , Xk),
where the Xi symbolize exogenous influence variables, is subject to several requirements:
• Choice of the functional dependency f(·) of the response on X1, . . . , Xk.
• Choice of the factors Xi.
• Consideration of interactions and hierarchical structures.
• Estimation of effects and interpretation of results.
A Pareto chart is a special form of bar graph which helps to determine the importance of problems. Figure 1.1 shows a Pareto chart in which influence variables and interactions are ordered according to their relative
importance. The theory of loglinear regression (Agresti, 2007; Fahrmeir and Tutz, 2001; Toutenburg, 1992a) suggests that a special coding of the variables as dummies yields estimates of the effects that are independent of the measurement units. Ishikawa (1976) has also illustrated this principle by a Pareto chart.
Figure 1.1. Typical Pareto chart of a model: response = f(A, B, C). The bars are ordered by relative importance: A, B, AB (interaction), C, AC, BC.
1.7 An Alternative Chart
The results of statistical analyses become much more apparent if they are accompanied by appropriate graphs and charts. Based on the Pareto principle, one such chart was presented in the previous section. It helps to find and identify the main effects and interactions. In this section, we illustrate a method developed by Heumann, Jacobsen and Toutenburg (1993), in which bivariate cause–effect relationships for ordinal data are investigated by loglinear models. Let the response variable Y take two values:

Y = 1 if the response is a success, and Y = 0 otherwise.

Let the influence variables A and B have three ordinal factor levels (low, average, high). The loglinear model is given by
ln(n1jk) = µ + λ1^success + λj^A + λk^B + λ1j^(success/A) + λ1k^(success/B). (1.1)
Data is taken from Table 1.3.
                        Factor A
Y    Factor B    low    average    high
0    low          40       10        20
     average      60       70        30
     high         80       90        70
1    low          20       30         5
     average      60      150        20
     high        100      210        50

Table 1.3. Three–dimensional contingency table.
The loglinear model (1.1) with the interactions Y/Factor A and Y/Factor B yields the following parameter estimates for the main effects (Table 1.4).
Parameter            Standardized estimate
Y = 0                  0.257
Y = 1                 −0.257
Factor A low         −13.982
Factor A average       4.908
Factor A high         14.894
Factor B low           2.069
Factor B average      10.515
Factor B high        −10.057

Table 1.4. Main effects in model (1.1).
The estimated interactions are given in Table 1.5. The interactions are displayed in Figures 1.2 and 1.3. The effects are shown proportional to the highest effect. Note that a comparison of the main effects (shown at the border) and the interactions is not possible due to different scaling. Solid circles correspond to a positive interaction, nonsolid circles to a negative interaction. The standardization was calculated according to
area of effect_i = π r_i²  (1.2)

with

r_i = √( estimate of effect_i / max_i estimate of effect_i ) · r,

where r denotes the radius of the maximum effect.
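The standardization (1.2) can be sketched in a few lines; the effect values below are illustrative and r is set to 1 for the largest effect:

```python
import math

# Absolute standardized effect estimates (illustrative values)
effects = [3.258, 1.963, 2.589]
r = 1.0  # radius assigned to the maximum effect

e_max = max(effects)
radii = [math.sqrt(e / e_max) * r for e in effects]  # r_i from (1.2)
areas = [math.pi * ri ** 2 for ri in radii]          # circle areas
```

Taking the square root in r_i is what makes the circle *areas*, rather than the radii, proportional to the effects, so the visual impression is not distorted.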
Parameter                   Standardized estimate
Y = 0/Factor A low            3.258
Y = 0/Factor A average       −1.963
Y = 0/Factor A high          −2.589
Y = 1/Factor A low           −3.258
Y = 1/Factor A average        1.963
Y = 1/Factor A high           2.589
Y = 0/Factor B low            1.319
Y = 0/Factor B average       −8.258
Y = 0/Factor B high           5.432
Y = 1/Factor B low           −1.319
Y = 1/Factor B average        8.258
Y = 1/Factor B high          −5.432

Table 1.5. Estimated interactions.
Interpretation. Figure 1.2 shows that (A low)/failure and (A high)/success are positively correlated, so that the recommendation for control is “A high”. Analogously, we extract from Figure 1.3 the recommendation “B average”.
Note. Interactions are to be assessed only within one figure and not between different figures, as the standardization differs. A Pareto chart for the effects on positive response yields Figure 1.4, where the negative effects are shown as thin lines and the positive effects as thick lines.
Figure 1.2. Main effects and interactions of Factor A (rows: Y = 0, Y = 1; columns: low, average, high).
Figure 1.3. Main effects and interactions of Factor B (rows: Y = 0, Y = 1; columns: low, average, high).
Figure 1.4. Simple Pareto chart of a loglinear model, with effects ordered B average, B high, A low, A high, A average, B low.
Example 1.1. To illustrate the principle further, we focus our attention on the cause–effect relationship between smoking and tartar. The loglinear model related to Table 1.6 is given by

ln(nij) = µ + λi^Smoking + λj^Tartar + λij^(Smoking/Tartar), (1.3)

with λi^Smoking as the main effect of the three levels nonsmoker, light smoker, and heavy smoker, λj^Tartar as the main effect of the three levels (low/average/high) of tartar, and λij^(Smoking/Tartar) as the interaction smoking/tartar.
Parameter estimates are given in Table 1.7.
Figure 1.5. Effects in the loglinear model (1.3), displayed proportional to size (rows: no, light, heavy smoking; columns: no, average, high tartar).
                                    No       Medium    High–level
                              j   tartar     tartar     tartar     ni·
                                    1          2          3
Nonsmoker               i = 1      284        236         48        568
Smoker, less than
6.5 g per day           i = 2      606        983        209       1798
Smoker, more than
6.5 g per day           i = 3     1028       1871        425       3324
n·j                               1918       3090        682       5690

Table 1.6. Contingency table: consumption of tobacco / tartar.
Basically, Figure 1.5 shows a diagonal structure of the interactions, where the positive values are located on the main diagonal. This indicates a positive relationship between tartar and smoking.
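The association suggested by this diagonal pattern can also be checked with an ordinary χ² test of independence on Table 1.6. A minimal sketch in pure Python, using the usual observed-minus-expected formula:

```python
# Counts from Table 1.6 (rows: nonsmoker, light smoker, heavy smoker;
# columns: no, medium, high-level tartar)
table = [[284, 236, 48],
         [606, 983, 209],
         [1028, 1871, 425]]

n = sum(sum(row) for row in table)
row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum(
    (table[i][j] - row_sums[i] * col_sums[j] / n) ** 2
    / (row_sums[i] * col_sums[j] / n)
    for i in range(3) for j in range(3)
)
dof = (3 - 1) * (3 - 1)  # (rows - 1) * (columns - 1) = 4
```

The statistic comes out far above the critical value χ²_{4;0.95} = 9.49, so independence of smoking and tartar is clearly rejected, consistent with the diagonal interaction structure.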
Standardized
parameter estimate    Effect
 −25.93277            smoking(non)
   7.10944            smoking(light)
  32.69931            smoking(heavy)
  11.70939            tartar(no)
  23.06797            tartar(average)
 −23.72608            tartar(high)
   7.29951            smoking(non)/tartar(no)
  −3.04948            smoking(non)/tartar(average)
  −2.79705            smoking(non)/tartar(high)
  −3.51245            smoking(light)/tartar(no)
   1.93151            smoking(light)/tartar(average)
   1.17280            smoking(light)/tartar(high)
  −7.04098            smoking(heavy)/tartar(no)
   2.66206            smoking(heavy)/tartar(average)
   3.16503            smoking(heavy)/tartar(high)

Table 1.7. Estimates in model (1.3).
1.8 A One–Way Factorial Experiment by Example
To illustrate the theory of the preceding sections, we shall consider a typical application of experimental design in agriculture. Let us assume that n1 = 10 and n2 = 10 plants are randomly selected out of n (homogeneous) plants. The first group is subjected to fertilizer A and the second to fertilizer B. After a period of growth, the weight (response) y of all plants is measured.
Suppose, for simplicity, that the response variable in the population is distributed according to Y ∼ N(µ, σ²). Then we have, for both subpopulations (fertilizers A and B),

YA ∼ N(µA, σ²) and YB ∼ N(µB, σ²),

where the variances are assumed to be equal. These assumptions induce the following one–way factorial model, in which the factor fertilizer is imposed at two levels, A and B. For the actual response values we have

yij = µi + εij (i = 1, 2; j = 1, . . . , ni) (1.4)

with

εij ∼ N(0, σ²)
and the εij mutually independent. The null hypothesis is given by

H0 : µ1 = µ2 (i.e., H0 : µA = µB).

The alternative hypothesis is

H1 : µ1 ≠ µ2.
The one–way analysis of variance is equivalent to testing the equality ofthe expected values of two samples by the t–test under normality. The teststatistic, in the case of independent samples of size n1 and n2, is given by
t = ((x̄ − ȳ)/s) · √( n1 n2 / (n1 + n2) ) ∼ t_{n1+n2−2}, (1.5)

where

s² = [ Σ_{i=1}^{n1} (xi − x̄)² + Σ_{j=1}^{n2} (yj − ȳ)² ] / (n1 + n2 − 2) (1.6)
is the pooled estimate of the variance (experimental error). H0 will be rejected if

|t| > t_{n1+n2−2;1−α/2}, (1.7)

where t_{n1+n2−2;1−α/2} stands for the (1 − α/2)–quantile of the t_{n1+n2−2}–distribution. Assume that the data in Table 1.8 were observed.
      Fertilizer A            Fertilizer B
 i    xi    (xi − x̄)²         yi    (yi − ȳ)²
 1     4     1                 5     1
 2     3     4                 4     4
 3     5     0                 6     0
 4     6     1                 7     1
 5     7     4                 8     4
 6     6     1                 7     1
 7     4     1                 5     1
 8     7     4                 8     4
 9     6     1                 5     1
10     2     9                 5     1
Σ     50    26                60    18

Table 1.8. One–way factorial experiment with two independent distributions.
We calculate x̄ = 5, ȳ = 6, and

s² = (26 + 18)/(10 + 10 − 2) = 44/18 = 1.56²,

t18 = ((5 − 6)/1.56) · √(100/20) = −1.43,

t18;0.975 = 2.10,
such that H0 : µA = µB cannot be rejected.
The underlying assumption of the above test is that both subpopulations can be characterized by identical distributions which may differ only in location. This assumption should be checked carefully, as (insignificant) differences may come from inhomogeneous populations. This inhomogeneity leads to an increase in the experimental error and makes it difficult to detect different factor effects.
Pairwise Comparisons (Paired t–Test)
Another experimental set–up that arises frequently in the analysis of biomedical data is given when two factor levels are applied, consecutively, to the same object or person. After the first treatment a wash–out period is established, in which the response variable is traced back to its original level.
Consider, for example, two alternative pesticides, A and B, which should reduce lice attack on plants. Each plant is treated initially by Method A before the concentration of lice is measured. Then, after some time, each plant is treated by Method B and again the concentration is measured. The underlying statistical model is given by
yij = µi + βj + εij, i = 1, 2, j = 1, . . . , J, (1.8)

where:

yij is the concentration in plant j after treatment i;
µi is the effect of treatment i;
βj is the effect of the jth replication; and
εij is the experimental error.
A comparison of the treatments is possible by inspecting the individual differences

dj = y1j − y2j, j = 1, . . . , J, (1.9)

of the concentrations on one specific plant. We derive

µd := E(dj) = E(y1j − y2j) = µ1 + βj − µ2 − βj = µ1 − µ2.
Testing H0 : µ1 = µ2 is therefore equivalent to testing the significance of H0 : µd = 0. In this situation, the paired t–test for one sample may be applied, assuming di ∼ N(µd, σd²):

t_{n−1} = (d̄/s_d) · √n (1.10)

with

s_d² = Σ(di − d̄)² / (n − 1).

H0 is rejected if

|t_{n−1}| > t_{n−1;1−α/2}.
Let us assume that the data shown in Table 1.9 were observed (i.e., the same data as in Table 1.8). We get
 j    y1j    y2j    dj    (dj − d̄)²
 1     4      5     −1     0
 2     3      4     −1     0
 3     5      6     −1     0
 4     6      7     −1     0
 5     7      8     −1     0
 6     6      7     −1     0
 7     4      5     −1     0
 8     7      8     −1     0
 9     6      5      1     4
10     2      5     −3     4
 Σ                 −10     8

Table 1.9. Pairwise experimental design.
d̄ = −1,

s_d² = 8/9 = 0.94²,

t9 = (−1/0.94) · √10 = −3.36,

t9;0.975 = 2.26,
such that H0 : µ1 = µ2 (i.e., µA = µB) is rejected, which confirms that Method A is superior to Method B.
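The paired analysis (1.10) for Table 1.9 can be reproduced as follows (computing with unrounded s_d gives t ≈ −3.35; the value −3.36 above comes from rounding s_d to 0.94 before dividing):

```python
import math

# Differences d_j = y1j - y2j from Table 1.9
d = [-1, -1, -1, -1, -1, -1, -1, -1, 1, -3]
n = len(d)

dbar = sum(d) / n
s2_d = sum((v - dbar) ** 2 for v in d) / (n - 1)  # sample variance of the d_j

# Paired t statistic (1.10)
t = dbar / math.sqrt(s2_d) * math.sqrt(n)
```

Since |t| exceeds t9;0.975 = 2.26, H0 : µd = 0 is rejected, as in the hand calculation.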
If we compare the two experimental designs, a loss in degrees of freedom becomes apparent in the latter design. The respective confidence intervals
are given by

(x̄ − ȳ) ± t18;0.975 · s · √((n1 + n2)/(n1 n2)),
−1 ± 2.10 · 1.56 · √(20/100),
−1 ± 1.46,
[−2.46; +0.46],

and

d̄ ± t9;0.975 · s_d/√n,
−1 ± 2.26 · 0.94/√10,
−1 ± 0.67,
[−1.67; −0.33].
We observe a smaller interval in the second experiment. A comparison of the respective variances, s² = 1.56² and s_d² = 0.94², indicates that a reduction of the experimental error to (0.94/1.56) · 100% = 60% was achieved by blocking with the paired design.
Note that these positive effects of blocking depend on the homogeneity of the variances within each block. In Chapter 4 we will discuss this topic in detail.
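The two interval half-widths can be compared in a few lines, using the same rounded quantities as in the text:

```python
import math

# Unpaired design: (xbar - ybar) ± t_{18;0.975} * s * sqrt((n1 + n2)/(n1 * n2))
t18, s, n1, n2 = 2.10, 1.56, 10, 10
half_unpaired = t18 * s * math.sqrt((n1 + n2) / (n1 * n2))

# Paired design: dbar ± t_{9;0.975} * s_d / sqrt(n)
t9, s_d, n = 2.26, 0.94, 10
half_paired = t9 * s_d / math.sqrt(n)
```

The paired half-width (≈ 0.67) is well below the unpaired one (≈ 1.46), even though the paired design uses a larger t quantile due to the lost degrees of freedom.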
1.9 Exercises and Questions
1.9.1 Describe the basic principles of experimental design.
1.9.2 Why are control groups useful?
1.9.3 To what type of scaling do the following data belong?
– Male/female.
– Catholic, Protestant.
– Pressure.
– Temperature.
– Tax category.
– Small car, car in the middle range, luxury limousine.
– Age.
– Length of stay of a patient in a clinical trial.
– University degrees.
1.9.4 What is the difference between direct and indirect measurements?
1.9.5 What are ties and their consequences in a set of data?
1.9.6 What is a Pareto chart?
1.9.7 Describe problems occurring in experimental set–ups with paired observations.
2Comparison of Two Samples
2.1 Introduction
Problems of comparing two samples arise frequently in medicine, sociology, agriculture, engineering, and marketing. The data may have been generated by observation or may be the outcome of a controlled experiment. In the latter case, randomization plays a crucial role in gaining information about possible differences between the samples which may be due to a specific factor. Full nonrestricted randomization means, for example, that in a controlled clinical trial every patient has a constant chance of getting a specific treatment. The idea of a blind, double–blind, or even triple–blind set–up of the experiment is that neither patient, nor clinician, nor statistician knows what treatment has been given. This should exclude possible biases in the response variable which would be induced by such knowledge. It becomes clear that careful planning is indispensable to achieve valid results.
Another problem in the framework of a clinical trial may consist of a systematic effect in a subgroup of patients, e.g., males and females. If such a situation is to be expected, one should stratify the sample into homogeneous subgroups. Such a strategy proves to be useful in planned experiments as well as in observational studies.
Another experimental set–up is given by the matched–pair design. The subgroups then contain only one individual each, and pairs of subgroups are compared with respect to different treatments. This procedure requires the pairs to be homogeneous with respect to all the possible factors that may
exhibit an influence on the response variable, and it is thus limited to very special situations.
2.2 Paired t–Test and Matched–Pair Design
In order to illustrate the basic reasoning of the matched–pair design, consider an experiment whose structure is given in Table 2.1.
          Treatment
Pair      1       2       Difference
1         y11     y21     y11 − y21 = d1
2         y12     y22     y12 − y22 = d2
⋮          ⋮       ⋮       ⋮
n         y1n     y2n     y1n − y2n = dn
                          d̄ = Σ di / n

Table 2.1. Response in a matched–pair design.
We consider the linear model already given in (1.8). Assuming that

di i.i.d. ∼ N(µd, σd²), (2.1)

the best linear unbiased estimator of µd, namely d̄, is distributed as

d̄ ∼ N(µd, σd²/n). (2.2)

An unbiased estimator of σd² is given by

s_d² = Σ_{i=1}^n (di − d̄)²/(n − 1) ∼ (σd²/(n − 1)) χ²_{n−1}, (2.3)

such that under H0 : µd = 0 the ratio

t = (d̄/s_d) · √n (2.4)

is distributed according to a (central) t–distribution.
A two–sided test of H0 : µd = 0 versus H1 : µd ≠ 0 rejects H0 if

|t| > t_{n−1;1−α(two–sided)} = t_{n−1;1−α/2}. (2.5)

A one–sided test of H0 : µd = 0 versus H1 : µd > 0 (µd < 0) rejects H0 in favor of H1 : µd > 0 if

t > t_{n−1;1−α}. (2.6)

H0 is rejected in favor of H1 : µd < 0 if

t < −t_{n−1;1−α}. (2.7)
Necessary Sample Size and Power of the Test
We consider a test of H0 versus H1 for a distribution with an unknown parameter θ. Obviously, there are four possible situations, two of which

                  Real situation
Decision          H0 true             H0 false
H0 accepted       Correct decision    False decision
H0 rejected       False decision      Correct decision

Table 2.2. Test decisions.
lead to a correct decision. The probability
Pθ(reject H0 | H0 true) = Pθ(H1 | H0) ≤ α for all θ ∈ H0 (2.8)
is called the probability of a type I error. α is to be fixed before theexperiment. Usually, α = 0.05 is a reasonable choice. The probability
Pθ(accept H0 | H0 false) = Pθ(H0 | H1) ≥ β for all θ ∈ H1 (2.9)
is called the probability of a type II error. Obviously, this probability depends on the true value of θ, so that the function
G(θ) = Pθ(reject H0) (2.10)
is called the power of the test. Generally, a test at a given α aims to fix the type II error at a defined level or below. Equivalently, we could say that the power should reach, or even exceed, a given value. Moreover, the following rules apply:
(i) the power rises as the sample size n increases, keeping α and the parameters under H1 fixed;

(ii) the power rises, and therefore β decreases, as α increases, keeping n and the parameters under H1 fixed; and

(iii) the power rises as the difference δ between the parameters under H0 and under H1 increases.
We bear in mind that the power of a test depends on the difference δ, on the type I error, on the sample size n, and on whether the hypothesis is one–sided or two–sided. Changing from a one–sided to a two–sided problem reduces the power.
The comparison of means in a matched–pair design yields the following relationship. Consider a one–sided test (H0 : µd = µ0 versus H1 : µd = µ0 + δ, δ > 0) and a given α. To start with, we assume σd² to be known. We now try to derive the sample size n that is required to achieve a fixed power of 1 − β for a given α and known σd². This means that we have to settle n
in such a way that H0 : µd = µ0, with fixed α, is accepted with probability β, although the true parameter is µd = µ0 + δ. We define

u := (d̄ − µ0)/(σd/√n).

Then, under H1 : µd = µ0 + δ, we have

ũ = (d̄ − (µ0 + δ))/(σd/√n) ∼ N(0, 1). (2.11)

u and ũ are related as follows:

u = ũ + (δ/σd)√n ∼ N((δ/σd)√n, 1). (2.12)

The null hypothesis H0 : µd = µ0 is accepted erroneously if the test statistic u has a value of u ≤ u1−α. The probability for this case should be β = P(H0 | H1). So we get

β = P(u ≤ u1−α) = P(ũ ≤ u1−α − (δ/σd)√n)

and, therefore,

uβ = u1−α − (δ/σd)√n,

which yields

n ≥ (u1−α − uβ)² σd²/δ² (2.13)
  = (u1−α + u1−β)² σd²/δ². (2.14)

For application in practice, we have to estimate σd² in (2.13). If we estimate σd² by the sample variance, we also have to replace u1−α and u1−β by t_{n−1;1−α} and t_{n−1;1−β}, respectively. The value of δ is the difference between the expectations under the two hypotheses, which is either known or estimated from the sample.
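Formula (2.14) translates directly into a small helper. The defaults below assume a one–sided test with α = 0.05 and power 1 − β = 0.80; the normal quantiles u_{0.95} ≈ 1.645 and u_{0.80} ≈ 0.842 are hard-coded to keep the sketch dependency-free:

```python
import math

def sample_size(delta, sigma_d, u_alpha=1.645, u_beta=0.842):
    """Minimum n from (2.14): n >= (u_{1-alpha} + u_{1-beta})^2 * sigma_d^2 / delta^2."""
    return math.ceil((u_alpha + u_beta) ** 2 * sigma_d ** 2 / delta ** 2)
```

For example, sample_size(0.5, 1.0) gives 25: detecting a shift of half a standard deviation with 80% power at α = 0.05 (one–sided) requires about 25 pairs.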
2.3 Comparison of Means in Independent Groups
2.3.1 Two–Sample t–Test
We have already discussed the two–sample problem in Section 1.8. Now we consider the two independent samples

A : x1, . . . , x_{n1}, xi ∼ N(µA, σA²),
B : y1, . . . , y_{n2}, yi ∼ N(µB, σB²).

Assuming σA² = σB² = σ², we may apply the linear model. To compare the two groups A and B, we test the hypothesis H0 : µA = µB using the statistic

t_{n1+n2−2} = ((x̄ − ȳ)/s) · √( n1 n2/(n1 + n2) ).

In practical applications, we have to check the assumption that σA² = σB².
2.3.2 Testing H0 : σA² = σB² = σ²

Under H0, the two independent sample variances

s_x² = (1/(n1 − 1)) Σ_{i=1}^{n1} (xi − x̄)²

and

s_y² = (1/(n2 − 1)) Σ_{i=1}^{n2} (yi − ȳ)²

follow (scaled) χ²–distributions with n1 − 1 and n2 − 1 degrees of freedom, respectively, and their ratio follows an F–distribution:

F = s_x²/s_y² ∼ F_{n1−1,n2−1}. (2.15)
Decision

Two–sided:

H0 : σA² = σB² versus H1 : σA² ≠ σB².

H0 is rejected if

F > F_{n1−1,n2−1;1−α/2}

or

F < F_{n1−1,n2−1;α/2} (2.16)

with

F_{n1−1,n2−1;α/2} = 1/F_{n2−1,n1−1;1−α/2}. (2.17)
One–sided:

H0 : σA² = σB² versus H1 : σA² > σB². (2.18)

If

F > F_{n1−1,n2−1;1−α}, (2.19)

then H0 is rejected.
Example 2.1. Using the data set of Table 1.8, we want to test H0 : σA² = σB². From Table 1.8 we find the values n1 = n2 = 10, s_A² = 26/9, and s_B² = 18/9. This yields

F = 26/18 = 1.44 < 3.18 = F_{9,9;0.95},

so that we cannot reject the null hypothesis H0 : σA² = σB² versus H1 : σA² > σB² according to (2.19). Therefore, our analysis in Section 1.8 was correct.
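The computation of (2.15) for Example 2.1 takes only a few lines; the critical value F_{9,9;0.95} = 3.18 is taken from the text:

```python
# Sample variances from Table 1.8: sums of squared deviations divided by n - 1
s2_A = 26 / 9
s2_B = 18 / 9

F = s2_A / s2_B       # variance ratio (2.15)
reject = F > 3.18     # one-sided critical value F_{9,9;0.95}
```

Since F = 1.44 stays below the critical value, the homogeneity assumption behind the two–sample t–test of Section 1.8 is not contradicted.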
2.3.3 Comparison of Means in the Case of Unequal Variances
If H0 : σA² = σB² is not valid, we are up against the so–called Behrens–Fisher problem, which has no exact solution. For practical use, the following correction of the test statistic according to Welch gives sufficiently good results:

t = |x̄ − ȳ| / √( s_x²/n1 + s_y²/n2 ) ∼ t_v (2.20)

with the degrees of freedom approximated by

v = ( s_x²/n1 + s_y²/n2 )² / [ (s_x²/n1)²/(n1 + 1) + (s_y²/n2)²/(n2 + 1) ] − 2 (2.21)

(v is rounded). We have min(n1 − 1, n2 − 1) < v < n1 + n2 − 2.
Example 2.2. In material testing, two normal variables, A and B, were examined. The sample parameters are summarized as follows:

x̄ = 27.99, s_x² = 5.98², n1 = 9,
ȳ = 1.92, s_y² = 1.07², n2 = 10.

The sample variances are not equal:

F = 5.98²/1.07² = 31.23 > 3.23 = F_{8,9;0.95}.

Therefore, we have to use Welch’s test to compare the means:

t_v = |27.99 − 1.92| / √( 5.98²/9 + 1.07²/10 ) = 12.91
with v ≈ 9 degrees of freedom. The critical value t9;0.975 = 2.26 is exceeded, and we reject H0 : µA = µB.
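Welch’s statistic (2.20) and the approximate degrees of freedom (2.21) for Example 2.2 can be reproduced as follows (unrounded arithmetic gives t ≈ 12.89 and v ≈ 8.6, which rounds to the 9 degrees of freedom used above):

```python
import math

# Summary statistics from Example 2.2
xbar, s2x, n1 = 27.99, 5.98 ** 2, 9
ybar, s2y, n2 = 1.92, 1.07 ** 2, 10

se2 = s2x / n1 + s2y / n2
t = abs(xbar - ybar) / math.sqrt(se2)  # Welch statistic (2.20)

# Approximate degrees of freedom (2.21), as given in the text; rounded in use
v = se2 ** 2 / ((s2x / n1) ** 2 / (n1 + 1)
                + (s2y / n2) ** 2 / (n2 + 1)) - 2
```

Note that v ≈ 9 lies, as required, between min(n1 − 1, n2 − 1) = 8 and n1 + n2 − 2 = 17.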
2.3.4 Transformations of Data to Assure Homogeneity of Variances
We know from experience that the two–sample t–test is more sensitive to discrepancies in the homogeneity of variances than to deviations from the assumption of normality. The two–sample t–test usually keeps the level of significance if the assumption of normal distributions is not fully justified but the sample sizes are large enough (n1, n2 > 20) and the homogeneity of variances holds. This result is based on the central limit theorem. Conversely, deviations from variance homogeneity can have severe effects on the level of significance.
The following transformations may be used to avoid the inhomogeneityof variances:
• logarithmic transformation ln(xi), ln(yi); and
• logarithmic transformation ln(xi + 1), ln(yi + 1), especially if xi andyi have zero values or if 0 ≤ xi, yi ≤ 10 (Woolson, 1987, p. 171).
2.3.5 Necessary Sample Size and Power of the Test
The necessary sample size to achieve the desired power of the two–sample t–test is derived as in the paired t–test problem. Let δ = µA − µB > 0 be the one–sided alternative to be tested against H0 : µA = µB, with σA² = σB² = σ². Then, with n2 = a · n1 (if a = 1, then n1 = n2), the minimum sample size to preserve a power of 1 − β (cf. (2.14)) is given by

n1 = σ²(1 + 1/a)(u1−α + u1−β)²/δ² (2.22)

and

n2 = a · n1, with n1 from (2.22).
2.3.6 Comparison of Means without Prior Testing H0 : σA² = σB²; Cochran–Cox Test for Independent Groups

There are several alternative methods that can be used instead of the two–sample t–test in the case of unequal variances. The test of Cochran and Cox (1957) uses a statistic which approximately follows a t–distribution. The Cochran–Cox test is conservative compared with the usual t–test, essentially because of the special number of degrees of freedom that has to be used. The degrees of freedom of this test are a weighted average of n1 − 1
and n2 − 1. In the balanced case (n1 = n2 = n) the Cochran–Cox test has n − 1 degrees of freedom compared to the 2(n − 1) degrees of freedom used in the two–sample t–test. The test statistic

    t_{c-c} = (x̄ − ȳ) / s_{(x̄−ȳ)}   (2.23)

with

    s²_{(x̄−ȳ)} = s_x²/n1 + s_y²/n2

has critical values at:

two–sided:

    t_{c-c}(1 − α/2) = [ (s_x²/n1) t_{n1−1;1−α/2} + (s_y²/n2) t_{n2−1;1−α/2} ] / s²_{(x̄−ȳ)} ,   (2.24)

one–sided:

    t_{c-c}(1 − α) = [ (s_x²/n1) t_{n1−1;1−α} + (s_y²/n2) t_{n2−1;1−α} ] / s²_{(x̄−ȳ)} .   (2.25)
The null hypothesis is rejected if |t_{c-c}| > t_{c-c}(1 − α/2) (two–sided) (resp., t_{c-c} > t_{c-c}(1 − α); one–sided, H1 : µA > µB).
Example 2.3. (Example 2.2 continued.) We test H0 : µA = µB using the two–sided Cochran–Cox test. With

    s²_{(x̄−ȳ)} = 5.98²/9 + 1.07²/10 = 3.97 + 0.11 = 4.08 = 2.02²

and

    t_{c-c}(1 − α/2) = (3.97 · 2.31 + 0.11 · 2.26) / 4.08 = 2.31 ,

we get t_{c-c} = |27.99 − 1.92|/2.02 = 12.91 > 2.31, so that H0 has to be rejected.
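The computation of Example 2.3 can be reproduced in a few lines of plain Python (a sketch; the t-quantiles t_{8;0.975} = 2.31 and t_{9;0.975} = 2.26 are the table values used in the text):

```python
import math

# summary statistics from Example 2.2: means, standard deviations, sizes
xbar, ybar = 27.99, 1.92
sx, sy, n1, n2 = 5.98, 1.07, 9, 10
t1, t2 = 2.31, 2.26            # t_{n1-1;0.975}, t_{n2-1;0.975}

vx, vy = sx**2 / n1, sy**2 / n2
s2 = vx + vy                    # s^2_(xbar - ybar)
t_cc = (xbar - ybar) / math.sqrt(s2)
crit = (vx * t1 + vy * t2) / s2  # weighted critical value (2.24)
# t_cc ≈ 12.9 clearly exceeds crit ≈ 2.31, so H0 is rejected
```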
2.4 Wilcoxon’s Sign–Rank Test in theMatched–Pair Design
Wilcoxon’s test for the differences of pairs is the nonparametric analogto the paired t–test. This test can be applied to a continuous (not neces-sarily normal distributed) response. The test allows us to check whetherthe differences y1i − y2i of paired observations (y1i, y2i) are symmetricallydistributed with median M = 0.
2.4 Wilcoxon’s Sign–Rank Test in the Matched–Pair Design 29
In the two–sided test problem, the hypothesis is given by

    H0 : M = 0 or, equivalently, H0 : P(Y1 < Y2) = 0.5 ,   (2.28)

versus

    H1 : M ≠ 0 ,   (2.29)

and in the one–sided test problem by

    H0 : M ≤ 0 versus H1 : M > 0 .   (2.30)
Assuming that Y1 − Y2 is symmetrically distributed, the relation f(−d) = f(d) holds for each value of the difference D = Y1 − Y2, with f(·) denoting the density function of the difference variable. Therefore, we can expect, under H0, that the ranks of the absolute differences |d| are equally distributed amongst negative and positive differences. We put the absolute differences in ascending order and note the sign of each difference di = y1i − y2i. Then we sum over the ranks of the absolute differences with positive sign (or, analogously, with negative sign) and get the following statistic (cf. Büning and Trenkler, 1978, p. 187):
    W+ = Σ_{i=1}^n Z_i R(|d_i|)   (2.31)

with

    d_i = y1i − y2i ,
    R(|d_i|) : rank of |d_i| ,
    Z_i = 1 if d_i > 0, and 0 if d_i < 0 .   (2.32)
We could also sum over the ranks of the negative differences (W−) and obtain the relationship W+ + W− = n(n + 1)/2.
Exact Distribution of W+ under H0
The statistic W+ can also be expressed as

    W+ = Σ_{i=1}^n i Z_(i)  with  Z_(i) = 1 if D_j > 0, and 0 if D_j < 0 ,   (2.33)

where D_j denotes the difference for which R(|D_j|) = i for given i. Under H0 : M = 0 the variable W+ is symmetrically distributed with center

    E(W+) = E( Σ_{i=1}^n i Z_(i) ) = n(n + 1)/4 .
The sample space may be regarded as the set L of all n–tuples built of 0's and 1's. L consists of 2^n elements, each of which has probability 1/2^n under H0. Hence, we get

    P(W+ = w) = a(w) / 2^n   (2.34)

with a(w) : the number of possibilities to assign + signs to the numbers 1 to n such that their sum equals w.
Example: Let n = 4. The exact distribution of W+ under H0 can be found in the last column of the following table:

    w    Tuples of ranks    a(w)   P(W+ = w)
    10   (1 2 3 4)          1      1/16
    9    (2 3 4)            1      1/16
    8    (1 3 4)            1      1/16
    7    (1 2 4), (3 4)     2      2/16
    6    (1 2 3), (2 4)     2      2/16
    5    (1 4), (2 3)       2      2/16
    4    (1 3), (4)         2      2/16
    3    (1 2), (3)         2      2/16
    2    (2)                1      1/16
    1    (1)                1      1/16
    0                       1      1/16
    Σ :                     16     16/16 = 1
For example, P (W+ ≥ 8) = 3/16.
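The enumeration behind (2.34) is easy to carry out by machine. This sketch (plain Python; the function name is ours) reproduces the table above for n = 4:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def wplus_distribution(n):
    # enumerate all 2^n assignments of + signs to the ranks 1..n
    counts = Counter()
    for signs in product((0, 1), repeat=n):
        counts[sum(i * z for i, z in zip(range(1, n + 1), signs))] += 1
    return {w: Fraction(c, 2 ** n) for w, c in sorted(counts.items())}

dist = wplus_distribution(4)
# P(W+ = 7) = 2/16 and P(W+ >= 8) = 3/16, as in the table
```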
Testing
Test A:
H0 : M = 0 is rejected versus H1 : M ≠ 0 if W+ ≤ w_{α/2} or W+ ≥ w_{1−α/2}.
Test B:
H0 : M ≤ 0 is rejected versus H1 : M > 0, if W+ ≥ w1−α.
The exact critical values can be found in tables (e.g., Table H, p. 373 in Büning and Trenkler, 1978). For large sample sizes (n > 20) we can use the following approximation:
    Z = (W+ − E(W+)) / √Var(W+)  ~  N(0, 1) under H0 ,

i.e.,

    Z = (W+ − n(n + 1)/4) / √( n(n + 1)(2n + 1)/24 ) .   (2.35)
2.4 Wilcoxon’s Sign–Rank Test in the Matched–Pair Design 31
For both tests, H0 is rejected if |Z| > u1−α/2 (resp., Z > u1−α).
Ties
Ties may occur as zero–differences (d_i = y1i − y2i = 0) and/or as compound–differences (d_i = d_j for i ≠ j). Depending on the type of ties, we use one of the following tests:
• zero–differences test;
• compound–differences test; and
• zero–differences plus compound–differences test.
The following methods are comprehensively described in Lienert (1986, pp. 327–332).
1. Zero–Differences Test
(a) Sample reduction method of Wilcoxon and Hemelrijk (Hemelrijk, 1952): This method is used if the sample size is large enough (n ≥ 10) and the percentage of ties is less than 10% (t_0/n ≤ 1/10, with t_0 denoting the number of zero–differences). The zero–differences are excluded from the sample and the test is conducted using the remaining n_0 = n − t_0 pairs.
(b) Pratt’s partial–rank randomization method (Pratt, 1959):This method is used for small sample sizes with more than 10% ofzero–differences.
The zero–differences are included during the association of ranksbut are excluded from the test statistic. The exact distribution ofW+
0 under H0 is calculated for the remaining n0 signed ranks. Theprobabilities of rejection are given by:
– Test A (two–sided):
P ′0 =2A′0 + a′0
2n0.
– Test B (one–sided):
P ′0 =A′0 + a′0
2n0.
Here A′0 denotes the number of orderings which give W+0 > w0
and a′0 denotes the number of orderings which give W+0 = w0.
(c) Cureton’s asymptotic version of the partial–rank randomization test(Cureton, 1967):This test is used for large sample sizes and many zero–differences(t0/n > 0.1). The test statistic is given by
ZW0 =W+
0 − E(W+0 )√
Var(W+0 )
with
E(W+0 ) =
n(n + 1)− t0(t0 + 1)4
,
Var(W+0 ) =
n(n + 1)(2n + 1)− t0(t0 + 1)(2t0 + 1)24
.
Under H0, the statistic ZW0 follows asymptotically the standardnormal distribution.
2. Compound–Differences Test
(a) Shared–ranks randomization method. In small samples, and for any percentage of compound–differences, we assign averaged ranks to the compound–differences. The exact distributions, as well as the one– and two–sided critical values, are calculated as shown in Test 1(b).
(b) Approximated compound–differences test. If we have a larger sample (n > 10) and a small percentage of compound–differences (t/n ≤ 1/5, with t the number of compound–differences), then we assign averaged ranks to the compounded values. The test statistic is calculated and tested as usual.
(c) Asymptotic sign–rank test corrected for ties. This method is useful for large samples with t/n > 1/5.
In equation (2.36) we replace Var(W+) by a variance corrected for the assignment of averaged ranks, Var(W+_corr), given by

    Var(W+_corr) = n(n + 1)(2n + 1)/24 − Σ_{j=1}^r (t_j³ − t_j)/48 ,

with r denoting the number of groups of ties and t_j denoting the number of ties in the jth group (1 ≤ j ≤ r). Untied observations are regarded as groups of size 1. If there are no ties, then r = n and t_j = 1 for all j, i.e., the correction term becomes zero.
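The corrected variance is a direct transcription of this formula. A plain-Python sketch (function name ours):

```python
def var_wplus_corrected(n, tie_sizes):
    # n(n+1)(2n+1)/24 minus the sum of (t_j^3 - t_j)/48 over tie groups
    base = n * (n + 1) * (2 * n + 1) / 24
    return base - sum((t ** 3 - t) / 48 for t in tie_sizes)

# with no ties (all groups of size 1) the correction term vanishes
print(var_wplus_corrected(10, [1] * 10))        # 96.25 = 10*11*21/24
# one pair of tied differences lowers the variance by 6/48 = 0.125
print(var_wplus_corrected(10, [2] + [1] * 8))   # 96.125
```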
3. Zero–Differences Plus Compound–Differences Test
These tests are used if there are both zero–differences and compound–differences.
(a) Pratt’s randomization method.For small samples which are cleared up for zeros (n0 ≤ 10), we pro-ceed as in Test 1(b) but additionally assign averaged ranks to thecompound–differences.
(b) Cureton’s approximation method.In larger zero–cleared samples the test statistic is calculated analo-gously to Test 3(a). The expectation E(W+
0 ) equals that in Test 1(c)and is given by
E(W+0 ) =
n(n + 1)− t0(t0 + 1)4
.
The variance in Test 1(c) has to be corrected due to ties and is givenby
Varcorr.(W+0 ) =
n(n + 1)(2n + 1)− t0(t0 + 1)(2t0 + 1)24
−r∑
j=1
t3j − tj
48.
Finally, the test statistic is given by

    Z_{W_0,corr} = (W+_0 − E(W+_0)) / √Var_corr(W+_0) .   (2.36)
2.5 Rank Test for Homogeneity of Wilcoxon, Mann and Whitney
We consider two independent continuous random variables, X and Y, with unknown or nonnormal distributions. We would like to test whether the samples of the two variables come from the same population (homogeneity). The so–called U–test of Wilcoxon, Mann, and Whitney is a rank test. Since the Kruskal–Wallis test (the generalization of the Wilcoxon test) tests the null hypothesis that k populations are identical, i.e., tests for the homogeneity of these k populations, the Mann–Whitney–Wilcoxon test may also be seen as a test of homogeneity for the case k = 2 (cf. Gibbons, 1976, p. 173). It is the nonparametric analog of the t–test and is used if the assumptions for the use of the t–test are not justified or are called into question. The relative efficiency of the U–test compared to the t–test is about 95% in the case of normally distributed variables. The U–test is often used as a quick test, or as a check when the test statistic of the t–test yields values close to the critical values.
The hypothesis to be tested is H0 : the probability P to observe a value from the first population X that is greater than any given value of the population Y equals 0.5. The two–sided alternative is H1 : P ≠ 0.5. The one–sided alternative H1 : P > 0.5 means that X is stochastically larger than Y.
We combine the observations of the samples (x1, . . . , xm) and (y1, . . . , yn) in ascending order of ranks and note for each rank the sample it belongs to. Let R1 and R2 denote the sums of ranks of the X– and Y–samples, respectively. The test statistic U is the smaller of the values U1 and U2:

    U1 = m · n + m(m + 1)/2 − R1 ,   (2.37)
    U2 = m · n + n(n + 1)/2 − R2 ,   (2.38)

with U1 + U2 = m · n (control). H0 is rejected if U ≤ U(m, n; α) (Table 2.3 contains some values for α = 0.05 (one–sided) and α = 0.10 (two–sided)).
    m\n   2   3   4   5   6   7   8   9   10
    4     −   0   1
    5     0   1   2   4
    6     0   2   3   5   7
    7     0   2   4   6   8   11
    8     1   3   5   8   10  13  15
    9     1   4   6   9   12  15  18  21
    10    1   4   7   11  14  17  20  24  27

Table 2.3. Critical values of the U–test (α = 0.05 one–sided, α = 0.10 two–sided).
In the case of m, n ≥ 8, the excellent approximation

    u = (U − m · n/2) / √( m · n (m + n + 1)/12 )  ~  N(0, 1)   (2.39)

is used. For |u| > u_{1−α/2} the hypothesis H0 is rejected (type I error α two–sided and α/2 one–sided).
Example 2.4. We test the equality of means of the two series of measurements given in Table 2.4 using the U–test. Let variable X be the flexibility of PMMA without silan and variable Y the flexibility of PMMA with silan. We put the (16 + 15) values of both series in ascending order, assign ranks, and calculate the sums of ranks R1 = 231 and R2 = 265 (Table 2.5).
    PMMA 2.2 Vol% quartz    PMMA 2.2 Vol% quartz
    without silan           with silan
    98.47                   106.75
    106.20                  111.75
    100.47                  96.67
    98.72                   98.70
    91.42                   118.61
    108.17                  111.03
    98.36                   90.92
    92.36                   104.62
    80.00                   94.63
    114.43                  110.91
    104.99                  104.62
    101.11                  108.77
    102.94                  98.97
    103.95                  98.78
    99.00                   102.65
    106.05

    x̄ = 100.42              ȳ = 103.91
    s_x = 7.92              s_y = 7.62
    m = 16                  n = 15

Table 2.4. Flexibility of PMMA with and without silan (cf. Toutenburg, Toutenburg and Walther, 1991, p. 100).
    Rank          1      2      3      4      5      6      7      8      9
    Observation   80.00  90.92  91.42  92.36  94.63  96.67  98.36  98.47  98.70
    Variable      X      Y      X      X      Y      Y      X      X      Y
    Sum of ranks X : 1 + 3 + 4 + 7 + 8
    Sum of ranks Y : 2 + 5 + 6 + 9

    Rank          10     11     12     13     14      15      16      17
    Observation   98.72  98.78  98.97  99.00  100.47  101.11  102.65  102.94
    Variable      X      Y      Y      X      X       X       Y       X
    Sum of ranks X : + 10 + 13 + 14 + 15 + 17
    Sum of ranks Y : + 11 + 12 + 16

    Rank          18      19      20      21      22      23      24
    Observation   103.95  104.62  104.75  104.99  106.05  106.20  106.75
    Variable      X       Y       Y       X       X       X       Y
    Sum of ranks X : + 18 + 21 + 22 + 23
    Sum of ranks Y : + 19 + 20 + 24

    Rank          25      26      27      28      29      30      31
    Observation   108.17  108.77  110.91  111.03  111.75  114.43  118.61
    Variable      X       Y       Y       Y       Y       X       Y
    Sum of ranks X : + 25 + 30 (= 231)
    Sum of ranks Y : + 26 + 27 + 28 + 29 + 31 (= 265)

Table 2.5. Computing the sums of ranks (Example 2.4, cf. Table 2.4).
Then we get

    U1 = 16 · 15 + 16(16 + 1)/2 − 231 = 145 ,
    U2 = 16 · 15 + 15(15 + 1)/2 − 265 = 95 ,
    U1 + U2 = 240 = 16 · 15 .
Since m = 16 and n = 15 (both sample sizes ≥ 8), we calculate the test statistic according to (2.39) with U = U2 being the smaller of the two values of U:

    u = (95 − 120) / √( 240(16 + 15 + 1)/12 ) = −25/√640 = −0.99 ,

and therefore |u| = 0.99 < 1.96 = u_{1−0.05/2} = u_{0.975}. The null hypothesis is not rejected (type I error 5% and 2.5% using two– and one–sided alternatives, respectively). The exact critical value of U is U(16, 15; 0.05 two–sided) = 70 (tables in Sachs, 1974, p. 232), i.e., the decision is the same (H0 is not rejected).
Correction of the U–Statistic in the Case of Equal Ranks
If observations occur more than once in the combined and ordered samples (x1, . . . , xm) and (y1, . . . , yn), we assign an averaged rank to each of them. The corrected U–test statistic (with S = m + n) is given by

    u = (U − m · n/2) / √{ [ m · n / (S(S − 1)) ] [ (S³ − S)/12 − Σ_{i=1}^r (t_i³ − t_i)/12 ] } .   (2.40)

The number of groups of equal observations (ties) is r, and t_i denotes the number of equal observations in the ith group.
Example 2.5. We compare the times that two dentists B and C need to manufacture an inlay (Table 4.1). First, we combine the two samples in ascending order (Table 2.6).

    Observation  19.5  31.5  31.5  33.5  37.0  40.0  43.5  50.5  53.0  54.0
    Dentist      C     C     C     B     B     C     B     C     C     B
    Rank         1     2.5   2.5   4     5     6     7     8     9     10

    Observation  56.0  57.0  59.5  60.0  62.5  62.5  65.5  67.0  75.0
    Dentist      B     B     B     B     C     C     B     B     B
    Rank         11    12    13    14    15.5  15.5  17    18    19

Table 2.6. Assignment of ranks (cf. Table 4.1).
We have r = 2 groups of equal data:

    Group 1 : twice the value 31.5; t_1 = 2 ,
    Group 2 : twice the value 62.5; t_2 = 2 .

The correction term then is

    Σ_{i=1}^2 (t_i³ − t_i)/12 = (2³ − 2)/12 + (2³ − 2)/12 = 1 .
The sums of ranks are given by

    R1 (dentist B) = 4 + 5 + · · · + 19 = 130 ,
    R2 (dentist C) = 1 + 2.5 + · · · + 15.5 = 60 ,

and, according to (2.37) and (2.38), we get

    U1 = 11 · 8 + 11(11 + 1)/2 − 130 = 24 ,
    U2 = 11 · 8 + 8(8 + 1)/2 − 60 = 64 ,
    U1 + U2 = 88 = 11 · 8 (control).
With S = m + n = 11 + 8 = 19 and U = U1, the test statistic (2.40) becomes

    u = (24 − 44) / √{ [ 88 / (19 · 18) ] [ (19³ − 19)/12 − 1 ] } = −1.65 ,

and, therefore, |u| = 1.65 < 1.96 = u_{1−0.05/2}. The null hypothesis H0 : "Both dentists need the same time to make an inlay" is not rejected. Both samples can be regarded as homogeneous and may be combined into a single sample for further evaluation.
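The whole computation of Example 2.5 — midranks, rank sums, U, and the tie-corrected statistic (2.40) — can be sketched in plain Python (function names are ours; data from Table 2.6):

```python
import math

def midranks(sample, pool):
    sp = sorted(pool)
    # tied observations share the average of their rank positions
    return [sp.index(v) + 1 + (sp.count(v) - 1) / 2 for v in sample]

B = [33.5, 37.0, 43.5, 54.0, 56.0, 57.0, 59.5, 60.0, 65.5, 67.0, 75.0]
C = [19.5, 31.5, 31.5, 40.0, 50.5, 53.0, 62.5, 62.5]
pool = B + C
m, n, S = len(B), len(C), len(pool)
R1, R2 = sum(midranks(B, pool)), sum(midranks(C, pool))   # 130, 60
U1 = m * n + m * (m + 1) / 2 - R1                         # 24
U2 = m * n + n * (n + 1) / 2 - R2                         # 64
U = min(U1, U2)
tie_sizes = [pool.count(v) for v in set(pool) if pool.count(v) > 1]
bracket = (S**3 - S) / 12 - sum((t**3 - t) / 12 for t in tie_sizes)
u = (U - m * n / 2) / math.sqrt(m * n / (S * (S - 1)) * bracket)
# u ≈ -1.65, in agreement with the text
```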
We now assume the working time to be normally distributed. Hence, we can apply the t–test, with (see Table 4.1)

    dentist B : x̄ = 55.27, s_x² = 12.74², n1 = 11 ,
    dentist C : ȳ = 43.88, s_y² = 15.75², n2 = 8 .
The test statistic (2.15) is given by

    F_{7,10} = 15.75² / 12.74² = 1.53 < 3.15 = F_{7,10;0.95} ,

and the hypothesis of equal variances is not rejected. To test the hypothesis H0 : µx = µy, the test statistic (1.5) is used. The pooled sample variance, calculated according to (1.6), gives s² = (10 · 12.74² + 7 · 15.75²)/17 = 14.06². We now evaluate the test statistic (1.5) and get

    t_17 = (55.27 − 43.88)/14.06 · √( 11 · 8 / (11 + 8) ) = 1.74 < 2.11 = t_{17;0.975} (two–sided, α = 0.05) .

As before, the null hypothesis is not rejected.
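For completeness, the F- and t-computations above can be replicated numerically (a plain-Python sketch using the summary values from the text):

```python
import math

n1, n2 = 11, 8
xbar, ybar = 55.27, 43.88
sx, sy = 12.74, 15.75

F = sy**2 / sx**2                                   # variance-ratio test
s2 = ((n1 - 1) * sx**2 + (n2 - 1) * sy**2) / (n1 + n2 - 2)
t = (xbar - ybar) / math.sqrt(s2) * math.sqrt(n1 * n2 / (n1 + n2))
# F ≈ 1.53, pooled s ≈ 14.06, and t ≈ 1.74, matching the text
```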
2.6 Comparison of Two Groups with Categorical Response
In the previous sections the comparisons in matched–pair designs and in designs with two independent groups were based on the assumption of a continuous response. Now we want to compare two groups with categorical response. The relevant distributions (binomial, multinomial, and Poisson) and maximum–likelihood estimation are discussed in detail in Chapter 8.
To start with, we first focus on binary response, e.g., recovery/no recovery from an illness, success/no success in a game, or scoring higher/lower than a given level.
2.6.1 McNemar’s Test and Matched–Pair Design
In the case of binary response we use the codings 0 and 1, so that the pairs in a matched design take one of the response tuples (0, 0), (0, 1), (1, 0), or (1, 1). The observations are summarized in a 2 × 2 table:

                      Group 1
                      0       1       Sum
    Group 2    0      a       c       a + c
               1      b       d       b + d
    Sum               a + b   c + d   a + b + c + d = n
The null hypothesis is H0 : p1 = p2, where p_i is the probability P(1 | group i) (i = 1, 2). The test is based on the relative frequencies h1 = (c + d)/n and h2 = (b + d)/n for response 1, which differ in b and c (these are the frequencies for the discordant results (0, 1) and (1, 0)).
Under H0, the values of b and c are expected to be equal or, equivalently, the expression b − (b + c)/2 is expected to be zero. For a given value of b + c, the number of discordant pairs follows a binomial distribution with parameter p = 1/2 (the probability to observe a discordant pair (0, 1) or (1, 0)). As a result, we get E[(0, 1)–response] = (b + c)/2 and Var[(0, 1)–response] = (b + c) · (1/2) · (1/2) (and symmetrically for the (1, 0)–response). The following ratio then has expectation 0 and variance 1:

    ( b − (b + c)/2 ) / √( (b + c) · (1/2) · (1/2) ) = (b − c) / √(b + c)  ~  N(0, 1) under H0 ,

which follows the standard normal distribution for reasonably large b + c due to the central limit theorem. This approximation can be used for b + c ≥ 20. For the continuity correction, the absolute value |b − c| is decreased
by 1. Finally, we get the following test statistics:

    Z = ( (b − c) − 1 ) / √(b + c)   if b ≥ c ,   (2.41)
    Z = ( (b − c) + 1 ) / √(b + c)   if b < c .   (2.42)

Critical values are the quantiles of the cumulative binomial distribution B(b + c, 1/2) in the case of a small sample size. For larger samples (i.e., b + c ≥ 20), we choose the quantiles of the standard normal distribution. The test statistic of McNemar combines the two Z–statistics given above; it is used for the two–sided test problem in the case of b + c ≥ 20 and follows a χ²–distribution:

    Z² = ( |b − c| − 1 )² / (b + c)  ~  χ²_1 .   (2.43)
Example 2.6. A clinical experiment is used to examine two different teeth–cleaning techniques and their effect on oral hygiene. The response is coded binary: reduction of tartar yes/no. The patients are stratified into matched pairs according to sex, current teeth–cleaning technique, and age. We assume the following outcome of the trial:
                      Group 1
                      0       1       Sum
    Group 2    0      10      50      60
               1      70      80      150
    Sum               80      130     210
We test H0 : p1 = p2 versus H1 : p1 ≠ p2. Since b + c = 70 + 50 > 20, we choose the McNemar statistic (2.43) and obtain

    Z² = ( |70 − 50| − 1 )² / (70 + 50) = 19²/120 = 3.01 < 3.84 = χ²_{1;0.95}
and do not reject H0.
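The statistic of Example 2.6 is a one-line computation. A plain-Python sketch (function name ours):

```python
def mcnemar_chi2(b, c):
    # continuity-corrected McNemar statistic (2.43)
    return (abs(b - c) - 1) ** 2 / (b + c)

z2 = mcnemar_chi2(70, 50)
# 19^2 / 120 ≈ 3.01 < 3.84 = chi^2_{1;0.95}, so H0 is not rejected
```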
Remark. Modifications of the McNemar test can be constructed analogously to sign tests. Let n be the number of nonzero differences in the response of the pairs, and let T+ and T− be the numbers of positive and negative differences, respectively. Then the test statistic, analogous to the Z–statistics (2.41) and (2.42), is given by

    Z = ( (T+/n − 1/2) ± 1/(2n) ) / ( 1/√(4n) ) ,   (2.44)

in which we use +1/(2n) if T+/n < 1/2 and −1/(2n) if T+/n ≥ 1/2. The null hypothesis is H0 : µ_d = 0. Depending on the sample size (n ≥ 20 or n < 20), we use the quantiles of the normal or the binomial distribution.
2.6.2 Fisher’s Exact Test for Two Independent Groups
Regarding two independent groups of sizes n1 and n2 with binary response, we get the following 2 × 2 table:

           Group 1   Group 2
    1      a         c         a + c
    0      b         d         b + d
           n1        n2        n
The relative frequencies of response 1 are p̂1 = a/n1 and p̂2 = c/n2. The null hypothesis is H0 : p1 = p2 = p. In this contingency table, we identify the cell with the smallest count and calculate the probability of this and of all other tables with an even smaller count in that cell. In doing so, we have to ensure that the marginal sums remain constant.
Assume (1, 1) to be the weakest cell. Under H0 we have, for response 1 in both groups (for given n, n1, n2, and p),

    P( (a + c) | n, p ) = C(n, a + c) p^{a+c} (1 − p)^{n−(a+c)} ,

for Group 1 and response 1,

    P( a | (a + b), p ) = C(a + b, a) p^a (1 − p)^b ,

and for Group 2 and response 1,

    P( c | (c + d), p ) = C(c + d, c) p^c (1 − p)^d .
Since the two groups are independent, the joint probability is given by

    P(Group 1 = a ∧ Group 2 = c) = C(a + b, a) p^a (1 − p)^b · C(c + d, c) p^c (1 − p)^d ,

and the conditional probability of a and c (for the given marginal sum a + c) is

    P(a, c | a + c) = C(a + b, a) C(c + d, c) / C(n, a + c)
                    = [ (a + b)! (c + d)! (a + c)! (b + d)! / n! ] · 1 / (a! b! c! d!) .
Hence, the probability to observe the given table, or a table with an even smaller count in the weakest cell, is

    P = [ (a + b)! (c + d)! (a + c)! (b + d)! / n! ] · Σ_i 1 / (a_i! b_i! c_i! d_i!) ,

with summation over all cases i with a_i ≤ a. If P < 0.05 (one–sided) or 2P < 0.05 (two–sided) holds, then the hypothesis H0 : p1 = p2 is rejected.
Example 2.7. We compare two independent groups of subjects receiving either type A or type B of an implanted denture and observe whether it is lost during the healing process (8 weeks after implantation). The data are

                  A     B
    Loss   yes    2     8     10
           no     10    4     14
                  12    12    24
The two tables with a smaller count in the (yes | A) cell are

    1   9          0   10
    11  3    and   12  2

and, therefore, we get

    P = [ 10! 14! 12! 12! / 24! ] · ( 1/(2! 8! 10! 4!) + 1/(1! 9! 11! 3!) + 1/(0! 10! 12! 2!) ) = 0.018 ,

    one–sided test: P = 0.018 < 0.05 ,
    two–sided test: 2P = 0.036 < 0.05 .
Decision. H0 : p1 = p2 is rejected in both cases. The risk of loss is significantly higher for type B than for type A.
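The tail sum of Example 2.7 can be computed directly with binomial coefficients. A plain-Python sketch (function name ours; the loop decrements the weakest cell while holding all margins fixed):

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    # P = sum of hypergeometric probabilities of the observed table and
    # of all tables with a smaller count in the (1, Group 1) cell
    n = a + b + c + d
    p = 0.0
    for ai in range(a, -1, -1):
        bi, ci, di = a + b - ai, a + c - ai, d - a + ai
        if min(bi, ci, di) < 0:
            break
        p += comb(a + b, ai) * comb(c + d, ci) / comb(n, a + c)
    return p

p = fisher_one_sided(2, 10, 8, 4)
# p ≈ 0.018; the two-sided value 2p ≈ 0.036 is also below 0.05
```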
Recurrence Relation
Instead of using tables, we can also use the following recurrence relation (cited by Sachs, 1974, p. 289):

    P_{i+1} = ( a_i d_i / b_{i+1} c_{i+1} ) P_i .
In our example, we get

    P = P_1 + P_2 + P_3 ,
    P_1 = [ 10! 14! 12! 12! / 24! ] · 1/(2! 8! 10! 4!) = 0.0166 ,
    P_2 = (2 · 4)/(11 · 9) · P_1 = 0.0013 ,
    P_3 = (1 · 3)/(12 · 10) · P_2 = 0.0000 ,

and, therefore, P = 0.0179 ≈ 0.0180.
2.7 Exercises and Questions
2.7.1 What are the differences between the paired t–test and the two–sample t–test (degrees of freedom, power)?
2.7.2 Consider two samples with n1 = n2, α = 0.05, and β = 0.05 in a matched–pair design and in a design with two independent groups. What is the minimum sample size needed to achieve a power of 0.95, assuming σ² = 1 and δ² = 4?
2.7.3 Apply Wilcoxon’s sign–rank test for a matched–pair design to thefollowing table:
Table 2.7. Scorings of students who took a cup of coffee either before or after alecture.
Student Before After1 17 252 18 453 25 374 12 105 19 216 34 277 29 29
Does treatment B (coffee before) significantly influence the score?
2.7.4 For a comparison of two independent samples, X: leaf length of strawberries with manuring A, and Y: leaf length with manuring B, the normal distribution is called into question. Test H0 : µX = µY using the homogeneity test of Wilcoxon, Mann, and Whitney.
    A    B
    37   45
    49   51
    51   62
    62   73
    74   87
    89   45
    44   33
    53
    17
Note that there are ties.
2.7.5 Recode the response in Table 2.4 into binary response with

    flexibility < 100 : 0 ,
    flexibility ≥ 100 : 1 ,

and apply Fisher's exact test for H0 : p1 = p2 (p_i = P(1 | group i)).
2.7.6 Considering Exercise 2.7.3, we assume that the response has been recoded into binary form according to scoring higher/lower than average (1/0). A sample of n = 100 shows the following outcome:

                   Before
                   0     1
    After    0     20    25     45
             1     15    40     55
                   35    65     100
Test for H0 : p1 = p2 using McNemar’s test.
3 The Linear Regression Model
3.1 Descriptive Linear Regression
The main focus of this chapter is the linear regression model and its basic principle of estimation. We introduce the fundamental method of least squares by looking at the least squares geometry and discussing some of its algebraic properties.
Figure 3.1. Scatterplot of advertising time and number of positive reactions.
In empirical work, it is quite often appropriate to specify the relationship between two sets of data by a simple linear function. For example, we model the influence of advertising time on the number of positive reactions
from the public. From the scatterplot in Figure 3.1 one could suspect a linear function between advertising time (x–axis) and the number of positive reactions (y–axis). The study was done on 66 people in order to investigate the impact and cognition of advertising on TV.
Let Y denote the dependent variable which is related to a set of K independent variables X1, . . . , XK by a function f. As both sets comprise T observations on each variable, it is convenient to use the following notation:

    (y, X) = | y1  x11 · · · xK1 |
             | ·   ·         ·  |  = ( y  x_(1) · · · x_(K) ) = | y1  x'_1 |
             | yT  x1T · · · xKT |                              | ·   ·    |
                                                                | yT  x'_T | ,   (3.1)

where x_(k) denotes a column vector and x'_t a row vector. We intend to obtain a good overall fit of the model along with easy mathematical tractability. Choosing f to be linear seems realistic, as almost every specification of f suffers from the exclusion of important variables or the inclusion of unimportant ones. Additionally, even a correct set of variables is often measured with at least some error, so that a correct functional relationship between y and X will most likely not hold precisely. On the other hand, the linear approach may serve as a suitable approximation to several nonlinear functional relationships.
If we assume Y to be generated additively by a linear combination of the independent variables, we may write

    Y = X1 β1 + · · · + XK βK .   (3.2)

The β's in (3.2) are unknown (scalar–valued) coefficients explaining the direction and magnitude of their influence on Y; the magnitude of the β's indicates their importance in explaining Y. Therefore, an obvious goal of empirical regression analysis consists of finding those values for β1, . . . , βK which minimize the differences

    e_t := y_t − x'_t β   (t = 1, . . . , T) ,

where β' = (β1, . . . , βK). The e_t's are called residuals and play an important role in regression analysis (e.g., in regression diagnostics; see, e.g., Rao, Toutenburg, Shalabh and Heumann (2008, Chapter 7)). In general, we cannot expect that e_t = 0 holds for all t = 1, . . . , T, i.e., that the scatterplot in Figure 3.1 forms a straight line. Accordingly, the residuals are incorporated into the linear approach upon setting

    y_t = x'_t β + e_t   (t = 1, . . . , T) .   (3.3)

This may be summarized in matrix notation as

    y = Xβ + e .   (3.4)
Obviously, a successful choice of β is indicated by small values of all the e_t. Thus, there are quite a few conceivable principles by which the quality of an actual choice of β may be evaluated.
Among others, the following measures have been proposed:

    Σ_{t=1}^T |e_t| ,    max_t |e_t| ,    Σ_{t=1}^T e_t² = e'e .   (3.5)
Whereas the first two proposals lead either to complicated mathematics or to poor statistical properties, the last principle has become widely accepted. It provides the basis for the famous method of least squares.
3.2 The Principle of Ordinary Least Squares
Let B be the set of all possible vectors β. If there is no further information, we have B = R^K (K–dimensional real Euclidean space). The idea is to find a vector b' = (b1, . . . , bK) from B that minimizes (3.5), the sum of squared residuals,

    S(β) = Σ_{t=1}^T e_t² = e'e = (y − Xβ)'(y − Xβ) ,   (3.6)

given y and X. Recalling the scatterplot in Figure 3.1, we can illustrate (3.6) by drawing the regression line and visualizing the individual difference ε_i between the observed value (x_i, y_i) and the corresponding fitted value (x_i, ŷ_i) on the regression line. Figure 3.2 shows these differences for seven values.
Figure 3.2. Scatterplot with regression line and some εi.
A minimum will always exist, as S(β) is a real–valued, convex, differentiable function. If we rewrite S(β) as

    S(β) = y'y + β'X'Xβ − 2β'X'y   (3.7)

and differentiate with respect to β (with the help of A.63–A.67), we obtain

    ∂S(β)/∂β = 2X'Xβ − 2X'y ,   (3.8)
    ∂²S(β)/∂β² = 2X'X ,   (3.9)

with 2X'X being nonnegative definite. Equating the first derivative to zero yields the normal equations

    X'Xβ = X'y .   (3.10)
The solution of (3.10) is straightforwardly obtained by considering a system of linear equations

    Ax = a ,   (3.11)

where A is an (n × m)–matrix and a an (n × 1)–vector. The (m × 1)–vector x solves the equation. Let A− be a generalized inverse of A (cf. Definition A.26). Then we have:

Theorem 3.1. The linear equation Ax = a has a solution if and only if

    AA−a = a .   (3.12)

If (3.12) holds, then all solutions are given by

    x = A−a + (I − A−A)w ,   (3.13)

where w is an arbitrary (m × 1)–vector. (Proof 1, Appendix B.)

Remark. x = A−a (i.e., (3.13) with w = 0) is a particular solution of Ax = a.
We apply this result to our problem, i.e., to (3.10), and first check the solvability of the linear equation. X is a (T × K)–matrix, thus X'X is a symmetric (K × K)–matrix of rank(X'X) = p ≤ K. Equation (3.10) has a solution if and only if (cf. (3.12))

    (X'X)(X'X)−X'y = X'y .   (3.14)

From the definition of a g–inverse,

    (X'X)(X'X)−(X'X) = (X'X) ,

we have with Theorem A.46

    X'X(X'X)−X' = X' ,

such that (3.14) holds. Thus, the normal equations (3.10) always have a solution. By (3.13), the set of all solutions of (3.10) is of the form

    b = (X'X)−X'y + (I − (X'X)−X'X)w ,   (3.15)

where w is an arbitrary (K × 1)–vector. For the choice w = 0, we have with

    b = (X'X)−X'y   (3.16)

a particular solution, which is nonunique as the generalized inverse (X'X)− is nonunique. An interesting algebraic property can be seen from the following theorem.
Theorem 3.2. The vector β = b minimizes the sum of squared errors if and only if it is a solution of X'Xb = X'y. All solutions are located on the hyperplane Xb. (Proof 2, Appendix B.)

The solutions b of the normal equations are called empirical regression coefficients or empirical least squares estimates of β; ŷ = Xb is called the empirical regression hyperplane. An important property of the sum of squared errors S(b) is

    y'y = ŷ'ŷ + e'e ,   (3.17)

where e denotes the residuals y − Xb. This means that the sum of squared observations y'y may be decomposed additively into the sum of squared fitted values ŷ'ŷ, explained by the regression, and the sum of (unexplained) squared residuals e'e. We derive (3.17) by premultiplying (3.10) by b':

    b'X'Xb = b'X'y

and

    ŷ'ŷ = (Xb)'(Xb) = b'X'Xb = b'X'y ,   (3.18)

according to

    S(b) = e'e = (y − Xb)'(y − Xb)
         = y'y − 2b'X'y + b'X'Xb
         = y'y − b'X'y
         = y'y − ŷ'ŷ .   (3.19)

Remark. In the analysis of variance, ŷ'ŷ will be decomposed further into orthogonal components which are related to the main and mixed effects of treatments.
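The normal equations and the decomposition (3.17) can be illustrated for the simple full-rank case K = 2 (intercept and one regressor) in plain Python; the data and function name below are made up for illustration (the general rank-deficient case would need a generalized inverse instead):

```python
# X = [1, x]: solve the 2x2 normal equations X'X b = X'y directly
def ols_line(x, y):
    T = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = T * sxx - sx * sx            # nonzero for non-constant x
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (T * sxy - sx * sy) / det
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
b0, b1 = ols_line(x, y)
yhat = [b0 + b1 * v for v in x]
e = [yi - yh for yi, yh in zip(y, yhat)]
# decomposition (3.17): y'y = yhat'yhat + e'e
lhs = sum(v * v for v in y)
rhs = sum(v * v for v in yhat) + sum(v * v for v in e)
```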
3.3 Geometric Properties of Ordinary Least Squares Estimation
This section gives a short survey of some of the geometric properties of ordinary least squares (OLS) estimation. Because of its geometric and algebraic character it is more theoretical than other sections, and the reader with mainly practical interests may skip these pages.
Once again, we consider the linear model (3.4), i.e.,

    y = Xβ + e ,

where Xβ ∈ R(X) = {Θ : Θ = Xβ}. R(X), the column space of X, is the set of all vectors Θ that can be written as Θ = Xβ for some vector β. Both R(X) and the null space N(X) = {Φ : XΦ = 0} are vector spaces. The basic relation between the column space and the null space is given by

    N(X) = R(X')⊥ .   (3.20)

If we assume that rank(X) = p, then R(X) is of dimension p. Let R(X)⊥ denote the orthogonal complement of R(X), and let Xb be denoted by Θ0, where b is the OLS estimate of β. Then we have:

Theorem 3.3. The OLS estimate Θ0 = Xb minimizing

    S(β) = (y − Xβ)'(y − Xβ) = (y − Θ)'(y − Θ) = S(Θ)   (3.21)

for Θ ∈ R(X) is given by the orthogonal projection of y on the space R(X). (Proof 3, Appendix B.)
As the content of Theorem 3.3 may be difficult to visualize, Figure 3.3 may help to give a better impression.
The OLS estimator Xb of Xβ may also be obtained in a more direct wayby using idempotent projection matrices.
Theorem 3.4. Let P be a symmetric and idempotent matrix of rank p,representing the orthogonal projection of RT on R(X).Then Xb = Θ0 = Py. (Proof 4, Appendix B.)
The determination of P depends on the rank of X. Whereas forrank(X) = K, i.e., X is of full rank, P is determined by X(X ′X)−1X ′,it turns out to be more difficult when rank(X) = p < K. As shown inProof 4, Appendix B, unique solutions are derived, based on (K−p) linearrestrictions on β by Rβ = r, leading to the conditional Ordinary LeastSquares Estimator (OLSE)
b(R, r) = (X ′X + R′R)−1(X ′y + R′r) . (3.22)
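The projection view of Theorem 3.4 is easy to check numerically in the full-rank case. The sketch below (a minimal illustration with our own toy data, not from the text) forms P = X(X′X)−1X′ and verifies that P is symmetric and idempotent, that Py equals the fitted vector Xb, and that the residual is orthogonal to R(X).

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 8, 3
X = rng.standard_normal((T, K))      # full column rank with probability 1
y = rng.standard_normal(T)

# Orthogonal projection matrix onto the column space R(X)
P = X @ np.linalg.inv(X.T @ X) @ X.T

# P is symmetric and idempotent
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)

# Py equals the OLS fit Xb with b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(P @ y, X @ b)

# The residual (I - P)y is orthogonal to every column of X
resid = y - P @ y
assert np.allclose(X.T @ resid, np.zeros(K))
print("projection identities verified")
```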
[Figure 3.3 (graphic): the vector y is projected onto the plane R(X) spanned by x1 and x2; the fitted vector ŷ = Py lies in R(X), and the residual ε̂ = (I − P)y is orthogonal to it.]

Figure 3.3. Orthogonal projection of y on R(X).
The conditional OLSE (in the sense of being restricted by Rβ = r) b(R, r) will be most useful in tackling the problem of multicollinearity, which is typical for design matrices in ANOVA models (see Section 3.5).
3.4 Best Linear Unbiased Estimation
After introducing the classical linear model, with its assumptions and measures for evaluating linear estimators, we want to show that b is the best linear unbiased estimator of β. As estimation of the variance is always of practical interest, we also describe the estimation of σ2, in general and for the special case K = 2.

In descriptive regression analysis, the regression coefficient β is allowed to vary and is then determined by the method of least squares in a purely algebraic way, using projection matrices. The classical linear regression model now interprets the vector β as a fixed but unknown model parameter. Estimation is then carried out by minimizing an appropriate risk function. The
model and its main assumptions are given as follows:
y = Xβ + ε ,    E(ε) = 0 ,    E(εε′) = σ2I ,
X nonstochastic ,    rank(X) = K .
(3.23)
As X is assumed to be nonstochastic, X and ε are independent, i.e.,
E(ε | X) = E(ε) = 0 , (3.24)
E(X ′ε | X) = X ′ E(ε) = 0 , (3.25)
and
E(εε′ | X) = E(εε′) = σ2I . (3.26)
The rank condition on X means that there are no exact linear relations between the K regressors X1, . . . , XK; in particular, the inverse matrix (X′X)−1 exists. Using (3.23) and (3.24) we get the conditional expectation
E(y|X) = Xβ + E(ε|X) = Xβ , (3.27)
and by (3.26) the covariance matrix of y is of the form
E[(y − E(y))(y − E(y))′|X] = E(εε′|X) = σ2I . (3.28)
In the following, all expected values should be understood as conditional on a fixed matrix X.
3.4.1 Linear Estimators
The statistician's task is now to estimate the true but unknown vector β of regression parameters in the model (3.23), on the basis of the observations (y, X) and the assumptions already stated. This is done by choosing a suitable estimator β̂, which is then used to calculate the conditional expectation E(y|X) = Xβ̂ and an estimate of the error variance σ2. It is common to choose an estimator β̂ that is linear in y, i.e.,

β̂ = Cy + d ,     (3.29)

where C is a (K × T)-matrix and d a (K × 1)-vector. C and d are nonstochastic and are determined by minimizing a suitably chosen risk function.
At first, we have to introduce some definitions.
Definition 3.5. β̂ is called a homogeneous estimator of β if d = 0; otherwise β̂ is called inhomogeneous.

In descriptive regression analysis we measured the goodness of fit of the model by the sum of squared errors S(β). Analogously, we define for the random variable β̂ the quadratic loss function

L(β̂, β, A) = (β̂ − β)′A(β̂ − β) ,     (3.30)
where A is a symmetric and, at least, nonnegative-definite (K × K)-matrix.

Remark. We write A ≥ 0 (A nonnegative definite) and A > 0 (A positive definite) in accordance with Theorems A.21–A.23.

Obviously, the loss (3.30) depends on the sample. Thus, we have to consider the average, or expected, loss over all possible samples. The expected loss of an estimator is called its risk.

Definition 3.6. The quadratic risk of an estimator β̂ of β is defined as

R(β̂, β, A) = E[(β̂ − β)′A(β̂ − β)] .     (3.31)

The next step now consists of finding an estimator β̂ that minimizes the quadratic risk over a class of appropriate functions. Therefore, we have to define a criterion for comparing estimators.

Definition 3.7 (R(A)-Superiority). An estimator β̂2 of β is called R(A) superior, or an R(A) improvement over another estimator β̂1 of β, if

R(β̂1, β, A) − R(β̂2, β, A) ≥ 0 .     (3.32)
3.4.2 Mean Square Error
The quadratic risk is closely related to the matrix-valued criterion of the mean square error (MSE) of an estimator. The MSE is defined as

M(β̂, β) = E(β̂ − β)(β̂ − β)′ .     (3.33)

We denote the covariance matrix of an estimator β̂ (see also Example A.1, Appendix A) by V(β̂):

V(β̂) = E(β̂ − E(β̂))(β̂ − E(β̂))′ .     (3.34)

If E(β̂) = β, then β̂ is called unbiased (for β); if E(β̂) ≠ β, then β̂ is called biased. The difference between E(β̂) and β is the bias:

Bias(β̂, β) = E(β̂) − β .     (3.35)

If β̂ is unbiased, then obviously Bias(β̂, β) = 0.

The following decomposition of the mean square error often proves to be useful:

M(β̂, β) = E[(β̂ − E(β̂)) + (E(β̂) − β)][(β̂ − E(β̂)) + (E(β̂) − β)]′
         = V(β̂) + (Bias(β̂, β))(Bias(β̂, β))′ ,     (3.36)

i.e., the MSE of an estimator is the sum of its covariance matrix and the squared bias. In terms of statistical inference, the MSE can be interpreted as
the sum of the stochastic and the systematic error made by estimating β through β̂.
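The decomposition (3.36) can be illustrated by a small Monte Carlo experiment (our own sketch, not from the text): for the deliberately biased shrinkage estimator β̂ = c·ȳ of a scalar mean μ, the simulated MSE should match Var(β̂) + Bias² = c²σ²/n + (c − 1)²μ².

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, c, reps = 2.0, 1.0, 10, 0.9, 200_000

# Shrinkage estimator c * (sample mean), applied to many simulated samples
samples = rng.normal(mu, sigma, size=(reps, n))
est = c * samples.mean(axis=1)

mse_mc = np.mean((est - mu) ** 2)                      # simulated MSE
mse_theory = c**2 * sigma**2 / n + (c - 1)**2 * mu**2  # Var + Bias^2

assert abs(mse_mc - mse_theory) < 5e-3
print(round(mse_theory, 3))  # 0.121
```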
Mean Square Error Superiority
As the MSE contains all relevant information about the quality of an estimator, comparisons between different estimators may be made by comparing their MSE matrices.

Definition 3.8 (MSE-I Criterion). Consider two estimators β̂1 and β̂2 of β. Then β̂2 is called MSE superior to β̂1 (or an MSE improvement over β̂1) if the difference of their MSE matrices is nonnegative definite, i.e., if

Δ(β̂1, β̂2) = M(β̂1, β) − M(β̂2, β) ≥ 0 .     (3.37)

MSE superiority is a local property in the sense that it depends on the particular value of β. The quadratic risk function (3.31) is just a scalar-valued version of the MSE:

R(β̂, β, A) = tr{A M(β̂, β)} .     (3.38)

One important connection between R(A) and MSE superiority was given by Theobald (1974) and Trenkler (1981):

Theorem 3.9. Consider two estimators β̂1 and β̂2 of β. The following two statements are equivalent:

Δ(β̂1, β̂2) ≥ 0 ,     (3.39)

R(β̂1, β, A) − R(β̂2, β, A) = tr{A Δ(β̂1, β̂2)} ≥ 0     (3.40)

for all matrices of the type A = aa′.

Proof. Using (3.37) and (3.38) we get

R(β̂1, β, A) − R(β̂2, β, A) = tr{A Δ(β̂1, β̂2)} .     (3.41)

By Theorem A.20, tr{A Δ(β̂1, β̂2)} ≥ 0 holds for all matrices A = aa′ ≥ 0 if and only if Δ(β̂1, β̂2) ≥ 0.
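The equivalence in Theorem 3.9 rests on the identity a′Δa = tr{aa′Δ}. A quick numerical check (our own sketch; the matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4
B = rng.standard_normal((K, K))
Delta = B @ B.T                      # a nonnegative definite "MSE difference"
a = rng.standard_normal(K)

# Scalar risk difference with A = aa' equals the quadratic form a'Δa
lhs = a @ Delta @ a
rhs = np.trace(np.outer(a, a) @ Delta)
assert np.isclose(lhs, rhs)
assert lhs >= 0                      # consistent with Δ ≥ 0  ⇒  tr{aa'Δ} ≥ 0
print("trace identity verified")
```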
In practice, β is usually unknown, so quantities like the bias or the MSE cannot be computed. Within simulation experiments, where β is set by the experimenter, these quantities can be estimated ("estimated" because each simulation run is itself a random sample).
3.4.3 Best Linear Unbiased Estimation
The previous definitions and theorems now enable us to evaluate the estimator β̂.

In (3.29), the matrix C and the vector d are unknown and have to be chosen in an optimal way, by minimizing the expectation of the sum of squared errors S(β), namely, the risk function

r(β̂, β) = E[(y − Xβ̂)′(y − Xβ̂)] .     (3.42)

Direct calculation yields

y − Xβ̂ = Xβ + ε − Xβ̂ = ε − X(β̂ − β) ,     (3.43)

such that

r(β̂, β) = tr E[(ε − X(β̂ − β))(ε − X(β̂ − β))′]
         = tr{σ2 I_T + X M(β̂, β) X′ − 2X E[(β̂ − β)ε′]}
         = σ2 T + tr{X′X M(β̂, β)} − 2 tr{X E[(β̂ − β)ε′]} .     (3.44)
Now we specify the risk function r(β̂, β) for linear estimators, considering unbiased estimators only.

Unbiasedness of β̂ requires that E(β̂ | β) = β holds independently of the true β in model (3.23). We will see that this imposes restrictions on the matrices to be determined, i.e.,

E(β̂ | β) = C E(y) + d = CXβ + d = β     for all β .     (3.45)

For the choice β = 0 we immediately have

d = 0 ,     (3.46)

and the condition equivalent to (3.45) is

CX = I .     (3.47)

Inserting this into (3.43) yields

y − Xβ̂ = Xβ + ε − XCXβ − XCε = ε − XCε ,     (3.48)

and (cf. (3.44))

tr{X E[(β̂ − β)ε′]} = tr{X E(Cεε′)} = σ2 tr{XC} = σ2 tr{CX} = σ2 tr{I_K} = σ2 K .     (3.49)
Thus we can state the following:
Theorem 3.10. For linear unbiased estimators β̂ = Cy with CX = I, it holds that M(β̂, β) = V(β̂) = σ2 CC′ and

r(β̂, β) = tr{(X′X) V(β̂)} + σ2(T − 2K) .     (3.50)
If we consider the risk functions r(β̂, β) and R(β̂, β, X′X), then we may state:

Theorem 3.11. Let β̂1 and β̂2 be two linear unbiased estimators. Then

r(β̂1, β) − r(β̂2, β) = tr{(X′X) Δ(β̂1, β̂2)} = R(β̂1, β, X′X) − R(β̂2, β, X′X) ,     (3.51)

where Δ(β̂1, β̂2) = V(β̂1) − V(β̂2), i.e., the difference of the covariance matrices only.
Using Theorem 3.10 we get, with CX = I,

r(β̂, β) = σ2(T − 2K) + tr{X′X V(β̂)} = σ2(T − 2K) + σ2 tr{X′X CC′} .

Minimizing r(β̂, β) with respect to C leads to the optimum matrix C = (X′X)−1X′ (Proof 5, Appendix B). The optimal linear unbiased estimator therefore coincides with the descriptive or empirical OLS estimator b and is given by

β̂opt = Cy = (X′X)−1X′y ,     (3.52)

being unbiased with (K × K)-covariance matrix

V_b = σ2(X′X)−1     (3.53)

(see also Proof 5, Appendix B).
The main reason for the popularity of the OLS estimator b, in contrast to other estimators, is its minimum-variance property within the class of linear unbiased estimators β̂. More precisely:

Theorem 3.12. Let β̂ be an arbitrary linear unbiased estimator of β with covariance matrix V_β̂, and let a be an arbitrary (K × 1)-vector. Then the following two equivalent statements hold:

(a) The difference V_β̂ − V_b is always nonnegative definite (n.n.d.).

(b) The variance of the linear form a′b is always less than or equal to the variance of a′β̂:

a′V_b a ≤ a′V_β̂ a ,     or     a′(V_β̂ − V_b)a ≥ 0 .     (3.54)

Proof. See Proof 6, Appendix B; note that Theorem 3.12 also holds componentwise, i.e., for Var(β̂i) and Var(bi).

The minimum-variance property of b is usually expressed by the fundamental Gauss–Markov theorem.
Theorem 3.13 (Gauss–Markov Theorem). Consider the classical linear regression model (3.23). The OLS estimator

b0 = (X′X)−1X′y ,     (3.55)

with covariance matrix

V_b0 = σ2(X′X)−1 ,     (3.56)

is the best homogeneous linear unbiased estimator of β in the sense of the two properties of Theorem 3.12. b0 is also called the Gauss–Markov estimator.
Estimation of a Linear Function of β
If we are interested in estimating a linear combination of the components of β, e.g., a linear contrast in an ANOVA model, then we have to consider

d = a′β ,     (3.57)

where a is a known (K × 1)-vector. For now it is sufficient to restrict consideration to linear homogeneous estimators d̂ = c′y. Then we have:

Theorem 3.14. In the classical linear regression model (3.23),

d̂ = a′b0 ,     (3.58)

with variance

Var(d̂) = σ2 a′(X′X)−1a = a′V_b0 a ,     (3.59)

is the best linear unbiased estimator of d = a′β. (Proof 7, Appendix B.)
3.4.4 Estimation of σ2
In this section we estimate σ2, an important parameter characterizing the deviation between the observed and the predicted response values. We have decided not to move the derivation to Appendix B because it is a simple proof that supports familiarity with the classical linear model.

We start by rewriting ε̂ with the help of projection matrices, which simplifies the computation of E(ε̂′ε̂). This leads to an estimator of σ2, whose unbiasedness we subsequently prove. Finally, we demonstrate the special case K = 2.

The sum of squares ε̂′ε̂ of the estimated errors ε̂ = y − ŷ obviously provides an appropriate basis for estimating σ2.
In detail, we get
ε̂ = y − ŷ = Xβ + ε − Xb0
  = ε − X(X′X)−1X′ε
  = (I − X(X′X)−1X′)ε = Mε .     (3.60)
The matrix M is idempotent by Theorem A.36. As a consequence, the sum of squared errors

ε̂′ε̂ = ε′M′Mε = ε′Mε

has expectation

E(ε̂′ε̂) = E(ε′Mε)
  = E(tr{ε′Mε})     [Theorem A.1(vi)]
  = E(tr{Mεε′})
  = tr{M E(εε′)}
  = σ2 tr{M}
  = σ2 tr{I_T} − σ2 tr{X(X′X)−1X′}     [Theorem A.1(i)]
  = σ2 tr{I_T} − σ2 tr{(X′X)−1X′X}
  = σ2 tr{I_T} − σ2 tr{I_K}
  = σ2(T − K) .     (3.61)
An unbiased estimator of σ2 is then given by

s2 = ε̂′ε̂/(T − K) = (y − Xb0)′(y − Xb0)/(T − K) .     (3.62)

Hence, an unbiased estimator of V_b0 is given by

V̂_b0 = s2(X′X)−1 .     (3.63)
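The unbiasedness of s2 in (3.62) is easy to check by simulation (a sketch with arbitrary values of T, K, σ2, and β; none of these numbers come from the text): averaging s2 over many simulated samples should reproduce σ2.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K, sigma2, reps = 30, 3, 1.0, 2000
X = rng.standard_normal((T, K))
beta = np.array([1.0, -2.0, 0.5])

s2_vals = []
for _ in range(reps):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), T)
    b0 = np.linalg.solve(X.T @ X, X.T @ y)    # OLS estimator
    resid = y - X @ b0
    s2_vals.append(resid @ resid / (T - K))   # s^2 = RSS / (T - K), cf. (3.62)

# The simulated mean of s^2 is close to the true sigma^2
assert abs(np.mean(s2_vals) - sigma2) < 0.05
```

Dividing the residual sum of squares by T instead of T − K would systematically underestimate σ2, which is exactly what (3.61) quantifies.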
Bivariate Regression K = 2
The important special case K = 2 of the general linear model with K regressors X1, . . . , XK deserves attention. If there is only one true explanatory variable, accompanied by a dummy regressor (a column of 1's), then we speak of the simple linear regression model

yt = α + βxt + εt     (t = 1, . . . , T) .     (3.64)

It is often useful to transform the observations (xt, yt) so that (x̃t, ỹt) represent deviations from the sample means (x̄, ȳ):

ỹt = yt − ȳ ,     x̃t = xt − x̄ .     (3.65)

As

E(ỹt | x1, . . . , xT) = α + βxt − (α + βx̄) = βx̃t ,
we obtain an even simpler form of the model (3.64), with the parameter β unchanged:

ỹt = βx̃t + ε̃t     (t = 1, . . . , T) .     (3.66)

Assuming that ε̄ = (1/T) Σ εt = 0, we have ε̃t = εt for all t. The OLS estimator of β and the unbiased estimator of σ2 are obtained from (B.34) and (3.62) as

b = Σ x̃t ỹt / Σ x̃t²     with     Var(b) = σ2 / Σ x̃t² ,     (3.67)

s2 = (T − 2)−1 Σ (ỹt − x̃t b)² .     (3.68)

From the right-hand side of (3.67) one can easily see what σ2(X′X)−1 looks like for K = 2.

It is easy to see that the OLS estimator of α is given by

α̂ = ȳ − b x̄ .     (3.69)
Example 3.1. We are interested in modeling the dependence of the sales increase y on the advertising expenditure x of 10 department stores:
 i     yi     xi    yi − ȳ   xi − x̄   (xi − x̄)(yi − ȳ)
 1     2.0    1.5    −5.0     −2.5         12.5
 2     3.0    2.0    −4.0     −2.0          8.0
 3     6.0    3.5    −1.0     −0.5          0.5
 4     5.0    2.5    −2.0     −1.5          3.0
 5     1.0    0.5    −6.0     −3.5         21.0
 6     6.0    4.5    −1.0      0.5         −0.5
 7     5.0    4.0    −2.0      0.0          0.0
 8    11.0    5.5     4.0      1.5          6.0
 9    14.0    7.5     7.0      3.5         24.5
10    17.0    8.5    10.0      4.5         45.0
Σ     70     40      0.0      0.0        120.0

ȳ = 7 ,  x̄ = 4 ,  Syy = 252 ,  Sxx = 60 ,  Sxy = 120
Using b = Sxy/Sxx and (3.69) leads to the fitted model

ŷt = −1 + 2xt ,

which is easily calculated as b = 120/60 = 2 and α̂ = 7 − 2 · 4 = −1. The coefficient of determination results from

R² = r² = S²xy/(Sxx Syy) = 120²/(60 · 252) = 0.9524 .
Running the linear regression in SPLUS for the above data set produces the following output:
*** Linear Model ***
Call: lm(formula = Y ~ X, data = kaufhaus, na.action = na.omit)
Residuals:
 Min          1Q           Median       3Q   Max
  -2  -9.384e-016  1.404e-015    1     1

Coefficients:
              Value   Std. Error  t value   Pr(>|t|)
(Intercept)  -1.0000    0.7416    -1.3484    0.2145
X             2.0000    0.1581    12.6491    0.0000

Residual standard error: 1.225 on 8 degrees of freedom
Multiple R-Squared: 0.9524
F-statistic: 160 on 1 and 8 degrees of freedom, the p-value is 1.434e-006
Running the “linear regression” procedure with the centered variables ỹ and x̃ leads to the results of the reduced model (3.66).
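The hand computation of Example 3.1 (and the slope, intercept, and R² in the output above) can be reproduced in a few lines of Python (our own sketch):

```python
import numpy as np

# Advertising x and sales increase y for the 10 department stores of Example 3.1
x = np.array([1.5, 2.0, 3.5, 2.5, 0.5, 4.5, 4.0, 5.5, 7.5, 8.5])
y = np.array([2.0, 3.0, 6.0, 5.0, 1.0, 6.0, 5.0, 11.0, 14.0, 17.0])

Sxx = np.sum((x - x.mean()) ** 2)                 # 60
Sxy = np.sum((x - x.mean()) * (y - y.mean()))     # 120
Syy = np.sum((y - y.mean()) ** 2)                 # 252

b = Sxy / Sxx                  # slope, cf. (3.67)
a = y.mean() - b * x.mean()    # intercept, cf. (3.69)
R2 = Sxy**2 / (Sxx * Syy)      # coefficient of determination

print(b, a, round(R2, 4))      # 2.0 -1.0 0.9524
```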
3.5 Multicollinearity
3.5.1 Extreme Multicollinearity and Estimability
A typical problem in practical work is that there is almost always at least some correlation between the exogenous variables in X. We speak of extreme multicollinearity if two or more columns of X are linearly dependent, i.e., if one is a linear combination of the others. As a consequence, we have rank(X) < K, so that one basic assumption of model (3.23) is violated. In this case, no unbiased linear estimator of β exists.
We recall that the condition for unbiasedness is equivalent to d = 0 and CX = I (cf. (3.47)). If rank(X) = p < K, then CX is of rank p at most (cf. Theorem A.6(iv)), whereas the identity matrix I_K is of rank K. Condition (3.47) can thus never be fulfilled.

This result can be proven in an alternative way, as shown in Proof 8, Appendix B.

The matrix (X′X) is singular, since rank(X) < K, and the solutions of the normal equations (3.10) are no longer unique.

We say that the parameter vector β is not estimable, in the sense that no linear unbiased estimator exists.
Another problem occurring with extreme multicollinearity becomes apparent when considering, without loss of generality, x1 to be a linear combination of all other columns, i.e.,

x1 = Σ_{k=2}^{K} αk xk .
For an arbitrary scalar λ ≠ 0, we can derive the decomposition

Xβ = Σ_{k=1}^{K} xk βk = (1 − λ)β1 x1 + Σ_{k=2}^{K} (βk + λαk β1) xk
   = β̃1 x1 + Σ_{k=2}^{K} β̃k xk = Xβ̃ ,     (3.70)

where β̃1 = (1 − λ)β1 and β̃k = βk + λαk β1, k = 2, . . . , K. This means that the parameter vectors β and β̃ with β ≠ β̃ yield the same systematic component Xβ = Xβ̃. The observations y thus do not depend on β directly, but only through Xβ.

The information in y therefore does not allow us to distinguish between β and β̃. The regression coefficients are not identifiable; the related models are observationally equivalent.
Example 3.2. We consider the model

yt = α + βxt + εt     (t = 1, . . . , T) .     (3.71)

Exact linear dependence between X1 ≡ 1 and X2 = X means that x1 = . . . = xT = a (a constant), such that Σ(xt − x̄)² = 0 and b in (3.67) cannot be calculated.

Let (α̂, β̂)′ = Cy be a linear homogeneous estimator of (α, β)′. Unbiasedness requires that (3.47) is fulfilled, such that

( Σt c1t   a Σt c1t )     ( 1   0 )
( Σt c2t   a Σt c2t )  =  ( 0   1 ) .     (3.72)

There exists no matrix C and no real-valued a ≠ 0 satisfying this; (α, β)′ is not estimable. Since xt = a for all t, we have yt = (α + βa) + εt, so that α and β are only jointly estimable through the estimate ȳ of (α + βa).
3.5.2 Estimation within Extreme Multicollinearity
We are mainly interested in making use of a prior restriction of the form (B.12) with r = 0, i.e.,

0 = Rβ .     (3.73)

Parameter values that are observationally equivalent are thus excluded. The identifiability of β is guaranteed if RX = 0 and the assumptions of Theorem B.1 are fulfilled. Following Theorem B.1, the OLS estimator of β is then of the form

b(R, 0) = b(R) = (X′X + R′R)−1X′y ,     (3.74)
if r = 0. Summarizing, we may state: in the classical linear restricted regression model

y = Xβ + ε ,    E(ε) = 0 ,    E(εε′) = σ2I ,
X nonstochastic ,    rank(X) = p < K ,
0 = Rβ ,    rank(R) = K − p ,    rank(D) = K ,     (3.75)

with D′ = (X′, R′), the following fundamental theorem is valid.

Theorem 3.15. In model (3.75), the conditional OLS estimator

b(R) = (X′X + R′R)−1X′y = (D′D)−1X′y ,     (3.76)

with covariance matrix

V_b(R) = σ2(D′D)−1X′X(D′D)−1 ,     (3.77)

is the best linear unbiased estimator of β.
Definition 3.16. A linear estimator β̂ is called conditionally unbiased under the restriction Aβ − a = 0, with A a (K × K)-matrix and a a (K × 1)-vector, if

E(β̂ − β | Aβ − a = 0) = 0 .     (3.78)
Proof of Theorem 3.15. See Proof 9, Appendix B.
Extreme multicollinearity is a problem that usually does not occur in descriptive linear regression, i.e., when analyzing sample data, because an exact linear dependency between sampled data is unusual. In experimental designs, however, where factors are fixed, extreme multicollinearity is present. Assuming the simple case of one factor on s = 2 levels, with nᵢ observations on level i, the linear model y = Xβ + ε can be written as
⎛ y11   ⎞     ⎛ 1  1  0 ⎞            ⎛ ε11   ⎞
⎜  ⋮    ⎟     ⎜ ⋮  ⋮  ⋮ ⎟   ⎛ μ  ⎞   ⎜  ⋮    ⎟
⎜ y1n1  ⎟     ⎜ 1  1  0 ⎟   ⎜ α1 ⎟   ⎜ ε1n1  ⎟
⎜ y21   ⎟  =  ⎜ 1  0  1 ⎟   ⎝ α2 ⎠ + ⎜ ε21   ⎟
⎜  ⋮    ⎟     ⎜ ⋮  ⋮  ⋮ ⎟            ⎜  ⋮    ⎟
⎝ y2n2  ⎠     ⎝ 1  0  1 ⎠            ⎝ ε2n2  ⎠     (3.79)
As can easily be seen from (3.79), the (n × 3)-matrix X has rank s = 2, because the first column, representing the intercept, is the sum of the last two columns; this is a case of extreme multicollinearity. Using the conditional least squares estimator from (3.73) with r = 0 and R = (0, n1, n2), i.e., Σ nᵢαᵢ = 0, guarantees the estimability of β, because rank(X′, R′)′ = s + 1 = 3.
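The one-factor setup (3.79) can be checked numerically. In the sketch below (our own toy data with n1 = 3 and n2 = 2 observations, not from the text), the restricted estimator b(R) = (X′X + R′R)⁻¹X′y from (3.74) with R = (0, n1, n2) yields the grand mean and the level effects, which satisfy the restriction Σ nᵢα̂ᵢ = 0.

```python
import numpy as np

# Toy data: level 1 has observations [1, 2, 3], level 2 has [5, 7]
y = np.array([1.0, 2.0, 3.0, 5.0, 7.0])
n1, n2 = 3, 2

# Design matrix of (3.79): intercept column = sum of the two level columns
X = np.column_stack([np.ones(5),
                     [1, 1, 1, 0, 0],
                     [0, 0, 0, 1, 1]])
assert np.linalg.matrix_rank(X) == 2          # extreme multicollinearity

# Restriction 0 = R beta with R = (0, n1, n2), i.e. n1*a1 + n2*a2 = 0
R = np.array([[0.0, n1, n2]])

# Conditional OLS estimator b(R) = (X'X + R'R)^{-1} X'y, cf. (3.74)
bR = np.linalg.solve(X.T @ X + R.T @ R, X.T @ y)
print(bR)                                     # grand mean and level effects

assert np.allclose(bR, [3.6, -1.6, 2.4])      # mu = 3.6, a1 = -1.6, a2 = 2.4
assert np.isclose(R @ bR, 0.0)                # restriction holds
assert np.allclose(X @ bR, [2, 2, 2, 6, 6])   # fitted values = group means
```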
3.5.3 Weak Multicollinearity
When analyzing a data set by the linear model y = Xβ + ε with X not being a fixed factor (which would mean having the problem of extreme multicollinearity), a more common problem is weak multicollinearity. Weak multicollinearity means that there is no exact, but a close, linear dependency between the exogenous variables, i.e., X is still of full rank. Then X′X is regular and the above results remain valid; in particular, b is still the best linear unbiased estimator. The problem, however, is that one or more eigenvalues of X′X may be nearly zero, so that the determinant of X′X, used in computing σ2(X′X)−1, is also nearly zero. This means that V_b = σ2(X′X)−1 grows large and the estimates become unreliable.

In other words, there is not enough information to estimate the independent influences of some covariates on the response: the effect of each independent variable cannot be separated from that of the remaining variables. Ridge, shrinkage, or principal-components regression are ad-hoc procedures which cope with multicollinearity in its weak form. However, they are controversial, and popular statistical software does not offer these methods, so we dispense with a description of them.
Apart from considering the correlation between the exogenous variables, in order to find the source of the problem and possibly remove it in practice, some other alternatives are:
• additional observations to reduce the correlation between some variables within a fixed model (experimental designs);
• linear transformations, e.g., building differences;
• eliminate trends (Schneeweiß (1990));
• use of additional information such as a priori estimates r = Rβ + d, d being an error term; and
• exact linear restrictions.
Our main interest is the use of linear restrictions as external information. Using exact linear restrictions with r = 0, i.e.,

0 = Rβ ,     (3.80)

means that the parameter β is subject to limitations in the range of values of its components.
Finally, we want to illustrate the problem of weak multicollinearity with the help of a multiple regression, analyzing demographic data from 122 countries (most of the data being from 1992). We use SPSS within this framework because it provides some simple diagnostics for evaluating multicollinearity.
Example 3.3. We are interested in predicting female life expectancy for a sample of 122 countries. Within a multiple regression model, the variables shown in Table 3.1, specifying economic and health-care delivery characteristics, are included in the analysis.
Variable Name   Description
urban           Percentage of the population living in urban areas
lndocs          ln(number of doctors per 10,000 people)
lnbeds          ln(number of hospital beds per 10,000 people)
lngdp           ln(per capita gross domestic product in dollars)
lnradios        ln(radios per 100 people)

Table 3.1. Variable declaration.
When plotting each independent variable against the response, it can be seen that only “urban” shows a linear relation to female life expectancy. In order to attain such a relation for the other covariates as well, they are transformed by the natural logarithm, leading to the variables described in Table 3.1.
First of all, we consider the pairwise correlation coefficients. Each independent variable should correlate with the response because of the postulated linear relation. Among the independent variables themselves, strong correlation should not be present, because of the possible problems already described theoretically.
           lifeexpf  urban    lndocs   lnbeds   lngdp    lnradios
lifeexpf   1.000     0.704**  0.879**  0.730**  0.832**  0.695**
urban      0.704**   1.000    0.765**  0.576**  0.751**  0.583**
lndocs     0.879**   0.765**  1.000    0.711**  0.824**  0.621**
lnbeds     0.730**   0.576**  0.711**  1.000    0.741**  0.616**
lngdp      0.832**   0.751**  0.824**  0.741**  1.000    0.709**
lnradios   0.695**   0.583**  0.621**  0.616**  0.709**  1.000

Table 3.2. ** Correlation (Pearson) is significant at the 0.01 level (two-tailed).
We omit the p-values of the corresponding tests of H0 : ρ = 0 because they all indicate significance at the 1% level. The first row shows the correlations between the response and the covariates; a linear relation seems to be adequate. However, we also identify high correlations among the independent variables themselves, especially between “lndocs” and “lngdp”. Whether this leads to a problem of multicollinearity has to be verified by further analysis. In the next step we run the linear regression entering all variables.
                Standard error of   Change statistics
R2      R2adj   the estimate        R Square Ch.   F Ch.     Sig. F Ch.
0.827   0.819   4.74                0.827          105.336   0.000

Table 3.3. Model summary.
From Table 3.3 we should especially remember R2 and R2adj for comparisons with other models. The ANOVA table is omitted because the focus here lies on the coefficients and first collinearity diagnostics.
             Unstand. coefficients                    Collinearity statistics
Model        β           Std. error  t        Sig.    Tolerance   VIF
(Constant)   40.767      3.174       12.845   0.000
lndocs        4.069      0.563        7.228   0.000   0.253       3.950
lnradios      1.542      0.686        2.247   0.027   0.467       2.140
lngdp         1.709      0.616        2.776   0.006   0.217       4.614
urban        -2.002E-02  0.029       -0.686   0.494   0.371       2.699
lnbeds        1.147      0.749        1.532   0.128   0.406       2.461

Table 3.4. Coefficients (dependent variable: female life expectancy, 1992).
“lndocs”, “lnradios”, and “lngdp” have an influence on female life expectancy within the saturated model (see Table 3.4). The last two columns give evidence of the existence of multicollinearity. The tolerance tells us whether linear relations among the independent variables are present: it is the proportion of a variable's variance not accounted for by the other independent variables. “VIF”, the reciprocal of the tolerance, is the variance inflation factor. Its increase means an increase in the variance of β̂ and thus an unstable estimate β̂. A large VIF is therefore an indicator of multicollinearity.
Considering the variance inflation factor may raise doubt about the independence between “lngdp” and the other covariates, because of its high value. Indicators of multicollinearity known from matrix theory are the eigenvalues of X′X, X denoting the matrix of independent variables. SPSS offers the eigenvalues within the “collinearity diagnostics”, as well as the condition index, which is the square root of the ratio between the largest eigenvalue and the actual eigenvalue. Condition indices larger than 15 indicate a problem with multicollinearity; values larger than 30 indicate a serious problem.
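Both diagnostics are easy to compute directly (a sketch with synthetic data; the variable names are ours, not from the SPSS output): the VIF of variable j is 1/(1 − Rⱼ²), where Rⱼ² is obtained by regressing column j on the remaining columns, and the condition indices are the square roots of λmax/λᵢ for the eigenvalues λᵢ of X′X.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
x1 = rng.standard_normal(T)
x2 = rng.standard_normal(T)
x3 = x1 + x2 + 0.01 * rng.standard_normal(T)   # almost a linear combination
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """1 / (1 - R_j^2) from regressing column j on the other columns."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(3)]
assert all(v > 10 for v in vifs)               # every column is involved

eigvals = np.linalg.eigvalsh(X.T @ X)
cond_index = np.sqrt(eigvals.max() / eigvals)
assert cond_index.max() > 30                   # "serious problem" range
```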
As we cannot identify variables directly from the table containing the eigenvalues, we recall the above results (especially the correlations and variance inflation factors), and we may conclude that the variable describing
     Eigenvalue   Condition index
1    5.510         1.000
2    0.360         3.911
3    6.608E-02     9.132
4    3.356E-02    12.813
5    2.360E-02    15.281
6    6.798E-03    28.469

Table 3.5. Collinearity diagnostics.
the per capita gross domestic product could be the cause of the multicollinearity. A first way to check this is to eliminate “lngdp” and rerun the analysis, leading to the following results.
                Standard error of   Change statistics
R2      R2adj   the estimate        R Square Ch.   F Ch.     Sig. F Ch.
0.815   0.808   4.88                0.815          122.352   0.000

Table 3.6. Model summary.
             Unstand. coefficients                    Collinearity statistics
Model        β          Std. error  t        Sig.    Tolerance   VIF
(Constant)   47.222     2.224       21.229   0.000
lndocs        4.670     0.535        8.728   0.000   0.297       3.365
lnradios      2.177     0.666        3.268   0.001   0.526       1.902
urban        2.798E-03  0.006        0.097   0.923   0.402       2.485
lnbeds        1.786     0.148        2.434   0.017   0.449       2.229

Table 3.7. Coefficients (dependent variable: female life expectancy, 1992).
Comparing the primary model with the reduced model step by step (see Tables 3.6, 3.7, and 3.8) supports the elimination of “lngdp”. The elimination of “lngdp” leads to a decrease in the adjusted R2, but the difference is only marginal. Analyzing the coefficients shows that the standard errors of all variables have decreased, indicating more stable estimates. The parameter estimates changed more or less slightly toward larger values, especially that of “urban”, where even the sign changed and whose relative change (not shown here) is maximal. The two variables “lndocs” and “lnradios” are still significantly different from zero and, in addition, “lnbeds” is now a further covariate with an essential influence on female life expectancy. Last, but not least, we observe a decrease in the condition indices, especially in the maximum ratio, which changed from 28.469 to 14.251.
     Eigenvalue   Condition index
1    4.532         1.000
2    0.347         3.615
3    6.579E-02     8.300
4    3.312E-02    11.697
5    2.232E-02    14.251

Table 3.8. Collinearity diagnostics.
There is no general rule as to when multicollinearity becomes a problem, even though the indicators point to it more or less explicitly. We have demonstrated a possible solution which, in practice, should be arranged in terms of logical consistency concerning its context. This procedure appears similar to a variable selection, but here we have just tried to overcome the problem of multicollinearity by eliminating possible sources, with the help of criteria concerning the constitution of X.
3.6 Classical Regression under Normal Errors
All the results obtained so far are valid irrespective of the actual distribution of the random disturbances ε, provided that E(ε) = 0 and E(εε′) = σ2I. Now we shall specify the distribution of ε by additionally imposing the following condition: the vector ε of the random disturbances εt is distributed according to a T-dimensional normal distribution N(0, σ2I), i.e., ε ∼ N(0, σ2I). The probability density of ε is given by

f(ε; 0, σ2I) = Π_{t=1}^{T} (2πσ2)^{−1/2} exp{−(1/2σ2) εt²}
             = (2πσ2)^{−T/2} exp{−(1/2σ2) Σ_{t=1}^{T} εt²} ,     (3.81)

such that its components εt, t = 1, . . . , T, are independent and identically distributed (i.i.d.) as N(0, σ2). Equation (3.81) is a special case of the general T-dimensional normal distribution N(μ, Σ).
Let Ξ ∼ N_T(µ, Σ), i.e., E(Ξ) = µ, E(Ξ − µ)(Ξ − µ)′ = Σ. Then Ξ is normally distributed with density

    f(Ξ; µ, Σ) = (2π)^{−T/2} |Σ|^{−1/2} exp( −(1/2)(Ξ − µ)′Σ^{−1}(Ξ − µ) ) .    (3.82)
The classical linear regression model under normal errors is given by

    y = Xβ + ε ,  ε ∼ N(0, σ²I) ,  X nonstochastic ,  rank(X) = K .    (3.83)
The Maximum Likelihood Principle
Definition 3.17. Let Ξ = (ξ_1, …, ξ_n)′ be a random variable with density function f(Ξ; Θ), where the parameter vector Θ = (Θ_1, …, Θ_m)′ is a member of the parameter space Ω comprising all values that are a priori admissible.

The basic idea of the maximum likelihood (ML) principle is to interpret the density f(Ξ; Θ), for a specific realization Ξ_0 of the sample Ξ, as a function of Θ:

    L(Θ) = L(Θ_1, …, Θ_m) = f(Ξ_0; Θ) .

L(Θ) will be denoted as the likelihood function of Ξ_0.

The ML principle now postulates choosing a value Θ̂ ∈ Ω which maximizes the likelihood function, i.e.,

    L(Θ̂) ≥ L(Θ)  for all Θ ∈ Ω .

Note that Θ̂ may not be unique. If we consider all possible samples, then Θ̂ is a function of Ξ and is thus a random variable itself. We will call it the maximum likelihood estimator (MLE) of Θ.
ML Estimation in Classical Normal Regression
Following Theorem A.55, we have for y, from (3.23),

    y = Xβ + ε ∼ N(Xβ, σ²I) ,    (3.84)

such that the likelihood function of y is given by

    L(β, σ²) = (2πσ²)^{−T/2} exp( −(1/(2σ²)) (y − Xβ)′(y − Xβ) ) .    (3.85)

The logarithmic transformation is monotonic. Hence, it is appropriate to maximize ln L(β, σ²) instead of L(β, σ²), as the maximizing argument remains unchanged:

    ln L(β, σ²) = −(T/2) ln(2πσ²) − (1/(2σ²)) (y − Xβ)′(y − Xβ) .    (3.86)

If there are no a priori restrictions on the parameters, then the parameter space is given by Ω = {β; σ² : β ∈ R^K, σ² > 0}. We derive the ML estimators of β and σ² by equating the first derivatives to zero (Theorems A.63–A.67):

    ∂ ln L/∂β  = (1/(2σ²)) 2X′(y − Xβ) = 0 ,    (3.87)
    ∂ ln L/∂σ² = −T/(2σ²) + (1/(2(σ²)²)) (y − Xβ)′(y − Xβ) = 0 .    (3.88)
The likelihood equations are given by

    (I)   X′Xβ̂ = X′y ,
    (II)  σ̂² = (1/T)(y − Xβ̂)′(y − Xβ̂) .    (3.89)

Equation (I) is identical to the well-known normal equation (3.10). Its solution is unique, as rank(X) = K, and we get the unique ML estimator

    β̂ = b = (X′X)^{−1}X′y .    (3.90)

If we compare (II) with the unbiased estimator s² (3.62) for σ², we immediately see that

    σ̂² = ((T − K)/T) s² ,    (3.91)

such that σ̂² is a biased estimator. The asymptotic expectation is given by (cf. A.71 (i))

    lim_{T→∞} E(σ̂²) = E(s²) = σ² .    (3.92)
Thus we can state:

Theorem 3.18. The maximum likelihood estimator and the ordinary least squares estimator of β are identical in the model (3.84) of classical normal regression. The ML estimator σ̂² of σ² is asymptotically unbiased.
Remark. The Cramér–Rao bound defines a lower bound (in the sense of the definiteness of matrices) for the covariance matrix of unbiased estimators. In the model of normal regression, the Cramér–Rao bound is given by (Amemiya, 1985, p. 19)

    V(β̂) ≥ σ²(X′X)^{−1} ,

where β̂ is an arbitrary unbiased estimator. The covariance matrix of the ML estimator is identical to this lower bound, such that b is the best unbiased estimator in the linear regression model under normal errors.
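The equivalence of the ML and OLS estimators of β, and the relation (3.91) between σ̂² and s², are easy to check numerically. The following is a minimal sketch with simulated data (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 50, 3

# Design matrix with intercept; true parameters are illustrative only
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=2.0, size=T)

# OLS / ML estimator of beta: solve the normal equations X'X b = X'y, eq. (3.90)
b = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ b
RSS = resid @ resid

sigma2_ml = RSS / T      # biased ML estimator, eq. (3.89, II)
s2 = RSS / (T - K)       # unbiased estimator s^2, eq. (3.62)

# Relation (3.91): sigma2_ml = ((T - K) / T) * s2
assert np.isclose(sigma2_ml, (T - K) / T * s2)
print(b, sigma2_ml, s2)
```

Note that σ̂² is always smaller than s² in finite samples; the difference vanishes as T grows, matching (3.92).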
3.7 Testing Linear Hypotheses
In this section, testing procedures are derived for linear hypotheses in the model (3.83) of classical normal regression, such as, for example, H0 : β1 = β2 = β3. The general linear hypothesis,

    H0 : Rβ = r ,  σ² > 0 arbitrary ,    (3.93)

is usually tested against the alternative

    H1 : Rβ ≠ r ,  σ² > 0 arbitrary ,    (3.94)
where the following will be assumed:

    R : (K − s) × K ,  r : (K − s) × 1 ,
    R, r nonstochastic and known ,
    rank(R) = K − s ,  s ∈ {0, 1, …, K − 1} .    (3.95)

The hypothesis H0 expresses the fact that the parameter vector β obeys (K − s) exact linear restrictions which are independent, as it is required that rank(R) = K − s. The general linear hypothesis (3.93) contains two main special cases:
Case 1: s = 0. The (K × K)–matrix R is regular by assumption (3.95), and we may express H0 and H1 in the following form:

    H0 : β = R^{−1}r = β* ,  σ² > 0 arbitrary ,
    H1 : β ≠ β* ,  σ² > 0 arbitrary .    (3.96)
Case 2: s > 0. We choose an (s × K)–matrix G complementary to R such that the stacked (K × K)–matrix D = (G′, R′)′ is regular of rank K. For exact notation, see Proof 10, Appendix B.

Then we may write

    y = Xβ + ε = X D^{−1} D β + ε = X̃ β̃ + ε = X̃1 β̃1 + X̃2 β̃2 + ε ,

with X̃ = X D^{−1} partitioned as (X̃1, X̃2) and β̃ = Dβ = (β̃1′, β̃2′)′, where β̃1 = Gβ and β̃2 = Rβ. The latter model obeys all the assumptions (3.23). The hypotheses H0 and H1 are thus equivalent to

    H0 : β̃2 = r ,  β̃1 and σ² > 0 arbitrary ,
    H1 : β̃2 ≠ r ,  β̃1 and σ² > 0 arbitrary .    (3.97)
Let Ω be the whole parameter space (either H0 or H1 is valid) and let ω ⊂ Ω be the subspace in which only H0 is true, i.e.,

    Ω = {β; σ² : β ∈ E^K , σ² > 0} ,
    ω = {β; σ² : β ∈ E^K and Rβ = r , σ² > 0} .    (3.98)
As a genuine test statistic, we will use the likelihood ratio

    λ(y) = max_ω L(Θ) / max_Ω L(Θ) ,    (3.99)

which may be derived in terms of model (3.84) in the following way. L(Θ) attains its maximum at the ML estimator Θ̂ = (β̂, σ̂²). Then it holds that

    max_{β,σ²} L(β, σ²) = L(β̂, σ̂²)
        = (2πσ̂²)^{−T/2} exp( −(1/(2σ̂²)) (y − Xβ̂)′(y − Xβ̂) )
        = (2πσ̂²)^{−T/2} exp( −T/2 )    (3.100)

and, therefore,

    λ(y) = ( σ̂²_ω / σ̂²_Ω )^{−T/2} ,    (3.101)

where σ̂²_ω and σ̂²_Ω are the ML estimators of σ² under H0 and in Ω, respectively. The random variable λ(y) can take values between 0 and 1, as is obvious from (3.99). If H0 is true, the numerator of λ(y) gets close to the denominator, so that λ(y) should be close to one in repeated samples. On the other hand, λ(y) should be close to zero if H1 is true. Consider the following linear transform of λ(y):

    F = { (λ(y))^{−2/T} − 1 } (T − K)(K − s)^{−1}
      = ( (σ̂²_ω − σ̂²_Ω) / σ̂²_Ω ) · (T − K)/(K − s) .    (3.102)
If λ → 0, then F → ∞, and if λ → 1, we have F → 0, such that F is close to 0 if H0 seems to be true, and F is sufficiently large if H1 is supposed to be true. The determination of F and its distribution for the two special cases s = 0 and s > 0 is shown in Proof 11, Appendix B. The resulting distribution of the test statistic F is the noncentral F_{K−s,T−K}(σ^{−2}(β2 − r)′D(β2 − r)) distribution under H1, with D symmetric and regular, resulting from the inversion of the partitioned matrix, and the central F_{K−s,T−K} distribution under H0. The region of acceptance of H0 at a level of significance α is then given by
0 ≤ F ≤ FK−s,T−K,1−α. (3.103)
Accordingly, the critical area of H0 is given by
F > FK−s,T−K,1−α. (3.104)
Example 3.4. Assume that we want to test H0 : β1 = β2 = β3. One solution to this problem, in the form Rβ = r with its assumptions
(3.95), is based on the equations
    (1)  β1 − β2 = 0 ,    (3.105)

and

    (2)  β2 − β3 = 0 ,    (3.106)

leading to

    ( 1  −1   0 ) ( β1 )   ( 0 )
    ( 0   1  −1 ) ( β2 ) = ( 0 ) .    (3.107)
                  ( β3 )
R in (3.107) has rank 2 but is not the only solution; its structure depends on the system of equations (3.105) and (3.106). A similar, but not identical, case is the test of H0 : β1 = β2 = β3 = 0. One possible system of equations is
    (1)  β1 = 0 ,         (3.108)
    (2)  β2 − β1 = 0 ,    (3.109)
    (3)  β3 − β2 = 0 ,    (3.110)

leading to

    R = (  1   0   0 )
        ( −1   1   0 )    (3.112)
        (  0  −1   1 )

or, more simply, to

    R = ( 1   0   0 )
        ( 0   1   0 )    (3.113)
        ( 0   0   1 ) ,
i.e., Iβ = 0. Obviously, one has to be careful when handling linear hypotheses, their test situations, and the corresponding estimation.
One simple example of testing a linear hypothesis is H0 : β1 = 0. This corresponds to the well-known t–test of whether the parameter β1 differs from zero concerning its influence on y. Another example comes from analysis of variance, where linear contrasts can be tested. Assuming a categorical covariate, a linear contrast which tests whether the means ȳ1, ȳ2 for different levels of factor A are the same is the analog of testing H0 : β1 = β2. Concerning the use of statistical software for testing linear hypotheses, the user may hope to have a simple problem as above. A similar problem occurs when the aim is the estimation of a restricted least squares estimator. One possibility is to compute R from the corresponding system of equations, such as (3.105) and (3.106), and the well-known estimate (X′X + R′R)^{−1}X′y with software such as MAPLE, which is used for analytical solutions.
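A numerical sketch of the test of H0 : β1 = β2 = β3 from Example 3.4 may clarify the procedure. The data below are simulated (all values illustrative); the restricted fit is obtained by reparameterization, since under H0 all three coefficients equal a common value c, and SciPy is used only for the p–value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T, K = 60, 3

X = rng.normal(size=(T, K))
beta = np.array([0.8, 0.8, 0.8])   # H0: beta1 = beta2 = beta3 holds here
y = X @ beta + rng.normal(size=T)

def rss(design, y):
    """Residual sum of squares of the least squares fit."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ b
    return r @ r

# Unrestricted fit (parameter space Omega)
rss_full = rss(X, y)

# Restricted fit (subspace omega): beta1 = beta2 = beta3 = c means
# E(y) = c * (x1 + x2 + x3), i.e., regression on the column sum only
rss_restr = rss(X.sum(axis=1, keepdims=True), y)

# K - s = 2 independent restrictions, as in R of (3.107)
df1, df2 = K - 1, T - K
F = ((rss_restr - rss_full) / df1) / (rss_full / df2)
p_value = stats.f.sf(F, df1, df2)
print(F, p_value)
```

The statistic agrees with (3.102), since σ̂²_ω and σ̂²_Ω are the two residual sums of squares divided by T.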
3.8 Analysis of Variance and Goodness of Fit
If all independent variables are noncontinuous (categorical), we arrive at the analysis of variance. One main aim is to test whether factors have individual or joint influence on the response. The analysis of variance is also an instrument for reviewing the goodness of fit of the chosen model. The decomposition of the sum of squares forms the body of the analysis of variance, which is why we start with bivariate regression to illustrate the derivation of this central idea.
3.8.1 Bivariate Regression
To illustrate the basic ideas, we shall consider the model (3.64) with a constant dummy variable 1 and a regressor x:

    y_t = β0 + β1 x_t + e_t  (t = 1, …, T) .    (3.114)
Ordinary least squares estimators of β = (β0, β1)′ are given by

    b1 = ∑(x_t − x̄)(y_t − ȳ) / ∑(x_t − x̄)² ,    (3.115)
    b0 = ȳ − b1 x̄ .    (3.116)

The best predictor of y on the basis of a given x is

    ŷ = b0 + b1 x .    (3.117)

Especially, we have, for x = x_t,

    ŷ_t = b0 + b1 x_t = ȳ + b1(x_t − x̄)    (3.118)
(cf. (3.115)). On the basis of the identity

    y_t − ŷ_t = (y_t − ȳ) − (ŷ_t − ȳ)    (3.119)

we may express the sum of squared residuals (cf. (3.19)) as

    S(b) = ∑(y_t − ŷ_t)² = ∑(y_t − ȳ)² + ∑(ŷ_t − ȳ)² − 2 ∑(y_t − ȳ)(ŷ_t − ȳ) .

Further manipulation yields

    ∑(y_t − ȳ)(ŷ_t − ȳ) = ∑(y_t − ȳ) b1 (x_t − x̄)   [cf. (3.118)]
                        = b1² ∑(x_t − x̄)²            [cf. (3.115)]
                        = ∑(ŷ_t − ȳ)²                [cf. (3.118)] .

Thus, we have

    ∑(y_t − ȳ)² = ∑(y_t − ŷ_t)² + ∑(ŷ_t − ȳ)² .    (3.120)
This relation has already been established in (3.17). The left-hand side of (3.120) is called the sum of squares about the mean or the corrected sum of squares of Y (i.e., SS(corrected)) or S_YY.

The first term on the right-hand side describes the deviation "observation − predicted value", i.e., the residual sum of squares

    SS residual:  RSS = ∑(y_t − ŷ_t)² ,    (3.121)

whereas the second term describes the proportion of variability explained by the regression

    SS regression:  SSReg = ∑(ŷ_t − ȳ)² .    (3.122)

If all the observations y_t are located on a straight line, we obviously have ∑(y_t − ŷ_t)² = 0 and thus SS(corrected) = SSReg. Accordingly, the goodness of fit of a regression is measured by the ratio

    R² = SSReg / SS(corrected) .    (3.123)
We will discuss R² in some detail. The degrees of freedom (df) of the sums of squares are

    ∑_{t=1}^T (y_t − ȳ)² :  df = T − 1 ,

and

    ∑_{t=1}^T (ŷ_t − ȳ)² = b1² ∑(x_t − x̄)² :  df = 1 ,

as one function of the y_t – namely, b1 – is sufficient to calculate SSReg. In view of (3.120), the degrees of freedom for the sum of squares ∑(y_t − ŷ_t)² is just the difference of the other two df's, i.e., df = T − 2. This enables us to establish the following analysis of variance table:
    Source of variation   SS                      df      Mean Square (= SS/df)   F
    Regression            SSReg                   1       MSReg                   MSReg/s²
    Residual              RSS                     T − 2   s² = RSS/(T − 2)
    Total                 SS(corrected) = S_YY    T − 1
The following example illustrates the basics of the ANOVA table with a real data set from the 1993 General Social Survey. If the errors e_t are normally distributed, the sums of squares are independently χ²_df–distributed and F follows an F–distribution.
Example 3.5. We are interested in the influence of the degree of education on the average hours worked per week. The degree of education is a categorical variable with five levels. Running an analysis of variance in S-PLUS produces the following output as an analog of the above table:
    *** Analysis of Variance Model ***

    Short Output:
    Call:
       aov(formula = HRS1 ~ DEGREE, data = anova, na.action = na.omit)

    Terms:
                     DEGREE Residuals
    Sum of Squares  1825.92  92148.28
    Deg. of Freedom       4       736

    Residual standard error: 11.18935
    Estimated effects may be unbalanced

    Analysis of Variance Table:
               Df Sum of Sq  Mean Sq  F Value       Pr(F)
    DEGREE      4   1825.92 456.4794 3.645958 0.005960708
    Residuals 736  92148.28 125.2015
The overall hypothesis is significant and for further analysis one has tocompute multiple comparisons for detecting local differences.
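The decomposition (3.120) underlying such an ANOVA table can be reproduced directly. The following sketch uses simulated data with five groups (not the GSS data; group sizes and effects are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated stand-in for HRS1 ~ DEGREE: five education levels;
# group sizes and effects are illustrative, not the GSS data
groups = [rng.normal(loc=40 + d, scale=11.0, size=n)
          for d, n in zip([0, 1, 2, 3, 4], [200, 150, 150, 120, 121])]

y = np.concatenate(groups)
grand_mean = y.mean()

# Decomposition (3.120): SS(corrected) = RSS + SSReg
ss_total = ((y - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # SSReg
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)             # RSS

df1, df2 = len(groups) - 1, len(y) - len(groups)
F = (ss_between / df1) / (ss_within / df2)
p = stats.f.sf(F, df1, df2)

assert np.isclose(ss_total, ss_between + ss_within)
print(F, p)
```

The between-group sum of squares plays the role of SSReg, the within-group sum of squares that of RSS, exactly as in the table above.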
For goodness of fit and confidence intervals we need some tools and will use the following abbreviations for these essential quantities:

    S_XX = ∑(x_t − x̄)² ,    (3.124)
    S_YY = ∑(y_t − ȳ)² ,    (3.125)
    S_XY = ∑(x_t − x̄)(y_t − ȳ) .    (3.126)

The sample correlation coefficient may then be written as

    r_XY = S_XY / ( √S_XX √S_YY ) .    (3.127)
Moreover, we have (cf. (3.115))

    b1 = S_XY / S_XX = r_XY √(S_YY / S_XX) .    (3.128)

The estimator of σ² may be expressed by using (3.121) as

    s² = (1/(T − 2)) ∑ ê_t² = (1/(T − 2)) RSS .    (3.129)
Various alternative formulations of RSS are in use as well:

    RSS = ∑(y_t − (b0 + b1 x_t))²
        = ∑[(y_t − ȳ) − b1(x_t − x̄)]²
        = S_YY + b1² S_XX − 2 b1 S_XY
        = S_YY − b1² S_XX    (3.130)
        = S_YY − (S_XY)²/S_XX .    (3.131)

Further relations immediately become apparent:

    SS(corrected) = S_YY    (3.132)

and

    SSReg = S_YY − RSS = (S_XY)²/S_XX = b1² S_XX .    (3.133)
Testing the Model
If the model (3.114)
    y_t = β0 + β1 x_t + ε_t
is appropriate, the coefficient b1 should be significantly different from zero. This is equivalent to the fact that X and Y are significantly correlated. Formally, we compare the models (cf. Weisberg, 1980, p. 17)

    H0 : y_t = β0 + ε_t ,
    H1 : y_t = β0 + β1 x_t + ε_t ,

by testing H0 : β1 = 0 against H1 : β1 ≠ 0.
We assume normality of the errors, ε ∼ N(0, σ²I). If we recall (B.65), i.e.,

    D = x′x − x′1(1′1)^{−1}1′x
      = ∑x_t² − (∑x_t)²/T
      = ∑(x_t − x̄)² = S_XX ,    (3.134)
then the likelihood ratio test (B.78) is given by

    F_{1,T−2} = b1² S_XX / s²
              = (SSReg / RSS) · (T − 2)
              = MSReg / s² .    (3.135)
The Coefficient of Determination

In (3.123), R² has been introduced as a measure of goodness of fit. Using (3.133), we get

    R² = SSReg / S_YY = 1 − RSS / S_YY .    (3.136)

The ratio SSReg/S_YY describes the proportion of variability that is covered by the regression in relation to the total variability of y. The right-hand side of the equation is 1 minus the proportion of variability that is not covered by the regression.

Definition 3.19. R² is called the coefficient of determination.

By using (3.127) and (3.133), we get the basic relation between R² and the sample correlation coefficient:

    R² = r²_XY .    (3.137)
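The identities (3.128)–(3.137) can be verified numerically. A minimal sketch with simulated data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 40

x = rng.normal(size=T)
y = 1.5 + 2.0 * x + rng.normal(size=T)   # illustrative bivariate data

SXX = ((x - x.mean()) ** 2).sum()        # (3.124)
SYY = ((y - y.mean()) ** 2).sum()        # (3.125)
SXY = ((x - x.mean()) * (y - y.mean())).sum()  # (3.126)

b1 = SXY / SXX                  # (3.128)
b0 = y.mean() - b1 * x.mean()   # (3.116)

RSS = SYY - SXY**2 / SXX        # (3.131)
SSReg = SYY - RSS               # (3.133)

R2 = SSReg / SYY                # (3.136)
r_xy = SXY / (np.sqrt(SXX) * np.sqrt(SYY))  # (3.127)

# Basic relation (3.137): R^2 equals the squared correlation coefficient
assert np.isclose(R2, r_xy**2)
# (3.131) agrees with the direct residual computation (3.121)
assert np.isclose(RSS, ((y - b0 - b1 * x) ** 2).sum())
```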
As can be seen from the model summary on page 65, the coefficient of determination is computed by software when analyzing a linear model.
Confidence Intervals for b0 and b1

The covariance matrix of the OLS estimator is generally of the form V_b = σ²(X′X)^{−1} = σ²S^{−1}. In model (3.114) we get

    S = ( 1′1  1′x ) = (  T     T x̄  ) ,    (3.138)
        ( 1′x  x′x )   ( T x̄   ∑x_t² )

    S^{−1} = (1/S_XX) ( (1/T)∑x_t²   −x̄ )    (3.139)
                      (     −x̄        1 )
and, therefore,

    Var(b1) = σ² · 1/S_XX ,    (3.140)

    Var(b0) = (σ²/T) · ∑x_t²/S_XX
            = (σ²/T) · (∑x_t² − T x̄² + T x̄²)/S_XX
            = σ² ( 1/T + x̄²/S_XX ) .    (3.141)
The estimated standard deviations are

    SE(b1) = s √(1/S_XX)    (3.142)

and

    SE(b0) = s √(1/T + x̄²/S_XX) ,    (3.143)

with s from (3.129).
Under normal errors ε ∼ N(0, σ²I) in model (3.114), we have

    b1 ∼ N( β1 , σ² · 1/S_XX ) .    (3.144)

Thus it holds that

    ((b1 − β1)/s) √S_XX ∼ t_{T−2} .    (3.145)

Analogously, we get

    b0 ∼ N( β0 , σ² (1/T + x̄²/S_XX) ) ,    (3.146)

    (b0 − β0) / ( s √(1/T + x̄²/S_XX) ) ∼ t_{T−2} .    (3.147)

This enables us to calculate confidence intervals at level 1 − α:

    b0 − t_{T−2,1−α/2} · SE(b0) ≤ β0 ≤ b0 + t_{T−2,1−α/2} · SE(b0) ,    (3.148)

and

    b1 − t_{T−2,1−α/2} · SE(b1) ≤ β1 ≤ b1 + t_{T−2,1−α/2} · SE(b1) .    (3.149)
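A sketch of the interval computation (3.148)–(3.149) with simulated data (all values illustrative; SciPy supplies the t–quantile):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 30

x = rng.normal(size=T)
y = 6.0 + 3.0 * x + rng.normal(size=T)   # illustrative data

SXX = ((x - x.mean()) ** 2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / SXX   # (3.115)
b0 = y.mean() - b1 * x.mean()                        # (3.116)

resid = y - b0 - b1 * x
s = np.sqrt(resid @ resid / (T - 2))                 # (3.129)

se_b1 = s * np.sqrt(1 / SXX)                         # (3.142)
se_b0 = s * np.sqrt(1 / T + x.mean()**2 / SXX)       # (3.143)

t = stats.t.ppf(0.975, T - 2)                        # level 1 - alpha = 0.95

ci_b0 = (b0 - t * se_b0, b0 + t * se_b0)             # (3.148)
ci_b1 = (b1 - t * se_b1, b1 + t * se_b1)             # (3.149)
print(ci_b0, ci_b1)
```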
For the "advertise" model (see page 45), we computed the confidence intervals for the estimates using SPSS (Table 3.9). This is not part of the standard output; one has to choose this option explicitly.
The above confidence intervals correspond to the region of acceptance ofa two–sided test at the same level.
                    Unst. coefficients      95% Confidence interval for β
    Model           β       Std. error      Lower bound   Upper bound
    1  (Constant)   6.019   1.104           3.838         8.199
       adv          3.079   0.300           2.486         3.672

Table 3.9. Dependent variable: reaction.
(i) Testing H0 : β0 = β0*

The test statistic is

    t_{T−2} = (b0 − β0*) / SE(b0) .    (3.150)

H0 is not rejected if

    |t_{T−2}| ≤ t_{T−2,1−α/2}

or, equivalently, if (3.148) holds with β0 = β0*.

(ii) Testing H0 : β1 = β1*

The test statistic is

    t_{T−2} = (b1 − β1*) / SE(b1)    (3.151)

or, equivalently,

    t²_{T−2} = F_{1,T−2} = (b1 − β1*)² / (SE(b1))² .    (3.152)

This is identical to (3.135) if H0 : β1 = 0 is being tested.

H0 will not be rejected if

    |t_{T−2}| ≤ t_{T−2,1−α/2}

or, equivalently, if (3.149) holds with β1 = β1*.
3.8.2 Multiple Regression
If we consider more than two regressors, still under the assumption of normality of the errors, we find the methods of analysis of variance most convenient for distinguishing between the two models y = 1β0 + Xβ* + ε = Xβ + ε and y = 1β0 + ε. In the latter model, we have β̂0 = ȳ, and the related residual sum of squares is

    ∑(y_t − ŷ_t)² = ∑(y_t − ȳ)² = S_YY .    (3.153)

In the former model, the unknown parameter β = (β0, β*′)′ will again be estimated by b = (X′X)^{−1}X′y.
The two components of the parameter vector β in the full model may be estimated by

    b = ( β̂0 ) ,   β̂* = (X′X)^{−1}X′y ,   β̂0 = ȳ − β̂*′x̄ .    (3.154)
        ( β̂* )

Thus, we have (cf. Weisberg, 1980, p. 43)

    RSS = (y − Xb)′(y − Xb)
        = y′y − b′X′Xb
        = (y − 1ȳ)′(y − 1ȳ) − β̂*′(X′X)β̂* + T ȳ² .    (3.155)
The proportion of variability explained by the regression is (cf. (3.133))

    SSReg = S_YY − RSS ,    (3.156)

with RSS from (3.155) and S_YY from (3.153). The ANOVA table is of the form

    Source of variation         SS      df           MS
    Regression on X1, …, XK     SSReg   K            SSReg/K
    Residual                    RSS     T − K − 1    s² = RSS/(T − K − 1)
    Total                       S_YY    T − 1
As before, the multiple coefficient of determination

    R² = SSReg / S_YY    (3.157)

is a measure of the proportion of variability explained by the regression of y on X1, …, XK in relation to the total variability S_YY.

The F–test of

    H0 : β* = 0  versus  H1 : β* ≠ 0

(i.e., H0 : y = 1β0 + ε versus H1 : y = 1β0 + Xβ* + ε) is based on the test statistic

    F_{K,T−K−1} = (SSReg/K) / s² .    (3.158)
Often it is of interest to test for the significance of the single compo-nents of β. This type of problem arises, for example, in stepwise modelselection, if an optimal subset is selected with respect to the coefficient ofdetermination.
Criteria for Model Choice
Draper and Smith (1966) and Weisberg (1980) have established a variety of criteria for finding the right model. We will follow the strategy proposed by Weisberg.
(i) Ad–Hoc Criteria
Denote by X1, …, XK all the available regressors and let X_{i1}, …, X_{ip} be a subset of p ≤ K regressors. We denote the residual sums of squares by RSS_K and RSS_p, respectively. The parameter vectors are

    β  for X1, …, XK ,
    β1 for X_{i1}, …, X_{ip} ,
    β2 for (X1, …, XK) \ (X_{i1}, …, X_{ip}) .

A choice between the two models can be conducted by testing H0 : β2 = 0. We apply the F–test, since the hypotheses are nested:

    F_{(K−p),T−K} = ( (RSS_p − RSS_K)/(K − p) ) / ( RSS_K/(T − K) ) .    (3.159)

We prefer the full model over the partial model if H0 : β2 = 0 is rejected, i.e., if F > F_{1−α} (with K − p and T − K degrees of freedom).
Model Choice Based on an Adjusted Coefficient of Determination
The coefficient of determination (see (3.156) and (3.157))

    R²_p = 1 − RSS_p / S_YY    (3.160)

is inappropriate for comparing a model with K regressors and one with p < K regressors, since R² always increases if an additional regressor is incorporated into the model, irrespective of its values. The full model always has the greatest value of R² (see Theorem 3.20). So we have to adjust R² with respect to the number of variables.
Example 3.6. Recalling our example from page 64 concerning the prediction of female life expectancy, we want to show the behavior of the coefficient of determination. Using a "forward selection" within the linear regression in SPSS leads to a model including "lndocs", "lngdp", and "lnradios" as predictors. Table 3.10 illustrates the varying coefficient of determination.

Beginning with Step 1 and R² = 0.775, R²_adj = 0.773, the stepwise inclusion of two further variables leads to R² = 0.823 and an adjusted coefficient of determination of 0.818 – the coefficients of the model resulting from a
                               Change statistics
    Model   R²      R²_adj   R² change   F change   df1   df2   Sig. F change
    1       0.775   0.773    0.775       391.724    1     114   0.000
    2       0.813   0.809    0.038        23.055    1     113   0.000
    3       0.823   0.818    0.010         6.161    1     112   0.015

Table 3.10. Model 1: (Constant), natural log of doctors per 10,000. Model 2: (Constant), natural log of doctors per 10,000, natural log of GDP. Model 3: (Constant), natural log of doctors per 10,000, natural log of GDP, natural log of radios per 100 people.
"forward selection". In order to illustrate the possible effect of an increasing R² together with a decreasing R²_adj, we first additionally include "lnbeds" into the above model. The result is shown in Table 3.11.

                               Change statistics
    Model   R²      R²_adj   R² change   F change   df1   df2   Sig. F change
    4       0.826   0.820    0.826       132.183    4     111   0.000

Table 3.11. Model 4: (Constant), natural log of doctors per 10,000, natural log of GDP, natural log of radios per 100 people, natural log of hospital beds per 10,000.
Again, both R² and R²_adj have increased. Including "urban" as a further variable (see Table 3.12), however, illustrates the effect already described.

                               Change statistics
    Model   R²      R²_adj   R² change   F change   df1   df2   Sig. F change
    5       0.827   0.819    0.827       105.336    5     110   0.000

Table 3.12. Model 5: (Constant), natural log of doctors per 10,000, natural log of GDP, natural log of radios per 100 people, natural log of hospital beds per 10,000, percent urban, 1992.

The fact that Models 4 and 5 have higher R²_adj's than the model resulting from the "forward selection" is due to the nonsignificant parameter estimates of the variables "lnbeds" and "urban".
Theorem 3.20. Let y = X1β1 + X2β2 + ε = Xβ + ε be a full model and let y = X1β1 + ε be a submodel. Then it holds that

    R²_X − R²_{X1} ≥ 0 .    (3.161)

(See Proof 12, Appendix B.)
On the basis of Theorem 3.20, we define the statistic

    F–change = ( (RSS_{X1} − RSS_X)/(K − p) ) / ( RSS_X/(T − K) ) ,    (3.162)

which is distributed as F_{K−p,T−K} under H0: "the submodel is valid". In model choice procedures, F–change tests for the significance of the change in R² caused by adding a further K − p variables to the submodel.

In multiple regression, the appropriate adjustment of the ordinary coefficient of determination is provided by the coefficient of determination adjusted by the degrees of freedom of the multiple model:

    R̄²_p = 1 − ( (T − 1)/(T − p) ) (1 − R²_p) .    (3.163)

Remark. If there is no constant β0 present in the model, then the numerator is T instead of T − 1, such that R̄²_p may possibly take negative values. This disadvantage cannot occur with the ordinary R².
If we consider two models, the smaller of which is assumed to be completely included in the bigger one, and we find the relation

    R̄²_{p+q} < R̄²_p ,

then the smaller model obviously shows the better goodness of fit.

Further criteria are, for example, Mallows' C_p (cf. Weisberg, 1980, p. 88), or closely related criteria based on the residual mean square error σ̂²_p = RSS_p/(T − p).
Confidence Regions

As in bivariate regression, there are close relations between the region of acceptance of the F–test and the confidence intervals for β in the multiple linear regression model as well.

Confidence Ellipsoids for the Whole Parameter Vector β

Considering (B.51) and (B.54), we get, for β* = β, a confidence ellipsoid at level 1 − α:

    ( (b − β)′X′X(b − β) / (y − Xb)′(y − Xb) ) · (T − K)/K ≤ F_{K,T−K,1−α} .    (3.164)

Confidence Ellipsoids for Subvectors of β

From (B.78) and (3.103), we have that

    ( (b2 − β2)′D(b2 − β2) / (y − Xb)′(y − Xb) ) · (T − K)/(K − s) ≤ F_{K−s,T−K,1−α}    (3.165)

is a (1 − α)–confidence ellipsoid for β2.
Further results may be found in Judge, Griffiths, Hill and Lee (1980),Goldberger (1964), Pollock (1979), Weisberg (1980), and Kmenta (1997).
3.9 The General Linear Regression Model
3.9.1 Introduction
In many applications it cannot be justified that the response values y_t (t = 1, …, T) are independent. Consider, for example, a time series with autocorrelated errors, or processes typically arising in medicine or sociology when measurements are repeated several times on a single person or cluster. We will discuss these types of models at a later stage. Here we present a first step toward generalizing the classical model by assuming a less restrictive form of the error covariance matrix.

The general linear regression model is of the form

    y = Xβ + ε ,  E(ε) = 0 ,  E(εε′) = σ²W ,
    W positive definite and known ,
    X nonstochastic ,  rank(X) = K .    (3.166)

The first problem is that, in the case of an unknown matrix W, the number of additional parameters to be estimated may increase by at most T(T + 1)/2, because ∑_{i=1}^T i = T(T + 1)/2 is the number of different parameters in W. This problem cannot be solved on the basis of T observations only. Therefore we assume, for the present, that W is known. Furthermore, it is useful to impose several restrictions on W in the sense that tr(W) = T or w_ii = 1 (i = 1, …, T).
Aitken Estimator

In order to facilitate the estimation in the general linear regression model (3.166), we shall transform the model. For the exact transformation, see Proof 13, Appendix B. Applying OLS to the transformed model leads to b = S^{−1}X′W^{−1}y (with S = X′W^{−1}X), which is identical to the Gauss–Markov (GM) estimator in the transformed model. The Gauss–Markov property of b also remains valid in model (3.166):

    b = S^{−1}X′W^{−1}y is unbiased, since
    E(b) = (X′W^{−1}X)^{−1}X′W^{−1} E(y)
         = (X′W^{−1}X)^{−1}X′W^{−1}Xβ = β .    (3.167)
Moreover, b possesses the smallest variance (in the sense of Theorem 3.12; see Proof 14, Appendix B).

These results are summarized in:

Theorem 3.21 (Gauss–Markov–Aitken Theorem). In the general linear regression model, the generalized OLS estimator

    b = (X′W^{−1}X)^{−1}X′W^{−1}y ,    (3.168)

with covariance matrix

    V_b = σ²(X′W^{−1}X)^{−1} = σ²S^{−1} ,    (3.169)

is the best linear unbiased estimator of β.

(We denote b also as an Aitken estimator or a generalized least squares (GLS) estimator.) Analogously to the classical model, we estimate σ² and V_b by

    s² = (y − Xb)′W^{−1}(y − Xb)(T − K)^{−1}    (3.170)

and

    V̂_b = s²S^{−1} .    (3.171)

Both estimators are unbiased:

    E(s²) = σ²  and  E(V̂_b) = σ²S^{−1} .    (3.172)
Some statistical software packages offer procedures for the problem of E(εε′) ≠ σ²I. SPSS, for example, suggests the "weight estimation" procedure, in which cases with less variability are given greater weights. The coefficients are computed by weighted least squares, and a range of weight transformations is tested to obtain the best fit.
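A minimal sketch of the Aitken estimator (3.168)–(3.171), assuming a known diagonal W (heteroscedastic case; all weights and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
T, K = 40, 2

X = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 2.0])

# Heteroscedastic errors with known diagonal W (illustrative weights)
w = rng.uniform(0.5, 4.0, size=T)
eps = rng.normal(size=T) * np.sqrt(w)
y = X @ beta + eps

W_inv = np.diag(1 / w)

# Aitken / GLS estimator, eq. (3.168): b = (X'W^{-1}X)^{-1} X'W^{-1} y
S = X.T @ W_inv @ X
b_gls = np.linalg.solve(S, X.T @ W_inv @ y)

# Unbiased variance estimator, eq. (3.170)
resid = y - X @ b_gls
s2 = (resid @ W_inv @ resid) / (T - K)

# Estimated covariance matrix, eq. (3.171)
V_b = s2 * np.linalg.inv(S)
print(b_gls, V_b)
```

For a diagonal W this reduces exactly to weighted least squares, the procedure mentioned above.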
3.9.2 Misspecification of the Covariance Matrix
Assuming the general linear regression model (3.166) with W the true covariance structure, we want to examine the influence of a misspecification of the covariance matrix on the estimation of β and σ², compared with the GLS estimator b (3.168) and s² (3.170). Reasons for the misspecification could be:

• the use of the classical OLS estimator because the correlation between the errors ε_t was not recognized;

• that the correlation is generally described by a matrix W̃ ≠ W; and

• that the matrix W is unknown and is estimated, independently of y, from a presample through W̃.

In any case, we get an estimator of the form

    β̂ = (X′AX)^{−1}X′Ay ,    (3.173)
with A ≠ W^{−1} symmetric, nonstochastic, and with (X′AX) regular. Then we have

    E(β̂) = β ,    (3.174)

i.e., β̂ (3.173) is unbiased for every misspecified matrix A (if rank(X′AX) = K). For the covariance matrix of β̂ we get

    V_β̂ = σ²(X′AX)^{−1}X′AWAX(X′AX)^{−1} .    (3.175)

The loss of efficiency due to the use of β̂ instead of the GLS estimator b = S^{−1}X′W^{−1}y becomes

    V_β̂ − V_b = σ²[(X′AX)^{−1}X′A − S^{−1}X′W^{−1}] W [(X′AX)^{−1}X′A − S^{−1}X′W^{−1}]′ .    (3.176)

Following Theorem A.18(iv), this matrix is nonnegative definite. There is no loss in efficiency if

    (X′AX)^{−1}X′A = S^{−1}X′W^{−1} ,  i.e., if β̂ = b .    (3.177)
Assume that the first column of X is 1, i.e., X = (1, X̃), and let A = I, implying the use of the classical OLS estimator (X′X)^{−1}X′y. Then the following theorem is valid (McElroy, 1967):

Theorem 3.22. The OLS estimator b0 = (X′X)^{−1}X′y is the Gauss–Markov estimator in the generalized linear regression model if and only if X = (1, X̃) and

    W = (1 − ρ)I + ρ11′ ,    (3.178)

with 0 ≤ ρ < 1 and 1′ = (1, 1, …, 1).

In other words, we have

    (X′X)^{−1}X′y = (X′W^{−1}X)^{−1}X′W^{−1}y    (3.179)

for all y if and only if the errors ε_t have the same variance σ² and equal nonnegative covariances σ²ρ. A matrix of this form is called compound symmetric.
Moreover, a loss in efficiency occurs if σ² is estimated by an estimator σ̂² that is based on β̂. The average bias of the OLS-based estimator σ̂² is given by [σ²/(T − K)](K − tr[(X′X)^{−1}X′WX]) (see Proof 15, Appendix B). The bias is to be expected to be negative, especially in processes with positive correlation. As a consequence, the variance will be underestimated, leading in turn to an apparently better goodness of fit (cf. several examples in Goldberger, 1964, p. 288, in cases of heteroscedasticity and first-order autoregression).
3.10 Diagnostic Tools
3.10.1 Introduction
This section discusses the influence of individual observations on the estimated parameter values and on the prediction of the dependent variable for given values of the regressor variables. Methods for detecting outliers and deviations from normality of the error distribution are given in some detail. The material of this section is drawn mainly from the excellent book by Chatterjee and Hadi (1988).
3.10.2 Prediction Matrix
We consider the classical linear model

    y = Xβ + ε ,  ε ∼ (0, σ²I) ,

with the usual assumptions. In particular, we assume that the matrix X of order T × K has full rank K. The quality of the classical ex-post predictor p = Xb0 = ŷ of y, with b0 = (X′X)^{−1}X′y the OLSE (ordinary least squares estimator), is strongly determined by the (T × T)–matrix

    P = X(X′X)^{−1}X′ = (p_ij) ,    (3.180)

which is symmetric and idempotent with rank(P) = tr(P) = tr(I_K) = K. The matrix M = I − P is also symmetric and idempotent and has rank(M) = T − K. The estimated residuals are defined by

    ε̂ = (I − P)y = y − Xb0 = y − ŷ = (I − P)ε .    (3.181)

Definition 3.23 (Chatterjee and Hadi, 1988). The matrix P given in (3.180) is called the prediction matrix, and the matrix I − P is called the residuals matrix.

Remark: The matrix P is sometimes called the hat matrix because it maps y onto ŷ.

The (i, j)th element of the matrix P is denoted by p_ij, where

    p_ij = p_ji = x_i′(X′X)^{−1}x_j  (i, j = 1, …, T) .    (3.182)

The ex-post predictor ŷ = Xb0 = Py has the dispersion matrix

    V(ŷ) = σ²P .    (3.183)
Therefore, we obtain (denoting the ith component of ŷ by ŷ_i and the ith component of ε̂ by ε̂_i)

    var(ŷ_i) = σ²p_ii ,    (3.184)
    V(ε̂) = V( (I − P)y ) = σ²(I − P) ,    (3.185)
    var(ε̂_i) = σ²(1 − p_ii)    (3.186)

and, for i ≠ j,

    cov(ε̂_i, ε̂_j) = −σ²p_ij .    (3.187)

The correlation coefficient between ε̂_i and ε̂_j then becomes

    ρ_ij = corr(ε̂_i, ε̂_j) = −p_ij / ( √(1 − p_ii) √(1 − p_jj) ) .    (3.188)
Thus the covariance matrices of the predictor Xb0 and of the estimated error ε̂ are entirely determined by P. Although the disturbances ε_i of the model are independent and identically distributed, the estimated residuals ε̂_i are not identically distributed and, moreover, they are correlated. Observe that

    ŷ_i = ∑_{j=1}^T p_ij y_j = p_ii y_i + ∑_{j≠i} p_ij y_j  (i = 1, …, T) ,    (3.189)

implying that

    ∂ŷ_i/∂y_i = p_ii  and  ∂ŷ_i/∂y_j = p_ij .    (3.190)

Therefore, p_ii can be interpreted as the amount of leverage each value y_i has in determining ŷ_i, regardless of the realized value y_i. The second relation of (3.190) may be interpreted, analogously, as the influence of y_j in determining ŷ_i.
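The properties of P derived here can be checked directly (simulated design matrix; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
T, K = 20, 3

X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])

# Prediction (hat) matrix, eq. (3.180)
P = X @ np.linalg.inv(X.T @ X) @ X.T

# Symmetric, idempotent, tr(P) = K
assert np.allclose(P, P.T)
assert np.allclose(P @ P, P)
assert np.isclose(np.trace(P), K)

# Leverages: diagonal elements p_ii, bounded as in (3.192); the lower
# bound 1/T of (3.195) holds because X contains a column of constants
leverages = np.diag(P)
assert np.all(leverages >= 1 / T - 1e-12)
assert np.all(leverages <= 1 + 1e-12)

# Residual variances sigma^2 (1 - p_ii), eq. (3.186), shrink with leverage
print(leverages)
```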
Elements of P

The size and range of the elements of P are measures of the influence of the data on the predicted values ŷ_t. Because of the symmetry of P, we have p_ij = p_ji, and the idempotence of P implies

    p_ii = ∑_{j=1}^T p_ij² = p_ii² + ∑_{j≠i} p_ij² .    (3.191)

From this equation we obtain the important property

    0 ≤ p_ii ≤ 1 .    (3.192)
Reformulating (3.191) gives

    p_ii = p_ii² + p_ij² + ∑_{k≠i,j} p_ik²  (j fixed) ,    (3.193)

which implies that p_ij² ≤ p_ii(1 − p_ii) and, therefore, using (3.192), we obtain

    −0.5 ≤ p_ij ≤ 0.5  (i ≠ j) .    (3.194)
If X contains a column of constants (1 or c1), then in addition to (3.192) we obtain

    p_ii ≥ T^{−1}  (for all i)    (3.195)

and

    P1 = 1 .    (3.196)

Relationship (3.195) is a direct consequence of (B.101), resulting from the decomposition of P shown in Proof 16, Appendix B.
The diagonal elements pii and the off–diagonal elements pij (i ≠ j) are interrelated according to properties (i)–(iii) as follows (Chatterjee and Hadi, 1988, p. 19):

(i) If pii = 1 or pii = 0, then pij = 0.

Proof. Use (3.191).

(ii) We have

piipjj − p²ij ≥ 0 .   (Proof 17, Appendix B)   (3.197)

(iii) We have

(1 − pii)(1 − pjj) − p²ij ≥ 0 .   (Proof 18, Appendix B)   (3.198)
Interpretation. If a diagonal element pii is close to either 1 or 0, then the elements pij (for all j ≠ i) are close to 0.

The classical predictor of y is given by ŷ = Xb0 = Py, and its first component is ŷ1 = Σ_j p1j yj. If, for instance, p11 = 1, then ŷ1 is fully determined by the observation y1. On the other hand, if p11 is close to 0, then y1 itself, and all the other observations y2, . . . , yT , have low influence on ŷ1. Relationship (B.105) indicates that if pii is large, then the normalized residual ε̂i/(ε̂′ε̂)^{1/2} becomes small.
Conditions for pii to Be Large

If we assume the simple linear model

yt = α + βxt + εt ,   t = 1, . . . , T ,

then we obtain, from (B.101),

pii = 1/T + (xi − x̄)² / Σ_{t=1}^{T}(xt − x̄)² .   (3.199)
The size of pii depends on the distance |xi − x̄|. Therefore, the influence of any observation (yi, xi) on ŷi increases with increasing distance |xi − x̄|.
In the case of multiple regression we have a similar relationship. Let λi denote the eigenvalues and γi (i = 1, . . . , K) the orthonormal eigenvectors of the matrix X′X. Furthermore, let θij be the angle between the column vector xi and the eigenvector γj (i, j = 1, . . . , K). Then we have

pij = ‖xi‖ ‖xj‖ Σ_{r=1}^{K} λr⁻¹ cos θir cos θrj   (3.200)

and

pii = x′ixi Σ_{r=1}^{K} λr⁻¹ cos²θir .   (3.201)

See Proof 19, Appendix B.

Therefore, pii tends to be large if:

(i) x′ixi is large in relation to the squared norm x′jxj of the other vectors xj (i.e., xi is far from the other vectors xj); or

(ii) xi is parallel (or almost parallel) to the eigenvector corresponding to the smallest eigenvalue. For instance, let λK be the smallest eigenvalue of X′X, and assume xi to be parallel to the corresponding eigenvector γK. Then we have cos θiK = 1, and this is multiplied by λK⁻¹, resulting in a large value of pii (cf. Cook and Weisberg, 1982, p. 13).
Multiple X Rows

In the statistical analysis of linear models there are designs (as, e.g., in the analysis of variance of factorial experiments) that allow a repeated response yt for the same fixed x–vector. Let us assume that the ith row (xi1, . . . , xiK) occurs a times in X. Then it holds that

pii ≤ a⁻¹ .   (3.202)

This property is a direct consequence of (3.193). Let J = {j : xj = xi} denote the set of indices of rows identical to the ith row. This implies pij = pii for j ∈ J and, hence, (3.193) becomes

pii = a p²ii + Σ_{j∉J} p²ij ≥ a p²ii ,

which yields (3.202).
Example 3.7. We consider the matrix

X = ( 1  2
      1  2
      1  1 )

with K = 2 and T = 3, and calculate

X′X = ( 3  5
        5  9 ) ,   |X′X| = 2 ,   (X′X)⁻¹ = (1/2) (  9  −5
                                                   −5   3 ) ,

P = X(X′X)⁻¹X′ = ( 0.5  0.5  0
                   0.5  0.5  0
                   0    0    1 ) .

The first and second rows of X (and hence of P) coincide; therefore, by (3.202) with a = 2, we have p11 ≤ 1/2.

Inserting x̄ = 5/3 and Σ_{t=1}^{3}(xt − x̄)² = 6/9 in (3.199) results in

pii = 1/3 + (xi − x̄)² / Σ(xt − x̄)² ,

that is, p11 = p22 = 1/3 + (1/9)/(6/9) = 1/2 and p33 = 1/3 + (4/9)/(6/9) = 1.
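Example 3.7 can be reproduced in a few lines (a Python/NumPy sketch):

```python
import numpy as np

X = np.array([[1, 2], [1, 2], [1, 1]], dtype=float)  # K = 2, T = 3
XtX = X.T @ X
P = X @ np.linalg.inv(XtX) @ X.T

print(np.linalg.det(XtX))   # |X'X| = 2
print(P)                    # [[0.5 0.5 0], [0.5 0.5 0], [0 0 1]]
```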
3.10.3 The Effect of a Single Observation on the Estimation of Parameters

In Section 3.8 we investigated the effect of one variable Xi (or sets of variables) on the fit of the model. The effect of including or excluding columns of X is measured and tested by the statistic F.

In this section we wish to investigate the effect of rows (yt, x′t), instead of columns of X, on the estimation of β. Usually, not all observations (yt, x′t) have equal influence in a least squares fit or on the estimator (X′X)⁻¹X′y. It is important for the data analyst to be able to identify observations that individually or collectively have excessive influence compared to other observations. Such rows of the data matrix (y, X) will be called influential observations.

The measures for the goodness of fit of a model are mainly based on the residual sum of squares

ε̂′ε̂ = (y − Xb)′(y − Xb) = y′(I − P)y = ε′(I − P)ε .   (3.203)

This quadratic form and the residual vector ε̂ = (I − P)ε itself may change considerably if an observation is excluded or added. Depending on the change in ε̂ or ε̂′ε̂, an observation may be identified as influential or not. In the literature, a large number of statistical measures have been proposed for diagnosing influential observations. We describe some of them and focus attention on the detection of a single influential observation. A more detailed presentation is given by Chatterjee and Hadi (1988, Chapter 4).
Measures Based on Residuals

Residuals play an important role in regression diagnostics, since the ith residual ε̂i may be regarded as an appropriate guess for the unknown random error εi.

The relationship ε̂ = (I − P)ε implies that ε̂ would be a good estimator of ε if (I − P) ≈ I, that is, if all pij are sufficiently small and the diagonal elements pii are of the same size. Furthermore, even if the random errors εi are independent and identically distributed (i.e., E(εε′) = σ²I), the identity ε̂ = (I − P)ε indicates that the residuals are not independent (unless P is diagonal) and do not have the same variance (unless the diagonal elements of P are equal). Consequently, the residuals can be expected to be reasonable substitutes for the random errors if:

(i) the diagonal elements pii of the matrix P are almost equal, that is, the rows of X are almost homogeneous, implying homogeneity of the variances of the ε̂t; and

(ii) the off–diagonal elements pij (i ≠ j) are sufficiently small, implying nearly uncorrelated residuals.
Hence it is preferable to use transformed residuals for diagnostic purposes. That is, instead of ε̂i, we may use a transformed standardized residual ε̂i/σ̂i, where σ̂i estimates the standard deviation of the ith residual. Several standardized residuals with specific diagnostic power are obtained by different choices of σ̂i (Chatterjee and Hadi, 1988, p. 73).

(i) Normalized Residual. Replacing σ̂i by (ε̂′ε̂)^{1/2} gives

ai = ε̂i / (ε̂′ε̂)^{1/2}   (i = 1, . . . , T ).   (3.204)

(ii) Standardized Residual. Replacing σ̂i by s = √(ε̂′ε̂/(T − K)), we obtain

bi = ε̂i / s   (i = 1, . . . , T ).   (3.205)

(iii) Internally Studentized Residual. With σ̂i = s√(1 − pii) we obtain

ri = ε̂i / (s√(1 − pii))   (i = 1, . . . , T ).   (3.206)

(iv) Externally Studentized Residual. Let us assume that the ith observation is omitted. This fact is indicated by writing the index (i) in parentheses. Using this indicator, we may define the estimator of σ² when the ith row (yi, x′i) is omitted as

s²(i) = y′(i)(I − P(i))y(i) / (T − K − 1)   (i = 1, . . . , T ).   (3.207)

If we take σ̂i = s(i)√(1 − pii), the ith externally Studentized residual is defined as

r*i = ε̂i / (s(i)√(1 − pii))   (i = 1, . . . , T ).   (3.208)
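All four transformed residuals can be computed from a single OLS fit; the leave-one-out variance s²(i) is obtained from identity (3.212) below rather than by T separate refits. A Python/NumPy sketch on the data of Example 3.8 (the helper name `studentize` is ours, not from the text):

```python
import numpy as np

def studentize(X, y):
    """Return (a, b, r, rstar): normalized, standardized, internally
    and externally Studentized residuals, cf. (3.204)-(3.208)."""
    T, K = X.shape
    P = X @ np.linalg.inv(X.T @ X) @ X.T
    p = np.diag(P)
    eps = y - P @ y                        # OLS residuals
    sse = eps @ eps                        # eps'eps
    s2 = sse / (T - K)
    a = eps / np.sqrt(sse)                 # (3.204)
    b = eps / np.sqrt(s2)                  # (3.205)
    r = eps / np.sqrt(s2 * (1 - p))        # (3.206)
    # leave-one-out variance via (3.212), avoiding T refits
    s2_i = s2 * (T - K - r**2) / (T - K - 1)
    rstar = eps / np.sqrt(s2_i * (1 - p))  # (3.208)
    return a, b, r, rstar

# Example 3.8 data: y and X4, with a column of constants
y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])
a, b, r, rstar = studentize(X, y)
print(np.round(r**2, 2))      # cf. column r_i^2 of Table 3.13
print(np.round(rstar**2, 2))  # cf. column r_i*2 = F_i of Table 3.13
```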
Detection of Outliers

To find the relationship between the ith internally and externally Studentized residuals, we need to write (T − K)s² = y′(I − P)y as a function of s²(i), that is, of (T − K − 1)s²(i) = y′(i)(I − P(i))y(i). This is done by noting that omitting the ith observation is equivalent to fitting the mean–shift outlier model

y = Xβ + eiδ + ε ,   (3.209)

where ei is the ith unit vector; that is, e′i = (0, . . . , 0, 1, 0, . . . , 0). The argument is as follows. Suppose that either yi or x′iβ deviates systematically by δ from the model yi = x′iβ + εi. Then the ith observation (yi, x′i) would have a different intercept than the remaining observations and would hence be an outlier. To check this, we test the hypothesis

H0 : δ = 0   (i.e., E(y) = Xβ)

against the alternative

H1 : δ ≠ 0   (i.e., E(y) = Xβ + eiδ)

using the likelihood–ratio test (LRT) statistic

Fi = ((SSE(H0) − SSE(H1))/1) / (SSE(H1)/(T − K − 1)) ,   (3.210)

where SSE(H0) is the residual sum of squares in the model y = Xβ + ε containing all T observations,

SSE(H0) = y′(I − P)y = (T − K)s² ,

and SSE(H1) is the residual sum of squares in the model y = Xβ + eiδ + ε. The test statistic (3.210) may be written as

Fi = ε̂²i / ((1 − pii)s²(i)) = (r*i)² ,   (3.211)

where r*i is the ith externally Studentized residual (see Proof 20, Appendix B).
Theorem 3.24 (Beckman and Trussel, 1974). Assume the design matrix X is of full column rank K.

(i) If rank(X(i)) = K and ε ∼ NT(0, σ²I), then the externally Studentized residuals r*i (i = 1, . . . , T ) are tT−K−1–distributed.

(ii) If rank(X(i)) = K − 1, then the residual r*i is not defined.

Assume rank(X(i)) = K. Then Theorem 3.24(i) implies that the test statistic (r*i)² = Fi from (3.211) is distributed as central F1,T−K−1 under H0 and as noncentral F1,T−K−1(δ²(1 − pii)/σ²) under H1, respectively. The noncentrality parameter decreases (tending to zero) as pii increases; that is, the detection of outliers becomes difficult when pii is large.
Relationships Between r*i and ri

Equations (B.108) and (3.206) imply that

s²(i) = (T − K)s²/(T − K − 1) − ε̂²i/((T − K − 1)(1 − pii)) = s² (T − K − r²i)/(T − K − 1)   (3.212)

and, hence,

r*i = ri √((T − K − 1)/(T − K − r²i)) .   (3.213)
Inspecting the Four Types of Residuals

The normalized, standardized, and internally and externally Studentized residuals are transformations of the OLS residuals ε̂i of the form ε̂i/σ̂i, where σ̂i is estimated by the corresponding statistics defined in (3.204)–(3.207), respectively. The normalized and standardized residuals ai and bi are easy to calculate, but they do not reflect the variability of the variances of the ε̂i. Therefore, in the case of large differences in the diagonal elements pii of P or, equivalently (cf. (3.186)), in the variances of the ε̂i, application of the Studentized residuals ri or r*i is recommended. The externally Studentized residuals r*i are advantageous in the following sense:

(i) (r*i)² may be interpreted as the F–statistic for testing the significance of the unit vector ei in the mean–shift outlier model (3.209).

(ii) The internally Studentized residual ri follows a beta distribution (cf. Chatterjee and Hadi, 1988, p. 76), whose quantiles are not included in standard textbooks.

(iii) If r²i → T − K, then (r*i)² → ∞ (cf. (3.213)). Hence, compared to ri, the residual r*i is more sensitive to outliers.
i    1 − pii    ŷi        ε̂i       r²i    r*²i = Fi
1    0.76       11.55     6.45    1.15   1.18
2    0.90       41.29     5.71    0.76   0.74
3    0.14      124.38     0.62    0.06   0.05
4    0.90       39.24     0.76    0.01   0.01
5    0.89       35.14     1.86    0.08   0.07
6    0.88       32.06   −12.06    3.48   5.38
7    0.86       26.93    −2.93    0.21   0.19
8    0.90       44.37    −9.37    2.05   2.41
9    0.88       57.71     1.29    0.04   0.03
10   0.90       42.32     7.68    1.38   1.46

Table 3.13. Internally and externally Studentized residuals.
Example 3.8. We consider the following data set, including the response vector y and the variable X4 (which was already detected to be the most important variable compared to X1, X2, and X3):

y  = (18, 47, 125, 40, 37, 20, 24, 35, 59, 50)′ ,
X4 = (−10, 19, 100, 17, 13, 10, 5, 22, 35, 20)′ .

Including the dummy variable 1, the matrix X = (1, X4) gives

X′X = (  10    231
        231  13153 ) ,   |X′X| = 78169 ,

(X′X)⁻¹ = (1/78169) ( 13153  −231
                       −231    10 ) .

The diagonal elements of P = X(X′X)⁻¹X′ are

p11 = 0.24,   p66 = 0.12,
p22 = 0.10,   p77 = 0.14,
p33 = 0.86,   p88 = 0.10,
p44 = 0.10,   p99 = 0.12,
p55 = 0.11,   p10,10 = 0.11,

where Σ pii = 2 = K = tr P and pii ≥ 1/10 (cf. (3.195)). The value p33 differs considerably from the other pii. To calculate the test statistic Fi (3.211), we have to find the residuals ε̂i = yi − ŷi = yi − x′ib0, where b0 = (21.80, 1.03)′. The results are summarized in Table 3.13.
The residuals r²i and r*²i are calculated according to (3.206) and (3.213), respectively. The standard deviation was found to be s = 6.9. From Table C.6 (Appendix C) we have the quantile F1,7,0.95 = 5.59, implying that the null hypothesis H0: “the ith observation (yi, 1, x4i) is not an outlier” is not rejected for any i = 1, . . . , 10. The third observation may be identified as a high–leverage point having remarkable influence on the regression line. Taking x̄4 = 23.1 and s²(x4) = 868.544 and applying formula (3.199), we obtain

p33 = 1/10 + (100 − 23.1)² / Σ_{t=1}^{10}(xt − x̄)² = 1/10 + 76.9²/(9 · 868.544) = 0.10 + 0.76 = 0.86 .

Therefore, the large value of p33 = 0.86 is mainly caused by the large distance between x43 and the mean value x̄4 = 23.1.

Figure 3.4. High–leverage point A.

Figure 3.5. Outlier A.

Figures 3.4 and 3.5 show typical situations for points that are very far from the others. Outliers correspond to extremely large residuals, whereas high–leverage points correspond to extremely small residuals, in each case when compared with the other residuals.
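The leverages of Example 3.8 can be reproduced directly (Python/NumPy sketch):

```python
import numpy as np

y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])

XtX = X.T @ X
P = X @ np.linalg.inv(XtX) @ X.T
p = np.diag(P)

print(np.linalg.det(XtX))  # |X'X| = 78169
print(np.round(p, 2))      # p11 = 0.24, ..., p33 = 0.86, ...
print(np.trace(P))         # tr P = K = 2

# the same p33 via the simple-model formula (3.199)
p33 = 1/10 + (100 - x4.mean())**2 / ((x4 - x4.mean())**2).sum()
print(round(p33, 2))       # 0.86
```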
3.10.4 Diagnostic Plots for Testing the Model Assumptions

Many graphical methods make use of the residuals to detect deviations from the stated assumptions. From experience one may prefer graphical methods over numerical tests based on residuals. The most common residual plots are:

(i) empirical distribution of the residuals, stem–and–leaf diagrams, Box–Whisker plots;

(ii) normal probability plots; and

(iii) residuals versus fitted values or residuals versus xi plots (see Figures 3.6 and 3.7).

These plots are useful in detecting deviations from the assumptions made on the linear model.

The externally Studentized residuals may also be used to detect a violation of normality. If normality is present, then approximately 68% of the residuals r*i will lie in the interval [−1, 1]. As a rule of thumb, one may identify the ith observation as an outlier if |r*i| > 3.
Figure 3.6. Plot of the residuals ε̂t versus the fitted values ŷt (suggests deviation from linearity).

Figure 3.7. No violation of linearity.
If the assumptions of the model are correctly specified, then we have

cov(ε̂, ŷ′) = E((I − P)εε′P) = 0 .   (3.214)

Therefore, plotting ε̂t versus ŷt (Figures 3.6 and 3.7) should exhibit a random scatter of points. Such a situation, as in Figure 3.7, is called a null plot. A plot as in Figure 3.8 indicates heteroscedasticity of the covariance matrix.

Figure 3.8. Signals for heteroscedasticity.
3.10.5 Measures Based on the Confidence Ellipsoid

Under the assumption of normally distributed disturbances, that is, ε ∼ N(0, σ²I), we have b0 = (X′X)⁻¹X′y ∼ N(β, σ²(X′X)⁻¹) and

(β − b0)′(X′X)(β − b0) / (Ks²) ∼ FK,T−K .   (3.215)

Then the inequality

(β − b0)′(X′X)(β − b0) / (Ks²) ≤ FK,T−K,1−α   (3.216)

defines a 100(1 − α)% confidence ellipsoid for β centered at b0. The influence of the ith observation (yi, x′i) can be measured by the change of various parameters of the ellipsoid when the ith observation is omitted. Strong influence of the ith observation would be equivalent to a significant change of the corresponding measure.
Cook’s Distance

Cook (1977) suggested the index

Ci = (b − β̂(i))′X′X(b − β̂(i)) / (Ks²)   (3.217)
   = (ŷ − ŷ(i))′(ŷ − ŷ(i)) / (Ks²)   (i = 1, . . . , T ) ,   (3.218)

to measure the influence of the ith observation on the center of the confidence ellipsoid or, equivalently, on the estimated coefficients β̂(i) or the predictors ŷ(i) = Xβ̂(i). The measure Ci can be thought of as the scaled distance between b and β̂(i) or between ŷ and ŷ(i), respectively. Using

b − β̂(i) = (X′X)⁻¹xiε̂i / (1 − pii) ,   (3.219)

the difference between the OLSEs in the full and the reduced data sets, we immediately obtain the relationship

Ci = (1/K) (pii/(1 − pii)) r²i ,   (3.220)

where ri is the ith internally Studentized residual. Ci becomes large if pii and/or r²i are large, and Ci is proportional to r²i. Applying (3.211) and (3.213), we get

r²i(T − K − 1)/(T − K − r²i) ∼ F1,T−K−1 ,

indicating that Ci is not exactly F–distributed. To inspect the relative size of Ci for all observations, Cook (1977), by analogy with (3.216) and (3.217), suggests comparing Ci with the FK,T−K–percentiles. The greater the percentile corresponding to Ci, the more influential is the ith observation.

Let, for example, K = 2 and T = 32, that is, T − K = 30. The 95% and 99% quantiles of F2,30 are 3.32 and 5.59, respectively. When Ci = 3.32, β̂(i) lies on the surface of the 95% confidence ellipsoid. If Cj = 5.59 for j ≠ i, then β̂(j) lies on the surface of the 99% confidence ellipsoid and, hence, the jth observation would be more influential than the ith observation.
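Formula (3.220) lets one compute all Ci without refitting the model T times; the sketch below (Python/NumPy, data of Example 3.8) also confirms it against the definition (3.217) by an explicit leave-one-out refit.

```python
import numpy as np

y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])
T, K = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
P = X @ XtX_inv @ X.T
p = np.diag(P)
eps = y - X @ b
s2 = eps @ eps / (T - K)
r2 = eps**2 / (s2 * (1 - p))            # internally Studentized, squared

C = r2 * p / ((1 - p) * K)              # closed form (3.220)

# cross-check against the definition (3.217) for each i
C_def = np.empty(T)
for i in range(T):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    beta_i = np.linalg.inv(Xi.T @ Xi) @ Xi.T @ yi
    d = b - beta_i
    C_def[i] = d @ (X.T @ X) @ d / (K * s2)

print(np.round(C, 3))   # cf. column Ci of Table 3.14
```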
Welsch–Kuh’s Distance

The influence of the ith observation on the predicted value ŷi can be measured by the scaled difference ŷi − ŷi(i), that is, by the change in predicting yi when the ith observation is omitted. The scaling factor is the standard deviation of ŷi (cf. (3.184)):

|ŷi − ŷi(i)| / (σ√pii) = |x′i(b − β̂(i))| / (σ√pii) ,   (3.221)

with s(i) (see (3.207)) used as an estimate of σ in (3.221). Using (3.219) and (3.208), (3.221) can be written as

WKi = |ε̂i/(1 − pii)| x′i(X′X)⁻¹xi / (s(i)√pii) = |r*i| √(pii/(1 − pii)) .   (3.222)

WKi is called the Welsch–Kuh statistic. Since r*i ∼ tT−K−1 (see Theorem 3.24), we can judge the size of WKi by comparing it to the quantiles of the tT−K−1–distribution. For sufficiently large sample sizes, one may use 2√(K/(T − K)) as a cutoff point for WKi signaling an influential ith observation.
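The closed form (3.222) can likewise be checked against the direct leave-one-out definition (3.221), with σ estimated by s(i) (a Python/NumPy sketch on the data of Example 3.8):

```python
import numpy as np

y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])
T, K = X.shape

P = X @ np.linalg.inv(X.T @ X) @ X.T
p = np.diag(P)
eps = y - P @ y
s2 = eps @ eps / (T - K)
r2 = eps**2 / (s2 * (1 - p))
s2_i = s2 * (T - K - r2) / (T - K - 1)        # leave-one-out s^2, (3.212)
rstar = eps / np.sqrt(s2_i * (1 - p))         # (3.208)

WK = np.abs(rstar) * np.sqrt(p / (1 - p))     # (3.222)

# direct version: change in the ith prediction when obs. i is left out
WK_def = np.empty(T)
for i in range(T):
    Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
    beta_i = np.linalg.inv(Xi.T @ Xi) @ Xi.T @ yi
    WK_def[i] = np.abs((P @ y)[i] - X[i] @ beta_i) / np.sqrt(s2_i[i] * p[i])

print(np.round(WK, 3))   # cf. column WKi of Table 3.14
```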
Remark: The literature contains various modifications of Cook’s distance(cf. Chatterjee and Hadi, 1988, pp. 122–135).
Measures Based on the Volume of Confidence Ellipsoids

Let x′Ax ≤ 1 define an ellipsoid, and assume A to be a symmetric (positive–definite or nonnegative–definite) matrix. From the spectral decomposition (Theorem A.30) we have A = ΓΛΓ′, ΓΓ′ = I. The volume of the ellipsoid x′Ax = (x′Γ)Λ(Γ′x) = 1 is then seen to be

V = cK Π_{i=1}^{K} λi^{−1/2} = cK √|Λ⁻¹| ,

that is, inversely proportional to the square root of |A|. Applying these arguments to (3.216), we may conclude that the volume of the confidence ellipsoid (3.216) is inversely proportional to |X′X|. Large values of |X′X| indicate an informative design. If we take the confidence ellipsoid when the ith observation is omitted, namely,

(β − β̂(i))′(X′(i)X(i))(β − β̂(i)) / (Ks²(i)) ≤ FK,T−K−1,1−α ,   (3.223)

then its volume is inversely proportional to |X′(i)X(i)|. Therefore, omitting an influential (informative) observation will decrease |X′(i)X(i)| relative to |X′X|. On the other hand, omitting an observation having a large residual will decrease the residual sum of squares s²(i) relative to s². These two ideas can be combined in one measure.
Andrews–Pregibon Statistic

Andrews and Pregibon (1978) compared the volumes of the ellipsoids (3.216) and (3.223) according to the ratio

(T − K − 1)s²(i)|X′(i)X(i)| / ((T − K)s²|X′X|) .   (3.224)

An equivalent representation, proved in Proof 21, Appendix B, is

|Z′(i)Z(i)| / |Z′Z| ,   (3.225)

where Z = (X, y). Omitting an observation that is far from the center of the data will result in a large reduction of the determinant and, consequently, a large increase in volume; small values of (3.225) correspond to this fact. For the sake of convenience, we define

APi = 1 − |Z′(i)Z(i)| / |Z′Z| ,   (3.226)

so that large values indicate influential observations. APi is called the Andrews–Pregibon statistic, and it can be rewritten as

APi = pzii   (Proof 22, Appendix B),   (3.227)

where pzii is the ith diagonal element of the prediction matrix PZ = Z(Z′Z)⁻¹Z′. From (B.106) we get

pzii = pii + ε̂²i/(ε̂′ε̂) .   (3.228)

Thus APi does not distinguish between high–leverage points in the X–space and outliers in the Z–space. Since 0 ≤ pzii ≤ 1 (cf. (3.192)), we get

0 ≤ APi ≤ 1 .   (3.229)

If we apply the definition (3.206) of the internally Studentized residuals ri and use s² = ε̂′ε̂/(T − K), (3.228) implies

APi = pii + (1 − pii) r²i/(T − K)   (3.230)

or

1 − APi = (1 − pii)(1 − r²i/(T − K)) .   (3.231)

The first factor of (3.231) identifies high–leverage points and the second identifies outliers. Small values of (1 − APi) indicate influential points (high–leverage points or outliers), whereas independent examination of the single factors in (3.231) is necessary to identify the nature of the influence.
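The two representations (3.226) and (3.227)/(3.228) can be verified numerically (Python/NumPy sketch, data of Example 3.8, with Z = (X, y) as above):

```python
import numpy as np

y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])
T, K = X.shape
Z = np.column_stack([X, y])

P = X @ np.linalg.inv(X.T @ X) @ X.T
p = np.diag(P)
eps = y - P @ y

# AP_i as a determinant ratio, (3.226)
detZ = np.linalg.det(Z.T @ Z)
AP = np.array([1 - np.linalg.det(np.delete(Z, i, 0).T @ np.delete(Z, i, 0)) / detZ
               for i in range(T)])

# AP_i as the leverage in the Z-space, (3.227)/(3.228)
AP_z = p + eps**2 / (eps @ eps)

print(np.round(AP, 3))   # cf. column APi of Table 3.14
```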
Variance Ratio

As an alternative to the Andrews–Pregibon statistic and the other measures, one can identify the influence of the ith observation by comparing the estimated dispersion matrices of b0 and β̂(i),

V̂(b0) = s²(X′X)⁻¹   and   V̂(β̂(i)) = s²(i)(X′(i)X(i))⁻¹ ,

using measures based on the determinant or the trace of these matrices. If (X′(i)X(i)) and (X′X) are positive definite, one may apply the following variance ratio suggested by Belsley, Kuh and Welsch (1980):

VRi = |s²(i)(X′(i)X(i))⁻¹| / |s²(X′X)⁻¹|   (3.232)
    = (s²(i)/s²)^K |X′X|/|X′(i)X(i)| .   (3.233)

Applying Theorem A.2(x), we obtain

|X′(i)X(i)| = |X′X − xix′i| = |X′X|(1 − x′i(X′X)⁻¹xi) = |X′X|(1 − pii) .

With this relationship, and using (3.212), we may conclude that

VRi = ((T − K − r²i)/(T − K − 1))^K · 1/(1 − pii) .   (3.234)

Therefore, VRi will exceed 1 when r²i is small (no outlier) and pii is large (high–leverage point), and it will be smaller than 1 whenever r²i is large and pii is small. If both r²i and pii are large (or small), then VRi tends toward 1. When all observations have equal influence on the dispersion matrix, VRi is approximately equal to 1. Deviation from unity then signals that the ith observation has more influence than the others. Belsley et al. (1980) propose the approximate cutoff

|VRi − 1| ≥ 3K/T .   (3.235)
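The closed form (3.234) agrees with the determinant definition (3.232); a Python/NumPy sketch on the data of Example 3.8:

```python
import numpy as np

y = np.array([18, 47, 125, 40, 37, 20, 24, 35, 59, 50], dtype=float)
x4 = np.array([-10, 19, 100, 17, 13, 10, 5, 22, 35, 20], dtype=float)
X = np.column_stack([np.ones(10), x4])
T, K = X.shape

P = X @ np.linalg.inv(X.T @ X) @ X.T
p = np.diag(P)
eps = y - P @ y
s2 = eps @ eps / (T - K)
r2 = eps**2 / (s2 * (1 - p))

VR = ((T - K - r2) / (T - K - 1))**K / (1 - p)     # closed form (3.234)

# determinant definition (3.232) by explicit deletion
VR_def = np.empty(T)
for i in range(T):
    Xi, yi = np.delete(X, i, 0), np.delete(y, i)
    ri = yi - Xi @ np.linalg.inv(Xi.T @ Xi) @ Xi.T @ yi
    s2_i = ri @ ri / (T - K - 1)
    VR_def[i] = (np.linalg.det(s2_i * np.linalg.inv(Xi.T @ Xi))
                 / np.linalg.det(s2 * np.linalg.inv(X.T @ X)))

print(np.round(VR, 3))   # cf. column VRi of Table 3.14
```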
Example 3.9 (Example 3.8, continued). We calculate the measures defined above for the data of Example 3.8 (cf. Table 3.13). Examining Table 3.14, we see that Cook’s Ci identifies the sixth data point as the most influential one. The cutoff 2√(K/(T − K)) = 1 for the Welsch–Kuh distance is not exceeded, but the sixth data point again gives the largest value.
i     Ci      WKi     APi     VRi
1    0.182   0.610   0.349   1.260
2    0.043   0.289   0.188   1.191
3    0.166   0.541   0.858   8.967
4    0.001   0.037   0.106   1.455
5    0.005   0.096   0.122   1.443
6    0.241   0.864   0.504   0.475
7    0.017   0.177   0.164   1.443
8    0.114   0.518   0.331   0.803
9    0.003   0.068   0.123   1.466
10   0.078   0.405   0.256   0.995

Table 3.14. Cook’s Ci; Welsch–Kuh, WKi; Andrews–Pregibon, APi; and variance ratio, VRi, for the data set of Table 3.13.
In calculating the Andrews–Pregibon statistic APi (cf. (3.227) and (3.228)), we insert ε̂′ε̂ = (T − K)s² = 8 · (6.9)² = 380.88. The smallest value of (1 − APi) corresponds to the third observation, and we obtain

1 − AP3 = 0.14 = (1 − p33)(1 − r²3/8) = 0.14 · (1 − 0.000387) ,

indicating that (y3, x3) is a high–leverage point, as we have noted already. The sixth observation has the next largest APi value. An inspection of the factors of

1 − AP6 = 0.496 = 0.88 · (1 − 0.437)

indicates that (y6, x6) tends to be an outlier.

These conclusions also hold for the variance ratio. Condition (3.235), namely |VRi − 1| ≥ 6/10, is fulfilled for the third observation, indicating significance in the sense of (3.235).
Remark: In the literature one may find many variants and generaliza-tions of the measures discussed here. A suitable recommendation is themonograph by Chatterjee and Hadi (1988).
3.10.6 Partial Regression Plots

Plotting the residuals against a fixed independent variable can be used to check the assumption that this regressor has a linear effect on Y. If the residual plot shows the inadequacy of a linear relation between Y and some fixed Xi, it does not display the true (nonlinear) relation between Y and Xi. Partial regression plots are refined residual plots that represent the correct relation for a regressor in a multiple model under consideration.

Figure 3.9. Partial regression plot (of e(X2 | X1) versus e(Y | X1)) indicating no additional influence of X2 compared to the model y = β0 + X1β1 + ε.

Suppose that we want to investigate the nature of the marginal effect of a variable Xk, say, on Y when the other independent variables under consideration are already included in the model. Thus partial regression plots may provide information about the marginal importance of the variable Xk that may be added to the regression model.
Let us assume that one variable X1 is included and that we wish to add a second variable X2 to the model (cf. Neter, Wassermann and Kutner, 1990, p. 387). Regressing Y on X1, we obtain the fitted values

ŷi(X1) = β̂0 + x1iβ̂1 = x̃′1iβ̂1 ,   (3.236)

where

β̂1 = (β̂0, β̂1)′ = (X̃′1X̃1)⁻¹X̃′1y   (3.237)

and X̃1 = (1, x1). Hence, we may define the residuals

ei(Y | X1) = yi − ŷi(X1) .   (3.238)

Regressing X2 on X1, we obtain the fitted values

x̂2i(X1) = x̃′1ib*1   (3.239)

with b*1 = (X̃′1X̃1)⁻¹X̃′1x2, and the residuals

ei(X2 | X1) = x2i − x̂2i(X1) .   (3.240)

Analogously, in the full model y = β0 + X1β1 + X2β2 + ε we have

ei(Y | X1, X2) = yi − ŷi(X1, X2) ,   (3.241)

where

ŷ(X1, X2) = X̃1b1 + X2b2   (3.242)
Figure 3.10. Partial regression plot (of e(X2 | X1) versus e(Y | X1)) indicating additional linear influence of X2.
and b1 and b2 are the two components resulting from the partitioning of b (with X1 replaced by X̃1); see, for example, Rao et al. (2008). Then we have

e(Y | X1, X2) = e(Y | X1) − b2 e(X2 | X1) .   (3.243)

The partial regression plot is obtained by plotting the residuals ei(Y | X1) against the residuals ei(X2 | X1). Figures 3.9 and 3.10 present some standard partial regression plots. If the vertical deviations of the plotted points around the line e(Y | X1) = 0 are squared and summed, we obtain the residual sum of squares

RSS_{X1} = (y − X̃1(X̃′1X̃1)⁻¹X̃′1y)′(y − X̃1(X̃′1X̃1)⁻¹X̃′1y) = y′M1y = [e(Y | X1)]′[e(Y | X1)] .   (3.244)

The vertical deviations of the plotted points in Figure 3.10, taken with respect to the line through the origin with slope b2, are the estimated residuals e(Y | X1, X2).

The extra sum of squares relationship is

SSReg(X2 | X1) = RSS_{X1} − RSS_{X1,X2} .   (3.245)

This relation is the basis for the interpretation of the partial regression plot: if the scatter of the points around the line with slope b2 is much less than the scatter around the horizontal line, then adding the additional independent variable X2 to the regression model will lead to a substantial reduction of the error sum of squares and, hence, will substantially improve the fit of the model.
3.10.7 Regression Diagnostics by Animating Graphics
Graphical techniques are an essential part of statistical methodology. One of the important graphics in regression analysis is the residual plot. In regression analysis the plotting of residuals versus the independent variable or predicted values has been recommended by Draper and Smith (1966) and Cox and Snell (1968). These plots help to detect outliers, to assess the presence of inhomogeneity of variance, and to check model adequacy. Larsen and McCleary (1972) introduced partial residual plots, which can detect the importance of each independent variable and assess some nonlinearity or necessary transformation of variables.

For the purpose of regression diagnostics, Cook and Weisberg (1989) introduced dynamic statistical graphics. They considered the interpretation of two proposed types of dynamic displays, rotation and animation, in regression diagnostics. Some of the issues that they addressed by using dynamic graphics include adding predictors to a model, assessing the need to transform, and checking for interactions and normality. They used animation to show the dynamic effects of adding a variable to a model and provided methods for simultaneously adding variables to a model.
Assume the classical linear normal model

y = Xβ + ε = X1β1 + X2β2 + ε ,   ε ∼ N(0, σ²I) .   (3.246)

X consists of X1 and X2, where X1 is a [T × (K − 1)]–matrix and X2 is a (T × 1)–vector, that is, X = (X1, X2). The basic idea of Cook and Weisberg (1989) is to begin with the model y = X1β1 + ε and then smoothly add X2, ending with a fit of the full model y = X1β1 + X2β2 + ε, where β1 is a [(K − 1) × 1]–vector and β2 is an unknown scalar. Since the animated plot that they proposed involves only fitted values and residuals, they worked in terms of a modified version of the full model (3.246) given by

y = Zβ* + ε = X1β*1 + X̃2β*2 + ε ,   (3.247)

where X̃2 = Q1X2/‖Q1X2‖ is the part of X2 orthogonal to X1, normalized to unit length, Q1 = I − P1, P1 = X1(X′1X1)⁻¹X′1, Z = (X1, X̃2), and β* = (β*′1, β*2)′.

Next, for each 0 < λ ≤ 1, they estimated β* by

β̂λ = (Z′Z + ((1 − λ)/λ) ee′)⁻¹ Z′y ,   (3.248)

where e is a (K × 1)–vector of zeros except for a single 1 corresponding to X̃2. Since X′1X̃2 = 0 and X̃′2X̃2 = 1,

(Z′Z + ((1 − λ)/λ) ee′)⁻¹ = ( X′1X1          0
                              0′    X̃′2X̃2 + (1 − λ)/λ )⁻¹ = ( X′1X1   0
                                                               0′     1/λ )⁻¹ ,

we obtain

β̂λ = ( (X′1X1)⁻¹X′1y
        λX̃′2y ) .

So as λ tends to 0, (3.248) corresponds to the regression of y on X1 alone, and if λ = 1, then (3.248) corresponds to the ordinary least squares regression of y on X1 and X2. Thus as λ increases from 0 to 1, β̂λ represents a continuous change of estimators that add X2 to the model, and an animated plot of ε̂(λ) versus ŷ(λ), where ε̂(λ) = y − ŷ(λ) and ŷ(λ) = Zβ̂λ, gives a dynamic view of the effects of adding X2 to a model that already includes X1. This idea corresponds to the weighted mixed regression estimator; see Rao et al. (2008), for example.
Using Cook and Weisberg’s idea of animation, Park, Kim and Toutenburg (1992) proposed an animated graphical method to display the effects of removing an outlier from a model for regression diagnostic purposes.

We want to view the dynamic effects of removing the ith observation from the model (3.246). First, we consider the mean–shift model y = Xβ + γiei + ε (see (3.209)), where ei is the vector of zeros except for a single 1 corresponding to the ith observation. We can work in terms of a modified version of the mean–shift model given by

y = Zβ* + ε = Xβ + γ*i ẽi + ε ,   (3.249)

where ẽi = Qei/‖Qei‖ is the part of ei orthogonal to X, normalized to unit length, Q = I − P, P = X(X′X)⁻¹X′, Z = (X, ẽi), and β* = (β′, γ*i)′. Then, for each 0 < λ ≤ 1, we estimate β* by

β̂λ = (Z′Z + ((1 − λ)/λ) ee′)⁻¹ Z′y ,   (3.250)

where e is the [(K + 1) × 1]–vector of zeros except for a single 1 in the (K + 1)th element. Now we can examine some properties of β̂λ. First, without loss of generality, we take X and y in the forms X = (X′(i), xi)′ and y = (y′(i), yi)′, where x′i is the ith row vector of X, X(i) is the matrix X without the ith row, and y(i) is the vector y without yi. That is, we place the ith observation at the bottom, so that ei and e become vectors of zeros except for the last element, which is 1. Then, since

(Z′Z + ((1 − λ)/λ) ee′)⁻¹ = ( X′X   0
                              0′   1/λ )⁻¹ = ( (X′X)⁻¹   0
                                               0′        λ )

and

Z′y = ( X′y
        ẽ′iy ) ,

we obtain

β̂λ = ( β̂
        γ̂*i ) = ( (X′X)⁻¹X′y
                   λẽ′iy )

and

ŷ(λ) = Zβ̂λ = X(X′X)⁻¹X′y + λẽiẽ′iy .

Hence at λ = 0, ŷ(0) = X(X′X)⁻¹X′y is the vector of predicted values for the full model by the method of ordinary least squares. And at λ = 1, we get the following lemma, where β̂(i) = (X′(i)X(i))⁻¹X′(i)y(i).

Lemma 3.25.

ŷ(1) = ( X(i)β̂(i)
         yi ) .
Proof. See Proof 23, Appendix B.
Thus as λ increases from 0 to 1, an animated plot of ε̂(λ) versus ŷ(λ) gives a dynamic view of the effects of removing the ith observation from model (3.246).

The following lemma shows that the residuals ε̂(λ) and fitted values ŷ(λ) can be computed from the residuals ε̂ and fitted values ŷ = ŷ(0) of the full model, and the fitted values ŷ(1) of the model that does not contain the ith observation.

Lemma 3.26.

(i) ŷ(λ) = λŷ(1) + (1 − λ)ŷ(0); and

(ii) ε̂(λ) = ε̂ − λ(ŷ(1) − ŷ(0)) .

Proof. See Proof 24, Appendix B.

Because of the simplicity of Lemma 3.26, an animated plot of ε̂(λ) versus ŷ(λ), as λ is varied between 0 and 1, can easily be computed.
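Lemma 3.26 makes the animation cheap: only ŷ(0) and ŷ(1) are needed, and every intermediate frame is a linear interpolation between them. A Python/NumPy sketch verifying part (i) against the explicit estimator (3.250), and Lemma 3.25, on simulated data (the data are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 20, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)
i = T - 1                                   # "remove" the last observation

P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(T) - P
ei = np.zeros(T); ei[i] = 1.0
e_tilde = Q @ ei / np.linalg.norm(Q @ ei)   # part of e_i orthogonal to X
Z = np.column_stack([X, e_tilde])

def yhat(lam):
    """Fitted values Z beta_lambda from (3.250)."""
    e = np.zeros(K + 1); e[K] = 1.0
    A = Z.T @ Z + (1 - lam) / lam * np.outer(e, e)
    return Z @ np.linalg.solve(A, Z.T @ y)

y0 = P @ y                                  # lambda -> 0: full-model fit
y1 = yhat(1.0)                              # lambda = 1: obs. i removed

lam = 0.3
print(np.allclose(yhat(lam), lam * y1 + (1 - lam) * y0))  # Lemma 3.26(i)
print(np.isclose(y1[i], y[i]))   # Lemma 3.25: i-th fitted value equals y_i
```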
The appropriate number of frames (values of λ) for an animated residual plot depends on the speed with which the computer screen can be refreshed and, thus, on the hardware being used. With too many frames, changes often become too small to be noticed and, as a consequence, the overall trend can be missed. With too few frames, smoothness and the behavior of individual points cannot be detected.
When there are too many observations and it is difficult to check all the animated plots, it is advisable to select several suspicious observations based on nonanimated diagnostic measures, such as Studentized residuals, Cook’s distance, and so on.

From animated residual plots for individual observations, i = 1, 2, . . . , T, it is possible to diagnose which observation is most influential in changing the residuals ε̂ and the fitted values ŷ(λ) as λ changes from 0 to 1. Thus, it may be possible to formulate a measure reflecting which observation is most influential, and which kinds of influential points can be diagnosed in addition to those that can already be diagnosed by well–known diagnostics. However, our primary intent is only to provide a graphical tool to display the effects of continuously removing a single observation from a model. For this reason, we do not develop a new diagnostic measure that could give a criterion for whether an animated plot of removing an observation is significant or not. Development of such a measure based on animated plots remains open to further research.
Example 3.10 (Phosphorus Data). In this example, we illustrate the use of ε(λ) versus y(λ) as an aid to understanding the dynamic effects of removing an observation from a model. Our illustration is based on the phosphorus data reported in Snedecor and Cochran (1967, p. 384). An investigation of the source from which corn plants obtain their phosphorus was carried out. The concentration of phosphorus, in parts per million, in each of 18 soils was measured. The variables are

X1 = concentration of inorganic phosphorus in the soil,
X2 = concentration of organic phosphorus in the soil, and
y = phosphorus content of corn grown in the soil at 20°C.

The data set, together with the ordinary residuals ei, the diagonal terms hii of the hat matrix H = X(X′X)−1X′, the Studentized residuals ri, and Cook's distances Ci, is shown in Table 3.15 under the linear model assumption. We developed computer software that plots the animated residuals and some related regression results. Among the eighteen plots, the plot for the seventeenth observation shows the most pronounced changes in the residuals. In fact, the seventeenth observation has the largest residual ei, Studentized residual ri, and Cook's distance Ci, as shown in Table 3.15.
Figure 3.10 shows four frames of an animated plot of ε(λ) versus y(λ) for removing the seventeenth observation. The first frame (a) is for λ = 0 and thus corresponds to the usual plot of residuals versus fitted values from the regression of y on X = (X1, X2), and we can see that in (a) the seventeenth
Soil   X1     X2    y      ei       hii    ri      Ci
1      0.4    53    64     2.44     0.26   0.14    0.002243
2      0.4    23    60     1.04     0.19   0.06    0.000243
3      3.1    19    71     7.55     0.23   0.42    0.016711
4      0.6    34    61     0.73     0.13   0.04    0.000071
5      4.7    24    54    -12.74    0.16   -0.67   0.028762
6      1.7    65    77     12.07    0.46   0.79    0.178790
7      9.4    44    81     4.11     0.06   0.21    0.000965
8      10.1   31    93     15.99    0.10   0.81    0.023851
9      11.6   29    93     13.47    0.12   0.70    0.022543
10     12.6   58    51    -32.83    0.15   -1.72   0.178095
11     10.9   37    76    -2.97     0.06   -0.15   0.000503
12     23.1   46    96    -5.58     0.13   -0.29   0.004179
13     23.1   50    77    -24.93    0.13   -1.29   0.080664
14     21.6   44    93    -5.72     0.12   -0.29   0.003768
15     23.1   56    95    -7.45     0.15   -0.39   0.008668
16     1.9    36    54    -8.77     0.11   -0.45   0.008624
17     26.8   58    168    58.76    0.20   3.18    0.837675
18     29.9   51    99    -15.18    0.24   -0.84   0.075463

Table 3.15. Data, ordinary residuals ei, diagonal terms hii of the hat matrix H = X(X′X)−1X′, Studentized residuals ri, and Cook's distances Ci from Example 3.10.
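The diagnostic columns of Table 3.15 can be recomputed directly from the data. The following sketch uses the standard formulas for internally Studentized residuals and Cook's distance, which is what the table appears to use:

```python
import numpy as np

# Phosphorus data from Table 3.15: (X1, X2, y) for soils 1..18
data = np.array([
    [0.4, 53, 64], [0.4, 23, 60], [3.1, 19, 71], [0.6, 34, 61],
    [4.7, 24, 54], [1.7, 65, 77], [9.4, 44, 81], [10.1, 31, 93],
    [11.6, 29, 93], [12.6, 58, 51], [10.9, 37, 76], [23.1, 46, 96],
    [23.1, 50, 77], [21.6, 44, 93], [23.1, 56, 95], [1.9, 36, 54],
    [26.8, 58, 168], [29.9, 51, 99],
])
X = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])  # (1, X1, X2)
y = data[:, 2]
n, p = X.shape

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix H = X(X'X)^{-1}X'
h = np.diag(H)                         # leverages h_ii
e = y - H @ y                          # ordinary residuals e_i
s2 = e @ e / (n - p)                   # unbiased estimate of sigma^2
r = e / np.sqrt(s2 * (1 - h))          # internally Studentized residuals r_i
C = r**2 * h / (p * (1 - h))           # Cook's distances C_i
```

The seventeenth observation (index 16) comes out as the most conspicuous on all three measures, as the table reports.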
observation is located in the upper-right corner. The second (b), third (c), and fourth (d) frames correspond to λ = 1/2, 2/3, and 1, respectively. The fourth frame (d) is thus the usual plot of the residuals versus the fitted values from the regression of y(17) on X(17), where the subscript (17) indicates omission of the seventeenth observation. We can see that, as λ increases from 0 to 1, the seventeenth observation moves to the right and down, becoming the rightmost point in (b), (c), and (d). The residual plot in (a) has an undesirable form: the points do not scatter randomly in a band between −60 and +60. In (d), by contrast, the residuals scatter randomly in a band between −20 and +20.
Figures 3.11–3.14 show animated plots of ε(λ) versus y(λ) for the data in Example 3.10 when removing the seventeenth observation (marked by dotted lines).
[Figure 3.11. Animated residual plot: λ = 0]
[Figure 3.12. Animated residual plot: λ = 1/3]
[Figure 3.13. Animated residual plot: λ = 2/3]
[Figure 3.14. Animated residual plot: λ = 1]

Apart from the problems described in this section, there exist many other problems with which the user may be confronted in practical work. In terms of the usual notation of the linear model, problems may arise with its components, i.e., with ε (heteroscedasticity, autocorrelation), with X (exclusion of relevant variables, inclusion of irrelevant variables, correlation between X and ε), or with the parameter β. In particular, the assumption that β is constant may be violated. Several testing procedures, e.g., the Chow or Hansen tests, as well as tests of slope coefficients or of an intercept, are described in Johnston (1984).
3.11 Exercises and Questions
3.11.1 Define the principle of least squares.

3.11.2 Given the normal equation X′Xβ = X′y, what are the conditions for a unique solution?

3.11.3 Assume rank(X) = p < K. What are the linear restrictions to ensure estimability of β? Give the definition of the restricted least squares estimator.

3.11.4 Define the matrix-valued mean square error of a linear estimator and the MSE-I superiority.

3.11.5 Let β̂ = Cy + d be a linear estimator. Give the condition of unbiasedness of β̂. What is the best linear unbiased estimator?

3.11.6 What is the relation of the covariance matrices of the best linear unbiased estimator β̂ and any linear estimator β̃?

3.11.7 How can you get an unbiased estimate of σ²?

3.11.8 Characterize weak and extreme multicollinearity in terms of the rank of X′X, of the unbiasedness of the least squares estimator, and of identifiability.

3.11.9 Assume ε ∼ N(0, σ²I) and give the ML estimators of β and σ².
4 Single–Factor Experiments with Fixed and Random Effects
4.1 Models I and II in the Analysis of Variance
The analysis of variance, originally developed by R.A. Fisher for field experiments, is one of the most widely used and most general statistical procedures for testing and analyzing data. These procedures require a large amount of computation, especially in the case of complicated classifications. For this reason, they are implemented in statistical software.
We distinguish between two fundamental problems.
Model I with fixed effects is used for the multiple comparison of means of quantitative, normally distributed factors that are observed on fixed, selected experimental units. We test the null hypothesis H0 : µ1 = µ2 = · · · = µs against the general alternative H1: at least two means are different, i.e., we compare s normally distributed populations with respect to their means. The corresponding F–test is a generalization of the t–test that compares two normal distributions. In general, this comparison is called a comparison of the effects of treatments. If specific treatments are to be compared, then it is wise not to choose them at random, but to assume them as fixed.
Example 4.1. Comparison of the average manufacturing time for an inlay by three different prespecified dentists (Table 4.1).

Dentist A    Dentist B    Dentist C
55.5         67.0         62.5
40.0         57.0         31.5
38.5         33.5         31.5
31.5         37.0         53.0
45.5         75.0         50.5
70.0         60.0         62.5
78.0         43.5         40.0
80.0         56.0         19.5
74.5         65.5
57.5         54.0
72.0         59.5
70.0
48.0
59.0

n1 = 14         n2 = 11         n3 = 8
x̄1 = 58.57      x̄2 = 55.27      x̄3 = 43.88

n = n1 + n2 + n3

Table 4.1. Manufacturing time (in minutes) for the making of inlays, measured for three dentists (cf. Toutenburg, 1977).
Model II with random effects is used for the decomposition of the total variability produced by the effect of several factors. This total variability (variance) is decomposed into components that reflect the effect of each factor and into a component that cannot be explained by the factors, i.e., the error variance. The experimental units are chosen at random, as opposed to Model I. The treatments are then to be regarded as a random sample from an assumed infinite population. Hence, we have no interest in the treatments chosen at random, but only in their respective proportions of the total variability.

Example 4.2. From a total population, the manufacturing times of (e.g., three) dentists chosen at random are to be analyzed with respect to their proportion of the total variability.
4.2 One–Way Classification for the Multiple Comparison of Means
Assume we have s samples from s normally distributed populations N(µi, σ²). Furthermore, assume the sample sizes to be ni and the total sample size to be n with

∑i ni = n .    (4.1)

The variances σ² are unknown, but equal in all populations.
Definition 4.1. If all ni are equal, then the sampling design (experimental design) is called balanced. Otherwise, it is called unbalanced.

The s different levels of a Factor A are called treatments. Since only one factor is investigated, we call this type of experimental design a one–way classification.
Examples:
1. Factor A: plastic PMMA:
   s levels: s different concentrations of quartz in PMMA;
   s effects: flexibility of the different PMMA materials.

2. Factor A: fertilization:
   s levels: s different fertilizers (or one fertilizer with s different concentrations of phosphate);
   s effects: output per acre.
Level   Single experiments per level of Factor A   Sum of the observations   Sample mean
        1      2      . . .                        per sample
1       y11    y12    . . .   y1n1                 ∑j y1j = Y1·              Y1·/n1 = y1·
2       y21    y22    . . .   y2n2                 ∑j y2j = Y2·              Y2·/n2 = y2·
.
.
s       ys1    ys2    . . .   ysns                 ∑j ysj = Ys·              Ys·/ns = ys·
        n = ∑i ni                                  ∑i Yi· = Y··              Y··/n = y··

Table 4.2. Sample design (one–way classification).
The observations of the s samples are arranged according to Table 4.2. A period in the subscript indicates that we summed over this subscript. For example, Y1· is the sum of the first row and Y·· is the total sum. For the observations yij we assume the following model:

yij = µ + αi + εij (i = 1, . . . , s, j = 1, . . . , ni) ,    (4.2)

in which µ is the overall mean, αi is the effect of the ith level of Factor A (i.e., the deviation (treatment effect) from the overall mean µ caused by
the ith level), and εij is a random error (i.e., a random deviation from µ and αi). µ and αi are fixed parameters; the εij are random. The following assumptions have to hold:

• the errors εij are independent and identically distributed with mean 0 and variance σ²;

• the errors are normal, i.e., we have εij ∼ N(0, σ²); and

• the following constraint holds:

∑i ni αi = 0 .    (4.3)
In experimental designs it is important to have equal sample sizes ni in the groups (balanced case); otherwise, the analysis of variance is not robust against deviations from the assumptions (normal distribution, equal variances).
Remark. Model I (with fixed effects) assumes that the s treatments are given in advance, i.e., they are fixed before the experiment. Hence, the αi are nonstochastic factors. If the s treatments were selected by a random mechanism from a set of possible treatments, then the αi would be stochastic, i.e., random variables with a certain distribution. For the analysis of linear models with stochastic parameters, the methods of linear models have to be modified. For now, we restrict ourselves to the case with fixed effects. Models with random effects are discussed in Section 4.6.
Completely Randomized Experimental Design
The simplest and least restrictive design (CRD: completely randomized design) consists of assigning the s treatments to the n experimental units in the following manner. We choose n1 experimental units at random and assign them to treatment i = 1. After that, n2 experimental units are selected from the remaining n − n1 units, once again at random, and are assigned to treatment i = 2, and so on. The remaining n − (n1 + · · · + ns−1) = ns units receive the sth treatment. This experimental design has the following advantages (cf., e.g., Petersen, 1985, p. 7):
• Flexibility: The number s of treatments and the sample sizes ni are not restricted; in particular, unbalanced designs are allowed. However, balanced designs should be preferred, since for these designs the power of the tests is highest.

• Degrees of freedom: The design provides a maximum number of degrees of freedom for the error variance.

• Statistical analysis: The employment of standard procedures is possible in the unbalanced case as well (e.g., in the case of missing values due to nonresponse).
A disadvantage of this design arises in the case of inhomogeneous experimental units: a decrease in the precision of the results. Often, however, the experimental units can be grouped into homogeneous subgroups (blocking), with a resulting increase in precision.
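The random assignment described above can be sketched in a few lines (function name and seed are ours):

```python
import numpy as np

def crd_assign(group_sizes, rng):
    """Completely randomized design: return a random treatment label
    for each of the n = sum(group_sizes) experimental units."""
    labels = np.repeat(np.arange(len(group_sizes)), group_sizes)
    return rng.permutation(labels)

rng = np.random.default_rng(1)
plan = crd_assign([14, 11, 8], rng)   # e.g., the group sizes of Table 4.1
```

A single random permutation of the pooled label vector is equivalent to drawing n1 units at random for treatment 1, then n2 from the remainder, and so on.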
4.2.1 Representation as a Restrictive Model
The linear model (4.2) can be formulated in matrix notation as

y = Xβ + ε, ε ∼ N(0, σ²I),    (4.4)

with y′ = (y11, . . . , y1n1, . . . , ys1, . . . , ysns), β′ = (µ, α1, . . . , αs), and ε stacked accordingly. Writing 1ni for the (ni × 1) vector of ones, the design matrix is

X = ( 1n1   1n1   0     · · ·   0
      1n2   0     1n2   · · ·   0
      ...
      1ns   0     0     · · ·   1ns ) ,

i.e., each row has a one in the first column (for µ) and a one in the column of the treatment the observation received. X is of type n × (s + 1) and rank(X) = s. Hence, we have exact multicollinearity. X′X is now singular, and a linear restriction r = R′β with rank(R) = J = 1 and rank((X′, R)′) = s + 1 has to be introduced for the estimation of the [(s + 1) × 1]–vector β′ = (µ, α1, . . . , αs) (cf. Theorem B.1). We choose

r = 0, R′ = (0, n1, . . . , ns),    (4.5)

and, hence,

∑i ni αi = 0    (4.6)

(cf. (4.3)).
Remark. The estimability of β is ensured, according to Theorem B.1, for every restriction r = R′β with rank(R′) = J = 1 and rank((X′, R)′) = s + 1. However, the selected restriction (4.6) has the advantage of an interpretation, justified by the subject matter, that follows the effect coding of a loglinear model. The parameters αi are then the deviations from the overall mean µ and hence standardized with respect to µ. Thus, the αi determine, by their magnitude and sign, the relative (positive or negative) factors with which the ith treatment leads to deviations from the overall mean.
According to (B.16), the conditional OLS estimate of β′ = (µ, α1, . . . , αs) is of the following form:

b(R′, 0) = (X′X + RR′)−1X′y.    (4.7)
As we can easily check, the matrix (X′, R)′, with X from (4.4) and R′ from (4.5), is of full column rank s + 1.
Case s = 2

We demonstrate the computation of the estimate b(R′, 0) for s = 2. With the notation 1′ni = (1, . . . , 1) for the (ni × 1)–vector of ones, we obtain the following representation:

X = ( 1n1   1n1   0
      1n2   0     1n2 )  (of type n × 3),    (4.8)

X′X = ( n1 + n2   n1   n2
        n1        n1   0
        n2        0    n2 ) ,

RR′ = ( 0
        n1
        n2 ) (0  n1  n2)    (4.9)

    = ( 0    0       0
        0    n1²     n1n2
        0    n1n2    n2² ) .

With n = n1 + n2 we have

X′X + RR′ = ( n    n1          n2
              n1   n1 + n1²    n1n2
              n2   n1n2        n2 + n2² ) ,

|X′X + RR′| = n1 n2 n² ,    (4.10)

so that (X′X + RR′)−1 equals

(1/(n1n2n²)) · ( n1n2(1 + n)   −n1n2                   −n1n2
                 −n1n2         n2(n(1 + n2) − n2)      −n1n2(n − 1)
                 −n1n2         −n1n2(n − 1)            n1(n(1 + n1) − n1) ) ,    (4.11)

X′y = ( Y··
        Y1·
        Y2· ) .    (4.12)

Here we have

y1 = (y11, . . . , y1n1)′ , y2 = (y21, . . . , y2n2)′ ,

Y1· = ∑j y1j , Y2· = ∑j y2j , Y·· = Y1· + Y2· .

Finally, we receive the conditional OLS estimate (4.7) for the case s = 2 according to

b((0, n1, n2), 0) = (X′X + RR′)−1X′y = ( µ̂
                                         α̂1
                                         α̂2 ) = ( y··
                                                   y1· − y··
                                                   y2· − y·· ) .    (4.13)
Proof. See Proof 25, Appendix B.2.
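The closed form (4.13) can be checked numerically against the restricted estimator (4.7); a minimal sketch with simulated data for s = 2 (all values are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 5, 3
y1 = rng.normal(10.0, 2.0, n1)
y2 = rng.normal(12.0, 2.0, n2)
y = np.concatenate([y1, y2])
n = n1 + n2

# Design matrix of (4.4) for s = 2: a column for mu and one 0/1 indicator per group
X = np.zeros((n, 3))
X[:, 0] = 1.0
X[:n1, 1] = 1.0
X[n1:, 2] = 1.0

R = np.array([0.0, n1, n2])   # restriction R'beta = 0, i.e. n1*alpha1 + n2*alpha2 = 0
b = np.linalg.solve(X.T @ X + np.outer(R, R), X.T @ y)   # conditional OLS estimate (4.7)

expected = np.array([y.mean(), y1.mean() - y.mean(), y2.mean() - y.mean()])  # (4.13)
```

The solution reproduces (4.13): the overall mean and the group-mean deviations, which automatically satisfy the constraint (4.6).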
4.2.2 Decomposition of the Error Sum of Squares
With b(R′, 0) from (4.13) we receive

ŷ = Xb(R′, 0) = ( y1· 1n1
                  y2· 1n2 ) ,    (4.14)

i.e., each observation is fitted by its group mean. The decomposition (3.120), i.e.,

∑ (yt − ȳ)² = ∑ (yt − ŷt)² + ∑ (ŷt − ȳ)² ,

is of the following form in the model (4.4) with the new notation:

∑i ∑j (yij − y··)² = ∑i ∑j (yij − yi·)² + ∑i ni (yi· − y··)²    (4.15)
or, written according to (3.121) and (3.122),
SSCorr = RSS + SSReg (4.16)
or, in the notation of the analysis of variance,
SSTotal = SSWithin + SSBetween . (4.17)
The sum of squares

SSWithin = ∑i ∑j (yij − yi·)²

measures the variability within each treatment. On the other hand, the sum of squares

SSBetween = ∑i ni (yi· − y··)²

measures the variability between the treatments, i.e., the actual treatment effects.
Testing the Regression
We consider the linear model
yij = µ + αi + εij (i = 1, . . . , s, j = 1, . . . , ni) (4.18)
with

∑i ni αi = 0 .    (4.19)
Testing the hypothesis
H0 : α1 = · · · = αs = 0 (4.20)
is equivalent to comparing the models
H0 : yij = µ + εij (4.21)
and
H1 : yij = µ + αi + εij with ∑i ni αi = 0 ,    (4.22)
i.e., is equivalent to testing
H0 : α1 = · · · = αs = 0 (parameter space ω) (4.23)
against
H1 : αi ≠ 0 for at least two i (parameter space Ω) .    (4.24)
In the case of an assumed normal distribution εij ∼ N(0, σ²) for all i, j, the corresponding likelihood ratio test statistic (3.102)

F = ((σ̂²ω − σ̂²Ω) / σ̂²Ω) · (T − K)/(K − s)

changes to

F = ((SSTotal − SSWithin) / SSWithin) · (n − s)/(s − 1)    (4.25)
  = (SSBetween / SSWithin) · (n − s)/(s − 1)    (4.26)
  = MSBetween / MSWithin .    (4.27)
Remark. The sum of squares

SSBetween = ∑i ni (yi· − y··)²

is named according to the factor, e.g., SSA if Factor A represents a treatment in s different levels. Analogously, we denote

SSWithin = ∑i ∑j (yij − yi·)²

as SSError (SSE, error sum of squares). The sums of squares, with SSBetween = SSA, can also be written in detail as follows:

SSTotal = ∑i ∑j (yij − y··)² = ∑i ∑j yij² − n y··² ,    (4.28)

SSA = ∑i ∑j (yi· − y··)² = ∑i ni yi·² − n y··² ,    (4.29)

SSError = ∑i ∑j (yij − yi·)² = ∑i ∑j yij² − ∑i ni yi·² .    (4.30)
These formulas make the computation considerably easier (e.g., if calculators are used).

Under the assumption of a normal distribution, the sums of squares have χ²–distributions with the corresponding degrees of freedom. The ratios SS/df are called mean squares (MS). As we will show further on,

MSE = SSError / (n − s)    (4.31)

is an unbiased estimate of σ². For the test of hypothesis (4.23), the test statistic (4.27) is used, i.e.,

F = MSA / MSE = ((n − s)/(s − 1)) · SSA/SSError .    (4.32)

Under H0, F has an Fs−1,n−s–distribution. If

F > Fs−1,n−s;1−α ,    (4.33)

then H0 is rejected. For the realization of the analysis of variance we use Table 4.3.
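The quantities of Table 4.3 can be computed with a short routine; this is an illustrative sketch (the function name and the small data set are ours):

```python
import numpy as np

def one_way_anova(groups):
    """SSA, SSError, degrees of freedom, and the F statistic (4.32)."""
    y = np.concatenate(groups)
    n, s = len(y), len(groups)
    grand = y.mean()
    ss_a = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
    F = (ss_a / (s - 1)) / (ss_error / (n - s))
    return ss_a, ss_error, (s - 1, n - s), F

# hypothetical small data set with s = 3 treatments
groups = [np.array([55.5, 40.0, 38.5]),
          np.array([67.0, 57.0]),
          np.array([62.5, 31.5, 31.5])]
ss_a, ss_e, (df_a, df_e), F = one_way_anova(groups)
# reject H0 if F exceeds the quantile F_{s-1, n-s; 1-alpha}
```

By construction, ss_a and ss_e add up to the total sum of squares, mirroring (4.17).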
Remark. For the derivation of the test statistic (4.32) we used the results of Chapter 3, in particular those of Section 3.7. Hence, we did not prove again the independence of the χ²–distributions in the numerator and denominator of F (4.32).
Source of variation           SS                                  Degrees of freedom   MS               Test statistic F
Between the levels of A       SSA = ∑i ni yi·² − n y··²           dfA = s − 1          MSA = SSA/dfA    MSA/MSE
Within the levels of A        SSError = ∑i ∑j yij² − ∑i ni yi·²   dfE = n − s          MSE = SSE/dfE
Total                         SSTotal = ∑i ∑j yij² − n y··²       dfT = n − 1

Table 4.3. Layout for the analysis of variance; one–way classification.
Theorem 4.2 (Theorem by Cochran). Let zi ∼ N(0, 1), i = 1, . . . , v, be independent random variables and assume the following disjoint decomposition:

∑i zi² = Q1 + Q2 + · · · + Qs    (4.34)

with s ≤ v. Then Q1, . . . , Qs are independent χ²v1, . . . , χ²vs–distributed random variables if and only if

v = v1 + · · · + vs    (4.35)

holds.

Employing this theorem yields the following:

(i) SSTotal = ∑i ∑j (yij − y··)²    (4.36)

has n = ∑i ni summands, which have to satisfy one linear restriction (∑i ∑j yij = n y··). Hence, SSTotal has n − 1 degrees of freedom.

(ii) SSWithin = SSError = ∑i ∑j (yij − yi·)²    (4.37)

has n summands subject to the s linear restrictions ∑j yij = ni yi· (i = 1, . . . , s). Hence, SSWithin has n − s degrees of freedom.

(iii) SSBetween = SSA = ∑i ni (yi· − y··)²    (4.38)

has s summands, which have to satisfy one linear restriction (∑i ni yi· = n y··), and thus SSBetween has s − 1 degrees of freedom. Hence, for the decomposition (4.34), according to

SSTotal = SSError + SSA ,

we have the decomposition (4.35) of the degrees of freedom, i.e.,

n − 1 = (n − s) + (s − 1) ,

such that, according to Theorem 4.2, SSError and SSA have independent χ²–distributions, i.e., their ratio F (4.32) has an F–distribution.
4.2.3 Estimation of σ2 by MSError
In (3.62) we derived the statistic

s² = (1/(T − K)) (y − Xb0)′(y − Xb0)

as an unbiased estimate of σ² in the linear model. In our special case of model (4.4), and using

ŷ = Xb0 = ( y1· 1n1
            ...
            ys· 1ns )    (4.39)

according to (4.14) for s > 2, we receive (equating K = s, T = n):

s² = (1/(n − s)) ∑i (yi − yi· 1ni)′(yi − yi· 1ni)
   = (1/(n − s)) ∑i ∑j (yij − yi·)²    (4.40)
   = MSError .    (4.41)
Model (4.2) yields

yi· = µ + αi + εi· ,  εi· ∼ N(0, σ²/ni) ,    (4.42)

and, hence, in analogy to (3.61),

E(MSError) = (1/(n − s)) E[∑i ∑j (yij − yi·)²]
           = (1/(n − s)) E[∑i ∑j (εij² + εi·² − 2 εij εi·)]
           = (1/(n − s)) ∑i ∑j (σ² + σ²/ni − 2σ²/ni)
           = σ² .    (4.43)
Furthermore, it follows from (4.42) with (4.6) that

y·· = µ + (1/n) ∑i ni αi + ε·· = µ + ε·· ,  ε·· ∼ N(0, σ²/n) ,    (4.44)

E(εi· ε··) = (1/(ni n)) E[(∑j εij)(∑i ∑j εij)] = σ²/n .    (4.45)

Hence

yi· − y·· = αi + εi· − ε·· ,    (4.46)

E(yi· − y··)² = αi² + σ²/ni − σ²/n ,    (4.47)

holds and, thus,

E(MSA) = (1/(s − 1)) ∑i ni E(yi· − y··)²
       = σ² + (∑i ni αi²)/(s − 1) .    (4.48)
Hence, under H0 : α1 = · · · = αs = 0, MSA is an unbiased estimate of σ² as well. If H0 does not hold, the test statistic F (4.32) has an expectation larger than one.
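The expectations (4.43) and (4.48) can be illustrated by simulation; a sketch (all parameter values are ours) that averages MSError and MSA over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(42)
ni = np.array([14, 11, 8])                 # group sizes (our choice)
s, n = len(ni), ni.sum()
alpha = np.array([0.5, 0.0, -0.5])
alpha = alpha - (ni * alpha).sum() / n     # enforce sum_i n_i alpha_i = 0, cf. (4.3)
mu, sigma2 = 4.0, 0.25

msa, mse = [], []
for _ in range(5000):
    groups = [rng.normal(mu + a, np.sqrt(sigma2), k) for a, k in zip(alpha, ni)]
    y = np.concatenate(groups)
    grand = y.mean()
    ss_a = sum(k * (g.mean() - grand) ** 2 for g, k in zip(groups, ni))
    ss_e = sum(((g - g.mean()) ** 2).sum() for g in groups)
    msa.append(ss_a / (s - 1))
    mse.append(ss_e / (n - s))

expected_msa = sigma2 + (ni * alpha**2).sum() / (s - 1)   # (4.48)
```

The Monte Carlo averages of mse and msa settle near σ² and σ² + ∑i ni αi²/(s − 1), respectively, so MSA systematically exceeds σ² when the αi are not all zero.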
Example 4.3. The measured manufacturing times for the making of inlays (Table 4.1) represent one–way classified data. Here, Factor A represents the effect of a dentist on the manufacturing times; it has s = 3 levels (dentists A, B, C).

We may assume that the normality assumption holds if we replace the manufacturing times in Table 4.1 by their natural logarithms (the reason for this transformation is that time values usually have a skewed distribution).

The arrangement of the measured values in Table 4.4 follows Table 4.1; the analysis is given in Table 4.5. The analysis yields the test statistic F = 2.70 < 3.32 = F2,30;0.95 (Table C.6). Hence, the null hypothesis "the mean manufacturing times per inlay are equal for all three dentists" is not rejected.
i       1     2     3     4     5     6     7     8     9     10    11    12    13    14    Yi·            yi·
(A) 1   4.02  3.69  3.65  3.45  3.82  4.25  4.36  4.38  4.31  4.05  4.28  4.25  3.87  4.08  56.46 = Y1·   4.03 = y1·
(B) 2   4.20  4.04  3.51  3.61  4.32  4.09  3.77  4.03  4.18  3.99  4.09                    43.83 = Y2·   3.98 = y2·
(C) 3   4.14  3.45  3.45  3.97  3.92  4.14  3.69  2.97                                      29.73 = Y3·   3.72 = y3·
n = 33                                                                                      130.02 = Y··  3.94 = y··

Table 4.4. Logarithms of the manufacturing times from Table 4.1.

                            SS     df    MS           F
SSA     = 512.82 − 512.28 = 0.54    2    MSA = 0.27   F = 2.70
SSError = 515.76 − 512.82 = 2.94   30    MSE = 0.10
SSTotal = 515.76 − 512.28 = 3.48   32

Table 4.5. Analysis of variance table for Example 4.1.

Once again we want to point out the difference between Models I and II: The above result indicates that the three selected dentists do not differ with respect to their average manufacturing times per inlay. If, however, we want to test the effect that the factor dentist has on the manufacturing time, then the manufacturing times would have to be measured in a sample of s dentists selected at random, and the proportion of the variability due to dentists, compared to the total variation, would have to be tested. Hence, the comparison of means is not the point of interest, but rather the decomposition of the total variation into components (Model II).
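Recomputing Table 4.5 from the logarithms in Table 4.4 is straightforward; up to the rounding of the two-decimal logs, this reproduces F ≈ 2.7:

```python
import numpy as np

log_times = {   # Table 4.4: natural logarithms of the times in Table 4.1
    "A": [4.02, 3.69, 3.65, 3.45, 3.82, 4.25, 4.36, 4.38, 4.31, 4.05, 4.28, 4.25, 3.87, 4.08],
    "B": [4.20, 4.04, 3.51, 3.61, 4.32, 4.09, 3.77, 4.03, 4.18, 3.99, 4.09],
    "C": [4.14, 3.45, 3.45, 3.97, 3.92, 4.14, 3.69, 2.97],
}
groups = [np.array(v) for v in log_times.values()]
y = np.concatenate(groups)
n, s = len(y), len(groups)
ss_a = sum(len(g) * (g.mean() - y.mean()) ** 2 for g in groups)   # SSA, cf. (4.29)
ss_e = sum(((g - g.mean()) ** 2).sum() for g in groups)           # SSError, cf. (4.30)
F = (ss_a / (s - 1)) / (ss_e / (n - s))                           # approximately 2.7
```

Since F stays below the critical value 3.32 = F2,30;0.95, the test decision of Example 4.3 is reproduced.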
Remark.
(i) The above analysis was done on a PC with maximum precision. If calculators with two-digit precision are used, deviations in the SS's arise, but not in the test decision.

(ii) The model (4.4) assumes identical variances of εij in the s populations. ANOVA under unequal error variances is a Behrens–Fisher problem; it is discussed in Weerahandi (1995), which gives an exact test for comparing more than two variances.
4.3 Comparison of Single Means
4.3.1 Linear Contrasts
The multiple comparison of means, i.e., the test of H0 [(4.23)] against H1 [(4.24)], has two possible outcomes: acceptance of H0 (no treatment effect) or rejection of H0 (treatment effect). In the case of the first decision the analysis is finished, although a second run for the proof of an effect, with a larger sample size, could be done after appropriate power calculations.
If, however, H1 : αi ≠ 0 for at least one i (or, equivalently, µi = µ + αi ≠ µ + αj = µj for at least one pair (i, j), i ≠ j) is accepted, i.e., an overall treatment effect is proven, then the main interest lies in finding those populations that caused this overall effect. Hence, in this situation comparisons of pairs or of linear combinations are appropriate; that is, we test, for example,

H0 : µ1 = µ2

against

H1 : µ1 ≠ µ2

with the two–sample t–test by comparing y1· and y2· according to (1.5). Another possible hypothesis would be, for example, µ1 + µ2 = µ3 + µ4.

These hypotheses each stand for one linear constraint r = R′β with rank(R′) = 1. In the analysis of variance, a linear combination of means (in the population or in the sample) is called a linear contrast, as long as the following assumption is fulfilled.
Definition 4.3. A linear combination of means

c1 y1· + · · · + ca ya· = c′y

is called a linear contrast if

c′c ≠ 0 and ∑i ci = 0    (4.49)

holds.
Suppose we want to compare s populations with respect to their means, i.e., we assume

yij ∼ N(µi, σ²), i = 1, . . . , s, j = 1, . . . , ni ,    (4.50)

with yij and yi′j independent for i ≠ i′. Then

yi· ∼ N(µi, σ²/ni) .    (4.51)
Denote by

µ = (µ1, . . . , µs)′    (4.52)

the vector of the s expectations. Then every linear contrast in the expectations can be written as

c′µ with ∑i ci = 0 and c′c ≠ 0 .    (4.53)

The vector µ is not to be mistaken for the overall mean µ from (4.4). The test statistic for testing H0 : c′µ = 0 has the typical form

(c′y)² / Var(c′y)    (4.54)

with the vector

y′ = (y1·, . . . , ys·)    (4.55)

of the sample means. Thus, because of the independence of the s populations, we have (cf. (4.4))

c′y ∼ N(c′µ, σ² ∑i ci²/ni)    (4.56)

and, hence, under H0:

(c′y)² / (σ² ∑i ci²/ni) ∼ χ1² .    (4.57)

As always, MSError (4.41) is an unbiased estimate of the variance σ²; hence the test statistic is of the following form:

tn−s² = F1,n−s = (c′y)² / (MSError ∑i ci²/ni) ,    (4.58)

if the χ²–distributions of the numerator and denominator are independent, which can be proven by Cochran's Theorem 4.2. For the exact proof, see Proof 26, Appendix B.

Since, under H0 : c′µ = 0, a linear contrast is invariant to multiplication by a constant a ≠ 0:

a c′µ = 0 , a ∑i ci = 0 ,    (4.59)

it is advisable to eliminate the ambiguity by the standardization

c′c = 1 .    (4.60)
Definition 4.4. A linear contrast c′µ is normed if c′c = 1.
Definition 4.5. Two linear contrasts c′1µ and c′2µ are orthogonal if
c′1c2 = 0. (4.61)
Analogously, a system (c′1µ, . . . , c′vµ) of orthogonal contrasts is called an orthonormal system if

c′i cj = δij (i, j = 1, . . . , v)    (4.62)

holds, where δij is the Kronecker symbol.

The orthogonal contrasts are an essential aid in reducing the number of possible pairwise comparisons to the maximum number of independent hypotheses, and hence in ensuring testability.
Example 4.4. Assume we have s = 3 samples (3 levels of Factor A) and let the design be balanced (ni = r). The overall null hypothesis
H0 : µ1 = µ2 = µ3 (i.e., H0 : αi = 0 for i = 1, 2, 3) (4.63)
can be written, for example, as
H0 : µ1 = µ2 and µ2 = µ3 , (4.64)
or with linear contrasts as
H0 : c′1µ = 0 and c′2µ = 0    (4.65)

with

µ′ = (µ1, µ2, µ3)

and

c′1 = (1, −1, 0) ,    (4.66)
c′2 = (0, 1, −1) .    (4.67)

We have c′1c2 = −1; hence c′1µ and c′2µ are not orthogonal, and the quadratic forms (c′1y)² and (c′2y)² are not stochastically independent. If, however, we choose

c′1 = (1, −1, 0) , c′1c1 = 2 ,    (4.68)

as before, and

c′2 = (1, 1, −2) , c′2c2 = 6 ,    (4.69)

then c′1c2 = 0. Here c′1µ = 0 means µ1 = µ2, and c′2µ = 0 means (µ1 + µ2)/2 = µ3, so that both contrasts together represent H0 : µ1 = µ2 = µ3. The test statistic for H0 [(4.65)] is then of the form

F2,n−2 = ( r(c′1y)²/(c′1c1) + r(c′2y)²/(c′2c2) ) / MSError .    (4.70)

With the contrasts (4.68) and (4.69), we thus have, for the hypothesis H0 [(4.63)],

F2,n−2 = ( r(y1· − y2·)²/2 + r(y1· + y2· − 2y3·)²/6 ) / MSError .    (4.71)
4.3.2 Contrasts of the Total Response Values in the Balanced Case

We want to derive an interesting decomposition of the sum of squares SSA. We assume:

• s levels of Factor A (treatments);

• ni = r repetitions per treatment (balanced design);

• n = rs the total number of response values;

• Yi· = ∑j yij the total response of treatment i;

• Y′ = (Y1·, . . . , Ys·) the vector of the total response values; and

• SSA = (1/r) ∑i Yi·² − (1/(rs)) (∑i Yi·)²    (4.72)

(cf. (4.29) for the balanced case).

Under these assumptions the following rules apply (cf., e.g., Petersen, 1985, p. 92):

(i) Let c′1Y be a linear contrast of the total response values. Then

S1² = (∑i c1i Yi·)² / (r ∑i c1i²) = (c′1Y)² / (r c′1c1)    (4.73)

is a component of SSA with one degree of freedom. With

c1i Yi· ∼ N(0, r σ² c1i²) and hence c′1Y ∼ N(0, r σ² c′1c1) under H0,

we have under H0:

(c′1Y)² / (r c′1c1) = S1² ∼ σ² χ1² .    (4.74)

(ii) If c′2Y and c′1Y are orthogonal contrasts, then

S2² = (c′2Y)² / (r c′2c2)    (4.75)

is a component of SSA − S1².

(iii) If c′1Y, . . . , c′s−1Y is a complete system of orthogonal contrasts, then

S1² + · · · + Ss−1² = SSA    (4.76)

holds.
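Rule (iii) can be verified numerically: for a balanced layout and a complete orthogonal system of contrasts, the components Sk² sum to SSA. A sketch with simulated data and Helmert-type contrasts (our choice):

```python
import numpy as np

rng = np.random.default_rng(7)
s, r = 4, 6
data = rng.normal(5.0, 1.0, (s, r))   # balanced one-way layout, s treatments
Y = data.sum(axis=1)                  # treatment totals Y_i.

# complete system of s - 1 = 3 orthogonal (Helmert-type) contrasts
C = np.array([[1.0, -1.0, 0.0, 0.0],
              [1.0, 1.0, -2.0, 0.0],
              [1.0, 1.0, 1.0, -3.0]])
S2 = (C @ Y) ** 2 / (r * (C * C).sum(axis=1))   # components (4.73)

grand = data.mean()
ss_a = r * ((data.mean(axis=1) - grand) ** 2).sum()   # SSA for the balanced case
```

Any other complete orthogonal system gives the same total, since the rows of C span the contrast space.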
We now have a decomposition of SSA into s − 1 independent sums of squares. In the case of a normal distribution, these components have independent χ²–distributions. This decomposition corresponds to the decomposition of the G²–statistic in (I × 2)–contingency tables into (I − 1) independent, χ²–distributed G²–statistics for the analysis of the subeffects. In the case of a significant overall treatment effect, the main subeffects that contributed to the significance can thus be discovered. The significance of the subeffects, i.e., H0 : c′iY = 0 against H1 : c′iY ≠ 0, is tested with

tn−s² = F1,n−s = F1,s(r−1) = Si² / MSError .    (4.77)
Variance of Linear Contrasts
       Repetitions
i      1     2     3     4     5     6     Yi·     yi·     si
1      4.5   5.0   3.5   3.7   4.8   4.0   25.5    4.25    0.6091
2      3.8   4.0   3.9   4.2   3.6   4.4   23.9    3.98    0.2858
3      3.5   4.5   3.2   2.1   3.5   4.0   20.8    3.47    0.8116
4      3.0   2.8   2.2   3.4   4.0   3.9   19.3    3.22    0.6882
                                           Y·· = 89.5      y·· = 3.73

Table 4.6. Flexibility in dependency of four levels of Factor A (additives).

Source            df    Sum of squares   Mean squares   F ratio   F prob.
Between groups     3    4.0046           1.3349         3.3687    0.0389
Within groups     20    7.9250           0.3962
Total             23    11.9296

Table 4.7. Analysis of variance table for Table 4.6 in SPSS format.
If the s samples are independent, then the variance of a linear contrast is computed as follows:

(i) Contrast of the means. Let c′y = c1y1· + · · · + csys·; then

Var(c′y) = (c1²/n1 + · · · + cs²/ns) σ²    (4.78)

holds in general. In the balanced case (ni = r, i = 1, . . . , s) this expression simplifies to

Var(c′y) = (c′c/r) σ² .    (4.79)

(ii) Contrast of the totals. Let c′Y = c1Y1· + · · · + csYs·; then

Var(c′Y) = (n1c1² + · · · + nscs²) σ²    (4.80)

holds in general, and in the balanced design

Var(c′Y) = r c′c σ² .    (4.81)

The variance σ² of the population is estimated by MSError = s²; hence

V̂ar(c′y) = s² ∑i ci²/ni    (4.82)

and

V̂ar(c′Y) = s² ∑i ni ci²    (4.83)

are unbiased estimates of Var(c′y) and Var(c′Y).
Example 4.5. Consider the following balanced experimental design with r = 6 repetitions:

Factor A:   Level 1: control group (neither A1 nor A2);
            Level 2: additive A1;
            Level 3: additive A2;
            Level 4: additives A1 and A2 (combination).
Suppose the response Y is the flexibility of a plastic material, and that we are interested in the most favorable mixture in the sense of a reduction of the flexibility. The data are shown in Table 4.6.

We receive the analysis of variance table (Table 4.7) according to the layout of Table 4.3 in the SPSS format. The F–test rejects the hypothesis H0 : µ1 = µ2 = µ3 = µ4 with the statistic F3,20 = 3.3687 (p–value 0.0389). Hence, we can now compare pairs or combinations of treatments. For s = 4 levels, systems exist with s − 1 = 3 orthogonal contrasts. We consider the two systems in Tables 4.8 and 4.9.
In both systems the sums of squares S2 of the contrasts add upto SSA (SS Between Groups in Table 4.7) according to (4.76). WithMSError = 0.3962, the test statistics (4.77) are
Table 4.8   Table 4.9
  2.02        1.01
  2.61        9.10 *
  5.48 *      0.00
The 95%–quantile of the F1,23–distribution is 4.15, so that:
• the employment of at least one additive, compared to the control group, is significant (i.e., reduces the flexibility significantly); and
132 4. Single–Factor Experiments with Fixed and Random Effects
                             Treatment       1      2      3      4
Contrast                     response Yi·  25.5   23.9   20.8   19.3     c′Y      S²
A1 against A2                                0     +1     −1      0      3.1    0.8008
A1 or A2 against A1 and A2                   0     −1     −1     +2     −6.1    1.0336
A1 or A2 or A1 and A2
  against control group                     −3     +1     +1     +1    −12.5    2.1702
                                                                    Σ S² = 4.0046

Table 4.8. Orthogonal contrasts and test statistics S².

                             Treatment       1      2      3      4
Contrast                     response Yi·  25.5   23.9   20.8   19.3     c′Y      S²
A1                                          −1     +1     −1     +1     −3.1    0.4004
A2                                          −1     −1     +1     +1     −9.3    3.6038
A1 × A2                                     +1     −1     −1     +1      0.1    0.0004
                                                                    Σ S² = 4.0046

Table 4.9. Orthogonal contrasts and test statistics S².
• the employment of A2 (alone or in combination with A1) reduces the flexibility significantly.
The orthogonal contrasts of the response sums Yi· make a decomposition of the variability SSA, i.e., of the treatment effect, possible and hence enable the determination of significant subeffects. With F from (4.58), the orthogonal contrast of means, on the other hand, yields a test statistic for testing differences of treatments according to the linear function of the means given by the contrast.
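The decomposition of SSA can be checked numerically; a sketch using the totals Yi· from Table 4.6 and the contrast system of Table 4.9, assuming the balanced-case formula S² = (c′Y)²/(r · c′c):

```python
# Decomposition of SS_A by a complete system of orthogonal contrasts (r = 6).
totals = [25.5, 23.9, 20.8, 19.3]      # Y_i· from Table 4.6
r = 6
contrasts = {
    "A1":      [-1, +1, -1, +1],
    "A2":      [-1, -1, +1, +1],
    "A1 x A2": [+1, -1, -1, +1],
}

ss = {}
for name, c in contrasts.items():
    cY = sum(ci * Yi for ci, Yi in zip(c, totals))
    cc = sum(ci ** 2 for ci in c)
    ss[name] = cY ** 2 / (r * cc)      # S^2 of the contrast
print({k: round(v, 4) for k, v in ss.items()})
print(round(sum(ss.values()), 4))      # the S^2 add up to SS_A
```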
We demonstrate this with the same systems of orthogonal contrasts as in Tables 4.8 and 4.9. The results are shown in Tables 4.10 and 4.11. We have, for example (Table 4.11, first row),
c′y = (y2· + y4·) − (y1· + y3·) = 3.98 + 3.22 − (4.25 + 3.47) = −0.52 ,

Var̂(c′y) = (c′c/r) s² = (4/6) · 0.3962 = 0.2641 = 0.5140²
with s2 = MSError = 0.3962 from Table 4.7. The test statistic from (4.58),for
H0 : c′µ = (µ2 + µ4)− (µ1 + µ3) = 0 ,
i.e., for H0 : (α2 + α4) = (α1 + α3), is now
t24−4 = t20 = −0.520/0.514 = −1.002 .
The critical values are (Table C.5)
t20;0.95,one–sided = −1.73
and
t20;0.95,two–sided = ±2.09 ,
so that H0 is not rejected. We can see from Tables 4.10 and 4.11 that the following contrasts are significant:
(µ2 + µ3 + µ4)/3 − µ1 < 0
(the control group has a higher flexibility than the mean of the three treatments),
µ3 + µ4 − (µ1 + µ2) < 0
(A2 plus (A1 and A2) have a lower mean flexibility than the control group plus A1).

Commands and output in SPSS: The contrasts from Table 4.11 are requested with the commands
/contrast = -1 1 -1 1
/contrast = -1 -1 1 1
/contrast = 1 -1 -1 1
which is inserted into the SPSS procedure.
                             Treatment     1      2      3      4
Contrast                     mean yi·    4.25   3.98   3.47   3.22     c′y   √Var̂(c′y)     t20
A1 against A2                              0     +1     −1      0     0.52    0.3632      1.42
A1 or A2 against A1 and A2                 0     −1     −1     +2    −1.02    0.6292     −1.61
A1 or A2 or A1 and A2
  against control group                   −3     +1     +1     +1    −2.08    0.8902     −2.33 *

Table 4.10. Orthogonal contrasts of the means.
            Treatment     1      2      3      4
Contrast    mean yi·    4.25   3.98   3.47   3.22     c′y   √Var̂(c′y)     t20
A1                       −1     +1     −1     +1    −0.52    0.5142     −1.002
A2                       −1     −1     +1     +1    −1.54    0.5142     −2.996 *
A1 × A2                  +1     −1     −1     +1     0.02    0.5142      0.039

Table 4.11. Orthogonal contrasts of the means.
The obvious question, as to whether A2 should be employed alone or in combination with A1, could be tested with the two–sample t–test according to (1.5). With sA2 = 0.8116 and sA1 and A2 = 0.6882 (Table 4.6) we compute the pooled variance (1.6)

s² = 5 (0.8116² + 0.6882²) / (6 + 6 − 2) = 0.7524²

and

t10 = [(20.8/6 − 19.3/6)/0.7524] · √(6 · 6/(6 + 6)) = 0.5755 ,
so that H0 : µA2 = µ(A1 and A2) is not rejected (t10;0.95,one–sided = 1.81). Hence, the two treatments A2 and (A1 and A2) show no significant difference.
In the next section, however, we will integrate this problem of pairwise comparisons in the case of s treatments into the multiple test problem. As we will see, an adjustment of the degrees of freedom, or of the applied quantile, respectively, then has to be made.
4.4 Multiple Comparisons
4.4.1 Introduction
With the linear and, especially, with the orthogonal contrasts we have the possibility of testing selected linear combinations for significance and thus structuring the treatments. The starting point is a rejection of the overall equality µ1 = . . . = µs of the means of the response.
A number of statistical procedures exist for the comparison of single means or of groups of means. These procedures have the following different objectives:
• Comparison of all possible pairs of means (for s levels of A we have s(s − 1)/2 different pairs).

• Comparison of all s − 1 means with a control group selected in advance.
• Comparison of all pairs of treatments that were selected in advance.
• Comparison of any linear combinations of the means.
These procedures differ, besides their aims, especially with respect to the way in which they control the type I error. In one case the error is controlled on a per–comparison basis; in the other case the error is controlled simultaneously for all comparisons.

A multiple test procedure that conducts every pairwise comparison at a significance level α, i.e., that works on a per–comparison basis, is possible if the group comparisons are already planned at the beginning of the experiment. This is based mainly on the t–statistic. If we want to ensure the significance level α simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per–experiment basis.
The decision for one of the two procedures is to be made ahead of the experiment.
4.4.2 Experimentwise Comparisons
The most popular multiple procedures that control the error simultaneously are those of Dunnett (1955) for the comparison of s − 1 groups with a control group, of Tukey (1953) for all s(s − 1)/2 pairwise comparisons, and those of Scheffé (1953) for any linear combinations. The procedures of Tukey and Scheffé should be applied in the explorative phase of an experiment, in order to avoid comparisons that are suggested by the data. The main condition for all multiple procedures is the rejection of H0 : µ1 = · · · = µs.
Hint. A detailed representation and rating of the multiple test procedures can be found in Miller, Jr. (1981).
Procedure by Scheffé
Let c′µ, with Σi ci = 0, be any linear contrast of µ, and let c′y, with y′ = (y1·, . . . , ys·), be the corresponding contrast of the vector of means. We then have, for all c,

P(c′y − √S1−α ≤ c′µ ≤ c′y + √S1−α) = 1 − α   (4.84)

with (cf. (4.78))

S1−α = MSError (s − 1) (c1²/n1 + · · · + cs²/ns) Fs−1,n−s;1−α .   (4.85)
The null hypothesis H0 : c′µ = 0 is rejected if zero is not within the confidence interval. The multiple level is α.
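A minimal sketch of the Scheffé interval (4.84)/(4.85); the F–quantile must be supplied from a table, and the function name is ours:

```python
import math

def scheffe_interval(c, means, n_i, ms_error, f_quantile):
    """Scheffé interval c'ybar ± sqrt(S_{1-alpha}); f_quantile is
    F_{s-1, n-s; 1-alpha} taken from a table."""
    s = len(means)
    c_y = sum(ci * yi for ci, yi in zip(c, means))
    S = ms_error * (s - 1) * sum(ci ** 2 / ni for ci, ni in zip(c, n_i)) * f_quantile
    half = math.sqrt(S)
    return c_y - half, c_y + half

# Values of Example 4.5: contrast A2, MS_Error = 0.3962, F_{3,20;0.95} = 3.10
lo, hi = scheffe_interval([-1, -1, 1, 1], [4.25, 3.98, 3.47, 3.22],
                          [6, 6, 6, 6], 0.3962, 3.10)
print(round(lo, 2), round(hi, 2))   # the interval covers zero: H0 not rejected
```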
Procedure by Dunnett
Let group i = 1 be selected as the control group that is to be compared with the treatments (groups) i = 2, . . . , s. The [(1 − α) · 100%]–confidence intervals for the s − 1 pairwise comparisons “control – treatment” are of the form

(y1· − yi·) ± C1−α(s − 1, n − s) · sdi   (4.86)

with

sdi = √( MSError (1/n1 + 1/ni) ) .   (4.87)
The quantiles C1−α(s − 1, n − s) are given in special tables (one– and two–sided, cf. Woolson, 1987, Tables 13a and 13b, pp. 502–503; or Dunnett (1955; 1964)). We show an excerpt for C0.95(s − 1, n − s) in Tables 4.12 and 4.13. The hypothesis H0 : µ1 = µi (i = 2, . . . , s) is rejected:
                     s − 1
n − s      1      2      3      4      5
  5      2.57   3.03   3.39   3.66   3.88
 10      2.23   2.57   2.81   2.97   3.11
 15      2.13   2.44   2.64   2.79   2.90
 20      2.09   2.38   2.57   2.70   2.81

Table 4.12. [C0.95(s − 1, n − s)]–quantiles (two–sided).
                     s − 1
n − s      1      2      3      4      5
  5      2.02   2.44   2.68   2.85   2.98
 10      1.81   2.15   2.34   2.47   2.56
 15      1.75   2.07   2.24   2.36   2.44
 20      1.72   2.03   2.19   2.30   2.39

Table 4.13. [C0.95(s − 1, n − s)]–quantiles (one–sided).
• two–sided in favor of H1 : µ1 ≠ µi, if

|y1· − yi·| > C1−α(s − 1, n − s) · sdi ;   (4.88)

• one–sided in favor of H1 : µ1 > µi, if

y1· − yi· > C1−α(s − 1, n − s) · sdi ;   (4.89)

• one–sided in favor of H1 : µ1 < µi, if

y1· − yi· < −C1−α(s − 1, n − s) · sdi   (4.90)

holds. For all s − 1 comparisons the multiple level α is ensured.
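A sketch of the two–sided Dunnett decision (4.86)–(4.88); the quantile C1−α(s − 1, n − s) is supplied from a table such as Table 4.12, and the helper name is ours:

```python
import math

def dunnett_significant(y_control, y_treat, n_control, n_treat, ms_error, c_quant):
    """Two-sided decision (4.88): |y1. - yi.| > C * sd_i."""
    sd = math.sqrt(ms_error * (1.0 / n_control + 1.0 / n_treat))
    return abs(y_control - y_treat) > c_quant * sd

# Example 4.5 values: MS_Error = 0.3962, C_0.95(3, 20) = 2.57 (two-sided)
means = [4.25, 3.98, 3.47, 3.22]     # group 1 is the control
for i, yi in enumerate(means[1:], start=2):
    print(i, dunnett_significant(means[0], yi, 6, 6, 0.3962, 2.57))
```

Only the comparison of the control group with group 4 exceeds the critical limit.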
Procedure by Tukey
In the case of experiments in the explorative phase it is often not possible to fix the set of planned comparisons in advance. Hence, all s(s − 1)/2 possible pairwise comparisons are done. The two–sided test procedure by Tukey assumes the balanced case ni = r and controls the error experimentwise, i.e., for all s(s − 1)/2 comparisons the multiple level α holds. We compute the confidence intervals

(yi· − yj·) ± Tα   (i > j)   (4.91)

with

Tα = Qα(s, n − s) sd ,   (4.92)

sd = √(MSError/r) .   (4.93)
The quantiles Qα(s, n − s) are so–called Studentized range values, which are given in special tables (cf., e.g., Woolson, 1987, Table 14, pp. 504–505).
The set of null hypotheses H0(i, j) : µi = µj (i > j) is rejected in favor of H1 : H0 incorrect (i.e., µi ≠ µj for at least one pair i > j), if

|yi· − yj·| > Tα   (4.94)

holds. For all pairs (i, j), i > j, with |yi· − yj·| > Tα, we have a statistically significant treatment difference.
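A sketch of the Tukey comparison (4.91)–(4.94) for the balanced case, with the quantile Qα(s, n − s) supplied from a table (function name ours):

```python
import math

def tukey_significant_pairs(means, r, ms_error, q_quantile):
    """Compare all pairwise differences with T_alpha = Q * sqrt(MS_Error/r)."""
    T = q_quantile * math.sqrt(ms_error / r)
    pairs = []
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            if abs(means[i] - means[j]) > T:
                pairs.append((i + 1, j + 1))
    return T, pairs

# Example 4.5: Q_0.05(4, 20) = 3.95, MS_Error = 0.3962, r = 6
T, sig = tukey_significant_pairs([4.25, 3.98, 3.47, 3.22], 6, 0.3962, 3.95)
print(round(T, 2), sig)   # only the pair (1, 4) exceeds T
```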
Bonferroni Method
Suppose we want to conduct k ≤ s comparisons with a multiple level of at most α. In this situation the Bonferroni method can be applied. This method splits the risk α into equal parts α/k for the k comparisons. The basis is Bonferroni's inequality.
Let H1, . . . , Hk be the confidence intervals for the k comparisons. Denote by P(Hi) the probability that Hi is true (i.e., that Hi covers the respective parameter of the ith comparison). Then P(H1 ∩ · · · ∩ Hk) is the probability that all k confidence intervals cover the respective parameters. According to Bonferroni's inequality, we have
P(H1 ∩ · · · ∩ Hk) ≥ 1 − Σ_{i=1}^{k} P(H̄i) ,   (4.95)

where H̄i is the complementary event to Hi. If P(H̄i) = α/k is chosen, then the following holds for the simultaneous probability:

P(H1 ∩ · · · ∩ Hk) ≥ 1 − α .   (4.96)
Assume, for example, that k ≤ s contrasts c′iµ are to be tested simultaneously. The confidence intervals for c′iµ, according to the Bonferroni method, are then of the following form:
c′iy ± tn−s;1−α/2k · √MSError · √(c1²/n1 + · · · + cs²/ns) .   (4.97)
The test runs analogously to the procedure by Scheffé, i.e., if (4.97) does not contain zero, then H0 is rejected and the respective comparison is significant.
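A sketch of the Bonferroni interval (4.97); the adjusted t–quantile tn−s;1−α/2k must be supplied from a table (here t20;1−0.05/6 = 2.95), and the function name is ours:

```python
import math

def bonferroni_interval(c, means, n_i, ms_error, t_quantile):
    """Bonferroni confidence interval (4.97) for a contrast c'mu."""
    c_y = sum(ci * yi for ci, yi in zip(c, means))
    half = t_quantile * math.sqrt(ms_error) * math.sqrt(
        sum(ci ** 2 / ni for ci, ni in zip(c, n_i)))
    return c_y - half, c_y + half

# Third contrast of Table 4.10: (-3, +1, +1, +1), c'c = 12
lo, hi = bonferroni_interval([-3, 1, 1, 1], [4.25, 3.98, 3.47, 3.22],
                             [6, 6, 6, 6], 0.3962, 2.95)
print(round(lo, 4), round(hi, 4))   # zero is covered: not significant
```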
4.4.3 Selected Pairwise Comparisons
The “Least Significant Difference” (LSD)
Suppose we want to compare the means of two selected treatments, i.e., suppose we want to test H0 : µ1 = µ2 against H1 : µ1 ≠ µ2. The appropriate test statistic is

tdf = (y1· − y2·) / √Var̂(y1· − y2·) ,   (4.98)
where df is the number of degrees of freedom. For |t| > tdf;1−α/2 we reject H0, where tdf;1−α/2 is the two–sided quantile at the α probability level. If H0 is rejected, then µ1 is significantly different from µ2 at the α level.

|t| > tdf;1−α/2 is equivalent to

tdf;1−α/2 · √Var̂(y1· − y2·) < |y1· − y2·| .   (4.99)

Hence, every sample with a difference |y1· − y2·| that exceeds tdf;1−α/2 · √Var̂(y1· − y2·) indicates a significant difference between µ1 and µ2. According to (4.99), the left side is the smallest difference of y1· and y2· for which significance would be declared. Thus, we define (df is the number of degrees of freedom of s², the pooled variance of the two samples)

LSD = tdf;1−α/2 · √Var̂(y1· − y2·) = tdf;1−α/2 · √( s² (1/n1 + 1/n2) ) .   (4.100)
In the balanced case (n1 = n2 = r) we receive

LSD = tdf;1−α/2 · √(2s²/r) .   (4.101)
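A sketch of (4.100); with the values of Example 4.5 (s² = MSError = 0.3962, t20;0.975 = 2.09) the balanced LSD comes to about 0.76:

```python
import math

def lsd(s2, n1, n2, t_quantile):
    """Least significant difference (4.100); reduces to (4.101) if n1 = n2."""
    return t_quantile * math.sqrt(s2 * (1.0 / n1 + 1.0 / n2))

print(round(lsd(0.3962, 6, 6, 2.09), 2))
```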
Using the LSD is controversial, especially if it is used for comparisons suggested by the data (largest/smallest sample mean) or if all pairwise comparisons are done without correction of the test level. If the LSD is used for all pairwise comparisons (i.e., for s(s − 1)/2 comparisons in the case of s treatments), then these tests are not independent. Procedures based on the LSD that ensure the test level by correcting the quantiles exist (HSD, Duncan test). FPLSD and SNK, on the other hand, only ensure the global level.
Fisher’s Protected LSD (FPLSD)
This procedure starts out with the analysis of variance and tests the global hypothesis H0 : µ1 = · · · = µs with the statistic F = MSA/MSError from (4.32). If F is not significant the procedure stops. If F > Fs−1,n−s;1−α, i.e., differences of the means are significant, then all pairs of means yi· and yj· (i ≠ j) are tested for differences with

FPLSD = tn−s;1−α/2 · √( MSError (1/ni + 1/nj) ) .   (4.102)
For |yi· − yj·| > FPLSD we have a significant difference of means. Note that in (4.102) σ² is estimated by MSError. Hence, t now has n − s degrees
of freedom (instead of n1 + n2 − 2 degrees of freedom as in the two–sample case).
Tukey’s Honestly Significant Difference (HSD)
This procedure uses the Studentized range values Qα,(s,n−s) (cf. (4.92)) instead of the t–quantiles and replaces the standard error of the mean by the standard error of the difference (pooled sample). We compute

HSD = Qα,(s,n−s) · √(MSError/r) .   (4.103)
All differences of pairs |yi· − yj·| (i < j) are compared with HSD. For |yi· − yj·| > HSD we have a significant difference between µi and µj .
Student–Newman–Keuls Test (SNK)
The SNK test is a test in which the difference needed for significance varies with the degree of separation. Suppose we want to compare k means. The sample means are sorted in descending order

y(1)·, . . . , y(k)· ,

where y(i)· is the mean with the ith rank (i.e., y(1)· is the largest mean, y(k)· the smallest mean). We compute the SNK differences

SNKi = Qα,(i,df) · √(MSError/r)   (i = 2, . . . , k),   (4.104)

with Qα,(i,df) for df degrees of freedom of SSError and (in succession) i = 2, 3, . . . , k means.
If |y(1)· − y(k)·| < SNKk, then none of the differences of means are significant and the procedure stops. If |y(1)· − y(k)·| > SNKk, then this (largest) difference is significant. We proceed by testing whether

|y(2)· − y(k)·| > SNKk−1

and

|y(1)· − y(k−1)·| > SNKk−1

hold. If both conditions hold, then those differences of the rank–ordered means are tested where the ranks differ by k − 3. This procedure is continued up to the comparison of rank–neighbored means.
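A simplified sketch of the sequential SNK procedure: it tests every pair of rank–ordered means at each span and stops as soon as no difference at a span is significant, whereas the full rule additionally restricts sub–comparisons to ranges already found significant. The quantiles are supplied from a table, and the function name is ours:

```python
import math

def snk_test(means, r, ms_error, q_table):
    """q_table: dict mapping i -> Q_alpha(i, df). Returns the significant
    pairs (by descending rank) of rank-ordered means."""
    ordered = sorted(means, reverse=True)      # y_(1) >= ... >= y_(k)
    k = len(ordered)
    significant = []
    for span in range(k, 1, -1):               # number of means covered
        snk = q_table[span] * math.sqrt(ms_error / r)
        any_sig = False
        for start in range(0, k - span + 1):
            if ordered[start] - ordered[start + span - 1] > snk:
                significant.append((start + 1, start + span))
                any_sig = True
        if not any_sig:
            break                              # nothing significant at this span
    return significant

# Example 4.6: Q_0.05(i, 20) for i = 2, 3, 4
sig = snk_test([4.25, 3.98, 3.47, 3.22], 6, 0.3962, {2: 2.95, 3: 3.57, 4: 3.95})
print(sig)   # only the extreme pair of ranks is significant
```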
Duncan Test
Duncan (1975) modified the FPLSD procedure by computing alternative quantiles. The least significant difference is Bayes–adjusted and reads as follows:

BLSD = tB · √(2 MSError/r) .   (4.105)
The values tB are given in special tables (Waller and Duncan, 1972) and are printed in the SPSS procedure.
Hint. A number of multiple test procedures exist that work with other range values. These are implemented in the standard software.
Example 4.6. (Continuation of Example 4.5)
Table 4.6 yields:

Treatment    1      2      3      4
Rank         1      2      3      4
Mean        4.25   3.98   3.47   3.22

We had s = 4, r = 6, and n = 4 · 6 = 24, as well as MSError = 0.3962 for n − s = 20 degrees of freedom (Table 4.7). The hypothesis H0 : µ1 = · · · = µ4 was rejected.
Experimentwise Procedures
Procedure by Scheffé. The critical value (4.85) of the confidence interval (4.84) for any contrast c′µ is, with F3,20;0.95 = 3.10,

S1−α = 0.3962 · 3 · 3.10 · (c′c/6) = 0.61 · c′c .
We test the complete system of orthogonal contrasts of the means from Table 4.11 and receive:

             c′y    c′c   √S1−α    c′y ± √S1−α
A1         −0.52     4     1.57    [−2.09 , 1.05]
A2         −1.54     4     1.57    [−3.11 , 0.03]
A1 × A2     0.02     4     1.57    [−1.55 , 1.59]
The zero lies in all three intervals, hence H0 : c′µ = 0 is never rejected.
Procedure by Dunnett. In Example 4.5 Level 1 was designated as the control group. We conduct the multiple comparison (according to Dunnett) of the control group with Groups 2, 3, and 4. The critical limits (4.86) are (ni = nj = 6; cf. Tables 4.12 and 4.13), two–sided:

C1−α(3, 20) · √(0.3962 · 2/6) = 2.57 · 0.3634 = 0.9340

and one–sided:

C1−α(3, 20) · 0.3634 = 2.19 · 0.3634 = 0.7958 .
For the one–sided tests we receive
y1· − y2· = 0.27,
y1· − y3· = 0.78,
y1· − y4· = 1.03 * ,
and, hence, a significant difference between the control group and Group 4.
Procedure by Tukey. Here all 4 · 3/2 = 6 possible comparisons are conducted. With Q0.05(4, 20) = 3.95 and

sd = √(MSError/r) = √(0.3962/6) = 0.2570

the critical value (cf. (4.92)) is T0.05 = 3.95 · 0.2570 = 1.02.
(i, j)    |yi· − yj·|
(1, 2)       0.27
(1, 3)       0.78
(1, 4)       1.03 *
(2, 3)       0.51
(2, 4)       0.76
(3, 4)       0.25
Again, the difference between treatments 1 and 4 is significant.
Bonferroni Method. We conduct the k = 3 comparisons from Table 4.10 according to the Bonferroni method. The critical limit from (4.97) for a chosen contrast c′µ is

t20;1−0.05/(2·3) · √0.3962 · √(c′c/6) = 2.95 · (0.6294/2.4495) · √(c′c) = 0.7580 · √(c′c) .
Contrast                                     c′y    c′c   0.7580 · √(c′c)     Interval (4.97)
A1 against A2                               0.52     2        1.0720        [−0.5520 , 1.5920]
A1 or A2 against A1 and A2                 −1.02     6        1.8567        [−2.8767 , 0.8367]
A1 or A2 or A1 and A2 against control      −2.08    12        2.6258        [−4.7058 , 0.5458]
In the multiple comparison according to Bonferroni no contrast is statistically significant.
Selected Pairwise Comparisons
SNK Test. The Studentized range quantiles Q0.05,(i,df) for df = 20 degrees of freedom are

i               2      3      4
Q0.05,(i,20)  2.95   3.57   3.95
SNKi          0.76   0.92   1.02
This yields the following comparisons:

|y(1)· − y(4)·| = |4.25 − 3.22| = 1.03 > SNK4 = 1.02 .
Hence the largest difference is significant. Thus, we can proceed with the procedure:

|y(1)· − y(3)·| = |4.25 − 3.47| = 0.78 < SNK3 = 0.92 ,

|y(2)· − y(4)·| = |3.98 − 3.22| = 0.76 < SNK3 = 0.92 .

Here the SNK test stops. Therefore, the only significant difference is that between treatment 1 (control group) and treatment 4 (A1 and A2). The treatments (1, 2, 3), or (2, 3, 4), respectively, may be regarded as homogeneous.
SNK in SPSS
The procedure is started with /Ranges = snk
Note. SPSS computes the SNK statistic according to

SNK = √(MSError/2) · Qα,(i,df) · √(1/ni + 1/nj) ;

for ni = nj = r this yields the expression (4.104).
The SPSS printout is of the following form:

Multiple Range Test
Student-Newman-Keuls Procedure
Ranges for the .050 level
      2.95  3.57  3.95
The ranges above are table ranges.
The value actually compared with Mean(J)-Mean(I) is
      .4451 * Range * Sqrt(1/N(I) + 1/N(J))
(*) Denotes pairs of groups significantly
    different at the .050 level

              G G G G
              r r r r
              p p p p
              4 3 2 1
Mean   Group
3.22   Grp 4
3.47   Grp 3
3.98   Grp 2
4.25   Grp 1   *

Homogeneous Subsets

Subset 1
Group   Grp 4   Grp 3   Grp 2
Mean    3.22    3.47    3.98

Subset 2
Group   Grp 3   Grp 2   Grp 1
Mean    3.47    3.98    4.25
Tukey's HSD Test. We compute the HSD (4.103) according to

HSD = Qα,(4,20) · √(MSError/6) = 3.95 · 0.2569 = 1.01 .
The differences of pairs yi· − yj· (i < j) are
y1· − y2· = 4.25− 3.98 = 0.27,
y1· − y3· = 0.78,
y1· − y4· = 1.03, *
y2· − y3· = 0.51,
y2· − y4· = 0.76,
y3· − y4· = 0.25 ,
hence only |y1· − y4·| > HSD holds. SPSS call and printout:
/Ranges = tukey
Tukey--HSD Procedure
Ranges for the .050 level
      3.95  3.95  3.95

              G G G G
              r r r r
              p p p p
              4 3 2 1
Mean   Group
3.22   Grp 4
3.47   Grp 3
3.98   Grp 2
4.25   Grp 1   *
Fisher’s Protected LSD (FPLSD)
The FPLSD (4.102) at the 5% level is

t20;0.975 · √(0.3962 · 2/6) = 2.09 · 0.3634 = 0.76 .
With the differences of means calculated above, we receive
              G G G G
              r r r r
              p p p p
              4 3 2 1
Mean   Group
3.22   Grp 4
3.47   Grp 3
3.98   Grp 2       *
4.25   Grp 1     * *
The means µ1 and µ4 and µ1 and µ3, as well as the means µ2 and µ4, are significantly different according to this test.
4.5 Regression Analysis of Variance
For the description of the dependence of a variable Y on another (fixed) variable X by a regression model of the form

Y = α + βX + ε

we need pairs of observations (xi, yi), i = 1, . . . , n, i.e., for every x–value one y–value is observed.

Consider the following experimental design. For every x–value several observations of Y are realized:

xi :  yi1, . . . , yini .

This corresponds to the idea that a population of y–values belongs to a fixed x–value. The question of interest is whether a dependence exists between the y–samples, represented by their means yi·, and the factor X. First, we test whether the populations Yi have equal means (analysis of variance – multiple comparison of means).

If this hypothesis is rejected, we have reason for assuming a simple linear relationship

yi· = α + βxi + εi   (i = 1, . . . , s) .   (4.106)
The estimates of α and β are determined, under consideration of the sample sizes ni, according to the method of weighted least squares, i.e.,

Σ_{i=1}^{s} ni (yi· − α − βxi)²   (4.107)

is minimized with respect to α and β. Let n = Σ ni be the total number of observations. The weighted least squares estimates are then of the following form:

β̂ = [Σ ni xi yi· − (1/n)(Σ ni xi)(Σ ni yi·)] / [Σ ni xi² − (1/n)(Σ ni xi)²] ,   (4.108)

α̂ = y·· − β̂ x̄ ,   (4.109)

where yi· = (1/ni) Σj yij is the ith sample mean, y·· = (1/n) Σi Σj yij is the overall mean of all y–values, and x̄ = (1/n) Σ ni xi is the weighted mean of the x–values. We receive the estimated means according to

ŷi· = α̂ + β̂ xi .   (4.110)
We partition the sum of squares SSA as follows (all sums over i = 1, . . . , s):

SSA = Σ ni (yi· − y··)²   (4.111)
    = Σ ni (ŷi· − y··)² + Σ ni (yi· − ŷi·)²
    = SSModel + SSDeviation .
For the degrees of freedom we have

dfA = dfModel + dfDeviation ,   (4.112)

i.e.,

(s − 1) = 1 + (s − 2) .

If not only K = 2 parameters, but K parameters in general, are to be estimated, then

dfA = s − 1,   dfModel = K − 1,   dfDeviation = s − K .   (4.113)
The complete table of the regression analysis of variance is shown in Table 4.14. As a test value for the fit of the model we compute

F = MSModel / MSDeviation .   (4.114)

If F > FK−1,s−K;1−α the fit of the model is significant at the α level.
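The weighted estimates (4.108)/(4.109) can be sketched as follows; note that with the rounded group means of the example below the results deviate slightly from values computed from raw data:

```python
# Weighted least squares fit (4.108)-(4.109) of the group means on x.
def weighted_ls(x, ybar, n_i):
    n = sum(n_i)
    sxy = sum(ni * xi * yi for ni, xi, yi in zip(n_i, x, ybar))
    sx = sum(ni * xi for ni, xi in zip(n_i, x))
    sy = sum(ni * yi for ni, yi in zip(n_i, ybar))
    sxx = sum(ni * xi ** 2 for ni, xi in zip(n_i, x))
    beta = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)   # (4.108)
    alpha = sy / n - beta * sx / n                     # (4.109)
    return alpha, beta

# Rounded group means of Example 4.7 (Table 4.15)
alpha, beta = weighted_ls([2.2, 4.5, 9.3, 25.6],
                          [0.1080, 0.0820, 0.0520, 0.0480], [9, 8, 10, 10])
print(round(alpha, 3), round(beta, 4))
```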
Example 4.7. In a study the rate of abrasion of the silanized plastic material PMMA was determined for various levels of the proportion of quartz (Table 4.15).
Source of variation     SS        df      MS = SS/df   Test value
Model                   SSModel   K − 1   MSModel      MSModel/MSDev
Model deviation         SSDev     s − K   MSDev
Between the y–groups    SSA       s − 1   MSA          F = MSA/MSError
Within the y–groups     SSError   n − s   MSError
Total                   SSTotal   n − 1

Table 4.14. Table of the regression analysis of variance.
x [in volume % quartz]
x1 = 2.2      x2 = 4.5      x3 = 9.3      x4 = 25.6
0.1420        0.0964        0.0471        0.0451
0.1113        0.0680        0.0585        0.0311
0.1092        0.0964        0.0544        0.0458
0.1298        0.0764        0.0444        0.0534
0.0962        0.0749        0.0575        0.0488
0.0917        0.0813        0.0406        0.0508
0.0800        0.0813        0.0522        0.0440
0.0996        0.0813        0.0525        0.0549
0.1123                      0.0570        0.0539
                            0.0559        0.0526
y1· = 0.1080  y2· = 0.0820  y3· = 0.0520  y4· = 0.0480
n1 = 9        n2 = 8        n3 = 10       n4 = 10
y·· = 0.0710  n = 37
ŷ1· = 0.0878  ŷ2· = 0.0831  ŷ3· = 0.0733  ŷ4· = 0.0400

Table 4.15. Data of the rate of abrasion.
The null hypothesis H0 : all means are equal (i.e., the proportion of quartz has no effect on the rate of abrasion) is rejected, since the analysis of variance yields the test value (see Table 4.16)

F = MSA / MSError = 55.80 > 2.74 = F3,33;0.95 .

Hence, we fit a linear regression (4.110) to the means yi· of the s = 4 samples. The parameters are computed according to (4.108) and (4.109):

ŷi· = 0.0923 − 0.0020 xi   (i = 1, . . . , 4) .
SS                 df    MS                 Test value
SSM   = 0.01340     1    MSM   = 0.01340    F = 3.02
SSDev. = 0.00886    2    MSDev. = 0.00443
SSA   = 0.02226     3    MSA   = 0.00742    F = 55.80
SSE   = 0.00440    33    MSE   = 0.00013
SST   = 0.02667    36

Table 4.16. Table of the regression analysis of variance of the rate of abrasion.
These estimated values are shown in Table 4.15. We can now calculate the partition (4.111) of SSA (Table 4.16); the test value is

F = MSModel / MSDev. = 3.02 < 18.51 = F1,2;0.95 .

Hence, the null hypothesis H0 : β = 0 cannot be rejected.
4.6 One–Factorial Models with Random Effects
So far in this chapter we have discussed models with fixed effects. In the Introduction, however, we have already referred to the difference from models with random effects.

Models with fixed effects for the analysis of treatment effects are the standard in designed experiments. Models with random effects, however, occur in sample surveys where the grouping categories are random effects.
Examples: Quality control:
(i) Fixed effects: The daily production of five particular machines from an assembly line.

(ii) Random effects: The daily production of five machines, chosen at random, that represent the machines as a class.
The model with random effects is of the same structure as the model (4.2) with fixed effects:

yij = µ + αi + εij   (i = 1, . . . , s; j = 1, . . . , ni) .   (4.115)

The meaning of the parameter αi, however, has now changed. The αi are now the random effects of the ith treatment (ith machine). Hence, the αi are random variables whose distributions we have to specify. We assume

E(αi) = 0,   Var(αi) = σα² ,   (4.116)

and

E(εij αi) = 0,   E(αi αj) = 0   (i ≠ j) .   (4.117)
Then

yij ∼ (µ, σα² + σ²)   (4.118)

holds. In the model with fixed effects, the treatment effect A was represented by the parameter estimates α̂i, or µ̂i = µ̂ + α̂i, respectively. In the model with random effects, a treatment effect can be expressed by the so–called variance components. The variance σα² is estimated as a component of the entire variance. The absolute or relative size of this component then makes conclusions about the treatment effect possible.
The estimation of the variances σα² and σ² requires no assumptions about the distribution. For the test procedure and the computation of confidence intervals, however, we assume the normal distribution, i.e.,

εij ∼ N(0, σ²),   εij independent,
αi ∼ N(0, σα²),   αi independent,

and, hence,

yij ∼ N(µ, σα² + σ²) .   (4.119)
Unlike the model with fixed effects, the response values yij of a level i of the treatment (i.e., of the ith sample) are no longer uncorrelated:

E(yij − µ)(yij′ − µ) = E(αi + εij)(αi + εij′) = E(αi²) = σα² .   (4.120)

On the other hand, the response values of different samples are still uncorrelated (i ≠ i′, for any j, j′):

E(yij − µ)(yi′j′ − µ) = E(αi αi′) + E(εij εi′j′) + E(αi εi′j′) + E(αi′ εij) = 0 .   (4.121)
In the case of a normal distribution, uncorrelated can be replaced by independent.
Test of the Null Hypothesis H0 : σα² = 0 Against H1 : σα² > 0

The hypothesis H0 : “no treatment effect” for the two models is:

– fixed effects: H0 : αi = 0 for all i;
– random effects: H0 : σα² = 0 .
With the results of Section 4.2.3, which we can partly adopt, we have, for the model with random effects,

E(MSError) = σ² ,
i.e., MSError is an unbiased estimate of σ². We compute E(MSA) as follows:

SSA = Σ_{i=1}^{s} Σ_{j=1}^{ni} (yi· − y··)² ,

yi· = µ + αi + εi· ,
y·· = µ + ᾱ + ε·· ,
ᾱ = Σ ni αi / n ,

(yi· − y··) = (αi − ᾱ) + (εi· − ε··) .
With (4.116) and (4.117) we have

E(yi· − y··)² = E(αi − ᾱ)² + E(εi· − ε··)² ,   (4.122)

E(αi − ᾱ)² = E(αi²) + E(ᾱ²) − 2 E(αi ᾱ)
           = σα² [1 + Σ ni²/n² − 2 ni/n] ,   (4.123)

E(εi· − ε··)² = E(εi·²) + E(ε··²) − 2 E(εi· ε··)
             = σ²/ni + σ²/n − 2 σ²/n
             = σ² (1/ni − 1/n) .   (4.124)
Hence

Σ_{j=1}^{ni} E(yi· − y··)² = ni E(yi· − y··)²
  = σα² [ni + (ni/n)(Σ ni²/n) − 2 ni²/n] + σ² (1 − ni/n)

and

Σ_{i=1}^{s} ni E(yi· − y··)² = σα² [n − Σ ni²/n] + σ² (s − 1) .
We receive:

(i) in the unbalanced case

E(MSA) = (1/(s − 1)) E(SSA) = σ² + k σα²   (4.125)

with

k = (1/(s − 1)) (n − (1/n) Σ ni²) ;   (4.126)
(ii) in the balanced case (ni = r for all i, n = r · s)

k = (1/(s − 1)) (r · s − (1/(r · s)) · s · r²) = r ,   (4.127)

E(MSA) = σ² + r σα² .   (4.128)
This yields the unbiased estimate σ̂α² of σα²:

(i) in the unbalanced case

σ̂α² = (MSA − MSError)/k ;   (4.129)

(ii) in the balanced case

σ̂α² = (MSA − MSError)/r .   (4.130)
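A sketch combining (4.126) and (4.129); in the balanced case k reduces to r, so (4.130) is covered as well (the function name is ours):

```python
def sigma2_alpha_hat(ms_a, ms_error, n_i):
    """Variance component estimate (4.129) with k from (4.126)."""
    s = len(n_i)
    n = sum(n_i)
    k = (n - sum(ni ** 2 for ni in n_i) / n) / (s - 1)   # k = r if balanced
    return (ms_a - ms_error) / k

# Balanced data of Table 4.7: MS_A = 1.3349, MS_Error = 0.3962, r = 6
print(sigma2_alpha_hat(1.3349, 0.3962, [6, 6, 6, 6]))
```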
In the case of an assumed normal distribution we have

MSError ∼ σ² χ²n−s

and

MSA ∼ (σ² + k σα²) χ²s−1 .

The two distributions are independent; hence the ratio

(MSA/MSError) · σ²/(σ² + k σα²)

has a central F–distribution under the assumption of equal variances, i.e., under H0 : σα² = 0. Under H0 : σα² = 0 we thus have

MSA/MSError ∼ Fs−1,n−s .   (4.131)

Hence, H0 : σα² = 0 is tested with the same test statistic as H0 : αi = 0 (all i) in the model with fixed effects. The table of the analysis of variance remains unchanged.
                                           E(MS)
Source       SS        df       Fixed effects               Random effects
Treatment    SSA       s − 1    σ² + Σ ni αi²/(s − 1)       σ² + k σα²
Error        SSError   n − s    σ²                          σ²

Table 4.17. Expectations of MSA and MSError.
Example 4.8. (Continuation of Example 4.5)
We now regard the design from Table 4.6 as a model with random effects. The null hypothesis H0 : σα² = 0 is tested with the statistic from (4.131). Table 4.7 yields

F3,20 = 1.3349/0.3962 = 3.3687   (p–value: 0.0389) ,

hence H0 : σα² = 0 is rejected. The estimated components of variance are

σ̂² = MSError = 0.3962

and (cf. (4.130))

σ̂α² = (1.3349 − 0.3962)/6 = 0.1564 .
4.7 Rank Analysis of Variance in the Completely Randomized Design
4.7.1 Kruskal–Wallis Test
The previous models were designed for the case that the response values follow a normal distribution. We now consider the situation that the response is either continuous but not normal or that we have a categorical response. For this data situation, which is often found in practice, we want to conduct the one–factorial comparison of groups. We first discuss the completely randomized design.
The response values are yij with the two subscripts i = 1, . . . , s (groups) and j = 1, . . . , ni (subscript within the ith group). The data are collected according to the completely randomized design: n1 units are chosen at random from n = Σ ni units and are assigned to the treatment (group) 1, etc. The data structure is shown in Table 4.18.
          Group
   1       2     · · ·     s
  y11     y21    · · ·    ys1
   .       .               .
   .       .               .
 y1n1    y2n2    · · ·   ysns

Table 4.18. Data matrix in the completely randomized design.
To begin with, we choose the following linear additive model
yij = µi + εij (4.132)
and assume that
εij ∼ F (0, σ2) (4.133)
holds (where F is any continuous distribution). Additionally, we assume that the observations are independent within and between the groups.
The major statistical task is the comparison of the group means µi according to

H0 : µ1 = · · · = µs   against   H1 : µi ≠ µj (at least one pair i, j, i ≠ j).
The tests are based on the comparison of the rank sums of the groups, in analogy to the Wilcoxon test in the two–sample case. The ranking procedure assigns the rank 1 to the smallest value of all s groups, . . . , the rank n = Σ ni to the largest value of all s groups. These ranks Rij replace the original values yij of the response Table 4.18 according to Table 4.19.
          Group
     1       2     · · ·     s
    R11     R21    · · ·    Rs1
     .       .               .
     .       .               .
   R1n1    R2n2    · · ·   Rsns
Σ   R1·     R2·    · · ·    Rs·     R··
Mean r1·    r2·    · · ·    rs·     r··

Table 4.19. Rank values for Table 4.18.
The rank sums and rank means are

Ri· = Σ_{j=1}^{ni} Rij ,    R·· = Σ_{i=1}^{s} Ri· = n(n + 1)/2 ,

ri· = Ri·/ni ,    r·· = R··/n = (n + 1)/2 .
Under the null hypothesis all n!/(n1! · · · ns!) possible arrangements of the ranks have equal probability. Hence, for each of these arrangements we can compute a measure for the difference between the groups. One possible measure for the group difference is based on the comparison of the rank means ri· .
In analogy to the sum of squares SSA = Σ_{i=1}^{s} ni (yi· − y··)² (cf. (4.29)), Kruskal and Wallis (1952) constructed the following test statistic:

H = [12/(n(n + 1))] Σ_{i=1}^{s} ni (ri· − r··)²
  = [12/(n(n + 1))] Σ_{i=1}^{s} Ri·²/ni − 3(n + 1) .   (4.134)
The test statistic H is a measure for the variance of the sample rank means. For the case of ni ≤ 5, tables exist for the exact critical values (cf., e.g., Hollander and Wolfe, 1973, p. 294). For ni > 5 (i = 1, . . . , s), H is approximately χ²s−1–distributed.
Correction in the Case of Ties
If equal response values yij arise and mean ranks are assigned, then the following corrected test statistic is used:

HCorr = H · ( 1 − Σ_{k=1}^{r} (tk³ − tk)/(n³ − n) )⁻¹ .   (4.135)

Here r is the number of groups of tied values and tk is the number of equal response values within the kth group. If H > χ²s−1;1−α, the hypothesis H0 : µ1 = · · · = µs is rejected in favor of H1. If HCorr has to be used, the corrected value does not have to be calculated in the case of significance of H, since HCorr > H.
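A sketch of (4.134)/(4.135) with midranks for tied values (the function name is ours):

```python
from collections import Counter

def kruskal_wallis_corrected(groups):
    """Kruskal-Wallis statistic (4.134) with the tie correction (4.135);
    equals the uncorrected H when there are no ties."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    first = {}                       # value -> smallest rank position
    for pos, v in enumerate(pooled, start=1):
        first.setdefault(v, pos)
    counts = Counter(pooled)
    midrank = {v: first[v] + (counts[v] - 1) / 2.0 for v in counts}
    H = 12.0 / (n * (n + 1)) * sum(
        sum(midrank[v] for v in g) ** 2 / len(g) for g in groups) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in counts.values())
    return H / (1.0 - ties / float(n ** 3 - n))

# Manufacturing times of Table 4.1 (cf. Table 4.20)
A = [31.5, 38.5, 40.0, 45.5, 48.0, 55.5, 57.5, 59.0, 70.0, 70.0, 72.0, 74.5, 78.0, 80.0]
B = [33.5, 37.0, 43.5, 54.0, 56.0, 57.0, 59.5, 60.0, 65.5, 67.0, 75.0]
C = [19.5, 31.5, 31.5, 40.0, 50.5, 53.0, 62.5, 62.5]
h_corr = kruskal_wallis_corrected([A, B, C])
print(round(h_corr, 2))              # below chi^2_{2;0.95} = 5.99: H0 not rejected
```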
Example 4.9. We now compare the manufacturing times from Table 4.1 according to the Kruskal–Wallis test. Hint: In Example 4.1 the analysis of variance was done with the logarithms of the response values, since a normal distribution of the original values was doubtful. The null hypothesis was not rejected, cf. Table 4.5. The test statistic based on Table 4.20 is
    Dentist A           Dentist B           Dentist C
    Manufacturing       Manufacturing       Manufacturing
    time       Rank     time       Rank     time       Rank
    31.5        3.0     33.5        5.0     19.5        1.0
    38.5        7.0     37.0        6.0     31.5        3.0
    40.0        8.5     43.5       10.0     31.5        3.0
    45.5       11.0     54.0       15.0     40.0        8.5
    48.0       12.0     56.0       17.0     50.5       13.0
    55.5       16.0     57.0       18.0     53.0       14.0
    57.5       19.0     59.5       21.0     62.5       23.5
    59.0       20.0     60.0       22.0     62.5       23.5
    70.0       27.5     65.5       25.0
    70.0       27.5     67.0       26.0
    72.0       29.0     75.0       31.0
    74.5       30.0
    78.0       32.0
    80.0       33.0

    n1 = 14             n2 = 11             n3 = 8
    R1· = 275.5         R2· = 196.0         R3· = 89.5
    r1· = 19.68         r2· = 17.82         r3· = 11.19

Table 4.20. Computation of the ranks and rank sums for Table 4.1.
    H = 12/(33 · 34) [275.5²/14 + 196.0²/11 + 89.5²/8] − 3 · 34
      = 4.044 < 5.99 = χ²_{2;0.95} .
Since H is not significant we have to compute HCorr. Table 4.20 yields:
    r = 4 :   t1 = 3   (3 ranks of 3.0),
              t2 = 2   (2 ranks of 8.5),
              t3 = 2   (2 ranks of 23.5),
              t4 = 2   (2 ranks of 27.5).
Correction term:

    1 − [(3³ − 3) + 3 · (2³ − 2)]/(33³ − 33) = 1 − 42/35904 = 0.9988 ,

    HCorr = 4.044/0.9988 ≈ 4.05 .
The decision is that the null hypothesis H0 : µ1 = µ2 = µ3 is not rejected; an effect of the factor "dentist" cannot be demonstrated.
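The computations of Example 4.9 can be reproduced directly; a minimal sketch in Python (the data are the manufacturing times of Table 4.20; `kruskal_wallis` is our own helper, not a library routine):

```python
from collections import Counter

def kruskal_wallis(groups):
    """Kruskal-Wallis H (4.134) with tie correction (4.135), using mean ranks."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # mean rank of each distinct value: average of the positions it occupies
    rank, i = {}, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1, ..., j
        i = j
    H = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups) - 3 * (n + 1)
    corr = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    return H, H / corr

dentist_a = [31.5, 38.5, 40.0, 45.5, 48.0, 55.5, 57.5, 59.0,
             70.0, 70.0, 72.0, 74.5, 78.0, 80.0]
dentist_b = [33.5, 37.0, 43.5, 54.0, 56.0, 57.0, 59.5, 60.0, 65.5, 67.0, 75.0]
dentist_c = [19.5, 31.5, 31.5, 40.0, 50.5, 53.0, 62.5, 62.5]

H, H_corr = kruskal_wallis([dentist_a, dentist_b, dentist_c])
print(round(H, 3), round(H_corr, 3))   # both well below 5.99 = chi^2_{2;0.95}
```

With the unrounded intermediate values this gives H ≈ 4.044 and HCorr ≈ 4.05, confirming the decision above.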
4.7.2 Multiple Comparisons
In analogy to the reasoning in Section 4.4, we want to discuss the procedure in the case of a rejection of the null hypothesis H0 : µ1 = · · · = µs for ranked data.
Planned Single Comparisons
If we plan a comparison of two particular groups before the data are collected, then the Wilcoxon rank–sum test is the appropriate test procedure (cf. Section 2.5). The type I error, however, only holds for this particular comparison.
Comparison of All Pairwise Differences
The procedure for comparing all s(s − 1)/2 possible pairs (i, j) with i > j dates back to Dunn (1964). It is based on the Bonferroni method and assumes large sample sizes. The following statistics are computed from the differences ri· − rj· of the rank means (i ≠ j, i > j):
    zij = (ri· − rj·) / √( (n(n + 1)/12) · (1/ni + 1/nj) ) .        (4.136)
Let u_{1−α/(s(s−1))} be the [1 − α/(s(s−1))]–quantile of the N(0, 1)–distribution. The multiple testing rule that ensures the overall α–level for all s(s − 1) pairwise comparisons rejects

    H0 : µi = µj for all (i, j), i > j,        (4.137)

in favor of

    H1 : µi ≠ µj for at least one pair (i, j),

if

    |zij| > u_{1−α/(s(s−1))} for at least one pair (i, j), i > j .        (4.138)
Example 4.10. Table 4.6 shows the response values of the four treatments (i.e., control group, A1, A2, A1 ∪ A2) in the balanced randomized design. The analysis of variance, under the assumption of a normal distribution, rejected the null hypothesis H0 : µ1 = · · · = µ4. In the following, we conduct the analysis based on ranked data, i.e., we no longer assume a normal distribution. From Table 4.6 we compute the rank values in Table 4.21
    Control group    A1             A2             A1 ∪ A2
    Value   Rank     Value  Rank    Value  Rank    Value  Rank
    4.5     21.5     3.8    12.0    3.5     8.0    3.0     4.0
    5.0     24.0     4.0    16.5    4.5    21.5    2.8     3.0
    3.5      8.0     3.9    13.5    3.2     5.0    2.2     2.0
    3.7     11.0     4.2    19.0    2.1     1.0    3.4     6.0
    4.8     23.0     3.6    10.0    3.5     8.0    4.0    16.5
    4.0     16.5     4.4    20.0    4.0    16.5    3.9    13.5
    R1· = 104        R2· = 91       R3· = 60       R4· = 45
    r1· = 17.33      r2· = 15.17    r3· = 10.00    r4· = 7.50

Table 4.21. Rank table for Table 4.6.
and obtain the Kruskal–Wallis statistic

    H = 12/(24 · 25 · 6) Σ R²i· − 3 · 25
      = (1/300)(104² + 91² + 60² + 45²) − 75
      = 7.41 .
H0 is not rejected at the 5% level, since 7.41 < 7.81 = χ²_{3;0.95}. Hence, the nonparametric analysis stops.
For the demonstration of nonparametric multiple comparisons we now change to the 10% level. This yields H = 7.41 > 6.25 = χ²_{3;0.90}. Since H is already significant, HCorr does not have to be calculated. Hence, H0 : µ1 = · · · = µ4 can be rejected at the 10% level.
We can now conduct the multiple comparisons of the pairwise differences. The denominator of the test statistic zij (4.136) is

    √( ((24 · 25)/12) · (2/6) ) = √(50/3) = 4.08 .
    Comparison    ri· − rj·    zij
    1/2             2.16       0.53
    1/3             7.33       1.80
    1/4             9.83       2.41  *
    2/3             5.17       1.27
    2/4             7.67       1.88
    3/4             2.50       0.61

For α = 0.10 we obtain α/(s(s − 1)) = 0.10/12 = 0.0083, 1 − α/(s(s − 1)) = 0.9917, u0.9917 = 2.39. Hence, only the comparison 1/4 is significant.
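Recomputed directly from the rank sums of Table 4.21, Dunn's procedure can be sketched with the standard library's `NormalDist` (variable names are ours):

```python
from itertools import combinations
from statistics import NormalDist

# Rank sums from Table 4.21 (n_i = 6 observations in each of the s = 4 groups)
R = {1: 104, 2: 91, 3: 60, 4: 45}
n_i, s = 6, 4
n = s * n_i                                        # 24 pooled observations
denom = ((n * (n + 1) / 12) * (2 / n_i)) ** 0.5    # denominator of (4.136), ~4.08

alpha = 0.10
u = NormalDist().inv_cdf(1 - alpha / (s * (s - 1)))  # u_{0.9917}, ~2.39

for i, j in combinations(R, 2):
    z = abs(R[i] - R[j]) / n_i / denom             # |r_i. - r_j.| / denom
    print(f"{i}/{j}: z = {z:.2f}{'  *' if z > u else ''}")
```

Only the comparison 1/4 exceeds the Bonferroni-adjusted quantile.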
Comparison Control Group – All Other Treatments
If one of the s treatments is chosen as the control group and compared to the other s − 1 treatments, then the test procedure is the same, but with the u_{1−α/(2(s−1))}–quantile.
Example 4.11. (Continuation of Example 4.10) The control group is treatment 1 (no additives). The comparison with the treatments 2 (A1), 3 (A2), and 4 (A1 ∪ A2) is done with the test statistics z12, z13, z14. Here we have to use the u_{1−α/(2(s−1))}–quantile. We obtain 1 − 0.10/6 = 0.9833, u0.9833 = 2.126 ⇒ only the comparison 1/4 is significant.
4.8 Exercises and Questions
4.8.1 Formulate the one–factorial design with s = 2 fixed effects for the balanced case as a linear model in the usual coding and in effect coding.
4.8.2 What does the table of the analysis of variance look like in a two–factorial design with fixed effects?
4.8.3 What is the meaning of the theorem of Cochran? What effects can be tested with it?
4.8.4 In a field experiment three fertilizers are to be tested. The table of the analysis of variance is:

              SS     df    MS    F
    A         50
    Error
    Total    350     32

Name the hypothesis to be tested and the test decision.
4.8.5 Let c′y· be a linear contrast of the means y1·, . . . , ys· . Complete the following:

    c′y· ∼ N( ? , ? ) .

The test statistic for testing H0 : c′µ = 0 is

    ? ∼ χ²_df ,   df = ? .
4.8.6 How many independent linear contrasts exist for s means? What is a complete system of linear contrasts? Is this system unique?
4.8.7 Let c′1Y·, . . . , c′s−1Y· be a complete system of linear contrasts of the total response values Y· = (Y1·, . . . , Ys·)′. Assume that each contrast has the distribution

    c′iY· ∼ N( ? , ? ) .

Then

    (c′iY·)² / ? ∼ ?

and, if the contrasts are . . . , then

    SSA = ?

holds.
4.8.8 Let A1 be a control group and assume that A2 and A3 are two treatments. Name the contrasts for the comparison of:
A1 against A2 or A3;
A2 against A1; and
A3 against A1.
4.8.9 Describe the main concern of multiple comparisons and the two methods of comparison.

4.8.10 Assign the experimentwise designed multiple comparisons correctly into the following matrix:

            Scheffé   Dunnett   Tukey   Bonferroni
    (i)
    (ii)
    (iii)
    (iv)

(i) k ≤ s comparisons planned in advance;

(ii) set of any linear contrasts;

(iii) (s − 1) comparisons with a control group; and

(iv) all s(s − 1)/2 comparisons of means.

4.8.11 In the case of the two–sample t–test (balanced) the critical value is t_{n−1;1−α}. In the case of the Bonferroni procedure with three comparisons the critical value for each single comparison is t_{ ? ; ? }.

4.8.12 Name the assumptions in the model yij = µ + αi + εij with mixed effects. We have yij ∼ N( ? , ? ). Formulate the hypothesis H0 : no treatment effect!

4.8.13 Conduct the rank analysis of variance according to Kruskal–Wallis for the following table:

    Student A        Student B        Student C
    Points  Rank     Points  Rank     Points  Rank
      32               34               38
      39               37               40
      45               42               43
      47               54               48
      53               60               52
      59               75               61
      71               80
      85               95

Hint: Completely randomized design.
5 More Restrictive Designs
5.1 Randomized Block Design
In statistical practice, the experimental units are often not completely homogeneous. Usually, a grouping according to a stratification factor can be observed (clinical population: stratified according to the patient's age, degree of disease, etc.). If we have such prior information, then a gain in efficiency compared to the completely randomized experiment is possible by grouping into blocks. The experimental units are grouped into homogeneous groups (blocks), and the treatments are assigned to the experimental units within each block at random. Hence the block effect (differences between the blocks) can be separated from the experimental error. This leads to higher precision. The strategy of building blocks should yield a variability within each block that is as small as possible and a variability between blocks that is as large as possible.
The most widely used block design is the randomized block design (RBD). Here s treatments with r repetitions each (i.e., balanced) are assigned to a total of n = r · s experimental units. First, the experimental units are divided into r blocks with s units each, in such a way that the units within each block are as homogeneous as possible. The s treatments are then assigned to the s units at random, so that each treatment occurs exactly once per block.
© Springer Science+Business Media, LLC 2009
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_5
Example 5.1. We want to test s = 3 treatments A, B, C with r = 4 repetitions each in the randomized block design with respect to their effect. Assume the blocking factor to be ordinally scaled (e.g., r = 4 levels of intensity of a disease or r = 4 age groups).
The block design of the n = r · s = 12 experimental units then has the structure displayed in Table 5.1.

           Block                                  Block
    I    II   III   IV                      I    II   III   IV
    1    1    1     1       −→              A    B    C     B
    2    2    2     2    Randomization      B    A    A     C
    3    3    3     3                       C    C    B     A

Table 5.1. Randomized assignment of treatments per block.

The assignment of the s = 3 treatments
per block to the three units of the r = 4 blocks can be done via random numbers. Ranks 1, 2, or 3 are assigned to these random numbers, and the assignment to the treatments is then done according to a previously specified coding (rank 1: treatment A, rank 2: treatment B, rank 3: treatment C).
Example 5.2. Block II in Table 5.1:

    Unit   Random number   Rank   Treatment
    1      182             2      B
    2      037             1      A
    3      217             3      C
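The rank-of-random-numbers procedure of Example 5.2 is easily automated; a minimal sketch (the function name is ours; Python's standard random module stands in for a table of random numbers):

```python
import random

def randomize_block(treatments, rng=random):
    """Assign treatments to the units of one block via ranks of random numbers."""
    numbers = [rng.randrange(1000) for _ in treatments]   # e.g., 182, 037, 217
    # the unit whose random number has rank k receives the k-th treatment
    order = sorted(range(len(treatments)), key=lambda u: numbers[u])
    assignment = [None] * len(treatments)
    for rank_index, unit in enumerate(order):
        assignment[unit] = treatments[rank_index]
    return assignment

random.seed(1)
for block in range(1, 5):
    print(f"Block {block}: {randomize_block(['A', 'B', 'C'])}")
```

Each block receives every treatment exactly once, in a random order.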
The structure of the data is shown in Table 5.2, with the sums and means

    Yi· = Σ_j yij ,   yi· = Yi·/s       (block i),
    Y·j = Σ_i yij ,   y·j = Y·j/r       (treatment j),
    Y·· = Σ_i Yi· = Σ_j Y·j ,   y·· = Y··/(rs)   (total).
               Treatment j
    Block i    1     2     · · ·   s      Sum   Mean
    1          y11   y12   · · ·   y1s    Y1·   y1·
    2          y21   y22   · · ·   y2s    Y2·   y2·
    ...        ...   ...           ...    ...   ...
    r          yr1   yr2   · · ·   yrs    Yr·   yr·
    Sum        Y·1   Y·2   · · ·   Y·s    Y··
    Mean       y·1   y·2   · · ·   y·s          y··

Table 5.2. Data table for the randomized block design.
    Source      SS        df               MS        F
    Block       SSBlock   r − 1            MSBlock   FBlock
    Treatment   SSTreat   s − 1            MSTreat   FTreat
    Error       SSError   (r − 1)(s − 1)   MSError
    Total       SSTotal   sr − 1

Table 5.3. Analysis of variance table for the randomized block design.
The linear model for the randomized block design (without interaction) is
yij = µ + βi + τj + εij (5.1)
where
    yij   is the response of the jth treatment in the ith block;
    µ     is the average response of all experimental units (overall mean);
    βi    is the additive effect of the ith block;
    τj    is the additive effect of the jth treatment; and
    εij   is the random error of the experimental unit that receives the jth treatment in the ith block.
The following assumptions are made:
(i) The blocks are used for error control; hence the βi are random effects with

    βi ∼ N(0, σ²_β) .        (5.2)

(ii) Assume the treatments to be fixed factors. The τj are then fixed effects that represent the deviations from the overall mean µ. Hence the following constraint holds:

    Σ_{j=1}^{s} τj = 0 .        (5.3)
Remark. If, however, the treatment effects are to be regarded as random effects, then we assume
τj ∼ N(0, σ2τ ) (5.4)
and
E(βiτj) = 0 (for all i, j) (5.5)
instead of (5.3).
(iii) The εij are the random errors. We assume

    εij  i.i.d.∼  N(0, σ²)        (5.6)

and

    E(εij βi) = 0        (5.7)

as well as

    E(εij τj) = 0 .        (5.8)
Then
µi = µ + βi is the mean of the ith block
and
µj = µ + τj is the mean of the jth treatment.
Decomposition of the Error Sum of Squares
Using the identity

    yij − y·· = (yij − yi· − y·j + y··) + (yi· − y··) + (y·j − y··) ,        (5.9)

it can be shown that the following decomposition holds:

    Σ_i Σ_j (yij − y··)² = Σ_i Σ_j (yij − yi· − y·j + y··)²
                           + Σ_{i=1}^{r} s (yi· − y··)²
                           + Σ_{j=1}^{s} r (y·j − y··)² .        (5.10)
If the correction term is computed as

    C = Y²·· /(rs) ,        (5.11)

then the above sums of squares can be expressed as

    SSTotal = Σ_i Σ_j (yij − y··)² = Σ_i Σ_j y²ij − C ,        (5.12)
    SSBlock = s Σ_i (yi· − y··)² = (1/s) Σ_i Y²i· − C ,        (5.13)
    SSTreat = r Σ_j (y·j − y··)² = (1/r) Σ_j Y²·j − C ,        (5.14)
    SSError = SSTotal − SSBlock − SSTreat .        (5.15)
The F–ratios (cf. Table 5.3) are

    FBlock = (SSBlock/(r − 1)) / (SSError/((r − 1)(s − 1))) = MSBlock/MSError        (5.16)

and

    FTreat = (SSTreat/(s − 1)) / (SSError/((r − 1)(s − 1))) = MSTreat/MSError .        (5.17)
The significance of the treatment effect, i.e., H0 : τj = 0 (j = 1, . . . , s) for fixed effects and H0 : σ²_τ = 0 for random effects, is tested with FTreat.
Testing for Block Effects
Consider the completely randomized design of model (4.2) for the balanced case (ni = r for all i) and exchange the rows and columns (i.e., the meaning of i and j) in Table 4.2. If we additionally assume αi = τj, then the following model corresponds to the completely randomized design:

    yij = µ + τj + εij        (5.18)

with the constraint Σ τj = 0. The subscript i = 1, . . . , r represents the repetitions of the jth treatment (j = 1, . . . , s). Hence the completely randomized design (5.18) is a nested submodel of the randomized block design (5.1). Testing for significance of the block effect is therefore equivalent to a model choice between the complete model (here (5.1)) and a submodel restricted by the constraints H0 : βi = 0.
The appropriate test statistic for this problem was already derived in Section 3.8.2 as FChange (cf. (3.162)). FChange is of the following form:

    [error variance (small model) − error variance (large model)] / error variance (large model) .        (5.19)
Applied to our problem, we obtain for the "large" model (5.1), according to (5.15),

    SSError(large) = SSTotal − SSBlock − SSTreat .

In the "small" model (5.18) we have

    SSError(small) = SSTotal − SSTreat ,

hence FChange is now

    FChange = (SSBlock/(r − 1)) / (SSError(large)/((r − 1)(s − 1))) = FBlock .        (5.20)
This statistic tests the significance of the transition from the smaller model (completely randomized design) to the larger model (randomized block design) and hence the significance of the block effects.
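The equivalence (5.20) between the block-effect F-test and the model-comparison statistic FChange can be checked numerically; a minimal sketch with a small synthetic data set (r = 3 blocks, s = 2 treatments; all names and values are ours):

```python
def rbd_ss(y):
    """Sums of squares (5.12)-(5.15) for a block design y[i][j] (r blocks, s treatments)."""
    r, s = len(y), len(y[0])
    C = sum(map(sum, y)) ** 2 / (r * s)                               # (5.11)
    ss_total = sum(v**2 for row in y for v in row) - C                # (5.12)
    ss_block = sum(sum(row)**2 for row in y) / s - C                  # (5.13)
    ss_treat = sum(sum(row[j] for row in y)**2 for j in range(s)) / r - C  # (5.14)
    return ss_total, ss_block, ss_treat, ss_total - ss_block - ss_treat

y = [[3, 5], [4, 7], [6, 10]]          # synthetic response values
ss_total, ss_block, ss_treat, ss_error = rbd_ss(y)
r, s = 3, 2

f_block = (ss_block / (r - 1)) / (ss_error / ((r - 1) * (s - 1)))
# "small" model (5.18): completely randomized design, block effect absorbed into error
ss_error_small = ss_total - ss_treat
f_change = ((ss_error_small - ss_error) / (r - 1)) / (ss_error / ((r - 1) * (s - 1)))
assert abs(f_block - f_change) < 1e-9   # (5.20): testing blocks = model comparison
```

Since SSError(small) − SSError(large) = SSBlock by construction, the two statistics coincide for any data set.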
Estimates and Variances
The unbiased estimate of the jth treatment mean µj = µ + τj is given by

    µ̂j = Y·j /r = y·j .        (5.21)

The variance of this estimate is

    Var(y·j) = (1/r²) · r · Var(yij) = σ²/r    (for all j).        (5.22)

The unbiased estimate of the standard deviation of the estimates y·j is then

    s_{y·j} = √(MSError/r)    (j = 1, . . . , s) .        (5.23)

Hence, the (1 − α)–confidence intervals for the jth treatment means are given by

    y·j ± t_{(s−1)(r−1);1−α/2} √(MSError/r) .        (5.24)
For the simple comparison of two treatment means we obtain an unbiased estimate of their difference by y·j1 − y·j2, with standard deviation

    s_{(y·j1 − y·j2)} = √(2MSError/r) .        (5.25)

Hence the (1 − α)–confidence intervals for the differences of means are of the form

    (y·j1 − y·j2) ± t_{(s−1)(r−1);1−α/2} √(2MSError/r) .        (5.26)
Hint. Note the admissibility of simple comparisons.
Example 5.3. A physician wants to test the effect of three blood-pressure-lowering drugs (drug A, drug B, a combination of A and B) against a placebo as control group. The 12 patients are assigned to three blocks according to their weight. The measured response is the difference of the diastolic blood pressure between taking the drug at 6 o'clock a.m. and 6 o'clock p.m. The assignment to the treatments is done at random within each block. Table 5.4 shows the measured values, from which the analysis of variance table is calculated.
             Placebo   A    B    A and B
    Block    1         2    3    4          Σ      yi·
    1        5         7    4    12         28     7
    2        7         8    6    15         36     9
    3        9         9    8    18         44     11
    Σ        21        24   18   45         108
    y·j      7         8    6    15                9

Table 5.4. Blood pressure differences.
We now obtain

    C = Y²··/(rs) = 108²/12 = 972 ,
    SSTotal = 5² + · · · + 18² − C = 1158 − 972 = 186 ,
    SSBlock = (1/4)(28² + 36² + 44²) − C = 1004 − 972 = 32 ,
    SSTreat = (1/3)(21² + 24² + 18² + 45²) − C = 1122 − 972 = 150 ,
    SSError = 186 − 32 − 150 = 4 .
    Source   SS    df   MS     F
    Block    32    2    16     24.00
    Treat    150   3    50     75.00
    Error    4     6    0.67
    Total    186   11
The testing of H0 : τj = 0 (j = 1, . . . , 4) (no treatment effect) with FTreat = F3,6 = 75.00 leads to a rejection of H0 (F3,6;0.95 = 4.76); hence the treatment effect is significant. The test of the block effect yields significance with FBlock = F2,6 = 24.00 (F2,6;0.95 = 5.14); hence the randomized block design is a significant improvement over the completely randomized design.
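The analysis of Table 5.4 can be verified in a few lines; a minimal sketch (the data rows are the blocks, the columns the four treatments; variable names are ours):

```python
# Blood pressure differences from Table 5.4 (rows: blocks; columns: placebo, A, B, A and B)
y = [[5, 7, 4, 12],
     [7, 8, 6, 15],
     [9, 9, 8, 18]]
r, s = len(y), len(y[0])

C = sum(map(sum, y)) ** 2 / (r * s)                       # (5.11): 108^2/12 = 972
ss_total = sum(v**2 for row in y for v in row) - C        # (5.12): 186
ss_block = sum(sum(row)**2 for row in y) / s - C          # (5.13): 32
ss_treat = sum(sum(row[j] for row in y)**2 for j in range(s)) / r - C   # (5.14): 150
ss_error = ss_total - ss_block - ss_treat                 # (5.15): 4

ms_error = ss_error / ((r - 1) * (s - 1))
f_block = (ss_block / (r - 1)) / ms_error                 # (5.16): 24.00
f_treat = (ss_treat / (s - 1)) / ms_error                 # (5.17): 75.00
print(f_block, f_treat)
```

The printed F-ratios reproduce the analysis of variance table above.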
Consider the analysis of variance table in the completely randomized design with the same response values as in Table 5.4:
    Source   SS    df   MS    F
    Treat    150   3    50    11.11
    Error    36    8    4.5
    Total    186   11
Due to

    F = 11.11 > F3,8;0.95 = 4.07

the treatment effect here is significant as well.
    Treatments
    1/2    −1 ± 1.63   =⇒   [−2.63, 0.63]
    1/3     1 ± 1.63   =⇒   [−0.63, 2.63]
    1/4    −8 ± 1.63   =⇒   [−9.63, −6.37]   *
    2/3     2 ± 1.63   =⇒   [0.37, 3.63]   *
    2/4    −7 ± 1.63   =⇒   [−8.63, −5.37]   *
    3/4    −9 ± 1.63   =⇒   [−10.63, −7.37]   *

Table 5.5. Simple comparisons.
    Treatment means                  Standard error
    1      2      3      4           √(MSError/r)
    7      8      6      15          √(0.67/3) = 0.47

    Confidence intervals:
    7 ± 1.15    8 ± 1.15    6 ± 1.15    15 ± 1.15
    (Hint: t6;0.975 = 2.45, 2.45 √(MSError/r) = 1.15.)

Confidence intervals for the differences of means are given in Table 5.5.
(Hint: t6;0.975 √(2MSError/r) = 1.63.)
In the simple comparison of means the treatments 1 and 4, 2 and 3, 2 and 4, as well as 3 and 4, differ significantly. Using Scheffé's method (see Table 5.6) we find that treatments 1, 2, and 3 form a homogeneous subset which is separated from treatment 4, i.e., the means of treatments 2 and 3 do not differ significantly under the multiple tests.
    Treatments
    1/2    −1 ± 6.0494   =⇒   [−7.0494, 5.0494]
    1/3     1 ± 6.0494   =⇒   [−5.0494, 7.0494]
    1/4    −8 ± 6.0494   =⇒   [−14.0494, −1.9506]   *
    2/3     2 ± 6.0494   =⇒   [−4.0494, 8.0494]
    2/4    −7 ± 6.0494   =⇒   [−13.0494, −0.9506]   *
    3/4    −9 ± 6.0494   =⇒   [−15.0494, −2.9506]   *

Table 5.6. Multiple comparisons according to Scheffé.
Example 5.4. n = 16 students are tested with s = 4 training methods. The students are divided into r = 4 blocks according to their previous level of performance, and the training methods are then assigned at random within each block. The response is measured as the level of performance on a scale of 1 to 100 points. The results are shown in Table 5.7.

Again, we calculate the sums of squares and test for the treatment effect and the block effect:
             Training method
    Block    1      2      3      4       Σ      Means
    1        41     53     54     42      190    47.5
    2        47     62     58     41      208    52.0
    3        55     71     66     58      250    62.5
    4        59     78     72     61      270    67.5
    Σ        202    264    250    202     918
    Means    50.5   66.0   62.5   50.5           57.375

Table 5.7. Points.
    C = 918²/16 = 52670.25 ,
    SSTotal = 41² + · · · + 61² − C = 54524.00 − 52670.25 = 1853.75 ,
    SSBlock = (190² + · · · + 270²)/4 − C = 53691.00 − 52670.25 = 1020.75 ,
    SSTreat = (202² + · · · + 202²)/4 − C = 53451.00 − 52670.25 = 780.75 ,
    SSError = 1853.75 − 1020.75 − 780.75 = 52.25 .
    Source   SS        df   MS       F
    Block    1020.75   3    340.25   58.61   *
    Treat    780.75    3    260.25   44.83   *
    Error    52.25     9    5.81
    Total    1853.75   15
Both effects are significant:

    FTreat = F3,9 = 44.83 > 3.86 = F3,9;0.95 ,
    FBlock = F3,9 = 58.61 > 3.86 = F3,9;0.95 .
5.2 Latin Squares
In the randomized block design we divided the experimental units into homogeneous blocks according to a blocking factor and hence eliminated the differences among the blocks from the experimental error, i.e., we increased the part of the variability explained by the model.

We now consider the case where the experimental units can be grouped with respect to two factors, as in a contingency table. Hence two block effects can be removed from the experimental error. This design is called a Latin square.

If s treatments are to be compared, s² experimental units are required. These units are first classified into s blocks with s units each, based on one of the factors (row classification). The units are then classified into s groups with s units each, based on the other factor (column classification). The s treatments are then assigned to the units in such a way that each treatment occurs once, and only once, in each row and column.
Table 5.8 shows a Latin square for the s = 4 treatments A, B, C, D, which were assigned to the n = 16 experimental units by permutation.

    A  B  C  D
    B  C  D  A
    C  D  A  B
    D  A  B  C

Table 5.8. Latin square for s = 4 treatments.
This arrangement can be varied by randomization, e.g., by first defining the order of the rows by random numbers. We replace the lexicographical order A, B, C, D of the treatments by the numerical order 1, 2, 3, 4.
    Row   Random number   Rank
    1     131             2
    2     079             1
    3     284             3
    4     521             4
This yields the following row randomization:
    B  C  D  A
    A  B  C  D
    C  D  A  B
    D  A  B  C
Assume the randomization by columns leads to:
    Column   Random number   Rank
    1        003             1
    2        762             4
    3        319             3
    4        199             2
The final arrangement of the treatments would then be:
    B  A  D  C
    A  D  C  B
    C  B  A  D
    D  C  B  A
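The construction just described — a cyclic standard square whose rows and columns are then permuted at random — can be sketched as follows (the function name is ours):

```python
import random

def random_latin_square(treatments, rng=random):
    """Cyclic standard square, followed by row and column randomization."""
    s = len(treatments)
    # standard square: row i is the treatment sequence cyclically shifted by i
    square = [[treatments[(i + j) % s] for j in range(s)] for i in range(s)]
    rng.shuffle(square)                      # randomize the row order
    cols = list(range(s))
    rng.shuffle(cols)                        # randomize the column order
    return [[row[j] for j in cols] for row in square]

random.seed(5)
for row in random_latin_square(['A', 'B', 'C', 'D']):
    print(' '.join(row))
```

Row and column permutations preserve the Latin property: every treatment still occurs exactly once in each row and in each column.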
If a time trend is present, then the Latin square can be applied to separate these effects.
       I           II          III          IV
    A B C D     B C D A     C D A B     D A B C
    ─────────────────────────────────────────────→ time axis

Figure 5.1. Latin square for the elimination of a time trend.
5.2.1 Analysis of Variance
The linear model of the Latin square (without interaction) is of the following form:
yij(k) = µ + ρi + γj + τ(k) + εij (i, j, k = 1, . . . , s) . (5.27)
Here yij(k) is the response of the experimental unit in the ith row and the jth column, subjected to the kth treatment. The parameters are:
    µ      is the average response (overall mean);
    ρi     is the ith row effect;
    γj     is the jth column effect;
    τ(k)   is the kth treatment effect; and
    εij    is the experimental error.
We make the following assumptions:
    εij ∼ N(0, σ²) ,        (5.28)
    ρi ∼ N(0, σ²_ρ) ,        (5.29)
    γj ∼ N(0, σ²_γ) .        (5.30)
Additionally, we assume all random variables to be mutually independent. For the treatment effects we assume

(i) fixed: Σ_{k=1}^{s} τ(k) = 0 , or

(ii) random: τ(k) ∼ N(0, σ²_τ) ,

respectively. The treatments are distributed over all s² experimental units according to the randomization, such that each unit, or rather its response, has to carry the subscript (k) in order to identify the treatment. From the data table of the Latin square we obtain the marginal sums

    Yi· = Σ_{j=1}^{s} yij       the sum of the ith row;
    Y·j = Σ_{i=1}^{s} yij       the sum of the jth column; and
    Y·· = Σ_i Yi· = Σ_j Y·j     the total response.
For the treatments we calculate Tk, the sum of the response values of the kth treatment, and mk = Tk/s, the average response of the kth treatment.

           Treatment
           1     2     · · ·   s
    Sum    T1    T2    . . .   Ts     Σ_{k=1}^{s} Tk = Y··
    Mean   m1    m2    . . .   ms     Y··/s² = y··

Table 5.9. Sums and means of the treatments.
    Source      SS         df               MS         F
    Rows        SSRow      s − 1            MSRow      FRow
    Columns     SSColumn   s − 1            MSColumn   FColumn
    Treatment   SSTreat    s − 1            MSTreat    FTreat
    Error       SSError    (s − 1)(s − 2)   MSError
    Total       SSTotal    s² − 1

Table 5.10. Analysis of variance table for the Latin square.
The decomposition of the error sum of squares is as follows.
Assume the correction term is defined as

    C = Y²·· /s² .        (5.31)
Then we have

    SSTotal = Σ_i Σ_j y²ij − C ,        (5.32)
    SSRow = (1/s) Σ_i Y²i· − C ,        (5.33)
    SSColumn = (1/s) Σ_j Y²·j − C ,        (5.34)
    SSTreat = (1/s) Σ_k T²k − C ,        (5.35)
    SSError = SSTotal − SSRow − SSColumn − SSTreat .        (5.36)
The MS–values are obtained by dividing the SS–values by their degrees of freedom. The F–ratios are MS/MSError (cf. Table 5.10). The expectations of the MS are shown in Table 5.11.
    Source      MS         E(MS)
    Rows        MSRow      σ² + s σ²_ρ
    Columns     MSColumn   σ² + s σ²_γ
    Treatment   MSTreat    σ² + (s/(s − 1)) Σ_k τ²(k)
    Error       MSError    σ²

Table 5.11. E(MS).
The null hypothesis H0 : "no treatment effect", i.e., H0 : τ1 = · · · = τs = 0 against H1 : τi ≠ 0 for at least one i, is tested with

    FTreat = MSTreat/MSError .        (5.37)
Due to the design of the Latin square, each of the s treatments is repeated s times. Hence, treatment effects can be tested. On the other hand, we cannot always speak of a repetition of rows and columns in the sense of blocks. Hence, FRow and FColumn can only serve as indicators for additional effects which yield a reduction of MSError and thus an increase in precision. Row and column effects would be statistically detectable if repetitions were realized for each cell.
Point and Confidence Estimates of the Treatment Effects
The OLS estimate of the kth treatment mean µk = µ + τ(k) is

    mk = Tk/s        (5.38)

with the variance

    Var(mk) = σ²/s        (5.39)

and the estimated variance

    Var̂(mk) = MSError/s .        (5.40)

Hence the confidence interval is of the following form:

    mk ± t_{(s−1)(s−2);1−α/2} √(MSError/s) .        (5.41)

In the case of a simple comparison of two treatments, the difference is estimated by the confidence interval

    (mk1 − mk2) ± t_{(s−1)(s−2);1−α/2} √(2MSError/s) .        (5.42)
Example 5.5. The effect of s = 4 sleeping pills is tested on s² = 16 persons, who are stratified according to the design of the Latin square, based on the ordinally classified factors body weight and blood pressure. The response to be measured is the prolongation of sleep (in minutes) compared to an average value (without sleeping pills).
    Blood           Weight −→
    pressure
    ↓         A 43    B 57    C 61    D 74
              B 59    C 63    D 75    A 46
              C 65    D 79    A 48    B 64
              D 83    A 55    B 67    C 72

Table 5.12. Latin square (prolongation of sleep).

                  Weight
    Blood
    pressure   1     2     3     4     Yi·
    1          43    57    61    74    235
    2          59    63    75    46    243
    3          65    79    48    64    256
    4          83    55    67    72    277
    Y·j        250   254   251   256   1011

    Medicament   A       B       C       D       Total
    Total (Tk)   192     247     261     311     1011
    Mean         48.00   61.75   65.25   77.75   63.19
We calculate the sums of squares:

    C = 1011²/16 = 63882.56 ,
    SSTotal = 65939 − C = 2056.44 ,
    SSRow = (1/4) · 256539 − C = 252.19 ,
    SSColumn = (1/4) · 255553 − C = 5.69 ,
    SSTreat = (1/4) · 262715 − C = 1796.19 ,
    SSError = 2056.44 − (252.19 + 5.69 + 1796.19) = 2056.44 − 2054.07 = 2.37 .
    Source      SS        df   MS       F
    Rows        252.19    3    84.06    212.80   *
    Columns     5.69      3    1.90     4.80
    Treatment   1796.19   3    598.73   1496.83   *
    Error       2.37      6    0.40
    Total       2056.44   15
The critical value is F3,6;0.95 = 4.757. Hence the row effect (stratification according to blood pressure groups) is significant. The column effect (weight) is at best borderline: its F-value of about 4.80 lies just at the critical value. The treatment effect is clearly significant. The final conclusion should be that in further clinical tests of the four sleeping pills the experiment should be conducted according to the randomized block design with the blocking factor "blood pressure group".
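The sums of squares (5.31)–(5.36) for Example 5.5 can be verified as follows (a sketch; the data are those of Table 5.12, stored as (treatment, response) pairs; names are ours):

```python
# Table 5.12: (treatment, prolongation of sleep); rows = blood pressure, columns = weight
data = [[('A', 43), ('B', 57), ('C', 61), ('D', 74)],
        [('B', 59), ('C', 63), ('D', 75), ('A', 46)],
        [('C', 65), ('D', 79), ('A', 48), ('B', 64)],
        [('D', 83), ('A', 55), ('B', 67), ('C', 72)]]
s = len(data)

Y = sum(v for row in data for _, v in row)
C = Y**2 / s**2                                              # (5.31)
ss_total = sum(v**2 for row in data for _, v in row) - C     # (5.32)
ss_row = sum(sum(v for _, v in row)**2 for row in data) / s - C            # (5.33)
ss_col = sum(sum(row[j][1] for row in data)**2 for j in range(s)) / s - C  # (5.34)
T = {}                                                       # treatment totals T_k
for row in data:
    for k, v in row:
        T[k] = T.get(k, 0) + v
ss_treat = sum(t**2 for t in T.values()) / s - C             # (5.35)
ss_error = ss_total - ss_row - ss_col - ss_treat             # (5.36)
print(round(ss_row, 2), round(ss_col, 2), round(ss_treat, 2), round(ss_error, 2))
```

The printed values match the analysis of variance table of Example 5.5 up to rounding.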
The simple and multiple tests require SSError from the model with the main effect treatment only:
    Source      SS        df   MS       F
    Treatment   1796.19   3    598.73   27.60   *
    Error       260.25    12   21.69
    Total       2056.44   15
For the simple mean comparisons we obtain (t6;0.975 √(2MSError/4) = 8.058):

    Treatments   Difference   Confidence interval
    2/1          13.75        [5.68, 21.82]
    3/1          17.25        [9.18, 25.32]
    4/1          29.75        [21.68, 37.82]
    3/2           3.50        [−4.57, 11.57]
    4/2          16.00        [7.93, 24.07]
    4/3          12.50        [4.43, 20.57]
Result: In the case of the simple test all pairwise mean comparisons, except for 3/2, are significant. These tests, however, are not independent. Hence, we conduct the multiple tests.
Multiple Tests
The multiple test statistics (cf. (4.102)–(4.104)) with the degrees of freedom of the Latin square are

    FPLSD = t_{s(s−1);1−α/2} √(2MSError/s) ,        (5.43)
    HSD = Q_{α;(s, s(s−1))} √(MSError/s) ,        (5.44)
    SNKi = Q_{α;(i, (s−1)(s−2))} √(MSError/s) .        (5.45)
Results of the Multiple Tests
Fisher's protected LSD test:

    FPLSD = t12;0.975 √(2MSError/4) = 2.18 √(21.69/2) = 7.18 .

Hence, the means are different except for µ2 and µ3.
HSD test:
We have Q0.05;(4,12) = 4.20, hence

    HSD = 4.20 √(21.69/4) = 9.78 .
All the means except 2/3 differ significantly.
SNK test
The means ordered according to their size are

    48.00 (A),   61.75 (B),   65.25 (C),   77.75 (D).

The Studentized range values and the SNKi values calculated from them are

    i             2      3       4
    Q0.05;(i,6)   3.46   4.34    4.90
    SNKi          8.06   10.11   11.41
For the largest difference (D minus A) we have

    77.75 − 48.00 = 29.75 > 11.41 ,

for the next differences (D minus B) and (C minus A) we obtain

    77.75 − 61.75 = 16.00 > 10.11 ,
    65.25 − 48.00 = 17.25 > 10.11 ,

and, finally, we have

    (D minus C):  77.75 − 65.25 = 12.50 > 8.06 ,
    (C minus B):  65.25 − 61.75 =  3.50 < 8.06 ,
    (B minus A):  61.75 − 48.00 = 13.75 > 8.06 .

Hence all means except for 2/3 differ significantly.
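With the quantiles quoted above (hardcoded below, not computed), the LSD and HSD decisions of Example 5.5 can be reproduced; a sketch (names are ours):

```python
from itertools import combinations

means = {'A': 48.00, 'B': 61.75, 'C': 65.25, 'D': 77.75}
ms_error, s = 21.69, 4          # MS_Error of the treatment-only model

fplsd = 2.18 * (2 * ms_error / s) ** 0.5   # (5.43) with t_{12;0.975} = 2.18 -> ~7.18
hsd = 4.20 * (ms_error / s) ** 0.5         # (5.44) with Q_{0.05;(4,12)} = 4.20 -> ~9.78

for a, b in combinations(sorted(means, key=means.get), 2):
    d = abs(means[a] - means[b])
    print(f"{a}/{b}: {d:5.2f}  LSD:{'*' if d > fplsd else '-'}  "
          f"HSD:{'*' if d > hsd else '-'}")
```

Under both criteria every pairwise difference except B/C (3.50) is significant, in agreement with the results above.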
5.3 Rank Variance Analysis in the Randomized Block Design
5.3.1 Friedman Test
In the randomized block design, the individuals are grouped into blocks and are assigned one of the s treatments, randomized within each block. The essential requirement is that each treatment occurs once, and only once, within each block. The layout of the response values is shown in Table 5.2. Once again we assume the linear additive model (5.1). Furthermore, we assume
εiji.i.d.∼ F (0, σ2), (5.46)
where F is any continuous distribution and does not have to be equal tothe normal distribution. The randomization leads to independence of theεij . Hence, the actual assumption in (5.46) refers to the homogeneity ofvariance.
The hypothesis of interest is H0 : no treatment effect, i.e., we test

    H0 : τ1 = · · · = τs

against

    H1 : τi ≠ τj for at least one (i, j), i ≠ j .
The test procedure is based on the assignment of ranks (1 to s) to the response values, done separately for each block. Under the null hypothesis, each of the s! possible orders per block has the same probability. Analogously, the (s!)^r possible arrangements of the intra-block ranks are equally probable.
If we take the sums of ranks per treatment j = 1, . . . , s over the r blocks, then these sums should be almost equal if H0 holds. The test statistic by Friedman (1937) for testing H0 compares these rank sums.
               Treatment
    Block      1      · · ·   s
    1          R11    · · ·   Rs1
    ...        ...            ...
    r          R1r    · · ·   Rsr
    Sum        R1·    · · ·   Rs·
    Mean       r1·    · · ·   rs·

Table 5.13. Rank sums and rank means in the randomized block design.
The test statistic by Friedman is

    Q = 12r/(s(s + 1)) Σ_{j=1}^{s} (rj· − r··)²        (5.47)
      = 12/(rs(s + 1)) Σ_{j=1}^{s} R²j· − 3r(s + 1) .        (5.48)

Here we have

    Rj· = Σ_{i=1}^{r} Rji    the rank sum of the jth treatment,
    rj· = Rj·/r              the rank mean of the jth treatment,
    r·· = (s + 1)/2 .
If H0 holds, then the differences rj· − r·· are small and Q is small as well. If, however, H0 does not hold, then Q becomes large.
The test statistic Q is approximately (for r sufficiently large) χ²_{s−1}–distributed. Hence, H0 : τ1 = · · · = τs is rejected for

    Q > χ²_{s−1;1−α} .
For small values of r (r < 15) this approximation is insufficient. In this case exact quantiles are used (cf. tables in Hollander and Wolfe (1973); Michaelis (1971); and Sachs (1974), p. 424). If ties are present, then the correction term

    Ccorr = 1 − Σ_{i=1}^{r} Σ_{k=1}^{si} (t³ik − tik) / (rs(s² − 1))        (5.49)

is calculated.
is calculated. Here ti1 is the size of the first group of equally large responsevalues, ti2 is the size of the second group of equally large response values,etc., in the ith block.
The corrected Friedman statistic is

    Qcorr = Q/Ccorr .        (5.50)
The Friedman test is a test of homogeneity. It tests whether the treatment samples could possibly come from the same population.
Example 5.6. (Continuation of Example 5.3) We compare the s = 4 treatments, which are arranged in r = 3 blocks according to Table 5.4, with the Friedman test. From Table 5.4 we calculate the ranks in Table 5.14.
             Placebo   A      B     A and B
    Block    1         2      3     4
    1        2         3      1     4
    2        2         3      1     4
    3        2.5       2.5    1     4
    Sum      6.5       8.5    3     12
    rj·      2.17      2.83   1     4

Table 5.14. Rank table for Table 5.4.
The test statistic Q is

    Q = 12/(3 · 4 · 5) (6.5² + 8.5² + 3² + 12²) − 3 · 3 · 5
      = 267.5/5 − 45 = 8.5 .
Since we have ties in the third block, we compute

    Ccorr = 1 − (2³ − 2)/(3 · 4 · (4² − 1)) = 1 − 1/30 = 0.97

and

    Qcorr = Q/Ccorr = 8.76 .
The exact test yields 7.4 as the 95%–quantile. Hence, H0 : "homogeneity of the four treatments" is rejected.
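The Friedman computation of Example 5.6 can be sketched in a few lines (the function name is ours; carrying the unrounded correction factor gives Qcorr ≈ 8.79, while the value 8.76 above comes from rounding Ccorr to 0.97 first):

```python
def friedman(blocks):
    """Friedman Q (5.48) with tie correction (5.49)/(5.50); blocks: r lists of s responses."""
    r, s = len(blocks), len(blocks[0])
    rank_sums = [0.0] * s
    corr = 1.0
    for block in blocks:
        srt = sorted(block)
        # mean rank of each value within its block: average of first and last position
        ranks = [(srt.index(v) + 1 + len(srt) - srt[::-1].index(v)) / 2 for v in block]
        for j, rk in enumerate(ranks):
            rank_sums[j] += rk
        for v in set(block):                      # tie correction term (5.49)
            t = block.count(v)
            corr -= (t**3 - t) / (r * s * (s**2 - 1))
    Q = 12 / (r * s * (s + 1)) * sum(R**2 for R in rank_sums) - 3 * r * (s + 1)
    return Q, Q / corr, corr

blocks = [[5, 7, 4, 12], [7, 8, 6, 15], [9, 9, 8, 18]]   # Table 5.4
Q, Q_corr, C_corr = friedman(blocks)
print(round(Q, 2), round(C_corr, 2), round(Q_corr, 2))
```

Both Q = 8.5 and the corrected statistic exceed the exact 95%–quantile 7.4, so H0 is rejected.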
5.3.2 Multiple Comparisons
We assume that the null hypothesis H0 : τ1 = · · · = τs is rejected by the Friedman test. Analogously to Section 4.7.2, we distinguish between the planned single comparisons, all pairwise comparisons, and the comparison of a control group with all other treatments.
Planned Single Comparisons
If the comparison of two selected treatments is planned before the data collection, then the Wilcoxon test (cf. Chapter 2) is applied.
Comparison of all Pairwise Differences According to Friedman
The comparison of all s(s − 1)/2 possible pairs is based on a modification of the Friedman test (cf. Woolson, 1987, p. 387).
For each combination (j1, j2), j1 > j2, of treatments we compute the test statistic

    Zj1,j2 = |rj1· − rj2·| / √( s(s + 1)/(12r) )        (5.51)
for testing H0: τ_{j1} = τ_{j2} against H1: τ_{j1} ≠ τ_{j2}. All null hypotheses with Z_{j1,j2} > Q^P_{1−α}(r) are rejected, and the multiple level is α. Tables of the critical values Q^P_{1−α}(r) exist (cf., e.g., Woolson, 1987, Table 15, p. 506; Hollander and Wolfe, 1973). For α = 0.05 some selected values are:
r               2     3     4     5     6     7     8     9    10
Q^P_0.95(r)   2.77  3.31  3.63  3.86  4.03  4.17  4.29  4.39  4.47
Example 5.7. (Continuation of Example 5.3) For the differences of the rank means we obtain from Table 5.14 the following table (√(4(4 + 1)/(12 · 3)) = √(20/36) = 0.745):

Comparison   |r̄_{j1·} − r̄_{j2·}|     Test statistic
   1/2       |2.17 − 2.83| = 0.66         0.86
   1/3       |2.17 − 1.0|  = 1.17         1.57
   1/4       |2.17 − 4.0|  = 1.83         2.46
   2/3       |2.83 − 1.0|  = 1.83         2.46
   2/4       |2.83 − 4.0|  = 1.17         1.57
   3/4       |1.0 − 4.0|   = 3.00         4.03 *
Result: The treatment B and the combination (A and B) show differences in effect.
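The pairwise statistics (5.51) of Example 5.7 can be checked with a small sketch (the data layout and names are ours; the critical value 3.31 is Q^P_0.95(r) for r = 3 from the table above):

```python
import math
from itertools import combinations

# Rank means from Table 5.14 (1 = placebo, 2 = A, 3 = B, 4 = A and B)
rank_means = {1: 6.5 / 3, 2: 8.5 / 3, 3: 1.0, 4: 4.0}
r, s = 3, 4
denom = math.sqrt(s * (s + 1) / (12 * r))   # sqrt(20/36) ≈ 0.745
crit = 3.31                                  # Q^P_0.95(r) for r = 3

z_stats = {}
for j1, j2 in combinations(range(1, 5), 2):
    z_stats[(j1, j2)] = abs(rank_means[j1] - rank_means[j2]) / denom

significant = [pair for pair, z in z_stats.items() if z > crit]
print(significant)   # only (3, 4): B versus the combination (A and B)
```

Because the script keeps the rank means unrounded, the Z values differ slightly from the rounded values in the example table; the conclusion (only B vs. A and B differ) is the same.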
Remark: A well-known problem from screening trials is that of a large number s of treatments with limited replication r (r ≤ 4 blocks). Brownie and Boos (1994) demonstrate the validity of standard ANOVA and of rank-based ANOVA under nonnormality with respect to type I error rates when s becomes large.
Comparison Control Group versus All Other Treatments
Let j = 1 be the subscript of the control group. The test statistic for the multiple comparison of treatment 1 with the (s − 1) other treatments is

Z_{1j} = |r̄_{1·} − r̄_{j·}| / √( s(s + 1)/(6r) ) ,  j = 2, ..., s .
The two-sided quantiles Q^C_{1−α}(s − 1) are given in special tables (Woolson, 1987, p. 507; Hollander and Wolfe, 1973). For Z_{1j} > Q^C_{1−α}(s − 1) the corresponding null hypothesis H0: "homogeneity of the treatments 1 and j" is rejected. The multiple level α is ensured. In the following table we give a few selected critical values Q^C_0.95(s − 1):
s − 1               1     2     3     4     5
Q^C_0.95(s − 1)   1.96  2.21  2.35  2.44  2.51
Example 5.8. (Continuation of Example 5.3) The above table of the |r̄_{j1·} − r̄_{j2·}| yields the following results for the comparison "placebo against A, B, and combination":

1/2: Z_12 = 0.66/√(4 · 5/(6 · 3)) = 0.63 ,
1/3: Z_13 = 1.17/√(20/18) = 1.11 ,
1/4: Z_14 = 1.83/√(20/18) = 1.74 ,

all of which are < 2.35. Hence, no comparison is significant.
5.4 Exercises and Questions
5.4.1 Describe the strategy of building blocks (homogeneity/heterogeneity). Does the experimental error diminish or increase in the case of blocking?
5.4.2 How can it be shown that the completely randomized design is a submodel of the randomized block design? How can the block effect be tested? Name the correct F-test for the treatment effect in the following table:

                 SS    df    MS    F
    Block        20     3
    Treatment    60     3
    Error        10     9
    Total        90    15
5.4.3 Conduct a comparison of means according to Scheffé and Bonferroni for Example 5.3 (Table 5.4). Compare the results with those from Example 5.3 for the simple comparisons.
5.4.4 A Latin square is to test the effect of the s = 3 eating habits of decathletes, who are classified according to the ordinally classified factors sprinting speed and strength. Test for block effects and for the treatment effect (measured in points).
                 Speed →
    Strength ↓   A 40    B 50    C 80
                 C 50    A 45    B 65
                 B 70    C 70    A 60

    (Points above an average value.)
5.4.5 Conduct the Friedman test for Table 5.7. Define training method 1 as the control group and conduct a multiple comparison with the three other training methods.
6
Incomplete Block Designs
6.1 Introduction
In many situations the number of treatments to be compared is large. Then we need a large number of blocks to accommodate all the treatments, and in turn more experimental material. This may increase the cost of experimentation in terms of money, labor, time, etc. The completely randomized design and the randomized block design may not be suitable in such situations because they would require a large number of experimental units to accommodate all the treatments. In such cases, when a sufficient number of homogeneous experimental units is not available to accommodate all the treatments in a block, incomplete block designs are used, in which each block receives only some, and not all, of the treatments to be compared. Sometimes the available blocks can handle only a limited number of treatments for various reasons. For example, suppose the effect of twenty medicines for a rare disease from different companies is to be tested on patients. These medicines can be treated as treatments. It may be difficult to get a sufficient number of patients having the disease to conduct a complete block experiment. In such a case, a possible solution is to have fewer than twenty patients in each block. Then not all twenty medicines can be administered in every block. Instead, a few medicines are administered to the patients in one block and the remaining medicines to the patients in the other blocks. Incomplete block designs can be used in this setup. In another example, medical companies and biological experimenters need animals to conduct experiments to study the
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_6, © Springer Science+Business Media, LLC 2009
development of any new drug. Usually there is an ethics commission which studies the whole project and decides how many animals can be sacrificed in the experiment. Generally, the limits prescribed by the ethics commission are not sufficient to conduct a complete block experiment. Then there are two options: either reduce the number of treatments to be compared according to the number of animals in each block, or reduce the block size. In such cases, when the number of treatments to be compared is larger than the number of animals in each block, the block size is reduced and incomplete block designs can be used. As another example, in many experiments, if the per-unit cost of obtaining observations is high, then the experimenter would like a smaller number of observations to keep the cost of experimentation low. If the number of treatments is larger than the affordable number of observations per block, then incomplete block designs are more economical. In general, incomplete block designs need fewer observations per block than a complete block design to conduct tests of hypotheses without losing the efficiency of the design of the experiment. Incomplete block designs are used in these situations, and they result in a reduction of the experimental cost as well as of the experimental error. Some more examples of applications of incomplete block designs are presented in Hinkelmann and Kempthorne (2005).
The designs in which every block receives all the treatments are called complete block designs, whereas the designs in which every block receives only some, and not all, of the treatments are called incomplete block designs. In incomplete block designs, the block size is smaller than the total number of treatments to be compared.
We conduct two types of analysis for incomplete block designs: intrablock analysis and interblock analysis. In the intrablock analysis, the treatment effects are estimated after eliminating the block effects, and the analysis and tests of significance of the treatment effects are then conducted. If the blocking factor is not important, then the intrablock analysis is sufficient and the derived statistical inferences are correct and valid. There is a possibility, however, that the blocking factor is important and the block totals carry some information about the treatment effects. In such situations, one would like to utilize the information on block effects (instead of removing it, as in the intrablock analysis) in estimating the treatment effects. This is achieved through the interblock analysis of an incomplete block design, in which the block effects are considered random. When both the intrablock and the interblock analysis have been conducted, two estimates of the treatment effects are available, one from each analysis. A natural question then arises: is it possible to pool these two estimates and obtain an improved estimator of the treatment effects for testing hypotheses? Since such an estimator uses more information to estimate the treatment effects, it is naturally expected to provide better statistical inferences. This is achieved
by combining the intrablock and interblock analyses through the recovery of interblock information.
Our objective is to introduce two incomplete block designs, the balanced incomplete block design (BIBD) and the partially balanced incomplete block design (PBIBD), and the methodology to conduct their analysis of variance. In order to understand them, we first need to understand the general theory of incomplete block designs. So we first discuss the general theory of incomplete block designs with intrablock analysis, interblock analysis, and the recovery of interblock information. Then we introduce the BIBD and the PBIBD. The theory developed for a general incomplete block design is then applied in the analysis of these designs. The intrablock and interblock analyses of the BIBD are presented with an example showing the stepwise computations. For the PBIBD, we restrict ourselves to the intrablock analysis and an example demonstrating the steps involved in the computation and analysis. We do not aim to consider the construction of BIBD and PBIBD; only the analysis of these designs is presented. The reader is referred to Raghavarao (1971), Raghavarao and Padgett (1986), and Hinkelmann and Kempthorne (2005) for an excellent exposition of the construction of BIBD and PBIBD. For more details on incomplete block designs, see Chakrabarti (1963), John (1980), Dey (1986), and Hinkelmann and Kempthorne (2005).
6.2 General Theory of Incomplete Block Designs
First we formalize the notation and symbols to be used in this chapter. Let

v denote the number of treatments to be compared;
b the number of available blocks;
k_i the number of plots in the ith block;
r_j the number of plots receiving the jth treatment;
n the total number of plots, with n = Σ_{i=1}^{b} k_i = Σ_{j=1}^{v} r_j

(i = 1, 2, ..., b; j = 1, 2, ..., v).
Further, each treatment may occur more than once in a block or may not occur at all. Let n_{ij} be the number of times the jth treatment occurs in the ith block, so that

Σ_{j=1}^{v} n_{ij} = k_i ,  (i = 1, 2, ..., b) ,
Σ_{i=1}^{b} n_{ij} = r_j ,  (j = 1, 2, ..., v) ,
n = Σ_{i=1}^{b} Σ_{j=1}^{v} n_{ij} .
In matrix notation, the (b × v) matrix of the n_{ij} is denoted by

      ( n_11  n_12  · · ·  n_1v )
N =   ( n_21  n_22  · · ·  n_2v )
      (  ...   ...   . .    ... )
      ( n_b1  n_b2  · · ·  n_bv )

and is called the incidence matrix. The matrix N′N is called the concordance matrix. Note that

1_b′ N = (r_1, r_2, ..., r_v) = r′ ,
N 1_v = (k_1, k_2, ..., k_b)′ = k .
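These identities are easy to verify numerically. The following sketch uses a small BIBD with v = 4 treatments in b = 6 blocks of size 2; the design is our own illustrative choice, not an example from the book:

```python
import numpy as np

# Incidence matrix N (b x v): block i contains treatment j iff N[i, j] = 1.
# Blocks: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}
N = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 1]])
b, v = N.shape

r = np.ones(b) @ N    # 1_b' N = (r_1, ..., r_v): treatment replications
k = N @ np.ones(v)    # N 1_v = (k_1, ..., k_b): block sizes
print(r)              # [3. 3. 3. 3.]
print(k)              # [2. 2. 2. 2. 2. 2.]
print(r.sum() == k.sum() == N.sum())   # n = 12 either way
```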
Also, let

β = (β_1, β_2, ..., β_b)′ ,
τ = (τ_1, τ_2, ..., τ_v)′ ,
B = (B_1, B_2, ..., B_b)′ ,
V = (V_1, V_2, ..., V_v)′ ,
K = diag(k_1, k_2, ..., k_b) ,
R = diag(r_1, r_2, ..., r_v) ,

where B_i denotes the block total of the ith block and V_j denotes the treatment total of the jth treatment.
In general, a design is represented by D(v, b; r, k; n), where v, b, r, k and n are the parameters of the design.
Definition 6.1. A design is said to be proper if all the blocks have the same number of plots, i.e., k_i = k for all i.
Definition 6.2. A design is said to be equireplicate if each treatment is replicated an equal number of times, i.e., r_j = r for all j.
Definition 6.3. A design is said to be binary if n_{ij} takes only two values, viz., zero or one. Note that n_{ij} = 1 or 0 indicates the presence or absence, respectively, of the jth treatment in the ith block.
Definition 6.4. A linear function λ′β is said to be estimable if there exists a linear function l′y of the observations on the random variable y such that E(l′y) = λ′β.
Definition 6.5. A block design is said to be connected if all the elementary treatment contrasts are estimable.
Disconnected designs are useful for single-replicate factorial experiments arranged in blocks; they need never be used for experiments with at least two observations per treatment.
Definition 6.6. A connected design is said to be balanced, or more specifically variance balanced, if all the elementary contrasts of treatment effects can be estimated with the same precision. This definition does not hold for a disconnected design, since not all the elementary contrasts are estimable in such a design.
6.3 Intrablock Analysis of Incomplete Block Design
6.3.1 Model and Normal Equations
Let y_{ijm} denote the response from the mth replicate of the jth treatment in the ith block in the model

y_{ijm} = µ + β_i + τ_j + ε_{ijm} ;  i = 1, 2, ..., b; j = 1, 2, ..., v; m = 1, 2, ..., n_{ij} ,   (6.1)
where

µ is the general mean effect;
β_i is the fixed additive ith block effect;
τ_j is the fixed additive jth treatment effect; and
ε_{ijm} is the i.i.d. random error with ε_{ijm} ~ N(0, σ²).
The ith block total is B_i = Σ_j Σ_m y_{ijm}, the jth treatment total is V_j = Σ_i Σ_m y_{ijm}, and the grand total of all observations is G = Σ_i Σ_j Σ_m y_{ijm}.
If n_{ij} = 0 or 1 for all i and j, we omit the superfluous suffix m. The least squares estimators of µ, β_i and τ_j are µ̂, β̂_i and τ̂_j, respectively, which are the solutions of the following normal equations, obtained by minimizing the sum of squares Σ_i Σ_j Σ_m (y_{ijm} − µ − β_i − τ_j)² with respect to µ, β_i and τ_j, respectively:
nµ̂ + Σ_i n_{i·} β̂_i + Σ_j n_{·j} τ̂_j = G ,   (6.2)
n_{i·} µ̂ + n_{i·} β̂_i + Σ_j n_{ij} τ̂_j = B_i ,   (6.3)
n_{·j} µ̂ + Σ_i n_{ij} β̂_i + n_{·j} τ̂_j = V_j ,   (6.4)

where n_{i·} = Σ_j n_{ij} and n_{·j} = Σ_i n_{ij}. The normal equations (6.2)-(6.4) can be expressed in matrix notation as
( n      1_b′K    1_v′R ) ( µ̂ )     ( G )
( K1_b   K        N     ) ( β̂ )  =  ( B )   (6.5)
( R1_v   N′       R     ) ( τ̂ )     ( V )
where, e.g., 1_b denotes a (b × 1) vector with all elements equal to unity. When the interest lies in testing the significance of the treatment effects, we eliminate the block effects β̂ from the normal equations by premultiplying both sides of (6.5) by

( 1    0          0      )
( 0    I_b        −NR⁻¹  )
( 0    −N′K⁻¹     I_v    )

and obtain the following set of equations:
nµ̂ + 1_b′K β̂ + 1_v′R τ̂ = G ,   (6.6)
(K − NR⁻¹N′) β̂ = B − NR⁻¹V ,   (6.7)
(R − N′K⁻¹N) τ̂ = V − N′K⁻¹B ,   (6.8)
where

K⁻¹ = diag( 1/k_1, 1/k_2, ..., 1/k_b )

and

R⁻¹ = diag( 1/r_1, 1/r_2, ..., 1/r_v ) .
The reduced normal equation (6.8) is represented by

Q = C τ̂   (6.9)

and is often termed the intrablock equation of treatment effects, where

Q = (Q_1, Q_2, ..., Q_v)′ = V − N′K⁻¹B   (6.10)

and

C = R − N′K⁻¹N .   (6.11)
The (v × 1) vector Q is called the vector of adjusted treatment totals; it is termed adjusted in the sense that it is adjusted for block effects. The (v × v) matrix C is called the reduced intrablock matrix or C-matrix of the incomplete block design. The C-matrix is symmetric and singular because its row and column sums are zero, as C1_v = 0. Thus rank(C) ≤ v − 1.
The intrablock estimates of µ and τ are thus obtained as

µ̂ = G/(bk) ,   (6.12)
τ̂ = C⁻Q ,   (6.13)

where C⁻ is a generalized inverse of C. We note from (6.10) that

Q_j = V_j − Σ_{i=1}^{b} n_{ij} B_i / k_i ,  j = 1, 2, ..., v ,   (6.14)

where B_i/k_i is the average response per plot in the ith block, so that n_{ij} B_i/k_i is the average contribution to the jth treatment total from the ith block. Observe that Q_j is obtained by removing the sum of the average contributions of the b blocks from the jth treatment total V_j.
The diagonal and off-diagonal elements of the C-matrix in (6.11) are

c_{jj} = r_j − Σ_{i=1}^{b} n²_{ij}/k_i ,  j = 1, 2, ..., v ,   (6.15)

and

c_{jj′} = − Σ_{i=1}^{b} n_{ij} n_{ij′}/k_i ,  j ≠ j′ ,   (6.16)

respectively.
If rank(C) < v − 1, then not all elementary treatment contrasts are estimable, and the design is not connected. A design is connected if and only if rank(C) = v − 1. The following rules, given by Chakrabarti (1963), can be used to determine the connectedness of a design.

Rule 1: The design is connected if every element of C is nonzero.

Rule 2: The design is connected if C contains a column (or row) of nonzero elements.

Rule 3: Find the nonzero elements of the last row of C. The design is connected if at least one element in any row above these elements is nonzero.
Definition 6.7. For proper binary equireplicate designs,

C = rI − N′N/k .
The intrablock equations of treatment effects were obtained by eliminating the block effects from (6.2)-(6.4). Similarly, the treatment effects can be eliminated from (6.2)-(6.4), and the intrablock equations of block effects are found from (6.7) as

P = D β̂   (6.17)

where

P = B − NR⁻¹V ,   (6.18)
D = K − NR⁻¹N′ .   (6.19)
The (b × b) matrix D is symmetric and singular because its row and column sums are zero, as D1_b = 0. Thus rank(D) ≤ b − 1. The (b × 1) vector P is known as the vector of adjusted block totals; it is called adjusted in the sense that it is adjusted for treatment effects.
In fact, the relationship between the ranks of C and D is given by

b + rank(C) = v + rank(D) .   (6.20)
The relationship (6.20) is proved in Appendix B.3 (Proof 27). Thus if rank(C) = v − 1, then every treatment contrast is estimable. A similar consideration for a linear function of block effects to be estimable is that it must be a block contrast; with rank(C) = v − 1, (6.20) gives rank(D) = b − 1. Thus every block contrast is estimable if rank(D) = b − 1.
So a necessary and sufficient condition for every block contrast and treatment contrast to be estimable is that rank(C) = v − 1. This is the same as the condition for a design to be connected.
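The rank relationship (6.20) can be checked numerically. A sketch using a small BIBD of our own choosing (v = 4, b = 6, k = 2, r = 3):

```python
import numpy as np

N = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
              [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]], float)
b, v = N.shape
K = np.diag(N.sum(axis=1))
R = np.diag(N.sum(axis=0))

C = R - N.T @ np.linalg.inv(K) @ N    # (6.11), v x v
D = K - N @ np.linalg.inv(R) @ N.T    # (6.19), b x b

rc = np.linalg.matrix_rank(C)         # v - 1 = 3: the design is connected
rd = np.linalg.matrix_rank(D)         # b - 1 = 5
print(b + rc == v + rd)               # (6.20): 6 + 3 == 4 + 5
```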
6.3.2 Covariance Matrices of Adjusted Treatment and Block Totals
The covariance matrices of the adjusted treatment totals and the adjusted block totals are

V(Q) = (R − N′K⁻¹N)σ² = Cσ²   (6.21)

and

V(P) = (K − NR⁻¹N′)σ² = Dσ² ,   (6.22)

respectively. The covariance between B and Q is

Cov(B, Q) = 0 .   (6.23)
Thus the adjusted treatment totals are orthogonal to the block totals. The expressions (6.21)-(6.23) are derived in Appendix B.3 (Proof 28).
Next, the covariance matrix between Q and P is

Cov(Q, P) = (N′K⁻¹NR⁻¹N′ − N′)σ² .

Thus Q and P are orthogonal when

Cov(Q, P) = 0 ,
i.e.,  N′K⁻¹NR⁻¹N′ − N′ = 0   (6.24)
or  CR⁻¹N′ = 0  (using C = R − N′K⁻¹N)   (6.25)
or  N′K⁻¹D = 0  (using D = K − NR⁻¹N′).   (6.26)
Thus if any one of the conditions (6.24), (6.25) and (6.26) is satisfied, then Q and P are orthogonal and the design is said to be an orthogonal block design.
So in order that the adjusted block totals may be orthogonal to the adjusted treatment totals, the design is either not connected or the incidence matrix N is such that n_{ij}/r_j is constant for all j.
Theorem 6.8. If n_{ij}/r_j is constant for all j, then n_{ij}/k_i is also constant for all i, and vice versa.
See Appendix B.3 (Proof 29) for the proof. Hence, consistent with the conditions of a design, no n_{ij} can be zero in this case. So when we define an incomplete block design as a design in which at least one of the blocks does not contain all the treatments, one can assert that not all the adjusted block totals can be orthogonal to all the adjusted treatment totals in a connected block design.
In this case, we have

n_{ij} = k_i r_j / n   (6.27)

or

N = k r′ / n .   (6.28)
6.3.3 Decomposition of Sum of Squares and Analysis of Variance
The sum of squares due to residuals is

SS_Error(t) = Σ_i Σ_j Σ_m (y_{ijm} − µ̂ − β̂_i − τ̂_j)²
            = Σ_i Σ_j Σ_m y_{ijm}(y_{ijm} − µ̂ − β̂_i − τ̂_j)   [cf. (6.2)-(6.4)]
            = Σ_i Σ_j Σ_m y²_{ijm} − µ̂G − Σ_i β̂_i B_i − Σ_j τ̂_j V_j
            = Y′Y − µ̂G − B′β̂ − V′τ̂   (6.29)
where Y is the vector of all observations and G is the grand total of all observations.
Since

β̂ = K⁻¹B − 1_b µ̂ − K⁻¹N τ̂   [cf. (6.3) and (6.5)]   (6.30)

and

G = B′1_b ,   (6.31)

substituting (6.30) and (6.31) in (6.29) gives

SS_Error(t) = Y′Y − µ̂G − B′[K⁻¹B − 1_b µ̂ − K⁻¹N τ̂] − V′τ̂
            = Y′Y − B′K⁻¹B − (V′ − B′K⁻¹N)τ̂
            = ( Y′Y − G²/n ) − ( B′K⁻¹B − G²/n ) − Q′τ̂ .   (6.32)
Our interest is in testing the null hypothesis H0(t): τ_1 = τ_2 = ... = τ_v against the alternative hypothesis H1(t): at least one pair of the τ_j differs. The sum of squares due to residuals under H0(t) is

SS⁰_Error(t) = Σ_i Σ_j Σ_m (y_{ijm} − µ̂ − β̂_i)²
             = Y′Y − B′K⁻¹B
             = ( Y′Y − G²/n ) − ( B′K⁻¹B − G²/n ) .   (6.33)
Thus the adjusted treatment sum of squares (adjusted for block effects) is

SS_Treat(adj) = SS⁰_Error(t) − SS_Error(t)
             = Q′τ̂
             = Σ_{j=1}^{v} Q_j τ̂_j .   (6.34)
The unadjusted sum of squares due to blocks is

SS_Block(unadj) = B′K⁻¹B − G²/n = Σ_{i=1}^{b} B²_i/k_i − G²/n   (6.35)

and the total sum of squares is

SS_Total = Y′Y − G²/n = Σ_i Σ_j Σ_m y²_{ijm} − G²/n .   (6.36)
Since the adjusted treatment totals are orthogonal to the block totals (cf. (6.23)), the degrees of freedom carried by the sets of the B_i and the Q_j is the sum of the individual degrees of freedom carried by the B_i and the Q_j. Since Σ_j Q_j = 0, the adjusted treatment totals Q_j are not linearly independent, and thus the set of the Q_j has at most (v − 1) degrees of freedom. A test for H0(t) is then based on the statistic

F_Tr = [ SS_Treat(adj)/(v − 1) ] / [ SS_Error(t)/(n − b − v + 1) ]   (6.37)

which follows an F-distribution with (v − 1) and (n − b − v + 1) degrees of freedom under H0(t). If F_Tr > F_{v−1, n−b−v+1; 1−α}, then H0(t) is rejected.
The intrablock analysis of variance table for testing the significance of treatment effects is described in Table 6.1.
Table 6.1. Intrablock analysis of variance for H0(t): τ_1 = τ_2 = ... = τ_v

Source                          SS                                 df               MS                                    F
Between treatments (adjusted)   SS_Treat(adj) = Q′τ̂                v − 1            MS_Treat = SS_Treat(adj)/(v − 1)      MS_Treat/MS_E
Between blocks (unadjusted)     SS_Block(unadj) = B′K⁻¹B − G²/n    b − 1            MS_Block = SS_Block(unadj)/(b − 1)
Intrablock error                SS_Error(t) = Y′Y − B′K⁻¹B − Q′τ̂   n − b − v + 1    MS_E = SS_Error(t)/(n − b − v + 1)
Total                           SS_Total = Y′Y − G²/n              n − 1
An important observation in the analysis of variance of incomplete block designs is that it makes a difference whether the treatment effects are estimated first and then the block effects, or the block effects first and then the treatment effects. In the case of complete block designs it does not matter at all, because rank(C) = v − 1. One may also note that in order to use the Fisher-Cochran theorem, we must have

SS_Total = SS_Block + SS_Treat + SS_Error .   (6.38)
In the case of incomplete block designs, either

SS_Total = SS_Block(unadj) + SS_Treat(adj) + SS_Error   (6.39)

holds or

SS_Total = SS_Block(adj) + SS_Treat(unadj) + SS_Error   (6.40)

holds. Both (6.39) and (6.40) cannot hold simultaneously, because the unadjusted sums of squares due to blocks and treatments are not orthogonal. In fact, in the case of incomplete block designs,

SS_Block(unadj) + SS_Treat(adj) = SS_Block(adj) + SS_Treat(unadj) .   (6.41)
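Identity (6.41) is easy to confirm numerically. The sketch below uses a small BIBD with illustrative data of our own (not from the book), computing the adjusted block sum of squares directly as P′β̂ with β̂ from (6.17):

```python
import numpy as np

obs = [(0, 0, 12), (0, 1, 15), (1, 0, 13), (1, 2, 18), (2, 0, 11), (2, 3, 20),
       (3, 1, 16), (3, 2, 19), (4, 1, 14), (4, 3, 22), (5, 2, 17), (5, 3, 21)]
b, v = 6, 4
y = np.array([t[2] for t in obs], float); n = len(obs); G = y.sum()
N = np.zeros((b, v)); B = np.zeros(b); V = np.zeros(v)
for blk, trt, val in obs:
    N[blk, trt] += 1; B[blk] += val; V[trt] += val
K = np.diag(N.sum(axis=1)); R = np.diag(N.sum(axis=0))
Ki, Ri = np.linalg.inv(K), np.linalg.inv(R)

# Adjusted treatment SS via Q'tau_hat, adjusted block SS via P'beta_hat
Q = V - N.T @ Ki @ B;  C = R - N.T @ Ki @ N
P = B - N @ Ri @ V;    D = K - N @ Ri @ N.T
ss_treat_adj = Q @ (np.linalg.pinv(C) @ Q)
ss_block_adj = P @ (np.linalg.pinv(D) @ P)

ss_block_unadj = B @ Ki @ B - G**2 / n            # (6.35)
ss_treat_unadj = V @ Ri @ V - G**2 / n            # (6.42)

lhs = ss_block_unadj + ss_treat_adj
rhs = ss_block_adj + ss_treat_unadj
print(np.isclose(lhs, rhs))                       # (6.41) holds
```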
Generally, the main interest in the design of experiments lies in testing hypotheses about the treatment effects. Suppose nevertheless that we also want to test the significance of the block effects. In a complete block design, this can be done from the same analysis of variance table used for testing the significance of treatment effects. In an incomplete block design this is no longer true, and we proceed as follows. Suppose we want to test the null hypothesis H0(b): β_1 = β_2 = ... = β_b against the alternative hypothesis H1(b): at least one pair of the β_i differs. Obtain the adjusted sum of squares due to blocks using P′β̂ or Σ_{i=1}^{b} P_i β̂_i, where β̂ is obtained from P = Dβ̂ (cf. (6.17)). This step can be avoided if τ̂ has already been obtained from Q = Cτ̂ (cf. (6.9)). In this case, the adjusted sum of squares due to blocks is obtained using (6.41) as

SS_Block(adj) = SS_Block(unadj) + SS_Treat(adj) − SS_Treat(unadj) ,

where the unadjusted treatment sum of squares is

SS_Treat(unadj) = V′R⁻¹V − G²/n = Σ_{j=1}^{v} V²_j/r_j − G²/n .   (6.42)
The sum of squares due to residuals in this case is

SS_Error(b) = SS_Total − SS_Block(adj) − SS_Treat(unadj) .   (6.43)
The adjusted block totals are also orthogonal to the treatment totals, so the degrees of freedom carried by the sets of the P_i and the V_j is the sum of the individual degrees of freedom carried by the P_i and the V_j. A test for H0(b) is then based on the statistic

F_bl = [ SS_Block(adj)/(b − 1) ] / [ SS_Error(b)/(n − b − v + 1) ]   (6.44)

which follows an F-distribution with (b − 1) and (n − b − v + 1) degrees of freedom. If F_bl > F_{b−1, n−b−v+1; 1−α}, then H0(b) is rejected.
The intrablock analysis of variance table for testing the significance of block effects is described in Table 6.2.
The reader may note that since rank(C) ≤ v − 1 and rank(D) ≤ b − 1, in order to estimate τ or β one has to use a generalized inverse. Various methods to compute the generalized inverse are available in the literature.
Table 6.2. Intrablock analysis of variance for H0(b): β_1 = β_2 = ... = β_b

Source                            SS                                df               MS                                  F
Between treatments (unadjusted)   SS_Treat(unadj) = V′R⁻¹V − G²/n   v − 1
Between blocks (adjusted)         SS_Block(adj)                     b − 1            MS_Block = SS_Block(adj)/(b − 1)    MS_Block/MS_E
Intrablock error                  SS_Error(b)                       n − b − v + 1    MS_E = SS_Error(b)/(n − b − v + 1)
Total                             SS_Total = Y′Y − G²/n             n − 1
The results for testing the significance of treatment effects in the intrablock analysis of an incomplete block design can be obtained using SAS with the following commands:
proc glm data = file name containing data;
   /* proc glm performs an intrablock analysis */
   class blocks treat;
   model data = blocks treat;
   lsmeans treat;
run;
Two types of sums of squares, Type I and Type III, appear in the SAS output. The Type I sum of squares (SS) for treatment is unadjusted and is based on the ordinary treatment means, so it contains both the treatment and the block differences. The Type III sum of squares for treatment is adjusted for blocks, so the mean square (MS) for treatment measures the difference between treatment means and random error. The least squares means are obtained from lsmeans; these are the adjusted means, in which blocks are treated as another fixed effect for the computation.
6.4 Interblock Analysis of Incomplete Block Design
The purpose of block designs is to reduce the variability of the response by removing part of the variability as block numbers. If in fact this removal is illusory, the block effects being all equal, then the estimates are less accurate than those obtained by ignoring the block effects and using the estimates of treatment effects. On the other hand, if the block effect is very marked, the reduction in basic variability may be sufficient to ensure a reduction of the actual variances for the block analysis.
In the intrablock analysis, the treatment effects are estimated after eliminating the block effects. If the block effects are marked, then the block comparisons may also provide information about the treatment comparisons. So the question arises how to additionally utilize this block information to develop an analysis of variance for testing the significance of treatment effects.
Such an analysis can be derived by regarding the block effects as random variables that change over repetitions of the experiment, corresponding to the choice of different sets of blocks in these repetitions. This assumption involves the random allocation of the different blocks of the design to the blocks of material selected (at random from the population of possible blocks), in addition to the random allocation of the treatments occurring in a block to the units of the block selected to contain them. Now two responses from the same block are correlated, because the error associated with each contains the block effect in common. Such an analysis of an incomplete block design is termed interblock analysis.
To illustrate the idea behind the interblock analysis, and how block comparisons also contain information about the treatment comparisons, consider an allocation of four selected treatments in each of two blocks, with the responses (y_{ij}) recorded as follows:

Block 1: y_12  y_14  y_15  y_17
Block 2: y_21  y_23  y_24  y_25 .
The block totals are
B1 = y12 + y14 + y15 + y17 ,
B2 = y21 + y23 + y24 + y25 .
Following the model (6.1), we have
y12 = µ + β1 + τ2 + ε12 ,
y14 = µ + β1 + τ4 + ε14 ,
y15 = µ + β1 + τ5 + ε15 ,
y17 = µ + β1 + τ7 + ε17 ,
y21 = µ + β2 + τ1 + ε21 ,
y23 = µ + β2 + τ3 + ε23 ,
y24 = µ + β2 + τ4 + ε24 ,
y25 = µ + β2 + τ5 + ε25 ,
and thus

B_1 − B_2 = 4(β_1 − β_2) + (τ_2 + τ_4 + τ_5 + τ_7) − (τ_1 + τ_3 + τ_4 + τ_5)
          + (ε_12 + ε_14 + ε_15 + ε_17) − (ε_21 + ε_23 + ε_24 + ε_25) .
If we assume additionally that the block effects β_1 and β_2 are random with mean zero, then

E(B_1 − B_2) = (τ_2 + τ_7) − (τ_1 + τ_3) ,

which reflects that the block comparisons can also provide information about the treatment comparisons.
The intrablock analysis of an incomplete block design is based on estimating the treatment effects (or their contrasts) by eliminating the block effects. Since different treatments occur in different blocks, one may expect that the block totals also provide some information on the treatments. The interblock analysis utilizes the information in the block totals to estimate the treatment differences. The block effects are assumed to be random, so we consider the setup of a mixed effects model in which the treatment effects are fixed but the block effects are random. This approach is applicable only when the number of blocks is larger than the number of treatments. We consider here the interblock analysis of binary proper designs, for which n_{ij} = 0 or 1 and k_1 = k_2 = ... = k_b = k, in connection with the intrablock analysis.
6.4.1 Model and Normal Equations
Let y_{ij} denote the response from the jth treatment in the ith block in the model

y_{ij} = µ* + β*_i + τ_j + ε_{ij} ,  i = 1, 2, ..., b; j = 1, 2, ..., v ,   (6.45)

where

µ* is the general mean effect;
β*_i is the random additive ith block effect;
τ_j is the fixed additive jth treatment effect; and
ε_{ij} is the i.i.d. random error with ε_{ij} ~ N(0, σ²).
Since the block effects are now considered random, we additionally assume that the β*_i (i = 1, 2, ..., b) are independent, follow N(0, σ²_β), and are uncorrelated with the ε_{ij}. One may note that we cannot assume Σ_i β*_i = 0 here, as in other cases of fixed effects models; in place of this, we take E(β*_i) = 0. Also, the y_{ij} are no longer independent, but

Var(y_{ij}) = σ²_β + σ² ,
Cov(y_{ij}, y_{i′j′}) = σ²_β  if i = i′, j ≠ j′, and 0 otherwise.
In the interblock analysis, we work with the block totals B_i in place of the y_{ij}, where

B_i = Σ_{j=1}^{v} n_{ij} y_{ij}
    = Σ_{j=1}^{v} n_{ij}(µ* + β*_i + τ_j + ε_{ij})
    = kµ* + Σ_j n_{ij} τ_j + f_i   (6.46)

where f_i = kβ*_i + Σ_j n_{ij} ε_{ij} (i = 1, 2, ..., b) are independent and normally distributed with mean 0 and

Var(f_i) = k²σ²_β + kσ² = σ²_f .

Thus

E(B_i) = kµ* + Σ_j n_{ij} τ_j ,
Var(B_i) = σ²_f ,  i = 1, 2, ..., b ,
Cov(B_i, B_{i′}) = 0 ,  i ≠ i′; i, i′ = 1, 2, ..., b.
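These moment formulas for the block totals can be checked by simulation. The sketch below uses arbitrary parameter values of our own choosing (k = 2, σ_β = σ = 1, so Var(B_i) = 4 + 2 = 6):

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigma_b, sigma = 2, 1.0, 1.0
mu = 10.0
tau = np.array([1.0, -1.0])   # effects of the k treatments in this block

nsim = 200_000
beta = rng.normal(0.0, sigma_b, nsim)          # random block effects
eps = rng.normal(0.0, sigma, (nsim, k))        # plot errors
# Block totals from (6.46): B = k*mu + sum(tau) + k*beta + sum(eps)
B = k * mu + tau.sum() + k * beta + eps.sum(axis=1)

print(B.mean())   # ≈ k·µ + Σ τ_j = 20
print(B.var())    # ≈ k²σ_β² + kσ² = 6
```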
In matrix notation, the model (6.46) can be written as

B = kµ*1_b + Nτ + f   (6.47)

where f = (f_1, f_2, ..., f_b)′. In order to obtain an estimate of τ, we minimize the sum of squares due to the error f, i.e., we minimize

(B − kµ*1_b − Nτ)′(B − kµ*1_b − Nτ)

with respect to µ* and τ. The resulting estimates are

µ̃ = G/(bk) ,   (6.48)
τ̃ = (N′N)⁻¹N′B − G1_v/(bk) .   (6.49)

The estimates in (6.48) and (6.49) are termed the interblock estimates of µ* and τ, respectively; the tilde distinguishes them from the intrablock estimates. These estimates are derived in Appendix B.3 (Proof 30).
Generally we are not interested merely in the interblock analysis of variance; rather, we utilize the information from the interblock analysis along with the intrablock information to improve the statistical inferences. This is presented in Subsection 6.4.2.
The results for the interblock analysis of an incomplete block design can be obtained using SAS with the following commands:
proc glm data = file name containing data;
   class blocks treat;
   model data = blocks treat;
   lsmeans treat;
   estimate 'Treat 1' intercept 1 treat 1;                   /* for example */
   estimate 'Treat 1 vs Treat 3' intercept 1 treat 1 0 -1;   /* for example */
   random blocks;
run;
Instead of proc glm, the procedure proc mixed can also be used. The procedure proc glm is based on ordinary least squares estimation, and the procedure proc mixed is based on generalized least squares estimation (the estimates are maximum likelihood estimates under normality).
6.4.2 Use of Intrablock and Interblock Estimates
After obtaining the interblock estimate of the treatment effects, the next question is how to use this information for improved estimation of the treatment effects and for testing their significance. Such an estimate is based on more information, so it is expected to provide better statistical inferences.
We now have two different estimates of the treatment effects:

– from the intrablock analysis, \(\hat{\tau} = C^- Q\) (cf. (6.13)), and

– from the interblock analysis, \(\tilde{\tau} = (N'N)^{-1}N'B - \dfrac{G\,1_v}{bk}\) (cf. (6.49)).
Let us consider the estimation of a linear contrast of treatment effects, \(L = l'\tau\). Since the intrablock and interblock estimates of \(\tau\) are based on the Gauss-Markov model and least squares, the best estimate of \(L\) from the intrablock analysis is
\[
\hat{L}_1 = l'\hat{\tau} = l'C^-Q \tag{6.50}
\]
and the best estimate of \(L\) from the interblock analysis is
\[
\hat{L}_2 = l'\tilde{\tau} = l'\Bigl[(N'N)^{-1}N'B - \frac{G\,1_v}{bk}\Bigr]
          = l'(N'N)^{-1}N'B \quad (\text{since } l'1_v = 0 \text{ for a contrast}). \tag{6.51}
\]
The variances of \(\hat{L}_1\) and \(\hat{L}_2\) are
\[
\operatorname{Var}(\hat{L}_1) = \sigma^2\, l'C^-l \tag{6.52}
\]
and
\[
\operatorname{Var}(\hat{L}_2) = \sigma^2_f\, l'(N'N)^{-1}l\,, \tag{6.53}
\]
respectively. The covariance between \(Q\) (from the intrablock analysis) and \(B\) (from the interblock analysis) is
\[
\begin{aligned}
\operatorname{Cov}(Q, B) &= \operatorname{Cov}(V - N'K^{-1}B,\ B) \qquad [\text{cf. } (6.10)]\\
&= \operatorname{Cov}(V, B) - N'K^{-1}\operatorname{Var}(B)\\
&= N'\sigma^2_f - N'K^{-1}K\sigma^2_f = 0\,. 
\end{aligned} \tag{6.54}
\]
Using (6.54), we have
\[
\operatorname{Cov}(\hat{L}_1, \hat{L}_2) = 0 \tag{6.55}
\]
irrespective of the value of \(l\). The question now is: given the two estimators \(\hat{\tau}\) and \(\tilde{\tau}\) of \(\tau\), how should they be combined to obtain a minimum variance unbiased estimator of \(\tau\)? A pooled estimator in the form of a weighted mean of the uncorrelated \(\hat{L}_1\) and \(\hat{L}_2\) is the minimum variance unbiased estimator when the weights \(\theta_1\) and \(\theta_2\) of \(\hat{L}_1\) and \(\hat{L}_2\), respectively, are chosen such that
\[
\frac{\theta_1}{\theta_2} = \frac{\operatorname{Var}(\hat{L}_2)}{\operatorname{Var}(\hat{L}_1)}\,, \tag{6.56}
\]
i.e., the weights are proportional to the reciprocals of the variances of the respective estimators, irrespective of the value of \(l\). So consider the weighted average of \(\hat{L}_1\) and \(\hat{L}_2\) with weights \(\theta_1\) and \(\theta_2\), respectively:
\[
\hat{\tau}^* = \frac{\theta_1 \hat{L}_1 + \theta_2 \hat{L}_2}{\theta_1 + \theta_2}
             = \frac{l'(\theta_1 \hat{\tau} + \theta_2 \tilde{\tau})}{\theta_1 + \theta_2} \tag{6.57}
\]
with
\[
\theta_1^{-1} = \sigma^2\, l'C^-l\,, \tag{6.58}
\]
\[
\theta_2^{-1} = \sigma^2_f\, l'(N'N)^{-1}l\,. \tag{6.59}
\]
The linear contrast based on \(\hat{\tau}^*\) is
\[
\hat{L}^* = l'\hat{\tau}^* \tag{6.60}
\]
and its variance is
\[
\operatorname{Var}(\hat{L}^*) = \frac{\theta_1^2\operatorname{Var}(\hat{L}_1) + \theta_2^2\operatorname{Var}(\hat{L}_2)}{(\theta_1 + \theta_2)^2} \quad (\text{since } \operatorname{Cov}(\hat{L}_1, \hat{L}_2) = 0)
= \frac{1}{\theta_1 + \theta_2}\,. \qquad [\text{cf. } (6.56)] \tag{6.61}
\]
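The inverse-variance weighting in (6.56)-(6.57) can be sketched numerically; the variances and estimates below are illustrative values, not quantities from the text:

```python
# Inverse-variance pooling of two uncorrelated unbiased estimates (cf. (6.56)-(6.57)):
# weights theta_i = 1/Var(L_i); the pooled variance is 1/(theta1 + theta2), which
# can never exceed the variance of either single estimate.
var1, var2 = 0.8, 2.5            # illustrative Var(L1) (intrablock), Var(L2) (interblock)
L1, L2 = 4.1, 4.9                # two estimates of the same contrast
theta1, theta2 = 1 / var1, 1 / var2
L_star = (theta1 * L1 + theta2 * L2) / (theta1 + theta2)   # (6.57)
var_star = 1 / (theta1 + theta2)                            # (6.61)
assert var_star <= min(var1, var2)   # pooling never increases the variance
print(L_star, var_star)
```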
We note from (6.57) that \(\hat{\tau}^*\) can be obtained only if \(\theta_1\) and \(\theta_2\) are known, which in turn requires \(\sigma^2\) and \(\sigma^2_\beta\) to be known. If \(\sigma^2\) and \(\sigma^2_\beta\) are unknown, their estimates can be used instead. How can such estimates be obtained? One approach is to utilize the results from both the intrablock and the interblock analysis, as follows.

From the intrablock analysis,
\[
\operatorname{E}(SS_{Error(t)}) = (n - b - v + 1)\sigma^2\,, \qquad [\text{cf. } (6.29)]
\]
so an unbiased estimator of \(\sigma^2\) is
\[
\hat{\sigma}^2 = \frac{SS_{Error(t)}}{n - b - v + 1}\,. \tag{6.62}
\]
An unbiased estimator of \(\sigma^2_\beta\) is obtained by using the following results from the intrablock analysis:
\[
SS_{Treat(unadj)} = \sum_{j=1}^{v} \frac{V_j^2}{r_j} - \frac{G^2}{n}\,,
\]
\[
SS_{Block(unadj)} = \sum_{i=1}^{b} \frac{B_i^2}{k_i} - \frac{G^2}{n}\,, \qquad [\text{cf. } (6.35)]
\]
\[
SS_{Treat(adj)} = \sum_{j=1}^{v} Q_j \hat{\tau}_j\,, \qquad [\text{cf. } (6.34)]
\]
\[
SS_{Total} = \sum_{i=1}^{b}\sum_{j=1}^{v} y_{ij}^2 - \frac{G^2}{n}\,,
\]
where
\[
SS_{Total} = SS_{Treat(adj)} + SS_{Block(unadj)} + SS_{Error(t)}
           = SS_{Treat(unadj)} + SS_{Block(adj)} + SS_{Error(t)}\,.
\]
Hence
\[
SS_{Block(adj)} = SS_{Treat(adj)} + SS_{Block(unadj)} - SS_{Treat(unadj)}\,.
\]
Under the interblock analysis model (6.46) and (6.47),
\[
\operatorname{E}[SS_{Block(adj)}] = \operatorname{E}[SS_{Treat(adj)}] + \operatorname{E}[SS_{Block(unadj)}] - \operatorname{E}[SS_{Treat(unadj)}]\,,
\]
which works out to
\[
\operatorname{E}[SS_{Block(adj)}] = (b - 1)\sigma^2 + (n - v)\sigma^2_\beta \tag{6.63}
\]
or
\[
\operatorname{E}\Bigl[SS_{Block(adj)} - \frac{b - 1}{n - b - v + 1}\, SS_{Error(t)}\Bigr] = (n - v)\sigma^2_\beta\,. \qquad [\text{cf. } (6.62)]
\]
Thus an unbiased estimator of \(\sigma^2_\beta\) is
\[
\hat{\sigma}^2_\beta = \frac{1}{n - v}\Bigl[SS_{Block(adj)} - \frac{b - 1}{n - b - v + 1}\, SS_{Error(t)}\Bigr]\,. \tag{6.64}
\]
Now the weights \(\theta_1\) and \(\theta_2\) in (6.58) and (6.59) can be estimated by replacing \(\sigma^2\) and \(\sigma^2_\beta\) with \(\hat{\sigma}^2\) (cf. (6.62)) and \(\hat{\sigma}^2_\beta\) (cf. (6.64)), respectively. The estimate of \(\hat{\tau}^*\) (cf. (6.57)) is then obtained by replacing \(\theta_1\) and \(\theta_2\) with their estimates. It may be noted that the exact distribution of the associated treatment sum of squares is difficult to find when \(\sigma^2\) and \(\sigma^2_\beta\) are replaced by \(\hat{\sigma}^2\) and \(\hat{\sigma}^2_\beta\). Some approximate results are possible; we present them while dealing with the balanced incomplete block design in the next section. The increase in precision from using the interblock analysis in addition to the intrablock analysis is measured by
\[
\frac{1/\text{variance of pooled estimate}}{1/\text{variance of intrablock estimate}} - 1\,.
\]
In the interblock analysis, the block effects are treated as random variables, which is appropriate if the blocks can be regarded as a random sample from a large population of blocks. The best estimate of the treatment effects from the intrablock analysis is further improved by utilizing the information in the block totals. Since the treatments in different blocks are not all the same, the differences between block totals are expected to provide some information about the differences between the treatments. So the interblock estimates are obtained and pooled with the intrablock estimates to obtain a combined estimate of \(\tau\). The procedure of obtaining the interblock estimates and then the pooled estimates is called the recovery of interblock information.
How to conduct the analysis of variance in the recovery of interblock information is presented in Subsection 6.5.3 under the setup of a BIBD. The results for recovery of interblock information in incomplete block designs can be obtained using SAS with the following commands:
proc mixed data = file name containing data;  /* e.g., assume 6 treatments
                                                 in 3 blocks of size 4 */
class blocks treat;
model data = blocks treat;
lsmeans treat;
estimate 'Treat 1' intercept 1 treat 1;       /* intrablock analysis */
estimate 'Treat 1' intercept 12 treat 6 |
         blocks 1 1 1 / divisor=12;           /* interblock analysis */
estimate 'Treat 1 vs Treat 3' intercept 1 treat 1 0 -1;
random blocks;
run;
6.5 Balanced Incomplete Block Design
A balanced incomplete block design (BIBD) is an arrangement of \(v\) treatments in \(b\) blocks, each containing \(k\) experimental units \((k < v)\), such that

– every treatment occurs at most once in each block,

– every treatment is replicated \(r\) times in the design, and

– every pair of treatments occurs together in exactly \(\lambda\) of the \(b\) blocks.

The quantities \(v, b, r, k\) and \(\lambda\) are called the parameters of the BIBD. The BIBD is a proper, binary and equireplicate design.

The parameters \(v, b, r, k\) and \(\lambda\) are integers which are neither chosen arbitrarily nor independent of one another. They satisfy the following relations:
\[
\text{(i)}\quad bk = vr\,, \tag{6.65}
\]
\[
\text{(ii)}\quad \lambda(v - 1) = r(k - 1)\,, \tag{6.66}
\]
\[
\text{(iii)}\quad b \geq v \quad (\text{and hence } r \geq k)\,. \tag{6.67}
\]
The relation (iii) in (6.67) is also known as Fisher's inequality. Since the BIBD is a binary design, i.e.,
\[
n_{ij} = \begin{cases} 1 & \text{if the } j\text{th treatment occurs in the } i\text{th block,}\\ 0 & \text{otherwise,}\end{cases}
\]
we have
\[
\sum_{j=1}^{v} n_{ij} = k \quad \text{for all } i = 1, 2, \ldots, b\,, \tag{6.68}
\]
\[
\sum_{i=1}^{b} n_{ij} = r \quad \text{for all } j = 1, 2, \ldots, v\,, \tag{6.69}
\]
\[
\sum_{i=1}^{b} n_{ij} n_{ij'} = \lambda \quad \text{for all } j \neq j'\,;\ j, j' = 1, 2, \ldots, v\,. \tag{6.70}
\]
Obviously, \(n_{ij}/r\) cannot be constant for all \(j\) (cf. (6.27)), so this design is not orthogonal.

The arrangement of treatments in Table 6.3, with \(b = 10\) blocks \((B_1, B_2, \ldots, B_{10})\), \(v = 6\) treatments \((T_1, T_2, \ldots, T_6)\), \(k = 3\), \(r = 5\) and \(\lambda = 2\), is an example of a BIBD.

The relations (i)-(iii) in (6.65)-(6.67) are satisfied for the BIBD in Table 6.3, as
bk = 30 = vr,
λ(v − 1) = 10 = r(k − 1),
Table 6.3. Arrangement of BIBD with b = 10, v = 6, k = 3, r = 5 and λ = 2

Blocks   Treatments
B1       T1, T2, T5
B2       T1, T2, T6
B3       T1, T3, T4
B4       T1, T3, T6
B5       T1, T4, T5
B6       T2, T3, T4
B7       T2, T4, T6
B8       T2, T3, T5
B9       T3, T5, T6
B10      T4, T5, T6
and
b = 10 ≥ v = 6.
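The arrangement of Table 6.3 and the relations (6.65)-(6.67) can also be verified mechanically; a minimal sketch in Python:

```python
from itertools import combinations

# Verify that the Table 6.3 arrangement is a BIBD with b=10, v=6, k=3, r=5, lambda=2.
blocks = [{1, 2, 5}, {1, 2, 6}, {1, 3, 4}, {1, 3, 6}, {1, 4, 5},
          {2, 3, 4}, {2, 4, 6}, {2, 3, 5}, {3, 5, 6}, {4, 5, 6}]
b, v, k = len(blocks), 6, 3
r = sum(1 in blk for blk in blocks)          # replications of treatment 1
assert all(sum(t in blk for blk in blocks) == r for t in range(1, v + 1))
lam = sum({1, 2} <= blk for blk in blocks)   # blocks containing the pair (1, 2)
assert all(sum({s, t} <= blk for blk in blocks) == lam
           for s, t in combinations(range(1, v + 1), 2))
assert b * k == v * r and lam * (v - 1) == r * (k - 1) and b >= v
print(b, v, r, k, lam)   # 10 6 5 3 2
```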
Even if the parameters satisfy the relations (6.65)-(6.67), it is not always possible to arrange the treatments in blocks so as to obtain a corresponding BIBD: the conditions (6.65)-(6.67) are necessary but not sufficient. Each condition has an interpretation and can be derived analytically; see Appendix B.3 (Proofs 31-33).
6.5.1 Interpretation of Conditions of BIBD
(i) bk = vr
The interpretation of bk = vr relates to the total number of plots. Since there are b blocks and each block has k plots, the total number of plots is bk. Also, there are v treatments and each treatment is replicated r times, with each treatment occurring at most once in a block, so the total number of plots is also vr. Hence bk = vr.
(ii) λ(v − 1) = r(k − 1)

The number of pairs of plots within a block is \(\binom{k}{2}\), so the total number of within-block pairs of plots is
\[
b\binom{k}{2} = \frac{bk(k - 1)}{2}\,. \tag{6.71}
\]
Similarly, the number of pairs of treatments is \(\binom{v}{2}\), and each pair appears together in exactly \(\lambda\) blocks. So the total number of within-block pairs of plots must also be
\[
\lambda\binom{v}{2} = \frac{\lambda v(v - 1)}{2}\,. \tag{6.72}
\]
It follows from (6.71) and (6.72) that
\[
\frac{bk(k - 1)}{2} = \frac{\lambda v(v - 1)}{2}\,. \tag{6.73}
\]
Since \(bk = vr\), (6.73) reduces to
\[
r(k - 1) = \lambda(v - 1)\,.
\]
Definition 6.9. A BIBD is called symmetric if the numbers of blocks and treatments are equal, i.e., b = v. Since bk = vr, it follows that k = r in a symmetric BIBD.
The determinant of \(N'N\) is
\[
|N'N| = [r + \lambda(v - 1)](r - \lambda)^{v-1} \qquad [\text{cf. (B.132)}]
\]
\[
\phantom{|N'N|} = rk(r - \lambda)^{v-1}\,. \qquad [\text{cf. } (6.66)]
\]
When the BIBD is symmetric, \(b = v\) and then
\[
|N'N| = |N|^2 = r^2(r - \lambda)^{v-1}\,, \qquad [\text{cf. (B.132)}]
\]
so
\[
|N| = \pm\, r(r - \lambda)^{\frac{v-1}{2}}\,. \tag{6.74}
\]
Since \(|N|\) is an integer, \((r - \lambda)\) must be a perfect square when \(v\) is an even number. Further, since
\[
N'N = (r - \lambda)I_v + \lambda 1_v 1_v'\,,
\]
we have
\[
(N'N)^{-1} = N^{-1}N'^{-1} = \frac{1}{r - \lambda}\Bigl[I_v - \frac{\lambda}{r^2}\, 1_v 1_v'\Bigr]\,,
\]
\[
N'^{-1} = \frac{1}{r - \lambda}\Bigl[N - \frac{\lambda}{r}\, 1_v 1_v'\Bigr]\,. \tag{6.75}
\]
Postmultiplying both sides of (6.75) by \(N'\), we get
\[
NN' = (r - \lambda)I_v + \lambda 1_v 1_v' = N'N\,. \tag{6.76}
\]
Hence, in the case of a symmetric BIBD, any two blocks have \(\lambda\) treatments in common.
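This concluding property can be checked on a small symmetric BIBD. The sketch below uses the Fano plane (v = b = 7, r = k = 3, λ = 1), which is not an example taken from the text:

```python
from itertools import combinations

# In a symmetric BIBD (b = v, r = k), any two blocks share exactly lambda treatments.
# Illustration: the Fano plane, a symmetric BIBD with v = b = 7, r = k = 3, lambda = 1.
fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]
lam = 1
assert all(len(b1 & b2) == lam for b1, b2 in combinations(fano, 2))
print("every pair of blocks meets in exactly", lam, "treatment")
```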
Definition 6.10. A block design of b blocks in which each of the v treatments is replicated r times is said to be resolvable if the b blocks can be divided into r sets of b/r blocks each, such that every treatment appears in each set precisely once. Obviously, b is a multiple of r in a resolvable design.
Theorem 6.11. In a resolvable BIBD,
\[
b \geq v + r - 1\,. \tag{6.77}
\]
See Appendix B.3 (Proof 34) for the derivation of (6.77).

Definition 6.12. A resolvable BIBD is said to be affine resolvable if any two blocks belonging to two different sets have the same number of treatments in common.

A necessary and sufficient condition for a resolvable BIBD to be affine resolvable is that
\[
b = v + r - 1 \tag{6.78}
\]
and in this case \(k^2/v\) is an integer.
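A minimal sketch checking (6.78) on a hypothetical parameter set (the lattice-type design v = 9, b = 12, r = 4, k = 3, λ = 1, not an example from the text):

```python
# Affine resolvable check for a hypothetical BIBD parameter set:
# v = 9, b = 12, r = 4, k = 3, lambda = 1.
v, b, r, k, lam = 9, 12, 4, 3, 1
assert b * k == v * r and lam * (v - 1) == r * (k - 1)  # BIBD relations (6.65)-(6.66)
assert b == v + r - 1                                    # affine resolvable condition (6.78)
assert (k * k) % v == 0                                  # k^2/v must be an integer
print(k * k // v)
```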
6.5.2 Intrablock Analysis of BIBD
Consider the model
\[
y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}\,;\quad i = 1, 2, \ldots, b;\ j = 1, 2, \ldots, v\,, \tag{6.79}
\]
where

μ is the general mean effect;
βi is the fixed additive ith block effect;
τj is the fixed additive jth treatment effect; and
εij are i.i.d. random errors with εij ∼ N(0, σ²).
The results of the intrablock analysis of an incomplete block design from Section 6.3 carry over and are implemented under the conditions of a BIBD. Using the same notation, we denote the block totals by \(B_i = \sum_{j=1}^{v} y_{ij}\), the treatment totals by \(V_j = \sum_{i=1}^{b} y_{ij}\), the adjusted treatment totals by \(Q_j\), and the grand total by \(G = \sum_i\sum_j y_{ij}\). The normal equations obtained after eliminating the block effects yield the intrablock equations for the treatment effects, in matrix notation
\[
Q = C\hat{\tau}\,, \qquad [\text{cf. } (6.9)] \tag{6.80}
\]
where, in the case of a BIBD, the diagonal elements of \(C\) are
\[
c_{jj} = r - \frac{\sum_{i=1}^{b} n_{ij}^2}{k} = r - \frac{r}{k}\,, \quad j = 1, 2, \ldots, v\,, \tag{6.81}
\]
the off-diagonal elements of \(C\) are
\[
c_{jj'} = -\frac{1}{k}\sum_{i=1}^{b} n_{ij}n_{ij'} = -\frac{\lambda}{k}\,, \quad j \neq j'\,;\ j, j' = 1, 2, \ldots, v\,, \tag{6.82}
\]
and the adjusted treatment totals are
\[
Q_j = V_j - \frac{1}{k}\sum_{i=1}^{b} n_{ij}B_i = V_j - \frac{1}{k}\sum_{i(j)} B_i\,, \quad j = 1, 2, \ldots, v\,, \tag{6.83}
\]
where \(\sum_{i(j)}\) denotes the sum over those blocks containing the \(j\)th treatment. Writing \(T_j = \sum_{i(j)} B_i\), we have
\[
Q_j = V_j - \frac{T_j}{k}\,. \tag{6.84}
\]
An estimate of \(\tau\) is obtained as
\[
\hat{\tau} = \frac{k}{\lambda v}\, Q\,, \tag{6.85}
\]
which is derived in Appendix B.3 (Proof 35). The null hypothesis of interest is \(H_0: \tau_1 = \tau_2 = \ldots = \tau_v\) against the alternative hypothesis \(H_1\): at least one pair of the \(\tau_j\) differs. The adjusted treatment sum of squares (cf. (6.34)) is
\[
SS_{Treat(adj)} = \hat{\tau}'Q = \frac{k}{\lambda v}\, Q'Q = \frac{k}{\lambda v}\sum_{j=1}^{v} Q_j^2\,, \tag{6.86}
\]
the unadjusted block sum of squares (cf. (6.35)) is
\[
SS_{Block(unadj)} = \sum_{i=1}^{b} \frac{B_i^2}{k} - \frac{G^2}{bk} \tag{6.87}
\]
and the residual sum of squares is
\[
SS_{Error(t)} = SS_{Total} - SS_{Block(unadj)} - SS_{Treat(adj)}\,, \tag{6.88}
\]
where
\[
SS_{Total} = \sum_{i=1}^{b}\sum_{j=1}^{v} y_{ij}^2 - \frac{G^2}{bk}\,. \tag{6.89}
\]
A test of \(H_0: \tau_1 = \tau_2 = \ldots = \tau_v\) is then based on the statistic
\[
F_{Tr} = \frac{SS_{Treat(adj)}/(v - 1)}{SS_{Error(t)}/(bk - b - v + 1)}
       = \frac{k}{\lambda v}\cdot\frac{bk - b - v + 1}{v - 1}\cdot\frac{\sum_{j=1}^{v} Q_j^2}{SS_{Error(t)}}\,. \tag{6.90}
\]
If \(F_{Tr} > F_{v-1,\,bk-b-v+1;\,1-\alpha}\), then \(H_0(t)\) is rejected. The intrablock analysis of variance table for testing the significance of the treatment effects is given in Table 6.4.
Table 6.4. Intrablock analysis of variance table of BIBD for
H0(t): τ1 = τ2 = … = τv

Source                          SS                                                df                        MS                                    F
Between treatments (adjusted)   SS_Treat(adj) = (k/(λv)) Σ_{j=1}^{v} Q_j²         df_Treat = v − 1          MS_Treat = SS_Treat(adj)/df_Treat     MS_Treat/MS_E
Between blocks (unadjusted)     SS_Block(unadj) = Σ_{i=1}^{b} B_i²/k − G²/(bk)    df_Block = b − 1
Intrablock error                SS_Error(t) (by subtraction)                      df_Et = bk − b − v + 1    MS_E = SS_Error(t)/df_Et
Total                           SS_Total = Σ_i Σ_j y_ij² − G²/(bk)                df_T = bk − 1
The variance of an elementary contrast \((\tau_j - \tau_{j'},\ j \neq j')\) under the intrablock analysis is
\[
\begin{aligned}
V_{\tau_j - \tau_{j'}} &= \operatorname{Var}(\hat{\tau}_j - \hat{\tau}_{j'})\\
&= \frac{k^2}{\lambda^2 v^2}\bigl[\operatorname{Var}(Q_j) + \operatorname{Var}(Q_{j'}) - 2\operatorname{Cov}(Q_j, Q_{j'})\bigr]\\
&= \frac{k^2}{\lambda^2 v^2}\,(c_{jj} + c_{j'j'} - 2c_{jj'})\,\sigma^2 \qquad [\text{cf. } (6.21)]\\
&= \frac{k^2}{\lambda^2 v^2}\Bigl[2r\Bigl(1 - \frac{1}{k}\Bigr) + \frac{2\lambda}{k}\Bigr]\sigma^2 \qquad [\text{cf. (6.81) and (6.82)}]\\
&= \frac{2k}{\lambda v}\,\sigma^2\,.
\end{aligned} \tag{6.91}
\]
An unbiased estimator of \(\sigma^2\), from (6.62), is
\[
\hat{\sigma}^2 = \frac{SS_{Error(t)}}{bk - b - v + 1}\,. \qquad [\text{cf. } (6.88)] \tag{6.92}
\]
Thus an unbiased estimator of (6.91) is obtained by substituting \(\hat{\sigma}^2\):
\[
\hat{V}_{\tau_j - \tau_{j'}} = \frac{2k}{\lambda v}\cdot\frac{SS_{Error(t)}}{bk - b - v + 1}\,. \tag{6.93}
\]
In order to test \(H_0: \tau_j = \tau_{j'}\ (j \neq j')\), a suitable statistic is
\[
t = \sqrt{\frac{k(bk - b - v + 1)}{2\lambda v}}\cdot\frac{Q_j - Q_{j'}}{\sqrt{SS_{Error(t)}}}\,, \tag{6.94}
\]
which follows a t-distribution with \((bk - b - v + 1)\) degrees of freedom under \(H_0\). The results (6.91)-(6.94) can be used for multiple comparison tests in case the null hypothesis is rejected.
We now compare the efficiency of the BIBD with a randomized (complete) block design with r replicates. The variance of an elementary contrast under a randomized block design (RBD) is
\[
\operatorname{Var}(\hat{\tau}_j - \hat{\tau}_{j'})_{RBD} = \frac{2\sigma^2_*}{r}\,, \tag{6.95}
\]
where \(\operatorname{Var}(y_{ij}) = \sigma^2_*\) under the RBD. Thus the efficiency of the BIBD relative to the RBD is
\[
\frac{\operatorname{Var}(\hat{\tau}_j - \hat{\tau}_{j'})_{RBD}}{\operatorname{Var}(\hat{\tau}_j - \hat{\tau}_{j'})}
= \frac{2\sigma^2_*/r}{2k\sigma^2/(\lambda v)} \qquad [\text{cf. } (6.91)]
= \frac{\lambda v}{rk}\Bigl(\frac{\sigma^2_*}{\sigma^2}\Bigr)\,. \tag{6.96}
\]
The factor \(E = (\lambda v)/(rk)\) in (6.96) is termed the efficiency factor of the BIBD, and
\[
E = \frac{\lambda v}{rk} = \frac{v}{k}\Bigl(\frac{k - 1}{v - 1}\Bigr)
  = \Bigl(1 - \frac{1}{k}\Bigr)\Bigl(1 - \frac{1}{v}\Bigr)^{-1} < 1 \quad (\text{since } v > k)\,.
\]
The actual efficiency of the BIBD relative to the RBD depends not only on the efficiency factor but also on the ratio of variances \(\sigma^2_*/\sigma^2\). Since \(k < v\), the smaller blocks of the BIBD tend to be more homogeneous, so \(\sigma^2_*\) can exceed \(\sigma^2\) and the BIBD can then be more efficient than the RBD.
Definition 6.13. A block design is said to be efficiency balanced if every contrast of treatment effects is estimated through the design with the same efficiency factor.

If a block design satisfies any two of the following properties:

(i) efficiency balanced,

(ii) variance balanced, and

(iii) equal number of replications,

then the third property also holds.
Example 6.1. Consider the following arrangement of 5 treatments in 10 blocks, leading to a BIBD. The responses obtained from the experiment are presented in Table 6.5. First we explain the steps involved in the intrablock analysis of the BIBD. The parameters of the design are b = 10, v = 5, r = 6, k = 3 and λ = 3.

Table 6.5. Responses under BIBD in Example 6.1

Treatments    I       II      III     IV      V
Block 1       6.53    -       -       8.35    4.28
Block 2       -       7.37    5.44    8.38    -
Block 3       8.32    4.36    5.73    -       -
Block 4       9.12    8.36    -       -       7.45
Block 5       6.38    -       6.50    -       6.83
Block 6       4.68    3.45    -       9.72    -
Block 7       -       3.64    -       8.37    7.37
Block 8       -       -       7.45    6.41    8.92
Block 9       6.31    -       4.77    8.29    -
Block 10      -       5.32    6.72    -       7.21

The block totals are obtained as
B1 = 6.53 + 8.35 + 4.28 = 19.16 ,
B2 = 7.37 + 5.44 + 8.38 = 21.19 ,
B3 = 8.32 + 4.36 + 5.73 = 18.41 ,
B4 = 9.12 + 8.36 + 7.45 = 24.93 ,
B5 = 6.38 + 6.50 + 6.83 = 19.71 ,
B6 = 4.68 + 3.45 + 9.72 = 17.85 ,
B7 = 3.64 + 8.37 + 7.37 = 19.38 ,
B8 = 7.45 + 6.41 + 8.92 = 22.78 ,
B9 = 6.31 + 4.77 + 8.29 = 19.37 ,
B10 = 5.32 + 6.72 + 7.21 = 19.25 .
The treatment totals are obtained as
V1 = 6.53 + 8.32 + 9.12 + 6.38 + 4.68 + 6.31 = 41.34 ,
V2 = 7.37 + 4.36 + 8.36 + 3.45 + 3.64 + 5.32 = 32.50 ,
V3 = 5.44 + 5.73 + 6.50 + 7.45 + 4.77 + 6.72 = 36.61 ,
V4 = 8.35 + 8.38 + 9.72 + 8.37 + 6.41 + 8.29 = 49.52 ,
V5 = 4.28 + 7.45 + 6.83 + 7.37 + 8.92 + 7.21 = 42.06 ,
and the grand total is G = 202.03. In this case, the C-matrix is
\[
C = \begin{pmatrix}
4 & -1 & -1 & -1 & -1\\
-1 & 4 & -1 & -1 & -1\\
-1 & -1 & 4 & -1 & -1\\
-1 & -1 & -1 & 4 & -1\\
-1 & -1 & -1 & -1 & 4
\end{pmatrix},
\]
where
\[
c_{jj} = 6 - \frac{6}{3} = 4\,, \qquad c_{jj'} = -\frac{3}{3} = -1\,,\quad j \neq j'\,,
\]
and the incidence matrix \(N\) (rows indexed by treatments, columns by blocks) is
\[
N = \begin{pmatrix}
1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0\\
0 & 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1\\
0 & 1 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 1\\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0\\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}.
\]
With \(T_j = \sum_{i=1}^{10} n_{ij}B_i\),
\[
T_1 = 19.16 + 18.41 + 24.93 + 19.71 + 17.85 + 19.37 = 119.43\,,
\]
\[
T_2 = 21.19 + 18.41 + 24.93 + 17.85 + 19.38 + 19.25 = 121.01\,,
\]
\[
T_3 = 21.19 + 18.41 + 19.71 + 22.78 + 19.37 + 19.25 = 120.71\,,
\]
\[
T_4 = 19.16 + 21.19 + 17.85 + 19.38 + 22.78 + 19.37 = 119.73\,,
\]
\[
T_5 = 19.16 + 24.93 + 19.71 + 19.38 + 22.78 + 19.25 = 125.21\,.
\]
Now the adjusted treatment totals are obtained as
\[
Q_1 = V_1 - \frac{T_1}{k} = 1.53\,,\quad
Q_2 = V_2 - \frac{T_2}{k} = -7.84\,,\quad
Q_3 = V_3 - \frac{T_3}{k} = -3.63\,,
\]
\[
Q_4 = V_4 - \frac{T_4}{k} = 9.61\,,\quad
Q_5 = V_5 - \frac{T_5}{k} = 0.32\,.
\]
The adjusted treatment sum of squares (cf. (6.86)) is
\[
SS_{Treat(adj)} = \frac{k}{\lambda v}\sum_{j=1}^{5} Q_j^2 = 33.89\,,
\]
the unadjusted block sum of squares (cf. (6.87)) is
\[
SS_{Block(unadj)} = \sum_{i=1}^{10} \frac{B_i^2}{k} - \frac{G^2}{bk} = 14.11\,,
\]
the total sum of squares (cf. (6.89)) is
\[
SS_{Total} = \sum_{i=1}^{10}\sum_{j=1}^{5} y_{ij}^2 - \frac{G^2}{bk} = 82.22\,,
\]
and the residual sum of squares (cf. (6.88)) is
\[
SS_{Error(t)} = SS_{Total} - SS_{Block(unadj)} - SS_{Treat(adj)} = 34.22\,.
\]
The test statistic for \(H_0(t): \tau_1 = \tau_2 = \tau_3 = \tau_4 = \tau_5\) (cf. (6.90)) is
\[
F_{Tr} = \frac{k}{\lambda v}\cdot\frac{bk - b - v + 1}{v - 1}\cdot\frac{\sum_{j=1}^{5} Q_j^2}{SS_{Error(t)}} = 3.96
\]
and \(F_{4,16;0.95} = 3.01\), so \(H_0(t)\) is rejected at the 5% level of significance.
The analysis of variance table in this case is given in Table 6.6. The variance of an elementary contrast of treatments is estimated (cf. (6.91)) by
\[
\hat{V}_{\tau_j - \tau_{j'}} = \frac{2k}{\lambda v}\,\hat{\sigma}^2 = 0.86\,,
\]
where \(\hat{\sigma}^2\) is estimated (cf. (6.92)) by
\[
\hat{\sigma}^2 = \frac{SS_{Error(t)}}{bk - b - v + 1} = 2.14\,. \tag{6.97}
\]
Table 6.6. Intrablock analysis of variance of BIBD for
H0(t): τ1 = τ2 = τ3 = τ4 = τ5 in Example 6.1

Source                          SS                        df    MS      F
Between treatments (adjusted)   33.89                     4     8.47    F_Tr = 3.96
Between blocks (unadjusted)     14.11                     9     1.57
Intrablock error                34.22 (by subtraction)    16    2.14
Total                           82.22                     29
The results for intrablock analysis of BIBD can be obtained using theproc glm in SAS with the commands in Section 6.3.
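The intrablock computations of Example 6.1 can also be reproduced in a few lines of Python. Since the printed intermediate values are rounded, the sums of squares obtained this way can differ from Table 6.6 in the second decimal place:

```python
# Reproducing the intrablock analysis of Example 6.1 (b=10, v=5, r=6, k=3, lambda=3).
# Data: block -> {treatment: response}, transcribed from Table 6.5.
data = {
    1: {1: 6.53, 4: 8.35, 5: 4.28},  2: {2: 7.37, 3: 5.44, 4: 8.38},
    3: {1: 8.32, 2: 4.36, 3: 5.73},  4: {1: 9.12, 2: 8.36, 5: 7.45},
    5: {1: 6.38, 3: 6.50, 5: 6.83},  6: {1: 4.68, 2: 3.45, 4: 9.72},
    7: {2: 3.64, 4: 8.37, 5: 7.37},  8: {3: 7.45, 4: 6.41, 5: 8.92},
    9: {1: 6.31, 3: 4.77, 4: 8.29}, 10: {2: 5.32, 3: 6.72, 5: 7.21},
}
b, v, k, lam = 10, 5, 3, 3
B = {i: sum(blk.values()) for i, blk in data.items()}                   # block totals
V = {j: sum(blk.get(j, 0) for blk in data.values()) for j in range(1, v + 1)}
G = sum(B.values())
T = {j: sum(B[i] for i, blk in data.items() if j in blk) for j in V}
Q = {j: V[j] - T[j] / k for j in V}                                     # (6.84)
ss_treat_adj = (k / (lam * v)) * sum(q * q for q in Q.values())         # (6.86)
ss_block_unadj = sum(t * t for t in B.values()) / k - G * G / (b * k)   # (6.87)
ss_total = sum(y * y for blk in data.values() for y in blk.values()) - G * G / (b * k)
ss_error = ss_total - ss_block_unadj - ss_treat_adj                     # (6.88)
F = (ss_treat_adj / (v - 1)) / (ss_error / (b * k - b - v + 1))         # (6.90)
print(round(ss_treat_adj, 2), round(F, 2))   # approx. 33.87 and 3.95; F > 3.01
```

The small differences from the printed 33.89 and 3.96 come purely from rounding of the hand-computed intermediate quantities.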
6.5.3 Interblock Analysis and Recovery of Interblock Information in BIBD
An intrablock analysis of a BIBD rests on the assumption that the block effects are not marked. In many situations, however, the block effects are marked, and then the block totals may carry information about the treatment effects. This information can be used in estimating the treatment effects by an interblock analysis of the BIBD, and further through the recovery of interblock information. So we first conduct the interblock analysis of the BIBD. We do not derive the expressions afresh; instead we use the assumptions and results for the interblock analysis of an incomplete block design from Section 6.4, assuming the block effects to be random.
After estimating the treatment effects under interblock analysis, we use theresults of Section 6.4.2 for the pooled estimation and recovery of interblockinformation in a BIBD.
In the case of a BIBD,
\[
N'N = \begin{pmatrix}
\sum_i n_{i1}^2 & \sum_i n_{i1}n_{i2} & \ldots & \sum_i n_{i1}n_{iv}\\
\sum_i n_{i1}n_{i2} & \sum_i n_{i2}^2 & \ldots & \sum_i n_{i2}n_{iv}\\
\vdots & \vdots & \ddots & \vdots\\
\sum_i n_{iv}n_{i1} & \sum_i n_{iv}n_{i2} & \ldots & \sum_i n_{iv}^2
\end{pmatrix}
= \begin{pmatrix}
r & \lambda & \ldots & \lambda\\
\lambda & r & \ldots & \lambda\\
\vdots & \vdots & \ddots & \vdots\\
\lambda & \lambda & \ldots & r
\end{pmatrix}
= (r - \lambda)I_v + \lambda 1_v 1_v'\,, \tag{6.98}
\]
\[
(N'N)^{-1} = \frac{1}{r - \lambda}\Bigl[I_v - \frac{\lambda\, 1_v 1_v'}{rk}\Bigr]\,. \tag{6.99}
\]
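A quick numerical check of (6.98)-(6.99), using the parameters of Example 6.1 (v = 5, r = 6, k = 3, λ = 3):

```python
# Verify that (r - lam) I + lam J and its stated inverse (6.99) multiply to I.
v, r, k, lam = 5, 6, 3, 3
NtN = [[(r - lam) * (p == q) + lam for q in range(v)] for p in range(v)]
inv = [[((p == q) - lam / (r * k)) / (r - lam) for q in range(v)] for p in range(v)]
prod = [[sum(NtN[p][m] * inv[m][q] for m in range(v)) for q in range(v)]
        for p in range(v)]
assert all(abs(prod[p][q] - (p == q)) < 1e-12 for p in range(v) for q in range(v))
print("(6.99) verified")
```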
The interblock estimate of \(\tau\) can be obtained by substituting (6.99) in
\[
\tilde{\tau} = (N'N)^{-1}N'B - \frac{G\,1_v}{bk}\,. \qquad [\text{cf. } (6.49)]
\]
In order to use the interblock and intrablock estimates of \(\tau\) together through a pooled estimate, we consider the interblock and intrablock estimates of a treatment contrast. The intrablock estimate of the treatment contrast \(l'\tau\) is
\[
l'\hat{\tau} = l'C^-Q \qquad [\text{cf. } (6.50)]
= \frac{k}{\lambda v}\, l'Q \qquad [\text{cf. } (6.85)]
= \frac{k}{\lambda v}\sum_j l_j Q_j
= \sum_{j=1}^{v} l_j \hat{\tau}_j\,. \tag{6.100}
\]
The interblock estimate of the treatment contrast \(l'\tau\) is
\[
l'\tilde{\tau} = \frac{l'N'B}{r - \lambda} \qquad (\text{since } l'1_v = 0;\ \text{cf. } (6.51))
= \frac{1}{r - \lambda}\sum_{j=1}^{v} l_j\Bigl(\sum_{i=1}^{b} n_{ij}B_i\Bigr)
= \frac{1}{r - \lambda}\sum_{j=1}^{v} l_j T_j
= \sum_{j=1}^{v} l_j \tilde{\tau}_j\,. \tag{6.101}
\]
Further, the variances of \(l'\hat{\tau}\) and \(l'\tilde{\tau}\) are
\[
\operatorname{Var}(l'\hat{\tau}) = \frac{k}{\lambda v}\,\sigma^2 \sum_j l_j^2\,, \tag{6.102}
\]
\[
\operatorname{Var}(l'\tilde{\tau}) = \frac{\sigma^2_f}{r - \lambda}\sum_j l_j^2\,, \tag{6.103}
\]
which are derived in Appendix B.3 (Proof 36). The weights assigned to the intrablock and interblock estimates are proportional to the reciprocals of their variances, i.e., to \(\lambda v/(k\sigma^2)\) and \((r - \lambda)/\sigma^2_f\), respectively. The pooled estimate based on \(l'\hat{\tau}\) and \(l'\tilde{\tau}\) is
\[
L^* = \frac{\dfrac{\lambda v}{k\sigma^2}\sum_j l_j\hat{\tau}_j + \dfrac{r - \lambda}{\sigma^2_f}\sum_j l_j\tilde{\tau}_j}{\dfrac{\lambda v}{k\sigma^2} + \dfrac{r - \lambda}{\sigma^2_f}} \qquad [\text{cf. } (6.57)]
= \sum_j l_j\,\frac{\lambda v\,\omega_1\hat{\tau}_j + k(r - \lambda)\,\omega_2\tilde{\tau}_j}{\lambda v\,\omega_1 + k(r - \lambda)\,\omega_2}
= \sum_j l_j \tau_j^* \tag{6.104}
\]
where
\[
\tau_j^* = \frac{\lambda v\,\omega_1\hat{\tau}_j + k(r - \lambda)\,\omega_2\tilde{\tau}_j}{\lambda v\,\omega_1 + k(r - \lambda)\,\omega_2} \tag{6.105}
\]
\[
\phantom{\tau_j^*} = \frac{1}{r}\bigl[V_j + \xi W_j^*\bigr]\,, \tag{6.106}
\]
\[
W_j^* = (v - k)V_j - (v - 1)T_j + (k - 1)G\,, \tag{6.107}
\]
\[
\xi = \frac{\omega_1 - k\omega_2}{\omega_1 v(k - 1) + \omega_2 k(v - k)}\,, \tag{6.108}
\]
\[
\omega_1 = \frac{1}{\sigma^2}\,, \tag{6.109}
\]
\[
\omega_2 = \frac{1}{\sigma^2_f}\,. \tag{6.110}
\]
The proof of (6.106) is detailed in Appendix B.3 (Proof 37). Thus the pooled estimate of the contrast \(l'\tau\) is
\[
l'\tau^* = \sum_j l_j\tau_j^* = \frac{1}{r}\sum_j l_j(V_j + \xi W_j^*) \qquad \Bigl(\text{since } \sum_j l_j = 0 \text{ for a contrast}\Bigr) \tag{6.111}
\]
and the variance of \(l'\tau^*\) is
\[
\begin{aligned}
\operatorname{Var}(l'\tau^*) &= \frac{k}{\lambda v\,\omega_1 + k(r - \lambda)\,\omega_2}\sum_j l_j^2\\
&= \frac{k(v - 1)}{r[v(k - 1)\omega_1 + k(v - k)\omega_2]}\sum_j l_j^2 \qquad (\text{using } \lambda(v - 1) = r(k - 1))\\
&= \sigma^2_E\,\frac{\sum_j l_j^2}{r}\,,
\end{aligned} \tag{6.112}
\]
where
\[
\sigma^2_E = \frac{k(v - 1)}{v(k - 1)\omega_1 + k(v - k)\omega_2} \tag{6.113}
\]
is the effective variance. The effective variance can be estimated approximately by
\[
\hat{\sigma}^2_E = MS_E\,[1 + (v - k)\hat{\omega}^*]\,,
\]
where \(MS_E\) is the mean square due to error from the intrablock analysis,
\[
MS_E = \frac{SS_{Error(t)}}{bk - b - v + 1}\,, \qquad [\text{cf. } (6.88)] \tag{6.114}
\]
and
\[
\omega^* = \frac{\omega_1 - \omega_2}{v(k - 1)\omega_1 + (v - k)\omega_2}\,. \tag{6.115}
\]
To test the hypothesis on the treatment effects based on the pooled estimate, we proceed as follows. Consider the adjusted treatment totals based on the intrablock and interblock estimates,
\[
T_j^* = T_j + \omega^* W_j^*\,;\quad j = 1, 2, \ldots, v\,. \tag{6.116}
\]
The sum of squares due to the \(T_j^*\) is
\[
S^2_{T^*} = \sum_{j=1}^{v} T_j^{*2} - \frac{\bigl(\sum_{j=1}^{v} T_j^*\bigr)^2}{v}\,. \tag{6.117}
\]
Define the statistic
\[
F^* = \frac{S^2_{T^*}/[(v - 1)r]}{MS_E\,[1 + (v - k)\hat{\omega}^*]}\,, \tag{6.118}
\]
where \(\hat{\omega}^*\) is an estimator of \(\omega^*\) in (6.115), obtained by substituting unbiased estimators of \(\omega_1\) and \(\omega_2\). It may be noted that \(F^*\) depends on \(\hat{\omega}^*\), which itself depends on the estimated variances \(\hat{\sigma}^2\) and \(\hat{\sigma}^2_f\), so \(F^*\) does not follow an F distribution exactly. The distribution of \(F^*\) is approximated by the F distribution with \((v - 1)\) and \((bk - b - v + 1)\) degrees of freedom. The problem of estimating \(\omega_1\) and \(\omega_2\) is similar to the analysis of a linear model with correlated observations.
An estimate of \(\omega_1\) is obtained from the estimate of \(\sigma^2\) in the intrablock analysis of variance:
\[
\hat{\omega}_1 = \frac{1}{\hat{\sigma}^2} = [MS_E]^{-1}\,. \qquad [\text{cf. } (6.114)] \tag{6.119}
\]
The estimate of \(\omega_2\) depends on \(\sigma^2\) and \(\sigma^2_\beta\). To obtain an unbiased estimator of \(\sigma^2_\beta\), consider
\[
SS_{Block(adj)} = SS_{Treat(adj)} + SS_{Block(unadj)} - SS_{Treat(unadj)}\,,
\]
for which
\[
\operatorname{E}(SS_{Block(adj)}) = (bk - v)\sigma^2_\beta + (b - 1)\sigma^2\,. \tag{6.120}
\]
Thus an unbiased estimator of \(\sigma^2_\beta\) is
\[
\begin{aligned}
\hat{\sigma}^2_\beta &= \frac{1}{bk - v}\bigl[SS_{Block(adj)} - (b - 1)\hat{\sigma}^2\bigr]
= \frac{1}{bk - v}\bigl[SS_{Block(adj)} - (b - 1)MS_E\bigr]\\
&= \frac{b - 1}{bk - v}\bigl[MS_{Block(adj)} - MS_E\bigr]
= \frac{b - 1}{v(r - 1)}\bigl[MS_{Block(adj)} - MS_E\bigr]\,,
\end{aligned}
\]
where
\[
MS_{Block(adj)} = \frac{SS_{Block(adj)}}{b - 1}\,. \tag{6.121}
\]
Thus
\[
\hat{\omega}_2 = \frac{1}{k\hat{\sigma}^2 + \hat{\sigma}^2_\beta}\,. \tag{6.122}
\]
An approximate best pooled estimate of \(\sum_{j=1}^{v} l_j\tau_j\) is
\[
\sum_{j=1}^{v} \frac{l_j(V_j + \hat{\xi}\, W_j^*)}{r} \tag{6.123}
\]
and its variance is estimated approximately by
\[
\frac{k\sum_j l_j^2}{\lambda v\,\hat{\omega}_1 + (r - \lambda)k\,\hat{\omega}_2}\,. \tag{6.124}
\]
In the case of a resolvable BIBD, \(\sigma^2_\beta\) can be estimated using the sum of squares due to blocks within replications from the intrablock analysis of variance. If this sum of squares is \(SS^*_{Block}\) and the corresponding mean square is
\[
MS^*_{Block} = \frac{SS^*_{Block}}{b - r}\,, \tag{6.125}
\]
then
\[
\operatorname{E}(MS^*_{Block}) = \sigma^2 + \frac{(v - k)(r - 1)}{b - r}\,\sigma^2_\beta
= \sigma^2 + \frac{(r - 1)k}{r}\,\sigma^2_\beta\,, \tag{6.126}
\]
since \(k(b - r) = r(v - k)\) for a resolvable design. Thus
\[
\operatorname{E}\bigl[r\,MS^*_{Block} - MS_E\bigr] = (r - 1)(\sigma^2 + k\sigma^2_\beta) \qquad [\text{cf. } (6.114)] \tag{6.127}
\]
and hence
\[
\hat{\omega}_2 = \Bigl[\frac{r\,MS^*_{Block} - MS_E}{r - 1}\Bigr]^{-1}\,, \tag{6.128}
\]
\[
\hat{\omega}_1 = [MS_E]^{-1}\,. \tag{6.129}
\]
The analysis of variance table for the recovery of interblock information in a BIBD is given in Table 6.7. The increase in precision using the interblock analysis, as compared to the intrablock analysis, is
\[
\frac{\operatorname{Var}(\hat{\tau})}{\operatorname{Var}(\tau^*)} - 1
= \frac{\lambda v\,\omega_1 + \omega_2 k(r - \lambda)}{\lambda v\,\omega_1} - 1
= \frac{\omega_2(r - \lambda)k}{\lambda v\,\omega_1}\,. \tag{6.130}
\]
This increase may be estimated by
\[
\frac{\hat{\omega}_2(r - \lambda)k}{\lambda v\,\hat{\omega}_1}\,. \tag{6.131}
\]
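Plugging the variance estimates that Example 6.2 below arrives at (σ̂² = 2.14, σ̂²β = 0.11) into (6.131) gives a quick sketch of the estimated gain:

```python
# Estimated gain in precision (6.131) from recovering interblock information,
# using the variance estimates of Example 6.2: sigma^2 = 2.14, sigma_beta^2 = 0.11.
v, r, k, lam = 5, 6, 3, 3
s2, s2_beta = 2.14, 0.11
w1 = 1 / s2                                   # (6.119)
w2 = 1 / (k * s2 + s2_beta)                   # (6.122), as used in Example 6.2
gain = w2 * (r - lam) * k / (lam * v * w1)    # (6.131): about a 20% gain
print(round(100 * gain, 1), "% estimated increase in precision")
```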
Table 6.7. Analysis of variance table for recovery of interblock information of
BIBD for H0(t): τ1 = τ2 = … = τv

Source                           SS                                                          df                        MS                                       F*
Between treatments (unadjusted)  S²_T* = Σ_{j=1}^{v} T*_j² − (Σ_{j=1}^{v} T*_j)²/v           df_Treat = v − 1                                                   F* = [S²_T*/((v−1)r)] / (MS_E[1 + (v−k)ω̂*])
Between blocks (adjusted)        SS_Block(adj) = SS_Treat(adj) + SS_Block(unadj) − SS_Treat(unadj)   df_Block = b − 1   MS_Block(adj) = SS_Block(adj)/df_Block
Intrablock error                 SS_Error(t) (by subtraction)                                df_Et = bk − b − v + 1    MS_E = SS_Error(t)/df_Et
Total                            SS_Total = Σ_i Σ_j y_ij² − G²/(bk)                          df_T = bk − 1
Although \(\omega_1 > \omega_2\), this need not hold for \(\hat{\omega}_1\) and \(\hat{\omega}_2\). The estimates \(\hat{\omega}_1\) and \(\hat{\omega}_2\) may even be negative; in that case we take \(\hat{\omega}_1 = \hat{\omega}_2\).
Example 6.2. (Example 6.1 continued) We now illustrate the interblock analysis and the recovery of interblock information in the setup of Example 6.1. From the intrablock analysis of variance we have
\[
\hat{\sigma}^2 = 2.14\,, \qquad [\text{cf. } (6.97)]
\]
and the unadjusted sum of squares due to treatments is
\[
SS_{Treat(unadj)} = \sum_{j=1}^{v} \frac{V_j^2}{r_j} - \frac{G^2}{bk} = 25.92\,,
\]
where the values of the \(V_j\) and \(G\) are taken from the intrablock analysis. The adjusted sum of squares due to blocks is
\[
SS_{Block(adj)} = SS_{Treat(adj)} + SS_{Block(unadj)} - SS_{Treat(unadj)}
= 33.89 + 14.11 - 25.92 = 22.08\,.
\]
So
\[
MS_{Block(adj)} = \frac{22.08}{9} = 2.45
\]
and thus
\[
\hat{\sigma}^2_\beta = \frac{b - 1}{bk - v}\bigl[MS_{Block(adj)} - MS_E\bigr] = 0.11\,.
\]
Then we have
\[
\hat{\omega}_1 = \frac{1}{\hat{\sigma}^2} = 0.47\,, \qquad
\hat{\omega}_2 = \frac{1}{k\hat{\sigma}^2 + \hat{\sigma}^2_\beta} = 0.15\,,
\]
and thus
\[
\hat{\omega}^* = \frac{\hat{\omega}_1 - \hat{\omega}_2}{v(k - 1)\hat{\omega}_1 + (v - k)\hat{\omega}_2} = 0.0638\,.
\]
Now, for \(j = 1, 2, 3, 4, 5\), we have
\[
W_j^* = 2V_j - 4T_j + 2G\,, \qquad [\text{cf. } (6.107)]
\]
\[
T_j^* = T_j + \hat{\omega}^* W_j^*\,, \qquad [\text{cf. } (6.116)]
\]
which gives \(W_1^* = 9.02\), \(W_2^* = -14.98\), \(W_3^* = -5.56\), \(W_4^* = 24.16\), \(W_5^* = -12.64\), and \(T_1^* = 120.01\), \(T_2^* = 120.05\), \(T_3^* = 120.35\), \(T_4^* = 121.27\), \(T_5^* = 124.40\). This yields
\[
S^2_{T^*} = 13.72\,. \qquad [\text{cf. } (6.117)]
\]
The statistic \(F^*\) (cf. (6.118)) is
\[
F^* = 0.24\,,
\]
which approximately follows the F distribution with 4 and 16 degrees of freedom. Since \(F_{4,16;0.95} = 3.01\), we accept the null hypothesis of equality of the treatment effects at the 5% level of significance. The analysis of variance is given in Table 6.8.
Table 6.8. Analysis of variance table for recovery of interblock information of
BIBD for Example 6.1

Source                           SS              df    MS      F*
Between treatments (unadjusted)  S²_T* = 13.72   4             0.24
Between blocks (adjusted)        22.08           9     2.45
Intrablock error                 46.42           16    2.90
Total                            82.22           29
One may note that the intrablock analysis in Example 6.1 resulted in rejection of the null hypothesis, whereas when the information in the blocks is incorporated through the recovery of interblock information, the same null hypothesis is accepted. The recovery of interblock information thus additionally incorporates the information in the block totals into the analysis.

The results for the recovery of interblock information in a BIBD can be obtained using proc mixed in SAS with the commands given in Section 6.4.2.
6.6 Partially Balanced Incomplete Block Designs
The balanced incomplete block design has several optimal properties, such as connectedness and equal block sizes, and it is more efficient than other incomplete block designs in which each block has the same number of plots and each treatment is replicated an equal number of times. However, balanced incomplete block designs do not always exist, and for certain numbers of treatments they exist only with large numbers of blocks and replicates. For example, if 8 treatments are to be arranged in blocks of 3 plots each, then we need at least \(\binom{8}{3} = 56\) blocks, and each treatment must be replicated at least 21 times (using \(bk = vr\) with \(b = 56\), \(k = 3\), \(v = 8\)). The actual arrangement consists of putting in each block one of the 56 combinations of 8 treatments taken 3 at a time. One of the main properties of a BIBD is that the variance of any elementary contrast is the same for all elementary contrasts arising in the design. In fact, we have shown that
\[
\operatorname{Var}(l'\hat{\tau}) = \frac{k}{\lambda v}\, l'l\,\sigma^2\,,
\]
which implies that
\[
\operatorname{Var}(\hat{\tau}_j - \hat{\tau}_{j'}) = \frac{2k}{\lambda v}\,\sigma^2 \quad \text{for all } j \neq j'\,.
\]
Partially balanced incomplete block designs overcome these problems to some extent. The number of replications of each treatment can be made much smaller than in a BIBD, and the property of equal variance of all treatment contrasts is relaxed to some extent. Partially balanced incomplete block designs are connected but no longer balanced. In order to define a partially balanced incomplete block design (PBIBD), we use the concept of association schemes. We first explain association schemes with examples and then discuss partially balanced incomplete block designs.
6.6.1 Partially Balanced Association Schemes
Definition 6.14. Given a set of treatments (symbols) \(1, 2, \ldots, v\), a relationship satisfying the following three conditions is called a partially balanced association scheme with \(m\) associate classes.

(i) Any two symbols are either first, second, …, or \(m\)th associates, and the relation of association is symmetrical, i.e., if treatment A is the \(i\)th associate of treatment B, then B is also the \(i\)th associate of A.

(ii) Each treatment A in the set has exactly \(n_i\) treatments in the set which are its \(i\)th associates, and the number \(n_i\) \((i = 1, 2, \ldots, m)\) does not depend on the treatment A.

(iii) If any two treatments A and B are \(i\)th associates, then the number of treatments which are both \(j\)th associates of A and \(k\)th associates of B is \(p^i_{jk}\), independent of the chosen pair of \(i\)th associates A and B.

The numbers \(v, n_1, n_2, \ldots, n_m\) and \(p^i_{jk}\) \((i, j, k = 1, 2, \ldots, m)\) are called the parameters of the \(m\)-associate partially balanced scheme. To illustrate conditions (i)-(iii), we consider the rectangular and triangular association schemes in the following subsections.
Rectangular Association Scheme
Consider an example with \(m = 3\) associate classes and the arrangement of the 6 treatment symbols 1, 2, 3, 4, 5, 6 shown in Table 6.9.

Table 6.9. Arrangement of six treatments under the rectangular association scheme

1 2 3
4 5 6
Then, with respect to each symbol,

• the two other symbols in the same row are the first associates,

• the one other symbol in the same column is the second associate, and

• the remaining two symbols are the third associates.

For example, with respect to treatment 1,

• treatments 2 and 3 are the first associates, as they occur in the same row,

• treatment 4 is the second associate, as it occurs in the same column, and

• the remaining treatments 5 and 6 are the third associates.
Table 6.10 describes the first, second and third associates of all six treatments.

Table 6.10. First, second and third associates of six treatments under the rectangular association scheme

Treatment   First        Second       Third
number      associates   associates   associates
1           2, 3         4            5, 6
2           1, 3         5            4, 6
3           1, 2         6            4, 5
4           5, 6         1            2, 3
5           4, 6         2            1, 3
6           4, 5         3            1, 2
Further, we observe that for treatment 1 the number of first associates is \(n_1 = 2\), the number of second associates is \(n_2 = 1\), and the number of third associates is \(n_3 = 2\). The same values of \(n_1\), \(n_2\) and \(n_3\) hold for the other treatments as well.

Now we discuss condition (iii) of the definition of a partially balanced association scheme, which concerns the \(p^i_{jk}\). Consider the treatments 1 and 2. They are first associates of each other (so \(i = 1\)); treatment 6 is a third associate (so \(j = 3\)) of treatment 1 and also a third associate (so \(k = 3\)) of treatment 2, and it is the only such treatment. Thus the number of treatments which are \(j\)th \((j = 3)\) associates of A (here A = 1) and \(k\)th \((k = 3)\) associates of B (here B = 2), where A and B are \(i\)th \((i = 1)\) associates, is
\[
p^i_{jk} = p^1_{33} = 1\,.
\]
Similarly consider the treatments 2 and 3 which are the first associate(which means i = 1); treatment 4 is the third (which means j = 3) associateof treatment 2 and treatment 4 is also the third (which means k = 3)associate of treatment 3. Thus
p133 = 1.
Other values of pijk (i, j, k = 1, 2, 3) can also be obtained similarly.
We would like to remark that this method can be used to generate a 3-class association scheme for m × n treatments (symbols) in general, by arranging them in m rows and n columns.
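As an illustrative check (not from the book; the layout of Table 6.9 is hard-coded), the associate classes and one p^i_jk count of this rectangular scheme can be brute-forced by classifying pairs according to shared rows and columns:

```python
# Brute-force the rectangular association scheme of Table 6.9
# (2 rows x 3 columns, treatments 1..6) and count p^1_33 for the pair (1, 2).

def rectangular_positions(m, n):
    # treatment t sits at row (t-1)//n, column (t-1)%n
    return {t: divmod(t - 1, n) for t in range(1, m * n + 1)}

pos = rectangular_positions(2, 3)

def assoc_class(a, b):
    if pos[a][0] == pos[b][0]:
        return 1            # same row    -> first associates
    if pos[a][1] == pos[b][1]:
        return 2            # same column -> second associates
    return 3                # neither     -> third associates

first  = sorted(t for t in pos if t != 1 and assoc_class(1, t) == 1)
second = sorted(t for t in pos if t != 1 and assoc_class(1, t) == 2)
third  = sorted(t for t in pos if t != 1 and assoc_class(1, t) == 3)
print(first, second, third)    # [2, 3] [4] [5, 6]

# p^1_33: treatments that are third associates of both members of the
# first-associate pair (1, 2)
p1_33 = sum(1 for t in pos if t not in (1, 2)
            and assoc_class(1, t) == 3 and assoc_class(2, t) == 3)
print(p1_33)                   # 1
```

The printed values reproduce the associates of treatment 1 from Table 6.10 and the count p^1_33 = 1 derived above.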
Triangular Association Scheme
The triangular association scheme gives rise to a 2-class association scheme. It is obtained by arranging

v = (q choose 2) = q(q − 1)/2    (6.132)

symbols in q rows and q columns as shown in Table 6.11:
(a) Positions in leading diagonals are left blank (or crossed).
(b) The q(q − 1)/2 positions above the principal diagonal are filled by the treatment numbers 1, 2, . . . , v corresponding to the symbols.
(c) Fill the positions below the principal diagonal symmetrically.
Table 6.11. Assignment of q(q−1)/2 treatments in triangular association scheme
rows →
columns ↓    1       2       3       4      . . .  q − 1      q
    1        ×       1       2       3      . . .  q − 2      q − 1
    2        1       ×       q       q + 1  . . .  2q − 4     2q − 3
    3        2       q       ×       2q − 2 . . .  . . .      . . .
    4        3       q + 1   2q − 2  ×      . . .  . . .      . . .
    ⋮
    q − 1    q − 2   2q − 4  . . .   . . .  . . .  ×          q(q − 1)/2
    q        q − 1   2q − 3  . . .   . . .  . . .  q(q − 1)/2 ×
The treatments appearing in the same row or the same column of the array as treatment i are the first associates of i; the remaining treatments, which share neither a row nor a column with i, are its second associates.
Consider the following example to understand the triangular association scheme. Let q = 5; then v = (5 choose 2) = 10. The ten treatments are arranged under the triangular association scheme in Table 6.12. For example, for treatment 1, the treatments 2, 3 and 4 occur in the same row and the treatments 5, 6 and 7 occur in the same column. So the treatments 2, 3, 4, 5, 6 and 7 are the first associates of treatment 1, and the remaining treatments 8, 9 and 10 are its second associates. The first and second associates of the other treatments are stated in Table 6.13.
Table 6.12. Assignment of 10 treatments in triangular association scheme
rows →
columns ↓    1    2    3    4    5
    1        ×    1    2    3    4
    2        1    ×    5    6    7
    3        2    5    ×    8    9
    4        3    6    8    ×   10
    5        4    7    9   10    ×
Table 6.13. First and second associates of 10 treatments under triangular association scheme

Treatment number   First associates      Second associates
 1                 2, 3, 4, 5, 6, 7      8, 9, 10
 2                 1, 3, 4, 5, 8, 9      6, 7, 10
 3                 1, 2, 4, 6, 8, 10     5, 7, 9
 4                 1, 2, 3, 7, 9, 10     5, 6, 8
 5                 1, 2, 6, 7, 8, 9      3, 4, 10
 6                 1, 3, 5, 7, 8, 10     2, 4, 9
 7                 1, 4, 5, 6, 9, 10     2, 3, 8
 8                 2, 3, 5, 6, 9, 10     1, 4, 7
 9                 2, 4, 5, 7, 8, 10     1, 3, 6
10                 3, 4, 6, 7, 8, 9      1, 2, 5
We observe from Table 6.13 that the numbers of first and second associates are the same for each of the v = 10 treatments, with n1 = 6, n2 = 3 and n1 + n2 = 9 = v − 1. For example, treatment 2 occurs six times in the column of first associates, viz., in the first, third, fourth, fifth, eighth and ninth rows, and three times in the column of second associates, viz., in the sixth, seventh and tenth rows. Similar conclusions can be verified for the other treatments.
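The scheme of Tables 6.12 and 6.13 can be reproduced programmatically. In the sketch below (an illustration, not the book's code) each treatment is identified with its unordered pair of array lines, so two treatments are first associates exactly when their pairs intersect:

```python
# Triangular association scheme for q = 5: label the 10 above-diagonal
# cells row by row; treatment t <-> an unordered pair of array lines.
from itertools import combinations

q = 5
cells = {t + 1: set(pair) for t, pair in enumerate(combinations(range(q), 2))}

def is_first(a, b):
    return bool(cells[a] & cells[b])   # share a row or column of the array

first_of_1  = sorted(t for t in cells if t != 1 and is_first(1, t))
second_of_1 = sorted(t for t in cells if t != 1 and not is_first(1, t))
print(first_of_1, second_of_1)   # [2, 3, 4, 5, 6, 7] [8, 9, 10]

# every treatment has n1 = 6 first and n2 = 3 second associates
counts = {(len([u for u in cells if u != t and is_first(t, u)]),
           len([u for u in cells if u != t and not is_first(t, u)]))
          for t in cells}
print(counts)    # {(6, 3)}
```

The single element of `counts` confirms that n1 = 6 and n2 = 3 hold uniformly over all ten treatments.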
There are six parameters, viz., p^1_11, p^1_22, p^1_12 (or p^1_21), p^2_11, p^2_22 and p^2_12 (or p^2_21), which can be arranged in the symmetric matrices P1 and P2 as follows:

P1 = [ p^1_11  p^1_12 ]      P2 = [ p^2_11  p^2_12 ]
     [ p^1_21  p^1_22 ] ,         [ p^2_21  p^2_22 ] .      (6.133)
We would like to caution the reader not to read p^2_11 as the square of p_11; the 2 in p^2_11 is only a superscript.
For the design under consideration, we find that

P1 = [ 3  2 ]      P2 = [ 4  2 ]
     [ 2  1 ] ,         [ 2  0 ] .
In order to learn how to write the matrices P1 and P2, we consider the treatments 1, 2 and 8. Note that treatment 8 is a second associate of treatment 1. Consider only the rows corresponding to treatments 1, 2 and 8 in Table 6.13 and obtain the elements of P1 and P2 as follows:
p^1_11: Treatments 1 and 2 are first associates of each other. There are three treatments (viz., 3, 4 and 5) common between the first associates of treatment 1 and the first associates of treatment 2. So p^1_11 = 3.

p^1_12 and p^1_21: Treatments 1 and 2 are first associates of each other. There are two treatments (viz., 6 and 7) common between the first associates of treatment 1 and the second associates of treatment 2. So p^1_12 = 2 = p^1_21.

p^1_22: Treatments 1 and 2 are first associates of each other. There is only one treatment (viz., treatment 10) common between the second associates of treatment 1 and the second associates of treatment 2. So p^1_22 = 1.

p^2_11: Treatments 1 and 8 are second associates of each other. There are four treatments (viz., 2, 3, 5 and 6) common between the first associates of treatment 1 and the first associates of treatment 8. So p^2_11 = 4.

p^2_12 and p^2_21: Treatments 1 and 8 are second associates of each other. There are two treatments (viz., 4 and 7) common between the first associates of treatment 1 and the second associates of treatment 8. So p^2_12 = 2 = p^2_21.

p^2_22: Treatments 1 and 8 are second associates of each other. There is no treatment common between the second associates of treatment 1 and the second associates of treatment 8. So p^2_22 = 0.
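The same intersection counting can be automated. The sketch below (an illustrative check, not the book's code) recovers P1 and P2 for the q = 5 triangular scheme:

```python
# Count p^i_jk for the q = 5 triangular scheme by set intersection.
from itertools import combinations

q = 5
cells = {t + 1: set(p) for t, p in enumerate(combinations(range(q), 2))}

def cls(a, b):                 # 1 = first associates, 2 = second associates
    return 1 if cells[a] & cells[b] else 2

def assoc(t, c):
    return {u for u in cells if u != t and cls(t, u) == c}

def P(i):
    # use any pair of ith associates; the counts are pair-independent
    a, b = next((a, b) for a, b in combinations(cells, 2) if cls(a, b) == i)
    return [[len(assoc(a, j) & assoc(b, k)) for k in (1, 2)] for j in (1, 2)]

print(P(1))   # [[3, 2], [2, 1]]
print(P(2))   # [[4, 2], [2, 0]]
```

The output matches the matrices P1 and P2 derived by hand above.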
In general, if we use q rows and q columns of a square, then for q > 3,

v = (q choose 2) = q(q − 1)/2 ,    (6.134)

n1 = 2q − 4 ,    (6.135)

n2 = (q − 2)(q − 3)/2 ,    (6.136)

P1 = [ q − 2     q − 3          ]
     [ q − 3     (q − 3)(q − 4)/2 ] ,    (6.137)

P2 = [ 4         2q − 8          ]
     [ 2q − 8    (q − 4)(q − 5)/2 ] .    (6.138)
For q = 3 there are no second associates; this is a degenerate case in which P2 cannot be defined.
It may be remarked that graph-theoretic techniques can be used for counting the p^i_jk. Further, it is easy to see that not all the parameters in P1, P2, etc. are independent.
Construction of Blocks of PBIBD under Triangular Association Scheme
The blocks of a PBIBD can be obtained in different ways through an association scheme. One PBIBD from the triangular association scheme can be obtained as follows. Consider the rows of the arrangement of treatments in a triangular association scheme; the treatments in each row constitute the set of treatments to be assigned to a block. When q = 5, the blocks of the PBIBD are constructed from the rows of Table 6.12 and are presented in Table 6.14. The parameters of such a design are b = 5, v = 10, r = 2, k = 4, λ1 = 1 and λ2 = 0.
Table 6.14. Blocks of PBIBD under triangular association scheme with q = 5.
           Treatments
Block 1    1, 2, 3, 4
Block 2    1, 5, 6, 7
Block 3    2, 5, 8, 9
Block 4    3, 6, 8, 10
Block 5    4, 7, 9, 10
There are also other approaches to obtain the blocks of a PBIBD from a triangular association scheme. For example, consider the columns of the triangular scheme pairwise, delete the treatments common to the two chosen columns and retain the others; the retained treatments constitute a block. Consider, e.g., the triangular association scheme for q = 5 in Table 6.12. The first block under this approach is obtained by deleting the treatments common to columns 1 and 2, which results in a block containing the treatments 2, 3, 4, 5, 6 and 7. Similarly, considering the pairs of columns (1 and 3), (1 and 4), (1 and 5), (2 and 3), (2 and 4), (2 and 5), (3 and 4), (3 and 5) and (4 and 5), the other blocks are obtained; they are presented in Table 6.15. The parameters of this PBIBD are b = 10, v = 10, r = 6, k = 6, λ1 = 3 and λ2 = 4.
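Both constructions can be verified by brute force. The sketch below (not from the book) builds the two block sets from the q = 5 array and confirms that first-associate pairs co-occur λ1 times and second-associate pairs λ2 times:

```python
# Build the row blocks (Table 6.14) and the column-pair blocks (Table 6.15)
# of the q = 5 triangular scheme, then count pairwise co-occurrences.
from itertools import combinations

q = 5
cells = {t + 1: set(p) for t, p in enumerate(combinations(range(q), 2))}

def line(r):                   # treatments lying on array line r
    return {t for t, rc in cells.items() if r in rc}

rows_design  = [line(r) for r in range(q)]
pairs_design = [line(r) ^ line(s) for r, s in combinations(range(q), 2)]

def lambdas(blocks):
    lam = {1: set(), 2: set()}
    for a, b in combinations(cells, 2):
        c = 1 if cells[a] & cells[b] else 2
        lam[c].add(sum(1 for blk in blocks if a in blk and b in blk))
    return lam                 # one count per class for a genuine PBIBD

print(lambdas(rows_design))    # {1: {1}, 2: {0}}
print(lambdas(pairs_design))   # {1: {3}, 2: {4}}
```

The symmetric difference `line(r) ^ line(s)` implements "delete the common treatments and retain the others"; each class yields a single co-occurrence count, as partial balance requires.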
Since both of the PBIBDs in Tables 6.14 and 6.15 arise from the same association scheme, both designs have the same values n1 = 6 and n2 = 3 as well as the same matrices

P1 = [ 3  2 ]      P2 = [ 4  2 ]
     [ 2  1 ] ,         [ 2  0 ] .
Table 6.15. Blocks of PBIBD under triangular association scheme
Blocks      Columns of association scheme    Treatments
Block 1     (1, 2)                           2, 3, 4, 5, 6, 7
Block 2     (1, 3)                           1, 3, 4, 5, 8, 9
Block 3     (1, 4)                           1, 2, 4, 6, 8, 10
Block 4     (1, 5)                           1, 2, 3, 7, 9, 10
Block 5     (2, 3)                           1, 2, 6, 7, 8, 9
Block 6     (2, 4)                           1, 3, 5, 7, 8, 10
Block 7     (2, 5)                           1, 4, 5, 6, 9, 10
Block 8     (3, 4)                           2, 3, 5, 6, 9, 10
Block 9     (3, 5)                           2, 4, 5, 7, 8, 10
Block 10    (4, 5)                           3, 4, 6, 7, 8, 9
The blocks of another PBIBD can be derived by taking all the first associates of a given treatment as a block. For example, in the case q = 5, the first associates of treatment 1 from Table 6.13 are the treatments 2, 3, 4, 5, 6 and 7, and these constitute one block. The other blocks can be found similarly. This results in the same arrangement of treatments in blocks as in Table 6.15.
PBIBDs with two associate classes are popular in practical applications and can be classified into the following types depending on the association scheme (see Bose and Shimamoto (1952)):

1. Triangular,

2. Group divisible,

3. Latin square with i constraints (Li),

4. Cyclic, and

5. Singly linked blocks.
The triangular association scheme has already been discussed. We now briefly present the other types of association schemes.
Group Divisible Type Association Scheme
Let there be v = pq treatments. In a group divisible association scheme, the treatments can be divided into p groups of q treatments each, such that any two treatments in the same group are first associates and two treatments in different groups are second associates. The scheme can be exhibited by placing the treatments in a rectangle whose columns form the p groups of q treatments each.
Under this association scheme,

n1 = q − 1 ,
n2 = q(p − 1) ,

hence

(q − 1)λ1 + q(p − 1)λ2 = r(k − 1) ,

and the parameters of the second kind are uniquely determined by p and q. In this case,

P1 = [ q − 2    0        ]
     [ 0        q(p − 1) ] ,

P2 = [ 0        q − 1    ]
     [ q − 1    q(p − 2) ] .
For every group divisible design,
r ≥ λ1 ,
rk − vλ2 ≥ 0.
A group divisible design is said to be singular if r = λ1. A singular group divisible design is always derivable from a corresponding BIBD by replacing each treatment by a group of q treatments. In general, corresponding to a BIBD with parameters b*, v*, r*, k*, λ*, a singular group divisible design is obtained with parameters

b = b* ,
v = qv* ,
r = r* ,
k = qk* ,
λ1 = r* ,
λ2 = λ* ,

with p = v* groups, n1 = q − 1 and n2 = q(p − 1).
A group divisible design is nonsingular if r ≠ λ1. Nonsingular group divisible designs can be divided into two classes: semi-regular and regular.
A group divisible design is said to be semi-regular if r > λ1 and rk − vλ2 = 0. For this design,

b ≥ v − p + 1.

Also, each block contains the same number of treatments from each group, so that k must be divisible by p.
A group divisible design is said to be regular if r > λ1 and rk − vλ2 > 0. For this design,

b ≥ v.
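A small brute-force check of the group divisible counts (a sketch with p = 3 groups of q = 2, not from the book) confirms the formulas for n1, n2, P1 and P2:

```python
# Group divisible scheme with p = 3 groups of q = 2 treatments (v = 6);
# treatments 0..5, consecutive labels belonging to the same group.
from itertools import combinations

p, q = 3, 2
group = {t: t // q for t in range(p * q)}

def cls(a, b):
    return 1 if group[a] == group[b] else 2

def assoc(t, c):
    return {u for u in group if u != t and cls(t, u) == c}

n1, n2 = len(assoc(0, 1)), len(assoc(0, 2))
print(n1 == q - 1, n2 == q * (p - 1))    # True True

def P(i):
    a, b = next((a, b) for a, b in combinations(group, 2) if cls(a, b) == i)
    return [[len(assoc(a, j) & assoc(b, k)) for k in (1, 2)] for j in (1, 2)]

print(P(1))   # [[0, 0], [0, 4]]  i.e. [[q-2, 0], [0, q(p-1)]]
print(P(2))   # [[0, 1], [1, 2]]  i.e. [[0, q-1], [q-1, q(p-2)]]
```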
Latin Square Type Association Scheme
The Latin square type PBIBD with i constraints is denoted by Li. The number of treatments is v = q², and the treatments may be arranged in a q × q square. For the case i = 2, two treatments are first associates if they occur in the same row or the same column, and second associates otherwise. For the general case, we take a set of (i − 2) mutually orthogonal Latin squares, provided it exists; then two treatments are first associates if they occur in the same row or the same column, or correspond to the same letter in one of the Latin squares. Otherwise they are second associates.
Under this association scheme,

v = q² ,
n1 = i(q − 1) ,
n2 = (q − 1)(q − i + 1) ,

P1 = [ (i − 1)(i − 2) + q − 2    (q − i + 1)(i − 1) ]
     [ (q − i + 1)(i − 1)        (q − i + 1)(q − i) ] ,

P2 = [ i(i − 1)    i(q − i)                   ]
     [ i(q − i)    (q − i)(q − i − 1) + q − 2 ] .
Cyclic Type Association Scheme
Let there be v treatments denoted by the integers 1, 2, . . . , v in a cyclic type PBIBD. The first associates of treatment i are

i + d1, i + d2, . . . , i + dn1 (mod v),

where the d's satisfy the following conditions:

(i) the d's are all different and 0 < dj < v (j = 1, 2, . . . , n1);

(ii) among the n1(n1 − 1) differences dj − dj′ (j, j′ = 1, 2, . . . , n1, j ≠ j′) reduced mod v, each of the numbers d1, d2, . . . , dn1 occurs α times, whereas each of the numbers e1, e2, . . . , en2 occurs β times, where d1, d2, . . . , dn1, e1, e2, . . . , en2 are all the v − 1 different numbers 1, 2, . . . , v − 1. (To reduce an integer mod v, we subtract from it a suitable multiple of v so that the reduced integer lies between 1 and v − 1; for example, 17 reduced mod 13 is 4.)

For this scheme,

n1α + n2β = n1(n1 − 1) ,

P1 = [ α               n1 − α − 1       ]
     [ n1 − α − 1      n2 − n1 + α + 1  ] ,

P2 = [ β           n1 − β           ]
     [ n1 − β      n2 − n1 + β − 1  ] .
Singly Linked Block Association Scheme
Consider a BIBD D with parameters b**, v**, r**, k**, λ** = 1 and b** > v**. Let the block numbers of this design be treated as treatments, i.e., v = b**. Define two block numbers of D to be first associates if the blocks have exactly one treatment in common, and second associates otherwise. This two-class association scheme is called the singly linked block association scheme.
Under this association scheme,

v = b** ,
n1 = k**(r** − 1) ,
n2 = b** − 1 − n1 ,

P1 = [ r** − 2 + (k** − 1)²           n1 − r** − (k** − 1)² + 1      ]
     [ n1 − r** − (k** − 1)² + 1      n2 − n1 + r** + (k** − 1)² − 1 ] ,

P2 = [ (k**)²          n1 − (k**)²          ]
     [ n1 − (k**)²     n2 − n1 + (k**)² − 1 ] .
6.6.2 General Theory of PBIBD
Definition 6.15. A PBIBD with m associate classes is an arrangement of v treatments into b blocks of size k each, according to an m-associate partially balanced association scheme, such that

(a) every treatment occurs at most once in a block,

(b) every treatment occurs in exactly r blocks, and

(c) if two treatments are ith associates of each other, then they occur together in exactly λi (i = 1, 2, . . . , m) blocks.
The number λi is independent of the particular pair of ith associates chosen. It is not necessary that the λi all be different, and some of the λi may be zero.

If v treatments have such a scheme available, then we have a PBIBD. Note that two treatments which are ith associates occur together in λi blocks.
The parameters b, v, r, k, λ1, λ2, . . . , λm, n1, n2, . . . , nm are termed the parameters of the first kind, and the p^i_jk are termed the parameters of the second kind. It may be noted that n1, n2, . . . , nm and all the p^i_jk of the design are obtained from the association scheme under consideration; only λ1, λ2, . . . , λm occur in the definition of the PBIBD.

If λi = λ for all i = 1, 2, . . . , m, then the PBIBD reduces to a BIBD. So a BIBD is essentially a PBIBD with one associate class.
6.6.3 Conditions for PBIBD
The parameters of a PBIBD are chosen such that they satisfy the following relations:

(i)   bk = vr                                        (6.139)

(ii)  Σ_{i=1}^m n_i = v − 1                          (6.140)

(iii) Σ_{i=1}^m n_i λ_i = r(k − 1)                   (6.141)

(iv)  n_k p^k_ij = n_i p^i_jk = n_j p^j_ki           (6.142)

(v)   Σ_{k=1}^m p^i_jk = n_j − 1  if i = j,
      Σ_{k=1}^m p^i_jk = n_j      if i ≠ j.          (6.143)

It follows from these conditions that there are only m(m² − 1)/6 independent parameters of the second kind.
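These relations can be verified numerically, e.g., for the triangular scheme with q = 5 and the row-block design of Table 6.14 (λ1 = 1, λ2 = 0). The sketch below is an illustrative check, not the book's code:

```python
# Check relations (i)-(v) for the q = 5 triangular scheme and the
# row-block design with b = 5, r = 2, k = 4, lambda1 = 1, lambda2 = 0.
from itertools import combinations

Q = 5
cells = {t + 1: set(p) for t, p in enumerate(combinations(range(Q), 2))}
v = len(cells)

def cls(a, b):
    return 1 if cells[a] & cells[b] else 2

def assoc(t, c):
    return {u for u in cells if u != t and cls(t, u) == c}

def p(i, j, k):
    a, b = next((a, b) for a, b in combinations(cells, 2) if cls(a, b) == i)
    return len(assoc(a, j) & assoc(b, k))

n = {c: len(assoc(1, c)) for c in (1, 2)}
b_, r, k_, lam = 5, 2, 4, {1: 1, 2: 0}

print(b_ * k_ == v * r)                                     # (i)   True
print(n[1] + n[2] == v - 1)                                 # (ii)  True
print(n[1] * lam[1] + n[2] * lam[2] == r * (k_ - 1))        # (iii) True
print(all(n[i] * p(i, j, k) == n[j] * p(j, i, k)            # (iv)  True
          for i in (1, 2) for j in (1, 2) for k in (1, 2)))
print(all(sum(p(i, j, k) for k in (1, 2)) == n[j] - (i == j)
          for i in (1, 2) for j in (1, 2)))                 # (v)   True
```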
6.6.4 Interpretations of the Conditions of a PBIBD

The interpretations of the conditions (i)-(v) in (6.139)-(6.143) are as follows.
(i) bk = vr

This condition is a statement about the total number of plots, as in the case of the BIBD.
(ii) Σ_{i=1}^m n_i = v − 1

With respect to each treatment, the remaining (v − 1) treatments are classified as first, second, . . . , or mth associates, and each treatment has n_i ith associates.
(iii) Σ_{i=1}^m n_i λ_i = r(k − 1)

Consider the r blocks in which a particular treatment A occurs. In these r blocks, r(k − 1) pairs of treatments can be found, each having A as one of its members. Among these pairs, each ith associate of A must occur λ_i times, and there are n_i such associates, so Σ_i n_i λ_i = r(k − 1).
(iv) n_i p^i_jk = n_j p^j_ik = n_k p^k_ij

Let G_i be the set of ith associates (i = 1, 2, . . . , m) of a treatment A. For i ≠ j, each treatment in G_i has exactly p^i_jk kth associates in G_j. Thus the number of pairs of kth associates that can be obtained by taking one treatment from G_i and another from G_j is n_i p^i_jk on the one hand and n_j p^j_ik on the other, whence the equality.
(v) Σ_{k=1}^m p^i_jk = n_j − 1 if i = j and Σ_{k=1}^m p^i_jk = n_j if i ≠ j

Let the treatments A and B be ith associates. The kth associates of A (k = 1, 2, . . . , m) together contain all the n_j jth associates of B for j ≠ i. When j = i, A itself is one of the ith associates of B, so the kth associates of A (k = 1, 2, . . . , m) contain only the remaining (n_i − 1) ith associates of B. Thus the condition holds.
6.6.5 Intrablock Analysis of PBIBD With Two Associates
Consider a PBIBD under a two-associate scheme with parameters b, v, r, k, λ1, λ2, n1, n2, p^1_11, p^1_22, p^1_12, p^2_11, p^2_22 and p^2_12. The corresponding linear model is
yij = µ + βi + τj + εij ; i = 1, 2, . . . , b, j = 1, 2, . . . , v, (6.144)
where

µ is the general mean effect;
βi is the fixed additive ith block effect satisfying Σ_i βi = 0;
τj is the fixed additive jth treatment effect satisfying Σ_{j=1}^v τj = 0; and
εij is the i.i.d. random error with εij ∼ N(0, σ²).
The PBIBD is a binary, proper and equireplicate design, so
• nij = 0 or 1,
• k1 = k2 = . . . = kb = k and
• r1 = r2 = . . . = rv = r.
The null hypothesis of interest is H0 : τ1 = τ2 = . . . = τv against the alternative H1 : at least one pair of the τj differs. The null hypothesis related to the block effects is of little practical relevance and can be treated similarly. The minimization of the sum of squares due to residuals,
Σ_{i=1}^b Σ_{j=1}^v (yij − µ − βi − τj)² ,
with respect to µ, βi and τj results, after eliminating the block effects, in the following set of reduced normal equations in matrix notation:

Q = Cτ    [cf. (6.9)]

with

C = R − N′K⁻¹N ,    [cf. (6.10)]
Q = V − N′K⁻¹B ,
where in our case
R = rI_v ,    (6.145)
K = kI_b ,    (6.146)

the diagonal elements of C (cf. (6.15)) are

c_jj = r(k − 1)/k    (j = 1, 2, . . . , v),    (6.147)

the off-diagonal elements of C (cf. (6.16)) are

c_jj′ = −λ1/k   if treatments j and j′ are first associates,
c_jj′ = −λ2/k   if treatments j and j′ are second associates
(j ≠ j′ = 1, 2, . . . , v),    (6.148)
and

Qj = Vj − (1/k)[sum of the block totals in which the jth treatment occurs]

   = (1/k)[ r(k − 1)τj − Σ_i Σ_{j′ (j′ ≠ j)} n_ij n_ij′ τ_j′ ] .    (6.149)
Let Sj1 be the sum of the treatment effects of all first associates of the jth treatment, and Sj2 the corresponding sum over all second associates of the jth treatment. Then

τj + Sj1 + Sj2 = Σ_{j=1}^v τj .    (6.150)
Using (6.150) in (6.149), we have, for j = 1, 2, . . . , v,

kQj = r(k − 1)τj − (λ1 Sj1 + λ2 Sj2)
    = r(k − 1)τj − λ1 Sj1 − λ2 ( Σ_{j=1}^v τj − τj − Sj1 )
    = [r(k − 1) + λ2] τj + (λ2 − λ1) Sj1 − λ2 Σ_{j=1}^v τj .    (6.151)
The equations (6.151) are to be solved to obtain the adjusted treatment sum of squares. Imposing the side condition Σ_{j=1}^v τj = 0 on (6.151), we have

kQj = [r(k − 1) + λ2] τj + (λ2 − λ1) Sj1
    = a*_12 τj + b*_12 Sj1 ,    (6.152)

where a*_12 = r(k − 1) + λ2 and b*_12 = λ2 − λ1.

Let Qj1 denote the sum of the Qj′ over the set of treatments j′ which are the first associates of the jth treatment. We note that when we add the corresponding terms Sj′1 over all first associates j′ of j, then τj occurs n1 times in the sum, every first associate of j occurs p^1_11 times and every second associate of j occurs p^2_11 times, with p^2_11 + p^2_12 = n1. Then, using (6.152) and Σ_{j=1}^v τj = 0, we have
kQj1 = [r(k − 1) + λ2] Sj1 + (λ2 − λ1)[ n1 τj + p^1_11 Sj1 + p^2_11 Sj2 ]
     = [ r(k − 1) + λ2 + (λ2 − λ1)(p^1_11 − p^2_11) ] Sj1 + (λ2 − λ1) p^2_12 τj
     = b*_22 Sj1 + a*_22 τj ,    (6.153)

where

a*_22 = (λ2 − λ1) p^2_12 ,    (6.154)

b*_22 = r(k − 1) + λ2 + (λ2 − λ1)(p^1_11 − p^2_11) .    (6.155)
Now (6.152) and (6.153) can be solved to obtain

τ̂j = k[ b*_22 Qj − b*_12 Qj1 ] / ( a*_12 b*_22 − a*_22 b*_12 ) ,   (j = 1, . . . , v).    (6.156)

We see that

Σ_{j=1}^v Qj = Σ_{j=1}^v Qj1 = 0 ,    (6.157)

so

Σ_{j=1}^v τ̂j = 0 .    (6.158)

Thus τ̂j is a solution of the reduced normal equations.
The analysis of variance can be carried out by obtaining the unadjusted block sum of squares

SS_Block(unadj) = Σ_{i=1}^b B_i²/k − G²/(bk) ,    (6.159)

the adjusted sum of squares due to treatments

SS_Treat(adj) = Σ_{j=1}^v τ̂j Qj    (6.160)

from (6.152) and (6.156), where G = Σ_i Σ_j yij, and the sum of squares due to error

SS_Error(t) = SS_Total − SS_Block(unadj) − SS_Treat(adj) ,    (6.161)

where

SS_Total = Σ_i Σ_j y²_ij − G²/(bk) .    (6.162)

A test for H0 : τ1 = τ2 = . . . = τv is then based on the statistic

F_Tr = [ SS_Treat(adj)/(v − 1) ] / [ SS_Error(t)/(bk − b − v + 1) ] .    (6.163)

If F_Tr > F_{v−1, bk−b−v+1; 1−α}, then H0 is rejected. The intrablock analysis of variance for testing the significance of the treatment effects is given in Table 6.16.
We would like to point out that in (6.151) one could equally eliminate Sj1 instead of Sj2. Eliminating Sj2, as we did, involves less summation work in obtaining Qj1 when n1 < n2. If n1 > n2, one may prefer to eliminate Sj1 in (6.151) to reduce the work in obtaining Qj2, where Qj2 denotes the sum of the Qj′ over the set of treatments j′ which are the second associates of the jth treatment. In this case we obtain the estimate

τ̂*_j = k[ b*_21 Qj − b*_11 Qj2 ] / ( a*_11 b*_21 − a*_21 b*_11 ) ,    (6.164)

where

a*_11 = r(k − 1) + λ1 ,    (6.165)

b*_11 = λ1 − λ2 ,    (6.166)

a*_21 = (λ1 − λ2) p^1_12 ,    (6.167)

b*_21 = r(k − 1) + λ1 + (λ1 − λ2)(p^2_22 − p^1_22) .    (6.168)
The analysis of variance is then based on (6.164) and can be carried outsimilarly.
Table 6.16. Intrablock analysis of variance of a PBIBD with two associate classes for H0(t) : τ1 = τ2 = . . . = τv

Source                         SS                                     df               MS                             F
Between treatments (adjusted)  SS_Treat(adj) = Σ_{j=1}^v τ̂j Qj       v − 1            MS_Treat = SS_Treat(adj)/df    MS_Treat/MS_E
Between blocks (unadjusted)    SS_Block(unadj) = Σ_{i=1}^b B_i²/k     b − 1
                               − G²/(bk)
Intrablock error               SS_Error(t) (by subtraction)           bk − b − v + 1   MS_E = SS_Error(t)/df
Total                          SS_Total = Σ_i Σ_j y²_ij − G²/(bk)     bk − 1
The elementary contrasts of the treatment estimates (in the case n1 < n2) are

τ̂j − τ̂j′ = [ b*_22 (kQj − kQj′) − b*_12 (kQj1 − kQj′1) ] / ( a*_12 b*_22 − a*_22 b*_12 ) ,

with variance

Var(τ̂j − τ̂j′) = [ 2k( b*_22 + b*_12 ) / ( a*_12 b*_22 − a*_22 b*_12 ) ] σ²   if treatments j and j′ are first associates,

Var(τ̂j − τ̂j′) = [ 2k b*_22 / ( a*_12 b*_22 − a*_22 b*_12 ) ] σ²   if treatments j and j′ are second associates.
We observe that the variance of τ̂j − τ̂j′ depends on whether j and j′ are first or second associates, so the design is not (variance) balanced. But the variances of all elementary contrasts within a given order of association (first or second) are equal. That is why the design is said to be partially balanced.
The results of the intrablock analysis of a PBIBD can be obtained using the SAS commands discussed in Subsection 6.3.3. The SAS commands can be used only after the blocks have been obtained from the association scheme.
Example 6.3. The data in Tables 6.17 and 6.18 represent the number of years that root canal treatments lasted in patients. There are ten techniques of root canal treatment, denoted by the numbers 1, 2, . . . , 10. Two types of PBIBD are constructed using the triangular association scheme. The blocks of the first PBIBD are obtained from the rows of the triangular association scheme; its data are given in Table 6.17.

The blocks of the second PBIBD are obtained from the pairs of columns of the triangular association scheme, where the treatments common to the two columns are ignored and the others are retained, as in Table 6.15. Its data are given in Table 6.18.

We now conduct the intrablock analysis of both PBIBDs and test the hypothesis concerning the effectiveness of the ten techniques of root canal treatment. The numbers inside the brackets in Tables 6.17 and 6.18 denote the treatment to which an observation belongs.
Table 6.17. Arrangement of treatments in blocks in the first PBIBD in Example 6.3

Blocks    Life of root canals in years (treatment number)
1         3.6 (1), 3.8 (2), 4.2 (3), 3.2 (4)
2         4.4 (1), 4.5 (5), 4.1 (6), 3.9 (7)
3         3.8 (2), 3.8 (5), 3.6 (8), 3.3 (9)
4         3.9 (3), 4.0 (6), 4.1 (8), 3.5 (10)
5         3.3 (4), 3.6 (7), 3.8 (9), 3.1 (10)
Table 6.18. Arrangement of treatments in blocks in the second PBIBD in Example 6.3

Blocks    Life of root canals in years (treatment number)
1         3.4 (2), 3.5 (3), 3.6 (4), 4.0 (5), 2.8 (6), 2.9 (7)
2         3.7 (1), 3.8 (3), 3.4 (4), 3.7 (5), 2.6 (8), 3.9 (9)
3         3.6 (1), 3.8 (2), 3.4 (4), 4.2 (6), 3.7 (8), 3.2 (10)
4         4.4 (1), 4.1 (2), 3.1 (3), 4.3 (7), 4.4 (9), 3.9 (10)
5         4.4 (1), 4.1 (2), 3.5 (6), 3.4 (7), 3.6 (8), 3.3 (9)
6         3.8 (1), 3.8 (3), 3.6 (5), 3.5 (7), 3.5 (8), 3.2 (10)
7         3.6 (1), 3.6 (4), 3.2 (5), 4.1 (6), 3.2 (9), 3.1 (10)
8         4.0 (2), 4.6 (3), 4.2 (5), 4.2 (6), 3.8 (9), 3.7 (10)
9         4.0 (2), 3.8 (4), 4.1 (5), 3.4 (7), 3.5 (8), 3.3 (10)
10        3.1 (3), 3.5 (4), 3.2 (6), 3.1 (7), 2.8 (8), 2.9 (9)
It may be noted that the allocation of the ten treatments under the triangular association scheme can be done as in Table 6.12, and the resulting blocks are as in Table 6.14. The first and second associates of the treatments follow from Table 6.13, and the corresponding blocks are obtained in Table 6.15. The parameters of the first PBIBD are b = 5, v = 10, r = 2, k = 4, λ1 = 1 and λ2 = 0. Other related values are n1 = 6, n2 = 3,

P1 = [ 3  2 ]      P2 = [ 4  2 ]
     [ 2  1 ] and       [ 2  0 ] .

The diagonal elements of the C-matrix are
c_jj = 3/2    (j = 1, 2, . . . , 10)    [cf. (6.147)]

and the off-diagonal elements of the C-matrix are

c_jj′ = −1/4   if treatments j and j′ are first associates,
c_jj′ = 0      if treatments j and j′ are second associates
(j ≠ j′ = 1, 2, . . . , 10).    [cf. (6.148)]
The block totals are

B1 = 3.6 + 3.8 + 4.2 + 3.2 = 14.8 ,
B2 = 4.4 + 4.5 + 4.1 + 3.9 = 16.9 ,
B3 = 3.8 + 3.8 + 3.6 + 3.3 = 14.5 ,
B4 = 3.9 + 4.0 + 4.1 + 3.5 = 15.5 ,
B5 = 3.3 + 3.6 + 3.8 + 3.1 = 13.8 ,

the treatment totals are

V1 = 3.6 + 4.4 = 8.0 ,
V2 = 3.8 + 3.8 = 7.6 ,
V3 = 4.2 + 3.9 = 8.1 ,
V4 = 3.2 + 3.3 = 6.5 ,
V5 = 4.5 + 3.8 = 8.3 ,
V6 = 4.1 + 4.0 = 8.1 ,
V7 = 3.9 + 3.6 = 7.5 ,
V8 = 3.6 + 4.1 = 7.7 ,
V9 = 3.3 + 3.8 = 7.1 ,
V10 = 3.5 + 3.1 = 6.6 ,

the values of T**_j (the sum of the block totals in which the jth treatment occurs) are

T**_1 = B1 + B2 = 31.7 ,
T**_2 = B1 + B3 = 29.3 ,
T**_3 = B1 + B4 = 30.3 ,
T**_4 = B1 + B5 = 28.6 ,
T**_5 = B2 + B3 = 31.4 ,
T**_6 = B2 + B4 = 32.4 ,
T**_7 = B2 + B5 = 30.7 ,
T**_8 = B3 + B4 = 30.0 ,
T**_9 = B3 + B5 = 28.3 ,
T**_10 = B4 + B5 = 29.3 ,

and the values of Qj = Vj − T**_j/k (cf. (6.149)) are

Q1 = 0.075 ,  Q2 = 0.275 ,  Q3 = 0.525 ,  Q4 = −0.650 ,  Q5 = 0.450 ,
Q6 = 0 ,  Q7 = −0.175 ,  Q8 = 0.200 ,  Q9 = 0.025 ,  Q10 = −0.725 .

Since n1 > n2, we prefer to use the Qj2, and we have

Q12 = Q8 + Q9 + Q10 = −0.500 ,
Q22 = Q6 + Q7 + Q10 = −0.900 ,
Q32 = Q5 + Q7 + Q9 = 0.300 ,
Q42 = Q5 + Q6 + Q8 = 0.650 ,
Q52 = Q3 + Q4 + Q10 = −0.850 ,
Q62 = Q2 + Q4 + Q9 = −0.350 ,
Q72 = Q2 + Q3 + Q8 = 1.000 ,
Q82 = Q1 + Q4 + Q7 = −0.750 ,
Q92 = Q1 + Q3 + Q6 = 0.600 ,
Q102 = Q1 + Q2 + Q5 = 0.800 .

One may note that when n1 > n2, obtaining a Qj1 involves summing 6 terms, whereas a Qj2 involves summing only 3 terms. Now, using (6.165)-(6.168), we have a*_11 = 7, b*_11 = 1, a*_21 = 2 and b*_21 = 6. Thus (cf. (6.164))

τ̂*_j = 4(6Qj − Qj2)/40 ,

which yields τ̂*_1 = 0.095, τ̂*_2 = 0.255, τ̂*_3 = 0.285, τ̂*_4 = −0.455, τ̂*_5 = 0.355, τ̂*_6 = 0.035, τ̂*_7 = −0.205, τ̂*_8 = 0.195, τ̂*_9 = −0.045 and τ̂*_10 = −0.515.

The adjusted sum of squares due to treatments (cf. (6.160)) is

SS_Treat(adj) = 1.130 ,

the unadjusted sum of squares due to blocks (cf. (6.159)) is

SS_Block(unadj) = 1.385 ,

the total sum of squares (cf. (6.162)) is

SS_Total = 2.798 ,

and the sum of squares due to error (cf. (6.161)) is

SS_Error(t) = 0.283 .

Thus the F-statistic (cf. (6.163)) is

F_Tr = 2.66 ,

and F_{9,6;0.95} = 4.10. Since F_Tr < 4.10, we do not reject the null hypothesis at the 5% level of significance. The corresponding analysis of variance table is given in Table 6.19.

Table 6.19. Intrablock analysis of variance of the first PBIBD of the data in Table 6.17

Source                         SS       df    MS       F
Between treatments (adjusted)  1.130     9    0.126    2.66
Between blocks (unadjusted)    1.385     4
Intrablock error               0.283     6    0.047
Total                          2.798    19
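The whole intrablock analysis of this example can be recomputed from the raw data of Table 6.17. The sketch below (plain Python, an illustrative check rather than the book's SAS route) follows equations (6.149), (6.156) and (6.159)-(6.163), using the first-associate sets of Table 6.13:

```python
# Recompute the intrablock analysis of the first PBIBD from Table 6.17.
blocks = [
    {1: 3.6, 2: 3.8, 3: 4.2, 4: 3.2},
    {1: 4.4, 5: 4.5, 6: 4.1, 7: 3.9},
    {2: 3.8, 5: 3.8, 8: 3.6, 9: 3.3},
    {3: 3.9, 6: 4.0, 8: 4.1, 10: 3.5},
    {4: 3.3, 7: 3.6, 9: 3.8, 10: 3.1},
]
b, v, r, k = 5, 10, 2, 4
first = {1: {2, 3, 4, 5, 6, 7}, 2: {1, 3, 4, 5, 8, 9}, 3: {1, 2, 4, 6, 8, 10},
         4: {1, 2, 3, 7, 9, 10}, 5: {1, 2, 6, 7, 8, 9}, 6: {1, 3, 5, 7, 8, 10},
         7: {1, 4, 5, 6, 9, 10}, 8: {2, 3, 5, 6, 9, 10}, 9: {2, 4, 5, 7, 8, 10},
         10: {3, 4, 6, 7, 8, 9}}                      # Table 6.13

G = sum(y for blk in blocks for y in blk.values())
B = [sum(blk.values()) for blk in blocks]
V = {t: sum(blk[t] for blk in blocks if t in blk) for t in range(1, v + 1)}
T = {t: sum(B[i] for i, blk in enumerate(blocks) if t in blk) for t in V}
Q = {t: V[t] - T[t] / k for t in V}                   # (6.149)

l1, l2, p111, p211, p212 = 1, 0, 3, 4, 2
a12, b12 = r * (k - 1) + l2, l2 - l1
a22 = (l2 - l1) * p212
b22 = r * (k - 1) + l2 + (l2 - l1) * (p111 - p211)
tau = {t: k * (b22 * Q[t] - b12 * sum(Q[u] for u in first[t]))
          / (a12 * b22 - a22 * b12) for t in Q}       # (6.156)

ss_total = sum(y * y for blk in blocks for y in blk.values()) - G * G / (b * k)
ss_block = sum(Bi * Bi for Bi in B) / k - G * G / (b * k)
ss_treat = sum(tau[t] * Q[t] for t in Q)
ss_error = ss_total - ss_block - ss_treat
F = (ss_treat / (v - 1)) / (ss_error / (b * k - b - v + 1))
print(round(ss_treat, 3), round(ss_error, 3), round(F, 2))
```

With these data the statistic comes out near F ≈ 2.66, below the 5% critical value F_{9,6;0.95} = 4.10, in agreement with the analysis above.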
Now we consider the analysis of the PBIBD for the data in Table 6.18. The parameters of this PBIBD are b = 10, v = 10, r = 6, k = 6, λ1 = 3, λ2 = 4, n1 = 6 and n2 = 3. The diagonal and off-diagonal elements of the C-matrix are

c_jj = 5    (j = 1, 2, . . . , 10),

c_jj′ = −1/2   if treatments j and j′ are first associates,
c_jj′ = −2/3   if treatments j and j′ are second associates
(j ≠ j′ = 1, 2, . . . , 10).

The block totals Bj, the treatment totals Vj, the sums T**_j of the block totals in which the jth treatment occurs, and the values of Qj, Qj2 and τ̂*_j (j = 1, 2, . . . , 10) are given in Table 6.20.

Table 6.20. Calculations for the second PBIBD for the data in Table 6.18

j     Bj      Vj      T**_j    Qj        Qj2       τ̂*_j
1     20.2    23.5    131.7     1.550    −4.033     0.261
2     21.1    23.4    135.2     0.867    −2.333     0.145
3     21.9    21.9    130.0     0.233    −0.167     0.042
4     24.2    21.3    124.7     0.517     0.200     0.095
5     22.3    22.8    130.1     1.117    −1.333     0.196
6     21.4    22.0    128.3     0.617     0.967     0.118
7     20.8    20.6    128.8    −0.867    −0.433    −0.160
8     24.5    19.7    127.4    −1.533     1.200    −0.273
9     22.1    21.5    131.5    −0.417     2.400    −0.063
10    18.6    20.4    134.9    −2.083     3.533    −0.361

Here, using (6.165)-(6.168), we have a*_11 = 33, b*_11 = −1, a*_21 = −2 and b*_21 = 34, so that (cf. (6.164))

τ̂*_j = 6(34Qj + Qj2)/1120 .

Thus

SS_Treat(adj) = 2.22 ,
SS_Block(unadj) = 4.63 ,
SS_Total = 11.91 ,
SS_Error(t) = 5.07 ,

and

F_Tr = 1.99 ,

with F_{9,41;0.95} = 2.12. Since F_Tr < 2.12, H0(t) is not rejected at the 5% level of significance. The corresponding analysis of variance table is given in Table 6.21.

Table 6.21. Intrablock analysis of variance of the second PBIBD of the data in Table 6.18

Source                         SS       df    MS       F
Between treatments (adjusted)  2.22      9    0.246    1.99
Between blocks (unadjusted)    4.63      9
Intrablock error               5.07     41    0.124
Total                          11.91    59
6.7 Exercises and Questions
6.7.1 From the following incidence matrix of a design, obtain the estimable treatment contrasts and the degrees of freedom associated with the adjusted treatment and adjusted block sums of squares:

1 1 1 0 0 0
0 0 0 1 1 1
0 1 1 0 0 0
0 0 0 1 1 0
6.7.2 It is proposed to test seven treatments A, B, C, D, E, F and G according to one of the three plans shown in Table 6.22. Which plan would you recommend, and why?

Table 6.22. Plans for testing seven treatments in Exercise 6.7.2

          Plan I      Plan II     Plan III
Block 1   A, B, C     A, B, C     A, B, C
Block 2   B, F, D     B, C, D     A, C, D
Block 3   C, D, G     C, D, A     A, D, E
Block 4   D, A, E     D, A, B     A, E, F
Block 5   E, C, F     D, F, G     A, F, G
Block 6   F, G, A     F, G, E     A, G, B
Block 7   G, E, B     G, E, D     -
Block 8   -           E, D, F     -
6.7.3 Form the analysis of variance appropriate to the design whose incidence matrix is N = 2(1_v 1′_b) and compare it with that of a design whose incidence matrix is N = 1_v 1′_b.
6.7.4 Let the incidence matrix of a design be

1 1 1 0
1 1 0 1
1 0 1 1
0 1 1 1

Show that the design is connected and balanced and that its efficiency factor is E = 8/9.
6.7.5 Show that a necessary and sufficient condition for all elementary treatment contrasts to be estimable with the same precision is that C has (v − 1) equal nonzero eigenvalues.
6.7.6 In the intrablock analysis of variance of an incomplete block design with model specification as in (6.1), show that

(i) E(Q) = Cτ , V(Q) = Cσ² ,

(ii) E(P) = Dβ , V(P) = Dσ² .
[Hint (alternative approach): Model (6.1) can be expressed as
$$y = \mu\mathbf{1}_n + D_1'\tau + D_2'\beta + \varepsilon,$$
where $D_1$ is the $(v \times n)$ matrix of treatments versus observations, i.e.,
$$(i,j)\text{th element of } D_1 = \begin{cases} 1 & \text{if the } j\text{th observation comes from the } i\text{th treatment}, \\ 0 & \text{otherwise}. \end{cases}$$
Similarly, $D_2$ is the $(b \times n)$ matrix of blocks versus observations, i.e.,
$$(i,j)\text{th element of } D_2 = \begin{cases} 1 & \text{if the } j\text{th observation comes from the } i\text{th block}, \\ 0 & \text{otherwise}. \end{cases}$$
Now $D_1D_1' = R$, $D_2D_2' = K$, $D_1D_2' = N'$, $D_1\mathbf{1}_n = (r_1, r_2, \ldots, r_v)'$, $D_2\mathbf{1}_n = (k_1, k_2, \ldots, k_b)'$, $D_1'\mathbf{1}_v = \mathbf{1}_n = D_2'\mathbf{1}_b$, $V = (V_1, V_2, \ldots, V_v)' = D_1y$, and $B = (B_1, B_2, \ldots, B_b)' = D_2y$. So
$$Q = V - N'K^{-1}B = [D_1 - D_1D_2'(D_2D_2')^{-1}D_2]\,y,$$
$$P = B - NR^{-1}V = [D_2 - D_2D_1'(D_1D_1')^{-1}D_1]\,y,$$
$$\begin{aligned} E(Q) &= [D_1 - D_1D_2'(D_2D_2')^{-1}D_2]\,E(\mu\mathbf{1}_n + D_1'\tau + D_2'\beta) \\ &= \bigl[(r_1, \ldots, r_v)' - N'K^{-1}(k_1, \ldots, k_b)'\bigr]\mu + \bigl[R - N'K^{-1}N\bigr]\tau + \bigl[N' - N'K^{-1}K\bigr]\beta \\ &= (R - N'K^{-1}N)\,\tau, \end{aligned}$$
$$\begin{aligned} \mathrm{V}(Q) &= D_1\bigl[I_n - D_2'(D_2D_2')^{-1}D_2\bigr]\mathrm{V}(y)\bigl[I_n - D_2'(D_2D_2')^{-1}D_2\bigr]D_1' \\ &= \sigma^2 D_1\bigl[I_n - D_2'(D_2D_2')^{-1}D_2\bigr]D_1' \\ &= \sigma^2\bigl[R - N'K^{-1}N\bigr].\Bigr] \end{aligned}$$
6.7.7 Show that the determinant of
$$\begin{pmatrix} C & N' \\ -N & K \end{pmatrix}$$
is $\bigl(\prod_{i=1}^{b} k_i\bigr)\bigl(\prod_{j=1}^{v} r_j\bigr)$, and that
$$\Bigl(w_1 C + \frac{w_2}{k}\,N'N\Bigr)^{-1} r = \frac{1}{w_2}\,\mathbf{1}_v,$$
where $r = (r_1, r_2, \ldots, r_v)'$, $w_1 = 1/\sigma^2$, and $w_2 = 1/(\sigma^2 + k\sigma_\beta^2)$. When $r_1 = r_2 = \ldots = r_v = r$, show that the average variance of all elementary treatment contrasts with recovery of interblock information is
$$\frac{2\Bigl[\operatorname{tr}\bigl\{\bigl(w_1 C + \frac{w_2}{k}N'N\bigr)^{-1}\bigr\} - \frac{1}{w_2 r}\Bigr]}{v-1}.$$
6.7.8 Show that in a connected design $Q_j + r_j G/n$ $(j = 1, 2, \ldots, v)$ are linearly independent. Hence show that $(C + rr'/n)$ is nonsingular and $(C + rr'/n)^{-1} r = \mathbf{1}_v$, where $r = (r_1, r_2, \ldots, r_v)'$ and $n = \sum_{j=1}^{v} r_j$.
6.7.9 Show that the variance of the best linear unbiased estimator of an elementary treatment contrast in a connected block design lies between $2\sigma^2/\lambda_{\max}$ and $2\sigma^2/\lambda_{\min}$, where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest positive characteristic roots of $C$. [Hint: Consider $\mathrm{Var}(l'\hat\tau)$ and use $\min_l\, l'C^{-}l/l'l = 1/\lambda_{\max}$ and $\max_l\, l'C^{-}l/l'l = 1/\lambda_{\min}$, where $C^{-}$ denotes a generalized inverse of $C$.]
6.7.10 If $km$ treatments are divided into $m$ sets of $k$ each, the treatments of a set are assigned to $k$-plot blocks, and there are $r$ replications, show that the design is such that the adjusted block effects and adjusted treatment effects are mutually orthogonal.
6.7.11 Let $N$ be the incidence matrix of a symmetrical BIBD. Consider the matrix
$$M = \begin{pmatrix} -k & \sqrt{-\lambda}\,\mathbf{1}_v' \\ \sqrt{-\lambda}\,\mathbf{1}_v & N \end{pmatrix}.$$
Show that $MM' = M'M = (r - \lambda)I_{v+1}$ and hence $NN' = N'N$.
6.7.12 Let $N$ be the incidence matrix of a BIBD.
(i) Show that the determinant of $N'N$ is zero when the BIBD is non-symmetrical.
(ii) Show that the eigenvalues of $NN'$ are $rk$ and $r - \lambda$ with multiplicities $1$ and $v - 1$, respectively.
6.7.13 Show that in the case of a PBIBD the eigenvalues of $NN'$ are the eigenvalues of $A$ with appropriate multiplicities, where $A$ is the matrix with off-diagonal elements $a_{ij} = \sum_{l=1}^{m} \lambda_l p_{li}^{j} - n_i\lambda_i$ $(i \neq j)$ and diagonal elements $a_{ii} = r + \sum_{l=1}^{m} \lambda_l p_{li}^{i} - n_i\lambda_i$ $(i, j = 1, 2, \ldots, m)$.
6.7.14 Prove that a BIBD is always connected unless $k = 1$.
6.7.15 Prove that for a BIBD the inequality $b \geq v + r - k$ holds. Is this inequality equivalent to Fisher's inequality?
6.7.16 Prove that for a BIBD with $k > 1$,
$$b \geq 3(r - \lambda).$$
6.7.17 Show that if in a BIBD $b = 3r - 2\lambda$, then $r > 2\lambda$.
6.7.18 For a symmetrical BIBD, show that the adjusted block sum of squares is given by $\sum_{j=1}^{v} W_j^2/[\lambda v(v-1)(v-k)]$, where $W_j = (v-k)V_j - (v-1)T_j + (k-1)G$.
6.7.19 Prove the non-existence of the following PBIBDs based on the triangular association scheme:
(i) $v = 15 = b$, $r = 5 = k$, $\lambda_1 = 1$, $\lambda_2 = 2$;
(ii) $v = 21 = b$, $r = 10 = k$, $\lambda_1 = 1$, $\lambda_2 = 2$;
(iii) $v = 36 = b$, $r = 8 = k$, $\lambda_1 = 1$, $\lambda_2 = 2$.
7 Multifactor Experiments
7.1 Elementary Definitions and Principles
In practice, for most designed experiments it can be assumed that the response Y is not only dependent on a single variable but on a whole group of prognostic factors. If these variables are continuous, their influence on the response is taken into account by so-called factor levels. These are ranges (e.g., low, medium, high) that classify the continuous variables as ordinal variables. In Sections 1.7 and 1.8, we have already cited examples of designed experiments where the dependence of a response on two factors was to be examined.
Designs of experiments that analyze the response for all possible combinations of two or more factors are called factorial experiments or cross-classifications. Suppose that we have $s$ factors $A_1, \ldots, A_s$ with $r_1, \ldots, r_s$ factor levels. The complete factorial design then requires $r = \prod r_i$ observations for one trial. This shows that it is important to restrict the number of factors as well as the number of their levels.
For factorial experiments, two elementary models are distinguished: models with and without interaction. Assume the situation of two factors A and B with two factor levels each, i.e., $A_1, A_2$ and $B_1, B_2$.
The change in response produced by a change in the level of a factor is called the main effect of this factor. Considering Table 7.1, the main effect of Factor A can be interpreted as the difference between the average
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_7, © Springer Science+Business Media, LLC 2009
response of the two factor levels $A_1$ and $A_2$:
$$\lambda_A = \frac{60 - 40}{2} = 10.$$
Similarly, the main effect of Factor B is
$$\lambda_B = \frac{70 - 30}{2} = 20.$$
            Factor B
Factor A    B1    B2    Σ
A1          10    30    40
A2          20    40    60
Σ           30    70    100

Table 7.1. Two-factorial experiment without interaction.
The effects of Factor A at the two levels of Factor B are
for B1: 20 − 10 = 10;   for B2: 40 − 30 = 10,
and hence identical for both levels of Factor B. For the effect of Factor B we have
for A1: 30 − 10 = 20;   for A2: 40 − 20 = 20,
so that no effect dependent on Factor A can be seen. The response lines are parallel.
The analysis of Table 7.2, however, leads to the following effects:
main effect $\lambda_A = \frac{80 - 40}{2} = 20$,
main effect $\lambda_B = \frac{90 - 30}{2} = 30$,
            Factor B
Factor A    B1    B2    Σ
A1          10    30    40
A2          20    60    80
Σ           30    90    120

Table 7.2. Two-factorial experiment with interaction.
effects of Factor A:
for B1: 20− 10 = 10; for B2: 60− 30 = 30,
effects of Factor B:
for A1: 30− 10 = 20; for A2: 60− 20 = 40.
Figure 7.1. Two-factorial experiment without interaction.
Here the effects depend on the levels of the other factor; the interaction effect amounts to 20. The response lines are no longer parallel (Figure 7.2).
Remark. The term factorial experiment describes the completely crossed combination of the factors (treatments) and not the design of experiment. Factorial experiments may be realized as completely randomized designs of experiments, as Latin squares, etc.
The factorial experiment should be used:
• in pilot studies that analyze the statistical relevance of possible covariates;
• for the determination of bivariate interaction; and
• for the determination of possible rank orders of the factors related to their influence on the response.
Compared to experiments with a single factor, the factorial experiment has the advantage that the main effects may be estimated with the same precision, but with a smaller sample size.
Assume that we want to estimate the main effects A and B as in the above examples. The following one-factor experiment with two repetitions would be appropriate (cf. Montgomery, 1976, p. 124):
$$\begin{array}{ccc} A_1B_1^{(1)} & A_1B_2^{(1)} & A_2B_1^{(1)} \\ A_1B_1^{(2)} & A_1B_2^{(2)} & A_2B_1^{(2)} \end{array}$$
Figure 7.2. Two-factorial experiment with interaction.
n = 3 + 3 = 6 observations
estimation of $\lambda_A$: $\tfrac{1}{2}\bigl[(A_2B_1^{(1)} - A_1B_1^{(1)}) + (A_2B_1^{(2)} - A_1B_1^{(2)})\bigr]$,
estimation of $\lambda_B$: $\tfrac{1}{2}\bigl[(A_1B_2^{(1)} - A_1B_1^{(1)}) + (A_1B_2^{(2)} - A_1B_1^{(2)})\bigr]$.
Estimation of the effects with the same precision is achieved by the factorial experiment
$$\begin{array}{cc} A_1B_1 & A_1B_2 \\ A_2B_1 & A_2B_2 \end{array}$$
with only $n = 4$ observations, according to
$$\lambda_A = \tfrac{1}{2}\bigl[(A_2B_1 - A_1B_1) + (A_2B_2 - A_1B_2)\bigr]$$
and
$$\lambda_B = \tfrac{1}{2}\bigl[(A_1B_2 - A_1B_1) + (A_2B_2 - A_2B_1)\bigr].$$
Additionally, the factorial experiment reveals existing interaction and hence leads to an adequate model.
If a present interaction is neglected or not revealed, a serious misinterpretation of the main effects may be the consequence. In principle, if significant interaction is present, then the main effects are of secondary importance, since the effect of one factor on the response can no longer be segregated from that of the other factor.
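The arithmetic of main effects and interaction for a 2×2 table can be sketched in a few lines (a minimal illustration using the cell entries of Tables 7.1 and 7.2; the function name is ours, not the book's):

```python
def effects_2x2(y11, y12, y21, y22):
    """Main effects and interaction for a 2x2 table.

    y_ij is the response at level A_i, B_j (cf. Tables 7.1 and 7.2).
    """
    lam_A = ((y21 + y22) - (y11 + y12)) / 2   # main effect of Factor A
    lam_B = ((y12 + y22) - (y11 + y21)) / 2   # main effect of Factor B
    # Interaction: change of the A-effect across the B-levels
    inter = (y22 - y12) - (y21 - y11)
    return lam_A, lam_B, inter

# Table 7.1 (no interaction):
print(effects_2x2(10, 30, 20, 40))   # (10.0, 20.0, 0)
# Table 7.2 (with interaction):
print(effects_2x2(10, 30, 20, 60))   # (20.0, 30.0, 20)
```

An interaction of zero is exactly the "parallel response lines" case of Figure 7.1.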
7.2 Two–Factor Experiments (Fixed Effects)
Suppose that there are a levels of Factor A and b levels of Factor B. For each combination (i, j), r replicates are realized, and the design is a completely randomized design. Hence the number of observations equals N = rab. The response is described by the linear model
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \quad (i = 1, \ldots, a,\; j = 1, \ldots, b,\; k = 1, \ldots, r), \qquad (7.1)$$
where we have:
y_ijk is the response to the ith level of Factor A and the jth level of Factor B in the kth replicate;
μ is the overall mean;
α_i is the effect of the ith level of Factor A;
β_j is the effect of the jth level of Factor B;
(αβ)_ij is the effect of the interaction of the combination (i, j); and
ε_ijk is the random error.
The following assumption is made for ε′ = (ε111, . . . , εabr):
ε ∼ N(0, σ2I) . (7.2)
For the fixed effects, we have the following constraints:
$$\sum_{i=1}^{a} \alpha_i = 0, \qquad (7.3)$$
$$\sum_{j=1}^{b} \beta_j = 0, \qquad (7.4)$$
$$\sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0. \qquad (7.5)$$
Remark. If the randomized block design is chosen as the design of experiment, the model (7.1) additionally contains the (additive) block effects $\rho_k$ as random effects with $\rho_k \sim N(0, \sigma_\rho^2)$.
            B
A        1      2      · · ·   b       Σ       Means
1        Y_11·  Y_12·  · · ·   Y_1b·   Y_1··   y_1··
2        Y_21·  Y_22·  · · ·   Y_2b·   Y_2··   y_2··
...
a        Y_a1·  Y_a2·  · · ·   Y_ab·   Y_a··   y_a··
Σ        Y_·1·  Y_·2·  · · ·   Y_·b·   Y_···   y_···
Means    y_·1·  y_·2·  · · ·   y_·b·

Table 7.3. Table of the total response values in the (A×B)-design.
Source           SS         df                  MS         F
Factor A         SS_A       a − 1               MS_A       F_A
Factor B         SS_B       b − 1               MS_B       F_B
Interaction A×B  SS_A×B     (a − 1)(b − 1)      MS_A×B     F_A×B
Error            SS_Error   N − ab = ab(r − 1)  MS_Error
Total            SS_Total   N − 1

Table 7.4. Analysis of variance table in the (A×B)-design with interaction.
Ordinary Least Squares Estimation of the Parameters
The score function (3.6) in model (7.1) is as follows:
$$S(\theta) = \sum_i \sum_j \sum_k \bigl(y_{ijk} - \mu - \alpha_i - \beta_j - (\alpha\beta)_{ij}\bigr)^2 \qquad (7.6)$$
under the constraints (7.3)–(7.5). Here
$$\theta' = (\mu, \alpha_1, \ldots, \alpha_a, \beta_1, \ldots, \beta_b, (\alpha\beta)_{11}, \ldots, (\alpha\beta)_{ab}) \qquad (7.7)$$
is the vector of the unknown parameters. The normal equations, taking the restrictions (7.3)–(7.5) into consideration, can easily be derived:
$$-\frac{1}{2}\,\frac{\partial S(\theta)}{\partial\mu} = \sum\sum\sum\bigl(y_{ijk} - \mu - \alpha_i - \beta_j - (\alpha\beta)_{ij}\bigr) = Y_{\cdot\cdot\cdot} - N\mu = 0, \qquad (7.8)$$
$$-\frac{1}{2}\,\frac{\partial S(\theta)}{\partial\alpha_i} = Y_{i\cdot\cdot} - br\alpha_i - br\mu = 0 \quad (i \text{ fixed}), \qquad (7.9)$$
$$-\frac{1}{2}\,\frac{\partial S(\theta)}{\partial\beta_j} = Y_{\cdot j\cdot} - ar\beta_j - ar\mu = 0 \quad (j \text{ fixed}), \qquad (7.10)$$
$$-\frac{1}{2}\,\frac{\partial S(\theta)}{\partial(\alpha\beta)_{ij}} = Y_{ij\cdot} - r\mu - r\alpha_i - r\beta_j - r(\alpha\beta)_{ij} = 0 \quad (i, j \text{ fixed}). \qquad (7.11)$$
We now obtain the OLS estimates under the constraints (7.3)–(7.5), that is, the conditional OLS estimates
$$\hat\mu = Y_{\cdot\cdot\cdot}/N = \bar y_{\cdot\cdot\cdot}, \qquad (7.12)$$
$$\hat\alpha_i = \frac{Y_{i\cdot\cdot}}{br} - \hat\mu = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad (7.13)$$
$$\hat\beta_j = \frac{Y_{\cdot j\cdot}}{ar} - \hat\mu = \bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}, \qquad (7.14)$$
$$\widehat{(\alpha\beta)}_{ij} = \frac{Y_{ij\cdot}}{r} - \hat\mu - \hat\alpha_i - \hat\beta_j = \bar y_{ij\cdot} - \bar y_{i\cdot\cdot} - \bar y_{\cdot j\cdot} + \bar y_{\cdot\cdot\cdot}. \qquad (7.15)$$
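The estimates (7.12)–(7.15) are just differences of cell, row, column, and grand means, which makes them easy to compute directly (a sketch; the data layout y[i][j] and the function name are ours, and the numbers are those of Example 7.1 below):

```python
def anova_effects(y):
    """Conditional OLS estimates (7.12)-(7.15) for a balanced two-factor design.

    y[i][j] is the list of the r replicates observed at A-level i, B-level j.
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    N = a * b * r
    mu = sum(x for row in y for cell in row for x in cell) / N          # (7.12)
    yi = [sum(x for cell in row for x in cell) / (b * r) for row in y]  # A-level means
    yj = [sum(x for i in range(a) for x in y[i][j]) / (a * r) for j in range(b)]
    alpha = [m - mu for m in yi]                                        # (7.13)
    beta = [m - mu for m in yj]                                         # (7.14)
    ab = [[sum(y[i][j]) / r - yi[i] - yj[j] + mu                        # (7.15)
           for j in range(b)] for i in range(a)]
    return mu, alpha, beta, ab

# Data of Example 7.1 (Table 7.5), a = b = r = 2:
mu, alpha, beta, ab = anova_effects([[[8.6, 9.2], [10.4, 11.4]],
                                     [[4.7, 3.9], [14.1, 15.3]]])
print(round(mu, 2), round(alpha[0], 2), round(beta[0], 2), round(ab[0][0], 2))
# → 9.7 0.2 -3.1 2.1
```

Note that the estimates automatically satisfy the constraints (7.3)–(7.5).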
The correction term is defined as
$$C = Y_{\cdot\cdot\cdot}^2/N \qquad (7.16)$$
with $N = abr$. The sums of squares can now be expressed as follows:
$$SS_{\text{Total}} = \sum\sum\sum (y_{ijk} - \bar y_{\cdot\cdot\cdot})^2 = \sum\sum\sum y_{ijk}^2 - C, \qquad (7.17)$$
$$SS_A = \frac{1}{br}\sum_i Y_{i\cdot\cdot}^2 - C, \qquad (7.18)$$
$$SS_B = \frac{1}{ar}\sum_j Y_{\cdot j\cdot}^2 - C, \qquad (7.19)$$
$$SS_{A\times B} = \frac{1}{r}\sum_i\sum_j Y_{ij\cdot}^2 - \frac{1}{br}\sum_i Y_{i\cdot\cdot}^2 - \frac{1}{ar}\sum_j Y_{\cdot j\cdot}^2 + C = \Bigl[\frac{1}{r}\sum_i\sum_j Y_{ij\cdot}^2 - C\Bigr] - SS_A - SS_B, \qquad (7.20)$$
$$SS_{\text{Error}} = SS_{\text{Total}} - SS_A - SS_B - SS_{A\times B} = SS_{\text{Total}} - \Bigl[\frac{1}{r}\sum_i\sum_j Y_{ij\cdot}^2 - C\Bigr]. \qquad (7.21)$$
Remark. The sum of squares between the $a \cdot b$ sums of response $Y_{ij\cdot}$ is also called $SS_{\text{Subtotal}}$, i.e.,
$$SS_{\text{Subtotal}} = \frac{1}{r}\sum_i\sum_j Y_{ij\cdot}^2 - C. \qquad (7.22)$$
Hint. In order to ensure that the interaction effect is detectable (and hence $(\alpha\beta)_{ij}$ can be estimated), in the balanced design at least $r = 2$ replicates have to be realized for each combination $(i, j)$. Otherwise, the interaction effect is included in the error and cannot be separated.
Test Procedure
The model (7.1) with interaction is called a saturated model. The model without interaction,
$$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \qquad (7.23)$$
is called the independence model.
First, the hypothesis $H_0: (\alpha\beta)_{ij} = 0$ (for all $(i,j)$) against $H_1: (\alpha\beta)_{ij} \neq 0$ (for at least one pair $(i,j)$) is tested. This corresponds to the model choice of the submodel (7.23) compared to the complete model (7.1), according to our likelihood-ratio test strategy in Chapter 3. The interpretation of inferences obtained from the factorial experiment depends on the result of this test.
$H_0$ is rejected if
$$F_{A\times B} = \frac{MS_{A\times B}}{MS_{\text{Error}}} > F_{(a-1)(b-1),\,ab(r-1);\,1-\alpha}. \qquad (7.24)$$
The interaction effects are significant in the case of a rejection of $H_0$. The main effects are then of no importance, no matter whether they are significant or not.
Remark: This test procedure is a kind of philosophy representing one school of thought. One could also consider a less dogmatic view: if the main effect (being, for example, the average over the levels of the other factor) is sensible within an application, its test could also be interpretable and meaningful even in the presence of an interaction.
If, however, $H_0$ is not rejected, then the test results for $H_0: \alpha_i = 0$ against $H_1: \alpha_i \neq 0$ (for at least one $i$) with $F_A = MS_A/MS_{\text{Error}}$, and for $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ (for at least one $j$) with $F_B = MS_B/MS_{\text{Error}}$, are of importance for the interpretation in model (7.23). If only one factor effect is significant (e.g., Factor A), then the model is reduced further to a balanced one-factor model with $a$ factor levels and $br$ replicates each:
$$y_{ijk} = \mu + \alpha_i + \varepsilon_{ijk}. \qquad (7.25)$$
Example 7.1. The influence of two factors A (fertilizer) and B (irrigation) on the yield of a type of grain is to be analyzed in a pilot study. The Factors A and B are applied at two levels (low, high) with $r = 2$ replicates each. Hence, we have $a = b = r = 2$ and $N = abr = 8$. The experimental units (plants) are assigned to the treatments at random. From Tables 7.5 and
7.6, we calculate
C = 77.6²/8 = 752.72,
SS_Total = 866.92 − C = 114.20,
SS_A = ¼(39.6² + 38.0²) − C = 753.04 − 752.72 = 0.32,
SS_B = ¼(26.4² + 51.2²) − C = 829.60 − 752.72 = 76.88,
SS_Subtotal = ½(17.8² + 21.8² + 8.6² + 29.4²) − C = 865.20 − 752.72 = 112.48,
SS_A×B = SS_Subtotal − SS_A − SS_B = 35.28,
SS_Error = 114.20 − 35.28 − 0.32 − 76.88 = 1.72.
            Factor B
Factor A    1           2
1           8.6   9.2   10.4  11.4
2           4.7   3.9   14.1  15.3

Table 7.5. Response values.
            Factor B
Factor A    1      2      Σ
1           17.8   21.8   39.6
2           8.6    29.4   38.0
Σ           26.4   51.2   77.6

Table 7.6. Total response.
Source   SS       df   MS      F
A        0.32     1    0.32    0.74
B        76.88    1    76.88   178.79 *
A×B      35.28    1    35.28   82.05 *
Error    1.72     4    0.43
Total    114.20   7

Table 7.7. Analysis of variance table for Example 7.1.
Result: The test for interaction leads to a rejection of $H_0$: no interaction, with $F_{1,4} = 82.05$ ($F_{1,4;0.95} = 7.71$). A reduction to an experiment with a single factor is not possible, in spite of the nonsignificant main effect A.
Figure 7.3. Interaction in Example 7.1.
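The computations of Example 7.1 can be reproduced in a few lines (a sketch; all variable names are ours):

```python
# Replicates per cell from Table 7.5, keyed by (A-level, B-level)
data = {(1, 1): [8.6, 9.2], (1, 2): [10.4, 11.4],
        (2, 1): [4.7, 3.9], (2, 2): [14.1, 15.3]}
a = b = r = 2
N = a * b * r
G = sum(sum(v) for v in data.values())                      # Y... = 77.6
C = G ** 2 / N                                              # correction term (7.16)
ss_total = sum(x * x for v in data.values() for x in v) - C
Yi = [sum(sum(v) for (i, j), v in data.items() if i == lev) for lev in (1, 2)]
Yj = [sum(sum(v) for (i, j), v in data.items() if j == lev) for lev in (1, 2)]
ss_a = sum(t * t for t in Yi) / (b * r) - C                 # (7.18)
ss_b = sum(t * t for t in Yj) / (a * r) - C                 # (7.19)
ss_sub = sum(sum(v) ** 2 for v in data.values()) / r - C    # (7.22)
ss_ab = ss_sub - ss_a - ss_b                                # (7.20)
ss_error = ss_total - ss_sub                                # (7.21)
print(round(ss_a, 2), round(ss_b, 2), round(ss_ab, 2), round(ss_error, 2))
# → 0.32 76.88 35.28 1.72
```

The printed values agree with Table 7.7.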
7.3 Two–Factor Experiments in Effect Coding
In the above section, we have derived the parameter estimates of the components of $\theta$ (7.7) by minimizing the error sum of squares under the linear restrictions $\sum_i \alpha_i = 0$, $\sum_j \beta_j = 0$, and $\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$. This corresponds to the conditional OLS estimate $b(R)$ from (3.76).
We now want to achieve a reduction in the number of parameters. This is done by an alternative parametrization that includes the restrictions already in the model. The result is a set of parameters that corresponds to a design matrix of full column rank. The parameter estimation is now achieved by the OLS estimate $b_0$. For this purpose we use the so-called effect coding of the categories. The effect coding for Factor A with $a$ categories (levels) is as follows:
$$x_i^A = \begin{cases} 1 & \text{for category } i \ (i = 1, \ldots, a-1), \\ -1 & \text{for category } a, \\ 0 & \text{else}, \end{cases}$$
so that
$$\alpha_a = -\sum_{i=1}^{a-1} \alpha_i, \qquad (7.26)$$
or, expressed differently,
$$\sum_{i=1}^{a} \alpha_i = 0. \qquad (7.27)$$
Example: Assume Factor A has $a = 3$ levels: $A_1$ low, $A_2$ medium, $A_3$ high. The original link of design and parameters is as follows:
$$\begin{matrix} \text{low:} \\ \text{medium:} \\ \text{high:} \end{matrix} \quad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} \quad \text{and} \quad \alpha_1 + \alpha_2 + \alpha_3 = 0.$$
If effect coding is applied, we obtain
$$\begin{matrix} \text{low:} \\ \text{medium:} \\ \text{high:} \end{matrix} \quad \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}.$$
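The effect-coding rows for a factor with any number of levels can be generated mechanically (a sketch; `effect_code` is our name, not the book's):

```python
def effect_code(a):
    """Effect-coding rows for a factor with a levels: an (a x (a-1)) matrix.

    Level i < a gets the i-th unit row; level a gets a row of -1s, which
    builds the constraint sum_i alpha_i = 0 into the model (cf. (7.27)).
    """
    rows = []
    for level in range(1, a + 1):
        if level < a:
            rows.append([1 if col == level else 0 for col in range(1, a)])
        else:
            rows.append([-1] * (a - 1))
    return rows

print(effect_code(3))  # [[1, 0], [0, 1], [-1, -1]], as in the example above
```

Each column sums to zero, which is what produces the orthogonality used below.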
Case a = b = 2
In the case of a linear model with two two-level prognostic Factors A and B, we have, for fixed $k$ $(k = 1, \ldots, r)$, the following parametrization (cf. Toutenburg, 1992a, p. 255):
$$\begin{pmatrix} y_{11k} \\ y_{12k} \\ y_{21k} \\ y_{22k} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \beta_1 \\ (\alpha\beta)_{11} \end{pmatrix} + \begin{pmatrix} \varepsilon_{11k} \\ \varepsilon_{12k} \\ \varepsilon_{21k} \\ \varepsilon_{22k} \end{pmatrix}. \qquad (7.28)$$
Here we get the constraints immediately:
$$\alpha_1 + \alpha_2 = 0 \;\Rightarrow\; \alpha_2 = -\alpha_1,$$
$$\beta_1 + \beta_2 = 0 \;\Rightarrow\; \beta_2 = -\beta_1,$$
$$(\alpha\beta)_{11} + (\alpha\beta)_{12} = 0 \;\Rightarrow\; (\alpha\beta)_{12} = -(\alpha\beta)_{11},$$
$$(\alpha\beta)_{11} + (\alpha\beta)_{21} = 0 \;\Rightarrow\; (\alpha\beta)_{21} = -(\alpha\beta)_{11},$$
$$(\alpha\beta)_{21} + (\alpha\beta)_{22} = 0 \;\Rightarrow\; (\alpha\beta)_{22} = -(\alpha\beta)_{21} = (\alpha\beta)_{11}.$$
Of the original nine parameters, only four remain in the model. The others are calculated from these equations. The following notation is used:
$$X_{11} = (\mathbf{1}_r\ \mathbf{1}_r\ \mathbf{1}_r\ \mathbf{1}_r), \quad X_{12} = (\mathbf{1}_r\ \mathbf{1}_r\ {-\mathbf{1}_r}\ {-\mathbf{1}_r}), \quad X_{21} = (\mathbf{1}_r\ {-\mathbf{1}_r}\ \mathbf{1}_r\ {-\mathbf{1}_r}), \quad X_{22} = (\mathbf{1}_r\ {-\mathbf{1}_r}\ {-\mathbf{1}_r}\ \mathbf{1}_r),$$
each of order $(r \times 4)$,
$$X' = (X_{11}'\ X_{12}'\ X_{21}'\ X_{22}') \quad \text{of order } (4 \times 4r), \qquad \theta_0' = (\mu, \alpha_1, \beta_1, (\alpha\beta)_{11}),$$
$$y_{ij} = \begin{pmatrix} y_{ij1} \\ \vdots \\ y_{ijr} \end{pmatrix}, \quad \varepsilon_{ij} = \begin{pmatrix} \varepsilon_{ij1} \\ \vdots \\ \varepsilon_{ijr} \end{pmatrix}, \quad y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{pmatrix}.$$
In the case of $a = b = 2$ and $r$ replicates, and considering the restrictions (7.3), (7.4), (7.5), the two-factorial model (7.1) can alternatively be expressed in effect coding:
$$y = X\theta_0 + \varepsilon. \qquad (7.29)$$
The OLS estimate of $\theta_0$ is
$$\hat\theta_0 = (X'X)^{-1}X'y.$$
We now calculate $\hat\theta_0$:
$$X'X = X_{11}'X_{11} + X_{12}'X_{12} + X_{21}'X_{21} + X_{22}'X_{22} = 4rI_4,$$
$$X'y = \begin{pmatrix} Y_{\cdot\cdot\cdot} \\ Y_{1\cdot\cdot} - Y_{2\cdot\cdot} \\ Y_{\cdot 1\cdot} - Y_{\cdot 2\cdot} \\ (Y_{11\cdot} + Y_{22\cdot}) - (Y_{12\cdot} + Y_{21\cdot}) \end{pmatrix} = \begin{pmatrix} Y_{\cdot\cdot\cdot} \\ 2Y_{1\cdot\cdot} - Y_{\cdot\cdot\cdot} \\ 2Y_{\cdot 1\cdot} - Y_{\cdot\cdot\cdot} \\ (Y_{11\cdot} + Y_{22\cdot}) - (Y_{12\cdot} + Y_{21\cdot}) \end{pmatrix}. \qquad (7.30)$$
With $(X'X)^{-1} = \frac{1}{4r}I$, the OLS estimate $\hat\theta_0 = (X'X)^{-1}X'y$ can be written in detail as (cf. (7.12)–(7.15))
$$\begin{pmatrix} \hat\mu \\ \hat\alpha_1 \\ \hat\beta_1 \\ \widehat{(\alpha\beta)}_{11} \end{pmatrix} = \begin{pmatrix} \bar y_{\cdot\cdot\cdot} \\ \bar y_{1\cdot\cdot} - \bar y_{\cdot\cdot\cdot} \\ \bar y_{\cdot 1\cdot} - \bar y_{\cdot\cdot\cdot} \\ \bar y_{11\cdot} - \bar y_{1\cdot\cdot} - \bar y_{\cdot 1\cdot} + \bar y_{\cdot\cdot\cdot} \end{pmatrix}. \qquad (7.31)$$
The first three relations in (7.31) can easily be detected. The transition from the fourth row in (7.30) to the fourth row in (7.31), however, has to be proven in detail.
With $a = b = 2$, we have
$$\begin{aligned} \bar y_{11\cdot} - \bar y_{1\cdot\cdot} - \bar y_{\cdot 1\cdot} + \bar y_{\cdot\cdot\cdot} &= \frac{Y_{11\cdot}}{r} - \Bigl[\frac{Y_{11\cdot}}{br} + \frac{Y_{12\cdot}}{br}\Bigr] - \Bigl[\frac{Y_{11\cdot}}{ar} + \frac{Y_{21\cdot}}{ar}\Bigr] + \frac{Y_{11\cdot} + Y_{12\cdot} + Y_{21\cdot} + Y_{22\cdot}}{abr} \\ &= \frac{Y_{11\cdot}}{r}\Bigl(1 - \frac{1}{b} - \frac{1}{a} + \frac{1}{ab}\Bigr) - \frac{Y_{12\cdot}}{br}\Bigl(1 - \frac{1}{a}\Bigr) - \frac{Y_{21\cdot}}{ar}\Bigl(1 - \frac{1}{b}\Bigr) + \frac{Y_{22\cdot}}{abr} \\ &= \frac{Y_{11\cdot}}{r}\Bigl(\frac{ab - a - b + 1}{ab}\Bigr) + \frac{Y_{22\cdot}}{abr} - \frac{Y_{12\cdot}}{abr}(a - 1) - \frac{Y_{21\cdot}}{abr}(b - 1) \\ &= \frac{1}{4r}\bigl[(Y_{11\cdot} + Y_{22\cdot}) - (Y_{12\cdot} + Y_{21\cdot})\bigr]. \end{aligned}$$
Remark. Here we wish to point out an important characteristic of effect coding in the case of equal numbers $r$ of replications. First, we write the matrix $X$ in a different form:
$$X = \begin{pmatrix} X_{11} \\ X_{12} \\ X_{21} \\ X_{22} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_r & \mathbf{1}_r & \mathbf{1}_r & \mathbf{1}_r \\ \mathbf{1}_r & \mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r \\ \mathbf{1}_r & -\mathbf{1}_r & \mathbf{1}_r & -\mathbf{1}_r \\ \mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r & \mathbf{1}_r \end{pmatrix} = \bigl(x_\mu \ x_{\alpha_1} \ x_{\beta_1} \ x_{(\alpha\beta)_{11}}\bigr),$$
where each column is of order $(4r \times 1)$,
so that
$$x_\mu'x_\mu = x_{\alpha_1}'x_{\alpha_1} = x_{\beta_1}'x_{\beta_1} = x_{(\alpha\beta)_{11}}'x_{(\alpha\beta)_{11}} = 4r,$$
$$x_\mu'x_{\alpha_1} = x_\mu'x_{\beta_1} = x_\mu'x_{(\alpha\beta)_{11}} = 0,$$
$$x_{\alpha_1}'x_{\beta_1} = x_{\alpha_1}'x_{(\alpha\beta)_{11}} = 0,$$
$$x_{\beta_1}'x_{(\alpha\beta)_{11}} = 0.$$
Hence, as we mentioned before, the following holds:
$$X'X = \begin{pmatrix} x_\mu' \\ x_{\alpha_1}' \\ x_{\beta_1}' \\ x_{(\alpha\beta)_{11}}' \end{pmatrix} \bigl(x_\mu \ x_{\alpha_1} \ x_{\beta_1} \ x_{(\alpha\beta)_{11}}\bigr) = 4rI_4.$$
The vectors that belong to different effect groups $(\mu, \alpha, \beta, (\alpha\beta))$ are orthogonal. This property remains true in general for effect coding.
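The relation X'X = 4rI can be verified numerically without any matrix library (a sketch for r = 3 replicates; variable names are ours):

```python
r = 3
# One row per cell (1,1), (1,2), (2,1), (2,2); columns mu, alpha1, beta1, (alphabeta)11
cells = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]
X = [row for row in cells for _ in range(r)]   # each cell row repeated r times
# Compute X'X entry by entry
XtX = [[sum(x[p] * x[q] for x in X) for q in range(4)] for p in range(4)]
print(XtX)  # 4r * I_4; here [[12, 0, 0, 0], [0, 12, 0, 0], [0, 0, 12, 0], [0, 0, 0, 12]]
```

The zero off-diagonal entries are exactly the orthogonality relations listed above.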
General Case: a > 2, b > 2
In the general case of a two-factorial model with interaction, with
Factor A: $a$ levels, and
Factor B: $b$ levels,
the parameter vector (after taking the constraints into account, i.e., in effect coding) is
$$\theta_0' = (\mu, \alpha_1, \ldots, \alpha_{a-1}, \beta_1, \ldots, \beta_{b-1}, (\alpha\beta)_{1,1}, \ldots, (\alpha\beta)_{a-1,b-1}) \qquad (7.32)$$
and the design matrix is
$$X = \bigl(x_\mu \ X_\alpha \ X_\beta \ X_{(\alpha\beta)}\bigr). \qquad (7.33)$$
Here the column vectors of a submatrix are orthogonal to the column vectors of every other submatrix, e.g.,
$$X_\alpha'X_\beta = 0.$$
The matrix $X'X$ is now block-diagonal,
$$X'X = \operatorname{diag}\bigl(x_\mu'x_\mu,\ X_\alpha'X_\alpha,\ X_\beta'X_\beta,\ X_{(\alpha\beta)}'X_{(\alpha\beta)}\bigr),$$
so that
$$(X'X)^{-1} = \operatorname{diag}\bigl((x_\mu'x_\mu)^{-1},\ (X_\alpha'X_\alpha)^{-1},\ (X_\beta'X_\beta)^{-1},\ (X_{(\alpha\beta)}'X_{(\alpha\beta)})^{-1}\bigr) \qquad (7.34)$$
and the OLS estimate $\hat\theta_0$ can be written as
$$\hat\theta_0 = \begin{pmatrix} \hat\mu \\ \hat\alpha \\ \hat\beta \\ \widehat{(\alpha\beta)} \end{pmatrix} = \begin{pmatrix} (x_\mu'x_\mu)^{-1}x_\mu'y \\ (X_\alpha'X_\alpha)^{-1}X_\alpha'y \\ (X_\beta'X_\beta)^{-1}X_\beta'y \\ (X_{(\alpha\beta)}'X_{(\alpha\beta)})^{-1}X_{(\alpha\beta)}'y \end{pmatrix}. \qquad (7.35)$$
For the covariance matrix of $\hat\theta$, we get a block-diagonal structure as well:
$$\mathrm{V}(\hat\theta) = \sigma^2 \operatorname{diag}\bigl((x_\mu'x_\mu)^{-1},\ (X_\alpha'X_\alpha)^{-1},\ (X_\beta'X_\beta)^{-1},\ (X_{(\alpha\beta)}'X_{(\alpha\beta)})^{-1}\bigr).$$
(7.36)
This shows that the estimation vectors $\hat\mu$, $\hat\alpha$, $\hat\beta$, $\widehat{(\alpha\beta)}$ are uncorrelated, and independent in the case of normal errors. From this it follows that the estimates $\hat\mu$, $\hat\alpha$, and $\hat\beta$ in model (7.1) with interaction and the estimates in the independence model (7.23) are identical. Hence, the estimates for one parameter group (e.g., the main effects of Factor B) are always the same, no matter whether the other parameters are contained in the model or not. Again, this holds only for balanced data.
In the case of rejection of $H_0: (\alpha\beta)_{ij} = 0$, $\sigma^2$ is estimated by
$$MS_{\text{Error}} = \frac{SS_{\text{Error}}}{N - ab} = \frac{1}{N - ab}\bigl(SS_{\text{Total}} - SS_A - SS_B - SS_{A\times B}\bigr)$$
(cf. Table 7.4 and (7.21)). If $H_0$ is not rejected, then the independence model (7.23) holds and we have
$$SS_{\text{Error}} = SS_{\text{Total}} - SS_A - SS_B$$
with $N - 1 - (a-1) - (b-1) = N - a - b + 1$ degrees of freedom.
The model (7.1) with interaction corresponds to the parameter space $\Omega$, according to our notation in Chapter 3. The independence model is the submodel with parameter space $\omega \subset \Omega$. With (B.77) we have
$$\hat\sigma_\omega^2 - \hat\sigma_\Omega^2 \geq 0. \qquad (7.37)$$
Applied to our problem, we find
$$\hat\sigma_\Omega^2 = \frac{SS_{\text{Total}} - SS_A - SS_B - SS_{A\times B}}{N - ab} \qquad (7.38)$$
and
$$\hat\sigma_\omega^2 = \frac{SS_{\text{Total}} - SS_A - SS_B}{N - ab + (a-1)(b-1)}. \qquad (7.39)$$
Interpretation. In the independence model, $\sigma^2$ is estimated by (7.39). Hence, the confidence intervals of the parameter estimates $\hat\mu$, $\hat\alpha$, and $\hat\beta$ are larger when compared with those obtained from the model with interaction.
On the other hand, the parameter estimates themselves (which correspond to the center points of the confidence intervals) stay unchanged. Thus, the precision of the estimates $\hat\mu$, $\hat\alpha$, and $\hat\beta$ decreases. Simultaneously, the test statistics change, so that in the case of a rejection of the saturated model (7.1), tests of significance for $\mu$, $\alpha$, and $\beta$, based on the analysis of variance table for the independence model, are to be carried out.
Case a = 2, b = 3
Considering the constraints (7.3)–(7.5), the model in effect coding is as follows:
$$\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_r & \mathbf{1}_r & \mathbf{1}_r & 0 & \mathbf{1}_r & 0 \\ \mathbf{1}_r & \mathbf{1}_r & 0 & \mathbf{1}_r & 0 & \mathbf{1}_r \\ \mathbf{1}_r & \mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r \\ \mathbf{1}_r & -\mathbf{1}_r & \mathbf{1}_r & 0 & -\mathbf{1}_r & 0 \\ \mathbf{1}_r & -\mathbf{1}_r & 0 & \mathbf{1}_r & 0 & -\mathbf{1}_r \\ \mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r & -\mathbf{1}_r & \mathbf{1}_r & \mathbf{1}_r \end{pmatrix} \begin{pmatrix} \mu \\ \alpha_1 \\ \beta_1 \\ \beta_2 \\ (\alpha\beta)_{11} \\ (\alpha\beta)_{12} \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \end{pmatrix}. \qquad (7.40)$$
Here we once again find the constraints immediately:
$$\alpha_1 + \alpha_2 = 0 \;\Rightarrow\; \alpha_2 = -\alpha_1,$$
$$\beta_1 + \beta_2 + \beta_3 = 0 \;\Rightarrow\; \beta_3 = -\beta_1 - \beta_2,$$
$$(\alpha\beta)_{11} + (\alpha\beta)_{21} = 0 \;\Rightarrow\; (\alpha\beta)_{21} = -(\alpha\beta)_{11},$$
$$(\alpha\beta)_{12} + (\alpha\beta)_{22} = 0 \;\Rightarrow\; (\alpha\beta)_{22} = -(\alpha\beta)_{12},$$
$$(\alpha\beta)_{13} + (\alpha\beta)_{23} = 0 \;\Rightarrow\; (\alpha\beta)_{23} = -(\alpha\beta)_{13},$$
$$(\alpha\beta)_{11} + (\alpha\beta)_{12} + (\alpha\beta)_{13} = 0 \;\Rightarrow\; (\alpha\beta)_{13} = -(\alpha\beta)_{11} - (\alpha\beta)_{12},$$
$$(\alpha\beta)_{21} + (\alpha\beta)_{22} + (\alpha\beta)_{23} = 0 \;\Rightarrow\; (\alpha\beta)_{23} = -(\alpha\beta)_{21} - (\alpha\beta)_{22} = (\alpha\beta)_{11} + (\alpha\beta)_{12},$$
so that, of the original 12 parameters, only six remain in the model:
$$\theta_0' = (\mu, \alpha_1, \beta_1, \beta_2, (\alpha\beta)_{11}, (\alpha\beta)_{12}). \qquad (7.41)$$
We now take advantage of the orthogonality of the submatrices and apply (7.35) for the determination of the OLS estimates. We thus have
$$\hat\mu = (x_\mu'x_\mu)^{-1}x_\mu'y = \frac{1}{6r}Y_{\cdot\cdot\cdot} = \bar y_{\cdot\cdot\cdot},$$
$$\hat\alpha_1 = (x_\alpha'x_\alpha)^{-1}x_\alpha'y = \frac{1}{6r}(Y_{1\cdot\cdot} - Y_{2\cdot\cdot}) = \frac{1}{6r}(2Y_{1\cdot\cdot} - Y_{\cdot\cdot\cdot}) = \bar y_{1\cdot\cdot} - \bar y_{\cdot\cdot\cdot},$$
$$\begin{aligned} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} &= (X_\beta'X_\beta)^{-1}X_\beta'y = \begin{pmatrix} 4r & 2r \\ 2r & 4r \end{pmatrix}^{-1} \begin{pmatrix} Y_{11\cdot} - Y_{13\cdot} + Y_{21\cdot} - Y_{23\cdot} \\ Y_{12\cdot} - Y_{13\cdot} + Y_{22\cdot} - Y_{23\cdot} \end{pmatrix} \\ &= \frac{1}{6r}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} Y_{\cdot 1\cdot} - Y_{\cdot 3\cdot} \\ Y_{\cdot 2\cdot} - Y_{\cdot 3\cdot} \end{pmatrix} = \frac{1}{6r}\begin{pmatrix} 2Y_{\cdot 1\cdot} - Y_{\cdot 2\cdot} - Y_{\cdot 3\cdot} \\ 2Y_{\cdot 2\cdot} - Y_{\cdot 1\cdot} - Y_{\cdot 3\cdot} \end{pmatrix} = \begin{pmatrix} \bar y_{\cdot 1\cdot} - \bar y_{\cdot\cdot\cdot} \\ \bar y_{\cdot 2\cdot} - \bar y_{\cdot\cdot\cdot} \end{pmatrix}, \end{aligned}$$
since, for instance,
$$\frac{1}{6r}(2Y_{\cdot 1\cdot} - Y_{\cdot 2\cdot} - Y_{\cdot 3\cdot}) = \frac{3Y_{\cdot 1\cdot} - Y_{\cdot\cdot\cdot}}{6r} = \bar y_{\cdot 1\cdot} - \bar y_{\cdot\cdot\cdot},$$
and
$$\begin{aligned} \begin{pmatrix} \widehat{(\alpha\beta)}_{11} \\ \widehat{(\alpha\beta)}_{12} \end{pmatrix} &= \frac{1}{6r}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} Y_{11\cdot} - Y_{13\cdot} - Y_{21\cdot} + Y_{23\cdot} \\ Y_{12\cdot} - Y_{13\cdot} - Y_{22\cdot} + Y_{23\cdot} \end{pmatrix} \\ &= \frac{1}{6r}\begin{pmatrix} 2Y_{11\cdot} - Y_{13\cdot} - 2Y_{21\cdot} + Y_{23\cdot} - Y_{12\cdot} + Y_{22\cdot} \\ -Y_{11\cdot} - Y_{13\cdot} + Y_{21\cdot} + Y_{23\cdot} + 2Y_{12\cdot} - 2Y_{22\cdot} \end{pmatrix} \\ &= \begin{pmatrix} \bar y_{11\cdot} - \bar y_{1\cdot\cdot} - \bar y_{\cdot 1\cdot} + \bar y_{\cdot\cdot\cdot} \\ \bar y_{12\cdot} - \bar y_{1\cdot\cdot} - \bar y_{\cdot 2\cdot} + \bar y_{\cdot\cdot\cdot} \end{pmatrix}. \end{aligned}$$
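The 2×2 inverse used for (β̂₁, β̂₂) above can be checked in exact arithmetic (a sketch; r = 4 matches Example 7.2 below):

```python
from fractions import Fraction

r = 4
A = [[4 * r, 2 * r], [2 * r, 4 * r]]            # X_beta' X_beta
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # 12 r^2
# Standard 2x2 inverse: swap diagonal, negate off-diagonal, divide by det
inv = [[Fraction(A[1][1], det), Fraction(-A[0][1], det)],
       [Fraction(-A[1][0], det), Fraction(A[0][0], det)]]
# Claimed closed form: (1/(6r)) * [[2, -1], [-1, 2]]
claim = [[Fraction(2, 6 * r), Fraction(-1, 6 * r)],
         [Fraction(-1, 6 * r), Fraction(2, 6 * r)]]
print(inv == claim)  # True
```

Using `fractions.Fraction` avoids any floating-point ambiguity in the comparison.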
Example 7.2. A designed experiment is to analyze the effect of different concentrations of phosphate in a combination fertilizer (Factor B) on the yield of two types of beans (Factor A). A factorial experiment with two factors and fixed effects is chosen:
Factor A: A1: type of beans I, A2: type of beans II;
Factor B: B1: no phosphate, B2: 10% per unit, B3: 30% per unit.
Hence, in the case of the two-factor approach we have the six treatments A1B1, A1B2, A1B3, A2B1, A2B2, and A2B3. In order to be able to estimate the error variance, the treatments have to be repeated. Here we choose the completely randomized design of experiment with four replicates each. The response values are summarized in Table 7.8.
        B1    B2    B3    Sum
A1      15    18    22
        17    19    29
        14    20    31
        16    21    35
Sum     62    78    117   257
A2      13    17    18
         9    19    22
         8    18    24
        12    18    23
Sum     42    72    87    201
Sum     104   150   204   458

Table 7.8. Response in the (A×B)-design (Example 7.2).
We calculate the sums of squares (a = 2, b = 3, r = 4, N = 2 · 3 · 4 = 24):
C = Y²···/N = 458²/24 = 8740.17,
SS_Total = (15² + 17² + · · · + 23²) − C = 9672 − C = 931.83,
SS_A = 1/(3·4) (257² + 201²) − C = 8870.83 − C = 130.66,
SS_B = 1/(2·4) (104² + 150² + 204²) − C = 9366.50 − C = 626.33,
SS_Subtotal = ¼(62² + 78² + · · · + 87²) − C = 9533.50 − C = 793.33,
SS_A×B = SS_Subtotal − SS_A − SS_B = 36.34,
SS_Error = SS_Total − SS_Subtotal = 138.50.
Source     SS       df   MS       F
Factor A   130.66   1    130.66   16.99 *
Factor B   626.33   2    313.17   40.72 *
A×B        36.34    2    18.17    2.36
Error      138.50   18   7.69
Total      931.83   23

Table 7.9. Analysis of variance table for Table 7.8.
The test strategy starts by testing $H_0$: no interaction. The test statistic is
$$F_{A\times B} = F_{2,18} = \frac{18.17}{7.69} = 2.36.$$
The critical value is $F_{2,18;0.95} = 3.55$. Hence, the interaction is not significant at the 5% level.
Source     SS       df   MS       F
Factor A   130.66   1    130.66   14.95 *
Factor B   626.33   2    313.17   35.83 *
Error      174.84   20   8.74
Total      931.83   23

Table 7.10. Analysis of variance table for Table 7.8 after omitting the interaction (independence model).
The test for significance of the main effects and the interaction effect in Table 7.9 is based on model (7.1) with interaction. The test statistics for $H_0: \alpha_i = 0$, $H_0: \beta_j = 0$, and $H_0: (\alpha\beta)_{ij} = 0$ are independent. We did not reject $H_0: (\alpha\beta)_{ij} = 0$ (cf. Figure 7.4). This leads us back to the independence model (7.23), and we test the significance of the main effects according to Table 7.10. Here both effects are significant as well.
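The entries of Table 7.9 can be reproduced from the raw data of Table 7.8 (a sketch; variable names are ours; note that the book's two-decimal figures 130.66 and 36.34 come from intermediate rounding):

```python
# Replicates per cell, keyed by (A-level, B-level), from Table 7.8
cells = {(1, 1): [15, 17, 14, 16], (1, 2): [18, 19, 20, 21], (1, 3): [22, 29, 31, 35],
         (2, 1): [13, 9, 8, 12],   (2, 2): [17, 19, 18, 18], (2, 3): [18, 22, 24, 23]}
a, b, r = 2, 3, 4
N = a * b * r
G = sum(sum(v) for v in cells.values())                      # 458
C = G ** 2 / N                                               # 8740.17
ss_total = sum(x * x for v in cells.values() for x in v) - C
Yi = [sum(sum(v) for (i, _), v in cells.items() if i == lev) for lev in (1, 2)]
Yj = [sum(sum(v) for (_, j), v in cells.items() if j == lev) for lev in (1, 2, 3)]
ss_a = sum(t * t for t in Yi) / (b * r) - C                  # (7.18)
ss_b = sum(t * t for t in Yj) / (a * r) - C                  # (7.19)
ss_sub = sum(sum(v) ** 2 for v in cells.values()) / r - C    # (7.22)
ss_ab = ss_sub - ss_a - ss_b                                 # (7.20)
ss_error = ss_total - ss_sub                                 # (7.21)
f_ab = (ss_ab / 2) / (ss_error / 18)                         # F statistic for A x B
print(round(f_ab, 2))  # → 2.36, as in Table 7.9
```

With F(2,18;0.95) = 3.55 this confirms the nonsignificant interaction.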
7.4 Two–Factorial Experiment with Block Effects
We now realize the factorial design with Factors A (at $a$ levels) and B (at $b$ levels) as a randomized block design with $ab$ observations for each block (Table 7.11). The appropriate linear model with interaction is then of the following form:
$$y_{ijk} = \mu + \alpha_i + \beta_j + \rho_k + (\alpha\beta)_{ij} + \varepsilon_{ijk} \quad (i = 1, \ldots, a,\; j = 1, \ldots, b,\; k = 1, \ldots, r). \qquad (7.42)$$
Here $\rho_k$ $(k = 1, \ldots, r)$ is the $k$th block effect, and the constraint $\sum_{k=1}^{r} \rho_k = 0$ holds for fixed effects. The other parameters are the
40
60
80
100
120
A2
A2
A2A1
A1
A1
B1 B2 B3
u
u
u
u
u
u
!!!!!!¡¡
¡¡
¡¡
##
##
##!!!!!!
Figure 7.4. Interaction type × fertilization (not significant).
same as in model (7.1). In the case of random block effects we assume $\rho' = (\rho_1, \ldots, \rho_r) \sim N(0, \sigma_\rho^2 I)$ and $E(\varepsilon\rho') = 0$. Let
$$Y_{ij\cdot} = \sum_{k=1}^{r} y_{ijk} \qquad (7.43)$$
be the total response of the factor combination $(i, j)$ over all $r$ blocks. The sums of squares $SS_{\text{Total}}$ (7.17), $SS_A$ (7.18), $SS_B$ (7.19), and $SS_{A\times B}$ (7.20) remain unchanged. For the additional block effect, we calculate
$$SS_{\text{Block}} = \frac{1}{ab}\sum_{k=1}^{r} Y_{\cdot\cdot k}^2 - C. \qquad (7.44)$$
The sum of squares $SS_{\text{Error}}$ is now
$$SS_{\text{Error}} = SS_{\text{Total}} - SS_A - SS_B - SS_{A\times B} - SS_{\text{Block}}. \qquad (7.45)$$
The analysis of variance is shown in Table 7.12. The interpretation of the model with block effects is done in the same manner as for the model without block effects. In the case of at least one significant interaction, it is not possible to interpret the main effects (including the block effect) separately.
If $H_0: (\alpha\beta)_{ij} = 0$ is not rejected, then an independence model with the three main effects (A, B, and block) holds, if these effects are significant.
            Factor B
Factor A    1      2      · · ·   b      Sum
1           Y_11·  Y_12·  · · ·   Y_1b·  Y_1··
2           Y_21·  Y_22·  · · ·   Y_2b·  Y_2··
...
a           Y_a1·  Y_a2·  · · ·   Y_ab·  Y_a··
Sum         Y_·1·  Y_·2·  · · ·   Y_·b·  Y_···

Table 7.11. Two-factorial randomized block design.
Source     SS         df                MS         F
Factor A   SS_A       a − 1             MS_A       F_A
Factor B   SS_B       b − 1             MS_B       F_B
A×B        SS_A×B     (a − 1)(b − 1)    MS_A×B     F_A×B
Block      SS_Block   r − 1             MS_Block   F_Block
Error      SS_Error   (r − 1)(ab − 1)   MS_Error
Total      SS_Total   rab − 1

Table 7.12. Analysis of variance table in the (A×B)-design (7.42) with interaction and block effects.
Compared to model (7.23), the parameter estimates $\hat\alpha$ and $\hat\beta$ are more precise, due to the reduction of the variance achieved by the block effect.
Example 7.3. The experiment in Example 7.2 is now designed as a randomized block design with $r = 4$ blocks. The response values are shown in Table 7.13 and the total response is given in Tables 7.14 and 7.15.
We calculate (with C = 8740.17)
$$SS_{\text{Block}} = \frac{1}{2\cdot 3}(103^2 + 115^2 + 115^2 + 125^2) - C = 8780.67 - C = 40.50$$
and
$$SS_{\text{Error}} = 98.00.$$
The analysis of variance table (Table 7.16) shows that, with $F_{2,15;0.95} = 3.68$, the interaction effect is once again not significant. In the reduced model
$$y_{ijk} = \mu + \alpha_i + \beta_j + \rho_k + \varepsilon_{ijk} \qquad (7.46)$$
we test the main effects (Table 7.17). Because of $F_{3,17;0.95} = 3.20$, the block effect is not significant. Hence we return to model (7.23) with the two main effects A and B, which are significant according to Table 7.10.
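SS_Block and the new SS_Error of Example 7.3 follow directly from (7.44) and (7.45) (a sketch reusing the totals already reported in the text):

```python
a, b, r = 2, 3, 4
C = 458 ** 2 / 24                                  # correction term, 8740.17
block_totals = [103, 115, 115, 125]                # Y..k from Table 7.14
ss_block = sum(t * t for t in block_totals) / (a * b) - C     # (7.44)
# SS values carried over from Example 7.2: Total, A, B, AxB
ss_error = 931.83 - 130.66 - 626.33 - 36.34 - ss_block        # (7.45)
print(round(ss_block, 2), round(ss_error, 2))  # → 40.5 98.0
```

These are the Block and Error rows of Table 7.16.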
Block I      Block II     Block III    Block IV
A2B2  17     A1B1  17     A1B3  31     A2B1  12
A1B3  22     A2B3  22     A2B1   8     A1B2  21
A1B1  15     A1B2  19     A1B2  20     A2B3  23
A2B1  13     A2B2  19     A2B2  18     A1B3  35
A1B2  18     A2B1   9     A1B1  14     A2B2  18
A2B3  18     A1B3  29     A2B3  24     A1B1  16

Table 7.13. Randomized block design and response in the (2 × 3)-factor experiment.
Block             I     II    III   IV    Sum
Response total    103   115   115   125   458

Table 7.14. Total response Y_··k per block.
7.5 Two–Factorial Model with Fixed Effects—Confidence Intervals and Elementary Tests
In a two-factorial experiment with fixed effects there are three different types of means: A-levels, B-levels, and (A × B)-levels. In the case of a nonrandom block effect, the fourth type of means is that of the blocks. In the following, we assume fixed block effects.
(i) Factor A
The means of the A-levels are
$$\bar y_{i\cdot\cdot} = \frac{1}{br}\sum_{j=1}^{b}\sum_{k=1}^{r} y_{ijk} \sim N\Bigl(\mu + \alpha_i,\ \frac{\sigma^2}{br}\Bigr). \qquad (7.47)$$
        B1    B2    B3    Sum
A1      62    78    117   257
A2      42    72    87    201
Sum     104   150   204   458

Table 7.15. Total response Y_ij· for each factor combination (Example 7.3).
Source     SS       df   MS       F
Factor A   130.66   1    130.66   20.01 *
Factor B   626.33   2    313.17   47.96 *
A×B        36.34    2    18.17    2.78
Block      40.50    3    13.50    2.07
Error      98.00    15   6.53
Total      931.83   23

Table 7.16. Analysis of variance table in model (7.42).
Source     SS       df   MS       F
Factor A   130.66   1    130.66   16.54 *
Factor B   626.33   2    313.17   39.64 *
Block      40.50    3    13.50    1.71
Error      134.34   17   7.90
Total      931.83   23

Table 7.17. Analysis of variance table in model (7.46).
The variance $\sigma^2$ is estimated by $s^2 = MS_{\text{Error}}$ with $df$ degrees of freedom. Here $MS_{\text{Error}}$ is computed from the model which holds after testing for interaction and block effects.
The confidence intervals for $\mu + \alpha_i$ are now of the following form ($t_{df,1-\alpha/2}$: two-sided quantile):
$$\bar y_{i\cdot\cdot} \pm t_{df,1-\alpha/2}\sqrt{\frac{s^2}{br}}. \qquad (7.48)$$
The standard error of the difference between two A-levels is $\sqrt{2s^2/br}$, so that the test statistic for $H_0: \alpha_{i_1} = \alpha_{i_2}$ is of the following form:
$$t_{df} = \frac{\bar y_{i_1\cdot\cdot} - \bar y_{i_2\cdot\cdot}}{\sqrt{2s^2/br}}. \qquad (7.49)$$
(ii) Factor B
Similarly, we have
$$\bar y_{\cdot j\cdot} = \frac{1}{ar}\sum_{i=1}^{a}\sum_{k=1}^{r} y_{ijk} \sim N\Bigl(\mu + \beta_j,\ \frac{\sigma^2}{ar}\Bigr). \qquad (7.50)$$
The $(1-\alpha)$-confidence interval for $\mu + \beta_j$ is
$$\bar y_{\cdot j\cdot} \pm t_{df,1-\alpha/2}\sqrt{\frac{s^2}{ar}} \qquad (7.51)$$
and the test statistic for the comparison of means ($H_0: \beta_{j_1} = \beta_{j_2}$) is
$$t_{df} = \frac{\bar y_{\cdot j_1\cdot} - \bar y_{\cdot j_2\cdot}}{\sqrt{2s^2/ar}}. \qquad (7.52)$$
(iii) Factor A×B
Here we have
$$\bar y_{ij\cdot} = \frac{1}{r}\sum_{k=1}^{r} y_{ijk} \sim N\Bigl(\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},\ \frac{\sigma^2}{r}\Bigr). \qquad (7.53)$$
The $(1-\alpha)$-confidence interval for $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$ is
$$\bar y_{ij\cdot} \pm t_{df,1-\alpha/2}\sqrt{s^2/r} \qquad (7.54)$$
and the test statistic for the comparison of two $(A\times B)$-effects is
$$t_{df} = \frac{\bar y_{i_1j_1\cdot} - \bar y_{i_2j_2\cdot}}{\sqrt{2s^2/r}}. \qquad (7.55)$$
The significance of single effects is tested by:
(i) $H_0: \mu + \alpha_i = \mu_0$:
$$t_{df} = \frac{\bar y_{i\cdot\cdot} - \mu_0}{\sqrt{s^2/br}}; \qquad (7.56)$$
(ii) $H_0: \mu + \beta_j = \mu_0$:
$$t_{df} = \frac{\bar y_{\cdot j\cdot} - \mu_0}{\sqrt{s^2/ar}}; \qquad (7.57)$$
(iii) $H_0: \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} = \mu_0$:
$$t_{df} = \frac{\bar y_{ij\cdot} - \mu_0}{\sqrt{s^2/r}}. \qquad (7.58)$$
Here the statements in Section 4.4 about elementary and multiple tests hold.
Example 7.4. (Examples 7.2 and 7.3 continued) The test procedure leads to nonsignificant interaction and block effects. Hence, the independence model holds. From the appropriate analysis of variance table (Table 7.10) we take
$$s^2 = 8.74 \quad \text{for } df = 20.$$
From Table 7.8 we obtain the means of the two levels A1 and A2 and of the three levels B1, B2, and B3:
$$\bar y_{1\cdot\cdot} = \frac{257}{3\cdot 4} = 21.42, \qquad \bar y_{2\cdot\cdot} = \frac{201}{3\cdot 4} = 16.75,$$
$$\bar y_{\cdot 1\cdot} = \frac{104}{2\cdot 4} = 13.00, \qquad \bar y_{\cdot 2\cdot} = \frac{150}{2\cdot 4} = 18.75, \qquad \bar y_{\cdot 3\cdot} = \frac{204}{2\cdot 4} = 25.50.$$
(i) Confidence intervals for the A-levels:
$$A_1:\ 21.42 \pm t_{20;0.975}\sqrt{8.74/(3\cdot 4)} = 21.42 \pm 2.09\cdot 0.85 = 21.42 \pm 1.78 \;\Rightarrow\; [19.64;\ 23.20],$$
$$A_2:\ 16.75 \pm 1.78 \;\Rightarrow\; [14.97;\ 18.53].$$
Test for $H_0: \alpha_1 = \alpha_2$ against $H_1: \alpha_1 > \alpha_2$:
$$t_{20} = \frac{21.42 - 16.75}{\sqrt{2\cdot 8.74/(3\cdot 4)}} = \frac{4.67}{1.21} = 3.86 > 1.73 = t_{20;0.95} \ \text{(one-sided)} \;\Rightarrow\; H_0 \text{ is rejected.}$$
(ii) Confidence intervals for $B$–levels:
With $t_{20;0.975}\sqrt{8.74/(2\cdot 4)} = 2.09\cdot 1.05 = 2.19$, we obtain
$$B_1: 13.00 \pm 2.19 \Rightarrow [10.81;\ 15.19],$$
$$B_2: 18.75 \pm 2.19 \Rightarrow [16.56;\ 20.94],$$
$$B_3: 25.50 \pm 2.19 \Rightarrow [23.31;\ 27.69].$$
The pairwise comparisons of means reject the hypothesis of identity.
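The interval and test calculations of Example 7.4 are easily reproduced by machine. The following sketch (the function and variable names are our own) uses only the quantities quoted above; the quantile $t_{20;0.975} = 2.09$ is taken from the text rather than from a distribution routine:

```python
import math

def ci(mean, s2, n, t_quant):
    # Confidence interval mean +/- t * sqrt(s2 / n), cf. (7.48)
    half = t_quant * math.sqrt(s2 / n)
    return (round(mean - half, 2), round(mean + half, 2))

s2 = 8.74          # MS_Error with df = 20 (Table 7.10)
t975 = 2.09        # t_{20;0.975}, as quoted in the text
b, r = 3, 4        # A-level means average over b*r = 12 observations

print(ci(21.42, s2, b * r, t975))   # A1: about [19.64, 23.20]
print(ci(16.75, s2, b * r, t975))   # A2

# one-sided test of H0: alpha1 = alpha2, cf. (7.49)
t = (21.42 - 16.75) / math.sqrt(2 * s2 / (b * r))
print(round(t, 2))  # compare with t_{20;0.95} = 1.73
```

Rounding the intermediate factor to 1.21, as the text does, gives 3.86 instead of 3.87 for the test statistic; the conclusion is unchanged.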
7.6 Two–Factorial Model with Random or MixedEffects
The first part of Chapter 7 has assumed the effects of Factors A and B to be fixed. This means that the factor levels of A and B are specified before the experiment and, hence, the conclusions of the analysis of variance are only valid for these factor levels. Alternative designs allow Factors A and B to act randomly (model with random effects) or keep one factor fixed and choose the other factor at random (model with mixed effects).
7.6.1 Model with Random Effects
We assume that the levels of both Factors A and B are chosen at random from populations A and B. The inferences will then be valid for all levels in the (two–dimensional) population. The response values in the model with random effects (or components of variance model) are
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}\,, \qquad (7.59)$$
with $i = 1,\ldots,a$, $j = 1,\ldots,b$, $k = 1,\ldots,r$, and where $\alpha_i$, $\beta_j$, $(\alpha\beta)_{ij}$ are random variables independent of each other and of $\varepsilon_{ijk}$. We assume
$$\alpha = (\alpha_1,\ldots,\alpha_a)' \sim N(0, \sigma^2_\alpha I), \qquad \beta = (\beta_1,\ldots,\beta_b)' \sim N(0, \sigma^2_\beta I),$$
$$(\alpha\beta) = ((\alpha\beta)_{11},\ldots,(\alpha\beta)_{ab})' \sim N(0, \sigma^2_{\alpha\beta} I), \qquad \varepsilon = (\varepsilon_1,\ldots,\varepsilon_{abr})' \sim N(0, \sigma^2 I)\,. \qquad (7.60)$$
In matrix notation, the covariance structure is as follows:
$$E\begin{pmatrix}\alpha\\ \beta\\ (\alpha\beta)\\ \varepsilon\end{pmatrix}\bigl(\alpha',\ \beta',\ (\alpha\beta)',\ \varepsilon'\bigr) = \begin{pmatrix}\sigma^2_\alpha I & 0 & 0 & 0\\ 0 & \sigma^2_\beta I & 0 & 0\\ 0 & 0 & \sigma^2_{\alpha\beta} I & 0\\ 0 & 0 & 0 & \sigma^2 I\end{pmatrix}.$$
Hence the variance of the response values is
$$\mathrm{Var}(y_{ijk}) = \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha\beta} + \sigma^2\,. \qquad (7.61)$$
$\sigma^2_\alpha$, $\sigma^2_\beta$, $\sigma^2_{\alpha\beta}$, $\sigma^2$ are called variance components. The hypotheses that we are interested in testing are: $H_0: \sigma^2_\alpha = 0$, $H_0: \sigma^2_\beta = 0$, and $H_0: \sigma^2_{\alpha\beta} = 0$.

The formulas for the decomposition of the variance $SS_{Total}$ into $SS_A$, $SS_B$, $SS_{A\times B}$, and $SS_{Error}$ and for the calculation of the variance remain unchanged, that is, all sums of squares are calculated as in the fixed effects case. However, to form the test statistics we must examine the expectation
of the appropriate mean squares. We have
$$SS_A = \frac{1}{br}\sum_{i=1}^{a}(Y_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{r}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2\,. \qquad (7.62)$$
With $\bar\alpha = \frac{1}{a}\sum_{i=1}^{a}\alpha_i$, $\bar\beta = \frac{1}{b}\sum_{j=1}^{b}\beta_j$, $\overline{(\alpha\beta)}_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}(\alpha\beta)_{ij}$, and $\overline{(\alpha\beta)}_{\cdot\cdot} = \frac{1}{ab}\sum\sum(\alpha\beta)_{ij}$, we compute, from model (7.59),
$$\bar{y}_{i\cdot\cdot} = \mu + \alpha_i + \bar\beta + \overline{(\alpha\beta)}_{i\cdot} + \bar\varepsilon_{i\cdot\cdot}\,,$$
$$\bar{y}_{\cdot\cdot\cdot} = \mu + \bar\alpha + \bar\beta + \overline{(\alpha\beta)}_{\cdot\cdot} + \bar\varepsilon_{\cdot\cdot\cdot}\,,$$
so that
$$\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot} = (\alpha_i - \bar\alpha) + [\overline{(\alpha\beta)}_{i\cdot} - \overline{(\alpha\beta)}_{\cdot\cdot}] + (\bar\varepsilon_{i\cdot\cdot} - \bar\varepsilon_{\cdot\cdot\cdot})\,. \qquad (7.63)$$
Because of the mutual independence of the random effects and of the error, we have
$$E(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2 = E(\alpha_i - \bar\alpha)^2 + E[\overline{(\alpha\beta)}_{i\cdot} - \overline{(\alpha\beta)}_{\cdot\cdot}]^2 + E(\bar\varepsilon_{i\cdot\cdot} - \bar\varepsilon_{\cdot\cdot\cdot})^2\,. \qquad (7.64)$$
For the three components, we observe that
$$E(\alpha_i - \bar\alpha)^2 = E(\alpha_i^2) + E(\bar\alpha^2) - 2E(\alpha_i\bar\alpha) = \sigma^2_\alpha\left[1 + \frac{1}{a} - \frac{2}{a}\right] = \sigma^2_\alpha\left(\frac{a-1}{a}\right), \qquad (7.65)$$
$$E[\overline{(\alpha\beta)}_{i\cdot} - \overline{(\alpha\beta)}_{\cdot\cdot}]^2 = E[\overline{(\alpha\beta)}_{i\cdot}^2] + E[\overline{(\alpha\beta)}_{\cdot\cdot}^2] - 2E[\overline{(\alpha\beta)}_{i\cdot}\overline{(\alpha\beta)}_{\cdot\cdot}] = \sigma^2_{\alpha\beta}\left[\frac{1}{b} + \frac{1}{ab} - \frac{2}{ab}\right] = \sigma^2_{\alpha\beta}\left(\frac{a-1}{ab}\right), \qquad (7.66)$$
$$E(\bar\varepsilon_{i\cdot\cdot} - \bar\varepsilon_{\cdot\cdot\cdot})^2 = E(\bar\varepsilon_{i\cdot\cdot}^2) + E(\bar\varepsilon_{\cdot\cdot\cdot}^2) - 2E(\bar\varepsilon_{i\cdot\cdot}\bar\varepsilon_{\cdot\cdot\cdot}) = \sigma^2\left[\frac{1}{br} + \frac{1}{abr} - \frac{2}{abr}\right] = \sigma^2\left(\frac{a-1}{abr}\right), \qquad (7.67)$$
whence we find (cf. (7.62) and (7.64))
$$E(MS_A) = \frac{1}{a-1}E(SS_A) = \sigma^2 + r\sigma^2_{\alpha\beta} + br\sigma^2_\alpha\,. \qquad (7.68)$$
Similarly, we find
$$E(MS_B) = \sigma^2 + r\sigma^2_{\alpha\beta} + ar\sigma^2_\beta\,, \qquad (7.69)$$
$$E(MS_{A\times B}) = \sigma^2 + r\sigma^2_{\alpha\beta}\,, \qquad (7.70)$$
$$E(MS_{Error}) = \sigma^2\,. \qquad (7.71)$$
Estimation of the Variance Components

The estimates $\hat\sigma^2$, $\hat\sigma^2_\alpha$, $\hat\sigma^2_\beta$, and $\hat\sigma^2_{\alpha\beta}$ of the variance components $\sigma^2$, $\sigma^2_\alpha$, $\sigma^2_\beta$, and $\sigma^2_{\alpha\beta}$ are computed from the system (7.68)–(7.71) in its sample version, that is, from the system
$$MS_A = br\hat\sigma^2_\alpha + r\hat\sigma^2_{\alpha\beta} + \hat\sigma^2, \quad MS_B = ar\hat\sigma^2_\beta + r\hat\sigma^2_{\alpha\beta} + \hat\sigma^2, \quad MS_{A\times B} = r\hat\sigma^2_{\alpha\beta} + \hat\sigma^2, \quad MS_{Error} = \hat\sigma^2, \qquad (7.72)$$
i.e.,
$$\begin{pmatrix}MS_A\\ MS_B\\ MS_{A\times B}\\ MS_{Error}\end{pmatrix} = \begin{pmatrix}br & 0 & r & 1\\ 0 & ar & r & 1\\ 0 & 0 & r & 1\\ 0 & 0 & 0 & 1\end{pmatrix}\begin{pmatrix}\hat\sigma^2_\alpha\\ \hat\sigma^2_\beta\\ \hat\sigma^2_{\alpha\beta}\\ \hat\sigma^2\end{pmatrix}.$$
The coefficient matrix of this linear inhomogeneous system is of triangular shape with determinant
$$abr^3 \neq 0\,.$$
This yields the unique solution
$$\hat\sigma^2 = MS_{Error}\,, \qquad (7.73)$$
$$\hat\sigma^2_{\alpha\beta} = \frac{1}{r}(MS_{A\times B} - MS_{Error})\,, \qquad (7.74)$$
$$\hat\sigma^2_\beta = \frac{1}{ar}(MS_B - MS_{A\times B})\,, \qquad (7.75)$$
$$\hat\sigma^2_\alpha = \frac{1}{br}(MS_A - MS_{A\times B})\,. \qquad (7.76)$$
Testing of Hypotheses about the Variance Components

(i) $H_0: \sigma^2_{\alpha\beta} = 0$

From the system (7.68)–(7.71) of the expectations of the MS's it can be seen that for $H_0: \sigma^2_{\alpha\beta} = 0$ (no interaction) we have $E(MS_{A\times B}) = \sigma^2$. Hence the test statistic is of the form
$$F_{A\times B} = \frac{MS_{A\times B}}{MS_{Error}}\,. \qquad (7.77)$$
If $H_0: \sigma^2_{\alpha\beta} = 0$ does not hold (i.e., $H_0$ is rejected in favor of $H_1: \sigma^2_{\alpha\beta} \neq 0$), then we have $E(MS_{A\times B}) > E(MS_{Error})$. Hence $H_0$ is rejected if
$$F_{A\times B} > F_{(a-1)(b-1),\,ab(r-1);\,1-\alpha} \qquad (7.78)$$
holds.
(ii) $H_0: \sigma^2_\alpha = 0$

The comparison of $E(MS_A)$ [(7.68)] and $E(MS_{A\times B})$ [(7.70)] shows that both expectations are identical under $H_0: \sigma^2_\alpha = 0$, but $E(MS_A) > E(MS_{A\times B})$ holds in the case of $H_1: \sigma^2_\alpha \neq 0$. The test statistic is then
$$F_A = \frac{MS_A}{MS_{A\times B}} \qquad (7.79)$$
and $H_0$ is rejected if
$$F_A > F_{a-1,\,(a-1)(b-1);\,1-\alpha} \qquad (7.80)$$
holds.
(iii) $H_0: \sigma^2_\beta = 0$

Similarly, the test statistic for $H_0: \sigma^2_\beta = 0$ against $H_1: \sigma^2_\beta \neq 0$ is
$$F_B = \frac{MS_B}{MS_{A\times B}}\,, \qquad (7.81)$$
and $H_0$ is rejected if
$$F_B > F_{b-1,\,(a-1)(b-1);\,1-\alpha} \qquad (7.82)$$
holds.
Source            SS          df                       MS                                  F
Factor A          SS_A        df_A = a-1               MS_A = SS_A/df_A                    F_A = MS_A/MS_{A×B}
Factor B          SS_B        df_B = b-1               MS_B = SS_B/df_B                    F_B = MS_B/MS_{A×B}
Interaction A×B   SS_{A×B}    df_{A×B} = (a-1)(b-1)    MS_{A×B} = SS_{A×B}/df_{A×B}        F_{A×B} = MS_{A×B}/MS_{Error}
Error             SS_{Error}  df_{Error} = ab(r-1)     MS_{Error} = SS_{Error}/df_{Error}
Total             SS_{Total}  df_{Total} = abr-1

Table 7.18. Analysis of variance table (two–factorial with interaction and random effects).
Remark. In the random effects model the test statistics $F_A$ and $F_B$ are formed with $MS_{A\times B}$ in the denominator. In the model with fixed effects, we have $MS_{Error}$ in the denominator.
Source     SS       df   MS       F
Factor A   130.66    1   130.66   F_A = 130.66/18.17 = 7.19
Factor B   626.33    2   313.17   F_B = 313.17/18.17 = 17.24
A×B         36.34    2    18.17   F_{A×B} = 18.17/7.69 = 2.36
Error      138.50   18     7.69
Total      931.83   23

Table 7.19. Analysis of variance table for Table 7.8 in the case of random effects.
Example 7.5. We now consider the experiment in Example 7.2 as a two–factorial experiment with random effects. For this, we assume that the two types of beans (Factor A) are chosen at random from a population, instead of being fixed effects. Similarly, we assume that the three phosphate fertilizers are chosen at random from a population. We assume the same response values as in Table 7.8 and adopt the first three columns from Table 7.9 for our analysis (Table 7.19). The estimated variance components are
$$\hat\sigma^2 = 7.69\,,$$
$$\hat\sigma^2_{\alpha\beta} = \frac{1}{4}(18.17 - 7.69) = 2.62\,,$$
$$\hat\sigma^2_\beta = \frac{1}{2\cdot 4}(313.17 - 18.17) = 36.88\,,$$
$$\hat\sigma^2_\alpha = \frac{1}{3\cdot 4}(130.66 - 18.17) = 9.37\,.$$
The three variance components $\sigma^2_{\alpha\beta}$, $\sigma^2_\alpha$, and $\sigma^2_\beta$ are not significant at the 5% level (critical values: $F_{1,2;0.95} = 18.51$; $F_{2,2;0.95} = 19.00$; $F_{2,18;0.95} = 3.55$).
Owing to the nonsignificance of $\sigma^2_{\alpha\beta}$, we return to the independence model. The analysis of variance table of this model is identical with Table 7.10, so that the two variance components $\sigma^2_\alpha$ and $\sigma^2_\beta$ are significant.
7.6.2 Mixed Model

We now consider the situation where one factor (e.g., Factor A) is fixed and the other Factor B is random. The appropriate linear model in the standard version of Scheffe (1956; 1959) is
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} \qquad (7.83)$$
with $i = 1,\ldots,a$, $j = 1,\ldots,b$, $k = 1,\ldots,r$, and the following assumptions:
$$\alpha_i: \text{fixed effect}, \qquad \sum_{i=1}^{a}\alpha_i = 0, \qquad (7.84)$$
$$\beta_j: \text{random effect}, \qquad \beta_j \overset{i.i.d.}{\sim} N(0, \sigma^2_\beta), \qquad (7.85)$$
$$(\alpha\beta)_{ij}: \text{random effect}, \qquad (\alpha\beta)_{ij} \overset{i.d.}{\sim} N\left(0,\ \frac{a-1}{a}\sigma^2_{\alpha\beta}\right), \qquad (7.86)$$
$$\sum_{i=1}^{a}(\alpha\beta)_{ij} = (\alpha\beta)_{\cdot j} = 0 \qquad (j = 1,\ldots,b)\,. \qquad (7.87)$$
We assume that the random variable groups $\beta_j$, $(\alpha\beta)_{ij}$, and $\varepsilon_{ijk}$ are mutually independent, that is, we have $E(\beta_j(\alpha\beta)_{ij}) = 0$, etc. As in the above models, we have $E(\varepsilon\varepsilon') = \sigma^2 I$.
The last assumption (7.87) means that the interaction effects between two different $A$–levels are correlated. For all $j = 1,\ldots,b$, we have
$$\mathrm{Cov}[(\alpha\beta)_{i_1 j}, (\alpha\beta)_{i_2 j}] = -\frac{1}{a}\sigma^2_{\alpha\beta} \quad (i_1 \neq i_2)\,, \qquad (7.88)$$
but
$$\mathrm{Cov}[(\alpha\beta)_{i_1 j_1}, (\alpha\beta)_{i_2 j_2}] = 0 \quad (j_1 \neq j_2,\ \text{any } i_1, i_2)\,. \qquad (7.89)$$
For $a = 3$, we provide a short outline of the proof. Using (7.87), we obtain
$$\mathrm{Cov}[(\alpha\beta)_{1j}, (\alpha\beta)_{2j}] = \mathrm{Cov}[(\alpha\beta)_{1j}, -(\alpha\beta)_{1j} - (\alpha\beta)_{3j}] = -\mathrm{Var}(\alpha\beta)_{1j} - \mathrm{Cov}[(\alpha\beta)_{1j}, (\alpha\beta)_{3j}]\,,$$
whence
$$\mathrm{Cov}[(\alpha\beta)_{1j}, (\alpha\beta)_{2j}] + \mathrm{Cov}[(\alpha\beta)_{1j}, (\alpha\beta)_{3j}] = -\mathrm{Var}(\alpha\beta)_{1j} = -\frac{3-1}{3}\sigma^2_{\alpha\beta}\,.$$
Since $\mathrm{Cov}[(\alpha\beta)_{i_1 j}, (\alpha\beta)_{i_2 j}]$ is identical for all pairs, (7.88) holds. If $a = b = 2$ and $r = 1$, then the model (7.83) with all assumptions has a four–dimensional normal distribution
$$\begin{pmatrix}y_{11}\\ y_{21}\\ y_{12}\\ y_{22}\end{pmatrix} \sim N\left(\begin{pmatrix}\mu + \alpha_1\\ \mu + \alpha_2\\ \mu + \alpha_1\\ \mu + \alpha_2\end{pmatrix},\ \begin{pmatrix}\sigma^2_y & \sigma^2_* & 0 & 0\\ \sigma^2_* & \sigma^2_y & 0 & 0\\ 0 & 0 & \sigma^2_y & \sigma^2_*\\ 0 & 0 & \sigma^2_* & \sigma^2_y\end{pmatrix}\right) \qquad (7.90)$$
with
$$\mathrm{Var}(y_{ij}) = \sigma^2_y = \sigma^2_\beta + \sigma^2_{\alpha\beta}\,\frac{a-1}{a} + \sigma^2 = (\sigma^2_{\alpha\beta} + \sigma^2) + \sigma^2_*\,, \qquad (7.91)$$
using the identity $\sigma^2_* = \sigma^2_\beta - (1/a)\sigma^2_{\alpha\beta}$. The covariance matrix in (7.90) can now be written as
$$\Sigma = I_2 \otimes \bigl((\sigma^2_{\alpha\beta} + \sigma^2)I_2 + \sigma^2_* J_2\bigr)\,,$$
where $\otimes$ is the Kronecker product. However, the second matrix has a compound symmetrical structure (3.178), so that the parameter estimates of the fixed effects are computed according to the OLS method (cf. Theorem 3.22):
$$r = 1:\ \hat\mu = \bar{y}_{\cdot\cdot}\ \text{ and }\ \hat\alpha_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\,,$$
$$r > 1:\ \hat\mu = \bar{y}_{\cdot\cdot\cdot}\ \text{ and }\ \hat\alpha_i = \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}\,.$$
Expectations of the MS's

The specification of the $A$–effects and the reparametrization of the variance of $(\alpha\beta)_{ij}$ as $\sigma^2_{\alpha\beta}[(a-1)/a]$, as well as the constraints (7.87), have an effect on the expected mean squares. The expectations of the MS's are now
$$E(MS_A) = \sigma^2 + r\sigma^2_{\alpha\beta} + \frac{br\sum_{i=1}^{a}\alpha_i^2}{a-1}\,, \qquad (7.92)$$
$$E(MS_B) = \sigma^2 + ar\sigma^2_\beta\,, \qquad (7.93)$$
$$E(MS_{A\times B}) = \sigma^2 + r\sigma^2_{\alpha\beta}\,, \qquad (7.94)$$
$$E(MS_{Error}) = \sigma^2\,. \qquad (7.95)$$
The test statistic for testing $H_0$: no $A$–effect, i.e., $H_0: \alpha_i = 0$ (for all $i$), is
$$F_A = F_{a-1,(a-1)(b-1)} = \frac{MS_A}{MS_{A\times B}}\,. \qquad (7.96)$$
The test statistic for $H_0: \sigma^2_\beta = 0$ is
$$F_B = F_{b-1,ab(r-1)} = \frac{MS_B}{MS_{Error}}\,. \qquad (7.97)$$
The test statistic for $H_0: \sigma^2_{\alpha\beta} = 0$ is
$$F_{A\times B} = F_{(a-1)(b-1),ab(r-1)} = \frac{MS_{A\times B}}{MS_{Error}}\,. \qquad (7.98)$$
Estimation of the Variance Components

The variance components may be estimated by solving the system (7.92)–(7.95) in its sample version:
$$MS_A = \frac{br}{a-1}\sum\hat\alpha_i^2 + r\hat\sigma^2_{\alpha\beta} + \hat\sigma^2, \quad MS_B = ar\hat\sigma^2_\beta + \hat\sigma^2, \quad MS_{A\times B} = r\hat\sigma^2_{\alpha\beta} + \hat\sigma^2, \quad MS_{Error} = \hat\sigma^2,$$
$$\Longrightarrow\ \hat\sigma^2 = MS_{Error}\,, \qquad (7.99)$$
$$\hat\sigma^2_{\alpha\beta} = \frac{MS_{A\times B} - MS_{Error}}{r}\,, \qquad (7.100)$$
$$\hat\sigma^2_\beta = \frac{MS_B - MS_{Error}}{ar}\,. \qquad (7.101)$$
Source     SS          df           E(MS)                                 F
Factor A   SS_A        a-1          σ² + rσ²_{αβ} + [br/(a-1)]Σα²_i       F_A = MS_A/MS_{A×B}
Factor B   SS_B        b-1          σ² + arσ²_β                           F_B = MS_B/MS_{Error}
A×B        SS_{A×B}    (a-1)(b-1)   σ² + rσ²_{αβ}                         F_{A×B} = MS_{A×B}/MS_{Error}
Error      SS_{Error}  ab(r-1)      σ²
Total      SS_{Total}  abr-1

Table 7.20. Analysis of variance table in the mixed model (standard model, dependent interaction effects).
In addition to the standard model with intraclass correlation structure, several other versions of the mixed model exist (cf. Hocking, 1973). An important version is the model with independent interaction effects that assumes
$$(\alpha\beta)_{ij} \overset{i.i.d.}{\sim} N(0, \sigma^2_{\alpha\beta}) \quad \text{(for all } i, j\text{)}\,. \qquad (7.102)$$
Furthermore, independence of the $(\alpha\beta)_{ij}$ from the $\beta_j$ and the $\varepsilon_{ijk}$ is assumed as in the standard model.
$E(MS_B)$ now changes to
$$E(MS_B) = \sigma^2 + r\sigma^2_{\alpha\beta} + ar\sigma^2_\beta \qquad (7.103)$$
and the test statistic for $H_0: \sigma^2_\beta = 0$ changes to
$$F_B = F_{b-1,(a-1)(b-1)} = \frac{MS_B}{MS_{A\times B}}\,. \qquad (7.104)$$
The choice of mixed models should always be dictated by the data. In model (7.83) we have, for the covariance within the response values,
$$\mathrm{Cov}(y_{i_1 j_1 k_1}, y_{i_2 j_2 k_2}) = \delta_{j_1 j_2}\sigma^2_\beta + \mathrm{Cov}[(\alpha\beta)_{i_1 j_1}, (\alpha\beta)_{i_2 j_2}] + \sigma^2\,. \qquad (7.105)$$
If Factor B represents, for example, $b$ time intervals (24–hour measure of blood pressure) and if Factor A represents the fixed effect placebo/medicament (P/M), then the assumption $\mathrm{Cov}[(\alpha\beta)_{Pj}, (\alpha\beta)_{Mj}] = 0$ would be reasonable, which is the opposite of (7.88). Similarly, (7.89)
Source     SS          df           E(MS)                                 F
A          SS_A        a-1          σ² + rσ²_{αβ} + [br/(a-1)]Σα²_i       F_A = MS_A/MS_{A×B}
B          SS_B        b-1          σ² + rσ²_{αβ} + arσ²_β                F_B = MS_B/MS_{A×B}
A×B        SS_{A×B}    (a-1)(b-1)   σ² + rσ²_{αβ}                         F_{A×B} = MS_{A×B}/MS_{Error}
Error      SS_{Error}  ab(r-1)      σ²
Total      SS_{Total}  abr-1

Table 7.21. Analysis of variance table in the mixed model with independent interaction effects.
would have to be changed to
$$\mathrm{Cov}[(\alpha\beta)_{Pj_1}, (\alpha\beta)_{Pj_2}] \neq 0$$
or
$$\mathrm{Cov}[(\alpha\beta)_{Mj_1}, (\alpha\beta)_{Mj_2}] \neq 0 \quad (j_1 \neq j_2)\,,$$
respectively. These models are described in Chapter 9.
7.7 Three–Factorial Designs
The inclusion of a third factor in the experiment increases the number of parameters to be estimated. At the same time, the interpretation also becomes more difficult.
We denote the three factors (treatments) by A, B, and C and their factor levels by $i = 1,\ldots,a$, $j = 1,\ldots,b$, and $k = 1,\ldots,c$. Furthermore, we assume $r$ replicates each, e.g., the randomized block design with $r$ blocks and $abc$ observations each. The appropriate model is the following additive model
$$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \tau_l + \varepsilon_{ijkl} \quad (l = 1,\ldots,r)\,. \qquad (7.106)$$
In addition to the two–way interactions $(\alpha\beta)_{ij}$, $(\beta\gamma)_{jk}$, and $(\alpha\gamma)_{ik}$, we now have the three–way interaction $(\alpha\beta\gamma)_{ijk}$. We assume the usual constraints for the main effects and the two–way interactions. Additionally, we assume
$$\sum_i(\alpha\beta\gamma)_{ijk} = \sum_j(\alpha\beta\gamma)_{ijk} = \sum_k(\alpha\beta\gamma)_{ijk} = 0\,. \qquad (7.107)$$
The test strategy is similar to the two–factorial model, that is, the three–way interaction is tested first. If $H_0: (\alpha\beta\gamma)_{ijk} = 0$ is rejected, then all of the two–way interactions and the main effects cannot be interpreted separately. The test strategy and, especially, the interpretation of submodels will be discussed in detail in Chapter 8 for models with categorical
response. The results of Chapter 8 are valid analogously for models with continuous response.
The total response values are given in Table 7.22.
                        Factor C
Factor A   Factor B     1          2          ···   c          Sum
1          1            Y_{111·}   Y_{112·}   ···   Y_{11c·}   Y_{11··}
           2            Y_{121·}   Y_{122·}   ···   Y_{12c·}   Y_{12··}
           ...
           b            Y_{1b1·}   Y_{1b2·}   ···   Y_{1bc·}   Y_{1b··}
           Sum          Y_{1·1·}   Y_{1·2·}   ···   Y_{1·c·}   Y_{1···}
...
a          1            Y_{a11·}   Y_{a12·}   ···   Y_{a1c·}   Y_{a1··}
           2            Y_{a21·}   Y_{a22·}   ···   Y_{a2c·}   Y_{a2··}
           ...
           b            Y_{ab1·}   Y_{ab2·}   ···   Y_{abc·}   Y_{ab··}
           Sum          Y_{a·1·}   Y_{a·2·}   ···   Y_{a·c·}   Y_{a···}
Sum                     Y_{··1·}   Y_{··2·}   ···   Y_{··c·}   Y_{····}

Table 7.22. Total response per block of the (A, B, C)–factor combinations.
The sums of squares are as follows:
$$C = \frac{Y^2_{\cdot\cdot\cdot\cdot}}{abcr} \quad \text{(correction term)},$$
$$SS_{Total} = \sum\sum\sum\sum y^2_{ijkl} - C,$$
$$SS_{Block} = \frac{1}{abc}\sum_{l=1}^{r} Y^2_{\cdot\cdot\cdot l} - C,$$
$$SS_A = \frac{1}{bcr}\sum_i Y^2_{i\cdot\cdot\cdot} - C,$$
$$SS_B = \frac{1}{acr}\sum_j Y^2_{\cdot j\cdot\cdot} - C,$$
$$SS_{A\times B} = \frac{1}{cr}\sum_i\sum_j Y^2_{ij\cdot\cdot} - C - SS_A - SS_B,$$
$$SS_C = \frac{1}{abr}\sum_k Y^2_{\cdot\cdot k\cdot} - C,$$
$$SS_{A\times C} = \frac{1}{br}\sum_i\sum_k Y^2_{i\cdot k\cdot} - C - SS_A - SS_C,$$
$$SS_{B\times C} = \frac{1}{ar}\sum_j\sum_k Y^2_{\cdot jk\cdot} - C - SS_B - SS_C,$$
$$SS_{A\times B\times C} = \frac{1}{r}\sum_i\sum_j\sum_k Y^2_{ijk\cdot} - C - SS_A - SS_B - SS_C - SS_{A\times B} - SS_{A\times C} - SS_{B\times C},$$
$$SS_{Error} = SS_{Total} - SS_{Block} - SS_A - SS_B - SS_C - SS_{A\times B} - SS_{A\times C} - SS_{B\times C} - SS_{A\times B\times C}\,.$$
As in the above models with fixed effects, $MS = SS/df$ holds (cf. Table 7.23). The test statistics, in general, are
$$F_{Effect} = \frac{MS_{Effect}}{MS_{Error}}\,. \qquad (7.108)$$
Source     SS            df                 MS            F
Block      SS_{Block}    r-1                MS_{Block}    F_{Block}
Factor A   SS_A          a-1                MS_A          F_A
Factor B   SS_B          b-1                MS_B          F_B
Factor C   SS_C          c-1                MS_C          F_C
A×B        SS_{A×B}      (a-1)(b-1)         MS_{A×B}      F_{A×B}
A×C        SS_{A×C}      (a-1)(c-1)         MS_{A×C}      F_{A×C}
B×C        SS_{B×C}      (b-1)(c-1)         MS_{B×C}      F_{B×C}
A×B×C      SS_{A×B×C}    (a-1)(b-1)(c-1)    MS_{A×B×C}    F_{A×B×C}
Error      SS_{Error}    (r-1)(abc-1)       MS_{Error}
Total      SS_{Total}    abcr-1

Table 7.23. Three–factorial analysis of variance table.
Example 7.6. The firmness Y of a ceramic material depends on the pressure (A), on the temperature (B), and on an additive (C). A three–factorial experiment that includes all three factors at two levels, low/high, is to analyze the influence on the response Y. A randomized block design is chosen with r = 2 blocks of workpieces that are homogeneous within the blocks and heterogeneous between the blocks. The results are shown in Table 7.24.
                   C1                      C2
             Block 1   Block 2       Block 1   Block 2     Sum
A1    B1     14        16            4         8            42
      B2      7        11           24        32            74
      Sum    48                     68                     116
A2    B1     18        20            6        10            54
      B2      9        10           26        34            79
      Sum    57                     76                     133
Sum         105                    144                     249

$Y_{\cdot\cdot\cdot 1} = 108$, $Y_{\cdot\cdot\cdot 2} = 141$

Table 7.24. Response values for Example 7.6.
We compute ($N = abcr = 2^4 = 16$)
$$C = \frac{Y^2_{\cdot\cdot\cdot\cdot}}{N} = \frac{249^2}{16} = 3875.06,$$
$$SS_{Total} = 5175 - C = 1299.94,$$
$$SS_{Block} = \frac{1}{8}(108^2 + 141^2) - C = 3943.13 - C = 68.07,$$
$$SS_A = \frac{1}{8}(116^2 + 133^2) - C = 3893.13 - C = 18.07,$$
$$SS_B = \frac{1}{8}\bigl((42 + 54)^2 + (74 + 79)^2\bigr) - C = 4078.13 - C = 203.07,$$
$$SS_{A\times B} = \frac{1}{4}(42^2 + 74^2 + 54^2 + 79^2) - C - SS_A - SS_B = 4099.25 - C - SS_A - SS_B = 3.05,$$
$$SS_C = \frac{1}{8}(105^2 + 144^2) - C = 3970.13 - C = 95.07,$$
$$SS_{A\times C} = \frac{1}{4}(48^2 + 68^2 + 57^2 + 76^2) - C - SS_A - SS_C = 0.05,$$
$$SS_{B\times C} = \frac{1}{4}\bigl((14+16+18+20)^2 + (4+8+6+10)^2 + (7+11+9+10)^2 + (24+32+26+34)^2\bigr) - C - SS_B - SS_C = 885.05,$$
$$SS_{A\times B\times C} = \frac{1}{2}\bigl((14+16)^2 + \cdots + (26+34)^2\bigr) - C - SS_A - SS_B - SS_{A\times B} - SS_C - SS_{A\times C} - SS_{B\times C} = 3.08,$$
$$SS_{Error} = 24.43\,.$$
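The arithmetic above can be checked mechanically. The sketch below (function and variable names are our own) applies the marginal–totals formulas to the raw data of Table 7.24; the exact values differ from the text in the second decimal only because the text rounds C first:

```python
from itertools import product

# y[i][j][k][l]: levels of A, B, C and block l (Table 7.24)
y = [[[[14, 16], [4, 8]],     # A1 B1: C1 blocks, C2 blocks
      [[7, 11], [24, 32]]],   # A1 B2
     [[[18, 20], [6, 10]],    # A2 B1
      [[9, 10], [26, 34]]]]   # A2 B2
a, b, c, r = 2, 2, 2, 2
cells = list(product(range(a), range(b), range(c), range(r)))
G = sum(y[i][j][k][l] for i, j, k, l in cells)
C0 = G ** 2 / (a * b * c * r)            # correction term

def ss(key_fn, divisor):
    # sum of squared marginal totals / divisor, minus the correction term
    totals = {}
    for i, j, k, l in cells:
        key = key_fn(i, j, k, l)
        totals[key] = totals.get(key, 0) + y[i][j][k][l]
    return sum(t ** 2 for t in totals.values()) / divisor - C0

ss_a = ss(lambda i, j, k, l: i, b * c * r)
ss_b = ss(lambda i, j, k, l: j, a * c * r)
ss_c = ss(lambda i, j, k, l: k, a * b * r)
ss_ab = ss(lambda i, j, k, l: (i, j), c * r) - ss_a - ss_b
ss_bc = ss(lambda i, j, k, l: (j, k), a * r) - ss_b - ss_c
print(round(ss_a, 2), round(ss_b, 2), round(ss_ab, 2), round(ss_bc, 2))
# -> 18.06 203.06 3.06 885.06
```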
Result: The F–tests with $F_{1,7;0.95} = 5.59$ show significance for the following effects: block, B, C, and B×C. Neither Factor A nor any interaction involving A is significant, hence the analysis can be done in a two–factorial (B×C)–design (Table 7.26, $F_{1,11;0.95} = 4.84$). The response Y is maximized for the combination $B_2\times C_2$.
Source     SS        df   MS       F
Block      68.07      1   68.07    19.50 *
Factor A   18.07      1   18.07     5.18
Factor B   203.07     1   203.07   58.19 *
Factor C   95.07      1   95.07    27.24 *
A×B        3.05       1   3.05      0.87
A×C        0.05       1   0.05      0.01
B×C        885.05     1   885.05   253.60 *
A×B×C      3.08       1   3.08      0.88
Error      24.43      7   3.49
Total      1299.94   15

Table 7.25. Analysis of variance in the (A×B×C)–design for Example 7.6.
Source     SS        df   MS       F
Block      68.07      1   68.07    15.37 *
Factor B   203.07     1   203.07   45.84 *
Factor C   95.07      1   95.07    21.46 *
B×C        885.05     1   885.05   199.79 *
Error      48.68     11   4.43
Total      1299.94   15

Table 7.26. Analysis of variance in the (B×C)–design for Example 7.6.
Remark: Three–factorial design models with random effects are discussed in Burdick (1994). Confidence intervals are used for testing the significance of variance components.
Figure 7.5. (A×C)–response.

Figure 7.6. (B×C)–response.
7.8 Split–Plot Design
In many practical applications of the randomized block design it is not possible to arrange all factor combinations at random within one block. This is the case if the factors require different sizes of experimental units, e.g., because of technical reasons. Consider some examples (cf. Montgomery, 1976, pp. 292–300; Petersen, 1985, pp. 134–145):
Figure 7.7. (A×B)–response.
• Employment of various drill machines (Factor B, only possible on larger fields) and of various fertilizers (Factor C, may be employed on smaller fields as well). In this case Factor B is set and only Factor C is randomized in the blocks.

• Combination of three different paper pulp preparation methods and of four different temperatures in paper manufacturing. Each replicate of the experiment requires 12 observations. In a completely randomized design, a factor combination (pulp i, temperature j) would have to be chosen at random within the block. In this example, however, this procedure may not be economical. Hence, the three types of pulp are divided into four sample units and the temperature is randomized within these units.
Split–plot designs are used if the possibilities for randomization are restricted. The large units are called whole–plots while the smaller units are called subplots (or split–plots).

In this design of experiment, the whole–plot factor effects are estimated from the large units, while the subplot effects and the whole–plot × subplot interaction are estimated from the small units. This design, however, leads to two experimental errors. The error associated with the subplot is the smaller one. The reason for this is the larger number of degrees of freedom of the subplot error, as well as the fact that the units in the subplots tend to be positively correlated in the response.
In our examples:
• the drill machine is the whole–plot and the fertilizer the subplot; and
• the type of pulp is the whole–plot and the temperature is the subplot.
The linear model for the two–factorial split–plot design is (Montgomery, 1976, p. 293)
$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \gamma_k + (\tau\gamma)_{ik} + (\beta\gamma)_{jk} + (\tau\beta\gamma)_{ijk} + \varepsilon_{ijk}$$
$$(i = 1,\ldots,a;\ j = 1,\ldots,b;\ k = 1,\ldots,c)\,, \qquad (7.109)$$
where the whole–plot parameters are

$\tau_i$: random block effect (Factor A);
$\beta_j$: whole–plot effect (Factor B);
$(\tau\beta)_{ij}$: whole–plot error (= (A×B)–interaction);

and the subplot parameters are

$\gamma_k$: treatment effect of Factor C;
$(\tau\gamma)_{ik}$: (A×C)–interaction;
$(\beta\gamma)_{jk}$: (B×C)–interaction;
$(\tau\beta\gamma)_{ijk}$: subplot error (= (A×B×C)–interaction).
The sums of squares are computed as in the three–factorial model without replication (i.e., r = 1 in the SS's of the previous section).

The test statistics are given in Table 7.27. The effects to be tested are the main effects of Factor B and Factor C as well as the interaction B×C. The test strategy starts out as in the two–factorial model, that is, with the (B×C)–interaction.
Source           SS           df                MS           F
Block (A)        SS_A         a-1               MS_A
Factor B         SS_B         b-1               MS_B         F_B = MS_B/MS_{A×B}
Error (A×B)      SS_{A×B}     (a-1)(b-1)        MS_{A×B}
Factor C         SS_C         c-1               MS_C         F_C = MS_C/MS_{A×B×C}
A×C              SS_{A×C}     (a-1)(c-1)        MS_{A×C}
B×C              SS_{B×C}     (b-1)(c-1)        MS_{B×C}     F_{B×C} = MS_{B×C}/MS_{A×B×C}
Error (A×B×C)    SS_{A×B×C}   (a-1)(b-1)(c-1)   MS_{A×B×C}
Total            SS_{Total}   abc-1

Table 7.27. Analysis of variance in the split–plot design.
Example 7.7. A laboratory has two furnaces, of which one can only be heated up to 500 °C. The hardness of a ceramic, which depends on two additives and on the temperature, is to be tested in a split–plot design.

Factor A (block): replication on r = 3 days.
Factor B (whole–plot): temperature:
B1: 500 °C (furnace I),
B2: 750 °C (furnace II).
Factor C (subplot): additive:
C1: 10%,
C2: 20%.

Because of $F_{1,2;0.95} = 18.51$, only Factor C is significant (Table 7.29). Hence the experiment can be conducted with the single factor additive (Table 7.30).
Day I:     B1: C1 = 4, C2 = 7;     B2: C1 = 5, C2 = 6
Day II:    B1: C1 = 5, C2 = 7;     B2: C1 = 6, C2 = 7
Day III:   B1: C1 = 4, C2 = 9;     B2: C1 = 9, C2 = 10

Block totals:
Block   B1   B2   Sum
I       11   11    22
II      12   13    25
III     13   19    32
Sum     36   43    79

Totals over C and B:
      B1   B2   Sum
C1    13   20    33
C2    23   23    46
Sum   36   43    79

Table 7.28. Response tables.
Source           SS      df   MS      F
Block (A)        13.17    2   6.58
Factor B         4.08     1   4.08    F_B = 1.58
Error (A×B)      5.17     2   2.58
Factor C         14.08    1   14.08   F_C = 24.14 *
A×C              1.17     2   0.58
B×C              4.08     1   4.08    F_{B×C} = 7.00
Error (A×B×C)    1.17     2   0.58
Total            42.92   11

Table 7.29. Analysis of variance table for Example 7.7.
Source     SS      df   MS      F
Factor C   14.08    1   14.08   F_C = 4.88
Error      28.83   10   2.88
Total      42.92   11

Table 7.30. One–factor analysis of variance table (Example 7.7).
Remark: Generalizations in model (7.109) are discussed in Algina (1995),Algina, (1997), especially with respect to unequal group dispersion ma-trices. The analysis of covariance in various types of split–plot design ispresented by Brzeskwiniewicz and Wagner (1991).
7.9 The $2^k$ Factorial Design
Especially in the industrial area, factorial designs at the first stage of an analysis are usually conducted with only two factor levels for each of the included factors. The idea of this procedure is to make the important effects identifiable so that the analysis in the following stages can test factor combinations more specifically and more cost–effectively. A complete analysis with $k$ factors, each at two levels, requires $2^k$ treatment combinations for a single replicate. This fact leads to the nomenclature of the design: the $2^k$ experiment. The restriction to two levels for all factors makes a complete factorial experiment, with all two–way and higher–order interactions, possible with a minimum of observations. We assume fixed effects and complete randomization. The same linear models and constraints as for the previous two– and three–factorial designs are valid in the $2^k$ design, too. The advantage of this design is the immediate computation of the sums of squares from special contrasts which are linked to the effects.
Definition 7.1. The list of treatments can be expressed in a standard order. For one factor A, the standard order is (1), a. For two factors A and B, the standard order is obtained by adding b and ab, which are derived by multiplying (1) and a by b, i.e., b × {(1), a}. So the standard order is

(1), a, b, ab.

For three factors, we add c, ac, bc, and abc, which are derived by multiplying the earlier standard order of two factors by c, i.e., c × {(1), a, b, ab}. So the standard order is

(1), a, b, ab, c, ac, bc, abc.

Thus the standard order for any number of factors is obtained step by step by multiplying the additional letter with the preceding standard order. For example, the standard order of A, B, C, and D in the $2^4$ factorial experiment is (1), a, b, ab, c, ac, bc, abc followed by d × {(1), a, b, ab, c, ac, bc, abc}. So the standard order is

(1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd.
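The recursion in Definition 7.1 is one line of code. A minimal sketch (the function name is our own):

```python
def standard_order(factors):
    # Start with "(1)"; for each new factor letter, append the products
    # of that letter with all preceding treatment combinations.
    order = ["(1)"]
    for letter in factors:
        order += [letter if t == "(1)" else t + letter for t in order]
    return order

print(standard_order("abc"))
# -> ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
```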
7.9.1 The $2^2$ Design
The $2^2$ design has already been introduced in Section 7.1. Two factors A and B are run at two levels each (e.g., low and high). The chosen parametrization is usually

low: 0, high: 1.

The high levels of the factors are represented by a or b, respectively, and the low level is denoted by the absence of the corresponding letter. If both factors are at the low level, (1) is used as representation:
$$(0, 0) \longrightarrow (1), \quad (1, 0) \longrightarrow a, \quad (0, 1) \longrightarrow b, \quad (1, 1) \longrightarrow ab\,.$$
Here $(1), a, b, ab$ denote the total response over all $r$ replicates. The average effect of a factor is defined as the reaction of the response to a change of level of this factor, averaged over the levels of the other factor. The effect of A at the low level of B is $[a - (1)]/r$ and the effect of A at the high level of B is $[ab - b]/r$. The average effect of A is then
$$A = \frac{1}{2r}[ab + a - b - (1)]\,. \qquad (7.110)$$
The average effect of B is
$$B = \frac{1}{2r}[ab + b - a - (1)]\,. \qquad (7.111)$$
The interaction effect AB is defined as the average difference between the effect of A at the high level of B and the effect of A at the low level of B. Thus
$$AB = \frac{1}{2r}[(ab - b) - (a - (1))] = \frac{1}{2r}[ab + (1) - a - b]\,. \qquad (7.112)$$
Similarly, the effect BA may be defined as the average difference between the effect of B at the high level of A (i.e., $(ab - a)/r$) and the effect of B at the low level of A (i.e., $(b - (1))/r$). We obviously have $AB = BA$. Hence, the average effects A, B, and AB are linear orthogonal contrasts in the total response values $(1), a, b, ab$, except for the factor $1/2r$.
Let $Y_* = ((1), a, b, ab)'$ be the vector of the total response values. Then
$$A = \frac{1}{2r}c_A'Y_*, \qquad B = \frac{1}{2r}c_B'Y_*, \qquad AB = \frac{1}{2r}c_{AB}'Y_*\,, \qquad (7.113)$$
where the contrasts $c_A$, $c_B$, $c_{AB}$ are taken from Table 7.31. We have $c_A'c_A = c_B'c_B = c_{AB}'c_{AB} = 4$.
            (1)   a    b    ab   Contrast
Factor A    -1   +1   -1   +1    c_A'
Factor B    -1   -1   +1   +1    c_B'
AB          +1   -1   -1   +1    c_{AB}'

Table 7.31. Contrasts in the $2^2$ design.
From Section 4.3.2, we find the following sums of squares:
$$SS_A = \frac{(c_A'Y_*)^2}{rc_A'c_A} = \frac{(ab + a - b - (1))^2}{4r}\,, \qquad (7.114)$$
$$SS_B = \frac{(c_B'Y_*)^2}{rc_B'c_B} = \frac{(ab + b - a - (1))^2}{4r}\,, \qquad (7.115)$$
$$SS_{AB} = \frac{(c_{AB}'Y_*)^2}{rc_{AB}'c_{AB}} = \frac{(ab + (1) - a - b)^2}{4r}\,. \qquad (7.116)$$
The sum of squares $SS_{Total}$ is computed as usual,
$$SS_{Total} = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{r} y^2_{ijk} - \frac{Y^2_{\cdot\cdot\cdot}}{4r}\,, \qquad (7.117)$$
and has $(2\cdot 2\cdot r) - 1$ degrees of freedom. As usual, we have
$$SS_{Error} = SS_{Total} - SS_A - SS_B - SS_{AB}\,. \qquad (7.118)$$
We now illustrate this procedure with an example.
Example 7.8. We wish to investigate the influence of Factors A (temperature, 0: low, 1: high) and B (catalytic converter, 0: not used, 1: used) on the response Y (hardness of a ceramic material). The response is shown in Table 7.32.
                  Replication        Total
Combination      1        2          response   Coding
(0, 0)           86       92         178        (1)
(1, 0)           47       39          86        a
(0, 1)           104      114        218        b
(1, 1)           141      153        294        ab

$Y_{\cdot\cdot\cdot} = 776$

Table 7.32. Response in Example 7.8.
From Table 7.32, we obtain the average effects
$$A = \frac{1}{4}[294 + 86 - 218 - 178] = -4,$$
$$B = \frac{1}{4}[294 + 218 - 86 - 178] = 62,$$
$$AB = \frac{1}{4}[294 + 178 - 86 - 218] = 42,$$
and from these the sums of squares
$$SS_A = \frac{(4A)^2}{4\cdot 2} = 32, \qquad SS_B = \frac{(4B)^2}{4\cdot 2} = 7688, \qquad SS_{AB} = \frac{(4AB)^2}{4\cdot 2} = 3528\,.$$
Furthermore, we have
$$SS_{Total} = (86^2 + \ldots + 153^2) - \frac{776^2}{8} = 86692 - 75272 = 11420\,,$$
$$SS_{Error} = 172\,.$$
The analysis of variance table is shown in Table 7.33.
Source     SS      df   MS     F
Factor A   32       1   32     F_A = 0.74
Factor B   7688     1   7688   F_B = 178.79 *
AB         3528     1   3528   F_{AB} = 82.05 *
Error      172      4   43
Total      11420    7

Table 7.33. Analysis of variance for Example 7.8.
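Example 7.8 can be checked against the contrast formulas (7.110)–(7.116). A minimal sketch (variable names are our own):

```python
r = 2
# Total responses of Example 7.8 in standard order
totals = {"(1)": 178, "a": 86, "b": 218, "ab": 294}
# Contrast coefficients from Table 7.31
contrasts = {
    "A":  {"(1)": -1, "a": 1, "b": -1, "ab": 1},
    "B":  {"(1)": -1, "a": -1, "b": 1, "ab": 1},
    "AB": {"(1)": 1, "a": -1, "b": -1, "ab": 1},
}
results = {}
for name, signs in contrasts.items():
    contrast = sum(signs[t] * totals[t] for t in totals)
    results[name] = (contrast / (2 * r),       # average effect
                     contrast ** 2 / (4 * r))  # sum of squares
print(results["A"], results["B"], results["AB"])
# -> (-4.0, 32.0) (62.0, 7688.0) (42.0, 3528.0)
```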
7.9.2 The $2^3$ Design
Suppose that in a complete factorial experiment three binary factors A, B, C are to be studied. The number of combinations is eight, and with $r$ replicates we have $N = 8r$ observations that are to be analyzed for their influence on a response.

Assume the total response values are (in standard order)
$$Y_* = [(1), a, b, ab, c, ac, bc, abc]'\,. \qquad (7.119)$$
In the coding 0: low and 1: high, this corresponds to the triples $(0,0,0), (1,0,0), (0,1,0), (1,1,0), \ldots, (1,1,1)$. The response values can be
arranged as a three–dimensional contingency table (cf. Table 7.35). The effects are determined by linear contrasts
$$c_{Effect}'\,((1), a, b, ab, c, ac, bc, abc)' = c_{Effect}'\,Y_* \qquad (7.120)$$
(cf. Table 7.34).
Factorial          Factor combination
effect     (1)   a    b    ab   c    ac   bc   abc
I          +     +    +    +    +    +    +    +
A          -     +    -    +    -    +    -    +
B          -     -    +    +    -    -    +    +
AB         +     -    -    +    +    -    -    +
C          -     -    -    -    +    +    +    +
AC         +     -    +    -    -    +    -    +
BC         +     +    -    -    -    -    +    +
ABC        -     +    +    -    +    -    -    +

Table 7.34. Algebraic structure for the computation of the effects from the total response values.
The first row in Table 7.34 is a basic element. With this element, the total response $Y = \mathbf{1}'Y_*$ can be computed. If the other rows are multiplied by the first row, they stay unchanged (therefore I for identity). Every other row has the same number of + and - signs. If + is replaced by 1 and - is replaced by -1, we obtain vectors of orthogonal contrasts with the norm 8.

If each row is multiplied by itself, we obtain I (row 1). The product of any two rows leads to a different row of Table 7.34. For example, we have
$$A\cdot B = AB, \qquad (AB)\cdot B = A\cdot B^2 = A, \qquad (AC)\cdot(BC) = A\cdot B\cdot C^2 = AB\,.$$
The sums of squares in the $2^3$ design are
$$SS_{Effect} = \frac{(\text{Contrast})^2}{8r}\,. \qquad (7.121)$$
Estimation of the Effects

The algebraic structure of Table 7.34 immediately leads to the estimates of the average effects. For instance, the average effect A is
$$A = \frac{1}{4r}[a - (1) + ab - b + ac - c + abc - bc]\,. \qquad (7.122)$$
Explanation. The average effect of A at the low level of B and C is
$$(1\,0\,0) - (0\,0\,0):\ [a - (1)]/r\,.$$
The average effect of A at the high level of B and the low level of C is
$$(1\,1\,0) - (0\,1\,0):\ [ab - b]/r\,.$$
The average effect of A at the low level of B and the high level of C is
$$(1\,0\,1) - (0\,0\,1):\ [ac - c]/r\,.$$
The average effect of A at the high level of B and C is
$$(1\,1\,1) - (0\,1\,1):\ [abc - bc]/r\,.$$
Hence for all combinations of B and C the average effect of A is the average of these four values, which equals (7.122). Similarly, we obtain the other average effects
$$B = \frac{1}{4r}[b + ab + bc + abc - (1) - a - c - ac]\,, \qquad (7.123)$$
$$C = \frac{1}{4r}[c + ac + bc + abc - (1) - a - b - ab]\,, \qquad (7.124)$$
$$AB = \frac{1}{4r}[(1) + ab + c + abc - a - b - ac - bc]\,, \qquad (7.125)$$
$$AC = \frac{1}{4r}[(1) + b + ac + abc - a - ab - c - bc]\,, \qquad (7.126)$$
$$BC = \frac{1}{4r}[(1) + a + bc + abc - b - ab - c - ac]\,, \qquad (7.127)$$
$$ABC = \frac{1}{4r}[(abc - bc) - (ac - c) - (ab - b) + (a - (1))] = \frac{1}{4r}[abc + a + b + c - ab - ac - bc - (1)]\,. \qquad (7.128)$$
Example 7.9. We demonstrate the analysis by means of Table 7.35. We have r = 2.
                 B = 0                   B = 1
           C = 0      C = 1        C = 0      C = 1
A = 0      4, 5       7, 9         20, 14     10, 6
           9 = (1)    16 = c       34 = b     16 = bc
A = 1      4, 11      2, 7         4, 6       14, 16
           15 = a     9 = ac       10 = ab    30 = abc

Table 7.35. Example for a $2^3$ design with $r = 2$ replicates.
Average Effects
$$A = \frac{1}{8}[15 - 9 + 10 - 34 + 9 - 16 + 30 - 16] = \frac{1}{8}[64 - 75] = -11/8 = -1.375,$$
$$B = \frac{1}{8}[34 + 10 + 16 + 30 - (9 + 15 + 16 + 9)] = \frac{1}{8}[90 - 49] = 41/8 = 5.125,$$
$$C = \frac{1}{8}[16 + 9 + 16 + 30 - (9 + 15 + 34 + 10)] = \frac{1}{8}[71 - 68] = 3/8 = 0.375,$$
$$AB = \frac{1}{8}[9 + 10 + 16 + 30 - (15 + 34 + 9 + 16)] = \frac{1}{8}[65 - 74] = -9/8 = -1.125,$$
$$AC = \frac{1}{8}[9 + 34 + 9 + 30 - (15 + 10 + 16 + 16)] = \frac{1}{8}[82 - 57] = 25/8 = 3.125,$$
$$BC = \frac{1}{8}[9 + 15 + 16 + 30 - (34 + 10 + 16 + 9)] = \frac{1}{8}[70 - 69] = 1/8 = 0.125,$$
$$ABC = \frac{1}{8}[30 + 15 + 34 + 16 - (10 + 9 + 16 + 9)] = \frac{1}{8}[95 - 44] = 51/8 = 6.375\,.$$
Source     SS       df   MS       F
Factor A   7.56      1   7.56     0.87
Factor B   105.06    1   105.06   12.09 *
AB         5.06      1   5.06     0.58
Factor C   0.56      1   0.56     0.06
AC         39.06     1   39.06    4.49
BC         0.06      1   0.06     0.01
ABC        162.56    1   162.56   18.71 *
Error      69.52     8   8.69
Total      389.44   15

Table 7.36. Analysis of variance for Table 7.35.
The sums of squares are (cf. (7.121))

SS_A = 11²/16 = 7.56 ,      SS_AB = 9²/16 = 5.06 ,
SS_B = 41²/16 = 105.06 ,    SS_AC = 25²/16 = 39.06 ,
SS_C = 3²/16 = 0.56 ,       SS_BC = 1²/16 = 0.06 ,
SS_ABC = 51²/16 = 162.56 ,
SS_Total = (4² + 5² + . . . + 14² + 16²) − 139²/16 = 1597 − 1207.56 = 389.44 ,
SS_Error = 69.52 .
The critical value for the F statistics is F_{1,8;0.95} = 5.32 (cf. Table 7.36). Since the ABC effect is significant, no reduction to a two-factorial model is possible.
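The computations of Example 7.9 can be checked numerically. The following sketch (Python; the helper names are ours) recomputes the average effects, the sums of squares contrast²/16 as in (7.121), and the F statistics of Table 7.36 from the raw data of Table 7.35:

```python
# Raw data of Table 7.35: r = 2 observations per treatment combination;
# "" stands for the treatment combination (1).
obs = {"": [4, 5], "a": [4, 11], "b": [20, 14], "ab": [4, 6],
       "c": [7, 9], "ac": [2, 7], "bc": [10, 6], "abc": [14, 16]}
totals = {t: sum(v) for t, v in obs.items()}
r, k = 2, 3
n = r * 2**k                                   # 16 observations in all

def sign(effect, combo):
    """Sign of `combo` in the contrast of `effect` (product over the
    effect's factors of +1 at the high level, -1 at the low level)."""
    s = 1
    for f in effect:
        s *= 1 if f in combo else -1
    return s

effects, ss = {}, {}
for e in ["a", "b", "c", "ab", "ac", "bc", "abc"]:
    contrast = sum(sign(e, t) * totals[t] for t in totals)
    effects[e] = contrast / (r * 2**(k - 1))   # cf. (7.122)-(7.128)
    ss[e] = contrast**2 / n                    # cf. (7.121)

grand = sum(sum(v) for v in obs.values())      # 139
ss_total = sum(y**2 for v in obs.values() for y in v) - grand**2 / n
ss_error = ss_total - sum(ss.values())         # 69.50 exactly
ms_error = ss_error / 8                        # 8 error degrees of freedom
F = {e: ss[e] / ms_error for e in ss}
```

The exact error sum of squares is 69.50; the 69.52 printed in the text arises from subtracting rounded components.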
7.10 Confounding
If the number of factors or levels increases in a factorial experiment, then the number of treatment combinations grows rapidly. When the number of treatment combinations is large, it may be difficult to obtain blocks large enough to accommodate all of them. In such situations one may use either connected incomplete block designs, e.g., a BIBD, in which all the main effect and interaction contrasts can be estimated, or unconnected designs in which not all of these contrasts can be estimated. Non-estimable contrasts are said to be confounded. Recall that a linear function λ′β is estimable if there exists a linear function l′y of the observations on the random variable y such that E(l′y) = λ′β. Two questions now arise: first, what does confounding mean, and second, how does it compare to using a BIBD? For notational simplicity, we represent the interaction A × B as AB, A × B × C as ABC, etc.
In order to understand confounding, let us consider a simple example of a 2² factorial with factors A and B. The four treatment combinations are (1), a, b and ab. Suppose each batch of raw material available for the experiment suffices for only two treatment combinations, so two batches of raw material are required and two of the four treatment combinations must be assigned to each block. Suppose this 2² factorial experiment is conducted in a randomized block design. The corresponding model is
E(y_ij) = µ + β_i + τ_j , [cf. (5.1)] (7.129)

then

A = (1/2r) [ab + a − b − (1)] ,   (7.130)

B = (1/2r) [ab + b − a − (1)] ,   (7.131)

AB = (1/2r) [ab + (1) − a − b] .   (7.132)
Suppose the following block arrangement is adopted:

Block 1    Block 2
(1)        a
ab         b
Let the block effects of blocks 1 and 2 be β₁ and β₂, respectively. Then the average responses corresponding to the treatment combinations a, b, ab and (1) under (7.129) are

E[y(a)] = µ + β₂ + τ(a) ,    (7.133)
E[y(b)] = µ + β₂ + τ(b) ,    (7.134)
E[y(ab)] = µ + β₁ + τ(ab) ,  (7.135)
E[y(1)] = µ + β₁ + τ(1) ,    (7.136)
respectively. Here y(a), y(b), y(ab), y(1) and τ(a), τ(b), τ(ab), τ(1) denote the responses and treatment effects corresponding to a, b, ab and (1), respectively. Ignoring the factor 1/2r in (7.130)-(7.132) and using (7.133)-(7.136), the effect A is expressible as

A = [µ + β₁ + τ(ab)] + [µ + β₂ + τ(a)] − [µ + β₂ + τ(b)] − [µ + β₁ + τ(1)]
  = τ(ab) + τ(a) − τ(b) − τ(1) .   (7.137)
So the block effects are not present in (7.137) and are not mixed up with the treatment effects. In this case, we say that the main effect A is not confounded. Similarly, for the main effect B, we have

B = [µ + β₁ + τ(ab)] + [µ + β₂ + τ(b)] − [µ + β₂ + τ(a)] − [µ + β₁ + τ(1)]
  = τ(ab) + τ(b) − τ(a) − τ(1) .   (7.138)
So there is no block effect present in (7.138) and thus B is not confounded. For the interaction effect AB, we have

AB = [µ + β₁ + τ(ab)] + [µ + β₁ + τ(1)] − [µ + β₂ + τ(a)] − [µ + β₂ + τ(b)]
   = 2(β₁ − β₂) + τ(ab) + τ(1) − τ(a) − τ(b) .   (7.139)
Here the block effects β₁ and β₂ are mixed up with the treatment effects and cannot be separated from them in (7.139). So AB is said to be confounded (or mixed up) with the blocks.
If instead the arrangement is as follows:

Block 1    Block 2
a          (1)
ab         b
then the main effect A is expressible as

A = [µ + β₁ + τ(ab)] + [µ + β₁ + τ(a)] − [µ + β₂ + τ(b)] − [µ + β₂ + τ(1)]
  = 2(β₁ − β₂) + τ(ab) + τ(a) − τ(b) − τ(1) .   (7.140)
So the main effect A is confounded with the blocks in this arrangement of treatments.
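The effect of the block arrangement on confounding can also be seen numerically. In the sketch below (Python; the numerical values of µ, β₁, β₂ and the τ's are made up purely for illustration), the effects are computed from expected responses built as in (7.129). Under the first arrangement A and B are free of the block difference while AB is shifted by 2(β₁ − β₂); under the second arrangement A picks up the shift instead:

```python
# Hypothetical values (for illustration only) of the model (7.129):
mu = 10.0
beta = {1: 7.0, 2: 3.0}                          # block effects
tau = {"1": 1.0, "a": 2.0, "b": 4.0, "ab": 8.0}  # treatment effects

def response(t, block):
    """Expected response of treatment t placed in the given block."""
    return mu + beta[block] + tau[t]

def effects(arrangement):
    """A, B, AB as in (7.130)-(7.132), ignoring the factor 1/2r;
    `arrangement` maps each treatment combination to its block."""
    y = {t: response(t, arrangement[t]) for t in tau}
    A = y["ab"] + y["a"] - y["b"] - y["1"]
    B = y["ab"] + y["b"] - y["a"] - y["1"]
    AB = y["ab"] + y["1"] - y["a"] - y["b"]
    return A, B, AB

shift = 2 * (beta[1] - beta[2])                  # the block difference, 8.0

# Arrangement 1: {(1), ab} in block 1, {a, b} in block 2 -> AB confounded.
A1, B1, AB1 = effects({"1": 1, "ab": 1, "a": 2, "b": 2})
# Arrangement 2: {a, ab} in block 1, {(1), b} in block 2 -> A confounded.
A2, B2, AB2 = effects({"a": 1, "ab": 1, "1": 2, "b": 2})
```

Whatever values are substituted, A1 and B1 equal the pure treatment contrasts, while AB1 and A2 differ from them by exactly 2(β₁ − β₂).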
We notice that it is in our control to decide which of the effects is to be confounded. The order in which treatments are run within a block is determined randomly, and the choice of which block is run first is also made randomly.
The following observation emerges from the allocation of treatments inblocks.
For a given effect, when the two treatment combinations with the same sign are assigned to one block and the two treatment combinations with the opposite sign are assigned to the other block, the effect gets confounded.
For example, in case AB is confounded as in (7.139), then
• ab and (1) with + signs are assigned to block 1 whereas
• a and b with − signs are assigned to block 2.
Similarly when A is confounded as in (7.140), then
• a and ab with + signs are assigned to block 1 whereas
• (1) and b with − signs are assigned to block 2.
The reason behind this observation is that if every block contains treatment combinations forming a linear contrast, then the effects are estimable and thus unconfounded. This is also evident from the theory of linear estimation: a linear parametric function is estimable if it is in the form of a linear contrast.
The contrasts which are not estimable are said to be confounded withthe differences between blocks (or block effects). The contrasts which areestimable are said to be unconfounded with blocks or free from block effects.
Now we explain how confounding and the BIBD compare. Consider a 2³ factorial experiment, which needs a block size of 8. Suppose the raw material available to conduct the experiment is sufficient only for blocks of size 4. One can use a BIBD in this case with parameters b = 14, k = 4, v = 8, r = 7 and λ = 3 (such a BIBD exists). For this BIBD, the efficiency factor is

E = λv/(kr) = 6/7

and

Var(τ_j − τ_j′)_BIBD = (2k/λv) σ² = (2/6) σ²   (j ≠ j′). (7.141)
Consider now an unconnected design in which 7 out of the 14 blocks get the treatment combinations in block 1 as

a  b  c  abc

and the remaining 7 blocks get the treatment combinations in block 2 as

(1)  ab  bc  ac .

In this case, all the effects A, B, C, AB, BC and AC are estimable, but ABC is not estimable, because in

ABC = (a − 1)(b − 1)(c − 1) = (a + b + c + abc) − ((1) + ab + bc + ac)

the treatment combinations with + signs all lie in block 1 and those with − signs all lie in block 2. In this case, the variance of the estimates of the unconfounded main effects and interactions is 8σ²/7. Note that in the case of an RBD,
Var(τ_j − τ_j′)_RBD = 2σ²/r = 2σ²/7   (j ≠ j′) (7.142)

and an effect involves four such linear contrasts, so the total variance is 4 × (2σ²/7) = 8σ²/7, which is smaller than the corresponding total variance 4 × (2/6)σ² = 4σ²/3 under the BIBD (cf. (7.141)).
We observe that at the cost of not being able to estimate ABC, we obtain better estimates of A, B, C, AB, BC and AC with the same number of replicates as in the BIBD. Since higher-order interactions are difficult to interpret and are usually not large, it is much better to use confounding arrangements, which provide better estimates of the interactions in which we are more interested.

The reader may note that this example is for understanding only; the concepts behind incomplete block designs and confounding are, as such, different.
Definition 7.2. The arrangement of treatment combinations in differentblocks, whereby some pre-determined effect (either main or interaction)contrasts are confounded is called a confounding arrangement.
For example, when the interaction ABC is confounded in a 2³ factorial experiment, the confounding arrangement consists of dividing the eight treatment combinations into the following two sets:
a b c abc
and
(1) ab bc ac
With the treatments of each set assigned to the same block and each set replicated the same number of times in the experiment, we say that we have a confounding arrangement of a 2³ factorial in two blocks. It may be noted that any confounding arrangement has to be such that only the predetermined interactions are confounded, and the estimates of the unconfounded interactions are orthogonal whenever the interactions are orthogonal.
Definition 7.3. The interactions which are confounded are called thedefining contrasts of the confounding arrangement.
A confounded contrast will have treatment combinations with the same sign within each block of the confounding arrangement. For example, if instead the effect AB is to be confounded, then following Table 7.34 we put all factor combinations with + sign, i.e., (1), ab, c and abc, in one block and all factor combinations with − sign, i.e., a, b, ac and bc, in the other block. So the block size reduces from 8 to 4 when one effect is confounded in a 2³ factorial experiment.
Suppose that, along with ABC, we also want to confound C. To obtain such blocks, consider the blocks in which ABC is confounded and divide each of them into halves. So the block
a b c abc
is divided into following two blocks:
a b and c abc
and the block
(1) ab bc ac
is divided into following two blocks:
(1) ab and bc ac
These blocks of 4 treatments are thus divided into 2 blocks of 2 treatments each, and they are obtained in the following way. If only C is confounded, then the block with + sign of the treatment combinations in C is
c ac bc abc
and block with − sign of treatment combinations in C is
(1) a b ab .
Now look into

(i) the following block with + sign when ABC is confounded,

a b c abc (7.143)

(ii) the following block with + sign when C is confounded,

c ac bc abc (7.144)

(iii) and Table 7.34.
Identify the treatment combinations having common + signs in the two blocks (7.143) and (7.144), using Table 7.34. These treatment combinations are c and abc, so assign them to one block. The remaining treatment combinations out of a, b, c and abc are a and b, which go into another block.
Similarly, look into

(i) the following block with − sign when ABC is confounded,

(1) ab bc ac (7.145)

(ii) the following block with − sign when C is confounded,

(1) a b ab (7.146)

(iii) and Table 7.34.
Identify the treatment combinations having a common − sign in the two blocks (7.145) and (7.146), using Table 7.34. These treatment combinations are (1) and ab, which go into one block; the remaining two treatment combinations ac and bc out of (1), ab, bc and ac go into another block. So the blocks where both ABC and C are confounded together are

(1) ab ,   a b ,   ac bc   and   c abc .
While making these assignments of treatment combinations into four blocks, each of size two, we notice that another effect, viz. AB, also gets confounded automatically. Thus when we confound two effects, a third effect gets confounded automatically. This situation is quite general: the defining contrasts for a confounding arrangement cannot be chosen arbitrarily; if some defining contrasts are selected, then certain others also get confounded.
Now we present some definitions which are useful in describing theconfounding arrangements.
Definition 7.4. Given any two interactions, the generalized interaction is obtained by multiplying the factors (in capital letters) and ignoring all letters with an even exponent.

For example, the generalized interaction of the factors ABC and BCD is ABC × BCD = AB²C²D = AD, and the generalized interaction of the factors AB, BC and ABC is AB × BC × ABC = A²B³C² = B.
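Definition 7.4 amounts to multiplying interactions letter by letter with exponents reduced mod 2, which is easy to sketch (Python; the function name is ours):

```python
def gen_interaction(*interactions):
    """Generalized interaction (Definition 7.4): multiply the factors and
    drop every letter that ends up with an even exponent."""
    letters = "".join(interactions)
    return "".join(sorted(f for f in set(letters) if letters.count(f) % 2 == 1))

print(gen_interaction("ABC", "BCD"))       # AD
print(gen_interaction("AB", "BC", "ABC"))  # B
```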
Definition 7.5. A set of main effects and interaction contrasts is called inde-pendent if no member of the set can be obtained as a generalized interactionof the other members of the set.
For example, the set of factors AB, BC and AD is an independent set, but the set of factors AB, BC, CD and AD is not an independent set, because AB × BC × CD = AB²C²D = AD, which is already contained in the set.
Definition 7.6. The treatment combination a^p b^q c^r . . . is said to be orthogonal to the interaction A^x B^y C^z . . . if (px + qy + rz + . . .) is divisible by 2. Since p, q, r, . . . , x, y, z, . . . are either 0 or 1, a treatment combination is orthogonal to an interaction if they have an even number of letters in common. The treatment combination (1) is orthogonal to every interaction.
If a^{p1} b^{q1} c^{r1} . . . and a^{p2} b^{q2} c^{r2} . . . are both orthogonal to A^x B^y C^z . . . , then the product a^{p1+p2} b^{q1+q2} c^{r1+r2} . . . is also orthogonal to A^x B^y C^z . . . Similarly, if two interactions are orthogonal to a treatment combination, then their generalized interaction is also orthogonal to it.
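The orthogonality test of Definition 7.6 and its closure property can be sketched directly (Python; "(1)" is represented by the empty string, and the names are ours):

```python
def multiply(t1, t2):
    """Product of two treatment combinations, exponents reduced mod 2."""
    letters = t1 + t2
    return "".join(sorted(f for f in set(letters) if letters.count(f) % 2 == 1))

def is_orthogonal(combo, interaction):
    """Definition 7.6: a treatment combination (lowercase letters) is
    orthogonal to an interaction (uppercase) iff they share an even
    number of letters."""
    return len(set(combo) & set(interaction.lower())) % 2 == 0

# (1), written here as the empty string, is orthogonal to every interaction:
print(is_orthogonal("", "ABC"))                      # True
# acd and bce are both orthogonal to ABC, hence so is their product abde:
print(is_orthogonal(multiply("acd", "bce"), "ABC"))  # True
```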
Now we give some general results for a confounding arrangement. Suppose we wish to have a confounding arrangement in 2^p blocks of a 2^k factorial experiment. Then we have the following observations:
1. The size of each block is 2^(k−p).

2. The number of elements in the defining contrasts is 2^p − 1, i.e., (2^p − 1) interactions have to be confounded.
Proof: If p independent interactions are confounded, then the number of generalized interactions formed as products of m of them is the binomial coefficient C(p, m), m = 1, 2, . . . , p. So the total number of confounded interactions is Σ_{m=1}^{p} C(p, m) = 2^p − 1.

3. If any two interactions are confounded, then their generalized interaction is also confounded.

4. The number of independent contrasts out of the (2^p − 1) defining contrasts is p; the rest are obtained as generalized interactions.

5. The number of effects getting confounded automatically is 2^p − p − 1.
To illustrate this, consider a 2^5 factorial (k = 5) with the five factors A, B, C, D and E, to be confounded in 2^3 blocks (p = 3). The size of each block is 2^(5−3) = 4. The number of defining contrasts is 2^3 − 1 = 7, of which p = 3 independent contrasts can be chosen arbitrarily. Suppose we choose the p = 3 independent contrasts as
(i) ACE
(ii) CDE
(iii) ABDE
and then the remaining 4 out of 7 defining contrasts are obtained as
(iv) (ACE) × (CDE) = AC²DE² = AD

(v) (ACE) × (ABDE) = A²BCDE² = BCD

(vi) (CDE) × (ABDE) = ABCD²E² = ABC

(vii) (ACE) × (CDE) × (ABDE) = A²BC²D²E³ = BE.
Alternatively, if we choose another set of p = 3 independent contrasts as
(i) ABCD
(ii) ACDE
(iii) ABCDE,
then the remaining defining contrasts are obtained as

(iv) (ABCD) × (ACDE) = A²BC²D²E = BE

(v) (ABCD) × (ABCDE) = A²B²C²D²E = E

(vi) (ACDE) × (ABCDE) = A²BC²D²E² = B

(vii) (ABCD) × (ACDE) × (ABCDE) = A³B²C³D³E² = ACD.
In this case, the main effects B and E also get confounded. As a rule, try to confound, as far as possible, only higher-order interactions, because they are difficult to interpret.

After selecting the p independent defining contrasts, divide the 2^k treatment combinations into 2^p groups of 2^(k−p) combinations each, each group going into one block.
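The closure of a set of independent contrasts under generalized interaction, used in both examples above, can be sketched as follows (Python; the function names are ours):

```python
from itertools import combinations

def gen_interaction(*interactions):
    """Multiply interactions and reduce exponents mod 2 (Definition 7.4)."""
    letters = "".join(interactions)
    return "".join(sorted(f for f in set(letters) if letters.count(f) % 2 == 1))

def defining_contrasts(independent):
    """All 2^p - 1 confounded interactions generated by p independent ones:
    every nonempty product of the independent contrasts."""
    out = set()
    for m in range(1, len(independent) + 1):
        for subset in combinations(independent, m):
            out.add(gen_interaction(*subset))
    return out

print(sorted(defining_contrasts(["ACE", "CDE", "ABDE"])))
# ['ABC', 'ABDE', 'ACE', 'AD', 'BCD', 'BE', 'CDE']
print(sorted(defining_contrasts(["ABCD", "ACDE", "ABCDE"])))
# ['ABCD', 'ABCDE', 'ACD', 'ACDE', 'B', 'BE', 'E']
```

The second choice confirms that the main effects B and E get confounded.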
Definition 7.7. The group containing the combination (1) is called the principal block or key block. It contains all the treatment combinations which are orthogonal to the chosen independent defining contrasts.
If there are p independent defining contrasts, then any treatment combination in the principal block is orthogonal to all p of them. In order to obtain the principal block,
— write the treatment combinations in standard order.
— check each one of them for orthogonality.
— if two treatment combinations belong to the principal block, their product also belongs to the principal block.
— once a few treatment combinations of the principal block have been determined, the other treatment combinations can be obtained by the multiplication rule.
Now we illustrate these steps in the following example.
Example 7.10. Consider the setup of a 2^5 factorial experiment in which we want to divide the treatment combinations into 2^3 blocks by confounding the three effects AD, BE and ABC. The generalized interactions in this case are ABDE, BCD, ACE and CDE.
In order to find the principal block, first write the treatment combinations in standard order as follows.

(1)   a    b    ab    c    ac    bc    abc
d     ad   bd   abd   cd   acd   bcd   abcd
e     ae   be   abe   ce   ace   bce   abce
de    ade  bde  abde  cde  acde  bcde  abcde .
Place a treatment combination in the principal block if it has an even number of letters in common with each of the confounded effects AD, BE and ABC. The principal block thus has (1), acd, bce and abde (= acd × bce). Obtain the other blocks of the confounding arrangement from the principal block by multiplying its treatment combinations by a treatment combination not occurring in it or in any block already obtained; only distinct blocks are kept. In this case, the other blocks are obtained by multiplying by a, b, ab, c, ac, bc and abc, as shown in Table 7.37.
Table 7.37. Arrangement of treatments in blocks when AD, BE and ABC are confounded

Block 1        Block   Block   Block   Block   Block   Block   Block
(principal)    2       3       4       5       6       7       8
(1)            a       b       ab      c       ac      bc      abc
acd            cd      abcd    bcd     ad      d       abd     bd
bce            abce    ce      ace     be      abe     e       ae
abde           bde     ade     de      abcde   bcde    acde    cde
For example, block 2 is obtained by multiplying a with each factor combination in the principal block: (1) × a = a, acd × a = a²cd = cd, bce × a = abce, abde × a = a²bde = bde; block 3 is obtained by multiplying b with (1), acd, bce and abde, and similarly the other blocks are obtained. If any other treatment combination is chosen to be multiplied with the treatments in the principal block, then we get a block that is already among the blocks 1 to 8. For example, if ae is multiplied with the treatments in the principal block, the resulting block consists of (1) × ae = ae, acd × ae = cde, bce × ae = abc and abde × ae = bd, which is the same as block 8.
Alternatively, if ACD, ABCD and ABCDE are to be confounded, then the independent defining contrasts are ACD, ABCD, ABCDE, and the principal block has (1), ac, ad and cd (= ac × ad).
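The construction of the principal block and its cosets in Example 7.10 can be sketched as follows (Python; the names are ours). Treatment combinations orthogonal to all of AD, BE and ABC form the principal block; multiplying it by an unused combination yields the other blocks of Table 7.37:

```python
from itertools import combinations

def multiply(t1, t2):
    """Product of two treatment combinations, exponents reduced mod 2."""
    letters = t1 + t2
    return "".join(sorted(f for f in set(letters) if letters.count(f) % 2 == 1))

def is_orthogonal(combo, interaction):
    """Even number of common letters (Definition 7.6)."""
    return len(set(combo) & set(interaction)) % 2 == 0

# All 32 treatment combinations of the 2^5 design ("" stands for (1)).
all_combos = ["".join(c) for m in range(6) for c in combinations("abcde", m)]

# Independent defining contrasts of Example 7.10, in lowercase letters.
defining = ["ad", "be", "abc"]

# Principal block: combinations orthogonal to every defining contrast.
principal = [t for t in all_combos
             if all(is_orthogonal(t, e) for e in defining)]
print(principal)  # ['', 'acd', 'bce', 'abde']

# Any other block is a coset: multiply the principal block by a treatment
# combination not yet used (here a, giving block 2 of Table 7.37).
block2 = sorted(multiply("a", t) for t in principal)
print(block2)     # ['a', 'abce', 'bde', 'cd']
```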
7.11 Analysis of Variance in Case of Confounded Effects
When an effect is confounded, it means that it is not estimable. The following steps are followed to conduct the analysis of variance in factorial experiments with confounded effects:
• Obtain the sums of squares due to main and interaction effects in the usual way, as if no effect were confounded.

• Drop the sums of squares corresponding to the confounded effects and retain only the sums of squares due to unconfounded effects.
• Find the total sum of squares.
• Obtain the sum of squares due to error and the associated degrees of freedom by subtraction.
• Conduct the test of hypothesis in the usual way.
Example 7.11. (Example 7.9 continued) We demonstrate the analysis of variance under confounded effects with the same data as in Example 7.9. Suppose ABC is confounded in the setup of Example 7.9 and all other effects are estimable. The average effects and the sums of squares of the unconfounded effects are obtained as before:
A = −1.375, SSA = 7.56 ,
B = 5.125, SSB = 105.06 ,
C = 0.375, SSC = 0.56 ,
AB = −1.125, SSAB = 5.06 ,
AC = 3.125, SSAC = 39.06 ,
BC = 0.125, SSBC = 0.06.
Also, from earlier results
SSTotal = 389.44
and
SSError = SSTotal − (SSA + SSB + SSC + SSAB + SSAC + SSBC)= 232.08.
Table 7.38. Analysis of variance for Example 7.11
Source      SS       df   MS      F
Factor A    7.56     1    7.56    0.29
Factor B    105.06   1    105.06  4.07
AB          5.06     1    5.06    0.20
Factor C    0.56     1    0.56    0.02
AC          39.06    1    39.06   1.51
BC          0.06     1    0.06    0.00
Error       232.08   9    25.79
Total       389.44   15
The critical value for the F statistics is F_{1,9;0.95} = 5.12. So none of the effects is found to be significant.
It may be noted that in Table 7.36 the effect of B was found to be significant when ABC was not confounded. Now with ABC confounded, the effect of B turns out to be insignificant in Table 7.38.
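The adjusted analysis of Example 7.11 can be reproduced in a few lines (Python; the numbers are the rounded sums of squares from the text, and F is formed as MS/MS_Error, as in Table 7.36):

```python
# Rounded sums of squares of the unconfounded effects (Example 7.9);
# ABC is confounded, so its sum of squares is absorbed into the error.
ss = {"A": 7.56, "B": 105.06, "C": 0.56,
      "AB": 5.06, "AC": 39.06, "BC": 0.06}
ss_total = 389.44

ss_error = ss_total - sum(ss.values())   # 232.08
df_error = 15 - len(ss)                  # 9 degrees of freedom
ms_error = ss_error / df_error           # about 25.79
F = {e: round(s / ms_error, 2) for e, s in ss.items()}
# The largest F value (for B) stays below F(1, 9; 0.95) = 5.12,
# so no effect is significant once ABC is confounded.
```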
7.12 Partial Confounding
The purpose of confounding is to assess the more important treatment comparisons with greater precision. To achieve this, less important treatment contrasts may be deliberately mixed up with the incomplete-block differences in all the replicates, which is termed total confounding. If instead an effect is confounded with the incomplete-block differences in one or more replicates, another effect in some other replicates, and so on, then these effects are said to be partially confounded with the incomplete-block differences. Thus the effects are confounded with incomplete-block differences in some of the replicates only and are unconfounded in the others. The effects on which information is available from all the replicates are then determined more accurately. This type of confounding is called partial confounding.
Definition 7.8. If all the effects of a certain order are confounded with incomplete-block differences in an equal number of replicates in a design, the design is said to be a balanced partially confounded design. If all the effects of a certain order are confounded an unequal number of times, the design is said to be an unbalanced partially confounded design.
We discuss only the analysis of variance of balanced partially confounded designs, through 2² and 2³ factorial experiments.
Example 7.12. Consider the case of a 2² factorial as in Table 7.31 in a randomized block design, where y*_i = ((1), a, b, ab)′ denotes the vector of total responses in the ith replication and each treatment is replicated r times, i = 1, 2, . . . , r. If no factor is confounded, then similarly to (7.113) we can write
A = (1/2r) Σ_{i=1}^{r} c′_A y*_i ,   (7.147)

B = (1/2r) Σ_{i=1}^{r} c′_B y*_i ,   (7.148)

AB = (1/2r) Σ_{i=1}^{r} c′_AB y*_i ,   (7.149)
which holds because all the effects are estimated from all the replicates; the contrasts c_A, c_B, c_AB are taken from Table 7.31 and each contrast has 4 elements.
We have in this case
c′AcA = c′BcB = c′ABcAB = 4
and the sums of squares in (7.114)-(7.116) remain valid and can be rewritten as
SS_A = (Σ_{i=1}^{r} c′_A y*_i)² / (r c′_A c_A) = (ab + a − b − (1))² / 4r ,   (7.150)

SS_B = (Σ_{i=1}^{r} c′_B y*_i)² / (r c′_B c_B) = (ab + b − a − (1))² / 4r ,   (7.151)

SS_AB = (Σ_{i=1}^{r} c′_AB y*_i)² / (r c′_AB c_AB) = (ab + (1) − a − b)² / 4r .   (7.152)
Now consider the setup of 3 replicates, each consisting of 2 incomplete blocks, as in Figure 7.8. The factor A is confounded in replicate 1, the factor B in replicate 2 and the interaction AB in replicate 3. Suppose we have r repetitions of each of the blocks in the three replicates, with the assignment of replications, of blocks within replicates and of plots within blocks randomized. Now from the setup of Figure 7.8,
Replicate 1           Replicate 2           Replicate 3
Block 1   Block 2     Block 1   Block 2     Block 1   Block 2
ab        b           ab        a           ab        a
a         (1)         b         (1)         (1)       b

Figure 7.8. Confounding of A, B and AB in 3 replicates
• factor A can be estimated from replicates 2 and 3,
• factor B can be estimated from replicates 1 and 3 and
• interaction AB can be estimated from replicates 1 and 2.
When A is estimated from replicate 2 only, then

A_rep2 = (Σ_{i=1}^{r} c′_{A2} y*_i)_rep2 / 2r   (7.153)

and when A is estimated from replicate 3 only, then

A_rep3 = (Σ_{i=1}^{r} c′_{A3} y*_i)_rep3 / 2r ,   (7.154)

where c_{A2} and c_{A3} are the contrasts under replicates 2 and 3, respectively, each having 4 elements. Now A is estimated from both replicates 2 and 3 as the average of A_rep2 and A_rep3:

A_pc = (A_rep2 + A_rep3) / 2
     = [(Σ_{i=1}^{r} c′_{A2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{A3} y*_i)_rep3] / 4r
     = (Σ_{i=1}^{r} c*′_A y*_i) / 4r ,   (7.155)

where the vector c*′_A = (c′_{A2}, c′_{A3}) consists of 8 elements and the subscript pc in A_pc denotes the estimate of A under partial confounding (pc). The sum of squares under partial confounding in this case is

SS_{A_pc} = (Σ_{i=1}^{r} c*′_A y*_i)² / (r c*′_A c*_A) = (Σ_{i=1}^{r} c*′_A y*_i)² / 8r   (7.156)
and the variance of A_pc is

Var(A_pc) = (1/4r)² Var(Σ_{i=1}^{r} c*′_A y*_i)
          = (1/4r)² Var[(Σ_{i=1}^{r} c′_{A2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{A3} y*_i)_rep3]
          = (1/4r)² (4rσ² + 4rσ²)
          = σ²/2r ,   (7.157)

assuming that the y_ij's are independent with Var(y_ij) = σ² for all i and j.

Now suppose A is not confounded in any of the blocks in Figure 7.8.
Then A can be estimated from all three replicates, each repeated r times, as

A*_pc = (A_rep1 + A_rep2 + A_rep3) / 3
      = [(Σ_{i=1}^{r} c′_{A1} y*_i)_rep1 + (Σ_{i=1}^{r} c′_{A2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{A3} y*_i)_rep3] / 6r
      = (Σ_{i=1}^{r} c**′_A y*_i) / 6r ,   (7.158)

where the vector c**′_A = (c′_{A1}, c′_{A2}, c′_{A3}) consists of 12 elements. The variance of A under (7.158) is
Var(A*_pc) = (1/6r)² Var[(Σ_{i=1}^{r} c′_{A1} y*_i)_rep1 + (Σ_{i=1}^{r} c′_{A2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{A3} y*_i)_rep3]
           = (1/6r)² (4rσ*² + 4rσ*² + 4rσ*²)
           = σ*²/3r ,   (7.159)

assuming that the y_ij's are independent with Var(y_ij) = σ*² for all i and j.

One may note that the expressions for A in (7.147) and A*_pc in (7.158) differ only in that A in (7.147) is based on r replications whereas A*_pc in (7.158) is based on 3r replications; if we set r* = 3r, then A*_pc in (7.158) becomes the same as A in (7.147), and the expressions for the variances of A and A*_pc also coincide under r* = 3r in (7.159). Comparing (7.157) and (7.159), we see that the information on A in the partially confounded scheme relative
to that in the unconfounded scheme is

(2r/σ²) / (3r/σ*²) = (2/3) (σ*²/σ²) .   (7.160)
If σ*² > (3/2)σ², then the information in the partially confounded design is more than the information in the unconfounded design.

Also, a confounded effect is completely lost under total confounding, but some information about it can be recovered under partial confounding: in this case two thirds of the total information can be recovered for A (cf. (7.160)).
Similarly, when B is estimated from replicates 1 and 3 separately, then

B_rep1 = (Σ_{i=1}^{r} c′_{B1} y*_i)_rep1 / 2r ,

B_rep3 = (Σ_{i=1}^{r} c′_{B3} y*_i)_rep3 / 2r

and

B_pc = (B_rep1 + B_rep3) / 2
     = [(Σ_{i=1}^{r} c′_{B1} y*_i)_rep1 + (Σ_{i=1}^{r} c′_{B3} y*_i)_rep3] / 4r
     = (Σ_{i=1}^{r} c*′_B y*_i) / 4r ,   (7.161)

where the vector c*′_B = (c′_{B1}, c′_{B3}) consists of 8 elements. The sum of squares due to B_pc is

SS_{B_pc} = (Σ_{i=1}^{r} c*′_B y*_i)² / (r c*′_B c*_B) = (Σ_{i=1}^{r} c*′_B y*_i)² / 8r   (7.162)
and the variance of B_pc is

Var(B_pc) = (1/4r)² Var(Σ_{i=1}^{r} c*′_B y*_i) = σ²/2r .   (7.163)
When AB is estimated from replicates 1 and 2 separately, then

AB_rep1 = (Σ_{i=1}^{r} c′_{AB1} y*_i)_rep1 / 2r ,

AB_rep2 = (Σ_{i=1}^{r} c′_{AB2} y*_i)_rep2 / 2r ,
and

AB_pc = (AB_rep1 + AB_rep2) / 2
      = [(Σ_{i=1}^{r} c′_{AB1} y*_i)_rep1 + (Σ_{i=1}^{r} c′_{AB2} y*_i)_rep2] / 4r
      = (Σ_{i=1}^{r} c*′_AB y*_i) / 4r ,   (7.164)

where the vector c*′_AB = (c′_{AB1}, c′_{AB2}) consists of 8 elements. The sum of squares due to AB_pc is

SS_{AB_pc} = (Σ_{i=1}^{r} c*′_AB y*_i)² / (r c*′_AB c*_AB) = (Σ_{i=1}^{r} c*′_AB y*_i)² / 8r   (7.165)
and the variance of AB_pc is

Var(AB_pc) = (1/4r)² Var(Σ_{i=1}^{r} c*′_AB y*_i) = σ²/2r .   (7.166)
Now we illustrate how the sum of squares due to blocks is adjusted under partial confounding. We consider the setup as in Figure 7.8. There are 6 blocks (2 blocks under each of the replicates 1, 2 and 3), each repeated r times. So there are in total (6r − 1) degrees of freedom associated with the sum of squares due to blocks, which is divided into two parts
– sum of squares due to replicates with (3r−1) degrees of freedom and
– sum of squares due to within replicates with 3r degrees of freedom.
Now, denoting by

• B_i the total of the ith block and

• R_i the total of the ith replicate,
the sum of squares due to blocks is

SS_Block(pc) = (1/2) Σ B_i² − Y².../N   (sum over all 6r blocks; N = 12r)
             = Σ_{i=1}^{3r} [(B²_{1i} + B²_{2i})/2 − R_i²/2²] + [(1/2²) Σ_{i=1}^{3r} R_i² − Y².../12r] ,   (7.167)

where B_{ji} denotes the total of the jth block in the ith replicate (j = 1, 2). The sum of squares due to blocks within replications (wr) is

SS_Block(wr) = Σ_{i=1}^{3r} [(B²_{1i} + B²_{2i})/2 − R_i²/2²]   (7.168)

and the sum of squares due to replications is

SS_Block(r) = (1/2²) Σ_{i=1}^{3r} R_i² − Y².../12r .   (7.169)
So we have
SSBlock = SSBlock(wr) + SSBlock(r) (7.170)
in the case of partial confounding. The total sum of squares is

SS_Total(pc) = Σ_i Σ_j Σ_k y²_{ijk} − Y².../N ;   (N = 12r). (7.171)
The analysis of variance table in this case is given in Table 7.39. The test of hypothesis can be carried out in the usual way as in the case of factorial experiments.
Example 7.13. Consider the setup of a 2³ factorial experiment with block size 2² and 4 replications as in Figure 7.9.
The interaction effects AB, AC, BC and ABC are confounded in replicates 1, 2, 3 and 4, respectively. Each block is repeated r times, with the assignment of replicates, of blocks within replicates and of plots within blocks randomized. In this example, we need to estimate the unconfounded factors A, B, C and the partially confounded factors
Table 7.39. Analysis of variance in 2² factorial under partial confounding as in Example 7.12

Source                    SS              df                   MS
Replicates                SS_Block(r)     3r − 1 (= r* − 1)    MS_Block(r)
Blocks within replicates  SS_Block(wr)    3r (= r*)            MS_Block(wr)
Factor A                  SS_A(pc)        1                    MS_A(pc)
Factor B                  SS_B(pc)        1                    MS_B(pc)
AB                        SS_AB(pc)       1                    MS_AB(pc)
Error                     by subtraction  6r − 3 (= 2r* − 3)   MS_E(pc)
Total                     SS_Total(pc)    12r − 1 (= 4r* − 1)
Replicate 1           Replicate 2
Block 1   Block 2     Block 1   Block 2
(1)       a           (1)       a
ab        b           b         ab
c         ac          ac        c
abc       bc          abc       bc

Replicate 3           Replicate 4
Block 1   Block 2     Block 1   Block 2
(1)       b           (1)       a
a         c           ab        b
bc        ab          ac        c
abc       ac          bc        abc

Figure 7.9. Arrangement of treatments in blocks in Example 7.13
AB, AC, BC and ABC. The unconfounded factors can be estimated from all four replicates, whereas the partially confounded factors can be estimated from the following replicates:
• AB from the replicates 2, 3 and 4,
• AC from the replicates 1, 3 and 4,
• BC from the replicates 1, 2 and 4 and
• ABC from the replicates 1, 2 and 3.
Using Table 7.34 and (7.119)-(7.128), we first present the estimation of the unconfounded factors A, B and C, which are estimated from all four replicates.
The estimation of these factors from the lth replicate (l = 1, 2, 3, 4) is as follows:

A_repl = (Σ_{i=1}^{r} c′_{Al} y*_i) / 4r ,   (7.172)

A = (Σ_{l=1}^{4} A_repl) / 4 = (Σ_{l=1}^{4} Σ_{i=1}^{r} c′_{Al} y*_i) / 16r = (Σ_{i=1}^{r} c*′_A y*_i) / 16r ,   (7.173)

where the vector c*′_A = (c′_{A1}, c′_{A2}, c′_{A3}, c′_{A4}) consists of 32 elements and each c_{Al} (l = 1, 2, 3, 4) has 8 elements. The sum of squares due to A is

SS_A = (Σ_{i=1}^{r} c*′_A y*_i)² / (r c*′_A c*_A) = (Σ_{i=1}^{r} c*′_A y*_i)² / 32r   (7.174)

and the variance of A is

Var(A) = (1/16r)² Var(Σ_{i=1}^{r} c*′_A y*_i) = (1/16r)² × 32rσ² = σ²/8r ,   (7.175)

assuming that the y_ij's are independent with Var(y_ij) = σ² for all i and j.

Similarly, for B and C,
B = (Σ_{i=1}^{r} c*′_B y*_i) / 16r ,   SS_B = (Σ_{i=1}^{r} c*′_B y*_i)² / 32r ,   Var(B) = σ²/8r ,

where the vector c*′_B = (c′_{B1}, c′_{B2}, c′_{B3}, c′_{B4}) consists of 32 elements, and

C = (Σ_{i=1}^{r} c*′_C y*_i) / 16r ,   SS_C = (Σ_{i=1}^{r} c*′_C y*_i)² / 32r ,   Var(C) = σ²/8r ,

where the vector c*′_C = (c′_{C1}, c′_{C2}, c′_{C3}, c′_{C4}) consists of 32 elements.

Next we consider the estimation of the confounded factor AB, which can be
estimated from replicates 2, 3 and 4 as

AB_pc = (AB_rep2 + AB_rep3 + AB_rep4) / 3
      = (1/12r) [(Σ_{i=1}^{r} c′_{AB2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{AB3} y*_i)_rep3 + (Σ_{i=1}^{r} c′_{AB4} y*_i)_rep4]
      = (Σ_{i=1}^{r} c*′_AB y*_i) / 12r ,   (7.176)

where the vector c*′_AB = (c′_{AB2}, c′_{AB3}, c′_{AB4}) consists of 24 elements and each of c_{AB2}, c_{AB3} and c_{AB4} has 8 elements. The sum of squares due to AB_pc is

SS_{AB_pc} = (Σ_{i=1}^{r} c*′_AB y*_i)² / (r c*′_AB c*_AB) = (Σ_{i=1}^{r} c*′_AB y*_i)² / 24r   (7.177)
and the variance of AB_pc is

Var(AB_pc) = (1/12r)² Var(Σ_{i=1}^{r} c*′_AB y*_i)
           = (1/12r)² Var[(Σ_{i=1}^{r} c′_{AB2} y*_i)_rep2 + (Σ_{i=1}^{r} c′_{AB3} y*_i)_rep3 + (Σ_{i=1}^{r} c′_{AB4} y*_i)_rep4]
           = (1/12r)² (8rσ² + 8rσ² + 8rσ²)
           = σ²/6r .   (7.178)
Similarly, the confounded effects AC, BC and ABC are estimated, and their respective sums of squares and variances are obtained as follows:

AC_pc = (AC_rep1 + AC_rep3 + AC_rep4) / 3 = (Σ_{i=1}^{r} c*′_AC y*_i) / 12r ,
SS_{AC_pc} = (Σ_{i=1}^{r} c*′_AC y*_i)² / 24r ,   Var(AC_pc) = σ²/6r ,

where the vector c*′_AC = (c′_{AC1}, c′_{AC3}, c′_{AC4}) consists of 24 elements,

BC_pc = (BC_rep1 + BC_rep2 + BC_rep4) / 3 = (Σ_{i=1}^{r} c*′_BC y*_i) / 12r ,
SS_{BC_pc} = (Σ_{i=1}^{r} c*′_BC y*_i)² / 24r ,   Var(BC_pc) = σ²/6r ,

where the vector c*′_BC = (c′_{BC1}, c′_{BC2}, c′_{BC4}) consists of 24 elements, and

ABC_pc = (ABC_rep1 + ABC_rep2 + ABC_rep3) / 3 ,
SS_{ABC_pc} = (Σ_{i=1}^{r} c*′_ABC y*_i)² / 24r ,   Var(ABC_pc) = σ²/6r ,

where the vector c*′_ABC = (c′_{ABC1}, c′_{ABC2}, c′_{ABC3}) consists of 24 elements.

If an unconfounded design with 4r replications were used, then the variance of each of the effects A, B, C, AB, BC, AC and ABC would be σ*²/8r, where σ*² is the error variance on blocks of size 8. So the relative efficiency of a confounded effect in the partially confounded design with respect to that
of an unconfounded one in a comparable unconfounded design is

(6r/σ²) / (8r/σ*²) = (3/4) (σ*²/σ²) .   (7.179)
So the information on a partially confounded effect relative to an unconfounded effect is 3/4. If σ*² > 4σ²/3, then the partially confounded design gives more information than the unconfounded one.
The sum of squares due to blocks in this case of partial confounding is

SS_Block = SS_Block(wr) + SS_Block(r)

where the sum of squares due to blocks within replications (wr), with B_{1i} and B_{2i} the totals of the two blocks of four plots within the ith replicate and R_i the ith replicate total, is

SS_Block(wr) = Σ_{i=1}^{4r} [ (B²_{1i} + B²_{2i})/4 − R²_i/8 ]              (7.180)

which carries 4r degrees of freedom, and the sum of squares due to replications is

SS_Block(r) = (1/2³) Σ_{i=1}^{4r} R²_i − Y².../(32r)                        (7.181)

which carries (4r − 1) degrees of freedom. The total sum of squares is

SS_Total(pc) = Σ_i Σ_j Σ_k y²_ijk − Y².../(32r).                            (7.182)
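These block sums of squares can be sketched numerically; the observations below are simulated, so the only assumption carried over from the text is the layout of 4r replicates, each split into two blocks of four plots:

```python
import random

random.seed(0)
r = 2
# y[i][j][k]: replicate i (4r of them), block j (2 per replicate), plot k (4 per block)
y = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(2)] for _ in range(4 * r)]
B = [[sum(block) for block in rep] for rep in y]          # block totals B_1i, B_2i
R = [sum(bi) for bi in B]                                 # replicate totals R_i
G = sum(R)                                                # grand total Y...
ss_block_wr = sum((b1**2 + b2**2) / 4 - ri**2 / 8 for (b1, b2), ri in zip(B, R))  # (7.180)
ss_block_r = sum(ri**2 for ri in R) / 8 - G**2 / (32 * r)                         # (7.181)
ss_block = ss_block_wr + ss_block_r
# the two parts add up to the overall between-block sum of squares
assert abs(ss_block - (sum(b1**2 + b2**2 for b1, b2 in B) / 4 - G**2 / (32 * r))) < 1e-9
```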
The analysis of variance table in this case is given in Table 7.40. The test of hypothesis can be carried out in the usual way, as in the case of a factorial experiment.

Table 7.40. Analysis of variance in 2^3 factorial under partial confounding as in Example 7.13

Source                     SS              df        MS
Replicates                 SS_Block(r)     4r − 1    MS_Block(r)
Blocks within replicates   SS_Block(wr)    4r        MS_Block(wr)
Factor A                   SS_A            1         MS_A
Factor B                   SS_B            1         MS_B
Factor C                   SS_C            1         MS_C
AB                         SS_AB(pc)       1         MS_AB(pc)
AC                         SS_AC(pc)       1         MS_AC(pc)
BC                         SS_BC(pc)       1         MS_BC(pc)
ABC                        SS_ABC(pc)      1         MS_ABC(pc)
Error                      by subtraction  24r − 7   MS_E(pc)
Total                      SS_Total(pc)    32r − 1
7.13 Fractional Replications
When the number of factors in a factorial experiment increases, the number of experimental units (plots) needed to run the complete factorial experiment also increases. For example, a 2^4 factorial experiment needs 16 plots, a 2^5 factorial experiment needs 32 plots, a 2^6 factorial experiment needs 64 plots, and so on. Regarding the degrees of freedom, the 2^6 factorial experiment carries 63 degrees of freedom: 6 go with main effects, 15 with two-factor interactions, and the remaining 42 with three-factor or higher-order interactions. If the higher-order interactions are not of much importance and can be ignored, then information on main effects and lower-order interactions can be obtained from only a fraction of the complete factorial experiment. Such experiments are called fractional factorial experiments. They are most useful when there are several variables and the process under study is expected to be governed primarily by some of the main effects and lower-order interactions. A fractional factorial experiment is usually used instead of a full factorial experiment for economic reasons. With fractional factorials, it is possible to combine the runs of two or more fractions to assemble sequentially a larger experiment and estimate the factor and interaction effects of interest. We demonstrate this with a one-half fraction of a 2^3 factorial experiment.
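The run and degrees-of-freedom counts quoted above follow directly from binomial coefficients; a quick sketch:

```python
from math import comb

k = 6                                   # number of factors in a 2^6 experiment
runs = 2 ** k                           # plots needed for the full factorial
total_df = runs - 1                     # 63 degrees of freedom
main_df = comb(k, 1)                    # 6 main effects
two_factor_df = comb(k, 2)              # 15 two-factor interactions
higher_df = total_df - main_df - two_factor_df
assert (runs, main_df, two_factor_df, higher_df) == (64, 6, 15, 42)
```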
One Half Fraction of Factorial Experiment with Two Levels
Consider the setup of a 2^3 factorial experiment with three factors, each at two levels. There are 8 treatment combinations in total, so 8 plots are needed to run the complete factorial experiment.

Suppose the experimenter cannot afford to run all eight treatment combinations and decides to have only four runs, i.e., a 1/2 fraction of the 2^3 factorial experiment. Such an experiment contains a one-half fraction of a 2^3 experiment and is called a 2^{3−1} factorial experiment. Similarly, a 1/2^2 fraction of a 2^3 factorial experiment requires only 2 runs and is called a 2^{3−2} factorial experiment. In general, a 1/2^p fraction of a 2^k factorial experiment requires only 2^{k−p} runs and is denoted as a 2^{k−p} factorial experiment.

For illustration, we consider the 1/2 fraction of the 2^3 factorial experiment. The question now is how to choose four out of the eight treatment combinations. To decide this, we first choose an interaction which the experimenter feels can be ignored, say ABC. Now we create the table of treatment combinations as in Table 7.41. The arrangement of treatment combinations in Table 7.41 is obtained as follows.
Table 7.41. Arrangement of treatment combinations for one-half fraction of 2^3 factorial experiment

Treatment
combinations   I   A   B   C   AB  AC  BC  ABC
a              +   +   −   −   −   −   +   +
b              +   −   +   −   −   +   −   +
c              +   −   −   +   +   −   −   +
abc            +   +   +   +   +   +   +   +
.............................................
ab             +   +   +   −   +   −   −   −
ac             +   +   −   +   −   +   −   −
bc             +   −   +   +   −   −   +   −
(1)            +   −   −   −   +   +   +   −
• Write down the factor to be ignored, which is ABC in our case. In terms of treatment combinations,
ABC = (a + b + c + abc)− (ab + ac + bc + (1)).
• Collect the treatment combinations with plus (+) and minus (−) signs together; divide the eight treatment combinations into two groups with respect to the + and − signs. This is done in the last column, corresponding to ABC, in Table 7.41.

• Write down the symbols + or − of the other factors A, B, C, AB, AC and BC corresponding to (a, b, c, abc) and (ab, ac, bc, (1)).
This yields the arrangement in Table 7.41. The treatment combinations corresponding to the + signs in the ABC column and those corresponding to the − signs constitute two one-half fractions of the 2^3 factorial experiment. One of the one-half fractions contains the treatment combinations a, b, c and abc; the other contains ab, ac, bc and (1). The two one-half fractions are separated by a dotted line in Table 7.41.
The factor used to generate the two one-half fractions is called the generator. For example, ABC is the generator of the fraction in the present case.
The identity column I always contains all + signs, so I = ABC is called the defining relation of this fractional factorial experiment. The defining relation for a fractional factorial is the set of all columns that are equal to the identity column I.
The number of degrees of freedom associated with a one-half fraction of the 2^3 factorial experiment, i.e., the 2^{3−1} factorial experiment, is 3, which is essentially used to estimate the main effects.
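The selection of the principal fraction by the sign of ABC can be sketched in a few lines; with levels coded −1/+1, the runs kept are exactly those with a·b·c = +1:

```python
from itertools import product

# One-half fraction of a 2^3 design generated by ABC: keep the runs
# where the ABC contrast (the product of the coded levels) equals +1.
full = list(product([-1, 1], repeat=3))
fraction = [(a, b, c) for a, b, c in full if a * b * c == 1]

def label(run):
    # conventional treatment-combination label, '(1)' for all factors low
    return ''.join(f for f, lev in zip('abc', run) if lev == 1) or '(1)'

assert sorted(label(t) for t in fraction) == ['a', 'abc', 'b', 'c']
```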
Now consider the one-half fraction containing the treatment combinations a, b, c and abc (corresponding to + signs in the column of ABC).
The factors A, B, C, AB, AC and BC are now estimated from this block as follows:

A = a − b − c + abc ,                                                       (7.183)
B = −a + b − c + abc ,                                                      (7.184)
C = −a − b + c + abc ,                                                      (7.185)
AB = −a − b + c + abc ,                                                     (7.186)
AC = −a + b − c + abc ,                                                     (7.187)
BC = a − b − c + abc .                                                      (7.188)
We notice that the estimate of A in (7.183) is the same as the estimate of BC in (7.188). So it is not possible to distinguish whether A or BC is being estimated, and in this sense A = BC. Similarly, the estimates of B in (7.184) and of AC in (7.187), as well as the estimates of C in (7.185) and of AB in (7.186), coincide, so that B = AC and C = AB: one cannot tell whether B or AC is being estimated, nor whether C or AB is. Two or more effects with this property are called aliases. Thus
• A and BC are aliases,
• B and AC are aliases and
• C and AB are aliases.
In fact, when we estimate A, B and C in the 2^{3−1} factorial experiment, we are essentially estimating A + BC, B + AC and C + AB, respectively, of the complete 2^3 factorial experiment. To see this, consider the setup of the complete 2^3 factorial experiment, in which A and BC are estimated by

A = −(1) + a − b + ab − c + ac − bc + abc ,                                 (7.189)
BC = (1) + a − b − ab − c − ac + bc + abc .                                 (7.190)

Adding (7.189) and (7.190) and ignoring the common multiplier, we have

A + BC = a − b − c + abc                                                    (7.191)

which is the same as (7.183) or (7.188). Similarly, considering the estimates of B, C, AB and AC in the 2^3 factorial experiment and ignoring the common multiplier in (7.194) and (7.197), we have
which is same as (7.183) or (7.188). Similarly, considering the estimates ofB, C, AB and AC in 23 factorial experiment and ignoring the commonmultiplier in (7.194) and (7.197), we have
B = −(1)− a + b + ab− c− ac + bc + abc , (7.192)AC = (1)− a + b− ab + ac− bc + abc , (7.193)
B + AC = −a + b− c + abc , (7.194)
7.13 Fractional Replications 319
which is same as (7.184) or (7.187) and
C = −(1)− a− b− ab + c + ac + bc + abc , (7.195)AB = (1)− a− b + ab + c− ac− bc + abc , (7.196)
C + AB = −a− b− c + abc , (7.197)
which is the same as (7.185) or (7.186).

The alias structure can be determined by using the defining relation: multiplying any column (or effect) by the defining relation yields the aliases for that column (or effect). For example, the defining relation here is I = ABC, and multiplying each factor into both sides of I = ABC (setting any squared letter equal to I) yields

A × I = (A) × (ABC) = A²BC = BC ,
B × I = (B) × (ABC) = AB²C = AC ,
C × I = (C) × (ABC) = ABC² = AB .

The systematic rule for finding aliases is to write down all the effects of a 2^{3−1} = 2^2 factorial in standard order and multiply each factor by the defining contrast.
Now suppose we choose the other one-half fraction, i.e., the treatment combinations with − signs in the ABC column of Table 7.41. This is called the alternate or complementary one-half fraction. In this case,

A = ab + ac − bc − (1) ,                                                    (7.198)
B = ab − ac + bc − (1) ,                                                    (7.199)
C = −ab + ac + bc − (1) ,                                                   (7.200)
AB = ab − ac − bc + (1) ,                                                   (7.201)
AC = −ab + ac − bc + (1) ,                                                  (7.202)
BC = −ab − ac + bc + (1) .                                                  (7.203)
In this case, we notice that A = −BC, B = −AC and C = −AB, so the same pairs of factors are aliases as in the one-half fraction with + sign in ABC. Considering the setup of the complete 2^3 factorial experiment and using (7.189) and (7.190), we observe that A − BC is the same as (7.198) or (7.203) (ignoring the common multiplier). So what we estimate in the one-half fraction with − sign in ABC is the same as estimating A − BC from a complete 2^3 factorial experiment. Similarly, using (7.192) and (7.193), we see that B − AC is the same as (7.199) or (7.202); and using (7.195) and (7.196), we see that C − AB is the same as (7.200) or (7.201) (ignoring the common multiplier).

In practice, it does not matter which fraction is actually used: both one-half fractions belong to the same family of the 2^3 factorial experiment. Moreover, the difference in sign between the aliases of the two halves becomes irrelevant when the sums of squares are obtained in the analysis of variance, since the contrasts are squared.
Further, suppose we want a 1/2^2 fraction of the 2^3 factorial experiment with one more defining relation, say I = BC, along with I = ABC. The one-half fraction with + signs of ABC can then be divided into two halves, each containing two treatments, corresponding to

• the + sign of BC (viz., a and abc) and

• the − sign of BC (viz., b and c).

These two halves constitute one-fourth fractions of the 2^3 factorial experiment. Similarly, we can consider the other one-half fraction, corresponding to the − sign of ABC. Looking for the + and − signs corresponding to I = BC gives the two one-half fractions consisting of the treatments

• (1), bc and

• ab, ac.

These again constitute one-fourth fractions of the 2^3 factorial experiment.
To develop a better understanding of fractional factorials, we consider the setup of a 2^6 factorial experiment and construct the one-half fraction using I = ABCDEF as the defining relation. First we write all the factors of the 2^{6−1} = 2^5 factorial experiment in standard order and multiply each of them by the defining relation. This is illustrated in Table 7.42.
Table 7.42. One-half fraction of 2^6 factorial experiment using I = ABCDEF as defining relation

I = ABCDEF     D = ABCEF     E = ABCDF     DE = ABCF
A = BCDEF      AD = BCEF     AE = BCDF     ADE = BCF
B = ACDEF      BD = ACEF     BE = ACDF     BDE = ACF
AB = CDEF      ABD = CEF     ABE = CDF     ABDE = CF
C = ABDEF      CD = ABEF     CE = ABDF     CDE = ABF
AC = BDEF      ACD = BEF     ACE = BDF     ACDE = BF
BC = ADEF      BCD = AEF     BCE = ADF     BCDE = AF
ABC = DEF      ABCD = EF     ABCE = DF     ABCDE = F
In this case, we observe that
— all the main effects have 5 factor interactions as aliases,
— all the 2 factor interactions have 4 factor interactions as aliases and
— all the 3 factor interactions have 3 factor interactions as aliases.
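The whole of Table 7.42 can be generated mechanically with the same letter-cancellation rule used for aliases; a sketch:

```python
from itertools import combinations

def effect_product(e1, e2):
    # letters occurring in both effects cancel (A*A = I)
    s = set(e1.replace('I', '')) ^ set(e2.replace('I', ''))
    return ''.join(sorted(s)) or 'I'

# All 32 effects of the 2^5 factorial (here grouped by order), multiplied
# by the defining relation I = ABCDEF, reproduce the alias pairs of Table 7.42.
effects = ['I'] + [''.join(c) for k in range(1, 6) for c in combinations('ABCDE', k)]
alias = {e: effect_product(e, 'ABCDEF') for e in effects}
assert len(effects) == 32
assert alias['A'] == 'BCDEF' and alias['ABC'] == 'DEF' and alias['ABCDE'] == 'F'
```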
Suppose a design with blocks of size 16 is adopted. There are 32 treatments in the half replicate, with abcdef chosen as the defining contrast, and all 32 treatments are to be divided and allocated into two blocks of size 16 each. This is equivalent to saying that one factorial effect (and its alias) is confounded with blocks. Suppose we decide that the three-factor interactions and their aliases (which are also three-factor interactions in this case) are to be used as error. So we choose one of the three-factor interactions, say ABC (and its alias DEF), to be confounded. One of the blocks then contains all the treatment combinations having an even number of the letters a, b, c. These blocks are constructed in Table 7.43. There are altogether 31 degrees of freedom, of which 6 are carried by the main effects, 15 by the two-factor interactions and 9 by the error (from the three-factor interactions). Additionally, one more division of the degrees of freedom arises here due to blocks: the blocks carry 1 degree of freedom. That is why the error degrees of freedom are 9 (and not 10), because one degree of freedom goes to the block.
Table 7.43. One-half replicate of 2^6 factorial experiment in blocks of size 16

Block 1   Block 2
(1)       ad
de        ae
df        af
ef        bd
ab        be
ac        bf
bc        cd
abde      ce
abdf      cf
abef      adef
acde      bdef
acdf      cdef
acef      abcd
bcde      abce
bcdf      abcf
bcef      abcdef
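The same blocks can be produced programmatically: keep the treatments with an even number of letters (the half replicate defined by abcdef), then split on the parity of the letters a, b, c (the ABC contrast). A sketch:

```python
from itertools import combinations

# Half replicate of 2^6 defined by abcdef, split into two blocks of 16
# by confounding ABC (and its alias DEF) with blocks.
treatments = [''.join(c) for k in range(7) for c in combinations('abcdef', k)]
half = [t for t in treatments if len(t) % 2 == 0]    # even number of letters
block1 = [t for t in half if sum(ch in 'abc' for ch in t) % 2 == 0]
block2 = [t for t in half if sum(ch in 'abc' for ch in t) % 2 == 1]
assert len(half) == 32 and len(block1) == 16 and len(block2) == 16
assert '' in block1 and 'abcdef' in block2   # '' plays the role of (1)
```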
Suppose we want blocks of size 8 in the same setup. This can be achieved by a 1/2^2 replicate of the 2^6 factorial experiment. In terms of the confounding setup, this is equivalent to saying that two factorial effects are to be confounded. Suppose we choose ABD (and its alias CEF) in addition to ABC (and its alias DEF). When we confound two effects, their generalized interaction also gets confounded. So the interaction ABC × ABD = A²B²CD = CD (or DEF × CEF = CDE²F² = CD) and its alias ABEF also get confounded. Note that a two-factor interaction gets confounded here, which is not a good strategy in general. A good strategy in such cases, where an important factor would get confounded, is to choose the least important two-factor interaction. The blocks arising with this plan are described in Table 7.44. They are derived by dividing each block of Table 7.43 into halves, which contain, respectively, an even and an odd number of the letters c and d. The total degrees of freedom in this case are 31, which are divided as follows:
– the blocks carry 3 degrees of freedom,
– the main effects carry 6 degrees of freedom,
– the two factor interactions carry 14 degrees of freedom and
– the error carries 8 degrees of freedom.
Table 7.44. One-fourth replicate of 2^6 factorial experiment in blocks of size 8

Block 1   Block 2   Block 3   Block 4
(1)       de        ae        ad
ef        df        af        bd
ab        ac        be        ce
abef      bc        bf        cf
acde      abde      cd        abce
acdf      abdf      abcd      abcf
bcde      acef      cdef      adef
bcdf      bcef      abcdef    bdef
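Extending the previous construction, the quarter replicate partitions the half replicate on two parities at once (letters a, b, c for ABC and a, b, d for ABD); their product gives the CD parity mentioned in the text. A sketch:

```python
from itertools import combinations

# 1/4 replicate blocking of the half replicate of 2^6: classify each
# treatment by the parities of its letters in {a,b,c} (ABC) and {a,b,d} (ABD).
treatments = [''.join(c) for k in range(7) for c in combinations('abcdef', k)]
half = [t for t in treatments if len(t) % 2 == 0]
blocks = {}
for t in half:
    key = (sum(ch in 'abc' for ch in t) % 2, sum(ch in 'abd' for ch in t) % 2)
    blocks.setdefault(key, []).append(t)
assert sorted(len(b) for b in blocks.values()) == [8, 8, 8, 8]
principal = blocks[(0, 0)]             # the block containing (1), coded here as ''
assert '' in principal and 'acde' in principal
```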
The analysis of variance in case of fractional factorial experiments isconducted in the usual way as in the case of any factorial experiment. Thesums of squares for blocks, main effects and two factor interactions arecomputed in the usual way.
Remark: For further examples and other multifactor designs we refer tothe overview given by Hinkelmann and Kempthorne (2005), Draper andPukelsheim (1996) and Johnson and Leone (1964).
7.14 Exercises and Questions
7.14.1 What advantages does a two–factorial experiment (A, B) have, compared to two one–factor experiments (A) and (B)?

7.14.2 Name the score function for parameter estimation in a two–factorial model with interaction. Name the parameter estimates of the overall mean and of the two main effects.

7.14.3 Fill in the degrees of freedom and the F–statistics (A in a levels, B in b levels, r replicates) in the two–factorial design with fixed effects:

             SS          df    MS    F
Factor A     SS_A
Factor B     SS_B
A×B          SS_{A×B}
Error        SS_Error
Total        SS_Total

7.14.4 At least how many replicates r are needed in order to be able to show interaction?

7.14.5 What is meant by a saturated model and what is meant by the independence model?

7.14.6 How are the following test results to be interpreted (i.e., which model corresponds to the two–factorial design with fixed effects)?

(a) F_A *,  F_B *,  F_{A×B} *
(b) F_A *,  F_B *,  F_{A×B}
(c) F_A,    F_B,    F_{A×B} *
(d) F_A *,  F_B,    F_{A×B} *
(e) F_A,    F_B *,  F_{A×B}

7.14.7 Of what rank is the design matrix X in the two–factorial model (A: a levels, B: b levels, r replicates)?

7.14.8 Let a = b = 2 and r = 1. Describe the two–factorial model with interaction in effect coding.

7.14.9 Of what form is the covariance matrix of the OLS estimate in the two–factorial model with fixed effects in effect coding, i.e., V(µ̂, α̂, β̂, (αβ)ˆ) = σ² · ? In what way do the parameter estimates µ̂, α̂, and β̂ change if F_{A×B} is not significant? How does the estimate σ̂² change? In what way do the confidence intervals for α and β, and the test statistics F_A and F_B, change? Is the test more conservative than in the model with significant interaction?

7.14.10 Carry out the following test in the two–factorial model with fixed effects and define the final model:
             SS     df    MS    F
SS_A        130      1
SS_B        630      2
SS_{A×B}     40      2
SS_Error    150     18
SS_Total            23

7.14.11 Assume the two–factorial experiment with fixed effects to be designed as a randomized block design. Specify the model. In what way do the parameter estimates and the SS's for the other parameters or effects change, compared to the model without block effects? Name the SS_Error. What meaning does a significant block effect have?

7.14.12 Analyze the following two–factorial experiment with a = b = 2 and r = 2 replicates (randomized design, no block design):

           B1        B2       Total
A1       17, 18     4, 6
           35        10         45
A2        6, 4     15, 10
           10        25         35
Total      45        35         80

C = Y²··· / N ,
SS_Total = Σ Σ Σ y²_ijk − C ,
SS_A = (1/br) Σ_i Y²_i·· − C ,
SS_B = ? ,
SS_Subtotal = 1/2 (35² + 10² + 10² + 25²) − C ,
SS_{A×B} = SS_Subtotal − SS_A − SS_B ,
SS_Error = ? .

7.14.13 Name the assumptions for µ, α_i, β_j, and (αβ)_ij in the two–factorial model with random effects. Complete the following:

– Var(y_ijk) = ? .
– E[(α, β, αβ, ε)(α, β, αβ, ε)′] = ? .
– Solve the following system:

  MS_A = brσ²_α + rσ²_{αβ} + σ² ,
  MS_B = arσ²_β + rσ²_{αβ} + σ² ,
  MS_{A×B} = rσ²_{αβ} + σ² ,
  MS_Error = σ² .

– Compute the test statistics F_{A×B} = ?, F_A = ?, F_B = ?.

– Name the test statistics if F_{A×B} is not significant.

7.14.14 The covariance matrix in the mixed two–factorial model (A fixed, B random) has a compound symmetric structure, i.e., Σ = ? Therefore, we have a generalized linear regression model. According to which method are the estimates of the fixed effects obtained? The test statistics in the model with the interactions correlated over the A–levels are

F_{A×B} = MS_{A×B} / MS_Error ,   F_A = MS_A / ? ,   F_B = MS_B / ? ,

and in the model with independent interactions

F_B = MS_B / ? .

7.14.15 Name the test statistics for the three–factorial (A×B×C)–design with fixed effects:

F_Effect = ?   (Effect, e.g., A, B, C, A×B, A×B×C).

7.14.16 The following table is used in the 2^2 design with fixed effects and r replications:

       (1)    a     b    ab
A      −1    +1    −1   +1
B      −1    −1    +1   +1
AB     +1    −1    −1   +1

Here (1) is the total response for (0, 0) (A low, B low), (a) for (1, 0), (b) for (0, 1) and (ab) for (1, 1). Hence, the vector of the total response values is Y = ((1), a, b, ab)′. Compute the average effects A, B, and AB in the following 2^2 design:

              Replications      Total
               1        2      response
(0, 0)        85       93
(1, 0)        46       40
(0, 1)       103      115
(1, 1)       140      154

How are SS_A, SS_B, and SS_{A×B} computed? (Hint: use the contrasts.)

7.14.17 The data below constitute a one-half replicate of a 2^5 factorial experiment on the insulation properties of a new product. The 5 factors being investigated are:

A: density of the material,
B: addition of a specific ingredient,
C: moisture content,
D: structure of the material, and
E: age.

Each factor was held at 2 levels for the initial experiment. The data below represent the differential of temperature arising from one fixed application of heat, in coded units. Test whether any of the main effects are significant.

(1) = 11    ac = 11     acd = 18      abce = 14
cde = 15    d = 19      ce = 17       ab = 17
ae = 14     abd = 19    bcd = 18      ade = 14
bc = 20     be = 21     abcde = 16    bde = 20

7.14.18 In a pilot experiment on heat loss of insulation material, 4 factors (A, B, C, D) were considered, each at 2 levels. Only 4 experiments could be carried out at a single session. Two replicates were desired. The coded data given below are so arranged that the first replicate has as confounding interactions ABC, ACD and BD, while the second replicate has as confounding interactions BCD, ABD and AC. Construct an appropriate analysis of variance table and indicate which effects and interactions you consider significant.

Replicate 1
Block      1            2            3            4
      (1) = 6      a = 5        b = 8        d = 6
      bcd = 17     abcd = 15    cd = 10      bc = 7
      ac = 11      c = 7        abc = 17     acd = 4
      abd = 12     bd = 11      ad = 8       ab = 7

Replicate 2
Block      1            2            3            4
      (1) = 3      b = 9        c = 9        a = 6
      bd = 12      d = 6        bcd = 14     abd = 6
      acd = 11     abcd = 12    ad = 7       cd = 5
      abc = 17     ac = 12      ab = 12      bc = 13

7.14.19 Suppose 3 factors (all parameters) are to be studied, each at 2 levels. In carrying out the experiment, it is necessary to run it in 2 blocks of 4. Two replicates are planned. Set up the formulas for the sums of squares and degrees of freedom for each effect, if the first replicate has blocks confounded with ABC, and the second has blocks confounded with BC.

7.14.20 Construct a design for a 1/4 replicate of a 2^7 experiment in 4 blocks of 8 treatments. Use ABCDE and CDEFG as 2 of the defining contrasts.

7.14.21 Determine the elements in the principal block of a 1/2^3 replicate of a 2^7 experiment with ABCDE and ABFG as 2 of the defining contrasts.
8 Models for Categorical Response Variables
8.1 Generalized Linear Models
8.1.1 Extension of the Regression Model
Generalized linear models (GLMs) are a generalization of the classical linear models of regression analysis and analysis of variance, which model the relationship between the expectation of a response variable and unknown predictor variables according to

E(y_i) = x_{i1}β_1 + . . . + x_{ip}β_p = x'_i β .                           (8.1)
The parameters are estimated according to the principle of least squares and are optimal according to the minimum dispersion theory or, in the case of a normal distribution, according to the ML theory (cf. Chapter 3).

Assuming an additive random error ε_i, the density function can be written as

f(y_i) = f_{ε_i}(y_i − x'_i β) ,                                            (8.2)

where η_i = x'_i β is the linear predictor. Hence, for continuous normally distributed data, we have the following distribution and mean structure:

y_i ∼ N(µ_i, σ²),   E(y_i) = µ_i ,   µ_i = η_i = x'_i β .                   (8.3)
© Springer Science+Business Media, LLC 2009
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_8
In analyzing categorical response variables, three major distributions may arise: the binomial, multinomial, and Poisson distributions, which belong to the natural exponential family (along with the normal distribution).

In analogy to the normal distribution, the effect of covariates on the expectation of the response variables may be modeled by linear predictors for these distributions as well.
Binomial Distribution
Assume that I predictors η_i = x'_i β (i = 1, . . . , I) and N_i realizations y_{ij} (j = 1, . . . , N_i), respectively, are given and, furthermore, assume that the response has a binomial distribution

y_i ∼ B(N_i, π_i)   with   E(y_i) = N_i π_i = µ_i .
Let g(π_i) = logit(π_i) be the chosen link function between µ_i and η_i:

logit(π_i) = ln( π_i / (1 − π_i) ) = ln( N_i π_i / (N_i − N_i π_i) ) = x'_i β .    (8.4)

With the inverse function g^{−1}(x'_i β) we then have

N_i π_i = µ_i = N_i exp(x'_i β) / (1 + exp(x'_i β)) = g^{−1}(η_i) .         (8.5)
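The logit link of (8.4) and its inverse in (8.5) can be sketched directly for a single proportion; the only thing checked here is the round-trip property g^{−1}(g(π)) = π:

```python
import math

def logit(p):
    # link function (8.4) for a single proportion
    return math.log(p / (1 - p))

def inv_logit(eta):
    # inverse link (8.5): maps the linear predictor back to a probability
    return math.exp(eta) / (1 + math.exp(eta))

for p in (0.05, 0.3, 0.5, 0.9):
    assert abs(inv_logit(logit(p)) - p) < 1e-12
```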
Poisson Distribution
Let y_i (i = 1, . . . , I) have a Poisson distribution with E(y_i) = µ_i:

P(y_i) = e^{−µ_i} µ_i^{y_i} / y_i!   for y_i = 0, 1, 2, . . . .             (8.6)
The link function can then be chosen as ln(µi) = x′iβ.
Contingency Tables
The cell frequencies y_{ij} of an (I × J)–contingency table of two categorical variables can have a Poisson, multinomial, or binomial distribution (depending on the sampling design). By choosing appropriate design vectors x_{ij}, the expected cell frequencies can be described by a loglinear model

ln(m_{ij}) = µ + α_i^A + β_j^B + (αβ)_{ij}^{AB} = x'_{ij} β                 (8.7)

and, hence, we have

µ_{ij} = m_{ij} = exp(x'_{ij} β) = exp(η_{ij}) .                            (8.8)
In contrast to the classical model of regression analysis, where E(y) is linear in the parameter vector β, so that µ = η = x'β holds, the generalized models are of the following form:

µ = g^{−1}(x'β) ,                                                           (8.9)

where g^{−1} is the inverse function of the link function. Furthermore, the additivity of the random error is no longer a necessary assumption, so that, in general,

f(y) = f(y, x'β)                                                            (8.10)

is assumed, instead of (8.2).
8.1.2 Structure of the Generalized Linear Model
The generalized linear model (GLM) (cf. Nelder and Wedderburn, 1972) is defined as follows. A GLM consists of three components:

• the random component, which specifies the probability distribution of the response variable;

• the systematic component, which specifies a linear function of the explanatory variables; and

• the link function, which describes a functional relationship between the systematic component and the expectation of the random component.

The three components are specified as follows:

1. The random component Y consists of N independent observations y' = (y_1, y_2, . . . , y_N) of a distribution belonging to the natural exponential family (cf. Agresti (2007)). Hence, each observation y_i has, in the simplest case of a one–parametric exponential family, the following probability density function:

f(y_i, θ_i) = a(θ_i) b(y_i) exp(y_i Q(θ_i)) .                               (8.11)

Remark. The parameter θ_i can vary over i = 1, 2, . . . , N, depending on the value of the explanatory variable, which influences y_i through the systematic component.

Special distributions of particular importance in this family are the Poisson and the binomial distribution. Q(θ_i) is called the natural parameter of the distribution. Likewise, if the y_i are independent, the joint distribution is a member of the exponential family.
A more general parametrization allows inclusion of scaling or nuisance variables. For example, an alternative parametrization with an additional scaling parameter φ (the so–called dispersion parameter) is given by

f(y_i | θ_i, φ) = exp[ (y_i θ_i − b(θ_i))/a(φ) + c(y_i, φ) ] ,              (8.12)

where θ_i is called the natural parameter. If φ is known, (8.12) represents a linear exponential family. If, on the other hand, φ is unknown, then (8.12) is called an exponential dispersion model. With φ and θ_i, (8.12) is a two–parametric distribution for i = 1, . . . , N, which, for instance, is used for normal or gamma distributions. Introducing y_i and θ_i as vector–valued parameters rather than scalars leads to multivariate generalized models, which include multinomial response models as a special case (cf. Fahrmeir and Tutz, 2001, Chapter 3).

2. The systematic component relates a vector η = (η_1, η_2, . . . , η_N)' to a set of explanatory variables through a linear model

η = Xβ .                                                                    (8.13)

Here η is called the linear predictor, X : N × p is the matrix of observations on the explanatory variables, and β is the (p × 1)–vector of parameters.
3. The link function connects the systematic component with the expectation of the random component. Let µ_i = E(y_i); then µ_i is linked to η_i by η_i = g(µ_i), where g is a monotonic and differentiable function:

g(µ_i) = Σ_{j=1}^p β_j x_{ij} ,   i = 1, 2, . . . , N .                     (8.14)

Special cases:

(i) g(µ) = µ is called the identity link. We get η_i = µ_i.

(ii) g(µ) = Q(θ_i) is called the canonical (natural) link. We have Q(θ_i) = Σ_{j=1}^p β_j x_{ij}.
Properties of the Density Function (8.12)
Let

l_i = l(θ_i, φ; y_i) = ln f(y_i; θ_i, φ)                                    (8.15)

be the contribution of the ith observation y_i to the loglikelihood. Then

l_i = [y_i θ_i − b(θ_i)]/a(φ) + c(y_i; φ)                                   (8.16)

holds and we get the following derivatives with respect to θ_i:

∂l_i/∂θ_i = [y_i − b'(θ_i)]/a(φ) ,                                          (8.17)

∂²l_i/∂θ_i² = −b''(θ_i)/a(φ) ,                                              (8.18)
where b'(θ_i) = ∂b(θ_i)/∂θ_i and b''(θ_i) = ∂²b(θ_i)/∂θ_i² are the first and second derivatives of the function b(θ_i), assumed to be known. By equating (8.17) to zero, it becomes obvious that the solution of the likelihood equations is independent of a(φ). Since our interest lies with the estimation of θ and β in η = x'β, we could assume a(φ) = 1 without any loss of generality (this corresponds to assuming σ² = 1 in the case of a normal distribution). For the present, however, we retain a(φ).
Under certain assumptions of regularity, the order of integration and differentiation may be interchanged, so that

E(∂l_i/∂θ_i) = 0 ,                                                          (8.19)

−E(∂²l_i/∂θ_i²) = E(∂l_i/∂θ_i)² .                                           (8.20)
Hence, we have, from (8.17) and (8.19),
E(yi) = µi = b′(θi) . (8.21)
Similarly, from (8.18) and (8.20), we find

b''(θ_i)/a(φ) = E( [y_i − b'(θ_i)]² / a²(φ) ) = var(y_i)/a²(φ) ,            (8.22)

since E[y_i − b'(θ_i)] = 0 and, hence,

V(µ_i) = var(y_i) = b''(θ_i) a(φ) .                                         (8.23)
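Relations (8.21) and (8.23) can be illustrated for the Poisson family, where θ = ln µ, b(θ) = exp(θ) and a(φ) = 1, so that mean and variance both equal b'(θ) = b''(θ) = exp(θ). A numerical sketch using finite differences:

```python
import math

# Poisson family: b(theta) = exp(theta), a(phi) = 1, so (8.21) gives
# E(y) = b'(theta) and (8.23) gives var(y) = b''(theta); both equal exp(theta).
theta, h = 0.7, 1e-4
b = math.exp
b1 = (b(theta + h) - b(theta - h)) / (2 * h)               # numerical b'(theta)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # numerical b''(theta)
mu = math.exp(theta)
assert abs(b1 - mu) < 1e-6
assert abs(b2 - mu) < 1e-4
```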
Under the assumption that the y_i (i = 1, . . . , N) are independent, the loglikelihood of y' = (y_1, . . . , y_N) equals the sum of the l_i(θ_i, φ; y_i). Let

θ' = (θ_1, . . . , θ_N),   µ' = (µ_1, . . . , µ_N),   X = (x_1, . . . , x_N)'

and

η = (η_1, . . . , η_N)' = Xβ .

We then have, from (8.21),

µ = ∂b(θ)/∂θ = ( ∂b(θ_1)/∂θ_1, . . . , ∂b(θ_N)/∂θ_N )' ,                    (8.24)

and, in analogy to (8.23), for the covariance matrix of y' = (y_1, . . . , y_N),

cov(y) = V(µ) = ∂²b(θ)/∂θ∂θ' = a(φ) diag(b''(θ_1), . . . , b''(θ_N)) .      (8.25)

These relations hold in general, as we show in the following discussion.
These relations hold in general, as we show in the following discussion.
8.1.3 Score Function and Information Matrix
The likelihood of the random sample is the product of the density functions

L(θ, φ; y) = Π_{i=1}^N f(y_i; θ_i, φ) .                                     (8.26)

The loglikelihood ln L(θ, φ; y) for the sample y of independent y_i (i = 1, . . . , N) is of the form

l = l(θ, φ; y) = Σ_{i=1}^N l_i = Σ_{i=1}^N [ (y_i θ_i − b(θ_i))/a(φ) + c(y_i; φ) ] .    (8.27)
The vector of first derivatives of l with respect to θ is needed for determining the ML estimates. This vector is called the score function. For now, we neglect the parametrization with φ in the representation of l and L and thus get the score function as

s(θ; y) = (∂/∂θ) l(θ; y) = (1/L(θ; y)) (∂/∂θ) L(θ; y) .                     (8.28)

Let

∂²l/∂θ∂θ' = ( ∂²l/∂θ_i∂θ_j )_{i,j = 1, . . . , N}

be the matrix of the second derivatives of the loglikelihood. Then

F_{(N)}(θ) = E( −∂²l(θ; y)/∂θ∂θ' )                                          (8.29)

is called the expected Fisher information matrix of the sample y' = (y_1, . . . , y_N), where the expectation is taken with respect to the density

f(y_1, . . . , y_N | θ) = Π f(y_i | θ_i) = L(θ; y) .
In the case of regular likelihood functions (where regular means that the exchange of integration and differentiation is possible), to which the exponential families belong, we have

E(s(θ; y)) = 0                                                              (8.30)

and

F_{(N)}(θ) = E(s(θ; y) s'(θ; y)) = cov(s(θ; y)) .                           (8.31)

Relation (8.30) follows from

∫ f(y_1, . . . , y_N | θ) dy_1 · · · dy_N = ∫ L(θ; y) dy = 1 ,              (8.32)
by differentiating with respect to θ, using (8.28):

∫ (∂L(θ; y)/∂θ) dy = ∫ (∂l(θ; y)/∂θ) L(θ; y) dy = E(s(θ; y)) = 0 .          (8.33)

Differentiating (8.33) with respect to θ', we get

0 = ∫ (∂²l(θ; y)/∂θ∂θ') L(θ; y) dy + ∫ (∂l(θ; y)/∂θ)(∂l(θ; y)/∂θ') L(θ; y) dy
  = −F_{(N)}(θ) + E(s(θ; y) s'(θ; y)) ,

and hence (8.31), because E(s(θ; y)) = 0.
8.1.4 Maximum Likelihood Estimation
Let η_i = x'_i β = Σ_{j=1}^p x_{ij} β_j be the predictor of the ith observation of the response variable (i = 1, . . . , N) or, in matrix representation,

η = (η_1, . . . , η_N)' = (x'_1 β, . . . , x'_N β)' = Xβ .                  (8.34)

Assume that the predictors are linked to E(y) = µ by a monotonic differentiable function g(·):

g(µ_i) = η_i   (i = 1, . . . , N) ,                                         (8.35)

or, in matrix representation,

g(µ) = (g(µ_1), . . . , g(µ_N))' = η .                                      (8.36)
The parameters θ_i and β are then linked by relation (8.21), that is, µ_i = b'(θ_i), with g(µ_i) = x'_i β. Hence we have θ_i = θ_i(β). Since we are interested only in estimating β, we write the loglikelihood (8.27) as a function of β:

l(β) = Σ_{i=1}^N l_i(β) .                                                   (8.37)
We can find the derivatives ∂l_i(β)/∂β_j according to the chain rule

∂l_i(β)/∂β_j = (∂l_i/∂θ_i)(∂θ_i/∂µ_i)(∂µ_i/∂η_i)(∂η_i/∂β_j) .               (8.38)
The partial results are as follows:

∂l_i/∂θ_i = [y_i − b'(θ_i)]/a(φ)   [cf. (8.17)]
          = [y_i − µ_i]/a(φ)       [cf. (8.21)],                            (8.39)

µ_i = b'(θ_i) ,   ∂µ_i/∂θ_i = b''(θ_i) = var(y_i)/a(φ)   [cf. (8.23)],      (8.40)

∂η_i/∂β_j = ∂(Σ_{k=1}^p x_{ik} β_k)/∂β_j = x_{ij} .                         (8.41)
Because ηi = g(µi), the derivative ∂µi/∂ηi is dependent on the link functiong(·), or rather its inverse g−1(·). Hence, it cannot be specified until the linkis defined.
Summarizing, we now have
\frac{\partial l_i}{\partial\beta_j} = \frac{(y_i - \mu_i)\,x_{ij}}{\mathrm{var}(y_i)}\,\frac{\partial\mu_i}{\partial\eta_i} , \qquad j = 1, \ldots, p ,    (8.42)
using the rule
\frac{\partial\theta_i}{\partial\mu_i} = \left(\frac{\partial\mu_i}{\partial\theta_i}\right)^{-1}
for inverse functions (\mu_i = b'(\theta_i), \theta_i = (b')^{-1}(\mu_i)). The likelihood equations for finding the components \beta_j are now
\sum_{i=1}^{N} \frac{(y_i - \mu_i)\,x_{ij}}{\mathrm{var}(y_i)}\,\frac{\partial\mu_i}{\partial\eta_i} = 0 , \qquad j = 1, \ldots, p .    (8.43)
The loglikelihood is nonlinear in \beta; hence the solution of (8.43) requires iterative methods. For the second derivatives with respect to the components of \beta we have, in analogy to (8.20), with (8.42),
E\left(\frac{\partial^2 l_i}{\partial\beta_j\,\partial\beta_h}\right) = -E\left(\frac{\partial l_i}{\partial\beta_j}\,\frac{\partial l_i}{\partial\beta_h}\right) = -E\left[\frac{(y_i-\mu_i)(y_i-\mu_i)\,x_{ij}x_{ih}}{(\mathrm{var}(y_i))^2}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2\right] = -\frac{x_{ij}x_{ih}}{\mathrm{var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2 ,    (8.44)
and, hence,
E\left(-\frac{\partial^2 l(\beta)}{\partial\beta_j\,\partial\beta_h}\right) = \sum_{i=1}^{N} \frac{x_{ij}x_{ih}}{\mathrm{var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2    (8.45)
and, in matrix representation for all (j, h) combinations,
F_{(N)}(\beta) = E\left(-\frac{\partial^2 l(\beta)}{\partial\beta\,\partial\beta'}\right) = X'WX    (8.46)
with
W = \mathrm{diag}(w_1, \ldots, w_N)    (8.47)
and the weights
w_i = \frac{(\partial\mu_i/\partial\eta_i)^2}{\mathrm{var}(y_i)} .    (8.48)
Fisher–Scoring Algorithm
For the iterative determination of the ML estimate of \beta, the method of iteratively reweighted least squares is used. Let \beta^{(k)} be the kth approximation of the ML estimate \hat\beta. Furthermore, let q^{(k)} be the vector of first derivatives \partial l(\beta)/\partial\beta (cf. (8.42)) evaluated at \beta^{(k)}, and define W^{(k)} analogously. The formula of the Fisher–scoring algorithm is then
(X'W^{(k)}X)\,\beta^{(k+1)} = (X'W^{(k)}X)\,\beta^{(k)} + q^{(k)} .    (8.49)
The vector on the right-hand side of (8.49) has the components (cf. (8.45) and (8.42))
\sum_{h}\left[\sum_{i} \frac{x_{ij}x_{ih}}{\mathrm{var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2 \beta_h^{(k)}\right] + \sum_{i} \frac{(y_i-\mu_i^{(k)})\,x_{ij}}{\mathrm{var}(y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)    (8.50)
(j = 1, \ldots, p). The entire vector (8.50) can now be written as
X'W^{(k)}z^{(k)} ,    (8.51)
where the (N \times 1) vector z^{(k)} has ith element
z_i^{(k)} = \sum_{j=1}^{p} x_{ij}\beta_j^{(k)} + (y_i - \mu_i^{(k)})\left(\frac{\partial\eta_i^{(k)}}{\partial\mu_i^{(k)}}\right) = \eta_i^{(k)} + (y_i - \mu_i^{(k)})\left(\frac{\partial\eta_i^{(k)}}{\partial\mu_i^{(k)}}\right) .    (8.52)
Hence, the equation of the Fisher–scoring algorithm (8.49) can now be written as
(X'W^{(k)}X)\,\beta^{(k+1)} = X'W^{(k)} z^{(k)} .    (8.53)
This is the likelihood equation of a generalized linear model with the response vector z^{(k)} and the random error covariance matrix (W^{(k)})^{-1}. If \mathrm{rank}(X) = p holds, we obtain the ML estimate \hat\beta as the limit of
\beta^{(k+1)} = (X'W^{(k)}X)^{-1} X'W^{(k)} z^{(k)}    (8.54)
for k \to \infty, with the asymptotic covariance matrix
V(\hat\beta) = (X'WX)^{-1} = F_{(N)}^{-1}(\beta) ,    (8.55)
where W is evaluated at \hat\beta. Once a solution is found, \hat\beta is consistent for \beta, asymptotically normal, and asymptotically efficient (see Fahrmeir and Kaufmann (1985) and Wedderburn (1976) for existence and uniqueness of the solutions). Hence we have \hat\beta \overset{\mathrm{a.s.}}{\sim} N(\beta,\, V(\hat\beta)).
Remark. In the case of a canonical link function, that is, for g(\mu_i) = \theta_i, the ML equations simplify, and the Fisher–scoring algorithm is identical to the Newton–Raphson algorithm (cf. Agresti (2007)). If the values a(\phi) are identical for all observations, then the ML equations are
\sum_{i} x_{ij}\, y_i = \sum_{i} x_{ij}\, \mu_i .    (8.56)
If, on the other hand, a(\phi) = a_i(\phi) = a_i\phi (i = 1, \ldots, N) holds, then the ML equations are
\sum_{i} \frac{x_{ij}\, y_i}{a_i} = \sum_{i} \frac{x_{ij}\, \mu_i}{a_i} .    (8.57)
As starting values for the Fisher–scoring algorithm, the estimates \beta^{(0)} = (X'X)^{-1}X'y or \beta^{(0)} = (X'X)^{-1}X'g(y) may be used.
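The scheme (8.49)–(8.54) can be sketched in a few lines of Python for the simplest case, the two-parameter logit model of Section 8.3, where the canonical link gives the weights w_i = \mu_i(1-\mu_i) and the working response (8.52). This is a minimal illustration, not a production implementation; the function name and data are assumptions made here for the sketch:

```python
import math

def fisher_scoring_logit(x, y, iterations=30):
    """Fisher scoring / IRLS sketch for the two-parameter logit model
    ln(pi/(1-pi)) = a + b*x with binary responses y, cf. (8.49)-(8.54).
    For the canonical logit link, w_i = mu_i(1 - mu_i) and the working
    response is z_i = eta_i + (y_i - mu_i)/w_i, as in (8.52)."""
    a, b = 0.0, 0.0                                   # crude start beta^(0)
    for _ in range(iterations):
        eta = [a + b * xi for xi in x]                # linear predictor
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [m * (1.0 - m) for m in mu]               # weights (8.48)
        z = [e + (yi - m) / wi                        # working response (8.52)
             for e, m, wi, yi in zip(eta, mu, w, y)]
        # solve the 2x2 weighted normal equations X'WX beta = X'Wz, X = [1, x]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s0 * s2 - s1 * s1
        a, b = (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det
    return a, b
```

With a single binary covariate the ML estimates have a closed form (the sample logit of group 0 and the sample log odds ratio), which provides a convenient check on the iteration.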
8.1.5 Testing of Hypotheses and Goodness of Fit
A generalized linear model g(\mu_i) = x_i'\beta is determined, besides the distributional assumptions, by the link function g(\cdot) and the explanatory variables X_1, \ldots, X_p, as well as by their number p, which determines the length of the parameter vector \beta to be estimated. Once g(\cdot) is chosen, the model is defined by the design matrix X.
Testing of Hypotheses
Let X1 and X2 be two design matrices (models), and assume that thehierarchical order X1 ⊂ X2 holds; that is, we have X2 = (X1, X3)with some matrix X3 and hence R(X1) ⊂ R(X2). Let β1, β2, and β3
be the corresponding parameter vectors to be estimated. Further, letg(µ1) = η1 = X1β1 and g(µ2) = η2 = X2β2 = X1β1 + X3β3, where β1
and β2 = (β′1, β′3)′ are the maximum–likelihood estimates under the two
models, and rank(X1) = r1, rank(X2) = r2, and (r2 − r1) = r = df . Thelikelihood ratio statistic, which compares a larger model X2 with a (smaller)submodel X1, is then defined as follows (where L is the likelihood function)
Λ =maxβ1 L(β1)maxβ2 L(β2)
. (8.58)
8.1 Generalized Linear Models 339
Wilks (1938) showed that −2 ln Λ has a limiting χ2df–distribution where
the degrees of freedom df equal the difference in the dimensions of thetwo models. Transforming (8.58) according to −2 ln Λ, with l denoting theloglikelihood, and inserting the maximum likelihood estimates gives
−2 ln Λ = −2[l(β1)− l(β2)] . (8.59)
In fact, one tests the hypotheses H0 : β3 = 0 against H1 : β3 6= 0. If H0
holds, then −2 ln Λ ∼ χ2r. Therefore, H0 is rejected if the loglikelihood is
significantly higher under the greater model using X2. According to Wilks,we write
G2 = −2 ln Λ .
Goodness of Fit
Let X be the design matrix of the saturated model, which contains as many parameters as observations. Denote by \tilde\theta the estimate of \theta belonging to the estimates \tilde\mu_i = y_i (i = 1, \ldots, N) in the saturated model. For every submodel X_j that is not saturated, we then have (assuming again that a(\phi) = a_i(\phi) = a_i\phi)
G^2(X_j \mid X) = \frac{2}{\phi} \sum \frac{1}{a_i}\,\bigl[\,y_i(\tilde\theta_i - \hat\theta_i) - b(\tilde\theta_i) + b(\hat\theta_i)\,\bigr] = \frac{D(y;\,\hat\mu_j)}{\phi}    (8.60)
as a measure for the loss in goodness of fit of the model X_j compared to the perfect fit achieved by the saturated model. The statistic D(y;\,\hat\mu_j) is called the deviance of the model X_j. We then have
G^2(X_1 \mid X_2) = G^2(X_1 \mid X) - G^2(X_2 \mid X) = \frac{D(y;\,\hat\mu_1) - D(y;\,\hat\mu_2)}{\phi} .    (8.61)
That is, the test statistic for comparing the model X_1 with the larger model X_2 equals the difference of the goodness-of-fit statistics of the two models, weighted by 1/\phi.
8.1.6 Overdispersion
In samples from a Poisson or multinomial distribution, it may occur that the elements show a larger variance than that implied by the distribution. This may be due to a violation of the independence assumption, for example a positive correlation among the sample elements. A frequent cause is a cluster structure in the sample. Examples are:
• the behavior of families of insects under the influence of insecticides (Agresti, 2007), where the family (cluster, batch) shows a collective (correlated) survivorship (many survive or most of them die) rather than independent survivorship, due to dependence on cluster-specific covariates such as the temperature;
• the survivorship of dental implants when two or more implants are incorporated for each patient;
• the development of diseases, or the social behavior of the members of a family; and
• unobserved heterogeneity, caused, for example, by important covariates for the linear predictor not having been measured.
The existence of larger variation (inhomogeneity) in the sample than under the assumed sampling model is called overdispersion. In the simplest case, overdispersion is modeled by multiplying the variance by a constant \phi > 1, where \phi is either known (e.g., \phi = \sigma^2 for a normal distribution) or has to be estimated from the sample (Fahrmeir and Tutz, 2001).
Example (McCullagh and Nelder, 1989, p. 125): Let N individuals be divided into N/k clusters of equal cluster size k. Assume that the individual response is binary with P(Y_i = 1) = \pi_i, so that the total response
Y = Z_1 + Z_2 + \cdots + Z_{N/k}
equals the sum of independent B(k;\,\pi_i)-distributed binomial variables Z_i (i = 1, \ldots, N/k). The \pi_i vary across the clusters, and we assume that E(\pi_i) = \pi and \mathrm{var}(\pi_i) = \tau^2\pi(1-\pi) with 0 \le \tau^2 \le 1. We then have
E(Y) = N\pi , \qquad \mathrm{var}(Y) = N\pi(1-\pi)\,[\,1 + (k-1)\tau^2\,] = \phi\, N\pi(1-\pi) .    (8.62)
The dispersion parameter \phi = 1 + (k-1)\tau^2 depends on the cluster size k and on the variability of the \pi_i, but not on the sample size N. This fact is essential for interpreting the variable Y as the sum of the binomial variables Z_i and for estimating the dispersion parameter \phi from the residuals. Because of 0 \le \tau^2 \le 1, we have
1 \le \phi \le k \le N .    (8.63)
Relationship (8.62) means that
\frac{\mathrm{var}(Y)}{N\pi(1-\pi)} = 1 + (k-1)\tau^2 = \phi    (8.64)
is constant. An alternative model, the beta-binomial distribution, has the property that the quotient in (8.64), i.e., \phi, is a linear function of the sample size N. By plotting the residuals against N, it is easy to recognize which of the two models is more plausible. Rosner (1984) used the beta-binomial distribution for estimation in clusters of size k = 2.
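The variance inflation in (8.62) can be checked exactly when the cluster probability \pi_i follows a simple two-point distribution. The Python sketch below (all numerical values are hypothetical choices for illustration) applies the law of total variance to one cluster Z \sim B(k, \pi):

```python
# Exact check of var(Y) = phi * N * pi * (1 - pi) from (8.62) when the
# cluster probability pi takes the two values p - d and p + d with
# probability 1/2 each, so that E(pi) = p and var(pi) = d^2.
p, d, k, N = 0.3, 0.1, 4, 100          # hypothetical values; N/k clusters
tau2 = d * d / (p * (1 - p))           # var(pi) = tau2 * p * (1 - p)
phi = 1 + (k - 1) * tau2               # dispersion parameter of (8.62)

# Law of total variance for one cluster Z ~ B(k, pi):
# var(Z) = E[var(Z | pi)] + var(E[Z | pi]) = E[k pi (1 - pi)] + var(k pi)
mean_cond_var = 0.5 * (k * (p - d) * (1 - p + d) + k * (p + d) * (1 - p - d))
var_cond_mean = k * k * d * d
var_Y = (N // k) * (mean_cond_var + var_cond_mean)   # N/k independent clusters

assert abs(var_Y - phi * N * p * (1 - p)) < 1e-9
```

The assertion confirms the algebra behind (8.62): summing N/k independent clusters gives exactly \phi N\pi(1-\pi), with \phi depending on k and \tau^2 but not on N.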
8.1.7 Quasi Loglikelihood
The generalized models assume a distribution from the natural exponential family for the data as the random component (cf. (8.11)). If this assumption does not hold, an alternative approach can be used to specify the functional relationship between the mean and the variance. For exponential families, relationship (8.23) between variance and expectation holds. Assume the general approach
\mathrm{var}(Y) = \phi\, V(\mu) ,    (8.65)
where V(\cdot) is an appropriately chosen function. In the quasi-likelihood approach (Wedderburn, 1974), only assumptions about the first and second moments of the random variables are made; the distribution itself need not be specified. The starting point for estimating the influence of covariates is the score function (8.28), or rather the system of ML equations (8.43). If the general specification (8.65) is inserted into (8.43), we get the system of estimating equations for \beta:
\sum_{i=1}^{N} \frac{(y_i - \mu_i)}{V(\mu_i)}\, x_{ij}\, \frac{\partial\mu_i}{\partial\eta_i} = 0 \qquad (j = 1, \ldots, p) ,    (8.66)
which has the same form as the likelihood equations (8.43) for GLMs. However, system (8.66) is an ML equation system only if the y_i have a distribution from the natural exponential family.
In the case of independent responses, the influence of the covariates X on the mean response E(y) = \mu is modeled, following McCullagh and Nelder (1989, p. 324), as follows. Assume that for the response vector we have
y \sim (\mu,\ \phi V(\mu)) ,    (8.67)
where \phi > 0 is an unknown dispersion parameter and V(\mu) is a matrix of known functions. The expression \phi V(\mu) is called the working variance.
If the components of y are assumed to be independent, the covariance matrix \phi V(\mu) has to be diagonal, that is,
V(\mu) = \mathrm{diag}(V_1(\mu), \ldots, V_N(\mu)) .    (8.68)
Here it is realistic to assume that the variance of each random variable y_i depends only on the ith component \mu_i of \mu, so that
V(\mu) = \mathrm{diag}(V_1(\mu_1), \ldots, V_N(\mu_N)) .    (8.69)
A dependency on all components of \mu according to (8.68) is difficult to interpret in practice if independence of the y_i is demanded as well. (Nevertheless, situations as in (8.68) are possible.) In many applications it is reasonable to assume, in addition to the functional independence (8.69), that the V_i functions are identical, so that
V(\mu) = \mathrm{diag}(v(\mu_1), \ldots, v(\mu_N))    (8.70)
holds, with V_i = v(\cdot). Under the above assumptions, the following function of a component y_i of y,
U = u(\mu_i, y_i) = \frac{y_i - \mu_i}{\phi\, v(\mu_i)} ,    (8.71)
has the properties
E(U) = 0 ,    (8.72)
\mathrm{var}(U) = \frac{1}{\phi\, v(\mu_i)} ,    (8.73)
\frac{\partial U}{\partial\mu_i} = \frac{-\phi v(\mu_i) - (y_i - \mu_i)\,\phi\,\partial v(\mu_i)/\partial\mu_i}{\phi^2 v^2(\mu_i)} , \qquad -E\left(\frac{\partial U}{\partial\mu_i}\right) = \frac{1}{\phi\, v(\mu_i)} .    (8.74)
Hence U has the same properties as the derivative of a loglikelihood, which, of course, is the score function (8.28). Property (8.72) corresponds to (8.30), whereas property (8.74), in combination with (8.73), corresponds to (8.31). Therefore,
Q(\mu;\, y) = \sum_{i=1}^{N} Q_i(\mu_i;\, y_i)    (8.75)
with
Q_i(\mu_i;\, y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi\, v(t)}\, dt    (8.76)
(cf. McCullagh and Nelder, 1989, p. 325) is the analog of the loglikelihood function; Q(\mu; y) is called the quasi loglikelihood. Hence, the quasi-score function, obtained by differentiating Q(\mu; y), equals
U(\beta) = \phi^{-1} D' V^{-1} (y - \mu) ,    (8.77)
with D = (\partial\mu_i/\partial\beta_j) (i = 1, \ldots, N, j = 1, \ldots, p) and V = \mathrm{diag}(v_1, \ldots, v_N). The quasi-likelihood estimate \hat\beta is the solution of U(\beta) = 0. It has the asymptotic covariance matrix
\mathrm{cov}(\hat\beta) = \phi\,(D'V^{-1}D)^{-1} .    (8.78)
The dispersion parameter \phi is estimated by
\hat\phi = \frac{1}{N-p} \sum \frac{(y_i - \hat\mu_i)^2}{v(\hat\mu_i)} = \frac{X^2}{N-p} ,    (8.79)
where X^2 is the so-called Pearson statistic. In the case of overdispersion (or assumed overdispersion), the influence of the covariates (i.e., of the vector \beta) should be estimated by the quasi-likelihood approach (8.66) rather than by a likelihood approach.
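The moment estimate (8.79) is straightforward to compute once fitted means and a variance function are available. The following Python sketch uses a Poisson-type variance function v(\mu) = \mu and hypothetical fitted values purely for illustration:

```python
def dispersion_estimate(y, mu, v, p):
    """Moment estimate (8.79) of the dispersion parameter:
    phi_hat = X^2 / (N - p), where X^2 = sum (y_i - mu_i)^2 / v(mu_i)
    is Pearson's statistic and p is the number of fitted parameters."""
    x2 = sum((yi - mi) ** 2 / v(mi) for yi, mi in zip(y, mu))
    return x2 / (len(y) - p)

# hypothetical fitted values under a Poisson-type variance function v(mu) = mu
y = [2.0, 4.0, 6.0]
mu = [3.0, 3.0, 6.0]
phi_hat = dispersion_estimate(y, mu, v=lambda m: m, p=1)
# here X^2 = 1/3 + 1/3 + 0 = 2/3, so phi_hat = (2/3)/(3 - 1) = 1/3
```

A value of \hat\phi clearly above 1 would indicate overdispersion relative to the working variance function.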
8.2 Contingency Tables
8.2.1 Overview
This section deals with contingency tables and the appropriate models. We first consider so-called two-way contingency tables. In general, a bivariate relationship is described by the joint distribution of the two associated random variables. The two marginal distributions are obtained by integrating (summing) the joint distribution over the respective variables. Likewise, the conditional distributions can be derived from the joint distribution.
Definition 8.1 (Contingency Table). Let X and Y denote two categorical variables, X with I levels and Y with J levels. When we observe subjects on the variables X and Y, there are I \times J possible combinations of classifications. The outcomes (X, Y) of a sample of size n are displayed in an I \times J (contingency) table; (X, Y) are realizations of the joint two-dimensional distribution
P(X = i,\, Y = j) = \pi_{ij} .    (8.80)
The set \{\pi_{ij}\} forms the joint distribution of X and Y. The marginal distributions are obtained by summing over rows or columns:
                        Y                          Marginal
               1        2       ...      J         distribution of X
      1     \pi_{11}  \pi_{12}  ...  \pi_{1J}      \pi_{1+}
      2     \pi_{21}  \pi_{22}  ...  \pi_{2J}      \pi_{2+}
  X   ...     ...       ...     ...    ...           ...
      I     \pi_{I1}  \pi_{I2}  ...  \pi_{IJ}      \pi_{I+}
  Marginal
  distribution of Y:  \pi_{+1}  \pi_{+2}  ...  \pi_{+J}

with
\pi_{+j} = \sum_{i=1}^{I} \pi_{ij} , \quad j = 1, \ldots, J ,
\pi_{i+} = \sum_{j=1}^{J} \pi_{ij} , \quad i = 1, \ldots, I ,
\sum_{i=1}^{I} \pi_{i+} = \sum_{j=1}^{J} \pi_{+j} = 1 .
In many contingency tables the explanatory variable X is fixed, and only the response Y is a random variable. In such cases, the main interest is not the joint distribution but the conditional distribution: \pi_{j|i} = P(Y = j \mid X = i) is the conditional probability, and \pi_{1|i}, \pi_{2|i}, \ldots, \pi_{J|i}, with \sum_{j=1}^{J} \pi_{j|i} = 1, is the conditional distribution of Y given X = i. A general aim of many studies is the comparison of the conditional distributions of Y at various levels i of X.
Suppose that X as well as Y are random response variables, so that the joint distribution describes the association of the two variables. Then, for the conditional distribution of Y given X, we have
\pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}} \quad \forall\, i, j .    (8.81)
Definition 8.2. Two variables are called independent if
πij = πi+π+j ∀i, j. (8.82)
If X and Y are independent, we obtain
\pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}} = \frac{\pi_{i+}\pi_{+j}}{\pi_{i+}} = \pi_{+j} .    (8.83)
The conditional distribution is then equal to the marginal distribution and thus is independent of i.
Let p_{ij} denote the sample joint distribution. With n_{ij} the cell frequencies and n = \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}, the sample quantities have the following properties:
p_{ij} = \frac{n_{ij}}{n} ,
p_{j|i} = \frac{p_{ij}}{p_{i+}} = \frac{n_{ij}}{n_{i+}} , \qquad p_{i|j} = \frac{p_{ij}}{p_{+j}} = \frac{n_{ij}}{n_{+j}} ,
p_{i+} = \frac{\sum_{j=1}^{J} n_{ij}}{n} , \qquad p_{+j} = \frac{\sum_{i=1}^{I} n_{ij}}{n} ,
n_{i+} = \sum_{j=1}^{J} n_{ij} = n\,p_{i+} , \qquad n_{+j} = \sum_{i=1}^{I} n_{ij} = n\,p_{+j} .    (8.84)
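The sample quantities in (8.84) can be computed mechanically from a table of counts. A short Python sketch with a hypothetical 2 x 3 table of cell frequencies:

```python
# Sample versions (8.84) for a hypothetical 2 x 3 table of counts n_ij.
nij = [[10, 20, 30],
       [20, 10, 10]]
I, J = len(nij), len(nij[0])
n = sum(sum(row) for row in nij)

n_i = [sum(row) for row in nij]                              # row totals n_{i+}
n_j = [sum(nij[i][j] for i in range(I)) for j in range(J)]   # column totals n_{+j}

p = [[nij[i][j] / n for j in range(J)] for i in range(I)]    # p_ij
p_i = [ni / n for ni in n_i]                                 # p_{i+}
p_j = [nj / n for nj in n_j]                                 # p_{+j}
p_cond = [[nij[i][j] / n_i[i] for j in range(J)]             # p_{j|i} = n_ij/n_{i+}
          for i in range(I)]
```

The marginals p_{i+} and p_{+j} each sum to 1, and every conditional row distribution p_{j|i} sums to 1, mirroring the constraints stated above for the population quantities.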
8.2.2 Ways of Comparing Proportions
Suppose that Y is a binary response variable (Y takes only the values 0 and 1), and let the outcomes of X be grouped. For fixed row i, \pi_{1|i} is the probability of response (Y = 1) and \pi_{2|i} is the probability of nonresponse (Y = 0). The conditional distribution of the binary response variable Y, given X = i, then is
(\pi_{1|i},\ \pi_{2|i}) = (\pi_{1|i},\ 1 - \pi_{1|i}) .    (8.85)
We can now compare two rows, say i and h, by calculating the difference in proportions for response and nonresponse, respectively:
response: \pi_{1|h} - \pi_{1|i} ,
nonresponse: \pi_{2|h} - \pi_{2|i} = (1 - \pi_{1|h}) - (1 - \pi_{1|i}) = -(\pi_{1|h} - \pi_{1|i}) .
The differences have different signs, but their absolute values are identical. Additionally, we have
-1.0 \le \pi_{1|h} - \pi_{1|i} \le 1.0 .    (8.86)
The difference equals zero if the conditional distributions of the two rows i and h coincide. From this, one may conjecture that the response variable Y is independent of the row classification when
\pi_{1|h} - \pi_{1|i} = 0 \quad \forall (h, i),\ i, h = 1, 2, \ldots, I ,\ i \neq h .    (8.87)
In a more general setting, with the response variable Y having J categories, the variables X and Y are independent if
\pi_{j|h} - \pi_{j|i} = 0 \quad \forall j ,\ \forall (h, i),\ i, h = 1, 2, \ldots, I ,\ i \neq h .    (8.88)
Definition 8.3 (Relative Risk). Let Y denote a binary response variable. The ratio \pi_{1|h}/\pi_{1|i} is called the relative risk for response of category h in relation to category i.
For 2 \times 2 tables the relative risk (for response) is
0 \le \frac{\pi_{1|1}}{\pi_{1|2}} < \infty .    (8.89)
The relative risk is a nonnegative real number; a relative risk of 1 corresponds to independence. For nonresponse, the relative risk is
\frac{\pi_{2|1}}{\pi_{2|2}} = \frac{1 - \pi_{1|1}}{1 - \pi_{1|2}} .    (8.90)
Definition 8.4 (Odds). The odds are defined as the ratio of the probability of response to the probability of nonresponse within one category of X.
For 2 \times 2 tables, the odds in row 1 equal
\Omega_1 = \frac{\pi_{1|1}}{\pi_{2|1}} .    (8.91)
Within row 2, the corresponding odds equal
\Omega_2 = \frac{\pi_{1|2}}{\pi_{2|2}} .    (8.92)
Hint. For the joint distribution of two binary variables, the definition is
\Omega_i = \frac{\pi_{i1}}{\pi_{i2}} , \quad i = 1, 2 .    (8.93)
In general, \Omega_i is nonnegative. When \Omega_i > 1, response is more likely than nonresponse. If, for instance, \Omega_1 = 4, then response in the first row is four times as likely as nonresponse. The within-row conditional distributions coincide when \Omega_1 = \Omega_2, and this is equivalent to independence of the two variables:
X, Y independent ⇔ Ω1 = Ω2 . (8.94)
Definition 8.5 (Odds Ratio). The odds ratio is defined as
\theta = \frac{\Omega_1}{\Omega_2} .    (8.95)
From the definition of the odds using joint probabilities, we have
\theta = \frac{\pi_{11}\pi_{22}}{\pi_{12}\pi_{21}} .    (8.96)
Another name for \theta is the cross-product ratio. X and Y are independent when the odds ratio equals 1:
X, Y \text{ independent} \Leftrightarrow \theta = 1 .    (8.97)
When all cell probabilities are greater than 0 and 1 < \theta < \infty, response for the subjects in the first row is more likely than for the subjects in the second row, that is, \pi_{1|1} > \pi_{1|2}. For 0 < \theta < 1, we have \pi_{1|1} < \pi_{1|2} (with a reverse interpretation).
The sample version of the odds ratio for the 2 \times 2 table

              Y
           1        2
   X   1   n_{11}   n_{12}    n_{1+}
       2   n_{21}   n_{22}    n_{2+}
           n_{+1}   n_{+2}    n

is
\hat\theta = \frac{n_{11}\, n_{22}}{n_{12}\, n_{21}} .    (8.98)
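The sample odds (8.91)–(8.92) and the cross-product ratio (8.98) can be computed directly from the four cell counts. A minimal Python sketch with hypothetical counts:

```python
# Sample odds and odds ratio (8.98) for a hypothetical 2 x 2 table.
n11, n12 = 30, 10
n21, n22 = 15, 45

odds_row1 = n11 / n12                   # estimate of Omega_1 = pi_{1|1}/pi_{2|1}
odds_row2 = n21 / n22                   # estimate of Omega_2 = pi_{1|2}/pi_{2|2}
theta_hat = (n11 * n22) / (n12 * n21)   # cross-product ratio (8.98)
# theta_hat = odds_row1 / odds_row2 = (30 * 45) / (10 * 15) = 9.0
```

Here response in row 1 is nine times as likely, in odds terms, as in row 2; a value of 1 would indicate independence.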
Odds Ratios for I × J Tables
From any given I \times J table, 2 \times 2 tables can be constructed by picking two different rows and two different columns. There are I(I-1)/2 pairs of rows and J(J-1)/2 pairs of columns; hence an I \times J table contains IJ(I-1)(J-1)/4 such 2 \times 2 tables. The set of all 2 \times 2 tables contains much redundant information; therefore, we consider only neighboring 2 \times 2 tables with the local odds ratios
\theta_{ij} = \frac{\pi_{i,j}\,\pi_{i+1,j+1}}{\pi_{i,j+1}\,\pi_{i+1,j}} , \quad i = 1, 2, \ldots, I-1 ,\ j = 1, 2, \ldots, J-1 .    (8.99)
These (I-1)(J-1) local odds ratios determine all possible odds ratios formed from all pairs of rows and all pairs of columns.
8.2.3 Sampling in Two–Way Contingency Tables
Variables with nominal or ordinal scale are called categorical variables. In most cases, statistical methods assume a multinomial or a Poisson distribution for categorical variables. We now elaborate these two sampling models. Suppose that we observe counts n_i (i = 1, 2, \ldots, N) in the N cells of a contingency table with a single categorical variable, or in the N = I \times J cells of a two-way contingency table. We assume that the n_i are random variables with a distribution on \mathbb{R}_+ and expected values E(n_i) = m_i, which are called the expected frequencies.
Poisson Sample
The Poisson distribution is used for counts of events (such as response to a medical treatment) that occur randomly over time when outcomes in disjoint periods are independent. The Poisson distribution may be interpreted as the limit of the binomial distribution B(n;\,p) when \lambda = n \cdot p is held fixed as n increases. For each of the N cells of a contingency table, we have
P(n_i) = \frac{e^{-m_i}\, m_i^{n_i}}{n_i!} , \quad n_i = 0, 1, 2, \ldots ,\ i = 1, \ldots, N .    (8.100)
This is the probability mass function of the Poisson distribution with parameter m_i; it satisfies \mathrm{var}(n_i) = E(n_i) = m_i.
The Poisson model for the n_i assumes that the n_i are independent. The joint distribution of the counts then is the product of the Poisson distributions of the n_i in the N cells. The total sample size n = \sum_{i=1}^{N} n_i also has a Poisson distribution, with E(n) = \sum_{i=1}^{N} m_i (the rule for sums of independent Poisson random variables). The Poisson model is used if rare events are independently distributed over disjoint classes.
Now let n = \sum_{i=1}^{N} n_i be fixed. The conditional probability of a contingency table \{n_i\} that satisfies this condition is
P\bigl(n_i \text{ observations in cell } i,\ i = 1, \ldots, N \mid \textstyle\sum_{i=1}^{N} n_i = n\bigr)
= \frac{P(n_i \text{ observations in cell } i,\ i = 1, \ldots, N)}{P(\sum_{i=1}^{N} n_i = n)}
= \frac{\prod_{i=1}^{N} e^{-m_i}\, m_i^{n_i}/n_i!}{\exp\bigl(-\sum_{j=1}^{N} m_j\bigr)\,\bigl(\sum_{j=1}^{N} m_j\bigr)^{n}/n!}
= \left(\frac{n!}{\prod_{i=1}^{N} n_i!}\right) \prod_{i=1}^{N} \pi_i^{n_i} , \quad \text{with } \pi_i = \frac{m_i}{\sum_{i=1}^{N} m_i} .    (8.101)
For N = 2, this is the binomial distribution. For the multinomial distribution of (n_1, n_2, \ldots, n_N), the marginal distribution of each n_i is binomial, with E(n_i) = n\pi_i and \mathrm{var}(n_i) = n\pi_i(1-\pi_i).
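The identity (8.101) can be verified numerically for N = 2 cells: conditioning two independent Poisson counts on their total must reproduce the binomial probabilities with \pi_1 = m_1/(m_1 + m_2). The Poisson means and the fixed total in this Python sketch are hypothetical illustrative choices:

```python
import math

def poisson_pmf(k, m):
    """Poisson probability mass function (8.100)."""
    return math.exp(-m) * m ** k / math.factorial(k)

# Check (8.101) for N = 2 cells: conditionally on n_1 + n_2 = n, the
# Poisson counts follow a binomial (multinomial) law with pi_1 = m1/(m1+m2).
m1, m2, n = 2.0, 3.0, 5          # hypothetical Poisson means and fixed total
pi1 = m1 / (m1 + m2)
for n1 in range(n + 1):
    conditional = (poisson_pmf(n1, m1) * poisson_pmf(n - n1, m2)
                   / poisson_pmf(n, m1 + m2))
    binomial = math.comb(n, n1) * pi1 ** n1 * (1 - pi1) ** (n - n1)
    assert abs(conditional - binomial) < 1e-12
```

The loop checks every admissible table (n_1, n - n_1), confirming that the m_i enter the conditional distribution only through the proportions \pi_i.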
Independent Multinomial Sample
Suppose we observe a categorical variable Y at various levels of an explanatory variable X. In the cell (X = i, Y = j) we have n_{ij} observations. Suppose that n_{i+} = \sum_{j=1}^{J} n_{ij}, the number of observations of Y for fixed level i of X, is fixed in advance (and thus not random), and that the n_{i+} observations are independent with distribution (\pi_{1|i}, \pi_{2|i}, \ldots, \pi_{J|i}). Then the cell counts in row i have the multinomial distribution
\left(\frac{n_{i+}!}{\prod_{j=1}^{J} n_{ij}!}\right) \prod_{j=1}^{J} \pi_{j|i}^{n_{ij}} .    (8.102)
Furthermore, if the samples are independent for different i, then the joint distribution of the n_{ij} in the I \times J table is the product of the multinomial distributions (8.102). This is called product multinomial sampling or independent multinomial sampling.
8.2.4 Likelihood Function and Maximum Likelihood Estimates
For the observed cell counts \{n_i, i = 1, 2, \ldots, N\}, the likelihood function is defined as the probability of \{n_i, i = 1, 2, \ldots, N\} for a given sampling model. This function generally depends on an unknown parameter vector \theta, here, for instance, \theta = (\pi_{j|i}). The maximum likelihood estimate of this vector of parameters is the value for which the likelihood function of the observed data takes its maximum.
To illustrate, we now look at the estimates of the category probabilities \pi_i for multinomial sampling. The joint distribution of \{n_i\} is (cf. (8.102), with the notation \pi_i, i = 1, \ldots, N, N = I \cdot J, instead of \pi_{j|i})
\frac{n!}{\prod_{i=1}^{N} n_i!}\ \underbrace{\prod_{i=1}^{N} \pi_i^{n_i}}_{\text{kernel}} .    (8.103)
It is proportional to the so-called kernel of the likelihood function. The kernel contains all unknown parameters of the model. Hence, maximizing the likelihood is equivalent to maximizing the kernel of the loglikelihood function
\ln(\text{kernel}) = \sum_{i=1}^{N} n_i \ln(\pi_i) \to \max_{\pi_i} .    (8.104)
Under the condition \pi_i > 0 (i = 1, 2, \ldots, N) and \sum_{i=1}^{N} \pi_i = 1, we have \pi_N = 1 - \sum_{i=1}^{N-1} \pi_i and, hence,
\frac{\partial\pi_N}{\partial\pi_i} = -1 , \quad i = 1, 2, \ldots, N-1 ,    (8.105)
\frac{\partial\ln\pi_N}{\partial\pi_i} = \frac{1}{\pi_N}\cdot\frac{\partial\pi_N}{\partial\pi_i} = \frac{-1}{\pi_N} , \quad i = 1, 2, \ldots, N-1 ,    (8.106)
\frac{\partial L}{\partial\pi_i} = \frac{n_i}{\pi_i} - \frac{n_N}{\pi_N} = 0 , \quad i = 1, 2, \ldots, N-1 .    (8.107)
From (8.107) we get
\frac{\hat\pi_i}{\hat\pi_N} = \frac{n_i}{n_N} , \quad i = 1, 2, \ldots, N-1 ,    (8.108)
and thus
\hat\pi_i = \hat\pi_N\,\frac{n_i}{n_N} .    (8.109)
Using
\sum_{i=1}^{N} \hat\pi_i = 1 = \frac{\hat\pi_N \sum_{i=1}^{N} n_i}{n_N} ,    (8.110)
we obtain the solutions
\hat\pi_N = \frac{n_N}{n} = p_N ,    (8.111)
\hat\pi_i = \frac{n_i}{n} = p_i , \quad i = 1, 2, \ldots, N-1 .    (8.112)
The ML estimates are the proportions (relative frequencies) p_i. For contingency tables we have, for independent X and Y,
\pi_{ij} = \pi_{i+}\,\pi_{+j} .    (8.113)
The ML estimates under this condition are
\hat\pi_{ij} = p_{i+}\, p_{+j} = \frac{n_{i+}\, n_{+j}}{n^2}    (8.114)
with the expected cell frequencies
\hat m_{ij} = n\,\hat\pi_{ij} = \frac{n_{i+}\, n_{+j}}{n} .    (8.115)
Because of the similarity of the likelihood functions, the ML estimates for Poisson, multinomial, and product multinomial sampling are identical (as long as no further assumptions are made).
8.2.5 Testing the Goodness of Fit
A principal aim of the analysis of contingency tables is to test whether the observed and the expected cell frequencies (the latter specified by a model) coincide. For instance, Pearson's \chi^2 statistic compares the observed cell frequencies with the expected cell frequencies from (8.115) for independent X and Y.
Testing a Specified Multinomial Distribution (Theoretical Distribution)
We first want to compare a multinomial distribution, specified by the \pi_{i0}, with the observed distribution \{n_i\} over N classes.
The hypothesis for this problem is
H0 : πi = πi0 , i = 1, 2, . . . , N , (8.116)
whereas for the \pi_i we have the restriction
\sum_{i=1}^{N} \pi_i = 1 .    (8.117)
When H0 is true, the expected cell frequencies are
mi = nπi0 , i = 1, 2, . . . , N . (8.118)
The appropriate test statistic is Pearson's \chi^2:
\chi^2 = \sum_{i=1}^{N} \frac{(n_i - m_i)^2}{m_i}\ \overset{\text{approx.}}{\sim}\ \chi^2_{N-1} .    (8.119)
This can be justified as follows. Let p = (n_1/n, \ldots, n_{N-1}/n)' and \pi_0 = (\pi_{10}, \ldots, \pi_{N-1,0})'. By the central limit theorem we then have, for n \to \infty,
\sqrt{n}\,(p - \pi_0) \to N(0, \Sigma_0) ,    (8.120)
and so
n\,(p - \pi_0)'\,\Sigma_0^{-1}\,(p - \pi_0) \to \chi^2_{N-1} .    (8.121)
The asymptotic covariance matrix has the form
\Sigma_0 = \Sigma_0(\pi_0) = \mathrm{diag}(\pi_0) - \pi_0\pi_0' .    (8.122)
Its inverse can be written as
\Sigma_0^{-1} = \frac{1}{\pi_{N0}}\,\mathbf{1}\mathbf{1}' + \mathrm{diag}\left(\frac{1}{\pi_{10}}, \ldots, \frac{1}{\pi_{N-1,0}}\right) .    (8.123)
The equivalence of (8.119) and (8.121) is proved by direct calculation. To illustrate, we choose N = 3. Using the relationship \pi_1 + \pi_2 + \pi_3 = 1, we have
\Sigma_0 = \begin{pmatrix} \pi_1 & 0 \\ 0 & \pi_2 \end{pmatrix} - \begin{pmatrix} \pi_1^2 & \pi_1\pi_2 \\ \pi_1\pi_2 & \pi_2^2 \end{pmatrix} ,
\Sigma_0^{-1} = \begin{pmatrix} \pi_1(1-\pi_1) & -\pi_1\pi_2 \\ -\pi_1\pi_2 & \pi_2(1-\pi_2) \end{pmatrix}^{-1}
= \frac{1}{\pi_1\pi_2\pi_3} \begin{pmatrix} \pi_2(1-\pi_2) & \pi_1\pi_2 \\ \pi_1\pi_2 & \pi_1(1-\pi_1) \end{pmatrix}
= \begin{pmatrix} 1/\pi_1 + 1/\pi_3 & 1/\pi_3 \\ 1/\pi_3 & 1/\pi_2 + 1/\pi_3 \end{pmatrix} .
The left side of (8.121) now is
n\left(\frac{n_1}{n} - \frac{m_1}{n},\ \frac{n_2}{n} - \frac{m_2}{n}\right) \begin{pmatrix} \frac{n}{m_1} + \frac{n}{m_3} & \frac{n}{m_3} \\ \frac{n}{m_3} & \frac{n}{m_2} + \frac{n}{m_3} \end{pmatrix} \begin{pmatrix} \frac{n_1}{n} - \frac{m_1}{n} \\ \frac{n_2}{n} - \frac{m_2}{n} \end{pmatrix}
= \frac{(n_1 - m_1)^2}{m_1} + \frac{(n_2 - m_2)^2}{m_2} + \frac{1}{m_3}\,[(n_1 - m_1) + (n_2 - m_2)]^2
= \sum_{i=1}^{3} \frac{(n_i - m_i)^2}{m_i} .
Goodness of Fit for Estimated Expected Frequencies
When the unknown parameters are replaced by their ML estimates for a specified model, the test statistic is again approximately distributed as \chi^2, with the number of degrees of freedom reduced by the number of estimated parameters. The degrees of freedom are (N - 1) - t if t parameters are estimated.
Testing for Independence
In two-way contingency tables with multinomial sampling, the hypothesis H_0: X and Y are statistically independent is equivalent to H_0: \pi_{ij} = \pi_{i+}\pi_{+j}\ \forall i, j. The test statistic is Pearson's \chi^2 in the following form:
\chi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - m_{ij})^2}{m_{ij}} ,    (8.124)
where m_{ij} = n\pi_{ij} = n\pi_{i+}\pi_{+j} (the expected cell frequencies under H_0) are unknown.
Given the estimates \hat m_{ij} = n\,p_{i+}\,p_{+j}, the \chi^2 statistic then equals
\chi^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{(n_{ij} - \hat m_{ij})^2}{\hat m_{ij}}    (8.125)
with (I-1)(J-1) = (IJ - 1) - (I - 1) - (J - 1) degrees of freedom. The numbers (I-1) and (J-1) correspond to the (I-1) independent row proportions (\pi_{i+}) and the (J-1) independent column proportions (\pi_{+j}) estimated from the sample.
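The statistic (8.125) requires only the cell counts. The following Python sketch computes \hat m_{ij} from (8.115) and then \chi^2 for a hypothetical 2 x 3 table:

```python
# Pearson chi^2 test of independence (8.125) for a hypothetical 2 x 3 table.
nij = [[20, 30, 50],
       [40, 30, 30]]
I, J = len(nij), len(nij[0])
n = sum(sum(row) for row in nij)
n_i = [sum(row) for row in nij]
n_j = [sum(nij[i][j] for i in range(I)) for j in range(J)]

# estimated expected frequencies under H0 (8.115): m_ij = n_{i+} n_{+j} / n
m = [[n_i[i] * n_j[j] / n for j in range(J)] for i in range(I)]
chi2 = sum((nij[i][j] - m[i][j]) ** 2 / m[i][j]
           for i in range(I) for j in range(J))
# here chi2 = 35/3, to be compared with a chi^2 quantile on
# (I-1)(J-1) = 2 degrees of freedom
```

For these counts \chi^2 \approx 11.67 exceeds \chi^2_{2;\,0.95} = 5.99, so H_0 would be rejected at the 5% level.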
Likelihood–Ratio Test
The likelihood ratio test (LRT) is a general-purpose method for testing H_0 against H_1. The main idea is to compare \max_{H_0} L and \max_{H_1 \vee H_0} L over the corresponding parameter spaces \omega \subseteq \Omega. As test statistic, we have
\Lambda = \frac{\max_{\omega} L}{\max_{\Omega} L} \le 1 .    (8.126)
It follows that, for n \to \infty (Wilks, 1932),
G^2 = -2\ln\Lambda \to \chi^2_d    (8.127)
with d = \dim(\Omega) - \dim(\omega) degrees of freedom. For multinomial sampling in a contingency table, the kernel of the likelihood function is
K = \prod_{i=1}^{I}\prod_{j=1}^{J} \pi_{ij}^{n_{ij}} ,    (8.128)
with the constraints for the parameters
\pi_{ij} \ge 0 \quad \text{and} \quad \sum_{i=1}^{I}\sum_{j=1}^{J} \pi_{ij} = 1 .    (8.129)
Under the null hypothesis H_0: \pi_{ij} = \pi_{i+}\pi_{+j}, K is maximal for \hat\pi_{i+} = n_{i+}/n, \hat\pi_{+j} = n_{+j}/n, and hence \hat\pi_{ij} = n_{i+}n_{+j}/n^2. Under H_0 \vee H_1, K is maximal for \hat\pi_{ij} = n_{ij}/n. We then have
\Lambda = \frac{\prod_{i=1}^{I}\prod_{j=1}^{J} (n_{i+}\,n_{+j})^{n_{ij}}}{n^{n}\,\prod_{i=1}^{I}\prod_{j=1}^{J} n_{ij}^{n_{ij}}} .    (8.130)
It follows that Wilks's G^2 is given by
G^2 = -2\ln\Lambda = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\ln\left(\frac{n_{ij}}{\hat m_{ij}}\right) \sim \chi^2_{(I-1)(J-1)}
with \hat m_{ij} = n_{i+}n_{+j}/n (the estimate under H_0). If H_0 holds, \Lambda will be large, i.e., near 1, and G^2 will be small. This means
that H0 is to be rejected for large G2.
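Wilks's G^2 is computed from the same ingredients as Pearson's \chi^2. A Python sketch with a hypothetical 2 x 3 table of counts (all cells positive, so every logarithm is defined):

```python
import math

# Wilks's G^2 for the test of independence, cf. (8.130):
# G^2 = 2 * sum n_ij * ln(n_ij / m_ij) with m_ij = n_{i+} n_{+j} / n.
nij = [[20, 30, 50],
       [40, 30, 30]]
I, J = len(nij), len(nij[0])
n = sum(sum(row) for row in nij)
n_i = [sum(row) for row in nij]
n_j = [sum(nij[i][j] for i in range(I)) for j in range(J)]
m = [[n_i[i] * n_j[j] / n for j in range(J)] for i in range(I)]

g2 = 2 * sum(nij[i][j] * math.log(nij[i][j] / m[i][j])
             for i in range(I) for j in range(J))
# G^2 is referred to the chi^2 distribution with (I-1)(J-1) degrees of
# freedom; for well-fitting data G^2 and Pearson's chi^2 are typically close.
```

For these counts G^2 \approx 11.85, close to the Pearson value \chi^2 \approx 11.67 for the same table, as the asymptotic equivalence of the two statistics suggests.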
8.3 Generalized Linear Model for Binary Response
8.3.1 Logit Models and Logistic Regression
Let Y be a binary random variable, that is, Y has only two categories (for instance, success/failure or case/control). Hence the response variable Y can always be coded as (Y = 0, Y = 1). Y_i has a Bernoulli distribution with P(Y_i = 1) = \pi_i = \pi_i(x_i) and P(Y_i = 0) = 1 - \pi_i, where x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})' denotes a vector of prognostic factors that we believe influence the success probability \pi(x_i), and i = 1, \ldots, N indexes individuals as usual. With these assumptions it immediately follows that
E(Y_i) = 1\cdot\pi_i + 0\cdot(1-\pi_i) = \pi_i ,
E(Y_i^2) = 1^2\cdot\pi_i + 0^2\cdot(1-\pi_i) = \pi_i ,
\mathrm{var}(Y_i) = E(Y_i^2) - (E(Y_i))^2 = \pi_i - \pi_i^2 = \pi_i(1-\pi_i) .
The likelihood contribution of individual i is further given by
f(y_i;\,\pi_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i} = (1-\pi_i)\left(\frac{\pi_i}{1-\pi_i}\right)^{y_i} = (1-\pi_i)\exp\left(y_i \ln\left(\frac{\pi_i}{1-\pi_i}\right)\right) .
The natural parameter Q(\pi_i) = \ln[\pi_i/(1-\pi_i)] is the log odds of response 1 and is called the logit of \pi_i.
A GLM with the logit link is called a logit model or logistic regression model. On an individual basis, the model is given by
\ln\left(\frac{\pi_i}{1-\pi_i}\right) = x_i'\beta .    (8.131)
This parametrization guarantees a monotonic course (S-curve) of the probability \pi_i over its range [0, 1] as a function of the linear predictor x_i'\beta:
\pi_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} .    (8.132)
Grouped Data
If possible (e.g., if the prognostic factors are themselves categorical), patients can be grouped along the strata defined by the possible factor combinations. Let n_j, j = 1, \ldots, G, G \le N, be the number of patients falling in stratum j. We then observe y_j patients with response Y = 1 and n_j - y_j patients with response Y = 0, and a natural estimate for \pi_j is \hat\pi_j = y_j/n_j. This corresponds to a saturated model, that is, a model in which the main effects and all interactions between the factors are included.
                      Loss
  j   Age group    Yes     No      n_j
  1   < 40           4     70       74
  2   40–50         28    147      175
  3   50–60         38    207      245
  4   60–70         51    202      253
  5   > 70          32     92      124
  Total            153    718      871

Table 8.1. (5 \times 2)-Table of loss of abutment teeth by age groups (Example 8.1).
But one should note that this is reasonable only if the number of strata is low compared to N, so that the n_j are not too low. Whenever n_j = 1, these estimates degenerate, and more smoothing of the probabilities, and thus a more parsimonious model, is necessary.
The Simplest Case and an Example
For simplicity, we assume now that p = 1, that is, we consider only one explanatory variable. The model in this simplest case is given by
\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha + \beta x_i .    (8.133)
For this special situation, we get for the odds
\frac{\pi_i}{1-\pi_i} = \exp(\alpha + \beta x_i) = e^{\alpha}\,\bigl(e^{\beta}\bigr)^{x_i} ,    (8.134)
that is, if x_i increases by one unit, the odds are multiplied by e^{\beta}. An advantage of this link is that the effects of X can be estimated whether the study of interest is retrospective or prospective (cf. Toutenburg, 1992b, Chapter 5). The effects in the logistic model refer to the odds: for two different x-values, \exp(\alpha + \beta x_1)/\exp(\alpha + \beta x_2) is an odds ratio.
To find an appropriate form for the systematic component of the logistic regression, the sample logits are plotted against x.
Remark. Let x_j be given (j being a group index). For the n_j observations of the response variable Y at this setting, let the value 1 be observed y_j times. Hence \hat\pi(x_j) = y_j/n_j, and \ln[\hat\pi_j/(1-\hat\pi_j)] = \ln[y_j/(n_j - y_j)] is the sample logit. This term, however, is not defined for y_j = 0 or y_j = n_j. Therefore, a correction is introduced, and we utilize the smoothed logit
\ln\bigl[(y_j + 1/2)/(n_j - y_j + 1/2)\bigr] .
Example 8.1. We examine the risk (Y) for the loss of abutment teeth by extraction in dependence on age (X) (Walther and Toutenburg, 1991).
From Table 8.1, we calculate \chi^2_4 = 15.56, which is significant at the 5% level (\chi^2_{4;\,0.95} = 9.49). Using the unsmoothed sample logits results in the following table:

  j   Sample logit   \hat\pi_{1|j} = y_j/n_j
  1      −2.86             0.054
  2      −1.66             0.160
  3      −1.70             0.155
  4      −1.38             0.202
  5      −1.06             0.258

[Figure: plot of the sample logits against the age groups x_1, \ldots, x_5.]
\hat\pi_{1|j} is the estimated risk for the loss of abutment teeth. It increases roughly linearly with age group; for instance, age group 5 has about five times the risk of age group 1.
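The sample logits and risks above can be reproduced directly from the counts in Table 8.1; a short Python sketch:

```python
import math

# Reproduce the sample logits ln(y_j / (n_j - y_j)) and the estimated
# risks y_j / n_j from Table 8.1 (loss of abutment teeth by age group).
yj = [4, 28, 38, 51, 32]          # losses ("yes") per age group
nj = [74, 175, 245, 253, 124]     # group sizes

logits = [math.log(y / (n - y)) for y, n in zip(yj, nj)]
risks = [y / n for y, n in zip(yj, nj)]
# rounded to two decimals, logits = [-2.86, -1.66, -1.70, -1.38, -1.06]
```

Since no group has y_j = 0 or y_j = n_j, the unsmoothed logits are well defined here and no 1/2-correction is needed.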
Modeling with the logistic regression

    ln(π_1(x_j)/(1 − π_1(x_j))) = α + β x_j

results in
    x_j   Sample logit   Fitted logit   π̂_1(x_j)   Expected n_j π̂_1(x_j)   Observed y_j
    35       −2.86          −2.22         0.098            7.25                  4
    45       −1.66          −1.93         0.127           22.17                 28
    55       −1.70          −1.64         0.162           39.75                 38
    65       −1.38          −1.35         0.206           51.99                 51
    75       −1.06          −1.06         0.257           31.84                 32
with the ML estimates α̂ = −3.233 and β̂ = 0.029.
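These ML estimates can be reproduced with a few lines of Newton-Raphson iteration for the grouped binomial logit model (a minimal sketch, not code from the text; the group sizes n_j are reconstructed from the observed/expected counts listed in Example 8.2):

```python
import numpy as np

# Grouped data from Example 8.1 (loss of abutment teeth by age group):
# x = age-group values, y = losses, n = group sizes (from Example 8.2).
x = np.array([35., 45., 55., 65., 75.])
y = np.array([4., 28., 38., 51., 32.])
n = np.array([74., 175., 245., 253., 124.])

X = np.column_stack([np.ones_like(x), x])   # design matrix (1, x_j)
beta = np.zeros(2)                          # start at alpha = beta = 0

# Newton-Raphson (equivalently IRLS) for the grouped binomial logit model
for _ in range(25):
    eta = X @ beta
    pi = 1.0 / (1.0 + np.exp(-eta))         # fitted response probabilities
    W = np.diag(n * pi * (1.0 - pi))        # binomial weights
    score = X.T @ (y - n * pi)              # score vector
    beta = beta + np.linalg.solve(X.T @ W @ X, score)

alpha_hat, beta_hat = beta
print(alpha_hat, beta_hat)   # approximately -3.233 and 0.029
```

The fitted probabilities reproduce the "Expected" column above, for example n_1 π̂_1(35) = 74 · 0.098 = 7.25.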
8.3.2 Testing the Model
Under general conditions the maximum-likelihood estimates are asymptotically normal. Hence tests of significance and the construction of confidence limits can be based on normal theory.
The significance of the effect of the variable X on π is equivalent to the significance of the parameter β. The hypothesis that β is significant, that is, β ≠ 0, is tested via H_0 : β = 0 against H_1 : β ≠ 0. For this test, we compute the Wald statistic Z^2 = β̂′(cov̂(β̂))^{−1}β̂ ∼ χ^2_{df} , where df is the number of components of the vector β.
Figure 8.1. Logistic function π(x) = exp(x)/(1 + exp(x)).
In the above Example 8.1, we have Z^2 = 13.06 > χ^2_{1;0.95} = 3.84 (the upper 5% value), which leads to a rejection of H_0 : β = 0, so that the trend is seen to be significant.
8.3.3 Distribution Function as a Link Function
The logistic function has the shape of the cumulative distribution function of a continuous random variable.
This suggests a class of models for binary responses having the form
π(x) = F (α + βx) , (8.135)
where F is a standard, continuous, cumulative distribution function. If F is strictly monotonically increasing over the entire real line, we have

    F^{−1}(π(x)) = α + βx .   (8.136)

This is a GLM with F^{−1} as the link function. F^{−1} maps the [0, 1] range of probabilities onto (−∞, ∞).
The cumulative distribution function of the logistic distribution is

    F (x) = exp((x − µ)/τ ) / [1 + exp((x − µ)/τ )] ,   −∞ < x < ∞ ,   (8.137)

with µ as the location parameter and τ > 0 as the scale parameter. The distribution is symmetric with mean µ and standard deviation τ π/√3 (bell-shaped curve, similar to the standard normal distribution). The logistic regression π(x) = F (α + βx) corresponds to the standardized logistic distribution F with µ = 0 and τ = 1. Thus, the logistic regression has mean −α/β and standard deviation π/(|β|√3).
If F is the standard normal cumulative distribution function, that is, π(x) = F (α + βx) = Φ(α + βx), then π(x) is called the probit model.
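The two link functions can be compared numerically; a small sketch (function names are ours, using only the standard library):

```python
import math

def logistic_cdf(x, mu=0.0, tau=1.0):
    """Logistic distribution function, equation (8.137)."""
    z = (x - mu) / tau
    return math.exp(z) / (1.0 + math.exp(z))

def probit(x):
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Both links map the linear predictor to (0, 1) and are symmetric about 0.5:
assert logistic_cdf(0.0) == 0.5 and probit(0.0) == 0.5
print(logistic_cdf(1.0), probit(1.0))   # approximately 0.7311 and 0.8413
```

The standardized logistic curve is flatter in the tails than Φ, which is why logit and probit fits usually differ only for extreme probabilities.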
8.4 Logit Models for Categorical Data
The explanatory variable X can be continuous or categorical. Assume X to be categorical and choose the logit link; then the logit models are equivalent to loglinear models (categorical regression), which are discussed in detail in Section 8.6. To explain this equivalence, we first consider the logit model.
Logit Models for I × 2 Tables
Let X be an explanatory variable with I categories. If response/nonresponse is the Y factor, we then have an I × 2 table. In row i the probability for response is π_{1|i} and for nonresponse π_{2|i}, with π_{1|i} + π_{2|i} = 1.
This leads to the following logit model:
    ln(π_{1|i}/π_{2|i}) = α + β_i .   (8.138)
Here the x-values are not included explicitly but only through the category i. β_i describes the effect of category i on the response. When β_i = 0, there is no effect. This model resembles the one-way analysis of variance and, likewise, we have the constraints Σ β_i = 0 or β_I = 0 for identifiability. Then I − 1 of the parameters β_i suffice for the characterization of the model. For the constraint Σ β_i = 0, α is the overall mean of the logits and β_i is the deviation from this mean for row i. The higher β_i is, the higher is the logit in row i, and the higher is the value of π_{1|i} (= chance for response in category i).
When the factor X (in I categories) has no effect on the response variable, the model simplifies to the model of statistical independence of factor and response:

    ln(π_{1|i}/π_{2|i}) = α   ∀i .

We then have β_1 = β_2 = · · · = β_I = 0, and thus π_{1|1} = π_{1|2} = · · · = π_{1|I}.
Logit Models for Higher Dimensions
As a generalization to two or more categorical factors that have an effect on the binary response, we now consider two factors A and B with I and J levels. Let π_{1|ij} and π_{2|ij} denote the probabilities for response and nonresponse for the combination ij of factors, so that π_{1|ij} + π_{2|ij} = 1. For the I × J × 2 table, the logit model

    ln(π_{1|ij}/π_{2|ij}) = α + β^A_i + β^B_j   (8.139)

represents the effects of A and B without interaction. This model is equivalent to the two-way analysis of variance without interaction.
8.5 Goodness of Fit—Likelihood Ratio Test
For a given model M , we can use the estimates of the parameters (e.g., (α̂, β̂_i) or (α̂, β̂)) to predict the logits, to estimate the probabilities of response π̂_{1|i}, and hence to calculate the expected cell frequencies m̂_{ij} = n_{i+} π̂_{j|i}.
We can now test the goodness of fit of a model M with Wilks' G^2 statistic

    G^2(M ) = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} n_{ij} ln(n_{ij}/m̂_{ij}) .   (8.140)
The m̂_{ij} are calculated by using the estimated model parameters. The degrees of freedom equal the number of logits minus the number of independent parameters in the model M .
We now consider three models for binary response (cf. Agresti, 2007).
(1) Independence model:
    M = I :   ln(π_{1|i}/π_{2|i}) = α .   (8.141)

Here we have I logits and one parameter, that is, I − 1 degrees of freedom.
(2) Logistic model:
    M = L :   ln(π_{1|i}/π_{2|i}) = α + β x_i .   (8.142)

The number of degrees of freedom equals I − 2.
(3) Logit model:
    M = S :   ln(π_{1|i}/π_{2|i}) = α + β_i .   (8.143)

The model has I logits and I independent parameters. The number of degrees of freedom is 0, so the fit is perfect. This model, with equal numbers of parameters and observations, is called a saturated model.
As mentioned earlier, the likelihood-ratio test compares a model M_1 with a simpler model M_2 (in which a few parameters equal zero). The test statistic here is then

    Λ = L(M_2)/L(M_1) ,   (8.144)

or

    G^2(M_2|M_1) = −2 (ln L(M_2) − ln L(M_1)) .   (8.145)
The statistic G^2(M ) is a special case of this statistic, in which M_2 = M and M_1 is the saturated model. If we want to test the goodness of fit with
G^2(M ), this is equivalent to testing whether all the parameters that are in the saturated model, but not in the model M , are equal to zero.
Let l_S denote the maximized loglikelihood function for the saturated model. Then we have

    G^2(M_2|M_1) = −2 (ln L(M_2) − ln L(M_1))
                 = −2 (ln L(M_2) − l_S) − [−2 (ln L(M_1) − l_S)]
                 = G^2(M_2) − G^2(M_1) .   (8.146)
That is, the statistic G^2(M_2|M_1) for comparing two models is identical to the difference of the goodness-of-fit statistics for the two models.
Example 8.2. In Example 8.1 "Loss of abutment teeth/age," we have for the logistic model:

    Age                Loss                 No loss
    group     Observed   Expected   Observed   Expected
    1             4          7.25       70        66.75
    2            28         22.17      147       152.83
    3            38         39.75      207       205.25
    4            51         51.99      202       201.01
    5            32         31.84       92        92.16
and get G^2(L) = 3.66 with df = 5 − 2 = 3. For the independence model, we get G^2(I) = 17.25 with df = 4 = (I − 1)(J − 1) = (5 − 1)(2 − 1). The test statistic for testing H_0 : β = 0 in the logistic model is then

    G^2(I|L) = G^2(I) − G^2(L) = 17.25 − 3.66 = 13.59 ,   df = 4 − 3 = 1 .

This value is significant, which means that the logistic model holds compared to the independence model.
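The G^2 values of this example can be checked directly from the table of observed and expected counts (an illustrative sketch; the variable names are ours):

```python
import numpy as np

# Observed and fitted counts from Example 8.2 (rows: age groups,
# columns: loss / no loss of abutment teeth).
observed = np.array([[4., 70.], [28., 147.], [38., 207.], [51., 202.], [32., 92.]])
fitted_L = np.array([[7.25, 66.75], [22.17, 152.83], [39.75, 205.25],
                     [51.99, 201.01], [31.84, 92.16]])   # logistic model

def G2(n, m):
    """Wilks' statistic G2 = 2 * sum n_ij * ln(n_ij / m_ij), equation (8.140)."""
    return 2.0 * np.sum(n * np.log(n / m))

# Independence model: m_ij = n_{i+} n_{+j} / n
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
fitted_I = row * col / observed.sum()

# Approximately 3.66 and 17.25; tiny deviations come from the rounding
# of the fitted counts listed in the table.
print(G2(observed, fitted_L), G2(observed, fitted_I))
print(G2(observed, fitted_I) - G2(observed, fitted_L))   # approximately 13.59
```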
8.6 Loglinear Models for Categorical Variables
8.6.1 Two–Way Contingency Tables
The previous models focused on bivariate response, that is, on I × 2 tables. We now generalize this set-up to I × J and later to I × J × K tables.
Suppose that we have a realization (sample) of two categorical variables with I and J categories and sample size n. This yields observations in N = I × J cells of the contingency table. The number in the (i, j)th cell is denoted by n_{ij}.
The probabilities π_{ij} of the multinomial distribution form the joint distribution. Independence of the variables is equivalent to
πij = πi+π+j (for all i, j). (8.147)
360 8. Models for Categorical Response Variables
If this is applied to the expected cell frequencies m_{ij} = n π_{ij}, the condition of independence is equivalent to
mij = nπi+π+j . (8.148)
The modeling of the I × J table is based on this relation as an independence model on the logarithmic scale:
ln(mij) = ln n + ln πi+ + ln π+j . (8.149)
Hence, the effects of the rows and columns on ln(m_{ij}) are additive. An alternative expression, following the models of the analysis of variance of the form

    y_{ij} = µ + α_i + β_j + ε_{ij} ,   (Σ α_i = Σ β_j = 0) ,   (8.150)
is given by
    ln m_{ij} = µ + λ^X_i + λ^Y_j   (8.151)

with

    λ^X_i = ln π_{i+} − (1/I) Σ_{k=1}^{I} ln π_{k+} ,   (8.152)

    λ^Y_j = ln π_{+j} − (1/J) Σ_{k=1}^{J} ln π_{+k} ,   (8.153)

    µ = ln n + (1/I) Σ_{k=1}^{I} ln π_{k+} + (1/J) Σ_{k=1}^{J} ln π_{+k} .   (8.154)
The parameters satisfy the constraints

    Σ_{i=1}^{I} λ^X_i = Σ_{j=1}^{J} λ^Y_j = 0 ,   (8.155)
which make the parameters identifiable. Model (8.151) is called a loglinear model of independence in a two-way contingency table. The related saturated model contains the additional interaction parameters λ^{XY}_{ij}:

    ln m_{ij} = µ + λ^X_i + λ^Y_j + λ^{XY}_{ij} .   (8.156)

This model describes the perfect fit. The interaction parameters satisfy

    Σ_{i=1}^{I} λ^{XY}_{ij} = Σ_{j=1}^{J} λ^{XY}_{ij} = 0 .   (8.157)
Given the λ^{XY}_{ij} in the first (I − 1)(J − 1) cells, these constraints determine the λ^{XY}_{ij} in the last row or the last column. Thus, the saturated model contains

    1 [µ] + (I − 1) [λ^X_i] + (J − 1) [λ^Y_j] + (I − 1)(J − 1) [λ^{XY}_{ij}] = IJ   (8.158)

independent parameters. For the independence model, the number of independent parameters equals
1 + (I − 1) + (J − 1) = I + J − 1 . (8.159)
Interpretation of the Parameters
Loglinear models estimate the effects of rows and columns on ln m_{ij}. For this, no distinction is made between explanatory and response variables. The information in the rows or columns influences m_{ij} symmetrically.
Consider the simplest case, the I × 2 table (independence model). According to (8.151), the logit of the binary variable equals
    ln(π_{1|i}/π_{2|i}) = ln(m_{i1}/m_{i2})
                        = ln(m_{i1}) − ln(m_{i2})
                        = (µ + λ^X_i + λ^Y_1) − (µ + λ^X_i + λ^Y_2)
                        = λ^Y_1 − λ^Y_2 .   (8.160)
The logit is the same in every row and hence independent of X or of the categories i = 1, . . . , I, respectively.
Under the constraints, we have

    λ^Y_1 + λ^Y_2 = 0   ⇒   λ^Y_1 = −λ^Y_2
                        ⇒   ln(π_{1|i}/π_{2|i}) = 2λ^Y_1   (i = 1, . . . , I) .
Hence we obtain

    π_{1|i}/π_{2|i} = exp(2λ^Y_1)   (i = 1, . . . , I) .   (8.161)

In each category of X, the odds that Y is in category 1 rather than in category 2 equal exp(2λ^Y_1) when the independence model holds.
    Age      Form of        Endodontic treatment
    group    construction      Yes        No
    < 60         H              62       1041
                 B              23        463
    ≥ 60         H              70        755
                 B              30        215
    Σ                          185       2474

Table 8.2. 2 × 2 × 2 table for endodontic risk.
The following relationship exists between the odds ratio in a 2 × 2 table and the saturated loglinear model:

    ln θ = ln[(m_{11} m_{22}) / (m_{12} m_{21})]
         = ln(m_{11}) + ln(m_{22}) − ln(m_{12}) − ln(m_{21})
         = (µ + λ^X_1 + λ^Y_1 + λ^{XY}_{11}) + (µ + λ^X_2 + λ^Y_2 + λ^{XY}_{22})
           − (µ + λ^X_1 + λ^Y_2 + λ^{XY}_{12}) − (µ + λ^X_2 + λ^Y_1 + λ^{XY}_{21})
         = λ^{XY}_{11} + λ^{XY}_{22} − λ^{XY}_{12} − λ^{XY}_{21} .
Since Σ_{i=1}^{2} λ^{XY}_{ij} = Σ_{j=1}^{2} λ^{XY}_{ij} = 0, we have λ^{XY}_{11} = λ^{XY}_{22} = −λ^{XY}_{12} = −λ^{XY}_{21} and thus ln θ = 4λ^{XY}_{11}. Hence the odds ratio in a 2 × 2 table equals

    θ = exp(4λ^{XY}_{11}) ,   (8.162)
and depends on the association parameter of the saturated model. When there is no association, that is, λ^{XY}_{ij} = 0, we have θ = 1.
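Relation (8.162) is easy to verify numerically, since in the saturated 2 × 2 model λ^{XY}_{11} is a quarter of the log-count contrast. As an illustrative sketch (using the age < 60 subtable of Table 8.2 as data; variable names are ours):

```python
import math

# 2x2 subtable (age < 60) of Table 8.2: construction H/B vs treatment yes/no.
m = [[62.0, 1041.0], [23.0, 463.0]]

# In the saturated model the interaction parameter is a contrast of log counts:
lam_11 = (math.log(m[0][0]) + math.log(m[1][1])
          - math.log(m[0][1]) - math.log(m[1][0])) / 4.0

theta = (m[0][0] * m[1][1]) / (m[0][1] * m[1][0])   # odds ratio of the table
assert abs(math.exp(4.0 * lam_11) - theta) < 1e-10   # relation (8.162)
print(round(theta, 3))   # 1.199
```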
8.6.2 Three–Way Contingency Tables
We now consider three categorical variables X, Y , and Z. The frequencies of the combinations of categories are displayed in the I × J × K contingency table. We are especially interested in I × J × 2 contingency tables, where the last variable is a bivariate risk or response variable. Table 8.2 shows the risk of an endodontic treatment depending on the age of the patients and the type of construction of the denture (Walther and Toutenburg, 1991).
In addition to the bivariate associations, we want to model an overall association. The three variables are mutually independent if the following independence model for the cell frequencies m_{ijk} (on a logarithmic scale) holds:

    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k .   (8.163)
(In the above example, we have X: age group, Y : type of construction, and Z: endodontic treatment.) The variable Z is independent of the joint
distribution of X and Y (jointly independent) if

    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} .   (8.164)
A third type of independence (conditional independence of two variables given a fixed category of the third variable) is expressed by the following model (j fixed!):

    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} + λ^{YZ}_{jk} .   (8.165)
This is the approach for the conditional independence of X and Z at level j of Y . If they are conditionally independent for all j = 1, . . . , J , then X and Z are called conditionally independent, given Y . Similarly, if X and Y are conditionally independent at level k of Z, the parameters λ^{XY}_{ij} and λ^{YZ}_{jk} in (8.165) are replaced by the parameters λ^{XZ}_{ik} and λ^{YZ}_{jk}. The parameters with two subscripts describe two-way interactions. The appropriate conditions for the cell probabilities are:
(a) mutual independence of X, Y , Z:

    π_{ijk} = π_{i++} π_{+j+} π_{++k}   (for all i, j, k).   (8.166)

(b) joint independence: Y is jointly independent of X and Z when

    π_{ijk} = π_{i+k} π_{+j+}   (for all i, j, k).   (8.167)

(c) conditional independence: X and Y are conditionally independent, given Z, when

    π_{ijk} = (π_{i+k} π_{+jk}) / π_{++k}   (for all i, j, k).   (8.168)
The most general loglinear model (saturated model) for three-way tables is the following:

    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} + λ^{XZ}_{ik} + λ^{YZ}_{jk} + λ^{XYZ}_{ijk} .   (8.169)
The last parameter describes the three-factor interaction. All association parameters, describing the deviation from the general mean µ, satisfy the constraints

    Σ_{i=1}^{I} λ^{XY}_{ij} = Σ_{j=1}^{J} λ^{XY}_{ij} = . . . = Σ_{k=1}^{K} λ^{XYZ}_{ijk} = 0 .   (8.170)
Similarly, for the main factor effects we have

    Σ_{i=1}^{I} λ^X_i = Σ_{j=1}^{J} λ^Y_j = Σ_{k=1}^{K} λ^Z_k = 0 .   (8.171)
From the general model (8.169), submodels can be constructed. For this, the hierarchical principle of construction is preferred. A model is called hierarchical when, in addition to significant higher-order effects, it contains
    Loglinear model                                                                          Symbol
    ln(m_{ij+}) = µ + λ^X_i + λ^Y_j                                                          (X, Y)
    ln(m_{i+k}) = µ + λ^X_i + λ^Z_k                                                          (X, Z)
    ln(m_{+jk}) = µ + λ^Y_j + λ^Z_k                                                          (Y, Z)
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k                                                  (X, Y, Z)
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij}                                    (XY, Z)
      ...
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^{XY}_{ij}                                            (XY)
      ...
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} + λ^{XZ}_{ik}                      (XY, XZ)
      ...
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} + λ^{XZ}_{ik} + λ^{YZ}_{jk}        (XY, XZ, YZ)
      ...
    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k + λ^{XY}_{ij} + λ^{XZ}_{ik} + λ^{YZ}_{jk} + λ^{XYZ}_{ijk}   (XYZ)

Table 8.3. Symbols of the hierarchical models for three-way contingency tables (cf. Agresti, 2007).
all lower-order effects of the variables included in the higher-order effects, even if these parameter estimates are not statistically significant. For instance, if the model contains the association parameter λ^{XZ}_{ik}, it must also contain λ^X_i and λ^Z_k:

    ln(m_{ijk}) = µ + λ^X_i + λ^Z_k + λ^{XZ}_{ik} .   (8.172)
A symbol is assigned to each of the various hierarchical models (Table 8.3). As with 2 × 2 tables, a close relationship exists between the parameters of the model and the odds ratios. Given a 2 × 2 × 2 table, we have, under the constraints (8.170) and (8.171), for instance,

    θ_{11(1)}/θ_{11(2)} = [(π_{111} π_{221})/(π_{211} π_{121})] / [(π_{112} π_{222})/(π_{212} π_{122})]
                        = exp(8λ^{XYZ}_{111}) .   (8.173)

This is the conditional odds ratio of X and Y , given the levels k = 1 (numerator) and k = 2 (denominator) of Z. The same holds for X and Z under Y and for Y and Z under X. In the population, we thus have, for the three-factor interaction λ^{XYZ}_{111},

    θ_{11(1)}/θ_{11(2)} = θ_{1(1)1}/θ_{1(2)1} = θ_{(1)11}/θ_{(2)11} = exp(8λ^{XYZ}_{111}) .   (8.174)
In the case of independence in the equivalent subtables, the odds ratios (of the population) equal 1. The sample odds ratio gives a first hint at a deviation from independence.
Consider the conditional odds ratio (8.174) for Table 8.2, assuming that X is the variable "age group," Y is the variable "form of construction," and Z is the variable "endodontic treatment." We then have a value of 1.80. This indicates a positive tendency toward an increased risk of endodontic treatment when comparing the following subtables for endodontic treatment (left) versus no endodontic treatment (right):
          H     B                     H      B
    < 60   62    23            < 60   1041   463
    ≥ 60   70    30            ≥ 60    755   215
The relationship (8.102) is also valid for the sample version. Thus a comparison of the following subtables for < 60 (left) versus ≥ 60 (right):
         Treatment                    Treatment
          Yes    No                    Yes    No
    H      62   1041             H      70    755
    B      23    463             B      30    215

or for H (left) versus B (right):

         Treatment                    Treatment
           Yes    No                    Yes    No
    < 60    62   1041            < 60    23    463
    ≥ 60    70    755            ≥ 60    30    215
leads to the same sample value 1.80 and hence λ̂^{XYZ}_{111} = 0.073.
Calculations for Table 8.2:

    θ_{11(1)}/θ_{11(2)} = [(n_{111} n_{221})/(n_{211} n_{121})] / [(n_{112} n_{222})/(n_{212} n_{122})]
                        = [(62 · 30)/(70 · 23)] / [(1041 · 215)/(755 · 463)]
                        = 1.1553/0.6403 = 1.80 ,

    θ_{(1)11}/θ_{(2)11} = [(n_{111} n_{122})/(n_{121} n_{112})] / [(n_{211} n_{222})/(n_{221} n_{212})]
                        = [(62 · 463)/(23 · 1041)] / [(70 · 215)/(30 · 755)]
                        = 1.1989/0.6645 = 1.80 ,

    θ_{1(1)1}/θ_{1(2)1} = [(n_{111} n_{212})/(n_{211} n_{112})] / [(n_{121} n_{222})/(n_{221} n_{122})]
                        = [(62 · 755)/(70 · 1041)] / [(23 · 215)/(30 · 463)]
                        = 0.6424/0.3560 = 1.80 .
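These calculations can be reproduced directly from the cell counts (an illustrative sketch; the helper function and variable names are ours):

```python
import numpy as np

# Table 8.2 as n[i, j, k]: i = age (<60, >=60), j = construction (H, B),
# k = endodontic treatment (yes, no).
n = np.array([[[62., 1041.], [23., 463.]],
              [[70., 755.], [30., 215.]]])

def theta_XY_given_k(n, k):
    """Odds ratio of X and Y in the partial table at level k of Z."""
    return (n[0, 0, k] * n[1, 1, k]) / (n[1, 0, k] * n[0, 1, k])

ratio = theta_XY_given_k(n, 0) / theta_XY_given_k(n, 1)
lam_111 = np.log(ratio) / 8.0      # three-factor interaction, relation (8.174)
print(round(ratio, 2), round(lam_111, 3))   # 1.8 0.074
```

The exact value is λ̂^{XYZ}_{111} = ln(1.8044)/8 = 0.0738; the text's 0.073 comes from using the rounded ratio 1.80.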
8.7 The Special Case of Binary Response
If one of the variables is a binary response variable (in our example, Z: endodontic treatment) and the others are explanatory categorical variables (in our example, X: age group and Y : type of construction), these models lead to the already familiar logit model.
Given the independence model

    ln(m_{ijk}) = µ + λ^X_i + λ^Y_j + λ^Z_k ,   (8.175)

we then have, for the logit of the response variable Z,

    ln(m_{ij1}/m_{ij2}) = λ^Z_1 − λ^Z_2 .   (8.176)
With the constraint Σ_{k=1}^{2} λ^Z_k = 0 we thus have

    ln(m_{ij1}/m_{ij2}) = 2λ^Z_1   (for all i, j) .   (8.177)
The higher the value of λ^Z_1 is, the higher is the risk for category Z = 1 (endodontic treatment), independent of the values of X and Y . If the other two variables are also binary, implying a 2 × 2 × 2 table, and if the constraints

    λ^X_2 = −λ^X_1 ,   λ^Y_2 = −λ^Y_1 ,   λ^Z_2 = −λ^Z_1

hold, then the model (8.175) can be expressed as follows:
    ( ln(m_{111}) )     ( 1   1   1   1 )
    ( ln(m_{112}) )     ( 1   1   1  −1 )
    ( ln(m_{121}) )     ( 1   1  −1   1 )   ( µ     )
    ( ln(m_{122}) )  =  ( 1   1  −1  −1 )   ( λ^X_1 )
    ( ln(m_{211}) )     ( 1  −1   1   1 )   ( λ^Y_1 )   ,   (8.178)
    ( ln(m_{212}) )     ( 1  −1   1  −1 )   ( λ^Z_1 )
    ( ln(m_{221}) )     ( 1  −1  −1   1 )
    ( ln(m_{222}) )     ( 1  −1  −1  −1 )
which is equivalent to ln(m) = Xβ. This corresponds to the effect coding of categorical variables (Section 8.8). The ML equation is

    X′n = X′m̂ .   (8.179)

The estimated asymptotic covariance matrix for Poisson sampling reads as

    cov̂(β̂) = [X′(diag(m̂))X]^{−1} ,   (8.180)

where diag(m̂) has the elements m̂ on the main diagonal. The solution of the ML equation (8.179) is obtained by the Newton-Raphson or any other iterative algorithm, for instance, iterative proportional fitting (IPF).
The IPF method (Deming and Stephan, 1940; cf. Agresti, 2007) adjusts initial estimates m̂^{(0)}_{ijk} successively to the respective expected marginal table of the model until a prespecified accuracy is achieved. For the
independence model, the steps of the iteration are

    m̂^{(1)}_{ijk} = m̂^{(0)}_{ijk} (n_{i++}/m̂^{(0)}_{i++}) ,
    m̂^{(2)}_{ijk} = m̂^{(1)}_{ijk} (n_{+j+}/m̂^{(1)}_{+j+}) ,
    m̂^{(3)}_{ijk} = m̂^{(2)}_{ijk} (n_{++k}/m̂^{(2)}_{++k}) .
Example 8.3 (Tartar-Smoking Analysis). A study cited in Toutenburg (1992b, p. 42) investigates to what extent smoking influences the development of tartar. The 3 × 3 contingency table (Table 8.5) is modeled by the loglinear model

    ln(m_{ij}) = µ + λ^{Smoking}_i + λ^{Tartar}_j + λ^{Smoking/Tartar}_{ij} ,

with i, j = 1, 2, 3. Here we have

    λ^{Smoking}_1 = effect nonsmoker,
    λ^{Smoking}_2 = effect light smoker,
    λ^{Smoking}_3 = −(λ^{Smoking}_1 + λ^{Smoking}_2) = effect heavy smoker.
For the development of tartar, analogous expressions are valid:
(i) Model of independence. For the null hypothesis

    H_0 : ln(m_{ij}) = µ + λ^{Smoking}_i + λ^{Tartar}_j ,

we obtain G^2 = 76.23 > 9.49 = χ^2_{4;0.95}. This leads to a clear rejection of this model.
(ii) Saturated model. Here we have G^2 = 0. The estimates of the parameters are (values in parentheses are standardized values):

    λ̂^{Smoking}_1 = −1.02 (−25.93),     λ̂^{Tartar}_1 =  0.31 (11.71),
    λ̂^{Smoking}_2 =  0.20 (7.10),       λ̂^{Tartar}_2 =  0.61 (23.07),
    λ̂^{Smoking}_3 =  0.82 (—),          λ̂^{Tartar}_3 = −0.92 (—).
All single effects are highly significant. The interaction effects areshown in Table 8.4.
                     Tartar
                  1      2      3      Σ
    Smoking  1    0.34  −0.14  −0.20   0
             2   −0.12   0.06   0.06   0
             3   −0.22   0.08   0.14   0
    Σ             0      0      0

Table 8.4. Interaction effects.
The main diagonal is very well marked, which is an indication of a trend. The standardized interaction effects are significant as well:

          1       2      3
    1     7.30   −3.05   —
    2    −3.51    1.93   —
    3     —       —      —
                        Tartar
                  None   Middle   Heavy
    Smoking None   284     236      48
          Middle   606     983     209
           Heavy  1028    1871     425

Table 8.5. Smoking and development of tartar.
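The G^2 value of the independence model in part (i) can be checked from Table 8.5 (an illustrative sketch; variable names are ours):

```python
import numpy as np

# Table 8.5: smoking (rows) x development of tartar (columns)
n = np.array([[284., 236., 48.],
              [606., 983., 209.],
              [1028., 1871., 425.]])

# Expected counts under independence: m_ij = n_{i+} n_{+j} / n
m = n.sum(axis=1, keepdims=True) * n.sum(axis=0, keepdims=True) / n.sum()

G2 = 2.0 * np.sum(n * np.log(n / m))
print(round(G2, 2))   # 76.23
```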
8.8 Coding of Categorical Explanatory Variables
8.8.1 Dummy and Effect Coding
If a bivariate response variable Y is connected to a linear model x′β, with x being categorical, by an appropriate link, the parameters β are always to be interpreted in terms of their dependence on the x scores. To eliminate this arbitrariness, an appropriate coding of x is chosen. Here two ways of coding are suggested (partly in analogy to the analysis of variance).
Dummy Coding
Let A be a variable with I categories. Then the I − 1 dummy variables are defined as follows:

    x^A_i = 1 for category i of variable A,   0 otherwise,   (8.181)

with i = 1, . . . , I − 1. The category I is implicitly taken into account by x^A_1 = . . . = x^A_{I−1} = 0. Thus, the vector of explanatory variables belonging to variable A is of the following form:

    x_A = (x^A_1, x^A_2, . . . , x^A_{I−1})′ .   (8.182)
The parameters β_i, which enter the final regression model proportionally to x′_A β, are called the main effects of A.
Example:
(i) Sex male/female, with male: category 1, female: category 2,

    x^{Sex}_1 = (1) ⇒ person is male,
    x^{Sex}_1 = (0) ⇒ person is female.

(ii) Age groups i = 1, . . . , 5,

    x^{Age} = (1, 0, 0, 0)′ ⇒ age group is 1,
    x^{Age} = (0, 0, 0, 0)′ ⇒ age group is 5.
Let y be a bivariate response variable. The probability of response (y = 1), dependent on a categorical variable A with I categories, can be modeled as follows:

    P (y = 1 | x_A) = β_0 + β_1 x^A_1 + · · · + β_{I−1} x^A_{I−1} .   (8.183)

Given category i (age group i), we have

    P (y = 1 | x_A represents the ith age group) = β_0 + β_i ,

as long as i = 1, 2, . . . , I − 1, and, for the implicitly coded category I, we get

    P (y = 1 | x_A represents the Ith age group) = β_0 .   (8.184)
Hence, for each category i, a different probability of response P (y = 1 | x_A) is possible.
Effect Coding
For an explanatory variable A with I categories, effect coding is defined as follows:

    x^A_i =  1 for category i, i = 1, . . . , I − 1,
            −1 for category I,
             0 otherwise.   (8.185)
Consequently, we have

    β_I = − Σ_{i=1}^{I−1} β_i ,   (8.186)

which is equivalent to

    Σ_{i=1}^{I} β_i = 0 .   (8.187)
In analogy to the analysis of variance, the model for the probability of response has the following form:
P (y = 1 | xA represents the ith age group) = β0 + βi (8.188)
for i = 1, . . . , I and with the constraint (8.187).
Example: I = 3 age groups A_1, A_2, A_3. A person in A_1 is coded (1, 0), and a person in A_2 is coded (0, 1), for both dummy and effect coding. A person in A_3 is coded (0, 0) using dummy coding or (−1, −1) using effect coding. The two ways of coding categorical variables thus differ only for category I.
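The two coding schemes can be sketched as small helper functions (an illustrative sketch; the function names are ours, not from the text):

```python
def dummy_coding(category, I):
    """Dummy coding (8.181): I-1 indicators, category I coded as all zeros."""
    return [1 if category == i else 0 for i in range(1, I)]

def effect_coding(category, I):
    """Effect coding (8.185): category I coded as (-1, ..., -1)."""
    if category == I:
        return [-1] * (I - 1)
    return [1 if category == i else 0 for i in range(1, I)]

# I = 3 age groups: the codings differ only for the last category.
assert dummy_coding(1, 3) == [1, 0] and effect_coding(1, 3) == [1, 0]
assert dummy_coding(3, 3) == [0, 0] and effect_coding(3, 3) == [-1, -1]
```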
Inclusion of More than One Variable
If more than one explanatory variable is included in the model, the categories of A, B, and C (with I, J, and K categories, respectively), for example, are combined in a common vector

    x′ = (x^A_1, . . . , x^A_{I−1}, x^B_1, . . . , x^B_{J−1}, x^C_1, . . . , x^C_{K−1}) .   (8.189)
In addition to these main effects, the interaction effects x^{AB}_{ij}, . . . , x^{ABC}_{ijk} can be included. The codings of x^{AB}_{ij}, . . . , x^{ABC}_{ijk} are chosen in consideration of the constraints (8.170).
Example: In the case of effect coding, we obtain, for the saturated model (8.156) with binary variables A and B,

    ( ln(m_{11}) )     ( 1   1   1   1 )   ( µ           )
    ( ln(m_{12}) )  =  ( 1   1  −1  −1 )   ( λ^A_1       )
    ( ln(m_{21}) )     ( 1  −1   1  −1 )   ( λ^B_1       )
    ( ln(m_{22}) )     ( 1  −1  −1   1 )   ( λ^{AB}_{11} )   ,
from which we obtain the following values for x^{AB}_{ij}, recoded for the parameter λ^{AB}_{11}:

    (i, j)   Interaction       Parameter     Constraints                              Recoding for λ^{AB}_{11}
    (1, 1)   x^{AB}_{11} = 1   λ^{AB}_{11}                                            x^{AB}_{11} =  1
    (1, 2)   x^{AB}_{12} = 1   λ^{AB}_{12}   λ^{AB}_{12} = −λ^{AB}_{11}               x^{AB}_{12} = −1
    (2, 1)   x^{AB}_{21} = 1   λ^{AB}_{21}   λ^{AB}_{21} = λ^{AB}_{12} = −λ^{AB}_{11}   x^{AB}_{21} = −1
    (2, 2)   x^{AB}_{22} = 1   λ^{AB}_{22}   λ^{AB}_{22} = −λ^{AB}_{21} = λ^{AB}_{11}   x^{AB}_{22} =  1
Thus the interaction effects develop from multiplying the main effects.
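This multiplication rule is easy to check numerically (an illustrative sketch; variable names are ours):

```python
import numpy as np

# Effect-coded main-effect columns for two binary variables A and B
# (rows: cells (1,1), (1,2), (2,1), (2,2)).
xA = np.array([1, 1, -1, -1])
xB = np.array([1, -1, 1, -1])

# The interaction column of the saturated model is the elementwise product:
xAB = xA * xB
assert list(xAB) == [1, -1, -1, 1]

# Together with the intercept this reproduces the 4 x 4 design matrix above:
X = np.column_stack([np.ones(4, dtype=int), xA, xB, xAB])
print(X)
```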
    β_0   x^A_1   x^B_1  x^B_2   x^C_1  x^C_2  x^C_3
     1      1       1      0       1      0      0
     1      1       1      0       0      1      0
     1      1       1      0       0      0      1
     1      1       1      0      −1     −1     −1
     1      1       0      1       1      0      0
     1      1       0      1       0      1      0
     1      1       0      1       0      0      1
     1      1       0      1      −1     −1     −1
     1      1      −1     −1       1      0      0
     1      1      −1     −1       0      1      0
     1      1      −1     −1       0      0      1
     1      1      −1     −1      −1     −1     −1
     1     −1       1      0       1      0      0
     1     −1       1      0       0      1      0
     1     −1       1      0       0      0      1
     1     −1       1      0      −1     −1     −1
     1     −1       0      1       1      0      0
     1     −1       0      1       0      1      0
     1     −1       0      1       0      0      1
     1     −1       0      1      −1     −1     −1
     1     −1      −1     −1       1      0      0
     1     −1      −1     −1       0      1      0
     1     −1      −1     −1       0      0      1
     1     −1      −1     −1      −1     −1     −1

Figure 8.2. Design matrix for the main effects of a 2 × 3 × 4 contingency table.
Let L be the number of possible (different) combinations of variables. If, for example, we have three variables A, B, C with I, J, K categories, L equals IJK.

Consider a complete factorial experimental design (as in an I × J × K contingency table). Now L is known, and the design matrix X (in effect or dummy coding) for the main effects can be specified (independence model).

Example (Fahrmeir and Hamerle, 1984, p. 507): The reading habits of women (preference for a specific magazine: yes/no) are to be analyzed in terms of dependence on employment (A: yes/no), age group (B: three categories), and education (C: four categories). The complete design matrix X (Figure 8.2) is of dimension IJK × [1 + (I − 1) + (J − 1) + (K − 1)], here (2 · 3 · 4) × (1 + 1 + 2 + 3) = 24 × 7. In this case, the number of columns m equals the number of parameters in the independence model (cf. Figure 8.2).
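A design matrix such as Figure 8.2 can be generated programmatically rather than written out by hand (a minimal sketch; the function name is ours):

```python
import numpy as np

def effect_rows(I):
    """Effect-coded rows for a variable with I categories (I-1 columns each)."""
    return np.eye(I - 1, dtype=int).tolist() + [[-1] * (I - 1)]

# Main-effects design for A (2 categories), B (3), C (4):
# one row per cell of the complete 2 x 3 x 4 factorial.
X = np.array([[1] + a + b + c
              for a in effect_rows(2)
              for b in effect_rows(3)
              for c in effect_rows(4)])

print(X.shape)                            # (24, 7)
assert np.linalg.matrix_rank(X) == 7      # all 7 parameters identifiable
```

The row ordering (A slowest, C fastest) reproduces Figure 8.2 exactly, and the full column rank confirms that the 7 parameters of the independence model are identifiable.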
8.8.2 Coding of Response Models
Let
    π_i = P (y = 1 | x_i) ,   i = 1, . . . , L ,

be the probability of response dependent on the level x_i of the vector of covariates x. Summarized in matrix representation, we then have

    π_{L,1} = X_{L,m} β_{m,1} .   (8.190)
N_i observations are made for the realization of covariates coded by x_i. Thus, the responses y^{(j)}_i (j = 1, . . . , N_i) are observed, and we get the ML estimate

    π̂_i = P̂ (y = 1 | x_i) = (1/N_i) Σ_{j=1}^{N_i} y^{(j)}_i   (8.191)

for π_i (i = 1, . . . , L). For contingency tables, the cell counts with binary response, N^{(1)}_i and N^{(0)}_i , are given, from which π̂_i = N^{(1)}_i /(N^{(1)}_i + N^{(0)}_i ) is calculated.

The problem of finding an appropriate link function h(π) for estimating
h(π) = Xβ + ε (8.192)
has already been discussed in several previous sections. If model (8.190) is chosen, that is, the identity link, the parameters β_i are to be interpreted as the percentages with which the categories contribute to the conditional probabilities.
The logit link

    h(π_i) = ln(π_i/(1 − π_i)) = x′_i β   (8.193)

is again equivalent to the logistic model for π_i:

    π_i = exp(x′_i β) / (1 + exp(x′_i β)) .   (8.194)
The design matrices under inclusion of various interactions (up to the saturated model) are obtained as an extension of the designs for the effect-coded main effects.
8.8.3 Coding of Models for the Hazard Rate
The analysis of lifetime data, given the variables Y = 1 (event) and Y = 0 (censored), is an important special case of the application of binary response in long-term studies.

The Cox model is often used as a semiparametric model for the modeling of failure time. Under inclusion of the vector of covariates x, this model can
be written as follows:

    λ(t | x) = λ_0(t) exp(x′β) .   (8.195)

If the hazard rates of two vectors of covariates x_1, x_2 are to be compared with each other (e.g., stratification according to therapy x_1, x_2), the following relation is valid:

    λ(t | x_1)/λ(t | x_2) = exp((x_1 − x_2)′β) .   (8.196)
In order to be able to carry out tests for quantitative or qualitative interactions between types of therapy and groups of patients, J subgroups of patients are defined (e.g., stratification according to prognostic factors). Let therapy Z be bivariate, that is, Z = 1 (therapy A) and Z = 0 (therapy B). For a fixed group of patients, the hazard rate λ_j(t | Z) (j = 1, . . . , J), for instance, is determined according to the Cox approach

    λ_j(t | Z) = λ_{0j}(t) exp(β_j Z) .   (8.197)

In the case of β_j > 0, the risk is higher for Z = 1 than for Z = 0 (jth stratum).
Test for Quantitative Interaction
We test H_0: the effect of therapy is identical across the J strata, that is, H_0 : β_1 = . . . = β_J = β, against the alternative H_1 : β_i ≠ β_j for at least one pair (i, j). Under H_0, the test statistic

    χ^2_{J−1} = Σ_{j=1}^{J} (β̂_j − β̄)^2 / var̂(β̂_j)   (8.198)

with

    β̄ = Σ_{j=1}^{J} [β̂_j/ var̂(β̂_j)] / Σ_{j=1}^{J} [1/ var̂(β̂_j)]   (8.199)

is distributed according to χ^2_{J−1}.
Test for Qualitative Differences
The null hypothesis H_0: therapy B (Z = 0) is better than therapy A (Z = 1) means H_0 : β_j ≤ 0 ∀j. We define the sum of squares of the standardized estimates

    Q^− = Σ_{j: β̂_j<0} (β̂_j)^2 / var̂(β̂_j)   (8.200)
    J    2      3      4      5
    c   2.71   4.23   5.43   6.50

Table 8.6. Critical values for the Q-test for α = 0.05 (Gail and Simon, 1985).
and

    Q^+ = Σ_{j: β̂_j>0} (β̂_j)^2 / var̂(β̂_j) ,   (8.201)

as well as the test statistic

    Q = min(Q^−, Q^+) .   (8.202)
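The Q statistic is simple to compute once the stratum-specific estimates are available. A sketch with hypothetical values (the β̂_j and their variances below are illustrative, not from the text):

```python
# Hypothetical stratum-specific Cox estimates beta_j and their variances
# (illustrative values only, J = 3 strata).
betas = [0.50, -0.20, 0.30]
variances = [0.04, 0.09, 0.01]

Q_minus = sum(b * b / v for b, v in zip(betas, variances) if b < 0)
Q_plus = sum(b * b / v for b, v in zip(betas, variances) if b > 0)
Q = min(Q_minus, Q_plus)

# Critical value for J = 3 at alpha = 0.05 is c = 4.23 (Table 8.6):
print(round(Q, 3), Q > 4.23)   # 0.444 False -> no qualitative interaction
```

Here only one stratum points in the negative direction, so Q = Q^− stays small and the hypothesis of no qualitative interaction is not rejected.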
H_0 is rejected if Q > c (Table 8.6).

Starting with the logistic model for the probability of response
    P (Y = 1 | x) = exp(θ + x′β) / (1 + exp(θ + x′β)) ,   (8.203)

and

    P (Y = 0 | x) = 1 − P (Y = 1 | x) = 1 / (1 + exp(θ + x′β)) ,   (8.204)

with the binary variable

    Y = 1 : T = t | T ≥ t, x ⇒ failure at time t,
    Y = 0 : T > t | T ≥ t, x ⇒ no failure,

we obtain the model for the hazard function

    λ(t | x) = exp(θ_t + x′β) / (1 + exp(θ_t + x′β))   for t = t_1, . . . , t_T   (8.205)

(Cox, 1972b; cf. Doksum and Gasko, 1990; Lawless, 1982; Hamerle and Tutz, 1989). Thus the contribution of a patient with failure time t (x fixed) to the likelihood is

    P (T = t | x) = exp(θ_t + x′β) / Π_{i=1}^{t} (1 + exp(θ_i + x′β)) .   (8.206)
Example 8.4. Assume that a patient has an event at the fourth of four failure times (e.g., loss of abutment teeth by extraction). Let the patient have the following categories of the covariates: sex = 1 and age group = 5 (60-70 years).
The model is then l = θ + x′β:

                        Sex  Age
    ( 0 )     ( 1 0 0 0 | 1   5 )   ( θ )
    ( 0 )  =  ( 0 1 0 0 | 1   5 )   ( β )   ,   (8.207)
    ( 0 )     ( 0 0 1 0 | 1   5 )
    ( 1 )     ( 0 0 0 1 | 1   5 )

with θ = (θ_1, θ_2, θ_3, θ_4)′ and β = (β_1, β_2)′, where the last two columns x carry the covariates sex (= 1) and age group (= 5).
For N patients we have the model

    ( l_1 )     ( I_1  x_1 )
    ( l_2 )  =  ( I_2  x_2 )   ( θ )
    (  ⋮  )     (  ⋮    ⋮  )   ( β )   .
    ( l_N )     ( I_N  x_N )
The dimension of the identity matrix I_j (patient j) is the number of survived failure times plus 1 (the failure time of the jth patient). The vector l_j for the jth patient contains as many zeros as the number of survived failure times and the value 1 at the failure time of the jth patient.
The numerical solutions (for instance, according to Newton-Raphson) for the ML estimates θ̂ and β̂ are obtained from the product of the likelihood functions (8.206) of all patients.
8.9 Extensions to Dependent Binary Variables
Although loglinear models are sufficiently rich to model any dependence structure between categorical variables, if one is interested in a regression of multivariate binary responses on a set of possibly continuous covariates, alternative models exist that are better suited and have easier parameter interpretation. Two often-used models in applications are marginal models and random effects models. In the following, we emphasize the idea of marginal models, because these seem to be a natural extension of the logistic regression model to more than one response variable. The first approach we describe in detail is called the quasi-likelihood approach (cf. Section 8.1.7), because the distribution of the binary response variables is not fully specified. We start by describing these models in detail in Section 8.9.3. Then the generalized estimating equations (GEE) approach (Liang and Zeger, 1986) is introduced and two examples are given. The third approach is a full likelihood approach (Section 8.9.12). Section 8.9.12 mainly gives an overview of the recent literature.
8.9.1 Overview
We now extend the problems of categorical response to situations with correlation within the response values. These correlations are due to the classification of the individuals into clusters of "related" elements. As already mentioned in Section 8.1.6, a positive correlation among related elements in a cluster leads to overdispersion if independence among these elements is falsely assumed.
Examples:
• Two or more implants or abutment teeth in dental reconstructions(Walther and Toutenburg, 1991).
• Response of a patient in cross–over in the case of a significant carry–over effect.
• Repeated categorical measurement of a response such as functionof the lungs, blood pressure, or performance in training (repeatedmeasures design or panel data).
• Measurement of paired organs (eyes, kidneys, etc.).
• Response of members of a family.
Let yij be the categorical response of the jth individual in the ith cluster
yij , i = 1, . . . , N, j = 1, . . . , ni . (8.208)
We assume that the expectation of the response yij depends on prognostic variables (covariates) xij through a regression, that is,
E(yij) = β0 + β1xij . (8.209)
Assume var(yij) = σ² and
cov(yij, yij′) = σ²ρ   (j ≠ j′). (8.210)
The responses of individuals from different clusters are assumed to be uncorrelated. Let us assume that the covariance matrix of the response vector yi = (yi1, …, yini)′ of every cluster equals

V(yi) = σ²(1 − ρ) Ini + σ²ρ Jni   (8.211)
and thus has a compound symmetric structure. Hence, the covariance matrix of the entire sample vector is block–diagonal:
W = V((y1′, …, yN′)′) = diag(V(y1), …, V(yN)) . (8.212)
8.9 Extensions to Dependent Binary Variables 377
Notice that the matrix W itself does not have a compound symmetric structure. Hence, we have a generalized regression model. The best linear unbiased estimator of β = (β0, β1)′ is given by the Gauss–Markov–Aitken estimator [(3.168)]
b = (X′W^{-1}X)^{-1} X′W^{-1} y (8.213)
and does not coincide with the OLS estimator. The choice of an incorrect covariance structure leads, according to our remarks in Section 3.9.2, to a bias in the estimate of the variance. On the other hand, the unbiasedness or consistency of the estimator of β stays untouched even in the case of an incorrect choice of the covariance matrix. Liang and Zeger (1993) examined the bias of var(β̂1) for the wrong choice ρ = 0. In the case of positive correlation within the cluster, the variance is underestimated. This corresponds to the results of Goldberger (1964) for positive autocorrelation in econometric models.
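The effect of (8.213) can be illustrated numerically. The following is a minimal sketch of our own (not from the book): it builds the compound–symmetric covariance of (8.211)–(8.212), simulates correlated responses, and compares the Aitken/GLS estimator with OLS. All numbers are made up for illustration.

```python
import numpy as np

# Made-up settings: N clusters of size n_i, compound-symmetric covariance.
rng = np.random.default_rng(0)
sigma2, rho, n_i, N = 1.0, 0.5, 4, 50

# One block of (8.211): sigma^2 (1 - rho) I + sigma^2 rho J
V_block = sigma2 * ((1 - rho) * np.eye(n_i) + rho * np.ones((n_i, n_i)))
# Block-diagonal covariance W of the full response vector, cf. (8.212)
W = np.kron(np.eye(N), V_block)

# Intercept plus one covariate
x = rng.normal(size=N * n_i)
X = np.column_stack([np.ones(N * n_i), x])
beta_true = np.array([1.0, 2.0])

# Draw correlated errors via the Cholesky factor of W
L = np.linalg.cholesky(W)
y = X @ beta_true + L @ rng.normal(size=N * n_i)

W_inv = np.linalg.inv(W)
b_gls = np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv @ y)   # estimator (8.213)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(b_gls, b_ols)  # both should be close to beta_true, but not identical
```

Both estimators are consistent here, as the text states; they merely weight the observations differently.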
The following problems arise in practice:
(i) identification of the covariance structure;
(ii) estimation of the correlation; and
(iii) application of an Aitken-type estimate.
However, it is no longer possible to use the usual GLM approach, because it does not take the correlation structure into consideration. Various approaches were developed as extensions of the GLM approach in order to include the correlation structure of the response:
• the marginal model;
• the random–effects model;
• the observation–driven model; and
• the conditional model.
For binary response, simplifications arise (Section 8.9.8). Liang and Zeger (1989) proved that the joint distribution of the yij can be described by ni logistic models for yij given yik (k ≠ j). Rosner (1984) used this approach and developed beta–binomial models.
8.9.2 Modeling Approaches for Correlated Response
The modeling approaches can be ordered according to diverse criteria.
Population–Averaged versus Subject–Specific Models
The essential difference between population–averaged (PA) and subject–specific (SS) models lies in the answer to the question of whether the
regression coefficients vary over the individuals. In PA models, the β's are independent of the specific individual i; examples are the marginal and conditional models. In SS models, the β's depend on the specific individual i and are therefore written as βi. An example of an SS model is the random–effects model.
Marginal, Conditional, and Random–Effects Models
In the marginal model, the regression is modeled separately from the dependence within the measurements, in contrast to the two other approaches. The marginal expectation E(yij) is modeled as a function of the explanatory variables and is interpreted as the mean response over the population of individuals with the same x. Hence, marginal models are mainly suitable for the analysis of covariate effects in a population.
The random–effects model, often also called the mixed model, assumes that there are fixed effects, as in the marginal model, as well as individual–specific effects. The dependent observations on each individual are assumed to be conditionally independent given the subject–specific effects.
Hence random–effects models are useful if one is interested in subject–specific behavior. Concerning interpretation, however, only the linear mixed model allows an easy interpretation of the fixed–effect parameters as population–averaged effects and of the others as subject–specific effects. Generalized linear mixed models are more complex, and even if a parameter is estimated as a fixed effect, it may not be easily interpreted as a population–averaged effect.
In the conditional model (observation–driven model), a time–dependent response yit is modeled as a function of the covariates and of the past response values yi,t−1, …, yi1. This is done by assuming a specific correlation structure among the response values. Conditional models are useful if the main point of interest is the conditional probability of a state or the transition between states.
8.9.3 Quasi–Likelihood Approach for Correlated Binary Response
The following sections are dedicated to binary response variables, especially in the bivariate case (i.e., cluster size ni = 2 for all i = 1, …, N).
In the case of a violation of independence, or in the case of a missing distributional assumption from the natural exponential family, the core of the ML method, namely the score function, may nevertheless be used for parameter estimation. We now specify the so–called quasi–score function (8.77) for binary response (cf. Section 8.1.7).
Let y′i = (yi1, …, yini) be the response vector of the ith cluster (i = 1, …, N) with true covariance matrix cov(yi), and let xij be the (p × 1)–vector of covariates corresponding to yij. Assume the variables yij are binary with values 1 and 0, and let P(yij = 1) = πij.
We then have µij = πij. Let π′i = (πi1, …, πini). Suppose that the link function is g(·), that is,
g(πij) = ηij = x′ijβ .
Let h(·) be the inverse function, that is,
µij = πij = h(ηij) = h(x′ijβ) .
For the canonical link
logit(πij) = ln( πij / (1 − πij) ) = g(πij) = x′ijβ
we have
πij = h(ηij) = exp(ηij) / (1 + exp(ηij)) = exp(x′ijβ) / (1 + exp(x′ijβ)) .
Hence

D = ( ∂µij/∂β ) = ( ∂πij/∂β ) .
We have

∂πij/∂β = (∂πij/∂ηij)(∂ηij/∂β) = (∂h(ηij)/∂ηij) xij ,
and hence, for i = 1, …, N and the (p × ni)–matrix X′i = (xi1, …, xini),

Di = D̃i Xi   with   D̃i = diag( ∂h(ηi1)/∂ηi1, …, ∂h(ηini)/∂ηini ) .
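For the canonical logit link, the diagonal entries ∂h(ηij)/∂ηij of D̃i equal πij(1 − πij); a quick numerical check (our own sketch, not book code):

```python
import numpy as np

# Check numerically that for the logistic link h(eta) = e^eta / (1 + e^eta),
# the derivative d h / d eta equals pi (1 - pi) with pi = h(eta).
def h(eta):
    return np.exp(eta) / (1.0 + np.exp(eta))

eta = np.linspace(-3.0, 3.0, 13)
pi = h(eta)
eps = 1e-6
numeric = (h(eta + eps) - h(eta - eps)) / (2 * eps)  # central difference
analytic = pi * (1 - pi)
print(np.max(np.abs(numeric - analytic)))  # close to zero
```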
For the quasi–score function for all N clusters, we now get
U(β) = ∑_{i=1}^{N} X′i D̃′i Vi^{-1} (yi − πi) , (8.214)

where Vi is the matrix of the working variances and covariances of the yij of the ith cluster. The solution of U(β) = 0 is found iteratively under further specifications, which we describe in the next section.
8.9.4 The Generalized Estimating Equation Method by Liang and Zeger
The variances are modeled as a function of the mean, that is,
vij = var(yij) = v(πij)φ . (8.215)
(In the binary case, the variance of the binomial distribution is often chosen: v(πij) = πij(1 − πij).) With these, the following matrix is formed:
Ai = diag(vi1, . . . , vini) . (8.216)
Since the structure of dependence is not known, an (ni × ni) quasi–correlation matrix Ri(α) is chosen for the response vector y′i = (yi1, …, yini) of the ith cluster according to
Ri(α) = ( 1          ρi12(α)   · · ·  ρi1ni(α)
          ρi21(α)    1         · · ·  ρi2ni(α)
          ...                         ...
          ρini1(α)   ρini2(α)  · · ·  1        ) ,   (8.217)
where the ρikl(α) are the correlations as functions of α (α may be a scalar or a vector). Ri(α) may vary over the clusters.
By multiplying the quasi–correlation matrix Ri(α) on both sides with the square root of the diagonal matrix Ai of the variances, we obtain a working covariance matrix

Vi(β, α, φ) = Ai^{1/2} Ri(α) Ai^{1/2} , (8.218)
which is no longer completely specified by the expectations, as in the case of independent response. We have Vi(β, α, φ) = cov(yi) if and only if Ri(α) is the true correlation matrix of yi.
If the matrices Vi in (8.214) are replaced by the matrices Vi(β, α, φ) from (8.218), we get the generalized estimating equations of Liang and Zeger (1986), that is,
U(β, α, φ) = ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1}(β, α, φ) (yi − πi) = 0 . (8.219)
The solutions are denoted by β̂G. For the quasi–Fisher matrix, we have
FG(β, α) = ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1}(β, α, φ) (∂πi/∂β) . (8.220)
To avoid the dependence on α in determining β̂G, Liang and Zeger (1986) proposed replacing α by an N^{1/2}–consistent estimate α̂(y1, …, yN, β, φ) and φ by φ̂ from (8.79), and determining β̂G from U(β, α̂, φ̂) = 0.
Remark. The iterative estimating procedure for the GEE is described in detail in Liang and Zeger (1986). For the computational implementation, an SAS macro by Karim and Zeger (1988) and a program by Kastner, Fieger and Heumann (1997) exist.
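As a concrete illustration of the iteration, the following is a minimal, self–written sketch (not the SAS macro or any book code) of a GEE fit for binary clusters with an exchangeable working correlation; all data, names, and settings are our own and purely illustrative.

```python
import numpy as np

# Made-up simulated data: N clusters of size n_i, binary response whose
# within-cluster correlation is induced by a shared cluster effect u_i.
rng = np.random.default_rng(1)
N, n_i, p = 200, 3, 2
X = rng.normal(size=(N, n_i, p))
beta_true = np.array([0.5, -0.5])
u = rng.normal(size=(N, 1))
y = (rng.random((N, n_i)) < 1 / (1 + np.exp(-(X @ beta_true + u)))).astype(float)

beta = np.zeros(p)
for _ in range(50):
    pi = 1 / (1 + np.exp(-(X @ beta)))              # N x n_i fitted means
    res = (y - pi) / np.sqrt(pi * (1 - pi))         # Pearson residuals
    # moment estimate of the exchangeable correlation alpha (phi = 1)
    alpha, pairs = 0.0, 0
    for j in range(n_i):
        for k in range(j + 1, n_i):
            alpha += np.sum(res[:, j] * res[:, k])
            pairs += N
    alpha /= pairs
    R = (1 - alpha) * np.eye(n_i) + alpha * np.ones((n_i, n_i))
    U, F = np.zeros(p), np.zeros((p, p))
    for i in range(N):
        A_sqrt = np.diag(np.sqrt(pi[i] * (1 - pi[i])))
        V = A_sqrt @ R @ A_sqrt                     # working covariance (8.218)
        D = (pi[i] * (1 - pi[i]))[:, None] * X[i]   # d pi_i / d beta'
        Vinv = np.linalg.inv(V)
        U += D.T @ Vinv @ (y[i] - pi[i])            # score contribution (8.219)
        F += D.T @ Vinv @ D                         # quasi-Fisher matrix (8.220)
    step = np.linalg.solve(F, U)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:
        break
print(beta, alpha)  # beta near the (attenuated) marginal effects, alpha > 0
```

The moment estimate of α used here is one simple choice of the N^{1/2}–consistent estimate α̂ mentioned in the text; Liang and Zeger (1986) discuss alternatives.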
If Ri(α) = Ini (i = 1, …, N) is chosen, then the GEEs reduce to the independence estimating equations (IEEs). The IEEs are

U(β, φ) = ∑_{i=1}^{N} (∂πi/∂β)′ Ai^{-1} (yi − πi) = 0 (8.221)
with Ai = diag(v(πij)φ). The solution is denoted by β̂I. Under some weak conditions (Theorem 1 in Liang and Zeger, 1986), β̂I is asymptotically consistent if the expectation πij = h(x′ijβ) is correctly specified and the dispersion parameter φ is consistently estimated.
β̂I is asymptotically normal:

β̂I ∼ N(β; FQ^{-1}(β, φ) F2(β, φ) FQ^{-1}(β, φ)) , (8.222)
where

FQ^{-1}(β, φ) = [ ∑_{i=1}^{N} (∂πi/∂β)′ Ai^{-1} (∂πi/∂β) ]^{-1} ,

F2(β, φ) = ∑_{i=1}^{N} (∂πi/∂β)′ Ai^{-1} cov(yi) Ai^{-1} (∂πi/∂β) ,
and cov(yi) is the true covariance matrix of yi.
A consistent estimate of the variance of β̂I is found by replacing β by β̂I, cov(yi) by its estimate (yi − π̂i)(yi − π̂i)′, and φ by φ̂ from (8.79) if φ is an unknown nuisance parameter. This consistency does not depend on the correct specification of the covariance.
The advantages of β̂I are that it is easy to calculate with standard software for generalized linear models and that, if the regression model is correctly specified, β̂I and the estimate of cov(β̂I) are consistent. However, β̂I loses efficiency if the correlation within the clusters is large.
8.9.5 Properties of the Generalized Estimating Equation Estimate β̂G
Liang and Zeger (1986, Theorem 2) state that, under some weak assumptions and under the conditions:

(i) α̂ is an N^{1/2}–consistent estimate of α, given β and φ;

(ii) φ̂ is an N^{1/2}–consistent estimate of φ, given β; and

(iii) the derivative ∂α̂(β, φ)/∂φ is independent of φ̂ and α̂ and is of stochastic order Op(1);
the estimate β̂G is consistent and asymptotically normal:

β̂G ∼ N(β, VG) (8.223)

with the asymptotic covariance matrix

VG = FQ^{-1}(β, α) F2(β, α) FQ^{-1}(β, α) , (8.224)
where

FQ^{-1}(β, α) = [ ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1} (∂πi/∂β) ]^{-1} ,

F2(β, α) = ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1} cov(yi) Vi^{-1} (∂πi/∂β) ,
and cov(yi) = E[(yi − πi)(yi − πi)′] is the true covariance matrix of yi. A short outline of the proof may be found in the Appendix of Liang and Zeger (1986).
The asymptotic properties hold only for N → ∞. Hence, it should be kept in mind that the estimation procedure should be used only for a large number of clusters.
An estimate V̂G of the covariance matrix VG may be found by replacing β, φ, and α in (8.224) by their consistent estimates, and cov(yi) by (yi − π̂i)(yi − π̂i)′.
If the covariance structure is specified correctly, so that Vi = cov(yi), then the covariance matrix of β̂G is the inverse of the expected Fisher information matrix:

VG = [ ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1} (∂πi/∂β) ]^{-1} = F^{-1}(β, α) .
The estimate of this matrix is more stable than that of (8.224), but it suffers a loss in efficiency if the correlation structure is specified incorrectly (cf. Prentice, 1988, p. 1040).
The method of Liang and Zeger leads to an asymptotic variance of β̂G that is independent of the choice of the estimates α̂ and φ̂ within the class of N^{1/2}–consistent estimates. The same is true for the asymptotic distribution of β̂G.
In the case of correct specification of the regression model, the estimates β̂G and V̂G are consistent, independent of the choice of the quasi–correlation matrix Ri(α). This means that even if Ri(α) is specified incorrectly, β̂G and V̂G stay consistent as long as α̂ and φ̂ are consistent. This robustness of the estimates is important, because the admissibility of the working covariance matrix Vi is difficult to check for small ni. An incorrect specification of Ri(α) can, however, reduce the efficiency of β̂G.
If the identity matrix is assumed for Ri(α), i.e., Ri(α) = I (i = 1, …, N), then the estimating equations for β reduce to the IEE. If the variances of the binomial distribution are chosen, as is usually done in the binary case, then the IEE and the ML score function (with binomially distributed variables) lead to the same estimates for β. However, the IEE method should be preferred in general, because the ML estimation procedure leads to incorrect variances of the estimate and hence, for example, to incorrect test statistics and p–values. This leads to incorrect conclusions, for instance, regarding the significance or nonsignificance of the covariates (cf. Liang and Zeger, 1993).
Diggle, Liang and Zeger (1994, Chapter 7.5) proposed checking the consistency of β̂G by fitting an appropriate model with various covariance structures. The estimates β̂G and their consistent variances are then compared. If these differ too much, the modeling of the covariance structure calls for more attention.
8.9.6 Efficiency of the Generalized Estimating Equation and Independence Estimating Equation Methods
Liang and Zeger (1986) stated the following about the comparison of β̂I and β̂G: β̂I is almost as efficient as β̂G if the true correlation α is small, and β̂I is very efficient if α is small and the data are binary. If α is large, then β̂G is more efficient than β̂I, and the efficiency of β̂G can be increased if the correlation matrix is specified correctly. In the case of a high correlation within the blocks, the loss of efficiency of β̂I compared to β̂G is larger if the number of subunits ni (i = 1, …, N) varies between the clusters than if the clusters are all of the same size.
8.9.7 Choice of the Quasi–Correlation Matrix Ri(α)
The working correlation matrix Ri(α) is chosen according to considerations such as simplicity, efficiency, and the amount of existing data. Furthermore, assumptions about the structure of the dependence among the data should be taken into account in this choice. As mentioned before, the importance of the correlation matrix is due to the fact that it influences the variance of the estimated parameters.
The simplest specification is the assumption that the repeated observations of a cluster are uncorrelated, that is,
Ri(α) = I, i = 1, . . . , N.
This assumption leads to the IEEs for uncorrelated response variables.
Another special case, which is the most efficient according to Liang and Zeger (1986, Section 4) but may be used only if the number of observations per cluster is small and the same for all clusters (e.g., equal to n), is given by the choice

Ri(α) = R(α),

where R(α) is left totally unspecified and may be estimated by the empirical correlation matrix. Here n(n − 1)/2 parameters have to be estimated.
If it is assumed that the same pairwise dependencies exist among all the response variables of one cluster, then the exchangeable correlation structure may be chosen:

Corr(yik, yil) = α ,   k ≠ l ,   i = 1, …, N .

This corresponds to the correlation assumption in random–effects models. If Corr(yik, yil) = α(|k − l|) is chosen, then the correlations are stationary. The specific form α(|k − l|) = α^{|k−l|} corresponds to the autocorrelation function of an AR(1) process.
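The two stationary working correlation structures just described can be sketched as follows (a minimal illustration of ours, not book code):

```python
import numpy as np

# Exchangeable: all off-diagonal entries equal alpha.
def exchangeable(alpha, n):
    return (1 - alpha) * np.eye(n) + alpha * np.ones((n, n))

# AR(1): entry (k, l) equals alpha^{|k - l|}.
def ar1(alpha, n):
    idx = np.arange(n)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

R_ex = exchangeable(0.3, 4)
R_ar = ar1(0.3, 4)
print(R_ex[0, 2], R_ar[0, 2])  # 0.3 versus 0.3**2 = 0.09
```

For the exchangeable structure a single correlation holds at every lag, while for AR(1) the correlation decays geometrically with the distance |k − l| between measurements.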
Further methods for parameter estimation in quasi–likelihood approaches are: the GEE1 method of Prentice (1988), which estimates α and β simultaneously from the GEEs for α and β; the modified GEE1 method of Fitzmaurice, Laird and Rotnitzky (1993), based on conditional odds ratios; the methods of Lipsitz, Laird and Harrington (1991) and Liang, Zeger and Qaqish (1992), based on marginal odds ratios for modeling the cluster correlation; the GEE2 method of Liang et al. (1992), which estimates δ′ = (β′, α) simultaneously as a joint parameter; and the pseudo–ML method of Zhao and Prentice (1990) and Prentice and Zhao (1991).
8.9.8 Bivariate Binary Correlated Response Variables
The previous sections introduced various methods developed for the regression analysis of correlated binary data. They were described in a general form for N blocks (clusters) of size ni. These methods may, of course, be used for bivariate binary data as well, which has the advantage of simplifying the matter.
In this section, the GEE and IEE methods are developed for the bivariate binary case. Afterward, an example demonstrates, for bivariate binary data, the difference between a naive ML estimate and the GEE method of Liang and Zeger (1986).
We have yi = (yi1, yi2)′ (i = 1, …, N). Each response variable yij (j = 1, 2) has its own vector of covariates x′ij = (xij1, …, xijp). The link function chosen for modeling the relationship between πij = P(yij = 1) and xij is the logit link
logit(πij) = ln( πij / (1 − πij) ) = x′ijβ . (8.225)
Let
π′i = (πi1, πi2) ,   ηij = x′ijβ ,   η′i = (ηi1, ηi2) . (8.226)
The logistic regression model has become the standard method for the regression analysis of binary data.
8.9.9 The Generalized Estimating Equation Method
From Section 8.9.4 it can be seen that the estimating equations for β have the form

U(β, α, φ) = S(β, α) = ∑_{i=1}^{N} (∂πi/∂β)′ Vi^{-1} (yi − πi) = 0 , (8.227)
where Vi = Ai^{1/2} Ri(α) Ai^{1/2}, Ai = diag(v(πij)φ) (j = 1, 2), and Ri(α) is the working correlation matrix. Since only one correlation coefficient ρ = Corr(yi1, yi2) (i = 1, …, N) has to be specified for bivariate binary data, and this is assumed to be constant, we have, for the correlation matrix,

Ri(α) = ( 1  ρ
          ρ  1 ) ,   i = 1, …, N . (8.228)
For the matrix of derivatives we have

(∂πi/∂β)′ = (∂h(ηi)/∂β)′ = (∂ηi/∂β)′ (∂h(ηi)/∂ηi)′
= (xi1, xi2) ( ∂h(ηi1)/∂ηi1   0
               0               ∂h(ηi2)/∂ηi2 ) .
Since

h(ηi1) = πi1 = exp(x′i1β) / (1 + exp(x′i1β))

and

exp(x′i1β) = πi1 / (1 − πi1),

we have

1 + exp(x′i1β) = 1 + πi1/(1 − πi1) = 1/(1 − πi1),

and

∂h(ηi1)/∂ηi1 = πi1 / (1 + exp(x′i1β)) = πi1(1 − πi1) (8.229)
holds. Analogously, we have

∂h(ηi2)/∂ηi2 = πi2(1 − πi2) . (8.230)
If the variance is specified as var(yij) = πij(1 − πij), φ = 1, then we get

(∂πi/∂β)′ = x′i ( var(yi1)  0
                  0          var(yi2) ) = x′i ∆i
with x′i = (xi1, xi2) and ∆i = diag(var(yi1), var(yi2)). For the covariance matrix Vi we have:
Vi = ( var(yi1)  0
       0          var(yi2) )^{1/2} ( 1  ρ
                                     ρ  1 ) ( var(yi1)  0
                                              0          var(yi2) )^{1/2}

   = ( var(yi1)                        ρ(var(yi1) var(yi2))^{1/2}
       ρ(var(yi1) var(yi2))^{1/2}      var(yi2)                   ) (8.231)

and for the inverse of Vi:

Vi^{-1} = [1 / ((1 − ρ²) var(yi1) var(yi2))] ( var(yi2)                       −ρ(var(yi1) var(yi2))^{1/2}
                                               −ρ(var(yi1) var(yi2))^{1/2}    var(yi1)                    )

        = [1 / (1 − ρ²)] ( [var(yi1)]^{-1}                 −ρ(var(yi1) var(yi2))^{-1/2}
                           −ρ(var(yi1) var(yi2))^{-1/2}    [var(yi2)]^{-1}              ) . (8.232)
If ∆i is multiplied by Vi^{-1}, we obtain

Wi = ∆i Vi^{-1} = [1 / (1 − ρ²)] (  1                               −ρ(var(yi1)/var(yi2))^{1/2}
                                    −ρ(var(yi2)/var(yi1))^{1/2}     1                           ) (8.233)
and, for the GEE method for β in the bivariate binary case,

S(β, α) = ∑_{i=1}^{N} x′i Wi (yi − πi) = 0 . (8.234)
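The closed form (8.233) of Wi is easy to verify numerically; a small sketch of ours with made–up values for πi1, πi2, and ρ:

```python
import numpy as np

# Made-up cell probabilities and correlation for one bivariate cluster.
pi1, pi2, rho = 0.3, 0.6, 0.4
v1, v2 = pi1 * (1 - pi1), pi2 * (1 - pi2)

Delta = np.diag([v1, v2])
A_sqrt = np.diag([np.sqrt(v1), np.sqrt(v2)])
R = np.array([[1.0, rho], [rho, 1.0]])
V = A_sqrt @ R @ A_sqrt                 # working covariance, cf. (8.231)
W = Delta @ np.linalg.inv(V)            # Delta_i V_i^{-1}, computed directly

# Closed form (8.233)
W_closed = (1 / (1 - rho**2)) * np.array(
    [[1.0, -rho * np.sqrt(v1 / v2)],
     [-rho * np.sqrt(v2 / v1), 1.0]])
print(np.max(np.abs(W - W_closed)))  # numerically zero
```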
According to Liang and Zeger (1986, Theorem 2), under some weak conditions and under the assumption that the correlation parameter has been consistently estimated, the solution β̂G is consistent and asymptotically normal with expectation β and covariance matrix (8.224).
8.9.10 The Independence Estimating Equation Method
If the response variables within each block are assumed to be independent, i.e., Ri(α) = I and Vi = Ai, then the GEE method reduces to the IEE method,
U(β, φ) = S(β) = ∑_{i=1}^{N} (∂πi/∂β)′ Ai^{-1} (yi − πi) = 0 . (8.235)
As we have just shown, we have, for the bivariate binary case,

(∂πi/∂β)′ = x′i ∆i = x′i ( var(yi1)  0
                           0          var(yi2) ) (8.236)

with var(yij) = πij(1 − πij), φ = 1, and

Ai^{-1} = ( [var(yi1)]^{-1}   0
            0                 [var(yi2)]^{-1} ) .
The IEE method then simplifies to

S(β) = ∑_{i=1}^{N} x′i (yi − πi) = 0 . (8.237)
The solution β̂I is consistent and asymptotically normal, according to Liang and Zeger (1986, Theorem 1).
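Equation (8.237) is exactly the ML score of an ordinary logistic regression applied to all 2N stacked observations, so β̂I can be computed with any logistic–regression fitter. A minimal Newton–Raphson sketch of ours on simulated data (names and numbers are made up):

```python
import numpy as np

# Simulated stacked observations x_i1, x_i2 for N clusters (2N rows).
rng = np.random.default_rng(2)
N = 500
x = rng.normal(size=(2 * N, 2))
beta_true = np.array([1.0, -1.0])
y = (rng.random(2 * N) < 1 / (1 + np.exp(-(x @ beta_true)))).astype(float)

beta = np.zeros(2)
for _ in range(25):
    pi = 1 / (1 + np.exp(-(x @ beta)))
    score = x.T @ (y - pi)                          # S(beta), cf. (8.237)
    fisher = (x * (pi * (1 - pi))[:, None]).T @ x   # expected information
    step = np.linalg.solve(fisher, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break
print(beta)  # close to beta_true
```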
8.9.11 An Example from the Field of Dentistry
In this section, we demonstrate the GEE procedure by means of a "twin" data set documented by the Dental Clinic in Karlsruhe, Germany (Walther, 1992). The focal point is the difference between a robust estimate (GEE method), which takes the correlation of the response variables into account, and the naive ML estimate. For the parameter estimation with the GEE method, an SAS macro (Karim and Zeger, 1988) as well as a procedure by Kastner et al. (1997) are available.
Description of the “Twin” Data Set
During the examined interval, 331 patients were each provided with two conical crowns in the Dental Clinic in Karlsruhe. Since 50 conical crowns showed missing values, and since the SAS macro for the GEE method needs complete data sets, these patients were excluded. Hence, the remaining 612 completely observed twin data sets were used for the estimation of the regression parameters. In this example, the twin pairs make up the clusters, and the twins themselves (first twin, second twin) are the subunits of the clusters.
The Response Variable
For all twin pairs in this study, the lifetime of the conical crowns was recorded in days. This lifetime is chosen as the response and is transformed into a binary response variable yij of the jth twin (j = 1, 2) in the ith cluster with

yij = 1, if the conical crown is in function longer than x days,
yij = 0, if the conical crown is in function no longer than x days.
Different values may be defined for x. In this example, the values of 360 days (1 year), 1100 days (3 years), and 2000 days (5 years) were chosen. Because the response variable is binary, the response probability of yij is modeled by the logit link (logistic regression). The model for the log–odds (i.e., the logarithm of the odds πij/(1 − πij) of the response yij = 1) is linear in the covariates; in the model for the odds itself, the covariates have a multiplicative effect. The aim of the analysis is to find out whether the prognostic factors have a significant influence on the response probability.
Prognostic Factors
The covariates that were included in the analysis with the SAS macro are:
• age (in years);
• sex (1 : male, 2 : female);
• jaw (1 : upper jaw, 2 : lower jaw); and
• type (1 : dentoalveolar design, 2 : transversal design).
All covariates, except for age, are dichotomous. The two types of conical crown constructions, the dentoalveolar and the transversal design, are distinguished as follows (cf. Walther, 1992):
• The dentoalveolar design connects all abutments exclusively by a rigidconnection that runs on the alveolar ridge.
• The transversal design is used if parts of the reconstruction have tobe connected by a transversal bar. This is the case if teeth in thefront area are not included in the construction.
A total of 292 conical crowns were included in a dentoalveolar design and 320 in a transversal design; 258 conical crowns were placed in the upper jaw and 354 in the lower jaw.
The GEE Method
A problem that arises with the twin data is that the twins within a block are correlated. If this correlation is not taken into account, the estimates β̂ stay unchanged but the variance of β̂ is underestimated. In the case of positive correlation within a cluster, we have
var(β̂)naive < var(β̂)robust .

Therefore,

β̂ / √var(β̂)naive > β̂ / √var(β̂)robust ,
which leads to incorrect tests and possibly to significant effects that would not be significant in a correct analysis (e.g., GEE). For this reason, appropriate methods that estimate the variance correctly should be chosen if the response variables are correlated.
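The underestimation of the naive variance can be made explicit with a small deterministic computation (our own sketch, not from the book): for covariates that are constant within clusters and a compound–symmetric covariance with ρ > 0, the correct ("robust") variance of the least–squares estimator exceeds the naive one by the exact factor 1 − ρ + niρ.

```python
import numpy as np

# Clusters of size 2 (twin pairs), positive within-cluster correlation.
N, n_i, sigma2, rho = 30, 2, 1.0, 0.6
V_block = sigma2 * ((1 - rho) * np.eye(n_i) + rho * np.ones((n_i, n_i)))
W = np.kron(np.eye(N), V_block)          # true covariance, cf. (8.212)

rng = np.random.default_rng(3)
z = np.repeat(rng.normal(size=N), n_i)   # covariate constant within a pair
X = np.column_stack([np.ones(N * n_i), z])

XtX_inv = np.linalg.inv(X.T @ X)
var_naive = sigma2 * XtX_inv                    # pretends W = sigma^2 I
var_robust = XtX_inv @ X.T @ W @ X @ XtX_inv    # correct sandwich form
# Both columns are cluster-constant, so var_robust = (1 - rho + n_i rho) * var_naive
print(var_naive[1, 1], var_robust[1, 1])        # robust is larger by factor 1.6
```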
The following regression model without interaction is assumed:

ln [ P(lifetime ≥ x) / P(lifetime < x) ] = β0 + β1·age + β2·sex + β3·jaw + β4·type .
Additionally, we assume that the dependencies between the twins are identical; hence the exchangeable correlation structure is suitable for describing them.
To demonstrate the effects of the various correlation assumptions on the estimation of the parameters, the following logistic regression models, which differ only in the assumed association structure, are compared:
Model 1: Naive (incorrect) ML estimation.
Model 2: Robust (correct) estimation, where independence is assumed, i.e., Ri(α) = I.
Model 3: Robust estimation with exchangeable correlation structure(ρikl = Corr(yik, yil) = α, k 6= l).
Model 4: Robust estimation with unspecified correlation structure (Ri(α) = R(α)).
As test statistics (z–naive and z–robust), the ratio of each estimate to its standard error is calculated.
Results
Table 8.7 summarizes the estimated regression parameters, the standard errors, the z–statistics, and the p–values of Models 2, 3, and 4 for the response variable
yij = 1, if the conical crown is in function longer than 360 days,
yij = 0, if the conical crown is in function no longer than 360 days.
It turns out that the β̂–values and the z–statistics are identical, independent of the choice of Ri, even though a high correlation between the twins exists. The exchangeable correlation model yields the value 0.9498 for the estimated correlation parameter α̂. In the model with the unspecified correlation structure, ρ̂i12 and ρ̂i21 were estimated as 0.9498 as well. The fact that the estimates of Models 2, 3, and 4 coincide was observed in the analyses of the response variables with x = 1100 and x = 2000 as well. This means that the choice of Ri has no influence on the estimation procedure in the case of bivariate binary response: the GEE method is robust with respect to the various correlation assumptions.
Table 8.8 compares the results of Models 1 and 2. A striking differencebetween the two methods is that the covariate age in the case of a naive
              Model 2 (Independence)   Model 3 (Exchangeable)   Model 4 (Unspecified)
Age   β̂ (SE)    0.017 (0.012)            0.017 (0.012)            0.017 (0.012)
      z (p)     1.330 (0.185)            1.330 (0.185)            1.330 (0.185)
Sex   β̂ (SE)   −0.117 (0.265)           −0.117 (0.265)           −0.117 (0.265)
      z (p)    −0.440 (0.659)           −0.440 (0.659)           −0.440 (0.659)
Jaw   β̂ (SE)    0.029 (0.269)            0.029 (0.269)            0.029 (0.269)
      z (p)     0.110 (0.916)            0.110 (0.916)            0.110 (0.916)
Type  β̂ (SE)   −0.027 (0.272)           −0.027 (0.272)           −0.027 (0.272)
      z (p)    −0.100 (0.920)           −0.100 (0.920)           −0.100 (0.920)

Table 8.7. Results of the robust estimates for Models 2, 3, and 4 for x = 360. Each cell shows the estimated regression coefficient β̂ with its standard error in parentheses (first row) and the z–statistic with its p–value in parentheses (second row).
        Model 1 (naive)            Model 2 (robust)
        σ̂       z      p–value    σ̂       z      p–value
Age     0.008    1.95   0.051*     0.012    1.33   0.185
Sex     0.190   −0.62   0.538      0.265   −0.44   0.659
Jaw     0.192    0.15   0.882      0.269    0.11   0.916
Type    0.193   −0.14   0.887      0.272   −0.10   0.920
* Indicates significance at the 10% level.

Table 8.8. Comparison of the standard errors, z–statistics, and p–values of Models 1 and 2 for x = 360.
ML estimation (Model 1) is significant at the 10% level, whereas this significance does not appear if the robust method under the independence assumption (Model 2) is used. Since the estimated regression parameters coincide, the robust variances of β̂ are larger and, accordingly, the robust z–statistics are smaller than the naive z–statistics. This result shows clearly that the ML method, which is incorrect in this case, underestimates the variances of β̂ and hence leads to an incorrect age effect.
Tables 8.9 and 8.10 summarize the results for the x–values 1100 and 2000. Table 8.9 shows that if the response variable is modeled with x = 1100, then none of the observed covariates is significant. As before, the estimated
        β̂          Model 1 (naive)            Model 2 (robust)
                    σ̂       z      p–value    σ̂       z      p–value
Age     0.0006      0.008    0.08   0.939      0.010    0.06   0.955
Sex    −0.0004      0.170   −0.00   0.998      0.240   −0.00   0.999
Jaw     0.1591      0.171    0.93   0.352      0.240    0.66   0.507
Type    0.0369      0.172    0.21   0.830      0.242    0.15   0.878

Table 8.9. Comparison of the standard errors, z–statistics, and p–values of Models 1 and 2 for x = 1100.
        β̂          Model 1 (naive)            Model 2 (robust)
                    σ̂       z      p–value    σ̂       z      p–value
Age    −0.0051      0.013   −0.40   0.691      0.015   −0.34   0.735
Sex    −0.2177      0.289   −0.75   0.452      0.399   −0.55   0.586
Jaw     0.0709      0.287    0.25   0.805      0.412    0.17   0.863
Type    0.6531      0.298    2.19   0.028*     0.402    1.62   0.104
* Indicates significance at the 10% level.

Table 8.10. Comparison of the standard errors, z–statistics, and p–values of Models 1 and 2 for x = 2000.
correlation parameter α̂ = 0.9578 indicates a strong dependency between the twins. In Table 8.10, the covariate type has a significant influence in the case of naive estimation; with the GEE method (R = I), it might be considered significant at the 10% level (p–value = 0.104). The result β̂type = 0.6531 indicates that a dentoalveolar design significantly increases the log–odds of the response variable

yij = 1, if the conical crown is in function longer than 2000 days,
yij = 0, if the conical crown is in function no longer than 2000 days.
Assuming the model

P(lifetime ≥ 2000) / P(lifetime < 2000) = exp(β0 + β1·age + β2·sex + β3·jaw + β4·type),

the odds P(lifetime ≥ 2000)/P(lifetime < 2000) for a dentoalveolar design are higher than the odds for a transversal design by the factor exp(β̂4) = exp(0.6531) = 1.92; alternatively, the odds ratio equals 1.92. The correlation parameter yields the value 0.9035.
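The quoted factor is plain arithmetic and can be checked directly:

```python
import math

# Odds ratio for the type covariate: exp(beta_4) with beta_4 = 0.6531.
odds_ratio = math.exp(0.6531)
print(round(odds_ratio, 2))  # 1.92
```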
In summary, it can be said that age and type are significant but not time–dependent covariates. The robust estimation yields no significant interaction, and a high correlation α̂ exists between the twins of a pair.
Problems
The GEE estimations, which were carried out stepwise, have to be compared with caution, because they are not independent, due to the time effect in the response variables. Time–adjusted GEE methods that could be applied in this example are still missing. Therefore, further efforts are necessary in the field of survivorship analysis, in order to complement the standard procedures, such as the Kaplan–Meier estimate and the log–rank test, which are based on the independence of the response variables.
8.9.12 Full Likelihood Approach for Marginal Models
A useful full likelihood approach for marginal models in the case of multivariate binary data was proposed by Fitzmaurice et al. (1993). Their starting point is the joint density
f(y; Ψ, Ω) = P(Y1 = y1, …, YT = yT; Ψ, Ω) = exp{ y′Ψ + w′Ω − A(Ψ, Ω) } (8.238)

with y = (y1, …, yT)′, w = (y1y2, y1y3, …, yT−1yT, …, y1y2···yT)′, Ψ = (Ψ1, …, ΨT)′, and Ω = (ω12, ω13, …, ωT−1,T, …, ω12···T)′. Further,

exp{ A(Ψ, Ω) } = ∑_{y ∈ {0,1}^T} exp{ y′Ψ + w′Ω }
is a normalizing constant. Note that this is essentially the saturated parametrization of a loglinear model for T binary responses, since interactions of order 2 to T are included. A model that considers only the pairwise interactions, i.e., w = (y1y2, …, yT−1yT) and Ω = (ω12, ω13, …, ωT−1,T), was already proposed by Cox (1972b) and by Zhao and Prentice (1990). These models are special cases of the so–called partial exponential families introduced by Zhao, Prentice and Self (1992). The idea of Fitzmaurice et al. (1993) was then to make a one–to–one transformation of the canonical parameter vector Ψ to the mean vector µ, which can then be linked to covariates via link functions, as in logistic regression. This idea of transforming canonical parameters one–to–one into (possibly centralized) moment parameters can be generalized to higher moments and to dependent categorical variables with more than two categories. Because the details, theoretical and computational, are somewhat complex, we refer the reader to Lang and Agresti (1994), Molenberghs and Lesaffre (1994), Glonek (1996), Heagerty and Zeger (1996), and Heumann (1998). Each of these sources gives different possibilities for modeling the pairwise and higher interactions.
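For small T, the normalizing constant A(Ψ, Ω) in (8.238) can be computed by brute–force enumeration of {0, 1}^T. A minimal sketch of ours for the pairwise–interaction special case (all parameter values are made up):

```python
import itertools
import numpy as np

T = 3
Psi = np.array([0.2, -0.1, 0.4])
Omega = np.array([0.3, 0.0, -0.2])   # omega_12, omega_13, omega_23

def w_vector(y):
    # pairwise products y_k y_l in lexicographic order
    return np.array([y[k] * y[l] for k in range(T) for l in range(k + 1, T)])

# exp{A} = sum over all y in {0,1}^T of exp{y'Psi + w'Omega}
total = sum(np.exp(np.dot(y, Psi) + np.dot(w_vector(y), Omega))
            for y in itertools.product([0, 1], repeat=T))
A = np.log(total)

# With this A, the probabilities (8.238) sum to one:
probs = [np.exp(np.dot(y, Psi) + np.dot(w_vector(y), Omega) - A)
         for y in itertools.product([0, 1], repeat=T)]
print(sum(probs))  # 1.0 up to rounding
```

The enumeration grows as 2^T, which is exactly why the transformation and modeling devices cited above are needed for larger T.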
8.10 Exercises and Questions
8.10.1 Let two models be defined by their design matrices X1 and X2 = (X1, X3). Name the test statistic for testing H0 : “Model X1 holds” and its distribution.

8.10.2 What is meant by overdispersion? How is it parametrized in the case of a binomial distribution?

8.10.3 Why would a quasi–loglikelihood approach be chosen? How is the correlation in cluster data parametrized?
8.10.4 Compare the models of two–way classification for continuous, normal data (ANOVA) and for categorical data. What are the reparametrization conditions in each case?
8.10.5 Given the following G² analysis of a two–way model with all submodels:

Model   G²    p–value
A       200   0.00
B       100   0.00
A + B    20   0.10
A ∗ B     0   1.00

which model is valid?
8.10.6 Given the following I × 2 table for X: age group and Y : binary response:

         Y = 1   Y = 0
< 40       10      8
40–50      15     12
50–60      20     12
60–70      30     20
> 70       30     25

analyze the trend of the sample logits.
9 Repeated Measures Model
9.1 The Fundamental Model for One Population
In contrast to the previous chapters, we now assume that instead of having only one observation per object/subject (e.g., patient), we have repeated observations. These repeated measurements are collected at previously exactly defined times. The principal idea is that these observations give information about the development of a response Y. This response might, for instance, be the blood pressure (measured every hour) for a fixed therapy (treatment A), the blood sugar level (measured every day of the week), or the monthly training performance of sprinters for training method A, etc., i.e., variables which change with time (or a different scale of measurement). The aim of a design like this is not so much the description of the average behavior of a group (with a fixed treatment), but rather the comparison of two or more treatments and their effect across the scale of measurement (e.g., time), i.e., the treatment or therapy comparison.
First of all, before we deal with this interesting question, let us introduce the model for one treatment, i.e., for one sample from one population.
The Model
We index the I elements (e.g., patients) with i = 1, . . . , I and the measurements with j = 1, . . . , p, so that the response of the jth measurement on the ith element (individual) is denoted by yij . The general basis for many
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_9, © Springer Science+Business Media, LLC 2009
analyses is the specific modeling approach of a mixed model
yij = µij + αij + εij (9.1)
with the three components:
(i) µij is the average response of yij over hypothetical repetitions with randomly chosen individuals from the population. Thus, µij would stay unchanged if the ith element were substituted by any other element of the sample.

(ii) αij represents the deviation between yij and µij for the particular individual of the sample that was selected as the ith element. Thus, under hypothetical repetitions, this individual would have mean µij + αij .

(iii) εij describes the random deviation of the ith individual from the hypothetical mean µij + αij .

µij is a fixed effect. αij , on the other hand, is a random effect that varies over the index i (i.e., over the individuals, e.g., patients); hence αij is a specific characteristic of the individual. “To be poetic, µij is an immutable constant of the universe, αij is a lasting characteristic of the individual” (Crowder and Hand, 1990, p. 15). Since µij does not vary over the individuals, the index i could be dropped. However, we retain this index in order to be able to identify the individuals.
The vector µi = (µi1, . . . , µip)′ is called the µ–profile of the individual. The following assumptions are made:
(A1) The αij are random effects that vary over the population for given j according to

E(αij) = 0 (for all i, j), (9.2)
var(αij) = σ2αjj . (9.3)
(A2) The errors εij vary over the individuals for given j according to
E(εij) = 0 (for all i, j), (9.4)
var(εij) = σ2j . (9.5)
(A3) For different individuals i ≠ i′, the α–profiles are uncorrelated, i.e.,

cov(αij , αi′j′) = 0 (i ≠ i′) . (9.6)

However, for different measurements j ≠ j′, the α–profiles of an individual i are correlated:

cov(αij , αij′) = σ2αjj′ (j ≠ j′) . (9.7)

This assumption is essential for the repeated measures model, since it models the natural assumption that the response of an individual over the occasions j is an interdependent characteristic of that individual.
(A4) The random errors are uncorrelated according to
E(εijεi′j′) = 0 (for all i, i′, j, j′) . (9.8)
(A5) The random components αij and εij are uncorrelated according to
E(αijεi′j′) = 0 (for all i, i′, j, j′) . (9.9)
(A6) The αij and εij are normally distributed.
From these assumptions it follows that
E(yij) = µij (9.10)
and (with δij the Kronecker symbol)
cov(yij , yi′j′) = E((αij + εij)(αi′j′ + εi′j′))
= E(αijαi′j′ + αijεi′j′ + εijαi′j′ + εijεi′j′)
= δii′(σ2αjj′ + δjj′σ2j) . (9.11)
If homogeneity of the variance over j is called for, i.e.,
σ2αjj′ = σ2α (9.12)
and
σ2j = σ2 , (9.13)
then the covariance (9.11) simplifies to
cov(yij , yi′j′) = δii′(σ2α + δjj′σ2) . (9.14)
Thus, the variance is
var(yij) = σ2α + σ2 . (9.15)
The relation (9.14) expresses that two different individuals i ≠ i′ are uncorrelated, although the observations of an individual i are correlated over the measurements:

cov(yij , yi′j′) = 0 (i ≠ i′), (9.16)
cov(yij , yij′) = σ2α (j ≠ j′) . (9.17)
If the intraclass correlation coefficient for one individual over different measurements is taken, then

ρ(j, j′) = ρ = cov(yij , yij′) / √(var(yij) var(yij′)) = σ2α/(σ2α + σ2) . (9.18)
The covariance matrix of every individual i (i = 1, . . . , I) is then of the following form:

var(yi) = var((yi1, . . . , yip)′) = Σ = σ2Ip + σ2αJp (9.19)
with Jp = 1p1′p (cf. Definition A.7). This matrix, which we already became acquainted with in Section 3.9, is called compound symmetric.
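As a small numerical illustration (the variance components are hypothetical and NumPy is assumed), the compound symmetric matrix (9.19) and the intraclass correlation (9.18) can be constructed directly:

```python
import numpy as np

p = 4
sigma2, sigma2_alpha = 2.0, 3.0   # hypothetical variance components
# Sigma = sigma^2 I_p + sigma_alpha^2 J_p, cf. (9.19)
Sigma = sigma2 * np.eye(p) + sigma2_alpha * np.ones((p, p))

# diagonal elements: sigma_alpha^2 + sigma^2; off-diagonal: sigma_alpha^2, cf. (9.14)
rho = sigma2_alpha / (sigma2_alpha + sigma2)   # intraclass correlation (9.18)
```

Here every pair of occasions has the same correlation ρ, which is exactly the restriction discussed later under the sphericity condition.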
Remark. The designs of Chapters 4 to 7 always had a covariance structure σ2I, with the exception of the mixed model from Section 7.6.2 (cf. (7.91)). Hence, the assumptions of the classical linear regression model (3.23) were valid.
Because of the compound symmetry, we now have a generalized linear regression model, and the parameter vector β has to be estimated according to the Gauss–Markov–Aitken theorem by the generalized least–squares estimate
b = (X ′Σ−1X)−1X ′Σ−1y.
However (according to Theorem 3.22 by McElroy (1967)), the ordinary and the generalized least–squares estimates are identical if and only if Σ has the structure (9.19), under the assumption that the model contains the constant 1. The error structure Σ from (9.19) is ignored if the OLS estimate is applied, i.e., it does not have to be estimated. Hence, more degrees of freedom are available for the residual variance. This explains the preference given to the univariate ANOVA compared to the MANOVA for the comparison of therapies in two groups, if they are treated according to the repeated measures design, and if the assumption of compound symmetry holds for both groups separately or, rather, if an assumption derived from this holds for the difference in response. This will be discussed in detail further on.
9.2 The Repeated Measures Model for Two Populations
We assume that two treatments, I and II, are to be compared with the repeated measures design. Additionally, we assume:
• n1 individuals receive treatment I;
• n2 individuals receive treatment II;
• both groups are homogeneous with respect to all essential prognostic factors for a response variable Y of interest; and

• repeated measurements are realized for both groups at occasions j = 1, . . . , p.
This results in two matrices of sample vectors (rows: individuals; columns: occasions 1, . . . , p):

Y (I) = ( y111  . . .  y11p )   individual I1
        (       . . .       )
        ( y1n11 . . . y1n1p )   individual In1

Y (II) = ( y211  . . .  y21p )   individual II1
         (       . . .       )
         ( y2n21 . . . y2n2p )   individual IIn2
The subscripts of ykij stand for:

k = 1 or 2: treatment I or II ,
i = 1, . . . , nk: individual ,
j = 1, . . . , p: occasion (time of measurement) .
The response matrices Y (I) and Y (II) are assumed to be independent. We introduce the fixed factor “treatment” into the model (9.1) and choose the following parametrization
ykij = µ + αk + βj + (αβ)kj + aki + εkij . (9.20)
These components have the following meaning:
µ is the overall mean;
αk is the treatment effect;
βj is the occasion effect (= time effect);
(αβ)kj is the treatment × time interaction;
aki is the random effect of the ith individual in the kth treatment; and
εkij is the random error.
The effects αk, βj , (αβ)kj are assumed to be fixed with the usual constraints for fixed effects, i.e., ∑k αk = 0, ∑j βj = 0, and ∑k (αβ)kj = ∑j (αβ)kj = 0.
The effects aki and the errors εkij , however, are random. Hence, (9.20) is a mixed model.
For the random variables the following assumptions hold:
(i) The vector εk = (εk11, . . . , εknkp)′, k = 1, 2, is normally distributed according to
εk ∼ N(0, σ2I) . (9.21)
(ii) The vector ak = (ak1, . . . , aknk)′, k = 1, 2, is normally distributed
according to
ak ∼ N(0, σ2αI) . (9.22)
(iii) Both random variables are independent
E(εka′k′) = 0 (k, k′ = 1, 2) . (9.23)
With these assumptions, we obtain the expectation of ykij :
E(ykij) = µkj = µ + αk + βj + (αβ)kj , (9.24)
and for the expectation vector of the ith individual in the kth treatment, i.e., for yki = (yki1, . . . , ykip)′, we obtain
E(yki) = µk = (µk1, . . . , µkp)′, k = 1, 2 . (9.25)
The vector µk, which represents the mean vector over the p observations of an individual and which is identical for all nk individuals of a group, is called the µk–profile of the individuals (Crowder and Hand, 1990, p. 26; Morrison, 1983, p. 153). The observation vector yki, on the other hand, is called the curve of progress of the ith individual in the kth treatment group.
With (9.24) and the assumptions (9.21)–(9.23), we have
cov(ykij , yk′i′j′) =
  σ2α + σ2, if k = k′, i = i′, j = j′,
  σ2α,      if k = k′, i = i′, j ≠ j′,
  0,        otherwise. (9.26)
Hence, the (p × p)–covariance matrix Σk (k = 1, 2) of the observation vectors yki (i = 1, . . . , nk) is of the form
Σk = σ2Ip + σ2αJp (9.27)
(cf. (9.19)), which is the structure of compound symmetry.
Remark. The reparametrization of (9.1) into (9.20) maintained all the assumptions of Section 9.1. Model (9.20) has the advantage that it can adopt the structure of the mixed models, as well as the estimation and interpretation of the parameters. For the correlation between the observations
ρ(ykij , yk′i′j′) =
  σ2α/(σ2α + σ2), if k = k′, i = i′, j ≠ j′,
  1,              if k = k′, i = i′, j = j′,
  0,              otherwise, (9.28)
we find:
(1) The observations, and hence the observation vectors, of individuals from different groups are uncorrelated. Due to the normal distribution they are independent as well.
(2) Observations, or rather observation vectors, of different individuals of the same group are uncorrelated (independent).
(3) Observations of an individual at different times of measurement are correlated (dependent) with the so–called intraclass correlation

ρ = σ2α/(σ2α + σ2) . (9.29)
9.3 Univariate and Multivariate Analysis
Parametric procedures for analyzing continuous data require a distributional assumption. Here the normal distribution is available as an extensive and, after the elimination of outliers or smoothing, adequate class of distributions. Often, however, the variables have to be transformed first. The comparison of therapies is part of the general complex of mean comparisons of normally distributed populations. Therapy comparison, however, requires only the far weaker assumption that the differences between the populations are normal.
Multivariate procedures for the mean comparison of two independent normal distributions are constructed in analogy to univariate procedures. The major principles will be explained in the following section.
9.3.1 The Univariate One–Sample Case
Given a sample (y1, . . . , yn) from N(µ, σ²) with the yi independent and identically distributed. Then ȳ ∼ N(µ, σ²/n) and (n − 1)s²/σ² ∼ χ²n−1. The t–test for H0 : µ = µ0 is given by tn−1 = [(ȳ − µ0)/s]√n .
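As a numerical sketch (not from the original text; simulated data, NumPy and SciPy assumed), the one–sample t–test can be computed directly from the formula above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=30)   # simulated sample (illustration only)

mu0 = 0.0
n = len(y)
t = (y.mean() - mu0) / y.std(ddof=1) * np.sqrt(n)   # t_{n-1} statistic
p_value = 2 * stats.t.sf(abs(t), df=n - 1)          # two-sided p-value
```

The hand-computed statistic agrees with the library routine `scipy.stats.ttest_1samp(y, mu0)`.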
9.3.2 The Multivariate One–Sample Case
We assume that not only one random variable is observed, but a p–dimensional vector of random variables. The sample size is n. The sample is then of the form

Y (n × p) = (y′1, . . . , y′n)′ = ((yij)), i = 1, . . . , n, j = 1, . . . , p,

i.e., the ith row of Y contains the p measurements y′i = (yi1, . . . , yip) of the ith individual,
and we assume for every vector yi: yi i.i.d. ∼ Np(µ, Σ), with µ′ = (µ1, . . . , µp) and Σ positive definite. Hence
Y ∼ N((µ′, . . . , µ′)′, diag(Σ, . . . , Σ)) , (9.30)

i.e., the rows of Y are independent, each with mean µ and covariance matrix Σ.
The sample mean vector is
y.. = (y.1, . . . , y.p)′ (9.31)
with
y.j = (1/n) ∑i=1,…,n yij (j = 1, . . . , p) (9.32)
and the sample covariance matrix is

S = (Sjh) = [1/(n − 1)] ∑i=1,…,n (yi − y..)(yi − y..)′ (9.33)

with the elements

Sjh = (n − 1)−1 ∑i=1,…,n (yij − y.j)(yih − y.h) . (9.34)
Hence
y.. ∼ Np(µ, Σ/n) (9.35)
with µ′ = (µ1, . . . , µp) and
(n− 1)S ∼ Wp(Σ, n− 1) (9.36)
distributed independently, where Wp denotes the p–dimensional Wishart distribution with (n − 1) degrees of freedom.
Definition 9.1. Let X = (x1, . . . , xn)′ be an (n × p)–data matrix from an Np(0, Σ), where x1, . . . , xn are independent and identically Np(0, Σ)–distributed. The (p × p)–matrix
W = X′X = ∑i=1,…,n xix′i ∼ Wp(Σ, n)
then has a Wishart distribution with n degrees of freedom.
For p = 1, we have X′X = ∑i=1,…,n x²i = x′x ∼ W1(σ², n), so that W1(σ², n) = σ²χ²n holds. Hence, the Wishart distribution is the multivariate analog of the χ²–distribution.
Definition 9.2. A random variable u has a Hotelling T²–distribution with the parameters p and n if it can be expressed as
u = nx′W−1x (9.37)
with
x ∼ Np(0, I) and W ∼ Wp(I, n)
being independent. We write
u ∼ T 2(p, n) . (9.38)
Remark. If x ∼ Np(µ, Σ) and W ∼ Wp(Σ, n), and x and W are independent, then
n(x− µ)′W−1(x− µ) ∼ T 2(p, n) . (9.39)
The T²–distribution is equivalent to the F–distribution (Mardia, Kent and Bibby, 1979, p. 74):

T²(p, n) = [np/(n − p + 1)] Fp,n−p+1 . (9.40)
The multivariate two–sided hypothesis
H0 : µ = µ0 against H1 : µ ≠ µ0 (9.41)
is tested in analogy to the t–test with the test statistic by Hotelling
T 2 = n(y.. − µ0)′S−1(y.. − µ0) , (9.42)
where (y.. − µ0)′S−1(y.. − µ0) is the Mahalanobis–D² statistic. If H0 holds, then the test statistic
F = [(n − p)/(p(n − 1))] T² (9.43)

has an Fp,n−p–distribution, according to (9.36) and (9.40) (replace n by n − 1). The decision rule is as follows:
do not reject H0 : µ = µ0 if
T² ≤ [p(n − 1)/(n − p)] Fp,n−p;1−α . (9.44)
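A minimal implementation of the test (9.42)–(9.44), assuming NumPy and SciPy, might look as follows; the data are simulated purely for illustration:

```python
import numpy as np
from scipy import stats

def hotelling_one_sample(Y, mu0):
    """T^2 = n (ybar - mu0)' S^{-1} (ybar - mu0), cf. (9.42), with the
    F transformation (9.43); Y is an (n x p) data matrix."""
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    S = np.atleast_2d(np.cov(Y, rowvar=False))   # sample covariance matrix (9.33)
    d = ybar - mu0
    T2 = n * d @ np.linalg.solve(S, d)
    F = (n - p) / (p * (n - 1)) * T2             # ~ F_{p, n-p} under H0
    return T2, F, stats.f.sf(F, p, n - p)

rng = np.random.default_rng(1)
Y = rng.multivariate_normal(mean=np.zeros(3), cov=np.eye(3), size=20)
T2, F, pval = hotelling_one_sample(Y, np.zeros(3))
```

For p = 1 the statistic reduces to the squared one–sample t statistic, which gives a simple correctness check.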
Idea of Proof. This test procedure is dealt with in detail in the standard literature for multivariate analysis (cf., e.g., Timm, 1975, pp. 158–166; Morrison, 1983, pp. 128–134). Hence, we only want to give a short outline of the proof.
The decision rule (9.44) is derived by the union–intersection principle that dates back to Roy (1953; 1957). Assume y ∼ Np(µ, Σ) and let a ≠ 0 be any (p × 1)–vector. Hence (cf. A.55)
a′y ∼ N1(a′µ, a′Σa) = N1(µa, σ2a) . (9.45)
If H0 : µ = µ0 [(9.41)] is true, then H0a : µa = a′µ0 = µ0a is true for all vectors a as well. If, on the other hand, H0a is true for every a ≠ 0, then H0 is true as well.
Hence, the multivariate hypothesis H0 : µ = µ0 is the intersection of the univariate hypotheses:

H0 = ⋂a≠0 H0a . (9.46)
Let Y (n × p) be a sample from N(µ, Σ) with y′.. = (y.1, . . . , y.p) and S from (9.33). Every univariate hypothesis H0a : a′µ = a′µ0 is tested against
its two–sided alternative H1a : a′µ ≠ a′µ0 by the t–statistic

t(a) = √n · a′(y.. − µ0)/√(a′Sa) , (9.47)
and the acceptance region for H0a is given by
t2(a) ≤ t2n−1,1−α/2 . (9.48)
Hence, the multivariate acceptance region is the intersection of all univariate acceptance regions:

⋂a≠0 {t²(a) ≤ t²n−1,1−α/2} . (9.49)
Therefore, this region has to contain the largest t²(a), so that (9.49) is equivalent to

maxa t²(a) ≤ t²n−1,1−α/2 . (9.50)
Hence, the multivariate test for H0 : µ = µ0 can be based on t²(a). Since t²(a) is dimensionless and unaffected by a change of scale of the elements of a, this indeterminacy can be eliminated by a constraint such as, for instance,
a′Sa = 1 . (9.51)
The optimization problem maxa {t²(a) | a′Sa = 1} is now equivalent to

maxa {n a′(y.. − µ0)(y.. − µ0)′a − λ(a′Sa − 1)} . (9.52)
Differentiation with respect to a, and to the Lagrangian multiplier λ (Theorems A.63–A.67), yields the system of normal equations
[n(y.. − µ0)(y.. − µ0)′ − λS] a = 0 (9.53)
and
a′Sa = 1 . (9.54)
Premultiplication of (9.53) by a′, and taking (9.54) and (9.47) into account, gives

λ = n a′(y.. − µ0)(y.. − µ0)′a = t²(a | a′Sa = 1) . (9.55)
On the other hand, (9.53), as a homogeneous system in a, has a nontrivial solution a ≠ 0 only if the determinant of the matrix vanishes. The matrix (y.. − µ0)(y.. − µ0)′ is of rank 1. With the determinantal condition (S is assumed to be regular), (9.53) yields, according to

0 = |n(y.. − µ0)(y.. − µ0)′ − λS| = |nS−1/2(y.. − µ0)(y.. − µ0)′S−1/2 − λIp| · |S| ,
the characteristic equation for the first matrix, which is symmetric and of rank 1 as well.
The only nontrivial eigenvalue of a matrix of rank 1 is the trace of this matrix (corollary to Theorem A.10):

λ = n tr{S−1/2(y.. − µ0)(y.. − µ0)′S−1/2} = n(y.. − µ0)′S−1(y.. − µ0) . (9.56)
Hence t²(a | a′Sa = 1) equals Hotelling’s T² from (9.42).

The test statistic derived according to the union–intersection principle is equivalent to the likelihood–ratio statistic. However, this equivalence is not true in general. The advantage of the union–intersection test is that in the case of a rejection of H0, it is possible to test which one of the rejection regions caused this. By choosing a = ei, it can be tested which components of µ are responsible for the rejection of H0 : µ = µ0. This is not possible for the likelihood–ratio test. Furthermore, the importance of the union–intersection principle also lies in the fact that simultaneous confidence intervals for µ can be computed (Fahrmeir and Hamerle, 1984, p. 81). With
maxa≠0 t²(a) = n(y.. − µ0)′S−1(y.. − µ0) = T² (9.57)
and (cf. (9.43))
T² = [p(n − 1)/(n − p)] Fp,n−p (9.58)
we have for µ = µ0
P{ [(n − p)/(p(n − 1))] T² ≤ Fp,n−p,1−α } = 1 − α (9.59)
or, equivalently,
P{ ⋂a≠0 [ (n − p)n/(p(n − 1)) · (a′(y.. − µ))²/(a′Sa) ≤ Fp,n−p,1−α ] } = 1 − α . (9.60)
These confidence regions hold simultaneously for all a′µ with a ∈ Rp. If only a few comparisons are of interest, i.e., only a few ai, then we have
P (a′iy.. − c ≤ a′iµ ≤ a′iy.. + c) ≥ 1− α (9.61)
with

c² = Fp,n−p,1−α [p(n − 1)/((n − p)n)] a′iSai . (9.62)
In order to assure the confidence coefficient 1 − α for the chosen comparisons, i.e., for a′1µ, . . . , a′kµ with k ≤ p, and to simultaneously shorten
the length of the interval, the Bonferroni method is applied. Assume Ei
(i = 1, . . . , k) is the event that the ith confidence interval covers theparameter a′iµ, and also assume that αi = 1 − P (Ei) = P (Ei) is thecorresponding significance level. Let Ei be the appropriate complementaryevent, then
P( ⋂i=1,…,k Ei ) = 1 − P( ⋃i=1,…,k Ēi ) ≥ 1 − ∑i=1,…,k P(Ēi) = 1 − ∑i=1,…,k αi . (9.63)
Hence, (1 − ∑i αi) is a lower limit for the real simultaneous confidence coefficient

1 − δ = P( ⋂i=1,…,k Ei ) .
If αi = α/k is chosen, then

P( ⋂i=1,…,k Ei ) ≥ 1 − α .
The corresponding simultaneous confidence intervals are

a′iy.. ± √( F1,n−1,1−α/k · a′iSai/n ) . (9.64)
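The Bonferroni intervals (9.64) can be sketched as follows (illustrative Python, not from the original text; the data are simulated and the comparison vectors ai are chosen arbitrarily):

```python
import numpy as np
from scipy import stats

def bonferroni_intervals(Y, A, alpha=0.05):
    """Simultaneous intervals a_i'ybar +/- sqrt(F_{1,n-1,1-alpha/k} a_i'S a_i / n),
    cf. (9.64); the k comparison vectors a_i are the rows of A."""
    n = Y.shape[0]
    k = A.shape[0]
    ybar = Y.mean(axis=0)
    S = np.cov(Y, rowvar=False)
    Fq = stats.f.ppf(1 - alpha / k, 1, n - 1)   # F_{1,n-1,1-alpha/k} quantile
    return [(a @ ybar - np.sqrt(Fq * a @ S @ a / n),
             a @ ybar + np.sqrt(Fq * a @ S @ a / n)) for a in A]

rng = np.random.default_rng(2)
Y = rng.multivariate_normal(np.zeros(3), np.eye(3), size=25)
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])   # two chosen comparisons (k = 2)
ivs = bonferroni_intervals(Y, A)
```

Note that F1,n−1,1−α/k equals the squared t quantile tn−1,1−α/(2k), so these are the familiar Bonferroni–adjusted t intervals.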
9.4 The Univariate Two–Sample Case
Suppose that we are given two independent samples
(x1, . . . , xn1) from N(µ1, σ2) (9.65)
and
(y1, . . . , yn2) from N(µ2, σ2) . (9.66)
In the case of equal variances, the test statistic for H0 : µ1 = µ2 is
tn1+n2−2 = (x̄ − ȳ) / (s √(1/n1 + 1/n2)) (9.67)
with the pooled sample variance

s² = [(n1 − 1)s²x + (n2 − 1)s²y] / (n1 + n2 − 2) . (9.68)
The assumption of equal variances has to be tested with the F–test. In the case of a rejection of H0 : σ²x = σ²y, no exact solution exists. This is called the Behrens–Fisher problem. The comparison of means in the case of σ²x ≠ σ²y is done approximately by a tv–statistic, where the sample variances influence the degrees of freedom v.
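One common approximate tv–statistic of this kind is the Welch statistic with Satterthwaite degrees of freedom; the following sketch (simulated data, SciPy assumed) is one possible implementation, not the only one:

```python
import numpy as np
from scipy import stats

def welch_t(x, y):
    """Approximate t_v statistic for unequal variances, with the
    Welch-Satterthwaite degrees of freedom v."""
    n1, n2 = len(x), len(y)
    v1, v2 = x.var(ddof=1) / n1, y.var(ddof=1) / n2
    t = (x.mean() - y.mean()) / np.sqrt(v1 + v2)
    v = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, v

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=20)
y = rng.normal(0.5, 3.0, size=35)   # unequal variances (Behrens-Fisher situation)
t, v = welch_t(x, y)
p_value = 2 * stats.t.sf(abs(t), df=v)
```

The result agrees with `scipy.stats.ttest_ind(x, y, equal_var=False)`, which implements the same approximation.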
9.5 The Multivariate Two–Sample Case
The multivariate analog of the t–test for testing H0 : µx = µy (both (p × 1)–vectors) is defined as Hotelling’s two–sample T²:

T² = (1/n1 + 1/n2)−1 (x.. − y..)′S−1(x.. − y..) (9.69)
with the pooled sample covariance matrix (within–groups)
(n1 + n2 − 2)S = (n1 − 1)Sx + (n2 − 1)Sy . (9.70)
The statistic T² is, in fact, an estimate of the Mahalanobis distance D² = (µx − µy)′Σ−1(µx − µy) of the two populations. Under H0 : µx = µy, T² has the following relationship to the central F–distribution:
Fp,v = [(n1 + n2 − p − 1)/((n1 + n2 − 2)p)] T² (9.71)
with the degrees of freedom of the denominator
v = n1 + n2 − p− 1 . (9.72)
The decision rule based on the union–intersection principle (Roy, 1953; 1957)—or, equivalently, on the likelihood–ratio principle—yields the rejection region for H0 : µx = µy as

T² > [(n1 + n2 − 2)p/v] Fp,v,1−α . (9.73)
Hotelling’s T²–statistic for the model with fixed effects assumes the equality of the covariance matrices Σx and Σy, in analogy to the univariate comparison of means. This equality can be tested by various measures.
Remark.
(i) If H0 : µx = µy is replaced by H0 : C(µx − µy) = 0, where C is a contrast matrix for differences, then the statistic F [(9.71)] has one degree of freedom less in the numerator as well as in the denominator, i.e., p is to be replaced by p − 1.
(ii) The performance of Hotelling’s T² and four nonparametric tests was investigated by Harwell and Serlin (1994) with respect to the type I error under distributions with varying skewness and sample sizes.
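The statistics (9.69)–(9.72) can be sketched as follows (illustrative Python with simulated data; not from the original text):

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X, Y):
    """Two-sample T^2 (9.69) with pooled covariance matrix (9.70) and the
    F transformation (9.71)-(9.72)."""
    n1, p = X.shape
    n2 = Y.shape[0]
    Sx = np.atleast_2d(np.cov(X, rowvar=False))
    Sy = np.atleast_2d(np.cov(Y, rowvar=False))
    S = ((n1 - 1) * Sx + (n2 - 1) * Sy) / (n1 + n2 - 2)   # pooled matrix (9.70)
    d = X.mean(axis=0) - Y.mean(axis=0)
    T2 = (1 / n1 + 1 / n2) ** -1 * d @ np.linalg.solve(S, d)
    v = n1 + n2 - p - 1
    F = v / ((n1 + n2 - 2) * p) * T2                      # ~ F_{p, v} under H0
    return T2, F, stats.f.sf(F, p, v)

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=15)
Y = rng.multivariate_normal([1.0, 0.0], np.eye(2), size=18)
T2, F, pval = hotelling_two_sample(X, Y)
```

For p = 1 the statistic reduces to the squared pooled two–sample t statistic, which again serves as a check.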
9.6 Testing of H0 : Σx = Σy
Box (1949) has given the following generalization of Bartlett’s test for the equality of two univariate variances to H0 : Σx = Σy in the multivariate (p–dimensional) case.
Assume that S [(9.70)] is the pooled sample covariance matrix of the two p–variate normal distributions. The Box–M statistic is αM with

M = (n1 − 1) ln(|S|/|Sx|) + (n2 − 1) ln(|S|/|Sy|) (9.74)
and α according to

α = 1 − [(2p² + 3p − 1)/(6(p + 1))] [1/(n1 − 1) + 1/(n2 − 1) − 1/(n1 + n2 − 2)] . (9.75)
Under H0 : Σx = Σy, we have the following approximate distribution
αM ∼ χ2p(p+1)/2 . (9.76)
Remark. Box (1949) developed this statistic for the general comparison of g ≥ 2 normal distributions and gave equivalent representations as an F–statistic. For the comparison of g independent normal distributions Np(µ1, Σ1), . . . , Np(µg, Σg), the test problem is
H0 : Σ1 = . . . = Σg (9.77)
against
H1 : H0 not true .
Let Si be the unbiased estimates (i.e., the appropriate sample covariance matrices) of Σi (i = 1, . . . , g), and let ni be the corresponding sample sizes. We define

N = ∑i=1,…,g ni , vi = ni − 1, (9.78)
and denote the pooled sample covariance matrix by

S = [1/(N − g)] ∑i=1,…,g viSi . (9.79)
The test statistic is then of the form αM (cf. Timm, 1975, p. 252) with

M = (N − g) ln |S| − ∑i=1,…,g vi ln |Si| (9.80)
and

α = 1 − C, (9.81)

C = [(2p² + 3p − 1)/(6(p + 1)(g − 1))] ( ∑i=1,…,g 1/vi − 1/(N − g) ) . (9.82)
The approximate distribution is
αM ∼ χ²v with v = p(p + 1)(g − 1)/2 . (9.83)
For g = 2, we have α specified by (9.75).
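A direct transcription of (9.78)–(9.83) for g groups might look as follows (illustrative Python; the data are simulated, and in practice the chi-square approximation requires reasonably large group sizes):

```python
import numpy as np
from scipy import stats

def box_m(samples):
    """Box-M test of H0: Sigma_1 = ... = Sigma_g, cf. (9.78)-(9.83);
    `samples` is a list of (n_i x p) data matrices."""
    g = len(samples)
    p = samples[0].shape[1]
    vs = [s.shape[0] - 1 for s in samples]
    Ss = [np.cov(s, rowvar=False) for s in samples]
    Ng = sum(vs)                                      # N - g
    S = sum(v * Si for v, Si in zip(vs, Ss)) / Ng     # pooled matrix (9.79)
    M = Ng * np.log(np.linalg.det(S)) - sum(
        v * np.log(np.linalg.det(Si)) for v, Si in zip(vs, Ss))   # (9.80)
    C = (2 * p ** 2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        sum(1 / v for v in vs) - 1 / Ng)              # (9.82)
    df = p * (p + 1) * (g - 1) / 2                    # (9.83)
    stat = (1 - C) * M                                # alpha * M, cf. (9.81)
    return stat, df, stats.chi2.sf(stat, df)

rng = np.random.default_rng(5)
groups = [rng.multivariate_normal(np.zeros(2), np.eye(2), size=n)
          for n in (20, 25, 30)]
stat, df, pval = box_m(groups)
```

By the concavity of the log-determinant, M is always nonnegative, with M = 0 exactly when all sample covariance matrices coincide.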
9.7 Univariate Analysis of Variance in the Repeated Measures Model

9.7.1 Testing of Hypotheses in the Case of Compound Symmetry
Consider the model (9.20) formulated in Section 9.2
ykij = µ + αk + βj + (αβ)kj + aki + εkij , (9.84)
which can be interpreted as a mixed model, i.e., as a two–factorial model (fixed factors: treatments k = 1, 2 and occasions j = 1, . . . , p), with interaction and one random effect aki (individual).
The univariate analysis of variance assumes equal covariance matrices of the two subpopulations (k = 1 and 2). Furthermore, the structure of compound symmetry [(9.19)] is required for both covariance matrices. This assumption is sufficient for the validity of the univariate F–tests. Compound symmetry is a special case of a more general covariance structure which ensures the exact F–distribution. This situation, which occurs often in practice, will be discussed in detail in Section 9.7.2.
In the mixed model, the following hypotheses, tailored to the situation of the repeated measures model, are tested:
(i) The null hypothesis of homogeneous levels of both treatments
H0 : α1 = α2 . (9.85)
(ii) The null hypothesis of homogeneous occasions (cf. Figure 9.1)
H0 : β1 = . . . = βp . (9.86)
(iii) The null hypothesis of no interaction between the treatment and time effects (cf. Figure 9.2)
H0 : (αβ)kj = 0 (k = 1, 2, j = 1, . . . , p) . (9.87)
We define the correction term once again as

C = Y².../N

with N = (n1 + n2)p = np. Taking the possibly unbalanced sample sizes (n1 ≠ n2) into consideration, we obtain the following sums of squares
Figure 9.1. No interaction and no time effect.

Figure 9.2. No interaction (H0 : (αβ)kj = 0 not rejected) and a time effect.
(cf. (7.17)–(7.22) and Morrison, 1983, p. 213):
SSTotal = ∑∑∑ (ykij − y...)² = ∑∑∑ y²kij − C, (9.88)

SSA = SSTreat = ∑∑∑ (yk.. − y...)² = (1/p) ∑k=1,2 Y²k../nk − C, (9.89)

SSB = SSTime = ∑∑∑ (y..j − y...)² = [1/(n1 + n2)] ∑j=1,…,p Y²..j − C, (9.90)

SSSubtotal = ∑∑∑ (yk.j − y...)² = ∑k (1/nk) ∑j Y²k.j − C, (9.91)

SSA×B = SSTreat×Time = SSSubtotal − SSTreat − SSTime, (9.92)

SSInd = ∑∑∑ (yki. − yk..)² = (1/p) ∑k=1,2 ∑i=1,…,nk Y²ki. − (1/p) ∑k=1,2 Y²k../nk , (9.93)

SSError = SSTotal − SSSubtotal − SSInd . (9.94)
The test statistics are (cf. Greenhouse and Geisser, 1959)

FTreat = MSTreat/MSInd , (9.95)
FTime = MSTime/MSError , (9.96)
FTreat×Time = MSTreat×Time/MSError . (9.97)
Source                 SS             df               MS                        F–Values
Treatment              SSTreat        1                SSTreat                   FTreat = MSTreat/MSInd
Occasion               SSTime         p − 1            SSTime/(p − 1)            FTime = MSTime/MSError
Treatment × Occasion   SSTreat×Time   p − 1            SSTreat×Time/(p − 1)      FTreat×Time = MSTreat×Time/MSError
Individual             SSInd          n − 2            SSInd/(n − 2)
Error                  SSError        (p − 1)(n − 2)   SSError/((p − 1)(n − 2))
Total                  SSTotal        np − 1

Table 9.1. Table of the univariate analysis of variance in the repeated measures model.
These F–tests are called unadjusted univariate F–tests—as opposed to the adjusted F–tests named according to the Greenhouse–Geisser strategy.
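The sums of squares (9.88)–(9.94) and the unadjusted F–statistics (9.95)–(9.97) can be transcribed directly. The following Python sketch (simulated, unstructured data purely for illustration) also returns the sums of squares so that the decomposition SSTotal = SSTreat + SSTime + SSA×B + SSInd + SSError can be checked:

```python
import numpy as np

def rm_anova(Y1, Y2):
    """Sums of squares (9.88)-(9.94) and unadjusted F-statistics
    (9.95)-(9.97); Yk is the (n_k x p) response matrix of group k."""
    n1, p = Y1.shape
    n2 = Y2.shape[0]
    n = n1 + n2
    groups = [(Y1, n1), (Y2, n2)]
    Y = np.vstack([Y1, Y2])
    C = Y.sum() ** 2 / (n * p)                                  # correction term
    ss = {}
    ss['total'] = (Y ** 2).sum() - C
    ss['treat'] = sum(Yk.sum() ** 2 / (nk * p) for Yk, nk in groups) - C
    ss['time'] = (Y.sum(axis=0) ** 2).sum() / n - C
    sub = sum((Yk.sum(axis=0) ** 2).sum() / nk for Yk, nk in groups) - C
    ss['inter'] = sub - ss['treat'] - ss['time']
    ss['ind'] = (sum((Yk.sum(axis=1) ** 2).sum() / p for Yk, nk in groups)
                 - sum(Yk.sum() ** 2 / (nk * p) for Yk, nk in groups))
    ss['error'] = ss['total'] - sub - ss['ind']
    F = {'treat': ss['treat'] / (ss['ind'] / (n - 2)),
         'time': (ss['time'] / (p - 1)) / (ss['error'] / ((p - 1) * (n - 2))),
         'inter': (ss['inter'] / (p - 1)) / (ss['error'] / ((p - 1) * (n - 2)))}
    return ss, F

rng = np.random.default_rng(6)
ss, F = rm_anova(rng.normal(size=(8, 4)), rng.normal(size=(10, 4)))
```

The degrees of freedom follow Table 9.1 with n = n1 + n2; note that FTreat uses MSInd, not MSError, as its denominator.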
Remark. The assumption of a compound symmetric structure is not very realistic in the repeated measures model, since this requirement means that the correlation of the response between any two occasions is identical. This assumption cannot be expected to hold in all situations. Hence, the question of interest is whether and when univariate tests may be applied in the case of a more general covariance structure (sphericity of the contrast covariance matrix) (cf. Girden, 1992).
9.7.2 Testing of Hypotheses in the Case of Sphericity
We assume that the two populations have an identical covariance matrix Σ. The comparison of therapies, i.e., the testing of the linear hypotheses (9.85)–(9.87), is done by means of linear contrasts. The comparison of the p means of the p occasions requires a system of p − 1 orthogonal contrasts. The test statistic follows an F–distribution if and only if the covariance matrix of the orthogonal contrasts is a scalar multiple of the identity matrix. This condition is called the circularity or sphericity condition. It can be expressed in a number of alternative ways.
For example, it can be demanded that all the variances of pairwise differences of the response values of an individual are equal. For any random variables xi and xj , the following is valid:
var(xi − xj) = var(xi) + var(xj) − 2 cov(xi, xj) .
If var(xi) = var(xj) and cov(xi, xj) is constant (for all i, j), then compound symmetry holds. However, more general dependence structures exist under which the condition
var(xi − xj) = const
is valid, from which sphericity of every contrast covariance matrix follows, as long as sphericity is proven for one specific covariance matrix.
The necessary and sufficient condition is known as the Huynh–Feldt condition (Huynh and Feldt, 1970). It can be expressed in three equivalent (alternative) forms.
Huynh–Feldt Condition (H Pattern)
(i) The common covariance matrix Σ of both populations is Σ = (σjj′) with

σjj′ = αj + αj′ + λ, if j = j′ ,
σjj′ = αj + αj′ ,    if j ≠ j′ . (9.98)
(ii) All possible differences ykij − ykij′ of the response variables have equal variance, i.e., var(ykij − ykij′) = 2λ holds for every individual i from each of the two groups.
(iii) For the Huynh–Feldt epsilon, εHF = 1 holds, where

εHF = p²(σd − σ··)² / [(p − 1)(∑∑ σ²rs − 2p ∑ σ²r· + p²σ²··)] . (9.99)
Here Σ = (σrs) is the population covariance matrix, where
σd is the average of the diagonal elements;
σ·· is the overall mean of the σrs; and
σr· is the average of the rth row.
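The epsilon formula (9.99), as printed, can be evaluated directly; for a compound symmetric matrix it yields 1, as required by the H pattern condition (illustrative Python, hypothetical variance values):

```python
import numpy as np

def huynh_feldt_eps(Sigma):
    """epsilon_HF from (9.99); equals 1 if and only if Sigma satisfies
    the Huynh-Feldt (H pattern) condition."""
    p = Sigma.shape[0]
    s_d = np.diag(Sigma).mean()   # mean of the diagonal elements
    s_r = Sigma.mean(axis=1)      # row means
    s_bar = Sigma.mean()          # overall mean
    num = p ** 2 * (s_d - s_bar) ** 2
    den = (p - 1) * ((Sigma ** 2).sum()
                     - 2 * p * (s_r ** 2).sum()
                     + p ** 2 * s_bar ** 2)
    return num / den

# compound symmetry (hypothetical values) is a special H pattern: eps = 1
Sigma_cs = 2.0 * np.eye(4) + 3.0 * np.ones((4, 4))
eps = huynh_feldt_eps(Sigma_cs)
```

For a nonspherical matrix, e.g. an AR(1)-type covariance, the value drops below 1, indicating how strongly the contrast covariance matrix departs from sphericity.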
Testing the Huynh–Feldt Condition
Huynh and Feldt (1970) proved that the necessary and sufficient conditions (i), (ii), or (iii) are valid if

C̄HΣC̄′H = λI (9.100)

holds, where C̄H is the normalized (row–orthonormal) form of CH, and CH is the ((p − 1) × p)–submatrix of the orthogonal Helmert matrix
( 1′p/√p )
(   CH   ) , (9.101)
that is formed from the Helmert contrasts. The Helmert matrix CH in (9.101) contains the following elements:
CH(p−1,p) = (c′1, c′2, . . . , c′p−1)′ =

( (p − 1)   −1      −1   . . .  −1   −1 )
(   0     (p − 2)   −1   . . .  −1   −1 )
(  ...                                  )
(   0       0        0   . . .   1   −1 ) . (9.102)

The vectors c′s (s = 1, . . . , p − 1) are called Helmert contrasts. They are orthogonal,

c′s1cs2 = 0 (s1 ≠ s2),

and satisfy ∑j=1,…,p csj = 0, i.e., c′s1p = 0. However, the cs are not normed (c′scs ≠ 1). Therefore, the vector 1′p, or its standardized version 1′p/√p, is included in the contrast matrix as the first row, although strictly speaking it is not a contrast (1′p1p = p ≠ 0, i.e., the second property of contrasts is not fulfilled).
Standard software is available that converts the contrasts CH into orthonormal contrasts C̄H.
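The orthonormal contrasts C̄H can also be constructed directly from (9.102); the following sketch (illustrative, NumPy assumed) normalizes each Helmert contrast to unit length:

```python
import numpy as np

def helmert_contrasts(p):
    """Rows: the p-1 Helmert contrasts of (9.102), each normalized to
    unit length (the matrix written C_H-bar in the text)."""
    C = np.zeros((p - 1, p))
    for s in range(p - 1):
        C[s, s] = p - 1 - s
        C[s, s + 1:] = -1.0
    return C / np.linalg.norm(C, axis=1, keepdims=True)

C = helmert_contrasts(4)
# rows sum to zero (contrast property) and are orthonormal
```

For a compound symmetric Σ = σ²Ip + σα²Jp, one finds C̄HΣC̄′H = σ²Ip−1, since C̄H1p = 0 annihilates the Jp part; this is the sphericity property (9.100) with λ = σ².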
Remark. Based on the standardized Helmert matrix C̄H, we give a short outline of how to prove the equivalence of (ii) and (9.100).
Case p = 2. The Helmert matrix is CH = (1, −1), hence C̄H = (1/√2, −1/√2). Thus, (9.100) reads

( 1/√2  −1/√2 ) ( σ²1  σ12 ) (  1/√2 )
                ( σ12  σ²2 ) ( −1/√2 ) = λ

⟺ σ²1 + σ²2 − 2σ12 = 2λ .
Case p = 3. We obtain C̄HΣC̄′H = λI2 as

( 2/√6  −1/√6  −1/√6 ) ( σ²1  σ12  σ13 ) (  2/√6    0   )
(  0     1/√2  −1/√2 ) ( σ12  σ²2  σ23 ) ( −1/√6   1/√2 ) = λI2
                       ( σ13  σ23  σ²3 ) ( −1/√6  −1/√2 )

⟺

Element (1, 1): (1/6)[4σ²1 + σ²2 + σ²3 − 4σ12 − 4σ13 + 2σ23] = λ ;
Element (1, 2) (= Element (2, 1)): σ²3 − σ²2 + 2σ12 − 2σ13 = 0 ⟹ σ²2 = σ²3 + 2σ12 − 2σ13 ;
Element (2, 2): (1/2)[σ²2 + σ²3 − 2σ23] = λ .

Form

σ²1 + σ²2 − 2σ12 = σ²1 + [σ²3 + 2σ12 − 2σ13] − 2σ12 = σ²1 + σ²3 − 2σ13 .

Equate (1, 1) = (2, 2) (since the right–hand sides are equal) ⟹

(σ²1 + σ²2 − 2σ12) + (σ²1 + σ²3 − 2σ13) = 2(σ²2 + σ²3 − 2σ23) = 4λ .

Both terms on the left are identical

⟹ σ²j + σ²j′ − 2σjj′ = 2λ (j ≠ j′) .
The Condition of Sphericity or Circularity
Compound symmetry is a special case of the covariance structures for which the univariate F–tests are valid. Let us first consider the case of one therapy group measured on p occasions. We can apply (p − 1) orthonormal contrasts for testing the differences between the p occasions.
The univariate statistics (c′jyki)² follow exact F–distributions if and only if the covariance matrix of the contrasts has equal variances and zero covariances, i.e., if it has the form σ²I (circularity or sphericity). This corresponds to the assumption of the mixed model that the differences in the yki are caused only by unequal means and not by variance inhomogeneity.
The model of compound symmetry is a special case of the model of sphericity of the orthonormal contrasts. Compound symmetry is equivalent to the intraclass correlation structure, i.e., the diagonal elements being σ2 + σ2α and the off–diagonal elements being σ2α [(9.19)]. Every term on the main diagonal of the covariance matrix of orthonormal contrasts estimates the denominator in the univariate F–statistic of the corresponding contrast. Thus, when sphericity holds, each element estimates the same thing. Hence, a better statistic is the average of these elements. This is called the averaged F–test. If sphericity does not hold, the denominators of the F–statistics may become too large or too small, so that the test is biased.
Comparison of Two or More Therapy Groups—Test for Sphericity
Similar to the above arguments, the univariate F-tests only stay valid if the covariance matrices of orthonormal contrasts within the therapy groups are spherical and, additionally, are identical across the therapy groups, so that global sphericity holds. This assumption may be weakened, for instance, by demanding sphericity only for the main effects (e.g., j fixed, comparison of two therapies by means of a linear contrast).

For the test of global sphericity [(9.100)], the equality of the covariance matrices in the therapy groups is tested first. This is done by the Box–M statistic [(9.74)]. If H0: Σ₁ = Σ₂ is not rejected, then the test for sphericity
by Mauchly (1940) may be applied. According to Morrison (1983, p. 251), the test statistic is

W = q^q |R| / (tr R)^q                                          (9.103)

with q = p − 1,

R = C_H S C_H'                                                  (9.104)

and C_H is the (q × p)-matrix of orthonormal Helmert contrasts. In addition to the exact critical values (cf. tables in Kres (1983)), the approximate distribution

−[(N − 1) − (2p² − 3p + 3)/(6(p − 1))] ln W ∼ χ²_v              (9.105)

with

v = (1/2)(p − 2)(p + 1) = (1/2)p(p − 1) − 1                     (9.106)
may be used in the case of equal sample sizes n₁ = n₂ = N.

Tests relating to the covariance structure—especially the Box–M test and the Mauchly test—are sensitive to nonnormality in general. Huynh and Mandeville (1979) analyzed the robustness of the Mauchly test to such departures by means of simulation studies. The following conclusions are drawn:

(i) the W-test tends to err on the conservative side for light-tailed distributions; the difference between the empirical type I error and the nominal significance level α increases for large samples and for small α; and

(ii) for heavy-tailed distributions the reverse is true, i.e., H0: sphericity is rejected more readily, even though H0 is true.
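The computation of W from (9.103) and its χ² approximation (9.105) can be sketched as follows (numpy; the pooled contrast covariance S is taken as given, and note that statistical packages may use a slightly different multiplier in (9.105), e.g., one based on the error degrees of freedom):

```python
import numpy as np

def mauchly_w(S, CH):
    """Mauchly's W from (9.103): W = q^q |R| / (tr R)^q with R = C_H S C_H'."""
    R = CH @ S @ CH.T
    q = R.shape[0]
    return q ** q * np.linalg.det(R) / np.trace(R) ** q

def mauchly_chi2(W, N, p):
    """chi^2 approximation (9.105); the multiplier follows the text's (N - 1) form."""
    c = (N - 1) - (2 * p ** 2 - 3 * p + 3) / (6 * (p - 1))
    return -c * np.log(W)

CH = np.array([[2 / np.sqrt(6), -1 / np.sqrt(6), -1 / np.sqrt(6)],
               [0.0, 1 / np.sqrt(2), -1 / np.sqrt(2)]])

# Sanity checks: a spherical S gives W = 1 (chi^2 value 0);
# any departure from sphericity gives 0 < W < 1
assert abs(mauchly_w(2.0 * np.eye(3), CH) - 1.0) < 1e-12
assert 0 < mauchly_w(np.diag([1.0, 2.0, 3.0]), CH) < 1
```

W attains its maximum 1 exactly when all eigenvalues of R are equal, which is the sphericity condition itself.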
9.7.3 The Problem of Nonsphericity
After the pretests (univariate F-tests, Box–M test, Mauchly test) are carried out, the following questions have to be settled (cf. Crowder and Hand, 1990, pp. 50–56):

(i) Which effect occurs if the F-test is applied in spite of a rejection of sphericity?

(ii) What is to be done if the assumptions seem unjustifiable altogether?

To (i): If sphericity does not hold, then the actual significance level of the univariate F-tests will exceed the nominal level α, with the effect that too many true null hypotheses are rejected. For tests with complete systems of orthonormal contrasts, this effect can be analyzed by studying the ε correction factor. Rouanet and Lepine (1970), Mitzel and Games (1981), and Boik (1981) discuss the effect of nonsphericity on single contrasts. Boik concludes that the type I error is out of control. Rouanet and Lepine (1970) recommend using all relevant statistics.

To (ii): What is to be done in the case of nonsphericity? The multivariate analysis only assumes the equality of the covariance matrices, but not any specific form of the (common) covariance matrix. If, however, sphericity holds, then the MANOVA has a relatively low power compared to the univariate approach.

Hence, the direct application of a multivariate analysis, i.e., without previously testing the possibility of sphericity, is not the best strategy.
9.7.4 Application of Univariate Modified Approaches in the Case of Nonsphericity
Let c be a set of (p−1) orthonormal contrasts with the covariance matrix Σ_c. The Greenhouse–Geisser epsilon is then defined as

ε_{G−H} = (tr Σ_c)² / [(p − 1) tr(Σ_c²)] = (Σⱼ θⱼ)² / [(p − 1) Σⱼ θⱼ²] ,      (9.107)

where the θⱼ are the eigenvalues of Σ_c. If Σ_c = I, then all θⱼ = 1 and ε_{G−H} is equal to 1. Otherwise, we have ε_{G−H} < 1. The overall F-tests for an occasion effect, and for interaction in the case of two therapy groups with n = n₁ + n₂ individuals and p measures, involve the F_{p−1,(p−1)(n−2)}-distribution (cf. test statistics (9.96) and (9.97)). In the case of nonsphericity, the F_{ε_{G−H}(p−1), ε_{G−H}(p−1)(n−2)}-distribution is used for testing. Hence, for ε_{G−H} < 1, the critical values increase, i.e., the null hypotheses are rejected less often. This counteracts the previously described effect (answer to (i)).
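Definition (9.107) translates directly into code; the eigenvalue form also makes the bounds 1/(p − 1) ≤ ε_{G−H} ≤ 1 easy to check. A minimal numpy sketch:

```python
import numpy as np

def gg_epsilon(Sigma_c):
    """Greenhouse-Geisser epsilon (9.107) for a (p-1) x (p-1) contrast covariance."""
    q = Sigma_c.shape[0]                    # q = p - 1
    theta = np.linalg.eigvalsh(Sigma_c)     # eigenvalues theta_j of Sigma_c
    return theta.sum() ** 2 / (q * (theta ** 2).sum())

# Sigma_c = I gives epsilon = 1 (sphericity)
assert abs(gg_epsilon(np.eye(4)) - 1.0) < 1e-12
# unequal eigenvalues (2, 1): (2+1)^2 / (2 * (4+1)) = 0.9 < 1
assert abs(gg_epsilon(np.diag([2.0, 1.0])) - 0.9) < 1e-12
# always between the lower bound 1/(p-1) and 1
assert 0.25 <= gg_epsilon(np.diag([5.0, 1.0, 0.2, 0.1])) <= 1.0
```

In practice Σ_c is replaced by its sample estimate, so the resulting ε̂ carries estimation error, which is the issue raised in the following question.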
Since ε_{G−H} will not be known, it has to be estimated. Hence the question arises: What influence does the estimation error of ε_{G−H} have on the power of the F-test corrected by ε_{G−H}?
Greenhouse–Geisser Test Strategy
In order to avoid this problem, Greenhouse and Geisser (1959) suggest a conservative approach. This strategy consists of the following steps:

• Standard F-test (unmodified). If H0 is not rejected, then stop.

• If H0 is rejected, then the smallest ε-value is chosen (lower-bound epsilon)

  ε_min = 1/(p − 1)                                              (9.108)

  and tested with the modified F-test. If H0 is rejected by this most conservative test, then the decision is accepted and we stop.

• If H0 is not rejected, then ε_{G−H} is estimated [(9.107)], the ε_{G−H}-F-test is conducted, and its decision is accepted.
As a universal answer for the entire problem, we conclude:
If strong prior reasons favor the assumption of sphericity (i.e., for the independence of the univariate distributions of the contrasts), then the univariate F-tests should be conducted. Otherwise, either a modified ε-F-test, a multivariate test, or a nonparametric approach should be applied. It is obvious that this problem cannot be solved academically, but only on the basis of the data.
Test Procedure in the Two–Sample Case in the Mixed Model
1. Testing for interaction and for occasion effects (H0 from (9.87) and (9.86)):

   (a) Σ₁ = Σ₂ ⇒ MANOVA;
   (b) C_H(Σ₁ − Σ₂)C_H' = λI ⇒ ANOVA (averaged F-test); and
   (c) C_H(Σ₁ − Σ₂)C_H' ≠ λI ⇒ ANOVA (modified) or MANOVA.

   Comment. If sphericity holds, then the ANOVA (unmodified) is more powerful than the MANOVA. If we have nonsphericity, the power of the ANOVA (modified) compared to the MANOVA depends on the ε-values (Huynh–Feldt ε or Greenhouse–Geisser ε) or, rather, on the estimation errors in ε.

2. Testing for the main effect H0: α₁ = α₂ [(9.85)] under the assumption of H0: (αβ)ij = 0:

   Σ₁ = Σ₂ ⇒ univariate F-test (MANOVA = unmodified ANOVA);
   Σ₁ ≠ Σ₂ ⇒ nonparametric approach.
9.7.5 Multiple Tests
If a global treatment effect is proven, i.e., if H0: µ₁ = µ₂ is rejected, then the question of interest is whether regions with a multiple treatment effect exist. A multiple treatment effect means that µ₁ⱼ ≠ µ₂ⱼ for some j.

Of special interest are connected regions with local multiple treatment effects as, for example,

µ₁ⱼ ≠ µ₂ⱼ ,   j = 1, . . . , p* ,   p* < p ,                     (9.109)

i.e., treatment effects from the first occasion up to a specific occasion p*. For this, a multiple testing procedure is performed that meets the multiple α-level. This is done by defining so-called Holm-adjusted quantiles (cf. Lehmacher, 1987, p. 29), starting out with Bonferroni's inequality.
Holm–Procedure for Local Multiple Treatment Effects
To begin with, the global treatment effect is tested, i.e., H0: µ₁ = µ₂ is tested with Hotelling's T² (cf. (9.69)). If H0 is not significant, the procedure stops. If, however, H0 is rejected, then the Holm procedure is conducted, which sorts all p univariate t-statistics of the p single occasions by their size (in analogy to the size of the p-values, starting with the smallest p-value). These p-values are compared to the Holm-adjusted sequence:

   j:       1          2          3          4        . . .   p − 1    p
   limit:   α/(p−1)    α/(p−1)    α/(p−2)    α/(p−3)  . . .   α/2      α

As soon as one p-value of a tⱼ lies above its appropriate Holm limit, the procedure is terminated, and the hypotheses H0: µ₁ⱼ = µ₂ⱼ passed up to that point are rejected in favor of H1 (9.109).

Interpretation. A local multiple treatment effect exists for all occasions j whose tⱼ has a p-value ≤ the corresponding Holm limit. This means that all univariate hypotheses H0ⱼ: µ₁ⱼ = µ₂ⱼ whose test statistics have p-values below the appropriate Holm limit are rejected in favor of a local multiple treatment effect.
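The step-down procedure can be sketched in a few lines of Python; applied to the p-values of the 12 univariate comparisons of Example 9.2 (Table 9.7), it reproduces the decisions derived there:

```python
def holm_limits(p, alpha):
    """Holm-adjusted limits as in the text: alpha/(p-1), alpha/(p-1),
    alpha/(p-2), ..., alpha/2, alpha (the first two limits coincide)."""
    return [alpha / (p - 1)] + [alpha / (p - j) for j in range(1, p)]

def local_effects(pvals, alpha):
    """Step down through the sorted p-values; stop at the first exceedance."""
    order = sorted(range(len(pvals)), key=lambda j: pvals[j])
    limits = holm_limits(len(pvals), alpha)
    significant = set()
    for pos, j in enumerate(order):
        if pvals[j] > limits[pos]:
            break                     # procedure terminates here
        significant.add(j + 1)        # report 1-based occasion index
    return significant

# p-values of the univariate comparisons from Example 9.2 (Table 9.7),
# indexed by occasion j = 1, ..., 12
pv = [0.006, 0.003, 0.002, 0.061, 0.329, 0.374, 0.424,
      0.893, 0.536, 0.117, 0.582, 0.024]
assert local_effects(pv, 0.05) == {2, 3}        # 5% level
assert local_effects(pv, 0.10) == {1, 2, 3}     # 10% level
```

At the 5% level the third-smallest p-value (0.006 for j = 1) already exceeds its limit 0.05/10 = 0.005, so only occasions 2 and 3 show a local multiple treatment effect.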
9.7.6 Examples
Example 9.1. Two treatments, 1 and 2, over p = 3 measures with n₁ = n₂ = 4 individuals each are compared in Table 9.2.

                      Occasion
   Treatment      A      B      C      Y_ki.
   1             10     19     27       56
                  9     13     25       47
                  4     10     20       34
                  5      6     12       23
   2             13     16     19       48
                 11     18     28       57
                 17     28     25       70
                 20     23     29       72

Table 9.2. Repeated measures design for the treatment comparison.
Call in SPSS:
MANOVA A B C by Treat (1,2)
  /wsfactors = Time(3)
  /contrast(Time) = difference
  /wsdesign
  /print = homogeneity(boxm) transform error(cor)
           signif(averf) param(estim)
  /design .
The steps of the test are:
(i) H0: Σ₁ = Σ₂:
The Box–M statistic is αM = 3.93638, i.e., approximately (cf. (9.76))

χ²_{p(p+1)/2} = χ²₆ = 1.80417   (p-value 0.937) .

Hence H0 is not rejected. After this test, the MANOVA may be performed. Before doing this, however, it should be tested whether sphericity holds for the contrast covariance matrix.
(ii) H0: C_H Σ C_H' = λI : We have

C_H = ( 2/√6   −1/√6   −1/√6 )
      ( 0       1/√2   −1/√2 ) .
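The matrix C_H can be generated for arbitrary p. A minimal numpy sketch, using one common convention for orthonormal Helmert-type contrasts that matches the C_H displayed above for p = 3 (row j compares occasion j with the mean of the later occasions):

```python
import numpy as np

def helmert_contrasts(p):
    """(p-1) x p matrix of orthonormal Helmert contrasts."""
    C = np.zeros((p - 1, p))
    for j in range(p - 1):
        m = p - j - 1                    # number of later occasions
        C[j, j] = m
        C[j, j + 1:] = -1.0
        C[j] /= np.sqrt(m * (m + 1))     # normalize the row to unit length
    return C

CH = helmert_contrasts(3)
assert np.allclose(CH, [[2 / np.sqrt(6), -1 / np.sqrt(6), -1 / np.sqrt(6)],
                        [0, 1 / np.sqrt(2), -1 / np.sqrt(2)]])
assert np.allclose(helmert_contrasts(5) @ helmert_contrasts(5).T, np.eye(4))
assert np.allclose(helmert_contrasts(5).sum(axis=1), 0)   # rows are contrasts
```

The two assertions at the end verify the defining properties: the rows are orthonormal (C_H C_H' = I) and each row sums to zero.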
Test involving 'Time' Within Subject Effect

Mauchly sphericity test, W  = .90352
Chi-square approx.          = .50728 with 2 D.F.
Significance                = .776
Greenhouse-Geisser Epsilon  = .91201
Huynh-Feldt Epsilon         = 1.00000

Hence H0: sphericity is not rejected, and we may conduct the unadjusted F-tests of the ANOVA.
According to the test strategy in the mixed model, we first test
H0 : (αβ)ij = 0
with (cf. (9.97) and Table 9.1)

F_{Treat×Time} = F_{(p−1);(p−1)(n−2)} = MS_{Treat×Time} / MS_{Error} .
From Table 9.2, we get

            A      B      C
Y_1.j      28     48     84     Y_1.. = 160
Y_2.j      61     85    101     Y_2.. = 247
Y_..j      89    133    185     Y_... = 407
N = 2 · 3 · 4 = 24 ,

C = Y_...² / N = 407² / 24 = 6902.04 ,

SS_Total = 8269 − C = 1366.96 ,

SS_Treat = (1/12)(160² + 247²) − C = 7217.42 − C = 315.38 ,

SS_Time = (1/8)(89² + 133² + 185²) − C = 7479.38 − C = 577.33 ,

SS_Subtotal = (1/4)(28² + 48² + 84² + 61² + 85² + 101²) − C = 7822.75 − C = 920.71 ,

SS_Treat×Time = SS_Subtotal − SS_Treat − SS_Time = 920.71 − 315.38 − 577.33 = 28.00 ,

SS_Ind = (1/3)(56² + 47² + . . . + 70² + 72²) − (1/12)(160² + 247²) = 7555.67 − 7217.42 = 338.25 ,

SS_Error = SS_Total − SS_Subtotal − SS_Ind = 108.00 .
               SS       df     MS        F     p-value
Treat         315.38     1    315.38    5.59    0.056
Time          577.33     2    288.67   32.07    0.000
Treat×Time     28.00     2     14.00    1.56    0.251
Ind           338.25     6     56.38
Error         108.00    12      9.00
Total        1366.96    23

Table 9.3. Analysis of variance table in the model with interaction.

We have

F_{Treat×Time} = MS_{Treat×Time} / MS_{Error} = 1.56 .
Because of 1.56 < F_{2,12;0.95} = 3.88, H0: (αβ)ij = 0 is not rejected. Hence we return to the independence model for testing the main effect "Time". SS_Treat×Time is added to SS_Error. The treatment effect (p-value 0.056) is not significant; the time effect is significant. The test statistic of the treatment effect is identical in both tables: F_Treat = MS_Treat / MS_Ind.
            SS       df     MS        F     p-value
Treat      315.38     1    315.38    5.59    0.056
Time       577.33     2    288.67   29.73    0.000 *
Ind        338.25     6     56.38
Error      136.00    14      9.71
Total     1366.96    23

Table 9.4. Analysis of variance table in the independence model.
Figure 9.3. Total response of treatment 1 and treatment 2 over the occasions A, B, C (Example 9.1). [Line plot of the total response, scale 20–100, for the two treatments.]
Example 9.2. Two blood pressure lowering drugs, B and a combination of B and another drug, are to be compared. On 3 control days, the diastolic blood pressure is measured in intervals of 2 hours. The last day is then analyzed. This results in a repeated measures design with p = 12 measures. The sample sizes are n₁ = 24 (B) and n₂ = 27 (combination).
The analysis is done with SPSS.
MANOVA X1 TO X12 by Treat(1,2)
  /wsfactors = Interval(12)
  /contrast(Interval) = Difference
  /print = Homogeneity(BoxM)
  /design = Treat .
(i) Test of the homogeneity of variance, i.e., H0: Σ_B = Σ_comb:

Box's M                 = 109.59084
F with (78, 7357) DF    =   1.03211 ,  P ≈ .401
Chi-square with 78 DF   =  81.66664 ,  P ≈ .366
With p = 12, we have p(p + 1)/2 = 78, so that the Box–M statistic αM follows a χ²₇₈ (cf. (9.76)).

Hence, the null hypothesis H0: Σ_B = Σ_comb = Σ is not rejected. The univariate unadjusted F-tests require, in addition to the assumption of the homogeneity of variance, the special structure of compound symmetry. This assumption is included in the sphericity of the contrast covariance matrix as a special case.
(ii) Testing of H0: C_H Σ C_H' = λI:
The test statistic by Mauchly (cf. (9.103)) is approximately χ²_v-distributed with v = (1/2)(p − 2)(p + 1) = (1/2)(12 − 2)(12 + 1) = 65 degrees of freedom.

Mauchly sphericity test, W = .00478
Chi-square approx. = 241.17785 with 65 D.F.
Significance = .00000

Hence, sphericity (and, of course, compound symmetry as well) is rejected, and the unadjusted (averaged) univariate F-tests may not be applied. However, the adjusted univariate F-tests according to the Greenhouse–Geisser strategy can now be conducted.
(iii) Greenhouse–Geisser strategy:
The measures for sphericity/nonsphericity are:

Greenhouse–Geisser epsilon (9.107):  ε_{G−H} = 0.41 ,
Huynh–Feldt epsilon (9.99):          ε = 0.46 .

They are distinctly smaller than 1 and indicate nonsphericity of the contrast covariance matrix C_H Σ C_H'. The Greenhouse–Geisser strategy now corrects the univariate test statistics according to their degrees of freedom.
Source          SS        df      MS        F     p-value
Treat          5014.49     1    5014.49    4.24    0.045 *
Time          32414.11    11    2946.74   41.64    0.000 *
Treat×Time     2135.01    11     194.09    2.74    0.002 *
Ind           57996.61    49    1183.60
Error         38141.34   539      70.76
Total        135701.56   611

Table 9.5. Unadjusted univariate averaged F-tests.
The null hypothesis H0: (αβ)ij = 0 is rejected by the unadjusted univariate F-test. The test value F_{Treat×Time} = 2.74 is now assessed with respect to the F_{ε(p−1), ε(p−1)(n−2)}-distribution, where we start with the lower-bound epsilon ε_min = 1/(p − 1) = 1/11. We have 2.74 < F_{1,49;0.95} = 4.04; hence the interaction is not significant, i.e., H0: (αβ)ij = 0 is not rejected.

Now the next step of the Greenhouse–Geisser strategy is to be carried out. The value estimated with SPSS is ε_{G−H} = 0.41; hence the adjusted F-statistic has 11 · 0.41 = 4.5 degrees of freedom in the numerator and 539 · 0.41 = 221 degrees of freedom in the denominator. Because of

F_{Treat×Time} = 2.74 > 2.32 = F_{4.5, 221; 0.95} ,

H0: (αβ)ij = 0 is rejected. This decision is accepted.
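The ε-corrected quantile involves fractional degrees of freedom and can be obtained numerically; a sketch using scipy.stats.f.ppf (which accepts non-integer degrees of freedom; the exact value depends on how the adjusted degrees of freedom are rounded):

```python
from scipy.stats import f

p, n1, n2 = 12, 24, 27                 # Example 9.2
err_df = (p - 1) * (n1 + n2 - 2)       # (p-1)(n-2) = 11 * 49 = 539
eps = 0.41                             # estimated Greenhouse-Geisser epsilon

df1 = eps * (p - 1)                    # ~ 4.5 numerator df
df2 = eps * err_df                     # ~ 221 denominator df
crit = f.ppf(0.95, df1, df2)           # epsilon-corrected 5% critical value

assert 2.2 < crit < 2.45               # close to the value 2.32 used in the text
assert 2.74 > crit                     # interaction significant after correction
```

Shrinking both degrees of freedom by the factor ε raises the critical value relative to the unadjusted F_{11,539} quantile, which is exactly the intended conservative correction.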
Source         Test statistic       p-value
Treat×Time     F_{11,39} =  1.75     0.099
Time           F_{11,39} = 18.01     0.000 *
Treat          F_{1,49}  =  4.24     0.045 *

Table 9.6. Results of the MANOVA.

Results of the MANOVA and the corrected ANOVA:

At the 5% level, the model with interaction holds for the ANOVA, and for the MANOVA the independence model holds. Hence, the significant main effects "treatment" and "time" can be interpreted separately only in the case of the MANOVA. If the 10% level is chosen, the independence model holds for the adjusted ANOVA as well.
Multiple Tests:
The overall treatment effect is significant. Hence the multiple test procedure from Section 9.7.5 may be applied.

From the table of the p-values of the univariate comparisons of means, we take the values in ascending order and compare them with the adjusted Holm limits.

Hence the following local multiple treatment effects are significant:
  j    p-value
  1    0.006
  2    0.003
  3    0.002
  4    0.061
  5    0.329
  6    0.374
  7    0.424
  8    0.893
  9    0.536
 10    0.117
 11    0.582
 12    0.024

Table 9.7. Ordered p-values.
j                  3                  2         1                 12                4        . . .
p-value          0.002              0.003     0.006             0.024             0.061     . . .
Holm 5%     0.05/11 = 0.0045       0.0045    0.05/10 = 0.005   0.05/9 = 0.0056   0.05/8 = 0.0063   . . .
Holm 10%        0.0091              0.0091    0.010             0.011             0.0125    . . .

(i) 5% level: j = 2 and j = 3;

(ii) 10% level: j = 1, 2, and 3.

9.8 Multivariate Rank Tests in the Repeated Measures Model

In the case of continuous but not necessarily normal response values, the same hypotheses as in the previous sections may be tested by statistics that are based on ranks. The starting point is once again a multivariate two-sample problem. Assume the following observation vectors

y_ki = (y_ki1, . . . , y_kip)' ,   k = 1, 2,  i = 1, . . . , n_k .      (9.110)

For the observation vectors, we assume that the y_ki have independent distributions with a continuous distribution function

F_k(y_ki) = G(y_ki − m_k) ,   k = 1, 2 ,                                (9.111)

where m_k = (m_k1, . . . , m_kp)' is the vector of medians of the kth group for the p measures. The function G characterizes the type of distribution and m_k represents the location parameter.

The null hypothesis H0: no treatment effect means H0: F₁ = F₂ and implies

H0 : m₁ = m₂ ,                                                          (9.112)

so that both distributions are identical. The null hypothesis H0: no time effect means (cf. Koch, 1969)
H0 : mk1 = . . . = mkp, k = 1, 2 . (9.113)
The test procedures are to be carried out taking into account whether we have a significant treatment × time interaction or not. A detailed description of these tests can be found in Koch (1969) (cf. Puri and Sen, 1971). Since these nonparametric tests are quite burdensome and not implemented in standard software, we confine ourselves to a short description of the tests for a treatment effect. In the case of a continuous but not necessarily normal response, it is more practical to go over to loglinear models by applying categorical coding. These tests may then be conducted according to Chapter 8.

For the construction of a test for H0 from (9.112), we proceed as follows. Let

r_kij := [rank of y_kij in y_11j, . . . , y_1n₁j, y_21j, . . . , y_2n₂j]      (9.114)

(k = 1, 2, i = 1, . . . , n_k, j = 1, . . . , p), i.e., for every occasion j (j = 1, . . . , p) the ranks 1, . . . , N = n₁ + n₂ are assigned. If ties occur, then the averaged ranks are used.
Since the distribution is assumed to be continuous, we can assume

P(y_kij = y_k'i'j) = 0 .                                                (9.115)

Hence, we disregard ties in the following. If the r_kij (cf. (9.114)) are combined for each individual, we get the rank observation vector of the ith individual in the kth group

r_ki = (r_ki1, . . . , r_kip)' ,   k = 1, 2,  i = 1, . . . , n_k .      (9.116)

This yields N rank vectors that can be summarized in the (p × N) rank matrix

R = (r_11, . . . , r_1n₁, r_21, . . . , r_2n₂) .                         (9.117)

Because of the rank assignment (cf. (9.114)), each of the p rows of R is a permutation of the numbers 1, . . . , N.
If the columns of R are exchanged in such a way that the first row of R contains the ordered ranks, we find the matrix

R^per = (    1       . . .     N      )
        ( r^per_21   . . .  r^per_2N  )  = (r_1, . . . , r_N) ,          (9.118)
        (   ...               ...     )
        ( r^per_p1   . . .  r^per_pN  )

which is a permutation equivalent of R (cf. (9.117)).
Since the p observations y_kij (j = 1, . . . , p) are not independent, the common distribution of the elements of R (or of R^per) depends on the unknown distributions, even if H0 holds.

Let R_per denote the set of all possible realizations of R^per. For the size of R_per, we have

|R_per| = (N!)^{p−1} .                                                  (9.119)

In general, the distribution of R^per over R_per is dependent on the distributions F₁ and F₂.

If, however, H0: F₁ = F₂ holds, then the observation vectors y_ki (k = 1, 2, i = 1, . . . , n_k) are independent and identically distributed. Hence, their common distribution stays invariant under a permutation of the vectors among themselves, i.e., it is of no great importance from which treatment group the vectors are derived.

This means, however, that under H0, R is uniformly distributed over the set of the N! possible realizations that we get by all possible permutations of the columns of R^per.
Hence, we have

P(R = r_S | R^per, H0) = 1/N!   for all r_S ∈ R_per .                   (9.120)

Denote this (conditional) probability distribution by P₀.
Assume that the N rank observation vectors r_ki, k = 1, 2 (i = 1, . . . , n_k) (cf. (9.116)) are known and that they are represented by R^per; then the following holds (cf. Koch, 1969):

The probability that a rank observation vector r_ki takes the value r is

P(r_ki = r) = (N − 1)!/N! = 1/N   for r = r_1, . . . , r_N .            (9.121)
Hence, for the expectation of r_ki (k = 1, 2, i = 1, . . . , n_k), we have

E(r_ki | H0) = Σ_{j=1}^{N} (1/N) r_j = (1/N) (N(N + 1)/2) 1_p = ((N + 1)/2) 1_p .      (9.122)
For the construction of an appropriate test statistic, we define the rank mean vector of the kth group

r̄_k. = (1/n_k) Σ_{i=1}^{n_k} r_ki ,   k = 1, 2 .                       (9.123)

With (9.122), we obtain

E(r̄_k.) = ((N + 1)/2) 1_p .                                            (9.124)
The hypothesis H0 can now be tested with the following test statistic (cf. Puri and Sen, 1971, p. 186):

L_I = Σ_{k=1}^{2} n_k (r̄_k. − ((N + 1)/2) 1_p)' S_I^{−1} (r̄_k. − ((N + 1)/2) 1_p) ,      (9.125)

where we assume that the empirical rank covariance matrix S_I is regular.

Remark. The matrix S_I measures the interaction treatment × time. If no interaction exists, S_I equals the identity matrix (except for a variance factor) and the multivariate test statistic L_I equals the univariate statistic of Kruskal–Wallis (cf. (4.134)).
We have

S_I = (1/N) Σ_{k=1}^{2} Σ_{i=1}^{n_k} (r_ki − ((N + 1)/2) 1_p)(r_ki − ((N + 1)/2) 1_p)' .      (9.126)

The test statistic L_I is the multivariate version of the statistic of the Kruskal–Wallis test and is equivalent to a generalized Lawley–Hotelling T²-statistic. It can be shown that L_I has an asymptotic χ²-distribution under H0 with p degrees of freedom (cf. Puri and Sen, 1971, p. 193). Based on the construction of the test, large values of L_I indicate a violation of the null hypothesis H0 from (9.112). Hence, H0 is rejected if

L_I ≥ χ²_{p;1−α} .                                                      (9.127)
Example 9.3. In the following, we demonstrate the calculation of the test statistic by means of a simple example. Suppose that we are given the following data set for p = 3 repeated measures:

Group 1:    2   3   6        Group 2:    8  14  10
            5   6   4                   10  12  14
            4   5   5                   12  13  12

⇒ ranks:    1   1   3                    4   6   4
            3   3   1                    5   4   6
            2   2   2                    6   5   5

R = ( 1  3  2  4  5  6 )
    ( 1  3  2  6  4  5 )  = ( r_11  r_12  r_13  r_21  r_22  r_23 ) .
    ( 3  1  2  4  6  5 )
The rank means in the two therapy groups are

r̄_1. = (1/n₁)(r_11 + r_12 + r_13)
     = (1/3)[(1, 1, 3)' + (3, 3, 1)' + (2, 2, 2)'] = (1/3)(6, 6, 6)' = (2, 2, 2)' ,

r̄_2. = (1/3)[(4, 6, 4)' + (5, 4, 6)' + (6, 5, 5)'] = (1/3)(15, 15, 15)' = (5, 5, 5)' .
From this we calculate, according to (9.125),

r̄_i. − ((N + 1)/2) 1_p = r̄_i. − ((6 + 1)/2) 1₃   (i = 1, 2) ,

(r̄_1. − (7/2) 1₃) = −(3/2) 1₃ ,
(r̄_2. − (7/2) 1₃) =  (3/2) 1₃ .
This yields the covariance matrix S_I from (9.126),

S_I = (1/(6 · 4)) ( 70  58  50 )
                  ( 58  70  38 )
                  ( 50  38  70 )

and

S_I^{−1} = (24/51840) (  3456   −2160   −1296 )
                      ( −2160    2400     240 )
                      ( −1296     240    1536 ) .

For L_I, from (9.125), we have

L_I = Σ_{k=1}^{2} n_k (r̄_k. − ((N + 1)/2) 1₃)' S_I^{−1} (r̄_k. − ((N + 1)/2) 1₃) = 6.00 .
Hence, the test for H0: m₁ = m₂ (cf. (9.112)) with

L_I = 6.00 < 7.81 = χ²_{3;0.95}

does not lead to a rejection of H0.
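The whole computation of Example 9.3 (ranks, S_I, and L_I) can be reproduced in a few lines of numpy; since the data contain no ties, simple ordinal ranking suffices:

```python
import numpy as np

# Data of Example 9.3: two groups, p = 3 repeated measures, n1 = n2 = 3
g1 = np.array([[2, 3, 6], [5, 6, 4], [4, 5, 5]], dtype=float)
g2 = np.array([[8, 14, 10], [10, 12, 14], [12, 13, 12]], dtype=float)
y = np.vstack([g1, g2])
N, p = y.shape                                 # N = 6, p = 3

# Occasion-wise ranks (9.114); double argsort gives ranks 1..N per column
ranks = np.argsort(np.argsort(y, axis=0), axis=0) + 1.0

center = (N + 1) / 2 * np.ones(p)              # expectation (9.122)
dev = ranks - center
S_I = dev.T @ dev / N                          # empirical rank covariance (9.126)

L_I = 0.0
for rk in (ranks[:len(g1)], ranks[len(g1):]):  # the two groups
    d = rk.mean(axis=0) - center
    L_I += len(rk) * d @ np.linalg.solve(S_I, d)   # test statistic (9.125)

assert abs(L_I - 6.0) < 1e-8                   # matches the hand computation
assert L_I < 7.81                              # chi^2_{3;0.95}: H0 not rejected
```

With ties present, the averaged-rank assignment mentioned after (9.114) would replace the double-argsort step.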
9.9 Categorical Regression for the Repeated Binary Response Data

9.9.1 Logit Models for the Repeated Binary Response for the Comparison of Therapies

Unlike the previous sections of this chapter, we now assume a categorical response. In order to explain the problems, we start with a binary response y_ijk = 1 or y_ijk = 0. These categories can stand for a reaction above/below an average. For example, whether the blood pressure of each patient lies above or below the median blood pressure of a control group is measured in this way.
Let I = 2 (response categories) and assume two therapies (P: placebo and M: treatment) to be compared. We define the logit for the response distribution of the kth subpopulation (therapy P or M, i.e., k = 1 or k = 2) for occasion j (j = 1, . . . , m) as

L(j; k) = ln [P₁(j; k)/P₂(j; k)] .                                      (9.128)

The independence model in effect coding

L(j; k) = µ + λ^P_1 + λ^V_j   (j = 1, . . . , m − 1)                    (9.129)

contains the main effects

λ^P_1 : placebo effect,
λ^V_j (j = 1, . . . , m − 1) : occasion effects,

where the constraints of effect coding (cf. Chapter 6) hold:

λ^M_2 = −λ^P_1   (treatment effect),                                    (9.130)

λ^V_m = −Σ_{j=1}^{m−1} λ^V_j .                                          (9.131)

The inclusion of interaction effects λ^{PV}_{1j} is possible (saturated model).

The ML estimation of the parameters of model (9.129) is quite complicated, since marginal probabilities, which are estimated from the marginal frequencies, are used for the odds. These marginal frequencies, however, do not have independent multinomial distributions. The ML estimation has to be achieved by maximizing the likelihood under the constraint that the marginal distributions satisfy the model [(9.129)] of the null hypothesis. For this, iterative procedures (e.g., Koch, Landis, Freeman, Freeman and Lehnen (1977); Aitchison and Silvey (1958)) have to be applied. These procedures replace the necessary nonlinear optimization under linear constraints by stepwise weighted ordinary least squares estimates, and the iterated ML estimates are again used to form the standard χ² or G² goodness-of-fit statistics.
9.9.2 First–Order Markov Chain Models
A Markov chain of lth order X_t is a stochastic process with a "memory" of length l; i.e., in the case of l = 1, we have, for a given occasion t,

P(X_{t+1} | X_0, . . . , X_t) = P(X_{t+1} | X_t) .                      (9.132)

Hence, the conditional probability of a future value X_{t+1} depends only on the preceding value X_t and not on the past X_0, . . . , X_{t−1}. The common density of (X_0, . . . , X_m) is then of the form

f(x_0, . . . , x_m) = f(x_0) · f(x_1 | x_0) · . . . · f(x_m | x_{m−1}) .     (9.133)

Hence the common distribution depends only on the starting distribution f(x_0) and on the conditional transition probabilities f(x_i | x_{i−1}). This corresponds to a loglinear model with the effects

(X_0, (X_0, X_1), (X_1, X_2), . . . , (X_{m−1}, X_m)) .                 (9.134)

Remark. The transformation of the first-order Markov chain into categorical time-dependent response is the nonparametric counterpart of modeling the process as a time series with first-order autocorrelated errors.
Applied to our problem of binary response X_j at occasions t_j (j = 1, . . . , m) in the comparison of two therapies (P and M), the probabilities

P_{α,β}(j − 1, j) ,   α, β = 1, 2 (response),                           (9.135)

specify the common distribution of X_{j−1} and X_j.

The conditional probability that the process is in state α = i at occasion j, given that it was in state α = k (i, k = 1, 2) at occasion j − 1, equals

π_{i/k}(j) = P(X_j = i | X_{j−1} = k) = P_{i,k}(j − 1, j) / Σ_{i=1}^{2} P_{i,k}(j − 1, j) .      (9.136)

Hence, the modeling of this process is equivalent to the loglinear model [(9.137)]. We find the estimates of the π_{i/k}(j) by constructing a contingency table and counting the frequencies of the possible events. By means of observations in the subpopulations of the prognostic factor (placebo/treatment), we get the estimates π̂^P_{i/k}(j) and π̂^M_{i/k}(j) for both subpopulations.
Example 9.4. Binary response X_j, binary prognostic factor (placebo, treatment). Assume

X^M_j and X^P_j = 1 if the blood pressure of the patient lies above the median of the placebo group at the jth occasion, and 0 if below.

We choose the following fictitious numbers for a therapy group, in order to illustrate the calculation of the estimates of π_{i/k}(j):
      j = 1              j = 2
   1     80           1     60
   0     20           0     40
        100                100

Assume the following counts of transitions for each patient:

   j = 1    j = 2    Number of transitions
     1        1              50
     1        0              30
     0        1              10
     0        0              10
                            100

This yields

P_{1,1}(1, 2) = 50/100 = 0.5 ,
P_{1,0}(1, 2) = 30/100 = 0.3 ,
P_{0,1}(1, 2) = 10/100 = 0.1 ,
P_{0,0}(1, 2) = 10/100 = 0.1 .
Hence the estimated conditional transition probabilities are

π̂_{1/1}(2) = 0.5/0.8 = 0.625 ,   π̂_{0/1}(2) = 0.3/0.8 = 0.375 ,   Σ = 1 ,

π̂_{1/0}(2) = 0.1/0.2 = 0.5 ,     π̂_{0/0}(2) = 0.1/0.2 = 0.5 ,     Σ = 1 .
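The estimates can be reproduced from the transition counts; a minimal sketch in plain Python:

```python
# Transition counts from Example 9.4, keyed as (state at j=1, state at j=2)
counts = {(1, 1): 50, (1, 0): 30, (0, 1): 10, (0, 0): 10}
n = sum(counts.values())                       # 100 patients

P = {ab: c / n for ab, c in counts.items()}    # joint probabilities P_{a,b}(1, 2)

# Conditional transition probabilities pi_{i/k}(2) = P(X_2 = i | X_1 = k):
# divide each joint probability by the marginal of the conditioning state k
pi = {(i, k): P[(k, i)] / (P[(k, 0)] + P[(k, 1)])
      for k in (0, 1) for i in (0, 1)}

assert abs(pi[(1, 1)] - 0.625) < 1e-12 and abs(pi[(0, 1)] - 0.375) < 1e-12
assert abs(pi[(1, 0)] - 0.5) < 1e-12 and abs(pi[(0, 0)] - 0.5) < 1e-12
```

The rows of the estimated transition matrix sum to one by construction, matching the Σ = 1 checks above.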
Remark. The separate modeling for each therapy group by a loglinear model

ln(π_{i/k}(j)) = µ + λ^{X₀X₁}_1 + · · · + λ^{X_{m−1}X_m}_m              (9.137)

gives an insight into significant transitions and filters out the best model according to the G² criterion.

If both therapy groups are included in one joint model, i.e., if the indicator placebo/therapy is chosen as a third dimension, then local statements within the scope of the discrete Markov chain models of the following form can be tested:

H0: The effects of the treatment λ^M_{0,j} = −λ^P_{1,j} on the transition probabilities π_{1/0}(j) are significant (or significant at some occasions of the day's rhythm of blood pressure).

The actual aim—a global measure (overall superiority) or a global test for H0: "placebo = treatment"—cannot be achieved directly with this model, but only via an additional consideration.
9.9.3 Multinomial Sampling and Loglinear Models for a Global Comparison of Therapies

We assume the response of a patient to therapy A or B to be a categorical response (e.g., binary response) over m occasions. Thus, for each therapy, we have m dependent (correlated) response values. If the response is observed in I categories, then the possible response values for the m occasions can be represented in an I^m-dimensional contingency table. Table 9.8 corresponds to I = 2 and m = 4.

Example 9.5. I = 2 (binary response), m = 4 occasions. Coding of the response: 1; coding of the nonresponse: 0. Denote by i = (i₁, . . . , i_m) the cell in the table corresponding to response i_j = 1 or i_j = 0 (j = 1, . . . , m) at the occasions t₁, . . . , t_m, and by π_i the probability for this cell. We then have

Σ_{1}^{I^m} π_i = 1 .                                                   (9.138)

Let m_i = nπ_i be the expected cell count of the ith cell. Let the I categories be indexed by h (h = 1, . . . , I) and let P_h(j) be the probability of response h at occasion j. The P_h(j), h = 1, . . . , I, for given j, then form the jth marginal distribution of the contingency table.

We now consider Table 9.8 with m = 4 occasions. For each therapy group (P or M), we count separately the completely crossed experimental design for the binary response (e.g., 1: above the median blood pressure of the placebo group at occasion j; 0: below), i.e., the 2⁴ table. We now classify the response according to the independent multinomial scheme M(n; π₁, . . . , π₅):
                        Occasion
  Response      i     1  2  3  4     Number
  4 times       1     1  1  1  1       n₁
  3 times       2     1  1  1  0       n₂
                3     1  1  0  1       n₃
                4     0  1  1  1       n₄
                5     1  0  1  1       n₅
  2 times       6     1  1  0  0       n₆
                7     1  0  1  0       n₇
                8     1  0  0  1       n₈
                9     0  1  1  0       n₉
               10     0  1  0  1       n₁₀
               11     0  0  1  1       n₁₁
  1 time       12     1  0  0  0       n₁₂
               13     0  1  0  0       n₁₃
               14     0  0  1  0       n₁₄
               15     0  0  0  1       n₁₅
  0 times      16     0  0  0  0       n₁₆
                                        n

Table 9.8. 2⁴ table.
Class 1: 4 times response 1, 0 times nonresponse 0 ⇒ row 1 of Table 9.8.
Class 2: 3 times response 1, 1 time nonresponse 0 ⇒ rows 2–5.
Class 3: 2 times response 1, 2 times nonresponse 0 ⇒ rows 6–11.
Class 4: 1 time response 1, 3 times nonresponse 0 ⇒ rows 12–15.
Class 5: 0 times response 1, 4 times nonresponse 0 ⇒ row 16.

If both therapies (P/M) are included, we receive a 5 × 2 table. The disjoint categories of the rows are often called profiles.
Cumulated number
of response 1        P       M
      0             n₁₁     n₁₂
      1             n₂₁     n₂₂
      2             n₃₁     n₃₂
      3             n₄₁     n₄₂
      4             n₅₁     n₅₂
                    n₊₁     n₊₂
Since P and M are independent and since the columns follow the model of the independent multinomial scheme M(n₊₁; π^P) or M(n₊₂; π^M), respectively, the null hypothesis H0: "independent decomposition according to cumulated response and therapy" can, equivalently, be formulated by a loglinear model (m_ij: cell frequencies expected under H0)

ln(m_ij) = µ + λ^R_i + λ^P_1 + λ^{RP}_{i1} ,                            (9.139)

where

µ is the total mean;
λ^R_i is the effect of the ith cumulated response category (ith profile);
λ^P_1 is the effect of the placebo; and
λ^{RP}_{i1} is the interaction of the ith response category and the placebo.

If effect coding is chosen, the effect of the treatment is λ^M_1 = −λ^P_1.
Example 9.6. We illustrate the global test on a 13-hour blood pressure dataset. The data set consists of measures of n1 = 63 and n2 = 64 patientsof the therapy groups P (placebo) and M (treatment) over a stretch ofm = 13 hours (start: j = 0, then 12 measures taken in 1–hour intervals).For each patient, it is recorded to which cumulated response category i(i = 0, . . . , 13) he belongs, with i : number of hourly blood pressuresabove the median of the jth hourly measurement of the placebo group(j = 0, . . . , 12).
The results are shown in Table 9.9. Table 9.10 shows these results summarized according to groups (0, 1), (2, 3), . . . , (12, 13) (in order to overcome zero counts in the cells). The parameter estimates and the standardized parameter estimates (∗: significance at the two-sided level of 5%, i.e., comparison with u0.95(two-sided) = 1.96) are shown in Table 9.11.
Remark. The calculations have been done with the newly developed software LOGGY 1.0 (cf. Heumann and Jacobsen, 1993), the standard software PCS, as well as additional programs.
9.9 Categorical Regression for the Repeated Binary Response Data 435
i      P     M     Σ
0      5    30    35
1      7     7    14
2      3     6     9
3      4     6    10
4      3     5     8
5      3     3     6
6      5     2     7
7      6     0     6
8      3     2     5
9      9     0     9
10     5     0     5
11     2     2     4
12     2     1     3
13     6     0     6
Σ     63    64   127

Table 9.9. Classification of the 12-hour measures at the end point according to "i times blood pressure values above the respective hourly median of the placebo group".
          P     M     Σ
0, 1     12    37    49
2, 3      8    12    20
4, 5      6     8    14
6, 7     10     2    12
8, 9     13     2    15
10, 11    7     2     9
12, 13    7     1     8
Σ        63    64   127

Table 9.10. Summary of the classes in Table 9.9.
Interpretation
(i) Saturated model
ln(mij) = µ + λ^R_i + λ^P_1 + λ^RP_i1 .     (9.140)
The test statistic for H0: "saturated model valid" is G² = 0 (perfect fit), as usual.
The placebo effect λ^P_1 = 0.35 (standardized: 2.57) is significant. Since code 1 symbolizes high blood pressure (above the respective hourly median of the placebo group), a positive λ^P_1 stands for an effect toward higher blood pressure. Hence (λ^M_1 = −0.35), the treatment
Parameter     Estimate    Significant    Standardized
µ              1.81        ∗              13.42
λ^P_1          0.35        ∗               2.57
λ^R_1          1.24        ∗               6.35
λ^R_2          0.47        ∗               2.00
λ^R_3          0.12                        0.47
λ^R_4         −0.31                       −0.89
λ^R_5         −0.18                       −0.53
λ^R_6         −0.49                       −1.35
λ^R_7         −0.84        ∗              −1.98
λ^RP_11       −0.91        ∗              −4.67
λ^RP_21       −0.55        ∗              −2.34
λ^RP_31       −0.49                       −1.85
λ^RP_41        0.46                        1.29
λ^RP_51        0.59                        1.69
λ^RP_61        0.28                        0.77
λ^RP_71        0.63                        1.33

Table 9.11. Parameter estimates and standardized values for the saturated model ln(mij) = µ + λ^R_i + λ^P_1 + λ^RP_i1.
significantly lowers the blood pressure.
The significant response effects λ^R_1 (categories 0 and 1 times above the median) and λ^R_2 (2 and 3 times above the median) are positive, and λ^R_7 (10 and 11 times above the median) is negative. These results once again speak (in a qualitative way) for the blood pressure lowering effect of the treatment. The interactions are hard to interpret separately.
The analysis of the submodels of the hierarchy leads to the following results:
(ii) Independence model
H0 : ln(mij) = µ + λ^R_i + λ^P_1 .     (9.141)
The test value G² = 37 (p-value 0.000002) is significant; hence H0 [(9.141)] is rejected.
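The independence test can be reproduced directly from the counts in Table 9.10. A minimal sketch in plain Python: the likelihood-ratio statistic is G² = 2 Σ nij ln(nij/m̂ij), with expected cell counts m̂ij = (row total)(column total)/n under model (9.141).

```python
import math

# Table 9.10: grouped response profiles (rows) by therapy P/M (columns)
counts = [(12, 37), (8, 12), (6, 8), (10, 2), (13, 2), (7, 2), (7, 1)]

n = sum(p + m for p, m in counts)                       # total sample size, 127
col = [sum(row[j] for row in counts) for j in (0, 1)]   # column margins 63, 64

# likelihood-ratio statistic for the independence model (9.141)
G2 = 0.0
for row in counts:
    r = sum(row)
    for j in (0, 1):
        m_hat = r * col[j] / n            # expected cell frequency under H0
        G2 += 2 * row[j] * math.log(row[j] / m_hat)

df = (len(counts) - 1) * (2 - 1)          # (7 - 1)(2 - 1) = 6
print(round(G2, 1), df)                   # ≈ 37.3 on 6 df, matching G² = 37
```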
(iii) Model for isolated profile effects
H0 : ln(mij) = µ + λ^R_i .     (9.142)
The test value G² = 37 (7 df) is significant as well (H0 [(9.142)] is rejected).
(iv) Model for isolated treatment effect
H0 : ln(mij) = µ + λ^P_1 .     (9.143)
The test value is G² = 90 (12 df) and hence significant.
As a result, it can be stated that the saturated model is the only possible statistical model for the observed profiles of the two subpopulations placebo and treatment. This model indicates:
– a blood pressure lowering effect of the treatment;
– profile effects;
and gives evidence for:
– significant interactions.
As an interesting result, it can be stated that the therapy effect is not isolated (i.e., it is not an orthogonal component), but interacts with the time elapsed after taking the treatment.
This analysis is confirmed by the following crude-rate analysis, for which the profiles 0–6 and 7–13 were combined:
        P     M     Σ
0–6    32    59    91
7–13   31     5    36
Σ      63    64   127
The saturated model
ln(mij) = µ + λ^R_1 + λ^P_1 + λ^RP_11     (9.144)

yields the significant parameter estimates

                  µ        λ^R_1     λ^P_1     λ^RP_11
Estimate         3.15      0.63      0.30      −0.61
Standardized    23.77 ∗    4.72 ∗    2.69 ∗    −4.60 ∗
In the saturated model we have, for the odds ratio,

θ = exp(4 λ^RP_11) ,

i.e.,

θ = exp(−2.44) = 0.087 ,
ln θ = −2.44 (negative interaction).
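These estimates can be verified by hand: under effect coding, the parameters of a saturated 2 × 2 loglinear model are simple contrasts of the log counts, and the log odds ratio equals 4λ^RP_11 exactly. A sketch in plain Python using the crude 2 × 2 table above:

```python
import math

# crude 2x2 table: rows = profiles 0-6 / 7-13, columns = P / M
n11, n12, n21, n22 = 32, 59, 31, 5
l = [[math.log(n11), math.log(n12)],
     [math.log(n21), math.log(n22)]]                  # log counts

mu = (l[0][0] + l[0][1] + l[1][0] + l[1][1]) / 4      # grand mean of log counts
lamR = (l[0][0] + l[0][1]) / 2 - mu                   # row (profile) effect
lamP = (l[0][0] + l[1][0]) / 2 - mu                   # column (placebo) effect
lamRP = l[0][0] - mu - lamR - lamP                    # interaction

theta = (n11 * n22) / (n12 * n21)                     # sample odds ratio
print(round(mu, 2), round(lamR, 2), round(lamP, 2), round(lamRP, 2))
# 3.15 0.62 0.3 -0.61  (cf. the estimates above, up to rounding)
print(round(theta, 3), round(math.log(theta), 2))     # 0.087 -2.44
```

The identity ln θ = 4λ^RP_11 holds because, in the saturated 2 × 2 model, the interaction is a quarter of the log cross-product ratio.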
The crude model of the 2 × 2 table is regarded as a robust indicator of interactions that, in general, can be broken down into finer structures. The advantage of the 2 × 2 table is that it estimates a crude interaction over all levels of the row categories.
Remark. The model calculations assume a Poisson sampling scheme for the contingency table, i.e., unrestricted random sampling, in particular a random total sample size.

The sampling scheme is restricted to independent multinomial sampling in the case of the model of therapy comparison. Birch (1963) has proved that the ML estimates are identical for independent multinomial sampling and Poisson sampling, as long as the model contains a term (parameter) for the marginal distribution given by the experimental design. For our case of therapy comparison, this means that the marginal sums n+1 and n+2 (i.e., the numbers of patients in the placebo group and the treated group) have to appear as sufficient statistics in the parameter estimates. This is the case in:
(i) the saturated model (9.140);
(ii) the independence model (9.141);
(iii) the model for isolated profile effects (9.142);
but not in:
(iv) the model for the isolated treatment effect (9.143).
As our model calculations show, model (9.143) is of no interest, since a treatment effect cannot be detected in isolation, but only in interaction with the profiles.
Remark. Tables 9.9 and 9.10 differ slightly due to patients whose blood pressure coincides with the hourly median.
Trend of the Profiles of the Medicated Group
As another nonparametric indicator of the blood pressure lowering effect of the treatment, we now model the crude binary risk "7–12 times over the respective placebo hourly median" versus "0–6 times over the median" over three observation days (i.e., i = 1, 2, 3) by a logistic regression. The results are shown in Table 9.12.
i    7–12    0–6    Logit
1     34     32     0.06
2     12     51    −1.45
3      5     59    −2.47

Table 9.12. Crude profile of the medicated group for the three observation days.
From this we calculate the model

ln(ni1/ni2) = α + β i   (i = 1, 2, 3)
            = 1.243 − 1.265 · i ,     (9.145)

with the correlation coefficient r = −0.9938 (p-value 0.0354, one-sided) and the residual variance σ̂² = 0.22.
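The fit (9.145) is an ordinary least-squares line through the three empirical logits of Table 9.12; a sketch in plain Python:

```python
import math

# Table 9.12: counts for the three observation days of the medicated group
n1 = [34, 12, 5]    # 7-12 times above the hourly placebo median
n2 = [32, 51, 59]   # 0-6 times
x = [1, 2, 3]
y = [math.log(a / b) for a, b in zip(n1, n2)]   # empirical logits

# ordinary least squares: y = alpha + beta * x
xbar, ybar = sum(x) / 3, sum(y) / 3
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
beta = sxy / sxx
alpha = ybar - beta * xbar
r = sxy / math.sqrt(sxx * syy)

print(round(alpha, 3), round(beta, 3), round(r, 4))
# 1.244 -1.264 -0.9939  (cf. 1.243 - 1.265*i in (9.145))
```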
Hence, the negative trend to fall into the unfavorable profile group "7–12" is significant for this model (three observations, two parameters!). However, this result can only be regarded as a crude indicator. More reliable results are achieved with Table 9.13, which is subdivided into seven groups instead of only two profiles.
i    0–1    2–3    4–5    6–7    8–9    10–11    12–13
1      4     10     10     13      8      13        8
2     29     14      7      4      4       2        1
3     37     12      8      2      2       2        1

Table 9.13. Fine profiles of the medicated group for the three observation days.
The G² analysis of Table 9.13 for testing H0: "cell counts over the profiles and days are independent" yields a significant value of G² = 70.50 (> 23.7 = χ²_14;0.95), so that H0 is rejected.
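The test value can be reproduced from the counts in Table 9.13; a minimal sketch in plain Python (G² = 2 Σ nij ln(nij/m̂ij), with expected counts m̂ij = (row total)(column total)/n under independence):

```python
import math

# Table 9.13: observation days (rows) by fine profile groups (columns)
table = [[4, 10, 10, 13, 8, 13, 8],
         [29, 14, 7, 4, 4, 2, 1],
         [37, 12, 8, 2, 2, 2, 1]]

n = sum(map(sum, table))
col = [sum(row[j] for row in table) for j in range(7)]

G2 = 0.0
for row in table:
    r = sum(row)
    for j in range(7):
        m_hat = r * col[j] / n          # expected count under independence
        G2 += 2 * row[j] * math.log(row[j] / m_hat)

print(round(G2, 2))   # ≈ 70.5, far beyond the critical value 23.7
```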
9.10 Exercises and Questions

9.10.1 How is the correlation of an individual over the occasions defined? In which way are two individuals correlated? Name the intraclass correlation coefficient of an individual over two different occasions.

9.10.2 What structure does the compound symmetric covariance matrix have? Name the best linear unbiased estimate of β in the model y = Xβ + ε, ε ∼ (0, σ²Σ), with Σ of compound symmetric structure.

9.10.3 Why is the ordinary least-squares estimate chosen instead of the Aitken estimate in the case of compound symmetry?

9.10.4 Name the repeated measures model for two independent populations. Why can it be interpreted as a mixed model and as a split-plot design?

9.10.5 What is meant by the µk-profile of an individual?

9.10.6 How is the Wishart distribution defined?

9.10.7 How is H0 : µ = µ0 (one-sample problem) tested univariately for x1, . . . , xn independent and identically distributed ∼ Np(µ, Σ)?

9.10.8 How is H0 : µx = µy (two-sample problem) tested multivariately for x1, . . . , xn1 ∼ Np(µx, Σx) and y1, . . . , yn2 ∼ Np(µy, Σy)? Which conditions have to hold true?

9.10.9 Describe the test strategy (univariate/multivariate) depending on the fulfillment of the sphericity condition.
10 Cross–Over Design
10.1 Introduction
Clinical trials form an important part of the examination of new drugs or medical treatments. The drugs are usually assessed by comparing their effects on human subjects. From an ethical point of view, the risks to which patients might be exposed must be reduced to a minimum, and the number of individuals should also be as small as statistically required. Cross-over trials address the latter point by treating each patient successively with two or more treatments. For that purpose, the individuals are divided into randomized groups in which the treatments are given in certain orders. In a 2 × 2 design, each subject receives two treatments, conventionally labeled as A and B. Half of the subjects receive A first and then cross over to B, while the remaining subjects receive B first and then cross over to A. Between two treatments, a suitable period of time is chosen during which no treatment is applied. This washout period is used to avoid the persistence of a treatment applied in one period into a subsequent period of treatment.
The aim of cross-over designs is to estimate most of the main effects using within-subject differences (or contrasts). Since there is often considerably more variation between subjects than within subjects, this strategy leads to more powerful tests than simply comparing two independent groups using between-subject information. As each subject acts as his own control, between-subject variation is eliminated as a source of error.
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_10, © Springer Science+Business Media, LLC 2009
If the washout periods are not chosen long enough, then a treatment may persist into a subsequent period of treatment. This carry-over effect makes it more difficult, or nearly impossible, to estimate direct treatment effects.
To avoid psychological effects, subjects are treated in a double-blinded manner so that neither patients nor doctors know which of the treatments is actually applied.
10.2 Linear Model and Notations
We assume that there are s groups of subjects. Each group receives the M treatments in a different order. It is favorable to use all of the M! orderings of treatments, i.e., to use the orderings AB and BA for comparison of M = 2 treatments and ABC, BCA, CAB, ACB, CBA, BAC for M = 3 treatments, so that s = M!.
We generally assume that the trial lasts p periods (i.e., p = M periods if all possible orderings are used). Let yijk denote the response observed on the kth subject (k = 1, . . . , ni) of group i (i = 1, . . . , s) in period j (j = 1, . . . , p). We first consider the following linear model (cf. Jones and Kenward, 1989, p. 9), which Ratkowsky, Evans and Alldredge (1993, pp. 81–84) label as parametrization 1:
yijk = µ + sik + πj + τ[i,j] + λ[i,j−1] + εijk , (10.1)
where

yijk is the response of the kth subject of group i in period j;
µ is the overall mean;
sik is the effect of subject k in group i (i = 1, . . . , s; k = 1, . . . , ni);
πj is the effect of period j (j = 1, . . . , p);
τ[i,j] is the direct effect of the treatment administered in period j of group i (treatment effect);
λ[i,j−1] is the carry-over effect (effect of the treatment administered in period j − 1 of group i) that still persists in period j, with λ[i,0] = 0; and
εijk is random error.
The subject effects sik are taken to be random. Sample totals will be denoted by capital letters, sample means by small letters. A dot (·) replaces a subscript to indicate that the data have been summed over that subscript. For example,

totals:  Yij· = Σ_{k=1}^{ni} yijk ,   Yi·· = Σ_{j=1}^{p} Yij· ,   Y··· = Σ_{i=1}^{s} Yi·· ;
means:   yij· = Yij·/ni ,   yi·· = Yi··/(p ni) ,   y··· = Y···/(p Σ_{i=1}^{s} ni) .     (10.2)
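The dot notation in (10.2) can be sketched directly in code. A plain-Python illustration, using the responses of Example 10.1 below (y[i][j][k] is the response of subject k of group i in period j):

```python
# responses y[i][j][k] from Example 10.1 (sleep prolongation in minutes)
y = [
    [[20, 40, 30, 20], [30, 50, 40, 40]],   # group 1: period 1 (A), period 2 (B)
    [[30, 40, 20, 30], [20, 50, 10, 10]],   # group 2: period 1 (B), period 2 (A)
]
s, p = len(y), len(y[0])
n = [len(y[i][0]) for i in range(s)]        # subjects per group

# a dot replaces a subscript that has been summed over
Y_ij = [[sum(y[i][j]) for j in range(p)] for i in range(s)]       # Y_ij.
Y_i = [sum(Y_ij[i]) for i in range(s)]                            # Y_i..
Y = sum(Y_i)                                                      # Y_...

y_ij = [[Y_ij[i][j] / n[i] for j in range(p)] for i in range(s)]  # y_ij.
y_i = [Y_i[i] / (p * n[i]) for i in range(s)]                     # y_i..
y_bar = Y / (p * sum(n))                                          # y_...

print(Y_i, Y, y_i, y_bar)   # [270, 210] 480 [33.75, 26.25] 30.0
```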
To begin with, we assume that the response has been recorded on a continuous scale.
Remark. Model (10.1) may be called the classical approach and has been explored intensively since the 1960s (Grizzle, 1965). This parametrization, however, shows some inconsistencies concerning the effect caused by the order in which the treatments are given. This so-called sequence effect becomes important, especially regarding higher-order designs. For example, using the following plan in a cross-over trial,

              Period
Sequence    1   2   3   4
    1       A   B   C   D
    2       B   D   A   C
    3       C   A   D   B
    4       D   C   B   A

the actual sequence (group) might have a fixed effect on the response. Then the between-subject effect sik would also be stratified by sequences (groups). This effect would have to be considered as an additional parameter γi (i = 1, . . . , s) in model (10.1). Applying the classical approach (10.1) without this sequence effect leads to the sequence effect being confounded with other effects. We will discuss this fact later in this chapter.
10.3 2 × 2 Cross–Over (Classical Approach)
We now consider the common comparison of M = 2 treatments A and B (cf. Figure 10.1) using a 2 × 2 cross-over trial with p = 2 periods.
           Period 1    Period 2
Group 1       A           B
Group 2       B           A

Figure 10.1. 2 × 2 cross-over design with two treatments.
As there are only four sample means y11·, y12·, y21·, and y22· available from the 2 × 2 cross-over design, we can only use three degrees of freedom to estimate the period, treatment, and carry-over effects. Thus, we have to omit the direct treatment × period interaction, which now has to be estimated as an aliased effect confounded with the carry-over effect. Therefore, the 2 × 2 cross-over design has the special parametrization
τ1 = τA and τ2 = τB . (10.3)
The carry–over effects are simplified as
λ1 = λ[1,1] = λ[A,1] ,
λ2 = λ[2,1] = λ[B,1] .     (10.4)
Group     Period 1                       Period 2
1 (AB)    µ + π1 + τ1 + s1k + ε11k       µ + π2 + τ2 + λ1 + s1k + ε12k
2 (BA)    µ + π1 + τ2 + s2k + ε21k       µ + π2 + τ1 + λ2 + s2k + ε22k

Table 10.1. The effects in the 2 × 2 cross-over model.
Then λ1 and λ2 denote the carry-over effects of treatment A (resp. B) applied in the first period, so that the effects in the full model are as shown in Table 10.1. The subject effects sik are regarded as random.
The random effects are assumed to be distributed as follows:
sik  i.i.d. ∼ N(0, σ²_s) ,
εijk i.i.d. ∼ N(0, σ²) ,
E(εijk sik) = 0 (for all i, j, k).     (10.5)
10.3.1 Analysis Using t–Tests
The analysis of data from a 2 × 2 cross-over trial using t-tests was first suggested by Hills and Armitage (1979). Jones and Kenward (1989) note that these tests are valid whatever the covariance structure of the two measurements yA and yB taken on each subject during the active treatment periods.
Testing Carry–Over Effects, i.e., H0 : λ1 = λ2
The first test we consider is the test on equality of the carry-over effects λ1 and λ2. Only if equality is not rejected are the following tests on main effects valid, since the difference of the carry-over effects λd = λ1 − λ2 is the aliased effect of the treatment × period interaction.
We note that the subject total Y1·k of the kth subject in Group 1,

Y1·k = y11k + y12k ,     (10.6)

has the expectation (cf. Table 10.1)

E(Y1·k) = E(y11k) + E(y12k)
        = (µ + π1 + τ1) + (µ + π2 + τ2 + λ1)
        = 2µ + π1 + π2 + τ1 + τ2 + λ1 .     (10.7)
In Group 2 (BA) we get
Y2·k = y21k + y22k (10.8)
and
E(Y2·k) = 2µ + π1 + π2 + τ1 + τ2 + λ2 . (10.9)
Under the null hypothesis,
H0 : λ1 = λ2 , (10.10)
these two expectations are equal
E(Y1·k) = E(Y2·k) for all k. (10.11)
Now we can apply the two–sample t–test to the subject totals and define
λd = λ1 − λ2 . (10.12)
Then

λ̂d = Y1··/n1 − Y2··/n2 = 2(y1·· − y2··)     (10.13)

is an unbiased estimator for λd, i.e.,

E(λ̂d) = λd .     (10.14)
Using

Yi·k − E(Yi·k) = 2sik + εi1k + εi2k

and

Var(Yi·k) = 4σ²_s + 2σ² ,

we get

Var(Yi··/ni) = (1/n²_i) Σ_{k=1}^{ni} Var(Yi·k) = (4σ²_s + 2σ²)/ni   (i = 1, 2) .
Therefore we have

Var(λ̂d) = 2(2σ²_s + σ²) (1/n1 + 1/n2) = σ²_d (n1 + n2)/(n1 n2) ,     (10.15)

where

σ²_d = 2(2σ²_s + σ²) .     (10.16)
To estimate σ²_d we use the pooled sample variance

s² = [(n1 − 1)s²_1 + (n2 − 1)s²_2] / (n1 + n2 − 2) ,     (10.17)

which has (n1 + n2 − 2) degrees of freedom, with s²_1 and s²_2 denoting the sample variances of the subject totals within the two groups:

s²_i = [1/(ni − 1)] Σ_{k=1}^{ni} (Yi·k − Yi··/ni)²
     = [1/(ni − 1)] [ Σ_{k=1}^{ni} Y²_i·k − Y²_i··/ni ]   (i = 1, 2) .     (10.18)

We construct the test statistic

Tλ = (λ̂d / s) √(n1 n2 / (n1 + n2)) ,     (10.19)

which follows a Student's t-distribution with (n1 + n2 − 2) degrees of freedom under H0 [(10.10)].
According to Jones and Kenward (1989), it is usual practice, following Grizzle (1965), to run this test at the α = 0.1 level. If this test does not reject H0, we can proceed to test the main effects.
Testing Treatment Effects (Given λ1 = λ2 = λ)
If we can assume that λ1 = λ2 = λ, then the period differences

d1k = y11k − y12k   (Group 1, i.e., A–B) ,
d2k = y21k − y22k   (Group 2, i.e., B–A) ,     (10.20)

have expectations

E(d1k) = π1 − π2 + τ1 − τ2 − λ ,
E(d2k) = π1 − π2 + τ2 − τ1 − λ .     (10.21)
Under the null hypothesis H0 : no treatment effect, i.e.,
H0 : τ1 = τ2 , (10.22)
these two expectations coincide. The difference of the treatment effects

τd = τ1 − τ2     (10.23)

is estimated by

τ̂d = (1/2)(d1· − d2·) ,     (10.24)

which is unbiased,

E(τ̂d) = τd ,     (10.25)

and has variance

Var(τ̂d) = (2σ²/4)(1/n1 + 1/n2) = (σ²_D/4)(1/n1 + 1/n2) ,     (10.26)
where

σ²_D = 2σ² .     (10.27)

The pooled estimate of σ²_D is obtained according to (10.17), replacing s²_i by

s²_iD = [1/(ni − 1)] Σ_{k=1}^{ni} (dik − di·)² ,

which gives

s²_D = [(n1 − 1)s²_1D + (n2 − 1)s²_2D] / (n1 + n2 − 2) .     (10.28)
Under the null hypothesis H0 : τd = 0, the statistic

Tτ = [τ̂d / ((1/2) sD)] √(n1 n2 / (n1 + n2))     (10.29)

follows a t-distribution with (n1 + n2 − 2) degrees of freedom.
Testing Period Effects (Given λ1 + λ2 = 0)
Finally we test for period effects using the null hypothesis
H0 : π1 = π2 . (10.30)
The “cross–over” differences
c1k = d1k ,
c2k = −d2k ,     (10.31)

have expectations

E(c1k) = π1 − π2 + τ1 − τ2 − λ1 ,
E(c2k) = π2 − π1 + τ1 − τ2 + λ2 .     (10.32)
Under the null hypothesis H0 : π1 = π2 and the familiar reparametrization λ1 + λ2 = 0, these expectations coincide, i.e., E(c1k) = E(c2k). An unbiased estimator for the difference of the period effects πd = π1 − π2 is given by

π̂d = (1/2)(c1· − c2·) ,     (10.33)

and, with sD from (10.28), we get the test statistic

Tπ = [π̂d / ((1/2) sD)] √(n1 n2 / (n1 + n2)) ,     (10.34)

which again follows a t-distribution with (n1 + n2 − 2) degrees of freedom.
Unequal Carry–Over Effects
If the hypothesis λ1 = λ2 is rejected, the above procedure for testing τ1 = τ2 should not be used, since it is based on biased estimators. Given λd = λ1 − λ2 ≠ 0, we get

E(τ̂d) = E[(d1· − d2·)/2] = τd − λd/2 .     (10.35)

With

λ̂d = y11· + y12· − y21· − y22·     (10.36)

and

τ̂d = (1/2)(y11· − y12· − y21· + y22·) ,     (10.37)

an unbiased estimator τ̂d|λd of τd is given by

τ̂d|λd = (1/2)(y11· − y12· − y21· + y22·) + (1/2)(y11· + y12· − y21· − y22·)
       = y11· − y21· .     (10.38)

The unbiased estimator of τd for λd ≠ 0 is identical to the estimator of a parallel-group study: it is based on between-subject information from the first-period measurements only. Testing H0 : τd = 0 is then done with a two-sample t-test, using only the first-period measurements to estimate the variance. Thus, the sample size might become too small to obtain significant results for the treatment effect.
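For illustration, with the data of Example 10.1 below, the estimator (10.38) reduces to a plain comparison of the first-period group means (a sketch in plain Python):

```python
# first-period responses from Example 10.1
period1_A = [20, 40, 30, 20]   # group 1 received A first
period1_B = [30, 40, 20, 30]   # group 2 received B first

y11 = sum(period1_A) / len(period1_A)   # y_11. = 27.5
y21 = sum(period1_B) / len(period1_B)   # y_21. = 30.0

tau_d = y11 - y21                       # (10.38), unbiased even if lambda_d != 0
print(tau_d)                            # -2.5
```

For these data the carry-over test turns out not to be significant, so the within-subject estimator is used in Example 10.1; the first-period contrast is computed here only to illustrate (10.38).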
Regarding the reparametrization

λ1 + λ2 = 0 ,     (10.39)

we see that the estimator π̂d is still unbiased:

E(π̂d) = E[(c1· − c2·)/2]
       = (1/2) E[(1/n1) Σ_{k=1}^{n1} c1k − (1/n2) Σ_{k=1}^{n2} c2k]
       = (1/2) [(1/n1) Σ_{k=1}^{n1} E(c1k) − (1/n2) Σ_{k=1}^{n2} E(c2k)]
       = (1/2) [2π1 − 2π2 − (λ1 + λ2)]   [cf. (10.32)]
       = πd   [cf. (10.39)] .

Thus π̂d is unbiased even if λd = λ1 − λ2 ≠ 0, provided that λ1 + λ2 = 0.
10.3.2 Analysis of Variance
Considering higher-order cross-over designs, it is useful to test the effects using F-tests obtained from an analysis of variance table. Such a table was presented by Grizzle (1965) for the special case n1 = n2. The first general table was given by Hills and Armitage (1979). The sums of squares may be derived for the 2 × 2 cross-over design as a simple example of a split-plot design. The subjects form the main plots, while the periods are treated as the subplots at which repeated measurements are taken (cf. Section 7.8). With this in mind, we get
SSTotal = Σ_{i=1}^{2} Σ_{j=1}^{2} Σ_{k=1}^{ni} y²_ijk − Y²_··· / [2(n1 + n2)] ,

between-subjects:

SSCarry-over = [2 n1 n2 / (n1 + n2)] (y1·· − y2··)² ,

SSb-s Residual = Σ_{i=1}^{2} Σ_{k=1}^{ni} Y²_i·k / 2 − Σ_{i=1}^{2} Y²_i·· / (2ni) ,

within-subjects:

SSTreat = [n1 n2 / (2(n1 + n2))] (y11· − y12· − y21· + y22·)² ,

SSPeriod = [n1 n2 / (2(n1 + n2))] (y11· − y12· + y21· − y22·)² ,

SSw-s Residual = Σ_{i=1}^{2} Σ_{j=1}^{2} Σ_{k=1}^{ni} y²_ijk − Σ_{i=1}^{2} Σ_{j=1}^{2} Y²_ij· / ni − SSb-s Residual .
Source                       SS                 df             MS                 F
Between-subjects
  Carry-over                 SSc-o              1              MSc-o              Fc-o
  Residual (b-s)             SSResidual(b-s)    n1 + n2 − 2    MSResidual(b-s)
Within-subjects
  Direct treatment effect    SSTreat            1              MSTreat            FTreat
  Period effect              SSPeriod           1              MSPeriod           FPeriod
  Residual (w-s)             SSResidual(w-s)    n1 + n2 − 2    MSResidual(w-s)
Total                        SSTotal            2(n1 + n2) − 1

Table 10.2. Analysis of variance table for 2 × 2 cross-over designs (Jones and Kenward, 1989, p. 31; Hills and Armitage, 1979).
The F–statistics are built according to Table 10.3.
Under H0 : λ1 = λ2, the expressions MSc-o and MSResidual(b-s) have the same expectations, and we use the statistic Fc-o = MSc-o/MSResidual(b-s).
MS                  E(MS)
MSc-o               [2n1n2/(n1 + n2)] (λ1 − λ2)² + (2σ²_s + σ²)
MSResidual(b-s)     2σ²_s + σ²
MSTreat             [2n1n2/(n1 + n2)] [(τ1 − τ2) − (λ1 − λ2)/2]² + σ²
MSPeriod            [2n1n2/(n1 + n2)] (π1 − π2)² + σ²
MSResidual(w-s)     σ²

Table 10.3. Expected mean squares E(MS).
Assuming λ1 = λ2 and H0 : τ1 = τ2, MSTreat and MSResidual(w-s) have equal expectations σ². Therefore, we get FTreat = MSTreat/MSResidual(w-s). Testing for period effects does not depend on the assumption that λ1 = λ2 holds. Since MSPeriod and MSResidual(w-s) both have expectation σ² under H0 : π1 = π2, the statistic FPeriod = MSPeriod/MSResidual(w-s) follows a central F-distribution.
Example 10.1. A clinical trial is used to compare the effect of two soporifics A and B. Response is the prolongation of sleep (in minutes).

Group 1                       Patient
Period   Treatment     1     2     3     4      Y1j·    y1j·
1        A            20    40    30    20      110     27.5
2        B            30    50    40    40      160     40.0
Y1·k                  50    90    70    60      Y1·· = 270, Y1··/4 = 67.50, y1·· = 33.75
Differences d1k      −10   −10   −10   −20      d1· = −12.5

Group 2                       Patient
Period   Treatment     1     2     3     4      Y2j·    y2j·
1        B            30    40    20    30      120     30.0
2        A            20    50    10    10       90     22.5
Y2·k                  50    90    30    40      Y2·· = 210, Y2··/4 = 52.50, y2·· = 26.25
Differences d2k       10   −10    10    20      d2· = 7.5
t-Tests

H0 : λ1 = λ2 (no carry-over effect):

(10.13)  λ̂d = Y1··/4 − Y2··/4 = 270/4 − 210/4 = 15 ,

(10.18)  3 s²_1 = Σ_{k=1}^{4} (Y1·k − Y1··/n1)² = (50 − 67.5)² + · · · + (60 − 67.5)² = 875 ,

(10.18)  3 s²_2 = (50 − 52.5)² + · · · + (40 − 52.5)² = 2075 ,

(10.17)  s² = 2950/6 = 491.67 = 22.17² ,

(10.19)  Tλ = (15/22.17) √(16/8) = 0.96 .
Decision. Tλ = 0.96 < 1.94 = t6;0.90(two-sided) ⇒ H0 : λ1 = λ2 is not rejected. Therefore, we can go on testing the main effects.
H0 : τ1 = τ2 (no treatment effect).

We compute

d1· = (−10 − 10 − 10 − 20)/4 = −12.5 ,
d2· = (10 − 10 + 10 + 20)/4 = 7.5 ,

(10.24)  τ̂d = (1/2)(d1· − d2·) = −10 ,

3 s²_1D = Σ (d1k − d1·)² = (−10 + 12.5)² + · · · + (−20 + 12.5)² = 75 ,
3 s²_2D = (10 − 7.5)² + · · · + (20 − 7.5)² = 475 ,

(10.28)  s²_D = (75 + 475)/6 = 9.57² ,

(10.29)  Tτ = [−10 / (9.57/2)] √(4 · 4/(4 + 4)) = −2.96 .
Decision. With t6;0.95(two-sided) = 2.45 and t6;0.95(one-sided) = 1.94, the hypothesis H0 : τ1 = τ2 is rejected both one-sided and two-sided, which means a significant treatment effect.
H0 : π1 = π2 (no period effect).

We calculate

(10.33)  π̂d = (1/2)(c1· − c2·) = (1/2)(d1· + d2·) = (1/2)(−12.5 + 7.5) = −2.5 ,

(10.34)  Tπ = [−2.5 / (9.57/2)] √2 = −0.74 .
H0 : π1 = π2 cannot be rejected (one– and two–sided).
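The three t-statistics of this example can be verified in a few lines of plain Python (tiny discrepancies to the hand computation come from the rounding of s and sD above):

```python
import math

# Example 10.1: (period 1, period 2) responses per subject
g1 = [(20, 30), (40, 50), (30, 40), (20, 40)]   # group 1 (A then B)
g2 = [(30, 20), (40, 50), (20, 10), (30, 10)]   # group 2 (B then A)
n1, n2 = len(g1), len(g2)

def mean(v):
    return sum(v) / len(v)

def ssq(v):
    return sum((x - mean(v)) ** 2 for x in v)

factor = math.sqrt(n1 * n2 / (n1 + n2))

# carry-over (10.19): two-sample t-test on the subject totals Y_i.k
t1 = [a + b for a, b in g1]
t2 = [a + b for a, b in g2]
s = math.sqrt((ssq(t1) + ssq(t2)) / (n1 + n2 - 2))      # pooled, (10.17)
T_lambda = (mean(t1) - mean(t2)) / s * factor

# treatment (10.29): t-test on the period differences d_ik
d1 = [a - b for a, b in g1]
d2 = [a - b for a, b in g2]
sD = math.sqrt((ssq(d1) + ssq(d2)) / (n1 + n2 - 2))     # pooled, (10.28)
T_tau = (mean(d1) - mean(d2)) / 2 / (sD / 2) * factor

# period (10.34): t-test on the cross-over differences c_ik
T_pi = (mean(d1) + mean(d2)) / 2 / (sD / 2) * factor

print(round(T_lambda, 2), round(T_tau, 2), round(T_pi, 2))   # 0.96 -2.95 -0.74
```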
From the analysis of variance we get the same statistics, F1,6 = t²_6.
Source            SS     df    MS       F
Carry-over       225      1    225.00   0.92 = 0.96²
Residual (b-s)  1475      6    245.83
Treatment        400      1    400.00   8.73 = 2.96² ∗
Period            25      1     25.00   0.55 = 0.74²
Residual (w-s)   275      6     45.83
Total           2400     15
SSTotal = 16,800 − 480²/(2 · 8) = 2400 ,

SSc-o = [2 · 4 · 4/(4 + 4)] (33.75 − 26.25)² = 225 ,

SSResidual(b-s) = (1/2)(50² + 90² + · · · + 40²) − (270²/8 + 210²/8)
               = 32,200/2 − 117,000/8 = 16,100 − 14,625 = 1475 ,

SSTreat = [4 · 4/(2(4 + 4))] (27.5 − 40.0 − 30.0 + 22.5)² = (−20)² = 400 ,

SSPeriod = [4 · 4/(2(4 + 4))] (27.5 − 40.0 + 30.0 − 22.5)² = (−5)² = 25 ,

SSResidual(w-s) = 16,800 − (1/4)(110² + 160² + 120² + 90²) − 1475
               = 16,800 − 15,050 − 1475 = 275 .
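These sums of squares can be checked with a short plain-Python script:

```python
# Example 10.1 responses y[group][period][subject]
y = [[[20, 40, 30, 20], [30, 50, 40, 40]],
     [[30, 40, 20, 30], [20, 50, 10, 10]]]
n1, n2 = len(y[0][0]), len(y[1][0])
N = 2 * (n1 + n2)

flat = [v for g in y for per in g for v in per]
ss_total = sum(v * v for v in flat) - sum(flat) ** 2 / N

y1 = sum(map(sum, y[0])) / (2 * n1)                  # y_1.. = 33.75
y2 = sum(map(sum, y[1])) / (2 * n2)                  # y_2.. = 26.25
m = [[sum(per) / len(per) for per in g] for g in y]  # cell means y_ij.

ss_carry = 2 * n1 * n2 / (n1 + n2) * (y1 - y2) ** 2
ss_treat = n1 * n2 / (2 * (n1 + n2)) * (m[0][0] - m[0][1] - m[1][0] + m[1][1]) ** 2
ss_period = n1 * n2 / (2 * (n1 + n2)) * (m[0][0] - m[0][1] + m[1][0] - m[1][1]) ** 2

# residuals: between-subjects from the subject totals Y_i.k, rest within-subjects
tot = [[a + b for a, b in zip(g[0], g[1])] for g in y]
ss_bs = sum(t * t for g in tot for t in g) / 2 \
        - sum(sum(g) ** 2 / (2 * len(g)) for g in tot)
ss_ws = sum(v * v for v in flat) \
        - sum(sum(per) ** 2 / len(per) for g in y for per in g) - ss_bs

print(ss_total, ss_carry, ss_bs, ss_treat, ss_period, ss_ws)
# 2400.0 225.0 1475.0 400.0 25.0 275.0
```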
10.3.3 Residual Analysis and Plotting the Data
In addition to t- and F-tests, it is often desirable to represent the data using plots. We will now describe three methods of plotting the data which allow us to detect patients that are conspicuous by their response (outliers) and interactions such as carry-over effects.
Subject profile plots are produced for each group by plotting each subject's response against the period label. To summarize the data, we choose a groups-by-periods plot in which the group-by-period means are plotted against the period labels and points which refer to the same treatment are connected. Using Example 10.1 we get the following plots.
Figure 10.2. Individual profiles (Group 1): responses plotted against periods 1 (A) and 2 (B).
All patients in Group 1 show increasing response when they cross over from treatment A to treatment B. In Group 2, the profile of patient 2 (uppermost line) exhibits a decreasing response, while the other three profiles show an increasing tendency.
Figure 10.4 shows that in both periods treatment B leads to higher response than treatment A (difference of means B − A: 30 − 27.5 = 2.5 for period 1; 40 − 22.5 = 17.5 for period 2; so that τ̂d(B − A) = (1/2)(17.5 + 2.5) = 10 = −τ̂d(A − B)). It would also be possible to say that treatment A shows a slight carry-over effect that strengthens B (or B has a carry-over effect that reduces A). This difference in the treatment effects is not statistically significant according to the results we obtained from testing the treatment × period interaction (= carry-over effect). Without doubt, we can say that treatment A has lower response than treatment B in period 1, and this effect is even more pronounced in period 2. Another
Figure 10.3. Individual profiles (Group 2): responses plotted against periods 1 (B) and 2 (A).
Figure 10.4. Group–period plots: group-by-period means (1A, 1B, 2A, 2B) plotted against the periods.
interesting view is given by the differences-by-totals plot, where the subjects' differences dik are plotted against the total responses Yi·k. Plotting the pairs (dik, Yi·k) and connecting the outermost points of each group by a convex hull, we get a clear impression of carry-over and treatment effects. Since the statistic for carry-over is based on λ̂d = (Y1··/n1 − Y2··/n2), the two hulls will be separated horizontally if λ̂d ≠ 0. In the same way, the
treatment effect based on τ̂d = (1/2)(d1· − d2·) will manifest itself if the two hulls are vertically separated.

Figure 10.5 shows vertically separated hulls indicating a treatment effect (which we already know is significant according to our tests). On the other hand, the hulls are not separated horizontally, indicating no carry-over effect.
Figure 10.5. Difference–response–total plot for Example 10.1: pairs (dik, Yi·k) with the convex hull of each group.
Analysis of Residuals
The components ε̂ijk of ε̂ = (y − Xβ̂) are the estimated residuals, which are used to check the model assumptions on the errors εijk. Using appropriate plots, we can check for outliers and revise our assumptions of normality and independence. The response values corresponding to unusually large standardized residuals are called outliers. A standardized residual is given by

rijk = ε̂ijk / √Var(ε̂ijk) ,     (10.40)

with the variance factor σ² being estimated by MSResidual(w-s). From the 2 × 2 cross-over, we get
yijk = yi·k + yij· − yi·· (10.41)
and
Var(ε̂ijk) = Var(yijk − ŷijk) = ((ni − 1)/(2ni)) σ² . (10.42)
456 10. Cross–Over Design
Then

rijk = ε̂ijk / √( MSResidual(w−s) · (ni − 1)/(2ni) ) . (10.43)
This is the internally Studentized residual and follows a beta distribution. We, however, regard rijk as N(0, 1)–distributed and choose the two–sided quantile 2.00 (instead of u0.975 = 1.96) to test whether yijk is an outlier.
Remark. If a more exact analysis is required, externally Studentized residuals should be used, since they follow the F–distribution (and can therefore be tested directly) and, additionally, are more sensitive to outliers (cf. Beckman and Trussel, 1974; Rao et al., 2008, pp. 328–332).
Group 1 (AB)                              Group 2 (BA)
Patient  yijk   ŷijk    ε̂ijk    rijk      Patient  yijk   ŷijk    ε̂ijk    rijk
1        20     18.75   1.25    0.30      1        30     28.75   1.25    0.30
2        40     38.75   1.25    0.30      2        40     48.75  −8.75   −2.10 *
3        30     28.75   1.25    0.30      3        20     18.75   1.25    0.30
4        20     23.75  −3.75   −0.90      4        30     23.75   6.25    1.51
Hence, patient 2 in Group 2 is an outlier.
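As a quick numerical check, the standardized residuals of (10.43) can be recomputed from the table above. The following Python sketch is only an illustration (not part of the book's SAS analysis); it assumes MSResidual(w−s) = 275/6 and ni = 4 subjects per group, and the small deviation from the tabulated value −2.10 is rounding.

```python
import math

# within-subject residual mean square (6 df) and group size of Example 10.1
ms_resid_ws = 275 / 6
n_i = 4
# denominator of (10.43): sqrt(MS_Residual(w-s) * (n_i - 1) / (2 n_i))
scale = math.sqrt(ms_resid_ws * (n_i - 1) / (2 * n_i))

# estimated residuals eps_hat_ijk from the table above
residuals = {
    ("AB", 1): 1.25, ("AB", 2): 1.25, ("AB", 3): 1.25, ("AB", 4): -3.75,
    ("BA", 1): 1.25, ("BA", 2): -8.75, ("BA", 3): 1.25, ("BA", 4): 6.25,
}

standardized = {patient: e / scale for patient, e in residuals.items()}
# flag residuals exceeding the two-sided quantile 2.00
outliers = [patient for patient, r in standardized.items() if abs(r) > 2.00]
print(outliers)
```

Only patient 2 of group (BA) exceeds the quantile, in agreement with the table.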
Remark. If εijk ∼ N(0, σ²) is not tenable, the response values are replaced by their ranks and the hypotheses are tested with the Wilcoxon–Mann–Whitney test (cf. Section 2.5) instead of t–tests.
A detailed discussion of the various approaches for the 2 × 2 cross–over and, especially, of their interpretations may be found in Jones and Kenward (1989, Chapter 2) and Ratkowsky et al. (1993).
Comment on the Procedure of Testing
Grizzle (1965) suggested first testing carry–over effects at a rather high level of significance (α = 0.1). If this leads to a significant result, then the test for treatment effects is to be based on the data of the first period only. If it is not significant, then the treatment effects are tested using the differences between the periods. This procedure has certain disadvantages. For example, Brown Jr. (1980) showed that this pretest is of low efficiency in the case of real carry–over effects.
The hypothesis of no carry–over effect is then very likely not to be rejected even if there is a true carry–over effect. Hence, the biased test [(10.29)] (biased, because the carry–over was not recognized) is used to test for treatment differences. This test is conservative in the case of a true positive carry–over effect and is therefore insensitive to potential differences in treatments. On the other hand, this test will exceed the level of significance if there is a true negative carry–over effect (not very likely in practice, since this refers to a withdrawal effect).
Even if there is no true carry–over effect, the null hypothesis may be rejected erroneously (with probability α = 0.1), in which case the less efficient test using first–period data only is performed.
Brown Jr. (1980) concluded that this method is not very useful in testing treatment effects, as it depends upon the outcome of the pretest.
Further comments are given in Section 10.3.4.
10.3.4 Alternative Parametrizations in 2 × 2 Cross–Over
Model (10.1) was introduced as the classical approach and is labeled parametrization No. 1, using the notation of Ratkowsky, Evans and Alldredge (1993). A more general parametrization of the 2 × 2 cross–over design, which includes a sequence effect γi, is given by
yijk = µ + γi + sik + πj + τt + λr + εijk , (10.44)
with i, j, t, r = 1, 2 and k = 1, . . . , ni. The data are summarized in a table containing the cell means yij·, i.e.,

                Period
Sequence    1        2
1           y11·     y12·
2           y21·     y22·
Here Sequence 1 indicates that the treatments are given in the order (AB) and Sequence 2 in the order (BA). Using the common restrictions
γ2 = −γ1, π2 = −π1, τ2 = −τ1, λ2 = −λ1 , (10.45)
and writing γ1 = γ, π1 = π, τ1 = τ, λ1 = λ for brevity, we get the following equations representing the four expectations:
µ11 = µ + γ + π + τ ,
µ12 = µ + γ − π − τ + λ ,
µ21 = µ − γ + π − τ ,
µ22 = µ − γ − π + τ − λ .
In matrix notation this is equivalent to

(µ11, µ12, µ21, µ22)′ = Xβ = ( 1  1  1  1  0
                               1  1 −1 −1  1
                               1 −1  1 −1  0
                               1 −1 −1  1 −1 ) (µ, γ, π, τ, λ)′ . (10.46)
This (4 × 5)–matrix X has rank 4, so that β is only estimable if one of the parameters is removed. Various parametrizations are possible, depending on which of the five parameters is removed and confounded with the remaining ones.
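The rank deficiency is easy to verify numerically; a small Python sketch (illustration only, numpy assumed available):

```python
import numpy as np

# design matrix X of (10.46); columns correspond to mu, gamma, pi, tau, lambda
X = np.array([[1,  1,  1,  1,  0],
              [1,  1, -1, -1,  1],
              [1, -1,  1, -1,  0],
              [1, -1, -1,  1, -1]])

# rank 4 < 5 parameters, so beta as a whole is not estimable
print(np.linalg.matrix_rank(X))
```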
Parametrization No. 1
The classical approach ignores the sequence parameter. Its expectations may therefore be represented as a submodel of (10.46) by dropping the second column of X:
X1β1 = ( 1  1  1  0
         1 −1 −1  1
         1  1 −1  0
         1 −1  1 −1 ) (µ, π, τ, λ)′ . (10.47)
From this we get

X′1X1 = ( E  0
          0  H ) ,

where

E = 4I2 ,   H = (  4  −2
                  −2   2 ) ,   |X′1X1| = 64 ,

and [cf. Theorem A.4]

(X′1X1)⁻¹ = ( E⁻¹   0
               0   H⁻¹ ) ,

with E⁻¹ = (1/4)I2 and

H⁻¹ = ( 1/2  1/2
        1/2   1  ) .

The least squares estimate of β1 is

β̂1 = (µ̂, π̂, τ̂, λ̂)′ = (X′1X1)⁻¹X′1 (y11·, y12·, y21·, y22·)′ . (10.48)
We calculate

X′1 (y11·, y12·, y21·, y22·)′ = ( 1  1  1  1
                                  1 −1  1 −1
                                  1 −1 −1  1
                                  0  1  0 −1 ) (y11·, y12·, y21·, y22·)′

                              = ( y11· + y12· + y21· + y22·
                                  y11· − y12· + y21· − y22·
                                  y11· − y12· − y21· + y22·
                                  y12· − y22· ) . (10.49)
Therefore, the least squares estimation gives

β̂1 = (µ̂, π̂, τ̂, λ̂)′ = (X′1X1)⁻¹X′1 (y11·, y12·, y21·, y22·)′ (10.50)

    = ( (y11· + y12· + y21· + y22·)/4
        (y11· − y12· + y21· − y22·)/4
        (y11· − y21·)/2
        (y11· + y12· − y21· − y22·)/2 ) , (10.51)
from which we get the following results:

µ̂ = y··· , (10.52)
π̂ = (y·1· − y·2·)/2 = (c1· − c2·)/4 = π̂d/2   [cf. (10.33)] , (10.53)
τ̂ = (y11· − y21·)/2 = τ̂d/λd /2   [cf. (10.38)] , (10.54)
λ̂ = y1·· − y2·· = λ̂d/2   [cf. (10.13)] . (10.55)
The estimators τ̂ and λ̂ are correlated:

V(τ̂, λ̂) = σ²H⁻¹ = σ² ( 1/2  1/2
                        1/2   1  ) ,

with ρ(τ̂, λ̂) = (1/2)/√((1/2) · 1) = 0.707. The estimation of τ is always twice as accurate as the estimation of λ, although τ̂ uses data of the first period only and is confounded with the difference between the two groups (sequences).
Remark. In fact, parametrization No. 1 is a three–factorial design with the main effects π, τ, and λ, with τ̂ and λ̂ being correlated. On the other hand, the classical approach uses the split–plot model in addition to parametrization (10.1). So it is obvious that we will get different results depending on which parametrization we use. We will demonstrate this in Example 10.2, where the four different parametrizations are applied to the data set of Example 10.1.
Parametrization No. 1(a)
If the test of no carry–over effect does not reject H0 : λ = 0 against H1 : λ ≠ 0 using the test statistic F1,df = λ̂d² / Var(λ̂d) (cf. (10.19)), our model can be reduced to

X1β1 = ( 1  1  1
         1 −1 −1
         1  1 −1
         1 −1  1 ) (µ, π, τ)′ (10.56)
and we get the same estimators µ̂ [(10.52)] and π̂ [(10.53)] as before, but now the estimator τ̂ is based on the data of both periods:

τ̂ = (y11· − y12· − y21· + y22·)/4 = (d1· − d2·)/4 = τ̂d/2   [cf. (10.24)] . (10.57)
The results of parametrizations No. 1 and No. 1(a) are the same as the classical univariate results obtained in Section 10.3.1 (except for a factor of 1/2 in π̂, τ̂, and λ̂). In addition, however, the dependency in estimating the treatment effect τ and the carry–over effect λ is made explicit.
Parametrization No. 2
In the first parametrization, the interaction treatment × period was aliased with the carry–over effect λ. We now want to parametrize this interaction directly. Dropping the sequence effect, the model of expectations is as follows:
E(yijk) = µij = µ + πj + τt + (τπ)tj . (10.58)
Using effect coding, the codings of the interaction effects are just the products of the involved main effects. Therefore, we get

(µ11, µ12, µ21, µ22)′ = X2β2 = ( 1  1  1  1
                                 1 −1 −1  1
                                 1  1 −1 −1
                                 1 −1  1 −1 ) (µ, π, τ, (πτ))′ . (10.59)
Since the column vectors are orthogonal, we easily get (X′2X2) = 4I4 and, therefore, the parameter estimates are independent (cf. Section 7.3). The estimators are
β̂2 = (µ̂, π̂, τ̂, (π̂τ))′ = ( y···
                            π̂d/2
                            (y11· − y12· − y21· + y22·)/4
                            (y11· + y12· − y21· − y22·)/4 ) . (10.60)
Note that µ̂ and π̂ are as in the first parametrization. The estimator τ̂ in (10.60) and the estimator τ̂ [(10.57)] in the reduced model (10.56) coincide. The estimator (π̂τ) may be written as (cf. (10.55))

(π̂τ) = (y1·· − y2··)/2 = λ̂d/4 = λ̂/2 , (10.61)

and coincides, except for a factor of 1/2, with the estimate of the carry–over effect (10.55) in model (10.47). So it is obvious that there is an intrinsic aliasing between the two parameters λ and (πτ).
Parametrization No. 3
Supposing that a carry–over effect λ or, alternatively, an interaction effect (πτ) may be excluded from the analysis, the model now contains only main effects. We already discussed model (10.56). Now we want to introduce the sequence effect γ as an additional main effect. With γ1 = −γ2 = γ, we get
(µ11, µ12, µ21, µ22)′ = X3β3 = ( 1  1  1  1
                                 1  1 −1 −1
                                 1 −1  1 −1
                                 1 −1 −1  1 ) (µ, γ, π, τ)′ , (10.62)
(X′3X3) = 4I4 ,

β̂3 = (µ̂, γ̂, π̂, τ̂)′ = (1/4) X′3 (y11·, y12·, y21·, y22·)′

   = ( y···
       (y11· + y12· − y21· − y22·)/4
       (y11· − y12· + y21· − y22·)/4
       (y11· − y12· − y21· + y22·)/4 ) (10.63)

   = ( y···
       (y1·· − y2··)/2
       (y·1· − y·2·)/2
       τ̂d/2 ) . (10.64)
The sequence effect γ is estimated by the contrast in the total response of the two groups (AB) and (BA), and we see the equivalence γ̂ = (π̂τ) = λ̂d/4. The period effect π is estimated by the contrast in the total response of the two periods and coincides with π̂ in parametrizations No. 1 (cf. (10.53)) and No. 2 (cf. (10.60)). The estimate of τ is the same as τ̂ [(10.57)] in the reduced model [(10.56)] and τ̂ (cf. (10.60)) in parametrization No. 2. Furthermore, the estimates in β̂3 are independent, so that, e.g., H0 : τ = 0 can be tested independently of whether γ (= λ̂d/4) is zero (in contrast to parametrization No. 1).
Parametrization No. 4
Here, the main effects treatment and sequence and their interaction are represented in a two–factorial model (cf. Milliken and Johnson, 1984)
E(yijk) = µij = µ + γi + τt + (γτ)it , (10.65)
i.e.,

(µ11, µ12, µ21, µ22)′ = X4β4 = ( 1  1  1  1
                                 1  1 −1 −1
                                 1 −1 −1  1
                                 1 −1  1 −1 ) (µ, γ, τ, (γτ))′ . (10.66)
Since X′4X4 = 4I4, the components of β4 can be estimated independently as

β̂4 = (µ̂, γ̂, τ̂, (γ̂τ))′ = ( y···
                            (y1·· − y2··)/2
                            τ̂d/2
                            (y·1· − y·2·)/2 ) . (10.67)
The values of γ̂ in parametrizations No. 3 and No. 4 are the same. Analogously, the values of τ̂ coincide in parametrizations No. 2, No. 3, and No. 4, whereas the interaction effect sequence × treatment (γτ) corresponds to the period effect π in parametrizations No. 1, No. 2, and No. 3.
                                    Parametrization
        Classical                No. 1       No. 1(a)   No. 2    No. 3    No. 4
µ       y···                     y···        y···       y···     y···     y···
γ       —                        —           —          —        λ̂d/4    λ̂d/4
π       π̂d = (d1· + d2·)/2       π̂d/2        π̂d/2       π̂d/2     π̂d/2    —
τ       τ̂d/λd = y11· − y21·      τ̂d/λd /2    τ̂d/2       τ̂d/2     τ̂d/2    τ̂d/2
λ       λ̂d = 2(y1·· − y2··)      λ̂d/2        —          —        —        —
(τπ)    —                        —           —          λ̂d/4     —        —
(γτ)    —                        —           —          —        —        π̂d/2

Table 10.4. Estimators under the six different parametrizations.
Remark. From the various parametrizations we get the following results:
(i) In parametrization No. 1, the estimators of τ and λ are correlated. In contrast to the arguments of Ratkowsky et al. (1993, pp. 89–90), the values of E(MS) given in Table 10.3 are valid. E(MSTreat) depends on (λ1 − λ2) = 2λ, so that H0 : τ = 0 may be tested either using a central t–test if λ = 0 or using a noncentral t–test if λ is known. A difficulty in the argument is certainly that τ and λ are correlated but not represented in the two–factorial hierarchy “main effect A, main effect B, and the interaction A × B”.
(ii) In parametrization No. 2, the carry–over effect is indirectly represented as the alias effect of the interaction (πτ). We can use the common hierarchical test procedure, as in a two–factorial model with interaction, since the design is orthogonal. If the interaction is not significant, the estimators of the main effects remain the same (in contrast to parametrization No. 1).
(iii) The analysis of data of a 2 × 2 cross–over design is done in two steps. In the first step, we test for carry–over using one of the parametrizations in which the carry–over effect is separable from the main effects, e.g., parametrization No. 3, and it is not surprising that the result will be the same as if we had tested the sequence effect.
We consider the following experiment. We take two groups of subjects and apply the treatments in both groups in the same order (AB). If there is an interaction effect (maybe a significant carry–over effect in the classical approach of Grizzle or a significant sequence effect in parametrization No. 3 of Ratkowsky et al. (1993)), then we conclude that the two groups must consist of two different classes of subjects. There is either a difference per se between the subjects of the two groups, or treatment A shows different persistencies in the two groups. Since the latter is not very likely, it is clear that the subjects of the two groups differ in their reactions. Therefore it is a sequence effect, not a carry–over effect. We try to avoid this confusion by randomizing the subjects.
Regarding the classical (AB)/(BA) design, there are two ways to interpret a significant interaction effect:
(a) either it is a true sequence effect as a result of insufficient randomization; or
(b) it is a true carry–over effect; this will be the case if there is no doubt about the randomization process.
Since the actual data can hardly be used to decide whether the randomization succeeded or failed, it is necessary to make this distinction before we analyze our data.
If the subjects have not been randomized, the possibility of a sequence effect should attract our attention. The F–statistics given for parametrization No. 3 are valid and do not depend upon whether the sequence effect is significant, because there is no natural link between a sequence effect and a treatment or a period effect.
If, on the other hand, we did randomize our subjects, then there is no need to consider a sequence effect and, therefore, the interaction effect is to be regarded as a result of carry–over.
The carry–over effect was introduced as the persisting effect of a treatment during the subsequent period of treatment and is represented as an additive component in our model. Therefore, it is evident that the F–statistics for treatment and period effects, derived from parametrization No. 3 or from the classical approach, are no longer valid if the carry–over effect is significant.
To continue our examination, we choose one of the following alternatives:
                        Period
Sequence    1          2   3         4   5
1           Baseline   A   Washout   B   Washout
2           Baseline   B   Washout   A   Washout

Figure 10.6. Extended 2 × 2 cross–over design.
(a) We try to test treatment effects using the data of the first period only. This might be difficult because the sample size is likely to be too small for a parallel group design. Of course, we then omit the sequence effect from our analysis (because we have only this first period).
(b) A significant carry–over effect may also be regarded as a sufficient indicator that the two treatments differ in their effects. At least we can state that the two treatments have different persistencies and therefore are not equal.
It can be assumed that Ratkowsky et al. (1993) regarded the analysis of variance tables to be read simultaneously, with the given F–statistics for carry–over, treatment, and period effects always valid. But they are not. This is only the case if the carry–over effect has proven to be nonsignificant; only then are the expressions for treatment and period effects valid. If the label carry–over is replaced by the label sequence effect, then the ordering of the tests is not important and the table is no longer misleading to readers who only glance at the literature. The interpretation of the results must reflect this relabeling, too. Then, of course, we do not know anything about the carry–over effect which, mostly, is of more importance than a sequence effect. Using the classical approach, the analysis of variance table is valid.
(iv) From a theoretical point of view, it is interesting to extend the 2 × 2 design by three additional periods: a baseline period and two washout periods (Figure 10.6). This approach was suggested by Ratkowsky et al. (1993, Chapter 3.6), but is rarely applied because of the additional effort.
The linear model then contains two additional period effects and carry–over effects of first and second order. The main advantages are that all parameters are estimable, there is no dependence between treatment and carry–over effects, and we get reduced variance.
(v) Possible modifications of the 2 × 2 cross–over are 2 × n designs like

                Period                          Period
Sequence    1   2   3          Sequence     1   2   3
1           A   B   B          1            A   B   A
2           B   A   A          2            B   A   B
or n × 2 designs like

                Period
Sequence    1   2
1           A   B
2           B   A
3           A   A
4           B   B
Adding baseline and washout periods may further improve these designs. A comprehensive treatment of this subject is given by Ratkowsky et al. (1993, Chapter 4).
Example 10.2. (Continuation of Example 10.1). The data of Example 10.1 are now analyzed with parametrizations No. 2, No. 3, and No. 4 using the SAS procedure GLM. In the split–plot model (classical approach), the following analysis of variance table was obtained for the data of Example 10.1 (cf. Section 10.3.2).
Source             SS     df   MS       F
Carry-over         225     1   225.00   0.92
Residual (b–s)    1475     6   245.83
Treatment          400     1   400.00   8.73 *
Period              25     1    25.00   0.55
Residual (w–s)     275     6    45.83
Total             2400    15
The treatment effect was found to be significant.
Parametrization No. 1 does not take the split–plot character of the design (limited randomization) into account. Therefore, the two sums of squares SS (b–s) and SS (w–s) are added, giving SSResidual = 1750. Table 10.5 shows this result in the upper part (SS type I). The lower part (SS type II) gives the result using first–period data only, because the model contains the carry–over effect. All other parametrizations do not contain carry–over effects, and their important sums of squares are found in the lower part (SS type II) of the table. We note that the following F–values coincide:
Carry-over (resp., Sequence): F = 0.92 (classical, No. 3, No. 4).
Treatment: F = 8.73 (classical, No. 3, No. 4).
Period: F = 0.55 (classical, No. 3).
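The sums of squares of the classical split–plot table can be reproduced by direct arithmetic from the raw data of Example 10.1. The following Python sketch is only an illustration (it exploits the balance of the design and is not the book's SAS code):

```python
import numpy as np

# raw data of Example 10.1: rows are periods 1 and 2, columns are subjects
g1 = np.array([[20, 40, 30, 20],        # sequence (AB), period 1
               [30, 50, 40, 40]], float)   # sequence (AB), period 2
g2 = np.array([[30, 40, 20, 30],        # sequence (BA), period 1
               [20, 50, 10, 10]], float)   # sequence (BA), period 2

y = np.concatenate([g1.ravel(), g2.ravel()])
grand = y.mean()                                            # 30.0

ss_total = ((y - grand) ** 2).sum()                         # 2400
ss_carry = 8 * ((g1.mean() - grand) ** 2 + (g2.mean() - grand) ** 2)   # 225
# between-subjects residual: subject means around their sequence mean (2 obs each)
ss_resid_bs = 2 * (((g1.mean(axis=0) - g1.mean()) ** 2).sum()
                   + ((g2.mean(axis=0) - g2.mean()) ** 2).sum())       # 1475
a_mean = np.concatenate([g1[0], g2[1]]).mean()              # treatment A responses
b_mean = np.concatenate([g1[1], g2[0]]).mean()              # treatment B responses
ss_treat = 8 * ((a_mean - grand) ** 2 + (b_mean - grand) ** 2)         # 400
p1_mean = np.concatenate([g1[0], g2[0]]).mean()
p2_mean = np.concatenate([g1[1], g2[1]]).mean()
ss_period = 8 * ((p1_mean - grand) ** 2 + (p2_mean - grand) ** 2)      # 25
ss_resid_ws = ss_total - ss_carry - ss_resid_bs - ss_treat - ss_period # 275

print(ss_carry, ss_resid_bs, ss_treat, ss_period, ss_resid_ws, ss_total)
```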
The different parametrizations were calculated using the following short SAS programs.
proc glm;
  class seq subj period treat carry;
  model y = period treat carry / solution ss1 ss2;
  title "Parametrization 1";
run;

proc glm;
  class seq subj period treat carry;
  model y = period treat treat(period) / solution ss1 ss2;
  title "Parametrization 2";
run;

proc glm;
  class seq subj period treat carry;
  model y = seq subj(seq) period treat / solution ss1 ss2;
  random subj(seq);
  title "Parametrization 3";
run;

proc glm;
  class seq subj period treat carry;
  model y = seq subj(seq) treat seq(treat) / solution ss1 ss2;
  random subj(seq);
  title "Parametrization 4";
run;
data ex10_1;
  input subj seq period treat $ carry $ y @@;
cards;
1 1 1 a 0 20   1 1 2 b a 30
2 1 1 a 0 40   2 1 2 b a 50
3 1 1 a 0 30   3 1 2 b a 40
4 1 1 a 0 20   4 1 2 b a 40
1 2 1 b 0 30   1 2 2 a b 20
2 2 1 b 0 40   2 2 2 a b 50
3 2 1 b 0 20   3 2 2 a b 10
4 2 1 b 0 30   4 2 2 a b 10
;
run;
Parametrization No. 1
Source        df   SS type I    MS       F
Periods        1     25.00      25.00    0.17
Treatments     1    400.00     400.00    2.74
Carry–over     1    225.00     225.00    1.54
Residual      12   1750.00     145.83

Source        df   SS type II   MS       F
Treatments     1     12.50      12.50    0.09
Carry–over     1    225.00     225.00    1.54
Residual      12   1750.00     145.83
Parametrization No. 2
Source           df   SS type I    MS       F
Periods (P)       1     25.00      25.00    0.17
Treatments (T)    1    400.00     400.00    2.74
P × T             1    225.00     225.00    1.54
Residual         12   1750.00     145.83

Source           df   SS type II   MS       F
Treatments        1    400.00     400.00    2.74
P × T             1    225.00     225.00    1.54
Residual         12   1750.00     145.83
Parametrization No. 3
Source        df   SS type I    MS       F
between–subjects
Sequence       1    225.00     225.00    0.92
Residual       6   1475.00     245.83

Source        df   SS type II   MS       F
within–subjects
Periods        1     25.00      25.00    0.55
Treatments     1    400.00     400.00    8.73
Residual       6    275.00      45.83
Parametrization No. 4
Source          df   SS type I    MS       F
between–subjects
Sequence         1    225.00     225.00    0.92
Residual         6   1475.00     245.83

Source          df   SS type II   MS       F
within–subjects
Treatments       1    400.00     400.00    8.73
Seq × treat.     1     25.00      25.00    0.55
Residual         6    275.00      45.83
Table 10.5. GLM results of the four parametrizations.
10.3.5 Cross–Over Analysis Using Rank Tests
Rank tests known from other designs with two independent groups offer a nonparametric approach to analyzing a cross–over trial. These tests are based on the model given in Table 10.1. However, the random effects may now follow any continuous distribution with expectation zero. The advantage of nonparametric methods is that there is no need to assume a normal distribution. Because of the difficulties mentioned above, we now assume either that there are no carry–over effects or that they are at least ignorable.
Rank Test on Treatment Differences
The null hypothesis that there are no differences between the two treatments implies that the period differences follow the same distribution:
H0 : Fd1(d1k) = Fd2(d2k), k = 1, . . . , ni . (10.68)
Here Fd1 and Fd2 are continuous distributions with identical variances. The null hypothesis of no treatment effects may then be tested using the Wilcoxon, Mann and Whitney statistic (cf. Section 2.5 and Koch, 1972).
We calculate the period differences d1k and d2k (cf. (10.20)). These N = (n1 + n2) differences are then ranked from 1 to N. Let

rik = [rank of dik in d11, . . . , d1n1 , d21, . . . , d2n2 ] , (10.69)

with i = 1, 2, k = 1, . . . , ni. In the case of ties we use mean ranks. For the two groups (AB) and (BA), we get the rank sums R1 (resp., R2), which are used to build the test statistics U1 (resp., U2) [(2.38) (resp., (2.39))].
Rank Tests on Period Differences
The null hypothesis of no period differences is
H0 : Fc1(c1k) = Fc2(c2k), k = 1, . . . , ni , (10.70)
and so the distribution of the difference c1k = (y11k − y12k) equals the distribution of the difference c2k = (y22k − y21k). Again, Fci (i = 1, 2) are continuous distributions with equal variances.
This null hypothesis is then tested in the same way as H0 in (10.68), using the Wilcoxon, Mann and Whitney test.
10.4 2 × 2 Cross–Over and Categorical (Binary) Response
10.4.1 Introduction
In many applications, the response is categorical. This is the case in pretests when only a rough overview of possible relations is needed. Often a continuous response is not available. For example, recovery from a mental illness cannot be measured on a continuous scale; categories like “worse, constant, better” would be sufficient.
Example: Patients suffering from depression receive two treatments A and B. Their response to each treatment is coded binary, with 1 for improvement and 0 for no change. The profile of each subject is then one of the pairs (0, 0), (0, 1), (1, 0), and (1, 1). To summarize the data, we count how often each pair occurs.
Group      (0, 0)   (0, 1)   (1, 0)   (1, 1)   Total
1 (AB)     n11      n12      n13      n14      n1.
2 (BA)     n21      n22      n23      n24      n2.
Total      n.1      n.2      n.3      n.4      n..

Table 10.6. 2 × 2 cross–over with binary response.
Contingency Tables and Odds Ratio
The two middle columns of this 2 × 4 contingency table may indicate a treatment effect. Assuming no period effect, under the null hypothesis H0: “no treatment effect” the two response counts nA = (n13 + n22) for treatment A and nB = (n12 + n23) for treatment B have equal probabilities and follow the same binomial distribution, nA (resp., nB) ∼ B(n.2 + n.3 ; 1/2).
The odds ratio

OR = n12n23/(n22n13) (10.71)
may also indicate a treatment effect.
Testing for carry–over effects is done by comparing the differences in the total response values for the profiles (0, 0) and (1, 1), similar to the test statistic Tλ [(10.19)], which is based mainly on λ̂ = Y1··/n1 − Y2··/n2. Instead of differences, we choose the odds ratio

OR = n11n24/(n14n21) , (10.72)
which should equal 1 under H0: “no treatment × period effect”. For a 2 × 2 table with cells

A   B
C   D ,

the odds ratio is OR = AD/BC, with the asymptotic distribution

(ln(OR))² / σ²ln(OR) ∼ χ²1 , (10.73)

where

σ²ln(OR) = 1/A + 1/B + 1/C + 1/D (10.74)
(cf. Agresti (2007)). We can now test the significance of the two odds ratios (10.71) and (10.72).
McNemar’s Test
Application of this test assumes that there are no period effects. Only subjects who show a preference for one of the treatments are considered; these subjects have either a (0, 1) or a (1, 0) response profile. There are nP = (n.2 + n.3) such subjects: nA = (n13 + n22) prefer treatment A, and nB = (n12 + n23) prefer treatment B.
Under the null hypothesis of no treatment effects, nA (resp., nB) is binomially distributed B(nP ; 1/2). The hypothesis is tested using the following statistic (cf. Jones and Kenward, 1989, p. 93):

χ²MN = (nA − nB)²/nP , (10.75)

which is asymptotically χ²–distributed with one degree of freedom under the null hypothesis.
Mainland–Gart Test
Based on a logistic model, Gart (1969) proposed a test for treatment differences which is equivalent to Fisher's exact test using the following 2 × 2 contingency table:

Group      (0, 1)   (1, 0)   Total
1 (AB)     n12      n13      n12 + n13 = m1
2 (BA)     n22      n23      n22 + n23 = m2
Total      n.2      n.3      m.
This test is described in Jones and Kenward (1989, p. 113). Asymptotically, the hypothesis of no treatment differences may be tested using one of the common tests for 2 × 2 contingency tables, e.g., the χ²–statistic

χ² = m·(n12n23 − n13n22)²/(m1m2n.2n.3) . (10.76)

This statistic follows a χ²1–distribution under the null hypothesis. This test and the test based on ln(OR) (cf. (10.73)) coincide.
Prescott Test
The above tests have one thing in common: subjects showing no preference for one of the treatments are discarded from the analysis. Prescott (1981) includes these subjects in his test by means of the marginal sums n1. and n2.. The following 2 × 3 table is used:

Group      (0, 1)   (0, 0) or (1, 1)   (1, 0)   Total
1 (AB)     n12      n11 + n14          n13      n1·
2 (BA)     n22      n21 + n24          n23      n2·
Total      n·2      n·1 + n·4          n·3      n··
We first consider the difference between the two responses of a subject (second minus first). Depending on whether the response profile is (1, 0), (0, 0) or (1, 1), or (0, 1), this difference takes the value −1, 0, or +1.
Assuming that treatment A is better, we expect the first group (AB) to have a higher mean difference than the second group (BA). The mean difference of the response values in Group 1 (AB) is

(1/n1·) Σ_{k=1}^{n1·} (y12k − y11k) = (n12 − n13)/n1· = −d1· (10.77)

and in Group 2 (BA)

(1/n2·) Σ_{k=1}^{n2·} (y22k − y21k) = (n22 − n23)/n2· = −d2· . (10.78)
Prescott's test statistic (cf. Jones and Kenward, 1989, p. 100) for the null hypothesis H0: no direct treatment effect (i.e., E(d1· − d2·) = 0) is

χ²(P) = [(n12 − n13)n·· − (n·2 − n·3)n1·]²/V (10.79)

with

V = n1·n2·[(n·2 + n·3)n·· − (n·2 − n·3)²]/n·· . (10.80)

Asymptotically, χ²(P) follows the χ²1–distribution under H0.
This test, however, has the disadvantage that only the hypothesis of no treatment differences can be tested. As a uniform approach for testing all important hypotheses, one could choose the approach of Grizzle, Starmer and Koch (1969).
Remark. Another, and often more efficient, method of analysis is given by loglinear models, especially models with uncorrelated two–dimensional binary response. These have been examined thoroughly in recent years (cf. Chapter 8).
Example 10.3. A comparison between a placebo A and a new drug B for treating depression might have shown the following results (1: improvement, 0: no improvement):

Group      (0, 0)   (0, 1)   (1, 0)   (1, 1)   Total
1 (AB)     5        14       3        6        28
2 (BA)     10       7        18       10       45
Total      15       21       21       16       73
We check H0: “treatment × period effect = 0” (i.e., no carry–over effect) using the odds ratio [(10.72)]

OR = (5 · 10)/(6 · 10) = 0.83   and   ln(OR) = −0.1823 .
We get

σ²ln(OR) = 1/5 + 1/10 + 1/6 + 1/10 = 0.5667

and

(ln(OR))²/σ²ln(OR) = 0.06 < 3.84 = χ²1;0.95 ,
so that H0 cannot be rejected. In the same way, we get for the odds ratio [(10.71)]

OR = (14 · 18)/(7 · 3) = 12 ,   ln(OR) = 2.48 ,

σ²ln(OR) = 1/14 + 1/18 + 1/7 + 1/3 = 0.60 ,

(ln(OR))²/σ²ln(OR) = 10.24 > 3.84 ,
and this test rejects H0: no treatment effect. Since there is no carry–over effect, we can use McNemar's test

χ²MN = ((3 + 7) − (14 + 18))²/(21 + 21) = 22²/42 = 11.52 > 3.84 ,
which gives the same result. For Prescott's test we get

V = 28 · 45 [(21 + 21) · 73 − (21 − 21)²]/73 = 28 · 45 · 42 = 52920 ,

χ²(P) = [(14 − 3) · 73 − (21 − 21) · 28]²/V = (11 · 73)²/V = 12.18 > 3.84 ,
and H0: no treatment effect is also rejected.
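The four test statistics of this example can be recomputed in a few lines. The following Python sketch is only an illustration; small rounding differences against hand calculation are possible.

```python
import math

def log_or_chi2(a, b, c, d):
    """Chi-square statistic (10.73) for the odds ratio ad/(bc), cf. (10.74)."""
    return math.log(a * d / (b * c)) ** 2 / (1/a + 1/b + 1/c + 1/d)

# carry-over: odds ratio (10.72) with n11 = 5, n14 = 6, n21 = 10, n24 = 10
chi2_carry = log_or_chi2(5, 6, 10, 10)     # OR = 50/60 = 0.83

# treatment: odds ratio (10.71) with n12 = 14, n22 = 7, n13 = 3, n23 = 18
chi2_treat = log_or_chi2(14, 7, 3, 18)     # OR = 252/21 = 12

# McNemar (10.75): nA = 3 + 7 = 10, nB = 14 + 18 = 32, nP = 42
chi2_mn = (10 - 32) ** 2 / 42

# Prescott (10.79)/(10.80)
V = 28 * 45 * ((21 + 21) * 73 - (21 - 21) ** 2) / 73
chi2_p = ((14 - 3) * 73 - (21 - 21) * 28) ** 2 / V

print(round(chi2_carry, 2), round(chi2_treat, 2),
      round(chi2_mn, 2), round(chi2_p, 2))   # 0.06 10.24 11.52 12.18
```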
10.4.2 Loglinear and Logit Models
In Table 10.6, we see that Group 1 (AB) and Group 2 (BA) are represented by four distinct categorical response profiles (0, 0), (0, 1), (1, 0), and (1, 1). We assume that each row (and, therefore, each variable) is an independent observation from a multinomial distribution M(ni.; πi1, πi2, πi3, πi4) (i = 1, 2). Using appropriate parametrizations and logit or loglinear models, we try to define a bivariate binary variable (Y1, Y2) that represents the four profiles and their probabilities according to the model of the 2 × 2 cross–over design. Various approaches are available for handling this.
Bivariate Logistic Model
Generally, Y1 and Y2 denote a pair of correlated binary variables. We first follow the approach of Jones and Kenward (1989, p. 106), who use the following bivariate logistic model according to Cox (1970) and McCullagh and Nelder (1989):
P (Y1 = y1, Y2 = y2) = exp(β0 + β1y1 + β2y2 + β12y1y2) , (10.81)
with the binary response now being coded as +1 and −1, in contrast to the former coding. This coding corresponds to the transformation Zi = (2Yi − 1) (i = 1, 2), which was used by Cox (1972a). The parameter β0 is a scaling constant ensuring that the four probabilities sum to 1; it depends upon the other three parameters. The parameter β12 measures the correlation between the two variables, while β1 and β2 represent the main effects.
The four possible observations are now inserted into (10.81) in order to obtain the joint distribution:

ln P(Y1 = 1, Y2 = 1) = β0 + β1 + β2 + β12 ,
ln P(Y1 = 1, Y2 = −1) = β0 + β1 − β2 − β12 ,
ln P(Y1 = −1, Y2 = 1) = β0 − β1 + β2 − β12 ,
ln P(Y1 = −1, Y2 = −1) = β0 − β1 − β2 + β12 .
Bayes' theorem gives

P(Y1 = 1 | Y2 = 1)/P(Y1 = −1 | Y2 = 1)
  = [P(Y1 = 1, Y2 = 1)/P(Y2 = 1)] / [P(Y1 = −1, Y2 = 1)/P(Y2 = 1)]
  = exp(β0 + β1 + β2 + β12)/exp(β0 − β1 + β2 − β12)
  = exp 2(β1 + β12) .
We now get the logits

logit[P(Y1 = 1 | Y2 = 1)] = ln [P(Y1 = 1 | Y2 = 1)/P(Y1 = −1 | Y2 = 1)] = 2(β1 + β12) ,
logit[P(Y1 = 1 | Y2 = −1)] = ln [P(Y1 = 1 | Y2 = −1)/P(Y1 = −1 | Y2 = −1)] = 2(β1 − β12) ,

and the conditional log–odds ratio

logit[P(Y1 = 1 | Y2 = 1)] − logit[P(Y1 = 1 | Y2 = −1)] = 4β12 , (10.82)

i.e.,

[P(Y1 = 1 | Y2 = 1) P(Y1 = −1 | Y2 = −1)] / [P(Y1 = −1 | Y2 = 1) P(Y1 = 1 | Y2 = −1)] = exp(4β12) . (10.83)
This corresponds to the relation

m11m22/(m12m21) = exp(4λXY11)

between the odds ratio and the interaction parameter in the loglinear model (cf. Chapter 8). In the same way we get, for i, j = 1, 2 (i ≠ j),

logit[P(Yi = 1 | Yj = yj)] = 2(βi + yjβ12) . (10.84)
For a specific subject of one of the groups (AB or BA), a treatment effect exists if the response is either (1, −1) or (−1, 1). From the log–odds ratio for this combination we get

logit[P(Y1 = 1 | Y2 = −1)] − logit[P(Y2 = 1 | Y1 = −1)] = 2(β1 − β2) . (10.85)
This is an indicator for a treatment effect within a group.
Assuming the same parameter β12 for both groups AB and BA, the following expression is an indicator for a period effect:

logit[P(Y^AB_i = 1 | Y^AB_j = yj)] − logit[P(Y^BA_i = 1 | Y^BA_j = yj)] = 2(β^AB_i − β^BA_i) . (10.86)

This relation is derived directly from (10.84) with an additional index for the two groups AB and BA. The assumption β^AB_12 = β^BA_12, i.e., identical interaction in both groups, is important.
Logit Model of Jones and Kenward for the Classical Approach
Let yijk denote the binary response of subject k of group i in period j (i = 1, 2, j = 1, 2, k = 1, . . . , ni). Again we choose the coding as in Table 10.6, with yijk = 1 denoting success and yijk = 0 denoting failure. Using logit links, we want to reparametrize the model according to Table 10.1 for the bivariate binary response (yi1k, yi2k):
logit(πij) = ln ( πij/(1 − πij) ) = Xβ , (10.87)
where X denotes the design matrix using effect coding for the two groups and the two periods (cf. (10.47)),

X = ( 1  1  1  0
      1 −1 −1  1
      1  1 −1  0
      1 −1  1 −1 ) , (10.88)

and β = (µ, π, τ, λ)′ is the parameter vector using the reparametrization conditions

π = −π1 = π2 ,   τ = −τ1 = τ2 ,   λ = −λ1 = λ2 . (10.89)
(i) For both of the two groups and the two periods of the 2 × 2 cross–over with binary response, the logits show the following relation to the model in Table 10.1:

logit P (y11k = 1) = ln [P (y11k = 1) / P (y11k = 0)] = ln [P (y11k = 1) / (1 − P (y11k = 1))] = µ − π − τ ,
logit P (y12k = 1) = µ + π + τ − λ ,
logit P (y21k = 1) = µ − π + τ ,
logit P (y22k = 1) = µ + π − τ + λ .
We get, for example,
P (y11k = 1) = exp(µ − π − τ) / [1 + exp(µ − π − τ)] ,

and

P (y11k = 0) = 1 / [1 + exp(µ − π − τ)] .
(ii) To start with, we assume that the two observations of each subject in periods 1 and 2 are independent. The joint probabilities πij:

Group    (0, 0)  (0, 1)  (1, 0)  (1, 1)
1 (AB)    π11     π12     π13     π14
2 (BA)    π21     π22     π23     π24
are the product of the probabilities defined above. We introduce a normalizing constant for the case of nonresponse (0, 0) to adjust the other probabilities. The constant c1 is chosen so that the four probabilities sum to 1 (in Group 2 this constant is c2):

π11 = P (y11k = 0, y12k = 0) = exp(c1) ,
π12 = P (y11k = 0, y12k = 1) = exp(c1 + µ + π + τ − λ) ,
π13 = P (y11k = 1, y12k = 0) = exp(c1 + µ − π − τ) ,
π14 = P (y11k = 1, y12k = 1) = exp(c1 + 2µ − λ) .     (10.90)
476 10. Cross–Over Design
Then

exp(c1)[1 + exp(µ + π + τ − λ) + exp(µ − π − τ) + exp(2µ − λ)] = 1

gives exp(c1).
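As a small numerical sketch (with made-up parameter values, not estimates taken from the text), the normalizing constant c1 can be computed directly from this condition, after which the four cell probabilities of (10.90) sum to one:

```python
import math

# Hypothetical parameter values, chosen only for illustration
mu, pi, tau, lam = 0.3, -0.1, 0.2, 0.1

# exp(c1)[1 + exp(mu+pi+tau-lam) + exp(mu-pi-tau) + exp(2mu-lam)] = 1
denom = 1 + math.exp(mu + pi + tau - lam) + math.exp(mu - pi - tau) + math.exp(2 * mu - lam)
c1 = -math.log(denom)

# The four cell probabilities of (10.90) now sum to one
p = [math.exp(c1),
     math.exp(c1 + mu + pi + tau - lam),
     math.exp(c1 + mu - pi - tau),
     math.exp(c1 + 2 * mu - lam)]
print(round(sum(p), 10))  # -> 1.0
```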
(iii) Jones and Kenward (1989, p. 109) chose the following parametrization to represent the interaction referring to β12. They introduce a new parameter σ to denote the mean interaction of both groups (i.e., σ = (β12^AB + β12^BA)/2) and another parameter φ that measures the interaction difference (φ = (β12^AB − β12^BA)/2). In the logarithms of the probabilities, the model for the two groups is as follows (Table 10.7).
        Group 1                              Group 2
(0, 0)  ln π11 = c1 + σ + φ                  ln π21 = c2 + σ − φ
(0, 1)  ln π12 = c1 + µ + π + τ − λ − σ − φ  ln π22 = c2 + µ + π − τ + λ − σ + φ
(1, 0)  ln π13 = c1 + µ − π − τ − σ − φ      ln π23 = c2 + µ − π + τ − σ + φ
(1, 1)  ln π14 = c1 + 2µ − λ + σ + φ         ln π24 = c2 + 2µ + λ + σ − φ
Table 10.7. Logit model of Jones and Kenward.
The values of ci and µ are somewhat difficult to interpret. The nuisance parameters σ and φ represent the dependency in the structure of the subjects of the two groups.
From Table 10.7 we obtain the following relations among the parameters π, τ, and λ and the odds ratios:
π = (1/4)(ln π12 + ln π22 − ln π13 − ln π23) = (1/4) ln [π12 π22 / (π13 π23)] , (10.91)

λ = (1/2) ln [π11 π24 / (π14 π21)]   (cf. (10.72)), (10.92)

τ = (1/4) ln [π12 π23 / (π13 π22)]   (cf. (10.71)). (10.93)
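A quick consistency sketch: with hypothetical parameter values (and the carry–over effect set to zero for this check), cell probabilities rebuilt from Table 10.7 return π, λ, and τ via (10.91)–(10.93); the constants c1, c2 cancel in the ratios:

```python
import math

# Hypothetical parameters for illustration; c1, c2 cancel in the ratios below
mu, pi, tau, sigma, phi = 0.4, 0.15, -0.25, 0.2, 0.05
lam = 0.0  # carry-over effect set to zero for this check
c1 = c2 = 0.0

# Cell terms built from Table 10.7
p11 = math.exp(c1 + sigma + phi)
p12 = math.exp(c1 + mu + pi + tau - lam - sigma - phi)
p13 = math.exp(c1 + mu - pi - tau - sigma - phi)
p14 = math.exp(c1 + 2 * mu - lam + sigma + phi)
p21 = math.exp(c2 + sigma - phi)
p22 = math.exp(c2 + mu + pi - tau + lam - sigma + phi)
p23 = math.exp(c2 + mu - pi + tau - sigma + phi)
p24 = math.exp(c2 + 2 * mu + lam + sigma - phi)

pi_hat = 0.25 * math.log(p12 * p22 / (p13 * p23))    # (10.91)
lam_hat = 0.5 * math.log(p11 * p24 / (p14 * p21))    # (10.92)
tau_hat = 0.25 * math.log(p12 * p23 / (p13 * p22))   # (10.93)
```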
The null hypotheses H0: π = 0, H0: τ = 0, H0: λ = 0 can be tested using likelihood ratio tests in the appropriate 2 × 2 table.

For π:    m12  m13
          m23  m22

(second and third columns of Table 10.6, where the second row BA is reversed to get the same order AB as in the first row).

For λ:    m11  m14
          m21  m24

(first and last columns of Table 10.6).

For τ:    m12  m13
          m22  m23

(second and third columns of Table 10.6). The estimators m̂ij are taken from the appropriate loglinear model corresponding to the hypothesis.
Remark. The modeling [(10.90)] of the probabilities π1j of the first group (and analogously of the second group) is based on the assumption that the response of each subject is independent over the two periods. Since this assumption cannot be justified in a cross–over design, this within–subject dependency has to be introduced afterward using the parameters σ and φ. This guarantees the formal independency of ln(πij) and therefore the applicability of loglinear models. This approach, however, is critically examined by Ratkovsky et al. (1993, p. 300), who suggest the following alternative approach.
Sequence   (1, 1)              (1, 0)                    (0, 1)                    (0, 0)
1 (AB)     m11 = n1· PA PB|A   m12 = n1· PA (1 − PB|A)   m13 = n1· (1 − PA) PB|Ā   m14 = n1· (1 − PA)(1 − PB|Ā)
2 (BA)     m21 = n2· PB PA|B   m22 = n2· PB (1 − PA|B)   m23 = n2· (1 − PB) PA|B̄   m24 = n2· (1 − PB)(1 − PA|B̄)

Table 10.8. Expectations mij of the 2 × 4 contingency table.
Logit Model of Ratkovsky, Evans, and Alldredge (1993)
The cross–over experiment aims to analyze the relationship between the transitions (0, 1) and (1, 0) and the constant response profiles (0, 0) and (1, 1). We define the following probabilities:

(i) unconditional:

PA : P (success of A),
PB : P (success of B);

(ii) conditional (conditioned on the preceding treatment):

PA|B : P (success of A | success of B),
PA|B̄ : P (success of A | no success of B);

and, analogously, PB|A and PB|Ā. The contingency tables of the two groups then have the expectations mij of cell counts illustrated in Table 10.8. The proper table of observed response values is as follows (Table 10.6 transformed and using Nij instead of nij):

         (1, 1)  (1, 0)  (0, 1)  (0, 0)
1 (AB)    N11     N12     N13     N14     n1·
2 (BA)    N21     N22     N23     N24     n2·
The loglinear model for sequence i (group i = 1, 2) can then be written as follows:

( ln(Ni1) )
( ln(Ni2) )  =  Xβi + εi , (10.94)
( ln(Ni3) )
( ln(Ni4) )
where the vector of errors εi is such that plim εi = 0. From Table 10.8, we get the design matrix for the two groups

      ( 1 1 0 1 0 0 0 )
X  =  ( 1 1 0 0 1 0 0 )   (10.95)
      ( 1 0 1 0 0 1 0 )
      ( 1 0 1 0 0 0 1 )
and the vectors of the parameters

β1 = (ln(n1·), ln(PA), ln(1 − PA), ln(PB|A), ln(1 − PB|A), ln(PB|Ā), ln(1 − PB|Ā))′ ,

β2 = (ln(n2·), ln(PB), ln(1 − PB), ln(PA|B), ln(1 − PA|B), ln(PA|B̄), ln(1 − PA|B̄))′ . (10.96)
Under the usual assumption of independent multinomial distributions M(ni·, πi1, πi2, πi3, πi4), we get the estimators of the parameters βi by iteratively solving the likelihood equations using the Newton–Raphson procedure. An algorithm to solve this problem is given in Ratkovsky et al. (1993, Appendix 7.A). The authors mention that the implementation is quite difficult.
Taking advantage of the structure of Table 10.8, this difficulty can be avoided by transforming the problem (equivalently reducing it) to a standard problem that can be solved with standard software.
From Table 10.8, we get the following relations

(m11 + m12)/n1· = PA PB|A + PA (1 − PB|A) = PA ,
(m13 + m14)/n1· = (1 − PA) ,     (10.97)

⇒  ln(m11 + m12) − ln(m13 + m14) = ln(PA) − ln(1 − PA) = logit(PA) , (10.98)

ln(m11) − ln(m12) = logit(PB|A) , (10.99)
ln(m13) − ln(m14) = logit(PB|Ā) , (10.100)

and, analogously,

ln(m21 + m22) − ln(m23 + m24) = logit(PB) , (10.101)
ln(m21) − ln(m22) = logit(PA|B) , (10.102)
ln(m23) − ln(m24) = logit(PA|B̄) . (10.103)
The logits, as a measure for the various effects in the 2 × 2 cross–over, are developed using one of the four parametrizations given in Section 10.3.4 for the main effects and the additional effects for the within–subject correlation. To avoid overparametrization, we drop the carry–over effect λ, which is represented as an alias effect anyhow, using the other interaction effects (cf. Section 10.3.4). The model of Ratkovsky et al. (1993, REA model) has the following structure.
REA Model
logit(PA)   = µ + γ1 + π1 + τ1 ,
logit(PB|A) = µ + γ1 + π2 + τ2 + α11 ,
logit(PB|Ā) = µ + γ1 + π2 + τ2 + α10 ,
logit(PB)   = µ + γ2 + π1 + τ2 ,
logit(PA|B) = µ + γ2 + π2 + τ1 + α21 ,
logit(PA|B̄) = µ + γ2 + π2 + τ1 + α20 .
µ, γi, πi, and τi denote the usual parameters for the four main effects: overall mean, sequence, period, and treatment. The new parameters have the following meaning:

αi1 is the association effect averaged over subjects of sequence i if the period-1 treatment was a success; and
αi0 is the analog for failure.
Using the sum–to–zero conventions for the within–subject effects,

γ = γ1 = −γ2     (sequence effect),
π = π1 = −π2     (period effect),
τ = τ1 = −τ2     (treatment effect),
αi0 = −αi1       (association effect),

we can represent the REA model for the two sequences as follows:
( logit(PA)   )     ( 1  1  1  1  0  0 ) ( µ   )
( logit(PB|A) )     ( 1  1 −1 −1  1  0 ) ( γ   )
( logit(PB|Ā) )  =  ( 1  1 −1 −1 −1  0 ) ( π   )
( logit(PB)   )     ( 1 −1  1 −1  0  0 ) ( τ   )
( logit(PA|B) )     ( 1 −1 −1  1  0  1 ) ( α11 )
( logit(PA|B̄) )     ( 1 −1 −1  1  0 −1 ) ( α21 )

Logit = Xs βs . (10.104)
Replacing the estimators of the logits on the left side by the relations (10.98)–(10.103), and replacing the expected counts mij by the observed counts Nij, we get the following solutions

β̂s = Xs⁻¹ Logit , (10.105)
i.e.,

( µ̂   )          ( 2  1  1  2  1  1 ) ( Logit(PA)   )
( γ̂   )          ( 2  1  1 −2 −1 −1 ) ( Logit(PB|A) )
( π̂   )  = 1/8   ( 2 −1 −1  2 −1 −1 ) ( Logit(PB|Ā) )   (10.106)
( τ̂   )          ( 2 −1 −1 −2  1  1 ) ( Logit(PB)   )
( α̂11 )          ( 0  4 −4  0  0  0 ) ( Logit(PA|B) )
( α̂21 )          ( 0  0  0  0  4 −4 ) ( Logit(PA|B̄) )

With (10.98)–(10.103) (mij replaced by Nij) we get
Logit(PA)   = ln [(N11 + N12)/(N13 + N14)] , (10.107)
Logit(PB|A) = ln (N11/N12) , (10.108)
Logit(PB|Ā) = ln (N13/N14) , (10.109)
Logit(PB)   = ln [(N21 + N22)/(N23 + N24)] , (10.110)
Logit(PA|B) = ln (N21/N22) , (10.111)
Logit(PA|B̄) = ln (N23/N24) . (10.112)
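The matrix appearing in (10.106) can be verified numerically against the design matrix Xs of (10.104); the sketch below checks that it equals 8 · Xs⁻¹:

```python
import numpy as np

# Design matrix Xs of the saturated REA model (10.104)
Xs = np.array([[1,  1,  1,  1,  0,  0],
               [1,  1, -1, -1,  1,  0],
               [1,  1, -1, -1, -1,  0],
               [1, -1,  1, -1,  0,  0],
               [1, -1, -1,  1,  0,  1],
               [1, -1, -1,  1,  0, -1]], dtype=float)

# The matrix of (10.106), which should equal 8 * inverse(Xs)
M = np.array([[2,  1,  1,  2,  1,  1],
              [2,  1,  1, -2, -1, -1],
              [2, -1, -1,  2, -1, -1],
              [2, -1, -1, -2,  1,  1],
              [0,  4, -4,  0,  0,  0],
              [0,  0,  0,  0,  4, -4]], dtype=float)

ok = np.allclose(np.linalg.inv(Xs), M / 8)
print(ok)  # -> True
```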
In the saturated model (10.104), rank(Xs) = 6, so that the parameter estimates β̂s can be derived directly from the estimated logits via (10.105).
The parameter estimates in the saturated model (10.104) are
α̂11 = (1/2)[Logit(PB|A) − Logit(PB|Ā)] = (1/2) ln [N11 N14 / (N12 N13)] , (10.113)

α̂21 = (1/2) ln [N21 N24 / (N22 N23)] . (10.114)
Then exp(2α̂11), for example, is the odds ratio in the 2 × 2 table of the AB sequence

       1    0
  1   N11  N12
  0   N13  N14
8µ̂ = ln [((N11 + N12)/(N13 + N14))² (N11 N13)/(N12 N14)]
     + ln [((N21 + N22)/(N23 + N24))² (N21 N23)/(N22 N24)]   (10.115)
   = a1 + a2 ,

8γ̂ = a1 − a2 , (10.116)

8π̂ = ln [((N11 + N12)/(N13 + N14))² (N12 N14)/(N11 N13)]
     + ln [((N21 + N22)/(N23 + N24))² (N22 N24)/(N21 N23)]   (10.117)
   = a3 + a4 ,

8τ̂ = a3 − a4 . (10.118)
The covariance matrix of β̂s is derived considering the covariance structure of the logits in the weighted least–squares estimation (cf. Chapter 8). For the saturated model or submodels (after dropping nonsignificant parameters), the parameter estimates are given by standard software.
Ratkovsky et al. (1993, p. 310) give an example of the application of the procedure SAS PROC CATMOD. The file has to be organized according to (10.107)–(10.112) and Table 10.9 (Y = 1: success, Y = 2: failure).
Count        Y   Count in Example 10.3
Logit(PA):
N11 + N12    1   16
N13 + N14    2   14
Logit(PB|A):
N11          1   14
N12          2    2
Logit(PB|Ā):
N13          1    5
N14          2    9
Logit(PB):
N21 + N22    1   23
N23 + N24    2   15
Logit(PA|B):
N21          1   18
N22          2    5
Logit(PA|B̄):
N23          1    4
N24          2   11

Table 10.9. Data organization in SAS PROC CATMOD (saturated model).
Example 10.4. The efficiency of a treatment (B) compared to a placebo (A) for a mental illness is examined using a 2 × 2 cross–over experiment (Table 10.10). Coding is 1: improvement and 0: no improvement.

Group    (0, 0)  (0, 1)  (1, 0)  (1, 1)  Total
1 (AB)      9       5       2      14      30
2 (BA)     11       4       5      18      38
Total      20       9       7      32      68
Table 10.10. Response profiles in a 2 × 2 cross–over with binary response.
We first check H0: “treatment × period effect = 0” using the odds ratio [(10.72)]

OR = (9 · 18)/(14 · 11) = 1.05 ,
ln(OR) = 0.05 ,
σ̂²(ln OR) = 1/9 + 1/18 + 1/14 + 1/11 = 0.33 ,
(ln(OR))² / σ̂²(ln OR) = 0.01 < 3.84 ,
so that H0 is not rejected. Now we can run the tests for treatment effects. The Mainland–Gart test uses the following 2 × 2 table:

Group    (0, 1)  (1, 0)  Total
1 (AB)      5       2       7
2 (BA)      4       5       9
Total       9       7      16
Pearson’s χ² statistic with 1 degree of freedom,

χ² = 16 (5 · 5 − 2 · 4)² / (9 · 7 · 7 · 9) = 1.17 < 3.84 = χ²(1; 0.95) ,

does not indicate a treatment effect (p–value: 0.2804).
The Mainland–Gart test and Fisher’s exact test do test the same hypothesis, but the p–values are different. Fisher’s exact test (cf. Section 2.6.2) gives, for the three tables,
2 5      1 6      0 7
5 4      6 3      7 2
the following probabilities

P1 = [7! 9! 7! 9! / 16!] · [1 / (5! 2! 4! 5!)] = 0.2317 ,
P2 = [(2 · 4)/(6 · 6)] P1 = 0.0515 ,
P3 = [(1 · 3)/(7 · 7)] P2 = 0.0032 ,

with P = P1 + P2 + P3 = 0.2864, so that H0: P((AB)) = P((BA)) is not rejected.
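The three table probabilities follow a hypergeometric distribution; a direct computation is sketched below (exact arithmetic reproduces the probabilities up to rounding, with slight differences in the last printed digit):

```python
from math import comb

# Hypergeometric probability of the count a in the (AB, (1,0)) cell,
# with margins: row AB = 7, column (1,0) = 7, N = 16
def p_table(a):
    return comb(7, a) * comb(9, 7 - a) / comb(16, 7)

P1, P2, P3 = p_table(2), p_table(1), p_table(0)  # observed and more extreme tables
P = P1 + P2 + P3
print(round(P1, 4), round(P2, 4), round(P3, 4), round(P, 4))
# -> 0.2313 0.0514 0.0031 0.2858
```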
Prescott’s test uses the following 2 × 3 table:

Group   (0, 1)  (0, 0) or (1, 1)  (1, 0)  Total
(AB)       5          9 + 14         2      30
(BA)       4         11 + 18         5      38
Total      9            52           7      68
V = 30 · 38 [(9 + 7) · 68 − (9 − 7)²] / 68
  = (30 · 38 / 68) [16 · 68 − 4] = 18172.94 ,

χ²(P) = [(5 − 2) · 68 − (9 − 7) · 30]² / V = 144² / V = 1.14 < 3.84 .
H0 : treatment effect = 0 is not rejected.
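The three classical test statistics of this example can be reproduced in a few lines; in the sketch below the (1, 0) count of group BA in Prescott's table is taken as 5, consistent with the totals of Table 10.10:

```python
import math

# Wald test for treatment x period via the odds ratio (10.72)
or_ = (9 * 18) / (14 * 11)
wald = math.log(or_) ** 2 / (1/9 + 1/18 + 1/14 + 1/11)

# Mainland-Gart: Pearson chi-square of the 2x2 table of transitions
chi2_mg = 16 * (5 * 5 - 2 * 4) ** 2 / (9 * 7 * 7 * 9)

# Prescott: variance V and chi-square of the 2x3 table
V = 30 * 38 * (16 * 68 - 2 ** 2) / 68
chi2_p = ((5 - 2) * 68 - 2 * 30) ** 2 / V

print(round(wald, 2), round(chi2_mg, 2), round(chi2_p, 2))  # -> 0.01 1.17 1.14
ok = max(wald, chi2_mg, chi2_p) < 3.84  # no statistic exceeds the 5% critical value
```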
Saturated REA Model
The analysis of the REA model using SAS gives the following table, after calling this procedure in SAS:

PROC CATMOD DATA = BEISPIEL 8.4;
  WEIGHT COUNT;
  DIRECT SEQUENCE PERIOD TREAT ASSOC_AB ASSOC_BA;
  MODEL Y = SEQUENCE PERIOD TREAT ASSOC_AB ASSOC_BA / NOGLS ML;
RUN;
Effect       Estimate    S.E.     Chi-Square   p–Value
INTERCEPT     0.3437    0.1959      3.08        0.0793
SEQUENCE      0.0626    0.1959      0.10        0.7429
PERIOD       −0.0623    0.1959      0.10        0.7470
TREAT        −0.2096    0.1959      1.14        0.2846
ASSOC AB      1.2668    0.4697      7.27        0.0070 *
ASSOC BA      1.1463    0.3862      8.81        0.0030 *
None of the main effects is significant.
Remark. The parameter estimates may be checked directly using formulas (10.113)–(10.118):

µ̂ = (1/8) ln [((14 + 2)/(9 + 5))² · (14 · 5)/(9 · 2)] + (1/8) ln [((18 + 5)/(11 + 4))² · (18 · 4)/(11 · 5)]
   = 0.2031 + 0.1406 = 0.3437 ,

γ̂ = 0.2031 − 0.1406 = 0.0625 ,

π̂ = (1/8) ln [(16/14)² · 18/70] + (1/8) ln [(23/15)² · 55/72]
   = −0.1364 + 0.0732 = −0.0632 ,

τ̂ = −0.1364 − 0.0732 = −0.2096 ,

α̂11 = (1/2) ln [(9 · 14)/(5 · 2)] = 1.2668 ,

α̂21 = (1/2) ln [(11 · 18)/(4 · 5)] = 1.1463 .
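These closed-form estimates can be reproduced from the counts of Table 10.10 by applying the matrix of (10.106) to the logits (10.107)–(10.112):

```python
import math

# Counts from Table 10.10: N_i1=(1,1), N_i2=(1,0), N_i3=(0,1), N_i4=(0,0)
N11, N12, N13, N14 = 14, 2, 5, 9     # group AB
N21, N22, N23, N24 = 18, 5, 4, 11    # group BA

L = [math.log((N11 + N12) / (N13 + N14)),  # Logit(PA),        (10.107)
     math.log(N11 / N12),                  # Logit(PB|A),      (10.108)
     math.log(N13 / N14),                  # Logit(PB|not A),  (10.109)
     math.log((N21 + N22) / (N23 + N24)),  # Logit(PB),        (10.110)
     math.log(N21 / N22),                  # Logit(PA|B),      (10.111)
     math.log(N23 / N24)]                  # Logit(PA|not B),  (10.112)

# Rows of the matrix in (10.106), divided by 8
M = [[2, 1, 1, 2, 1, 1], [2, 1, 1, -2, -1, -1], [2, -1, -1, 2, -1, -1],
     [2, -1, -1, -2, 1, 1], [0, 4, -4, 0, 0, 0], [0, 0, 0, 0, 4, -4]]
mu, gamma, pi, tau, a11, a21 = (sum(m * l for m, l in zip(row, L)) / 8 for row in M)
print(round(mu, 4), round(gamma, 4), round(pi, 4),
      round(tau, 4), round(a11, 4), round(a21, 4))
# -> 0.3437 0.0626 -0.0632 -0.2096 1.2668 1.1463
```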
Analysis via GEE1 (cf. Chapter 8)
The analysis of the data set using the GEE1 procedure of Heumann (1993) gives the following results for parametrization No. 2 (model (10.58)):
Effect Estimates Naive S.E. Robust S.E. P-Robust
INTERCEPT 0.1335 0.3569 0.3569 0.7154
TREATMENT 0.2939 0.4940 0.4940 0.5521
PERIOD 0.1849 0.4918 0.4918 0.7071
TREAT x PERIOD -0.0658 0.7040 0.8693 0.9397
The working correlation is 0.5220. None of the effects is significant.
10.5 Exercises and Questions
10.5.1 Give a description of the linear model of cross–over designs. What is its relationship to repeated measures and split–plot designs? What are the main effects and the interaction effect?
10.5.2 Review the test strategy in the 2 × 2 cross–over. Assuming the carry–over effect to be significant, what effect is still testable? Is this test useful?
10.5.3 What is the difference between the classical approach and the four alternative parametrizations? Describe the relationship between randomization versus carry–over effect and parallel groups versus sequence effect.
10.5.4 Consider the following 2 × 2 cross–over with binary response:

Group    (0, 0)  (0, 1)  (1, 0)  (1, 1)  Total
1 (AB)    n11     n12     n13     n14     n1·
2 (BA)    n21     n22     n23     n24     n2·
Which contingency tables and corresponding odds ratios are indicators for the treatment effect or the treatment × period effect?
10.5.5 Review the tests of McNemar, Mainland–Gart, and Prescott (assumptions, objectives).
11 Statistical Analysis of Incomplete Data
11.1 Introduction
A basic problem in the statistical analysis of data sets is the loss of single observations, of variables, or of single values. Rubin (1976) can be regarded as the pioneer of the modern theory of Nonresponse in Sample Surveys. Little and Rubin (1987) and Rubin (1987) have discussed fundamental concepts for handling missing data based on decision theory and models for the mechanism of nonresponse.
Standard statistical methods have been developed to analyze rectangular data sets, i.e., to analyze a matrix

      ( x11  · · ·  x1p )
X  =  (  ⋮    ∗      ⋮  )
      (  ⋮    ∗      ⋮  )
      ( xn1  · · ·  xnp )
The columns of the matrix X represent variables observed for each unit,and the rows of X represent units (cases, observations) of the variables.Here, data on all scales can be observed:
• interval-scaled data;
• ordinal-scaled data; and
• nominal-scaled data.
© Springer Science + Business Media, LLC 2009
H. Toutenburg and Shalabh, Statistical Analysis of Designed Experiments, Third Edition, 487Springer Texts in Statistics, DOI 10.1007/978-1-4419-1148-3_11,
In practice, some of the observations may be missing. This fact is indicated by the symbol “∗”.
Examples:
• People do not always give answers to all of the items in a questionnaire. Answers may be missing at random (a question was overlooked) or not missing at random (individuals are not always willing to give detailed information concerning personal items like drinking behavior, income, sexual behavior, etc.).
• Mechanical experiments in industry (e.g., quality control by pressure) sometimes destroy the object and the response is missing. If there is a strong causal relationship between the object of the experiment and the loss of response, then it may be expected that the response is not missing at random.
• In clinical long–term studies, some individuals may not cooperate or do not participate over the whole period and drop out. In the analysis of lifetime data, these individuals are called censored. Censoring is a mechanism causing nonrandomly missing data.
Figure 11.1. Censored individuals (I: drop–out and II: censored by the endpoint of the study) and an individual with response (event) (III).
Statistical Methods with Missing Data
There are mainly three general approaches to handling the missing data problem in statistical analysis.
(i) Complete Case Analysis
Analyses using only complete cases confine their attention to those cases (rows of the matrix X) where all p variables are observed. Let X be rearranged according to

X  =  ( Xc )        Xc : n1 × p ,
      ( X∗ )        X∗ : n2 × p ,

where Xc (c: complete) is fully observed. The statistical analysis makes use of the data in Xc only. The complete case analysis tends to become inefficient if the percentage (n2/n) · 100 increases and if there are blocks in the pattern of missing data. The selection of complete cases can lead to a selectivity bias in the estimates if selection is heterogeneous with respect to the covariates. Hence, the crucial concern is whether or not the complete cases constitute a random subsample of X.
Example 11.1. Suppose that age under 60 and age over 60 are the two levels of the binary variable X (age of individuals). Assume the following situation in a lifetime data analysis:

        Start   End
< 60     100     60
> 60     100     40

The drop–out percentage is 40% and 60%, respectively. Hence, one has to test if there is a selectivity bias in estimating survivorship models and, if the tests are significant, one has to correct the estimations by adjustment methods (see, e.g., Walther and Toutenburg, 1991).
(ii) Filling In the Missing Values (Imputation for Nonresponse)
Imputation is a general and flexible alternative to the complete case analysis. The missing cells in the submatrix X∗ are replaced by guesses or correlation–based predictors, transforming X∗ into a completed matrix. However, this method can lead to severe biases in statistical analysis, as the imputed values, in general, are different from the true but missing data. We will discuss this problem in detail in the case of regression. Sometimes, the statistician has no other choice but to fill up the matrix X∗, especially if the percentage of complete units is too small. There are several approaches for imputation. Popular among them are the following:
• Hot deck imputation. Recorded units of the sample are substitutedfor missing data.
• Cold deck imputation. A missing value is replaced by a constant value,as, for example, a unit from external (or previous) samples.
• Mean imputation. Based on the sample of the responding units, means are substituted for the missing cells.
• Regression (correlation) imputation. Based on the correlative structure of the matrix Xc, missing values are replaced by predicted values from a regression of the missing item on items observed for the unit.
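As a minimal sketch of the simplest of these strategies, mean imputation on a toy data column (illustrative values only) looks as follows:

```python
import numpy as np

# Toy column with missing cells coded as NaN (illustrative data only)
x = np.array([2.0, 4.0, np.nan, 6.0, np.nan, 8.0])

# Mean imputation: substitute the mean of the responding units (here 5.0)
x_filled = np.where(np.isnan(x), np.nanmean(x), x)
print(x_filled)  # -> [2. 4. 5. 6. 5. 8.]
```

Hot deck, cold deck, and regression imputation differ only in how the substituted value is chosen, not in this fill-in mechanism.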
(iii) Model–Based Procedures
Modeling techniques are generated by factorization of the likelihood according to the observation and missing patterns. Parameters can be estimated by iterative maximum likelihood procedures starting with the complete cases. These methods are discussed in full by Little and Rubin (1987).
Multiple Imputation
The idea of multiple imputation (Rubin, 1987) is to achieve a variability of the estimate by repeated imputation and analysis of each of the so–completed data sets. The final estimate can then be calculated, for example, by taking the means.
Missing Data Mechanisms
Ignorable nonresponse. Knowledge of the mechanism for nonresponse is a central element in choosing an appropriate statistical analysis. If the mechanism is under the control of the statistician, and if it generates a random subsample of the whole sample, then it may be called ignorable.
Example: Assume Y ∼ N(µ, σ²) to be a univariate normally distributed response variable and denote the planned whole sample by (y1, . . . , ym, ym+1, . . . , yn)′. Suppose that indeed only a subsample denoted by yobs = (y1, . . . , ym)′ of responses is observed and the remaining responses ymis = (ym+1, . . . , yn)′ are missing. If the values are missing at random (MAR), then the vector (y1, . . . , ym)′ is a random subsample. The only disadvantage is a loss of sample size and, hence, a loss of efficiency of the unbiased estimators ȳ and s²y.
Nonignorable nonresponse occurs if the probability P(yi observed) is a function of the value yi itself, as happens, for example, in the case of censoring. In general, estimators based on nonrandom subsamples are biased.
MAR, OAR, and MCAR
Let us assume a bivariate sample of (X, Y) such that X is completely observed but some values of Y are missing. This structure is a special case of a so–called monotone pattern of missing data.
This situation is typical for longitudinal studies or questionnaires, whenone variable is known for all elements of the sample, but the other variableis unknown for some of them.
Figure 11.2. Monotone pattern in the bivariate case: x is observed for all units 1, . . . , n, whereas y splits into yobs (units 1, . . . , m) and ymis (units m + 1, . . . , n).
Examples:

X          Y
Age        Income
Placebo    Blood pressure after 28 days
Cancer     Life span
The probability of the response of Y can be dependent on X and Y in the following manner:
(i) dependent on X and Y ;
(ii) dependent on X but independent of Y ; and
(iii) independent of X and Y .
In case (iii) the missing data is said to be missing at random (MAR) and the observed data is said to be observed at random (OAR). Thus the missing data is said to be missing completely at random (MCAR). As a consequence, the data yobs constitutes a random subsample of y = (yobs, ymis)′. In case (ii) the missing data is MAR but the observed values are not necessarily a random subsample of y. However, within fixed X–levels, the y–values yobs are OAR.
In case (i) the data is neither MAR nor OAR and, hence, the missing data mechanism is not ignorable. In cases (ii) and (iii) the missing data mechanisms are ignorable for methods using the likelihood function. In case (iii) this is true for methods based on the sample as well.
If the conditional distribution of Y | X has to be investigated, then MAR is sufficient to have efficient estimators. On the other hand, if the marginal distribution of Y is of interest (e.g., estimation of µ by ȳ based on the m complete observations), then MCAR is a necessary assumption to avoid a bias. Suppose that the joint density function of X and Y is factorized as
f(X, Y ) = f(X)f(Y | X)
where f(X) is the marginal density of X and f(Y | X) is the conditional density of Y | X. It is obvious that analysis of f(Y | X) has to be based on the m jointly observed data points. Estimating ymis coincides with the classical prediction.
Example: Suppose that X is a categorical covariate with two categories X = 1 (age > 60 years) and X = 0 (age ≤ 60 years). Let Y be the lifetime of a denture. It may happen that the younger group of patients participates less often in the follow–ups compared to the older group. Therefore, one may expect that P(yobs | X = 1) > P(yobs | X = 0).
11.2 Missing Data in the Response
In controlled experiments such as clinical trials, the design matrix X is fixed and the response is observed for the different factor levels of X. The analysis is done by means of analysis of variance or the common linear model and the associated test procedures (cf. Chapter 3). In this situation, it is realistic to assume that missing values occur in the response y and not in the design matrix X. This results in an unbalanced response. Even if we can assume that MCAR holds, sometimes it may be more advantageous to fill up the vector y than to confine the analysis to the complete cases. This is the case, for example, in factorial (cross–classified) designs with few replications.
11.2.1 Least Squares Analysis for Complete Data
Let Y be the response variable, X the (T, K)–matrix of the design, and assume the linear model

y = Xβ + ε ,   ε ∼ N(0, σ²I) . (11.1)
The OLSE of β is given by b = (X′X)⁻¹X′y and the unbiased estimator of σ² is given by

s² = (y − Xb)′(y − Xb)(T − K)⁻¹ = Σ_{t=1}^{T} (yt − ŷt)² / (T − K) . (11.2)
To test linear hypotheses of the type Rβ = 0 (R a (J × K)–matrix of rank J), we use the test statistic

F_{J,T−K} = (Rb)′[R(X′X)⁻¹R′]⁻¹(Rb) / (J s²) (11.3)
(cf. Sections 3.7 and 3.8).
11.2.2 Least Squares Analysis for Filled–Up Data
The following method was proposed by Yates (1933). Assume that (T − m) responses in y are missing. Reorganize the data matrices according to
( yobs )     ( Xc )        ( εc )
( ymis )  =  ( X∗ ) β  +   ( ε∗ ) . (11.4)
The complete case estimator of β is then given by

bc = (Xc′Xc)⁻¹Xc′ yobs (11.5)

(Xc : m × K), and the classical predictor of the (T − m)–vector ymis is given by

ŷmis = X∗ bc . (11.6)
Inserting this estimator into (11.4) for ymis and estimating β in the filled–up model is equivalent to minimizing the following function with respectto β (cf. (3.6))
S(β) =(
yobs
ymis
)−
(Xc
X∗
)β
′(yobs
ymis
)−
(Xc
X∗
)β
=m∑
t=1
(yt − x′tβ)2 +T∑
t=m+1
(yt − x′tβ)2 −→ minβ
! (11.7)
The first sum is minimized by bc [(11.5)]. Replacing β in the second sum bybc equates this sum to zero (cf. (11.6)), i.e., to its absolute minimum. There-fore, the estimator bc minimizes the error–sum–of–squares S(β) [(11.7)] andbc is seen to be the OLSE of β in the filled–up model.
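This equivalence is easy to confirm numerically: filling up the missing responses with the classical predictions and refitting on all T rows reproduces the complete-case OLSE exactly (simulated data, illustrative dimensions only):

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, m = 12, 3, 8                       # m complete cases, T - m missing responses
X = rng.normal(size=(T, K))
y_obs = rng.normal(size=m)               # only the first m responses are observed

Xc, Xstar = X[:m], X[m:]
bc = np.linalg.lstsq(Xc, y_obs, rcond=None)[0]   # complete-case OLSE (11.5)

# Fill up with the classical predictions (11.6) and refit on all T rows
y_filled = np.concatenate([y_obs, Xstar @ bc])
b_filled = np.linalg.lstsq(X, y_filled, rcond=None)[0]

print(np.allclose(bc, b_filled))  # -> True: Yates' method reproduces bc
```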
Estimating σ²

(i) If the data are complete, then s² = Σ_{t=1}^{T} (yt − ŷt)²/(T − K) is the correct estimator of σ².

(ii) If (T − m) values are missing (i.e., ymis in (11.4)), then

σ̂²mis = Σ_{t=1}^{m} (yt − ŷt)² / (m − K) (11.8)

would be the appropriate estimator of σ².
(iii) On the other hand, if the missing data are filled up according to the method of Yates, we automatically receive the estimator

σ̂²Yates = [Σ_{t=1}^{m} (yt − ŷt)² + Σ_{t=m+1}^{T} (ŷt − ŷt)²] / (T − K)
        = Σ_{t=1}^{m} (yt − ŷt)² / (T − K) . (11.9)

Therefore we get the relationship

σ̂²Yates = σ̂²mis · (m − K)/(T − K) < σ̂²mis , (11.10)

and hence the method of Yates underestimates the variance. As a consequence, the confidence intervals (cf. (3.148), (3.149), and (3.164)) turn out to be too small and the test statistics (cf. (11.3)) become too large, implying that null hypotheses can be rejected more often. To ensure correct tests, the estimate of the variance and all the following statistics would have to be corrected by the factor (T − K)/(m − K).
11.2.3 Analysis of Covariance—Bartlett’s Method
Bartlett (1937) suggested an improvement of Yates’ ANOVA, which is known as Bartlett’s ANCOVA (analysis of covariance). This procedure is as follows:
(i) each missing value is replaced by an arbitrary estimate (guess): ymis ⇒ ỹmis;

(ii) define a (T × (T − m)) indicator matrix Z as a covariate according to

      ( 0  0  · · ·  0 )
      ( ⋮  ⋮          ⋮ )
      ( 0  0  · · ·  0 )
Z  =  ( 1  0  · · ·  0 )   (11.11)
      ( 0  1  · · ·  0 )
      ( ⋮  ⋮          ⋮ )
      ( 0  0  · · ·  1 )
The m null vectors indicate the observed cases and the (T − m) vectors ei′ indicate the missing values. This covariate Z leads to an additional ((T − m) × 1) parameter γ in the model that has to be estimated:

( yobs )  =  Xβ + Zγ + ε  =  (X, Z) ( β )  +  ε . (11.12)
( ỹmis )                            ( γ )
The OLSE of the parameter vector (β′, γ′)′ is found by minimizing the error–sum–of–squares

S(β, γ) = Σ_{t=1}^{m} (yt − xt′β − 0′γ)² + Σ_{t=m+1}^{T} (ỹt − xt′β − et′γ)² . (11.13)
The first term is minimal for β = bc [(11.5)], whereas the second term becomes minimal (equating to zero) for γ̂ = ỹmis − X∗bc. Hence, the sum total is minimal for (bc′, γ̂′)′, and so

( bc          )
( ỹmis − X∗bc )     (11.14)

is the OLSE of (β′, γ′)′ in the model (11.12). Choosing the guess ỹmis = X∗bc (as in Yates’ method), we get γ̂ = 0. Both methods lead to the complete case OLSE bc as an estimate of β. Introducing the additional parameter γ (which is not of any statistical interest) has one advantage: the number of degrees of freedom in estimating σ² in model (11.12) is now T minus the number of estimated parameters, i.e., T − K − (T − m) = m − K, and is hence correct. Therefore Bartlett’s ANCOVA leads to σ̂² = σ̂²mis (cf. (11.8)), an unbiased estimator of σ².
11.3 Missing Values in the X–Matrix
In econometric models, other than in experimental designs in biology or pharmacy, the matrix X is not fixed but contains observations of exogenous variables. Hence X may be a matrix of random variables, and missing observations can occur. In general, we may assume the following structure of the data:
( yobs  )     ( Xobs  )
( ymis  )  =  ( X∗obs ) β + ε . (11.15)
( y∗obs )     ( Xmis  )
Estimation of ymis corresponds to the prediction problem. The classical prediction is equivalent to the method of Yates. Based on these arguments, we may confine ourselves to the substructure

( yobs  )     ( Xobs )
( y∗obs )  =  ( Xmis ) β + ε (11.16)
of (11.15) and change the notation as follows:

( yc )     ( Xc )        ( εc )          ( εc )
( y∗ )  =  ( X∗ ) β  +   ( ε∗ ) ,        ( ε∗ )  ∼  (0, σ²I) . (11.17)
The submodel

yc = Xcβ + εc (11.18)

stands for the completely observed data (c: complete), and we have yc : m × 1, Xc : m × K, and rank(Xc) = K. Assume that X is nonstochastic. If not, we would use conditional expectations.
The other submodel

y∗ = X∗β + ε∗ (11.19)

is of dimension (T − m) = J. The vector y∗ is observed completely. In the matrix X∗ some observations are missing. The notation X∗ underlines that X∗ is partially incomplete, in contrast to the matrix Xmis, which is completely missing. Combining both of the submodels in model (11.17) corresponds to the so–called mixed model. Therefore, it seems natural to use the method of mixed estimation.
The optimal estimator of β in model (11.17) is given by the mixed estimator (cf. Rao et al., 2008, Chapter 5)

β̂(X∗) = (Xc′Xc + X∗′X∗)⁻¹(Xc′yc + X∗′y∗)
       = bc + Sc⁻¹X∗′(I_J + X∗Sc⁻¹X∗′)⁻¹(y∗ − X∗bc) , (11.20)
where

bc = (Xc′Xc)⁻¹Xc′yc (11.21)

is the OLSE in the complete case submodel (11.18) and

Sc = Xc′Xc . (11.22)

The covariance matrix of β̂(X∗) is

V(β̂(X∗)) = σ²(Sc + S∗)⁻¹ (11.23)

with

S∗ = X∗′X∗ . (11.24)
The mixed estimator (11.20) is not operational, though, due to the fact that X∗ is partially unknown.
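The algebraic equivalence of the two expressions in (11.20) can be checked on simulated data (illustrative dimensions only):

```python
import numpy as np

rng = np.random.default_rng(2)
m, J, K = 10, 4, 3
Xc = rng.normal(size=(m, K))
Xstar = rng.normal(size=(J, K))
yc = rng.normal(size=m)
ystar = rng.normal(size=J)

Sc = Xc.T @ Xc
bc = np.linalg.solve(Sc, Xc.T @ yc)

# First form of (11.20)
beta1 = np.linalg.solve(Sc + Xstar.T @ Xstar, Xc.T @ yc + Xstar.T @ ystar)

# Second (update) form of (11.20)
A = np.eye(J) + Xstar @ np.linalg.solve(Sc, Xstar.T)
beta2 = bc + np.linalg.solve(Sc, Xstar.T) @ np.linalg.solve(A, ystar - Xstar @ bc)

print(np.allclose(beta1, beta2))  # -> True: both forms agree
```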
11.3.1 Missing Values and Loss of Efficiency
Before we discuss the different methods for estimating missing values, let us study the consequences of confining the analysis to the complete case model [(11.18)]. Our measure to compare bc and β̂(X∗) is the scalar risk

R(β̂, β, Sc) = tr{Sc V(β̂)} , (11.25)

which coincides with the MSE–III risk. From Theorem A.3(iii) we have the identity

(Sc + X∗′X∗)⁻¹ = Sc⁻¹ − Sc⁻¹X∗′(I_J + X∗Sc⁻¹X∗′)⁻¹X∗Sc⁻¹ . (11.26)
Applying this, we get the risk of β̂(X∗) as

σ⁻²R(β̂(X∗), β, Sc) = tr{Sc(Sc + S∗)⁻¹} = K − tr{(I_J + B′B)⁻¹B′B} , (11.27)

where B = Sc^(−1/2) X∗′. The (J × J)–matrix B′B is nonnegative definite with rank(B′B) = J∗. If rank(X∗) = J < K holds, then J∗ = J and hence B′B > 0.
Let λ1 ≥ . . . ≥ λJ ≥ 0 denote the eigenvalues of B′B, let Λ = diag(λ1, . . . , λJ), and let P denote the matrix of orthogonal eigenvectors. Then we have (Theorem A.11) B′B = PΛP′ and

tr{(I_J + B′B)⁻¹B′B} = tr{P(I_J + Λ)⁻¹P′PΛP′} = tr{(I_J + Λ)⁻¹Λ} = Σ_{i=1}^{J} λi/(1 + λi) . (11.28)
The MSE–III risk of bc is

σ⁻²R(bc, β, Sc) = tr{Sc Sc⁻¹} = K . (11.29)

Using the MSE–III criterion, we may conclude that

σ⁻²[R(bc, β, Sc) − R(β̂(X∗), β, Sc)] = Σ λi/(1 + λi) ≥ 0 , (11.30)

and, hence, that β̂(X∗) is superior to bc. We want to continue the comparison according to a different criterion, which compares the size of the risks instead of their differences.
Definition 11.1. The relative efficiency of an estimator β̂1, compared to another estimator β̂2, is defined as the ratio

eff(β̂1, β̂2, A) = R(β̂2, β, A) / R(β̂1, β, A) . (11.31)

β̂1 is said to be less efficient than β̂2 if

eff(β̂1, β̂2, A) ≤ 1 .
Using (11.27)–(11.29) we find

eff(bc, β̂(X∗), Sc) = 1 − (1/K) Σ λi/(1 + λi) ≤ 1 . (11.32)

The relative efficiency of the complete case estimator bc, compared to the mixed estimator in the full model (11.17), is smaller than or equal to one:

max[0, 1 − (J/K) λ1/(1 + λ1)] ≤ eff(bc, β̂(X∗), Sc) ≤ 1 − (J/K) λJ/(1 + λJ) ≤ 1 . (11.33)
Examples:

(i) Let X_* = X_c, so that in the full model the design matrix X_c is used twice. Then B'B = X_c S_c^{-1} X_c' is idempotent of rank J = K. Therefore, we have \lambda_i = 1 (Theorem A.36(i)) and hence

\operatorname{eff}(b_c, \hat\beta(X_c), S_c) = 1/2 .  (11.34)

(ii) J = 1 (one row of X is incomplete). Then X_* = x_*' becomes a (1 \times K)-vector and B'B = x_*' S_c^{-1} x_* becomes a scalar. Let \mu_1 \ge \ldots \ge \mu_K > 0 be the eigenvalues of S_c and let \Gamma = (\gamma_1, \ldots, \gamma_K) be the matrix of the corresponding orthonormal eigenvectors. We may write \hat\beta(x_*) as

\hat\beta(x_*) = (S_c + x_* x_*')^{-1} (X_c' y_c + x_* y_*)  (11.35)

and observe that

\mu_1^{-1} x_*'x_* \le x_*' S_c^{-1} x_* = \sum_j \mu_j^{-1} (x_*'\gamma_j)^2 \le \mu_K^{-1} x_*'x_* .  (11.36)

According to (11.32), the relative efficiency becomes

\operatorname{eff}(b_c, \hat\beta(x_*), S_c) = 1 - \frac{1}{K}\,\frac{x_*' S_c^{-1} x_*}{1 + x_*' S_c^{-1} x_*} = 1 - \frac{1}{K}\,\frac{\sum_j \mu_j^{-1}(x_*'\gamma_j)^2}{1 + \sum_j \mu_j^{-1}(x_*'\gamma_j)^2} \le 1  (11.37)

and, hence,

1 - \frac{\mu_1 \mu_K^{-1}\, x_*'x_*}{K(\mu_1 + x_*'x_*)} \le \operatorname{eff}(b_c, \hat\beta(x_*), S_c) \le 1 - \frac{x_*'x_*}{K (\mu_1 \mu_K^{-1}) (\mu_K + x_*'x_*)} .  (11.38)

The relative efficiency of b_c in comparison to \hat\beta(x_*) depends on the vector x_* (or rather on its squared norm x_*'x_*), as well as on the eigenvalues of the matrix S_c, especially on the so-called condition number \mu_1/\mu_K and the span (\mu_1 - \mu_K) between the largest and smallest eigenvalues.
Let x_* = g\gamma_i (i = 1, \ldots, K), where g is a scalar, and define M = \operatorname{diag}(\mu_1, \ldots, \mu_K). For these x_*-vectors, which are parallel to the eigenvectors of S_c, the quadratic risk of the estimators \hat\beta(g\gamma_i) becomes

\sigma^{-2} R(\hat\beta(g\gamma_i), \beta, S_c) = \operatorname{tr}\{\Gamma M \Gamma' (\Gamma M \Gamma' + g^2 \gamma_i \gamma_i')^{-1}\} = K - 1 + \frac{\mu_i}{\mu_i + g^2} .  (11.39)

Hence, the relative efficiency of b_c reaches its maximum if x_* is parallel to \gamma_1 (the eigenvector corresponding to the maximum eigenvalue \mu_1). Therefore, the loss in efficiency caused by removing one row x_* is minimal for x_* = g\gamma_1 and maximal for x_* = g\gamma_K. This corresponds to the result of Silvey (1969) that the goodness of fit of the OLSE can be improved if additional observations are taken in the direction that was most imprecise, namely the direction of the eigenvector corresponding to the minimal eigenvalue \mu_K of S_c.
11.3.2 Standard Methods for Incomplete X–Matrices
(i) Complete Case Analysis
The idea of the first method is to confine the analysis to the completely observed submodel (11.18). The corresponding estimator of \beta is b_c = S_c^{-1} X_c' y_c (11.21), which is unbiased and has the covariance matrix V(b_c) = \sigma^2 S_c^{-1}. Using the estimator b_c is feasible only for a small percentage of missing or incomplete rows in X_*, i.e., for at most [(T - m)/T] \cdot 100\%, and assumes that MAR holds. The assumption of MAR may not be tenable if, for instance, too many rows in X_* are parallel to the eigenvector \gamma_K corresponding to the eigenvalue \mu_K of S_c.
(ii) Zero–Order Regression (ZOR)
This method, due to Weisberg (1980) and also called the method of sample means, replaces a missing value x_{ij} of the jth regressor X_j by the sample mean of the observed values of X_j. Denote the index sets of the missing values of X_j by

\Phi_j = \{ i : x_{ij} \text{ missing} \}, \quad j = 1, \ldots, K,  (11.40)

and let M_j be the number of elements in \Phi_j. Then, for j fixed, any missing value x_{ij} in X_* is replaced by

\hat x_{ij} = \bar x_j = \frac{1}{T - M_j} \sum_{i \notin \Phi_j} x_{ij} .  (11.41)
This method may be recommended as long as the sample mean is a good estimator for the mean of the jth column. If the data in the jth column are trended or follow a growth curve, then \bar x_j is not a good estimator and, hence, replacing missing values by \bar x_j may cause a bias. If all the missing values x_{ij} are replaced by the corresponding column means \bar x_j (j = 1, \ldots, K), then the matrix X_* results in a now completely known matrix X_{(1)}. Hence, an operationalized version of the mixed model (11.17) is

\begin{pmatrix} y_c \\ y_* \end{pmatrix} = \begin{pmatrix} X_c \\ X_{(1)} \end{pmatrix} \beta + \begin{pmatrix} \varepsilon_c \\ \varepsilon_{(1)} \end{pmatrix} .  (11.42)

For the vector of errors \varepsilon_{(1)}, we have

\varepsilon_{(1)} = (X_* - X_{(1)})\beta + \varepsilon_*  (11.43)

with

\varepsilon_{(1)} \sim \left( (X_* - X_{(1)})\beta,\; \sigma^2 I_J \right)  (11.44)

and J = (T - m). In general, replacing missing values can result in a biased mixed model, since (X_* - X_{(1)}) \ne 0 holds. If X is a matrix of stochastic regressor variables, then, at most, one may expect that E(X_* - X_{(1)}) = 0 holds.
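The sample-mean replacement (11.41) is straightforward to implement; the following minimal sketch (our own, with `np.nan` marking missing entries) imputes each column of X by its observed mean:

```python
import numpy as np

def zero_order_imputation(X):
    # ZOR / method of sample means, cf. (11.41): replace each missing
    # value (np.nan) in column j by the mean of the observed values
    # of that column. Returns a new, completely known matrix X_(1).
    X = X.astype(float)            # astype returns a copy
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        X[miss, j] = X[~miss, j].mean()
    return X

X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
print(zero_order_imputation(X))  # missing entry becomes (2 + 6) / 2 = 4
```

As the text warns, this is only defensible when the column mean is a good estimator of the column's level; for trended columns it introduces bias.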
(iii) First–Order Regression (FOR)
This term comprises a set of methods that make use of the structure of the matrix X by setting up additional regressions. Based on the index sets \Phi_j in (11.40), the dependence of each column x_j (j = 1, \ldots, K, j fixed) on the other columns is modeled according to the relationship

x_{ij} = \theta_{0j} + \sum_{\substack{\mu=1 \\ \mu \ne j}}^{K} x_{i\mu} \theta_{\mu j} + u_{ij}, \quad i \notin \Phi = \bigcup_{j=1}^{K} \Phi_j .  (11.45)

The missing values x_{ij} in X_* are estimated and replaced by

\hat x_{ij} = \hat\theta_{0j} + \sum_{\substack{\mu=1 \\ \mu \ne j}}^{K} x_{i\mu} \hat\theta_{\mu j} \quad (i \in \Phi_j) .  (11.46)
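The regressions (11.45)–(11.46) can be sketched as follows (our own implementation, assuming that every row with a missing x_{ij} has all other columns observed, so the fitted auxiliary regression can be evaluated there):

```python
import numpy as np

def first_order_imputation(X):
    # FOR, cf. (11.45)/(11.46): for each column j with missing values,
    # fit a linear regression of X_j on the remaining columns (plus an
    # intercept) over the fully observed rows, then predict the missing
    # entries from the fitted coefficients.
    X = X.astype(float)
    complete = ~np.isnan(X).any(axis=1)
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if not miss.any():
            continue
        others = [m for m in range(X.shape[1]) if m != j]
        A = np.column_stack([np.ones(complete.sum()), X[complete][:, others]])
        theta, *_ = np.linalg.lstsq(A, X[complete, j], rcond=None)
        B = np.column_stack([np.ones(miss.sum()), X[miss][:, others]])
        X[miss, j] = B @ theta
    return X

# Exact linear dependence x2 = 1 + 2*x1, so the missing value is recovered
X = np.array([[1., 3.], [2., 5.], [3., 7.], [4., np.nan]])
print(first_order_imputation(X))  # imputes 1 + 2*4 = 9 in the last row
```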
(iv) Correlation Methods for Stochastic X
In the case of stochastic regressors X_1, \ldots, X_K (or X_2, \ldots, X_K, if X_1 = 1), the vector \beta is estimated by solving the normal equations

\widehat{\operatorname{Cov}}(x_i, x_j)\, \hat\beta = \widehat{\operatorname{Cov}}(x_i, y) \quad (i, j = 1, \ldots, K),  (11.47)

where \widehat{\operatorname{Cov}}(x_i, x_j) is the (K \times K) sample covariance matrix. The (i, j)th element of \widehat{\operatorname{Cov}}(x_i, x_j) is calculated from the pairwise observed elements of the variables X_i and X_j. Similarly, \widehat{\operatorname{Cov}}(x_i, y) makes use of pairwise observed elements of x_i and y. Since this method frequently leads to unsatisfactory results, we will not deal with it any further. Based on simulation studies, Haitovsky (1968) concludes that in most situations the complete case estimator b_c is superior to the correlation method.
Maximum–Likelihood Estimates of Missing Values
Suppose that the errors are normally distributed, i.e., \varepsilon \sim N(0, \sigma^2 I_T). Moreover, assume a so-called monotone pattern of missing values, which enables a factorization of the likelihood (cf. Little and Rubin, 1987). We confine ourselves to the simplest case and assume that the matrix X_* is completely unobserved. This requires a model that contains no constant. Then X_*, in the mixed model (11.17), may be treated as an unknown parameter. The loglikelihood corresponding to the estimation of the unknown parameters \beta, \sigma^2, and the "parameter" X_* may be written as
\ln L(\beta, \sigma^2, X_*) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\left[ (y_c - X_c\beta)'(y_c - X_c\beta) + (y_* - X_*\beta)'(y_* - X_*\beta) \right] .  (11.48)
Differentiating with respect to \beta, \sigma^2, and X_* leads to the following normal equations:

\frac{1}{2}\,\frac{\partial \ln L}{\partial \beta} = \frac{1}{2\sigma^2}\left[ X_c'(y_c - X_c\beta) + X_*'(y_* - X_*\beta) \right] = 0 ,  (11.49)

\frac{\partial \ln L}{\partial \sigma^2} = \frac{1}{2\sigma^2}\left[ -n + \frac{1}{\sigma^2}(y_c - X_c\beta)'(y_c - X_c\beta) + \frac{1}{\sigma^2}(y_* - X_*\beta)'(y_* - X_*\beta) \right] = 0 ,  (11.50)

and

\frac{\partial \ln L}{\partial X_*} = \frac{1}{\sigma^2}\,(y_* - X_*\beta)\beta' = 0 .  (11.51)
This results in the ML estimators for \beta and \sigma^2:

\hat\beta = b_c = S_c^{-1} X_c' y_c ,  (11.52)

\hat\sigma^2 = \frac{1}{m}\,(y_c - X_c b_c)'(y_c - X_c b_c) ,  (11.53)

which are based only on the complete submodel (11.18). Hence, the ML estimator \hat X_* is a solution (cf. (11.49) with \beta = b_c) of

y_* = \hat X_* b_c .  (11.54)

Only for K = 1 is the solution unique:

\hat x_* = \frac{y_*}{b_c} ,  (11.55)

where b_c = (x_c'x_c)^{-1} x_c' y_c (cf. Kmenta, 1997). For K > 1, a (J \times (K-1))-fold set of solutions \hat X_* exists. If any solution \hat X_* of (11.54) is substituted for X_* in the mixed model, i.e.,

\begin{pmatrix} y_c \\ y_* \end{pmatrix} = \begin{pmatrix} X_c \\ \hat X_* \end{pmatrix} \beta + \begin{pmatrix} \varepsilon_c \\ \varepsilon_* \end{pmatrix} ,  (11.56)

then the following identity holds:

\hat\beta(\hat X_*) = (S_c + \hat X_*'\hat X_*)^{-1}(X_c'y_c + \hat X_*'y_*)
 = (S_c + \hat X_*'\hat X_*)^{-1}(S_c\beta + X_c'\varepsilon_c + \hat X_*'\hat X_*\beta + \hat X_*'\hat X_* S_c^{-1} X_c'\varepsilon_c)
 = \beta + (S_c + \hat X_*'\hat X_*)^{-1}(S_c + \hat X_*'\hat X_*) S_c^{-1} X_c'\varepsilon_c
 = \beta + S_c^{-1} X_c'\varepsilon_c
 = b_c .  (11.57)
Remark. The OLSE \hat\beta(\hat X_*) in the model filled up with the ML estimator \hat X_* equals the OLSE b_c in the complete-data submodel. This is true for other monotone patterns as well.

On the other hand, if the pattern is not monotone, then the ML equations have to be solved by iterative procedures, for example, the EM algorithm of Dempster, Laird and Rubin (1977) (cf. the algorithms of Oberhofer and Kmenta, 1974).

Further discussions of the problem of estimating missing values can be found in Little and Rubin (1987), Weisberg (1980) and Toutenburg (1992a, Chapter 8). Toutenburg, Heumann, Fieger and Park (1995) propose a unique solution of the normal equation (11.49) according to

\min_{X_*, \lambda} \left\{ |S_c + X_*'X_*|^{-1} - 2\lambda'(y_* - X_* b_c) \right\} .  (11.58)

The solution is

\hat X_* = \frac{y_*\, y_c' X_c}{y_c' X_c S_c^{-1} X_c' y_c} .  (11.59)
11.4 Adjusting for Missing Data in 2 × 2 Cross–Over Designs
In Chapter 10, procedures for testing a 2 × 2 cross-over design with continuous response were introduced. In practice, small sample sizes are an important factor in the employment of the cross-over design. Hence, for studies of this kind, it is especially important to use all available information and to include the data of incomplete observations in the analysis as well.
11.4.1 Notation
We assume that data are missing only for the second period of treatment. Moreover, we assume that the response pairs (y_{i1k}, y_{i2k}) of group i are ordered so that the first m_i pairs represent the complete data sets. The last (n_i - m_i) pairs are then the incomplete pairs of response. The first m_i values of the response of period j, which belong to complete observation pairs of group i, are now stacked in the vector

y_{ij}' = (y_{ij1}, \ldots, y_{ijm_i}) .  (11.60)

Those observations of the first period which belong to incomplete response pairs are denoted by

y_{i1}^{*\prime} = (y_{i1(m_i+1)}, \ldots, y_{i1n_i})  (11.61)

for group i. The (m \times 2) data matrix Y of the complete data and the ((n-m) \times 1) vector y_1^* of the incomplete data can now be written as

Y = \begin{pmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{pmatrix}, \qquad y_1^* = \begin{pmatrix} y_{11}^* \\ y_{21}^* \end{pmatrix} ,  (11.62)

with m = m_1 + m_2 and n = n_1 + n_2. Additionally, we assume that

(y_{i1k}, y_{i2k}) \overset{i.i.d.}{\sim} N((\mu_{i1}, \mu_{i2}), \Sigma) \quad \text{for } k = 1, \ldots, m_i ,
y_{i1k} \overset{i.i.d.}{\sim} N(\mu_{i1}, \sigma_{11}) \quad \text{for } k = m_i + 1, \ldots, n_i .  (11.63)
Here \Sigma denotes the covariance matrix

\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}  (11.64)

with

\sigma_{jj'} = \operatorname{Cov}(y_{ijk}, y_{ij'k})  (11.65)

and, hence, \sigma_{11} = \operatorname{Var}(y_{i1k}) and \sigma_{22} = \operatorname{Var}(y_{i2k}). The correlation coefficient \rho can now be written as

\rho = \frac{\sigma_{12}}{\sqrt{\sigma_{11}\sigma_{22}}} .  (11.66)

Additionally, we assume that the rows of the matrix Y are independent of the elements of the vector y_1^*. The entire sample can now be described by the two vectors u' = (y_{11}', y_{21}', y_1^{*\prime}) and v' = (y_{12}', y_{22}'). Hence, the (n \times 1)-vector u represents the observations of the first period and the (m \times 1)-vector v those of the second period. Since we interpret the observed response pairs as independent realizations of a random sample from a bivariate normal distribution, we can express the density function of (u, v) as the product of the marginal density of u and the conditional density of v given u. The density function of u is

f_u = \left( \frac{1}{\sqrt{2\pi\sigma_{11}}} \right)^n \exp\left( -\frac{1}{2\sigma_{11}} \sum_{i=1}^{2} \sum_{k=1}^{n_i} (y_{i1k} - \mu_{i1})^2 \right)  (11.67)
and the conditional density of v given u is

f_{v|u} = \left( \frac{1}{\sqrt{2\pi\sigma_{22}(1-\rho^2)}} \right)^m \exp\left( -\frac{1}{2\sigma_{22}(1-\rho^2)} \sum_{i=1}^{2} \sum_{k=1}^{m_i} \left( y_{i2k} - \mu_{i2} - \rho\sqrt{\sigma_{22}/\sigma_{11}}\,(y_{i1k} - \mu_{i1}) \right)^2 \right) .  (11.68)

The joint density function f_{u,v} of (u, v) is now

f_{u,v} = f_u\, f_{v|u} .  (11.69)
11.4.2 Maximum Likelihood Estimator (Rao, 1956)
We now estimate the unknown parameters \mu_{11}, \mu_{21}, \mu_{12}, and \mu_{22}, as well as the unknown components \sigma_{jj'} of the covariance matrix \Sigma. The loglikelihood is \ln L = \ln f_u + \ln f_{v|u} with

\ln f_u = -\frac{n}{2}\ln(2\pi\sigma_{11}) - \frac{1}{2\sigma_{11}} \sum_{i=1}^{2} \sum_{k=1}^{n_i} (y_{i1k} - \mu_{i1})^2  (11.70)
and

\ln f_{v|u} = -\frac{m}{2}\ln\left(2\pi\sigma_{22}(1-\rho^2)\right) - \frac{1}{2\sigma_{22}(1-\rho^2)} \sum_{i=1}^{2} \sum_{k=1}^{m_i} \left( y_{i2k} - \mu_{i2} - \rho\sqrt{\sigma_{22}/\sigma_{11}}\,(y_{i1k} - \mu_{i1}) \right)^2 .  (11.71)

Let us introduce the following notation:

\sigma_* = \sigma_{22}(1-\rho^2) ,  (11.72)

\beta = \rho\sqrt{\frac{\sigma_{22}}{\sigma_{11}}} ,  (11.73)

\mu_{i2}^* = \mu_{i2} - \beta\mu_{i1} .  (11.74)

Equation (11.71) can now be transformed, and we get

\ln f_{v|u} = -\frac{m}{2}\ln(2\pi\sigma_*) - \frac{1}{2\sigma_*} \sum_{i=1}^{2} \sum_{k=1}^{m_i} (y_{i2k} - \mu_{i2}^* - \beta y_{i1k})^2 .  (11.75)

This leads to a factorization of the loglikelihood into the two terms (11.70) and (11.75), where no two of the unknown parameters \mu_{11}, \mu_{21}, \mu_{12}^*, \mu_{22}^*, \sigma_{11}, \sigma_*, and \beta appear in the same summand. Hence, maximization of the loglikelihood can be done independently for the unknown parameters
and we find the maximum-likelihood estimates

\hat\mu_{i1} = \bar y_{i1\cdot}^{(n_i)} ,
\hat\mu_{i2} = \bar y_{i2\cdot}^{(m_i)} + \hat\beta\left( \hat\mu_{i1} - \bar y_{i1\cdot}^{(m_i)} \right) ,
\hat\beta = \frac{s_{12}}{s_{11}} ,
\hat\sigma_{11} = \frac{1}{n} \sum_{i=1}^{2} \sum_{k=1}^{n_i} (y_{i1k} - \hat\mu_{i1})^2 ,
\hat\sigma_{22} = s_{22} + \hat\beta^2 (\hat\sigma_{11} - s_{11}) ,
\hat\sigma_{12} = \hat\beta\,\hat\sigma_{11} .  (11.76)
If we write

\bar y_{ij\cdot}^{(a)} = \frac{1}{a} \sum_{k=1}^{a} y_{ijk} ,

s_{jj'} = \frac{1}{m_1 + m_2} \sum_{i=1}^{2} \sum_{k=1}^{m_i} \left( y_{ijk} - \bar y_{ij\cdot}^{(m_i)} \right)\left( y_{ij'k} - \bar y_{ij'\cdot}^{(m_i)} \right) ,  (11.77)

then \hat\beta and \bar y_{ij\cdot}^{(a)} are independent for a = n_i, m_i. Consequently, the covariance matrix \Gamma_i = ((\gamma_{i,uv})) of (\hat\mu_{i1}, \hat\mu_{i2}) is

\Gamma_i = \begin{pmatrix} \sigma_{11}/n_i & \sigma_{12}/n_i \\ \sigma_{12}/n_i & \left[ \sigma_{22} + \left(1 - \frac{m_i}{n_i}\right)\sigma_{11}\left( \operatorname{Var}(\hat\beta) - \beta^2 \right) \right]/m_i \end{pmatrix}  (11.78)

with

\operatorname{Var}(\hat\beta) = E\left( \operatorname{Var}(\hat\beta \mid y_1) \right) = \frac{\sigma_{22}(1-\rho^2)}{\sigma_{11}(m-4)} ,  (11.79)

\rho = \beta\sqrt{\frac{\sigma_{11}}{\sigma_{22}}} .  (11.80)
11.4.3 Test Procedures
We now develop test procedures for large and small sample sizes and formulate the hypotheses H_0^{(1)}: no interaction, H_0^{(2)}: no treatment effect, and H_0^{(3)}: no period effect:

H_0^{(1)}: \theta_1 = \mu_{11} + \mu_{12} - \mu_{21} - \mu_{22} = 0 ,  (11.81)
H_0^{(2)}: \theta_2 = \mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0 ,  (11.82)
H_0^{(3)}: \theta_3 = \mu_{11} - \mu_{12} + \mu_{21} - \mu_{22} = 0 .  (11.83)
Large Samples
The estimates (11.76) lead to the maximum-likelihood estimate \hat\theta_1 of \theta_1. For large sample sizes m_1 and m_2, the distribution of Z_1, defined by

Z_1 = \frac{\hat\theta_1}{\sqrt{\sum_{i=1}^{2} (\hat\gamma_{i,11} + 2\hat\gamma_{i,12} + \hat\gamma_{i,22})}} ,  (11.84)

can be approximated by the N(0,1)-distribution if H_0^{(1)} holds. Here the \hat\gamma_{i,uv} denote the estimates of the elements of the covariance matrix \Gamma_i. These are found by replacing \hat\sigma_{11} (11.76) and s_{jj'} (11.77) by their unbiased estimates

\tilde\sigma_{11} = \frac{n}{n-2}\,\hat\sigma_{11} ,  (11.85)

\tilde s_{jj'} = \frac{m}{m-2}\,s_{jj'} .  (11.86)

The maximum-likelihood estimate \hat\theta_2 for \theta_2 is derived from the estimates in (11.76). The test statistic Z_2, given by

Z_2 = \frac{\hat\theta_2}{\sqrt{\sum_{i=1}^{2} (\hat\gamma_{i,11} - 2\hat\gamma_{i,12} + \hat\gamma_{i,22})}} ,  (11.87)

is approximately N(0,1)-distributed for large samples m_1 and m_2 under H_0^{(2)}. Analogously, we construct the maximum-likelihood estimate \hat\theta_3 for \theta_3 and find the distribution of the test statistic

Z_3 = \frac{\hat\theta_3}{\sqrt{\sum_{i=1}^{2} (\hat\gamma_{i,11} - 2\hat\gamma_{i,12} + \hat\gamma_{i,22})}} .  (11.88)
Small Samples
For small sample sizes m_1 and m_2, Rao (1956) suggests approximating the distribution of Z_1 by a t-distribution with v_1 = \frac{1}{2}(n + m - 5) degrees of freedom. The choice of v_1 degrees of freedom is explained as follows: the estimates of the variances \sigma_{11} and \sigma_* (\hat\sigma_* = s_{22} - \hat\beta s_{12}) are based on (n-2) and (m-3) degrees of freedom, respectively, and their mean is v_1 = \frac{1}{2}(n + m - 5). If there are no missing values in the second period (n = m), then a t-distribution with (n-2) degrees of freedom should be chosen. This test then corresponds to the previously introduced test based on T_\lambda (10.19).

Rao chooses a t-distribution with v_2 = (m-2) degrees of freedom for the approximation of the distribution of Z_2 and Z_3. Morrison (1973) constructs a test for a comparison of the means of a bivariate normal distribution with missing values in at most one variable. Morrison derives the test statistic from the maximum-likelihood estimate and specifies its distribution as a t-distribution whose degrees of freedom depend only on the number of completely observed response pairs. These tests are equivalent to the tests in Section 10.3.1 if no data are missing.
Example 11.2. In Example 11.1, patient 2 in Group 2 was identified as an outlier. We now want to check to what extent the estimates of the effects vary when the observation of this patient in the second period is excluded from the analysis. We reorganize the data so that patient 2 in Group 2 comes last.

Group 1        Group 2
 A    B         B    A
20   30        30   20
40   50        20   10
30   40        30   10
20   40        40    —

Summarizing in matrix notation (cf. (11.62)), we have

Y = \begin{pmatrix} 20 & 30 \\ 40 & 50 \\ 30 & 40 \\ 20 & 40 \\ 30 & 20 \\ 20 & 10 \\ 30 & 10 \end{pmatrix}, \qquad y_1^* = (40) .  (11.89)
The unbiased estimates are calculated with n_1 = 4, n_2 = 4, m_1 = 4, and m_2 = 3 by inserting (11.85) and (11.86) into (11.76). We calculate

\bar y_{11\cdot}^{(n_1)} = \frac{1}{4}(20 + 40 + 30 + 20) = 27.50 ,
\bar y_{11\cdot}^{(m_1)} = \frac{1}{4}(20 + 40 + 30 + 20) = 27.50 ,
\bar y_{12\cdot}^{(m_1)} = \frac{1}{4}(30 + 50 + 40 + 40) = 40.00 ,
\bar y_{21\cdot}^{(n_2)} = \frac{1}{4}(30 + 20 + 30 + 40) = 30.00 ,
\bar y_{21\cdot}^{(m_2)} = \frac{1}{3}(30 + 20 + 30) = 26.67 ,
\bar y_{22\cdot}^{(m_2)} = \frac{1}{3}(20 + 10 + 10) = 13.33 ,
and

\tilde s_{11} = \frac{1}{7-2}\left[ (20-27.50)^2 + \cdots + (20-27.50)^2 + (30-26.67)^2 + \cdots + (30-26.67)^2 \right] = 68.33 ,

\tilde s_{22} = \frac{1}{7-2}\left[ (30-40.00)^2 + \cdots + (40-40.00)^2 + (20-13.33)^2 + (10-13.33)^2 + (10-13.33)^2 \right] = 53.33 ,

\tilde s_{12} = \frac{1}{7-2}\left[ (20-27.50)(30-40) + \cdots + (20-27.50)(40-40) + (30-26.67)(20-13.33) + \cdots + (30-26.67)(10-13.33) \right] = 46.67 ,

\tilde s_{21} = \tilde s_{12} .

With

\hat\beta = \frac{\tilde s_{12}}{\tilde s_{11}} = \frac{46.67}{68.33} = 0.68
we find

\hat\mu_{11} = \bar y_{11\cdot}^{(n_1)} = 27.50 ,
\hat\mu_{21} = \bar y_{21\cdot}^{(n_2)} = 30.00 ,
\hat\mu_{12} = 40.00 + 0.68 \cdot (27.50 - 27.50) = 40.00 ,
\hat\mu_{22} = 13.33 + 0.68 \cdot (30.00 - 26.67) = 15.61 ,

and with

\tilde\sigma_{11} = \frac{1}{8-2}\left[ (20-27.50)^2 + \cdots + (20-27.50)^2 + (30-30)^2 + \cdots + (40-30)^2 \right] = 79.17 ,

\tilde\sigma_{22} = 53.33 + 0.68^2 \cdot (79.17 - 68.33) = 58.39 ,
\tilde\sigma_{12} = 0.68 \cdot 79.17 = 54.07 ,
\tilde\sigma_{21} = \tilde\sigma_{12} ,

we get

\hat\rho = 0.68 \cdot \sqrt{79.17/58.39} = 0.80 \quad [cf. (11.80)],

\operatorname{Var}(\hat\beta) = \frac{58.39 \cdot (1 - 0.80^2)}{79.17 \cdot (7-4)} = 0.09 \quad [cf. (11.79)].
We now determine the two covariance matrices (11.78):

\hat\Gamma_1 = \begin{pmatrix} 79.17/4 & 54.07/4 \\ 54.07/4 & \left[ 58.39 + \left(1 - \frac{4}{4}\right) \cdot 79.17 \cdot (0.09 - 0.68^2) \right]/4 \end{pmatrix} = \begin{pmatrix} 19.79 & 13.52 \\ 13.52 & 14.60 \end{pmatrix} ,

\hat\Gamma_2 = \begin{pmatrix} 19.79 & 13.52 \\ 13.52 & 16.98 \end{pmatrix} .

Finally, our test statistics are

interaction: Z_1 = 21.89/11.19 = 1.96 [5 degrees of freedom],
treatment:   Z_2 = -26.89/4.13 = -6.50 [5 degrees of freedom],
period:      Z_3 = 1.89/4.13 = 0.46 [5 degrees of freedom].

The following table shows a comparison with the results of the analysis of the complete data set:

            Complete                 Incomplete
            t      df   p-Value      t      df   p-Value
Carry-over  0.96    6   0.376        1.96    5   0.108
Treatment  -2.96    6   0.026       -6.50    5   0.001
Period      0.74    6   0.488        0.46    5   0.667
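The pooled (co)variances and cell-mean estimates of Example 11.2 can be checked with a short script. This is our own sketch (variable names are ours; the data are transcribed from (11.89)); it reproduces \tilde s_{11}, \tilde s_{12}, \hat\beta, and \hat\mu_{22}:

```python
import numpy as np

# Complete response pairs and the incomplete period-1 value in group 2
y11 = np.array([20., 40., 30., 20.])   # group 1, period 1 (n1 = m1 = 4)
y12 = np.array([30., 50., 40., 40.])   # group 1, period 2
y21 = np.array([30., 20., 30., 40.])   # group 2, period 1 (n2 = 4, last pair incomplete)
y22 = np.array([20., 10., 10.])        # group 2, period 2 (m2 = 3)
m1, m2 = 4, 3
m = m1 + m2

def centered(x):
    return x - x.mean()

# Unbiased pooled (co)variances over complete pairs, cf. (11.77) and (11.86)
d11, d12 = centered(y11[:m1]), centered(y12)
d21, d22 = centered(y21[:m2]), centered(y22)
s11 = (d11 @ d11 + d21 @ d21) / (m - 2)
s12 = (d11 @ d12 + d21 @ d22) / (m - 2)
beta = s12 / s11

# mu_22 estimate, cf. (11.76): adjust the period-2 mean by the regression
mu22 = y22.mean() + beta * (y21.mean() - y21[:m2].mean())
print(round(s11, 2), round(s12, 2), round(beta, 2), round(mu22, 2))
# -> 68.33 46.67 0.68 15.61
```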
Figure 11.3. Difference–response–total plot of the incomplete data set.
An interesting result is that, by excluding the second observation of patient 2, the treatment effect achieves an even higher level of significance of p = 0.001 (compared to p = 0.026 before). However, the carry-over effect, with p = 0.108, is now very close to the limit of significance of p = 0.100 proposed by Grizzle. This is easily seen in the difference-response-total plot (Figure 11.3), which shows a clear separation of the two groups, in the horizontal as well as the vertical direction (cf. Figure 8.5).
11.5 Missing Categorical Data
The procedures introduced so far are all based on the linear regression model (11.1) with one continuous endogenous variable Y. In many applications, however, this assumption does not hold. Often Y is defined as a binary response variable and hence has a binomial distribution. Because of this, the statistical analysis of incompletely observed categorical data demands different procedures than those previously described. For a clear and understandable presentation of the different procedures, a three-dimensional contingency table is chosen in which only one of the three categorical variables is assumed to be observed incompletely.
11.5.1 Introduction
Let Y be a binary outcome variable and let X_1 and X_2 be two covariates with J and K categories, respectively. The contingency table is thus of dimension 2 \times J \times K. We assume that only X_2 is observed incompletely. The response status of the covariate X_2 is indicated by an additional variable

R_2 = \begin{cases} 1 & \text{if } X_2 \text{ is not missing,} \\ 0 & \text{if } X_2 \text{ is missing.} \end{cases}  (11.90)

This leads to a new random variable

Z_2 = \begin{cases} X_2 & \text{if } R_2 = 1, \\ K+1 & \text{if } R_2 = 0. \end{cases}  (11.91)

Assume that Y is related to X_1 and X_2 by the logistic model, a generalized linear model with logit link. This model assesses the effects of the covariates X_1 and X_2 on the outcome variable Y.

Let \mu_{i|jk} = P(Y = i \mid X_1 = j, X_2 = k) be the conditional distribution of the binary variable Y, given the values of the covariates X_1 and X_2. The logistic model without interaction is

\ln\left( \frac{\mu_{1|jk}}{1 - \mu_{1|jk}} \right) = \beta_0 + \beta_{1j} + \beta_{2k}  (11.92)

or

\mu_{1|jk} = \frac{\exp(\beta_0 + \beta_{1j} + \beta_{2k})}{1 + \exp(\beta_0 + \beta_{1j} + \beta_{2k})} .  (11.93)

The parameters \beta_{1j} and \beta_{2k} describe the effect of the jth category of X_1 and the kth category of X_2 on the outcome variable Y. The parameter vector \beta' = (\beta_0, \beta_{11}, \ldots, \beta_{1J}, \beta_{21}, \ldots, \beta_{2K}) is estimated by the maximum-likelihood approach.
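The inverse-logit relation (11.93) can be sketched directly; the function name `mu_1jk` and the effect values below are our own illustrative assumptions, not from the text:

```python
import numpy as np

def mu_1jk(beta0, beta1, beta2, j, k):
    # P(Y = 1 | X1 = j, X2 = k) under the no-interaction logit model
    # (11.93); beta1 and beta2 hold the category effects beta_{1j}, beta_{2k}.
    eta = beta0 + beta1[j] + beta2[k]
    return np.exp(eta) / (1.0 + np.exp(eta))

beta1 = np.array([0.5, -0.5])        # hypothetical effects of X1 (J = 2)
beta2 = np.array([1.0, 0.0, -1.0])   # hypothetical effects of X2 (K = 3)
print(mu_1jk(0.0, beta1, beta2, 0, 2))  # eta = -0.5, about 0.3775
```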
11.5.2 Maximum Likelihood Estimation in the Complete Data Case

Let \pi_{ijk}^* = P(Y = i, X_1 = j, X_2 = k) be the joint distribution of the three variables in the complete data case and define

\gamma_{k|j} = P(X_2 = k \mid X_1 = j), \qquad \tau_j = P(X_1 = j) .  (11.94)

This parametrization allows a factorization of the joint distribution of Y, X_1, and X_2:

\pi_{ijk}^* = \mu_{i|jk}\, \gamma_{k|j}\, \tau_j = (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j}\, \tau_j .  (11.95)

The contribution of a single observation with the values Y = i, X_1 = j, and X_2 = k to the loglikelihood is

\ln\left( (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i} \right) + \ln \gamma_{k|j} + \ln \tau_j .  (11.96)

Hence, the loglikelihood is additive in the parameters and can be maximized independently for \beta, \gamma, and \tau. The maximum-likelihood estimate of \beta results from maximizing the loglikelihood of the entire sample

l_n^*(\beta) = \sum_{i=0}^{1} \sum_{j=1}^{J} \sum_{k=1}^{K} n_{ijk}^*\, l^*(\beta; i, j, k)  (11.97)

with

l^*(\beta; i, j, k) = \ln\left( (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i} \right),

where n_{ijk}^* is the number of elements with Y = i, X_1 = j, and X_2 = k. However, these equations are nonlinear in \beta and, hence, the maximization task requires an iterative method. A standard procedure for nonlinear optimization is the Newton–Raphson method or one of its variants, such as the Fisher-scoring method.
11.5.3 Ad–Hoc Methods
Complete Case Analysis
Similarly to the situation with continuous variables described previously, the complete case analysis is a standard approach for incomplete categorical data as well: the incompletely observed cases are eliminated from the data set. The reduced sample can then be analyzed by the maximum-likelihood approach for completely observed contingency tables (cf. Section 11.5.2).
Filling the Contingency Table
Unlike imputation methods that fill up the gaps in the data set (cf. Section 11.1), the filling method of Vach and Blettner (1991) fills up the cells of the contingency table. This is done by distributing the elements with a missing value of X_2, i.e., with the value Z_2 = K+1, over the other cells, depending on the (known) values of Y and X_1.

Let n_{ijk} be the number of elements with the values Y = i, X_1 = j, and Z_2 = k, i.e., the cell counts of the [2 \times J \times (K+1)] contingency table. The filled-up contingency table is then

n_{ijk}^{FILL} = n_{ijk} + n_{ijK+1}\,\frac{n_{ijk}}{\sum_{k=1}^{K} n_{ijk}} .  (11.98)

To this new (2 \times J \times K) table, the maximum-likelihood procedure for completely observed contingency tables is applied, according to Section 11.5.2.
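The proportional redistribution in (11.98) can be written in a few lines; this is our own sketch, assuming each (i, j) stratum has at least one observed cell:

```python
import numpy as np

def fill_table(n):
    # n has shape (2, J, K+1); the last slice k = K holds the counts with
    # X2 missing (Z2 = K+1).  These counts are distributed over the K
    # observed cells proportionally to n_{ijk}, cf. (11.98).
    obs = n[..., :-1].astype(float)
    extra = n[..., -1].astype(float)
    return obs + extra[..., None] * obs / obs.sum(axis=-1, keepdims=True)

n = np.array([[[2, 2, 2]],     # Y=0, X1=1: cells k=1,2 plus 2 missing
              [[1, 3, 4]]])    # Y=1, X1=1: cells k=1,2 plus 4 missing
print(fill_table(n))
```

Note that the filling preserves the total count: the filled (2 × J × K) table sums to the same n as the original (2 × J × (K+1)) table.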
11.5.4 Model–Based Methods
Maximum–Likelihood Estimation in the Incomplete Data Case
Let πijk = P (Y = i,X1 =j, Z2 =k) be the joint distribution of the variablesY , X1, and Z2, and define
qijk = P (R2 =1 | Y = i,X1 =j, X2 =k) . (11.99)
The parametrization [(11.94) and (11.99)] enables a decomposition of thejoint distribution (cf. Vach and Schumacher, 1993, p. 355). However, wehave to distinguish between the case that the value of X2 is known
πijk = P (Y = i, X1 =j, Z2 =k)
= P (Y = i,X1 =j, X2 =k, R2 =1)
= P (R2 =1 | Y = i,X1 =j, X2 =k) P (Y = i | X1 =j,X2 =k)×P (X2 =k | X1 =j)P (X1 =j)
= qijk
(µ1|jk
)i(1− µ1|jk
)1−iγk|j τj . (11.100)
and the case that the value of X_2 is missing, i.e., k = K+1:

\pi_{ijK+1} = P(Y = i, X_1 = j, Z_2 = K+1)
 = P(Y = i, X_1 = j, R_2 = 0)
 = P(R_2 = 0 \mid Y = i, X_1 = j)\, P(Y = i \mid X_1 = j)\, P(X_1 = j)
 = \left( \sum_{k=1}^{K} P(R_2 = 0 \mid Y = i, X_1 = j, X_2 = k)\, P(Y = i \mid X_1 = j, X_2 = k)\, P(X_2 = k \mid X_1 = j) \right) P(X_1 = j)
 = \left( \sum_{k=1}^{K} (1 - q_{ijk})\, (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j} \right) \tau_j .  (11.101)
Note that this distribution, unlike that of the complete data case, depends on the parameter q. Furthermore, the loglikelihood is not additive in the parameters \beta, \gamma, \tau, and q and, hence, cannot be maximized separately for the parameters.

If the missing values are missing at random (MAR), then the missing probability is independent of the true value k of X_2, i.e.,

P(R_2 = 1 \mid Y = i, X_1 = j, X_2 = k) \equiv P(R_2 = 1 \mid Y = i, X_1 = j)  (11.102)

and thus q_{ijk} \equiv q_{ij}. For the joint distribution of Y, X_1, and Z_2 (cf. (11.100) and (11.101)), this leads to

\pi_{ijk} = q_{ij}\, (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j}\, \tau_j  (11.103)

for k = 1, \ldots, K and to

\pi_{ijK+1} = (1 - q_{ij}) \left( \sum_{k=1}^{K} (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j} \right) \tau_j  (11.104)

for k = K+1.

The contribution of a single element to the loglikelihood under the MAR assumption is now

\ln q_{ij} + \ln\left( (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i} \right) + \ln \gamma_{k|j} + \ln \tau_j  (11.105)

for k = 1, \ldots, K and

\ln(1 - q_{ij}) + \ln\left( \sum_{k=1}^{K} (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j} \right) + \ln \tau_j  (11.106)
for k = K+1.

The loglikelihood thus decomposes into three summands; hence, maximizing the loglikelihood for \beta can now be done independently of q. If the value of X_2 is missing, it is impossible to split the second summand further into parts depending only on \beta and only on \gamma. Hence, the maximum-likelihood estimation of \beta requires joint maximization of the following loglikelihood for (\beta, \gamma), where \gamma is regarded as a nuisance parameter:

l_n^{ML}(\beta, \gamma) = \sum_{i=0}^{1} \sum_{j=1}^{J} \sum_{k=1}^{K+1} n_{ijk}\, l^{ML}(\beta, \gamma; i, j, k)  (11.107)

with

l^{ML}(\beta, \gamma; i, j, k) = \begin{cases} \ln\left( (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i} \right) + \ln \gamma_{k|j} & \text{for } k = 1, \ldots, K, \\ \ln\left( \sum_{k=1}^{K} (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i}\, \gamma_{k|j} \right) & \text{for } k = K+1, \end{cases}

where n_{ijk} is the number of elements with Y = i, X_1 = j, and Z_2 = k.

Analogously to the complete data case, the computation of the estimates of \beta and \gamma requires an iterative procedure such as the Fisher-scoring method. Let \theta = (\beta, \gamma). The iteration step of the Fisher-scoring method is

\theta^{(t+1)} = \theta^{(t)} + \left( I_{\theta\theta}^{ML}(\theta^{(t)}, \tau_n, q_n) \right)^{-1} S_n^{ML}(\theta^{(t)}) ,  (11.108)

with the score function

S_n^{ML}(\theta) = \frac{1}{n}\,\frac{\partial}{\partial\theta}\, l_n^{ML}(\theta)  (11.109)

and the information matrix

I_\theta^{ML}(\theta, \tau, q) = -E_{\theta,\tau,q}\left( \frac{\partial^2}{\partial\theta\,\partial\theta'}\, l^{ML}(\beta; Y, X_1, Z_2) \right) .  (11.110)
Pseudo–Maximum–Likelihood Estimation (PML)
In order to simplify the computation of the maximum-likelihood estimate of the regression parameter \beta, the nuisance parameter \gamma may be estimated from the observed values of X_1 and Z_2 and inserted into the loglikelihood, instead of being estimated jointly and iteratively along with \beta. A possible estimate (cf. Pepe and Fleming, 1991) is

\hat\gamma_{k|j} = \frac{n_{+jk}}{\sum_{k=1}^{K} n_{+jk}} .  (11.111)

This estimate is consistent for \gamma only under very strict assumptions on the missing mechanism. Vach and Schumacher (1993, p. 356) suggest applying this estimate to the filled-up contingency table of the filling method (cf. Section 11.5.3):

\hat\gamma_{k|j} = \frac{n_{+jk}^{FILL}}{\sum_{k=1}^{K} n_{+jk}^{FILL}} = \frac{ n_{0jk}\,\dfrac{\sum_{k=1}^{K+1} n_{0jk}}{\sum_{k=1}^{K} n_{0jk}} + n_{1jk}\,\dfrac{\sum_{k=1}^{K+1} n_{1jk}}{\sum_{k=1}^{K} n_{1jk}} }{ \sum_{k=1}^{K+1} n_{+jk} } .  (11.112)
This estimate is consistent for \gamma if the MAR assumption holds. PML estimation of \beta is now achieved by iterative maximization of the following loglikelihood:

l_n^{PML}(\beta) = \sum_{i=0}^{1} \sum_{j=1}^{J} \sum_{k=1}^{K+1} n_{ijk}\, l^{PML}(\beta, \hat\gamma; i, j, k)  (11.113)

with

l^{PML}(\beta, \hat\gamma; i, j, k) = \begin{cases} \ln\left( (\mu_{1|jk})^i (1 - \mu_{1|jk})^{1-i} \right) & \text{for } k = 1, \ldots, K, \\ \ln\left( \left( \sum_{k=1}^{K} \mu_{1|jk}\hat\gamma_{k|j} \right)^i \left( 1 - \sum_{k=1}^{K} \mu_{1|jk}\hat\gamma_{k|j} \right)^{1-i} \right) & \text{for } k = K+1. \end{cases}
11.6 Exercises and Questions
11.6.1 What is a selectivity bias, and what is meant by drop-out in long-term studies?
11.6.2 Name the essential methods for imputation and describe them.
11.6.3 Explain the missing data mechanisms MAR, OAR, and MCAR by means of a bivariate sample.

11.6.4 Describe the OLS methods of Yates and Bartlett. What is the difference?

11.6.5 Assume that in a regression model values in the matrix X are missing and are to be replaced. Which methods may be used? Explain the effect on the unbiasedness of the final estimator \hat\beta.
Appendix A: Matrix Algebra
There are numerous books on matrix algebra which contain results useful for the discussion of linear models. See, for instance, the books by Graybill (1961), Mardia et al. (1979), Searle (1982), Rao (1973), Rao and Mitra (1971), and Rao and Rao (1998), to mention a few. We collect in this Appendix some of the important results for ready reference. Proofs are generally not given. References to original sources are given wherever necessary.
A.1 Introduction
Definition A.1. An (m \times n)-matrix A is a rectangular array of elements in m rows and n columns.

In the context of the material treated in this book and in this Appendix, the elements of a matrix are taken to be real numbers.

We refer to an (m \times n)-matrix as a matrix of type (or order) m \times n and indicate this by writing A : m \times n or A_{m,n}.

Let a_{ij} be the element in the ith row and the jth column of A. Then A may be represented as

A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} = (a_{ij}) .
518 Appendix A. Matrix Algebra
A matrix with n = m rows and columns is called a square matrix. A square matrix having zeros as elements below (above) the diagonal is called an upper (lower) triangular matrix.

Let A and B be two matrices with the same dimensions, i.e., with the same number of rows m and columns n. Then the sum (and difference) of the matrices, A \pm B, is defined element by element, i.e.,

A \pm B = (a_{ij} \pm b_{ij}) .

Multiplication of a matrix by a scalar is also an element-by-element operation: \nu A = (\nu a_{ij}) for all i = 1, \ldots, m, j = 1, \ldots, n.
Definition A.2. The transpose A' : n \times m of a matrix A : m \times n is obtained by interchanging the rows and columns of A. Thus

A' = (a_{ji}) .

Then we have the following rules:

(A')' = A, \quad (A + B)' = A' + B', \quad (AB)' = B'A' .
Definition A.3. A square matrix A is called symmetric if A' = A.

Example A.1. Let x be a random vector with expectation vector E(x) = \mu. Then the covariance matrix of x is defined by

\operatorname{cov}(x) = E(x - \mu)(x - \mu)' .

Any covariance matrix is symmetric.
Definition A.4. An (m \times 1)-matrix a is said to be an m-vector and is written as a column

a = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} .

Definition A.5. A (1 \times n)-matrix a' is said to be a row vector,

a' = (a_1, \ldots, a_n) .

Hence, a matrix A : m \times n may be written, alternatively, as

A = (a_{(1)}, \ldots, a_{(n)}) = \begin{pmatrix} a_1' \\ \vdots \\ a_m' \end{pmatrix}

with

a_{(j)} = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}, \qquad a_i = \begin{pmatrix} a_{i1} \\ \vdots \\ a_{in} \end{pmatrix} .
Definition A.6. The n-vector (1, \ldots, 1)' is denoted by 1_n or 1.

Definition A.7. The matrix A : m \times m with a_{ij} = 1 (for all i, j) is given the symbol J_m, i.e.,

J_m = \begin{pmatrix} 1 & \ldots & 1 \\ \vdots & & \vdots \\ 1 & \ldots & 1 \end{pmatrix} = 1_m 1_m' .

Definition A.8. The n-vector

e_i = (0, \ldots, 0, 1, 0, \ldots, 0)' ,

whose ith component is one and whose remaining components are zero, is called the ith unit vector.

Definition A.9. An (n \times n)-matrix with elements 1 on the main diagonal and zeros off the diagonal is called the identity matrix I_n.

Definition A.10. A square matrix A : n \times n with zeros off the diagonal is called a diagonal matrix. We write

A = \operatorname{diag}(a_{11}, \ldots, a_{nn}) = \operatorname{diag}(a_{ii}) = \begin{pmatrix} a_{11} & & 0 \\ & \ddots & \\ 0 & & a_{nn} \end{pmatrix} .
Definition A.11. A matrix A is said to be partitioned if its elements are arranged in submatrices.

Examples are

A_{m,n} = (A_1 : m \times r, \; A_2 : m \times s) \quad \text{with } r + s = n

or

A_{m,n} = \begin{pmatrix} A_{11} : r \times (n-s) & A_{12} : r \times s \\ A_{21} : (m-r) \times (n-s) & A_{22} : (m-r) \times s \end{pmatrix} .

For partitioned matrices we get the transpose as

A' = \begin{pmatrix} A_1' \\ A_2' \end{pmatrix}, \qquad A' = \begin{pmatrix} A_{11}' & A_{21}' \\ A_{12}' & A_{22}' \end{pmatrix},

respectively.
A.2 Trace of a Matrix

Definition A.12. Let a_{11}, \ldots, a_{nn} be the elements on the main diagonal of a square matrix A : n \times n. Then the trace of A is defined as the sum

\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii} .

Theorem A.1. Let A and B be square (n \times n)-matrices and let c be a scalar factor. Then we have the following rules:

(i) \operatorname{tr}(A \pm B) = \operatorname{tr}(A) \pm \operatorname{tr}(B).

(ii) \operatorname{tr}(A') = \operatorname{tr}(A).

(iii) \operatorname{tr}(cA) = c\,\operatorname{tr}(A).

(iv) \operatorname{tr}(AB) = \operatorname{tr}(BA).

(v) \operatorname{tr}(AA') = \operatorname{tr}(A'A) = \sum_{i,j} a_{ij}^2.

(vi) If a = (a_1, \ldots, a_n)' is an n-vector, then its squared norm may be written as

\|a\|^2 = a'a = \sum_{i=1}^{n} a_i^2 = \operatorname{tr}(aa') .

Note: Rules (iv) and (v) also hold for the cases A : n \times m and B : m \times n.
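Rules (iv) and (v), including the non-square case mentioned in the note, are easy to check numerically (our own sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # A : n x m
B = rng.standard_normal((4, 3))   # B : m x n

# (iv): tr(AB) = tr(BA), valid also for non-square A and B
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
# (v): tr(AA') = tr(A'A) = sum of squared elements
print(np.isclose(np.trace(A @ A.T), (A**2).sum()))    # True
```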
A.3 Determinant of a Matrix

Definition A.13. Let n > 1 be a positive integer. The determinant of a square matrix A : n \times n is defined by

|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} |M_{ij}| \quad \text{(for any fixed } j\text{)},

with |M_{ij}| being the minor of the element a_{ij}. |M_{ij}| is the determinant of the remaining [(n-1) \times (n-1)]-matrix obtained when the ith row and the jth column of A are deleted. A_{ij} = (-1)^{i+j}|M_{ij}| is called the cofactor of a_{ij}.

Example A.2. For n = 2:

|A| = a_{11}a_{22} - a_{12}a_{21} .

For n = 3, with the first column (j = 1) fixed:

A_{11} = (-1)^2 \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad A_{21} = (-1)^3 \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}, \quad A_{31} = (-1)^4 \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix},

\Rightarrow |A| = a_{11}A_{11} + a_{21}A_{21} + a_{31}A_{31} .

Note: As an alternative, we may fix a row and expand the determinant of A according to

|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |M_{ij}| \quad \text{(for any fixed } i\text{)}.
Definition A.14. A square matrix A is said to be regular or nonsingular if|A| 6= 0. Otherwise A is said to be singular.
Theorem A.2. Let A and B be (n×n)–square matrices and let c be a scalar. Then we have:

(i) |A′| = |A|.

(ii) |cA| = c^n |A|.

(iii) |AB| = |A||B|.

(iv) |A²| = |A|².

(v) If A is diagonal or triangular, then

    |A| = ∏_{i=1}^n aii .

(vi) For A : n×n, B : m×m, C : n×m, and O the (m×n) zero matrix, we have

    | A  C ; O  B | = |A||B|

and, analogously,

    | A′  O′ ; C′  B′ | = |A||B|.

(vii) If A is partitioned with A11 : p×p and A22 : q×q square and nonsingular, then

    | A11  A12 ; A21  A22 | = |A11||A22 − A21 A11^{−1} A12|
                            = |A22||A11 − A12 A22^{−1} A21|.

Proof. Define the matrices

    Z1 = ( I  −A12 A22^{−1} ; 0  I )   and   Z2 = ( I  0 ; −A22^{−1} A21  I ),

where |Z1| = |Z2| = 1 by (vi). Then we have

    Z1 A Z2 = ( A11 − A12 A22^{−1} A21  0 ; 0  A22 )

and [using (iii) and (vi)]

    |Z1 A Z2| = |A| = |A22||A11 − A12 A22^{−1} A21| .

(viii) For nonsingular A and an (n×1)–vector x,

    | A  x ; x′  c | = |A|(c − x′A^{−1}x).

Proof. Use (vii) with A instead of A11 and c instead of A22.

(ix) Let B : p×n and C : n×p be any matrices and let A : p×p be a nonsingular matrix. Then

    |A + BC| = |A||Ip + A^{−1}BC| = |A||In + CA^{−1}B|.

Proof. The first relationship follows immediately from (iii) and

    A + BC = A(Ip + A^{−1}BC).

The second relationship is a consequence of (vii) applied to the matrix

    | Ip  −A^{−1}B ; C  In | ,

whose determinant equals |Ip||In + CA^{−1}B| = |In||Ip + A^{−1}BC|.

(x) |A + aa′| = |A|(1 + a′A^{−1}a), if A is nonsingular.

(xi) |Ip + BC| = |In + CB|, if B : (p, n) and C : (n, p).
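Rule (xi), sometimes called the Weinstein–Aronszajn determinant identity, is easy to verify numerically. The following NumPy check (an illustrative addition, not part of the original text) uses random rectangular factors:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 5))   # p = 3, n = 5
C = rng.standard_normal((5, 3))

# |I_p + BC| = |I_n + CB| even though the two identity matrices differ in size
lhs = np.linalg.det(np.eye(3) + B @ C)
rhs = np.linalg.det(np.eye(5) + C @ B)
assert np.isclose(lhs, rhs)
```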
A.4 Inverse of a Matrix
Definition A.15. The inverse of a square matrix A : n×n is written as A^{−1}. The inverse exists if and only if A is nonsingular. The inverse A^{−1} is unique and characterized by

    AA^{−1} = A^{−1}A = I.

Theorem A.3. If all the inverses exist we have:

(i) (cA)^{−1} = c^{−1}A^{−1}.

(ii) (AB)^{−1} = B^{−1}A^{−1}.

(iii) If A : p×p, B : p×n, C : n×n, and D : n×p, then

    (A + BCD)^{−1} = A^{−1} − A^{−1}B(C^{−1} + DA^{−1}B)^{−1}DA^{−1}.

(iv) If 1 + b′A^{−1}a ≠ 0, then we get, from (iii),

    (A + ab′)^{−1} = A^{−1} − (A^{−1}ab′A^{−1}) / (1 + b′A^{−1}a).

(v) |A^{−1}| = |A|^{−1}.
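Rule (iv) is the Sherman–Morrison formula for a rank-one update. A quick NumPy verification (an added illustration; the shift by 4I merely keeps the random matrix comfortably nonsingular):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # nonsingular with high probability
a = rng.standard_normal(4)
b = rng.standard_normal(4)

Ainv = np.linalg.inv(A)
denom = 1 + b @ Ainv @ a                  # must be nonzero, as the theorem requires
sm = Ainv - np.outer(Ainv @ a, b @ Ainv) / denom
assert np.allclose(sm, np.linalg.inv(A + np.outer(a, b)))
```

The formula is useful in practice because it updates an already-computed inverse in O(n²) instead of recomputing it in O(n³).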
Theorem A.4 (Inverse of a Partitioned Matrix). For partitioned regular A,

    A = ( E  F ; G  H ),

where E : (n1×n1), F : (n1×n2), G : (n2×n1), and H : (n2×n2) (with n1 + n2 = n) are such that E and D = H − GE^{−1}F are regular, the partitioned inverse is given by

    A^{−1} = ( E^{−1}(I + FD^{−1}GE^{−1})  −E^{−1}FD^{−1} ; −D^{−1}GE^{−1}  D^{−1} )
           = ( A11  A12 ; A21  A22 ).

Proof. Check that the product of A and A^{−1} reduces to the identity matrix, i.e.,

    AA^{−1} = A^{−1}A = I.
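The block formula of Theorem A.4, built around the Schur complement D = H − GE^{−1}F, can be checked numerically. This NumPy sketch (added here for illustration) partitions a random 5×5 matrix into 3+2 blocks:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5)) + 5 * np.eye(5)   # keeps A and E nonsingular
E, F = A[:3, :3], A[:3, 3:]
G, H = A[3:, :3], A[3:, 3:]

Einv = np.linalg.inv(E)
D = H - G @ Einv @ F                    # Schur complement of E
Dinv = np.linalg.inv(D)
top = np.hstack([Einv @ (np.eye(3) + F @ Dinv @ G @ Einv), -Einv @ F @ Dinv])
bot = np.hstack([-Dinv @ G @ Einv, Dinv])
assert np.allclose(np.vstack([top, bot]), np.linalg.inv(A))
```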
A.5 Orthogonal Matrices
Definition A.16. A square matrix A : n×n is said to be orthogonal if AA′ = I = A′A. For orthogonal matrices we have:

(i) A′ = A^{−1}.

(ii) |A| = ±1.

(iii) Let δij = 1 for i = j and δij = 0 for i ≠ j denote the Kronecker symbol. Then the row vectors ai and the column vectors a(i) of A satisfy the conditions

    a′i aj = δij ,   a′(i) a(j) = δij .
(iv) AB is orthogonal, if A and B are orthogonal.
Theorem A.5. For A : n × n and B : n × n symmetric, there exists anorthogonal matrix H such that H ′AH and H ′BH become diagonal if andonly if A and B commute, i.e.,
AB = BA.
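The defining properties of orthogonal matrices are easy to check numerically; the QR factorization of a random square matrix yields an orthogonal factor Q. This NumPy check is an added illustration, not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(4)
# The Q factor of a QR decomposition is orthogonal
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

assert np.allclose(Q @ Q.T, np.eye(4))            # AA' = I
assert np.allclose(Q.T, np.linalg.inv(Q))         # (i) A' = A^{-1}
assert np.isclose(abs(np.linalg.det(Q)), 1.0)     # (ii) |A| = ±1
```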
A.6 Rank of a Matrix
Definition A.17. The rank of A : m×n is the maximum number of linearly independent rows (or columns) of A. We write rank(A) = p.

Theorem A.6 (Rules for Ranks).

(i) 0 ≤ rank(A) ≤ min(m, n).

(ii) rank(A) = rank(A′).

(iii) rank(A + B) ≤ rank(A) + rank(B).

(iv) rank(AB) ≤ min{rank(A), rank(B)}.

(v) rank(AA′) = rank(A′A) = rank(A) = rank(A′).

(vi) For B : m×m and C : n×n regular, we have rank(BAC) = rank(A).

(vii) For A : n×n, rank(A) = n if and only if A is regular.

(viii) If A = diag(ai), then rank(A) equals the number of the ai ≠ 0.
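A few of these rank rules can be spot-checked with NumPy's numerical rank (an added illustration; `matrix_rank` uses an SVD-based tolerance):

```python
import numpy as np

rng = np.random.default_rng(5)
# Product of 5x3 and 3x6 factors has rank 3 almost surely
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 6))
r = np.linalg.matrix_rank

# Rules (ii) and (v)
assert r(A) == r(A.T) == r(A @ A.T) == r(A.T @ A) == 3
# Rule (iv)
B = rng.standard_normal((6, 4))
assert r(A @ B) <= min(r(A), r(B))
```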
A.7 Range and Null Space
Definition A.18.

(i) The range R(A) of a matrix A : m×n is the vector space spanned by the column vectors of A, that is,

    R(A) = { z : z = Ax = ∑_{i=1}^n a(i) xi , x ∈ ℝⁿ } ⊂ ℝᵐ,

where a(1), . . . , a(n) are the column vectors of A.

(ii) The null space N(A) is the vector space defined by

    N(A) = { x : x ∈ ℝⁿ and Ax = 0 } ⊂ ℝⁿ.

Theorem A.7.

(i) rank(A) = dim R(A), where dim V denotes the number of basis vectors of a vector space V.

(ii) dim R(A) + dim N(A) = n.

(iii) N(A) = R(A′)^⊥, where V^⊥ denotes the orthogonal complement of a vector space V, defined by V^⊥ = { x : x′y = 0 for all y ∈ V }.

(iv) R(AA′) = R(A).

(v) R(AB) ⊆ R(A) for any A and B.

(vi) For A ≥ 0 and any B, R(BAB′) = R(BA).
A.8 Eigenvalues and Eigenvectors
Definition A.19. If A : p×p is a square matrix, then

    q(λ) = |A − λI|

is a pth–order polynomial in λ. The p roots λ1, . . . , λp of the characteristic equation q(λ) = |A − λI| = 0 are called eigenvalues or characteristic roots of A.

The eigenvalues may possibly be complex numbers. Since |A − λi I| = 0, the matrix A − λi I is singular. Hence, there exists a nonzero vector γi ≠ 0 satisfying (A − λi I)γi = 0, i.e.,

    Aγi = λi γi .

γi is called a (right) eigenvector of A for the eigenvalue λi. If λi is complex, then γi may have complex components. An eigenvector γ with real components is called standardized if γ′γ = 1.

Theorem A.8.

(i) If x and y are nonzero eigenvectors of A for λi, and α and β are any real numbers, then αx + βy is also an eigenvector for λi, i.e.,

    A(αx + βy) = λi(αx + βy).

Thus the eigenvectors for any λi span a vector space, which is called the eigenspace of A for λi.

(ii) The polynomial q(λ) = |A − λI| has the normal form, in terms of the roots,

    q(λ) = ∏_{i=1}^p (λi − λ).
Hence, q(0) = ∏_{i=1}^p λi and

    |A| = ∏_{i=1}^p λi .

(iii) Matching the coefficients of λ^{p−1} in q(λ) = ∏_{i=1}^p (λi − λ) and in |A − λI| gives

    tr(A) = ∑_{i=1}^p λi .
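The two identities |A| = ∏ λi and tr(A) = ∑ λi hold for any square matrix, with possibly complex eigenvalues. A NumPy spot-check (an added illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
lam = np.linalg.eigvals(A)        # may contain complex conjugate pairs

# Sum of eigenvalues equals the trace; product equals the determinant
assert np.isclose(lam.sum().real, np.trace(A))
assert np.isclose(lam.prod().real, np.linalg.det(A))
```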
(iv) Let C : p×p be a regular matrix. Then A and CAC^{−1} have the same eigenvalues λi. If γi is an eigenvector for λi, then Cγi is an eigenvector of CAC^{−1} for λi.

Proof. As C is nonsingular, it has an inverse C^{−1} with CC^{−1} = I. We have |C^{−1}| = |C|^{−1} and

    |A − λI| = |C||A − λC^{−1}C||C^{−1}| = |CAC^{−1} − λI|.

Thus, A and CAC^{−1} have the same eigenvalues. Let Aγi = λi γi and multiply from the left by C:

    CAC^{−1}Cγi = (CAC^{−1})(Cγi) = λi(Cγi).

(v) The matrix A + αI, with α a real number, has the eigenvalues λi + α, and the eigenvectors of A + αI coincide with those of A.
(vi) Let λ1 denote any eigenvalue of A : p × p with eigenspace H ofdimension r. If k denotes the multiplicity of λ1 in q(λ), then
1 ≤ r ≤ k.
Remark.
(a) For symmetric matrices A we have r = k.
(b) If A is not symmetric, then it is possible that r < k.
Example A.3. Let

    A = ( 0  1 ; 0  0 ),   A ≠ A′.

Then

    |A − λI| = | −λ  1 ; 0  −λ | = λ² = 0.

The multiplicity of the eigenvalue λ1,2 = 0 is k = 2. The eigenvectors for λ = 0 are γ = α(1, 0)′ and generate an eigenspace of dimension 1.
(c) If for any particular eigenvalue λ, dim(H) = r = 1, then thestandardized eigenvector for λ is unique (up to the sign).
Theorem A.9. Let A : n×p and B : p×n, with n ≥ p, be any two matrices. Then, from Theorem A.2(vii),

    | −λIn  −A ; B  Ip | = (−λ)^{n−p}|BA − λIp| = |AB − λIn|.

Hence the n eigenvalues of AB are equal to the p eigenvalues of BA, plus the eigenvalue 0 with multiplicity n − p. Suppose that x ≠ 0 is an eigenvector of AB for any particular λ ≠ 0. Then y = Bx is an eigenvector of BA for this λ, and we have y ≠ 0, too.

Corollary. A matrix A = aa′ with a ≠ 0 has the eigenvalues 0 and λ = a′a, and the eigenvector a.

Corollary. The nonzero eigenvalues of AA′ are equal to the nonzero eigenvalues of A′A.
Theorem A.10. If A is symmetric, then all the eigenvalues are real.
A.9 Decomposition of Matrices
Theorem A.11 (Spectral Decomposition Theorem). Any symmetric matrix A : (p×p) can be written as

    A = ΓΛΓ′ = ∑ λi γ(i) γ′(i) ,

where Λ = diag(λ1, . . . , λp) is the diagonal matrix of the eigenvalues of A and Γ = (γ(1), . . . , γ(p)) is the matrix of the standardized eigenvectors γ(i). Γ is orthogonal:

    ΓΓ′ = Γ′Γ = I.
Theorem A.12. Suppose A is symmetric and A = ΓΛΓ′. Then:
(i) A and Λ have the same eigenvalues with the same multiplicity.
(ii) From A = ΓΛΓ′ we get Λ = Γ′AΓ.

(iii) If A : p×p is a symmetric matrix, then for any integer n, Aⁿ = ΓΛⁿΓ′ with Λⁿ = diag(λiⁿ). If the eigenvalues of A are positive, then we can define the rational powers

    A^{r/s} = ΓΛ^{r/s}Γ′   with   Λ^{r/s} = diag(λi^{r/s})

for integers s > 0 and r. Important special cases are, for λi > 0,

    A^{−1} = ΓΛ^{−1}Γ′   with   Λ^{−1} = diag(λi^{−1});

the symmetric square root decomposition of A, for λi ≥ 0,

    A^{1/2} = ΓΛ^{1/2}Γ′   with   Λ^{1/2} = diag(λi^{1/2});

and, if λi > 0,

    A^{−1/2} = ΓΛ^{−1/2}Γ′   with   Λ^{−1/2} = diag(λi^{−1/2}).
(iv) For any symmetric matrix A, the rank of A equals the number of nonzero eigenvalues.

Proof. According to Theorem A.6(vi) we have rank(A) = rank(ΓΛΓ′) = rank(Λ). But rank(Λ) equals the number of nonzero λi's.
(v) A symmetric matrix A is uniquely determined by its distinct eigenvalues and the corresponding eigenspaces. If the distinct eigenvalues λi are ordered as λ1 ≥ · · · ≥ λp, then the matrix Γ is unique (up to sign).

(vi) A^{1/2} and A have the same eigenvectors. Hence, A^{1/2} is unique.
(vii) Let λ1 ≥ λ2 ≥ · · · ≥ λk > 0 be the nonzero eigenvalues and λ_{k+1} = · · · = λp = 0. Then we have

    A = (Γ1  Γ2) ( Λ1  0 ; 0  0 ) ( Γ′1 ; Γ′2 ) = Γ1 Λ1 Γ′1

with Λ1 = diag(λ1, . . . , λk) and Γ1 = (γ(1), . . . , γ(k)), whereas Γ′1Γ1 = Ik holds, so that Γ1 is column–orthogonal.

(viii) A symmetric matrix A is of rank 1 if and only if A = aa′ where a ≠ 0.

Proof. If rank(A) = rank(Λ) = 1, then Λ = diag(λ, 0, . . . , 0), so A = λγγ′ = aa′ with a = √λ γ. If A = aa′, then by Theorem A.6(iv) we have rank(A) = rank(a) = 1.

Theorem A.13 (Singular Value Decomposition of a Rectangular Matrix). Let A be a rectangular (n×p)–matrix of rank r. Then we have

    A_{n,p} = U_{n,r} L_{r,r} V′_{r,p}

with U′U = Ir, V′V = Ir, and L = diag(l1, . . . , lr), li > 0.

For a proof, see Rao (1973, p. 42).
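NumPy computes exactly this decomposition; with `full_matrices=False` the factors have the reduced shapes of Theorem A.13. The following check is an added illustration using a full-rank example (r = p = 4):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 4))   # rank 4 almost surely
U, l, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(U.T @ U, np.eye(4))       # U'U = I_r
assert np.allclose(Vt @ Vt.T, np.eye(4))     # V'V = I_r
assert np.all(l > 0)                         # singular values l_i > 0 for full rank
assert np.allclose(U @ np.diag(l) @ Vt, A)   # A = U L V'
```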
Theorem A.14. If A : p×q has rank(A) = r, then A contains at least one nonsingular (r×r)–submatrix X, such that A has the so–called normal presentation

    A_{p,q} = ( X_{r,r}  Y_{r,q−r} ; Z_{p−r,r}  W_{p−r,q−r} ).

All square submatrices of type ((r+s)×(r+s)) with s ≥ 1 are singular.

Proof. As rank(A) = rank(X) holds, the first r rows of (X, Y) are linearly independent. Then the (p−r) rows (Z, W) are linear combinations of (X, Y), i.e., there exists a matrix F such that

    (Z, W) = F(X, Y).

Analogously, there exists a matrix H satisfying

    ( Y ; W ) = ( X ; Z ) H.

Hence, we get W = FY = FXH and

    A = ( X  Y ; Z  W ) = ( X  XH ; FX  FXH )
      = ( I ; F ) X (I, H)
      = ( X ; FX ) (I, H) = ( I ; F ) (X, XH).

As X is nonsingular, the inverse X^{−1} exists. Then we obtain F = ZX^{−1}, H = X^{−1}Y, W = ZX^{−1}Y, and

    A = ( X  Y ; Z  W ) = ( I ; ZX^{−1} ) X (I, X^{−1}Y)
      = ( X ; Z ) (I, X^{−1}Y)
      = ( I ; ZX^{−1} ) (X, Y).

Theorem A.15 (Full Rank Factorization).

(i) If A : p×q has rank(A) = r, then A may be written as

    A_{p,q} = K_{p,r} L_{r,q}

with K of full column rank r and L of full row rank r.

Proof. Theorem A.14.

(ii) If A : p×q has rank(A) = p, then A may be written as

    A = M(I, H), where M : p×p is regular.

Proof. Theorem A.15(i).
A.10 Definite Matrices and Quadratic Forms
Definition A.20. Suppose A : n×n is symmetric and x : n×1 is any vector. Then the quadratic form in x is defined as the function

    Q(x) = x′Ax = ∑_{i,j} aij xi xj .

Clearly Q(0) = 0.
Definition A.21. The matrix A is called positive definite (p.d.) if Q(x) > 0 for all x ≠ 0. We write A > 0.

Note: If A > 0, then (−A) is called negative definite.

Definition A.22. The quadratic form x′Ax (and the matrix A, also) is called positive semidefinite (p.s.d.) if Q(x) ≥ 0 for all x and Q(x) = 0 for at least one x ≠ 0.
Definition A.23. The quadratic form x′Ax (and A) is called nonnegative definite (n.n.d.) if it is either p.d. or p.s.d., i.e., if x′Ax ≥ 0 for all x. If A is n.n.d., we write A ≥ 0.
Theorem A.16. Let the (n×n)–matrix A > 0. Then:

(i) A has all eigenvalues λi > 0.

(ii) x′Ax > 0 for any x ≠ 0.

(iii) A is nonsingular and |A| > 0.

(iv) A^{−1} > 0.

(v) tr(A) > 0.

(vi) Let P : n×m be of rank(P) = m ≤ n. Then P′AP > 0 and, in particular, P′P > 0 (choosing A = I).

(vii) Let P : n×m be of rank(P) < m ≤ n. Then P′AP ≥ 0 and P′P ≥ 0.
Theorem A.17. Let A : n×n and B : n×n be such that A > 0 and B ≥ 0. Then:

(i) C = A + B > 0.

(ii) A^{−1} − (A + B)^{−1} ≥ 0.

(iii) |A| ≤ |A + B|.
Theorem A.18. Let A ≥ 0. Then:

(i) λi ≥ 0.

(ii) tr(A) ≥ 0.

(iii) A = A^{1/2}A^{1/2} with A^{1/2} = ΓΛ^{1/2}Γ′.

(iv) For any matrix C : n×m we have C′AC ≥ 0.

(v) For any matrix C we have C′C ≥ 0 and CC′ ≥ 0.
Theorem A.19. For any matrix A ≥ 0 we have 0 ≤ λi ≤ 1 if and only if (I − A) ≥ 0.

Proof. Write the symmetric matrix A in its spectral form as A = ΓΛΓ′. Then we have

    (I − A) = Γ(I − Λ)Γ′ ≥ 0

if and only if

    Γ′Γ(I − Λ)Γ′Γ = I − Λ ≥ 0.

(a) If I − Λ ≥ 0, then for the eigenvalues of I − A we have 1 − λi ≥ 0, i.e., 0 ≤ λi ≤ 1.

(b) If 0 ≤ λi ≤ 1, then for any x ≠ 0

    x′(I − Λ)x = ∑ xi²(1 − λi) ≥ 0,

i.e., I − Λ ≥ 0.
Theorem A.20 (Theobald, 1974). Let D : n×n be symmetric. Then D ≥ 0 if and only if tr(CD) ≥ 0 for all C ≥ 0.

Proof. D is symmetric, so that

    D = ΓΛΓ′ = ∑ λi γi γ′i

and, hence,

    tr(CD) = tr( ∑ λi Cγi γ′i ) = ∑ λi γ′i Cγi .

(a) Let D ≥ 0 and, hence, λi ≥ 0 for all i. Then tr(CD) ≥ 0 if C ≥ 0.

(b) Let tr(CD) ≥ 0 for all C ≥ 0. Choose C = γi γ′i (for each fixed i = 1, . . . , n), so that

    0 ≤ tr(CD) = tr( γi γ′i ∑_j λj γj γ′j ) = λi   (i = 1, . . . , n),

and hence D = ΓΛΓ′ ≥ 0.
Theorem A.21. Let A : n×n be symmetric with eigenvalues λ1 ≥ · · · ≥ λn. Then

    sup_x (x′Ax / x′x) = λ1 ,   inf_x (x′Ax / x′x) = λn .

Proof. See Rao (1973, p. 62).
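The ratio x′Ax/x′x is the Rayleigh quotient, and Theorem A.21 says it is bounded by the extreme eigenvalues. A NumPy sketch (added for illustration; the sampling is only a sanity check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((5, 5))
A = M + M.T                          # a symmetric test matrix
lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order

vals = []
for _ in range(500):
    x = rng.standard_normal(5)
    vals.append(x @ A @ x / (x @ x))  # Rayleigh quotient for a random direction
vals = np.array(vals)

# every Rayleigh quotient lies between the smallest and largest eigenvalue
assert vals.min() >= lam[0] - 1e-9 and vals.max() <= lam[-1] + 1e-9
```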
Theorem A.22. Let A : n×r = (A1, A2), with A1 of order n×r1, A2 of order n×r2, and rank(A) = r = r1 + r2. Define the orthogonal projectors M1 = A1(A′1A1)^{−1}A′1 and M = A(A′A)^{−1}A′. Then

    M = M1 + (I − M1)A2(A′2(I − M1)A2)^{−1}A′2(I − M1).

Proof. M1 and M are symmetric idempotent matrices fulfilling M1A1 = A1 and MA = A. Using Theorem A.4 for the partitioned inversion of A′A, i.e.,

    (A′A)^{−1} = ( A′1A1  A′1A2 ; A′2A1  A′2A2 )^{−1},

and using the special form of the matrix D defined in Theorem A.4, i.e.,

    D = A′2(I − M1)A2 ,

a straightforward calculation concludes the proof.
Theorem A.23. Let A : n×m with rank(A) = m ≤ n, and let B : m×m be any symmetric matrix. Then

    ABA′ ≥ 0 if and only if B ≥ 0.

Proof. (i) B ≥ 0 ⇒ ABA′ ≥ 0 for all A.

(ii) Let rank(A) = m ≤ n and assume ABA′ ≥ 0, so that x′ABA′x ≥ 0 for all x ∈ ℝⁿ. We have to prove that y′By ≥ 0 for all y ∈ ℝᵐ. As rank(A) = m, the inverse (A′A)^{−1} exists. Setting z = A(A′A)^{−1}y, we have A′z = y and y′By = z′ABA′z ≥ 0, so that B ≥ 0.
Definition A.24. Let A : n×n and B : n×n be any matrices. Then the roots λi = λi^B(A) of the equation

    |A − λB| = 0

are called the eigenvalues of A in the metric of B. For B = I we obtain the usual eigenvalues defined in Definition A.19 (cf. Dhrymes, 1978).

Theorem A.24. Let B > 0 and A ≥ 0. Then λi^B(A) ≥ 0.

Proof. B > 0 is equivalent to B = B^{1/2}B^{1/2} with B^{1/2} nonsingular and unique (Theorem A.12(iii)). Then we may write

    0 = |A − λB| = |B^{1/2}|² |B^{−1/2}AB^{−1/2} − λI|

and λi^B(A) = λi^I(B^{−1/2}AB^{−1/2}) ≥ 0, as B^{−1/2}AB^{−1/2} ≥ 0.

Theorem A.25 (Simultaneous Diagonalization). Let B > 0 and A ≥ 0, and denote by Λ = diag(λi^B(A)) the diagonal matrix of the eigenvalues of A in the metric of B. Then there exists a nonsingular matrix W such that

    B = W′W   and   A = W′ΛW.

Proof. From the proof of Theorem A.24 we know that the roots λi^B(A) are the usual eigenvalues of the matrix B^{−1/2}AB^{−1/2}. Let X be the matrix of the corresponding eigenvectors:

    B^{−1/2}AB^{−1/2}X = XΛ,

i.e.,

    A = B^{1/2}XΛX′B^{1/2} = W′ΛW

with W′ = B^{1/2}X regular, and

    B = W′W = B^{1/2}XX′B^{1/2} = B^{1/2}B^{1/2}.

Theorem A.26. Let A > 0 (or A ≥ 0) and B > 0. Then

    B − A > 0 if and only if λi^B(A) < 1.

Proof. Using Theorem A.25 we may write

    B − A = W′(I − Λ)W,

i.e.,

    x′(B − A)x = x′W′(I − Λ)Wx = y′(I − Λ)y = ∑ (1 − λi^B(A)) yi²

with y = Wx, W regular and, hence, y ≠ 0 for x ≠ 0. Then x′(B − A)x > 0 holds if and only if λi^B(A) < 1.
Theorem A.27. Let A > 0 (or A ≥ 0) and B > 0. Then

    B − A ≥ 0 if and only if λi^B(A) ≤ 1.

Proof. Similar to Theorem A.26.

Theorem A.28. Let A > 0 and B > 0. Then

    B − A > 0 if and only if A^{−1} − B^{−1} > 0.

Proof. From Theorem A.25 we have

    B = W′W,   A = W′ΛW.

Since W is regular we have

    B^{−1} = W^{−1}W′^{−1},   A^{−1} = W^{−1}Λ^{−1}W′^{−1},

i.e.,

    A^{−1} − B^{−1} = W^{−1}(Λ^{−1} − I)W′^{−1} > 0,

as B − A > 0 implies λi^B(A) < 1 (Theorem A.26) and, hence, Λ^{−1} − I > 0.
Theorem A.29. Let B − A > 0. Then |B| > |A| and tr(B) > tr(A). If B − A ≥ 0, then |B| ≥ |A| and tr(B) ≥ tr(A).

Proof. From Theorem A.25 and Theorem A.2(iii), (v) we get

    |B| = |W′W| = |W|²,
    |A| = |W′ΛW| = |W|²|Λ| = |W|² ∏ λi^B(A),

i.e.,

    |A| = |B| ∏ λi^B(A).

For B − A > 0 we have λi^B(A) < 1, i.e., |A| < |B|. For B − A ≥ 0 we have λi^B(A) ≤ 1, i.e., |A| ≤ |B|. Further, B − A > 0 implies tr(B − A) > 0 and, hence, tr(B) > tr(A). Analogously, B − A ≥ 0 implies tr(B) ≥ tr(A).
Theorem A.30 (Cauchy–Schwarz Inequality). Let x and y be real vectors of the same dimension. Then

    (x′y)² ≤ (x′x)(y′y),

with equality if and only if x and y are linearly dependent.
Theorem A.31. Let x and y be n–vectors and A > 0. Then we have the following results:

(i) (x′Ay)² ≤ (x′Ax)(y′Ay).

(ii) (x′y)² ≤ (x′Ax)(y′A^{−1}y).

Proof. (i) A ≥ 0 is equivalent to A = BB with B = A^{1/2} (Theorem A.18(iii)). Let x̃ = Bx and ỹ = By. Then (i) is a consequence of Theorem A.30 applied to x̃ and ỹ.

(ii) A > 0 is equivalent to A = A^{1/2}A^{1/2} and A^{−1} = A^{−1/2}A^{−1/2}. Let x̃ = A^{1/2}x and ỹ = A^{−1/2}y; then x̃′ỹ = x′y, and (ii) is a consequence of Theorem A.30.

Theorem A.32. Let A > 0 and let T be any square matrix. Then:

(i) sup_{x≠0} (x′y)² / (x′Ax) = y′A^{−1}y.

(ii) sup_{x≠0} (y′Tx)² / (x′Ax) = y′TA^{−1}T′y.

Proof. Use Theorem A.31(ii).
Theorem A.33. Let I : n×n be the identity matrix and a an n–vector. Then

    I − aa′ ≥ 0 if and only if a′a ≤ 1.

Proof. The matrix aa′ is of rank 1 and aa′ ≥ 0. The spectral decomposition is aa′ = CΛC′ with Λ = diag(λ, 0, . . . , 0) and λ = a′a. Hence, I − aa′ = C(I − Λ)C′ ≥ 0 if and only if λ = a′a ≤ 1 (see Theorem A.19).

Theorem A.34. Assume MM′ − NN′ ≥ 0. Then there exists a matrix H such that N = MH.

Proof (Milliken and Akdeniz, 1977). Let M : n×r with rank(M) = s, and let x be any vector in R(I − MM^−), implying x′M = 0 and x′MM′x = 0. As NN′ and MM′ − NN′ are (by assumption) n.n.d., we may conclude that x′NN′x ≥ 0 and

    x′(MM′ − NN′)x = −x′NN′x ≥ 0,

so that x′NN′x = 0 and x′N = 0. Hence, R(N) ⊂ R(M) or, equivalently, N = MH for some matrix H : r×k.

Theorem A.35. Let A be an (n×n)–matrix with (−A) > 0, and let a be an n–vector. In the case of n ≥ 2, the matrix A + aa′ is never n.n.d.

Proof (Guilkey and Price, 1981). The matrix aa′ is of rank ≤ 1. In the case of n ≥ 2 there exists a nonzero vector w such that w′aa′w = 0, implying w′(A + aa′)w = w′Aw < 0.
A.11 Idempotent Matrices
Definition A.25. A square matrix A is called idempotent if it satisfies

    A² = AA = A.

An idempotent matrix A is called an orthogonal projector if A = A′; otherwise, A is called an oblique projector.

Theorem A.36. Let A : n×n be idempotent with rank(A) = r ≤ n. Then we have:

(i) The eigenvalues of A are 1 or 0.

(ii) tr(A) = rank(A) = r.

(iii) If A is of full rank n, then A = In.

(iv) If A and B are idempotent and AB = BA, then AB is also idempotent.

(v) If A is idempotent and P is orthogonal, then PAP′ is also idempotent.

(vi) If A is idempotent, then I − A is idempotent and

    A(I − A) = (I − A)A = 0.
Proof. (i) The characteristic equation

    Ax = λx

multiplied by A gives

    AAx = Ax = λAx = λ²x.

Multiplication of both equations by x′ then yields

    x′Ax = λx′x = λ²x′x,

i.e., λ(λ − 1) = 0.

(ii) From the spectral decomposition A = ΓΛΓ′ we obtain

    rank(A) = rank(Λ) = tr(Λ) = r,

where r is the number of characteristic roots with value 1.

(iii) Let rank(A) = rank(Λ) = n; then Λ = In and A = ΓΛΓ′ = In.

(iv)–(vi) follow from the definition of an idempotent matrix.
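The standard example of a symmetric idempotent matrix is the orthogonal projector P = X(X′X)^{−1}X′ onto the column space of a full-column-rank X. This NumPy check (an added illustration) verifies the properties of Theorem A.36:

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((6, 3))
P = X @ np.linalg.inv(X.T @ X) @ X.T     # orthogonal projector onto R(X)

assert np.allclose(P @ P, P)             # idempotent
assert np.allclose(P, P.T)               # symmetric, hence orthogonal projector
assert np.isclose(np.trace(P), np.linalg.matrix_rank(P))   # (ii) tr = rank
# (i) eigenvalues are 0 or 1 (three of each here)
lam = np.linalg.eigvalsh(P)
assert np.allclose(lam, [0, 0, 0, 1, 1, 1], atol=1e-8)
```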
A.12 Generalized Inverse
Definition A.26. Let A be an (m×n)–matrix. Then a matrix A^− : n×m is said to be a generalized inverse (g–inverse) of A if

    AA^−A = A

holds.

Theorem A.37. A generalized inverse always exists, although in general it is not unique.

Proof. Assume rank(A) = r. According to Theorem A.13 we may write

    A_{m,n} = U_{m,r} L_{r,r} V′_{r,n}

with U′U = Ir, V′V = Ir, and

    L = diag(l1, . . . , lr),  li > 0.

Then

    A^− = V ( L^{−1}  X ; Y  Z ) U′,

where X, Y, and Z are arbitrary matrices (of suitable dimensions), is a g–inverse. Using Theorem A.14, i.e.,

    A = ( X  Y ; Z  W )

with X nonsingular, we have

    A^− = ( X^{−1}  0 ; 0  0 )

as a special g–inverse.

For details on g–inverses, the reader is referred to Rao and Mitra (1971).

Definition A.27 (Moore–Penrose Inverse). A matrix A^+ satisfying the following conditions is called the Moore–Penrose inverse of A:

    (i) AA^+A = A;   (ii) A^+AA^+ = A^+;   (iii) (A^+A)′ = A^+A;   (iv) (AA^+)′ = AA^+.

A^+ is unique.
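NumPy's `pinv` computes exactly this Moore–Penrose inverse via the singular value decomposition. The following added check verifies the four defining conditions on a rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # 5x4 of rank 3
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)          # (i)  AA+A = A
assert np.allclose(Ap @ A @ Ap, Ap)        # (ii) A+AA+ = A+
assert np.allclose((Ap @ A).T, Ap @ A)     # (iii) A+A symmetric
assert np.allclose((A @ Ap).T, A @ Ap)     # (iv) AA+ symmetric
```

Any matrix satisfying only condition (i) is a g-inverse in the sense of Definition A.26; the Moore–Penrose inverse is the unique g-inverse that additionally satisfies (ii)–(iv).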
Theorem A.38. For any matrix A : m×n and any g–inverse A^− : n×m we have:

(i) A^−A and AA^− are idempotent.

(ii) rank(A) = rank(AA^−) = rank(A^−A).

(iii) rank(A) ≤ rank(A^−).

Proof. (i) Using the definition of the g–inverse:

    (A^−A)(A^−A) = A^−(AA^−A) = A^−A.

(ii) According to Theorem A.6(iv) we get

    rank(A) = rank(AA^−A) ≤ rank(A^−A) ≤ rank(A),

i.e., rank(A^−A) = rank(A). Analogously, we see that rank(A) = rank(AA^−).

(iii) rank(A) = rank(AA^−A) ≤ rank(AA^−) ≤ rank(A^−).
Theorem A.39. Let A be an (m×n)–matrix. Then:

(i) A regular ⇒ A^+ = A^{−1}.

(ii) (A^+)^+ = A.

(iii) (A^+)′ = (A′)^+.

(iv) rank(A) = rank(A^+) = rank(A^+A) = rank(AA^+).

(v) A an orthogonal projector ⇒ A^+ = A.

(vi) rank(A : m×n) = m ⇒ A^+ = A′(AA′)^{−1} and AA^+ = Im.

(vii) rank(A : m×n) = n ⇒ A^+ = (A′A)^{−1}A′ and A^+A = In.

(viii) If P : m×m and Q : n×n are orthogonal, then (PAQ)^+ = Q^{−1}A^+P^{−1}.

(ix) (A′A)^+ = A^+(A′)^+ and (AA′)^+ = (A′)^+A^+.

(x) A^+ = (A′A)^+A′ = A′(AA′)^+.
Theorem A.40 (Baksalary et al., 1983). Let M : n×n with M ≥ 0 and N : m×n be any matrices. Then

    M − N′(NM^+N′)^+N ≥ 0

if and only if

    R(N′NM) ⊂ R(M).

Theorem A.41. Let A be any square (n×n)–matrix and let a be an n–vector with a ∉ R(A). Then a g–inverse of A + aa′ is given by

    (A + aa′)^− = A^− − (A^−aa′U′U)/(a′U′Ua) − (VV′aa′A^−)/(a′VV′a)
                  + φ (VV′aa′U′U)/((a′U′Ua)(a′VV′a)),

with A^− any g–inverse of A and

    φ = 1 + a′A^−a,   U = I − AA^−,   V = I − A^−A.
Proof. Straightforward, by checking the defining relation (A + aa′)(A + aa′)^−(A + aa′) = A + aa′.
Theorem A.42. Let A be a square (n×n)–matrix. Then we have the following results:

(i) Assume a and b to be vectors with a, b ∈ R(A), and let A be symmetric. Then the bilinear form a′A^−b is invariant to the choice of A^−.

(ii) A(A′A)^−A′ is invariant to the choice of (A′A)^−.

Proof. (i) a, b ∈ R(A) ⇒ a = Ac and b = Ad. Using the symmetry of A gives

    a′A^−b = c′A′A^−Ad = c′Ad.

(ii) Using the row–wise representation of A as A = ( a′1 ; . . . ; a′n ) gives

    A(A′A)^−A′ = ( a′i(A′A)^−aj ).

As A′A is symmetric, we may conclude from (i) that all bilinear forms a′i(A′A)^−aj are invariant to the choice of (A′A)^−, and hence (ii) is proved.

Theorem A.43. Let A : n×n be symmetric, a ∈ R(A), b ∈ R(A), and assume 1 + b′A^+a ≠ 0. Then

    (A + ab′)^+ = A^+ − (A^+ab′A^+)/(1 + b′A^+a).

Proof. Straightforward, using Theorems A.41 and A.42.

Theorem A.44. Let A : n×n be symmetric, a an n–vector, and α > 0 any scalar. Then the following statements are equivalent:

(i) αA − aa′ ≥ 0.

(ii) A ≥ 0, a ∈ R(A), and a′A^−a ≤ α, with A^− being any g–inverse of A.

Proof. (i) ⇒ (ii): αA − aa′ ≥ 0 ⇒ αA = (αA − aa′) + aa′ ≥ 0 ⇒ A ≥ 0. Using Theorem A.12 for αA − aa′ ≥ 0, we have αA − aa′ = BB and, hence,

    αA = BB + aa′ = (B, a)(B, a)′
    ⇒ R(αA) = R(A) = R(B, a) ⇒ a ∈ R(A) ⇒ a = Ac with c ∈ ℝⁿ
    ⇒ a′A^−a = c′Ac.

As αA − aa′ ≥ 0, we have x′(αA − aa′)x ≥ 0 for any vector x. Choosing x = c gives

    αc′Ac − c′aa′c = αc′Ac − (c′Ac)² ≥ 0 ⇒ c′Ac ≤ α.

(ii) ⇒ (i): Let x ∈ ℝⁿ be any vector. Then, using Theorem A.30,

    x′(αA − aa′)x = αx′Ax − (x′a)²
                  = αx′Ax − (x′Ac)²
                  ≥ αx′Ax − (x′Ax)(c′Ac)
    ⇒ x′(αA − aa′)x ≥ (x′Ax)(α − c′Ac).

In (ii) we have assumed A ≥ 0 and c′Ac = a′A^−a ≤ α. Hence, αA − aa′ ≥ 0.

Remark: This theorem is due to Baksalary et al. (1983).
Theorem A.45. For any matrix A we have
A′A = 0 if and only if A = 0.
Proof. (i) A = 0 ⇒ A′A = 0.

(ii) Let A′A = 0 and let A = (a(1), . . . , a(n)) be the column–wise presentation. Then

    A′A = ( a′(i)a(j) ) = 0,

so that all the elements on the diagonal are zero: a′(i)a(i) = 0 ⇒ a(i) = 0 and, hence, A = 0.

Theorem A.46. Let X ≠ 0 be an (m×n)–matrix and let A be an (n×n)–matrix. Then

    X′XAX′X = X′X ⇒ XAX′X = X and X′XAX′ = X′.

Proof. As X ≠ 0 and X′X ≠ 0, we have

    X′XAX′X − X′X = (X′XA − I)X′X = 0,

and therefore

    0 = (X′XA − I)(X′XAX′X − X′X) = (X′XAX′ − X′)(XAX′X − X) = Y′Y

with Y = XAX′X − X, so that (by Theorem A.45) Y = 0 and, hence, XAX′X = X.

Corollary. Let X ≠ 0 be an (m×n)–matrix and let A and B be (n×n)–matrices. Then

    AX′X = BX′X ⇔ AX′ = BX′.
Theorem A.47 (Albert's Theorem). Let A = ( A11  A12 ; A21  A22 ) be symmetric. Then:

(a) A ≥ 0 if and only if:

(i) A22 ≥ 0;

(ii) A21 = A22A22^−A21;

(iii) A11 ≥ A12A22^−A21;

((ii) and (iii) are invariant to the choice of A22^−).

(b) A > 0 if and only if:

(i) A22 > 0;

(ii) A11 > A12A22^{−1}A21.
Proof (Bekker and Neudecker, 1989).

(a) Assume A ≥ 0.

(i) A ≥ 0 ⇒ x′Ax ≥ 0 for any x. Choosing x′ = (0′, x′2) ⇒ x′Ax = x′2A22x2 ≥ 0 for any x2 ⇒ A22 ≥ 0.

(ii) Let B′ = (0, I − A22A22^−). Then

    B′A = ( (I − A22A22^−)A21 , A22 − A22A22^−A22 ) = ( (I − A22A22^−)A21 , 0 )

and

    B′AB = B′A^{1/2}A^{1/2}B = 0 ⇒ B′A^{1/2} = 0   (Theorem A.45)
    ⇒ B′A^{1/2}A^{1/2} = B′A = 0 ⇒ (I − A22A22^−)A21 = 0.

This proves (ii).

(iii) Let C′ = (I, −(A22^−A21)′). As A ≥ 0,

    0 ≤ C′AC = A11 − A12(A22^−)′A21 − A12A22^−A21 + A12(A22^−)′A22A22^−A21
             = A11 − A12A22^−A21

(as A22 is symmetric, a symmetric g–inverse with (A22^−)′ = A22^− may be chosen).

Assume now (i), (ii), and (iii). Then

    D = ( A11 − A12A22^−A21  0 ; 0  A22 ) ≥ 0,

as the submatrices are n.n.d. by (i) and (iii). Hence,

    A = ( I  A12A22^− ; 0  I ) D ( I  0 ; A22^−A21  I ) ≥ 0.

(b) The proof proceeds as in (a), with A22^− replaced by A22^{−1}.
Theorem A.48. If A : n×n and B : n×n are symmetric, then:

(a) 0 ≤ B ≤ A if and only if:

(i) A ≥ 0;

(ii) B = AA^−B;

(iii) B ≥ BA^−B.

(b) 0 < B < A if and only if 0 < A^{−1} < B^{−1}.

Proof. Apply Theorem A.47 to the matrix

    ( B  B ; B  A ).
Theorem A.49. Let A be symmetric and let c ∈ R(A). Then the following statements are equivalent:

(i) rank(A + cc′) = rank(A).

(ii) R(A + cc′) = R(A).

(iii) 1 + c′A^−c ≠ 0.

Corollary. Assume (i), (ii), or (iii) to hold; then

    (A + cc′)^− = A^− − (A^−cc′A^−)/(1 + c′A^−c)

for any choice of A^−.

Corollary. Assume (i), (ii), or (iii) to hold; then

    c′(A + cc′)^−c = c′A^−c − (c′A^−c)²/(1 + c′A^−c) = 1 − 1/(1 + c′A^−c).

Moreover, as c ∈ R(A + cc′), this expression is invariant to the special choice of the g–inverse.

Proof. c ∈ R(A) ⇔ AA^−c = c ⇒ R(A + cc′) = R(AA^−(A + cc′)) ⊂ R(A). Hence, (i) and (ii) become equivalent. Consider the following product of matrices:

    ( 1  0′ ; c  A + cc′ )( 1  −c′ ; 0  I )( 1  0′ ; −A^−c  I ) = ( 1 + c′A^−c  −c′ ; 0  A ).

The left–hand side has rank

    1 + rank(A + cc′) = 1 + rank(A)

(see (i) or (ii)). The right–hand side has rank 1 + rank(A) if and only if 1 + c′A^−c ≠ 0.
Theorem A.50. Assume A : n×n to be a symmetric and singular matrix, and assume c ∉ R(A). Then we have:

(i) c ∈ R(A + cc′).

(ii) R(A) ⊂ R(A + cc′).

(iii) c′(A + cc′)^−c = 1.

(iv) A(A + cc′)^−A = A.

(v) A(A + cc′)^−c = 0.

Proof. As A is singular, the equation Al = 0 has a nontrivial solution l ≠ 0, which may be standardized as l/(c′l), such that c′l = 1. Then we have c = (A + cc′)l ∈ R(A + cc′) and, hence, (i) is proved. Relation (ii) holds as c ∉ R(A). Relation (i) is seen to be equivalent to

    (A + cc′)(A + cc′)^−c = c.

Therefore (iii) follows:

    c′(A + cc′)^−c = l′(A + cc′)(A + cc′)^−c = l′c = 1.

From

    c = (A + cc′)(A + cc′)^−c = A(A + cc′)^−c + cc′(A + cc′)^−c = A(A + cc′)^−c + c

we have (v). (iv) is a consequence of the general definition of a g–inverse and of (iii) and (v):

    A + cc′ = (A + cc′)(A + cc′)^−(A + cc′)
            = A(A + cc′)^−A
              + cc′(A + cc′)^−cc′   [= cc′ using (iii)]
              + A(A + cc′)^−cc′    [= 0 using (v)]
              + cc′(A + cc′)^−A    [= 0 using (v)].
Theorem A.51. We have A ≥ 0 if and only if:

(i) A + cc′ ≥ 0.

(ii) (A + cc′)(A + cc′)^−c = c.

(iii) c′(A + cc′)^−c ≤ 1.

Assume A ≥ 0; then:

(a) c = 0 ⇔ c′(A + cc′)^−c = 0.

(b) c ∈ R(A) ⇔ c′(A + cc′)^−c < 1.

(c) c ∉ R(A) ⇔ c′(A + cc′)^−c = 1.

Proof. A ≥ 0 is equivalent to

    0 ≤ cc′ ≤ A + cc′.

Straightforward application of Theorem A.48 gives (i)–(iii).

(a) A ≥ 0 ⇒ A + cc′ ≥ 0. Assume c′(A + cc′)^−c = 0 and substitute for c using (ii):

    c′(A + cc′)^−(A + cc′)(A + cc′)^−c = 0 ⇒ (A + cc′)(A + cc′)^−c = c = 0,

as (A + cc′) ≥ 0. Conversely, c = 0 ⇒ c′(A + cc′)^−c = 0.

(b) Assume A ≥ 0 and c ∈ R(A), and use Theorem A.49 ⇒

    c′(A + cc′)^−c = 1 − 1/(1 + c′A^−c) < 1.

The opposite direction of (b) is a consequence of (c).

(c) Assume A ≥ 0 and c ∉ R(A), and use Theorem A.50(iii) ⇒ c′(A + cc′)^−c = 1. The opposite direction of (c) is a consequence of (b).
Note: The proofs of Theorems A.47–A.51 are given in Bekker andNeudecker (1989).
Theorem A.52. The linear equation Ax = a has a solution if and only if

    a ∈ R(A), or equivalently AA^−a = a,

for any g–inverse A^−. If this condition holds, then all solutions are given by

    x = A^−a + (I − A^−A)w,

where w is an arbitrary m–vector. Further, q′x has a unique value for all solutions of Ax = a if and only if q′A^−A = q′, or q ∈ R(A′).

For a proof, see Rao (1973, p. 25).
A.13 Projections
Consider the range space R(A) of the matrix A : m×n with rank r. Then there exists R(A)^⊥, the orthogonal complement of R(A), with dimension m − r. Any vector x ∈ ℝᵐ has the unique decomposition

    x = x1 + x2 ,   x1 ∈ R(A) and x2 ∈ R(A)^⊥,

in which the component x1 is called the orthogonal projection of x on R(A). The component x1 can be computed as Px, where

    P = A(A′A)^−A′

is called the projection operator on R(A). Note that P is unique for any choice of the g–inverse (A′A)^−.

Theorem A.53. For any P : n×n, the following statements are equivalent:

(i) P is an orthogonal projection operator.

(ii) P is symmetric and idempotent.

For proofs and other details, the reader is referred to Rao (1973) and Rao and Mitra (1971).
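The decomposition x = x1 + x2 and the invariance of P to the g-inverse can be illustrated numerically. In this added sketch, `np.linalg.pinv` serves as one particular g-inverse of A′A; the rank-deficient A is built deliberately so that (A′A)^{−1} does not exist:

```python
import numpy as np

rng = np.random.default_rng(11)
B = rng.standard_normal((6, 2))
A = np.hstack([B, B @ rng.standard_normal((2, 2))])   # 6x4 of rank 2
P = A @ np.linalg.pinv(A.T @ A) @ A.T                 # projector on R(A)

x = rng.standard_normal(6)
x1, x2 = P @ x, (np.eye(6) - P) @ x
assert np.allclose(x, x1 + x2)        # unique decomposition
assert np.allclose(A.T @ x2, 0)       # x2 lies in R(A)-perp
assert np.allclose(P @ A, A)          # P leaves the columns of A unchanged
```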
Theorem A.54. Let X be a matrix of order T×K with rank r < K, and let U : (K−r)×K be such that R(X′) ∩ R(U′) = {0}. Then:

(i) X(X′X + U′U)^{−1}U′ = 0.

(ii) X′X(X′X + U′U)^{−1}X′X = X′X, i.e., (X′X + U′U)^{−1} is a g–inverse of X′X.

(iii) U′U(X′X + U′U)^{−1}U′U = U′U, i.e., (X′X + U′U)^{−1} is also a g–inverse of U′U.

(iv) U(X′X + U′U)^{−1}U′u = u if u ∈ R(U).

Proof. Since X′X + U′U is of full rank, there exists a matrix A such that

    (X′X + U′U)A = U′
    ⇒ X′XA = U′ − U′UA ⇒ XA = 0 and U′ = U′UA,

since R(X′) and R(U′) are disjoint.

(i): X(X′X + U′U)^{−1}U′ = X(X′X + U′U)^{−1}(X′X + U′U)A = XA = 0.

(ii): X′X(X′X + U′U)^{−1}(X′X + U′U − U′U) = X′X − X′X(X′X + U′U)^{−1}U′U = X′X.

The result (iii) follows along the same lines as result (ii).

(iv): U(X′X + U′U)^{−1}U′u = U(X′X + U′U)^{−1}U′Ua = Ua = u, since u ∈ R(U).
A.14 Functions of Normally Distributed Variables
Let x′ = (x1, . . . , xp) be a p–dimensional random vector. Then x is p–dimensionally normally distributed with expectation vector µ and covariance matrix Σ > 0, i.e., x ∼ Np(µ, Σ), if its joint density is

    f(x; µ, Σ) = (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(x − µ)′Σ^{−1}(x − µ) }.

Theorem A.55. Assume x ∼ Np(µ, Σ), and let A : q×p and b : q×1 be nonstochastic with rank(A) = q ≤ p. Then

    y = Ax + b ∼ Nq(Aµ + b, AΣA′).
Theorem A.56. If x ∼ Np(0, I), then

    x′x ∼ χ²p

(the central χ²–distribution with p degrees of freedom).

Theorem A.57. If x ∼ Np(µ, I), then

    x′x ∼ χ²p(λ)

has a noncentral χ²–distribution with noncentrality parameter

    λ = µ′µ = ∑_{i=1}^p µi² .

Theorem A.58. If x ∼ Np(µ, Σ), then:

(i) x′Σ^{−1}x ∼ χ²p(µ′Σ^{−1}µ).

(ii) (x − µ)′Σ^{−1}(x − µ) ∼ χ²p.

Proof. Σ > 0 ⇒ Σ = Σ^{1/2}Σ^{1/2} with Σ^{1/2} regular and symmetric. Hence,

    Σ^{−1/2}x = y ∼ Np(Σ^{−1/2}µ, I) ⇒ x′Σ^{−1}x = y′y ∼ χ²p(µ′Σ^{−1}µ)

and

    (x − µ)′Σ^{−1}(x − µ) = (y − Σ^{−1/2}µ)′(y − Σ^{−1/2}µ) ∼ χ²p.
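The key algebraic step in this proof is the whitening identity Σ^{−1/2} Σ Σ^{−1/2} = I, with the symmetric square root built from the spectral decomposition of Theorem A.12(iii). An added NumPy check of that identity (the particular covariance matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(12)
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3 * np.eye(3)              # a positive definite covariance
lam, G = np.linalg.eigh(Sigma)               # spectral decomposition Sigma = G diag(lam) G'
Sroot_inv = G @ np.diag(lam ** -0.5) @ G.T   # symmetric Sigma^{-1/2}

# whitening: y = Sigma^{-1/2} x has identity covariance
assert np.allclose(Sroot_inv @ Sigma @ Sroot_inv, np.eye(3))
```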
Theorem A.59. If Q1 ∼ χ²m(λ), Q2 ∼ χ²n, and Q1 and Q2 are independent, then:

(i) The ratio

    F = (Q1/m) / (Q2/n)

has a noncentral F_{m,n}(λ)–distribution.

(ii) If λ = 0, then F ∼ F_{m,n}, the central F–distribution.

(iii) If m = 1, then √F has a noncentral t_n(√λ)–distribution, or a central t_n–distribution if λ = 0.
Theorem A.60. If x ∼ N_p(µ, I) and A is a symmetric idempotent (p × p)-matrix with rank(A) = r, then

x′Ax ∼ χ²_r(µ′Aµ).

Proof. We have A = PΛP′ (Theorem A.11) and, without loss of generality (Theorem A.36(i)), we may write Λ = [I_r 0; 0 0], i.e., P′AP = Λ with P orthogonal. Partition P = (P₁, P₂), with P₁ : p × r and P₂ : p × (p − r), and set

P′x = y = (y₁; y₂) = (P₁′x; P₂′x).

Therefore

y ∼ N_p(P′µ, I_p) (Theorem A.55), y₁ ∼ N_r(P₁′µ, I_r),

and

y₁′y₁ ∼ χ²_r(µ′P₁P₁′µ) (Theorem A.57).

As P is orthogonal, we have

A = (PP′)A(PP′) = P(P′AP)P′ = (P₁, P₂) [I_r 0; 0 0] (P₁′; P₂′) = P₁P₁′

and, therefore,

x′Ax = x′P₁P₁′x = y₁′y₁ ∼ χ²_r(µ′Aµ),

since µ′P₁P₁′µ = µ′Aµ.
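Theorem A.60 can likewise be checked by simulation: with an illustrative rank-r projection matrix A, the quadratic form x′Ax should match the mean r + λ and variance 2(r + 2λ) of a noncentral χ²_r(λ) variable.

```python
import numpy as np

# Monte Carlo sketch of Theorem A.60; the projection A and mu are illustrative.
rng = np.random.default_rng(2)
p, r, n = 5, 2, 200_000
V = np.linalg.qr(rng.standard_normal((p, r)))[0]  # orthonormal p x r columns
A = V @ V.T                                       # symmetric idempotent, rank r
mu = rng.standard_normal(p)

x = mu + rng.standard_normal((n, p))              # draws from N_p(mu, I)
q = np.einsum('ij,jk,ik->i', x, A, x)             # x'Ax for each draw

lam = mu @ A @ mu                                 # noncentrality parameter mu'A mu
# Noncentral chi^2_r(lam) has mean r + lam and variance 2(r + 2*lam).
assert abs(q.mean() - (r + lam)) < 0.1
assert abs(q.var() - 2 * (r + 2 * lam)) < 1.0
```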
Theorem A.61. Assume x ∼ N_p(µ, I), A a (p × p) idempotent matrix of rank r, and B an (n × p)-matrix. Then the linear form Bx is independent of the quadratic form x′Ax if and only if BA = 0.

Proof. Let P be the matrix of Theorem A.60. Then BPP′AP = BAP = 0, as BA = 0 was assumed. Writing BP = D = (D₁, D₂) = (BP₁, BP₂), we get

BPP′AP = (D₁, D₂) [I_r 0; 0 0] = (D₁, 0) = (0, 0),

so that D₁ = 0. This gives

Bx = BPP′x = Dy = (0, D₂)(y₁; y₂) = D₂y₂,

where y₂ = P₂′x. Since P is orthogonal and, hence, regular, all the components of y = P′x are independent; consequently, Bx = D₂y₂ and x′Ax = y₁′y₁ are independent.
Theorem A.62. Let x ∼ N_p(0, I) and assume A and B to be idempotent (p × p)-matrices with rank(A) = r and rank(B) = s. Then the quadratic forms x′Ax and x′Bx are independent if and only if BA = 0.

Proof. If we use P from Theorem A.60 and set C = P′BP (C symmetric), we get, under the assumption BA = 0,

CP′AP = P′BPP′AP = P′BAP = 0.

Using

C = (P₁′; P₂′) B (P₁, P₂) = [C₁ C₂; C₂′ C₃] = [P₁′BP₁ P₁′BP₂; P₂′BP₁ P₂′BP₂],

this relation may be written as

CP′AP = [C₁ C₂; C₂′ C₃][I_r 0; 0 0] = [C₁ 0; C₂′ 0] = 0.

Therefore, C₁ = 0 and C₂ = 0, and

x′Bx = x′(PP′)B(PP′)x = x′P(P′BP)P′x = x′PCP′x = (y₁′, y₂′)[0 0; 0 C₃](y₁; y₂) = y₂′C₃y₂.

As shown in Theorem A.60, we have x′Ax = y₁′y₁; since y₁ and y₂ are independent, the quadratic forms x′Ax and x′Bx are independent.
A.15 Differentiation of Scalar Functions of Matrices
Definition A.28. If f(X) is a real function of an (m × n)-matrix X = (x_ij), then the partial derivative of f with respect to X is defined as the (m × n)-matrix of partial derivatives ∂f/∂x_ij:

∂f(X)/∂X = [∂f/∂x₁₁ … ∂f/∂x_{1n}; ⋮ ⋮; ∂f/∂x_{m1} … ∂f/∂x_{mn}].
Theorem A.63. Let x be an n-vector and A a symmetric (n × n)-matrix. Then

∂(x′Ax)/∂x = 2Ax.

Proof. We have

x′Ax = Σ_{r,s=1}^n a_{rs} x_r x_s,

∂(x′Ax)/∂x_i = Σ_{s≠i} a_{is} x_s + Σ_{r≠i} a_{ri} x_r + 2a_{ii} x_i = 2 Σ_{s=1}^n a_{is} x_s (as a_{ij} = a_{ji}) = 2a_i′x,

where a_i′ is the ith row vector of A. According to Definition A.28 we get

∂(x′Ax)/∂x = (∂/∂x₁, …, ∂/∂x_n)′(x′Ax) = 2(a₁′; …; a_n′)x = 2Ax.
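The gradient formula of Theorem A.63 is easy to verify against central finite differences; a small sketch with illustrative A and x:

```python
import numpy as np

# Finite-difference check of Theorem A.63: d(x'Ax)/dx = 2Ax for symmetric A.
# A and x are illustrative.
rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetrize, as the theorem requires
x = rng.standard_normal(n)

def f(v):
    return v @ A @ v                   # the scalar function x'Ax

h = 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                    for e in np.eye(n)])
assert np.allclose(grad_fd, 2 * A @ x, atol=1e-5)
```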
Theorem A.64. If x is an n-vector, y an m-vector, and C an (n × m)-matrix, then

∂(x′Cy)/∂C = xy′.

Proof. We have

x′Cy = Σ_{r=1}^m Σ_{s=1}^n x_s c_{sr} y_r,

∂(x′Cy)/∂c_{kλ} = x_k y_λ (the (k, λ)th element of xy′),

∂(x′Cy)/∂C = (x_k y_λ) = xy′.
Theorem A.65. Let x be a K-vector, A a symmetric (T × T)-matrix, and C a (T × K)-matrix. Then

∂(x′C′ACx)/∂C = 2ACxx′.

Proof. We have

x′C′ = (Σ_{i=1}^K x_i c_{1i}, …, Σ_{i=1}^K x_i c_{Ti}),

∂(x′C′)/∂c_{kλ} = (0, …, 0, x_λ, 0, …, 0) (x_λ at the kth position).

Using the product rule yields

∂(x′C′ACx)/∂c_{kλ} = (∂(x′C′)/∂c_{kλ}) ACx + x′C′A (∂(Cx)/∂c_{kλ}).

Since

x′C′A = (Σ_{t=1}^T Σ_{i=1}^K x_i c_{ti} a_{t1}, …, Σ_{t=1}^T Σ_{i=1}^K x_i c_{ti} a_{tT}),

we get

x′C′A (∂(Cx)/∂c_{kλ}) = Σ_{t,i} x_i x_λ c_{ti} a_{tk} = Σ_{t,i} x_i x_λ c_{ti} a_{kt} (as A is symmetric) = (∂(x′C′)/∂c_{kλ}) ACx.

But Σ_{t,i} x_i x_λ c_{ti} a_{tk} is just the (k, λ)th element of the matrix ACxx′; since the two terms of the product rule coincide, the assertion follows.
Theorem A.66. Assume A = A(x) to be an (n × n)-matrix whose elements a_{ij}(x) are real functions of a scalar x, and let B be an (n × n)-matrix whose elements are independent of x. Then

∂ tr(AB)/∂x = tr((∂A/∂x)B).

Proof. We have

tr(AB) = Σ_{i=1}^n Σ_{j=1}^n a_{ij} b_{ji},

∂ tr(AB)/∂x = Σ_i Σ_j (∂a_{ij}/∂x) b_{ji} = tr((∂A/∂x)B),

where ∂A/∂x = (∂a_{ij}/∂x).
Theorem A.67. For the derivative of the trace we have the following rules:

      y             ∂y/∂X
(i)   tr(AX)        A′
(ii)  tr(X′AX)      (A + A′)X
(iii) tr(XAX)       X′A′ + A′X′
(iv)  tr(XAX′)      X(A + A′)
(v)   tr(X′AX′)     AX′ + X′A
(vi)  tr(X′AXB)     AXB + A′XB′
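Rows of this table can be spot-checked against central finite differences; a sketch for rules (i) and (ii) with illustrative matrices:

```python
import numpy as np

# Finite-difference spot check of Theorem A.67, rules (i) and (ii).
# A and X are illustrative square matrices.
rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
h = 1e-6

def num_grad(f, X):
    """Matrix of partial derivatives df/dx_ij by central differences."""
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# (i)  d tr(AX)/dX   = A'
assert np.allclose(num_grad(lambda M: np.trace(A @ M), X), A.T, atol=1e-5)
# (ii) d tr(X'AX)/dX = (A + A')X
assert np.allclose(num_grad(lambda M: np.trace(M.T @ A @ M), X),
                   (A + A.T) @ X, atol=1e-5)
```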
Differentiation of Inverse Matrices
Theorem A.68. Let T = T(x) be a regular matrix whose elements depend on a scalar x. Then

∂T⁻¹/∂x = −T⁻¹ (∂T/∂x) T⁻¹.

Proof. We have T⁻¹T = I and ∂I/∂x = 0, so

∂(T⁻¹T)/∂x = (∂T⁻¹/∂x)T + T⁻¹(∂T/∂x) = 0,

and solving for ∂T⁻¹/∂x gives the assertion.
Theorem A.69. For nonsingular X we have

∂ tr(AX⁻¹)/∂X = −(X⁻¹AX⁻¹)′,
∂ tr(X⁻¹AX⁻¹B)/∂X = −(X⁻¹AX⁻¹BX⁻¹ + X⁻¹BX⁻¹AX⁻¹)′.

Proof. Use Theorems A.67 and A.68 and the product rule.
Differentiation of a Determinant
Theorem A.70. For a nonsingular matrix Z we have:

(i) ∂|Z|/∂Z = |Z|(Z′)⁻¹.

(ii) ∂ log|Z|/∂Z = (Z′)⁻¹.
A.16 Miscellaneous Results, Stochastic Convergence
Theorem A.71 (Kronecker Product). Let A = (a_{ij}) be an (m × n)-matrix and B = (b_{rs}) a (p × q)-matrix. Then the Kronecker product of A and B is the (mp × nq)-matrix

C = A ⊗ B = [a₁₁B a₁₂B … a_{1n}B; ⋮ ⋮; a_{m1}B a_{m2}B … a_{mn}B],

and the following rules hold:

(i) c(A ⊗ B) = (cA) ⊗ B = A ⊗ (cB) (c a scalar).

(ii) A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.

(iii) A ⊗ (B + C) = (A ⊗ B) + (A ⊗ C).

(iv) (A ⊗ B)′ = A′ ⊗ B′.
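The Kronecker product of Theorem A.71 is available as `np.kron`; rules (i)–(iv) can be verified directly with illustrative matrices:

```python
import numpy as np

# Check of the Kronecker-product rules (i)-(iv) of Theorem A.71.
# The matrices and the scalar c are illustrative.
rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((4, 2))
c = 2.5

assert np.allclose(c * np.kron(A, B), np.kron(c * A, B))                  # (i)
assert np.allclose(np.kron(c * A, B), np.kron(A, c * B))                  # (i)
assert np.allclose(np.kron(A, np.kron(B, C)), np.kron(np.kron(A, B), C))  # (ii)
assert np.allclose(np.kron(A, B + C), np.kron(A, B) + np.kron(A, C))      # (iii)
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))                    # (iv)
```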
Theorem A.72 (Tschebyschev's Inequality). For any n-dimensional random vector X and a given scalar ε > 0 we have

P{|X| ≥ ε} ≤ E|X|²/ε².

Proof. Let F(x) be the joint distribution function of X = (x₁, …, x_n). Then

E|X|² = ∫ |x|² dF(x) = ∫_{x:|x|≥ε} |x|² dF(x) + ∫_{x:|x|<ε} |x|² dF(x) ≥ ε² ∫_{x:|x|≥ε} dF(x) = ε² P{|X| ≥ ε}.
Definition A.29. Let x(t), t = 1, 2, …, be a multivariate stochastic process.

(i) Weak convergence. If

lim_{t→∞} P{|x(t) − x| ≥ δ} = 0

for every given scalar δ > 0, where x is a finite vector, then x is called the probability limit of x(t), and we write plim x(t) = x.

(ii) Strong convergence. Assume that x(t) is defined on a probability space (Ω, Σ, P). Then x(t) is said to converge strongly (almost surely, a.s.) to x, written x(t) → x a.s., if there exists a set T ∈ Σ with P(T) = 0 such that x_ω(t) → x_ω, as t → ∞, for each ω ∈ Ω − T (M.M. Rao, 1984, p. 45).
Theorem A.73 (Slutsky's Theorem).

(i) If plim x(t) = x, then lim_{t→∞} E x(t) = E(x) = x.

(ii) If c is a vector of constants, then plim c = c.

(iii) (Slutsky's Theorem) If plim x(t) = x and y = f(x) is any continuous vector function of x, then plim f(x(t)) = f(x).

(iv) If A and B are random matrices, then, whenever the corresponding limits exist,

plim(AB) = (plim A)(plim B)

and

plim(A⁻¹) = (plim A)⁻¹.

(v) If plim [√T(x(t) − E x(t))][√T(x(t) − E x(t))]′ = V, then the asymptotic covariance matrix is

V(x, x) = E[x − E(x)][x − E(x)]′ = T⁻¹V.
Definition A.30. If x(t), t = 1, 2, …, is a multivariate stochastic process satisfying

lim_{t→∞} E|x(t) − x|² = 0,

then x(t) is said to converge to x in quadratic mean, and we write l.i.m. x(t) = x.

Theorem A.74. If l.i.m. x(t) = x, then plim x(t) = x.

Proof. Using Theorem A.72 we get

0 ≤ lim_{t→∞} P(|x(t) − x| ≥ ε) ≤ lim_{t→∞} E|x(t) − x|²/ε² = 0.

Theorem A.75. If l.i.m.(x(t) − E x(t)) = 0 and lim_{t→∞} E x(t) = c, then plim x(t) = c.
Proof.

lim_{t→∞} P(|x(t) − c| ≥ ε) ≤ ε⁻² lim_{t→∞} E|x(t) − c|²
= ε⁻² lim_{t→∞} E|x(t) − E x(t) + E x(t) − c|²
= ε⁻² lim_{t→∞} E|x(t) − E x(t)|² + ε⁻² lim_{t→∞} |E x(t) − c|² + 2ε⁻² lim_{t→∞} E(E x(t) − c)′(x(t) − E x(t))
= 0.

Theorem A.76. l.i.m. x(t) = c if and only if

l.i.m.(x(t) − E x(t)) = 0 and lim_{t→∞} E x(t) = c.

Proof. As in Theorem A.75, we may write

lim_{t→∞} E|x(t) − c|² = lim_{t→∞} E|x(t) − E x(t)|² + lim_{t→∞} |E x(t) − c|² + 2 lim_{t→∞} E(E x(t) − c)′(x(t) − E x(t)),

where the cross term vanishes; since the two remaining terms are nonnegative, the left-hand side tends to zero if and only if both of them do.

Theorem A.77. Let x(t) be an estimator of a parameter vector θ. Then we have the result

lim_{t→∞} E x(t) = θ if l.i.m.(x(t) − θ) = 0.

That is, x(t) is an asymptotically unbiased estimator for θ if x(t) converges to θ in the quadratic mean.

Proof. Use Theorem A.76.
Appendix B
Theoretical Proofs

In this Appendix the reader will find proofs of theoretical results which we decided to put in the appendix. It is structured in accordance with the chapters of the book.
B.1 The Linear Regression Model
Proof 1 (Theorem (3.1)). Let Ax = a have a solution. Then at least one vector x₀ exists with Ax₀ = a. As AA⁻A = A for every g-inverse, we obtain

a = Ax₀ = AA⁻Ax₀ = AA⁻(Ax₀) = AA⁻a,

which is just (3.12). Now let (3.12) be true, i.e., AA⁻a = a. Then A⁻a is a solution of (3.11).

Assume now that (3.11) is solvable. To prove (3.13), we have to show:

(i) that A⁻a + (I − A⁻A)w is always a solution of (3.11) (w arbitrary); and

(ii) that every solution x of Ax = a may be represented by (3.13).

Part (i) follows by insertion of the general solution, also making use of A(I − A⁻A) = 0:

A[A⁻a + (I − A⁻A)w] = AA⁻a = a.

To prove (ii), we choose w = x₀, where x₀ is a solution of the linear equation, i.e., Ax₀ = a. Then we have

A⁻a + (I − A⁻A)x₀ = A⁻a + x₀ − A⁻Ax₀ = A⁻a + x₀ − A⁻a = x₀,

thus concluding the proof.
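The solvability condition (3.12) and the general solution (3.13) can be illustrated numerically. In the sketch below, `np.linalg.pinv` supplies one particular g-inverse (the Moore–Penrose inverse); the matrix A and right-hand side a are illustrative.

```python
import numpy as np

# Sketch of Theorem 3.1: Ax = a is solvable iff A A^- a = a, and then
# A^- a + (I - A^- A) w sweeps out solutions for arbitrary w.
# The pseudoinverse is one admissible g-inverse; A and a are illustrative.
rng = np.random.default_rng(6)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # 4 x 5, rank 2
x0 = rng.standard_normal(5)
a = A @ x0                                   # consistent right-hand side
A_g = np.linalg.pinv(A)

assert np.allclose(A @ A_g @ a, a)           # solvability condition (3.12)
assert np.allclose(A @ (A_g @ a), a)         # A^- a is one solution
w = rng.standard_normal(5)
x_w = A_g @ a + (np.eye(5) - A_g @ A) @ w    # general solution (3.13)
assert np.allclose(A @ x_w, a)
```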
Proof 2 (Theorem (3.2)). We start with the following corollary:

Corollary. The set of equations

AXB = C,  (B.1)

where A : m × n, B : p × q, C : m × q, and X : n × p, has a solution X if and only if

AA⁻CB⁻B = C,  (B.2)

where A⁻ and B⁻ are arbitrary g-inverses of A and B.

If X is of full rank, i.e., rank(X) = p = K, then we have (X′X)⁻ = (X′X)⁻¹, and the normal equations are uniquely solvable by

b = (X′X)⁻¹X′y.  (B.3)

If, more generally, rank(X) = p < K, then the solutions of the normal equations span the same hyperplane as Xb, i.e., for two solutions b and b* we have

Xb = Xb*.  (B.4)
This result is easy to prove: if b and b* are solutions to the normal equations, we have

X′Xb = X′y and X′Xb* = X′y.

Accordingly, we have, for the difference of the above equations,

X′X(b − b*) = 0,

which entails

X(b − b*) = 0 or Xb = Xb*.

Moreover, by (B.4), the two sums of squared errors coincide:

S(b) = (y − Xb)′(y − Xb) = (y − Xb*)′(y − Xb*) = S(b*).

Thus Theorem 3.2 has been proven.
Proof 3 (Theorem (3.3)). As R(X) is of dimension p, an orthonormal basis v₁, …, v_p exists. Furthermore, we may represent the (T × 1)-vector y as

y = Σ_{i=1}^p a_i v_i + (y − Σ_{i=1}^p a_i v_i) = c + d,  (B.5)

where a_i = y′v_i. As

v_j′d = v_j′y − Σ_i a_i v_j′v_i = a_j − Σ_i a_i δ_ij = 0  (B.6)

(δ_ij denotes the Kronecker symbol), we have c ⊥ d, i.e., c ∈ R(X) and d ∈ R(X)⊥, such that y has been decomposed into two orthogonal components. This decomposition is unique, as can easily be shown.

We now have to show that c = Xb = Θ₀. It follows from c − Θ ∈ R(X) that

(y − c)′(c − Θ) = d′(c − Θ) = 0.  (B.7)

Considering y − Θ = (y − c) + (c − Θ), we get

S(Θ) = (y − Θ)′(y − Θ) = (y − c)′(y − c) + (c − Θ)′(c − Θ) + 2(y − c)′(c − Θ)
     = (y − c)′(y − c) + (c − Θ)′(c − Θ).  (B.8)

S(Θ) reaches its minimum on R(X) for the choice Θ = c. As S(Θ) = S(β), we find b to be determined by the optimum c = Θ₀ = Xb.
Proof 4 (Theorem (3.4)). Following Theorem 3.3, we have

Θ₀ = c = Σ_i a_i v_i = Σ_i v_i(y′v_i) = Σ_i v_i(v_i′y)
   = (v₁, …, v_p)(v₁, …, v_p)′y = BB′y  [B = (v₁, …, v_p)]
   = Py,  (B.9)

where P is obviously symmetric and idempotent.

We have to make use of the following lemma, which will be stated without proof.

Lemma. A symmetric and idempotent (T × T)-matrix P of rank p ≤ T represents the orthogonal projection matrix of R^T on a p-dimensional vector space V = R(P).

(i) Determination of P if rank(X) = K. The columns of B constitute an orthonormal basis of R(X) = {Θ : Θ = Xβ}. But X = BC, with a regular matrix C, as the columns of X also form a basis of R(X). Thus

P = BB′ = XC⁻¹(C′)⁻¹X′ = X(C′C)⁻¹X′ = X(C′B′BC)⁻¹X′  [as B′B = I]
  = X(X′X)⁻¹X′,  (B.10)

and we finally get

Θ₀ = Py = X(X′X)⁻¹X′y = Xb.  (B.11)
(ii) Determination of P if rank(X) = p < K. The normal equations have a unique solution if X is of full column rank K. A method of deriving unique solutions if rank(X) = p < K is based on imposing additional linear restrictions, which enable the identification of β.

We introduce only the general strategy by using Theorem 3.4; further details will be given in Section 3.5.

Let R be a ((K − p) × K)-matrix with rank(R) = K − p and define the stacked matrix

D = (X; R),

i.e., the rows of R are appended below those of X. Let r be a known ((K − p) × 1)-vector. If rank(D) = K, then X and R are complementary matrices. The matrix R represents (K − p) additional linear restrictions on β (reparametrization), as it will be assumed that

Rβ = r.  (B.12)

Minimization of S(β), subject to these exact linear restrictions Rβ = r, requires the minimization of the function

Q(β, λ) = S(β) + 2λ′(Rβ − r),  (B.13)

where λ stands for a ((K − p) × 1)-vector of Lagrangian multipliers. The corresponding normal equations are given by (cf. Theorems A.63–A.67)

(1/2) ∂Q(β, λ)/∂β = X′Xβ − X′y + R′λ = 0,
(1/2) ∂Q(β, λ)/∂λ = Rβ − r = 0.  (B.14)
If r = 0, we can prove the following theorem (cf. Seber (1966), p. 16):
Theorem B.1. Under the exact linear restrictions Rβ = r with rank(R) = K − p and rank(D) = K we can state:

(i) The orthogonal projection matrix of R^T on R(X) is of the form

P = X(X′X + R′R)⁻¹X′.  (B.15)

(ii) The conditional ordinary least-squares estimator of β is given by

b(R, r) = (X′X + R′R)⁻¹(X′y + R′r).  (B.16)
Proof. We start with the proof of part (i).

From the assumptions we conclude that for every Θ ∈ R(X) a β exists such that Θ = Xβ and Rβ = r are valid; this β is unique, as rank(D) = K. In other words, for every Θ ∈ R(X), the ((T + K − p) × 1)-vector (Θ; r) lies in R(D), i.e.,

(Θ; r) = Dβ (and β is unique).

If we make use of Theorem 3.4, then we get the projection matrix of R^{T+K−p} on R(D) as

P* = D(D′D)⁻¹D′.  (B.17)

As the projection P* maps every element of R(D) onto itself, we have, for every Θ ∈ R(X),

(Θ; r) = D(D′D)⁻¹D′(Θ; r) = [X(D′D)⁻¹X′ X(D′D)⁻¹R′; R(D′D)⁻¹X′ R(D′D)⁻¹R′] (Θ; r),  (B.18)

i.e.,

Θ = X(D′D)⁻¹X′Θ + X(D′D)⁻¹R′r,  (B.19)
r = R(D′D)⁻¹X′Θ + R(D′D)⁻¹R′r.  (B.20)

Equations (B.19) and (B.20) hold for every Θ ∈ R(X) and for all r = Rβ ∈ R(R). If we choose r = 0 in (B.12), then (B.19) and (B.20) specialize to

Θ = X(D′D)⁻¹X′Θ,  (B.21)
0 = R(D′D)⁻¹X′Θ.  (B.22)

From (B.22) it follows that

R(X(D′D)⁻¹R′) ⊥ R(X),  (B.23)

and as R(X(D′D)⁻¹R′) = {Θ : Θ = Xβ̃ with β̃ = (D′D)⁻¹R′β} it holds that

R(X(D′D)⁻¹R′) ⊂ R(X),  (B.24)

such that, finally,

X(D′D)⁻¹R′ = 0  (B.25)

(see also Tan, 1971).
The matrices X(D′D)⁻¹X′ and R(D′D)⁻¹R′ are idempotent (symmetry is evident):

X(D′D)⁻¹X′X(D′D)⁻¹X′
  = X(D′D)⁻¹(X′X + R′R − R′R)(D′D)⁻¹X′
  = X(D′D)⁻¹(X′X + R′R)(D′D)⁻¹X′ − X(D′D)⁻¹R′R(D′D)⁻¹X′
  = X(D′D)⁻¹X′,

as D′D = X′X + R′R and (B.25) are valid.

The idempotency of R(D′D)⁻¹R′ can be shown in a similar way. D′D and (D′D)⁻¹ are both positive definite (see Theorems A.16 and A.17). R(D′D)⁻¹R′ is positive definite (Theorem A.16(vi)) and thus regular, since rank(R) = K − p. But there exists only one idempotent and regular matrix, namely, the identity matrix (Theorem A.36(iii)):

R(D′D)⁻¹R′ = I,  (B.26)

such that (B.20) reduces to r = r. As P = X(D′D)⁻¹X′ is idempotent, it represents the orthogonal projection matrix of R^T on a vector space V ⊂ R^T (see the lemma following Theorem 3.4).

With (B.21) we have R(X) ⊂ V. But the reverse inclusion is also true (see Theorem A.7(iv), (v)):

V = R(X(D′D)⁻¹X′) ⊂ R(X),  (B.27)

such that V = R(X), which proves (i).

(ii): We will solve the normal equations (B.14). With Rβ = r it also holds that R′Rβ = R′r. Inserting the latter identity into the first equation of (B.14) yields

(X′X + R′R)β = X′y + R′r − R′λ.

Multiplication with (D′D)⁻¹ from the left yields

β = (D′D)⁻¹(X′y + R′r) − (D′D)⁻¹R′λ.

If we use the second equation of (B.14), (B.25), and (B.26), and then multiply by R from the left, we get

Rβ = R(D′D)⁻¹(X′y + R′r) − R(D′D)⁻¹R′λ = r − λ,  (B.28)

from which λ = 0 follows.

The solution of the normal equations is therefore given by

β̂ = b(R, r) = (X′X + R′R)⁻¹(X′y + R′r),  (B.29)

which proves (ii).
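The restricted estimator of Theorem B.1(ii) can be exercised on a deliberately rank-deficient design. In the sketch below, the third regressor is the sum of the first two, and the single restriction row R is an illustrative choice that makes D = (X; R) full rank; the estimator then satisfies the restriction exactly and, with λ = 0, also solves the (singular) normal equations.

```python
import numpy as np

# Sketch of Theorem B.1(ii): b(R, r) = (X'X + R'R)^{-1}(X'y + R'r).
# Design, restriction, and parameters are illustrative.
rng = np.random.default_rng(7)
T = 20
X1 = rng.standard_normal((T, 2))
X = np.column_stack([X1, X1[:, 0] + X1[:, 1]])  # rank p = 2 < K = 3
R = np.array([[1.0, 1.0, -1.0]])                # (K - p) x K, not in row space of X
r = np.array([0.0])
beta_true = np.array([1.0, 2.0, 3.0])           # satisfies R beta = r
y = X @ beta_true + 0.1 * rng.standard_normal(T)

b_R = np.linalg.solve(X.T @ X + R.T @ R, X.T @ y + R.T @ r)

assert np.allclose(R @ b_R, r, atol=1e-8)       # restriction holds exactly
assert np.allclose(X.T @ X @ b_R, X.T @ y)      # normal equations hold (lambda = 0)
```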
Proof 5 (Theorem (3.11)). r(β̂, β) has to be minimized with respect to C under the restriction

CX = (c₁′; …; c_K′)X = (e₁′; …; e_K′) = I_K,

i.e.,

min_C {tr(XCC′X′) | CX − I = 0}.

This problem may be reformulated in terms of Lagrangian multipliers as

min_{C, λ_i} { tr(XCC′X′) − 2 Σ_{i=1}^K λ_i′(c_i′X − e_i′)′ }.  (B.30)

The (K × 1)-vectors λ_i of Lagrangian multipliers may be collected in the matrix

Λ = (λ₁′; …; λ_K′).  (B.31)

Differentiation of (B.30) with respect to C and Λ yields (Theorems A.63–A.67) the normal equations

X′XC − ΛX′ = 0,  (B.32)
CX − I = 0.  (B.33)

The matrix X′X is regular since rank(X) = K. Premultiplication of (B.32) with (X′X)⁻¹ leads to

C = (X′X)⁻¹ΛX′,

from which we have (using (B.33))

CX = (X′X)⁻¹Λ(X′X) = I_K,

namely,

Λ = I_K.

Therefore, the optimal matrix is

C = (X′X)⁻¹X′.

The resulting linear unbiased estimator is given by

β̂_opt = Cy = (X′X)⁻¹X′y,  (B.34)

and coincides with the descriptive or empirical OLS estimator b. The estimator b is unbiased since

CX = (X′X)⁻¹X′X = I_K  (B.35)

(see (3.47)), and it has the (K × K)-covariance matrix

V(b) = V_b = E(b − β)(b − β)′ = E{(X′X)⁻¹X′εε′X(X′X)⁻¹} = σ²(X′X)⁻¹.  (B.36)
Proof 6 (Theorem (3.12)). The equivalence is a direct consequence of the definition of definiteness. We will prove (a).

Let β̃ = C̃y be an arbitrary linear unbiased estimator. Define, without loss of generality,

C̃ = C + D = (X′X)⁻¹X′ + D.

Unbiasedness of β̃ requires that (3.47) is fulfilled:

C̃X = CX + DX = I.

In view of (B.35) it is necessary that

DX = 0.

For the covariance matrix of β̃ we get

V_β̃ = E(C̃y − β)(C̃y − β)′ = E(C̃ε)(ε′C̃′)
    = σ²[(X′X)⁻¹X′ + D][X(X′X)⁻¹ + D′]
    = σ²[(X′X)⁻¹ + DD′]
    = V_b + σ²DD′ ≥ V_b.

Corollary. Let V_β̃ − V_b ≥ 0. Denote by Var(b_i) and Var(β̃_i) the main diagonal elements of V_b and V_β̃. Then the following inequality holds for the components of the two vectors β̃ and b:

Var(β̃_i) − Var(b_i) ≥ 0 (i = 1, …, K).  (B.37)

Proof. From V_β̃ − V_b ≥ 0 we have a′(V_β̃ − V_b)a ≥ 0 for arbitrary vectors a, in particular for the unit vectors e_i′ = (0, …, 0, 1, 0, …, 0) with 1 at the ith position. For any symmetric matrix A we have e_i′Ae_i = a_ii, so e_i′(V_β̃ − V_b)e_i, the ith diagonal element of V_β̃ − V_b, is just the left-hand side of (B.37).
Proof 7 (Theorem (3.14)). Let d̂ = c′y be an arbitrary linear unbiased estimator of d, where c is a (T × 1)-vector. Without loss of generality we set

c′ = a′(X′X)⁻¹X′ + c̃′.

The unbiasedness of d̂ requires that

c′X = a′,

i.e.,

a′(X′X)⁻¹X′X + c̃′X = a′

and, therefore,

c̃′X = 0.  (B.38)

Using (3.94) we get

d̂ − d = a′β + a′(X′X)⁻¹X′ε + c̃′ε − a′β = a′(X′X)⁻¹X′ε + c̃′ε = c′ε.

The variance of d̂ is given by

Var(d̂) = E(d̂ − d)² = c′E(εε′)c = σ²c′c
        = σ²[a′(X′X)⁻¹X′ + c̃′][X(X′X)⁻¹a + c̃]
        = a′V_{b₀}a + σ²c̃′c̃.

As c̃′c̃ ≥ 0, the variance of d̂ will be minimized if c̃ = 0. The estimator c′y = a′(X′X)⁻¹X′y = a′b₀ is therefore the best estimator among all linear unbiased estimators, in the sense of minimum variance.
Proof 8. We may use the corollary following Theorem 3.1.

The condition of unbiasedness is a condition on the matrix C, namely,

CX = I.

The latter equation is solvable with respect to C if and only if the condition (B.2) holds, which here reduces to X⁻X = I_K. With the help of Theorem A.38(ii) we know that rank(X⁻X) = rank(X) = p < K, whereas rank(I_K) = K. Thus X⁻X = I_K cannot be valid, so that CX = I is not solvable.
Proof 9 (Theorem (3.15)). The proof consists of three parts.
(a) b(R) is unbiased.
With Rβ = 0 we also have R′Rβ = 0 (Theorems A.45 and A.46), such that

E(b(R)) = (X′X + R′R)⁻¹X′Xβ = (X′X + R′R)⁻¹(X′X + R′R)β = β.

b(R) fulfills the restriction:

Rb(R) = R(X′X + R′R)⁻¹X′y = 0 (compare (B.25)).
(b) We immediately get

b(R) − β = (D′D)⁻¹X′ε

and, therefore,

V_{b(R)} = E{(D′D)⁻¹X′εε′X(D′D)⁻¹} = σ²(D′D)⁻¹X′X(D′D)⁻¹.
(c) We now have to prove that b(R) is the best linear conditionally unbiased estimator of β under the restriction Rβ = 0, i.e., the best linear unbiased estimator in model (3.75). (A somewhat different proof is given by Tan (1971), who deals with multivariate models using generalized inverses.) Model (3.75) is then of the form

(y; 0) = (X; R)β + (ε; 0),  (B.39)

or, in new symbols (with T̃ = T + K − p), of the form

ỹ = Dβ + ε̃, where ỹ : T̃ × 1, D : T̃ × K, β : K × 1, ε̃ : T̃ × 1.  (B.40)

We have E(ε̃) = 0, E(ε̃ε̃′) = Ṽ = [σ²I 0; 0 0], and rank(D) = K, such that the model is singular. The estimator b(R) is still linear in ỹ:

b(R) = (D′D)⁻¹X′y = (D′D)⁻¹(X′y + R′0) = (D′D)⁻¹D′ỹ = Cỹ (C a (K × T̃)-matrix).  (B.41)

Since b(R) is conditionally unbiased, we have

CD = I.  (B.42)
Let β̃ = C̃ỹ + d be an arbitrary unbiased estimator of β in model (B.39). Without loss of generality, we write

C̃ = C + F with F = (F₁, F₂),  (B.43)

where C = (D′D)⁻¹D′ is the matrix from (B.41), F₁ is a (K × T)-matrix, and F₂ is a (K × (K − p))-matrix. Unbiasedness of β̃ in model (B.39) requires that

E(β̃) = C̃Dβ + d = β for all β,

from which we have d = 0 by choosing β = 0. A necessary condition for unbiasedness is thus given by

C̃Dβ = CDβ + FDβ = CDβ + F₁Xβ + F₂Rβ = β + F₁Xβ = β [Rβ = 0 and (B.42)]

and, thus,

F₁X = 0.  (B.44)

It follows that

β̃ − β = (C + F)Dβ + (C + F)ε̃ − β = (C + F)ε̃ = C̃ε̃,

and we can express the covariance matrix of β̃ in the following form:

V_β̃ = E(β̃ − β)(β̃ − β)′ = C̃ṼC̃′ = (C + F)Ṽ(C′ + F′) = CṼC′ + FṼF′ + FṼC′ + CṼF′.

Furthermore, we have (with E(ε̃ε̃′) = Ṽ, compare (B.40))

CṼC′ = V_{b(R)},
FṼF′ = (F₁, F₂)[σ²I 0; 0 0](F₁′; F₂′) = σ²F₁F₁′,

where σ²F₁F₁′ is nonnegative definite (Theorem A.18(v)).

For the mixed products it holds that

FṼC′ = (F₁, F₂)[σ²I 0; 0 0](X; R)(D′D)⁻¹ = σ²F₁X(D′D)⁻¹ = 0 [by (B.44)].  (B.45)

Finally, we get

V_β̃ − V_{b(R)} = σ²F₁F₁′ ≥ 0  (B.46)

and the asserted optimality of b(R) has been proven. Therefore, b(R) is a Gauss–Markov estimator of β in model (B.39).
Proof 10 (Testing Linear Hypotheses, Case s > 0). Let

X(G; R)⁻¹ = X̃ = (X̃₁, X̃₂), with X̃ : T × K, X̃₁ : T × s, X̃₂ : T × (K − s),

and

β̃₁ = Gβ (s × 1), β̃₂ = Rβ ((K − s) × 1).

Then the model can be rewritten as

y = Xβ + ε = X̃₁β̃₁ + X̃₂β̃₂ + ε.
Proof 11 (Testing Linear Hypotheses, Distribution of F). In what follows, we will determine F and its distribution for the two special cases of the general linear hypothesis.

Distribution of F

Case 1: s = 0. The ML estimators under H₀ (3.96) are given by

β̂ = β* and σ̂²_ω = (1/T)(y − Xβ*)′(y − Xβ*).  (B.47)

The ML estimators over Ω are available from Theorem 3.18:

β̂ = b and σ̂²_Ω = (1/T)(y − Xb)′(y − Xb).  (B.48)

Subsequent modifications then yield

b − β* = (X′X)⁻¹X′(y − Xβ*),
(b − β*)′X′X = (y − Xβ*)′X,
y − Xb = (y − Xβ*) − X(b − β*),
(y − Xb)′(y − Xb) = (y − Xβ*)′(y − Xβ*) + (b − β*)′X′X(b − β*) − 2(y − Xβ*)′X(b − β*)
                  = (y − Xβ*)′(y − Xβ*) − (b − β*)′X′X(b − β*).  (B.49)

It follows that

T(σ̂²_ω − σ̂²_Ω) = (b − β*)′X′X(b − β*),  (B.50)

and we now have the test statistic

F = [(b − β*)′X′X(b − β*)] / [(y − Xb)′(y − Xb)] · (T − K)/K.  (B.51)

Numerator: The following statements hold:

b − β* = (X′X)⁻¹X′[ε + X(β − β*)] [by (B.49)],
ε̃ = ε + X(β − β*) ∼ N(X(β − β*), σ²I) [Theorem A.55],
X(X′X)⁻¹X′ is idempotent and of rank K,
(b − β*)′X′X(b − β*) = ε̃′X(X′X)⁻¹X′ε̃ ∼ σ²χ²_K(σ⁻²(β − β*)′X′X(β − β*)) [Theorem A.57],

and ∼ σ²χ²_K under H₀.

Denominator:

(y − Xb)′(y − Xb) = (T − K)s² = ε′Mε [by (3.62)],
M = I − X(X′X)⁻¹X′ idempotent of rank T − K [A.36(vi)],
ε′Mε ∼ σ²χ²_{T−K} [Theorem A.60].  (B.52)

We have

MX(X′X)⁻¹X′ = 0 [Theorem A.36(vi)],  (B.53)

such that the numerator and denominator are independently distributed (Theorem A.62).

Thus (Theorem A.59) the ratio F exhibits the following properties:

• F is distributed as F_{K,T−K}(σ⁻²(β − β*)′X′X(β − β*)) under H₁; and

• F is distributed as central F_{K,T−K} under H₀ : β = β*.

If we denote by F_{m,n,1−q} the (1 − q)-quantile of F_{m,n} (i.e., P(F ≤ F_{m,n,1−q}) = 1 − q), then we may derive a uniformly most powerful test, given a fixed level of significance α (cf. Lehmann, 1986, p. 372):

region of acceptance of H₀: 0 ≤ F ≤ F_{K,T−K,1−α},
critical region for H₀: F > F_{K,T−K,1−α}.  (B.54)

A selection of critical values is provided in Appendix C.
Case 2: s > 0. Next we consider a decomposition of the model in order to determine the ML estimators under H₀ (3.97) and compare them with the corresponding ML estimators over Ω. Let

β′ = (β₁′, β₂′), with β₁ : s × 1 and β₂ : (K − s) × 1,  (B.55)

and, respectively,

y = Xβ + ε = X₁β₁ + X₂β₂ + ε.  (B.56)

We set

ỹ = y − X₂r.  (B.57)

Since rank(X) = K, we have

rank(X₁) = s, rank(X₂) = K − s, with X₁ : T × s and X₂ : T × (K − s),  (B.58)

such that the inverse matrices (X₁′X₁)⁻¹ and (X₂′X₂)⁻¹ exist.

The ML estimators under H₀ are then given by

β̂₂ = r, β̂₁ = (X₁′X₁)⁻¹X₁′ỹ,  (B.59)

and

σ̂²_ω = (1/T)(ỹ − X₁β̂₁)′(ỹ − X₁β̂₁).  (B.60)
Separation of b

It can easily be seen that

b = (X′X)⁻¹X′y = [X₁′X₁ X₁′X₂; X₂′X₁ X₂′X₂]⁻¹ (X₁′y; X₂′y).  (B.61)

Making use of the formulas for the inverse of a partitioned matrix (Theorem A.4) yields

[(X₁′X₁)⁻¹[I + X₁′X₂D⁻¹X₂′X₁(X₁′X₁)⁻¹]   −(X₁′X₁)⁻¹X₁′X₂D⁻¹;
 −D⁻¹X₂′X₁(X₁′X₁)⁻¹                      D⁻¹],  (B.62)

where

D = X₂′M₁X₂  (B.63)

and

M₁ = I − X₁(X₁′X₁)⁻¹X₁′ = I − P_{X₁}.  (B.64)
M₁ is (analogously to M) idempotent and of rank T − s; furthermore, we have M₁X₁ = 0. The ((K − s) × (K − s))-matrix

D = X₂′X₂ − X₂′X₁(X₁′X₁)⁻¹X₁′X₂  (B.65)

is symmetric and regular, as the normal equations are uniquely solvable.

The components b₁ and b₂ of b are then given by

b = (b₁; b₂) = ( (X₁′X₁)⁻¹X₁′y − (X₁′X₁)⁻¹X₁′X₂D⁻¹X₂′M₁y ; D⁻¹X₂′M₁y ).  (B.66)

Various relations immediately become apparent from (B.66):

b₂ = D⁻¹X₂′M₁y,
b₁ = (X₁′X₁)⁻¹X₁′(y − X₂b₂),
b₂ − r = D⁻¹X₂′M₁(y − X₂r) = D⁻¹X₂′M₁ỹ = D⁻¹X₂′M₁(ε + X₂(β₂ − r)),  (B.67)

b₁ − β̂₁ = (X₁′X₁)⁻¹X₁′(y − X₂b₂ − ỹ) = −(X₁′X₁)⁻¹X₁′X₂(b₂ − r) = −(X₁′X₁)⁻¹X₁′X₂D⁻¹X₂′M₁ỹ.  (B.68)
Decomposition of σ̂²_Ω

We write (introducing symbols u and v)

(y − Xb) = (y − X₂r − X₁β̂₁) − (X₁(b₁ − β̂₁) + X₂(b₂ − r)) = u − v.  (B.69)

Thus, we may decompose the ML estimator Tσ̂²_Ω = (y − Xb)′(y − Xb) as

(y − Xb)′(y − Xb) = u′u + v′v − 2u′v.  (B.70)

We have

u = y − X₂r − X₁β̂₁ = ỹ − X₁(X₁′X₁)⁻¹X₁′ỹ = M₁ỹ,  (B.71)
u′u = ỹ′M₁ỹ,  (B.72)
v = X₁(b₁ − β̂₁) + X₂(b₂ − r)
  = −X₁(X₁′X₁)⁻¹X₁′X₂D⁻¹X₂′M₁ỹ  [by (B.68)]
  + X₂D⁻¹X₂′M₁ỹ  [by (B.67)]
  = M₁X₂D⁻¹X₂′M₁ỹ,  (B.73)
v′v = ỹ′M₁X₂D⁻¹X₂′M₁ỹ = (b₂ − r)′D(b₂ − r),  (B.74)
u′v = v′v.  (B.75)

Summarizing, we may state

(y − Xb)′(y − Xb) = u′u − v′v = (ỹ − X₁β̂₁)′(ỹ − X₁β̂₁) − (b₂ − r)′D(b₂ − r)  (B.76)

or

T(σ̂²_ω − σ̂²_Ω) = (b₂ − r)′D(b₂ − r).  (B.77)

Hence, for Case 2: s > 0, we get

F = [(b₂ − r)′D(b₂ − r)] / [(y − Xb)′(y − Xb)] · (T − K)/(K − s).  (B.78)
Distribution of F

Numerator: We use the following relations:

A = M₁X₂D⁻¹X₂′M₁ is idempotent,
rank(A) = tr(A) = tr{(M₁X₂D⁻¹)(X₂′M₁)} = tr{(X₂′M₁)(M₁X₂D⁻¹)} [Theorem A.1(iv)] = tr(I_{K−s}) = K − s,
b₂ − r = D⁻¹X₂′M₁ε̃ [by (B.67)],
ε̃ = ε + X₂(β₂ − r) ∼ N(X₂(β₂ − r), σ²I) [Theorem A.55],
(b₂ − r)′D(b₂ − r) = ε̃′Aε̃ ∼ σ²χ²_{K−s}(σ⁻²(β₂ − r)′D(β₂ − r))  (B.79)

[Theorem A.57] and

∼ σ²χ²_{K−s} under H₀.  (B.80)

Denominator: The denominator is equal in both cases, i.e., with P_X = X(X′X)⁻¹X′, we have

(y − Xb)′(y − Xb) = ε′(I − P_X)ε ∼ σ²χ²_{T−K}.  (B.81)

Since

(I − P_X)X = (I − P_X)(X₁, X₂) = ((I − P_X)X₁, (I − P_X)X₂) = (0, 0),  (B.82)

we find

(I − P_X)M₁ = (I − P_X)  (B.83)

and

(I − P_X)A = (I − P_X)M₁X₂D⁻¹X₂′M₁ = 0,  (B.84)

such that the numerator and denominator of F (B.78) are independently distributed (Theorem A.62). Hence (see also Theorem A.59), the test statistic F is distributed under H₁ as F_{K−s,T−K}(σ⁻²(β₂ − r)′D(β₂ − r)) and as central F_{K−s,T−K} under H₀.
Proof 12 (Theorem (3.20)). Write

R²_X − R²_{X₁} = (RSS_{X₁} − RSS_X)/SYY,

such that the assertion (3.161) is equivalent to

RSS_{X₁} − RSS_X ≥ 0.

Since

RSS_X = (y − Xb)′(y − Xb) = y′y + b′X′Xb − 2b′X′y = y′y − b′X′y  (B.85)

and, analogously,

RSS_{X₁} = y′y − β̂₁′X₁′y,

where

b = (X′X)⁻¹X′y and β̂₁ = (X₁′X₁)⁻¹X₁′y

are the OLS estimators in the full model and in the submodel, we have

RSS_{X₁} − RSS_X = b′X′y − β̂₁′X₁′y.  (B.86)

Now we have, with (B.61)–(B.67),

b′X′y = (b₁′, b₂′)(X₁′y; X₂′y)
      = (y′ − b₂′X₂′)X₁(X₁′X₁)⁻¹X₁′y + b₂′X₂′y
      = β̂₁′X₁′y + b₂′X₂′M₁y  (cf. (B.76)).

Thus, (B.86) becomes

RSS_{X₁} − RSS_X = b₂′X₂′M₁y = y′M₁X₂D⁻¹X₂′M₁y ≥ 0,  (B.87)

such that (3.161) is proven.
Proof 13 (Transformation for General Linear Regression). The matrices W and W⁻¹ may be decomposed (see also Theorem A.12(iii)) as

W = MM and W⁻¹ = NN,  (B.88)

where M = W^{1/2} and N = W^{−1/2} are nonsingular. We transform the model (3.166) by premultiplication with N:

Ny = NXβ + Nε  (B.89)

and set

ỹ = Ny, X̃ = NX, ε̃ = Nε.  (B.90)

Then it holds that

E(ε̃) = E(Nε) = 0, E(ε̃ε̃′) = E(Nεε′N) = σ²NWN = σ²I,  (B.91)

such that the transformed model ỹ = X̃β + ε̃ obeys all assumptions of the classical regression model. The OLS estimator of β in this model is of the form

b = (X̃′X̃)⁻¹X̃′ỹ = (X′NN′X)⁻¹X′NN′y = (X′W⁻¹X)⁻¹X′W⁻¹y.  (B.92)
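The transformation argument of Proof 13 can be replayed numerically: OLS on the N-transformed data must reproduce the Aitken (GLS) estimator (B.92). A sketch with an illustrative diagonal W:

```python
import numpy as np

# Sketch of Proof 13: OLS after premultiplying by N = W^{-1/2} equals GLS.
# Design, W, and beta are illustrative.
rng = np.random.default_rng(8)
T, K = 30, 3
X = rng.standard_normal((T, K))
W = np.diag(rng.uniform(0.5, 2.0, size=T))       # positive definite covariance scale
beta = np.array([1.0, -1.0, 0.5])
y = X @ beta + rng.multivariate_normal(np.zeros(T), W)

vals, vecs = np.linalg.eigh(W)                   # symmetric square root W^{-1/2}
N = vecs @ np.diag(vals ** -0.5) @ vecs.T

Xt, yt = N @ X, N @ y                            # transformed (classical) model
b_ols_transformed = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
b_gls = np.linalg.solve(X.T @ np.linalg.inv(W) @ X,
                        X.T @ np.linalg.inv(W) @ y)
assert np.allclose(b_ols_transformed, b_gls)     # (B.92)
```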
Proof 14 (Smallest Variance for Aitken Estimator). Let β̃ = C̃y be an arbitrary linear unbiased estimator of β. We set

C̃ = C + D  (B.93)

with

C = S⁻¹X′W⁻¹.  (B.94)

The unbiasedness of β̃ leads to the condition DX = 0, such that CWD′ = 0. Therefore, we get, for the covariance matrix,

V_β̃ = E(C̃εε′C̃′) = σ²(C + D)W(C′ + D′) = σ²CWC′ + σ²DWD′ = V_b + σ²DWD′,  (B.95)

such that V_β̃ − V_b = σ²DWD′ is nonnegative definite (Theorem A.18(v)).
Proof 15 (Estimation of σ²). Here we have

ε̂ = y − Xβ̂ = (I − X(X′AX)⁻¹X′A)ε,

(T − K)σ̂² = ε̂′ε̂ = tr{(I − X(X′AX)⁻¹X′A)εε′(I − AX(X′AX)⁻¹X′)},

E(σ̂²)(T − K) = σ² tr(W − X(X′AX)⁻¹X′A) + tr{σ²X(X′AX)⁻¹X′A(I − 2W) + XV_β̂X′}.  (B.96)

If we choose the standardization tr(W) = T, then the first term in (B.96) becomes σ²(T − K) (Theorem A.1). In the case β̂ = (X′X)⁻¹X′y (i.e., A = I), we get

E(σ̂²) = σ² + σ²/(T − K) · tr[X(X′X)⁻¹X′(I − W)] = σ² + σ²/(T − K) · (K − tr[(X′X)⁻¹X′WX]).  (B.97)
Proof 16 (Decomposition of P). Assume that X is partitioned as X = (X₁, X₂) with X₁ : T × p, rank(X₁) = p, and X₂ : T × (K − p), rank(X₂) = K − p. Let P₁ = X₁(X₁′X₁)⁻¹X₁′ be the (idempotent) prediction matrix for X₁, and let W = (I − P₁)X₂ be the projection of the columns of X₂ onto the orthogonal complement of X₁. Then the matrix P₂ = W(W′W)⁻¹W′ is the prediction matrix for W, and P can be expressed as (using Theorem A.45)

P = P₁ + P₂  (B.98)

or

X(X′X)⁻¹X′ = X₁(X₁′X₁)⁻¹X₁′ + (I − P₁)X₂[X₂′(I − P₁)X₂]⁻¹X₂′(I − P₁).  (B.99)

Equation (B.98) shows that the prediction matrix P can be decomposed into the sum of two (or more) prediction matrices. Applying the decomposition (B.99) to the linear model including a dummy variable, i.e., y = 1α + Xβ + ε, we obtain

P = 11′/T + X̃(X̃′X̃)⁻¹X̃′ = P₁ + P₂  (B.100)

and

p_ii = 1/T + x̃_i′(X̃′X̃)⁻¹x̃_i,  (B.101)

where X̃ = (x_ij − x̄_j) is the matrix of the mean-corrected x-values. This is seen as follows. Application of (B.99) to (1, X) gives

P₁ = 1(1′1)⁻¹1′ = 11′/T  (B.102)

and

W = (I − P₁)X = X − 1(1′X/T) = X − (1x̄₁, 1x̄₂, …, 1x̄_K) = (x₁ − x̄₁1, …, x_K − x̄_K1) = X̃.  (B.103)

Since X̃′1 = 0 and hence P₂1 = 0, we get, from (B.100),

P1 = (1/T)1(1′1) + 0 = 1.  (B.104)
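The decomposition (B.100) and the leverage formula (B.101) can be verified directly; a sketch with an illustrative design including an intercept:

```python
import numpy as np

# Sketch of Proof 16: with an intercept, P = 11'/T + Xc (Xc'Xc)^{-1} Xc'
# for mean-centered Xc, so p_ii = 1/T + xc_i'(Xc'Xc)^{-1} xc_i.
# The data are illustrative.
rng = np.random.default_rng(9)
T, K = 12, 2
X = rng.standard_normal((T, K))
Z = np.column_stack([np.ones(T), X])             # design (1, X)
P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T             # prediction (hat) matrix

Xc = X - X.mean(axis=0)                          # mean-corrected x-values
P2 = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
assert np.allclose(P, np.ones((T, T)) / T + P2)  # (B.100)

pii = 1 / T + np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(Xc.T @ Xc), Xc)
assert np.allclose(np.diag(P), pii)              # (B.101)
```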
Proof 17 (Property (ii)). Since P is nonnegative definite, we have x′Px ≥ 0 for all x and, in particular, for x_ij = (0, …, 0, x_i, 0, …, 0, x_j, 0, …, 0)′, where x_i and x_j occur at the ith and jth positions (i ≠ j). This gives

x_ij′Px_ij = (x_i, x_j) [p_ii p_ij; p_ji p_jj] (x_i; x_j) ≥ 0.

Therefore, P_ij = [p_ii p_ij; p_ji p_jj] is nonnegative definite, and hence its determinant is nonnegative:

|P_ij| = p_ii p_jj − p_ij² ≥ 0.
Proof 18 (Property (iv)). Analogously to (ii), using I − P instead of P leads to (3.198).

We have

p_ii + ε̂_i²/(ε̂′ε̂) ≤ 1.  (B.105)

Proof. Let Z = (X, y), P_X = X(X′X)⁻¹X′, and P_Z = Z(Z′Z)⁻¹Z′. Then (B.99) and (3.181) imply

P_Z = P_X + (I − P_X)yy′(I − P_X)/(y′(I − P_X)y) = P_X + ε̂ε̂′/(ε̂′ε̂).  (B.106)

Hence we find that the ith diagonal element of P_Z is equal to p_ii + ε̂_i²/(ε̂′ε̂). If we now use (3.192), then (B.105) follows.
Proof 19 (p_ij in Multiple Regression). The proof is straightforward by using the spectral decomposition X′X = ΓΛΓ′ and the definitions of p_ij and p_ii (cf. (3.182)), i.e.,

p_ij = x_i′(X′X)⁻¹x_j = x_i′ΓΛ⁻¹Γ′x_j = Σ_{r=1}^K λ_r⁻¹ (x_i′γ_r)(x_j′γ_r) = ‖x_i‖ ‖x_j‖ Σ_r λ_r⁻¹ cos θ_ir cos θ_jr,

where ‖x_i‖ = (x_i′x_i)^{1/2} is the norm of the vector x_i.
Proof 20 (Likelihood–Ratio Test Statistic). Applying relationship (B.99) we obtain

$$(X, e_i)\left[(X, e_i)'(X, e_i)\right]^{-1}(X, e_i)' = P + \frac{(I - P)e_ie_i'(I - P)}{e_i'(I - P)e_i} \,. \qquad (B.107)$$

The left-hand side may be interpreted as the prediction matrix $P_{(i)}$ when the ith observation is omitted. Therefore, we may conclude that

$$\mathrm{SSE}(H_1) = (T - K - 1)s_{(i)}^2 = y_{(i)}'(I - P_{(i)})y_{(i)} = y'\left(I - P - \frac{(I - P)e_ie_i'(I - P)}{e_i'(I - P)e_i}\right)y = \mathrm{SSE}(H_0) - \frac{\varepsilon_i^2}{1 - p_{ii}} \qquad (B.108)$$

holds, where we have made use of the relationships $(I - P)y = \varepsilon$ and $e_i'\varepsilon = \varepsilon_i$ and, moreover, $e_i'Ie_i = 1$ and $e_i'Pe_i = p_{ii}$.
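The deletion identity (B.108) can be spot-checked directly by refitting without the ith row. A minimal sketch on invented data (NumPy assumed):

```python
import numpy as np

# Check SSE(H1) = SSE(H0) - eps_i^2 / (1 - p_ii)  for one deleted row i.
rng = np.random.default_rng(2)
T, K, i = 20, 3, 5
X = rng.normal(size=(T, K))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=T)

P = X @ np.linalg.inv(X.T @ X) @ X.T
eps = y - P @ y
sse_full = eps @ eps                             # SSE(H0)

Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
Pi = Xi @ np.linalg.inv(Xi.T @ Xi) @ Xi.T
sse_del = (yi - Pi @ yi) @ (yi - Pi @ yi)        # SSE(H1)

ok_B108 = np.isclose(sse_del, sse_full - eps[i]**2 / (1 - P[i, i]))
```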
Proof 21 (Andrews–Pregibon Statistic). Define $Z = (X, y)$ and consider the partitioned matrix

$$Z'Z = \begin{pmatrix} X'X & X'y \\ y'X & y'y \end{pmatrix} . \qquad (B.109)$$

Since $\mathrm{rank}(X'X) = K$, we get (cf. Theorem A.2(vii))

$$|Z'Z| = |X'X|\,|y'y - y'X(X'X)^{-1}X'y| = |X'X|\,\bigl(y'(I - P)y\bigr) = |X'X|(T - K)s^2 \,. \qquad (B.110)$$

Analogously, defining $Z_{(i)} = (X_{(i)}, y_{(i)})$, we get

$$|Z_{(i)}'Z_{(i)}| = |X_{(i)}'X_{(i)}|(T - K - 1)s_{(i)}^2 \,. \qquad (B.111)$$

Therefore the ratio (3.224) becomes

$$\frac{|Z_{(i)}'Z_{(i)}|}{|Z'Z|} \,. \qquad (B.112)$$

Proof 22 (Another Notation of the Andrews–Pregibon Statistic). Using

$$Z_{(i)}'Z_{(i)} = Z'Z - z_iz_i'$$

with $z_i = (x_i', y_i)'$ and Theorem A.2(x), we obtain

$$|Z_{(i)}'Z_{(i)}| = |Z'Z - z_iz_i'| = |Z'Z|\bigl(1 - z_i'(Z'Z)^{-1}z_i\bigr) = |Z'Z|(1 - p_{ii}^z) \,.$$
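Proof 22's determinant identity can be illustrated numerically; a sketch on invented data (NumPy assumed):

```python
import numpy as np

# With Z = (X, y) and z_i its ith row, the Andrews-Pregibon ratio
# |Z_(i)'Z_(i)| / |Z'Z| equals 1 - p_ii^z, the ith leverage of Z.
rng = np.random.default_rng(3)
T, K, i = 18, 2, 4
X = rng.normal(size=(T, K))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=T)

Z = np.column_stack([X, y])
Zi = np.delete(Z, i, axis=0)
pz_ii = (Z @ np.linalg.inv(Z.T @ Z) @ Z.T)[i, i]
ratio = np.linalg.det(Zi.T @ Zi) / np.linalg.det(Z.T @ Z)

ok_ap = np.isclose(ratio, 1 - pz_ii)
```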
Proof 23 (Lemma 3.25). Using Theorem A.3(iv),

$$(X'X)^{-1} = (X_{(i)}'X_{(i)} + x_ix_i')^{-1} = (X_{(i)}'X_{(i)})^{-1} - \frac{(X_{(i)}'X_{(i)})^{-1}x_ix_i'(X_{(i)}'X_{(i)})^{-1}}{1 + t_{ii}} \,,$$

where

$$t_{ii} = x_i'(X_{(i)}'X_{(i)})^{-1}x_i \,.$$

We have

$$P = X(X'X)^{-1}X' = \begin{pmatrix} X_{(i)} \\ x_i' \end{pmatrix}\left((X_{(i)}'X_{(i)})^{-1} - \frac{(X_{(i)}'X_{(i)})^{-1}x_ix_i'(X_{(i)}'X_{(i)})^{-1}}{1 + t_{ii}}\right)\bigl(X_{(i)}' \;\; x_i\bigr)$$

and

$$Py = X(X'X)^{-1}X'y = \begin{pmatrix} X_{(i)}\hat\beta_{(i)} - \dfrac{1}{1 + t_{ii}}\,X_{(i)}(X_{(i)}'X_{(i)})^{-1}x_i\bigl(x_i'\hat\beta_{(i)} - y_i\bigr) \\[2mm] \dfrac{1}{1 + t_{ii}}\bigl(x_i'\hat\beta_{(i)} + t_{ii}y_i\bigr) \end{pmatrix} .$$

Since

$$(I - P)e_i = \frac{1}{1 + t_{ii}}\begin{pmatrix} -X_{(i)}(X_{(i)}'X_{(i)})^{-1}x_i \\ 1 \end{pmatrix}$$

and

$$\|(I - P)e_i\|^2 = \frac{1}{1 + t_{ii}} \,,$$

we get, with $\tilde e_i = (I - P)e_i / \|(I - P)e_i\|$,

$$\tilde e_i\tilde e_i'y = \frac{1}{1 + t_{ii}}\begin{pmatrix} X_{(i)}(X_{(i)}'X_{(i)})^{-1}x_i\bigl(x_i'\hat\beta_{(i)} - y_i\bigr) \\ -x_i'\hat\beta_{(i)} + y_i \end{pmatrix} .$$

Therefore,

$$X(X'X)^{-1}X'y + \tilde e_i\tilde e_i'y = \begin{pmatrix} X_{(i)}\hat\beta_{(i)} \\ y_i \end{pmatrix} .$$
Proof 24 (Lemma 3.26). Using the fact that

$$\begin{pmatrix} X'X & X'e_i \\ e_i'X & e_i'e_i \end{pmatrix}^{-1} = \begin{pmatrix} (X'X)^{-1} + (X'X)^{-1}X'e_i H e_i'X(X'X)^{-1} & -(X'X)^{-1}X'e_i H \\ -He_i'X(X'X)^{-1} & H \end{pmatrix} ,$$

where

$$H = \bigl(e_i'e_i - e_i'X(X'X)^{-1}X'e_i\bigr)^{-1} = \bigl(e_i'(I - P)e_i\bigr)^{-1} = \frac{1}{\|Qe_i\|^2}$$

with $Q = I - P$, we can show that $P(X, e_i)$, the projection matrix onto the column space of $(X, e_i)$, becomes

$$P(X, e_i) = (X \;\; e_i)\begin{pmatrix} X'X & X'e_i \\ e_i'X & e_i'e_i \end{pmatrix}^{-1}\begin{pmatrix} X' \\ e_i' \end{pmatrix} = P + \frac{(I - P)e_ie_i'(I - P)}{\|Qe_i\|^2} = P + \tilde e_i\tilde e_i' \,,$$

where $\tilde e_i = (I - P)e_i / \|Qe_i\|$. Therefore

$$\hat y(\lambda) = X(X'X)^{-1}X'y + \lambda\tilde e_i\tilde e_i'y = \hat y(0) + \lambda\bigl(P(X, e_i) - P\bigr)y = \hat y(0) + \lambda\bigl(\hat y(1) - \hat y(0)\bigr) = \lambda\hat y(1) + (1 - \lambda)\hat y(0)$$

and property (ii) can be proved by the fact that

$$\varepsilon(\lambda) = y - \hat y(\lambda) = y - \hat y(0) - \lambda\bigl(\hat y(1) - \hat y(0)\bigr) = \varepsilon - \lambda\bigl(\hat y(1) - \hat y(0)\bigr) \,.$$
B.2 Single–Factor Experiments with Fixed and Random Effects
Proof 25 (OLS Estimate for s = 2). The multiplication of (4.11), by rows, with (4.12) yields

$$\hat\mu = \frac{n_1n_2(1 + n)Y_{\cdot\cdot} - n_1n_2Y_{1\cdot} - n_1n_2Y_{2\cdot}}{n_1n_2n^2} = \frac{nY_{\cdot\cdot}}{n^2} = \frac{Y_{\cdot\cdot}}{n} = \bar y_{\cdot\cdot} \qquad (\text{using } Y_{1\cdot} + Y_{2\cdot} = Y_{\cdot\cdot}),$$

$$\begin{aligned}
\hat\alpha_1 &= \frac{-n_1n_2Y_{\cdot\cdot} + n_2\bigl(n(1 + n_2) - n_2\bigr)Y_{1\cdot} - n_1n_2(n - 1)Y_{2\cdot}}{n_1n_2n^2} \\
&= -\frac{Y_{\cdot\cdot}}{n^2} + \frac{n + nn_2 - n_2}{n_1n^2}\,Y_{1\cdot} - \frac{n - 1}{n^2}\,(Y_{\cdot\cdot} - Y_{1\cdot}) \\
&= Y_{1\cdot}\,\frac{n + nn_2 - n_2 + nn_1 - n_1}{n_1n^2} - Y_{\cdot\cdot}\,\frac{1 + n - 1}{n^2} \\
&= \frac{Y_{1\cdot}}{n_1} - \frac{Y_{\cdot\cdot}}{n} = \bar y_{1\cdot} - \bar y_{\cdot\cdot}
\end{aligned}$$

(note that $n + nn_2 - n_2 + nn_1 - n_1 = n + n^2 - n = n^2$) and, analogously,

$$\hat\alpha_2 = \bar y_{2\cdot} - \bar y_{\cdot\cdot} \,.$$
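A minimal sketch (data invented; NumPy assumed) confirming the closed-form solution for s = 2 groups:

```python
import numpy as np

# In the one-factor model with side condition n1*a1 + n2*a2 = 0, the OLS
# solutions are mu = ybar.. and alpha_i = ybar_i. - ybar.. .
y1 = np.array([4.0, 5.0, 6.0])          # group 1, n1 = 3
y2 = np.array([7.0, 9.0])               # group 2, n2 = 2
y = np.concatenate([y1, y2])

mu_hat = y.mean()
a1_hat = y1.mean() - mu_hat
a2_hat = y2.mean() - mu_hat

side_condition = len(y1) * a1_hat + len(y2) * a2_hat   # should be 0
```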
Proof 26 (Proof of the F–Distribution of $F_{1,n-s}$). We first prove the result for the denominator.

(i) Denominator. First, we derive a representation of $MS_{\text{Error}}$ as a quadratic form in the total error vector $\varepsilon$ (cf. (4.4)). With (4.2) and (4.42) we have

$$y_{ij} - \bar y_{i\cdot} = \varepsilon_{ij} - \bar\varepsilon_{i\cdot} \quad (\text{all } i, j),$$

$$\varepsilon_i - 1_{n_i}\bar\varepsilon_{i\cdot} = \varepsilon_i - \frac{1}{n_i}1_{n_i}1_{n_i}'\varepsilon_i = \left(I_{n_i} - \frac{1}{n_i}1_{n_i}1_{n_i}'\right)\varepsilon_i = Q_i\varepsilon_i \,, \qquad (B.113)$$

$$\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_s \end{pmatrix} - \begin{pmatrix} 1_{n_1}\bar\varepsilon_{1\cdot} \\ \vdots \\ 1_{n_s}\bar\varepsilon_{s\cdot} \end{pmatrix} = \begin{pmatrix} Q_1 & & 0 \\ & \ddots & \\ 0 & & Q_s \end{pmatrix}\varepsilon = \operatorname{diag}(Q_1, \ldots, Q_s)\,\varepsilon = Q\varepsilon \,. \qquad (B.114)$$
The matrices $Q_i = I_{n_i} - (1/n_i)1_{n_i}1_{n_i}'$ are symmetric, $Q_i = Q_i'$; hence we have $Q = Q'$. Furthermore, $Q_i$ is idempotent:

$$Q_i^2 = I_{n_i} + \frac{1}{n_i^2}1_{n_i}1_{n_i}'1_{n_i}1_{n_i}' - \frac{2}{n_i}1_{n_i}1_{n_i}' = Q_i \,,$$

with $\operatorname{rank}(Q_i) = \operatorname{tr}(Q_i) = n_i - 1$. Hence, $Q$ is idempotent as well, with $\operatorname{rank}(Q) = \sum \operatorname{rank}(Q_i) = n - s$. This yields the following representation:

$$MS_{\text{Error}} = \frac{1}{n - s}\,\varepsilon'Q\varepsilon \,. \qquad (B.115)$$

(ii) Numerator. We have

$$\bar y = \begin{pmatrix} \bar y_{1\cdot} \\ \vdots \\ \bar y_{s\cdot} \end{pmatrix} = \begin{pmatrix} \mu + \alpha_1 + \bar\varepsilon_{1\cdot} \\ \vdots \\ \mu + \alpha_s + \bar\varepsilon_{s\cdot} \end{pmatrix} . \qquad (B.116)$$

Under

$$H_0: \; c'\mu = c'\begin{pmatrix} \mu + \alpha_1 \\ \vdots \\ \mu + \alpha_s \end{pmatrix} = 0 \qquad (B.117)$$

we have

$$c'\bar y = c'\begin{pmatrix} \bar\varepsilon_{1\cdot} \\ \vdots \\ \bar\varepsilon_{s\cdot} \end{pmatrix} = c'\bar\varepsilon \qquad (B.118)$$

with

$$\bar\varepsilon = \begin{pmatrix} (1/n_1)1_{n_1}' & & 0' \\ & \ddots & \\ 0' & & (1/n_s)1_{n_s}' \end{pmatrix}\varepsilon = \operatorname{diag}(D_1', \ldots, D_s')\,\varepsilon = D'\varepsilon \,. \qquad (B.119)$$
Hence, the numerator of F [(4.58)] can also be represented as a quadratic form in $\varepsilon$ according to

$$\frac{(c'\bar y)^2}{\sum c_i^2/n_i} = \frac{1}{\sum c_i^2/n_i}\,\varepsilon'Dcc'D'\varepsilon \,. \qquad (B.120)$$

The matrix of this quadratic form is symmetric and idempotent:

$$\left(\frac{1}{\sum c_i^2/n_i}\,Dcc'D'\right)^2 = \frac{1}{\sum c_i^2/n_i}\,Dcc'D' \,. \qquad (B.121)$$

We check this for s = 2. We have

$$Dcc'D' = \begin{pmatrix} (1/n_1)1_{n_1} & 0 \\ 0 & (1/n_2)1_{n_2} \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}(c_1 \;\; c_2)\begin{pmatrix} (1/n_1)1_{n_1}' & 0' \\ 0' & (1/n_2)1_{n_2}' \end{pmatrix} = \begin{pmatrix} \dfrac{c_1^2}{n_1^2}1_{n_1}1_{n_1}' & \dfrac{c_1c_2}{n_1n_2}1_{n_1}1_{n_2}' \\[2mm] \dfrac{c_1c_2}{n_1n_2}1_{n_2}1_{n_1}' & \dfrac{c_2^2}{n_2^2}1_{n_2}1_{n_2}' \end{pmatrix}$$

and, hence,

$$(Dcc'D')^2 = \left(\frac{c_1^2}{n_1} + \frac{c_2^2}{n_2}\right)(Dcc'D') \,.$$

From this the idempotence follows (cf. (B.121)). Furthermore, we have (cf. A.36(ii))

$$\operatorname{rank}\left(\frac{Dcc'D'}{\sum c_i^2/n_i}\right) = \operatorname{tr}\left(\frac{Dcc'D'}{\sum c_i^2/n_i}\right) = 1 \,,$$

since $\operatorname{tr}(1_{n_i}1_{n_i}') = n_i$.

(iii) Independence of numerator and denominator. The numerator and denominator of F from (4.58) are quadratic forms in $\varepsilon$ with idempotent matrices; hence they have a $\chi_1^2$- or $\chi_{n-s}^2$-distribution, respectively. According to Theorem A.61, their ratio has an $F_{1,n-s}$-distribution if

$$\frac{1}{\sum c_i^2/n_i}\,QDcc'D' = 0 \,. \qquad (B.122)$$

As can easily be seen, we have

$$QD = \begin{pmatrix} Q_1D_1 & & 0 \\ & \ddots & \\ 0 & & Q_sD_s \end{pmatrix}$$

and

$$Q_iD_i = \left(I_{n_i} - \frac{1}{n_i}1_{n_i}1_{n_i}'\right)\frac{1}{n_i}1_{n_i} = \frac{1}{n_i}1_{n_i} - \frac{1}{n_i}1_{n_i} = 0 \,.$$
Hence QD = 0 and (B.122) holds.
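The matrix facts used in Proof 26 are easy to verify numerically for small groups. A sketch with two invented group sizes (NumPy assumed):

```python
import numpy as np

# For s = 2 groups: Q = diag(Q1, Q2) is symmetric idempotent with rank n - s,
# the numerator matrix D c c'D' / (sum c_i^2/n_i) is idempotent of rank 1,
# and Q D = 0, which gives independence of numerator and denominator.
n = [3, 4]
Qs = [np.eye(ni) - np.ones((ni, ni)) / ni for ni in n]
Q = np.block([[Qs[0], np.zeros((n[0], n[1]))],
              [np.zeros((n[1], n[0])), Qs[1]]])
D = np.block([[np.ones((n[0], 1)) / n[0], np.zeros((n[0], 1))],
              [np.zeros((n[1], 1)), np.ones((n[1], 1)) / n[1]]])
c = np.array([1.0, -1.0])                       # a contrast
M = np.outer(D @ c, D @ c) / (c**2 / np.array(n, float)).sum()

ok_Q = np.allclose(Q @ Q, Q) and np.isclose(np.trace(Q), sum(n) - 2)
ok_M = np.allclose(M @ M, M) and np.isclose(np.trace(M), 1.0)
ok_QD = np.allclose(Q @ D, 0)
```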
B.3 Incomplete Block Designs
Proof 27 (Proof of b + rank C = v + rank D). In order to prove $b + \operatorname{rank} C = v + \operatorname{rank} D$, consider the partitioned matrix

$$\Delta = \begin{pmatrix} K & N \\ N' & R \end{pmatrix} . \qquad (B.123)$$

Also consider the nonsingular matrices

$$\Omega = \begin{pmatrix} I_b & 0 \\ -N'K^{-1} & I_v \end{pmatrix} \quad\text{and}\quad \Phi = \begin{pmatrix} I_b & 0 \\ -R^{-1}N' & I_v \end{pmatrix} .$$

Since the rank of a matrix does not change under pre- or postmultiplication by a nonsingular matrix,

$$\operatorname{rank}\Delta = \operatorname{rank}\Omega\Delta = \operatorname{rank}\Delta\Phi \,.$$

Since

$$\Omega\Delta = \begin{pmatrix} K & N \\ 0 & C \end{pmatrix} \quad\text{and}\quad \Delta\Phi = \begin{pmatrix} D & N \\ 0 & R \end{pmatrix} ,$$

we get

$$\operatorname{rank}\begin{pmatrix} K & N \\ 0 & C \end{pmatrix} = \operatorname{rank}\begin{pmatrix} D & N \\ 0 & R \end{pmatrix}$$

or, K and R being nonsingular,

$$b + \operatorname{rank} C = v + \operatorname{rank} D \,,$$

which completes the proof.

Further, the rank of the matrix

$$\begin{pmatrix} n & 1_b'K & 1_v'R \\ K1_b & K & N \\ R1_v & N' & R \end{pmatrix} \quad [\text{cf. } (6.5)] \qquad (B.124)$$

is the same as that of $\Delta$ (cf. (B.123)), and the rank of the matrix (B.124) with an additional column

$$\begin{pmatrix} 0 \\ 0 \\ L \end{pmatrix} ,$$

where $L = (l_1, l_2, \ldots, l_v)'$, is the same as the rank of the matrix

$$\begin{pmatrix} K & N & 0 \\ 0 & C & L \end{pmatrix} . \qquad (B.125)$$

In order that the ranks of $\Delta$ and (B.125) be the same, a necessary condition is that $1_v'L = 0$. Thus a necessary condition for the linear parametric function $L'\tau$ to be estimable is that $1_v'L = 0$, i.e., that $L'\tau$ is a contrast.
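The rank identity of Proof 27 can be checked on a concrete design. The sketch below (design invented for illustration; NumPy assumed) uses the block design whose blocks are all pairs of v = 4 treatments:

```python
import numpy as np

# N is the b x v incidence matrix, K = diag(block sizes k_i),
# R = diag(replications r_j); C = R - N'K^{-1}N and D = K - N R^{-1} N'.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
N = np.zeros((6, 4))
for blk, (i, j) in enumerate(pairs):
    N[blk, i] = N[blk, j] = 1

K = np.diag(N.sum(axis=1))              # block sizes k_i
R = np.diag(N.sum(axis=0))              # replications r_j
C = R - N.T @ np.linalg.inv(K) @ N      # treatment information matrix
D = K - N @ np.linalg.inv(R) @ N.T

b, v = N.shape
lhs = b + np.linalg.matrix_rank(C)
rhs = v + np.linalg.matrix_rank(D)
```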
Proof 28 (Covariance Matrices of Adjusted Treatment and Block Totals). Let us consider

$$Q = V - N'K^{-1}B = \bigl(I \;\; -N'K^{-1}\bigr)Z \,,$$

where

$$Z = \begin{pmatrix} V \\ B \end{pmatrix} .$$

Thus

$$\mathbb V(Q) = \bigl(I \;\; -N'K^{-1}\bigr)\,\mathbb V(Z)\begin{pmatrix} I \\ -K^{-1}N \end{pmatrix} \qquad (B.126)$$

where

$$\mathbb V(Z) = \begin{pmatrix} \mathbb V(V) & \operatorname{Cov}(V, B) \\ \operatorname{Cov}(B, V) & \mathbb V(B) \end{pmatrix} .$$

Since $B_i$ and $V_j$ have $n_{ij}$ observations in common and the observations are mutually independent,

$$\operatorname{Cov}(B_i, V_j) = n_{ij}\sigma^2 \,, \qquad \operatorname{Var}(B_i) = k_i\sigma^2 \,, \qquad \operatorname{Var}(V_j) = r_j\sigma^2 \,,$$

so that

$$\mathbb V(Z) = \begin{pmatrix} R & N' \\ N & K \end{pmatrix}\sigma^2 \,. \qquad (B.127)$$

Substituting (6.19) in (B.126), we have

$$\mathbb V(Q) = (R - N'K^{-1}N)\sigma^2 = C\sigma^2 \,.$$

Similarly, the covariance matrix of the adjusted block totals from (6.17) and (6.18) is

$$\mathbb V(P) = \bigl(-NR^{-1} \;\; I\bigr)\,\mathbb V(Z)\begin{pmatrix} -R^{-1}N' \\ I \end{pmatrix} = (K - NR^{-1}N')\sigma^2 \quad [\text{cf. } (6.19)] \; = D\sigma^2 \,.$$

Next we find the covariance between B and Q as

$$\operatorname{Cov}(B, Q) = \operatorname{Cov}(B, V - N'K^{-1}B) = \operatorname{Cov}(B, V) - \mathbb V(B)K^{-1}N = N\sigma^2 - KK^{-1}N\sigma^2 \quad [\text{cf. } (B.127)] \; = 0 \,.$$
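The sandwich computation in (B.126)–(B.127) is pure matrix algebra and can be replayed on any small incidence matrix. A sketch with an invented design and variance (NumPy assumed):

```python
import numpy as np

# (I, -N'K^{-1}) V(Z) (I, -N'K^{-1})' reduces to C sigma^2 when V(Z) has the
# block form (B.127).  Blocks: all pairs from 4 treatments (b = 6, v = 4).
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
N = np.zeros((6, 4))
for blk, (i, j) in enumerate(pairs):
    N[blk, i] = N[blk, j] = 1

K = np.diag(N.sum(axis=1))
R = np.diag(N.sum(axis=0))
sigma2 = 2.5                                    # arbitrary error variance
v = 4

A = np.hstack([np.eye(v), -N.T @ np.linalg.inv(K)])   # so that Q = A Z
VZ = np.block([[R, N.T], [N, K]]) * sigma2            # (B.127)
C = R - N.T @ np.linalg.inv(K) @ N

ok_VQ = np.allclose(A @ VZ @ A.T, C * sigma2)
# Cov(B, Q) = N sigma^2 - K K^{-1} N sigma^2 = 0:
ok_cov = np.allclose(N * sigma2 - K @ np.linalg.inv(K) @ N * sigma2, 0)
```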
Proof 29 (Theorem 6.8). If $n_{ij}/r_j = a_i$ (a constant not depending on j), then summing over j on both sides gives $a_i = k_i/n$, since $\sum_j n_{ij} = k_i$ and $\sum_j r_j = n$. Thus

$$\frac{n_{ij}}{r_j} = \frac{k_i}{n} \qquad\text{or}\qquad \frac{n_{ij}}{k_i} = \frac{r_j}{n} \,. \qquad (B.128)$$

The right-hand side of (B.128) is independent of i, which proves the result. The other part can be proved similarly, which completes the proof.
Proof 30 (Estimates of µ and τ in Interblock Analysis). In order to obtain the estimates of µ and τ, we minimize the sum of squares due to the error $f = (f_1, f_2, \ldots, f_b)'$, i.e., minimize

$$(B - k\mu^*1_b - N\tau)'(B - k\mu^*1_b - N\tau)$$

with respect to µ and τ. The estimates are the solutions of the following normal equations:

$$\begin{pmatrix} k1_b' \\ N' \end{pmatrix}\bigl(k1_b \;\; N\bigr)\begin{pmatrix} \mu \\ \tau \end{pmatrix} = \begin{pmatrix} k1_b' \\ N' \end{pmatrix}B$$

or

$$\begin{pmatrix} k^21_b'1_b & k1_b'N \\ kN'1_b & N'N \end{pmatrix}\begin{pmatrix} \mu \\ \tau \end{pmatrix} = \begin{pmatrix} kG \\ N'B \end{pmatrix}$$

or

$$\begin{pmatrix} k^2b & k1_v'R \\ kR1_v & N'N \end{pmatrix}\begin{pmatrix} \mu \\ \tau \end{pmatrix} = \begin{pmatrix} kG \\ N'B \end{pmatrix} \qquad (\text{using } N'1_b = r = R1_v). \qquad (B.129)$$

Premultiplying both sides of (B.129) by

$$\begin{pmatrix} 1/k & 0 \\ -\dfrac{1}{bk}R1_v & I_v \end{pmatrix} ,$$

we get

$$\begin{pmatrix} bk & 1_v'R \\ 0 & N'N - \dfrac{R1_v1_v'R}{b} \end{pmatrix}\begin{pmatrix} \mu \\ \tau \end{pmatrix} = \begin{pmatrix} G \\ N'B - \dfrac{R1_vG}{b} \end{pmatrix} .$$

Using the side condition $1_v'R\tau = 0$ and assuming $N'N$ to be nonsingular, we get

$$\tilde\mu = \frac{G}{bk} \,,$$

$$\begin{aligned}
\tilde\tau &= (N'N)^{-1}\left(N'B - \frac{R1_vG}{b}\right) \\
&= (N'N)^{-1}\left(N'B - \frac{kGN'1_b}{bk}\right) \qquad (\text{using } R1_v = r = N'1_b) \\
&= (N'N)^{-1}\left(N'B - \frac{G}{bk}N'N1_v\right) \qquad (\text{using } N1_v = k1_b) \\
&= (N'N)^{-1}N'B - \frac{G1_v}{bk} \,.
\end{aligned}$$
Proof 31 (Derivation of Relation (i) bk = vr of a BIBD). Consider

$$1_b'N1_v = 1_b'\begin{pmatrix} \sum_j n_{1j} \\ \vdots \\ \sum_j n_{bj} \end{pmatrix} = 1_b'\begin{pmatrix} k \\ \vdots \\ k \end{pmatrix} \quad [\text{cf. } (6.68)] \; = bk \,. \qquad (B.130)$$

Similarly, consider

$$1_v'N'1_b = 1_v'\begin{pmatrix} \sum_i n_{i1} \\ \vdots \\ \sum_i n_{iv} \end{pmatrix} = 1_v'\begin{pmatrix} r \\ \vdots \\ r \end{pmatrix} = vr \,. \qquad (B.131)$$

But $1_b'N1_v = 1_v'N'1_b$, both being scalars, so bk = vr, and thus relation (i) holds.
Proof 32 (Derivation of Relation (ii) λ(v − 1) = r(k − 1) of a BIBD). Consider

$$N'N = \begin{pmatrix} \sum_i n_{i1}^2 & \sum_i n_{i1}n_{i2} & \ldots & \sum_i n_{i1}n_{iv} \\ \sum_i n_{i1}n_{i2} & \sum_i n_{i2}^2 & \ldots & \sum_i n_{i2}n_{iv} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_i n_{iv}n_{i1} & \sum_i n_{iv}n_{i2} & \ldots & \sum_i n_{iv}^2 \end{pmatrix} = \begin{pmatrix} r & \lambda & \ldots & \lambda \\ \lambda & r & \ldots & \lambda \\ \vdots & \vdots & \ddots & \vdots \\ \lambda & \lambda & \ldots & r \end{pmatrix} \qquad (B.132)$$

since $n_{ij} = 1$ or 0, so $n_{ij}^2 = 1$ or 0. Thus

$$\sum_i n_{ij}^2 = \text{number of times } \tau_j \text{ occurs in the design} = r \quad\text{for all } j = 1, 2, \ldots, v,$$

$$\sum_i n_{ij}n_{ij'} = \text{number of blocks in which } \tau_j \text{ and } \tau_{j'} \text{ occur together} = \lambda \quad\text{for all } j \ne j'$$

and

$$N'N1_v = [r + \lambda(v - 1)]1_v \,. \quad [\text{cf. } (B.132)] \qquad (B.133)$$

Also

$$N'N1_v = N'[N1_v] = N'\begin{pmatrix} k \\ \vdots \\ k \end{pmatrix} = k\begin{pmatrix} \sum_i n_{i1} \\ \vdots \\ \sum_i n_{iv} \end{pmatrix} = kr1_v \,. \qquad (B.134)$$

It follows from (B.133) and (B.134) that

$$[r + \lambda(v - 1)]1_v = kr1_v \qquad\text{or}\qquad r + \lambda(v - 1) = kr \qquad\text{or}\qquad \lambda(v - 1) = r(k - 1)$$

and thus the relation (6.66) holds.
Proof 33 (Derivation of Relation (iii) b ≥ v of a BIBD). The determinant of $N'N$ is

$$|N'N| = [r + \lambda(v - 1)](r - \lambda)^{v-1} \quad [\text{cf. } (B.132)] \; = rk(r - \lambda)^{v-1} \quad [\text{cf. } (6.66)] \; \ne 0 \,,$$

because if $r = \lambda$, then (6.66) gives $k = v$, which contradicts the incompleteness property of the design ($k < v$). Thus $N'N$ is a $(v \times v)$ nonsingular matrix and so $\operatorname{rank} N'N = v$. Since $\operatorname{rank} N = \operatorname{rank} N'N$, we have $\operatorname{rank} N = v$. But $\operatorname{rank} N \le b$, N having b rows. Thus $v \le b$ and thus the relation (iii) in (6.67) holds.
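Relations (i)–(iii) are easy to check on a concrete BIBD. A sketch using the Fano-plane design (v = b = 7, r = k = 3, λ = 1; the block list below is one standard choice of the seven lines; NumPy assumed):

```python
import numpy as np

# Check bk = vr, lambda(v-1) = r(k-1), b >= v, the structure of N'N in
# (B.132), and the determinant formula of Proof 33.
blocks = [(0, 1, 3), (1, 2, 4), (2, 3, 5), (3, 4, 6),
          (4, 5, 0), (5, 6, 1), (6, 0, 2)]
N = np.zeros((7, 7))
for i, blk in enumerate(blocks):
    for j in blk:
        N[i, j] = 1

b, v = N.shape
k = int(N.sum(axis=1)[0])            # block size
r = int(N.sum(axis=0)[0])            # replications
lam = 1

ok_i = (b * k == v * r)
ok_ii = (lam * (v - 1) == r * (k - 1))
ok_iii = (b >= v)
ok_NtN = np.allclose(N.T @ N, (r - lam) * np.eye(v) + lam * np.ones((v, v)))
ok_det = round(np.linalg.det(N.T @ N)) == r * k * (r - lam) ** (v - 1)
```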
Proof 34 (Theorem 6.11). Let

$$b = nr \qquad (B.135)$$

where n > 1 is an integer. For a BIBD,

$$\lambda(v - 1) = r(k - 1) \qquad\text{or}\qquad \lambda(nk - 1) = r(k - 1) \qquad (\text{using } vr = bk \text{ with (B.135)})$$

or

$$r = \lambda\left(\frac{n - 1}{k - 1}\right) + \lambda n \,.$$

Since n > 1 and k > 1, $\lambda(n - 1)/(k - 1) = r - \lambda n$ is a positive integer, being the difference of two integers. Now, if possible, let

$$b < v + r - 1 \qquad (B.136)$$

or $nr < v + r - 1$ (using (B.135)), or $r(n - 1) < v - 1$, or

$$r(n - 1) < \frac{r(k - 1)}{\lambda} \qquad (\text{using } (6.66)) \,,$$

which implies

$$\frac{\lambda(n - 1)}{k - 1} < 1 \,,$$

which is a contradiction, so (B.136) is not possible. Thus

$$b \ge v + r - 1$$

holds.
Proof 35 (Estimate of τ in Intrablock Analysis of a BIBD). The C-matrix for a BIBD is

$$C = rI_v - \frac{N'N}{k} = rI_v - \frac{1}{k}\left[(r - \lambda)I_v + \lambda 1_v1_v'\right] \quad [\text{cf. } (B.132)] \; = \frac{\lambda v}{k}\left[I_v - \frac{1_v1_v'}{v}\right] . \qquad (B.137)$$

We consider here a symmetric generalized inverse of the form

$$\Omega = (C + \kappa 1_v1_v')^{-1} \,,$$

where κ is any convenient nonzero scalar. For such a generalized inverse, we have

$$(C + \kappa 1_v1_v')\Omega = I_v \qquad (B.138)$$

so that

$$C\Omega = I_v - \kappa 1_v1_v'\Omega = I_v - \frac{1_v1_v'}{v} \,, \qquad (B.139)$$

which is obtained by premultiplying (B.138) by $1_v'$:

$$1_v'C\Omega + \kappa 1_v'1_v1_v'\Omega = 1_v' \,, \quad\text{i.e.,}\quad \kappa v 1_v'\Omega = 1_v' \quad (\text{since } 1_v'C = 0) \,,$$

so that

$$\kappa 1_v1_v'\Omega = \frac{1_v1_v'}{v} \,.$$

Using this generalized inverse with (B.137), we have

$$C = \frac{\lambda v}{k}I_v - \frac{\lambda}{k}1_v1_v' \,.$$

It is convenient to take κ = λ/k, so that

$$\Omega^{-1} = C + \kappa 1_v1_v' = \frac{\lambda v}{k}I_v \qquad\text{and}\qquad \Omega = \frac{k}{\lambda v}I_v \,.$$

Thus the intrablock estimate of the treatment effects in a BIBD is the solution of $C\tau = Q$, which is

$$\hat\tau = \frac{k}{\lambda v}\,Q \,. \qquad (B.140)$$
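That (B.140) solves the reduced normal equations can be replayed numerically. A sketch with Fano-plane parameters and an invented vector of adjusted treatment totals (NumPy assumed):

```python
import numpy as np

# For a BIBD, C = (lambda v / k)(I - 11'/v), so for any Q with 1'Q = 0 the
# intrablock estimate (k / (lambda v)) Q solves C tau = Q.
v, k, lam = 7, 3, 1
C = (lam * v / k) * (np.eye(v) - np.ones((v, v)) / v)

rng = np.random.default_rng(4)
Q = rng.normal(size=v)
Q -= Q.mean()                        # adjusted treatment totals sum to zero
tau_hat = (k / (lam * v)) * Q

ok_solution = np.allclose(C @ tau_hat, Q)
```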
Proof 36 (Variance of Intrablock and Interblock Estimates of l'τ).

$$\operatorname{Var}(l'\hat\tau) = \left(\frac{k}{\lambda v}\right)^2 \operatorname{Var}\left(\sum_j l_jQ_j\right) = \left(\frac{k}{\lambda v}\right)^2\left[\sum_j l_j^2\operatorname{Var}(Q_j) + 2\sum_{j<j'} l_jl_{j'}\operatorname{Cov}(Q_j, Q_{j'})\right] .$$

Since

$$\operatorname{Var}(Q_j) = r\left(1 - \frac{1}{k}\right)\sigma^2 \,, \qquad \operatorname{Cov}(Q_j, Q_{j'}) = -\frac{\lambda}{k}\sigma^2 \quad (j \ne j') \,,$$

we get

$$\begin{aligned}
\operatorname{Var}(l'\hat\tau) &= \left(\frac{k}{\lambda v}\right)^2\left[\frac{r(k-1)}{k}\sum_j l_j^2 - \frac{\lambda}{k}\left(\Bigl(\sum_j l_j\Bigr)^2 - \sum_j l_j^2\right)\right]\sigma^2 \\
&= \left(\frac{k}{\lambda v}\right)^2\frac{1}{k}\left[r(k-1) + \lambda\right]\sum_j l_j^2\,\sigma^2 \qquad \Bigl(\text{since } \textstyle\sum_j l_j = 0,\ l'\tau \text{ being a contrast}\Bigr) \\
&= \left(\frac{k}{\lambda v}\right)^2\frac{1}{k}\left[\lambda(v-1) + \lambda\right]\sum_j l_j^2\,\sigma^2 \qquad (\text{using } r(k-1) = \lambda(v-1)) \\
&= \frac{k}{\lambda v}\,\sigma^2\sum_j l_j^2 \,.
\end{aligned}$$

Similarly,

$$\begin{aligned}
\operatorname{Var}(l'\tilde\tau) &= \left(\frac{1}{r-\lambda}\right)^2\left[\sum_j l_j^2\operatorname{Var}(T_j) + 2\sum_{j<j'} l_jl_{j'}\operatorname{Cov}(T_j, T_{j'})\right] \\
&= \left(\frac{1}{r-\lambda}\right)^2\left[r\sigma_f^2\sum_j l_j^2 + \lambda\sigma_f^2\left(\Bigl(\sum_j l_j\Bigr)^2 - \sum_j l_j^2\right)\right] \\
&= \frac{\sigma_f^2}{r-\lambda}\sum_j l_j^2 \,.
\end{aligned}$$
Proof 37 (Derivation of $\tau_j^*$). We have seen in (6.105) that

$$\tau_j^* = \frac{\lambda v\omega_1\hat\tau_j + k(r-\lambda)\omega_2\tilde\tau_j}{\lambda v\omega_1 + k(r-\lambda)\omega_2} \,. \qquad (B.141)$$

Since $\hat\tau_j = (k/\lambda v)Q_j$ and $\tilde\tau_j = T_j/(r-\lambda)$, the numerator of (B.141) can be expressed as

$$\omega_1\lambda v\hat\tau_j + \omega_2 k(r-\lambda)\tilde\tau_j = \omega_1 kQ_j + \omega_2 kT_j \qquad (B.142)$$

and the denominator of (B.141) can be expressed as

$$\omega_1\lambda v + \omega_2 k(r-\lambda) = \omega_1\left[\frac{vr(k-1)}{v-1}\right] + \omega_2\left[k\left(r - \frac{r(k-1)}{v-1}\right)\right] \quad (\text{using } \lambda(v-1) = r(k-1)) \; = \frac{1}{v-1}\left[\omega_1 vr(k-1) + \omega_2 kr(v-k)\right] . \qquad (B.143)$$

Let

$$W_j^* = (v-k)V_j - (v-1)T_j + (k-1)G \,, \qquad (B.144)$$

where $\sum_j W_j^* = 0$. Using (B.142) and (B.143) in (B.141), we have

$$\begin{aligned}
\tau_j^* &= \frac{(v-1)[\omega_1 kQ_j + \omega_2 kT_j]}{\omega_1 rv(k-1) + \omega_2 kr(v-k)} \\
&= \frac{(v-1)[\omega_1(kV_j - T_j) + \omega_2 kT_j]}{r[\omega_1 v(k-1) + \omega_2 k(v-k)]} \qquad \left(\text{using } Q_j = V_j - \frac{T_j}{k}\right) \\
&= \frac{\omega_1 k(v-1)V_j + (k\omega_2 - \omega_1)(v-1)T_j}{r[\omega_1 v(k-1) + \omega_2 k(v-k)]} \\
&= \frac{\omega_1 k(v-1)V_j + (\omega_1 - k\omega_2)\bigl[W_j^* - (v-k)V_j - (k-1)G\bigr]}{r[\omega_1 v(k-1) + \omega_2 k(v-k)]} \\
&= \frac{\bigl[\omega_1 k(v-1) - (\omega_1 - k\omega_2)(v-k)\bigr]V_j + (\omega_1 - k\omega_2)\bigl[W_j^* - (k-1)G\bigr]}{r[\omega_1 v(k-1) + \omega_2 k(v-k)]} \\
&= \frac{1}{r}\left[V_j + \frac{\omega_1 - k\omega_2}{\omega_1 v(k-1) + \omega_2 k(v-k)}\,\bigl(W_j^* - (k-1)G\bigr)\right] \\
&= \frac{1}{r}\left[V_j + \xi\bigl(W_j^* - (k-1)G\bigr)\right] ,
\end{aligned}$$

where

$$\xi = \frac{\omega_1 - k\omega_2}{\omega_1 v(k-1) + \omega_2 k(v-k)} \,.$$
Appendix C
Distributions and Tables
x     0.00    0.02    0.04    0.06    0.08
0.0   0.3989  0.3989  0.3986  0.3982  0.3977
0.2   0.3910  0.3894  0.3876  0.3857  0.3836
0.4   0.3683  0.3653  0.3621  0.3589  0.3555
0.6   0.3332  0.3292  0.3251  0.3209  0.3166
0.8   0.2897  0.2850  0.2803  0.2756  0.2709
1.0   0.2419  0.2371  0.2323  0.2275  0.2226
1.2   0.1942  0.1895  0.1849  0.1804  0.1758
1.4   0.1497  0.1456  0.1415  0.1374  0.1334
1.6   0.1109  0.1074  0.1039  0.1006  0.0973
1.8   0.0789  0.0761  0.0734  0.0707  0.0681
2.0   0.0539  0.0519  0.0498  0.0478  0.0459
2.2   0.0355  0.0339  0.0325  0.0310  0.0296
2.4   0.0224  0.0213  0.0203  0.0194  0.0184
2.6   0.0136  0.0129  0.0122  0.0116  0.0110
2.8   0.0079  0.0075  0.0071  0.0067  0.0063
3.0   0.0044  0.0042  0.0039  0.0037  0.0035
Table C.1. Density function φ(x) of the N(0, 1)–distribution.
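Entries of Table C.1 can be regenerated from the closed form of the standard normal density; a minimal sketch for spot-checking values:

```python
import math

# phi(x) = exp(-x^2/2) / sqrt(2*pi), the N(0,1) density tabulated above.
def phi(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

check_00 = phi(0.0)   # tabulated as 0.3989
check_06 = phi(0.6)   # tabulated as 0.3332
check_20 = phi(2.0)   # tabulated as 0.0539
```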
u     0.00      0.01      0.02      0.03      0.04
0.0   0.500000  0.503989  0.507978  0.511966  0.515953
0.1   0.539828  0.543795  0.547758  0.551717  0.555670
0.2   0.579260  0.583166  0.587064  0.590954  0.594835
0.3   0.617911  0.621720  0.625516  0.629300  0.633072
0.4   0.655422  0.659097  0.662757  0.666402  0.670031
0.5   0.691462  0.694974  0.698468  0.701944  0.705401
0.6   0.725747  0.729069  0.732371  0.735653  0.738914
0.7   0.758036  0.761148  0.764238  0.767305  0.770350
0.8   0.788145  0.791030  0.793892  0.796731  0.799546
0.9   0.815940  0.818589  0.821214  0.823814  0.826391
1.0   0.841345  0.843752  0.846136  0.848495  0.850830
1.1   0.864334  0.866500  0.868643  0.870762  0.872857
1.2   0.884930  0.886861  0.888768  0.890651  0.892512
1.3   0.903200  0.904902  0.906582  0.908241  0.909877
1.4   0.919243  0.920730  0.922196  0.923641  0.925066
1.5   0.933193  0.934478  0.935745  0.936992  0.938220
1.6   0.945201  0.946301  0.947384  0.948449  0.949497
1.7   0.955435  0.956367  0.957284  0.958185  0.959070
1.8   0.964070  0.964852  0.965620  0.966375  0.967116
1.9   0.971283  0.971933  0.972571  0.973197  0.973810
2.0   0.977250  0.977784  0.978308  0.978822  0.979325
2.1   0.982136  0.982571  0.982997  0.983414  0.983823
2.2   0.986097  0.986447  0.986791  0.987126  0.987455
2.3   0.989276  0.989556  0.989830  0.990097  0.990358
2.4   0.991802  0.992024  0.992240  0.992451  0.992656
2.5   0.993790  0.993963  0.994132  0.994297  0.994457
2.6   0.995339  0.995473  0.995604  0.995731  0.995855
2.7   0.996533  0.996636  0.996736  0.996833  0.996928
2.8   0.997445  0.997523  0.997599  0.997673  0.997744
2.9   0.998134  0.998193  0.998250  0.998305  0.998359
3.0   0.998650  0.998694  0.998736  0.998777  0.998817
Table C.2. Distribution function Φ(u) of the N(0, 1)–distribution.
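Entries of Tables C.2 and C.3 can be reproduced via the error function, since Φ(u) = (1 + erf(u/√2))/2; a minimal sketch:

```python
import math

# Phi(u), the N(0,1) distribution function tabulated in Tables C.2 and C.3.
def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

check_000 = Phi(0.00)   # tabulated as 0.500000
check_100 = Phi(1.00)   # tabulated as 0.841345
check_196 = Phi(1.96)   # tabulated as 0.975002
```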
u     0.05      0.06      0.07      0.08      0.09
0.0   0.519939  0.523922  0.527903  0.531881  0.535856
0.1   0.559618  0.563559  0.567495  0.571424  0.575345
0.2   0.598706  0.602568  0.606420  0.610261  0.614092
0.3   0.636831  0.640576  0.644309  0.648027  0.651732
0.4   0.673645  0.677242  0.680822  0.684386  0.687933
0.5   0.708840  0.712260  0.715661  0.719043  0.722405
0.6   0.742154  0.745373  0.748571  0.751748  0.754903
0.7   0.773373  0.776373  0.779350  0.782305  0.785236
0.8   0.802337  0.805105  0.807850  0.810570  0.813267
0.9   0.828944  0.831472  0.833977  0.836457  0.838913
1.0   0.853141  0.855428  0.857690  0.859929  0.862143
1.1   0.874928  0.876976  0.879000  0.881000  0.882977
1.2   0.894350  0.896165  0.897958  0.899727  0.901475
1.3   0.911492  0.913085  0.914657  0.916207  0.917736
1.4   0.926471  0.927855  0.929219  0.930563  0.931888
1.5   0.939429  0.940620  0.941792  0.942947  0.944083
1.6   0.950529  0.951543  0.952540  0.953521  0.954486
1.7   0.959941  0.960796  0.961636  0.962462  0.963273
1.8   0.967843  0.968557  0.969258  0.969946  0.970621
1.9   0.974412  0.975002  0.975581  0.976148  0.976705
2.0   0.979818  0.980301  0.980774  0.981237  0.981691
2.1   0.984222  0.984614  0.984997  0.985371  0.985738
2.2   0.987776  0.988089  0.988396  0.988696  0.988989
2.3   0.990613  0.990863  0.991106  0.991344  0.991576
2.4   0.992857  0.993053  0.993244  0.993431  0.993613
2.5   0.994614  0.994766  0.994915  0.995060  0.995201
2.6   0.995975  0.996093  0.996207  0.996319  0.996427
2.7   0.997020  0.997110  0.997197  0.997282  0.997365
2.8   0.997814  0.997882  0.997948  0.998012  0.998074
2.9   0.998411  0.998462  0.998511  0.998559  0.998605
3.0   0.998856  0.998893  0.998930  0.998965  0.998999
Table C.3. Distribution function Φ(u) of the N(0, 1)–distribution.
Level of significance α
df     0.99    0.975   0.95    0.05   0.025  0.01
1      0.0001  0.001   0.004   3.84   5.02   6.62
2      0.020   0.051   0.103   5.99   7.38   9.21
3      0.115   0.216   0.352   7.81   9.35   11.3
4      0.297   0.484   0.711   9.49   11.1   13.3
5      0.554   0.831   1.15    11.1   12.8   15.1
6      0.872   1.24    1.64    12.6   14.4   16.8
7      1.24    1.69    2.17    14.1   16.0   18.5
8      1.65    2.18    2.73    15.5   17.5   20.1
9      2.09    2.70    3.33    16.9   19.0   21.7
10     2.56    3.25    3.94    18.3   20.5   23.2
11     3.05    3.82    4.57    19.7   21.9   24.7
12     3.57    4.40    5.23    21.0   23.3   26.2
13     4.11    5.01    5.89    22.4   24.7   27.7
14     4.66    5.63    6.57    23.7   26.1   29.1
15     5.23    6.26    7.26    25.0   27.5   30.6
16     5.81    6.91    7.96    26.3   28.8   32.0
17     6.41    7.56    8.67    27.6   30.2   33.4
18     7.01    8.23    9.39    28.9   31.5   34.8
19     7.63    8.91    10.1    30.1   32.9   36.2
20     8.26    9.59    10.9    31.4   34.2   37.6
25     11.5    13.1    14.6    37.7   40.6   44.3
30     15.0    16.8    18.5    43.8   47.0   50.9
40     22.2    24.4    26.5    55.8   59.3   63.7
50     29.7    32.4    34.8    67.5   71.4   76.2
60     37.5    40.5    43.2    79.1   83.3   88.4
70     45.4    48.8    51.7    90.5   95.0   100.4
80     53.5    57.2    60.4    101.9  106.6  112.3
90     61.8    65.6    69.1    113.1  118.1  124.1
100    70.1    74.2    77.9    124.3  129.6  135.8
Table C.4. Quantiles of the χ2–distribution.
Levels of significance α (one-sided):  0.05   0.025  0.01   0.005
Levels of significance α (two-sided):  0.10   0.05   0.02   0.01
df
1      6.31   12.71  31.82  63.66
2      2.92   4.30   6.97   9.92
3      2.35   3.18   4.54   5.84
4      2.13   2.78   3.75   4.60
5      2.01   2.57   3.37   4.03
6      1.94   2.45   3.14   3.71
7      1.89   2.36   3.00   3.50
8      1.86   2.31   2.90   3.36
9      1.83   2.26   2.82   3.25
10     1.81   2.23   2.76   3.17
11     1.80   2.20   2.72   3.11
12     1.78   2.18   2.68   3.05
13     1.77   2.16   2.65   3.01
14     1.76   2.14   2.62   2.98
15     1.75   2.13   2.60   2.95
16     1.75   2.12   2.58   2.92
17     1.74   2.11   2.57   2.90
18     1.73   2.10   2.55   2.88
19     1.73   2.09   2.54   2.86
20     1.73   2.09   2.53   2.85
30     1.70   2.04   2.46   2.75
40     1.68   2.02   2.42   2.70
60     1.67   2.00   2.39   2.66
∞      1.64   1.96   2.33   2.58
Table C.5. Quantiles of the t–distribution.
df2\df1   1      2      3      4      5      6      7      8      9
1         161    200    216    225    230    234    237    239    241
2         18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38
3         10.13  9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81
4         7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
5         6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78
6         5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
7         5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
8         5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
9         5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
10        4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02
11        4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90
12        4.75   3.88   3.49   3.26   3.11   3.00   2.92   2.85   2.80
13        4.67   3.80   3.41   3.18   3.02   2.92   2.84   2.77   2.72
14        4.60   3.74   3.34   3.11   2.96   2.85   2.77   2.70   2.65
15        4.54   3.68   3.29   3.06   2.90   2.79   2.70   2.64   2.59
20        4.35   3.49   3.10   2.87   2.71   2.60   2.52   2.45   2.40
30        4.17   3.32   2.92   2.69   2.53   2.42   2.34   2.27   2.21
Table C.6. Quantiles of the F_{df1,df2}–distribution with df1 and df2 degrees of freedom (α = 0.05).
df2\df1   10     11     12     14     16     20     24     30
1         242    243    244    245    246    248    249    250
2         19.39  19.40  19.41  19.42  19.43  19.44  19.45  19.46
3         8.78   8.76   8.74   8.71   8.69   8.66   8.64   8.62
4         5.96   5.93   5.91   5.87   5.84   5.80   5.77   5.74
5         4.74   4.70   4.68   4.64   4.60   4.56   4.53   4.50
6         4.06   4.03   4.00   3.96   3.92   3.87   3.84   3.81
7         3.63   3.60   3.57   3.52   3.49   3.44   3.41   3.38
8         3.34   3.31   3.28   3.23   3.20   3.15   3.12   3.08
9         3.13   3.10   3.07   3.02   2.98   2.93   2.90   2.86
10        2.97   2.94   2.91   2.86   2.82   2.77   2.74   2.70
11        2.86   2.82   2.79   2.74   2.70   2.65   2.61   2.57
12        2.76   2.72   2.69   2.64   2.60   2.54   2.50   2.46
13        2.67   2.63   2.60   2.55   2.51   2.46   2.42   2.38
14        2.60   2.56   2.53   2.48   2.44   2.39   2.35   2.31
15        2.55   2.51   2.48   2.43   2.39   2.33   2.29   2.25
20        2.35   2.31   2.28   2.23   2.18   2.12   2.08   2.04
30        2.16   2.12   2.09   2.04   1.99   1.93   1.89   1.84
Table C.7. Quantiles of the F_{df1,df2}–distribution with df1 and df2 degrees of freedom (α = 0.05).
References
Agresti, A. (2007). Categorical Data Analysis, Wiley, Hoboken.
Aitchison, J. and Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to restraints, Annals of Mathematical Statistics 29: 813–828.
Albert, A. (1972). Regression and the Moore–Penrose Pseudoinverse, Academic Press.
Algina, J. (1995). An improved general approximation test for the main effect in a split-plot design, British Journal of Mathematical and Statistical Psychology 48: 149–160.
Algina, J. (1997). Generalization of improved general approximation tests to split-plot designs with multiple between-subjects factors and/or multiple within-subjects factors, British Journal of Mathematical and Statistical Psychology 50: 243–252.
Amemiya, T. (1985). Advanced Econometrics, Basil Blackwell, Oxford.
Andrews, D. F. and Pregibon, D. (1978). Finding outliers that matter, Journal of the Royal Statistical Society, Series B 40: 85–93.
Baksalary, J. K., Kala, R. and Klaczynski, K. (1983). The matrix inequality M ≥ B*MB, Linear Algebra and Its Applications 54: 77–86.
Bartlett, M. S. (1937). Some examples of statistical methods of research in agriculture and applied botany, Journal of the Royal Statistical Society, Series B 4: 137–170.
Beckman, R. J. and Trussel, H. J. (1974). The distribution of an arbitrary Studentized residual and the effects of updating in multiple regression, Journal of the American Statistical Association 69: 199–201.
Bekker, P. A. and Neudecker, H. (1989). Albert's theorem applied to problems of efficiency and MSE superiority, Statistica Neerlandica 43: 157–167.
Belsley, D. A., Kuh, E. and Welsch, R. E. (1980). Regression Diagnostics, Wiley, New York.
Birch, M. W. (1963). Maximum likelihood in three-way contingency tables, Journal of the Royal Statistical Society, Series B 25: 220–233.
Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (2007). Discrete Multivariate Analysis: Theory and Practice, Springer, New York.
Boik, R. J. (1981). A priori tests in repeated measures designs: Effects of nonsphericity, Psychometrika 46(3): 241–255.
Bosch, K. (1992). Statistik-Taschenbuch, Oldenbourg.
Bose, R. C. and Shimamoto, T. (1952). Classification and analysis of partially balanced incomplete block designs with two associate classes, Journal of the American Statistical Association 47: 151–184.
Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria, Biometrika 36: 317–346.
Brook, R. J. and Arnold, G. C. (1985). Applied Regression Analysis and Experimental Design, Dekker.
Brown Jr., B. W. (1980). The crossover experiment for clinical trials, Biometrics 36: 69–79.
Brownie, C. and Boos, D. D. (1994). Type I error robustness of ANOVA and ANOVA on ranks when the number of treatments is large, Biometrics 50: 542–549.
Brzeskwiniewicz, H. and Wagner, W. (1991). Covariance analysis for split-plot and split-block designs, The American Statistician 46: 155–162.
Büning, H. and Trenkler, G. (1978). Nichtparametrische statistische Methoden, de Gruyter.
Burdick, R. (1994). Using confidence intervals to test variance components, Journal of Quality Technology 28: 30–30.
Campbell, S. L. and Meyer, C. D. (1979). Generalized Inverses of Linear Transformations, Pitman.
Casella, G. (2008). Statistical Design, Springer, New York.
Chakrabarti, M. C. (1963). Mathematics of Design and Analysis of Experiment, Asia Publishing House.
Chatterjee, S. and Hadi, A. S. (1988). Sensitivity Analysis in Linear Regression, Wiley, New York.
Christensen, R. (1990). Log-Linear Models, Springer–Verlag.
Cochran, W. G. and Cox, G. M. (1950). Experimental Designs, Wiley.
Cochran, W. G. and Cox, G. M. (1957). Experimental Designs, Wiley.
Cook, R. D. (1977). Detection of influential observations in linear regression, Technometrics 19: 15–18.
Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression, Chapman and Hall, New York.
Cook, R. D. and Weisberg, S. (1989). Regression diagnostics with dynamic graphics, Technometrics 31: 277–291.
Cox, D. R. (1970). The Analysis of Binary Data, Chapman and Hall.
Cox, D. R. (1972a). The analysis of multivariate binary data, Applied Statistics 21: 113–120.
Cox, D. R. (1972b). Regression models and life-tables (with discussion), Journal of the Royal Statistical Society, Series B 34: 187–202.
Cox, D. R. and Snell, E. J. (1968). A general definition of residuals, Journal of the Royal Statistical Society, Series B 30: 248–275.
Crowder, M. J. and Hand, D. J. (1990). Analysis of Repeated Measures, Chapman and Hall.
Cureton, E. E. (1967). The normal approximation to the signed-rank sampling distribution when zero differences are present, Journal of the American Statistical Association 62: 1068–1069.
Dean, A. and Voss, D. (1998). Design and Analysis of Experiments, Springer–Verlag.
Deming, W. E. and Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known, Annals of Mathematical Statistics 11: 427–444.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B 39: 1–38.
Dey, A. (1986). Theory of Block Designs, Wiley Eastern Limited.
Dhrymes, P. J. (1978). Introductory Econometrics, Springer–Verlag, New York.
Diggle, P. J., Liang, K.-Y. and Zeger, S. L. (1994). Analysis of Longitudinal Data, Chapman and Hall, London.
Doksum, K. A. and Gasko, M. (1990). On a correspondence between models in binary regression analysis and in survival analysis, International Statistical Review 58: 243–252.
Draper, N. R. and Pukelsheim, F. (1996). An overview of design of experiments, Statistical Papers 37: 1–32.
Draper, N. R. and Smith, H. (1966). Applied Regression Analysis, Wiley.
Duncan, D. B. (1975). t-tests and intervals for comparisons suggested by the data, Biometrics 31: 339–359.
Dunn, O. J. (1964). Multiple comparisons using rank sums, Technometrics 6: 241–252.
Dunn, O. J. and Clark, V. A. (1987). Applied Statistics: Analysis of Variance and Regression, Wiley.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing treatments with a control, Journal of the American Statistical Association 50: 1096–1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control, Biometrics 20: 482–491.
Fahrmeir, L. and Hamerle, A. (eds) (1984). Multivariate statistische Verfahren, de Gruyter, Berlin.
Fahrmeir, L. and Kaufmann, H. (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models, Annals of Statistics 13: 342–368.
Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models, Springer–Verlag.
Fitzmaurice, G. M., Laird, N. M. and Rotnitzky, A. G. (1993). Regression models for discrete longitudinal responses, Statistical Science 8(3): 284–309.
Fleiss, J. L. (1989). A critique of recent research in the two-treatment cross-over design, Controlled Clinical Trials 10: 237–243.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association 32: 675–701.
Gail, M. H. and Simon, R. (1985). Testing for qualitative interactions between treatment effects and patient subsets, Biometrics 41: 361–372.
Gart, J. J. (1969). An exact test for comparing matched proportions in crossover designs, Biometrika 56(1): 75–80.
Gibbons, J. D. (1976). Nonparametric Methods for Quantitative Analysis, American Series in Mathematical and Management Sciences.
Girden, E. R. (1992). ANOVA: Repeated Measures, Sage.
Glonek, G. V. F. (1996). A class of regression models for multivariate categorical responses, Biometrika 83(1): 15–28.
Goldberger, A. S. (1964). Econometric Theory, Wiley, New York.
Graybill, F. A. (1961). An Introduction to Linear Statistical Models, McGraw-Hill, New York.
Greenhouse, S. W. and Geisser, S. (1959). On methods in the analysis of profile data, Psychometrika 24(2): 95–112.
Grieve, A. P. (1982). The two-period changeover design in clinical trials (letter to the editor), Biometrics 38: 517–517.
Grieve, A. P. (1990). Cross-over versus parallel designs, Statistical Methodology in the Pharmaceutical Sciences.
Grizzle, J. E. (1965). The two-period change-over design and its use in clinical trials, Biometrics 21: 467–480.
Grizzle, J. E., Starmer, F. C. and Koch, G. G. (1969). Analysis of categorical data by linear models, Biometrics 25: 489–504.
Guilkey, D. K. and Price, J. M. (1981). On comparing restricted least squares estimators, Journal of Econometrics 15: 397–404.
Haaland, P. D. (1989). Experimental Design in Biotechnology, Dekker.
Haitovsky, Y. (1968). Missing data in regression analysis, Journal of the Royal Statistical Society, Series B 30: 67–82.
Hamerle, A. and Tutz, G. (1989). Diskrete Modelle zur Analyse von Verweildauern und Lebenszeiten, Campus, Frankfurt/M.
Harwell, M. and Serlin, R. (1994). An empirical study of five multivariate tests for the single factor repeated measures model, Computational Statistics and Data Analysis 26: 605–618.
Hays, W. L. (1988). Statistics, Holt, Rinehart and Winston.
Heagerty, P. J. and Zeger, S. L. (1996). Marginal regression models for clustered ordinal measurements, Journal of the American Statistical Association 91(435): 1024–1036.
Hemelrijk, J. (1952). Note on Wilcoxon's two-sample test when ties are present, Annals of Mathematical Statistics 23: 133–135.
Heumann, C. (1993). GEE1-procedure for categorical correlated response, Technical report, Ludwigstr. 33, 80535 München, Germany.
Heumann, C. (1998). Likelihoodbasierte marginale Regressionsmodelle für korrelierte kategoriale Daten, Peter Lang Europäischer Verlag der Wissenschaften, Frankfurt am Main.
Heumann, C. and Jacobsen, M. (1993). LOGGY 1.0: Ein Programm zur Analyse von loglinearen Modellen, C. Heumann, Ludwig-Richter-Str. 3, 85221 Dachau.
Heumann, C., Jacobsen, M. and Toutenburg, H. (1993). Rechnergestützte grafische Analyse von ordinalen Kontingenztafeln: eine Alternative zum Pareto-Prinzip, Technical report.
Hills, M. and Armitage, P. (1979). The two-period cross-over clinical trial, British Journal of Clinical Pharmacology 8: 7–20.
Hinkelmann, K. and Kempthorne, O. (2005). Design and Analysis of Experiments, Volume 2: Advanced Experimental Design, Wiley.
Hinkelmann, K. and Kempthorne, O. (2007). Design and Analysis of Experiments, Volume 1: Introduction to Experimental Design, 2nd Edition, Wiley.
Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures, Wiley.
Hocking, R. R. (1973). A discussion of the two-way mixed models, The American Statistician 27(4): 148–152.
Hollander, M. and Wolfe, D. A. (1973). Nonparametric Statistical Methods, Wiley.
Huynh, H. and Feldt, L. S. (1970). Conditions under which mean square ratios in repeated measurements designs have exact F-distributions, Journal of the American Statistical Association 65: 1582–1589.
Huynh, H. and Mandeville, G. K. (1979). Validity conditions in repeated measures designs, Psychological Bulletin 86(5): 964–973.
Ishikawa, K. (1976). Guide to Quality Control, Unipub.
John, P. W. M. (1980). Incomplete Block Designs, Marcel Dekker.
Johnson, N. L. and Leone, F. C. (1964). Statistics and Experimental Design in Engineering and the Physical Sciences, Volume II, Wiley.
Johnston, J. (1972). Econometric Methods, McGraw-Hill.
Johnston, J. (1984). Econometric Methods, McGraw-Hill.
Jones, B. and Kenward, M. G. (1989). Design and Analysis of Cross-over Trials, Chapman and Hall.
Joshi, D. D. (1987). Linear Estimation and Design of Experiments, New Age International Publishers.
Judge, G. G., Griffiths, W. E., Hill, R. C. and Lee, T.-C. (1980). The Theory and Practice of Econometrics, Wiley, New York.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H. and Lee, T.-C. (1985). The Theory and Practice of Econometrics, 2nd edition, Wiley, New York.
Karim, M. and Zeger, S. L. (1988). GEE: A SAS macro for longitudinal analysis, Technical Report, Department of Biostatistics, Johns Hopkins School of Hygiene and Public Health, Baltimore, MD.
Kastner, C., Fieger, A. and Heumann, C. (1997). MAREG and WinMAREG—a tool for marginal regression models, Computational Statistics and Data Analysis 24(2): 235–241.
Kmenta, J. (1997). Elements of Econometrics, The University of Michigan Press, Ann Arbor.
Koch, G. G. (1969). Some aspects of the statistical analysis of split-plot experiments in completely randomized layouts, Journal of the American Statistical Association 64: 485–505.
Koch, G. G. (1972). The use of nonparametric methods in the analysis of the two-period change-over design, Biometrics 28: 577–584.
Koch, G. G., Landis, R. J., Freeman, J. L., Freeman, D. H. and Lehnen, R. G. (1977). A general methodology for the analysis of experiments with repeated measurements of categorical data, Biometrics 33: 133–158.
Kres, H. V. (1983). Statistical Tables for Multivariate Analysis, Springer-Verlag.
Kruskal, W. H. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis, Journal of the American Statistical Association 47: 583–621.
Lang, J. B. and Agresti, A. (1994). Simultaneously modeling joint and marginal distributions of multivariate categorical responses, Journal of the American Statistical Association 89(426): 625–632.
Larsen, W. A. and McCleary, S. J. (1972). The use of partial residual plots in regression analysis, Technometrics 14: 781–790.
Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data, Wiley, New York.
Lehmacher, W. (1987). Verlaufskurven und Crossover, Springer-Verlag.
Lehmacher, W. (1991). Analysis of the cross-over design in the presence of residual effects, Statistics in Medicine 10: 891–899.
Lehmacher, W. and Wall, K. D. (1978). A new nonparametric approach to the comparison of k independent samples of response curves, Biometrical Journal 20(3): 261–273.
Lehmann, E. L. (1986). Testing Statistical Hypotheses, 2nd Edition, Wiley, New York.
Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models, Biometrika 73: 13–22.
Liang, K.-Y. and Zeger, S. L. (1989). A class of logistic regression models for multivariate binary time series, Journal of the American Statistical Association 84(406): 447–451.
Liang, K.-Y. and Zeger, S. L. (1993). Regression analysis for correlated data, Annual Review of Public Health 14: 43–68.
Liang, K.-Y., Zeger, S. L. and Qaqish, B. (1992). Multivariate regression analysis for categorical data, Journal of the Royal Statistical Society, Series B 54: 3–40.
Lienert, G. A. (1986). Verteilungsfreie Methoden in der Biostatistik, Hain.
Lipsitz, S. R., Laird, N. M. and Harrington, D. P. (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association, Biometrika 78: 153–160.
Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data, Wiley.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis, Academic Press, London.
Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution, Annals of Mathematical Statistics 11: 204–209.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, Chapman and Hall, London.
McCulloch, C. E. and Searle, S. R. (2000). Generalized, Linear, and Mixed Models, Wiley, New York.
McElroy, F. W. (1967). A necessary and sufficient condition that ordinary least-squares estimators be best linear unbiased, Journal of the American Statistical Association 62: 1302–1304.
McFadden, D. (1974). Conditional logit analysis of qualitative choice, Frontiers in Econometrics.
Michaelis, J. (1971). Schwellenwerte des Friedman-Tests, Biometrische Zeitschrift 13: 118–122.
Miller Jr., R. G. (1981). Simultaneous Statistical Inference, Springer-Verlag.
Milliken, G. A. and Akdeniz, F. (1977). A theorem on the difference of the generalized inverse of two nonnegative matrices, Communications in Statistics, Part A—Theory and Methods 6: 73–79.
Milliken, G. A. and Johnson, D. E. (1984). Analysis of Messy Data, Van Nostrand Reinhold.
Mitzel, H. C. and Games, P. A. (1981). Circularity and multiple comparisons in repeated measure designs, British Journal of Mathematical and Statistical Psychology 34: 253–259.
Molenberghs, G. and Lesaffre, E. (1994). Marginal modeling of correlated ordinal data using a multivariate Plackett distribution, Journal of the American Statistical Association 89(426): 633–644.
Montgomery, D. C. (1976). Design and Analysis of Experiments, Wiley.
Morrison, D. F. (1973). A test for equality of means of correlated variates with missing data on one response, Biometrika 60: 101–105.
Morrison, D. F. (1983). Applied Linear Statistical Methods, Prentice Hall.
Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models, Journal of the Royal Statistical Society, Series A 135: 370–384.
Neter, J., Wassermann, W. and Kutner, M. H. (1990). Applied Linear Statistical Models, 3rd edition, Irwin, Boston.
Oberhofer, W. and Kmenta, J. (1974). A general procedure for obtaining maximum likelihood estimates in generalized regression models, Econometrica 42: 579–590.
Park, S. H., Kim, Y. H. and Toutenburg, H. (1992). Regression diagnostics for removing an observation with animating graphics, Statistical Papers 33: 227–240.
Pepe, M. S. and Fleming, T. R. (1991). A nonparametric method for dealing with mismeasured covariate data, Journal of the American Statistical Association 86: 108–113.
Petersen, R. G. (1985). Design and Analysis of Experiments, Dekker.
Pollock, D. S. G. (1979). The Algebra of Econometrics, Wiley, Chichester.
Pratt, J. W. (1959). Remarks on zeros and ties in the Wilcoxon signed rank procedures, Journal of the American Statistical Association 54: 655–667.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation, Biometrics 44: 1033–1048.
Prentice, R. L. and Zhao, L. P. (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses, Biometrics 47: 825–839.
Prescott, R. J. (1981). The comparison of success rates in cross-over trials in the presence of an order effect, Applied Statistics 30(1): 9–15.
Puri, M. L. and Sen, P. K. (1971). Nonparametric Methods in Multivariate Analysis, Wiley.
Raghavarao, D. (1971). Constructions and Combinatorial Problems in Design of Experiments, Wiley.
Raghavarao, D. and Padgett, L. V. (1986). Block Designs: Analysis, Combinatorics and Applications, World Scientific.
Rao, C. R. (1956). Analysis of dispersion with incomplete observations on one of the characters, Journal of the Royal Statistical Society, Series B 18: 259–264.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edition, Wiley, New York.
Rao, C. R. (1988). Methodology based on the L1-norm in statistical inference, Sankhya, Series A 50: 289–313.
Rao, C. R. and Mitra, S. K. (1971). Generalized Inverse of Matrices and Its Applications, Wiley, New York.
Rao, C. R. and Rao, M. B. (1998). Matrix Algebra and Its Applications to Statistics and Econometrics, World Scientific, Singapore.
Rao, C. R. and Toutenburg, H. (1999). Linear Models: Least Squares and Alternatives, Springer-Verlag.
Rao, C. R., Toutenburg, H., Shalabh and Heumann, C. (2008). Linear Models and Generalizations: Least Squares and Alternatives, Springer-Verlag.
Ratkowsky, D. A., Evans, M. A. and Alldredge, J. R. (1993). Cross-Over Experiments: Design, Analysis and Application, Dekker.
Rosner, B. (1984). Multivariate methods in ophthalmology with application to paired-data situations, Biometrics 40: 1025–1035.
Rouanet, H. and Lepine, D. (1970). Comparison between treatments in a repeated-measurement design: ANOVA and multivariate methods, British Journal of Mathematical and Statistical Psychology 23(2): 147–163.
Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis, Annals of Mathematical Statistics 24: 220–238.
Roy, S. N. (1957). Some Aspects of Multivariate Analysis, Wiley.
Rubin, D. B. (1976). Inference and missing data, Biometrika 63: 581–592.
Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Sample Surveys, Wiley, New York.
Sachs, L. (1974). Angewandte Statistik: Planung und Auswertung, Methoden und Modelle, Springer-Verlag.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance, Biometrika 40: 87–104.
Scheffé, H. (1956). A "mixed model" for the analysis of variance, Annals of Mathematical Statistics 27: 23–26.
Scheffé, H. (1959). The Analysis of Variance, Wiley, New York.
Schneeweiß, H. (1990). Ökonometrie, Physica.
Searle, S. R. (1982). Matrix Algebra Useful for Statistics, Wiley, New York.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components, Wiley, New York.
Seber, G. A. F. (1966). The Linear Hypothesis: A General Theory, Griffin.
Silvey, S. D. (1969). Multicollinearity and imprecise estimation, Journal of the Royal Statistical Society, Series B 35: 67–75.
Snedecor, G. W. and Cochran, W. G. (1967). Statistical Methods, 6th edition, Iowa State University Press, Ames, Iowa.
Tan, W. Y. (1971). Note on an extension of the GM-theorem to multivariate linear regression models, SIAM Journal on Applied Mathematics 1: 24–28.
Theobald, C. M. (1974). Generalizations of mean square error applied to ridge regression, Journal of the Royal Statistical Society, Series B 36: 103–106.
Timm, N. H. (1975). Multivariate Analysis with Applications in Education and Psychology, Brooks/Cole.
Toutenburg, H. (1992a). Lineare Modelle, Physica.
Toutenburg, H. (1992b). Moderne nichtparametrische Verfahren der Risikoanalyse, Physica, Heidelberg.
Toutenburg, H. (1994). Versuchsplanung und Modellwahl, Physica.
Toutenburg, H. (2003). Lineare Modelle – Theorie und Anwendungen, Physica.
Toutenburg, H., Heumann, C., Fieger, A. and Park, S. H. (1995). Missing values in regression: Mixed and weighted mixed estimation, in V. Mammitzsch and H. Schneeweiss (eds), Statistical Sciences: Symposia Gaussiana, Proceedings of the 2nd Gauss Symposium, Walter de Gruyter, Berlin, pp. 289–301.
Toutenburg, H., Toutenburg, S. and Walther, W. (1991). Datenanalyse und Statistik für Zahnmediziner, Hanser.
Toutenburg, H. and Walther, W. (1992). Statistische Behandlung unvollständiger Datensätze, Deutsche Zahnärztliche Zeitschrift 47: 104–106.
Toutenburg, S. (1977). Eine Methode zur Berechnung des Betreuungsgrades in der prothetischen und konservierenden Zahnmedizin auf der Basis von Arbeitsablaufstudien, Arbeitszeitmessungen und einer Morbiditätsstudie, PhD thesis.
Trenkler, G. (1981). Biased Estimators in the Linear Regression Model, Hain, Königstein/Ts.
Tukey, J. W. (1953). The problem of multiple comparisons, Technical report.
Vach, W. and Blettner, M. (1991). Biased estimation of the odds ratio in case-control studies due to the use of ad-hoc methods of correcting for missing values in confounding variables, American Journal of Epidemiology 134: 895–907.
Vach, W. and Schumacher, M. (1993). Logistic regression with incompletely observed categorical covariates: A comparison of three approaches, Biometrika 80: 353–362.
Waller, R. A. and Duncan, D. B. (1972). A Bayes rule for the symmetric multiple comparison problem, Journal of the American Statistical Association 67: 253–255.
Walther, W. (1992). Ein Modell zur Erfassung und statistischen Bewertung klinischer Therapieverfahren—entwickelt durch Evaluation des Pfeilerverlustes bei Konuskronenersatz, Habilitationsschrift, Universität Homburg, Germany.
Walther, W. and Toutenburg, H. (1991). Datenverlust bei klinischen Studien, Deutsche Zahnärztliche Zeitschrift 46: 219–222.
Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized linear models, and the Gauss–Newton method, Biometrika 61: 439–447.
Wedderburn, R. W. M. (1976). On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models, Biometrika 63: 27–32.
Weerahandi, S. (1995). ANOVA under unequal error variances, Biometrics 51: 589–599.
Weisberg, S. (1980). Applied Linear Regression, Wiley.
Wilks, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples, Annals of Mathematical Statistics 3: 163–195.
Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses, Annals of Mathematical Statistics 9: 60–62.
Woolson, R. F. (1987). Statistical Methods for the Analysis of Biomedical Data, Wiley.
Wu, C. F. J. and Hamada, M. (2000). Experiments: Planning, Analysis and Parameter Design Optimization, Wiley.
Yates, F. (1933). The analysis of replicated experiments when the field results are incomplete, Empire Journal of Experimental Agriculture 1: 129–142.
Zhao, L. P. and Prentice, R. L. (1990). Correlated binary regression using a generalized quadratic model, Biometrika 77: 642–648.
Zhao, L. P., Prentice, R. L. and Self, S. G. (1992). Multivariate mean parameter estimation by using a partly exponential model, Journal of the Royal Statistical Society, Series B 54(3): 805–811.
Zimmermann, H. and Rahlfs, W. (1978). Testing hypotheses in the two-period change-over with binary data, Biometrical Journal 20(2): 133–141.
Index
C-matrix, 187
ad–hoc criteria, 81
adjusted coefficient of determination, 81, 83
adjusted treatment sum of squares, 190
adjusted treatment totals, 187
affine resolvable BIBD, 204
Albert's theorem, 541
algorithm
  Fisher–scoring, 337
  iterative proportional fitting (IPF), 366
aliases, 318
alternate, 319
analysis of variance, 73
Andrews–Pregibon statistic, 100
ANOVA, table, 74, 80
AR(1)–process, 384
associate classes, 220
association parameters, 360, 363
association schemes, 219
balanced design, 185
balanced incomplete block design (BIBD), 201
balanced partially confounded design, 305
beta–binomial distribution, 340
BIBD, 201
  affine resolvable, 204
  effective variance, 214
  efficiency balanced, 207
  efficiency factor, 207
  resolvable, 203
  symmetric, 203
binary design, 185
binary response, 340, 356
  variable, 344
binomial distribution, 330
bivariate
  binary correlated response, 384
  regression, 73
canonical link, 332
categorical response variables, 330
categorical variables, 343
Cauchy–Schwarz Inequality, 534
censoring, 488
central limit theorem, 350
chain rule, 335
clinical long-time studies, 488
cluster, 339, 376
coding of response models, 372
coefficient of determination, 77
  adjusted, 81, 83
  multiple, 80
complementary one-half fraction, 319
complete block design, 182
complete case analysis, 489, 499
compound symmetric structure, 376
concordance matrix, 184
condition number, 498
conditional
  distribution, 344
  model, 377
confidence
  ellipsoid, 83, 97
  intervals, 83
  intervals for b0 and b1, 77
confounding, 294
confounding arrangement, 298
connected design, 185
constraints, 360
contingency table, 343
  I × J, 330
  I × J × 2, 362
  three–way, 362
  two–way, 343, 351, 359
Cook's distance, 98
corrected logit, 354
corrected sum of squares, 74
correlated response, 377
correlation coefficient, sample, 75, 77
covariance matrix, 350
  asymptotic, 350
  estimated asymptotic, 366
Cox approach, 373
criteria
  ad–hoc, 81
  for model choice, 81
cross–product ratio, 346
cyclic type scheme, 228
defining contrasts, 298
defining relation, 317
dependent binary variables, 375
design matrix for the main effects, 371
detection of outliers, 93
determinant, 520
deviance, 339
diagnostic plots, 96
differences, test for qualitative, 373
dispersion parameter, 332
distribution
  beta–binomial, 340
  conditional, 344
  logistic, 356
  multinomial, 347
  Poisson, 347
drop–out, 488
dummy coding, 368
dummy variable, 73
effect coding, 366, 369
effective variance, 214
efficiency balanced BIBD, 207
efficiency factor, 207
elements of P, 88
endodontic treatment, 362
equireplicate design, 184
estimable function, 185
estimating equations, 341
estimation
  mixed, 496
  OLS, 571
estimator, OLS, 73
exact linear restrictions, 70
exchangeable correlation, 384
exponential
  dispersion model, 332
  family, 331
externally Studentized residual, 92
filled–up data, 493
filling–up method according to Yates, 494
first–order regression (FOR), 500
Fisher
  –information matrix, 334
  –scoring algorithm, 337
Fisher's inequality, 201
fit, perfect, 360
fractional factorial experiments, 316
fractional replications, 316
G2–statistic, 358
generalized
  estimating equations (GEE), 380
  linear model (GLM), 329, 331
  linear model for binary response, 353
generalized interaction, 300
generalized inverse, 537
generator, 317
goodness of fit, 73, 339
  testing, 350
group divisible type scheme, 227
  nonsingular, 228
  regular, 228
  semi-regular, 228
  singular, 227
grouped data, 353
hat matrix, 87
hazard function, model for the, 374
hazard rate, 372
heteroscedasticity, 97
hierarchical models for three–way contingency tables, 364
identity link, 332
ignorable nonresponse, 490
imputation
  cold deck, 489
  for nonresponse, 489
  hot deck, 489
  mean, 490
  multiple, 490
  regression (correlation), 490
incidence matrix, 184
incomplete block design, 182
  analysis of variance, 189
  interblock estimates, 196
  pooled estimator, 198
  recovery of interblock information, 200
independence, 344
  conditional, 363
  joint, 363
  mutual, 362
  testing, 351
independence estimating equations (IEE), 380, 386
independent multinomial sample, 348
influential observations, 91
inspecting the residuals, 94
interaction, test for quantitative, 373
interblock analysis of incomplete block design, 193
  interblock estimates, 196
internally Studentized residual, 92
intrablock analysis of incomplete block design, 185
  C-matrix, 187
  adjusted treatment sum of squares, 190
  adjusted treatment totals, 187
  analysis of variance, 189
  intrablock equations, 186
  unadjusted block sum of squares, 190
intrablock equations, 186
inversion, partial, 568
iterative proportional fitting (IPF), 366
I × J contingency table, 330
kernel of the likelihood, 349
key block, 301
Latin square type association scheme, 228
leverage, 88
likelihood
  equations, 69
  function, 348
  ratio, 71
  ratio test, 352, 358
link, 331
  canonical, 332, 379
  function, 356
  identity, 332
  natural, 332
log odds, 353
logistic
  distribution, 356
  regression, 353
  regression model, 353
logit link, 353
logit models, 353
  for categorical data, 357
loglinear model, 359
  of independence, 360
LR test, 77
Mallow's Cp, 83
MAR, 490
marginal
  distribution, 343
  model, 377
  probability, 344
maximum likelihood, 384
  estimates, 348, 351
  estimates of missing values, 501
MCAR, 490
mean shift model, 106
mean–shift outlier model, 93
missing
  data in the response, 492
  data mechanisms, 490
  not at random, 488
  values and loss of efficiency, 497
  values in the X–matrix, 495
model
  independence, 358
  logistic, 358
  logistic regression, 353
  logit, 353, 358
  saturated, 358, 360
  sub-, 571
model choice, 81
  criteria for, 81
model of statistical independence, 357
Moore–Penrose Inverse, 537
MSE superiority, 54
MSE–I criterion, 54
multinomial
  distribution, 347
  independent sample, 348
multinomial distribution, 350
multiple
  X–rows, 90
  coefficient of determination, 80
  imputation, 490
  regression, 79
natural
  link, 332
  parameter, 331
nested, test, 81
nonignorable nonresponse, 490
nonresponse in sample surveys, 487
normal equations, 48
normalized residual, 92
OAR, 490
observation–driven model, 377
odds, 345
  log, 353
  ratio, 346
  ratio for I × J tables, 346
OLS estimator, 73
  in the filled–up model, 493
orthogonal block design, 189
outlier, 96
overdispersion, 339
parameter, natural, 331
partial
  inversion, 568
  regression plots, 102
partial confounding, 304
partially balanced association schemes, 220
partially balanced incomplete block design (PBIBD), 219
PBIBD
  associate classes, 220
  association schemes, 219
  cyclic type association scheme, 228
  general theory, 229
  group divisible type association scheme, 227
  Latin square type association scheme, 228
  rectangular association scheme, 220
  singly linked block association scheme, 229
  triangular association scheme, 222
Pearson's χ2, 350
Poisson
  distribution, 330, 347
  sampling, 366
pooled estimator, 198
prediction matrix, 87
principal block, 301
principle of least squares, 47
probit model, 356
product multinomial sampling, 348
prognostic factor, 353
proper design, 184
quasi likelihood, 341
quasi loglikelihood, 341
quasi–correlation matrix, 380, 383
quasi–score function, 342
random–effects model, 377, 384
recovery of interblock information, 200
rectangular association scheme, 220
reduced intrablock matrix, 187
regression
  bivariate, 73
  multiple, 79
regression analysis, checking the adequacy of, 76
regression diagnostics, 105
relative
  efficiency, 497
  risk, 345
residual, sum of squares, 79, 81
residuals
  externally Studentized, 92
  internally Studentized, 92
  normalized, 92
  standardized, 92
  sum of squared, 47
residuals matrix, 87
resolvable BIBD, 203
response
  binary, 340
  missing data, 492
response probability, model for, 370
response variable, binary, 344
restrictions, exact linear, 70
risk, relative, 345
sample correlation coefficient, 75, 77
sample logit, 354
sample, independent multinomial, 348
score function, 334
selectivity bias, 489
singly linked block association scheme, 229
span, 498
standard order, 287
standardized residual, 92
submodel, 571
sum of squares
  residual, 79
superiority
  MSE, 54
SXX, 75
SXY, 75
symmetric BIBD, 203
systematic component, 331
SYY, 74, 75
table of ANOVA, 74, 80
test
  for qualitative differences, 373
  for quantitative interaction, 373
  likelihood–ratio, 352
  nested, 81
test statistic, 79, 566
testing goodness of fit, 350
therapy effect, 373
three–factor interaction, 363
three–way contingency table, 362
triangular association scheme, 222
two–way
  contingency table, 351
  interactions, 363
unadjusted block sum of squares, 190
unbalanced partially confounded design, 305
variance balanced design, 185
variance ratio, 101
Wald statistic, 355
Welsch–Kuh's distance, 99
Wilks' G2, 339, 352
working
  covariance matrix, 380
  variances, 341, 379
zero–order regression (ZOR), 499