
University of Natural Resources and Applied Life Sciences

Vienna

Designing of Experiments for ANOVA models

Marie Šimečková

Supervisor: Prof. Dr. Dr. h.c. Dieter Rasch

2008

Abstract

Statistical design is a very important part of applied empirical studies. In this work the evaluation of the required size of an experiment in two specific cases of ANOVA models is proposed.

In the first part, the one-way layout with one fixed factor and an ordinal categorical response variable is considered. The Kruskal–Wallis test is used to test the equality of main effects and its properties are compared with those of the F-test. The distribution of the response variables is characterized by the relative effect. A formula for evaluating the sample size for the Kruskal–Wallis test was derived by simulation. The formula was then compared with the two Noether formulas for the Wilcoxon test in the case of two factor levels.

In the second part, the two-way ANOVA mixed model with one observation for each row–column combination is considered and tests of interaction in this model are studied. Five tests of additivity are covered, all developed for models with two fixed factors: the Tukey test, the Mandel test, the Johnson–Graybill test, the Tussel test and the locally best invariant (LBI) test. We confirmed by simulation that these tests hold the type-I-risk even in the mixed model case. Then their power was studied. The power of the Johnson–Graybill, LBI and Tussel tests is sufficient, but the power of the Tukey and Mandel tests is low for general types of interaction. A modification of the Tukey test is proposed to address this problem. Finally, a formula for determining the size of an experiment for the Johnson–Graybill test is derived.

Zusammenfassung (Summary)

Experimental design is an important part of applied empirical research. In this work the required experimental size is derived for some special cases of the analysis of variance (ANOVA).

In the first part, the one-way layout with one fixed factor and a categorical response variable is treated. The Kruskal–Wallis test is used to test the equality of the main effects and its properties are compared with those of the F-test. The distribution of the categorical response variable is characterized by the relative effect. A formula for determining the sample size for the Kruskal–Wallis test was derived with the help of simulations. This formula was then compared with two formulas by Noether for the Wilcoxon test, i.e. for the special case of a factor with two levels.

In the second part, the two-way layout with single subclass numbers and a mixed model is considered. Five tests for the absence of interaction were studied, all developed for the model with two fixed factors: the Tukey test, the Mandel test, the Johnson–Graybill test, the Tussel test and a locally best invariant (LBI) test. As our simulation studies show, all these tests hold the type-I-risk also in the mixed model case. Regarding power, we found that the power of the Johnson–Graybill, LBI and Tussel tests is satisfactory, whereas the Tukey and Mandel tests have unsatisfactorily low power. A modification of the Tukey test was undertaken, but it did not yield a sufficient improvement in power. Finally, a formula for determining the experimental size for the Johnson–Graybill test was developed.


Contents

1 Introduction 3

2 One-Way Layout with One Fixed Factor for Ordered Categorical Data 4

2.1 Description of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.1.1 The parametric F -test . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.2 The nonparametric Kruskal – Wallis test . . . . . . . . . . . . . . . . . 7

2.1.3 Relative effect and its properties . . . . . . . . . . . . . . . . . . . . . 8

2.2 Design of experiment for the Kruskal – Wallis test . . . . . . . . . . . . . . . 11

2.2.1 Comparison of the F-test and the Kruskal – Wallis test for the one-way layout with ordinal categorical response . . . . . . . . . . . . . . . 11

2.2.2 Size of experiment for the Kruskal – Wallis test . . . . . . . . . . . . . 13

2.2.3 Size of experiment for the Wilcoxon test . . . . . . . . . . . . . . . . . 19

3 Tests of Additivity in Mixed Two-way ANOVA Model with Single Subclass Numbers 22

3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1.1 Description of the problem . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1.2 Tests of additivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2 Properties of the additivity tests . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.1 Type-I-risk of the additivity tests . . . . . . . . . . . . . . . . . . . . . 25

3.2.2 Simulation study of the power of the additivity tests . . . . . . . . . . 26

3.2.3 Size of experiment for the Johnson – Graybill test . . . . . . . . . . . 27

3.3 Modified Tukey test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30


Chapter 1

Introduction

In biology, agriculture, psychology and many other research fields experiments are very important methods to acquire knowledge. To obtain credible conclusions on the one hand and not to waste resources on the other hand, experiments must be carefully designed. An essential part of planning an experiment is the determination of the number of units included. This work is focused on two specific cases: estimation of power and sample size for the Kruskal–Wallis test, and for additivity tests in the two-way ANOVA model without replication.

The thesis is divided into two parts. In Chapter 2 the interest lies in the one-way layout with ordinal categorical response. The Kruskal–Wallis analysis of variance by ranks is applied. As demonstrated in the papers Rasch and Simeckova (2007) and Simeckova and Rasch (2008), the required sample size can be approximated by a formula comprising the number of response categories and the relative effect. The results are compared to analogous formulas for the F-test and the Wilcoxon test.

In Chapter 3, tests of the presence or absence of an interaction in the two-way ANOVA mixed model with only one observation in each subclass (additivity tests) are studied. Five tests originally developed for the fixed effects ANOVA are considered (Tukey test, Mandel test, Johnson–Graybill test, Tussel test and locally best invariant test) and their type-I-risk and type-II-risk in the mixed effects ANOVA are evaluated. The minimal sample size is estimated by means of simulation for the Johnson–Graybill test and an empirical formula for determining the sample size is presented. Finally, a modification of the Tukey test is proposed to increase the power for a generalized interaction scheme. All these results have already been submitted for publication, see Simecek and Simeckova (submitted), Rusch et al. (submitted) and Simeckova and Rasch (submitted).

All presented simulations were performed using the statistical environment R (R Development Core Team (2008)) on a grid of 48 Intel machines at the Supercomputing Centre Brno. I would like to thank the METACentrum project (http://meta.cesnet.cz) for the allocation of computing time.

I cannot forget to commemorate my former supervisor Prof. Harald Strelec. I am extremely grateful to Prof. Dieter Rasch, my colleagues from the Universität für Bodenkultur Wien and my friends from the Universität Wien; their suggestions improved this work substantially. A friendly environment and support were provided by the Institute of Animal Science in Prague.


Chapter 2

One-Way Layout with One Fixed Factor for Ordered Categorical Data

The aim of my work was to determine the size of experiments for a special case of the Kruskal–Wallis test (Kruskal and Wallis (1952)): for the test of equality of main effects in the one-way layout with an ordinal categorical response variable.

Because the exact distribution of the test statistic of the Kruskal–Wallis test under the alternative hypothesis is not known, results in the literature about its power and about sample size determination are sparse. In Mahoney and Magel (1996) a bootstrapping method for estimation of the power is presented; the simulation was performed for comparing distributions in 3 or 5 groups. A special case of the Kruskal–Wallis test, namely comparing distributions in two groups, is the Wilcoxon test (Wilcoxon (1945)). An approximation of the power of the Wilcoxon test is known (Lehmann (1975)) and in Noether (1987) two formulas for determination of the size of experiment for the Wilcoxon test are introduced.

In this chapter we will first summarize the F-test and the Kruskal–Wallis test, and then properties of the relative effect will be discussed. This knowledge will be used in Section 2.2.2, where the power of the Kruskal–Wallis test is investigated by simulation and a formula for determination of the size of experiment is derived. Last, outputs of this formula for the Wilcoxon test are compared to outputs of the two Noether formulas.

Let us conclude with the following definition: We say that a distribution function F is lower than another distribution function G, i.e. F < G, if F(x) ≤ G(x) for all x ∈ R and there exists a set A (with positive measure with respect to both distribution functions F and G) such that F(x) < G(x) for all x ∈ A.

Throughout the thesis random variables are printed in bold.

2.1 Description of the model

Let us consider a one-way layout y1, . . . , ya with distribution functions F1, . . . , Fa, respectively. The random variable yi corresponds to the i-th level of the factor A. For each i (i = 1, . . . , a) a random sample y_{i1}, . . . , y_{i n_i} is observed.

The aim is to design the experiment for testing an effect of the factor A. The null hypothesis

H0: F1 = F2 = · · · = Fa (2.1)

is tested against the alternative

HA: ∃ i, j : Fi < Fj or Fi > Fj . (2.2)

If F1, . . . , Fa are Gaussian distribution functions, the F-test can be used. For non-Gaussian distributions the Kruskal–Wallis test has to be used.

Our interest lies in the case when the response variables y1, . . . , ya are ordinal categorical, particularly in the case when y1, . . . , ya were derived from some continuous variables x1, . . . , xa by discretization. The formal definition follows.

Definition 1. Consider a continuous random variable x. A new ordered categorical random variable y with r categories is derived from x using a decomposition of the real line based on a set {ξ1, ξ2, . . . , ξr−1}, −∞ = ξ0 < ξ1 < ξ2 < · · · < ξr−1 < ξr = +∞.

Then y = i if and only if x lies in the interval (ξi−1, ξi], i = 1, . . . , r.

The set {ξ1, ξ2, . . . , ξr−1} is called the support of the decomposition.

Because we compare the sample size for the Kruskal–Wallis test with the sample size for the F-test, we will recall the F-test and the Kruskal–Wallis test and their properties. Then the definition of the relative effect will be introduced.

2.1.1 The parametric F -test

In this section properties of the F-test are shortly summarized. More information about them can be found e.g. in (Rasch et al., 2007, section 4.1.1), Scheffé (1959) or Lehmann (2005).

Let us consider a continuous random variable x. The model equation is written in the form (ANOVA model I):

x_{ij} = E(x_{ij}) + e_{ij} = \mu + \alpha_i + e_{ij}   (i = 1, . . . , a; j = 1, . . . , n_i),    (2.3)

where µ and αi are real numbers (i.e. non-random); it should hold that either \sum_{i=1}^{a} \alpha_i = 0 or \sum_{i=1}^{a} n_i \alpha_i = 0 (which are equivalent if n1 = · · · = na). The eij are mutually independent normally distributed random variables with E(eij) = 0 and var(eij) = σ².

The ni's and a are known constants. Let us denote N = \sum_{i=1}^{a} n_i and \bar{\alpha} = \sum_i \alpha_i / a.

In Rasch and Guiard (1990) it was shown that the F-test is quite robust and the assumption of normality of the error terms eij can be relaxed.

We want to design the experiment for testing the hypothesis

H0 : α1 = · · · = αa,    (2.4)

in other words "the factor A has no effect on the response variable", against the alternative

HA : ∃ i, j : αi ≠ αj.


If the errors eij are normally distributed, the exact test statistic for testing the null hypothesis (2.4) is equal to

F = \frac{MS_A}{MS_R},    (2.5)

where

MS_A = \frac{\sum_i n_i (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})^2}{a - 1} \quad \text{and} \quad MS_R = \frac{\sum_i \sum_j (x_{ij} - \bar{x}_{i\cdot})^2}{N - a}

are the mean squares of the factor A and the residual mean squares (\bar{x}_{i\cdot} row means, \bar{x}_{\cdot\cdot} the overall mean).

Under the null hypothesis F follows the central F-distribution with f1 = a − 1 and f2 = N − a degrees of freedom. Otherwise, it follows the non-central F-distribution with f1 = a − 1 and f2 = N − a degrees of freedom and the non-centrality parameter λ equals

\lambda = \frac{\sum_{i=1}^{a} n_i (\alpha_i - \bar{\alpha})^2}{\sigma^2}.    (2.6)

Let us denote by F(f1, f2; λ; p) the p-quantile of the non-central F-distribution with f1 and f2 degrees of freedom and non-centrality parameter λ. If λ = 0 we shorten F(f1, f2; 0; p) to F(f1, f2; p).

If the realization F of F in (2.5) exceeds F(a − 1, N − a; 1 − α) the null hypothesis is rejected on the level α, otherwise it is not rejected. For a = 2 the F-test coincides with the t-test for two independent samples.

A design of an experiment must involve specification of the required type-I-risk α and the required type-II-risk β. For determination of the sample size of the F-test two more parameters should be specified: the variance σ² of the random errors eij and δ = αmax − αmin (αmax = max(αi) the greatest and αmin = min(αi) the lowest of the effects α1, . . . , αa). We will determine the maxi-min sample size, which assures the type-II-risk for any α1, . . . , αa fulfilling αmax − αmin = δ. The maxi-min sample size will be denoted nmax.

The type-II-risk of the F-test depends on the non-centrality parameter λ and decreases as λ increases. For given δ the term (2.6) is minimized if the remaining a − 2 effects are all equal to (αmin + αmax)/2. For any other values λ would be higher and the type-II-risk lower.

If N = \sum_{i=1}^{a} n_i is fixed then the type-II-risk of the F-test is minimized when the subclass numbers ni are as equal as possible. If a is an integer divisor of N then the type-II-risk is minimized for n1 = · · · = na = n = N/a. Therefore we choose n1 = · · · = na = n.

To calculate the required sample size nmax we have to solve the quantile equation

F (a− 1, na− a; 1 − α) = F (a− 1, na− a;λ;β). (2.7)

If αmax = αmin + δ and the other a − 2 effects αi are equal to (αmin + αmax)/2 = αmin + δ/2 = \bar{\alpha}, then \sum_{i=1}^{a} (\alpha_i - \bar{\alpha})^2 = δ²/2 and so λ = nδ²/(2σ²). Thus λ depends on δ and σ only through their ratio; therefore only this ratio δ/σ, called the relative effect size, needs to be known for the calculation of the maxi-min sample size.

To conclude, for evaluation of the maxi-min sample size of the F-test (for a given number of factor levels a) the type-I-risk α, the type-II-risk β and the relative effect size δ/σ must be fixed. In Table 2.1 we report some values of nmax in dependence on δ/σ and β for α = 0.05 and a = 6.


Table 2.1: Maxi-min sample sizes for α = 0.05, a = 6 and different values of β and δ/σ.

δ/σ    β = 0.05   β = 0.1   β = 0.2
1         41         34        27
1.2       29         24        19
1.5       19         16        13
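The maxi-min sample size can also be computed numerically from the quantile equation (2.7). The following R sketch is illustrative only (the function name is ours, not from the thesis): it increases n until the type-II-risk of the F-test, evaluated from the non-central F-distribution with λ = nδ²/(2σ²), drops below the required β.

# Maxi-min sample size per group for the one-way ANOVA F-test (sketch).
# a: number of factor levels, delta_sigma: relative effect size delta/sigma.
ftest_sample_size <- function(a, delta_sigma, alpha = 0.05, beta = 0.20) {
  for (n in 2:10000) {
    lambda <- n * delta_sigma^2 / 2                        # non-centrality (2.6), least favourable case
    crit   <- qf(1 - alpha, df1 = a - 1, df2 = n * a - a)  # critical value of the F-test
    beta_n <- pf(crit, df1 = a - 1, df2 = n * a - a, ncp = lambda)  # actual type-II-risk
    if (beta_n <= beta) return(n)
  }
  NA
}
# ftest_sample_size(a = 6, delta_sigma = 1, alpha = 0.05, beta = 0.2)
# should be close to the value 27 reported in Table 2.1.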

2.1.2 The nonparametric Kruskal – Wallis test

The F-test for testing the hypothesis about the equality of means in a one-way ANOVA model I discussed in Section 2.1.1 is based on the assumption that the observed variables are normally distributed and their distributions in different groups differ only in their expected values. The Kruskal–Wallis test considered in this chapter can be used in cases where normality is questionable. The principle of this test will be explained in brief; for details see e.g. (Lehmann, 1975, chapter 5, section 2).

Let y1, . . . , ya be random variables with distribution functions F1, . . . , Fa; the yi corresponds to the observed variable in the i-th level of the factor A. We will test the hypothesis

H0: F1 = F2 = · · · = Fa,    (2.8)

against the alternative

HA: ∃ i, j : Fi < Fj or Fi > Fj.    (2.9)

For each i (i = 1, . . . , a) there are n_i realizations of the random variable y_i, denoted by y_{i1}, y_{i2}, . . . , y_{i n_i}; N = \sum_i n_i. All the observed values y_{ij} are pooled into one vector and ordered. Let r_{ij} be the rank of the value y_{ij} in this sequence. Then the sum of ranks in each factor level is evaluated, i.e. T_i = \sum_{j=1}^{n_i} r_{ij}, i = 1, . . . , a.

The test statistic of the Kruskal–Wallis test is equal to

Q = \frac{12}{N(N+1)} \sum_{i=1}^{a} \frac{T_i^2}{n_i} - 3(N+1).    (2.10)

This basic version of the Kruskal–Wallis test assumes that the distribution functions Fi are continuous and therefore (almost surely) there are no ties. In our case of categorical variables yi this is not true and a Kruskal–Wallis test corrected for ties should be used (Kruskal and Wallis (1952)). Let s be the number of distinct values of the observations and tl the number of tied values of the l-th smallest observed value, l = 1, . . . , s. Then the corrected test statistic is equal to

Q_K = \frac{Q}{1 - \dfrac{\sum_{l=1}^{s} (t_l^3 - t_l)}{N^3 - N}}.    (2.11)

Let us note that Q is a special case of QK , where tl = 1 for all l = 1, . . . , s = N .

The test statistics Q and QK are under H0 asymptotically χ²_{a−1}-distributed. For small sample sizes the critical values are tabulated in software or tables.

For a = 2 (comparison of distributions in only two groups) the test statistics simplify to the Wilcoxon (Mann–Whitney) test statistics.
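As a minimal illustration (the helper name kw_statistic is ours, not from the thesis), the tie-corrected statistic QK of (2.10) and (2.11) can be computed directly from the ranks; R's built-in kruskal.test applies the same correction.

kw_statistic <- function(y, group) {
  N  <- length(y)
  r  <- rank(y)                          # mid-ranks handle tied observations
  Ti <- tapply(r, group, sum)            # rank sums per factor level
  ni <- tapply(r, group, length)
  Q  <- 12 / (N * (N + 1)) * sum(Ti^2 / ni) - 3 * (N + 1)   # (2.10)
  tl <- table(y)                         # tie sizes t_l of the distinct values
  Q / (1 - sum(tl^3 - tl) / (N^3 - N))   # correction for ties (2.11)
}
# set.seed(1); y <- sample(1:4, 30, TRUE); g <- gl(3, 10)
# kw_statistic(y, g); kruskal.test(y, g)$statistic   # same value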


2.1.3 Relative effect and its properties

The aim of this work is to design an experiment for the Kruskal–Wallis test, i.e. to find the maxi-min sample size for given type-I-risk and type-II-risk. Because the exact distribution of the Kruskal–Wallis test statistic under the alternative hypothesis is not known, we derived a formula for determining the sample size by simulation.

We assume that the ordinal response variables y1, . . . , ya are discretized from underlying continuous random variables x1, . . . , xa (see Definition 1 on page 5). The loss of information caused by the discretization is measured by the so-called relative effect.

In more detail, the ordered categorical random variable y takes realizations belonging to r ordered categories C1 ≺ C2 ≺ · · · ≺ Cr with r ≥ 1 (we use the symbol ≺ to denote the order relation). We need a measure for the distance between two distributions. For this we use the approach of (Brunner and Munzel, 2002, section 1.4, formula (1.4.1)).

First, let us recall the definition of a distribution function. The concept of Brunner and Munzel (2002) is used to treat the problem of discontinuity of the distribution function.

Definition 2. Let y be a random variable. Then we define the (standardized) distribution function of y as

F(y) = \frac{1}{2} \big( P(\mathbf{y} < y) + P(\mathbf{y} \le y) \big).

The relative effect is defined as follows.

Definition 3. For two independent random variables y1 and y2 with distribution functions F1(y) and F2(y) respectively, the probability

p(y_1; y_2) = P(y_1 < y_2) + \frac{1}{2} P(y_1 = y_2) = \int F_1 \, dF_2

is called the relative effect of y2 with respect to y1.

Note that the relative effect p(y1; y2) is equal to 1 − p(y2; y1), and for continuous random variables p(y1; y2) = P(y1 < y2).

Our aim is to find a connection between the non-centrality parameter of the continuous model and the relative effect of the model with ordinal variables. Properties of the relative effect will be discussed in this section in detail.

For the simulation experiments we generate ordered categorical variables by decomposition of the real line, as described in Definition 1 (page 5). The relative effect for this case is computed in the following example.

Example 1. Let x be a random variable with distribution function F. The random variable y is derived from it using the decomposition with support {ξ1, ξ2, . . . , ξr−1}. Then it holds that P(y = i) = F⁺(ξi) − F⁺(ξi−1), where F⁺ denotes the right-continuous version of the distribution function, F⁺(ξ) = P(x ≤ ξ).

If x1 and x2 are two variables with distribution functions F1 and F2, respectively, and the variables y1 and y2 are derived from them using the same support {ξ1, ξ2, . . . , ξr−1}, the relative effect of y2 with respect to y1 can be evaluated as follows:

p(y_1; y_2) = P(y_1 < y_2) + \frac{1}{2} P(y_1 = y_2)
            = \sum_{j=2}^{r} \Big( P(y_2 = j) \cdot \sum_{i=1}^{j-1} P(y_1 = i) \Big) + \frac{1}{2} \sum_{j=1}^{r} P(y_1 = j) \cdot P(y_2 = j)
            = \sum_{j=2}^{r} \sum_{i=1}^{j-1} \big( F_2^+(\xi_j) - F_2^+(\xi_{j-1}) \big) \cdot \big( F_1^+(\xi_i) - F_1^+(\xi_{i-1}) \big)
              + \frac{1}{2} \sum_{j=1}^{r} \big( F_2^+(\xi_j) - F_2^+(\xi_{j-1}) \big) \cdot \big( F_1^+(\xi_j) - F_1^+(\xi_{j-1}) \big).    (2.12)
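As an illustration of formula (2.12), the following R sketch (the function name is ours and normal underlying variables are assumed) computes the relative effect of two discretized distributions sharing a common support.

relative_effect <- function(support, mean1, mean2, sd = 1) {
  xi <- c(-Inf, support, Inf)
  p1 <- diff(pnorm(xi, mean1, sd))   # P(y1 = i), i = 1, ..., r
  p2 <- diff(pnorm(xi, mean2, sd))   # P(y2 = j)
  r  <- length(p1)
  # P(y1 < y2) + 0.5 * P(y1 = y2), formula (2.12)
  sum(sapply(2:r, function(j) p2[j] * sum(p1[1:(j - 1)]))) + 0.5 * sum(p1 * p2)
}
# e.g. relative_effect(c(30, 50, 70), 41.67, 58.33, sd = 16.67) for two of the
# variables used later in Section 2.2.1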

The following theorem shows the relation between the relative effect of two continuous variables and that of two ordinal categorical variables derived from the continuous ones by decomposition.

Theorem 1. Let x1 and x2 be two independent continuous random variables and y^r_1, y^r_2 two ordinal categorical variables with r categories, derived from x1, x2 by decomposition with support {ξ1, ξ2, . . . , ξr−1}. Then:

(i) If r = 1 (i.e. y^1_1 = y^1_2 = 1 with probability 1) then p(y_1; y_2) = 1/2.

(ii) If r → ∞ and max_i |ξ_{i+1} − ξ_i| → 0 then

p(y^r_1; y^r_2) → p(x_1; x_2).    (2.13)

(iii) If p(x_1; x_2) > 1/2 and

∀ δ1, δ2, δ3 ∈ R*, δ1 < δ2 < δ3:
P(δ1 < x1 < δ2) · P(δ2 < x2 < δ3) ≥ P(δ1 < x2 < δ2) · P(δ2 < x1 < δ3),    (2.14)

then the convergence in (2.13) is monotone and therefore for each r

p(x_1; x_2) ≥ p(y^{r+1}_1; y^{r+1}_2) ≥ p(y^r_1; y^r_2) ≥ 1/2.

Proof. (i) Obviously:

p(y_1; y_2) = P(y_1 < y_2) + \frac{1}{2} P(y_1 = y_2) = 0 + \frac{1}{2} \cdot 1 = \frac{1}{2}.

(ii) It holds that

p(x_1; x_2) − p(y^r_1; y^r_2) = \frac{1}{2} \big( P(x_1 < x_2, y^r_1 = y^r_2) − P(x_1 > x_2, y^r_1 = y^r_2) \big).    (2.15)

It follows that

|p(x_1; x_2) − p(y^r_1; y^r_2)| < P(y^r_1 = y^r_2) → 0.


(iii) Let us proceed by induction on r. Take any y^r_1, y^r_2 and refine their support of decomposition by adding one point φ somewhere between some ξ_i and ξ_{i+1}. Denote by z_1, z_2 the categorical variables based on this new decomposition.

Analogously as in (2.15),

p(z_1; z_2) − p(y^r_1; y^r_2) = \frac{1}{2} \big( P(z_1 < z_2, y^r_1 = y^r_2) − P(z_1 > z_2, y^r_1 = y^r_2) \big).

Then, using this and the independence of the variables x_1, x_2,

p(z_1; z_2) − p(y^r_1; y^r_2) = \frac{1}{2} \big( P(ξ_i < x_1 ≤ φ) \cdot P(φ < x_2 ≤ ξ_{i+1}) − P(φ < x_1 ≤ ξ_{i+1}) \cdot P(ξ_i < x_2 ≤ φ) \big).

This expression should be nonnegative. Using the condition (2.14) for δ_1 = ξ_i, δ_2 = φ and δ_3 = ξ_{i+1} finishes the proof.

Part (ii) of Theorem 1 tells us that random variables with countably infinite support (i.e. variables y^r_1, y^r_2, which take the values 1, 2, . . . , r, r → ∞) can, for a sufficiently dense support, have almost the same relative effect as variables with uncountable support (i.e. variables x_1, x_2, which take values from some interval of real numbers).

Part (iii) gives the condition needed for the relative effect to increase when a new support point is added. If we take δ1 = ξi and δ3 = ξi+1 as elements of the original support and δ2 as a new element added between ξi and ξi+1, the condition (2.14) is necessary for the relative effect to increase in this particular case.

The condition is fulfilled e.g. for x1, x2 uniformly distributed. It is usually fulfilled for normally distributed variables, but it does not hold generally, as is shown in Example 2. There the condition is not valid because the difference between the expected values of x1 and x2 is very small (relative to their standard deviation).

Example 2. Consider normally distributed random variables x1 and x2 with common variance equal to 1 and expected values equal to −0.1 and 0.1, respectively. Two random variables y1 and y2 are derived from them using the support {−0.4, −0.2}. Then p(y1; y2) = 0.5020 (this is easy to see using formula (2.12)).

If a new point −0.3 is added to the support and random variables z1, z2 are derived from x1 and x2 using the support {−0.4, −0.3, −0.2}, the relative effect is p(z1; z2) = 0.5004. This value is lower than the value of p(y1; y2).

The distribution functions of y1, y2 and z1, z2 are plotted in Figure 2.1.

Figure 2.1: The distribution functions of y1, y2 and z1, z2 for Example 2; y1 and z1 solid lines, y2 and z2 dashed lines.

2.2 Design of experiment for the Kruskal – Wallis test

2.2.1 Comparison of the F-test and the Kruskal–Wallis test for the one-way layout with ordinal categorical response

Some researchers use the F-test for the comparison of means of ordered categorical variables although it is not correct. In this section we compare the nominal and the actual risks to investigate the impact of this mistake.

Let us consider three normally distributed random variables x1, x2, x3, with the same standard deviation σ = 50/3 = 16.67, and different expected values µ1 = 50 − σ/2 = 41.67, µ2 = 50, µ3 = 50 + σ/2 = 58.33. Three categorical random variables y1, y2, y3 are derived from the variables x1, x2, x3 using the support {30, 50, 70} (see Definition 1 on page 5). We are comparing a = 3 variables, each attaining r = 4 values.

We are interested in the properties of testing the hypothesis

H0: µ1 = µ2 = µ3, (2.16)

against the alternative

HA: ∃ i, j : µi ≠ µj.

We test this hypothesis using the original normally distributed variables x1, x2, x3 or the ordinal categorical variables y1, y2, y3. The F-test and the Kruskal–Wallis test will be performed in both cases.

Let αnom denote the nominal type-I-risk (which should not be violated) and αact the actual type-I-risk (which is attained) of the performed test (F-test or Kruskal–Wallis test) and given variables (continuous or categorical). The αact is estimated by simulation. Analogously, βnom and βact are used for the type-II-risk.


Table 2.2: The actual type-II-risk. The bold printed values are the 20 % robust results.

                     Normal            Categorical        Normal             Categorical
Nominal values       F-test            Kruskal–Wallis     Kruskal–Wallis     F-test
αnom  βnom   n       βact    sd(βact)  βact    sd(βact)   βact    sd(βact)   βact    sd(βact)
0.10  0.20   17      0.1808  0.0027    0.2400  0.0033     0.2018  0.0038     0.2309  0.0024
0.10  0.15   19      0.1410  0.0024    0.1975  0.0039     0.1589  0.0030     0.1880  0.0030
0.10  0.10   22      0.0962  0.0020    0.1440  0.0036     0.1117  0.0031     0.1369  0.0036
0.10  0.05   27      0.0484  0.0017    0.0827  0.0023     0.0584  0.0020     0.0776  0.0026
0.05  0.20   21      0.1850  0.0037    0.2582  0.0032     0.2108  0.0038     0.2425  0.0047
0.05  0.15   23      0.1454  0.0023    0.2128  0.0044     0.1685  0.0026     0.2004  0.0045
0.05  0.10   27      0.0931  0.0021    0.1452  0.0020     0.1096  0.0026     0.1341  0.0030
0.05  0.05   32      0.0483  0.0020    0.0881  0.0025     0.0602  0.0019     0.0799  0.0030
0.01  0.20   30      0.1889  0.0032    0.2820  0.0027     0.2234  0.0023     0.2573  0.0027
0.01  0.15   33      0.1405  0.0026    0.2254  0.0030     0.1696  0.0024     0.2037  0.0035
0.01  0.10   37      0.0943  0.0031    0.1628  0.0033     0.1165  0.0034     0.1451  0.0036
0.01  0.05   43      0.0485  0.0017    0.0974  0.0027     0.0630  0.0029     0.0856  0.0019

The main question is whether the difference between the actual and the nominal type-II-risks is substantial. To assess the difference the concept of 20 % robustness as defined in Rasch and Guiard (1990) is used: a test is called 20 % robust if |βnom − βact| ≤ 0.2 · βnom.

Simulation

Three levels of the nominal type-I-risk (0.10, 0.05, 0.01) and four levels of the nominal type-II-risk (0.20, 0.15, 0.10, 0.05) were considered. For each combination of these nominal risks the maxi-min sample size n for the F-test was evaluated (see Section 2.1.1). A sample of size n was generated for each of the three normally distributed random variables x1, x2, x3 and from them the categorical random variables y1, y2, y3 were derived by decomposition with the support {30, 50, 70}.

For both the normal and the categorical variables the F-test and the Kruskal–Wallis test were performed. This was repeated 100 000 times and the proportion of non-significant tests was recorded; this is the (estimated) actual type-II-risk. The repetitions were divided into 10 blocks of 10 000 and the βact were recorded for each of them; these values were used to estimate the standard deviation of βact (denoted sd(βact)).

For the actual type-I-risk, the simulation was made in an analogous way, just the means of the variables x1, x2, x3 were all equal to µ1 = µ2 = µ3 = 50 and the ratio of significant tests was recorded as the estimate of αact.

Results

Tables 2.2 and 2.3 report the actual type-II-risks and type-I-risks of the F-test and the Kruskal–Wallis test. The sample size n is the maxi-min sample size of the F-test for the given αnom and βnom. For each test the standard deviation sd of the estimate of the risk is reported in the second column.

It is not surprising that the actual type-I-risk is in all cases lower than the nominal risk (the few opposite cases are caused by the error of the estimate, as is seen from the standard deviations). The tests are constructed to keep the given level of the type-I-risk.


Table 2.3: The actual type-I-risk.

                     Normal            Categorical        Normal             Categorical
Nominal values       F-test            Kruskal–Wallis     Kruskal–Wallis     F-test
αnom  βnom   n       αact    sd(αact)  αact    sd(αact)   αact    sd(αact)   αact    sd(αact)
0.10  0.20   17      0.1013  0.0030    0.0995  0.0029     0.0999  0.0020     0.1007  0.0035
0.10  0.15   19      0.1004  0.0028    0.0994  0.0046     0.0997  0.0033     0.1010  0.0039
0.10  0.10   22      0.0996  0.0031    0.0994  0.0040     0.0987  0.0036     0.0998  0.0040
0.10  0.05   27      0.0984  0.0022    0.0981  0.0018     0.0982  0.0020     0.0985  0.0025
0.05  0.20   21      0.0496  0.0029    0.0484  0.0028     0.0474  0.0029     0.0509  0.0032
0.05  0.15   23      0.0494  0.0016    0.0492  0.0015     0.0485  0.0013     0.0508  0.0020
0.05  0.10   27      0.0505  0.0026    0.0487  0.0027     0.0495  0.0032     0.0516  0.0024
0.05  0.05   32      0.0497  0.0012    0.0490  0.0015     0.0493  0.0018     0.0509  0.0010
0.01  0.20   30      0.0101  0.0010    0.0090  0.0011     0.0090  0.0011     0.0104  0.0013
0.01  0.15   33      0.0098  0.0011    0.0091  0.0004     0.0089  0.0010     0.0100  0.0008
0.01  0.10   37      0.0099  0.0012    0.0097  0.0010     0.0092  0.0010     0.0107  0.0014
0.01  0.05   43      0.0096  0.0008    0.0094  0.0010     0.0091  0.0007     0.0101  0.0010

The results for the type-II-risk are more interesting. In Table 2.2 the 20 % robust results are printed in bold. Naturally, for normally distributed random variables the actual type-II-risk of the F-test is close to the nominal one. The type-II-risk of the Kruskal–Wallis test is a bit higher; it is not 20 % robust in two cases.

For the ordinal categorical variables it seems that the actual type-II-risk of the F-test is closer to the nominal one than the risk of the Kruskal–Wallis test. However, the difference between the nominal risks and the actual ones is greater than the 20 % required by the concept of 20 % robustness. The βact lies neither in the interval [0.08, 0.12] for βnom = 0.10, nor in the interval [0.12, 0.18] for βnom = 0.15, nor in the interval [0.16, 0.24] for βnom = 0.20, nor in the interval [0.04, 0.06] for βnom = 0.05. Note that the increase of the type-II-risk is partially caused by the discretization, which decreases the relative effect size δ/σ.

It follows that the maxi-min sample size computed for the F-test and a continuous response variable is lower than the required maxi-min sample size for the Kruskal–Wallis test and ordinal categorical response variables. A sample size method for the Kruskal–Wallis test is therefore necessary and will be discussed in the following part.

2.2.2 Size of experiment for the Kruskal – Wallis test

The Kruskal–Wallis test is used to test the hypothesis of equal distributions (2.8). To keep the appropriate type-II-risk β it is necessary to determine the maxi-min sample size.

Because the exact distribution of the Kruskal–Wallis test statistic (2.10) or (2.11) under the alternative hypothesis is not known, evaluation of the power of the test and the subsequent determination of the sample size is very problematic in the case of ordinal categorical response variables. In this section a formula for computing the sample size was derived by simulation.

If only two groups are compared, the Kruskal–Wallis test coincides with the Wilcoxon test. The asymptotic power of the Wilcoxon test statistic under the alternative hypothesis is derived in Lehmann (1975). In Noether (1987) two formulas for determining the size of experiments were provided. Properties of Noether's formula are described in detail in Chakraborti et al. (2006). In Section 2.2.3 our results for the Kruskal–Wallis test are compared to Noether's formulas.

Table 2.4: The parameters and properties of the used distributions of the Fleishman system. All the distributions have zero mean and standard deviation 1.

No. of distr.   Skewness   Kurtosis   u = −s            t                 v
1               0          3.75       0                 0.748020807992    0.077872716101
2               0          7          0                 0.630446727840    0.110696742040
3               1          1.5        0.163194276264    0.953076897706    0.006597369744
4               2          7          0.260022598940    0.761585274860    0.053072273491
5 (Normal)      0          0          0                 1                 0
6 (Uniform)     0          −1.2

Let us consider a ≥ 2 continuously distributed random variables x1, . . . , xa. We want to test whether their means are all equal or whether there is at least one pair of these variables with different means.

Instead of these continuous variables, only the ordinal categorical variables y1, . . . , ya are observed. They are derived from the variables x1, . . . , xa using the decomposition based on a support {ξ1, ξ2, . . . , ξr−1}, as described in Definition 1 (page 5).

In this section, the simulation to determine the type-II-risk β for given sample size is described.

Simulation

For the simulation experiment it is important to choose the mechanism for generating the categorical random variables. We used six different types of distribution of the underlying continuous variables.

All the considered distributions have zero mean and standard deviation 1. The distributions differ in skewness and kurtosis. The first distribution is the normal distribution, i.e. both the skewness and the kurtosis are equal to 0. The second distribution is the uniform distribution on the interval (−√3, √3); its skewness is equal to 0 and its kurtosis to −1.2.

The other distributions come from the Fleishman system, described in Rasch and Guiard (1990). The random variable has the form s + tx + ux² + vx³, where x is a standard normally distributed random variable and s, t, u, v are given parameters. Values of these parameters and properties of the distributions can be found in Table 2.4. The densities of these distributions are plotted in Figure 2.2.
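A minimal sketch of how such variables can be generated in R (the function name is ours; the coefficients below are those of distribution no. 1 in Table 2.4):

rfleishman <- function(n, t, u, v, s = -u) {
  x <- rnorm(n)                   # standard normal basis variable
  s + t * x + u * x^2 + v * x^3   # Fleishman transformation
}
y <- rfleishman(1e5, t = 0.748020807992, u = 0, v = 0.077872716101)
c(mean(y), sd(y))                 # should be close to 0 and 1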

For each of these distributions two different decompositions are used:

• Equidistant: The support points are equidistant in the area in which 99 % of the observations lie.

• Equal mass: The measure of all categories is equal with respect to the given distribution. This means that an observation falls into each category with the same probability.

We emphasize that the support points were fixed for the distributions with zero mean.

Figure 2.2: Densities of the distributions from the Fleishman system.

The parameters of the simulation are as follows:

• The number of groups a equals 2, 3, . . . , 10 for the normal distribution and 2, 3, 4, 6, 8, 10 for the others.

• The difference between the minimal and the maximal expected values δ equals 1.67, 1.25, 1.11, 1.

• The number of categories of the ordinal variables r equals 3, 4, 5, 10, 50.

• One of the six distributions from Table 2.4 and a decomposition with equidistant support points or support points of equal mass (see the previous paragraph) is used.

• The standard deviation of error terms eij equals σ = 1 (implying δ/σ = δ).

• The type-I-risk equals α = 0.05.

Let us fix a, δ, r, a distribution and a support of decomposition. The expected values of x1, . . . , xa are taken as µ1 = −δ/2, µ2 = +δ/2, µ3 = · · · = µa = 0.

To choose the sample size n the following procedure is used. Our focus is on type-II-risks between 0.40 and 0.05. The first n is the maxi-min sample size of the F-test for the type-II-risk β equal to 0.40 (formula (2.7) in Section 2.1.1). For this n the type-II-risk of the Kruskal–Wallis test is estimated (see below) and n is increased by 1 until the estimated type-II-risk is lower than 0.05.

For a fixed sample size n one step of the simulation is as follows:


1. Continuous random samples of size n were generated for each group with the appropriate expected value. Then they were transformed to the ordinal categorical variables using the given support of decomposition.

2. The Kruskal–Wallis test was performed and the result was recorded.

These two steps were repeated 10 000 times. The (estimated) actual type-II-risk β for the given sample size n is equal to the proportion of non-significant tests in these repetitions.

For the further analysis only the results with actual type-II-risk β smaller than 0.40 were used.
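A condensed sketch of one such type-II-risk estimation in R (function and argument names are ours; normal underlying variables are assumed):

kw_type2_risk <- function(n, a, delta, support, reps = 10000, alpha = 0.05) {
  mu     <- c(-delta / 2, delta / 2, rep(0, a - 2))      # expected values as above
  breaks <- c(-Inf, support, Inf)
  g      <- factor(rep(seq_len(a), each = n))
  mean(replicate(reps, {
    x <- rnorm(n * a, mean = rep(mu, each = n))          # continuous samples
    y <- cut(x, breaks, labels = FALSE)                  # ordinal categories
    kruskal.test(y, g)$p.value >= alpha                  # non-significant?
  }))
}
# e.g. kw_type2_risk(n = 31, a = 2, delta = 1, support = c(-1, 0, 1))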

Formula

By inspection of the results of the simulation it was found that there is an almost linear dependence of the required sample size on the maxi-min sample size for the ANOVA F-test with normally distributed variables (for given a, δ/σ, relative effect p, and number of categories r). Many linear models were tried. The model below was chosen as the most appropriate (good fit and not too many factors).

Given the type-I-risk α = 0.05, the maxi-min sample size for the Kruskal–Wallis test can be computed as follows:

n(\beta) = 3.054\, n_0(\beta) - 47.737\,\frac{\delta}{\sigma} + 51.288\, p^2 + 82.050\,\frac{1}{r}
         + 2.336\, n_0(\beta)\,\frac{\delta}{\sigma} - 7.428\, n_0(\beta)\, p^2 - 0.535\, n_0(\beta)\,\frac{1}{r}
         + 29.708\,\frac{\delta}{\sigma}\, p^2 + 56.102\,\frac{\delta}{\sigma}\,\frac{1}{r} - 223.770\, p^2\,\frac{1}{r},    (2.17)

where n_0(\beta) = n_0(\beta, a, \delta, \sigma) is the maxi-min sample size for the F-test.
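For convenience, formula (2.17) is easy to evaluate directly; a minimal R sketch (the function name is ours) rounds the result up to the next integer:

kw_sample_size <- function(n0, delta_sigma, p, r) {
  ceiling(3.054 * n0 - 47.737 * delta_sigma + 51.288 * p^2 + 82.050 / r +
          2.336 * n0 * delta_sigma - 7.428 * n0 * p^2 - 0.535 * n0 / r +
          29.708 * delta_sigma * p^2 + 56.102 * delta_sigma / r -
          223.770 * p^2 / r)
}
# First row of Table 2.5 (a = 2, delta/sigma = 1, p = 0.66, r = 3, n0 = 16.71):
# kw_sample_size(16.71, 1, 0.66, 3) gives 35, matching the tabulated n_FIT.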

Formula (2.17) estimates the sample size very well: only 4.8 % of the residuals are larger than 20 % of the fitted value. Further, 9.0 % are larger than 15 % of the fitted value, 16.6 % are larger than 10 % and 30.8 % of the residuals are larger than 5 % of the fitted value.

Negative residuals are not so dangerous because they mean that the actual type-II-risk would be even lower than required. Using formula (2.17), 48 % of the residuals are negative. Of course, there is a loss of resources in this case.

Table 2.5 gives the sample sizes estimated by simulation and the sample sizes estimated using relation (2.17) for β = 0.20 and various choices of parameters. Figure 2.3 visualizes the residuals of model (2.17). The absolute values of the residuals increase with increasing sample size. The ratio of the residuals to the estimated sample size is almost constant for sample sizes over 25.

Discussion

The formula (2.17) for the determination of the required sample size, given in the previous paragraph, was derived for some specific cases. The six continuous distributions with different shapes and the two decompositions used provide eleven different distributions of categorical variables. The question arises to what extent the formula can legitimately be generalized.


Table 2.5: Comparison of required sample sizes attained by simulation and calculated by formula (2.17) for β = 0.2. The columns give the number of groups a, the relative effect size δ/σ, the identification of the underlying distribution (as in Table 2.4), the relative effect of the distribution of the categorical variables and their number of categories, the maxi-min sample size of the F-test for normal variables, and the maxi-min sample sizes for the Kruskal–Wallis test based on the simulation and calculated by formula (2.17).

Groups   δ/σ    Distribution   Rel. effect   Categories   n0(β)   nSIM   nFIT
2        1      1              0.66          3            16.71   31     35
2        1      1              0.78          5            16.71   14     15
2        1.67   1              0.77          3            6.76    11     11
2        1.67   1              0.89          5            6.76    7      7
6        1      1              0.66          3            26.59   47     55
6        1      1              0.78          5            26.59   22     22
6        1.67   1              0.77          3            10.2    16     19
6        1.67   1              0.89          5            10.2    10     10
2        1      3              0.69          3            16.71   28     30
2        1      3              0.77          5            16.71   17     17
2        1.67   3              0.8           3            6.76    11     10
2        1.67   3              0.88          5            6.76    7      7
6        1      3              0.69          3            26.59   44     46
6        1      3              0.77          5            26.59   27     26
6        1.67   3              0.8           3            10.2    16     16
6        1.67   3              0.88          5            10.2    11     10

Figure 2.3: Relation between the residuals of model (2.17) and the required sample size estimated from this model. The ratio of the residuals and the estimated sample sizes is plotted on the y-axis, the estimated sample sizes on the x-axis.

Table 2.6: Sample sizes computed by formula (2.17) for α = 0.05, a = 6, r = 5, p = 0.71 and different values of β and δ/σ.

δ/σ    β = 0.05   β = 0.1   β = 0.2
1         61         51        40
1.2       51         42        32
1.5       38         30        21

It should be noted that the formula is checked for four values of δ/σ between 1 and 1.7. It can be interpolated for all values in this interval. Similarly, it is assumed that the number of categories r can be interpolated for all integer values between 3 and 50. With r → ∞ there is a decreasing influence of r on the required size of an experiment (the distribution tends to a continuous one, which is reflected by the fact that the formula contains 1/r; see Theorem 1 in Section 2.1.3). Therefore, if r > 50 use formula (2.17) as for r = 50.

For a comparison of the sample sizes needed for quantitative and ordered categorical variables we calculated by formula (2.17) the values analogous to those of Table 2.1; the results are in Table 2.6. The required sample size is always higher in the case of the Kruskal–Wallis test and ordinal categorical variables (the relative effect is lower in that case) than for the F-test and continuous normally distributed variables, and the difference is considerable.

To summarize, the required size of an experiment with ordinal categorical variables for given type-I-risk α = 0.05, type-II-risk β in the interval [0.05, 0.4], δ/σ in the interval [1, 1.7] and number of compared groups a between 2 and 10 can be calculated by formula (2.17). There are no restrictions on the other parameters.

2.2.3 Size of experiment for the Wilcoxon test

The Wilcoxon test is a special case of the Kruskal–Wallis test for the number of groups equal to a = 2. In Chakraborti et al. (2006) two formulas for evaluating the sample size for the Wilcoxon test are mentioned.

Noether's formula is derived for local alternatives and the required sample size is computed as

n_{NF} = \left\lceil \frac{\big(\Phi^{-1}(1-\alpha/2) + \Phi^{-1}(1-\beta)\big)^2}{6\,(p-0.5)^2} \right\rceil,    (2.18)

where Φ^{-1} is the quantile function of the standard normal distribution, α and β are the risks of the first and the second kind and p is the relative effect. This formula is quite simple, which is useful in practice. More accurate, but demanding more inputs, is the second formula (it was derived for a one-tailed alternative with α instead of α/2):

n_{F2} = \left\lceil \frac{\Big(\Phi^{-1}(1-\alpha/2)/\sqrt{6} + \Phi^{-1}(1-\beta)\,\sqrt{(p_3 - p^2) + (p_2 - p^2)}\Big)^2}{(p-0.5)^2} \right\rceil,    (2.19)

where p2 = P(x1 < x2 and x1 < x2′) and p3 = P(x1 < x2 and x1′ < x2), while x1, x1′ are independent random variables distributed as in the group with the lower expected value, and x2, x2′ are independent random variables distributed as in the group with the greater expected value.

Note that both of Noether’s formulas were derived assuming continuous distributions of theresponse variables.

Values of the sample size computed using formula (2.17) derived in Section 2.2.2 and the formulas (2.18) and (2.19) from Chakraborti et al. (2006) in some special cases can be found in Table 2.7.

The left part of Table 2.8 shows the differences between the simulated and the calculated sample sizes, given as a percentage of the calculated sample size. In the ideal case most of the observations should be in the row of 0 %, as happens for formula (2.17). The two Noether formulas tend to overestimate the required sample size. The right part of Table 2.8 shows analogous data for type-II-risks β between 0.1 and 0.3 only.


Table 2.7: Comparison of required sample sizes for the Wilcoxon test evaluated using simulation and the formulas (2.17), (2.18) and (2.19) for β = 0.2 and some values of the other parameters.

δ/σ    p      p2     p3     r   n0(β)   Simulated   (2.17)   (2.18)   (2.19)
1      0.66   0.47   0.54   3   16.71   31          35       55       53
1      0.78   0.66   0.71   4   16.71   14          15       17       17
1      0.78   0.67   0.71   5   16.71   14          15       17       16
1.67   0.77   0.63   0.7    3   6.76    11          11       18       17
1.67   0.89   0.82   0.85   4   6.76    7           7        9        8
1.67   0.89   0.82   0.85   5   6.76    7           7        9        8
1      0.69   0.52   0.58   3   16.71   28          30       37       36
1      0.76   0.65   0.67   4   16.71   17          18       20       20
1      0.77   0.66   0.67   5   16.71   17          17       19       19
1.67   0.8    0.68   0.73   3   6.76    11          10       15       13
1.67   0.88   0.82   0.83   4   6.76    7           7        10       9
1.67   0.88   0.83   0.83   5   6.76    7           7        9        8
1      0.71   0.56   0.63   3   16.71   22          26       30       30
1      0.74   0.61   0.66   4   16.71   20          21       23       23
1      0.75   0.62   0.66   5   16.71   19          20       22       21
1.67   0.83   0.71   0.78   3   6.76    9           9        13       12
1.67   0.86   0.77   0.81   4   6.76    8           7        11       9
1.67   0.87   0.78   0.81   5   6.76    8           7        10       9
1      0.69   0.53   0.59   3   16.71   31          29       36       36
1      0.72   0.57   0.62   4   16.71   25          24       28       28
1      0.73   0.59   0.62   5   16.71   22          22       25       24
1.67   0.82   0.72   0.77   3   6.76    10          9        13       13
1.67   0.86   0.77   0.81   4   6.76    8           7        11       9
1.67   0.86   0.76   0.8    5   6.76    8           7        11       9


Table 2.8: The differences between the simulated sample sizes and the sample sizes evaluated using the formulas (2.17), (2.18) and (2.19), given as a percentage of the evaluated sample sizes. The cells contain the percentages of cases in the whole data set.

                              β ∈ [0.05, 0.4]              β ∈ [0.1, 0.3]
Percentage                    (2.17)   (2.18)   (2.19)     (2.17)   (2.18)   (2.19)
positive diff.  > 40 %        0.1      0        0          0        0        0
                20 %–40 %     1.6      0        0          1.1      0        0
                10 %–20 %     9.1      0        0.2        9.3      0        0
                5 %–10 %      8.0      0        1.0        8.2      0        0.1
                0 %–5 %       3.8      0        0.5        2.8      0        0
0 %                           26.4     0        12.7       27.1     0        13.9
negative diff.  0 %–5 %       5.8      0        1.5        3.6      0        1.1
                5 %–10 %      20.4     2.1      12.3       22.0     1.1      12.0
                10 %–20 %     21.4     43.0     33.9       23.6     43.9     33.8
                20 %–40 %     3.1      43.3     30.9       2.3      43.0     32.4
                > 40 %        0.2      11.6     6.9        0        12.1     6.7


Chapter 3

Tests of Additivity in Mixed Two-way ANOVA Model with Single Subclass Numbers

Two-way ANOVA models are a well known class of linear models that allow estimation and testing of two main effects and one interaction effect. Usually, the number of replications per factor level combination or cell is greater than one, which enables estimation of the main effects and the interaction effect simultaneously. If the number of replications in each cell is equal to one, the classic way of estimating or testing the interaction effect is not applicable anymore.

Such a situation arises, for example, when patients may react differently to the same drug treatment and it is infeasible to test several drugs on one patient. Testing for an interaction in such a model will be referred to here as testing the additivity hypothesis.

The first test of additivity in the case of single subclass numbers was proposed by John Tukey in Tukey (1949). This test was derived for a very special type of interaction (see Ghosh and Sharma (1963) and Hegeman and Johnson (1976)). Other tests for some more general interactions include Mandel (1961), Johnson and Graybill (1972), Tusell (1990) and Boik (1993b); the tests are nicely summarized in Alin and Kurt (2006).

Unfortunately, all of these tests were developed for the case of the fixed effects model. Since many possible applications correspond to a mixed effects model, it is necessary to find out whether the proposed additivity tests can be used with mixed effects as well and, if so, how powerful they are.

In this chapter we will summarize the known additivity tests. Then the type-I-risk and the power of these tests for the mixed effects model are studied and a formula for the size of the experiment is provided. Finally, a modification of the Tukey test is derived.


3.1 Preliminaries

3.1.1 Description of the problem

In this section we will discuss the two-way ANOVA models. First, the model with both factors and the interaction fixed is considered. The response in the ith row and the jth column is modeled as follows:

yij = µ + αi + βj + (αβ)ij + eij ,   i = 1, . . . , a,  j = 1, . . . , b,    (3.1)

where µ, αi, βj and (αβ)ij are real constants such that

\sum_i \alpha_i = \sum_j \beta_j = \sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0

and the eij are normally distributed independent random variables with zero mean and variance σ².

Second, the model with one factor fixed and one factor random is considered; the interaction is a random variable. The response variable is modeled as

yij = µ + αi + bj + (ab)ij + eij ,   i = 1, . . . , a,  j = 1, . . . , b,    (3.2)

where µ, αi are real constants and bj, (ab)ij and eij are normally distributed random variables, all with zero mean and with variances σ²_B, σ²_AB and σ², respectively.

We want to test the hypothesis that there is no interaction in the model. In the fixed effects model (3.1) the additivity hypothesis can be written as

H0: (αβ)ij = 0,   i = 1, . . . , a,  j = 1, . . . , b,    (3.3)

and it is tested against the alternative

HA: (αβ)ij ≠ 0 for at least one pair (i, j).

In the mixed model (3.2) the additivity hypothesis can be written as

H0: σ²_AB = 0    (3.4)

and it is tested against the alternative

HA: σ²_AB > 0.

Several tests have been designed for testing hypothesis (3.3) in the fixed effects model (3.1). We want to confirm whether these tests work also in the case of the mixed model. The power of these tests in the mixed model is investigated and an empirical determination of the size of the experiment for the Johnson–Graybill test is derived. Finally, a modification of the Tukey test with improved power is developed.


3.1.2 Tests of additivity

Several tests of additivity in the fixed effects model (3.1) have been developed over the years. Five of them will be discussed here, namely the tests by Tukey (1949), Mandel (1961), Johnson and Graybill (1972), Boik (1993b) and Tusell (1990). More details can be found e.g. in Boik (1993a) or Alin and Kurt (2006).

Subsequently the following notation will be used. Let \bar{y}_{\cdot\cdot} = \sum_i \sum_j y_{ij}/(ab) denote the overall mean of the response, \bar{y}_{i\cdot} = \sum_j y_{ij}/b the ith row mean and \bar{y}_{\cdot j} = \sum_i y_{ij}/a the jth column mean. The matrix R will stand for the residual matrix with respect to the main effects,

r_{ij} = y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}.    (3.5)

The decreasingly ordered list of eigenvalues of the matrix RR^T will be denoted by κ1 ≥ κ2 ≥ . . . , and its scaled version by

\omega_i = \frac{\kappa_i}{\sum_k \kappa_k}, \quad i = 1, 2, \ldots

If an interaction is present we may expect that some of the ωi coefficients will be substantially higher than others.

Tukey test: Introduced in Tukey (1949). Tukey suggested first to estimate the row and column effects by the row and column means, αi = \bar{y}_{i\cdot}, βj = \bar{y}_{\cdot j}, and then to test for an interaction of the type (αβ)ij = k · αi · βj, where k is a real constant (k = 0 implies no interaction). The Tukey test statistic ST equals

S_T = MS_{int} / MS_{error},

where

MS_{int} = \frac{\Big(\sum_i \sum_j y_{ij}\,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\Big)^2}{\sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \; \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2}

and

MS_{error} = \frac{\sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2 - a \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 - b \sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 - MS_{int}}{(a-1)(b-1) - 1}.

Under the additivity hypothesis ST is F-distributed with 1 and (a − 1)(b − 1) − 1 degrees of freedom.
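A minimal R sketch of the Tukey statistic for a data matrix y with one observation per cell (the function name is ours; this is not the AdditivityTests implementation):

tukey_additivity <- function(y, alpha = 0.05) {
  a <- nrow(y); b <- ncol(y)
  ri <- rowMeans(y) - mean(y); cj <- colMeans(y) - mean(y)   # centred row/column means
  ms_int <- sum(y * outer(ri, cj))^2 / (sum(ri^2) * sum(cj^2))
  ms_err <- (sum((y - mean(y))^2) - a * sum(cj^2) - b * sum(ri^2) - ms_int) /
            ((a - 1) * (b - 1) - 1)
  c(S_T = ms_int / ms_err, crit = qf(1 - alpha, 1, (a - 1) * (b - 1) - 1))
}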

Mandel test: Introduced in Mandel (1961). Mandel generalized the approach of Tukey and derived a test for the interaction (ab)ij = ci · bj with ci being a certain row constant. He defined the test statistic SM to test for ci = 0 as

S_M = \frac{\sum_i (z_i - 1)^2 \, \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2}{a - 1} \Bigg/ \frac{\sum_i \sum_j \big((y_{ij} - \bar{y}_{i\cdot}) - z_i (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\big)^2}{(a-1)(b-2)},

where

z_i = \frac{\sum_j y_{ij} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})}{\sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2}.

Under the additivity hypothesis SM is F-distributed with a − 1 and (a − 1)(b − 2) degrees of freedom.


Johnson–Graybill test: Introduced in Johnson and Graybill (1972). These authors chose a different approach and derived a test for (ab)ij = k · ci · dj, with ci and dj being certain row and column constants and k an overall constant. They suggested the test statistic

S_J = \omega_1 = \frac{\mathrm{eig}_1(RR')}{\mathrm{tr}(RR')}.

The hypothesis is rejected if SJ is high.

Tussel test: See Tusell (1990). Tussel chose an approach similar to the Johnson–Graybill test. Without loss of generality assume a ≤ b. The suggested test statistic is

S_U = (a-1)^{(a-1)(b-1)/2} \left( \prod_{i=1}^{a-1} \omega_i \right)^{(b-1)/2}.

The additivity hypothesis is rejected if SU is low. Critical values for this test statistic are given e.g. in Kres (1972). Note that these tables should be used with (a − 1) = p and b = N.

Locally best invariant (LBI) test: See Boik (1993b). This test was designed to have locally more power than the Tussel test. The LBI test statistic equals

S_L = \frac{1}{a-1} \cdot \frac{1}{\sum_{i=1}^{a-1} \omega_i^2}.

The additivity hypothesis is rejected if SL is low.
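A compact R sketch of the matrix-based statistics (the function name is ours; the AdditivityTests package mentioned below provides full implementations): it builds the residual matrix R of (3.5), the scaled eigenvalues ωi of RR', and the statistics SJ, SU and SL for a data matrix y with a ≤ b.

additivity_stats <- function(y) {
  a <- nrow(y); b <- ncol(y)
  R <- sweep(sweep(y, 1, rowMeans(y)), 2, colMeans(y)) + mean(y)   # r_ij of (3.5)
  kappa <- eigen(R %*% t(R), symmetric = TRUE, only.values = TRUE)$values
  omega <- kappa / sum(kappa)                                      # scaled eigenvalues
  list(SJ = omega[1],                                              # Johnson-Graybill
       SU = (a - 1)^((a - 1) * (b - 1) / 2) *
            prod(omega[1:(a - 1)])^((b - 1) / 2),                  # Tussel
       SL = 1 / ((a - 1) * sum(omega[1:(a - 1)]^2)))               # LBI
}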

The critical values of these tests can be found by simulation for given a and b.

All these tests (together with the procedure for finding the critical values) were implemented in the R environment (R Development Core Team (2008)) in the package AdditivityTests. It may be downloaded from

http://5r.matfyz.cz/skola/AdditivityTests/additivityTests 0.3.zip.

As far as we know, this is the first R implementation of additivity tests with the exception of the Tukey test.

All these tests were developed for the fixed effects model (3.1). In the next section their usage for the model with mixed effects (3.2) is examined.

3.2 Properties of the additivity tests

3.2.1 Type-I-risk of the additivity tests

The main interest of our work lies in model (3.2) with one fixed and one random factor. The question arises whether the tests presented in the previous section, developed for the fixed effects model (3.1), can be used also in this situation.

We considered the common 5 % type-I-risk level and performed a simulation to estimate the actual type-I-risk. In the simulation the parameters were set to the following values:

• The number of levels of the fixed factor was equal to a = 3, 4, . . . , 10.

• The number of levels of the random factor b was chosen between 4 and 50 (in steps of 2 between 4 and 20, in steps of 5 between 20 and 50).

• The variance of the random factor was equal to σ²_B = 2, 5, 10.

• The variance of the random error was σ² = 1. For other values of σ² the model can be scaled (see Example 3 on page 29).

Table 3.1: Number and percentage of the simulated cases where the actual type-I-risk is lower or greater than the nominal level 5 %.

Test                      α ≤ 0.05       α > 0.05
Tukey test                349 (96.94)    11 (3.06)
Mandel test               348 (96.67)    12 (3.33)
Johnson–Graybill test     339 (94.17)    21 (5.83)
Tusell test               337 (93.61)    23 (6.39)
LBI test                  336 (93.33)    24 (6.67)

In one step of the simulation a dataset was generated based on the model without interaction (σ²_AB = 0). Then one of the Tukey, Mandel, Johnson–Graybill, LBI or Tussel tests was performed. The percentage of significant results among the simulation steps is taken as the actual type-I-risk of the test.

The 10 000 simulation steps were repeated 10 times and the standard error of the estimate was computed based on these 10 repetitions. Then for each test and each combination of parameters the one-sided hypothesis

H0: the actual type-I-risk is lower than or equal to 0.05

was tested by a one-sample t-test at the 5 % level against the alternative

HA: the actual type-I-risk is greater than 0.05.

The results of these t-tests for each additivity test are summarized in Table 3.1.

For the Tukey and Mandel tests the vast majority (> 95 %) of cases is not significantly above the 0.05 level. For the other tests the estimated type-I-risk is higher than 0.05 in slightly more cases (6–7 %). However, these may also be false positives caused by multiple testing.

We would like to remark that although the tests were derived for the fixed effects model (\sum_{j=1}^{b} β_j = 0), we used them in the mixed effects model, where E bj = 0 but \sum_{j=1}^{b} b_j ≠ 0 almost surely. However, for a high number of levels of the random factor b, the mean of the bj converges to zero (law of large numbers, e.g. Grimmett and Stirzaker (1992)).

For the 5 % type-I-risk we can say that all five additivity tests seem not to violate the type-I-risk assumption and therefore they can be used for the mixed effects model as well.

3.2.2 Simulation study of the power of the additivity tests

The power of the Tukey, Mandel, Johnson–Graybill, LBI and Tussel tests is studied in this section. The powers of these tests were compared by simulation. It is shown that while the Tukey and Mandel tests have good power when the interaction is a product of the main effects, i.e. when (ab)ij = k · αi · bj (k a real constant, αi and bj the row and column effects in model (3.2)), their power for more general interactions is very poor. The other three tests work a bit worse in this special case but they have acceptably good power in more general cases too.

Let us consider the mixed effects model (3.2). Two possible interaction schemes were under inspection:

• Type (A): (ab)ij = k · αi · bj

• Type (B): (ab)ij = k · αi · cj ,

where cj is normally distributed random variables with zero mean and variance σ2B, mutually

independent on bj and eij , k is a real constant, αi the row effect, bj the column effect.

Two possibilities are considered for the value of b, either b = 10 or b = 50, and 10 different values of the interaction parameter k between 0 and 12 are considered. The other parameters are µ = 0, $\sigma^2_B = 2$, $\sigma^2 = 1$, a = 10,

$(\alpha_1, \ldots, \alpha_{10}) = (−2.03, −1.92, −1.27, −0.70, 0.46, 0.61, 0.84, 0.94, 1.07, 2.00).$

For each combination of parameters a dataset was generated from the model (3.2), the Tukey, Mandel, Johnson – Graybill, LBI and Tusell tests were performed and the results were recorded. This step was repeated 10 000 times. The estimated power of a test is the percentage of positive results.
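The following R fragment sketches how a dataset with interaction of type (B) can be generated under these assumptions; the function name is illustrative, and a type (A) dataset is obtained by reusing $b_j$ in place of $c_j$.

```r
## Sketch (R): one dataset from model (3.2) with interaction (ab)_ij = k * alpha_i * c_j.
gen.typeB <- function(alpha, b, k, sigma2.B, sigma2 = 1) {
  a  <- length(alpha)
  bj <- rnorm(b, sd = sqrt(sigma2.B))        # random column effects
  cj <- rnorm(b, sd = sqrt(sigma2.B))        # independent of bj, drives the interaction
  e  <- matrix(rnorm(a * b, sd = sqrt(sigma2)), a, b)
  outer(alpha, bj, "+") + k * outer(alpha, cj) + e
}
## type (A) interaction: replace 'cj' by 'bj' in the interaction term above
```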

All tests were performed at the α = 5 % level. The dependence of the power on the constant k is visualized in Figure 3.1. As we can see, while the Tukey and Mandel tests outperformed the other three tests for the interaction type (A), they completely fail to detect the interaction type (B) even for a large value of k. Therefore, it is desirable to develop a test which is able to detect a spectrum of practically relevant alternatives while still having power comparable to the Tukey and Mandel tests for the most common interaction type (A).

Because in practice the type of interaction is usually not known, it should be recommended to use the Johnson – Graybill, LBI or Tusell test for the hypothesis of additivity (3.3) or (3.4). Another possibility is to use the modified Tukey test proposed in Section 3.3.

3.2.3 Size of experiment for the Johnson – Graybill test

In this section we will propose an empirical formula for the required size of experiment for the Johnson – Graybill test.

We consider an interaction term in the model (3.2) of the form

$$(ab)_{ij} = k \cdot \alpha_i \cdot c_j, \qquad (3.6)$$

where the $\alpha_i$ are the row effects in (3.2) and the $c_j$ are normally distributed random variables with zero expected value and variance $\sigma^2_B$, mutually independent of the random variables $b_j$ and of the random errors $e_{ij}$; k is a real constant.

The interaction (3.6) is a random variable with zero mean and variance

$$\mathrm{var}\,(ab)_{ij} = k^2 \cdot \alpha_i^2 \cdot \sigma^2_B.$$


Figure 3.1: Power (vertical axis) in dependence on k (horizontal axis), b and interaction type; b = 10 left, b = 50 right, interaction type (A) top, type (B) bottom. Tukey test solid line, Mandel test dashed line, Johnson – Graybill test dotted line, Tusell test long-dash line, LBI test dot-dash line.


We performed a simulation to investigate the power of the Johnson – Graybill test in dependence on the parameters of the model (3.2) with interaction (3.6): a, b, $\alpha_1, \ldots, \alpha_a$, k and $\sigma_B$. We consider the number of levels of the fixed factor a equal to 10, 20, 30, 40 or 50, and five different shapes of the distribution generating the $\alpha_i$:

• equidistant

• random sample from the normal distribution

• random sample from the $t_3$-distribution

• half of the $\alpha_i$ concentrated at one point and half of them at another point

• two of the $\alpha_i$ at the same distance from zero and the other a − 2 $\alpha_i$ equal to zero.

In all cases $\alpha_1, \ldots, \alpha_a$ are scaled to have zero mean. It was observed that the power does not depend on the shape and depends on the $\alpha_i$ only through the sum of their squares $\sum_{i=1}^{a} \alpha_i^2$. Three values of this sum were considered: 296, 665 and 1496.

The variance of the random effect $b_j$ was considered to be equal to 1, $\sqrt{2}$ or 2. The parameter k (which controls the variance of the random interaction $(ab)_{ij}$) takes the values 0.03, 0.05, 0.07 and 0.1. The variance of the random noise $e_{ij}$ is considered to be equal to 1; in the case of another value the model should be scaled (see Example 3 below).

The power of a test increases with the distance of its alternative from the null hypothesis. Based on the simulation, the power π of the Johnson – Graybill test for a type-I-risk equal to 5 % can be approximated by

$$\pi(b) = 1 - \frac{1}{a \cdot b \cdot k^4 \cdot \sigma_B^4 \cdot \sum_{i=1}^{a} \alpha_i^2}. \qquad (3.7)$$

The formula was computed only for power values in the interval 〈0.10, 0.95〉.

Let us emphasize that in practice the number of rows a is fixed and we can influence the size of the experiment only through the number of columns b. By simple manipulation, formula (3.7) can be reformulated as follows:

$$b(\beta) = \left\lceil \frac{1}{\beta \cdot a \cdot k^4 \cdot \sigma_B^4 \cdot \sum_{i=1}^{a} \alpha_i^2} \right\rceil, \qquad (3.8)$$

where ⌈x⌉ denotes the smallest integer equal to or greater than x. In case of $\sigma^2 \neq 1$ the model should be scaled, see Example 3.
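Formula (3.8) translates directly into a small helper; the following R sketch (hypothetical function name, assuming the error variance has already been scaled to 1 as in Example 3) computes the required number of columns.

```r
## Sketch (R): required number of levels b of the random factor by formula (3.8).
b.required <- function(beta, a, k, sigma2.B, alpha) {
  stopifnot(abs(mean(alpha)) < 1e-8)      # row effects are assumed to be centred
  ceiling(1 / (beta * a * k^4 * sigma2.B^2 * sum(alpha^2)))   # sigma_B^4 = (sigma_B^2)^2
}
## e.g. b.required(beta = 0.2, a = 10, k = 0.05, sigma2.B = 2, alpha = <centred row effects>)
```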

In Figure 3.2 the difference between the number of levels of the random factor b realized in the simulation and the number computed by formula (3.8) is plotted. Notice that the formula gives quite satisfactory results, although there are a few outliers.

Example 3. Scaling the model when $\sigma^2 \neq 1$

Consider that we want to plan an experiment and use formula (3.8) to determine its size. This formula assumes that the variance of the errors $e_{ij}$ in model (3.2) is equal to 1. This example shows the solution when this assumption is violated.


Figure 3.2: Dependency of the difference between the number of levels of the random factor $b_{SIM}$ (realized in the simulation) and $b_{EST}$ estimated by formula (3.8) on the type-II-risk β (the horizontal line marks zero).

Let the variance of $e_{ij}$ in model (3.2) be equal to $\sigma^2 > 0$. We divide the equation (3.2) with interaction (3.6) by σ and for i = 1, . . . , a, j = 1, . . . , b define

$$y^*_{ij} = \frac{y_{ij}}{\sigma}, \quad \mu^* = \frac{\mu}{\sigma}, \quad \alpha^*_i = \frac{\alpha_i}{\sigma}, \quad b^*_j = \frac{b_j}{\sigma}, \quad c^*_j = \frac{c_j}{\sigma}, \quad k^* = k \cdot \sigma \quad \text{and} \quad e^*_{ij} = \frac{e_{ij}}{\sigma}.$$

The modified model equation is

$$y^*_{ij} = \mu^* + \alpha^*_i + b^*_j + k^* \cdot \alpha^*_i \cdot c^*_j + e^*_{ij}, \quad i = 1, \ldots, a, \; j = 1, \ldots, b. \qquad (3.9)$$

The conditions analogous to those of model (3.2) are valid, i.e. $\sum_i \alpha^*_i = 0$, and the $b^*_j$, $c^*_j$ and $e^*_{ij}$ are normally distributed random variables with zero mean. The variances of $b^*_j$ and $c^*_j$ equal $\sigma^{*2}_B = (\sigma_B/\sigma)^2$, and the variance of $e^*_{ij}$ equals $\sigma^{*2} = \mathrm{var}\, e^*_{ij} = 1$.

The size of the experiment computed by formula (3.8) for model (3.9) is appropriate for the original model.
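A minimal sketch of this scaling step in R, building on the hypothetical b.required helper shown earlier: the parameters are transformed exactly as in the definitions above before formula (3.8) is applied.

```r
## Sketch (R): formula (3.8) when the error variance sigma2 differs from 1,
## using the scaled parameters of model (3.9).
b.required.scaled <- function(beta, a, k, sigma2.B, alpha, sigma2) {
  s <- sqrt(sigma2)
  b.required(beta, a, k * s, sigma2.B / sigma2, alpha / s)   # k* = k*sigma, sigma_B*^2 = sigma_B^2/sigma^2
}
```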

3.3 Modified Tukey test

To increase the power of the Tukey test, a modification of it is proposed in this section.

In the classic Tukey test a model

$$y_{ij} = \mu + \alpha_i + \beta_j + k \cdot \alpha_i \cdot \beta_j + e_{ij} \qquad (3.10)$$

is tested against the submodel

$$y_{ij} = \mu + \alpha_i + \beta_j + e'_{ij}.$$


The estimators of the row effects $\alpha_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$ and of the column effects $\beta_j = \bar y_{\cdot j} - \bar y_{\cdot\cdot}$ are calculated in the same way in both models, although the dependency of $y_{ij}$ on these parameters is not linear for the full model.

The main idea behind the presented modification is that the full model (3.10) is fitted by non-linear regression and tested against the submodel

$$y_{ij} = \mu + \varphi_i + \psi_j + e'_{ij}$$

by the likelihood ratio test. The estimates of the row and column effects therefore differ between the classical and the modified model.

Non-adjusted test

Under the additivity hypothesis the maximum likelihood estimates of the parameters in model (3.10) are calculated as $\mu = \bar y_{\cdot\cdot}$, $\alpha^{(0)}_i = \bar y_{i\cdot} - \bar y_{\cdot\cdot}$ and $\beta^{(0)}_j = \bar y_{\cdot j} - \bar y_{\cdot\cdot}$. The residual sum of squares equals

$$\mathrm{RSS}^{(0)} = \sum_i \sum_j \left( y_{ij} - \mu - \alpha^{(0)}_i - \beta^{(0)}_j \right)^2 = \sum_i \sum_j \left( y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot} \right)^2 .$$

In the full model (3.10) the first estimate of k is taken as follows

$$k^{(0)} = \frac{\sum_i \sum_j \left( y_{ij} - \alpha^{(0)}_i - \beta^{(0)}_j - \mu^{(0)} \right) \cdot \alpha^{(0)}_i \cdot \beta^{(0)}_j}{\sum_i \sum_j \left( \alpha^{(0)}_i \right)^2 \cdot \left( \beta^{(0)}_j \right)^2},$$

i.e. the same as in the Tukey test, and then we continue by the iteration procedure, updating the estimates based on the previous step's versions:

$$\alpha^{(n)}_i = \frac{\sum_j \left( y_{ij} - \mu - \beta^{(n-1)}_j \right) \cdot \left( 1 + k^{(n-1)} \cdot \beta^{(n-1)}_j \right)}{\sum_j \left( 1 + k^{(n-1)} \cdot \beta^{(n-1)}_j \right)^2},$$

$$\beta^{(n)}_j = \frac{\sum_i \left( y_{ij} - \mu - \alpha^{(n-1)}_i \right) \cdot \left( 1 + k^{(n-1)} \cdot \alpha^{(n-1)}_i \right)}{\sum_i \left( 1 + k^{(n-1)} \cdot \alpha^{(n-1)}_i \right)^2},$$

$$k^{(n)} = \frac{\sum_i \sum_j \left( y_{ij} - \alpha^{(n-1)}_i - \beta^{(n-1)}_j - \mu \right) \cdot \alpha^{(n-1)}_i \cdot \beta^{(n-1)}_j}{\sum_i \sum_j \left( \alpha^{(n-1)}_i \right)^2 \cdot \left( \beta^{(n-1)}_j \right)^2}.$$

Surprisingly, it seems that one iteration is enough in the vast majority of cases. Therefore, for simplicity, let us define

$$\mathrm{RSS} = \sum_i \sum_j \left( y_{ij} - \mu - \alpha^{(1)}_i - \beta^{(1)}_j - k^{(1)} \alpha^{(1)}_i \beta^{(1)}_j \right)^2.$$

The likelihood ratio statistic, i.e. the difference of twice the log-likelihoods, equals

$$\frac{\mathrm{RSS}^{(0)} - \mathrm{RSS}}{\sigma^2}$$

and it is asymptotically $\chi^2$-distributed with 1 degree of freedom.

Figure 3.3: Power (vertical axis) in dependence on k (horizontal axis) and b for the interaction type (B); b = 10 left, b = 50 right. Tukey test solid line, Mandel test dashed line, Johnson – Graybill test dotted line, Tusell test long-dash line, LBI test dot-dash line, modified Tukey test two-dash line.

The consistent estimate of the residual variance $\sigma^2$ is $s^2 = \frac{\mathrm{RSS}}{ab - a - b}$ and $\frac{\mathrm{RSS}}{\sigma^2}$ is approximately $\chi^2$-distributed with $ab - a - b$ degrees of freedom. Thus, using a linear approximation of the nonlinear model (3.10), the statistic

$$\frac{\mathrm{RSS}^{(0)} - \mathrm{RSS}}{\;\mathrm{RSS}/(ab - a - b)\;} \qquad (3.11)$$

is F-distributed with 1 and $ab - a - b$ degrees of freedom. Easy manipulation of (3.11) gives the modified Tukey test, which rejects the additivity hypothesis if and only if

$$\mathrm{RSS}^{(0)} > \mathrm{RSS} \left( 1 + \frac{1}{ab - a - b}\, F(1, ab - a - b; 1 - \alpha) \right), \qquad (3.12)$$

where $F(1, ab - a - b; 1 - \alpha)$ stands for the $1 - \alpha$ quantile of the F-distribution with 1 and $ab - a - b$ degrees of freedom.
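The whole non-adjusted procedure can be sketched compactly in R; the function below is only an illustration of the steps described above (one iteration of the updating formulas followed by the rejection rule (3.12)), not the authors' implementation.

```r
## Sketch (R): modified Tukey test, non-adjusted version, with a single iteration.
modified.tukey.test <- function(y, alpha.level = 0.05) {
  a <- nrow(y); b <- ncol(y)
  mu <- mean(y)
  ai <- rowMeans(y) - mu                        # alpha_i^(0)
  bj <- colMeans(y) - mu                        # beta_j^(0)
  rss0 <- sum((y - outer(ai, bj, "+") - mu)^2)  # RSS^(0)
  k0 <- sum((y - outer(ai, bj, "+") - mu) * outer(ai, bj)) / (sum(ai^2) * sum(bj^2))
  ## one iteration of the updating formulas
  ai1 <- sapply(1:a, function(i) sum((y[i, ] - mu - bj) * (1 + k0 * bj)) / sum((1 + k0 * bj)^2))
  bj1 <- sapply(1:b, function(j) sum((y[, j] - mu - ai) * (1 + k0 * ai)) / sum((1 + k0 * ai)^2))
  k1  <- k0   # the updating formula for k evaluated at the step-0 effects reproduces k^(0)
  rss <- sum((y - mu - outer(ai1, bj1, "+") - k1 * outer(ai1, bj1))^2)
  ## rejection rule (3.12)
  rss0 > rss * (1 + qf(1 - alpha.level, 1, a * b - a - b) / (a * b - a - b))
}
```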

For type (A) interaction the power of the modified test is almost equal to the power of the Tukey test. For type (B) interaction the power of all the tests can be seen in Figure 3.3. The power of the modified Tukey test is much higher than the power of the Tukey test for this more general interaction.

Theoretically, we may expect the modified test to be conservative because just one iteration does not find precisely the maximum of the likelihood of model (3.10). However, as we will see in the following part, the situation for a small number of rows or columns is quite the opposite.

Small sample adjustment

If the left part of Figure 3.3 (b = 10) were magnified enough, it could be observed that the modified Tukey test does not work properly (type-I-risk ≈ 6 %). The reason is that the likelihood ratio test statistic converges to the $\chi^2$-distribution rather slowly (see Bartlett (1937)) and a correction for small sample sizes is needed. We present two possibilities that are recommended if the number of rows or columns is below 20 (an empirical threshold based on our simulations).


One possibility how to overcome this obstacle is to bootstrap without replacement. Consider the test statistic $S = \mathrm{RSS}^{(0)} - \mathrm{RSS}$. Then generate $N^{(boot)}$ times a dataset by the model

$$y^{(boot)}_{ij} = \mu + \alpha^{(0)}_i + \beta^{(0)}_j + r_{\pi_{ij}},$$

where π is a random permutation of the indexes of the matrix R in (3.5). For each dataset the statistic of interest $S^{(boot)} = \mathrm{RSS}^{(0)(boot)} - \mathrm{RSS}^{(boot)}$ is computed. The critical value of the modified Tukey test is then the $(1 - \alpha) \cdot 100\,\%$ quantile of the generated $S^{(boot)}$. The number of generated samples $N^{(boot)} = 1000$ seems to be sufficient in most cases.
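A sketch of this resampling step in R is given below; `mod.tukey.stat` is a hypothetical helper assumed to return the statistic $S = \mathrm{RSS}^{(0)} - \mathrm{RSS}$ for a data matrix (e.g. an adaptation of the sketch above), and R is taken here as the matrix of residuals of the additive fit, which is an assumption about the matrix defined in (3.5).

```r
## Sketch (R): permutation-based critical value for the modified Tukey test statistic S.
perm.critical.value <- function(y, mod.tukey.stat, n.boot = 1000, alpha.level = 0.05) {
  a <- nrow(y); b <- ncol(y)
  mu <- mean(y); ai <- rowMeans(y) - mu; bj <- colMeans(y) - mu
  R <- y - outer(ai, bj, "+") - mu               # residuals of the additive model (assumed (3.5))
  s.boot <- replicate(n.boot, {
    y.boot <- outer(ai, bj, "+") + mu + matrix(sample(R), a, b)   # permuted residuals
    mod.tukey.stat(y.boot)
  })
  quantile(s.boot, 1 - alpha.level)              # critical value for S
}
```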

The second possibility is to estimate the residual variance $\sigma^2$ of the random errors $e_{ij}$ as $s^2 = \frac{\mathrm{RSS}}{ab - a - b}$ and then generate $N^{(sample)}$ datasets using the model

$$y^{(sample)}_{ij} = \mu + \alpha^{(0)}_i + \beta^{(0)}_j + e^{(NEW)}_{ij},$$

where the $e^{(NEW)}_{ij}$ are independent identically distributed random variables generated from a normal distribution with zero mean and variance $s^2$. Because under the additivity hypothesis the parameter k is equal to zero, the proposed test statistic is the absolute value of its estimator $k^{(1)}$. As in the bootstrap, for each of the $N^{(sample)}$ datasets the value of the statistic is computed and the additivity hypothesis is rejected if more than $(1 - \alpha) \cdot 100\,\%$ of the sampled statistics lie below the statistic $|k^{(1)}|$ based on the real data. The number of generated samples $N^{(sample)} = 1000$ seems to be sufficient in most cases.
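This parametric variant can be sketched as follows; `fit.one.step` is a hypothetical helper assumed to return the one-step estimates from model (3.10), in particular $k^{(1)}$ and RSS.

```r
## Sketch (R): small-sample parametric alternative -- simulate |k^(1)| under additivity.
param.additivity.test <- function(y, fit.one.step, n.sample = 1000, alpha.level = 0.05) {
  a <- nrow(y); b <- ncol(y)
  mu <- mean(y); ai <- rowMeans(y) - mu; bj <- colMeans(y) - mu
  fit <- fit.one.step(y)
  s2 <- fit$rss / (a * b - a - b)                # s^2 = RSS / (ab - a - b)
  k.sim <- replicate(n.sample, {
    y.new <- outer(ai, bj, "+") + mu + matrix(rnorm(a * b, sd = sqrt(s2)), a, b)
    abs(fit.one.step(y.new)$k1)
  })
  mean(k.sim < abs(fit$k1)) > 1 - alpha.level    # TRUE = reject additivity
}
```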

To conclude, we have proposed a modification of the Tukey additivity test. The modified Tukey test has almost as good power as the Tukey test when the interaction is a product of the main effects, and it should be recommended if we also require reasonable power in case of more general interaction schemes.


Bibliography

A. Alin and S. Kurt. Testing non-additivity (interaction) in two-way ANOVA tables with no replication. Statistical Methods in Medical Research, 15:63–85, 2006.

M.S. Bartlett. Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A, 160:268–282, 1937.

R.J. Boik. A comparison of three invariant tests of additivity in two-way classifications with no replications. Computational Statistics and Data Analysis, 15:411–424, 1993a.

R.J. Boik. Testing additivity in two-way classifications with no replications: the locally best invariant test. Journal of Applied Statistics, 20:41–55, 1993b.

E. Brunner and U. Munzel. Nichtparametrische Datenanalyse – unverbundene Stichproben. Springer, Berlin, 2002.

S. Chakraborti, B. Hong, and M.A. van de Wiel. A note on sample size determination for a nonparametric test of location. Technometrics, 48:88–94, 2006.

M.N. Ghosh and D. Sharma. Power of Tukey's test for non-additivity. Journal of the Royal Statistical Society, Series B, 25:213–219, 1963.

G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford, 1992.

V. Hegeman and D.E. Johnson. The power of two tests for nonadditivity. Journal of the American Statistical Association, 71:945–948, 1976.

D.E. Johnson and F.A. Graybill. An analysis of a two-way model with interaction and no replication. Journal of the American Statistical Association, 67:862–868, 1972.

H. Kres. Statistical Tables for Multivariate Analysis. Springer, New York, 1972.

W.H. Kruskal and W.A. Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47:583–621, 1952.

E. L. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, Inc., San Francisco, 1975.

E. L. Lehmann. Testing Statistical Hypotheses. Springer-Verlag, New York, 2005.

M. Mahoney and R. Magel. Estimation of the power of the Kruskal–Wallis test. Biometrical Journal, 38:613–630, 1996.


J. Mandel. Non-additivity in two-way analysis of variance. Journal of the American Statistical Association, 56:878–888, 1961.

G.E. Noether. Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82:645–647, 1987.

R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2008. URL http://www.R-project.org.

D. Rasch and V. Guiard. The robustness of parametric statistical methods. Computational Statistics and Data Analysis, 10:29–45, 1990.

D. Rasch and M. Simeckova. Determining the size of experiments for the one-way ANOVA model I for ordered categorical data. In Proceedings of the 8th International Workshop in Model-Oriented Design and Analysis, Almagro, Spain, June 4-8, 2007, 2007. Physica Verlag, Series: Contributions to Statistics.

D. Rasch, L.R. Verdooren, and J.I. Gowers. The Design and Analysis of Experiments and Surveys, Second Edition. R. Oldenbourg Verlag, Muenchen, Wien, 2007.

T. Rusch, M. Simeckova, K.D. Kubinger, K. Moder, P. Simecek, and D. Rasch. Test of additivity in mixed and fixed effects two-way ANOVA models with single subclass numbers. In Proceedings of The International Conference on Trends and Perspectives in Linear Statistical Inference LINSTAT 2008, Bedlewo, Poland, April 21-25, 2008, submitted. Springer, special issue of Statistical Papers.

H. Scheffe. The Analysis of Variance. John Wiley & Sons, Inc., New York, 1959.

P. Simecek and M. Simeckova. Modification of Tukey's additivity test. Journal of Statistical Planning and Inference, submitted.

M. Simeckova and D. Rasch. Additivity hypothesis in the mixed two-way ANOVA model with single subclass numbers. In Proceedings of 15th Summer School of JCMF ROBUST 2008, Rackova dolina, Pribylina, Slovakia, September 8-12, 2008, submitted.

M. Simeckova and D. Rasch. Sample size for the one-way layout with one fixed factor for ordered categorical data. Journal of Statistical Theory and Practice, 2:109–123, 2008.

J.W. Tukey. One degree of freedom for non-additivity. Biometrics, 5:232–242, 1949.

F. Tusell. Testing for interaction in two-way ANOVA tables with no replication. Computational Statistics and Data Analysis, 10:29–45, 1990.

F. Wilcoxon. Individual comparisons by ranking methods. Biometrics Bulletin, 1:80–83, 1945.


