Journal of Educational and Behavioral Statistics Summer 1996, Vol. 21, No. 2, pp. 169-178

On Sample Size Requirements for Johansen's Test

William T. Coombs Oklahoma State University

James Algina University of Florida

Key words: MANOVA, omnibus hypothesis test, robustness

Type I error rates for the Johansen test were estimated using simulated data for a variety of conditions. The design of the experiment was a 2 × 2 × 2 × 3 × 9 × 3 factorial. The factors were (a) type of distribution, (b) number of dependent variables, (c) number of groups, (d) ratio of the smallest sample size to the number of dependent variables, (e) sample size ratios, and (f) degree of heteroscedasticity. The results indicate that Type I error rates for the Johansen test depend heavily on the number of groups and the ratio of the smallest sample size to the number of dependent variables. Type I error rates depend to a lesser extent on the distribution types used in the study. Based on the results, sample size guidelines are presented.

The Johansen (1980) test was developed to test the hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_G$ in situations in which $\Sigma_i \neq \Sigma_j$ for at least one pair of $i$ and $j$. The Johansen test uses the statistic

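$$J = \sum_{i=1}^{G} (\bar{x}_i - \tilde{x})' W_i (\bar{x}_i - \tilde{x}),$$

where $S_i$ is the sample dispersion matrix for the $i$th group,

$$W_i = \left(\frac{S_i}{n_i}\right)^{-1}, \quad i = 1, \ldots, G,$$

$$W = \sum_{i=1}^{G} W_i,$$

$$\bar{x}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \quad i = 1, \ldots, G,$$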


at PENNSYLVANIA STATE UNIV on March 4, 2016http://jebs.aera.netDownloaded from

Coombs and Algina

and

$$\tilde{x} = W^{-1} \sum_{i=1}^{G} W_i \bar{x}_i.$$

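The critical value for the Johansen test is $cF_{\alpha}$, where $F_{\alpha}$ is the upper-$\alpha$ critical value of the F distribution with $p(G - 1)$ and $p(G - 1)[p(G - 1) + 2]/(3A)$ degrees of freedom,

$$c = p(G - 1) + 2A - \frac{6A}{p(G - 1) + 2},$$

and

$$A = \sum_{i=1}^{G} \frac{\operatorname{tr}\left[(I - W^{-1} W_i)^2\right] + \left[\operatorname{tr}(I - W^{-1} W_i)\right]^2}{2(n_i - 1)}.$$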
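The statistic and critical value above can be computed directly from the group samples. The following Python sketch assumes NumPy and SciPy are available; the function name `johansen_test` is ours, each $S_i$ is taken to be the usual unbiased sample dispersion matrix, and SciPy's F quantile supplies $F_{\alpha}$. It illustrates the formulas and is not the authors' PROC MATRIX/IML program.

```python
import numpy as np
from scipy import stats


def johansen_test(samples, alpha=0.05):
    """Illustrative Johansen (1980) test of H0: mu_1 = ... = mu_G.

    samples: list of G arrays, each of shape (n_i, p).
    Returns (J, critical_value, reject).
    """
    samples = [np.asarray(x, dtype=float) for x in samples]
    G = len(samples)
    p = samples[0].shape[1]
    means = [x.mean(axis=0) for x in samples]

    # W_i = (S_i / n_i)^{-1}, with S_i the unbiased sample dispersion matrix
    W_list = [np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)) / len(x))
              for x in samples]
    W_inv = np.linalg.inv(sum(W_list))

    # Weighted grand mean: x_tilde = W^{-1} sum_i W_i xbar_i
    x_tilde = W_inv @ sum(Wi @ m for Wi, m in zip(W_list, means))

    # J = sum_i (xbar_i - x_tilde)' W_i (xbar_i - x_tilde)
    J = float(sum((m - x_tilde) @ Wi @ (m - x_tilde)
                  for Wi, m in zip(W_list, means)))

    # A = sum_i [tr((I - W^{-1} W_i)^2) + tr(I - W^{-1} W_i)^2] / [2(n_i - 1)]
    I = np.eye(p)
    A = 0.0
    for x, Wi in zip(samples, W_list):
        M = I - W_inv @ Wi
        A += (np.trace(M @ M) + np.trace(M) ** 2) / (2 * (len(x) - 1))

    df1 = p * (G - 1)
    df2 = df1 * (df1 + 2) / (3 * A)
    c = df1 + 2 * A - 6 * A / (df1 + 2)
    critical_value = c * stats.f.ppf(1 - alpha, df1, df2)
    return J, critical_value, J > critical_value
```

With G = 2 and p = 1 the constant c equals 1 and the second degrees of freedom reduce to Welch's approximate degrees of freedom, so J is the square of Welch's t statistic; this matches the paper's remark that Welch's test is the univariate special case of Johansen's test and makes a convenient sanity check.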

With unequal-sized samples selected from multivariate normal populations under heteroscedastic conditions, the Johansen test outperforms both the Pillai-Bartlett trace criterion and James's first-order test (Algina & Tang, 1988; Tang & Algina, 1993).

Let $n_i$ ($i = 1, \ldots, G$) be the $i$th sample size and $n_{[1]} \le n_{[2]} \le \cdots \le n_{[G]}$. Let $p$ be the number of dependent variables, and define $r = n_{[1]}/p$. Inspection of results in Algina and Tang (1988), Coombs (1993), and Tang (1989) suggests that when the data are multivariate normal and r is at least 4, the estimated actual Type I error rate (τ̂) for Johansen's test will be near the nominal Type I error rate (α). However, the cited research is limited in several ways. First, Algina and Tang included only G = 2 groups, and Tang and Algina included only G = 3 groups. Second, neither study included nonnormal data. This is important because results in Algina, Oshima, and Tang (1991) and in Coombs (1993) indicate that τ can be adversely affected by sampling from skewed distributions. Third, Coombs included G = 3 and G = 6 groups but manipulated the ratio of total sample size to number of variables rather than r. Consequently, the dependence of τ on r is obscured in his study.

There are several alternatives to Johansen's test that might have been included in the study:

(1) The usual MANOVA criteria (Lawley-Hotelling trace, Pillai-Bartlett trace, Roy's largest root, and Wilks's Λ) might have been included. However, τ̂s reported in Olsen (1974) show that under normality and homoscedasticity, these tests do not always control τ even when the sample sizes are equal. Of the MANOVA criteria, Olsen recommends the Pillai-Bartlett trace as the best for controlling τ. However, τ̂s in Tang and Algina (1993) show that the Pillai-Bartlett trace can provide very poor control of τ when sample sizes are unequal.


Sample Size

(2) The tests developed by James (1954) might have been included in the study. James used J as the test statistic and approximated the critical value by a series of terms that involve powers of $(n_i - 1)^{-1}$. The critical value in the first-order test includes terms up to the first power; the critical value in the second-order test includes terms up to the second power. Estimated τs reported in Wilcox (1988) show that in the univariate case James's second-order test controls τ much better than does Welch's approximate degrees of freedom test. The latter test is the univariate special case of Johansen's test. However, Wilcox's results for the univariate case have not generalized to the multivariate case. Algina, Oshima, and Tang (1991) reported τ̂s for James's first-order, James's second-order, and Johansen's tests applied to G = 2 groups and p = 2, 6, or 10 variables. The differences in the estimated τs for the latter two tests were very small. James's first-order test provided poorer control of τ than did the other two tests. Tang and Algina (1993) reported τ̂s for the three tests applied to G = 3 groups and p = 3 or 6 variables. James's first-order test provided uniformly poorer control of τ than did Johansen's test. The estimated τs for James's second-order test were smaller than those for Johansen's test and were usually smaller than α; estimated τs for Johansen's test were typically larger than α. As a result, Johansen's test provided better control of τ in some conditions, and James's second-order test provided better control in other conditions. With small sample sizes, Johansen's test tends to provide better control when there is a direct relationship between the sample sizes and dispersion matrices, and James's second-order test tends to provide better control when there is an inverse relationship between the sample sizes and dispersion matrices. For larger sample sizes, both tests tend to provide reasonable control of τ.
Because τ for Johansen's test tends to be larger than α, and τ for James's second-order test tends to be smaller than α, the choice between the two tests depends, in part, on one's preference between a somewhat liberal and a somewhat conservative test. A second factor is the complexity of the two tests. The calculations for the critical value for James's second-order test are substantially more complex than are those for Johansen's test. (To give some idea of the difference in complexity, we compared the code for the two critical values in the PROC MATRIX program prepared by Tang and Algina, who studied G = 3 groups. The critical value for James's second-order test required about 230 lines of code; the critical value for Johansen's test required about 6 lines of code. Furthermore, the critical value for James's second-order test involves products of functions of the dispersion matrices for the G groups, with the products calculated for every pair of treatments. As a result, the number of required calculations increases rapidly with an increase in the number of groups.)

We also considered power in our selection of Johansen's test. Johansen's test and James's tests all use J as the test statistic but employ different approximations to the critical value for J. Consequently, power differences among the three tests are solely a function of the adequacy with which the critical value is approximated. It should also be noted that J is asymptotically distributed as $\chi^2$ with $p(G - 1)$ degrees of freedom and noncentrality parameter

$$\lambda = \sum_{i=1}^{G} (\mu_i - \tilde{\mu})' \left(\frac{\Sigma_i}{n_i}\right)^{-1} (\mu_i - \tilde{\mu}),$$

where

$$\tilde{\mu} = \left[\sum_{i=1}^{G} \left(\frac{\Sigma_i}{n_i}\right)^{-1}\right]^{-1} \sum_{i=1}^{G} \left(\frac{\Sigma_i}{n_i}\right)^{-1} \mu_i.$$

Consequently, the power of any test that employs J as the test statistic will increase with increasing variation in the $\mu_i$ and with increasing $n_i$.
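As a rough numerical illustration of the last point, the sketch below computes the noncentrality parameter as the population analogue of J, that is, J with $\bar{x}_i$ replaced by $\mu_i$ and $S_i$ by $\Sigma_i$. The function name and the example means, dispersion matrices, and sample sizes are hypothetical. Because each weight $(\Sigma_i/n_i)^{-1}$ is proportional to $n_i$, doubling every sample size doubles λ while leaving the weighted mean unchanged.

```python
import numpy as np


def johansen_noncentrality(mus, sigmas, ns):
    """Population analogue of J: sum_i (mu_i - mu_tilde)' Omega_i (mu_i - mu_tilde).

    Omega_i = (Sigma_i / n_i)^{-1} plays the role of W_i, and mu_tilde is the
    Omega-weighted mean of the mu_i.
    """
    mus = [np.asarray(m, dtype=float) for m in mus]
    omegas = [np.linalg.inv(np.asarray(S, dtype=float) / n)
              for S, n in zip(sigmas, ns)]
    omega = sum(omegas)
    mu_tilde = np.linalg.solve(omega, sum(O @ m for O, m in zip(omegas, mus)))
    return float(sum((m - mu_tilde) @ O @ (m - mu_tilde)
                     for O, m in zip(omegas, mus)))


# Hypothetical p = 3, G = 3 example with one contaminated population (d = 1.5)
mus = [np.zeros(3), np.zeros(3), np.full(3, 0.5)]
sigmas = [np.eye(3), np.eye(3), np.diag([1.0, 2.25, 2.25])]
ns = [9, 9, 14]
lam = johansen_noncentrality(mus, sigmas, ns)
lam_doubled = johansen_noncentrality(mus, sigmas, [2 * n for n in ns])
```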

Method

Design

Six factors were considered in the study.

Distribution type (DT). Data were generated to simulate experiments in which distributions were either multivariate normal or multivariate lognormal.

Number of dependent variables (p). Data were generated to simulate experiments in which there are p = 3 or p = 6 dependent variables.

Number of populations sampled (G). Data were generated to simulate experiments in which there is sampling from either G = 3 or G = 6 populations.

Ratio of the smallest sample size to the number of dependent variables (r). The ratios chosen were r = 2, r = 4, and r = 6.

Sample size ratios (NR). The ratios $n_1:n_2:n_3$ used in the simulation when sampling from three populations were (a) 1:1:1, (b) 1:1:1.5, (c) 1:1:2, (d) 1:1.5:1.5, (e) 1:2:2, and the opposites (e.g., 1.5:1:1) of (b) to (e). Similarly, the ratios $n_1:\cdots:n_6$ used in the simulation when sampling from six populations were (a) 1:1:1:1:1:1, (b) 1:1:1:1:1.5:1.5, (c) 1:1:1:1:2:2, (d) 1:1:1.5:1.5:1.5:1.5, (e) 1:1:2:2:2:2, and the opposites of (b) to (e).

Degree of heteroscedasticity (d). Each population with dispersion matrix equal to a p × p identity matrix (I) will be called an uncontaminated population. Each population with a p × p diagonal dispersion matrix (D) with at least one diagonal element not equal to one will be called a contaminated population. The forms of the dispersion matrices depend on the number of dependent variables. When p = 3, D = Diag{1, d², d²} and I = Diag{1, 1, 1}. When p = 6, D = Diag{1, 1, d², d², d², d²} and I = Diag{1, 1, 1, 1, 1, 1}. Three levels of d (d = 1, d = 1.5, and d = 3) were used to simulate heteroscedasticity of the dispersion matrices. The use of diagonal matrices entails no loss of generalizability beyond that due to the use of just two forms of dispersion matrix (I and D) for each value of p. It is well known (Anderson, 1958) that two dispersion matrices, say $\Sigma_1$ and $\Sigma_2$, can be simultaneously diagonalized by pre- and postmultiplication by the same (p × p) matrix and its transpose, and that Johansen's test is invariant to the transformation implied by the matrix. Through combination of the NR and d factors, both positive and negative relationships between sample size and dispersion matrices were included. In the positive relationship, the larger samples correspond to D. In the negative relationship, the smaller samples correspond to D.
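The invariance claim can be checked numerically. This sketch uses a hypothetical helper `j_and_a` (NumPy assumed) that recomputes J and the quantity A from the critical-value formula after every sample is postmultiplied by the same random nonsingular matrix P; both should agree with the untransformed values up to rounding, so the test statistic and its critical value are unchanged. The example data are arbitrary.

```python
import numpy as np


def j_and_a(samples):
    """Johansen's J and the quantity A from the critical-value formula (illustrative)."""
    p = samples[0].shape[1]
    means = [x.mean(axis=0) for x in samples]
    W_list = [np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)) / len(x))
              for x in samples]
    W_inv = np.linalg.inv(sum(W_list))
    x_tilde = W_inv @ sum(Wi @ m for Wi, m in zip(W_list, means))
    J = float(sum((m - x_tilde) @ Wi @ (m - x_tilde) for Wi, m in zip(W_list, means)))
    A = 0.0
    for x, Wi in zip(samples, W_list):
        M = np.eye(p) - W_inv @ Wi
        A += (np.trace(M @ M) + np.trace(M) ** 2) / (2 * (len(x) - 1))
    return J, A


rng = np.random.default_rng(7)
samples = [rng.normal(size=(n, 3)) for n in (9, 9, 14)]
samples[2] = samples[2] * np.array([1.0, 3.0, 3.0])   # one heteroscedastic group
P = rng.normal(size=(3, 3)) + 3 * np.eye(3)           # nonsingular with probability 1
J0, A0 = j_and_a(samples)
J1, A1 = j_and_a([x @ P for x in samples])            # transform every observation
```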

Design layout. The sample sizes were determined once the levels of p, G, r, and NR were determined. Each of these 108 conditions (2 × 2 × 3 × 9) was crossed with two distribution types and three levels of heteroscedasticity to generate 648 experimental conditions.
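A quick enumeration confirms the size of the design. In this sketch the nine sample size ratio patterns are simply indexed 0 to 8 (the labels are ours); crossing the levels of p, G, r, and NR gives the combinations that fix the sample sizes, and crossing those with distribution type and d gives the full 2 × 2 × 2 × 3 × 9 × 3 factorial stated in the abstract.

```python
from itertools import product

distribution_types = ("normal", "lognormal")
p_levels = (3, 6)           # number of dependent variables
g_levels = (3, 6)           # number of groups
r_levels = (2, 4, 6)        # smallest sample size / p
nr_levels = range(9)        # nine sample size ratio patterns, indexed 0-8
d_levels = (1, 1.5, 3)      # degree of heteroscedasticity

# Combinations that fix the sample sizes
sample_size_conditions = list(product(p_levels, g_levels, r_levels, nr_levels))

# Crossing with distribution type and d gives the full design
all_conditions = [
    (dt, d) + cond
    for dt, d, cond in product(distribution_types, d_levels, sample_size_conditions)
]

print(len(sample_size_conditions), len(all_conditions))  # 108 648
```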

Simulation Procedure

The simulation was conducted as 648 separate runs, one for each condition, with 20,000 replications per condition. In the multivariate normal conditions, an $n_i \times p$ ($i = 1, \ldots, G$) matrix of uncorrelated pseudorandom observations was generated for the $i$th sample from the normal distribution (using RANNOR with PROC IML in SAS). Each $n_i \times p$ matrix of observations corresponding to a contaminated population was postmultiplied by an appropriate D to simulate dispersion heteroscedasticity. In the nonnormal conditions the data were generated by using the following steps.

(1) Generate an $n_i \times p$ ($i = 1, \ldots, G$) matrix of uncorrelated pseudorandom standard normal observations for the $i$th sample by using RANNOR with PROC IML in SAS.

(2) Transform each element of the matrix by using $x_{jk} = \exp(0.4 z_{jk}) - \exp(0.08)$, where $j = 1, \ldots, n_i$ and $k = 1, \ldots, p$. Because $\exp(0.08) = \exp(0.4^2/2)$ is the mean of $\exp(0.4 z_{jk})$, the result is an $n_i \times p$ matrix of uncorrelated pseudorandom variables, each with mean 0 and skew 1.32. We selected a skew of 1.32 based on results in Micceri (1989).

(3) Postmultiply each matrix generated in Step 2 by an appropriate D to simulate heteroscedasticity.
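The generation steps can be sketched in NumPy as follows. This is not the authors' SAS/IML code: `numpy.random.Generator.standard_normal` stands in for RANNOR, the helper names are hypothetical, and we assume that the "appropriate D" used for postmultiplication scales the contaminated columns by d, so that a normal contaminated population has dispersion Diag{1, d², d²} for p = 3 or Diag{1, 1, d², d², d², d²} for p = 6. The closed-form lognormal skewness $(e^{\sigma^2} + 2)\sqrt{e^{\sigma^2} - 1}$ is about 1.32 for σ = 0.4.

```python
import numpy as np

# Leading unit-variance columns: Diag{1, d^2, d^2} for p = 3,
# Diag{1, 1, d^2, d^2, d^2, d^2} for p = 6
UNSCALED_COLUMNS = {3: 1, 6: 2}


def column_scales(p, d):
    """Assumed diagonal of the postmultiplying matrix: 1 for leading columns, d for the rest."""
    k = UNSCALED_COLUMNS[p]
    return np.array([1.0] * k + [float(d)] * (p - k))


def generate_sample(n, p, d, contaminated, lognormal, rng):
    """One n x p sample following Steps 1-3 (normal data skip Step 2)."""
    z = rng.standard_normal((n, p))                          # Step 1: uncorrelated N(0, 1)
    x = np.exp(0.4 * z) - np.exp(0.08) if lognormal else z   # Step 2: centered lognormal
    if contaminated:
        x = x @ np.diag(column_scales(p, d))                 # Step 3: postmultiply
    return x


def lognormal_skew(sigma):
    """Skewness of exp(sigma * Z); subtracting a constant does not change it."""
    s2 = sigma ** 2
    return (np.exp(s2) + 2) * np.sqrt(np.exp(s2) - 1)


rng = np.random.default_rng(1996)
sample = generate_sample(20, 3, 1.5, contaminated=True, lognormal=True, rng=rng)
```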

For each replication, the data were analyzed by using the Johansen test. The proportion of the 20,000 replications that yielded significant results at α equal to .01, .05, and .10 was recorded.
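Since the replications are independent, τ̂ is a binomial proportion, and its Monte Carlo standard error is $\sqrt{\alpha(1 - \alpha)/R}$ when the true rejection rate equals α and R replications are run. The small sketch below (function names hypothetical) computes τ̂ from a list of reject/retain decisions and that standard error. For α = .05 and R = 20,000 it is roughly .0015, which indicates how much of the variation in τ̂ near .05 can be attributed to simulation noise.

```python
import math


def estimated_type_i_error(rejections):
    """tau_hat: proportion of replications in which H0 was rejected."""
    return sum(1 for rejected in rejections if rejected) / len(rejections)


def monte_carlo_se(alpha, replications):
    """Binomial standard error of tau_hat when the true rejection rate is alpha."""
    return math.sqrt(alpha * (1 - alpha) / replications)


se = monte_carlo_se(0.05, 20_000)       # about 0.00154
band = (0.05 - 2 * se, 0.05 + 2 * se)   # rough +/- 2 SE band around alpha
```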

Results

Analyses of variance were used to investigate the effects of DT, p, G, r, NR, and d on τ̂ for the Johansen test. Because there were six factors, initial analyses were conducted to determine which effects to enter into the analysis of variance model. A forward selection approach was used, with all main effects entered first, followed by all two-way interactions and then all three-way interactions. The error terms were pooled in order to estimate R². The R² values for the models, identified by their highest-order terms, were (a) main effects (.745), (b) two-way interactions (.974), and (c) three-way interactions (.998). Because R² was .998 for the model with three-way interactions, more complex models were not examined. The model with main effects, two-way interactions, and three-way interactions was selected.

For each effect, the mean-square component for that effect ($\theta_i$, $i = 1, \ldots, 41$) was estimated. Negative mean-square components were set to zero, and the sum of these components and the error variance was used as a measure of total variance. Table 1 shows the estimates $\hat{\theta}_i$ and the proportions of variance for effects that (a) were statistically significant and (b) accounted for at least 1.0% of the total variance in estimated Type I error rates.
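The count of 41 effects follows from the selected model: with six factors there are 6 main effects, 15 two-way interactions, and 20 three-way interactions. A short enumeration (factor labels as used in the text) makes this explicit.

```python
from itertools import combinations

factors = ("DT", "p", "G", "r", "NR", "d")

# Terms in the selected model: main effects, two-way and three-way interactions
effects = [combo for k in (1, 2, 3) for combo in combinations(factors, k)]

print(len(effects))  # 6 + 15 + 20 = 41
```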

Effect of r, G, and DT. The r, G, r × G, and DT effects were interpreted by calculating percentiles of the distribution of τ̂ for each combination of r and G for each distribution. The percentiles for the normal and nonnormal conditions are shown in Tables 2 and 3, respectively. Percentiles marked by * are outside Bradley's (1978) liberal criterion (.5α ≤ τ̂ ≤ 1.5α).
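Bradley's liberal criterion is a simple interval check, so the asterisks in the percentile tables can be reproduced with a one-line function. The function name is ours, and the three example values are α = .05 entries from Table 2 (normal data).

```python
def meets_bradley_liberal(tau_hat, alpha):
    """Bradley's (1978) liberal criterion: .5 * alpha <= tau_hat <= 1.5 * alpha."""
    return 0.5 * alpha <= tau_hat <= 1.5 * alpha


# alpha = .05 entries from Table 2 (normal data)
examples = [0.0823,   # r = 4, G = 6, maximum: starred
            0.0732,   # r = 4, G = 6, 75th percentile: not starred
            0.0739]   # r = 2, G = 3, 25th percentile: not starred
flags = [meets_bradley_liberal(v, 0.05) for v in examples]
```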

The interpretation of these results, of course, depends on one's tolerance for Type I errors. Using the criterion that control of τ is good when each τ̂ < .07, Johansen's test controls τ well when the data are normal, G = 3, and r ≥ 4; if G = 6, Johansen's test controls τ well when r ≥ 6. When G = 3, Johansen's test may control τ well at some level of r in the interval 2 < r < 4. When G = 6, Johansen's test may control τ well at some level of r in the interval 4 < r < 6. Our design does not permit determination of these values except by interpolation between the percentiles for r = 2, 4, and 6.

TABLE 1
Mean-square components

Effect    θ̂           Percent of variance
r         .000558     43.9
G         .000441     33.0
r × G     .000186     13.7
DT        .000045      3.6

TABLE 2
Percentiles of τ̂ for the Johansen test: Multivariate normal distribution

            r = 2             r = 4             r = 6
Percentile  G = 3    G = 6    G = 3    G = 6    G = 3    G = 6
Max         .1273*   .2430*   .0635    .0823*   .0579    .0627
95th        .1175*   .2243*   .0628    .0782*   .0552    .0613
90th        .1018*   .2162*   .0605    .0766*   .0545    .0602
75th        .0913*   .1879*   .0573    .0732    .0525    .0583
50th        .0819*   .1664*   .0550    .0697    .0517    .0568
25th        .0739    .1464*   .0534    .0667    .0501    .0555
10th        .0661    .1275*   .0515    .0637    .0496    .0533
5th         .0649    .1175*   .0505    .0622    .0493    .0527
Min         .0568    .1029*   .0498    .0563    .0488    .0509

Note. Percentiles denoted by asterisks fall outside the interval [.5α, 1.5α].

To provide additional information for interpolation, we estimated τ for several additional conditions. We selected the additional conditions based on the following considerations. The four values of τ̂ for G = 3, d = 3, p = 3 and 6, and NR = 1:1:2 and 1:1:1.5, and the four values of τ̂ for G = 6, d = 3, p = 3 and 6, and NR = 1:1:1:1:2:2 and 1:1:1:1:1.5:1.5, were among the largest six values of τ̂ for each combination of distribution and r. Further, for each of G = 3 and G = 6, one of these four values of τ̂ was the maximum τ̂. The results of the additional runs are presented in Table 4. The new results indicate that for normal distributions and G = 3, control of τ is good for r ≥ 3⅓. For normal distributions and G = 6, control is good for r ≥ 4⅔.

For the nonnormal conditions, the results in Table 3 indicate that control of τ is adequate when G = 3 and r ≥ 6, in the sense that almost all τ̂s meet Bradley's (1978) liberal criterion when G = 3 and r ≥ 6. When G = 6, an even larger value of r is required. The results in Table 4 indicate that when G = 6, control of τ is adequate for r ≥ 8. Even larger sample sizes would be required for good control of τ.

Results for α = .01 and α = .10 exhibited trends similar to those for α = .05.

TABLE 3
Percentiles of τ̂ for the Johansen test: Nonnormal distribution

            r = 2             r = 4             r = 6
Percentile  G = 3    G = 6    G = 3    G = 6    G = 3    G = 6
Max         .1649*   .3078*   .0893*   .1265*   .0791*   .0957*
95th        .1436*   .2751*   .0856*   .1164*   .0741    .0889*
90th        .1245*   .2481*   .0785*   .1010*   .0654    .0816*
75th        .1008*   .2152*   .0670    .0923*   .0607    .0748
50th        .0853*   .1895*   .0602    .0853*   .0564    .0702
25th        .0748    .1646*   .0571    .0820*   .0545    .0674
10th        .0694    .1470*   .0542    .0780*   .0519    .0650
5th         .0669    .1322*   .0526    .0731    .0507    .0633
Min         .0647    .1241*   .0498    .0722    .0489    .0610

Note. Percentiles denoted by asterisks fall outside the interval [.5α, 1.5α].

TABLE 4
Type I error rates for several additional conditions

Distribution  r     Sample size ratio   p = 3   p = 6
Normal        3     1:1:1.5             .0723   .0869
                    1:1:2.0             .0698   .0936
              3⅓    1:1:1.5             .0691   .0696
                    1:1:2.0             .0679   .0690
              4⅓    1:1:1:1:1.5:1.5     .0762   .0714
                    1:1:1:1:2.0:2.0     .0764   .0723
              4⅔    1:1:1:1:1.5:1.5     .0676   .0699
                    1:1:1:1:2.0:2.0     .0687   .0695
Lognormal     8     1:1:1:1:1.5:1.5     .0796   .0751
                    1:1:1:1:2.0:2.0     .0824   .0780
              9     1:1:1:1:1.5:1.5     .0748   .0744
                    1:1:1:1:2.0:2.0     .0782   .0715

Note. For r = 3, 4⅓, and 9 and a sample size ratio of 1.5, the larger sample sizes were selected by multiplying the smaller sample size by 1.5 and rounding up to the next larger integer.
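The group sizes implied by r, p, and a sample size ratio can be reproduced with exact rational arithmetic. The sketch below is a hypothetical helper: it takes the smallest sample size as $n_{[1]} = r p$ and, following the note to Table 4, rounds any fractional larger sample size up. Using `fractions.Fraction` avoids floating-point error for values such as r = 3⅓ (passed as the string "10/3").

```python
import math
from fractions import Fraction


def sample_sizes(p, r, ratios):
    """Hypothetical helper: group sizes for one condition.

    p      -- number of dependent variables
    r      -- smallest sample size divided by p (int, str such as "10/3", or Fraction)
    ratios -- sample size ratio, e.g. ["1", "1", "1.5"], where 1 marks the smallest group
    """
    n_smallest = Fraction(r) * p
    if n_smallest.denominator != 1:
        raise ValueError("r * p must be a whole number")
    # Larger groups: ratio * n_smallest, rounded up when not already an integer
    return [math.ceil(Fraction(k) * n_smallest) for k in ratios]


print(sample_sizes(3, 3, ["1", "1", "1.5"]))  # [9, 9, 14]
```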

Discussion and Conclusions

The generalizability of the study is limited by (a) the types of distributions studied, (b) the forms of the dispersion matrices considered, (c) the variation in the degree of the sample size ratios, and (d) the variation in the ratio of the smallest sample size to the number of dependent variables. Keeping in mind these limitations and the need for additional research, the following conclusions can be set forth.

Of the factors included in the study, the ratio of the smallest sample size to the number of variables, the number of groups, the interaction of the preceding factors, and the type of distribution have the strongest effects on τ. The effect of distribution type was substantially smaller than the other three effects. It is important to note that degree of heteroscedasticity had a very small effect on τ. Degree of sample size imbalance also had a small effect, accounting for just under 1% of the variance. Though these results certainly reflect the levels of heteroscedasticity and the sample size ratios we employed in the study, the results suggest that degree of heteroscedasticity and sample size imbalance do not have strong effects on τ, at least in comparison with the size of the effects of the number of groups, the ratio of the smallest sample size to the number of variables, and the type of distribution.

The results suggest some sample size guidelines for using Johansen's test. With multivariate normal data and three groups, the Johansen test is effective at controlling τ when r ≥ 3⅓. This value of r is smaller than the value r = 4 suggested by inspection of the results in Algina and Tang (1988), Coombs (1993), and Tang (1989). When there are six groups, control of τ is good when r ≥ 4⅔. This value of r is larger than the value r = 4 suggested by inspection of the results in the cited literature. With skewed data, larger sample sizes are required to control τ. The skewed data in our study were centered and scaled lognormal distributions with skew 1.32. With three groups, r ≥ 6 was required for adequate control of τ; with six groups, r ≥ 8 was required. Both values of r are larger than the value r = 4 suggested by inspection of the results in Algina and Tang, Coombs, and Tang.
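For planning purposes, these guidelines translate into a minimum size for the smallest group, $n_{[1]} \ge r p$. The sketch below encodes only the four cases studied (normal or skewed data, G = 3 or 6); the dictionary and function names are hypothetical, each threshold is treated as an inclusive lower bound, and the skewed thresholds describe "adequate" rather than "good" control for lognormal data with skew 1.32. Other numbers of groups, distributions, or dispersion patterns fall outside what the study examined.

```python
import math
from fractions import Fraction

# Minimum r reported in the conclusions, keyed by (distribution, G)
MIN_R = {
    ("normal", 3): Fraction(10, 3),   # r >= 3 1/3
    ("normal", 6): Fraction(14, 3),   # r >= 4 2/3
    ("skewed", 3): Fraction(6),       # adequate control
    ("skewed", 6): Fraction(8),       # adequate control
}


def min_smallest_n(p, G, distribution):
    """Smallest group size n_[1] implied by the guideline r * p (rounded up)."""
    key = (distribution, G)
    if key not in MIN_R:
        raise ValueError("the study covered only G = 3 or 6 with normal or skewed data")
    return math.ceil(MIN_R[key] * p)


print(min_smallest_n(6, 6, "normal"))  # 28
```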

References

Algina, J., Oshima, T. C., & Tang, K. L. (1991). Robustness of Yao's, James', and Johansen's tests under variance-covariance heteroscedasticity and nonnormality. Journal of Educational Statistics, 16, 125-139.

Algina, J., & Tang, K. L. (1988). Type I error rates for Yao's and James' tests of equality of mean vectors under variance-covariance heteroscedasticity. Journal of Educational Statistics, 13, 281-290.

Anderson, T. W. (1958). An introduction to multivariate statistical analysis. New York: Wiley.

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-152.

Coombs, W. T. (1993). Solutions to the multivariate G-sample Behrens-Fisher problem based upon generalizations of the Brown-Forsythe F* and Wilcox Hm tests. Dissertation Abstracts International, 57(01), 313B. (University Microfilms No. 9314213)

James, G. S. (1954). Tests of linear hypotheses in univariate and multivariate analysis when the ratio of the population variances are unknown. Biometrika, 41, 19-43.

Johansen, S. (1980). The Welch-James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika, 67, 85-92.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156-166.

Olsen, C. L. (1974). Comparative robustness of six tests in multivariate analysis of variance. Journal of the American Statistical Association, 69, 894-908.

Tang, K. L. (1989). Robustness of four multivariate tests under variance-covariance heteroscedasticity. Unpublished doctoral dissertation, University of Florida. (University Microfilms No. 90-28,574)

Tang, K. L., & Algina, J. (1993). Performance of four multivariate tests under variance-covariance heteroscedasticity. Multivariate Behavioral Research, 28, 391-405.

Wilcox, R. R. (1988). A new alternative to the ANOVA F and new results on James's second-order method. British Journal of Mathematical and Statistical Psychology, 41, 109-117.

Authors

WILLIAM T. COOMBS is Assistant Professor, Applied Behavioral Studies in Education, 202 North Murray Hall, Oklahoma State University, Stillwater, OK 74078-0254. He specializes in applied statistics.


JAMES ALGINA is Professor and Chair, Foundations of Education, 1403 Norman Hall, University of Florida, Gainesville, FL 32611-2053. He specializes in psychometric theory and applied statistics.

Received June 30, 1993 Revision received May 20, 1994

Second revision received October 7, 1994 Accepted January 3, 1995
