
Computational Statistics and Data Analysis 54 (2010) 1622–1633

journal homepage: www.elsevier.com/locate/csda

A Bayesian solution to the multivariate Behrens–Fisher problem

Patrícia de Siqueira Ramos, Daniel Furtado Ferreira ∗

Exact Sciences Department, Federal University of Lavras, 37200-000, Brazil

Article info

Article history:
Received 26 August 2009
Received in revised form 7 January 2010
Accepted 8 January 2010
Available online 25 January 2010

Keywords:
Means comparison
Heteroscedastic covariances
Monte Carlo simulation
Type I error rate
Power

Abstract

One of the most common problems in applied statistics is to compare two normal population means when the ratio of variances is unknown and not equal to 1. This is known as the Behrens–Fisher problem. There are many approaches to the distribution of the t-statistic in the univariate case under the Behrens–Fisher problem. In the multivariate case, most solutions are based on adjusting the degrees of freedom to obtain better approximations to the chi-squared or Hotelling's $T^2$ distributions. In both cases Bayesian solutions have also been proposed. This work aimed to propose a computational Bayesian solution to the multivariate Behrens–Fisher problem based on the complex analytical solution of Johnson and Weerahandi (1988), to evaluate its performance through Monte Carlo simulation by computing type I error rates and power, and to compare it with the modified Nel and Van der Merwe test, which is considered the best frequentist solution. Inferences were made about the difference $\delta$ between the population mean vectors. A conjugate prior distribution was used for the population mean vector ($\mu_i$) and covariance matrix ($\Sigma_i$), yielding a posterior multivariate t distribution for $\mu_i$, $i = 1, 2$. In general, the Bayesian test was conservative for samples of different sizes and liberal in some circumstances with equal and small sample sizes, and its power was equal to or greater than that of its competitor in large samples and/or in balanced circumstances. The new solution has competitive advantages and in some circumstances surpasses its main competitor; therefore its use in real cases is recommended.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

One of the most common problems in applied statistics is to compare two normal population means when the ratio of variances is unknown and different from 1 (Scheffé, 1970). In the univariate case, Student's t-test for equality of means is used when the two population variances are assumed to be equal and the data are normally distributed. However, the t-statistic is sensitive to variance heterogeneity, so several alternative approaches have been proposed. Behrens (1929) was the first to propose a solution for testing the hypothesis of equality of two normal means under heteroscedastic variances, and Fisher (1936) then showed that this solution could be justified using fiducial inference. Because they were the first authors to study this issue, it became known as the Behrens–Fisher problem. The first Bayesian approach to the univariate problem was presented by Jeffreys (1940), whose solution is equivalent to that obtained by fiducial inference.

The multivariate Behrens–Fisher problem deals with the comparison of two mean vectors from multivariate normal distributions with unknown and heterogeneous covariance matrices. Several alternatives have been proposed in the literature to circumvent the problem in the multivariate case. Most of them adjust the degrees of freedom of the chi-square or Hotelling's $T^2$ statistics in search of better-performing approximations (Ferreira, 2008). Some of these solutions were proposed by James (1954), Yao (1965), Nel and Van der Merwe (1986), Johnson and Weerahandi (1988) and others. The performance of some of these solutions was evaluated by Monte Carlo simulation: Subrahmaniam and Subrahmaniam (1973) compared three methods, Christensen and Rencher (1997) compared seven methods, and Lix et al. (2005) evaluated six frequentist procedures. According to these papers, there is apparently no definitive solution that performs well in all circumstances.

∗ Corresponding author. Tel.: +55 035 3821 3831. E-mail addresses: [email protected] (P.S. Ramos), [email protected] (D.F. Ferreira).

0167-9473/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2010.01.019

Krishnamoorthy and Yu (2004) proposed a test that modifies the Nel and Van der Merwe (1986) solution so that it is invariant under nonsingular transformations. To apply the test it is necessary to calculate the value of the statistic given by

$$T_c^{*2} = (\bar{Y}_1 - \bar{Y}_2 - \delta_0)^\top \left[\frac{V_1}{n_1(n_1 - 1)} + \frac{V_2}{n_2(n_2 - 1)}\right]^{-1} (\bar{Y}_1 - \bar{Y}_2 - \delta_0),$$

where $\bar{Y}_i$ is the sample mean vector, $V_i$ is the cross-product matrix and $n_i$ is the sample size of the $i$th population, $i = 1, 2$. The null hypothesis of equality of the two population mean vectors, $H_0: \mu_1 - \mu_2 = \delta_0 = 0$, should be rejected if $T_c^{*2} > \nu p F_{\alpha,p,\nu+1-p}/(\nu + 1 - p)$, where $\nu$ is calculated by

$$\nu = \frac{p + p^2}{\displaystyle\sum_{i=1}^{2} \frac{1}{n_i - 1}\left\{\operatorname{tr}\left[\left(\frac{S_i S_e^{-1}}{n_i}\right)^2\right] + \left[\operatorname{tr}\left(\frac{S_i S_e^{-1}}{n_i}\right)\right]^2\right\}},$$

$S_e = S_1/n_1 + S_2/n_2$, and $F_{\alpha,p,\nu+1-p}$ is the upper $100\alpha\%$ quantile of the F distribution with $p$ and $\nu + 1 - p$ degrees of freedom. Here $S_i = V_i/(n_i - 1)$ denotes the sample covariance matrix of the $i$th population, so that $S_e$ is the matrix inverted in $T_c^{*2}$. In the univariate case ($p = 1$) this solution reduces to the Welch approximation to the degrees of freedom (Welch, 1947), as does Yao's approach (Yao, 1965).

One Bayesian solution to the univariate Behrens–Fisher problem was proposed by Buckley (2004) and applied to the political, social and behavioral sciences. The author presented a model that allows Bayesian estimation of the difference $\delta = \mu_1 - \mu_2$ between population means. The posterior distribution of this difference, called the Behrens–Fisher distribution, is obtained by using noninformative prior distributions for the means and for the logarithms of the variances of the two populations. Interpretations can be made by plotting the posterior distribution or by computing HPD (highest posterior density) regions, i.e., regions of values that contain $100(1-\alpha)\%$ of the posterior probability, for different percentages of the posterior density. The model is extended to consider non-normal variates, informative prior distributions, and a multivariate test based on Markov chain Monte Carlo (MCMC) simulation. To implement the multivariate test the author used conjugate prior distributions for the mean vectors and covariance matrices of the two populations. The prior distribution of the means is normal with mean 0 and small precision (inverse variance), and the prior distribution of the precision matrices is an independent Wishart. The posterior distribution of the difference between the two mean vectors was then obtained. Instead of solving for the exact posterior probabilities under this model or deriving approximations, the author used MCMC simulation to estimate the marginal posterior densities. Unfortunately, Buckley (2004) did not provide Monte Carlo simulations to evaluate the performance of his solution; only real examples were used to illustrate the method.

Another Bayesian solution to the multivariate Behrens–Fisher problem was presented by Johnson and Weerahandi (1988). However, it was not evaluated by Monte Carlo simulation. The aim of the procedure was to find posterior probabilities of ellipsoids for the difference between two multivariate normal means. The prior distributions were either noninformative or conjugate. In the first case, the joint prior distribution was

$$\pi(\mu_1, \Sigma_1, \mu_2, \Sigma_2) = (|\Sigma_1||\Sigma_2|)^{-1/2},$$

where $\mu_i \in \mathbb{R}^p$ and $\Sigma_i > 0$, $i = 1, 2$, and $\mu_1$ and $\mu_2$ were independently distributed. The difference $\mu_i - \bar{y}_i$ is distributed as $T_p(V_i/[(n_i - p)n_i],\, n_i - p)$, $i = 1, 2$, where $T_p(A, \nu)$ is the $p$-dimensional multivariate t distribution with parameters $A$ and $\nu$.

In the second case conjugate prior distributions were used. It was assumed that $\mu_i|\Sigma_i$ is distributed as $N_p(a_i, \Sigma_i/q_i)$ and $\Sigma_i$ is distributed as $W_p^{-1}(R_i, r_i)$, $i = 1, 2$, where $W_p^{-1}(A, \nu)$ is the inverse Wishart distribution with parameters $A$ and $\nu$. Thus, the joint distribution of $\mu_1$ and $\mu_2$ is defined as follows: $\mu_1$ and $\mu_2$ are independent and $\mu_i - \tilde{y}_i$ is distributed as $T_p(U_i/(n_i + r_i - 2p),\, n_i + r_i - 2p)$, $i = 1, 2$, where $\tilde{y}_i = (n_i\bar{y}_i + q_i a_i)/(n_i + q_i)$ and

$$U_i = \left[\frac{1}{n_i + q_i}\right]\left[W_i + R_i + \frac{n_i q_i}{n_i + q_i}(a_i - \bar{y}_i)(a_i - \bar{y}_i)^\top\right], \quad i = 1, 2.$$

For both prior distributions, inference is performed on $\delta = \mu_1 - \mu_2$ and, therefore, it is necessary to calculate probabilities based on the convolution of two multivariate t distributions. For the inference the authors considered the quadratic form

$$Q = (\delta - d)^\top V^{-1} (\delta - d),$$

centered at the posterior mean $d = \bar{Y}_1 - \bar{Y}_2$ of $\delta$. The known positive definite matrix $V$ can be chosen to facilitate the computation of probabilities for $Q$. The cumulative distribution function of $Q$ is a linear combination of central F distributions, $F_{r,s}$, with $r$ and $s$ degrees of freedom, given by

$$F_Q(q) = \sum_{j=0}^{\infty} E(\omega_j)\, F_{p+2j,\,n}\left[\frac{nq}{\theta(p + 2j)}\right],$$

where $\theta$ is an arbitrary constant chosen to obtain fast convergence of the summation, the expectation is taken with respect to the beta random variable $B$, which is distributed as $\mathrm{Be}[(n_1 - p)/2, (n_2 - p)/2]$, and $n = n_1 + n_2 - 2p$. The quantity $\omega_j$ ($j = 1, 2, \ldots$) is defined in terms of $\theta$ and $\lambda_j(B)$ by the recursion relation

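$$\omega_r = \frac{1}{2r}\sum_{j=0}^{r-1} H_{r-j}\,\omega_j,$$

where $\omega_0 = \prod_{j=1}^{p}(\theta/\lambda_j)^{1/2}$, $H_r = \sum_{j=1}^{p}(1 - \theta/\lambda_j)^r$ ($r = 1, 2, \ldots$) and $\lambda_j = \lambda_j(B)$, $j = 1, \ldots, p$, are the ordered eigenvalues $\lambda_1(B) \le \cdots \le \lambda_p(B)$ of the matrix

$$\frac{1}{n_1 B} V^{-1/2} V_1 V^{-1/2} + \frac{1}{n_2(1 - B)} V^{-1/2} V_2 V^{-1/2}.$$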
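As an illustration, the recursion for the weights $\omega_r$ can be sketched in Python for a single fixed value of $B$. This is a minimal sketch under stated assumptions, not the implementation of Johnson and Weerahandi (1988): the function name `series_weights` and its arguments are hypothetical, the eigenvalues $\lambda_j(B)$ are assumed to be already available, and the expectation $E(\omega_j)$ over $B$ that the series for $F_Q$ requires is not shown.

```python
import math


def series_weights(eigenvalues, theta, n_terms):
    """Return [omega_0, ..., omega_{n_terms - 1}] for fixed eigenvalues lambda_j.

    Hypothetical helper: implements only the recursion quoted in the text,
    for one fixed value of B.
    """
    ratios = [theta / lam for lam in eigenvalues]
    # omega_0 = prod_j (theta / lambda_j)^(1/2)
    omega = [math.prod(r ** 0.5 for r in ratios)]
    # H_r = sum_j (1 - theta / lambda_j)^r for r = 1, 2, ... (index 0 unused)
    H = [None] + [sum((1.0 - r) ** k for r in ratios) for k in range(1, n_terms)]
    for r in range(1, n_terms):
        # omega_r = (1 / 2r) * sum_{j=0}^{r-1} H_{r-j} * omega_j
        omega.append(sum(H[r - j] * omega[j] for j in range(r)) / (2 * r))
    return omega


weights = series_weights([1.0, 4.0], theta=1.0, n_terms=200)
print(sum(weights))  # close to 1 when theta does not exceed the smallest eigenvalue
```

When $\theta$ does not exceed the smallest eigenvalue, every $1 - \theta/\lambda_j$ lies in $[0, 1)$ and the weights produced by this recursion are nonnegative and sum to one, which gives a simple numerical check on where to truncate the series.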

After obtaining the distribution of $Q$, $F_Q(q)$, and its quantiles, the hypothesis $H_0: \delta = \delta_0$ should be rejected if the credibility ellipsoid does not contain the hypothesized value $\delta_0$. The performance of this solution was not evaluated by simulation. The solution proposed by Buckley (2004) was an attempt to circumvent the complexity of the solution of Johnson and Weerahandi (1988) and to provide a simpler alternative, although details of this solution are not shown. Thus, the scientific literature lacks a simple Bayesian solution to the multivariate problem.

Motivated by these considerations, this work proposes a computational Bayesian solution to the multivariate Behrens–Fisher problem based on the approach of Johnson and Weerahandi (1988). It also evaluates the performance of the new proposal by Monte Carlo simulation, computing type I error rates and power under different degrees of covariance heterogeneity between the two multivariate normal populations, for several combinations of sample sizes, differences (in standard errors) between the mean vectors, and nominal significance levels. Finally, this work compares the performance of the new solution with that of the modified Nel and Van der Merwe test, considered the best frequentist solution according to Krishnamoorthy and Yu (2004).
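For reference, the modified Nel and Van der Merwe statistic $T_c^{*2}$ and its degrees of freedom $\nu$, as defined earlier in this section, can be sketched in plain Python. This is only a sketch: the function names are hypothetical, $S_i$ is assumed to be the sample covariance matrix $V_i/(n_i - 1)$, and the F quantile needed for the final decision is left to a statistical library, so the function returns the factor $\nu p/(\nu + 1 - p)$ by which that quantile must be multiplied.

```python
def sample_mean_and_cov(sample):
    """Sample mean vector and covariance matrix S_i = V_i / (n_i - 1)."""
    n, p = len(sample), len(sample[0])
    mean = [sum(row[k] for row in sample) / n for k in range(p)]
    cov = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in sample) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mean, cov


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    p = len(a)
    aug = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def modified_nvm(sample1, sample2, delta0=None):
    """Return (T*2_c, nu, factor); reject H0 if T*2_c > factor * F_{alpha, p, nu+1-p}."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    (m1, s1), (m2, s2) = sample_mean_and_cov(sample1), sample_mean_and_cov(sample2)
    delta0 = delta0 or [0.0] * p
    # S_e = S_1 / n_1 + S_2 / n_2
    se_inv = inverse([[s1[a][b] / n1 + s2[a][b] / n2 for b in range(p)]
                      for a in range(p)])
    d = [m1[k] - m2[k] - delta0[k] for k in range(p)]
    t2 = sum(d[a] * se_inv[a][b] * d[b] for a in range(p) for b in range(p))
    denom = 0.0
    for s, n in ((s1, n1), (s2, n2)):
        m = matmul([[x / n for x in row] for row in s], se_inv)  # S_i S_e^{-1} / n_i
        denom += (trace(matmul(m, m)) + trace(m) ** 2) / (n - 1)
    nu = (p + p * p) / denom
    return t2, nu, nu * p / (nu + 1 - p)
```

For $p = 1$ the value of `nu` returned by this sketch coincides with the Welch degrees of freedom, in line with the reduction noted earlier in this section.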

2. Methodology

2.1. The proposed Bayesian solution

Let $Y_{11}, \ldots, Y_{1n_1}$ and $Y_{21}, \ldots, Y_{2n_2}$ be independent random samples of sizes $n_1$ and $n_2$ obtained from two $p$-dimensional multivariate normal populations $N_p(\mu_1, \Sigma_1)$ and $N_p(\mu_2, \Sigma_2)$, respectively, where $\mu_i$ is the $p \times 1$ mean vector and $\Sigma_i$ is the $p \times p$ covariance matrix of the $i$-th population, $i = 1, 2$. To characterize the Behrens–Fisher problem, the matrices $\Sigma_1$ and $\Sigma_2$ are considered different and unknown. To make inferences about the difference between the unknown population mean vectors, $\delta = \mu_1 - \mu_2$, a Bayesian test was proposed for the null hypothesis

$$H_0: \delta = \delta_0 = [\delta_{01}, \delta_{02}, \ldots, \delta_{0p}]^\top. \tag{1}$$

In particular, the interest was in testing the null hypothesis (1) with $\delta_0 = [0, 0, \ldots, 0]^\top$, which corresponds to equality of the two population mean vectors. The procedure described by Johnson and Weerahandi (1988) was used and is presented next. Initially, the sample mean vectors and cross-product matrices were obtained by

$$\bar{Y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \tag{2}$$

$$V_i = \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)(Y_{ij} - \bar{Y}_i)^\top, \tag{3}$$

for $i = 1, 2$.

A conjugate prior distribution for $\mu_i$ and $\Sigma_i$ was used, as suggested by Johnson and Weerahandi (1988), where $\mu_i|\Sigma_i \sim N_p(a_i, \Sigma_i/q_i)$, $\Sigma_i \sim W_p^{-1}(R_i, r_i)$, and $W_p^{-1}$ is the $p$-dimensional inverse Wishart distribution. In this case, $a_i$, $q_i$, $R_i$ and $r_i$ are hyperparameters. Taking $a_i = 0$, $q_i \to 0$ and $R_i \to 0$ ($p \times p$), the posterior distribution of $\mu_i$ is $T_p(\bar{Y}_i,\, V_i/[n_i(n_i + r_i - 2p)],\, n_i + r_i - 2p)$, a $p$-dimensional multivariate t distribution with location $\bar{Y}_i$, covariance parameter $V_i/[n_i(n_i + r_i - 2p)]$ and $n_i + r_i - 2p$ degrees of freedom. The hyperparameter $r_i$ was chosen such that $n_i + r_i - 2p$ varied between $n_i - p$ and $n_i - 1$; thus, $r_i$ should range between $p$ and $2p - 1$. It was found that the closer $p$ was to $n_i$, the worse the performance of the proposed test. The test became liberal if $n_i + r_i - 2p$ was replaced by $n_i - 1$ and conservative if $n_i + r_i - 2p$ was close to $n_i - p$.


It was verified that the test performance changed from liberal to conservative in a non-linear pattern depending on $r_i$. The information needed to set the hyperparameter $r_i$ was therefore contained in $p$ and $n_i$ and could be expressed as a function of the ratio $p/n_i$. The value of $r_i$ was defined empirically by

$$r_i = (2p - 1)\left(\frac{p}{2p - 1}\right)^{(p/n_i)^3}, \tag{4}$$

which ranges between $p$ (as $p/n_i \to 1$) and $2p - 1$ (as $p/n_i \to 0$).

To make inferences about $\delta = \mu_1 - \mu_2$ it is necessary to obtain the convolution of two multivariate t distributions, which is very difficult. Thus, a function of $\delta$ and $\bar{Y}_1 - \bar{Y}_2$ was used and its posterior distribution was obtained computationally by Monte Carlo simulation. This feature distinguishes this work from the proposal of Johnson and Weerahandi (1988): they obtained an analytical distribution of this function and calculated its posterior quantiles numerically, using the exact distribution and some approximations. This function is given by the quadratic form

$$q = (d - \delta)^\top V_p^{-1} (d - \delta),$$

where $d = \bar{Y}_1 - \bar{Y}_2$, $\delta = \mu_1 - \mu_2$ and $V_p$ ($V$ pooled) is a linear combination of the matrices $V_1$ and $V_2$ given by

$$V_p = \frac{(n_1 - 1)V_1 + (n_2 - 1)V_2}{n_1 + n_2 - 2}. \tag{5}$$

As the posterior distribution function of $Q$ obtained by Johnson and Weerahandi (1988) is very complex and involves an infinite series, the purpose of this work is to obtain it computationally. Thus, given the samples $Y_{11}, \ldots, Y_{1n_1}$ and $Y_{21}, \ldots, Y_{2n_2}$ from the two multivariate normal populations, the sample mean ($\bar{Y}_i$) and cross-product matrix ($V_i$) were computed using (2) and (3), for $i = 1, 2$. Random samples of size $N$ were then simulated from the posterior distribution of $\mu_i$, taken to be $T_p(\bar{Y}_i,\, V_i/[n_i(n_i + r_i - 2p)],\, n_i + r_i - 2p)$, which has mean $\bar{Y}_i$. Letting $\mu_{ij}$ denote the $j$-th multivariate observation drawn from the posterior distribution of the $i$-th population, with $j = 1, 2, \ldots, N$ and $i = 1, 2$, the following quantity was obtained:

$$q_j = (\mu_{1j} - \mu_{2j} - d)^\top V_p^{-1} (\mu_{1j} - \mu_{2j} - d). \tag{6}$$

The expected value of $\mu_{1j} - \mu_{2j} - d$ is the $p$-dimensional null vector. Thus, the distribution of $q_j$ corresponds to the distribution under the null hypothesis of equal population means. To test the null hypothesis $H_0: \delta = \delta_0$, the following quantity was computed:

$$q_c = (d - \delta_0)^\top V_p^{-1} (d - \delta_0). \tag{7}$$

The empirical probability value, denoted here as the empirical credibility $C$, was used as evidence to decide whether or not the null hypothesis should be rejected. It was calculated by

$$C = \frac{1}{N}\sum_{j=1}^{N} I(q_c \le q_j),$$

where $I(q_c \le q_j)$ is the indicator function, equal to 1 if $q_j$ is at least as large as the observed value $q_c$ and 0 otherwise. The value of $N$ considered in this work was 2000.
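The steps of this section, Eqs. (2) to (7) and the empirical credibility $C$, can be put together in a short Python sketch. It is not the authors' code (their simulations were run in R): all function names are hypothetical, the second argument of the posterior $T_p$ is assumed to be the scale matrix of a standard multivariate t distribution, and draws are generated as a normal vector divided by the square root of an independent chi-square variate over its degrees of freedom.

```python
import math
import random


def mean_and_crossprod(sample):
    """Sample mean vector (Eq. 2) and cross-product matrix V_i (Eq. 3)."""
    n, p = len(sample), len(sample[0])
    mean = [sum(row[k] for row in sample) / n for k in range(p)]
    v = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in sample)
          for b in range(p)] for a in range(p)]
    return mean, v


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a symmetric positive definite)."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def quad_form(x, low):
    """x^T A^{-1} x computed from the Cholesky factor of A by forward substitution."""
    z = []
    for i in range(len(x)):
        z.append((x[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return sum(zi * zi for zi in z)


def posterior_draw(mean, low, nu, rng):
    """One draw from T_p(mean, A, nu); `low` is the Cholesky factor of A (assumed scale)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    w = math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)  # sqrt(chi2_nu / nu)
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) / w
            for i in range(len(mean))]


def bayesian_bf_test(sample1, sample2, delta0=None, n_draws=2000, seed=0):
    """Empirical credibility C for H0: delta = delta0 (sketch of Section 2.1)."""
    rng = random.Random(seed)
    p = len(sample1[0])
    delta0 = delta0 or [0.0] * p
    parts = []
    for sample in (sample1, sample2):
        n = len(sample)
        mean, v = mean_and_crossprod(sample)
        r = (2 * p - 1) * (p / (2 * p - 1)) ** ((p / n) ** 3)  # Eq. (4)
        nu = n + r - 2 * p
        low = cholesky([[x / (n * nu) for x in row] for row in v])
        parts.append((n, mean, v, nu, low))
    (n1, m1, v1, nu1, l1), (n2, m2, v2, nu2, l2) = parts
    vp = [[((n1 - 1) * v1[a][b] + (n2 - 1) * v2[a][b]) / (n1 + n2 - 2)
           for b in range(p)] for a in range(p)]  # Eq. (5)
    lp = cholesky(vp)
    d = [m1[k] - m2[k] for k in range(p)]
    qc = quad_form([d[k] - delta0[k] for k in range(p)], lp)  # Eq. (7)
    hits = 0
    for _ in range(n_draws):
        mu1 = posterior_draw(m1, l1, nu1, rng)
        mu2 = posterior_draw(m2, l2, nu2, rng)
        qj = quad_form([mu1[k] - mu2[k] - d[k] for k in range(p)], lp)  # Eq. (6)
        hits += qc <= qj
    return hits / n_draws
```

Comparing the returned value with the nominal level (rejecting $H_0$ when $C < \alpha$) appears to be the natural decision rule, although the text only states that $C$ was used as evidence for the decision.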

2.2. Monte Carlo simulation

To evaluate the performance of the proposed test, Monte Carlo simulation was used. The simulations were performed using the R software (R Development Core Team, 2009). Two stages were considered. In the first, the type I error rates were evaluated using samples simulated under $H_0$. In the second, the power was computed using samples generated under $H_1$.

Under the null hypothesis of equality of mean vectors, $M = 2000$ Monte Carlo simulations were performed to obtain the type I error rates. Multivariate samples from two normal distributions $N_p(\mu_1, \Sigma_1)$ and $N_p(\mu_2, \Sigma_2)$ were simulated, with sizes $n_1$ and $n_2$, where $p$ is the dimension (number of variates) and $\Sigma_i$ ($p \times p$) is the covariance matrix of the $i$-th population, assumed to be real, symmetric and positive definite. Under the null hypothesis $H_0: \mu_1 = \mu_2$, and without loss of generality, both mean vectors were set equal to the $p$-dimensional null vector $0$.

Under the alternative hypothesis $H_1: \mu_1 \neq \mu_2$, $M$ Monte Carlo simulations were performed in a second stage to calculate the power. Again, multivariate samples from two normal distributions $N_p(\mu_1, \Sigma_1)$ and $N_p(\mu_2, \Sigma_2)$ were simulated, with sizes $n_1$ and $n_2$, where the means differed by $k$ standard errors ($k = 2, 4, 8, 16$). The mean of population 2 was defined as a function of the mean of population 1 by $\mu_2 = \mu_1 + k\sqrt{\operatorname{diag}(\Sigma_{\bar{Y}_1 - \bar{Y}_2})}$, where $\Sigma_{\bar{Y}_1 - \bar{Y}_2} = \Sigma_1/n_1 + \Sigma_2/n_2$ is the covariance matrix of the difference between the sample means. Without loss of generality, $\mu_1$ was fixed as the $p$-dimensional null vector, i.e., $\mu_1 = [0 \ldots 0]^\top$, and

$$\sqrt{\operatorname{diag}(\Sigma_{\bar{Y}_1 - \bar{Y}_2})} = \begin{bmatrix} \sqrt{\sigma^{1}_{11}/n_1 + \sigma^{2}_{11}/n_2} \\ \sqrt{\sigma^{1}_{22}/n_1 + \sigma^{2}_{22}/n_2} \\ \vdots \\ \sqrt{\sigma^{1}_{pp}/n_1 + \sigma^{2}_{pp}/n_2} \end{bmatrix},$$

where $\sigma^{i}_{\ell\ell}$ is the variance of the $\ell$-th variate ($\ell = 1, \ldots, p$) in the $i$-th population ($i = 1, 2$), sampled with size $n_i$; the superscript indexes the population. This simulation strategy corresponds to the concentrated non-centrality structure (Olson, 1974).

In both cases (under $H_0$ and $H_1$) the covariance matrices were

$$\Sigma_1 = \sigma^2 \begin{bmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \rho & \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \rho & \cdots & 1 \end{bmatrix}, \qquad \Sigma_2 = \gamma_p \Sigma_1, \tag{8}$$

where $\gamma_p = \sqrt[p]{\gamma}$ and $\gamma = |\Sigma_2|/|\Sigma_1|$ is the degree of covariance heterogeneity between the two populations, set to 1, 2, 8, 16 and 32 (the $p$-th root appears because $|\gamma_p\Sigma_1| = \gamma_p^p|\Sigma_1|$). The value $\gamma = 1$ corresponds to homogeneity of the covariances of the two populations and is therefore not a case of the multivariate Behrens–Fisher problem; it was used only as a reference, an ideal situation against which to compare the performance of the tests for $\gamma$ values greater than 1.

After the samples from both populations were obtained in each of the $M$ Monte Carlo simulations, the following tests were applied: the Bayesian test proposed in Section 2.1 and the modified Nel and Van der Merwe test proposed by Krishnamoorthy and Yu (2004).

Under $H_0$, the type I error rates were calculated for each test as the proportion of times that the true null hypothesis was rejected (out of a total of $M$ Monte Carlo simulations) by

$$\text{type I error rate} = \frac{1}{M}\sum_{j=1}^{M} I(q_{cj} \ge q_{\alpha j}), \tag{9}$$

where $I(\cdot)$ is the indicator function that returns 1 if the inequality is true and 0 otherwise, $q_{cj}$ is the test statistic obtained under $H_0$ and $q_{\alpha j}$ is the upper $100\alpha\%$ quantile from the $j$-th simulation.

Exact binomial tests were applied to each type I error rate, at a nominal significance level of 1%, for the hypotheses $H_0: \alpha = \alpha_0$ versus $H_1: \alpha \neq \alpha_0$, where the chosen values of $\alpha_0$ were 10%, 5% and 1%. If the null hypothesis is rejected and the observed type I error rate is significantly (p-value < 0.01) lower than the nominal significance level, the test is considered conservative. If the null hypothesis is rejected and the observed type I error rate is significantly (p-value < 0.01) higher than the nominal significance level, the test is considered liberal. If the observed type I error rate is not significantly (p-value > 0.01) different from the nominal significance level, the test is considered accurate.

Under the alternative hypothesis $H_1$, the simulations were performed to evaluate the power of the test, i.e., the proportion of times that the false null hypothesis was rejected (out of a total of $M$ Monte Carlo simulations), computed by

$$\text{power} = \frac{1}{M}\sum_{j=1}^{M} I(q_{cj} \ge q_{\alpha j}),$$

where $q_{cj}$ is the test statistic value obtained under $H_1$ and $q_{\alpha j}$ is the upper $100\alpha\%$ quantile from the $j$-th simulation.

Several configurations were considered in the Monte Carlo simulations, varying the number of variates, the sample sizes, the degree of covariance heterogeneity and the difference between the mean vectors. The numbers of variates considered were $p = 2$ and $5$. Without loss of generality, $\rho = 0.5$ and $\sigma^2 = 1$ were fixed, since the tests are invariant to nonsingular transformations. The sample sizes were $n_1, n_2 = 8, 15, 20, 30$ and $100$, restricted to $n_1 > p$ and $n_2 > p$, in different combinations of $n_1$ and $n_2$.
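The population parameters of one simulation configuration, Eq. (8) together with the shift of $k$ standard errors, can be sketched as follows. The function names `equicorrelation` and `design` are hypothetical, and the sketch only builds $\mu_1$, $\mu_2$, $\Sigma_1$ and $\Sigma_2$; drawing the multivariate normal samples themselves is not shown.

```python
import math


def equicorrelation(p, rho, sigma2=1.0):
    """sigma^2 times the p x p matrix with ones on the diagonal and rho elsewhere."""
    return [[sigma2 * (1.0 if a == b else rho) for b in range(p)] for a in range(p)]


def design(p, gamma, k, n1, n2, rho=0.5, sigma2=1.0):
    """Return (mu1, mu2, Sigma1, Sigma2) for one configuration of the study."""
    cov1 = equicorrelation(p, rho, sigma2)
    gamma_p = gamma ** (1.0 / p)  # p-th root, so that |Sigma2| / |Sigma1| = gamma
    cov2 = [[gamma_p * x for x in row] for row in cov1]  # Eq. (8)
    mu1 = [0.0] * p
    # mu2 = mu1 + k * sqrt(diag(Sigma1 / n1 + Sigma2 / n2)); k = 0 reproduces H0
    mu2 = [mu1[l] + k * math.sqrt(cov1[l][l] / n1 + cov2[l][l] / n2)
           for l in range(p)]
    return mu1, mu2, cov1, cov2


mu1, mu2, cov1, cov2 = design(p=2, gamma=8.0, k=2, n1=8, n2=30)
print(mu2)
```

Setting `k = 0` gives the configuration used for the type I error rates, while `k = 2, 4, 8, 16` gives the configurations used for power.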


3. Results

The results of the simulations under $H_0$ and $H_1$ are presented below for different numbers of variates $p$, in terms of type I error rates and power. Nominal significance levels of 10%, 5% and 1%, different degrees of population covariance heterogeneity and different sample sizes were considered. Since the results for the nominal significance level of 1% were very similar to those for 10% and 5%, they are not shown.

3.1. Type I error rates

Table 1 shows the type I error rates of the proposed Bayesian test (BT) and of the modified Nel and Van der Merwe test (NVMT) as a function of $n_1$, $n_2$ and $\gamma$, under $H_0$, for $p = 2$ and $\alpha = 10\%$. For equal sample sizes, $n_1 = n_2$, the two tests controlled the type I error rates for all degrees of covariance heterogeneity, i.e., both showed error rates close to the nominal significance level for all $\gamma$ values. For different sample sizes, $n_1 \neq n_2$, the BT was conservative in some circumstances, mainly when the larger sample size was associated with the smaller population covariance matrix ($n_1 = 30$, $n_2 = 8$). In this circumstance the type I error rates were lower than the nominal significance level of 10% for all $\gamma$ values. Note that a smaller covariance matrix corresponds to a smaller generalized variance $|\Sigma_i|$, $i = 1, 2$. The NVMT controlled the type I error rates in all circumstances, with test sizes not significantly different from the nominal significance level.

Table 1
Type I error rates of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests as a function of the sample sizes $n_1$ and $n_2$ and the degree of covariance heterogeneity ($\gamma$), considering $p = 2$ variates and nominal significance level $\alpha = 10\%$ under $H_0$.

Sizes  n1 = 8, n2 = 8        n1 = 15, n2 = 15      n1 = 20, n2 = 20
γ      BT         NVMT       BT         NVMT       BT         NVMT
1      0.0925     0.1035     0.0885     0.0950     0.0945     0.1045
2      0.0870     0.0980     0.1025     0.1075     0.0995     0.1055
8      0.0945     0.0945     0.1085     0.1130     0.1035     0.1085
16     0.0940     0.0965     0.1025     0.1030     0.0960     0.0925
32     0.1105     0.1075     0.1015     0.0990     0.1040     0.1030

Sizes  n1 = 30, n2 = 30      n1 = 100, n2 = 100    n1 = 8, n2 = 15
γ      BT         NVMT       BT         NVMT       BT         NVMT
1      0.0980     0.1050     0.0925     0.0945     0.0970     0.1170
2      0.0910     0.0930     0.1130     0.1145     0.0860     0.1030
8      0.0955     0.0970     0.1105     0.1105     0.0720*    0.0910
16     0.0970     0.0960     0.0950     0.0930     0.0940     0.1035
32     0.0890     0.0870     0.1075     0.1075     0.0895     0.0975

Sizes  n1 = 15, n2 = 8       n1 = 8, n2 = 30       n1 = 30, n2 = 8
γ      BT         NVMT       BT         NVMT       BT         NVMT
1      0.0845     0.0970     0.0730*    0.1035     0.0740*    0.0965
2      0.0860     0.1045     0.0725*    0.1065     0.0680*    0.1065
8      0.0790*    0.0975     0.0780*    0.1005     0.0675*    0.1085
16     0.0920     0.1095     0.0850     0.1100     0.0700*    0.0975
32     0.0940     0.1045     0.0840     0.1015     0.0790*    0.0970

* Significantly (p-value < 0.01) lower than the nominal significance level of $\alpha = 10\%$.

Fig. 1 gives the type I error rates of the Bayesian and the modified Nel and Van der Merwe tests for $p = 2$, $\alpha = 5\%$ and different values of $\gamma$ under $H_0$. With equal sample sizes (Fig. 1a, b and c) the tests controlled the type I error rates, with only one exception ($n_1 = n_2 = 8$ and $\gamma = 2$), where the BT was conservative (Fig. 1a). The most extreme case, $n_1 = 30$ and $n_2 = 8$, is displayed in Fig. 1d, where the BT was conservative and the size of the NVMT was not significantly (p-value > 0.01) different from the nominal significance level for all degrees of covariance heterogeneity $\gamma$.

Fig. 1. Type I error rates of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests considering $p = 2$, different degrees of covariance heterogeneity $\gamma$, and $\alpha = 5\%$ under $H_0$. Panels: (a) $n_1 = n_2 = 8$; (b) $n_1 = n_2 = 30$; (c) $n_1 = n_2 = 100$; (d) $n_1 = 30$ and $n_2 = 8$. The dotted lines represent the limits beyond which the hypothesis $H_0: \alpha = 5\%$ would be rejected, considering a 99% confidence coefficient.

Table 2 shows the type I error rates of the proposed Bayesian and the modified Nel and Van der Merwe tests as a function of $n_1$, $n_2$ and $\gamma$ under $H_0$ for $p = 5$ and $\alpha = 10\%$. For equal sample sizes both tests had type I error rates close to the nominal significance level, although the Bayesian test was liberal in some circumstances and conservative only with small sample sizes ($n_1 = n_2 = 8$) under covariance homogeneity ($\gamma = 1$). For different sample sizes, however, the conservative behavior of the BT observed for $p = 2$ and $\alpha = 10\%$ (Table 1) was more intense; it was accentuated as the difference between $n_1$ and $n_2$ increased and was most evident for $n_1 = 30$ and $n_2 = 8$. In this case, the NVMT, which was liberal only in some circumstances with different sample sizes, became liberal for all degrees of covariance heterogeneity $\gamma$.

In cases of large differences between the sample sizes ($n_1 = 8$, $n_2 = 30$ and $n_1 = 30$, $n_2 = 8$) the NVMT was, in general, liberal even with homogeneous population covariances. When the larger covariance matrix (population 2) was associated with the smaller sample size ($n_2 = 8$), the type I error rates tended to increase with the degree of covariance heterogeneity, exceeding the nominal significance level of 10% by up to 3.95 percentage points. In all these circumstances the BT showed the opposite behavior to the NVMT, being more conservative. In no case with $n_1 \neq n_2$ was the new test liberal.

The type I error rates for different degrees of covariance heterogeneity $\gamma$ and sample sizes $n_1$ and $n_2$ with $p = 5$ and $\alpha = 5\%$ showed, for both tests, a response pattern similar to that for $\alpha = 10\%$ (results not shown). The only difference is that the NVMT was more conservative for the smaller equal sample sizes (8 and 15) combined with smaller degrees of covariance heterogeneity $\gamma$. The conservative behavior of the Bayesian test for different sample sizes and the liberal behavior of the modified Nel and Van der Merwe test for $n_1 = 30$ and $n_2 = 8$ remained.

The BT behaved in the same way for the different nominal significance levels $\alpha$, although it was slightly more conservative with $p = 5$ variates than with $p = 2$. The NVMT likewise behaved differently as the number of variates changed. For $p = 2$, the size of the test was not significantly (p-value > 0.01) different from the nominal level, but when $p$ increased from 2 to 5 it became liberal in circumstances with very different sample sizes and with the larger covariance matrix associated with the smaller sample size ($n_1 = 30$ and $n_2 = 8$). This was an advantage of the Bayesian test, whose type I error rates stayed below the nominal level even with very different sample sizes, while the NVMT was liberal in such cases. Krishnamoorthy and Yu (2004), evaluating the original and modified Nel and Van der Merwe tests, observed that the modified version always controlled the type I error rates. However, although the authors simulated circumstances with different sample sizes, they did not assign the larger covariance matrix to the smaller sample size. This case showed the most liberal behavior, especially for the higher value of $p$.
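The labels conservative, liberal and accurate used in this section rest on the exact binomial test described in Section 2.2. A minimal Python sketch of that classification is given below. The function names are hypothetical, and the two-sided p-value is taken here as twice the smaller tail probability, which is an assumption: the text does not state which two-sided convention was used, so borderline entries may be classified differently from the published tables.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass function, computed on the log scale for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def classify(rate, alpha0, n_sim=2000, level=0.01):
    """Label an observed type I error rate against the nominal level alpha0."""
    x = round(rate * n_sim)  # number of rejections among the n_sim simulations
    lower = sum(binom_pmf(k, n_sim, alpha0) for k in range(0, x + 1))      # P(X <= x)
    upper = sum(binom_pmf(k, n_sim, alpha0) for k in range(x, n_sim + 1))  # P(X >= x)
    p_value = min(1.0, 2.0 * min(lower, upper))  # assumed two-sided convention
    if p_value >= level:
        return "accurate"
    return "conservative" if lower < upper else "liberal"


for rate in (0.0720, 0.1035, 0.1395):
    print(rate, classify(rate, alpha0=0.10))
```

The three rates in the usage lines are taken from Tables 1 and 2 and lie far from the decision boundary, so their classification does not depend on the two-sided convention chosen.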

3.2. Power

Fig. 2 shows the powers of the BT and the NVMT for p = 2 as a function of the degrees of covariance heterogeneity γ and of the sample sizes n1 and n2, with the difference between mean vectors set at k = 2 standard errors, considering α = 10%. For samples of equal and moderate sizes (n1 = n2 = 30) or large sizes (n1 = n2 = 100), the BT and the NVMT had

P.S. Ramos, D.F. Ferreira / Computational Statistics and Data Analysis 54 (2010) 1622–1633 1629

Table 2. Type I error rates of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests as a function of the sample sizes n1 and n2 and degrees of covariance heterogeneity (γ), considering p = 5 variates and nominal significance level α = 10% under H0.

      n1 = 8, n2 = 8        n1 = 15, n2 = 15      n1 = 20, n2 = 20
γ     BT         NVMT       BT         NVMT       BT         NVMT
1     0.0830*    0.0780*    0.1090     0.0840     0.1135     0.0890
2     0.0865     0.0845     0.1195**   0.1015     0.1175     0.0990
8     0.0990     0.0895     0.1300**   0.1045     0.1110     0.0850
16    0.1100     0.1000     0.1165     0.0885     0.1245**   0.0955
32    0.0975     0.0890     0.1255**   0.0915     0.1180     0.0955

      n1 = 30, n2 = 30      n1 = 100, n2 = 100    n1 = 8, n2 = 15
γ     BT         NVMT       BT         NVMT       BT         NVMT
1     0.1140     0.1030     0.1070     0.1040     0.0560*    0.1020
2     0.1055     0.0930     0.0915     0.0905     0.0540*    0.0925
8     0.1060     0.0965     0.1065     0.0980     0.0575*    0.0915
16    0.1095     0.0920     0.0935     0.0905     0.0695*    0.0975
32    0.1095     0.0930     0.1040     0.1010     0.0670*    0.1000

      n1 = 15, n2 = 8       n1 = 8, n2 = 30       n1 = 30, n2 = 8
γ     BT         NVMT       BT         NVMT       BT         NVMT
1     0.0515*    0.1060     0.0190*    0.1290**   0.0185*    0.1200**
2     0.0620*    0.1120     0.0280*    0.1105     0.0170*    0.1270**
8     0.0540*    0.1045     0.0245*    0.1230**   0.0215*    0.1295**
16    0.0605*    0.1250**   0.0220*    0.1095     0.0165*    0.1395**
32    0.0610*    0.1175     0.0255*    0.1100     0.0205*    0.1365**

* Significantly (p < 0.01) lower than the nominal significance level of α = 10%.
** Significantly (p < 0.01) greater than the nominal significance level of α = 10%.

almost equal and indistinguishable powers (Fig. 2c and d). In such cases, the degree of covariance heterogeneity had practically no effect on the power of either test, since the power curves were almost parallel to the abscissa. The power ranged from 60% to 65%: for n1 = n2 = 100 the powers were close to 65%, and for n1 = n2 = 30 they were close to 60%.

In the case of different sample sizes, n1 = 8 and n2 = 30 (Fig. 2a), where the smaller sample is associated with the population of lower generalized variance or lower covariance, the BT showed lower power than the NVMT for all degrees of covariance heterogeneity. The difference was almost constant in all cases, oscillating around 5 percentage points. It should be noted that the BT was conservative in the control of type I error rates in these circumstances for large degrees of covariance heterogeneity, considering p = 2. In the other case of different sample sizes, n1 = 30 and n2 = 8 (Fig. 2b), there was a small reduction of power compared to the previous one, which is in agreement with the comment made by Kim (1992) that the power of some tests is affected when the larger covariance matrix is associated with the smaller sample size. The power of the BT was superior to that of the NVMT. Only for the homogeneous (γ = 1) and low heterogeneity (γ = 2) cases was the BT power lower than that of the NVMT. In other cases, the powers were equal. This result is unexpected because this was the circumstance in which the BT was more conservative and the NVMT presented exact test sizes, except for the Monte Carlo error.

With the same configurations illustrated in Fig. 2, but considering k = 4 standard errors of difference between mean vectors, the powers of both tests were almost equal to or greater than 97% for all degrees of covariance heterogeneity γ (results not shown). When the difference increased to k = 8, all powers were 100% (results not shown). In this case, the tests had equal and optimal performance, with power of 100%, confirming what Subrahmaniam and Subrahmaniam (1973) pointed out, namely that for large differences between the mean vectors the power tends to 100%. Beyond this level of difference between the population means (k > 8), the powers of the tests were always equal to 100%, with p = 2 and α = 10%.

Since the power results for α = 5% are similar to those of α = 10%, only a few cases were illustrated in Fig. 3. Homogeneous (γ = 1) and large heterogeneity (γ = 32) circumstances were chosen, where the powers were expressed as a function of the difference k between the population mean vectors for p = 2. For samples of equal and moderate sizes (n1 = n2 = 30) or large sizes (n1 = n2 = 100), the tests showed the same power performance, regardless of the covariance heterogeneity and the value of k (Fig. 3c and d).

For different sample sizes, n1 = 8 and n2 = 30 (Fig. 3a), the BT was less powerful than the NVMT for k ≤ 4. The differences were around 5 percentage points for k = 2 and 1 percentage point for k = 4. For k ≥ 8, the powers of the tests were equivalent and equal to 100%. The degree of covariance heterogeneity caused small increases in the powers of the tests, which is surprising because the opposite result was expected (Fig. 3a). A possible explanation is that the larger sample was associated with the higher population covariance. This feature reveals an aspect of robustness of both tests, although the differences in power were very small. In the case of n1 = 30 and n2 = 8 (Fig. 3b), differences in power between the tests were small and almost negligible in all cases, considering k and γ.

For equal and moderate sample sizes, n1 = n2 = 30, considering k = 2, p = 5 and α = 10%, the BT power was superior to that of the NVMT, and for n1 = n2 = 100 the differences between the powers of the two tests were negligible (results not shown). For different sample sizes the BT was less powerful than the NVMT (results not shown) and the differences were much greater than those observed for p = 2 (Fig. 2a and b). It is important to emphasize that in these cases under H0 the NVMT was almost always liberal while the BT was conservative (Table 2). The same pattern of behavior as that of the NVMT in these circumstances was observed by Subrahmaniam and Subrahmaniam (1973) and Christensen and Rencher (1997) when the purpose was to evaluate the solution proposed by James (1954). They observed high power but also high type I error rates, especially when the larger covariance was associated with the smaller sample size and for large p. Cirillo and Ferreira (2003) observed a similar response pattern for Johansen's solution (Johansen, 1980). This solution showed high power with α = 5% and p = 5 but did not control the type I error rates when the two sample sizes were different (n1 = 10 and n2 = 15). Thus, the advantage of the NVMT in power should not be considered, because in real circumstances the researcher does not know whether the sample is under H0 or H1. Therefore, a rejection of the null hypothesis of equality of mean vectors is more likely to be attributed to type I error if the null hypothesis is true, since this test is liberal. The probability of this error is actually larger than the nominal significance level α, and this did not occur with the BT.

Fig. 2. Power of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests considering p = 2, different degrees of covariance heterogeneity γ, k = 2, and α = 10%, under H1. Panels: (a) n1 = 8 and n2 = 30; (b) n1 = 30 and n2 = 8; (c) n1 = n2 = 30; (d) n1 = n2 = 100.

Some circumstances were simulated and are discussed below, but the results are not shown. To illustrate two of them, the cases of k = 4 and α = 10%, for p = 2 and p = 5, were used. Considering p = 2, regardless of sample size and degree of covariance heterogeneity, the power values of both tests were almost equal and very close to 100%. For p = 5 and equal sample sizes the tests showed the same behavior observed for p = 2. The degree of covariance heterogeneity had practically no effect on the performance of the tests, showing lines parallel to the abscissa. For different sample sizes, the differences between the performances of the tests were smaller than in the case of k = 2. The power of the two tests decreased when the number of variates increased, and the BT was more influenced than the NVMT, confirming the observation made by Subrahmaniam and Subrahmaniam (1973). They asserted that the power of tests decreases with the increase of p for a fixed difference between mean vectors. Another circumstance (not shown) was simulated for p = 5 and k = 8 for all sample sizes and degrees of covariance heterogeneity. The powers of both tests were equal to 100%. The other circumstances considering k > 8 also showed power equal to 100% for both tests.
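Power differences of a few percentage points are weighed in this section against the Monte Carlo error. As a rough aid for reading such differences, the sketch below computes the binomial standard error of a simulated rejection rate and an approximate 99% interval. It assumes N = 2000 replicates and a normal approximation, neither of which is stated in this section, and the function names are hypothetical. Both tests were presumably applied to the same simulated samples, so the interval describes the precision of each estimate and is not a formal test of the difference between them.

```python
import math


def mc_standard_error(rate, n_sim):
    """Binomial standard error of a rejection rate estimated from n_sim replicates."""
    return math.sqrt(rate * (1.0 - rate) / n_sim)


def mc_interval(rate, n_sim, z=2.5758):
    """Approximate interval for the true rate; z = 2.5758 corresponds to 99%."""
    half_width = z * mc_standard_error(rate, n_sim)
    return max(0.0, rate - half_width), min(1.0, rate + half_width)


N_SIM = 2000  # ASSUMPTION: number of replicates, not stated in this section

for power in (0.60, 0.65, 0.97):
    low, high = mc_interval(power, N_SIM)
    print(f"power={power:.2f}  se={mc_standard_error(power, N_SIM):.4f}  "
          f"99% interval=({low:.4f}, {high:.4f})")
```

Under these assumptions an estimated power of 60% has a standard error of about 1.1 percentage points, so the 60% and 65% levels reported for n1 = n2 = 30 and n1 = n2 = 100 are distinguishable, while differences near 1 percentage point are within simulation noise.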


Fig. 3. Power of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests considering p = 2, different values of k, γ = 1, γ = 32, and α = 5%, under H1. Panels: (a) n1 = 8 and n2 = 30; (b) n1 = 30 and n2 = 8; (c) n1 = n2 = 30; (d) n1 = n2 = 100.

Fig. 4 gives the power of the Bayesian and the modified Nel and Van der Merwe tests for p = 5 and α = 5%, considering γ = 1 and γ = 32 and different k values, including the case of k = 0, which shows information concerning type I error rates. Comparing these powers with those of Fig. 3, corresponding to the same settings but with p = 2, it was observed that for different sample sizes (Fig. 4a and b) the differences between the two tests were accentuated and the BT showed lower values than those of the NVMT for k = 2 and k = 4. Unlike what is illustrated in Fig. 3a and b, where the heterogeneity had the effect of increasing the power of the tests, in this case (Fig. 4) the heterogeneity had no appreciable effect on the BT performance. However, for k = 2, n1 = 30 and n2 = 8 (Fig. 4b), in the covariance homogeneity circumstance (γ = 1) the NVMT showed the highest power, which was expected beforehand. But this result should be considered cautiously, since for k = 0 the type I error rates were higher than the nominal significance level. For differences between mean vectors greater than 8 standard errors the two tests showed power equal to 100%. For equal sample sizes (Fig. 4c and d) the BT and the NVMT showed the same power performance with no influence of the covariance heterogeneity, being influenced only by the value of k. This feature demonstrates the robustness of these tests in dealing with severe cases of the Behrens–Fisher problem, since the lines for γ = 1 and for γ = 32 are almost indistinguishable.

In general the power of both tests decreased when the number of variates increased from 2 to 5. This effect was more pronounced on the BT and for different sample sizes. The covariance heterogeneity had practically no effect on the powers of the tests. With differences between mean vectors equal to or greater than 4 standard errors, the powers were very high for equal sample sizes for all p (values above 97%) and for different sample sizes with p = 2 (values above 90%), and only moderate for different sample sizes with p = 5 (values above 65%). For all cases the power was lower when the larger sample was associated with the lower covariance matrix (n1 = 30 and n2 = 8).

Some combinations of sample sizes were simulated, (n1, n2) = (15, 15), (8, 15) and (15, 8), but these results were not fully displayed because their behavior was very similar to that of the circumstances with n1 and n2 given by (30, 30), (8, 30) and (30, 8). However, as the differences between the sample sizes for populations 1 and 2 decreased (cases not shown) the powers increased, indicating that the more balanced the sample sizes (n1 close to n2), the better and more similar the performance of the tests.


Fig. 4. Power of the Bayesian (BT) and the modified Nel and Van der Merwe (NVMT) tests considering p = 5, different values of k, γ = 1, γ = 32, and α = 5%, under H1. Panels: (a) n1 = 8 and n2 = 30; (b) n1 = 30 and n2 = 8; (c) n1 = n2 = 30; (d) n1 = n2 = 100.

The Bayesian test showed a response pattern similar to that observed by Christensen and Rencher (1997) when evaluating the solutions of Kim and the original Nel and Van der Merwe to the multivariate Behrens–Fisher problem, because they had high power and were conservative. Since the BT showed high power, was conservative for adverse cases (different sample sizes and the largest sample associated with lower covariance) and controlled the type I error rates in most of the circumstances, even with high heterogeneity, we recommend it. Moreover, in several cases under H0, the NVMT was liberal, but showed type I error rates very close to the nominal significance levels. In these cases, but under H1, the NVMT power was higher than that of the BT. This advantage is not real because, under H0, there was no control of type I error rates. In these conditions of highly heterogeneous sample size pairings, the researcher should consider the possibility of using the BT, since it is extremely conservative in type I error rate control and it does have competitive power.

The proposed Bayesian test is recommended due to its high control of type I error rates in most simulated cases, especially for use in balanced cases because, in such circumstances, its power was at least equal to that of the NVMT. New evaluations considering different hyperparameter choices can be performed in future works when the purpose is to improve the BT performance for unbalanced cases (n1 ≠ n2).

4. Conclusions

The results of this work lead us to the following conclusions: the Bayesian test (BT) has been successfully developed and it is competitive with the best frequentist solution. In general, the BT was conservative for samples of different sizes and liberal in some circumstances of equal and small sample sizes. When the test was liberal, the difference between the empirical type I error rate and the nominal significance level was practically negligible. The NVMT was liberal in unbalanced cases when the number of variates was high (p = 5). The power performance of the BT was equal to or greater than that of the NVMT in large samples and/or in balanced circumstances. The new solution in some circumstances surpasses its main competitor. Thus the BT is recommended for real cases due to its competitive advantages.


Acknowledgements

The authors thank CNPq and FAPEMIG – Fundação de Amparo à Pesquisa do Estado de Minas Gerais – for the financial support. The authors would also like to thank the anonymous referees for their valuable contributions, which helped to improve the quality of this article. The first author's work was supported by FAPEMIG. The second author's work was supported by CNPq.

References

Behrens, W.U., 1929. A contribution to error estimation with few observations. Landwirtschaftliche Jahrbücher 68, 807–837.
Buckley, J., 2004. Simple Bayesian inference for qualitative political research. Political Analysis 12, 386–399.
Christensen, W.F., Rencher, A.C., 1997. A comparison of type I error rates and power levels for seven solutions to the multivariate Behrens–Fisher problem. Communications in Statistics - Simula 26 (4), 1251–1273.
Cirillo, M.A., Ferreira, D.F., 2003. Extensão do teste para normalidade univariado baseado no coeficiente de correlação quantil–quantil para o caso multivariado. Revista de Matemática e Estatística 21 (3), 57–75.
Ferreira, D.F., 2008. Estatística Multivariada. UFLA, Lavras, 662p.
Fisher, R.A., 1936. The fiducial argument in statistical inference. Annals of Eugenics 6, 391–398.
James, G.S., 1954. Tests of linear hypotheses in univariate and multivariate analysis when the ratios of the population variances are unknown. Biometrika 41, 19–43.
Jeffreys, H., 1940. Note on the Behrens–Fisher formula. Annals of Eugenics 10, 48–51.
Johansen, S., 1980. The Welch–James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika 67 (1), 85–92.
Johnson, R.A., Weerahandi, S., 1988. A Bayesian solution to the multivariate Behrens–Fisher problem. Journal of the American Statistical Association 83 (401), 145–149.
Kim, S., 1992. A practical solution to the multivariate Behrens–Fisher problem. Biometrika 79 (1), 171–176.
Krishnamoorthy, K., Yu, J., 2004. Modified Nel and Van der Merwe test for the multivariate Behrens–Fisher problem. Statistics and Probability Letters 66, 161–169.
Lix, L.M., Keselman, H.J., Hinds, A.M., 2005. Robust tests for the multivariate Behrens–Fisher problem. Computer Methods and Programs in Biomedicine 77, 129–139.
Nel, D.G., Merwe, C.A. Van der, 1986. A solution to the multivariate Behrens–Fisher problem. Communication in Statistics - Theory and Methods 15, 3719–3735.
Olson, C.L., 1974. Comparative robustness of six tests in multivariate analysis of variance. Journal of the American Statistical Association 69 (348), 894–908.
R Development Core Team, 2009. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria, available at: http://www.r-project.org. Accessed: 15 Feb. 2009.
Scheffé, H., 1970. Practical solutions of the Behrens–Fisher problem. Journal of the American Statistical Association 65 (332), 1501–1508.
Subrahmaniam, K., Subrahmaniam, K., 1973. On the multivariate Behrens–Fisher problem. Biometrika 60 (1), 107–111.
Welch, B.L., 1947. The generalisation of student’s problem when several different population variances are involved. Biometrika 6, 28–35.
Yao, Y., 1965. An approximate degrees of freedom solution to the Behrens–Fisher problem. Biometrika 52 (1/2), 139–147.

