
Optimal Shrinkage Estimation of Variances

with Applications to Microarray Data Analysis

Tiejun TONG and Yuedong WANG ∗

May 27, 2005

Abstract

Microarray technology allows a scientist to study genome-wide patterns of gene expression. Thousands of individual genes are measured with a relatively small number of replications, which poses challenges to traditional statistical methods. In particular, the gene-specific estimators of variances are not reliable and gene-by-gene tests have low power. In this paper we propose a family of shrinkage estimators for variances raised to a fixed power. We derive optimal shrinkage parameters under both Stein and the squared loss functions. Our results show that the standard sample variance is inadmissible under either loss function. We propose several estimators for the optimal shrinkage parameters and investigate their asymptotic properties under two scenarios: a large number of replications and a large number of genes. We conduct simulations to evaluate the finite sample performance of the data-driven optimal shrinkage estimators and compare them with some existing methods. We construct F-like statistics using these shrinkage variance estimators and apply them to detect differentially expressed genes in a microarray experiment. We also conduct simulations to evaluate the performance of these F-like statistics and compare them with some existing methods.

Key words and phrases: F-like statistic, gene expression data, inadmissibility, James-Stein shrinkage estimator, loss function.

1. INTRODUCTION

The development of microarray technology has revolutionized the study of molecular biology and become a standard tool in genomics research. Instead of working on a gene-by-gene basis, microarray technology allows scientists to view the expression of thousands of genes from an experimental sample simultaneously (Nguyen, Arpat, Wang and Carroll 2002, Leung and

∗Tiejun Tong (email: [email protected]) is a Ph.D. candidate, and Yuedong Wang (email: [email protected]) is Professor, Department of Statistics and Applied Probability, University of California, Santa Barbara, California 93106. This research was supported by NIH Grant R01 GM58533. The authors thank Dr. Xiangqin Cui of the Jackson Laboratory for answering many technical questions about the MAANOVA package.


Cavalieri 2003). Due to the cost, it is common that thousands of genes are measured with a small number of replications (Lonnstedt and Speed 2002, Kendziorski, Newton, Lan and Gould 2003). As a consequence, we are faced with a "large G, small n" paradigm, where G is the total number of genes and n is the number of replications.

The standard gene-specific estimators of variances are unreliable due to the relatively small number of replications. Consequently, commonly used statistical methods such as the t-test or F-test for detecting differentially expressed genes on a gene-by-gene basis have low power (Callow, Dudoit, Gong, Speed and Rubin 2000). On the other hand, the assumption that variances are equal for all genes is unlikely to be true. Thus, tests based on a pooled common variance estimator for all genes are at risk of generating misleading results (Wright and Simon 2003, Cui, Hwang, Qiu, Blades and Churchill 2005).

A number of approaches to improving variance estimation and hypothesis testing have emerged. Kamb and Ramaswami (2001) suggested a simple regression estimation of local variances. Storey and Tibshirani (2003) added a small constant to the gene-specific variance estimators in their SAM t-test in order to stabilize the small variances. Lin, Nadler, Attie and Yandell (2003) proposed a data-adapted robust estimator of array error based on a smoothing spline and standardized local median absolute deviation. Jain, Thatte, Braciale, Ley, O'Connell and Lee (2003) proposed a local-pooled-error estimation procedure which borrows strength from genes in local intensity regions for estimating array error variability. Baldi and Long (2001) proposed a regularized t-test, replacing the usual variance estimator with a Bayesian estimator. Lonnstedt and Speed (2002) proposed an empirical Bayes approach that combines information across genes. Kendziorski et al. (2003) extended the empirical Bayes method using hierarchical gamma-gamma and lognormal-normal models.

Cui and Churchill (2003) compared three variance estimators: the gene-specific estimator, the estimator pooled across genes, and the hybrid estimator given by the average of the gene-specific and pooled estimators. Cui et al. (2005) proposed a James-Stein type shrinkage estimator for variances (referred to as the CHQBC estimator in the remainder of this article). They showed that, compared to some existing tests, the F-test using the James-Stein type variance estimator has the best or nearly the best power for detecting differentially expressed genes over a wide range of situations.

The research so far has concentrated on methodology; little is known about the theoretical properties of the various shrinkage variance estimators. Shrinkage variance estimation has a long history, starting with the amazing inadmissibility result discovered by Stein (1964), where the standard sample variance is improved by a shrinkage estimator using information contained in the sample mean. Much research has been done since then (Maatta and Casella 1990, Kubokawa 1999). Most of this research concerns a single variance (Brown 1968, Brewster and Zidek 1974, Kubokawa 1994) and is not applicable to microarray data analysis, since homogeneity of the variances is unlikely to hold. Some research has been devoted to shrinkage estimation of a covariance matrix (Sinha and Ghosh 1987, Perron 1990, Kubokawa and Srivastava 2003). However, all of these methods require n > G to ensure non-singularity of the sample covariance matrix, and therefore break down for microarray data analysis.


We propose new optimal shrinkage estimators in this paper. Instead of using information in the sample mean (Stein 1964), we borrow information across variances. We will show that the standard sample variance is inadmissible. Therefore, our results extend Stein's theory for multiple means (James and Stein 1961) to multiple variances. An important insight of this paper is that a better variance estimator does not necessarily lead to a more powerful test. Specifically, since the variance appears in the denominator, an F-test using an estimator of the reciprocal of the variance is more powerful than one using the reciprocal of an estimator of the variance (Section 5). We consider optimal shrinkage estimators for the parameter $(\sigma^2_g)^t$, where $\sigma^2_g$ is the true variance associated with gene $g$, $g = 1, \dots, G$, and $t$ is a fixed non-zero power. Most existing research focuses on the special case of estimating $\sigma^2_g$. We will show by simulation that our methods compare favorably to all existing methods.

Our methods and theory are general. Nevertheless, we present our methods in the framework of microarray data analysis. In Section 2, we introduce the CHQBC estimator and propose a modified version. In Section 3, we derive optimal shrinkage estimators for $(\sigma^2_g)^t$ under two common loss functions and show that the optimal shrinkage estimators dominate the standard gene-specific variance estimator. We also propose various estimators for the optimal shrinkage parameters and investigate their asymptotic properties under two scenarios: $n \to \infty$ with fixed $G$ and $G \to \infty$ with fixed $n$. In Section 4, we conduct simulations to evaluate the performance of the optimal shrinkage estimators and compare them with the CHQBC estimator and the modified CHQBC estimator. In Section 5, we construct F-like statistics using our optimal shrinkage estimators to detect differentially expressed genes in a microarray data set, and conduct simulations to evaluate and compare the performance of these F-like statistics. We conclude this article in Section 6 with a brief discussion.

2. CHQBC ESTIMATOR AND ITS MODIFICATION

Let $G$ ($G \ge 3$) be the number of genes, $X_g \sim \sigma^2_g \chi^2_\nu$, with the $X_g$'s mutually independent. The $X_g$'s are usually residual sums of squares and $\nu$ is the degrees of freedom. Consider the following transformation

$$ X'_g = \ln \sigma^2_g + \epsilon'_g, $$

where $X'_g = \ln(X_g/\nu) - m$, $\epsilon'_g = \ln(\chi^2_\nu/\nu) - m$ and $m = E(\ln(\chi^2_\nu/\nu))$. For simplicity, $\chi^2_\nu$ also denotes a random variable following a chi-squared distribution with $\nu$ degrees of freedom. Applying the James-Stein shrinkage method to the $X'_g$ (Lindley 1962) and then transforming back to the original scale, Cui et al. (2005) proposed the following CHQBC estimator

$$ \tilde\sigma^2_g = \Big(\prod_{g=1}^G (X_g/\nu)^{1/G}\Big)\, B \, \exp\left[\left(1 - \frac{(G-3)V}{\sum (\ln X_g - \overline{\ln X_g})^2}\right)_+ \big(\ln X_g - \overline{\ln X_g}\big)\right], \tag{1} $$

where $V = \mathrm{var}(\epsilon'_g)$, $\overline{\ln X_g} = \sum_{g=1}^G \ln(X_g)/G$ and $B = \exp(-m)$.

Let $Z_g = X_g/\nu$, $Z_{pool} = \prod_{g=1}^G Z_g^{1/G}$ and $\alpha_0 = 1 - \big(1 - (G-3)V/\sum(\ln X_g - \overline{\ln X_g})^2\big)_+$. It is easy to check that the CHQBC estimator (1) can be rewritten as

$$ \tilde\sigma^2_g = B\, (Z_{pool})^{\alpha}\, (Z_g)^{1-\alpha} \tag{2} $$

with $\alpha = \alpha_0$. Note that when $\sigma^2_g = \sigma^2$ for all $g$, $E(Z_{pool}) = \sigma^2/B$. That is, $B Z_{pool}$ is an unbiased estimator of $\sigma^2$ when $\sigma^2_g = \sigma^2$ for all $g$. On the other hand, $Z_g$ is an unbiased estimator of $\sigma^2_g$. Therefore, it is reasonable to consider the following combination of the two unbiased estimators

$$ \hat\sigma^2_g = (B Z_{pool})^{\alpha}\, (Z_g)^{1-\alpha}, \qquad 0 \le \alpha \le 1. \tag{3} $$

When necessary, the dependence of $\hat\sigma^2_g$ on $\alpha$ will be expressed explicitly as $\hat\sigma^2_g(\alpha)$. We refer to $\hat\sigma^2_g(\alpha_0)$ as the modified CHQBC estimator. When the variances $\sigma^2_g$ are similar, it is likely that $\alpha_0 \approx 1$ and thus $\hat\sigma^2_g(\alpha_0) \approx \tilde\sigma^2_g(\alpha_0)$. When $\alpha_0 \approx 0$, $\tilde\sigma^2_g(\alpha_0) \approx B Z_g$, which is biased when $B \ne 1$. Simulations in Section 4.1 indicate that the modified CHQBC estimator $\hat\sigma^2_g(\alpha_0)$ always performs better than the original CHQBC estimator $\tilde\sigma^2_g(\alpha_0)$ for estimating $\sigma^2_g$.

The estimator $\hat\sigma^2_g$ has a very simple structure: it borrows information across genes by shrinking each gene-specific variance toward the bias-corrected geometric mean of the variances for all genes. The shrinkage parameter $\alpha_0$ was obtained by applying the James-Stein method to the logarithms of the sample variances, which do not follow a normal distribution (Cui et al. 2005). Therefore, $\alpha_0$ may not be optimal, and the theoretical properties of $\tilde\sigma^2_g$ and $\hat\sigma^2_g$ are unknown.
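For concreteness, the CHQBC estimator (2) and its modification (3) can be computed directly once two standard chi-squared moments are filled in: $m = E\ln(\chi^2_\nu/\nu) = \Psi(\nu/2) + \ln(2/\nu)$ and $V = \mathrm{var}(\ln(\chi^2_\nu/\nu)) = \Psi'(\nu/2)$, the trigamma function. The following Python sketch is ours (the paper reports no code); the function name and interface are hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma

def chqbc_estimators(X, nu):
    """CHQBC estimator (2) and modified version (3), both evaluated at alpha_0.
    X : residual sums of squares, one per gene (X_g ~ sigma_g^2 * chi^2_nu).
    nu: residual degrees of freedom."""
    G = len(X)
    Z = X / nu                                 # gene-specific variance estimates Z_g
    m = digamma(nu / 2.0) + np.log(2.0 / nu)   # m = E ln(chi^2_nu / nu)
    B = np.exp(-m)                             # bias-correction constant B = exp(-m)
    V = polygamma(1, nu / 2.0)                 # V = var(ln(chi^2_nu / nu))
    lnX = np.log(X)
    ss = np.sum((lnX - lnX.mean()) ** 2)
    alpha0 = 1.0 - max(1.0 - (G - 3) * V / ss, 0.0)   # the (.)_+ shrinkage rule
    Zpool = np.exp(np.log(Z).mean())           # geometric mean of the Z_g
    chqbc = B * Zpool ** alpha0 * Z ** (1 - alpha0)        # estimator (2)
    modified = (B * Zpool) ** alpha0 * Z ** (1 - alpha0)   # estimator (3)
    return chqbc, modified, alpha0
```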

3. OPTIMAL SHRINKAGE

We now consider the family of shrinkage estimators $\hat\sigma^2_g$ in (3) with the shrinkage parameter $\alpha$ unfixed. There is no shrinkage when $\alpha = 0$, and all variance estimates are shrunk to the bias-corrected geometric mean when $\alpha = 1$. Our goal is to find the optimal shrinkage parameter $\alpha$ under Stein loss function (James and Stein 1961),

$$ L_1(\hat\sigma^2, \sigma^2) = \hat\sigma^2/\sigma^2 - \ln\big(\hat\sigma^2/\sigma^2\big) - 1, \tag{4} $$

and the squared loss function,

$$ L_2(\hat\sigma^2, \sigma^2) = \big(\hat\sigma^2/\sigma^2 - 1\big)^2. \tag{5} $$

(4) is also called the entropy loss or Kullback-Leibler loss function (Kubokawa 1999). Though more complicated than the squared loss function, Stein loss function has some desirable properties. It is easy to see that $L_1(\sigma^2, \sigma^2) = L_2(\sigma^2, \sigma^2) = 0$; $L_1(\hat\sigma^2, \sigma^2) \to \infty$ and $L_2(\hat\sigma^2, \sigma^2) \to 1$ as $\hat\sigma^2 \to 0$; and $L_1(\hat\sigma^2, \sigma^2) \to \infty$ and $L_2(\hat\sigma^2, \sigma^2) \to \infty$ as $\hat\sigma^2 \to \infty$. This means that Stein loss function penalizes gross underestimation as heavily as gross overestimation, while the squared loss function penalizes gross underestimation less than gross overestimation.

Throughout the remainder of this article, we assume that $Z_g = \sigma^2_g \chi^2_\nu/\nu$, $g = 1, \dots, G$, are independent random variables and $G \ge 2$. As discussed in Section 1, we will derive an optimal shrinkage estimator for $(\sigma^2_g)^t$ for any power $t \ne 0$; $\sigma^2_g$ and $1/\sigma^2_g$ are special cases with $t = 1$ and $t = -1$. We will first generalize the estimator (3) for estimating $(\sigma^2_g)^t$. Define

$$ h_n(t) = \Big(\frac{\nu}{2}\Big)^t \left(\frac{\Gamma(\frac{\nu}{2})}{\Gamma(\frac{\nu}{2} + \frac{t}{n})}\right)^n, \tag{6} $$

where $\Gamma(\cdot)$ is the Gamma function.
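The ratio of Gamma functions in (6), raised to the $n$th power, overflows quickly for even moderate arguments, so a log-scale implementation is natural. A minimal sketch of ours, valid for $t > -\nu/2$:

```python
import numpy as np
from scipy.special import gammaln

def h(n, t, nu):
    """h_n(t) in (6), computed on the log scale; requires t > -nu/2."""
    log_h = t * np.log(nu / 2.0) + n * (gammaln(nu / 2.0) - gammaln(nu / 2.0 + t / n))
    return np.exp(log_h)
```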

Lemma 1. For any non-zero $t > -\nu/2$,
(i) $h_1(t) Z_g^t$ is an unbiased estimator of $(\sigma^2_g)^t$;
(ii) when $\sigma^2_g = \sigma^2$ for all $g$, $h_G(t) Z_{pool}^t$ is an unbiased estimator of $(\sigma^2)^t$.

The proof is straightforward. Note that $h_1(t) Z_g^t$ is the gene-specific estimator of $(\sigma^2_g)^t$. Combining (3) and Lemma 1, we now propose a family of shrinkage estimators for $(\sigma^2_g)^t$,

$$ \hat\sigma^{2t}_g = \big(h_G(t)\, Z_{pool}^t\big)^{\alpha}\, \big(h_1(t)\, Z_g^t\big)^{1-\alpha}, \qquad 0 \le \alpha \le 1. \tag{7} $$

Note that $h_1(1) = 1$ and $h_G(1) \to B$ as $G \to \infty$. Therefore, when $t = 1$ and $G$ is large, (7) reduces to (3). Let $\sigma^2_{pool} = \big(\prod_{g=1}^G \sigma^2_g\big)^{1/G}$, $\sigma^{2t} = (\sigma^{2t}_1, \cdots, \sigma^{2t}_G)$ and $\hat\sigma^{2t} = (\hat\sigma^{2t}_1, \cdots, \hat\sigma^{2t}_G)$.
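The family (7) is then a one-liner on top of the `h` sketch above; `alpha` is the shrinkage weight that the rest of this section chooses optimally:

```python
def shrinkage_estimator(Z, alpha, t, nu):
    """Family (7): shrinkage estimator of (sigma_g^2)^t for every gene."""
    G = len(Z)
    Zpool = np.exp(np.log(Z).mean())       # geometric mean of Z_1, ..., Z_G
    pooled = h(G, t, nu) * Zpool ** t      # unbiased when all sigma_g^2 are equal
    specific = h(1, t, nu) * Z ** t        # gene-specific unbiased estimator
    return pooled ** alpha * specific ** (1 - alpha)
```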

3.1 Optimal Estimator under Stein Loss Function

Under Stein loss function (4), it is easy to check that the average risk per gene is

$$ R_1(\hat\sigma^{2t}, \sigma^{2t}) \triangleq \frac{1}{G}\sum_{g=1}^G E\big(L_1(\hat\sigma^{2t}_g, \sigma^{2t}_g)\big) = \frac{h_G^{\alpha}(t)\, h_1^{1-\alpha}(t)}{h_1^{G-1}\big(\frac{\alpha t}{G}\big)\, h_1\big((1-\alpha+\frac{\alpha}{G})t\big)}\, (\sigma^2_{pool})^{\alpha t} \frac{1}{G}\sum_{g=1}^G (\sigma^2_g)^{-\alpha t} - \ln\big(h_G^{\alpha}(t)\, h_1^{1-\alpha}(t)\big) - t\,\Psi\Big(\frac{\nu}{2}\Big) + t\ln\Big(\frac{\nu}{2}\Big) - 1, \tag{8} $$

where $t > -\nu/2$, $\Psi(t) = \Gamma'(t)/\Gamma(t)$ is the digamma function (Abramowitz and Stegun 1972), and the second equality is derived after some tedious but straightforward algebra using Lemma 1 and the fact that $E\ln(\chi^2_\nu) = \Psi(\nu/2) + \ln(2)$. The optimal $\alpha$ under Stein loss function is then $\alpha^*_1 = \arg\min_{\alpha \in [0,1]} R_1(\hat\sigma^{2t}, \sigma^{2t})$. Denote the optimal estimator under Stein loss function as $\hat\sigma^{2t}_g(\alpha^*_1)$.

In what follows the derivatives $R'_k(\hat\sigma^{2t}, \sigma^{2t})$ and $R''_k(\hat\sigma^{2t}, \sigma^{2t})$ are with respect to $\alpha$, $k = 1, 2$.
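Since (8) is an explicit function of $\alpha$ once the quantity $b(\sigma^2, \alpha t)$ of Section 3.3 is available, $\alpha^*_1$ can be obtained by one-dimensional numerical minimization over $[0,1]$ (unique, by Theorem 1 below). A sketch of ours, reusing `h` from above; the true variances are passed in here, and the data-driven versions of Section 3.3 simply substitute estimates:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import digamma

def stein_risk(alpha, sig2, t, nu):
    """Average Stein-loss risk (8) as a function of alpha."""
    G = len(sig2)
    pool = np.exp(np.log(sig2).mean())                        # sigma^2_pool
    b = pool ** (alpha * t) * np.mean(sig2 ** (-alpha * t))   # b(sigma^2, alpha t)
    num = h(G, t, nu) ** alpha * h(1, t, nu) ** (1 - alpha)
    den = h(1, alpha * t / G, nu) ** (G - 1) * h(1, (1 - alpha + alpha / G) * t, nu)
    return ((num / den) * b - np.log(num)
            - t * digamma(nu / 2.0) + t * np.log(nu / 2.0) - 1.0)

def alpha_star_1(sig2, t, nu):
    """alpha_1^*: the minimizer of (8) over [0, 1]."""
    res = minimize_scalar(stein_risk, bounds=(0.0, 1.0), method="bounded",
                          args=(sig2, t, nu))
    return res.x
```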

We prove in Appendix 1 that

Theorem 1. For any fixed $G$, $\nu$ and non-zero $t > -\nu/2$, $R_1(\hat\sigma^{2t}, \sigma^{2t})$ is a strictly convex function of $\alpha$ on $[0,1]$ satisfying
(i) $R'_1(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=0} < 0$;
(ii) $R'_1(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=1} \ge 0$, where the equality holds if and only if $\sigma^2_g = \sigma^2$ for all $g$;
(iii) $R''_1(\hat\sigma^{2t}, \sigma^{2t}) > 0$ for all $\alpha \in [0,1]$.

An immediate consequence of Theorem 1 is

Corollary 1. For any fixed $G$, $\nu$ and non-zero $t > -\nu/2$, under Stein loss function,
(i) there exists a unique $\alpha^*_1$ in $(0,1]$ which is the solution to $R'_1(\hat\sigma^{2t}, \sigma^{2t}) = 0$;
(ii) $h_1(t) Z_g^t$ is inadmissible for $(\sigma^2_g)^t$ since $\alpha^*_1 > 0$;
(iii) $\alpha^*_1 = 1$ iff $\sigma^2_g = \sigma^2$ for all $g$. Thus, $h_G(t) Z_{pool}^t$ is also inadmissible for $(\sigma^2_g)^t$ unless $\sigma^2_g = \sigma^2$ for all $g$.

In Appendix 2 we prove that

Theorem 2. For any fixed $G$ and non-zero $t > -\nu/2$, as $\nu \to \infty$,
(i) $\alpha^*_1 \to 0$ when the $\sigma^2_g$ are not all the same;
(ii) $R_1(\hat\sigma^{2t}, \sigma^{2t})$ approaches a constant function of $\alpha$ when $\sigma^2_g = \sigma^2$ for all $g$.

Theorem 2 indicates that it is unnecessary to borrow information from other genes when the number of replications for each gene is large.

3.2 Optimal Estimator under the Squared Loss Function

Under the squared loss function (5), it is easy to check that the average risk is

$$ R_2(\hat\sigma^{2t}, \sigma^{2t}) \triangleq \frac{1}{G}\sum_{g=1}^G E\big(L_2(\hat\sigma^{2t}_g, \sigma^{2t}_g)\big) = \frac{h_G^{2\alpha}(t)\, h_1^{2(1-\alpha)}(t)}{h_1^{G-1}\big(\frac{2\alpha t}{G}\big)\, h_1\big(2(1-\alpha+\frac{\alpha}{G})t\big)}\, (\sigma^2_{pool})^{2\alpha t}\frac{1}{G}\sum_{g=1}^G (\sigma^2_g)^{-2\alpha t} - 2\,\frac{h_G^{\alpha}(t)\, h_1^{1-\alpha}(t)}{h_1^{G-1}\big(\frac{\alpha t}{G}\big)\, h_1\big((1-\alpha+\frac{\alpha}{G})t\big)}\, (\sigma^2_{pool})^{\alpha t}\frac{1}{G}\sum_{g=1}^G (\sigma^2_g)^{-\alpha t} + 1, \tag{9} $$

where $t > -\nu/4$. We prove in Appendix 3 that

Theorem 3. For any fixed $G$, $\nu$ and non-zero power $t > -\nu/4$, we have
(i) $R'_2(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=0} < 0$;
(ii) $R'_2(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=1} > 0$.

Theorem 3 implies that the gene-specific estimator, $h_1(t) Z_g^t$, is inadmissible. Contrary to the result under Stein loss function, the pooled estimator, $h_G(t) Z_{pool}^t$, is also inadmissible even when $\sigma^2_g = \sigma^2$ for all $g$. By Theorem 3 and the fact that $R_2(\hat\sigma^{2t}, \sigma^{2t}) \ge 0$, there exists an $\alpha^*_2$ that minimizes $R_2(\hat\sigma^{2t}, \sigma^{2t})$. However, $R_2(\hat\sigma^{2t}, \sigma^{2t})$ is not guaranteed to be a convex function of $\alpha$ on $[0,1]$; therefore, $\alpha^*_2$ may not be unique. A counterexample with very large $\nu$ is provided in Appendix 4. Nevertheless, for small $\nu$, $R_2(\hat\sigma^{2t}, \sigma^{2t})$ was strictly convex in millions of simulations under various situations. Denote the optimal estimator under the squared loss function as $\hat\sigma^{2t}_g(\alpha^*_2)$. In Appendix 4 we also prove that

Theorem 4. For any fixed $G$ and non-zero $t > -\nu/4$, as $\nu \to \infty$,
(i) $\alpha^*_2 \to 0$ when the $\sigma^2_g$ are not all the same;
(ii) $R_2(\hat\sigma^{2t}, \sigma^{2t})$ approaches a constant function of $\alpha$ when $\sigma^2_g = \sigma^2$ for all $g$.

3.3 Estimation of the Optimal Shrinkage Parameters

Both $\alpha^*_1$ and $\alpha^*_2$ depend on the unknown quantity

$$ b(\sigma^2, \eta) = (\sigma^2_{pool})^{\eta}\,\frac{1}{G}\sum_{g=1}^G (\sigma^2_g)^{-\eta}, \tag{10} $$

where $\eta = \alpha t$ or $\eta = 2\alpha t$. A simple estimator of $b(\sigma^2, \eta)$ is $b(Z, \eta)$, where $Z = (Z_1, \dots, Z_G)$. Denote by $\hat\alpha^*_1$ and $\hat\alpha^*_2$ the estimates of $\alpha^*_1$ and $\alpha^*_2$ obtained with $b(\sigma^2, \eta)$ in (8) and (9) replaced by $b(Z, \eta)$. The following theorem shows that $\hat\sigma^{2t}_g(\hat\alpha^*_1)$ and $\hat\sigma^{2t}_g(\hat\alpha^*_2)$ are asymptotically optimal, and that $\hat\alpha^*_1$ and $\hat\alpha^*_2$ are consistent under certain conditions as $\nu \to \infty$.

Theorem 5. Assume that $Z_g \stackrel{a.s.}{\to} \sigma^2_g$ as $\nu \to \infty$, $g = 1, \dots, G$. For any fixed $G$ and non-zero $t$, when $\nu \to \infty$, we have
(i) $b(Z, \alpha t) \stackrel{a.s.}{\to} b(\sigma^2, \alpha t)$ uniformly for $\alpha \in [0,1]$;
(ii) $R_k\big(\hat\sigma^{2t}(\hat\alpha^*_k(\nu)), \sigma^{2t}\big) - R_k\big(\hat\sigma^{2t}(\alpha^*_k(\nu)), \sigma^{2t}\big) \stackrel{a.s.}{\to} 0$, $k = 1, 2$;
(iii) $\hat\alpha^*_1(\nu) \stackrel{a.s.}{\to} 0$ and $\hat\alpha^*_2(\nu) \stackrel{a.s.}{\to} 0$ when the $\sigma^2_g$ are not all the same.

The proof of Theorem 5 is in Appendix 5. For microarray data, $\nu$ is small and $G$ is large. In the following we investigate asymptotic properties as $G \to \infty$. We now consider the $\sigma^2_g$ as random variables and assume that $\sigma^2_g \stackrel{iid}{\sim} F$, $g = 1, \dots, G$. We prove in Appendix 6 that

Lemma 2. For any fixed non-zero $t$ with $\nu > 2t$, $E(\sigma^2_1)^{-t} < \infty$ and $E(\ln(\sigma^2_1)) < \infty$, we have $w(\alpha t)\, b(Z, \alpha t) - b(\sigma^2, \alpha t) \stackrel{a.s.}{\to} 0$ uniformly for $\alpha \in [0,1]$ as $G \to \infty$, where $w(\alpha t) = (\nu/2)^{\alpha t}\, h_1(-\alpha t)\exp\big[-\alpha t\,\Psi(\nu/2)\big]$.

For a fixed $t$, let $H_k(\sigma^2, \alpha, G) = R_k\big(\hat\sigma^{2t}(\alpha), \sigma^{2t}\big)$, and let $H_k(Z, \alpha, G)$ be the function obtained with $b(\sigma^2, k\alpha t)$ in $H_k(\sigma^2, \alpha, G)$ replaced by $w(k\alpha t)\, b(Z, k\alpha t)$, $k = 1, 2$. Denote $\alpha^*_k(G) = \arg\min_{\alpha \in [0,1]} H_k(\sigma^2, \alpha, G)$ and $\bar\alpha^*_k(G) = \arg\min_{\alpha \in [0,1]} H_k(Z, \alpha, G)$.

Theorem 6. For any fixed non-zero $t$,
(i) when $\nu > 2|t|$, $E(\sigma^2_1)^{-t} < \infty$ and $E(\ln(\sigma^2_1)) < \infty$, we have $R_1\big(\hat\sigma^{2t}(\bar\alpha^*_1(G)), \sigma^{2t}\big) - R_1\big(\hat\sigma^{2t}(\alpha^*_1(G)), \sigma^{2t}\big) \stackrel{a.s.}{\to} 0$ and $\bar\alpha^*_1(G) - \alpha^*_1(G) \stackrel{a.s.}{\to} 0$ as $G \to \infty$;
(ii) when $\nu > 4|t|$, $E(\sigma^2_1)^{-2t} < \infty$ and $E(\ln(\sigma^2_1)) < \infty$, we have $R_2\big(\hat\sigma^{2t}(\bar\alpha^*_2(G)), \sigma^{2t}\big) - R_2\big(\hat\sigma^{2t}(\alpha^*_2(G)), \sigma^{2t}\big) \stackrel{a.s.}{\to} 0$ as $G \to \infty$.

The proof of Theorem 6 is in Appendix 6. Note that there is no corresponding consistency result for $\bar\alpha^*_2$ since $\alpha^*_2$ may not be unique. For the special case that $F$ is a Gamma distribution with shape parameter $\gamma$ and scale parameter $\beta$, it is easy to check that $E(\ln(\sigma^2_1)) = \Psi(\gamma) + \ln\beta < \infty$, $E(\sigma^2_1)^{-t} = \beta^{-t}\,\Gamma(\gamma - t)/\Gamma(\gamma) < \infty$ for $\gamma > t$, and $E(\sigma^2_1)^{-2t} = \beta^{-2t}\,\Gamma(\gamma - 2t)/\Gamma(\gamma) < \infty$ for $\gamma > 2t$.

Note that Theorem 6 does not apply for small $\nu$. We propose an alternative two-step procedure: (i) substitute $b(\sigma^2, \eta)$ in (8) and (9) by $b(Z, \eta)$, and compute temporary optimal shrinkage parameters and the corresponding shrinkage estimators, say $\hat\sigma^2_-$; (ii) substitute $b(\sigma^2, \eta)$ in (8) and (9) by $b(\hat\sigma^2_-, \eta)$ to get the final optimal shrinkage parameters $\tilde\alpha^*_1$ and $\tilde\alpha^*_2$. When $t > 0$, since $\sigma^2_g$ appears in the denominator in (8) and (9), extremely small values of $Z_g$ make the estimates of $\alpha^*_1$ and $\alpha^*_2$ unstable. We therefore truncate the smallest 1% of the $Z_g$'s in our procedures; we find that the truncation is unnecessary when $t < 0$. Simulations indicate that $\tilde\alpha^*_1$ and $\tilde\alpha^*_2$ perform better than $\hat\alpha^*_1$, $\hat\alpha^*_2$, $\bar\alpha^*_1$ and $\bar\alpha^*_2$ when $\nu$ is small. Therefore, $\tilde\alpha^*_1$ and $\tilde\alpha^*_2$ will be used in our simulations.
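A sketch of the plug-in and two-step calculations just described, reusing `alpha_star_1` and `shrinkage_estimator` from the earlier sketches. The text does not spell out the scale on which the temporary estimates $\hat\sigma^2_-$ are formed; the choice below (family (7) with $t = 1$) is our assumption:

```python
def two_step_alpha1(Z, t, nu, trim=0.01):
    """Two-step estimate of alpha_1^*: (i) minimize (8) with b(Z, .) plugged in;
    (ii) re-minimize with b evaluated at the step-(i) shrinkage estimates."""
    Z = np.asarray(Z)
    if t > 0:                                  # truncate the smallest 1% of the Z_g
        Z = np.sort(Z)[int(np.ceil(trim * len(Z))):]
    a_tmp = alpha_star_1(Z, t, nu)             # step (i): uses b(Z, alpha t)
    sig2_tmp = shrinkage_estimator(Z, a_tmp, 1, nu)   # temporary estimates (t = 1 assumed)
    return alpha_star_1(sig2_tmp, t, nu)       # step (ii): uses b(sigma^2_-, alpha t)
```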

4. SIMULATIONS AND COMPARISONS

In this section we conduct simulations to compare the performance of our estimators with the CHQBC estimator and the modified CHQBC estimator for the purpose of estimation. All estimators considered in this section perform substantially better than the standard gene-specific estimator; for simplicity, we do not present results for the gene-specific estimator. We evaluate the performance for estimating $\sigma^2_g$ in Section 4.1 and for estimating $(\sigma^2_g)^{-1}$ in Section 4.2. We set $G = 5000$ throughout this section.

4.1 Estimation of $\sigma^2_g$

We consider four estimators in this subsection: $\hat\sigma^2_g(\tilde\alpha^*_1)$, $\hat\sigma^2_g(\tilde\alpha^*_2)$, $\hat\sigma^2_g(\alpha_0)$ and $\tilde\sigma^2_g(\alpha_0)$. We simulate $\sigma^2_g$, $g = 1, \dots, G$, from a Gamma distribution with shape parameter $\gamma$ and scale parameter $\beta$. To evaluate performance under different levels of variance heterogeneity, we consider three shape parameters, $\gamma = 0.25$, $1$ and $4$, which correspond to coefficients of variation ($CV = \sqrt{\gamma\beta^2}/(\gamma\beta) = 1/\sqrt{\gamma}$) of 2, 1 and 0.5 respectively. For each $\sigma^2_g$, we simulate $\nu + 1$ observations from $N(\mu_g, \sigma^2_g)$, where $\mu_g$ is a random sample from $N(0,1)$. We calculate $Z_g$ as the sample variance for each $g$. We use a factorial design consisting of 3 levels for $\gamma$ and 7 levels for $\nu$, $\nu = 1, \dots, 7$; therefore, we have 21 combinations of parameter settings. For each setting, we repeat the simulation 100 times and compute the average risk

$$ AR_k = \frac{1}{100\,G}\sum_{r=1}^{100}\sum_{g=1}^G L_k(\hat\sigma^2_{gr}, \sigma^2_{gr}), \qquad k = 1, 2, $$

where $r$ indexes simulation replications, and $k = 1$ and $k = 2$ correspond to Stein and the squared loss functions respectively. We plot $\ln(AR_k)$ as a function of $\nu$ for all four methods in Figure 1.
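One cell of the $3 \times 7$ factorial design can be simulated as follows; this sketch of ours computes $AR_1$ for the modified CHQBC estimator only (the other estimators are evaluated inside the same loop in the same way), reusing `chqbc_estimators` from Section 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def average_stein_risk(G=5000, gamma=1.0, beta=1.0, nu=4, reps=100):
    """Average Stein-loss risk AR_1 of the modified CHQBC estimator
    over `reps` simulated data sets."""
    total = 0.0
    for _ in range(reps):
        sig2 = rng.gamma(shape=gamma, scale=beta, size=G)   # sigma_g^2 ~ Gamma
        mu = rng.normal(0.0, 1.0, size=G)                   # mu_g ~ N(0, 1)
        y = rng.normal(mu[:, None], np.sqrt(sig2)[:, None], size=(G, nu + 1))
        Z = y.var(axis=1, ddof=1)              # sample variances with nu d.f.
        _, est, _ = chqbc_estimators(nu * Z, nu)            # X_g = nu * Z_g
        r = est / sig2
        total += np.mean(r - np.log(r) - 1.0)  # Stein loss L_1, averaged over genes
    return total / reps
```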

The modified CHQBC estimator $\hat\sigma^2_g(\alpha_0)$ has smaller average risk than the original CHQBC estimator $\tilde\sigma^2_g(\alpha_0)$ in all settings. When the variance heterogeneity is not small ($\gamma = 0.25$ and $\gamma = 1$), the two optimal estimators $\hat\sigma^2_g(\tilde\alpha^*_1)$ and $\hat\sigma^2_g(\tilde\alpha^*_2)$ have similar risks, which are smaller than those of $\hat\sigma^2_g(\alpha_0)$. When $\nu$ and the variance heterogeneity are both small ($\gamma = 4$), $\hat\sigma^2_g(\alpha_0)$ has smaller risk under Stein loss function than the optimal estimators. Overall, the optimal estimator under Stein loss function performs well. Note that the estimates $\tilde\alpha^*_1$ and $\tilde\alpha^*_2$ of the optimal shrinkage parameters do not guarantee optimal performance, especially when $\nu$ is small. Simulations (not shown) indicate that the true $\alpha^*_1$ and $\alpha^*_2$ do guarantee the optimal performance for Stein and the squared loss functions respectively.

To take a closer look at the estimates of the shrinkage parameters, we construct boxplots of the ratios $\tilde\alpha^*_1/\alpha_0$ and $\tilde\alpha^*_2/\alpha_0$ for $\gamma = 1$ in Figure 2. Boxplots of the ratios for $\gamma = 0.25$ and $\gamma = 4$ are similar (not shown). In general, $\tilde\alpha^*_1 > \tilde\alpha^*_2$; $\tilde\alpha^*_1 < \alpha_0$ and $\tilde\alpha^*_2 < \alpha_0$ for small $\nu$; and $\tilde\alpha^*_1 > \alpha_0$ and $\tilde\alpha^*_2 > \alpha_0$ for large $\nu$. Therefore, $\hat\sigma^2_g(\tilde\alpha^*_1)$ shrinks more than $\hat\sigma^2_g(\tilde\alpha^*_2)$, and both $\hat\sigma^2_g(\tilde\alpha^*_1)$ and $\hat\sigma^2_g(\tilde\alpha^*_2)$ shrink less than $\hat\sigma^2_g(\alpha_0)$ and $\tilde\sigma^2_g(\alpha_0)$ when $\nu$ is small. This explains why the optimal shrinkage estimators are inferior to $\hat\sigma^2_g(\alpha_0)$ when the variance heterogeneity and $\nu$ are both small.

4.2 Estimation of $\sigma^{-2}_g$

As discussed in Section 1, it is better to use an estimator of $\sigma^{-2}_g$ directly than the reciprocal of an estimator of $\sigma^2_g$ in the F-test (Section 5). In this subsection we evaluate performance for estimating $\sigma^{-2}_g$.

Simulations (not shown) indicate that $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$ always performs better than $\hat\sigma^{-2}_g(\tilde\alpha^*_2)$. For simplicity, we present results for $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$ only. We consider four estimators: $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$, $\hat\sigma^2_g(\tilde\alpha^*_1)$, $\hat\sigma^2_g(\alpha_0)$ and $\tilde\sigma^2_g(\alpha_0)$, where $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$ is the estimator of $(\sigma^2_g)^t$ with $t = -1$, and we take the reciprocals of the last three as estimators of $\sigma^{-2}_g$. Theorem 1 requires that $\nu \ge 3$ when $t = -1$; therefore, we take $\nu$ from 3 to 9. All other settings are the same as those in Section 4.1.

Figure 1: Plots of the average risk for estimating $\sigma^2_g$ under Stein loss function ($\ln(AR_1)$, left column) and the squared loss function ($\ln(AR_2)$, right column) as functions of $\nu$. The three rows correspond to the three shape parameters $\gamma = 0.25$, $1$ and $4$. The four lines in each plot, marked "1", "2", "3" and "4", correspond to the optimal estimator under Stein loss function $\hat\sigma^2_g(\tilde\alpha^*_1)$, the optimal estimator under the squared loss function $\hat\sigma^2_g(\tilde\alpha^*_2)$, the modified CHQBC estimator $\hat\sigma^2_g(\alpha_0)$ and the CHQBC estimator $\tilde\sigma^2_g(\alpha_0)$ respectively.

Figure 2: Boxplots of the ratios $\tilde\alpha^*_1/\alpha_0$ (left) and $\tilde\alpha^*_2/\alpha_0$ (right) against $\nu$, with $\gamma = 1$.

Figure 3 shows that, under both Stein and the squared loss functions, the risk of $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$ is smaller than the risk of $1/\tilde\sigma^2_g(\alpha_0)$, which is smaller than the risk of $1/\hat\sigma^2_g(\alpha_0)$, which in turn is smaller than the risk of $1/\hat\sigma^2_g(\tilde\alpha^*_1)$. It is interesting to note that $\tilde\sigma^2_g(\alpha_0)$ outperforms $\hat\sigma^2_g(\alpha_0)$ here, which, again, confirms that a better estimator of $\sigma^2_g$ may not lead to a better estimator of $\sigma^{-2}_g$. We have performed many more simulations with different parameters for both Sections 4.1 and 4.2; the comparative results remain the same.

5. APPLICATIONS

Cui et al. (2005) demonstrated that the F-test using the CHQBC estimator has the best or nearly the best power among several "information-sharing" statistics for detecting differentially expressed genes over a wide range of settings. For simplicity, we compare the performance of our shrinkage estimators with the CHQBC estimator and the modified CHQBC estimator only. We introduce F-like statistics using shrinkage estimators in Section 5.1. In Section 5.2, we apply our method to a microarray data set and conduct simulations to evaluate and compare the performance of several F-like statistics.

5.1 F-like statistics

For a fixed gene $g$, $g = 1, \dots, G$, test statistics are usually based on the following general linear mixed effects model (Kerr, Afshari, Bennett, Bushel, Martinez, Walker and Churchill 2002, Cui et al. 2005)

$$ y_g = X_g\beta_g + Z_g b_g + \epsilon_g, \tag{11} $$

where $y_g$ is the vector of all observations for gene $g$, $X_g$ and $Z_g$ are design matrices for the fixed effects $\beta_g$ and the random effects $b_g$ respectively, and $\epsilon_g$ is the vector of random errors.


Figure 3: Plots of the average risk for estimating $(\sigma^2_g)^{-1}$ under Stein loss function (left column) and the squared loss function (right column) as functions of $\nu$. The three rows correspond to the three shape parameters. The four lines in each plot, marked "1", "2", "3" and "4", correspond to $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$, $\hat\sigma^2_g(\tilde\alpha^*_1)$, $\hat\sigma^2_g(\alpha_0)$ and $\tilde\sigma^2_g(\alpha_0)$.

We assume that

$$ \begin{pmatrix} b_g \\ \epsilon_g \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \sigma^2_g\begin{pmatrix} G_g & \\ & R_g \end{pmatrix}\right), $$

where $\sigma^2_g$ is the error variance for gene $g$. Note that the general linear model is a special case with empty $Z_g$ and $b_g$. Denote by $\hat\beta_g$ and $\hat b_g$ the best linear unbiased predictors, with variance-covariance matrix

$$ C_g = \sigma^2_g\begin{pmatrix} X'_g R_g^{-1} X_g & X'_g R_g^{-1} Z_g \\ Z'_g R_g^{-1} X_g & Z'_g R_g^{-1} Z_g + G_g^{-1} \end{pmatrix}^{-} \triangleq \sigma^2_g D_g, $$

where the "$-$" sign represents the generalized inverse of the matrix. The F statistic for testing the hypothesis $H_0: L'_g\beta_g = 0$ is (Littell, Milliken, Stroup and Wolfinger 1996)

$$ F = \frac{\hat\beta'_g L_g\big(L'_g \hat D_g L_g\big)^{-1} L'_g\hat\beta_g / \mathrm{rank}(L_g)}{\hat\sigma^2_g} \triangleq \Delta_g\, \hat\sigma^{-2}_g, \tag{12} $$

where $\hat D_g$ is an estimator of $D_g$ which can be calculated using the restricted maximum likelihood method (Searle, Casella and McCulloch 1992), and $\hat\sigma^{-2}_g$ is an estimator of $\sigma^{-2}_g$.

Different estimators of $\sigma^{-2}_g$ lead to different F-like statistics. The gene-specific F statistic replaces $\sigma^2_g$ by its gene-specific estimator. We will consider four F-like statistics, $F_1$, $F_2$, $F_3$ and $F_4$, with $\hat\sigma^{-2}_g$ in (12) replaced by $\hat\sigma^{-2}_g(\tilde\alpha^*_1)$, $\hat\sigma^{-2}_g(\tilde\alpha^*_2)$, $1/\hat\sigma^2_g(\alpha_0)$ and $1/\tilde\sigma^2_g(\alpha_0)$ respectively. $F_4$ is the same as the $F_s$ statistic in Cui et al. (2005). Since $F_4$ ($F_s$) compares favorably to other F-like statistics, including the gene-specific F statistic (Cui et al. 2005), we will compare our methods to $F_4$ ($F_s$) only. Simulations (not shown) indicate that F-like statistics with $\hat\sigma^{-2}_g$ in (12) replaced by $1/\hat\sigma^2_g(\tilde\alpha^*_1)$ or $1/\hat\sigma^2_g(\tilde\alpha^*_2)$ have performance similar to $F_3$; to save space, these results are not presented. Since the F-like statistics do not follow F distributions, we calculate p-values by permutation as in Cui et al. (2005). We note that the method described here applies to general designs, and usually $X_g$, $Z_g$, $G_g$, $R_g$ and $L_g$ are independent of $g$.

5.2 Case study

Cui et al. (2005) described an experiment that compared two human colon cancer cell lines, CACO2 and HCT116, and three human ovarian cancer cell lines, ES2, MDAH2774 and OV1063. Five samples were arranged in a loop and no reference sample was used. Fluorescent dye labeled cDNA targets were hybridized to cDNA microarrays containing 9600 human cDNA clones. To simplify the analysis, as in Cui et al. (2005), the duplicated spots for the same gene on each array were averaged at the original signal level. Observations were transformed and normalized.

We fit the following model to each gene (Cui et al. 2005):

$$ y_{ij} = \mu + A_i + D_j + S_{k(i,j)} + \epsilon_{ij}, \qquad i = 1, \dots, 10;\ j = 1, 2;\ k = 1, \dots, 5, \tag{13} $$

where $\mu$ is the gene mean, $A_i$ is the array effect, $D_j$ is the dye effect, $S_{k(i,j)}$ is the sample effect, and $\epsilon_{ij}$ is the random error. In general, the terms $\mu$, $D_j$ and $S_{k(i,j)}$ are fixed effects and the $A_i$ are random effects. Cui et al. (2005) demonstrated that the array variance has little impact on the F-tests; therefore, as in Cui et al. (2005), we treat the $A_i$ as fixed effects. All analyses are conducted using the newest version of R/MAANOVA (Wu and Churchill 2005). At a nominal significance level of 0.01, $F_1$, $F_2$, $F_3$ and $F_4$ detected 1877, 1870, 1790 and 1838 significant genes respectively.

To study the false positive and successful detection rates of the four F-like tests, we simulate 100 data sets using the same design as the real data. Each simulated data set contains 1000 constant genes and 1000 differentially expressed genes. Since the successful detection rate of a test depends on the magnitude of the overall treatment effect, $\Theta = \sum_{k=1}^5 (S_k - \bar S)^2/4$, where $S_k$ is defined in (13), we consider $\sqrt{4\Theta}$ as a parameter. Specifically, we generate

$$ S_k = \sqrt{4\Theta}\,\big(Q_k - \bar Q\big)\Big/\sqrt{\textstyle\sum_{k=1}^5\big(Q_k - \bar Q\big)^2}, \qquad Q_k \stackrel{iid}{\sim} N(0,1), $$

with $\sqrt{4\Theta} = 0.1, 0.2, \dots, 1$ representing ten levels of treatment effect. The fixed effects $\mu$ and $D_j$ are drawn randomly from the normal distributions $N(0, 0.65^2)$ and $N(0, 0.35^2)$ respectively, and are held constant across all simulations. For each simulation, $A_i$ and $\epsilon_{ij}$ are drawn randomly from $N(0, 0.6^2)$ and $N(0, \sigma^2_g)$ respectively, where the $\sigma^2_g$ are sampled randomly without replacement from the 9600 estimates of residual variances from the real data. As in Cui et al. (2005), the variability of the residual variances is controlled by a parameter $\tau$ through the formula $\sigma^2_{g,\tau} = \big(\sigma^{2\tau}_g/\sigma^{2\tau}_{pool}\big)\,\sigma^2_{pool}$. We consider three choices of $\tau$, $\tau = 0.5$, $1$ and $2$, which correspond to $CV = 0.63$, $1.84$ and $10.60$ respectively.

            F1      F2      F3      F4
CV = 0.63   0.026   0.027   0.024   0.026
CV = 1.84   0.038   0.038   0.041   0.041
CV = 10.6   0.046   0.045   0.048   0.047

Table 1: Average false positive rates of the four F-like statistics at the significance level 0.05.

The average false positive rates at the significance level 0.05 listed in Table 1 indicate that the type I error is below the nominal value. All four F-like statistics have similar false positive rates. $F_3$ and $F_4$ have similar power in all settings, even though the performances of $\hat\sigma^2_g(\alpha_0)$ and $\tilde\sigma^2_g(\alpha_0)$ are quite different for estimation (Section 4). Improvements of the new F-like tests come with moderate to large treatment effects. For simplicity, in Figure 4 we plot the successful detection rate for $F_1$, $F_2$ and $F_4$ with $0.4 \le \sqrt{4\Theta} \le 1$. $F_1$ outperforms $F_4$ in all situations. $F_1$ performs better than $F_2$ when the heterogeneity of variance is not large ($CV = 0.63$ and $CV = 1.84$); when the heterogeneity of variance is large ($CV = 10.6$), $F_2$ performs better than $F_1$. For microarray data in their typical range of $CV$, we recommend $F_1$.

Figure 4: Plots of the average successful detection rate of three F-like statistics ($F_1$, $F_2$ and $F_4$, marked "1", "2" and "4") against the treatment effect $\sqrt{4\Theta}$, for $\tau = 0.5$ (left), $\tau = 1$ (middle) and $\tau = 2$ (right).

6. DISCUSSION

One major challenge in microarray data analysis is the relatively small number of replications for each gene compared to the large number of genes. In this paper we propose a family of

shrinkage variance estimators that borrow information across genes by shrinking each gene-specific variance estimator toward the bias-corrected geometric mean of the variance estimators for all genes. The amount of optimal shrinkage depends on the variability of the individual variances. We have shown that the standard sample variance is inadmissible under either Stein or the squared loss function. Our optimal shrinkage estimators compare favorably with the CHQBC estimator in terms of both estimation and hypothesis testing. One key is to use an estimator of $\sigma^{-2}_g$ directly in the F statistic instead of the reciprocal of an estimator of $\sigma^2_g$. We note that the optimal shrinkage variance estimators may also be used for other purposes, such as constructing confidence intervals. We recommend the estimator under Stein loss function for microarray data analysis.

On the logarithmic scale, our shrinkage estimator (7) is a weighted average of the gene-specific variance and the bias-corrected geometric mean. One of our future research topics is to consider a weighted average of the gene-specific variance and the arithmetic mean (Baldi and Long 2001). Our method shrinks all gene-specific variances toward a unique common variance. One may also borrow information contained in the sample mean (Stein 1964). Better shrinkage estimators may be constructed when additional information is available; for example, instead of shrinking toward the overall geometric mean, one may borrow information from neighboring genes (Baldi and Long 2001, Jain et al. 2003).

Even though motivated by and applied to microarray data, the optimal shrinkage variance estimators in this paper are general and can have a wide range of applications, such as portfolio optimization (Ledoit and Wolf 2004a, Ledoit and Wolf 2004b). Our methodology and theory extend Stein's landmark results from shrinkage estimation of means to shrinkage estimation of variances (James and Stein 1961), and from shrinkage estimation of a single variance to shrinkage estimation of multiple variances (Stein 1964).

APPENDIX 1: THE PROOF OF THEOREM 1

Lemma 3. (i) For any $t > 0$,

$$ \Psi(t+1) = \Psi(t) + 1/t, \tag{14} $$

$$ \Psi'(t) = \sum_{k=0}^{\infty} (t+k)^{-2} > 0, \tag{15} $$

$$ \Gamma''(t)/\Gamma(t) = \Psi^2(t) + \Psi'(t), \tag{16} $$

where $\Psi(t) = \Gamma'(t)/\Gamma(t)$ and $\Psi'(t)$ are the digamma and trigamma functions;
(ii) for any $t > 0$, $\Gamma(s)/\Gamma(s+t)$ is a strictly decreasing function of $s$ on $(0,\infty)$;
(iii) $h_n(t) > h_1(t)$ for any $n \ge 2$ and any non-zero $t > -\nu/2$;
(iv) $h^2_n(t) > h^n_1(2t/n)$ for any $n \ge 2$ and any non-zero $t > -\nu/2$;
(v) $\lim_{\nu\to\infty} h_n(t) = 1$ for any fixed $n$ and $t$.

[Proof] (i) (14) is the same as #6.3.5 in Abramowitz and Stegun (1972); (15) is a special case of #6.4.10 in Abramowitz and Stegun (1972) with $n = 1$; (16) can be derived directly from the definition of $\Psi(t)$.

(ii) Let $B(s,t) = \int_0^1 x^{s-1}(1-x)^{t-1}\,dx$ be the Beta function. It is obvious that $B(s,t)$ is a strictly decreasing function of $s$ for any fixed $t > 0$. Therefore, $\Gamma(s)/\Gamma(s+t) = B(s,t)/\Gamma(t)$ is a strictly decreasing function of $s$.

(iii) Using Lemma 3 (ii), for any $t > 0$,

$$ h_1(t) = \Big(\frac{\nu}{2}\Big)^t\frac{\Gamma(\frac{\nu}{2})}{\Gamma(\frac{\nu}{2}+t)} = \Big(\frac{\nu}{2}\Big)^t\left(\frac{\Gamma(\frac{\nu}{2})}{\Gamma(\frac{\nu}{2}+\frac{t}{n})}\right)\left(\frac{\Gamma(\frac{\nu}{2}+\frac{t}{n})}{\Gamma(\frac{\nu}{2}+\frac{2t}{n})}\right)\cdots\left(\frac{\Gamma(\frac{\nu}{2}+t-\frac{t}{n})}{\Gamma(\frac{\nu}{2}+t)}\right) < \Big(\frac{\nu}{2}\Big)^t\left(\frac{\Gamma(\frac{\nu}{2})}{\Gamma(\frac{\nu}{2}+\frac{t}{n})}\right)^n = h_n(t). $$

The same result holds for $-\nu/2 < t < 0$ by noting that $\Gamma(s)/\Gamma(s+t)$ is a strictly increasing function of $s$ on $(-t,\infty)$ for $t < 0$.

(iv) Similar arguments as in the proof of (iii) give

$$ \frac{h^2_n(t)}{h^n_1(\frac{2t}{n})} = \left(\frac{\Gamma(\frac{\nu}{2})}{\Gamma(\frac{\nu}{2}+\frac{t}{n})}\Big/\frac{\Gamma(\frac{\nu}{2}+\frac{t}{n})}{\Gamma(\frac{\nu}{2}+\frac{2t}{n})}\right)^n > 1. $$

(v) $\lim_{\nu\to\infty} h_n(t) = 1$ is an immediate result of the fact that $\lim_{t\to\infty} t^{b-a}\,\Gamma(t+a)/\Gamma(t+b) = 1$ for any $a, b \in \mathbb{R}$ (#6.1.46 in Abramowitz and Stegun (1972)).

Lemma 4. For any positive sequence $\{a_i, i = 1, \dots, n\}$, we have

$$ \sum_{i=1}^n \frac{1}{a_i}\left(\frac{1}{n}\sum_{j=1}^n \ln(a_j) - \ln(a_i)\right) \ge 0. $$

[Proof] Without loss of generality, we assume that $0 < a_1 \le \cdots \le a_n$; thus $1/a_1 \ge \cdots \ge 1/a_n$. Let $b_i = \sum_{j=1}^n \ln(a_j)/n - \ln(a_i)$. Since $b_1 \ge \cdots \ge b_n$ and $\sum_{i=1}^n b_i = 0$, there exists an $m \in \{1, \dots, n\}$ such that $b_1 \ge \cdots \ge b_m \ge 0 \ge b_{m+1} \ge \cdots \ge b_n$ and $\sum_{i=1}^m b_i = -\sum_{i=m+1}^n b_i$. Therefore,

$$ \sum_{i=1}^n \frac{1}{a_i}\left(\frac{1}{n}\sum_{j=1}^n \ln(a_j) - \ln(a_i)\right) = \sum_{i=1}^m \frac{b_i}{a_i} + \sum_{i=m+1}^n \frac{b_i}{a_i} \ge \frac{1}{a_m}\sum_{i=1}^m b_i + \frac{1}{a_{m+1}}\sum_{i=m+1}^n b_i = \left(\frac{1}{a_m} - \frac{1}{a_{m+1}}\right)\sum_{i=1}^m b_i \ge 0. $$

Proof of Theorem 1. For simplicity, we define

$$ A(\alpha) = C_1(\alpha)C_2(\alpha) - C_3(\alpha), \tag{17} $$

where

$$ C_1(\alpha) \triangleq \frac{h_G^{\alpha}(t)\, h_1^{1-\alpha}(t)}{h_1^{G-1}\big(\frac{\alpha t}{G}\big)\, h_1\big((1-\alpha+\frac{\alpha}{G})t\big)}, \tag{18} $$

$$ C_2(\alpha) \triangleq (\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G (\sigma^2_g)^{-\alpha t}, \tag{19} $$

$$ C_3(\alpha) \triangleq \ln\big(h_G^{\alpha}(t)\, h_1^{1-\alpha}(t)\big), $$

and $C_2(\alpha)$ is the same as $b(\sigma^2, \alpha t)$. $A(\alpha)$ and $R_1(\hat\sigma^{2t}, \sigma^{2t}(\alpha))$ differ only by a constant independent of $\alpha$. Let

$$ D(\alpha) \triangleq \ln\left(\frac{h_G(t)}{h_1(t)}\right) + \frac{(G-1)t}{G}\left[\Psi\Big(\frac{\nu}{2}+\frac{\alpha t}{G}\Big) - \Psi\Big(\frac{\nu}{2}+(1-\alpha+\frac{\alpha}{G})t\Big)\right]. \tag{20} $$

Then it is not difficult to check that

$$ C'_1(\alpha) = C_1(\alpha)D(\alpha), \qquad C''_1(\alpha) = C_1(\alpha)\big(D^2(\alpha) + D'(\alpha)\big), $$

$$ C'_2(\alpha) = (\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G\left[(\sigma^2_g)^{-\alpha t}\, t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right], \qquad C''_2(\alpha) = (\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G\left[(\sigma^2_g)^{-\alpha t}\, t^2\ln^2\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right], $$

where

$$ D'(\alpha) = \frac{(G-1)t^2}{G^2}\left[\Psi'\Big(\frac{\nu}{2}+\frac{\alpha t}{G}\Big) + (G-1)\Psi'\Big(\frac{\nu}{2}+(1-\alpha+\frac{\alpha}{G})t\Big)\right]. \tag{21} $$

For any $\alpha \in [0,1]$ and non-zero $t > -\nu/2$, $\nu/2 + \alpha t/G > 0$ and $\nu/2 + (1-\alpha+\alpha/G)t > 0$. Thus, by (15), $D'(\alpha) > 0$.

[Proof of (i)]

$$ A'(0) = C'_1(0)C_2(0) + C_1(0)C'_2(0) - C'_3(0) = \ln\left(\frac{h_G(t)}{h_1(t)}\right) + \frac{(G-1)t}{G}\Big(\Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big)\Big) - \ln\left(\frac{h_G(t)}{h_1(t)}\right) = \frac{(G-1)t}{G}\Big(\Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big)\Big) < 0, \tag{22} $$

where the last inequality holds for non-zero $t > -\nu/2$ since $\Psi(\cdot)$ is a strictly increasing function on $(0,\infty)$.

[Proof of (ii)]

$$ A'(1) = \ln\left(\frac{h_G(t)}{h_1(t)}\right)(\sigma^2_{pool})^t\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-t} + (\sigma^2_{pool})^t\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-t}\, t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right) - \ln\left(\frac{h_G(t)}{h_1(t)}\right) \triangleq \ln\big(h_G(t)/h_1(t)\big)\,M_1 + M_2, $$

where $M_1 \triangleq (\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t}/G - 1 \ge (\sigma^2_{pool})^t\big(\prod_{g=1}^G\sigma^2_g\big)^{-t/G} - 1 = 0$, and

$$ M_2 \triangleq (\sigma^2_{pool})^t\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-t}\, t\ln\big(\sigma^2_{pool}/\sigma^2_g\big) = (\sigma^2_{pool})^t\,\frac{1}{G}\sum_{g=1}^G\frac{1}{(\sigma^2_g)^t}\left(\frac{1}{G}\sum_{i=1}^G\ln(\sigma^2_i)^t - \ln(\sigma^2_g)^t\right) \ge 0, $$

where the last inequality is a consequence of Lemma 4. By Lemma 3 (iii), $\ln\big(h_G(t)/h_1(t)\big) > 0$. Thus $A'(1) \ge 0$. It is easy to see that equality holds iff $\sigma^2_g = \sigma^2$ for all $g$.

[Proof of (iii)] Since $C_3(\alpha)$ is linear in $\alpha$, $C''_3(\alpha) = 0$, and

$$ A''(\alpha) = C''_1(\alpha)C_2(\alpha) + 2C'_1(\alpha)C'_2(\alpha) + C_1(\alpha)C''_2(\alpha) = C_1(\alpha)(\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\left[\Big(D(\alpha) + t\ln\big(\sigma^2_{pool}/\sigma^2_g\big)\Big)^2 + D'(\alpha)\right] > 0, \tag{23} $$

where the last inequality uses the fact that $D'(\alpha) > 0$ for any $\alpha \in [0,1]$.

APPENDIX 2: THE PROOF OF THEOREM 2

[Proof of (i)] For any fixed $t > 0$, let $m$ be the smallest integer such that $m \ge t$. By (14) and the fact that $\Psi(\cdot)$ is an increasing function on $(0,\infty)$, we have

$$ 0 \ge \Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big) \ge \Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+m\big) = -\sum_{k=0}^{m-1}\frac{1}{\nu/2+k} \ge -\frac{2m}{\nu} \stackrel{\nu\to\infty}{\longrightarrow} 0. \tag{24} $$

Thus by (22), for any fixed $G$ and $t$, $\lim_{\nu\to\infty} A'(0) = 0$. Similar arguments show that $\lim_{\nu\to\infty} A'(0) = 0$ for any fixed $t < 0$.

Lemma 3 (v) implies that $\lim_{\nu\to\infty} C_1(\alpha) = 1$ for any fixed $\alpha$, $G$ and $t$. Similar arguments as in (24) show that $\lim_{\nu\to\infty} D(\alpha) = 0$. By (15) it is easy to see that $\lim_{\nu\to\infty} D'(\alpha) = \lim_{\nu\to\infty} 2(1-1/G)t^2/\nu = 0$. Thus,

$$ \lim_{\nu\to\infty} A''(\alpha) = (\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\, t^2\ln^2\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right). \tag{25} $$

When the $\sigma^2_g$ are not all the same, $\lim_{\nu\to\infty} A''(\alpha) > 0$. This shows that $A(\alpha)$ reaches its minimum at $\alpha = 0$ as $\nu \to \infty$, since $\lim_{\nu\to\infty} A'(0) = 0$ and $A(\alpha)$ is a strictly convex function of $\alpha$ on $[0,1]$.

[Proof of (ii)] By (25), $\lim_{\nu\to\infty} A''(\alpha) = 0$ when $\sigma^2_g = \sigma^2$ for all $g$. Combined with the fact that $\lim_{\nu\to\infty} A'(0) = 0$, $A(\alpha)$ approaches a constant function of $\alpha$ on $[0,1]$.

APPENDIX 3: THE PROOF OF THEOREM 3

Lemma 5. For any $s > 0$ and $t > -s$, we have

$$ \ln\big(\Gamma(s+t)/\Gamma(s)\big) \ge t\,\Psi(s). $$

[Proof] Let $f(t) = \ln\big(\Gamma(s+t)/\Gamma(s)\big) - t\Psi(s)$; we need to show that $f(t) \ge 0$. $f'(t) = \Psi(s+t) - \Psi(s)$. Since $\Psi(\cdot)$ is an increasing function, $f'(t) \ge 0$ on $(0,\infty)$ and $f'(t) \le 0$ on $(-s, 0)$. Thus $f(t) \ge f(0) = 0$.

Proof of Theorem 3. We rewrite (9) as

$$ R_2(\hat\sigma^{2t}, \sigma^{2t}) = E_1(\alpha)E_2(\alpha) - 2C_1(\alpha)C_2(\alpha) + 1, $$

where $C_1(\alpha)$ and $C_2(\alpha)$ are defined in (18) and (19), and

$$ E_1(\alpha) \triangleq \frac{h_G^{2\alpha}(t)\, h_1^{2(1-\alpha)}(t)}{h_1^{G-1}\big(\frac{2\alpha t}{G}\big)\, h_1\big(2(1-\alpha+\frac{\alpha}{G})t\big)}, \tag{26} $$

$$ E_2(\alpha) \triangleq (\sigma^2_{pool})^{2\alpha t}\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-2\alpha t}. $$

[Proof of (i)]

$$ R'_2(\hat\sigma^{2t}, \sigma^{2t})\big|_{\alpha=0} = 2\left(\frac{h_1^2(t)}{h_1(2t)} - 1\right)\left[\ln\left(\frac{h_G(t)}{h_1(t)}\right) + \frac{(G-1)t}{G}\Big(\Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big)\Big)\right] + 2\,\frac{h_1^2(t)}{h_1(2t)}\,\frac{(G-1)t}{G}\Big[\Psi\big(\tfrac{\nu}{2}+t\big) - \Psi\big(\tfrac{\nu}{2}+2t\big)\Big] < 2\left(\frac{h_1^2(t)}{h_1(2t)} - 1\right)\left[\ln\left(\frac{h_G(t)}{h_1(t)}\right) + \frac{(G-1)t}{G}\Big(\Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big)\Big)\right] \triangleq 2M_3M_4, \tag{27} $$

where the inequality holds because $t$ and $\Psi(\nu/2+t) - \Psi(\nu/2+2t)$ have opposite signs for any non-zero $t$. By Lemma 3 (ii), $\Gamma(\nu/2)/\Gamma(\nu/2+t) \ge \Gamma(\nu/2+t)/\Gamma(\nu/2+2t)$ for any non-zero $t > -\nu/2$. Thus $h_1^2(t) \ge h_1(2t)$, which implies that $M_3 \ge 0$. By Lemma 5,

$$ -\ln\left(\frac{h_G(t)}{h_1(t)}\right) = (G-1)\ln\left(\frac{\Gamma(\frac{\nu}{2}+\frac{t}{G})}{\Gamma(\frac{\nu}{2})}\right) + \ln\left(\frac{\Gamma(\frac{\nu}{2}+\frac{t}{G})}{\Gamma(\frac{\nu}{2}+t)}\right) \ge \frac{(G-1)t}{G}\,\Psi\big(\tfrac{\nu}{2}\big) - \frac{(G-1)t}{G}\,\Psi\big(\tfrac{\nu}{2}+t\big) = \frac{(G-1)t}{G}\Big(\Psi\big(\tfrac{\nu}{2}\big) - \Psi\big(\tfrac{\nu}{2}+t\big)\Big), $$

which implies that $M_4 \le 0$. Therefore, $R'_2(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=0} < 0$.

[Proof of (ii)] It is easy to check that $R'_2(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=1} = (M_5 + M_6)/G$, where

$$ M_5 \triangleq 2\ln\left(\frac{h_G(t)}{h_1(t)}\right)\left(\frac{h_G^2(t)}{h_1^G(\frac{2t}{G})}\,(\sigma^2_{pool})^{2t}\sum_{g=1}^G(\sigma^2_g)^{-2t} - (\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t}\right), $$

$$ M_6 \triangleq \frac{h_G^2(t)}{h_1^G(\frac{2t}{G})}\,(\sigma^2_{pool})^{2t}\sum_{g=1}^G\left[(\sigma^2_g)^{-2t}\,2t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right] - (\sigma^2_{pool})^t\sum_{g=1}^G\left[(\sigma^2_g)^{-t}\,2t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right]. $$

We will show that $M_5 > 0$ and $M_6 \ge 0$. Using the facts that $(\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t} \ge G$ and

$$ (\sigma^2_{pool})^{2t}\sum_{g=1}^G(\sigma^2_g)^{-2t} - 2(\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t} + G = \sum_{g=1}^G\Big(\big(\sigma^2_{pool}/\sigma^2_g\big)^t - 1\Big)^2 \ge 0, $$

we have $(\sigma^2_{pool})^{2t}\sum_{g=1}^G(\sigma^2_g)^{-2t} \ge (\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t}$. Using Lemma 3 (iii) and (iv),

$$ M_5 > 2\ln\left(\frac{h_G(t)}{h_1(t)}\right)\left((\sigma^2_{pool})^{2t}\sum_{g=1}^G(\sigma^2_g)^{-2t} - (\sigma^2_{pool})^t\sum_{g=1}^G(\sigma^2_g)^{-t}\right) \ge 0. $$

By Lemma 3 (iv) and Lemma 4,

$$ M_6 \ge (\sigma^2_{pool})^{2t}\sum_{g=1}^G\left[(\sigma^2_g)^{-2t}\,2t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right] - (\sigma^2_{pool})^t\sum_{g=1}^G\left[(\sigma^2_g)^{-t}\,2t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right] = \sum_{g=1}^G 2\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)t\left[\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)^{2t} - \left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)^t\right] \ge 0, $$

where the last inequality uses the fact that $\ln(x)$ and $x(x-1)$ have the same sign for all $x > 0$.

APPENDIX 4: COUNTER EXAMPLE AND PROOF OF THEOREM 4

Some algebra shows that

$$ R''_2(\hat\sigma^{2t}, \sigma^{2t}) = E_1(\alpha)(\sigma^2_{pool})^{2\alpha t}\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-2\alpha t}\left\{\left[F(\alpha) + 2t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right]^2 + F'(\alpha)\right\} - 2C_1(\alpha)(\sigma^2_{pool})^{\alpha t}\,\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\left\{\left[D(\alpha) + t\ln\left(\frac{\sigma^2_{pool}}{\sigma^2_g}\right)\right]^2 + D'(\alpha)\right\}, \tag{28} $$

where $C_1(\alpha)$, $D(\alpha)$, $D'(\alpha)$ and $E_1(\alpha)$ are defined in (18), (20), (21) and (26) respectively, and

$$ F(\alpha) \triangleq 2\ln\left(\frac{h_G(t)}{h_1(t)}\right) + \frac{2(G-1)t}{G}\left[\Psi\Big(\frac{\nu}{2}+\frac{2\alpha t}{G}\Big) - \Psi\Big(\frac{\nu}{2}+2(1-\alpha+\frac{\alpha}{G})t\Big)\right], $$

$$ F'(\alpha) \triangleq \frac{4(G-1)t^2}{G^2}\left[\Psi'\Big(\frac{\nu}{2}+\frac{2\alpha t}{G}\Big) + (G-1)\Psi'\Big(\frac{\nu}{2}+2(1-\alpha+\frac{\alpha}{G})t\Big)\right]. $$

We have shown in Appendix 2 that $\lim_{\nu\to\infty} C_1(\alpha) = 1$ and $\lim_{\nu\to\infty} D(\alpha) = \lim_{\nu\to\infty} D'(\alpha) = 0$. Similar arguments show that for any fixed $\alpha$, $G$ and $t$, $\lim_{\nu\to\infty} E_1(\alpha) = 1$ and $\lim_{\nu\to\infty} F(\alpha) = \lim_{\nu\to\infty} F'(\alpha) = 0$. Denote $a_g \triangleq (\sigma^2_{pool}/\sigma^2_g)^t$. We have $\prod_{g=1}^G a_g = 1$, and

$$ \lim_{\nu\to\infty} R''_2(\hat\sigma^{2t}, \sigma^{2t}) = \frac{2}{G}\sum_{g=1}^G\big(2a_g^{2\alpha} - a_g^{\alpha}\big)(\ln a_g)^2. \tag{29} $$

[Counterexample] Let $G = 100$, $a_1 = \cdots = a_{99} = 16^{1/99}$ and $a_{100} = 1/16$. It is easy to check that $\prod_{g=1}^G a_g = 1$ and $\lim_{\nu\to\infty} R''_2(\hat\sigma^{2t}, \sigma^{2t})|_{\alpha=0.5} = -0.0176 < 0$. Therefore, there exists a large $\nu$ such that $R_2(\hat\sigma^{2t}, \sigma^{2t})$ is not a convex function of $\alpha$.

[Proof of Theorem 4] Similar arguments as in (29) show that for any $\alpha \in [0,1]$,

$$ \lim_{\nu\to\infty} R'_2(\hat\sigma^{2t}, \sigma^{2t}) = \frac{2}{G}\sum_{g=1}^G\big(a_g^{2\alpha} - a_g^{\alpha}\big)\ln(a_g) = \frac{2}{\alpha G}\sum_{g=1}^G a_g^{\alpha}\big(a_g^{\alpha} - 1\big)\ln\big(a_g^{\alpha}\big) \ge 0, $$

where equality holds in the last inequality iff $a_g = 1$ for all $g$. This implies that $\alpha^*_2 \to 0$ as $\nu \to \infty$ when the $\sigma^2_g$ are not all the same. Otherwise, $\lim_{\nu\to\infty} R'_2(\hat\sigma^{2t}, \sigma^{2t}) = 0$, which proves part (ii).
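The limiting value used in the counterexample is easy to verify numerically from (29); these few lines of ours reproduce $-0.0176$:

```python
import numpy as np

a = np.full(100, 16 ** (1 / 99))   # a_1 = ... = a_99 = 16^(1/99)
a[-1] = 1 / 16                     # a_100 = 1/16, so prod(a) = 1
alpha = 0.5
limit = 2 / 100 * np.sum((2 * a ** (2 * alpha) - a ** alpha) * np.log(a) ** 2)
print(round(limit, 4))             # -0.0176
```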

APPENDIX 5: THE PROOF OF THEOREM 5

It is easy to see that $(Z_g/\sigma^2_g)^{-t} \stackrel{a.s.}{\to} 1$ for any fixed $t$. For an arbitrary $\epsilon > 0$, there exists an $N > 0$ such that for any $\nu > N$, $1-\epsilon \le (Z_g/\sigma^2_g)^{-t} \le 1+\epsilon$ with probability 1. Now, for any $\alpha \in [0,1]$, $1-\epsilon \le (1-\epsilon)^{\alpha} \le (Z_g/\sigma^2_g)^{-\alpha t} \le (1+\epsilon)^{\alpha} \le 1+\epsilon$. Therefore, $Z_g^{-\alpha t} \stackrel{a.s.}{\to} (\sigma^2_g)^{-\alpha t}$ uniformly for $\alpha \in [0,1]$ as $\nu \to \infty$. Since $G$ is fixed, we have

$$ \frac{1}{G}\sum_{g=1}^G Z_g^{-\alpha t} \stackrel{a.s.}{\longrightarrow} \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t} \quad\text{uniformly for } \alpha \in [0,1] \text{ as } \nu \to \infty. \tag{30} $$

Similarly, by the fact that $Z_{pool}/\sigma^2_{pool} \stackrel{a.s.}{\to} 1$, we have

$$ (Z_{pool})^{\alpha t} \stackrel{a.s.}{\longrightarrow} (\sigma^2_{pool})^{\alpha t} \quad\text{uniformly for } \alpha \in [0,1] \text{ as } \nu \to \infty. \tag{31} $$

Thus, by (30) and (31) and the facts that $(\sigma^2_{pool})^{\alpha t} < \infty$ and $\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}/G < \infty$,

$$ b(Z, \alpha t) = \big((Z_{pool})^{\alpha t} - (\sigma^2_{pool})^{\alpha t}\big)\left(\frac{1}{G}\sum_{g=1}^G Z_g^{-\alpha t} - \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\right) + (\sigma^2_{pool})^{\alpha t}\left(\frac{1}{G}\sum_{g=1}^G Z_g^{-\alpha t} - \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\right) + \big((Z_{pool})^{\alpha t} - (\sigma^2_{pool})^{\alpha t}\big)\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t} + (\sigma^2_{pool})^{\alpha t}\frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t} \stackrel{a.s.}{\longrightarrow} b(\sigma^2, \alpha t) \quad\text{uniformly for } \alpha \in [0,1] \text{ as } \nu \to \infty. \tag{32} $$

For a fixed $t$, let $Q_k(\sigma^2, \alpha, \nu) = R_k\big(\hat\sigma^{2t}(\alpha), \sigma^{2t}\big)$, and let $Q_k(Z, \alpha, \nu)$ be the function with $b(\sigma^2, k\alpha t)$ in $Q_k(\sigma^2, \alpha, \nu)$ replaced by $b(Z, k\alpha t)$, $k = 1, 2$. By (32), it is not difficult to check that

$$ Q_1(Z, \alpha, \nu) - Q_1(\sigma^2, \alpha, \nu) \stackrel{a.s.}{\longrightarrow} 0 \quad\text{uniformly for } \alpha \in [0,1] \text{ as } \nu \to \infty. \tag{33} $$

Then for each pair $(\sigma^2, Z)$ satisfying (33), it is easy to show that $\min_{\alpha\in[0,1]} Q_1(Z, \alpha, \nu) - \min_{\alpha\in[0,1]} Q_1(\sigma^2, \alpha, \nu) \to 0$. Note that $\min_{\alpha\in[0,1]} Q_1(\sigma^2, \alpha, \nu) = R_1\big(\hat\sigma^{2t}(\alpha^*_1(\nu)), \sigma^{2t}\big)$ and

$$ \min_{\alpha\in[0,1]} Q_1(Z, \alpha, \nu) - R_1\big(\hat\sigma^{2t}(\hat\alpha^*_1(\nu)), \sigma^{2t}\big) = C_1\big(\hat\alpha^*_1(\nu)\big)\Big[b\big(Z, \hat\alpha^*_1(\nu)t\big) - b\big(\sigma^2, \hat\alpha^*_1(\nu)t\big)\Big] \to 0 \quad\text{as } \nu \to \infty, $$

since $\lim_{\nu\to\infty} C_1(\alpha) = 1$. Therefore,

$$ R_1\big(\hat\sigma^{2t}(\hat\alpha^*_1), \sigma^{2t}\big) - R_1\big(\hat\sigma^{2t}(\alpha^*_1), \sigma^{2t}\big) \stackrel{a.s.}{\longrightarrow} 0 \quad\text{as } \nu \to \infty. \tag{34} $$

Similar arguments show that (34) holds for the squared loss function. From (34) we have $b(\sigma^2, \hat\alpha^*_1 t) - b(\sigma^2, \alpha^*_1 t) \stackrel{a.s.}{\to} 0$. Note that $\alpha^*_1 \to 0$ since the $\sigma^2_g$ are not all the same (Theorem 2). This implies that $b(\sigma^2, \alpha^*_1 t) \to b(\sigma^2, 0) = 1$, and thus $b(\sigma^2, \hat\alpha^*_1 t) \stackrel{a.s.}{\to} 1$. Therefore, by the fact that $\sum_{g=1}^G(\sigma^2_g)^p/G > (\sigma^2_{pool})^p$ for any non-zero $p$ (since the $\sigma^2_g$ are not all the same), we have $\hat\alpha^*_1 \stackrel{a.s.}{\to} 0$.

Similarly, we have $\big[b(\sigma^2, 2\hat\alpha^*_2 t) - b(\sigma^2, 2\alpha^*_2 t)\big] - 2\big[b(\sigma^2, \hat\alpha^*_2 t) - b(\sigma^2, \alpha^*_2 t)\big] \stackrel{a.s.}{\to} 0$. Thus $\hat\alpha^*_2 \stackrel{a.s.}{\to} 0$ by noting that $\alpha^*_2 \to 0$ (Theorem 4) and the fact that

$$ b(\sigma^2, 2\hat\alpha^*_2 t) - 2b(\sigma^2, \hat\alpha^*_2 t) + 1 = \frac{1}{G}\sum_{g=1}^G\Big[\big(\sigma^2_{pool}/\sigma^2_g\big)^{\hat\alpha^*_2 t} - 1\Big]^2 \ge 0. $$

APPENDIX 6: THE PROOF OF LEMMA 2 AND THEOREM 6

[Proof of Lemma 2] Since $s(\alpha) = (\sigma^2_g)^{-\alpha t}$ is a convex function of $\alpha$, we have $(\sigma^2_g)^{-\alpha t} = s(\alpha) \le (1-\alpha)s(0) + \alpha s(1) \le 1 + (\sigma^2_g)^{-t}$. By Theorem 16(a) in Ferguson (1996) and the fact that $E(\sigma^2_g)^{-t} < \infty$,

$$ \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t} \stackrel{a.s.}{\longrightarrow} E(\sigma^2_1)^{-\alpha t} \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{35} $$

Similarly, by the fact that $E(Z_1)^{-\alpha t} = E\big[E(Z_1^{-\alpha t} \mid \sigma^2_1)\big] = E(\sigma^2_1)^{-\alpha t}/h_1(-\alpha t)$,

$$ \frac{h_1(-\alpha t)}{G}\sum_{g=1}^G(Z_g)^{-\alpha t} \stackrel{a.s.}{\longrightarrow} E(\sigma^2_1)^{-\alpha t} \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{36} $$

Combining (35) and (36),

$$ \frac{h_1(-\alpha t)}{G}\sum_{g=1}^G(Z_g)^{-\alpha t} - \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t} \stackrel{a.s.}{\longrightarrow} 0 \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{37} $$

By the strong law of large numbers, $\ln(\sigma^2_{pool}) = \sum_{g=1}^G\ln(\sigma^2_g)/G \stackrel{a.s.}{\to} E\ln(\sigma^2_1)$. Thus $\big[\sigma^2_{pool}\exp(-E\ln(\sigma^2_1))\big]^t \stackrel{a.s.}{\to} 1$ for any fixed $t$, and by arguments similar to those in (30),

$$ (\sigma^2_{pool})^{\alpha t} \stackrel{a.s.}{\longrightarrow} \exp\big(\alpha t\, E\ln(\sigma^2_1)\big) \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{38} $$

Following similar arguments and using the fact that $E(\ln Z_1) = E\big[E(\ln Z_1 \mid \sigma^2_1)\big] = E\ln(\sigma^2_1) + \Psi(\nu/2) - \ln(\nu/2)$, we have

$$ \Big(\frac{\nu}{2}\Big)^{\alpha t}\exp\Big(-\alpha t\,\Psi\big(\tfrac{\nu}{2}\big)\Big)(Z_{pool})^{\alpha t} \stackrel{a.s.}{\longrightarrow} \exp\big(\alpha t\, E\ln(\sigma^2_1)\big) \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{39} $$

Combining (38) and (39),

$$ \Big(\frac{\nu}{2}\Big)^{\alpha t}\exp\Big(-\alpha t\,\Psi\big(\tfrac{\nu}{2}\big)\Big)(Z_{pool})^{\alpha t} - (\sigma^2_{pool})^{\alpha t} \stackrel{a.s.}{\longrightarrow} 0 \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{40} $$

Using (36), (37), (38) and (40), we have

$$ w(\alpha t)\,b(Z, \alpha t) - b(\sigma^2, \alpha t) = \left[\Big(\frac{\nu}{2}\Big)^{\alpha t}\exp\Big(-\alpha t\,\Psi\big(\tfrac{\nu}{2}\big)\Big)(Z_{pool})^{\alpha t} - (\sigma^2_{pool})^{\alpha t}\right]\frac{h_1(-\alpha t)}{G}\sum_{g=1}^G(Z_g)^{-\alpha t} + (\sigma^2_{pool})^{\alpha t}\left[\frac{h_1(-\alpha t)}{G}\sum_{g=1}^G(Z_g)^{-\alpha t} - \frac{1}{G}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}\right] \stackrel{a.s.}{\longrightarrow} 0 \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. $$

[Proof of Theorem 6 (i)] By Lemma 2, it is not difficult to check that

$$ H_1(Z, \alpha, G) - H_1(\sigma^2, \alpha, G) \stackrel{a.s.}{\longrightarrow} 0 \quad\text{uniformly for } \alpha \in [0,1] \text{ as } G \to \infty. \tag{41} $$

Arguments similar to those in the proof of (34), together with the fact that $\max_{\alpha\in[0,1]}\lim_{G\to\infty} C_1(\alpha) < \infty$, lead to $R_1\big(\hat\sigma^{2t}(\bar\alpha^*_1(G)), \sigma^{2t}\big) - R_1\big(\hat\sigma^{2t}(\alpha^*_1(G)), \sigma^{2t}\big) \stackrel{a.s.}{\to} 0$.

Now we show that for each pair $(\sigma^2, Z)$ satisfying (41), $\bar\alpha^*_1(G) - \alpha^*_1(G) \to 0$. Suppose there exists a pair $(\sigma^2, Z)$ for which (41) holds but $\bar\alpha^*_1(G) - \alpha^*_1(G) \nrightarrow 0$. Since $\alpha$ belongs to the compact interval $[0,1]$, there exists a subsequence $\{G_n\}$ such that $|\bar\alpha^*_1(G_n) - \alpha^*_1(G_n)| \to \beta > 0$. Noting that $(\sigma^2_{pool})^{\alpha t}\sum_{g=1}^G(\sigma^2_g)^{-\alpha t}/G \ge 1$ for any $G$, by (23) we have

$$ \lim_{G\to\infty}\frac{\partial^2 H_1(\sigma^2, \alpha, G)}{\partial\alpha^2} \ge \lim_{G\to\infty} C_1(\alpha)D'(\alpha) = \frac{\Gamma\big(\frac{\nu}{2}+(1-\alpha)t\big)}{\Gamma^{\alpha}\big(\frac{\nu}{2}\big)\,\Gamma^{1-\alpha}\big(\frac{\nu}{2}+t\big)}\, t^2\,\Psi'\Big(\frac{\nu}{2}+(1-\alpha)t\Big). $$

Since the non-zero $t > -\nu/2$, it is not difficult to show that $\delta \triangleq \min_{\alpha\in[0,1]}\{\lim_{G\to\infty} C_1(\alpha)D'(\alpha)\} > 0$. Then, for an arbitrary $0 < \epsilon < \delta\beta^2/16$, there exists an $N_1 > 0$ such that for any $G_n > N_1$, $|\bar\alpha^*_1(G_n) - \alpha^*_1(G_n)| > \beta/2$. Consequently, by a Taylor expansion with some $\xi$ between $\alpha^*_1(G_n)$ and $\bar\alpha^*_1(G_n)$,

$$ H_1\big(\sigma^2, \bar\alpha^*_1(G_n), G_n\big) - H_1\big(\sigma^2, \alpha^*_1(G_n), G_n\big) = \frac{1}{2}\frac{\partial^2 H_1(\sigma^2, \alpha, G_n)}{\partial\alpha^2}\bigg|_{\alpha=\xi}\big(\bar\alpha^*_1(G_n) - \alpha^*_1(G_n)\big)^2 \ge \frac{\delta\beta^2}{8}. $$

For the same $\epsilon$, by (41), there exists another $N_2 > 0$ such that for any $G_n > N_2$, $\sup_{\alpha\in[0,1]}|H_1(Z, \alpha, G_n) - H_1(\sigma^2, \alpha, G_n)| < \epsilon$. Therefore, for any $G_n > \max\{N_1, N_2\}$, we have

$$ H_1\big(Z, \bar\alpha^*_1(G_n), G_n\big) \ge H_1\big(\sigma^2, \bar\alpha^*_1(G_n), G_n\big) - \epsilon \ge H_1\big(\sigma^2, \alpha^*_1(G_n), G_n\big) + \delta\beta^2/8 - \epsilon \ge H_1\big(Z, \alpha^*_1(G_n), G_n\big) + \delta\beta^2/8 - 2\epsilon > H_1\big(Z, \alpha^*_1(G_n), G_n\big), $$

which contradicts the fact that $H_1\big(Z, \bar\alpha^*_1(G_n), G_n\big)$ is the minimum of $H_1(Z, \alpha, G_n)$.

The proof of Theorem 6 (ii) is similar and is omitted.

References

Abramowitz, M. and Stegun, I. (1972). Handbook of Mathematical Functions, Dover, New York.

Baldi, P. and Long, A. D. (2001). A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, Bioinformatics, pp. 509–519.

Brewster, J. F. and Zidek, J. V. (1974). Improving on equivariant estimators, Annals of Statistics 2: 21–38.

Brown, L. (1968). Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters, Annals of Mathematical Statistics 39: 29–48.

Callow, M. J., Dudoit, S., Gong, E. L., Speed, T. P. and Rubin, E. M. (2000). Microarray expression profiling identifies genes with altered expression in HDL-deficient mice, Genome Research 10: 2022–2029.

Cui, X. and Churchill, G. A. (2003). Statistical tests for differential expression in cDNA microarray experiments, Genome Biology.

Cui, X., Hwang, J. T. G., Qiu, J., Blades, N. J. and Churchill, G. A. (2005). Improved statistical tests for differential gene expression by shrinking variance components estimates, Biostatistics 6: 59–75.

Ferguson, T. S. (1996). A Course in Large Sample Theory, Chapman and Hall, London.

Jain, N., Thatte, J., Braciale, T., Ley, K., O'Connell, M. and Lee, J. (2003). Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays, Bioinformatics 19: 1945–1951.

James, W. and Stein, C. (1961). Estimation with quadratic loss, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1: 361–379.

Kamb, A. and Ramaswami, A. (2001). A simple method for statistical analysis of intensity differences in microarray-derived gene expression data, BMC Biotechnology, pp. 1–8.

Kendziorski, C. M., Newton, M. A., Lan, H. and Gould, M. N. (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles, Statistics in Medicine 22: 3899–3914.

Kerr, M. K., Afshari, C. A., Bennett, L., Bushel, P., Martinez, J., Walker, N. J. and Churchill, G. A. (2002). Statistical analysis of a gene expression microarray experiment with replication, Statistica Sinica 12: 203–218.

Kubokawa, T. (1994). A unified approach to improving equivariant estimators, Annals of Statistics 22: 290–299.

Kubokawa, T. (1999). Shrinkage and modification techniques in estimation of variance and the related problems: a review, Communications in Statistics – Theory and Methods 28: 613–650.

Kubokawa, T. and Srivastava, M. S. (2003). Estimating the covariance matrix: a new approach, Journal of Multivariate Analysis 86: 28–47.

Ledoit, O. and Wolf, M. (2004a). Honey, I shrunk the sample covariance matrix, Journal of Portfolio Management 30: 110–119.

Ledoit, O. and Wolf, M. (2004b). A well-conditioned estimator for large-dimensional covariance matrices, Journal of Multivariate Analysis 88: 365–411.

Leung, Y. and Cavalieri, D. (2003). Fundamentals of cDNA microarray data analysis, Trends in Genetics 11: 649–659.

Lin, Y., Nadler, S. T., Attie, A. D. and Yandell, B. S. (2003). Adaptive gene picking with microarray data: detecting important low abundance signals, in G. Parmigiani, E. S. Garrett, R. A. Irizarry and S. L. Zeger (eds), The Analysis of Gene Expression Data: Methods and Software, Springer, New York.

Lindley, D. V. (1962). Discussion of Professor Stein's paper: Confidence sets for the mean of a multivariate normal distribution, Journal of the Royal Statistical Society B 24: 265–296.

Littell, R. C., Milliken, G., Stroup, W. W. and Wolfinger, R. D. (1996). SAS System for Mixed Models, SAS Institute Inc., NC.

Lonnstedt, I. and Speed, T. (2002). Replicated microarray data, Statistica Sinica 12: 31–46.

Maatta, J. M. and Casella, G. (1990). Developments in decision-theoretic variance estimation, Statistical Science 5: 90–101.

Nguyen, D. V., Arpat, A. B., Wang, N. and Carroll, R. J. (2002). DNA microarray experiments: biological and technological aspects, Biometrics 58: 701–717.

Perron, F. (1990). Equivariant estimators of a covariance matrix, Canadian Journal of Statistics 18: 179–182.

Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components, Wiley, New York.

Sinha, B. K. and Ghosh, M. (1987). Inadmissibility of the best equivariant estimators of the variance-covariance matrix, the precision matrix and the generalized variance under entropy loss, Statistics and Decisions 5: 201–227.

Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean, Annals of the Institute of Statistical Mathematics 16: 155–160.

Storey, J. and Tibshirani, R. (2003). SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays, in G. Parmigiani, E. S. Garrett, R. A. Irizarry and S. L. Zeger (eds), The Analysis of Gene Expression Data: Methods and Software, Springer, New York.

Wright, G. W. and Simon, R. M. (2003). A random variance model for detection of differential gene expression in small microarray experiments, Bioinformatics 19: 2448–2455.

Wu, H. and Churchill, G. (2005). R/MAANOVA: an extensive R environment for the analysis of microarray experiments, http://www.jax.org/staff/churchill/labsite/software/Rmaanova/maanova.pdf.
