Applied Mathematics and Computation 197 (2008) 440–450
Monte Carlo EM algorithm in logistic linear models involving non-ignorable missing data
Jeong-Soo Park a,*, Guoqi Q. Qian b, Yuna Jun c
a Department of Statistics, Chonnam National University, Gwangju, Republic of Korea
b Department of Statistics, La Trobe University, Victoria 3086, Australia
c Samsung Tesco Ltd., Sambu Bldg. 676 Yeoksam-dong, Seoul 135-979, Korea
Abstract
Many data sets obtained from surveys or medical trials include missing observations. Since ignoring the missing information usually causes bias and inefficiency, an algorithm for estimating parameters is proposed based on a likelihood function that takes the missing information into account. A binomial response and a normal explanatory variable model for the missing data are assumed. We fit the model using the Monte Carlo EM (Expectation and Maximization) algorithm. The E-step uses the Metropolis–Hastings algorithm to generate a sample for the missing data, and the M-step uses Newton–Raphson iteration to maximize the likelihood function. Asymptotic variances and standard errors of the maximum likelihood estimates (MLEs) of the parameters are derived using the observed Fisher information.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Conditional expectation; Fisher information matrix; Maximum likelihood estimation; Metropolis–Hastings algorithm; Newton–Raphson iteration; Standard error
1. Introduction
Many data sets obtained from surveys or medical trials include missing observations [1]. When these data sets are analyzed, it is common to use only the complete cases, discarding any case with missing data. However, this may cause problems if the missingness is related to the values of the missing items [2]: parameter estimates can be biased and inefficient [3]. We therefore need methods that exploit the partial information carried by the incompletely observed cases instead of ignoring them. Little and Rubin [3] described many statistical methods for dealing with missing data. Baker and Laird [4] used the EM (Expectation and Maximization) algorithm to obtain maximum likelihood estimates (MLEs) of parameters from incomplete data. Ibrahim and Lipsitz [5,6] presented estimation methods for generalized linear models when the missing-data mechanism is non-ignorable.
doi:10.1016/j.amc.2007.07.080
This work was supported by a Korean Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (R01-2006-000-11087-0).
* Corresponding author. E-mail addresses: [email protected] (J.-S. Park), [email protected] (G.Q. Qian), [email protected] (Y. Jun).
Our proposed method stems from [5,6], and can be thought of as an extended and modified version for a different model.
There are two types of missing data: ignorable and non-ignorable [3]. Missing data are called ignorable (non-ignorable) if the probability of observing a data item is independent of (dependent on) the value of that data item. Data that are missing at random are ignorable, while non-ignorable missing data are not missing at random.
In this paper, we propose a method for estimating parameters in logistic linear models involving non-ignorable missing data. A binomial response and a normal covariate model for the missing data are assumed. The Monte Carlo EM algorithm is used to estimate the parameters [7]. The Metropolis–Hastings algorithm is used in the E-step to generate a sample for the missing data, and Newton–Raphson iteration is used in the M-step to solve the score equation and thereby maximize the conditional expectation of the log-likelihood function. The standard errors of the estimates are calculated from the observed Fisher information matrix.
The rest of this paper is organized as follows. In Section 2, the notation and the model are stated. In Section 3 we derive the E- and M-steps of the Monte Carlo EM algorithm, including the Metropolis–Hastings algorithm and the Newton–Raphson iteration. The calculation of standard errors is described in Section 4. In Section 5 we illustrate our method with an example. A summary is given in the last section. Details of the derivatives for the Newton–Raphson iteration and formulas for the elements of the observed Fisher information matrix are given in the Appendix.
2. Notation and model
Suppose that $y_1,\dots,y_n$ are independent observations, where each $y_i$ has a binomial distribution with sample size $m_i$ and success probability $p_i$. Let $X_i = (X_{1i}, X_{2i})^t$ be a $2\times 1$ random vector of covariates, where $X_{1i}$ and $X_{2i}$ are independent and follow normal distributions with means $\mu_1,\mu_2$ and variances $\sigma_1^2,\sigma_2^2$, respectively. Further, let $\beta^t = (\beta_0,\beta_1,\beta_2)$ be the regression coefficients, assumed to include an intercept. It is also assumed that
$$\mathrm{logit}(p_i) = \log\frac{p_i}{1-p_i} = X_i^t\beta, \quad\text{and}\quad p(y_i\mid X_i,\beta) = \binom{m_i}{y_i}\frac{\exp\{y_i X_i^t\beta\}}{(1+\exp\{X_i^t\beta\})^{m_i}}. \qquad (1)$$
We assume that $X_{1i}$ is completely observed, and that $y_i$ and $X_{2i}$ are partially missing. Our objective is to estimate $\beta,\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$ (by maximum likelihood) and their standard errors from the given data with non-ignorable missing values. Missing-value indicators are introduced [3] as
$$r_i = \begin{cases} 0 & \text{if } y_i \text{ is observed},\\ 1 & \text{if } y_i \text{ is missing},\end{cases} \qquad s_i = \begin{cases} 0 & \text{if } x_{2i} \text{ is observed},\\ 1 & \text{if } x_{2i} \text{ is missing},\end{cases} \qquad (2)$$
with probabilities $P(r_i=1)=\psi_i$ and $P(s_i=1)=\phi_i$. Following [6], the non-ignorable missing-data mechanism is defined as
$$\mathrm{logit}(\psi_i) = \delta_1 X_{1i} + \delta_2 X_{2i} + y_i\omega, \qquad \mathrm{logit}(\phi_i) = \alpha_1 X_{1i} + \alpha_2 X_{2i} + y_i\tau, \quad i=1,2,\dots,n, \qquad (3)$$
where $\delta=(\delta_1,\delta_2)^t$ and $\alpha=(\alpha_1,\alpha_2)^t$, and $\omega$ and $\tau$ are parameters determining the missing-data mechanism. The conditional probability functions for $r_i$ and $s_i$ are derived from Eqs. (1) and (3) as
$$p(r_i\mid X_i,y_i,\delta,\omega) = \frac{\exp\{r_i(X_i^t\delta + y_i\omega)\}}{1+\exp\{X_i^t\delta + y_i\omega\}}, \qquad (4)$$
$$p(s_i\mid X_i,y_i,\alpha,\tau) = \frac{\exp\{s_i(X_i^t\alpha + y_i\tau)\}}{1+\exp\{X_i^t\alpha + y_i\tau\}}. \qquad (5)$$
Now we derive the joint probability function of $y_i, x_{2i}, r_i, s_i$ as
$$\begin{aligned} p(y_i,x_{2i},r_i,s_i\mid x_{1i}) &= p(r_i\mid y_i,X_i,\delta,\omega)\,p(s_i\mid y_i,X_i,\alpha,\tau)\,p(y_i\mid X_i,\beta)\,p(x_{2i}\mid x_{1i}) \\ &\propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i} \\ &\quad\times (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}. \end{aligned} \qquad (6)$$
Therefore, we can write down the complete-data log-likelihood $l(\theta)$ as
$$\begin{aligned} \log L(\theta\mid y_i,X_i,r_i,s_i) &= \sum_{i=1}^n \log\frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} + \sum_{i=1}^n \log\frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} + \sum_{i=1}^n X_i^t\beta\,y_i \\ &\quad - \sum_{i=1}^n m_i\log\big(1+\exp\{X_i^t\beta\}\big) - \frac{n}{2}\log(2\pi\sigma_2^2) - \sum_{i=1}^n \frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}, \end{aligned} \qquad (7)$$
where $\theta = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$ is the parameter vector with respect to which the EM algorithm is developed. The complete-data log-likelihood specifies a model for the joint characterization of the observed data and the associated missing-data mechanism. A minimal code sketch of (7) is given below.
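For concreteness, the complete-data log-likelihood (7) can be coded directly. The following is a minimal NumPy sketch (our illustration, not code from the paper); it assumes the intercept is carried by the design matrix for $\beta$, while the missingness models (3) use $(x_1,x_2)$ without an intercept, as in Section 2.

```python
import numpy as np

def complete_data_loglik(params, y, x1, x2, r, s, m):
    """Complete-data log-likelihood (7).
    params = (beta, delta, omega, alpha, tau, mu2, sigma2sq), where
    beta is a 3-vector (intercept, x1, x2) and delta, alpha are 2-vectors."""
    beta, delta, omega, alpha, tau, mu2, sigma2sq = params
    n = len(y)
    Xb = np.column_stack([np.ones(n), x1, x2])  # design for beta, with intercept
    Xm = np.column_stack([x1, x2])              # design for the missing mechanisms
    eta_y = Xb @ beta                           # X_i^t beta in (1)
    eta_r = Xm @ delta + y * omega              # logit(psi_i) in (3)
    eta_s = Xm @ alpha + y * tau                # logit(phi_i) in (3)
    ll  = np.sum(r * eta_r - np.log1p(np.exp(eta_r)))      # first sum in (7)
    ll += np.sum(s * eta_s - np.log1p(np.exp(eta_s)))      # second sum in (7)
    ll += np.sum(y * eta_y - m * np.log1p(np.exp(eta_y)))  # binomial terms in (7)
    ll += -0.5 * n * np.log(2 * np.pi * sigma2sq)          # normal part for x2
    ll += -np.sum((x2 - mu2) ** 2) / (2 * sigma2sq)
    return ll
```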
3. Monte Carlo EM algorithm
3.1. Algorithm formulation
The MLE of $\beta$ and the other components of $\theta$ are the values maximizing the observed-data likelihood $L(\theta\mid (y,X)_{obs}, r_i, s_i)$, which has a quite intractable analytical form, where $(y,X)_{obs}$ denotes the observed components of $(y,X)$. Rather than directly differentiating $L(\theta\mid (y,X)_{obs}, r_i, s_i)$ with respect to $\theta$, we compute the MLE of $\theta$ using an EM algorithm [8], which involves iterative evaluation and maximization of the conditional expectation of the complete-data log-likelihood $l(\theta)$. If the conditional expectation involved is difficult to evaluate exactly, a Monte Carlo EM (MCEM) algorithm [9] can be used, in which a Gibbs sampler or the Metropolis–Hastings algorithm [10] approximates the conditional expectation.
Specifically, let $\theta'$ be the current estimate of $\theta$ and define the conditional expectation of $l(\theta)$, taken with respect to the conditional distribution of the missing data $(y,X)_{mis}$ given the observed data $(y_i,X_i,r_i,s_i)$ and the value $\theta'$, as follows:
$$Q(\theta,\theta') = E\big[l(\theta)\mid (y,X)_{obs}, r, s, \theta'\big]. \qquad (8)$$
The feasibility of calculating the conditional expectation in $Q(\theta,\theta')$ depends on the complexity of the conditional distribution of the missing data.
The EM algorithm is composed of expectation (E-step) and maximization (M-step) iterations. For the expectation of the complete-data log-likelihood in the E-step, we consider four possible cases: the response variable $y_i$ is missing, the covariate $x_{2i}$ is missing, both are missing, or there are no missing values. The expected log-likelihood is then written as
$$E[l(\theta)\mid X_i,y_i,r_i,s_i] = \begin{cases} \sum_{y_i=0}^{m_i} l(\theta)\,p(y_i\mid X_i,r_i,s_i) & (\text{if } y_i \text{ has missing components}),\\ \int l(\theta)\,p(x_{2i}\mid x_{1i},y_i,r_i,s_i)\,\mathrm{d}x_{2i,mis} & (\text{if } x_{2i} \text{ has missing components}),\\ \sum_{y_i=0}^{m_i}\int l(\theta)\,p(y_i,x_{2i}\mid x_{1i},r_i,s_i)\,\mathrm{d}x_{2i,mis} & (\text{if } y_i \text{ and } x_{2i} \text{ have missing components}),\\ l(\theta) & (\text{for no missing values}), \end{cases} \qquad (9)$$
where $x_{2i,mis}$ denotes the missing component of $x_{2i}$. Eqs. (8) and (9) lead to the conditional expectation of $l(\theta)$, which is our target quantity:
$$\begin{aligned} Q(\theta,\theta^r) &= \sum_{i=1}^{n_1} l(\theta) + \sum_{i=n_1+1}^{n_2}\sum_{y_i=0}^{m_i} l(\theta)\,p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) + \sum_{i=n_2+1}^{n_3}\int l(\theta)\,p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r)\,\mathrm{d}x_{2i,mis} \\ &\quad + \sum_{i=n_3+1}^{n}\sum_{y_i=0}^{m_i}\int l(\theta)\,p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r)\,\mathrm{d}x_{2i,mis}, \end{aligned} \qquad (10)$$
where $n_1,n_2,n_3$ are the corresponding sample sizes, $\theta^r$ is the estimate of $\theta$ at the $r$th iteration, $y_{i,mis}$ is the missing component of $y_i$, $X_{i,obs}$ is the observed component of $X_i$, and $p(y_{i,mis}\mid X_i,r_i,s_i)$, $p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i)$ and $p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i)$ are the conditional probabilities of the missing data given the observed data. These conditional probabilities act as the weights in $Q(\theta,\theta^r)$ and take the following forms:
$$p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r) = \frac{p(y_i\mid X_i,\theta^r)\,p(x_{2i}\mid x_{1i})\,p(r_i\mid y_i,X_i,\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)}{\sum_{y_i=0}^{m_i}\int p(y_i\mid X_i,\theta^r)\,p(x_{2i}\mid x_{1i})\,p(r_i\mid y_i,X_i,\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)\,\mathrm{d}x_{2i}} \propto p(y_i,x_{2i},r_i,s_i\mid x_{1i},\theta^r), \qquad (11)$$
$$\begin{aligned} p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r) &= \frac{p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)}{\int p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)\,\mathrm{d}x_{2i}} \propto p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r) \\ &\propto \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \end{aligned} \qquad (12)$$
$$p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) = \frac{p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r)}{\sum_{y_i=0}^{m_i} p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r)} \propto p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r). \qquad (13)$$
Since $X_2$ is a continuous random variable, $(y_i,X_i^t)_{mis}$ can take infinitely many values. Thus the weights (11)–(13) cannot be computed explicitly, and neither can $Q(\theta,\theta^r)$ be computed exactly. In this situation, we can use a Gibbs sampler or the Metropolis–Hastings algorithm to simulate a sample of $(y_i,X_i^t)_{mis}$ values and use the associated empirical distribution to approximate the weights. The conditional expectation $Q(\theta,\theta^r)$ is then calculated by a Monte Carlo approximation, which leads to a Monte Carlo EM (MCEM) algorithm; a small sketch of this approximation follows.
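As a minimal sketch of this Monte Carlo approximation (ours, under the assumption that `imputations` holds $K$ completed copies of the data, with missing $(y, x_2)$ entries simulated under $\theta^r$, and that `complete_data_loglik` is the sketch from Section 2):

```python
import numpy as np

def mc_Q(params, imputations, x1, r, s, m):
    """Monte Carlo approximation of Q(theta, theta^r): the average of the
    complete-data log-likelihood (7) over K simulated completions of the data."""
    vals = [complete_data_loglik(params, y_k, x1, x2_k, r, s, m)
            for (y_k, x2_k) in imputations]
    return float(np.mean(vals))
```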
3.2. Metropolis–Hastings algorithm
To generate random samples from the weight functions (11)–(13), it is particularly convenient to use a Metropolis–Hastings (MH) algorithm [10], in which we choose a normal distribution as the operating transition density together with an easily verified acceptance–rejection condition. When some of the $y_i$ and $x_{2i}$ are missing, the algorithm uses the following steps to generate a sample of $(y_i,x_{2i})_{mis}$ [11]. When $\theta^r$, $m_i$, $x_{1i}$ and $x_{2i}$ are given, it is easy to generate $y_i$ from the binomial distribution $B(m_i,p_i)$, where
$$p_i = \exp\{x_i^t\beta^r\}/(1+\exp\{x_i^t\beta^r\}). \qquad (14)$$
So we present an MH algorithm for generating $\{x_{2i}\}_{mis}$ given $x_{i,obs}$, $y_i$ and $\theta^r$; a runnable sketch follows the steps.
Step (1) Initialize with $\theta^r$, $x_{1i,obs}$, $r_i$, $s_i$.
Step (2) Repeat the following steps for $k = 0,1,\dots,n-1$:
1. Generate $(y^{(k)}_{mis}, x^{(k)}_{2,mis})$ from (11).
2. Generate $x_2^{(k)}$ and $y^{(k)}$ from their distributions, where $x_2 \sim N(\mu_2,\sigma_2^2)$ and $y \sim B(m_k,p_k)$, with $p_k$ given by (14) for the generated $x^{(k)}$.
3. Compute the acceptance probability
$$a^{(k)} = a\big(x^{(k)}_{2,mis}, p_k\big) = \min\left\{\frac{p_k\,P\big(x^{(k)}_{2,mis}\mid x_2^{(k)}, y^{(k)}\big)}{p\big(x^{(k)}_{2,mis}\big)\,P\big(x_2^{(k)}, y^{(k)}\mid x^{(k)}_{2,mis}\big)},\ 1\right\}, \qquad (15)–(16)$$
where $p\big(x^{(k)}_{2,mis}\big)$ is calculated from $x^{(k)}_{2,mis}$ according to Eq. (14).
4. Take
$$x^{(k+1)}_{2,mis} = \begin{cases} x_2^{(k)} & \text{with probability } a^{(k)},\\ x^{(k)}_{2,mis} & \text{with probability } 1-a^{(k)}. \end{cases} \qquad (17)$$
Step (3) Obtain the sample $\{x^{(1)}_{2,mis},\dots,x^{(n)}_{2,mis}\}$.
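As a concrete sketch of these steps, the following independence-type MH sampler (our illustration, not the authors' code) draws a chain for a single missing $x_{2i}$; the target $f$ is any of the unnormalized weights (11)–(13) viewed as a function of $x_{2i}$, and the proposal is the $N(\mu_2,\sigma_2^2)$ density, as used in Section 5.

```python
import numpy as np

rng = np.random.default_rng(2008)

def mh_missing_x2(f, mu2, sigma2sq, K, x_init):
    """Independence Metropolis-Hastings chain for one missing x2 value.
    f: unnormalized target density of x2 (e.g. the weight (12));
    the proposal g is the N(mu2, sigma2sq) density."""
    sd = np.sqrt(sigma2sq)
    g = lambda x: np.exp(-(x - mu2) ** 2 / (2 * sigma2sq))  # unnormalized proposal
    chain = np.empty(K)
    x = x_init
    for k in range(K):
        cand = rng.normal(mu2, sd)                          # candidate from g
        a = min(1.0, (f(cand) * g(x)) / (f(x) * g(cand)))   # acceptance ratio, cf. (27)
        if rng.uniform() < a:                               # accept with probability a
            x = cand
        chain[k] = x
    return chain
```

With the sampled $x_{2i}$ values in hand, the corresponding $y_i$ can be drawn from $B(m_i,p_i)$ with $p_i$ computed from (14).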
3.3. M-step and convergence
We now turn to the M-step of the MCEM algorithm, where we need to find the value of $\theta$, say $\theta'$, at which $Q(\theta,\theta')$ attains its maximum. This can be done by solving the score equation, which sets to 0 the derivative of $Q(\theta,\theta')$ with respect to $\theta$. The Newton–Raphson method is used to solve the score equation. The parameters $\theta^{r+1} = (\beta,\delta,\omega,\alpha,\tau)$ in the M-step at the $(r+1)$st EM iteration and the $(t+1)$st Newton–Raphson iteration take the form (for $\beta$, for example)
$$\beta^{t+1} = \beta^t + \left[-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\beta\,\partial\beta^t}\right]^{-1}_{\beta=\beta^t}\left[\frac{\partial Q(\theta,\theta^r)}{\partial\beta}\right]_{\beta=\beta^t}. \qquad (18)$$
The details of the derivatives used in the iteration are given in the Appendix. The $(r+1)$st estimates of $\mu_2$ and $\sigma_2^2$ are obtained by solving the score equations
$$\frac{\partial}{\partial\mu_2}Q(\theta,\theta^r) = \sum_{i=1}^{n} E(x_{2i}\mid x_{1i},y_i,r_i,s_i) - n\mu_2 = 0, \qquad \frac{\partial}{\partial\sigma_2^2}Q(\theta,\theta^r) = \sum_{i=1}^{n} E\big((x_{2i}-\mu_2)^2\mid x_{1i},y_i,r_i,s_i\big) - n\sigma_2^2 = 0. \qquad (19)$$
Therefore, we take $\mu_2^{r+1}$ and $\sigma_2^{2(r+1)}$ as
$$\mu_2^{r+1} = \frac{1}{n}\sum_{i=1}^{n} E(x_{2i}\mid x_{1i},y_i,r_i,s_i), \qquad (20)$$
$$\sigma_2^{2(r+1)} = \frac{1}{n}\sum_{i=1}^{n} E\big((x_{2i}-\mu_2)^2\mid x_{1i},y_i,r_i,s_i\big), \qquad (21)$$
which are approximated by the sample averages of the simulated and observed values; a minimal sketch of this M-step is given below.
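This sketch (ours) assumes functions `grad_Q` and `hess_Q` that evaluate the Appendix 1 derivatives by Monte Carlo, and an array `x2_completed` holding observed plus simulated $x_2$ values:

```python
import numpy as np

def newton_raphson(b0, grad_Q, hess_Q, tol=1e-8, max_iter=50):
    """Newton-Raphson iteration (18) for one parameter block, e.g. beta:
    b_{t+1} = b_t + [-H(b_t)]^{-1} g(b_t)."""
    b = np.asarray(b0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(-hess_Q(b), grad_Q(b))
        b = b + step
        if np.max(np.abs(step)) < tol:  # stop when the update is negligible
            break
    return b

def update_normal_params(x2_completed):
    """Closed-form updates (20)-(21): sample averages over the observed
    and simulated x2 values."""
    mu2 = x2_completed.mean()
    sigma2sq = ((x2_completed - mu2) ** 2).mean()
    return mu2, sigma2sq
```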
Since the MCEM algorithm is iterative, a convergence check is required to obtain a reliable result. The sequence $\{Q(\theta,\theta^r)\}$ often exhibits an increasing trend and then fluctuates around the value of $Q(\hat\theta,\hat\theta)$ once $r$ becomes sufficiently large. The sequence $\{\theta^r\}$ likewise fluctuates around the MLE $\hat\theta$ when $r$ is sufficiently large. To monitor the convergence of the MCEM algorithm we can plot $\{Q(\theta,\theta^r)\}$ as well as $\{\theta^r\}$ against the iteration number. We terminate the algorithm when the sequence $\{Q(\theta,\theta^r)\}$ becomes stationary; otherwise, we continue after increasing the Monte Carlo precision in the E-step, provided the required calculation remains computationally feasible. A sketch of the overall loop follows.
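Putting the pieces together, here is a hedged sketch of the MCEM loop with the stopping rule just described; `e_step`, `m_step` and `q_value` stand for the routines sketched above and are assumptions of this illustration:

```python
def run_mcem(theta0, e_step, m_step, q_value, max_iter=200, rtol=1e-4):
    """Iterate E- and M-steps, tracking Q(theta^{r+1}, theta^r);
    stop once the Q-sequence looks stationary."""
    theta, q_trace = theta0, []
    for r in range(max_iter):
        draws = e_step(theta)         # simulate the missing data given theta^r (Section 3.2)
        theta = m_step(theta, draws)  # maximize the MC approximation of Q (Section 3.3)
        q_trace.append(q_value(theta, draws))
        if r > 0 and abs(q_trace[-1] - q_trace[-2]) <= rtol * (abs(q_trace[-2]) + 1.0):
            break  # in practice, increase the MC sample size before declaring convergence
    return theta, q_trace             # plot q_trace against r to monitor convergence
```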
4. Standard errors of estimates
It is well known that the maximum likelihood estimate $\hat\theta$ asymptotically follows a normal distribution $MVN(\theta, V(\theta))$ under some regularity conditions. The expected Fisher information matrix $I(\theta)$, which gives the inverse of the variance matrix of $\hat\theta$, is approximated by the observed information matrix $J_{\hat\theta}(Y)$:
$$V(\hat\theta)^{-1} = nE\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \propto n\int\left(-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right)\mathrm{d}x \approx \sum_{i=1}^{n}\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \approx nJ(\hat\theta). \qquad (22)$$
We apply the result of [12] on the information of $\theta$:
$$\text{observed information} = \text{complete information} - \text{missing information}, \qquad (23)$$
so that we have
$$I(\hat\theta) \approx J_{\hat\theta}(Y) = -\frac{\partial^2 \log L(\theta)}{\partial\theta^2} = \left[-\frac{\partial^2 Q(\theta,\hat\theta)}{\partial\theta^2} - \mathrm{Var}_{\theta}\!\left(\sum_{i=1}^{n}\frac{\partial \log L_i(\theta)}{\partial\theta}\right)\right]_{\theta=\hat\theta}, \qquad (24)$$
where $\mathrm{Var}_{\theta}(\cdot)$ is the conditional variance given $(y,X)_{obs}$, $r$, $s$ and $\theta^r$. The details are provided in the Appendix; a sketch of the computation follows.
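A minimal sketch of the standard-error computation via (24) (our illustration): `neg_hess_Q` is $-\partial^2 Q/\partial\theta^2$ from Appendix 1, and `scores` is a $K\times p$ array whose $m$th row is the complete-data score $\sum_i \partial\log L_i(\theta)/\partial\theta$ evaluated on the $m$th imputed data set. Note that we use the centered Monte Carlo variance, whereas Eq. (34) in the Appendix uses uncentered products.

```python
import numpy as np

def observed_information(neg_hess_Q, scores):
    """Louis' formula (24): observed information = complete information
    minus the conditional variance of the complete-data score."""
    return neg_hess_Q - np.cov(scores, rowvar=False, bias=True)

def standard_errors(J):
    """Asymptotic standard errors: square roots of the diagonal of J^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(J)))
```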
5. An illustration
In this section, we give an example to illustrate the MCEM method with a missing response variable and a missing covariate in the logistic regression model. First, we generate the covariates $x_{1i}$ and $x_{2i}$ independently at random, with $x_{1i}\sim N(\mu_1,\sigma_1^2)$ and $x_{2i}\sim N(\mu_2,\sigma_2^2)$. The response variable $y_i$ is generated from a binomial distribution with sample size $m_i$ and probability $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$ computed from the generated $x_{1i}$ and $x_{2i}$. We then apply the missing-data mechanism (3), with $\psi_i$ and $\phi_i$ as defined there, to create missing values of $y_i$ and $x_{2i}$: each $r_i$ and $s_i$ is generated from a Bernoulli distribution with success probability $\psi_i$ and $\phi_i$, respectively. A data set with missing observations generated by this procedure is presented in Table 1, where '–' denotes a missing value and '0' an observed one.

Table 1
A sample data set with missing observations ('–' is missing, '0' is observed) generated by the method described in Section 5

i    1 2 ... n1 | n1+1 ... n2 | n2+1 ... n3 | n3+1 ... n
y    0 0 0 0    | – – –       | 0 0 0       | – – –
x1   0 0 0 0    | 0 0 0       | 0 0 0       | 0 0 0
x2   0 0 0 0    | 0 0 0       | – – –       | – – –

Second, we use the Metropolis–Hastings algorithm to generate samples from the weights (11)–(13) in the E-step. We illustrate the algorithm for each missingness pattern below; a sketch of the data-generation step follows.
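A minimal data-generation sketch (our illustration; all parameter values are placeholders chosen by the user):

```python
import numpy as np

rng = np.random.default_rng(197)

def generate_data(n, m, beta, delta, omega, alpha, tau, mu1, mu2, s1sq, s2sq):
    """Simulate the data of Section 5: normal covariates, a binomial
    response (1), and non-ignorable missing indicators (2)-(3)."""
    x1 = rng.normal(mu1, np.sqrt(s1sq), n)
    x2 = rng.normal(mu2, np.sqrt(s2sq), n)
    X = np.column_stack([np.ones(n), x1, x2])  # design with intercept
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # success probability (1)
    y = rng.binomial(m, p)
    psi = 1.0 / (1.0 + np.exp(-(delta[0]*x1 + delta[1]*x2 + omega*y)))  # (3)
    phi = 1.0 / (1.0 + np.exp(-(alpha[0]*x1 + alpha[1]*x2 + tau*y)))    # (3)
    r = rng.binomial(1, psi)             # r_i = 1 means y_i missing, see (2)
    s = rng.binomial(1, phi)             # s_i = 1 means x_2i missing, see (2)
    y_obs = np.where(r == 1, np.nan, y)  # mask the missing values
    x2_obs = np.where(s == 1, np.nan, x2)
    return y_obs, x1, x2_obs, r, s
```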
5.1. Both y and x2 are missing
1. The target probability density is defined by Eq. (25), and the candidate density is the unit-variance normal kernel in Eq. (26):
$$\begin{aligned} f(X) &= p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r) \propto p(y_i,x_{2i},r_i,s_i\mid x_{1i},\theta^r) \\ &\propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i} \\ &\quad\times (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \end{aligned} \qquad (25)$$
$$g(X) = \exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2}\right\}. \qquad (26)$$
2. To run the Metropolis–Hastings algorithm, we generate an initial sample $x_{2,0} = (x_{2,01},\dots,x_{2,0J})$, $j = 1,2,\dots,J$, from the normal distribution $N(\mu_2,\sigma_2^2)$. We then generate a candidate $\tilde x_2$ from $N(\mu_2,\sigma_2^2)$ and $u$ from the uniform distribution on $(0,1)$. After computing $a(x_{2,k-1},\tilde x_2)$, if $u$ is smaller than $a$ we set $x_{2,k} = \tilde x_2$; otherwise $x_{2,k} = x_{2,k-1}$. We generate $K$ samples by repeating this step for $k = 1,2,\dots,K$. We can then generate $y_i$ from the binomial distribution with $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$ computed using the generated $x_{2i}$ sample. Here
$$a(x_{k-1},\tilde x) = \min\left\{\frac{f(\tilde x)\,g(x_{k-1})}{f(x_{k-1})\,g(\tilde x)},\ 1\right\}. \qquad (27)$$
5.2. Only x2 is missing
The target probability density is defined by Eq. (28), and the candidate density is the unit-variance normal kernel in Eq. (29); otherwise the method is the same as in the case when both $y_i$ and $x_{2i}$ are missing:
$$f(X) = p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r) \propto p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r) \propto \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \qquad (28)$$
$$g(X) = \exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2}\right\}. \qquad (29)$$
5.3. Only y is missing
The target probability density is defined by Eq. (30), and the candidate density is the binomial distribution in Eq. (31):
$$f(X) = p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) \propto p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r) \propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i}, \qquad (30)$$
$$g(X) = \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i}. \qquad (31)$$
A sketch for this case follows.
6. Summary
An algorithm for estimating parameters in logistic linear models with incomplete data has been proposed. When some of the response and covariate observations are missing non-ignorably, maximum likelihood estimation is considered. The Metropolis–Hastings (MH) algorithm, used to compute the conditional expectation of the log-likelihood function, is implemented within the proposed Monte Carlo EM (Expectation and Maximization) algorithm, and Newton–Raphson iteration is used in the M-step. The standard errors of the MLEs are also obtained using the observed Fisher information matrix. Details of the MH algorithm, the derivatives needed in the M-step, and formulas for the observed Fisher information matrix are given, together with an illustration.
Acknowledgements
This paper was started while the third author was visiting the Department of Statistics, La Trobe University, Australia. She thanks Dr. Richard Huggins and the staff of the department for their hospitality and support.
Appendix 1. Derivatives of $Q(\theta,\theta^r)$ for the M-step
Iterating the E-step and M-step, the $(r+1)$st Newton–Raphson estimates of $\theta^{r+1} = (\beta,\delta,\omega,\alpha,\tau)$ can be obtained using the following derivatives.
$$\begin{aligned} \frac{\partial Q(\theta,\theta^r)}{\partial\beta} &= \sum_{i=1}^{n_1} x_i y_i + \sum_{i=n_1+1}^{n_2} E(x_i y_i\mid x_i,\theta^r) + \sum_{i=n_2+1}^{n_3} E(x_i y_i\mid x_{obs},y_i,\theta^r) + \sum_{i=n_3+1}^{n} E(x_i y_i\mid x_{obs},\theta^r) \\ &\quad - \sum_{i=1}^{n_1} m_i x_i p_i - \sum_{i=n_1+1}^{n_2} E(m_i x_i p_i\mid x_i,\theta^r) - \sum_{i=n_2+1}^{n_3} E(m_i x_i p_i\mid x_{obs},y_i,\theta^r) - \sum_{i=n_3+1}^{n} E(m_i x_i p_i\mid x_{obs},\theta^r), \end{aligned}$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\beta\,\partial\beta^t} = -\sum_{i=1}^{n_1} m_i p_i(1-p_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_i,\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
where $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$,
$$\frac{\partial Q(\theta,\theta^r)}{\partial\delta} = \sum_{i=1}^{n_1} x_i(r_i-\psi_i) + \sum_{i=n_1+1}^{n_2} E\big(x_i(r_i-\psi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(x_i(r_i-\psi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(x_i(r_i-\psi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\delta\,\partial\delta^t} = -\sum_{i=1}^{n_1} \psi_i(1-\psi_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
$$\frac{\partial Q(\theta,\theta^r)}{\partial\omega} = \sum_{i=1}^{n_1} y_i(r_i-\psi_i) + \sum_{i=n_1+1}^{n_2} E\big(y_i(r_i-\psi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(y_i(r_i-\psi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(y_i(r_i-\psi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\omega^2} = -\sum_{i=1}^{n_1} y_i^2\,\psi_i(1-\psi_i) - \sum_{i=n_1+1}^{n_2} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},\theta^r\big),$$
where $\psi_i = \exp\{x_i^t\delta + y_i\omega\}/(1+\exp\{x_i^t\delta + y_i\omega\})$,
$$\frac{\partial Q(\theta,\theta^r)}{\partial\alpha} = \sum_{i=1}^{n_1} x_i(s_i-\phi_i) + \sum_{i=n_1+1}^{n_2} E\big(x_i(s_i-\phi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(x_i(s_i-\phi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(x_i(s_i-\phi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\alpha\,\partial\alpha^t} = -\sum_{i=1}^{n_1} \phi_i(1-\phi_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
$$\frac{\partial Q(\theta,\theta^r)}{\partial\tau} = \sum_{i=1}^{n_1} y_i(s_i-\phi_i) + \sum_{i=n_1+1}^{n_2} E\big(y_i(s_i-\phi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(y_i(s_i-\phi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(y_i(s_i-\phi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\tau^2} = -\sum_{i=1}^{n_1} y_i^2\,\phi_i(1-\phi_i) - \sum_{i=n_1+1}^{n_2} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},\theta^r\big),$$
where $\phi_i = \exp\{x_i^t\alpha + y_i\tau\}/(1+\exp\{x_i^t\alpha + y_i\tau\})$.
Appendix 2. Observed Fisher information matrix
The expected Fisher information matrix $I(\theta)$, which gives the inverse of the variance matrix of $\hat\theta$, is approximated by the observed information matrix $J_{\hat\theta}(Y)$:
$$V(\hat\theta)^{-1} = nE\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \propto n\int\left(-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right)\mathrm{d}x \approx \sum_{i=1}^{n}\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \approx nI(\hat\theta), \qquad (32)$$
$$I(\hat\theta) \approx J_{\hat\theta}(Y) = \left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} = \left[-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta^2} - \mathrm{Var}_{\theta^r}\!\left(\sum_{i=1}^{n}\frac{\partial \log L_i(\theta)}{\partial\theta}\right)\right]_{\theta=\hat\theta}, \qquad (33)$$
where $\mathrm{Var}_{\theta^r}(\cdot)$ is the conditional variance given $(y,X)_{obs}$, $r$, $s$ and $\theta^r$.
Order the components of $\theta$ as $(\theta_1,\dots,\theta_7) = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$. The observed information matrix is the symmetric $7\times 7$ array of negative second derivatives,
$$J_{\hat\theta}(Y) = \left[-\frac{\partial^2 \log L(\theta)}{\partial\theta_j\,\partial\theta_k}\right]_{j,k=1,\dots,7},$$
and the first term of (33) is block-diagonal in these components:
$$-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta^2} = \mathrm{diag}\left(-\frac{\partial^2 Q}{\partial\beta\,\partial\beta^t},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\delta^2} & -\frac{\partial^2 Q}{\partial\delta\,\partial\omega} \\ -\frac{\partial^2 Q}{\partial\delta\,\partial\omega} & -\frac{\partial^2 Q}{\partial\omega^2} \end{pmatrix},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\alpha^2} & -\frac{\partial^2 Q}{\partial\alpha\,\partial\tau} \\ -\frac{\partial^2 Q}{\partial\alpha\,\partial\tau} & -\frac{\partial^2 Q}{\partial\tau^2} \end{pmatrix},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\mu_2^2} & -\frac{\partial^2 Q}{\partial\mu_2\,\partial\sigma_2^2} \\ -\frac{\partial^2 Q}{\partial\mu_2\,\partial\sigma_2^2} & -\frac{\partial^2 Q}{\partial(\sigma_2^2)^2} \end{pmatrix}\right).$$
Now we can estimate $J_{\hat\theta}(Y)$ element by element. Write $u_{j,i}$ for the contribution of observation $i$ to the complete-data score with respect to the $j$th component of $\theta = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$:
$$u_{\beta,i} = x_i y_i, \quad u_{\delta,i} = x_i(r_i-\psi_i), \quad u_{\omega,i} = y_i(r_i-\psi_i), \quad u_{\alpha,i} = x_i(s_i-\phi_i), \quad u_{\tau,i} = y_i(s_i-\phi_i),$$
$$u_{\mu_2,i} = \frac{x_{2i}-\mu_2}{\sigma_2^2}, \qquad u_{\sigma_2^2,i} = -\frac{1}{2\sigma_2^2} + \frac{(x_{2i}-\mu_2)^2}{2\sigma_2^4}.$$
The $(j,k)$ element of $J_{\hat\theta}(Y)$ is then approximated by
$$J_{\hat\theta}(Y)_{jk} = -\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta_j\,\partial\theta_k} - \frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K} u_{j,i}^{(m)}\big(u_{k,i}^{(m)}\big)^t, \qquad (34)$$
where $K$ is the Monte Carlo sample size, $u_{j,i}^{(m)}$ denotes $u_{j,i}$ evaluated with the missing components of $(y_i,x_{2i})$ replaced by their $m$th simulated values, and the term $-\partial^2 Q/\partial\theta_j\,\partial\theta_k$ is the corresponding entry of the block-diagonal matrix above (zero outside the blocks). For example,
$$J_{\hat\theta}(Y)_{11} = -\frac{\partial^2 \log L(\theta)}{\partial\beta\,\partial\beta^t} - \frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K}\big(x_i y_i^{(m)}\big)\big(x_i y_i^{(m)}\big)^t, \qquad J_{\hat\theta}(Y)_{12} = -\frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K} u_{\beta,i}^{(m)}\big(u_{\delta,i}^{(m)}\big)^t,$$
and similarly for the remaining elements.
References
[1] Y.G. Smirlis, E.K. Maragos, D.K. Despotis, Data envelopment analysis with missing values: an interval DEA approach, Appl. Math. Comput. 177 (2006) 1–10.
[2] M.M. Rueda, S. Gonzalez, A. Arcos, Indirect methods of imputation of missing data based on available units, Appl. Math. Comput. 164 (2005) 249–261.
[3] R.J.A. Little, D.B. Rubin, Statistical Analysis with Missing Data, second ed., Wiley, New York, 2002.
[4] S.G. Baker, N.M. Laird, Regression analysis for categorical variables with outcome subject to nonignorable nonresponse, J. Am. Stat. Assoc. 83 (1988) 62–69.
[5] J.G. Ibrahim, S.R. Lipsitz, Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable, Biometrics 52 (1996) 1071–1078.
[6] J.G. Ibrahim, S.R. Lipsitz, Missing covariates in generalized linear models when the missing data mechanism is non-ignorable, J. Royal Stat. Soc. B 61 (1999) 173–190.
[7] G.R. Jahanshahloo, F.H. Lotfi, H.Z. Rezai, F.R. Balf, Using Monte Carlo method for ranking efficient DMUs, Appl. Math. Comput. 162 (2005) 371–379.
[8] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Royal Stat. Soc. B 39 (1977) 1–38.
[9] G.C.G. Wei, M.A. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Am. Stat. Assoc. 85 (1990) 699–704.
[10] C.P. Robert, G. Casella, Monte Carlo Statistical Methods, Springer, Berlin, 1999.
[11] M. Watanabe, K. Yamaguchi (Eds.), The EM Algorithm and Related Statistical Models, Marcel Dekker, New York, 2004.
[12] T.A. Louis, Finding the observed information matrix when using the EM algorithm, J. Royal Stat. Soc. B 44 (1982) 226–233.