Applied Mathematics and Computation 197 (2008) 440–450
Monte Carlo EM algorithm in logistic linear models involving non-ignorable missing data
Jeong-Soo Park a,*, Guoqi Q. Qian b, Yuna Jun c
a Department of Statistics, Chonnam National University, Gwangju, Republic of Korea
b Department of Statistics, La Trobe University, Victoria 3086, Australia
c Samsung Tesco Ltd., Sambu Bldg. 676 Yeoksam-dong, Seoul 135-979, Korea
Abstract
Many data sets obtained from surveys or medical trials include missing observations. Since ignoring the missing information usually causes bias and inefficiency, an algorithm for estimating parameters is proposed based on a likelihood function that takes the missing information into account. A binomial response and a normal explanatory variable model for the missing data are assumed. We fit the model using the Monte Carlo EM (Expectation and Maximization) algorithm. The E-step uses the Metropolis–Hastings algorithm to generate a sample for the missing data, and the M-step uses Newton–Raphson iteration to maximize the likelihood function. Asymptotic variances and standard errors of the maximum likelihood estimates (MLEs) of the parameters are derived using the observed Fisher information.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Conditional expectation; Fisher information matrix; Maximum likelihood estimation; Metropolis–Hastings algorithm; Newton–Raphson iteration; Standard error
1. Introduction
Many data sets obtained from surveys or medical trials include missing observations [1]. When these data sets are analyzed, it is common to use only the complete cases, discarding any case with missing data. However, this may cause problems if the missingness is related to the values of the missing items [2]: parameter estimates can be biased and inefficient [3]. We therefore need methods that exploit the partial information carried by the incompletely observed cases instead of ignoring them. Little and Rubin [3] described many statistical methods for dealing with missing data. Baker and Laird [4] used the EM (Expectation and Maximization) algorithm to obtain maximum likelihood estimates (MLEs) of parameters from incomplete data. Ibrahim and Lipsitz [5,6] presented estimation methods for generalized linear models when the missing-data mechanism is non-ignorable.
doi:10.1016/j.amc.2007.07.080
This work was supported by a Korean Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (R01-2006-000-11087-0).
* Corresponding author. E-mail addresses: [email protected] (J.-S. Park), [email protected] (G.Q. Qian), [email protected] (Y. Jun).
Our proposed method stems from [5,6], and can be thought of as an extended and modified version for a different model.
There are two types of missing data: ignorable and non-ignorable [3]. Missing data are called ignorable (non-ignorable) if the probability of observing a data item is independent of (dependent on) the value of that data item. Data that are missing at random are ignorable, while non-ignorable missing data are not missing at random.
In this paper, we propose a method for estimating parameters in logistic linear models involving non-ignorable missing data. A binomial response and a normal covariate model for the missing data are assumed. The Monte Carlo EM algorithm is used to estimate the parameters [7]. The Metropolis–Hastings algorithm is used in the E-step to generate a sample for the missing data, and Newton–Raphson iteration is used in the M-step to solve the score equation and thereby maximize the conditional expectation of the log-likelihood function. The standard errors of the estimates are calculated from the observed Fisher information matrix.
The rest of this paper is organized as follows. In Section 2, the notation and the model are stated. In Section 3 we derive the E- and M-steps of the Monte Carlo EM algorithm, including the Metropolis–Hastings algorithm and the Newton–Raphson iteration. The calculation of standard errors is described in Section 4. In Section 5 we illustrate our method with an example. A summary is given in the last section. Details of the derivatives for the Newton–Raphson iteration and formulas for the elements of the observed Fisher information matrix are given in the Appendix.
2. Notation and model
Suppose that $y_1,\dots,y_n$ are independent observations, where each $y_i$ has a binomial distribution with sample size $m_i$ and success probability $p_i$. Let $X_i = (X_{1i}, X_{2i})^t$ be a $2\times 1$ random vector of covariates, where $X_{1i}$ and $X_{2i}$ are independent and follow normal distributions with means $\mu_1,\mu_2$ and variances $\sigma_1^2,\sigma_2^2$, respectively. Further, let $\beta^t = (\beta_0,\beta_1,\beta_2)$ be the regression coefficients, assumed to include an intercept. It is also assumed that
$$\mathrm{logit}(p_i) = \log\frac{p_i}{1-p_i} = X_i^t\beta, \quad\text{and}\quad p(y_i\mid X_i,\beta) = \binom{m_i}{y_i}\frac{\exp\{y_i X_i^t\beta\}}{(1+\exp\{X_i^t\beta\})^{m_i}}. \qquad (1)$$
We assume that $X_{1i}$ is completely observed, and that $y_i$ and $X_{2i}$ are partially missing. Our objective is to estimate $\beta,\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$ (by maximum likelihood) and their standard errors from the given data with non-ignorable missing values. Missing-value indicators are introduced [3] as
$$r_i = \begin{cases} 0 & \text{if } y_i \text{ is observed},\\ 1 & \text{if } y_i \text{ is missing},\end{cases} \qquad s_i = \begin{cases} 0 & \text{if } x_{2i} \text{ is observed},\\ 1 & \text{if } x_{2i} \text{ is missing},\end{cases} \qquad (2)$$
with probabilities $P(r_i=1)=\psi_i$ and $P(s_i=1)=\phi_i$. Following [6], the non-ignorable missing-data mechanism is defined as
$$\mathrm{logit}(\psi_i) = \delta_1 X_{1i} + \delta_2 X_{2i} + y_i\omega, \qquad \mathrm{logit}(\phi_i) = \alpha_1 X_{1i} + \alpha_2 X_{2i} + y_i\tau, \quad i=1,2,\dots,n, \qquad (3)$$
where $\delta=(\delta_1,\delta_2)^t$ and $\alpha=(\alpha_1,\alpha_2)^t$, and $\omega$ and $\tau$ are parameters determining the missing-data mechanism. The conditional probability functions for $r_i$ and $s_i$ are derived from Eqs. (1) and (3) as
$$p(r_i\mid X_i,y_i,\delta,\omega) = \frac{\exp\{r_i(X_i^t\delta + y_i\omega)\}}{1+\exp\{X_i^t\delta + y_i\omega\}}, \qquad (4)$$
$$p(s_i\mid X_i,y_i,\alpha,\tau) = \frac{\exp\{s_i(X_i^t\alpha + y_i\tau)\}}{1+\exp\{X_i^t\alpha + y_i\tau\}}. \qquad (5)$$
Now we derive the joint probability function of $y_i, x_{2i}, r_i, s_i$ as
$$\begin{aligned} p(y_i,x_{2i},r_i,s_i\mid x_{1i}) &= p(r_i\mid y_i,X_i,\delta,\omega)\,p(s_i\mid y_i,X_i,\alpha,\tau)\,p(y_i\mid X_i,\beta)\,p(x_{2i}\mid x_{1i}) \\ &\propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i} \\ &\quad\times (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}. \end{aligned} \qquad (6)$$
Therefore, we can write down the complete-data log-likelihood $l(\theta)$ as
$$\begin{aligned} \log L(\theta\mid y_i,X_i,r_i,s_i) &= \sum_{i=1}^n \log\frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} + \sum_{i=1}^n \log\frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} + \sum_{i=1}^n X_i^t\beta\,y_i \\ &\quad - \sum_{i=1}^n m_i\log\big(1+\exp\{X_i^t\beta\}\big) - \frac{n}{2}\log(2\pi\sigma_2^2) - \sum_{i=1}^n \frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}, \end{aligned} \qquad (7)$$
where $\theta = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$ is the parameter vector with respect to which the EM algorithm is developed. The complete-data log-likelihood specifies a model for the joint characterization of the observed data and the associated missing-data mechanism. A minimal code sketch of (7) is given below.
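For concreteness, the complete-data log-likelihood (7) can be coded directly. The following is a minimal NumPy sketch (our illustration, not code from the paper); it assumes the intercept is carried by the design matrix for $\beta$, while the missingness models (3) use $(x_1,x_2)$ without an intercept, as in Section 2.

```python
import numpy as np

def complete_data_loglik(params, y, x1, x2, r, s, m):
    """Complete-data log-likelihood (7).
    params = (beta, delta, omega, alpha, tau, mu2, sigma2sq), where
    beta is a 3-vector (intercept, x1, x2) and delta, alpha are 2-vectors."""
    beta, delta, omega, alpha, tau, mu2, sigma2sq = params
    n = len(y)
    Xb = np.column_stack([np.ones(n), x1, x2])  # design for beta, with intercept
    Xm = np.column_stack([x1, x2])              # design for the missing mechanisms
    eta_y = Xb @ beta                           # X_i^t beta in (1)
    eta_r = Xm @ delta + y * omega              # logit(psi_i) in (3)
    eta_s = Xm @ alpha + y * tau                # logit(phi_i) in (3)
    ll  = np.sum(r * eta_r - np.log1p(np.exp(eta_r)))      # first sum in (7)
    ll += np.sum(s * eta_s - np.log1p(np.exp(eta_s)))      # second sum in (7)
    ll += np.sum(y * eta_y - m * np.log1p(np.exp(eta_y)))  # binomial terms in (7)
    ll += -0.5 * n * np.log(2 * np.pi * sigma2sq)          # normal part for x2
    ll += -np.sum((x2 - mu2) ** 2) / (2 * sigma2sq)
    return ll
```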
3. Monte Carlo EM algorithm
3.1. Algorithm formulation
The MLE of $\beta$ and the other components of $\theta$ are the values maximizing the observed-data likelihood $L(\theta\mid (y,X)_{obs}, r_i, s_i)$, which has a quite intractable analytical form, where $(y,X)_{obs}$ denotes the observed components of $(y,X)$. Rather than directly differentiating $L(\theta\mid (y,X)_{obs}, r_i, s_i)$ with respect to $\theta$, we compute the MLE of $\theta$ using an EM algorithm [8], which involves iterative evaluation and maximization of the conditional expectation of the complete-data log-likelihood $l(\theta)$. If the conditional expectation involved is difficult to evaluate exactly, a Monte Carlo EM (MCEM) algorithm [9] can be used, in which a Gibbs sampler or the Metropolis–Hastings algorithm [10] approximates the conditional expectation.
Specifically, let $\theta'$ be the current estimate of $\theta$ and define the conditional expectation of $l(\theta)$, taken with respect to the conditional distribution of the missing data $(y,X)_{mis}$ given the observed data $(y_i,X_i,r_i,s_i)$ and the value $\theta'$, as follows:
$$Q(\theta,\theta') = E\big[l(\theta)\mid (y,X)_{obs}, r, s, \theta'\big]. \qquad (8)$$
The feasibility of calculating the conditional expectation in $Q(\theta,\theta')$ depends on the complexity of the conditional distribution of the missing data.
The EM algorithm is composed of expectation (E-step) and maximization (M-step) iterations. For the expectation of the complete-data log-likelihood in the E-step, we consider four possible cases: the response variable $y_i$ is missing, the covariate $x_{2i}$ is missing, both are missing, or there are no missing values. The expected log-likelihood is then written as
$$E[l(\theta)\mid X_i,y_i,r_i,s_i] = \begin{cases} \sum_{y_i=0}^{m_i} l(\theta)\,p(y_i\mid X_i,r_i,s_i) & (\text{if } y_i \text{ has missing components}),\\ \int l(\theta)\,p(x_{2i}\mid x_{1i},y_i,r_i,s_i)\,\mathrm{d}x_{2i,mis} & (\text{if } x_{2i} \text{ has missing components}),\\ \sum_{y_i=0}^{m_i}\int l(\theta)\,p(y_i,x_{2i}\mid x_{1i},r_i,s_i)\,\mathrm{d}x_{2i,mis} & (\text{if } y_i \text{ and } x_{2i} \text{ have missing components}),\\ l(\theta) & (\text{for no missing values}), \end{cases} \qquad (9)$$
where $x_{2i,mis}$ denotes the missing component of $x_{2i}$. Eqs. (8) and (9) lead to the conditional expectation of $l(\theta)$, which is our target quantity:
$$\begin{aligned} Q(\theta,\theta^r) &= \sum_{i=1}^{n_1} l(\theta) + \sum_{i=n_1+1}^{n_2}\sum_{y_i=0}^{m_i} l(\theta)\,p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) + \sum_{i=n_2+1}^{n_3}\int l(\theta)\,p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r)\,\mathrm{d}x_{2i,mis} \\ &\quad + \sum_{i=n_3+1}^{n}\sum_{y_i=0}^{m_i}\int l(\theta)\,p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r)\,\mathrm{d}x_{2i,mis}, \end{aligned} \qquad (10)$$
where $n_1,n_2,n_3$ are the corresponding sample sizes, $\theta^r$ is the estimate of $\theta$ at the $r$th iteration, $y_{i,mis}$ is the missing component of $y_i$, $X_{i,obs}$ is the observed component of $X_i$, and $p(y_{i,mis}\mid X_i,r_i,s_i)$, $p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i)$ and $p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i)$ are the conditional probabilities of the missing data given the observed data. These conditional probabilities act as the weights in $Q(\theta,\theta^r)$ and take the following forms:
$$p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r) = \frac{p(y_i\mid X_i,\theta^r)\,p(x_{2i}\mid x_{1i})\,p(r_i\mid y_i,X_i,\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)}{\sum_{y_i=0}^{m_i}\int p(y_i\mid X_i,\theta^r)\,p(x_{2i}\mid x_{1i})\,p(r_i\mid y_i,X_i,\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)\,\mathrm{d}x_{2i}} \propto p(y_i,x_{2i},r_i,s_i\mid x_{1i},\theta^r), \qquad (11)$$
$$\begin{aligned} p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r) &= \frac{p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)}{\int p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r)\,\mathrm{d}x_{2i}} \propto p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r) \\ &\propto \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \end{aligned} \qquad (12)$$
$$p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) = \frac{p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r)}{\sum_{y_i=0}^{m_i} p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r)} \propto p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r). \qquad (13)$$
Since $X_2$ is a continuous random variable, $(y_i,X_i^t)_{mis}$ can take infinitely many values. Thus the weights (11)–(13) cannot be computed explicitly, and neither can $Q(\theta,\theta^r)$ be computed exactly. In this situation, we can use a Gibbs sampler or the Metropolis–Hastings algorithm to simulate a sample of $(y_i,X_i^t)_{mis}$ values and use the associated empirical distribution to approximate the weights. The conditional expectation $Q(\theta,\theta^r)$ is then calculated by a Monte Carlo approximation, which leads to a Monte Carlo EM (MCEM) algorithm; a small sketch of this approximation follows.
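As a minimal sketch of this Monte Carlo approximation (ours, under the assumption that `imputations` holds $K$ completed copies of the data, with missing $(y, x_2)$ entries simulated under $\theta^r$, and that `complete_data_loglik` is the sketch from Section 2):

```python
import numpy as np

def mc_Q(params, imputations, x1, r, s, m):
    """Monte Carlo approximation of Q(theta, theta^r): the average of the
    complete-data log-likelihood (7) over K simulated completions of the data."""
    vals = [complete_data_loglik(params, y_k, x1, x2_k, r, s, m)
            for (y_k, x2_k) in imputations]
    return float(np.mean(vals))
```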
3.2. Metropolis–Hastings algorithm
To generate random samples from the weight functions (11)–(13), it is particularly convenient to use a Metropolis–Hastings (MH) algorithm [10], in which we choose a normal distribution as the operating transition density together with an easily verified acceptance–rejection condition. When some of the $y_i$ and $x_{2i}$ are missing, the algorithm uses the following steps to generate a sample of $(y_i,x_{2i})_{mis}$ [11]. When $\theta^r$, $m_i$, $x_{1i}$ and $x_{2i}$ are given, it is easy to generate $y_i$ from the binomial distribution $B(m_i,p_i)$, where
$$p_i = \exp\{x_i^t\beta^r\}/(1+\exp\{x_i^t\beta^r\}). \qquad (14)$$
So we present an MH algorithm for generating $\{x_{2i}\}_{mis}$ given $x_{i,obs}$, $y_i$ and $\theta^r$; a runnable sketch follows the steps.
Step (1) Initialize with $\theta^r$, $x_{1i,obs}$, $r_i$, $s_i$.
Step (2) Repeat the following steps for $k = 0,1,\dots,n-1$:
1. Generate $(y^{(k)}_{mis}, x^{(k)}_{2,mis})$ from (11).
2. Generate $x_2^{(k)}$ and $y^{(k)}$ from their distributions, where $x_2 \sim N(\mu_2,\sigma_2^2)$ and $y \sim B(m_k,p_k)$, with $p_k$ given by (14) for the generated $x^{(k)}$.
3. Compute the acceptance probability
$$a^{(k)} = a\big(x^{(k)}_{2,mis}, p_k\big) = \min\left\{\frac{p_k\,P\big(x^{(k)}_{2,mis}\mid x_2^{(k)}, y^{(k)}\big)}{p\big(x^{(k)}_{2,mis}\big)\,P\big(x_2^{(k)}, y^{(k)}\mid x^{(k)}_{2,mis}\big)},\ 1\right\}, \qquad (15)–(16)$$
where $p\big(x^{(k)}_{2,mis}\big)$ is calculated from $x^{(k)}_{2,mis}$ according to Eq. (14).
4. Take
$$x^{(k+1)}_{2,mis} = \begin{cases} x_2^{(k)} & \text{with probability } a^{(k)},\\ x^{(k)}_{2,mis} & \text{with probability } 1-a^{(k)}. \end{cases} \qquad (17)$$
Step (3) Obtain the sample $\{x^{(1)}_{2,mis},\dots,x^{(n)}_{2,mis}\}$.
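As a concrete sketch of these steps, the following independence-type MH sampler (our illustration, not the authors' code) draws a chain for a single missing $x_{2i}$; the target $f$ is any of the unnormalized weights (11)–(13) viewed as a function of $x_{2i}$, and the proposal is the $N(\mu_2,\sigma_2^2)$ density, as used in Section 5.

```python
import numpy as np

rng = np.random.default_rng(2008)

def mh_missing_x2(f, mu2, sigma2sq, K, x_init):
    """Independence Metropolis-Hastings chain for one missing x2 value.
    f: unnormalized target density of x2 (e.g. the weight (12));
    the proposal g is the N(mu2, sigma2sq) density."""
    sd = np.sqrt(sigma2sq)
    g = lambda x: np.exp(-(x - mu2) ** 2 / (2 * sigma2sq))  # unnormalized proposal
    chain = np.empty(K)
    x = x_init
    for k in range(K):
        cand = rng.normal(mu2, sd)                          # candidate from g
        a = min(1.0, (f(cand) * g(x)) / (f(x) * g(cand)))   # acceptance ratio, cf. (27)
        if rng.uniform() < a:                               # accept with probability a
            x = cand
        chain[k] = x
    return chain
```

With the sampled $x_{2i}$ values in hand, the corresponding $y_i$ can be drawn from $B(m_i,p_i)$ with $p_i$ computed from (14).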
3.3. M-step and convergence
We now turn to the M-step of the MCEM algorithm, where we need to find the value of $\theta$, say $\theta'$, at which $Q(\theta,\theta')$ attains its maximum. This can be done by solving the score equation, which sets to 0 the derivative of $Q(\theta,\theta')$ with respect to $\theta$. The Newton–Raphson method is used to solve the score equation. The parameters $\theta^{r+1} = (\beta,\delta,\omega,\alpha,\tau)$ in the M-step at the $(r+1)$st EM iteration and the $(t+1)$st Newton–Raphson iteration take the form (for $\beta$, for example)
$$\beta^{t+1} = \beta^t + \left[-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\beta\,\partial\beta^t}\right]^{-1}_{\beta=\beta^t}\left[\frac{\partial Q(\theta,\theta^r)}{\partial\beta}\right]_{\beta=\beta^t}. \qquad (18)$$
The details of the derivatives used in the iteration are given in the Appendix. The $(r+1)$st estimates of $\mu_2$ and $\sigma_2^2$ are obtained by solving the score equations
$$\frac{\partial}{\partial\mu_2}Q(\theta,\theta^r) = \sum_{i=1}^{n} E(x_{2i}\mid x_{1i},y_i,r_i,s_i) - n\mu_2 = 0, \qquad \frac{\partial}{\partial\sigma_2^2}Q(\theta,\theta^r) = \sum_{i=1}^{n} E\big((x_{2i}-\mu_2)^2\mid x_{1i},y_i,r_i,s_i\big) - n\sigma_2^2 = 0. \qquad (19)$$
Therefore, we take $\mu_2^{r+1}$ and $\sigma_2^{2(r+1)}$ as
$$\mu_2^{r+1} = \frac{1}{n}\sum_{i=1}^{n} E(x_{2i}\mid x_{1i},y_i,r_i,s_i), \qquad (20)$$
$$\sigma_2^{2(r+1)} = \frac{1}{n}\sum_{i=1}^{n} E\big((x_{2i}-\mu_2)^2\mid x_{1i},y_i,r_i,s_i\big), \qquad (21)$$
which are approximated by the sample averages of the simulated and observed values; a minimal sketch of this M-step is given below.
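This sketch (ours) assumes functions `grad_Q` and `hess_Q` that evaluate the Appendix 1 derivatives by Monte Carlo, and an array `x2_completed` holding observed plus simulated $x_2$ values:

```python
import numpy as np

def newton_raphson(b0, grad_Q, hess_Q, tol=1e-8, max_iter=50):
    """Newton-Raphson iteration (18) for one parameter block, e.g. beta:
    b_{t+1} = b_t + [-H(b_t)]^{-1} g(b_t)."""
    b = np.asarray(b0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(-hess_Q(b), grad_Q(b))
        b = b + step
        if np.max(np.abs(step)) < tol:  # stop when the update is negligible
            break
    return b

def update_normal_params(x2_completed):
    """Closed-form updates (20)-(21): sample averages over the observed
    and simulated x2 values."""
    mu2 = x2_completed.mean()
    sigma2sq = ((x2_completed - mu2) ** 2).mean()
    return mu2, sigma2sq
```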
Since the MCEM algorithm is iterative, a convergence check is required to obtain a reliable result. The sequence $\{Q(\theta,\theta^r)\}$ often exhibits an increasing trend and then fluctuates around the value of $Q(\hat\theta,\hat\theta)$ once $r$ becomes sufficiently large. The sequence $\{\theta^r\}$ likewise fluctuates around the MLE $\hat\theta$ when $r$ is sufficiently large. To monitor the convergence of the MCEM algorithm we can plot $\{Q(\theta,\theta^r)\}$ as well as $\{\theta^r\}$ against the iteration number. We terminate the algorithm when the sequence $\{Q(\theta,\theta^r)\}$ becomes stationary; otherwise, we continue after increasing the Monte Carlo precision in the E-step, provided the required calculation remains computationally feasible. A sketch of the overall loop follows.
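Putting the pieces together, here is a hedged sketch of the MCEM loop with the stopping rule just described; `e_step`, `m_step` and `q_value` stand for the routines sketched above and are assumptions of this illustration:

```python
def run_mcem(theta0, e_step, m_step, q_value, max_iter=200, rtol=1e-4):
    """Iterate E- and M-steps, tracking Q(theta^{r+1}, theta^r);
    stop once the Q-sequence looks stationary."""
    theta, q_trace = theta0, []
    for r in range(max_iter):
        draws = e_step(theta)         # simulate the missing data given theta^r (Section 3.2)
        theta = m_step(theta, draws)  # maximize the MC approximation of Q (Section 3.3)
        q_trace.append(q_value(theta, draws))
        if r > 0 and abs(q_trace[-1] - q_trace[-2]) <= rtol * (abs(q_trace[-2]) + 1.0):
            break  # in practice, increase the MC sample size before declaring convergence
    return theta, q_trace             # plot q_trace against r to monitor convergence
```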
4. Standard errors of estimates
It is well known that the maximum likelihood estimate $\hat\theta$ asymptotically follows a normal distribution $MVN(\theta, V(\theta))$ under some regularity conditions. The expected Fisher information matrix $I(\theta)$, which gives the inverse of the variance matrix of $\hat\theta$, is approximated by the observed information matrix $J_{\hat\theta}(Y)$:
$$V(\hat\theta)^{-1} = nE\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \propto n\int\left(-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right)\mathrm{d}x \approx \sum_{i=1}^{n}\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \approx nJ(\hat\theta). \qquad (22)$$
We apply the result of [12] on the information of $\theta$:
$$\text{observed information} = \text{complete information} - \text{missing information}, \qquad (23)$$
so that we have
$$I(\hat\theta) \approx J_{\hat\theta}(Y) = -\frac{\partial^2 \log L(\theta)}{\partial\theta^2} = \left[-\frac{\partial^2 Q(\theta,\hat\theta)}{\partial\theta^2} - \mathrm{Var}_{\theta}\!\left(\sum_{i=1}^{n}\frac{\partial \log L_i(\theta)}{\partial\theta}\right)\right]_{\theta=\hat\theta}, \qquad (24)$$
where $\mathrm{Var}_{\theta}(\cdot)$ is the conditional variance given $(y,X)_{obs}$, $r$, $s$ and $\theta^r$. The details are provided in the Appendix; a sketch of the computation follows.
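A minimal sketch of the standard-error computation via (24) (our illustration): `neg_hess_Q` is $-\partial^2 Q/\partial\theta^2$ from Appendix 1, and `scores` is a $K\times p$ array whose $m$th row is the complete-data score $\sum_i \partial\log L_i(\theta)/\partial\theta$ evaluated on the $m$th imputed data set. Note that we use the centered Monte Carlo variance, whereas Eq. (34) in the Appendix uses uncentered products.

```python
import numpy as np

def observed_information(neg_hess_Q, scores):
    """Louis' formula (24): observed information = complete information
    minus the conditional variance of the complete-data score."""
    return neg_hess_Q - np.cov(scores, rowvar=False, bias=True)

def standard_errors(J):
    """Asymptotic standard errors: square roots of the diagonal of J^{-1}."""
    return np.sqrt(np.diag(np.linalg.inv(J)))
```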
5. An illustration
In this section, we give an example to illustrate the MCEM method with a missing response variable and a missing covariate in the logistic regression model. First, we generate the covariates $x_{1i}$ and $x_{2i}$ independently at random, with $x_{1i}\sim N(\mu_1,\sigma_1^2)$ and $x_{2i}\sim N(\mu_2,\sigma_2^2)$. The response variable $y_i$ is generated from a binomial distribution with sample size $m_i$ and probability $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$ computed from the generated $x_{1i}$ and $x_{2i}$. We then apply the missing-data mechanism (3), with $\psi_i$ and $\phi_i$ as defined there, to create missing values of $y_i$ and $x_{2i}$: each $r_i$ and $s_i$ is generated from a Bernoulli distribution with success probability $\psi_i$ and $\phi_i$, respectively. A data set with missing observations generated by this procedure is presented in Table 1, where '–' denotes a missing value and '0' an observed one.

Table 1
A sample data set with missing observations ('–' is missing, '0' is observed) generated by the method described in Section 5

i    1 2 ... n1 | n1+1 ... n2 | n2+1 ... n3 | n3+1 ... n
y    0 0 0 0    | – – –       | 0 0 0       | – – –
x1   0 0 0 0    | 0 0 0       | 0 0 0       | 0 0 0
x2   0 0 0 0    | 0 0 0       | – – –       | – – –

Second, we use the Metropolis–Hastings algorithm to generate samples from the weights (11)–(13) in the E-step. We illustrate the algorithm for each missingness pattern below; a sketch of the data-generation step follows.
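A minimal data-generation sketch (our illustration; all parameter values are placeholders chosen by the user):

```python
import numpy as np

rng = np.random.default_rng(197)

def generate_data(n, m, beta, delta, omega, alpha, tau, mu1, mu2, s1sq, s2sq):
    """Simulate the data of Section 5: normal covariates, a binomial
    response (1), and non-ignorable missing indicators (2)-(3)."""
    x1 = rng.normal(mu1, np.sqrt(s1sq), n)
    x2 = rng.normal(mu2, np.sqrt(s2sq), n)
    X = np.column_stack([np.ones(n), x1, x2])  # design with intercept
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # success probability (1)
    y = rng.binomial(m, p)
    psi = 1.0 / (1.0 + np.exp(-(delta[0]*x1 + delta[1]*x2 + omega*y)))  # (3)
    phi = 1.0 / (1.0 + np.exp(-(alpha[0]*x1 + alpha[1]*x2 + tau*y)))    # (3)
    r = rng.binomial(1, psi)             # r_i = 1 means y_i missing, see (2)
    s = rng.binomial(1, phi)             # s_i = 1 means x_2i missing, see (2)
    y_obs = np.where(r == 1, np.nan, y)  # mask the missing values
    x2_obs = np.where(s == 1, np.nan, x2)
    return y_obs, x1, x2_obs, r, s
```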
5.1. Both y and x2 are missing
1. The target probability density is defined by Eq. (25), and the candidate density is the unit-variance normal kernel in Eq. (26):
$$\begin{aligned} f(X) &= p(y_{i,mis},x_{2i,mis}\mid X_{i,obs},r_i,s_i,\theta^r) \propto p(y_i,x_{2i},r_i,s_i\mid x_{1i},\theta^r) \\ &\propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i} \\ &\quad\times (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \end{aligned} \qquad (25)$$
$$g(X) = \exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2}\right\}. \qquad (26)$$
2. To run the Metropolis–Hastings algorithm, we generate an initial sample $x_{2,0} = (x_{2,01},\dots,x_{2,0J})$, $j = 1,2,\dots,J$, from the normal distribution $N(\mu_2,\sigma_2^2)$. We then generate a candidate $\tilde x_2$ from $N(\mu_2,\sigma_2^2)$ and $u$ from the uniform distribution on $(0,1)$. After computing $a(x_{2,k-1},\tilde x_2)$, if $u$ is smaller than $a$ we set $x_{2,k} = \tilde x_2$; otherwise $x_{2,k} = x_{2,k-1}$. We generate $K$ samples by repeating this step for $k = 1,2,\dots,K$. We can then generate $y_i$ from the binomial distribution with $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$ computed using the generated $x_{2i}$ sample. Here
$$a(x_{k-1},\tilde x) = \min\left\{\frac{f(\tilde x)\,g(x_{k-1})}{f(x_{k-1})\,g(\tilde x)},\ 1\right\}. \qquad (27)$$
5.2. Only x2 is missing
The target probability density is defined by Eq. (28), and the candidate density is the unit-variance normal kernel in Eq. (29); otherwise the method is the same as in the case when both $y_i$ and $x_{2i}$ are missing:
$$f(X) = p(x_{2i,mis}\mid X_{i,obs},y_i,r_i,s_i,\theta^r) \propto p(x_{2i}\mid x_{1i},\theta^r)\,p(s_i\mid y_i,X_i,\theta^r) \propto \frac{\exp\{s_i(X_i^t\alpha+y_i\tau)\}}{1+\exp\{X_i^t\alpha+y_i\tau\}} \cdot (2\pi\sigma_2^2)^{-1/2}\exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2\sigma_2^2}\right\}, \qquad (28)$$
$$g(X) = \exp\left\{-\frac{(x_{2i}-\mu_2)^2}{2}\right\}. \qquad (29)$$
5.3. Only y is missing
The target probability density is defined by Eq. (30), and the candidate density is the binomial distribution in Eq. (31):
$$f(X) = p(y_{i,mis}\mid X_i,r_i,s_i,\theta^r) \propto p(y_i\mid X_i,\theta^r)\,p(r_i\mid y_i,X_i,\theta^r) \propto \frac{\exp\{r_i(X_i^t\delta+y_i\omega)\}}{1+\exp\{X_i^t\delta+y_i\omega\}} \cdot \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i}, \qquad (30)$$
$$g(X) = \exp\{X_i^t\beta\,y_i\}\,\big(1+\exp\{X_i^t\beta\}\big)^{-m_i}. \qquad (31)$$
A sketch for this case follows.
6. Summary
An algorithm for estimating parameters in logistic linear models with incomplete data has been proposed. When some of the response and covariate observations are missing non-ignorably, maximum likelihood estimation is considered. The Metropolis–Hastings (MH) algorithm, used to compute the conditional expectation of the log-likelihood function, is implemented within the proposed Monte Carlo EM (Expectation and Maximization) algorithm, and Newton–Raphson iteration is used in the M-step. The standard errors of the MLEs are also obtained using the observed Fisher information matrix. Details of the MH algorithm, the derivatives needed in the M-step, and formulas for the observed Fisher information matrix are given, together with an illustration.
Acknowledgements
This paper was started while the third author was visiting the Department of Statistics, La Trobe University, Australia. She thanks Dr. Richard Huggins and the staff of the department for their hospitality and support.
Appendix 1. Derivatives of $Q(\theta,\theta^r)$ for the M-step
Iterating the E-step and M-step, the $(r+1)$st Newton–Raphson estimates of $\theta^{r+1} = (\beta,\delta,\omega,\alpha,\tau)$ can be obtained using the following derivatives.
$$\begin{aligned} \frac{\partial Q(\theta,\theta^r)}{\partial\beta} &= \sum_{i=1}^{n_1} x_i y_i + \sum_{i=n_1+1}^{n_2} E(x_i y_i\mid x_i,\theta^r) + \sum_{i=n_2+1}^{n_3} E(x_i y_i\mid x_{obs},y_i,\theta^r) + \sum_{i=n_3+1}^{n} E(x_i y_i\mid x_{obs},\theta^r) \\ &\quad - \sum_{i=1}^{n_1} m_i x_i p_i - \sum_{i=n_1+1}^{n_2} E(m_i x_i p_i\mid x_i,\theta^r) - \sum_{i=n_2+1}^{n_3} E(m_i x_i p_i\mid x_{obs},y_i,\theta^r) - \sum_{i=n_3+1}^{n} E(m_i x_i p_i\mid x_{obs},\theta^r), \end{aligned}$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\beta\,\partial\beta^t} = -\sum_{i=1}^{n_1} m_i p_i(1-p_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_i,\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(m_i p_i(1-p_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
where $p_i = \exp\{x_i^t\beta\}/(1+\exp\{x_i^t\beta\})$,
$$\frac{\partial Q(\theta,\theta^r)}{\partial\delta} = \sum_{i=1}^{n_1} x_i(r_i-\psi_i) + \sum_{i=n_1+1}^{n_2} E\big(x_i(r_i-\psi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(x_i(r_i-\psi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(x_i(r_i-\psi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\delta\,\partial\delta^t} = -\sum_{i=1}^{n_1} \psi_i(1-\psi_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(\psi_i(1-\psi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
$$\frac{\partial Q(\theta,\theta^r)}{\partial\omega} = \sum_{i=1}^{n_1} y_i(r_i-\psi_i) + \sum_{i=n_1+1}^{n_2} E\big(y_i(r_i-\psi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(y_i(r_i-\psi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(y_i(r_i-\psi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\omega^2} = -\sum_{i=1}^{n_1} y_i^2\,\psi_i(1-\psi_i) - \sum_{i=n_1+1}^{n_2} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(y_i^2\,\psi_i(1-\psi_i)\mid x_{obs},\theta^r\big),$$
where $\psi_i = \exp\{x_i^t\delta + y_i\omega\}/(1+\exp\{x_i^t\delta + y_i\omega\})$,
$$\frac{\partial Q(\theta,\theta^r)}{\partial\alpha} = \sum_{i=1}^{n_1} x_i(s_i-\phi_i) + \sum_{i=n_1+1}^{n_2} E\big(x_i(s_i-\phi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(x_i(s_i-\phi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(x_i(s_i-\phi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\alpha\,\partial\alpha^t} = -\sum_{i=1}^{n_1} \phi_i(1-\phi_i)\,x_i x_i^t - \sum_{i=n_1+1}^{n_2} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(\phi_i(1-\phi_i)\,x_i x_i^t\mid x_{obs},\theta^r\big),$$
$$\frac{\partial Q(\theta,\theta^r)}{\partial\tau} = \sum_{i=1}^{n_1} y_i(s_i-\phi_i) + \sum_{i=n_1+1}^{n_2} E\big(y_i(s_i-\phi_i)\mid x_{obs},\theta^r\big) + \sum_{i=n_2+1}^{n_3} E\big(y_i(s_i-\phi_i)\mid x_{obs},y_i,\theta^r\big) + \sum_{i=n_3+1}^{n} E\big(y_i(s_i-\phi_i)\mid x_{obs},\theta^r\big),$$
$$\frac{\partial^2 Q(\theta,\theta^r)}{\partial\tau^2} = -\sum_{i=1}^{n_1} y_i^2\,\phi_i(1-\phi_i) - \sum_{i=n_1+1}^{n_2} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},\theta^r\big) - \sum_{i=n_2+1}^{n_3} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},y_i,\theta^r\big) - \sum_{i=n_3+1}^{n} E\big(y_i^2\,\phi_i(1-\phi_i)\mid x_{obs},\theta^r\big),$$
where $\phi_i = \exp\{x_i^t\alpha + y_i\tau\}/(1+\exp\{x_i^t\alpha + y_i\tau\})$.
Appendix 2. Observed Fisher information matrix
The expected Fisher information matrix $I(\theta)$, which gives the inverse of the variance matrix of $\hat\theta$, is approximated by the observed information matrix $J_{\hat\theta}(Y)$:
$$V(\hat\theta)^{-1} = nE\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \propto n\int\left(-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right)\mathrm{d}x \approx \sum_{i=1}^{n}\left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} \approx nI(\hat\theta), \qquad (32)$$
$$I(\hat\theta) \approx J_{\hat\theta}(Y) = \left[-\frac{\partial^2 \log L(\theta)}{\partial\theta^2}\right]_{\theta=\hat\theta} = \left[-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta^2} - \mathrm{Var}_{\theta^r}\!\left(\sum_{i=1}^{n}\frac{\partial \log L_i(\theta)}{\partial\theta}\right)\right]_{\theta=\hat\theta}, \qquad (33)$$
where $\mathrm{Var}_{\theta^r}(\cdot)$ is the conditional variance given $(y,X)_{obs}$, $r$, $s$ and $\theta^r$.
Order the components of $\theta$ as $(\theta_1,\dots,\theta_7) = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$. The observed information matrix is the symmetric $7\times 7$ array of negative second derivatives,
$$J_{\hat\theta}(Y) = \left[-\frac{\partial^2 \log L(\theta)}{\partial\theta_j\,\partial\theta_k}\right]_{j,k=1,\dots,7},$$
and the first term of (33) is block-diagonal in these components:
$$-\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta^2} = \mathrm{diag}\left(-\frac{\partial^2 Q}{\partial\beta\,\partial\beta^t},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\delta^2} & -\frac{\partial^2 Q}{\partial\delta\,\partial\omega} \\ -\frac{\partial^2 Q}{\partial\delta\,\partial\omega} & -\frac{\partial^2 Q}{\partial\omega^2} \end{pmatrix},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\alpha^2} & -\frac{\partial^2 Q}{\partial\alpha\,\partial\tau} \\ -\frac{\partial^2 Q}{\partial\alpha\,\partial\tau} & -\frac{\partial^2 Q}{\partial\tau^2} \end{pmatrix},\ \begin{pmatrix} -\frac{\partial^2 Q}{\partial\mu_2^2} & -\frac{\partial^2 Q}{\partial\mu_2\,\partial\sigma_2^2} \\ -\frac{\partial^2 Q}{\partial\mu_2\,\partial\sigma_2^2} & -\frac{\partial^2 Q}{\partial(\sigma_2^2)^2} \end{pmatrix}\right).$$
Now we can estimate $J_{\hat\theta}(Y)$ element by element. Write $u_{j,i}$ for the contribution of observation $i$ to the complete-data score with respect to the $j$th component of $\theta = (\beta,\delta,\omega,\alpha,\tau,\mu_2,\sigma_2^2)$:
$$u_{\beta,i} = x_i y_i, \quad u_{\delta,i} = x_i(r_i-\psi_i), \quad u_{\omega,i} = y_i(r_i-\psi_i), \quad u_{\alpha,i} = x_i(s_i-\phi_i), \quad u_{\tau,i} = y_i(s_i-\phi_i),$$
$$u_{\mu_2,i} = \frac{x_{2i}-\mu_2}{\sigma_2^2}, \qquad u_{\sigma_2^2,i} = -\frac{1}{2\sigma_2^2} + \frac{(x_{2i}-\mu_2)^2}{2\sigma_2^4}.$$
The $(j,k)$ element of $J_{\hat\theta}(Y)$ is then approximated by
$$J_{\hat\theta}(Y)_{jk} = -\frac{\partial^2 Q(\theta,\theta^r)}{\partial\theta_j\,\partial\theta_k} - \frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K} u_{j,i}^{(m)}\big(u_{k,i}^{(m)}\big)^t, \qquad (34)$$
where $K$ is the Monte Carlo sample size, $u_{j,i}^{(m)}$ denotes $u_{j,i}$ evaluated with the missing components of $(y_i,x_{2i})$ replaced by their $m$th simulated values, and the term $-\partial^2 Q/\partial\theta_j\,\partial\theta_k$ is the corresponding entry of the block-diagonal matrix above (zero outside the blocks). For example,
$$J_{\hat\theta}(Y)_{11} = -\frac{\partial^2 \log L(\theta)}{\partial\beta\,\partial\beta^t} - \frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K}\big(x_i y_i^{(m)}\big)\big(x_i y_i^{(m)}\big)^t, \qquad J_{\hat\theta}(Y)_{12} = -\frac{1}{K}\sum_{i=1}^{n}\sum_{m=1}^{K} u_{\beta,i}^{(m)}\big(u_{\delta,i}^{(m)}\big)^t,$$
and similarly for the remaining elements.
References
[1] Y.G. Smirlis, E.K. Maragos, D.K. Despotis, Data envelopment analysis with missing values: an interval DEA approach, Appl. Math. Comput. 177 (2006) 1–10.
[2] M.M. Rueda, S. Gonzalez, A. Arcos, Indirect methods of imputation of missing data based on available units, Appl. Math. Comput. 164 (2005) 249–261.
[3] R.J.A. Little, D.B. Rubin, Statistical Analysis with Missing Data, second ed., Wiley, New York, 2002.
[4] S.G. Baker, N.M. Laird, Regression analysis for categorical variables with outcome subject to nonignorable nonresponse, J. Am. Stat. Assoc. 83 (1988) 62–69.
[5] J.G. Ibrahim, S.R. Lipsitz, Parameter estimation from incomplete data in binomial regression when the missing data mechanism is nonignorable, Biometrics 52 (1996) 1071–1078.
[6] J.G. Ibrahim, S.R. Lipsitz, Missing covariates in generalized linear models when the missing data mechanism is non-ignorable, J. Royal Stat. Soc. B 61 (1999) 173–190.
[7] G.R. Jahanshahloo, F.H. Lotfi, H.Z. Rezai, F.R. Balf, Using Monte Carlo method for ranking efficient DMUs, Appl. Math. Comput. 162 (2005) 371–379.
[8] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion), J. Royal Stat. Soc. B 39 (1977) 1–38.
[9] G.C.G. Wei, M.A. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Am. Stat. Assoc. 85 (1990) 699–704.
[10] C.P. Robert, G. Casella, Monte Carlo Statistical Methods, Springer, Berlin, 1999.
[11] M. Watanabe, K. Yamaguchi (Eds.), The EM Algorithm and Related Statistical Models, Marcel Dekker, New York, 2004.
[12] T.A. Louis, Finding the observed information matrix when using the EM algorithm, J. Royal Stat. Soc. B 44 (1982) 226–233.