Lifetime Data Anal
DOI 10.1007/s10985-015-9325-0

Estimating the survival function based on the semi-Markov model for dependent censoring

Ziqiang Zhao 1 · Ming Zheng 1 · Zhezhen Jin 2

Received: 10 March 2014 / Accepted: 7 March 2015
© Springer Science+Business Media New York 2015

Abstract In this paper, we study a nonparametric maximum likelihood estimator (NPMLE) of the survival function based on a semi-Markov model under dependent censoring. We show that the NPMLE is asymptotically normal and achieves asymptotic nonparametric efficiency. We also provide a uniformly consistent estimator of the corresponding asymptotic covariance function based on an information operator. The finite-sample performance of the proposed NPMLE is examined with simulation studies, which show that the NPMLE has smaller mean squared error than the existing estimators and its corresponding pointwise confidence intervals have reasonable coverages. A real example is also presented.

Keywords Semi-Markov model · Dependent censoring · NPMLE · Survival function

Electronic supplementary material The online version of this article (doi:10.1007/s10985-015-9325-0) contains supplementary material, which is available to authorized users.

Zhezhen Jin [email protected]
Ziqiang Zhao [email protected]
Ming Zheng [email protected]

1 Department of Statistics, School of Management, Fudan University, 670 Guoshun Road, Shanghai, China
2 Department of Biostatistics, Mailman School of Public Health, Columbia University, 722 West 168th Street, New York, NY, USA

1 Introduction

In survival analysis, the survival function of the failure time is commonly estimated by the Kaplan-Meier estimator and the Nelson-Aalen estimator. For these two estimators, a key assumption is that the censoring time and the survival time are independent. It is challenging to estimate the survival function under dependent censorship (Tsiatis 1975).

In oncology studies, independent dropout yields independent censoring. In addition, subjects are often censored due to the ending of the study or to the onset of progressive disease (PD). The study-ending censoring is usually independent of the survival time while the PD-related censoring might be dependent on the survival time since the PD could be a precursor of both death and loss to follow-up. In other words, in the presence of PD, the patients have a much higher risk of death and are more likely to leave the study. Ignoring such dependence would yield biased and inconsistent estimation of the survival function. On the other hand, it is possible to improve the estimation if the dependency information is properly used. Datta et al. (2000) considered nonparametric estimation using a three-stage irreversible illness–death model.

In the case of dependent censoring, Lee and Tsai (2005) proposed a semi-Markov model and developed an empirical-type estimator of the survival function. The asymptotic variance of their proposed estimator, however, is too complicated to compute. In this paper, we present a nonparametric maximum likelihood estimator (NPMLE) of the survival function based on the semi-Markov model. We show that our proposed NPMLE converges weakly to a Gaussian process and achieves asymptotic efficiency. In addition, we develop a consistent estimator of its asymptotic covariance function based on an information operator, which can be easily calculated.

The remainder of the paper is organized as follows. In Sect. 2, we will introduce the semi-Markov model. In Sect. 3, we will derive the NPMLE of the survival function. In Sect. 4, we will establish the asymptotic properties of the NPMLE, and construct a consistent estimator of its asymptotic covariance function based on an information operator. In Sect. 5, we will present simulation studies and a re-analysis of the example in Lee and Tsai (2005). We will conclude with a short discussion in Sect. 6 and provide sketches of the proofs of theorems in the Appendix.

2 The semi-Markov model

Let $T$ be the survival time and $U$ be the PD censoring time. The semi-Markov model proposed by Lee and Tsai (2005) assumes that:

\[
\lambda_{T|U}(t \mid u) = \lambda_0^{(0,2)}(t)\, I\{t \le u\} + \lambda_0^{(1,2)}(t-u)\, I\{t > u\}, \quad \text{for } t, u \ge 0, \tag{1}
\]

where $\lambda_{T|U}$ is the conditional hazard function of $T$ given $U$, $I\{\cdot\}$ is the indicator function taking the value 1 if the condition is satisfied and the value 0 otherwise, and $\lambda_0^{(0,2)}$ and $\lambda_0^{(1,2)}$ are unknown hazard functions for death without PD and death with PD, respectively.


Fig. 1 Transition

Note that the cumulative form of Model (1) is:

\[
\Lambda_{T|U}(t \mid u) = \Lambda_0^{(0,2)}(\min\{t, u\}) + \Lambda_0^{(1,2)}(t-u)\, I\{t > u\}, \quad \text{for } t, u \ge 0, \tag{2}
\]

and that for any $t, u \ge 0$,

\[
S_{T|U}(t \mid u) =
\begin{cases}
S_0^{(0,2)}(t), & t \le u, \\
S_0^{(0,2)}(u)\, S_0^{(1,2)}(t-u), & t > u,
\end{cases}
\]

where $\Lambda_0^{(0,1)}, \Lambda_0^{(0,2)}, \Lambda_0^{(1,2)}$ and $S_0^{(0,1)}, S_0^{(0,2)}, S_0^{(1,2)}$ are the corresponding cause-specific cumulative hazard functions and the corresponding cause-specific survival functions for $\lambda_0^{(0,1)}$, $\lambda_0^{(0,2)}$ and $\lambda_0^{(1,2)}$, respectively. The superscript values 0, 1, 2 indicate three different states, with 0 being the state of alive without PD, 1 being the state of alive with PD and 2 being the state of death.

More precisely, Model (1) corresponds to a non-homogeneous semi-Markov process $J$ with the state space $\{0, 1, 2\}$:

\[
J(t) =
\begin{cases}
0, & T > t,\ U > t \quad \text{(alive without PD at time } t\text{)}, \\
1, & T > t,\ U \le t \quad \text{(alive with PD at time } t\text{)}, \\
2, & T \le t \quad \text{(died at time } t\text{)},
\end{cases}
\qquad \text{for } t \ge 0.
\]

With this specification, if $\lambda_0^{(0,1)}$ denotes the hazard function of $U$, i.e. the hazard function for PD, it can be shown that $\lambda_0^{(0,1)}$, $\lambda_0^{(0,2)}$ and $\lambda_0^{(1,2)}$ are respectively the cause-specific hazard functions for the transition from state 0 to state 1, state 0 to state 2, and state 1 to state 2. Figure 1 provides a plot of transition for illustration.

In addition, we use $C$ to denote independent censoring, which is independent of $(T, U)$. Let $X = \min\{T, C\}$ and $\Delta = I\{T \le C\}$. When $X \le U$, the subject is dead or censored before PD, i.e., $(X, \Delta)$ is observed while $U$ is not. When $X > U$, PD occurs before both death and the independent censoring $C$ and so the subject would leave the study at time $U$ with a probability $\theta_0$, i.e., $U$ is observed and $(X, \Delta)$ is observed with probability $\theta_0$, where $\theta_0$ is an unknown parameter.


Let $V = \min\{X, U\}$, $R = I\{X \le U\}$ and $\xi$ be the indicator of observing $(X, \Delta)$. The observed data for a subject is $(V, R, \xi, \xi X, \xi\Delta)$. Throughout the paper, it is assumed that $\xi$ is conditionally independent of $(T, U, C)$ given $R$ with $P(\xi = 1 \mid R = 1) = 1$ and $P(\xi = 1 \mid R = 0) = \theta_0$. For a sample of size $n$, we observe $(V_i, R_i, \xi_i, \xi_i X_i, \xi_i \Delta_i)$, $i = 1, \ldots, n$.
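The observation scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `observe`, its argument names and the use of NumPy are hypothetical and do not come from the paper; the latent triple (T, U, C) is assumed to be available, as it would be in a simulation.

```python
import numpy as np

def observe(T, U, C, theta0, rng):
    """Map latent (T, U, C) to the observed (V, R, xi, xi*X, xi*Delta).

    Hypothetical helper: xi is drawn as Bernoulli(theta0) when R = 0
    and is fixed at 1 when R = 1, as assumed in the text.
    """
    T = np.asarray(T, dtype=float)
    U = np.asarray(U, dtype=float)
    C = np.asarray(C, dtype=float)
    X = np.minimum(T, C)                 # X = min{T, C}
    Delta = (T <= C).astype(int)         # Delta = I{T <= C}
    V = np.minimum(X, U)                 # V = min{X, U}
    R = (X <= U).astype(int)             # R = I{X <= U}
    draw = (rng.random(T.shape) < theta0).astype(int)
    xi = np.where(R == 1, 1, draw)       # (X, Delta) always seen when R = 1
    return V, R, xi, xi * X, xi * Delta
```

For example, `observe([1.0, 3.0], [2.0, 1.0], [5.0, 5.0], 0.6, np.random.default_rng(0))` returns one subject who dies before PD (R = 1) and one whose PD occurs first (R = 0) and whose (X, Δ) is kept only with probability 0.6.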

3 Nonparametric maximum likelihood estimation

In general, without additional information, likelihood based estimation is not available for dependent censoring problems. Using the semi-Markov model specification, it is possible to develop likelihood based estimation.

Next, we present the likelihood for the unknown parameters

\[
\left(\Lambda^{(0,1)}, \Lambda^{(0,2)}, \Lambda^{(1,2)}, \Lambda^{(c)}, \theta\right),
\]

where $\Lambda^{(c)}$ is the parameter for $\Lambda_0^{(c)}$, which is the true cumulative hazard function of $C$. Note that $\Lambda^{(c)}$ and $\theta$ are nuisance parameters.

Suppose that $\Lambda^{(0,1)}, \Lambda^{(0,2)}, \Lambda^{(1,2)}$ and $\Lambda^{(c)}$ are differentiable with corresponding derivatives $\lambda^{(0,1)}, \lambda^{(0,2)}, \lambda^{(1,2)}$ and $\lambda^{(c)}$. Let $S^{(0,1)}, S^{(0,2)}, S^{(1,2)}$ and $S^{(c)}$ be the corresponding survival functions of $\Lambda^{(0,1)}, \Lambda^{(0,2)}, \Lambda^{(1,2)}$ and $\Lambda^{(c)}$. With these notations, the likelihood of $\left(\Lambda^{(0,1)}, \Lambda^{(0,2)}, \Lambda^{(1,2)}, \Lambda^{(c)}, \theta\right)$ can be derived.

For subjects with $\xi_i = 1$, $(V_i, X_i, \Delta_i, R_i, \xi_i)$ is observed and its corresponding likelihood can be obtained based on the joint distribution of $(V, X, \Delta, R, \xi)$. Specifically, when $\xi_i = 1$, $R_i = 1$ and $\Delta_i = 1$, the likelihood is:

\[
\lambda^{(0,2)}(V_i)\, S^{(0,2)}(V_i)\, S^{(c)}(V_i)\, S^{(0,1)}(V_i);
\]

when $\xi_i = 1$, $R_i = 1$ and $\Delta_i = 0$, the likelihood is:

\[
\lambda^{(c)}(V_i)\, S^{(c)}(V_i)\, S^{(0,2)}(V_i)\, S^{(0,1)}(V_i);
\]

when $\xi_i = 1$, $R_i = 0$ and $\Delta_i = 1$, the likelihood is:

\[
\lambda^{(1,2)}(X_i - V_i)\, S^{(1,2)}(X_i - V_i)\, S^{(c)}(X_i)\, \lambda^{(0,1)}(V_i)\, S^{(0,1)}(V_i)\, S^{(0,2)}(V_i)\, \theta;
\]

when $\xi_i = 1$, $R_i = 0$ and $\Delta_i = 0$, the likelihood is:

\[
\lambda^{(c)}(X_i)\, S^{(c)}(X_i)\, S^{(1,2)}(X_i - V_i)\, \lambda^{(0,1)}(V_i)\, S^{(0,1)}(V_i)\, S^{(0,2)}(V_i)\, \theta.
\]

For subjects with $\xi_i = 0$, $(V_i, R_i, \xi_i)$ is observed and its corresponding likelihood is:

\[
\lambda^{(0,1)}(V_i)\, S^{(0,1)}(V_i)\, S^{(0,2)}(V_i)\, S^{(c)}(V_i)\, (1 - \theta).
\]


Hence, the likelihood based on the observed data $(V_i, R_i, \xi_i, \xi_i X_i, \xi_i \Delta_i)$, $i = 1, \ldots, n$, is proportional to

\[
\begin{aligned}
&\prod_{i=1}^{n} \left[ \left[\lambda^{(0,1)}(V_i)\right]^{(1-R_i)} \prod_{y \in [0, V_i)} \left(1 - \Lambda^{(0,1)}(dy)\right) \right] \\
&\quad \times \prod_{i=1}^{n} \left[ \left[\lambda^{(0,2)}(V_i)\right]^{\Delta_i R_i} \prod_{y \in [0, V_i)} \left(1 - \Lambda^{(0,2)}(dy)\right) \right] \\
&\quad \times \prod_{i=1}^{n} \left[ \left[\lambda^{(1,2)}(X_i - V_i)\right]^{\Delta_i} \prod_{y \in [0, X_i - V_i)} \left(1 - \Lambda^{(1,2)}(dy)\right) \right]^{\xi_i (1-R_i)}.
\end{aligned}
\]

Unfortunately, this function is unbounded from above and the usual maximum likelihood estimator (MLE) does not exist when $\Lambda_0^{(0,1)}$, $\Lambda_0^{(0,2)}$ and $\Lambda_0^{(1,2)}$ are restricted to continuous functions. With discretized extensions by allowing $\Lambda_0^{(0,1)}$, $\Lambda_0^{(0,2)}$ and $\Lambda_0^{(1,2)}$ to be discontinuous, the NPMLE is well defined in the sense of Kiefer and Wolfowitz (1956) and Scholz (1980). Specifically, we assume that $\Lambda_0^{(0,1)}$, $\Lambda_0^{(0,2)}$ and $\Lambda_0^{(1,2)}$ are cadlag (right continuous with left limits) and piecewise constant.

To obtain the NPMLE, we rewrite the likelihood function with the discretized $\Lambda^{(0,1)}$, $\Lambda^{(0,2)}$ and $\Lambda^{(1,2)}$:

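\[
L_n\left(\Lambda^{(0,1)}, \Lambda^{(0,2)}, \Lambda^{(1,2)}\right) = L_n^{(0,1)}\left(\Lambda^{(0,1)}\right) L_n^{(0,2)}\left(\Lambda^{(0,2)}\right) L_n^{(1,2)}\left(\Lambda^{(1,2)}\right),
\]

where for any cumulative hazard function $\Lambda$,

\[
\begin{aligned}
L_n^{(0,1)}(\Lambda) &= \prod_{i=1}^{n} \left[ \left[\Lambda\{V_i\}\right]^{(1-R_i)} \prod_{y \in [0, V_i)} \left(1 - \Lambda(dy)\right) \right], \\
L_n^{(0,2)}(\Lambda) &= \prod_{i=1}^{n} \left[ \left[\Lambda\{V_i\}\right]^{\Delta_i R_i} \left(1 - \Lambda\{V_i\}\right)^{1 - \Delta_i R_i} \prod_{y \in [0, V_i)} \left(1 - \Lambda(dy)\right) \right], \\
L_n^{(1,2)}(\Lambda) &= \prod_{i=1}^{n} \left[ \left[\Lambda\{X_i - V_i\}\right]^{\Delta_i} \left(1 - \Lambda\{X_i - V_i\}\right)^{1 - \Delta_i} \prod_{y \in [0, X_i - V_i)} \left(1 - \Lambda(dy)\right) \right]^{\xi_i (1-R_i)},
\end{aligned}
\]

with $\Lambda\{t\} = \Lambda(t) - \Lambda(t-)$ for all $t \ge 0$.

It is also assumed that $\Lambda_0^{(0,1)}$ does not share jump points with $\Lambda_0^{(0,2)}$ and $\Lambda_0^{(c)}$, which means that the time of PD censoring occurrence is different from the time of death or the time of independent censoring. Then, for any cumulative hazard function $\Lambda$,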


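\begin{align}
\log L_n^{(0,1)}(\Lambda) &\le \int_0^{+\infty} \log\left[\Lambda\{t\}\right] N_n^{(0,1)}(dt) + \sum_{t \ge 0} Y_n^{(0)}(t+) \log\left[1 - \Lambda\{t\}\right] \tag{3} \\
&\le \int_0^{+\infty} \log\left[\Lambda_n^{(0,1)}\{t\}\right] N_n^{(0,1)}(dt) + \sum_{t \ge 0} Y_n^{(0)}(t+) \log\left[1 - \Lambda_n^{(0,1)}\{t\}\right] \tag{4} \\
&= \log L_n^{(0,1)}\left(\Lambda_n^{(0,1)}\right), \notag \\
\log L_n^{(0,2)}(\Lambda) &\le \int_0^{+\infty} \log\left[\Lambda\{t\}\right] N_n^{(0,2)}(dt) + \sum_{t \ge 0} Y_n^{(0)}(t+) \log\left[1 - \Lambda\{t\}\right] \tag{5} \\
&\le \int_0^{+\infty} \log\left[\Lambda_n^{(0,2)}\{t\}\right] N_n^{(0,2)}(dt) + \sum_{t \ge 0} Y_n^{(0)}(t+) \log\left[1 - \Lambda_n^{(0,2)}\{t\}\right] \tag{6} \\
&= \log L_n^{(0,2)}\left(\Lambda_n^{(0,2)}\right), \notag \\
\log L_n^{(1,2)}(\Lambda) &\le \int_0^{+\infty} \log\left[\Lambda\{t\}\right] N_n^{(1,2)}(dt) + \sum_{t \ge 0} Y_n^{(1)}(t+) \log\left[1 - \Lambda\{t\}\right] \tag{7} \\
&\le \int_0^{+\infty} \log\left[\Lambda_n^{(1,2)}\{t\}\right] N_n^{(1,2)}(dt) + \sum_{t \ge 0} Y_n^{(1)}(t+) \log\left[1 - \Lambda_n^{(1,2)}\{t\}\right] \notag \\
&= \log L_n^{(1,2)}\left(\Lambda_n^{(1,2)}\right), \tag{8}
\end{align}

where for any $t \ge 0$,

\begin{align*}
Y_n^{(0)}(t) &= \sum_{i=1}^{n} I\{V_i \ge t\}, &
Y_n^{(1)}(t) &= \sum_{i=1}^{n} \xi_i (1 - R_i)\, I\{X_i - V_i \ge t\}, \\
N_n^{(0,1)}(t) &= \sum_{i=1}^{n} (1 - R_i)\, I\{V_i \le t\}, &
N_n^{(0,2)}(t) &= \sum_{i=1}^{n} \Delta_i R_i\, I\{V_i \le t\}, \\
N_n^{(1,2)}(t) &= \sum_{i=1}^{n} \xi_i \Delta_i (1 - R_i)\, I\{X_i - V_i \le t\},
\end{align*}

\begin{align*}
\Lambda_n^{(0,1)}(t) &= \int_{[0,t]} \left(Y_n^{(0)}(y)\right)^{-1} I\left\{Y_n^{(0)}(y) > 0\right\} N_n^{(0,1)}(dy), \\
\Lambda_n^{(0,2)}(t) &= \int_{[0,t]} \left(Y_n^{(0)}(y)\right)^{-1} I\left\{Y_n^{(0)}(y) > 0\right\} N_n^{(0,2)}(dy), \\
\Lambda_n^{(1,2)}(t) &= \int_{[0,t]} \left(Y_n^{(1)}(y)\right)^{-1} I\left\{Y_n^{(1)}(y) > 0\right\} N_n^{(1,2)}(dy).
\end{align*}

The equalities for (3), (5), and (7) hold if and only if $\Lambda$ is a pure jump function. The inequalities in (4), (6), and (8) follow from the fact that for any $\lambda \in (0, 1)$ and any $\alpha, \beta > 0$,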


\[
\alpha \log \lambda + \beta \log(1 - \lambda) \le \alpha \log\left(\frac{\alpha}{\alpha + \beta}\right) + \beta \log\left(\frac{\beta}{\alpha + \beta}\right).
\]
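For readers who want the missing step, the inequality can be checked with elementary calculus. The function name $f$ below is introduced only for this sketch and is not notation from the paper. Let $f(\lambda) = \alpha \log \lambda + \beta \log(1 - \lambda)$ on $(0, 1)$. Then

\[
f'(\lambda) = \frac{\alpha}{\lambda} - \frac{\beta}{1 - \lambda}, \qquad
f''(\lambda) = -\frac{\alpha}{\lambda^2} - \frac{\beta}{(1 - \lambda)^2} < 0,
\]

so $f$ is strictly concave and its unique maximizer solves $f'(\lambda) = 0$, that is, $\lambda^{*} = \alpha / (\alpha + \beta)$. Substituting gives

\[
f(\lambda^{*}) = \alpha \log\left(\frac{\alpha}{\alpha + \beta}\right) + \beta \log\left(\frac{\beta}{\alpha + \beta}\right),
\]

which is the right-hand side of the inequality. Applied at each jump point, this is the usual "events over number at risk" form of a Nelson-Aalen-type jump.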

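According to Criteria (B) of Scholz (1980), $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ is the NPMLE of $\left(\Lambda_0^{(0,1)}, \Lambda_0^{(0,2)}, \Lambda_0^{(1,2)}\right)$, since $L_n\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right) > 0$.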

Let $S_0$ be the marginal survival function of $T$. Note that for any $t \ge 0$,

\[
\begin{aligned}
S_0(t) = P(T > t) &= -\int_{\mathbb{R}^+} S_{T|U}(t \mid u)\, S_0^{(0,1)}(du) \\
&= S_0^{(0,1)}(t)\, S_0^{(0,2)}(t) - \int_{[0,t]} S_0^{(0,2)}(u)\, S_0^{(1,2)}(t - u)\, S_0^{(0,1)}(du).
\end{aligned}
\]
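As a numerical sanity check of this decomposition, the following sketch evaluates the right-hand side when all three transition hazards are constant. The constant-hazard choice (rate `eta` for PD, rate 1 for death without PD, rate `lam` for death after PD) mirrors the exponential simulation design described later in the paper; the function name `marginal_survival` and the trapezoid grid are assumptions made for illustration only.

```python
import numpy as np

def marginal_survival(t, eta, lam, grid=20001):
    """S0(t) under constant hazards: eta (0->1), 1 (0->2), lam (1->2)."""
    u = np.linspace(0.0, t, grid)
    # -S01(du) = eta * exp(-eta * u) du is the density of the PD time U,
    # S02(u) = exp(-u) and S12(t - u) = exp(-lam * (t - u)).
    integrand = np.exp(-u) * np.exp(-lam * (t - u)) * eta * np.exp(-eta * u)
    du = u[1] - u[0]
    # Trapezoid rule written out by hand to avoid version-specific NumPy helpers.
    integral = du * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    # First term: S01(t) * S02(t) = exp(-(eta + 1) * t).
    return np.exp(-(eta + 1.0) * t) + integral
```

When `lam` equals 1 the hazard of death does not change at PD, so T is a standard exponential variable and `marginal_survival(t, eta, 1.0)` should agree with `exp(-t)` for any `eta`, which gives a quick consistency check of the formula.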

Correspondingly, for any $t \ge 0$, define

\[
S_n(t) = S_n^{(0,1)}(t)\, S_n^{(0,2)}(t) - \int_{[0,t]} S_n^{(0,2)}(u)\, S_n^{(1,2)}(t - u)\, S_n^{(0,1)}(du),
\]

where for any $t \ge 0$,

\[
S_n^{(0,1)}(t) = \prod_{y \in [0,t]} \left[1 - \Lambda_n^{(0,1)}(dy)\right], \quad
S_n^{(0,2)}(t) = \prod_{y \in [0,t]} \left[1 - \Lambda_n^{(0,2)}(dy)\right], \quad
S_n^{(1,2)}(t) = \prod_{y \in [0,t]} \left[1 - \Lambda_n^{(1,2)}(dy)\right].
\]

By the invariance property, it follows that $\left(S_n^{(0,1)}, S_n^{(0,2)}, S_n^{(1,2)}\right)$ is the NPMLE of $\left(S_0^{(0,1)}, S_0^{(0,2)}, S_0^{(1,2)}\right)$ and $S_n$ is the NPMLE of $S_0$.
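A minimal sketch of how the estimator could be computed from the observed data is given below. It is not the authors' code: the function names (`_jumps`, `_surv`, `npmle_survival`) and the array-based interface are assumptions, ties are handled by exact floating-point equality, and no attempt is made at efficiency.

```python
import numpy as np

def _jumps(times, events):
    """Jump points and jump sizes dN/Y of a Nelson-Aalen-type estimator."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    pts = np.unique(times[events])
    at_risk = np.array([(times >= y).sum() for y in pts])
    d = np.array([(events & (times == y)).sum() for y in pts])
    return pts, d / at_risk            # at_risk > 0 at every event time

def _surv(pts, jumps, t):
    """Product-limit value: product over jump points y <= t of (1 - jump)."""
    return float(np.prod(1.0 - jumps[pts <= t]))

def npmle_survival(V, R, xi, xiX, xiDelta, t):
    """Evaluate S_n(t) from arrays of (V, R, xi, xi*X, xi*Delta)."""
    V = np.asarray(V, dtype=float)
    xiX = np.asarray(xiX, dtype=float)
    R, xi, xiDelta = (np.asarray(a, dtype=int) for a in (R, xi, xiDelta))
    p01, j01 = _jumps(V, R == 0)                       # 0 -> 1 (PD)
    p02, j02 = _jumps(V, (R == 1) & (xiDelta == 1))    # 0 -> 2 (death, no PD)
    sel = (xi == 1) & (R == 0)                         # followed after PD
    p12, j12 = _jumps(xiX[sel] - V[sel], xiDelta[sel] == 1)  # 1 -> 2
    total = _surv(p01, j01, t) * _surv(p02, j02, t)
    s01_left = 1.0
    for u, j in zip(p01, j01):
        if u > t:
            break
        s01_u = s01_left * (1.0 - j)
        # -S01(du) at a jump point u equals S01(u-) - S01(u).
        total += _surv(p02, j02, u) * _surv(p12, j12, t - u) * (s01_left - s01_u)
        s01_left = s01_u
    return total
```

With no PD events and no censoring the second term vanishes and the function reduces to the empirical survival function, for example `npmle_survival([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 1, 1], [1, 2, 3, 4], [1, 1, 1, 1], 2.0)` gives 0.5 up to rounding.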

4 Asymptotic results

In this section, we present the asymptotic properties of the NPMLE.

4.1 Regularity conditions

We list regularity conditions in this subsection. For any $t \ge 0$, define $L_0^{(0)}(t) = S_0^{(0,1)}(t)\, S_0^{(0,2)}(t)\, S_0^{(c)}(t)$ and

\[
L_0^{(1)}(t) = -\theta_0\, S_0^{(1,2)}(t) \int_{\mathbb{R}^+} S_0^{(0,2)}(u)\, S_0^{(c)}(u + t)\, S_0^{(0,1)}(du).
\]

The regularity conditions are as follows:

A.1 For a sample of size $n$, $(T_i, U_i, C_i, X_i, \Delta_i, V_i, R_i, \xi_i)$, $i = 1, \ldots, n$, are $n$ independent copies of $(T, U, C, X, \Delta, V, R, \xi)$.

A.2 Model (2) holds with $\Lambda_0^{(1,2)}(0) = 0$.

A.3 The censoring time $C$ is independent of $(T, U)$.

A.4 The indicator $\xi$ is conditionally independent of $(T, U, C)$ given $R$, with $P(\xi = 1 \mid R = 1) = 1$ and $P(\xi = 1 \mid R = 0) = \theta_0 \in (0, 1)$.

A.5 There exists $\tau > 0$ such that $L_0^{(0)}(\tau-) > 0$ and $L_0^{(1)}(\tau-) > 0$.

A.6 For any $t \in [0, \tau]$, $\Lambda_0^{(0,1)}\{t\}\, \Lambda_0^{(0,2)}\{t\} = \Lambda_0^{(0,1)}\{t\}\, \Lambda_0^{(c)}\{t\} = 0$.

Assumption A.1 is commonly satisfied. Assumption A.2 is a technical condition for model specification. Assumption A.3 indicates that $C$ is independent censoring. Assumption A.4 assumes that there is a subgroup of the potentially dependent censored subjects whose $(X, \Delta)$ are observed. Assumption A.5 is equivalent to $L_0^{(0)}(0) > 0$ and $L_0^{(1)}(0) > 0$. The required $\tau$ is generally not unique and could be chosen as large as possible. Assumption A.6 is a mild technical condition which is weaker than continuity.

4.2 Asymptotic normality

In this subsection, we establish the asymptotic normality and the asymptotic efficiency of the NPMLE. The asymptotic normality is established by convergence to a tight Gaussian process in the space of uniformly bounded functions on $[0, \tau]$. Specifically, $S_n$ is viewed as a random element in $\ell^\infty([0, \tau])$, and $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ and $\left(S_n^{(0,1)}, S_n^{(0,2)}, S_n^{(1,2)}\right)$ are viewed as random elements in $\ell_3^\infty([0, \tau]) = \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau])$, where $\ell^\infty([0, \tau])$ denotes the space of all uniformly bounded real valued functions defined on $[0, \tau]$ equipped with the uniform norm. The asymptotic efficiency is shown by the convolution theorem.

The first theorem gives the asymptotic normality for $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$.

Theorem 1 Under assumptions A.1–A.6,

\[
n^{1/2}\left(\Lambda_n^{(0,1)} - \Lambda_0^{(0,1)},\ \Lambda_n^{(0,2)} - \Lambda_0^{(0,2)},\ \Lambda_n^{(1,2)} - \Lambda_0^{(1,2)}\right)
\]

weakly converges to a tight zero-mean Gaussian process in $\ell_3^\infty([0, \tau])$ with covariance function $\mathcal{U}_0$, as $n \to \infty$, where for any $t, s \in [0, \tau]$,

\[
\mathcal{U}_0(t, s) = \operatorname{diag}\left(\mathcal{U}_0^{(0,1)}(t, s),\ \mathcal{U}_0^{(0,2)}(t, s),\ \mathcal{U}_0^{(1,2)}(t, s)\right),
\]

in which, for any $(i, j) \in \{(0, 1), (0, 2), (1, 2)\}$ and any $t, s \in [0, \tau]$,

\[
\mathcal{U}_0^{(i,j)}(t, s) = \int_{[0, \min\{t, s\}]} \left(L_0^{(i)}(y-)\right)^{-1} \left(1 - \Lambda_0^{(i,j)}\{y\}\right) \Lambda_0^{(i,j)}(dy).
\]

The second theorem gives the asymptotic normality for $\left(S_n^{(0,1)}, S_n^{(0,2)}, S_n^{(1,2)}\right)$.


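Theorem 2 Under assumptions A.1–A.6,

\[
n^{1/2}\left(S_n^{(0,1)} - S_0^{(0,1)},\ S_n^{(0,2)} - S_0^{(0,2)},\ S_n^{(1,2)} - S_0^{(1,2)}\right)
\]

weakly converges to a tight zero-mean Gaussian process in $\ell_3^\infty([0, \tau])$ with covariance function $\mathcal{V}_0$, as $n \to \infty$, where for any $t, s \in [0, \tau]$,

\[
\mathcal{V}_0(t, s) = \operatorname{diag}\left(\mathcal{V}_0^{(0,1)}(t, s),\ \mathcal{V}_0^{(0,2)}(t, s),\ \mathcal{V}_0^{(1,2)}(t, s)\right),
\]

in which, for any $(i, j) \in \{(0, 1), (0, 2), (1, 2)\}$ and any $t, s \in [0, \tau]$,

\[
\begin{aligned}
\mathcal{V}_0^{(i,j)}(t, s) &= S_0^{(i,j)}(t)\, S_0^{(i,j)}(s)\, \mathcal{W}_0^{(i,j)}(t, s), \\
\mathcal{W}_0^{(i,j)}(t, s) &= \int_{[0, \min\{t, s\}]} \left(L_0^{(i)}(y-)\right)^{-1} \left(1 - \Lambda_0^{(i,j)}\{y\}\right)^{-1} \Lambda_0^{(i,j)}(dy).
\end{aligned}
\]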

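The next theorem establishes the asymptotic normality for $S_n$.

Theorem 3 Under assumptions A.1–A.6, $n^{1/2}(S_n - S_0)$ weakly converges to a tight zero-mean Gaussian process in $\ell^\infty([0, \tau])$ with covariance function $\Omega_0$, as $n \to \infty$, where for any $t, s \in [0, \tau]$,

\[
\Omega_0(t, s) = \Omega_0^{(0,1)}(t, s) + \Omega_0^{(0,2)}(t, s) + \Omega_0^{(1,2)}(t, s),
\]

\[
\begin{aligned}
\Omega_0^{(0,1)}(t, s) ={}& \int_{(0,t]} \int_{(0,s]} \mathcal{V}_0^{(0,1)}(x-, y-)\, S_0^{(0,2)}(dy)\, S_0^{(0,2)}(dx) \\
& - S_0^{(0,2)}(0) \int_{(0,t]} \int_{(0,s]} \mathcal{V}_0^{(0,1)}(x-, s - y)\, S_0^{(1,2)}(dy)\, S_0^{(0,2)}(dx) \\
& - S_0^{(0,2)}(0) \int_{(0,t]} \int_{(0,s]} \mathcal{V}_0^{(0,1)}(t - x, y-)\, S_0^{(0,2)}(dy)\, S_0^{(1,2)}(dx) \\
& + \left[S_0^{(0,2)}(0)\right]^2 \int_{(0,t]} \int_{(0,s]} \mathcal{V}_0^{(0,1)}(t - x, s - y)\, S_0^{(1,2)}(dy)\, S_0^{(1,2)}(dx),
\end{aligned}
\]

\[
\begin{aligned}
\Omega_0^{(0,2)}(t, s) ={}& S_0^{(0,1)}(t)\, S_0^{(0,1)}(s)\, \mathcal{V}_0^{(0,2)}(t, s) \\
& - S_0^{(0,1)}(t) \int_{[0,s]} \mathcal{V}_0^{(0,2)}(t, y)\, S_0^{(1,2)}(s - y)\, S_0^{(0,1)}(dy) \\
& - S_0^{(0,1)}(s) \int_{[0,t]} \mathcal{V}_0^{(0,2)}(x, s)\, S_0^{(1,2)}(t - x)\, S_0^{(0,1)}(dx) \\
& + \int_{[0,t]} \int_{[0,s]} \mathcal{V}_0^{(0,2)}(x, y)\, S_0^{(1,2)}(t - x)\, S_0^{(1,2)}(s - y)\, S_0^{(0,1)}(dy)\, S_0^{(0,1)}(dx),
\end{aligned}
\]

\[
\Omega_0^{(1,2)}(t, s) = \int_{[0,t]} \int_{[0,s]} \mathcal{V}_0^{(1,2)}(t - x, s - y)\, S_0^{(0,2)}(x)\, S_0^{(0,2)}(y)\, S_0^{(0,1)}(dy)\, S_0^{(0,1)}(dx).
\]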

It is worth noting that, although $\Lambda_n^{(0,1)} - \Lambda_0^{(0,1)}$, $\Lambda_n^{(0,2)} - \Lambda_0^{(0,2)}$ and $\Lambda_n^{(1,2)} - \Lambda_0^{(1,2)}$ themselves are martingales, they do not form a joint martingale, since no common $\sigma$-field is available. Hence, the martingale limit theory cannot be directly used here. The proofs of the above theorems are based on empirical process theory and are provided in the Appendix.


4.3 Asymptotic efficiency

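In this subsection, we turn to the asymptotic nonparametric efficiency, which is formally defined by the convolution theorem (Theorem VIII.3.1 of Andersen et al. 1993).

We explicitly specify the Hilbert space required by the convolution theorem. Define $\mathbb{H} = \mathbb{H}^{(0,1)} \times \mathbb{H}^{(0,2)} \times \mathbb{H}^{(1,2)}$, where for $(i, j) \in \{(0, 1), (0, 2), (1, 2)\}$,

\[
\mathbb{H}^{(i,j)} = \left\{ h : \int_{[0,\tau]} \left[h(y)\right]^2 L_0^{(i)}(y-) \left(1 - \Lambda_0^{(i,j)}\{y\}\right)^{-1} \Lambda_0^{(i,j)}(dy) < +\infty \right\}.
\]

For any $g = \left(g^{(0,1)}, g^{(0,2)}, g^{(1,2)}\right)$, $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, define

\[
\langle g, h \rangle_{\mathbb{H}} = \sum_{(i,j)} \int_{[0,\tau]} g^{(i,j)}(y)\, h^{(i,j)}(y)\, L_0^{(i)}(y-) \left(1 - \Lambda_0^{(i,j)}\{y\}\right)^{-1} \Lambda_0^{(i,j)}(dy),
\]

where the summation is taken over $\{(0, 1), (0, 2), (1, 2)\}$. For any $h \in \mathbb{H}$, define $\|h\|_{\mathbb{H}} = \langle h, h \rangle_{\mathbb{H}}^{1/2}$. It can be shown that $\mathbb{H}$ is a Hilbert space with the inner product $\langle \cdot, \cdot \rangle_{\mathbb{H}}$ and the norm $\|\cdot\|_{\mathbb{H}}$.

For any $(i, j) \in \{(0, 1), (0, 2), (1, 2)\}$ and any $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, define

\[
\Lambda_{n,h}^{(i,j)}(t) = \int_{[0,t]} \left(1 + n^{-1/2} h^{(i,j)}(y)\right) \Lambda_0^{(i,j)}(dy);
\]

the likelihood ratio of $\left(\Lambda_{n,h}^{(0,1)}, \Lambda_{n,h}^{(0,2)}, \Lambda_{n,h}^{(1,2)}\right)$ to $\left(\Lambda_0^{(0,1)}, \Lambda_0^{(0,2)}, \Lambda_0^{(1,2)}\right)$ is:

\[
\mathcal{R}_n(h) = \mathcal{R}_n^{(0,1)}(h)\, \mathcal{R}_n^{(0,2)}(h)\, \mathcal{R}_n^{(1,2)}(h),
\]

where for any $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$,

\[
\begin{aligned}
\log \mathcal{R}_n^{(0,1)}(h) ={}& \sum_{i=1}^{n} (1 - R_i) \log\left[1 + n^{-1/2} h^{(0,1)}(V_i)\right] \\
& + \sum_{i=1}^{n} \left[ \log \prod_{y \in [0, V_i)} \left(1 - \Lambda_{n,h}^{(0,1)}(dy)\right) - \log \prod_{y \in [0, V_i)} \left(1 - \Lambda_0^{(0,1)}(dy)\right) \right],
\end{aligned}
\]

\[
\begin{aligned}
\log \mathcal{R}_n^{(0,2)}(h) ={}& \sum_{i=1}^{n} \Delta_i R_i \log\left[1 + n^{-1/2} h^{(0,2)}(V_i)\right] \\
& + \sum_{i=1}^{n} (1 - \Delta_i R_i) \left[ \log\left(1 - \Lambda_{n,h}^{(0,2)}\{V_i\}\right) - \log\left(1 - \Lambda_0^{(0,2)}\{V_i\}\right) \right] \\
& + \sum_{i=1}^{n} \left[ \log \prod_{y \in [0, V_i)} \left(1 - \Lambda_{n,h}^{(0,2)}(dy)\right) - \log \prod_{y \in [0, V_i)} \left(1 - \Lambda_0^{(0,2)}(dy)\right) \right],
\end{aligned}
\]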


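\[
\begin{aligned}
\log \mathcal{R}_n^{(1,2)}(h) ={}& \sum_{i=1}^{n} \xi_i (1 - R_i) \Delta_i \log\left[1 + n^{-1/2} h^{(1,2)}(X_i - V_i)\right] \\
& + \sum_{i=1}^{n} \xi_i (1 - R_i) (1 - \Delta_i) \log\left(1 - \Lambda_{n,h}^{(1,2)}\{X_i - V_i\}\right) \\
& - \sum_{i=1}^{n} \xi_i (1 - R_i) (1 - \Delta_i) \log\left(1 - \Lambda_0^{(1,2)}\{X_i - V_i\}\right) \\
& + \sum_{i=1}^{n} \xi_i (1 - R_i) \log \prod_{y \in [0, X_i - V_i)} \left(1 - \Lambda_{n,h}^{(1,2)}(dy)\right) \\
& - \sum_{i=1}^{n} \xi_i (1 - R_i) \log \prod_{y \in [0, X_i - V_i)} \left(1 - \Lambda_0^{(1,2)}(dy)\right).
\end{aligned}
\]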

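Theorem 4 Under assumptions A.1–A.6, for any $m > 0$, $h_1, \ldots, h_m \in \mathbb{H}$,

\[
\left(\log \mathcal{R}_n(h_1), \ldots, \log \mathcal{R}_n(h_m)\right)
\]

weakly converges to a Gaussian random vector in $\mathbb{R}^m$ with mean

\[
-\frac{1}{2} \left( \|h_1\|_{\mathbb{H}}^2, \ldots, \|h_m\|_{\mathbb{H}}^2 \right)
\]

and covariance matrix

\[
\begin{pmatrix}
\langle h_1, h_1 \rangle_{\mathbb{H}} & \cdots & \langle h_1, h_m \rangle_{\mathbb{H}} \\
\vdots & \ddots & \vdots \\
\langle h_m, h_1 \rangle_{\mathbb{H}} & \cdots & \langle h_m, h_m \rangle_{\mathbb{H}}
\end{pmatrix}.
\]

Hence, the information operator $I_0 : \mathbb{H} \times \mathbb{H} \to \mathbb{R}$ is: for any $g, h \in \mathbb{H}$,

\[
I_0(g, h) = \langle g, h \rangle_{\mathbb{H}}.
\]

For any $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, define

\[
\kappa_n(h) = \int_{[0,\tau]} \left( h^{(0,1)}(y)\, \Lambda_n^{(0,1)}(dy) + h^{(0,2)}(y)\, \Lambda_n^{(0,2)}(dy) + h^{(1,2)}(y)\, \Lambda_n^{(1,2)}(dy) \right).
\]

Note that for any $g, h \in \mathbb{H}$, the asymptotic covariance of $\kappa_n(g)$ and $\kappa_n(h)$ is

\[
\sum_{(i,j)} \int_{[0,\tau]} g^{(i,j)}(y)\, h^{(i,j)}(y)\, L_0^{(i)}(y-) \left(1 - \Lambda_0^{(i,j)}\{y\}\right)^{-1} \Lambda_0^{(i,j)}(dy) \tag{9}
\]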


and can be interpreted as the "inverse" of the information operator $I_0$, where the summation is taken over $\{(0, 1), (0, 2), (1, 2)\}$. The efficiency result for $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ and $S_n$ is stated in the following theorem.

Theorem 5 Under assumptions A.1–A.6, $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ and $S_n$ are efficient.

The proofs of the above theorems are given in the Appendix.

4.4 Asymptotic covariance function estimation

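To carry out statistical inference, it is necessary to estimate the asymptotic covariance function. In this subsection, we provide uniformly consistent estimators for the asymptotic covariance functions.

For any $g = \left(g^{(0,1)}, g^{(0,2)}, g^{(1,2)}\right)$, $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, define

\[
I_n(g, h) = \sum_{(i,j)} \int_{[0,\tau]} g^{(i,j)}(y)\, h^{(i,j)}(y) \left(1 - \Lambda_n^{(i,j)}\{y\}\right)^{-1} N_n^{(i,j)}(dy),
\]

where the summation is taken over $\{(0, 1), (0, 2), (1, 2)\}$.

For any $g = \left(g^{(0,1)}, g^{(0,2)}, g^{(1,2)}\right)$, $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, it can be shown that $I_n(g, h)$ is the negative second order directional derivative of $\log L_n$ at $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ along the direction $[G_n, H_n]$, where

\[
\begin{aligned}
G_n &= \left( \int g^{(0,1)}\, d\Lambda_n^{(0,1)},\ \int g^{(0,2)}\, d\Lambda_n^{(0,2)},\ \int g^{(1,2)}\, d\Lambda_n^{(1,2)} \right), \\
H_n &= \left( \int h^{(0,1)}\, d\Lambda_n^{(0,1)},\ \int h^{(0,2)}\, d\Lambda_n^{(0,2)},\ \int h^{(1,2)}\, d\Lambda_n^{(1,2)} \right).
\end{aligned}
\]

Since the function $L_n$ plays the role of the likelihood, $I_n$ should be a reasonable estimator for $I_0$. The "inverse" operation in (9) leads to an estimator for the asymptotic covariance function of $\kappa_n$:

\[
\sum_{(i,j)} \int_{[0,\tau]} g^{(i,j)}(y)\, h^{(i,j)}(y) \left(1 - \Lambda_n^{(i,j)}\{y\}\right)^{-1} Y_n^{(0)}(y)\, \Lambda_n^{(i,j)}(dy)
\]

for all $g = \left(g^{(0,1)}, g^{(0,2)}, g^{(1,2)}\right)$, $h = \left(h^{(0,1)}, h^{(0,2)}, h^{(1,2)}\right) \in \mathbb{H}$, where the summation is taken over $\{(0, 1), (0, 2), (1, 2)\}$.

Hence, the estimators for $\mathcal{U}_0$ and $\mathcal{V}_0$ are given as follows: for any $t, s \in [0, \tau]$,

\[
\begin{aligned}
\mathcal{U}_n(t, s) &= \operatorname{diag}\left(\mathcal{U}_n^{(0,1)}(t, s),\ \mathcal{U}_n^{(0,2)}(t, s),\ \mathcal{U}_n^{(1,2)}(t, s)\right), \\
\mathcal{V}_n(t, s) &= \operatorname{diag}\left(\mathcal{V}_n^{(0,1)}(t, s),\ \mathcal{V}_n^{(0,2)}(t, s),\ \mathcal{V}_n^{(1,2)}(t, s)\right),
\end{aligned}
\]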


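where for any $(i, j) \in \{(0, 1), (0, 2), (1, 2)\}$ and any $t, s \in [0, \tau]$,

\[
\begin{aligned}
\mathcal{V}_n^{(i,j)}(t, s) &= S_n^{(i,j)}(t)\, S_n^{(i,j)}(s)\, \mathcal{W}_n^{(i,j)}(t, s), \\
\mathcal{U}_n^{(i,j)}(t, s) &= n \int_{[0, \min\{t, s\}]} \left(1 - \Lambda_n^{(i,j)}\{y\}\right) \left(Y_n^{(i)}(y)\right)^{-1} \Lambda_n^{(i,j)}(dy), \\
\mathcal{W}_n^{(i,j)}(t, s) &= n \int_{[0, \min\{t, s\}]} \left(Y_n^{(i)}(y)\right)^{-1} \left(1 - \Lambda_n^{(i,j)}\{y\}\right)^{-1} \Lambda_n^{(i,j)}(dy).
\end{aligned}
\]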
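Because the estimated cumulative hazards are pure jump functions, the two integrals above reduce to finite sums over the observed event times. The sketch below evaluates the diagonal values at $s = t$ for a single transition. The function name `variance_terms` and its interface are assumptions for illustration: `times` holds the relevant at-risk times ($V_i$ for transitions out of state 0, or the sojourn times $X_i - V_i$ of subjects with $\xi_i (1 - R_i) = 1$ for the transition out of state 1), `events` flags the transitions of interest, and `n` is the full sample size.

```python
import numpy as np

def variance_terms(times, events, n, t):
    """Return (U_n(t, t), W_n(t, t)) for one transition as finite sums."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    u_n = 0.0
    w_n = 0.0
    for y in np.unique(times[events & (times <= t)]):
        at_risk = (times >= y).sum()                        # Y_n(y)
        jump = (events & (times == y)).sum() / at_risk      # Lambda_n{y}
        u_n += n * (1.0 - jump) * jump / at_risk
        # Degenerate (division by zero) if jump == 1, i.e. everyone at risk fails.
        w_n += n * jump / (at_risk * (1.0 - jump))
    return u_n, w_n
```

For four uncensored times 1, 2, 3, 4 and t = 2 the second value equals 1, so that $S^2 \mathcal{W}_n / n = 0.25 \cdot 1 / 4 = 0.0625$, which matches the binomial variance $0.5 \cdot 0.5 / 4$ of the empirical survival function at that point.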

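The estimator for $\Omega_0$ is given as $\Omega_n$: for any $t, s \in [0, \tau]$,

\[
\Omega_n(t, s) = \Omega_n^{(0,1)}(t, s) + \Omega_n^{(0,2)}(t, s) + \Omega_n^{(1,2)}(t, s),
\]

where for any $t, s \in [0, \tau]$,

\[
\begin{aligned}
\Omega_n^{(0,1)}(t, s) ={}& \int_{(0,t]} \int_{(0,s]} \mathcal{V}_n^{(0,1)}(x-, y-)\, S_n^{(0,2)}(dy)\, S_n^{(0,2)}(dx) \\
& - S_n^{(0,2)}(0) \int_{(0,t]} \int_{(0,s]} \mathcal{V}_n^{(0,1)}(x-, s - y)\, S_n^{(1,2)}(dy)\, S_n^{(0,2)}(dx) \\
& - S_n^{(0,2)}(0) \int_{(0,t]} \int_{(0,s]} \mathcal{V}_n^{(0,1)}(t - x, y-)\, S_n^{(0,2)}(dy)\, S_n^{(1,2)}(dx) \\
& + \left[S_n^{(0,2)}(0)\right]^2 \int_{(0,t]} \int_{(0,s]} \mathcal{V}_n^{(0,1)}(t - x, s - y)\, S_n^{(1,2)}(dy)\, S_n^{(1,2)}(dx),
\end{aligned}
\]

\[
\begin{aligned}
\Omega_n^{(0,2)}(t, s) ={}& S_n^{(0,1)}(t)\, S_n^{(0,1)}(s)\, \mathcal{V}_n^{(0,2)}(t, s) \\
& - S_n^{(0,1)}(t) \int_{[0,s]} \mathcal{V}_n^{(0,2)}(t, y)\, S_n^{(1,2)}(s - y)\, S_n^{(0,1)}(dy) \\
& - S_n^{(0,1)}(s) \int_{[0,t]} \mathcal{V}_n^{(0,2)}(x, s)\, S_n^{(1,2)}(t - x)\, S_n^{(0,1)}(dx) \\
& + \int_{[0,t]} \int_{[0,s]} \mathcal{V}_n^{(0,2)}(x, y)\, S_n^{(1,2)}(t - x)\, S_n^{(1,2)}(s - y)\, S_n^{(0,1)}(dy)\, S_n^{(0,1)}(dx),
\end{aligned}
\]

\[
\Omega_n^{(1,2)}(t, s) = \int_{[0,t]} \int_{[0,s]} \mathcal{V}_n^{(1,2)}(t - x, s - y)\, S_n^{(0,2)}(x)\, S_n^{(0,2)}(y)\, S_n^{(0,1)}(dy)\, S_n^{(0,1)}(dx).
\]

The following theorem states the uniform consistency for $\mathcal{U}_n$, $\mathcal{V}_n$ and $\Omega_n$.

Theorem 6 Under assumptions A.1–A.6, as $n \to \infty$,

\[
\sup_{t, s \in [0, \tau]} \left| \mathcal{U}_n(t, s) - \mathcal{U}_0(t, s) \right|, \quad
\sup_{t, s \in [0, \tau]} \left| \mathcal{V}_n(t, s) - \mathcal{V}_0(t, s) \right|, \quad
\sup_{t, s \in [0, \tau]} \left| \Omega_n(t, s) - \Omega_0(t, s) \right|
\]

converge to zero in probability.

The proof of the theorem is given in the Appendix.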


Based on the asymptotic theory and the estimation of the asymptotic covariance function, various types of inference can be conducted. For example, for any fixed $t_0 \in [0, \tau]$, a 95 % pointwise confidence interval can be constructed as:

\[
S_n(t_0) \pm 1.96\, \Omega_n^{1/2}(t_0, t_0).
\]

5 Numerical studies

5.1 Simulation studies

Four sets of simulation studies were carried out to examine the finite-sample performance of the proposed NPMLE.

In the first and the third sets of the simulation studies, the independent censoring time $C$ was specified as $+\infty$. In the second and the fourth sets, the independent censoring time $C$ was generated from a uniform distribution such that $P(T \ge C) = 0.15$.

In the first and the second sets of the simulation studies, the PD censoring time $U$ was generated from Uniform$(0, \eta)$, while in the third and the fourth sets, the PD censoring time $U$ was generated from the exponential distribution with hazard rate $\eta$.

In all simulation studies, the survival time $T$ was generated as

\[
T =
\begin{cases}
T^{(1)} & \text{if } T^{(1)} \le U, \\
U + T^{(2)} & \text{if } T^{(1)} > U,
\end{cases}
\]

where $T^{(1)}$ was generated from the exponential distribution with hazard rate 1 and $T^{(2)}$ was generated from the exponential distribution with hazard rate $\lambda$. The parameters were set to be $\lambda \in \{1, 2\}$, $P(U < T) \in \{0.3, 0.7\}$ and $\theta_0 \in \{0.3, 0.6, 1\}$. For each scenario, we generated 1000 samples of size $n = 100$.
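The data-generating mechanism for T can be sketched as follows for the exponential-PD design. The function name `simulate_latent` is hypothetical, and the way `eta` is calibrated is an assumption of this sketch, not something stated in the paper: since the event U < T coincides with U < T(1), and T(1) has hazard rate 1, P(U < T) = eta / (1 + eta), so a target probability p gives eta = p / (1 - p).

```python
import numpy as np

def simulate_latent(n, lam, p_pd, rng):
    """Draw (T, U) with exponential PD time, calibrated so P(U < T) = p_pd."""
    eta = p_pd / (1.0 - p_pd)              # solves eta / (1 + eta) = p_pd
    U = rng.exponential(1.0 / eta, n)      # NumPy takes the scale, i.e. 1 / rate
    T1 = rng.exponential(1.0, n)           # death without PD, hazard rate 1
    T2 = rng.exponential(1.0 / lam, n)     # residual life after PD, hazard rate lam
    T = np.where(T1 <= U, T1, U + T2)      # piecewise definition of T
    return T, U
```

A call such as `simulate_latent(100, 2.0, 0.3, np.random.default_rng(1))` produces one latent sample of the size used in the paper; independent censoring and the indicator ξ would still have to be applied to obtain the observed data.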

The simulation results are summarized in Tables 1, 2, 3, 4, 5, 6, 7 and 8, which report the empirical bias (Bias), the empirical standard deviation (Std), the square root of the empirical mean squared error (sqrt(MSE)) and the average estimated standard deviation (AES) of the proposed estimator, as well as the empirical coverage rate (CR) of the corresponding 95 % pointwise confidence interval, at the 0.3th, 0.5th and 0.7th quantiles of T. The values of Bias, Std, sqrt(MSE) and AES were multiplied by 10,000. For comparison, the results from the Kaplan–Meier approach are also included in the tables. In addition, the empirical bias, the empirical standard deviation and the square root of the empirical mean squared error of Lee and Tsai's estimator are included in Tables 1, 2, 5 and 6. Because there is no valid estimator of the variance of Lee and Tsai's estimator, the AES and the CR are not reported for it. Lee and Tsai's estimator is not included in Tables 3, 4, 7 and 8, because it does not incorporate the additional independent censoring.

For the cases with C = +∞, in which Lee and Tsai's approach is applicable, the proposed NPMLE and Lee and Tsai's estimator were nearly unbiased and consistent. The NPMLE had smaller sqrt(MSE) than Lee and Tsai's estimator for all three quantiles; the improvement ranged from 0.6 to 7.9 %. The coverage of the 95 % pointwise confidence interval for the proposed NPMLE was close to the nominal level.
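The summary measures reported in the tables can be computed from replicated estimates along the following lines. This is a sketch under assumptions: the function name `summarize` is hypothetical, the empirical standard deviation uses the `ddof=1` convention (the paper does not say which convention was used), and the coverage check assumes an interval of the form estimate ± 1.96 times the estimated standard deviation.

```python
import numpy as np

def summarize(estimates, std_estimates, truth, z=1.96):
    """Bias, Std, sqrt(MSE), AES and CR over simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_estimates, dtype=float)
    return {
        "Bias": est.mean() - truth,
        "Std": est.std(ddof=1),                            # assumed convention
        "sqrt(MSE)": np.sqrt(np.mean((est - truth) ** 2)),
        "AES": se.mean(),
        "CR": np.mean(np.abs(est - truth) <= z * se),      # interval covers truth
    }
```

In the paper's setting `estimates` would hold the 1000 replicated values of the survival estimate at a fixed quantile of T and `truth` the true survival probability there; multiplying the first four entries by 10,000 gives the scale used in the tables.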


Table 1 Simulation results when C = +∞ and U is generated from the uniform distribution with P(T > U) = 0.3 (LT: Lee and Tsai’s estimator; KM: Kaplan–Meier estimator)

| λ | θ | τ | LT Bias | LT Std | LT sqrt(MSE) | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | 16.322 | 449.049 | 449.346 | 17.504 | 452.963 | 453.301 | 464.366 | 0.946 | 15.095 | 444.900 | 445.156 | 453.465 | 0.948 |
| 1 | 0.3 | 0.5 | 20.922 | 514.706 | 515.131 | 19.302 | 519.672 | 520.030 | 519.932 | 0.952 | 18.855 | 507.437 | 507.788 | 508.858 | 0.950 |
| 1 | 0.3 | 0.7 | 15.460 | 493.339 | 493.581 | 4.205 | 483.332 | 483.350 | 497.773 | 0.954 | 20.245 | 487.241 | 487.662 | 511.333 | 0.966 |
| 1 | 0.6 | 0.3 | 11.145 | 477.190 | 477.320 | 9.971 | 487.063 | 487.165 | 460.309 | 0.932 | 9.911 | 473.493 | 473.596 | 447.447 | 0.928 |
| 1 | 0.6 | 0.5 | 11.651 | 505.172 | 505.306 | 9.105 | 517.079 | 517.160 | 510.235 | 0.945 | 10.361 | 497.356 | 497.464 | 489.322 | 0.942 |
| 1 | 0.6 | 0.7 | 29.273 | 472.164 | 473.071 | 30.350 | 486.644 | 487.589 | 479.243 | 0.935 | 25.349 | 461.293 | 461.989 | 467.520 | 0.946 |
| 1 | 1.0 | 0.3 | 19.700 | 456.914 | 457.339 | 19.700 | 456.914 | 457.339 | 454.647 | 0.949 | 17.716 | 444.797 | 445.150 | 444.825 | 0.940 |
| 1 | 1.0 | 0.5 | −8.600 | 507.468 | 507.541 | −8.600 | 507.468 | 507.541 | 497.407 | 0.947 | −9.648 | 481.701 | 481.798 | 480.689 | 0.953 |
| 1 | 1.0 | 0.7 | −9.100 | 457.686 | 457.777 | −9.100 | 457.686 | 457.777 | 455.098 | 0.948 | −10.352 | 437.898 | 438.020 | 446.506 | 0.951 |
| 2 | 0.3 | 0.3 | 9.379 | 467.485 | 467.579 | 89.851 | 471.065 | 479.558 | 460.533 | 0.935 | 12.260 | 464.082 | 464.244 | 453.541 | 0.944 |
| 2 | 0.3 | 0.5 | −3.479 | 507.639 | 507.651 | 183.352 | 510.943 | 542.845 | 518.393 | 0.932 | −2.961 | 502.958 | 502.966 | 512.860 | 0.946 |
| 2 | 0.3 | 0.7 | −11.359 | 480.614 | 480.749 | 282.399 | 502.682 | 576.575 | 505.049 | 0.927 | −12.246 | 477.239 | 477.396 | 518.828 | 0.965 |
| 2 | 0.6 | 0.3 | −7.299 | 457.866 | 457.924 | 36.817 | 463.476 | 464.936 | 459.002 | 0.940 | −5.320 | 448.094 | 448.126 | 446.429 | 0.943 |
| 2 | 0.6 | 0.5 | 1.588 | 523.249 | 523.251 | 103.364 | 533.389 | 543.312 | 508.829 | 0.931 | 8.289 | 507.991 | 508.059 | 493.961 | 0.943 |
| 2 | 0.6 | 0.7 | −16.654 | 478.814 | 479.104 | 133.208 | 504.190 | 521.490 | 481.332 | 0.935 | −15.314 | 464.329 | 464.582 | 488.198 | 0.961 |
| 2 | 1.0 | 0.3 | −9.700 | 440.306 | 440.413 | −9.700 | 440.306 | 440.413 | 456.149 | 0.956 | −10.120 | 426.256 | 426.376 | 443.491 | 0.952 |
| 2 | 1.0 | 0.5 | −11.000 | 473.924 | 474.051 | −11.000 | 473.924 | 474.051 | 497.740 | 0.949 | −8.285 | 460.149 | 460.224 | 485.975 | 0.957 |
| 2 | 1.0 | 0.7 | 9.900 | 449.504 | 449.613 | 9.900 | 449.504 | 449.613 | 456.035 | 0.960 | 9.806 | 430.754 | 430.865 | 476.470 | 0.970 |

The AES and the CR are not reported for Lee and Tsai’s estimator, since the corresponding variance estimator has not been developed. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 2 Simulation results when C = +∞ and U is generated from the uniform distribution with P(T > U) = 0.7 (LT: Lee and Tsai’s estimator; KM: Kaplan–Meier estimator)

| λ | θ | τ | LT Bias | LT Std | LT sqrt(MSE) | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | −11.337 | 499.437 | 499.565 | −0.496 | 520.377 | 520.377 | 503.210 | 0.930 | −14.576 | 492.276 | 492.492 | 474.698 | 0.932 |
| 1 | 0.3 | 0.5 | −31.311 | 658.791 | 659.534 | −33.695 | 675.060 | 675.901 | 644.222 | 0.925 | −30.850 | 647.774 | 648.508 | 631.552 | 0.939 |
| 1 | 0.3 | 0.7 | −2.828 | 716.866 | 716.871 | 0.973 | 760.752 | 760.753 | 728.592 | 0.918 | −3.802 | 710.683 | 710.694 | 697.288 | 0.920 |
| 1 | 0.6 | 0.3 | 4.242 | 442.553 | 442.574 | 1.272 | 469.836 | 469.838 | 480.906 | 0.946 | 2.315 | 418.693 | 418.699 | 437.047 | 0.959 |
| 1 | 0.6 | 0.5 | 5.565 | 525.029 | 525.059 | 3.813 | 558.605 | 558.618 | 562.140 | 0.950 | 4.559 | 496.985 | 497.006 | 511.491 | 0.952 |
| 1 | 0.6 | 0.7 | −5.090 | 531.379 | 531.404 | −6.285 | 560.956 | 560.991 | 556.616 | 0.946 | −5.814 | 510.060 | 510.093 | 521.131 | 0.951 |
| 1 | 1.0 | 0.3 | 1.900 | 467.809 | 467.813 | 1.900 | 467.809 | 467.813 | 455.301 | 0.941 | −9.390 | 430.119 | 430.221 | 420.918 | 0.932 |
| 1 | 1.0 | 0.5 | −2.100 | 489.127 | 489.131 | −2.100 | 489.127 | 489.131 | 497.592 | 0.945 | −5.547 | 447.852 | 447.886 | 454.722 | 0.950 |
| 1 | 1.0 | 0.7 | −6.200 | 448.980 | 449.023 | −6.200 | 448.980 | 449.023 | 455.336 | 0.950 | −6.642 | 414.769 | 414.822 | 427.758 | 0.956 |
| 2 | 0.3 | 0.3 | −5.362 | 495.129 | 495.158 | 245.676 | 499.341 | 556.506 | 483.087 | 0.895 | −5.022 | 479.782 | 479.808 | 467.545 | 0.945 |
| 2 | 0.3 | 0.5 | 12.071 | 602.643 | 602.763 | 513.093 | 595.947 | 786.395 | 593.462 | 0.851 | 8.185 | 589.272 | 589.329 | 581.017 | 0.935 |
| 2 | 0.3 | 0.7 | 1.222 | 645.529 | 645.530 | 466.958 | 739.829 | 874.869 | 721.418 | 0.881 | 6.178 | 634.232 | 634.262 | 623.757 | 0.932 |
| 2 | 0.6 | 0.3 | 7.979 | 460.885 | 460.954 | 138.658 | 474.635 | 494.474 | 470.687 | 0.931 | 16.016 | 438.841 | 439.133 | 428.777 | 0.943 |
| 2 | 0.6 | 0.5 | 11.827 | 527.470 | 527.603 | 247.354 | 563.289 | 615.206 | 546.354 | 0.914 | 11.581 | 494.984 | 495.120 | 491.471 | 0.950 |
| 2 | 0.6 | 0.7 | −6.385 | 511.517 | 511.557 | 193.050 | 566.862 | 598.833 | 559.222 | 0.938 | −1.412 | 485.327 | 485.329 | 479.425 | 0.944 |
| 2 | 1.0 | 0.3 | −8.200 | 457.627 | 457.700 | −8.200 | 457.627 | 457.700 | 455.865 | 0.953 | −15.557 | 409.854 | 410.149 | 413.332 | 0.956 |
| 2 | 1.0 | 0.5 | −23.800 | 483.329 | 483.915 | −23.800 | 483.329 | 483.915 | 497.645 | 0.956 | −24.953 | 436.142 | 436.856 | 449.685 | 0.958 |
| 2 | 1.0 | 0.7 | −14.800 | 445.847 | 446.093 | −14.800 | 445.847 | 446.093 | 454.984 | 0.960 | −18.720 | 392.816 | 393.261 | 406.233 | 0.951 |

The AES and the CR are not reported for Lee and Tsai’s estimator, since the corresponding variance estimator has not been developed. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 3 Simulation results when P(T > C) = 0.15 and U is generated from the uniform distribution with P(T > U) = 0.3 (KM: Kaplan–Meier estimator)

| λ | θ | τ | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | 3.157 | 475.857 | 475.868 | 465.098 | 0.943 | 1.603 | 464.227 | 464.229 | 454.039 | 0.944 |
| 1 | 0.3 | 0.5 | −8.948 | 518.809 | 518.886 | 520.695 | 0.949 | −12.346 | 515.592 | 515.740 | 510.107 | 0.952 |
| 1 | 0.3 | 0.7 | −24.535 | 480.116 | 480.742 | 497.576 | 0.949 | −26.005 | 488.030 | 488.722 | 512.812 | 0.961 |
| 1 | 0.6 | 0.3 | −55.071 | 457.719 | 461.020 | 463.386 | 0.952 | −53.657 | 444.683 | 447.908 | 450.794 | 0.943 |
| 1 | 0.6 | 0.5 | −27.605 | 521.312 | 522.042 | 509.740 | 0.944 | −23.875 | 496.274 | 496.848 | 489.593 | 0.939 |
| 1 | 0.6 | 0.7 | −4.198 | 477.638 | 477.657 | 477.467 | 0.952 | −6.434 | 444.951 | 444.998 | 466.468 | 0.960 |
| 1 | 1.0 | 0.3 | 5.800 | 456.393 | 456.430 | 455.269 | 0.957 | 2.850 | 450.296 | 450.305 | 445.537 | 0.951 |
| 1 | 1.0 | 0.5 | −1.200 | 517.326 | 517.327 | 497.304 | 0.938 | −0.957 | 504.409 | 504.410 | 480.853 | 0.945 |
| 1 | 1.0 | 0.7 | −9.700 | 457.368 | 457.470 | 455.068 | 0.949 | −4.476 | 439.057 | 439.080 | 446.865 | 0.947 |
| 2 | 0.3 | 0.3 | 73.204 | 466.208 | 471.920 | 461.281 | 0.932 | −3.787 | 461.198 | 461.213 | 454.278 | 0.936 |
| 2 | 0.3 | 0.5 | 186.877 | 519.638 | 552.220 | 518.041 | 0.932 | 5.280 | 508.992 | 509.020 | 513.616 | 0.955 |
| 2 | 0.3 | 0.7 | 285.552 | 520.929 | 594.060 | 504.759 | 0.911 | 9.750 | 494.252 | 494.348 | 519.513 | 0.964 |
| 2 | 0.6 | 0.3 | 54.006 | 466.143 | 469.262 | 458.288 | 0.947 | 11.512 | 451.738 | 451.884 | 445.745 | 0.937 |
| 2 | 0.6 | 0.5 | 124.852 | 533.052 | 547.478 | 509.020 | 0.923 | 18.940 | 508.319 | 508.671 | 494.004 | 0.936 |
| 2 | 0.6 | 0.7 | 168.701 | 481.221 | 509.935 | 483.245 | 0.939 | 16.348 | 447.174 | 447.473 | 489.082 | 0.963 |
| 2 | 1.0 | 0.3 | −4.200 | 473.568 | 473.587 | 455.499 | 0.945 | −4.680 | 459.618 | 459.642 | 442.950 | 0.934 |
| 2 | 1.0 | 0.5 | 8.000 | 498.583 | 498.647 | 497.498 | 0.948 | 4.254 | 478.715 | 478.734 | 485.907 | 0.955 |
| 2 | 1.0 | 0.7 | 8.600 | 444.391 | 444.474 | 456.052 | 0.961 | 5.087 | 423.137 | 423.167 | 476.635 | 0.974 |

Lee and Tsai’s estimator is not included, since the additional independent censoring is not incorporated in this approach. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 4 Simulation results when P(T > C) = 0.15 and U is generated from the uniform distribution with P(T > U) = 0.7 (KM: Kaplan–Meier estimator)

| λ | θ | τ | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | −11.321 | 508.927 | 509.053 | 504.172 | 0.952 | −7.032 | 491.383 | 491.433 | 473.164 | 0.938 |
| 1 | 0.3 | 0.5 | −2.953 | 674.774 | 674.780 | 642.816 | 0.938 | 4.681 | 649.068 | 649.085 | 629.126 | 0.930 |
| 1 | 0.3 | 0.7 | 35.088 | 748.811 | 749.633 | 731.977 | 0.936 | 21.350 | 714.376 | 714.695 | 698.724 | 0.933 |
| 1 | 0.6 | 0.3 | −1.934 | 492.202 | 492.205 | 480.297 | 0.941 | −1.255 | 440.665 | 440.666 | 437.101 | 0.948 |
| 1 | 0.6 | 0.5 | −1.948 | 557.711 | 557.714 | 561.540 | 0.950 | 11.016 | 501.484 | 501.605 | 511.521 | 0.945 |
| 1 | 0.6 | 0.7 | 21.931 | 556.991 | 557.423 | 556.910 | 0.947 | 21.264 | 510.031 | 510.474 | 521.362 | 0.935 |
| 1 | 1.0 | 0.3 | −5.000 | 475.711 | 475.738 | 455.495 | 0.943 | −5.471 | 432.805 | 432.840 | 420.981 | 0.937 |
| 1 | 1.0 | 0.5 | −15.700 | 510.455 | 510.696 | 497.374 | 0.938 | −12.767 | 461.219 | 461.395 | 454.286 | 0.945 |
| 1 | 1.0 | 0.7 | −12.200 | 468.413 | 468.572 | 454.838 | 0.946 | −21.023 | 433.071 | 433.581 | 427.045 | 0.938 |
| 2 | 0.3 | 0.3 | 268.558 | 468.726 | 540.211 | 482.080 | 0.889 | 9.309 | 466.171 | 466.264 | 466.745 | 0.940 |
| 2 | 0.3 | 0.5 | 500.114 | 593.135 | 775.837 | 594.952 | 0.851 | −1.386 | 587.409 | 587.410 | 581.482 | 0.945 |
| 2 | 0.3 | 0.7 | 420.009 | 739.105 | 850.108 | 722.839 | 0.900 | −13.806 | 641.460 | 641.608 | 621.044 | 0.924 |
| 2 | 0.6 | 0.3 | 97.707 | 453.296 | 463.707 | 473.168 | 0.945 | −27.117 | 413.581 | 414.469 | 430.981 | 0.960 |
| 2 | 0.6 | 0.5 | 217.812 | 555.667 | 596.831 | 546.970 | 0.916 | −17.538 | 483.365 | 483.683 | 492.237 | 0.950 |
| 2 | 0.6 | 0.7 | 190.462 | 561.836 | 593.242 | 559.105 | 0.942 | −7.811 | 482.644 | 482.708 | 479.955 | 0.937 |
| 2 | 1.0 | 0.3 | 0.100 | 453.780 | 453.780 | 455.556 | 0.957 | 1.700 | 407.523 | 407.527 | 412.883 | 0.948 |
| 2 | 1.0 | 0.5 | −18.300 | 514.198 | 514.523 | 497.333 | 0.936 | −7.882 | 450.739 | 450.807 | 449.654 | 0.938 |
| 2 | 1.0 | 0.7 | −20.600 | 465.653 | 466.108 | 454.485 | 0.943 | −19.324 | 415.083 | 415.533 | 405.605 | 0.937 |

Lee and Tsai’s estimator is not included, since the additional independent censoring is not incorporated in this approach. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 5 Simulation results when C = +∞ and U is generated from the exponential distribution with P(T > U) = 0.3 (LT: Lee and Tsai’s estimator; KM: Kaplan–Meier estimator)

| λ | θ | τ | LT Bias | LT Std | LT sqrt(MSE) | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | 10.296 | 456.820 | 456.936 | 11.247 | 469.847 | 469.981 | 468.018 | 0.941 | 10.442 | 450.538 | 450.659 | 456.574 | 0.944 |
| 1 | 0.3 | 0.5 | 4.645 | 522.233 | 522.254 | 10.906 | 527.338 | 527.451 | 526.662 | 0.955 | 3.852 | 517.105 | 517.120 | 521.405 | 0.953 |
| 1 | 0.3 | 0.7 | −2.288 | 513.833 | 513.838 | −2.456 | 503.665 | 503.671 | 505.226 | 0.943 | −3.613 | 510.711 | 510.724 | 529.396 | 0.952 |
| 1 | 0.6 | 0.3 | 21.891 | 464.160 | 464.676 | 19.899 | 470.593 | 471.014 | 461.643 | 0.940 | 20.407 | 457.762 | 458.216 | 446.718 | 0.942 |
| 1 | 0.6 | 0.5 | 14.251 | 495.721 | 495.925 | 14.596 | 503.023 | 503.235 | 513.288 | 0.951 | 12.513 | 489.975 | 490.135 | 492.027 | 0.947 |
| 1 | 0.6 | 0.7 | −0.230 | 475.155 | 475.155 | 0.286 | 482.683 | 482.683 | 481.632 | 0.944 | 1.168 | 464.456 | 464.457 | 473.894 | 0.954 |
| 1 | 1.0 | 0.3 | 19.800 | 456.154 | 456.583 | 19.800 | 456.154 | 456.583 | 454.665 | 0.958 | 19.156 | 446.730 | 447.141 | 442.640 | 0.952 |
| 1 | 1.0 | 0.5 | 5.600 | 485.494 | 485.526 | 5.600 | 485.494 | 485.526 | 497.628 | 0.947 | 4.393 | 469.216 | 469.236 | 479.710 | 0.944 |
| 1 | 1.0 | 0.7 | 6.200 | 453.594 | 453.636 | 6.200 | 453.594 | 453.636 | 455.838 | 0.956 | 7.415 | 429.566 | 429.630 | 448.339 | 0.963 |
| 2 | 0.3 | 0.3 | 18.762 | 440.773 | 441.172 | 116.418 | 447.944 | 462.825 | 462.360 | 0.938 | 19.215 | 433.321 | 433.747 | 456.426 | 0.953 |
| 2 | 0.3 | 0.5 | 2.862 | 514.297 | 514.305 | 222.344 | 511.420 | 557.662 | 523.404 | 0.934 | 2.159 | 508.523 | 508.527 | 522.428 | 0.950 |
| 2 | 0.3 | 0.7 | −1.447 | 496.163 | 496.165 | 318.350 | 506.289 | 598.059 | 512.735 | 0.915 | −3.142 | 489.412 | 489.422 | 526.006 | 0.961 |
| 2 | 0.6 | 0.3 | −12.466 | 454.920 | 455.091 | 43.213 | 460.479 | 462.502 | 460.518 | 0.934 | −15.996 | 438.648 | 438.939 | 446.454 | 0.944 |
| 2 | 0.6 | 0.5 | −8.363 | 503.876 | 503.945 | 113.093 | 513.398 | 525.707 | 512.080 | 0.943 | −10.435 | 488.754 | 488.865 | 496.818 | 0.955 |
| 2 | 0.6 | 0.7 | −0.668 | 472.414 | 472.414 | 167.725 | 498.620 | 526.074 | 486.202 | 0.943 | −1.017 | 458.369 | 458.371 | 491.470 | 0.971 |
| 2 | 1.0 | 0.3 | −22.600 | 457.667 | 458.224 | −22.600 | 457.667 | 458.224 | 456.493 | 0.948 | −18.267 | 440.063 | 440.442 | 441.278 | 0.939 |
| 2 | 1.0 | 0.5 | −22.500 | 517.350 | 517.839 | −22.500 | 517.350 | 517.839 | 497.298 | 0.941 | −16.422 | 489.006 | 489.282 | 484.956 | 0.944 |
| 2 | 1.0 | 0.7 | 1.400 | 452.751 | 452.753 | 1.400 | 452.751 | 452.753 | 455.626 | 0.951 | −2.116 | 435.387 | 435.392 | 476.523 | 0.955 |

The AES and the CR are not reported for Lee and Tsai’s estimator, since the corresponding variance estimator has not been developed. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 6 Simulation results when C = +∞ and U is generated from the exponential distribution with P(T > U) = 0.7 (LT: Lee and Tsai’s estimator; KM: Kaplan–Meier estimator)

| λ | θ | τ | LT Bias | LT Std | LT sqrt(MSE) | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | −5.225 | 538.632 | 538.657 | −3.818 | 550.361 | 550.374 | 526.103 | 0.935 | −4.202 | 524.809 | 524.825 | 507.769 | 0.936 |
| 1 | 0.3 | 0.5 | 24.438 | 685.975 | 686.410 | 29.837 | 672.759 | 673.420 | 649.392 | 0.932 | 22.835 | 677.260 | 677.645 | 655.729 | 0.931 |
| 1 | 0.3 | 0.7 | 20.986 | 710.008 | 710.318 | 35.836 | 701.711 | 702.626 | 683.782 | 0.933 | 14.405 | 703.195 | 703.343 | 685.009 | 0.928 |
| 1 | 0.6 | 0.3 | 0.793 | 458.252 | 458.253 | −0.823 | 479.782 | 479.783 | 490.888 | 0.956 | 0.680 | 435.084 | 435.085 | 445.071 | 0.956 |
| 1 | 0.6 | 0.5 | −17.919 | 536.533 | 536.832 | −19.713 | 560.278 | 560.625 | 566.159 | 0.939 | −18.499 | 514.169 | 514.501 | 521.047 | 0.946 |
| 1 | 0.6 | 0.7 | −17.088 | 526.254 | 526.532 | −13.209 | 553.737 | 553.894 | 545.974 | 0.929 | −15.800 | 505.719 | 505.966 | 516.219 | 0.946 |
| 1 | 1.0 | 0.3 | −17.500 | 450.740 | 451.080 | −17.500 | 450.740 | 451.080 | 456.367 | 0.956 | −12.883 | 413.698 | 413.899 | 417.697 | 0.960 |
| 1 | 1.0 | 0.5 | −11.100 | 484.336 | 484.463 | −11.100 | 484.336 | 484.463 | 497.638 | 0.945 | −16.014 | 445.981 | 446.268 | 457.365 | 0.959 |
| 1 | 1.0 | 0.7 | −13.400 | 443.302 | 443.505 | −13.400 | 443.302 | 443.505 | 455.083 | 0.957 | −10.796 | 411.697 | 411.839 | 430.601 | 0.955 |
| 2 | 0.3 | 0.3 | −24.212 | 523.877 | 524.437 | 289.561 | 503.138 | 580.511 | 497.892 | 0.881 | −23.854 | 509.650 | 510.208 | 498.870 | 0.934 |
| 2 | 0.3 | 0.5 | −24.450 | 618.911 | 619.393 | 503.502 | 612.361 | 792.780 | 612.976 | 0.854 | −18.987 | 606.850 | 607.147 | 613.808 | 0.945 |
| 2 | 0.3 | 0.7 | −14.467 | 622.933 | 623.101 | 598.470 | 678.103 | 904.428 | 666.906 | 0.856 | −12.684 | 614.458 | 614.588 | 619.218 | 0.939 |
| 2 | 0.6 | 0.3 | −5.620 | 447.090 | 447.125 | 153.853 | 465.542 | 490.307 | 478.634 | 0.929 | 4.358 | 422.673 | 422.695 | 435.396 | 0.946 |
| 2 | 0.6 | 0.5 | 0.120 | 511.504 | 511.504 | 259.435 | 540.600 | 599.629 | 554.792 | 0.926 | 1.101 | 489.888 | 489.889 | 503.871 | 0.964 |
| 2 | 0.6 | 0.7 | −3.514 | 498.036 | 498.048 | 266.248 | 542.172 | 604.018 | 549.819 | 0.930 | −10.181 | 469.893 | 470.004 | 488.709 | 0.955 |
| 2 | 1.0 | 0.3 | 21.600 | 469.163 | 469.660 | 21.600 | 469.163 | 469.660 | 454.390 | 0.942 | 21.138 | 417.597 | 418.132 | 406.461 | 0.933 |
| 2 | 1.0 | 0.5 | 31.400 | 501.483 | 502.465 | 31.400 | 501.483 | 502.465 | 497.459 | 0.938 | 23.507 | 455.782 | 456.388 | 451.001 | 0.943 |
| 2 | 1.0 | 0.7 | 6.100 | 457.320 | 457.361 | 6.100 | 457.320 | 457.361 | 455.782 | 0.954 | 15.701 | 421.121 | 421.414 | 424.771 | 0.955 |

The AES and the CR are not reported for Lee and Tsai’s estimator, since the corresponding variance estimator has not been developed. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 7 Simulation results when P(T > C) = 0.15 and U is generated from the exponential distribution with P(T > U) = 0.3 (KM: Kaplan–Meier estimator)

| λ | θ | τ | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | −37.142 | 486.215 | 487.632 | 469.845 | 0.948 | −32.751 | 468.945 | 470.088 | 458.227 | 0.946 |
| 1 | 0.3 | 0.5 | −9.926 | 527.004 | 527.098 | 526.192 | 0.944 | −7.799 | 521.589 | 521.647 | 520.417 | 0.948 |
| 1 | 0.3 | 0.7 | 15.360 | 508.382 | 508.614 | 505.214 | 0.940 | 14.140 | 511.564 | 511.759 | 528.501 | 0.956 |
| 1 | 0.6 | 0.3 | −1.646 | 440.079 | 440.082 | 463.025 | 0.960 | −0.956 | 421.928 | 421.929 | 448.100 | 0.965 |
| 1 | 0.6 | 0.5 | 16.974 | 487.732 | 488.027 | 513.509 | 0.961 | 19.143 | 461.533 | 461.930 | 492.530 | 0.963 |
| 1 | 0.6 | 0.7 | 8.971 | 464.171 | 464.258 | 482.019 | 0.957 | 9.506 | 438.781 | 438.884 | 473.189 | 0.960 |
| 1 | 1.0 | 0.3 | −5.500 | 473.206 | 473.238 | 455.534 | 0.948 | −2.166 | 459.658 | 459.663 | 443.548 | 0.939 |
| 1 | 1.0 | 0.5 | 2.600 | 517.978 | 517.985 | 497.297 | 0.930 | 7.841 | 491.710 | 491.773 | 479.562 | 0.935 |
| 1 | 1.0 | 0.7 | −8.700 | 451.950 | 452.033 | 455.201 | 0.950 | −7.242 | 431.096 | 431.157 | 448.312 | 0.950 |
| 2 | 0.3 | 0.3 | 106.272 | 444.514 | 457.041 | 462.601 | 0.942 | 4.028 | 445.565 | 445.583 | 457.066 | 0.954 |
| 2 | 0.3 | 0.5 | 218.342 | 499.220 | 544.879 | 523.112 | 0.941 | 2.863 | 501.590 | 501.598 | 522.948 | 0.955 |
| 2 | 0.3 | 0.7 | 320.370 | 500.009 | 593.840 | 512.668 | 0.919 | 1.890 | 481.911 | 481.914 | 528.818 | 0.972 |
| 2 | 0.6 | 0.3 | 57.458 | 475.318 | 478.778 | 459.332 | 0.936 | 6.531 | 458.627 | 458.673 | 444.893 | 0.933 |
| 2 | 0.6 | 0.5 | 148.054 | 524.993 | 545.470 | 511.501 | 0.931 | 27.558 | 500.385 | 501.143 | 496.141 | 0.949 |
| 2 | 0.6 | 0.7 | 192.218 | 481.920 | 518.840 | 487.073 | 0.941 | 22.617 | 447.304 | 447.875 | 492.088 | 0.973 |
| 2 | 1.0 | 0.3 | −24.800 | 474.120 | 474.768 | 456.404 | 0.944 | −24.034 | 449.961 | 450.602 | 441.301 | 0.947 |
| 2 | 1.0 | 0.5 | −12.100 | 491.289 | 491.438 | 497.570 | 0.943 | −21.879 | 480.151 | 480.650 | 485.176 | 0.948 |
| 2 | 1.0 | 0.7 | −23.100 | 457.478 | 458.060 | 454.476 | 0.961 | −25.766 | 429.097 | 429.870 | 476.058 | 0.970 |

Lee and Tsai’s estimator is not included, since the additional independent censoring is not incorporated in this approach. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Table 8 Simulation results when P(T > C) = 0.15 and U is generated from the exponential distribution with P(T > U) = 0.7 (KM: Kaplan–Meier estimator)

| λ | θ | τ | KM Bias | KM Std | KM sqrt(MSE) | KM AES | KM CR | NPMLE Bias | NPMLE Std | NPMLE sqrt(MSE) | NPMLE AES | NPMLE CR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.3 | 0.3 | 11.840 | 530.033 | 530.165 | 525.895 | 0.928 | 15.547 | 504.538 | 504.778 | 506.972 | 0.947 |
| 1 | 0.3 | 0.5 | 15.475 | 636.189 | 636.378 | 650.307 | 0.947 | 15.571 | 625.898 | 626.092 | 654.501 | 0.955 |
| 1 | 0.3 | 0.7 | 54.060 | 674.713 | 676.876 | 684.240 | 0.942 | 34.744 | 662.199 | 663.110 | 685.833 | 0.949 |
| 1 | 0.6 | 0.3 | −7.869 | 501.497 | 501.559 | 490.966 | 0.931 | −10.250 | 451.606 | 451.723 | 446.259 | 0.937 |
| 1 | 0.6 | 0.5 | −7.043 | 577.838 | 577.881 | 565.730 | 0.944 | −15.676 | 525.635 | 525.869 | 521.541 | 0.944 |
| 1 | 0.6 | 0.7 | −19.481 | 547.570 | 547.916 | 546.076 | 0.943 | −21.699 | 510.310 | 510.771 | 515.834 | 0.938 |
| 1 | 1.0 | 0.3 | −20.200 | 448.635 | 449.089 | 456.502 | 0.965 | −22.370 | 400.871 | 401.494 | 418.539 | 0.960 |
| 1 | 1.0 | 0.5 | −13.100 | 500.169 | 500.340 | 497.480 | 0.938 | −20.281 | 457.657 | 458.107 | 457.652 | 0.957 |
| 1 | 1.0 | 0.7 | −11.600 | 465.061 | 465.206 | 454.898 | 0.944 | −4.168 | 428.946 | 428.966 | 430.929 | 0.941 |
| 2 | 0.3 | 0.3 | 317.395 | 496.285 | 589.100 | 496.722 | 0.873 | −10.674 | 501.787 | 501.900 | 497.296 | 0.945 |
| 2 | 0.3 | 0.5 | 559.884 | 620.296 | 835.606 | 612.199 | 0.826 | −10.036 | 621.620 | 621.701 | 613.743 | 0.943 |
| 2 | 0.3 | 0.7 | 658.430 | 663.681 | 934.881 | 670.392 | 0.849 | 4.927 | 621.533 | 621.552 | 618.646 | 0.932 |
| 2 | 0.6 | 0.3 | 163.339 | 486.410 | 513.103 | 478.177 | 0.917 | 5.585 | 437.106 | 437.142 | 435.284 | 0.949 |
| 2 | 0.6 | 0.5 | 288.481 | 550.581 | 621.579 | 555.227 | 0.919 | 14.533 | 495.363 | 495.576 | 504.191 | 0.954 |
| 2 | 0.6 | 0.7 | 286.414 | 549.552 | 619.710 | 550.980 | 0.927 | 16.611 | 472.103 | 472.395 | 490.128 | 0.952 |
| 2 | 1.0 | 0.3 | 36.000 | 460.734 | 462.139 | 453.857 | 0.951 | 27.417 | 398.346 | 399.288 | 406.952 | 0.945 |
| 2 | 1.0 | 0.5 | 45.700 | 495.830 | 497.932 | 497.505 | 0.954 | 35.270 | 434.118 | 435.548 | 451.011 | 0.956 |
| 2 | 1.0 | 0.7 | 32.400 | 458.891 | 460.033 | 456.942 | 0.951 | 31.639 | 410.367 | 411.585 | 425.044 | 0.959 |

Lee and Tsai’s estimator is not included, since the additional independent censoring is not incorporated in this approach. The values of Bias, Std, sqrt(MSE) and AES have been multiplied by 10,000. Bias: the empirical bias; Std: the empirical standard deviation; sqrt(MSE): the square root of the empirical mean squared error; AES: the average estimated standard deviation; CR: the empirical coverage rate of the 95 % confidence interval.


Fig. 2 Kaplan–Meier estimate and NPMLE. [Plot of the survival function estimate (survival rate against t, for t from 0 to 100): the NPMLE, the Kaplan–Meier estimator and the empirical distribution, with pointwise 95 % confidence intervals based on the NPMLE approach and on the Kaplan–Meier approach.]

For the cases with λ = 1, when T was independent of U, and the cases with θ = 1, when (X, Δ) was fully observed, the Kaplan–Meier estimator was nearly unbiased and consistent and the corresponding 95 % confidence intervals provided reasonable coverage rates; the proposed NPMLE and Lee and Tsai’s estimator were also nearly unbiased and consistent, and the coverage rates of the proposed 95 % confidence intervals were close to the nominal level. Nevertheless, the proposed NPMLE had smaller sqrt(MSE) than the Kaplan–Meier estimator.

For the cases with λ = 2 and θ < 1, the Kaplan–Meier estimator had larger bias and appeared inconsistent, while the other two estimators performed well.
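The inconsistency of the Kaplan–Meier estimator under dependent censoring can be illustrated with a toy simulation. The sketch below is not the paper's semi-Markov design: the exponential lifetime, the drop-out rule (half of the short-lived subjects leave the study at half their lifetime), the sample size, the seed and the function name `kaplan_meier_at` are all assumptions chosen only to make the effect visible.

```python
import numpy as np

def kaplan_meier_at(times, events, t):
    """Kaplan-Meier estimate of S(t) from follow-up times and event indicators."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    # Sort by time; at tied times deaths are placed before censorings.
    order = np.lexsort((~events, times))
    t_sorted, d_sorted = times[order], events[order]
    # Number still at risk just before each sorted observation.
    at_risk = len(times) - np.arange(len(times))
    # Each death up to time t contributes a factor (1 - 1 / at_risk).
    factors = np.where(d_sorted & (t_sorted <= t), 1.0 - 1.0 / at_risk, 1.0)
    return float(factors.prod())

rng = np.random.default_rng(2015)
n = 20000
T = rng.exponential(1.0, size=n)                  # true lifetimes, S(t) = exp(-t)

# Toy dependent censoring: subjects with T below the median leave the study
# with probability 1/2, and they do so at half of their (unobserved) lifetime.
short_lived = T <= np.log(2.0)
leaves = short_lived & (rng.random(n) < 0.5)
X = np.where(leaves, 0.5 * T, T)                  # observed follow-up time
delta = ~leaves                                   # True if the death is observed

t0 = 0.6
km = kaplan_meier_at(X, delta, t0)
truth = np.exp(-t0)
bias = km - truth
print(f"KM = {km:.3f}, truth = {truth:.3f}, bias = {bias:.3f}")
```

Because the subjects who drop out are exactly those at higher risk, the Kaplan–Meier estimate in this toy setting stays well above the true survival probability however large n is, whereas with no drop-out the same function reduces to the empirical survival proportion.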

5.2 An example

We illustrate our method with the data from a clinical trial of lung cancer conducted by the Eastern Cooperative Oncology Group, which was initially analyzed by Lagakos and Williams (1978). The data were also used by Lee and Wolfe (1998) and Lee and Tsai (2005) for the illustration of their methods. The complete data can be found in Lee and Tsai (2005). The study consists of 61 patients with inoperable carcinoma of the lung who were treated with the drug cyclophosphamide. Among the 61 patients, 28 patients experienced metastatic disease or a significant increase in the size of their


Fig. 3 Lee and Tsai’s estimator and NPMLE. [Plot of the survival function estimate (survival rate against t, for t from 0 to 100): the NPMLE, Lee and Tsai’s estimator and the empirical distribution, with pointwise 95 % confidence intervals based on the NPMLE approach.]

primary lesion, which are certain types of PD. In addition, Lee and Tsai (2005) did a pseudo second stage sampling, in which they selected 10 of these 28 patients as a random sample and pretended that the death times of the remaining 18 patients were unknown.

For illustration, we also pretended that these 18 patients’ death times were unknown. We treated the 10 selected patients as if they did not leave the study while the remaining 18 did. Figure 2 shows the NPMLE and its pointwise 95 % confidence intervals along with the Kaplan–Meier estimator and its pointwise 95 % confidence intervals and the empirical distribution based on the complete data. Figure 3 presents the NPMLE and its pointwise 95 % confidence intervals along with Lee and Tsai’s estimator and the empirical distribution based on the complete data.
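For readers who want to reproduce a Kaplan–Meier curve with pointwise intervals of the kind plotted in Figure 2, the following is a minimal sketch. It assumes Greenwood's variance formula with a Wald-type interval truncated to [0, 1]; the text does not say which interval the authors used for the Kaplan–Meier estimator, and the NPMLE intervals rely on the paper's information-operator covariance estimator, which is not reproduced here. The function name `km_with_greenwood` and the small data set are hypothetical and are not the lung cancer data.

```python
import numpy as np

def km_with_greenwood(times, events, z=1.959964):
    """Return (time, S_hat, lower, upper) at each distinct observed death time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv, var_sum, rows = 1.0, 0.0, []
    for s in np.unique(times[events]):
        n_risk = int(np.sum(times >= s))               # subjects still under observation
        d = int(np.sum(events & (times == s)))         # deaths observed at time s
        surv *= 1.0 - d / n_risk                       # product-limit update
        if n_risk > d:
            var_sum += d / (n_risk * (n_risk - d))     # Greenwood increment
        se = surv * np.sqrt(var_sum)
        rows.append((float(s), surv, max(surv - z * se, 0.0), min(surv + z * se, 1.0)))
    return rows

# Hypothetical follow-up times (1 = death observed, 0 = censored).
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]

rows = km_with_greenwood(times, events)
for s, est, lo, hi in rows:
    print(f"t = {s:4.1f}  S = {est:.3f}  95% CI = ({lo:.3f}, {hi:.3f})")
```

Such intervals are only meaningful when censoring is independent of the death time, which is the point of the comparison in Figure 2.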

Figure 2 clearly shows that the Kaplan–Meier estimator is very different from the empirical distribution, and far away from the other estimators. Figures 2 and 3 show that both the NPMLE and Lee and Tsai’s estimator are very close to the empirical distribution. The proposed pointwise 95 % confidence interval contains Lee and Tsai’s estimator and the empirical distribution.

6 Discussion

In oncology studies, in addition to the usual independent censoring, the censoring caused by the PD is a certain type of dependent censoring. In this paper, we have adopted the semi-Markov model provided by Lee and Tsai (2005) with an additional independent censoring and studied the NPMLE of the cause specific cumulative hazard functions and the marginal survival function. We have shown the asymptotic normality of the NPMLE in the space of uniformly bounded functions and established its asymptotic efficiency. We have also developed uniformly consistent estimators for the covariance function based on the information operator.

A limitation of the model (1) is that it assumes that the risk of death after disease progression only depends on the time since progression, but not the duration of the progression. The limitation could be addressed by regression type models with the inclusion of the time to progression as a covariate. It is of interest to develop diagnostic methods for the model and to investigate regression type models in future studies.

Acknowledgments The research is supported by the National Natural Science Foundation of China (11271081) and the Student Growth Fund Scholarship of School of Management, Fudan University.

Appendix

Preliminary

Lemmas 1 and 2 provide related Donsker properties.

Lemma 1 The classes {(x, δ) ↦ δI{x ≤ t} : t ≥ 0} and {(x, δ) ↦ δI{x ≥ t} : t ≥ 0} are Donsker classes of functions on R+ × {0, 1}.

Proof Combining Lemma 2.6.15 and Lemma 2.6.18(iii, vi) of Vaart and Wellner (1996), it can be shown that {(x, δ) ↦ δI{x ≤ t} : t ≥ 0} and {(x, δ) ↦ δI{x ≥ t} : t ≥ 0} are VC-classes on R+ × {0, 1}. By Theorem 2.6.8 of Vaart and Wellner (1996), they are Donsker classes, where the measurability conditions could be verified via the denseness of the rational numbers.

Lemma 2 Let η be a positive real number and Λ be a nondecreasing cadlag function on [0, η]. The class

\[
\left\{ (x, \delta) \mapsto \int_0^t \delta\, I\{x \ge y\}\, \Lambda(dy) : t \in [0, \eta] \right\}
\]

is a Donsker class of functions on R+ × {0, 1}.

Proof The result follows by combining Example 2.6.21 and Example 2.10.8 of Vaart and Wellner (1996).

Lemma 3 Let η be a positive real number and BV([0, η]) be the space of all cadlag functions defined on [0, η] whose total variations are bounded by 2. For any (F, G, H) ∈ BV([0, η]) × BV([0, η]) × BV([0, η]) and any t ∈ [0, η], let

$$\phi(F, G, H)[t] = F(t)\, G(t) - \int_{[0,t]} G(u)\, H(t - u)\, F(du).$$


For any F0, G0, H0 ∈ BV([0, η]), φ is Hadamard differentiable at (F0, G0, H0) with derivative φ′(F0, G0, H0), where for any t ∈ [0, η],

$$
\begin{aligned}
\phi'(\beta_0)[\beta][t] ={}& F_0(t)\, G(t) + F(t)\, G_0(t) - \int_{[0,t]} G_0(x)\, H_0(t - x)\, F(dx) \\
& - \int_{[0,t]} G_0(x)\, H(t - x)\, F_0(dx) - \int_{[0,t]} G(x)\, H_0(t - x)\, F_0(dx),
\end{aligned}
$$

where β = (F, G, H) and β0 = (F0, G0, H0).

Proof Let F0, G0, H0, F, G, H ∈ BV([0, η]), {(Fm, Gm, Hm) : m ∈ N} be a sequence converging to (F, G, H) in BV([0, η]) × BV([0, η]) × BV([0, η]) and {hm : m ∈ N} be a sequence of real numbers converging to 0.

Denote β0 = (F0, G0, H0) and βm = β0 + hm(Fm, Gm, Hm). Note that for any t ∈ [0, η], $\phi(\beta_m)[t] = \phi(\beta_0)[t] + h_m \Gamma_m(t) + h_m^2 W_m(t)$, where

$$
\begin{aligned}
\Gamma_m(t) ={}& F_0(t)\, G_m(t) + F_m(t)\, G_0(t) - \int_{[0,t]} G_0(x)\, H_0(t - x)\, F_m(dx) \\
& - \int_{[0,t]} G_0(x)\, H_m(t - x)\, F_0(dx) - \int_{[0,t]} G_m(x)\, H_0(t - x)\, F_0(dx), \\
W_m(t) ={}& F_m(t)\, G_m(t) - \int_{[0,t]} H_0(t - x)\, G_m(x)\, F_m(dx) \\
& - \int_{[0,t]} G_0(x)\, H_m(t - x)\, F_m(dx) - \int_{[0,t]} G_m(x)\, H_m(t - x)\, F_0(dx) \\
& - h_m \int_{[0,t]} G_m(x)\, H_m(t - x)\, F_m(dx).
\end{aligned}
$$
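The decomposition can be checked by direct expansion. The sketch below reads βm as β0 + hm(Fm, Gm, Hm), which is the form the expressions for Γm and Wm require, and uses the shorthand $\int A\,B\,dC = \int_{[0,t]} A(x)\, B(t - x)\, C(dx)$ (a notation introduced here only for brevity). Collecting terms by powers of hm:

```latex
\begin{align*}
(F_0 + h_m F_m)(t)\,(G_0 + h_m G_m)(t)
  &= F_0(t) G_0(t) + h_m \{F_0(t) G_m(t) + F_m(t) G_0(t)\} + h_m^2 F_m(t) G_m(t), \\
\int (G_0 + h_m G_m)\,(H_0 + h_m H_m)\, d(F_0 + h_m F_m)
  &= \int G_0 H_0\, dF_0 \\
  &\quad + h_m \Big\{ \int G_0 H_0\, dF_m + \int G_0 H_m\, dF_0 + \int G_m H_0\, dF_0 \Big\} \\
  &\quad + h_m^2 \Big\{ \int G_m H_0\, dF_m + \int G_0 H_m\, dF_m + \int G_m H_m\, dF_0 \Big\} \\
  &\quad + h_m^3 \int G_m H_m\, dF_m .
\end{align*}
```

Subtracting the second line from the first gives φ(β0)[t] at order zero, Γm(t) at order hm, and the remaining terms, after factoring out hm², give Wm(t), including its final term that carries one extra factor hm.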

For any t ∈ [0, η], let

$$
\begin{aligned}
\Gamma_0(t) ={}& F_0(t)\, G(t) + F(t)\, G_0(t) - \int_{[0,t]} G_0(x)\, H_0(t - x)\, F(dx) \\
& - \int_{[0,t]} G_0(x)\, H(t - x)\, F_0(dx) - \int_{[0,t]} G(x)\, H_0(t - x)\, F_0(dx).
\end{aligned}
$$

Note that for any t ∈ [0, η], $|W_m(t)| \le 28 + 8 h_m$ and

$$|\Gamma_m(t) - \Gamma_0(t)| \le 6 \sup_{s \in [0,\eta]} |F_m(s) - F(s)| + 6 \sup_{s \in [0,\eta]} |G_m(s) - G(s)| + 4 \sup_{s \in [0,\eta]} |H_m(s) - H(s)|.$$

Hence, as m → ∞, $\sup_{t \in [0,\eta]} |W_m(t)| = O(1)$ and $\sup_{t \in [0,\eta]} |\Gamma_m(t) - \Gamma_0(t)| = o(1)$.


Therefore, φ is Hadamard differentiable at β0 and for any t ∈ [0, η],

$$
\begin{aligned}
\phi'(\beta_0)[\beta][t] ={}& F_0(t)\, G(t) + F(t)\, G_0(t) - \int_{[0,t]} G_0(x)\, H_0(t - x)\, F(dx) \\
& - \int_{[0,t]} G_0(x)\, H(t - x)\, F_0(dx) - \int_{[0,t]} G(x)\, H_0(t - x)\, F_0(dx),
\end{aligned}
$$

where β = (F, G, H).
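As a numerical sanity check of Lemma 3, the following sketch discretizes F, G and H on an equally spaced grid, treats F as a step function with F(0−) = 0, and compares the increment of φ with the claimed derivative. The grid, the particular functions and the helper names `conv_term`, `phi`, `dphi` and `remainder` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def conv_term(G, H, F):
    """Grid version of t -> int_[0,t] G(x) H(t - x) F(dx).

    F is treated as a step function with F(0-) = 0, so its jump at the
    k-th grid point is F[k] - F[k-1], and F[0] at the origin.
    """
    dF = np.diff(F, prepend=0.0)
    out = np.empty(len(F))
    for j in range(len(F)):
        k = np.arange(j + 1)          # grid points x_k <= t_j
        out[j] = np.sum(G[k] * H[j - k] * dF[k])
    return out


def phi(F, G, H):
    """The map phi(F, G, H)[t] of Lemma 3, evaluated on the grid."""
    return F * G - conv_term(G, H, F)


def dphi(F0, G0, H0, F, G, H):
    """The claimed derivative phi'(beta0)[beta], evaluated on the grid."""
    return (F0 * G + F * G0
            - conv_term(G0, H0, F)
            - conv_term(G0, H, F0)
            - conv_term(G, H0, F0))


grid = np.linspace(0.0, 1.0, 51)
# Base point beta0 and direction beta (arbitrary smooth choices).
F0, G0, H0 = np.exp(-grid), np.exp(-0.5 * grid), np.exp(-2.0 * grid)
F, G, H = 0.5 * np.cos(grid), 0.3 * grid, 0.2 * np.sin(3.0 * grid)


def remainder(h):
    """sup_t |phi(beta0 + h beta) - phi(beta0) - h phi'(beta0)[beta]|."""
    diff = phi(F0 + h * F, G0 + h * G, H0 + h * H) - phi(F0, G0, H0)
    return np.max(np.abs(diff - h * dphi(F0, G0, H0, F, G, H)))


for h in (1e-1, 1e-2, 1e-3):
    # The ratio remainder(h) / h should shrink roughly in proportion to h.
    print(h, remainder(h) / h)
```

Because the discretized φ is a cubic polynomial in h, the remainder is of order h², which is what the printed ratios are meant to reflect. The check uses a fixed direction rather than a converging sequence of directions, so it illustrates the derivative formula but not the full Hadamard property.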

Proof of Theorems 1 to 3

For any (i, j) ∈ {(0, 1), (0, 2), (1, 2)} and any t ≥ 0, define

$$M_n^{(i,j)}(t) = N_n^{(i,j)}(t) - \int_{[0,t]} Y_n^{(i)}(y)\, \Lambda_0^{(i,j)}(dy).$$
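To illustrate the centring in this definition, the sketch below simulates a much simpler setting than the three-state model of the paper: a single transition with constant hazard and independent exponential censoring. The distributions, the parameter names `lam` and `mu`, and the function name `scaled_martingale` are assumptions made for illustration only. In this simplified setting the compensator is lam times the total time at risk up to t, and n^{-1/2} M_n(t) has mean zero and variance P(δ = 1, X ≤ t).

```python
import numpy as np


def scaled_martingale(n, t, lam, mu, rng):
    """Return n^{-1/2} * (N_n(t) - int_0^t Y_n(y) lam dy) for one sample.

    Event times are Exp(lam), censoring times are Exp(mu), independent.
    """
    event = rng.exponential(1.0 / lam, size=n)
    censor = rng.exponential(1.0 / mu, size=n)
    x = np.minimum(event, censor)            # observed time
    delta = (event <= censor).astype(float)  # 1 if the event is observed
    n_t = np.sum(delta * (x <= t))           # counting process N_n(t)
    # int_0^t I{x_i >= y} lam dy = lam * min(x_i, t), summed over subjects
    compensator = lam * np.sum(np.minimum(x, t))
    return (n_t - compensator) / np.sqrt(n)


lam, mu, t = 1.0, 0.5, 1.0
rng = np.random.default_rng(0)
draws = np.array([scaled_martingale(200, t, lam, mu, rng) for _ in range(2000)])

# Variance of n^{-1/2} M_n(t) in this toy model: P(delta = 1, X <= t).
target_var = lam / (lam + mu) * (1.0 - np.exp(-(lam + mu) * t))
print(draws.mean(), draws.var(), target_var)
```

The sample mean should be close to zero and the sample variance close to `target_var`, up to Monte Carlo error.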

Proof Combining Theorem 2.10.6 of Vaart and Wellner (1996) with Lemmas 1 and 2, it can be shown that, as n → ∞,

$$n^{-1/2}\left(M_n^{(0,1)},\, M_n^{(0,2)},\, M_n^{(1,2)},\, Y_n^{(0)} - EY_n^{(0)},\, Y_n^{(1)} - EY_n^{(1)}\right)$$

weakly converges to a tight zero mean Gaussian process in

$$\ell^\infty_5([0, \tau]) = \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau]) \times \ell^\infty([0, \tau]).$$

Combining this with Lemma 3.9.17, Lemma 3.9.25, and Theorem 3.9.4 of Vaart and Wellner (1996), for any (i, j) ∈ {(0, 1), (0, 2), (1, 2)},

$$\sup_{t \in [0,\tau]} \left| \Lambda_n^{(i,j)}(t) - \Lambda_0^{(i,j)}(t) - \int_{[0,t]} \left(L_0^{(i)}(y-)\right)^{-1} n^{-1} M_n^{(i,j)}(dy) \right| = o_p\left(n^{-1/2}\right). \tag{10}$$

By the continuity and the linearity of the integral operator, as n → ∞,

$$n^{1/2}\left(\Lambda_n^{(0,1)} - \Lambda_0^{(0,1)},\, \Lambda_n^{(0,2)} - \Lambda_0^{(0,2)},\, \Lambda_n^{(1,2)} - \Lambda_0^{(1,2)}\right)$$

weakly converges to a tight Gaussian process in $\ell^\infty_3([0, \tau])$ with covariance function U0.

By Lemma 3.9.30 and Theorem 3.9.4 of Vaart and Wellner (1996), as n → ∞,

$$n^{1/2}\left(S_n^{(0,1)} - S_0^{(0,1)},\, S_n^{(0,2)} - S_0^{(0,2)},\, S_n^{(1,2)} - S_0^{(1,2)}\right)$$

weakly converges to a tight Gaussian process in $\ell^\infty_3([0, \tau])$ with covariance function V0.


Combining Lemma 3 with Theorem 3.9.4 of Vaart and Wellner (1996),

$$\sup_{t \in [0,\tau]} \left| S_n(t) - S_0(t) - \phi'(\beta_0)[\beta_n - \beta_0][t] \right| = o_p\left(n^{-1/2}\right),$$

where $\beta_0 = \left(S_0^{(0,1)}, S_0^{(0,2)}, S_0^{(1,2)}\right)$ and $\beta_n = \left(S_n^{(0,1)}, S_n^{(0,2)}, S_n^{(1,2)}\right)$.

Therefore, as n → ∞, $n^{1/2}(S_n - S_0)$ weakly converges to a tight zero mean Gaussian process in $\ell^\infty([0, \tau])$ with covariance function Ω0.

Proof of Theorems 4 and 5

Proof To obtain the efficiency, we will verify the conditions of Theorem VIII.3.2 and Theorem VIII.3.3 of Andersen et al. (1993).

First, we verify the local asymptotic normality (LAN) Assumption (Assumption VIII.3.1 of Andersen et al. 1993).

By the Taylor expansion, it is easy to show that for any $h = (h^{(0,1)}, h^{(0,2)}, h^{(1,2)}) \in H$,

$$\log R_n(h) = S_n^{(0,1)}(h) + S_n^{(0,2)}(h) + S_n^{(1,2)}(h) - \frac{1}{2}\|h\|_H^2 + o_p(1), \tag{11}$$

where for any (i, j) ∈ {(0, 1), (0, 2), (1, 2)} and any $h = (h^{(0,1)}, h^{(0,2)}, h^{(1,2)}) \in H$,

$$S_n^{(i,j)}(h) = n^{-1/2} \int_{[0,\tau]} h^{(i,j)}(y) \left(1 - \Lambda_0^{(i,j)}\{y\}\right)^{-1} M_n^{(i,j)}(dy).$$

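Hence, for any m ∈ N+ and any h1, . . . , hm ∈ H,

$$\left(\log R_n(h_1), \ldots, \log R_n(h_m)\right)$$

weakly converges to a Gaussian random vector in $\mathbb{R}^m$ with mean

$$-\frac{1}{2}\left(\|h_1\|_H^2, \ldots, \|h_m\|_H^2\right)$$

and covariance matrix

$$
\begin{pmatrix}
\langle h_1, h_1 \rangle_H & \cdots & \langle h_1, h_m \rangle_H \\
\vdots & \ddots & \vdots \\
\langle h_m, h_1 \rangle_H & \cdots & \langle h_m, h_m \rangle_H
\end{pmatrix}.
$$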
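The pairing of mean and covariance in this limit can be sanity-checked with the Gaussian moment generating function alone. The sketch below assumes only that, for a single direction h, the limit Z of log Rn(h) is normal with variance σ² = ⟨h, h⟩_H; the symbols Z, μ and σ² are introduced here for illustration.

```latex
\begin{align*}
Z \sim N(\mu, \sigma^2)
  \quad &\Longrightarrow \quad E\, e^{Z} = \exp\Big(\mu + \tfrac{1}{2}\sigma^2\Big), \\
E\, e^{Z} = 1
  \quad &\Longleftrightarrow \quad \mu = -\tfrac{1}{2}\sigma^2 = -\tfrac{1}{2}\|h\|_H^2 .
\end{align*}
```

So a limiting likelihood ratio with expectation one is compatible with a normal limit of variance ‖h‖²_H only when the mean is −‖h‖²_H/2, which matches the centring term in (11).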

Next, we verify the Differentiability Assumption (Assumption VIII.3.2 of Andersen et al. 1993).


For any (i, j) ∈ {(0, 1), (0, 2), (1, 2)}, any $h = (h^{(0,1)}, h^{(0,2)}, h^{(1,2)}) \in H$, and any t ∈ [0, τ],

$$n^{1/2}\left(\Lambda_{n,h}^{(i,j)}(t) - \Lambda_0^{(i,j)}(t)\right) = \int_{[0,t]} h^{(i,j)}(y)\, \Lambda_0^{(i,j)}(dy).$$

Note that, for any (i, j) ∈ {(0, 1), (0, 2), (1, 2)}, any $h = (h^{(0,1)}, h^{(0,2)}, h^{(1,2)}) \in H$, and any t ∈ [0, τ],

$$\left| \int_{[0,t]} h^{(i,j)}(y)\, \Lambda_0^{(i,j)}(dy) \right| \le \|h\|_H\, \Lambda_0^{(i,j)}(\tau) \left(L_0^{(i)}(\tau-)\right)^{-1}.$$

Hence, the Differentiability Assumption is verified.

Next, we show that $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ is regular.

Recall the weak convergence of the log-likelihood ratio log Rn(h). The continuity follows from Le Cam’s third lemma and the weak convergence can be shown via arguments similar to those in the previous subsection. Hence, the regularity follows.

It remains to check Equation (8.3.5) of Andersen et al. (1993) for each coordinate. For any t ∈ [0, τ], define

$$\gamma_t^{(0,1)} = \left(\varphi_t^{(0,1)}, 0, 0\right), \quad \gamma_t^{(0,2)} = \left(0, \varphi_t^{(0,2)}, 0\right), \quad \gamma_t^{(1,2)} = \left(0, 0, \varphi_t^{(1,2)}\right),$$

where for any (i, j) ∈ {(0, 1), (0, 2), (1, 2)} and any s ∈ [0, τ],

$$\varphi_t^{(i,j)}(s) = I\{s \le t\}\, L_0^{(i)}(s-)^{-1} \left(1 - \Lambda_0^{(i,j)}\{s\}\right).$$

By (10) and (11), for any (i, j) ∈ {(0, 1), (0, 2), (1, 2)} and any t ∈ [0, τ],

$$\Lambda_n^{(i,j)}(t) - \Lambda_0^{(i,j)}(t) - n^{-1/2}\left(\log R_n\left(\gamma_t^{(i,j)}\right) + \frac{1}{2}\left\|\gamma_t^{(i,j)}\right\|_H^2\right) = o_p\left(n^{-1/2}\right).$$
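The step from (10) and (11) to this display can be spelled out as follows. This is a sketch that reads the jump term in ϕ_t^{(i,j)} as evaluated at the function's own argument, and uses that γ_t^{(i,j)} has a single nonzero coordinate, so only one of the three terms S_n^{(k,l)} in (11) survives.

```latex
\begin{align*}
S_n^{(i,j)}\big(\gamma_t^{(i,j)}\big)
  &= n^{-1/2} \int_{[0,\tau]} \varphi_t^{(i,j)}(y)
     \big(1 - \Lambda_0^{(i,j)}\{y\}\big)^{-1} M_n^{(i,j)}(dy) \\
  &= n^{-1/2} \int_{[0,t]} \big(L_0^{(i)}(y-)\big)^{-1} M_n^{(i,j)}(dy), \\
n^{-1/2}\Big(\log R_n\big(\gamma_t^{(i,j)}\big)
     + \tfrac{1}{2}\big\|\gamma_t^{(i,j)}\big\|_H^2\Big)
  &= n^{-1/2}\, S_n^{(i,j)}\big(\gamma_t^{(i,j)}\big) + o_p\big(n^{-1/2}\big) \\
  &= \int_{[0,t]} \big(L_0^{(i)}(y-)\big)^{-1} n^{-1} M_n^{(i,j)}(dy)
     + o_p\big(n^{-1/2}\big).
\end{align*}
```

The last integral is exactly the linear term appearing in (10), so subtracting it from Λn^{(i,j)}(t) − Λ0^{(i,j)}(t) leaves a remainder of order o_p(n^{-1/2}).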

Therefore, combining all the above results with Theorem VIII.3.2 and Theorem VIII.3.3 of Andersen et al. (1993), it follows that $\left(\Lambda_n^{(0,1)}, \Lambda_n^{(0,2)}, \Lambda_n^{(1,2)}\right)$ is asymptotically efficient.

Combining Theorem VIII.3.4 of Andersen et al. (1993) with Lemma 3, the efficiency of Sn follows.

References

Andersen PK, Borgan O, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York

Datta S, Satten GA, Datta S (2000) Nonparametric estimation for the three-stage irreversible illness-death model. Biometrics 56:841–847

Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27(4):887–906


Lagakos SW, Williams JS (1978) Models for censored survival analysis: a cone class of variable-sum models. Biometrika 65(1):181–189

Lee SY, Tsai WY (2005) An estimator of the survival function based on the semi-markov model under dependent censorship. Lifetime Data Anal 11:193–211

Lee SY, Wolfe RA (1998) A simple test for independent censoring under the proportional hazards model. Biometrics 54:1176–1182

Scholz FW (1980) Towards a unified definition of maximum likelihood. Can J Stat 8:193–203

Tsiatis AA (1975) A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci USA 72:20–22

Van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York
