Ann. Inst. Statist. Math.

Vol. 45, No. 2, 249-264 (1993)

NONPARAMETRIC ESTIMATION OF HAZARD FUNCTIONS AND THEIR DERIVATIVES UNDER TRUNCATION MODEL*

ÜLKÜ GÜRLER¹ AND JANE-LING WANG²

¹Bilkent University, Faculty of Industrial Engineering, Ankara, Turkey
²Division of Statistics, University of California, Davis, CA 95616-8705, U.S.A.

(Received August 26, 1991; revised September 16, 1992)

Abstract. Nonparametric kernel estimators for hazard functions and their derivatives are considered under the random left truncation model. The estimator takes the form of a sum of identically distributed but dependent random variables. Exact and asymptotic expressions for the biases and variances of the estimators are derived. Mean square consistency and local asymptotic normality of the estimators are established. Adaptive local bandwidths are obtained by estimating the optimal bandwidths consistently.

Key words and phrases: Adaptive bandwidth choice, consistency, Hájek projection, kernel estimate, mean square error, tightness.

1. Introduction

Let $X$ be a random variable (r.v.) of interest, referred to as the lifetime. In practice the observation of $X$ may be prevented by another independent random variable $Y$, called the truncation variable. Suppose $(X_i, Y_i)$, $i = 1, \ldots, N$, is a random sample of $(X, Y)$. Then, under the random left truncation model, one observes only those i.i.d. pairs $(X_i, Y_i)$ for which $Y_i < X_i$. We index those observed pairs by $i = 1, \ldots, n$. The left truncation model is similar to the left censoring model studied by Csörgő and Horváth (1980), but the number of observations $n$ in the former is a random variable.
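As a small illustration of the sampling scheme just described, the following Python sketch draws $N$ latent pairs and keeps only those with $Y_i < X_i$, so that the observed sample size $n$ is random. The exponential distributions, their rates, and the function name `simulate_left_truncated` are assumptions made purely for illustration; the model itself does not require them.

```python
import random

def simulate_left_truncated(N, rate_x=1.0, rate_y=2.0, seed=0):
    """Draw N latent pairs (X, Y) and return only the observed ones (Y < X).

    X ~ Exp(rate_x) and Y ~ Exp(rate_y) are illustrative choices only.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(N):
        x = rng.expovariate(rate_x)  # lifetime
        y = rng.expovariate(rate_y)  # truncation variable, drawn independently
        if y < x:                    # the pair is observed only if Y < X
            observed.append((x, y))
    return observed

sample = simulate_left_truncated(N=1000)
n = len(sample)  # random number of observed pairs, n <= N
```

Under these assumed distributions the probability of observing a pair is rate_y / (rate_x + rate_y) = 2/3, so $n$ fluctuates around two thirds of $N$ from one simulated sample to the next.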

Much of the literature has been devoted to the censoring model, and statistical interest in the truncation model arose only more recently, partly due to its applicability to AIDS data (Lui et al. (1986), Lagakos et al. (1988), Kalbfleisch and Lawless (1989)). More applications of the random left truncation model can be found in Allredge and Gates (1985), among others.

* Research supported by Air Force Grant AFOSR 89-0386. Part of the work of Ülkü Gürler was done while she was a Ph.D. student at the Department of Statistics, the Wharton School of the University of Pennsylvania.

Let $F$ and $G$ be the (right continuous) distribution functions (d.f.) of $X$ and $Y$ respectively. The nonparametric maximum likelihood estimator (MLE) of $F$ was first suggested by Lynden-Bell (1971) and studied by Woodroofe (1985), Wang et al. (1986), Chao and Lo (1988), Gu and Lai (1990) and Keiding and Gill (1990). In this paper, our interest focuses on the hazard function $\lambda$ of $F$ defined as

\[
\lambda(z) = f(z)/[1 - F(z)], \quad \text{for } F(z) < 1, \tag{1.1}
\]

where $f$ is the probability density function of $F$. The hazard function is important for the assessment of risks and has been studied extensively for randomly censored data, e.g., hazard estimates of the type (2.10) were studied by Ramlau-Hansen (1983), Tanner and Wong (1983), Yandell (1983), Diehl and Stute (1988) and Müller and Wang (1990), among others. However, little is known about hazard estimation for truncated data, although this problem is of applied interest. For example, in the Channing House data in Hyde (1977), $F$ is the lifetime distribution for males, and $\lambda(t)$ is their hazard at age $t$, which is of demographic interest. However, the lifetime $X$ is subject to left truncation since the data consist of only those males who were alive at the time the study started; thus the truncation variable $Y$ is the age at entry into the study. A detailed analysis of these data can be found in Gürler and Wang (1992).

Although our main interest is the hazard function itself, we consider the more general problem of estimating its $r$-th derivative, $\lambda^{(r)}$, for $r \ge 0$. One motivation is that these derivatives are involved in the choice of data-dependent optimal bandwidths. We consider the kernel hazard estimator $\hat{\lambda}^{(r)}$ in (2.10) of $\lambda^{(r)}$, obtained by convolving the kernel with a cumulative hazard estimator. Exact and asymptotic expansions for the mean and variance of $\hat{\lambda}^{(r)}(z)$ are given in Theorems 3.1 and 3.2, which then imply the mean square consistency of $\hat{\lambda}^{(r)}(z)$. The computation of the variance term of $\hat{\lambda}^{(r)}(z)$ (cf. (3.2) and Appendix A) is much more complicated under the present truncation model than under the censoring model. Asymptotic normality of $\hat{\lambda}^{(r)}(z)$ is obtained in Theorem 3.3 via the Hájek projection method (Hájek (1968)).

It is well known that the choice of bandwidths is crucial for the quality of the resulting kernel estimate and that the optimal local bandwidth depends on the curvature at a point. This was first noticed for kernel density estimators by Parzen ((1962), equation (4.15)). This effect is magnified, even for i.i.d. observations, for kernel estimates of hazard functions, since the variance of $\hat{\lambda}(z)$ tends to infinity as $z$ tends to the right end of the support of $F$. The left truncation scheme further complicates the situation, and the variance of $\hat{\lambda}^{(r)}(z)$ also blows up as $z$ tends to zero (cf. (3.5)). Local bandwidth choice is therefore considered here instead of a global one. The optimal local bandwidth $b^*$ depends on unknown quantities (cf. (3.8)). We show in Theorem 4.1 that any consistent estimator of it gives rise to a kernel hazard estimator with the same limiting distribution as the kernel hazard estimator employing the optimal bandwidth. Such procedures provide efficient methods for hazard estimation, and the resulting bandwidths are called locally adaptive bandwidths. Some choices of locally adaptive bandwidths are given in Section 4. Lengthy proofs are relegated to the Appendices.


2. Kernel hazard estimates

We shall assume without loss of generality that both $X$ and $Y$ are nonnegative random variables. We adopt Woodroofe's (1985) notation throughout the presentation. The cumulative hazard function of $F$ (or $X$) is:

\[
\Lambda(x) = \int_0^x \lambda(t)\,dt = -\log(1 - F(x)). \tag{2.1}
\]

For any d.f. $W$, define

\[
a_W = \inf\{t : W(t) > 0\} \quad \text{and} \quad b_W = \sup\{t : W(t) < 1\},
\]

as the left and right endpoints of the support of $W$. As Woodroofe (1985) points out, in random truncation models, $F$ can be estimated completely only if $a_G \le a_F$. We shall assume this and put

\[
\alpha \equiv \alpha(F, G) = P(Y \le X) = \int G(x)\,dF(x) > 0. \tag{2.2}
\]

Let $H^*$ denote the joint distribution of the observed $(X, Y)$ pair, and $F^*$ and $G^*$ denote the corresponding marginal d.f.'s. Then

\[
H^*(x, y) = P(X \le x, Y \le y \mid Y \le X) = \alpha^{-1} \int_0^x G(\min(y, t))\,dF(t), \tag{2.3}
\]

\[
F^*(x) = H^*(x, \infty) = \alpha^{-1} \int_0^x G(t)\,dF(t), \tag{2.4}
\]

\[
G^*(y) = H^*(\infty, y) = \alpha^{-1} \int_0^\infty G(\min(y, t))\,dF(t). \tag{2.5}
\]

Theorem 1 of Woodroofe (1985) gives the following representation for the cumulative hazard $\Lambda$:

\[
\Lambda(x) = \int_0^x dF^*(z)/C(z), \tag{2.6}
\]

where

\[
C(z) = P(Y \le z \le X \mid Y \le X) = G^*(z) - F^*(z-) = \alpha^{-1} G(z)[1 - F(z-)]. \tag{2.7}
\]

Note that $C$ is not monotone and $C(z)$ tends to zero as $z$ tends to either $a_G$ or $b_F$. The representation (2.6) then suggests estimating $\Lambda(z)$ by

\[
\Lambda_n(z) = \int_0^z [C_n(x)]^{-1}\,dF_n^*(x) = \sum_{i : X_i \le z} [n C_n(X_i)]^{-1}, \tag{2.8}
\]

where $F_n^*$ and $C_n$ are the empirical functions of $F^*$ and $C$, e.g.,

\[
C_n(z) = \#\{i : Y_i \le z \le X_i\}/n. \tag{2.9}
\]

Notice that $C_n(X_i) \ge 1/n$; however, $C_n$ is not a monotone function. We consider the following kernel estimator for $\lambda^{(r)}(z)$, obtained by convolving a kernel $K_r$ with $\Lambda_n$ in (2.8):

\[
\hat{\lambda}^{(r)}(z) = \frac{1}{b^{r+1}} \int K_r\!\left(\frac{z - x}{b}\right) d\Lambda_n(x) = \sum_{i=1}^n \frac{K_{r,b}(z - X_i)}{n C_n(X_i)}, \tag{2.10}
\]

where $K_{r,b}(x) = b^{-(r+1)} K_r(x/b)$, and $b = b_n$ is the bandwidth sequence.
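The estimators (2.8)-(2.10) translate almost line by line into code. The sketch below is a plain quadratic-time Python rendering, not an implementation from the paper: the function names, the tiny data set, and the use of the Epanechnikov kernel as $K_0$ (so $r = 0$) are assumptions made for illustration.

```python
def c_n(z, xs, ys):
    """Empirical C_n(z) = #{i : Y_i <= z <= X_i} / n, as in (2.9)."""
    n = len(xs)
    return sum(1 for x, y in zip(xs, ys) if y <= z <= x) / n

def cum_hazard(z, xs, ys):
    """Lambda_n(z) = sum over X_i <= z of 1 / (n C_n(X_i)), as in (2.8)."""
    n = len(xs)
    return sum(1.0 / (n * c_n(x, xs, ys)) for x in xs if x <= z)

def epanechnikov(u):
    """Illustrative kernel K_0 with support [-1, 1] (an assumed choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_hazard(z, xs, ys, b, kernel=epanechnikov, r=0):
    """Kernel estimate (2.10): sum_i K_{r,b}(z - X_i) / (n C_n(X_i))."""
    n = len(xs)
    total = 0.0
    for x in xs:
        k = kernel((z - x) / b) / b ** (r + 1)   # K_{r,b}(z - X_i)
        if k != 0.0:
            # n C_n(X_i) >= 1 for observed pairs, so the division is safe
            total += k / (n * c_n(x, xs, ys))
    return total

# Three hypothetical observed pairs (X_i, Y_i) with Y_i < X_i.
xs = [2.0, 3.0, 5.0]
ys = [1.0, 0.5, 4.0]
lam_hat = kernel_hazard(3.0, xs, ys, b=1.0)
```

For these three pairs only $X_2 = 3$ contributes at $z = 3$ with $b = 1$, since the kernel vanishes at the edge of its support. Each call to `c_n` scans the whole sample, which keeps the sketch short; sorting the data first would be the natural refinement for larger $n$.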

To obtain the properties of $\hat{\lambda}^{(r)}$ we need to assume that:
(A1) for some $p \ge r$, $\lambda$ is $p$ times continuously differentiable at $z$.
As for the bandwidth sequence we require that:
(B1) $b_n \to 0$,
(B2) $n b_n^{2r+1} \to \infty$.
For the kernel function it is assumed that $K_r$ is a function of bounded variation with support $[-1, 1]$ and that it is a function of order $(r, p)$, i.e., $K_r$ satisfies

\[
K_r \in M_{r,p} = \left\{ q \in L^2[-1, 1] : \int q(x) x^j\,dx = \begin{cases} (-1)^r r! & j = r \\ 0 & 0 \le j < p,\ j \ne r \\ \ne 0 \text{ but finite} & j = p \end{cases} \right\}. \tag{2.11}
\]

Note that under (2.11) $K_r$ and $K_{r,b}(x)$ implicitly involve $p$; however, for brevity of notation this is suppressed.
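The moment conditions in (2.11) are easy to check numerically for a candidate kernel. The following sketch does this for two standard kernels that are not taken from the paper: the Epanechnikov kernel as a candidate of order $(0, 2)$, and the derivative of the biweight kernel as a candidate of order $(1, 3)$. The helper `moment` and the midpoint rule are illustrative choices.

```python
def moment(kernel, j, steps=20000):
    """Midpoint-rule approximation of the j-th moment of kernel on [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for i in range(steps):
        x = -1.0 + (i + 0.5) * h
        total += kernel(x) * x ** j
    return total * h

def k0(x):
    """Epanechnikov kernel, a candidate of order (0, 2)."""
    return 0.75 * (1.0 - x * x)

def k1(x):
    """Derivative of the biweight kernel, a candidate of order (1, 3)."""
    return -3.75 * x * (1.0 - x * x)

k0_moments = [moment(k0, j) for j in range(3)]   # j = 0, 1, 2
k1_moments = [moment(k1, j) for j in range(4)]   # j = 0, 1, 2, 3
```

Analytically the moments of `k0` are $1, 0, 1/5$ and those of `k1` are $0, -1, 0, -3/7$. For `k0` this matches (2.11) with $r = 0$, $p = 2$: the $j = 0$ moment equals $(-1)^0\, 0! = 1$ and the $j = 2$ moment is nonzero. For `k1` it matches $r = 1$, $p = 3$: the $j = 1$ moment equals $(-1)^1\, 1! = -1$ and the $j = 3$ moment is nonzero.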

3. Mean square consistency and asymptotic normality

We will derive in this section the properties of $\hat{\lambda}^{(r)}(z)$ for $a_G < z < b_F$. The notation of Sections 1 and 2 is used. All expectations hereafter are conditional on $n$, the number of observations.

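THEOREM 3.1. (Mean and variance)

\[
E(\hat{\lambda}^{(r)}(z)) = \int K_{r,b}(z - x)[1 - (1 - C(x))^n]\,d\Lambda(x) \tag{3.1}
\]

and

\[
\begin{aligned}
\operatorname{Var}(\hat{\lambda}^{(r)}(z)) ={}& \int K_{r,b}^2(z - x) I_n(C(x))\,d\Lambda(x) \\
&+ 2 \iint_{t<s} K_{r,b}(z - t) K_{r,b}(z - s) \Big\{ [1 - C(s)]^n - \frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P^n(s, t)\big] \\
&\qquad - [1 - C(t)]^n [1 - C(s)]^n \Big\}\,d\Lambda(t)\,d\Lambda(s),
\end{aligned} \tag{3.2}
\]

where for $0 \le y < 1$

\[
I_n(y) = \sum_{k=1}^n \frac{1}{k} \binom{n}{k} y^k (1 - y)^{n-k},
\]

\[
\begin{aligned}
P(s, t) &= P(\text{neither } s \text{ nor } t \text{ is in } [Y, X] \mid Y \le X) \\
&= 1 - \alpha^{-1}\big[G(s)[1 - F(s)] + G(t)[F(s) - F(t)]\big].
\end{aligned}
\]

PROOF. The proof is given in Appendix A. □

Remark. The $I_n$ function is also used in Watson and Leadbetter ((1964), formula (2.3)). Note that $n y I_n(y) \le 2$ and $n I_n(y)$ converges uniformly to $y^{-1}$ on any interval $[a, b]$ with $a > 0$ and $b < 1$.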
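The two properties of $I_n$ quoted in the Remark can be checked numerically. The sketch below assumes that $I_n(y) = \sum_{k=1}^n k^{-1} \binom{n}{k} y^k (1 - y)^{n-k}$, which is the form consistent with the identity $E([n C_n(X_i)]^{-2} \mid X_i) = I_n(C(X_i))/[n C(X_i)]$ used in Appendix A; the function name `i_n` and the log-scale evaluation are illustrative choices.

```python
import math

def i_n(y, n):
    """I_n(y) = sum_{k=1}^n (1/k) C(n, k) y^k (1 - y)^(n - k), 0 <= y <= 1."""
    if y <= 0.0:
        return 0.0          # every term carries a factor y^k with k >= 1
    if y >= 1.0:
        return 1.0 / n      # only the k = n term survives
    total = 0.0
    for k in range(1, n + 1):
        # binomial probability evaluated on the log scale to avoid overflow
        log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(y) + (n - k) * math.log(1.0 - y))
        total += math.exp(log_p) / k
    return total

n = 200
bound_values = [n * y * i_n(y, n) for y in (0.2, 0.5, 0.9)]  # each should be <= 2
ratio_at_half = n * i_n(0.5, n) * 0.5                        # close to 1 for large n
```

Each entry of `bound_values` stays below 2, and `ratio_at_half` is close to 1, in line with $n I_n(y) \to y^{-1}$. The bound itself can be seen from $1/k \le 2/(k + 1)$ for $k \ge 1$, which reduces it to the familiar mean of $1/(1 + B)$ for a binomial $B$.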

Asymptotic behavior of the bias term and the variance is given in the following theorem. The proof is in Appendix B.

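THEOREM 3.2. (a) For $p > r$ and under (A1),

\[
\operatorname{bias}(\hat{\lambda}^{(r)}(z)) = b^{p-r} \lambda^{(p)}(z) B_{r,p} + o(b^{p-r}), \tag{3.3}
\]

where

\[
B_{r,p} = \frac{(-1)^p}{p!} \int K_r(y) y^p\,dy. \tag{3.4}
\]

(b) If $z$ is a continuity point of $G$, then

\[
\operatorname{Var}(\hat{\lambda}^{(r)}(z)) = \frac{1}{n b^{2r+1}} \left\{ \frac{\lambda(z)}{C(z)} V_{r,p} + o(1) \right\}, \tag{3.5}
\]

where

\[
V_{r,p} = \int_{-1}^{1} K_r^2(y)\,dy. \tag{3.6}
\]

(c)

\[
\operatorname{MSE}(\hat{\lambda}^{(r)}(z)) = \frac{1}{n b^{2r+1}} \frac{\lambda(z)}{C(z)} V_{r,p} + \big(b^{p-r} \lambda^{(p)}(z) B_{r,p}\big)^2 + o\!\left(\frac{1}{n b^{2r+1}} + b^{2(p-r)}\right). \tag{3.7}
\]

COROLLARY 3.1. Under the conditions of Theorem 3.2,
(a) If (B1) holds, then $\hat{\lambda}^{(r)}(z)$ is asymptotically unbiased for $\lambda^{(r)}(z)$.
(b) If (B1) and (B2) hold, then $\hat{\lambda}^{(r)}(z)$ is a mean square consistent and hence consistent estimator of $\lambda^{(r)}(z)$.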


The optimum bandwidth which minimizes the leading term in (3.7) is given by

\[
b^*(z) = n^{-1/(2p+1)} \left[ \frac{2r + 1}{2(p - r)} \, \frac{\lambda(z) V_{r,p}}{C(z) \big(\lambda^{(p)}(z) B_{r,p}\big)^2} \right]^{1/(2p+1)} \equiv n^{-1/(2p+1)} \omega^*(z). \tag{3.8}
\]

Note that the optimum rate $n^{-1/(2p+1)}$ for the bandwidth and the optimum rate $n^{-2(p-r)/(2p+1)}$ for the MSE are analogous to the i.i.d. case (i.e., without truncation). The optimal bandwidth $b^*(z)$ depends on the unknown quantities $\lambda(z)$, $C(z)$ and $\lambda^{(p)}(z)$. Data-dependent adaptive bandwidth choices will be addressed in Section 4.
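To see that (3.8) is indeed the minimizer of the leading term in (3.7), one can compare the leading MSE at $b^*$ with its value at nearby bandwidths. The sketch below does so in Python; the function names and all numerical inputs are hypothetical, with $B = 0.1$ and $V = 0.6$ being the constants $B_{0,2}$ and $V_{0,2}$ of the Epanechnikov kernel.

```python
def leading_mse(b, n, r, p, lam, c, lam_p, B, V):
    """Leading term of (3.7): variance part plus squared bias part."""
    return lam * V / (c * n * b ** (2 * r + 1)) + (b ** (p - r) * lam_p * B) ** 2

def optimal_bandwidth(n, r, p, lam, c, lam_p, B, V):
    """Closed-form minimizer (3.8) of the leading MSE term."""
    ratio = (2 * r + 1) / (2.0 * (p - r)) * lam * V / (c * (lam_p * B) ** 2)
    return n ** (-1.0 / (2 * p + 1)) * ratio ** (1.0 / (2 * p + 1))

# Hypothetical values of lambda(z), C(z) and lambda^(p)(z), for illustration only.
args = dict(n=500, r=0, p=2, lam=1.0, c=0.4, lam_p=0.8, B=0.1, V=0.6)
b_star = optimal_bandwidth(**args)
mse_at_opt = leading_mse(b_star, **args)
mse_smaller = leading_mse(0.9 * b_star, **args)   # undersmoothing
mse_larger = leading_mse(1.1 * b_star, **args)    # oversmoothing
```

Both perturbed bandwidths give a larger leading MSE than `b_star`. A small $C(z)$, which occurs near either end of the observable range, inflates the variance term and therefore pushes $b^*(z)$ upward, which is one way to read the case for local rather than global bandwidths.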

Next, we will derive the local limiting distribution of $\hat{\lambda}^{(r)}(z)$. Notice that $\hat{\lambda}^{(r)}(z)$ is a sum of identically distributed but not independent terms, since $C_n(X_i)$ depends on the entire sample. As mentioned earlier, we will utilize the Hájek projection method. This method was also used by Tanner and Wong (1983) for kernel hazard estimates based on randomly censored data.

Let $W$ be a function of i.i.d. random variables $Z_1, Z_2, \ldots, Z_n$. Hájek (1968) defines the projection $W^*$ of $W$ to the space $S$ of sums of i.i.d. variables as follows:

\[
W^* - E(W^*) = \sum_{i=1}^n [E(W \mid Z_i) - E(W)], \tag{3.9}
\]

\[
E(W^*) = E(W), \quad E(W^* - W)^2 = \operatorname{Var}(W) - \operatorname{Var}(W^*). \tag{3.10}
\]

In the truncation setting, $Z_i = (X_i, Y_i)$, $W = \hat{\lambda}^{(r)}(z)$, and $\lambda^{*(r)}(z)$ denotes the Hájek projection $W^*$ of $\hat{\lambda}^{(r)}(z)$. The following lemma gives the form of $\lambda^{*(r)}(z)$. The derivation is in Appendix C.

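LEMMA 3.1. (Hájek projection) (a)

\[
\begin{aligned}
\lambda^{*(r)}(z) - E(\lambda^{*(r)}(z)) ={}& n^{-1} \sum_{i=1}^n \Big\{ K_{r,b}(z - X_i)[C(X_i)]^{-1}\big[1 - [1 - C(X_i)]^n\big] \\
&\qquad - \int I(Y_i \le s \le X_i) K_{r,b}(z - s)[C(s)]^{-1}\big[1 - (1 - C(s))^n\big]\,d\Lambda(s) \Big\} \\
&+ \sum_{i=1}^n \int [I(Y_i \le s \le X_i) - C(s)] K_{r,b}(z - s)[1 - C(s)]^{n-1}\,d\Lambda(s) \\
\equiv{}& n^{-1} \sum_{i=1}^n \xi_i(z) + \sum_{i=1}^n \eta_i(z),
\end{aligned} \tag{3.11}
\]

where $E(\xi_i(z)) = E(\eta_i(z)) = 0$.
(b) If $z$ is a continuity point of $G$, then for $V_{r,p}$ defined in (3.6),

\[
\operatorname{Var}(\lambda^{*(r)}(z)) = \frac{1}{n b^{2r+1}} \left\{ \frac{\lambda(z)}{C(z)} V_{r,p} + o(1) \right\}. \tag{3.12}
\]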

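THEOREM 3.3. (Asymptotic normality) Assume $G$ is continuous at $z$ and (B1)-(B2) are satisfied. We then have

(a) $(n b^{2r+1})^{1/2} \big([\lambda(z)/C(z)] V_{r,p}\big)^{-1/2} [\hat{\lambda}^{(r)}(z) - E(\hat{\lambda}^{(r)}(z))] \xrightarrow{\mathcal{L}} N(0, 1)$,

(b) if $d = \lim_{n \to \infty} n b^{2p+1} < \infty$, then

\[
(n b^{2r+1})^{1/2} [\hat{\lambda}^{(r)}(z) - \lambda^{(r)}(z)] \xrightarrow{\mathcal{L}} N\big(d^{1/2} \lambda^{(p)}(z) B_{r,p},\ [\lambda(z)/C(z)] V_{r,p}\big).
\]

PROOF. (a) It follows from (3.5), (3.10) and (3.12) that

\[
\operatorname{Var}(\hat{\lambda}^{(r)}(z)) / \operatorname{Var}(\lambda^{*(r)}(z)) \to 1 \quad \text{and} \quad E[\hat{\lambda}^{(r)}(z) - \lambda^{*(r)}(z)]^2 / \operatorname{Var}(\lambda^{*(r)}(z)) \to 0.
\]

Therefore, $[\operatorname{Var}(\hat{\lambda}^{(r)}(z))]^{-1/2} [\hat{\lambda}^{(r)}(z) - E(\hat{\lambda}^{(r)}(z))]$ has the same asymptotic distribution as $Z_n(z) = [\operatorname{Var}(\lambda^{*(r)}(z))]^{-1/2} [\lambda^{*(r)}(z) - E(\lambda^{*(r)}(z))]$, and it suffices to show that $Z_n \xrightarrow{\mathcal{L}} N(0, 1)$. This is accomplished by verifying Lindeberg's condition for a triangular array.

(b) Follows immediately from (3.3) and (a). □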

4. Adaptive bandwidth choice

Consider the estimator (2.10) with local bandwidth $b(z) = n^{-1/(2p+1)} \omega(z) \equiv n^{-1/(2p+1)} \omega$, which attains the optimal rate of convergence by (3.8), and denote it as

\[
\hat{\lambda}^{(r)}(z, \omega) = \frac{1}{[n^{-1/(2p+1)} \omega]^{r+1}} \sum_{i=1}^n K_r\!\left(\frac{z - X_i}{n^{-1/(2p+1)} \omega}\right) \frac{1}{n C_n(X_i)}. \tag{4.1}
\]

Thus $\hat{\lambda}^{(r)}(z, \omega^*(z))$ is optimal in terms of minimizing the asymptotic MSE. In this section we show that locally adaptive bandwidth choices are indeed feasible. More precisely, it is shown that the estimator $\hat{\lambda}^{(r)}(z, \hat{\omega}^*(z))$, where $\hat{\omega}^*(z)$ is a consistent estimator of $\omega^*(z)$, has the same asymptotic distribution as the hypothetical optimal estimator $\hat{\lambda}^{(r)}(z, \omega^*(z))$. To obtain this result, it will be convenient to deal with a suitably normalized form of (4.1), given as

\[
U_n(z, \omega) = n^{(p-r)/(2p+1)} [\hat{\lambda}^{(r)}(z, \omega) - \lambda^{(r)}(z)]. \tag{4.2}
\]

For fixed $z$ choose $\omega_a$, $\omega_b$ such that $0 < \omega_a < \omega^*(z) < \omega_b < \infty$. Let $\operatorname{Lip}_a(A)$ denote the class of real functions on the set $A$ which satisfy Lipschitz continuity of order $a > 0$. The next lemma provides the key to the main result, Theorem 4.1, of this paper. The proof of Lemma 4.1 is in Appendix D.


LEMMA 4.1. Assume (B1), (B2), and that $G$ is continuous at $z$. If $K_r \in \operatorname{Lip}_a(-\infty, \infty)$ where $a > 0.5$ and $p > r$, then for $0 < \omega_a \le \omega \le \omega_b < \infty$ and $\omega_b - \omega_a < 1$, the process $U_n(z, \omega)$ given by (4.2) converges weakly in $C[\omega_a, \omega_b]$ to a Gaussian process $U(z, \omega)$ with

\[
E(U(z, \omega)) = \omega^{p-r} \lambda^{(p)}(z) B_{r,p}, \tag{4.3}
\]

and

\[
\operatorname{Cov}(U(z, \omega_1), U(z, \omega_2)) = (\omega_1 \omega_2)^{-(r+1)} [\lambda(z)/C(z)] \int K_r\!\left(\frac{t}{\omega_1}\right) K_r\!\left(\frac{t}{\omega_2}\right) dt, \tag{4.4}
\]

where $B_{r,p}$ is given by (3.4).

THEOREM 4.1. (Locally adaptive bandwidth choice) Under the conditions of Lemma 4.1 and for any estimator $\hat{\omega}(z)$ satisfying $\hat{\omega}(z) \xrightarrow{P} \omega^*(z)$, both $U_n(z, \hat{\omega}(z))$ and $U_n(z, \omega^*(z))$ converge weakly to a normal distribution $N\big([\omega^*(z)]^{p-r} \lambda^{(p)}(z) B_{r,p},\ [\omega^*(z)]^{-(2r+1)} [\lambda(z)/C(z)] V_{r,p}\big)$, where $B_{r,p}$ and $V_{r,p}$ are given by (3.4) and (3.6).

PROOF. Lemma 4.1 implies that $U_n(z, \hat{\omega}) - U_n(z, \omega^*) \to 0$ in probability. The result then follows from Lemma 4.1 and application of Slutsky's Theorem. □

Remarks 1. Note that Lemma 4.1 is only a tool to show the adaptive bandwidth choice result in Theorem 4.1. In practice one does not need to locate $\omega_a$ and $\omega_b$.

2. Theorem 4.1 requires construction of consistent estimators for $\omega^*(z)$, which reduces to estimating the quantity $\lambda(z)/[C(z) \lambda^{(p)}(z)]$ consistently. By Corollary 3.1(b), consistent estimators for $\lambda(z)$ and $\lambda^{(p)}(z)$, denoted by $\hat{\lambda}_0(z)$ and $\hat{\lambda}_0^{(p)}(z)$ respectively, can be obtained via selecting proper $K_0$, $K_p$ and initial bandwidths $b_0$ and $b_p$. The initial bandwidth for $\hat{\lambda}_0^{(p)}(z)$ should be larger than the initial bandwidth for $\hat{\lambda}_0(z)$ ($n b_0 \to \infty$ but $n b_p^{2p+1} \to \infty$). As for estimating $C(z)$, the $C_n(z)$ given by (2.9) is not appropriate for the present purpose since it may assume the value zero. Let $\tilde{C}_n$ be any modified version of $C_n$ which is nonzero and consistent for $C$, e.g., let $\tilde{C}_n(z) = 1/(n + 1)$ whenever $C_n(z) = 0$. Then, a candidate for adaptive bandwidth choice can be given as:

\[
\hat{b}^*(z) = n^{-1/(2p+1)} \left[ \frac{2r + 1}{2(p - r)} \, \frac{\hat{\lambda}_0(z) V_{r,p}}{\tilde{C}_n(z) \big[\hat{\lambda}_0^{(p)}(z) B_{r,p}\big]^2} \right]^{1/(2p+1)}. \tag{4.5}
\]

3. Another choice of adaptive bandwidth can be obtained using the fact that $\lambda = dF^*/C$, which follows from (2.6), and that $dF^* \equiv f^*$ can be estimated using the ordinary kernel estimate

\[
\hat{f}^*(z) = \int K_{0,b}(z - x)\,dF_n^*(x) = \frac{1}{n b_n} \sum_{i=1}^n K_0\!\left(\frac{z - X_i}{b_n}\right).
\]

An alternative candidate for adaptive bandwidth choice is then:

\[
\tilde{b}^*(z) = n^{-1/(2p+1)} \left[ \frac{2r + 1}{2(p - r)} \, \frac{\hat{f}^*(z) V_{r,p}}{\tilde{C}_n^2(z) \big[\hat{\lambda}_0^{(p)}(z) B_{r,p}\big]^2} \right]^{1/(2p+1)}. \tag{4.6}
\]

Acknowledgement

We would like to thank Abba Krieger for a useful suggestion in the computation of $\operatorname{Var}(\hat{\lambda}^{(r)}(z))$ in Theorem 3.1.

Appendix A

PROOF OF THEOREM 3.1. The mean of $\hat{\lambda}^{(r)}(z)$ follows directly from Lemma 2 of Woodroofe (1985). To find the variance, consider

\[
\begin{aligned}
E(\hat{\lambda}^{(r)}(z))^2 ={}& E\Big( \sum_{i=1}^n K_{r,b}^2(z - X_i)[n C_n(X_i)]^{-2} \Big) \\
&+ 2 E\Big( \sum_{i<j} K_{r,b}(z - X_i) K_{r,b}(z - X_j)[n^2 C_n(X_i) C_n(X_j)]^{-1} \Big) = I + II.
\end{aligned} \tag{a.1}
\]

Now, observe that given $X_i$, $n C_n(X_i) - 1 \sim \text{Binomial}(n - 1, C(X_i))$, and $E([n^2 C_n^2(X_i)]^{-1} \mid X_i) = I_n(C(X_i))/[n C(X_i)]$. Hence by (2.6)

\[
I = E\{K_{r,b}^2(z - X_1) I_n(C(X_1))/C(X_1)\}, \tag{a.2}
\]

which is the first term in (3.2). To evaluate $II$, first consider the following conditional expectation:

\[
II = 2 E\Big\{ \sum_{i<j} K_{r,b}(z - X_i) K_{r,b}(z - X_j) \, E\big([n^2 C_n(X_i) C_n(X_j)]^{-1} \mid X_i, X_j, Y_i, Y_j\big) \Big\}.
\]

For $X_i < X_j$, $n C_n(X_j) = 1 + M_2 + M_3$, and

\[
n C_n(X_i) = \begin{cases} 1 + M_1 + M_3, & \text{if } X_i < Y_j \\ 2 + M_1 + M_3, & \text{if } X_i \ge Y_j, \end{cases}
\]

where $(M_1, M_2, M_3, M_4)$ have a multinomial distribution with parameters $n - 2$ and $P_k = P_k(X_j, X_i)$, $k = 1, 2, 3, 4$. The cell probabilities $P_k$ are defined as follows:

\[
\begin{aligned}
P_1(s, t) &= P(Y \le t,\ t < X < s \mid Y \le X) = \alpha^{-1} G(t)[F(s) - F(t)], \\
P_2(s, t) &= P(t < Y \le s,\ X > s \mid Y \le X) = \alpha^{-1}[G(s) - G(t)][1 - F(s)], \\
P_3(s, t) &= P(Y \le t,\ X > s \mid Y \le X) = \alpha^{-1} G(t)[1 - F(s)], \\
P_4(s, t) &= P(\text{neither } t \text{ nor } s \text{ is in } [Y, X] \mid Y \le X) = 1 - P_1(s, t) - P_2(s, t) - P_3(s, t).
\end{aligned}
\]

Hence,

\[
\begin{aligned}
I(X_i < X_j)\, E\big([n^2 C_n(X_i) C_n(X_j)]^{-1} \mid X_i, X_j, Y_i, Y_j\big) ={}& I(X_i < Y_j)\, E[(1 + M_1 + M_3)(1 + M_2 + M_3)]^{-1} \\
&+ I(Y_j \le X_i < X_j)\, E[(2 + M_1 + M_3)(1 + M_2 + M_3)]^{-1}.
\end{aligned} \tag{a.3}
\]

Similarly, one can replace $I(X_i < X_j)$, $I(X_i < Y_j)$ and $I(Y_j \le X_i < X_j)$ in (a.3) by $I(X_i \ge X_j)$, $I(X_j < Y_i)$ and $I(Y_i \le X_j \le X_i)$ respectively. Now using the facts that:

(1) $1/k = \int_0^1 x^{k-1}\,dx$ for $k \ge 1$,
(2) $(a + b + c + d)^n = \sum_{k_1 + k_2 + k_3 + k_4 = n} \binom{n}{k_1\, k_2\, k_3\, k_4} a^{k_1} b^{k_2} c^{k_3} d^{k_4}$, for integer $n \ge 1$,

it can be shown that (details are available in Appendix A of Uzunogullari and Wang (1990)),

\[
II = 2 \iint_{t<s} K_{r,b}(z - t) K_{r,b}(z - s) \Big\{ 1 - \frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P_4^n(s, t)\big] - [1 - C(t)]^n \Big\}\,d\Lambda(t)\,d\Lambda(s). \tag{a.4}
\]

Theorem 3.1 now follows from (a.1), (a.2), (a.4), and noting that $P_4(s, t) = P(s, t)$, and

\[
[E(\hat{\lambda}^{(r)}(z))]^2 = 2 \iint_{t<s} K_{r,b}(z - t) K_{r,b}(z - s) \big[1 - [1 - C(s)]^n - [1 - C(t)]^n + [1 - C(t)]^n [1 - C(s)]^n\big]\,d\Lambda(t)\,d\Lambda(s).
\]
□


Appendix B

PROOF OF THEOREM 3.2. (a) Using integration by parts and the moment conditions in (2.11), it follows that if one defines $K_{r-1}(x) = \int_{-1}^{x} K_r(y)\,dy$ for $r \ge 1$, then

\[
K_{r-1} \in M_{r-1,p-1}, \quad K_{r-j} \in M_{r-j,p-j} \quad \text{and} \quad K \equiv K_0 \in M_{0,p-r}.
\]

Hence

\[
\int K_{r,b}(z - x)\,d\Lambda(x) = \int_{-1}^{1} K(y) \lambda^{(r)}(z - yb)\,dy.
\]

By (3.1) this leads to the following bias expansion:

\[
\begin{aligned}
\operatorname{bias}(\hat{\lambda}^{(r)}(z)) &= \Big[ \int_{-1}^{1} K(y) \lambda^{(r)}(z - yb)\,dy - \lambda^{(r)}(z) \Big] - \int K_{r,b}(z - x)[1 - C(x)]^n\,d\Lambda(x) \\
&= I + II.
\end{aligned} \tag{b.1}
\]

Now utilizing the assumption (A1) and a Taylor expansion, it follows that

\[
I = b^{p-r} \lambda^{(p)}(z) B_{r,p} + o(b^{p-r}). \tag{b.2}
\]

On the other hand, for $n$ large enough there exists $0 < \delta < 1$ such that $1 - C(x) \le \delta$ for all $x \in [z - b, z + b]$, where $a_G < z < b_F$. Therefore,

\[
|II| \le \delta^n b^{-r} \int_{-1}^{1} |K_r(y)| \lambda(z - yb)\,dy = O(\delta^n b^{-r}) = o(b^{p-r}), \tag{b.3}
\]

since $K_r \in L^2[-1, 1]$ and $\lambda$ is continuous at $z$. Part (a) now follows from (b.1) to (b.3).

(b) Consider the first term in (3.2). Using the assumptions on the kernel and bandwidth, the continuity of $\lambda/C$ at $z$ and the uniform convergence of $n I_n(C(x))$ to $1/C(x)$ on $[z - b_n, z + b_n]$, one can show that

\[
n b^{2r+1} \int K_{r,b}^2(z - x) I_n(C(x))\,d\Lambda(x) \to \lambda(z) V_{r,p}/C(z).
\]

It remains to show that the second term in (3.2) is of the order $o((n b^{2r+1})^{-1})$. To see this observe that:

\[
\begin{aligned}
&\Big| [1 - C(s)]^n - \frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P^n(s, t)\big] - [1 - C(s)]^n [1 - C(t)]^n \Big| \\
&\qquad \le [1 - C(s)]^n + \frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P^n(s, t)\big] \\
&\qquad \le (n + 1)[1 - C(s)]^{n-1},
\end{aligned} \tag{b.4}
\]

where the last inequality follows from the fact that $1 - C(s) - P(s, t) = \alpha^{-1} G(t)[F(s) - F(t)]$, $P(s, t) \le \min\{1 - C(s), 1 - C(t)\}$ and the following polynomial expansion:

\[
\begin{aligned}
\frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P^n(s, t)\big] &= C(t)\big[[1 - C(s)]^{n-1} + [1 - C(s)]^{n-2} P(s, t) + \cdots + P^{n-1}(s, t)\big] \\
&\le n[1 - C(s)]^{n-1}.
\end{aligned}
\]

For large $n$, (b.4) implies that for some $0 < \delta < 1$,

\[
\begin{aligned}
(n b^{2r+1}) \cdot |\text{second term in (3.2)}| &\le (n b^{2r+1})\, 2 \iint_{t<s} |K_{r,b}(z - s) K_{r,b}(z - t)| (n + 1)[1 - C(s)]^{n-1}\,d\Lambda(t)\,d\Lambda(s) \\
&\le n(n + 1) b^{2r+1} \delta^{n-1} \Big[ \int |K_{r,b}(z - t)|\,d\Lambda(t) \Big]^2 \to 0.
\end{aligned}
\]
□

Appendix C

PROOF OF LEMMA 3.1. (a) Application of (3.9) to $\hat{\lambda}^{(r)}(z)$ yields

\[
\lambda^{*(r)}(z) - E(\lambda^{*(r)}(z)) = \sum_{j=1}^n \{E(W_j \mid X_j, Y_j) + (n - 1) E(W_i \mid X_j, Y_j) - E(\hat{\lambda}^{(r)}(z))\}, \tag{c.1}
\]

where $W_k = K_{r,b}(z - X_k)[n C_n(X_k)]^{-1}$, and

\[
E(W_j \mid X_j, Y_j) = [n C(X_j)]^{-1} K_{r,b}(z - X_j)[1 - (1 - C(X_j))^n], \tag{c.2}
\]

by Lemma 2 of Woodroofe (1985). Also,

\[
E(W_i \mid X_j, Y_j) = E\{K_{r,b}(z - X_i)\, E[(n C_n(X_i))^{-1} \mid X_i, Y_i, X_j, Y_j] \mid X_j, Y_j\}. \tag{c.3}
\]

Let $p = C(X_i)$ and observe that, given $X_i$, $Y_i$, $X_j$, $Y_j$, and $n$, the conditional distribution of $n C_n(X_i)$ is

\[
n C_n(X_i) \sim \begin{cases} 2 + \text{Binomial}(n - 2, p), & \text{if } Y_j \le X_i \le X_j \\ 1 + \text{Binomial}(n - 2, p), & \text{otherwise.} \end{cases}
\]

Hence for $Y_j \le X_i \le X_j$,

\[
\begin{aligned}
E([n C_n(X_i)]^{-1} \mid X_i, Y_i, X_j, Y_j) &= \sum_{k=2}^n k^{-1} \binom{n - 2}{k - 2} p^{k-2} (1 - p)^{n-k} \\
&= [n(n - 1) p^2]^{-1} [np - 1 + (1 - p)^n].
\end{aligned} \tag{c.4}
\]

Similarly, for $X_i < Y_j$ or $X_j < X_i$,

\[
E([n C_n(X_i)]^{-1} \mid X_i, Y_i, X_j, Y_j) = [(n - 1) p]^{-1} [1 - (1 - p)^{n-1}]. \tag{c.5}
\]

Combining (c.4) and (c.5) we have

\[
\begin{aligned}
E([n C_n(X_i)]^{-1} \mid X_i, Y_i, X_j, Y_j) ={}& [(n - 1) p]^{-1} [1 - (1 - p)^{n-1}] \\
&+ [n(n - 1) p^2]^{-1} [(1 - p)^n + np(1 - p)^{n-1} - 1]\, I(Y_j \le X_i \le X_j).
\end{aligned} \tag{c.6}
\]

Replacing $p$ back by $C(X_i)$, and plugging (c.6) into (c.3), we obtain

\[
\begin{aligned}
E(W_i \mid X_j, Y_j) = (n - 1)^{-1} \Big\{ & \int K_{r,b}(z - s)\big[1 - [1 - C(s)]^{n-1}\big]\,d\Lambda(s) \\
& - \int I(Y_j \le s \le X_j) K_{r,b}(z - s)[n C(s)]^{-1} \big[1 - [1 - C(s)]^n - n C(s)[1 - C(s)]^{n-1}\big]\,d\Lambda(s) \Big\}.
\end{aligned}
\]

(3.11) now follows from (c.1), (c.2), (c.6) and (3.1). The fact that $\xi_i$ and $\eta_i$ have mean zero follows from (2.6), (2.7), and the fact that the first and second terms in $\xi_i$ have the same expectation.

(b) For this part we utilize the following result, whose proof is given in Appendix C of Uzunogullari and Wang (1990):

\[
\operatorname{Var}(\xi_i(z)) = \int K_{r,b}^2(z - s)\big[1 - [1 - C(s)]^n\big]^2 [C(s)]^{-1}\,d\Lambda(s). \tag{c.7}
\]

Using the continuity of $\lambda/C$ at $z$, the fact that $K_r \in L^2[-1, 1]$ and the dominated convergence theorem, (c.7) can be written as

\[
\begin{aligned}
\operatorname{Var}(\xi_i(z)) &= \frac{1}{b^{2r+1}} \int K_r^2(y) \frac{\lambda(z - by)\big[1 - [1 - C(z - by)]^n\big]^2}{C(z - by)}\,dy \\
&= \frac{1}{b^{2r+1}} \Big[ \frac{\lambda(z)}{C(z)} V_{r,p} + o(1) \Big].
\end{aligned} \tag{c.8}
\]

Next, consider $\eta_i(z)$. For some $0 < \delta < 1$,

\[
\begin{aligned}
|\eta_i(z)| &= \Big| \int K_{r,b}(z - s)[I(Y_i \le s \le X_i) - C(s)][1 - C(s)]^{n-1}\,d\Lambda(s) \Big| \\
&\le \int |K_{r,b}(z - s)| [1 - C(s)]^{n-1}\,d\Lambda(s) \\
&\le b^{-r} \delta^{n-1} \int |K_r(y)| \lambda(z - by)\,dy.
\end{aligned} \tag{c.9}
\]

Formula (3.12) now follows from (c.8), (c.9) and application of the Cauchy-Schwarz inequality for the covariance term. □

Appendix D

PROOF OF LEMMA 4.1. The course of the proof is to show that
(a) the finite dimensional distributions of $U_n(z, \omega)$ converge to a multivariate normal distribution, with the covariance structure given by (4.4), and
(b) the process $U_n(z, \omega)$ is tight.

Part (a). By the hypothesis of the lemma, Theorem 3.2(a) and Theorem 3.3(a), it follows that

\[
U_n(z, \omega) \xrightarrow{\mathcal{L}} N\big(\omega^{p-r} \lambda^{(p)}(z) B_{r,p},\ \omega^{-(2r+1)} [\lambda(z)/C(z)] V_{r,p}\big). \tag{d.1}
\]

The Cramér-Wold device then implies the weak convergence of the finite dimensional distributions of $U_n(z, \omega)$ to a multivariate normal distribution with mean given by (4.3). It remains to verify the covariance structure of the limiting multivariate normal distribution.

Let $a_n = n^{-1/(2p+1)}$. Then following the proof of Theorem 3.1(b) and Theorem 3.2(b) for the variance computations, we arrive at

\[
\begin{aligned}
\operatorname{Cov}(U_n(z, \omega_1), U_n(z, \omega_2)) ={}& n^{2(p+1)/(2p+1)} (\omega_1 \omega_2)^{-(r+1)} \Big\{ \int K_r\!\Big(\frac{z - x}{a_n \omega_1}\Big) K_r\!\Big(\frac{z - x}{a_n \omega_2}\Big) I_n[C(x)]\,d\Lambda(x) \\
&+ \iint_{t<s} \Big[ K_r\!\Big(\frac{z - t}{a_n \omega_1}\Big) K_r\!\Big(\frac{z - s}{a_n \omega_2}\Big) + K_r\!\Big(\frac{z - s}{a_n \omega_1}\Big) K_r\!\Big(\frac{z - t}{a_n \omega_2}\Big) \Big] \\
&\qquad \cdot \Big[ [1 - C(s)]^n [1 - (1 - C(t))^n] - \frac{1 - F(t)}{F(s) - F(t)} \big([1 - C(s)]^n - P^n(s, t)\big) \Big]\,d\Lambda(t)\,d\Lambda(s) \Big\} \\
={}& (\omega_1 \omega_2)^{-(r+1)} \frac{\lambda(z)}{C(z)} \int K_r\!\Big(\frac{t}{\omega_1}\Big) K_r\!\Big(\frac{t}{\omega_2}\Big)\,dt + o(1).
\end{aligned} \tag{d.2}
\]

Part (a) is now completed by (d.2).

Part (b). To see that the process $U_n(z, \omega)$ is tight, consider

\[
\begin{aligned}
E(U_n(z, \omega_1) - U_n(z, \omega_2))^2 &= E\big[n^{(p-r)/(2p+1)} (\hat{\lambda}^{(r)}(z, \omega_1) - \hat{\lambda}^{(r)}(z, \omega_2))\big]^2 \\
&= n^{2(p+1)/(2p+1)} E\Big\{ \sum_{i=1}^n \frac{H(z, X_i, \omega_1, \omega_2)}{n C_n(X_i)} \Big\}^2,
\end{aligned} \tag{d.3}
\]

where $H(z, X_i, \omega_1, \omega_2) = (1/\omega_1^{r+1}) K_r((z - X_i)/a_n \omega_1) - (1/\omega_2^{r+1}) K_r((z - X_i)/a_n \omega_2)$. Observe that the last expression in curly brackets is similar in form to $\hat{\lambda}^{(r)}(z)$ in (2.10), and therefore one can obtain this expectation from the proof of $E[\hat{\lambda}^{(r)}(z)]^2$ in Appendix A, since $H$ is a fixed function of $X_i$. Hence we have

\[
\begin{aligned}
E[U_n(z, \omega_1) - U_n(z, \omega_2)]^2 ={}& n^{2(p+1)/(2p+1)} \Big[ \int H^2(z, s, \omega_1, \omega_2) I_n[C(s)]\,d\Lambda(s) \\
&+ 2 \iint_{t<s} H(z, s, \omega_1, \omega_2) H(z, t, \omega_1, \omega_2) \Big\{ 1 - \frac{1 - F(t)}{F(s) - F(t)} \big[[1 - C(s)]^n - P^n(s, t)\big] \\
&\qquad - [1 - C(s)]^n [1 - C(t)]^n \Big\}\,d\Lambda(t)\,d\Lambda(s) \Big] \\
={}& I + II.
\end{aligned}
\]

Now consider term $I$, with $a_n = n^{-1/(2p+1)}$:

\[
\begin{aligned}
I &= \int n I_n[C(z - a_n t)] \big(\omega_1^{-(r+1)} K_r(t/\omega_1) - \omega_2^{-(r+1)} K_r(t/\omega_2)\big)^2 \lambda(z - a_n t)\,dt \\
&= \int n I_n[C(z - a_n t)] \big\{\omega_1^{-(r+1)} [K_r(t/\omega_1) - K_r(t/\omega_2)] + [\omega_1^{-(r+1)} - \omega_2^{-(r+1)}] K_r(t/\omega_2)\big\}^2 \lambda(z - a_n t)\,dt \\
&\le 2 \int n I_n[C(z - a_n t)] \big\{\omega_1^{-2(r+1)} [K_r(t/\omega_1) - K_r(t/\omega_2)]^2 + [\omega_1^{-(r+1)} - \omega_2^{-(r+1)}]^2 K_r^2(t/\omega_2)\big\} \lambda(z - a_n t)\,dt \\
&\le \text{constant} \int n I_n[C(z - a_n t)]\, |\omega_1 - \omega_2|^{\min(2a, 2)} \lambda(z - a_n t)\,dt \\
&\le \text{constant}\, |\omega_1 - \omega_2|^{\min(2a, 2)},
\end{aligned} \tag{d.4}
\]

where the second last step follows from the Lipschitz condition on $K_r$ and the fact that $|\omega_1 - \omega_2| \le 1$; the last step follows from the continuity of $\lambda/C$ at $z$ and the fact that $n y I_n(y) \le 2$ for $0 \le y \le 1$.

Term $II$ can similarly be bounded by $L|\omega_1 - \omega_2|^{\min(2a, 2)}$ for some $L > 0$. Therefore

\[
E[U_n(z, \omega_1) - U_n(z, \omega_2)]^2 \le \text{constant}\, |\omega_1 - \omega_2|^{\min(2a, 2)},
\]

for all $\omega_1, \omega_2 \in [\omega_a, \omega_b]$. This implies the tightness of $U_n(z, \omega)$ by Theorem 12.3 of Billingsley (1968). □

REFERENCES

Allredge, J. R. and Gates, C. E. (1985). Line transect estimators for left truncated distributions, Biometrics, 41, 273-280.

Billingsley, P. (1968). Convergence of Probability Measures, Wiley, New York.

Chao, M. T. and Lo, S. H. (1988). Some representations of the nonparametric maximum likelihood estimators with truncated data, Ann. Statist., 16, 661-668.

Csörgő, S. and Horváth, L. (1980). Random censorship from the left, Studia Sci. Math. Hungar., 15, 397-401.

Diehl, S. and Stute, W. (1988). Kernel density and hazard function estimation in the presence of censoring, J. Multivariate Anal., 25, 299-310.

Gu, M. G. and Lai, T. L. (1990). Functional law of the iterated logarithm for the product-limit estimator of a distribution function under random censorship or truncation, Ann. Probab., 18, 160-189.

Hájek, J. (1968). Asymptotic normality of simple linear rank statistics under alternative, Ann. Math. Statist., 39, 325-346.

Hyde, J. (1977). Testing survival under right censoring and left truncation, Biometrika, 64, 225-230.

Kalbfleisch, J. D. and Lawless, J. F. (1989). Inference based on transfusion-relation AIDS, J. Amer. Statist. Assoc., 84, 360-372.

Keiding, N. and Gill, R. D. (1990). Random truncation models and Markov processes, Ann. Statist., 18, 582-602.

Lagakos, S. W., Barraj, L. M. and DeGruttola, V. (1988). Nonparametric analysis of truncated survival data with application to AIDS, Biometrika, 75, 515-523.

Lui, K. J., Lawrence, D. N., Morgan, W. M., Peterman, T. A., Haverkos, H. W. and Bergman, D. J. (1986). A model based approach for estimating the mean incubation period of transfusion associated acquired immunodeficiency syndrome, Proc. Nat. Acad. Sci. U.S.A., 88, 3051-3055.

Lynden-Bell, D. (1971). A method of allowing for known observational selection in small samples applied to 3CR quasars, Monthly Notices of the Royal Astronomical Society, 155, 95-118.

Müller, H. G. and Wang, J. L. (1990). Locally adaptive hazard smoothing, Probab. Theory Related Fields, 85, 523-538.

Parzen, E. (1962). On estimation of a probability density function and mode, Ann. Math. Statist., 33, 1065-1076.

Ramlau-Hansen, H. (1983). Smoothing counting process intensities by means of kernel functions, Ann. Statist., 11, 453-466.

Tanner, M. A. and Wong, W. H. (1983). The estimation of the hazard function from randomly censored data by the kernel method, Ann. Statist., 11, 989-993.

Uzunogullari, U. and Wang, J. L. (1990). Nonparametric estimation of hazard functions and their derivatives under truncation model, Tech. Report #156, University of California, Davis.

Uzunogullari, U. and Wang, J. L. (1992). A comparison of hazard rate estimators for left truncated and right censored data, Biometrika, 79, 297-310.

Wang, M. C., Jewell, N. P. and Tsai, W. Y. (1986). Asymptotic properties of the product limit estimate under random truncation, Ann. Statist., 14, 1597-1605.

Watson, G. S. and Leadbetter, M. R. (1964). Hazard Analysis II, Sankhyā Ser. A, 26, 101-116.

Woodroofe, M. (1985). Estimating a distribution function with truncated data, Ann. Statist., 13, 163-177.

Yandell, S. B. (1983). Nonparametric inference for rates with censored survival data, Ann. Statist., 11, 1119-1135.
