1 Research Method Lecture 16 Duration analysis: Survivor and hazard function estimation ©

Date post: 24-Dec-2015
Upload: poppy-strickland

Duration analysis

Duration analysis was originally developed to study how long a patient survives a disease such as cancer.

Such models have since been applied in econometrics. Common applications are the estimation of unemployment duration, or of the time until a worker is promoted to a higher position.


In duration analysis, our purpose is to estimate either the survivor function or the hazard function.

The definitions of the survivor function and the hazard function are simple.

For illustrative purposes, I will use the time until a university faculty member is promoted to full professor as an example.


The duration until the promotion is a random variable.

Let F(t) be the cumulative distribution function of the duration.

Then, we have the following.


1. The cumulative distribution function F(t):

F(t) = the probability that the duration until promotion is no more than t years.

Example (figure): F(t) plotted against t (years since hired); the curve rises toward 1 and reaches 0.95 at t = 20.

This graph means that, 20 years after being hired, the probability that you have been promoted to full professor is 95%.


2. The survivor function S(t):

S(t) = 1 - F(t) = the probability that the person has still not been promoted after t years.

Example (figure): S(t) plotted against t (years since hired); the curve starts at 1 and falls to 0.05 at t = 20 years.

This means that, 20 years after being hired, the probability that you have not been promoted to full professor is 5%.


3. The density function f(t):

f(t) = F'(t)

4. The hazard function λ(t):

λ(t) = f(t)/S(t)

The hazard function shows the rate at which you are promoted at t years, given that you have not been promoted up to that time.
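The four definitions above can be sketched numerically. This is only an illustration: the distribution (exponential) and the rate of 0.15 per year are assumptions, chosen so that F(20) is close to the 0.95 used in the example graph; they are not estimates from any data.

```python
import math

RATE = 0.15  # hypothetical promotion rate per year (assumption, not an estimate)

def cdf(t):
    """F(t): probability that promotion happens within t years."""
    return 1.0 - math.exp(-RATE * t)

def survivor(t):
    """S(t) = 1 - F(t): probability of not yet being promoted at t years."""
    return 1.0 - cdf(t)

def density(t, h=1e-5):
    """f(t) = F'(t), approximated by a central difference."""
    return (cdf(t + h) - cdf(t - h)) / (2.0 * h)

def hazard(t):
    """lambda(t) = f(t) / S(t)."""
    return density(t) / survivor(t)

for t in (5, 10, 20):
    print(t, round(cdf(t), 3), round(survivor(t), 3), round(hazard(t), 3))
```

For this particular distribution the hazard is the same at every t, even though F(t) and S(t) change with t.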


It may sound strange to call a faculty member who has not been promoted a ‘survivor’, and to call the rate at which faculty are promoted a ‘hazard’.

But just remember that this model was initially developed to estimate the survival duration of cancer patients and the like.


The relationship between the survivor function and the hazard function

Let G(t) = log S(t). Then the derivative of G(t) is written as:

\[ G'(t) = \frac{d \log S(t)}{dt} = \frac{S'(t)}{S(t)} = \frac{-f(t)}{S(t)} = -\lambda(t) \]

You can recover G(t) from G'(t) by integration (using G(0) = log S(0) = 0):

\[ G(t) = \int_0^t G'(u)\,du = \int_0^t [-\lambda(u)]\,du = -\int_0^t \lambda(u)\,du \]

Since G(t) = log S(t), we have exp(G(t)) = S(t). Thus,

\[ \exp(G(t)) = \exp\left(-\int_0^t \lambda(u)\,du\right) = S(t) \]

Thus, the relationship between the survivor function and the hazard function is given by

\[ S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right) \]
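The relationship between the survivor function and the hazard function can be checked numerically. The sketch below assumes a hypothetical increasing hazard (the shape and scale values are made up for illustration), integrates it with the trapezoid rule, and compares the result with the closed-form survivor function for that hazard.

```python
import math

def hazard(t, shape=1.5, scale=0.02):
    """Hypothetical hazard: shape * t**(shape - 1) * scale (increasing in t)."""
    return shape * t ** (shape - 1.0) * scale

def survivor_from_hazard(t, steps=20000):
    """S(t) = exp(-integral_0^t hazard(u) du), integral by the trapezoid rule."""
    width = t / steps
    area = 0.0
    for k in range(steps):
        left, right = k * width, (k + 1) * width
        area += 0.5 * (hazard(left) + hazard(right)) * width
    return math.exp(-area)

def survivor_closed_form(t, shape=1.5, scale=0.02):
    """For this hazard the integral from 0 to t is scale * t**shape."""
    return math.exp(-scale * t ** shape)

t = 20.0
print(survivor_from_hazard(t), survivor_closed_form(t))
```

The two printed numbers should agree to several decimal places, which is the sense in which knowing the hazard is enough to recover the survivor function.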


If you can estimate the hazard function, you can recover the survivor function.

Therefore, most of the duration analyses focus on the estimation of the hazard function.


Hazard function estimation

Let x be the row vector of explanatory variables, and β be the corresponding column vector of coefficients.

We model the hazard function λ(t, x,β) as λ(t, x,β) =λ0(t)exp(xβ)

λ0(t) is called the baseline hazard. There are several choices for the baseline hazard. I will explain 3 common choices.


The exponential hazard model:

When you assume that λ0(t)=1, the model is called the exponential hazard model. The hazard function is given by:

λ(t) = exp(xβ)

This model assumes that the hazard is constant over time. This is restrictive: for example, if you have been unemployed for a long time, you become less and less likely to find a job. This kind of duration dependence cannot be captured by the exponential hazard.


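Given the exponential hazard function, the survivor function S(t) is given by

\[ S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right) = \exp\left(-\int_0^t \exp(x\beta)\,du\right) = \exp\left(-\exp(x\beta)\int_0^t du\right) = \exp\left(-\exp(x\beta)\,t\right) \]

and

\[ f(t) = -S'(t) = \exp\left(-\exp(x\beta)\,t\right)\exp(x\beta) \]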


If the person has already been promoted, you know the exact duration. Thus, the likelihood contribution for this person is the density function f(t).

If the person has not been promoted, the only thing you know is that the duration until promotion is longer than the recorded duration. Thus, the likelihood contribution is the probability that the promotion duration is greater than t, which is equal to the survivor function S(t).


Let Di be the dummy indicating that the person has been promoted. Then, the likelihood contribution is written as:

\[ L_i = f(t_i)^{D_i}\, S(t_i)^{1-D_i} \]

The likelihood function L is then given by:

\[ L(\beta) = \prod_{i=1}^{n} L_i \]

The values of β that maximize the likelihood function are the estimators of the exponential hazard model.

As usual, you maximize log(L).
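As a minimal sketch of this likelihood, consider the exponential model with a constant only, so that the hazard is exp(b0). The durations and promotion dummies below are made-up toy data, not taken from promotion.dta, and `b0` is a hypothetical name for the constant.

```python
import math

# Hypothetical toy data: durations in years and promotion dummies D_i
durations = [4.0, 7.0, 10.0, 12.0, 15.0, 20.0]
promoted = [1, 1, 0, 1, 0, 1]

def log_likelihood(b0, t, d):
    """log L = sum_i [ D_i * log f(t_i) + (1 - D_i) * log S(t_i) ]
    for the exponential model with hazard exp(b0) (constant only), where
    log f(t) = b0 - exp(b0) * t and log S(t) = -exp(b0) * t."""
    lam = math.exp(b0)
    return sum(di * b0 - lam * ti for ti, di in zip(t, d))

# With a constant only, the maximizer has a closed form:
# hazard = (number promoted) / (total time at risk).
b0_hat = math.log(sum(promoted) / sum(durations))

print(b0_hat, log_likelihood(b0_hat, durations, promoted))
```

Promoted observations contribute through the density and censored ones only through the survivor function, which is why the censored durations still enter the total time at risk. With explanatory variables there is no closed form, and the maximization is done numerically.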


Regarding the dummy variable Di, note the difference between the hazard function estimation and the censored model.

In the hazard function estimation, Di=1 if the person is promoted. But in the censored model, we set Di=1 if the person is not promoted (thus the duration is censored).

This is purely a difference in convention between two models.


Exponential hazard example

Using promotion.dta, let us estimate the exponential hazard model. The explanatory variables are female, phdabroad and book_rate.

In the data, “durat” is the duration from the initial hire to the promotion to full professor.

“promoted” is the dummy variable indicating whether the person has been promoted. This corresponds to Di.

“phdabroad” is a dummy for those who obtained their Ph.D. abroad.

“book_rate” is the number of books published per year of their career.


First, tell Stata that this is survival data:

. stset durat, failure(promoted) id(id)

                id:  id
     failure event:  promoted != 0 & promoted < .
obs. time interval:  (durat[_n-1], durat]
 exit on or before:  failure

        324  total obs.
          0  exclusions
        324  obs. remaining, representing
        324  subjects
        200  failures in single failure-per-subject data
       3256  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        39


Then, estimate the hazard function:

. streg female phdabroad book_rate, dist(exponential) nohr

         failure _d:  promoted
   analysis time _t:  durat
                 id:  id

Iteration 0:   log likelihood = -277.47869
Iteration 1:   log likelihood = -272.88406
Iteration 2:   log likelihood = -272.49996
Iteration 3:   log likelihood = -272.49945
Iteration 4:   log likelihood = -272.49945

Exponential regression -- log relative-hazard form

No. of subjects =          324                Number of obs   =       324
No. of failures =          200
Time at risk    =         3256
                                              LR chi2(3)      =      9.96
Log likelihood  =   -272.49945                Prob > chi2     =    0.0189

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.3131531   .2226157    -1.41   0.160    -.7494718    .1231656
   phdabroad |   .2842275   .2224027     1.28   0.201    -.1516738    .7201288
   book_rate |   1.073336   .3718619     2.89   0.004     .3445004    1.802172
       _cons |  -2.908412    .094387   -30.81   0.000    -3.093407   -2.723417


Interpreting the coefficients is tricky. Note that you have estimated the following hazard function:

\[ \lambda(t) = \exp(\beta_0 + \beta_1\,\text{female} + \beta_2\,\text{phdabroad} + \beta_3\,\text{bookrate}) = \exp(\beta_0)\exp(\beta_1\,\text{female})\exp(\beta_2\,\text{phdabroad})\exp(\beta_3\,\text{bookrate}) \]

Thus, the estimated coefficient for female (-0.3131) means that, if you are female, the hazard is multiplied by the factor exp(-0.3131)=0.7311.


In other words, the female hazard function is 73% of the male one. This means that, at any given level of experience, females are less likely than males to be promoted to full professor (though the coefficient is not significant).

Sometimes, researchers report the exponentiated coefficients exp(βj) instead of the coefficients themselves. You can do this by dropping the “nohr” option in the streg command. However, economists usually report coefficients. Thus, I recommend that you use the “nohr” option.
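The link between the reported coefficients and the exponentiated coefficients (hazard ratios) is a one-line calculation. The sketch below uses the three slope coefficients from the exponential regression output; the dictionary names are just for illustration.

```python
import math

# Coefficients reported by streg with the nohr option
coefficients = {
    "female": -0.3131531,
    "phdabroad": 0.2842275,
    "book_rate": 1.073336,
}

# exp(coefficient) is the hazard ratio for a one-unit increase in the variable
hazard_ratios = {name: math.exp(b) for name, b in coefficients.items()}

for name, ratio in hazard_ratios.items():
    print(f"{name}: {ratio:.4f}")
```

A ratio below 1 (female) lowers the hazard, and a ratio above 1 (phdabroad, book_rate) raises it.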


Another complicating fact is that, even if the female hazard function is 73% of the male one, this does not mean that the female probability of being promoted is 73% of the male promotion probability.

In order to compare the probability of being promoted, you have to compute the survivor function.


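Just remember that

\[ S(t) = \exp\left(-\exp(x\beta)\,t\right) \]

The survivor functions for the “average” males and females (with phdabroad and bookrate set at their sample means) are given by:

Male:

\[ S_M(t) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 0 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) t\right) \]

Female:

\[ S_F(t) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 1 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) t\right) \]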


Survivor function is a function of t. Thus, there are two ways to compare the survival probabilities.

The first way is to plot the survivor functions for male and female, then visually compare these two. This is done automatically by Stata.

The second way is to compute the survival probability at a particular time, say 10 years, for both males and females, then compare them.

. stcurve, survival at1(female=1) at2(female=0)

[Figure: “Exponential regression”. Estimated survival curves for female=1 and female=0, with survival (0 to 1) on the vertical axis and analysis time (0 to 40) on the horizontal axis. The female curve lies above the male curve.]

Note that the survival probability shows the probability of not being promoted.

Since the survival curve for females lies above that for males, females are less likely to be promoted (i.e., more likely not to be promoted).


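Now, let us compute the survival probability for males and females at duration equal to 20 years.

\[ S_M(20) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 0 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) \cdot 20\right) = 0.277 \]

\[ S_F(20) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 1 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) \cdot 20\right) = 0.391 \]

The difference in the survival probability is about 0.114. Thus, at 20 years of experience, the probability of having been promoted to full professor is about 11 percentage points lower for females than for males.

The next slide shows how I computed these probabilities using Stata.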

. egen av_phd=mean(phdabroad)

. egen av_book=mean(book_rate)

. gen xbmale= _b[_cons]+_b[female]*0+_b[phdabroad]*av_phd+_b[book_rate]*av_book

. gen xbfemale= _b[_cons]+_b[female]*1+_b[phdabroad]*av_phd+_b[book_rate]*av_book

. gen surv_m=exp(-exp(xbmale)*20)

. gen surv_f=exp(-exp(xbfemale)*20)

. su surv_m surv_f

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      surv_m |       324    .2771445           0   .2771445   .2771445
      surv_f |       324    .3913285           0   .3913285   .3913285
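The reported numbers can be used to check why a hazard ratio is not a ratio of promotion probabilities. Under the exponential model, multiplying the hazard by exp(β_female) raises the male survivor function to that power, so S_F(t) = S_M(t)^exp(β_female). The sketch below uses only the coefficient and the two survival probabilities reported above; the variable names are illustrative.

```python
import math

beta_female = -0.3131531   # reported coefficient on female
surv_m_20 = 0.2771445      # reported S_M(20)
surv_f_20 = 0.3913285      # reported S_F(20)

hazard_ratio = math.exp(beta_female)

# Proportional hazards: S_F(t) = S_M(t) ** hazard_ratio
surv_f_implied = surv_m_20 ** hazard_ratio

# Promotion probabilities by year 20 are 1 - S(20)
prob_m = 1.0 - surv_m_20
prob_f = 1.0 - surv_f_20

print(surv_f_implied, prob_f / prob_m, hazard_ratio)
```

The implied female survival probability should match the value Stata reports, while the ratio of promotion probabilities at 20 years differs from the hazard ratio.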


Weibull hazard model:

When you assume that λ0(t) = αt^(α-1), where α is the shape parameter, the model is called the Weibull hazard model. If α<1, there is negative duration dependence (i.e., the longer you stay unemployed, the less likely you are to find a job). If α>1, there is positive duration dependence. If α=1, there is no duration dependence, and the model is the same as the exponential model.


Remember that the exponential hazard model cannot capture duration dependence. The Weibull hazard model overcomes this weakness.

The hazard function for the Weibull model, with shape parameter α, is given by:

\[ \lambda(t) = \alpha t^{\alpha-1}\exp(x\beta) \]
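Duration dependence in the Weibull model can be seen by comparing the hazard at two durations. In the sketch below the index value `xb` and the shape values are hypothetical, and `alpha` stands for the Weibull shape parameter (the parameter Stata reports as p).

```python
import math

def weibull_hazard(t, xb, alpha):
    """lambda(t) = alpha * t**(alpha - 1) * exp(xb)."""
    return alpha * t ** (alpha - 1.0) * math.exp(xb)

xb = -3.0  # hypothetical value of the index x*beta

for alpha in (0.5, 1.0, 2.0):  # hypothetical shape parameters
    ratio = weibull_hazard(20.0, xb, alpha) / weibull_hazard(10.0, xb, alpha)
    print(alpha, ratio)
```

The ratio of the hazard at 20 years to the hazard at 10 years is below 1 when the shape is below 1, exactly 1 when the shape is 1 (the exponential case), and above 1 when the shape is above 1.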


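Given the Weibull hazard function with shape parameter α, we have

\[ S(t) = \exp\left(-\int_0^t \lambda(u)\,du\right) = \exp\left(-\int_0^t \alpha u^{\alpha-1}\exp(x\beta)\,du\right) = \exp\left(-\exp(x\beta)\int_0^t \alpha u^{\alpha-1}\,du\right) = \exp\left(-\exp(x\beta)\,t^{\alpha}\right) \]

and

\[ f(t) = -S'(t) = \exp\left(-\exp(x\beta)\,t^{\alpha}\right)\exp(x\beta)\,\alpha t^{\alpha-1} \]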


Let Di be the dummy indicating that the person has been promoted. Then, the likelihood contribution is written as:

\[ L_i = f(t_i)^{D_i}\, S(t_i)^{1-D_i} \]

The likelihood function L is then given by:

\[ L(\beta, \alpha) = \prod_{i=1}^{n} L_i \]

The values of β and the shape parameter α that maximize the likelihood function are the estimators of the Weibull hazard model.

As usual, you maximize log(L).


Weibull hazard estimation example

Using promotion.dta, let us estimate the Weibull hazard function.

The explanatory variables are female, phdabroad, and book_rate.


. streg female phdabroad book_rate, dist(weibull) nohr

         failure _d:  promoted
   analysis time _t:  durat
                 id:  id

Fitting constant-only model:

Iteration 0:   log likelihood = -277.47869
Iteration 1:   log likelihood = -173.68071
Iteration 2:   log likelihood = -167.76789
Iteration 3:   log likelihood = -167.76786
Iteration 4:   log likelihood = -167.76786

Fitting full model:

Iteration 0:   log likelihood = -167.76786
Iteration 1:   log likelihood = -158.11946
Iteration 2:   log likelihood = -150.47551
Iteration 3:   log likelihood = -150.40097
Iteration 4:   log likelihood = -150.40093
Iteration 5:   log likelihood = -150.40093

Weibull regression -- log relative-hazard form

No. of subjects =          324                Number of obs   =       324
No. of failures =          200
Time at risk    =         3256
                                              LR chi2(3)      =     34.73
Log likelihood  =   -150.40093                Prob > chi2     =    0.0000

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -.4590474   .2245992    -2.04   0.041    -.8992538    -.018841
   phdabroad |   .7574907   .2258349     3.35   0.001     .3148625    1.200119
   book_rate |   2.068296   .3706405     5.58   0.000     1.341854    2.794738
       _cons |  -7.427347   .3753446   -19.79   0.000    -8.163009   -6.691685
-------------+----------------------------------------------------------------
       /ln_p |   .9768701   .0474845    20.57   0.000     .8838022    1.069938
-------------+----------------------------------------------------------------
           p |    2.65613    .126125                      2.420084    2.915199
         1/p |   .3764876   .0178773                      .3430298    .4132088

/ln_p is log(α), the log of the Weibull shape parameter (Stata calls it p). Thus, α = 2.656 is greater than 1. So there is positive duration dependence.


Females are less likely to be promoted to full professor at any given level of experience. The coefficient (-0.459) indicates that the female hazard is smaller than the male hazard by the multiplicative factor exp(-0.459)=0.632.

Now, let us compare the survival functions for males and females.


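The survival functions for males and females “at average” (with phdabroad and bookrate at their sample means, and α the Weibull shape parameter) are given by

Male:

\[ S_M(t) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 0 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) t^{\alpha}\right) \]

Female:

\[ S_F(t) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 1 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) t^{\alpha}\right) \]

Stata plots these survival functions automatically.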

. stcurve, survival at1(female=1) at2(female=0)

[Figure: “Weibull regression”. Estimated survival curves for female=1 and female=0, with survival (0 to 1) on the vertical axis and analysis time (0 to 40) on the horizontal axis.]

The females’ survivor function lies above the males’. Thus, females are less likely to be promoted to full professor at any given level of experience.


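Now, let us compute the survival probability at t=20 (α is the Weibull shape parameter).

\[ S_M(20) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 0 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) 20^{\alpha}\right) = 0.093 \]

\[ S_F(20) = \exp\left(-\exp\left(\beta_0 + \beta_1 \cdot 1 + \beta_2\,\overline{\text{phdabroad}} + \beta_3\,\overline{\text{bookrate}}\right) 20^{\alpha}\right) = 0.223 \]

Thus, the difference in the survival probability is 0.13. At 20 years of experience, the probability of having been promoted to full professor is 13 percentage points lower for females. The next slide shows how I computed these probabilities.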

Computing the survival probability at t=20 for the Weibull model

. egen av_phd=mean(phdabroad)

. egen av_book=mean(book_rate)

. gen xbmale= _b[_cons]+_b[female]*0+_b[phdabroad]*av_phd+_b[book_rate]*av_book

. gen xbfemale= _b[_cons]+_b[female]*1+_b[phdabroad]*av_phd+_b[book_rate]*av_book

. gen surv_m=exp(-exp(xbmale)*(20^exp(_b[/ln_p])))

. gen surv_f=exp(-exp(xbfemale)*(20^exp(_b[/ln_p])))

. su surv_m surv_f

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      surv_m |       324    .0929515           0   .0929515   .0929515
      surv_f |       324    .2228726           0   .2228726   .2228726
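The same Weibull formula gives the survival probability at durations other than 20. The sketch below does not use the data: it backs out exp(xb) for the average male from the reported S_M(20) = 0.0929515 and the reported shape parameter p = 2.65613, then evaluates the survivor function at a few other durations. The function name `surv_male` is illustrative.

```python
import math

p = 2.65613            # reported Weibull shape parameter
surv_m_20 = 0.0929515  # reported survival probability for the average male at t=20

# S(t) = exp(-exp(xb) * t**p)  =>  exp(xb) = -log S(20) / 20**p
exp_xb_male = -math.log(surv_m_20) / 20.0 ** p

def surv_male(t):
    """Weibull survivor function for the average male."""
    return math.exp(-exp_xb_male * t ** p)

for t in (5, 10, 15, 20):
    print(t, round(surv_male(t), 3))
```

Because p is greater than 1, the survival probability falls slowly at short durations and much faster later on.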


Piecewise-constant hazard model

This is perhaps the most flexible model. In this model, you have to segment the duration into several pieces.

Then, you assume that (i) within each segment, the hazard is constant, but (ii) between segments, the hazard can differ.


The hazard function can be written as:

λ(t) = λ1 for 0 ≤ t ≤ c1
     = λ2 for c1 < t ≤ c2
       ...
     = λM for cM-1 < t < ∞


For example, suppose that you segment the duration into three pieces. Then the piecewise-constant hazard function would look like:

[Figure: step function λ(t) against t, taking the value λ1 up to c1, λ2 between c1 and c2, and λ3 beyond c2.]


In the piecewise-constant hazard model, you estimate λ1~λM as well as β. The practical estimation is illustrated as follows.

Suppose you segment the duration into three pieces: 0~10, 11~20, 21~∞.

Let B1 be the dummy variable that takes 1 if the recorded duration is in the first segment. B2 is the dummy variable that takes 1 if the recorded duration is in segment 2. B3 is the dummy for those whose recorded duration is in segment 3.


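Then, the piecewise hazard function can be written as:

\[ \lambda(t) = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3)\exp(x\beta) = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3 + x\beta) \]

where γ1~γ3 are the coefficients on the dummies B1~B3, and

\[ \lambda_1 = \exp(\gamma_1), \quad \lambda_2 = \exp(\gamma_2), \quad \lambda_3 = \exp(\gamma_3) \]

In estimation, you estimate γ1 ~ γ3.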


As can be seen, this is the same model as the exponential hazard model. The only difference is that you have included 3 dummies, B1, B2 and B3.

The survival function of the piecewise-constant hazard model has a somewhat complicated form:

\[ S(t) = \exp\left(-\left[\sum_{j=1}^{m-1} \lambda_j b_j + \lambda_m (t - c_{m-1})\right]\right) \quad \text{for } c_{m-1} < t \le c_m \]

where b_j = c_j - c_{j-1} (with c_0 = 0).

It is easier to understand with an example. Suppose you segment the duration into three pieces: c1=5 and c2=10.

[Figure: step function λ(t) against t, equal to λ1 up to t=5, λ2 between 5 and 10, and λ3 beyond 10; a point t* is marked to the right of 10.]


Now suppose that you want to compute the survival probability at t* years. Then it will be given by:

\[ S(t^*) = \exp\left(-\left[5\lambda_1 + 5\lambda_2 + \lambda_3 (t^* - 10)\right]\right) \]
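As a worked illustration of this formula, take hypothetical hazard pieces λ1 = 0.05, λ2 = 0.07 and λ3 = 0.08 (made-up values, not estimates) and t* = 12:

\[ S(12) = \exp\left(-\left[5(0.05) + 5(0.07) + 0.08(12 - 10)\right]\right) = \exp\left(-\left[0.25 + 0.35 + 0.16\right]\right) = \exp(-0.76) \approx 0.468 \]

Each term is the width of a segment times the hazard in that segment, with only the elapsed part of the last segment counted.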


In piecewise-hazard function estimation, it is a good idea to use the demeaned explanatory variables.

This is because, if the explanatory variables are demeaned, then the estimated hazard pieces λ1~λM are the hazard pieces for the ‘average person’.


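To see this, note that if you estimate the following hazard (γ1~γ3 are the coefficients on the dummies B1~B3, and x̄ is the sample mean of x),

\[ \lambda(t) = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3)\exp[(x - \bar{x})\beta] \]

then, at the average, we have

\[ \lambda(t) = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3)\exp[(\bar{x} - \bar{x})\beta] = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3)\exp[0] = \exp(\gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3) \]

Thus, the estimated hazard pieces are the hazard pieces for the “average person”.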


When you divide the duration into several segments, make sure that each segment contains at least one person who has been promoted to full professor. Otherwise, you cannot estimate this model.

Finally, when you estimate the model, you should estimate it without the constant.


Piecewise-constant hazard example

Use promotion.dta to estimate the piecewise-constant hazard model.

Let us divide the duration into segments in the following way:

Segment 1: 0~5 years
Segment 2: 6~10 years
:
Segment 5: 21~25 years
Segment 6: 26 years or greater


Demean all the explanatory variables except female, so that the estimated hazard pieces are for the “average males.”

Create the hazard piece dummies

. *****************************
. *Set duration data *
. *****************************

. stset durat, failure(promoted) id(id)

                id:  id
     failure event:  promoted != 0 & promoted < .
obs. time interval:  (durat[_n-1], durat]
 exit on or before:  failure

        324  total obs.
          0  exclusions
        324  obs. remaining, representing
        324  subjects
        200  failures in single failure-per-subject data
       3256  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        39

. *******************************
. *Create the hazard piece dummy*
. *******************************

. gen tp1=0
. replace tp1=1 if durat>=0 & durat<=5
(71 real changes made)

. gen tp2=0
. replace tp2=1 if durat>=6 & durat<=10
(110 real changes made)

. gen tp3=0
. replace tp3=1 if durat>=11 & durat<=15
(102 real changes made)

. gen tp4=0
. replace tp4=1 if durat>=16 & durat<=20
(28 real changes made)

. gen tp5=0
. replace tp5=1 if durat>=21 & durat<=25
(6 real changes made)

. gen tp6=0
. replace tp6=1 if durat>=26
(7 real changes made)

Then demean the explanatory variables

. ****************************
. *Demean the variables *
. ****************************

. egen av_phd=mean(phdabroad)

. gen dphdabroad=phdabroad-av_phd

. egen av_book=mean(book_rate)

. gen dbookrate=book_rate-av_book

The piecewise-constant hazard model is the same as the exponential hazard model plus the hazard piece dummies. Do not include the constant (nocons).

. streg tp1 tp2 tp3 tp4 tp5 tp6 female dphdabroad dbookrate, nocons dist(exponential) nohr

         failure _d:  promoted
   analysis time _t:  durat
                 id:  id

Iteration 0:   log likelihood =  -360.1429
Iteration 1:   log likelihood = -268.50841
Iteration 2:   log likelihood = -265.87692
Iteration 3:   log likelihood = -265.80593
Iteration 4:   log likelihood =  -265.8059
Iteration 5:   log likelihood =  -265.8059

Exponential regression -- log relative-hazard form

No. of subjects =          324                Number of obs   =       324
No. of failures =          200
Time at risk    =         3256
                                              Wald chi2(9)    =   1512.80
Log likelihood  =    -265.8059                Prob > chi2     =    0.0000

          _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         tp1 |  -2.950586   .3100712    -9.52   0.000    -3.558314   -2.342858
         tp2 |   -2.69848   .1316696   -20.49   0.000    -2.956548   -2.440412
         tp3 |  -2.567151   .1069942   -23.99   0.000    -2.776856   -2.357447
         tp4 |   -2.92134   .2066626   -14.14   0.000    -3.326391   -2.516288
         tp5 |  -3.331892   .4510981    -7.39   0.000    -4.216028   -2.447757
         tp6 |  -3.845459    .502513    -7.65   0.000    -4.830367   -2.860552
      female |  -.2709733   .2246917    -1.21   0.228    -.7113608    .1694143
  dphdabroad |   .2081765   .2252712     0.92   0.355    -.2333469    .6496999
   dbookrate |   1.030095   .3934427     2.62   0.009     .2589616    1.801229


The female hazard is smaller than the male hazard by the multiplicative factor exp(-0.271)=0.76. However, the coefficient is not significant.

Stata cannot compute the survivor function automatically for the piecewise-constant hazard model.

I recommend using Excel to compute it, since this is perhaps the quickest way to do so.


To compute the survivor function, note that the estimated hazard pieces are the exponentiated coefficients for tp1~tp6.

Then, you use the following formula to compute it:

\[ S(t) = \exp\left(-\left[\sum_{j=1}^{m-1} \lambda_j b_j + \lambda_m (t - c_{m-1})\right]\right) \quad \text{for } c_{m-1} < t \le c_m \]

where b_j = c_j - c_{j-1} (with c_0 = 0).

The next slide provides a graphical illustration of how to compute this.

Suppose that t is in the 3rd segment.

[Figure: step function λ(t) against t, equal to λ1 up to t=5, λ2 between 5 and 10, and λ3 beyond 10; the point t lies to the right of 10.]


Then the survival probability is given by:

\[ S(t) = \exp\left(-\left[5\lambda_1 + 5\lambda_2 + \lambda_3 (t - 10)\right]\right) \]

This is simply the following:

\[ S(t) = \exp\left(-(\text{area under the hazard function up to } t)\right) \]

Now, let us compute and plot the survival function for the “average males”.

Since we have demeaned the explanatory variables, the exponentiated coefficients for tp1~tp6 are the hazard pieces for the “average males”.


Now, let us compute the survival probability for females.

To do so, simply notice that the hazard pieces for females are given by:

\[ \text{hazard pieces for females} = \lambda_j \exp(\beta_{\text{female}}) \]

Thus, first multiply the hazard pieces by exp(βfemale). Then compute the survivor function in the same way as for males.
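The spreadsheet calculation can also be sketched in a few lines of Python. The sketch below uses the reported coefficients on tp1~tp6 and on female, and computes the survivor function as exp(-(area under the hazard)). Treating the segment boundaries as the cut points 5, 10, 15, 20 and 25 years is an assumption made here for illustration, and the function name `survivor` is hypothetical.

```python
import math

# Reported coefficients on tp1..tp6; their exponentials are the hazard pieces
tp_coefs = [-2.950586, -2.69848, -2.567151, -2.92134, -3.331892, -3.845459]
beta_female = -0.2709733
cut_points = [5.0, 10.0, 15.0, 20.0, 25.0]  # assumed segment boundaries in years

def survivor(t, hazards, cuts):
    """S(t) = exp(-(area under the piecewise-constant hazard up to t))."""
    area, start = 0.0, 0.0
    for lam, end in zip(hazards, cuts + [math.inf]):
        if t <= start:
            break
        area += lam * (min(t, end) - start)
        start = end
    return math.exp(-area)

male_hazards = [math.exp(a) for a in tp_coefs]
female_hazards = [lam * math.exp(beta_female) for lam in male_hazards]

for t in (5, 10, 20, 30):
    print(t, round(survivor(t, male_hazards, cut_points), 3),
          round(survivor(t, female_hazards, cut_points), 3))
```

Because every female hazard piece is the male piece times exp(βfemale), which is below 1, the female survival probability is above the male one at every t greater than 0.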

[Figure: Survival functions for males and females]