TESTING CLUSTER-LEVEL RANDOM EFFECT IN JOINT
MODELS OF CLINICAL TRIAL DATA
by
Xin Yao
A thesis submitted to the Department of Public Health
In conformity with the requirements for
the degree of Master of Science
Queen’s University
Kingston, Ontario, Canada
(November, 2014)
Copyright ©Xin Yao, 2014
Abstract
To obtain enough participants, clinical trials often involve patients recruited from
multiple institutions; one example is the HD.6 Hodgkin's Lymphoma clinical trial conducted by
the NCIC Clinical Trials Group (CTG). However, these institutions may have inherent
heterogeneity that affects the outcomes of the clinical trial. Such heterogeneity can be
described by cluster-level random effects. In a previous study, Wang (2013) used a joint
model to analyze the relationship between remission response and survival in
the HD.6 clinical trial. In this study, we develop both asymptotic and bootstrapping
methods to test whether the variance of the cluster-level random effect is larger than zero
for multivariate outcomes. These methods extend the classical asymptotic results for the
score test of homogeneity to multivariate outcomes. Both methods involve the
construction of a score-based test statistic, but the bootstrapping method approximates the
distribution of the statistic by resampling, while the asymptotic method uses the analytical
variance for statistical inference. A series of simulations were conducted, and the results
showed that the bootstrapping method has higher power than the asymptotic
method. This is because the bootstrapping method does not make
assumptions about the underlying distribution of the statistic that could be false. We also
showed that the proposed bootstrapping method can be useful for
epidemiologists, as it allows the optimal sample size to be determined during the design of
an investigative study. We applied both the bootstrapping and asymptotic methods to the
NCIC CTG HD.6 clinical trial data and found no evidence that the
outcomes were affected by cluster-level random effects. For future directions, the
theory behind how to determine the optimal sample size in studies in which the bootstrap method
is used to test cluster effects should be investigated further. Furthermore, the effect of
variation in cluster size on the power of the bootstrap method should be studied.
Co-Authorship
Acknowledgements
First, I would like to thank Dr. Bingshu Chen, Dr. Wenyu Jiang, and the Department
of Public Health Sciences for granting me the opportunity to enroll in the Master of
Science in Biostatistics. My two supervisors have provided continuous mentorship and
guidance from 2013 to 2014, and assisted me throughout the practicum project. Without
their help this project would not have been possible.
I would also like to thank the faculty members in both the Department of Public Health
Sciences and the Department of Mathematics and Statistics. They have taught me crucial
knowledge throughout the school year and prepared me for this project.
Finally, I would like to thank the NCIC Clinical Trials Group for providing a supportive
working environment and all the facilities I needed. I would also like to thank the Natural
Sciences and Engineering Research Council of Canada (NSERC) for financial support.
Statement of Originality
I hereby certify that all of the work described within this thesis is the original work of the author.
Any published (or unpublished) ideas and/or techniques from the work of others are fully
acknowledged in accordance with the standard referencing practices.
(Xin Yao)
(October, 2014)
Table of Contents
Abstract ............................................................................................................................................ 2
Co-Authorship ................................................................................................................................. 4
Acknowledgements .......................................................................................................................... 5
Statement of Originality ................................................................................................................... 6
List of Figures .................................................................................................................................. 9
List of Tables ................................................................................................................................. 10
Chapter 1 Literature Review and Introduction .............................................................................. 11
1.1 Cluster effect in multicenter clinical trials ........................................................................... 11
1.1.1 Consequences of Cluster Effects ................................................................................... 11
1.1.2 Correcting For Cluster Effect in Practice ...................................................................... 12
1.1.3 Asymptotic Approach of Detecting Cluster-level Random Effect ................................ 14
1.2 Bootstrap .............................................................................................................................. 16
1.2.1 Introduction ................................................................................................................... 16
1.2.2 Bootstrap Methods Applied for the Modeling of Cluster Effects ................................. 18
1.3 Notations .............................................................................................................................. 20
1.4 HD.6 Hodgkin Lymphoma Clinical Trial and Joint Modeling ............................................ 21
1.5 Purpose of Study .................................................................................................................. 25
Chapter 2 Bootstrap and Asymptotic Variance Testing Procedures .............................................. 28
2.1 Overview .............................................................................................................................. 28
2.2 Calculating score-based statistics S...................................................................................... 29
2.3 Inference using Asymptotic Method .................................................................................... 32
2.4 Bootstrap Procedure ............................................................................................................. 33
2.5 Calculating Type I Error and Power .................................................................................... 35
Chapter 3 ........................................................................................................................................ 36
3.1 Overview .............................................................................................................................. 36
3.2 Methods ............................................................................................................................... 36
3.3 Simulation Results ............................................................................................................... 38
3.3.1 Results of Asymptotic Method ..................................................................................... 38
3.3.2 Bootstrap Estimate Distribution .................................................................................... 39
3.3.3 Type I Error and Power Comparison ............................................................................ 40
Chapter 4 Application and Discussion ........................................................................................... 44
4.1 Application to HD.6 Clinical Trial ...................................................................................... 44
4.2 Comparison of bootstrap method with the asymptotic method ........................................... 47
4.3 Sample Size Selection .......................................................................................................... 48
4.4 Future Directions ................................................................................................................. 49
4.5 Computational Expense ....................................................................................................... 50
Chapter 5 ........................................................................................................................................ 51
Bibliography .................................................................................................................................. 53
Appendix A .................................................................................................................................... 58
Appendix B .................................................................................................................................... 60
Appendix C .................................................................................................................................... 63
Appendix D .................................................................................................................................... 69
List of Figures
Figure 1 Distribution of Participants in HD.6 Clinical Trial ......................................................... 22
Figure 2 Bootstrap Estimate Distribution of 4 Datasets ................................................................ 39
Figure 3 Distribution of bootstrap estimates obtained from HD.6 clinical trial data ..................... 45
List of Tables
Table 3.1 Parameters of Simulation ............................................................................................... 37
Table 3.2 Type I error of Asymptotic Approach and Bootstrap with K = 500 replications .......... 41
Table 3.3 Power of Asymptotic and Bootstrap Method ................................................................. 42
Table 3.4 Relationship between Sample Size and Statistical Power ............................................. 43
Chapter 1
Literature Review and Introduction
1.1 Cluster effect in multicenter clinical trials
1.1.1 Consequences of Cluster Effects
In the ideal scenario for a clinical trial, one central agency randomly
assigns treatments to all participants. This provides a statistical basis for testing the null
hypothesis of no treatment effect. In reality, clinical trials are often designed to involve
multiple institutions, to ensure an adequate number of participants and the
generalizability of the trial, especially in the case of rare diseases such as Hodgkin's
Lymphoma (HL), or where a study involves many collaborating groups. However, effect
modification could arise if patients recruited at one specific center share certain common
characteristics, or if the medical team at one center practices in a particular pattern.
Furthermore, correlation between patient outcome and exposure within a particular
institution presents further analytical challenges. If the correct analytical procedure is not
used in the inference process and the data collected from different centers are simply
pooled, center-specific variations could lead to incorrect p-values as well as misleading
confidence intervals, biased estimates, and unrecognized heterogeneity among clusters
(Glidden and Vittinghoff, 2004; Localio et al., 2001).
For example, in 1998, 41,000 Medicare patients in 73 Pennsylvania hospitals
participated in a trial designed to study cardiac catheterization. In the analysis by Laine et
al. of the rate of combined cardiac catheterization, variation among hospitals was
ignored and patient outcomes were assumed to be independent. The initial pooled
analysis produced a confidence interval of [33.7%, 34.7%] for the overall rate. However,
subsequent analysis showed that, individually, the catheterization rate in each hospital varied
from 2% to 98%. These data suggest that the pooled estimate grossly understated the
variation in catheterization rates among the participating hospitals. After adjustment for
the correlation of patients within hospitals, the confidence interval for the overall rate became
[25%, 43%]. This example demonstrates that, without accounting for cluster-level random
effects, pooling of clustered data can greatly overstate the precision of the data, among
other consequences (Laine et al., 1998; Localio et al., 2001).
1.1.2 Correcting For Cluster Effect in Practice
There are several methods in the literature that address the issue of cluster-specific
variation in different settings, such as binary response studies and time-to-event
analysis. For binary response studies, existing methods that correct for clustering effects
can be categorized based on whether the inference procedure is conditional on the
centers. In conditional methods, the effect of treatment is analyzed within each cluster,
and the average effect of treatment across centers is calculated. Analysis of the
relationship between the covariates and the outcome is also done within clusters. Such
methods are appropriate for traditional multicenter trials in which patients are randomly
assigned to different treatments within each center. One of the most elementary methods in this
category is the Mantel-Haenszel method (Mantel and Haenszel, 1959), which allows
estimation of odds ratios or relative risks in clustered data, but is limited to binary
outcomes and scenarios with a single covariate. Conditional logistic regression can include
multiple covariates, but can only estimate patient-level factors. Fixed-effect logistic
regression assumes that the effect of particular centers on the outcome is fixed. In this
method each center is represented by an indicator variable, and the estimated regression
coefficient of each indicator is the risk estimate for all patients within the center. Finally,
the random-effect model assumes that the centers involved in the study are random samples
drawn from a population that follows an underlying distribution, usually assumed to be
normal. The effect of each center is estimated through a single random-effect variable.
Therefore, the random-effect model is appropriate for situations with many centers, as
opposed to fixed-effect logistic regression.
In comparison, unconditional methods take into account both within- and
between-center variation. Some well-established survey methods are designed for
use in surveys where participants are clustered, such as the one conducted by Bassuk et al.
to investigate cognitive decline among seniors. These survey-sampling methods are well
suited to analyzing the between-center differences present in clinical trials. Another method is
generalized estimating equations (GEE), which represent a set of methods that allow
estimation of the effects of treatment and of exposure to other covariates (Liang and Zeger,
1986). Confidence intervals of estimates are adjusted to account for correlations within
clusters. These methods are suitable for situations in which many clusters are involved.
1.1.3 Asymptotic Approach of Detecting Cluster-level Random Effect
Several methods have been developed to analyze the variation among institutions
as random effects in multivariate models. Matsuyama et al. (1998) used a Bayesian
hierarchical survival model to investigate the impact of institutional variation on the
efficacy of treatment in a multicenter cancer clinical trial. In that study, survival was the
endpoint of interest. Several statistical testing methods have also been developed
specifically to test for homogeneity among strata. Different versions of these tests are
available for survival data (Gray, 1995) and generalized linear models (Liang, 1987; Lin,
1997; Smith, 1993). In these methods, the variation among institutions is treated as a
random effect and the objective is to test the null hypothesis that the variance of this
random effect is 0. The test statistics used involve the score function and the observed
Fisher information.
In a method proposed by Liang (1987), mixed-effect models were used to model
samples grouped in $m$ potential clusters. A specific outcome variable $Y$ for cluster $i$ that
contains a group of samples has a corresponding density function $f_i(y_i; \beta, \alpha_i)$, in which
$\alpha_i = \alpha + \theta^{1/2} v_i$, and $v_i$ is independently generated from an unknown distribution $F(\cdot)$
with zero mean and unit variance. In other words, $\beta$ represents the fixed effects of covariates
other than clusters, while $\alpha_i$ represents the random effect of cluster $i$. The null
hypothesis to be tested is $H_0: \theta = 0$. If $\theta = 0$, then the random effect disappears and a
fixed-effect model alone is sufficient to model the outcome and exposure, which means
that the variation among centers is negligible.
To test the hypothesis of homogeneity among clusters, Liang (1987) proposed the
following score-based test statistic:

$$ S = \sum_{i=1}^{m} \frac{\partial}{\partial\theta} \log f_i(\hat{\beta}, \hat{\alpha}, \theta) = \frac{1}{2} \sum_{i=1}^{m} \left[ \left\{ \frac{\partial}{\partial\alpha_i} \log f_i(\hat{\beta}, \hat{\alpha}) \right\}^2 + \frac{\partial^2}{\partial\alpha_i^2} \log f_i(\hat{\beta}, \hat{\alpha}) \right], \qquad (1.1) $$
in which $f_i(\hat{\beta}, \hat{\alpha}, \theta)$ represents the density of cluster $i$, and $\frac{\partial}{\partial\alpha_i}\log f_i(\hat{\beta},\hat{\alpha})$ and $\frac{\partial^2}{\partial\alpha_i^2}\log f_i(\hat{\beta},\hat{\alpha})$
are the score function and observed information for the random effect $\alpha_i$, calculated under
the null hypothesis ($\theta = 0$). $\hat{\beta}$ represents the column vector of maximum likelihood
estimators of the fixed-effect covariates and $\hat{\alpha}$ represents the random effects associated with
clusters. (A detailed derivation of the score-based statistic $S$ is given in Appendix A.) The
statistic $I$ is used to approximate the variance of the score-based statistic
asymptotically. Let $l_i$ represent the log-likelihood function $\log f_i(\beta, \alpha)$; then $I$ has the
following form:

$$ I = I_{\theta\theta} - \begin{bmatrix} I_{\theta\beta}^T & I_{\theta\alpha} \end{bmatrix} \begin{bmatrix} I_{\beta\beta} & I_{\alpha\beta} \\ I_{\alpha\beta}^T & I_{\alpha\alpha} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta\beta} \\ I_{\theta\alpha} \end{bmatrix}, \qquad (1.2) $$
where

$$ I_{\theta\theta} = \sum_{i=1}^{m} \left(\frac{\partial l_i}{\partial\theta}\right)^2, \quad I_{\alpha\alpha} = \sum_{i=1}^{m} \left(\frac{\partial l_i}{\partial\alpha}\right)^2, \quad I_{\beta\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\beta}\frac{\partial l_i}{\partial\beta^T}, $$

$$ I_{\theta\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\beta}, \quad I_{\theta\alpha} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\alpha}, \quad I_{\alpha\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\alpha}\frac{\partial l_i}{\partial\beta}, \quad I_{\alpha\beta}^T = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\alpha}\frac{\partial l_i}{\partial\beta^T}. \qquad (1.3) $$
The rejection criterion involves the normalized statistic $T = S / I^{1/2}$. This test
statistic was proved to have an asymptotic standard normal distribution. Large values of
$T$ lead to rejection of the null hypothesis. Liang argued that since the "parameters
specified by the null hypothesis is on the boundary of the parameter space formed by $\beta$, $\alpha$
and $\theta$", the test based on $T$ needs to be one-sided. Therefore, the rejection criterion
is $p < \alpha$ ($\alpha = 0.025$) (Liang, 1987).
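As an illustration of this testing procedure, the computation of $T$ and the one-sided p-value can be sketched as follows. This is a simplified sketch, not code from the thesis: the per-cluster score and information inputs are assumed to be precomputed under the null fit, and the variance estimate here uses only the sum of squared per-cluster contributions, omitting the correction term in (1.2) for the estimated $\beta$ and $\alpha$.

```python
import numpy as np
from math import erfc, sqrt

def liang_score_test(score_alpha, info_alpha):
    """One-sided score test of homogeneity among clusters (H0: theta = 0).

    score_alpha : per-cluster scores d/d(alpha_i) log f_i, at the null fit
    info_alpha  : per-cluster observed information -d^2/d(alpha_i)^2 log f_i
    """
    score_alpha = np.asarray(score_alpha, dtype=float)
    info_alpha = np.asarray(info_alpha, dtype=float)
    # Per-cluster contributions to S in (1.1); the second derivative of
    # log f_i equals minus the observed information, hence the subtraction.
    contrib = 0.5 * (score_alpha ** 2 - info_alpha)
    S = contrib.sum()
    # Simplified variance estimate (assumption): sum of squared per-cluster
    # contributions; the full (1.2) subtracts a further correction term.
    I = np.sum(contrib ** 2)
    T = S / sqrt(I)
    # One-sided p-value from the standard normal survival function;
    # large T leads to rejection of H0.
    p_value = 0.5 * erfc(T / sqrt(2.0))
    return T, p_value
```

The helper name `liang_score_test` and the toy inputs are hypothetical; in practice the score and information terms come from the fitted null model.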
Gray (1995) also proposed a test of cluster effect based on martingale residuals for
survival data, and the test statistic has a similar form to (1.1). Lin (1997) showed a method to
detect random effects in generalized linear models. In all these methods, the test statistics
are the same as those shown in (1.1), (1.2) and (1.3), and the rejection criteria are similar to
the one demonstrated above.
In this project, we will develop statistical methods to test cluster effects for
multiple outcome variables, including both a binary response variable and a survival
outcome. Two different approaches will be considered: in one approach, we extend the
existing asymptotic method to the multiple-outcomes setting; in the other, we use the
bootstrap method to approximate the distribution of the score test statistic. In the next
section, we review some basic ideas of the bootstrap method to be used in this project.
1.2 Bootstrap
1.2.1 Introduction
Inspired by the jackknife method, Efron (1979) was the first to introduce the
bootstrap method, in an attempt to improve the estimation of statistics such as variances
and regression estimators in a nonparametric manner. The idea is that, to estimate a
parameter $\theta$ for a population with unknown distribution, an approximation to the
population distribution can be constructed by resampling from the observed sample,
generating a collection of bootstrap samples. Later, Bickel and Freedman
(1981) showed that Efron's method is asymptotically valid in many situations and can
improve the asymptotic accuracy in some of them. Rubin (1981) developed a Bayesian
analogue of the bootstrap. This method simulates the posterior distribution of the
parameter of interest, instead of the sampling distribution of a statistic of interest. The
result is a collection of parameter estimates inferred from the distribution of the bootstrap
statistics.
Suppose the parameter of interest $\theta$ can be estimated by an estimator $\hat{\theta}$ derived from
the sample. Bootstrap samples can be repeatedly drawn from the observed sample. On
each bootstrap sample, a bootstrap estimate $\hat{\theta}^*$ can be calculated in the same way as $\hat{\theta}$ is
from the observed sample. A popular way of assessing the uncertainty of the estimates
and establishing a $1-\alpha$ level confidence interval for the parameter $\theta$ is the percentile
bootstrap method (Efron and Tibshirani, 1993). In this method a two-sided $1-\alpha$
confidence interval for $\theta$ is simply given by the percentiles marking the middle $1-\alpha$ portion of
the distribution of the bootstrap estimates $\hat{\theta}^*$. This method is most
appropriate when the distribution of bootstrap estimates is symmetric and centered on
the observed statistic. We will apply this method later when we make inference about
the parameter of interest.
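The percentile method described above can be sketched in a few lines. This is a generic illustration, not code from the thesis; the function name `percentile_ci` and the toy data (with the sample mean as the statistic) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def percentile_ci(sample, statistic, n_boot=2000, alpha=0.05):
    """Two-sided 1 - alpha percentile bootstrap confidence interval."""
    sample = np.asarray(sample)
    boot_estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw a bootstrap sample: resample the observations with replacement.
        resampled = rng.choice(sample, size=len(sample), replace=True)
        # Recompute the estimator on the bootstrap sample.
        boot_estimates[b] = statistic(resampled)
    # The percentiles marking the middle 1 - alpha portion of the
    # bootstrap distribution form the confidence interval.
    lo, hi = np.percentile(boot_estimates,
                           [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy usage with the sample mean as the statistic of interest.
data = rng.normal(loc=5.0, scale=2.0, size=100)
lo, hi = percentile_ci(data, np.mean)
```

Because the bootstrap distribution of the sample mean is centered on the observed mean, the resulting interval contains it, which matches the symmetry condition noted above.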
The estimates calculated from bootstrap samples may disagree with the
true value of the parameter $\theta$ systematically, in which case a bias occurs. Efron (1987)
developed new methods of estimating bootstrap confidence intervals, showing how
a bias-correction constant can improve the accuracy of the bootstrap. Steck
and Jaakkola (2003) also demonstrated a method of leading-order bias correction for
bootstrapped scoring functions as well as maximum likelihood.
1.2.2 Bootstrap Methods Applied for the Modeling of Cluster Effects
Besides the methods introduced in section 1.1.2, bootstrapping is another
unconditional method for studying the clustering effect in multicenter studies. There are
several such bootstrap methods in the literature, each with a distinct way of resampling.
Random cluster bootstrap involves random sampling of clusters with replacement and
subsequent permutation of observations within clusters. In the reverse two-stage bootstrap,
observations within each cluster are selected with replacement, and then the clusters are
again selected using random sampling with replacement (RSWR). As a result, clusters
that appear in the final collection of bootstrap samples contain the same set of cluster
members.
Applying the bootstrap to mixed-effect models (mixed-effect bootstrap) is also
common practice. In a data structure where the observations are divided into $m$ clusters
with $n_i$ observations in cluster $i$, the response variable $y_{ij}$ is modeled using a
linear regression model $y_{ij} = X_{ij}^T \beta + \mu_i + e_{ij}$, where $X_{ij}^T$ is a vector of fixed-effect
covariates, $\beta$ is a vector of fixed-effect regression coefficients, $\mu_i$ is the effect of cluster $i$
with $\mu_i \sim N(0, \sigma_\mu^2)$, and $e_{ij}$ describes random error and is a random variable drawn from
a $N(0, \sigma_e^2)$ distribution. The basic sampling approach is to generate $m$ cluster-level errors
$\mu_i^*$ from the $N(0, \sigma_\mu^2)$ distribution, and a within-cluster error $e_{ij}^*$ for each observation $j$ in
cluster $i$. Next, a new dataset is simulated using the model $y_{ij}^* = X_{ij}^T \beta + \mu_i^* + e_{ij}^*$, and from
this dataset one bootstrap parameter estimate $\hat{\theta}^*$ can be obtained. This process is repeated
until a satisfactory number of bootstrap estimates is obtained. This method works well
when the cluster-level random effects are independently and identically distributed and
the mixed-effect model coefficients are statistically significant (Chambers and Chaundra,
2014).
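The mixed-effect bootstrap sampling step described above can be sketched as follows. This is a generic illustration under assumed fitted values, not code from the thesis; the function name `mixed_effect_bootstrap` and all numeric inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def mixed_effect_bootstrap(X, cluster_sizes, beta_hat, sigma_u, sigma_e):
    """Generate one bootstrap dataset y*_ij = X_ij^T beta + u_i* + e_ij*.

    X             : (N, p) design matrix, rows ordered by cluster
    cluster_sizes : list of n_i, one per cluster (sum = N)
    beta_hat      : fitted fixed-effect coefficients
    sigma_u       : fitted cluster-level standard deviation
    sigma_e       : fitted residual standard deviation
    """
    m = len(cluster_sizes)
    # m cluster-level errors u_i*, one per cluster, expanded to observations.
    u_star = np.repeat(rng.normal(0.0, sigma_u, size=m), cluster_sizes)
    # Within-cluster errors e_ij*, one per observation.
    e_star = rng.normal(0.0, sigma_e, size=X.shape[0])
    return X @ beta_hat + u_star + e_star

# Toy usage: 3 clusters of sizes 4, 5, 6; intercept plus one covariate.
sizes = [4, 5, 6]
N = sum(sizes)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y_star = mixed_effect_bootstrap(X, sizes, beta_hat=np.array([1.0, 0.5]),
                                sigma_u=0.8, sigma_e=1.0)
```

Refitting the model to each simulated `y_star` yields one bootstrap estimate per replicate, as described in the text.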
All the methods above aim to generate empirical estimates that are used to make
statistical inference about the distribution of a particular test statistic. The effectiveness
of these methods has been assessed using a number of criteria in the existing literature.
Davison and Hinkley (1997) argued that the first two moments of the bootstrap statistic
should be as close as possible to the first two moments of the original distribution. Using
this criterion, they evaluated several bootstrap methods and found that the random cluster
bootstrap and the mixed-effect bootstrap are the most appropriate for bootstrapping
clustered data.
One major goal of this study is to formulate and apply a new bootstrap procedure
to clustered data. Contrary to the bootstrap methods described above, the parameters of
interest in this study are estimated under the null hypothesis, that is, the hypothesis that
there is no random effect at the cluster level. Therefore, a different sampling scheme is
needed. Instead of sampling clusters or random-effect variables, we will resample all
observations with replacement and allocate the bootstrap sample to the existing clusters,
as if there were no heterogeneity among the clusters.
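The proposed null resampling scheme can be sketched as follows. This is an illustrative sketch only; the function name `null_bootstrap_sample` and the toy data are assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def null_bootstrap_sample(data, cluster_sizes):
    """Resample all observations with replacement, ignoring cluster labels,
    then re-allocate the pooled bootstrap sample to the existing clusters.

    Under H0 there is no heterogeneity among clusters, so observations
    are treated as exchangeable across clusters.
    """
    data = np.asarray(data)
    # Pool all observations and resample with replacement.
    resampled = rng.choice(data, size=len(data), replace=True)
    # Re-allocate the bootstrap sample to clusters of the original sizes.
    boundaries = np.cumsum(cluster_sizes)[:-1]
    return np.split(resampled, boundaries)

# Toy usage: 20 observations allocated back to clusters of sizes 7, 5, 8.
observations = rng.normal(size=20)
clusters = null_bootstrap_sample(observations, [7, 5, 8])
```

Each bootstrap replicate preserves the original cluster sizes but breaks any true association between observations and clusters, which is exactly the null-hypothesis behavior the test statistic is calibrated against.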
1.3 Notations
In this section, we define notation that will be used throughout the following chapters:

$i$: $i = 1, 2, \ldots, m$ indexes the $i$th cluster.
$j$: $j = 1, 2, \ldots, n_i$ indexes the $j$th observation in the $i$th cluster.

In the GLMM model:
$X_{1ij}$: a $p \times 1$ vector of fixed-effect covariates, with the first element being 1 (the intercept), used in the GLMM model.
$\beta_1$: a $p \times 1$ vector of fixed-effect regression parameters.
$\hat{\beta}_1$: a $p \times 1$ vector of estimators of the fixed-effect regression parameters, obtained through GLMM regression.
$Y_{ij}$: the value of the remission response marker of observation $j$ in cluster $i$.
$\alpha_{1i}$: the random effect of the $i$th cluster, $\alpha_{1i} \sim N(\alpha_1, \sigma_1^2)$; $\alpha_{1i} = \alpha_1 + \theta_1^{1/2} v_{1i}$, where $\alpha_1$ and $\theta_1$ are both scalar quantities.

In the Cox-frailty model:
$t_{ij}$: time to failure or censoring for each subject $j$ in cluster $i$.
$\delta_{ij}$: censoring event indicator for subject $j$ in cluster $i$.
$X_{2ij}$: a $q \times 1$ vector of fixed-effect covariates used in the Cox-frailty model.
$\beta_2$: a $q \times 1$ vector of fixed-effect regression parameters.
$\hat{\beta}_2$: a $q \times 1$ vector of estimators of the fixed-effect regression parameters, obtained through Cox-frailty regression.
$\lambda_0(t)$: the baseline hazard function.
$\Lambda_0(t)$: the cumulative baseline hazard function.
$\alpha_{2i}$: the random effect of the $i$th cluster, $\alpha_{2i} \sim N(\alpha_2, \sigma_2^2)$; $\alpha_{2i} = \alpha_2 + \theta_2^{1/2} v_{2i}$, where $\alpha_2$ and $\theta_2$ are both scalar quantities.
$\sigma_{12}$: covariance of the random effects $\alpha_{1i}$ and $\alpha_{2i}$.
$\rho$: correlation of the random effects $\alpha_{1i}$ and $\alpha_{2i}$, $\rho = \sigma_{12} / \sqrt{\sigma_1^2 \sigma_2^2}$.
$v_{1i}$ and $v_{2i}$: random variables that are assumed to have a bivariate joint distribution.
1.4 HD.6 Hodgkin Lymphoma Clinical Trial and Joint Modeling
The NCIC Clinical Trials Group's HD.6 clinical trial is a suitable example for
studying institutional variation (Meyer et al., 2012). It was designed to compare the
efficacy of one treatment, a combination of doxorubicin, bleomycin, vinblastine
and dacarbazine (ABVD), with another that includes radiotherapy, with or without
ABVD therapy. The study involved 405 stage IA or IIA nonbulky Hodgkin's
lymphoma patients recruited from 29 medical institutions across Canada and the U.S.
Patients were first divided based on risk status into a "favorable risk cohort" and an
"unfavorable risk cohort" (defined in Meyer et al., 2005), then randomly assigned to two
treatment arms: an ABVD-alone group and a radiotherapy group. In the radiotherapy arm, the
unfavorable risk cohort received two cycles of ABVD treatment followed by
radiotherapy, while those with a favorable risk profile received subtotal nodal radiation
therapy; in the ABVD arm, both the favorable and unfavorable risk cohorts received four
cycles of ABVD. Patients were followed up for a median of 11.3 years. Both
remission status and survival time were measurements of interest. Remission status was
determined based on radiological or clinical evidence six months after randomization.
Those without remission are "response-positive" and those with remission are "response-negative".
By the time the study finished, the survival rate in the ABVD therapy group
was 94%, compared to 87% in the radiotherapy group. Another observation was that,
of the patients who achieved remission, 94% had no disease progression and 98%
survived at the end of the trial, compared to 81% with no disease progression and
92% surviving among patients who did not achieve remission. The following
factors were recorded for each patient:
Sex: 1 for male, 0 for female
Arm: treatment arm, ABVD alone = 0, ABVD + radiotherapy = 1
Remission status (Yij): no remission = 0, remission = 1
Risk profile: favorable risk = 0, unfavorable risk = 1.
The participant numbers are uneven across the institutions, with one institution
accounting for 100 of the participants. The distribution of participants is
shown in Figure 1:
Figure 1 Distribution of Participants in HD.6 Clinical Trial
In previous work by Jia Wang (2013), a joint model linking remission
response and survival probability was developed to compare the efficacy of the two
treatments under investigation, and to investigate whether remission response could be
used as a predictor of survival time. The joint model is made up of a Generalized Linear
Mixed Model (GLMM) that models the effect of the chosen covariates on remission
response, and a Cox-frailty survival model in which remission response is one of
the covariates. The cluster effect generated by institutional heterogeneity was included
in the model as two random-effect variables, $\alpha_{1i}$ and $\alpha_{2i}$, in the GLMM and Cox-frailty
model respectively. The overall joint model has the following form:
1) The Generalized Linear Mixed Model (GLMM) was developed using a
Bernoulli distribution for response to treatment:

$$ P(Y_{ij} = 1 \mid X_{1ij}, \beta_1, \alpha_{1i}) = \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}, \qquad (3) $$

where $\alpha_{1i}$ is the random effect of the $i$th cluster, $X_{1ij}$ is a $p \times 1$ vector of fixed-effect
covariates in the GLMM model, and $\beta_1$ is a $p \times 1$ vector of fixed-effect regression
coefficients in the GLMM, with sex, treatment arm and risk profile as covariates.

2) The Cox proportional hazards model:

$$ \lambda(t \mid X_{2ij}, \beta_2, \alpha_{2i}) = \lambda_0(t)\, e^{X_{2ij}^T \beta_2 + \alpha_{2i}}, \qquad (4) $$
where $t$ is the failure time, $\alpha_{2i}$ is the random effect of the $i$th cluster in the Cox-frailty model,
$X_{2ij}$ is a $q \times 1$ vector of fixed-effect covariates in the Cox proportional hazards model, and $\beta_2$
is a $q \times 1$ vector of fixed-effect regression coefficients in the Cox-frailty model, with sex,
remission response, treatment arm, and risk profile as covariates.
The two models above are joined together through the remission response variable
$Y_{ij}$ and through the joint distribution of the two random-effect variables. More
specifically, $\alpha_{1i}$ and $\alpha_{2i}$ are assumed to have a bivariate normal distribution:

$$ \boldsymbol{\alpha}_i = \begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix} \sim N_2\left( \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right) = N_2\left( \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \Sigma \right). $$
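As an illustration of this random-effect structure, one draw of jointly normal cluster effects can be simulated as follows. All parameter values here are hypothetical placeholders, not estimates from the thesis; only the number of clusters (29) comes from the HD.6 trial description.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical means, variances and covariance of the cluster-level random
# effects (alpha_1i for the GLMM, alpha_2i for the Cox-frailty model).
alpha_mean = np.array([0.0, 0.0])
sigma1_sq, sigma2_sq, sigma12 = 0.5, 0.3, 0.2
Sigma = np.array([[sigma1_sq, sigma12],
                  [sigma12, sigma2_sq]])

m = 29  # number of clusters, matching the 29 institutions in HD.6
# One pair (alpha_1i, alpha_2i) per cluster, jointly bivariate normal.
alpha = rng.multivariate_normal(alpha_mean, Sigma, size=m)

# Correlation implied by the covariance structure:
# rho = sigma_12 / sqrt(sigma_1^2 * sigma_2^2)
rho = sigma12 / np.sqrt(sigma1_sq * sigma2_sq)
```

A nonzero `sigma12` couples the two outcome models: a cluster with a high remission random effect also tends to have a shifted frailty in the survival model.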
In the study by Wang (2013), the penalized partial likelihood method was used to
estimate both fixed and random effects. More specifically, the variance and covariance of
the random effects were estimated using both the joint model and separate models, and
the accuracy of the estimates from these models was compared. The study demonstrated
that the joint model is preferable to the separate models in various respects, such as reduced bias
in estimates of fixed-effect parameters and variance components, and decreased mean
squared error for parameter estimation.
Using the joint model, Wang's study concluded that remission status differs
significantly between patients in the two treatment arms, and that neither sex nor risk profile
predicts remission status. The study also concluded that the binary remission response
can be an important predictor of the hazard rate of survival, and that the possible
interaction effect between remission response and treatment is not statistically significant.
Wang also used Maximum Partial Likelihood (MPL) and the jackknife method to provide a
point estimate for the variance of the cluster-level random effects.
1.5 Purpose of Study
The purpose of this study is to develop both asymptotic and bootstrapping methods to detect the existence of random cluster effects in clinical and epidemiological studies with multiple outcome variables. In the previous study by Wang, no test was included to assess whether the heterogeneity among centers was statistically significant. Furthermore, the existing testing methods described in section 1.1.2 have not yet been applied to the joint modeling situation. This study addresses both of these issues by combining a bootstrapping procedure with the score-based statistic described in (1.1).
In this study we intend to test the hypothesis that, in the joint model developed for studies that contain both binary and survival outcomes, the variances of the random effect components are 0. In other words, we seek to test the following null and alternative hypotheses:

H_0: \theta_1 = \theta_2 = 0
H_1: \theta_1 > 0, or \theta_2 > 0, or both \theta_1 > 0 and \theta_2 > 0
More specifically, the objectives include:
1. Develop both asymptotic and bootstrap methods to test the presence of random
effects.
2. Evaluate the validity of both the asymptotic and bootstrap testing methods by
calculating the type I error and power of the proposed testing methods.
3. Compare the bootstrapping method with the asymptotic variance test outlined in
section 1.2.
4. Apply the proposed methods to the HD.6 clinical trial to test for cluster effects.
After establishing the type I error and power of the bootstrap and asymptotic methods, we will also investigate whether it is possible to achieve a desired level of statistical power and test size by using specific sample sizes. This investigation is important for researchers who need to choose an optimal sample size to investigate cluster-level random effects in a given sample. Typically, when designing a study, researchers want to choose a sample size so that the method used to test the hypothesis at a given test size (0.025 for a one-sided test, 0.05 for a two-sided test) achieves specified type I error and power levels. We will show that such sample size selection is possible with both the bootstrap and asymptotic methods, but the theory behind how sample size affects the achieved type I error and power, as well as the precise methodology of sample size selection, are beyond the scope of this study.
We will not pursue bias estimation of the bootstrap statistics in this study, and will instead focus on developing a convenient method that researchers can use for cluster-effect detection and protocol design. As shown above, the bootstrap technique is much easier to master and involves less complicated calculations; it provides an approximation to the distribution of the test statistic. If the bootstrap method is accurate in test size and has the same or better statistical power, it can bring great benefits to various research efforts in the future.
Chapter 2
Bootstrap and Asymptotic Variance Testing Procedures
2.1 Overview
In this chapter, we develop approaches to test for the existence of cluster effects in the multivariate models discussed in section 1.4. This is equivalent to testing the hypothesis that the random effects in the multivariate models have variance 0. The first approach is based on an asymptotic variance method such as the one presented by Liang (1987), in which both the score test statistic S and the standardized score test statistic T are calculated as illustrated in section 1.1.2. In the second approach, instead of calculating T, we use the bootstrap method to establish the distribution of the score test statistic S under the null hypothesis H_0: \theta_1 = \theta_2 = 0. By comparing this distribution with the observed score statistic S of the data, an inference can be made about the random effect components. To assess the validity of both the asymptotic and bootstrap methods, we will study the distribution of the estimates generated by both methods, and compute their type I error and power. Lastly, we will apply these methods to a Hodgkin Lymphoma clinical trial conducted by NCIC, known as the HD.6 clinical trial, to evaluate the possibility of an institution-level cluster effect in this trial.
2.2 Calculating score-based statistics S
Let \alpha_{1i} denote the random effect of the ith cluster in the GLMM, and \alpha_{2i} the random effect of the ith cluster in the Cox-frailty model. We assume that these two random effects follow a bivariate normal distribution with zero means and variance matrix \Sigma, with variances \sigma_1^2 and \sigma_2^2 for \alpha_{1i} and \alpha_{2i} respectively, and covariance \sigma_{12}:
\boldsymbol{\alpha}_i = \begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right) = N_2(\mathbf{0}, \Sigma)   (2.1)
Under the null hypothesis, both \alpha_{1i} and \alpha_{2i} are 0. We use \pi_{ij} to denote the probability of the biomarker response y_{ij} taking value 1:

P(Y_{ij} = 1 \mid X_{1ij}, \beta_1, \alpha_{1i}) = \pi_{ij} = \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}   (2.2)
Wang (2013) defined the full joint likelihood function as

L = \int \prod_{i=1}^{m} \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left[ -\frac{1}{2} \boldsymbol{\alpha}_i^T \Sigma^{-1} \boldsymbol{\alpha}_i \right] \prod_{j=1}^{n_i} \left( \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right)^{y_{ij}} \left( \frac{1}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right)^{1 - y_{ij}} \left[ e^{-\Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}}} \right] \left[ \lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} \right]^{\delta_{ij}} d\boldsymbol{\alpha}_i   (2.3)
Furthermore, l denotes the joint log-likelihood function and has the following form:

l = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left\{ y_{ij}\left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log\left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) + \delta_{ij}\left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}   (2.4)

(more details in Appendix B)
Combining the log-likelihood function with the theory developed by Liang (1987), the score-based statistic S has the following form:

S = \frac{1}{2} \sum_{i=1}^{m} \left\{ \left( \frac{\partial l}{\partial \alpha_{1i}} \right)^2 + \left( \frac{\partial l}{\partial \alpha_{2i}} \right)^2 - \left( -\frac{\partial^2 l}{\partial \alpha_{1i}^2} \right) - \left( -\frac{\partial^2 l}{\partial \alpha_{2i}^2} \right) \right\}   (2.5)
To calculate the score-based statistic S, we need to derive \partial l / \partial \alpha_{1i} and \partial l / \partial \alpha_{2i}, the score functions with respect to the random effect components, as well as -\partial^2 l / \partial \alpha_{1i}^2 and -\partial^2 l / \partial \alpha_{2i}^2, the observed information functions of the random effect variables (more detailed derivations in Appendix B).
Similar to Wang (2013), we applied the first-order Laplace method to approximate the solution of (2.3) (Appendix B), and derived the necessary parts of the score-based statistic S:

\frac{\partial l}{\partial \alpha_{1i}} = \sum_{j=1}^{n_i} \left[ y_{ij} - \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right] - \frac{\alpha_{1i}\sigma_2^2 - \alpha_{2i}\sigma_{12}}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

-\frac{\partial^2 l}{\partial \alpha_{1i}^2} = \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{\left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}} \right)^2} + \frac{\sigma_2^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

\frac{\partial l}{\partial \alpha_{2i}} = \sum_{j=1}^{n_i} \left[ \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} \right] - \frac{\alpha_{2i}\sigma_1^2 - \alpha_{1i}\sigma_{12}}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

-\frac{\partial^2 l}{\partial \alpha_{2i}^2} = \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} + \frac{\sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}
To obtain the score-based statistic S under the null hypothesis, we evaluate these equations at H_0: \theta_1 = \theta_2 = 0, which is equivalent to \sigma_1^2 = \sigma_2^2 = \sigma_{12} = 0 and \alpha_{1i} = \alpha_{2i} = 0. Let S_{1i}(0) and S_{2i}(0) represent the score functions in the GLMM and Cox-frailty model evaluated under the null hypothesis H_0, and I_{1i}(0) and I_{2i}(0) represent the information functions in the GLMM and Cox-frailty model evaluated under H_0:

S_{1i}(0) = \sum_{j=1}^{n_i} \left[ y_{ij} - \frac{e^{X_{1ij}^T \beta_1}}{1 + e^{X_{1ij}^T \beta_1}} \right]

I_{1i}(0) = \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \beta_1}}{\left( 1 + e^{X_{1ij}^T \beta_1} \right)^2}

S_{2i}(0) = \sum_{j=1}^{n_i} \left[ \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2} \right]

I_{2i}(0) = \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2}
Substituting these functions into equation (2.5), and replacing \beta_1 and \beta_2 with \hat{\beta}_1 and \hat{\beta}_2, the fitted regression coefficients, the final form of the score-based statistic S is:

S = \frac{1}{2} \sum_{i=1}^{m} \left[ \left( \sum_{j=1}^{n_i} y_{ij} - \frac{e^{X_{1ij}^T \hat{\beta}_1}}{1 + e^{X_{1ij}^T \hat{\beta}_1}} \right)^2 + \left( \sum_{j=1}^{n_i} \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right)^2 - \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \hat{\beta}_1}}{\left( 1 + e^{X_{1ij}^T \hat{\beta}_1} \right)^2} - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right]   (2.6)
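As an illustration, (2.6) reduces to a short computation once the fixed-effects models have been fitted. The sketch below is a minimal Python version (the thesis implementation was written in R); `eta1` and `eta2` stand for the fitted linear predictors X_{1ij}^T \hat{\beta}_1 and X_{2ij}^T \hat{\beta}_2, and all names are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def score_statistic(eta1, eta2, y, delta, Lambda0, cluster):
    """Score-based statistic S of (2.6).

    eta1    : fitted GLMM linear predictor per observation
    eta2    : fitted Cox linear predictor per observation
    y       : binary remission responses; delta : event indicators
    Lambda0 : cumulative baseline hazard evaluated at each time t_ij
    cluster : integer cluster label for each observation
    """
    p = np.exp(eta1) / (1.0 + np.exp(eta1))    # pi_ij under H0
    s1 = y - p                                 # per-observation GLMM score
    s2 = delta - Lambda0 * np.exp(eta2)        # per-observation Cox score
    i1 = p * (1.0 - p)                         # GLMM information terms
    i2 = Lambda0 * np.exp(eta2)                # Cox information terms
    S = 0.0
    for c in np.unique(cluster):               # sum over clusters i = 1..m
        idx = cluster == c
        S += s1[idx].sum()**2 + s2[idx].sum()**2 - i1[idx].sum() - i2[idx].sum()
    return 0.5 * S
```

The cluster-wise sums mirror the inner sums over j = 1,...,n_i in (2.6); squaring the summed scores and subtracting the summed information terms gives each cluster's contribution.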
2.3 Inference using Asymptotic Method
As previously demonstrated in Liang (1987), the asymptotic variance method involves the statistic T = S / I^{1/2}. Let l_1 denote the log-likelihood function of the GLMM; the GLMM portion of the information matrix I is

I_{GLMM} = I_{\theta_1\theta_1} - \left[ I_{\theta_1\beta_1}^T, I_{\theta_1\alpha_1} \right] \begin{bmatrix} I_{\beta_1\beta_1} & I_{\alpha_1\beta_1} \\ I_{\alpha_1\beta_1}^T & I_{\alpha_1\alpha_1} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix},

in which

I_{\theta_1\theta_1} = \sum_{i=1}^{m} \left( \frac{\partial l_{1i}}{\partial \theta_1} \right)^2, \quad I_{\alpha_1\alpha_1} = \sum_{i=1}^{m} \left( \frac{\partial l_{1i}}{\partial \alpha_1} \right)^2, \quad I_{\beta_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \beta_1} \frac{\partial l_{1i}}{\partial \beta_1^T},

I_{\theta_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \theta_1} \frac{\partial l_{1i}}{\partial \beta_1}, \quad I_{\theta_1\alpha_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \theta_1} \frac{\partial l_{1i}}{\partial \alpha_1}, \quad I_{\alpha_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \alpha_1} \frac{\partial l_{1i}}{\partial \beta_1}, \quad I_{\alpha_1\beta_1}^T = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \alpha_1} \frac{\partial l_{1i}}{\partial \beta_1^T}.   (2.7)
And the Cox-frailty portion is

I_{Cox} = I_{\theta_2\theta_2} - \left[ I_{\theta_2\beta_2}^T, I_{\theta_2\alpha_2} \right] \begin{bmatrix} I_{\beta_2\beta_2} & I_{\alpha_2\beta_2} \\ I_{\alpha_2\beta_2}^T & I_{\alpha_2\alpha_2} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta_2\beta_2} \\ I_{\theta_2\alpha_2} \end{bmatrix},

with

I_{\theta_2\theta_2} = \sum_{i=1}^{m} \left( \frac{\partial l_{2i}}{\partial \theta_2} \right)^2, \quad I_{\alpha_2\alpha_2} = \sum_{i=1}^{m} \left( \frac{\partial l_{2i}}{\partial \alpha_2} \right)^2, \quad I_{\beta_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \beta_2} \frac{\partial l_{2i}}{\partial \beta_2^T},

I_{\theta_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \theta_2} \frac{\partial l_{2i}}{\partial \beta_2}, \quad I_{\theta_2\alpha_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \theta_2} \frac{\partial l_{2i}}{\partial \alpha_2}, \quad I_{\alpha_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \alpha_2} \frac{\partial l_{2i}}{\partial \beta_2}, \quad I_{\alpha_2\beta_2}^T = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \alpha_2} \frac{\partial l_{2i}}{\partial \beta_2^T}.   (2.8)
The final form of I used in inference is the sum of I_{GLMM} and I_{Cox}. Intuitively, since the score-based statistic S is the sum of its GLMM and Cox-frailty portions, its variance approximation, I, should also be the sum of the variance approximations of these two portions.

The details of the specific parts of the I matrices are listed in Appendix C. The T statistic can then be evaluated from the values of S and I. In the original literature by Liang (1987), T is shown to follow a standard normal distribution, so the p-value of the hypothesis test can be obtained by comparing T against the standard normal distribution.
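In code, this inference step amounts to standardizing S and referring T to the standard normal distribution. A minimal Python sketch (illustrative only; the thesis used R, and the function name is an assumption):

```python
import math

def asymptotic_pvalues(S, I):
    """Standardized statistic T = S / sqrt(I) and its one- and two-sided
    p-values under the standard normal reference distribution (Liang, 1987)."""
    T = S / math.sqrt(I)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    p_one = 1.0 - Phi(T)               # one-sided: large T suggests heterogeneity
    p_two = 2.0 * (1.0 - Phi(abs(T)))  # two-sided
    return T, p_one, p_two
```

The normal CDF is written via `math.erf` so the sketch needs only the standard library.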
2.4 Bootstrap Procedure
The bootstrap method is particularly effective for establishing the distribution of a complicated statistic, since no assumption about the distribution of the statistic is needed (Adèr et al, 2008). The distribution of the statistic S can be complicated, which makes it a good candidate for the bootstrap method as an alternative to the asymptotic method.

Suppose a dataset has N samples grouped into m clusters. Just as the score-based statistic S was calculated under the condition that both \alpha_{1i} and \alpha_{2i} have zero variances, the bootstrap procedure should also be conducted under this assumption. Therefore, we randomly sample all of the observed samples with replacement to obtain N bootstrap samples. After this, we rearrange all the bootstrap samples into clusters according to the arrangement in the original data, as if all the clusters were the same.
The bootstrap procedure consists of the following steps:
0. Start with the original dataset, which has a total of N samples grouped into m strata. Each observation j in cluster i has a covariate vector X_{ij} containing the fixed-effect covariate values such as gender, treatment arm, and risk profile. Remission status is represented by y_{ij}, time to event by t_{ij}, and the censoring indicator by \delta_{ij}. Each observation also corresponds to a cluster assignment variable \gamma_{ij}, where the \gamma_{ij} share the same value for the same value of i. In other words, each observation in the original dataset consists of a set (X_{ij}, y_{ij}, t_{ij}, \delta_{ij}, \gamma_{ij}), where i = 1,2,...,m and j = 1,2,...,n_i.
1. Randomly select, with replacement, N samples from {X_{ij}, y_{ij}, t_{ij}, \delta_{ij}: i = 1,2,...,m; j = 1,2,...,n_i} to produce the bootstrap sample {X_h^*, y_h^*, t_h^*, \delta_h^*: h = 1,2,...,N}, where N is the total sample size.
2. Pair each \gamma_{ij} with the corresponding bootstrap sample. The bootstrap sample becomes {X_{ij}^*, y_{ij}^*, t_{ij}^*, \delta_{ij}^*, \gamma_{ij}: i = 1,2,...,m; j = 1,2,...,n_i}.
3. Use the bootstrap sample to fit the fixed-effect GLMM and Cox regressions. This gives \hat{\beta}_1^* and \hat{\beta}_2^*, the estimates of the fixed-effect regression coefficients in the GLMM and Cox-frailty model.
4. Calculate the bootstrap estimate of the score statistic, S^*, by applying equation (2.5) to the bootstrap sample.
5. Repeat steps 1 to 4 B times to obtain B bootstrap estimates.
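The resampling loop in steps 1 to 5 can be sketched as follows. This is an illustrative Python version (the thesis used R); `score_fn` is a hypothetical callable standing in for steps 3 and 4, i.e. refitting the fixed-effect models on the bootstrap sample and evaluating the score statistic.

```python
import numpy as np

def bootstrap_scores(data, cluster, score_fn, B, rng=None):
    """Steps 1-5: resample N rows with replacement, keep the original
    cluster assignments (as if all clusters were exchangeable under H0),
    and recompute the score statistic on each bootstrap sample.

    data     : (N, k) array holding the (X, y, t, delta) columns
    cluster  : length-N cluster labels, reattached after resampling (step 2)
    score_fn : callable (resampled data, cluster) -> S*, i.e. steps 3-4
    """
    rng = np.random.default_rng(rng)
    N = data.shape[0]
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, N, size=N)         # step 1: with replacement
        stats[b] = score_fn(data[idx], cluster)  # steps 2-4
    return stats

def bootstrap_pvalue(S_obs, stats, two_sided=False):
    """Inference: proportion of bootstrap statistics as extreme as S_obs."""
    if two_sided:
        return float(np.mean(np.abs(stats) >= abs(S_obs)))
    return float(np.mean(stats >= S_obs))
```

Passing the refit-and-score step in as a function keeps the resampling logic separate from the model-fitting machinery.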
A large enough B establishes the distribution of the score statistic S, which is expected to be approximately normal. After calculating the observed score statistic for the dataset, we compare it with the statistics calculated from the B bootstrap samples. In a one-sided test, the p-value of the hypothesis test is the proportion of bootstrap statistics larger than the observed score-based statistic S; the null hypothesis is rejected if p < \alpha with \alpha = 0.025. For the two-sided test, the p-value is the proportion of bootstrap estimates whose absolute values are larger than that of the observed score statistic; the null hypothesis is rejected if p < \alpha with \alpha = 0.05.
2.5 Calculating Type I Error and Power
We use empirical type I error and power values to study the performance of both the bootstrap and asymptotic methods. We will use both one-sided and two-sided tests to obtain p-values. In the one-sided test we use \alpha = 0.025 as the criterion for rejecting the null hypothesis, and in the two-sided test we use \alpha = 0.05. We will generate a large number (K) of datasets with both \alpha_{1i} and \alpha_{2i} set to have zero variance, and apply both methods to each dataset to obtain K p-values. The percentage of p-values less than \alpha is the type I error. To obtain the power of the methods, we will generate K datasets with both \alpha_{1i} and \alpha_{2i} set to have positive variances. The bootstrap and asymptotic methods are again applied to the simulated datasets, and the percentage of p-values less than \alpha is the power of the bootstrap or asymptotic method.
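Both quantities are rejection proportions over the K simulated datasets. A minimal sketch (Python, names illustrative):

```python
def rejection_rate(p_values, alpha):
    """Proportion of the K simulated datasets whose test rejects H0.
    This equals the empirical type I error when the data were generated
    with zero-variance random effects, and the empirical power otherwise."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

The same function serves for both metrics; only the data-generating setting (zero vs. positive random effect variances) changes its interpretation.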
Chapter 3
Simulation Studies
3.1 Overview
In this chapter, we conduct a series of numerical simulations to evaluate the finite sample properties of the methods proposed in chapter 2. We use data generation methods similar to those used in the study by Wang. For each dataset, we pre-define the variance of the cluster effect to be either zero or a positive value. Using R, we generate multiple datasets, apply the asymptotic and bootstrap methods to them, calculate p-values, and obtain the type I error and statistical power.
3.2 Methods
Conditioning on the random effect \alpha_{1i}, the GLMM response variable y_{ij} is generated from a Bernoulli distribution with the probability given in (3.1). To simplify the process, the only fixed-effect covariate included in the model is treatment, denoted by Z_{ij}. The magnitudes of the fixed-effect coefficients are pre-defined parameters, \beta_1 = (\beta_{10}, \beta_{11}) for the GLMM and \beta_2 = (\beta_{21}, \beta_{22}, \beta_{23}) for the Cox-frailty model. To simplify comparisons, the same fixed-effect regression coefficient values, given in Table 3.1, are used in all simulation settings.
Table 3.1 Parameters of Simulation
𝛽10 𝛽11 𝛽21 𝛽22 𝛽23
-1 log(0.5) log(0.5) log(2) log(2)
P(Y_{ij} = 1 \mid z_{ij}, \beta_1, \alpha_{1i}) = \frac{e^{\beta_{10} + z_{ij}\beta_{11} + \alpha_{1i}}}{1 + e^{\beta_{10} + z_{ij}\beta_{11} + \alpha_{1i}}}   (3.1)
Conditioning on the random effect \alpha_{2i}, the failure time T_{ij} was assumed to follow an exponential distribution and was simulated from the Cox-frailty model (3.2), with the baseline hazard set to a constant of 0.15. The censoring time C_{ij} was generated from a uniform distribution Unif(0, 20) so that the censoring rate was around 20%. The fixed-effect covariates included were remission response (Y_{ij}) and treatment (Z_{ij}); to study the interaction between the two variables, an interaction term was also included. The random variable S(T), which represents the survival function, follows a uniform distribution Unif(0, 1) (denoted W_{ij}). Equation (3.2) can be rewritten as equation (3.3), which provides a solution for the failure time T_{ij}:
\lambda(t \mid X_{2ij}, \beta_2, \alpha_{2i}) = \lambda_0(t_{ij}) e^{\beta_{21} y_{ij} + \beta_{22} z_{ij} + \beta_{23} z_{ij} y_{ij} + \alpha_{2i}},   (3.2)

T_{ij} = \frac{-\log(W_{ij})}{0.15\, e^{\beta_{21} y_{ij} + \beta_{22} z_{ij} + \beta_{23} z_{ij} y_{ij} + \alpha_{2i}}}.   (3.3)
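Equation (3.3) is the standard inverse-transform method for drawing exponential failure times. A minimal Python sketch of this simulation step, assuming the Table 3.1 coefficient values (the thesis implementation was in R, and the function name is illustrative):

```python
import numpy as np

def simulate_failure_times(y, z, alpha2, rng=None):
    """Draw failure times T_ij from (3.3), with baseline hazard 0.15 and
    the Table 3.1 coefficients beta21 = log(0.5), beta22 = beta23 = log(2)."""
    rng = np.random.default_rng(rng)
    b21, b22, b23 = np.log(0.5), np.log(2.0), np.log(2.0)
    W = rng.uniform(size=len(y))                       # W_ij ~ Unif(0, 1)
    rate = 0.15 * np.exp(b21 * y + b22 * z + b23 * z * y + alpha2)
    return -np.log(W) / rate                           # T_ij of (3.3)
```

Because S(T) is Unif(0, 1), inverting the exponential survival function in this way yields failure times with exactly the hazard in (3.2).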
In each simulated dataset, center-specific random effects are generated from a bivariate normal distribution N_2(0, \Sigma), where \Sigma is the 2×2 variance-covariance matrix of the random effect variables \alpha_{1i} and \alpha_{2i}. The variances of \alpha_{1i} and \alpha_{2i} are denoted \sigma_1^2 and \sigma_2^2 respectively, and the covariance is denoted \sigma_{12}. To simplify the problem further, we distribute the observations equally into clusters in our simulated datasets.
After each simulated dataset is generated, we apply the asymptotic method of section 2.3 and the bootstrap procedure of section 2.4 to the dataset and calculate one-sided and two-sided p-values. To calculate the type I error of the tests, a large number (K) of datasets with zero variance and covariance in the random effect variables were generated by repeating the data generation procedure in R (R Core Team, 2014). From these datasets, K p-values were calculated, and the proportion of p-values smaller than \alpha is the type I error of the test (the chance of rejecting the null hypothesis when it is correct). To calculate the power of the tests (the chance of rejecting the null hypothesis when it is false), the same method was used except that at least one of the parameters \sigma_1^2, \sigma_2^2 and \sigma_{12} was made larger than 0.
3.3 Simulation Results
3.3.1 Results of Asymptotic Method
The asymptotic method outlined in Chapter 2 was applied to K = 200 datasets in each setting to calculate the empirical power; when testing datasets generated under the null hypothesis with zero-variance random effects, K = 500 replications were conducted to obtain a more stable estimate of the empirical type I error rate. Tables 3.2 and 3.3 show the type I error and power of the asymptotic method in each setting. In all simulation settings, the two-sided asymptotic test failed to achieve type I error rates lower than the test size (0.025 for the one-sided test, 0.05 for the two-sided test). The power of the method ranged from 36% to 100%, and increases with the variance of the random effects.
3.3.2 Bootstrap Estimate Distribution
To gain more understanding of the distribution of the bootstrap estimates, we simulated 4 different datasets, with (\sigma_1^2, \sigma_2^2) values of (0.0, 0.0), (0.1, 0.1), (0.2, 0.2) and (0.2, 0.1). The covariance values were set so that the correlation between \alpha_{1i} and \alpha_{2i} was 0.5. For each simulated dataset, we applied the bootstrap procedure outlined in section 2.4 with B = 499 bootstrap replications and plotted the frequency of the bootstrap estimates for each dataset in Figure 2.
Figure 2 Bootstrap Estimate Distribution of 4 Datasets
Figure 2 shows that in each trial the distribution of the bootstrap estimates is approximately centered near 0, with a roughly symmetrical shape that resembles a normal distribution. This pattern suggests that the percentile bootstrap is appropriate for the score-based statistic S.
3.3.3 Type I Error and Power Comparison
In Table 3.2, we show a group of simulations conducted using the predefined regression coefficients of Table 3.1. In each setting, K = 200 datasets were simulated, except under the null hypothesis settings, in which the variances and covariance are set to zero, where K = 500 replications were used. In each simulated dataset, n = 600 observations were generated and grouped into m = 30 clusters. For each set of pre-defined variance values, a pair of simulations was conducted so that the impact of both positive and negative correlations could be investigated. It should be noted that the K = 200 datasets used in each trial were different, as the survival times, binary marker responses and key covariates are generated at random.

In seven simulation settings, the variances of the random effect variables were set to zero. The \alpha for the one-sided test is 0.025, and 0.05 for the two-sided test. In these settings, the bootstrap method showed type I error values ranging from 0.040 to 0.064 in two-sided tests (average 0.053), and 0.015 to 0.040 in one-sided tests. In contrast, the two-sided type I error rate of the asymptotic method is consistently higher, ranging from 0.114 to 0.148, while its one-sided test ranges from 0 to 0.02. These simulations show that the bootstrap method is more accurate at recognizing data without a clustering effect.
Table 3.2 Type I error of the Asymptotic Approach and Bootstrap with K = 500 replications

σ1^2  σ2^2  Correlation   Type I error (Asymptotic)    Type I error (Bootstrap)
            coefficient   One-sided     Two-sided      One-sided     Two-sided
                          (α = 0.025)   (α = 0.05)     (α = 0.025)   (α = 0.05)
0     0     0             0.020         0.126          0.035         0.062
0     0     0             0.005         0.130          0.020         0.056
0     0     0             0.000         0.148          0.035         0.052
0     0     0             0.005         0.120          0.020         0.064
0     0     0             0.010         0.114          0.015         0.048
0     0     0             0.000         0.126          0.040         0.048
0     0     0             0.005         0.130          0.030         0.040
More simulations were done using datasets with chosen positive values of \sigma_1^2 and \sigma_2^2, with \sigma_{12} chosen so that the correlation between the two random effect variables is -0.5, 0.0 or 0.5, in order to study the effect of correlation on the type I error and power of both testing methods. 200 datasets were simulated in each trial.
Table 3.3 Power of Asymptotic and Bootstrap Method
In all settings, the bootstrap method managed to achieve higher power (lower type II error) than the asymptotic method. These simulations also show that the correlation between the two random effect variables in the joint model has only a small effect on the power of either approach. When the variance values are large enough, both the bootstrap and asymptotic methods can detect the random effects with almost 100% power.
σ1^2  σ2^2  Correlation   Power (Asymptotic)           Power (Bootstrap)
            coefficient   One-sided     Two-sided      One-sided     Two-sided
                          (α = 0.025)   (α = 0.05)     (α = 0.025)   (α = 0.05)
0.05  0.05   0.5          0.380         0.380          0.580         0.520
0.05  0.05   0            0.365         0.365          0.700         0.510
0.05  0.05  -0.5          0.365         0.365          0.655         0.592
0.05  0.06   0.5          0.480         0.485          0.725         0.680
0.05  0.06   0            0.465         0.465          0.655         0.715
0.05  0.06  -0.5          0.490         0.495          0.775         0.725
0.05  0.1    0.5          0.740         0.740          0.900         0.905
0.05  0.1    0            0.735         0.735          0.975         0.940
0.05  0.1   -0.5          0.735         0.735          0.940         0.930
0.1   0.05   0.5          0.435         0.440          0.675         0.645
0.1   0.05   0            0.490         0.680          0.680         0.690
0.1   0.05  -0.5          0.430         0.630          0.630         0.655
0.1   0.1    0.5          0.790         0.790          0.915         0.940
0.1   0.1    0            0.835         0.835          0.975         0.945
0.1   0.1   -0.5          0.825         0.825          0.950         0.945
0.1   0.25   0.5          0.990         0.990          1.000         1.000
0.1   0.25  -0.5          1.000         1.000          0.995         0.996
0.1   0.5    0.5          1.000         1.000          1.000         1.000
0.1   0.5   -0.5          1.000         1.000          1.000         1.000
0.25  0.1    0.5          0.945         0.972          0.972         0.972
0.25  0.1   -0.5          0.910         0.968          0.968         0.968
0.5   0.1    0.5          0.980         1.000          1.000         1.000
0.5   0.1   -0.5          0.980         0.974          0.974         0.974
For epidemiologists and clinical trial professionals, power levels around 80% are of most interest. We fixed the variances and covariance of the random effects and chose a range of sample sizes to demonstrate that it is possible to achieve a certain level of statistical power by changing the sample size. Table 3.4 shows several such simulation results, with \sigma_1^2, \sigma_2^2 and \sigma_{12} held at 0.05, 0.05, and 0.025 respectively in all settings. It was observed that a change in sample size and/or cluster number can change the power of both the bootstrap and asymptotic methods, and both methods achieved a statistical power of 80% at certain combinations of sample size and cluster number. The general trend observed is that the power of both methods increases as the sample size increases.
Table 3.4 Relationship between Sample Size and Statistical Power
Total    Number of    Power (Asymptotic)        Power (Bootstrap)
sample   clusters     One-sided    Two-sided    One-sided    Two-sided
1200 40 0.820 0.795 0.905 0.915
1000 50 0.670 0.665 0.825 0.825
1000 40 0.715 0.705 0.855 0.885
800 40 0.525 0.520 0.700 0.745
600 30 0.370 0.400 0.580 0.520
500 25 0.290 0.300 0.575 0.610
500 20 0.330 0.355 0.690 0.565
400 20 0.170 0.280 0.415 0.495
350 14 0.210 0.245 0.425 0.475
Chapter 4 Application and Discussion
4.1 Application to HD.6 Clinical Trial
We applied both the bootstrap (B = 899) and the asymptotic method to the HD.6 clinical trial data. In the Cox-frailty model, we excluded the interaction term between remission response and treatment arm, and included risk profile as a fixed-effect covariate. The reason is that risk profile is a statistically significant predictor of the survival hazard, while the interaction between response and treatment arm is not statistically significant (Wang, 2013). The observed value of the score-based statistic S is -7.1 and the corresponding p-value is 0.485. The distribution of the bootstrap estimates has a mean of -3.6. Using the bootstrap method, there was no evidence that the cluster-level random effect has nonzero variance, with a p-value of 0.51 in the one-sided test and 0.61 in the two-sided test. The asymptotic method reached the same conclusion, with a p-value of 0.88 for the one-sided test and 0.24 for the two-sided test. Both methods showed that the HD.6 clinical trial is not affected by a cluster effect at either the 0.025 or 0.05 level of significance.
The distribution of the bootstrap estimates for the HD.6 data shows a right skew, which is not seen in the distributions of the simulated datasets in Figure 2. Figure 3 plots the distribution of the bootstrap estimates.
Figure 3 Distribution of bootstrap estimates obtained from HD.6 clinical trial data
This skewness could be caused by the uneven distribution of participants in the trial: in the resampling process, participants from a particular institution are more likely to be sampled if that institution has more participants. The resulting distribution differs from that of the bootstrap estimates generated from the evenly distributed data shown in Figure 2. However, since the one-sided asymptotic test, which has a very low type I error, confirmed that there is no statistically significant cluster-level effect, we accept the conclusion reached by the bootstrap method.
In the original study by Meyer et al (2012), Kaplan-Meier estimates were used to analyze the survival and remission status of the patients as a whole and within groups of different risk profiles. In all scenarios of that analysis, the cluster effect was not accounted for. Our analysis shows that it is not necessary to account for random differences between the medical centers from which the patients were recruited, since such differences are not statistically significant. The study by Meyer et al eventually concluded that treatment with ABVD therapy alone was associated with a higher rate of overall survival, and this conclusion does not need to be altered to account for a cluster effect.
This indicates that future analyses of this dataset do not have to account for this type of random effect, which would drastically decrease the complexity of the model and of the inference process.
Previously, Wang (2013) used both the multivariate penalized likelihood (MPL) and Jackknife (JK) resampling methods to study the joint modeling of treatment response and survival time. In that model, a cluster-level joint random effect between remission response and survival was taken into account, and it was concluded that "the effects of variance components are not statistically significant and therefore the association between two endpoints through joint random effects is negligible". In fact, the reason the MPL algorithm was used in the first place was the potential cluster-level random effect. The alternatives to the MPL algorithm are the more widely used maximum likelihood methods such as the Expectation-Maximization (EM) algorithm. These methods can reduce bias and make inference efficient, but can involve an "intractable high-dimensional integral" due to potential but unobservable random effects, making computation much more time-consuming (Wulfsohn et al. 1997; Wang 2013). Since we found no evidence of cluster-level random effects in this study, statistical inference can be greatly simplified.
4.2 Comparison of bootstrap method with the asymptotic method
As demonstrated in Section 3.3, the bootstrap shows improved power at only B = 299 iterations compared to the asymptotic method in both one-sided and two-sided tests. For the asymptotic method, the type I error is close to zero when a one-sided test is used, and well above 0.05 when a two-sided test is used. These results show that the asymptotic tests are not accurate in achieving the nominal test level \alpha. The fact that the two-sided tests achieved type I errors much larger than those of the one-sided tests suggests that the assumption about the distribution of the standardized statistic T may not be appropriate in the joint model scenario. In contrast, the bootstrap method achieved acceptable levels of type I error using both one-sided and two-sided tests, and since its type I errors are not close to zero, the bootstrap method is more appropriate for application in epidemiological studies.
The bootstrap method can also be applied to types of data other than clinical trial data. The asymptotic method requires complicated calculations involving derivatives of the log-likelihood function and matrix operations, and it assumes that the T statistic follows a standard normal distribution, which may not hold. The bootstrap method presents a non-parametric and relatively easier way to detect whether a cluster-level random effect is present in the data. This allows practitioners to quickly check the data at hand for cluster-level random effects, and to decide whether it is appropriate to include an extra variable in the model specifically for this effect.
4.3 Sample Size Selection
In practice, when the cluster-level random effect is the variable under investigation and is thus included in the model, researchers want to choose a sample size so that the modeling method has a certain type I error and power, usually 0.05 and 0.80 by convention (Kelsey et al, 1996). For example, when designing case-control studies, given theoretical values of the odds ratios, an appropriate sample size can be chosen from a simple formula so that the analytical method has a type I error of 0.05 and power of 0.80 (Kelsey et al, 1996).

In this study, we attempted to show that the asymptotic and bootstrap methods can achieve certain levels of statistical power at certain sample sizes. Section 3.3 showed that the power of the bootstrap test varies with sample size and cluster number, and that the type I error of the method approximates \alpha in multiple simulated scenarios. Therefore, in studies where the bootstrap method is applied, an optimal sample size can be chosen so that the type I error and power are \alpha and 0.80, given the possible range of the variance of the cluster-level random effect. Since the main objective of this study is to evaluate the finite sample properties of both the bootstrap and asymptotic methods, we did not investigate this sample size determination problem in detail; it would be worthwhile to investigate in the future.
When the asymptotic method is used, the type I error of the one-sided test is close to zero, with the exception of one setting in our simulation study, and the type I error of the two-sided test is well above 0.05. Therefore, if the asymptotic method is used in an epidemiological study, it is difficult to select a sample size that achieves a type I error close to \alpha. Furthermore, the power of the asymptotic test is lower than that of the bootstrap method at every variance level, for both one-sided and two-sided tests. This means that more samples are needed to achieve 80% statistical power using the asymptotic method. Thus, the bootstrap is the preferable method for epidemiological studies.
4.4 Future Directions
The bootstrap method demonstrated in this study can be used in multiple scenarios, without the need for specialized algorithms, to directly test for cluster-level random effects in a non-parametric manner. For example, a current hot topic in clinical trial research is the joint modeling of longitudinal measurements and survival data. As opposed to the binary response marker (remission status) that is measured only once in the HD.6 clinical trial, researchers are very interested in biological markers that are measured throughout the course of a trial, and in how they can predict survival. The bootstrap method demonstrated in this study can be applied to this type of joint model, because the score-based statistic S we used was originally developed for the GLMM in the study by Liang (1987). Longitudinal biomarker measurements in clinical trials can be modeled using a GLMM, for which our bootstrap method can be used for random-effect testing. Li and Wang (2008) also demonstrated the use of a smooth bootstrap method for the analysis of longitudinal data.
In future simulation studies, the number of observations could be allowed to differ
across clusters, and the effect of this setting on the power and type I error of the
bootstrap method could be investigated. This is worthwhile to study because clusters
with unequal numbers of observations are a more realistic scenario. The performance of
both the asymptotic and bootstrap methods should also be studied on data in which only
one of the cluster-level random effects in the joint model has zero variance.
4.5 Computational Expense
When both methods are implemented in R, the bootstrap is substantially more
computationally expensive than the asymptotic method. This is because the bootstrap
repeats the calculation of the S statistic B times, which we implement with a loop-based
mechanism. More specifically, the program must first loop through each cluster to
calculate the quantities needed for subsequent steps (S1i(0) for the GLMM, I1i(0) for the
Cox-frailty model), sum these quantities across all centres in a dataset, and then repeat
this for each of the B resampled datasets. As a consequence, the computation time
increases with both the number of bootstrap replicates B and the sample size n, roughly
in proportion to their product. The bootstrap method becomes particularly time-
consuming when we determine its type I error or power for a particular level of random
effect, since the entire bootstrap procedure is then repeated 200 times, once per simulated
dataset. In contrast, the asymptotic method is implemented in a vectorized manner. In the
future, the bootstrap could be implemented in other software, or better algorithms could
be written so that a vectorized implementation can be applied to the process.
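One possible step toward vectorization is to replace the explicit loop over clusters with base R's rowsum(), which computes all per-cluster sums in a single pass. The sketch below uses an arbitrary stand-in vector of per-subject score contributions rather than the actual S components; the names are illustrative, not the thesis code.

```r
## Per-cluster sums: loop version vs. a single rowsum() call.
set.seed(2)
n <- 1000
cluster <- rep(1:20, each = 50)
score_contrib <- rnorm(n)              # stand-in for per-subject score terms

## loop-based version (as in the appendix code)
loop_sums <- numeric(20)
for (i in 1:20) loop_sums[i] <- sum(score_contrib[cluster == i])

## vectorized version: one pass over the data, no explicit loop
vec_sums <- rowsum(score_contrib, cluster)[, 1]

all.equal(unname(vec_sums), loop_sums) # TRUE
```

Because rowsum() groups and sums in compiled code, it avoids the repeated subsetting cost of the loop, which grows with the number of clusters.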
Chapter 5
Summary and Future Directions
The asymptotic method has been used to test for the presence of a cluster-level random
effect in a number of previous studies (Liang, 1987; Lin, 1997). The method involves
deriving a score-based test statistic S and, subsequently, a standardized version of the
score test statistic, T, which is shown to have a standard normal distribution. The p-value
of the T statistic can be interpreted as the probability of observing a statistic at least as
extreme as T by random chance alone if no cluster-level random effect were present.
In this study, we formulated a new bootstrap procedure to achieve the same
purpose. Simulations showed that the bootstrap method has a well-controlled type I
error rate and is more powerful at detecting small values of the cluster-level random
effect. We applied both the bootstrap method and the asymptotic method to the NCIC
Clinical Trials Group HD.6 clinical trial. Both methods showed that the data of the
clinical trial are not affected by a cluster-level random effect. This conclusion has a
significant impact on decisions about how the trial's data are analyzed, and it means that
the previous work by Wang (2013) can be simplified.
When it comes to study design, practitioners are often interested in choosing a
sample size such that the chosen method achieves a specified type I error and power. We
have shown that the statistical power of both the asymptotic and bootstrap methods
changes with sample size, as well as with the variance and covariance of the random
effects themselves. The theoretical basis of this sample-size selection problem is beyond
the scope of this study. In future studies, more effort could be devoted to the theoretical
connection between sample size and the type I error and power, so that a more specific
procedure can be established for choosing appropriate sample sizes given theoretical
values of the random-effect variances.
Another future research direction is to investigate how the bootstrap method
performs on datasets with unequal cluster sizes. We kept the number of observations in
each cluster equal in our simulations to simplify the problem, but a more realistic setting
is one in which the number of observations varies across clusters. The HD.6 clinical trial
data have this feature, which could be the reason the distribution of the bootstrap
estimates shows a right skew. The conclusions of the bootstrap and asymptotic methods
regarding the cluster-level random effect agree with each other, but the effect of unequal
cluster sizes on the statistical power of the bootstrap method needs to be determined so
that the method can be applied more appropriately in practical scenarios.
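Such a simulation could be set up by drawing the cluster sizes at random before assigning centre IDs, instead of the equal n/ncentre split used in simu.joint. A minimal base-R sketch follows; the names and the size range 5–40 are illustrative assumptions, not the thesis code.

```r
## Generating unequal cluster sizes for a future simulation study.
set.seed(3)
ncentre <- 15
sizes <- sample(5:40, ncentre, replace = TRUE)   # unequal cluster sizes
centre <- factor(rep(1:ncentre, times = sizes))  # one centre ID per subject
n <- length(centre)                              # total sample size now varies
counts <- table(centre)                          # per-cluster counts equal sizes
```

The rest of the data-generating code (random effects, covariates, survival times) could then index subjects by this centre factor exactly as in the equal-size case.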
Bibliography
Adèr, H.J., Mellenbergh, G.J., & Hand, D.J. (2008). Advising on research methods: A
consultant's companion. Huizen, The Netherlands: Johannes van Kessel Publishing.
Bassuk, S.S., Glass, T.A., Berkman, L.F. (1999). Social disengagement and incident
cognitive decline in community-dwelling elderly persons. Ann Intern Med, 131:165-73.
Bickel, P.J., Freedman, D.A. (1981). Some asymptotic theory for the bootstrap. Ann
Statist, 9:1196-1217.
Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application.
Cambridge University Press.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann Statist, 7:1-26.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American
Statistical Association, 82(397):171-185.
Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. Boca Raton, FL:
Chapman & Hall/CRC.
Glidden, D.V., Vittinghoff, E. (2004). Modeling clustered survival data from multicenter
clinical trials. Stat Med, 23(3):369-88.
Gray, R.J. (1995). Testing for variation over groups in survival data. Journal of the
American Statistical Association, 90(429).
Kelsey, J.L., Whittemore, A.S., Evans, A.S. (1996). Methods in Observational
Epidemiology. Oxford University Press.
Laine, C., Venditti, L., Localio, R., Wickenheiser, L., Morris, D.L. (1998). Combined
cardiac catheterization for uncomplicated ischemic heart disease in a Medicare
population. Am J Med, 105:373-9.
Li, Y., Wang, Y.G. (2008). Smooth bootstrap methods for analysis of longitudinal data.
Stat Med, 27(7):937-53.
Liang, K.Y. (1987). A locally most powerful test for homogeneity with many strata.
Biometrika, 74(2):259-64.
Liang, K.Y., Zeger, S. (1986). Longitudinal data analysis using generalized linear
models. Biometrika, 73(1):13-22.
Lin, D.Y., Wei, L.J. (1991). Goodness-of-fit tests for the general Cox regression model.
Statistica Sinica, 1:1-17.
Localio, A.R., Berlin, J.A., Ten Have, T.R., Kimmel, S.E. (2001). Adjustments for center
in multicenter studies: An overview. Ann Intern Med, 135:112-123.
Mantel, N., Haenszel, W. (1959). Statistical aspects of the analysis of data from
retrospective studies of disease. J Natl Cancer Inst, 22:719-748.
Matsuyama, Y., Sakamoto, J., Ohashi, Y. (1998). A Bayesian hierarchical survival model
for the institutional effects in a multi-centre cancer clinical trial. Stat Med,
17(17):1893-908.
Meyer, R.M., Gospodarowicz, M.K., Connors, J.M., Pearcey, R.G., Bezjak, A., Wells,
W.A., Burns, B.F., Winter, J.N., Horning, S.J., Dar, A.R., Djurfeldt, M.S., Ding, K.,
Shepherd, L.E. (2005). Randomized comparison of ABVD chemotherapy with a strategy
that includes radiation therapy in patients with limited-stage Hodgkin's lymphoma:
National Cancer Institute of Canada Clinical Trials Group and the Eastern Cooperative
Oncology Group. J Clin Oncol, 23(21):4634-42.
R Core Team (2014). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Ripatti, S., Palmgren, J. (2000). Estimation of multivariate frailty models using penalized
partial likelihood. Biometrics, 56(4):1016-22.
Rubin, D. (1981). The Bayesian bootstrap. Ann Statist, 9:130-134.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann Statist,
9:1187-1195.
Smith, P.J., Heitjan, D.F. (1993). Testing and adjusting for departures from nominal
dispersion in generalized linear models. Appl Statist, 42(1):31-41.
Steck, H., Jaakkola, T.S. (2003). Bias-corrected bootstrap and model uncertainty.
Advances in Neural Information Processing Systems 16.
Wang, J. (2013). Joint Modeling of Binary Response and Survival Data in Clinical Trials.
Department of Public Health Sciences, Queen's University, Kingston, Ontario.
Wulfsohn, M.S., Tsiatis, A.A. (1997). A joint model for survival and longitudinal data
measured with error. Biometrics, 53(1):330-9.
Ye, W., Lin, X.H., Taylor, J. (2008). A penalized likelihood approach to joint modeling of
longitudinal measurements and time-to-event data. Statistics and Its Interface, 1:33-45.
Appendix A
Derivation of Score-based Statistics
Let $\beta$ represent the fixed-effect regression parameters and $\alpha_i$ represent the random
effect specific to cluster $i$, where $\alpha_i = \alpha + \theta^{1/2} v_i$, with each component as defined in
Section 1.1.2. We start from the density function of cluster $i$, denoted $f(\beta, \alpha_i)$. The
log-likelihood function of cluster $i$ is denoted $l_i$ and has the form

$$l_i = \log \int f(\beta, \alpha_i)\, dF(v_i) = \log \int f(\beta, \alpha + \theta^{1/2} v_i)\, dF(v_i).$$

The ultimate goal is to derive the derivative of the log-likelihood function with respect to
$\theta$. Applying the chain rule,

$$\frac{\partial l_i}{\partial \theta} = \frac{\partial l_i}{\partial \alpha_i} \frac{\partial \alpha_i}{\partial \theta}
= \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v_i)}{\partial \alpha_i}\, v_i\, dF(v_i)}{f(\beta, \alpha + \theta^{1/2} v_i)} \times \frac{1}{2}\theta^{-1/2}
= \frac{1}{2} \cdot \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, \theta^{-1/2} v\, dF(v)}{f(\beta, \alpha + \theta^{1/2} v)}.$$

Since $\theta^{-1/2}$ diverges as $\theta \to 0$, this expression is indeterminate at $\theta = 0$, so we apply
L'Hôpital's rule to the numerator:

$$\lim_{\theta \to 0} \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, v\, dF(v)}{\theta^{1/2}}
= \lim_{\theta \to 0} \frac{\frac{d}{d\theta} \int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, v\, dF(v)}{\frac{d}{d\theta}\, \theta^{1/2}}$$

and, applying the same chain rule as above,

$$= \lim_{\theta \to 0} \frac{\frac{1}{2}\theta^{-1/2} \int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v)}{\frac{1}{2}\theta^{-1/2}}
= \lim_{\theta \to 0} \int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v).$$

From here we omit the limit sign and evaluate the final expression at $\theta = 0$.
Since $v$ is a random variable with mean 0 and unit variance, $\int v^2\, dF(v) = E(v^2)
= [E(v)]^2 + \mathrm{var}(v) = 1$. Therefore

$$\int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v) = \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}$$

and

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \cdot \frac{\frac{\partial^2}{\partial \alpha_i^2} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)}.$$

Adding and then subtracting $\left[\frac{\frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)}\right]^2$ from this expression, we have

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \left\{ \left[ \frac{\frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)} \right]^2
+ \frac{\frac{\partial^2}{\partial \alpha_i^2} f(\beta, \alpha + \theta^{1/2} v) \cdot f(\beta, \alpha + \theta^{1/2} v) - \left[ \frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v) \right]^2}{\left[ f(\beta, \alpha + \theta^{1/2} v) \right]^2} \right\}.$$

The part after the addition sign is the result of applying the chain rule and then the
quotient rule to $\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v)$, and the part before the addition sign is the
result of applying the chain rule to $\frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v)$. Hence

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \left[ \left\{ \frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v) \right\}^2 - \left\{ -\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v) \right\} \right],$$

evaluated at $\theta = 0$, which is the same as evaluating at $\alpha_i = 0$. Here
$\frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v)$ is the score function, denoted $S(\alpha_i)$, and
$-\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v)$ is the observed information function, denoted $I(\alpha_i)$.
Summing over the $n$ clusters gives the score-based statistic

$$S = \sum_{i=1}^n \frac{\partial l_i}{\partial \theta} = \frac{1}{2} \sum_{i=1}^n \left[ \{S_i(0)\}^2 - I_i(0) \right].$$
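As a numerical sanity check of this final expression (a sketch on simulated data under the null, not part of the thesis derivation), consider an intercept-only logistic model, where each cluster's score $S_i(0)$ reduces to $\sum_j (y_{ij} - \hat{p})$ and the observed information $I_i(0)$ to $\sum_j \hat{p}(1-\hat{p})$. All names below are illustrative.

```r
## Evaluating S = 0.5 * sum_i [ S_i(0)^2 - I_i(0) ] for an
## intercept-only logistic model on simulated null data.
set.seed(4)
m <- 20; nper <- 30
cluster <- rep(1:m, each = nper)
y <- rbinom(m * nper, 1, 0.3)
phat <- mean(y)                           # MLE of p under the null

score_i <- rowsum(y - phat, cluster)[, 1] # per-cluster S_i(0)
info_i  <- nper * phat * (1 - phat)       # per-cluster I_i(0)
S <- 0.5 * sum(score_i^2 - info_i)        # score-based statistic
```

Under the null the positive and negative per-cluster contributions roughly cancel, so S should fluctuate around zero; a large positive S signals between-cluster heterogeneity.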
Appendix B
Deriving Score-Based Statistics in the Joint Model
In the study by Wang (2013), parameter estimation was done using a multivariate
penalized likelihood method. The approximate marginal log-likelihood function $l$ for
the joint model is

$$l = l_1 + l_2 + l_3 + l_4,$$

where

$$l_1 = -\frac{m}{2} \log |\Sigma|,$$

$$l_2 = -\frac{1}{2} \sum_{i=1}^m \left\{ \log \left[ \sum_{j=1}^{n_i} -\frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{\left(1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}\right)^2} + \frac{\sigma_2^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \right]
+ \log \left[ \sum_{j=1}^{n_i} -\Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} + \frac{\sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \right] \right\},$$

$$l_3 = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right)
+ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\},$$

$$l_4 = -\frac{1}{2} \sum_{i=1}^m \frac{\alpha_{1i}^2 \sigma_2^2 - 2\alpha_{1i}\alpha_{2i}\sigma_{12} + \alpha_{2i}^2 \sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}.$$

When estimating random effects, $l_2$ was excluded from the log-likelihood function, based
on the claims by Ripatti and Palmgren (2000) and Ye et al. (2008) that $l_2$ has a negligible
effect on the estimation of the random effects. Since $l_4$ becomes 0 when the expression is
evaluated under the null hypothesis, and $l_1$ is a constant, we use only $l_3$ as the
log-likelihood function $l$ when deriving the parts needed for the asymptotic variance
statistic $I$:

$$l = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right)
+ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

The function above can be seen as the sum of two parts: $l_G$, which contains only the
random effect in the GLMM, and $l_C$, which contains only the random effect in the
Cox-frailty model:

$$l_G = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_C = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

Let $l_{Gi}$ and $l_{Ci}$ represent the GLMM portion and the Cox-frailty portion for the
individuals in cluster $i$:

$$l_{Gi} = \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_{Ci} = \sum_{j=1}^{n_i} \left\{ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

The score-based test statistic $S$ is given by

$$S = \frac{1}{2} \sum_{i=1}^m \left( \frac{\partial l_{Gi}}{\partial \theta_1} + \frac{\partial l_{Ci}}{\partial \theta_2} \right).$$

From the conclusion in Appendix A, we have

$$\frac{\partial l_{Gi}}{\partial \theta_1} = \left\{ \frac{\partial l_{Gi}}{\partial \alpha_{1i}} \right\}^2 - \left\{ -\frac{\partial^2 l_{Gi}}{\partial \alpha_{1i}^2} \right\}, \qquad
\frac{\partial l_{Ci}}{\partial \theta_2} = \left\{ \frac{\partial l_{Ci}}{\partial \alpha_{2i}} \right\}^2 - \left\{ -\frac{\partial^2 l_{Ci}}{\partial \alpha_{2i}^2} \right\},$$

where both expressions are evaluated at $\alpha_{1i} = \alpha_{2i} = 0$. Since the log-likelihood function
$l_i$ is split into parts that each contain only one type of random effect,
$\frac{\partial l_{Gi}}{\partial \theta_1} = \frac{\partial l_i}{\partial \theta_1}$ and $\frac{\partial l_{Ci}}{\partial \theta_2} = \frac{\partial l_i}{\partial \theta_2}$, and the score-based statistic has the form shown in
equation (2.5):

$$S = \frac{1}{2} \sum_{i=1}^n \left\{ \left( \frac{\partial l}{\partial \alpha_{1i}} \right)^2 + \left( \frac{\partial l}{\partial \alpha_{2i}} \right)^2
- \left( -\frac{\partial^2 l}{\partial \alpha_{1i}^2} \right) - \left( -\frac{\partial^2 l}{\partial \alpha_{2i}^2} \right) \right\}.$$
Appendix C
Application of Asymptotic Variance in GLMM and Cox-frailty Joint
Model
The objective is to derive the asymptotic statistic $T$ of Section 1.1.2, where $T = S / I^{1/2}$.
Since $I$ approximates the variance of the score-based statistic $S$, it can be calculated as
the sum of the asymptotic variance approximation of the GLMM ($I_{GLMM}$) and that of
the Cox-frailty model ($I_{Cox}$). Just as the log-likelihood function was split in Appendix B
to calculate the score-based statistic $S$, we do the same for the asymptotic variance
statistic $I$:

$$I_{GLMM} = I_{\theta_1\theta_1} - \left[ I_{\theta_1\beta_1}^T,\ I_{\theta_1\alpha_1} \right]
\begin{bmatrix} I_{\beta_1\beta_1} & I_{\alpha_1\beta_1} \\ I_{\alpha_1\beta_1}^T & I_{\alpha_1\alpha_1} \end{bmatrix}^{-1}
\begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix},$$

in which $I_{\beta_1\beta_1}$ is a $p \times p$ matrix, $I_{\alpha_1\beta_1}$ is a $p \times 1$ matrix and $I_{\alpha_1\beta_1}^T$ is its transpose, and
$I_{\theta_1\theta_1}$ and $I_{\alpha_1\alpha_1}$ are both scalar quantities. Furthermore, $[I_{\theta_1\beta_1}^T,\ I_{\theta_1\alpha_1}]$ is a
$1 \times (p+1)$ row vector and $\begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix}$ is its transpose. The components are

$$I_{\theta_1\theta_1} = \sum_{i=1}^m \left( \frac{\partial l_i}{\partial \theta_1} \right)^2, \quad
I_{\alpha_1\alpha_1} = \sum_{i=1}^m \left( \frac{\partial l_i}{\partial \alpha_1} \right)^2, \quad
I_{\beta_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \beta_1} \frac{\partial l_i}{\partial \beta_1^T},$$

$$I_{\theta_1\alpha_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \theta_1} \frac{\partial l_i}{\partial \alpha_1}, \quad
I_{\theta_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \theta_1} \frac{\partial l_i}{\partial \beta_1}, \quad
I_{\alpha_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \alpha_1} \frac{\partial l_i}{\partial \beta_1}.$$

These parts are all evaluated at $\theta_1 = 0$ and then substituted into $I_{GLMM}$. Analogously,

$$I_{Cox} = I_{\theta_2\theta_2} - \left[ I_{\theta_2\beta_2}^T,\ I_{\theta_2\alpha_2} \right]
\begin{bmatrix} I_{\beta_2\beta_2} & I_{\alpha_2\beta_2} \\ I_{\alpha_2\beta_2}^T & I_{\alpha_2\alpha_2} \end{bmatrix}^{-1}
\begin{bmatrix} I_{\theta_2\beta_2} \\ I_{\theta_2\alpha_2} \end{bmatrix},$$

with components defined in the same way but with the subscript 1 replaced by 2, and all
parts evaluated at $\theta_2 = 0$ before being substituted into $I_{Cox}$.

As established in Appendix B, the log-likelihood contributions of cluster $i$ are

$$l_{Gi} = \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_{Ci} = \sum_{j=1}^{n_i} \left\{ \delta_{ij} \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\},$$

where $y_{ij}$ is the remission response and $\delta_{ij}$ the event indicator of subject $j$ in cluster $i$.
Now we take a series of partial derivatives to obtain the asymptotic variance of the
proposed score test statistic for the joint model. To shorten the notation, write
$\eta_{1ij} = X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}$ and $\eta_{2ij} = X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}$. For the GLMM portion,

$$\frac{\partial l_{Gi}}{\partial \theta_1} = \left( \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right) \right)^2 - \sum_{j=1}^{n_i} \frac{e^{\eta_{1ij}}}{\left( 1 + e^{\eta_{1ij}} \right)^2},$$

$$\frac{\partial l_{Gi}}{\partial \alpha_1} = \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right),$$

$$\frac{\partial l_{Gi}}{\partial \beta_1} = \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right) X_{1ij}^T, \qquad
\left( \frac{\partial l_{Gi}}{\partial \beta_1} \right)^2 = \frac{\partial l_{Gi}}{\partial \beta_1^T} \frac{\partial l_{Gi}}{\partial \beta_1}.$$

Summing across all clusters, evaluating under the null hypothesis, and replacing the
unknown parameter $\beta_1$ by its maximum likelihood estimate (MLE) $\hat{\beta}_1$, with the fitted
probability written as $\hat{p}_{1ij} = e^{X_{1ij}^T \hat{\beta}_1} / (1 + e^{X_{1ij}^T \hat{\beta}_1})$ so that
$e^{X_{1ij}^T \hat{\beta}_1} / (1 + e^{X_{1ij}^T \hat{\beta}_1})^2 = \hat{p}_{1ij}(1 - \hat{p}_{1ij})$, gives

$$I_{\theta_1\theta_1} = \sum_{i=1}^m \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right]^2,$$

$$I_{\alpha_1\alpha_1} = \sum_{i=1}^m \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2,$$

$$I_{\beta_1\beta_1} = \sum_{i=1}^m \left[ \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij} \right] \left[ \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right],$$

$$I_{\theta_1\alpha_1} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right] \times \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right\},$$

$$I_{\theta_1\beta_1} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right] \times \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right\},$$

$$I_{\alpha_1\beta_1} = \sum_{i=1}^m \left\{ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right) \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right\},$$

with $I_{\alpha_1\beta_1}^T$ the corresponding expression with $X_{1ij}$ in place of $X_{1ij}^T$. These functions
make up $I_{GLMM}$.

For the Cox-frailty portion,

$$\frac{\partial l_{Ci}}{\partial \theta_2} = \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{\eta_{2ij}},$$

$$\frac{\partial l_{Ci}}{\partial \alpha_2} = \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right),$$

$$\frac{\partial l_{Ci}}{\partial \beta_2} = \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right) X_{2ij}^T, \qquad
\left( \frac{\partial l_{Ci}}{\partial \beta_2} \right)^2 = \frac{\partial l_{Ci}}{\partial \beta_2} \frac{\partial l_{Ci}}{\partial \beta_2^T}.$$

Summing each component across all clusters, evaluating under the null hypothesis, and
replacing the unknown parameter $\beta_2$ by its MLE $\hat{\beta}_2$ gives

$$I_{\theta_2\theta_2} = \sum_{i=1}^m \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right]^2,$$

$$I_{\alpha_2\alpha_2} = \sum_{i=1}^m \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2,$$

$$I_{\beta_2\beta_2} = \sum_{i=1}^m \left[ \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij} \right] \left[ \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right],$$

$$I_{\theta_2\alpha_2} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right] \times \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right\},$$

$$I_{\theta_2\beta_2} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right] \times \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right\},$$

$$I_{\alpha_2\beta_2} = \sum_{i=1}^m \left\{ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right) \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right\},$$

with $I_{\alpha_2\beta_2}^T$ the corresponding expression with $X_{2ij}$ in place of $X_{2ij}^T$. These functions
make up $I_{Cox}$.

The standardized test statistic can then be calculated as $T = S / (I_{GLMM} + I_{Cox})^{1/2}$,
where $S$ is the score-based statistic of the joint model, and used for further inference.
Appendix D
R code
library(mnormt)
library(survival)
library(MASS)
####rnormal and simu.joint generate dataset
rnormal<-function(ncentre,var_u,var_v,cov_uv){
var_cov<-matrix(c(var_u,cov_uv, cov_uv, var_v),2,2)
if(var_u>0 & var_v>0){
ran=rmnorm(ncentre,mean=c(0,0),varcov=var_cov)}
else if(var_v==0){
elements<-cbind(rnorm(ncentre,mean=0, sd=sqrt(var_u)),
rep(0, ncentre))
ran<-as.matrix(elements)
}
else if(var_u==0){
elements<-cbind(rep(0, ncentre),
rnorm(ncentre,mean=0, sd=sqrt(var_v)))
ran<-as.matrix(elements)
}else if ((var_u==0) & (var_v==0)){
ran=matrix(rep(0,ncentre*2), ncentre,2)}
return(ran)
}
simu.joint <- function (n, ncentre, b0,b1, g1, g2, g3, sigma_u, sigma_v, sigma_uv) {
max.iter2=10000
tol2=0.00001
v_u<-vector();v_v<-vector(); covuv<-vector()
beta_list<-list();
gamma_list<-list()
sebeta<-list();
segamma<-list()
se_v_u<-vector();
se_v_v<-vector();
se_covuv<-vector()
n_beta1=0;
n_beta2=0;
n_gamma1=0;
n_gamma2=0;
n_gamma3=0
n_var_u=0;
n_var_v=0;
n_covuv=0
##set the initial values for all the parameters that will be estimated
sim_beta<-c(b0, b1)
sim_gamma<-c(g1, g2, g3)
sim_sigma<-c(sigma_u, sigma_v, sigma_uv)
sim_var_u=sim_sigma[1];
sim_var_v=sim_sigma[2];
sim_cov_uv=sim_sigma[3]
##generate centre ID
sim_centre = rep(c(1:ncentre), n/ncentre)
sim_centre<-as.factor(sim_centre)
###ct_matrix records the centre information for all the patients
ct_matrix<-matrix(rep(0,n*ncentre),n,ncentre)
for (i in 1:n){
for (j in 1:ncentre){
if (as.numeric(sim_centre[i])==j) {ct_matrix[i,j]=1}
}
}
centre = rep(c(1:ncentre), n/ncentre)
centre<-as.factor(centre)
##generate center-specific random effect u and v
ran<-rnormal(ncentre,sim_var_u,sim_var_v,sim_cov_uv)
##assign centre ID to each patient
uv_p<-ct_matrix%*%ran
u_p<-uv_p[,1];v_p<-uv_p[,2]
##simulate "arm" variable
arm<-rbinom(n,size=1,prob=0.5)
sim_X1<-cbind(1, arm)
##simulate resp variable
za<-exp(sim_X1%*%sim_beta+u_p)
resp = rbinom(n, 1, za/(1+za))
##simulate the survival time
##assume in baseline hazard lambda=0.15, p=1 (Follow exponential distribution)
ranuni<-runif(n,min=0,max=1)
sim_X2<-cbind(resp, arm, resp*arm)
stime<-log(ranuni)/exp(sim_X2%*%sim_gamma+v_p)/(-0.15)
endstudy = runif(n,0, 20) ###make the censoring rate around 20%
event = ifelse(stime>endstudy, 0, 1)
time = ifelse(stime>endstudy, endstudy, stime)
##put all data in a data frame and sort the data based on survival time
datsimu = data.frame(time, event, resp,arm,centre)
st = sort(datsimu$time, decr = T, index = T)
idx = st$ix
datasimu = datsimu[idx, ]
sim<-datasimu[1:n,]
time<-sim$time;
event<-sim$event
resp<-sim$resp;
arm<-sim$arm;
centre<-sim$centre
return(sim)
}
###calculate score in each cluster of the data
centrescore<-function(centredata, betaglm, betacox){
lambda<-centredata$hazard
arm<-centredata$arm
resp<-centredata$resp
event<-centredata$event
risk<-centredata$risk
sex<-centredata$sex
numerator<-exp(betaglm[1]+betaglm[2]*arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-resp-numerator/(1+numerator)
effect<-exp(resp*betacox[1]+arm*betacox[2]+arm*resp*betacox[3])
coxscore<-event-lambda*effect
coxinfo<-lambda*effect
v<-0.5*((sum(glmscore))^2+(sum(coxscore))^2-sum(glminfo)-sum(coxinfo))
return(v)
}
###"calcscore" calculates the score statistic of the input data
calcscore<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(resp)*as.factor(arm), data=data)
bootbetacox<-cox$coeff
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
bootbetaglm<-logi$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
total<-vector()
centreNUM<-length(table(data$centre))
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
total[i]<-centrescore(centredata, bootbetaglm, bootbetacox)
}
scorestat<-sum(total)
return(scorestat)
}
####boottest takes a dataset and generates the vector of bootstrap estimates "result"
boottest<-function(simudata,r){
bootcentre<-simudata$centre
patientNUM<-length(simudata$centre)
centreNUM<-length(table(simudata$centre))
result<-rep(0,r)
result<-replicate(r, {
index<-sample(1:patientNUM, patientNUM, replace=T)
bootstrap<-simudata[index, ]
bootstrap$centre<-bootcentre
bootscore<-calcscore(bootstrap)
return(bootscore)
})
return(result)
}
###calculate p-value from the bootstrap estimates
calcpvalue<-function(result, simudata){
actualscore<-round(calcscore(simudata), digits=5)
pvalue<-round(sum(abs(result)>abs(actualscore))/length(result), digits=4)
return(pvalue)
}
####pvaluedist generates simulated datasets with pre-defined values and gets p-values with a two-sided test
pvaluedist<-function(n, ncentre, r,b,g, sigmau, sigmav, sigma_uv){
pcol<-vector()
for (i in 1:r){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
bootresult<-boottest(data, 299)
actualscore<-calcscore(data)
pvalue<-sum(abs(bootresult)>abs(actualscore))/length(bootresult)
pcol[i]=pvalue
}
return(pcol)
}
### bootalpha generates datasets with defined parameters and obtains p-values using a one-sided test; equivalent to "pvaluedist" but one-sided
bootalpha<-function(n, ncentre, r,b,g, sigmau, sigmav, sigma_uv){
pcol<-vector()
for (i in 1:r){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
bootresult<-boottest(data, 299)
actualscore<-calcscore(data)
pvalue<-sum(bootresult>actualscore)/length(bootresult)
pcol[i]=pvalue
}
return(pcol)
}
###getalpha takes all the p-values generated from one simulation setting and calculates the type I error or power, using 0.05 as the cut-off
getalpha<-function(vector){
alpha<-sum(vector<=0.05)/length(vector)
alpha<-round(alpha, digits=4)
return(alpha)
}
######Asymptotic Method
#####glmI calculates the GLMM portion of the asymptotic variance statistic I
glmI<-function(data){
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
betaglm<-logi$coeff
centreNUM<-length(table(data$centre))
total_tt=0
total_bb=matrix(rep(0,4), nrow=2, ncol=2)
total_alphabeta=matrix(rep(0,2), nrow=1, ncol=2)
total_tab=matrix(rep(0,3),nrow=1, ncol=3)
total_gmid=matrix(rep(0,9),nrow=3, ncol=3)
for (i in 1:centreNUM){
centredata<-subset(data, centre==i)
numerator<-exp(betaglm[1]+betaglm[2]*centredata$arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-centredata$resp-numerator/(1+numerator)
glm_itheta<-0.5*((sum(glmscore))^2-sum(glminfo))
X1T=cbind(rep(1,length(centredata$arm)),centredata$arm)
alpha<-(centredata$resp-numerator/(1+numerator))
glm_ialpha<-sum(alpha)
beta=alpha*X1T
glm_ibeta<-matrix(c(sum(beta[,1]),sum(beta[,2])), ncol=2)
glm_ibetaT<-t(glm_ibeta)
glm_ibeta2<-glm_ibetaT%*%glm_ibeta
glm_ialphabeta<-glm_ibeta*glm_ialpha
### glm_ibetaalpha<-glm_ibetaT*glm_ialpha
glm_ialpha2<-matrix(glm_ialpha^2, nrow=1, ncol=1)
### glm_mid<-rbind(cbind(glm_ibeta2, glm_ibetaalpha),
###                cbind(glm_ialphabeta, glm_ialpha2))
glm_tab<-matrix(c(glm_itheta*glm_ibeta, glm_itheta*glm_ialpha), nrow=1, ncol=3)
total_bb=total_bb+glm_ibeta2
total_alphabeta=total_alphabeta+glm_ialphabeta
total_tt=total_tt+glm_itheta^2
### total_gmid<-total_gmid+glm_mid
total_tab<-total_tab+glm_tab
}
left<-rbind(total_bb, total_alphabeta)
right<-rbind(t(total_alphabeta), glm_ialpha2)
total_gmid<-cbind(left, right)
total_gmid = total_gmid[1:2, 1:2]
total_tab = matrix(total_tab[1, 1:2], nrow = 1, ncol = 2)
###print(total_gmid)
###print(total_tab)
glm_I<-total_tt-total_tab%*%solve(total_gmid)%*%t(total_tab)
return(glm_I)
}
###end glmI
####coxI calculates the Cox-frailty portion of the asymptotic variance statistics I
coxI<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(resp)*as.factor(arm), data=data)
betacox<-cox$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
centreNUM<-length(table(data$centre))
ctotal_tt=0
ctotal_tab=matrix(rep(0,4),nrow=1, ncol=4)
total_coxibeta2<-matrix(rep(0,9),nrow=3, ncol=3)
total_ab<-matrix(rep(0,3), nrow=1, ncol=3)
total_alpha2<-matrix(0, nrow=1, ncol=1)
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
effect<-exp(centredata$resp*betacox[1]+centredata$arm*betacox[2]
+centredata$resp*centredata$arm*betacox[3])
coxscore<-centredata$event-centredata$hazard*effect
coxinfo<-centredata$hazard*effect
cox_itheta<-0.5*((sum(coxscore))^2-sum(coxinfo))
alpha<-centredata$event-centredata$hazard*effect
cox_ialpha=sum(alpha)
cox_ibeta<-data.matrix(cbind(sum(alpha*centredata$resp),
sum(alpha*centredata$arm),
sum(alpha*centredata$resp*centredata$arm)))
cox_ibetaT<-t(cox_ibeta)
cox_ibeta2<-cox_ibetaT%*%cox_ibeta
cox_ialphabeta<-cox_ialpha*cox_ibeta
cox_ialpha2<-matrix(cox_ialpha^2, nrow=1, ncol=1)
cox_tab<-matrix(c(cox_itheta*cox_ibeta, cox_itheta*cox_ialpha), nrow=1, ncol=4)
ctotal_tt=ctotal_tt+cox_itheta^2
ctotal_tab<-ctotal_tab+cox_tab
total_ab=total_ab+cox_ialphabeta
total_coxibeta2=total_coxibeta2+cox_ibeta2
total_alpha2=total_alpha2+cox_ialpha2  ###accumulate across clusters rather than keep only the last cluster's value
}
left=rbind(total_coxibeta2, total_ab)
right=rbind(t(total_ab), total_alpha2)
total_cmid<-cbind(left, right)
total_cmidinv=solve(total_cmid)
cox_I<-ctotal_tt-ctotal_tab%*%total_cmidinv%*%t(ctotal_tab)
return(cox_I)
}
#### 'itest' takes a dataset and tests for the cluster-level random effect with the asymptotic method
itest<-function(data){
S<-calcscore(data)
I<-coxI(data)+glmI(data)
stat=S/(I^0.5)
pvalue<-pnorm(-abs(stat))
return(pvalue)
}
### 'itestcomp' calculates the overall asymptotic variance statistic I and obtains p-values using both one-sided and two-sided tests
itestcomp<-function(data){
S<-calcscore(data)
I<-coxI(data)+glmI(data)
stat=S/(I^0.5)
pvalue1 = 1-pnorm(stat)
pvalue2 = 2*pnorm(-abs(stat))
return(c(pvalue1, pvalue2))
}
#### 'itestalpha' generates simulation data, obtains p-values, and stores the results in 'pcol'
itestalpha<-function(n, ncentre, R,b,g, sigmau, sigmav, sigma_uv){
pcol<-matrix(0, R, 2)
for (i in 1:R){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
pvalue<-itestcomp(data)
pcol[i, ]<-pvalue
# cat('i=', i, ', p = ', pvalue, '\n')
}
alpha1=getalpha25(pcol[,1])
alpha2=getalpha(pcol[,2])
print(c(alpha1, alpha2))
return(pcol)
}
getalpha25<-function(vector){
alpha<-sum(vector<=0.025)/length(vector)
return(alpha)
}
####Functions specifically used for the HD.6 data
####bootstrap for HD.6 data, adding "risk" as a fixed-effect covariate to the Cox-frailty model
centrescorehd6<-function(centredata, betaglm, betacox){
lambda<-centredata$hazard
arm<-centredata$arm
resp<-centredata$resp
event<-centredata$event
risk<-centredata$risk
sex<-centredata$sex
numerator<-exp(betaglm[1]+betaglm[2]*centredata$arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-resp-numerator/(1+numerator)
effect<-exp(resp*betacox[1]+arm*betacox[2]+risk*betacox[3]+arm*resp*betacox[4])
coxscore<-event-lambda*effect
coxinfo<-lambda*effect
v<-0.5*((sum(glmscore))^2+(sum(coxscore))^2-sum(glminfo)-sum(coxinfo))
return(v)
}
calcscorehd6<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+as.factor(risk)
+as.factor(resp)*as.factor(arm), data=data)
bootbetacox<-cox$coeff
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
bootbetaglm<-logi$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
total<-vector()
centreNUM<-length(table(data$centre))
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
total[i]<-centrescorehd6(centredata, bootbetaglm, bootbetacox)
}
scorestat<-sum(total)
return(scorestat)
}
boottesthd6<-function(simudata,r){
bootcentre<-simudata$centre
patientNUM<-length(simudata$centre)
centreNUM<-length(table(simudata$centre))
result<-rep(0,r)
result<-replicate(r, {
index<-sample(1:patientNUM, patientNUM, replace=T)
bootstrap<-simudata[index, ]
bootstrap$centre<-bootcentre
bootscore<-calcscorehd6(bootstrap)
return(bootscore)
})
return(result)
}
###coxIhd6 calculates Cox-frailty portion of the I statistics
coxIhd6<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(risk), data=data)
betacox<-cox$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
centreNUM<-length(table(data$centre))
ctotal_tt=0
ctotal_tab=matrix(rep(0,4),nrow=1, ncol=4)
total_coxibeta2<-matrix(rep(0,9),nrow=3, ncol=3)
total_ab<-matrix(rep(0,3), nrow=1, ncol=3)
total_alpha2<-matrix(0, nrow=1, ncol=1)
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
effect<-exp(centredata$resp*betacox[1]+centredata$arm*betacox[2]
+centredata$risk*betacox[3])
coxscore<-centredata$event-centredata$hazard*effect
coxinfo<-centredata$hazard*effect
cox_itheta<-0.5*((sum(coxscore))^2-sum(coxinfo))
alpha<-centredata$event-centredata$hazard*effect
cox_ialpha=sum(alpha)
cox_ibeta<-data.matrix(cbind(sum(alpha*centredata$resp),
sum(alpha*centredata$arm),
sum(alpha*centredata$risk)))
cox_ibetaT<-t(cox_ibeta)
cox_ibeta2<-cox_ibetaT%*%cox_ibeta
cox_ialphabeta<-cox_ialpha*cox_ibeta
cox_ialpha2<-matrix(cox_ialpha^2, nrow=1, ncol=1)
cox_tab<-matrix(c(cox_itheta*cox_ibeta, cox_itheta*cox_ialpha), nrow=1, ncol=4)
ctotal_tt=ctotal_tt+cox_itheta^2
ctotal_tab<-ctotal_tab+cox_tab
total_ab=total_ab+cox_ialphabeta
total_coxibeta2=total_coxibeta2+cox_ibeta2
total_alpha2=total_alpha2+cox_ialpha2  ###accumulate across clusters rather than keep only the last cluster's value
}
left=rbind(total_coxibeta2, total_ab)
right=rbind(t(total_ab), total_alpha2)
total_cmid<-cbind(left, right)
total_cmidinv=solve(total_cmid)
cox_I<-ctotal_tt-ctotal_tab%*%total_cmidinv%*%t(ctotal_tab)
return(cox_I)
}
###end coxI
#### itesthd6 applies the asymptotic method specifically to the HD.6 data
itesthd6 <- function(data){
  S <- calcscorehd6(data)
  I <- coxIhd6(data) + glmI(data)   # glmI: logistic-model portion of I, defined earlier
  stat <- S/(I^0.5)
  pvalue1 <- 1 - pnorm(stat)        # one-sided p-value
  pvalue2 <- 2*pnorm(-abs(stat))    # two-sided p-value
  return(c(pvalue1, pvalue2))
}
#### data cleaning for the HD.6 trial
library(survival)
require(boot)
library(coxme)
library(lme4)
abvd <- read.csv("I:/crcruJointModel_cr2.csv", header = TRUE)
colnames(abvd) <- c("arm", "sex", "age", "centre", "clc2resp", "resp", "risk",
                    "time", "event")
abvd$time <- as.numeric(abvd$time)
miss <- abvd[is.na(abvd$resp), ]
miss$event <- as.factor(miss$event)
summary(miss)
### delete all observations with a missing resp value
abvd <- abvd[!is.na(abvd$resp), ]
### remove all observations with survival time less than 6 months
abvd <- subset(abvd, time >= 6)
### convert categorical variables to numeric variables
### note: arm A = ABVD + radiation; arm B = ABVD alone
abvd$arm <- ifelse(abvd$arm == "A", 1, 0)     ## ABVD alone = 0; ABVD + radiation = 1
arm <- abvd$arm
resp <- ifelse(abvd$resp == "YES", 1, 0)      ## remission = 1, no remission = 0
event <- as.numeric(abvd$event)
time <- abvd$time
centre <- as.factor(as.numeric(abvd$centre))  ## there are 29 centres in total
risk <- ifelse(abvd$risk == "High", 1, 0)
sex <- ifelse(abvd$sex == "M", 1, 0)
### rearrange the dataset in descending order of survival time
dat <- data.frame(time, event, resp, arm, risk, sex, centre)
dat <- dat[order(dat$time, decreasing = TRUE), ]
time <- dat$time; event <- dat$event; resp <- dat$resp; arm <- dat$arm
risk <- dat$risk; sex <- dat$sex; centre <- dat$centre