TESTING CLUSTER-LEVEL RANDOM EFFECT IN JOINT
MODELS OF CLINICAL TRIAL DATA
by
Xin Yao
A thesis submitted to the Department of Public Health
In conformity with the requirements for
the degree of Master of Science
Queen’s University
Kingston, Ontario, Canada
(November, 2014)
Copyright ©Xin Yao, 2014
Abstract
To obtain enough participants, clinical trials often involve patients recruited from
multiple institutions; one example is the HD.6 Hodgkin's Lymphoma clinical trial conducted by
the NCIC Clinical Trials Group (CTG). However, these institutions may have inherent
heterogeneity that affects the outcomes of the clinical trial. Such heterogeneity can be
described by cluster-level random effects. In a previous study, Wang (2013) used a joint
model to analyze the relationship between remission response and survival in
the HD.6 clinical trial. In this study, we develop both asymptotic and bootstrapping
methods to test whether the variance of the cluster-level random effect is larger than zero
for multivariate outcomes. These methods extend the classical asymptotic results for the
score test of homogeneity to multivariate outcomes. Both methods involve the
construction of a score-based test statistic, but the bootstrapping method approximates the
distribution of the statistic by resampling, while the asymptotic method uses the analytical
variance for statistical inference. A series of simulations were conducted, and the results
showed that the bootstrapping method has higher power than the asymptotic
method. This is because the bootstrapping method does not make
assumptions about the underlying distribution of the statistic that could be false. We also
showed that the proposed bootstrapping method can be useful for
epidemiologists, as it allows the optimal sample size to be determined during the design of
an investigative study. We applied both the bootstrapping and asymptotic methods to the
NCIC CTG HD.6 clinical trial data and found no evidence that the
outcomes were affected by cluster-level random effects. For future directions, the
theory behind how to determine the optimal sample size in studies in which the bootstrap method
is used to test cluster effects should be investigated further. Furthermore, the effect of
variation in cluster size on the power of the bootstrap method should be studied.
Co-Authorship
Acknowledgements
First, I would like to thank Dr. Bingshu Chen, Dr. Wenyu Jiang, and the Department
of Public Health Sciences for granting me the opportunity to enroll in the Master of
Science in Biostatistics. My two supervisors have provided continuous mentorship and
guidance from 2013 to 2014, and assisted me throughout the practicum project. Without
their help this project would not have been possible.
I would also like to thank the faculty members in both the Department of Public Health
Sciences and the Department of Mathematics and Statistics. They have taught me crucial
knowledge throughout the school year and prepared me for this project.
Finally, I would like to thank the NCIC Clinical Trials Group for providing a supportive
working environment and all the facilities I needed. I would also like to thank the Natural
Sciences and Engineering Research Council of Canada (NSERC) for financial support.
Statement of Originality
I hereby certify that all of the work described within this thesis is the original work of the author.
Any published (or unpublished) ideas and/or techniques from the work of others are fully
acknowledged in accordance with the standard referencing practices.
(Xin Yao)
(October, 2014)
Table of Contents
Abstract ............................................................................................................................................ 2
Co-Authorship ................................................................................................................................. 4
Acknowledgements .......................................................................................................................... 5
Statement of Originality ................................................................................................................... 6
List of Figures .................................................................................................................................. 9
List of Tables ................................................................................................................................. 10
Chapter 1 Literature Review and Introduction .............................................................................. 11
1.1 Cluster effect in multicenter clinical trials ........................................................................... 11
1.1.1 Consequences of Cluster Effects ................................................................................... 11
1.1.2 Correcting For Cluster Effect in Practice ...................................................................... 12
1.1.3 Asymptotic Approach of Detecting Cluster-level Random Effect ................................ 14
1.2 Bootstrap .............................................................................................................................. 16
1.2.1 Introduction ................................................................................................................... 16
1.2.2 Bootstrap Methods Applied for the Modeling of Cluster Effects ................................. 18
1.3 Notations .............................................................................................................................. 20
1.4 HD.6 Hodgkin Lymphoma Clinical Trial and Joint Modeling ............................................ 21
1.5 Purpose of Study .................................................................................................................. 25
Chapter 2 Bootstrap and Asymptotic Variance Testing Procedures .............................................. 28
2.1 Overview .............................................................................................................................. 28
2.2 Calculating score-based statistics S...................................................................................... 29
2.3 Inference using Asymptotic Method .................................................................................... 32
2.4 Bootstrap Procedure ............................................................................................................. 33
2.5 Calculating Type I Error and Power .................................................................................... 35
Chapter 3 ........................................................................................................................................ 36
3.1 Overview .............................................................................................................................. 36
3.2 Methods ............................................................................................................................... 36
3.3 Simulation Results ............................................................................................................... 38
3.3.1 Results of Asymptotic Method ..................................................................................... 38
3.3.2 Bootstrap Estimate Distribution .................................................................................... 39
3.3.3 Type I Error and Power Comparison ............................................................................ 40
Chapter 4 Application and Discussion ........................................................................................... 44
4.1 Application to HD.6 Clinical Trial ...................................................................................... 44
4.2 Comparison of bootstrap method with the asymptotic method ........................................... 47
4.3 Sample Size Selection .......................................................................................................... 48
4.4 Future Directions ................................................................................................................. 49
4.5 Computational Expense ....................................................................................................... 50
Chapter 5 ........................................................................................................................................ 51
Bibliography .................................................................................................................................. 53
Appendix A .................................................................................................................................... 58
Appendix B .................................................................................................................................... 60
Appendix C .................................................................................................................................... 63
Appendix D .................................................................................................................................... 69
List of Figures
Figure 1 Distribution of Participants in HD.6 Clinical Trial ......................................................... 22
Figure 2 Bootstrap Estimate Distribution of 4 Datasets ................................................................ 39
Figure 3 Distribution of bootstrap estimates obtained from HD.6 clinical trial data ..................... 45
List of Tables
Table 3.1 Parameters of Simulation ............................................................................................... 37
Table 3.2 Type I error of Asymptotic Approach and Bootstrap with K = 500 replications .......... 41
Table 3.3 Power of Asymptotic and Bootstrap Method ................................................................. 42
Table 3.4 Relationship between Sample Size and Statistical Power ............................................. 43
Chapter 1
Literature Review and Introduction
1.1 Cluster effect in multicenter clinical trials
1.1.1 Consequences of Cluster Effects
In the ideal scenario for a clinical trial, one central agency randomly
assigns treatments to all participants. This provides a statistical basis for testing the null
hypothesis of no treatment effect. In reality, clinical trials are often designed to involve
multiple institutions, to ensure an adequate number of participants and the
generalizability of the trial, especially in the case of rare diseases such as Hodgkin's
Lymphoma (HL), or where a study involves many collaborating groups. However, effect
modification could arise if patients recruited at one specific center share certain common
characteristics, or if the medical team at one center practices in a particular pattern.
Furthermore, correlation between patient outcome and exposure within a particular
institution presents further analytical challenges. If the correct analytical procedure is not
used in the inference process and the data collected from different centers are simply
pooled, center-specific variations could lead to incorrect p-values as well as misleading
confidence intervals, biased estimates, and unrecognized heterogeneity among clusters
(Glidden and Vittinghoff, 2004; Localio et al., 2001).
For example, in 1998, 41,000 Medicare patients in 73 Pennsylvania hospitals
participated in a trial designed to study cardiac catheterization. In the analysis by Laine et
al. of the rate of combined cardiac catheterization, variation among hospitals was
ignored and patient outcomes were assumed to be independent. The initial pooled
analysis produced a confidence interval of [33.7%, 34.7%] for the overall rate. However,
subsequent analysis showed that, individually, the catheterization rate in each hospital varied
from 2% to 98%. These data suggest that the pooled estimate grossly understated the
variation in catheterization rates among the participating hospitals. After adjustment for
the correlation of patients within hospitals, the confidence interval for the overall rate became
[25%, 43%]. This example demonstrates that, without accounting for cluster-level random
effects, pooling of clustered data can greatly overstate the precision of the data, among
other consequences (Laine et al., 1998; Localio et al., 2001).
1.1.2 Correcting For Cluster Effect in Practice
There are several methods in the literature that address the issue of cluster-specific
variation in different settings, such as binary response studies and time-to-event
analysis. For binary response studies, existing methods that correct for clustering effects
can be categorized based on whether the inference procedure is conditional on the
centers. In conditional methods, the effect of treatment is analyzed within each cluster,
and the average effect of treatment across centers is calculated. Analysis of the
relationship between the covariates and the outcome is also done within clusters. Such
methods are appropriate for traditional multicenter trials in which patients are randomly
assigned to different treatments within each center. One of the most elementary methods in this
category is the Mantel-Haenszel method (Mantel and Haenszel, 1959), which allows
estimation of odds ratios or relative risks in clustered data, but is limited to binary
outcomes and scenarios with a single covariate. Conditional logistic regression can include
multiple covariates, but can only estimate patient-level factors. Fixed-effect logistic
regression assumes that the effect of particular centers on the outcome is fixed. In this
method each center is represented by an indicator variable, and the estimated regression
coefficient of each indicator is the risk estimate for all patients within the center. Finally,
the random-effect model assumes that the centers involved in the study are random samples
drawn from a population that follows an underlying distribution, usually assumed to be
normal. The effect of each center is estimated through a single random-effect variable.
Therefore, the random-effect model is appropriate for situations with many centers, as
opposed to fixed-effect logistic regression.
In comparison, unconditional methods take into account both within- and
between-center variation. Some well-established survey methods are designed for
use in surveys where participants are clustered, such as the one conducted by Bassuk et al.
to investigate cognitive decline among seniors. These survey-sampling methods are well
suited to analyzing the between-center differences present in clinical trials. Another method is
generalized estimating equations (GEE), which represent a set of methods that allow
estimation of the effects of treatment and of exposure to other covariates (Liang and Zeger,
1986). Confidence intervals of estimates are adjusted to account for correlations within
clusters. These methods are suitable for situations in which many clusters are involved.
1.1.3 Asymptotic Approach of Detecting Cluster-level Random Effect
Several methods have been developed to analyze the variation among institutions
as random effects in multivariate models. Matsuyama et al. (1998) used a Bayesian
hierarchical survival model to investigate the impact of institutional variation on the
efficacy of treatment in a multicenter cancer clinical trial. In that study, survival was the
endpoint of interest. Several statistical testing methods have also been developed
specifically to test for homogeneity among strata. Different versions of these tests are
available for survival data (Gray, 1995) and generalized linear models (Liang, 1987; Lin,
1997; Smith, 1993). In these methods, the variation among institutions is treated as a
random effect and the objective is to test the null hypothesis that the variance of this
random effect is 0. The test statistics used involve the score function and the observed
Fisher information.
In a method proposed by Liang (1987), mixed-effect models were used to model
samples grouped in $m$ potential clusters. A specific outcome variable $Y$ for cluster $i$ that
contains a group of samples has a corresponding density function $f_i(y_i; \beta, \alpha_i)$, in which
$\alpha_i = \alpha + \theta^{1/2} v_i$, and $v_i$ is independently generated from an unknown distribution $F(\cdot)$
with zero mean and unit variance. In other words, $\beta$ represents the fixed effects of covariates
other than clusters, while $\alpha_i$ represents the random effect of cluster $i$. The null
hypothesis to be tested is $H_0: \theta = 0$. If $\theta = 0$, then the random effect disappears and a
fixed-effect model alone is sufficient to model the outcome and exposure, which means
that the variation among centers is negligible.
To test the hypothesis of homogeneity among clusters, Liang (1987) proposed the
following score-based test statistic:

$$ S = \sum_{i=1}^{m} \frac{\partial}{\partial\theta} \log f_i(\hat{\beta}, \hat{\alpha}, \theta) = \frac{1}{2} \sum_{i=1}^{m} \left[ \left\{ \frac{\partial}{\partial\alpha_i} \log f_i(\hat{\beta}, \hat{\alpha}) \right\}^2 + \frac{\partial^2}{\partial\alpha_i^2} \log f_i(\hat{\beta}, \hat{\alpha}) \right], \qquad (1.1) $$
in which $f_i(\hat{\beta}, \hat{\alpha}, \theta)$ represents the density of cluster $i$, and $\frac{\partial}{\partial\alpha_i}\log f_i(\hat{\beta},\hat{\alpha})$ and $\frac{\partial^2}{\partial\alpha_i^2}\log f_i(\hat{\beta},\hat{\alpha})$
are the score function and observed information for the random effect $\alpha_i$, calculated under
the null hypothesis ($\theta = 0$). $\hat{\beta}$ represents the column vector of maximum likelihood
estimators of the fixed-effect covariates and $\hat{\alpha}$ represents the random effects associated with
clusters. (A detailed derivation of the score-based statistic $S$ is given in Appendix A.) The
statistic $I$ is used to approximate the variance of the score-based statistic
asymptotically. Let $l_i$ represent the log-likelihood function $\log f_i(\beta, \alpha)$; then $I$ has the
following form:

$$ I = I_{\theta\theta} - \begin{bmatrix} I_{\theta\beta}^T & I_{\theta\alpha} \end{bmatrix} \begin{bmatrix} I_{\beta\beta} & I_{\alpha\beta} \\ I_{\alpha\beta}^T & I_{\alpha\alpha} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta\beta} \\ I_{\theta\alpha} \end{bmatrix}, \qquad (1.2) $$
where

$$ I_{\theta\theta} = \sum_{i=1}^{m} \left(\frac{\partial l_i}{\partial\theta}\right)^2, \quad I_{\alpha\alpha} = \sum_{i=1}^{m} \left(\frac{\partial l_i}{\partial\alpha}\right)^2, \quad I_{\beta\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\beta}\frac{\partial l_i}{\partial\beta^T}, $$

$$ I_{\theta\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\beta}, \quad I_{\theta\alpha} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\alpha}, \quad I_{\alpha\beta} = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\alpha}\frac{\partial l_i}{\partial\beta}, \quad I_{\alpha\beta}^T = \sum_{i=1}^{m} \frac{\partial l_i}{\partial\alpha}\frac{\partial l_i}{\partial\beta^T}. \qquad (1.3) $$
The rejection criterion involves the normalized statistic $T = S / I^{1/2}$. This test
statistic was proved to have an asymptotic standard normal distribution. Large values of
$T$ lead to rejection of the null hypothesis. Liang argued that since the "parameters
specified by the null hypothesis is on the boundary of the parameter space formed by $\beta$, $\alpha$
and $\theta$", the test based on $T$ needs to be one-sided. Therefore, the rejection criterion
is $p < \alpha$ ($\alpha = 0.025$) (Liang, 1987).
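As an illustration of this testing procedure, the computation of $T$ and the one-sided p-value can be sketched as follows. This is a simplified sketch, not code from the thesis: the per-cluster score and information inputs are assumed to be precomputed under the null fit, and the variance estimate here uses only the sum of squared per-cluster contributions, omitting the correction term in (1.2) for the estimated $\beta$ and $\alpha$.

```python
import numpy as np
from math import erfc, sqrt

def liang_score_test(score_alpha, info_alpha):
    """One-sided score test of homogeneity among clusters (H0: theta = 0).

    score_alpha : per-cluster scores d/d(alpha_i) log f_i, at the null fit
    info_alpha  : per-cluster observed information -d^2/d(alpha_i)^2 log f_i
    """
    score_alpha = np.asarray(score_alpha, dtype=float)
    info_alpha = np.asarray(info_alpha, dtype=float)
    # Per-cluster contributions to S in (1.1); the second derivative of
    # log f_i equals minus the observed information, hence the subtraction.
    contrib = 0.5 * (score_alpha ** 2 - info_alpha)
    S = contrib.sum()
    # Simplified variance estimate (assumption): sum of squared per-cluster
    # contributions; the full (1.2) subtracts a further correction term.
    I = np.sum(contrib ** 2)
    T = S / sqrt(I)
    # One-sided p-value from the standard normal survival function;
    # large T leads to rejection of H0.
    p_value = 0.5 * erfc(T / sqrt(2.0))
    return T, p_value
```

The helper name `liang_score_test` and the toy inputs are hypothetical; in practice the score and information terms come from the fitted null model.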
Gray (1995) also proposed a test of cluster effect based on martingale residuals for
survival data, and the test statistic has a similar form to (1.1). Lin (1997) showed a method to
detect random effects in generalized linear models. In all these methods, the test statistics
are the same as those shown in (1.1), (1.2) and (1.3), and the rejection criteria are similar to
the one demonstrated above.
In this project, we will develop statistical methods to test cluster effects for
multiple outcome variables, including both a binary response variable and a survival
outcome. Two different approaches will be considered: in one approach, we extend the
existing asymptotic method to the multiple-outcomes setting; in the other, we use the
bootstrap method to approximate the distribution of the score test statistic. In the next
section, we review some basic ideas of the bootstrap method to be used in this project.
1.2 Bootstrap
1.2.1 Introduction
Inspired by the jackknife method, Efron (1979) was the first to introduce the
bootstrap method, in an attempt to improve the estimation of statistics such as variances
and regression estimators in a nonparametric manner. The idea is that, to estimate a
parameter $\theta$ for a population with unknown distribution, an approximation to the
population distribution can be constructed by resampling from the observed sample,
generating a collection of bootstrap samples. Later, Bickel and Freedman
(1981) showed that Efron's method is asymptotically valid in many situations and can
improve the asymptotic accuracy in some of them. Rubin (1981) developed a Bayesian
analogue of the bootstrap. This method simulates the posterior distribution of the
parameter of interest, instead of the sampling distribution of a statistic of interest. The
result is a collection of parameter estimates inferred from the distribution of the bootstrap
statistics.
Suppose the parameter of interest $\theta$ can be estimated by an estimator $\hat{\theta}$ derived from
the sample. Bootstrap samples can be repeatedly drawn from the observed sample. On
each bootstrap sample, a bootstrap estimate $\hat{\theta}^*$ can be calculated in the same way as $\hat{\theta}$ is
from the observed sample. A popular way of assessing the uncertainty of the estimates
and establishing a $1-\alpha$ level confidence interval for the parameter $\theta$ is the percentile
bootstrap method (Efron and Tibshirani, 1993). In this method a two-sided $1-\alpha$
confidence interval for $\theta$ is simply given by the percentiles marking the middle $1-\alpha$ portion of
the distribution of the bootstrap estimates $\hat{\theta}^*$. This method is most
appropriate when the distribution of bootstrap estimates is symmetric and centered on
the observed statistic. We will apply this method later when we make inference about
the parameter of interest.
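The percentile method described above can be sketched in a few lines. This is a generic illustration, not code from the thesis; the function name `percentile_ci` and the toy data (with the sample mean as the statistic) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def percentile_ci(sample, statistic, n_boot=2000, alpha=0.05):
    """Two-sided 1 - alpha percentile bootstrap confidence interval."""
    sample = np.asarray(sample)
    boot_estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw a bootstrap sample: resample the observations with replacement.
        resampled = rng.choice(sample, size=len(sample), replace=True)
        # Recompute the estimator on the bootstrap sample.
        boot_estimates[b] = statistic(resampled)
    # The percentiles marking the middle 1 - alpha portion of the
    # bootstrap distribution form the confidence interval.
    lo, hi = np.percentile(boot_estimates,
                           [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy usage with the sample mean as the statistic of interest.
data = rng.normal(loc=5.0, scale=2.0, size=100)
lo, hi = percentile_ci(data, np.mean)
```

Because the bootstrap distribution of the sample mean is centered on the observed mean, the resulting interval contains it, which matches the symmetry condition noted above.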
The estimates calculated from bootstrap samples may disagree with the
true value of the parameter $\theta$ systematically, in which case a bias occurs. Efron (1987)
developed new methods of estimating bootstrap confidence intervals, showing how
a bias-correction constant can improve the accuracy of the bootstrap. Steck
and Jaakkola (2003) also demonstrated a method of leading-order bias correction for
bootstrapped scoring functions as well as maximum likelihood.
1.2.2 Bootstrap Methods Applied for the Modeling of Cluster Effects
Besides the methods introduced in section 1.1.2, bootstrapping is another
unconditional method for studying the clustering effect in multicenter studies. There are
several such bootstrap methods in the literature, each with a distinct way of resampling.
Random cluster bootstrap involves random sampling of clusters with replacement and
subsequent permutation of observations within clusters. In the reverse two-stage bootstrap,
observations within each cluster are selected with replacement, and then the clusters are
again selected using random sampling with replacement (RSWR). As a result, clusters
that appear in the final collection of bootstrap samples contain the same set of cluster
members.
Applying the bootstrap to mixed-effect models (mixed-effect bootstrap) is also
common practice. In a data structure where the observations are divided into $m$ clusters
with $n_i$ observations in cluster $i$, the response variable $y_{ij}$ is modeled using a
linear regression model $y_{ij} = X_{ij}^T \beta + \mu_i + e_{ij}$, where $X_{ij}^T$ is a vector of fixed-effect
covariates, $\beta$ is a vector of fixed-effect regression coefficients, $\mu_i$ is the effect of cluster $i$
with $\mu_i \sim N(0, \sigma_\mu^2)$, and $e_{ij}$ describes random error and is a random variable drawn from
a $N(0, \sigma_e^2)$ distribution. The basic sampling approach is to generate $m$ cluster-level errors
$\mu_i^*$ from the $N(0, \sigma_\mu^2)$ distribution, and a within-cluster error $e_{ij}^*$ for each observation $j$ in
cluster $i$. Next, a new dataset is simulated using the model $y_{ij}^* = X_{ij}^T \beta + \mu_i^* + e_{ij}^*$, and from
this dataset one bootstrap parameter estimate $\hat{\theta}^*$ can be obtained. This process is repeated
until a satisfactory number of bootstrap estimates is obtained. This method works well
when the cluster-level random effects are independently and identically distributed and
the mixed-effect model coefficients are statistically significant (Chambers and Chaundra,
2014).
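The mixed-effect bootstrap sampling step described above can be sketched as follows. This is a generic illustration under assumed fitted values, not code from the thesis; the function name `mixed_effect_bootstrap` and all numeric inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def mixed_effect_bootstrap(X, cluster_sizes, beta_hat, sigma_u, sigma_e):
    """Generate one bootstrap dataset y*_ij = X_ij^T beta + u_i* + e_ij*.

    X             : (N, p) design matrix, rows ordered by cluster
    cluster_sizes : list of n_i, one per cluster (sum = N)
    beta_hat      : fitted fixed-effect coefficients
    sigma_u       : fitted cluster-level standard deviation
    sigma_e       : fitted residual standard deviation
    """
    m = len(cluster_sizes)
    # m cluster-level errors u_i*, one per cluster, expanded to observations.
    u_star = np.repeat(rng.normal(0.0, sigma_u, size=m), cluster_sizes)
    # Within-cluster errors e_ij*, one per observation.
    e_star = rng.normal(0.0, sigma_e, size=X.shape[0])
    return X @ beta_hat + u_star + e_star

# Toy usage: 3 clusters of sizes 4, 5, 6; intercept plus one covariate.
sizes = [4, 5, 6]
N = sum(sizes)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y_star = mixed_effect_bootstrap(X, sizes, beta_hat=np.array([1.0, 0.5]),
                                sigma_u=0.8, sigma_e=1.0)
```

Refitting the model to each simulated `y_star` yields one bootstrap estimate per replicate, as described in the text.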
All the methods above aim to generate empirical estimates that are used to make
statistical inference about the distribution of a particular test statistic. The effectiveness
of these methods has been assessed using a number of criteria in the existing literature.
Davison and Hinkley (1997) argued that the first two moments of the bootstrap statistic
should be as close as possible to the first two moments of the original distribution. Using
this criterion, they evaluated several bootstrap methods and found that the random cluster
bootstrap and the mixed-effect bootstrap are the most appropriate for bootstrapping
clustered data.
One major goal of this study is to formulate and apply a new bootstrap procedure
to clustered data. Contrary to the bootstrap methods described above, the parameters of
interest in this study are estimated under the null hypothesis, that is, the hypothesis that
there is no random effect at the cluster level. Therefore, a different sampling scheme is
needed. Instead of sampling clusters or random-effect variables, we will resample all
observations with replacement and allocate the bootstrap sample to the existing clusters,
as if there were no heterogeneity among the clusters.
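The proposed null resampling scheme can be sketched as follows. This is an illustrative sketch only; the function name `null_bootstrap_sample` and the toy data are assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def null_bootstrap_sample(data, cluster_sizes):
    """Resample all observations with replacement, ignoring cluster labels,
    then re-allocate the pooled bootstrap sample to the existing clusters.

    Under H0 there is no heterogeneity among clusters, so observations
    are treated as exchangeable across clusters.
    """
    data = np.asarray(data)
    # Pool all observations and resample with replacement.
    resampled = rng.choice(data, size=len(data), replace=True)
    # Re-allocate the bootstrap sample to clusters of the original sizes.
    boundaries = np.cumsum(cluster_sizes)[:-1]
    return np.split(resampled, boundaries)

# Toy usage: 20 observations allocated back to clusters of sizes 7, 5, 8.
observations = rng.normal(size=20)
clusters = null_bootstrap_sample(observations, [7, 5, 8])
```

Each bootstrap replicate preserves the original cluster sizes but breaks any true association between observations and clusters, which is exactly the null-hypothesis behavior the test statistic is calibrated against.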
1.3 Notations
In this section, we define notation that will be used throughout the following chapters:

$i$: $i = 1, 2, \ldots, m$ indexes the $i$th cluster.
$j$: $j = 1, 2, \ldots, n_i$ indexes the $j$th observation in the $i$th cluster.

In the GLMM model:
$X_{1ij}$: a $p \times 1$ vector of fixed-effect covariates, with the first element being 1 (the intercept), used in the GLMM model.
$\beta_1$: a $p \times 1$ vector of fixed-effect regression parameters.
$\hat{\beta}_1$: a $p \times 1$ vector of estimators of the fixed-effect regression parameters, obtained through GLMM regression.
$Y_{ij}$: the value of the remission response marker of observation $j$ in cluster $i$.
$\alpha_{1i}$: the random effect of the $i$th cluster, $\alpha_{1i} \sim N(\alpha_1, \sigma_1^2)$; $\alpha_{1i} = \alpha_1 + \theta_1^{1/2} v_{1i}$, where $\alpha_1$ and $\theta_1$ are both scalar quantities.

In the Cox-frailty model:
$t_{ij}$: time to failure or censoring for each subject $j$ in cluster $i$.
$\delta_{ij}$: censoring event indicator for subject $j$ in cluster $i$.
$X_{2ij}$: a $q \times 1$ vector of fixed-effect covariates used in the Cox-frailty model.
$\beta_2$: a $q \times 1$ vector of fixed-effect regression parameters.
$\hat{\beta}_2$: a $q \times 1$ vector of estimators of the fixed-effect regression parameters, obtained through Cox-frailty regression.
$\lambda_0(t)$: the baseline hazard function.
$\Lambda_0(t)$: the cumulative baseline hazard function.
$\alpha_{2i}$: the random effect of the $i$th cluster, $\alpha_{2i} \sim N(\alpha_2, \sigma_2^2)$; $\alpha_{2i} = \alpha_2 + \theta_2^{1/2} v_{2i}$, where $\alpha_2$ and $\theta_2$ are both scalar quantities.
$\sigma_{12}$: covariance of the random effects $\alpha_{1i}$ and $\alpha_{2i}$.
$\rho$: correlation of the random effects $\alpha_{1i}$ and $\alpha_{2i}$, $\rho = \sigma_{12} / \sqrt{\sigma_1^2 \sigma_2^2}$.
$v_{1i}$ and $v_{2i}$: random variables that are assumed to have a bivariate joint distribution.
1.4 HD.6 Hodgkin Lymphoma Clinical Trial and Joint Modeling
The NCIC Clinical Trials Group's HD.6 clinical trial is a suitable example for
studying institutional variation (Meyer et al., 2012). It was designed to compare the
efficacy of one treatment, a combination of doxorubicin, bleomycin, vinblastine
and dacarbazine (ABVD), with another that includes radiotherapy, with or without
ABVD therapy. The study involved 405 stage IA or IIA nonbulky Hodgkin's
lymphoma patients recruited from 29 medical institutions across Canada and the U.S.
Patients were first divided based on risk status into a "favorable risk cohort" and an
"unfavorable risk cohort" (defined in Meyer et al., 2005), then randomly assigned to two
treatment arms: an ABVD-alone group and a radiotherapy group. In the radiotherapy arm, the
unfavorable risk cohort received two cycles of ABVD treatment followed by
radiotherapy, while those with a favorable risk profile received subtotal nodal radiation
therapy; in the ABVD arm, both the favorable and unfavorable risk cohorts received four
cycles of ABVD. Patients were followed up for a median of 11.3 years. Both
remission status and survival time were measurements of interest. Remission status was
determined based on radiological or clinical evidence six months after randomization.
Those without remission are "response-positive" and those with remission are "response-negative".
By the time the study finished, the survival rate in the ABVD therapy group
was 94%, compared to 87% in the radiotherapy group. Another observation was that,
of the patients who achieved remission, 94% had no disease progression and 98%
survived at the end of the trial, compared to 81% with no disease progression and
92% surviving among patients who did not achieve remission. The following
factors were recorded for each patient:
Sex: 1 for male, 0 for female
Arm: treatment arm, ABVD alone = 0, ABVD + radiotherapy = 1
Remission status (Yij): no remission = 0, remission = 1
Risk profile: favorable risk = 0, unfavorable risk = 1.
The participant numbers are uneven across the institutions, with one institution
accounting for 100 of the participants. The distribution of participants is
shown in Figure 1:
Figure 1 Distribution of Participants in HD.6 Clinical Trial
In previous work by Jia Wang (2013), a joint model linking remission
response and survival probability was developed to compare the efficacy of the two
treatments under investigation, and to investigate whether remission response could be
used as a predictor of survival time. The joint model is made up of a Generalized Linear
Mixed Model (GLMM) that models the effect of the chosen covariates on remission
response, and a Cox-frailty survival model in which remission response is one of
the covariates. The cluster effect generated by institutional heterogeneity was included
in the model as two random-effect variables, $\alpha_{1i}$ and $\alpha_{2i}$, in the GLMM and Cox-frailty
model respectively. The overall joint model has the following form:
1) The Generalized Linear Mixed Model (GLMM) was developed using a
Bernoulli distribution for response to treatment:

$$ P(Y_{ij} = 1 \mid X_{1ij}, \beta_1, \alpha_{1i}) = \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}, \qquad (3) $$

where $\alpha_{1i}$ is the random effect of the $i$th cluster, $X_{1ij}$ is a $p \times 1$ vector of fixed-effect
covariates in the GLMM model, and $\beta_1$ is a $p \times 1$ vector of fixed-effect regression
coefficients in the GLMM, with sex, treatment arm and risk profile as covariates.

2) The Cox proportional hazards model:

$$ \lambda(t \mid X_{2ij}, \beta_2, \alpha_{2i}) = \lambda_0(t)\, e^{X_{2ij}^T \beta_2 + \alpha_{2i}}, \qquad (4) $$
where $t$ is the failure time, $\alpha_{2i}$ is the random effect of the $i$th cluster in the Cox-frailty model,
$X_{2ij}$ is a $q \times 1$ vector of fixed-effect covariates in the Cox proportional hazards model, and $\beta_2$
is a $q \times 1$ vector of fixed-effect regression coefficients in the Cox-frailty model, with sex,
remission response, treatment arm, and risk profile as covariates.
The two models above are joined together through the remission response variable
$Y_{ij}$ and through the joint distribution of the two random-effect variables. More
specifically, $\alpha_{1i}$ and $\alpha_{2i}$ are assumed to have a bivariate normal distribution:

$$ \boldsymbol{\alpha}_i = \begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix} \sim N_2\left( \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right) = N_2\left( \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}, \Sigma \right). $$
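As an illustration of this random-effect structure, one draw of jointly normal cluster effects can be simulated as follows. All parameter values here are hypothetical placeholders, not estimates from the thesis; only the number of clusters (29) comes from the HD.6 trial description.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical means, variances and covariance of the cluster-level random
# effects (alpha_1i for the GLMM, alpha_2i for the Cox-frailty model).
alpha_mean = np.array([0.0, 0.0])
sigma1_sq, sigma2_sq, sigma12 = 0.5, 0.3, 0.2
Sigma = np.array([[sigma1_sq, sigma12],
                  [sigma12, sigma2_sq]])

m = 29  # number of clusters, matching the 29 institutions in HD.6
# One pair (alpha_1i, alpha_2i) per cluster, jointly bivariate normal.
alpha = rng.multivariate_normal(alpha_mean, Sigma, size=m)

# Correlation implied by the covariance structure:
# rho = sigma_12 / sqrt(sigma_1^2 * sigma_2^2)
rho = sigma12 / np.sqrt(sigma1_sq * sigma2_sq)
```

A nonzero `sigma12` couples the two outcome models: a cluster with a high remission random effect also tends to have a shifted frailty in the survival model.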
In the study by Wang (2013), the penalized partial likelihood method was used to
estimate both fixed and random effects. More specifically, the variance and covariance of
the random effects were estimated using both the joint model and separate models, and
the accuracy of the estimates from these models was compared. The study demonstrated
that the joint model is preferable to the separate models in various respects, such as reduced bias
in estimates of fixed-effect parameters and variance components, and decreased mean
squared error for parameter estimation.
Using the joint model, Wang's study concluded that remission status differs
significantly between patients in the two treatment arms, and that neither sex nor risk profile
predicts remission status. The study also concluded that the binary remission response
can be an important predictor of the hazard rate of survival, and that the possible
interaction effect between remission response and treatment is not statistically significant.
Wang also used Maximum Partial Likelihood (MPL) and the jackknife method to provide a
point estimate for the variance of the cluster-level random effects.
1.5 Purpose of Study
The purpose of this study is to develop both asymptotic and bootstrapping methods to detect the existence of random cluster effects in clinical and epidemiological studies with multiple outcome variables. In the previous study by Wang, no test was included to assess whether the heterogeneity among centers was statistically significant. Furthermore, the existing testing methods described in section 1.1.2 have not yet been applied to the joint modeling situation. This study addresses both of these issues by combining a bootstrapping procedure with the score-based statistic described in (1.1).
In this study we intend to test the hypothesis that, in the joint model developed for studies that contain both binary and survival outcomes, the variances of the random effect components are 0. In other words, we seek to test the following null and alternative hypotheses:

H_0: \theta_1 = \theta_2 = 0
H_1: \theta_1 > 0, or \theta_2 > 0, or both \theta_1 > 0 and \theta_2 > 0
More specifically, the objectives include:
1. Develop both asymptotic and bootstrap methods to test the presence of random
effects.
2. Evaluate the validity of both the asymptotic and bootstrap testing methods by
calculating the type I error and power of the proposed testing methods.
3. Compare the bootstrapping method with the asymptotic variance test outlined in
section 1.2.
4. Apply the proposed methods to the HD.6 clinical trial to test for cluster effects.
After establishing the type I error and power of the bootstrap and asymptotic methods, we will also investigate whether it is possible to achieve a desired level of statistical power and test size by using specific sample sizes. This investigation is important for researchers who need to choose an optimal sample size to investigate cluster-level random effects in a given sample. Typically, when designing a study, researchers want to choose a sample size so that the method used to test the hypothesis at a given test size (0.025 for a one-sided test, 0.05 for a two-sided test) achieves specified type I error and power levels. We will show that such sample size selection is possible with both the bootstrap and asymptotic methods, but the theory behind how sample size affects the achieved type I error and power, as well as the precise methodology of sample size selection, are beyond the scope of this study.
We will not pursue bias estimation of the bootstrap statistics in this study, and will instead focus on developing a convenient method that researchers can use for cluster-effect detection and protocol design. As shown above, the bootstrap technique is much easier to master and involves less complicated calculations; it provides an approximation to the distribution of the test statistic. If the bootstrap method is accurate in test size and has the same or better statistical power, it can bring great benefits to various research efforts in the future.
Chapter 2
Bootstrap and Asymptotic Variance Testing Procedures
2.1 Overview
In this chapter, we develop approaches to test for the existence of cluster effects in the multivariate models discussed in section 1.4. This is equivalent to testing the hypothesis that the random effects in the multivariate models have variance 0. The first approach is based on an asymptotic variance method such as the one presented by Liang (1987), in which both the score test statistic S and the standardized score test statistic T are calculated as illustrated in section 1.1.2. In the second approach, instead of calculating T, we use the bootstrap method to establish the distribution of the score test statistic S under the null hypothesis H_0: \theta_1 = \theta_2 = 0. By comparing this distribution with the observed score statistic S of the data, an inference can be made about the random effect components. To assess the validity of both the asymptotic and bootstrap methods, we will study the distribution of the estimates generated by both methods, and compute their type I error and power. Lastly, we will apply these methods to a Hodgkin Lymphoma clinical trial conducted by NCIC, known as the HD.6 clinical trial, to evaluate the possibility of an institution-level cluster effect in this trial.
2.2 Calculating score-based statistics S
Let \alpha_{1i} denote the random effect of the ith cluster in the GLMM, and \alpha_{2i} the random effect of the ith cluster in the Cox-frailty model. We assume that these two random effects follow a bivariate normal distribution with zero means and variance matrix \Sigma, with variances \sigma_1^2 and \sigma_2^2 for \alpha_{1i} and \alpha_{2i} respectively, and covariance \sigma_{12}:
\boldsymbol{\alpha}_i = \begin{pmatrix} \alpha_{1i} \\ \alpha_{2i} \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right) = N_2(\mathbf{0}, \Sigma)   (2.1)
Under the null hypothesis, both \alpha_{1i} and \alpha_{2i} are 0. We use \pi_{ij} to denote the probability of the biomarker response y_{ij} taking value 1:

P(Y_{ij} = 1 \mid X_{1ij}, \beta_1, \alpha_{1i}) = \pi_{ij} = \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}   (2.2)
Wang (2013) defined the full joint likelihood function as

L = \int \prod_{i=1}^{m} \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left[ -\frac{1}{2} \boldsymbol{\alpha}_i^T \Sigma^{-1} \boldsymbol{\alpha}_i \right] \prod_{j=1}^{n_i} \left( \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right)^{y_{ij}} \left( \frac{1}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right)^{1 - y_{ij}} \left[ e^{-\Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}}} \right] \left[ \lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} \right]^{\delta_{ij}} d\boldsymbol{\alpha}_i   (2.3)
Furthermore, l denotes the joint log-likelihood function and has the following form:

l = \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left\{ y_{ij}\left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log\left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) + \delta_{ij}\left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}   (2.4)

(more details in Appendix B)
Combining the log-likelihood function with the theory developed by Liang (1987), the score-based statistic S has the following form:

S = \frac{1}{2} \sum_{i=1}^{m} \left\{ \left( \frac{\partial l}{\partial \alpha_{1i}} \right)^2 + \left( \frac{\partial l}{\partial \alpha_{2i}} \right)^2 - \left( -\frac{\partial^2 l}{\partial \alpha_{1i}^2} \right) - \left( -\frac{\partial^2 l}{\partial \alpha_{2i}^2} \right) \right\}   (2.5)
To calculate the score-based statistic S, we need to derive \partial l / \partial \alpha_{1i} and \partial l / \partial \alpha_{2i}, the score functions with respect to the random effect components, as well as -\partial^2 l / \partial \alpha_{1i}^2 and -\partial^2 l / \partial \alpha_{2i}^2, the observed information functions of the random effect variables (more detailed derivations in Appendix B).
Similar to Wang (2013), we applied the first-order Laplace method to approximate the solution of (2.3) (Appendix B), and derived the necessary parts of the score-based statistic S:

\frac{\partial l}{\partial \alpha_{1i}} = \sum_{j=1}^{n_i} \left[ y_{ij} - \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}} \right] - \frac{\alpha_{1i}\sigma_2^2 - \alpha_{2i}\sigma_{12}}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

-\frac{\partial^2 l}{\partial \alpha_{1i}^2} = \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{\left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}} \right)^2} + \frac{\sigma_2^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

\frac{\partial l}{\partial \alpha_{2i}} = \sum_{j=1}^{n_i} \left[ \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} \right] - \frac{\alpha_{2i}\sigma_1^2 - \alpha_{1i}\sigma_{12}}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}

-\frac{\partial^2 l}{\partial \alpha_{2i}^2} = \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} + \frac{\sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}
To obtain the score-based statistic S under the null hypothesis, we evaluate these equations at H_0: \theta_1 = \theta_2 = 0, which is equivalent to \sigma_1^2 = \sigma_2^2 = \sigma_{12} = 0 and \alpha_{1i} = \alpha_{2i} = 0. Let S_{1i}(0) and S_{2i}(0) represent the score functions in the GLMM and Cox-frailty model evaluated under the null hypothesis H_0, and I_{1i}(0) and I_{2i}(0) represent the information functions in the GLMM and Cox-frailty model evaluated under H_0:

S_{1i}(0) = \sum_{j=1}^{n_i} \left[ y_{ij} - \frac{e^{X_{1ij}^T \beta_1}}{1 + e^{X_{1ij}^T \beta_1}} \right]

I_{1i}(0) = \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \beta_1}}{\left( 1 + e^{X_{1ij}^T \beta_1} \right)^2}

S_{2i}(0) = \sum_{j=1}^{n_i} \left[ \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2} \right]

I_{2i}(0) = \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2}
Substituting these functions into equation (2.5), and replacing \beta_1 and \beta_2 with \hat{\beta}_1 and \hat{\beta}_2, the fitted regression coefficients, the final form of the score-based statistic S is:

S = \frac{1}{2} \sum_{i=1}^{m} \left[ \left( \sum_{j=1}^{n_i} y_{ij} - \frac{e^{X_{1ij}^T \hat{\beta}_1}}{1 + e^{X_{1ij}^T \hat{\beta}_1}} \right)^2 + \left( \sum_{j=1}^{n_i} \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right)^2 - \sum_{j=1}^{n_i} \frac{e^{X_{1ij}^T \hat{\beta}_1}}{\left( 1 + e^{X_{1ij}^T \hat{\beta}_1} \right)^2} - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right]   (2.6)
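As an illustration, (2.6) reduces to a short computation once the fixed-effects models have been fitted. The sketch below is a minimal Python version (the thesis implementation was written in R); `eta1` and `eta2` stand for the fitted linear predictors X_{1ij}^T \hat{\beta}_1 and X_{2ij}^T \hat{\beta}_2, and all names are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def score_statistic(eta1, eta2, y, delta, Lambda0, cluster):
    """Score-based statistic S of (2.6).

    eta1    : fitted GLMM linear predictor per observation
    eta2    : fitted Cox linear predictor per observation
    y       : binary remission responses; delta : event indicators
    Lambda0 : cumulative baseline hazard evaluated at each time t_ij
    cluster : integer cluster label for each observation
    """
    p = np.exp(eta1) / (1.0 + np.exp(eta1))    # pi_ij under H0
    s1 = y - p                                 # per-observation GLMM score
    s2 = delta - Lambda0 * np.exp(eta2)        # per-observation Cox score
    i1 = p * (1.0 - p)                         # GLMM information terms
    i2 = Lambda0 * np.exp(eta2)                # Cox information terms
    S = 0.0
    for c in np.unique(cluster):               # sum over clusters i = 1..m
        idx = cluster == c
        S += s1[idx].sum()**2 + s2[idx].sum()**2 - i1[idx].sum() - i2[idx].sum()
    return 0.5 * S
```

The cluster-wise sums mirror the inner sums over j = 1,...,n_i in (2.6); squaring the summed scores and subtracting the summed information terms gives each cluster's contribution.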
2.3 Inference using Asymptotic Method
As previously demonstrated in Liang (1987), the asymptotic variance method involves the statistic T = S / I^{1/2}. Let l_1 denote the log-likelihood function of the GLMM; the GLMM portion of the information matrix I is

I_{GLMM} = I_{\theta_1\theta_1} - \left[ I_{\theta_1\beta_1}^T, I_{\theta_1\alpha_1} \right] \begin{bmatrix} I_{\beta_1\beta_1} & I_{\alpha_1\beta_1} \\ I_{\alpha_1\beta_1}^T & I_{\alpha_1\alpha_1} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix},

in which

I_{\theta_1\theta_1} = \sum_{i=1}^{m} \left( \frac{\partial l_{1i}}{\partial \theta_1} \right)^2, \quad I_{\alpha_1\alpha_1} = \sum_{i=1}^{m} \left( \frac{\partial l_{1i}}{\partial \alpha_1} \right)^2, \quad I_{\beta_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \beta_1} \frac{\partial l_{1i}}{\partial \beta_1^T},

I_{\theta_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \theta_1} \frac{\partial l_{1i}}{\partial \beta_1}, \quad I_{\theta_1\alpha_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \theta_1} \frac{\partial l_{1i}}{\partial \alpha_1}, \quad I_{\alpha_1\beta_1} = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \alpha_1} \frac{\partial l_{1i}}{\partial \beta_1}, \quad I_{\alpha_1\beta_1}^T = \sum_{i=1}^{m} \frac{\partial l_{1i}}{\partial \alpha_1} \frac{\partial l_{1i}}{\partial \beta_1^T}.   (2.7)
And the Cox-frailty portion is

I_{Cox} = I_{\theta_2\theta_2} - \left[ I_{\theta_2\beta_2}^T, I_{\theta_2\alpha_2} \right] \begin{bmatrix} I_{\beta_2\beta_2} & I_{\alpha_2\beta_2} \\ I_{\alpha_2\beta_2}^T & I_{\alpha_2\alpha_2} \end{bmatrix}^{-1} \begin{bmatrix} I_{\theta_2\beta_2} \\ I_{\theta_2\alpha_2} \end{bmatrix},

with

I_{\theta_2\theta_2} = \sum_{i=1}^{m} \left( \frac{\partial l_{2i}}{\partial \theta_2} \right)^2, \quad I_{\alpha_2\alpha_2} = \sum_{i=1}^{m} \left( \frac{\partial l_{2i}}{\partial \alpha_2} \right)^2, \quad I_{\beta_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \beta_2} \frac{\partial l_{2i}}{\partial \beta_2^T},

I_{\theta_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \theta_2} \frac{\partial l_{2i}}{\partial \beta_2}, \quad I_{\theta_2\alpha_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \theta_2} \frac{\partial l_{2i}}{\partial \alpha_2}, \quad I_{\alpha_2\beta_2} = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \alpha_2} \frac{\partial l_{2i}}{\partial \beta_2}, \quad I_{\alpha_2\beta_2}^T = \sum_{i=1}^{m} \frac{\partial l_{2i}}{\partial \alpha_2} \frac{\partial l_{2i}}{\partial \beta_2^T}.   (2.8)
The final form of I used in inference is the sum of I_{GLMM} and I_{Cox}. Intuitively, since the score-based statistic S is the sum of its GLMM and Cox-frailty portions, its variance approximation, I, should also be the sum of the variance approximations of these two portions.

The details of the specific parts of the I matrices are listed in Appendix C. The T statistic can then be evaluated from the values of S and I. In the original literature by Liang (1987), T is shown to follow a standard normal distribution, so the p-value of the hypothesis test can be obtained by comparing T against the standard normal distribution.
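In code, this inference step amounts to standardizing S and referring T to the standard normal distribution. A minimal Python sketch (illustrative only; the thesis used R, and the function name is an assumption):

```python
import math

def asymptotic_pvalues(S, I):
    """Standardized statistic T = S / sqrt(I) and its one- and two-sided
    p-values under the standard normal reference distribution (Liang, 1987)."""
    T = S / math.sqrt(I)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
    p_one = 1.0 - Phi(T)               # one-sided: large T suggests heterogeneity
    p_two = 2.0 * (1.0 - Phi(abs(T)))  # two-sided
    return T, p_one, p_two
```

The normal CDF is written via `math.erf` so the sketch needs only the standard library.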
2.4 Bootstrap Procedure
The bootstrap method is particularly effective for establishing the distribution of a complicated statistic, since no assumption about the distribution of the statistic is needed (Adèr et al, 2008). The distribution of the statistic S can be complicated, which makes it a good candidate for the bootstrap method as an alternative to the asymptotic method.

Suppose a dataset has N samples grouped into m clusters. Just as the score-based statistic S was calculated under the condition that both \alpha_{1i} and \alpha_{2i} have zero variances, the bootstrap procedure should also be conducted under this assumption. Therefore, we randomly sample all of the observed samples with replacement to obtain N bootstrap samples. After this, we rearrange all the bootstrap samples into clusters according to the arrangement in the original data, as if all the clusters were the same.
The bootstrap procedure consists of the following steps:
0. Start with the original dataset, which has a total of N samples grouped into m strata. Each observation j in cluster i has a covariate vector X_{ij} containing the fixed-effect covariate values such as gender, treatment arm, and risk profile. Remission status is represented by y_{ij}, time to event by t_{ij}, and the censoring indicator by \delta_{ij}. Each observation also corresponds to a cluster assignment variable \gamma_{ij}, where the \gamma_{ij} share the same value for the same value of i. In other words, each observation in the original dataset consists of a set (X_{ij}, y_{ij}, t_{ij}, \delta_{ij}, \gamma_{ij}), where i = 1,2,...,m and j = 1,2,...,n_i.
1. Randomly select, with replacement, N samples from {X_{ij}, y_{ij}, t_{ij}, \delta_{ij}: i = 1,2,...,m; j = 1,2,...,n_i} to produce the bootstrap sample {X_h^*, y_h^*, t_h^*, \delta_h^*: h = 1,2,...,N}, where N is the total sample size.
2. Pair each \gamma_{ij} with the corresponding bootstrap sample. The bootstrap sample becomes {X_{ij}^*, y_{ij}^*, t_{ij}^*, \delta_{ij}^*, \gamma_{ij}: i = 1,2,...,m; j = 1,2,...,n_i}.
3. Use the bootstrap sample to fit the fixed-effect GLMM and Cox regressions. This gives \hat{\beta}_1^* and \hat{\beta}_2^*, the estimates of the fixed-effect regression coefficients in the GLMM and Cox-frailty model.
4. Calculate the bootstrap estimate of the score statistic, S^*, by applying equation (2.5) to the bootstrap sample.
5. Repeat steps 1 to 4 B times to obtain B bootstrap estimates.
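The resampling loop in steps 1 to 5 can be sketched as follows. This is an illustrative Python version (the thesis used R); `score_fn` is a hypothetical callable standing in for steps 3 and 4, i.e. refitting the fixed-effect models on the bootstrap sample and evaluating the score statistic.

```python
import numpy as np

def bootstrap_scores(data, cluster, score_fn, B, rng=None):
    """Steps 1-5: resample N rows with replacement, keep the original
    cluster assignments (as if all clusters were exchangeable under H0),
    and recompute the score statistic on each bootstrap sample.

    data     : (N, k) array holding the (X, y, t, delta) columns
    cluster  : length-N cluster labels, reattached after resampling (step 2)
    score_fn : callable (resampled data, cluster) -> S*, i.e. steps 3-4
    """
    rng = np.random.default_rng(rng)
    N = data.shape[0]
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, N, size=N)         # step 1: with replacement
        stats[b] = score_fn(data[idx], cluster)  # steps 2-4
    return stats

def bootstrap_pvalue(S_obs, stats, two_sided=False):
    """Inference: proportion of bootstrap statistics as extreme as S_obs."""
    if two_sided:
        return float(np.mean(np.abs(stats) >= abs(S_obs)))
    return float(np.mean(stats >= S_obs))
```

Passing the refit-and-score step in as a function keeps the resampling logic separate from the model-fitting machinery.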
A large enough B establishes the distribution of the score statistic S, which is expected to be approximately normal. After calculating the observed score statistic for the dataset, we compare it with the statistics calculated from the B bootstrap samples. In a one-sided test, the p-value of the hypothesis test is the proportion of bootstrap statistics larger than the observed score-based statistic S; the null hypothesis is rejected if p < \alpha with \alpha = 0.025. For the two-sided test, the p-value is the proportion of bootstrap estimates whose absolute values are larger than that of the observed score statistic; the null hypothesis is rejected if p < \alpha with \alpha = 0.05.
2.5 Calculating Type I Error and Power
We use empirical type I error and power values to study the performance of both the bootstrap and asymptotic methods. We will use both one-sided and two-sided tests to obtain p-values. In the one-sided test we use \alpha = 0.025 as the criterion for rejecting the null hypothesis, and in the two-sided test we use \alpha = 0.05. We will generate a large number (K) of datasets with both \alpha_{1i} and \alpha_{2i} set to have zero variance, and apply both methods to each dataset to obtain K p-values. The percentage of p-values less than \alpha is the type I error. To obtain the power of the methods, we will generate K datasets with both \alpha_{1i} and \alpha_{2i} set to have positive variances. The bootstrap and asymptotic methods are again applied to the simulated datasets, and the percentage of p-values less than \alpha is the power of the bootstrap or asymptotic method.
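Both quantities are rejection proportions over the K simulated datasets. A minimal sketch (Python, names illustrative):

```python
def rejection_rate(p_values, alpha):
    """Proportion of the K simulated datasets whose test rejects H0.
    This equals the empirical type I error when the data were generated
    with zero-variance random effects, and the empirical power otherwise."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

The same function serves for both metrics; only the data-generating setting (zero vs. positive random effect variances) changes its interpretation.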
Chapter 3
Simulation Studies
3.1 Overview
In this chapter, we conduct a series of numerical simulations to evaluate the finite sample properties of the methods proposed in chapter 2. We use data generation methods similar to those used in the study by Wang. For each dataset, we pre-define the variance of the cluster effect to be either zero or a positive value. Using R, we generate multiple datasets, apply the asymptotic and bootstrap methods to them, calculate p-values, and obtain the type I error and statistical power.
3.2 Methods
Conditioning on the random effect \alpha_{1i}, the GLMM response variable y_{ij} is generated from a Bernoulli distribution with the probability given in (3.1). To simplify the process, the only fixed-effect covariate included in the model is treatment, denoted by Z_{ij}. The magnitudes of the fixed-effect coefficients are pre-defined parameters, \beta_1 = (\beta_{10}, \beta_{11}) for the GLMM and \beta_2 = (\beta_{21}, \beta_{22}, \beta_{23}) for the Cox-frailty model. To simplify comparisons, the same fixed-effect regression coefficient values, given in Table 3.1, are used in all simulation settings.
Table 3.1 Parameters of Simulation
𝛽10 𝛽11 𝛽21 𝛽22 𝛽23
-1 log(0.5) log(0.5) log(2) log(2)
P(Y_{ij} = 1 \mid z_{ij}, \beta_1, \alpha_{1i}) = \frac{e^{\beta_{10} + z_{ij}\beta_{11} + \alpha_{1i}}}{1 + e^{\beta_{10} + z_{ij}\beta_{11} + \alpha_{1i}}}   (3.1)
Conditioning on the random effect \alpha_{2i}, the failure time T_{ij} was assumed to follow an exponential distribution and was simulated from the Cox-frailty model (3.2), with the baseline hazard set to a constant of 0.15. The censoring time C_{ij} was generated from a uniform distribution Unif(0, 20) so that the censoring rate was around 20%. The fixed-effect covariates included were remission response (Y_{ij}) and treatment (Z_{ij}); to study the interaction between the two variables, an interaction term was also included. The random variable S(T), which represents the survival function, follows a uniform distribution Unif(0, 1) (denoted W_{ij}). Equation (3.2) can be rewritten as equation (3.3), which provides a solution for the failure time T_{ij}:
\lambda(t \mid X_{2ij}, \beta_2, \alpha_{2i}) = \lambda_0(t_{ij}) e^{\beta_{21} y_{ij} + \beta_{22} z_{ij} + \beta_{23} z_{ij} y_{ij} + \alpha_{2i}},   (3.2)

T_{ij} = \frac{-\log(W_{ij})}{0.15\, e^{\beta_{21} y_{ij} + \beta_{22} z_{ij} + \beta_{23} z_{ij} y_{ij} + \alpha_{2i}}}.   (3.3)
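Equation (3.3) is the standard inverse-transform method for drawing exponential failure times. A minimal Python sketch of this simulation step, assuming the Table 3.1 coefficient values (the thesis implementation was in R, and the function name is illustrative):

```python
import numpy as np

def simulate_failure_times(y, z, alpha2, rng=None):
    """Draw failure times T_ij from (3.3), with baseline hazard 0.15 and
    the Table 3.1 coefficients beta21 = log(0.5), beta22 = beta23 = log(2)."""
    rng = np.random.default_rng(rng)
    b21, b22, b23 = np.log(0.5), np.log(2.0), np.log(2.0)
    W = rng.uniform(size=len(y))                       # W_ij ~ Unif(0, 1)
    rate = 0.15 * np.exp(b21 * y + b22 * z + b23 * z * y + alpha2)
    return -np.log(W) / rate                           # T_ij of (3.3)
```

Because S(T) is Unif(0, 1), inverting the exponential survival function in this way yields failure times with exactly the hazard in (3.2).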
In each simulated dataset, center-specific random effects are generated from a bivariate normal distribution N_2(0, \Sigma), where \Sigma is the 2×2 variance-covariance matrix of the random effect variables \alpha_{1i} and \alpha_{2i}. The variances of \alpha_{1i} and \alpha_{2i} are denoted \sigma_1^2 and \sigma_2^2 respectively, and the covariance is denoted \sigma_{12}. To simplify the problem further, we distribute the observations equally into clusters in our simulated datasets.
After each simulated dataset is generated, we apply the asymptotic method of section 2.3 and the bootstrap procedure of section 2.4 to the dataset and calculate one-sided and two-sided p-values. To calculate the type I error of the tests, a large number (K) of datasets with zero variance and covariance in the random effect variables were generated by repeating the data generation procedure in R (R Core Team, 2014). From these datasets, K p-values were calculated, and the proportion of p-values smaller than \alpha is the type I error of the test (the chance of rejecting the null hypothesis when it is correct). To calculate the power of the tests (the chance of rejecting the null hypothesis when it is false), the same method was used except that at least one of the parameters \sigma_1^2, \sigma_2^2 and \sigma_{12} was made larger than 0.
3.3 Simulation Results
3.3.1 Results of Asymptotic Method
The asymptotic method outlined in Chapter 2 was applied to K = 200 datasets in each setting to calculate the empirical power; when testing datasets generated under the null hypothesis with zero-variance random effects, K = 500 replications were conducted to obtain a more stable estimate of the empirical type I error rate. Tables 3.2 and 3.3 show the type I error and power of the asymptotic method in each setting. In all simulation settings, the two-sided asymptotic test failed to achieve type I error rates lower than the test size (0.025 for the one-sided test, 0.05 for the two-sided test). The power of the method ranged from 36% to 100%, and increases with the variance of the random effects.
3.3.2 Bootstrap Estimate Distribution
To gain more understanding of the distribution of the bootstrap estimates, we simulated 4 different datasets, with (\sigma_1^2, \sigma_2^2) values of (0.0, 0.0), (0.1, 0.1), (0.2, 0.2) and (0.2, 0.1). The covariance values were set so that the correlation between \alpha_{1i} and \alpha_{2i} was 0.5. For each simulated dataset, we applied the bootstrap procedure outlined in section 2.4 with B = 499 bootstrap replications and plotted the frequency of the bootstrap estimates for each dataset in Figure 2.
Figure 2 Bootstrap Estimate Distribution of 4 Datasets
Figure 2 shows that in each trial the distribution of the bootstrap estimates is approximately centered near 0, with a roughly symmetrical shape that resembles a normal distribution. This pattern suggests that the percentile bootstrap is appropriate for the score-based statistic S.
3.3.3 Type I Error and Power Comparison
In Table 3.2, we show a group of simulations conducted using the predefined regression coefficients of Table 3.1. In each setting, K = 200 datasets were simulated, except under the null hypothesis settings, in which the variances and covariance are set to zero, where K = 500 replications were used. In each simulated dataset, n = 600 observations were generated and grouped into m = 30 clusters. For each set of pre-defined variance values, a pair of simulations was conducted so that the impact of both positive and negative correlations could be investigated. It should be noted that the K = 200 datasets used in each trial were different, as the survival times, binary marker responses and key covariates are generated at random.

In seven simulation settings, the variances of the random effect variables were set to zero. The \alpha for the one-sided test is 0.025, and 0.05 for the two-sided test. In these settings, the bootstrap method showed type I error values ranging from 0.040 to 0.064 in two-sided tests (average 0.053), and 0.015 to 0.040 in one-sided tests. In contrast, the two-sided type I error rate of the asymptotic method is consistently higher, ranging from 0.114 to 0.148, while its one-sided test ranges from 0 to 0.02. These simulations show that the bootstrap method is more accurate at recognizing data without a clustering effect.
Table 3.2 Type I error of the Asymptotic Approach and Bootstrap with K = 500 replications

σ1^2  σ2^2  Correlation   Type I error (Asymptotic)    Type I error (Bootstrap)
            coefficient   One-sided     Two-sided      One-sided     Two-sided
                          (α = 0.025)   (α = 0.05)     (α = 0.025)   (α = 0.05)
0     0     0             0.020         0.126          0.035         0.062
0     0     0             0.005         0.130          0.020         0.056
0     0     0             0.000         0.148          0.035         0.052
0     0     0             0.005         0.120          0.020         0.064
0     0     0             0.010         0.114          0.015         0.048
0     0     0             0.000         0.126          0.040         0.048
0     0     0             0.005         0.130          0.030         0.040
More simulations were done using datasets with chosen positive values of \sigma_1^2 and \sigma_2^2, with \sigma_{12} chosen so that the correlation between the two random effect variables is -0.5, 0.0 or 0.5, in order to study the effect of correlation on the type I error and power of both testing methods. 200 datasets were simulated in each trial.
Table 3.3 Power of Asymptotic and Bootstrap Method
In all settings, the bootstrap method managed to achieve higher power (lower type II error) than the asymptotic method. These simulations also show that the correlation between the two random effect variables in the joint model has only a small effect on the power of either approach. When the variance values are large enough, both the bootstrap and asymptotic methods can detect the random effects with almost 100% power.
σ1^2  σ2^2  Correlation   Power (Asymptotic)           Power (Bootstrap)
            coefficient   One-sided     Two-sided      One-sided     Two-sided
                          (α = 0.025)   (α = 0.05)     (α = 0.025)   (α = 0.05)
0.05  0.05   0.5          0.380         0.380          0.580         0.520
0.05  0.05   0            0.365         0.365          0.700         0.510
0.05  0.05  -0.5          0.365         0.365          0.655         0.592
0.05  0.06   0.5          0.480         0.485          0.725         0.680
0.05  0.06   0            0.465         0.465          0.655         0.715
0.05  0.06  -0.5          0.490         0.495          0.775         0.725
0.05  0.1    0.5          0.740         0.740          0.900         0.905
0.05  0.1    0            0.735         0.735          0.975         0.940
0.05  0.1   -0.5          0.735         0.735          0.940         0.930
0.1   0.05   0.5          0.435         0.440          0.675         0.645
0.1   0.05   0            0.490         0.680          0.680         0.690
0.1   0.05  -0.5          0.430         0.630          0.630         0.655
0.1   0.1    0.5          0.790         0.790          0.915         0.940
0.1   0.1    0            0.835         0.835          0.975         0.945
0.1   0.1   -0.5          0.825         0.825          0.950         0.945
0.1   0.25   0.5          0.990         0.990          1.000         1.000
0.1   0.25  -0.5          1.000         1.000          0.995         0.996
0.1   0.5    0.5          1.000         1.000          1.000         1.000
0.1   0.5   -0.5          1.000         1.000          1.000         1.000
0.25  0.1    0.5          0.945         0.972          0.972         0.972
0.25  0.1   -0.5          0.910         0.968          0.968         0.968
0.5   0.1    0.5          0.980         1.000          1.000         1.000
0.5   0.1   -0.5          0.980         0.974          0.974         0.974
For epidemiologists and clinical trial professionals, power levels around 80% are of most interest. We fixed the variances and covariance of the random effects and chose a range of sample sizes to demonstrate that it is possible to achieve a certain level of statistical power by changing the sample size. Table 3.4 shows several such simulation results, with \sigma_1^2, \sigma_2^2 and \sigma_{12} held at 0.05, 0.05, and 0.025 respectively in all settings. It was observed that a change in sample size and/or cluster number can change the power of both the bootstrap and asymptotic methods, and both methods achieved a statistical power of 80% at certain combinations of sample size and cluster number. The general trend observed is that the power of both methods increases as the sample size increases.
Table 3.4 Relationship between Sample Size and Statistical Power
Total    Number of    Power (Asymptotic)        Power (Bootstrap)
sample   clusters     One-sided    Two-sided    One-sided    Two-sided
1200 40 0.820 0.795 0.905 0.915
1000 50 0.670 0.665 0.825 0.825
1000 40 0.715 0.705 0.855 0.885
800 40 0.525 0.520 0.700 0.745
600 30 0.370 0.400 0.580 0.520
500 25 0.290 0.300 0.575 0.610
500 20 0.330 0.355 0.690 0.565
400 20 0.170 0.280 0.415 0.495
350 14 0.210 0.245 0.425 0.475
Chapter 4 Application and Discussion
4.1 Application to HD.6 Clinical Trial
We applied both the bootstrap (B = 899) and the asymptotic method to the HD.6 clinical trial data. In the Cox-frailty model, we excluded the interaction term between remission response and treatment arm, and included risk profile as a fixed-effect covariate. The reason is that risk profile is a statistically significant predictor of the survival hazard, while the interaction between response and treatment arm is not statistically significant (Wang, 2013). The observed value of the score-based statistic S is -7.1 and the corresponding p-value is 0.485. The distribution of the bootstrap estimates has a mean of -3.6. Using the bootstrap method, there was no evidence that the cluster-level random effect has nonzero variance, with a p-value of 0.51 in the one-sided test and 0.61 in the two-sided test. The asymptotic method reached the same conclusion, with a p-value of 0.88 for the one-sided test and 0.24 for the two-sided test. Both methods showed that the HD.6 clinical trial is not affected by a cluster effect at either the 0.025 or 0.05 level of significance.
The distribution of the bootstrap estimates for the HD.6 data shows a right skew, which is not seen in the distributions of the simulated datasets in Figure 2. Figure 3 plots the distribution of the bootstrap estimates.
Figure 3 Distribution of bootstrap estimates obtained from HD.6 clinical trial data
This skewness could be caused by the uneven distribution of participants in the trial: in the resampling process, participants from a particular institution are more likely to be sampled if that institution has more participants. The resulting distribution differs from that of the bootstrap estimates generated from the evenly distributed data shown in Figure 2. However, since the one-sided asymptotic test, which has a very low type I error, confirmed that there is no statistically significant cluster-level effect, we accept the conclusion reached by the bootstrap method.
In the original study by Meyer et al (2012), Kaplan-Meier estimates were used to analyze the survival and remission status of the patients as a whole and within groups of different risk profiles. In all scenarios of that analysis, the cluster effect was not accounted for. Our analysis shows that it is not necessary to account for random differences between the medical centers from which the patients were recruited, since such differences are not statistically significant. The study by Meyer et al eventually concluded that treatment with ABVD therapy alone was associated with a higher rate of overall survival, and this conclusion does not need to be altered to account for a cluster effect.
This indicates that future analyses of this dataset do not have to account for this type of random effect, which would drastically decrease the complexity of the model and of the inference process.
Previously, Wang (2013) used both the multivariate penalized likelihood (MPL) and Jackknife (JK) resampling methods to study the joint modeling of treatment response and survival time. In that model, a cluster-level joint random effect between remission response and survival was taken into account, and it was concluded that "the effects of variance components are not statistically significant and therefore the association between two endpoints through joint random effects is negligible". In fact, the reason the MPL algorithm was used in the first place was the potential cluster-level random effect. The alternatives to the MPL algorithm are the more widely used maximum likelihood methods such as the Expectation-Maximization (EM) algorithm. These methods can reduce bias and make inference efficient, but can involve an "intractable high-dimensional integral" due to potential but unobservable random effects, making computation much more time-consuming (Wulfsohn et al. 1997; Wang 2013). Since we found no evidence of cluster-level random effects in this study, statistical inference can be greatly simplified.
4.2 Comparison of bootstrap method with the asymptotic method
As demonstrated in Section 3.3, the bootstrap shows improved power at only B = 299 iterations compared to the asymptotic method in both one-sided and two-sided tests. For the asymptotic method, the type I error is close to zero when a one-sided test is used, and well above 0.05 when a two-sided test is used. These results show that the asymptotic tests are not accurate in achieving the nominal test level \alpha. The fact that the two-sided tests achieved type I errors much larger than those of the one-sided tests suggests that the assumption about the distribution of the standardized statistic T may not be appropriate in the joint model scenario. In contrast, the bootstrap method achieved acceptable levels of type I error using both one-sided and two-sided tests, and since its type I errors are not close to zero, the bootstrap method is more appropriate for application in epidemiological studies.
The bootstrap method can also be applied to types of data other than clinical trial data. The asymptotic method requires complicated calculations involving derivatives of the log-likelihood function and matrix operations, and it assumes that the T statistic follows a standard normal distribution, which may not hold. The bootstrap method presents a non-parametric and relatively easier way to detect whether a cluster-level random effect is present in the data. This allows practitioners to quickly check the data at hand for cluster-level random effects, and to decide whether it is appropriate to include an extra variable in the model specifically for this effect.
4.3 Sample Size Selection
In practice, when the cluster-level random effect is the variable under investigation and is thus included in the model, researchers want to choose a sample size so that the modeling method has a certain type I error and power, usually 0.05 and 0.80 by convention (Kelsey et al, 1996). For example, when designing case-control studies, given theoretical values of the odds ratios, an appropriate sample size can be chosen from a simple formula so that the analytical method has a type I error of 0.05 and power of 0.80 (Kelsey et al, 1996).

In this study, we attempted to show that the asymptotic and bootstrap methods can achieve certain levels of statistical power at certain sample sizes. Section 3.3 showed that the power of the bootstrap test varies with sample size and cluster number, and that the type I error of the method approximates \alpha in multiple simulated scenarios. Therefore, in studies where the bootstrap method is applied, an optimal sample size can be chosen so that the type I error and power are \alpha and 0.80, given the possible range of the variance of the cluster-level random effect. Since the main objective of this study is to evaluate the finite sample properties of both the bootstrap and asymptotic methods, we did not investigate this sample size determination problem in detail; it would be worthwhile to investigate in the future.
When the asymptotic method is used, the type I error of the one-sided test is close to zero, with the exception of one setting in our simulation study, and the type I error of the two-sided test is well above 0.05. Therefore, if the asymptotic method is used in an epidemiological study, it is difficult to select a sample size that achieves a type I error close to \alpha. Furthermore, the power of the asymptotic test is lower than that of the bootstrap method at every variance level, for both one-sided and two-sided tests. This means that more samples are needed to achieve 80% statistical power using the asymptotic method. Thus, the bootstrap is the preferable method for epidemiological studies.
4.4 Future Directions
The bootstrap method demonstrated in this study can be used in multiple scenarios, without the need for specialized algorithms, to directly test for cluster-level random effects in a non-parametric manner. For example, a current hot topic in clinical trial research is the joint modeling of longitudinal measurements and survival data. As opposed to the binary response marker (remission status) that is measured only once in the HD.6 clinical trial, researchers are very interested in biological markers that are measured throughout the course of a trial, and in how they can predict survival. The bootstrap method demonstrated in this study can be applied to this type of joint model, because the score-based statistic S we used was originally developed for the GLMM in the study by Liang (1987). Longitudinal biomarker measurements in clinical trials can be modeled using a GLMM, for which our bootstrap method can be used for random-effect testing. Li and Wang (2008) also demonstrated the use of a smooth bootstrap method for the analysis of longitudinal data.
In future simulation studies, the number of observations could be allowed to differ
across clusters, and the effect of this setting on the power and type I error of the
bootstrap method could be investigated. This is worthwhile to study because clusters
with unequal numbers of observations are a more realistic scenario. The performance of
both the asymptotic and bootstrap methods should also be studied on data in which only
one of the cluster-level random effects in the joint model has zero variance.
4.5 Computational Expense
When both methods are implemented in R, the bootstrap is substantially more
computationally expensive than the asymptotic method. This is because the bootstrap
repeats the calculation of the S statistic B times, which we implement with a loop-based
mechanism. More specifically, the program must first loop through each cluster to
calculate the quantities needed for subsequent steps (S1i(0) for the GLMM, I1i(0) for the
Cox-frailty model), sum these quantities across all centres in a dataset, and then repeat
this for each of the B resampled datasets. As a consequence, the computation time
increases with both the number of bootstrap replicates B and the sample size n, roughly
in proportion to their product. The bootstrap method becomes particularly time-
consuming when we determine its type I error or power for a particular level of random
effect, since the entire bootstrap procedure is then repeated 200 times, once per simulated
dataset. In contrast, the asymptotic method is implemented in a vectorized manner. In the
future, the bootstrap could be implemented in other software, or better algorithms could
be written so that a vectorized implementation can be applied to the process.
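One possible step toward vectorization is to replace the explicit loop over clusters with base R's rowsum(), which computes all per-cluster sums in a single pass. The sketch below uses an arbitrary stand-in vector of per-subject score contributions rather than the actual S components; the names are illustrative, not the thesis code.

```r
## Per-cluster sums: loop version vs. a single rowsum() call.
set.seed(2)
n <- 1000
cluster <- rep(1:20, each = 50)
score_contrib <- rnorm(n)              # stand-in for per-subject score terms

## loop-based version (as in the appendix code)
loop_sums <- numeric(20)
for (i in 1:20) loop_sums[i] <- sum(score_contrib[cluster == i])

## vectorized version: one pass over the data, no explicit loop
vec_sums <- rowsum(score_contrib, cluster)[, 1]

all.equal(unname(vec_sums), loop_sums) # TRUE
```

Because rowsum() groups and sums in compiled code, it avoids the repeated subsetting cost of the loop, which grows with the number of clusters.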
Chapter 5
Summary and Future Directions
The asymptotic method has been used to test for the presence of a cluster-level random
effect in a number of previous studies (Liang, 1987; Lin, 1997). The method involves
deriving a score-based test statistic S and, subsequently, a standardized version of the
score test statistic, T, which is shown to have a standard normal distribution. The p-value
of the T statistic can be interpreted as the probability of observing a statistic at least as
extreme as T by random chance alone if no cluster-level random effect were present.
In this study, we formulated a new bootstrap procedure to achieve the same
purpose. Simulations showed that the bootstrap method has a well-controlled type I
error rate and is more powerful at detecting small values of the cluster-level random
effect. We applied both the bootstrap method and the asymptotic method to the NCIC
Clinical Trials Group HD.6 clinical trial. Both methods showed that the data of the
clinical trial are not affected by a cluster-level random effect. This conclusion has a
significant impact on decisions about how the trial's data are analyzed, and it means that
the previous work by Wang (2013) can be simplified.
When it comes to study design, practitioners are often interested in choosing a
sample size such that the chosen method achieves a specified type I error and power. We
have shown that the statistical power of both the asymptotic and bootstrap methods
changes with sample size, as well as with the variance and covariance of the random
effects themselves. The theoretical basis of this sample-size selection problem is beyond
the scope of this study. In future studies, more effort could be devoted to the theoretical
connection between sample size and the type I error and power, so that a more specific
procedure can be established for choosing appropriate sample sizes given theoretical
values of the random-effect variances.
Another future research direction is to investigate how the bootstrap method
performs on datasets with unequal cluster sizes. We kept the number of observations in
each cluster equal in our simulations to simplify the problem, but a more realistic setting
is one in which the number of observations varies across clusters. The HD.6 clinical trial
data have this feature, which could be the reason the distribution of the bootstrap
estimates shows a right skew. The conclusions of the bootstrap and asymptotic methods
regarding the cluster-level random effect agree with each other, but the effect of unequal
cluster sizes on the statistical power of the bootstrap method needs to be determined so
that the method can be applied more appropriately in practical scenarios.
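Such a simulation could be set up by drawing the cluster sizes at random before assigning centre IDs, instead of the equal n/ncentre split used in simu.joint. A minimal base-R sketch follows; the names and the size range 5–40 are illustrative assumptions, not the thesis code.

```r
## Generating unequal cluster sizes for a future simulation study.
set.seed(3)
ncentre <- 15
sizes <- sample(5:40, ncentre, replace = TRUE)   # unequal cluster sizes
centre <- factor(rep(1:ncentre, times = sizes))  # one centre ID per subject
n <- length(centre)                              # total sample size now varies
counts <- table(centre)                          # per-cluster counts equal sizes
```

The rest of the data-generating code (random effects, covariates, survival times) could then index subjects by this centre factor exactly as in the equal-size case.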
Bibliography
Adèr, H.J., Mellenbergh, G.J., & Hand, D.J. (2008). Advising on research methods: A
consultant's companion. Huizen, The Netherlands: Johannes van Kessel Publishing.
Bassuk, S.S., Glass, T.A., Berkman, L.F. (1999). Social disengagement and incident
cognitive decline in community-dwelling elderly persons. Ann Intern Med, 131:165-73.
Bickel, P.J., Freedman, D.A. (1981). Some asymptotic theory for the bootstrap. Ann
Statist, 9:1196-1217.
Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and Their Application.
Cambridge University Press.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann Statist, 7:1-26.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American
Statistical Association, 82(397):171-185.
Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap. Boca Raton, FL:
Chapman & Hall/CRC.
Glidden, D.V., Vittinghoff, E. (2004). Modeling clustered survival data from multicenter
clinical trials. Stat Med, 23(3):369-88.
Gray, R.J. (1995). Testing for variation over groups in survival data. Journal of the
American Statistical Association, 90(429).
Kelsey, J.L., Whittemore, A.S., Evans, A.S. (1996). Methods in Observational
Epidemiology. Oxford University Press.
Laine, C., Venditti, L., Localio, R., Wickenheiser, L., Morris, D.L. (1998). Combined
cardiac catheterization for uncomplicated ischemic heart disease in a Medicare
population. Am J Med, 105:373-9.
Li, Y., Wang, Y.G. (2008). Smooth bootstrap methods for analysis of longitudinal data.
Stat Med, 27(7):937-53.
Liang, K.Y. (1987). A locally most powerful test for homogeneity with many strata.
Biometrika, 74(2):259-64.
Liang, K.Y., Zeger, S. (1986). Longitudinal data analysis using generalized linear
models. Biometrika, 73(1):13-22.
Lin, D.Y., Wei, L.J. (1991). Goodness-of-fit tests for the general Cox regression model.
Statistica Sinica, 1:1-17.
Localio, A.R., Berlin, J.A., Ten Have, T.R., Kimmel, S.E. (2001). Adjustments for center
in multicenter studies: An overview. Ann Intern Med, 135:112-123.
Mantel, N., Haenszel, W. (1959). Statistical aspects of the analysis of data from
retrospective studies of disease. J Natl Cancer Inst, 22:719-748.
Matsuyama, Y., Sakamoto, J., Ohashi, Y. (1998). A Bayesian hierarchical survival model
for the institutional effects in a multi-centre cancer clinical trial. Stat Med,
17(17):1893-908.
Meyer, R.M., Gospodarowicz, M.K., Connors, J.M., Pearcey, R.G., Bezjak, A., Wells,
W.A., Burns, B.F., Winter, J.N., Horning, S.J., Dar, A.R., Djurfeldt, M.S., Ding, K.,
Shepherd, L.E. (2005). Randomized comparison of ABVD chemotherapy with a strategy
that includes radiation therapy in patients with limited-stage Hodgkin's lymphoma:
National Cancer Institute of Canada Clinical Trials Group and the Eastern Cooperative
Oncology Group. J Clin Oncol, 23(21):4634-42.
R Core Team (2014). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Ripatti, S., Palmgren, J. (2000). Estimation of multivariate frailty models using penalized
partial likelihood. Biometrics, 56(4):1016-22.
Rubin, D. (1981). The Bayesian bootstrap. Ann Statist, 9:130-134.
Singh, K. (1981). On the asymptotic accuracy of Efron's bootstrap. Ann Statist,
9:1187-1195.
Smith, P.J., Heitjan, D.F. (1993). Testing and adjusting for departures from nominal
dispersion in generalized linear models. Appl Statist, 42(1):31-41.
Steck, H., Jaakkola, T.S. (2003). Bias-corrected bootstrap and model uncertainty.
Advances in Neural Information Processing Systems 16.
Wang, J. (2013). Joint Modeling of Binary Response and Survival Data in Clinical Trials.
Department of Public Health Sciences, Queen's University, Kingston, Ontario.
Wulfsohn, M.S., Tsiatis, A.A. (1997). A joint model for survival and longitudinal data
measured with error. Biometrics, 53(1):330-9.
Ye, W., Lin, X.H., Taylor, J. (2008). A penalized likelihood approach to joint modeling of
longitudinal measurements and time-to-event data. Statistics and Its Interface, 1:33-45.
Appendix A
Derivation of Score-based Statistics
Let $\beta$ represent the fixed-effect regression parameters and $\alpha_i$ represent the random
effect specific to cluster $i$, where $\alpha_i = \alpha + \theta^{1/2} v_i$, with each component as defined in
Section 1.1.2. We start from the density function of cluster $i$, denoted $f(\beta, \alpha_i)$. The
log-likelihood function of cluster $i$ is denoted $l_i$ and has the form

$$l_i = \log \int f(\beta, \alpha_i)\, dF(v_i) = \log \int f(\beta, \alpha + \theta^{1/2} v_i)\, dF(v_i).$$

The ultimate goal is to derive the derivative of the log-likelihood function with respect to
$\theta$. Applying the chain rule,

$$\frac{\partial l_i}{\partial \theta} = \frac{\partial l_i}{\partial \alpha_i} \frac{\partial \alpha_i}{\partial \theta}
= \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v_i)}{\partial \alpha_i}\, v_i\, dF(v_i)}{f(\beta, \alpha + \theta^{1/2} v_i)} \times \frac{1}{2}\theta^{-1/2}
= \frac{1}{2} \cdot \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, \theta^{-1/2} v\, dF(v)}{f(\beta, \alpha + \theta^{1/2} v)}.$$

Since $\theta^{-1/2}$ diverges as $\theta \to 0$, this expression is indeterminate at $\theta = 0$, so we apply
L'Hôpital's rule to the numerator:

$$\lim_{\theta \to 0} \frac{\int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, v\, dF(v)}{\theta^{1/2}}
= \lim_{\theta \to 0} \frac{\frac{d}{d\theta} \int \frac{\partial f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i}\, v\, dF(v)}{\frac{d}{d\theta}\, \theta^{1/2}}$$

and, applying the same chain rule as above,

$$= \lim_{\theta \to 0} \frac{\frac{1}{2}\theta^{-1/2} \int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v)}{\frac{1}{2}\theta^{-1/2}}
= \lim_{\theta \to 0} \int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v).$$

From here we omit the limit sign and evaluate the final expression at $\theta = 0$.
Since $v$ is a random variable with mean 0 and unit variance, $\int v^2\, dF(v) = E(v^2)
= [E(v)]^2 + \mathrm{var}(v) = 1$. Therefore

$$\int \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}\, v^2\, dF(v) = \frac{\partial^2 f(\beta, \alpha + \theta^{1/2} v)}{\partial \alpha_i^2}$$

and

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \cdot \frac{\frac{\partial^2}{\partial \alpha_i^2} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)}.$$

Adding and then subtracting $\left[\frac{\frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)}\right]^2$ from this expression, we have

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \left\{ \left[ \frac{\frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v)}{f(\beta, \alpha + \theta^{1/2} v)} \right]^2
+ \frac{\frac{\partial^2}{\partial \alpha_i^2} f(\beta, \alpha + \theta^{1/2} v) \cdot f(\beta, \alpha + \theta^{1/2} v) - \left[ \frac{\partial}{\partial \alpha_i} f(\beta, \alpha + \theta^{1/2} v) \right]^2}{\left[ f(\beta, \alpha + \theta^{1/2} v) \right]^2} \right\}.$$

The part after the addition sign is the result of applying the chain rule and then the
quotient rule to $\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v)$, and the part before the addition sign is the
result of applying the chain rule to $\frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v)$. Hence

$$\frac{\partial l_i}{\partial \theta} = \frac{1}{2} \left[ \left\{ \frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v) \right\}^2 - \left\{ -\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v) \right\} \right],$$

evaluated at $\theta = 0$, which is the same as evaluating at $\alpha_i = 0$. Here
$\frac{\partial}{\partial \alpha_i} \log f(\beta, \alpha + \theta^{1/2} v)$ is the score function, denoted $S(\alpha_i)$, and
$-\frac{\partial^2}{\partial \alpha_i^2} \log f(\beta, \alpha + \theta^{1/2} v)$ is the observed information function, denoted $I(\alpha_i)$.
Summing over the $n$ clusters gives the score-based statistic

$$S = \sum_{i=1}^n \frac{\partial l_i}{\partial \theta} = \frac{1}{2} \sum_{i=1}^n \left[ \{S_i(0)\}^2 - I_i(0) \right].$$
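As a numerical sanity check of this final expression (a sketch on simulated data under the null, not part of the thesis derivation), consider an intercept-only logistic model, where each cluster's score $S_i(0)$ reduces to $\sum_j (y_{ij} - \hat{p})$ and the observed information $I_i(0)$ to $\sum_j \hat{p}(1-\hat{p})$. All names below are illustrative.

```r
## Evaluating S = 0.5 * sum_i [ S_i(0)^2 - I_i(0) ] for an
## intercept-only logistic model on simulated null data.
set.seed(4)
m <- 20; nper <- 30
cluster <- rep(1:m, each = nper)
y <- rbinom(m * nper, 1, 0.3)
phat <- mean(y)                           # MLE of p under the null

score_i <- rowsum(y - phat, cluster)[, 1] # per-cluster S_i(0)
info_i  <- nper * phat * (1 - phat)       # per-cluster I_i(0)
S <- 0.5 * sum(score_i^2 - info_i)        # score-based statistic
```

Under the null the positive and negative per-cluster contributions roughly cancel, so S should fluctuate around zero; a large positive S signals between-cluster heterogeneity.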
Appendix B
Deriving Score-Based Statistics in the Joint Model
In the study by Wang (2013), parameter estimation was done using a multivariate
penalized likelihood method. The approximate marginal log-likelihood function $l$ for
the joint model is

$$l = l_1 + l_2 + l_3 + l_4,$$

where

$$l_1 = -\frac{m}{2} \log |\Sigma|,$$

$$l_2 = -\frac{1}{2} \sum_{i=1}^m \left\{ \log \left[ \sum_{j=1}^{n_i} -\frac{e^{X_{1ij}^T \beta_1 + \alpha_{1i}}}{\left(1 + e^{X_{1ij}^T \beta_1 + \alpha_{1i}}\right)^2} + \frac{\sigma_2^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \right]
+ \log \left[ \sum_{j=1}^{n_i} -\Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_{2i}} + \frac{\sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2} \right] \right\},$$

$$l_3 = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right)
+ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\},$$

$$l_4 = -\frac{1}{2} \sum_{i=1}^m \frac{\alpha_{1i}^2 \sigma_2^2 - 2\alpha_{1i}\alpha_{2i}\sigma_{12} + \alpha_{2i}^2 \sigma_1^2}{\sigma_1^2 \sigma_2^2 - \sigma_{12}^2}.$$

When estimating random effects, $l_2$ was excluded from the log-likelihood function, based
on the claims by Ripatti and Palmgren (2000) and Ye et al. (2008) that $l_2$ has a negligible
effect on the estimation of the random effects. Since $l_4$ becomes 0 when the expression is
evaluated under the null hypothesis, and $l_1$ is a constant, we use only $l_3$ as the
log-likelihood function $l$ when deriving the parts needed for the asymptotic variance
statistic $I$:

$$l = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right)
+ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

The function above can be seen as the sum of two parts: $l_G$, which contains only the
random effect in the GLMM, and $l_C$, which contains only the random effect in the
Cox-frailty model:

$$l_G = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_C = \sum_{i=1}^m \sum_{j=1}^{n_i} \left\{ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

Let $l_{Gi}$ and $l_{Ci}$ represent the GLMM portion and the Cox-frailty portion for the
individuals in cluster $i$:

$$l_{Gi} = \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_{Ci} = \sum_{j=1}^{n_i} \left\{ \delta_i \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\}.$$

The score-based test statistic $S$ is given by

$$S = \frac{1}{2} \sum_{i=1}^m \left( \frac{\partial l_{Gi}}{\partial \theta_1} + \frac{\partial l_{Ci}}{\partial \theta_2} \right).$$

From the conclusion in Appendix A, we have

$$\frac{\partial l_{Gi}}{\partial \theta_1} = \left\{ \frac{\partial l_{Gi}}{\partial \alpha_{1i}} \right\}^2 - \left\{ -\frac{\partial^2 l_{Gi}}{\partial \alpha_{1i}^2} \right\}, \qquad
\frac{\partial l_{Ci}}{\partial \theta_2} = \left\{ \frac{\partial l_{Ci}}{\partial \alpha_{2i}} \right\}^2 - \left\{ -\frac{\partial^2 l_{Ci}}{\partial \alpha_{2i}^2} \right\},$$

where both expressions are evaluated at $\alpha_{1i} = \alpha_{2i} = 0$. Since the log-likelihood function
$l_i$ is split into parts that each contain only one type of random effect,
$\frac{\partial l_{Gi}}{\partial \theta_1} = \frac{\partial l_i}{\partial \theta_1}$ and $\frac{\partial l_{Ci}}{\partial \theta_2} = \frac{\partial l_i}{\partial \theta_2}$, and the score-based statistic has the form shown in
equation (2.5):

$$S = \frac{1}{2} \sum_{i=1}^n \left\{ \left( \frac{\partial l}{\partial \alpha_{1i}} \right)^2 + \left( \frac{\partial l}{\partial \alpha_{2i}} \right)^2
- \left( -\frac{\partial^2 l}{\partial \alpha_{1i}^2} \right) - \left( -\frac{\partial^2 l}{\partial \alpha_{2i}^2} \right) \right\}.$$
Appendix C
Application of Asymptotic Variance in GLMM and Cox-frailty Joint
Model
The objective is to derive the asymptotic statistic $T$ of Section 1.1.2, where $T = S / I^{1/2}$.
Since $I$ approximates the variance of the score-based statistic $S$, it can be calculated as
the sum of the asymptotic variance approximation of the GLMM ($I_{GLMM}$) and that of
the Cox-frailty model ($I_{Cox}$). Just as the log-likelihood function was split in Appendix B
to calculate the score-based statistic $S$, we do the same for the asymptotic variance
statistic $I$:

$$I_{GLMM} = I_{\theta_1\theta_1} - \left[ I_{\theta_1\beta_1}^T,\ I_{\theta_1\alpha_1} \right]
\begin{bmatrix} I_{\beta_1\beta_1} & I_{\alpha_1\beta_1} \\ I_{\alpha_1\beta_1}^T & I_{\alpha_1\alpha_1} \end{bmatrix}^{-1}
\begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix},$$

in which $I_{\beta_1\beta_1}$ is a $p \times p$ matrix, $I_{\alpha_1\beta_1}$ is a $p \times 1$ matrix and $I_{\alpha_1\beta_1}^T$ is its transpose, and
$I_{\theta_1\theta_1}$ and $I_{\alpha_1\alpha_1}$ are both scalar quantities. Furthermore, $[I_{\theta_1\beta_1}^T,\ I_{\theta_1\alpha_1}]$ is a
$1 \times (p+1)$ row vector and $\begin{bmatrix} I_{\theta_1\beta_1} \\ I_{\theta_1\alpha_1} \end{bmatrix}$ is its transpose. The components are

$$I_{\theta_1\theta_1} = \sum_{i=1}^m \left( \frac{\partial l_i}{\partial \theta_1} \right)^2, \quad
I_{\alpha_1\alpha_1} = \sum_{i=1}^m \left( \frac{\partial l_i}{\partial \alpha_1} \right)^2, \quad
I_{\beta_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \beta_1} \frac{\partial l_i}{\partial \beta_1^T},$$

$$I_{\theta_1\alpha_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \theta_1} \frac{\partial l_i}{\partial \alpha_1}, \quad
I_{\theta_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \theta_1} \frac{\partial l_i}{\partial \beta_1}, \quad
I_{\alpha_1\beta_1} = \sum_{i=1}^m \frac{\partial l_i}{\partial \alpha_1} \frac{\partial l_i}{\partial \beta_1}.$$

These parts are all evaluated at $\theta_1 = 0$ and then substituted into $I_{GLMM}$. Analogously,

$$I_{Cox} = I_{\theta_2\theta_2} - \left[ I_{\theta_2\beta_2}^T,\ I_{\theta_2\alpha_2} \right]
\begin{bmatrix} I_{\beta_2\beta_2} & I_{\alpha_2\beta_2} \\ I_{\alpha_2\beta_2}^T & I_{\alpha_2\alpha_2} \end{bmatrix}^{-1}
\begin{bmatrix} I_{\theta_2\beta_2} \\ I_{\theta_2\alpha_2} \end{bmatrix},$$

with components defined in the same way but with the subscript 1 replaced by 2, and all
parts evaluated at $\theta_2 = 0$ before being substituted into $I_{Cox}$.

As established in Appendix B, the log-likelihood contributions of cluster $i$ are

$$l_{Gi} = \sum_{j=1}^{n_i} \left\{ y_{ij} \left( X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i} \right) - \log \left( 1 + e^{X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}} \right) \right\},$$

$$l_{Ci} = \sum_{j=1}^{n_i} \left\{ \delta_{ij} \left[ \log \lambda_0(t_{ij}) + X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i} \right] - \Lambda_0(t_{ij}) e^{X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}} \right\},$$

where $y_{ij}$ is the remission response and $\delta_{ij}$ the event indicator of subject $j$ in cluster $i$.
Now we take a series of partial derivatives to obtain the asymptotic variance of the
proposed score test statistic for the joint model. To shorten the notation, write
$\eta_{1ij} = X_{1ij}^T \beta_1 + \alpha_1 + \theta_1^{1/2} v_{1i}$ and $\eta_{2ij} = X_{2ij}^T \beta_2 + \alpha_2 + \theta_2^{1/2} v_{2i}$. For the GLMM portion,

$$\frac{\partial l_{Gi}}{\partial \theta_1} = \left( \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right) \right)^2 - \sum_{j=1}^{n_i} \frac{e^{\eta_{1ij}}}{\left( 1 + e^{\eta_{1ij}} \right)^2},$$

$$\frac{\partial l_{Gi}}{\partial \alpha_1} = \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right),$$

$$\frac{\partial l_{Gi}}{\partial \beta_1} = \sum_{j=1}^{n_i} \left( y_{ij} - \frac{e^{\eta_{1ij}}}{1 + e^{\eta_{1ij}}} \right) X_{1ij}^T, \qquad
\left( \frac{\partial l_{Gi}}{\partial \beta_1} \right)^2 = \frac{\partial l_{Gi}}{\partial \beta_1^T} \frac{\partial l_{Gi}}{\partial \beta_1}.$$

Summing across all clusters, evaluating under the null hypothesis, and replacing the
unknown parameter $\beta_1$ by its maximum likelihood estimate (MLE) $\hat{\beta}_1$, with the fitted
probability written as $\hat{p}_{1ij} = e^{X_{1ij}^T \hat{\beta}_1} / (1 + e^{X_{1ij}^T \hat{\beta}_1})$ so that
$e^{X_{1ij}^T \hat{\beta}_1} / (1 + e^{X_{1ij}^T \hat{\beta}_1})^2 = \hat{p}_{1ij}(1 - \hat{p}_{1ij})$, gives

$$I_{\theta_1\theta_1} = \sum_{i=1}^m \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right]^2,$$

$$I_{\alpha_1\alpha_1} = \sum_{i=1}^m \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2,$$

$$I_{\beta_1\beta_1} = \sum_{i=1}^m \left[ \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij} \right] \left[ \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right],$$

$$I_{\theta_1\alpha_1} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right] \times \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right\},$$

$$I_{\theta_1\beta_1} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right)^2 - \sum_{j=1}^{n_i} \hat{p}_{1ij}(1 - \hat{p}_{1ij}) \right] \times \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right\},$$

$$I_{\alpha_1\beta_1} = \sum_{i=1}^m \left\{ \left( \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) \right) \sum_{j=1}^{n_i} (y_{ij} - \hat{p}_{1ij}) X_{1ij}^T \right\},$$

with $I_{\alpha_1\beta_1}^T$ the corresponding expression with $X_{1ij}$ in place of $X_{1ij}^T$. These functions
make up $I_{GLMM}$.

For the Cox-frailty portion,

$$\frac{\partial l_{Ci}}{\partial \theta_2} = \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{\eta_{2ij}},$$

$$\frac{\partial l_{Ci}}{\partial \alpha_2} = \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right),$$

$$\frac{\partial l_{Ci}}{\partial \beta_2} = \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{\eta_{2ij}} \right) X_{2ij}^T, \qquad
\left( \frac{\partial l_{Ci}}{\partial \beta_2} \right)^2 = \frac{\partial l_{Ci}}{\partial \beta_2} \frac{\partial l_{Ci}}{\partial \beta_2^T}.$$

Summing each component across all clusters, evaluating under the null hypothesis, and
replacing the unknown parameter $\beta_2$ by its MLE $\hat{\beta}_2$ gives

$$I_{\theta_2\theta_2} = \sum_{i=1}^m \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right]^2,$$

$$I_{\alpha_2\alpha_2} = \sum_{i=1}^m \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2,$$

$$I_{\beta_2\beta_2} = \sum_{i=1}^m \left[ \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij} \right] \left[ \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right],$$

$$I_{\theta_2\alpha_2} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right] \times \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right\},$$

$$I_{\theta_2\beta_2} = \sum_{i=1}^m \left\{ \left[ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right)^2 - \sum_{j=1}^{n_i} \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right] \times \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right\},$$

$$I_{\alpha_2\beta_2} = \sum_{i=1}^m \left\{ \left( \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) \right) \sum_{j=1}^{n_i} \left( \delta_{ij} - \Lambda_0(t_{ij}) e^{X_{2ij}^T \hat{\beta}_2} \right) X_{2ij}^T \right\},$$

with $I_{\alpha_2\beta_2}^T$ the corresponding expression with $X_{2ij}$ in place of $X_{2ij}^T$. These functions
make up $I_{Cox}$.

The standardized test statistic can then be calculated as $T = S / (I_{GLMM} + I_{Cox})^{1/2}$,
where $S$ is the score-based statistic of the joint model, and used for further inference.
Appendix D
R code
library(mnormt)
library(survival)
library(MASS)
####rnormal and simu.joint generate dataset
rnormal<-function(ncentre,var_u,var_v,cov_uv){
var_cov<-matrix(c(var_u,cov_uv, cov_uv, var_v),2,2)
if(var_u>0 & var_v>0){
ran=rmnorm(ncentre,mean=c(0,0),varcov=var_cov)}
else if(var_v==0){
elements<-cbind(rnorm(ncentre,mean=0, sd=sqrt(var_u)),
rep(0, ncentre))
ran<-as.matrix(elements)
}
else if(var_u==0){
elements<-cbind(rep(0, ncentre),
rnorm(ncentre,mean=0, sd=sqrt(var_v)))
ran<-as.matrix(elements)
}else if ((var_u==0) & (var_v==0)){
ran=matrix(rep(0,ncentre*2), ncentre,2)}
return(ran)
}
simu.joint <- function (n, ncentre, b0,b1, g1, g2, g3, sigma_u, sigma_v, sigma_uv) {
max.iter2=10000
tol2=0.00001
v_u<-vector();v_v<-vector(); covuv<-vector()
beta_list<-list();
gamma_list<-list()
sebeta<-list();
segamma<-list()
se_v_u<-vector();
se_v_v<-vector();
se_covuv<-vector()
n_beta1=0;
n_beta2=0;
n_gamma1=0;
n_gamma2=0;
n_gamma3=0
n_var_u=0;
n_var_v=0;
n_covuv=0
##set the initial values for all the parameters that will be estimated
sim_beta<-c(b0, b1)
sim_gamma<-c(g1, g2, g3)
sim_sigma<-c(sigma_u, sigma_v, sigma_uv)
sim_var_u=sim_sigma[1];
sim_var_v=sim_sigma[2];
sim_cov_uv=sim_sigma[3]
##generate centre ID
sim_centre = rep(c(1:ncentre), n/ncentre)
sim_centre<-as.factor(sim_centre)
###ct_matrix records the centre information for all the patients
ct_matrix<-matrix(rep(0,n*ncentre),n,ncentre)
for (i in 1:n){
for (j in 1:ncentre){
if (as.numeric(sim_centre[i])==j) {ct_matrix[i,j]=1}
}
}
centre = rep(c(1:ncentre), n/ncentre)
centre<-as.factor(centre)
##generate center-specific random effect u and v
ran<-rnormal(ncentre,sim_var_u,sim_var_v,sim_cov_uv)
##assign centre ID to each patient
uv_p<-ct_matrix%*%ran
u_p<-uv_p[,1];v_p<-uv_p[,2]
##simulate "arm" variable
arm<-rbinom(n,size=1,prob=0.5)
sim_X1<-cbind(1, arm)
##simulate resp variable
za<-exp(sim_X1%*%sim_beta+u_p)
resp = rbinom(n, 1, za/(1+za))
##simulate the survival time
##assume in baseline hazard lambda=0.15, p=1 (Follow exponential distribution)
ranuni<-runif(n,min=0,max=1)
sim_X2<-cbind(resp, arm, resp*arm)
stime<-log(ranuni)/exp(sim_X2%*%sim_gamma+v_p)/(-0.15)
endstudy = runif(n,0, 20) ###make the censoring rate around 20%
event = ifelse(stime>endstudy, 0, 1)
time = ifelse(stime>endstudy, endstudy, stime)
##put all data in a data frame and sort the data based on survival time
datsimu = data.frame(time, event, resp,arm,centre)
st = sort(datsimu$time, decr = T, index = T)
idx = st$ix
datasimu = datsimu[idx, ]
sim<-datasimu[1:n,]
time<-sim$time;
event<-sim$event
resp<-sim$resp;
arm<-sim$arm;
centre<-sim$centre
return(sim)
}
###calculate score in each cluster of the data
centrescore<-function(centredata, betaglm, betacox){
lambda<-centredata$hazard
arm<-centredata$arm
resp<-centredata$resp
event<-centredata$event
risk<-centredata$risk
sex<-centredata$sex
numerator<-exp(betaglm[1]+betaglm[2]*arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-resp-numerator/(1+numerator)
effect<-exp(resp*betacox[1]+arm*betacox[2]+arm*resp*betacox[3])
coxscore<-event-lambda*effect
coxinfo<-lambda*effect
v<-0.5*((sum(glmscore))^2+(sum(coxscore))^2-sum(glminfo)-sum(coxinfo))
return(v)
}
###"calcscore" calculates the score statistic of the input data
calcscore<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(resp)*as.factor(arm), data=data)
bootbetacox<-cox$coeff
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
bootbetaglm<-logi$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
total<-vector()
centreNUM<-length(table(data$centre))
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
total[i]<-centrescore(centredata, bootbetaglm, bootbetacox)
}
scorestat<-sum(total)
return(scorestat)
}
####boottest takes a dataset and generates the vector of bootstrap estimates "result"
boottest<-function(simudata,r){
bootcentre<-simudata$centre
patientNUM<-length(simudata$centre)
centreNUM<-length(table(simudata$centre))
result<-rep(0,r)
result<-replicate(r, {
index<-sample(1:patientNUM, patientNUM, replace=T)
bootstrap<-simudata[index, ]
bootstrap$centre<-bootcentre
bootscore<-calcscore(bootstrap)
return(bootscore)
})
return(result)
}
###calculate p-value from the bootstrap estimates
calcpvalue<-function(result, simudata){
actualscore<-round(calcscore(simudata), digits=5)
pvalue<-round(sum(abs(result)>abs(actualscore))/length(result), digits=4)
return(pvalue)
}
####pvaluedist generates simulated datasets with pre-defined values and gets p-values with a two-sided test
pvaluedist<-function(n, ncentre, r,b,g, sigmau, sigmav, sigma_uv){
pcol<-vector()
for (i in 1:r){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
bootresult<-boottest(data, 299)
actualscore<-calcscore(data)
pvalue<-sum(abs(bootresult)>abs(actualscore))/length(bootresult)
pcol[i]=pvalue
}
return(pcol)
}
### bootalpha generates datasets with defined parameters and obtains p-values using a one-sided test; equivalent to "pvaluedist" but one-sided
bootalpha<-function(n, ncentre, r,b,g, sigmau, sigmav, sigma_uv){
pcol<-vector()
for (i in 1:r){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
bootresult<-boottest(data, 299)
actualscore<-calcscore(data)
pvalue<-sum(bootresult>actualscore)/length(bootresult)
pcol[i]=pvalue
}
return(pcol)
}
###getalpha takes all the p-values generated from one simulation setting and calculates the type I error or power, using 0.05 as the cut-off
getalpha<-function(vector){
alpha<-sum(vector<=0.05)/length(vector)
alpha<-round(alpha, digits=4)
return(alpha)
}
######Asymptotic Method
#####glmI calculates the GLMM portion of the asymptotic variance statistic I
glmI<-function(data){
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
betaglm<-logi$coeff
centreNUM<-length(table(data$centre))
total_tt=0
total_bb=matrix(rep(0,4), nrow=2, ncol=2)
total_alphabeta=matrix(rep(0,2), nrow=1, ncol=2)
total_tab=matrix(rep(0,3),nrow=1, ncol=3)
total_gmid=matrix(rep(0,9),nrow=3, ncol=3)
for (i in 1:centreNUM){
centredata<-subset(data, centre==i)
numerator<-exp(betaglm[1]+betaglm[2]*centredata$arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-centredata$resp-numerator/(1+numerator)
glm_itheta<-0.5*((sum(glmscore))^2-sum(glminfo))
X1T=cbind(rep(1,length(centredata$arm)),centredata$arm)
alpha<-(centredata$resp-numerator/(1+numerator))
glm_ialpha<-sum(alpha)
beta=alpha*X1T
glm_ibeta<-matrix(c(sum(beta[,1]),sum(beta[,2])), ncol=2)
glm_ibetaT<-t(glm_ibeta)
glm_ibeta2<-glm_ibetaT%*%glm_ibeta
glm_ialphabeta<-glm_ibeta*glm_ialpha
### glm_ibetaalpha<-glm_ibetaT*glm_ialpha
glm_ialpha2<-matrix(glm_ialpha^2, nrow=1, ncol=1)
### glm_mid<-rbind(cbind(glm_ibeta2, glm_ibetaalpha),
###                cbind(glm_ialphabeta, glm_ialpha2))
glm_tab<-matrix(c(glm_itheta*glm_ibeta, glm_itheta*glm_ialpha), nrow=1, ncol=3)
total_bb=total_bb+glm_ibeta2
total_alphabeta=total_alphabeta+glm_ialphabeta
total_tt=total_tt+glm_itheta^2
### total_gmid<-total_gmid+glm_mid
total_tab<-total_tab+glm_tab
}
left<-rbind(total_bb, total_alphabeta)
right<-rbind(t(total_alphabeta), glm_ialpha2)
total_gmid<-cbind(left, right)
total_gmid = total_gmid[1:2, 1:2]
total_tab = matrix(total_tab[1, 1:2], nrow = 1, ncol = 2)
###print(total_gmid)
###print(total_tab)
glm_I<-total_tt-total_tab%*%solve(total_gmid)%*%t(total_tab)
return(glm_I)
}
###end glmI
####coxI calculates the Cox-frailty portion of the asymptotic variance statistics I
coxI<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(resp)*as.factor(arm), data=data)
betacox<-cox$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
centreNUM<-length(table(data$centre))
ctotal_tt=0
ctotal_tab=matrix(rep(0,4),nrow=1, ncol=4)
total_coxibeta2<-matrix(rep(0,9),nrow=3, ncol=3)
total_ab<-matrix(rep(0,3), nrow=1, ncol=3)
total_alpha2<-matrix(0, nrow=1, ncol=1)
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
effect<-exp(centredata$resp*betacox[1]+centredata$arm*betacox[2]
+centredata$resp*centredata$arm*betacox[3])
coxscore<-centredata$event-centredata$hazard*effect
coxinfo<-centredata$hazard*effect
cox_itheta<-0.5*((sum(coxscore))^2-sum(coxinfo))
alpha<-centredata$event-centredata$hazard*effect
cox_ialpha=sum(alpha)
cox_ibeta<-data.matrix(cbind(sum(alpha*centredata$resp),
sum(alpha*centredata$arm),
sum(alpha*centredata$resp*centredata$arm)))
cox_ibetaT<-t(cox_ibeta)
cox_ibeta2<-cox_ibetaT%*%cox_ibeta
cox_ialphabeta<-cox_ialpha*cox_ibeta
cox_ialpha2<-matrix(cox_ialpha^2, nrow=1, ncol=1)
cox_tab<-matrix(c(cox_itheta*cox_ibeta, cox_itheta*cox_ialpha), nrow=1, ncol=4)
ctotal_tt=ctotal_tt+cox_itheta^2
ctotal_tab<-ctotal_tab+cox_tab
total_ab=total_ab+cox_ialphabeta
total_coxibeta2=total_coxibeta2+cox_ibeta2
total_alpha2=total_alpha2+cox_ialpha2  ###accumulate across clusters rather than keep only the last cluster's value
}
left=rbind(total_coxibeta2, total_ab)
right=rbind(t(total_ab), total_alpha2)
total_cmid<-cbind(left, right)
total_cmidinv=solve(total_cmid)
cox_I<-ctotal_tt-ctotal_tab%*%total_cmidinv%*%t(ctotal_tab)
return(cox_I)
}
#### 'itest' takes a dataset and tests for the cluster-level random effect with the asymptotic method
itest<-function(data){
S<-calcscore(data)
I<-coxI(data)+glmI(data)
stat=S/(I^0.5)
pvalue<-pnorm(-abs(stat))
return(pvalue)
}
### 'itestcomp' calculates the overall asymptotic variance statistic I and obtains p-values using both one-sided and two-sided tests
itestcomp<-function(data){
S<-calcscore(data)
I<-coxI(data)+glmI(data)
stat=S/(I^0.5)
pvalue1 = 1-pnorm(stat)
pvalue2 = 2*pnorm(-abs(stat))
return(c(pvalue1, pvalue2))
}
#### 'itestalpha' generates simulation data, obtains p-values, and stores the results in 'pcol'
itestalpha<-function(n, ncentre, R,b,g, sigmau, sigmav, sigma_uv){
pcol<-matrix(0, R, 2)
for (i in 1:R){
data<-simu.joint(n, ncentre, b[1], b[2], g[1], g[2], g[3], sigmau, sigmav, sigma_uv)
pvalue<-itestcomp(data)
pcol[i, ]<-pvalue
# cat('i=', i, ', p = ', pvalue, '\n')
}
alpha1=getalpha25(pcol[,1])
alpha2=getalpha(pcol[,2])
print(c(alpha1, alpha2))
return(pcol)
}
getalpha25<-function(vector){
alpha<-sum(vector<=0.025)/length(vector)
return(alpha)
}
####Functions specifically used for the HD.6 data
####bootstrap for HD.6 data, adding "risk" as a fixed-effect covariate to the Cox-frailty model
centrescorehd6<-function(centredata, betaglm, betacox){
lambda<-centredata$hazard
arm<-centredata$arm
resp<-centredata$resp
event<-centredata$event
risk<-centredata$risk
sex<-centredata$sex
numerator<-exp(betaglm[1]+betaglm[2]*centredata$arm)
glminfo<-numerator/(1+numerator)^2
glmscore<-resp-numerator/(1+numerator)
effect<-exp(resp*betacox[1]+arm*betacox[2]+risk*betacox[3]+arm*resp*betacox[4])
coxscore<-event-lambda*effect
coxinfo<-lambda*effect
v<-0.5*((sum(glmscore))^2+(sum(coxscore))^2-sum(glminfo)-sum(coxinfo))
return(v)
}
calcscorehd6<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+as.factor(risk)
+as.factor(resp)*as.factor(arm), data=data)
bootbetacox<-cox$coeff
logi<-glm(data$resp~as.factor(data$arm), family=binomial)
bootbetaglm<-logi$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
total<-vector()
centreNUM<-length(table(data$centre))
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
total[i]<-centrescorehd6(centredata, bootbetaglm, bootbetacox)
}
scorestat<-sum(total)
return(scorestat)
}
boottesthd6<-function(simudata,r){
bootcentre<-simudata$centre
patientNUM<-length(simudata$centre)
centreNUM<-length(table(simudata$centre))
result<-rep(0,r)
result<-replicate(r, {
index<-sample(1:patientNUM, patientNUM, replace=T)
bootstrap<-simudata[index, ]
bootstrap$centre<-bootcentre
bootscore<-calcscorehd6(bootstrap)
return(bootscore)
})
return(result)
}
###coxIhd6 calculates Cox-frailty portion of the I statistics
coxIhd6<-function(data){
cox<-coxph(Surv(time, event)~as.factor(resp)+as.factor(arm)+
as.factor(risk), data=data)
betacox<-cox$coeff
grid<-basehaz(cox, centered=F)
datasorted<-merge(grid,data,by="time")
centreNUM<-length(table(data$centre))
ctotal_tt=0
ctotal_tab=matrix(rep(0,4),nrow=1, ncol=4)
total_coxibeta2<-matrix(rep(0,9),nrow=3, ncol=3)
total_ab<-matrix(rep(0,3), nrow=1, ncol=3)
total_alpha2<-matrix(0, nrow=1, ncol=1)
for (i in 1:centreNUM){
centredata<-subset(datasorted, centre==i)
effect<-exp(centredata$resp*betacox[1]+centredata$arm*betacox[2]
+centredata$risk*betacox[3])
coxscore<-centredata$event-centredata$hazard*effect
coxinfo<-centredata$hazard*effect
cox_itheta<-0.5*((sum(coxscore))^2-sum(coxinfo))
alpha<-centredata$event-centredata$hazard*effect
cox_ialpha=sum(alpha)
cox_ibeta<-data.matrix(cbind(sum(alpha*centredata$resp),
sum(alpha*centredata$arm),
sum(alpha*centredata$risk)))
cox_ibetaT<-t(cox_ibeta)
cox_ibeta2<-cox_ibetaT%*%cox_ibeta
cox_ialphabeta<-cox_ialpha*cox_ibeta
cox_ialpha2<-matrix(cox_ialpha^2, nrow=1, ncol=1)
cox_tab<-matrix(c(cox_itheta*cox_ibeta, cox_itheta*cox_ialpha), nrow=1, ncol=4)
ctotal_tt=ctotal_tt+cox_itheta^2
ctotal_tab<-ctotal_tab+cox_tab
total_ab=total_ab+cox_ialphabeta
total_coxibeta2=total_coxibeta2+cox_ibeta2
total_alpha2=total_alpha2+cox_ialpha2  ###accumulate across clusters rather than keep only the last cluster's value
}
left=rbind(total_coxibeta2, total_ab)
right=rbind(t(total_ab), total_alpha2)
total_cmid<-cbind(left, right)
total_cmidinv=solve(total_cmid)
cox_I<-ctotal_tt-ctotal_tab%*%total_cmidinv%*%t(ctotal_tab)
return(cox_I)
}
###end coxI
#### itesthd6 applies the asymptotic method specifically to the HD.6 data
itesthd6 <- function(data){
  S <- calcscorehd6(data)
  I <- coxIhd6(data) + glmI(data)   # glmI: logistic-model portion of I, defined earlier
  stat <- S/(I^0.5)
  pvalue1 <- 1 - pnorm(stat)        # one-sided p-value
  pvalue2 <- 2*pnorm(-abs(stat))    # two-sided p-value
  return(c(pvalue1, pvalue2))
}
#### data cleaning for the HD.6 trial
library(survival)
require(boot)
library(coxme)
library(lme4)
abvd <- read.csv("I:/crcruJointModel_cr2.csv", header = TRUE)
colnames(abvd) <- c("arm", "sex", "age", "centre", "clc2resp", "resp", "risk",
                    "time", "event")
abvd$time <- as.numeric(abvd$time)
miss <- abvd[is.na(abvd$resp), ]
miss$event <- as.factor(miss$event)
summary(miss)
### delete all observations with a missing resp value
abvd <- abvd[!is.na(abvd$resp), ]
### remove all observations with survival time less than 6 months
abvd <- subset(abvd, time >= 6)
### convert categorical variables to numeric variables
### note: arm A = ABVD + radiation; arm B = ABVD alone
abvd$arm <- ifelse(abvd$arm == "A", 1, 0)     ## ABVD alone = 0; ABVD + radiation = 1
arm <- abvd$arm
resp <- ifelse(abvd$resp == "YES", 1, 0)      ## remission = 1, no remission = 0
event <- as.numeric(abvd$event)
time <- abvd$time
centre <- as.factor(as.numeric(abvd$centre))  ## there are 29 centres in total
risk <- ifelse(abvd$risk == "High", 1, 0)
sex <- ifelse(abvd$sex == "M", 1, 0)
### rearrange the dataset in descending order of survival time
dat <- data.frame(time, event, resp, arm, risk, sex, centre)
dat <- dat[order(dat$time, decreasing = TRUE), ]
time <- dat$time; event <- dat$event; resp <- dat$resp; arm <- dat$arm
risk <- dat$risk; sex <- dat$sex; centre <- dat$centre