
ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2014

Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 184

Novel Pharmacometric Methods for Design and Analysis of Disease Progression Studies

SEBASTIAN UECKERT

ISSN 1651-6192
ISBN 978-91-554-8862-8
urn:nbn:se:uu:diva-216537

Dissertation presented at Uppsala University to be publicly examined in B41, Biomedicinskt Centrum, Husargatan 3, Uppsala, Friday, 7 March 2014 at 13:15 for the degree of Doctor of Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty examiner: PhD Kayode Ogungbenro (The University of Manchester).

Abstract
Ueckert, S. 2014. Novel Pharmacometric Methods for Design and Analysis of Disease Progression Studies. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 184. 65 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-8862-8.

With societies aging all around the world, the global burden of degenerative diseases is expected to increase exponentially. From the perspective of drug development, degenerative diseases represent an especially challenging class. Clinical trials, in this context often termed disease progression studies, are long, costly, require many individuals, and have low success rates. Therefore, it is crucial to use informative study designs and to analyze the obtained trial data efficiently. The aim of this thesis was to develop novel approaches to facilitate both the design and the analysis of disease progression studies.

This aim was pursued in three stages: (i) the characterization and extension of pharmacometric software, (ii) the development of new methodology around statistical power, and (iii) the demonstration of application benefits.

The optimal design software PopED was extended to simplify the application of optimal design methodology when planning a disease progression study. The performance of non-linear mixed effect estimation algorithms for trial data analysis was evaluated in terms of bias, precision, robustness with respect to initial estimates, and runtime. A novel statistic allowing for explicit optimization of study designs for statistical power was derived and found to perform superior to existing methods. Monte-Carlo power studies were accelerated through application of parametric power estimation, delivering full power versus sample size curves from a few hundred Monte-Carlo samples. Optimal design and an explicit optimization for statistical power were applied to the planning of a study in Alzheimer's disease, resulting in a 30% smaller study size when targeting 80% power. The analysis of ADAS-cog score data was improved through application of item response theory, yielding a more exact description of the assessment score, an increased statistical power and an enhanced insight into the assessment properties.

In conclusion, this thesis presents novel pharmacometric methods that can help address the challenges of designing and planning disease progression studies.

Keywords: pharmacometrics, optimal design, non-linear mixed effects models, degenerative diseases, Alzheimer's disease, item response theory, statistical power

Sebastian Ueckert, Department of Pharmaceutical Biosciences, Box 591, Uppsala University, SE-75124 Uppsala, Sweden.

© Sebastian Ueckert 2014

ISSN 1651-6192
ISBN 978-91-554-8862-8
urn:nbn:se:uu:diva-216537 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-216537)

To my parents

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. Nyberg J, Ueckert S, Strömberg EA, Hennig S, Karlsson MO, Hooker AC. PopED: An extended, parallelized, nonlinear mixed effects models optimal design tool. Comput Meth Prog Bio. 108(2):789–805 (2012).

II. Johansson Å, Ueckert S, Plan E, Hooker AC, Karlsson MO. Evaluation of Bias, Precision, Robustness and Runtime for Estimation Methods in NONMEM 7. In manuscript.

III. Ueckert S, Hennig S, Nyberg J, Karlsson MO, Hooker AC. Optimizing disease progression study designs for drug effect discrimination. J Pharmacokinet Pharmacodyn. 40(5):587–596 (2013).

IV. Ueckert S, Hooker AC. Accelerating Monte-Carlo Power Studies through Parametric Power Estimation. In manuscript.

V. Ueckert S, Andrews M, Ito K, Karlsson MO, Corrigan B, Hooker AC. Challenges and potential of optimal design in late phase clinical trials through application in Alzheimer's disease. In manuscript.

VI. Ueckert S, Plan E, Ito K, Karlsson MO, Corrigan B, Hooker AC. Improved Utilization of ADAS-cog Assessment Data through Item Response Theory based Pharmacometric Modeling. Accepted.

Reprints were made with permission from the publishers.

Contents

1 Introduction
  1.1 Pharmacometrics
    1.1.1 Pharmacometric models
    1.1.2 Model-based drug development
  1.2 Disease progression studies
    1.2.1 Progressive disorders and types of drug action
    1.2.2 Alzheimer’s disease
    1.2.3 Challenges in disease progression studies
  1.3 Trial data analysis
    1.3.1 Non-linear mixed effect models
    1.3.2 Item response theory
    1.3.3 Monte-Carlo simulations
    1.3.4 Maximum likelihood estimation
    1.3.5 Hypothesis tests and power
  1.4 Trial Design
    1.4.1 Cramer-Rao bound and optimal design
    1.4.2 Population Fisher information matrix
    1.4.3 Design optimization
    1.4.4 Clinical trial simulations
2 Aims
3 Methods
  3.1 Data
    3.1.1 Item level ADAS-cog data
  3.2 Models
    3.2.1 PKPD benchmark models
    3.2.2 Generic disease progression model
    3.2.3 ADAS-cog summary score model
    3.2.4 ADAS-cog item response theory model
  3.3 Power estimation
    3.3.1 Monte-Carlo based power
    3.3.2 Information matrix based power
    3.3.3 Parametric power estimation algorithm
  3.4 Information calculation
    3.4.1 Information matrix for power
    3.4.2 Information matrix for late phase clinical trials
    3.4.3 Information of ADAS-cog components
  3.5 Simulation studies
    3.5.1 Estimation algorithm comparison
    3.5.2 Information matrix based power evaluation
    3.5.3 Parametric power estimation algorithm evaluation
    3.5.4 Power comparison for ADAS-cog analysis approaches
4 Results
  4.1 Foundation of tools
    4.1.1 PopED - a population optimal design tool
    4.1.2 Performance of NLMEM estimation algorithms
  4.2 Expansion of methodology
    4.2.1 Information matrix based power
    4.2.2 Parametric power estimation
  4.3 Demonstration of application benefits
    4.3.1 Optimal design in late phase trials
    4.3.2 IRT based modeling of ADAS-cog data
5 Discussion
  5.1 Foundation of tools
  5.2 Expansion of methodology
  5.3 Demonstration of application benefits
6 Conclusions
7 Acknowledgements
References

Abbreviations

3PL 3 parameter logit

AD Alzheimer’s disease

ADNI Alzheimer’s Disease Neuroimaging Initiative

AGQ adaptive Gaussian quadrature

CAMD Coalition Against Major Diseases

CTS clinical trial simulation

EM expectation maximization

FDA US Food and Drug Administration

FIM Fisher information matrix

FO first-order

FO-I first-order with interaction

FOCE first-order conditional estimation

FOCE-I first-order conditional estimation with interaction

GUI graphical user interface

ICC item characteristic curve

IIV interindividual variability

IOV interoccasion variability

IRT item response theory

ITS iterative two-stage

LL log-likelihood

LLR log-likelihood ratio

LS-means least-square-means

MBDD model-based drug development

MC Monte-Carlo

MCI mild cognitive impairment

MLE maximum likelihood estimation

MMSE mini-mental state examination

NEE normalized estimation error

NLMEM non-linear mixed effect model

ODE ordinary differential equation

PD pharmacodynamics

pdf probability density function

PK pharmacokinetics

PKPD pharmacokinetic-pharmacodynamic

PPE parametric power estimation

rRMSE relative root mean square error

RTTE repeated time to event

RUV residual unexplained variability

SAEM stochastic approximation expectation maximization

SSE stochastic simulation and estimation

XML extensible markup language

1. Introduction

The global population of people 60 years and older is projected to triple between 2000 and 2050 [9]. Naturally, these demographic changes are accompanied by an increase in the prevalence of age-associated degenerative diseases. Alzheimer’s disease (AD), for example, is expected to rise from 35.6 million sufferers worldwide in 2010 to 115 million in 2050 [68]. Parkinson’s disease, from 4.1 million in 2005 to 8.7 million in 2030, and osteoporosis, from 1.26 million in 1990 to 4.5 million in 2050, show similar growth rates [13, 20].

From a drug development perspective, degenerative diseases represent an especially challenging class. Clinical trials, in this context often termed disease progression studies, are long, costly, require many individuals, and have low success rates [28]. Therefore, it is extremely important to use informative study designs and to analyze the obtained trial data efficiently. The development of novel approaches to facilitate both the design and analysis of disease progression studies is the subject of this thesis.

1.1 Pharmacometrics

An expanding need for the treatment of degenerative diseases and the poor efficacy of current practices have resulted in increased efforts from the private and public sectors to tackle the challenges of disease progression studies. The use of quantitative modeling and simulation plays a central role in these initiatives [53]. The science utilizing modeling and simulation of biological systems to aid efficient drug development is pharmacometrics [67].

Depending on the particular approach employed, pharmacometric methods can support the development of new drugs on several different levels: 1) they can help to understand the underlying mechanisms of a disease or drug action; 2) they can help to analyze the data obtained in a clinical study more efficiently; and 3) they can be used to simulate and plan future studies. An essential component for these and most other pharmacometric applications are models; their function and structure are described in section 1.1.1.

The prospect of a more efficient drug development process has led to an increased application of pharmacometric models in the pharmaceutical industry. Section 1.1.2 describes the relatively new paradigm that puts pharmacometric models at the center of drug development.


Figure 1.1. Potential complexity of a pharmacometric model for a clinical trial, illustrated by common components and their interactions.

1.1.1 Pharmacometric models

Probably the most important pharmacometric tool is the model, a simplified mathematical description of the data-generating system: a clinical trial, the human body, an animal or a cell culture. Despite this intrinsic simplification, pharmacometric models extend from simple linear functions to complicated sets of differential equations.

A modern clinical trial, especially during the late phases of drug development, involves a complex system influenced by an interplay of several diverse factors. Predicting clinical outcome requires the identification and characterization of influential processes as well as their interactions (figure 1.1). The evolution of pharmacometrics is driven, to a large degree, by the aspiration to better describe different components of this complex system. In this mission, the field aligns its objectives with other branches of pharmacology such as pharmacokinetics (PK) and pharmacodynamics (PD).

While PK is the study of the time course of a drug within the body [60] and is often seen as the cradle of pharmacometrics [67], PD is dedicated to the description of the drug effect, both beneficial and adverse [67]. Additional models for biomarkers or the placebo effect are common extensions of the PD component. Joint pharmacokinetic-pharmacodynamic (PKPD) models cover both the description of the commonly measured drug concentration and its observed effects, and can include subcomponents for biophase distribution, biosensor processes, biosignal flux, transduction and many more [31].

Diseases in general are dynamic processes and change during the time course of a clinical trial. The disease progression component of a pharmacometric model captures these changes by describing the trajectory of the disease [38].

The trial execution model describes the implementation of the trial itself and includes both the nominal protocol, such as inclusion and exclusion criteria, and protocol deviations through patient withdrawal or non-compliance [21].

Most of the components described above depend on covariates, e.g., the volume of distribution of a drug on a patient’s weight or the progression rate of the disease on the patient’s age. The covariate distribution model captures the distribution of these covariates in the population [21].

1.1.2 Model-based drug development

The development of a new drug is a long and costly process [1] characterized by a continuous accumulation of knowledge around the new compound [30]. Each lab experiment, pre-clinical study and clinical trial delivers additional data that may be integrated into the existing body of knowledge. The framework of model-based drug development (MBDD) aspires to facilitate the integration of knowledge by capturing the available information in pharmacometric models.

Sheiner highlighted the importance of an alternation between learning and confirming phases during clinical drug development [58]. MBDD fully embraces the learn-confirm paradigm. During the learning phase, models are used to capture information and drive research towards knowledge gaps. Based on these models, confirmatory trials are planned and quantitative decision criteria derived, allowing risk decisions to be made efficiently at any phase of the development [30].

Currently, MBDD is not routinely used across the pharmaceutical industry [30]. However, companies that have implemented the MBDD paradigm have made considerable gains in efficiency [35], and the US Food and Drug Administration (FDA) considers MBDD an “important approach to improving drug development knowledge management and development decision making” [64].

1.2 Disease progression studies

Diseases in which the function or structure of the affected tissues or organs worsens over time are commonly summarized as degenerative disorders [39]. This umbrella term includes a broad spectrum of different diseases with a diverse set of pathologies and different tissues affected. An overview of this diversity as well as a classification of common drug actions is provided in section 1.2.1. A more detailed description of AD, which is the focus of 2 out of the 6 papers in this thesis, is given in section 1.2.2.

The term “disease progression studies” tries to capture the common challenges for clinical trials in the therapeutic areas of progressive diseases. These challenges, created by the shared progressive nature of these disorders, are discussed in section 1.2.3.

1.2.1 Progressive disorders and types of drug action

Progressive disorders are commonly chronic conditions with a worsening of the disease status over time. While this definition also includes rapidly progressive diseases that last days or weeks, this thesis focuses on more slowly developing progressive diseases which extend over several years.

In most cases, progressive disorders are caused by degenerative processes that gradually reduce the physiologic function of organs and tissues. These disorders can differ considerably in etiology, pathogenesis and clinical manifestation. Osteoporosis, for example, is a systemic skeletal disorder characterized by low bone density and micro-architectural deterioration of bone tissue, with a consequent increase in bone fragility, ultimately resulting in fractures and people becoming bedridden [69]. Parkinson’s disease is a degenerative disorder of the central nervous system caused by the death of dopamine-generating cells in the substantia nigra. Multiple sclerosis is an inflammatory disease in which the fatty myelin sheaths around the axons of the brain and spinal cord are damaged. Each of these diseases affects a different organ and thus creates a unique challenge for patient care and drug development.

A specific terminology has been developed to describe the type of drug action in progressive disorders [38]. Generally, three different types are distinguished: 1) symptomatic, 2) disease modifying and 3) curative drug action. A symptomatic drug has a beneficial effect without changing the trajectory of the disease and benefits patients only during their treatment. Therefore, when the treatment is stopped, the disease status rapidly returns to the untreated state. A disease modifying drug effect, in contrast, affects the progression of the disease, changes its trajectory and provides a benefit even after the treatment has stopped. Finally, a curative drug resets the disease status to the level of a healthy individual.

1.2.2 Alzheimer’s disease

The novel methods developed in this thesis apply to many progressive disorders and may be applied even beyond this class of diseases. However, AD was the focus of 2 of the 6 papers, justifying a more detailed description of this disease.

AD is the most common form of dementia, believed to be responsible for 50-75% of all cases [47] and estimated to affect more than 115 million people in 2050 [68]. In early stages, patients become forgetful and show signs of disorientation and changes in mood [48]. As AD progresses relentlessly, all brain regions become affected, and patients become unable to recognize relatives, unable to eat without help and severely restricted in their mobility [48]. This late phase of the disease constitutes an enormous financial, physical and psychological burden for the caregiver [47]. Ultimately, AD leads to the death of the patient.

The pathogenesis of the disease is not yet fully understood, and several competing hypotheses for the cause of the disease exist [3]. The key pathological characteristics of AD are amyloid plaques and neurofibrillary tangles [3]. However, it is still debated which of these pathological changes are a cause and which a consequence of the disease. According to the amyloid cascade hypothesis, changes in the synthesis of amyloid β lead to its accumulation inside neuronal cells and extracellularly, where it aggregates into plaques [3]. These toxic concentrations of amyloid β are hypothesized to trigger changes in tau, a microtubule-associated protein, causing the neurofibrillary tangle formations as well as neuronal cell death [3].

A definitive diagnosis of AD can currently only be made post mortem [3]. In the clinic, a detailed history of symptoms as well as neuropsychological tests, such as the mini-mental state examination (MMSE), are used to obtain a probable diagnosis. AD clinical trials require a more precise assessment of the cognitive status and rely on more complex evaluations such as the ADAS-cog test. The MMSE and the ADAS-cog, the two most important neuropsychological tests and central elements of papers V and VI, are described in the following sections.

Treatment options for AD are even sparser than the understanding of the molecular mechanisms of the disease. Currently, only symptomatic treatments with 2 types of drugs are available: 1) cholinesterase inhibitors such as Donepezil; and 2) NMDA receptor antagonists such as Memantine [3]. Despite extensive efforts with numerous AD trials of potentially disease modifying treatments, no approved disease modifying drug is on the market today. Most disease-modifying treatments under investigation target amyloid β, yet so far without any success [3]. Focus is now shifting towards treatment much earlier in the disease progression as well as towards treating specific populations [3].

Mini-mental state examination

The MMSE is a quick neuropsychological evaluation consisting of a questionnaire with 30 items. For each item, a simple task is given to the patient and the number of correctly performed items is recorded. The sum of correctly performed tasks constitutes the MMSE score; in clinical trials this score is often used for screening.


Table 1.1. Components of the ADAS-cog 11 and 13 assessments (* marks additional ADAS-cog 13 components)

Task based: Commands, Construction, Ideational Praxis, Naming Objects & Fingers, Orientation, Word Recall, Delayed Word Recall*, Word Recognition, Number Cancellation*
Rater assessed: Comprehension, Spoken Language, Remembering, Word Finding

ADAS-cog assessment

The ADAS-cog score is the main regulatory accepted clinical endpoint for AD. Similar to the MMSE, the total ADAS-cog composite score is obtained by rating a subject’s ability to perform a broad range of cognitive tests and then summarizing all scores. The ADAS-cog is substantially longer than the MMSE, has higher scores for more severely impaired subjects (i.e., an inverted scale) and exists in several variants. These variants differ in the number of components included in the assessment and also target specific patient populations. The original ADAS-cog assessment developed by Rosen et al. [54] consists of 11 components (table 1.1) and is referred to as ADAS-cog 11. It is often extended by the addition of two components, resulting in the ADAS-cog 13, which has a score range from 0 to 70 points [36] (table 1.1).

1.2.3 Challenges in disease progression studies

In this thesis, the term disease progression studies will be used to refer to the common challenges of phase II and III studies in the progressive disorders described above.

Progressive disorders share several features that complicate the evaluation of treatment effects in a clinical trial. Frequently, these trials have treatment durations of several months or more. In AD, for example, the trial duration can extend up to 3 years [23], and in osteoporosis up to 5 years [44]. This is due to the relatively slow progression of these disorders and the consequential need to observe patients over an extended time frame in order to detect changes. The long enrollment period, in turn, increases the probability for patients to drop out of the study, reducing the sample size for analysis.

The slow progression rate of these diseases has another important consequence. For drugs that affect the disease progression rate, the observed effect size is expected to be low. For instance, the expected between-treatment-group difference after one year is only 1.6 ADAS-cog score points for a hypothetical disease modifying AD treatment that reduces the progression by 30% (assuming a natural history progression rate of 5.49 points per year [23]). This leads to the requirement of large study sizes in order to have sufficiently high statistical power to detect a drug effect. Large study sizes, in combination with long durations, result in very expensive trials.

An additional complication is the predominantly elderly patient population encountered in progressive disorders. Elderly subjects may drop out more frequently, especially in long studies, and are more likely to take other medications. Specific to clinical trials of progressive disorders is the overlap between disease-induced degeneration and natural aging, and the complication of separating the two during the trial analysis.

Quantification of the disease status is a further complication in a disease progression study. Due to the lack of a cure for most progressive diseases, the efficacy evaluation of a novel treatment is based on an assessment of the disease status rather than on the number of subjects cured. The probability of detecting a significant treatment effect, or power (see also section 1.3.5), is therefore closely linked to the performance of the disease quantification method. Neuro-degenerative disorders such as AD or Parkinson’s disease use elaborate composite scales to quantify the disease status. In contrast to most biomarkers, these composite scales are bounded, inherently discrete, and non-linear, with particular statistical properties. From a pharmacometric perspective, a robust incorporation of these statistical properties constitutes a challenge.

1.3 Trial data analysis

Clinical trials are the biggest constituent of the development costs for new drugs. It is therefore extremely important to utilize the expensively obtained information in the most efficient manner. A model-based data analysis provides an elegant and effective way to utilize the information contained in the data and to increase the efficiency of a clinical trial [26].

The following sections introduce some of the general pharmacometric tools used in model-based trial data analyses: non-linear mixed effect models (section 1.3.1), item response theory (section 1.3.2), Monte-Carlo simulations (section 1.3.3), maximum likelihood estimation (section 1.3.4) and hypothesis tests (section 1.3.5).

1.3.1 Non-linear mixed effect models

Non-linear mixed effect models (NLMEMs) explicitly handle the biologic variability inherent in clinical data and have become the standard in pharmacometrics for the analysis of clinical trials. In this thesis, NLMEMs for both continuous and discrete data are used. For continuous data, a NLMEM is commonly specified by describing the observation yij for individual i at time tij directly through the prediction function f(·), as well as the deviations between those predictions and the observations through the residual error function h(·), i.e.,

$$y_{ij} = f\bigl(t_{ij}, g(\theta, a_i, \eta_i)\bigr) + h\bigl(t_{ij}, g(\theta, a_i, \eta_i), \varepsilon_{ij}\bigr) \qquad (1.1)$$

For discrete data, on the other hand, a NLMEM describes the probability of observing x using the probability density function (pdf) l(·), i.e.,

$$P(y_{ij} = x) = l\bigl(t_{ij}, g(\theta, a_i, \eta_i)\bigr) \qquad (1.2)$$

The prediction function and residual error function in the continuous case, as well as the pdf for discrete data, depend on the observation time points tij and on subject-specific parameters described as the function g(·) of population values θ, individual-specific covariates ai, and individual-specific random effect parameters ηi. The residual error function of the continuous data model furthermore depends on the random variables εij. Both sets of random variables, ηi and εij, are assumed to be normally distributed with mean zero and covariance matrices Ω and Σ for the interindividual variability (IIV) and residual unexplained variability (RUV) random effects, respectively. For notational purposes, one may summarize all population parameters (θ, Ω and Σ) in the vector Θ.

Example 1 – NLMEM model. The following equation represents a plausible NLMEM describing the increase in ADAS-cog score of AD patients with age:

$$y_{ij} = \frac{70}{1 + e^{\log\left(\frac{70}{10} - 1\right) + \frac{4\alpha_i}{70}\,(\tau_i - t_{ij})}} + \varepsilon_{ij}$$

In this parameterization, τi is the age of disease onset (defined as an ADAS-cog score of 10) and αi is the maximal progression slope. For the IIV model of the disease onset one might assume τi = θτ·e^ηi and, for simplicity, that the progression slope does not vary in the population (αi = θα). The complete model has 4 parameters: θτ, θα, ω² (Var(ηi)) and σ² (Var(εij)).
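To make the notation concrete, the model of example 1 can also be written out in code. The following is a minimal Python sketch (not part of the thesis); the functions g, f and y mirror the notation of equation 1.1, with the additive residual error playing the role of h(·):

```python
import numpy as np

def g(theta, eta_i):
    """Individual parameters: tau_i = theta_tau * exp(eta_i); alpha_i has no IIV."""
    theta_tau, theta_alpha = theta
    return theta_tau * np.exp(eta_i), theta_alpha

def f(t_ij, tau_i, alpha_i):
    """Prediction function: the logistic ADAS-cog trajectory of example 1."""
    return 70.0 / (1.0 + np.exp(np.log(70.0 / 10.0 - 1.0)
                                + 4.0 * alpha_i / 70.0 * (tau_i - t_ij)))

def y(t_ij, theta, eta_i, eps_ij):
    """Observation model of equation 1.1 with additive residual error h(.) = eps_ij."""
    tau_i, alpha_i = g(theta, eta_i)
    return f(t_ij, tau_i, alpha_i) + eps_ij
```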

1.3.2 Item response theory

Item response models are a class of NLMEMs that were used in this thesis for the analysis of composite scores. These scores are a common method of quantifying the disease severity in neuro-degenerative disorders and are obtained by summarizing various tests or questionnaires.

Generally, the severity of a disease constitutes a hypothetical construct. It can be described as a hidden or latent variable, which cannot be directly observed or measured. Disease assessments serve as the surrogate measure, with each test in the assessment being interpreted as part of the measurement of the disease severity. Simply summarizing the sub-scores completely disregards the level of information that each individual test may contribute.

Rather than just considering the summary score itself, the statistical framework of item response theory (IRT) relates every sub-score obtained in the different tests to a subject-specific hidden or latent variable (Di). The relationship between Di and the probability for a certain response to test j is described through item characteristic curves (ICCs), i.e.,

$$P(Y_{ij} = k) = l_j(D_i, \Theta_j) \qquad (1.3)$$

where the parameter vector Θj is a set of test-specific parameters. The shape of the ICC for each test is informative with respect to how sensitive the test is relative to the population being tested. Based on those ICCs, IRT allows estimation of the disease severity of a subject given the specific results of the assessment. The scale for the hypothetical variable Di is generally defined by assuming a normal distribution in the population and taking a mean of 0 and a variance of 1 as a reference.

As a result, IRT allows the comparison of individuals with different disease severities on a single scale, irrespective of the assessment performed, as long as the individual tests have been mapped to the overall hidden variable scale of the population. This aspect makes the approach preferable for measuring disease severity over a long period of time with either one instrument, an instrument with multiple variants (ADAS-cog 11, 13 etc. in AD) or multiple measurement instruments (MMSE, ADAS-cog).
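For a single dichotomous item, the ICC of equation 1.3 could, for instance, take a 2-parameter logistic form. The Python sketch below uses hypothetical parameter values and is not one of the item models of paper VI:

```python
import numpy as np

def icc_2pl(d_i, a_j, b_j):
    """Equation 1.3 for a binary item: 2-parameter logistic ICC, where a_j is the
    item discrimination and b_j its difficulty on the latent severity scale."""
    return 1.0 / (1.0 + np.exp(-a_j * (d_i - b_j)))

# Probability of a correct response for a subject one standard deviation above
# the mean disease severity (all values purely illustrative):
p = icc_2pl(d_i=1.0, a_j=1.3, b_j=0.5)
```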

1.3.3 Monte-Carlo simulations

NLMEMs have multiple levels of random effects and therefore a potentially infinite number of possible states. Monte-Carlo (MC) simulation is a technique used in many computational fields that relies on repeated random sampling to generate and investigate the distribution of states of a system. Within pharmacometrics, MC simulations have a multitude of different applications, including the solution of complex integrals, the visualization of model behavior, the investigation of statistical methods and the simulation of clinical trials. The most frequently used method of MC simulation from NLMEMs is the generation of random effect samples, followed by an evaluation of the model for each of the samples and, finally, a summary of the obtained results. An example of the application of MC simulations is given in example 2.

1.3.4 Maximum likelihood estimation

Parameter estimation provides a bridge between the data collected in an experiment and the model. Generally, the goal is to find a set of parameter values for which the model could have produced the observations. Maximum likelihood estimation (MLE) chooses the model parameters such that the observed data become most likely under the model. This method has a number of good statistical properties and is thus one of the most frequently used parameter estimation methods.


Example 2 – MC simulations. MC simulations can be applied to visualize the range of probable ADAS-cog values for the model from example 1 as well as to simulate trial data. [Figure omitted: the left panel displays different percentiles of individual ADAS-cog scores (no RUV) as a function of age; the right panel shows one sample of trial data for a 2 year disease progression study with 10 subjects (parameter values: θτ = 55, θα = 6, ω² = 0.02 and σ² = 5), with numbers identifying the different subjects.]
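A minimal Python sketch of this simulation recipe for the model of example 1 is given below; the five visit ages are an assumed design for a 2 year study, while the parameter values are those quoted above:

```python
import numpy as np

rng = np.random.default_rng(2014)
theta_tau, theta_alpha, omega2, sigma2 = 55.0, 6.0, 0.02, 5.0  # values from example 2
visits = np.linspace(63.0, 65.0, 5)      # assumed visit ages for a 2 year study

def adas(t, tau, alpha):                 # model from example 1 (no RUV)
    return 70.0 / (1.0 + np.exp(np.log(70.0 / 10.0 - 1.0)
                                + 4.0 * alpha / 70.0 * (tau - t)))

n = 10                                   # subjects in the simulated trial
tau_i = theta_tau * np.exp(rng.normal(0.0, np.sqrt(omega2), size=n))  # IIV samples
eps = rng.normal(0.0, np.sqrt(sigma2), size=(n, visits.size))         # RUV samples
scores = adas(visits, tau_i[:, None], theta_alpha) + eps  # one simulated trial
```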


The log-likelihood (LL) function measures how likely the data are for a specific set of parameters. For a NLMEM, as defined in section 1.3.1, the LL function is the sum of the individual LLs (Li) over all N subjects in the data, i.e.,

$$\mathcal{L}(y, \Theta) = \sum_{i=1}^{N} \mathcal{L}_i(y_i, \Theta) \qquad (1.4)$$

The individual LL is defined as

$$\mathcal{L}_i(y_i, \Theta) = \log \int_{-\infty}^{\infty} u(y_i \mid \theta, \eta, \Sigma) \cdot v(\eta \mid \Omega)\, d\eta \qquad (1.5)$$

where u(·) and v(·) are the individual data density and the population parameter density, respectively. Their product is referred to as the joint likelihood density; the integral over this product is called the marginal density or individual contribution to the LL. For continuous data, u(·) generally corresponds to the density function of the normal distribution, and for categorical data, u(·) is equal to the pdf l(·) in equation 1.2.

The calculation of the individual LL, or more specifically the integration over the IIV random effects, makes MLE in NLMEMs challenging. No analytical solution for the integral in equation 1.5 exists, and thus numerical approximations must be used. Several different methods for the integral approximation exist, and estimation algorithms can be classified as either deterministic or sampling based, depending on the type of integral approximation used. Alternatively, algorithms might be classified according to whether they use a gradient-based method or the expectation maximization (EM) algorithm to maximize the approximate likelihood with respect to the population parameters.

Example 3 – Likelihood. [Figure omitted: the three panels show, as functions of ηi, the individual data density (u(·) in equation 1.5), the population parameter density (v(·)) and the joint density (u(·)·v(·)) under the simulation parameter values, for the data simulated in example 2 for subject 7.]

The following sections provide an overview of the most common NLMEM estimation algorithms. All algorithms, except adaptive Gaussian quadrature (AGQ), are implemented in the NLMEM software NONMEM® [6], which was originally developed by L.B. Sheiner and S.L. Beal at the University of California in San Francisco; NONMEM is now maintained by Icon Development Solutions and is the primary modeling software for this thesis. A comparison of the different estimation algorithms in NONMEM is the subject of one of the papers in this thesis.

Gradient-based estimation algorithms

Gradient-based algorithms operate in iterations. At each iteration, an algorithm uses the current set of population parameter estimates to obtain an approximation of the LL as well as derivative information. From the derivative information, improved parameter estimates for the next iteration are found (see the left panel of the figure in example 4). The algorithms differ mainly in their LL approximation; the most common ones are presented in the following sections.

Adaptive Gaussian quadrature

Gaussian quadrature is a general numerical integration technique suitable when the integrand is the product of two functions, just as in equation 1.5. The integral is approximated by the weighted sum

$$\int_{-\infty}^{\infty} u(y_i \mid \theta, \eta, \Sigma) \cdot v(\eta \mid \Omega)\, d\eta \;\approx\; \sum_{q=1}^{Q} w_q \cdot u(y_i \mid \theta, \eta, \Sigma)\Big|_{\eta = z_q} \qquad (1.6)$$

where Q is the number of quadrature points, which determines the order of the approximation. The optimal location of the quadrature points (zq) is given by the roots of the Hermite polynomials. The weights (wq) can be calculated with a specific algorithm [37].

AGQ improves the approximation by centering and scaling the integrand in equation 1.6 as if it were the pdf of a normal distribution [37]. The mean and variance of this normal distribution are given by the mode and the second derivative (Hessian), with respect to η, of the logarithm of the joint density. The updated quadrature points take the shape of the integrand into account and achieve a much higher accuracy than the non-adaptive method [37]. Furthermore, in contrast to all other gradient-based estimation methods, AGQ can achieve arbitrary precision. However, for every individual a maximization has to be carried out, the second derivative has to be calculated and the integrand has to be evaluated at every quadrature point. This is computationally very expensive if the number of random effects is large. This estimation algorithm is implemented, for example, in the statistical software SAS [56].
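For a single random effect, the (non-adaptive) quadrature sum of equation 1.6 takes only a few lines; the sketch below assumes a normal data density u(·), η ∼ N(0, ω²) and a user-supplied prediction function f:

```python
import numpy as np
from scipy.stats import norm

def marginal_likelihood(y_i, t_i, f, omega2, sigma2, Q=21):
    """Gauss-Hermite approximation of the integral in equation 1.5 for one subject
    with a single random effect eta ~ N(0, omega2) and additive normal error."""
    z, w = np.polynomial.hermite.hermgauss(Q)  # nodes/weights for weight exp(-x^2)
    eta = np.sqrt(2.0 * omega2) * z            # change of variables to N(0, omega2)
    u = np.array([norm.pdf(y_i, loc=f(t_i, e), scale=np.sqrt(sigma2)).prod()
                  for e in eta])               # data density at each quadrature point
    return (w * u).sum() / np.sqrt(np.pi)      # weighted sum of equation 1.6
```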

Laplace approximation

The Laplace approximation can be seen as AGQ with only one quadrature point at the mode of the joint density and is therefore a second-order method (accurate up to quadratic terms in the Taylor expansion of the joint density) [37]. This method is expected to perform well provided the joint density is symmetric around its mode. An illustration of the operation of the algorithm is provided in example 4.
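A numerical sketch of the idea for a scalar η is shown below; log_joint is a placeholder for the log of the joint density u(·)·v(·), and the Hessian is obtained by a central finite difference (illustrative only, not the NONMEM implementation):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_log_marginal(log_joint):
    """Laplace approximation of log( integral of exp(log_joint(eta)) d eta ):
    a single 'quadrature point' placed at the mode of the joint density."""
    eta_hat = minimize_scalar(lambda e: -log_joint(e)).x  # mode of the joint density
    h = 1e-4                                              # finite-difference step
    d2 = (log_joint(eta_hat + h) - 2.0 * log_joint(eta_hat)
          + log_joint(eta_hat - h)) / h**2                # second derivative at mode
    return log_joint(eta_hat) + 0.5 * np.log(2.0 * np.pi / -d2)
```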

First-order conditional approximation

Calculating the second derivative of the joint density is time consuming. This is avoided in the first-order conditional estimation (FOCE) algorithm by approximating the Hessian by its expected value [65]. Generally, the expectation is calculated under the assumption of normally distributed individual data (yi in equation 1.5). With this assumption, the expected Hessian is a function of the model gradient with respect to η. Because of this, the FOCE algorithm is considered a first-order method and is expected to perform well if the individual data approximately follow a normal distribution.

The calculation of the expected Hessian and the determination of the mode require the consideration of extra terms if the residual error function h(·) depends on η. The variant of the FOCE algorithm taking these extra terms into account is referred to as first-order conditional estimation with interaction (FOCE-I).


Example 4 – Laplace approximation. The figure (omitted here) illustrates the gradient-based optimization procedure and the effect of the joint density approximations when using the Laplace approximation method: the left panel traces iterations 1 to 10 in the (θτ, ω²) parameter space, and the right panels show the joint densities of subjects 2 and 7 at iterations 1 and 10. The estimation model is presented in example 1 and the data were simulated in example 2.

First-order approximation

The first-order (FO) estimation algorithm introduces an additional level of approximation by calculating the expected Hessian matrix at η = 0 instead of at the mode of the joint density. This avoids the maximization of the joint density and therefore further reduces the computational burden. The approximation is expected to perform well if the data are approximately normally distributed and the IIV random effects enter the model linearly (at least approximately).

First-order with interaction (FO-I) is the variant of the algorithm for η-dependent residual error functions.

Expectation maximization algorithms

The remaining algorithms described in this section use the EM algorithm to estimate the population parameters. A characteristic of these algorithms is the alternation of an E (expectation) and an M (maximization) step. During the E step, one of the methods described below is used to obtain estimates of the conditional mean parameters for each individual, i.e., the expected values of g(θ, ai, ηi) in equations 1.1 and 1.2 given the current population parameters and the data. During the M step, the likelihood of the data given the current estimates of the conditional mean parameters is maximized with respect to the population parameters. The next iteration of the algorithm uses the updated population parameters during the E step, followed by another maximization. This process repeats until a termination criterion is met, e.g., a change of the population estimates by less than a certain value.
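The alternation can be illustrated with a deliberately simplified toy model, yij = ηi + εij with ηi ∼ N(μ, ω²) and known σ², for which the E step has a closed form (a sketch for illustration only, not one of the NONMEM algorithms described below):

```python
import numpy as np

def em_toy(y, sigma2, iters=100):
    """EM for y_ij = eta_i + eps_ij with eta_i ~ N(mu, omega2) and known sigma2;
    y is an (n_subjects, n_obs) array. Returns estimates of mu and omega2."""
    n, m = y.shape
    mu, omega2 = 0.0, 1.0                                  # starting values
    for _ in range(iters):
        # E step: closed-form conditional mean and variance of each eta_i
        post_var = 1.0 / (m / sigma2 + 1.0 / omega2)
        post_mean = post_var * (y.sum(axis=1) / sigma2 + mu / omega2)
        # M step: maximize the expected complete-data log-likelihood
        mu = post_mean.mean()
        omega2 = ((post_mean - mu) ** 2).mean() + post_var
    return mu, omega2
```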


Iterative two-stage algorithm

The iterative two-stage (ITS) estimation algorithm uses either the Laplace or the FOCE method to obtain estimates of the conditional mean parameters in the E step. Thus, the mean of the distribution is approximated by its mode [61].

Importance sampling

An intuitive way of obtaining estimates for the conditional mean parameters in the E step is to sample from the population parameter density v(η|Ω) in equation 1.5 and calculate the weighted average of g(θ, ai, ηi), with weights corresponding to the data density evaluated at these samples. Similar to the Gaussian quadrature method, this approach can be improved by concentrating the sampling on regions where the value of the joint density is large [4].

Importance sampling uses the FOCE or Laplace method to obtain an approximate sampling density in the first iteration. In the following iterations, the conditional mean parameter, as well as its variance from the previous iteration, is used as the sampling density. The algorithm obtains an exact value for the marginal density as the number of samples approaches infinity [6].

Stochastic approximation expectation maximization

Similar to importance sampling, stochastic approximation expectation maximization (SAEM) uses random samples to evaluate the joint density. However, in contrast to the importance sampling algorithm, SAEM uses very few samples at each iteration (down to 2) and requires more elaborate sampling strategies [6]. Furthermore, at every iteration, the algorithm averages individual parameter samples together with samples from previous iterations, which converge towards the true conditional individual parameter means and variances [6].

1.3.5 Hypothesis tests and power

Formulating and testing hypotheses is an integral part of all scientific research. In pharmacometrics, hypotheses are generally formulated and tested using models. Applications range from the decision between a 1- and a 2-compartment PK model to significance testing of a drug effect.

In a model-based data analysis, hypothesis testing can be formalized as the decision between the following two hypotheses (H):

$$H_0: \Psi(\Theta) = 0 \qquad\quad H_1: \Psi(\Theta) \neq 0 \qquad (1.7)$$

where Ψ(·) is a function of the model parameters Θ. Generally, the model estimated under the restriction Ψ(Θ) = 0 is referred to as the reduced model and the model estimated without this restriction as the full model.


The decision whether or not to reject the null hypothesis is based on the test statistic t(Θ̂), which maps every possible vector of parameter estimates Θ̂ to one of the 2 possible outcomes. The decision boundary, separating the rejection and the non-rejection region, is given by the equation t(Θ̂) = tcut, in which tcut is a pre-specified scalar value.

Within this framework of only two possible outcomes, there are also two possible ways of making an error, referred to as type I and type II errors. When testing for a drug effect, a type I error corresponds to the false identification of a drug effect and a type II error to overlooking an existing effect.

Let p(t|H0) be the pdf of t(Θ̂) given that H0 is true; then α is given by

$$\alpha = \int_{t_{cut}}^{\infty} p(t \mid H_0)\, dt \qquad (1.8)$$

Thus, the decision criterion for the test statistic t is chosen such that the probability for a type I error is equal to α. In a clinical study, this probability is usually set by the regulatory agency.

For the sponsor of the trial, the probability of not identifying an existing drug effect, β, or similarly the probability of the complementary event, correctly identifying a drug effect, π, is of particular interest. The probability π can be calculated using

$$\pi = 1 - \beta = 1 - \int_{-\infty}^{t_{cut}} p(t \mid H_1)\, dt \qquad (1.9)$$

where p(t|H1) is the pdf of t(Θ̂) given that H1 is true.

The explicit forms of p(t|H0) and p(t|H1) depend on the test statistic t(Θ̂) employed. Two main hypothesis tests are used in pharmacometrics: the log-likelihood ratio (LLR) test and the Wald test (see also example 5).

Log-likelihood ratio test

The LLR test evaluates the evidence for the null hypothesis in the LL domain and uses the following test statistic

$$t_{LLR}(\hat\Theta) = \mathcal{L}(\hat\Theta, y) - \mathcal{L}(\hat\Theta_0, y) \qquad (1.10)$$

where L(·) denotes the log-likelihood evaluated for the observed data y under the full model (Θ̂) and the reduced model (Θ̂0), respectively. The LLR statistic follows a chi-square distribution with k degrees of freedom given that the null hypothesis is true, and a non-central chi-square distribution with k degrees of freedom and non-centrality parameter λ if the alternative is true.


Example 5 – LLR vs. Wald test. The figure (omitted here) visualizes the testing of the hypothesis H0: θ = 52 for the maximum likelihood estimate θ̂ with the LLR and Wald tests. The test statistic for the LLR test lies in the LL domain (y-axis) and, for the Wald test, in the parameter domain (x-axis). Also, while the LLR test uses the estimation LL (solid line), the Wald test relies on a 2nd-order approximation of it (dashed line). Both statistics fail to reject H0.

Wald test

Rather than in the LL domain, the Wald test considers the evidence for the null hypothesis in the domain of the parameters, using the following formula

$$t_{Wald}(\hat\Theta) = \Psi(\hat\Theta)^T \left[\frac{\partial \Psi(\hat\Theta)}{\partial \Theta}\, I(\hat\Theta)^{-1}\, \frac{\partial \Psi(\hat\Theta)}{\partial \Theta}^T\right]^{-1} \Psi(\hat\Theta) \qquad (1.11)$$

Here, I(Θ̂)⁻¹ is the inverse of the Fisher information matrix (FIM) and ∂Ψ(Θ̂)/∂Θ is the Jacobian matrix of the constraint function [11]. For simple hypotheses of the form H0: Θ = 0, the Wald statistic becomes Θ̂²/Var(Θ̂).

The Wald statistic also follows a chi-square distribution under H0 and a non-central chi-square distribution under H1. In fact, the LLR and Wald statistics are asymptotically equivalent, and the non-centrality parameter λ is, for both tests, given by tWald [14].
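Because the distributions under H0 and H1 are (non-central) chi-square, prospective power calculation reduces to two distribution-function evaluations, as in the following sketch (λ might come from tWald of equation 1.11 or, prospectively, from the FIM):

```python
from scipy.stats import chi2, ncx2

def power(lam, df=1, alpha=0.05):
    """Equations 1.8/1.9 for a chi-square test statistic: chi2(df) under H0,
    non-central chi2(df, lam) under H1."""
    t_cut = chi2.ppf(1.0 - alpha, df)      # decision boundary for type I error alpha
    return 1.0 - ncx2.cdf(t_cut, df, lam)  # probability of correctly rejecting H0

power(7.85)  # ~0.80, the classic non-centrality needed for 80% power at df = 1
```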

1.4 Trial Design

R.A. Fisher himself highlighted¹ that the planning of an experiment is of similar importance to the analysis of the data afterwards.

¹ “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.”


The following sections introduce some of the pharmacometric trial design methodologies that were used in this thesis. First, the basis of optimal design is introduced (section 1.4.1), followed by a description of the FIM for NLMEMs (section 1.4.2) and of the design optimization process (section 1.4.3). Finally, clinical trial simulations (CTSs) are discussed as an additional tool for the design of clinical trials (section 1.4.4).

1.4.1 Cramer-Rao bound and optimal design

The idea of designing scientific experiments in an optimal way dates back to the beginning of the 20th century [8]. R.A. Fisher first demonstrated that the information contained in experimental data is limited and dependent on the experimental setup [8]. A formal mathematical inequality was established by Cramer and Rao (independently of each other), stating that the inverse of the FIM is a lower bound for the covariance of any unbiased estimator [10, 49]. Formally,

$$I^{-1}(\zeta, \Theta) \leq \mathrm{Var}(\hat\Theta \mid y) \qquad (1.12)$$

where I⁻¹ is the inverse of the FIM, ζ are the design variables and Var is the variance-covariance matrix for Θ̂ given y. The Cramer-Rao bound therefore provides the essential basis for the optimized planning of an experiment, suggesting that the experimental design variables should be chosen such that the FIM is maximal.

The branch of statistics dedicated to the optimization of experimental designs based on the FIM is called optimal experimental design, or simply optimal design.

1.4.2 Population Fisher information matrix

The FIM is mathematically defined as the expectation of the second moment of the score function or, equivalently under certain regularity conditions, as the negative of the expectation of the second derivative of the LL:

$$I(\zeta, \Theta) = -E\left(\frac{\partial^2}{\partial \Theta^2}\, \mathcal{L}(y, \Theta)\right) \qquad (1.13)$$

For linear and nonlinear regression models without random effect parameters, the FIM can be calculated analytically, as demonstrated in the work by A. C. Atkinson [2]. In contrast, the calculation of the FIM for NLMEMs relies on approximations, since analytic solutions do not exist; it is often referred to as the population FIM.

A closed-form solution for the FO-approximated population FIM for continuous data was initially derived by Mentré et al. [34]. Later, this approach was extended by Retout et al. [51], Foracchia et al. [17] and others. These derivations of the FIM rely on the FO approximation described in section 1.3.4 and thus use the same underlying assumptions (most importantly, a normal distribution of the data).

Example 6 – Optimal design. [Figure omitted: the figure visualizes the application of optimal design to determine the optimal study age for the ADAS-cog model from the previous examples. The lowest panel displays the determinant of the FIM for different study ages, and the two panels above show the corresponding expected confidence intervals for the parameters θτ and ω²; the dashed line marks the D-optimal design.]

1.4.3 Design optimization

The comparison of different FIMs, obtained for different designs, is performed using a function that maps the FIM to a scalar number (see example 6). There is an ever-growing number of design criteria focusing on different aspects of the FIM. One of the most common criteria in optimal design is D-optimality, which maximizes the determinant of the FIM and provides a compromise by minimizing the generalized variance of all parameters in the model. Ds-optimality, another common criterion, is useful when a subset of the model parameters is of special interest and should therefore be estimated with the highest possible precision.

The scalar objective function can be optimized with respect to the design variables using a generic optimizer. A number of characteristics make the optimization of experimental designs non-trivial: generally, the optimization problems are multidimensional (e.g., multiple groups with different sampling times) with a non-linear and often non-smooth objective function. Furthermore, when optimizing over multiple design variables, such as dose and sampling times, it can be important to consider a simultaneous optimization of the different design parameters [40].

The dependence of the FIM on the unknown model parameters constitutes a drawback in practice. Global optimal design provides a more flexible framework by allowing parameter distributions instead of requiring point values.
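As an illustration of the criterion itself, the sketch below performs a brute-force D-optimal search for a model without random effects, where the FIM reduces to JᵀJ/σ² (J being the Jacobian of the prediction function with respect to the parameters). The Emax model and all values are stand-ins, and this is not the population FIM machinery of PopED:

```python
import numpy as np
from itertools import combinations

def fim(times, theta, sigma2, f, h=1e-6):
    """FIM = J^T J / sigma2 for a fixed-effects model with additive normal error."""
    J = np.empty((len(times), len(theta)))
    for k in range(len(theta)):
        up, lo = np.array(theta, float), np.array(theta, float)
        up[k] += h
        lo[k] -= h
        J[:, k] = (f(times, up) - f(times, lo)) / (2.0 * h)  # numerical Jacobian
    return J.T @ J / sigma2

def emax(t, th):                           # illustrative stand-in model
    return th[0] * t / (th[1] + t)

candidates = np.linspace(0.5, 24.0, 48)    # candidate sampling times
best = max(combinations(candidates, 2),    # D-optimality: maximize det(FIM)
           key=lambda d: np.linalg.det(fim(np.array(d), [10.0, 2.0], 1.0, emax)))
```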

1.4.4 Clinical trial simulations

CTS is a MC-based method in which a planned clinical trial is simulated from

a pharmacometric model in order to study a large variety of outcome metrics.

For example, the simulated data can be re-analyzed with the simulation model or an
alternative model to evaluate the planned data analysis. Generally, the process

of simulation and re-estimation is repeated several times to obtain more ro-

bust results; this methodology is also referred to as stochastic simulation and

estimation (SSE).

CTSs can also be used to study the expected parameter precision for a cer-

tain study design and therefore could be considered an alternative to optimal

design. However, while CTSs are easy to implement and very versatile, they

are also computationally rather expensive. Furthermore, only the investigation

of a fixed set of candidate designs is practical with CTSs. Optimal design, in
contrast, is less flexible but much faster and allows a systematic search in the

defined design space.


2. Aims

The overall aim of this thesis was to evaluate and develop pharmacometric

methods for design and analysis of clinical trials focusing on the particular

requirements for disease progression studies.

This aim was pursued in three stages: I. foundation of tools, II. expansion of

methodology and III. demonstration of application benefits.

I. Foundation of tools: The goal of the first stage was the characterization

and development of pharmacometric software. A particular focus here was
the evaluation of different NLMEM estimation algorithms for a di-

verse set of data types and the implementation of new functionality in

an optimal design program to facilitate the optimal planning of clinical

studies.

II. Expansion of methodology: The second stage focused on the extension

of the methodology around statistical power. Both the explicit optimization of the study design for maximal power and a novel algorithm for faster power estimation were specific subjects.

III. Demonstration of application benefits: The objective of the final stage

was to demonstrate the potential benefits of applying pharmacometric

methods in disease progression studies. From the perspective of study

design, the benefits and challenges of applying optimal design in late
phase clinical trials were investigated, and for an improved data analysis

of ADAS-cog assessment data, the advantages of using IRT were ex-

plored.


3. Methods

3.1 Data
Item level ADAS-cog assessment data was used in paper VI of this thesis for

the development of an IRT model. The following section describes the sources

and characteristics of these data.

3.1.1 Item level ADAS-cog data

Baseline and longitudinal item level ADAS-cog assessment data were used in

paper VI.

Baseline ADAS-cog data
The baseline assessment data from 2744 patients was taken from two major

AD databases, the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and

the Coalition Against Major Diseases (CAMD). As a collection of various

studies, these two databases contain a number of different ADAS-cog assess-

ment variants.

The data contained in the ADNI database (version as of December 2012)

consisted of a mild cognitive impairment (MCI) group, a mild AD group and

an elderly control group that were followed for 36 months (24 for the control

group). The baseline observations from all groups were used in this work.

The CAMD database (C-Path Online Data Repository) contained, as of

November 2012, the de-identified control arm data from 20 clinical trials.

Eight of those trials contained both total ADAS-cog score and item level data.

The baseline data from seven of these eight studies were used to develop the

IRT model; the remaining study was used for the longitudinal analysis.

Longitudinal ADAS-cog data
The data used to describe longitudinal changes in cognitive disability were

collected from a double-blind, placebo controlled phase III trial to evaluate

atorvastatin (Lipitor R©) effect in mild to moderate AD patients who were on

stable donepezil (Aricept R©) background therapy (LEADe study) [16, 25]. The

study duration was 80 weeks which included a withdrawal phase of 8 weeks.

This analysis included all ADAS-cog 11 assessment data from the placebo

arm (placebo + donepezil) during a double-blind treatment period and the

withdrawal phase. The study included a total of 8 scheduled ADAS-cog as-

sessments, at 0, 3, 6, 9, 12, 15, 18 and 20 month. Overall, the dataset consisted

of 322 patients contributing to 98,439 item level observations.


3.2 Models

The following sections present the different pharmacometric models used in

this thesis: a set of PKPD benchmark models (section 3.2.1), a generic dis-

ease progression model (section 3.2.2), an ADAS-cog summary score model

(section 3.2.3), and an ADAS-cog IRT model (section 3.2.4).

3.2.1 PKPD benchmark models

The models presented in this section are generic PKPD models that were used

for the evaluation of software and novel methodologies.

Estimation method evaluation models
The models included in the evaluation of estimation algorithms in paper II de-

scribed continuous, binary, ordered categorical, repeated time to event (RTTE)

and count PD responses. In the continuous data model, noradrenaline lev-

els were modeled with a direct inhibitory Emax model related to a given

concentration-time profile of moxonidine. The data set contained data from

95 individuals for whom noradrenaline concentrations were available at 3 oc-

casions with 5 samples after each dose. The model consisted of 5 fixed effect

parameters and 4 random effect parameters, 1 of which described the interoccasion variability (IOV), 2 the IIV, and 1 the RUV.

The other 4 models investigated in paper II all contained a time-constant baseline model with a random effect to describe the IIV. In the models for binary, RTTE and count data, the drug effect was incorporated as a fixed effect Imax model, where the drug effect could completely inhibit non-zero responses (binary) or events (RTTE and count). In the model for ordered categorical

data, the drug effect was incorporated as a linear fixed effect with a random

effect to describe the IIV. In each case, the study design included 48 subjects

divided evenly into 4 dose groups (0, 10, 50 and 200) with 8 hourly observa-

tions for all but the RTTE model, and observations each minute (n=480) for

the RTTE.

Power estimation algorithm evaluation models
In paper IV, three distinct models were used to compare the novel power es-

timation algorithm to the MC based method. Model 1 was a disease progres-

sion model which represented a simplified version of the ADAS-cog summary

score model described in section 3.2.3 (linear disease progression with disease

modifying drug and placebo effect).

Model 2 evaluated the probability to detect the auto-inductive effect of a

hypothetical drug for different adherence levels and sample sizes. The differ-

ential equation was taken from the work of Wilkins et al. [66] and consisted

of a classical 1-compartment model with first-order absorption and a separate


liver enzyme compartment. The enzyme compartment and plasma clearance were
linked to handle the auto-induction process.

The last example of paper IV, model 3, evaluated the power to detect a drug

effect for different doses in an event count study. A Poisson count model was

used to describe the drug dependent number of events occurring in 8 hourly

intervals.

3.2.2 Generic disease progression model

Paper III used a generic disease progression model with linear natural history

and a combined symptomatic and disease modifying drug effect. Both natural

history parameters were assumed to vary between individuals (exponential IIV

for the baseline and proportional for the progression rate). The drug effect was

modeled as acting proportionally on the natural history fixed effects and it was

assumed that the IIV during treatment is independent of the natural history

IIV. The RUV was modeled as a combination of an additive and a proportional

error model.

3.2.3 ADAS-cog summary score model

In paper V, the model used for the study design optimization was created by

combining information culled from different sources. The natural history of

the disease progression in AD, as well as the typical placebo response, were

extracted from the meta-analysis published by Ito et al. [23]. A population

model developed with longitudinal ADAS-cog scores from 817 patients (from

the ADNI study) contributed variability information in terms of covariate re-

lationships, IIV as well as RUV [24]. The covariate distribution model was

obtained from the CAMD database [53]. A model describing the dropout was

available through an internally performed analysis of the ADNI data (unpub-

lished). The model was completed using a drug effect model for a hypothetical

disease modifying treatment. A maximal effect of 10% reduction in disease

progression rate with an on- and offset half-time of 3 months was assumed.

3.2.4 ADAS-cog item response theory model

An IRT model for ADAS-cog assessment data was developed in paper VI.

Model building was performed in two stages. In stage 1, a model for the

baseline data was developed and, in stage 2, this model was extended to lon-

gitudinal data.

Baseline model
The IRT model described the response for each of the test items of the ADAS-cog as a function of the patients’ underlying cognitive disability ($D_i$). Most


ADAS-cog test items consist of a number of tasks that the subject is asked to perform, recording whether the patient succeeded or not. These tests, having two potential outcomes, were modeled using a binary model, which described the probability to fail ($p_{ij}$) as a function of cognitive disability using a model commonly referred to as a 3-parameter logit (3PL) in IRT publications. The 3PL

model had the form

$$p_{ij} = c_j + (1 - c_j)\,\frac{e^{a_j(D_i - b_j)}}{1 + e^{a_j(D_i - b_j)}} \tag{3.1}$$

In this parameterization, the three test-item-specific parameters are: 1) $a_j$, the slope or discrimination parameter; 2) $b_j$, the item location or difficulty parameter; and 3) $c_j$, the probability for a subject with no cognitive impairment to fail.

For the word-based tests of the ADAS-cog assessment, “(Delayed) Word Recall” and “Word Recognition”, it was assumed that the item characteristic curves (ICCs) do not differ between words. The resulting count of incorrectly recalled/recognized words, $k$ out of $n$ given words, was described using the binomial model

$$P(Y_{ij} = k) = \binom{n}{k}\, p_{ij}^{\,k}\,(1 - p_{ij})^{n-k} \tag{3.2}$$

where $\binom{n}{k}$ denotes the binomial coefficient. For the word recall tests (3 repeti-

tions of the “Word Recall” test and the “Delayed Word Recall” test), the failure

probability $p_{ij}$ was modeled using equation 3.1. For the “Word Recognition”

test, equation 3.1 was extended to

$$p_{ij} = c_j + (d_j - c_j)\,\frac{e^{a_j(D_i - b_j)}}{1 + e^{a_j(D_i - b_j)}} \tag{3.3}$$

where the additional parameter $d_j$ describes the maximal probability for a

severely cognitively impaired person to incorrectly categorize the words as

previously seen or not.

For the “Number Cancellation” component, a generalized Poisson model

was used to describe the data

$$P(Y_{ij} = k) = \frac{p(D_i)\,\bigl(p(D_i) + \delta k\bigr)^{k-1}\, e^{-p(D_i) - \delta k}}{k!\; P(Y_{ij} > 40)} \tag{3.4}$$

$$p(D_i) = d_j \left(1 - \frac{e^{a_j(D_i - b_j)}}{1 + e^{a_j(D_i - b_j)}}\right) \tag{3.5}$$

where $a_j$, $b_j$, and $d_j$ have a similar interpretation as above and $\delta$ is a dispersion parameter allowing for over- or underdispersion in the data. The factor $P(Y_{ij} > 40)$ in equation 3.4 ensures that all scores predicted by the equation are in the range 0 to 40.


The remaining components are examiner-rated and categorize a subject into 1
of 5 categories (no impairment to severe impairment). These data were mod-

eled using a proportional odds, ordered categorical model. The probability

that a patient received a rating of at least $k$ was described using the function

$$P(Y_{ij} \ge k) = \frac{e^{a_j(D_i - b_{jk})}}{1 + e^{a_j(D_i - b_{jk})}} \tag{3.6}$$

Similar to the 3PL model, $a_j$ is the slope and $b_{jk}$ is the difficulty parameter.

The latter was constrained to be non-decreasing for higher scores of the same

test. The probability of obtaining exactly the score k was then calculated by

subtracting the probability to obtain at least $k+1$ from the probability of ob-

taining at least k.

The variable $D_i$ was modeled as a subject-specific random effect following a normal distribution with a mean of zero and a variance of 1. Note that the assumed scale of cognitive disability goes from $-\infty$ to $+\infty$. This scale is

arbitrary and the theory does not preclude the use of other scales or assumed

distributions.

Longitudinal model
Test-specific parameters in the baseline IRT model were fixed to the previously estimated baseline values, and deterioration, as a consequence of disease progression, was implemented on the hidden variable. The specific implemen-

tation of the disease progression model on the hidden variable scale followed

the model evaluated by Ito et al. [24]. Total change during the study was as-

sumed to be of small magnitude, justifying the following linear expression

$$D_i(t) = D_{0i} + \alpha_i t \tag{3.7}$$

Both the baseline $D_{0i}$ and the slope parameter $\alpha_i$ were assumed to be subject-specific and modeled through random effects ($D_{0i} = \theta_1 + \eta_{i1}$ and $\alpha_i = \theta_2 + \eta_{i2}$), which were allowed to be correlated.
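As a small illustration, the following sketch simulates disability trajectories according to equation 3.7 with correlated baseline and slope random effects; the typical values and the covariance matrix are purely illustrative and are not the estimates from paper VI.

    import numpy as np

    # Simulate subject-specific baselines and slopes (eq. 3.7) from correlated
    # random effects; all parameter values below are illustrative only.
    rng = np.random.default_rng(0)
    theta = np.array([0.0, 0.3])                  # typical baseline and slope
    omega = np.array([[1.0, 0.2],                 # IIV covariance matrix
                      [0.2, 0.05]])
    eta = rng.multivariate_normal(np.zeros(2), omega, size=322)
    D0, alpha = (theta + eta).T
    t = np.arange(0, 21, 3)                       # assessment times in months
    D = D0[:, None] + alpha[:, None] * t          # disability trajectories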

A dropout model for interval-censored data [22] was implemented, describing the probability of a subject remaining in the study beyond a certain time.

Four different hazard functions were tested: constant hazard, cognitive dis-

ability dependent hazard, progression rate dependent hazard and baseline dis-

ability dependent hazard. The hazard function describing the data best was chosen using the LLR test at a 5% significance level.

3.3 Power estimation
Three different approaches were used in this thesis to estimate statistical power

and are described in the following sections: MC-based power (section 3.3.1),

information matrix based power (section 3.3.2) and parametric power estima-

tion (PPE) (section 3.3.3).


3.3.1 Monte-Carlo based power

MC simulations are the most common method of estimating the power of a

future clinical study and were used in paper VI and as a reference in papers III
and IV. In general terms, the method replicates the planned analysis of the trial

for $N_{MC}$ simulated data sets. For each replicate, the simulated data are analyzed and the hypothesis test is carried out by comparing the test statistic to the $1-\alpha$ quantile of the chi-square distribution (see section 1.3.5). The power estimate

is then the fraction of times the hypothesis test was significant.
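In pseudo-code form, the procedure reduces to the following sketch, where simulate_and_fit is a hypothetical stand-in for one simulation and re-estimation replicate that returns the LLR test statistic (the ΔOFV).

    import numpy as np
    from scipy.stats import chi2

    # Sketch of Monte-Carlo power estimation: the power is the fraction of
    # replicates whose LLR statistic exceeds the chi-square critical value.
    def mc_power(simulate_and_fit, n_mc=1000, df=1, alpha=0.05):
        crit = chi2.ppf(1 - alpha, df)            # 1-alpha quantile under H0
        stats = np.array([simulate_and_fit() for _ in range(n_mc)])
        return np.mean(stats > crit)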

3.3.2 Information matrix based power

The possibility to calculate power using the FIM is discussed in paper III. Two

different Wald statistics, corresponding to two alternative formulations of the constraint function $\Psi(\Theta)$ in equation 1.7, were derived and compared to the

MC derived LLR power (section 3.3.1).

The first Wald statistic was derived by assuming that under $H_0$ the true drug effect parameter ($\theta_E$) is equal to 0, leading to the following form of equation 1.11

$$t_{WL}(\theta_E) = \frac{\theta_E^{\,2}}{\mathrm{var}(\theta_E)} \tag{3.8}$$

Since the null-hypothesis is linear in $\theta_E$, $t_{WL}$ will be referred to as the linear Wald statistic.

The second Wald statistic was based on the assumption of equal expected

responses under the null-hypothesis, i.e.,

$$E_{\theta}(y_i) = E_{\theta^0}(y_i) \tag{3.9}$$

From the FO approximation of the likelihood, the constraint function and the

following non-linear Wald statistic were derived

$$t_{WNL}(\hat{\theta}) = (f_F - f_R)^T \left(\frac{\partial (f_F - f_R)}{\partial \theta}\, I(\theta)^{-1}\, \frac{\partial (f_F - f_R)^T}{\partial \theta}\right)^{+} (f_F - f_R) \tag{3.10}$$

where $f_F$ denotes the predictions of the full model, $f_R$ the predictions of the reduced model, $I$ the expected FIM, and $+$ the Moore-Penrose pseudo-inverse.

The power was, for both variants, calculated according to equation 1.9, assuming a non-central chi-square distribution of the test statistic under $H_1$.
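A minimal sketch of how the non-linear Wald statistic and the corresponding power could be computed is given below; f_full, f_reduced, the Jacobian J of their difference, and the expected FIM fim are assumed to come from a model-specific calculation and are not defined here.

    import numpy as np
    from scipy.stats import chi2, ncx2

    # Sketch of the non-linear Wald statistic (eq. 3.10) and the power derived
    # from it, treating the statistic as the non-centrality parameter under H1.
    def nonlinear_wald_power(f_full, f_reduced, J, fim, df=1, alpha=0.05):
        d = f_full - f_reduced                    # difference in predictions
        cov_d = J @ np.linalg.inv(fim) @ J.T      # approximate covariance of d
        t_wnl = d @ np.linalg.pinv(cov_d) @ d     # Moore-Penrose pseudo-inverse
        crit = chi2.ppf(1 - alpha, df)
        return 1 - ncx2.cdf(crit, df, t_wnl)      # power via eq. 1.9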

3.3.3 Parametric power estimation algorithm

The PPE algorithm introduced in paper IV aimed to reduce the number of

replicates required to obtain a stable power estimate by utilizing the theoreti-


cally expected non-central chi-square distribution $f_{\chi^2}(t, k, \lambda)$ of the test statistic. Maximum likelihood estimation can be used to estimate the non-centrality parameter $\lambda$ from a sample of LLR test statistics ($T$) using

$$\hat{\lambda} = \operatorname*{argmax}_{\lambda} \sum_{t \in T} \log f_{\chi^2}(t, k, \lambda) \tag{3.11}$$

Subsequently, λ̂ is used to calculate an estimate for the power from the cumu-

lative distribution function of the non-central chi-square distribution according to equation 1.9.

This novel method was extended to obtain a full power versus sample size curve from one $\hat{\lambda}$ estimate using the scalar multiplicativity of the Fisher information (i.e., let $I_1$ be the Fisher information for $n_1$ subjects; then the Fisher information for $n_2$ subjects is $I_2 = (n_2/n_1)\,I_1$). As a result, if the number of individuals in a study is increased proportionally in every arm from $n_1$ to $n_2$, the non-centrality parameter ($\lambda^*$) for this study is given by

non-centrality parameter (λ ∗) for this study is given by

$$\lambda^* = \frac{n_2}{n_1}\,\lambda \tag{3.12}$$

where $\lambda$ is the non-centrality parameter for the study performed with $n_1$ subjects.
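The following sketch outlines the PPE algorithm as described by these equations; llr_stats is a (small) sample of ΔOFV values from simulation and re-estimation, and the bounds of the one-dimensional search are arbitrary choices.

    import numpy as np
    from scipy.stats import chi2, ncx2
    from scipy.optimize import minimize_scalar

    # Fit the non-centrality parameter lambda of a non-central chi-square
    # distribution to a sample of LLR statistics (eq. 3.11), then compute the
    # power (eq. 1.9).
    def ppe(llr_stats, df=1, alpha=0.05):
        nll = lambda lam: -np.sum(ncx2.logpdf(llr_stats, df, lam))
        lam_hat = minimize_scalar(nll, bounds=(1e-6, 1e3), method="bounded").x
        power = 1 - ncx2.cdf(chi2.ppf(1 - alpha, df), df, lam_hat)
        return lam_hat, power

    # Rescale lambda to another sample size (eq. 3.12) for a full power curve.
    def power_vs_n(lam_hat, n_ref, n_new, df=1, alpha=0.05):
        lam_star = n_new / n_ref * lam_hat
        return 1 - ncx2.cdf(chi2.ppf(1 - alpha, df), df, lam_star)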

3.4 Information calculation
Fisher information was used in several papers of this thesis. The following

sections describe the calculation of Fisher information for the determination

of power (section 3.4.1), the optimization of a late phase clinical trial (sec-

tion 3.4.2) and the evaluation of the ADAS-cog components (section 3.4.3).

3.4.1 Information matrix for power

In paper III, the expected parameter precision in equations 3.8 and 3.10 was derived from two different approximations of the population FIM: the FO and the FOCE-Mode approximation. The FO approximation was based on the approach presented by Foracchia et al. [17]. The FOCE-Mode approximation

was a slightly updated version of the FOCE method described by Retout et

al. [51]. Here the mode and the corresponding random effect values were cal-

culated based on the expected data for the subject. The method is described in

more detail in paper I.

3.4.2 Information matrix for late phase clinical trials

The FIM was approximated using the FO approximation [17, 34]. However,

due to the covariates in model 3.2.3, the FIM had to be averaged over the co-

variate distributions. For the categorical covariates (APOE4 genotype and sex)


this was performed by multiplying every possible combination of categorical

covariates with its corresponding probability and summing all terms together.

This sum was averaged over the remaining continuous covariates (MMSE and

age). The averaging was implemented numerically using 2-dimensional Simp-

son quadrature [12].
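A schematic of this averaging, assuming a hypothetical function fim(apoe4, sex, mmse, age) that returns the FIM for one covariate combination and a density pdf_cont(mmse, age) for the continuous covariates, could look as follows.

    import numpy as np
    from scipy.integrate import simpson

    # Average the FIM over categorical covariates (weighted sum over the
    # empirical probabilities) and continuous covariates (2-D Simpson
    # quadrature over an MMSE-by-age grid).
    def averaged_fim(fim, p_cat, mmse_grid, age_grid, pdf_cont):
        total = 0.0
        for (apoe4, sex), prob in p_cat.items():
            vals = np.array([[fim(apoe4, sex, m, a) * pdf_cont(m, a)
                              for a in age_grid] for m in mmse_grid])
            inner = simpson(vals, x=age_grid, axis=1)   # integrate over age
            total = total + prob * simpson(inner, x=mmse_grid, axis=0)
        return total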

The probability distributions required for the averaging were determined

from covariates in the CAMD trial database. For the categorical covariates,

the empirical probabilities, i.e. the observed frequencies of every covariate

combination, were used. The distribution of continuous covariates was ap-

proximated through a multivariate normal distribution which was fit to the

observed distribution in the database.

As a matter of simplification, it was assumed that the FIM depends on

dropout only in terms of the number of individuals in the study. With this

assumption the FIM could be written as the weighted sum of FIMs with re-

duced study length.

The calculation time for the FIM was reduced by solving the differential

equation describing the delayed onset of the drug effect analytically. Addi-

tionally, the FIM calculations were parallelized to further reduce runtimes.

3.4.3 Information of ADAS-cog components

In paper VI, the Fisher information for cognitive disability in equations 3.1

to 3.6 was calculated analytically. This was feasible given the special structure

of the equations (i.e., the only random effect in the equation was cognitive

disability).

The resulting information functions were visualized to illustrate the sensi-

tivity of each assessment item over the full cognitive disability range.

The Fisher information functions also served as a basis to calculate the av-

erage information of each assessment component in a MCI and a mild AD

patient population. Firstly, mean and standard deviation for cognitive disabil-

ity in the MCI and mild AD cohort of the ADNI study were estimated using the

model shown in section 3.2.4. Secondly, using the two disability distributions

and assuming normality, the average information for each ADAS-cog assess-

ment item was calculated. Thirdly, average item information for all items in

one component was added to yield average component information. Finally,

assessment components were ranked based on their average information con-

tent.
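For a binary item, the Fisher information about $D$ is $p'(D)^2 / \bigl(p(D)(1-p(D))\bigr)$; the sketch below computes it for a 3PL item and averages it over a normal disability distribution (all parameter values are illustrative, not the estimates from paper VI).

    import numpy as np

    # Fisher information of a binary 3PL item about disability D and its
    # average over a normally distributed patient population.
    def item_information(D, a, b, c):
        L = 1 / (1 + np.exp(-a * (D - b)))        # logistic term
        p = c + (1 - c) * L                       # failure probability (eq. 3.1)
        dp = (1 - c) * a * L * (1 - L)            # derivative dp/dD
        return dp**2 / (p * (1 - p))

    def average_information(a, b, c, mu, sd, n=10_000, seed=1):
        D = np.random.default_rng(seed).normal(mu, sd, n)
        return item_information(D, a, b, c).mean()

    # e.g. compare the same item in two hypothetical populations
    print(average_information(1.5, 1.0, 0.1, mu=0.0, sd=1.0),
          average_information(1.5, 1.0, 0.1, mu=1.0, sd=1.0))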

3.5 Simulation studies

The MC-based simulation studies used in this thesis to compare estimation al-

gorithms (section 3.5.1), to evaluate power estimation methods (sections 3.5.2


and 3.5.3), and to compare the ADAS-cog analysis methods (section 3.5.4) are

described in the following sections.

3.5.1 Estimation algorithm comparison

The performance of the estimation algorithms in NONMEM version 7.1.2
was evaluated in paper II with an SSE study. The five PKPD models presented

in section 3.2.1 were used to simulate data sets that were subsequently ana-

lyzed with the different estimation algorithms. The algorithms were evaluated

and compared with respect to bias, precision, robustness, and runtime.

Bias and precision
The bias and precision of the estimation algorithms were evaluated from the normalized estimation errors (NEEs) and the relative root mean square errors (rRMSEs) calculated from 500 SSEs.

NEEs were calculated using the formula

$$\mathrm{NEE}\!\left({}^{a}_{p}\hat{\Theta}_i\right) = \frac{{}^{a}_{p}\hat{\Theta}_i - {}_{p}\Theta}{\mathrm{sd}_r\!\left({}_{p}\hat{\Theta}\right)} \tag{3.13}$$

where ${}^{a}_{p}\hat{\Theta}_i$ represents the estimate of parameter $p$ in data set $i$ obtained with algorithm $a$, ${}_{p}\Theta$ is the true parameter value and $\mathrm{sd}_r(\cdot)$ is a robust estimate of the standard deviation of estimates across algorithms. It was computed by di-

viding the empirically determined interquartile range (IQR), i.e. the difference

between the 75th and 25th percentile of all the estimates from all algorithms,

by the IQR of the standard normal distribution.

rRMSEs were calculated according to

$$\mathrm{RMSE}\!\left({}^{a}_{p}\hat{\Theta}\right) = \sqrt{N^{-1} \sum_{i=1}^{500} \left({}^{a}_{p}\hat{\Theta}_i - {}_{p}\Theta\right)^2} \tag{3.14}$$

$$\mathrm{rRMSE}\!\left({}^{a}_{p}\hat{\Theta}\right) = \mathrm{RMSE}\!\left({}^{a}_{p}\hat{\Theta}\right) \big/\, \mathrm{RMSE}\!\left({}^{\mathrm{FOCE}}_{p}\hat{\Theta}\right) \tag{3.15}$$

The rRMSE was further summarized per algorithm by calculating the average

across all random and fixed effect parameters of a model.
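In code, the two metrics could be computed as in the following sketch, where the robust standard deviation divides the empirical IQR by the IQR of the standard normal distribution (approximately 1.349).

    import numpy as np
    from scipy.stats import norm

    # Normalized estimation errors (eq. 3.13) with a robust standard deviation.
    def nee(estimates, theta_true, all_estimates):
        iqr = np.subtract(*np.percentile(all_estimates, [75, 25]))
        sd_r = iqr / (norm.ppf(0.75) - norm.ppf(0.25))  # iqr / ~1.349
        return (estimates - theta_true) / sd_r

    # rRMSE relative to the FOCE estimates (eqs. 3.14 and 3.15).
    def rrmse(estimates, theta_true, foce_estimates):
        rmse = lambda e: np.sqrt(np.mean((e - theta_true) ** 2))
        return rmse(estimates) / rmse(foce_estimates)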

Robustness
The robustness of the estimation algorithms was evaluated by performing each estimation of the SSEs twice: once with initial estimates set to the true parameter values, and once with randomly generated initial estimates (CHAIN option in NONMEM 7). Afterwards, the number of significantly different final estimates was counted and used as a robustness metric. The 2 estimates were regarded

as significantly different if they were more than 1.96 standard deviations apart

(95% confidence level). As above, a robust estimate of the standard deviation

was used.


Runtime
The runtimes reported by NONMEM for the 100 estimations with initial estimates set to the true parameter values were used to calculate the average estimation time for each algorithm and each model separately.

3.5.2 Information matrix based power evaluation

The generic disease progression model presented in section 3.2.2 was used to compare the linear and non-linear Wald statistics (section 3.3.2) to the LLR test power. The latter was calculated from 1000 SSEs according to the MC-based

power estimation method described in section 3.3.1. Both FO and FOCE ap-

proximation methods were used for the comparison.

3.5.3 Parametric power estimation algorithm evaluation

Three different pharmacometric models (section 3.2.1) were used to compare

the performance of the PPE and the pure MC-based algorithm under different sampling sizes. In all evaluation scenarios, the power obtained through the MC power estimation algorithm with $N_{MC}$ = 10,000 samples served as a reference value. Power estimation with both algorithms was repeated 1,000 times to

evaluate the uncertainty in the estimates. The Δ-OFVs used for the evaluations
were sampled (with replacement) from the 10,000 Δ-OFVs generated for the

reference power.

3.5.4 Power comparison for ADAS-cog analysis approaches

The MC-based power estimation method described in section 3.3.1 was used

to compare the power to detect a drug effect for 3 different data analysis meth-

ods: (1) least-square-means (LS-means) analysis, (2) analysis using a longi-

tudinal pharmacometric model for the total ADAS-cog score and (3) analysis

using the longitudinal IRT model (section 3.2.4).

The CTSs were performed under the scenario of a hypothetical phase III

trial in mild to moderate AD patients for a disease modifying agent. The

trial duration was set to 20 months with a balanced parallel-arm study design

(placebo and treatment) and 7 ADAS-cog assessments per subject (0, 3, 6,

9, 12, 15 and 18 months). The longitudinal IRT model with the parameter

values estimated from the longitudinal data was used as the simulation model.

A hypothetical drug effect was introduced as a 20% lower subject-specific

disease progression rate in the treatment group.


4. Results

This chapter describes the results obtained in papers I to VI. The structure of

this chapter reflects the pursuit of the thesis’ aims in three stages. First, the

results for the foundation of tools are presented (section 4.1), followed by an

evaluation of the expanded methodology (section 4.2) and concluded by the

demonstration of application benefits (section 4.3).

4.1 Foundation of tools

The optimal design software PopED and the NLMEM software NONMEM

served as important tools in this thesis. An extension of PopED, as described

in section 4.1.1, and characterization of the estimation methods in NONMEM,

presented in section 4.1.2, are the foundation of papers III to VI.

4.1.1 PopED - a population optimal design tool

PopED was initially developed by Foracchia et al. [17] and was considerably

extended as a part of this thesis. The results of this extension, in terms of

the software architecture and numerical implementation, are described in this

section.

Software architecture
The software architecture chosen for PopED is illustrated in figure 4.1. It

consists of a graphical user interface (GUI) and a calculation engine which

communicate through the generation of extensible markup language (XML)

files.

The GUI was written in C# .NET Framework 2.0. It enables usage of

PopED without extensive knowledge of programming. The GUI presents de-

sign and optimization settings in a more user-friendly interface, but all settings

may also be entered directly in the PopED configuration file. The calculation

engine consists of a library of MATLAB® functions to perform the FIM calculation, to optimize with different design criteria, and to generate the diagnostic plots and result files. The calculation engine also handles the translation

of the PopED XML settings file into MATLAB code, which is then executed within

the PopED script function.

A number of tools to visualize model responses and diagnose optimization

results were included in PopED. These tools can visualize the typical model


Figure 4.1. Overview of the software architecture of PopED.

response, the results of an optimization, and the influence of a pair of design

variables on the chosen criterion. Another tool realized in PopED allows for

translating design efficiency to number of individuals, by plotting the lowest

and highest efficiencies versus number of individuals. Finally, a sampling

window tool was implemented providing the user with information about the

design sensitivity.

PopED was extended by parallelization support using either the MATLAB

Parallel Computing ToolboxTM or Open MPI [18]. The Open MPI imple-

mentation was tested by evaluating 231 designs for two models with different

calculation times (≈ 1s and ≈ 60s per FIM) across a varied number of CPU-

cores. On a 4 CPU-core system with an additional job manager, the parallel

implementation resulted in a reduction in runtime by a factor of 6.8 for the fast

model and a factor of 7.6 for the slow one.

Numerical implementation
Several different FIM approximations were implemented in PopED. An extended version of the FO approximation derived by Mentré et al. [34] was made available in PopED, supporting a general residual variability model and a FIM with IIV covariance terms as well as IOV. In addition, the calculation of

interaction terms between IIV/IOV and the RUV model was included through

the FO-I approximation. Several implementations of the population FIM as-


sume that the variance of the model is independent of the change in typical

values. This assumption accelerates the calculation of the FIM but can have

implications for the optimal design [41]. In PopED, both options, full FIM

and reduced FIM, were enabled. Furthermore, the FOCE approximation of the

FIM as described by Retout et al. [51] was added to PopED. Finally, a first-

order conditional mode approximation to mimic the FOCE algorithm more

closely than the FOCE FIM approximation was implemented.

PopED focuses on exact experimental designs that take the discrete nature

of experimental units into account. Finding initial guesses in this situation

can be complicated. Hence, it is important to use an optimizer which is ro-

bust with respect to the choice of the initial values. In PopED two different

asymptotically global optimization techniques were included: 1) adaptive ran-

dom search followed by stochastic gradient or a Broyden-Fletcher-Goldfarb-

Shanno optimization and finally a line search [17], or 2) a modified Fedorov

exchange algorithm [15, 43].

The calculation of the FIM for mixed effect models includes numerous

derivatives. In PopED, a combination of different differentiation strategies

was made available: 1) symbolic calculations of analytic derivatives using the

symbolic toolbox in MATLAB for simple models with a closed form solution,

2) automatic differentiation using the INTLAB package [55] for more com-

plex functions, 3) complex step differentiation [32] when numerical deriva-

tives are necessary and 4) classical central differences as a numerical method

when complex differentiation cannot be used.
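As a brief illustration of strategy 3, complex step differentiation evaluates the function at a complex-perturbed point and reads the derivative off the imaginary part, which avoids the subtractive cancellation that limits classical finite differences; a minimal sketch:

    import numpy as np

    # Complex step differentiation: f'(x) ~ Im(f(x + i*h)) / h for analytic f.
    def complex_step_derivative(f, x, h=1e-20):
        return np.imag(f(x + 1j * h)) / h

    print(complex_step_derivative(np.exp, 1.0))   # ~ e = 2.71828...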

For the solution of ordinary differential equations (ODEs), the classical solvers

available in MATLAB (i.e., ode45 for non-stiff and ode15s for stiff ODEs)

were extended through a solver using matrix exponentials for linear homo-

geneous differential equations. Furthermore, a solver using Krylov subspace

projection techniques [59] for linear nonhomogeneous differential equations

with constant input was added. Both the matrix exponential and the Krylov

subspace projection, produced more accurate results and were faster than the

general solvers when applicable.
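For linear homogeneous systems $dx/dt = Ax$, the matrix exponential gives the exact solution $x(t) = e^{At}x_0$; a minimal sketch for a 1-compartment model with first-order absorption (illustrative rate constants) follows.

    import numpy as np
    from scipy.linalg import expm

    ka, ke = 1.0, 0.1                             # illustrative rate constants
    A = np.array([[-ka, 0.0],                     # absorption compartment
                  [ ka, -ke]])                    # central compartment
    x0 = np.array([100.0, 0.0])                   # dose in absorption depot

    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, expm(A * t) @ x0)                # exact solution x(t)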

4.1.2 Performance of NLMEM estimation algorithms

The comparison of estimation algorithms available in the current version of

NONMEM was carried out with respect to bias, precision, robustness, and

runtime.

Bias and precision
The NEEs for each algorithm, stratified by model type, are shown in figure 4.2.

The median estimate should be close to zero to be considered unbiased and

the range of estimates should be approximately $\pm\mathrm{sd}_r$ from the median to be

considered precise. A table with rRMSEs as the second bias and precision

metric is available in paper II.


Figure 4.2. Point range plot of the normalized estimation errors for fixed (black) and random effect parameters (grey), stratified by model type (continuous, binary, OC, RTTE, count). The median estimation error of each parameter is shown as a point, with a vertical line indicating the range of the estimates (1 standard deviation).

As shown in figure 4.2, for the continuous data model the IMP algorithm

was most precise and least biased, visible both through the NEEs and the mean

rRMSE. IMPMAP and SAEM were the second and third best algorithms. The

ITS and BAYES algorithms showed marked bias for some of the parameters.

For the binary data model, IMP, IMPMAP and SAEM algorithms displayed

very similar performance with low bias and good precision for fixed and ran-

dom effect parameters. The remaining three algorithms produced biased random effect estimates; ITS and BAYES also had biased fixed effect estimates.

The Laplace algorithm performed best for the ordered categorical data mod-

els, producing estimates with low bias and high precision. IMP and SAEM

showed a similarly good performance but with a mean rRMSE slightly larger

than 1 (the FOCE/Laplace reference value). The ITS algorithm displayed the

poorest performance with imprecise fixed and biased random effects.

For the RTTE and count models, the differences between algorithms were

minor; all produced negatively biased random effect estimates in both mod-

els. Bias and precision were good for the fixed effects for all but the BAYES

algorithm.


Robustness
Clearly, the continuous data model represented the biggest challenge for the algorithms; all showed more than 5% significantly different parameters. The best performing algorithm for this data type was SAEM with 8%, and the worst performing was BAYES with 38% significantly different parameters.

For the RTTE and binary data models, all algorithms showed a high degree

of robustness with percentages of significantly different parameters close to or

below the expected level of 5%.

The ordered categorical and count models showed opposite trends: while only SAEM (8%) and BAYES (15%) showed more than 5% differences for the ordered categorical model, only ITS (14%), IMP (12%) and IMPMAP (12%) did so for the count model.

Runtime
The average runtimes across all data sets for each model and each algorithm are presented in figure 4.3; data are shown on a logarithmic scale due to the large differences between algorithms and models.

Across all models, FOCE/Laplace had the shortest runtimes, followed by

the ITS algorithm with runtimes equal to Laplace for the ordered categori-

cal model but 2-3 times slower than FOCE/Laplace for the remaining models.

The IMP and IMPMAP algorithms had similar relative runtimes for all models

with IMPMAP nevertheless being always a bit faster than IMP. The relative

runtimes were between 25 (continuous data model with IMPMAP) and 77 (bi-

nary model with IMP) times longer than for the FOCE/Laplace algorithm. The

runtimes were longer for IMP and IMPMAP than for the SAEM algorithm for

all models, except the count model, where they were similar for all three algo-

rithms, and the continuous model, for which IMPMAP and SAEM had similar

runtimes. Runtimes for the SAEM algorithm were between 8.4 (ordered categorical model) and 54 (count model) times longer than for the FOCE/Laplace algorithm.

Figure 4.3. Average estimation time in seconds for each algorithm and model type, shown on the log-scale.


The BAYES algorithm was the slowest for all models; the relative runtimes were between 47 (ordered categorical model) and 280 (count model) times longer than for the FOCE/Laplace algorithm.

Figure 4.4. Comparison of power curves as predicted by the linear Wald (Wald) or non-linear Wald (NL Wald) statistic and through simulation and estimation followed by a log-likelihood ratio test (LLR).

4.2 Expansion of methodology

This thesis extended the existing methodologies for calculation of statistical

power in disease progression studies and beyond. The comparison of the novel

methods for power calculation with the more established LLR MC simulations

is subject of the following sections. First, the results of evaluating the the

information matrix based power are presented in section 4.2.1 and then, the

performance of PPE is assessed in section 4.2.2.

4.2.1 Information matrix based power

Figure 4.4 compares the empirical, LLR-based power to detect a symptomatic

or protective drug effect with the power as predicted by the linear and non-

linear Wald statistic. All three power curves for both drug effects are shown

as obtained from the FO and FOCE approximation.


For the symptomatic drug effect and the FO approximation method, there

is a high degree of correspondence across the whole power curve between all

three test statistics. At higher power values, both Wald statistics slightly overpredict the LLR test power, with the non-linear Wald statistic being marginally

lower than the linear one. Under the FOCE approximation, the deviation be-

tween empirical and predicted power increases with an increasing number of

individuals. The difference between the two Wald statistics, however, is minor: 230 individuals are needed to reach 80% power according to the linear versus 239 individuals according to the non-linear Wald statistic (314 for the empirical

power).

In contrast, the predicted power to detect a protective drug effect differs

considerably between both Wald statistics for the FO and FOCE approxima-

tions. In comparison with the LLR power, the linear Wald statistic drastically

over predicts the power and reaches 80% already with less than 30 individ-

uals. The non-linear Wald statistic on the other hand, is in good agreement

with the LLR power over the whole range from 5 to 400 individuals and both

approximation methods.

4.2.2 Parametric power estimation

For all three examples (disease progression, auto-induction and count study)

the power estimated through PPE and MC algorithms were in very good agree-

ment. In figure 4.5 the power versus sample size curves for different adher-

ence levels from the auto-induction example are shown.

Figure 4.5. Power versus total number of subjects in the trial for different compliance levels in an auto-induction study, as estimated with the parametric power estimation (PPE) and the Monte-Carlo (MC) algorithm; the reference power (Ref.) was determined from a 10,000-sample estimation with the MC algorithm.

Figure 4.6. Range of estimated power versus number of samples used in the power estimation for the parametric power estimation (PPE) and the Monte-Carlo (MC) algorithm (auto-induction study).

Each panel displays

the power curve determined using the PPE algorithm (10 subjects, 100 MC

samples) with the power values obtained through the pure MC algorithm (100

MC samples) for sample sizes of 5 and 10. For the median power, both algo-

rithms show a very good agreement between each other and with the reference

power across all 3 scenarios. Furthermore, the variability in power estimates

from 1,000 repetitions of the estimation procedures is significantly lower for

the PPE algorithm. This is highlighted even more in figure 4.6 displaying the

range of predicted power for 70-100% compliance and 10 subjects in the study
for both algorithms and differing numbers of MC samples. For one spe-

cific study size, the PPE algorithm required about half as many simulations

and estimations to achieve the same precision as the pure MC based method.

Furthermore, from one estimate of the PPE algorithm a full power versus study

size curve could be obtained with simple rescaling. For the examples investi-

gated, the full power curves were also in good agreement with the MC based

evaluations.

4.3 Demonstration of application benefits

The resulting benefits of applying pharmacometric methods in disease pro-

gression studies are presented in the following sections. Specifically, sec-

tion 4.3.1 illustrates the use of optimal design methodology in late phase tri-

als and section 4.3.2 highlights the advantages of an IRT-based analysis of

ADAS-cog score data.


4.3.1 Optimal design in late phase trials

Optimal design has been recognized as a valuable tool for the planning of studies and is currently used in most large pharmaceutical companies [33].

Current applications, however, are largely limited to early phases of the drug

development process [33]. The increased complexity of late phase trials is a

major reason why optimal design is not applied at these stages. The following

section presents the challenges and benefits of applying optimal design in late

phase clinical trials using the example of an AD study.

Challenges
Several challenges had to be addressed to apply optimal design methodology

in this setting. First of all, the lack of an integrated AD model was handled by

combining information from multiple sources: the literature, publicly avail-

able clinical trial databases as well as prior knowledge of the compound under

development (a hypothetical drug effect in this work). The resulting model,

described in section 3.2.3, integrated the available knowledge in a coherent

framework.

The model had typical complexities seen in late phase pharmacometric

models such as covariates and dropout. The covariates in the model were

handled by averaging the FIM over the joint covariate distribution that was es-

timated from the CAMD trial database. This approach provided a stable FIM

but ignored correlations between categorical and continuous covariates.

The presence of dropout was addressed by excluding the dropout parame-

ters from the FIM and considering only the reduction in number of individ-

uals over time through this process. Basically, this approach assumes non-

informative dropout and separates the estimation of dropout and disease pro-

gression parameters.

The computation time for the FIM is an essential determinant of how extensively the available design space can be explored. Usually, several hundred

FIM evaluations are performed during an optimization. In this work, the com-

putation of 1 FIM for the initial ODE-based model took more than 15 minutes

leading to an optimization time of several days. By solving the ODE analyt-

ically and parallelizing the computations, the evaluation time was reduced to less

than 30 seconds resulting in a total optimization time of a few hours.

Practical constraints are a common complication when applying optimal

design to the clinical setting. These can range from obvious limitations such

as the non-practicality of replicate blood samples to less obvious ones such as

the working hours of the study nurse. In this work, the time between ADAS-

cog assessments was restricted to be at least a month. Mathematically, this was

included in the optimization using inequality constraints and solved using the interior point algorithm from the MathWorks® Optimization Toolbox.
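The sketch below shows how such a minimum-spacing requirement can be encoded as linear inequality constraints; SciPy's SLSQP solver is used here as a stand-in for the MATLAB interior point algorithm, and the objective is only a placeholder for the actual FIM-based criterion.

    import numpy as np
    from scipy.optimize import minimize

    # Placeholder objective standing in for the (negative log-determinant of
    # the) FIM-based design criterion.
    def neg_objective(times):
        return -np.sum(np.log(np.diff(np.sort(times)) + 1.0))

    t0 = np.array([0.0, 3.0, 6.0, 12.0, 18.0, 24.0])   # initial design (months)
    constraints = [{"type": "ineq",                     # t[i+1] - t[i] >= 1
                    "fun": lambda t, i=i: t[i + 1] - t[i] - 1.0}
                   for i in range(len(t0) - 1)]
    bounds = [(0.0, 24.0)] * len(t0)

    res = minimize(neg_objective, t0, bounds=bounds, constraints=constraints)
    print(res.x)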


Benefits
Beginning with a reference design of a typical phase III trial in AD with parallel groups (placebo/standard of care vs. active treatment) and 300 patients per arm, design optimizations with 3 different criteria were carried out. The

first optimization strategy maximized the D-optimality criterion which was

interpreted as maximization of the overall information collected in the trial.

For the second optimization strategy, the goal was to minimize the number of

interventions for each patient while roughly maintaining the information from

the reference design. The third optimization strategy aimed at maximizing the

power to detect a drug effect in the trial. The final optimal designs as well as

the reference design are summarized in table 4.1. The optimal values for some

of the design variables differed considerably.

Table 4.1. Design variables for the reference and optimized designs

                                                     Treatment       Inclusion
Design               Sampling times                  Start   Stop    Age     MMSE
Reference            0, 3, 6, 12, 18, 24             0       24      55-90   14-26
Max. information     0, 1.2, 4.6, 10.8, 14.1, 24     0       16.7    55-65   8-19
Min. interventions   0, 2.2, 14.6, 24 and            0       12.6    55-65   8-19
                     1.1, 4.8, 7.5, 24
Max. power           0.4, 3, 4.8, 13.6, 22.9, 24     0       24      55-65   8-17

(The minimal interventions design has two sampling schedules, one for each pair of added study arms.)

Compared to the reference design, the maximal information design had an
efficiency of 135%. Therefore, in order to obtain the same amount of information

as the maximal information design, at least 35% more individuals would have

been required under the reference design.

The number of samples per subject in the minimal interventions design is

reduced from the reference by 33% (from 6 to 4) while decreasing the overall

information content by merely 2.6%. For the total trial with 600 patients, this corresponds to a saving of about 1,200 visits. This is

achieved by adding two additional arms to the study, one placebo and one

active treatment, and by optimizing the sampling times in both groups.

The power to detect the treatment effect under the reference as well as under

the 3 optimal designs is compared in figure 4.7. With the reference design

approximately 1388 subjects are required to reach 80% power; only 423 are needed with the power optimal design. This corresponds to a reduction in the number of individuals in the trial by 70%.

Figure 4.7. Power to detect a drug effect versus sample size for the reference and the 3 optimal designs. The dashed line shows the power for an MMRM analysis of ADAS-cog score change from baseline.

4.3.2 IRT based modeling of ADAS-cog data

The use of IRT to analyze ADAS-cog assessment data resulted in a number of

benefits: 1) ADAS-cog assessment variants from different studies were ana-

lyzed in a common framework, 2) the information content of the ADAS-cog



assessment components was quantified and 3) longitudinal trial data were de-

scribed better. The following sections describe these benefits in more detail.

Joint analysis of multiple ADAS-cog variants
The data from all 8 studies (section 3.1.1) were analyzed in a joint model

(section 3.2.4) and all test-specific parameters were successfully estimated.

The resulting ICCs are shown in figure 4.8.

For the binary response components (top row), all curves have the charac-

teristic S-shape in the range of −4 to 4, with a low failure probability at low disability values and a high failure probability at higher values. The construction com-

ponent (top row, second column) is illustrative for understanding the individ-

ual ICCs. While the task “Draw a Circle” can even be performed by patients

with large disability, the “Draw a Cube” task represents a challenge for even

20% of healthy elderly subjects. It is important to note that a non-zero inter-

cept can be caused by a number of different reasons unrelated to cognition,

i.e., the probability to fail might not depend exclusively on cognitive disability

but also on other non-considered factors. For example, the non-zero intercept

for the task “Tap Shoulder” could be due to a certain percentage of elderly

people with a restricted range of motion.

Noteworthy are also the ICCs for the 3 repetitions of the “Word Recall”

test. While healthy individuals improve by the second repetition, patients with high cognitive disability values show little change between repetitions.

The panels for the ordered categorical type of response show the probability

of a certain classification as a function of cognitive disability.

Figure 4.8. Item characteristic curves for the different items of the ADAS-cog assessment.

It is apparent from the figure that category 0 is the most frequently assigned category in all components. The figure also illustrates that there is considerable overlap between

categories, indicating that patients with a specific cognitive disability value

can be assigned to different categories with similar probability.

Goodness of fit and simulation diagnostics performed for all test items and

all studies qualified the chosen probabilistic model and underlined the validity

of the different models across studies.

Component information content
Figure 4.9 depicts the Fisher information of the different ADAS-cog items as

a function of cognitive disability. A task with a higher information value will

determine a subject’s cognitive disability more precisely. For reference, the

95% prediction intervals of cognitive disability in the MCI and in the mild

AD population as estimated from the ADNI database are also shown. The

information curves clearly differ both in amplitude as well as in location of

the maxima. Most items have their information peak to the right of the 95%

cognitive disability intervals, indicating a higher sensitivity for patients more

severely impaired than those studied in the ADNI study. An exception is the “Delayed Word Recall” test, which has a particularly high amplitude and is most informative at relatively low disabilities. Also noteworthy is the particularly low information content of the rater-assessed items in the bottom row of figure 4.9.

Figure 4.9. Information content for the different items of the ADAS-cog assessment versus cognitive disability.

A quantitative evaluation of the information content for each item can be

obtained by calculating the expected information for the MCI and the mild AD

patient populations. The total information, i.e., the sum over all components,

is 25% higher in the mild AD patient population than in the MCI population.

Also ranking the individual components by their contribution yields consider-

ably different results. For the MCI population, about 90% of the information

is contained in 6 components (in order of their information content): “Delayed

Word Recall”, “Word Recall”, “Orientation”, “Word Recognition”, “Naming

Objects & Fingers” and “Number Cancellation”. For the mild AD popula-

tion, information is more evenly distributed between components. While the

“Delayed Word Recall” component carries most information in the MCI pop-

ulation, it is only ranked 3rd for mild AD patients and its information content

dropped from 4.8 to 3.3. The “Orientation” component, in contrast, has much

higher information content in a mild AD population (5.01) than in an MCI

population (2.02). These differences are remarkable considering that the two

populations have significant overlap (see shaded area in figure 4.9).


Improved description of trial data
The longitudinal IRT model (section 3.2.4) was successfully fit to ADAS-cog

assessment data from the LEADe study (section 3.1.1). The simulation based

diagnostics indicated a very good description of the observed data, both on the

item and on the summary score level.

Based on a simulated dataset from the longitudinal IRT model, a pharma-

cometric total ADAS-cog score model was built. The final pharmacometric

total ADAS-cog score model was compared with a LS-means analysis and the

longitudinal IRT model for their ability to detect a drug effect. Compared to

the LS-means analysis, both pharmacometric methods provided considerably higher power, with the IRT-based method being the highest amongst the three. More

specifically, in order to achieve 80% power the IRT based model required 71%

fewer subjects than the LS-means analysis and 23% fewer subjects than the

pharmacometric summary score model. The increase in power was achieved

without inflation of the type-I error, as confirmed through a simulation and

estimation study.


5. Discussion

5.1 Foundation of tools
The characterization and development of general pharmacometric software

were the subject of the first part of this thesis and provide the foundation for

later results.

Designing a clinical trial is a highly complex task requiring the consid-

eration of numerous factors and their potential influence on the outcome of

the study. Optimal design methodology provides techniques that can assist

in evaluating and finding study designs. The extension of the optimal design

software PopED, presented in paper I, made these techniques available in an easy-to-use tool. PopED strives to be user-friendly without sacrificing func-

tionality by providing a GUI together with a rich set of available features as

well as flexible extension mechanisms. The utility of the software developed

in this thesis was twofold: 1) serve as a research tool and 2) as a vehicle to

bring the developed methods into clinical practice.

Parameter estimation is an essential part of the pharmacometric work-flow.

For NLMEMs the estimation of parameters is challenging; no closed form

solution of the likelihood exists and approximations are required. Qualifying

the influence of these approximations has been recognized as a crucial step,

as shown through the large number of studies on this topic [5, 45, 46, 57,

27, 19, 29]. However, paper II was the first study to evaluate bias, precision,

sensitivity with respect to initial estimates and runtime of NLMEM estima-

tion algorithms with a diverse set of models for different data types. The

IMP algorithm showed the best performance in terms of bias and precision,

FOCE/Laplace was the fastest algorithm with satisfactory bias and precision

results. In practice, the choice of an estimation algorithm is dependent on

a multitude of factors, objective, such as the ones used in this comparison,

but also subjective, like the analysts familiarity with a certain algorithm. The

best estimation algorithm might also change throughout different phases of an

analysis for runtime reasons: a fast algorithm for rapid model building com-

bined with a slower more precise algorithm for pivotal steps, for example.

Sometimes, runtime restrictions will forbid utilization of computationally ex-

pensive methods, e.g. during a power analysis with hundreds of simulations

and estimations. Other times, for example when critical drug development de-

cisions will be based on a particular model, the added benefit of more precise

estimates is worth almost any additional computing time.

Based on the findings of paper II, FOCE/Laplace was used throughout pa-

pers III-VI as the standard estimation method.


5.2 Expansion of methodology

The prediction and optimization of the expected power of a planned trial

through expanding the available methodology were the focus of the second

part of the thesis.

The utilization of the FIM to predict power and calculate sample sizes in

NLMEMs is not new [50, 42]. However, previous studies used the linear Wald

statistic, which provided satisfactory results in the investigated cases but can-

not be generalized as demonstrated in paper III. This is due to interactions

between parameters that are not taken into account with the classical Wald

test. The non-linear Wald statistic suggested in paper III respects these in-

teractions and delivers power predictions that are in close agreement with the

LLR statistic across all presented scenarios.

The possibility to predict the expected power of a clinical trial accurately

based on the FIM allows for effective communication of the influence of dif-

ferent designs. The increase in probability to detect an existing drug effect is

more illustrative than, for example, reporting the efficiency of a design. Fur-

thermore, it provides the basis for an explicit optimization of trial designs for

statistical power. The latter is of special importance considering the general

challenges in disease progression studies described in section 1.2.3 and was

demonstrated in paper V.

Disadvantages of the purely asymptotic method to estimate power derive

from the approximation of the population FIM. Furthermore, asymptotic meth-

ods do not take the behavior of the estimation algorithm into account and re-

quire implementation of the model in a separate software tool. Most importantly,

calculating the power via the expected information matrix is challenging for

categorical data. Therefore, PPE was presented in paper IV as a novel algorithm
to estimate power. The algorithm estimates the unknown parameter in the
theoretical distribution of the test statistic under the alternative hypothe-

sis to obtain more precise estimates with fewer MC samples. For the examples

investigated in paper IV, the PPE algorithm required about half as many simu-

lations and estimations to achieve the same precision in the power estimate as

a purely MC-based method. The advantage provided is even larger when the

influence of study size on power is evaluated, as the full power versus study

size curve can be obtained from one set of SSEs.
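A minimal sketch of this idea follows, assuming (as in standard asymptotics) that the likelihood-ratio statistic follows a noncentral chi-square distribution under the alternative hypothesis and that its noncentrality parameter scales approximately linearly with the number of subjects; the function names are illustrative and not the paper’s implementation.

    import numpy as np
    from scipy import optimize, stats

    def estimate_noncentrality(delta_ofv, df=1):
        """ML fit of the noncentrality parameter of a noncentral chi-square
        distribution to Monte-Carlo likelihood-ratio statistics."""
        x = np.clip(np.asarray(delta_ofv), 1e-10, None)
        nll = lambda nc: -np.sum(stats.ncx2.logpdf(x, df, nc))
        return optimize.minimize_scalar(nll, bounds=(1e-6, 1e4),
                                        method="bounded").x

    def power_curve(delta_ofv, n_mc, study_sizes, df=1, alpha=0.05):
        """Power versus study size from one set of SSEs performed with
        n_mc subjects (noncentrality assumed proportional to N)."""
        nc = estimate_noncentrality(delta_ofv, df)
        crit = stats.chi2.ppf(1 - alpha, df)
        return {n: stats.ncx2.sf(crit, df, nc * n / n_mc)
                for n in study_sizes}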

Applied to disease progression studies, the PPE algorithm can quickly gen-

erate power curves for competing study designs or be used to evaluate the

influence of the data analysis method for a future study (as done in paper VI).

The method is especially beneficial for complex models with multiple com-

ponents (PKPD, disease progression, dropout, etc.) and large study sizes, as

frequently encountered in disease progression studies. In contrast to the FIM-

based power estimation, the PPE algorithm can also be applied if a composite

disease status scale is modeled with an IRT approach.


5.3 Demonstration of application benefits

In the last stage of this thesis, the benefits of applying pharmacometric tools

and methods to disease progression studies were investigated in two papers.

Paper V focused on the design aspects of a disease progression study and

paper VI concentrated on the data analysis aspect; both papers used the example
of AD.

The application of optimal design to optimize the design of an AD study in

paper V resulted in a 35% increase in information, a 33% decrease in the num-

ber of samples per subject, or a 30% reduction in the number of subjects required

for 80% power. All designs were determined through simultaneous optimiza-

tion of assessment times, treatment start and stop times, as well as inclusion

criteria. With the optimization, the result is chosen from an infinite num-

ber of candidate designs, covering design alternatives such as a delayed onset

or washout design. All optimal designs are non-trivial despite the relatively

simple, essentially linear structure of the model. These results underline the

limitations of heuristic approaches where design variables are chosen based

on tradition.
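The principle can be illustrated with a deliberately simplified sketch: for a linear progression model with random intercept and slope, the population FIM of the fixed effects has a closed form, and candidate assessment schedules can be ranked by the D-optimality criterion log|FIM|. The variance values below are assumed for illustration only, and the sketch omits the treatment start/stop and inclusion-criterion variables that were optimized in paper V.

    import numpy as np

    def fim_fixed_effects(times, omega, sigma2, n_subjects):
        """FIM of the fixed intercept and slope in the linear mixed model
        y_ij = (th0 + b0i) + (th1 + b1i) * t_j + eps_ij."""
        X = np.column_stack([np.ones_like(times), times])  # fixed effects
        V = X @ omega @ X.T + sigma2 * np.eye(len(times))  # marginal cov.
        return n_subjects * X.T @ np.linalg.solve(V, X)

    omega = np.diag([0.5, 0.05])   # between-subject variances (assumed)
    sigma2 = 0.8                   # residual variance (assumed)

    designs = {"evenly spaced": np.linspace(0.0, 18.0, 5),
               "end-loaded": np.array([0.0, 3.0, 12.0, 17.0, 18.0])}
    for label, t in designs.items():
        fim = fim_fixed_effects(t, omega, sigma2, n_subjects=100)
        print(label, np.linalg.slogdet(fim)[1])  # D-criterion log|FIM|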

Paper VI demonstrated the advantages of an IRT-based analysis of ADAS-

cog assessment data. Utilizing IRT, the information available in clinical trial

databases can be used to characterize the relationships between the individual

items of a cognitive assessment. The resulting mathematical description can

serve as a platform for future trials with the advantages of a more exact repli-

cation of the score distribution, an implicit mechanism for handling missing

information, and the ability to easily combine data from different ADAS-cog

variants. The longitudinal IRT model also had a higher power to detect a drug

effect than a pharmacometric summary score model. Another feature demon-

strated in paper VI is the capability to quantify the information content of the

individual components of a cognitive assessment and the possibility to tailor
a cognitive assessment to the patient population’s degree of disability.

A population-specific test would not only be more sensitive to changes due
to disease progression or drug effect, but would also reduce the assessment time
and thus the burden for the patient. With the help of IRT, a dynamic assessment of cog-

nition, where the tests are chosen based on the previous responses, can be im-

plemented [62]. In addition, IRT allows for combination of different cognitive

assessments, like the MMSE, into one common pharmacometric model [63].
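The information-content idea can be sketched with the simplest IRT building block, a binary two-parameter logistic (2PL) item; the ADAS-cog model in paper VI uses richer item types, and the item parameters below are invented for illustration.

    import numpy as np

    def item_information(theta, a, b):
        """Fisher information of a binary 2PL item at disability theta:
        I(theta) = a^2 * p * (1 - p), with p the probability of a correct
        response given discrimination a and difficulty b."""
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a**2 * p * (1.0 - p)

    # Items are most informative near their difficulty parameter, which is
    # what allows tailoring an assessment to a population's disability range.
    theta = np.linspace(-3.0, 3.0, 61)             # latent disability scale
    items = [(1.8, -1.0), (1.2, 0.0), (0.6, 2.0)]  # (a, b), hypothetical
    info = np.array([item_information(theta, a, b) for a, b in items])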

Both papers capitalized on the results obtained in the first stages of this
thesis: paper V used PopED (paper I) for all optimizations together with the power opti-

mality criterion (paper III). The FOCE/Laplace estimation algorithm was used

in papers V and VI.


6. Conclusions

Clinical trials for progressive diseases represent a particularly challenging

class of studies. In AD, for example, the costs for a phase III study with 600
patients can reach up to 60 million dollars (including imaging, biomarker,
clinical, and bioanalytical costs). At the same time, the success rate for devel-
oping novel treatments for this disease is extremely low [7]. The pharmaco-

metric methods developed in this thesis can help to significantly reduce both

attrition rate and trial costs.

User-friendly optimal design software, such as PopED, simplifies the appli-
cation of optimal design methodology when planning a disease progression
study and allows a direct optimization of the study design. This optimization

can target a variety of beneficial criteria, such as parameter precision or power

to detect a drug effect, while respecting clinical constraints. Applied to AD,
the possible 30% reduction in study size would lower the estimated study costs
from 60 to 42 million dollars. In addition, the possibility to quickly gener-

ate power versus sample size curves for complex models with the PPE algo-

rithm facilitates efficient decision making for future clinical trials.

The application of NLMEM in combination with accurate and precise es-

timation algorithms improves the analysis of disease progression studies. For

disease assessments with composite scales, an additional improvement can be

achieved through the use of IRT-based pharmacometric models. Their applica-

tion not only results in a more exact description of the assessment score and

increased statistical power, but also provides insight into the assessment prop-

erties. For the ADAS-cog assessment in AD, the application of IRT-based

NLMEMs over classical statistical methods increased the probability to detect

a drug effect by 43%.

In conclusion, this thesis presents novel pharmacometric methods that can
help address the challenges of designing and analyzing disease progression
studies.


7. Acknowledgements

The work presented in this thesis was carried out at the Department of Pharmaceutical Biosciences, Uppsala, Sweden. I am grateful for support for educational travel to courses and conferences from Apotekarsocieteten, Smålands nation and the European Union.

Parts of the research in this thesis were funded by the DDMoRe initiative, for
which I am very thankful. Furthermore, I would like to thank Pfizer, Inc. for

giving me the opportunity to get a different perspective during my internship.

I feel immensely fortunate to have received help, support and guidance from

so many people. A thousand thanks to:

Assoc. Prof. Andrew Hooker, my supervisor, for the trust received when being

accepted as a PhD student and while working with him. Thank you for always

being curious and enthusiastic about new ideas, for taking time when I asked

for it and for giving me space when I needed it. Thank you also for supporting

my internship. I really appreciated working with you.

Prof. Mats Karlsson, my co-supervisor, for building and steering this amaz-

ing group and for being an idol for all innovative and creative scientists. I am

especially grateful that you made my Pfizer internship possible.

Dr. Stefanie Hennig, my MSc supervisor and co-author of papers I and III,

for introducing me to the FIM & company, putting in a good word for me

during my PhD application and letting me read books to Nikki and Lucas.

Dr. Kaori Ito and Dr. Brian Corrigan, my industry mentors and co-authors

of papers V and VI, for teaching me about drug development, getting excited

about IRT and helping me to digest reviewer responses. Thanks also for ribs,

BBQ, lobster rolls and so much more; your company made our year in the US

even more memorable.

Dr. Joakim Nyberg, FIM-guru and co-author of papers I and III, for the

numerous interesting discussions, moral support during PODE meetings and

proofreading this thesis. I will always be available for more cheese and wine,

but I think it’s better if you drive.

Dr. Elodie Plan, brilliant scientist and co-author of papers II and VI, for her

countless great ideas, for not always agreeing with me, but always believing in me.

Åsa Johansson, my roommate and co-author of paper II, for her patience

with the manuscript, sharing fears about the future and visiting us in West

Hartford.

Dr. Marilee Andrew, co-author of paper V, and Erik Strömberg, co-author

of paper I, for your help with the projects and our fruitful discussions.


I am very thankful to Dr. Peter Milligan and Dr. Richard Lalonde for

supporting my one-year internship in the US, advising me and showing me the

value of pharmacometrics in drug development.

The Pfizer pharmacometrics group in Groton, for welcoming me and mak-

ing me feel like a full member of the team; especially Dr. Kevin Sweeney for

helping us discover Connecticut, Dr. Tom Tensfeldt for interesting conversa-

tions and after-work gatherings, and Yali Liang for nice chats and delicious

Chinese food. Thanks also to Dr. Lutz Harnisch for bringing a bit of Berlin to

the US and for being a foodie like me.

Dr. Ron Keizer and Dr. Johan Wallin, my half-time opponents, for taking

the time to review my progress and giving me valuable feedback for my

thesis.

Magnus Jansson and Jerker Nyberg, for buying laptops with SSDs and re-

covering my files quickly after cluster crashes.

Prof. Margareta Hammarlund-Udenaes for making this department such a

special place and for being a great role model as a professor.

My former roommates: Kajsa for helping me navigate the maze of PsN,

Waqas for joining me in candy sprees and Jacob for highlighting the superior-

ity of tox-coffees.

Brendan for proofreading this thesis, throwing the best theme parties and a

fantastic drive along the coast of Florida.

All colleagues, past and present, for creating the best working place in the

world and filling it with so much kindness. I have numerous beautiful memo-

ries of moments with so many of you, like gym-philosophy with Paul B. and

Guangli; rowing across the Baltic Sea with Joe (defense party is pay day!); ski-

ing in the Alps with Emilie; an unforgettable (and unscheduled) trip through

Europe with Paul W., Jan-Stefan, Åsa, Ami, Camille (1:0!), Akash and Waqas;

a party night in an old factory with Rocío, Roberto and Angelica; quad rides

with Mirjam, Martin, Paolo and Elin; visiting Central Park with Vijay and

Annika; and petting alligators with Ron and Elo.

My friends Roberto and Waqas, for delicious dinners, enlightening discus-

sions and lots of laughter. Your friendship made everything so much easier.

My friends from home, Sebastian, Stefan, Robert, Erik, Timm, Tim, Fabian,

Christoph und Sebastian, for getting together in Berlin, Uppsala, New York,

Paris or Austin and always making me feel like we see each other weekly.

My new French family, especially Christophe et Odile, for welcoming me

with open arms and showing me the incredible world of French “savoir vivre”.

Thanks also to Raphaël and Frantz for helping me with the cover.

My family, particularly Oma Lola, Oma Erna, Oma Gudrun und Opa Ossi, for their understanding and support during the past years.

My sister and her little family, Elisa, Marko und Mette, for Google Hang-

outs with a bearded baby, Parkour in Eklundshof and medieval “Teufelsbraten”.

Extra credit to Marko for his cover idea.


My parents, Mama und Papa, for always being there for me, supporting

me and showing me their love. Thank you for Kaffee & Kuchen via Skype,

welcoming a dozen scientists with a buffet at midnight and allowing me to

pack my laptop for vacations. Without your help, I would be nowhere near

where I am today.

My love Elodie, the most impressive person I have ever met, for changing

my life, encouraging me constantly and making me smile every time I see her.

You inspire my dreams, chase them on my side and ensure I catch them. Being

with you made these past years such a great time.

Sebastian


References

[1] C. P. Adams and V. V. Brantner. Spending on new drug development. Health Econ., 19(2):130–141, 2010.

[2] A. C. Atkinson and A. N. Donev. Optimum Experimental Designs. Oxford

University Press, USA, 1992.

[3] C. Ballard, S. Gauthier, A. Corbett, C. Brayne, D. Aarsland, and E. Jones.

Alzheimer’s disease. The Lancet, 377(9770):1019–1031, 2011.

[4] R. Bauer and S. Guzy. Monte Carlo parametric expectation maximization (MC-

PEM) method for analyzing population Pharmacokinetic/Pharmacodynamic

data. In Advanced Methods of PKPD Systems Analysis Volume 3, number 765 in

Int. Eng. Comp. Sci., pages 135–163. Springer US, 2004.

[5] R. J. Bauer, S. Guzy, and C. Ng. A survey of population analysis methods and

software for complex pharmacokinetic and pharmacodynamic models with ex-

amples. AAPS J., 9(1):E60–E83, 2007.

[6] S. Beal, L. B. Sheiner, A. Boeckmann, and R. J. Bauer. NONMEM User’s Guides (1989–2009). Icon Development Solutions, Ellicott City, MD, USA, 2009.

[7] R. E. Becker and N. H. Greig. Alzheimer’s disease drug development in 2008

and beyond: Problems and opportunities. Curr. Alzheimer Res., 5(4):346–357,

2008.

[8] J. F. Box. R. A. Fisher and the design of experiments, 1922–1926. Am. Stat., 34(1):1–7, 1980.

[9] J. E. Cohen. Human population: The next half century. Science, 302(5648):1172–1175, 2003.

[10] H. Cramér. Methods of estimation. In Mathematical Methods of Statistics, vol-

ume 9. Princeton University Press, 1945.

[11] M. G. Dagenais and J. M. Dufour. Invariance, nonlinear models, and asymptotic

tests. Econometrica, 59(6):1601–1615, 1991.

[12] P. Deuflhard and A. Hohmann. Numerische Mathematik I – Eine algorithmisch orientierte Einführung. Walter de Gruyter, Berlin, 2002.

[13] A. Elbaz and F. Moisan. Update in the epidemiology of Parkinson’s disease.

Curr. Opin. Neurol., 24(4):454–460, 2008.

[14] R. F. Engle. Wald, likelihood ratio, and Lagrange multiplier tests in economet-

rics. In Z. Griliches and M. D. Intriligator, editors, Handbook of Econometrics, volume 2, pages 775–826. Elsevier, 1984.

[15] V. V. Fedorov and P. Hackl. Numerical techniques. In Model-Oriented Design of Experiments, number 125 in Lecture Notes in Statistics, pages 45–55. Springer

New York, Jan. 1997.

[16] H. H. Feldman, R. S. Doody, M. Kivipelto, et al. Randomized controlled trial

of atorvastatin in mild to moderate Alzheimer’s disease: LEADe. Neurology,

74(12):956–964, 2010.


[17] M. Foracchia, A. Hooker, P. Vicini, and A. Ruggeri. POPED, a software for

optimal experiment design in population kinetics. Comput. Methods Programs Biomed., 74(1):29–46, 2004.

[18] E. Gabriel, G. E. Fagg, G. Bosilca, et al. Open MPI: Goals, concept, and de-

sign of a next generation MPI implementation. In Proceedings, 11th European PVM/MPI Users’ Group Meeting, pages 97–104, Budapest, Hungary, 2004.

[19] L. Gibiansky, E. Gibiansky, and R. Bauer. Comparison of NONMEM 7.2 es-

timation methods and parallel processing efficiency on a target-mediated drug

disposition model. J. Pharmacokinet. Pharmacodyn., 39(1):17–35, 2012.

[20] B. Gullberg, O. Johnell, and J. A. Kanis. World-wide projections for hip fracture.

Osteoporos. Int., 7(5):407–413, 1997.

[21] N. H. Holford, H. C. Kimko, J. P. Monteleone, and C. C. Peck. Simulation of

clinical trials. Annu. Rev. Pharmacol. Toxicol., 40:209–234, 2000.

[22] C. Hu and M. E. Sale. A joint model for nonlinear longitudinal data with infor-

mative dropout. J. Pharmacokinet. Pharmacodyn., 30(1):83–103, 2003.

[23] K. Ito, S. Ahadieh, B. Corrigan, J. French, T. Fullerton, and T. Tensfeldt. Disease

progression meta-analysis model in Alzheimer’s disease. Alzheimer’s Dement., 6(1):39–53, 2010.

[24] K. Ito, B. Corrigan, Q. Zhao, J. French, et al. Disease progression model for cog-

nitive deterioration from Alzheimer’s Disease Neuroimaging Initiative database.

Alzheimer’s Dement., 7(2):151–160, 2011.

[25] R. W. Jones, M. Kivipelto, H. Feldman, L. Sparks, et al. The Atorvas-

tatin/Donepezil in Alzheimer’s disease study (LEADe): design and baseline char-

acteristics. Alzheimer’s Dement., 4(2):145–153, 2008.

[26] E. N. Jonsson and L. B. Sheiner. More efficient clinical trials through use of

scientific model-based statistical tests. Clin. Pharmacol. Ther., 72(6):603–614,

2002.

[27] S. Jönsson, M. Kjellsson, and M. Karlsson. Estimating bias in population param-

eters for some models for repeated measures ordinal data using NONMEM and

NLMIXED. J. Pharmacokinet. Pharmacodyn., 31(4):299–320, 2004.

[28] I. Kola and J. Landis. Can the pharmaceutical industry reduce attrition rates?

Nat. Rev. Drug Discovery, 3(8):711–716, 2004.

[29] E. Kuhn and M. Lavielle. Maximum likelihood estimation in nonlinear mixed

effects models. Comput. Stat. Dat. An., 49(4):1020–1038, 2005.

[30] R. L. Lalonde, K. G. Kowalski, M. M. Hutmacher, W. Ewy, D. J. Nichols, et al.

Model-based drug development. Clin. Pharmacol. Ther., 82(1):21–32, 2007.

[31] D. E. Mager, E. Wyska, and W. J. Jusko. Diversity of mechanism-based pharma-

codynamic models. Drug Metab. Dispos., 31(5):510–518, 2003.

[32] J. R. R. A. Martins, P. Sturdza, and J. J. Alonso. The complex-step derivative

approximation. ACM Trans. Math. Softw., 29(3):245–262, 2003.

[33] F. Mentré et al. Survey on the current use of optimal design approaches and

the developments needed in adaptive optimal design for model based analysis

performed amongst DDMoRe’s EFPIA members. PAGE 21, Abstr 2337, May

2012.

[34] F. Mentré, A. Mallet, and D. Baccar. Optimal design in random-effects regression

models. Biometrika, 84(2):429–442, 1997.

[35] P. A. Milligan, M. J. Brown, B. Marchant, S. W. Martin, et al. Model-based drug


development: A rational approach to efficiently accelerate drug development.

Clin. Pharmacol. Ther., 93(6):502–514, 2013.

[36] R. C. Mohs, D. Knopman, R. C. Petersen, S. H. Ferris, et al. Development of

cognitive instruments for use in clinical trials of antidementia drugs: additions

to the Alzheimer’s Disease Assessment Scale that broaden its scope. Alzheimer Dis. Assoc. Disord., 11 Suppl 2:S13–21, 1997.

[37] G. Molenberghs and G. Verbeke. The generalized linear mixed model (GLMM).

In Models for Discrete Longitudinal Data, Springer Series in Statistics, pages

265–280. Springer New York, Jan. 2005.

[38] D. R. Mould. Developing models of disease progression. In Pharmacometrics,

pages 547–581. John Wiley & Sons, Inc., 2006.

[39] National Institutes of Health. NCI dictionary of cancer terms, 2014.

[40] J. Nyberg, M. O. Karlsson, and A. C. Hooker. Simultaneous optimal exper-

imental design on dose and sample times. J. Pharmacokinet. Pharmacodyn., 36(2):125–145, 2009.

[41] J. Nyberg, S. Ueckert, and A. C. Hooker. Approximations of the population

Fisher information matrix: differences and consequences. PODE 2010, 2010.

[42] K. Ogungbenro and L. Aarons. Sample size/power calculations for repeated

ordinal measurements in population pharmacodynamic experiments. J. Pharma-cokinet. Pharmacodyn., 37(1):67–83, 2010.

[43] K. Ogungbenro, G. Graham, I. Gueorguieva, and L. Aarons. The use of a modi-

fied Fedorov exchange algorithm to optimise sampling times for population pharmacokinetic experiments. Comput. Methods Programs Biomed., 80(2):115–125, 2005.

[44] E. Papadimitropoulos, G. Wells, B. Shea, W. Gillespie, et al. Meta-analysis of

the efficacy of vitamin D treatment in preventing osteoporosis in postmenopausal

women. Endocr. Rev., 23(4):560–569, 2002.

[45] E. L. Plan, A. Maloney, F. Mentré, M. O. Karlsson, and J. Bertrand. Performance

comparison of various maximum likelihood nonlinear mixed-effects estimation

methods for dose-response models. AAPS J, 14(3):420–432, 2012.

[46] E. L. Plan, A. Maloney, I. F. Trocóniz, and M. O. Karlsson. Performance in

population models for count data, part i: maximum likelihood approximations.

J. Pharmacokinet. Pharmacodyn., 36(4):353–366, 2009.

[47] M. Prince and J. Jackson. World Alzheimer Report 2009. Technical report,

Alzheimer’s Disease International, London, 2009.

[48] M. Prince, M. Prina, and M. Guerchet. World Alzheimer Report 2013. Technical

report, Alzheimer’s Disease International, London, 2013.

[49] C. R. Rao. Information and the accuracy attainable in the estimation of statistical

parameters. Bull. Calcutta Math. Soc., 37(3):81–89, 1945.

[50] S. Retout, E. Comets, A. Samson, and F. Mentré. Design in nonlinear mixed

effects models: Optimization using the Fedorov-Wynn algorithm and power of

the Wald test for binary covariates. Stat. Med., 26(28):5162–5179, 2007.

[51] S. Retout and F. Mentré. Further developments of the Fisher information matrix

in nonlinear mixed effects models with evaluation in population pharmacokinet-

ics. J. Biopharm. Stat., 13(2):209–227, 2003.

[52] B. J. Riis. Biochemical markers of bone turnover II: diagnosis, prophylaxis, and

treatment of osteoporosis. Am. J. Med., 95(5, Supplement 1):S17–S21, 1993.


[53] K. Romero, M. de Mars, D. Frank, M. Anthony, et al. The Coalition Against

Major Diseases: Developing tools for an integrated drug development process

for Alzheimer’s and Parkinson’s diseases. Clin. Pharmacol. Ther., 86(4):365–

367, 2009.

[54] W. G. Rosen, R. C. Mohs, and K. L. Davis. A new rating scale for Alzheimer’s

disease. Am. J. Psychiatry, 141(11):1356–1364, 1984.

[55] S. Rump. INTLAB – INTerval LABoratory. In T. Csendes, editor, Developments in Reliable Computing, pages 77–104. Kluwer Academic Publishers, Dor-

drecht, 1999.

[56] SAS Institute Inc. SAS/STAT Software, Version 9.3. SAS Institute Inc., Cary, NC,

2011.

[57] R. Savic and M. Lavielle. Performance in population models for count data, part

II: a new SAEM algorithm. J. Pharmacokinet. Pharmacodyn., 36(4):367–379,

2009.

[58] L. B. Sheiner. Learning versus confirming in clinical drug development. Clin. Pharmacol. Ther., 61(3):275–291, 1997.

[59] R. B. Sidje. EXPOKIT. A software package for computing matrix exponentials.

ACM Trans. Math. Softw., 24(1):130–156, 1998.

[60] D. A. Smith, C. Allerton, A. S. Kalgutkar, H. Waterbeemd, and D. K. Walker.

Pharmacokinetics and Metabolism in Drug Design. John Wiley & Sons, 2012.

[61] J.-L. Steimer, A. Mallet, J.-L. Golmard, and J.-F. Boisvieux. Alternative ap-

proaches to estimation of population pharmacokinetic parameters: Comparison

with the nonlinear mixed-effect model. Drug Metab. Rev., 15(1-2):265–292,

1984.

[62] S. Ueckert, E. L. Plan, K. Ito, M. O. Karlsson, B. Corrigan, and A. Hooker.

AD i.d.e.a. – Alzheimer’s disease integrated dynamic electronic assessment of

cognition. PAGE 22, Abstr 2893, 2013.

[63] S. Ueckert, E. L. Plan, K. Ito, M. O. Karlsson, B. Corrigan, and A. Hooker.

Predicting baseline ADAS-cog scores from screening information using item re-

sponse theory and full random effect covariate modeling. ACoP 2013, 2013.

[64] US Department of Health and Human Services, Food and Drug Administration.

Innovation or stagnation: Challenge and opportunity on the critical path to new

medical products. Challenges and opportunities report, FDA, Mar. 2004.

[65] Y. Wang. Derivation of various NONMEM estimation methods. J. Pharmacokinet. Pharmacodyn., 34(5):575–593, 2007.

[66] J. J. Wilkins, G. Langdon, H. McIlleron, et al. A population pharmacokinetic-

enzyme model for rifampicin autoinduction and bimodal absorption in pul-

monary tuberculosis patients. PAGE 13, Abstr 538, 2004.

[67] P. J. Williams and E. I. Ette. Pharmacometrics: Impacting drug development and

pharmacotherapy. In Pharmacometrics, pages 1–21. John Wiley & Sons, Inc.,

2006.

[68] A. Wimo and M. Prince. World Alzheimer Report 2010. Technical report,

Alzheimer’s Disease International, 2010.

[69] World Health Organization. Prevention and management of osteoporosis: Report

of a WHO scientific group. Technical report, World Health Organization, 2003.


Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 184

Editor: The Dean of the Faculty of Pharmacy

A doctoral dissertation from the Faculty of Pharmacy, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy”.)

Distribution: publications.uu.se
urn:nbn:se:uu:diva-216537


