Modelling Complex Longitudinal

Phenotypes over Childhood in Genetic

Association Studies

Nicole Warrington Bachelor of Science (Honours)

This thesis is presented for the degree of

Doctor of Philosophy

of The University of Western Australia

School of Women’s and Infants’ Health.

November, 2013

Declaration

This thesis was completed during the course of enrolment in this degree at the University of

Western Australia and has not previously been accepted for a degree at this or another

institution.

This thesis is the author's own composition. This thesis contains published work and/or work

prepared for publication, some of which has been co-authored. The bibliographical details of

the work and where it appears in the thesis are outlined below. The permission of all co-

authors has been obtained to include the work in this thesis.

Publication 1/Chapter 2:

Warrington NM, Wu YY, Pennell CE, Marsh JA, Beilin LJ, et al. (2013) Modelling BMI

Trajectories in Children for Genetic Association Studies. PLoS One 8: e53897.

The candidate carried out the literature review, conducted the statistical analyses, interpreted

the results and drafted the manuscript. Co-authors provided guidance on data analysis and

interpretation of the results, in addition to critically reviewing all aspects of the study design

and manuscript.


Chapter 4:

Warrington NM, Tilling K, Howe LD, Paternoster L, Pennell CE, Wu YY, Briollais L. Robustness of

the linear mixed effects model to error distribution assumptions and the consequences for

genome-wide association studies. (Statistical Applications in Genetics and Molecular Biology.

Accepted on 22 July 2014)

The candidate carried out the literature review, conducted the simulations and additional

statistical analyses and drafted the manuscript. Dr Paternoster conducted the chromosome 16

analysis in ALSPAC. Dr Wu wrote the R code for the calculation of the robust standard errors.

Dr Howe, Professor Tilling and Associate Professor Briollais critically reviewed all aspects of the

study design and manuscript and provided constructive feedback throughout the analysis.

Chapter 5:

This work was conducted in collaboration with Dr Laura Howe and Dr Lavinia Paternoster at

the University of Bristol and Dr Marika Kaakinen and Dr Sauli Herrala from the University of

Oulu. The candidate carried out the literature review, conducted the genome-wide analysis in

the Raine Study, wrote the R scripts for conducting the replication analysis in ALSPAC and

NFBC66 and conducted the meta-analysis. Dr Paternoster and Dr Howe conducted the

replication analysis in ALSPAC and Dr Kaakinen and Dr Herrala conducted the replication

analysis in NFBC66.


Chapter 6:

Warrington NM*, Howe LD* (*joint first authorship), Wu YY, Timpson NJ, Tilling K, Pennell CE,

Newnham J, Davey-Smith G, Palmer LJ, Beilin LJ, Lye SJ, Lawlor DA, Briollais L. Association of a

Body Mass Index Genetic Risk Score with Growth throughout Childhood and Adolescence. PLoS

One 8(11): e79547

Planning of the paper was jointly undertaken by Dr Howe and the candidate. The candidate

carried out the literature review, selected the SNPs of interest, conducted all the statistical

analyses and drafted the manuscript, while Dr Howe and other co-authors on the manuscript

critically reviewed all aspects of the study design and manuscript.

Signed ………………………………………………………………………………

Associate Professor Laurent Briollais, Supervisor, Lunenfeld-Tanenbaum Research

Institute and the Dalla Lana School of Public Health, University of Toronto

Signed ………………………………………………………………………………

Nicole Warrington, PhD candidate, School of Women’s and Infants’ Health, The University of

Western Australia

Abstract

Genome-wide association studies (GWASs) are a hypothesis-free approach to investigating

genetic factors that influence health and disease. Whilst they have been relatively successful in

uncovering novel genetic variants associated with complex human diseases, it is largely only

the ‘low hanging fruit’ that have been described to date, leaving much of the heritability of any

given trait unexplained. Geneticists are beginning to perform more complex analyses to

improve our understanding of genetic determinants of disease, including the investigation of

how genes play a role in the development of a trait over time in longitudinal studies.

Compared to cross-sectional analyses, longitudinal studies are advantageous for investigating

genetic associations as they: 1) allow information to be repeated among individuals across

various time points; 2) facilitate the detection of genetic variants that influence trajectories

rather than simple differences in phenotypes; and 3) allow the detection of genes that are

associated with age of onset of a trait. Improving analytic techniques for conducting

longitudinal GWASs offers the opportunity to advance our understanding of the aetiology of

health and disease.

The core aim of this thesis was to develop an appropriate modelling framework to conduct

GWASs of complex traits in longitudinal study designs. Body mass index (BMI) trajectories

throughout childhood were chosen for this research for several reasons. Firstly, obesity

(defined by high BMI) is a complex disorder with increasing incidence, particularly during the

first decades of life, and it is important to gain an understanding of the developmental

processes that precede the obesity diagnosis. Secondly, obesity is linked to increased risk of

many other diseases including type-two diabetes, the metabolic syndrome, mental health

disorders, respiratory problems and some cancers. The principles underlying life course

epidemiology suggest that the link between these diseases begins in early life. Thirdly, the

genetic determinants of BMI remain largely unknown. Finally, BMI trajectories over childhood

are difficult to model statistically due to the complexities in the shape of the growth curve and

differences between individuals' rates of growth within the population.

To address the aim of this thesis, five projects were conducted. The first project describes the

application of four longitudinal modelling frameworks to the BMI data from the Western

Australian Pregnancy Cohort (Raine) Study. This research demonstrated that a semi-parametric

linear mixed effects model (SPLMM) provides the best fit to the data while allowing for the

detection of small genetic effects in a computationally efficient manner.

It has been suggested that a two-step approach is the most efficient for longitudinal GWASs,

firstly modelling the trait of interest and then using summary statistics from the model for the

genetic analysis. The second project compared the SPLMM to this two-step approach using a

simulation study and also the genotype data from chromosome 16 in the Raine Study. The

results demonstrated that a two-step approach is not appropriate for childhood BMI

trajectories.
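
The two-step approach described above can be illustrated with a minimal sketch. It is not the code used in this thesis: it assumes simulated data, a simple linear trajectory for each individual (a simplification that childhood BMI does not satisfy, which is part of why the thesis finds the two-step approach inappropriate for this trait), and hypothetical names such as `two_step_slope_test`. Step one summarises each individual's trajectory by an ordinary least squares slope; step two regresses those slopes on genotype.

```python
import numpy as np


def two_step_slope_test(ages, trait, genotype):
    """Two-step sketch (illustrative only).

    Step 1: summarise each individual's trajectory by the OLS slope of
            the trait on age.
    Step 2: regress those slopes on genotype dosage (0/1/2) by OLS.
    Returns the estimated per-allele effect on the slope and its
    classical OLS standard error.
    """
    # Step 1: per-individual OLS slope, computed row by row.
    age_c = ages - ages.mean(axis=1, keepdims=True)
    slopes = (age_c * trait).sum(axis=1) / (age_c ** 2).sum(axis=1)

    # Step 2: simple linear regression of the slopes on genotype.
    X = np.column_stack([np.ones_like(genotype, dtype=float), genotype])
    beta, *_ = np.linalg.lstsq(X, slopes, rcond=None)
    resid = slopes - X @ beta
    sigma2 = resid @ resid / (len(slopes) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], se


# Simulated data; every value below is an assumption chosen for illustration.
rng = np.random.default_rng(0)
n, t = 2000, 6
ages = np.tile(np.linspace(1.0, 11.0, t), (n, 1))   # six visits per child
genotype = rng.binomial(2, 0.3, size=n)             # minor allele frequency 0.3
true_effect = 0.05                                  # per-allele effect on the slope
intercepts = 16.0 + rng.normal(0.0, 1.0, size=n)
slopes_true = 0.4 + true_effect * genotype + rng.normal(0.0, 0.1, size=n)
trait = (intercepts[:, None] + slopes_true[:, None] * ages
         + rng.normal(0.0, 0.5, size=(n, t)))

effect, se = two_step_slope_test(ages, trait, genotype)
print(f"estimated SNP effect on slope: {effect:.3f} (SE {se:.3f})")
```

In this idealised linear setting the second step recovers the simulated effect. The sketch makes no attempt to reproduce the non-linear growth curve or unbalanced measurement schedule of real childhood BMI data.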

Given the complex nature of data collection in large, longitudinal cohort studies, the

distributional assumptions for the error term of the linear mixed effects model are sometimes

not met. Through analysis of both a simulation study and chromosome 16 data from the UK

Avon Longitudinal Study of Parents and Children (ALSPAC), the third project in this thesis

showed that the power, bias and coverage rate of the fixed effects estimates are relatively

unaffected by the misspecification; however, a robust standard error is required to protect

against inflation of type 1 error for fixed effects estimates when the fixed effects covariates

interact with time.

The fourth project in this thesis used the statistical methods developed in the previous

chapters to conduct a GWAS of BMI over childhood in the Raine Study, with replication in

ALSPAC and the Northern Finland Birth Cohort of 1966 (NFBC66). Results suggest that genetic variants

in the KCNJ15 gene are associated with both increased average BMI and the rate of growth

over childhood. Variants in this gene have previously been reported to be associated with

increased risk of type-two diabetes, increased levels of insulin and insulin resistance, indicating

that this gene may be biologically important.

There are currently 32 SNPs known to be associated with BMI in adulthood. In the fifth project,

analyses of these SNPs were conducted in ALSPAC and the Raine Study to illustrate that the

association with BMI during childhood is mediated by both changes in adiposity and skeletal

growth; effects that are detectable from one year of age.

Through the use of both advanced statistical techniques and the breadth of longitudinal data

that is available in large cohort studies, it is anticipated that geneticists will be able to uncover

more of the genetic determinants of complex diseases. The statistical methods investigated

and developed in this thesis provide a modelling framework that can be applied to numerous

complex disease traits and enable gene discovery to occur on a wider scale. Longer term, this

may lead to the development and implementation of more targeted interventions, at a

younger age, before the onset of disabling diseases like obesity in those at risk.

Publications

Publications arising directly from this thesis

Warrington NM, Wu YY, Pennell CE, Marsh JA, Beilin LJ, Palmer LJ, Lye SJ, Briollais L. Modelling

BMI trajectories in children for genetic association studies. PLoS One. 2013;8(1):e53897.

(Chapter 2)

Warrington NM*, Howe LD* (*joint first authorship), Wu YY, Timpson NJ, Tilling K, Pennell CE,

Newnham J, Davey-Smith G, Palmer LJ, Beilin LJ, Lye SJ, Lawlor DA, Briollais L. Association of a

Body Mass Index Genetic Risk Score with Growth throughout Childhood and Adolescence. PLoS

One. 2013; 8(11):e79547 (Chapter 6)

Warrington NM, Tilling K, Howe LD, Paternoster L, Pennell CE, Wu YY, Briollais L. Robustness of

the linear mixed effects model to error distribution assumptions and the consequences for

genome-wide association studies. (Statistical Applications in Genetics and Molecular Biology.

Accepted on 22 July 2014; Chapter 4)

Warrington NM, Marsh JA, Wu YY, Newnham JP, Beilin LJ, Lye SJ, Palmer LJ, Briollais L, Pennell

CE. Genetic Variants in Adult Obesity Genes are Associated with Childhood Growth. Journal of

Developmental Origins of Health and Disease. 2011; 2(S1): p S144. (ABSTRACT) (Chapter 2)

Publications arising indirectly from this thesis

Marsh JA, White SQ, Warrington NM, Lye SJ, Davey-Smith G, Newnham JP, Palmer LJ, Pennell

CE. Feeding the Epidemic of Childhood Obesity. Journal of Developmental Origins of Health

and Disease. 2011; 2(S1): p S92. (ABSTRACT)

Bradfield JP, Taal HR, Timpson NJ, Scherag A, Lecoeur C, Warrington NM, Hypponen E, Holst C,

Valcarcel B, Thiering E, Salem RM, Schumacher FR, Cousminer DL, Sleiman PM, Zhao J,

Berkowitz RI, Vimaleswaran KS, Jarick I, Pennell CE, Evans DM, St Pourcain B, Berry DJ, Mook-

Kanamori DO, Hofman A, Rivadeneira F, Uitterlinden AG, van Duijn CM, van der Valk RJ, de

Jongste JC, Postma DS, Boomsma DI, Gauderman WJ, Hassanein MT, Lindgren CM, Mägi R,

Boreham CA, Neville CE, Moreno LA, Elliott P, Pouta A, Hartikainen AL, Li M, Raitakari O,

Lehtimäki T, Eriksson JG, Palotie A, Dallongeville J, Das S, Deloukas P, McMahon G, Ring SM,

Kemp JP, Buxton JL, Blakemore AI, Bustamante M, Guxens M, Hirschhorn JN, Gillman MW,

Kreiner-Møller E, Bisgaard H, Gilliland FD, Heinrich J, Wheeler E, Barroso I, O'Rahilly S,

Meirhaeghe A, Sørensen TI, Power C, Palmer LJ, Hinney A, Widen E, Farooqi IS, McCarthy MI,

Froguel P, Meyre D, Hebebrand J, Jarvelin MR, Jaddoe VW, Smith GD, Hakonarson H, Grant SF;

Early Growth Genetics Consortium. A genome-wide association meta-analysis identifies new

childhood obesity loci. Nat Genet. 2012 May;44(5):526-31.

Sovio U, Mook-Kanamori DO, Warrington NM, Lawrence R, Briollais L, Palmer CN, Cecil J,

Sandling JK, Syvänen AC, Kaakinen M, Beilin LJ, Millwood IY, Bennett AJ, Laitinen J, Pouta A,

Molitor J, Davey Smith G, Ben-Shlomo Y, Jaddoe VW, Palmer LJ, Pennell CE, Cole TJ, McCarthy

MI, Järvelin MR, Timpson NJ; Early Growth Genetics Consortium. Association between

Common Variation at the FTO Locus and Changes in Body Mass Index from Infancy to Late

Childhood: The Complex Nature of Genetic Association through Growth and Development.

PLoS Genet. 2011 Feb;7(2):e1001307.

http://www.ncbi.nlm.nih.gov/pubmed/21379325



"Essentially, all models are wrong, but some are useful"

George E. P. Box

Contents

DECLARATION

ABSTRACT

PUBLICATIONS

CONTENTS

LIST OF TABLES

LIST OF FIGURES

GLOSSARY

ABBREVIATIONS

ACKNOWLEDGEMENTS

CHAPTER 1: INTRODUCTION

1.1 General Introduction

1.2 Introduction to Life Course Epidemiology

1.3 Introduction to Genetic Epidemiology

1.3.1 Genetics for Genetic Epidemiology

1.3.2 Hardy-Weinberg Equilibrium Principle

1.3.3 Linkage Disequilibrium

1.3.4 Haplotype Inference

1.3.5 Evolution of Genetic Epidemiology Studies

1.3.5.1 Linkage

1.3.5.2 Association

1.4 Introduction to Genome-Wide Association Studies (GWASs)

1.4.1 Definition

1.4.2 Imputation

1.4.3 Association Analysis

1.4.4 Replication

1.4.5 GWASs of Longitudinal Quantitative Traits

1.5 Obesity and Body Mass Index

1.5.1 Life Course Approach to Obesity

1.5.1.1 Infancy Growth and the Adiposity Peak

1.5.1.2 Adiposity Rebound

1.5.1.3 Puberty

1.5.2 Genetics of BMI

1.6 Birth Cohorts used in this Thesis

1.6.1 The Western Australian Pregnancy Cohort (Raine) Study

1.6.1.1 Subjects

1.6.1.2 Measurements

1.6.1.3 Genotyping

1.6.2 Avon Longitudinal Study of Parents and Children (ALSPAC)

1.6.2.1 Subjects

1.6.2.2 Measurements

1.6.2.3 Genotyping

1.6.3 The Northern Finland Birth Cohort of 1966 (NFBC66)

1.6.3.1 Subjects

1.6.3.2 Measurements

1.6.3.3 Genotyping

1.7 Aims

1.8 Outline of Thesis

CHAPTER 2: LONGITUDINAL STATISTICAL MODELS FOR BODY MASS INDEX TRAJECTORIES THROUGHOUT CHILDHOOD USING THE WESTERN AUSTRALIAN PREGNANCY COHORT (RAINE) STUDY

2.1 Introduction

2.2 Background

2.2.1 Aims

2.3 Subjects and Materials

2.4 Statistical Methods and Model Fit

2.4.1 Linear Mixed Effects Model (LMM)

2.4.1.1 Method Description

2.4.1.2 Model Fit

2.4.1.3 Computational Time

2.4.2 Skew-t Model Linear Mixed Effects Model (STLMM)

2.4.2.1 Method Description

2.4.2.2 Model Fit

2.4.2.3 Computational Time

2.4.3 Semi-Parametric Mixed Model (SPLMM) using Smoothing Splines

2.4.3.1 Method Description

2.4.3.2 Model Fit

2.4.3.3 Computational Time

2.4.4 Non-Linear Mixed Effects Model (NLMM); also known as the SuperImposition by Translation And Rotation (SITAR) Model

2.4.4.1 Method Description

2.4.4.2 Model Fit

2.4.4.3 Computational Time

2.5 Genetic Associations

2.5.1 SNP Selection

2.5.2 Cross-Sectional Analyses

2.5.3 Longitudinal Analyses

2.5.4 Obesity-Risk Allele Score

2.5.5 Characterising Genetic Associations in SPLMM Model

2.6 Comparison of Models

2.6.1 Model Fit

2.6.2 Computation Time

2.6.3 Ability to Detect Genetic Associations with Known Adult BMI/Obesity SNPs

2.7 Discussion

2.8 Conclusion

CHAPTER 3: COMPARING THE SEMI-PARAMETRIC LINEAR MIXED MODEL TO A TWO-STEP APPROACH FOR GENOME-WIDE ASSOCIATION STUDIES

3.1 Introduction

3.2 Background

3.2.1 Aims

3.3 Methods

3.3.1 Statistical Methods

3.3.2 Simulation Study

3.3.3 Chromosome 16 Analysis in the Raine Study

3.4 Results

3.4.1 Simulation Study Results

3.4.2 Chromosome 16 SNPs in the Raine Study

3.5 Discussion

3.6 Conclusions

CHAPTER 4: ROBUSTNESS OF THE LINEAR MIXED EFFECTS MODEL TO DISTRIBUTIONAL ASSUMPTIONS AND CONSEQUENCES FOR GENOME-WIDE ASSOCIATION STUDIES

4.1 Introduction

4.2 Background

4.2.1 Aims

4.3 Motivating Example

4.4 Simulation Study

4.4.1 Sampling Designs

4.4.2 Models for Data Generation

4.4.2.1 Standard Linear Mixed Model

4.4.2.2 Non Gaussian Error

4.4.2.3 Heteroscedastic Error

4.4.3 Data Generation

4.4.4 Calculating Robust Standard Errors and Global Wald Tests

4.5 Results for Simulated Data

4.5.1 Coverage Probabilities

4.5.2 Bias

4.5.3 Power

4.5.4 Type 1 Error

4.5.5 Type 1 Error in Unbalanced Designs Versus Complete Designs

4.5.6 Power Using the Robust Standard Error

4.6 Analysis of Chromosome-Wide BMI Data

4.6.1 Comparison Between the Classical and Robust Tests

4.7 Discussion

4.8 Conclusion

CHAPTER 5: GENOME-WIDE ASSOCIATION STUDY OF BMI TRAJECTORIES ACROSS CHILDHOOD

5.1 Introduction

5.2 Background

5.2.1 Aims

5.3 Statistical Methods

5.3.1 Study Populations

5.3.1.1 Raine Study

5.3.1.2 ALSPAC

5.3.1.3 NFBC66

5.3.2 Data Cleaning

5.3.3 Longitudinal Modelling

5.3.4 Statistical Analysis

5.3.5 Additional Analysis for Characterizing Significant Findings

5.3.6 Pathway Analysis

5.4 Results

5.4.1 Comparison of Cohorts

5.4.2 Results from the Raine Study GWAS

5.4.2.1 Summary of GWAS

5.4.2.2 Regions of Interest

5.4.3 Characterising the Findings of the KCNJ15 Gene

5.4.4 Results from Replication and Meta-Analysis

5.4.5 Results from Pathway Analysis

5.5 Discussion

5.5.1 Role of KCNJ15 Gene and Nearby Genes on Chromosome 21

5.6 Challenges and Future Research

5.7 Conclusion

CHAPTER 6: ASSOCIATION OF A GENETIC RISK SCORE WITH LONGITUDINAL BMI IN CHILDREN

6.1 Introduction

6.2 Background

6.2.1 Aims

6.3 Subjects and Materials

6.3.1 Study Populations

6.3.2 SNP Selection and Allelic Score

6.4 Statistical Analysis

6.4.1 Longitudinal Modelling and Derivation of Growth Parameters

6.4.2 Statistical Analysis

6.5 Results

6.5.1 Association Between the Allelic Score and Growth Trajectories

6.5.2 Associations Between the Allelic Score and Birth Measures, Adiposity Peak and Adiposity Rebound

6.5.3 Variance Explained by the Allelic Score

6.5.4 Sex Interactions Between the 32 Individual BMI SNPs and BMI Trajectories

6.5.5 Adjustment for FTO Effect

6.5.6 Comparison with Weighted Allelic Score

6.6 Discussion

6.7 Conclusion

CHAPTER 7: CONCLUSIONS, LIMITATIONS AND FUTURE DIRECTIONS

7.1 Main Findings

7.1.1 Longitudinal Statistical Models for Body Mass Index Growth Trajectories throughout Childhood using the Western Australian Pregnancy Cohort (Raine) Study

7.1.2 Comparing SPLMM to Two-Step Approach for GWASs

7.1.3 Robustness of the Linear Mixed Effects Model to Distribution Assumptions and Consequences for Genome-Wide Association Studies

7.1.4 Genome-Wide Association Study of BMI Trajectories Across Childhood

7.1.5 Association of a Genetic Risk Score with Longitudinal BMI in Children

7.2 Limitations

7.2.1 Computational Intensity

7.2.2 Gene Discovery

7.3 Future Directions

7.3.1 Reducing Remaining Type 1 Error

7.3.2 Longitudinal Family Studies

7.3.3 Adjusting for Environmental Covariates

7.3.4 Gene-Environment and Gene-Gene Interactions

7.3.5 Fine Mapping

7.3.6 Rare Variants

7.4 Conclusion

REFERENCES

APPENDIX A: PUBLICATION ARISING FROM THE RESEARCH IN CHAPTER TWO

APPENDIX B: ADDITIONAL DETAILS OF THE LINEAR MIXED MODEL IN CHAPTER TWO

APPENDIX C: PUBLICATION ARISING FROM THE RESEARCH IN CHAPTER FOUR

APPENDIX D: ADDITIONAL RESULTS FROM SIMULATION ANALYSIS IN CHAPTER FOUR

APPENDIX E: PUBLICATION ARISING FROM THE RESEARCH IN CHAPTER SIX

APPENDIX F: ADDITIONAL RESULTS FROM ALLELIC SCORE ANALYSIS IN CHAPTER SIX

APPENDIX G: R CODE FOR THE MODELS USED IN THE ANALYSIS OF EACH CHAPTER

List of Tables

Table 1.1: Probability of observing a given haplotype when the loci are in disequilibrium

Table 1.2: The genotypic information an individual possesses across two loci determines their possible diplotype

Table 1.3: Study designs for genetic association studies [18,33,34]

Table 2.1: Number of follow-ups with BMI measured for each of the participants in the sample

Table 2.2: The phenotypic characteristics at each follow-up year for the 1,506 individuals in the study sample. Continuous variables are expressed as means (standard deviation); binary variables as percentage (number).

Table 2.3: The correlation structure of the repeated observations of BMI

Table 2.4: Model fit statistics for covariance models tested using the LMM method; -2 log likelihood, Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC). All models assumed an independent correlation structure for the random effects and no specific correlation structure for the error.

Table 2.5: Details of LMM model in females (N=733, n=4,377)

Table 2.6: Details of LMM model in males (N=773, n=4,609)

Table 2.7: Details of STLMM model in females (N=733, n=4,377)

Table 2.8: Details of STLMM model in males (N=773, n=4,609)

Table 2.9: Details of SPLMM model in females (N=733, n=4,377). Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Table 2.10: Details of SPLMM model in males (N=773, n=4,609). Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Table 2.11: Details of NLMM model in females (N=733, n=4,377)

Table 2.12: Details of NLMM model in males (N=773, n=4,609)

Table 2.13: Results from association analysis between the estimates of the three parameters from the NLMM model and markers of obesity at age 17 years.

Table 2.14: Characteristics from the Raine Study sample of the 17 SNPs investigated in each of the statistical methods

Table 2.15: Summary of cross-sectional results for the 17 SNPs. Significant P-Values are in bold.

Table 2.16: Summary of longitudinal analyses, using the four methods, for each of the 17 SNPs in females. Significant P-Values are in bold.

Table 2.17: Summary of longitudinal analyses, using the four methods, for each of the 17 SNPs in males. Significant P-Values are in bold.

Table 2.18: Results from association analysis of the obesity-risk allele score with BMI trajectories using the four methods, adjusted for the first five principal components

Table 2.19: Statistical measures used to compare model fit of the four methods.

Table 2.20: Computation time for the four methods adjusting for the FTO genotype (median [IQR])

Table 2.21: The number of significant SNPs for each method, using a likelihood ratio test

Table 3.1: Parameter estimates from the Raine Study SPLMM model (Model 1) used to generate the data in the simulation study

Table 3.2: Results from the 1,000 simulations. SDdiff is the standard deviation of the difference between -log10(pF) [P-Value for testing the β coefficients in the one-step method] and -log10(pT) [P-Value for testing the β coefficients in the two-step method] for the 1,000 simulations. r2 is the Pearson correlation coefficient for the ratio of the beta coefficient to the standard error.

Table 3.3: Results of the 1,000 simulations in the additional scenarios. SDdiff is the standard deviation of the difference between -log10(pF) [P-Value for testing the β coefficients in the one-step method] and -log10(pT) [P-Value for testing the β coefficients in the two-step method]. r2 is the Pearson correlation coefficient for the ratio of the beta coefficient to the standard error.

Table 4.1: Parameter estimates from the ALSPAC non-genetic model used to generate the data in the simulation study

Table 4.2: Coverage rates of the 95% confidence intervals of the fixed effects; bold and underlined cells are those that are significantly different from the nominal 95% based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

Table 4.3: Bias and 95% confidence interval for the complete designs; bold and underlined cells are those whose confidence interval does not cover zero based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

Table 4.4: Bias and 95% confidence interval for the unbalanced designs; bold and underlined cells are those whose confidence interval does not cover zero based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

Table 4.5: Type 1 error for complete designs; bold and underlined cells are those that are significantly different from the nominal α=0.05 based on 20,000 simulations under each design (5,000 simulations for each MAF combined into one summary statistic).

Table 4.6: Type 1 error for unbalanced designs; bold and underlined cells are those that are significantly different from the nominal α=0.05 based on 20,000 simulations under each design (5,000 simulations for each MAF combined into one summary statistic).

Table 5.1: Mean (SD) age and BMI at the adiposity rebound in the three cohorts, in addition to the correlation between the two measures.

Table 5.2: Results from the pathway analysis using SNPs not in LD

Table 6.1: Phenotypic characteristics of the two birth cohorts used for analysis

Table 6.2: Descriptive statistics of the single nucleotide polymorphisms included in the allelic score.

Table 6.3: Results of the allelic score with each of the trajectory outcomes (BMI, weight and height) in both cohorts and the combined meta-analysis. Significant findings are in bold; Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Table 6.4: Cross-sectional association analysis results for birth measures, BMI and age at adiposity peak (AP) and BMI and age at adiposity rebound (AR) in ALSPAC and the Raine Study.

Table 6.5: Results of the allelic score, after adjustment for the FTO locus, with each of the trajectory outcomes (BMI, weight and height) in both cohorts and the combined meta-analysis. Significant findings are in bold; Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Table 6.6: Comparison of the unweighted and weighted allelic scores for the three trajectory outcomes. Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

List of Figures

Figure 1.1: Architecture of disease based on genetic determinants (Image adapted from McCarthy et al [42], Manolio et al [43], Bush and Moore [44])

Figure 1.2: A schematic of the process by which genetic data is imputed using haplotype inference

Figure 1.3: Schema of BMI trajectory over childhood and adolescence. The green arrow indicates the period around the adiposity peak (9 months of age); the red arrow indicates the period around the adiposity rebound (5-6 years of age).

Figure 1.4: The Raine Study schedule of assessments and broad measurements collected

Figure 1.5: Principal components for population stratification in the Raine Study with the HapMap populations superimposed, showing that the Raine Study individuals are predominantly of European descent.

Figure 1.6: Principal components for population stratification for the 1,494 participants with genome-wide data in the Raine Study

Figure 2.1: Boxplots of BMI at each follow-up year, with BMI displayed from 10-30 kg/m² for years 1-6 and 10-50 kg/m² for years 8-17.

Figure 2.2: Individual BMI profiles of 20 individuals from the Raine Study

Figure 2.3: Observed BMI measures for the 1,506 individuals with a lowess curve to visualise the curvature in BMI over childhood

Figure 2.4: Model diagnostic plots from LMM model fit to the data from females in the Raine Study

Figure 2.5: Model diagnostic plots from LMM model fit to the data from males in the Raine Study.

Figure 2.6: Model diagnostic plots from STLMM model fit to the data from females in the Raine Study

Figure 2.7: Model diagnostic plots from STLMM model fit to the data from males in the Raine Study.

Figure 2.8: Model diagnostic plots from SPLMM model fit to the data from females in the Raine Study.

Figure 2.9: Model diagnostic plots from SPLMM model fit to the data from males in the Raine Study

Figure 2.10: Model diagnostic plots from NLMM model fit to the data from females in the Raine Study.

Figure 2.11: Model diagnostic plots from NLMM model fit to the data from males in the Raine Study.

Figure 2.12: Distribution of obesity-risk allele score, with error bars for mean BMI at age 14 years. The obesity-risk-allele score incorporates genotypes from 17 loci (FTO, MC4R, TMEM18, GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3, SH2B1, and MSRA) in the 1,219 individuals from the Raine Study with complete genetic data. The error bars display the mean (95% CI) BMI at age 14 years (the largest follow-up in adolescence) for each risk-allele score.

Figure 2.13: Population average curves for each of the significantly associated SNPs from the SPLMM method in females (panel A) and males (panel B)

Figure 2.14: Population average curves from the SPLMM method in females and males

Figure 2.15: Associations between the risk-allele score and BMI at each follow-up in females and males. Regression coefficients (95% CI) presented on ln(BMI) scale from the Semi-Parametric Linear Mixed Model (SPLMM) longitudinal model, derived at each of the average ages of follow-up. For example, a male with 17 obesity-risk-alleles is likely to have a ln(BMI) 0.005 units higher at age 6 than a male with 16 alleles and by age 14 this difference will be increased to 0.010 units.

Figure 2.16: Q-Q plot of residuals for each of the methods for females (top four) and males (bottom four)

Figure 3.1: Comparison of the one and two-step approaches for the SNP main effect from the 1,000 simulated data sets with different effect sizes for the SNP main effect and SNP*age interaction effect; on the x-axis is the –log10(PF) and on the y-axis is the –log10(PT).

Figure 3.2: Comparison of the one and two-step approaches for the SNP*age interaction effect from the 1,000 simulated data sets with different effect sizes for the SNP main effect and SNP*age interaction effect; on the x-axis is the –log10(PF) and on the y-axis is the –log10(PT).

Figure 3.3: Comparison of the β and SE estimates using the one and two-step approaches for the SNP main effect and SNP*age interaction effect from the 1,000 simulated data sets where both the SNP main effect and SNP*age interaction effect were significant; on the x-axis are the estimates (β and SE(β)) from the SPLMM and on the y-axis are the estimates from the two-step approach.

Figure 3.4: Comparison of the one and two-step approaches for each of the SNP effects from analysis of the chromosome 16 data in the Raine Study; on the x-axis is the –log10(PF) and on the y-axis is the –log10(PT).

Figure 3.5: Comparison of the β and SE estimates using the one and two-step approaches for each of the SNP effects from the chromosome 16 analysis in the Raine Study; on the x-axis are the estimates (β and SE(β)) from the SPLMM and on the y-axis are the estimates from the two-step approach.

Figure 4.1: Individual BMI trajectories for 20 females from ALSPAC

Figure 4.2: BMI measurements over time, by measurement source, in ALSPAC

Figure 4.3: Residual plots, by measurement source, for the LMM model fit to the ALSPAC data

Figure 4.4: Simulated power of the SNP main effect and SNP*age interaction terms for complete designs. The two plots on the left are for the Sparse Complete design, while the two plots on the right are from the Intense Complete design.

Figure 4.5: Simulated power of the SNP main effect and SNP*age interaction terms for unbalanced designs, where “Equal” is the simulations from the Equal Unbalanced design, “Over” are the simulations from the unbalanced design with fewer samples around the adiposity rebound and “Under” are the simulations from the unbalanced design with more samples around the adiposity rebound.

Figure 4.6: Results from comparison between missing data or variable measurement time under the sparse design

Figure 4.7: Results from comparison between missing data or variable measurement time under the intense design

Figure 4.8: Difference in power based on a normal standard error versus a robust standard error for the complete designs. A positive value indicates the power using the normal standard error is greater than the power using the robust standard error. The two plots on the left are for the Sparse Complete design, while the two plots on the right are from the Intense Complete design.

Figure 4.9: Difference in power based on a normal standard error versus a robust standard error for the unbalanced designs. A positive value indicates the power using the normal standard error is greater than the power using the robust standard error. Here, “Equal” is the simulations from the Equal Unbalanced design, “Over” are the simulations from the unbalanced design with fewer samples around the adiposity rebound and “Under” are the simulations from the unbalanced design with more samples around the adiposity rebound.

Figure 4.10: Q-Q plot of the chromosome 16 analysis in ALSPAC for the overall Wald test and the SNP*linear age interaction test. Plots A and B include 88 SNPs in the FTO gene, Plots C and D exclude SNPs in the FTO gene.

Figure 4.11: Q-Q plot of the chromosome 16 analysis in ALSPAC for all parameters, excluding 88 SNPs from the FTO gene.

Figure 4.12: Comparison of the classical and robust tests for each of the parameters of interest from the chromosome 16 analysis in ALSPAC

Figure 4.13: Comparison of the classical and robust tests for the SNP main effect by minor allele frequency (MAF) from the chromosome 16 analysis in ALSPAC

Figure 5.1: Schematic describing the relationships between genetic variants, environmental exposures and modification to disease risk in adulthood. Image adapted from Newnham et al [117].

Figure 5.2: Population average BMI trajectories in females (A) and males (B) for each of the three cohorts; the Raine Study (red), ALSPAC (green) and NFBC66 (blue).

Figure 5.3: Q-Q plot for each of the four tests of interest in the Raine Study GWAS

Figure 5.4: Plot of standard versus robust test P-Values

Figure 5.5: Manhattan plot of the P-Values from the global SNP effect (Wald test) for BMI trajectory in the Raine Study. The red line indicates the genome-wide significance level. The most significant genetic variant is in the KCNJ15 gene on chromosome 21.

Figure 5.6: Manhattan plot of the P-Values from the global SNP by age effect (Wald test) for BMI trajectory in the Raine Study. The red line indicates the genome-wide significance level. The most significant genetic variant is in an intergenic region on chromosome 2.

Figure 5.7: Manhattan plot of the P-Values from the SNP main effect for BMI trajectory in the Raine Study. The red line indicates the genome-wide significance level. The most significant genetic variant is in an intergenic region on chromosome 14.

Figure 5.8: Manhattan plot of the P-Values from the SNP by linear age effect for BMI trajectory in the Raine Study. The red line indicates the genome-wide significance level. The most significant genetic variant is upstream from the GRM7 gene on chromosome 3.

Figure 5.9: BMI trajectories for females (left) and males (right) for each of the KCNJ15, rs2008580 alleles.

Figure 5.10: Regional plot of (A) global Wald P-Values for the overall SNP effect and (B) Wald P-Values for the SNP by age effect as a function of genomic position (NCBI Build 36) from the meta-analysis of ALSPAC and NFBC66 for KCNJ15 gene region. In each plot, the meta-analysis P-Value for rs2836241 is denoted by a purple diamond; all other analysed SNPs are represented by a circle. Local LD structure is reflected by the plotted estimated recombination rates (taken from HapMap). The colour scheme of the circles respects LD patterns (HapMap CEU pairwise r² correlation coefficients) between rs2836241 and surrounding variants. Gene annotations were taken from the University of California Santa Cruz genome browser.

Figure 5.11: Regional plot of (A) P-Values for the SNP main effect at age eight and (B) P-Values for the SNP by linear age effect as a function of genomic position (NCBI Build 36) from the meta-analysis of ALSPAC and NFBC66 for KCNJ15 gene region. In each plot, the meta-analysis P-Value for rs2836241 is denoted by a purple diamond; all other analysed SNPs are represented by a circle. Local LD structure is reflected by the plotted estimated recombination rates (taken from HapMap). The colour scheme of the circles respects LD patterns (HapMap CEU pairwise r² correlation coefficients) between rs2836241 and surrounding variants. Gene annotations were taken from the University of California Santa Cruz genome browser.

Figure 6.1: Population average curves for individuals from ALSPAC with 27, 29 or 31 BMI risk alleles in females (A, C and E) and males (B, D and F). Predicted population average BMI (A and B), weight (C and D) and height (E and F) trajectories from 1 – 16 years for individuals with 27 (lower quartile), 29 (median), and 31 (upper quartile) BMI risk alleles in the allelic score.

Figure 6.2: Associations between the allelic score and BMI (A and B), weight (C and D) and height (E and F) at each follow-up in females and males from ALSPAC. Regression coefficients (95% CI) derived from the longitudinal model at each year of follow-up between 1 and 16 years.

Figure 6.3: Estimates from the longitudinal models of the proportion of BMI variation explained (R²) at each time point in females and males from ALSPAC. R² derived from the longitudinal model at each year of follow-up between 1 and 16 years. Of note, there are increases in the proportion of BMI variation explained by the allelic score around the landmarks of growth including adiposity peak and puberty.

Glossary

Allele: a viable DNA (deoxyribonucleic acid) coding that occupies a specific location on a chromosome. An individual inherits one of these DNA codes from each parent.

Autosome: any chromosome that is not a sex chromosome. There are 22 autosomes in the human genome.

Association Study: a study of the statistical association between an allele and a trait in the population.

Chromosome: an organized structure of deoxyribonucleic acid (DNA) that is found in cells. It is a single piece of coiled DNA containing many genes, regulatory elements and other nucleotide sequences.

Complex disease: a disease that involves multiple genetic and environmental factors and their interactions.

Coverage probability: the proportion of the time that a confidence interval contains the true value of interest.

Critical period: a period of time in which an exposure can have an adverse (or protective) effect on the development of a disease outcome; outside this time window, there is no additional risk of disease associated with the exposure.

Deoxyribonucleic Acid (DNA): a molecule that encodes the genetic instructions used in the development and functioning of all known living organisms. Most DNA molecules are double-stranded helices consisting of four different types of subunits called nucleotides: guanine (G), adenine (A), thymine (T) and cytosine (C).

Exon: a part of a gene that codes for a protein. Exons are the parts of the DNA that are converted into mature messenger RNA (mRNA).

Gene: a segment of inherited DNA which contains the information necessary to produce a

functional product through transcription to RNA and translation to proteins.

Genome: the total hereditary information of an individual that is encoded in the DNA.

Genomic Inflation: the presence of excess false-positive results, measured by quantifying the

ratio of the median of the empirically observed distribution of the test statistic to the expected

median.

Genotype: the combination of alleles an individual possesses at a particular locus. As

individuals have two chromosomes, a genotype is usually expressed as two alleles.

Haplotype: a group of alleles on a single chromosome that are closely enough linked to be

inherited as a unit.

Hardy-Weinberg Equilibrium: the principle which describes the distribution of genotypes at a

locus in terms of its allele frequencies in a population.

Heritability: proportion of observable variation in a trait between individuals within a

population that is due to genetic variation.

Heterozygote: an individual that has two different alleles at a particular locus on a

chromosome.

Homozygote: an individual that has the same two alleles at a particular locus on a

chromosome.

Identical-By-Descent: a segment of DNA that two or more individuals have inherited from a

common ancestor without recombination and therefore the segment has the same ancestral

origin in these individuals.

Intron: segments of a gene that are not transcribed into messenger RNA and that are found

between exons.


Linkage Disequilibrium: non-random association of alleles at two or more closely linked loci.

Locus: the fixed, unique physical position of a gene or one of its alleles on a chromosome.

Phenotype: specific manifestation of a trait or behaviour that varies between individuals.

These are any qualitative or quantitative observable characteristics of an individual and often

referred to as traits.

Polymorphism: a locus that is polymorphic has at least two alternative alleles. This implies that

the given locus varies genetically between individuals.

Population stratification: different disease rates and allele frequencies co-occurring within

population subgroups, which can lead to spurious associations at the population level.

Power: the probability of rejecting the null hypothesis when the null hypothesis is false.

Recombination: the exchange of genetic information between two chromosomes during cell

division, resulting in new genetic variation.

Sensitive period: similar to a critical period, it is a period of rapid change in an individual;

however, a sensitive period allows the risk of disease to be modified or even reversed outside the

time window.

Single Nucleotide Polymorphism: a DNA sequence variation occurring when a single

nucleotide – A, T, C or G – in the genome differs between members of a species.

Type I error: the incorrect rejection of a true null hypothesis. That is, when a parameter is

falsely declared significant.

Type II error: the failure to reject a false null hypothesis. That is, a parameter is not declared

significant when it should be.


Abbreviations

AIC Akaike information criterion

ALSPAC Avon Longitudinal Study of Parents and Children

BCDIN3D Gene: BCDIN3 domain containing

BDNF Gene: Brain-Derived Neurotrophic Factor

BIC Bayesian information criterion

BLUE Best Linear Unbiased Estimator

BLUP Best Linear Unbiased Predictor

BMI Body mass index

CADM2 Gene: Cell Adhesion Molecule 2

cdf Cumulative Distribution Function

CEU Samples of European descent (from residents of the United States of America

of northern and western European ancestry) from HapMap

CHB Samples of Chinese descent (from Beijing) from HapMap

CI Confidence Interval

CNV Copy Number Variant

dbGaP Database of Genotypes and Phenotypes

df Degrees of Freedom

bp Base pairs

DOHaD Developmental Origins of Health and Disease

DNA Deoxyribonucleic Acid

EGG Early Growth Genetics Consortium

EM Expectation Maximization

ETV5 Gene: ETS Variant 5

FAIM2 Gene: Fas Apoptotic Inhibitory Molecule 2

FANCL Gene: Fanconi Anemia, Complementation Group L

FDR False Discovery Rate

FLJ35779 Gene: POC5 centriolar protein homolog

FTO Gene: Fat mass and obesity associated

GAW Genetic Analysis Workshop

GIANT Genetic Investigation of Anthropometric Traits Consortium


GNPDA2 Gene: Glucosamine-6-Phosphate Deaminase 2

GPRC5B Gene: G Protein-Coupled Receptor, Family C, Group 5, Member B

GWAS Genome-Wide Association Study

HOXB5 Gene: Homeobox B5

HWE Hardy-Weinberg Equilibrium

IBD Identical by Descent

IQR Interquartile Range

JPT Samples of Japanese descent (from Tokyo) from HapMap

KCTD15 Gene: Potassium Channel Tetramerisation Domain Containing 15

LD Linkage Disequilibrium

LGR4 Gene: Leucine-Rich Repeat containing G Protein-Coupled Receptor 4

LMM Linear Mixed Effects Model

LMX1B Gene: LIM Homeobox Transcription Factor 1, beta

ln Natural Logarithm

LRP1B Gene: Low Density Lipoprotein Receptor-Related Protein 1B

LRRN6C Gene: Leucine Rich Repeat Neuronal 6C

LRT Likelihood Ratio Test

MACH Markov Chain Haplotyping software

MAF Minor Allele Frequency

MAGIC Meta-Analysis of Glucose and Insulin-related traits Consortium

MAP2K5 Gene: Mitogen-Activated Protein Kinase Kinase 5

MC4R Gene: Melanocortin 4 Receptor

MCE Monte Carlo Error

MDS Multidimensional Scaling

ML Maximum Likelihood

MLE Maximum Likelihood Estimate

MTCH2 Gene: Mitochondrial Carrier 2

MTIF3 Gene: Mitochondrial Translational Initiation Factor 3

NFBC66 Northern Finnish Birth Cohort of 1966

NEGR1 Gene: Neuronal Growth Regulator 1

NLMM Non-Linear Mixed Effects Model

NRXN3 Gene: Neurexin 3

OLFM4 Gene: Olfactomedin 4


PC Principal Component

PCA Principal Components Analysis

pdf Probability Density Function

PI Ponderal Index

PRKD1 Gene: Protein Kinase D1

PTBP2 Gene: Polypyrimidine Tract Binding Protein 2

QC Quality Control

QPCTL Gene: Glutaminyl-Peptide Cyclotransferase-Like

Raine Western Australian Pregnancy Cohort Study

RBJ Gene: DnaJ (Hsp40) homolog, subfamily C, member 27

REML Restricted Maximum Likelihood

RPL27A Gene: Ribosomal Protein L27a

SD Standard Deviation

SE Standard Error

SEC16B Gene: SEC16 Homolog B

SFRS10 Gene: Splicing Factor, Arginine/Serine-Rich 10

SH2B1 Gene: SH2B Adaptor Protein 1

SITAR SuperImposition by Translation And Rotation

SLC39A8 Gene: Solute Carrier Family 39 (zinc transporter), Member 8

SNI Skew-Normal/Independent distribution

SNP Single Nucleotide Polymorphism

SPLMM Semi-Parametric Linear Mixed Effects Model

STLMM Skew-t Linear Mixed Effects Model

TFAP2B Gene: Transcription Factor AP-2 Beta (activating enhancer binding protein 2

beta)

TFBS Transcription Factor Binding Site

TMEM18 Gene: Transmembrane Protein 18

TMEM160 Gene: Transmembrane Protein 160

TNNI3K Gene: TNNI3 interacting kinase

UMVUE Uniformly Minimum Variance Unbiased Estimator

YRI Samples of African descent (Yoruba people of Ibadan, Nigeria) from HapMap

ZNF608 Gene: Zinc Finger Protein 608


Acknowledgements

I would firstly like to thank my supervisors. Thank you Associate Professor Craig Pennell for

giving me the freedom to conduct my research in Toronto; I know it was an additional

challenge at times, but I appreciate your dedication to making it work. Thank you Associate

Professor Laurent Briollais for allowing me to join your research group for the past two years; I

have learnt so much more by being in Toronto, and I will be ever grateful for the opportunity

you have provided. Your patience, advice and direction have allowed this thesis to become a

reality. Professor Stephen Lye, your excitement over my findings that fit the DOHaD story has

kept me going, thank you.

To Professor Lyle Palmer, thank you for beginning my career in genetic statistics. Your

guidance and enthusiasm is what has kept me going over the past few years. You have always

believed in my abilities and encouraged me to aspire to a level I never thought was possible.

I am grateful to have spent time working with an amazingly talented group of people

throughout the course of my thesis. Yan Yan, thank you for helping broaden my knowledge of

linear mixed effects models and simulation studies. Your ability to describe complex methods

makes the statistics much more palatable. Laura Howe, thank you for your continuous

enthusiasm and willingness to contribute to my research. Even from so far away, your emails

would always put a smile on my face. To Kate, Lavinia and Debbie, thank you for your advice

and help in interpreting results from various aspects of my research.

This thesis wouldn’t be possible without the foresight of Professor John Newnham who

established the Raine Study, thank you. Thank you also to the participants and scientists

involved in the Raine, ALSPAC and NFBC66 studies; I understand how much these cohort

studies take to maintain and know that they only exist because of the dedication and

generosity of all those involved. Thank you to the Raine Study for providing me with additional

funding to make my research possible and allow me to travel between Perth and Toronto. I

also gratefully acknowledge the financial support received from the Australian Government

Department of Innovation, Industry, Science and Research (Australian Postgraduate Award).


To my friends who are one step ahead of me; Laura, Katie, Gemma and Sarah. Thank you for

giving me the inspiration to achieve this goal and sharing stories of your PhD journeys with me.

I have enjoyed our statistical and epidemiological discussions in the office, over brunch or even

over a glass of wine (or beer for the Canadians!). Laura, thank you for making me experience

everything Canadian while I was in Toronto; you have been an amazing tour guide and I have

seen so much. Sarah, all of your advice over the final few months of this journey was greatly

appreciated.

A special thank you to my Canadian family: Lucie, Peter, Helen and Jane. When it all got too

much, you were always there to listen, have a glass of wine and entertain me with the latest

misdemeanours. You gave me a home away from home. Our memoirs from the last two years

will provide great entertainment!

Finally, I would like to say a big thank you to my amazing family. I’m sure you all thought that I was

crazy to do this, but you continued to support me in every way. Mum and Dad, thank you for

being there and listening through all the ups and downs of this journey. Your support and

willingness to understand what I was working on is admired. Mon, thank you for providing a

bed for my many trips back to Perth, being an amazing right arm when mine was broken, being

an English language teacher when I needed it most and just being the best big sister a girl

could ever want. I would be lost without you. Nanna, thank you for your continued love and

support, and amazing baking when I make it home! I hope I have made you all proud.


Chapter 1: Introduction

1.1 General Introduction

Incidence rates of the most common chronic diseases are increasing in developed, and more

recently in developing countries, affecting millions of people globally. These conditions are

influenced by multiple interacting factors including the environment, behaviour and genetics.

Relatively little is known about what is driving the rise in the prevalence of many chronic

diseases, which limits our ability to constrain the prevalence of these conditions that result in

enormous economic and social burden. The ability to study environmental factors influencing

disease risk has resulted in some important discoveries, such as the relationship between

smoking and increased cancer risk. Despite the common knowledge that most chronic diseases

are heritable, our ability to discover the combination of genes that increase an individual’s

disease risk has, to date, had a very small impact.

Over the past decade there has been explosive growth in the technical capacity to generate

and store enormous genomic datasets across the whole genome. The analysis of these

datasets was initially tempered by failures to find genes for complex phenotypes using any

analytic strategy. More recently, primarily through the increased sample sizes available using

multiple studies in large consortia, many genetic regions associated with hundreds of

phenotypes have been identified. Notwithstanding these successes, there is still a great need

to discover, characterise and translate genetic and environmental determinants of disease.

The statistical methods to analyse genetic data still lag far behind our ability to produce

enormous genetic datasets. Many of the traits investigated to date have been analysed using

cross-sectional designs utilising relatively simplistic statistical methods. This has resulted in the

identification of many regions of interest in the genome; however the genetic variants

discovered to date only explain a small proportion of the expected variability in a given trait

due to genetics (heritability). To advance our knowledge of the role of genetics in complex

disease, sophisticated statistical methods and optimal study designs now need to be

developed that better capture the disease process.


This thesis aims to extend the current statistical methods used in genetic association studies to

allow investigation into how genetic variants are associated with quantitative phenotypes over

time. By applying advanced statistical techniques to the breadth of longitudinal data that is

available in large population based studies, it is anticipated that geneticists will be able to

uncover more of the genetic determinants of complex diseases. Longer term, this information

may lead to the development and implementation of more targeted interventions at a younger

age before the onset of disabling chronic disease in those at risk, ultimately reducing the cost

of healthcare.

1.2 Introduction to Life Course Epidemiology

Late last century, there were two schools of thought regarding how individuals developed

disease in adulthood. The first hypothesis was the “adult lifestyle” approach, whereby an

individual’s behaviour in adulthood, including smoking, diet, exercise and alcohol

consumption, affected the onset and progression of disease. Its main focus was on identifying

factors that were associated with the timing of disease onset and speed of degeneration. The

second approach was the “biological programming” hypothesis, whereby environmental

exposures, such as under-nutrition during critical periods of growth and development in utero,

programmed an individual in such a way that their risk of adult chronic disease was increased

[1]. This approach was originally known as the foetal origins of adult disease, where birth size,

as a marker of antenatal growth, influenced later disease risk. It has more recently been

termed the developmental origins of health and disease (DOHaD) which broadens the

timeframe of development to beyond that of just antenatal growth leading to birth size [2].

The hypotheses underlying DOHaD indicate that we are predisposed to disease through our

early life exposures. In the late 1990s, a third approach was introduced to bridge the

increasing gap between the existing biological programming and adult lifestyle approaches.

This new hypothesis was termed the life course approach, which developed into a

multidisciplinary framework for research on health, human development and aging [3,4]. Life

course epidemiology is defined as “the study of long term effects on later health or disease risk

of physical or social exposures during gestation, childhood, adolescence, young adulthood and

later adult life” [3,4]. Life course epidemiology investigates the contribution of early life risk

factors, such as biological (including genetics), environmental and social exposures, in

conjunction with later-life factors, to identify processes that may account for inequalities in

adult health and mortality. It aims to understand the relevance of different exposures


occurring at different times in the life course on later health, and allows key periods to be

identified for potential targeted interventions. A significant aspect of life course epidemiology

is the idea of critical or sensitive periods; limited windows of time where an exposure has an

effect, either protective or adverse, on the development of a disease outcome. Specifically, a

critical period occurs where outside this window of time there is no excess disease risk

associated with the exposure, whereas a sensitive period allows for modification or reversal of

the disease risk outside of this particular window [5]. If researchers are able to promote

awareness of these early and life course approaches to disease, it will reinforce the importance

of a healthy lifestyle from a young age, which will ideally have an impact on reducing the

incidence of disease.

1.3 Introduction to Genetic Epidemiology

Genetic epidemiology has been defined as “a discipline closely allied to traditional

epidemiology that focuses on the familial, and in particular genetic, determinants of disease

and the joint effects of genes and non-genetic determinants” [6]. This section presents an

overview of some of the fundamental genetic concepts and terminology used throughout this

thesis.

1.3.1 Genetics for Genetic Epidemiology

The number of cells in the human body is estimated to be 100 trillion ($1 \times 10^{14}$). In the nucleus

of each cell are 23 pairs of chromosomes; one of each pair is inherited from each parent. These

23 pairs are made up of 22 pairs of autosomes and one pair of sex chromosomes: XX in females and XY in

males. Each chromosome is made up of deoxyribonucleic acid (DNA); the carrier of genetic

code. The genome is the total amount of hereditary information encoded in the DNA across all

chromosomes. The genome is made up of smaller units, also known as genes, which are

sequences of DNA located together and are the basic unit of genetic information. Each

individual has approximately 25,000 genes each of which consists of coding and non-coding

regions, known as exons and introns respectively. Exons are DNA sequences that code for

particular proteins, whereas introns do not code proteins but are thought to play an important

role in regulating the manufacture of proteins. DNA consists of two strands; each strand

contains chemical building blocks called nucleotides. Nucleotides differ in their nitrogen-

containing base. There are four bases in DNA; adenine (A), cytosine (C), thymine (T) and

guanine (G). The sequence of these bases within a strand determines the genetic information


stored in the strand. As these four bases occur in pairs along the chromosome, the genetic

distance along a chromosome is measured in base pairs (bp). A genetic locus is a particular

position along a chromosome. An allele is any one of the four possible DNA bases occupying a

given genetic locus on a chromosome. Typically, alleles are used as a representation of a gene

that incorporates them, as they account for variations in an inherited characteristic. An

individual will inherit one allele from each parent and will therefore have two alleles at each

genetic locus. An individual’s genotype depicts the alleles that they possess at a locus and

therefore the specific genetic makeup of an individual at a particular locus.

The most commonly used genetic variant in studies of the genome is a biallelic variant. If we

let A denote the common allele and a the rare allele at a locus, then there are three possible

combinations of alleles that make up an individual’s genotype, namely AA, Aa or aa. If an

individual has a genotype with identical alleles (AA or aa) then they are referred to as a

homozygote. If an individual has two different alleles at a locus (Aa) then they are said to be a

heterozygote.

An observable characteristic in an individual, either a biological or physical trait, is called a

phenotype. A phenotype is any characteristic that varies between individuals, for example

height or eye colour, and is influenced by an interaction between an individual’s genotype and

their environment.

A polymorphism is a variation in the genetic material that may change the function of a gene. A

single nucleotide polymorphism (SNP) occurs when a single nucleotide (either A, T, C or G) at a

specific genetic locus in the genome is substituted with another nucleotide. More specifically,

suppose the majority of individuals are homozygous for the C allele. A SNP occurs if one or

both of these alleles are replaced with another allele, say T. In this mutation, the T allele would

be referred to as the variant (or rare) allele. Each SNP has a unique identification number,

known as its rs number. Of the three billion bases in the human genome, there are

approximately 10 million common SNPs with frequency greater than 1% [7].

Many SNPs have no effect on cell function and go unnoticed. However, some SNPs can affect

the development and progression of disease in individuals and how they respond to

treatments used in prevention and disease management. SNPs are often used as markers in


genetic studies. As they remain relatively constant from generation to generation, they are

useful for studying the associations between genetic variations and observed phenotypes in an

individual. Although there are several different types of mutations, including copy number

variations (CNVs) and insertions/deletions (Indels), this thesis focuses on SNPs only.

SNPs can affect phenotypic expression in four ways, exerting dominant, recessive, additive or

co-dominant genetic effects. A dominant pattern is where a phenotype is expressed in

individuals who have at least one copy of the variant allele; a recessive pattern is where the

individual must have two copies of the variant allele to have the phenotype; an additive

pattern is where the expression of a phenotype increases or decreases linearly as the number

of variant alleles increases; and a co-dominant pattern occurs when the expression of a

phenotype increases or decreases, not necessarily in a linear pattern, as the number of variant

alleles increases.
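
The four patterns above are often translated into numeric covariates before a SNP is entered into a regression model. The sketch below shows one common way of doing this; the function name `encode_genotype` and the particular 0/1/2 coding are illustrative assumptions, not a coding taken from this thesis.

```python
# Hypothetical helper (not from the thesis): map the number of variant
# alleles carried by an individual (0, 1 or 2) to the covariate(s)
# implied by each genetic model described in the text.
def encode_genotype(n_variant, model):
    if n_variant not in (0, 1, 2):
        raise ValueError("n_variant must be 0, 1 or 2")
    if model == "dominant":
        # at least one copy of the variant allele
        return [1 if n_variant >= 1 else 0]
    if model == "recessive":
        # two copies of the variant allele are required
        return [1 if n_variant == 2 else 0]
    if model == "additive":
        # linear in the number of variant alleles
        return [n_variant]
    if model == "codominant":
        # separate indicators for heterozygotes and variant homozygotes,
        # so the effect need not be linear in the allele count
        return [1 if n_variant == 1 else 0, 1 if n_variant == 2 else 0]
    raise ValueError("unknown model: " + str(model))


# Example: a heterozygote (one variant allele) under each model
for model in ("dominant", "recessive", "additive", "codominant"):
    print(model, encode_genotype(1, model))
```

The co-dominant coding uses two indicator variables, which costs an extra degree of freedom compared with the additive coding but does not force a linear trend.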

Recombination is the “breaking down of one maternal and one paternal chromosome, the

exchange of corresponding sections of DNA, and the re-joining of the chromosomes”[8]. As a

result of recombination, each chromosome contains a mixture of alleles from each parent. A

haplotype is a series of alleles along a chromosome which have not taken part in

recombination, and have therefore been inherited as a unit. The concept of haplotype

inference will be introduced in Section 1.3.4 and how they relate to genome-wide association

studies in Section 1.4.2.

1.3.2 Hardy-Weinberg Equilibrium Principle

The Hardy-Weinberg Equilibrium (HWE) principle serves as the foundation for population

genetics, as it explains the consistency of genotype frequencies across generations in a

population. The principle was defined independently by an English Mathematician, Godfrey

Hardy [9], and a German Physician, Wilhelm Weinberg [10]. The assumption is that in a large,

randomly mating population, the allele and genotype frequencies at a single genetic locus will

remain at an equilibrium value from generation to generation. The principle only holds if the

following conditions are satisfied:

1. The population is very large

2. The population is isolated from other populations (i.e. there is no migration)

3. There are no mutations


4. Mating within the population occurs at random

5. There is no natural selection (i.e. every individual has an equal chance of survival)

Let us consider the following example consisting of a bi-allelic locus with two alleles, A and a.

Let the relative frequencies of these alleles in a given population be $p$ and $q = 1 - p$ respectively.

As stated in the previous section, there are three possible genotypes for this locus, AA, Aa and

aa. Under HWE, the frequencies of these genotypes will be of the proportions $p^2$, $2pq$ and $q^2$

respectively. If the above conditions hold then these frequencies will remain unchanged from

generation to generation. If the genotype frequencies in the observed population are

significantly different from the expected frequencies then it can be concluded that the sample

is not in HWE. Deviations from HWE can indicate genotyping errors or violation of one of the

five assumptions, both of which could lead to bias in subsequent genetic analysis [11].
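
As an illustration of the comparison between observed and expected genotype frequencies, the following sketch estimates the allele frequency from genotype counts, derives the expected counts under HWE and applies a chi-square goodness-of-fit test with one degree of freedom. The choice of the chi-square test and the function name `hwe_test` are assumptions made for this example; the text does not prescribe a particular test, and exact tests are often preferred when counts are small.

```python
import math


def hwe_test(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions from genotype counts.

    Returns the estimated frequency of allele A, the expected genotype
    counts (AA, Aa, aa), the chi-square statistic and its p-value (1 df).
    """
    observed = (n_AA, n_Aa, n_aa)
    n = sum(observed)
    # Each individual carries two alleles, so there are 2n alleles in total.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return p, expected, chi_square, p_value


# Example with made-up counts
p_hat, expected, chi_square, p_value = hwe_test(360, 480, 160)
print(p_hat, expected, chi_square, p_value)
```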

1.3.3 Linkage Disequilibrium

Under the assumption of random mating, the alleles of a locus within a population are

combined into genotypes randomly, such that genotype frequencies are consistent with

Hardy-Weinberg proportions. Consider an example with two loci, M1 and M2, where the

possible genotypes at M1 are AA, Aa and aa and the possible genotypes at M2 are BB, Bb and

bb. Although the genotypes of both of these loci may be consistent with Hardy-Weinberg

proportions, the alleles of M1 may not be independent of the alleles of M2. This dependence

between two genetic loci is termed linkage disequilibrium (LD). Alternatively, if allele A is

independent of allele B then the two loci are said to be in linkage equilibrium.

Several measures to quantify the linkage disequilibrium between two SNPs have been

proposed [12,13]. To continue the example above, let the probabilities of the alleles at the two

loci in the sample be denoted as $p_A$, $p_a$, $p_B$ and $p_b$ respectively. When alleles at the two loci

occur independently of each other, that is they are in linkage equilibrium, then the probability

of having an A allele at M1 and a B allele at M2 is $p_{AB} = p_A p_B$. The sum of the probabilities for each

combination is one; i.e. $p_{AB} + p_{aB} + p_{Ab} + p_{ab} = 1$. Additionally, due to the independence of the

two loci, the probability of having an A allele given that an individual already has a B allele is

$p_{A|B} = p_A$. When the alleles are in linkage disequilibrium, the observed probabilities of the

alleles differ from the expected probabilities and the strength of this deviation can be


quantified by the linkage disequilibrium coefficient, D. Table 1.1 illustrates how D modifies the

expected probabilities.

Table 1.1: Probability of observing a given haplotype when the loci are in disequilibrium

| Alleles | B | b |
| --- | --- | --- |
| A | $p_{AB} = p_A p_B + D$ | $p_{Ab} = p_A p_b - D$ |
| a | $p_{aB} = p_a p_B - D$ | $p_{ab} = p_a p_b + D$ |

Table 1.1 shows the probabilities of each combination of alleles differ by the linkage

disequilibrium parameter D when the population is in LD. If D is zero then there is linkage

equilibrium. The value of D is dependent on the allele frequencies at M1 and M2 such that the

smallest possible value of D ($D_{min}$) and the largest possible value ($D_{max}$) are:

$D_{min} = \max(-p_A p_B,\ -p_a p_b)$

$D_{max} = \min(p_A p_b,\ p_a p_B)$

Because D is dependent on allele frequencies at both the loci, this parameter is comparable

between two pairs of loci only if their allele frequencies are similar. To standardise this

measure, Lewontin [12] introduced the now commonly used measure D’, obtained by dividing

D by $D_{max}$ when D is positive and by $D_{min}$ when D is negative. D’ will then take a value between 0 and 1, with a larger value of D’ indicating a

stronger correlation between the two loci and therefore stronger LD. If the two loci are

completely correlated, D’ will equal 1 and the alleles at one locus will always predict the alleles

at the second locus. Another common LD coefficient, r [13], is the correlation coefficient

between two loci, defined as:

$$r = \frac{D}{\sqrt{p_{A+}\, p_{a+}\, p_{B+}\, p_{b+}}}$$

where $p_{A+}$ indicates the relative frequency of the A allele ($p_{AB} + p_{Ab}$), $p_{a+}$ indicates the relative

frequency of the a allele ($p_{aB} + p_{ab}$), and the same applies for the B allele. This term is

commonly squared to remove the sign that can be introduced in this calculation depending on

how the loci are labelled, giving the more familiar measure $r^2$. Even when two loci are in

complete disequilibrium based on Lewontin’s D’ (i.e. D’=1), the pairwise $r^2$ value can vary

widely because it is related to the allele frequencies of the two loci and the position of the

corresponding mutations in the genealogy.
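
The measures described in this section can be computed directly from the four haplotype frequencies. The sketch below is a minimal illustration under stated assumptions: the function name `ld_measures` is hypothetical, the haplotype frequencies are assumed to be known rather than estimated, and a negative D is scaled by its minimum possible value so that D’ lies between 0 and 1.

```python
import math


def ld_measures(p_AB, p_Ab, p_aB, p_ab):
    """Return D, D', r and r^2 from the four haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to one")
    # Marginal allele frequencies at each locus
    p_A, p_a = p_AB + p_Ab, p_aB + p_ab
    p_B, p_b = p_AB + p_aB, p_Ab + p_ab
    if min(p_A, p_a, p_B, p_b) <= 0.0:
        raise ValueError("both loci must be polymorphic")
    # Deviation of the observed AB frequency from its expectation
    D = p_AB - p_A * p_B
    D_min = max(-p_A * p_B, -p_a * p_b)
    D_max = min(p_A * p_b, p_a * p_B)
    if D > 0:
        D_prime = D / D_max
    elif D < 0:
        D_prime = D / D_min
    else:
        D_prime = 0.0
    r = D / math.sqrt(p_A * p_a * p_B * p_b)
    return {"D": D, "D_prime": D_prime, "r": r, "r2": r * r}


# Example with made-up frequencies: the aB haplotype is absent
print(ld_measures(0.6, 0.3, 0.0, 0.1))
```

With these made-up frequencies D’ equals 1 because one of the four haplotypes is absent, yet $r^2$ is only roughly 0.17 because the allele frequencies at the two loci differ, which is the behaviour described in the paragraph above.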


1.3.4 Haplotype Inference

Chromosomes from each parent exchange segments of DNA during cell formation, a process

referred to as recombination. Alleles that have not been subjected to the recombination

process are inherited as a unit, henceforth referred to as a haplotype. Several methods can be

used to construct haplotypes with certainty, including the use of family data or laboratory

techniques to isolate specific haplotypes; however these procedures are often costly and time

consuming [14]. Therefore, haplotypes are often derived using analytic techniques from

genotypes; although an individual’s genotype may not uniquely define their haplotype. For

example, consider two loci such that the first has genotype AA, Aa or aa and the second BB, Bb

or bb. Pairing one allele from a genotype at each locus forms a haplotype. Table 1.2 illustrates

the possible diplotypes (haplotype pairs) an individual can have given their genotypic

information.

Table 1.2: The genotypic information an individual possesses across two loci determines their

possible diplotype

| Genotype | BB | Bb | bb |
| --- | --- | --- | --- |
| AA | AB AB | AB Ab | Ab Ab |
| Aa | AB aB | AB ab or Ab aB | Ab ab |
| aa | aB aB | aB ab | ab ab |

Table 1.2 shows that most genotype combinations result in a unique diplotype. For example, if

an individual has the genotype AA at one locus and BB at the other, then they have the

diplotype AB AB. However, if an individual is heterozygous at the two loci, that is the genotype

Aa at one locus and Bb at the other, then the diplotype is uncertain. This is also referred to as

a phase-ambiguous haplotype pair. Appropriate statistical methods are necessary to establish

the likelihood of an individual possessing each diplotype based on their available genotype

data. There is a growing body of literature on appropriate haplotype inference methods,

including Clark’s algorithm [15], a pseudo-Bayesian algorithm by Stephens et al [16] and an

Expectation-Maximization (EM) algorithm by Excoffier and Slatkin [14].
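
The phase ambiguity described above can be made concrete by listing every diplotype that is compatible with a pair of genotypes. The sketch below does only this enumeration for two biallelic loci; it is not an implementation of any of the cited inference algorithms, and the function name `compatible_diplotypes` is an assumption made for illustration.

```python
from itertools import product


def compatible_diplotypes(genotype_1, genotype_2):
    """List the unordered haplotype pairs consistent with two genotypes.

    genotype_1 and genotype_2 are two-character strings such as "Aa" and
    "Bb", giving the pair of alleles carried at each locus.
    """
    diplotypes = set()
    # Choose which allele at each locus sits on the first chromosome;
    # the remaining allele at each locus then sits on the second.
    for i, j in product((0, 1), repeat=2):
        hap_1 = genotype_1[i] + genotype_2[j]
        hap_2 = genotype_1[1 - i] + genotype_2[1 - j]
        # Sort within the pair so that (AB, ab) and (ab, AB) are the same
        diplotypes.add(tuple(sorted((hap_1, hap_2))))
    return sorted(diplotypes)


# Only the double heterozygote yields more than one diplotype
for g1, g2 in product(("AA", "Aa", "aa"), ("BB", "Bb", "bb")):
    print(g1, g2, compatible_diplotypes(g1, g2))
```

Statistical methods such as the EM algorithm then assign a probability to each candidate diplotype of a double heterozygote, using haplotype frequencies estimated from the whole sample.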


1.3.5 Evolution of Genetic Epidemiology Studies

There are various approaches available to investigate the relationship between genetic

variants and specific phenotypes including linkage studies [17], association studies [18] and

most recently whole-genome sequencing [19], each of which has its own strengths and

limitations [20]. This section describes these key analytic approaches which are used to

identify the genetic factors of human disease.

1.3.5.1 Linkage

Linkage studies were previously used as the first stage in the genetic investigation of a trait, as

they identify broad genetic regions that might contain a disease gene. Two genetic loci are

linked if recombination between them occurs with a probability of less than 50%; that is, they

are more likely to be transmitted together from parent to offspring than expected under independent

inheritance [17].

In parametric (or model-based) linkage analysis, genetic markers that are evenly distributed

throughout the genome are genotyped in pedigrees and their co-segregation is investigated.

Parametric linkage examines the probability of recombination between two loci, quantified by

the recombination fraction θ, and is usually reported as a logarithm of the odds (LOD) score

[21]. LOD score analysis is equivalent to likelihood ratio testing, but uses logs to the base 10

instead of natural logarithms. Under the null hypothesis, there is no linkage between the

disease and marker loci (θ=0.5), while the alternative hypothesis assumes linkage exists

(θ<0.5). The LOD score function, $\log_{10}$ of the ratio of the likelihood at θ to the likelihood under the null

hypothesis (θ=0.5), is then maximised with respect to θ. A LOD score of greater than around three or

3.3 is deemed significant evidence of linkage [22,23].
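
To make the LOD score calculation concrete, the sketch below treats the simplest textbook case: n fully informative, phase-known meioses of which k are recombinant, so that the likelihood is proportional to $\theta^k (1-\theta)^{n-k}$. This simplified setting and the function names `lod_score` and `max_lod` are assumptions for illustration; pedigree-based linkage software handles far more general likelihoods.

```python
import math


def lod_score(theta, n_recombinant, n_meioses):
    """LOD score at a given theta for phase-known, informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = n_recombinant, n_meioses
    if theta == 0.0 and k > 0:
        # Any observed recombinant has likelihood zero when theta is 0
        return -math.inf
    log10_lik = (n - k) * math.log10(1.0 - theta)
    if k > 0:
        log10_lik += k * math.log10(theta)
    # Likelihood under the null hypothesis of no linkage (theta = 0.5)
    log10_lik_null = n * math.log10(0.5)
    return log10_lik - log10_lik_null


def max_lod(n_recombinant, n_meioses):
    """Maximise the LOD score over theta in [0, 0.5]."""
    # The binomial likelihood peaks at k / n, truncated at 0.5
    theta_hat = min(n_recombinant / n_meioses, 0.5)
    return theta_hat, lod_score(theta_hat, n_recombinant, n_meioses)


# Example with made-up data: 2 recombinants in 20 meioses
print(max_lod(2, 20))
```

In this simplified setting, ten informative meioses with no recombinants give a maximum LOD score of $10 \log_{10} 2 \approx 3.01$, which is close to the conventional threshold.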

Non-parametric (or model-free) linkage analysis uses the expectation that in a region

containing a disease-causing gene there would be an excess of identical by descent (IBD)

haplotype sharing. The simplest approach is to study affected sibling pairs. Under the null

hypothesis of no linkage, the number of alleles shared IBD by the siblings is zero with

probability 0.25, one with probability 0.5 and two with probability 0.25. By genotyping genetic

markers across the genome, the observed proportions of sharing none, one or two alleles IBD

at candidate loci in affected sib pairs can be compared with the expected proportions under

the null hypothesis. Linkage would be suggested if the affected sib pairs share significantly


more alleles IBD than expected by chance. The most powerful test for detecting linkage in

affected sib pairs is the mean test, whereby the mean number of alleles shared IBD is

compared to the expected value of one [24]. This approach has been modified for other types

of relatives and larger family study designs [25,26]. Methods have also been developed for

linkage analysis of quantitative traits, using the assumption that two siblings who share more

alleles IBD would be expected to have more similar trait values if the marker is linked to a gene

influencing the trait [27,28,29].
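
The mean test for affected sib pairs can be sketched as follows. The sketch assumes that the number of alleles shared IBD is known without ambiguity for every pair and uses a one-sided normal approximation; under the null hypothesis the sharing probabilities of 0.25, 0.5 and 0.25 give a mean of one and a variance of 0.5. The function name and the example counts are hypothetical.

```python
import math


def asp_mean_test(ibd_counts):
    """Mean test for affected sib pairs.

    ibd_counts holds the number of alleles shared IBD (0, 1 or 2) per pair.
    Returns the mean sharing, the z statistic and a one-sided p-value.
    """
    n = len(ibd_counts)
    mean_sharing = sum(ibd_counts) / n
    # Null distribution: P(0)=0.25, P(1)=0.5, P(2)=0.25 -> mean 1, variance 0.5
    z = (mean_sharing - 1.0) / math.sqrt(0.5 / n)
    # Upper-tail probability of a standard normal (excess sharing only)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_sharing, z, p_value


# Hypothetical sample of 100 affected sib pairs.
pairs = [0] * 15 + [1] * 45 + [2] * 40
mean_sharing, z, p_value = asp_mean_test(pairs)
print(f"mean IBD sharing = {mean_sharing:.2f}, z = {z:.2f}, p = {p_value:.2g}")
```

In practice IBD status is usually estimated from marker data rather than observed, so this is an illustration of the test statistic only.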

Linkage studies have been very useful for identifying genomic regions for single-gene,

‘Mendelian’ diseases, which has resulted in the mapping of over 2,000 genes [30]. However,

they have only had limited success for common, complex diseases [31] due to several factors

including the relatively small proportion of variance explained by individual loci of the complex

disease, low resolution to localise disease alleles to a particular location, imprecise phenotype

definitions and inadequately powered study designs [32].

1.3.5.2 Association

Association studies investigate whether a genetic variant (for example a genotype, allele or

haplotype) is consistently associated with an observed disease or phenotypic trait in a study

population as a whole. They typically use large cross-sectional (either case/control samples or

population based samples for quantitative phenotypes) or cohort designs of unrelated

individuals; however, family designs can also be utilized. Table 1.3, adapted from Cordell and

Clayton [18], illustrates the possible study designs for genetic association studies.


Table 1.3: Study designs for genetic association studies [18,33,34]

Study design: Case-control
Description: Sample affected (case) and unaffected (control) individuals. Cases are often obtained from family practitioners or disease registries; controls can be obtained from a random population sample.
Advantages: The sample is relatively easy to collect in comparison to other study designs, and there is no need for follow-up of the individuals. This study design is used to provide an estimate of exposure effects and is the preferred design for rare diseases.
Disadvantages: Sampling requires careful selection of controls. There is potential for confounding (e.g. population stratification) with this study design, and it can generally only be used to investigate one outcome.

Study design: Case-only
Description: Sample only affected individuals. Cases can be obtained from an initial cross-sectional, cohort or disease based sample.
Advantages: This design is the most powerful design for detection of interaction effects.
Disadvantages: This design can only estimate interaction effects and is very sensitive to population stratification.

Study design: Case-parent triads
Description: Sample affected individuals and both of their parents. Affected individuals can be obtained from an initial cross-sectional, cohort or disease based sample.
Advantages: This design is robust to population stratification and can be used to estimate parent of origin and imprinting effects.
Disadvantages: This study design is less powerful than the case-control design.

Study design: Case-parent-grandparent
Description: Sample affected individuals, both of their parents and all of their grandparents. Affected individuals can be obtained from an initial cross-sectional, cohort or disease based sample.
Advantages: This design is robust to population stratification and can be used to estimate parent of origin and imprinting effects.
Disadvantages: Grandparents are rarely available for sampling.

Study design: General pedigrees
Description: A random sample or disease based sample of families from the general population.
Advantages: Power is often greater in studies of large pedigrees than in other family designs. The sample may already exist from previous linkage studies. This study design can be used to investigate a disease or quantitative trait.
Disadvantages: This study design can be expensive to genotype and there are generally many missing individuals within the families.

Study design: Cohort
Description: A subsection of the population which is used to follow the disease incidence over a specified time period.
Advantages: This design measures events in temporal sequence and therefore can be used to distinguish between causes and effects.
Disadvantages: This design is expensive to follow up. Sample selection and loss to follow-up are potential causes of bias.

Study design: Cross-sectional
Description: A random sample from the population which is used to study the prevalence of a disease.
Advantages: This sample is inexpensive to collect and it can be used to investigate multiple diseases or quantitative traits.
Disadvantages: This study design is not ideal for rare diseases as there are few affected individuals.

Study design: Extreme values
Description: A sample of individuals with extreme (high or low) values of a quantitative trait. These individuals are often obtained from an established cross-sectional or cohort sample.
Advantages: This study design genotypes only the most informative individuals and hence saves on genotyping costs.
Disadvantages: This study design does not allow for an estimate of true genetic effect sizes.

Study design: DNA-pooling
Description: Applies to a variety of the above designs, but genotyping is of pooled DNA from anywhere between two and 100 individuals (rather than on an individual basis).
Advantages: This study design is potentially inexpensive compared with individual genotyping.
Disadvantages: It is hard to estimate different experimental sources of variance using this study design.

There are two main types of association studies: candidate gene studies and genome-wide

association studies (GWASs). For candidate gene studies, a gene that is likely to be associated

with an outcome of interest is selected based on prior knowledge, such as biological

plausibility, studies of animal models or prior genetic association studies [35]. Particular

genetic variants in that gene, based on the LD structure, are then genotyped and association

analyses between these variants and the outcome are performed. Candidate gene studies

have been unsuccessful in detecting replicable genetic loci for some disease outcomes as there

are no obvious plausible candidates to test, the power of the studies is low because of the

small sample sizes, the effect size of each genetic locus is small and, prior to HapMap, there

was inadequate coverage of common variation [32]. More recently, genetic association studies

have been performed over the entire genome, which removes the requirement of prior

knowledge. These GWAS analyses make no assumptions about the identity of the causal gene

and are therefore a ‘hypothesis free’ approach to genetic association analysis. The specific

details of GWASs will be provided in Section 1.4.

There are three reasons why an association between a genetic polymorphism and a trait might

exist [36]:

1. Direct association: the polymorphism is functional and causes the change in trait

value.

2. Indirect association: the polymorphism does not cause the change in trait value

but is in linkage disequilibrium with the causal variant.

3. Confounded association: the association is due to underlying population

stratification or admixture. In genetic epidemiological studies, population

stratification can be accounted for to reduce the chance of a confounded

association. It can be accounted for by [37]:

a. Matching by family membership so that comparisons are performed

between members of the same family [38].

b. Estimating the population substructure using either ancestry informative

markers (a set of loci that exhibit different allele frequencies between

populations from different geographical regions) or principal components

analysis on a large set of genotype data. These estimates can then be used

to remove individuals from the study who are of a different substructure

or to make statistical adjustments for the substructure in the analysis [39].


c. Population substructure increases the type 1 error rate (false positives)

and therefore, increasing the threshold required to declare statistical

significance will also control for the substructure. This method is referred

to as ‘Genomic control’; however, this method does not attempt to control

for the false negative rate that is also an issue when population

substructure is present [40].

These points need to be kept in mind when interpreting the results of a genetic association

study.

1.4 Introduction to Genome-Wide Association Studies (GWASs)

1.4.1 Definition

As linkage and candidate gene approaches have, in general, failed to discover replicable

genetic associations for many diseases and traits to date, the genetics community are turning

to GWASs to identify more genetic variants that explain the heritability of these traits. A GWAS

investigates the association between a particular phenotype and each SNP on a genome-wide

scale. Although ‘genome-wide’ implies that all 10 million common SNPs are investigated,

which would be a complete coverage of the common SNPs within the genome, often only

~500,000 SNPs are genotyped on a ‘whole-genome’ panel (panels with greater than one

million SNPs are available). However, most custom whole-genome panels provide substantial

coverage of common variation in non-African populations through LD patterns [41].

As displayed in Figure 1.1, GWASs identify SNPs that are common in the population but have

low disease penetrance (purple circle). In contrast, linkage analyses often identify genetic

variants that are low in frequency but have high disease penetrance (blue circle) and candidate

gene association studies capture a mix of all combinations depending on the disease and gene

of interest. The shift in genetic epidemiology to using GWASs has been accompanied by an

increasing methodological focus on optimal approaches to the design, analysis, meta-analysis

and reporting of genetic studies, including how to define a SNP as ‘significant’ given the large

number of tests conducted.


Figure 1.1: Architecture of disease based on genetic determinants (Image adapted from

McCarthy et al [42], Manolio et al [43], Bush and Moore [44])

The first GWAS, published in 2005, investigated age-related macular degeneration in a small

sample of individuals on a relatively sparse panel of markers [45]. However, the ‘landmark’

GWAS was not published until 2007, by the Wellcome Trust Case Control Consortium, who

performed a GWAS of seven common diseases [46]. There is now a large database describing

the trait/disease associated SNPs discovered using GWASs [47], showing that over 1,500

GWASs have been published on a wide range of diseases and trait phenotypes [48]. Most of

the common variants at loci found to date have only demonstrated a modest effect, often with

odds ratios of less than 1.2 and explaining less than 1% of the variance of a phenotypic trait

[49]. Therefore, for the majority of common diseases and traits there is still a large portion of

the genetic architecture of disease unexplained [50,51].

1.4.2 Imputation

Imputation, the process by which missing data is replaced by a probable value based on

additional information, has been used in statistics for many decades. Genotype imputation

uses information from genotyped SNPs observed in a given individual and known LD patterns

from a more densely genotyped reference panel to determine the probable genotypes at an

untyped locus. Imputation is used for loci that were not genotyped or where a genotype was

not determined for a particular individual.


A particular stretch of chromosome in one individual provides information about the

genotypes of many other individuals who inherit that same stretch of chromosome IBD. In

related individuals, the stretch of chromosome, or haplotype block, will be larger as there have

been fewer generations for recombination to occur. In unrelated individuals, the shared

haplotype blocks will be much shorter as the common ancestors will be much more distant

and therefore the haplotypes are harder to identify with confidence. Genotype imputation for

missing genotypes at observed SNPs uses these haplotype blocks in a Hidden Markov Model to

determine the probability of the missing genotype given the observed genotypes and

haplotype combinations. It takes into account recurrent mutation at the SNP and potential

recombination in the region [52].

Genotype imputation for missing SNPs identifies haplotype blocks throughout the genome for

a particular ethnicity from a reference panel, which is an existing database with detailed

genotype information on a large number of markers in a few individuals (50 to several

hundred). The haplotype blocks for each individual in a study sample are then compared to

those in the reference panel. The schema in Figure 1.2 outlines this process. The top left panel

shows the loci of 15 common variants that were genotyped in the reference panel, of which 6

were genotyped in the study sample. The grey question marks indicate loci that were not

genotyped; these genotypes need to be imputed. The first stage of imputation is to phase the

haplotypes (top right panel of Figure 1.2); or in other words, to determine which haplotypes

the individual is likely to have, as described in Section 1.3.4. If the phase is ambiguous, this

stage may produce several possible haplotypes for a given individual. The second stage is to

compare the phased haplotypes for each individual to the reference panel (bottom left panel

of Figure 1.2). Finally, when a match is made to the reference panel, the remaining genotypes

in the haplotype block are imputed for the given individual (bottom right panel of Figure 1.2).

This is then repeated for all individuals in the study sample. Due to possible mutations and

recombination throughout generations, this process is often more complex than illustrated in

Figure 1.2; however, this describes the basic idea being implemented.
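
The matching and fill-in stages described above can be illustrated with a deliberately simplified sketch. It assumes that the study haplotype has already been phased and simply copies the untyped alleles from the single reference haplotype with the fewest mismatches at the typed loci. This is not the Hidden Markov Model used by the imputation programmes, which allow for recombination and mutation; the function name and the toy haplotypes are hypothetical.

```python
def impute_from_closest_reference(study_haplotype, reference_haplotypes):
    """Fill untyped loci (None) from the best-matching reference haplotype."""

    def mismatches(reference):
        # Count disagreements at loci that were genotyped in the study sample
        return sum(1 for study_allele, ref_allele
                   in zip(study_haplotype, reference)
                   if study_allele is not None and study_allele != ref_allele)

    best_match = min(reference_haplotypes, key=mismatches)
    return [ref_allele if study_allele is None else study_allele
            for study_allele, ref_allele in zip(study_haplotype, best_match)]


# Toy reference panel of three haplotypes across five loci.
reference_panel = [list("ACGTA"), list("TCGAA"), list("AGCTT")]
# Phased study haplotype typed at loci 1, 3 and 5 only.
study = ["A", None, "G", None, "A"]

imputed = impute_from_closest_reference(study, reference_panel)
print("".join(imputed))
```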

There are several different programmes available that will perform the imputation including,

but not limited to, Markov Chain Haplotyping software (MaCH) [53], IMPUTE [52], fastPHASE

[54], PLINK [55] and Beagle [56]. Each programme uses slightly different procedures to search

for shared haplotype blocks, and hence their computational efficiency differs. For example,


IMPUTE relies on recombination rates generated by the HapMap Consortium and assumes a

uniform mutation/error rate for all markers, whereas MaCH estimates recombination rates

within each dataset and allows mutation rates to vary. Li et al [53] illustrated that MaCH and

IMPUTE perform similarly, while Biernacka et al [57] and Pei et al [58] showed that these two

programmes both outperform fastPHASE, PLINK and Beagle. Both MaCH and IMPUTE are

based on a Hidden Markov Model and implement variants of the ‘product of approximate

conditionals’ model [59].

Figure 1.2: A schematic of the process by which genetic data is imputed using haplotype

inference


The success of genotype imputation is dependent on the following two steps:

Step 1: Pre-processing genotype data before imputation

Before imputing any genotype data, it is imperative that the genotyped data is cleaned;

otherwise there will be inaccuracies in the haplotype estimation. Typical quality control (QC)

measures are applied first to the individuals and then to the SNPs. Individuals are often filtered

out based on heterozygosity, call rate (both may be due to poor quality DNA), cryptic

relatedness (unknown relationship with another individual in the study sample) or population

structure (removing individuals who appear to have descended from a population of different

ethnic background than the rest of the study sample). QC on the SNPs includes call rate, Hardy-

Weinberg outliers (i.e. SNPs not in HWE) and minor allele frequency (MAF). Each study applies

thresholds that it deems appropriate; however, it is also common to use the thresholds

detailed in the original Wellcome Trust Case Control Consortium GWAS [46], whereby SNPs are

excluded if:

- The call rate is less than 95%

- The HWE P-value is less than 5.7 × 10⁻⁷

- The MAF is less than 0.01 (1%)
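
The SNP exclusion thresholds listed above can be expressed as a simple filter. This is a minimal sketch which assumes that the call rate, HWE P-value and MAF have already been computed for each SNP; the function name, record layout and SNP identifiers are hypothetical.

```python
def passes_snp_qc(call_rate, hwe_p, maf,
                  min_call_rate=0.95, min_hwe_p=5.7e-7, min_maf=0.01):
    """Return True if a SNP is retained under the listed thresholds."""
    return (call_rate >= min_call_rate
            and hwe_p >= min_hwe_p
            and maf >= min_maf)


# Hypothetical per-SNP summary statistics.
snps = [
    {"id": "snp1", "call_rate": 0.99, "hwe_p": 0.40, "maf": 0.250},
    {"id": "snp2", "call_rate": 0.90, "hwe_p": 0.40, "maf": 0.250},  # low call rate
    {"id": "snp3", "call_rate": 0.99, "hwe_p": 1e-9, "maf": 0.250},  # HWE outlier
    {"id": "snp4", "call_rate": 0.99, "hwe_p": 0.40, "maf": 0.005},  # rare variant
]

kept = [s["id"] for s in snps
        if passes_snp_qc(s["call_rate"], s["hwe_p"], s["maf"])]
print(kept)
```

The thresholds are exposed as arguments because each study may apply the values it deems appropriate.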

The final data cleaning step is to ensure that the annotation of the SNPs on the chip is the

same as in the chosen reference panel and that the SNPs are all on the same strand. It is common

to align SNPs to build 36 of the human genome and to the forward (+) strand. The alignment

step to a particular strand is crucially important to avoid imputation errors, especially for SNPs

with complementary alleles (i.e. A/T and C/G SNPs), and to allow ease of meta-analysis with other

studies.
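
The reason A/T and C/G SNPs need particular care is that their two alleles are each other's complement, so the strand cannot be inferred from the allele labels alone. A minimal sketch of this check is given below; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose alleles look the same on both strands."""
    return COMPLEMENT[allele1.upper()] == allele2.upper()


def flip_strand(allele1, allele2):
    """Report the alleles of a SNP on the opposite strand."""
    return COMPLEMENT[allele1.upper()], COMPLEMENT[allele2.upper()]


for alleles in [("A", "T"), ("C", "G"), ("A", "G")]:
    print(alleles, "ambiguous" if is_strand_ambiguous(*alleles) else "unambiguous")
```

For an unambiguous SNP such as A/G, a mismatch with the reference panel alleles (for example T/C) indicates that the SNP should be flipped; for an ambiguous SNP additional information, such as allele frequency, would be needed.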

Step 2: Choosing an appropriate reference panel

To date, the HapMap Consortium database [60] has typically been used as the reference panel.

However, greater numbers of individuals and SNPs are being genotyped in a wider range of

ethnic populations for other databases, such as the 1000 Genomes Project

(http://www.1000genomes.org/home), and these are becoming increasingly popular

reference panels.

The International HapMap project was developed to describe the common patterns of

sequence variation in the human genome; in other words, to develop a haplotype map (hence


HapMap) of the human genome. The data from Phase 2 of the project is freely available in the

public domain and includes variants at 3.1 million loci in multiple ethnic groups, including

African (YRI; samples from 30 trios from the Yoruba people of Ibadan, Nigeria), Asian (CHB; 45

samples from Beijing and JPT; 45 samples from the Tokyo area) and European (CEU; samples

from 30 trios from residents of the United States of America with northern and western

European ancestry) [61]. There are currently two larger databases than this available:

1. HapMap Phase 3 [62] with 1.6 million common SNPs in 1,184 individuals from 11

global populations: YRI, CHB, JPT and CEU samples from the previous release in

addition to samples from individuals of African ancestry in the south-western USA

[ASW], Chinese in metropolitan Denver, Colorado, USA [CHD], Gujarati Indians in

Houston, Texas, USA [GIH], Luhya in Webuye, Kenya [LWK], Maasai in Kinyawa, Kenya

[MKK], Mexican ancestry in Los Angeles, California, USA [MXL] and Tuscans in Italy

[TSI].

2. 1000 Genomes Project has pilot data of approximately 15 million SNPs in 742

individuals [63] and a main phase 1 with approximately 28 million SNPs in 1,092

individuals. Both datasets include individuals from 14 locations of five major ethnic

origins including European, East Asian, West African, Americas and South Asian.

For studies of European ancestry, in the majority of cases it is clear that the HapMap CEU

samples are an appropriate reference panel. However, for populations of mixed ancestry or

from areas not included in the HapMap Consortium, such as the Middle East, it is a little more

difficult to decide which panel to use as a reference. Several authors have suggested that

‘masking’ a set of genotypes and imputing them using the different reference panels will

identify a strategy that provides the most accurate genotype imputation [64,65]. Other

methods include using principal components analysis to determine which reference panel is

closest to the individuals in the study [66] or using a ‘cosmopolitan’ panel combining all

reference panels together [53,65,67,68]. For all reference panels, it is recommended that

release 22 is used for imputation [49], as the National Centre for Biotechnology Information

(NCBI) build 36 has become the standard for the human genome assembly.

Genotype imputation is commonly used in association analyses on the genome-wide scale as it

increases the power of a GWAS [52,53,68,69] and furthers the discovery of novel genotype-phenotype

associations through the meta-analysis of multiple studies genotyped with different


sets of genetic variants [49,70]. Commercial genotyping platforms differ in the SNPs that are

included and they are continually being updated as new SNPs are being discovered. Therefore

imputation serves as a crucial bridge when merging distinct studies genotyped on different

platforms, combining different versions of the same platform or adopting a new platform

during the course of a study. By imputing GWAS data, each study has a common set of SNPs to

contribute to a meta-analysis. Such meta-analyses are required to achieve the large sample

sizes needed to discover modest genetic effects.

Given the computational time already required to conduct the complex analyses in this thesis,

the HapMap Phase 2 CEU data will be used as a reference panel, with approximately 2.5

million SNPs. The CEU population was chosen as all of the cohorts investigated in this thesis

were of European descent and principal components analysis showed that the studies closely

clustered with the CEU population.

1.4.3 Association Analysis

Once the study dataset has been imputed, an association analysis similar to that outlined in

Section 1.3.5.2 is conducted. For each imputed SNP in a given sample, three values are

calculated:

1. Posterior probability for each of the three possible genotypes (AA, Aa and aa),

2. Allelic dosage, which is the expected number of copies of a given allele, ranging from 0

to 2,

3. ‘Best guess’ genotype, which is the genotype with the highest posterior probability.
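
The relationship between the three quantities can be shown with a short sketch. It assumes that the dosage counts copies of the a allele (which allele is counted is a convention that differs between programmes); the function name and the example probabilities are hypothetical.

```python
def dosage_and_best_guess(p_AA, p_Aa, p_aa):
    """Allelic dosage and best-guess genotype from posterior probabilities."""
    # Expected number of copies of the 'a' allele: 0*P(AA) + 1*P(Aa) + 2*P(aa)
    dosage = p_Aa + 2.0 * p_aa
    genotype_probs = [("AA", p_AA), ("Aa", p_Aa), ("aa", p_aa)]
    best_guess = max(genotype_probs, key=lambda pair: pair[1])[0]
    return dosage, best_guess


# Hypothetical posterior probabilities for one individual at one imputed SNP.
dosage, best_guess = dosage_and_best_guess(p_AA=0.05, p_Aa=0.80, p_aa=0.15)
print(f"dosage = {dosage:.2f}, best guess = {best_guess}")
```

The dosage retains the uncertainty in the imputation, whereas the best guess genotype discards it.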

The majority of association analyses use the dosages from the imputed data, which are treated

as a continuous variable in the regression model. Software has been developed specifically to

conduct large scale association analyses in a time efficient manner, such as PLINK [55], MaCH

[53,64] and SNPTEST [71]. These commonly used programmes conduct linear (for a trait

outcome) or logistic (for a disease outcome) regression models, allowing for adjustment of any

number of covariates.
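
As an illustration of how the dosage enters the model as a continuous predictor, the following sketch fits a simple linear regression of a quantitative trait on the dosage for one SNP. It assumes no covariates and unrelated individuals, and is not the implementation used by PLINK, MaCH or SNPTEST; the function name and data are hypothetical.

```python
import math


def dosage_regression(dosages, trait):
    """Least-squares slope and standard error for trait ~ dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residual_ss = sum((y - alpha - beta * x) ** 2
                      for x, y in zip(dosages, trait))
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return beta, standard_error


# Hypothetical dosages and trait values for six individuals.
dosages = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
trait = [1.0, 1.6, 2.0, 1.2, 1.4, 2.2]
beta, standard_error = dosage_regression(dosages, trait)
print(f"beta = {beta:.3f}, SE = {standard_error:.3f}")
```

The estimated slope is interpreted as the change in the trait per additional copy of the counted allele.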

One important aspect of association analysis using imputed data is to take into account the

uncertainty of each imputed SNP. There is no consensus for the most appropriate way of

accounting for the uncertainty; however, many studies simply remove SNPs that have poor

imputation quality. One measure of imputation quality is the ratio of the empirically observed


variance of the allele dosage to the expected binomial variance 2p(1 - p) at Hardy-Weinberg

Equilibrium, where p is the observed allele frequency from HapMap. This is the metric that

both MaCH (RSQR_HAT) [53] and PLINK (INFO) [55] generate. When a SNP has imputed

accurately, this ratio will be close to one, indicating that the observed variance is close to the

expected variance. However, as the observed variance decreases, this ratio tends towards

zero, indicating more uncertainty in the imputation. A commonly used threshold in meta-

analyses of common diseases/traits is to exclude SNPs with a MaCH RSQR_HAT < 0.3 [72,73].

SNPTEST calculates a similar measure that reflects the effective sample size (or power) for the

genetic effect being estimated, PROPER_INFO, with values < 0.4 commonly excluded from

meta-analyses. Barrett [74] found that different genotyping platforms have different success

rates in terms of accurately imputing additional loci; the Illumina HumanHap550-Duo chip had

better accuracy (87% of imputed loci had r² > 0.9) than the Affymetrix 500K chip (60% of

imputed loci had r² > 0.9).
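
The variance ratio described above can be sketched as follows. The sketch assumes that dosages are on the 0 to 2 scale, for which the variance under Hardy-Weinberg Equilibrium is 2p(1 − p), and that p is estimated from the mean dosage in the sample rather than taken from a reference panel. It is an illustration of the idea and not the exact calculation performed by MaCH or PLINK; the function name and data are hypothetical.

```python
def dosage_variance_ratio(dosages):
    """Ratio of the observed dosage variance to the variance expected under HWE."""
    n = len(dosages)
    mean_dosage = sum(dosages) / n
    allele_freq = mean_dosage / 2.0  # assumption: estimated from the sample
    observed_var = sum((d - mean_dosage) ** 2 for d in dosages) / n
    expected_var = 2.0 * allele_freq * (1.0 - allele_freq)
    if expected_var == 0.0:
        raise ValueError("monomorphic SNP: ratio is undefined")
    return observed_var / expected_var


# Confidently imputed SNP: dosages sit close to 0, 1 or 2.
well_imputed = dosage_variance_ratio([0.0, 1.0, 1.0, 2.0])
# Poorly imputed SNP: dosages shrink towards the population mean.
poorly_imputed = dosage_variance_ratio([0.9, 1.0, 1.0, 1.1])
print(f"well imputed: {well_imputed:.2f}, poorly imputed: {poorly_imputed:.2f}")
```

Under a threshold of 0.3, the second hypothetical SNP would be excluded.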

Another important adjustment in the association analysis is to account for any population

substructure in the study sample. Principal component analysis, which can be conducted

simply in EIGENSTRAT [39], is a powerful tool to adjust for population stratification. The

eigenvectors generated through EIGENSTRAT can be used either to exclude outliers or as

covariates in the association analysis to describe the variation along the first few

axes.

The final step of the association analysis is to check the distribution of the test statistic. This

can be done in two ways:

1. Calculating a genomic inflation factor, λ [40]. This is the ratio of the median of the

empirically observed distribution of the test statistic to the expected median. The λ

quantifies the extent of the excess false positive rate, with values close to one

indicating no inflation and values greater than one indicating increasing levels of false

positives.

2. Plotting a quantile-quantile (Q-Q) plot. This is a useful visual tool to mark deviations of

the observed distribution from the expected null distribution. It is advised that Q-Q

plots are derived for genotyped SNPs separately from imputed SNPs as they have

different distributional properties [49].
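
The genomic inflation factor can be computed with a few lines of code. This sketch assumes that each test statistic follows a chi-square distribution with one degree of freedom under the null hypothesis, whose median is approximately 0.4549; the function name and the example statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with one degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_statistics):
    """Lambda: observed median test statistic over the expected null median."""
    return statistics.median(chi2_statistics) / CHI2_1DF_MEDIAN


# Hypothetical 1-df chi-square statistics from an association analysis.
well_calibrated = genomic_inflation_factor([0.10, 0.4549364, 3.00])
inflated = genomic_inflation_factor([0.20, 0.9098728, 5.00])
print(f"lambda = {well_calibrated:.2f} (calibrated), {inflated:.2f} (inflated)")
```

Because λ is based on the median, it is not unduly influenced by a small number of truly associated SNPs in the upper tail.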


These measures of inflation may point to undetected sample duplications, unknown familial

relationships, a poorly calibrated test statistic, systematic technical bias or uncorrected

population stratification. If inflation is detected, it should be dealt with prior to presenting

results or meta-analysis with any additional studies.

1.4.4 Replication

It is important that any findings from a GWAS are confirmed in an independent study. Two

commonly used words to describe this confirmation are “replication” and “validation”, both of

which are used interchangeably in genetic association studies. Igl et al [75] acknowledge that

these are two different concepts and have defined both terms as follows:

1. “Replication: both original and confirmation sample are drawn from the same

population, and systematic differences are reduced to a minimum.”

2. “Validation: the confirmation sample stems from a population which is different than

that from which the original sample was drawn.”

Chanock et al have also defined criteria for establishing “replication” and “validation” [76].

Their specific criteria for establishing replication included 1) utilizing a similar population to the

discovery population, and 2) using the same study design and analysis techniques. In contrast

to Igl et al [75], Chanock et al believe that “validation” of a finding should be in a population

which may be from a different ethnic background, have a different phenotype definition,

recruitment or sampling strategy or the time point under investigation may be different from

the discovery population [76]. Therefore, if a genetic association discovered through a GWAS is

‘validated’, then it is more generalizable than if the association is only ‘replicated’. Most

GWASs to date have included replication of their findings; however, only a few extend their

findings to additional ethnic backgrounds or to other populations and therefore rely on future

studies to validate their findings. This is partly due to the fact that the discovery study of a

GWAS often has the aim of identifying regions of the genome that are of interest, rather than

pin-pointing the causal locus and estimating its effect [77].


1.4.5 GWASs of Longitudinal Quantitative Traits

Statistical geneticists are beginning to develop methods for analysis of longitudinal data in

genetic studies. The Genetic Analysis Workshop (GAW; http://www.gaworkshop.org/) is

an example of such an initiative. According to their website, GAW is “a collaborative effort

among researchers worldwide to evaluate and compare statistical genetic methods and

relevant current analytical problems in genetic epidemiology and statistical genetics”. Three

of their 18 workshops were dedicated to longitudinal data analysis: 1) GAW 13 focused on

longitudinal analysis for microsatellite genome-wide data [78,79], 2) GAW 16, problems 2

and 3 used real longitudinal data from the Framingham Heart Study and simulated data

respectively [80], and 3) GAW 18 looked at longitudinal data for sequencing studies, although

none of the research from GAW 18 has been published yet.

The methods developed in GAW 13 dealt with genetic linkage analysis, although they would be

applicable to genetic association studies in longitudinal family study designs [81]. The methods

that would be the most computationally efficient for genome-wide association studies used a

two-stage approach to conduct the genome-wide linkage analysis in the Framingham Heart

Study; in the first stage, the longitudinal data was reduced to an intercept and slope estimate

for each individual, which were used as outcomes for the genetic analysis [82,83,84]. In

addition to these methods from GAW 13, there were several methods described in Kerner et al

for the groups at GAW 16 that used the longitudinal data [85]. The approaches included

growth mixture modelling [86,87], linear mixed effects modelling [88], multivariate linear

growth modelling [89] and multivariate adaptive splines for the analysis of longitudinal data

[90]. Although these methods are all relevant for genetic association studies, they were only

applied to subsets of the genome-wide genetic data provided at GAW 16. The two-stage

approaches are promising as they are computationally efficient; however, they have only been

investigated using phenotypes that have a linear trajectory over time.
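
The two-stage idea can be sketched as follows. In the first stage a straight line is fitted to each individual's repeated measures by ordinary least squares; in the second stage the individual slopes are regressed on genotype. The sketch assumes a linear trajectory and unrelated individuals, and it ignores the uncertainty in the stage one estimates. It is not the implementation used by the cited GAW contributions; all names and data are hypothetical.

```python
def ols_line(x, y):
    """Intercept and slope of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_slope_effect(measurements, genotypes):
    """Stage 1: per-individual slopes. Stage 2: regress slopes on genotype."""
    ids = sorted(measurements)
    slopes = []
    for person in ids:
        ages = [age for age, _ in measurements[person]]
        values = [value for _, value in measurements[person]]
        _, slope = ols_line(ages, values)
        slopes.append(slope)
    dosages = [genotypes[person] for person in ids]
    _, genetic_effect = ols_line(dosages, slopes)
    return genetic_effect


# Hypothetical (age, trait) measurements and allele dosages for three children.
measurements = {
    "a": [(1, 16.0), (2, 16.1), (3, 16.2)],
    "b": [(1, 16.0), (2, 16.2), (3, 16.4)],
    "c": [(1, 16.0), (2, 16.3), (3, 16.6)],
}
genotypes = {"a": 0.0, "b": 1.0, "c": 2.0}

effect = two_stage_slope_effect(measurements, genotypes)
print(f"change in slope per allele = {effect:.3f}")
```

The computational appeal is that stage one is run once, after which stage two is an ordinary cross-sectional regression repeated for every SNP.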

Recently, Furlotte et al described a mixed effects model for GWAS analysis and calculated the

genetic, environmental, and residual error contributions to the phenotype [91]. Their method

(for which they have written a software package) incorporates a random effect to adjust for

any population stratification or cryptic relatedness in the sample. Their simulation study

includes an average genetic effect in their model; however, the genetic effect over time was

not investigated. Fan et al present a non-parametric mixed effects model using splines, and



show it has lower type 1 error, higher power and lower bias in comparison to a parametric

model [92]. However, they do not investigate the effect of missing data, which is common in

large, longitudinal cohort studies.

Although there have now been many GWASs of cross-sectional phenotypes in childhood and

geneticists are beginning to investigate GWASs of longitudinal traits, to my knowledge, no

GWAS of a longitudinal childhood trait has been published in the literature. In addition, the

longitudinal methods described for GWASs to date make relatively simplistic assumptions for

the trait studied, ignoring complex mechanisms such as unbiased study designs, non-linear

trajectories, high correlation between the intercept and trajectory terms and non-normal,

correlated residuals. The research in this thesis aims to address some of these current

limitations by proposing and evaluating a modelling framework for analysing complex,

longitudinal childhood phenotypes in unrelated individuals, particularly in the context of GWAS

analysis.

1.5 Obesity and Body Mass Index

Obesity is a medical condition, whereby excess body fat has accumulated to the extent that it

may have an adverse effect on health, leading to reduced life expectancy and/or increased

health problems [93]. Obesity is a major global public health problem. In 2010, there were at

least 42 million overweight children under the age of five years and one billion overweight

adults globally [94]. The World Health Organisation considers Australia to have one of the

world’s highest rates of obesity, with 25% of children aged 5-17 years and 62% of adults

classified as overweight or obese in 2007-8 [95]. Childhood obesity is associated with poor

mental [96,97,98,99] and physical health [100,101] and is one of the strongest predictors of

adult obesity [102,103]. In turn, adult obesity increases the risk of many diseases including

coronary heart disease, the metabolic syndrome, some cancers, stroke, liver and gallbladder

disease, sleep apnoea and respiratory problems, osteoarthritis and gynaecological problems

[94]. Since the 1990s, the prevalence of obesity has trebled [104] which has led to an earlier

onset of related adverse health outcomes. There is recent evidence that the rate of obesity

among Australian children is plateauing after a dramatic acceleration over the preceding

decade [105]. The observed plateau, if real, could be a result of increased physical activity and

nutrition initiatives that have been introduced into Australian schools and communities in

recent years. The plateauing in the prevalence of obesity in children has been observed in


other developed, high-income countries, such as New Zealand [106], US [107], Sweden [108]

and France [109]; however, the prevalence in the developing world is still increasing. Although

the prevalence of obesity appears to be stabilizing, the incidence of obesity is still higher than

desirable, particularly in Australia.

Obesity is a multifactorial condition with many biological, genetic, social and environmental

influences affecting its development [93]. There are monogenic (stemming from a single

dysfunctional gene, but very rare in the general population) [110] and syndromic forms of

obesity (distinguished by the co-occurrence with mental retardation, dysmorphic features or

organ specific developmental abnormalities) [111]. The focus of this thesis is on the common,

multifactorial form of obesity.

Body mass index (BMI) is the relationship between weight and height that is associated with

body fat, nutritional status and health risk. It is calculated by weight, measured in kilograms,

divided by height squared, measured in metres. In epidemiology, BMI is the most commonly

used quantitative measure of adiposity [112]. Cut-off points have been defined for both child

[113] and adult [114] obesity; they are defined by the point where an increased risk of disease

is observable due to high BMI.
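
The calculation described above can be sketched in a few lines. This is a minimal illustration only; the function name `bmi` and the example values are hypothetical and are not taken from any of the cohorts.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Illustrative values only: a child weighing 20 kg who is 1.10 m tall.
example_bmi = bmi(20.0, 1.10)
print(round(example_bmi, 2))
```

Height is passed in metres, so a height recorded in centimetres would need to be divided by 100 first.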

1.5.1 Life Course Approach to Obesity

Obesity does not develop instantaneously; it is a developmental process by which an

individual’s BMI increases over a period of time. Therefore, to understand more about the

increasing incidence of obesity, it is important to understand the developmental process that

precedes the diagnosis of the condition. By utilizing the comprehensive data collected as part

of longitudinal birth cohorts in a growth trajectory modelling framework, we can begin to

understand this developmental process.

BMI growth trajectories are difficult to model statistically due to the complexities of growth

over childhood (Figure 1.3). Children tend to have rapidly increasing BMI from birth to

approximately nine months of age when they reach their adiposity peak. BMI then tends to

decrease until about the age of five or six years at the adiposity rebound and then it steadily

increases again until just after puberty when it tends to plateau through adulthood.


Figure 1.3: Schema of BMI trajectory over childhood and adolescence. The green arrow

indicates the period around the adiposity peak (9 months of age); the red arrow indicates the

period around the adiposity rebound (5-6 years of age).
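
To make the two turning points in the schema concrete, the sketch below locates the first local maximum (the adiposity peak) and the following local minimum (the adiposity rebound) on a BMI curve evaluated over a grid of ages. It is a simple illustration and not the method used in this thesis: the function name and the BMI values are hypothetical, and in practice the search would usually be applied to a fitted trajectory rather than to raw measurements. Flat segments (equal consecutive values) are not treated as turning points.

```python
import numpy as np


def adiposity_peak_and_rebound(age, bmi):
    """Locate the first local maximum (peak) and the next local minimum (rebound).

    Returns (peak, rebound), each an (age, bmi) tuple, or None if the
    corresponding turning point does not occur on the supplied grid.
    """
    age = np.asarray(age, dtype=float)
    bmi = np.asarray(bmi, dtype=float)
    slope_sign = np.sign(np.diff(bmi))  # sign of change between grid points
    peak = rebound = None
    for i in range(1, len(slope_sign)):
        rising_then_falling = slope_sign[i - 1] > 0 and slope_sign[i] < 0
        falling_then_rising = slope_sign[i - 1] < 0 and slope_sign[i] > 0
        if peak is None and rising_then_falling:
            peak = (age[i], bmi[i])
        elif peak is not None and falling_then_rising:
            rebound = (age[i], bmi[i])
            break
    return peak, rebound


# Hypothetical curve: ages in years, BMI in kg/m^2 (illustrative values only).
ages = [0, 0.25, 0.5, 0.75, 1, 2, 3, 4, 5, 6, 7, 8, 10]
bmis = [13.5, 16.0, 17.2, 17.6, 17.3, 16.6, 16.0, 15.7, 15.5, 15.6, 15.9, 16.4, 17.5]
peak, rebound = adiposity_peak_and_rebound(ages, bmis)
print(peak, rebound)
```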

The World Health Organization recently conducted research into the statistical methods

previously used to construct growth curves over childhood. As part of their process of defining

international growth standards for pre-school children, they examined as many as 30

previously published methods [115]. Most of these methods were designed for cross-sectional

cohorts where each child is measured once between certain predefined ages. Only eight

methods allowed for the partitioning of variance into between and within subject variability

required for longitudinally designed studies, where each child is measured multiple times

across childhood.
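
The idea of partitioning variability into between-subject and within-subject parts can be illustrated with the classical sum-of-squares identity. The sketch below is only that identity; it is not one of the methods reviewed in [115], and mixed effects models estimate variance components differently. The function name and data are hypothetical.

```python
import numpy as np


def partition_sum_of_squares(subject_ids, values):
    """Split the total sum of squares into between- and within-subject parts."""
    ids = np.asarray(subject_ids)
    y = np.asarray(values, dtype=float)
    grand_mean = y.mean()
    ss_between = 0.0
    ss_within = 0.0
    for sid in np.unique(ids):
        y_i = y[ids == sid]  # repeated measures for one subject
        ss_between += len(y_i) * (y_i.mean() - grand_mean) ** 2
        ss_within += ((y_i - y_i.mean()) ** 2).sum()
    return ss_between, ss_within


# Two hypothetical children, each measured twice.
ss_b, ss_w = partition_sum_of_squares([1, 1, 2, 2], [1.0, 3.0, 5.0, 7.0])
print(ss_b, ss_w)
```

With only one measurement per child, as in a cross-sectional design, the within-subject part is zero by construction, which is why repeated measures are needed for this partition.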

Intervention programs have traditionally targeted the individual at the onset of disease-

precursors; however, the principles underlying life course epidemiology and DOHaD suggest

that there may be sensitive periods earlier in the developmental continuum, including

pregnancy, infancy and childhood, that may offer greater opportunities for obesity prevention

[116,117,118]. By extending our current statistical methodologies for growth trajectory

modelling to enable the detection of small differences between individuals due to genetic,

environmental or lifestyle determinants, we will potentially be able to identify individuals at

higher risk of developing obesity early in the process. Intervention programmes can be


developed to specifically target these individuals when the programmes are likely to be most

cost effective and beneficial. To have the greatest impact through intervention programmes,

we need to determine which indicators are associated with different patterns of growth and

then assess how those growth patterns may predict different disease risks. Several indicators

of critical periods for adiposity development have been identified, in addition to the overall

growth trajectory from infancy to adolescence. These include the age and BMI at the adiposity

peak, the age and BMI at the adiposity rebound and various markers of puberty. Current

knowledge regarding each of these indicators is outlined below.

1.5.1.1 Infancy Growth and the Adiposity Peak

Infancy is a period of rapid growth due to the underlying biological changes that are occurring

and therefore has been suggested as a sensitive period for the development of obesity

[118,119]. Two large reviews concluded that there is an association between rapid infant

weight gain and increased risk of adult obesity [120,121]. However, the effects of rapid infant

weight gain need to be reviewed with caution, as they are often difficult to untangle from

intrauterine growth and post-infancy growth [122].

There are several factors thought to influence adult obesity through weight gain during

infancy, including breastfeeding, feeding frequency and amount of physical activity.

Breastfeeding is thought to reduce the risk of obesity in adulthood, and a systematic review by

Owen and colleagues found a mean difference of 0.04 BMI units between subjects who were

breastfed versus formula-fed across a wide range of ages [123]. Of note, the difference was

halved after adjustment for maternal BMI and smoking, and the association did not remain

statistically significant after adjustment for maternal socio-economic status. Horta et al [124]

studied the association between breastfeeding and adult obesity, and found a 22% reduction

in the odds of obesity for those who were breastfed. In addition to breastfeeding, there have

been a number of studies that focused on the frequency and amount of feeding over infancy

and found that many infants are overfed [125,126]. Parenting behaviours not only influence

feeding but also the amount of physical activity undertaken by the infant. For example, a study

by Zimmerman and colleagues [127] showed that by the age of two years, 90% of children

were watching television regularly with an average exposure of 1.5 hours per day. The

behaviours of overeating and lack of physical exercise may be induced by parents in the first few years

of life; once established, these behaviours can program the infant for life.


The adiposity peak, at around nine months of age, may be a good marker of infant growth.

Silverwood et al [128] and Sovio et al [129] have published data that demonstrate that a

delayed adiposity peak is associated with increased BMI in late childhood and adulthood

respectively. However, beyond these studies, relatively little is known about the association of

the timing of the adiposity peak with disease later in life.

1.5.1.2 Adiposity Rebound

Research has identified the adiposity rebound as a sensitive period for the development of

adiposity that persists into adult life [118,119,130,131,132,133]. The adiposity rebound is a

period of rapid growth in body fat, due to the increase in both size and number of adipocytes

(cells that specialize in storing energy as fat) [131]. It has been shown that an early age of

adiposity rebound is not only associated with adult obesity but also with a greater risk of

diabetes [134,135] and hypertension [136].

There is a growing body of literature showing that the timing of the adiposity rebound,

specifically an early rebound, is associated with increased risk for later obesity

[129,131,132,133,137,138]. Rolland-Cachera and Péneau [137] illustrate that although an early

adiposity rebound is associated with obesity in later life, it is not associated with obesity in

early life. It is therefore thought that this may help to distinguish between two growth

patterns: 1) those individuals who have a high BMI at all ages, which reflects a high lean and

fat body mass, versus 2) those individuals who have a normal BMI followed by an early

adiposity rebound and consequently a higher BMI, reflecting increased fat rather than lean

mass [132,139,140]. Individuals following the first pattern of ‘consistently high BMI’ appear to

have relatively normal metabolic profiles, whereas individuals following the second pattern

seem to be at higher risk for coronary heart disease and insulin resistance [137]. In addition to

increased risk of adult obesity, Williams and Goulding [140] identified that the timing of the

adiposity rebound was also associated with both skeletal and physical maturation. It has also

been shown that BMI at the time of the adiposity rebound positively correlates with BMI in

adulthood [129].

Several factors have been reported to be associated with an early age of adiposity rebound

and a high BMI at the time of the rebound. These factors include parental obesity [141], low

levels of activity and high levels of television viewing [142], and low weight gain in the first year of


life and therefore low weight at year one [135]. Consistent with the increasing prevalence of

obesity, the age of adiposity rebound is decreasing over time [143].

As promising as the adiposity rebound may sound as a marker of later obesity, it is not without

its critics. Cole [144] uses five hypothetical individuals to show that the timing of the adiposity

rebound is determined by the individual’s BMI centile and their rate of centile crossing. He goes

on to show that a high centile and upward centile crossing are independently associated with

an early adiposity rebound and hence an increase in later obesity risk. As an example of the

“horse racing effect” [145], Cole provides evidence that centile crossing at any age is an

indicator of obesity risk, rather than the timing of the adiposity rebound.

In summary, like most complex traits, the adiposity rebound appears to be influenced by both

genetic and behavioural factors. It is therefore a useful marker when investigating the early life

determinants of obesity.

1.5.1.3 Puberty

The timing of menarche in girls has been shown to be associated with obesity in adulthood

[146,147,148]. Further, being overweight during childhood is also associated with early

menarche [149,150,151]. Taken together these data have generated great debate about the

direction and mechanism underpinning the association. Mumby et al [152] have recently used

a Mendelian randomization study to try to untangle the causal relationship and found that

childhood obesity was causing early menarche.

In males, the height growth spurt is often used as a marker of puberty. Peak height velocity, a

measure of the growth spurt, has been shown to be associated with the rate of genital

development in boys [153]. It has also been shown to be highly correlated (ρ=0.84) with the

age of menarche in girls, with estimates of menarche onset occurring about one year after the

onset of the growth spurt [154,155]. It is therefore common to use the growth spurt as a

marker of pubertal status, in both males and females.

Both age of menarche and the pubertal growth spurt are influenced by genetics, as well as

biological and environmental factors [156,157]. Rapid weight gain during adolescence is

expected due to the normal physiologic changes that are occurring during this period. There


are three key hormones which play an important role in the timing of puberty that are also

related to obesity risk: leptin, insulin and estrogen [158]. Levels of all three

hormones need to increase before the onset of puberty, accompanied by a relative

increase in fat mass. However, in overweight or obese adolescents, these

hormones are already elevated, which leads to the earlier onset of puberty.

While these biological changes are occurring, adolescents are also changing their social

behaviours, often to assert their independence. For some individuals, these can include more

time spent in sedentary activities and less physical activity [159], developing a ‘calorie-rich,

nutrient poor’ diet [160], unhealthy eating behaviours such as skipping meals [161], binge-

eating or using laxatives/diuretics [162] and increased levels of stress leading to depression or

anxiety [163,164]. As outlined, puberty is another sensitive period where the development of

obesity may begin or continue to progress due to the complex interactions between genetics,

the environment and the biological process underlying the period. Therefore the presence of

high levels of pubertal hormones and the development of negative social behaviours may

contribute to increased obesity in adolescents.

1.5.2 Genetics of BMI

Heritability is the proportion of variability in a quantitative phenotype that is due to genetic

factors. It is a commonly used measure to identify whether a trait has a genetic component

and therefore should be used in genetic studies. Twin and family studies have indicated that

the heritability of obesity and body weight/BMI is estimated to be between 40 and 80%

[165,166,167]; however, heritability estimates appear to be age dependent, with higher

estimates in younger individuals [168]. A recent meta-analysis of 88 twin

studies showed that heritability of BMI in children was on average 0.07 higher than in adults

[169]. Until recently, the genes identified to regulate weight have largely been rare mutations

that cause severe monogenic forms of obesity [170]. The latest human obesity gene map

published in 2005 noted that 127 candidate genes had been identified through association

analyses for obesity related phenotypes; however, only 22 of those had been replicated in

more than five studies [171]. This, along with several other review articles [172], indicates that

although linkage and candidate gene association studies have identified a few genes that are

implicated in obesity risk or increased BMI, we were still a long way from uncovering the

genetic profile of an obese individual.
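
The definition of heritability given at the start of this section can be written as a ratio of variances. This is a sketch of the usual variance-ratio form; the symbols are introduced here for illustration, and the additive split of the phenotypic variance assumes no covariance or interaction between genetic and non-genetic factors, which is an assumption not stated in the text above.

```latex
h^2 = \frac{\sigma^2_G}{\sigma^2_P}, \qquad \sigma^2_P = \sigma^2_G + \sigma^2_E
```

Here $\sigma^2_G$ denotes the variance in the phenotype attributable to genetic factors, $\sigma^2_P$ the total phenotypic variance and $\sigma^2_E$ the remaining (non-genetic) variance. In this notation, the reported range of 40 to 80% corresponds to $h^2$ between 0.4 and 0.8.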


Since the advent of GWASs, common variants within several genes have been found to be

associated with adult obesity and population variation in BMI. These include the well-

replicated fat-mass and obesity associated (FTO) and the melanocortin 4 receptor (MC4R)

genes. Common variants within these two genes are associated with modest effects on BMI

(0.2-0.4 kg/m² per allele), which translates to odds ratios for obesity of 1.1-1.3 in adults

[173,174,175,176,177]. In 2009, two genome-wide studies on BMI and measures of obesity

were published by large consortia which discovered further genetic variants that were

associated with BMI and risk of obesity [178,179]. An additional 10 genomic regions have been

identified through these analyses; however they still require further consistent replication in

alternate cohorts. These genes are spread across the genome and include TMEM18, GNPDA2,

SH2B1, MTCH2, KCTD15, NEGR1, SEC16B, and near SFRS10, LGR4 and BCDIN3D. There are

other studies that replicate these findings in additional genome-wide analyses [180,181,182].

The largest genome-wide meta-analysis of BMI published to date included 249,796 individuals

from the Genetic Investigation of Anthropometric Traits (GIANT) Consortium, which confirmed

14 previously-reported loci and identified 18 novel loci for BMI [72].

Many of the genes that have been identified to date are not only involved in the regulation of

body weight but also in determining an individual’s response to environmental factors, such as

diet and exercise. Of the common variants identified, many act in the central nervous system

and are believed to influence eating behaviour and feeding regulation, which in turn affects

the amount of fat within the body [178,179]. This highlights the fact that many of these genes

are influencing the regulation of food intake rather than controlling metabolism as first

thought. However, the genes identified, including both common and rare variants, only

account for a small fraction of the population variability in BMI and weight, leaving much of

the estimated heritability unexplained. Although at least 32 loci have been replicated to date,

these regions only account for approximately 1.45% of the variability in adult BMI [72]. In

order to uncover more of the heritability of obesity, current statistical methodology will need

to be extended to enable the discovery of additional genetic factors, including investigating

growth rates over childhood (rather than cross-sectional population variation in size at a

particular age, often in adulthood).

To date, there has only been one study investigating the association between genetic variants

and childhood obesity on a genome-wide scale [183] and none focusing on BMI as a


continuous trait. The childhood obesity study identified two new genetic regions, one near the

olfactomedin 4 (OLFM4) gene and the other in the homeobox B5 (HOXB5) gene, which were

shown to have the same direction of effect as the meta-analyses of adult BMI. Given the

heritability estimates for BMI are higher in children [168], childhood is an important period for

genetic association analyses with the potential of uncovering additional genetic loci of interest.

1.6 Birth Cohorts used in this Thesis

Data from three cohorts are used to investigate the genes associated with childhood growth

trajectories in this thesis. The three cohorts were from Western Australia, Bristol (United

Kingdom) and Northern Finland; each of the cohorts is described in detail below. Subsets of

the cohorts are used for different chapters of this thesis; these subsets are defined in the

relevant chapters.

1.6.1 The Western Australian Pregnancy Cohort (Raine) Study

1.6.1.1 Subjects

The Western Australian Pregnancy Cohort [184,185,186] (http://www.rainestudy.org.au/) is a

prospective pregnancy cohort where 2,900 mothers were recruited prior to 18-weeks’

gestation between 1989 and 1991. The study began as a randomized controlled trial to

investigate the effects of multiple ultrasounds on birth outcomes and foetal growth. Although

differences due to increased exposure to ultrasound during pregnancy were detected in foetal

growth [184], these differences did not influence postnatal growth. After birth, the

longitudinal aspect of the Study aimed to focus on the DOHaD hypothesis, where data was

collected to gain a better understanding of how events during pregnancy, infancy and

childhood affect later health and development. The Study is now referred to as ‘The Raine

Study’, as an acknowledgement of the generous funding from the Raine Medical Research

Foundation to create and continuously support the study. The Raine Medical Research

Foundation was set up by The University of Western Australia when Mrs Mary Raine, a

prominent figure in Perth in the early part of the twentieth century, bequeathed her property empire to

the University in 1957 to fund medical research into the early origins of health and disease. Of

the 2,969 potential births (some of which were multiple births) from the 2,900 mothers, 14

withdrew from the study at birth, 19 had miscarriages (nine in first trimester, 10 in second

trimester), 10 had in-utero growth restriction, 23 were lost at delivery, two were

lost-to-follow-up, 26 were stillbirths and seven pregnancies were terminated, leaving 2,868 children


for follow-up. Recruitment predominantly took place at Western Australia’s major perinatal

centre, King Edward Memorial Hospital, and nearby private practices. Participants have been

followed for the past 23 years, with physical examinations and questionnaires collected at

average ages of 1, 2, 3, 6, 8, 10, 14, 17, 18 and most recently 20 years. Figure 1.4 illustrates the

range of variables that were collected including anthropometric measurements, medical

histories, socio-demographic indicators, health, lifestyle and environmental outcomes.

The study was conducted with appropriate institutional ethics approval from the King Edward

Memorial Hospital and Princess Margaret Hospital for Children ethics boards, and written

informed consent was obtained from all mothers. The cohort has been shown to be

representative of the population presenting to the antenatal tertiary referral centre in

Western Australia [186].

Figure 1.4: The Raine Study schedule of assessments and broad measurements collected.

1.6.1.2 Measurements

The measurements used throughout this thesis are described below. Additional explanatory

variables are described in the relevant chapters.

Birth length was measured between 24 and 72 hours post birth using a Harpenden

Neonatometer to the nearest 0.1cm. Birth weight was measured in the hospital at birth.

Gestational age was based on the date of the last menstrual period unless there was

discordance with ultrasound biometry at the dating scan by greater than seven days; if there

was a discordance the gestational age was based on the dating scan [186].
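
The rule for assigning gestational age described above can be expressed as a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, both estimates are taken to be in days, and falling back to the last menstrual period estimate when no dating scan is available is an assumption that the text does not address.

```python
def gestational_age_days(lmp_estimate_days, scan_estimate_days=None,
                         max_discordance_days=7):
    """Choose between the LMP-based and dating-scan-based gestational age.

    The LMP estimate is used unless it differs from the dating scan estimate
    by more than max_discordance_days, in which case the scan estimate is used.
    """
    if scan_estimate_days is None:
        # Assumption for illustration: no scan available, keep the LMP estimate.
        return lmp_estimate_days
    if abs(lmp_estimate_days - scan_estimate_days) > max_discordance_days:
        return scan_estimate_days
    return lmp_estimate_days


# Illustrative values only.
print(gestational_age_days(280, 275))  # within seven days: LMP estimate kept
print(gestational_age_days(280, 270))  # more than seven days: scan estimate used
```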


Weight and height were measured at each follow-up by trained members of the research team

[187]; weight was measured using a Wedderburn Digital Chair Scale to the nearest 100 grams

with children dressed in their underclothes and height was measured to the nearest 0.1cm

with a Holtain Stadiometer. BMI was calculated from the weight and height measurements.

1.6.1.3 Genotyping

A DNA sample was collected at the 14 and 17 year follow-ups. The genome-wide data was

genotyped in two separate batches using the Illumina Human660W Quad Array at the Centre

for Applied Genomics, Toronto, Ontario, Canada. The first batch of genotyping was completed

on 1,259 Raine Study children (including 63 replicates and a plate control on each plate) and

the second on 334 children (including 18 replicates and a plate control on each plate). The

Illumina Human660W Quad Array includes 657,366 genetic variants including approximately

560,000 SNPs and approximately 95,000 CNVs.

QC checks were performed for individuals and SNPs using PLINK [55]. The initial data cleaning

step was to ensure that there were no ‘batch effects’ between the two rounds of genotyping.

No clear difference was detected between the two batches of genotyping so the participants in

each batch were merged together for QC and imputation. Replicated samples with the lower

genotyping success rate and plate controls were excluded. Individuals were removed if they

had a gender mismatch between reported gender and that determined on the basis of X

chromosome data (N = 7), had a genotyping success rate lower than 97% (N = 16), had a low

level of heterozygosity (i.e. h<0.3; N = 4) or were related to other individuals at the level of

half-siblings or first cousins by IBD sharing (i.e. π > 0.1875; N = 68). In total, 1,494 individuals

passed QC and were available for analysis. SNPs were excluded if they deviated from HWE (i.e.

HWE P-Value < 5.7×10⁻⁷; 919 markers), their genotype call rate was less than 95% (97,718

markers; includes all the CNVs as they are not called with the SNP data) or their MAF was less

than 1% (119,246 markers). A total of 535,632 SNPs passed QC checks and were available for

analysis.
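
The SNP-level exclusion criteria described above can be illustrated with a short sketch. It is not the PLINK implementation used for the Raine Study: the function name and toy data are hypothetical, genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with missing calls stored as NaN, and the HWE P-values are assumed to have been computed elsewhere.

```python
import numpy as np


def snp_qc_mask(genotypes, hwe_p, min_call_rate=0.95, min_maf=0.01,
                min_hwe_p=5.7e-7):
    """Return a boolean array that is True for SNPs passing all three filters.

    genotypes: individuals x SNPs array of allele counts (0/1/2), NaN = missing.
    hwe_p:     one HWE test P-value per SNP (supplied by the caller).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    call_rate = called.mean(axis=0)
    n_called = called.sum(axis=0)
    # Allele frequency among called genotypes; guard against division by zero.
    allele_freq = np.nansum(g, axis=0) / (2 * np.maximum(n_called, 1))
    maf = np.minimum(allele_freq, 1 - allele_freq)
    hwe_ok = np.asarray(hwe_p, dtype=float) >= min_hwe_p
    return (call_rate >= min_call_rate) & (maf >= min_maf) & hwe_ok


# Toy data: 20 individuals, 4 SNPs (illustrative values only).
toy = np.zeros((20, 4))
toy[:10, 0] = 1           # SNP 0: passes all filters
toy[:10, 1] = 1
toy[18:, 1] = np.nan      # SNP 1: call rate 0.90, fails
                          # SNP 2: monomorphic, MAF 0, fails
toy[:10, 3] = 1           # SNP 3: fails on the HWE P-value below
toy_hwe_p = [0.5, 0.5, 1.0, 1e-8]
print(snp_qc_mask(toy, toy_hwe_p))
```

A SNP can fail more than one filter, so the per-filter counts of excluded markers need not sum to the total number removed.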

Imputation of un-typed or missing genotypes was performed using MaCH v1.0.16 [53,64] for

the 22 autosomes with the CEU samples from HapMap Phase 2 (Build 36, release 22) used as a

reference panel. After imputation, 2,543,887 SNPs (535,632 genotyped and 2,008,255

imputed) were available for analysis.


There is some population stratification detected between the Raine Study individuals due to

the sampling criteria for genotyping of at least one parent of European descent, so a principal

components analysis was carried out in EIGENSTRAT [39] using a subset of 42,888 SNPs that

are not in LD with each other. Figure 1.5 and Figure 1.6 display the first two principal

components for the 1,494 individuals. These principal components were included in the

genetic association analysis throughout this thesis and will be discussed in further detail in the

following chapters.
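
The general idea of deriving principal components from genotype data can be sketched as follows. This is a generic illustration using a singular value decomposition and is not the EIGENSTRAT implementation: the function name is hypothetical, genotypes are assumed to be complete allele counts (0, 1 or 2), the scaling by the binomial standard deviation is one common choice rather than a documented setting, and the data are simulated.

```python
import numpy as np


def genotype_principal_components(genotypes, n_components=2):
    """Return the leading principal component scores for each individual.

    genotypes: individuals x SNPs array of allele counts (0/1/2), no missing data.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0              # allele frequency per SNP
    keep = (p > 0) & (p < 1)              # drop monomorphic SNPs
    # Centre each SNP and scale by its binomial standard deviation.
    z = (g[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated example: two groups of 30 individuals with different allele
# frequencies at 200 SNPs (values chosen only to make the structure obvious).
rng = np.random.default_rng(0)
group_a = rng.binomial(2, np.full(200, 0.1), size=(30, 200))
group_b = rng.binomial(2, np.full(200, 0.9), size=(30, 200))
pcs = genotype_principal_components(np.vstack([group_a, group_b]))
print(pcs.shape)
```

In an association analysis, component scores of this kind would be included as covariates so that allele frequency differences between ancestral groups are not mistaken for genetic effects on the trait.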

Figure 1.5: Principal components for population stratification in the Raine Study with the

HapMap populations superimposed, showing that the Raine Study individuals are

predominantly of European descent.


Figure 1.6: Principal components for population stratification for the 1,494 participants with

genome-wide data in the Raine Study

1.6.2 Avon Longitudinal Study of Parents and Children (ALSPAC)

1.6.2.1 Subjects

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective cohort study

from Bristol in the United Kingdom (UK) [188] (www.bristol.ac.uk/alspac). The study is known

to its participants as the “Children of the 90’s” study. Pregnant women resident in one of three

Bristol-based health districts with an expected delivery date between 1 April 1991 and 31

December 1992 were invited to participate. ALSPAC was part of the European Longitudinal

Study of Pregnancy and Childhood, which had initial seed funding from the World Health

Organisation Europe to pilot common methodology and questionnaires in the UK, Russia and

Greece. Subsequently, funding for ALSPAC was obtained from various other sources. From

birth to five years, information on the children was extracted from health visitor records.

These records form part of standard child care in the UK and there are up to four

measurements taken on average at six weeks, 10, 21, and 48 months of age. A random 10% of

the children were recruited into a “Children in Focus” subset of the cohort which involved

eight research clinic visits, held between the ages of four months and five years of age. At age

seven, additional eligible children in the Avon district were invited to participate, and all

ALSPAC children were subsequently followed up with annual research clinic visits that are

ongoing. Questionnaires were completed by the parents throughout infancy, childhood and

adolescence, gathering information on the child and both parents.

Ethical approval for the study was obtained from the ALSPAC Law and Ethics Committee and

the Local Research Ethics Committees.


1.6.2.2 Measurements

The measurements used throughout this thesis are described below. Additional explanatory

variables are described in the relevant chapters.

Birth length (crown-heel) was measured by ALSPAC staff who visited new-borns soon after

birth (median one day, range 1-14 days), using a Harpenden Neonatometer (Holtain Ltd). Birth

weight was extracted from medical records. Gestational age was obtained from obstetric

medical records, as recorded by health care professionals, who used data from the woman’s

reported last menstrual period, paediatric assessment at birth, obstetric assessment during the

antenatal period and ultrasound assessment. At the time this cohort was established, routine

early pregnancy dating scans were not conducted and it is likely that only a minority had

gestational age determined by ultrasound scan. From birth to five years, length and weight

measurements were extracted from the four health visitor records. For the “Children in Focus”

subset, length/height measurements are available from the research clinic visits. At these

clinics, crown-heel length for children aged 4 to 25 months was measured using a Harpenden

Neonatometer and from 25 months onwards standing height was measured using a Leicester

Height Measure; weight was measured using a Fereday 100kg combined scale (four month

clinic), Soehnle scale or Seca scale model 724 (eight month clinic), Seca 724 or Seca 835 (12

month clinic), Seca 835 (18 months onwards). From age seven years upwards, all children were

invited to annual clinics, at which standing height was measured to the last complete

millimetre using the Harpenden Stadiometer and weight was measured to the nearest 0.1kg

using the Tanita Body Fat Analyser (Model TBF 305) [189]. In addition, parent-reported child

height and weight were also available from the questionnaires (27% of measures). BMI was

calculated from the weight and height measurements.

As outlined above, the growth data was collected using three measurement sources in

ALSPAC: clinic visits, measurements made during routine health care visits, and parental


reports in questionnaires. Whilst the measurements from routine health care visits have

previously been shown to be accurate in this cohort [190], parental report of children’s height

tends to be overestimated while weight tends to be underestimated [191]. Therefore, the

variability of BMI is greater in the questionnaire measures, which potentially has implications

for the genetic association analysis; this will be investigated in detail in Chapter Four of this

thesis.

1.6.2.3 Genotyping

ALSPAC children were genotyped using the Illumina HumanHap550 quad genome-wide SNP

genotyping platform by 23andMe subcontracting the Wellcome Trust Sanger Institute,

Cambridge, UK and the Laboratory Corporation of America, Burlington, North Carolina, United

States of America.

Standard QC methods were performed in each sample separately; similar to the Raine Study

QC, SNPs were removed if the MAF was < 1%, the call rate was < 95% or the P-Value from an

exact test of HWE was < 5.7×10⁻⁷ [192,193]. Individual samples were excluded on the

basis of incorrect gender assignment, minimal or excessive heterozygosity, high levels of

missingness and cryptic relatedness (16% of genotyped individuals). Genotypic data were

subsequently imputed using MaCH v1.0.16 [64] for all 22 autosomes with the CEU samples

from HapMap Phase 2 (Build 36, release 22) used as a reference panel.

No substantial population stratification was detected in ALSPAC based on the principal

components generated in the EIGENSTRAT software [39].

1.6.3 The Northern Finland Birth Cohort of 1966 (NFBC66)

1.6.3.1 Subjects

The Northern Finland Birth Cohort of 1966 is a prospective birth cohort from the region

covering the Provinces of Lapland and Oulu in Finland [194] (http://kelo.oulu.fi/NFBC/).

Mothers were invited to participate in the cohort if they had estimated delivery dates falling

between January 1st and December 31st 1966 and were followed from the 24th week of

gestation. The cohort has changed names over the years from the “North Finland premature

birth study” and “Development study of children in North Finland” to “The mother-child cohort

study of morbidity and mortality during childhood with the special purpose of preventing


mental and physical handicap” and “Cohort-66 Study”. The study included 12,231 live born and

stillborn infants with birth weight of 600 grams or more. Data collection began during

pregnancy, with additional data collected at birth, 0-1 years, 14 and 31 years in dedicated

research clinics. Register data on morbidity, mortality and socioeconomic factors are collected

from hospital records and official registers over the life course.

Informed consent for the use of the data including DNA was obtained from all subjects. The

study was approved by ethics committees in Oulu (Finland) and Oxford (UK) universities in

accordance with the Declaration of Helsinki.


Height and weight measures were collected from communal child health clinics as part of

routine clinical care in Finland. Standardised national maternity and child health care systems

have been operating in Finland since the 1940s; therefore staff were trained to record birth

and later growth measurements with great accuracy. In infancy, weight was measured to the

nearest 10 grams and in childhood to the nearest 100 grams with the child dressed in

underclothes. Height was measured to the nearest millimetre using standard procedures.

1.6.3.3 Genotyping

Genome-wide data was genotyped using the Illumina HumanCNV-370DUO Analysis BeadChip

at the Broad Institute Biological Sample Repository (BSP), Boston, Massachusetts, United

States of America.

Standard QC methods were performed; SNPs were removed if the MAF was < 1%, the call rate

was < 95% or the HWE P-Value was < 1×10⁻⁴ [73]. Individual samples were excluded if they had

>5% genotype data missing, incorrect gender assignment, or were related to other individuals

at the level of half-siblings by IBD sharing (i.e. π > 0.1875; one with greater missing data of

every pair removed). Genotypic data were subsequently imputed using IMPUTE software

[52,195] for all 22 autosomes with the CEU samples from HapMap Phase2 (Build 36,

release 22) used as a reference panel.
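
The SNP-level filters described above can be sketched as a simple predicate. This is a minimal illustration rather than the pipeline actually used by the cohorts: the `SnpStats` container, the function name and the toy SNPs are all hypothetical, and only the three thresholds (MAF, call rate and HWE P-Value) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    name: str
    maf: float        # minor allele frequency
    call_rate: float  # proportion of samples with a called genotype
    hwe_p: float      # P-Value from a Hardy-Weinberg equilibrium test


def passes_qc(snp, min_maf=0.01, min_call_rate=0.95, min_hwe_p=1e-4):
    """Keep a SNP only if it clears all three thresholds quoted in the text."""
    return (snp.maf >= min_maf
            and snp.call_rate >= min_call_rate
            and snp.hwe_p >= min_hwe_p)


# Toy SNPs, invented purely to exercise each filter once.
snps = [
    SnpStats("snp_a", maf=0.20, call_rate=0.99, hwe_p=0.50),   # passes
    SnpStats("snp_b", maf=0.005, call_rate=0.99, hwe_p=0.50),  # fails MAF
    SnpStats("snp_c", maf=0.20, call_rate=0.90, hwe_p=0.50),   # fails call rate
    SnpStats("snp_d", maf=0.20, call_rate=0.99, hwe_p=1e-6),   # fails HWE
]
kept = [s.name for s in snps if passes_qc(s)]
```

The default HWE threshold is the NFBC66 value; passing `min_hwe_p=5.7e-7` would apply the more lenient ALSPAC threshold instead.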

There is some population stratification detected between the NFBC66 individuals due to the

different linguistic/geographical groups of participants [196]. The population structure was


assessed using classical multidimensional scaling (MDS) on the matrix of identity-by-state of all

pairs of individuals in the program PLINK [55]. Similar to the Raine Study, variables from the

MDS analysis were included in the genetic association analysis; details are described in

subsequent chapters.
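
For readers unfamiliar with classical MDS, the following sketch shows the underlying computation on a toy identity-by-state (IBS) matrix. It is an illustration under stated assumptions, not a reproduction of the PLINK implementation: the conversion of IBS to a distance as 1 - IBS, the function name and the four-person matrix are all assumptions made here.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering        # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(gram)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]  # indices of the largest ones
    top_val = np.clip(eigval[order], 0.0, None)      # guard against tiny negatives
    return eigvec[:, order] * np.sqrt(top_val)


# Toy IBS matrix: individuals 0-1 and 2-3 form two tight groups.
ibs = np.array([
    [1.00, 0.95, 0.70, 0.70],
    [0.95, 1.00, 0.70, 0.70],
    [0.70, 0.70, 1.00, 0.95],
    [0.70, 0.70, 0.95, 1.00],
])
coords = classical_mds(1.0 - ibs, n_components=2)
```

The first column of `coords` separates the two groups; coordinates of this kind are what would be carried forward as covariates in an association model.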

1.7 Aims

This thesis investigates the association between BMI growth trajectories across

childhood/adolescence and genetic variants on a genome-wide scale. To accurately perform

this analysis, the definition of appropriate statistical models to detect small genetic effects is

required. The aims of this thesis are:

1. To develop appropriate longitudinal statistical models for BMI growth trajectories

throughout childhood using the Western Australian Pregnancy Cohort (Raine) Study.

The potential strengths and weaknesses of each model for genetic association studies

will then be explored to identify the most appropriate model for both candidate gene

and GWAS of longitudinal BMI.

2. To investigate whether a two-step approach is a valid option for the GWAS, whereby

BMI trajectories are first modelled and the random effects for each individual are

extracted for the genetic analysis.

3. To ensure the models developed in aim one extend to additional birth cohorts of

European descent. Any model misspecifications will be tested in an extensive

simulation study to ensure they will have limited impact on a large scale genetic

association study. Appropriate tests will be employed and evaluated for whether they

reduce the impact of the model misspecifications.

4. To conduct a GWAS to identify the underlying genetic determinants of BMI trajectory

in early life across the three birth cohorts as proof-of-principle for the statistical

models developed.

5. To investigate how all known genetic variants associated with adult BMI influence

growth over childhood and adolescence (including BMI, height and weight) and related

growth parameters (including age and BMI at both the adiposity peak and rebound).


1.8 Outline of Thesis

Each chapter in this thesis outlines a particular research project and is an extended form of the

publications that were written for the project. Therefore, although the chapters form a logical

order, there may be some overlap between them in terms of analyses conducted and

background to methods discussed.

Chapter 2 contains a literature review of the statistical models that have previously been used

for BMI trajectory analysis throughout childhood, and describes the Raine Study BMI data in

more detail. The most appropriate methods for genetic association studies are identified and

the fit of each method to the Raine Study data from one to 17 years is described. Previously

reported adult BMI loci are incorporated into each model and the methods are compared to

determine the most appropriate modelling framework for detecting genetic effects of BMI

growth across childhood and adolescence. The chapter concludes with a summary and

recommendation detailing which modelling framework is considered the most appropriate for

future, and potentially larger scale, genetic and epidemiological studies of BMI growth.

Various methods have been suggested for conducting longitudinal GWASs, one of which is a

two-step approach whereby one models the phenotype of interest and extracts summary

measures to conduct the genetic analysis. Chapter 3 explores the application of the two-step

approach to the complex phenotype of BMI over childhood and explains why this approach is

not appropriate for this phenotype.

In Chapter 4, the recommended model from Chapter 2 is applied to ALSPAC data. A simulation

study is described and carried out to determine how model misspecifications may influence

the genome-wide results. Analysis of the genetic variants on chromosome 16 in the ALSPAC

data is also conducted to ensure the real data shows the same results as the simulation study.

Conclusions are made regarding the appropriate methods to utilise to perform GWAS of

longitudinal BMI data from the ALSPAC cohort.

The genome-wide association study for BMI trajectory over childhood is outlined in Chapter 5.

This includes the methodology used, the results found and the suggested future direction for

research involving more detailed analyses that include additional birth cohort studies.


Chapter 6 uses an alternate approach to identify genes associated with BMI trajectories in

early life – it begins by identifying gene loci that are associated with adult BMI (that have been

replicated in independent populations) and explores how SNPs in these loci can affect different

growth patterns over childhood by affecting either weight, height and/or BMI growth. Details

regarding the specific methods and data used are described in detail in the chapter.

The thesis is concluded in Chapter 7 with a summary of the key findings and their implication

in future genetic association studies of childhood growth and more broadly for longitudinal

phenotypes.


Chapter 2: Longitudinal Statistical Models For Body Mass Index Growth Trajectories Throughout Childhood Using The Western Australian Pregnancy Cohort (Raine) Study

2.1 Introduction

The results presented in this chapter have been published [197]; the manuscript is included as

an appendix (Appendix A).

Before investigating the genetic basis underlying childhood growth trajectories, it is important

to assess the statistical properties of various modelling approaches that are able to capture all

the intricate details of the trajectory as described in Chapter 1. This chapter outlines the

process by which the most efficient model for detecting genetic effects for childhood BMI

growth trajectories was chosen.

2.2 Background

Obesity is a major global public health problem. The World Health Organisation estimated that

in 2010 there were at least 42 million overweight children under the age of five years and one

billion overweight adults globally [94]. An individual’s susceptibility to obesity is thought to

result from a combination of their genetics, behaviours and environment. The heritability of

obesity is estimated from family and twin studies to be between 40 and 80% [165,166,167]

which appears to be age dependent, with younger individuals having higher heritability

estimates [168]. Genetic factors have an important role in childhood obesity, but their role

may be different to those that operate in adulthood. Since the advent of GWASs, common

variants within 35 genes have been discovered to be associated with adult obesity

[198,199,200,201,202]. A further 48 genes have been associated with population variation in body mass

index (BMI) and weight [72,174,175,178,179,180,182] in individuals of European descent. To

date, not all of these genes have been validated. There have been no studies to date

investigating the association between BMI in childhood and genetic variants in a GWAS.

Relatively few studies have investigated the relationship between known adult BMI associated

variants and childhood BMI [178,203,204,205,206]. Zhao et al [203] investigated the

association between childhood BMI and 13 genomic loci reported to be associated with adult

obesity, and found that nine of the loci contributed to paediatric BMI between birth and 18

years of age. Subsequently, several authors have investigated the association between adult

BMI loci and changes in growth over childhood. Hardy and colleagues [205] took variants from

the two most commonly reported obesity genes, FTO and MC4R, to see whether they were

associated with life course body size. They found the association with BMI in both genes

strengthened during childhood up until 20 years of age before weakening throughout

adulthood. In 2010, Elks et al [206] used eight variants that showed individual associations

with childhood BMI to create an obesity-risk-allele-score. This allele-score was strongly

associated with early infant weight gain but also with weight gain throughout childhood. Den

Hoed et al [204] looked at BMI in childhood and adolescence against a larger subset of

replicated SNPs representing the 16 BMI loci from the six GWASs in adults of white European

descent [174,175,178,179,207,208]. They found that the cumulative effect of all 16 variants on

BMI in childhood was similar to that in adulthood; however the association with some variants

differed by age. Finally, Belsky et al [209] investigated the largest number of adult BMI

associated SNPs to determine their influence on the development of obesity. They concluded that

individuals with more risk alleles at the 32 loci had an increased likelihood of being obese in

adulthood, and that this genetic risk manifested as rapid early childhood growth. Together,

these studies begin to provide evidence that genetic loci associated with BMI in adulthood may

start to have an effect on BMI in childhood and even infancy.

Every disease, including obesity, develops over a period of time, and hence investigating the

genetic determinants of this developmental process may provide insights into the mechanisms

of the genetic associations. Sophisticated longitudinal analyses allow hypotheses to be tested

that cannot be determined from single time point analyses. These hypotheses include

assessing the patterns and duration of genetic effects over a given time period and the

differences in means and rates of change of a trait. It is therefore important to investigate the

genetic component of BMI trajectory in order to better understand some of the underlying


biology of growth. The analysis of longitudinal growth curves allows one to identify specific

time periods in which genes play a central role.

A child’s growth profile contains important information regarding their genetic make-up and

environmental exposures. However, BMI trajectories are difficult to model due to the

complexities of growth over childhood; children tend to have rapidly increasing BMI from birth

to approximately nine months of age where they reach their adiposity peak, then BMI

decreases until about the age of 5.5 years at adiposity rebound and then steadily increases

again until after puberty where it tends to plateau through adulthood (see figure 1.3, Chapter

1). These patterns of growth tend to be different between males and females where females

often reach each of the ‘landmarks’ (adiposity rebound, puberty and plateau at adult BMI) at

an earlier age than males. These changes over time within each individual, as well as the

increasing variability over time of BMI between individuals, are often difficult to capture in a

statistical model, particularly with the aim of detecting small genetic effects. The World Health

Organization recently conducted research into statistical methods used to construct growth

curves over childhood. They examined as many as 30 previously published methods, of which

only seven handled multiple measurements per child [115]. Historically, growth (height and

weight) models were non-linear, parametric curves over a small age range, for example

adolescence, which were subsequently concatenated to cover the whole age range [210]. They

were parametric in that they modelled the age range of interest with a small number of

parameters. However, they had several drawbacks: 1) they did not allow for

enough individual variation from the non-linear curve and therefore often missed interesting

local variations [211], and 2) they were unable to account for variation in growth due to other

characteristics that are measured at each time point, such as diet and exercise. Later, these

models were extended to non-parametric but still non-linear functions where the shape of the

curve was determined locally and a curve was estimated for each subject over a small range of

ages [211,212]. These non-parametric methods used spline functions (for example, cubic

smoothing splines [213] or variable knot cubic splines [214]) and kernel estimation techniques

[211], where at any age the nearby measurements contribute to the shape of the curve. Spline

functions are sufficiently smooth polynomial functions that are defined using piecewise

functions between chosen knot points; whereas, kernel estimation applies weights to the

growth measurements and averages the weighted measures over appropriate age windows.

Although they solved some of the drawbacks of the parametric methods, the non-parametric

methods were still unable to describe the relationships with other covariates at each time

point, and the growth estimates were highly dependent on the size of the selected smoothing

parameter: if the smoothing parameter is too small, the curve will follow random variations,

over-fitting the data so that the estimates are not reproducible, whereas if it is too large the

curve will be over-smoothed and may miss interesting local patterns. As the estimation is

based on nearby measurements, each individual must have a minimum number of

observations, and ideally the same number of observations for all subjects. The next major

development in the growth modelling literature was the introduction of linear mixed-effects

models for longitudinal normally distributed data [215,216]. These models use powers of age

as explanatory variables, can easily incorporate further explanatory variables measured at

each time point and can model growth over a wide age range. These models can be extended

to account for increasing heteroscedasticity over the time period of interest based on a

multivariate t-distribution or to account for a curve shape that differs from the polynomial

function of age by using smoothing splines. Although the range of available methods

previously used for growth modelling is large, not all are appropriate for genetic association

analyses.
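
To make the kernel estimation idea concrete, the sketch below applies a Gaussian-kernel weighted average (a Nadaraya-Watson smoother, chosen here for illustration and not necessarily the estimator used in the cited work) to the mean age and mean BMI at each Raine Study follow-up reported in Table 2.2. The function name and the two bandwidth values are assumptions; they are deliberately extreme to show how strongly the fitted curve depends on the smoothing parameter.

```python
import numpy as np


def kernel_smooth(ages, values, grid, bandwidth):
    """Gaussian-kernel weighted average of `values` at each point of `grid`."""
    ages = np.asarray(ages, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - ages[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)              # nearby ages get larger weights
    return (weights @ values) / weights.sum(axis=1)


# Mean age (years) and mean BMI (kg/m^2) at each follow-up, from Table 2.2.
ages = np.array([1.16, 2.18, 3.11, 5.92, 8.10, 10.60, 14.07, 17.05])
bmi = np.array([17.11, 15.97, 16.15, 15.86, 16.88, 18.69, 21.45, 23.02])

rough = kernel_smooth(ages, bmi, ages, bandwidth=0.05)   # tracks every observation
flat = kernel_smooth(ages, bmi, ages, bandwidth=1000.0)  # collapses towards the mean
```

With the very small bandwidth the fitted values reproduce each observation, including any noise; with the very large bandwidth they are essentially the overall mean and the adiposity rebound disappears entirely.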

2.2.1 Aims

The aims of this chapter were to:

• Fit each method that is appropriate for modelling BMI trajectories throughout

childhood to the Raine Study data from years 1 to 17.

• Check residuals and compare model fit between methods to determine the most

appropriate model for BMI growth throughout infancy, childhood and adolescence.

• Incorporate known adult BMI loci into each model and compare estimates to

determine the most appropriate model for detecting genetic effects of BMI growth.

Methods that will be explored include:

o Linear mixed effects model with up to a cubic function in fixed and/or random effects

o Linear mixed effects model with an extension to allow for non-normally distributed

random effects and error terms (Skew-normal/Skew-t) with up to a cubic function in

fixed and/or random effects

o Semi-parametric linear mixed effects model with smoothing splines

o SuperImposition by Translation And Rotation (non-linear mixed effects model)


2.3 Subjects and Materials

The Western Australian Pregnancy Cohort (Raine) Study is described in detail in Section 1.6.1.

A subset of 1,506 individuals were used for this analysis based on the following criteria: at least

one parent of European descent, live birth, unrelated to anyone else in sample (one individual

of every related pair, including multiple births, was selected at random), no significant

congenital anomalies, genetic data available, and at least one measure of BMI throughout

childhood available. The individuals excluded from these analyses consisted of 369 of non-

Caucasian descent, a further 59 individuals from multiple births (55 twins, two triplets and one

twin who died <18 weeks gestation, one twin withdrew from the study at birth - one of each

multiple remained in the analysis), one individual of each of 66 siblings (not including multiple

births), 10 congenital anomalies, 853 without genetic data available and five without a

measure of BMI in childhood. Many studies investigating BMI trajectories through childhood

and adolescence begin with BMI measured at birth, however, this complicates the modelling

further for two main reasons: firstly, BMI in infants is meaningless and generally ponderal

index (PI = weight/length³) is used as a measure of growth for this age; and secondly, there is

often a period of weight loss after birth that would not be accurately captured in the modelling

due to the measurement times available. For these reasons, BMI at birth was excluded from all

models and modelling began at the one year assessment. BMI was calculated from the weight

and height measurements (Table 2.1; median six measures per person, interquartile range

[IQR]: 5-7), with a total of 8,986 BMI measures.
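
As a small worked example of the two indices, the sketch below evaluates both at the mean year 1 weight (10.34 kg) and height (0.78 m) reported in Table 2.2. The function names are illustrative. Note that this is a ratio of means, so it need not equal the mean of the individual BMI values reported in the same table (17.11 kg/m²).

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared (kg/m^2)."""
    return weight_kg / height_m ** 2


def ponderal_index(weight_kg, length_m):
    """Ponderal index: weight divided by length cubed (kg/m^3)."""
    return weight_kg / length_m ** 3


# Mean weight and height at the year 1 follow-up (Table 2.2).
year1_bmi = bmi(10.34, 0.78)
year1_pi = ponderal_index(10.34, 0.78)
```

Here `year1_bmi` is approximately 17.0 kg/m² and `year1_pi` approximately 21.8 kg/m³.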

Table 2.1: Number of follow-ups with BMI measured for each of the participants in the

sample

Number of follow-ups attended    1    2    3    4    5    6    7    8    Average (SD)
Number of individuals           17   41   66  128  192  348  583  131    5.97 (1.52)

Of the 1,506 individuals in the analysis, there are 773 males and 733 females (51% male). At

birth, these individuals were similar to the Western Australian population of births with an

average birth weight of 3.35kg (SD=0.59kg) and gestational age of 39.35 weeks (SD=2.11

weeks), 25.21% of them were born to mothers who smoked throughout pregnancy and 8.77%

born preterm. The mothers on average gained 8.79kg (SD=3.78) from 18-34 weeks of

pregnancy and breast fed their infant for an average of six months (IQR=2-12 months). On


average, the infants gained 6.98kg (SD=1.17kg) in the first year of life. Table 2.2 presents the

average age, weight, height and BMI at each follow-up of the study.

Table 2.3 displays the correlation structure for the repeated observations of BMI, which

indicates that there is a degree of tracking over time. A typical pattern for growth data is

observed, whereby the strength of correlation decreases with increasing time. This suggests

that an autoregressive or unstructured correlation structure may be the most appropriate;

however this will be investigated further in Section 2.4.

Figure 2.1 displays the distributions of BMI for eight scheduled follow-up windows: 1-, 2-, 3-,

6-, 8-, 10-, 14- and 17 years. These can be considered as independent observations as each

individual is only measured once at each scheduled follow-up. It appears that BMI is fairly

normally distributed and the variability between individuals is fairly small, until age six where

the distribution becomes increasingly skewed as age increases.

To get a sense of the time trends, it is important to look at plots of the individual trajectories

over time. Figure 2.2 provides an example of the trajectories for a sample of individuals with

two or more time points over the follow-up period. Figure 2.3 displays a smooth curve through

the observed BMI measurements by age for all 1,506 individuals. Both figures indicate that

there is some curvature in the BMI measurements over time that needs to be accounted for in

the models. No outliers were removed from the data as it was of interest to see whether they

were appropriately accounted for in the chosen methods.


Table 2.2: The phenotypic characteristics at each follow-up year for the 1,506 individuals in

the study sample. Continuous variables are expressed as means (standard deviation); binary

variables as percentage (number).

Measure       Follow-up            All (n=1,506)   Male (n=773)    Female (n=733)  P-Value

Summary of birth measures
Birth Weight (kg)                  3.35 (0.59)     3.41 (0.59)     3.28 (0.58)     3.85×10⁻⁵
Gestational Age (weeks)            39.35 (2.11)    39.37 (2.05)    39.32 (2.17)    0.66
Preterm birth                      8.77% (132)     8.03% (62)      9.55% (70)      0.34
Maternal smoking during pregnancy  25.22% (379)    22.77% (176)    27.81% (203)    0.03

Summary of measures by follow-up year
Age (yr)      Year 1 (n=1,375)     1.16 (0.10)     1.15 (0.10)     1.16 (0.10)     0.22
              Year 2 (n=402)       2.18 (0.14)     2.19 (0.14)     2.16 (0.14)     0.05
              Year 3 (n=994)       3.11 (0.12)     3.12 (0.13)     3.11 (0.10)     0.71
              Year 6 (n=1,324)     5.92 (0.18)     5.91 (0.19)     5.92 (0.18)     0.30
              Year 8 (n=1,320)     8.10 (0.35)     8.12 (0.34)     8.09 (0.36)     0.17
              Year 10 (n=1,274)    10.60 (0.18)    10.60 (0.19)    10.59 (0.17)    0.16
              Year 14 (n=1,276)    14.07 (0.20)    14.07 (0.20)    14.07 (0.19)    0.55
              Year 17 (n=1,021)    17.05 (0.25)    17.03 (0.24)    17.06 (0.25)    0.06
BMI (kg/m²)   Year 1 (n=1,375)     17.11 (1.40)    17.38 (1.38)    16.82 (1.37)    4.63×10⁻¹⁴
              Year 2 (n=402)       15.97 (1.29)    16.19 (1.28)    15.72 (1.25)    2.00×10⁻⁴
              Year 3 (n=994)       16.15 (1.27)    16.29 (1.21)    16.00 (1.31)    2.00×10⁻⁴
              Year 6 (n=1,324)     15.86 (1.76)    15.88 (1.70)    15.84 (1.82)    0.64
              Year 8 (n=1,320)     16.88 (2.54)    16.79 (2.47)    16.97 (2.62)    0.29
              Year 10 (n=1,274)    18.69 (3.41)    18.58 (3.38)    18.80 (3.45)    0.25
              Year 14 (n=1,276)    21.45 (4.23)    21.21 (4.24)    21.71 (4.20)    0.03
              Year 17 (n=1,021)    23.02 (4.38)    22.83 (4.34)    23.23 (4.42)    0.15
Height (m)    Year 1 (n=1,375)     0.78 (0.03)     0.78 (0.03)     0.77 (0.03)     1.04×10⁻¹⁴
              Year 2 (n=402)       0.90 (0.03)     0.91 (0.03)     0.90 (0.03)     3.00×10⁻⁴
              Year 3 (n=994)       0.96 (0.04)     0.97 (0.04)     0.96 (0.04)     1.06×10⁻⁹
              Year 6 (n=1,324)     1.16 (0.05)     1.17 (0.05)     1.15 (0.04)     6.05×10⁻⁷
              Year 8 (n=1,320)     1.29 (0.06)     1.30 (0.06)     1.29 (0.06)     4.37×10⁻⁶
              Year 10 (n=1,274)    1.44 (0.06)     1.44 (0.07)     1.44 (0.06)     0.97
              Year 14 (n=1,276)    1.65 (0.08)     1.67 (0.09)     1.62 (0.06)     4.94×10⁻²⁶
              Year 17 (n=1,021)    1.73 (0.09)     1.79 (0.07)     1.66 (0.06)     1.94×10⁻¹⁴³
Weight (kg)   Year 1 (n=1,375)     10.34 (1.24)    10.67 (1.24)    9.99 (1.15)     5.03×10⁻²⁵
              Year 2 (n=402)       13.03 (1.49)    13.39 (1.48)    12.65 (1.40)    3.37×10⁻⁷
              Year 3 (n=994)       15.06 (1.84)    15.42 (1.83)    14.69 (1.78)    3.99×10⁻¹⁰
              Year 6 (n=1,324)     21.48 (3.37)    21.75 (3.42)    21.20 (3.30)    2.91×10⁻³
              Year 8 (n=1,320)     28.42 (5.68)    28.58 (5.65)    28.24 (5.72)    0.28
              Year 10 (n=1,274)    39.01 (9.02)    38.80 (9.09)    39.23 (8.95)    0.40
              Year 14 (n=1,276)    58.49 (13.44)   59.50 (14.49)   57.39 (12.11)   4.81×10⁻³
              Year 17 (n=1,021)    68.69 (14.59)   73.15 (14.91)   64.12 (12.74)   3.91×10⁻²⁴

Table 2.3: The correlation structure of the repeated observations of BMI

Year   1      2      3      6      8      10     14     17
1      1
2      0.712  1
3      0.689  0.761  1
6      0.497  0.619  0.729  1
8      0.388  0.461  0.595  0.878  1
10     0.314  0.351  0.503  0.778  0.899  1
14     0.246  0.326  0.423  0.689  0.794  0.861  1
17     0.213  0.272  0.44   0.611  0.698  0.754  0.853  1


Figure 2.1: Boxplots of BMI at each follow-up year, with BMI displayed from 10-30 kg/m² for

years 1-6 and 10-50 kg/m² for years 8-17.


Figure 2.2: Individual BMI profiles of 20 individuals from the Raine Study

Figure 2.3: Observed BMI measures for the 1,506 individuals with a lowess curve to

visualise the curvature in BMI over childhood


2.4 Statistical Methods and Model Fit

Four methods were compared to assess the accuracy of estimation for the BMI growth

trajectories and the ability to detect genetic effects influencing these trajectories. These

methods included: Linear Mixed Effects Model (LMM) [215], Skew-t Linear Mixed Effects

Model (STLMM) [217,218,219], Semi-Parametric Linear Mixed Effect Model (SPLMM) and a

Non-Linear Mixed Model (NLMM), also known as SuperImposition by Translation and Rotation

(SITAR) [220]. Although there are many possible statistical methods that could be utilized in

this context, these methods were chosen as they allow for adjustment of potential

confounders, appropriately account for the correlation between the repeated measures,

obtain valid inference, allow for incomplete data on the assumption that data are missing at

random, and are computationally feasible. Once the best fitting model was defined for each

method, the model fit for each of the methods was compared.

A small simulation study was also conducted using re-sampling techniques based on 1,000

non-parametric bootstrap data sets with replacement [221] from the Raine Study data and

calculating an R2 statistic for each method fit to these simulated data sets. These bootstrap

resamples provide an estimate of the variance of the R2 statistic in each method.
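
The resampling scheme can be sketched as follows. Several choices here are assumptions made for illustration and are not stated in the text: whole individuals (rather than single observations) are resampled so that each person's repeated measures stay together, the data are simulated toy values, and an ordinary polynomial fit stands in for the four mixed-model methods actually compared.

```python
import numpy as np

rng = np.random.default_rng(2013)

# Toy longitudinal data: 50 hypothetical individuals with 5 visits each.
n_ind = 50
visit_ages = np.array([1.0, 3.0, 6.0, 10.0, 14.0])
ids = np.repeat(np.arange(n_ind), visit_ages.size)
age = np.tile(visit_ages, n_ind)
person_effect = rng.normal(0.0, 1.0, n_ind)[ids]
y = 17.0 - 0.6 * age + 0.06 * age ** 2 + person_effect + rng.normal(0.0, 0.5, ids.size)


def r_squared(age, y, degree=2):
    """R^2 of an ordinary polynomial fit (a stand-in for the mixed models)."""
    fitted = np.polyval(np.polyfit(age, y, degree), age)
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)


def bootstrap_r2(ids, age, y, n_boot=1000):
    """Resample individuals with replacement and recompute R^2 each time."""
    unique_ids = np.unique(ids)
    rows_by_id = {i: np.flatnonzero(ids == i) for i in unique_ids}
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sampled = rng.choice(unique_ids, size=unique_ids.size, replace=True)
        rows = np.concatenate([rows_by_id[i] for i in sampled])
        stats[b] = r_squared(age[rows], y[rows])
    return stats


r2_boot = bootstrap_r2(ids, age, y)
r2_variance = r2_boot.var(ddof=1)  # bootstrap estimate of the variance of R^2
```

Repeating this for each candidate method on the same bootstrap data sets would give comparable estimates of the variability of R² across methods.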

Sex stratified models were used for all analyses, 1) to account for the differing growth curves

between males and females, particularly around puberty, and 2) because different genes may

influence the timing of growth spurts in males and females.

All analyses were conducted in R version 2.12.1 [222]; the spida library was used for the

SPLMM models and the sitarlib library was used for the NLMM models. Maximum likelihood

estimation was used for all mixed models to enable comparison between each of the four

methods.

2.4.1 Linear Mixed Effects Model (LMM)

LMMs [215] include both fixed effects, which are parameters that are associated with an entire

population, and random effects, which are parameters that are associated with individual units

drawn from the population at random. An LMM with a polynomial function for the time

component is a common tool for growth curve analysis with continuous repeated measures.

For a set of time points varying from 1,..,t, the time trend in the sample can be described by a

(q-1)st degree polynomial function, with q ≤ t.


2.4.1.1 Method Description

In comparison to the linear model, with one random effect term for the error (often referred

to as the residual error and denoted ε_i), the LMM includes additional random-effects terms

which are appropriate for representing clustered and therefore dependent data, such as

observations taken on related individuals or data collected at several time points. The LMM

has the following general form:

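\[ \mathbf{y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{b}_i + \boldsymbol{\varepsilon}_i, \qquad \mathbf{b}_i \sim N_q(\mathbf{0}, \mathbf{D}), \qquad \boldsymbol{\varepsilon}_i \sim N_{n_i}(\mathbf{0}, \mathbf{R}_i) \]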

where:

• y_i is the n_i × 1 response vector for observations in the ith group.

• X_i is the n_i × p design matrix for the fixed effects for observations in group i.

• β is the p × 1 vector of fixed-effect or population-averaged regression coefficients

(unknown population parameters).

• Z_i is the n_i × q design matrix for the random effects for observations in group i.

• b_i is the q × 1 vector of random-effect coefficients for group i.

• ε_i is the n_i × 1 vector of errors for observations in group i.

• D is the q × q covariance matrix for the random effects.

• R_i is the n_i × n_i positive-definite covariance matrix for the errors in group i.

• b_i and ε_i are assumed to be independent.

The random effects, b_i, are defined to be normally distributed with a mean of zero.

There are several advantages to this form of model including 1) it can be used in an

unbalanced design often seen in longitudinal studies, where either the number of measures or

the timing of the measurements differs between individuals, 2) it explicitly estimates the

between and within individual variation and 3) due to its computational efficiency, it facilitates

exploratory association analysis where multiple covariates are of interest.

Inferences about the fixed effects are generally referred to as estimates, whereas inferences

about the random effects are referred to as predictors [223]. The best linear unbiased

estimator (BLUE) of β is:

\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{y} \]

where V is the covariance matrix for the vector of observations, y, such that V = ZDZ^T + R. The

best linear unbiased predictor (BLUP) of b is [224]:

\[ \hat{\mathbf{b}} = \mathbf{D}\mathbf{Z}^{T}\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \]

They are ‘best’ in that they minimize the sampling variance, linear in the sense that they are

linear functions of the observations y and unbiased such that:

\[ E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}, \qquad E(\hat{\mathbf{b}}) = E(\mathbf{b}) \]

The BLUEs are estimated to ensure the random effects are distributed N_q(0, D). Marginally,

the y_i are independently normally distributed as N(X_i β, R_i + Z_i D Z_i^T).
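
The two estimators can be illustrated numerically. The sketch below writes the random effects as b with covariance D, following the model definition above, and assumes the covariance matrices are known; in practice they are estimated by ML or REML. The function name and the six toy observations (three hypothetical individuals, two visits each, random intercepts only) are invented for illustration.

```python
import numpy as np


def blue_blup(y, X, Z, D, R):
    """Generalised least squares estimate of beta and the BLUP of b, given D and R."""
    V = Z @ D @ Z.T + R                                   # marginal covariance of y
    beta_hat = np.linalg.solve(X.T @ np.linalg.solve(V, X),
                               X.T @ np.linalg.solve(V, y))
    b_hat = D @ Z.T @ np.linalg.solve(V, y - X @ beta_hat)
    return beta_hat, b_hat


# Three hypothetical individuals, two visits each, centred ages -1 and +1.
age = np.tile([-1.0, 1.0], 3)
X = np.column_stack([np.ones(6), age])      # fixed intercept and slope
Z = np.kron(np.eye(3), np.ones((2, 1)))     # one random intercept per individual
D = 4.0 * np.eye(3)                         # assumed random-intercept variance
R = np.eye(6)                               # assumed independent unit-variance errors
y = np.array([14.0, 15.0, 17.0, 18.0, 20.0, 21.0])

beta_hat, b_hat = blue_blup(y, X, Z, D, R)
```

In this toy example the first individual's mean residual is -3, but the predicted random intercept is -8/3: the BLUP shrinks each individual's deviation towards zero by a factor that depends on the ratio of between- to within-individual variance.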

Laird and Ware [215] note that maximum likelihood (ML) tends to give biased

estimates of the covariance structure, whereas restricted ML (REML) is able to give an unbiased

result. In REML, the maximum likelihood estimation is not carried out on all the information,

but instead it uses a transformed set of data so that the nuisance parameters do not influence

the likelihood. That is, the likelihood is based on any full-rank set of error contrasts

µ^T y such that E(µ^T y) = 0, which is equivalent to µ^T X = 0. It is therefore recommended to use REML

when interpreting the covariance estimates as it produces unbiased estimates of the

parameters.
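
The source of the bias is easiest to see in the simplest special case, which is used here purely for illustration and is not one of the models fitted in this chapter: an ordinary linear model with no random effects, n observations, p fixed-effect parameters and independent errors with common variance σ². ML divides the residual sum of squares by n, ignoring the degrees of freedom spent on estimating β, whereas REML divides by n − p:

```latex
\[
\hat{\sigma}^{2}_{\mathrm{ML}}
  = \frac{1}{n}\,(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^{T}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}),
\qquad
E\!\left(\hat{\sigma}^{2}_{\mathrm{ML}}\right) = \frac{n-p}{n}\,\sigma^{2}
\]
\[
\hat{\sigma}^{2}_{\mathrm{REML}}
  = \frac{1}{n-p}\,(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}})^{T}(\mathbf{y}-\mathbf{X}\hat{\boldsymbol{\beta}}),
\qquad
E\!\left(\hat{\sigma}^{2}_{\mathrm{REML}}\right) = \sigma^{2}
\]
```

The downward bias of the ML estimate shrinks as n grows relative to p, which is why the distinction matters most when the number of fixed effects is large compared with the sample size.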

2.4.1.2 Model Fit

The growth curve LMM for the jth individual and tth time point and with the time scale

measured by age is as follows:

\[ \mathrm{BMI}_{jt} = \beta_0 + \sum_{i} \beta_i \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{i} + b_{0j} + \sum_{k} b_{kj} \left(\mathrm{Age}_{jt} - \overline{\mathrm{Age}}\right)^{k} + \varepsilon_{jt}, \qquad k \le i \]

where \(\overline{\mathrm{Age}}\) is the mean age over the t time points in the sample (i.e. eight years), β_i are the

parameters for the fixed effects, b_kj are the random effects, assumed to be

multivariate normal, and the ε_jt are the error terms, assumed to be normally

distributed N(0, Σ), where Σ is the within-individual correlation matrix. Both age and the

natural log (ln) transformation of age were considered as the time component to identify the

optimal underlying scale. Both fixed (i) and random (k) effects up to polynomial of degree 3

were tested. Several within-individual correlation structures were considered, including

autoregressive (i.e. constant variance across occasions, σ², and Corr(Y_ij, Y_i,j+k) = ρ^k), continuous

autoregressive, exchangeable (compound symmetric; i.e. constant variance across occasions,

σ², and Corr(Y_ij, Y_ik) = ρ) and unstructured (i.e. no assumption is made about the variances or

covariances).
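
The ingredients of this model can be sketched directly: the centred polynomial terms of age and the candidate within-individual correlation matrices. The helper names, the value ρ = 0.8 and the use of the nominal follow-up ages are assumptions made for illustration; the continuous autoregressive form shown (correlation decaying as ρ raised to the time gap) is the usual definition and is assumed to match the one used here.

```python
import numpy as np


def centred_powers(age, mean_age, degree):
    """Columns 1, (age - mean), ..., (age - mean)^degree."""
    centred = np.asarray(age, dtype=float) - mean_age
    return np.vander(centred, degree + 1, increasing=True)


def exchangeable(n, rho):
    """Compound symmetry: the same correlation between every pair of occasions."""
    return np.full((n, n), rho) + (1.0 - rho) * np.eye(n)


def ar1(n, rho):
    """Autoregressive: correlation rho^k for occasions k steps apart."""
    lag = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho ** lag


def car1(times, rho):
    """Continuous autoregressive: correlation rho^(time gap between occasions)."""
    times = np.asarray(times, dtype=float)
    gap = np.abs(np.subtract.outer(times, times))
    return rho ** gap


ages = np.array([1.0, 2.0, 3.0, 6.0, 8.0, 10.0, 14.0, 17.0])  # nominal follow-ups
X = centred_powers(ages, mean_age=8.0, degree=3)  # cubic terms for fixed effects
Z = centred_powers(ages, mean_age=8.0, degree=2)  # quadratic terms for random effects
corr_exch = exchangeable(len(ages), rho=0.8)
corr_ar1 = ar1(len(ages), rho=0.8)
corr_car1 = car1(ages, rho=0.8)
```

Because the follow-ups are unequally spaced, the two autoregressive forms differ: between the year 3 and year 6 visits, `corr_ar1` gives 0.8 (one occasion apart) whereas `corr_car1` gives 0.8³ = 0.512 (three years apart).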

Output from R version 2.12.1 [222] of the model fitting procedure outlined below is provided

in Appendix B. Following the guidelines outlined in Cheng et al [225], the initial saturated

model included a cubic function of age for both the fixed and random effects and BMI on the

natural log scale. Initially, likelihood ratio tests (LRT) were used to assess the required degree

of polynomial function for the random effects to fit the data accurately, while keeping the

fixed effects the same and specifying an independence correlation matrix for the random

effects. Table 2.4 provides statistics for the covariance models that were tested.

Table 2.4: Model fit statistics for covariance models tested using the LMM method; -2 log

likelihood, Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC). All

models assumed an independent correlation structure for the random effects and no specific

correlation structure for the error.

Sex     Model  Random Effects        -2LL      BIC        AIC
Female  1      Intercept             3647.45   -7244.60   -7282.90
        2      Intercept, age        4294.20   -8529.72   -8574.40
        3      Intercept, age, age²  4392.85   -8718.63   -8769.70
Male    1      Intercept             3812.15   -7573.70   -7612.31
        2      Intercept, age        4521.14   -8983.23   -9028.27
        3      Intercept, age, age²  4626.61   -9185.74   -9237.22


The model with a polynomial of degree 3 in the random effects did not converge for either males or females. Therefore, according to the AIC and the other criteria presented in Table 2.4, the model with a quadratic function of age in the random effects was the most appropriate for both males and females (LRT P-Value < 0.0001 when comparing Model 2 to Model 3 in both females and males).
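The likelihood ratio comparison of Model 2 and Model 3 can be sketched as follows. This is an illustration rather than the original R code: it treats the fit values tabulated in Table 2.4 as log-likelihoods (the reported AIC values are consistent with this reading under AIC = -2 logLik + 2k), assumes one extra variance parameter in Model 3, and uses a naive chi-square reference distribution with one degree of freedom. The parameter count of six used in the AIC check is inferred from the table, not stated in the text.

```python
import math


def lrt_one_df(loglik_null, loglik_alt):
    """Likelihood ratio statistic and chi-square(1) upper-tail P-value."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def aic(loglik, n_params):
    """Akaike Information Criterion from a log-likelihood."""
    return -2.0 * loglik + 2.0 * n_params


# Model 2 versus Model 3 from Table 2.4.
stat_f, p_f = lrt_one_df(4294.20, 4392.85)   # females
stat_m, p_m = lrt_one_df(4521.14, 4626.61)   # males

# Consistency check: female Model 1 with six parameters (inferred count).
aic_f1 = aic(3647.45, 6)
```

Because a variance component is tested on the boundary of its parameter space, the naive chi-square P-value tends to be conservative; with statistics this large the conclusion is unaffected.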

Independence between the random effects was assumed in the initial model fit, which may

not be necessary, so an LRT was conducted to see whether this assumption could be relaxed.

The LRT P-Value for both males and females was <0.0001 indicating that it is necessary to

allow a correlation between the random effects.

A similar approach was used to investigate whether a within-individual correlation structure

was required in addition to the random effects. LRTs suggest that a correlation structure using

the continuous autoregressive of order one method is necessary for both males and females.

Finally, models with untransformed BMI and both untransformed and natural log transformed age were compared using model fit criteria including fitted versus observed values, fitted versus residual values and the distribution of both the random effects and the error terms. These criteria suggest that natural log transformed BMI and untransformed age provided the best fit of the data for both males and females.

To summarise, the optimal LMM model for both males and females was based on ln(BMI) and

untransformed age, with a quadratic polynomial of age in the random effects, a cubic

polynomial of age in the fixed effects and a continuous autoregressive correlation structure of

order one. Hence, the final model for both females and males was

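$$\ln(\mathrm{BMI}_{jt}) = \beta_0 + \beta_1(\mathrm{Age}_{jt} - 8) + \beta_2(\mathrm{Age}_{jt} - 8)^2 + \beta_3(\mathrm{Age}_{jt} - 8)^3 + b_{0j} + b_{1j}(\mathrm{Age}_{jt} - 8) + b_{2j}(\mathrm{Age}_{jt} - 8)^2 + \varepsilon_{jt}$$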
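To make the fixed-effects part of this model concrete, the sketch below evaluates the population-average curve at a given age, with all random effects set to zero. The function name is hypothetical, and the coefficients are the rounded female fixed-effect estimates reported in Table 2.5, so results are approximate; back-transforming with the exponential gives a value on the geometric-mean scale rather than the arithmetic mean of BMI.

```python
import math


def predict_ln_bmi(age, beta, centre=8.0):
    """Population-average ln(BMI): cubic polynomial in (age - centre)."""
    x = age - centre
    return sum(b * x ** power for power, b in enumerate(beta))


# Rounded female fixed effects from Table 2.5: intercept, Age, Age2, Age3.
beta_female = [2.8170, 0.0343, 0.0030, -0.0003]

# At age 8 the centred age is zero, so only the intercept contributes.
bmi_age8 = math.exp(predict_ln_bmi(8.0, beta_female))
bmi_age14 = math.exp(predict_ln_bmi(14.0, beta_female))
```

The intercept is therefore directly interpretable as the average ln(BMI) at eight years, which is the reason for centring age at the sample mean.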

The model output and diagnostics for females (Table 2.5 and Figure 2.4) and males (Table 2.6 and Figure 2.5) are below. The plots in Figure 2.4 show a variety of diagnostics of the LMM fit in females. The plot of residuals versus age (Plot A) shows that the residuals are relatively constant over time; however, there are a few outliers within each time window. The predicted values from the model correspond fairly well to the observed BMI values, as seen in Plot B of Figure 2.4, although the model tends to underestimate high values of BMI, as seen by the points above the x=y line at the top of Plot B. Plot C illustrates that there is reasonably constant variability in the standardised residuals across the range of predicted values. Focusing now on Plots D-F, the autocorrelation is within the expected bounds from lag 4 and, although the residuals appear to fit the assumption of normality, the random intercept deviates from normality at both ends. For males, the plots in Figure 2.5 show a similar pattern to that discussed for the females. However, there are fewer outliers in the males than in the females, and both the random intercept and the residuals follow a normal distribution more closely.

2.4.1.3 Computational Time

When 100 basic models as described above were run in 64-bit R version 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU (L 640 @ 2.13GHz), each model took on average 12.14 seconds (12.10-12.18 seconds) for females and 14.25 seconds (13.98-14.59 seconds) for males.


Table 2.5: Details of LMM model in females (N=733, n=4377)

Goodness of fit criteria
  AIC               -9622.34
  BIC               -9545.74
  Log Likelihood     4823.17

Random Effects
              SD        Correlation
                        Intercept   Age
  Intercept   0.1247
  Age         0.0110     0.8310
  Age2        0.0010    -0.7230     -0.4550
  Residual    0.0633

Correlation Structure: Continuous AR(1)
  Phi         0.37671

Fixed effects
              Value     SE        DF     t-value   P-Value
  Intercept    2.8170   0.0050    3641   564.28    <0.001
  Age          0.0343   0.0007    3641    52.28    <0.001
  Age2         0.0030   0.0001    3641    47.04    <0.001
  Age3        -0.0003   0.00001   3641   -33.71    <0.001

Correlation
              Intercept   Age     Age2
  Age          0.45
  Age2        -0.61       0.06
  Age3         0.08      -0.70   -0.41

Standardized Within-Group Residuals
  Min       1st quartile   Median    3rd quartile   Max
  -4.7245   -0.4761        -0.0304   0.4404         5.8136


Figure 2.4: Model diagnostic plots from LMM model fit to the data from females in the

Raine Study


Table 2.6: Details of LMM model in males (N=773, n=4609)

Goodness of fit criteria
  AIC               -10119.56
  BIC               -10042.34

Random Effects
              SD        Correlation
                        Intercept   Age
  Intercept   0.1194
  Age         0.0113     0.8120
  Age2        0.0010    -0.7090     -0.3640
  Residual    0.0637

Correlation Structure: Continuous AR(1)
  Phi         0.3763

Fixed effects
              Value     SE        DF     t-value   P-Value
  Intercept    2.8087   0.0047    3833   597.65    <0.001
  Age          0.0288   0.0006    3833    44.51    <0.001
  Age2         0.0032   0.0001    3833    52.47    <0.001
  Age3        -0.0003   0.00001   3833   -31.16    <0.001

Correlation
              Intercept   Age     Age2
  Age          0.44
  Age2        -0.61       0.09
  Age3         0.08      -0.69   -0.40

Standardized Within-Group Residuals
  Min       1st quartile   Median    3rd quartile   Max
  -4.0948   -0.4423        -0.0223   0.4149         3.5616


Figure 2.5: Model diagnostic plots from LMM model fit to the data from males in the Raine

Study.


2.4.2 Skew-t Linear Mixed Effects Model (STLMM)

The assumption of multivariate normal random effects and within-subject errors is often violated, particularly when modelling childhood growth. This is the case in the Raine Study BMI data, particularly in the females, as seen in the random effects and residual error plots from the LMM models (Plots E and F of Figure 2.4 and Figure 2.5). This assumption makes the model easy to apply in widely used software; however, its accuracy is difficult to check, and the routine use of normality has been questioned as it often lacks robustness against departures from the normal distribution. Misspecification of the distribution may lead to biased estimation of the fixed effects and their standard errors, and thus to incorrect statistical inference, in particular for the genetic parameters. A common approach to achieve normality is to transform the response variable; however, no single transformation can be used in every scenario, and the results of the analyses might depend on the transformation used. To avoid transforming the response and to maintain valid inference under a non-normal distribution for the response, an extension to the LMM was utilised that assumes a multivariate t-distribution for the error terms, $\varepsilon_{jt}$, and a multivariate skew-normal distribution for the random effects. The resulting model for the response over the t time points is multivariate skew-t, with specific parameters that account for the asymmetry (skewness parameters) and the long tails (degrees of freedom of the t-distribution) of the response distribution [218].


There has been considerable work undertaken in the area of extending the LMM for situations in which the residuals do not follow a normal distribution. Pinheiro, Liu and Wu [226] proposed a multivariate t linear mixed model, and later several articles were published on a skew-normal linear mixed model [227,228,229]. This mixed model was based on the skew-normal distribution introduced by Azzalini [230], who also developed an expectation-maximization (EM) type algorithm for maximum likelihood estimation. Recently, Lachos et al [217] extended these methods to allow for skew-normal/independent (SNI) distributions, including the skew-normal, skew-t, skew-slash and skew-contaminated distributions.


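A p x 1 random vector Y follows a skew-normal (SN) distribution with location vector μ, dispersion matrix Σ (a p x p positive definite matrix) and p x 1 skewness vector λ, if its probability density function (pdf) is given by:

$$f_{\mathbf{Y}}(\mathbf{y}) = 2\,\phi_p(\mathbf{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\,\Phi_1\!\left(\boldsymbol{\lambda}^{T}\boldsymbol{\Sigma}^{-1/2}(\mathbf{y} - \boldsymbol{\mu})\right), \quad \mathbf{y} \in \mathbb{R}^p \qquad (2.2)$$

where $\phi_p(\cdot \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})$ stands for the pdf of the p-variate normal distribution with mean vector μ and covariance matrix Σ, and $\Phi_1(\cdot)$ is the cumulative distribution function (cdf) of the standard univariate normal distribution. Note that for λ = 0, (2.2) reduces to the symmetric $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ pdf.

If $a\mathbf{Z} \sim SN_p(\mathbf{0}, a^2\boldsymbol{\Sigma}, \boldsymbol{\lambda})$ for all a > 0 and Z = Y - μ, the SNI distribution can be defined as follows:

$$\mathbf{Y} = \boldsymbol{\mu} + U^{-1/2}\mathbf{Z}$$

where U is a positive random variable with cdf H(u; v) and pdf h(u; v), independent of the $SN_p(\mathbf{0}, \boldsymbol{\Sigma}, \boldsymbol{\lambda})$ random vector Z. The skew-t distribution with v degrees of freedom, $ST_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\lambda}, v)$, is a special case of the SNI distribution with U ~ Gamma(v/2, v/2), v > 0. The pdf of Y is:

$$f(\mathbf{y}) = 2\,t_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, v)\,T\!\left(\sqrt{\frac{v + p}{v + d}}\,A;\; v + p\right), \quad \mathbf{y} \in \mathbb{R}^p$$

where $t_p(\cdot\,; \boldsymbol{\mu}, \boldsymbol{\Sigma}, v)$ is the pdf of the p-variate Student-t distribution, $T(\cdot\,; v)$ is the cdf of the standard univariate t-distribution, $d = (\mathbf{y} - \boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu})$ is the Mahalanobis distance and $A = \boldsymbol{\lambda}^{T}\mathbf{y}_0$, with $\mathbf{y}_0 = \boldsymbol{\Sigma}^{-1/2}(\mathbf{y} - \boldsymbol{\mu})$. The skew-normal distribution is the limiting case (as $v \to \infty$).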
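For intuition, the univariate case (p = 1) of these definitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the estimation code used in the thesis: the function names are hypothetical, the density is the p = 1 version of the skew-normal pdf, and the sampler uses the standard stochastic representation of a skew-normal variate (a weighted sum of a half-normal and an independent normal) followed by the scale mixture Y = μ + U^{-1/2} Z with U ~ Gamma(v/2, rate v/2).

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def skew_normal_pdf(y, mu, sigma2, lam):
    """Univariate skew-normal density: 2 * phi(y | mu, sigma2) * Phi(lam * z)."""
    sigma = math.sqrt(sigma2)
    z = (y - mu) / sigma
    return 2.0 * norm_pdf(z) / sigma * norm_cdf(lam * z)


def sample_skew_t(mu, sigma2, lam, v, rng):
    """One draw from a univariate skew-t via Y = mu + U**(-1/2) * Z."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    half_normal = abs(rng.gauss(0.0, 1.0))
    normal = rng.gauss(0.0, 1.0)
    # Z ~ SN(0, sigma2, lam) by the half-normal plus normal representation.
    z = math.sqrt(sigma2) * (delta * half_normal
                             + math.sqrt(1.0 - delta * delta) * normal)
    # gammavariate takes (shape, scale); scale 2/v corresponds to rate v/2.
    u = rng.gammavariate(v / 2.0, 2.0 / v)
    return mu + z / math.sqrt(u)


rng = random.Random(2013)
draws = [sample_skew_t(0.0, 1.0, 3.0, 4.0, rng) for _ in range(1000)]
```

Setting `lam` to zero recovers the ordinary normal density, and a positive `lam` combined with a small `v` produces the right-skewed, heavy-tailed shape that motivates the model.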

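Therefore, the SNI LMM is defined by the following:

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i, \qquad \begin{pmatrix} \mathbf{b}_i \\ \boldsymbol{\varepsilon}_i \end{pmatrix} \sim \mathrm{SNI}_{q+n_i}\!\left( \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \end{pmatrix}, \begin{pmatrix} \mathbf{D} & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma}_i \end{pmatrix}, \begin{pmatrix} \boldsymbol{\lambda} \\ \mathbf{0} \end{pmatrix}; H \right), \quad i = 1, \ldots, n$$

The matrices D = D(α) and $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}_i(\boldsymbol{\gamma})$, i = 1, …, n, are dispersion matrices corresponding to the between and within subject variability, and depend on unknown and reduced parameters α and γ, respectively. Finally, H = H(.; v) is the cdf-generator that determines the specific SNI model assumed.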

The model has a closed form for the marginal distribution, which facilitates straightforward implementation of inference with standard optimization routines. There is no explicit solution for maximizing the likelihood function, so it has to be maximized numerically using an expectation-maximization (EM) algorithm (Lachos et al [217]). An important step is choosing the starting values, which are often taken to be the corresponding estimates under a normal assumption, with the starting values for the asymmetry parameters set to 0.

2.4.2.2 Model Fit

The specification in terms of the fixed and random effects was identical to the LMM. No

transformations were applied to either BMI or age as the skewness in the data was accounted

for by the model structure. However, the model would not converge with both linear and

quadratic age components in the random effects so this was reduced to only linear age. Hence,

the final model for both females and males was

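$$\mathrm{BMI}_{jt} = \beta_0 + \beta_1(\mathrm{Age}_{jt} - 8) + \beta_2(\mathrm{Age}_{jt} - 8)^2 + \beta_3(\mathrm{Age}_{jt} - 8)^3 + u_{0j} + u_{1j}(\mathrm{Age}_{jt} - 8) + \varepsilon_{jt}$$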

The model output and diagnostics for females are presented in Table 2.7 and Figure 2.6.

Table 2.7: Details of STLMM model in females (N=733, n=4377)

  Log Likelihood   -7952.42
  Kurtosis          4.15

Random Effects
              SD        Correlation
                        Intercept
  Intercept   5.4443
  Age         0.0997    0.6814
  Residual    0.7235

Fixed effects
              Value      SE       t-value   P-Value
  Intercept   14.6367    0.0994   147.32    <0.001
  Age          0.2891    0.0123    23.52    <0.001
  Age2         0.0557    0.0007    79.10    <0.001
  Age3        -0.0044    0.0001   -31.03    <0.001

Skewness
  Intercept   4.5791
  Age         2.2336

Figure 2.6: Model diagnostic plots from STLMM model fit to the data from females in the

Raine Study

The plots in Figure 2.6 display the diagnostics for the STLMM method in females, similar to those plotted for the LMM. The plot of residuals versus age (Plot A) shows that the residuals are relatively constant over time; however, there are a few large outliers at the final follow-up time. Plot B shows that the predicted values from the model fit fairly well to the observed BMI values; the underestimation of high BMI values that was seen with the LMM method is not observed here. Once again, there is reasonably constant variability in the standardised residuals across the range of predicted values (Plot C). However, this method provides a better fit than the LMM for the residuals under the t-distributional assumption (Plot E) and for the random intercept under the skew-normal distribution (Plot D).


Similar output is presented in Table 2.8 and Figure 2.7 for the males.

Table 2.8: Details of STLMM model in males (N=773, n=4609)

  Log Likelihood   -8264.53
  Kurtosis          3.05

Random Effects
              SD        Correlation
                        Intercept
  Intercept   3.3500
  Age         0.0673    0.4127
  Residual    0.5918

Fixed effects
              Value       SE       t-value   P-Value
  Intercept   14.8247     0.1002   147.89    <0.001
  Age          0.19703    0.0128    15.40    <0.001
  Age2         0.05738    0.0006    93.10    <0.001
  Age3        -0.00357    0.0001   -29.03    <0.001

Skewness
  Intercept   2.8590
  Age         1.6628

The plots in Figure 2.7 show a similar pattern to that discussed for the STLMM method in females. However, the residuals indicate that the fit is not as good for the males (Plot E), potentially because there is less variability in the male BMI measures to account for, and hence this method may be overfitting the data.


Figure 2.7: Model diagnostic plots from STLMM model fit to the data from males in the

Raine Study.


This was the most computationally intensive method to fit as it uses the EM algorithm, and it therefore took the longest time to converge. When 100 basic models as described above were run in 64-bit R version 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU (L 640 @ 2.13GHz), each skew-t model took on average 67.02 minutes (IQR: 66.80-69.58 minutes) for males and 76.67 minutes (IQR: 76.55-78.48 minutes) for females.


2.4.3 Semi-Parametric Mixed Model (SPLMM) using Smoothing Splines

SPLMMs make use of smoothing splines, which yield a smoother growth curve estimate than

the polynomial function in the LMM when fitting non-linear relationships. By fitting smoothing

splines in a mixed model framework, inference can easily be performed with established ML

and best prediction methods. An overarching benefit of these models is their ability to be

easily extended; they allow complexities in the data to be incorporated in a straightforward

manner.


Splines are semi-parametric techniques for fitting smooth curves to data, made up of a series of piecewise polynomials with smooth joins. Spline interpolation uses low-degree polynomials in each of the intervals and chooses the polynomial pieces such that they fit smoothly together. Mathematically, the shape of the curve is fixed by n+1 predefined points ("knots"; $(x_i, y_i)$, i = 0, 1, …, n) and is fit by interpolating between all pairs of adjacent knots, $(x_{i-1}, y_{i-1})$ and $(x_i, y_i)$, with polynomials $y = q_i(x)$, i = 1, 2, …, n. The curve takes the shape that minimizes the amount of curvature under the constraint of passing through all knots, and the first and second derivatives of y are continuous everywhere, including at the knots.

A spline function is defined by generating a matrix of regressors that encodes the knot points, the degree of the polynomial in each interval and the degree of smoothness at each knot (continuous first derivative in this case). The spline function is then used in a model equation. In this case, the function is used in an LMM framework where the fixed and random effects time components (age over childhood) are the spline function.

2.4.3.2 Model Fit

The basic model for the jth individual is as follows:

$$\mathrm{BMI}_{jt} = \beta_0 + \sum_{i} \beta_i (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_{k} \gamma_k \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + u_{0j} + \sum_{i} b_{ij} (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_{k} \eta_{kj} \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + \varepsilon_{jt}$$

where $\kappa_k$ is the kth knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if $t > \kappa_k$; this is known as the truncated power basis, which ensures smooth continuity between the time windows. Various numbers and positions of knots and degrees of polynomial between knots were tested to find the best fit to the data. Knot points were initially estimated visually from both individual profiles and the population average curve in males and females. To optimise the number and placement of the knot points, a series of models was fit with the knot points placed at 6-month intervals around the visually estimated placement, and additional knot points were incorporated to see whether they improved the model fit. The model with the lowest Akaike Information Criterion (AIC) was selected as the final model. Finally, the degree of polynomial required for each spline was investigated, up to the third degree, once again selecting the model with the lowest AIC.
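The truncated power basis described above can be sketched as a function that returns one row of the design matrix for a given age. This is an illustrative Python version, not the R code used for the analysis: the function names are hypothetical, knots are supplied in years and centred at eight years to match the centred age scale, and a generic cubic basis is assumed, so the columns need not correspond one-to-one with the spline terms reported in the tables.

```python
def truncated_power(x, knot, degree=3):
    """(x - knot)_+ ** degree: zero at or below the knot, a power above it."""
    diff = x - knot
    return diff ** degree if diff > 0 else 0.0


def spline_design_row(age, knots, degree=3, centre=8.0):
    """Polynomial terms in centred age followed by one truncated term per knot."""
    x = age - centre
    poly = [x ** power for power in range(degree + 1)]
    trunc = [truncated_power(x, knot - centre, degree) for knot in knots]
    return poly + trunc


# Example: knots at two, eight and 12 years, evaluated at age 10.
row = spline_design_row(10.0, knots=[2.0, 8.0, 12.0])
```

Each truncated term switches on only after its knot, so its coefficient measures how the curve changes beyond that age while the function and its lower-order derivatives remain continuous at the knot.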

For females and males, the optimal model was with three knot points placed at two, eight and

12 years with a cubic slope for each spline. The full spline function was used in both females

and males for the fixed effects, but only the first two parameters corresponding to the

intercept and linear time over the whole period were used for the random effects. The model

output and diagnostics for females and males are presented in Table 2.9 and Table 2.10

respectively.

Similar to the LMM plots, Figure 2.8 shows the residuals for females; the residuals versus age

in Plot A show that the residuals are relatively constant over time, with a few outliers. The

predicted values from the model fit fairly well to the observed BMI values (Plot B), with a

tighter fit to the x=y line than the LMM method and not as much underestimation of the large

BMI values. The variability in the standardised residuals is more constant across the range of

predicted values than in the LMM method (Plot C). Plots D to F are similar to the LMM method,

with not much improvement by adding the smoothing splines to the fixed and random effects

rather than the polynomial function.


Table 2.9: Details of SPLMM model in females (N=733, n=4377). Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Goodness of fit criteria
  AIC               -9515.20
  BIC               -9425.80
  Log Likelihood     4771.60

Random Effects
              SD        Correlation
                        Intercept   Age
  Intercept   0.1297
  Age         0.0121     0.7710
  Age2        0.0024    -0.6900     -0.4380
  Residual    0.0503

Fixed effects
              Value      SE       DF     t-value   P-Value
  Intercept    2.8125    0.0051   3638   546.32    <0.001
  Age          0.0339    0.0008   3638    40.66    <0.001
  Age2         0.0096    0.0011   3638     8.71    <0.001
  Age3         0.00003   0.0006   3638     0.05    0.96
  Spline 1    -0.0036    0.0011   3638    -3.22    0.001
  Spline 2     0.0029    0.0014   3638     2.14    0.03
  Spline 3     0.1437    0.0643   3638     2.23    0.03

Correlation
              Intercept   Age     Age2    Age3    Spline 1   Spline 2
  Age          0.38
  Age2        -0.31       0.06
  Age3        -0.23      -0.13    0.95
  Spline 1     0.24      -0.08   -0.99   -0.96
  Spline 2    -0.19       0.37    0.86    0.75   -0.89
  Spline 3    -0.13      -0.26    0.65    0.81   -0.70       0.45

Standardized Within-Group Residuals
  Min       1st quartile   Median   3rd quartile   Max
  -5.5373   -0.4908        0.0039   0.4499         6.8332


Figure 2.8: Model diagnostic plots from SPLMM model fit to the data from females in the

Raine Study.


Table 2.10: Details of SPLMM model in males (N=773, n=4,609). Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

Goodness of fit criteria
  AIC               -10036.18
  BIC                -9946.10
  Log Likelihood      5032.09

Random Effects
              SD        Correlation
                        Intercept   Age
  Intercept   0.1246
  Age         0.0125     0.7560
  Age2        0.0024    -0.6820     -0.3640
  Residual    0.0505

Fixed effects
              Value     SE       DF     t-value   P-Value
  Intercept    2.8036   0.0049   3830   576.85    <0.001
  Age          0.0308   0.0008   3830    37.79    <0.001
  Age2         0.0119   0.0011   3830    10.96    <0.001
  Age3         0.0008   0.0005   3830     1.42    0.16
  Spline 1    -0.0059   0.0011   3830    -5.35    <0.001
  Spline 2     0.0082   0.0014   3830     6.06    <0.001
  Spline 3     0.2095   0.0629   3830     3.33    0.001

Correlation
              Intercept   Age     Age2    Age3    Spline 1   Spline 2
  Age          0.38
  Age2        -0.33       0.06
  Age3        -0.25      -0.13    0.95
  Spline 1     0.25      -0.07   -0.99   -0.97
  Spline 2    -0.20       0.36    0.86    0.75   -0.89
  Spline 3    -0.14      -0.26    0.65    0.81   -0.70       0.46

Standardized Within-Group Residuals
  Min       1st quartile   Median    3rd quartile   Max
  -4.4810   -0.4565        -0.0076   0.4522         4.2693


Figure 2.9: Model diagnostic plots from SPLMM model fit to the data from males in the

Raine Study

The plots in Figure 2.9 for males are similar to those from the LMM method. It appears that

the smoothing splines allow for better prediction of BMI values than the LMM, as shown in

Plots B and C.


When run in 64-bit R version 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU (L 640 @ 2.13GHz), each basic model, as described above, took on average 17.31 seconds (17.27-17.34 seconds) for the males and 17.97 seconds (17.94-18.00 seconds) for the females.


2.4.4 Non-Linear Mixed Effects Model (NLMM); also known as the SuperImposition

by Translation And Rotation (SITAR) Model

The minimum point in the BMI trajectory, which occurs at about 5-7 years of age for most individuals, is commonly known as the adiposity rebound. This rebound can differ between individuals in three distinct ways: 1) size, where some individuals' minimum BMI is lower than others'; 2) timing, where some individuals reach their minimum BMI earlier than others; and 3) duration, where some individuals stay at their minimum for longer than others. The adiposity rebound is an important marker for later adult disease, as outlined in Chapter 1, Section 1.5.1.2. All three aspects of the adiposity rebound are targeted by the SITAR model, which is another extension to the LMM whereby the response is not assumed to be Gaussian but may come from some other exponential distribution.


The SITAR model [220] was recently defined to summarize height growth in puberty (in

particular peak height velocity) and estimate subject-specific parameters that can be used to

investigate relationships with earlier exposures and later outcomes. The SITAR method

(referred to here as NLMM) has a single fitted curve at the population level and individual level

estimates of mean differences in size (shifting up or down of the BMI curve), growth tempo

(left-right shift of the curve on the age scale) and velocity (shrinking or stretching of the age

scale).

The basic model for the growth curve is:

$$y_{it} = \alpha_i + h\!\left(\frac{t - \beta_i}{e^{-\gamma_i}}\right) + \varepsilon_{it}$$

Where:

$y_{it}$ = growth of subject i at age t

$h(t)$ = natural cubic spline curve of growth versus age

$\alpha_i$ = random growth intercept that adjusts for differences in mean BMI (size)

$\beta_i$ = random growth intercept to adjust for differences in timing (tempo)

$\gamma_i$ = random age scaling adjusting for the duration of the growth spurt (velocity)

$\varepsilon_{it}$ = within subject residual error

This model was fitted using the nlme function in R with the h(t) function estimated as a fixed-effect natural cubic spline.
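The roles of the three subject-specific parameters can be illustrated with a short Python sketch of the mean curve. The function `sitar_mean` and the placeholder curve `example_h` are hypothetical: a simple quadratic with its minimum at six years stands in for the fitted natural cubic spline, purely to show how size, tempo and velocity move an individual's curve relative to the population curve.

```python
import math


def sitar_mean(t, alpha_i, beta_i, gamma_i, h):
    """Subject-specific mean: alpha_i + h((t - beta_i) / exp(-gamma_i))."""
    return alpha_i + h((t - beta_i) / math.exp(-gamma_i))


def example_h(age):
    """Placeholder population curve (assumed shape, not the fitted spline)."""
    return 15.0 + 0.01 * (age - 6.0) ** 2


# Size shifts the curve up or down without changing its shape.
shifted_up = sitar_mean(6.0, 1.5, 0.0, 0.0, example_h)

# Tempo shifts the curve along the age axis: with beta_i = 1 the
# individual reaches at age 7 what the population reaches at age 6.
later_rebound = sitar_mean(7.0, 0.0, 1.0, 0.0, example_h)

# Velocity rescales age: gamma_i = ln(2) runs through the curve twice as fast.
faster = sitar_mean(3.0, 0.0, 0.0, math.log(2.0), example_h)
```

With all three parameters equal to zero the individual follows the population curve exactly, which is why the subject-level estimates can be read as departures from the average trajectory.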


2.4.4.2 Model Fit

This model was fit with the three parameters (size, tempo and velocity) as random effects, size and velocity as fixed effects, and h(t) as a natural cubic spline curve with 3 to 8 degrees of freedom (df) fitted as fixed effects. BMI and age were each fit both untransformed and natural log transformed to identify the best fit to the data. Model fit was compared using AIC, deviance and residual standard deviation.

The optimal model for females had a natural cubic spline curve with three df and both BMI and

age on the natural log transformed scale. Similarly, the optimal model for males was with BMI

and age on the natural log transformed scale but with four df for the natural cubic spline

curve. The model output from each of the models is presented in Table 2.11 and Figure 2.10

for females and Table 2.12 and Figure 2.11 for males.


Table 2.11: Details of NLMM model in females (N=733, n=4,377)

Goodness of fit criteria
  AIC               -9575.00
  BIC               -9498.39

Random Effects
              SD        Correlation
                        Size       Tempo
  Size        0.0739
  Tempo       0.2774    -0.4960
  Velocity    0.2294    -0.3690    0.4940
  Residual    0.0513

Fixed effects
              Value    SE       DF     t-value   P-Value
  Knot 1      0.0220   0.0053   3640     4.17    <0.001
  Knot 2      0.0753   0.0058   3640    13.05    <0.001
  Knot 3      0.2408   0.0128   3640    18.78    <0.001
  Size        2.7790   0.0037   3640   742.08    <0.001
  Velocity    0.5571   0.0392   3640    14.20    <0.001

Correlation
              Knot 1   Knot 2   Knot 3   Size
  Knot 2       0.80
  Knot 3       0.48     0.69
  Size        -0.02     0.21     0.58
  Velocity    -0.14    -0.43    -0.91    -0.68

Standardized Within-Group Residuals
  Min       1st quartile   Median    3rd quartile   Max
  -5.6123   -0.4751        -0.0144   0.4474         6.3412


Figure 2.10: Model diagnostic plots from NLMM model fit to the data from females in the

Raine Study.


Table 2.12: Details of NLMM model in males (N=773, n=4,609)

Goodness of fit criteria
  AIC               -10099.41
  BIC               -10015.75

Random Effects
              SD        Correlation
                        Size       Tempo
  Size        0.0745
  Tempo       0.2819    -0.5540
  Velocity    0.2001    -0.5340    0.5630
  Residual    0.0521

Fixed effects
              Value     SE       DF     t-value   P-Value
  Knot 1      -0.1792   0.0242   3831    -7.42    <0.001
  Knot 2       0.1314   0.0256   3831     5.14    <0.001
  Knot 3       0.2390   0.0453   3831     5.28    <0.001
  Knot 4       0.4917   0.0658   3831     7.47    <0.001
  Size         2.8841   0.0150   3831   192.88    <0.001
  Velocity    -0.0372   0.0838   3831    -0.44    0.6576

Correlation
              Knot 1   Knot 2   Knot 3   Knot 4   Size
  Knot 2      -0.92
  Knot 3      -0.94     0.98
  Knot 4      -0.96     0.97     0.99
  Size        -0.97     0.92     0.93     0.96
  Velocity     0.98    -0.96    -0.98    -0.99    -0.97

Standardized Within-Group Residuals
  Min       1st quartile   Median    3rd quartile   Max
  -5.9288   -0.4479        -0.0150   0.4276         4.7460


Figure 2.11: Model diagnostic plots from NLMM model fit to the data from males in the

Raine Study.

The estimates for the three parameters (size, tempo and velocity) were extracted for each individual from the best fitting NLMM model and used for genetic analyses. Firstly, these parameters were investigated against later outcomes, including BMI, waist circumference, waist-hip ratio and suprailiac skinfold at 17 years of age (Table 2.13). All three parameters were highly predictive of these outcomes in females, whereas only size and tempo were predictive of outcome in males. A later age of adiposity rebound is associated with decreased markers of obesity in both males and females. Additionally, a shorter rebound period is associated with increased markers of obesity, in females only. Males with a longer rebound period had lower suprailiac skinfolds than males with short rebound periods.


Table 2.13: Results from association analysis between the estimates of the three parameters from the NLMM model and markers of obesity at age 17 years.

                    Log(BMI)               Log(Waist Circumference)   Waist-hip ratio        Suprailiac skinfold (log for males)
                    Effect (SE)     P      Effect (SE)     P          Effect (SE)     P      Effect (SE)      P
Male     Size        1.29 (0.09)   <0.01    0.80 (0.07)   <0.01        0.17 (0.04)   <0.01     3.40 (0.35)   <0.01
         Tempo      -0.48 (0.02)   <0.01   -0.31 (0.02)   <0.01       -0.08 (0.01)   <0.01    -1.41 (0.08)   <0.01
         Velocity   -0.04 (0.05)    0.36   -0.04 (0.04)    0.34        0.02 (0.02)    0.15    -0.62 (0.17)   <0.01
Female   Size        1.41 (0.10)   <0.01    0.75 (0.09)   <0.01        0.14 (0.04)   <0.01    60.54 (6.95)   <0.01
         Tempo      -0.46 (0.02)   <0.01   -0.28 (0.02)   <0.01       -0.07 (0.01)   <0.01   -22.98 (1.76)   <0.01
         Velocity    0.26 (0.04)   <0.01    0.18 (0.03)   <0.01        0.07 (0.02)   <0.01    15.26 (2.72)   <0.01


There are two different ways genetic variables could be added into these models. Firstly, they could be included in the model itself; when 100 basic models as described above were run in 64-bit R version 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU (L 640 @ 2.13GHz), each model took on average 36.68 seconds (36.62-36.74 seconds) for males and 27.00 seconds (26.85-27.09 seconds) for females. Alternatively, the predictions (estimates of size, tempo and velocity for each individual) could be extracted and treated as the dependent variables in a linear regression model, which would dramatically reduce the time for each genetic locus (time not shown).

2.5 Genetic Associations

To investigate whether these chosen methods are appropriate for detecting genetic markers that have an effect on childhood BMI and the change in BMI over childhood, 17 genetic variants were selected to include in the best model from each method.

2.5.1 SNP Selection

The 17 genetic variants published in den Hoed et al [204] were selected to investigate the

association with childhood BMI, and more importantly the change in BMI over childhood.

These SNPs were initially discovered to be associated with adult BMI and were subsequently

replicated in at least one study of BMI and change in BMI over childhood. At the time of

selecting SNPs for this study, they were the largest set of SNPs shown to be associated with

BMI over childhood and adolescence. Loci illustrated to be associated with only obesity risk

and not BMI were excluded. Subsets of these 17 SNPs (either the same SNPs or a SNP in high

LD [r2>0.8]) were also presented by Elks et al [206] and Hardy et al [205], who showed


associations with changes in growth over childhood. Genotype information on these 17

published genetic variants was available for individuals in the Raine Study sample (genotype

data was described in Section 1.6.1.3.), either directly genotyped SNPs (rs925946 (BDNF),

rs10913469 (SEC16B), rs2605100 (LYPLAL1), rs987237 (TFAP2B), rs10838738 (MTCH2),

rs7138803 (BCDIN3D) and rs10146997 (NRXN3)) or from the best guess genotype data

imputed against HapMap release 22 (rs2815752 (NEGR1), rs6548238 (TMEM18), rs7647305

(ETV5), rs10938397 (GNPDA2), rs613080 (MRSA), rs1488830 (BDNF), rs8055138 (SH2B1),

rs1121980 (FTO), rs17782313 (MC4R) and rs11084753 (KCTD15)). Two variants in the BDNF

gene, previously shown to be independently associated with obesity [179], were

investigated (r2 =0.11). The 17 SNPs are described in Table 2.14, including the available sample

size with complete data for each SNP. These 17 SNPs were used to investigate the sensitivity of

each method to detect genetic effects in terms of point estimates and standard errors across

various time points. Each SNP was incorporated into the model independently assuming an

additive genetic effect for the BMI increasing allele. In addition, an ‘obesity-risk allele score’

was created on the subset of individuals with complete genetic data by summing the number

of risk alleles an individual had (n=1,219) [231]. The alleles were not weighted by their effect

size, as this has previously been shown to only have limited benefit [232].
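An unweighted allele score of this kind can be sketched as follows. The function name, the genotype coding (0, 1 or 2 copies of the BMI-increasing allele) and the example genotypes are assumptions for illustration; individuals with any missing genotype are given no score, mirroring the restriction to complete genetic data.

```python
def allele_score(risk_allele_counts):
    """Sum of risk-allele counts across SNPs, or None if any genotype is missing."""
    counts = list(risk_allele_counts.values())
    if any(count is None for count in counts):
        return None
    return sum(counts)


# Hypothetical genotypes for three of the 17 SNPs.
complete = {"rs1121980": 2, "rs17782313": 1, "rs6548238": 0}
incomplete = {"rs1121980": 2, "rs17782313": None, "rs6548238": 0}

score_complete = allele_score(complete)
score_incomplete = allele_score(incomplete)
```

With 17 SNPs coded this way, the score ranges from 0 to 34 and enters the model as a single additive covariate.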

Due to the population stratification detected in the Raine Study (see Chapter 1, Section

1.6.1.3), analysis was conducted adjusting for the first five principal components generated in

the EIGENSTRAT software [39]. No adjustment for multiple testing has been made as the aim

was to estimate a combined effect of SNPs that have already been validated in previous

studies and shown to be significantly associated with childhood BMI and growth. Therefore,

genetic loci were considered associated with BMI if the global LRT was significant at an α<0.05

level.


Table 2.14: Characteristics from the Raine Study sample of the 17 SNPs investigated in each of the statistical methods

SNP          Gene      Chr.    Obesity Risk Allele   MAF    HWE-P   Major Homozygote   Heterozygote   Minor Homozygote   N female   N male
rs2815752    NEGR1     1p31    A                     0.38   0.47     583               687            219                724        765
rs10913469   SEC16B    1q25    G                     0.20   0.57     953               471             64                723        765
rs2605100    LYPLAL1   1q41    G                     0.30   0.71     726               623            140                724        765
rs6548238    TMEM18    2p25    C                     0.15   0.48    1029               370             38                698        739
rs7647305    ETV5      3q28    C                     0.21   0.11     928               508             53                724        765
rs10938397   GNPDA2    4p12    G                     0.44   0.02     492               678            301                716        755
rs987237     TFAP2B    6p12    G                     0.19   0.24     970               473             46                724        765
rs613080     MRSA      8       G                     0.14   0.19    1072               368             41                720        761
rs10838738   MTCH2     11p11   G                     0.36   0.96     604               690            195                724        765
rs1488830    BDNF      11p13   T                     0.21   0.38     938               472             68                717        761
rs925946     BDNF      11p13   T                     0.30   0.11     745               597            146                723        765
rs7138803    BCDIN3D   12q13   A                     0.37   0.11     612               662            215                724        765
rs10146997   NRXN3     14q31   G                     0.22   0.15     919               487             80                722        764
rs8055138    SH2B1     16p11   T                     0.38   0.87     573               691            213                718        759
rs1121980    FTO       16q12   T                     0.41   0.83     509               713            243                712        753
rs17782313   MC4R      18q22   C                     0.23   0.83     881               531             77                724        765
rs11084753   KCTD15    19q13   G                     0.35   0.43     574               591            168                645        688
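The minor allele frequencies in Table 2.14 follow directly from the genotype counts, as the short sketch below shows for two of the SNPs. The function name is hypothetical; the counts are those tabulated for rs2815752 and rs10913469.

```python
def minor_allele_frequency(major_hom, het, minor_hom):
    """MAF = (2 * minor homozygotes + heterozygotes) / (2 * genotyped individuals)."""
    n_individuals = major_hom + het + minor_hom
    return (2 * minor_hom + het) / (2 * n_individuals)


maf_negr1 = minor_allele_frequency(583, 687, 219)    # rs2815752
maf_sec16b = minor_allele_frequency(953, 471, 64)    # rs10913469
```

Rounded to two decimal places, these reproduce the tabulated values of 0.38 and 0.20.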

2.5.2 Cross-Sectional Analyses

Cross-sectional analyses at each scheduled follow-up time were conducted to characterize the

genetic associations of each locus. Sex stratified linear models were used, adjusting for mean

centred age (Age ) and the SNP under an additive genetic model. In both males and females,

BMI (and the residuals) were approximately normally distributed from years one to three,

however the distribution became increasingly skewed from year six, and hence the natural

logarithm of BMI was used. The results are presented in Table 2.15.

In females, a consistently significant association between BMI and the BMI-increasing allele of the SEC16B rs10913469 SNP was detected from age 6 to 17, with an increasing effect size over time (β=0.0171 at age 6 and β=0.0372 by age 17). The BMI-increasing allele of the BCDIN3D rs7138803 SNP was associated with BMI at age 13, which may indicate that this SNP has an effect on BMI after puberty. Finally, there was a significant association between BMI and the BMI-increasing allele of the KCTD15 rs11084753 SNP at ages 8 and 13.

The results in males are more complex than those in females, with associations between a number of SNPs and BMI at multiple ages. The BMI-increasing alleles of the loci in SEC16B, GNPDA2, MRSA, MTCH2, BDNF (rs1488830), BCDIN3D, NRXN3 and SH2B1 all had significant associations at a single time point. The BMI-increasing allele of the TMEM18 rs4854344 SNP was associated with BMI at ages 10 and 13, with a slight increase in effect size over time (β=0.0229 at age 10 and β=0.0249 by age 13). Similarly, there was an association between MC4R and BMI at ages 13 and 16 years and between the BMI-increasing allele of TFAP2B rs987237 and BMI at ages 3 to 10. Finally, the BMI-increasing allele of FTO rs3751812 showed strong associations with BMI from age 6 to 16.

Each of these associations is based on a significance level of α=0.05, which is not an appropriate threshold given the number of tests conducted. Using a Bonferroni adjustment for the number of SNPs and the number of time points analysed in each sex (α=0.00037), only the FTO association at age eight in males would remain significantly associated with BMI.
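The Bonferroni threshold quoted above appears to correspond to 0.05 divided by 17 SNPs times 8 follow-ups. The short check below makes that arithmetic explicit and applies the threshold to the male FTO P-values from Table 2.15; the variable names are illustrative.

```python
n_snps = 17          # loci tested (Table 2.15)
n_time_points = 8    # follow-ups at years 1, 2, 3, 6, 8, 10, 14 and 17
alpha = 0.05

bonferroni_alpha = alpha / (n_snps * n_time_points)   # 0.05 / 136
print(f"Bonferroni threshold: {bonferroni_alpha:.5f}")

# P-values for FTO rs1121980 in males, copied from Table 2.15
fto_male_p = {"Year 6": 0.004, "Year 8": 9.01e-6, "Year 10": 4.18e-4,
              "Year 14": 0.004, "Year 17": 0.004}
passing = [year for year, p in fto_male_p.items() if p < bonferroni_alpha]
print("Follow-ups passing the threshold:", passing)
```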


Table 2.15: Summary of cross-sectional results for the 17 SNPs. Significant P-Values are in bold.

SNP (Gene) Year Females Beta (SE) Females P Males Beta (SE) Males P

rs2815752 (NEGR1) Year 1 0.0504 (0.0784) 0.52 0.0163 (0.0749) 0.83
Year 2 -0.1269 (0.1306) 0.33 0.2049 (0.1282) 0.11
Year 3 0.0685 (0.0884) 0.44 0.1018 (0.0777) 0.19
Year 6 0.0094 (0.0062) 0.13 0.0069 (0.0055) 0.21
Year 8 0.0106 (0.0082) 0.20 0.0092 (0.0076) 0.23
Year 10 0.0135 (0.0101) 0.18 0.0102 (0.0095) 0.29
Year 14 0.0074 (0.0108) 0.49 0.0151 (0.0104) 0.15
Year 17 -0.0040 (0.0116) 0.73 0.0110 (0.0113) 0.33

rs10913469 (SEC16B) Year 1 -0.0138 (0.0974) 0.89 -0.0497 (0.0872) 0.57
Year 2 0.0974 (0.1714) 0.57 0.3207 (0.1573) 0.04
Year 3 0.1972 (0.1098) 0.07 -0.0311 (0.0910) 0.73
Year 6 0.0171 (0.0077) 0.03 0.0022 (0.0064) 0.73
Year 8 0.0306 (0.0102) 0.003 -0.0028 (0.0088) 0.75
Year 10 0.0296 (0.0125) 0.02 -0.0007 (0.0110) 0.95
Year 14 0.0424 (0.0132) 0.001 0.0134 (0.0122) 0.27
Year 17 0.0372 (0.0144) 0.01 -0.0175 (0.0132) 0.19

rs2605100 (LYPLAL1) Year 1 -0.1300 (0.0810) 0.11 -0.1091 (0.0774) 0.16
Year 2 0.0080 (0.1390) 0.95 -0.0257 (0.1346) 0.85
Year 3 -0.0910 (0.0928) 0.33 -0.0846 (0.0806) 0.29
Year 6 -0.0010 (0.0066) 0.88 -0.0073 (0.0058) 0.21
Year 8 0.0011 (0.0087) 0.90 -0.0031 (0.0079) 0.69
Year 10 -0.0027 (0.0108) 0.80 -0.0016 (0.0099) 0.87
Year 14 -0.0033 (0.0111) 0.77 -0.0085 (0.0109) 0.44
Year 17 -0.0118 (0.0122) 0.33 -0.0052 (0.0117) 0.66

rs6548238 (TMEM18) Year 1 0.0138 (0.0874) 0.87 0.0842 (0.0892) 0.35
Year 2 -0.0778 (0.1479) 0.60 0.0463 (0.1591) 0.77
Year 3 0.1032 (0.0970) 0.29 0.0133 (0.0921) 0.89
Year 6 0.0048 (0.0071) 0.50 0.0040 (0.0068) 0.56
Year 8 0.0036 (0.0093) 0.70 0.0138 (0.0093) 0.14
Year 10 0.0029 (0.0115) 0.80 0.0229 (0.0114) 0.04
Year 14 -0.0002 (0.0119) 0.99 0.0249 (0.0126) 0.05
Year 17 0.0104 (0.0132) 0.43 0.0260 (0.0138) 0.06

rs7647305 (ETV5) Year 1 0.0919 (0.0966) 0.34 0.0973 (0.0901) 0.28
Year 2 0.0972 (0.1576) 0.54 0.1157 (0.1473) 0.43
Year 3 0.0453 (0.1065) 0.67 -0.0002 (0.0961) 1.00
Year 6 -0.0042 (0.0078) 0.59 0.0035 (0.0067) 0.60
Year 8 -0.0021 (0.0101) 0.84 0.0054 (0.0093) 0.56
Year 10 -0.0006 (0.0126) 0.96 0.0112 (0.0117) 0.34
Year 14 0.0021 (0.0131) 0.87 0.0089 (0.0126) 0.48
Year 17 -0.0055 (0.0141) 0.70 0.0207 (0.0132) 0.12


Table 2.15 continued

SNP (Gene) Year Females Beta (SE) Females P Males Beta (SE) Males P

rs10938397 (GNPDA2) Year 1 -0.0453 (0.0714) 0.53 0.0539 (0.0724) 0.46
Year 2 0.0224 (0.1196) 0.85 0.1623 (0.1380) 0.24
Year 3 0.0500 (0.0800) 0.53 0.0863 (0.0755) 0.25
Year 6 0.0014 (0.0058) 0.81 0.0067 (0.0054) 0.22
Year 8 0.0023 (0.0077) 0.77 0.0054 (0.0074) 0.47
Year 10 0.0088 (0.0094) 0.35 0.0214 (0.0093) 0.02
Year 14 0.0108 (0.0097) 0.27 0.0102 (0.0102) 0.32
Year 17 0.0191 (0.0106) 0.07 0.0127 (0.0110) 0.25

rs987237 (TFAP2B) Year 1 0.1815 (0.0991) 0.07 0.0098 (0.0929) 0.92
Year 2 0.0395 (0.1742) 0.82 0.0055 (0.1789) 0.98
Year 3 0.0501 (0.1109) 0.65 0.2436 (0.0937) 0.01
Year 6 0.0086 (0.0079) 0.28 0.0188 (0.0068) 0.01
Year 8 0.0067 (0.0106) 0.53 0.0219 (0.0093) 0.02
Year 10 0.0056 (0.0129) 0.67 0.0227 (0.0118) 0.05
Year 14 0.0150 (0.0136) 0.27 0.0181 (0.0127) 0.16
Year 17 -0.0080 (0.0148) 0.59 0.0232 (0.0137) 0.10

rs613080 (MRSA) Year 1 -0.0486 (0.1028) 0.64 0.0368 (0.1002) 0.71
Year 2 -0.1139 (0.1927) 0.56 -0.4622 (0.1985) 0.02
Year 3 0.1386 (0.1114) 0.21 0.0190 (0.1017) 0.85
Year 6 -0.0015 (0.0084) 0.86 0.0069 (0.0074) 0.35
Year 8 -0.0108 (0.0109) 0.32 0.0103 (0.0101) 0.31
Year 10 -0.0198 (0.0134) 0.14 -0.0041 (0.0128) 0.75
Year 14 0.0077 (0.0138) 0.58 0.0102 (0.0139) 0.46
Year 17 0.0181 (0.0150) 0.23 0.0084 (0.0147) 0.57

rs10838738 (MTCH2) Year 1 -0.0561 (0.0763) 0.46 -0.0248 (0.0768) 0.75
Year 2 0.0275 (0.1251) 0.83 -0.0678 (0.1393) 0.63
Year 3 -0.0326 (0.0840) 0.70 0.1353 (0.0781) 0.08
Year 6 -0.0047 (0.0061) 0.45 0.0115 (0.0057) 0.04
Year 8 -0.0018 (0.0082) 0.83 0.0143 (0.0078) 0.07
Year 10 -0.0026 (0.0101) 0.79 0.0135 (0.0097) 0.17
Year 14 0.0015 (0.0104) 0.89 0.0121 (0.0107) 0.26
Year 17 -0.0063 (0.0114) 0.58 0.0102 (0.0114) 0.37

rs1488830 (BDNF) Year 1 -0.0381 (0.0916) 0.68 0.1481 (0.0868) 0.09
Year 2 0.1963 (0.1577) 0.22 0.2860 (0.1548) 0.07
Year 3 -0.0225 (0.1020) 0.83 0.0411 (0.0941) 0.66
Year 6 0.0041 (0.0074) 0.58 0.0092 (0.0065) 0.16
Year 8 0.0098 (0.0097) 0.31 0.0169 (0.0089) 0.06
Year 10 0.0176 (0.0122) 0.15 0.0137 (0.0110) 0.21
Year 14 0.0010 (0.0124) 0.94 0.0270 (0.0124) 0.03
Year 17 0.0055 (0.0132) 0.68 0.0224 (0.0131) 0.09


Table 2.15 continued

SNP (Gene) Year Females Beta (SE) Females P Males Beta (SE) Males P

rs925946 (BDNF) Year 1 -0.0960 (0.0812) 0.40 0.1370 (0.0771) 0.08
Year 2 0.0969 (0.1493) 0.52 0.1488 (0.1377) 0.28
Year 3 -0.1415 (0.0885) 0.11 0.0274 (0.0818) 0.74
Year 6 -0.0066 (0.0065) 0.31 0.0062 (0.0058) 0.29
Year 8 -0.0072 (0.0085) 0.40 0.0047 (0.0079) 0.55
Year 10 -0.0019 (0.0105) 0.86 0.0063 (0.0100) 0.53
Year 14 -0.0039 (0.0111) 0.72 0.0100 (0.0108) 0.36
Year 17 -0.0001 (0.0119) 0.99 0.0087 (0.0114) 0.45

rs7138803 (BCDIN3D) Year 1 -0.0185 (0.0778) 0.81 -0.0191 (0.0731) 0.79
Year 2 0.0826 (0.1308) 0.53 -0.0309 (0.1277) 0.81
Year 3 0.0974 (0.0847) 0.25 0.0385 (0.0762) 0.61
Year 6 0.0096 (0.0063) 0.13 0.0083 (0.0054) 0.12
Year 8 0.0152 (0.0082) 0.06 0.0140 (0.0073) 0.06
Year 10 0.0121 (0.0104) 0.25 0.0158 (0.0092) 0.09
Year 14 0.0229 (0.0106) 0.03 0.0223 (0.0101) 0.03
Year 17 0.0182 (0.0116) 0.12 0.0197 (0.0111) 0.08

rs10146997 (NRXN3) Year 1 0.0554 (0.0890) 0.53 -0.0020 (0.0866) 0.98
Year 2 0.1023 (0.1382) 0.46 -0.2526 (0.1515) 0.10
Year 3 0.0055 (0.0987) 0.96 0.1056 (0.0868) 0.23
Year 6 0.0024 (0.0072) 0.74 -0.0027 (0.0065) 0.68
Year 8 -0.0005 (0.0095) 0.96 -0.0053 (0.0089) 0.55
Year 10 -0.0074 (0.0117) 0.53 -0.0085 (0.0109) 0.44
Year 14 0.0102 (0.0123) 0.41 -0.0226 (0.0120) 0.06
Year 17 0.0027 (0.0133) 0.84 -0.0409 (0.0130) 0.002

rs8055138 (SH2B1) Year 1 -0.0480 (0.0761) 0.53 0.0838 (0.0766) 0.27
Year 2 0.0145 (0.1315) 0.91 0.0116 (0.1357) 0.92
Year 3 -0.0212 (0.0837) 0.80 0.1382 (0.0782) 0.08
Year 6 0.0010 (0.0061) 0.87 0.0120 (0.0056) 0.03
Year 8 0.0027 (0.0080) 0.73 0.0154 (0.0077) 0.05
Year 10 -0.0006 (0.0101) 0.95 0.0140 (0.0096) 0.14
Year 14 0.0106 (0.0104) 0.31 0.0148 (0.0105) 0.16
Year 17 0.0082 (0.0114) 0.47 0.0073 (0.0113) 0.52

rs1121980 (FTO) Year 1 0.0281 (0.0774) 0.72 0.0235 (0.0752) 0.75
Year 2 0.1126 (0.1337) 0.40 0.0106 (0.1374) 0.94
Year 3 0.0546 (0.0866) 0.53 0.0580 (0.0776) 0.46
Year 6 0.0092 (0.0063) 0.14 0.0159 (0.0056) 0.004
Year 8 0.0144 (0.0082) 0.08 0.0339 (0.0076) 9.01x10^-6
Year 10 0.0202 (0.0101) 0.05 0.0336 (0.0095) 4.18x10^-4
Year 14 0.0159 (0.0106) 0.14 0.0297 (0.0103) 0.004
Year 17 0.0079 (0.0116) 0.49 0.0326 (0.0112) 0.004


Table 2.15 continued

SNP (Gene) Year Females Beta (SE) Females P Males Beta (SE) Males P

rs17782313 (MC4R) Year 1 -0.0462 (0.0884) 0.60 -0.0203 (0.0892) 0.82
Year 2 -0.0721 (0.1445) 0.62 0.0823 (0.1571) 0.60
Year 3 0.0344 (0.0989) 0.73 -0.0068 (0.0898) 0.94
Year 6 -0.0042 (0.0071) 0.56 0.0071 (0.0065) 0.28
Year 8 0.0027 (0.0093) 0.77 0.01411 (0.0089) 0.11
Year 10 0.0102 (0.0116) 0.38 0.0124 (0.0112) 0.27
Year 14 -0.0023 (0.0120) 0.85 0.0265 (0.0122) 0.03
Year 17 0.0002 (0.0132) 0.99 0.0276 (0.0131) 0.04

rs11084753 (KCTD15) Year 1 -0.0049 (0.0697) 0.94 0.0758 (0.0679) 0.26
Year 2 -0.0830 (0.1164) 0.48 -0.0448 (0.1175) 0.70
Year 3 0.0357 (0.0774) 0.65 0.0880 (0.0734) 0.23
Year 6 0.0057 (0.0056) 0.30 -0.0022 (0.0051) 0.67
Year 8 0.0177 (0.0073) 0.02 -0.0006 (0.0069) 0.93
Year 10 0.0139 (0.0090) 0.12 -0.0123 (0.0086) 0.15
Year 14 0.0241 (0.0094) 0.01 -0.0012 (0.0094) 0.90
Year 17 0.0081 (0.0101) 0.42 0.0013 (0.0102) 0.90

Allele score Year 1 -0.0147 (0.0203) 0.47 0.0233 (0.0183) 0.20
Year 2 0.0201 (0.0346) 0.56 0.0361 (0.0332) 0.28
Year 3 0.0231 (0.0220) 0.29 0.0520 (0.0187) 0.01
Year 6 0.0027 (0.0016) 0.09 0.0053 (0.0013) 9.27x10^-5
Year 8 0.0058 (0.0021) 0.01 0.0086 (0.0018) 3.62x10^-6
Year 10 0.0061 (0.0027) 0.02 0.0088 (0.0023) 1.79x10^-4
Year 14 0.0094 (0.0028) 0.001 0.0101 (0.0025) 5.87x10^-5
Year 17 0.0057 (0.0030) 0.06 0.0089 (0.0028) 0.001

2.5.3 Longitudinal Analyses

Several genetic associations were detected between longitudinal BMI and the previously reported adult obesity loci, after adjustment for the first five principal components; however, the detected associations differed by the statistical method used. A likelihood ratio test (LRT) indicated that the LMM method detected one significant association in females and three in males at the 5% level of significance. The STLMM method detected additional genetic associations, with three in females and four in males. The SPLMM was the most efficient at detecting significant associations, with five in females and four in males. Finally, the NLMM method detected no significant SNPs in either females or males for the size parameter, but detected two significant SNPs for the tempo parameter in females and four in males, in addition to one significant SNP for velocity in males. Results for all 17 SNPs can be found in Table 2.16 (females) and Table 2.17 (males).
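The LRT P-values in Tables 2.16 and 2.17 compare a model with the SNP terms against one without them. The sketch below shows the calculation under the assumption that the SNP contributes four parameters (the main effect and its interactions with age, age squared and age cubed), giving a chi-square reference distribution with 4 degrees of freedom; the degrees of freedom are not stated in this section, and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def lrt_p_value(loglik_null, loglik_full, df):
    """Likelihood ratio statistic and P-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)

# Hypothetical maximised log-likelihoods without and with the SNP terms
stat, p = lrt_p_value(loglik_null=-1250.0, loglik_full=-1244.0, df=4)
print(f"LRT statistic = {stat:.1f}, P = {p:.4f}")
```

Both models must be fitted by maximum likelihood (not REML) for the comparison of fixed effects to be valid.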


With the exception of the STLMM method, all methods detected the significant association in females between BMI and the BMI-increasing allele of rs10913469 (SEC16B), observed in the cross-sectional analysis. However, the STLMM method identified two significant associations with loci that were not detected in the cross-sectional analyses, rs987237 (TFAP2B) and rs1121980 (FTO). In addition, the SPLMM method detected four associations with loci that were not observed in the cross-sectional analysis, including TFAP2B, MRSA, BDNF (rs1488830) and NRXN3. None of the longitudinal methods detected the association observed in the cross-sectional analysis with the BMI-increasing allele of rs7138803 (BCDIN3D); this was only a marginal association (P-Value=0.02) at age 13 which did not reach the Bonferroni threshold, and therefore may have been a false positive.

For males, all three SNPs associated with BMI using the LMM method were detected in the cross-sectional analysis; however, nine potentially significant loci were not detected. The STLMM method failed to detect the FTO locus, the only locus that reached the Bonferroni threshold in the cross-sectional analysis. It did, however, detect an association with the rs17782313 (MC4R) locus; this locus was statistically significant in the cross-sectional analysis but none of the other longitudinal methods detected it. The SPLMM method detected four of the nine loci observed to be significantly associated with BMI at one or more time points in the cross-sectional analysis. Finally, the NLMM method detected four of the loci that were significantly associated with BMI in the cross-sectional analyses, one of which was only detected by this longitudinal method (BCDIN3D).


Table 2.16: Summary of longitudinal analyses, using the four methods, for each of the 17 SNPs in females. Significant P-Values are in bold.

LMM STLMM SPLMM NLMM

SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

NEGR1 rs2815752 0.012 0.10 0.16 0.154 0.07 0.19 0.011 0.14 0.07 Size 0.001 0.62
Age:rs2815752 0.001 0.32 0.015 0.23 0.001 0.31 Tempo -0.006 0.54
Age^2:rs2815752 -1x10^-4 0.17 -0.001 0.40 0.001 0.45 Velocity -0.003 0.70
Age^3:rs2815752 -1x10^-5 0.48 -2x10^-4 0.35 0.001 0.31

SEC16B rs10913469 0.029 1x10^-3 0.02 0.217 0.04 0.31 0.030 2x10^-3 0.04 Size 0.002 0.56
Age:rs10913469 0.003 0.03 0.032 0.04 0.003 0.03 Tempo -0.030 0.02
Age^2:rs10913469 -2x10^-4 0.06 -0.001 0.46 -0.001 0.60 Velocity 0.007 0.51
Age^3:rs10913469 -2x10^-7 0.99 -2x10^-4 0.47 -0.001 0.61

LYPLAL1 rs2605100 0.002 0.82 0.12 0.036 0.71 0.34 -2x10^-4 0.98 0.16 Size -0.003 0.32
Age:rs2605100 0.001 0.40 0.001 0.97 0.001 0.51 Tempo -0.005 0.63
Age^2:rs2605100 -1x10^-4 0.19 -0.002 0.09 0.001 0.42 Velocity -0.014 0.12
Age^3:rs2605100 -1x10^-5 0.40 -1x10^-4 0.76 0.001 0.43

TMEM18 rs6548238 0.017 0.07 0.19 0.074 0.55 0.71 0.022 0.02 0.18 Size 0.005 0.22
Age:rs6548238 0.001 0.36 0.018 0.32 0.001 0.61 Tempo -0.008 0.56
Age^2:rs6548238 -9x10^-5 0.48 0.001 0.33 -0.003 0.13 Velocity 0.002 0.87
Age^3:rs6548238 -2x10^-6 0.90 -3x10^-4 0.20 -0.001 0.19

ETV5 rs7647305 -0.001 0.92 0.38 -0.079 0.44 0.42 0.001 0.89 0.31 Size 0.002 0.55
Age:rs7647305 -3x10^-4 0.83 -0.026 0.09 -1x10^-4 0.94 Tempo 0.008 0.55
Age^2:rs7647305 8x10^-5 0.50 0.001 0.41 -0.002 0.24 Velocity 0.008 0.43
Age^3:rs7647305 4x10^-6 0.80 2x10^-4 0.50 -0.001 0.15

Table 2.16 continued

LMM STLMM SPLMM NLMM

SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

GNPDA2 rs10938397 0.004 0.52 0.36 0.056 0.48 0.74 0.002 0.80 0.32 Size -4x10^-4 0.88
Age:rs10938397 0.001 0.22 0.012 0.28 0.002 0.15 Tempo -0.009 0.35
Age^2:rs10938397 1x10^-5 0.89 0.001 0.30 0.002 0.22 Velocity 0.007 0.33
Age^3:rs10938397 -1x10^-5 0.68 -1x10^-4 0.54 0.001 0.34

TFAP2B rs987237 0.008 0.42 0.07 0.163 0.16 0.02 0.008 0.44 0.04 Size 0.004 0.30
Age:rs987237 0.002 0.06 0.040 0.01 -0.001 0.56 Tempo -3x10^-4 0.98
Age^2:rs987237 1x10^-4 0.38 0.003 0.01 -0.001 0.79 Velocity 0.003 0.76
Age^3:rs987237 -4x10^-5 0.02 -4x10^-4 0.12 -4x10^-7 1.00

MRSA rs613080 -0.008 0.43 0.14 0.049 0.64 0.30 -0.012 0.25 0.02 Size 0.001 0.85
Age:rs613080 -3x10^-4 0.79 0.009 0.60 -0.003 0.10 Tempo 0.003 0.80
Age^2:rs613080 2x10^-4 0.15 0.001 0.35 0.001 0.52 Velocity 0.017 0.12
Age^3:rs613080 1x10^-5 0.55 2x10^-4 0.61 0.001 0.47

MTCH2 rs10838738 -0.006 0.44 0.28 0.031 0.70 0.28 -0.005 0.54 0.34 Size -0.002 0.44
Age:rs10838738 0.001 0.53 0.031 0.01 0.001 0.66 Tempo 0.001 0.90
Age^2:rs10838738 4x10^-5 0.66 4x10^-4 0.69 -0.001 0.53 Velocity -0.009 0.27
Age^3:rs10838738 -2x10^-5 0.22 -3x10^-3 0.10 -0.001 0.45

BDNF rs1488830 0.013 0.13 0.09 0.201 0.06 0.04 0.011 0.22 0.01 Size 1x10^-4 0.97
Age:rs1488830 0.003 0.03 0.048 2x10^-3 0.005 4x10^-4 Tempo -0.018 0.16
Age^2:rs1488830 -6x10^-5 0.59 -2x10^-4 0.85 0.002 0.45 Velocity -0.001 0.96
Age^3:rs1488830 -3x10^-5 0.08 -0.001 0.02 -1x10^-4 0.89


SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

BDNF rs925946 -0.007 0.37 0.35 -0.052 0.55 0.32 -0.008 0.31 0.32 Size -0.003 0.29
Age:rs925946 3x10^-4 0.76 0.013 0.32 0.001 0.28 Tempo -0.005 0.62
Age^2:rs925946 6x10^-5 0.51 0.001 0.26 0.001 0.45 Velocity -0.001 0.93
Age^3:rs925946 -1x10^-5 0.46 -4x10^-5 0.86 2x10^-4 0.81

BCDIN3D rs7138803 0.012 0.11 0.09 0.098 0.25 0.82 0.010 0.17 0.10 Size 4x10^-4 0.90
Age:rs7138803 0.002 0.01 0.018 0.13 0.002 0.06 Tempo -0.017 0.11
Age^2:rs7138803 -2x10^-5 0.80 3x10^-5 0.98 4x10^-4 0.79 Velocity 0.008 0.35
Age^3:rs7138803 -2x10^-5 0.19 -2x10^-4 0.49 3x10^-5 0.97

NRXN3 rs10146997 -0.003 0.69 0.30 -0.021 0.84 0.64 -1x10^-5 1.00 0.01 Size 0.001 0.83
Age:rs10146997 7x10^-5 0.95 0.016 0.31 -0.003 0.05 Tempo -0.001 0.93
Age^2:rs10146997 1x10^-4 0.23 0.002 0.07 -0.004 0.03 Velocity 0.015 0.11
Age^3:rs10146997 4x10^-7 0.98 -3x10^-4 0.21 -0.002 0.07

SH2B1 rs8055138 0.004 0.60 0.34 0.052 0.49 0.94 0.004 0.59 0.36 Size 5x10^-4 0.87
Age:rs8055138 5x10^-4 0.60 0.002 0.85 3x10^-4 0.82 Tempo -0.005 0.61
Age^2:rs8055138 -2x10^-6 0.98 -0.001 0.61 -0.001 0.68 Velocity 0.005 0.50
Age^3:rs8055138 5x10^-6 0.70 1x10^-4 0.62 -4x10^-4 0.62

FTO rs1121980 0.013 0.08 0.31 0.244 4x10^-3 0.02 0.012 0.13 0.38 Size 0.001 0.77
Age:rs1121980 0.001 0.30 0.027 0.03 0.002 0.14 Tempo -0.013 0.22
Age^2:rs1121980 -1x10^-4 0.23 -0.002 0.10 0.001 0.55 Velocity -0.006 0.46
Age^3:rs1121980 1x10^-5 0.67 1x10^-4 0.77 4x10^-4 0.59


SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

MC4R rs17782313 2x10^-4 0.98 0.31 -0.032 0.76 0.70 -0.001 0.91 0.10 Size -0.004 0.22
Age:rs17782313 0.001 0.25 0.006 0.68 0.003 0.02 Tempo -0.017 0.15
Age^2:rs17782313 -3x10^-7 1.00 -2x10^-4 0.84 0.002 0.40 Velocity 0.008 0.41
Age^3:rs17782313 -2x10^-5 0.32 -3x10^-4 0.26 3x10^-4 0.71

KCTD15 rs11084753 0.011 0.17 0.08 0.008 0.92 0.26 0.013 0.11 0.11 Size -0.004 0.24
Age:rs11084753 0.002 0.02 0.017 0.18 0.003 0.03 Tempo -0.028 0.01
Age^2:rs11084753 -7x10^-5 0.50 0.001 0.63 -0.002 0.25 Velocity 0.007 0.40
Age^3:rs11084753 -2x10^-5 0.26 3x10^-5 0.88 -0.001 0.19

Table 2.17: Summary of longitudinal analyses, using the four methods, for each of the 17 SNPs in males. Significant P-Values are in bold.

LMM STLMM SPLMM NLMM

SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

NEGR1 rs2815752 0.011 0.12 0.57 0.027 0.72 0.96 0.008 0.25 0.70 Size 0.002 0.47
Age:rs2815752 0.001 0.38 0.004 0.74 0.001 0.63 Tempo -0.006 0.54
Age^2:rs2815752 -6x10^-5 0.49 4x10^-4 0.63 0.002 0.23 Velocity 0.002 0.77
Age^3:rs2815752 4x10^-6 0.78 -1x10^-4 0.55 0.001 0.16

SEC16B rs10913469 0.001 0.90 0.28 0.058 0.52 0.33 -0.001 0.92 0.18 Size -0.002 0.53
Age:rs10913469 0.002 0.10 0.026 0.05 -2x10^-4 0.86 Tempo -0.004 0.70
Age^2:rs10913469 -1x10^-5 0.94 -0.001 0.46 -1x10^-4 0.95 Velocity -0.004 0.56
Age^3:rs10913469 -3x10^-5 0.05 -2x10^-4 0.22 1x10^-4 0.93

LYPLAL1 rs2605100 -0.005 0.53 0.49 0.015 0.85 0.65 -0.007 0.37 0.79 Size -0.003 0.24
Age:rs2605100 2x10^-4 0.87 0.016 0.23 -0.001 0.68 Tempo 0.001 0.96
Age^2:rs2605100 -5x10^-5 0.60 4x10^-4 0.70 0.001 0.42 Velocity -0.004 0.48
Age^3:rs2605100 -1x10^-5 0.57 -3x10^-4 0.08 0.001 0.33

TMEM18 rs6548238 0.015 0.13 0.02 0.098 0.31 0.01 0.014 0.16 0.02 Size 0.002 0.55
Age:rs6548238 0.004 1x10^-3 0.066 1x10^-4 0.003 0.06 Tempo -0.022 0.10
Age^2:rs6548238 1x10^-4 0.35 0.002 0.06 0.001 0.76 Velocity 0.013 0.12
Age^3:rs6548238 -5x10^-5 0.01 -0.001 4x10^-4 3x10^-4 0.77

ETV5 rs7647305 0.006 0.46 0.68 0.062 0.53 0.44 0.006 0.51 0.37 Size 0.003 0.35
Age:rs7647305 5x10^-4 0.67 0.026 0.06 0.002 0.24 Tempo -0.003 0.79
Age^2:rs7647305 5x10^-5 0.63 0.001 0.15 0.002 0.37 Velocity 0.003 0.66
Age^3:rs7647305 -3x10^-6 0.84 -3x10^-4 0.20 0.001 0.42


SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

GNPDA2 rs10938397 0.009 0.17 0.52 0.095 0.17 0.32 0.006 0.41 0.05 Size 0.003 0.20
Age:rs10938397 0.001 0.23 0.009 0.43 0.003 4x10^-3 Tempo -4x10^-4 0.97
Age^2:rs10938397 -1x10^-5 0.95 -2x10^-4 0.83 0.003 0.04 Velocity 0.003 0.64
Age^3:rs10938397 -1x10^-5 0.41 -2x10^-4 0.19 0.001 0.17

TFAP2B rs987237 0.023 0.01 0.09 0.123 0.20 0.27 0.023 0.01 0.24 Size 0.003 0.41
Age:rs987237 0.001 0.41 -0.011 0.48 0.001 0.45 Tempo -0.011 0.36
Age^2:rs987237 -3x10^-4 0.02 -0.002 0.03 -0.001 0.75 Velocity -0.005 0.53
Age^3:rs987237 1x10^-5 0.57 4x10^-4 0.06 -1x10^-4 0.95

MRSA rs613080 0.002 0.84 0.68 -0.110 0.29 0.17 0.005 0.63 0.11 Size 3x10^-4 0.93
Age:rs613080 -4x10^-4 0.74 -0.037 0.02 -0.003 0.06 Tempo -0.006 0.64
Age^2:rs613080 -1x10^-5 0.96 -0.002 0.19 -0.002 0.38 Velocity 0.005 0.51
Age^3:rs613080 2x10^-5 0.24 0.001 0.01 3x10^-6 1.00

MTCH2 rs10838738 0.012 0.10 0.48 0.085 0.28 0.17 0.010 0.16 0.85 Size -3x10^-4 0.90
Age:rs10838738 0.001 0.53 -0.010 0.38 0.001 0.61 Tempo -0.012 0.25
Age^2:rs10838738 -2x10^-4 0.11 -0.002 0.05 4x10^-4 0.82 Velocity -0.003 0.68
Age^3:rs10838738 1x10^-5 0.66 4x10^-4 0.02 4x10^-4 0.67

BDNF rs1488830 0.011 0.18 0.22 0.038 0.67 0.27 0.014 0.09 0.20 Size 0.003 0.29
Age:rs1488830 0.002 0.06 0.018 0.19 0.002 0.27 Tempo -0.015 0.21
Age^2:rs1488830 7x10^-5 0.50 0.002 0.03 -0.003 0.08 Velocity 0.013 0.07
Age^3:rs1488830 -2x10^-5 0.23 -2x10^-4 0.36 -0.002 0.05


SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

BDNF rs925946 0.005 0.47 0.22 -0.005 0.94 0.03 0.004 0.60 0.36 Size 0.004 0.16
Age:rs925946 0.001 0.35 0.011 0.36 1x10^-4 0.94 Tempo 0.002 0.83
Age^2:rs925946 7x10^-5 0.45 0.003 1x10^-3 0.001 0.64 Velocity 0.005 0.43
Age^3:rs925946 -2x10^-5 0.12 -4x10^-4 0.03 3x10^-4 0.71

BCDIN3D rs7138803 0.013 0.05 0.10 0.034 0.62 0.92 0.013 0.05 0.44 Size -1x10^-4 0.96
Age:rs7138803 0.002 0.06 0.006 0.57 0.002 0.14 Tempo -0.020 0.04
Age^2:rs7138803 -7x10^-5 0.43 5x10^-6 1.00 -0.001 0.65 Velocity 0.006 0.28
Age^3:rs7138803 2x10^-6 0.86 5x10^-6 0.77 -3x10^-4 0.74

NRXN3 rs10146997 -0.004 0.57 1x10^-3 0.143 0.11 1x10^-3 -0.006 0.50 0.01 Size -0.002 0.58
Age:rs10146997 -0.002 0.06 0.013 0.34 -0.002 0.14 Tempo 0.031 0.01
Age^2:rs10146997 -2x10^-4 0.03 -0.003 0.00 -4x10^-5 0.98 Velocity -0.032 4x10^-6
Age^3:rs10146997 -2x10^-6 0.88 -2x10^-4 0.23 1x10^-4 0.89

SH2B1 rs8055138 0.011 0.11 0.48 0.054 0.51 0.80 0.013 0.07 0.70 Size 0.002 0.48
Age:rs8055138 0.001 0.49 -0.007 0.57 5x10^-4 0.69 Tempo -0.005 0.60
Age^2:rs8055138 -8x10^-5 0.38 -4x10^-4 0.69 -0.001 0.54 Velocity -0.003 0.57
Age^3:rs8055138 -1x10^-5 0.70 2x10^-4 0.23 -3x10^-4 0.73

FTO rs1121980 0.026 2x10^-4 4x10^-4 0.138 0.08 0.21 0.029 3x10^-5 3x10^-4 Size -4x10^-4 0.87
Age:rs1121980 0.004 1x10^-4 0.027 0.02 0.005 9x10^-5 Tempo -0.034 1x10^-3
Age^2:rs1121980 -2x10^-4 0.08 -0.001 0.56 -0.003 0.07 Velocity 0.003 0.64
Age^3:rs1121980 -2x10^-5 0.14 -2x10^-4 0.26 -0.002 0.06


SNP Beta P LRT P Beta P LRT P Beta P LRT P Beta P

MC4R rs17782313 0.013 0.10 0.10 0.124 0.16 0.02 0.012 0.15 0.19 Size -0.001 0.74
Age:rs17782313 0.003 0.01 0.037 0.01 0.002 0.20 Tempo -0.029 0.01
Age^2:rs17782313 -1x10^-5 0.95 0.002 0.11 0.001 0.59 Velocity 0.008 0.27
Age^3:rs17782313 -1x10^-5 0.47 -1x10^-4 0.73 0.001 0.42

KCTD15 rs11084753 0.001 0.94 0.34 -0.041 0.63 0.54 0.002 0.75 0.40 Size 0.004 0.17
Age:rs11084753 -0.001 0.35 -0.023 0.07 -0.002 0.12 Tempo 0.007 0.49
Age^2:rs11084753 7x10^-5 0.46 -1x10^-4 0.95 -0.002 0.33 Velocity 0.008 0.21
Age^3:rs11084753 2x10^-5 0.27 2x10^-4 0.32 -0.001 0.50

2.5.4 Obesity-Risk Allele Score

The obesity-risk allele score based on the genotypes at each of the 17 loci was normally

distributed and showed an approximately linear association with BMI across childhood, based

on the mean BMI (95% confidence interval) for each score at each age (Figure 2.12).
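As a minimal sketch of how such a score can be formed, the function below counts risk alleles across the 17 loci. It assumes an unweighted sum and that individuals with any missing genotype are excluded; the function name and the example genotypes are hypothetical.

```python
def risk_allele_score(genotypes):
    """Unweighted obesity-risk allele score.

    genotypes: risk-allele counts (0, 1 or 2), one per locus;
    None marks a missing genotype. Returns None when any genotype
    is missing, because complete genetic data are assumed to be required.
    """
    if any(g is None for g in genotypes):
        return None
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("each genotype must be 0, 1 or 2 risk alleles")
    return sum(genotypes)

# Hypothetical individual genotyped at all 17 loci
example = [1, 0, 2, 1, 1, 0, 2, 1, 1, 1, 0, 2, 1, 1, 1, 0, 2]
score = risk_allele_score(example)
print(score)

# An individual with one missing genotype is not scored
incomplete = example[:-1] + [None]
print(risk_allele_score(incomplete))
```

With 17 biallelic loci the score can range from 0 to 34.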

When the obesity-risk allele score was incorporated into the four longitudinal models, it was associated with increasing BMI in females using all four methods; however, only three methods detected an association in males (Table 2.18). For the females, the LMM, STLMM and SPLMM methods all detected an increase in BMI per allele increase in the obesity-risk allele score (LMM β=0.0754, P-Value=0.02; STLMM β=0.0566, P-Value=0.02; SPLMM β=0.0793, P-Value=0.01), in addition to an increase in linear trajectory over time (LMM β=0.0181, P-Value=0.00002; STLMM β=0.0152, P-Value=0.00003; SPLMM β=0.0184, P-Value=0.0006). No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic interactions with the obesity-risk allele score; however, the cubic interaction was significant in the LMM (β=-0.0002, P-Value=0.01) and STLMM (β=-0.0001, P-Value=0.02). According to the LMM and STLMM methods, this indicates that females with higher allele scores plateau to adult BMI at an earlier age. In contrast, the NLMM method in both females and males was unable to detect a significant association with an increase in size or velocity; however, it did detect a decrease in tempo (assumed to be the adiposity rebound) for each increase in the number of risk alleles. In the males, the LMM and SPLMM methods also detected an increase in BMI (LMM β=0.1045, P-Value=0.0001; SPLMM β=0.1022, P-Value=0.0002) and BMI/year per allele increase (LMM β=0.0145, P-Value=0.0001; STLMM β=0.0083, P-Value=0.007; SPLMM β=0.0123, P-Value=0.007). No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic and cubic interactions with the obesity-risk allele score. This indicates that the shape of the curve is consistent across the score categories.


Figure 2.12: Distribution of obesity-risk allele score, with error bars for mean BMI at age 14

years. The obesity-risk-allele score incorporates genotypes from 17 loci (FTO, MC4R, TMEM18,

GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3,

SH2B1, and MRSA) in the 1,219 individuals from the Raine Study with complete genetic data.

The error bars display the mean (95% CI) BMI at age 14 years (the largest follow-up in

adolescence) for each risk-allele score.


Table 2.18: Results from association analysis of the obesity-risk allele score with BMI trajectories using the four methods, adjusted for the first five principal components

LMM STLMM SPLMM NLMM

Term Beta (95% CI) P Beta (95% CI) P Beta (95% CI) P Parameter Beta SE P

Female
Score 0.075 (0.014, 0.137) 0.02 0.057 (0.008, 0.105) 0.02 0.079 (0.017, 0.142) 0.01 Size -2x10^-4 0.001 0.85
Score*Age 0.018 (0.010, 0.026) 2x10^-5 0.015 (0.008, 0.023) 3x10^-5 0.018 (0.008, 0.029) 6x10^-4 Tempo -0.009 0.003 5x10^-3
Score*Age^2 -7x10^-6 (-0.001, 0.001) 0.99 0.001 (9x10^-5, 0.001) 0.07 -0.008 (-0.021, 0.006) 0.28 Velocity 0.003 0.002 0.14
Score*Age^3 -2x10^-4 (-3x10^-4, -4x10^-5) 0.01 -1x10^-4 (-3x10^-4, 1x10^-4) 0.02 -0.006 (-0.013, 0.001) 0.11

Male
Score 0.105 (0.052, 0.157) 1x10^-4 0.039 (-0.004, 0.081) 0.07 0.102 (0.048, 0.156) 2x10^-4 Size 3x10^-4 0.001 0.69
Score*Age 0.015 (0.007, 0.022) 1x10^-4 0.008 (0.002, 0.014) 0.01 0.012 (0.003, 0.021) 0.02 Tempo -0.007 0.003 4x10^-3
Score*Age^2 -6x10^-4 (-0.001, 1x10^-4) 0.10 -1x10^-5 (-4x10^-4, 4x10^-4) 0.96 -3x10^-4 (-0.012, 0.011) 0.96 Velocity 4x10^-4 0.002 0.79
Score*Age^3 -1x10^-4 (-2x10^-4, 2x10^-6) 0.06 -1x10^-4 (-2x10^-4, -1x10^-5) 0.20 7x10^-4 (-0.005, 0.007) 0.83

2.5.5 Characterising Genetic Associations in SPLMM Model

The SPLMM model yielded the best fit to these data, and therefore further analysis of the genetic data was undertaken using this model. As seen in the previous sections, loci in different genes were associated with BMI trajectory in males and females; this indicates that there are potentially different genetic pathways leading to growth rate in males and females. In females, SNPs in the SEC16B, TFAP2B, MRSA, BDNF and NRXN3 genes were significantly associated with BMI trajectory, whereas in males TMEM18, GNPDA2, NRXN3 and FTO were significant. Figure 2.13 shows the population average curves for females (A) and males (B) with zero, one or two copies of the minor allele for each of the significantly associated loci using the SPLMM model.

FTO in males and SEC16B in females have similar trajectory patterns, whereby the genetic

effect is observed early in life and persists throughout adolescence with each additional copy

of the risk allele having a greater increase in BMI over time. In addition, the risk alleles of the

NRXN3 and MRSA loci in females are associated with lower levels of BMI from the adiposity

rebound to post-puberty (approximately 6-13 years of age).

Figure 2.14 displays the population average curves for individuals with 15, 17 or 18 (25th, 50th

and 75th percentile) obesity-risk alleles. The growth curves in each of the genders show

different patterns; females begin their trajectory smaller than males, they have an earlier

rebound, and by the age of 18 years they are beginning to plateau at their potential adult BMI.

In contrast, males go through puberty at a slightly later age resulting in their BMI continuing to

increase at the age of 18 years. The genetic effect is shown to begin later for females, at

around seven and a half years (P-Value=0.03), than for males at four years (P-Value=0.02)

(Figure 2.15).


Figure 2.13: Population average curves for each of the significantly associated SNPs from

the SPLMM method in females (panel A) and males (panel B)

A)


B)


Figure 2.14: Population average curves from the SPLMM method in females and males

Predicted population average BMI trajectory from 1 – 18 years, after adjusting for birth weight

and maternal smoking during pregnancy, for individuals with 15 (lower quartile), 17 (median),

and 18 (upper quartile) risk alleles.


Figure 2.15: Associations between the risk-allele score and BMI at each follow-up in females and males. Regression coefficients (95% CI) presented on ln(BMI) scale from the Semi-Parametric Linear Mixed Model (SPLMM) longitudinal model, derived at each of the average ages of follow-up. For example, a male with 17 obesity-risk alleles is likely to have a ln(BMI) 0.005 units higher at age 6 than a male with 16 alleles, and by age 14 this difference will have increased to 0.010 units.
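Because the coefficients in Figure 2.15 are on the ln(BMI) scale, they can be read as approximate proportional differences in BMI. The short sketch below converts the two values quoted in the caption; the function name is illustrative.

```python
import math

def percent_difference(delta_ln_bmi):
    """Convert a difference on the ln(BMI) scale to a percentage difference in BMI."""
    return (math.exp(delta_ln_bmi) - 1.0) * 100.0

# Per-allele differences in males quoted in the Figure 2.15 caption
for age, delta in ((6, 0.005), (14, 0.010)):
    print(f"age {age}: {percent_difference(delta):.2f}% higher BMI per extra risk allele")
```

For small coefficients the percentage difference is close to 100 times the coefficient, so 0.005 and 0.010 correspond to roughly 0.5% and 1% higher BMI per additional risk allele.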


2.6 Comparison of Models

The modelling methods were compared using the following:

1. Model fit

2. Computation time

3. Ability to detect genetic associations with known adult BMI SNPs

2.6.1 Model Fit

To compare model fit, several model parameters were compared:

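1. R^2: measure of the variance explained by the model, calculated as [233]:

$$R^2 = 1 - \frac{\operatorname{var}(Y \mid \mathbf{x}, \mathbf{b})}{\operatorname{var}(Y \mid b_0^{*})} = 1 - \frac{\sigma^2}{\sigma_0^2}$$

where, using the notation from 2.1 above:

$$\sigma_0^2 = \operatorname{var}\{(\mathbf{x}_{ij} - \boldsymbol{\mu}_x)'\beta_1 + (\mathbf{z}_{ij} - \boldsymbol{\mu}_z)'\mathbf{b}_{1i}\} + \sigma^2$$

$$\sigma^2 = \operatorname{var}(\varepsilon)$$

2. Difference between observed and fitted values: calculated as $\sum (Y - Y_{est})^2$, which gives an indication of how well the model fits the observed data

3. Visual inspection of the residual and mean plots.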
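The first two measures can be sketched as follows. The R^2 in item 1 is read here as one minus the ratio of the residual variance of the fitted model to the corresponding variance under a null model, and item 2 as a sum of squared differences between observed and fitted values. The function names and all numbers are hypothetical and are not estimates from the thesis.

```python
def r_squared(sigma2_model, sigma2_null):
    """R^2 = 1 - sigma^2 / sigma_0^2 (variance explained by the model)."""
    if sigma2_null <= 0:
        raise ValueError("null-model variance must be positive")
    return 1.0 - sigma2_model / sigma2_null

def sum_squared_differences(observed, fitted):
    """Sum over observations of (Y - Y_est)^2."""
    return sum((y - f) ** 2 for y, f in zip(observed, fitted))

# Hypothetical variance estimates on the ln(BMI) scale
print(f"R^2 = {r_squared(0.004, 0.032):.3f}")

# Hypothetical observed and fitted BMI values for one individual
observed = [16.0, 17.5, 19.0]
fitted = [16.2, 17.4, 18.5]
print(f"Sum of squared differences = {sum_squared_differences(observed, fitted):.2f}")
```

A smaller sum of squared differences and a larger R^2 both point to a closer fit, but neither replaces the visual checks in item 3.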

Table 2.19 displays the measures of fit used to compare methods: R^2, R^2 from 1,000 simulated datasets (see Section 2.4) and the squared differences between observed and fitted values. The R^2, in conjunction with the interquartile range of variation of R^2 estimated through simulations, clearly favours the SPLMM as the best model fit for the females. The R^2 estimates from the simulations indicate that although the STLMM method has higher R^2 for both females and males, the interquartile range is much larger for the STLMM method. The model fit is therefore more data dependent in the STLMM method than in the other methods, which is not desirable when applying these methods to other cohorts. The R^2 in the males favours the STLMM method; however, this method has a considerably longer computational time and larger deviation between the fitted values and the observed values (as seen in the following two sections), indicating that it is not appropriate for large-scale genetic studies.


Table 2.19: Statistical measures used to compare model fit of the four methods.

Sex Method R^2 R^2 from 1,000 simulated datasets [mean (95% CI)] (Observed - fitted values)^2 [median (IQR)]

Female LMM 83.59% 83.58% (83.50, 83.66) 0.2705 (0.0579, 0.8755)
Female STLMM 88.78% 90.36% (89.07, 91.66) 0.2728 (0.0613, 0.9007)
Female SPLMM 89.42% 89.45% (89.34, 89.56) 0.1720 (0.0374, 0.5871)
Female NLMM 85.98% 85.95% (85.76, 86.14) 0.1678 (0.0350, 0.5752)

Male LMM 80.67% 80.65% (80.35, 81.94) 0.2390 (0.0470, 0.8187)
Male STLMM 88.72% 91.44% (90.40, 92.48) 0.2248 (0.0479, 0.8453)
Male SPLMM 87.59% 87.61% (87.50, 87.73) 0.1656 (0.0329, 0.5501)
Male NLMM 85.10% 85.07% (84.86, 85.28) 0.1604 (0.0333, 0.5713)

Figure 2.16 displays the residuals from all four methods in both males and females. The female residual plots indicate that the LMM, STLMM and SPLMM methods all have residuals distributed close to the expected distribution (normal for the LMM and SPLMM and skew-t for the STLMM). Several within-subject outliers (at the tails of the distribution) were not captured in all methods. However, the NLMM in particular had additional outliers not present with the other methods. The LMM and SPLMM methods both have some deviation from the normal distribution at the top end of the curve, signifying that they underestimate the high BMI values. In contrast, there was an excess of extreme residual values at both ends when using the NLMM method, indicating a poor fit to the data. It overestimates low BMI values and underestimates high values, thus underestimating within-individual variability and potentially leading to conservative inference about genetic associations. The male residuals displayed a similar pattern to females, although there were fewer obvious outliers. In addition, as there was less skewness in the males, the STLMM method deviated from the expected t-distribution but in the opposite direction to that of the females, whereby the low values of BMI are underestimated. Based on model fit, all four methods were adequate in modelling childhood growth curves; however, the SPLMM was slightly better than the other methods at accounting for outliers and had the best model fit.


Figure 2.16: Q-Q plot of residuals for each of the methods for females (top four) and males

(bottom four)


2.6.2 Computation Time

Table 2.20 shows the median (IQR) computation time across 100 models that included the main

effect of the FTO SNP and its interaction with time, for each of the four methods. The models

were run in the 64-bit version of R 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU

(L 640 @ 2.13GHz). The STLMM method is clearly the most

computationally intensive, taking a median of approximately 75 minutes for one model in females and 66

minutes in males. The other three methods are all relatively quick and would allow scalability

to a genome-wide analysis.

Table 2.20: Computation time for the four methods adjusting for the FTO genotype (median [IQR])

Method  Females                     Males
LMM     13.59 sec (13.41, 14.40)    15.84 sec (15.66, 16.55)
STLMM   4505 sec (4490, 4784)       3962 sec (3895, 3970)
SPLMM   23.49 sec (23.41, 23.92)    24.07 sec (23.78, 24.52)
NLMM    0.01 sec (0.00, 0.02)       0.00 sec (0.00, 0.02)

2.6.3 Ability to Detect Genetic Associations with Known Adult BMI/Obesity SNPs

As mentioned in Section 2.5, genetic loci from 17 genes previously shown to be associated

with adult BMI and subsequently with childhood growth were tested for association with BMI

trajectory using each of the four methods. Table 2.21 displays the number of significant SNPs

detected by each of the methods in females and males. The SPLMM method detected the highest

proportion of associations with childhood growth in females (5 of 17) and, jointly with the

STLMM method, in males (4 of 17). In

addition, the results also reflected those seen in the cross-sectional analyses where no

complex modelling was necessary. The NLMM method was unable to detect many associations

in either males (only five significant tests of 51) or females (two significant tests of 51), which follows

from it being a slightly more conservative method than the other three methods. The STLMM

also had the ability to detect a number of genetic effects; however, it detected SNPs that were

not associated with BMI in any other method or in the cross-sectional analyses (i.e. TFAP2B in

females and MC4R in males). In addition, it is a more computationally intensive method, which

would prove difficult in larger scale genetic studies such as a GWAS. It is also not as flexible as

the other methods in terms of extensions to look at gene-environment or gene-gene

interactions. The current study provides evidence that the SPLMM method is the most


effective method to detect genetic associations and allows the flexibility for extensions into

larger scale or more complex genetic analyses.

Table 2.21: The number of significant SNPs for each method, using a likelihood ratio test.

Method  Female                           Male
LMM     1 of 17                          3 of 17
STLMM   3 of 17                          4 of 17
SPLMM   5 of 17                          4 of 17
NLMM    2 of 51 (three tests per SNP)    5 of 51 (three tests per SNP)

2.7 Discussion

The current study has shown that of the four statistical methods evaluated, the SPLMM

method was the most efficient for modelling childhood growth to detect modest genetic

effects in the longitudinal pregnancy cohort study investigated. In addition, it has been shown

that there are potentially different genetic pathways leading to increased growth rate in males

and females and that each additional adult BMI allele increases both average BMI and rate of

growth throughout childhood.

There are several different statistical methods that can be used to model childhood growth.

The four methods were selected as they would allow for adjustment of potential confounders,

appropriately account for the correlation between the repeated measures, allow for

incomplete data, and were computationally feasible in the context of candidate gene studies

and GWASs. Results indicate that the SPLMM method does a more proficient job at accounting

for the variation in BMI growth than the LMM method as it has a smaller residual standard

deviation. The STLMM method used a different scale for BMI and was therefore unable to be

compared using standard measures of model fit such as the AIC, deviance or residual standard

deviation; however, the observed versus fitted values could be compared across all four

methods, as it is scale independent. The SPLMM and NLMM methods produce similar

differences between observed and fitted values. However, there is a larger range in values

from the LMM and STLMM methods, which indicates that these methods are less accurate in

predicting BMI for each individual over time, as they tend to overestimate low BMI and

underestimate high BMI. Although the residual plots indicate the STLMM method has the best

fit to the data, the method does not produce the most accurate predictions as seen by the IQR

for the fitted versus observed values. Furthermore, the estimates of skewness from the


STLMM model were relatively large (Females: intercept=4.5791 [SE=1.0957], slope=2.2336

[SE=0.6269]; Males: intercept=2.8590 [SE=0.5943], slope=1.6628 [SE=0.4155]), which could be

influenced by outliers and result in inaccurate predictions. Based on model fit, all four methods

are adequate in modelling childhood growth curves; however the SPLMM produces the most

accurate fitted values and can account for the majority of the outlying BMI measurements.

Of the 17 genetic variants associated with adult BMI and obesity risk that were investigated,

the SPLMM method detected a proportion of associations with childhood

growth at least as high as any other method in both males and females. As expected, the more

conservative NLMM method performed poorly in both males (five significant tests of 51) and

females (two significant tests of 51). The STLMM method detected a number of genetic

effects; however it was a more computationally intensive method and less flexible than others,

which would prove difficult in larger scale genetic studies such as GWASs or gene-gene/gene-

environment interaction studies. The current study provides evidence that the SPLMM method

is the most effective method to detect genetic associations and allows for the flexibility for

extensions into large scale and more complex genetic analyses.

Single genetic loci typically have small effects on complex diseases or explain only a small

proportion of the variability in a quantitative trait; therefore, major increases in disease risk

are expected from simultaneous exposure to multiple genetic risk variants. A post hoc power

calculation using 1,000 non-parametric bootstrap simulations based on the Raine Study data

indicated that this study had 97% power to detect the FTO locus rs1121980 with MAF=0.41,

which has one of the larger effect sizes on BMI, but still had 83% power to detect a more

realistic smaller effect size like the BDNF SNP rs1488830 association in females with MAF=0.21.

In contrast, the power to detect the allele score, combining all risk alleles, was 95% in both

males and females separately. The current study is the first to investigate an association

between 17 published obesity-risk loci as an allele score and BMI trajectory throughout

childhood and adolescence, separately in males and females. Hoed et al [204] used a similar

approach with a 17-loci allele-score but focused on two cross-sectional association analyses in

pre-/early pubertal children and adolescents. By utilizing a longitudinal design, the current

study reduced the number of genetic association tests conducted from eight in a cross-sectional

setting to one per gender, reducing both the need to adjust for multiple testing and the

risk of overlooking important genetic loci. A second study by Elks et al [206] evaluated the


association between adult obesity risk genes and growth throughout childhood using a smaller

subset of obesity susceptibility loci and with analyses only up to age 11 years. Both studies

conducted analysis adjusting for gender; however, this does not allow each gender to have

different growth trajectories or the investigation of different timing of the genetic effects.

Substantial differences were found between males and females in the timing of the adiposity

rebound and plateauing towards adulthood. Additionally, the genetic effects had different

timing and magnitude in each gender. By combining males and females into one analysis,

these genetic differences may have been averaged out and the biology underlying the

differences may remain undetected.

A recent longitudinal study investigating the life-course effects of variants in the FTO gene and

near the MC4R gene demonstrated that the effects strengthen throughout childhood and peak

at age 20 before weakening during adulthood [205]. A similar pattern was detected with the

obesity-risk allele score throughout childhood, where the effect begins around four years in

males and seven years of age in females, and increases in size each year. One limitation of the

current study is that the cohort currently only has data available up to 17 years of age. It will be of

interest to follow the cohort to investigate how the combined effect of these SNPs changes as

the cohort progresses into adulthood. Further, it would be valuable to confirm that the SPLMM

method is the most appropriate statistical method in other cohorts investigating the genetic

determinants of childhood growth and the patterns of association across the life course.

Further studies are now required to assess the validity of these findings and also extend them

to perhaps focus on interactions between genes and the environment. Interactions, both gene-

gene and gene-environment, are an important area of research that is critical for

understanding the mechanisms underlying obesity. A small simulation study was performed

using re-sampling techniques based on 1,000 non-parametric bootstrap data sets with

replacement from the Raine Study data and calculating the power to detect a gene-gene

interaction. Two SNP combinations were investigated to gather an understanding of the range

of power in the current study; these included the two most commonly reported BMI

associated loci, FTO rs1121980 (MAF=0.41) by MC4R rs17782313 (MAF=0.23) as well as two

loci with large minor allele frequency, FTO rs1121980 by NEGR1 rs2815752 (MAF=0.38). Based

on these simulations, the current study had 58.0% power to detect an interaction between

two SNPs with larger minor allele frequencies (FTO*NEGR1) and effect sizes (FTO 0.019 kg/m²;

NEGR1 0.011 kg/m²), while assuming a multiplicative model for the interaction. However, the

power decreased rapidly, to 4.6%, with lower minor allele frequency (FTO*MC4R) and smaller effect size (FTO

0.004 kg/m²; MC4R 0.002 kg/m²). This study was therefore not appropriately designed

to detect gene-gene or gene-environment interactions; instead, these results suggest that meta-analyses

of multiple cohorts might be a better way to tackle this problem.
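The resampling logic behind such a power calculation can be sketched as follows. This is a minimal illustration only, not the analysis run in this thesis: it assumes a toy cross-sectional phenotype and a simple linear regression with a normal-approximation test in place of the SPLMM, and the function names (`slope_p_value`, `bootstrap_power`) and the toy effect size are hypothetical.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # resample holds a single genotype class: no slope to test
        return 1.0
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - my - beta * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # degenerate perfect fit
        return 0.0 if beta != 0 else 1.0
    return math.erfc(abs(beta / se) / math.sqrt(2))


def bootstrap_power(genotypes, phenotypes, n_boot=1000, alpha=0.05, seed=1):
    """Share of bootstrap resamples (individuals drawn with replacement)
    in which the genotype effect is significant at level alpha."""
    rng = random.Random(seed)
    n = len(genotypes)
    hits = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        x = [genotypes[i] for i in idx]
        y = [phenotypes[i] for i in idx]
        hits += slope_p_value(x, y) < alpha
    return hits / n_boot


# Toy data (hypothetical): 500 individuals, MAF 0.41, per-allele effect 0.5 SD.
toy = random.Random(42)
geno = [(toy.random() < 0.41) + (toy.random() < 0.41) for _ in range(500)]
pheno = [0.5 * g + toy.gauss(0, 1) for g in geno]
power = bootstrap_power(geno, pheno, n_boot=200)
```

Resampling whole individuals, rather than single measurements, keeps each person's genotype and phenotype together; with longitudinal data the same idea would be applied to each individual's full set of repeated measures.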

2.8 Conclusion

In conclusion, it has been shown that although all four statistical methods investigated for

modelling childhood growth were appropriate to model growth curves in childhood, the

SPLMM method was the most efficient in these data in terms of predicted values and

detection of genetic effects. Further, it was shown that there is some evidence that genetic

variations in established adult obesity-associated genes are associated with childhood growth;

however these effects differ by gender and timing of effect. This study provides further

evidence of genetic effects that may identify individuals early in life that are more likely to

rapidly increase their BMI through childhood, which provides some insight into the biology of

childhood growth.


Chapter 3: Comparing The Semi-Parametric Linear Mixed Model To A Two-Step Approach For Genome-Wide Association Studies

3.1 Introduction

Some authors are currently suggesting the use of a two-step approach to investigate the

genetic associations with longitudinal traits in a genome-wide setting. Using this approach, one

models the phenotype in a mixed effects model framework and then takes summary measures

from this model to analyse against the ~2.5 million genome-wide SNPs. After showing in

Chapter 2 that the Semi-Parametric Linear Mixed Effects Model (SPLMM) was the best fit to

the longitudinal BMI data in the Raine Study, it was necessary to investigate whether a two-

step approach using this model could be implemented in this data for the GWAS.

3.2 Background

There has been a plethora of discussion regarding the ‘missing heritability’ of most traits and

diseases [43,50,234]. Some researchers are sceptical about this idea and state that the

heritability may not be ‘missing’ but rather overestimated by the quantitative genetic studies

to begin with [235] or the fact it is missing is not important for clinical practice [236]. This

‘missing heritability’ comes from the fact that the current estimates of variation explained for

most common complex diseases by the known genes to date are much lower than the original

heritability estimates. For example, the variability explained by the 32 known genetic variants

for adult BMI is 1.45% [72], whereas the heritability estimates are as high as 80% from family

studies [169]. Therefore, to be able to explain more about the genetic associations with a given

disease or trait, we need to develop new statistical methodologies and/or investigate a wider

variety of genetic markers. Progress has begun on both aspects of development, especially

with the advent of next generation sequencing and 1,000 genomes imputation [63]. For

example, it is now possible to look at rare genetic variants through new gene based association

analyses [237,238,239,240,241,242,243]. There are also new methods which use functional

annotation to refine GWAS signals [47,244], and sophisticated gene-gene and gene-

environment interaction tests.

As previously outlined in Chapters 1 and 2, geneticists are also beginning to investigate

longitudinal traits in GWASs. However, the models required for longitudinal traits are

considerably more computationally intensive to conduct than those for cross-sectional traits

(see Chapter 2). Therefore, different methods are being suggested to reduce this

computational burden in GWASs [85,245,246,247]. Some of these methods require a data

reduction method for the phenotypic data, with the summary measures subsequently being

used in the genetic association analysis.

Kerner et al provide a summary of the longitudinal genetic association analyses applied to the

Framingham Heart Study [85]. These analyses either involved:

1. Methods whereby the SNP was added to the longitudinal model, such as those

presented in Chapter 2. However, most of the studies using this method only analysed

a subset of the genome-wide genetic variants [88,90,248];

2. A two-step approach to the analysis whereby the phenotypic data was reduced prior

to the genetic association analysis [87,89,249].

Two of the two-step approaches looked at how genetic variants influenced the trajectory of a

quantitative phenotype over time; the third study reported in Kerner et al that used a two-step

approach was investigating a longitudinal case-control study [249]. Kerner and Muthén used

latent class modelling to define three groups of individuals and then looked at whether SNPs

on chromosome 8 were associated with class memberships; this study therefore investigated

the heterogeneity between individuals due to genetics [87]. In contrast, Roslin et al used

multivariate linear latent growth models to estimate an intercept and slope value for each

individual and then used these parameters as independent scalar outcomes for the genetic

association analyses [89]. These latent models are very similar to the LMM and SPLMM

methods discussed in Chapter 2; however they allow investigation of the relationships

between multiple dependent variables in a multivariate framework. In addition, the traits

presented in Roslin et al had linear trajectories over time, rather than the complex curve

required for childhood BMI. Each of the analyses presented in Kerner et al [85] targeted a

slightly different scientific question; the only publication they presented investigating the

genetic association with both intercept and trajectory of a quantitative trait over time, similar


to this thesis, was Roslin et al [89]. Similar two-step approaches have been used previously in

genetic linkage studies [82,84].

Sikorska et al conducted a simulation study investigating three different two-step approaches

[245]: 1) “slope as outcome” whereby a linear trajectory is estimated for each individual in the

first step and is then regressed against the SNPs in the second step; 2) “two-step” whereby a

linear mixed effects model (LMM) is used in the first step to estimate best linear unbiased

predictors (BLUPs [224]) for the random effects slope parameter in each individual which is

subsequently regressed against the SNPs in the second step; and 3) a “conditional two-step”

which is the same as the two-step approach except that a conditional linear mixed effects model

is used in the first step [250]. Their two-step approach was the same as in Roslin et al [89].

Sikorska et al showed that although the conditional two-step method is the most desirable in

terms of both accuracy and computational time, the two-step method is a reasonable

alternative for most scenarios. They also conclude that the two-step approaches were 170

times faster than the one-step LMM for a full GWAS.
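The simplest of these approaches, "slope as outcome", can be sketched in a few lines. This is an illustrative sketch only: it uses ordinary least squares per individual rather than the BLUPs from a mixed model, and the helper names (`ols`, `slope_as_outcome`) and the toy records are hypothetical.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def slope_as_outcome(records, genotypes):
    """records: id -> list of (age, value); genotypes: id -> 0, 1 or 2.

    Step 1: fit a separate line to each individual's repeated measures.
    Step 2: regress the individual slopes on genotype.
    """
    # Individuals need at least two distinct ages for a slope to exist.
    ids = [i for i, obs in records.items() if len({a for a, _ in obs}) >= 2]
    slopes = [
        ols([a for a, _ in records[i]], [v for _, v in records[i]])[1]
        for i in ids
    ]
    snps = [genotypes[i] for i in ids]
    return ols(snps, slopes)  # (baseline slope, per-allele change in slope)


# Toy data (hypothetical): trajectories are exactly linear for clarity.
records = {
    "A": [(1, 10.0), (2, 11.0), (3, 12.0)],  # slope 1.0
    "B": [(1, 10.0), (2, 11.5), (3, 13.0)],  # slope 1.5
    "C": [(1, 10.0), (3, 14.0)],             # slope 2.0
}
genotypes = {"A": 0, "B": 1, "C": 2}
baseline_slope, per_allele_effect = slope_as_outcome(records, genotypes)
```

The first step is run once, so the genome-wide scan only repeats the cheap second-step regression for each SNP, which is where the computational saving comes from.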

The conditional LMM used by Sikorska et al [245], initially introduced by Verbeke et al [250],

separates the time stationary (cross-sectional effects) from the time varying (longitudinal

effects) covariates in both the fixed and random effects. It then uses conditional inference

where the random intercept is treated as nuisance and estimation is performed conditional on

sufficient statistics for the nuisance parameters. This removes both the random intercepts and

the cross-sectional fixed effects from the model. This model was appropriate for Sikorska et al

[245] as they were only interested in the SNP effect over time; however, in this thesis, both the

average SNP effect in childhood and the SNP effect over time are of interest as, to my

knowledge, there has been no GWAS of childhood BMI.

These two-step approaches are used as a ‘screening tool’, where the summary statistics are

used to perform the full genome-wide scan quickly, and the significant loci from

the GWAS analysis are then re-analysed in a full linear mixed model. Although these data

reduction methods have been shown to be successful for reducing the computational burden

while still producing accurate results for longitudinal genetic association studies to date, they

have not been investigated for complex phenotypic traits; they have been used where there is

a linear relationship between the outcome and time, the correlation between the intercept


and slope is low (for example, the correlation between the intercept and slope in Sikorska et al

is -0.14 [245]) and there are normal, independent errors. Of particular interest for this thesis is

the investigation of both the SNP and SNP*age effects on a complex trait which has a

trajectory that varies with time, high correlation between the intercept and slope terms and

non-normal, correlated (continuous auto-regressive) errors; this data therefore needs a

complex model to account for the intricate details in the data.

3.2.1 Aims

The aim of this chapter is to compare the SNP and SNP*age interaction effects using the two-

step approach outlined in Sikorska et al [245] and the SPLMM model presented in Chapter 2.

I hypothesize that the two-step approach will provide different results to the SPLMM model

for the SNP and SNP*age interaction effects. The conditional two-step method presented in

Sikorska et al removes the correlation between the intercept and slope terms by only

modelling the slope, which increases the correlation between the one- and two-step

approaches. As seen in Chapter 2, the correlation between the intercept and slope terms in

the SPLMM is high, which may influence the genetic results when using the two-step

approach.


3.3 Methods

3.3.1 Statistical Methods

Two models were used for the genetic analyses, the SPLMM and the two-step method. The

SPLMM model is a one-step approach whereby the SNPs were added to the fixed effects of the

model. The general SPLMM model is presented in Section 2.4.3.2; the specific model run in this

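analysis including the SNPs and mean centred age (centred at eight years) was:

\[
\begin{aligned}
\log(\mathrm{BMI}_{ij}) ={}& \beta_0 + \beta_1(\mathrm{Age}_{ij} - \overline{\mathrm{Age}}) + \beta_2\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^2}{2!} + \beta_3\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^3}{3!} \\
&+ \beta_4\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_1)_+^3}{3!} + \beta_5\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_2)_+^3}{3!} + \beta_6\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_3)_+^3}{3!} \\
&+ \beta_7\,\mathrm{SNP}_i + \beta_8\,\mathrm{SNP}_i(\mathrm{Age}_{ij} - \overline{\mathrm{Age}}) + \beta_9\,\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^2}{2!} + \beta_{10}\,\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^3}{3!} \\
&+ \beta_{11}\,\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_1)_+^3}{3!} + \beta_{12}\,\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_2)_+^3}{3!} + \beta_{13}\,\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_3)_+^3}{3!} \\
&+ b_{0i} + b_{1i}(\mathrm{Age}_{ij} - \overline{\mathrm{Age}}) + b_{2i}\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^2}{2!} + \varepsilon_{ij}
\end{aligned}
\tag{1}
\]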

where κk is the kth knot and (t - κk)+ = 0 if t ≤ κk and (t - κk)+ = t - κk if t > κk; this is known as the

truncated power basis, which ensures smooth continuity between the time windows. A Taylor

series, which is a representation of a function as an infinite sum of terms that are calculated

from the values of the function's derivatives at a single point, is used in the spline function to

allow for easier model convergence. The three knot points are placed at κ1 = 2, κ2 = 8 and

κ3 = 12 years with a cubic slope for each spline. SNPi ∈ {0, 1, 2} indicates the genotype for

individual i and Ageij is the age for individual i at time j. The null hypothesis is H0:

β7=β8=β9=0. In other words, the test investigates whether there is a statistically significant effect of

the SNP on average BMI at age 8 and on BMI over time. The other coefficients for the SNP are not

tested as these do not have a comparable estimate using the BLUPs from the random effects.
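The fixed-effect terms of the spline function can be illustrated with a short sketch that builds one row of the design matrix from the truncated power basis defined above. It assumes that the time variable t and the knots are expressed on the same scale; the function names (`truncated_cubic`, `spline_row`) are hypothetical and this is not the code used to fit the models in this thesis.

```python
import math


def truncated_cubic(t, knot):
    """(t - knot)^3_+ : zero at or below the knot, cubic above it."""
    return (t - knot) ** 3 if t > knot else 0.0


def spline_row(t, knots=(2.0, 8.0, 12.0)):
    """Design-matrix row: intercept, Taylor-scaled polynomial terms up to
    the cubic, then one truncated cubic term per knot."""
    row = [1.0, t, t ** 2 / math.factorial(2), t ** 3 / math.factorial(3)]
    row += [truncated_cubic(t, k) / math.factorial(3) for k in knots]
    return row


# Example: t = 10 lies above the first two knots but below the third,
# so only the first two truncated terms are non-zero.
row = spline_row(10.0)
```

Each truncated term switches on only after its knot, so the fitted curve can change shape in each time window while remaining smooth at the knots.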

The second model, the two-step approach, is where the SNP effects are omitted from the

SPLMM model, such that the following model is fitted to the data only once:


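\[
\begin{aligned}
\log(\mathrm{BMI}_{ij}) ={}& \beta_0 + \beta_1(\mathrm{Age}_{ij} - \overline{\mathrm{Age}}) + \beta_2\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^2}{2} + \beta_3\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^3}{6} \\
&+ \beta_4\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_1)_+^3}{6} + \beta_5\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_2)_+^3}{6} + \beta_6\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}} - \kappa_3)_+^3}{6} \\
&+ b_{0i} + b_{1i}(\mathrm{Age}_{ij} - \overline{\mathrm{Age}}) + b_{2i}\frac{(\mathrm{Age}_{ij} - \overline{\mathrm{Age}})^2}{2} + \varepsilon_{ij}
\end{aligned}
\tag{2}
\]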

Then the second step is to regress the BLUPs of b0i, b1i and b2i on the SNPs for each individual i

with a simple linear regression model:

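\[
\begin{aligned}
b_{0i} &= \beta_0^{*} + \beta_1^{*}\,\mathrm{SNP}_i + \varepsilon_i^{*} \\
b_{1i} &= \beta_0^{**} + \beta_1^{**}\,\mathrm{SNP}_i + \varepsilon_i^{**} \\
b_{2i} &= \beta_0^{***} + \beta_1^{***}\,\mathrm{SNP}_i + \varepsilon_i^{***}
\end{aligned}
\tag{3}
\]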

These models were applied to each of the data sets described below. If the one- and two-step

methods are indeed similar, then testing β1* = 0 in Model 3 should give a similar P-Value to

testing β7 = 0 in Model 1; likewise, testing β1** = 0 in Model 3 should give a similar P-Value to

testing β8 = 0 in Model 1 and testing β1*** = 0 in Model 3 should give a similar P-Value to

testing β9 = 0 in Model 1. β7 and β1* will be referred to as the SNP main effect, but are also

known as the cross-sectional effect. β8 and β1** will be referred to as the SNP*age interaction

effect and β9 and β1*** will be referred to as the SNP*age² interaction effect. To compare the

methods, the difference between -log10(pF) and -log10(pT) is summarized for each of the

data sets, where pF is the P-Value for testing the β coefficients in the one-step method (full

model) and pT is the P-Value for testing the β coefficients in the two-step method; specifically,

the standard deviation of the difference, SDdiff, will be investigated, so comparisons can be

made with the results presented in Sikorska et al [245]. In addition, the Pearson correlation

between the two methods for the ratio of the beta coefficient to its standard error will be presented. The

parameters are estimated using maximum likelihood; however, the conclusions were the same

with restricted maximum likelihood estimation.
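The two comparison measures can be written out as a short sketch. It assumes that the P-Values and the beta-to-standard-error ratios from both methods are already available as equal-length lists; the function names (`sd_diff`, `pearson`) are hypothetical.

```python
import math


def sd_diff(p_full, p_two_step):
    """Sample standard deviation of -log10(pF) - (-log10(pT))."""
    d = [-math.log10(pf) + math.log10(pt) for pf, pt in zip(p_full, p_two_step)]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / (len(d) - 1))


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)
```

An SDdiff close to zero together with a correlation close to one would indicate that the two-step approach reproduces the one-step results; note that SDdiff is unchanged if log10 is used in place of -log10 for both P-Values.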


3.3.2 Simulation Study

A simulation study was conducted using 1,000 parametric bootstrap data sets [221]. Each

dataset was generated to resemble the Raine Study data as much as possible. Data was

simulated according to Model 1 for 1,000 individuals measured on up to eight occasions (tij =

1, 2, 3, 6, 8, 10, 14, 17). The actual age of measurement was set to vary between individuals by

up to a year (i.e. individuals had measurements taken up to six months before or after a

birthday), which is representative of longitudinal studies. The fixed effects and variance-

covariance matrix were set to be similar to those from the SPLMM in Model 1 for males in the

Raine Study, excluding all terms containing the SNP (Table 3.1). Measurements were set to

missing at random so that 25% of the BMI measurements were missing, which is equivalent to

the proportion of missing data in the Raine Study under the assumption that all individuals

could have been measured yearly. The SNPs were generated by sampling genotypes from a

multinomial distribution (a value of 0, 1 or 2), with a minor allele frequency of 0.3. Four

combinations of effect sizes were investigated; the SNP main effect with two levels β7=0 or

β7=0.01 and the SNP*age interaction with two levels β8=0 or β8=0.001. These effect sizes were

chosen to be similar to that from the allelic score effect in Chapter 2 (i.e. a relatively large

genetic effect).
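The genotype and measurement-schedule parts of this simulation design can be sketched as below. The sketch assumes Hardy-Weinberg proportions for the genotypes and a uniform spread of measurement ages around each birthday, neither of which is stated above; the function names are hypothetical and the BMI values themselves (generated from Model 1) are not simulated here.

```python
import random

VISIT_AGES = (1, 2, 3, 6, 8, 10, 14, 17)  # target ages in years


def simulate_genotype(rng, maf=0.3):
    """Number of minor alleles (0, 1 or 2): two independent allele draws,
    i.e. Hardy-Weinberg proportions (an assumption of this sketch)."""
    return (rng.random() < maf) + (rng.random() < maf)


def simulate_schedule(rng, missing=0.25, jitter=0.5):
    """Ages at which one individual is measured: each visit is dropped at
    random with probability `missing`; kept visits fall up to `jitter`
    years either side of the target age (uniform spread assumed)."""
    ages = []
    for target in VISIT_AGES:
        if rng.random() < missing:
            continue  # measurement missing at random
        ages.append(target + rng.uniform(-jitter, jitter))
    return ages


rng = random.Random(2013)
genotypes = [simulate_genotype(rng) for _ in range(1000)]
schedules = [simulate_schedule(rng) for _ in range(1000)]
```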

Table 3.1: Parameter estimates from the Raine Study SPLMM model (Model 1) used to generate the data in the simulation study

Effect          Parameter     Value     Effect                   Parameter     Value
                in Model 2                                       in Model 2
Intercept       β0            2.795     SD(b0)                   σ0            0.118
Age             β1            0.030     SD(b1)                   σ1            0.011
Age²            β2            0.010     SD(b2)                   σ2            0.002
Age³            β3            -0.001    Cor(b0, b1)              ρ0            0.813
Age³: knot 1    β4            0.083     Cor(b0, b2)              ρ1            -0.704
Age³: knot 2    β5            -0.003    Cor(b1, b2)              ρ2            -0.352
Age³: knot 3    β6            0.005     SD(ε)                    σ             0.064
                                        Correlation structure    ρ             0.383


3.3.3 Chromosome 16 Analysis in the Raine Study

A subset of 1,461 individuals from the Raine Study were used in the analysis (see Chapter 1,

Section 1.6.1 for further details on the Raine Study); 753 males and 708 females, based on the

following inclusion criteria: at least one parent of European descent, unrelated to anyone else

in sample (one of every related pair, including multiple births, was selected at random), no

significant congenital anomalies, genome-wide genetic data, and at least one measure of BMI

throughout childhood. BMI was calculated from the weight and height measurements, with a

total of 8,670 BMI measures (median six measures per person, IQR: 5-7). A chromosome wide

analysis was conducted using the dosages from the imputed data on chromosome 16 to

investigate how well the two-step method works on the Raine Study data. Chromosome 16

was chosen as it contains the fat mass and obesity gene (FTO), the most replicated gene to

date for BMI in both adults and children [174,251]. It was therefore hypothesised

that some significant loci would be detected, specifically around the FTO gene, as well as many

non-associated SNPs. Models 1 and 2 for the chromosome 16 analysis included the first five

principal components for population stratification (see Section 1.6.1.3 for details) and a

sex*age interaction (where age is the spline function for age); given these would be included in

the GWAS analysis, they were included in this analysis also to ensure an accurate

representation of the GWAS results using the two approaches was achieved. Each SNP was

incorporated into the model assuming an additive genetic effect.

3.4 Results

3.4.1 Simulation Study Results

The simulation study showed that the two-step approach may be appropriate to select the

most significant SNP for further follow-up for the SNP main effect parameter; however, for the

SNP*age interaction term the results are largely different between the two methods. Figure

3.1 shows that when there is no effect of the SNP, i.e. β7=0, the two-step approach produces

fairly similar P-Values to the SPLMM for the SNP main effect; however, when there is a SNP

effect (β7=0.01) the two-step approach gives slightly smaller P-Values than the SPLMM (i.e.

more significant). This is ideal as although some of the most significant loci will be found to be

false positives when investigated in the full SPLMM, all SNPs (or the vast majority) that are

indeed significant are identified. In contrast, the SNP*age interaction effect produced very

different results using the two approaches. The concordance between the two P-Values is very

low, which can be seen by the large standard deviations for the difference and low correlations


presented in Table 3.2. Figure 3.2 shows that the P-Values for the two-step approach tend to

be smaller than the SPLMM when there is a SNP*age effect (i.e. β8=0.001), as seen with the

SNP main effect; however, there is a greater amount of variability between the two P-Values.

Figure 3.1: Comparison of the one and two-step approaches for the SNP main effect from

the 1,000 simulated data sets with different effect sizes for the SNP main effect and SNP*age

interaction effect; on the x-axis is the –log10(PF) and on the y-axis is the –log10(PT).


Table 3.2: Results from the 1,000 simulations. SDdiff is the standard deviation of the

difference between -log10(pF) [P-Value for testing the β coefficients in the one step method]

and -log10(pT) [P-Value for testing the β coefficients in the two-step method] for the 1,000

simulations. r2 is the Pearson correlation coefficient for the ratio of the beta coefficient to

the standard error.

                       β7=0                            β7=0.01
                       β8=0           β8=0.001         β8=0           β8=0.001
                       SDdiff  r2     SDdiff  r2       SDdiff  r2     SDdiff  r2
SNP main effect        0.24    0.93   0.24    0.94     0.36    0.94   0.38    0.93
SNP*age interaction    0.60    0.41   0.79    0.41     0.68    0.39   1.01    0.40


Figure 3.2: Comparison of the one and two-step approaches for the SNP*age interaction

effect from the 1,000 simulated data sets with different effect sizes for the SNP main effect

and SNP*age interaction effect; on the x-axis is the –log10(PF) and on the y-axis is the –log10(PT).

Interestingly, the SNP main effect appears to be unaffected by a significant SNP*age

interaction effect (i.e. SDdiff=0.24 for both β8 values in Table 3.2), whereas the SNP*age

interaction effect is affected by a significant SNP main effect (i.e. SDdiff=0.60 when β7=0 but

SDdiff=0.68 when β7=0.01). This is consistent with Verbeke et al [250] and Sikorska et al [245]

who both discuss that the longitudinal effect, or the SNP*age interaction effect, is affected by


the cross-sectional component of an LMM and hence propose the conditional linear mixed

model which estimates the longitudinal effect independent of the cross-sectional effect.

Figure 3.3 displays the β estimates and standard errors from the simulations with significant

SNP main effect and SNP*age interaction effect. It illustrates that the standard error is much

smaller in magnitude using the two-step approach for both the SNP main effect and the

SNP*age interaction. The β and standard errors in Figure 3.3 show little variability for the SNP

main effect term; however both estimates differ greatly for the SNP*age interaction effect

term, which leads to the differences in P-Values as seen in Figure 3.2. This is consistent for all

combinations of effect sizes for β7 and β8, hence just β7=0.01 and β8=0.001 are presented

here. Figure 3.3 illustrates why random effects are referred to as ‘shrinkage estimates’,

particularly for the SNP*age interaction effect, as the estimates are shrunk towards the

population average and hence the SNP effect is biased towards zero [252].


Figure 3.3: Comparison of the β and SE estimates using the one and two-step approaches for

the SNP main effect and SNP*age interaction effect from the 1,000 simulated data sets where

both the SNP main effect and SNP*age interaction effect were significant; on the x-axis are

the estimates (β and SE(β)) from the SPLMM and on the y-axis are the estimates from the

two-step approach.

As mentioned previously, there are several differences between the BMI data presented in this

thesis and the bone mineral density data presented by Sikorska et al [245]. These differences

include a complex function of age to model the BMI trajectory over time, high correlation

between the intercept and slope terms, and continuous auto-regressive errors. Therefore,

additional simulations were conducted to investigate the differences between these results

and those from Sikorska et al [245]. The following additional scenarios were simulated:


1. Low correlation between intercept and slope terms: the correlation between the

intercept and slope terms in the BMI model were relatively high (ρ0, ρ1 and ρ2 in Table

3.1), whereas the correlation between the intercept and linear trajectory from the

Sikorska et al [245] publication was only -0.140. For these additional simulations,

ρ0=0.1, ρ1=-0.1 and ρ2=-0.1.

2. Linear random effects: Sikorska et al were able to fit a simple linear trajectory to their bone mineral density data for both the fixed and random effects, whereas the random effects for the BMI data included a quadratic curve. Therefore, the age2 parameter was removed from the random effects to simulate the data (i.e. the same random effects as in Sikorska et al), with SD(b0) and SD(b1) the same as in Table 3.1. Adjustments to the models were made to incorporate this change; Model 1 became:

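\[
\begin{aligned}
\log(\mathrm{BMI}_{ij}) ={}& \beta_0 + \beta_1(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \beta_2\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^2}{2} + \beta_3\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^3}{6} \\
&+ \beta_4\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_1)^3}{6} + \beta_5\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_2)^3}{6} + \beta_6\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_3)^3}{6} \\
&+ \beta_7\mathrm{SNP}_i + \beta_8\mathrm{SNP}_i(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \beta_9\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^2}{2} + \beta_{10}\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^3}{6} \\
&+ \beta_{11}\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_1)^3}{6} + \beta_{12}\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_2)^3}{6} + \beta_{13}\mathrm{SNP}_i\frac{(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}-\kappa_3)^3}{6} \\
&+ b_{0i} + b_{1i}(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \varepsilon_{ij}
\end{aligned}
\]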

Model 2 was the subset of Model 1 without the SNP effects and Model 3 became:

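\[
\begin{aligned}
b_{0i} &= \beta_0^{*} + \beta_1^{*}\,\mathrm{SNP}_i + \varepsilon_i^{*} \\
b_{1i} &= \beta_0^{**} + \beta_1^{**}\,\mathrm{SNP}_i + \varepsilon_i^{**}
\end{aligned}
\]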

3. Cubic fixed effects: Given that the spline function in the SPLMM is complex, further simulations were conducted with a polynomial cubic function among the fixed effects instead. The random effects remained as a quadratic function. Model 1 was therefore:

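\[
\begin{aligned}
\log(\mathrm{BMI}_{ij}) ={}& \beta_0 + \beta_1(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \beta_2(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^2 + \beta_3(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^3 \\
&+ \beta_4\mathrm{SNP}_i + \beta_5\mathrm{SNP}_i(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \beta_6\mathrm{SNP}_i(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^2 + \beta_7\mathrm{SNP}_i(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^3 \\
&+ b_{0i} + b_{1i}(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + b_{2i}(\mathrm{Age}_{ij}-\overline{\mathrm{Age}})^2 + \varepsilon_{ij}
\end{aligned}
\]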


The random effects were extracted from the non-genetic model the same as

previously and used for Model 3.

4. Linear fixed and random effects: Similar to the random effects, a linear trajectory over

time was simulated with the intercept and age interacting with the SNP; β0 and β1

were the same as in Table 3.1. Given the random effects need to be a subset of the

fixed effects [253], a linear trajectory was also used in the random effects.

Adjustments to Model 1 were made to incorporate this change:

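\[
\begin{aligned}
\log(\mathrm{BMI}_{ij}) ={}& \beta_0 + \beta_1(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \beta_2\mathrm{SNP}_i + \beta_3\mathrm{SNP}_i(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) \\
&+ b_{0i} + b_{1i}(\mathrm{Age}_{ij}-\overline{\mathrm{Age}}) + \varepsilon_{ij}
\end{aligned}
\]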

The random effects were extracted from the non-genetic model the same as

previously and used for Model 3.

5. Independent correlation structure: Data were simulated assuming the within-subject errors were independently distributed; errors were therefore sampled from a normal distribution with specified variance σ² = 0.064².

All of the other parameters were held identical to those of the previous simulations (Table

3.1).
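The second step of the two-step approach (Model 3) amounts to an ordinary least squares regression of each extracted random effect on the SNP. The Python sketch below illustrates that step only. The function name, the toy data and the normal approximation used for the P-value are assumptions made for illustration, not the code used for the simulations.

```python
import math

def two_step_snp_test(random_effects, genotypes):
    """OLS of a per-individual random-effect estimate on SNP dosage (0, 1, 2)."""
    n = len(random_effects)
    mean_g = sum(genotypes) / n
    mean_b = sum(random_effects) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (b - mean_b)
              for g, b in zip(genotypes, random_effects))
    slope = sxy / sxx
    intercept = mean_b - slope * mean_g
    rss = sum((b - intercept - slope * g) ** 2
              for g, b in zip(genotypes, random_effects))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided P-value from a normal approximation (assumption; a t
    # distribution with n - 2 df would be used in practice).
    p_value = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, se, p_value

# Hypothetical slope estimates (b1i) and genotypes for eight individuals
genotypes = [0, 1, 2, 0, 1, 2, 0, 1]
b1 = [-0.02, 0.01, 0.03, -0.01, 0.00, 0.04, -0.03, 0.02]
print(two_step_snp_test(b1, genotypes))
```

The same function would be applied separately to each random effect (intercept, slope and, where present, the quadratic term).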

The standard deviations of the difference in P-Values between the one and two-step

approaches (SDdiff) from these additional simulations, along with the Pearson correlation of

the ratio between the beta coefficient and standard error, are presented in Table 3.3.

Comparing these results to those presented in Table 3.2, it is apparent that the simulations

with a linear trajectory among the fixed effects (similar to the model presented in Sikorska et

al [245]), rather than the complex spline function, reduced the difference in P-Values between

the two methods. When there was no SNP or SNP by age effect, for example, the SDdiff

reduced from 0.60 in the model with the full spline function to 0.48 with the cubic function

and 0.13 for the linear term. The model with a linear trajectory in the fixed and random effects

produced a similar SDdiff to that reported in Sikorska et al (i.e. 0.17) [245]. A reduction in the

SDdiff was also observed for the simulations with a linear trajectory in the random effects, but

the reduction was small. When the correlation between the intercept and slope parameters in

the random effects is low, the test of the SNP*age interaction effect is no longer affected by

the presence of a SNP main effect in the model (i.e. SDdiff=0.60 when β7=0 and SDdiff=0.58

when β7=0.01). In addition, the correlation structure for the within individual errors does not

seem to influence the difference in P-Values between the two methods.
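As a minimal sketch of how the two summary measures reported in Tables 3.2 and 3.3 can be obtained from paired results, the Python below computes SDdiff from two sets of P-values and the Pearson correlation from two sets of β/SE ratios. The function names and toy inputs are hypothetical, and the use of the sample (n-1) standard deviation is an assumption.

```python
import math
from statistics import stdev

def sd_diff(p_one_step, p_two_step):
    """SD of the difference between -log10(pF) and -log10(pT)."""
    diffs = [-math.log10(pf) + math.log10(pt)
             for pf, pt in zip(p_one_step, p_two_step)]
    return stdev(diffs)

def pearson_r(x, y):
    """Pearson correlation, e.g. of beta/SE ratios from the two methods."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy values (hypothetical, not thesis results)
p_f = [0.5, 0.01, 0.2, 0.003]
p_t = [0.4, 0.02, 0.1, 0.001]
ratio_f = [0.7, 2.6, 1.3, 3.0]
ratio_t = [0.8, 2.3, 1.6, 3.3]
print(sd_diff(p_f, p_t), pearson_r(ratio_f, ratio_t))
```

An SDdiff near zero together with a correlation near one indicates that the two approaches give essentially the same test for that term.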


Table 3.3: Results of the 1,000 simulations in the additional scenarios. SDdiff is the standard

deviation of the difference between -log10(pF) [P-Value for testing the β coefficients in the

one step method] and -log10(pT) [P-Value for testing the β coefficients in the two-step

method]. r2 is the Pearson correlation coefficient for the ratio of the beta coefficient to the

standard error.

                                                   β7=0                          β7=0.01
                                                   β8=0          β8=0.001        β8=0          β8=0.001
Scenario                                           SDdiff  r2    SDdiff  r2      SDdiff  r2    SDdiff  r2
Low correlation between intercept and slope terms  0.60    0.34  0.95    0.42    0.58    0.40  0.90    0.39
Linear random effects                              0.58    0.39  0.81    0.37    0.64    0.40  0.99    0.42
Cubic fixed effects                                0.48    0.67  0.70    0.65    0.63    0.63  0.83    0.66
Linear fixed and random effects                    0.13    0.98  0.27    0.98    0.32    0.98  0.22    0.98
Independent correlation structure                  0.56    0.34  0.79    0.38    0.72    0.40  0.98    0.38

3.4.2 Chromosome 16 SNPs in the Raine Study

There were 68,690 imputed SNPs on chromosome 16 with a MAF greater than 1% and

imputation quality (R2) greater than 0.3. The results from the chromosome 16 analysis were

consistent with the simulations. Figure 3.4 displays the P-Values for each of the SNP effects; it

can be concluded from these plots that the two-step approach is not appropriate for the

SNP*age and SNP*age2 interaction effects, as the P-Values are both under and overestimated.

The standard deviations for the difference also support this; SDdiff for the SNP main effect is

0.19, and 0.56 for both the SNP*age interactions. However, even the SDdiff for the SNP main

effect is larger than that from the real data example in Sikorska et al (SDdiff=0.117) [245].

Figure 3.5 compares the β and standard errors between the two approaches for each of the

SNP terms. When the β estimates using the SPLMM are large (i.e. either a strong negative or

positive effect), the two-step approach produces β estimates closer to zero. This indicates that

the random effects may be poor surrogates for the intercept and slope of a linear mixed model


and thus the genetic effects are biased towards zero. As observed in the simulation study, the

standard errors are consistently smaller using the two-step approach, and seem to be

increasingly underestimated as the standard error from the SPLMM increases. This pattern can

be seen for all the SNP effects; however, it appears to be far worse for the age interaction terms.

Figure 3.4: Comparison of the one and two-step approaches for each of the SNP effects

from analysis of the chromosome 16 data in the Raine Study; on the x-axis is the –log10(PF)

and on the y-axis is the –log10(PT).


Figure 3.5: Comparison of the β and SE estimates using the one and two-step approaches for

each of the SNP effects from the chromosome 16 analysis in the Raine Study; on the x-axis

are the estimates (β and SE(β)) from the SPLMM and on the y-axis are the estimates from the

two-step approach.

Focusing on the FTO SNP from Chapter 2, rs1121980, the P-Values for the SNP main effect are the same at P-Value(β7) = P-Value(β1*) = 0.0001, whereas for the age interaction terms the P-Values from the two-step approach are much smaller than those from the one-step approach (P-Value(β8)=0.0021, P-Value(β1**)=0.0002, P-Value(β9)=0.2411, P-Value(β1***)=0.0005). The

results from the SPLMM are more consistent with the results seen in Chapter 2 with the other

longitudinal methods, whereby the SNP*age2 effect in particular failed to reach nominal levels

of significance.


3.5 Discussion

In this Chapter it has been shown that the two-step approach, which has been suggested by several authors, is not precise for detecting SNP*age interactions for complex longitudinal phenotypes, such as BMI over childhood. As previously outlined, BMI in childhood is complex as

it has a non-linear trajectory over time, high correlation between the intercept and slope

terms, and non-normal, correlated (continuous auto-regressive) errors. Although this approach

has been successful in detecting genome-wide significant associations for both the SNP main

effect and SNP*time interaction effect in traits that have a linear trajectory [89], the results

shown here indicate that when the phenotype has a non-linear trajectory over time, the two-

step approach produces inaccurate SNP associations. This is perhaps due to the subset of

parameters in the random effects not being able to accurately summarize the full trajectory in

the fixed effects. In addition, when the correlation between the intercept and slope

parameters in the random effects is high, the SNP*age interaction effect using the two-step

approach is affected by a significant SNP main effect, whereby it is less accurate at detecting

an association. These results are consistent with Kerner et al who alluded to the fact that the

two-step approach is far from ideal and could potentially lead to biased results [85]. Therefore,

although this two-step method substantially reduces the computational time of a GWAS, for

complex phenotypes it may be detrimental to detecting genetic associations and is not

recommended.

3.6 Conclusions

It would be ideal to utilize an approach that is computationally efficient for the full GWAS as a

screening tool to select a subset of SNPs for further follow-up in a more complex model.

However, the results presented in this Chapter show that the two-step approach reported to date is not accurate for selecting SNPs with a significant SNP*age interaction when the phenotype

has a complex trajectory over time. Therefore, it is important to compare results from a one-

and two-step approach for a number of SNPs to confirm that the two-step approach is

accurate in detecting SNPs of interest for a particular phenotype, before conducting a full

GWAS. It is concluded that the two-step approach is not valid for BMI trajectories over childhood

and therefore the one-step approach will continue to be the focus for the remainder of this

thesis.


Chapter 4: Robustness Of The Linear Mixed Effects Model To Distribution Assumptions And Consequences For Genome-Wide Association Studies

4.1 Introduction

The results presented in this chapter have been submitted for review at Statistical Applications

in Genetics and Molecular Biology; the manuscript is included as an appendix (Appendix C).

As concluded in Chapter 2, the SPLMM is the most efficient for modelling childhood growth to

detect modest genetic effects. All four methods discussed in Chapter 2 were applied to the

ALSPAC data (outlined in Chapter 1, Section 1.6.2) to ensure this model was generalizable to

other cohorts. None of the model assumptions from the four methods were satisfied, partly

due to the two different sources of measurement used in this study. This chapter outlines the

issues with the model misspecification and a potential solution when conducting a GWAS.

4.2 Background

Over recent years, the study of population genetics has progressed from candidate gene and

linkage studies over relatively small regions of the genome, to whole genome association

analyses. These GWASs are designed to search the entire genome for SNPs that are associated

with a disease or trait of interest. If SNPs are found to be associated, they are then considered

to mark a region of the genome that influences the risk of disease or affects the levels of a

trait. In general, very small effects are expected and hence large sample sizes are required.

This advance in the scale of genetic analyses has transformed the field from hypothesis driven

research to a hypothesis-free approach, which has required additional statistical methods to

be developed to ensure there is a balance between acceptable levels of power and the chance

of inflating the type 1 error. Given the cost of conducting these studies, in terms of both

monetary costs for genotyping samples and computational costs for the analysis, it is

important that appropriate analyses are conducted from the outset.

To date, most of the GWASs have focused on case/control studies of particular diseases or

cross-sectional measurements of phenotypic traits. These study designs typically use relatively

simple statistical techniques, such as chi-square tests or linear (or logistic) regression models,

to investigate the association between a trait and each of the ~2.5 million SNPs. There are

now over 1,500 published studies focusing on 250 traits using analyses of this kind [48].

However, researchers are beginning to focus on more complex analyses to uncover additional

genetic loci and reduce the currently unexplained heritability of these traits. One area

requiring extension is to use longitudinal studies, with repeated measures on each individual in

the study, to understand how SNPs affect changes over time of a particular phenotype

[85,245,246]. There are several developed statistical methods commonly used for repeated

measures data to take into account the non-independence of measurements within an

individual. For continuous traits, the most popular statistical method is the LMM by Laird and

Ware [215]. This method can be computationally intensive as the model can account for linear

or non-linear trajectories for the outcome of interest over time, correlation between measures

at the starting point (intercept) and change over time (slope, or non-linear trajectory) within

an individual, and adjustment for both time-independent and time-dependent covariates.

Several methods are available to reduce this computational burden in GWASs [85,245,246],

most of which suggest a two-step approach whereby one models the phenotype in the mixed

effects model framework and then takes summary measures (e.g. the Best Linear Unbiased

Predictors [BLUPs] for the intercept and slope parameters of a linear, random-intercept

random-slope model) from this to analyse against the ~2.5 million SNPs. However, as

illustrated in Chapter 3, these data reduction methods are not appropriate for complex traits

that have non-linear trajectories over time; therefore these data reduction techniques will not

be discussed further.

In LMMs, the usual assumptions made about the random effects and error distributions

include: the random effects and error terms are normally distributed, the random effects are

independent of the error term, and the error term has homoscedastic variance [215]. In

studies that utilize this method to assess the association of a SNP with the trajectory, the fixed

effect estimates are often of most interest; the random effects and correlation structure at the

individual level are necessary to provide an accurate fit of the model to the data, in addition to

providing appropriate test statistics, but are treated as nuisance parameters and are often

difficult to interpret. There have been a number of studies investigating whether violations of


the assumptions about the random effects and error terms affect the maximum likelihood

inference of the fixed effect parameters and their variance estimates; several manuscripts

have shown that the fixed effects estimates are robust to non-Gaussian random effects

distribution [254,255], non-Gaussian or heteroscedastic error distribution [256] and that the

population fixed effects are robust to misspecified covariance structure [257], but the

individual level predictions are not [258]. Jacqmin-Gadda et al [256] showed the fixed effects

estimates are not robust to error variance that is dependent on a covariate in the model that

interacts with time. Liang and Zeger [259] demonstrated that a robust sandwich estimator

[260] can correct for biased variance estimates of the fixed effects when the covariance

structure is not correctly specified. To my knowledge, there has not been any investigation

into how any of these model misspecifications affect the power and type 1 error in high

dimensional studies, for example when running an LMM on a genome-wide scale, and what

the value of the robust variance estimator is in this context.

4.2.1 Aims

The aim of this study is to assess by simulations whether misspecification of the error term,

with either non-Gaussian error distributions or non-constant error variance, in a complex

longitudinal model with non-linear trajectories will affect: 1) the coverage rates of the 95%

confidence interval of the fixed effects parameter estimates; 2) the bias of the fixed effects

parameter estimates; 3) the statistical power to detect association; or 4) the type 1 error of

SNP detection in a GWAS. Differences in the conclusions due to MAF for the SNPs or sample

size of the investigated cohort were also examined.

4.3 Motivating Example

The Avon Longitudinal Study of Parents and Children (ALSPAC) [261] is a birth cohort study;

14,541 pregnant women in the former county of Avon, UK, were recruited into the study if

they had an expected delivery date between 1st April 1991 and 31st December 1992 (described

in detail in Chapter 1, Section 1.6.2). A subset of 7,916 participants were used for analysis

based on the following inclusion criteria: at least one parent of European descent, singleton

birth, unrelated to anyone in the sample, genome-wide genotype data, and at least one

measure of BMI throughout childhood. Participants have a median of nine BMI measurements

between 1 and 15 years of age (interquartile range 5-12, range 1-29 measurements). Figure 4.1

shows the BMI trajectory for 20 randomly selected individuals. There is a large amount of


variability between individuals for both intercept and slope, with slight curvature in the

trajectory and a nadir at approximately five to seven years of age.

Figure 4.1: Individual BMI trajectories for 20 females from ALSPAC

Figure 4.2 graphically shows how each of the BMI measurements in ALSPAC was taken. From

birth to five years, length and weight measurements were extracted from health visitor

records, with up to four measurements taken on average at six weeks and 10, 21, and 48

months of age. For a random 10% of the cohort, length and height measurements were taken

in eight research clinic visits, held between the ages of four months and five years of age. From

age seven years upwards, all children were invited to annual research clinics from ages seven to 11 and biannual research clinics thereafter. Details of the measuring equipment used in the clinics are described elsewhere [189]. In addition, parent-reported child height and weights were

also available from questionnaires (27% of measurements); only questionnaire data was

available around the nadir of the BMI trajectory, also known as the adiposity rebound, which

occurs between five and seven years of age in most children. Whilst the measurements from

routine health care have previously been shown to be accurate in this cohort [190], parental

report of children’s height tends to be over-estimated while weight tends to be under-

estimated [191].


Figure 4.2: BMI measurements over time, by measurement source, in ALSPAC

As seen in Figure 4.2, the variability of BMI increases over time, with more individuals reaching

BMI values of 30 or greater (the obese cut-off point in adults) at the later time points. This

increase of individuals with BMI>30 also induces a skewed distribution with a long right-hand

tail.

The primary research question is to identify SNPs that are associated with average BMI and

change in BMI over childhood and adolescence in the ALSPAC data. A LMM was used to

appropriately model the longitudinal trajectory over childhood, to account for the large

correlation between each of the random effects parameters, to adjust for additional covariates

such as the source of the height/weight measurements (clinic or questionnaire) and to allow

data to be missing at random across childhood. The general form of the model is as follows:

\[
Y_i = X_i\beta + Z_i b_i + \varepsilon_i \qquad (4.1)
\]

where Yi is the response vector for the ith individual, β is the vector of fixed effects and bi ~ N(0, Σ) is the vector of subject specific random effects, Xi and Zi are the fixed effect and random effect regressor matrices respectively and εi ~ N(0, σ2) is the within subject error

vector. When applying this model to the ALSPAC data, the best model fit included a cubic

polynomial of mean centred age (centred at eight years) in the fixed effects, a quadratic

polynomial of mean centred age in the random effects and a continuous autoregressive

correlation structure of order one for the covariance of the within-subject errors. Hence, the

final model for both females and males was:

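\[
\begin{aligned}
\mathrm{BMI}_{ij} ={}& \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \beta_4 \mathrm{MS}_{ij} + \beta_5 \mathrm{SNP}_i + \beta_6 t_{ij}\mathrm{SNP}_i \\
&+ \beta_7 t_{ij}^2 \mathrm{SNP}_i + \beta_8 t_{ij}^3 \mathrm{SNP}_i + b_{i0} + b_{i1} t_{ij} + b_{i2} t_{ij}^2 + \varepsilon_{ij}
\end{aligned}
\qquad (4.2)
\]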

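where MS is the measurement source (i.e. clinical visit or questionnaire) of individual i at time j, tij is the age (centred at eight years) and $\varepsilon_{ij} \sim N(0, R_i)$ such that:

\[
R_i = \sigma^2
\begin{pmatrix}
1 & \rho & \cdots & \rho^{j} \\
\rho & 1 & \cdots & \rho^{j-1} \\
\vdots & \vdots & \ddots & \vdots \\
\rho^{j} & \rho^{j-1} & \cdots & 1
\end{pmatrix}
\]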

Therefore β0 is the population intercept (i.e. mean BMI at age 8), β1, β2 and β3 are the fixed

effects for the cubic function of age, β4 is the effect of the measurement source, β5 is the change in the mean BMI at eight years of age for each additional copy of the minor allele, β6 is the SNP by linear age effect, and β7 and β8 are the SNP by quadratic and cubic age effects respectively.

Although this was not the optimal model selected in Chapter 2 for BMI growth trajectory

modelling, this simpler function to model the curvature over time was chosen so that the

effects from the simulation study would be relatively interpretable.
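To make the continuous autoregressive structure concrete, the sketch below builds a within-subject covariance matrix in which the correlation between two measurements decays with the time between them. It assumes the usual continuous AR(1) parametrisation, where the correlation between observations at times s and t is ρ raised to the power |s - t|. The function name and the example ages are hypothetical; the values of ρ and SD(ε) are those listed in Table 4.1.

```python
def car1_covariance(times, rho, sigma2=1.0):
    """Covariance matrix with entries sigma2 * rho ** |t_j - t_k| (assumed CAR(1) form)."""
    return [[sigma2 * rho ** abs(tj - tk) for tk in times] for tj in times]

# Ages 1, 2, 3 and 5 years, centred at eight years
times = [-7.0, -6.0, -5.0, -3.0]
cov = car1_covariance(times, rho=0.394, sigma2=1.063 ** 2)
for row in cov:
    print([round(v, 4) for v in row])
```

Unequally spaced visits are handled naturally under this form: the two-year gap between the last two ages gives a correlation of ρ squared rather than ρ.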

The residuals from this model fitted to the ALSPAC data are displayed in Figure 4.3. Figure 4.3A

shows that the residuals have fairly constant variance for the clinic measures (which includes

the Children in Focus and children’s health records), but those in Figure 4.3B show there is

greater variability for the questionnaire measures particularly around the adiposity rebound. It

is also evident that the model is unable to estimate the BMI values well for the questionnaire

measures (Figure 4.3D) and the residuals deviate further from the Gaussian distribution

assumption (Figure 4.3F).


Figure 4.3: Residual plots, by measurement source, for the LMM model fit to the ALSPAC

data


Due to the nature of the data collection, which is often intricate in large cohort based studies

such as ALSPAC, the model assumptions were not met due to the following:

1. The questionnaire measures have previously been shown to have greater variability

than clinic measured height and weight [191]; therefore the variability was dependent

on a covariate in the model.

2. There were only questionnaire measures available around the nadir of the trajectory

(also known as the adiposity rebound), which meant there was greater variability

around the rebound.

3. The variability within individuals changes over time; particularly with increased variability around puberty and into adolescence.

4. BMI also has a non-Gaussian error distribution. This is in part due to the increasing variability between individuals over time, with some individuals having rapidly increasing BMI while others remain relatively consistent.

In the following simulation study, the robustness of the maximum likelihood inference for the

fixed effects is investigated, along with the power and the type 1 error for detecting an

association with the SNP when the error distribution is misspecified due to the above

intricacies of the data.

4.4 Simulation Study

Extensive simulations were carried out to investigate the effects on the LMM when the error term (also called the level 1 residual, or the occasion-level residual) in the model was non-Gaussian or had a non-constant variance. In each of the simulation scenarios, we set the non-genetic fixed effects parameters (β0-β4 from model (4.2)) and the variance-covariance matrix

similar to those coming from the fitted model for BMI adjusting for the FTO rs1121980 SNP in

the ALSPAC study (Table 4.1). The measurement source, which is a fixed effect in the LMM and

used in the heteroscedastic error simulations, was a randomly generated binary variable for

each individual at each time point with distribution throughout the ages similar to the

distribution in ALSPAC (percent questionnaire measurements per follow-up year: year 1=40%,

year 2=20%, year 3=40%, year 4=10%, year 5=60%, year 6=99%, year 7=10%, year 10=10%,

year 13=30%; the remaining years had 0% questionnaire measurements).


The fixed effect estimation for various sample sizes, minor allele frequencies of the SNP and

the SNP effect sizes were also investigated:

1. Sample size: two levels; N=1,000 and N=3,000

2. MAF: four levels; 0.1, 0.2, 0.3 and 0.4

3. Effect sizes: two combinations; β5=0.6, β6=0.15, β7 = -0.000752 and β8 = -0.000380

(alternative hypothesis) or β5 = β6= β7 = β8 = 0 (null hypothesis). The alternative

hypothesis effect sizes for β5 and β6 were chosen to have 80% power to detect with

the larger sample size (N=3,000); the effect sizes for β7 and β8 were similar to those

coming from the fitted model for BMI adjusting for the FTO rs1121980 SNP in the

ALSPAC study.

Table 4.1: Parameter estimates from the ALSPAC non-genetic model used to generate the

data in the simulation study

Effect                  Parameter   Value
Intercept               β0          16.534
Age                     β1          0.400
Age2                    β2          0.056
Age3                    β3          -0.003
Source                  β4          -0.153
SD(b0)                  σ0          2.092
SD(b1)                  σ1          0.269
SD(b2)                  σ2          0.0235
Cor(b0, b1)             ρ0          0.820
Cor(b0, b2)             ρ1          -0.389
Cor(b1, b2)             ρ2          -0.092
SD(ε)                   σ           1.063
Correlation structure   ρ           0.394
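The parameters in Table 4.1 determine both the population mean curve and the covariance matrix of the random effects. The sketch below shows how these could be assembled. The function names are hypothetical, and coding the measurement source as 1 for questionnaire and 0 for clinic in the fixed effects is an assumption.

```python
# Values from Table 4.1
BETA = [16.534, 0.400, 0.056, -0.003]   # intercept, age, age^2, age^3
BETA_SOURCE = -0.153
SD = [2.092, 0.269, 0.0235]             # SD(b0), SD(b1), SD(b2)
COR = {(0, 1): 0.820, (0, 2): -0.389, (1, 2): -0.092}

def random_effects_cov(sd, cor):
    """Covariance matrix from standard deviations and pairwise correlations."""
    k = len(sd)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            r = 1.0 if i == j else cor[(min(i, j), max(i, j))]
            cov[i][j] = r * sd[i] * sd[j]
    return cov

def mean_bmi(age, questionnaire=0):
    """Population mean BMI at a given age (years); age is centred at eight."""
    t = age - 8.0
    return (BETA[0] + BETA[1] * t + BETA[2] * t ** 2 + BETA[3] * t ** 3
            + BETA_SOURCE * questionnaire)   # source coding is an assumption

print(random_effects_cov(SD, COR))
print([round(mean_bmi(a), 2) for a in (1, 5, 8, 15)])
```

At age eight the centred age is zero, so the clinic mean equals the intercept β0.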


4.4.1 Sampling Designs

Many longitudinal cohorts have different sampling designs, some with variable amounts of

missing time points and missing observations at each time point, and hence five different

sampling designs were investigated:

1. Sparse complete: ni=8 measures per person with few measures around the adiposity

rebound; times of measures are 1, 2, 3, 5, 8, 10, 13, 15

2. Intense complete: ni=14 measures per person with multiple measures around the

adiposity rebound; times of measures are 1, 2, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 9, 11, 13, 15

3. Equal unbalanced: ni=1 to 15 measures per person between 1 and 15 years with a

mean of nine measures (proportion of missingness = 0.4 across whole age range)

4. Unbalanced with more samples around the adiposity rebound: ni=1 to 15 measures

per person between 1 and 15 years with a mean of nine measures; proportion of

missingness around adiposity rebound of 0.2 and 0.45 outside the five to seven year

age range (average proportion of missingness over whole age range is 0.4)

5. Unbalanced with fewer samples around the adiposity rebound: ni=1 to 15 measures

per person between 1 and 15 years with a mean of nine measures; proportion of

missingness around adiposity rebound of 0.6 and 0.35 outside the five to seven year

age range (average proportion of missingness over whole age range is 0.4)

The first two designs with complete data at each follow-up assume that every individual had

the exact same age at follow-up (i.e. came into clinic on their birthday), whereas the other

three designs are more representative of longitudinal studies, where the actual age of

measurement varies between individuals by up to a year (i.e. came into clinic either six months

before or after a birthday). Data is assumed to be missing completely at random, or in other

words the probability that an observation is missing for a given individual is independent of all

other observed data. The proportion of missingness simulated across the whole range (i.e. 0.4)

was equivalent to the amount of missing data observed in ALSPAC under the assumption that

all individuals could have been measured yearly. A fully factorial design for the simulations, crossing the three data characteristics (sample size, MAF, effect size) with the five sampling designs, was used.
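The missingness proportions quoted for the unbalanced designs can be checked with a short calculation, and the same probabilities can be used to draw a visit schedule for one individual. The sketch below assumes yearly target visits at ages 1 to 15, treats ages five, six and seven as the rebound window, and jitters each attended visit uniformly within six months of the birthday. These choices, and the function names, are assumptions made for illustration.

```python
import random

AGES = list(range(1, 16))  # assumed yearly target visits, ages 1 to 15

def average_missingness(p_rebound, p_outside, window=(5, 7)):
    """Mean probability of a missing visit over the whole age range."""
    probs = [p_rebound if window[0] <= a <= window[1] else p_outside
             for a in AGES]
    return sum(probs) / len(probs)

def sample_visit_ages(p_rebound, p_outside, rng, window=(5, 7)):
    """Observed ages for one person, missing completely at random."""
    while True:
        ages = []
        for a in AGES:
            p_miss = p_rebound if window[0] <= a <= window[1] else p_outside
            if rng.random() >= p_miss:
                ages.append(a + rng.uniform(-0.5, 0.5))
        if ages:  # every person has at least one measure
            return ages

rng = random.Random(1)
print(average_missingness(0.2, 0.45), average_missingness(0.6, 0.35))
print(sample_visit_ages(0.2, 0.45, rng))
```

With three of the fifteen yearly visits inside the rebound window, both unbalanced designs average to the overall proportion of 0.4 stated above.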


4.4.2 Models for Data Generation

4.4.2.1 Standard Linear Mixed Model

Data were generated with Gaussian random effects and error distribution to validate the

estimation method.

4.4.2.2 Non Gaussian Error

Three error structures were investigated:

1. t-distribution: t with 5 degrees of freedom

2. skew-normal distribution: SN(1.063², 40)

3. Asymmetric mixture of two Gaussian distributions: 0.3N(-0.67, 1²) + 0.7N(0.5, 0.3²)
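A rough sketch of how errors could be drawn from these three distributions is given below. It reads the second argument of each Gaussian and of the skew-normal as a squared standard deviation, takes the skew-normal location to be zero, and applies no centring or rescaling. All of these are assumptions, as are the function names.

```python
import math
import random

def draw_t5(rng):
    """Student t with 5 df: standard normal over sqrt(chi-square(5) / 5)."""
    chi2 = rng.gammavariate(2.5, 2.0)  # chi-square(5) is Gamma(shape 5/2, scale 2)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 5.0)

def draw_skew_normal(rng, scale=1.063, shape=40.0):
    """Skew-normal, location 0 (assumed): scale * (d*|Z0| + sqrt(1 - d^2)*Z1)."""
    delta = shape / math.sqrt(1.0 + shape ** 2)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return scale * (delta * abs(z0) + math.sqrt(1.0 - delta ** 2) * z1)

def draw_mixture(rng):
    """0.3 N(-0.67, SD 1) + 0.7 N(0.5, SD 0.3)."""
    if rng.random() < 0.3:
        return rng.gauss(-0.67, 1.0)
    return rng.gauss(0.5, 0.3)

rng = random.Random(2013)
for draw in (draw_t5, draw_skew_normal, draw_mixture):
    sample = [draw(rng) for _ in range(10000)]
    print(draw.__name__, sum(sample) / len(sample))
```

Under these assumptions the skew-normal and mixture draws do not have mean zero, so a simulation would need to decide whether to centre them before adding them to the linear predictor.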

4.4.2.3 Heteroscedastic Error

Three cases were studied:

1. Variance dependent on a covariate: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{X_{ij}}$

where $\sigma_e^2 = 1.131$, a=1.5 and Xij=1 if the measure was from a questionnaire and 0 if the measure was from a follow-up clinic

2. Variance greater at the adiposity rebound: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{X_{ij}}$

where $\sigma_e^2 = 1.131$, a=1.5 and Xij=1 if the measure was between five and seven years and 0 if not

3. Variance increasing over time: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{t_{ij}}$

where $\sigma_e^2 = 1.131$ and a=1.15
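The three heteroscedastic cases can be illustrated with a few lines of arithmetic. The sketch below reads each variance as the baseline variance multiplied by a raised to the power of the covariate, and assumes that tij is age centred at eight years as in model (4.2). The function names are hypothetical.

```python
SIGMA2_E = 1.131  # baseline error variance from the text

def var_by_indicator(x, a=1.5):
    """Cases 1 and 2: variance is multiplied by a when the indicator x is 1."""
    return SIGMA2_E * a ** x

def var_over_time(t, a=1.15):
    """Case 3: variance grows by a factor a per year of centred age t (assumed)."""
    return SIGMA2_E * a ** t

print(var_by_indicator(0), var_by_indicator(1))
print([round(var_over_time(age - 8), 3) for age in (1, 8, 15)])
```

Under this reading, questionnaire measures (or measures taken at the rebound) have 1.5 times the error variance of the remaining measures.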

4.4.3 Data Generation

Performance estimates included coverage probability, power and type 1 error. Coverage

probability is defined as the proportion of simulations where the 95% confidence interval

around the parameter estimate contained the simulated parameter. Coverage probabilities

can indicate whether the confidence interval of the parameter(s) of interest is conservative (i.e. the coverage probability is larger than the nominal confidence level) or liberal (i.e. the coverage probability is smaller than the nominal confidence level). Power is defined as

the proportion of tests under the alternative hypothesis that reach genome-wide significance

of P-Value < 5×10⁻⁸. Type 1 error is defined as the proportion of tests under the null hypothesis

that reach significance of P-Value < 0.05. It would have been ideal to look at the type 1 error

rate at a genome-wide level also; however, this would have required a null dataset to be


simulated such that no allele was associated with the outcome and a genome-wide scan would

then be performed on this dataset, with these simulation and association steps repeated 5,000

times. This would require a large amount of computing time and was therefore deemed

infeasible. Investigating a P-Value of less than 0.05, in conjunction with a real data example,

should provide sufficient information regarding the effect of the model misspecification on the

type 1 error.
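The three performance measures defined above are simple proportions over the simulated datasets. The sketch below shows one way they could be computed; the function names and the toy inputs are hypothetical and are not taken from the thesis.

```python
def coverage_probability(intervals, true_value):
    """Proportion of (lower, upper) confidence intervals containing true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


def rejection_rate(p_values, threshold):
    """Proportion of P-Values strictly below the threshold."""
    return sum(1 for p in p_values if p < threshold) / len(p_values)


# Made-up inputs, for illustration only.
toy_intervals = [(0.50, 0.70), (0.55, 0.75), (0.61, 0.80), (0.40, 0.65)]
toy_alternative_p = [1e-9, 3e-8, 2e-7, 4e-10]
toy_null_p = [0.03, 0.20, 0.51, 0.77, 0.94]

coverage = coverage_probability(toy_intervals, true_value=0.6)
power = rejection_rate(toy_alternative_p, threshold=5e-8)  # genome-wide level
type1_error = rejection_rate(toy_null_p, threshold=0.05)   # nominal 0.05 level
```

Power and type 1 error use the same calculation; only the hypothesis under which the data were simulated and the significance threshold differ.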

It is important to report the uncertainty in any estimates from simulation based studies [262].

Therefore, Monte Carlo error (MCE) was calculated for coverage probabilities, bias, power and

type 1 error using the following confidence interval [263]:

$$P \pm 1.96\sqrt{\frac{P(1-P)}{S}}$$

where P is the α-level, for example P for coverage estimates is 0.95 and P for type 1 error is

0.05 and S is the number of simulations. The output from the simulations was then assessed as

to whether they fell within this confidence interval.
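As a worked example, the Monte Carlo error interval can be evaluated directly. This sketch assumes the usual binomial form of the interval (P plus or minus 1.96 standard errors); the function name is hypothetical. The values of S are the combined simulation counts given in the table captions (4,000 for coverage and 20,000 for type 1 error).

```python
import math


def mce_interval(p, n_sims, z=1.96):
    """Return (lower, upper) limits of p +/- z * sqrt(p * (1 - p) / n_sims)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n_sims)
    return p - half_width, p + half_width


coverage_limits = mce_interval(0.95, 4000)   # 95% coverage, 4,000 simulations
type1_limits = mce_interval(0.05, 20000)     # alpha = 0.05, 20,000 simulations
```

For coverage, this gives limits of roughly 94.32% and 95.68%, which are the cut-offs quoted in Section 4.5.1.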

We simulated 1,000 datasets under the alternative hypothesis (β5=0.6 and β6=0.15) to look at

coverage probabilities, bias and power and 5,000 datasets under the null hypothesis (β5=0 and

β6=0) to look at type 1 error at α=0.05. The number of simulations for each hypothesis was

determined so that the MCE was appropriate. Each SNP (coded as 0, 1, and 2) was

incorporated into the model assuming an additive genetic model, whereby each additional

minor allele increases BMI by an equal amount. The primary interest was estimating the SNP

main effect, β5, which represents the increase in mean BMI at eight years of age for each

additional copy of the minor allele and the SNP by age effect, β6, which represents the effect

on the mean linear increase of BMI (slope) for each additional minor allele. All analyses were

conducted in R version 2.12.1 [222] using the nlme package.


4.4.4 Calculating Robust Standard Errors and Global Wald Tests

As mentioned in Section 4.1, the robust sandwich estimate [260] can correct the biased

variance estimates of the fixed effects when the covariance structure is not correctly specified.

Therefore, a robust standard error was calculated for each fixed effect parameter and

corresponding P-Value. The following formula was used:

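$$\left(X'V^{-1}X\right)^{-}\left(\sum_{i=1}^{S} X_i'V_i^{-1}\varepsilon_i\varepsilon_i'V_i^{-1}X_i\right)\left(X'V^{-1}X\right)^{-}$$

Where:

X is the fixed effect regressor matrix from equation 4.1

V is the variance of Y from equation 4.1

$\varepsilon_i = y_i - X_i\hat{\beta}$

S is the number of subjects and i is the ith subject

the superscript $-$ denotes a generalised inverse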
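The sandwich calculation can be sketched for a small example using only the Python standard library. This is an illustration under stated assumptions and not the R code used in the thesis: the helper names are hypothetical, the per-subject matrices are supplied directly rather than extracted from a fitted model, and an ordinary matrix inverse is used for the outer terms, which assumes the design matrix has full column rank.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sandwich_covariance(x_blocks, v_blocks, resid_blocks):
    """Robust covariance of the fixed effects: bread * meat * bread."""
    p = len(x_blocks[0][0])
    bread_inv = [[0.0] * p for _ in range(p)]
    meat = [[0.0] * p for _ in range(p)]
    for x_i, v_i, e_i in zip(x_blocks, v_blocks, resid_blocks):
        xt_vinv = matmul(transpose(x_i), inverse(v_i))   # X_i' V_i^{-1}
        bread_inv = add(bread_inv, matmul(xt_vinv, x_i))  # sums to X' V^{-1} X
        score = matmul(xt_vinv, [[e] for e in e_i])       # p x 1 column vector
        meat = add(meat, matmul(score, transpose(score)))
    bread = inverse(bread_inv)
    return matmul(matmul(bread, meat), bread)


# Made-up data: three subjects, two measures each, intercept and age columns.
toy_x = [[[1.0, 0.0], [1.0, 1.0]],
         [[1.0, 0.0], [1.0, 2.0]],
         [[1.0, 1.0], [1.0, 3.0]]]
toy_v = [[[1.0, 0.0], [0.0, 1.0]] for _ in toy_x]
toy_resid = [[0.5, -0.5], [-1.0, 0.2], [0.3, 0.4]]
toy_cov = sandwich_covariance(toy_x, toy_v, toy_resid)
```

The robust standard errors are the square roots of the diagonal of the returned matrix. Because the outer and inner terms are accumulated subject by subject, the cost grows with both the number of subjects and the number of measures per subject.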

In addition to the fixed effects parameters with robust standard errors, a Wald test was

conducted to assess whether the overall SNP effect was affected by the misspecification. The

Wald test was estimated using the General Linear Hypothesis approach [264]. This approach is

based on the normal approximation for maximum likelihood estimators using the estimated

variance-covariance matrix. The hypothesis can be specified through a constant matrix L to be

matched with the fixed effects of the model such that H0: Lβ = m where the m are the

hypothesized values. The estimates of the fixed effects, $\hat{\beta}$, asymptotically follow a multivariate

normal distribution $N(\beta, \mathrm{cov}(\hat{\beta}))$ by the Central Limit Theorem such that the linear

form also asymptotically follows a multivariate normal distribution:

$$L\hat{\beta} \sim N\left(L\beta,\; L\,\mathrm{cov}(\hat{\beta})\,L'\right)$$

Thus the 95% confidence interval and corresponding P-Value for the hypothesized value can be

obtained accordingly. Testing to evaluate whether the parameters for the SNP were

simultaneously equal to zero was then conducted. It is computationally intensive to calculate a

robust estimate for the Wald test; for example, the robust standard error for the fixed effects

takes approximately 7 minutes for the rs1121980 FTO SNP in the ALSPAC data whereas the

robust standard error for the global Wald test takes approximately an additional 3 minutes.

These computational times decrease exponentially as sample size and the number of repeated

measures per individual decrease; however, they may not be scalable to a GWAS. To

investigate whether a robust standard error would be beneficial for the global Wald test, we


selected the scenario where the inflation was greatest and calculated the robust

estimates for all the simulations in this scenario.
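For the global test of the two SNP parameters, L selects the SNP main effect and the SNP*age interaction and the hypothesized values are zero, so the Wald statistic reduces to a quadratic form in the two estimates. The sketch below assumes a chi-square reference distribution with two degrees of freedom, for which the P-Value has a closed form; the thesis does not state which reference distribution was used, and the function name and the example covariance matrix are hypothetical.

```python
import math


def global_wald_test_2df(beta_snp, beta_snp_age, cov):
    """Wald test of H0: both SNP parameters equal zero.

    cov is the 2 x 2 (classical or robust) covariance matrix of the two
    estimates, i.e. L cov(beta_hat) L' for an L that picks out these two rows.
    Returns the Wald statistic and its chi-square (2 df) P-Value.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form x' cov^{-1} x using the closed-form 2 x 2 inverse.
    w = (d * beta_snp ** 2
         - (b + c) * beta_snp * beta_snp_age
         + a * beta_snp_age ** 2) / det
    p_value = math.exp(-w / 2.0)  # chi-square survival function with 2 df
    return w, p_value


# Made-up estimates and covariance, for illustration only.
wald_stat, wald_p = global_wald_test_2df(
    0.6, 0.15, [[0.01, 0.0005], [0.0005, 0.0004]]
)
```

Passing the corresponding 2 x 2 block of the sandwich covariance instead of the model-based covariance gives the robust version of the test.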

4.5 Results for Simulated Data

4.5.1 Coverage Probabilities

Coverage probabilities for the 95% confidence interval of the fixed effects parameter estimates

from each of the simulations are presented in Table 4.2. No consistent differences were seen

across the range of MAFs, so the results from each of the simulated datasets were combined

for ease of presentation; however the coverage probabilities for each of the MAFs are

presented in Appendix D, Tables 1-5.

The coverage probabilities of the SNP main effects parameter for all simulations appear to be

unaffected by the error misspecifications; only nine of 70 coverage probabilities were

significantly different from 95%, that is less than 94.32% or greater than 95.68%, five of which

were from the simulations where the error variance increases over time.

Thirty-one of the 70 coverage probabilities (44%) for the SNP*age interaction parameter were

significantly different from 95%, with both the non-Gaussian and heteroscedastic error

distributions being affected. When the error followed a t-distribution, the coverage

probabilities for the confidence interval of the SNP*age interaction parameter were less than

95% in all designs except the sparse complete scenario. Similarly, the SNP*age interaction

parameter had coverage probabilities less than 95% when the error followed a skew-normal

distribution, although only in the unbalanced designs with missing data. The coverage

probabilities were also less than 95% when the error variance was dependent on a covariate or

increased over time, in both the complete and unbalanced designs. All the coverage

probabilities that significantly differ from 95% for the SNP*age interaction parameter have

underestimated variance estimates and thus confidence intervals that were too narrow, which

could lead to test statistics that are too liberal.

4.5.2 Bias

Bias for the fixed effect parameter estimates is presented in Table 4.3 (complete scenarios)

and Table 4.4 (unbalanced scenarios). No consistent differences were seen in the bias

estimates across the range of minor allele frequencies; however, the 95% confidence intervals


for the difference between the simulated parameter and the true parameter were tighter as

the sample size and minor allele frequency increased. The bias for each of the minor allele

frequencies is presented in Appendix D, Tables 6-10.

The SNP main effect and the SNP*age interaction parameters are unbiased in the majority of

the simulations, indicating that the misspecifications in the error distribution do not affect the

estimates of the βs. Only nine of the 140 95% confidence intervals did not cover zero; these nine

confidence intervals were across the range of error distributions and designs, indicating that

no one scenario was particularly biased.

Table 4.2: Coverage rates of the 95% confidence intervals of the fixed effects; bold and

underlined cells are those that are significantly different from the nominal 95% based on

4,000 simulations under each design (1,000 simulations for each MAF combined into one

summary statistic).

Sampling Design (two columns each, in order): Sparse Complete; Intense Complete; Equal Unbalanced; Unbalanced with more samples around the adiposity rebound; Unbalanced with less samples around the adiposity rebound

Sample Size 1,000 3,000 1,000 3,000 1,000 3,000 1,000 3,000 1,000 3,000

Gaussian Distribution

SNP 95.43 95.03 95.08 95.45 94.83 95.23 95.08 94.70 95.40 94.73

SNP*age 95.00 95.23 94.58 95.13 94.35 94.63 94.30 93.90 94.53 94.35

t-distribution

SNP 95.45 95.35 95.90 94.55 95.13 94.85 94.65 94.48 95.48 94.95

SNP*age 95.30 94.80 94.05 94.13 94.45 94.10 93.70 94.00 93.33 94.03

Skew-normal Distribution

SNP 94.90 95.03 95.18 95.10 95.05 94.25 95.43 94.95 94.85 94.75

SNP*age 95.68 95.18 94.63 94.65 93.88 93.73 94.73 94.13 93.90 93.55

Mixture of 2 Gaussian Distributions

SNP 94.85 94.98 94.83 95.65 95.00 95.08 94.53 95.40 94.33 94.48

SNP*age 95.05 94.78 95.03 94.60 95.20 94.08 94.58 95.20 94.80 94.10

Variance dependent on a covariate

SNP 94.93 95.05 95.83 95.35 94.43 94.75 94.70 94.93 94.98 94.93

SNP*age 94.93 95.03 94.95 94.15 94.03 94.10 93.75 94.53 93.95 93.93

Variance greater at adiposity rebound

SNP 94.75 95.23 95.15 95.08 94.25 95.20 95.48 95.35 95.23 94.43

SNP*age 94.05 94.45 95.38 95.38 94.13 94.43 94.00 94.75 94.73 94.60

Variance increasing over time

SNP 94.20 95.00 94.08 94.38 94.80 94.33 94.30 94.03 95.98 95.48

SNP*age 94.10 94.88 91.78 92.38 94.70 94.23 93.28 93.48 95.65 95.25


Table 4.3: Bias and 95% confidence interval for the complete designs; bold and underlined cells are those whose confidence interval does not cover zero

based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

Sampling Design (two columns each, in order): Sparse Complete; Intense Complete

Sample Size N=1,000 N=3,000 N=1,000 N=3,000

Gaussian Distribution

SNP 0.0012 (-0.0026,0.0051) 0.0012 (-0.0010,0.0035) -0.0044 (-0.0083,-0.0005) 0.0022 (0.0000,0.0044)

SNP*age 0.0002 (-0.0005,0.0008) 0.0002 (-0.0002,0.0006) -0.0004 (-0.0010,0.0002) -0.0001 (-0.0004,0.0003)

t-distribution

SNP 0.0002 (-0.0038,0.0042) -0.0007 (-0.0030,0.0016) -0.0008 (-0.0048,0.0032) 0.0005 (-0.0019,0.0028)

SNP*age -0.0001 (-0.0008,0.0007) -0.0002 (-0.0006,0.0002) 0 (-0.0007,0.0007) 0 (-0.0005,0.0004)

Skew-normal Distribution

SNP -0.0027 (-0.0065,0.0012) -0.0002 (-0.0025,0.0020) 0.0009 (-0.0030,0.0048) 0.0008 (-0.0014,0.0030)

SNP*age 0.0002 (-0.0005,0.0008) 0 (-0.0004,0.0004) 0.0003 (-0.0003,0.0009) 0.0001 (-0.0003,0.0004)

Mixture of 2 Gaussian Distributions

SNP -0.0002 (-0.0042,0.0037) -0.0016 (-0.0039,0.0006) -0.0005 (-0.0045,0.0034) -0.0007 (-0.0029,0.0015)

SNP*age 0 (-0.0005,0.0005) -0.0003 (-0.0006,0.0000) -0.0003 (-0.0008,0.0003) 0 (-0.0003,0.0003)

Variance dependent on a covariate

SNP 0.0004 (-0.0036,0.0044) -0.0002 (-0.0024,0.0021) -0.0041 (-0.008,-0.0002) 0.0023 (0.0000,0.0045)

SNP*age -0.0001 (-0.0008,0.0006) -0.0001 (-0.0005,0.0003) -0.0002 (-0.0008,0.0005) 0.0001 (-0.0002,0.0005)

Variance greater at adiposity rebound

SNP 0.0002 (-0.0037,0.0042) -0.0004 (-0.0027,0.0019) -0.0001 (-0.0041,0.0039) -0.0025 (-0.0048,-0.0003)

SNP*age 0.0002 (-0.0005,0.0009) -0.0001 (-0.0006,0.0003) 0.0004 (-0.0003,0.0010) -0.0003 (-0.0007,0.0000)

Variance increasing over time

SNP -0.0009 (-0.0049,0.0031) -0.0045 (-0.0067,-0.0022) -0.0014 (-0.0055,0.0027) 0.0011 (-0.0013,0.0034)

SNP*age 0.0003 (-0.0004,0.0011) -0.0004 (-0.0008,0.0000) -0.0003 (-0.0010,0.0004) 0.0002 (-0.0002,0.0006)

Table 4.4: Bias and 95% confidence interval for the unbalanced designs; bold and underlined cells are those whose confidence interval does not cover zero

based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

Sampling Design (two columns each, in order): Equal Unbalanced; Unbalanced with more samples around the adiposity rebound; Unbalanced with less samples around the adiposity rebound

Sample Size N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000

Gaussian Distribution

SNP -0.0013 (-0.0052,0.0026) -0.0017 (-0.0040,0.0005) 0.0006 (-0.0033,0.0046) 0.0008 (-0.0015,0.0031) 0.0020 (-0.0019,0.0059) -0.0001 (-0.0024,0.0022)

SNP*age -0.0002 (-0.0009,0.0004) -0.0003 (-0.0007,0.0000) -0.0001 (-0.0008,0.0005) 0 (-0.0004,0.0003) -0.0003 (-0.0009,0.0004) 0.0004 (0.0000,0.0008)

t-distribution

SNP 0.0009 (-0.0031,0.0049) 0.0021 (-0.0003,0.0044) -0.0003 (-0.0044,0.0037) 0.0004 (-0.0019,0.0027) 0.0037 (-0.0003,0.0078) 0.0036 (0.0012,0.0059)

SNP*age 0.0002 (-0.0005,0.0010) 0.0001 (-0.0003,0.0006) 0.0002 (-0.0006,0.0009) 0.0005 (0.0000,0.0009) 0.0008 (0.0000,0.0015) 0.0004 (0.0000,0.0009)

Skew-normal Distribution

SNP 0.0009 (-0.0030,0.0047) -0.0006 (-0.0029,0.0017) -0.0003 (-0.0042,0.0035) 0.0029 (0.0006,0.0051) -0.0021 (-0.0060,0.0018) -0.0017 (-0.0040,0.0006)

SNP*age 0.0004 (-0.0002,0.0011) -0.0003 (-0.0007,0.0001) 0 (-0.0007,0.0006) 0.0005 (0.0001,0.0009) -0.0004 (-0.0010,0.0003) -0.0002 (-0.0006,0.0001)

Mixture of 2 Gaussian Distributions

SNP -0.0036 (-0.0075,0.0003) 0.0014 (-0.0008,0.0037) 0 (-0.0040,0.0040) -0.0016 (-0.0038,0.0006) -0.0017 (-0.0057,0.0022) -0.0022 (-0.0044,0.0001)

SNP*age -0.0007 (-0.0012,-0.0001) 0.0003 (0.0000,0.0006) -0.0001 (-0.0006,0.0005) -0.0002 (-0.0005,0.0001) 0.0001 (-0.0005,0.0006) -0.0001 (-0.0004,0.0003)

Variance dependent on a covariate

SNP 0.0005 (-0.0034,0.0045) -0.0006 (-0.0029,0.0018) -0.0008 (-0.0048,0.0032) 0.0027 (0.0005,0.0050) -0.0022 (-0.0062,0.0017) -0.0002 (-0.0025,0.0021)

SNP*age 0 (-0.0007,0.0007) 0.0001 (-0.0003,0.0005) -0.0004 (-0.0011,0.0003) 0.0003 (-0.0001,0.0007) -0.0005 (-0.0012,0.0002) -0.0001 (-0.0005,0.0003)

Variance greater at adiposity rebound

SNP 0.0014 (-0.0027,0.0056) 0.0015 (-0.0008,0.0038) 0.0009 (-0.0031,0.0049) -0.0011 (-0.0034,0.0012) 0.0017 (-0.0022,0.0056) 0.0008 (-0.0015,0.0031)

SNP*age 0 (-0.0007,0.0007) -0.0001 (-0.0005,0.0003) 0.0006 (-0.0001,0.0013) -0.0001 (-0.0005,0.0003) 0.0004 (-0.0003,0.0010) -0.0002 (-0.0005,0.0002)

Variance increasing over time

SNP -0.0006 (-0.0046,0.0034) -0.0002 (-0.0025,0.0022) 0.0009 (-0.0031,0.0049) -0.0012 (-0.0035,0.0012) 0.0018 (-0.0022,0.0057) 0.0002 (-0.0021,0.0025)

SNP*age -0.0001 (-0.0009,0.0006) -0.0002 (-0.0006,0.0002) -0.0002 (-0.0010,0.0006) -0.0003 (-0.0007,0.0002) 0.0004 (-0.0003,0.0011) -0.0001 (-0.0005,0.0003)

4.5.3 Power

Effect sizes for the alternative hypothesis (β5=0.6 and β6=0.15) were chosen to have 80%

power with a MAF of 0.4 and sample size of 1,000 when the error from the fitted LMM follows

a Gaussian distribution with constant variance. Therefore, the power for all error distributions

and MAFs in the simulations with sample size of 3,000 was greater than 80%; hence this

section will only discuss power for the simulations with a sample size of 1,000. Power for the

SNP main effect and SNP*age interaction parameters is displayed in Figure 4.4 (complete

designs) and Figure 4.5 (unbalanced designs).

As expected, the power increases with the MAF. Interestingly, the simulations where the error

distribution was assumed to have a t-distribution had lower power for both the SNP main

effect and the SNP*age interaction parameters than the simulations assuming a Gaussian error

distribution. This pattern was consistent across all the sampling designs; however it appears

that the power is slightly closer to that of the error with the Gaussian distribution when there

is more data around the adiposity rebound (i.e. the intense complete and unbalanced with

more samples around the adiposity rebound). In addition, for simulations where the error

distribution follows a skew-normal distribution, the power for both the SNP and SNP*age

interaction parameters was slightly higher than those with the Gaussian error.

When investigating the different error variance structures, the power for the SNP main effect

parameter across all MAFs was slightly lower than the power when the constant variance

assumption was met. Likewise, for the SNP*age interaction parameter, all of the error variance

structures led to lower power than when the constant variance assumption was met.

However, simulations under the unbalanced designs where the variance increased over time

were the most affected and had notably reduced power until a MAF of approximately 0.3.


Figure 4.4: Simulated power of the SNP main effect and SNP*age interaction terms for

complete designs. The two plots on the left are for the Sparse Complete design, while the

two plots on the right are from the intense complete design.


Figure 4.5: Simulated power of the SNP main effect and SNP*age interaction terms for

unbalanced designs, where “Equal” is the simulations from the Equal Unbalanced design,

“Over” are the simulations from the unbalanced design with less samples around the

adiposity rebound and “Under” are the simulations from the unbalanced design with more

samples around the adiposity rebound.


4.5.4 Type 1 Error

As observed with the coverage probabilities, no consistent differences in type 1 error were

evident across the MAF range, and hence the results from each of the simulated datasets were

combined for ease of presentation; however the type 1 error for each of the MAFs tested are

given in Appendix D, Tables 11-15.

As seen in Table 4.5, the type 1 error for the complete designs remained within acceptable

limits of the nominal alpha level. Inflation for the SNP by age interaction parameter was

observed in several cases, but this inflation was reduced to nominal levels by using a robust

standard error.

Table 4.6 shows that the type 1 error for the SNP by age interaction was often inflated under

the unbalanced designs. However, by using a robust standard error, the inflation can be

reduced to nominal levels in the majority of cases; approximately 75% of the inflated effects

were reduced. The design where the robust standard error did not seem to have an effect was

when the error variance increased over time; only 20% of the estimates were reduced to

nominal levels under this design. Interestingly, the robust standard error did not appear to

affect the type 1 error for the scenarios that were not originally inflated.

To declare significance in a GWAS, several thresholds are commonly used: suggestive

association, significant association and highly significant association. Duggal et al. define

suggestive associations as SNPs that reach a P-Value threshold under the assumption that one

false positive association is expected per GWAS [265]; SNPs reaching this threshold are often

taken forward to a replication stage. In the context of the simulation study, this definition

would equate to a P-Value of 0.00005 (1/20,000; where 20,000 is the number of simulations

per design and error assumption). The scenario with the highest type 1 error inflation using the

classical standard error was for the SNP*age interaction under the intense design where the

error variance increased over time (0.0746 for both N=1,000 and 3,000). In this scenario, six

SNPs would falsely reach the definition of ‘suggestive association’ for the SNP*age interaction

parameter when using the classical standard error with a sample size of 1,000 individuals. In

contrast, when the model assumptions are met, that is when the error distribution follows a

Gaussian distribution with constant variance, only two SNPs met the ‘suggestive association’

threshold, indicating an inflation in the type 1 error for the simulations where the variance


increased over time due to the misspecification of the error term. When using the robust

standard error under the increasing variance over time design, one SNP would meet the

criteria, showing not only a reduction in the type 1 error from the seven SNPs seen with the

classical standard error, but also a reduction in power in comparison to the model where the

assumptions were met.

Table 4.5: Type 1 error for complete designs; bold and underlined cells are those that are

significantly different from the nominal α=0.05 based on 20,000 simulations under each

design (5,000 simulations for each MAF combined into one summary statistic).

Sampling Design (four columns each, in order): Sparse Complete; Intense Complete

Sample Size (two columns each) N=1,000 N=3,000 N=1,000 N=3,000

Standard Robust Standard Robust Standard Robust Standard Robust

Global Wald test rows list the standard test only (one value per design and sample size), except under variance increasing over time, where standard and robust values are both given.

Gaussian Distribution

SNP 0.0514 0.0528 0.0509 0.0513 0.0502 0.0521 0.0500 0.0510

SNP*age 0.0483 0.0504 0.0483 0.0491 0.0549 0.0486 0.0539 0.0467

Global Wald test 0.0497 0.0478 0.0605 0.0620

t-distribution

SNP 0.0495 0.0498 0.0489 0.0496 0.0479 0.0510 0.0483 0.0502

SNP*age 0.0521 0.0534 0.0487 0.0492 0.0581 0.0490 0.0563 0.0465

Global Wald test 0.0531 0.0508 0.0624 0.0629

Skew-normal Distribution

SNP 0.0502 0.0517 0.0524 0.0524 0.0509 0.0526 0.0525 0.0532

SNP*age 0.0503 0.0519 0.0461 0.0474 0.0541 0.0508 0.0529 0.0486

Global Wald test 0.0493 0.0488 0.0621 0.0579

Mixture of 2 Gaussian Distributions

SNP 0.0498 0.0504 0.0479 0.0479 0.0485 0.0499 0.0510 0.0508

SNP*age 0.0502 0.0510 0.0492 0.0488 0.0528 0.0506 0.0529 0.0495

Global Wald test 0.0498 0.0508 0.0615 0.0586

Variance dependent on a covariate

SNP 0.0523 0.0527 0.0488 0.0490 0.0485 0.0511 0.0459 0.0485

SNP*age 0.0546 0.0527 0.0531 0.0514 0.0520 0.0493 0.0524 0.0481

Global Wald test 0.0515 0.0525 0.0556 0.0546

Variance greater at adiposity rebound

SNP 0.0472 0.0478 0.0511 0.0519 0.0477 0.0493 0.0471 0.0490

SNP*age 0.0528 0.0497 0.0570 0.0528 0.0513 0.0513 0.0491 0.0487

Global Wald test 0.0527 0.0540 0.0502 0.0478

Variance increasing over time

SNP 0.0523 0.0536 0.0471 0.0473 0.0543 0.0513 0.0561 0.0522

SNP*age 0.0564 0.0538 0.0522 0.0491 0.0746 0.0528 0.0746 0.0530

Global Wald test 0.0875 0.0549 0.0875 0.0497 0.1667 0.0506 0.1685 0.0506


Table 4.6: Type 1 error for unbalanced designs; bold and underlined cells are those that are significantly different from the nominal α=0.05 based on 20,000

simulations under each design (5,000 simulations for each MAF combined into one summary statistic).

Sampling Design (four columns each, in order): Equal Unbalanced; Unbalanced with more samples around the adiposity rebound; Unbalanced with less samples around the adiposity rebound

Sample Size (two columns each) N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000

Standard Robust Standard Robust Standard Robust Standard Robust Standard Robust Standard Robust

Global Wald test rows list the standard test only (one value per design and sample size), except under variance increasing over time, where standard and robust values are both given.

Gaussian Distribution

SNP 0.0518 0.0532 0.0500 0.0508 0.0503 0.0521 0.0478 0.0490 0.0529 0.0550 0.0540 0.0542

SNP*age 0.0581 0.0526 0.0592 0.0531 0.0566 0.0514 0.0556 0.0496 0.0560 0.0511 0.0575 0.0509

Global Wald test 0.0646 0.0598 0.0601 0.0615 0.0621 0.0609

t-distribution

SNP 0.0510 0.0522 0.0491 0.0497 0.0485 0.0500 0.0495 0.0505 0.0487 0.0499 0.0516 0.0523

SNP*age 0.0571 0.0487 0.0629 0.0539 0.0596 0.0508 0.0571 0.0475 0.0563 0.0487 0.0577 0.0489

Global Wald test 0.0607 0.0621 0.0620 0.0583 0.0587 0.0605

Skew-normal Distribution

SNP 0.0493 0.0508 0.0495 0.0501 0.0498 0.0517 0.0473 0.0481 0.0512 0.0519 0.0482 0.0484

SNP*age 0.0548 0.0492 0.0589 0.0526 0.0580 0.0512 0.0571 0.0498 0.0593 0.0532 0.0547 0.0490

Global Wald test 0.0618 0.0582 0.0616 0.0583 0.0632 0.0575

Mixture of 2 Gaussian Distributions

SNP 0.0519 0.0527 0.0490 0.0490 0.0505 0.0510 0.0519 0.0517 0.0510 0.0522 0.0487 0.0483

SNP*age 0.0534 0.0517 0.0487 0.0459 0.0510 0.0494 0.0538 0.0518 0.0543 0.0511 0.0551 0.0517

Global Wald test 0.0579 0.0581 0.0605 0.0603 0.0589 0.0568

Variance dependent on a covariate

SNP 0.0495 0.0515 0.0482 0.0491 0.0498 0.0513 0.0506 0.0509 0.0512 0.0518 0.0528 0.0502

SNP*age 0.0586 0.0499 0.0607 0.0514 0.0576 0.0505 0.0588 0.0497 0.0605 0.0507 0.0604 0.0507

Global Wald test 0.0589 0.0611 0.0597 0.0567 0.0620 0.0583

Variance greater at adiposity rebound

SNP 0.0493 0.0504 0.0492 0.0495 0.0486 0.0498 0.0516 0.0526 0.0506 0.0514 0.0496 0.0531

SNP*age 0.0570 0.0491 0.0563 0.0483 0.0546 0.0482 0.0563 0.0503 0.0561 0.0483 0.0600 0.0505

Global Wald test 0.0572 0.0559 0.0568 0.0541 0.0588 0.0569

Variance increasing over time

SNP 0.0533 0.0545 0.0500 0.0502 0.0564 0.0563 0.0530 0.0520 0.0491 0.0523 0.0500 0.0526

SNP*age 0.0554 0.0536 0.0571 0.0540 0.0643 0.0570 0.0610 0.0527 0.0497 0.0534 0.0497 0.0513

Global Wald test 0.0911 0.0576 0.0929 0.0529 0.1031 0.0578 0.1011 0.0548 0.0850 0.0559 0.0801 0.0520

The global Wald test, which is assessing whether there is any genetic effect on the whole BMI

growth trajectory, was inflated above the acceptable limits under all error variance

misspecifications and even under the Gaussian/constant variance assumption, except under

the sparse complete design. The scenario where the error variance increased over time

showed the largest inflation; however, with the robust estimates the type 1 error of the Wald test under this

scenario was also reduced to nominal levels in most designs, and where it was not reduced to nominal

levels it was dramatically lower than with the classical test (Table 4.6).

4.5.5 Type 1 Error in Unbalanced Designs Versus Complete Designs

Inflation in the type 1 error of the SNP*age interaction was observed in the simulations where

the error term followed a Gaussian, constant variance distribution, primarily under the

unbalanced designs rather than the complete designs. This could be due to the missing data in

the unbalanced designs or the variability in timing of the measurements (i.e. the samples were

measured at any time throughout a year rather than at an exact time). Therefore, additional

simulations under the Gaussian, constant variance distribution were conducted using two of

the sampling designs:

1. Sparse: ni=8 measures per person with few measures around the adiposity rebound;

times of measures are 1, 2, 3, 5, 8, 10, 13, 15

2. Intense: ni=14 measures per person with multiple measures around the adiposity

rebound; times of measures are 1, 2, 3, 3.5, 4, 4.5 ,5 ,5.5 ,6 ,7, 9, 11, 13, 15

However, rather than using only complete data as in the previous simulations, the

following combinations were simulated:

1. Complete in all individuals and they were all measured at the same time (complete,

same age)

2. Complete in all individuals but they were measured at different times within each year

period (complete, different age)

3. Each individual had 40% missing data over the time period, but all were measured at

the same time (missing, same age)

4. Each individual had 40% missing data over the time period and were measured at

different times within each year period (missing, different age)
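The four combinations can be sketched as a function that returns the measurement ages for one individual. Several details here are assumptions made for illustration and are not taken from the thesis: missingness is implemented by dropping a fixed 40% of the scheduled occasions at random, and variable timing is implemented as a uniform shift of up to half a year either side of the scheduled age. The function and parameter names are hypothetical.

```python
import random

SPARSE_AGES = [1, 2, 3, 5, 8, 10, 13, 15]
INTENSE_AGES = [1, 2, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 9, 11, 13, 15]


def measurement_ages(schedule, missing, different_age, rng,
                     prop_missing=0.4, max_shift=0.5):
    """Return one individual's measurement ages under a given scenario."""
    ages = list(schedule)
    if missing:
        # Assumption: exactly prop_missing of the occasions are dropped.
        n_keep = len(ages) - round(prop_missing * len(ages))
        ages = rng.sample(ages, n_keep)
    if different_age:
        # Assumption: timing varies uniformly around the scheduled age.
        ages = [age + rng.uniform(-max_shift, max_shift) for age in ages]
    return sorted(ages)


rng = random.Random(42)  # arbitrary seed
complete_same = measurement_ages(SPARSE_AGES, False, False, rng)
complete_different = measurement_ages(SPARSE_AGES, False, True, rng)
missing_same = measurement_ages(SPARSE_AGES, True, False, rng)
missing_different = measurement_ages(INTENSE_AGES, True, True, rng)
```

Crossing the two flags gives the four scenarios, so the separate contributions of missing data and variable measurement times to the type 1 error can be compared.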


The results from these simulations can be found in Figure 4.6 for the sparse design and Figure

4.7 for the intense design. These simulations provide evidence that this inflation was due to

the presence of missing data rather than the different measurement times

between individuals, with greater inflation seen in the larger sample size but no obvious

differences between MAFs.

Figure 4.6: Results from comparison between missing data or variable measurement time

under the sparse design


Figure 4.7: Results from comparison between missing data or variable measurement time

under the intense design

Since the LME is known to be robust to missing data under the missing at random and missing

completely at random assumptions, we simulated additional data varying the polynomial

function of age in the fixed and random effects. These simulations showed the type 1 error

was reduced to nominal levels when the fixed and random effects had the same function of

age, i.e. cubic function in both the fixed and random effects (Table 16 in Appendix D).

To determine whether there is remaining inflation in the type 1 error after modelling the same

function of age in the fixed and random effects when the error distribution is misspecified we

simulated additional data using the equal unbalanced sampling design (see Figure 1 of


Appendix D for outline of additional simulations). These simulations showed that the type 1

error was again reduced to nominal levels when the fixed and random effects had the same

function of age regardless of the misspecification in the error distribution (Table 17 in

Appendix D).

It is often difficult to estimate higher order terms in the random effects when using real data

due to computational and convergence issues. In this case, it is often only possible to fit a

lower-order polynomial function in the random effects than the fixed effects. We simulated

additional data where the fixed and random effects included a quadratic function for age but

we analysed the data with a quadratic function in the fixed effects and a linear function in the

random effects. In addition, we also simulated data where the fixed effects included a

quadratic function for age and the random effects included only a linear function but analysed

the data with a quadratic function in both the fixed and random effects. These simulations

showed that the type 1 error was inflated when the analysis model had a lower-order

polynomial function in the random effects than in the fixed effects (Table 18 in

Appendix D).

These additional simulations also showed that having the same structure of fixed and random

terms for the age polynomial function would yield nominal type 1 errors for the global Wald

test.

In summary, it is recommended that one includes the same polynomial function for age in the

fixed and random effects to avoid inflation in the type 1 error; however, if this is not possible

due to non-convergence of the model then a robust standard error is required to reach

nominal levels of type 1 error.

Given that many researchers investigating GWAS of longitudinal traits are interested in only

the SNP main effect and not the SNP*age interaction [91], we conducted some additional

simulations without the SNP*age interaction. Once again, we used the scenario where the

error variance increased over time and where there was equal unbalance in the data structure.

We found that the type 1 error was within the nominal range for the SNP main effect for both

sample sizes (N=1,000: 0.0506; N=3,000: 0.0515), where previously we saw inflation for the

sample size of 1,000 (0.0533 from Table 4.6). We have no reason to believe that any of the


other scenarios would be affected by the misspecifications when the SNP*age interactions are

not modelled.

4.5.6 Power Using the Robust Standard Error

We have shown that using the robust standard error does not affect those situations where the

type 1 error was not initially inflated. However, before adopting the robust standard error for a

GWAS analysis, it is important to determine whether using the robust standard error would

decrease the power to detect a statistically significant association.

The power for the SNP main effect parameter remains almost unchanged when using the

robust standard error rather than the normal standard error in all scenarios and under all

model misspecifications (Figure 4.8 for complete designs and Figure 4.9 for unbalanced

designs). The only scenario where the power decreased for the SNP main effect parameter by

using the robust standard error was where there was increasing variance over time under the

intense complete scenario. Given that the type 1 error was not inflated using either standard

error estimate, there appears to be no harm in using a robust standard error for estimation

even when not required.

The power for the SNP*age interaction parameter, particularly for low MAF, is considerably

more variable. Under the sparse complete design, where there was no inflation in the type 1

error, the power remains about the same using either the classical or robust standard error.

For the other designs, the power for the SNP*age interaction parameter decreases using the

robust standard error, but only by 5% or less for most error misspecifications, when the MAF

was 0.2 or greater. The simulations which assumed a t-distribution for the error had a 5-10%

reduction in power using the robust standard error when the MAF was 0.1 or 0.2; this might be due

to the substantial reduction in type 1 error. The power also decreases by greater than 5%

when the variance is greater at the adiposity rebound and the variance is dependent on a

covariate, for values of MAF around 0.1 in the scenarios presented.


Figure 4.8: Difference in power based on a normal standard error versus a robust standard

error for the complete designs. A positive value indicates the power using the normal

standard error is greater than the power using the robust standard error. The two plots on

the left are for the Sparse Complete design, while the two plots on the right are from the

intense complete design.


Figure 4.9: Difference in power based on a normal standard error versus a robust standard

error for the unbalanced designs. A positive value indicates the power using the normal

standard error is greater than the power using the robust standard error. Here, “Equal” is

the simulations from the Equal Unbalanced design, “Over” are the simulations from the

unbalanced design with fewer samples around the adiposity rebound and “Under” are the

simulations from the unbalanced design with more samples around the adiposity rebound.


4.6 Analysis of Chromosome-Wide BMI Data

While the simulated data provided a useful platform for testing the effect of the error

misspecification in LMMs in a controlled setting, it is important to also investigate how this

relates to real data. Given the simulation results, in particular the need for a robust

standard error to ensure accurate inference for the SNP*age interaction where the type 1

error is inflated, the impact of the distribution assumption problems was investigated in a real

data application on the ALSPAC data.

For each SNP in the ALSPAC data, fitting the LMM and computing the robust tests for the fixed

effects and the global test takes approximately 30 minutes. Therefore, to conduct a genome-wide

analysis on the imputed ALSPAC data using this model would take approximately 2.5 years.

This was deemed to be too large a computational burden, so the genotyped data on one

specific chromosome was used instead. The fat mass and obesity gene (FTO) on chromosome

16 is the most replicated gene to date for association with BMI in both adults and children

[174]. In addition to the cross-sectional associations with BMI, it has also been shown to be

associated with childhood growth in ALSPAC and other birth cohorts [251]. This chromosome

was therefore selected for the analysis and it was hypothesised that some significant loci

would be detected, specifically around the FTO gene, as well as many non-associated SNPs.

We used the same LMM model as in equation 4.2, with the inclusion of an age*sex interaction

in the fixed effects for all the age components (i.e. $\beta_9\,\mathrm{sex}_i + \beta_{10}\,t_{ij}\,\mathrm{sex}_i + \beta_{11}\,t_{ij}^2\,\mathrm{sex}_i + \beta_{12}\,t_{ij}^3\,\mathrm{sex}_i$)

to account for the differences in growth between males and females [197]. There were 14,875

SNPs genotyped on chromosome 16, all of which had a MAF greater than 1%; GWASs are

designed to look at common SNPs, so it is a common strategy to exclude SNPs with MAF less

than 1%. Each SNP was incorporated into the model assuming an additive genetic model.
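The MAF filter and the additive coding can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele, which is the usual additive coding; the function names and the example genotypes are hypothetical and are not taken from the ALSPAC analysis.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF for genotypes coded as 0, 1 or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]  # skip missing calls
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)              # the minor allele is the rarer one


def passes_maf_filter(genotypes, threshold=0.01):
    """True when the SNP is common enough to keep (MAF of at least threshold)."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical genotypes for ten individuals (illustration only).
example = [0, 1, 0, 2, 0, 0, 1, 0, 0, 0]
print(minor_allele_frequency(example))  # 4 copies out of 20 alleles
print(passes_maf_filter(example))
```

Under the additive model the same 0/1/2 value enters the LMM as a single numeric covariate, so each extra copy of the allele shifts the outcome by the same amount.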

As expected, SNPs in the FTO gene were highly significant for the global tests as well as the

SNP main effect and SNP*age interactions. It is common to display P-Values from a GWAS

analysis as a QQ plot of the observed –log10(P-Value) against the expected –log10(P-Value) under

the null distribution. Figure 4.10 displays a Q-Q plot from the chromosome 16 analysis in

ALSPAC for each of the parameters which displayed inflated levels of type 1 error in the

simulation study. As the 88 SNPs within the FTO region are believed to be true positives, the


QQ plots are also displayed excluding SNPs from this region (Figure 4.10, C and D). In addition,

Figure 4.11 displays the QQ plot for each of the parameters involving the SNPs from the

chromosome 16 analysis in ALSPAC, excluding the SNPs in the FTO gene. These QQ plots clearly

show that where the parameters have inflated type 1 error using the classical test, including

the global SNP test, the SNP*age and SNP*age^3 P-Values, the robust test reduces this to

nominal levels. These plots also indicate that if the classical test is not inflated it may be

dangerous to use the robust test as it artificially induces inflation. When using the robust tests,

associations with SNPs in the FTO gene were still detected, both for the SNP main effect and

the SNP*age interactions.
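As a rough illustration of how the points on such a QQ plot are obtained, the sketch below pairs sorted observed values with their expected values under a uniform null. The plotting position (i - 0.5)/n is one common convention and is an assumption here; the thesis does not state which convention was used.

```python
import math


def qq_points(p_values):
    """Pair each observed -log10(P) with its expected value under the null."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null, P-Values are uniform on (0, 1); the i-th smallest of n
    # is expected near (i - 0.5) / n (assumed plotting position).
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical P-Values (illustration only).
for expected, observed in qq_points([0.5, 0.05, 0.005]):
    print(round(expected, 3), round(observed, 3))
```

Points lying above the line of equality across the bulk of the distribution indicate inflation, while departure only in the upper tail is what a small number of true associations would produce.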

In the chromosome wide analysis, the P-Value to declare ‘suggestive significance’ would be

0.000067 (1/14,875). Using this threshold, 57 SNPs would reach suggestive significance for the

SNP by age interaction using the classical standard error in comparison to only 16 SNPs using

the robust standard error. Six of these 16 SNPs were in the FTO gene, four of which would

reach the significant threshold.
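The suggestive threshold quoted above is simply one over the number of tests. The sketch below reproduces that arithmetic; the Bonferroni function is included only for comparison and is an assumption, since the text does not state how the 'significant' threshold was defined.

```python
def suggestive_threshold(n_tests):
    """One expected false positive per scan: 1 / number of tests."""
    return 1.0 / n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Assumed comparison: family-wise error rate alpha split over all tests."""
    return alpha / n_tests


n_snps = 14875  # SNPs analysed on chromosome 16
print(f"{suggestive_threshold(n_snps):.6f}")  # 0.000067
print(f"{bonferroni_threshold(n_snps):.2e}")  # 3.36e-06
```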

There is still some remaining inflation in the global Wald test and SNP*age interaction,

however it is suspected this might be due to additional regions of chromosome 16 being

associated with BMI trajectory. There are two regions, in addition to FTO, that have been

shown to be associated with adult BMI which may show some association in these analyses. It

would require conducting a full GWAS analysis to get an accurate estimate of the inflation in

ALSPAC, which will be discussed further in Chapter 5.


Figure 4.10: Q-Q plot of the chromosome 16 analysis in ALSPAC for the overall Wald test and

the SNP*linear age interaction test. Plots A and B include 88 SNPs in the FTO gene, Plots C

and D exclude SNPs in the FTO gene.


Figure 4.11: Q-Q plot of the chromosome 16 analysis in ALSPAC for all parameters, excluding

88 SNPs from the FTO gene.


4.6.1 Comparison Between the Classical and Robust Tests

For the SNP*age and the SNP*age^3 interactions, where the greatest inflation in the type 1

error was seen, the majority of the P-Values for the robust tests were larger than the P-Values

using the classical test. This can be seen by the bow-shaped curve in Figure 4.12. This was not

the case for the global Wald test, where 36% of the robust P-Values were less than the

classical P-Values; the Wald test also displayed greater variability between the two tests

(Figure 4.12). The difference between the classical and robust P-Values for the SNP main effect

appears to fall into two groups: those that are fairly consistent between the two P-Values

and those where the classical P-Value is larger than the robust P-Value. To investigate these

groups further, the P-Values were examined by MAF. Figure 4.13 shows that, at low MAFs,

the robust estimates for some SNPs deviate from the classical estimates and may not be

accurate. In contrast, at the higher MAFs the robust P-Values were almost identical to the

classical P-Values. Although it is often the case that the robust P-Value is larger than the

classical P-Value, it is not a necessity and there have been several publications indicating that

the robust test is more ‘consistent’ than the classical test [267,268].

Focusing on the ten most significant SNPs using the classical Wald test and the robust Wald

test, only two SNPs were in common. The fixed effects terms showed more consistency

between the two tests, with six of the top 10 being significant using both the classical and

robust tests for the SNP*age interaction.


Figure 4.12: Comparison of the classical and robust tests for each of the parameters of

interest from the chromosome 16 analysis in ALSPAC


Figure 4.13: Comparison of the classical and robust tests for the SNP main effect by minor

allele frequency (MAF) from the chromosome 16 analysis in ALSPAC


4.7 Discussion

In this Chapter, longitudinal data was simulated that mimicked childhood BMI to explore the

coverage probability, bias, power and type 1 error for association with a SNP when the linear

mixed effects model is misspecified with either a non-Gaussian error distribution or

heteroskedastic error. We have shown that the type 1 error for the SNP*age interaction terms

in a genetic association study has no inflation if the same function of age is included in both

the fixed and random effects. However, type 1 error is inflated, regardless of the model

misspecification, if the age function in the fixed and random effects differs. In situations where

the model is too complex and will not converge with a high order polynomial function in the

random effects, an appropriate way to deflate the type 1 error to nominal levels is to use a

robust standard error for the fixed effects parameters. Although robust standard errors have

previously been used in a wide range of statistical applications, LMMs are only just beginning

to be utilized in GWASs and therefore guidance on their application was warranted. Given that

QQ plots in GWASs are an important diagnostic to rule out the possibility of population

stratification, it is essential to generate standard errors that perform well under the null

hypothesis so that any remaining inflation is not due to the model fitting. Similar to the

conclusions by Gurka et al [269] and Verbeke and Molenberghs [270], the sandwich estimator

is a valid alternative when the model assumptions are misspecified; however, it is less efficient

than using the correct covariance model.
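The idea behind the sandwich estimator can be seen in a much simpler setting than the LMM. The sketch below is an assumption-laden illustration only: it uses ordinary least squares with a single predictor and the basic heteroskedasticity-consistent (HC0) form, not the cluster-robust calculation for mixed models used in this Chapter, and the toy data are invented so that the residual size grows with distance from the mean of x.

```python
import math


def slope_standard_errors(x, y):
    """Return (slope, classical SE, robust HC0 SE) for y = a + b*x by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    d = [xi - x_bar for xi in x]                      # centred predictor
    sxx = sum(di * di for di in d)
    slope = sum(di * (yi - y_bar) for di, yi in zip(d, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical: one common residual variance for every observation.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # Sandwich: each observation contributes its own squared residual.
    se_robust = math.sqrt(sum((di * e) ** 2 for di, e in zip(d, resid))) / sxx
    return slope, se_classical, se_robust


# Toy data (hypothetical): errors are largest at the extremes of x.
x = [-3, -2, -1, 0, 1, 2, 3]
noise = [-3, 2, 1, 0, 1, 2, -3]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
slope, se_classical, se_robust = slope_standard_errors(x, y)
print(slope, se_classical, se_robust)
```

In this toy example the classical standard error is smaller than the robust one, which mirrors the pattern described above: when the variance assumption is wrong, the classical interval is too narrow.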

Similar to Jacqmin-Gadda et al [256], results in this Chapter have shown that estimates of

differences in slope by the number of copies of minor allele are sensitive to heterogeneous

error variance, particularly when the error variance depends on a covariate or increases over

time. The variance of the estimates is underestimated and therefore the confidence interval is

too narrow; this is consistent with the inflated type 1 error under these misspecified model

assumptions.

Of all the misspecifications investigated, error variance that increases over time and is not

accounted for in the modelling gave poor parameter estimates, low power and

the most inflation of the type 1 error, particularly for the SNP*age interaction terms. It also

appears that by using the robust standard error, the inflation in the type 1 error is reduced to

the nominal level in only some of the scenarios. It is therefore imperative that some


adjustment is made in the modelling to account for this increasing variance over time. In the

ALSPAC BMI data, the variance stays relatively constant until around the age of four years,

when it rapidly increases until around 11 years of age before plateauing again. This is due to

the different growth rates between individuals through the adiposity rebound and puberty.

Increasing variability over time is observed with many other phenotypes both in childhood and

adulthood; for example, lung function in an elderly population can decrease due to the rate at

which individuals are diagnosed with diseases such as chronic obstructive pulmonary disease,

while other individuals remain healthy. Variance functions for modelling heteroscedasticity in

mixed effects models have been studied in detail by Davidian and Giltinan [271] and can be

implemented using the varFunc classes in the nlme package in R [252]. There are also

equivalent functions in alternative statistical packages such as MLwiN [272]. The use of these

variance functions could be recommended in the context of GWASs, if there is remaining

heteroscedasticity in the residuals after appropriately modelling the fixed and random effects;

however further studies are needed to assess their properties in this context.

When looking at SNPs with low MAFs, it seems that by using the robust standard error the

power is reduced by approximately 5%. To counteract this reduction, studies can increase the

sample size through the use of meta-analysis of multiple cohort studies as is commonly done in

GWAS analyses. However, several manuscripts have previously discussed the extended

computational time for longitudinal GWAS in comparison to GWAS of cross-sectional

phenotypes, so it is recommended that large computing clusters are available to those cohort

studies conducting analyses. The longitudinal GWAS of cardiovascular risk factors presented in

Smith et al [246] took approximately 3 hours on 64 processors of a compute cluster for

600,000 tests in 525 individuals. Sikorska et al [245] illustrated that the analysis of 2.5 million

SNPs using the lme function in the nlme package of R would take 3,500 hours for a sample size

of 3,000 individuals on a desktop computer (Intel(R) Core(TM) 2 Duo CPU, 3.00 GHz). These

times are consistent with those in this study; the chromosome 16 analysis of 14,875 SNPs in

the 7,916 ALSPAC individuals took approximately 125 hours on 32 processors of a compute

cluster (BlueCrystal Phase 2 cluster with each node having four 2x2.8 GHz core processors and

8 GB of RAM).
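A back-of-the-envelope scaling of the chromosome 16 run time shows where a multi-year estimate for a genome-wide analysis comes from. The sketch assumes that run time grows linearly with the number of SNPs on the same 32 processors, and it assumes an imputed panel of 2.5 million SNPs (the figure used in the Sikorska et al example); neither assumption is stated for ALSPAC in the text.

```python
HOURS_PER_YEAR = 24 * 365.25


def scaled_wall_time(hours_observed, snps_observed, snps_target):
    """Scale wall-clock time linearly with the number of SNPs tested."""
    return hours_observed * snps_target / snps_observed


# Chromosome 16 run reported above: 14,875 SNPs in about 125 hours.
# 2.5 million SNPs is an assumed size for an imputed genome-wide panel.
hours = scaled_wall_time(125, 14875, 2_500_000)
years = hours / HOURS_PER_YEAR
print(round(hours), round(years, 1))
```

Under these assumptions the estimate is roughly 21,000 hours, or about 2.4 years, which is in line with the approximately 2.5 years quoted in Section 4.6.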

It has been suggested that the genome-wide significance threshold should be set at

$5 \times 10^{-8}$ [273,274]. In addition, Duggal et al [265] established an appropriate P-Value threshold


based on the number of independent SNP tests in a GWAS. If study data is imputed against the

HapMap CEU population, they suggest a threshold of P-Value < $6.09 \times 10^{-6}$ be used to select

SNPs with suggestive evidence for follow-up. Many cross-sectional GWASs use thresholds

around this, generally ranging from P-Value < $5 \times 10^{-6}$ [72] to P-Value < $1 \times 10^{-5}$ [179], to select

SNPs for replication. In longitudinal genetic association studies, particularly those with

complex, non-linear trajectories, controlling the type 1 error of the many parameters involving

SNP effects can be quite challenging. This would be the case, for example, when smoothing

spline functions are used and those functions interact with the SNP effects. Providing

robust standard errors in this context can be difficult. As an alternative, it may be plausible to

use genomic control procedures to reduce a possible inflation in the type 1 error for the

parameters involving the SNP effects [40,275]. Genomic control is typically used in genetic

association studies to account for the potential confounding due to cryptic relatedness. It

makes the assumption that the inflation in type 1 error is constant across all markers in the

genome; this assumption is plausible in the context of cryptic relatedness as the inflation is

due to the kinship coefficients which are unrelated to the individual loci. In the context of

LMMs, one would need to show that the inflation was uniform across the genome or genetic

region of interest. Benke et al [247] suggested using a joint test of all SNP effects, similar to the

global Wald test used in the current study, as an optimal way to control the type 1 error and

increase power. However, caution needs to be applied when utilizing this method for complex

traits, such as BMI trajectories over childhood, and a genome-wide significance threshold

should only be used if there is no inflation detected in the type 1 error. Benke et al [247] used

a trait with a linear decrease over time and low correlation between the intercept and slope

parameters; in contrast, in this study there is a complex trajectory over time with high

correlation between the intercept and slope parameters, which indicated that the joint test

has inflated type 1 error and can only be reduced using a robust estimate in some scenarios.

Caution needs to be taken when using the robust test for the global test as the analysis of

chromosome 16 in ALSPAC showed large variability between the classical and robust global

tests, which also led to different ‘top hits’ depending on which test was used.
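The genomic control procedure mentioned above can be sketched as follows for single-parameter tests. This is a minimal illustration under stated assumptions: each P-Value is treated as coming from a 1 degree of freedom chi-square test, a single inflation factor is applied to every marker (the uniformity assumption discussed above), and the function names are hypothetical.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def chi2_1df_from_p(p):
    """Convert a two-sided P-Value (0 < p <= 1) to a 1-df chi-square statistic."""
    return _NORMAL.inv_cdf(1 - p / 2) ** 2


def genomic_inflation(p_values):
    """Lambda: observed median test statistic over its null expectation."""
    return median(chi2_1df_from_p(p) for p in p_values) / CHI2_1DF_MEDIAN


def genomic_control(p_values):
    """Divide every statistic by lambda (when lambda > 1) and recompute P."""
    lam = max(1.0, genomic_inflation(p_values))
    adjusted = []
    for p in p_values:
        z = (chi2_1df_from_p(p) / lam) ** 0.5
        adjusted.append(2 * (1 - _NORMAL.cdf(z)))
    return lam, adjusted
```

Because one factor is applied everywhere, this correction is only appropriate if the inflation caused by model misspecification really is similar across markers; it would not address the MAF-dependent behaviour seen for some parameters in this Chapter.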


4.8 Conclusion

Based on the simulation results in this Chapter, it is strongly suggested that one fits the same

function of age in the fixed and random effects to avoid inflation of the type 1 error of the

SNP*age interaction terms. If this is not possible due to convergence issues, then it is

recommended that one uses a robust standard error for the SNP by age interaction terms to

reduce the type 1 error inflation in GWASs, regardless of whether or not the error term of the

model correctly follows the model assumptions. If no inflation in the type 1 error is detected

for a particular parameter of interest, then the classical standard error should be used; for

example, for the SNP main effect parameter in this study.


Chapter 5: Genome-Wide Association Study Of BMI Trajectories Across Childhood

5.1 Introduction

Now that the most efficient model has been defined and extensively studied in Chapter 2 and

the effect of any possible model misspecification has been investigated in Chapter 4, the next

stage is to conduct the genetic analyses. This chapter outlines the genome-wide association

analyses that were conducted, the results found and the future publication that is planned.

5.2 Background

There is growing evidence that genetic variants, particularly SNPs, within genes influence an

individual’s risk of many common diseases. As mentioned in Chapter 1, Section 1.2, there is

also a growing body of literature from observational and animal studies demonstrating the

influence of antenatal and early life factors on disease risk in later life. One such study showed

that poor infant and child growth led to type 2 diabetes in adulthood [276]. Some of the

identified genetic variants have been found to either act on both the early life factors and

disease outcome or they appear to modify the relationship between them. For example,

Freathy et al found that a SNP in the ADCY5 gene had pleiotropic effects on birth weight,

glucose regulation and type 2 diabetes in adulthood [73]. In contrast, polymorphisms in the

PPAR-γ2 gene modify the relationship between size at birth and hypertension [277], obesity

[278] and insulin sensitivity [279]. Because of these and other studies, it is now widely

accepted that genetic variants play an important role in the DOHaD and life course approaches

to adult disease described in Chapter 1, Section 1.2. Newnham et al [117] described these

relationships with the diagram in Figure 5.1. The inherited genetic variants and antenatal

environmental exposures, along with the postnatal environmental exposures, predispose

individuals to a range of diseases in adulthood. These exposures can work additively, whereby

each exposure independently increases disease risk, or multiplicatively, whereby each

exposure modifies the disease risk imposed by other exposures.


Figure 5.1: Schematic describing the relationships between genetic variants, environmental

exposures and modification to disease risk in adulthood. Image adapted from Newnham et

al [117].

There are several pathways by which genetic variants could affect adult disease risk. These

include but are not limited to:

1. The variant could be directly associated with the disease (green arrow in Figure 5.1).

For example, there are at least 18 SNPs associated with type 2 diabetes [280].

2. The variant could be associated with a mediator of disease, such as BMI (orange arrow

in Figure 5.1). As many as 32 genetic regions have been found to date to be associated

with BMI in adulthood [72]. At least one of these genes, the FTO gene, is also

associated with increased risk of type 2 diabetes but only through its influence on BMI

[174].

3. The variant could be associated with an adverse antenatal environment, for which

birth weight is often used as a surrogate, as in Freathy et al described above [73] (blue

arrows in Figure 5.1).

4. The variant could be associated with an adverse postnatal environment, where poor

growth during childhood is one of many different markers (red arrows in Figure 5.1).


To date, there is one GWAS investigating childhood obesity in populations of European

descent from the Early Growth Genetics (EGG) Consortium (http://egg-consortium.org/) [183].

Bradfield et al defined cases as children reaching ≥95th percentile for their age and sex at least

once before the age of 18 years, and controls were consistently below the 50th percentile

throughout childhood [183]. Using this definition, they identified two novel genetic variants

associated with childhood obesity; these variants also show evidence for association with BMI

in adulthood, although not at a genome-wide level of significance [183]. The gene near their

first variant, olfactomedin 4 (OLFM4), had not previously been implicated with obesity,

however it had been studied in the context of several cancers, particularly gastrointestinal

cancers [281]. The other variant was in the homeobox B5 (HOXB5) gene, which is involved in

gut and lung development. They concluded that both genes may impact obesity risk through

their influence on gut function. There are no genome-wide studies looking at BMI as a

continuous trait, or BMI trajectories over childhood. In a recent review article of the field of

obesity genetics, Day and Loos highlight the importance of conducting GWASs in children and

adolescents to identify additional loci that may have important effects early in life rather than

adulthood [282].

5.2.1 Aims

The aim of this study is to assess the genetic basis of BMI growth trajectories across childhood

and adolescence. Any genes found to be associated with growth trajectories will be

characterised in terms of their timing and effect on growth.


5.3 Statistical Methods

5.3.1 Study Populations

All three cohorts are described in detail in Chapter 1, Section 1.6. A full GWAS was conducted

in the Raine Study, while the other two cohorts were used for replication. The subsets used in

this analysis are described below. The data from each cohort was cleaned as outlined in

Section 5.3.2; the subsets described include only the cleaned data.

5.3.1.1 Raine Study

A subset of 1,461 individuals was used for analysis in this study using the following inclusion

criteria: at least one parent of European descent, live singleton birth, unrelated to anyone in

the sample (one of every related pair was selected at random), no major congenital anomalies,

genotype data and at least one measure of BMI between ages 1 and 17 years. BMI was

calculated from the weight and height measurements (median six measures per person, IQR:

5-7, range 1-8 measurements), with a total of 8,670 BMI measures.

5.3.1.2 ALSPAC

A subset of 7,868 individuals was used for analysis in this study using the same criteria as in the

Raine Study. BMI was calculated from the weight and height measurements (median nine

measures per person, IQR: 5-12, range 1-29 measurements), with a total of 68,862 BMI

measures.

5.3.1.3 NFBC66

A subset of 3,918 individuals was used for analysis in this study using the same criteria as in the

Raine Study. BMI was calculated from the weight and height measurements (median 12

measures per person, IQR: 9-15, range 1-28 measurements), with a total of 48,530 BMI

measures.

5.3.2 Data Cleaning

Several steps were conducted independently in each of the cohorts to ensure that the

phenotypic data was clean before beginning the genetic association analysis. These steps

included:

1. Removing missing BMI records or data outside our age range of interest (1-17 years).


2. Due to the data collection methods using medical databases in ALSPAC and NFBC66,

there were potentially multiple measures of height and weight at each age. The

SPLMM will not run unless there are unique values for each of the variables.

Therefore, all multiple measures were removed except the final measure.

3. Similarly to age, there were repeated height and weight values; for example, a child

might have a health care visit one day and the parent might record those same values

the following day on the cohort questionnaire, so that the height/weight values are the

same but the age is different. All but the final measure were removed.

4. Some of the height measures decreased or remained constant over time. The

following step-wise process was used to identify which record would be removed:

o If height at time j was higher than height at time j+1 and j+2, then it was

removed. E.g. height at 1 year=80cm, height at 1.5 years=75cm, height at

1.7 years=77cm, then height at 1 year was removed.

o If height at time j was higher than height at time j+1, but height at time j-1 was

higher than height at j+1, then height at j+1 was removed. E.g. height at 1

year=74cm, height at 1.5 years=75cm, height at 1.7 years=72cm, then height at

1.7 years was removed.

o Finally, if the above two steps didn’t apply, then the second height measure

was removed (i.e. j+1).

5. Finally, any height or weight measures that were more than 4SD above or below the mean of

each year group (written ±4SD below) were investigated in males and females separately. For

height, all such measures were removed; however, for weight a two stage process was undertaken:

a. If the measure identified was the only measure ±4SD for that individual then it

was removed. If there was more than one measure for an individual that was

±4SD then they were retained. The reason for this was that weight becomes

increasingly skewed as the children get older, and it was therefore expected

that a number of correctly measured ‘outliers’ would be present.

b. If the measure identified was the only measure ±4SD for an individual and it

was their final measure, then their previous measure was looked at. If their

previous measure was ±3SD then their final measure was kept in, but

otherwise removed. The rationale is that the individual is perhaps on an

increasing trend in weight and if there were further measures they are likely to

be ±4SD.
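The step-wise rule for decreasing or constant heights (step 4 above) can be sketched as a small function. Several details are assumptions rather than statements from the text: the rule is applied repeatedly until heights strictly increase, the first rule is skipped when there is no measure at j+2, and records are (age, height) pairs sorted by age. The function name is hypothetical.

```python
def clean_heights(records):
    """Drop records until height strictly increases with age.

    `records` is a list of (age, height) tuples sorted by age.
    """
    kept = list(records)
    while True:
        h = [height for _, height in kept]
        # First position j where height fails to increase at j+1.
        j = next((i for i in range(len(h) - 1) if h[i] >= h[i + 1]), None)
        if j is None:
            return kept
        if j + 2 < len(h) and h[j] > h[j + 1] and h[j] > h[j + 2]:
            drop = j        # first rule: the measure at j is too high
        elif j >= 1 and h[j] > h[j + 1] and h[j - 1] > h[j + 1]:
            drop = j + 1    # second rule: the measure at j+1 is too low
        else:
            drop = j + 1    # fallback: remove the second measure
        del kept[drop]


# The two worked examples from the text.
print(clean_heights([(1.0, 80), (1.5, 75), (1.7, 77)]))  # drops the 1 year record
print(clean_heights([(1.0, 74), (1.5, 75), (1.7, 72)]))  # drops the 1.7 year record
```

Written this way, it is clear that the second and third rules remove the same record; they differ only in the reason for removing it.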


5.3.3 Longitudinal Modelling

As shown in Chapter 2, the best fitting model for the GWAS was the SPLMM [197]. Therefore,

the final model for the jth individual and at the tth time-point is as follows:

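$$
\begin{aligned}
\mathrm{BMI}_{jt} = {} & \beta_0 + \Big( \sum_i \beta_i (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_k \gamma_k \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)^i_+ \Big) * \mathrm{sex} + \sum_l \beta_l \mathrm{Covariate}_l \\
& + u_{0j} + \sum_i u_{ij} (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_k \eta_{kj} \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)^i_+ + \varepsilon_{jt}
\end{aligned}
$$

where $\overline{\mathrm{Age}}$ is the mean age over the t time points in the sample (i.e. eight years), $\kappa_k$ is the kth

knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if $t > \kappa_k$, which is known as the truncated power basis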

that ensures smooth continuity between the time windows and Covariate includes the first five

principal components in the Raine Study and NFBC66 studies and the measurement source

variable in ALSPAC only. The knot points used in each study are defined below. All models

assumed a continuous autoregressive of order 1 correlation structure. Genetic differences in

the trajectories were estimated by including an interaction between the spline function for age

and the imputed value for each genetic variant (i.e. an additive genetic model).
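To make the truncated power basis concrete, the sketch below builds the age terms of the fixed-effects design for one observation. It assumes the knots at two, eight and twelve years reported later in this section for the Raine Study and ALSPAC, a cubic term for every piece, and knots expressed on the centred age scale; the function names are hypothetical and the sketch omits sex, covariates and the SNP interaction.

```python
MEAN_AGE = 8.0                  # centring age stated in the text
KNOT_AGES = (2.0, 8.0, 12.0)    # assumed: Raine Study and ALSPAC knot points
DEGREE = 3                      # assumed: cubic term for every piece


def truncated_power(x, knot, degree=DEGREE):
    """(x - knot)_+^degree: zero up to the knot, a polynomial after it."""
    return (x - knot) ** degree if x > knot else 0.0


def spline_basis(age):
    """Polynomial terms in centred age followed by one term per knot."""
    x = age - MEAN_AGE
    knots = [k - MEAN_AGE for k in KNOT_AGES]   # knots on the centred scale
    poly = [x ** i for i in range(1, DEGREE + 1)]
    return poly + [truncated_power(x, k) for k in knots]


print(spline_basis(10.0))  # the twelve-year knot term is still zero at age ten
```

Each knot term switches on only after its knot, which is what lets the fitted curve change shape between time windows while remaining continuous.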

In Chapter 2, it was shown that the BMI growth trajectories over childhood differ between

males and females, and different genetic variants are associated with growth in males and

females. However, a recent GWAS meta-analysis from the GIANT consortium has shown that

there are no genome-wide significant differences between males and females for BMI in adults

[283]. Given the results of the meta-analysis and the computational time involved in

conducting a GWAS using these longitudinal models, the sexes were combined into one model

for the full GWAS analysis, with the inclusion of an age by sex interaction; however, the

significant findings will be further characterised by looking at sex stratified models.

In the Raine Study and ALSPAC, the ideal knot point placement in males and females was at

ages two, eight and twelve years. Given the optimal knot points in the males and females in

these two cohorts were the same, these were used for the combined model in the GWAS

analysis. However, for NFBC66, different knot points were optimal in males (two, ten and

sixteen) and females (three, ten and twelve), so the data was combined and the modelling was

conducted again to determine the optimal placements of the knots (using the same criteria as

in Chapter 2, Section 2.4.3). The final knots chosen for the NFBC66 combined data were two,

ten and twelve years. These knot points were used for the cohort specific analyses,

with a cubic slope for each spline.


5.3.4 Statistical Analysis

A full GWAS analysis was conducted in the Raine Study. A discussion regarding the GWAS

analysis of the ALSPAC and NFBC66 cohorts will be presented in Section 5.6.

A robust standard error was calculated for each fixed effect parameter and corresponding P-

Value, using the same calculation as in Chapter 4, Section 4.4.4. A robust test of the overall

SNP effect using a Wald test was not conducted as it was computationally intensive, and

still displayed inflation in the type 1 error in the simulations presented in Chapter 4, Section

4.5.4.

Given the complex nature of the curve being fit by these longitudinal models, four test

statistics were chosen to be of interest from the GWAS including:

1. Global test (Wald test): this will determine which SNPs affect overall BMI and BMI

growth over childhood. This is calculated by conducting a Wald test for a model with

and without the SNP.

2. SNP by age interaction (Wald test): this will determine which SNPs affect BMI growth

over childhood. Given the spline function has multiple parameters (i.e. it is a non-

linear function of time), it is necessary to use a global test to summarize the effect of

each SNP on BMI growth. This is calculated by conducting a Wald test for a model with

just the SNP main effect and a second model with the SNP by spline function

interaction.

3. SNP main effect: this will determine which SNPs affect average childhood BMI.

4. SNP by linear age interaction: this will determine which SNPs affect the initial increase

in growth.

These tests were deemed the most appropriate to identify the genetic associations with BMI

growth trajectories as they look at both the change in BMI over time as well as the overall shift

(up or down) of the curve.
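For the two single-parameter tests in the list above (the SNP main effect and the SNP by linear age interaction), the P-Value follows directly from the estimate and its standard error. The sketch below assumes a normal reference distribution and uses invented numbers to show how a larger robust standard error changes the P-Value; it does not cover the multi-parameter Wald tests.

```python
from statistics import NormalDist


def wald_p_value(beta, se):
    """Two-sided P-Value for H0: beta = 0 using a normal reference."""
    z = beta / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical estimate with a classical and a (larger) robust standard error.
beta = 0.10
print(wald_p_value(beta, se=0.020))  # classical
print(wald_p_value(beta, se=0.025))  # robust
```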

All analyses were conducted in R [222] using the nlme package. The usual genome-wide

significance threshold is $5 \times 10^{-8}$ [273,274]; however Duggal et al [265] suggest a threshold of

suggestive evidence (P-Value < $6.09 \times 10^{-6}$) based on the number of independent SNPs when the

study data is imputed against the HapMap CEU population. This suggestive evidence P-Value

threshold was adopted for the current study, as it is consistent with many of the cross-


sectional GWASs conducted to date whose P-Values to select SNPs for follow-up generally

range from P-Value < $5 \times 10^{-6}$ [72] to P-Value < $1 \times 10^{-5}$ [179].

5.3.5 Additional Analysis for Characterizing Significant Findings

Regions of interest were determined by at least one SNP in the region reaching the suggestive

level of significance across two or more of the parameters of interest. For regions that were

deemed to be significant, several additional analyses were conducted on the most significant

SNP in the region to determine how it was affecting growth. This included: 1) analysing both

the height and weight trajectories over the same time period to determine whether the SNP

had a larger effect on skeletal or adiposity change; 2) investigating whether the SNP can be

detected from birth; 3) exploring how the SNP influences several aspects of the growth

trajectory, including the age and BMI at the adiposity rebound. These analyses were conducted

in the Raine Study only; the details of the analyses are described below. In addition, replication

of the region of interest was attempted in the ALSPAC and NFBC66 cohorts.

The same SPLMM framework was used to model weight and height trajectories over childhood

and adolescence. While the optimal height model was the same as the BMI model, the weight

model had the same placement for the knot points but had a linear spline from 1-2 years, cubic

slope for 2-8 years and 8-12 years and finally a quadratic slope for over 12 years. The height

and weight models also assumed a continuous autoregressive of order 1 correlation structure.

Genetic differences in the trajectories were estimated by including an interaction between the

spline function for age and the imputed value for the genetic variant.

The association between the imputed genetic variant and weight and length at birth was

analysed using linear regression, adjusting for gestational age at birth.

Age and BMI at adiposity rebound were derived by setting the first derivative of the fixed and

random effects from the BMI model between two and eight years of age for each individual to

zero (i.e. the minimum point in the curve). Linear regression, adjusting for sex, was used to

investigate the associations of the imputed genetic variant with age and BMI at the adiposity

rebound.
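Locating the adiposity rebound amounts to finding where the slope of an individual's fitted curve crosses zero from below between two and eight years. The sketch below does this by bisection on a supplied slope function; the bisection approach, the function names and the example curve are assumptions for illustration, not the procedure or data used in the thesis.

```python
def adiposity_rebound_age(slope, lower=2.0, upper=8.0, tol=1e-8):
    """Find the age in [lower, upper] where the BMI slope crosses zero.

    `slope` is a function returning dBMI/dAge; a rebound needs the slope
    to go from negative to positive. Returns None when it does not.
    """
    if not (slope(lower) < 0 < slope(upper)):
        return None
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if slope(mid) < 0:
            lower = mid     # still falling: the minimum is later
        else:
            upper = mid     # already rising: the minimum is earlier
    return (lower + upper) / 2


# Hypothetical curve: BMI = 15.5 + 0.08 * (age - 5.5)**2, minimum at 5.5 years.
def bmi(age):
    return 15.5 + 0.08 * (age - 5.5) ** 2


def bmi_slope(age):
    return 0.16 * (age - 5.5)


age_ar = adiposity_rebound_age(bmi_slope)
bmi_ar = bmi(age_ar)
print(round(age_ar, 3), round(bmi_ar, 3))
```

Individuals whose fitted slope does not change sign within the window have no identifiable rebound in this sketch, which is why the function can return None.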


Replication analysis in the ALSPAC and NFBC66 cohorts included SNPs 250kb up and

downstream from the most significant SNP in the region of interest (a total of 500kb around

the most significant SNP). Given the Raine Study is relatively small in comparison to the two

replication cohorts, the whole region of interest was used for the replication stage, rather than

just the most significant SNP, to ensure that the analysis from the Raine Study had highlighted

the correct gene for association with BMI trajectory and not a nearby gene. Association

analysis between these SNPs and longitudinal BMI was conducted using the SPLMM models

described in Section 5.3.3. Results from the two cohorts were meta-analysed. SNPs were

excluded if they had a MAF less than 5%, an R² value of less than 0.3 in ALSPAC (i.e. imputation

quality score from MACH [53]), or an info value less than 0.4 in NFBC66 (i.e. imputation

quality score from IMPUTE [52]). For the SNP main effect and SNP by linear age interaction

terms, a random-effects inverse-variance weighted meta-analysis was conducted in METAL

[284] using the beta coefficients and standard errors from the two studies; the robust standard

errors were used for the SNP by linear age interaction term. Stouffer’s Z-score method [285],

weighting by the number of individuals in each study, was used to meta-analyse the global

Wald tests for the overall SNP effect and the SNP by age interaction.
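The two ways of combining results can be sketched as follows. This is an illustration only, not the METAL implementation: the first function shows the simpler fixed-effect form of inverse-variance weighting (a random-effects analysis adds a between-study variance term to each weight), and the second applies Stouffer's method to one-sided P-Values with weights equal to the square root of each study's sample size, which is a common convention and an assumption about how the weighting by number of individuals was applied. All function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted estimate, SE and two-sided P-Value."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    p_value = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(beta / se)))
    return beta, se, p_value


def stouffer_combined_p(p_values, sample_sizes):
    """Weighted Stouffer Z-score combination of P-Values (each strictly in (0, 1))."""
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    weights = [sqrt(n) for n in sample_sizes]  # assumed sqrt(N) weighting
    z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z)


# Hypothetical inputs for two studies
beta, se, p = ivw_meta([0.1, 0.3], [0.1, 0.1])
p_global = stouffer_combined_p([0.01, 0.01], [100, 100])
```

The Stouffer approach suits the global Wald tests because they yield a P-Value without a single effect estimate, whereas the inverse-variance approach needs a beta coefficient and standard error from each study.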

5.3.6 Pathway Analysis

A common extension to a GWAS analysis is to look at the biological process underlying the

GWAS signals using a gene/pathway analysis [286]. These analyses, also known as ‘gene set

enrichment analyses’, use prior biological knowledge on gene function to group SNPs along a

particular pathway together for a more powerful analysis than the single SNP approaches.

There are various tools and databases designed specifically for these analyses, including

MAGENTA [287], i-GSEA4GWAS [288] and GSA-SNP [289]. i-GSEA4GWAS was used as it did not

require any additional software licences, used test statistics from the GWAS results rather than

importing the individual level genotype data and used a competitive test as the null hypothesis

whereby the statistics for genes in a given pathway were compared to statistics for other

genes, which adjusts for the genomic inflation of the test statistic. It has previously been

shown that the results of these analyses, particularly in i-GSEA4GWAS, are strongly affected by

imputation [290], as LD patterns are not taken into account; Zhang et al discuss this in their

manuscript and suggest that SNP sets are pruned to R2>0.2 before being used in analysis [288].

Therefore, a list of SNPs was generated from the genotyped data in the Raine Study which

included SNPs not in LD; these SNPs were selected from the GWAS results files for analysis in

i-GSEA4GWAS. Only gene sets having 20-200 gene members were used, with a 20kb padding

added to the end of each gene. The canonical pathways database was used to define the genes

in each pathway; it contains pathways integrated and curated from a variety of online

resources, as outlined in MSigDB v2.5 (http://www.broadinstitute.org/gsea/msigdb/index.jsp).

5.4 Results

5.4.1 Comparison of Cohorts

The growth trajectories of the three cohorts were compared to investigate the potential

between-study heterogeneity. It is important to know the heterogeneity of the growth

trajectories between the studies so any attempt to replicate genetic associations can be

interpreted in light of these differences. Figure 5.2 displays the growth trajectories in females

and males in the three cohorts. BMI at age one is almost identical in the two European

cohorts, ALSPAC and NFBC66, whereas it is lower in the Raine Study. The NFBC66 has a

considerably lower BMI at the adiposity rebound on average than the Raine Study and ALSPAC,

whereas the Raine Study has an earlier average adiposity rebound than the other cohorts. This

leads to a lower BMI through adolescence and into adulthood for the NFBC66, whereas both

ALSPAC and the Raine Study have similar trajectories with the Raine Study crossing over

ALSPAC for the first time just before the onset of puberty. These profiles are similar in both

males and females.

Table 5.1 also highlights the differences in the timing and magnitude of the adiposity rebound,

with the Raine Study, particularly the females, having a much earlier rebound than the other

two cohorts. The negative correlation between age and BMI at the adiposity rebound in all

cohorts shows that an earlier rebound is associated with a higher BMI at the rebound; it is also

common for BMI to track throughout childhood, therefore a high BMI at the adiposity rebound

often leads to a high BMI in later life. Interestingly, the Raine Study has the strongest correlation

between the age and BMI at the rebound. The lower BMI at the rebound in the NFBC66 may

highlight the generational differences between these cohorts, with the NFBC66 being recruited

more than 23 years earlier than the Raine Study and ALSPAC. Over that time, the

environmental determinants of obesity changed dramatically with the increase in fast food

consumption and decrease in physical activity.


Figure 5.2: Population average BMI trajectories in females (A) and males (B) for each of the

three cohorts; the Raine Study (red), ALSPAC (green) and NFBC66 (blue).

Table 5.1: Mean (SD) age and BMI at the adiposity rebound in the three cohorts, in addition

to the correlation between the two measures.

Measure | Raine Study, Female | Raine Study, Male | ALSPAC, Female | ALSPAC, Male | NFBC66, Female | NFBC66, Male

Age at Adiposity Rebound | 4.63 (1.08) | 5.30 (1.04) | 5.57 (1.17) | 6.10 (1.01) | 5.53 (0.95) | 5.63 (0.84)

BMI at Adiposity Rebound | 15.40 (0.93) | 15.53 (0.95) | 15.72 (1.07) | 15.80 (1.06) | 15.27 (1.20) | 15.41 (1.06)

Correlation between Age and BMI at Adiposity Rebound | -0.85 | -0.84 | -0.64 | -0.67 | -0.57 | -0.53


5.4.2 Results from the Raine Study GWAS

5.4.2.1 Summary of GWAS

As outlined in Chapter 1, Section 1.4.3, it is important to check the distribution assumptions of

the test statistics to ensure that they have the correct asymptotic behaviour. Figure 5.3

displays the Q-Q plots for each of the four tests of interest from the GWAS. As observed, the

test statistics from the two Wald tests are inflated, which was anticipated given the simulation

study conducted in Chapter 4. The test for the SNP main effect showed no inflation; however

when using the robust standard error this measure became slightly inflated, therefore the

usual standard error will be used from here on for this parameter. Finally, the test for the SNP

by linear age interaction effect showed inflation; however this was reduced to an inflation factor of 1.06 using the

robust test, which will be referred to for the remainder of this Chapter. Given the robust test

will be used for the test of the SNP by linear age parameter but not the SNP main effect, it is

important to see how it compares to the classical test for the two parameters. Figure 5.4

shows that for the SNP by age fixed effect, the robust P-Values are often the same as or larger

than the classical P-Values, whereas the robust P-Values for the SNP main effect are not

consistently smaller or larger than the classical P-Values.
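The degree of inflation seen in a Q-Q plot is usually summarised by the genomic inflation factor. The sketch below shows one common way of computing it for a single-parameter (1 degree of freedom) test: each P-Value is converted back to a chi-square statistic and the median is compared with the median expected under the null. The function name is hypothetical, and the 1 degree of freedom reference is an assumption that would not carry over unchanged to the multi-parameter Wald tests.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 df (approximately 0.455),
# obtained from the standard normal quantile at 0.75.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Ratio of the observed median 1-df chi-square statistic to its null median."""
    # A two-sided P-Value p corresponds to a chi-square statistic z**2,
    # where z is the standard normal quantile at 1 - p/2.
    chi2_stats = [_STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# P-Values spread evenly over (0, 1) give a factor of 1 (no inflation)
uniform_p = [(i - 0.5) / 999 for i in range(1, 1000)]
lam = genomic_inflation(uniform_p)
```

Values above 1 indicate that the test statistics are larger than expected under the null hypothesis.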


Figure 5.3: Q-Q plot for each of the four tests of interest in the Raine Study GWAS

Figure 5.4: Plot of standard versus robust test P-Values


5.4.2.2 Regions of Interest

A Manhattan plot is commonly used to summarize the results from a GWAS, whereby the

negative log-transformed P-Values from the association analysis are plotted against the

chromosomal position. The negative log-transformed P-Values are used so that the smallest P-

Values have the largest transformed P-Value and can be easily seen. It is termed a Manhattan

plot as it resembles the Manhattan skyline, where skyscrapers (genome-wide significant loci)

tower over smaller level buildings. Figure 5.5 to Figure 5.8 show the Manhattan plots for each

of the parameters of interest from the GWAS.
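The transformation used for the vertical axis of a Manhattan plot can be written in a few lines. This sketch assumes a base-10 logarithm, which is the usual convention; the function names are hypothetical.

```python
from math import log10

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold used in this Chapter


def manhattan_height(p_value):
    """Height of a SNP on the Manhattan plot: -log10 of its P-Value."""
    return -log10(p_value)


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True when the P-Value is below the genome-wide threshold."""
    return p_value < alpha


# Height at which the genome-wide significance line is drawn (about 7.3)
threshold_height = manhattan_height(GENOME_WIDE_ALPHA)
```

A SNP with a P-Value of 9.81×10⁻⁸, for example, sits just below the threshold line.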

Figure 5.5 is the Manhattan plot for the global SNP effect using the Wald test. As seen in Figure

5.3, this test has inflated type 1 error, so the significance of the association needs to be treated

with caution. Due to this inflation, SNPs were investigated further if they had a genome-wide

significant P-Value of less than 5×10⁻⁸. There are six SNPs that reached this threshold: five

are in the KCNJ15 gene on chromosome 21 and one is on chromosome 2. Although it

does not meet the strict genome-wide significance threshold, there is another region of interest

on chromosome 13: a group of SNPs near the MTIF3 gene, an area that has been shown to be

associated with adult BMI [72], with the P-Value of the most significant SNP being 9.81×10⁻⁸.

Similar results are seen for the global SNP by age interaction term (Figure 5.6). There are two

SNPs that reach genome-wide significance, one in KCNJ15 on chromosome 21 and one on

chromosome 2. There are an additional 11 SNPs that have a P-Value between 5×10⁻⁸ and 1×10⁻⁷;

these include an additional six SNPs in KCNJ15, three SNPs in MTIF3 and two SNPs in the CADM

gene on chromosome 3. Interestingly, neither the SNP nor the SNP by age fixed effects were

significant for the two CADM SNPs, and therefore these were not investigated any further.

No SNPs reached genome-wide significance for the fixed effect estimate for the SNP main

effect (the SNP by spline function is in the model but not taken into account here) (Figure 5.7).

This was expected as there was no inflation detected and the sample size is relatively small to

detect small genetic effects. Therefore, regions with suggestive evidence as outlined by Duggal

et al [265] with a SNP with P-Value < 6.09×10⁻⁶ are discussed; these included two SNPs in the

KDR gene on chromosome 4, four SNPs on chromosome 5 in an intergenic region, one SNP in


the CPLX2 gene on chromosome 5, one SNP in an intergenic region on chromosome 14 and

one SNP in the CACNG3 gene on chromosome 16.

Finally, Figure 5.8 shows the Manhattan plot for the SNP by linear age effect. Due to the slight

inflation remaining with the robust test, regions were selected that had at least two SNPs with

suggestive evidence for association at P-Value < 6.09×10⁻⁶; this included two SNPs in the

HS1BP3 gene on chromosome 2, five SNPs upstream from the GRM7 gene on chromosome 3,

three SNPs in the ACPL2 gene on chromosome 3, seven SNPs crossing multiple genes on

chromosome 4 including the USP53 and C4orf3 genes, six SNPs in the MIR4500HG gene on

chromosome 13, two SNPs in an intergenic region of chromosome 14, three SNPs on

chromosome 15, 16 SNPs in the KCNJ15 gene on chromosome 21 and 10 SNPs in the

TBC1D22A gene on chromosome 22.

The KCNJ15 locus, rs2008580, reached the significance thresholds for three out of four of the

parameters of interest. This locus is located in an intron of the potassium inwardly-rectifying

channel, subfamily J, member 15 gene. The gene has previously been shown to be associated

with increased risk of type 2 diabetes in a Japanese population [291]; however, the association

was found with a different primary SNP. Okamoto et al found that the association with this

locus was more prominent in lean, rather than obese, individuals [291]. Therefore, it could be

the growth pattern during childhood and adolescence that is influencing these individuals’ type

2 diabetes risk, rather than their final BMI, as is more common in European populations.

This therefore provides additional evidence that this gene could be a likely candidate for

growth trajectories and was hence taken forward for replication in the ALSPAC and NFBC66

cohorts.


Figure 5.5: Manhattan plot of the P-Values from the global SNP effect (Wald test) for BMI

trajectory in the Raine Study. The red line indicates the genome-wide significance level. The

most significant genetic variant is in the KCNJ15 gene on chromosome 21.

Figure 5.6: Manhattan plot of the P-Values from the global SNP by age effect (Wald test) for

BMI trajectory in the Raine Study. The red line indicates the genome-wide significance level.

The most significant genetic variant is in an intergenic region on chromosome 2.


Figure 5.7: Manhattan plot of the P-Values from the SNP main effect for BMI trajectory in

the Raine Study. The red line indicates the genome-wide significance level. The most

significant genetic variant is in an intergenic region on chromosome 14.

Figure 5.8: Manhattan plot of the P-Values from the SNP by linear age effect for BMI

trajectory in the Raine Study. The red line indicates the genome-wide significance level. The

most significant genetic variant is upstream from the GRM7 gene on chromosome 3.


5.4.3 Characterising the Findings of the KCNJ15 Gene

Iwasaki et al identified several regions through linkage analysis that were associated with type

2 diabetes in a Japanese population; the largest LOD score was 1.92 for a region on

chromosome 21q [292]. Okamoto et al conducted an association study on this region to

attempt to localize the causal locus [291]. The first stage of their analysis included a case-

control association analysis using pooled DNA to narrow down the 9Mbp region to several loci;

these few loci were subsequently sequenced using individual level genotyping. This association

analysis revealed that the G allele (minor allele) in rs743296 was associated with a decreased

risk of type 2 diabetes. They went on to sequence the exons and promoter region of the gene

in healthy, unrelated subjects to find that both rs743296 and a SNP on exon 4, rs3746876,

were associated with type 2 diabetes risk; the minor allele of rs3746876 increasing the risk of

disease. Unfortunately, their strongest association, rs3746876, has a MAF of less than 1% in

European populations and is not in the HapMap data or on any of the common GWAS chips.

However, rs743296 was imputed in the Raine Study GWAS and showed suggestive evidence

for association with BMI trajectory using the global Wald test (P-Value=3.8×10⁻⁶) and the Wald

test for SNP by age interaction (P-Value=2.3×10⁻⁶). Okamoto et al conclude that the rs3746876 SNP

affects type 2 diabetes risk in lean individuals; however, the ‘lean’ cases have a lower BMI than

their control group, which might indicate that this SNP is also acting on type 2 diabetes risk

through BMI. Finally, they conducted functional analysis on the protein level of Kcnj15 and

found that overexpression of Kcnj15 decreased insulin secretion at high levels of glucose

but did not change insulin secretion under normoglycemic conditions [293].

The evidence produced by the above studies [291,292,293] indicates that the KCNJ15 gene may

have pleiotropic effects, influencing multiple outcome measures related to insulin levels. The

strongest signal from the Raine Study GWAS, rs2008580, was therefore investigated in meta-analyses

from other genome-wide association study consortia and in other phenotypes within the Raine Study. The

rs2008580 T allele, which was associated with increased average BMI and increased rate of

BMI growth, is the minor allele with frequency 0.24 in the Raine Study. It was not associated

with birth weight (P-Value=0.12) or birth length (P-Value=0.11) in the Raine Study, which was

consistent with the EGG Consortium meta-analysis of birth weight (P-Value=0.28) [294].

Interestingly, the T allele of the rs2008580 SNP was associated with increased birth weight

(β=57.89g, P-Value=0.03) and birth length (β=0.27cm, P-Value=0.02) in males but not females

(P-Value for birth weight=0.79, P-Value for birth length=0.57). The genetic effect is detectable from about 5.5 years of age,


and the T allele is associated with increased BMI at the adiposity rebound (β=0.11, P-Value=

0.02) but not age at the adiposity rebound (P-Value=0.06). The SNP was not associated with

age of menarche in the girls in the Raine Study (P-Value=0.16).

The association between the rs2008580 locus and BMI growth was driven by an association

with change in weight (Wald P-Value=2.9×10⁻⁶), rather than a change in height (Wald

P-Value=0.11). The lack of association with height was also seen in the GIANT consortium meta-

analysis of height in adults (P-Value=0.27) [295], whereas the meta-analysis in the same

consortium for BMI showed rs2008580 approaching significance (P-Value=0.06) [72]. The T

allele was associated with higher average weight at age eight years (β=0.04, P-Value=9.5×10⁻⁶)

and change in weight over childhood and adolescence (Wald P-Value=0.0003). The BMI growth

trajectories for each of the three genotypes in rs2008580 in females and males are displayed in

Figure 5.9. The association between BMI growth and rs2008580 was stronger in females (Wald

P-Value=1.3×10⁻⁵) than males (Wald P-Value=0.002). The association in females seemed to be

driven by the change in BMI (P_SNP=0.01, Wald P_SNP*age=5.2×10⁻⁶) whereas the male association

was driven by the average BMI at age eight (P_SNP=4.3×10⁻⁴, Wald P_SNP*age=0.01). A similar

pattern of association was observed between weight growth and rs2008580: the overall

association was similar in females (Wald P-Value=0.002) and males (Wald P-Value=0.0006);

however, the female association seemed to be driven by the change in weight (P_SNP=0.03, Wald

P_SNP*age=0.001) whereas the male association was driven by the average weight at age eight

(P_SNP=7.8×10⁻⁵, Wald P_SNP*age=0.03).

Given the previous associations observed by Okamoto et al [291] with type 2 diabetes, fasting

insulin and glucose levels were also analysed using linear regression. Although the rs2008580 T

allele was not associated with fasting glucose levels in either the Raine Study at age 17 (P-

Value=0.35) or in the Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC)

meta-analysis (P-Value=0.16) [296], it was associated with increased fasting insulin (β=0.011,

P-Value=0.01), HOMA-β (β=0.0086, P-Value=0.03) and HOMA-IR (β=0.011, P-Value=0.01) levels

in the MAGIC consortium meta-analysis [296]. In addition, SNPs in high LD with rs2008580 in

the KCNJ15 gene were significantly associated with increased risk of type 2 diabetes in the

DIAGRAM consortium (rs6517456, [LD statistics with rs2008580: r²=0.96, D’=1] OR=1.04,

P-Value=0.048) [297].


Figure 5.9: BMI trajectories for females (left) and males (right) for each of the KCNJ15,

rs2008580 alleles.

According to the SNPInfo database [298], the rs2008580 locus is a transcription factor binding

site (TFBS). SNPs in TFBS regions have been shown in experimental studies to lead to

differences in transcription factor binding between individuals [299], indicating that SNPs

within TFBS regions are more likely to play a biological role than other SNPs in the associated

region without evidence of overlap with any functional data [300]. This SNP is therefore

believed to have a functional role within the KCNJ15 gene.


5.4.4 Results from Replication and Meta-Analysis

The most significant SNP from the GWAS analysis in the Raine Study, rs2008580, was not

significantly associated with BMI or BMI trajectory in either ALSPAC (P-Value_Wald=0.9476,

P-Value_Wald(SNP*Age)=0.9044, P-Value_SNP=0.7919, P-Value_SNP*Age=0.4106) or NFBC66

(P-Value_Wald=0.8725, P-Value_Wald(SNP*Age)=0.8817, P-Value_SNP=0.9175, P-Value_SNP*Age=0.8419).

Four hundred and thirty three SNPs were included in the meta-analysis; as mentioned in

section 5.3.5, SNPs were included in the meta-analysis if they were within 250kb of the

rs2008580 variant, had a MAF greater than or equal to 5% and an R² greater than or equal to

0.3 or info greater than or equal to 0.4. Although none of the SNPs met the genome-wide

significance threshold of < 5×10⁻⁸, 26 SNPs reached a Bonferroni corrected P-Value of 0.0001

for the region (i.e. 0.05/433) in the meta-analysis of the global Wald test; no SNPs reached a

Bonferroni corrected P-Value for the Wald test for the SNP by age interaction. The most

significant SNP from the meta-analysis for the global Wald test for the overall SNP effect was

rs2836241. As can be seen in Figure 5.10, the SNPs in high LD with rs2836241 spread across

the Down syndrome critical region genes 4, 8 and 10 (DSCR4, DSCR8 and DSCR10). This region

across the DSCR genes appears to have a greater effect on the average BMI at age eight

than the linear trajectory over childhood (Figure 5.11); although, the most significant SNPs for

the SNP main effect (rs2836444 P-Value=0.0220) and SNP by linear age interaction (rs2836335

P-Value=0.0043) did not reach the Bonferroni corrected P-Value.
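The regional Bonferroni threshold follows directly from the number of SNPs tested. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# 433 SNPs in the region: 0.05 / 433 is about 1.15e-4, reported as 0.0001
region_threshold = bonferroni_threshold(0.05, 433)
```

Both of the top SNPs for the single-parameter tests (P-Value=0.0220 and P-Value=0.0043) are well above this value.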


Figure 5.10: Regional plot of (A) global Wald P-Values for the overall SNP effect and (B) Wald

P-Values for the SNP by age effect as a function of genomic position (NCBI Build 36) from the

meta-analysis of ALSPAC and NFBC66 for KCNJ15 gene region. In each plot, the meta-analysis

P-Value for rs2836241 is denoted by a purple diamond; all other analysed SNPs are

represented by a circle. Local LD structure is reflected by the plotted estimated recombination

rates (taken from HapMap). The colour scheme of the circles respects LD patterns (HapMap

CEU pairwise r² correlation coefficients) between rs2836241 and surrounding variants. Gene

annotations were taken from the University of California Santa Cruz genome browser.



Figure 5.11: Regional plot of (A) P-Values for the SNP main effect at age eight and (B)

P-Values for the SNP by linear age effect as a function of genomic position (NCBI Build 36)

from the meta-analysis of ALSPAC and NFBC66 for KCNJ15 gene region. In each plot, the

meta-analysis P-Value for rs2836241 is denoted by a purple diamond; all other analysed

SNPs are represented by a circle. Local LD structure is reflected by the plotted estimated

recombination rates (taken from HapMap). The colour scheme of the circles respects LD

patterns (HapMap CEU pairwise r² correlation coefficients) between rs2836241 and

surrounding variants. Gene annotations were taken from the University of California Santa

Cruz genome browser.



5.4.5 Results from Pathway Analysis

Pathway analysis in i-GSEA4GWAS uses a false discovery rate (FDR) adjustment for multiple

testing; therefore, significant pathways are determined by a q-value<0.05. There were no

significant pathways using SNPs that were independent (SNPs with LD R²>0.2 were excluded);

that is, all pathways analysed had an FDR q-value greater than 0.05. For the SNP main effect,

one pathway reached significance using the classical P-Value but not after

adjusting for multiple testing; it is presented in Table 5.2 as

an exploratory finding.

Table 5.2: Results from the pathway analysis using SNPs not in LD

Pathway | Description | P-Value | FDR q-value | Significant genes/Selected genes/All genes | Parameter

HSA00260 GLYCINE SERINE AND THREONINE METABOLISM | Genes involved in glycine, serine and threonine metabolism | 0.001 | 0.24 | 12/28/45 | SNP main effect

5.5 Discussion

In this study, a GWAS for BMI trajectory across childhood and adolescence was conducted in

the Raine Study, with replication of the most significant region in the ALSPAC and NFBC66

cohorts. The results show that the KCNJ15 gene is associated with BMI trajectory over this

time period. The association of the most significant SNP in the Raine Study, rs2008580, is

driven by a faster rate of change in BMI over time in females but an increase in average BMI in

males. In addition, this association appears to be driven by a change in weight, rather than a change in

height, therefore indicating an influence on adiposity rather than skeletal growth. The SNP

does not affect birth weight, which is consistent with other studies of established BMI loci

[301,302]. The rs2008580 SNP is intronic, which, according to Hindorff et al, is consistent with

45% of the loci discovered in GWASs [47]; it is also at a transcription factor binding site. An

association was also found between the KCNJ15 locus and increased fasting insulin levels but not

with glucose, which is consistent with the conclusion made by Okamoto [293] that inactivation

of Kcnj15 leads to increased insulin secretion. Unfortunately, this specific SNP did not replicate

in the ALSPAC and NFBC66 cohorts.


There are several possible reasons explaining why the rs2008580 SNP association with BMI

trajectory in the Raine Study was not replicated in the ALSPAC and NFBC66 cohorts, including:

1) the association observed in the Raine Study was spurious; 2) the differences in the growth

patterns between the three cohorts seen in Section 5.4.1 are in part due to different genetic

profiles; 3) there are different environmental stimuli in each of the cohorts interacting with the

genetic variants; 4) there are differences in LD patterns between the cohorts due to ethnic origin.

Palmer and Cardon [20] state that “For most complex human diseases, the reality of multiple

disease-predisposing genes of modest individual effect, gene-gene interactions, gene-

environment interactions, heterogeneity of both genetic and environmental determinants of

disease and low statistical power mean that both initial detection and replication will likely

remain difficult.” Although the rs2008580 SNP didn’t replicate in the two European cohorts, it

appears that the region on chromosome 21 containing the KCNJ15 gene may still be of interest

for association with BMI trajectory across childhood; there are several SNPs in high LD in the

KCNJ15 gene, and nearby genes, which were highlighted in the global Wald tests.

As outlined in Section 5.4.3, the KCNJ15 gene influences insulin secretion, therefore potentially

having effects on both risk of type 2 diabetes and increased BMI. It is hypothesised that the

KCNJ15 gene shows similarities with the discovery of the FTO gene but for childhood growth.

The FTO gene was first discovered in a GWAS of type 2 diabetes. When additional analysis was

conducted, it was found that the association between FTO and diabetes risk was purely due to

an association between FTO and BMI [174]. To a lesser extent, the KCNJ15 gene might be

similar to this, but rather than acting on diabetes risk through final BMI it influences both

childhood BMI (and weight) growth and diabetes risk through its effect on insulin. A study by

Bhargava et al shows that adults with impaired glucose tolerance or type 2 diabetes are not

only obese in adulthood but also have increased rate of growth from age 2 years onwards

[134]. It is therefore plausible that a genetic variant influencing rate of BMI growth could

ultimately influence risk of type 2 diabetes. The same pattern of growth was shown by Barker

et al for increased risk of cardiovascular events [303]. If true, this would nicely support the

early life approaches to adult disease, whereby the combination of inherited genetic variants

and adverse early life exposures lead to an increased risk of diabetes in later life. The early life

exposure could be the less optimal growth trajectory itself, or some exposure(s) that influence

the growth trajectory. Unfortunately, the individuals in the Raine Study are too young to fully

test this hypothesis at present.


SNPs in the MTIF3 gene had P-Values just below the suggestive levels of significance for three

out of four of the parameters of interest; however, this gene is worth mentioning as it has previously

been shown to be associated with adult BMI [72]. The most significant SNP from this gene in

the current study showed a stronger association with average BMI at age eight than BMI

trajectory, consistent with knowledge that BMI tracks throughout life. It was promising to see

that the GWAS in a relatively small cohort was able to detect both novel and previously known

genetic regions.

Some of the other regions that reached significance thresholds for only one of the test

statistics also have plausible mechanisms for influencing BMI growth trajectory. For example,

the locus on chromosome 2 that was significant for the two global tests is in a region of the

chromosome that is also associated with waist-hip ratio in adults in the National Institutes of

Health (NIH) database of Genotypes and Phenotypes (dbGaP). The HS1BP3 gene, which

showed suggestive evidence for association with the SNP by age parameter, encodes a protein

that may be involved in lymphocyte activation, and the SNP with strongest association in the

Raine Study was approaching significance in the GIANT consortium meta-analysis of BMI (P-

Value=0.075) [72]. A gene downstream from the significant finding in the ACPL2 gene on

chromosome three was previously shown to be associated with height in both a European

[304] and Korean study [305]. Therefore, this region might be a marker of height growth

throughout childhood, which influences BMI through skeletal growth rather than changes in

adiposity. Ten SNPs in the TBC1D22A gene reached the significance threshold, a gene which

has previously been associated with longevity [306] and metabolism in dbGaP. As outlined,

there are several interesting genes and genetic regions that have arisen from this GWAS of

BMI trajectory; however, until the loci have been replicated in independent studies, false

positives cannot be ruled out with certainty.

Given the FTO gene is the most commonly replicated gene for BMI in adulthood, and also

associated with childhood BMI growth [251], it was surprising that it was not one of the top

regions on any of the parameters in this study. However, analysis in Chapter 2 indicated that

the FTO locus, rs1121980, is significant in males but not females. The GWAS analysis presented

in this Chapter shows that the rs1121980 FTO SNP is significant when using a significance

threshold of P-Value<0.05, which doesn’t adjust for multiple testing (P-Value_Wald=0.005,

P-Value_Wald(int)=0.003, P-Value_SNP=0.0001, P-Value_SNP*age=0.006), but due to the lack of association


in females this did not reach the genome-wide significance threshold. This is one of the

limitations of including an age by sex interaction rather than stratifying by sex.

5.5.1 Role of KCNJ15 Gene and Nearby Genes on Chromosome 21

It is unknown how the KCNJ15 gene, or nearby genes on chromosome 21, affect BMI growth

throughout childhood and adolescence. Three plausible pathways in which the KCNJ15 gene

may affect BMI growth have been identified in a literature review, which will be outlined

below.

As previously mentioned, the KCNJ15 gene is a potassium inwardly-rectifying channel that is

expressed in the pancreas. The potassium channel sits on the surface of the β-cells of the islets

of Langerhans in the pancreas and regulates depolarisation of the cell membrane, which in

turn regulates how much insulin is produced by the cell. Okamoto et al suggest that the

KCNJ15 protein, similar to KCNJ11, inhibits depolarization of the pancreatic β-cell membrane

by maintaining the resting membrane potential and therefore negatively regulating insulin

secretion [291]. This was also seen in their functional studies whereby “increased plasma

glucose induced KCNJ15 expression at a transcriptional level and inhibited insulin secretion

through the KCNJ15 channel” [293]. This functional information, coupled with the results from

the MAGIC consortium, indicates that the rs2008580 SNP, or a SNP in LD with it, has a role in both

releasing an abundance of insulin from the pancreas and making other cells resistant to

the uptake of insulin, leaving the levels of fasting plasma insulin elevated.

Insulin is central in regulating carbohydrate and fat metabolism in the body. When insulin is

released from the pancreas it signals for the normal release of fatty acids from the adipose

tissue to be shut down, in conjunction with increasing the uptake of fatty acids. Insulin

prioritises the processing of carbohydrates and proteins that enter the body during a meal

over stored fats, and when these are processed it returns to burning stored fats. Adipocytes,

the cells that primarily compose adipose tissue, are responsible for the production of the

hormone leptin. Leptin, discovered in 1994, is important in the regulation of

appetite and acts as a satiety factor [307]. Lustig showed that “insulin and leptin share a

common central signalling pathway, and it seems that insulin functions as an endogenous

leptin antagonist” [308]. Therefore, it is biologically plausible that a gene regulating the release

of insulin influences BMI and weight growth.


Several of the type 2 diabetes associated genes are in or close to genes which are expressed in

the pancreas, with many shown to be associated with reduced beta-cell function; whereas,

genes associated with obesity or BMI are expressed in the brain, particularly the

hypothalamus, with some genes in the leptin signalling pathway [309,310]. Given the previous

association with KCNJ15 and type 2 diabetes, it may be that its effect on growth is highlighting

the onset of diabetes risk; in contrast, the KCNJ15 gene may be indicating another pathway, in

addition to that through the brain, to the development of obesity.

Along with the KCNJ15 gene’s influence on insulin levels, it is also located in the Down

syndrome critical region 1 of chromosome 21 [311]. Down syndrome is characterized by

various complex phenotypes, including dysmorphic features, hypotonia, developmental

abnormalities, deficiencies of the immune system and mental retardation [312]. These

phenotypes vary greatly between individuals with Down syndrome, indicating that multiple

genes are involved in its pathogenesis. Individuals who are trisomic for only part of

chromosome 21 and who share the same subset of phenotypes, have been used to define

Down syndrome critical regions [313]. An overdose of genes in these regions appears to

constitute the complex phenotypes of Down syndrome. This critical region contains not only

the KCNJ15 gene, but also the DSCR4 and DSCR8 genes [314] which were significantly

associated with BMI trajectory in the meta-analysis. Gosset et al indicate that the KCNJ15 gene

is expressed in the kidney and the brain [311], which is consistent with Okamoto et al, who also

report KCNJ15 expression in the kidney [293] and other BMI associated genes which are

reported to be expressed in the brain [309,310]. Due to the various levels of expression during

foetal development and in adulthood, Gosset et al acknowledge that overexpression of the

KCNJ15 gene may have pleiotropic effects on organ function and Down syndrome phenotypes

[311].

Thirdly, according to the Kyoto Encyclopaedia of Genes and Genomes (KEGG) database, the

KCNJ15 gene belongs to the gastric acid secretion pathway in the Organismal

Systems/Digestive system class. Gastric acid is a key factor in normal upper gastrointestinal

functions, including protein digestion and absorption of both calcium and iron. It also provides

some protection against bacterial infections.


Further functional work is required to determine which of these possible pathways, if any, is

responsible for the observed association between the KCNJ15 gene and BMI growth over

childhood and adolescence.

5.6 Challenges and Future Research

As observed with the chromosome-wide analysis in ALSPAC in Chapter 4, the λ for the SNP by

age term remains slightly inflated. One option to reduce the remaining inflation is to use a

genomic control adjustment [275]. Genomic control assumes that the inflation factor is

constant for all markers across the genome; this works well for inflation due to cryptic

relatedness, which is what genomic control was designed for, as the inflation is based on the

kinship matrix that is independent of any particular marker. The robust standard errors are not

inflated by the same amount for each SNP, therefore indicating that the initial inflation

is probably not constant. It is possible that any remaining inflation after using the robust

standard error is constant; although this would need to be shown before the approach is

adopted for GWAS analysis.
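
As a rough illustration of the genomic control adjustment discussed above, the following Python sketch estimates λ as the median of the observed 1-df χ² statistics divided by the expected median under the null (approximately 0.4549), and then divides every statistic by that single factor. The function names and the toy statistics are hypothetical and are not taken from the analyses in this thesis; the sketch is only intended to show that one constant is applied to all markers, which is the assumption questioned in this section.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Estimate lambda as observed median / expected median (1 df)."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Divide every statistic by the same lambda (lambda is floored at 1)."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats], lam

# Toy test statistics, made up for illustration only.
toy_stats = [0.10, 0.30, 0.50, 0.91, 2.40, 5.10]
adjusted, lam = genomic_control(toy_stats)
```

Because the same λ rescales every marker, the adjustment cannot correct inflation that differs from SNP to SNP.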

Although computational time was taken into account in Chapter 2 when

selecting the optimal model, the SPLMM model for each SNP is relatively computationally

intensive, much more so than the commonly used cross-sectional models used for GWASs. In

the larger cohorts, the model for one SNP can take several minutes to run due to the

complexities of the correlation structure and spline function being fit. The original aim of this

chapter was to include the meta-analysis of GWAS results for all three cohorts: the Raine

Study, ALSPAC and NFBC66. The Raine Study GWAS took approximately 2 months to conduct

all 22 autosomes using up to 145 nodes on a supercomputer (http://www.ivec.org/). The

ALSPAC GWAS was estimated to take two and a half years for the imputed data as each of

their models take approximately 10 minutes and they have approximately 20 nodes to use.

Finally, the NFBC66 GWAS was estimated to take 10-20 years for the imputed data as each

model takes 3-4 minutes to run and they only have one node to use. These are clearly not

feasible timelines to be included in this thesis, or to be used for a manuscript. It was therefore

decided that the full GWAS would be conducted in the Raine Study only for this thesis and the

other two cohorts would be used for replication. In the meantime, the R code used for

calculating the robust standard error has been rewritten in C++ code, which is more time


efficient and reduces the computational time for the robust standard error calculation in

NFBC66 from several minutes to less than a second. The analysts for this project at ALSPAC and

NFBC66 have also had several discussions with the high performance computing teams at their

universities to determine a computationally efficient procedure for running the required

scripts. This chapter has shown that although the most computationally feasible model was

chosen, which could incorporate all the complexities in the data, GWAS analysis in the

timeframe of this thesis was still not possible in large cohort studies; however, these small

changes to the analytic scripts and how they are run on the computing facilities at the

universities will allow the analysis to be conducted for a future publication. Once the meta-

analysis of these three cohorts has been conducted, replication analysis of the most significant

findings will be undertaken in additional cohorts in the EGG Consortium. The project has been

approved by this Consortium, and the current analytic plan has been circulated to the relevant

cohorts.
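
The run-time estimates quoted earlier in this section can be approximated with a back-of-envelope calculation. In the Python sketch below, the number of imputed SNPs (2.5 million) is an assumption that is not stated in the text; it was chosen only because it gives figures of the same order as those quoted. The function name is hypothetical, and the calculation assumes the per-SNP models are spread evenly over the available nodes with no overhead.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

def gwas_years(n_snps, minutes_per_model, n_nodes):
    """Wall-clock years if per-SNP models are spread evenly over the nodes."""
    return n_snps * minutes_per_model / n_nodes / MINUTES_PER_YEAR

N_SNPS = 2_500_000  # assumed number of imputed SNPs; not stated in the text

alspac_years = gwas_years(N_SNPS, 10, 20)  # ~10 minutes per model, ~20 nodes
nfbc_low = gwas_years(N_SNPS, 3, 1)        # 3 minutes per model, one node
nfbc_high = gwas_years(N_SNPS, 4, 1)       # 4 minutes per model, one node
```

Under these assumptions the ALSPAC estimate falls between two and three years and the NFBC66 estimate between roughly 14 and 19 years, which is in line with the timelines described above.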

GWASs are designed to identify regions of interest for further follow-up; they generally do not

have the ability to identify the causal locus, or in some cases the causal gene, of a given

genetic signal. This can be seen in the results presented in this Chapter, whereby the KCNJ15

gene and several genes upstream (DSCR4, DSCR8 and DSCR10) were found to be associated

with BMI trajectory across childhood using the two global Wald tests. Therefore, sequencing

this region would be required to determine the causal locus.

5.7 Conclusion

In conclusion, a genome-wide association study of BMI trajectory across childhood was

conducted in the Raine Study with replication in two large European cohorts and found a

significant association with the KCNJ15 gene. The KCNJ15 gene has previously been linked to

increased risk of type 2 diabetes, increased levels of insulin and insulin resistance. The

association with the rs2008580 SNP appears to be driven by a change in weight, rather than a change in height,

hence indicating that it is influencing adiposity rather than skeletal growth, which is consistent

with increased levels of circulating insulin. Therefore, through the development of appropriate

longitudinal models for BMI trajectories, a novel biologically plausible gene for BMI trajectory

over childhood has been discovered.


Chapter 6: Association Of A Genetic Risk Score With Longitudinal BMI In Children

6.1 Introduction

The results presented in this chapter have been published [315]; the manuscript is included as

an appendix (Appendix E).

The final area of interest was to investigate how genetic variants known to be associated with

adult BMI influence growth over childhood and adolescence (including BMI, height and weight)

and related growth parameters (including age and BMI at both the adiposity peak and

rebound). Using the statistical methods described in the previous chapters, I analyse the

association between 32 adult BMI associated genetic variants and growth over childhood and

adolescence.

6.2 Background

Recent GWASs have begun to uncover genetic loci contributing to increases in BMI in

adulthood [72,174,175,178,179,180,182]. The largest genome-wide meta-analysis of BMI

published to date included 249,796 individuals from the GIANT Consortium, which confirmed

14 previously-reported loci and identified 18 novel loci for BMI [72]. There has been one GWAS

to date that has focused on a dichotomous indicator of childhood obesity [183], but none

looking at BMI on a continuous scale in childhood.

Once adult height is attained, changes in BMI are largely driven by changes in weight. In

contrast, during childhood and adolescence, changes in BMI are influenced by both changes in

height and weight. Therefore, genetic variants that affect adult BMI may influence change in

weight, height or both during childhood. Previous studies of adult BMI SNPs in relation to

infant and child change in growth (BMI, weight and height) have shown little evidence of an

association with birth weight [206,209,316], but have shown evidence that these loci are

associated with more rapid height and weight gain in infancy [206,209], and higher BMI and

odds of obesity at multiple ages across the life course [205,206,209,251,316], with the

magnitude of associations not being constant across all ages. Rates of change in height, weight

and BMI and other features of a child’s growth are influenced by a combination of genetic and

environmental factors. These act interactively to shape developmental milestones, including

the adiposity peak at approximately nine months of age [129], adiposity rebound at around

the age of 5-6 years, and puberty between 10 and 13 years of age [156,251,317,318]. Early age

at adiposity rebound is associated with greater risk of diabetes [134,135], hypertension [136]

and obesity [132,133] in adulthood. Sovio et al [251] and Belsky et al [209] have recently

shown that SNPs associated with adult BMI are also associated with earlier age and higher BMI

at adiposity rebound. However, relatively little is known about the association of the timing of

the adiposity peak with disease later in life; Silverwood et al [128] and Sovio et al [129] have

both shown that a delayed adiposity peak is associated with increased BMI later in life.

Understanding how genetic loci are associated with BMI and other anthropometric measures

differentially across the life course may shed light on the biological pathways involved, as well

as providing insight into the development of obesity that may be useful in the design of

interventions.

6.2.1 Aims

To date, there has been no comprehensive study of how all the genetic variants currently known

to be associated with adult BMI influence growth over childhood and adolescence and growth

parameters (age and BMI at the adiposity peak and rebound). One of the limitations of

previous studies is they have not stratified by sex, despite some evidence that sex-specific

differences in body composition may be partly due to genetics [319,320], with different

(possibly partially overlapping) genes contributing to variation in body shape in men and

women. Therefore, the two aims for this Chapter are:

1) To examine the association of alleles at 32 loci identified in the recent GWAS of BMI in

European adults [72], in combination as an allelic score, with BMI, weight and height

growth trajectories from age 1 to 17 in two birth cohorts; in addition, to investigate

how early in life a genetic effect can be detected, and to explore genetic influences on

several aspects of the growth trajectories, i.e. age and BMI at adiposity peak in infancy

and adiposity rebound in childhood.

2) To assess whether there are differences in the associations between BMI and the 32

individual genetic loci between males and females.


6.3 Subjects and Materials

6.3.1 Study Populations

Both cohorts are described in detail in Chapter 1, Section 1.6. The subsets used in this analysis

are described below.

ALSPAC: A subset of 7,868 individuals was used for analysis in this study using the following

inclusion criteria: at least one parent of European descent, live singleton birth, unrelated to

anyone in the sample (one of every related pair was selected at random), no major congenital

anomalies, genotype data and at least one measure of BMI throughout childhood. BMI was

calculated from the weight and height measurements (median nine measures per person, IQR:

5-12, range 1-29 measurements), with a total of 68,862 BMI measures.

Raine Study: A subset of 1,460 individuals was used for analysis in this study using the same

criteria as in ALSPAC. BMI was calculated from the weight and height measurements (median

six measures per person, IQR: 5-7, range 1-8 measurements), with a total of 8,687 BMI

measures.

6.3.2 SNP Selection and Allelic Score

Speliotes et al [72] reported 32 variants to be associated with BMI. In addition, Belsky et al

[209] selected a tag SNP from each LD block that had previously been shown to be associated

with BMI-related traits. The 32 SNPs from either of these two manuscripts were selected; SNPs

reported in these two manuscripts that were within the genes of interest were all in high LD

(r² > 0.75). All 32 SNPs were well imputed (all R² for imputation quality > 0.7, with an average of

0.981), therefore the dosages were extracted from the imputed data (i.e. the estimated

number of increasing BMI alleles as described in Chapter 1, Section 1.4.3) for these 32 SNPs in

both ALSPAC and the Raine Study. Each SNP was incorporated into the BMI trajectory model

independently assuming an additive genetic effect for the BMI-increasing allele. In addition, an

‘allelic score’ was created by summing the dosages for the BMI-increasing alleles across all 32

SNPs [231]. Finally, a sensitivity analysis was conducted whereby the alleles were weighted by

the published effect size for adult BMI (Table 6.2). Both the weighted and unweighted allelic

scores were standardized to have a mean of zero and standard deviation of one to allow for

comparison.
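
The construction of the allelic score described above can be sketched in Python as follows. This is a minimal illustration rather than the code used for the analysis: the function names, the toy dosages for three SNPs in four individuals and the choice of the sample standard deviation for standardisation are all assumptions made for the example, and the weights are illustrative per-allele effect sizes only.

```python
from statistics import mean, stdev

def allelic_score(dosages, weights=None):
    """Sum the BMI-increasing allele dosages for one person, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(dosages)
    return sum(w * d for w, d in zip(weights, dosages))

def standardise(scores):
    """Rescale scores to mean zero and (sample) standard deviation one."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical imputed dosages (0 to 2) for three SNPs in four individuals.
dosages = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 0.98],
    [2.0, 0.0, 1.0],
    [1.0, 2.0, 2.0],
]
weights = [0.13, 0.07, 0.39]  # illustrative per-allele weights

unweighted = standardise([allelic_score(d) for d in dosages])
weighted = standardise([allelic_score(d, weights) for d in dosages])
```

Standardising both versions places the weighted and unweighted scores on the same scale, so their regression coefficients can be compared directly.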


6.4 Statistical Analysis

6.4.1 Longitudinal Modelling and Derivation of Growth Parameters

A SPLMM using smoothing splines to yield a smooth growth curve estimate, as described in

Chapter 2, Chapter 4 and Warrington et al [197], was fit to the BMI, weight and height

measures. The basic model for the jth individual and at the tth time-point is as follows:

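$$
\begin{aligned}
\mathrm{Growth}_{jt} ={}& \beta_0 + \sum_i \beta_i (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_k \gamma_k \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + \sum_l \beta_l \mathrm{Covariate}_l \\
&+ u_{0j} + \sum_i u_{ij} (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_k \eta_{kj} \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + \varepsilon_{jt}
\end{aligned}
$$

where Growth is BMI, weight or height, $\overline{\mathrm{Age}}$ is the mean age over the t time points in the

sample (i.e. eight years), $\kappa_k$ is the kth knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if $t > \kappa_k$, which is

known as the truncated power basis that ensures smooth continuity between the time

windows, and Covariate are the study specific (time independent) covariates. Three knot points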

were used, placed at two, eight and 12 years, with a cubic slope for each spline in the BMI and

height models [197]. The weight model had the same placement for the knot points but had a

linear spline from 1-2 years, cubic slope for 2-8 years and 8-12 years and finally a quadratic

slope for over 12 years. All models assumed a continuous autoregressive of order 1 correlation

structure.
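
To make the truncated power basis concrete, the following Python sketch builds the fixed-effect age terms for a single observation. It is a simplified illustration, not the model-fitting code: the function names are hypothetical, the same cubic degree is assumed for every knot (as in the BMI and height models), and the knots at two, eight and 12 years are assumed to be centred on the mean age in the same way as age itself.

```python
def truncated_power(x, knot, degree):
    """(x - knot)_+^degree: zero at or below the knot, a power term above it."""
    return (x - knot) ** degree if x > knot else 0.0

def spline_row(age, mean_age=8.0, knots=(2.0, 8.0, 12.0), degree=3):
    """Design row: polynomial terms in centred age, then one term per knot."""
    c = age - mean_age
    poly = [c ** i for i in range(1, degree + 1)]
    # Knots are given in years, so they are centred in the same way as age.
    trunc = [truncated_power(c, k - mean_age, degree) for k in knots]
    return poly + trunc

row_at_10 = spline_row(10.0)  # a 10-year-old: the knot at 12 years is inactive
```

For an observation at 10 years the term for the knot at 12 years is zero, so only the knots already passed contribute to the fitted curve at that age.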

Age and BMI at adiposity rebound were derived by setting the first derivative of the fixed and

random effects from the BMI model between two and eight years of age for each individual to

zero (i.e. the minimum point in the curve). In addition, a second model was fit in ALSPAC only,

between birth and five years to derive the adiposity peak, with each individual required to

have greater than two measures throughout this period, similar to Silverwood et al [128]. BMI

measurements after five years were excluded to avoid the period of adiposity rebound. This

model had knot points at ages one and 2.5 years and a cubic slope between each. Adiposity

peak was derived by setting the first derivative of the fixed and random effects between birth

and 2.5 years to zero. The Raine Study does not have enough repeated measurements

between birth and one year of age to justify calculating the adiposity peak.
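
The derivation of the adiposity rebound can be sketched as follows. Within the two to eight year window the fitted curve for an individual (fixed plus random effects) reduces to a single cubic in centred age, so its first derivative is a quadratic whose roots can be found directly. The function name and the coefficients below are hypothetical, and the sketch assumes the combined coefficients for the window have already been extracted from the fitted model.

```python
import math

def cubic_minimum(b1, b2, b3, lo, hi):
    """Return x in [lo, hi] where b1*x + b2*x**2 + b3*x**3 has a local minimum.

    The first derivative b1 + 2*b2*x + 3*b3*x**2 is set to zero and the root
    with a positive second derivative is kept; None if no such root exists.
    """
    if b3 == 0:
        roots = [-b1 / (2 * b2)] if b2 != 0 else []
    else:
        disc = (2 * b2) ** 2 - 4 * (3 * b3) * b1
        if disc < 0:
            return None
        r = math.sqrt(disc)
        roots = [(-2 * b2 - r) / (6 * b3), (-2 * b2 + r) / (6 * b3)]
    for x in roots:
        if lo <= x <= hi and 2 * b2 + 6 * b3 * x > 0:
            return x
    return None

# Hypothetical individual-level coefficients on centred age (age - 8 years);
# the window of two to eight years corresponds to centred ages -6 to 0.
x_min = cubic_minimum(b1=0.5, b2=0.125, b3=0.0, lo=-6.0, hi=0.0)
age_at_rebound = 8.0 + x_min
```

The BMI at rebound would then be obtained by evaluating the individual's fitted curve at that age; the adiposity peak can be found in the same way by keeping the root with a negative second derivative.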

6.4.2 Statistical Analysis

Implausible height, weight and BMI measurements (> 4SD from the mean for sex and age

specific category) were considered as outliers and were recoded to missing.
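
A minimal Python sketch of this outlier rule is given below, assuming it is applied separately within each sex and age category. The function name and the toy BMI values are hypothetical, and the use of the sample standard deviation (computed with the outlying value included) is an assumption.

```python
from statistics import mean, stdev

def recode_outliers(values, n_sd=4.0):
    """Replace values more than n_sd SDs from the group mean with None (missing)."""
    observed = [v for v in values if v is not None]
    m, s = mean(observed), stdev(observed)
    return [None if v is None or abs(v - m) > n_sd * s else v for v in values]

# Toy BMI values for one sex and age category, including one implausible value
# and one measurement that was already missing.
values = [16.0] * 15 + [17.0] * 15 + [60.0, None]
cleaned = recode_outliers(values)
```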


Genetic differences in the trajectories were estimated by including an interaction between the

spline function for age and the genetic variants (each SNP [BMI only] and the allelic score). The

association between the genetic variants as an allelic score and weight and length at birth was

analysed using linear regression, adjusting for gestational age at birth. An interaction between

gestational age (as a continuous variable and dichotomised as preterm [<37 weeks] and full

term) and the allelic score was also tested to determine whether associations between BMI-

associated variants and growth differ by gestational age. Linear regression was used to

investigate the associations of the allelic score with age and BMI at both adiposity peak

(ALSPAC only) and adiposity rebound.

As discussed in Chapter 3, the growth data were collected using three measurement sources in

ALSPAC: clinic visits, measurements made during routine health care visits, and parental

reports in questionnaires. Whilst the measurements from routine health care visits have

previously been shown to be accurate in this cohort [190], parental report of children’s height

tends to be overestimated while weight tends to be underestimated [191], so all analyses of

the trajectories adjusted for a binary indicator of measurement source (parent reports versus

clinic/health care measurements) as a fixed effect in ALSPAC.

Section 1.6 in Chapter 1 showed evidence for population stratification in the Raine Study so all

analyses in the current study are adjusted for the first five principal components. This was not

the case for ALSPAC so no adjustment was made.

Given that growth curves differ greatly between males and females, particularly around

puberty, and because different genes may influence the timing of growth spurts in males and

females, the effect of sex over the time period was investigated. Preliminary analysis showed

evidence for a sex-age interaction in both cohorts (likelihood-ratio test [LRT] P-Values < 0.0001

in both cohorts); therefore, all analyses were stratified by sex.

Given that FTO is the most replicated locus for BMI, with the largest effect size of the BMI-

associated SNPs to date, and has been shown previously to affect childhood growth [205,251],

there is some concern that the results of the allelic score represent the associations

seen with the FTO locus only. To determine whether this was the case, all of the analyses looking

at the allelic score were repeated including an adjustment for the FTO locus.


The results from the two cohorts for each of the analyses were meta-analysed. A meta-analysis

was conducted rather than pooling the data as different covariates were necessary for each of

the cohorts. For the allelic score analyses, a fixed-effects inverse-variance weighted meta-

analysis was conducted using the beta coefficients and standard errors from the two studies.

No statistically significant heterogeneity was detected between the cohorts (all P-Values >

0.05), so a random-effects meta-analysis was not warranted. The allelic

score was considered to be statistically significantly associated with the growth parameter if

the P-Value for the meta-analysis was less than 0.05. For the analyses of the individual SNPs

with BMI, a P-Value meta-analysis was conducted on the LRT P-Values from the two studies,

without weighting, and a Bonferroni significance threshold of 0.0016 (0.05/32) was used to

declare a statistically significant association. All analyses were conducted in R version 2.12.1

[222], using the spida library to estimate the spline functions, the rmeta library for the effect-

size meta-analysis and the MADAM library for the P-Value meta-analysis.
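
The fixed-effects inverse-variance weighted meta-analysis can be illustrated with a short Python sketch. This is not the rmeta code used for the analysis; the function name is hypothetical, and the inputs are the rounded cohort estimates for the allelic score on female height reported in Table 6.3, so the pooled values agree with the combined column only up to rounding. The Bonferroni threshold for the 32 individual SNP tests is computed alongside.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled estimate, its SE and a 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci

# Rounded estimates (beta, SE) for ALSPAC and the Raine Study from Table 6.3.
pooled, pooled_se, ci = fixed_effects_meta([0.088, 0.124], [0.028, 0.059])

bonferroni_threshold = 0.05 / 32  # 32 individual SNP tests
```

The larger cohort has the smaller standard error and therefore dominates the pooled estimate, which is why the combined coefficients sit much closer to the ALSPAC estimates than to those from the Raine Study.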

6.5 Results

The characteristics of the cohorts are described in Table 6.1. Fifty-one percent of both cohorts

were male. The ALSPAC participants had more BMI measures throughout childhood than the

Raine Study participants, with a median of nine (IQR: 5-12) and six (IQR: 5-7), respectively. The

genetic loci are described in Table 6.2. All of the following results are reported from the meta-

analysis of the two cohorts, unless otherwise specified.

Table 6.1: Phenotypic characteristics of the two birth cohorts used for analysis

                                               ALSPAC (n=7,868)          Raine Study (n=1,460)
Characteristic               Sex / Stratum     N       Mean (SD)         N       Mean (SD)
Sex [% male (N)]             --                7,868   51.25% (4,032)    1,460   51.58% (753)
BMI measures per person      --                --      8.75 (4.58)       --      5.94 (1.52)
Birth Weight (kg)            Males             3,001   3.52 (0.53)       752     3.42 (0.57)
                             Females           2,855   3.40 (0.47)       707     3.31 (0.55)
Birth Length (cm)            Males             3,001   51.13 (2.40)      675     50.12 (2.34)
                             Females           2,855   50.41 (2.28)      616     49.31 (2.28)
Gestational Age (wks.)       Males             3,001   39.52 (1.64)      753     39.42 (1.99)
                             Females           2,855   39.65 (1.58)      707     39.42 (2.06)
Adiposity Peak BMI (kg/m²)   Males             4,030   18.03 (0.76)      --      --
                             Females           3,792   17.45 (0.69)      --      --
Adiposity Peak Age (months)  Males             4,030   8.90 (0.33)       --      --
                             Females           3,792   9.36 (0.49)       --      --
Adiposity Rebound BMI        Males             3,642   15.62 (1.04)      697     15.53 (0.93)
(kg/m²)                      Females           3,225   15.53 (1.06)      647     15.42 (0.95)
Adiposity Rebound Age        Males             3,642   6.07 (1.02)       697     5.30 (1.05)
(years)                      Females           3,225   5.61 (1.16)       647     4.64 (1.10)

Age (years)                  1-1.49            2,832   1.18 (0.18)       1,326   1.15 (0.09)
                             1.5-2.49          7,113   1.76 (0.25)       387     2.14 (0.13)
                             2.5-3.49          2,537   2.95 (0.28)       956     3.09 (0.09)
                             3.5-4.49          6,915   3.77 (0.23)       20      3.69 (0.17)
                             4.5-5.49          1,843   5.05 (0.33)       3       5.28 (0.14)
                             5.5-6.49          3,848   5.90 (0.24)       1,269   5.91 (0.17)
                             6.5-7.49          2,861   7.31 (0.30)       42      7.25 (0.38)
                             7.5-8.49          3,975   7.74 (0.33)       1,040   8.02 (0.27)
                             8.5-9.49          4,443   8.71 (0.22)       204     8.60 (0.12)
                             9.5-10.49         6,777   9.94 (0.29)       303     10.44 (0.08)
                             10.5-11.49        4,917   10.75 (0.23)      926     10.64 (0.15)
                             11.5-12.49        5,240   11.82 (0.21)      4       11.91 (0.36)
                             12.5-13.49        6,797   12.97 (0.22)      9       13.28 (0.17)
                             13.5-14.49        4,690   13.89 (0.17)      1,196   14.06 (0.17)
                             14.5-15.49        2,339   15.32 (0.15)      24      14.69 (0.17)
                             15.5-16.49        1,645   15.72 (0.22)      2       16.16 (0.19)
                             >16.5             90      16.83 (0.24)      976     17.05 (0.24)

BMI (kg/m²)                  1-1.49            2,832   17.42 (1.51)      1,326   17.11 (1.39)
                             1.5-2.49          7,113   16.82 (1.49)      387     15.97 (1.19)
                             2.5-3.49          2,537   16.48 (1.40)      956     16.14 (1.23)
                             3.5-4.49          6,915   16.25 (1.39)      20      15.92 (1.41)
                             4.5-5.49          1,843   16.02 (1.70)      3       15.94 (1.43)
                             5.5-6.49          3,848   15.71 (1.87)      1,269   15.82 (1.62)
                             6.5-7.49          2,861   16.10 (1.98)      42      16.41 (2.43)
                             7.5-8.49          3,975   16.31 (2.01)      1,040   16.83 (2.38)
                             8.5-9.49          4,443   17.15 (2.40)      204     16.90 (2.44)
                             9.5-10.49         6,777   17.67 (2.81)      303     18.91 (3.34)
                             10.5-11.49        4,917   18.25 (3.10)      926     18.55 (3.16)
                             11.5-12.49        5,240   19.04 (3.35)      4       16.78 (2.64)
                             12.5-13.49        6,797   19.64 (3.35)      9       21.11 (3.75)
                             13.5-14.49        4,690   20.31 (3.45)      1,196   21.39 (4.02)
                             14.5-15.49        2,339   21.28 (3.48)      24      21.66 (4.23)
                             15.5-16.49        1,645   21.41 (3.51)      2       20.14 (3.26)
                             >16.5             90      22.47 (3.40)      976     23.01 (4.28)


Table 6.2: Descriptive statistics of the single nucleotide polymorphisms included in the allelic score.

Alleles are given as Effect Allele / Non-effect Allele.

                                                         GWAS Effect    Effect Allele Frequency   R²                     HWE
Chr  Nearest Gene                 SNP         Alleles    Size for BMI   ALSPAC   Raine Study      ALSPAC   Raine Study   ALSPAC   Raine Study
1    NEGR1                        rs2568958   A/G        0.13           0.60     0.62             1.00     1.00          0.59     0.58
1    TNNI3K                       rs1514175   A/G        0.07           0.42     0.44             1.00     1.00          0.96     0.87
1    PTBP2                        rs1555543   C/A        0.06           0.59     0.59             1.00     1.00          0.83     0.96
1    SEC16B                       rs543874    G/A        0.22           0.21     0.20             1.00     0.99          0.50     0.09
2    TMEM18                       rs2867125   C/T        0.31           0.83     0.83             1.00     1.00          0.34     0.27
2    RBJ, ADCY3, POMC             rs713586    C/T        0.14           0.49     0.48             1.00     1.00          0.27     1.00
2    FANCL                        rs887912    T/C        0.1            0.29     0.29             1.00     1.00          0.28     0.57
2    LRP1B                        rs2890652   C/T        0.09           0.17     0.16             0.99     0.98          0.04     0.92
3    CADM2                        rs13078807  G/A        0.1            0.20     0.21             1.00     1.00          0.12     0.88
3    ETV5, DGKG, SFRS10           rs7647305   C/T        0.14           0.79     0.79             0.97     1.00          0.38     0.13
4    SLC39A8                      rs13107325  T/C        0.19           0.08     0.07             1.00     1.00          0.87     0.57
4    GNPDA2                       rs10938397  G/A        0.18           0.43     0.44             0.99     0.99          0.11     0.02
5    FLJ35779, HMGCR              rs2112347   T/G        0.1            0.64     0.63             0.99     0.99          0.48     0.32
5    ZNF608                       rs4836133   A/C        0.07           0.49     0.49             0.94     0.93          0.69     0.47
6    TFAP2B                       rs987237    G/A        0.13           0.18     0.19             1.00     1.00          0.49     0.27
9    LRRN6C                       rs10968576  G/A        0.11           0.32     0.31             1.00     1.00          0.35     0.58
9    LMX1B                        rs867559    G/A        0.24           0.20     0.20             1.00     1.00          0.67     1.00
11   RPL27A, TUB                  rs4929949   C/T        0.06           0.54     0.52             0.97     0.97          0.48     0.36
11   BDNF                         rs6265      C/T        0.19           0.81     0.81             1.00     1.00          0.002    0.24
11   MTCH2, NDUFS3, CUGBP1        rs3817334   T/C        0.06           0.40     0.42             1.00     1.00          0.98     0.71
12   FAIM2                        rs7138803   A/G        0.12           0.36     0.37             1.00     1.00          0.73     0.09
13   MTIF3, GTF3A                 rs4771122   G/A        0.09           0.23     0.22             0.93     0.93          0.07     0.41
14   PRKD1                        rs11847697  T/C        0.17           0.05     0.04             0.97     0.93          0.71     0.26
14   NRXN3                        rs10150332  C/T        0.13           0.21     0.22             1.00     1.00          0.69     0.17
15   MAP2K5, LBXCOR1              rs2241423   G/A        0.13           0.79     0.77             1.00     1.00          0.87     0.21
16   GPRC5B, IQCK                 rs12444979  C/T        0.17           0.86     0.85             1.00     0.98          0.93     0.92
16   SH2B1, ATXN2L, TUFM, ATP2A1  rs7359397   T/C        0.15           0.42     0.38             1.00     1.00          0.69     0.87
16   FTO                          rs9939609   A/T        0.39           0.39     0.38             1.00     1.00          0.69     0.70
18   MC4R                         rs12970134  A/G        0.23           0.27     0.25             1.00     1.00          0.51     0.63
19   KCTD15                       rs29941     G/A        0.06           0.68     0.66             1.00     1.00          0.72     0.56
19   TMEM160, ZC3H4               rs3810291   A/G        0.09           0.69     0.64             0.77     0.71          0.01     4.81×10⁻⁵
19   QPCTL, GIPR                  rs2287019   C/T        0.15           0.81     0.81             1.00     1.00          0.85     0.87

6.5.1 Association Between the Allelic Score and Growth Trajectories

The allelic score was associated with higher mean levels of BMI at age eight (Female:

β=0.0061, P-Value < 0.001; Male: β=0.0044, P-Value < 0.001) and faster BMI growth over

childhood in both sexes (Table 6.3). Due to the increasing rate of growth over time, the

trajectories of individuals with high and low allelic scores begin together at age one but

separate throughout childhood (Figure 6.1).

To investigate whether the association of these loci with BMI growth over childhood was due

to skeletal growth or adiposity, associations between the allelic score and both weight and

height measurements were tested. The allelic score was associated with higher weight

(Females: β=0.0073, P-Value < 0.001; Males β=0.0056, P-Value < 0.001) and faster rates of

weight gain over childhood in both males and females (Figure 6.1 and Table 6.3). As for

associations with BMI, the association was seen earlier in males (by one year of age in ALSPAC)

than females (around two years of age in ALSPAC). The allelic score was associated with

increased height in females (β=0.0949, P-Value < 0.001) and males (β=0.0838, P-Value < 0.001)

and also displayed evidence for an interaction with age (Figure 6.1 and Table 6.3).

Interestingly, the effect size of the allelic score increases over childhood until around 10 years

of age in females and slightly later in males and then decreases until it becomes statistically

non-significant (Figure 6.2).

Table 6.3: Results of the allelic score with each of the trajectory outcomes (BMI, weight and

height) in both cohorts and the combined meta-analysis. Significant findings are in bold;

Spline 1 is the change in slope between two and eight years, Spline 2 is the change in slope

after 12 years and Spline 3 is the change in slope before two years.

                                     ALSPAC                  Raine Study             Combined
Outcome     Sex     Effect           Beta        SE          Beta        SE          Beta        95% CI                   P
ln(BMI)     Female  Score            0.006       0.001       0.007       0.001       0.006       (0.005, 0.007)           <0.01
                    age:score        0.001       1.0×10⁻⁴    0.001       3.0×10⁻⁴    0.001       (0.001, 0.001)           <0.01
                    age²:score       -3.7×10⁻⁵   1.0×10⁻⁴    -3.0×10⁻⁴   3.0×10⁻⁴    -6.2×10⁻⁵   (-3.0×10⁻⁴, 1.0×10⁻⁴)    0.54
                    age³:score       -1.0×10⁻⁴   4.7×10⁻⁵    -2.0×10⁻⁴   2.0×10⁻⁴    -9.1×10⁻⁵   (-2.0×10⁻⁴, -2.7×10⁻⁶)   0.04
                    Spline 1         -5.0×10⁻⁵   1.0×10⁻⁴    2.0×10⁻⁴    3.0×10⁻⁴    -2.5×10⁻⁵   (-2.0×10⁻⁴, 2.0×10⁻⁴)    0.80
                    Spline 2         0.001       2.0×10⁻⁴    1.4×10⁻⁶    4.0×10⁻⁴    3.0×10⁻⁴    (-5.0×10⁻⁵, 0.001)       0.09
                    Spline 3         -0.008      0.006       -0.012      0.018       -0.009      (-0.020, 0.002)          0.12
ln(BMI)     Male    Score            0.004       0.001       0.007       0.001       0.004       (0.003, 0.005)           <0.01
                    age:score        0.001       1.0×10⁻⁴    0.001       2.0×10⁻⁴    0.001       (0.001, 0.001)           <0.01
                    age²:score       3.0×10⁻⁴    1.0×10⁻⁴    1.0×10⁻⁴    3.0×10⁻⁴    2.0×10⁻⁴    (5.2×10⁻⁵, 4.0×10⁻⁴)     0.01
                    age³:score       3.5×10⁻⁵    4.5×10⁻⁵    -1.4×10⁻⁵   1.0×10⁻⁴    3.1×10⁻⁵    (-5.3×10⁻⁵, 1.0×10⁻⁴)    0.47
                    Spline 1         -3.0×10⁻⁴   1.0×10⁻⁴    -1.0×10⁻⁴   3.0×10⁻⁴    -3.0×10⁻⁴   (-0.001, -8.3×10⁻⁵)      0.01
                    Spline 2         0.001       2.0×10⁻⁴    3.0×10⁻⁴    4.0×10⁻⁴    0.001       (3.0×10⁻⁴, 0.001)        <0.01
                    Spline 3         0.003       0.005       -0.008      0.016       0.002       (-0.008, 0.012)          0.70
ln(Weight)  Female  Score            0.007       0.001       0.009       0.002       0.007       (0.006, 0.009)           <0.01
                    age:score        0.001       1.0×10⁻⁴    0.001       3.0×10⁻⁴    0.001       (0.001, 0.001)           <0.01
                    age²:score       1.6×10⁻⁵    1.0×10⁻⁴    1.0×10⁻⁴    2.0×10⁻⁴    2.2×10⁻⁵    (-8.2×10⁻⁵, 1.0×10⁻⁴)    0.68
                    Spline 1         -1.0×10⁻⁴   3.8×10⁻⁵    -3.0×10⁻⁴   2.0×10⁻⁴    -1.0×10⁻⁴   (-2.0×10⁻⁴, -5.9×10⁻⁵)   <0.01
                    Spline 2         -           -           0.001       4.0×10⁻⁴    -           -                        -
ln(Weight)  Male    Score            0.005       0.001       0.008       0.002       0.006       (0.004, 0.007)           <0.01
                    age:score        0.001       1.0×10⁻⁴    0.001       3.0×10⁻⁴    0.001       (0.001, 0.001)           <0.01
                    age²:score       2.0×10⁻⁴    5.0×10⁻⁵    -1.0×10⁻⁴   1.0×10⁻⁴    2.0×10⁻⁴    (10.0×10⁻⁵, 3.0×10⁻⁴)    <0.01
                    Spline 1         -2.0×10⁻⁴   3.5×10⁻⁵    -1.0×10⁻⁴   1.0×10⁻⁴    -2.0×10⁻⁴   (-3.0×10⁻⁴, -1.0×10⁻⁴)   <0.01
                    Spline 2         -           -           2.0×10⁻⁴    3.0×10⁻⁴    -           -                        -
Height      Female  Score            0.088       0.028       0.124       0.059       0.095       (0.045, 0.145)           <0.01
                    age:score        0.012       0.004       0.024       0.008       0.014       (0.007, 0.020)           <0.01
                    age²:score       0.002       0.004       -0.006      0.010       0.001       (-0.006, 0.008)          0.76
                    age³:score       0.001       0.002       -0.004      0.005       0.001       (-0.002, 0.004)          0.64
                    Spline 1         -0.007      0.004       1.0×10⁻⁴    0.010       -0.006      (-0.012, 0.001)          0.07
                    Spline 2         0.024       0.008       0.020       0.012       0.023       (0.010, 0.035)           <0.01
                    Spline 3         0.109       0.183       -0.393      0.581       0.063       (-0.279, 0.406)          0.72
Height      Male    Score            0.077       0.027       0.116       0.062       0.084       (0.035, 0.133)           <0.01
                    age:score        0.011       0.004       0.009       0.009       0.011       (0.003, 0.018)           0.04
                    age²:score       0.001       0.004       -0.012      0.013       -2.0×10⁻⁴   (-0.007, 0.007)          0.96
                    age³:score       1.0×10⁻⁴    0.002       -0.004      0.006       -1.0×10⁻⁴   (-0.003, 0.003)          0.92
                    Spline 1         4.0×10⁻⁴    0.004       0.007       0.013       0.001       (-0.006, 0.008)          0.79
                    Spline 2         -0.016      0.009       -0.008      0.016       -0.014      (-0.029, 0.001)          0.06
                    Spline 3         -0.082      0.184       0.0708      0.698       -0.072      (-0.419, 0.276)          0.69

Figure 6.1: Population average curves for individuals from ALSPAC with 27, 29 or 31 BMI

risk alleles in females (A, C and E) and males (B, D and F). Predicted population average BMI

(A and B), weight (C and D) and height (E and F) trajectories from 1 – 16 years for individuals

with 27 (lower quartile), 29 (median), and 31 (upper quartile) BMI risk alleles in the allelic

score.


Figure 6.2: Associations between the allelic score and BMI (A and B), weight (C and D) and

height (E and F) at each follow-up in females and males from ALSPAC. Regression coefficients

(95% CI) derived from the longitudinal model at each year of follow-up between 1 and 16

years.


6.5.2 Associations Between the Allelic Score and Birth Measures, Adiposity Peak and Adiposity Rebound

As expected, females were both lighter and shorter than males at birth (Table 6.1). The allelic

score was not associated with the birth measures in either sex (Table 6.4). In addition, there

was no interaction between the allelic score and gestational age for either weight or length at

birth (data not shown).

In ALSPAC the mean age of the adiposity peak was slightly later in females at 9.03 months

(SD=0.76) than males at 8.43 months (SD=0.55), with males also having a higher BMI at the

peak than females (Table 6.1). The estimated age and the BMI at the peak were weakly

correlated in females (ρ=0.08) and males (ρ=-0.30). Greater age at adiposity peak was

associated with higher BMI at age 15-17 in females (β=0.6257kg/m2, P-Value < 1.65x10-8) but

not associated with later BMI in males (β=-0.0408kg/m2, P-Value=0.78). In addition, higher BMI

at adiposity peak was associated with higher BMI at age 15-17 years (Females: β=1.3682kg/m2,

P-Value=3.11 x10-44, Males: β=1.0578kg/m2, P-Value=9.4463x10-30). The allelic score was not

statistically significantly associated with age of reaching the adiposity peak in females or males

(Table 6.4). However, the allelic score was associated with a higher BMI at the peak (Table 6.4),

explaining less than 0.5% of the variation in BMI at the peak in both females (0.42%) and males

(0.22%). Adjustment for age at the adiposity peak did not substantively alter the magnitude of

the association of the allelic score with BMI at adiposity peak (Females: β=0.0157kg/m2, P-

Value=0.0003; Males: β=0.0135kg/m2, P-Value=0.0007).

Table 6.4: Cross-sectional association analysis results for birth measures, BMI and age at

adiposity peak (AP) and BMI and age at adiposity rebound (AR) in ALSPAC and the Raine

Study.

                Females                                Males
                Beta (95% CI)               P-Value    Beta (95% CI)               P-Value
Birth weight    -0.0004 (-0.0043, 0.0035)   0.83       0.0026 (-0.0017, 0.0069)    0.23
Birth length    -0.0158 (-0.0352, 0.0036)   0.11       -0.0002 (-0.0190, 0.0186)   0.98
BMI at AP       0.0163 (0.0079, 0.0248)     <0.001     0.0123 (0.0041, 0.0204)     <0.001
Age at AP       0.0074 (-0.0002, 0.0151)    0.06       0.0028 (-0.0025, 0.0080)    0.30
BMI at AR       0.0332 (0.0237, 0.0427)     <0.001     0.0364 (0.0277, 0.0451)     <0.001
Age at AR       -0.0362 (-0.0467, -0.0257)  <0.001     -0.0362 (-0.0450, -0.0274)  <0.001


The ALSPAC participants had a later adiposity rebound than the Raine Study participants with a

mean of 6.1 years (SD=1.02) versus 5.3 (SD=1.05) years in boys and 5.6 (SD=1.16) years versus

4.6 (SD=1.10) years in girls, respectively (Table 6.1). Earlier age at the adiposity rebound and

higher BMI at the adiposity rebound were both associated with higher BMI at age 15-17 years.

The allelic score was associated with an earlier age at the adiposity rebound for females and

males (Table 6.4). Both associations remained after adjusting for BMI at the rebound,

although the effect sizes attenuated (Females: β=-0.0122 years, P-Value=0.002; Males:

β=-0.0096 years, P-Value=0.002). The allelic score was also associated with higher BMI at the

rebound in females and males (Table 6.4). Both associations remained significant after

adjusting for age at the rebound, although the effect sizes attenuated (Females: β=0.0094kg/m2,

P-Value=0.01; Males: β=0.0109kg/m2, P-Value < 0.001). The allelic score accounts for 1-2% of

the variation in age and BMI at the adiposity rebound in the two cohorts, which is twice as

much of the variation in BMI as was accounted for at the time of the adiposity peak or in the

overall trajectory.

There is a strong positive correlation between BMI at the adiposity peak and the adiposity

rebound (Female ρ=0.65; Male ρ=0.59). The BMI at the adiposity rebound explains more of the

variation in BMI at age 15-17 than the BMI at the adiposity peak, with estimates of around 10%

for the adiposity peak and 45% for the adiposity rebound. Nevertheless, the allelic score

remains associated with BMI at the adiposity rebound after adjusting for the BMI at the

adiposity peak in both females (β=0.0171kg/m2, P-Value < 0.001) and males (β=0.0269kg/m2,

P-Value < 0.001).

6.5.3 Variance Explained by the Allelic Score

We calculated the percentage of variation in BMI explained by the allelic score at each time

point in ALSPAC using the residual sums of squares from the longitudinal BMI growth model.

This was not calculated in the Raine Study as the sample size is too small for accurate

estimates. The allelic score explains 0.58% of the variation in BMI across childhood in females

and slightly less in males (0.44%). This is approximately a third of the variation in adult BMI

explained by these SNPs in the study that identified them [72]. Figure 6.3 displays the

estimates over childhood in females and males.
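The calculation described above can be sketched as follows. This is a minimal illustration, assuming that the proportion of variation explained at each time point is taken as one minus the ratio of the residual sum of squares (RSS) from the model with the allelic score to the RSS from the model without it. The function name `r2_by_age` and the toy residuals are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def r2_by_age(ages, resid_without_score, resid_with_score):
    """Per-age proportion of residual variation removed by adding the score.

    Assumes R2 = 1 - RSS(with score) / RSS(without score), where the
    residuals come from two fits of the same longitudinal model.
    """
    rss_without = defaultdict(float)
    rss_with = defaultdict(float)
    for age, r0, r1 in zip(ages, resid_without_score, resid_with_score):
        rss_without[age] += r0 * r0
        rss_with[age] += r1 * r1
    return {age: 1.0 - rss_with[age] / rss_without[age]
            for age in sorted(rss_without)}


# Toy residuals (hypothetical): two observations at age 1, two at age 2.
ages = [1, 1, 2, 2]
resid_without_score = [1.0, -1.0, 2.0, -2.0]
resid_with_score = [0.9, -0.9, 1.0, -1.0]

r2 = r2_by_age(ages, resid_without_score, resid_with_score)
for age, value in r2.items():
    print(f"age {age}: R2 = {value:.2%}")
```

Grouping by age before taking the ratio is what allows the explained variation to differ across the trajectory, which is the quantity plotted against age in Figure 6.3.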


Figure 6.3: Estimates from the longitudinal models of the proportion of BMI variation

explained (R2) at each time point in females and males from ALSPAC. R2 derived from the

longitudinal model at each year of follow-up between 1 and 16 years. Of note, there are

increases in the proportion of BMI variation explained by the allelic score around the

landmarks of growth including adiposity peak and puberty.


The allelic score accounted for a similar percentage of the variation in BMI at the adiposity peak in both

females (0.42%) and males (0.22%). However, for the measures at the adiposity rebound, the

allelic score accounts for up to 1-2% of the variation in the two cohorts (Age: 0.87% in ALSPAC

females, 2.70% in the Raine Study females, 1.46% in ALSPAC males and 0.89% in the Raine

Study males; BMI: 1.01% in ALSPAC females, 1.87% in the Raine Study females, 1.46% in

ALSPAC males and 1.14% in the Raine Study males). This is twice as much of the variation in

BMI as was accounted for at the time of the adiposity peak or in the overall trajectory.

6.5.4 Sex Interactions Between the 32 Individual BMI SNPs and BMI Trajectories

In females, differences in BMI trajectories due to the allelic score were detectable from just

after one year of age in ALSPAC and approximately 2.5 years of age in the Raine Study. The LRT

for five of the 32 adult BMI loci reached the Bonferroni significance threshold of 0.0016 in the

meta-analysis, including loci from the RBJ, FTO, MC4R, CADM2 and MTCH2 regions (Appendix

F; Table 1).
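The Bonferroni threshold quoted above is consistent with dividing a family-wise significance level of 0.05 by the 32 loci tested. The sketch below shows that arithmetic; the 0.05 level is an assumption inferred from the reported threshold, and the helper `passes_bonferroni` is hypothetical.

```python
N_LOCI = 32    # number of adult BMI loci tested
ALPHA = 0.05   # assumed family-wise significance level

# 0.05 / 32 = 0.0015625, reported as 0.0016 to four decimal places.
threshold = ALPHA / N_LOCI


def passes_bonferroni(p_value, n_tests=N_LOCI, alpha=ALPHA):
    """Return True if a P-Value is below the Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests


print(f"Bonferroni threshold: {threshold:.4f}")
```

Under this threshold a P-Value of 0.0039 would not be declared significant, whereas a P-Value of 0.001 would.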

In males, differences in BMI trajectories due to the allelic score were detectable from the

beginning of the curve at one year in ALSPAC and slightly later at one and a half years in the

Raine Study. Based on the LRT, four of the 32 adult BMI loci were significantly associated with BMI

trajectory at the Bonferroni significance threshold in the meta-analysis, including loci from the

SEC16B, TMEM18, MC4R and FTO regions (Appendix F; Table 2).

Given that different genes were associated with childhood BMI growth in males and females, BMI

trajectory analyses with males and females combined were also conducted to investigate any sex by

genetic locus interactions. None of the sex differences for the 32 loci would be significant under

Bonferroni correction; however, the following result is reported here as an exploratory finding. The

meta-analysis of LRT P-Values for the interaction terms between sex and the NRXN3 locus,

rs10150332 (including interaction with the spline function), had a P-Value of 0.0039.

6.5.5 Adjustment for FTO Effect

When adjusting for the FTO locus, rs9939609, in the trajectory and adiposity peak/rebound

analyses, the results remained statistically significant, with no attenuation of the effect size.

This indicates that the associations between growth and the allelic score were independent of

the FTO effect (Table 6.5).


Table 6.5: Results of the allelic score, after adjustment for the FTO locus, with each of the

trajectory outcomes (BMI, weight and height) in both cohorts and the combined meta-

analysis. Significant findings are in bold; Spline 1 is the change in slope between two and eight

years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before

two years.


Effect      Beta       SE        Beta       SE        Beta       95% CI               P      P (Het)

ln(BMI), Female
Score       0.006      0.001     0.007      0.001     0.006      (0.005, 0.007)       <0.01  0.3
age:score   0.001      1.1x10-4  0.001      2.7x10-4  0.001      (0.001, 0.001)       <0.01  0.7
age2:score  -5.1x10-5  1.1x10-4  0.000      3.1x10-4  -8.3x10-5  (-3x10-4, 1x10-4)    0.43   0.4
age3:score  -8.7x10-5  4.8x10-5  0.000      1.6x10-4  -9.7x10-5  (-2x10-4, -6x10-6)   0.04   0.5
Spline 1    -2.6x10-5  1.1x10-4  0.000      3.1x10-4  5.7x10-6   (-2x10-4, 2x10-4)    0.96   0.4
Spline 2    4.0x10-4   2.4x10-4  0.000      3.8x10-4  3.0x10-4   (-1x10-4, 6x10-4)    0.20   0.2
Spline 3    -0.008     0.006     -0.015     0.019     -0.009     (-0.020, 0.003)      0.13   0.7

ln(BMI), Male
Score       0.004      0.001     0.006      0.001     0.004      (0.003, 0.005)       <0.01  0.2
age:score   0.001      1.0x10-4  0.001      2.4x10-4  0.001      (0.001, 0.001)       <0.01  0.5
age2:score  2.4x10-4   1.1x10-4  1.7x10-4   3.0x10-4  2.0x10-4   (4x10-5, 4x10-4)     0.02   0.8
age3:score  2.9x10-5   4.5x10-5  4.2x10-5   1.5x10-4  3.1x10-5   (-5x10-5, 1x10-4)    0.48   0.9
Spline 1    -2.6x10-4  1.1x10-4  -2.1x10-4  2.9x10-4  -2.0x10-4  (-4x10-4, -6x10-5)   0.01   0.9
Spline 2    0.001      2.4x10-4  3.9x10-4   3.6x10-4  0.001      (2x10-4, 0.001)      0.00   0.6
Spline 3    0.003      0.005     -0.004     0.016     0.003      (-0.007, 0.013)      0.60   0.7

ln(Weight), Female
Score       0.007      0.001     0.009      0.002     0.007      (0.006, 0.009)       <0.01  0.4
age:score   0.001      1.1x10-4  0.001      3.3x10-4  0.001      (0.001, 0.001)       <0.01  0.2
age2:score  6.5x10-6   5.7x10-5  9.8x10-5   1.7x10-4  1.6x10-5   (-9x10-5, 1x10-4)    0.77   0.6
Spline 1    -1.1x10-4  3.9x10-5  -2.7x10-4  1.6x10-4  -1.0x10-4  (-2x10-4, -4x10-5)   <0.01  0.3
Spline 2    -          -         0.001      3.7x10-4  -          -                    -      -

ln(Weight), Male
Score       0.005      0.001     0.007      0.002     0.005      (0.004, 0.007)       <0.01  0.2
age:score   0.001      1.1x10-4  0.001      3.0x10-4  0.001      (0.001, 0.001)       <0.01  0.7
age2:score  2.x10-4    5.1x10-5  -4.4x10-5  1.5x10-4  2.0x10-4   (9x10-5, 3x10-4)     <0.01  0.1
Spline 1    -2.1x10-4  3.6x10-5  -7.9x10-5  1.4x10-4  -2.0x10-4  (-3x10-4, -1x10-4)   <0.01  0.4
Spline 2    -          -         2.0x10-4   3.3x10-4  -          -                    -      -

Height, Female
Score       0.088      0.029     0.147      0.061     0.099      (0.048, 0.150)       <0.01  0.4
age:score   0.011      0.004     0.024      0.008     0.013      (0.007, 0.020)       <0.01  0.2
age2:score  0.001      0.004     -0.006     0.010     0.001      (-0.006, 0.007)      0.88   0.5
age3:score  0.001      0.002     -0.004     0.005     0.001      (-0.002, 0.004)      0.71   0.4
Spline 1    -0.006     0.004     -0.001     0.010     -0.006     (-0.012, 0.001)      0.10   0.6
Spline 2    0.024      0.008     0.022      0.012     0.023      (0.010, 0.036)       0.00   0.9
Spline 3    0.154      0.187     -0.352     0.602     0.109      (-0.241, 0.459)      0.54   0.4

Height, Male
Score       0.069      0.028     0.104      0.063     0.075      (0.025, 0.125)       0.00   0.6
age:score   0.009      0.004     0.007      0.009     0.009      (0.002, 0.017)       0.01   0.8
age2:score  0.001      0.004     -0.012     0.013     -2.0x10-4  (-0.007, 0.007)      0.95   0.3
age3:score  1.5x10-4   0.002     -0.004     0.006     -1.0x10-4  (-0.003, 0.003)      0.94   0.5
Spline 1    0.001      0.004     0.009      0.013     0.001      (-0.006, 0.008)      0.73   0.6
Spline 2    -0.018     0.009     -0.012     0.016     -0.016     (-0.031, -0.001)     0.03   0.8
Spline 3    -0.116     0.186     0.039      0.708     -0.106     (-0.458, 0.246)      0.56   0.8

Adiposity Peak, Female
Age         0.009      0.004     -          -         -          -                    -      -
BMI         0.019      0.004     -          -         -          -                    -      -

Adiposity Peak, Male
Age         0.003      0.003     -          -         -          -                    -      -
BMI         0.013      0.004     -          -         -          -                    -      -

Adiposity Rebound, Female
Age         -0.028     0.006     -0.051     0.013     -0.032     (-0.043, -0.021)     <0.01  0.1
BMI         0.032      0.006     0.037      0.011     0.033      (0.023, 0.042)       <0.01  0.7

Adiposity Rebound, Male
Age         -0.032     0.005     -0.031     0.011     -0.032     (-0.041, -0.023)     <0.01  0.9
BMI         0.035      0.005     0.032      0.010     0.035      (0.026, 0.044)       <0.01  0.8
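The combined columns of Table 6.5 can be checked against the two cohort-specific estimates. The sketch below assumes a fixed-effect, inverse-variance weighted meta-analysis with Cochran's Q as the heterogeneity test; this is an assumption for illustration rather than a statement of the method used in the thesis, and the function name `meta_two_cohorts` is hypothetical. The input values are the Score row for height in females.

```python
import math
from statistics import NormalDist


def meta_two_cohorts(beta1, se1, beta2, se2):
    """Fixed-effect inverse-variance meta-analysis of two estimates."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z975 = NormalDist().inv_cdf(0.975)  # about 1.96
    ci = (beta - z975 * se, beta + z975 * se)
    # Two-sided P-Value for the pooled estimate.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(beta / se)))
    # Cochran's Q; with two cohorts it has 1 degree of freedom, and the
    # chi-square(1) upper tail equals erfc(sqrt(Q / 2)).
    q = w1 * (beta1 - beta) ** 2 + w2 * (beta2 - beta) ** 2
    p_het = math.erfc(math.sqrt(q / 2.0))
    return {"beta": beta, "se": se, "ci": ci, "p": p, "p_het": p_het}


# Score effect on height in females: the two cohort-specific Beta (SE) pairs.
result = meta_two_cohorts(0.088, 0.029, 0.147, 0.061)
print(f"Beta = {result['beta']:.3f}, "
      f"95% CI = ({result['ci'][0]:.3f}, {result['ci'][1]:.3f}), "
      f"P (Het) = {result['p_het']:.1f}")
```

Under these assumptions the pooled estimate, confidence interval and heterogeneity P-Value round to the values reported in that row of the table, which suggests the tabulated results are at least consistent with this weighting scheme.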

6.5.6 Comparison with Weighted Allelic Score

The mean of the weighted allelic score was 4.09 (SD=0.54) in ALSPAC and 4.06 (SD=0.55) in the

Raine Study while the mean for the unweighted allelic score was higher at 28.82 (SD=3.45) in

ALSPAC and 28.65 (SD=3.54) in the Raine Study. After standardizing both of these scores, the

results using the weighted allelic score in the trajectory models displayed the same

associations as the unweighted results (Table 6.6).
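The two scores compared here can be sketched as follows. This is a minimal illustration, assuming the unweighted score is a count of risk alleles and the weighted score multiplies each count by a per-SNP effect size before summing. The genotypes and weights below are hypothetical toy values (three SNPs rather than 32), not data from either cohort.

```python
from statistics import mean, stdev


def unweighted_score(risk_allele_counts):
    """Total number of risk alleles carried (0, 1 or 2 per SNP)."""
    return sum(risk_allele_counts)


def weighted_score(risk_allele_counts, weights):
    """Risk allele counts weighted by a per-SNP effect size."""
    return sum(c * w for c, w in zip(risk_allele_counts, weights))


def standardize(values):
    """Rescale to mean 0 and SD 1 so the two scores are comparable."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical genotypes for four individuals at three SNPs.
genotypes = [[0, 1, 2], [1, 1, 1], [2, 2, 0], [0, 0, 1]]
weights = [0.10, 0.20, 0.30]  # hypothetical per-allele effect sizes

unweighted = [unweighted_score(g) for g in genotypes]
weighted = [weighted_score(g, weights) for g in genotypes]
z_unweighted = standardize(unweighted)
z_weighted = standardize(weighted)
```

Standardizing both scores puts their regression coefficients on the same per-SD scale, which is why the raw means of the two scores can differ greatly while the standardized results remain comparable.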


Table 6.6: Comparison of the unweighted and weighted allelic scores for the three trajectory outcomes. Spline 1 is the change in slope between two and

eight years, Spline 2 is the change in slope after 12 years and Spline 3 is the change in slope before two years.

            Unweighted                              Weighted
Effect      Beta       95% CI               P       Beta       95% CI               P

ln(BMI), Female
Score       0.0209     (0.017, 0.0249)      <0.001  0.0207     (0.0168, 0.0247)     <0.001
age:score   0.0039     (0.00317, 0.0046)    <0.001  0.0041     (0.00343, 0.00486)   <0.001
age2:score  -0.0002    (-0.0009, 0.0005)    0.59    -4.10x10-5 (-0.0008, 0.0007)    0.91
age3:score  -0.0003    (-0.0007, 0.00004)   0.08    -0.0003    (-0.0007, 0.00006)   0.10
Spline 1    -0.0001    (-0.0008, 0.0006)    0.79    -0.0003    (-0.0010, 0.0005)    0.49
Spline 2    0.0012     (-0.0002, 0.0025)    0.09    0.0015     (0.0001, 0.0028)     0.03
Spline 3    -0.0298    (-0.0674, 0.0079)    0.12    -0.0306    (-0.0678, 0.0067)    0.11

ln(BMI), Male
Score       0.0151     (0.0115, 0.0187)     <0.001  0.0181     (0.0145, 0.0217)     <0.001
age:score   0.0035     (0.0028, 0.0042)     <0.001  0.0045     (0.0040, 0.0051)     <0.001
age2:score  0.0008     (0.00008, 0.0015)    0.03    0.0009     (0.0002, 0.0016)     0.02
age3:score  8.00x10-5  (-0.0003, 0.0004)    0.67    0.0001     (-0.0002, 0.0005)    0.49
Spline 1    -0.0009    (-0.0016, -0.0002)   0.01    -0.0011    (-0.0018, -0.0004)   0.003
Spline 2    0.0022     (0.0009, 0.0035)     0.001   0.0026     (0.0013, 0.0040)     <0.001
Spline 3    0.0070     (-0.0275, 0.0415)    0.69    -0.0003    (-0.035, 0.0345)     0.99

ln(Weight), Female
Score       0.0251     (0.0199, 0.0303)     <0.001  0.0248     (0.0196, 0.0300)     <0.001
age:score   0.0034     (0.0027, 0.0042)     <0.001  0.0037     (0.0030, 0.0045)     <0.001
age2:score  0.0001     (-0.0003, 0.0005)    0.53    0.0002     (-0.0001, 0.0006)    0.22
Spline 1    -0.0004    (-0.0006, -0.0002)   <0.001  -0.0005    (-0.0007, -0.0003)   <0.001

ln(Weight), Male
Score       0.0193     (0.0144, 0.0242)     <0.001  0.0237     (0.0188, 0.0286)     <0.001
age:score   0.0034     (0.0027, 0.0041)     <0.001  0.0041     (0.0034, 0.0048)     <0.001
age2:score  0.0006     (0.0002, 0.0009)     0.002   0.0005     (0.0002, 0.0009)     0.004
Spline 1    -0.0006    (-0.0008, -0.0004)   <0.001  -0.0007    (-0.0009, -0.0005)   <0.001

Height, Female
Score       0.3280     (0.1560, 0.5010)     0.0002  0.3140     (0.1420, 0.4860)     <0.001
age:score   0.0470     (0.0245, 0.0695)     <0.001  0.0414     (0.0190, 0.0638)     <0.001
age2:score  0.0037     (-0.0194, 0.0268)    0.75    0.0030     (-0.0201, 0.0260)    0.80
age3:score  0.0024     (-0.0076, 0.0123)    0.64    0.0024     (-0.0076, 0.0123)    0.64
Spline 1    -0.0206    (-0.0430, 0.0018)    0.07    -0.0179    (-0.0402, 0.0044)    0.12
Spline 2    0.0790     (0.0352, 0.1230)     <0.001  0.0629     (0.0191, 0.1070)     0.004
Spline 3    0.2230     (-0.9630, 1.4100)    0.71    0.0523     (-1.1200, 1.2300)    0.93

Height, Male
Score       0.2900     (0.1210, 0.4590)     0.001   0.3900     (0.2200, 0.5590)     <0.001
age:score   0.0366     (0.0115, 0.0617)     0.004   0.0464     (0.0213, 0.0715)     <0.002
age2:score  -0.0005    (-0.0250, 0.0240)    0.97    -0.0028    (-0.0273, 0.0217)    0.83
age3:score  -0.0005    (-0.0108, 0.0098)    0.93    -0.0009    (-0.0114, 0.0096)    0.87
Spline 1    0.0033     (-0.0209, 0.0274)    0.79    0.0006     (-0.0235, 0.0248)    0.96
Spline 2    -0.0495    (-0.1010, 0.0019)    0.06    -0.0278    (-0.0789, 0.0234)    0.29
Spline 3    -0.2480    (-1.4500, 0.9550)    0.69    0.5020     (-0.7090, 1.7100)    0.42

6.6 Discussion

In this study, the association of variants in genes shown to be associated with increased BMI in

adulthood with longitudinal growth measures over childhood was investigated in two birth

cohorts of 7,868 and 1,460 individuals. Similar to previous studies, it was shown that an allelic

score derived from a set of known adult BMI-associated SNPs is not associated with birth

measures but is associated with BMI growth throughout childhood and adolescence. In

addition, a statistically significant association was detected between the allelic score and

weight gain over childhood and adolescence and also an association with height growth,

though the effect size for height was smaller than that for BMI and weight. It appears that the

association of the allelic score with height growth stops after the age that was considered to

reflect puberty, whereas it continues to be associated with weight, and therefore BMI. This

indicates that the variants that are associated with adult BMI are related to childhood BMI

growth by affecting both how much fat is accumulated and how tall the child grows until

puberty. Although the individual SNPs are associated with adiposity, they might not all be

associated with height, or height growth, hence the weaker association detected between the

allelic score and height. Of the SNPs included in the allelic score, the GIANT consortium found a

SNP 30,000bp upstream from the RBJ locus and a SNP in the MC4R gene to be associated with

adult height [295]. An extension to the current study would be to investigate whether any of

the individual SNPs in the allelic score largely influence child height growth rather than weight;

however a larger sample size would be required to consider this.

This study is an extension to the work conducted by Elks et al [206] who investigated the

association of an eight SNP allelic score with growth trajectories from birth to 11 years of age

in ALSPAC. The analysis in this Chapter has extended their work using both ALSPAC and the

Raine Study, by increasing the age period over which the trajectories are examined and also

the number of SNPs investigated. By extending the time period under study, it was shown that

although the weight gain in early childhood is small for each additional risk allele, this gain

increases through late childhood and adolescence. Belsky et al [209] are the only other

investigators to look at an allelic score using the same set of SNPs; the conclusions in this Chapter

are similar to theirs in terms of the growth trajectories throughout childhood. However, the

results in this Chapter also show the early timing of the effect and provide some exploratory

findings regarding sex-specific genes affecting BMI growth.


Previous studies investigating the association between adult BMI associated SNPs and

childhood growth adjusted their analyses for sex [205,206,209,251,316]; only Hardy et al [205]

tested for a sex interaction and found it to be non-significant. In this Chapter, sex-specific models

were run because a statistically significant sex interaction was detected; these showed that the

allelic score begins to be associated with BMI and weight earlier in males than females, but

around the same age for height. Furthermore, other than the FTO and MC4R SNPs, different

genes appear to be associated with childhood BMI growth in males and females. However,

these differences could not be replicated in the formal interaction analysis and therefore

further investigation in larger sample sizes is required to confirm this observation.

Nevertheless, this is the first study in childhood that has observed some evidence for sex

differences for genetic associations of BMI, which provides additional evidence that there may

be different, but partially overlapping, genes that contribute to the body shape of men and

women, even in childhood.

For the first time, it has been shown that the BMI increasing alleles have a

detectable effect on childhood growth as early as one year of age in males and slightly later in

females. In addition, a statistically significant association was detected between the allelic

score and higher BMI at the adiposity peak, but only weak evidence of an association between

the allelic score and age at adiposity peak. This contrasts with the findings for the association

between the FTO gene and adiposity peak shown in the NFBC66 [129], where the age at the

adiposity peak was associated with the FTO variant but not BMI at the peak. Similar to

previous studies investigating the genetic determinants of the adiposity rebound, the analysis

in this Chapter found that for each additional BMI risk allele an individual’s age at the adiposity

rebound decreases by approximately 11 days and also their BMI at the rebound increases by

0.035kg/m2.

Further studies are now required to assess the validity of these findings, particularly regarding

the onset of the genetic association. Both of the cohorts investigated had limited data

available in the first few years of life, and although the statistical modelling framework allowed

for the estimation of the timing of the genetic association and the parameters around the

adiposity peak, it is important that this is replicated in cohorts with more regular

measurements throughout this time period. Likewise, although over 4,500 males and females


were included in the sex stratified analyses, it is important that the observed sex differences in

this study are replicated.

6.7 Conclusion

In conclusion, an association analysis has been conducted in a large childhood population to

investigate the effect of known adult genetic determinants of BMI on childhood growth. It has

been shown that the genetic effect begins very early in life and that there are potentially sex

differences in the genetic effects of BMI throughout childhood. These results are consistent

with both the DOHaD and life course epidemiology hypotheses – the determinants of adult

susceptibility to obesity begin in early childhood and develop over the life course.


Chapter 7: Conclusions, Limitations And Future Directions

7.1 Main Findings

This chapter provides a summary of the research and the conclusions presented throughout

this thesis. The limitations of this research are also presented and discussed. This chapter

concludes with a discussion of the potential areas of extension for future research.

In 2009, at the beginning of the research in this thesis, Kerner et al indicated that the groups in

the Genetic Analysis Workshop that focused on longitudinal analysis of GWAS data were

unable to establish a clear analytic strategy for dealing with complex longitudinal data

structures in a time efficient manner [85]. The primary aim of this thesis, as detailed in Chapter

1, was therefore to investigate the association between BMI growth trajectories across

childhood and adolescence and genetic variants on a genome-wide scale. Compared to cross-

sectional analyses, longitudinal studies are advantageous for investigating genetic associations

as they: 1) allow information to be shared between individuals across time improving precision

over analysis of a single time point; 2) facilitate the detection of genes that influence

trajectories rather than simple differences in phenotypes; and 3) allow the detection of genes

that are associated with age of onset of a trait [85]. There are several traits, including BMI as a

measure of adiposity, where the age of onset provides insight into the pathophysiological

heterogeneity between individuals. For example, individuals who have a high BMI at all ages,

reflecting a high lean and fat body mass, appear to have relatively normal metabolic profiles;

individuals who have normal BMI followed by an early adiposity rebound and consequently a

higher BMI, reflecting increased fat rather than lean mass, are at higher risk for coronary heart

disease and insulin resistance [132,137,139,140]. Therefore, attention to the timing and

trajectory of a phenotype can help to clarify the underlying pathophysiological process. Kerner

et al conclude that “Future work should focus on the development of analytical methods and

computer software that can handle these longitudinal data in the context of other

complexities that are often found in cohort studies.” [85]. The research presented in this

thesis, a series of separate but consecutive studies, was conducted to investigate

statistical methods that could be used for analysing longitudinal data in large scale genetic

studies. The first three studies investigated the statistical aspects of various modelling

approaches, while the final two studies focused on the genetic association analyses and clinical

implications; these studies are summarized below.

7.1.1 Longitudinal Statistical Models for Body Mass Index Growth Trajectories

throughout Childhood using the Western Australian Pregnancy Cohort

(Raine) Study [197]

A systematic review of the literature identified no statistical method for detecting

small genetic effects in a high dimensional, longitudinal study, such as modelling BMI

trajectories across childhood and adolescence. Therefore, in the first study of this thesis [197],

the major methodological challenges of analysing these data were investigated and the

SPLMM was shown to be the most appropriate model for genetic association studies of BMI

trajectories. This model has the ability to be used for high dimensional studies such as GWASs

or gene-gene/gene-environment interaction studies. This study contributes to a more

complete understanding of the advantages and limitations of each of the statistical methods

evaluated and provides a basis for furthering the exploration of genetic associations with BMI

trajectories – the focus of this thesis. In later studies, this model was applied to the ALSPAC

and NFBC66 cohorts, indicating that it is a flexible modelling framework that is generalizable to

other cohorts and potentially to other complex longitudinal phenotypes.

7.1.2 Comparing SPLMM to Two-Step Approach for GWASs

There have recently been several publications investigating a two-step approach for

conducting longitudinal GWASs that greatly reduces the computational time for traits that

have a linear trajectory over time [85,89,245]. The second study of this thesis explored how

this two-step approach could be applied to the BMI data over childhood and adolescence,

which has a non-linear trajectory over time, a high correlation between the intercept and slope

terms and non-normal, correlated errors. This study showed that the two-step approach

produced results that were unreliable for complex phenotypes, particularly due to the data

having a non-linear trajectory over time. It is recommended that, before conducting a GWAS of

a longitudinal phenotype, the one- and two-step approaches are compared for reliability.

Therefore, although it is far more computationally intensive, the full SPLMM was used for the

GWAS analyses in this thesis.
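The two-step idea can be sketched for the simple case it was designed for, a linear trajectory. The sketch below is a simplification and not the published procedure: it assumes step one fits a separate ordinary least squares slope to each individual (the cited approaches may instead derive individual slopes from a mixed model), and step two regresses those slopes on genotype. All names and toy values are hypothetical.

```python
def ols_slope(x, y):
    """Slope from a simple least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def two_step_snp_effect(ages_by_person, bmi_by_person, genotypes):
    """Step 1: one growth slope per person. Step 2: regress slopes on genotype."""
    slopes = [ols_slope(a, b) for a, b in zip(ages_by_person, bmi_by_person)]
    return ols_slope(genotypes, slopes)


# Toy data (hypothetical): linear BMI growth whose slope rises by 0.1
# kg/m2 per year for each copy of the risk allele.
ages = [1.0, 2.0, 3.0, 4.0]
genotypes = [0, 1, 2]
ages_by_person = [ages for _ in genotypes]
bmi_by_person = [[16.0 + (0.5 + 0.1 * g) * t for t in ages] for g in genotypes]

effect = two_step_snp_effect(ages_by_person, bmi_by_person, genotypes)
print(f"Estimated SNP effect on slope: {effect:.3f}")
```

Because step one summarizes each individual with a single straight-line slope, any curvature in the true trajectory is lost before the genetic test is run, which is one way to see why the approach is less suited to non-linear phenotypes such as childhood BMI.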


7.1.3 Robustness of the Linear Mixed Effects Model to Distribution Assumptions

and Consequences for Genome-Wide Association Studies

When the four methods from the first study were applied to the ALSPAC data to ensure that

the SPLMM was generalizable, the residuals were non-normal and heteroscedastic, thus not

meeting the assumptions of the model. The third study presented in this thesis explored,

through simulation and a real data example, the effect of these error misspecifications on the

genetic results in a chromosome-wide study. It is shown that the type 1 error for the SNP by

age interaction terms in a genetic association study is inflated, regardless of the type of

model misspecification. To address this issue, results are presented that describe the use of

robust standard errors for the fixed effects parameters as an appropriate way to reduce the

type 1 error, in most scenarios to nominal levels. Given that mixed models have only recently

begun to be used in GWASs, this study provided practical guidance to genetics researchers

investigating longitudinal traits regarding the use of appropriate statistical methods when

model assumptions cannot be met.
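The idea behind a robust (sandwich) standard error can be illustrated in a much simpler setting than the linear mixed model used in this thesis. The sketch below is an assumption-laden toy: it uses a simple linear regression with one predictor and the basic heteroscedasticity-consistent (HC0) estimator, and it is not the robust standard error code written for the thesis. The function name and data are hypothetical.

```python
def slope_with_variances(x, y):
    """Least squares slope with model-based and HC0 robust variances.

    Model-based variance assumes constant error variance:
        s^2 / Sxx, with s^2 = RSS / (n - 2).
    The HC0 sandwich variance lets each observation carry its own
    squared residual:
        sum((x_i - mean_x)^2 * e_i^2) / Sxx^2.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    model_var = sum(e * e for e in resid) / (n - 2) / sxx
    robust_var = sum((xi - mean_x) ** 2 * e * e
                     for xi, e in zip(x, resid)) / sxx ** 2
    return slope, model_var, robust_var


# Toy data (hypothetical) with three observations.
slope, model_var, robust_var = slope_with_variances(
    [-1.0, 0.0, 1.0], [0.0, 0.0, 3.0])
print(f"slope = {slope}, model-based SE = {model_var ** 0.5:.3f}, "
      f"robust SE = {robust_var ** 0.5:.3f}")
```

The two variances agree in expectation only when the constant-variance assumption holds; when the residuals are heteroscedastic, test statistics built on the model-based variance can be miscalibrated, which is the kind of problem the robust standard errors are intended to address.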

7.1.4 Genome-Wide Association Study of BMI Trajectories Across Childhood

Once the methodological aspects of the thesis had been completed, the results from the first

three methodological studies were applied to longitudinal BMI data from three large human

cohorts. The fourth study of this thesis was the GWAS analysis for BMI trajectory across

childhood and adolescence in the Raine Study, with replication of the most significant region in

the ALSPAC and NFBC66 cohorts. Variants in the KCNJ15 gene were shown to be associated

with BMI trajectory over this time period in both males and females. The most significant SNP

in the Raine Study, rs2008580, is a transcription factor binding site within an intronic region of

the gene. The KCNJ15 gene has previously been reported to be associated with an increased

risk of type 2 diabetes, increased levels of insulin and insulin resistance [291,293]. The association

with the rs2008580 SNP appears to be driven by a change in weight rather than a change in height,

indicating that it is influencing adiposity rather than skeletal growth, which is consistent with

increased levels of circulating insulin. Therefore, through the development of appropriate

longitudinal models for BMI trajectories, a novel biologically plausible gene for BMI trajectory

over childhood was discovered.


7.1.5 Association of a Genetic Risk Score with Longitudinal BMI in Children

The final study presented in this thesis evaluated the impact of all known and replicated

genetic variants associated with adult BMI on growth over childhood and adolescence

(including BMI, height and weight) and related growth parameters (including age and BMI at

both the adiposity peak and rebound). The results of this study indicate that these variants

appear to be related to childhood BMI growth by affecting both how much fat is accumulated

and how tall the child grows until puberty. The genetic effect begins very early in life, around

one year, and there are potentially sex differences in the impact these variants in the

established obesity genes have on BMI growth throughout childhood. These results provide

further evidence for the life course epidemiology hypotheses and provide insight into the

development of obesity, based on the genetic predisposition, which may be useful in the

design of intervention studies.

7.2 Limitations

Although the SPLMM method was shown to be the most efficient and flexible approach for

modelling complex longitudinal phenotypes in large-scale genetics studies, it remained a

computationally intensive method for GWAS analyses particularly in the studies with larger

sample size and more repeated measurements.

7.2.1 Computational Intensity

As discussed in Chapter 5, I was only able to conduct a full GWAS analysis in the Raine Study,

primarily due to the computational burden that the GWAS would have in the other studies. The

larger cohorts 1) had a longer computation time for the analysis of

each SNP due to the greater sample size and additional measures per person and 2) had

limited computational facilities available at the cohorts at the time of the study. To attempt to

reduce the computational burden, the code for calculating the robust standard errors and

associated P-Values was rewritten in C++, which reduced the time from several minutes to

seconds per SNP in the ALSPAC and NFBC66 cohorts. To make it feasible to conduct analysis of

the full 2.5 million SNPs, without adjustment to how the scripts were run on the high

performance computers, the lme function from R would also have to be rewritten in C++,

which would require a considerable amount of programming. Instead, the analysts for this

project at ALSPAC and NFBC66 are currently in discussion with the high performance

computing teams at their universities to determine a computationally efficient procedure for


running the required scripts. If there is no alternative to reduce the computation time using

the SPLMM method with robust standard errors, either the analytic plan will need to be

redefined or the lme function reprogrammed in C++.

7.2.2 Gene Discovery

Because the Raine Study is relatively small with only 1,461 individuals and a maximum of 8

measurements per individual, the GWAS in this study only detected one genetic region of

interest. To find reliable genetic associations with BMI trajectories, it will be necessary to have

a larger sample size in the discovery GWAS analysis. To be able to increase the sample size, it

would be necessary to reduce the computational time for each SNP in the larger cohorts, as

discussed above.

7.3 Future Directions

To improve the clinical care provided by medical practitioners, additional knowledge about the

complex interplay between genetics, the environment and behaviours underlying the disease

is required. To understand more about the mechanisms of diseases, such as obesity, complex

statistical methods, such as those described in this thesis, will need to be utilized. The use of

longitudinal data in genetic association studies has only recently begun, hence there are

numerous opportunities for building further on the studies described in this thesis. Below are a

number of potential extensions; however this list is not exhaustive.

7.3.1 Reducing Remaining Type 1 Error

The chromosome-wide analysis in Chapter 4 and the GWAS in Chapter 5 present results that

demonstrate there is some remaining inflation in the λ value for the SNP*age effect. This may

be because there are many associated SNPs in the analysis; alternatively, and more likely,

there is some residual type 1 error inflation for that parameter. As discussed in Chapter 5, it

may be possible to use genomic control adjustment to reduce this to nominal levels; however,

it would need to be determined that the remaining inflation, after using the robust standard

errors, is constant across the genome, as genomic control scales all test statistics by the

same factor.
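
A minimal Python sketch of the genomic control adjustment is given below. It assumes that each SNP*age test statistic follows a chi-square distribution with one degree of freedom under the null hypothesis; the function names are hypothetical and the code is not the implementation used in this thesis. The sketch makes explicit that every statistic is divided by the same λ, which is why the adjustment is only appropriate if the residual inflation is constant across the genome.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Estimate lambda as the observed median statistic over the expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def chi2_1df_pvalue(stat):
    """Upper-tail P-value for a 1-df chi-square statistic: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control(chi2_stats):
    """Divide every statistic by the same lambda and recompute the P-values.

    Assumption: statistics are left unchanged when lambda is below 1,
    which is a common convention rather than a requirement.
    """
    lam = genomic_inflation(chi2_stats)
    scale = max(lam, 1.0)
    adjusted = [s / scale for s in chi2_stats]
    return lam, adjusted, [chi2_1df_pvalue(s) for s in adjusted]
```

For example, `genomic_control(stats)` applied to the list of SNP*age statistics from one analysis returns the estimated λ together with the rescaled statistics and their P-values.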


7.3.2 Longitudinal Family Studies

There are many large cohort studies that combine both a longitudinal and family based study

design, for example the Framingham Heart Study [321] and the Busselton Health Study [322].

Generally, when using these cohorts, either a cross-sectional study of the families is conducted

or a longitudinal study of unrelated individuals, depending on the research question of

interest. The modelling framework presented in this thesis can easily be extended to account

for the within-family correlation by incorporating additional random effects, as well as

accounting for the within-individual correlation discussed here. This could be very powerful for

gene discovery by including all genotyped individuals in the analysis to increase the sample

size, as opposed to selecting an unrelated subset of individuals (a design that has previously

been used [322]).
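
One possible form of such an extension is sketched below. The notation is illustrative only and is not taken from the models in the earlier chapters: the indices f, i and j are assumed to denote family, individual and measurement occasion, and a single family-level random intercept u_f is the simplest choice of additional random effect.

```latex
y_{fij} = \mathbf{x}_{fij}^{\top}\boldsymbol{\beta}
        + \mathbf{z}_{fij}^{\top}\mathbf{b}_{fi}
        + u_{f} + \varepsilon_{fij},
\qquad
u_{f} \sim N(0, \sigma_{u}^{2}), \quad
\mathbf{b}_{fi} \sim N(\mathbf{0}, \mathbf{D}), \quad
\varepsilon_{fij} \sim N(0, \sigma^{2})
```

Here the individual-level random effects b_fi capture the within-individual correlation and u_f induces correlation between members of the same family. A shared intercept implies the same correlation for every pair of relatives, so general pedigrees would probably require a covariance structure based on the degree of relatedness instead.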

7.3.3 Adjusting for Environmental Covariates

As discussed in Section 1.5, there are several important covariates that have previously been

shown to be associated with BMI throughout childhood and adolescence, including duration of

breast feeding [123,124] and nutrition over childhood [125,126], amount of physical activity

[127,142] and timing of puberty [147,148]. Although the modelling frameworks presented

here allow for covariates and it would be beneficial to adjust for them to decrease the total

residual variation and potentially increase power to detect small genetic effects, it is not

without its challenges. There are two types of covariates that can be incorporated into the

models presented in this thesis: time-invariant and time-varying covariates. Ideally,

time-invariant covariates would be those which occur before the start of the trajectory and remain

consistent throughout the time window being modelled. Examples of such covariates would

include sex, maternal smoking during pregnancy and birth weight. Covariates of this nature

could be incorporated into the SPLMM model and investigation into the changes to the genetic

effect could be conducted. Time-invariant covariates that occur within the time window being

modelled, such as the timing of puberty, are more difficult to incorporate. However, one

example of incorporating such covariates would be to include a knot point in the SPLMM at the

time of the covariate to allow the curve to differ before and after the onset of the covariate.

Finally, time-varying covariates can also be included in the modelling frameworks presented

here, although some thought regarding how they are incorporated is often required. Often in

cohort studies, such as the Raine Study, ALSPAC and NFBC66, the collection of variables such

as nutrition and exercise changes over the follow-up years. This is often necessary as the


nutrition and exercise patterns of a two-year-old are quite different to those of a 13-year-old.

For example, questions regarding breast feeding and baby food are asked at the two-year

follow-up of the Raine Study, whereas a food frequency questionnaire is administered at the 13-year

follow-up. Therefore, to incorporate a time-varying covariate for ‘nutritional status’ over the

full time window, these different questions would need to be combined into a single interpretable

variable. This classification was beyond the scope of this thesis; however, the modelling

frameworks presented will allow for such covariates.
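
The suggestion of placing a knot at the time of a covariate such as puberty can be illustrated with the following Python sketch. It uses a linear truncated-power basis purely for simplicity, which may differ from the spline basis of the SPLMM, and the function name, the fixed knot ages and the puberty ages are hypothetical.

```python
def spline_row(age, fixed_knots, puberty_age=None):
    """Design-matrix row for one measurement occasion.

    Columns: intercept, age, one truncated term per fixed knot, and one
    extra truncated term whose knot is the individual's own age at puberty.
    """
    row = [1.0, age]
    row += [max(0.0, age - knot) for knot in fixed_knots]
    # The extra term is zero before onset and grows linearly afterwards,
    # so its coefficient is the change in slope after the onset of puberty.
    if puberty_age is None:
        row.append(0.0)
    else:
        row.append(max(0.0, age - puberty_age))
    return row


# Hypothetical individual measured at ages 10 and 14, with puberty at age 12.
before = spline_row(10.0, fixed_knots=[2.0, 8.0], puberty_age=12.0)
after = spline_row(14.0, fixed_knots=[2.0, 8.0], puberty_age=12.0)
```

Because the knot location differs between individuals, the additional column allows each curve to change slope at that individual's own onset time rather than at a common age.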

Although adjustment for covariates is an important part of genetic and epidemiological

research, covariates have not been incorporated in this thesis for two main reasons: firstly, the

covariates that need to be included when modelling BMI trajectories are complex to

incorporate, as outlined in the previous paragraph; and secondly, the three cohorts used in the

analyses in this thesis collected different covariates at different follow-up times. To include the

required covariates in the BMI models, further research into the best way to incorporate them,

and into their generalizability to all three cohorts, would be required; this was beyond the scope of

this thesis.

7.3.4 Gene-Environment and Gene-Gene Interactions

The environment plays an important role in BMI and the development of obesity, particularly

diet and exercise. As discussed in Chapter 2, it is difficult to investigate the genetics of BMI

without investigating the environmental determinants simultaneously. In addition, it is unlikely

that a single gene regulates growth; rather, multiple genes are likely to act collectively. An important

future research direction would be to investigate both gene-gene and gene-environment

interactions in association with childhood BMI trajectories. The research in this thesis has

provided a framework that will allow extensions to studies of these interactions; however,

given the statistical power and computational issues, a candidate gene study may initially be

more appropriate than a genome-wide study.

Researchers are beginning to investigate the interactions between known BMI genes and

environmental factors in cross-sectional study designs. For example, previous studies have

shown that the FTO effect was attenuated by approximately 30% in individuals who were more

physically active [323,324,325,326,327]. In addition, studies that examined the interaction of

FTO with dietary intake showed that the effect of FTO on BMI was less pronounced in


individuals with a low calorie intake [328], with healthier diets [329] or who were breast fed for

more than two months [330]. These findings indicate that the effect of FTO is sensitive to a healthy lifestyle in

general, but that such a lifestyle does not remove the genetic effect altogether. Another study investigated the

association between BMI and 12 BMI/obesity associated variants as a genetic risk score; this

study showed that the effect of the score was 40% less pronounced in individuals who were

physically active [331]. Therefore, it is thought that the influence of these lifestyle modifications may extend

beyond the FTO genotype alone. An extension to these studies in a longitudinal setting could be

to study how the timing of onset of a healthy diet or increased physical activity interacts with

specific genes to influence BMI trajectory.

Diet and exercise, as environmental predictors of BMI throughout childhood, were not

investigated in this thesis, as it is difficult to get a consistent measure across the age range of

interest. Although at most of the follow-ups in the Raine Study, diet and physical activity were

measured in some way, it is difficult to combine the measures into one (or a small number of)

time-varying covariates that could be used in a mixed model framework. In addition, the other

cohorts that were used for replication utilized different measures of diet and exercise than the

Raine Study. As gene-environment interactions were not the primary interest of this thesis,

these covariates were not included; however, the framework described in this thesis allows

the inclusion of both time-varying and time-invariant covariates in future studies.

7.3.5 Fine Mapping

For the majority of loci identified to date, including the 32 adult BMI associated loci discussed

in Chapter 6, there is no clear variant or gene that is on the causal pathway to obesity. Some

loci are in regions with multiple genes, whereas others are intergenic. Therefore, work remains

to be done to fine-map these regions and identify the causal genes and loci, such that they can

be followed up in experimental research [282]. Longitudinal methods may be beneficial when

fine mapping a region as they can provide additional information regarding the timing of

onset, which in turn can be used in developing individualized interventions based on an

individual’s genetic profile. In addition, if the association is real, the KCNJ15 region would need to be sequenced

to determine the causal locus affecting childhood growth, in addition to investigating

functional data. Although the rs2008580 variant was the most significant SNP in the Raine

Study, this was not replicated in the ALSPAC or NFBC66 cohorts and therefore is unlikely to be

the causal locus in the region; however, several SNPs in the region spanning from the DSCR4 to


KCNJ15 genes were significantly associated with BMI trajectory in the meta-analysis of the

three cohorts, indicating there is potentially a genetic variant in the region that influences

childhood growth.

7.3.6 Rare Variants

Imputing against the 1000 Genomes reference panel to conduct gene-based tests, rather than tests of individual

common variants, is an inexpensive method that may highlight additional genomic regions of

interest. Although the SPLMM method may not be suitable for analysis of all 28 million

variants imputed from the 1000 Genomes reference panel, due to the computational burden, it may be

useful when looking at the gene-based tests used for rare variant analysis

[237,238,239,240,241,242,243]. Using these methods, either the number of variants in the

gene is counted and used as the explanatory variable in the association analysis (known as

burden tests) or mixed effects models are used to test the variance of the random effects. In

addition, the SPLMM could be used for the investigation of focused regions of the genome

from next generation sequence data. Given the SPLMM method is flexible and allows for the

inclusion of covariates, this method would allow the investigation of which rare variants play

an important role in an individual’s pattern of growth.
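
The counting step of a burden test can be sketched in Python as follows. The minor allele frequency threshold of 0.01, the function name and the example genotypes are assumptions made for illustration; the resulting score would take the place of the single SNP as the genetic explanatory variable in the association model.

```python
def burden_scores(genotypes, minor_allele_freqs, maf_threshold=0.01):
    """Count rare minor alleles per individual within one gene.

    genotypes: one list per individual, holding the minor allele count
               (0, 1 or 2) at each variant in the gene.
    minor_allele_freqs: minor allele frequency of each variant, in the
                        same order as the genotype columns.
    """
    rare_columns = [
        j for j, freq in enumerate(minor_allele_freqs) if freq < maf_threshold
    ]
    return [sum(person[j] for j in rare_columns) for person in genotypes]


# Hypothetical gene with four variants typed in three individuals.
genotypes = [
    [0, 1, 2, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
minor_allele_freqs = [0.005, 0.2, 0.008, 0.001]
scores = burden_scores(genotypes, minor_allele_freqs)
```

Only one test is then conducted per gene rather than one per variant, which reduces both the multiple testing burden and the number of models that need to be fitted.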

7.4 Conclusion

The aim of this thesis was to develop a modelling framework that allowed the detection of

small genetic effects in analysis of complex longitudinal phenotypes, utilizing BMI trajectories

across childhood to develop the framework. Although the methods discussed were applied to

BMI trajectories throughout childhood and adolescence, the SPLMM framework and the use of

robust standard errors are flexible so that they can be translated to the genetic analysis of any

longitudinal phenotype. In addition, the research in this thesis describes a GWAS of childhood

BMI trajectories. One region on chromosome 21, in the KCNJ15 gene, was associated with

higher BMI and faster rate of growth throughout childhood in males and females. This gene

has not previously been shown to be associated with BMI or risk of obesity; however, it has been

shown to be associated with type 2 diabetes, increased levels of insulin and insulin resistance

and is therefore a biologically plausible gene for BMI growth.

Obesity cost Australian society $8.3 billion in 2008, including direct costs such as loss of

productivity, health system costs and carer costs, and indirect costs such as absenteeism and


taxation revenue forgone [332,333]. These costs include those for subsequent diseases caused

by being obese, such as type 2 diabetes or coronary heart disease. The discovery of genetic

variants associated with growth patterns throughout childhood has the potential of identifying

individuals very early in life who are at risk of becoming obese. Although interventions for

preventing obesity in children were initially shown to be ineffective [334], the most recent

Cochrane review showed that some interventions may have an impact on reducing BMI [335].

It has been shown that the most successful interventions start before age three, where a

successful intervention can have personal benefits, social benefits and government savings

[336]. The interventions that have been successful have impacted many aspects of the

children’s lives, including both the environment at home and school [335]. Although each of

the SNPs found to date will not have clinical utility on their own, they have pointed the way

towards new pathways of obesity development that could be used for pharmacological

manipulation and ultimately therapeutic benefit. Therefore, by incorporating genetic

information when developing an intervention programme, we may be able to offer more

targeted interventions for those high-risk individuals while they are still learning about the

importance of healthy lifestyle factors such as a good diet and physical activity [337]. By

reducing the incidence of obesity through interventions targeting those genetically at risk, the

impact of this disease and other related diseases would also be reduced in the community.


References

1. Barker DJ (1990) The fetal and infant origins of adult disease. BMJ 301: 1111.

2. Gluckman PD, Hanson MA (2006) Developmental Origins of Health and Disease: Springer US.

3. Ben-Shlomo Y, Kuh D (2002) A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives. Int J Epidemiol 31: 285-293.

4. Kuh D, Ben-Shlomo Y (2004) A life course approach to chronic disease epidemiology; tracing the origins of ill-health from early to adult life: Oxford: Oxford University Press.

5. Kuh D, Ben-Shlomo Y, Lynch J, Hallqvist J, Power C (2003) Life course epidemiology. J Epidemiol Community Health 57: 778-783.

6. Burton PR, Tobin MD, Hopper JL (2005) Key concepts in genetic epidemiology. Lancet 366: 941-951.

7. Kruglyak L, Nickerson DA (2001) Variation is the spice of life. Nat Genet 27: 234-236.

8. Palmer L, Smith GD, Burton PR (2011) An Introduction to Genetic Epidemiology: Policy Press.

9. Hardy GH (1908) Mendelian Proportions in a Mixed Population. Science 28: 49-50.

10. Weinberg W (1908) Über den Nachweis der Vererbung beim Menschen. Jahresh. Ver. Vaterl. Naturkd. Württemb 64: 369-382.

11. Mitchell AA, Cutler DJ, Chakravarti A (2003) Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am J Hum Genet 72: 598-610.

12. Lewontin RC (1964) The Interaction of Selection and Linkage. I. General Considerations; Heterotic Models. Genetics 49: 49-67.

13. Hill WG, Weir BS (1994) Maximum-likelihood estimation of gene location by linkage disequilibrium. Am J Hum Genet 54: 705-714.

14. Excoffier L, Slatkin M (1995) Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 12: 921-927.

15. Clark AG (1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7: 111-122.

16. Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, et al. (2001) Haplotype variation and linkage disequilibrium in 313 human genes. Science 293: 489-493.

17. Dawn Teare M, Barrett JH (2005) Genetic linkage studies. Lancet 366: 1036-1044.

18. Cordell HJ, Clayton DG (2005) Genetic association studies. Lancet 366: 1121-1131.

19. Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11: 31-46.

20. Palmer LJ, Cardon LR (2005) Shaking the tree: mapping complex disease genes with linkage disequilibrium. Lancet 366: 1223-1234.

21. Morton NE (1955) Sequential tests for the detection of linkage. Am J Hum Genet 7: 277-318.

22. Chotai J (1984) On the lod score method in linkage analysis. Ann Hum Genet 48: 359-378.

23. Lander E, Kruglyak L (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11: 241-247.

24. Blackwelder WC, Elston RC (1985) A comparison of sib-pair linkage tests for disease susceptibility loci. Genet Epidemiol 2: 85-97.

25. Kong A, Cox NJ (1997) Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet 61: 1179-1188.

26. Whittemore AS, Halpern J (1994) A class of tests for linkage using affected pedigree members. Biometrics 50: 118-127.

27. Haseman JK, Elston RC (1972) The investigation of linkage between a quantitative trait and a marker locus. Behav Genet 2: 3-19.

28. Elston RC, Buxbaum S, Jacobs KB, Olson JM (2000) Haseman and Elston revisited. Genet Epidemiol 19: 1-17.

29. Almasy L, Blangero J (1998) Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 62: 1198-1211.

30. George RA, Smith TD, Callaghan S, Hardman L, Pierides C, et al. (2008) General mutation databases: analysis and review. J Med Genet 45: 65-70.

31. Altmuller J, Palmer LJ, Fischer G, Scherb H, Wjst M (2001) Genomewide scans of complex human diseases: true linkage is hard to find. Am J Hum Genet 69: 936-950.

32. Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6: 95-108.

33. Mann CJ (2003) Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emerg Med J 20: 54-60.

34. Bochud M (2012) Genetics for clinicians: from candidate genes to whole genome scans (technological advances). Best Pract Res Clin Endocrinol Metab 26: 119-132.

35. Hattersley AT, McCarthy MI (2005) What makes a good genetic association study? Lancet 366: 1315-1323.

36. Silverman EK, Palmer LJ (2000) Case-control association studies for the genetics of complex respiratory diseases. Am J Respir Cell Mol Biol 22: 645-648.

37. Price AL, Zaitlen NA, Reich D, Patterson N (2010) New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11: 459-463.

38. Abecasis GR, Cardon LR, Cookson WO (2000) A general test of association for quantitative traits in nuclear families. Am J Hum Genet 66: 279-292.

39. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904-909.

40. Devlin B, Roeder K (1999) Genomic control for association studies. Biometrics 55: 997-1004.

41. Barrett JC, Cardon LR (2006) Evaluating coverage of genome-wide association studies. Nat Genet 38: 659-662.

42. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, et al. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9: 356-369.

43. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747-753.

44. Bush WS, Moore JH (2012) Chapter 11: Genome-wide association studies. PLoS Comput Biol 8: e1002822.

45. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, et al. (2005) Complement factor H polymorphism in age-related macular degeneration. Science 308: 385-389.

46. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447: 661-678.

47. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, et al. (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A 106: 9362-9367.

48. Hindorff LA, MacArthur J, Wise A, Junkins HA, Hall PN, et al. (2010) A Catalog of Published Genome-Wide Association Studies. National Human Genome Research Institute.

49. de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, et al. (2008) Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet 17: R122-128.

50. Maher B (2008) Personal genomes: The case of the missing heritability. Nature 456: 18-21.

51. Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS discovery. Am J Hum Genet 90: 7-24.

52. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39: 906-913.

53. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34: 816-834.

54. Scheet P, Stephens M (2006) A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78: 629-644.

55. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575.

56. Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084-1097.

57. Biernacka JM, Tang R, Li J, McDonnell SK, Rabe KG, et al. (2009) Assessment of genotype imputation methods. BMC Proc 3 Suppl 7: S5.

58. Pei YF, Li J, Zhang L, Papasian CJ, Deng HW (2008) Analyses and comparison of accuracy of different genotype imputation methods. PLoS One 3: e3551.

59. Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165: 2213-2233.

60. The International HapMap Consortium (2003) The International HapMap Project. Nature 426: 789-796.

61. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851-861.

62. Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52-58.

63. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073.

64. Li Y, Willer C, Sanna S, Abecasis G (2009) Genotype imputation. Annu Rev Genomics Hum Genet 10: 387-406.

65. Huang L, Li Y, Singleton AB, Hardy JA, Abecasis G, et al. (2009) Genotype-imputation accuracy across worldwide human populations. Am J Hum Genet 84: 235-250.

66. Jostins L, Morley KI, Barrett JC (2011) Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets. Eur J Hum Genet 19: 662-666.

67. Howie B, Marchini J, Stephens M (2011) Genotype imputation with thousands of genomes. G3 (Bethesda) 1: 457-470.

68. Guan Y, Stephens M (2008) Practical issues in imputation-based association mapping. PLoS Genet 4: e1000279.

69. Servin B, Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 3: e114.

70. Zeggini E, Ioannidis JP (2009) Meta-analysis in genome-wide association studies. Pharmacogenomics 10: 191-201.

71. Marchini J, Howie B (2010) Genotype imputation for genome-wide association studies. Nat Rev Genet 11: 499-511.

72. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42: 937-948.

73. Freathy RM, Mook-Kanamori DO, Sovio U, Prokopenko I, Timpson NJ, et al. (2010) Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight. Nat Genet 42: 430-435.

74. Barrett JC (2010) Genotype Imputation Enables Powerful Combined Analyses of Genome-Wide Association Studies. Illumina (http://www.illumina.com/Documents/products/appnotes/appnote_imputation.pdf).

75. Igl BW, Konig IR, Ziegler A (2009) What do we mean by 'replication' and 'validation' in genome-wide association studies? Hum Hered 67: 66-68.

76. Chanock SJ, Manolio T, Boehnke M, Boerwinkle E, Hunter DJ, et al. (2007) Replicating genotype-phenotype associations. Nature 447: 655-660.

77. Kraft P (2008) Curses--winner's and otherwise--in genetic epidemiology. Epidemiology 19: 649-651; discussion 657-648.

78. Almasy L, Amos CI, Bailey-Wilson JE, Cantor RM, Jaquish CE, et al. (2003) Proceedings of the Genetic Analysis Workshop 13: analysis of longitudinal family data for complex diseases and related risk factors. November 11-14, 2002. New Orleans, Louisiana, USA. BMC Genet 4 Suppl 1: S1-106.

79. Almasy L, Cupples LA, Daw EW, Levy D, Thomas D, et al. (2003) Proceedings of the Genetic Analysis Workshop 13: Summarizing analysis of longitudinal family data for complex diseases and related risk factors. New Orleans, Louisiana, USA. November 11-14, 2002. Genet Epidemiol 25 Suppl 1: S1-97.

80. MacCluer JW, Amos CI, Gregersen PK, Heard-Costa N, Lee M, et al. (2009) Genetic Analysis Workshop 16: introduction to workshop summaries. Genet Epidemiol 33 Suppl 1: S1-7.

81. Gauderman WJ, Macgregor S, Briollais L, Scurrah K, Tobin M, et al. (2003) Longitudinal data analysis in pedigree studies. Genetic Epidemiology 25: S18-S28.

82. Briollais L, Tzontcheva A, Bull S (2003) Multilevel modeling for the analysis of longitudinal blood pressure data in the Framingham Heart Study pedigrees. BMC Genet 4 Suppl 1: S19.

83. Gee C, Morrison JL, Thomas DC, Gauderman WJ (2003) Segregation and linkage analysis for longitudinal measurements of a quantitative trait. BMC Genet 4 Suppl 1: S21.

84. Palmer LJ, Scurrah KJ, Tobin M, Patel SR, Celedon JC, et al. (2003) Genome-wide linkage analysis of longitudinal phenotypes using sigma2A random effects (SSARs) fitted by Gibbs sampling. BMC Genet 4 Suppl 1: S12.

85. Kerner B, North KE, Fallin MD (2009) Use of longitudinal data in genetic studies in the genome-wide association studies era: summary of Group 14. Genet Epidemiol 33 Suppl 1: S93-98.

86. Chang SW, Choi SH, Li K, Fleur RS, Huang C, et al. (2009) Growth mixture modeling as an exploratory analysis tool in longitudinal quantitative trait loci analysis. BMC Proc 3 Suppl 7: S112.

87. Kerner B, Muthen BO (2009) Growth mixture modelling in families of the Framingham Heart Study. BMC Proc 3 Suppl 7: S114.

88. Luan J, Kerner B, Zhao JH, Loos RJ, Sharp SJ, et al. (2009) A multilevel linear mixed model of the association between candidate genes and weight and body mass index using the Framingham longitudinal family data. BMC Proc 3 Suppl 7: S115.

89. Roslin NM, Hamid JS, Paterson AD, Beyene J (2009) Genome-wide association analysis of cardiovascular-related quantitative traits in the Framingham Heart Study. BMC Proc 3 Suppl 7: S117.

90. Zhu W, Cho K, Chen X, Zhang M, Wang M, et al. (2009) A genome-wide association analysis of Framingham Heart Study longitudinal data using multivariate adaptive splines. BMC Proc 3 Suppl 7: S119.

91. Furlotte NA, Eskin E, Eyheramendy S (2012) Genome-wide association mapping with longitudinal data. Genet Epidemiol 36: 463-471.

92. Fan R, Zhang Y, Albert PS, Liu A, Wang Y, et al. (2012) Longitudinal Association Analysis of Quantitative Traits. Genet Epidemiol.

93. Haslam DW, James WP (2005) Obesity. Lancet 366: 1197-1209.

94. World Health Organization (2006) Obesity and Overweight Fact Sheet.

95. Australian Bureau of Statistics (2007-2008) National Health Survey: Summary of Results.

96. Griffiths LJ, Parsons TJ, Hill AJ (2010) Self-esteem and quality of life in obese children and adolescents: a systematic review. Int J Pediatr Obes 5: 282-304.

97. Tsiros MD, Olds T, Buckley JD, Grimshaw P, Brennan L, et al. (2009) Health-related quality of life in obese children and adolescents. Int J Obes (Lond) 33: 387-400.

98. Lawlor DA, Mamun AA, O'Callaghan MJ, Bor W, Williams GM, et al. (2005) Is being overweight associated with behavioural problems in childhood and adolescence? Findings from the Mater-University study of pregnancy and its outcomes. Arch Dis Child 90: 692-697.

99. Sawyer MG, Miller-Lewis L, Guy S, Wake M, Canterford L, et al. (2006) Is there a relationship between overweight and obesity and mental health problems in 4- to 5-year-old Australian children? Ambul Pediatr 6: 306-311.

100. Srinivasan SR, Myers L, Berenson GS (2006) Changes in metabolic syndrome variables since childhood in prehypertensive and hypertensive subjects: the Bogalusa Heart Study. Hypertension 48: 33-39.

101. Bradford NF (2009) Overweight and obesity in children and adolescents. Prim Care 36: 319-339.

102. Kindblom JM, Lorentzon M, Hellqvist A, Lonn L, Brandberg J, et al. (2009) BMI changes during childhood and adolescence as predictors of amount of adult subcutaneous and visceral adipose tissue in men: the GOOD Study. Diabetes 58: 867-874.

103. Serdula MK, Ivery D, Coates RJ, Freedman DS, Williamson DF, et al. (1993) Do obese children become obese adults? A review of the literature. Prev Med 22: 167-177.

104. Booth ML, Chey T, Wake M, Norton K, Hesketh K, et al. (2003) Change in the prevalence of overweight and obesity among young Australians, 1969-1997. Am J Clin Nutr 77: 29-36.

105. Olds TS, Tomkinson GR, Ferrar KE, Maher CA (2010) Trends in the prevalence of childhood overweight and obesity in Australia between 1985 and 2008. Int J Obes (Lond) 34: 57-66.

106. Gerritsen S, Stefanogiannis N, Galloway Y, Devlin M, Templeton R, et al. (2008) A Portrait of Health - Key Results of the 2006/07 New Zealand Health Survey. In: Ministry of Health: Wellington NZ, editor.

107. Ogden CL, Carroll MD, Flegal KM (2008) High body mass index for age among US children and adolescents, 2003-2006. JAMA 299: 2401-2405.

108. Sjoberg A, Lissner L, Albertsson-Wikland K, Marild S (2008) Recent anthropometric trends among Swedish school children: evidence for decreasing prevalence of overweight in girls. Acta Paediatr 97: 118-123.

109. Peneau S, Salanave B, Maillard-Teyssier L, Rolland-Cachera MF, Vergnaud AC, et al. (2009) Prevalence of overweight in 6- to 15-year-old children in central/western France from 1996 to 2006: trends toward stabilization. Int J Obes (Lond) 33: 401-407.

110. Farooqi IS, O'Rahilly S (2005) Monogenic obesity in humans. Annu Rev Med 56: 443-458.

111. Delrue MA, Michaud JL (2004) Fat chance: genetic syndromes with obesity. Clin Genet 66: 83-93.

112. Hall DM, Cole TJ (2006) What use is the BMI? Arch Dis Child 91: 283-286.

113. Cole TJ, Bellizzi MC, Flegal KM, Dietz WH (2000) Establishing a standard definition for child overweight and obesity worldwide: international survey. BMJ 320: 1240-1243.

114. WHO (2000) Obesity: preventing and managing the global epidemic. Report of a WHO Consultation. WHO Technical Report Series 894. Geneva: World Health Organization, 2000.

115. Borghi E, de Onis M, Garza C, Van den Broeck J, Frongillo EA, et al. (2006) Construction of the World Health Organization child growth standards: selection of methods for attained growth curves. Stat Med 25: 247-265.

116. Field CJ (2009) Early risk determinants and later health outcomes: implications for research prioritization and the food supply. Summary of the workshop. Am J Clin Nutr 89: 1533S-1539S.

117. Newnham JP, Pennell CE, Lye SJ, Rampono J, Challis JR (2009) Early life origins of obesity. Obstet Gynecol Clin North Am 36: 227-244, xii.

118. Dietz WH (1994) Critical periods in childhood for the development of obesity. Am J Clin Nutr 59: 955-959.

119. Adair LS (2008) Child and adolescent obesity: epidemiology and developmental perspectives. Physiol Behav 94: 8-16.

120. Monteiro PO, Victora CG (2005) Rapid growth in infancy and childhood and obesity in later life--a systematic review. Obes Rev 6: 143-154.

121. Baird J, Fisher D, Lucas P, Kleijnen J, Roberts H, et al. (2005) Being big or growing fast: systematic review of size and growth in infancy and later obesity. BMJ 331: 929.

122. Mook-Kanamori DO, Durmus B, Sovio U, Hofman A, Raat H, et al. (2011) Fetal and infant growth and the risk of obesity during early childhood: the Generation R Study. Eur J Endocrinol 165: 623-630.

123. Owen CG, Martin RM, Whincup PH, Davey-Smith G, Gillman MW, et al. (2005) The effect of breastfeeding on mean body mass index throughout life: a quantitative review of published and unpublished observational evidence. Am J Clin Nutr 82: 1298-1307.

124. Horta BL, Bahl R, Martines JC, Victora CG (2007) Evidence on the long-term effects of breastfeeding: systematic review and meta-analysis. Geneva, Switzerland: World Health Organization.

125. Briefel R, Ziegler P, Novak T, Ponza M (2006) Feeding Infants and Toddlers Study: characteristics and usual nutrient intake of Hispanic and non-Hispanic infants and toddlers. J Am Diet Assoc 106: S84-95.

126. Mihrshahi S, Battistutta D, Magarey A, Daniels LA (2011) Determinants of rapid weight gain during infancy: baseline results from the NOURISH randomised controlled trial. BMC Pediatr 11: 99.

127. Zimmerman FJ, Christakis DA, Meltzoff AN (2007) Television and DVD/video viewing in children younger than 2 years. Arch Pediatr Adolesc Med 161: 473-479.

128. Silverwood RJ, De Stavola BL, Cole TJ, Leon DA (2009) BMI peak in infancy as a predictor for later BMI in the Uppsala Family Study. Int J Obes (Lond) 33: 929-937.

129. Sovio U, Timpson NJ, Warrington NM, Briollais L, Mook-Kanamori D, et al. (2009) Association Between FTO Polymorphism, Adiposity Peak and Adiposity Rebound in The Northern Finland Birth Cohort 1966. Atherosclerosis 207: e4-e5.

130. He Q, Karlberg J (2002) Probability of adult overweight and risk change during the BMI rebound period. Obes Res 10: 135-140.

131. Rolland-Cachera MF, Deheeger M, Bellisle F, Sempe M, Guilloud-Bataille M, et al. (1984) Adiposity rebound in children: a simple indicator for predicting obesity. Am J Clin Nutr 39: 129-135.

132. Rolland-Cachera MF, Deheeger M, Maillot M, Bellisle F (2006) Early adiposity rebound: causes and consequences for obesity in children and adults. Int J Obes (Lond) 30 Suppl 4: S11-17.

133. Whitaker RC, Pepe MS, Wright JA, Seidel KD, Dietz WH (1998) Early adiposity rebound and the risk of adult obesity. Pediatrics 101: E5.

134. Bhargava SK, Sachdev HS, Fall CH, Osmond C, Lakshmy R, et al. (2004) Relation of serial changes in childhood body-mass index to impaired glucose tolerance in young adulthood. N Engl J Med 350: 865-875.

135. Eriksson JG, Forsen T, Tuomilehto J, Osmond C, Barker DJ (2003) Early adiposity rebound in childhood and risk of Type 2 diabetes in adult life. Diabetologia 46: 190-194.

136. Taylor RW, Grant AM, Goulding A, Williams SM (2005) Early adiposity rebound: review of papers linking this to subsequent obesity in children and adults. Curr Opin Clin Nutr Metab Care 8: 607-612.

137. Rolland-Cachera MF, Peneau S (2013) Growth trajectories associated with adult obesity. World Rev Nutr Diet 106: 127-134.

138. Freedman DS, Kettel Khan L, Serdula MK, Srinivasan SR, Berenson GS (2001) BMI rebound, childhood height and obesity among adults: the Bogalusa Heart Study. Int J Obes Relat Metab Disord 25: 543-549.

139. Taylor RW, Goulding A, Lewis-Barned NJ, Williams SM (2004) Rate of fat gain is faster in girls undergoing early adiposity rebound. Obes Res 12: 1228-1230.

140. Williams SM, Goulding A (2009) Patterns of growth associated with the timing of adiposity rebound. Obesity (Silver Spring) 17: 335-341.

141. Dorosty AR, Emmett PM, Cowin S, Reilly JJ (2000) Factors associated with early adiposity rebound. ALSPAC Study Team. Pediatrics 105: 1115-1118.

142. Janz KF, Levy SM, Burns TL, Torner JC, Willing MC, et al. (2002) Fatness, physical activity, and television viewing in children during the adiposity rebound period: the Iowa Bone Development Study. Prev Med 35: 563-571.

143. Deheeger M, Rolland-Cachera MF (2004) [Longitudinal study of anthropometric measurements in Parisian children aged ten months to 18 years]. Arch Pediatr 11: 1139-1144.

144. Cole TJ (2004) Children grow and horses race: is the adiposity rebound a critical period for later obesity? BMC Pediatr 4: 6.

145. Peto R (1981) The horse-racing effect. Lancet 2: 467-468.

146. Ong KK, Northstone K, Wells JC, Rubin C, Ness AR, et al. (2007) Earlier mother's age at menarche predicts rapid infancy growth and childhood obesity. PLoS Med 4: e132.

147. van Lenthe FJ, Kemper CG, van Mechelen W (1996) Rapid maturation in adolescence results in greater obesity in adulthood: the Amsterdam Growth and Health Study. Am J Clin Nutr 64: 18-24.

148. Garn SM, LaVelle M, Rosenberg KR, Hawthorne VM (1986) Maturational timing as a factor in female fatness and obesity. Am J Clin Nutr 43: 879-883.

149. Freedman DS, Khan LK, Serdula MK, Dietz WH, Srinivasan SR, et al. (2003) The relation of menarcheal age to obesity in childhood and adulthood: the Bogalusa heart study. BMC Pediatr 3: 3.

150. Must A, Naumova EN, Phillips SM, Blum M, Dawson-Hughes B, et al. (2005) Childhood overweight and maturational timing in the development of adult overweight and fatness: the Newton Girls Study and its follow-up. Pediatrics 116: 620-627.

151. dos Santos Silva I, De Stavola BL, Mann V, Kuh D, Hardy R, et al. (2002) Prenatal factors, childhood growth trajectories and age at menarche. Int J Epidemiol 31: 405-412.

152. Mumby HS, Elks CE, Li S, Sharp SJ, Khaw KT, et al. (2011) Mendelian Randomisation Study of Childhood BMI and Early Menarche. J Obes 2011: 180729.

153. Tanner JM, Whitehouse RH, Marubini E, Resele LF (1976) The adolescent growth spurt of boys and girls of the Harpenden growth study. Ann Hum Biol 3: 109-126.

154. Zacharias L, Rand WM (1983) Adolescent growth in height and its relation to menarche in contemporary American girls. Ann Hum Biol 10: 209-222.

155. Tanner JM, Davies PS (1985) Clinical longitudinal standards for height and height velocity for North American children. J Pediatr 107: 317-329.

156. Elks CE, Perry JR, Sulem P, Chasman DI, Franceschini N, et al. (2010) Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet 42: 1077-1085.

157. Cousminer DL, Berry DJ, Timpson NJ, Ang W, Thiering E, et al. (2013) Genome-wide association and longitudinal analyses reveal genetic loci linking pubertal height growth, pubertal timing and childhood adiposity. Hum Mol Genet.

158. Jasik CB, Lustig RH (2008) Adolescent obesity and puberty: the "perfect storm". Ann N Y Acad Sci 1135: 265-279.

159. Crespo CJ, Smit E, Troiano RP, Bartlett SJ, Macera CA, et al. (2001) Television watching, energy intake, and obesity in US children: results from the third National Health and Nutrition Examination Survey, 1988-1994. Arch Pediatr Adolesc Med 155: 360-365.

160. Berkey CS, Rockett HR, Field AE, Gillman MW, Frazier AL, et al. (2000) Activity, dietary intake, and weight changes in a longitudinal study of preadolescent and adolescent boys and girls. Pediatrics 105: E56.

161. Berkey CS, Rockett HR, Gillman MW, Field AE, Colditz GA (2003) Longitudinal study of skipping breakfast and weight change in adolescents. Int J Obes Relat Metab Disord 27: 1258-1266.

162. Eaton DK, Kann L, Kinchen S, Ross J, Hawkins J, et al. (2006) Youth risk behavior surveillance--United States, 2005. MMWR Surveill Summ 55: 1-108.

163. Richardson LP, Garrison MM, Drangsholt M, Mancl L, LeResche L (2006) Associations between depressive symptoms and obesity during puberty. Gen Hosp Psychiatry 28: 313-320.

164. Goodman E, Whitaker RC (2002) A prospective study of the role of depression in the development and persistence of adolescent obesity. Pediatrics 110: 497-504.

165. Maes HH, Neale MC, Eaves LJ (1997) Genetic and environmental factors in relative body weight and human adiposity. Behav Genet 27: 325-351.

166. Haworth CM, Carnell S, Meaburn EL, Davis OS, Plomin R, et al. (2008) Increasing heritability of BMI and stronger associations with the FTO gene over childhood. Obesity (Silver Spring) 16: 2663-2668.

167. Wardle J, Carnell S, Haworth CM, Plomin R (2008) Evidence for a strong genetic influence on childhood adiposity despite the force of the obesogenic environment. Am J Clin Nutr 87: 398-404.

168. Parsons TJ, Power C, Logan S, Summerbell CD (1999) Childhood predictors of adult obesity: a systematic review. Int J Obes Relat Metab Disord 23 Suppl 8: S1-107.

169. Elks CE, den Hoed M, Zhao JH, Sharp SJ, Wareham NJ, et al. (2012) Variability in the heritability of body mass index: a systematic review and meta-regression. Front Endocrinol (Lausanne) 3: 29.

170. Farooqi IS (2005) Genetic and hereditary aspects of childhood obesity. Best Pract Res Clin Endocrinol Metab 19: 359-374.

171. Rankinen T, Zuberi A, Chagnon YC, Weisnagel SJ, Argyropoulos G, et al. (2006) The human obesity gene map: the 2005 update. Obesity (Silver Spring) 14: 529-644.

172. Ichihara S, Yamada Y (2008) Genetic factors for human obesity. Cell Mol Life Sci 65: 1086-1098.

173. Dina C, Meyre D, Gallina S, Durand E, Korner A, et al. (2007) Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet 39: 724-726.

174. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889-894.

175. Loos RJ, Lindgren CM, Li S, Wheeler E, Zhao JH, et al. (2008) Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet 40: 768-775.

176. Scuteri A, Sanna S, Chen WM, Uda M, Albai G, et al. (2007) Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet 3: e115.

177. Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, et al. (2008) Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nat Genet 40: 716-718.

178. Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, et al. (2009) Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41: 25-34.

179. Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, et al. (2009) Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet 41: 18-24.

180. Liu JZ, Medland SE, Wright MJ, Henders AK, Heath AC, et al. (2010) Genome-wide association study of height and body mass index in Australian twin families. Twin Res Hum Genet 13: 179-193.

181. Wang KS, Liu X, Zheng S, Zeng M, Pan Y, et al. (2012) A novel locus for body mass index on 5p15.2: a meta-analysis of two genome-wide association studies. Gene 500: 80-84.

182. Fox CS, Heard-Costa N, Cupples LA, Dupuis J, Vasan RS, et al. (2007) Genome-wide association to body mass index and waist circumference: the Framingham Heart Study 100K project. BMC Med Genet 8 Suppl 1: S18.

183. Bradfield JP, Taal HR, Timpson NJ, Scherag A, Lecoeur C, et al. (2012) A genome-wide association meta-analysis identifies new childhood obesity loci. Nat Genet 44: 526-531.

184. Newnham JP, Evans SF, Michael CA, Stanley FJ, Landau LI (1993) Effects of frequent ultrasound during pregnancy: a randomised controlled trial. Lancet 342: 887-891.

185. Williams LA, Evans SF, Newnham JP (1997) Prospective cohort study of factors influencing the relative weights of the placenta and the newborn infant. BMJ 314: 1864-1868.

186. Evans S, Newnham J, MacDonald W, Hall C (1996) Characterisation of the possible effect on birthweight following frequent prenatal ultrasound examinations. Early Hum Dev 45: 203-214.

187. Huang RC, Burke V, Newnham JP, Stanley FJ, Kendall GE, et al. (2007) Perinatal and childhood origins of cardiovascular disease. Int J Obes (Lond) 31: 236-244.

188. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, et al. (2013) Cohort Profile: the 'children of the 90s'--the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol 42: 111-127.

189. Howe LD, Tilling K, Benfield L, Logue J, Sattar N, et al. (2010) Changes in ponderal index and body mass index across childhood and their associations with fat mass and cardiovascular risk factors at age 15. PLoS One 5: e15186.

190. Howe LD, Tilling K, Lawlor DA (2009) Accuracy of height and weight data from child health records. Arch Dis Child 94: 950-954.

191. Dubois L, Girard M (2007) Accuracy of maternal reports of pre-schoolers' weights and heights as estimates of BMI values. Int J Epidemiol 36: 132-138.

192. Paternoster L, Zhurov AI, Toma AM, Kemp JP, St Pourcain B, et al. (2012) Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am J Hum Genet 90: 478-485.

193. Taal HR, St Pourcain B, Thiering E, Das S, Mook-Kanamori DO, et al. (2012) Common variants at 12q15 and 12q24 are associated with infant head circumference. Nat Genet 44: 532-538.

194. Rantakallio P (1988) The longitudinal study of the northern Finland birth cohort of 1966. Paediatr Perinat Epidemiol 2: 59-88.

195. Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5: e1000529.

196. Sabatti C, Service SK, Hartikainen AL, Pouta A, Ripatti S, et al. (2009) Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 41: 35-46.

197. Warrington NM, Wu YY, Pennell CE, Marsh JA, Beilin LJ, et al. (2013) Modelling BMI Trajectories in Children for Genetic Association Studies. PLoS One 8: e53897.

198. Jiao H, Arner P, Hoffstedt J, Brodin D, Dubern B, et al. (2011) Genome wide association study identifies KCNMA1 contributing to human obesity. BMC Med Genomics 4: 51.

199. Wang K, Li WD, Zhang CK, Wang Z, Glessner JT, et al. (2011) A genome-wide association study on obesity and obesity-related traits. PLoS One 6: e18939.

200. Meyre D, Delplanque J, Chevre JC, Lecoeur C, Lobbens S, et al. (2009) Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in European populations. Nat Genet 41: 157-159.

201. Paternoster L, Evans DM, Aagaard Nohr E, Holst C, Gaborieau V, et al. (2011) Genome-Wide Population-Based Association Study of Extremely Overweight Young Adults - The GOYA Study. PLoS One 6: e24303.

202. Cotsapas C, Speliotes EK, Hatoum IJ, Greenawalt DM, Dobrin R, et al. (2009) Common body mass index-associated variants confer risk of extreme obesity. Hum Mol Genet 18: 3502-3507.

203. Zhao J, Bradfield JP, Li M, Wang K, Zhang H, et al. (2009) The role of obesity-associated loci identified in genome-wide association studies in the determination of pediatric BMI. Obesity (Silver Spring) 17: 2254-2257.

204. den Hoed M, Ekelund U, Brage S, Grontved A, Zhao JH, et al. (2010) Genetic susceptibility to obesity and related traits in childhood and adolescence: influence of loci identified by genome-wide association studies. Diabetes 59: 2980-2988.

205. Hardy R, Wills AK, Wong A, Elks CE, Wareham NJ, et al. (2010) Life course variations in the associations between FTO and MC4R gene variants and body size. Hum Mol Genet 19: 545-552.

206. Elks CE, Loos RJ, Sharp SJ, Langenberg C, Ring SM, et al. (2010) Genetic markers of adult obesity risk are associated with greater early infancy weight gain and growth. PLoS Med 7: e1000284.

207. Heard-Costa NL, Zillikens MC, Monda KL, Johansson A, Harris TB, et al. (2009) NRXN3 is a novel locus for waist circumference: a genome-wide association study from the CHARGE Consortium. PLoS Genet 5: e1000539.

208. Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, et al. (2009) Genome-wide association scan meta-analysis identifies three Loci influencing adiposity and fat distribution. PLoS Genet 5: e1000508.

209. Belsky DW, Moffitt TE, Houts R, Bennett GG, Biddle AK, et al. (2012) Polygenic risk, rapid childhood growth, and the development of obesity: evidence from a 4-decade longitudinal study. Arch Pediatr Adolesc Med 166: 515-521.

210. Preece MA, Baines MJ (1978) A new family of mathematical models describing the human growth curve. Ann Hum Biol 5: 1-24.

211. Gasser T, Kohler W, Muller HG, Kneip A, Largo R, et al. (1984) Velocity and acceleration of height growth using kernel estimation. Ann Hum Biol 11: 397-411.

212. Stutzle W, Gasser T, Molinari L, Largo RH, Prader A, et al. (1980) Shape-invariant modelling of human growth. Ann Hum Biol 7: 507-528.

213. Largo RH, Gasser T, Prader A, Stuetzle W, Huber PJ (1978) Analysis of the adolescent growth spurt using smoothing spline functions. Ann Hum Biol 5: 421-434.

214. Berkey CS, Reed RB, Valadian I (1983) Midgrowth spurt in height of Boston children. Ann Hum Biol 10: 25-30.

215. Laird NM, Ware JH (1982) Random-effects models for longitudinal data. Biometrics 38: 963-974.

216. Milani S, Bossi A, Marubini E (1989) Individual growth curves and longitudinal growth charts between 0 and 3 years. Acta Paediatr Scand Suppl 350: 95-104.

217. Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed model. Statistica Sinica 20: 303-322.

218. Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61: 579-602.

219. Song PXK, Zhang PQA (2007) Maximum likelihood inference in robust linear mixed-effect models using multivariate t distributions. Statistica Sinica 17: 929-943.

220. Cole TJ, Donaldson MD, Ben-Shlomo Y (2010) SITAR--a useful instrument for growth curve analysis. Int J Epidemiol 39: 1558-1566.

221. Efron B, Tibshirani RJ (1994) An Introduction to the Bootstrap: Taylor & Francis.

222. Ihaka R, Gentleman R (1996) R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 5: 299-314.

223. Littell RC, Milliken GA, Stroup WW, Wolfinger RD, Schabenberger O (2006) SAS for Mixed Models: SAS Institute.

224. Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423-447.

225. Cheng J, Edwards LJ, Maldonado-Molina MM, Komro KA, Muller KE (2010) Real longitudinal data analysis for real people: building a good enough mixed model. Stat Med 29: 504-520.

226. Pinheiro JC, Liu C, Wu YN (2001) Efficient Algorithms for Robust Estimation in Linear Mixed-Effects Models Using the Multivariate t-Distribution. Journal of Computational and Graphical Statistics 10: 249-276.

227. Arellano-Valle RB, Bolfarine H, Lachos VH (2005) Skew-normal Linear Mixed Models. Journal of Data Science 3: 415-438.

228. Lin TI, Lee JC (2008) Estimation and prediction in linear mixed models with skew-normal random effects for longitudinal data. Statistics in Medicine 27: 1490-1507.

229. Lachos VH, Bolfarine H, Arellano-Valle RB, Montenegro LC (2007) Likelihood based Inference for Multivariate Skew-Normal Regression Models. Communications in Statistics - Theory and Methods 36: 1769-1786.

230. Azzalini A, Dalla-Valle A (1996) The multivariate skew-normal distribution. Biometrika 83: 715-726.

231. Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW, et al. (2006) Predictive testing for complex diseases using multiple genes: fact or fiction? Genet Med 8: 395-400.

232. Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, et al. (2007) The impact of genotype frequencies on the clinical validity of genomic profiling for predicting common chronic diseases. Genet Med 9: 528-535.

233. Xu R (2003) Measuring explained variation in linear mixed effects models. Stat Med 22: 3527-3541.

234. Brookfield JF (2013) Quantitative genetics: heritability is not always missing. Curr Biol 23: R276-278.

235. Llewellyn CH, Trzaskowski M, Plomin R, Wardle J (2013) Finding the missing heritability in pediatric obesity: the contribution of genome-wide complex trait analysis. Int J Obes (Lond).

236. Chaufan C, Joseph J (2013) The 'missing heritability' of common disorders: should health researchers care? Int J Health Serv 43: 281-303.

237. Morris AP, Zeggini E (2010) An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol 34: 188-193.

238. Li B, Leal SM (2008) Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83: 311-321.

239. Madsen BE, Browning SR (2009) A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5: e1000384.

240. Liu DJ, Leal SM (2010) A novel adaptive method for the analysis of next-generation sequencing data to detect complex trait associations with rare variants due to gene main effects and interactions. PLoS Genet 6: e1001156.

241. Bhatia G, Bansal V, Harismendy O, Schork NJ, Topol EJ, et al. (2010) A covering method for detecting genetic associations between rare variants and common phenotypes. PLoS Comput Biol 6: e1000954.

242. Neale BM, Rivas MA, Voight BF, Altshuler D, Devlin B, et al. (2011) Testing for an unusual distribution of rare variants. PLoS Genet 7: e1001322.

243. Wu MC, Lee S, Cai T, Li Y, Boehnke M, et al. (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89: 82-93.

244. Lee PH, Shatkay H (2009) An integrative scoring system for ranking SNPs by their potential deleterious effects. Bioinformatics 25: 1048-1055.

245. Sikorska K, Rivadeneira F, Groenen PJ, Hofman A, Uitterlinden AG, et al. (2013) Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 32: 165-180.

246. Smith EN, Chen W, Kahonen M, Kettunen J, Lehtimaki T, et al. (2010) Longitudinal genome-wide association of cardiovascular disease risk factors in the Bogalusa heart study. PLoS Genet 6.

247. Benke KS, Wu Y, Fallin DM, Maher B, Palmer LJ (2013) Strategy to control type I error increases power to identify genetic variation using the full biological trajectory. Genet Epidemiol 37: 419-430.

248. Park YM, Province MA, Gao X, Feitosa M, Wu J, et al. (2009) Longitudinal trends in the association of metabolic syndrome with 550 k single-nucleotide polymorphisms in the Framingham Heart Study. BMC Proc 3 Suppl 7: S116.

249. Fradin DD, Fallin MD (2009) Influence of control selection in genome-wide association studies: the example of diabetes in the Framingham Heart Study. BMC Proc 3 Suppl 7: S113.

250. Verbeke G, Spiessens B, Lesaffre E (2001) Conditional Linear Mixed Models. The American Statistician 55: 25-34.

251. Sovio U, Mook-Kanamori DO, Warrington NM, Lawrence R, Briollais L, et al. (2011) Association between common variation at the FTO locus and changes in body mass index from infancy to late childhood: the complex nature of genetic association through growth and development. PLoS Genet 7: e1001307.

252. Pinheiro J, Bates D (2000) Mixed Effects Models in S and S-Plus: Springer.

253. Fitzmaurice GM, Laird NM, Ware JH (2004) Applied Longitudinal Analysis: Wiley.

254. Zhang D, Davidian M (2001) Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 57: 795-802.

255. Verbeke G, Lesaffre E (1997) The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data. Computational Statistics & Data Analysis 23: 541-556.

256. Jacqmin-Gadda H, Sibillot S, Proust C, Molina JM, Thiébaut R (2007) Robustness of the linear mixed model to misspecified error distribution. Computational Statistics & Data Analysis 51: 5142-5154.

257. Taylor JMG, Cumberland WG, Sy JP (1994) A Stochastic Model for Analysis of Longitudinal AIDS Data. Journal of the American Statistical Association 89: 727-736.

258. Taylor JM, Law N (1998) Does the covariance structure matter in longitudinal modelling for the prediction of future CD4 counts? Stat Med 17: 2381-2394.

259. Liang KY, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73: 13-22.

260. Royall RM (1986) Model Robust Confidence Intervals Using Maximum Likelihood Estimators. International Statistical Review / Revue Internationale de Statistique 54: 221-226.

261. Golding J, Pembrey M, Jones R (2001) ALSPAC--the Avon Longitudinal Study of Parents and Children. I. Study methodology. Paediatr Perinat Epidemiol 15: 74-87.

262. Koehler E, Brown E, Haneuse SJ (2009) On the Assessment of Monte Carlo Error in Simulation-Based Statistical Analyses. Am Stat 63: 155-162.

263. White I (2010) simsum: Analysis of simulation studies including Monte Carlo error. The Stata Journal 10: 369-385.

264. McDonald L (1975) Tests for the General Linear Hypothesis Under the Multiple Design Multivariate Linear Model. The Annals of Statistics 3: 461-466.

265. Duggal P, Gillanders EM, Holmes TN, Bailey-Wilson JE (2008) Establishing an adjusted p-value threshold to control the family-wide type 1 error in genome wide association studies. BMC Genomics 9: 516.

266. Aulchenko YS, Ripke S, Isaacs A, van Duijn CM (2007) GenABEL: an R library for genome-wide association analysis. Bioinformatics 23: 1294-1296.

267. Zeger SL, Liang KY, Albert PS (1988) Models for longitudinal data: a generalized estimating equation approach. Biometrics 44: 1049-1060.

268. Zeger SL, Liang KY (1986) Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42: 121-130.

269. Gurka MJ, Edwards LJ, Muller KE (2011) Avoiding bias in mixed model inference for fixed effects. Stat Med 30: 2696-2707.

270. Verbeke G, Molenberghs G (2000) Linear mixed models for longitudinal data: Springer Series in Statistics, Springer-Verlag, New York. 568 p.

271. Davidian M, Giltinan DM (1995) Nonlinear models for repeated measurement data. London: Chapman & Hall.

272. Rasbash J, Steele F, Browne WJ, Goldstein H (2012) A User’s Guide to MLwiN, v2.26. Centre for Multilevel Modelling, University of Bristol.

273. Dudbridge F, Gusnanto A (2008) Estimation of significance thresholds for genomewide association scans. Genet Epidemiol 32: 227-234.

274. Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273: 1516-1517.

275. Dadd T, Weale ME, Lewis CM (2009) A critical evaluation of genomic control methods for genetic association studies. Genet Epidemiol 33: 290-298.

276. Eriksson JG, Forsen TJ, Osmond C, Barker DJ (2003) Pathways of infant and childhood growth that lead to type 2 diabetes. Diabetes Care 26: 3006-3010.

277. Yliharsila H, Eriksson JG, Forsen T, Laakso M, Uusitupa M, et al. (2004) Interactions between peroxisome proliferator-activated receptor-gamma 2 gene polymorphisms and size at birth on blood pressure and the use of antihypertensive medication. J Hypertens 22: 1283-1287.

278. Pihlajamaki J, Vanhala M, Vanhala P, Laakso M (2004) The Pro12Ala polymorphism of the PPAR gamma 2 gene regulates weight from birth to adulthood. Obes Res 12: 187-190.

279. Eriksson JG, Lindi V, Uusitupa M, Forsen TJ, Laakso M, et al. (2002) The effects of the Pro12Ala polymorphism of the peroxisome proliferator-activated receptor-gamma2 gene on insulin sensitivity and insulin metabolism interact with size at birth. Diabetes 51: 2321-2324.

280. Meigs JB, Shrader P, Sullivan LM, McAteer JB, Fox CS, et al. (2008) Genotype score in addition to common risk factors for prediction of type 2 diabetes. N Engl J Med 359: 2208-2219.

281. Oue N, Aung PP, Mitani Y, Kuniyasu H, Nakayama H, et al. (2005) Genes involved in invasion and metastasis of gastric cancer identified by array-based hybridization and serial analysis of gene expression. Oncology 69 Suppl 1: 17-22.

282. Day FR, Loos RJ (2011) Developments in obesity genetics in the era of genome-wide association studies. J Nutrigenet Nutrigenomics 4: 222-238.

283. Randall JC, Winkler TW, Kutalik Z, Berndt SI, Jackson AU, et al. (2013) Sex-stratified Genome-wide Association Studies Including 270,000 Individuals Show Sexual Dimorphism in Genetic Loci for Anthropometric Traits. PLoS Genet 9: e1003500.

284. Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26: 2190-2191.

285. Stouffer S, DeVinney L, Suchman E (1949) The American soldier: Adjustment during army life. Princeton University Press Princeton, US Volume 1.

286. Wang K, Li M, Hakonarson H (2010) Analysing biological pathways in genome-wide association studies. Nat Rev Genet 11: 843-854.

287. Segre AV, Groop L, Mootha VK, Daly MJ, Altshuler D (2010) Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet 6.

288. Zhang K, Cui S, Chang S, Zhang L, Wang J (2010) i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Res 38: W90-95.

289. Nam D, Kim J, Kim SY, Kim S (2010) GSA-SNP: a general approach for gene set analysis of polymorphisms. Nucleic Acids Res 38: W749-754.

290. Kwon JS, Kim J, Nam D, Kim S (2012) Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS. Genomics Inform 10: 123-127.

291. Okamoto K, Iwasaki N, Nishimura C, Doi K, Noiri E, et al. (2010) Identification of KCNJ15 as a susceptibility gene in Asian patients with type 2 diabetes mellitus. Am J Hum Genet 86: 54-64.

292. Iwasaki N, Cox NJ, Wang YQ, Schwarz PE, Bell GI, et al. (2003) Mapping genes influencing type 2 diabetes risk and BMI in Japanese subjects. Diabetes 52: 209-213.

293. Okamoto K, Iwasaki N, Doi K, Noiri E, Iwamoto Y, et al. (2012) Inhibition of glucose-stimulated insulin secretion by KCNJ15, a newly identified susceptibility gene for type 2 diabetes. Diabetes 61: 1734-1741.

294. Horikoshi M, Yaghootkar H, Mook-Kanamori DO, Sovio U, Taal HR, et al. (2013) New loci associated with birth weight identify genetic links between intrauterine growth and adult height and metabolism. Nat Genet 45: 76-82.

295. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, et al. (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467: 832-838.

296. Dupuis J, Langenberg C, Prokopenko I, Saxena R, Soranzo N, et al. (2010) New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk. Nat Genet 42: 105-116.

297. Morris AP, Voight BF, Teslovich TM, Ferreira T, Segre AV, et al. (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44: 981-990.

298. Xu Z, Taylor JA (2009) SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res 37: W600-605.

299. Kasowski M, Grubert F, Heffelfinger C, Hariharan M, Asabere A, et al. (2010) Variation in transcription factor binding among humans. Science 328: 232-235.

300. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M (2012) Linking disease associations with regulatory information in the human genome. Genome Res 22: 1748-1759.

301. Andersson EA, Pilgaard K, Pisinger C, Harder MN, Grarup N, et al. (2010) Do gene variants influencing adult adiposity affect birth weight? A population-based study of 24 loci in 4,744 Danish individuals. PLoS One 5: e14190.

302. Kilpelainen TO, den Hoed M, Ong KK, Grontved A, Brage S, et al. (2011) Obesity-susceptibility loci have a limited influence on birth weight: a meta-analysis of up to 28,219 individuals. Am J Clin Nutr 93: 851-860.

303. Barker DJ, Osmond C, Forsen TJ, Kajantie E, Eriksson JG (2005) Trajectories of growth among children who have coronary events as adults. N Engl J Med 353: 1802-1809.

304. Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, et al. (2008) Genome-wide association analysis identifies 20 loci that influence adult height. Nat Genet 40: 575-583.

305. Kim JJ, Lee HI, Park T, Kim K, Lee JE, et al. (2010) Identification of 15 loci influencing height in a Korean population. J Hum Genet 55: 27-31.

306. Yashin AI, Wu D, Arbeev KG, Ukraintseva SV (2010) Joint influence of small-effect genetic variants on human longevity. Aging (Albany NY) 2: 612-620.

307. Zhang Y, Proenca R, Maffei M, Barone M, Leopold L, et al. (1994) Positional cloning of the mouse obese gene and its human homologue. Nature 372: 425-432.


308. Lustig RH (2006) Childhood obesity: behavioral aberration or biochemical drive? Reinterpreting the First Law of Thermodynamics. Nat Clin Pract Endocrinol Metab 2: 447-458.

309. O'Rahilly S (2009) Human genetics illuminates the paths to metabolic disease. Nature 462: 307-314.

310. Hofker M, Wijmenga C (2009) A supersized list of obesity genes. Nat Genet 41: 139-140.

311. Gosset P, Ghezala GA, Korn B, Yaspo ML, Poutska A, et al. (1997) A new inward rectifier potassium channel gene (KCNJ15) localized on chromosome 21 in the Down syndrome chromosome region 1 (DCR1). Genomics 44: 237-241.

312. Epstein CJ, Korenberg JR, Anneren G, Antonarakis SE, Ayme S, et al. (1991) Protocols to establish genotype-phenotype correlations in Down syndrome. Am J Hum Genet 49: 207-235.

313. Delabar JM, Theophile D, Rahmani Z, Chettouh Z, Blouin JL, et al. (1993) Molecular mapping of twenty-four features of Down syndrome on chromosome 21. Eur J Hum Genet 1: 114-124.

314. Toyoda A, Noguchi H, Taylor TD, Ito T, Pletcher MT, et al. (2002) Comparative genomic sequence analysis of the human chromosome 21 Down syndrome critical region. Genome Res 12: 1323-1332.

315. Warrington NM, Howe LD, Wu YY, Timpson NJ, Tilling K, et al. (2013) Association of a Body Mass Index Genetic Risk Score with Growth throughout Childhood and Adolescence. PLoS One 8: e79547.

316. Mei H, Chen W, Jiang F, He J, Srinivasan S, et al. (2012) Longitudinal replication studies of GWAS risk SNPs influencing body mass index over the course of childhood and adulthood. PLoS One 7: e31470.

317. Dvornyk V, Waqar ul H (2012) Genetics of age at menarche: a systematic review. Hum Reprod Update 18: 198-210.

318. Wen X, Kleinman K, Gillman MW, Rifas-Shiman SL, Taveras EM (2012) Childhood body mass index trajectories: modeling, characterizing, pairwise correlations and socio-demographic predictors of trajectory characteristics. BMC Med Res Methodol 12: 38.

319. Zillikens MC, Yazdanpanah M, Pardo LM, Rivadeneira F, Aulchenko YS, et al. (2008) Sex-specific genetic effects influence variation in body composition. Diabetologia 51: 2233-2241.

320. Comuzzie AG, Blangero J, Mahaney MC, Mitchell BD, Stern MP, et al. (1993) Quantitative genetics of sexual dimorphism in body fat measurements. American Journal of Human Biology 5: 725-734.

321. Atwood LD, Heard-Costa NL, Cupples LA, Jaquish CE, Wilson PW, et al. (2002) Genomewide linkage analysis of body mass index across 28 years of the Framingham Heart Study. Am J Hum Genet 71: 1044-1050.

322. Webster RJ, Warrington NM, Weedon MN, Hattersley AT, McCaskie PA, et al. (2009) The association of common genetic variants in the APOA5, LPL and GCK genes with longitudinal changes in metabolic and cardiovascular traits. Diabetologia 52: 106-114.

323. Andreasen CH, Stender-Petersen KL, Mogensen MS, Torekov SS, Wegner L, et al. (2008) Low physical activity accentuates the effect of the FTO rs9939609 polymorphism on body fat accumulation. Diabetes 57: 95-101.

324. Vimaleswaran KS, Li S, Zhao JH, Luan J, Bingham SA, et al. (2009) Physical activity attenuates the body mass index-increasing influence of genetic variation in the FTO gene. Am J Clin Nutr 90: 425-428.


325. Cauchi S, Stutzmann F, Cavalcanti-Proenca C, Durand E, Pouta A, et al. (2009) Combined effects of MC4R and FTO common genetic variants on obesity in European general populations. J Mol Med (Berl) 87: 537-546.

326. Rampersaud E, Mitchell BD, Pollin TI, Fu M, Shen H, et al. (2008) Physical activity and the association of common FTO gene variants with body mass index and obesity. Arch Intern Med 168: 1791-1797.

327. Kilpeläinen TO, Qi L, Brage S, Sharp SJ, Sonestedt E, et al. (2011) Physical Activity Attenuates the Influence of FTO Variants on Obesity Risk: A Meta-Analysis of 218,166 Adults and 19,268 Children. PLoS Med 8: e1001116.

328. Ahmad T, Lee IM, Pare G, Chasman DI, Rose L, et al. (2011) Lifestyle interaction with fat mass and obesity-associated (FTO) genotype and risk of obesity in apparently healthy U.S. women. Diabetes Care 34: 675-680.

329. Sonestedt E, Roos C, Gullberg B, Ericson U, Wirfalt E, et al. (2009) Fat and carbohydrate intake modify the association between genetic variation in the FTO genotype and obesity. Am J Clin Nutr 90: 1418-1425.

330. Abarin T, Yan Wu Y, Warrington N, Lye S, Pennell C, et al. (2012) The impact of breastfeeding on FTO-related BMI growth trajectories: an application to the Raine pregnancy cohort study. Int J Epidemiol 41: 1650-1660.

331. Li S, Zhao JH, Luan J, Ekelund U, Luben RN, et al. (2010) Physical activity attenuates the genetic predisposition to obesity in 20,000 men and women from EPIC-Norfolk prospective population study. PLoS Med 7.

332. Access Economics (2008) The growing cost of obesity in 2008: three years on. Canberra: Diabetes Australia.

333. Colagiuri S, Lee CMY, Colagiuri R, Magliano D, Shaw JE, et al. (2010) The cost of overweight and obesity in Australia. Medical Journal of Australia 192: 260-264.

334. Summerbell CD, Waters E, Edmunds LD, Kelly S, Brown T, et al. (2005) Interventions for preventing obesity in children. Cochrane Database Syst Rev: CD001871.

335. Waters E, de Silva-Sanigorski A, Hall BJ, Brown T, Campbell KJ, et al. (2011) Interventions for preventing obesity in children. Cochrane Database Syst Rev: CD001871.

336. Doyle O, Harmon CP, Heckman JJ, Tremblay RE (2009) Investing in early human development: timing and economic efficiency. Econ Hum Biol 7: 1-6.

337. Crowle J, Turner E (2010) Childhood Obesity: An Economic Perspective. Productivity Commission Staff Working Paper, Melbourne.


Appendix A: Publication Arising from the Research in Chapter Two

Modelling BMI Trajectories in Children for Genetic Association Studies

Nicole M. Warrington1,2, Yan Yan Wu2, Craig E. Pennell1, Julie A. Marsh1, Lawrence J. Beilin3, Lyle J. Palmer2,4, Stephen J. Lye2, Laurent Briollais2*

1 School of Women’s and Infants’ Health, The University of Western Australia, Perth, Western Australia, Australia, 2 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada, 3 School of Medicine and Pharmacology, The University of Western Australia, Perth, Western Australia, Australia, 4 Ontario Institute for Cancer Research, Toronto, Ontario, Canada

Abstract

Background: The timing of associations between common genetic variants and changes in growth patterns over childhood may provide insight into the development of obesity in later life. To address this question, it is important to define appropriate statistical models to allow for the detection of genetic effects influencing longitudinal childhood growth.

Methods and Results: Children from The Western Australian Pregnancy Cohort (Raine; n = 1,506) Study were genotyped at 17 genetic loci shown to be associated with childhood obesity (FTO, MC4R, TMEM18, GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3, SH2B1, MRSA) and an obesity-risk-allele-score was calculated as the total number of ‘risk alleles’ possessed by each individual. To determine the statistical method that fits these data and has the ability to detect genetic differences in BMI growth profile, four methods were investigated: linear mixed effects model, linear mixed effects model with skew-t random errors, semi-parametric linear mixed models and a non-linear mixed effects model. Of the four methods, the semi-parametric linear mixed model method was the most efficient for modelling childhood growth to detect modest genetic effects in this cohort. Using this method, three of the 17 loci were significantly associated with BMI intercept or trajectory in females and four in males. Additionally, the obesity-risk-allele score was associated with increased average BMI (female: β = 0.0049, P = 0.0181; male: β = 0.0071, P = 0.0001) and rate of growth (female: β = 0.0012, P = 0.0006; male: β = 0.0008, P = 0.0068) throughout childhood.

Conclusions: Using statistical models appropriate to detect genetic variants, variations in adult obesity genes were associated with childhood growth. There were also differences between males and females. This study provides evidence of genetic effects that may identify individuals early in life that are more likely to rapidly increase their BMI through childhood, which provides some insight into the biology of childhood growth.

Citation: Warrington NM, Wu YY, Pennell CE, Marsh JA, Beilin LJ, et al. (2013) Modelling BMI Trajectories in Children for Genetic Association Studies. PLoS ONE 8(1): e53897. doi:10.1371/journal.pone.0053897

Editor: Guoying Wang, Johns Hopkins Bloomberg School of Public Health, United States of America

Received August 17, 2012; Accepted December 4, 2012; Published January 17, 2013

Copyright: © 2013 Warrington et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The following institutions provide funding for Core Management of the Raine Study: The University of Western Australia (UWA), Raine Medical Research Foundation, UWA Faculty of Medicine, Dentistry and Health Sciences, The Telethon Institute for Child Health Research, Curtin University and Women and Infants Research Foundation. This study was supported by project grants from the National Health and Medical Research Council of Australia (Grant ID 403981 and ID 003209; http://www.nhmrc.gov.au/) and the Canadian Institutes of Health Research (Grant ID MOP-82893; http://www.cihr-irsc.gc.ca/e/193.html). Ms. Warrington is funded by an Australian Postgraduate Award from the Australian Government of Innovation, Industry, Science and Research and a Raine Study PhD Top-Up Scholarship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

Obesity is a major global public health problem. The World

Health Organisation estimated in 2010 there were at least 42

million overweight children under the age of 5 years and one

billion overweight adults globally [1]. Childhood obesity is

associated with poor mental [2,3,4,5] and physical health [6,7]

and is one of the strongest predictors of adult obesity [8,9]. Adult

obesity, in turn, increases the risk of many diseases including

coronary heart disease, metabolic syndrome, some cancers, stroke,

liver and gallbladder disease, sleep apnoea and respiratory

problems, osteoarthritis and gynaecological problems [1]. It has

been proposed that there are critical periods early in an

individual’s life for the development of obesity including gestation

and early infancy, adiposity rebound and adolescence [10].

An individual’s susceptibility to obesity is thought to result from

a combination of their genetics, behaviours and environment. The

heritability of obesity is estimated from family and twin studies to

be between 40 and 80% [11,12,13], which appears to be age

dependent with younger individuals having higher heritability

estimates [14]. Genetic factors have an important role in

childhood obesity, but their role may be different to those that

operate in adulthood. Since the advent of genome-wide association

studies (GWAS), common variants within 35 genes have been

discovered to be associated with adult obesity [15,16,17,18,19]

and a further 48 genes associated with population variation in

body mass index (BMI) and weight [20,21,22,23,24,25,26] in

individuals of European descent. In particular, common variants

within the fat-mass and obesity associated (FTO) and melanocortin 4

receptor (MC4R) genes are associated with modest effects on

BMI (0.2–0.4 kg/m² per allele) which translate into increased odds

of obesity of 1.1–1.3 in adults [24,26,27,28,29]. However, the

genomic regions discovered to date to be associated with BMI

account for less than 1% of the total variance in the BMI [30],

leaving much of the estimated heritability unexplained. In

addition, relatively few studies have investigated the association

between the adult BMI associated variants and childhood BMI

[23,31,32,33,34]. Zhao et al [31] investigated the association

between childhood BMI and 13 genomic loci reported to be

associated with adult obesity to find that nine of the loci contribute

to paediatric BMI between birth and 18 years of age.

Subsequently, several authors have investigated the association

between adult BMI loci and changes in growth over childhood.

Hardy and colleagues [33] took variants from the two most

commonly reported obesity genes, FTO and MC4R, to see if they

were associated with life course body size. They found the

association with BMI in both genes strengthened during childhood

up until 20 years of age before weakening throughout adulthood.

In 2010, Elks et al [34] used eight variants that showed individual

associations with childhood BMI to create an obesity-risk-allele-score.

This allele-score was strongly associated with early infant

weight gain but also with weight gain over childhood. Finally, den

Hoed et al [32] looked at BMI in childhood and adolescence

against a larger subset of replicated SNPs representing the 16 BMI

loci from the six genome-wide association studies in adults of white

European descent [22,23,24,26,35,36]. Together, these studies

begin to provide evidence that genetic loci associated with BMI in

adulthood start having an effect in childhood and even infancy.

Obesity develops over a period of time so investigating the

genetic determinants underlying this developmental process may

provide insights into mechanisms of the genetic associations.

Sophisticated longitudinal analyses allow questions to be addressed

that cannot be determined from cross-sectional analyses. These

longitudinal models assess patterns and duration of genetic effect

at baseline and over a time period and the differences in means

and rates of change of a trait. It is therefore important to

investigate the genetic component of BMI trajectory in order to

better understand some of the underlying biology of growth. The

analysis of longitudinal growth curves allows one to identify

specific stages in which genes play a central role.

A child’s growth rate profile often contains important information

regarding their genetic make-up and environmental exposures;

however, BMI trajectories are difficult to model statistically

due to the various changes in growth rate over childhood.

Children tend to have rapidly increasing BMI from birth to

approximately 9 months of age where they reach their adiposity

peak; BMI then decreases until around the age of 5–6 years at

adiposity rebound and then steadily increases again until after

puberty where it tends to plateau through adulthood. These

patterns of growth tend to be different in males and females where

females often reach each of the ‘landmarks’ (adiposity rebound,

puberty and plateau at adult BMI) at an earlier age than males.

These changes over time within each individual, as well as the

increasing variability over time of BMI between individuals, are

often difficult to capture accurately in a statistical model. This is

particularly the case when the aim is to detect modest genetic

effects. The World Health Organization recently conducted

research into statistical methods used to estimate growth curves

over childhood and examined 30 previously published methods, of

which only 7 could handle multiple measurements per child [37].

These methods range from non-linear, parametric curves [38] to

non-linear, non-parametric methods where the form of the curve

was allowed to differ for each subject [39,40] and from linear

mixed-effects models for longitudinal normally distributed data

[41,42] to a more general multilevel model, some with non-

parametric components [43,44,45]. Although many methods have

been previously used for growth modelling, not all are appropriate

for genetic association analyses or modelling growth profiles in

longitudinal birth cohorts.

We aim to compare various modelling approaches to assess the

genetic effects of BMI growth through infancy, childhood and

adolescence. To investigate the sensitivity of these different

modelling frameworks to detect genetic effects, we will use the

previously published adult obesity and BMI associated SNPs that

have been shown to be associated with childhood BMI and

explore their associations with childhood growth.

Methods

Subjects

The Western Australian Pregnancy Cohort (Raine) Study

[46,47,48] is a prospective pregnancy cohort where 2,900 mothers

were recruited prior to 18-weeks’ gestation between 1989 and

1991. Recruitment took place at Western Australia’s major

perinatal centre, King Edward Memorial Hospital, and nearby

private practices. The mothers completed questionnaires regarding

the children and the children had physical examinations at

average ages of 1, 2, 3, 6, 8, 10, 14 and 17 years. A DNA sample

was collected at the 14 and 17 year follow-ups. A subset of 1,506

individuals were used for analysis in this study using the following

inclusion criteria: at least one parent of European descent, live

birth, unrelated to anyone in the sample (one of every related pair,

including multiple births, was selected at random to exclude), no

significant congenital anomalies, a DNA sample and at least one

measure of body mass index (BMI) throughout childhood. Weight

and height were measured at each follow-up by trained members

of the research team [49]; weight was measured using a

Wedderburn Digital Chair Scale to the nearest 100 g with

children dressed in running shorts and a singlet top and height

was measured to the nearest 0.1 cm with a Holtain Stadiometer.

BMI was calculated from the weight and height measurements

(median 6 measures per person, interquartile range 5–7, range 1–8

measurements), with a total of 8,986 BMI measures. The study

was conducted with appropriate institutional ethics approval from

the King Edward Memorial Hospital and Princess Margaret

Hospital for Children ethics boards, and written informed consent

was obtained from all mothers. The cohort has been shown to be

representative of the population presenting to the antenatal

tertiary referral centre in Western Australia [48].

Genes

We wanted to investigate markers that have an effect on

childhood BMI and, more importantly, change in BMI over

childhood, so we selected the 17 genetic variants published in den

Hoed et al [32]. These SNPs were first discovered to be associated

with adult BMI and replicated in at least one study against

childhood BMI and change in BMI growth over childhood. At the

time of selecting SNPs for this study, they were the largest set of

SNPs shown to be associated with BMI over childhood and

adolescence. We did not include loci that have been shown to be

associated with only obesity risk but not BMI. Subsets of these 17

SNPs (either the same SNPs or a SNP in high LD [r² > 0.8]) were

also presented by Elks et al [34] and Hardy et al [33], who showed

associations with changes in growth over childhood. Genetic

information on these 17 published genetic variants was available

for individuals in our sample, either directly genotyped SNPs

(rs925946 (BDNF), rs10913469 (SEC16B), rs2605100 (LYPLAL1),

rs987237 (TFAP2B), rs10838738 (MTCH2), rs7138803

(BCDIN3D) and rs10146997 (NRXN3)) or from the best guess

genotype data imputed against HapMap release 22 (rs2815752

(NEGR1), rs6548238 (TMEM18), rs7647305 (ETV5), rs10938397

(GNPDA2), rs613080 (MRSA), rs1488830 (BDNF), rs8055138

(SH2B1), rs1121980 (FTO), rs17782313 (MC4R) and rs11084753

(KCTD15)). Genotyping and quality control has been described

elsewhere [50]. Briefly, our sample was genotyped using the

genome-wide Illumina 660 Quad Array. Genotyping was

performed on the Illumina BeadArray Reader at the Centre for

Applied Genomics, Toronto, Canada using 250 nanograms of

DNA. The genotype data was cleaned using standard thresholds

(HWE p-value > 5.7×10⁻⁷, call rate > 95% and minor allele

frequency > 1%). Individual level genotype data was extracted for

those SNPs of interest that were directly genotyped by the chip

and passed QC measures. Imputation of un-typed or missing

genotypes was also performed using MACH v1.0.16 for all 22

autosomes with the CEU samples from HapMap Phase2 (Build 36,

release 22) used as a reference panel.

Table 1. The phenotypic characteristics of the Raine sample.

| Measure | Follow-up | All (n = 1,506) | Male (n = 773) | Female (n = 733) | P-Value |
|---|---|---|---|---|---|
| Age (yr) | Year 1 (n = 1,375) | 1.16 (0.10) | 1.15 (0.10) | 1.16 (0.10) | 0.22 |
| | Year 2 (n = 402) | 2.18 (0.14) | 2.19 (0.14) | 2.16 (0.14) | 0.05 |
| | Year 3 (n = 994) | 3.11 (0.12) | 3.12 (0.13) | 3.11 (0.10) | 0.71 |
| | Year 5 (n = 1,324) | 5.92 (0.18) | 5.91 (0.19) | 5.92 (0.18) | 0.30 |
| | Year 8 (n = 1,320) | 8.10 (0.35) | 8.12 (0.34) | 8.09 (0.36) | 0.17 |
| | Year 10 (n = 1,274) | 10.60 (0.18) | 10.60 (0.19) | 10.59 (0.17) | 0.16 |
| | Year 13/14 (n = 1,276) | 14.07 (0.20) | 14.07 (0.20) | 14.07 (0.19) | 0.55 |
| | Year 16/17 (n = 1,021) | 17.05 (0.25) | 17.03 (0.24) | 17.06 (0.25) | 0.06 |
| BMI (kg/m²) | Year 1 (n = 1,375) | 17.11 (1.40) | 17.38 (1.38) | 16.82 (1.37) | 4.63E-14 |
| | Year 2 (n = 402) | 15.97 (1.29) | 16.19 (1.28) | 15.72 (1.25) | 2.00E-04 |
| | Year 3 (n = 994) | 16.15 (1.27) | 16.29 (1.21) | 16.00 (1.31) | 2.00E-04 |
| | Year 5 (n = 1,324) | 15.86 (1.76) | 15.88 (1.70) | 15.84 (1.82) | 0.64 |
| | Year 8 (n = 1,320) | 16.88 (2.54) | 16.79 (2.47) | 16.97 (2.62) | 0.29 |
| | Year 10 (n = 1,274) | 18.69 (3.41) | 18.58 (3.38) | 18.80 (3.45) | 0.25 |
| | Year 13/14 (n = 1,276) | 21.45 (4.23) | 21.21 (4.24) | 21.71 (4.20) | 0.03 |
| | Year 16/17 (n = 1,021) | 23.02 (4.38) | 22.83 (4.34) | 23.23 (4.42) | 0.15 |
| Height (m) | Year 1 (n = 1,375) | 0.78 (0.03) | 0.78 (0.03) | 0.77 (0.03) | 1.04E-14 |
| | Year 2 (n = 402) | 0.90 (0.03) | 0.91 (0.03) | 0.90 (0.03) | 3.00E-04 |
| | Year 3 (n = 994) | 0.96 (0.04) | 0.97 (0.04) | 0.96 (0.04) | 1.06E-09 |
| | Year 5 (n = 1,324) | 1.16 (0.05) | 1.17 (0.05) | 1.15 (0.04) | 6.05E-07 |
| | Year 8 (n = 1,320) | 1.29 (0.06) | 1.30 (0.06) | 1.29 (0.06) | 4.37E-06 |
| | Year 10 (n = 1,274) | 1.44 (0.06) | 1.44 (0.07) | 1.44 (0.06) | 0.97 |
| | Year 13/14 (n = 1,276) | 1.65 (0.08) | 1.67 (0.09) | 1.62 (0.06) | 4.94E-26 |
| | Year 16/17 (n = 1,021) | 1.73 (0.09) | 1.79 (0.07) | 1.66 (0.06) | 1.94E-143 |
| Weight (kg) | Year 1 (n = 1,375) | 10.34 (1.24) | 10.67 (1.24) | 9.99 (1.15) | 5.03E-25 |
| | Year 2 (n = 402) | 13.03 (1.49) | 13.39 (1.48) | 12.65 (1.40) | 3.37E-07 |
| | Year 3 (n = 994) | 15.06 (1.84) | 15.42 (1.83) | 14.69 (1.78) | 3.99E-10 |
| | Year 5 (n = 1,324) | 21.48 (3.37) | 21.75 (3.42) | 21.20 (3.30) | 2.91E-03 |
| | Year 8 (n = 1,320) | 28.42 (5.68) | 28.58 (5.65) | 28.24 (5.72) | 0.28 |
| | Year 10 (n = 1,274) | 39.01 (9.02) | 38.80 (9.09) | 39.23 (8.95) | 0.40 |
| | Year 13/14 (n = 1,276) | 58.49 (13.44) | 59.50 (14.49) | 57.39 (12.11) | 4.81E-03 |
| | Year 16/17 (n = 1,021) | 68.69 (14.59) | 73.15 (14.91) | 64.12 (12.74) | 3.91E-24 |
| Number of follow-ups per person | | 5.97 (1.52) | 5.96 (1.52) | 5.97 (1.53) | 0.91 |
| Birth Weight (kg) | | 3.35 (0.59) | 3.41 (0.59) | 3.28 (0.58) | 3.85E-05 |
| Gestational Age (wks) | | 39.35 (2.11) | 39.37 (2.05) | 39.32 (2.17) | 0.66 |
| Preterm [% (N)] | | 8.77% (132) | 8.03% (62) | 9.55% (70) | 0.34 |
| Maternal smoking during pregnancy [% (N)] | | 25.22% (379) | 22.77% (176) | 27.81% (203) | 0.03 |

Continuous variables are expressed as means (SD); binary variables as percentage (number).

doi:10.1371/journal.pone.0053897.t001

Two variants in the BDNF gene were investigated as they have previously been shown to be

independently associated with obesity [22] (r² = 0.11). The 17

SNPs are described in Table S1, including the available sample

size with complete data for each SNP. These 17 SNPs were used to

investigate the sensitivity of each method to detect genetic variants

in terms of point estimates and standard errors (SEs) across various

time points (for those methods that could be compared). Each SNP

was incorporated into the model independently assuming an

additive genetic effect for the obesity risk allele. In addition, an

‘obesity-risk-allele score’ was created on the subset of individuals

with complete genetic data by summing the number of risk alleles

an individual had (n = 1,219) [51]. The alleles were not weighted

by their effect size as this has previously been shown to only have

limited benefit [52].
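As a minimal sketch of the unweighted score described above, assuming each genotype is already coded as the number of risk alleles (0, 1 or 2) under an additive model and that an individual with any missing genotype receives no score. The function name, the dictionary layout and the example individual are hypothetical and are not taken from the study's own code.

```python
from typing import Dict, Optional


def risk_allele_score(genotypes: Dict[str, Optional[int]]) -> Optional[int]:
    """Unweighted score: total number of risk alleles across all SNPs.

    `genotypes` maps a SNP identifier to the risk-allele count (0, 1 or 2),
    or to None when the genotype is missing. Returns None if any SNP is
    missing, mirroring a complete-data rule (an assumption of this sketch).
    """
    counts = list(genotypes.values())
    if any(c is None for c in counts):
        return None
    if any(c not in (0, 1, 2) for c in counts):
        raise ValueError("risk-allele counts must be 0, 1 or 2")
    return sum(counts)


# Hypothetical individual; only three of the 17 SNPs are shown for brevity.
example = {"rs1121980": 2, "rs17782313": 1, "rs6548238": 0}
print(risk_allele_score(example))  # 3
```

Because no effect-size weights are applied, every risk allele contributes equally to the score.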

Statistical Analysis

Four popular methods were compared to assess the accuracy of

estimation of BMI growth trajectories and the ability to detect

genetic effects influencing these trajectories. These methods

included: Linear Mixed Effects Model (LMM) [41], the Skew-t

Linear Mixed Effects Model (STLMM) [53,54,55], Semi-Parametric

Linear Mixed Models (SPLMM) and a Non-Linear Mixed

Model (NLMM), also known as SuperImposition by Translation

and Rotation (SITAR) [40]. Although there are many possible

statistical methods that could be utilized in this context, these

methods were chosen as they allow for adjustment of potential

confounders, appropriately account for the complex correlation

structure between the repeated measures, allow for incomplete

data on the assumption that data are missing at random, and are

computationally feasible in the context of candidate gene and

genome-wide association studies. Once the best fitting model was

defined for each method, the model fit for each of the methods was

compared. A small simulation study was also conducted using

re-sampling techniques based on 1,000 non-parametric bootstrap

data sets with replacement [56] from the Raine data and

calculating an R² statistic for each method fit to these simulated

datasets.
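The re-sampling step can be sketched as follows. The text does not state the resampling unit, so this sketch assumes whole individuals are drawn with replacement so that each person's repeated measures stay together. It also assumes R² is computed as 1 − SS_residual / SS_total on the observed and fitted BMI values. `fit_predict` is a hypothetical callback standing in for whichever of the four models is being evaluated.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Visit = Tuple[float, float]   # (age in years, BMI) for one follow-up
Person = List[Visit]          # all follow-ups of one individual


def r_squared(observed: Sequence[float], fitted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_resid / ss_total


def resample_individuals(cohort: Dict[str, Person],
                         rng: random.Random) -> List[Person]:
    """Draw as many individuals as the cohort holds, with replacement."""
    ids = list(cohort)
    return [cohort[rng.choice(ids)] for _ in ids]


def bootstrap_r2(cohort: Dict[str, Person],
                 fit_predict: Callable[[List[Person]], List[float]],
                 n_boot: int = 1000,
                 seed: int = 1) -> List[float]:
    """Refit the model on each bootstrap sample and collect its R^2.

    `fit_predict` must return one fitted BMI per visit, in the same order
    as the visits appear in the sample it is given.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_boot):
        sample = resample_individuals(cohort, rng)
        observed = [bmi for person in sample for _, bmi in person]
        results.append(r_squared(observed, fit_predict(sample)))
    return results
```

Comparing the resulting distributions of R² across methods gives a simple picture of how stable each method's fit is under resampling.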

LMM. The LMM with a polynomial function is a common

tool for growth curve analysis with continuous repeated measures.

For a set of time points varying from 1, …, t, the time trend in the

sample can be described by a (q−1)st-degree polynomial function,

with q ≤ t. The growth curve LMM for the jth individual and tth

time point and with the time scale measured by age is as follows:

$$\mathrm{BMI}_{jt}\,(\mathrm{kg/m^2}) = \beta_0 + \sum_i \beta_i\,(\mathrm{Age}_{jt})^i + u_{0j} + \sum_k u_{kj}\,(\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^k + e_{jt}, \qquad k \le i$$

where $\overline{\mathrm{Age}}$ is the mean age over the t time points in the sample

(i.e. 8 years), $\beta_i$ are the parameter estimates for the fixed effects, $u_{kj}$

are the parameter estimates for the random effects assumed

multivariate normal and the $e_{jt}$'s are the error terms assumed

normally distributed $N(0, \Sigma)$, where $\Sigma$ is the within-individual

correlation matrix. Both age and the natural log transformation of

age were considered as the time component to identify the optimal

underlying scale. Both fixed (i) and random (k) effects up to

polynomial of degree 3 were tested for significance. Several within-

individual correlation structures were considered, including

autoregressive, continuous autoregressive, exchangeable (compound

symmetric) and unstructured.
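To illustrate one of the structures listed above: under the usual continuous-time AR(1) definition, the correlation between two measurements on the same individual is φ raised to the absolute time gap between them, which suits follow-ups that are unevenly spaced in age. The sketch below builds that matrix. The value of `phi` and the ages are hypothetical, and the function is not the routine used in the analyses.

```python
from typing import List, Sequence


def car1_correlation(ages: Sequence[float], phi: float) -> List[List[float]]:
    """Continuous AR(1) correlation matrix: corr(i, j) = phi ** |age_i - age_j|.

    `phi` is the correlation between two measurements taken one time unit
    (here, one year) apart and must lie strictly between 0 and 1.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")
    return [[phi ** abs(a - b) for b in ages] for a in ages]


# Hypothetical individual measured at ages 1, 2 and 4 years, with phi = 0.5.
for row in car1_correlation([1.0, 2.0, 4.0], 0.5):
    print(row)
```

Measurements taken further apart in age are less correlated, and the decay depends on the actual gap in years, not on the visit number.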

Following the guidelines outlined in Cheng et al [57], the initial

saturated model, which included a cubic function of age for

both the fixed and random effects and BMI on the natural log

scale, was used to compare covariance (random effects) matrices.

Initially, likelihood ratio tests (LRT) were used to assess the

required degree of polynomial function for the random effects to

fit the data accurately, while keeping the fixed effects the same and

specifying an independence correlation matrix for the random

effects. Next, a similar approach was used to investigate within-

individual correlation structures in addition to the random effects.

Finally, models with both untransformed and natural log

transformed age were compared using diagnostic plots such as

fitted versus observed values, fitted versus residual values and

distribution of both random effects and error terms.

STLMM. The assumption of multivariate normal random

effects and within-subject errors is often violated, particularly when

modelling the childhood growth curve. This may lead to biased

estimation of fixed effects and their SEs and thus to wrong

statistical inference, in particular of the genetic association-related

parameters. A common approach to achieve normality is to

transform the response variable but generally there is not a unique

transformation that could be used and the results of the analyses

might depend on the transformation used.

Table 2. Characteristics of the best model for each method.

| Sex | Method | Scale of response | Fixed effect parameters | Random effect parameters | Within-individual correlation matrix |
|---|---|---|---|---|---|
| Female | LMM | ln(BMI) | 1 + age + age² + age³ | 1 + age + age² | corCAR1 |
| | STLMM | BMI | 1 + age + age² + age³ | 1 + age | None |
| | SPLMM | ln(BMI) | piecewise cubic spline function of age with knots at 2, 8 and 12 years | 1 + age + 0.5*age² | None |
| | NLMM | ln(BMI) | size and a natural cubic spline function of ln(age) for velocity with 3df | size and a natural cubic spline function of ln(age) for tempo and velocity parameters with 3df | corCAR1 |
| Male | LMM | ln(BMI) | 1 + age + age² + age³ | 1 + age + age² | corCAR1 |
| | STLMM | BMI | 1 + age + age² + age³ | 1 + age | None |
| | SPLMM | ln(BMI) | piecewise cubic spline function of age with knots at 2, 8 and 12 years | 1 + age + 0.5*age² | None |
| | NLMM | ln(BMI) | size and a natural cubic spline function of ln(age) for velocity with 4df | size and a natural cubic spline function of ln(age) for tempo and velocity parameters with 4df | corCAR1 |

doi:10.1371/journal.pone.0053897.t002

To avoid transforming the response and still obtain a valid inference under a non-normal

distribution assumption for the response, we utilised an extension

of the LMM model assuming a multivariate t distribution for the

error terms, $e_{jt}$'s, and a multivariate skew-normal distribution for

the random effects. The resulting model for the response over the t

time points is multivariate skew-t with specific parameters that

account for the asymmetry (skewness parameters) and long-tail

(degree of freedom of the t distribution) of the response distribution

[54]. The specification in terms of fixed and random effects was

identical to the LMM. No transformations were applied to either

BMI or age as the skewness in the data was accounted for by the

model structure.

SPLMM. Semi-parametric linear mixed models make use of

smoothing splines, which yield a smoother growth curve estimate

than the polynomial function in the LMM when fitting non-linear

relationships. The basic model for the jth individual and time-point

t is as follows:

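$$\mathrm{BMI}_{jt}\,(\mathrm{kg/m^2}) = \beta_0 + \sum_i \beta_i\,(\mathrm{Age}_{jt})^i + \sum_k \gamma_k\,\big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + u_{0j} + \sum_i u_{ij}\,(\mathrm{Age}_{jt})^i + \sum_k \eta_{kj}\,\big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + e_{jt}$$

where $\kappa_k$ is the k-th knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if

$t > \kappa_k$, which is known as the truncated power basis that ensures

smooth continuity between the time windows.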
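The truncated power basis can be made concrete with a short sketch. It assumes a cubic basis in which the time variable and the knots are on the same scale; whether age is centred first is left to the caller. The function names are hypothetical. The knot positions in the example follow those reported for the final SPLMM in Table 2, and the age of 10 years is arbitrary.

```python
from typing import List, Sequence


def truncated_power(t: float, knot: float, degree: int) -> float:
    """(t - knot)_+ ** degree: zero at or below the knot, a power of the excess above it."""
    excess = t - knot
    return excess ** degree if excess > 0 else 0.0


def spline_basis_row(t: float, knots: Sequence[float], degree: int = 3) -> List[float]:
    """One design-matrix row: [1, t, ..., t**degree] followed by one truncated term per knot."""
    polynomial = [t ** p for p in range(degree + 1)]
    truncated = [truncated_power(t, k, degree) for k in knots]
    return polynomial + truncated


# Knots at 2, 8 and 12 years, evaluated at an age of 10 years.
print(spline_basis_row(10.0, [2.0, 8.0, 12.0]))
# [1.0, 10.0, 100.0, 1000.0, 512.0, 8.0, 0.0]
```

The last entry is zero because 10 years lies below the 12-year knot, so that knot does not yet contribute to the fitted curve.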

Various numbers and positions of knots and the degree of

polynomial between knots were compared to find the best fit to the

data. Knot points were initially estimated visually from both

individual profiles and the population average curve in males and

females separately. To optimise the number and placement of the

knot points, we fit a series of models with the knot points placed at

6-month intervals around the estimated knot points and incorporated

additional knot points to see if they improved the model fit.

The model with the lowest Akaike Information Criterion (AIC)

was selected as the final model. Finally, we investigated the degree

of polynomial, up to the third degree, required for each spline,

once again selecting the best model with the lowest AIC.
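The selection rule above can be sketched in a few lines, using the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximised likelihood. The candidate names, log-likelihoods and parameter counts below are invented placeholders for illustration and are not results from the Raine data.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L); lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def select_lowest_aic(candidates: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the candidate (log-likelihood, n_params) with the lowest AIC."""
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Invented placeholder fits: (maximised log-likelihood, number of parameters).
candidates = {
    "model_a": (-1500.0, 12),
    "model_b": (-1520.0, 10),
    "model_c": (-1499.5, 14),
}
print(select_lowest_aic(candidates))  # model_a
```

In this toy comparison the extra knot in `model_c` raises the log-likelihood only slightly, so the penalty for its additional parameters leaves `model_a` with the lowest AIC.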

NLMM. The SITAR method [40] was recently defined to

summarize height growth in puberty (in particular peak height

velocity) and estimate subject-specific parameters that can be used

to investigate relationships with earlier exposures and later

outcomes. The SITAR model (referred to here as NLMM)

has a single fitted curve at the population level and

individual level estimates of mean differences in size (shifting up or

down of the BMI curve), growth tempo (left-right shift of the curve

on the age scale) and velocity (shrinking or stretching of the age

scale).

The basic model for the growth curves is:

$$
y_{it} = \alpha_i + h\!\left(\frac{t - \beta_i}{e^{-\gamma_i}}\right)
$$

where:

$y_{it}$ = growth of subject i at age t.

$h(t)$ = natural cubic spline curve of growth vs. age.

$\alpha_i$ = random growth intercept that adjusts for differences in mean height (size).

$\beta_i$ = random growth intercept to adjust for differences in timing (tempo).

$\gamma_i$ = random age scaling adjusting for the duration of the growth spurt (velocity).
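How the three subject-specific parameters act on the population curve can be sketched as below. The mean curve is a hypothetical quadratic standing in for the fitted natural cubic spline, and the function names are assumptions for illustration only.

```python
import math


def sitar_predict(t, h, size, tempo, velocity):
    """Evaluate size + h((t - tempo) / exp(-velocity)) for one subject.

    h is the population mean curve; size, tempo and velocity are the
    subject-specific random effects.
    """
    return size + h((t - tempo) / math.exp(-velocity))


def mean_curve(age):
    """Hypothetical mean curve, used here in place of a fitted spline."""
    return 16.0 + 0.02 * (age - 5.0) ** 2


average_child = sitar_predict(10.0, mean_curve, size=0.0, tempo=0.0, velocity=0.0)
shifted_child = sitar_predict(10.0, mean_curve, size=1.0, tempo=1.0, velocity=0.0)
```

With all three effects at zero the subject follows the population curve; a positive size shifts the curve up, a positive tempo shifts it to later ages, and a non-zero velocity rescales the age axis.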

This model was fitted with the three parameters (size, tempo and velocity) as random effects, size and velocity as fixed effects, and h(t) a natural cubic spline curve with 3 to 8 degrees of freedom (df) fitted as fixed effects. BMI and age were fitted both untransformed and natural log transformed to identify the best fit to the data. Model fits were compared using AIC, deviance and residual standard deviation. The estimates for the three parameters (size, tempo and velocity) were extracted for each individual and used for genetic analyses.

Given that growth curves differ greatly between males and females, particularly around puberty, and because different genes may influence the timing of growth spurts in males and females, sex-stratified models were used for all analyses. Age was mean centred prior to analysis. Because our sampling criterion of at least one parent of European descent leaves open the possibility of population stratification, a sensitivity analysis was conducted adjusting the genetic analyses for the first five principal components generated in the EIGENSTRAT software [58]. No adjustments for multiple testing were made, as our goal was to estimate a combined effect of SNPs that have already been validated in previous studies and shown to be significantly associated with childhood BMI and growth. All analyses were conducted in R version 2.12.1 [59]; the spida library was used for the SPLMM models and the sitarlib library was used for the NLMM models. To enable comparison between the four methods, maximum likelihood estimation was used for all mixed models. Genetic loci were considered associated with BMI if the global likelihood ratio test was significant at the α < 0.05 level.
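A global likelihood ratio test of this kind compares the maximised log-likelihoods of two nested models. The sketch below is a stdlib-only illustration, not the R code used in the study; the log-likelihoods are hypothetical, and the choice of four degrees of freedom assumes that a SNP main effect and three SNP-by-age interaction terms are dropped together.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return the LRT statistic and p-value for nested ML fits."""
    statistic = 2 * (loglik_full - loglik_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for models without and with the SNP terms.
statistic, p_value = likelihood_ratio_test(-1520.0, -1513.5, df=4)
significant = p_value < 0.05
```

The comparison is only valid when both models are fitted by maximum likelihood rather than restricted maximum likelihood, which is why ML estimation was used for all mixed models.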

Results

Population Characteristics

Of the 1,506 children in the analysis, 773 (51%) were male and 733 were female. Table 1 gives the characteristics of the Raine sample used in the analysis. At birth, these babies were similar to the Western Australian population of births, with an average birth weight of 3.35 kg (SD = 0.59 kg) and gestational age of 39.35 weeks (SD = 2.11 weeks); 25.21% were born to mothers who smoked throughout pregnancy and 8.77% were born preterm. The mothers gained on average 8.79 kg (SD = 3.78) throughout pregnancy and breastfed their infant for an average of 6 months (IQR = 2–12 months). On average, the infants gained 6.98 kg (SD = 1.17 kg) in the first year of life.

Model Fitting and Comparisons

The optimal model for each method was defined before any

cross-method comparisons were conducted. The selected models

for each method are summarized in Table 2.

LMM. The optimal LMM model for both males and females

was based on ln(BMI) and untransformed age, with cubic

polynomial of age in the fixed effects, a quadratic polynomial of

age in the random effects and a continuous autoregressive

correlation structure of order one. Hence, the final model for

both females and males was

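$$
\ln(\mathrm{BMI}\,[\mathrm{kg/m^2}]) = \beta_0 + \beta_1(\mathrm{Age}-8) + \beta_2(\mathrm{Age}-8)^2 + \beta_3(\mathrm{Age}-8)^3 + u_0 + u_1(\mathrm{Age}-8) + u_2(\mathrm{Age}-8)^2 + \varepsilon
$$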
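A continuous autoregressive correlation structure of order one gives two residuals from the same child a correlation that decays with the time between them, which suits unequally spaced follow-ups. A minimal sketch, in which the follow-up ages and the parameter value are hypothetical:

```python
def car1_correlation(ages, phi):
    """Correlation matrix with entries phi ** |age_i - age_j| (continuous AR(1))."""
    if not 0 <= phi < 1:
        raise ValueError("phi must lie in [0, 1)")
    return [[phi ** abs(a - b) for b in ages] for a in ages]


# Hypothetical follow-up ages (years) and correlation parameter for one child.
ages = [1.0, 2.0, 3.5, 6.0]
corr = car1_correlation(ages, phi=0.8)
```

Measurements one year apart have correlation phi, and the correlation shrinks geometrically as the gap widens.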

STLMM. The LMM model defined previously was used for

this method; however BMI was modelled on the untransformed

scale as the method accounts for the skewness and kurtosis of the

BMI distribution. The model would not converge with both linear



and quadratic age components in the random effects so this was

reduced to only linear age. This was the most computationally

intensive method to fit as it uses an expectation-maximization

(EM) algorithm for parameter estimation, and hence took the

longest time to converge.

SPLMM. For females, the optimal model had three knot

points placed at two, eight and 12 years with a cubic slope for each

spline. The males displayed a similar curve to the females, also

with three knots at two, eight and 12 years and a cubic slope

between each knot.

NLMM. The optimal model for females had a natural cubic

spline curve with three degrees of freedom and both BMI and age

on the natural log transformed scale. Similarly, the optimal model

for males was with BMI and age on the natural log transformed

scale but with four degrees of freedom for the natural cubic spline

curve.

Comparisons. Table 3 displays the measures of fit used to compare methods: R², R² from 1,000 simulated datasets, observed minus fitted values, number of SNPs detected and computational time. The R², in conjunction with the interquartile range of R² estimated through simulations, clearly favours the SPLMM as the best model fit for the females. The R² estimates from the simulations indicate that although the STLMM method has a higher R² for both females and males, its interquartile range is much larger, indicating that the model fit is more data dependent than for the other methods, which is not desirable for generalization to other cohorts. The conclusion for the males is not as simple: the R² is largest for the STLMM, but the considerably longer computational time and the larger deviation of the fitted values from the observed values indicate that this model might not be appropriate for large-scale genetic studies. Figure 1 displays the residuals from all four methods in both males and females. The female residual plots indicate that the LMM, STLMM and SPLMM methods all have residuals distributed close to the expected distribution (normal for the LMM and SPLMM and skew-t for the STLMM). Several within-subject outliers (at the tails of the distribution) were not captured by any of the methods; the NLMM in particular had additional outliers not present with the other methods. The LMM and SPLMM methods both show some deviation from the normal distribution at the top end of the curve, signifying that they underestimate the high BMI values. In contrast, there was an excess of extreme residual values at both ends when using the NLMM method, indicating a poor fit to the data: it overestimates low BMI values and underestimates high values, thus underestimating within-individual variability and potentially leading to conservative inference about genetic associations. The male residuals displayed a similar pattern to the females, although there were fewer obvious outliers. In addition, as there was less skewness in the males, the STLMM method deviated from the expected t distribution but in the opposite direction to that of the females, whereby the low values of BMI are underestimated. Based on model fit, all four methods were adequate in modelling childhood growth curves; however, the SPLMM was slightly better than the other methods at accounting for outliers and had the best model fit.

Figure 1. Q-Q plot of residuals for each of the methods by females (top four) and males (bottom four). doi:10.1371/journal.pone.0053897.g001

Table 3. Statistical measures used to compare model fit of the four methods.

| Sex | Method | R² | R² from 1,000 simulated datasets [median (IQR)] | (Observed − fitted values)² [median (IQR)] | Number of SNPs detected | Average run time for genetic model† (median [IQR]) |
|---|---|---|---|---|---|---|
| Female | LMM | 83.59% | 83.60% (82.70, 84.44) | 0.2705 (0.0579, 0.8755) | 1 of 17 | 13.59 sec (13.41, 14.40) |
| Female | STLMM | 88.78% | 91.80% (86.30, 95.54) | 0.2728 (0.0613, 0.9007) | 3 of 17 | 4505 sec (4490, 4784) |
| Female | SPLMM | 89.42% | 89.47% (89.06, 89.84) | 0.1720 (0.0374, 0.5871) | 3 of 17 | 23.49 sec (23.41, 23.92) |
| Female | NLMM | 85.98% | 85.97% (85.32, 86.65) | 0.1678 (0.0350, 0.5752) | 2 of 51 (three tests per SNP) | 0.01 sec (0.00, 0.02) |
| Male | LMM | 80.67% | 80.71% (79.64, 81.71) | 0.2390 (0.0470, 0.8187) | 3 of 17 | 15.84 sec (15.66, 16.55) |
| Male | STLMM | 88.72% | 91.99% (87.88, 95.74) | 0.2248 (0.0479, 0.8453) | 4 of 17 | 3962 sec (3895, 3970) |
| Male | SPLMM | 87.59% | 87.62% (87.24, 88.03) | 0.1656 (0.0329, 0.5501) | 4 of 17 | 24.07 sec (23.78, 24.52) |
| Male | NLMM | 85.10% | 85.07% (84.41, 85.82) | 0.1604 (0.0333, 0.5713) | 5 of 51 (three tests per SNP) | 0.00 sec (0.00, 0.02) |

† Median (IQR) of 100 models with the FTO SNP in R 64-bit version 2.12.1 on a 64-bit operating system with an Intel Core i7 CPU Processor (L 640 @ 2.13 GHz). doi:10.1371/journal.pone.0053897.t003

Figure 2. Distribution of obesity-risk allele score, with error bars for mean BMI at age 14 years. The obesity-risk-allele score incorporates genotypes from 17 loci (FTO, MC4R, TMEM18, GNPDA2, KCTD15, NEGR1, BDNF, ETV5, SEC16B, LYPLAL1, TFAP2B, MTCH2, BCDIN3D, NRXN3, SH2B1, and MRSA) in the 1,219 individuals from the Raine study with complete genetic data. The error bars display the mean (95% CI) BMI at age 14 years (the largest follow-up in adolescence) for each risk-allele score. doi:10.1371/journal.pone.0053897.g002
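Two of the fit measures reported in Table 3 can be sketched as follows. The section does not state how R² was computed, so the definition 1 − SSE/SST is an assumption; the BMI values are hypothetical and the function names are illustrative.

```python
import statistics


def r_squared(observed, fitted):
    """1 - SSE/SST, one common definition of R^2 (assumed, not confirmed by the source)."""
    mean_obs = statistics.fmean(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


def squared_error_summary(observed, fitted):
    """Median and quartiles of the squared differences (observed - fitted)^2."""
    sq = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    q1, median, q3 = statistics.quantiles(sq, n=4, method="inclusive")
    return median, (q1, q3)


# Hypothetical BMI values (kg/m^2), for illustration only.
observed = [15.0, 16.0, 17.0, 19.0, 23.0]
fitted = [15.5, 16.0, 16.0, 20.0, 22.0]
fit_r2 = r_squared(observed, fitted)
fit_error = squared_error_summary(observed, fitted)
```

Reporting the median and interquartile range of the squared differences, rather than their mean, keeps a handful of extreme residuals from dominating the comparison.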

Genetic Results

Of the 17 SNPs, likelihood ratio tests at the 5% level of significance indicated that the LMM method detected one significant association in females and three in males, the STLMM method detected three in females and four in males, and the SPLMM detected three in females and four in males. The NLMM method detected no significant SNPs in either females or males for the size parameter but two significant SNPs for the velocity parameter in males. Results of all 17 SNPs can be found in Tables S2 (females)

and S3 (males). The first five principal components for population

stratification were not significantly associated with BMI in any of

the four methods and the genetic results of the 17 SNPs remained

consistent when adjusting for them (data not shown).

The obesity-risk allele score based on the genotypes at each of the 17 loci was normally distributed and showed an approximately linear association with BMI across childhood, based on the mean BMI (95% confidence interval) for each score at each age (Figure 2). When the risk-allele score was incorporated into the four longitudinal models, it was associated with increasing BMI in females using all four methods; however, only three methods detected an association in males (Table 4). For the females, the LMM, STLMM and SPLMM methods all detected an increase in BMI per allele increase in the obesity-risk-allele score (LMM β = 0.0046, P = 0.0216; STLMM β = 0.0492, P = 0.0410; SPLMM β = 0.0049, P = 0.0181), in addition to an increase in linear slope over time (LMM β = 0.0012, P = 0.00002; STLMM β = 0.0153, P = 0.00003; SPLMM β = 0.0012, P = 0.0006). No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic interactions with the risk-allele score; however, the cubic interaction was significant in the LMM (β = −0.00001, P = 0.0067) and STLMM (β = −0.0001, P = 0.0236). This indicates that, according to the LMM and STLMM methods, females with higher allele scores plateau to adult BMI at an earlier age. In contrast, the NLMM method in both females and males was unable to detect a significant association with an increase in size or velocity, but did detect a decrease in tempo (assumed to be adiposity rebound) for each increase in risk allele. In the males, the LMM, STLMM and SPLMM methods also detected an increase in BMI (LMM β = 0.0073, P = 0.0001; STLMM β = 0.0423, P = 0.0481; SPLMM β = 0.0071, P = 0.0001) and in BMI/year per allele increase (LMM β = 0.0010, P = 0.0001; STLMM β = 0.0083, P = 0.0070; SPLMM β = 0.0008, P = 0.0068). No significant associations in the LMM, STLMM or SPLMM methods were detected for the quadratic and cubic interactions with the risk-allele score, indicating that the shape of the curve is consistent across the score categories.
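As a reading aid, a per-allele coefficient can be turned into an expected difference between children with different scores. The sketch assumes the score is an unweighted count of risk alleles (0, 1 or 2 per locus), which matches the integer scores reported here but is not spelled out in this section; the function names and the genotypes are hypothetical.

```python
def risk_allele_score(genotypes):
    """Sum the number of risk alleles (0, 1 or 2 per locus) across all loci."""
    if any(g not in (0, 1, 2) for g in genotypes.values()):
        raise ValueError("each genotype must be coded as 0, 1 or 2 risk alleles")
    return sum(genotypes.values())


def expected_difference(beta_per_allele, score_a, score_b):
    """Difference in the modelled outcome implied by a per-allele coefficient."""
    return beta_per_allele * (score_a - score_b)


# Hypothetical genotypes at three of the loci (risk-allele counts are invented).
child = {"FTO": 2, "MC4R": 1, "TMEM18": 0}
score = risk_allele_score(child)

# Using the female SPLMM main-effect coefficient quoted above (0.0049 per allele).
difference = expected_difference(0.0049, 18, 15)
```

The difference is expressed on whatever scale the model uses for BMI, so it should be read alongside the transformation chosen for each method.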

Further analysis focused on the SPLMM model, as this method

was shown to give the best fit to these data. There are potentially

different genetic pathways leading to increased growth rate in

males and females as SNPs from different genes are associated

with BMI trajectory; in females, SNPs in the NRXN3, BDNF and

MRSA genes were significantly associated with BMI trajectory

whereas in males FTO, NRXN3, GNPDA2 and TMEM18 were

significant. Figure 3 displays the population average curves for

individuals with 15, 17 or 18 (25th, 50th and 75th percentile)

obesity-risk alleles. The growth curves in each of the genders show

different patterns; females begin their trajectory smaller than

males, they have an earlier rebound, and by the age of 18 years

they are beginning to plateau at their potential adult BMI. In

contrast, males go through puberty at a slightly later age resulting

in their BMI continuing to increase at the age of 18 years. It is

apparent that the genetic effect begins later for females, at around seven and a half years (P = 0.03), than for males, at four years (P = 0.02) (Figure 4).
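One way to read this timing result is through the age-specific effect of the score. As a sketch only, assume the score enters a cubic model through a main effect and interactions with the centred age terms; the coefficients $\theta_0, \dots, \theta_3$ are introduced here for illustration and are not the paper's notation, and centring at 8 years is assumed from the LMM specification. The per-allele effect at age $a$ is then

$$
\left.\frac{\partial \ln(\mathrm{BMI})}{\partial\,\mathrm{Score}}\right|_{\mathrm{Age}=a} = \theta_0 + \theta_1 (a-8) + \theta_2 (a-8)^2 + \theta_3 (a-8)^3 .
$$

Under this reading, the age at which the effect begins is the earliest follow-up age at which this quantity differs significantly from zero. In the SPLMM the higher-order polynomial terms are replaced by spline terms, so the exact expression differs there.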

Table 4. Results from association analysis of the obesity-risk allele score with BMI trajectory using the four methods.

| Sex | Term | LMM Beta | LMM 95% CI | LMM P-Value | STLMM Beta | STLMM 95% CI | STLMM P-Value | SPLMM Beta | SPLMM 95% CI | SPLMM P-Value | NLMM parameter | NLMM Beta | NLMM SE | NLMM P-Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | Score | 0.0720 | 0.0107, 0.1335 | 0.0216 | 0.0492 | 0.0020, 0.0964 | 0.0410 | 0.0758 | 0.0131, 0.1388 | 0.0181 | Size | −0.0003 | 0.0008 | 0.6910 |
| Female | Score × Age | 0.0182 | 0.0099, 0.02645 | 1.68E-05 | 0.0153 | 0.0082, 0.0225 | 2.84E-05 | 0.0185 | 0.0080, 0.0290 | 0.0006 | Tempo | −0.0090 | 0.0030 | 0.0023 |
| Female | Score × Age² | −0.00001 | −0.0008, 0.0008 | 0.9848 | 0.0005 | −0.00004, 0.0011 | 0.0685 | −0.0077 | −0.0214, 0.0061 | 0.2763 | Velocity | 0.0045 | 0.0024 | 0.0562 |
| Female | Score × Age³ | −0.0002 | −0.0003, −0.00004 | 0.0067 | −0.0001 | −0.0002, −0.00002 | 0.0236 | −0.0058 | −0.0128, 0.0013 | 0.1077 | | | | |
| Male | Score | 0.1073 | 0.0553, 0.1595 | 0.0001 | 0.0423 | 0.0004, 0.0843 | 0.0481 | 0.1053 | 0.0516, 0.1591 | 0.0001 | Size | 0.0005 | 0.0007 | 0.4850 |
| Male | Score × Age | 0.0144 | 0.0074, 0.0215 | 0.0001 | 0.0083 | 0.0023, 0.0144 | 0.0070 | 0.0122 | 0.0034, 0.0210 | 0.0068 | Tempo | −0.0072 | 0.0026 | 0.0053 |
| Male | Score × Age² | −0.0006 | −0.0012, 0.0001 | 0.1043 | −0.00001 | −0.0005, 0.0004 | 0.9586 | −0.0003 | −0.0120, 0.0114 | 0.9573 | Velocity | 0.0009 | 0.0016 | 0.5820 |
| Male | Score × Age³ | −0.0001 | −0.0002, 0.000002 | 0.0550 | −0.0001 | −0.0001, 0.00003 | 0.1940 | 0.0007 | −0.0052, 0.0065 | 0.8270 | | | | |




Discussion

The current study has shown that of the four statistical methods

evaluated, the semi-parametric linear mixed model (SPLMM)

method was the most efficient for modelling childhood growth to

detect modest genetic effects in the longitudinal pregnancy cohort

study investigated. In addition, we have shown that there are

potentially different genetic pathways leading to increased growth

rate in males and females and that the obesity-risk-allele score

increases both average BMI and rate of growth throughout

childhood.

There are several different statistical methods that can be used

to model childhood growth. We selected four methods that would

allow for adjustment of potential confounders, appropriately

account for the correlation between the repeated measures, allow

for incomplete data, and were computationally feasible in the

context of candidate gene studies and GWAS. The evidence

suggested that the SPLMM method does a better job at

accounting for the variation in BMI growth than the LMM as it

had a smaller residual standard deviation. The SPLMM and

NLMM methods produce similar differences between observed

and fitted values. The LMM and STLMM methods have a larger range, which indicates that the prediction of BMI for each individual over time is worse using these two methods, introducing bias whereby they overestimate low BMI values and underestimate high BMI values. As seen in the residual plots, there are a small number of outliers in this dataset, which are highly influential for both the LMM and STLMM and will affect their ability for accurate prediction. Furthermore, the estimates of skewness from

the STLMM model were relatively large (intercept = 4.5791

[SE = 1.0957] and slope = 2.2336 [SE = 0.6269] for females and

intercept = 2.8590 [SE = 0.5943] and slope = 1.6628

[SE = 0.4155] for males), which could be influenced by outliers

and result in inaccurate predictions. Although residual plots

indicate the STLMM method has the best fit to the data, it does

not produce the most accurate predictions. Based on model fit, all

four methods are adequate in modelling childhood growth curves;

however the SPLMM produces the most accurate fitted values and

can account for outliers.

Of the 17 genetic variants associated with adult BMI and obesity risk that we investigated, the SPLMM method was able to detect a higher proportion of associations with childhood growth in both males and females than the other methods. The NLMM method performed poorly in both males (five significant tests of 51) and females (two significant tests of 51), consistent with it being more conservative than the other three methods. The STLMM method detected a number of genetic effects; however, it was a more computationally intensive method, which would prove difficult in larger-scale genetic studies such as genome-wide association studies. Moreover, it is not as flexible as the other methods in terms of extensions to evaluate gene-environment or gene-gene interactions. The current study provides evidence that the SPLMM method is the most effective method to detect genetic associations and allows the flexibility for extensions into large-scale and more complex genetic analyses.

Figure 3. Population average curves from the SPLMM method in females and males. Predicted population average BMI trajectories from 1–18 years for individuals with 15 (lower quartile), 17 (median), and 18 (upper quartile) risk alleles in the allele score. doi:10.1371/journal.pone.0053897.g003

Single genetic loci typically have small effects on complex

diseases or explain only a small proportion of the variability in a

quantitative trait; therefore, major increases in disease risk are

expected from simultaneous exposure to multiple genetic risk

variants. A post hoc power calculation using 1,000 non-parametric

bootstrap simulations based on the Raine data indicated that this

study had 97% power to detect the FTO locus rs1121980 with

MAF = 0.41, which has one of the larger effect sizes on BMI, but

still had 83% power to detect a more realistic smaller effect size

like the BDNF SNP rs1488830 association in females with

MAF = 0.21. In contrast, the power to detect the allele score,

combining all risk alleles, was 95% in both males and females

separately. The current study is the first to investigate, separately

in males and females, an association between 17 published obesity-

risk loci as an allele score and BMI trajectory throughout

childhood and adolescence. Den Hoed et al. [32] used a similar approach with a 17-loci allele score but focused on two cross-sectional association analyses in pre-/early pubertal children and

adolescents. By utilizing a longitudinal design, the current study

reduced the number of genetic association tests conducted from

eight in a cross-sectional setting to one per gender, reducing the

necessity of adjusting for multiple testing and potentially missing

important genetic loci. A second study by Elks et al [34] evaluated

the association between adult obesity risk genes and growth

throughout childhood using a smaller subset of obesity susceptibility loci and with analyses only up to age 11 years. Both studies

conducted analysis adjusting for gender; however, this does not

allow each gender to have different growth trajectories or the

investigation of different timing of the genetic effects. We found

substantial differences between males and females in the timing of

the adiposity rebound and plateauing towards adulthood. Additionally, we found that the genetic effects differed in timing and magnitude between genders. Had males and females been combined into one analysis, these genetic differences may have been averaged out and the biology underlying the differences may have remained undetected.

A recent longitudinal study investigating the life-course effects of variants in the FTO gene and near the MC4R gene demonstrated that the effects strengthen throughout childhood and peak at age 20 before weakening during adulthood [33]. We detected a similar pattern with the obesity-risk allele score throughout childhood, where the effect begins around four years in males and seven years of age in females and increases in size each year. One limitation of the current study is that the cohort currently only has data available up to 18 years. It will be of interest to follow the cohort in order to investigate how the combined effect of these SNPs changes as the cohort progresses into adulthood. Further, it would be valuable to confirm that the SPLMM method is the most appropriate statistical method in other cohorts investigating the genetic determinants of childhood growth and the patterns of association across the life course.

Figure 4. Associations between the risk-allele score and BMI at each follow-up in females and males. Regression coefficients (95% CI) presented on the ln(BMI) scale from the Semi-Parametric Linear Mixed Model (SPLMM) longitudinal model, derived at each of the average ages of follow-up. For example, a male with 17 obesity-risk alleles is likely to have an ln(BMI) 0.005 units higher at age 6 than a male with 16 risk alleles, and by age 14 this difference will have increased to 0.010 units. doi:10.1371/journal.pone.0053897.g004
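The example in the Figure 4 caption can be translated from the ln(BMI) scale to a relative difference in BMI. Because a difference of $\delta$ on the log scale corresponds to a ratio of $e^{\delta}$ on the original scale, the two values quoted there give

$$
\frac{\mathrm{BMI}_{17\ \mathrm{alleles}}}{\mathrm{BMI}_{16\ \mathrm{alleles}}} = e^{0.005} \approx 1.0050 \ \text{at age 6}, \qquad e^{0.010} \approx 1.0101 \ \text{at age 14},
$$

that is, roughly a 0.5% and a 1.0% higher BMI per additional risk allele. This is a worked reading of the caption's numbers, not an additional result.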

Further studies are now required to assess the validity of these

findings and also extend them to perhaps focus on interactions

between genes and the environment. Interactions, both gene-gene

and gene-environment, are an important area of research that is

critical for understanding the mechanisms underlying obesity. We performed a small simulation study using re-sampling techniques, drawing 1,000 non-parametric bootstrap data sets with replacement from the Raine data and calculating the power to detect a gene-gene interaction. Two SNP combinations were investigated

to gather an understanding of the range of power in our study;

these included the two most commonly reported BMI associated

loci, FTO rs1121980 (MAF = 0.41) by MC4R rs17782313

(MAF = 0.23) as well as two loci with large minor allele frequency,

FTO rs1121980 by NEGR1 rs2815752 (MAF = 0.38). Based on these simulations, our study had 58.0% power to detect an interaction between two SNPs with larger minor allele frequencies (FTO*NEGR1) and effect sizes (FTO 0.019 kg/m²; NEGR1 0.011 kg/m²), assuming a multiplicative model for the interaction. However, the power decreases rapidly, to 4.6%, with lower minor allele frequency (FTO*MC4R) and smaller effect sizes (FTO 0.0044 kg/m²; MC4R 0.0020 kg/m²). We therefore believe that our study was not appropriately designed to detect gene-gene or gene-environment interactions, and think instead that meta-analyses of multiple cohorts might be a better way to tackle this problem.
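The bootstrap approach to power can be sketched as follows. This is a deliberately simplified stand-in, not the study's procedure: it tests a single genotype slope on a cross-sectional outcome with a normal approximation instead of an interaction term in a mixed model, and the data are simulated rather than drawn from the Raine cohort. All names are hypothetical.

```python
import math
import random


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return 1.0  # no genotype variation in this resample
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return 0.0 if slope != 0 else 1.0
    return math.erfc(abs(slope / se) / math.sqrt(2))


def bootstrap_power(x, y, n_boot=1000, alpha=0.05, seed=1):
    """Share of data sets resampled with replacement in which p < alpha."""
    rng = random.Random(seed)
    n = len(x)
    hits = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        p = slope_p_value([x[i] for i in idx], [y[i] for i in idx])
        hits += p < alpha
    return hits / n_boot


# Simulated (not Raine) data: genotype counts and an outcome with a built-in effect.
rng = random.Random(0)
genotype = [rng.choice([0, 1, 2]) for _ in range(200)]
outcome = [0.5 * g + rng.gauss(0, 1) for g in genotype]
power = bootstrap_power(genotype, outcome, n_boot=200)
```

Resampling individuals with replacement preserves the observed joint distribution of genotype and outcome, so the estimated power reflects the effect sizes and allele frequencies actually present in the sample.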

In conclusion, we have shown that although all four statistical

methods investigated for modelling childhood growth were

appropriate to model growth curves in childhood, the SPLMM

method was the most efficient in these data in terms of predicted

values and detection of genetic effects. Further, we have shown

that there is some evidence that genetic variations in established

adult obesity-associated genes are associated with childhood

growth; however these effects differ by gender and timing of

effect. This study provides further evidence of genetic effects that

may identify individuals early in life that are more likely to rapidly

increase their BMI through childhood, which provides some

insight into the biology of childhood growth.

Supporting Information

Table S1 Details of the 17 SNPs used in genetic association analyses.

(XLSX)

Table S2 Results of genetic association analysis in females for all 17 SNPs in each of the four statistical methods.

(XLSX)

Table S3 Results of genetic association analysis in males for all 17 SNPs in each of the four statistical methods.

(XLSX)

Acknowledgments

The authors are grateful to the Raine Study participants, their families, and

to the Raine Study research staff for cohort coordination and data

collection. The authors gratefully acknowledge the assistance of the

Western Australian DNA Bank (National Health and Medical Research

Council of Australia National Enabling Facility).

Author Contributions

Conceived and designed the experiments: NMW YYW LB. Analyzed the

data: NMW. Wrote the paper: NMW YYW CEP JAM LJB LJP SJL LB.

References

1. World Health Organization (2006) Obesity and Overweight Fact Sheet.
2. Griffiths LJ, Parsons TJ, Hill AJ (2010) Self-esteem and quality of life in obese children and adolescents: a systematic review. Int J Pediatr Obes 5: 282–304.
3. Tsiros MD, Olds T, Buckley JD, Grimshaw P, Brennan L, et al. (2009) Health-related quality of life in obese children and adolescents. Int J Obes (Lond) 33: 387–400.
4. Lawlor DA, Mamun AA, O’Callaghan MJ, Bor W, Williams GM, et al. (2005) Is being overweight associated with behavioural problems in childhood and adolescence? Findings from the Mater-University study of pregnancy and its outcomes. Arch Dis Child 90: 692–697.
5. Sawyer MG, Miller-Lewis L, Guy S, Wake M, Canterford L, et al. (2006) Is there a relationship between overweight and obesity and mental health problems in 4- to 5-year-old Australian children? Ambul Pediatr 6: 306–311.
6. Srinivasan SR, Myers L, Berenson GS (2006) Changes in metabolic syndrome variables since childhood in prehypertensive and hypertensive subjects: the Bogalusa Heart Study. Hypertension 48: 33–39.
7. Bradford NF (2009) Overweight and obesity in children and adolescents. Prim Care 36: 319–339.
8. Kindblom JM, Lorentzon M, Hellqvist A, Lonn L, Brandberg J, et al. (2009) BMI changes during childhood and adolescence as predictors of amount of adult subcutaneous and visceral adipose tissue in men: the GOOD Study. Diabetes 58: 867–874.
9. Serdula MK, Ivery D, Coates RJ, Freedman DS, Williamson DF, et al. (1993) Do obese children become obese adults? A review of the literature. Prev Med 22: 167–177.
10. Dietz WH (1994) Critical periods in childhood for the development of obesity. Am J Clin Nutr 59: 955–959.
11. Maes HH, Neale MC, Eaves LJ (1997) Genetic and environmental factors in relative body weight and human adiposity. Behav Genet 27: 325–351.
12. Haworth CM, Carnell S, Meaburn EL, Davis OS, Plomin R, et al. (2008) Increasing heritability of BMI and stronger associations with the FTO gene over childhood. Obesity (Silver Spring) 16: 2663–2668.
13. Wardle J, Carnell S, Haworth CM, Plomin R (2008) Evidence for a strong genetic influence on childhood adiposity despite the force of the obesogenic environment. Am J Clin Nutr 87: 398–404.
14. Parsons TJ, Power C, Logan S, Summerbell CD (1999) Childhood predictors of adult obesity: a systematic review. Int J Obes Relat Metab Disord 23 Suppl 8: S1–107.
15. Jiao H, Arner P, Hoffstedt J, Brodin D, Dubern B, et al. (2011) Genome wide association study identifies KCNMA1 contributing to human obesity. BMC Med Genomics 4: 51.
16. Wang K, Li WD, Zhang CK, Wang Z, Glessner JT, et al. (2011) A genome-wide association study on obesity and obesity-related traits. PLoS One 6: e18939.
17. Meyre D, Delplanque J, Chevre JC, Lecoeur C, Lobbens S, et al. (2009) Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in European populations. Nat Genet 41: 157–159.
18. Paternoster L, Evans DM, Aagaard Nohr E, Holst C, Gaborieau V, et al. (2011) Genome-Wide Population-Based Association Study of Extremely Overweight Young Adults - The GOYA Study. PLoS One 6: e24303.
19. Cotsapas C, Speliotes EK, Hatoum IJ, Greenawalt DM, Dobrin R, et al. (2009) Common body mass index-associated variants confer risk of extreme obesity. Hum Mol Genet 18: 3502–3507.
20. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42: 937–948.
21. Liu JZ, Medland SE, Wright MJ, Henders AK, Heath AC, et al. (2010) Genome-wide association study of height and body mass index in Australian twin families. Twin Res Hum Genet 13: 179–193.
22. Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, et al. (2009) Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet 41: 18–24.
23. Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, et al. (2009) Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41: 25–34.
24. Loos RJ, Lindgren CM, Li S, Wheeler E, Zhao JH, et al. (2008) Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet 40: 768–775.
25. Fox CS, Heard-Costa N, Cupples LA, Dupuis J, Vasan RS, et al. (2007) Genome-wide association to body mass index and waist circumference: the Framingham Heart Study 100K project. BMC Med Genet 8 Suppl 1: S18.



26. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889–894.
27. Dina C, Meyre D, Gallina S, Durand E, Korner A, et al. (2007) Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet 39: 724–726.
28. Scuteri A, Sanna S, Chen WM, Uda M, Albai G, et al. (2007) Genome-wide association scan shows genetic variants in the FTO gene are associated with obesity-related traits. PLoS Genet 3: e115.
29. Chambers JC, Elliott P, Zabaneh D, Zhang W, Li Y, et al. (2008) Common genetic variation near MC4R is associated with waist circumference and insulin resistance. Nat Genet 40: 716–718.
30. Hinney A, Hebebrand J (2009) Three at one swoop! Obes Facts 2: 3–8.
31. Zhao J, Bradfield JP, Li M, Wang K, Zhang H, et al. (2009) The role of obesity-associated loci identified in genome-wide association studies in the determination of pediatric BMI. Obesity (Silver Spring) 17: 2254–2257.
32. den Hoed M, Ekelund U, Brage S, Grontved A, Zhao JH, et al. (2010) Genetic susceptibility to obesity and related traits in childhood and adolescence: influence of loci identified by genome-wide association studies. Diabetes 59: 2980–2988.
33. Hardy R, Wills AK, Wong A, Elks CE, Wareham NJ, et al. (2010) Life course variations in the associations between FTO and MC4R gene variants and body size. Hum Mol Genet 19: 545–552.
34. Elks CE, Loos RJ, Sharp SJ, Langenberg C, Ring SM, et al. (2010) Genetic markers of adult obesity risk are associated with greater early infancy weight gain and growth. PLoS Med 7: e1000284.
35. Heard-Costa NL, Zillikens MC, Monda KL, Johansson A, Harris TB, et al. (2009) NRXN3 is a novel locus for waist circumference: a genome-wide association study from the CHARGE Consortium. PLoS Genet 5: e1000539.
36. Lindgren CM, Heid IM, Randall JC, Lamina C, Steinthorsdottir V, et al. (2009) Genome-wide association scan meta-analysis identifies three Loci influencing adiposity and fat distribution. PLoS Genet 5: e1000508.
37. Borghi E, de Onis M, Garza C, Van den Broeck J, Frongillo EA, et al. (2006) Construction of the World Health Organization child growth standards: selection of methods for attained growth curves. Stat Med 25: 247–265.
38. Preece MA, Baines MJ (1978) A new family of mathematical models describing the human growth curve. Ann Hum Biol 5: 1–24.
39. Gasser T, Kohler W, Muller HG, Kneip A, Largo R, et al. (1984) Velocity and acceleration of height growth using kernel estimation. Ann Hum Biol 11: 397–411.
40. Cole TJ, Donaldson MD, Ben-Shlomo Y (2010) SITAR–a useful instrument for growth curve analysis. Int J Epidemiol 39: 1558–1566.
41. Laird NM, Ware JH (1982) Random-effects models for longitudinal data. Biometrics 38: 963–974.
42. Milani S, Bossi A, Marubini E (1989) Individual growth curves and longitudinal growth charts between 0 and 3 years. Acta Paediatr Scand Suppl 350: 95–104.
43. Goldstein H (1986) Efficient statistical modelling of longitudinal data. Ann Hum Biol 13: 129–141.
44. Rice JA, Silverman BW (1991) Estimating the Mean and Covariance Structure Nonparametrically when the Data are Curves. Journal of the Royal Statistical Society, Series B 53: 233–243.
45. Donnelly CA, Laird NM, Ware JH (1995) Prediction and Creation of Smooth Curves for Temporally Correlated Longitudinal Data. Journal of the American Statistical Association 90: 984–989.
46. Newnham JP, Evans SF, Michael CA, Stanley FJ, Landau LL (1993) Effects of frequent ultrasound during pregnancy: a randomised controlled trial. Lancet 342: 887–891.
47. Williams LA, Evans SF, Newnham JP (1997) Prospective cohort study of factors influencing the relative weights of the placenta and the newborn infant. British Medical Journal 314: 1864–1868.
48. Evans S, Newnham J, MacDonald W, Hall C (1996) Characterisation of the possible effect on birthweight following frequent prenatal ultrasound examinations. Early Human Development 45: 203–214.
49. Huang RC, Burke V, Newnham JP, Stanley FJ, Kendall GE, et al. (2006) Perinatal and childhood origins of cardiovascular disease. Int J Obes Res.
50. Taal HR, St Pourcain B, Thiering E, Das S, Mook-Kanamori DO, et al. (2012) Common variants at 12q15 and 12q24 are associated with infant head circumference. Nat Genet 44: 532–538.
51. Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW, et al. (2006) Predictive testing for complex diseases using multiple genes: fact or fiction? Genet Med 8: 395–400.

52. Janssens AC, Moonesinghe R, Yang Q, Steyerberg EW, van Duijn CM, et al.(2007) The impact of genotype frequencies on the clinical validity of genomic

profiling for predicting common chronic diseases. Genet Med 9: 528–535.

53. Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference forskew-normal independent linear mixed model. Statistica Sinica 20: 303–322.

54. Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skewnormal distribution. Journal of the Royal Statistical Society: Series B (Statistical

Methodology) 61: 579–602.

55. Song PXK, Zhang PQA (2007) Maximum likelihood inference in robust linearmixed-effect models using multivariate t distributions. Statistica Sinica 17: 929–

943.56. Efron B, Tibshirani RJ (1994) An Introduction to the Bootstrap: Taylor &

Francis.57. Cheng J, Edwards LJ, Maldonado-Molina MM, Komro KA, Muller KE (2010)

Real longitudinal data analysis for real people: building a good enough mixed

model. Stat Med 29: 504–520.58. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006)

Principal components analysis corrects for stratification in genome-wideassociation studies. Nat Genet 38: 904–909.

59. Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics.

Journal of Computational and Graphical Statistics 5: 299–314.



Appendix B: Additional Details of the Linear Mixed Model in Chapter Two

Saturated Model:

Females

> test.f <- lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.f, method="ML", random = ~ I(age-8) + I((age-8)^2) | ID,
    na.action=na.omit)
> summary(test.f)
Linear mixed-effects model fit by maximum likelihood
 Data: data.f
        AIC       BIC   logLik
  -9594.429 -9524.204 4808.214
Random effects:
 Formula: ~I(age - 8) + I((age - 8)^2) | ID
 Structure: General positive-definite, Log-Cholesky parametrization
               StdDev      Corr
(Intercept)    0.129605923 (Intr) I(g-8)
I(age - 8)     0.012108440  0.772
I((age - 8)^2) 0.001200233 -0.690 -0.437
Residual       0.050449166
Fixed effects: log(bmi) ~ I(age - 8) + I((age - 8)^2) + I((age - 8)^3)
                    Value   Std.Error   DF  t-value p-value
(Intercept)     2.8173907 0.004965848 3641 567.3534       0
I(age - 8)      0.0344273 0.000619246 3641  55.5956       0
I((age - 8)^2)  0.0029519 0.000060840 3641  48.5194       0
I((age - 8)^3) -0.0003112 0.000008402 3641 -37.0404       0
 Correlation:
               (Intr) I(g-8) I((-8)^2
I(age - 8)      0.524
I((age - 8)^2) -0.610 -0.041
I((age - 8)^3)  0.052 -0.625 -0.347
Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
-5.488724150 -0.487264811  0.004883059  0.456330949  6.729748021
Number of Observations: 4377
Number of Groups: 733

Males

> test.m <- lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.m, method="ML", random = ~ I(age-8) + I((age-8)^2) | ID,
    na.action=na.omit)
> summary(test.m)
Linear mixed-effects model fit by maximum likelihood
 Data: data.m
        AIC       BIC  logLik
  -10091.98 -10021.19 5056.99
Random effects:
 Formula: ~I(age - 8) + I((age - 8)^2) | ID
 Structure: General positive-definite, Log-Cholesky parametrization
               StdDev      Corr
(Intercept)    0.124336118 (Intr) I(g-8)
I(age - 8)     0.012422263  0.758
I((age - 8)^2) 0.001185971 -0.684 -0.367
Residual       0.050843578
Fixed effects: log(bmi) ~ I(age - 8) + I((age - 8)^2) + I((age - 8)^3)
                    Value   Std.Error   DF  t-value p-value
(Intercept)     2.8088458 0.004665463 3833 602.0507       0
I(age - 8)      0.0290900 0.000612958 3833  47.4584       0
I((age - 8)^2)  0.0031807 0.000059063 3833  53.8518       0
I((age - 8)^3) -0.0002872 0.000008322 3833 -34.5110       0
 Correlation:
               (Intr) I(g-8) I((-8)^2
I(age - 8)      0.516
I((age - 8)^2) -0.611 -0.011
I((age - 8)^3)  0.056 -0.618 -0.339
Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-4.54398186 -0.44495072 -0.01160751  0.44978391  4.32696447
Number of Observations: 4609
Number of Groups: 773

Testing the random effects (using method=REML):

> base.mod.3.f = lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.f, method="REML",
    random = list(ID=pdDiag(~ I(age-8) + I((age-8)^2))),
    na.action=na.omit)
> base.mod.2.f = lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.f, method="REML",
    random = list(ID=pdDiag(~ I(age-8))),
    na.action=na.omit)
> base.mod.1.f = lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.f, method="REML",
    random = list(ID=pdDiag(~ 1)),
    na.action=na.omit)
> sapply(list(base.mod.1.f,base.mod.2.f,base.mod.3.f),AIC)
> sapply(list(base.mod.1.f,base.mod.2.f,base.mod.3.f),BIC)
> sapply(list(base.mod.1.f,base.mod.2.f,base.mod.3.f),logLik)

Below is a table providing statistics for the models that were tested (the same code as above was used for males):

Model  Random Effects    ρ Random Effect  ρ error  logLik    BIC        AIC
Female
1      Intercept         Independent      Null     3647.447  -7244.596  -7282.895
2      Intercept, age    Independent      Null     4294.200  -8529.718  -8574.400
3      Int, age, age^2   Independent      Null     4392.848  -8718.630  -8769.695
Male
1      Intercept         Independent      Null     3812.153  -7573.697  -7612.306
2      Intercept, age    Independent      Null     4521.136  -8983.229  -9028.273
3      Int, age, age^2   Independent      Null     4626.609  -9185.739  -9237.218

It appears that the random effect for age^2 is necessary for both males and females.

To calculate the LRT comparing model 3 and model 2, for example:

Females

> anova(base.mod.3.f, base.mod.2.f)
             Model df       AIC       BIC   logLik   Test  L.Ratio p-value
base.mod.3.f     1  8 -8769.695 -8718.630 4392.848
base.mod.2.f     2  7 -8574.400 -8529.718 4294.200 1 vs 2 197.2952  <.0001

Males

> anova(base.mod.3.m, base.mod.2.m)
             Model df       AIC       BIC   logLik   Test  L.Ratio p-value
base.mod.3.m     1  8 -9237.218 -9185.739 4626.609
base.mod.2.m     2  7 -9028.273 -8983.229 4521.136 1 vs 2 210.9452  <.0001
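As a cross-check on the anova() output, the likelihood ratio statistic can be recomputed by hand from the reported log-likelihoods. The Python sketch below is illustrative only: it is not part of the original R analysis, and the function name is hypothetical. It assumes the two models differ by a single parameter, so the statistic is referred to a chi-square distribution with one degree of freedom, as in the tables above.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Return the LRT statistic and its chi-square (1 df) p-value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Log-likelihoods copied from the model table (model 3 vs model 2).
stat_f, p_f = likelihood_ratio_test(4392.848, 4294.200)  # females
stat_m, p_m = likelihood_ratio_test(4626.609, 4521.136)  # males
print(f"females: L.Ratio = {stat_f:.3f}, p = {p_f:.3g}")
print(f"males:   L.Ratio = {stat_m:.3f}, p = {p_m:.3g}")
```

The recomputed statistics agree with the reported L.Ratio values up to rounding of the log-likelihoods.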

Testing the correlation structure of the error terms:

Females

> base.mod.3.f.ar1 = update(base.mod.3.f, correlation=corAR1())
> base.mod.3b.f.cs = update(base.mod.3b.f, correlation=corCompSymm(form=~1|ID))
> anova(base.mod.3.f, base.mod.3.f.ar1)
                 Model df       AIC       BIC   logLik   Test  L.Ratio p-value
base.mod.3.f         1  8 -8769.695 -8718.630 4392.848
base.mod.3.f.ar1     2  9 -9060.715 -9003.267 4539.358 1 vs 2 293.0203  <.0001
> anova(base.mod.3b.f, base.mod.3b.f.cs)
                 Model df       AIC       BIC   logLik   Test      L.Ratio p-value
base.mod.3b.f        1 11 -9531.723 -9461.508 4776.862
base.mod.3b.f.cs     2 12 -9529.723 -9453.125 4776.862 1 vs 2 5.989739e-06   0.998

Males

> base.mod.3.m.ar1 = update(base.mod.3.m, correlation=corAR1())
> base.mod.3.m.cs = update(base.mod.3.m, correlation=corCompSymm(form=~1|ID))
> anova(base.mod.3.m, base.mod.3.m.ar1)
                 Model df       AIC       BIC   logLik   Test  L.Ratio p-value
base.mod.3.m         1  8 -9237.218 -9185.739 4626.609
base.mod.3.m.ar1     2  9 -9556.990 -9499.076 4787.495 1 vs 2 321.7723  <.0001
> anova(base.mod.3.m, base.mod.3.m.cs)
                Model df       AIC       BIC   logLik   Test      L.Ratio p-value
base.mod.3.m        1  8 -9237.218 -9185.739 4626.609
base.mod.3.m.cs     2  9 -9235.218 -9177.304 4626.609 1 vs 2 6.006303e-09  0.9999

It appears that the AR1() correlation structure is needed for both males and females.

Testing the correlation between random effects:

Females

> base.mod.3b.f = lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.f, method="REML", random = ~ I(age-8) + I((age-8)^2) | ID,
    na.action=na.omit)
> anova(base.mod.3.f, base.mod.3b.f)
              Model df       AIC       BIC   logLik   Test  L.Ratio p-value
base.mod.3.f      1  8 -8769.695 -8718.630 4392.848
base.mod.3b.f     2 11 -9531.723 -9461.508 4776.862 1 vs 2 768.0278  <.0001

Males

> base.mod.3b.m = lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),
    data=data.m, method="REML", random = ~ I(age-8) + I((age-8)^2) | ID,
    correlation=corAR1(), na.action=na.omit)
> anova(base.mod.3.m.ar1, base.mod.3b.m)
                 Model df       AIC        BIC   logLik   Test  L.Ratio p-value
base.mod.3.m.ar1     1  9  -9556.99  -9499.076 4787.495
base.mod.3b.m        2 12 -10119.56 -10042.340 5071.779 1 vs 2 568.5681  <.0001

The LRT suggests that a correlation between the random intercepts and slopes is necessary. Thus, the chosen models are base.mod.3b.f.ar1 (females) and base.mod.3b.m (males).

Appendix C: Publication Arising from the Research in Chapter Four

Title:

Robustness of the linear mixed effects model to error distribution assumptions and the consequences for genome-wide association studies.

Authors:

Nicole M Warrington1,2*, Kate Tilling3,4, Laura D Howe3,4, Lavinia Paternoster3,4, Craig E Pennell1, Yan Yan Wu5, Laurent Briollais5

Affiliations:

1 School of Women’s and Infants’ Health, The University of Western Australia, Perth, Western Australia, Australia

2 University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia

3 School of Social and Community Medicine, University of Bristol, Bristol, UK

4 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, UK

5 Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada

*Corresponding Author: Nicole Warrington, University of Queensland Diamantina Institute, Translational Research Institute, 37 Kent St, Woolloongabba, Brisbane, Queensland, Australia; Phone: +61 7 3443 7044; Fax: +61 7 344 36966; Email: [email protected]

Word count of abstract: 244

Word count of body (excluding tables, figures and references): 7,612

Number of tables: 4

Number of figures: 5

Short title: Robustness of the LMM to distribution assumptions in GWAS

Abstract:

Genome-wide association studies have been successful in uncovering novel genetic variants that are associated with disease status or cross-sectional phenotypic traits. Researchers are beginning to investigate how genes play a role in the development of a trait over time. Linear mixed effects models (LMM) are commonly used to model longitudinal data; however, it is unclear if the failure to meet the model's distributional assumptions will affect the conclusions when conducting a genome-wide association study. In an extensive simulation study, we compare coverage probabilities, bias, type 1 error rates and statistical power when the error of the LMM is either heteroscedastic or has a non-Gaussian distribution. We conclude that the model is robust to misspecification if the same function of age is included in the fixed and random effects. However, type 1 error of the genetic effect over time is inflated, regardless of the model misspecification, if the polynomial function for age in the fixed and random effects differs. In situations where the model will not converge with a high order polynomial function in the random effects, a reduced function can be used but a robust standard error needs to be calculated to avoid inflation of the type 1 error. As an illustration, an LMM was applied to longitudinal body mass index (BMI) data over childhood in the ALSPAC cohort; the results emphasised the need for the robust standard error to ensure correct inference of associations of longitudinal BMI with chromosome 16 single nucleotide polymorphisms.

Key words: mixed model; robustness; misspecification; genome-wide association; longitudinal studies; ALSPAC

1. Introduction:

Over recent years, the study of population genetics has progressed from candidate gene and linkage studies over relatively small regions of the genome to whole genome association analyses. These genome-wide association studies (GWAS) are designed to search the entire genome for single nucleotide polymorphisms (SNPs) that are associated with a disease or trait of interest. If SNPs are found to be associated, they are then considered to mark a region of the genome that influences the risk of disease or affects the levels of a trait. In general, very small effects are detected, so large sample sizes are required. This advance in the scale of genetic analyses has transformed the field from hypothesis-driven research to a hypothesis-free approach, which has required additional statistical methods to be developed to balance acceptable levels of power against the chance of inflating the type 1 error. Given the cost of conducting these studies, in terms of both monetary costs for genotyping samples and computational costs for the analysis, it is important that appropriate analyses are conducted from the outset.

To date, most GWAS have focused on case/control studies of particular diseases or cross-sectional measurements of phenotypic traits. These study designs typically use relatively simple statistical techniques, such as chi-square tests or linear (or logistic) regression models, to look at the association between a trait and each of the ~2.5 million SNPs. There are now over 1,500 published studies focusing on 250 traits using analyses of this kind (Hindorff et al. 2010). However, researchers are beginning to focus on more complex analyses to uncover additional genetic loci and reduce the currently unexplained heritability of these traits. One area of extension is to use longitudinal studies, with repeated measures on each individual in the study, to understand how SNPs affect changes over time of a particular phenotype (Kerner, North, and Fallin 2009; Sikorska et al. 2013; Smith et al. 2010). Several well-established statistical methods are commonly used for repeated measures data to take into account the non-independence of measurements within an individual. For continuous traits, the most popular statistical method is the linear mixed effects model (LMM) of Laird and Ware (Laird and Ware 1982). This method can be computationally intensive as the model can account for linear or non-linear trajectories for the outcome of interest over time, correlation between measures at the starting point (intercept) and change over time (slope, or non-linear trajectory) within an individual, and adjustment for both time-independent and time-dependent covariates.

In LMMs, the usual assumptions made about the random effects and error distributions include: the random effects and error terms are normally distributed, the random effects are independent of the error term and the error term has homoscedastic variance (Laird and Ware 1982). In studies utilizing this method to assess the association of a SNP with the trajectory, the fixed effect estimates are often of most interest; the random effects and correlation structure at the individual level are necessary to provide an accurate fit of the model to the data, in addition to providing appropriate test statistics, but are treated as nuisance parameters and are often difficult to interpret. There have been a number of studies investigating whether violations of the assumptions about the random effects and error terms affect the maximum likelihood inference of the fixed effect parameters and their variance estimates; several manuscripts have shown that the fixed effects estimates are robust to non-Gaussian random effects distribution (Zhang and Davidian 2001; Verbeke and Lesaffre 1997) and non-Gaussian or heteroscedastic error distribution (Jacqmin-Gadda et al. 2007), and that the population fixed effects are robust to misspecified covariance structure (Taylor, Cumberland, and Sy 1994), but the individual level predictions are not (Taylor and Law 1998). However, Jacqmin-Gadda et al. (Jacqmin-Gadda et al. 2007) showed that the fixed effects estimates are not robust to error variance that is dependent on a covariate in the model that interacts with time. Liang and Zeger (Liang and Zeger 1986) demonstrated that a robust sandwich estimator (Royall 1986) can correct for biased variance estimates of the fixed effects when the covariance structure is not correctly specified. There has not been any investigation, to our knowledge, into how any of these model misspecifications affect the power and type 1 error in high dimensional studies, for example when running an LMM on a genome-wide scale, and what the value of the robust variance estimator is in this context.

The aim of this study is to assess by simulations whether misspecification of the error term, with either non-Gaussian error distributions or non-constant error variance, in a complex longitudinal model with non-linear trajectories will affect: 1) the coverage probabilities of the 95% confidence interval of the fixed effects parameter estimates; 2) the bias of the fixed effects parameter estimates; 3) the type 1 error of SNP detection in a GWAS; or 4) the statistical power to detect association. We also examined whether our conclusions differ according to minor allele frequency (MAF) for the SNPs or sample size of the investigated cohort.

2. Motivating example:

The World Health Organization defines obesity as “abnormal or excessive fat accumulation that presents a risk to health” (World Health Organization 2012). Obesity is a medical condition which increases an individual’s risk of health problems such as cardiovascular disease, type 2 diabetes and some cancers and therefore reduces life expectancy (Haslam and James 2005). The prevalence of obesity has been increasing in recent decades in developed countries, particularly in children. Body mass index (BMI; calculated as weight (kg) divided by height squared (m²)) is commonly used to define overweight and obesity, with appropriate cut-offs defined for both children (Cole et al. 2000) and adults (WHO 2000). Childhood obesity is one of the strongest predictors of adult obesity (Kindblom et al. 2009; Serdula et al. 1993). Although the growing prevalence of obesity is most likely to be due to increasing energy intake and decreasing energy expenditure, twin and adoption studies have provided evidence that BMI is heritable (Maes, Neale, and Eaves 1997; Haworth et al. 2008; Wardle et al. 2008; Parsons et al. 1999). Recent GWAS have begun to uncover some plausible genetic loci contributing to higher BMI (Speliotes et al. 2010; Liu et al. 2010; Thorleifsson et al. 2009; Willer et al. 2009; Loos et al. 2008; Fox et al. 2007; Frayling et al. 2007) and obesity in children (Bradfield et al. 2012), with 34 new loci identified. However, none of the studies to date provide information regarding the genetic determinants of the rate of BMI growth over childhood, which leads to obesity.

The Avon Longitudinal Study of Parents and Children (ALSPAC) (Boyd et al. 2013; Fraser et al. 2013) is a birth cohort study; 14,541 pregnant women in the former county of Avon, UK, were recruited into the study if they had an expected delivery date between 1st April 1991 and 31st December 1992. From birth to five years, length and weight measurements were extracted from health visitor records, with up to four measurements taken on average at six weeks, 10, 21, and 48 months of age. For a random 10% of the cohort, length and height measurements were taken in eight research clinic visits held between the ages of four months and five years of age. From age seven years, all children were invited to annual research clinics until age 11 and biannual research clinics thereafter. Details of the measuring equipment used in the clinics are described elsewhere (Howe et al. 2010). In addition, parent-reported child height and weight were also available from questionnaires (27% of measurements). Whilst the measurements from routine health care have previously been shown to be accurate in this cohort (Howe, Tilling, and Lawlor 2009), parental report of children’s height tends to be overestimated while weight tends to be underestimated (Dubois and Girad 2007). Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/).

A subset of 7,916 participants was used for analysis based on the following inclusion criteria: at least one parent of European descent, singleton birth, unrelated to anyone in the sample, genome-wide genotype data, and at least one measure of BMI throughout childhood. Participants have a median of 9 BMI measurements between 1 and 15 years of age (interquartile range 5-12, range 1-29 measurements). Children tend to have rapidly increasing BMI from birth to approximately 9 months of age, when they reach their adiposity peak; BMI then decreases until the adiposity rebound at around 5-7 years of age and then steadily increases again until after puberty, when it tends to plateau through adulthood. There is a large amount of variability between individuals for both intercept and slope.

The primary research question is to identify SNPs that are associated with average BMI and change in BMI over childhood and adolescence in the ALSPAC data. An LMM was used to appropriately model the longitudinal trajectory over childhood, to account for the large correlation between each of the random effects parameters, to adjust for additional covariates such as the source of the height/weight measurements (clinic or questionnaire) and to allow data to be missing at random across childhood. The general form of the model is as follows:

$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i \tag{1}$$

where $Y_i$ is the response vector for the $i$th individual, $\beta$ is the vector of fixed effects, $b_i \sim N(0, \Sigma)$ is the vector of subject-specific random effects, $X_i$ and $Z_i$ are the fixed effect and random effect regressor matrices respectively, and $\varepsilon_i \sim N(0, \sigma^2)$ is the within-subject error vector. When applying this model to the ALSPAC data, the best model fit included a cubic polynomial of mean centred age (centred at age 8 years) in the fixed effects, a quadratic polynomial of mean centred age in the random effects and a continuous autoregressive correlation structure of order one for the covariance of the within-subject errors. Hence, the final model for both females and males was:

$$\mathrm{BMI}_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \beta_4 \mathrm{MS}_{ij} + \beta_5 \mathrm{SNP}_i + \beta_6 t_{ij} \mathrm{SNP}_i + \beta_7 t_{ij}^2 \mathrm{SNP}_i + \beta_8 t_{ij}^3 \mathrm{SNP}_i + b_{i0} + b_{i1} t_{ij} + b_{i2} t_{ij}^2 + \varepsilon_{ij} \tag{2}$$

where MS is the measurement source (i.e. clinic visit or questionnaire) of individual i at time j and tij is the age (centred at 8 years). Therefore β0 is the population intercept (i.e. mean BMI at age 8), β1-β3 are the fixed effects for the cubic function of age, β4 is the effect of the measurement source, β5 is the change in the mean BMI at 8 years of age for each additional copy of the minor allele, β6 is the SNP by linear age effect, and β7 and β8 are the SNP by quadratic and cubic age effects respectively.
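To make the interpretation of the coefficients in equation (2) concrete, the sketch below evaluates only the fixed-effects part of the model. It is a Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, the 0/1 coding of the measurement source is assumed, and β0 to β4 are placeholder values chosen for illustration. Only β5 to β8 are taken from the alternative-hypothesis effect sizes used in the simulation study.

```python
def mean_bmi(age, snp, ms, beta):
    """Population-mean BMI from the fixed effects of equation (2).

    age  : age in years (centred at 8 inside the function)
    snp  : number of minor alleles (0, 1 or 2)
    ms   : measurement source indicator (assumed coding: 1 = questionnaire)
    beta : sequence (beta0, ..., beta8)
    """
    t = age - 8.0
    b0, b1, b2, b3, b4, b5, b6, b7, b8 = beta
    baseline = b0 + b1 * t + b2 * t**2 + b3 * t**3 + b4 * ms
    genetic = snp * (b5 + b6 * t + b7 * t**2 + b8 * t**3)
    return baseline + genetic


def snp_effect(age, beta):
    """Difference in mean BMI per additional minor allele at a given age."""
    return mean_bmi(age, 1, 0, beta) - mean_bmi(age, 0, 0, beta)


# beta0..beta4 are hypothetical placeholders; beta5..beta8 are the
# alternative-hypothesis values quoted in the simulation study.
HYPOTHETICAL_BETA = (17.0, 0.4, 0.05, -0.002, 0.1,
                     0.6, 0.15, -0.000752, -0.000380)

for age in (2, 8, 14):
    print(age, round(snp_effect(age, HYPOTHETICAL_BETA), 4))
```

At age 8 the per-allele difference equals β5, because all terms involving the centred age vanish; away from age 8 the difference also picks up the β6 to β8 terms.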

Due to the nature of the data collection, which is often complex in large cohort-based studies, we found that the model assumptions were not met for the following reasons:

1. The questionnaire measures have previously been shown to have greater variability than clinic-measured height and weight (Dubois and Girad 2007); therefore we had variability that was dependent on a covariate in the model.

2. There were only questionnaire measures available around the nadir of the trajectory (also known as the adiposity rebound), which meant we had greater variability around the rebound.

3. The variability within individuals changes over time, particularly with increased variability around puberty and into adolescence.

4. BMI also has a non-Gaussian error distribution. This is in part due to the increasing variability between individuals over time, with some individuals having rapidly increasing BMI while others remain relatively consistent.

In the following, we investigate the robustness of the maximum likelihood inference for the fixed effects, the type 1 error and the power for detecting an association with the SNP when the error distribution is misspecified due to the above intricacies of the data.

3. Simulation study:

We carried out extensive simulations to investigate the effects on the LMM when the error term (also called the level-1 residual, or the occasion-level residual) in the model was non-Gaussian or had a non-constant variance. In each of the simulation scenarios, we set the non-genetic fixed effects parameters (β0-β4 from model (2)) and the variance-covariance matrix similar to those coming from the fitted model for BMI adjusting for the FTO rs1121980 SNP in the ALSPAC study; these can be found in Table 1. The measurement source, which is a fixed effect in the LMM and used in the heteroscedastic error simulations, was a randomly generated binary variable for each individual at each time point, with a distribution across ages similar to that in the ALSPAC cohort (percent questionnaire measurements per follow-up year: year 1 = 40%, year 2 = 20%, year 3 = 40%, year 4 = 10%, year 5 = 60%, year 6 = 99%, year 7 = 10%, year 8 = 0%, year 9 = 0%, year 10 = 10%, year 11 = 0%, year 12 = 0%, year 13 = 30%, year 14 = 0%, year 15 = 0%).
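One way the measurement-source indicator described above could be generated is sketched below in Python. This is a hedged illustration rather than the simulation code used in the study: the function name and the 0/1 coding (1 = questionnaire, 0 = clinic) are assumptions, and only the yearly percentages come from the text.

```python
import random

# Percent questionnaire measurements per follow-up year, as listed in the text.
PCT_QUESTIONNAIRE = {1: 40, 2: 20, 3: 40, 4: 10, 5: 60, 6: 99, 7: 10, 8: 0,
                     9: 0, 10: 10, 11: 0, 12: 0, 13: 30, 14: 0, 15: 0}


def simulate_measurement_source(n_individuals, rng):
    """Return one dict per individual mapping year -> 1 (questionnaire) or 0 (clinic)."""
    return [
        {year: int(rng.random() < pct / 100.0)
         for year, pct in PCT_QUESTIONNAIRE.items()}
        for _ in range(n_individuals)
    ]


rng = random.Random(2014)  # fixed seed so the sketch is reproducible
sources = simulate_measurement_source(2000, rng)
share_year5 = sum(person[5] for person in sources) / len(sources)
print(f"simulated questionnaire share at year 5: {share_year5:.3f}")
```

Each indicator is an independent Bernoulli draw, so the simulated share in any year fluctuates around the target percentage.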

We also investigated the fixed effect estimation for various sample sizes, minor allele frequencies of the SNP and the SNP effect sizes:

1. Sample size: two levels; N=1,000 and N=3,000

2. Minor allele frequency: four levels; 0.1, 0.2, 0.3 and 0.4

3. Effect sizes: two combinations; β5 = 0.6, β6 = 0.15, β7 = -0.000752 and β8 = -0.000380 (alternative hypothesis) or β5 = β6 = β7 = β8 = 0 (null hypothesis). The alternative hypothesis effect sizes for β5 and β6 were chosen to have 80% power to detect with the larger sample size; the effect sizes for β7 and β8 were similar to those coming from the fitted model for BMI adjusting for the FTO rs1121980 SNP in the ALSPAC study.
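The three data characteristics listed above define a small grid of scenarios, and genotypes at a given minor allele frequency can be drawn as counts of minor alleles. The Python sketch below illustrates both. It assumes Hardy-Weinberg equilibrium (two independent allele draws per individual), which the text does not state explicitly, and all function names are hypothetical.

```python
import random

SAMPLE_SIZES = (1000, 3000)
MINOR_ALLELE_FREQS = (0.1, 0.2, 0.3, 0.4)
HYPOTHESES = ("null", "alternative")


def scenario_grid():
    """All combinations of sample size, MAF and hypothesis."""
    return [(n, maf, hyp)
            for n in SAMPLE_SIZES
            for maf in MINOR_ALLELE_FREQS
            for hyp in HYPOTHESES]


def simulate_genotypes(n, maf, rng):
    """Draw n genotypes coded 0, 1 or 2 minor alleles.

    Assumes Hardy-Weinberg equilibrium: each of the two alleles is an
    independent Bernoulli(maf) draw.
    """
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]


rng = random.Random(42)
genotypes = simulate_genotypes(3000, 0.3, rng)
observed_maf = sum(genotypes) / (2 * len(genotypes))
print(len(scenario_grid()), "scenarios; observed MAF:", round(observed_maf, 3))
```

Crossing these 16 combinations with the five sampling designs described in the next section gives the fully factorial layout.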

3.1 Sampling Designs:

As many longitudinal cohorts have different sampling designs, some with variable amounts of missing time points and missing observations at each time point, we investigated five different sampling designs:

1. Sparse complete: ni = 8 measures per person with few measures around the adiposity rebound; times of measures are 1, 2, 3, 5, 8, 10, 13, 15

2. Intense complete: ni = 14 measures per person with multiple measures around the adiposity rebound; times of measures are 1, 2, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 9, 11, 13, 15

3. Equal unbalanced: ni = 1 to 15 measures per person between 1 and 15 years with a mean of 9 measures (proportion of missingness = 0.4 across whole age range)

4. Unbalanced with more samples around the adiposity rebound: ni = 1 to 15 measures per person between 1 and 15 years with a mean of 9 measures; proportion of missingness around adiposity rebound of 0.2 and 0.45 outside the 5 to 7 year age range (average proportion of missingness over whole age range is 0.4)

5. Unbalanced with fewer samples around the adiposity rebound: ni = 1 to 15 measures per person between 1 and 15 years with a mean of 9 measures; proportion of missingness around adiposity rebound of 0.6 and 0.35 outside the 5 to 7 year age range (average proportion of missingness over whole age range is 0.4)

The first two designs with complete data at each follow-up assume that every individual had the exact same age at follow-up (i.e. came into clinic on their birthday), whereas the other three designs are more representative of longitudinal studies where the actual age of measurement varies between individuals by up to a year (i.e. came into clinic either 6 months before or after a birthday). We assume data are missing completely at random, that is, the probability that an observation is missing for a given individual is independent of all other observed data. The proportion of missingness simulated across the whole range (i.e. 0.4) was equivalent to the amount of missing data observed in the ALSPAC cohort under the assumption that all individuals could have been measured yearly. We used a fully factorial design for the simulations with the 3 data characteristics and the 5 sampling designs.
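The three unbalanced designs can be sketched as follows. This Python illustration is not the authors' code and rests on several assumptions: nominal visits are yearly from 1 to 15 years, the rebound window is taken to be the nominal years 5, 6 and 7, the age jitter is uniform within six months either side of the birthday, and individuals left with no measurements are redrawn. It also checks the stated average missingness of 0.4.

```python
import random

YEARS = range(1, 16)  # nominal yearly follow-ups from 1 to 15 years

# (missingness inside the 5-7 year window, missingness outside it)
DESIGNS = {
    "equal_unbalanced": (0.4, 0.4),
    "more_at_rebound": (0.2, 0.45),
    "fewer_at_rebound": (0.6, 0.35),
}


def p_missing(year, p_rebound, p_outside):
    """Missingness probability at a nominal year (5-7 treated as the rebound window)."""
    return p_rebound if 5 <= year <= 7 else p_outside


def average_missingness(p_rebound, p_outside):
    """Average missingness over the 15 nominal years."""
    return sum(p_missing(y, p_rebound, p_outside) for y in YEARS) / len(YEARS)


def observed_ages(rng, p_rebound, p_outside):
    """Ages at which one individual is measured, missing completely at random."""
    while True:
        ages = [year + rng.uniform(-0.5, 0.5)  # assumed jitter of +/- 6 months
                for year in YEARS
                if rng.random() >= p_missing(year, p_rebound, p_outside)]
        if ages:  # redraw if every visit was missed
            return ages


for name, (p_reb, p_out) in DESIGNS.items():
    print(name, round(average_missingness(p_reb, p_out), 3))
```

With three of the fifteen years inside the window, (3 × 0.2 + 12 × 0.45) / 15 and (3 × 0.6 + 12 × 0.35) / 15 both equal 0.4, which matches the averages quoted for designs 4 and 5.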

3.2 Models for data generation:

Standard linear mixed model:

Data were generated with Gaussian random effects and error distribution to validate the estimation method.

Non-Gaussian error:

Three error structures were investigated:

1. t-distribution: t with 5 degrees of freedom

2. skew-normal distribution: $SN(1.063^2, 40)$

3. Asymmetric mixture of two Gaussian distributions: $0.3\,N(-0.67, 1^2) + 0.7\,N(0.5, 0.3^2)$

Heteroscedastic error:

Three cases were studied:

1. Variance dependent on a covariate: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{X_{ij}}$, where $\sigma_e^2 = 1.131$, $a = 1.500$ and $X_{ij} = 1$ if the measure was from a questionnaire and 0 if the measure was from a follow-up clinic

2. Variance greater at the adiposity rebound: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{X_{ij}}$, where $\sigma_e^2 = 1.131$, $a = 1.500$ and $X_{ij} = 1$ if the measure was between 5 and 7 years and 0 if not

3. Variance increasing over time: $\mathrm{Var}(e_{ij}) = \sigma_e^2\, a^{t_{ij}}$, where $\sigma_e^2 = 1.131$ and $a = 1.150$
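The three heteroscedastic cases can be sketched in Python as below. This is a tentative illustration only. It assumes the variance function has the multiplicative form σ²e × a^X (with a raised to the power of the indicator or of time), and it assumes that tij in the third case is age centred at 8 years as in equation (2). Function names are hypothetical.

```python
import math
import random

SIGMA2_E = 1.131  # baseline error variance given in the text


def error_variance(exponent, a, sigma2_e=SIGMA2_E):
    """Var(e_ij) = sigma2_e * a ** exponent (assumed reading of the variance function)."""
    return sigma2_e * a ** exponent


def variance_by_source(is_questionnaire, a=1.5):
    """Case 1: larger variance for questionnaire measures."""
    return error_variance(1 if is_questionnaire else 0, a)


def variance_at_rebound(age, a=1.5):
    """Case 2: larger variance between 5 and 7 years of age."""
    return error_variance(1 if 5 <= age <= 7 else 0, a)


def variance_over_time(age, a=1.15):
    """Case 3: variance increasing with time; t_ij assumed to be age centred at 8."""
    return error_variance(age - 8.0, a)


def draw_error(variance, rng):
    """Draw one Gaussian error with the requested variance."""
    return rng.gauss(0.0, math.sqrt(variance))


rng = random.Random(0)
for age in (1, 6, 8, 15):
    v = variance_over_time(age)
    print(age, round(v, 3), round(draw_error(v, rng), 3))
```

Under this reading, a questionnaire measure has error variance 1.131 × 1.5, while a clinic measure keeps the baseline value of 1.131.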

284

3.3 Data generation:

We simulated 1,000 datasets under the alternative hypothesis (β5 = 0.6 and β6 = 0.15) to assess coverage probabilities, bias and power, and 5,000 datasets under the null hypothesis (β5 = 0 and β6 = 0) to assess type 1 error at α = 0.05. Each SNP (coded as 0, 1, 2) was incorporated into the model assuming an additive genetic model, whereby each additional minor allele increases BMI by an equal amount. We were primarily interested in estimating the SNP main effect, β5, which represents the increase in mean BMI at 8 years of age for each additional copy of the minor allele, and the SNP by age effect, β6, which represents the effect on the mean linear increase of BMI (slope) for each additional minor allele. We calculated a robust standard error and corresponding p-value for each fixed effect parameter; the following formula was used:

$$\left(X' V^{-1} X\right)^{-1} \left(\sum_{i=1}^{S} X_i' V_i^{-1} \varepsilon_i \varepsilon_i' V_i^{-1} X_i\right) \left(X' V^{-1} X\right)^{-1}$$

Where:

X is the fixed effect regressor matrix from equation (1)

V is the variance of Y from equation (1)

$\varepsilon_i = y_i - X_i \hat{\beta}$

S is the number of subjects and i is the ith subject
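The sandwich form above can be sketched in a few lines of plain Python. This is an illustration only and not the R code used in the study: it assumes the per-subject design matrices X_i and working covariance matrices V_i are already available (in practice V_i would come from the fitted mixed model), and the helper names are hypothetical.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [row[n:] for row in m]


def robust_covariance(X_list, V_list, y_list):
    """Return (beta_hat, sandwich covariance) from per-subject X_i, V_i, y_i."""
    p = len(X_list[0][0])
    bread = [[0.0] * p for _ in range(p)]
    xvy = [[0.0] for _ in range(p)]
    parts = []
    for X, V, y in zip(X_list, V_list, y_list):
        xt_vinv = matmul(transpose(X), inverse(V))      # X_i' V_i^{-1}
        bread = add(bread, matmul(xt_vinv, X))          # sums to X' V^{-1} X
        xvy = add(xvy, matmul(xt_vinv, [[v] for v in y]))
        parts.append((X, xt_vinv, y))
    bread_inv = inverse(bread)
    beta = matmul(bread_inv, xvy)                       # GLS estimate of beta
    meat = [[0.0] * p for _ in range(p)]
    for X, xt_vinv, y in parts:
        fitted = matmul(X, beta)
        resid = [[yi - fi[0]] for yi, fi in zip(y, fitted)]
        score = matmul(xt_vinv, resid)                  # X_i' V_i^{-1} eps_i
        meat = add(meat, matmul(score, transpose(score)))
    cov = matmul(matmul(bread_inv, meat), bread_inv)
    return [b[0] for b in beta], cov


# Toy intercept-only example: two subjects, two measures each, V_i = identity
X_list = [[[1.0], [1.0]], [[1.0], [1.0]]]
V_list = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
y_list = [[1.0, 3.0], [2.0, 6.0]]
beta_hat, robust_cov = robust_covariance(X_list, V_list, y_list)
```

The robust standard error of each fixed effect is the square root of the corresponding diagonal element of the returned covariance matrix.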

In addition to the fixed effects parameters, we conducted a Wald test to assess whether the overall SNP effect was affected by the misspecification. The Wald test was estimated using the General Linear Hypothesis approach (McDonald 1975). This approach is based on the normal approximation for maximum likelihood estimators using the estimated variance-covariance matrix. The hypothesis can be specified through a constant matrix L to be matched with the fixed effects of the model such that H0: Lβ = m, where m contains the hypothesized values. The estimates of the fixed effects, $\hat{\beta}$, asymptotically follow a multivariate normal distribution $N(\beta, \mathrm{cov}(\hat{\beta}))$ by the Central Limit Theorem such that the linear form also asymptotically follows a multivariate normal distribution:

$$L\hat{\beta} \sim N\left(L\beta,\; L\,\mathrm{cov}(\hat{\beta})\,L'\right) \quad (3)$$

Thus the 95% confidence interval and corresponding P-value for the hypothesized value can be obtained accordingly. We tested whether the parameters for the SNP were simultaneously equal to zero. It is computationally intensive to calculate a robust estimate for the Wald test; for example, the robust standard error for the fixed effects takes approximately 7 minutes for the rs1121980 FTO SNP in the ALSPAC data, whereas the robust standard error for the global Wald test takes approximately an additional 3 minutes. These computational times decrease exponentially as the sample size and the number of repeated measures per individual decrease; however, they may not be scalable to a GWAS. To investigate whether a robust standard error would be beneficial for the global Wald test, we selected the scenario where the inflation was greatest and calculated the robust estimates for all the simulations in this scenario. All analyses were conducted in R version 2.12.1 (Ihaka 1996) using the nlme package.
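As a worked illustration of the joint test that both SNP parameters are zero, the Python sketch below evaluates the Wald statistic for a hypothesis matrix L that simply selects two fixed effects. It is not the authors' R implementation; the parameter estimates and covariance matrix are hypothetical, and it relies on the fact that the chi-square survival function with 2 degrees of freedom is exp(-x/2).

```python
import math


def wald_test_two_params(beta_hat, cov, idx, m=(0.0, 0.0)):
    """Joint Wald test that two fixed effects equal m.

    L is taken to be the 2 x p selection matrix picking elements idx[0]
    and idx[1], so L cov L' is the matching 2 x 2 block of cov.
    """
    i, j = idx
    r0 = beta_hat[i] - m[0]
    r1 = beta_hat[j] - m[1]
    v00, v01 = cov[i][i], cov[i][j]
    v10, v11 = cov[j][i], cov[j][j]
    det = v00 * v11 - v01 * v10
    # r' (L cov L')^{-1} r using the closed-form 2 x 2 inverse
    stat = (v11 * r0 * r0 - (v01 + v10) * r0 * r1 + v00 * r1 * r1) / det
    # Chi-square survival function with 2 degrees of freedom
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical estimates: positions 2 and 3 hold the SNP and SNP*age effects
beta_hat = [16.0, 0.5, 0.6, 0.15]
cov = [
    [0.01, 0.0, 0.0, 0.0],
    [0.0, 0.01, 0.0, 0.0],
    [0.0, 0.0, 0.04, 0.0],
    [0.0, 0.0, 0.0, 0.0025],
]
stat, p_value = wald_test_two_params(beta_hat, cov, idx=(2, 3))
```

Passing a robust covariance matrix instead of the model-based one gives the robust version of the same test.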

As it is important to report the uncertainty in any estimates from simulation based studies (Koehler, Brown, and Haneuse 2009), Monte Carlo error (MCE) was calculated using the joint performance method of β and si outlined by White (White 2010). A confidence interval for coverage probabilities, bias, type 1 error and power was calculated using the following:

$$P \pm 1.96\sqrt{\frac{P(1-P)}{S}} \quad (4)$$

Where P is the nominal level, for example P for coverage estimates is 0.95 and P for type 1 error is 0.05, and S is the number of simulations, for example either 1,000 or 5,000. Each simulation result was then assessed as to whether it fell within this confidence interval.
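Equation (4) is simple enough to check numerically. The Python sketch below computes the interval for a nominal level P and S simulations. The pooled values of S used here are assumptions: 20,000 null simulations per design and error assumption is stated later in the text, while 4,000 for the alternative hypothesis assumes that 1,000 datasets were pooled over four minor allele frequencies.

```python
import math


def monte_carlo_interval(p, n_sims, z=1.96):
    """Return (lower, upper) limits P +/- z * sqrt(P(1-P)/S)."""
    half_width = z * math.sqrt(p * (1.0 - p) / n_sims)
    return p - half_width, p + half_width


def within_interval(estimate, p, n_sims):
    """True if a simulation estimate lies inside the Monte Carlo limits."""
    lower, upper = monte_carlo_interval(p, n_sims)
    return lower <= estimate <= upper


coverage_limits = monte_carlo_interval(0.95, 4000)    # assumed pooled S
type1_limits = monte_carlo_interval(0.05, 20000)      # S quoted in the text
```

Under the assumed pooling, the coverage limits round to 94.32% and 95.68%, which matches the limits quoted in the coverage results.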

3.4 Results:

3.4.1 Coverage probabilities:

Coverage probability can indicate whether the confidence interval of the parameter(s) of interest is conservative (i.e. the coverage probability is larger than the nominal confidence level) or liberal (i.e. the coverage probability is smaller than the nominal confidence level).
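Coverage probability is the proportion of simulated confidence intervals that contain the true parameter value. A minimal Python sketch follows; the estimates and standard errors are hypothetical values chosen for illustration, with the true value set to the simulated SNP by age effect of 0.15.

```python
def coverage_probability(estimates, std_errors, true_value, z=1.96):
    """Proportion of intervals estimate +/- z * se containing true_value."""
    covered = 0
    for est, se in zip(estimates, std_errors):
        if est - z * se <= true_value <= est + z * se:
            covered += 1
    return covered / len(estimates)


# Hypothetical results from five simulated datasets
estimates = [0.14, 0.17, 0.10, 0.21, 0.15]
std_errors = [0.02, 0.02, 0.02, 0.02, 0.02]
coverage = coverage_probability(estimates, std_errors, true_value=0.15)
```

Here three of the five intervals contain 0.15, so the coverage is 0.6, well below the nominal 0.95; this pattern corresponds to standard errors that are too small and therefore to a liberal interval.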

Coverage probabilities for the 95% confidence interval of the fixed effects parameter estimates from each of the simulations are presented in Table 2. No consistent differences were seen across the range of minor allele frequencies, so the results from each of the simulated datasets were combined for ease of presentation; however, the coverage probabilities for each of the minor allele frequencies are presented in Supplementary Table 1.

The coverage probabilities of the SNP main effects parameter for all simulations appear to be unaffected by the error misspecifications; only nine of 70 coverage probabilities were significantly different from 95%, that is less than 94.32% or greater than 95.68%, five of which were from the simulations where the error variance increases over time.

Thirty-one of the 70 coverage probabilities (44%) for the SNP*age interaction parameter were significantly different from 95%, with both the non-Gaussian and heteroscedastic error distributions being affected. When the error followed a t-distribution, the coverage probabilities for the confidence interval of the SNP*age interaction parameter were less than 95% in all designs except the sparse complete scenario. Similarly, the SNP*age interaction parameter had coverage probabilities less than 95% when the error followed a skew-normal distribution, but only in the unbalanced designs where there is missing data. The coverage probabilities were also less than 95% when the error variance was dependent on a covariate and when it increased over time, in both the complete and unbalanced designs. All the coverage probabilities that significantly differ from 95% for the SNP*age interaction parameter have underestimated variance estimates and thus confidence intervals that were too narrow, which could lead to test statistics that are too liberal.

3.4.2 Bias:

The SNP main effect and the SNP*age interaction parameters are unbiased in the majority of the simulations, indicating that the misspecifications in the error distribution do not affect the estimates of the β’s (Supplementary Tables 2 and 3). Only nine of 140 95% confidence intervals did not cover zero; these nine confidence intervals were across the range of error distributions and designs, showing that no one scenario was particularly biased.

No consistent differences were seen in the bias estimates across the range of minor allele frequencies; however, the 95% confidence intervals for the difference between the simulated parameter and the true parameter were tighter as the sample size and minor allele frequency increased (Supplementary Table 4).

3.4.3 Type 1 error:

As seen with the coverage probabilities, no consistent differences in type 1 error were evident across the minor allele frequency range, so the results from each of the simulated datasets were combined for ease of presentation (Table 3 and Table 4); however, the type 1 error for each of the minor allele frequencies tested is given in Supplementary Table 5.

As seen in Table 3, the type 1 error for the complete designs remained within acceptable limits of the nominal alpha level. We observed inflation for the SNP by age interaction parameter in several cases, but this inflation was reduced to nominal levels by using a robust standard error.

Table 4 shows that the type 1 error for the SNP by age interaction was often inflated under the unbalanced designs. However, by using a robust standard error, the inflation can be reduced to nominal levels in the majority of cases; approximately 75% of the inflated effects were reduced. The scenario where the robust standard error had little effect was when the error variance increased over time; only 20% of the estimates were reduced to nominal levels under this scenario. Interestingly, the robust standard error did not appear to affect the type 1 error for the scenarios that were not originally inflated.

To declare significance in a GWAS, several thresholds are commonly used: suggestive association, significant association and highly significant association. Duggal et al define suggestive associations as SNPs that reach a P-value threshold under the assumption that one false positive association is expected per GWAS (Duggal et al. 2008); SNPs reaching this threshold are taken forward for replication. In the context of our simulation study, this definition would equate to a P-value of 0.00005 (1/20,000; where 20,000 is the number of simulations per design and error assumption). The scenario with the highest type 1 error inflation using the classical standard error was for the SNP*age interaction under the intense design where the error variance increased over time (0.0746 for both N=1,000 and 3,000). In this scenario, 6 SNPs would falsely reach the definition of ‘suggestive association’ for the SNP*age interaction parameter when using the classical standard error with a sample size of 1,000 individuals. In contrast, when the model assumptions are met, that is when the error distribution follows a Gaussian distribution with constant variance, only 2 SNPs met the ‘suggestive association’ threshold, indicating an inflation in the type 1 error for the simulations where the variance increased over time due to the misspecification of the error term. When using the robust standard error under the increasing variance over time design, 1 SNP would meet the criteria, showing not only a reduction in the type 1 error from the number seen with the classical standard error, but also a reduction in power in comparison to the model where the assumptions were met.

These results show that there is greater inflation in the type 1 error for the SNP*age interaction in the unbalanced designs than the complete designs. As outlined in Supplementary Figure 1, we simulated additional data to investigate what aspects of the unbalanced design contributed to the inflation. Briefly, the results from these additional simulations are as follows:

1. The unbalanced designs differed from the complete designs by including missing data and altering the measurement times so they fell within a range around the scheduled times, both of which are inherent in cohort studies. The additional simulations showed the inflation was greater in the presence of missing data rather than because of the different measurement times between individuals (Supplementary Figure 2).

2. Since the LME is known to be robust to missing data under the missing at random and missing completely at random assumptions, we simulated additional data varying the polynomial function of age in the fixed and random effects. These simulations showed the type 1 error was reduced to nominal levels when the fixed and random effects had the same function of age, i.e. cubic function in both the fixed and random effects (Supplementary Table 6).

3. To determine whether there is remaining inflation in the type 1 error after modelling the same function of age in the fixed and random effects when the error distribution is misspecified, we simulated additional data using the equal unbalanced sampling design. These simulations showed that the type 1 error was again reduced to nominal levels when the fixed and random effects had the same function of age, regardless of the misspecification in the error distribution (Supplementary Table 7).

4. It is often difficult to estimate higher order terms in the random effects when using real data due to computational and convergence issues. In this case, it is often only possible to fit a lower-order polynomial function in the random effects than in the fixed effects. We simulated additional data where the fixed and random effects included a quadratic function for age but we analyzed the data with a quadratic function in the fixed effects and a linear function in the random effects. In addition, we also simulated data where the fixed effects included a quadratic function for age and the random effects included only a linear function but analyzed the data with a quadratic function in both the fixed and random effects. These simulations showed that the type 1 error was inflated when the analysis model had a lower-order polynomial function in the random effects than in the fixed effects (Supplementary Table 8).

In summary, it is recommended that one includes the same polynomial function for age in the fixed and random effects to avoid inflation in the type 1 error; however, if this is not possible due to non-convergence of the model then a robust standard error is required to reach nominal levels of type 1 error.

The global Wald test, which assesses whether there is any genetic effect on the whole BMI growth trajectory, was inflated above the acceptable limits under all error variance misspecifications and even under the Gaussian/constant variance assumption, except under the sparse complete design. The scenario where the error variance increased over time showed the largest inflation; however, using the robust estimates for the Wald test under this scenario reduced the type 1 error to nominal levels in most designs, and where it was not reduced to nominal levels it was dramatically lower than with the classical test (Table 4). Again, having the same structure of fixed and random terms for the age polynomial function would yield nominal type 1 errors.

Given that many researchers conducting GWAS of longitudinal traits are interested in only the SNP main effect and not the SNP*age interaction (Furlotte, Eskin, and Eyheramendy 2012), we conducted some additional simulations without the SNP*age interaction. Once again, we used the scenario where the error variance increased over time and where there was equal unbalance in the data structure. We found that the type 1 error was within the nominal range for the SNP main effect for both sample sizes (N=1,000: 0.0506; N=3,000: 0.0515), where previously we saw inflation for the sample size of 1,000 (0.0533 from Table 4). We have no reason to believe that any of the other scenarios would be affected by the misspecifications when the SNP*age interactions are not modelled.

3.4.4 Power:

Effect sizes for the alternative hypothesis (β5 = 0.6 and β6 = 0.15) were chosen to have 80% power with a MAF of 0.4 and sample size of 1,000 when the error from the fitted LMM follows a Gaussian distribution with constant variance. Therefore, the power for all error distributions and MAFs in the simulations with sample size of 3,000 was greater than 80%, so this section will only discuss power for the simulations with a sample size of 1,000. Power for the SNP main effect and SNP*age interaction parameters is displayed in Figures 1 (complete designs) and 2 (unbalanced designs).

As expected, the power increases with the MAF. Interestingly, simulations where the error followed a t-distribution had lower power for both the SNP main effect and the SNP*age interaction parameters than those with a Gaussian error distribution. This pattern was consistent across all of the sampling designs; however, it appears that the power is slightly closer to that of the error with the Gaussian distribution when there is more data around the adiposity rebound (i.e. the intense complete design and the unbalanced design with more samples around the adiposity rebound). In addition, the simulations where the error distribution follows a skew-normal distribution led to slightly higher power for both the SNP and SNP*age interaction parameters than with the Gaussian error.

When investigating the different error variance structures, the power for the SNP main effect parameter across all MAFs was slightly lower than the power when the constant variance assumption was met. Likewise, for the SNP*age interaction parameter, all of the error variance structures led to lower power than when the constant variance assumption was met. However, simulations under the unbalanced designs where the variance increased over time suffered the most and had notably reduced power until a MAF of approximately 0.3.

3.4.5 Power under the robust standard error:

We have shown that using the robust standard error does not affect those situations where the type 1 error was not initially inflated; however, before adopting the robust standard error for a GWAS analysis we also wanted to determine whether using the robust standard error would decrease our power to detect a statistically significant association.

The power for the SNP main effect parameter remains almost unchanged when using the robust standard error rather than the normal standard error in all scenarios and under all model misspecifications (Figures 3 and 4). The only scenario where the power decreased for the SNP main effect parameter by using the robust standard error was where there was increasing variance over time under the intense complete scenario. Given that the type 1 error was not inflated using either standard error estimate, there appears to be no harm in using a robust standard error for estimation even when not required.

The power for the SNP*age interaction parameter, particularly for low MAF, is much more variable. Under the sparse complete design, where there was no inflation in the type 1 error, the power remains about the same using either the classical or robust standard error. For the other designs, the power for the SNP*age interaction parameter decreases using the robust standard error, but only by 5% or less for most error misspecifications, when the MAF was 0.2 or greater. When the error followed a t-distribution, power decreased by about 5-10% using the robust standard error when the MAF was 0.1 or 0.2; this might be due to the substantial reduction in type 1 error. The power also decreases by greater than 5% when the variance is greater at the adiposity rebound and when the variance is dependent on a covariate, for values of MAF around 0.1 in our scenarios.

4. Analysis of chromosome-wide body-mass-index data:

Given our simulation results, in particular the need for a robust standard error to ensure accurate inference for the SNP*age interaction where the type 1 error is inflated, we wanted to investigate the impact of the distribution assumption problems in a real data application. GWAS analysis of multiple cohorts would be ideal to observe the effect of the different error term misspecifications; however, this would require a large amount of computing time and was thus determined to be prohibitive. Instead, we chose to conduct analysis using chromosome 16 in the ALSPAC data as the most replicated gene for BMI to date, the fat mass and obesity gene (FTO), is located on this chromosome and we therefore hypothesised that we would detect some significant loci on this chromosome as well as many non-associated SNPs. We used the same LMM as in equation 2, with the inclusion of an age*sex interaction in the fixed effects for all the age components (i.e. $\beta_9 \mathrm{sex}_i + \beta_{10} t_{ij} \mathrm{sex}_i + \beta_{11} t_{ij}^2 \mathrm{sex}_i + \beta_{12} t_{ij}^3 \mathrm{sex}_i$) to account for the differences in growth between males and females (Warrington et al. 2013). There were 14,875 SNPs genotyped on chromosome 16, all of which had a MAF greater than 1%; GWAS are designed to look at common SNPs, so it is a common strategy to exclude SNPs with MAF less than 1%. Each SNP was incorporated into the model assuming an additive genetic model.

As expected, SNPs in the FTO gene were highly significant for the global tests as well as the SNP main effect and SNP*age interactions. It is common to display GWAS results as a QQ plot of the observed -log10(P) against the expected -log10(P) under the null distribution. Figure 5 displays a QQ plot from the chromosome 16 analysis in ALSPAC for each of the parameters which displayed inflated levels of type 1 error in the simulation study. As we believe SNPs within the FTO region to be true positives, we also display the QQ plots excluding SNPs from this region (Figure 5, C and D). Lambda (λ) values are also commonly calculated for GWAS analyses; λ is the ratio of the median of the empirically observed distribution of the test statistic to the expected median. The λ quantifies the extent of the excess false positive rate, with values close to 1 indicating no inflation and values greater than 1 indicating increasing levels of false positives. The lambda values corresponding to each QQ plot were calculated using the estlambda option in the GenABEL software (Aulchenko et al. 2007). These QQ plots and lambda statistics clearly show that where the parameters have lambda values greater than one using the classical test, the robust test reduces this to nominal levels. When using the robust tests, we were still able to detect an association with SNPs in the FTO gene.
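The median-ratio definition of λ given above can be sketched directly from a list of P-values. The Python code below is an illustration of that definition for tests with 1 degree of freedom; it is not the GenABEL estlambda routine, whose estimation method may differ, and the example P-values are synthetic.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_1df_from_p(p):
    """Convert a two-sided P-value to a 1 df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail quantile, squared below
    return z * z


def genomic_inflation(p_values):
    """Lambda: observed median test statistic over the expected median."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)   # median of chi-square(1), about 0.455
    return observed / expected


# Synthetic P-values: evenly spaced (null-like) and shifted towards zero
null_like = [i / 1000 for i in range(1, 1000)]
inflated = [p ** 2 for p in null_like]
lambda_null = genomic_inflation(null_like)
lambda_inflated = genomic_inflation(inflated)
```

Evenly spaced P-values give λ equal to 1, while P-values that are systematically too small give λ above 1. For the global Wald test, which has more than 1 degree of freedom, the expected median would need to be taken from the corresponding chi-square distribution.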

In the chromosome wide analysis, the P-value to declare ‘suggestive significance’ would be 0.000067 (1/14,875). Using this threshold, 57 SNPs would reach suggestive significance for the SNP by age interaction using the classical standard error in comparison to only 16 SNPs using the robust standard error. Six of these 16 SNPs were in the FTO gene, four of which would reach the significant threshold.

5. Discussion:

In this article, we simulated longitudinal data that mimicked childhood BMI to explore the coverage probability, bias, type 1 error and power for association with a SNP when the linear mixed effects model is misspecified with either a non-Gaussian error distribution or heteroscedastic error. We have shown that the type 1 error for the SNP*age interaction terms in a genetic association study has no inflation if the same function of age is included in both the fixed and random effects. However, type 1 error is inflated, regardless of the model misspecification, if the age function in the fixed and random effects differs. In situations where the model is too complex and will not converge with a high order polynomial function in the random effects, an appropriate way to deflate the type 1 error to nominal levels is to use a robust standard error for the fixed effects parameters. Although robust standard errors have been previously used in a wide range of statistical applications, LMMs are only just beginning to be utilized in GWAS and therefore guidance on their application was warranted. Given that QQ plots in GWAS are an important diagnostic to rule out the possibility of population stratification, it is essential to generate standard errors that perform well under the null hypothesis so that any remaining inflation is not due to the model fitting. Similar to the conclusions by Gurka et al (Gurka, Edwards, and Muller 2011) and Verbeke and Molenberghs (Verbeke and Molenberghs 2000) using other applications, the sandwich estimator is a valid alternative in GWAS when the model assumptions are misspecified; however, it is less efficient than using the correct covariance model.

Similar to Jacqmin-Gadda et al (Jacqmin-Gadda et al. 2007), we have shown that estimates of differences in slope by the number of copies of the minor allele are sensitive to heterogeneous error variance, particularly when the error variance depends on a covariate or increases over time. The variance of the estimates is underestimated and therefore the confidence interval is too narrow; this is consistent with the inflated type 1 error under these misspecified model assumptions.

Of all the misspecifications investigated, the situation where the error variance increases over time and is not accounted for in the modelling has poor parameter estimates, low power and the most inflation of the type 1 error, particularly for the SNP*age interaction terms. It also appears that by using the robust standard error, the inflation in the type 1 error is reduced to the nominal level in only some of the scenarios. It is therefore imperative that some adjustment is made in the modelling to account for this increasing variance over time. In the ALSPAC BMI data, the variance stays relatively constant until around the age of four years, where it rapidly increases until around 11 years of age, where it plateaus again. This is due to the different growth rates between individuals through the adiposity rebound and puberty. Increasing variability over time can be seen with many other phenotypes both in childhood and adulthood; for example, lung function in an elderly population can decrease due to the rate at which individuals are diagnosed with diseases such as chronic obstructive pulmonary disease, while other individuals remain healthy. Variance functions for modelling heteroscedasticity in mixed effects models have been studied in detail by Davidian and Giltinan (Davidian and Giltinan 1995) and can be implemented using the varFunc classes in the nlme package in R (Pinheiro and Bates 2000). There are also equivalent functions in alternative statistical packages such as MLwiN (Rasbash et al. 2012). The use of these variance functions could be recommended in the context of GWAS, if there is remaining heteroscedasticity in the residuals after appropriately modelling the fixed and random effects; however, further studies are needed to assess their properties in this context.

When looking at SNPs with low minor allele frequencies, we have seen that by using the robust standard error we reduce our power by approximately 5%. To counteract this reduction, we can increase the sample size through the use of meta-analysis of multiple cohort studies, as is commonly done in GWAS analyses. However, several manuscripts have previously discussed the extended computational time for longitudinal GWAS in comparison to GWAS of cross-sectional phenotypes, so it is recommended that large computing clusters are available to those cohort studies conducting analyses. The longitudinal GWAS of cardiovascular risk factors presented in Smith et al (Smith et al. 2010) took approximately 3 hours on 64 processors of a compute cluster for 600,000 tests in 525 individuals. Sikorska et al (Sikorska et al. 2013) illustrated that the analysis of 2.5 million SNPs using the lme function in the nlme package of R would take 3,500 hours for a sample size of 3,000 individuals on a desktop computer (Intel(R) Core(TM) 2 Duo CPU, 3.00 GHz). These times are consistent with those in this study; the chromosome 16 analysis of 14,875 SNPs in the 7,916 ALSPAC individuals took approximately 125 hours on 32 processors of a compute cluster (BlueCrystal Phase 2 cluster with each node having four 2x2.8 GHz core processors and 8 GB of RAM).

It has been suggested that the genome-wide significance threshold be set at 5 x 10^-8 (Dudbridge and Gusnanto 2008; Risch and Merikangas 1996). In addition, Duggal et al (Duggal et al. 2008) established an appropriate p-value threshold based on the number of independent SNP tests in a GWAS. If study data is imputed against the HapMap CEU population, they suggest a threshold of p < 6.09 x 10^-6 be used to select SNPs with suggestive evidence for follow-up. Many cross-sectional GWAS use thresholds around this, generally ranging from p < 5 x 10^-6 (Speliotes et al. 2010) to p < 10^-5 (Thorleifsson et al. 2009), to select SNPs for replication. In longitudinal genetic association studies, particularly those with complex, non-linear trajectories, controlling the type 1 error of the many parameters involving SNP effects can be quite challenging. This would be the case when using, for example, smoothed spline functions that could interact with the SNP effects. Providing robust standard errors in this context can be difficult. As an alternative, it may be plausible to use genomic control procedures to reduce a possible inflation in the type 1 error for the parameters involving the SNP effects (Devlin and Roeder 1999; Dadd, Weale, and Lewis 2009). Genomic control is typically used in genetic association studies to account for the potential confounding due to cryptic relatedness, and makes the assumption that the inflation in type 1 error is constant across all markers in the genome. This is plausible in the context of cryptic relatedness as the inflation is due to the kinship coefficients, which are unrelated to the individual loci; however, in the context of LMMs one would need to show that the inflation was uniform across the genome or genetic region of interest. Benke et al (Benke et al. 2013) suggested using a joint test of all SNP effects, similar to the global Wald test used in the current study, as an optimal way to control the type 1 error and increase power. However, caution needs to be applied when utilizing this method for complex traits, such as BMI trajectories over childhood, and a genome-wide significance threshold should only be used if there is no inflation detected in the type 1 error. Benke et al (Benke et al. 2013) used a trait with a linear decrease over time and low correlation between the intercept and slope parameters; in contrast, in this study we have a complex trajectory over time with high correlation between the intercept and slope parameters, for which the joint test has inflated type 1 error that can be reduced using a robust estimate in only some scenarios.

In summary, based on our simulation results, we strongly suggest fitting the same function of age in the fixed and random effects to avoid inflation of the type 1 error of the SNP*age interaction terms. If this is not possible due to convergence issues, then we suggest using a robust standard error for the SNP by age interaction terms to reduce the type 1 error inflation in GWAS, regardless of whether the error term of the model correctly follows the model assumptions or not. If no inflation in the type 1 error is detected for a particular parameter of interest, then the classical standard error should be used; for example, for the SNP main effect parameter in this study.

Acknowledgements:

We are extremely grateful to all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and the Wellcome Trust (Grant ref: 092731) and the University of Bristol provide core support for ALSPAC. NM Warrington is funded by an Australian Postgraduate Award from the Australian Government of Innovation, Industry, Science and Research and a Raine Study PhD Top-Up Scholarship. LD Howe is funded by a UK Medical Research Council Population Health Scientist fellowship (G1002375). L Paternoster is funded by a UK Medical Research Council Population Health Scientist fellowship (MR/J012165/1). K Tilling, LD Howe and L Paternoster work in a Unit that receives core funding from the University of Bristol and the UK Medical Research Council (Grant ref: MC_UU_12013/9). The UK Medical Research Council also supports K Tilling's research (G1000726/1).

Figure Legends:

Figure 1: Simulated power of the SNP main effect and SNP*age interaction terms for complete designs. The two plots on the left are for the sparse complete design, while the two plots on the right are for the intense complete design. The solid black line for the Gaussian Distribution is the situation where the model is correctly specified.

Figure 2: Simulated power of the SNP main effect and SNP*age interaction terms for unbalanced designs. “Equal” denotes the simulations from the equal unbalanced design, “Over” the simulations from the unbalanced design with fewer samples around the adiposity rebound and “Under” the simulations from the unbalanced design with more samples around the adiposity rebound. The solid black line for the Gaussian Distribution is the situation where the model is correctly specified.

Figure 3: Difference in power based on a normal standard error vs. a robust standard error for the complete designs. A positive value indicates the power using the normal standard error is greater than the power using the robust standard error. The two plots on the left are for the sparse complete design, while the two plots on the right are for the intense complete design. The solid black line for the Gaussian Distribution is the situation where the model is correctly specified.

Figure 4: Difference in power based on a normal standard error vs. a robust standard error for the unbalanced designs. A positive value indicates the power using the normal standard error is greater than the power using the robust standard error. Here, “Equal” denotes the simulations from the equal unbalanced design, “Over” the simulations from the unbalanced design with fewer samples around the adiposity rebound and “Under” the simulations from the unbalanced design with more samples around the adiposity rebound. The solid black line for the Gaussian Distribution is the situation where the model is correctly specified.

Figure 5: QQ plots of the chromosome 16 analysis in the ALSPAC cohort. These plots show the observed -log10(P) against the expected -log10(P) under the null hypothesis for each SNP on chromosome 16. P-values deviating from the red x=y line indicate significant findings, whether they be false (i.e. inflation in type 1 error) or true.
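As an illustration of how the coordinates of such a QQ plot can be obtained, the following Python sketch pairs each observed P-value with its expected value under the null hypothesis. It is a minimal example rather than the code used for the ALSPAC analysis; the function name `qq_points` and the plotting position (rank - 0.5)/n are assumptions made here.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs for a QQ plot.

    Under the null hypothesis P-values are uniform on (0, 1), so the
    P-value of rank i out of n is compared with (i - 0.5) / n
    (an assumed, commonly used plotting position).
    """
    n = len(p_values)
    ordered = sorted(p_values)  # smallest P-value first
    points = []
    for rank, p in enumerate(ordered, start=1):
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points
```

Points that sit on the x=y line behave as expected under the null hypothesis; points above the line correspond to P-values smaller than expected.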

24

References: 726

Aulchenko, Y. S., S. Ripke, A. Isaacs, and C. M. van Duijn. 2007. GenABEL: an R library for genome-727 wide association analysis. Bioinformatics 23 (10):1294-6. 728

Benke, K. S., Y. Wu, D. M. Fallin, B. Maher, and L. J. Palmer. 2013. Strategy to control type I error 729 increases power to identify genetic variation using the full biological trajectory. Genet 730 Epidemiol 37 (5):419-30. 731

Boyd, A., J. Golding, J. Macleod, D. A. Lawlor, A. Fraser, J. Henderson, L. Molloy, A. Ness, S. Ring, and 732 G. Davey Smith. 2013. Cohort Profile: the 'children of the 90s'--the index offspring of the 733 Avon Longitudinal Study of Parents and Children. Int J Epidemiol 42 (1):111-27. 734

Bradfield, J. P., H. R. Taal, N. J. Timpson, A. Scherag, C. Lecoeur, N. M. Warrington, E. Hypponen, C. 735 Holst, B. Valcarcel, E. Thiering, R. M. Salem, F. R. Schumacher, D. L. Cousminer, P. M. 736 Sleiman, J. Zhao, R. I. Berkowitz, K. S. Vimaleswaran, I. Jarick, C. E. Pennell, D. M. Evans, B. St 737 Pourcain, D. J. Berry, D. O. Mook-Kanamori, A. Hofman, F. Rivadeneira, A. G. Uitterlinden, C. 738 M. van Duijn, R. J. van der Valk, J. C. de Jongste, D. S. Postma, D. I. Boomsma, W. J. 739 Gauderman, M. T. Hassanein, C. M. Lindgren, R. Magi, C. A. Boreham, C. E. Neville, L. A. 740 Moreno, P. Elliott, A. Pouta, A. L. Hartikainen, M. Li, O. Raitakari, T. Lehtimaki, J. G. Eriksson, 741 A. Palotie, J. Dallongeville, S. Das, P. Deloukas, G. McMahon, S. M. Ring, J. P. Kemp, J. L. 742 Buxton, A. I. Blakemore, M. Bustamante, M. Guxens, J. N. Hirschhorn, M. W. Gillman, E. 743 Kreiner-Moller, H. Bisgaard, F. D. Gilliland, J. Heinrich, E. Wheeler, I. Barroso, S. O'Rahilly, A. 744 Meirhaeghe, T. I. Sorensen, C. Power, L. J. Palmer, A. Hinney, E. Widen, I. S. Farooqi, M. I. 745 McCarthy, P. Froguel, D. Meyre, J. Hebebrand, M. R. Jarvelin, V. W. Jaddoe, G. D. Smith, H. 746 Hakonarson, and S. F. Grant. 2012. A genome-wide association meta-analysis identifies new 747 childhood obesity loci. Nat Genet 44 (5):526-31. 748

Cole, T. J., M. C. Bellizzi, K. M. Flegal, and W. H. Dietz. 2000. Establishing a standard definition for 749 child overweight and obesity worldwide: international survey. BMJ 320 (7244):1240-3. 750

Dadd, T., M. E. Weale, and C. M. Lewis. 2009. A critical evaluation of genomic control methods for 751 genetic association studies. Genet Epidemiol 33 (4):290-8. 752

Davidian, M., and D. M. Giltinan. 1995. Nonlinear models for repeated measurement data, 753 Monographs on statistics and applied probability;62. London: Chapman & Hall. 754

Devlin, B., and K. Roeder. 1999. Genomic control for association studies. Biometrics 55 (4):997-1004. 755 Dubois, L., and M. Girad. 2007. Accuracy of maternal reports of pre-schoolers' weights and heights 756

as estimates of BMI values. Int J Epidemiol 36 (1):132-8. 757 Dudbridge, F., and A. Gusnanto. 2008. Estimation of significance thresholds for genomewide 758

association scans. Genet Epidemiol 32 (3):227-34. 759 Duggal, P., E. M. Gillanders, T. N. Holmes, and J. E. Bailey-Wilson. 2008. Establishing an adjusted p-760

value threshold to control the family-wide type 1 error in genome wide association studies. 761 BMC Genomics 9:516. 762

Fox, C. S., N. Heard-Costa, L. A. Cupples, J. Dupuis, R. S. Vasan, and L. D. Atwood. 2007. Genome-763 wide association to body mass index and waist circumference: the Framingham Heart Study 764 100K project. BMC Med Genet 8 Suppl 1:S18. 765

Fraser, A., C. Macdonald-Wallis, K. Tilling, A. Boyd, J. Golding, G. Davey Smith, J. Henderson, J. 766 Macleod, L. Molloy, A. Ness, S. Ring, S. M. Nelson, and D. A. Lawlor. 2013. Cohort Profile: the 767 Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol 42 768 (1):97-110. 769

Frayling, T. M., N. J. Timpson, M. N. Weedon, E. Zeggini, R. M. Freathy, C. M. Lindgren, J. R. Perry, K. 770 S. Elliott, H. Lango, N. W. Rayner, B. Shields, L. W. Harries, J. C. Barrett, S. Ellard, C. J. Groves, 771 B. Knight, A. M. Patch, A. R. Ness, S. Ebrahim, D. A. Lawlor, S. M. Ring, Y. Ben-Shlomo, M. R. 772 Jarvelin, U. Sovio, A. J. Bennett, D. Melzer, L. Ferrucci, R. J. Loos, I. Barroso, N. J. Wareham, F. 773 Karpe, K. R. Owen, L. R. Cardon, M. Walker, G. A. Hitman, C. N. Palmer, A. S. Doney, A. D. 774 Morris, G. D. Smith, A. T. Hattersley, and M. I. McCarthy. 2007. A common variant in the FTO 775

25

gene is associated with body mass index and predisposes to childhood and adult obesity. 776 Science 316 (5826):889-94. 777

Furlotte, N. A., E. Eskin, and S. Eyheramendy. 2012. Genome-wide association mapping with 778 longitudinal data. Genet Epidemiol 36 (5):463-71. 779

Gurka, M. J., L. J. Edwards, and K. E. Muller. 2011. Avoiding bias in mixed model inference for fixed 780 effects. Stat Med 30 (22):2696-707. 781

Haslam, D. W., and W. P. James. 2005. Obesity. Lancet 366 (9492):1197-209. 782 Haworth, C. M., S. Carnell, E. L. Meaburn, O. S. Davis, R. Plomin, and J. Wardle. 2008. Increasing 783

heritability of BMI and stronger associations with the FTO gene over childhood. Obesity 784 (Silver Spring) 16 (12):2663-8. 785

Hindorff, L.A., J. MacArthur, A. Wise, H.A. Junkins, P.N. Hall, A.K. Klemm, and T.A. Manolio. 2010. A 786 Catalog of Published Genome-Wide Association Studies. National Human Genome Research 787 Institute. 788

Howe, L. D., K. Tilling, L. Benfield, J. Logue, N. Sattar, A. R. Ness, G. D. Smith, and D. A. Lawlor. 2010. 789 Changes in ponderal index and body mass index across childhood and their associations with 790 fat mass and cardiovascular risk factors at age 15. PLoS One 5 (12):e15186. 791

Howe, L. D., K. Tilling, and D. A. Lawlor. 2009. Accuracy of height and weight data from child health 792 records. Arch Dis Child 94 (12):950-4. 793

Ihaka, R., Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational 794 and Graphical Statistics 5 (3):299-314. 795

Jacqmin-Gadda, H., S. Sibillot, C. Proust, J.M Molina, and R. Thiébaut. 2007. Robustness of the linear 796 mixed model to misspecified error distribution. Computational Statistics & Data 797 Analysis 51 (10):5142-5154. 798

Kerner, B., K. E. North, and M. D. Fallin. 2009. Use of longitudinal data in genetic studies in the 799 genome-wide association studies era: summary of Group 14. Genet Epidemiol 33 Suppl 800 1:S93-8. 801

Kindblom, J. M., M. Lorentzon, A. Hellqvist, L. Lonn, J. Brandberg, S. Nilsson, E. Norjavaara, and C. 802 Ohlsson. 2009. BMI changes during childhood and adolescence as predictors of amount of 803 adult subcutaneous and visceral adipose tissue in men: the GOOD Study. Diabetes 58 804 (4):867-74. 805

Koehler, E., E. Brown, and S. J. Haneuse. 2009. On the Assessment of Monte Carlo Error in 806 Simulation-Based Statistical Analyses. Am Stat 63 (2):155-162. 807

Laird, N. M., and J. H. Ware. 1982. Random-effects models for longitudinal data. Biometrics 38 808 (4):963-74. 809

Liang, K.Y., and S.L. Zeger. 1986. Longitudinal data analysis using generalized linear models. 810 Biometrika 73 (1):13-22. 811

Liu, J. Z., S. E. Medland, M. J. Wright, A. K. Henders, A. C. Heath, P. A. Madden, A. Duncan, G. W. 812 Montgomery, N. G. Martin, and A. F. McRae. 2010. Genome-wide association study of height 813 and body mass index in Australian twin families. Twin Res Hum Genet 13 (2):179-93. 814

Loos, R. J., C. M. Lindgren, S. Li, E. Wheeler, J. H. Zhao, I. Prokopenko, M. Inouye, R. M. Freathy, A. P. 815 Attwood, J. S. Beckmann, S. I. Berndt, K. B. Jacobs, S. J. Chanock, R. B. Hayes, S. Bergmann, A. 816 J. Bennett, S. A. Bingham, M. Bochud, M. Brown, S. Cauchi, J. M. Connell, C. Cooper, G. D. 817 Smith, I. Day, C. Dina, S. De, E. T. Dermitzakis, A. S. Doney, K. S. Elliott, P. Elliott, D. M. Evans, 818 I. Sadaf Farooqi, P. Froguel, J. Ghori, C. J. Groves, R. Gwilliam, D. Hadley, A. S. Hall, A. T. 819 Hattersley, J. Hebebrand, I. M. Heid, C. Lamina, C. Gieger, T. Illig, T. Meitinger, H. E. 820 Wichmann, B. Herrera, A. Hinney, S. E. Hunt, M. R. Jarvelin, T. Johnson, J. D. Jolley, F. Karpe, 821 A. Keniry, K. T. Khaw, R. N. Luben, M. Mangino, J. Marchini, W. L. McArdle, R. McGinnis, D. 822 Meyre, P. B. Munroe, A. D. Morris, A. R. Ness, M. J. Neville, A. C. Nica, K. K. Ong, S. O'Rahilly, 823 K. R. Owen, C. N. Palmer, K. Papadakis, S. Potter, A. Pouta, L. Qi, J. C. Randall, N. W. Rayner, 824 S. M. Ring, M. S. Sandhu, A. Scherag, M. A. Sims, K. Song, N. Soranzo, E. K. Speliotes, H. E. 825 Syddall, S. A. Teichmann, N. J. Timpson, J. H. Tobias, M. Uda, C. I. Vogel, C. Wallace, D. M. 826

26

Waterworth, M. N. Weedon, C. J. Willer, Wraight, X. Yuan, E. Zeggini, J. N. Hirschhorn, D. P. 827 Strachan, W. H. Ouwehand, M. J. Caulfield, N. J. Samani, T. M. Frayling, P. Vollenweider, G. 828 Waeber, V. Mooser, P. Deloukas, M. I. McCarthy, N. J. Wareham, I. Barroso, P. Kraft, S. E. 829 Hankinson, D. J. Hunter, F. B. Hu, H. N. Lyon, B. F. Voight, M. Ridderstrale, L. Groop, P. 830 Scheet, S. Sanna, G. R. Abecasis, G. Albai, R. Nagaraja, D. Schlessinger, A. U. Jackson, J. 831 Tuomilehto, F. S. Collins, M. Boehnke, and K. L. Mohlke. 2008. Common variants near MC4R 832 are associated with fat mass, weight and risk of obesity. Nat Genet 40 (6):768-75. 833

Maes, H. H., M. C. Neale, and L. J. Eaves. 1997. Genetic and environmental factors in relative body 834 weight and human adiposity. Behav Genet 27 (4):325-51. 835

McDonald, L. 1975. Tests for the General Linear Hypothesis Under the Multiple Design Multivariate 836 Linear Model. The Annals of Statistics 3 (2):461-466. 837

Parsons, T. J., C. Power, S. Logan, and C. D. Summerbell. 1999. Childhood predictors of adult obesity: 838 a systematic review. Int J Obes Relat Metab Disord 23 Suppl 8:S1-107. 839

Pinheiro, J., and D. Bates. 2000. Mixed Effects Models in S and S-Plus: Springer. 840 Rasbash, J., F. Steele, W.J. Browne, and H. Goldstein. 2012. A User’s Guide to MLwiN, v2.26. Centre 841

for Multilevel Modelling, University of Bristol. 842 Risch, N., and K. Merikangas. 1996. The future of genetic studies of complex human diseases. Science 843

273 (5281):1516-7. 844 Royall, R.M. 1986. Model Robust Confidence Intervals Using Maximum Likelihood Estimators. 845

International Statistical Review / Revue Internationale de Statistique 54 (2):221-226. 846 Serdula, M. K., D. Ivery, R. J. Coates, D. S. Freedman, D. F. Williamson, and T. Byers. 1993. Do obese 847

children become obese adults? A review of the literature. Prev Med 22 (2):167-77. 848 Sikorska, K., F. Rivadeneira, P. J. Groenen, A. Hofman, A. G. Uitterlinden, P. H. Eilers, and E. Lesaffre. 849

2013. Fast linear mixed model computations for genome-wide association studies with 850 longitudinal data. Stat Med 32 (1):165-80. 851

Smith, E. N., W. Chen, M. Kahonen, J. Kettunen, T. Lehtimaki, L. Peltonen, O. T. Raitakari, R. M. 852 Salem, N. J. Schork, M. Shaw, S. R. Srinivasan, E. J. Topol, J. S. Viikari, G. S. Berenson, and S. S. 853 Murray. 2010. Longitudinal genome-wide association of cardiovascular disease risk factors in 854 the Bogalusa heart study. PLoS Genet 6 (9). 855

Speliotes, E. K., C. J. Willer, S. I. Berndt, K. L. Monda, G. Thorleifsson, A. U. Jackson, H. L. Allen, C. M. 856 Lindgren, J. Luan, R. Magi, J. C. Randall, S. Vedantam, T. W. Winkler, L. Qi, T. Workalemahu, I. 857 M. Heid, V. Steinthorsdottir, H. M. Stringham, M. N. Weedon, E. Wheeler, A. R. Wood, T. 858 Ferreira, R. J. Weyant, A. V. Segre, K. Estrada, L. Liang, J. Nemesh, J. H. Park, S. Gustafsson, T. 859 O. Kilpelainen, J. Yang, N. Bouatia-Naji, T. Esko, M. F. Feitosa, Z. Kutalik, M. Mangino, S. 860 Raychaudhuri, A. Scherag, A. V. Smith, R. Welch, J. H. Zhao, K. K. Aben, D. M. Absher, N. 861 Amin, A. L. Dixon, E. Fisher, N. L. Glazer, M. E. Goddard, N. L. Heard-Costa, V. Hoesel, J. J. 862 Hottenga, A. Johansson, T. Johnson, S. Ketkar, C. Lamina, S. Li, M. F. Moffatt, R. H. Myers, N. 863 Narisu, J. R. Perry, M. J. Peters, M. Preuss, S. Ripatti, F. Rivadeneira, C. Sandholt, L. J. Scott, 864 N. J. Timpson, J. P. Tyrer, S. van Wingerden, R. M. Watanabe, C. C. White, F. Wiklund, C. 865 Barlassina, D. I. Chasman, M. N. Cooper, J. O. Jansson, R. W. Lawrence, N. Pellikka, I. 866 Prokopenko, J. Shi, E. Thiering, H. Alavere, M. T. Alibrandi, P. Almgren, A. M. Arnold, T. 867 Aspelund, L. D. Atwood, B. Balkau, A. J. Balmforth, A. J. Bennett, Y. Ben-Shlomo, R. N. 868 Bergman, S. Bergmann, H. Biebermann, A. I. Blakemore, T. Boes, L. L. Bonnycastle, S. R. 869 Bornstein, M. J. Brown, T. A. Buchanan, F. Busonero, H. Campbell, F. P. Cappuccio, C. 870 Cavalcanti-Proenca, Y. D. Chen, C. M. Chen, P. S. Chines, R. Clarke, L. Coin, J. Connell, I. N. 871 Day, M. Heijer, J. Duan, S. Ebrahim, P. Elliott, R. Elosua, G. Eiriksdottir, M. R. Erdos, J. G. 872 Eriksson, M. F. Facheris, S. B. Felix, P. Fischer-Posovszky, A. R. Folsom, N. Friedrich, N. B. 873 Freimer, M. Fu, S. Gaget, P. V. Gejman, E. J. Geus, C. Gieger, A. P. Gjesing, A. Goel, P. 874 Goyette, H. Grallert, J. Grassler, D. M. Greenawalt, C. J. Groves, V. Gudnason, C. Guiducci, A. 875 L. Hartikainen, N. Hassanali, A. S. Hall, A. 
S. Havulinna, C. Hayward, A. C. Heath, C. 876 Hengstenberg, A. A. Hicks, A. Hinney, A. Hofman, G. Homuth, J. Hui, W. Igl, C. Iribarren, B. 877

27

Isomaa, K. B. Jacobs, I. Jarick, E. Jewell, U. John, T. Jorgensen, P. Jousilahti, A. Jula, M. 878 Kaakinen, E. Kajantie, L. M. Kaplan, S. Kathiresan, J. Kettunen, L. Kinnunen, J. W. Knowles, I. 879 Kolcic, I. R. Konig, S. Koskinen, P. Kovacs, J. Kuusisto, P. Kraft, K. Kvaloy, J. Laitinen, O. 880 Lantieri, C. Lanzani, L. J. Launer, C. Lecoeur, T. Lehtimaki, G. Lettre, J. Liu, M. L. Lokki, M. 881 Lorentzon, R. N. Luben, B. Ludwig, P. Manunta, D. Marek, M. Marre, N. G. Martin, W. L. 882 McArdle, A. McCarthy, B. McKnight, T. Meitinger, O. Melander, D. Meyre, K. Midthjell, G. W. 883 Montgomery, M. A. Morken, A. P. Morris, R. Mulic, J. S. Ngwa, M. Nelis, M. J. Neville, D. R. 884 Nyholt, C. J. O'Donnell, S. O'Rahilly, K. K. Ong, B. Oostra, G. Pare, A. N. Parker, M. Perola, I. 885 Pichler, K. H. Pietilainen, C. G. Platou, O. Polasek, A. Pouta, S. Rafelt, O. Raitakari, N. W. 886 Rayner, M. Ridderstrale, W. Rief, A. Ruokonen, N. R. Robertson, P. Rzehak, V. Salomaa, A. R. 887 Sanders, M. S. Sandhu, S. Sanna, J. Saramies, M. J. Savolainen, S. Scherag, S. Schipf, S. 888 Schreiber, H. Schunkert, K. Silander, J. Sinisalo, D. S. Siscovick, J. H. Smit, N. Soranzo, U. 889 Sovio, J. Stephens, I. Surakka, A. J. Swift, M. L. Tammesoo, J. C. Tardif, M. Teder-Laving, T. M. 890 Teslovich, J. R. Thompson, B. Thomson, A. Tonjes, T. Tuomi, J. B. van Meurs, G. J. van 891 Ommen, V. Vatin, J. Viikari, S. Visvikis-Siest, V. Vitart, C. I. Vogel, B. F. Voight, L. L. Waite, H. 892 Wallaschofski, G. B. Walters, E. Widen, S. Wiegand, S. H. Wild, G. Willemsen, D. R. Witte, J. C. 893 Witteman, J. Xu, Q. Zhang, L. Zgaga, A. Ziegler, P. Zitting, J. P. Beilby, I. S. Farooqi, J. 894 Hebebrand, H. V. Huikuri, A. L. James, M. Kahonen, D. F. Levinson, F. Macciardi, M. S. 895 Nieminen, C. Ohlsson, L. J. Palmer, P. M. Ridker, M. Stumvoll, J. S. Beckmann, H. Boeing, E. 896 Boerwinkle, D. I. Boomsma, M. J. Caulfield, S. J. Chanock, F. S. Collins, L. A. Cupples, G. D. 897 Smith, J. Erdmann, P. Froguel, H. 
Gronberg, U. Gyllensten, P. Hall, T. Hansen, T. B. Harris, A. 898 T. Hattersley, R. B. Hayes, J. Heinrich, F. B. Hu, K. Hveem, T. Illig, M. R. Jarvelin, J. Kaprio, F. 899 Karpe, K. T. Khaw, L. A. Kiemeney, H. Krude, M. Laakso, D. A. Lawlor, A. Metspalu, P. B. 900 Munroe, W. H. Ouwehand, O. Pedersen, B. W. Penninx, A. Peters, P. P. Pramstaller, T. 901 Quertermous, T. Reinehr, A. Rissanen, I. Rudan, N. J. Samani, P. E. Schwarz, A. R. Shuldiner, 902 T. D. Spector, J. Tuomilehto, M. Uda, A. Uitterlinden, T. T. Valle, M. Wabitsch, G. Waeber, N. 903 J. Wareham, H. Watkins, J. F. Wilson, A. F. Wright, M. C. Zillikens, N. Chatterjee, S. A. 904 McCarroll, S. Purcell, E. E. Schadt, P. M. Visscher, T. L. Assimes, I. B. Borecki, P. Deloukas, C. S. 905 Fox, L. C. Groop, T. Haritunians, D. J. Hunter, R. C. Kaplan, K. L. Mohlke, J. R. O'Connell, L. 906 Peltonen, D. Schlessinger, D. P. Strachan, C. M. van Duijn, H. E. Wichmann, T. M. Frayling, U. 907 Thorsteinsdottir, G. R. Abecasis, I. Barroso, M. Boehnke, K. Stefansson, K. E. North, M. I. 908 McCarthy, J. N. Hirschhorn, E. Ingelsson, and R. J. Loos. 2010. Association analyses of 909 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42 910 (11):937-48. 911

Taylor, J. M. G., W. G. Cumberland, and J. P. Sy. 1994. A Stochastic Model for Analysis of Longitudinal 912 AIDS Data. Journal of the American Statistical Association 89 (427):727-736. 913

Taylor, J. M., and N. Law. 1998. Does the covariance structure matter in longitudinal modelling for 914 the prediction of future CD4 counts? Stat Med 17 (20):2381-94. 915

Thorleifsson, G., G. B. Walters, D. F. Gudbjartsson, V. Steinthorsdottir, P. Sulem, A. Helgadottir, U. 916 Styrkarsdottir, S. Gretarsdottir, S. Thorlacius, I. Jonsdottir, T. Jonsdottir, E. J. Olafsdottir, G. H. 917 Olafsdottir, T. Jonsson, F. Jonsson, K. Borch-Johnsen, T. Hansen, G. Andersen, T. Jorgensen, 918 T. Lauritzen, K. K. Aben, A. L. Verbeek, N. Roeleveld, E. Kampman, L. R. Yanek, L. C. Becker, L. 919 Tryggvadottir, T. Rafnar, D. M. Becker, J. Gulcher, L. A. Kiemeney, O. Pedersen, A. Kong, U. 920 Thorsteinsdottir, and K. Stefansson. 2009. Genome-wide association yields new sequence 921 variants at seven loci that associate with measures of obesity. Nat Genet 41 (1):18-24. 922

Verbeke, G, and G Molenberghs. 2000. Linear mixed models for longitudinal data: Springer Series in 923 Statistics, Springer-Verlag, New York. 924

Verbeke, G., and E. Lesaffre. 1997. The effect of misspecifying the random-effects distribution in 925 linear mixed models for longitudinal data. Computational Statistics & Data Analysis 23 926 (4):541-556. 927

28

Wardle, J., S. Carnell, C. M. Haworth, and R. Plomin. 2008. Evidence for a strong genetic influence on 928 childhood adiposity despite the force of the obesogenic environment. Am J Clin Nutr 87 929 (2):398-404. 930

Warrington, N. M., Y. Y. Wu, C. E. Pennell, J. A. Marsh, L. J. Beilin, L. J. Palmer, S. J. Lye, and L. 931 Briollais. 2013. Modelling BMI Trajectories in Children for Genetic Association Studies. PLoS 932 One 8 (1):e53897. 933

White, I. 2010. simsum: Analysis of simulation studies including Monte Carlo error. The Stata Journal 934 10 (3):369-385. 935

WHO. 2000. Obesity: preventing and managing the golbal epidemic. Report of a WHO Consultation. 936 WHO Technical Report Series 894. Geneva: World Health Organization, 2000. 937

Willer, C. J., E. K. Speliotes, R. J. Loos, S. Li, C. M. Lindgren, I. M. Heid, S. I. Berndt, A. L. Elliott, A. U. 938 Jackson, C. Lamina, G. Lettre, N. Lim, H. N. Lyon, S. A. McCarroll, K. Papadakis, L. Qi, J. C. 939 Randall, R. M. Roccasecca, S. Sanna, P. Scheet, M. N. Weedon, E. Wheeler, J. H. Zhao, L. C. 940 Jacobs, I. Prokopenko, N. Soranzo, T. Tanaka, N. J. Timpson, P. Almgren, A. Bennett, R. N. 941 Bergman, S. A. Bingham, L. L. Bonnycastle, M. Brown, N. P. Burtt, P. Chines, L. Coin, F. S. 942 Collins, J. M. Connell, C. Cooper, G. D. Smith, E. M. Dennison, P. Deodhar, P. Elliott, M. R. 943 Erdos, K. Estrada, D. M. Evans, L. Gianniny, C. Gieger, C. J. Gillson, C. Guiducci, R. Hackett, D. 944 Hadley, A. S. Hall, A. S. Havulinna, J. Hebebrand, A. Hofman, B. Isomaa, K. B. Jacobs, T. 945 Johnson, P. Jousilahti, Z. Jovanovic, K. T. Khaw, P. Kraft, M. Kuokkanen, J. Kuusisto, J. 946 Laitinen, E. G. Lakatta, J. Luan, R. N. Luben, M. Mangino, W. L. McArdle, T. Meitinger, A. 947 Mulas, P. B. Munroe, N. Narisu, A. R. Ness, K. Northstone, S. O'Rahilly, C. Purmann, M. G. 948 Rees, M. Ridderstrale, S. M. Ring, F. Rivadeneira, A. Ruokonen, M. S. Sandhu, J. Saramies, L. 949 J. Scott, A. Scuteri, K. Silander, M. A. Sims, K. Song, J. Stephens, S. Stevens, H. M. Stringham, 950 Y. C. Tung, T. T. Valle, C. M. Van Duijn, K. S. Vimaleswaran, P. Vollenweider, G. Waeber, C. 951 Wallace, R. M. Watanabe, D. M. Waterworth, N. Watkins, J. C. Witteman, E. Zeggini, G. Zhai, 952 M. C. Zillikens, D. Altshuler, M. J. Caulfield, S. J. Chanock, I. S. Farooqi, L. Ferrucci, J. M. 953 Guralnik, A. T. Hattersley, F. B. Hu, M. R. Jarvelin, M. Laakso, V. Mooser, K. K. Ong, W. H. 954 Ouwehand, V. Salomaa, N. J. Samani, T. D. Spector, T. Tuomi, J. Tuomilehto, M. Uda, A. G. 955 Uitterlinden, N. J. Wareham, P. Deloukas, T. M. Frayling, L. C. Groop, R. B. Hayes, D. J. 956 Hunter, K. L. Mohlke, L. Peltonen, D. Schlessinger, D. P. Strachan, H. E. Wichmann, M. I. 957 McCarthy, M. Boehnke, I. Barroso, G. R. 
Abecasis, and J. N. Hirschhorn. 2009. Six new loci 958 associated with body mass index highlight a neuronal influence on body weight regulation. 959 Nat Genet 41 (1):25-34. 960

World Health Organization. Obesity and Overweight Fact Sheet (No 311), May 2012 2012 [cited 4 961 September 2012. Available from 962 http://www.who.int/mediacentre/factsheets/fs311/en/index.html. 963

Zhang, D., and M. Davidian. 2001. Linear mixed models with flexible distributions of random effects 964 for longitudinal data. Biometrics 57 (3):795-802. 965

966

967

968

29

http://www.who.int/mediacentre/factsheets/fs311/en/index.html

Table 1: Parameter estimates from the ALSPAC non-genetic model used to generate the data in the simulation study:

| Effect | Parameter | Value |
|---|---|---|
| Intercept | β0 | 16.534 |
| Age | β1 | 0.400 |
| Age² | β2 | 0.056 |
| Age³ | β3 | -0.003 |
| Source | β4 | -0.153 |
| SD(b0) | σ0 | 2.092 |
| SD(b1) | σ1 | 0.269 |
| SD(b2) | σ2 | 0.0235 |
| Cor(b0, b1) | ρ0 | 0.820 |
| Cor(b0, b2) | ρ1 | -0.389 |
| Cor(b1, b2) | ρ2 | -0.092 |
| SD(ε) | σ | 1.063 |
| Correlation structure | ρ | 0.394 |
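To illustrate how the estimates in Table 1 could drive a simulation, the Python sketch below draws one individual's trajectory from them. It is a simplified illustration and not the simulation code used in this study. It assumes that age is already centred as in the fitted model, that b0, b1 and b2 are the random intercept, linear and quadratic age effects, and that Source is a binary indicator. It also ignores the within-individual correlation structure (ρ = 0.394) by drawing independent errors. All function names are hypothetical.

```python
import math
import random

# Values copied from Table 1.
BETA = {"intercept": 16.534, "age": 0.400, "age2": 0.056,
        "age3": -0.003, "source": -0.153}
SD_B = [2.092, 0.269, 0.0235]  # SD(b0), SD(b1), SD(b2)
COR_B = {(0, 1): 0.820, (0, 2): -0.389, (1, 2): -0.092}
SD_EPS = 1.063


def random_effect_cov():
    """Covariance matrix of (b0, b1, b2) built from the SDs and correlations."""
    cov = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            rho = 1.0 if i == j else COR_B[(min(i, j), max(i, j))]
            cov[i][j] = rho * SD_B[i] * SD_B[j]
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fixed_part(age, source=0):
    """Population-average value: cubic function of age plus the source effect."""
    return (BETA["intercept"] + BETA["age"] * age + BETA["age2"] * age ** 2
            + BETA["age3"] * age ** 3 + BETA["source"] * source)


def simulate_individual(ages, rng, source=0):
    """Simulate one individual's measurements at the given (centred) ages."""
    low = cholesky(random_effect_cov())
    z = [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Correlated random effects: b = L z
    b = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
    values = []
    for age in ages:
        random_part = b[0] + b[1] * age + b[2] * age ** 2
        noise = rng.gauss(0.0, SD_EPS)  # simplification: independent errors
        values.append(fixed_part(age, source) + random_part + noise)
    return values
```

For example, `simulate_individual([-2, 0, 2], random.Random(2013))` returns three simulated measurements for one individual; the seed is arbitrary and only makes the draw reproducible.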

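Table 2: Coverage rates of the 95% confidence intervals of the fixed effects; bold and underlined cells are those that are significantly different from the nominal 95% based on 4,000 simulations under each design (1,000 simulations for each MAF combined into one summary statistic).

| Error distribution | Term | Sparse Complete, N=1,000 | Sparse Complete, N=3,000 | Intense Complete, N=1,000 | Intense Complete, N=3,000 | Equal Unbalanced, N=1,000 | Equal Unbalanced, N=3,000 | Unbalanced with more samples around the adiposity rebound, N=1,000 | Unbalanced with more samples around the adiposity rebound, N=3,000 | Unbalanced with fewer samples around the adiposity rebound, N=1,000 | Unbalanced with fewer samples around the adiposity rebound, N=3,000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gaussian Distribution | SNP | 95.43 | 95.03 | 95.08 | 95.45 | 94.83 | 95.23 | 95.08 | 94.70 | 95.40 | 94.73 |
| | SNP*age | 95.00 | 95.23 | 94.58 | 95.13 | 94.35 | 94.63 | 94.30 | 93.90 | 94.53 | 94.35 |
| t-distribution | SNP | 95.45 | 95.35 | 95.90 | 94.55 | 95.13 | 94.85 | 94.65 | 94.48 | 95.48 | 94.95 |
| | SNP*age | 95.30 | 94.80 | 94.05 | 94.13 | 94.45 | 94.10 | 93.70 | 94.00 | 93.33 | 94.03 |
| Skew-normal Distribution | SNP | 94.90 | 95.03 | 95.18 | 95.10 | 95.05 | 94.25 | 95.43 | 94.95 | 94.85 | 94.75 |
| | SNP*age | 95.68 | 95.18 | 94.63 | 94.65 | 93.88 | 93.73 | 94.73 | 94.13 | 93.90 | 93.55 |
| Mixture of 2 Gaussian Distributions | SNP | 94.85 | 94.98 | 94.83 | 95.65 | 95.00 | 95.08 | 94.53 | 95.40 | 94.33 | 94.48 |
| | SNP*age | 95.05 | 94.78 | 95.03 | 94.60 | 95.20 | 94.08 | 94.58 | 95.20 | 94.80 | 94.10 |
| Variance dependent on a covariate | SNP | 94.93 | 95.05 | 95.83 | 95.35 | 94.43 | 94.75 | 94.70 | 94.93 | 94.98 | 94.93 |
| | SNP*age | 94.93 | 95.03 | 94.95 | 94.15 | 94.03 | 94.10 | 93.75 | 94.53 | 93.95 | 93.93 |
| Variance greater at adiposity rebound | SNP | 94.75 | 95.23 | 95.15 | 95.08 | 94.25 | 95.20 | 95.48 | 95.35 | 95.23 | 94.43 |
| | SNP*age | 94.05 | 94.45 | 95.38 | 95.38 | 94.13 | 94.43 | 94.00 | 94.75 | 94.73 | 94.60 |
| Variance increasing over time | SNP | 94.20 | 95.00 | 94.08 | 94.38 | 94.80 | 94.33 | 94.30 | 94.03 | 95.98 | 95.48 |
| | SNP*age | 94.10 | 94.88 | 91.78 | 92.38 | 94.70 | 94.23 | 93.28 | 93.48 | 95.65 | 95.25 |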
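One common way to decide whether a simulated proportion differs from its nominal value is to compare it with limits based on the Monte Carlo standard error. The Python sketch below assumes a normal approximation to the binomial distribution; whether this matches the exact rule used to flag cells in the tables is an assumption, and the function names are hypothetical.

```python
import math


def monte_carlo_limits(nominal, n_sims, z=1.96):
    """Limits within which a simulated proportion is compatible with `nominal`.

    Uses the Monte Carlo standard error sqrt(p * (1 - p) / n_sims) of a
    proportion estimated from n_sims independent simulations.
    """
    se = math.sqrt(nominal * (1.0 - nominal) / n_sims)
    return nominal - z * se, nominal + z * se


def differs_from_nominal(estimate, nominal, n_sims, z=1.96):
    """True if the simulated proportion falls outside the Monte Carlo limits."""
    lower, upper = monte_carlo_limits(nominal, n_sims, z)
    return estimate < lower or estimate > upper
```

Under these assumptions, coverage based on 4,000 simulations would be flagged when it falls outside roughly 94.3% to 95.7%, and a type 1 error based on 20,000 simulations at α=0.05 when it falls outside roughly 0.047 to 0.053.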

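Table 3: Type 1 error for the complete designs; bold and underlined cells are those that are significantly different from the nominal α=0.05 based on 20,000 simulations under each design (5,000 simulations for each MAF combined into one summary statistic).

| Error distribution | Test | Sparse Complete, N=1,000, Standard | Sparse Complete, N=1,000, Robust | Sparse Complete, N=3,000, Standard | Sparse Complete, N=3,000, Robust | Intense Complete, N=1,000, Standard | Intense Complete, N=1,000, Robust | Intense Complete, N=3,000, Standard | Intense Complete, N=3,000, Robust |
|---|---|---|---|---|---|---|---|---|---|
| Gaussian Distribution | SNP | 0.0514 | 0.0528 | 0.0509 | 0.0513 | 0.0502 | 0.0521 | 0.0500 | 0.0510 |
| | SNP*age | 0.0483 | 0.0504 | 0.0483 | 0.0491 | 0.0549 | 0.0486 | 0.0539 | 0.0467 |
| | Global Wald test | 0.0497 | | 0.0478 | | 0.0605 | | 0.0620 | |
| t-distribution | SNP | 0.0495 | 0.0498 | 0.0489 | 0.0496 | 0.0479 | 0.0510 | 0.0483 | 0.0502 |
| | SNP*age | 0.0521 | 0.0534 | 0.0487 | 0.0492 | 0.0581 | 0.0490 | 0.0563 | 0.0465 |
| | Global Wald test | 0.0531 | | 0.0508 | | 0.0624 | | 0.0629 | |
| Skew-normal Distribution | SNP | 0.0502 | 0.0517 | 0.0524 | 0.0524 | 0.0509 | 0.0526 | 0.0525 | 0.0532 |
| | SNP*age | 0.0503 | 0.0519 | 0.0461 | 0.0474 | 0.0541 | 0.0508 | 0.0529 | 0.0486 |
| | Global Wald test | 0.0493 | | 0.0488 | | 0.0621 | | 0.0579 | |
| Mixture of 2 Gaussian Distributions | SNP | 0.0498 | 0.0504 | 0.0479 | 0.0479 | 0.0485 | 0.0499 | 0.0510 | 0.0508 |
| | SNP*age | 0.0502 | 0.0510 | 0.0492 | 0.0488 | 0.0528 | 0.0506 | 0.0529 | 0.0495 |
| | Global Wald test | 0.0498 | | 0.0508 | | 0.0615 | | 0.0586 | |
| Variance dependent on a covariate | SNP | 0.0523 | 0.0527 | 0.0488 | 0.0490 | 0.0485 | 0.0511 | 0.0459 | 0.0485 |
| | SNP*age | 0.0546 | 0.0527 | 0.0531 | 0.0514 | 0.0520 | 0.0493 | 0.0524 | 0.0481 |
| | Global Wald test | 0.0515 | | 0.0525 | | 0.0556 | | 0.0546 | |
| Variance greater at adiposity rebound | SNP | 0.0472 | 0.0478 | 0.0511 | 0.0519 | 0.0477 | 0.0493 | 0.0471 | 0.0490 |
| | SNP*age | 0.0528 | 0.0497 | 0.0570 | 0.0528 | 0.0513 | 0.0513 | 0.0491 | 0.0487 |
| | Global Wald test | 0.0527 | | 0.0540 | | 0.0502 | | 0.0478 | |
| Variance increasing over time | SNP | 0.0523 | 0.0536 | 0.0471 | 0.0473 | 0.0543 | 0.0513 | 0.0561 | 0.0522 |
| | SNP*age | 0.0564 | 0.0538 | 0.0522 | 0.0491 | 0.0746 | 0.0528 | 0.0746 | 0.0530 |
| | Global Wald test | 0.0875 | 0.0549 | 0.0875 | 0.0497 | 0.1667 | 0.0506 | 0.1685 | 0.0506 |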

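Table 4: Type 1 error for the unbalanced designs; bold and underlined cells are those that are significantly different from the nominal α=0.05 based on 20,000 simulations under each design (5,000 simulations for each MAF combined into one summary statistic).

| Error distribution | Test | Equal Unbalanced, N=1,000, Standard | Equal Unbalanced, N=1,000, Robust | Equal Unbalanced, N=3,000, Standard | Equal Unbalanced, N=3,000, Robust | Unbalanced with more samples around the adiposity rebound, N=1,000, Standard | Unbalanced with more samples around the adiposity rebound, N=1,000, Robust | Unbalanced with more samples around the adiposity rebound, N=3,000, Standard | Unbalanced with more samples around the adiposity rebound, N=3,000, Robust | Unbalanced with fewer samples around the adiposity rebound, N=1,000, Standard | Unbalanced with fewer samples around the adiposity rebound, N=1,000, Robust | Unbalanced with fewer samples around the adiposity rebound, N=3,000, Standard | Unbalanced with fewer samples around the adiposity rebound, N=3,000, Robust |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gaussian Distribution | SNP | 0.0518 | 0.0532 | 0.0500 | 0.0508 | 0.0503 | 0.0521 | 0.0478 | 0.0490 | 0.0529 | 0.0550 | 0.0540 | 0.0542 |
| | SNP*age | 0.0581 | 0.0526 | 0.0592 | 0.0531 | 0.0566 | 0.0514 | 0.0556 | 0.0496 | 0.0560 | 0.0511 | 0.0575 | 0.0509 |
| | Global Wald test | 0.0646 | | 0.0598 | | 0.0601 | | 0.0615 | | 0.0621 | | 0.0609 | |
| t-distribution | SNP | 0.0510 | 0.0522 | 0.0491 | 0.0497 | 0.0485 | 0.0500 | 0.0495 | 0.0505 | 0.0487 | 0.0499 | 0.0516 | 0.0523 |
| | SNP*age | 0.0571 | 0.0487 | 0.0629 | 0.0539 | 0.0596 | 0.0508 | 0.0571 | 0.0475 | 0.0563 | 0.0487 | 0.0577 | 0.0489 |
| | Global Wald test | 0.0607 | | 0.0621 | | 0.0620 | | 0.0583 | | 0.0587 | | 0.0605 | |
| Skew-normal Distribution | SNP | 0.0493 | 0.0508 | 0.0495 | 0.0501 | 0.0498 | 0.0517 | 0.0473 | 0.0481 | 0.0512 | 0.0519 | 0.0482 | 0.0484 |
| | SNP*age | 0.0548 | 0.0492 | 0.0589 | 0.0526 | 0.0580 | 0.0512 | 0.0571 | 0.0498 | 0.0593 | 0.0532 | 0.0547 | 0.0490 |
| | Global Wald test | 0.0618 | | 0.0582 | | 0.0616 | | 0.0583 | | 0.0632 | | 0.0575 | |
| Mixture of 2 Gaussian Distributions | SNP | 0.0519 | 0.0527 | 0.0490 | 0.0490 | 0.0505 | 0.0510 | 0.0519 | 0.0517 | 0.0510 | 0.0522 | 0.0487 | 0.0483 |
| | SNP*age | 0.0534 | 0.0517 | 0.0487 | 0.0459 | 0.0510 | 0.0494 | 0.0538 | 0.0518 | 0.0543 | 0.0511 | 0.0551 | 0.0517 |
| | Global Wald test | 0.0579 | | 0.0581 | | 0.0605 | | 0.0603 | | 0.0589 | | 0.0568 | |
| Variance dependent on a covariate | SNP | 0.0495 | 0.0515 | 0.0482 | 0.0491 | 0.0498 | 0.0513 | 0.0506 | 0.0509 | 0.0512 | 0.0518 | 0.0528 | 0.0502 |
| | SNP*age | 0.0586 | 0.0499 | 0.0607 | 0.0514 | 0.0576 | 0.0505 | 0.0588 | 0.0497 | 0.0605 | 0.0507 | 0.0604 | 0.0507 |
| | Global Wald test | 0.0589 | | 0.0611 | | 0.0597 | | 0.0567 | | 0.0620 | | 0.0583 | |
| Variance greater at adiposity rebound | SNP | 0.0493 | 0.0504 | 0.0492 | 0.0495 | 0.0486 | 0.0498 | 0.0516 | 0.0526 | 0.0506 | 0.0514 | 0.0496 | 0.0531 |
| | SNP*age | 0.0570 | 0.0491 | 0.0563 | 0.0483 | 0.0546 | 0.0482 | 0.0563 | 0.0503 | 0.0561 | 0.0483 | 0.0600 | 0.0505 |
| | Global Wald test | 0.0572 | | 0.0559 | | 0.0568 | | 0.0541 | | 0.0588 | | 0.0569 | |
| Variance increasing over time | SNP | 0.0533 | 0.0545 | 0.0500 | 0.0502 | 0.0564 | 0.0563 | 0.0530 | 0.0520 | 0.0491 | 0.0523 | 0.0500 | 0.0526 |
| | SNP*age | 0.0554 | 0.0536 | 0.0571 | 0.0540 | 0.0643 | 0.0570 | 0.0610 | 0.0527 | 0.0497 | 0.0534 | 0.0497 | 0.0513 |
| | Global Wald test | 0.0911 | 0.0576 | 0.0929 | 0.0529 | 0.1031 | 0.0578 | 0.1011 | 0.0548 | 0.0850 | 0.0559 | 0.0801 | 0.0520 |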


Appendix D: Additional Results from Simulation Analysis in Chapter Four

Table 1: Coverage rates of the 95% confidence intervals of the fixed effects under the sparse complete design; bold and underlined cells are those that are significantly different from the nominal 95% based on 1,000 simulations.

MAF         0.1               0.2               0.3               0.4
Sample Size N=1,000  N=3,000  N=1,000  N=3,000  N=1,000  N=3,000  N=1,000  N=3,000

Gaussian Distribution
SNP         95.2     94.7     94.6     94.2     96.4     95.5     95.5     95.7
SNP*age     95.4     94.6     95.3     94.9     95.6     95.2     93.7     96.2
SNP*age2    95.2     95.7     95.1     94.3     94.4     95.7     95.4     95.0
SNP*age3    94.6     95.6     95.7     93.7     95.9     95.1     95.4     94.0

t-distribution
SNP         96.0     95.3     95.3     94.9     94.7     96.4     95.8     94.8
SNP*age     95.9     93.9     94.5     94.6     95.2     95.5     95.6     95.2
SNP*age2    96.2     94.4     95.1     94.9     94.7     95.9     95.4     94.9
SNP*age3    95.5     95.5     95.0     94.1     94.8     96.8     94.5     94.4

Skew-normal Distribution
SNP         95.7     95.9     93.9     94.5     94.5     94.8     95.5     94.9
SNP*age     95.3     94.6     95.6     95.4     96.1     94.8     95.7     95.9
SNP*age2    95.5     96.2     95.8     95.9     94.7     95.0     95.1     94.5
SNP*age3    93.4     94.4     95.6     94.7     95.6     95.1     95.3     94.6

Mixture of 2 Gaussian Distributions
SNP         94.9     95.7     95.0     94.3     94.0     95.5     95.5     94.4
SNP*age     95.5     95.5     95.7     94.7     94.2     93.7     94.8     95.2
SNP*age2    95.6     94.2     95.0     94.4     94.0     95.5     94.4     93.9
SNP*age3    96.2     95.2     95.2     95.9     95.8     94.5     95.1     95.2

Variance dependent on a covariate
SNP         94.0     95.7     93.8     95.2     95.7     94.4     96.2     94.9
SNP*age     93.4     95.7     95.6     95.9     94.9     93.6     95.8     94.9
SNP*age2    94.6     95.1     94.5     94.4     94.7     95.2     94.4     95.0
SNP*age3    94.0     95.9     95.4     95.3     94.1     95.1     94.5     95.0

Variance greater at adiposity rebound
SNP         95.9     94.9     95.0     94.9     94.7     96.0     93.4     95.1
SNP*age     94.5     94.7     94.4     94.5     93.1     94.2     94.2     94.4
SNP*age2    95.0     95.9     95.6     95.5     95.4     94.2     94.5     94.6
SNP*age3    95.6     93.4     93.6     92.8     94.4     95.1     96.3     93.4

Variance increasing over time
SNP         95.5     94.6     93.8     94.0     93.1     95.7     94.4     95.7
SNP*age     95.0     94.1     94.3     95.3     92.2     94.8     94.9     95.3
SNP*age2    94.1     95.9     96.7     95.3     95.5     95.6     95.0     95.1
SNP*age3    88.9     88.3     90.1     89.5     88.6     88.7     88.7     89.3
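One common way to judge whether an observed coverage rate differs from the nominal 95% is to compare it with the Monte Carlo sampling band of a binomial proportion. The sketch below assumes a normal approximation with 1,000 simulations; this is an illustrative assumption, since the exact criterion used to flag cells is not stated here, and the function names are hypothetical.

```python
import math

def coverage_band(nominal=0.95, n_sims=1000, z=1.96):
    """Approximate 95% Monte Carlo band for an observed coverage proportion."""
    se = math.sqrt(nominal * (1.0 - nominal) / n_sims)
    return nominal - z * se, nominal + z * se

def differs_from_nominal(coverage_percent, nominal=0.95, n_sims=1000):
    """True if the observed coverage (in percent) falls outside the band."""
    low, high = coverage_band(nominal, n_sims)
    return not (low <= coverage_percent / 100.0 <= high)

low, high = coverage_band()
flag_low_coverage = differs_from_nominal(88.9)   # e.g. a SNP*age3 cell under increasing variance
flag_near_nominal = differs_from_nominal(95.2)   # e.g. a cell under the Gaussian Distribution
```

Under this approximation the band runs from roughly 93.6% to 96.4%, so a coverage of 88.9% lies well outside it while 95.2% lies inside.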

Table 2: Coverage rates of the 95% confidence intervals of the fixed effects under the intense complete design; bold and underlined cells are those that are significantly different from the nominal 95% based on 1,000 simulations.









Table 3: Coverage rates of the 95% confidence intervals of the fixed effects under the equal unbalanced design; bold and underlined cells are those that are significantly different from the nominal 95% based on 1,000 simulations.









Table 4: Coverage rates of the 95% confidence intervals of the fixed effects under the unbalanced design with more samples around the adiposity rebound; bold and underlined cells are those that are significantly different from the nominal 95% based on 1,000 simulations.

SNP         93.6     94.7     95.7     94.1     95.3     93.9     95.7     93.9
SNP*age     94.6     93.9     94.2     94.5     93.5     94.3     94.9     93.9
SNP*age2    94.7     94.1     94.8     95.2     95.3     94.6     93.8     94.8
SNP*age3    91.8     92.9     92.3     92.3     92.3     92.6     92.5     92.6

SNP         95.0     94.9     94.3     93.9     95.0     94.2     94.3     94.9
SNP*age     94.3     94.2     93.2     94.2     92.8     93.5     94.5     94.1
SNP*age2    95.9     96.0     96.1     95.2     94.9     94.5     95.7     93.9
SNP*age3    93.7     93.2     93.1     92.3     92.3     93.5     92.4     91.9

SNP         95.3     94.0     94.1     94.6     97.0     95.0     95.3     96.2
SNP*age     94.6     93.1     94.4     94.6     94.2     94.0     95.7     94.8
SNP*age2    95.5     94.2     94.0     95.7     95.6     96.1     95.1     94.4
SNP*age3    93.4     90.9     92.9     92.9     93.0     93.0     91.9     92.7

SNP         95.3     95.5     93.7     95.8     94.8     95.1     94.3     95.2
SNP*age     95.0     95.3     94.3     96.2     93.9     94.4     95.1     94.9
SNP*age2    94.4     94.6     94.6     95.0     94.3     95.5     95.1     94.1
SNP*age3    91.8     92.8     92.0     94.6     93.6     92.7     92.7     93.1

SNP         94.5     95.2     94.7     94.1     95.4     95.1     94.2     95.3
SNP*age     93.6     95.1     95.4     94.6     94.2     93.6     91.8     94.8
SNP*age2    95.3     95.5     94.5     94.4     95.8     94.9     94.9     94.0
SNP*age3    95.0     94.0     94.0     92.9     92.5     92.3     92.7     93.8

SNP         95.4     96.4     94.7     93.3     95.3     96.6     96.5     95.1
SNP*age     93.9     94.7     94.4     95.5     94.7     93.5     93.0     95.3
SNP*age2    94.9     94.2     94.5     94.7     96.1     96.2     95.8     96.0
SNP*age3    93.3     94.0     94.4     94.2     93.1     94.3     93.5     94.7

SNP         95.3     93.6     93.6     95.0     93.9     93.5     94.4     94.0
SNP*age     93.3     93.7     94.5     93.6     92.1     93.3     93.2     93.3
SNP*age2    93.6     92.9     92.7     93.9     95.1     95.0     93.1     92.1
SNP*age3    85.7     86.4     86.5     85.3     85.8     87.2     84.9     85.8

Table 5: Coverage rates of the 95% confidence intervals of the fixed effects under the unbalanced design with fewer samples around the adiposity rebound; bold and underlined cells are those that are significantly different from the nominal 95% based on 1,000 simulations.









Table 6: Bias and 95% confidence interval for the sparse complete design; bold and underlined cells are those whose confidence interval does not cover zero based on 1,000 simulations.

MAF 0.1 0.2 0.3 0.4

Sample Size N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000


SNP -0.001 (-0.0109,0.009)

0.0046 (-0.0014,0.0107)

0.0011 (-0.0065,0.0086)

-0.0019 (-0.0063,0.0026)

0.0029 (-0.0036,0.0094)

0.0029 (-0.0009,0.0067)

0.002 (-0.0042,0.0082)

-0.0007 (-0.0041,0.0028)

SNP*age -0.0005 (-0.0022,0.0012)

0.0007 (-0.0002,0.0017)

0.0001 (-0.0011,0.0013)

0.0003 (-0.0004,0.001)

0.0006 (-0.0005,0.0016)

0.0001 (-0.0005,0.0007)

0.0005 (-0.0006,0.0015)

-0.0003 (-0.0009,0.0003)

SNP*age2 0.00009 (-0.00006,0.00024)

-0.00002 (-0.00011,0.00006)

-0.00002 (-0.00013,0.00009)

-0.00001 (-0.00008,0.00005)

-0.00004 (-0.00014,0.00006)

-0.00005 (-0.00011,0)

-0.00001 (-0.0001,0.00008)

-0.00001 (-0.00006,0.00005)

SNP*age3 0.000004 (-0.00002,0.00003)

-0.000001 (-0.00002,0.00001)

-0.000002 (-0.00002,0.00002)

-0.00001 (-0.00002,0.000003)

0.000001 (-0.00002,0.00002)

-0.000001 (-0.00001,0.00001)

-0.000003 (-0.00002,0.00001)

0.000004 (-0.00001,0.00001)

t-distribution

SNP -0.008 (-0.0182,0.0021)

-0.0006 (-0.0067,0.0054)

0.0072 (-0.0006,0.015)

-0.0041 (-0.0088,0.0006)

0.0002 (-0.0067,0.0071)

0.0004 (-0.0034,0.0041)

0.0013 (-0.0051,0.0078)

0.0015 (-0.0022,0.0052)

SNP*age -0.0011 (-0.003,0.0007)

-0.0004 (-0.0015,0.0007)

0.001 (-0.0004,0.0024)

-0.0003 (-0.0012,0.0005)

-0.0005 (-0.0018,0.0007)

0 (-0.0007,0.0007)

0.0004 (-0.0008,0.0016)

-0.0001 (-0.0008,0.0006)

SNP*age2 -0.00002 (-0.00019,0.00014)

0.00006 (-0.00004,0.00016)

-0.00003 (-0.00016,0.00009)

0.00004 (-0.00004,0.00011)

0 (-0.00012,0.00011)

0.00003 (-0.00003,0.0001)

0.00003 (-0.00007,0.00014)

-0.00001 (-0.00007,0.00005)

SNP*age3 -0.000002 (-0.00004,0.00003)

0.000011 (-0.00001,0.00003)

-0.000005 (-0.00003,0.00002)

-0.000001 (-0.00002,0.00002)

0.00001 (-0.00001,0.00003)

0.000005 (-0.00001,0.00002)

0 (-0.00002,0.00002)

0.000007 (-0.00001,0.00002)


SNP 0.0004 (-0.0095,0.0103)

-0.0015 (-0.0072,0.0042)

-0.0093 (-0.017,-0.0017)

0.0026 (-0.0018,0.007)

-0.0002 (-0.0068,0.0063)

-0.0014 (-0.0052,0.0024)

-0.0015 (-0.0075,0.0045)

-0.0007 (-0.0042,0.0029)

SNP*age 0 (-0.0017,0.0017)

0.0002 (-0.0008,0.0012)

-0.0002 (-0.0014,0.0011)

0.0004 (-0.0003,0.0011)

0.0012 (0.0001,0.0022)

-0.0002 (-0.0008,0.0005)

-0.0003 (-0.0013,0.0007)

-0.0004 (-0.0009,0.0002)

SNP*age2 0 (-0.00015,0.00015)

0.00004 (-0.00004,0.0001)

0.00008 (-0.00003,0.00019)

-0.00001 (-0.00008,0.00005)

0.0001 (0,0.0002)

0.00006 (0,0.00011)

0.00005 (-0.00004,0.00015)

-0.00003 (-0.00009,0.00002)

SNP*age3 -0.000006 (-0.00003,0.00002)

-0.000004 (-0.00002,0.00001)

-0.000016 (-0.00004,0.000003)

-0.000002 (-0.00001,0.00001)

-0.000025 (-0.00004,-0.00001)

0.000006 (-0.000004,0.00002)

0.000001 (-0.00002,0.00002)

0.000005 (-0.00001,0.00001)


SNP -0.005 (-0.0152,0.0052)

-0.0025 (-0.0082,0.0033)

0.0024 (-0.0053,0.0101)

-0.0027 (-0.0073,0.0019)

-0.0017 (-0.0086,0.0051)

0.0013 (-0.0025,0.0052)

0.0034 (-0.0027,0.0096)

-0.0027 (-0.0064,0.0009)

SNP*age 0 (-0.0014,0.0013)

-0.0003 (-0.0011,0.0005)

0 (-0.0011,0.001)

-0.0006 (-0.0012,0)

-0.0001 (-0.001,0.0009)

0.0003 (-0.0003,0.0009)

0.0002 (-0.0007,0.001)

-0.0006 (-0.0011,-0.0001)

SNP*age2 0.00011 (-0.00002,0.00023)

-0.00001 (-0.00008,0.00006)

-0.00001 (-0.00011,0.00008)

0.00003 (-0.00002,0.00009)

0.00003 (-0.00005,0.00012)

0.00002 (-0.00003,0.00007)

0 (-0.00008,0.00008)

0.00002 (-0.00003,0.00006)

SNP*age3 0.000002 (-0.00001,0.00002)

0.000005 (-0.00001,0.00001)

0.000003 (-0.00001,0.00001)

0.000005 (-0.000001,0.00001)

0 (-0.00001,0.000009)

-0.000003 (-0.00001,0.000002)

-0.000003 (-0.00001,0.00001)

0.000002 (-0.00001,0.00001)


SNP 0.0032 (-0.0074,0.0138)

-0.0016 (-0.0074,0.0041)

0.0006 (-0.0073,0.0085)

0.0027 (-0.0018,0.0073)

-0.0021 (-0.0088,0.0046)

-0.0018 (-0.0057,0.0022)

-0.0001 (-0.0063,0.0061)

0 (-0.0036,0.0037)

SNP*age 0 (-0.0019,0.0019)

-0.0004 (-0.0014,0.0006)

-0.0009 (-0.0023,0.0005)

-0.0002 (-0.0009,0.0006)

0.0003 (-0.0009,0.0015)

0.0001 (-0.0006,0.0008)

0.0001 (-0.001,0.0012)

0.0001 (-0.0005,0.0008)

SNP*age2 0.00001 (-0.00016,0.00017)

-0.00002 (-0.00011,0.00007)

0 (-0.00012,0.00012)

-0.00005 (-0.00012,0.00002)

0.00003 (-0.00007,0.00014)

0.00004 (-0.00002,0.0001)

-0.00001 (-0.00011,0.00008)

0.00001 (-0.00004,0.00007)

SNP*age3 0.000018 (-0.000014,0.000051)

0.000005 (-0.00001,0.00002)

0.00002 (-0.000003,0.00004)

0.000007 (-0.00001,0.00002)

-0.000004 (-0.00003,0.00002)

-0.000004 (-0.00002,0.00001)

-0.000005 (-0.00002,0.00002)

0.000001 (-0.00001,0.00001)


SNP -0.002 (-0.0121,0.0082)

0.0008 (-0.0051,0.0067)

-0.0014 (-0.0091,0.0062)

0.0001 (-0.0043,0.0046)

0.0016 (-0.0051,0.0084)

-0.0012 (-0.005,0.0026)

0.0027 (-0.0038,0.0092)

-0.0013 (-0.005,0.0024)

SNP*age -0.0007 (-0.0025,0.001)

-0.0004 (-0.0015,0.0006)

0.0007 (-0.0006,0.0021)

0.0001 (-0.0007,0.0009)

0.0008 (-0.0004,0.002)

0 (-0.0006,0.0007)

0 (-0.0011,0.0011)

-0.0003 (-0.0009,0.0003)

SNP*age2 0.00003 (-0.00012,0.00019)

-0.00001 (-0.0001,0.00008)

0.00004 (-0.00007,0.00015)

0.00004 (-0.00003,0.00011)

0.00003 (-0.00007,0.00013)

0.00001 (-0.00005,0.00007)

-0.00002 (-0.00011,0.00008)

-0.00001 (-0.00006,0.00005)

SNP*age3 0.00001 (-0.000019,0.000039)

0.00001 (-0.00001,0.00003)

-0.000016 (-0.00004,0.00001)

-0.000002 (-0.00002,0.00001)

-0.000012 (-0.00003,0.00001)

0.000001 (-0.00001,0.00001)

0.00001 (-0.00001,0.00003)

0 (-0.00001,0.00001)


SNP 0.0028 (-0.0075,0.0131)

-0.0082 (-0.0141,-0.0022)

-0.0055 (-0.0132,0.0023)

-0.0038 (-0.0083,0.0007)

-0.0008 (-0.0081,0.0065)

-0.0023 (-0.0061,0.0015)

-0.0002 (-0.0065,0.0061)

-0.0036 (-0.0071,0)

SNP*age 0.0017 (-0.0002,0.0036)

-0.0011 (-0.0022,0)

-0.0005 (-0.0019,0.0009)

-0.0003 (-0.001,0.0005)

0.0002 (-0.0011,0.0015)

-0.0004 (-0.0011,0.0003)

-0.0001 (-0.0012,0.001)

0 (-0.0006,0.0007)

SNP*age2 0.00006 (-0.00014,0.00026)

0.00004 (-0.00007,0.00014)

0.00003 (-0.0001,0.00017)

-0.00005 (-0.00013,0.00004)

0.00008 (-0.00005,0.00021)

-0.00002 (-0.00009,0.00005)

-0.00003 (-0.00015,0.00009)

0.00005 (-0.00002,0.00012)

SNP*age3 -0.000014 (-0.00005,0.00002)

0.000009 (-0.00001,0.00003)

-0.000003 (-0.00003,0.00003)

-0.000002 (-0.00002,0.00001)

-0.000001 (-0.00003,0.00002)

-0.000001 (-0.00002,0.00001)

0.000007 (-0.00002,0.00003)

0.000001 (-0.00001,0.00001)
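The entries of a bias table of this kind can be illustrated with a short sketch. It assumes that bias is the mean of the simulated estimates minus the true parameter value and that the interval is a normal-approximation Monte Carlo interval (mean ± 1.96 standard errors); the function names and toy inputs are hypothetical and are not taken from the thesis.

```python
import math
import statistics

def bias_with_ci(estimates, true_value, z=1.96):
    """Bias of the simulated estimates with a normal-approximation 95% interval."""
    n = len(estimates)
    bias = statistics.fmean(estimates) - true_value
    # Monte Carlo standard error of the mean estimate
    mc_se = statistics.stdev(estimates) / math.sqrt(n)
    return bias, (bias - z * mc_se, bias + z * mc_se)

def ci_excludes_zero(ci):
    """True if the interval lies entirely above or entirely below zero."""
    low, high = ci
    return low > 0 or high < 0

# Hypothetical estimates of a parameter whose true value is 1.0
unbiased_bias, unbiased_ci = bias_with_ci([0.9, 1.0, 1.1, 1.0], true_value=1.0)
shifted_bias, shifted_ci = bias_with_ci([1.1, 1.2, 1.1, 1.2], true_value=1.0)
```

In this sketch a cell would be highlighted when `ci_excludes_zero` returns true, which mirrors the criterion given in the table captions.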

Table 7: Bias and 95% confidence interval for the intense complete design; bold and underlined cells are those whose confidence interval does not cover zero based on 1,000 simulations.

MAF 0.1 0.2 0.3 0.4



SNP -0.0073 (-0.0178,0.0032)

0.0002 (-0.0056,0.006)

-0.0052 (-0.0127,0.0022)

0.0041 (-0.0001,0.0083)

-0.0007 (-0.007,0.0056)

0.0013 (-0.0024,0.005)

-0.0043 (-0.0104,0.0019)

0.0032 (-0.0003,0.0067)

SNP*age -0.0007 (-0.0024,0.0009)

-0.0008 (-0.0017,0.0001)

-0.0008 (-0.002,0.0004)

-0.0001 (-0.0007,0.0006)

0 (-0.001,0.001)

0.0001 (-0.0005,0.0007)

-0.0001 (-0.0011,0.0008)

0.0005 (-0.0001,0.001)

SNP*age2 0.0001 (-0.00005,0.00025)

-0.00004 (-0.00013,0.00004)

0.00008 (-0.00003,0.00019)

-0.00002 (-0.00008,0.00004)

0.00001 (-0.00009,0.0001)

-0.00002 (-0.00007,0.00003)

0.00005 (-0.00004,0.00014)

0 (-0.00006,0.00005)

SNP*age3 0.000002 (-0.00002,0.00003)

0.000006 (-0.00001,0.00002)

0.000009 (-0.00001,0.00003)

0.000007 (-0.000003,0.00002)

0.000005 (-0.00001,0.00002)

0.000003 (-0.00001,0.00001)

-0.000001 (-0.00002,0.00001)

0.000002 (-0.00001,0.00001)

t-distribution

SNP 0.0007 (-0.0098,0.0113)

0.0011 (-0.0051,0.0074)

-0.0042 (-0.012,0.0036)

0.0007 (-0.0037,0.0051)

0.0025 (-0.0041,0.0091)

-0.0007 (-0.0048,0.0034)

-0.0022 (-0.0084,0.0039)

0.0008 (-0.0029,0.0045)

SNP*age 0.0012 (-0.0006,0.003)

-0.0001 (-0.0011,0.001)

-0.0004 (-0.0017,0.001)

0.0002 (-0.0006,0.001)

0.0004 (-0.0007,0.0016)

-0.0006 (-0.0013,0.0001)

-0.0011 (-0.0022,-0.0001)

0.0002 (-0.0004,0.0009)

SNP*age2 0.00004 (-0.00014,0.00021)

0 (-0.0001,0.0001)

0.00005 (-0.00007,0.00018)

-0.00003 (-0.0001,0.00004)

0 (-0.0001,0.00011)

-0.00001 (-0.00008,0.00005)

0.00009 (-0.00001,0.00019)

-0.00002 (-0.00008,0.00004)

SNP*age3 -0.00001 (-0.00004,0.00002)

0.000006 (-0.00001,0.00003)

0.000004 (-0.00002,0.00003)

-0.000003 (-0.00002,0.00001)

0.000001 (-0.00002,0.00002)

0.000006 (-0.00001,0.00002)

0.000011 (-0.00001,0.00003)

-0.000003 (-0.00001,0.00001)


SNP 0.0033 (-0.0069,0.0136)

0.0006 (-0.0051,0.0064)

0.0027 (-0.0048,0.0101)

0.0026 (-0.0017,0.0068)

-0.0015 (-0.0081,0.0051)

0.0006 (-0.0031,0.0043)

-0.0009 (-0.007,0.0052)

-0.0004 (-0.0039,0.0031)

SNP*age 0.0007 (-0.0008,0.0023)

0 (-0.001,0.0009)

0.0009 (-0.0003,0.002)

0.0001 (-0.0006,0.0007)

-0.0004 (-0.0014,0.0006)

0.0003 (-0.0003,0.0009)

-0.0001 (-0.0011,0.0009)

-0.0001 (-0.0007,0.0005)

SNP*age2 -0.00014 (-0.00029,0.00001)

0.00002 (-0.00006,0.0001)

0 (-0.00011,0.00011)

-0.00003 (-0.00009,0.00003)

-0.00002 (-0.00011,0.00008)

-0.00003 (-0.00009,0.00002)

0.00001 (-0.00008,0.0001)

-0.00001 (-0.00006,0.00004)

SNP*age3 -0.00001 (-0.00003,0.00001)

0.000008 (-0.00001,0.00002)

-0.000006 (-0.00002,0.00001)

0.000004 (-0.00001,0.00001)

0 (-0.00002,0.00002)

-0.000007 (-0.00002,0.000002)

0.000007 (-0.00001,0.00002)

-0.000001 (-0.00001,0.000007)


SNP -0.0028 (-0.0131,0.0075)

-0.0004 (-0.0061,0.0054)

0.0026 (-0.0051,0.0103)

-0.0007 (-0.005,0.0037)

0.0044 (-0.0021,0.0108)

0.0014 (-0.0023,0.0051)

-0.0063 (-0.0127,0.0001)

-0.0033 (-0.0069,0.0003)

SNP*age -0.001 (-0.0023,0.0003)

0 (-0.0008,0.0008)

-0.0001 (-0.0012,0.0009)

0.0002 (-0.0004,0.0008)

0.0003 (-0.0006,0.0012)

0.0001 (-0.0004,0.0006)

-0.0002 (-0.0011,0.0006)

-0.0004 (-0.0009,0)

SNP*age2 -0.00001 (-0.00014,0.00011)

-0.00002 (-0.0001,0.00005)

-0.00005 (-0.00014,0.00004)

0 (-0.00005,0.00005)

-0.00003 (-0.00011,0.00006)

-0.00006 (-0.0001,-0.00002)

0.00003 (-0.00005,0.00011)

0.00001 (-0.00004,0.00005)

SNP*age3 0.000014 (0.000001,0.000028)

-0.000001 (-0.00001,0.00001)

0.000003 (-0.00001,0.00001)

0.000001 (-0.00001,0.00001)

0.000004 (-0.00001,0.00001)

0.000001 (-0.000004,0.00001)

0.000004 (-0.00001,0.00001)

0.000005 (0,0.00001)


SNP -0.0052 (-0.0153,0.005)

0.0039 (-0.0021,0.01)

-0.0051 (-0.0125,0.0024)

0.0018 (-0.0024,0.0061)

0.0031 (-0.0037,0.0098)

0.0001 (-0.0037,0.0039)

-0.0092 (-0.0155,-0.0028)

0.0032 (-0.0003,0.0067)

SNP*age -0.0003 (-0.0019,0.0014)

0 (-0.001,0.001)

0.0004 (-0.0009,0.0016)

0.0001 (-0.0006,0.0009)

0.0005 (-0.0006,0.0017)

0 (-0.0007,0.0007)

-0.0014 (-0.0024,-0.0003)

0.0004 (-0.0002,0.001)

SNP*age2 0.00009 (-0.00006,0.00025)

-0.00004 (-0.00013,0.00005)

0.00005 (-0.00007,0.00017)

0.00004 (-0.00002,0.00011)

-0.00001 (-0.00012,0.00009)

0.00001 (-0.00005,0.00007)

0.00005 (-0.00004,0.00015)

0.00001 (-0.00005,0.00006)

SNP*age3 -0.000002 (-0.00003,0.00003)

0.000005 (-0.00001,0.00002)

-0.000002 (-0.00002,0.00002)

0.000005 (-0.00001,0.00002)

-0.000003 (-0.00002,0.00002)

-0.000002 (-0.00001,0.00001)

0.00001 (-0.00001,0.00003)

-0.000003 (-0.00001,0.00001)


SNP -0.004 (-0.0144,0.0065)

-0.0018 (-0.0077,0.004)

-0.0014 (-0.0091,0.0062)

-0.0044 (-0.0088,0.0001)

0.0074 (0.0009,0.014)

-0.0032 (-0.0071,0.0007)

-0.0024 (-0.0091,0.0042)

-0.0008 (-0.0044,0.0028)

SNP*age -0.0001 (-0.0018,0.0016)

0 (-0.0009,0.001)

0.0008 (-0.0003,0.002)

-0.0006 (-0.0013,0.0002)

0.0008 (-0.0003,0.0019)

-0.0004 (-0.001,0.0002)

-0.0001 (-0.0011,0.0009)

-0.0003 (-0.0009,0.0003)

SNP*age2 0.00006 (-0.00009,0.00022)

0.00002 (-0.00007,0.0001)

0.00006 (-0.00005,0.00017)

0.00003 (-0.00004,0.00009)

-0.00004 (-0.00014,0.00006)

0.00001 (-0.00005,0.00007)

0.00009 (-0.00001,0.00019)

-0.00006 (-0.00011,0)

SNP*age3 -0.000009 (-0.00004,0.00002)

-0.000004 (-0.00002,0.00001)

-0.000022 (-0.00004,-0.000004)

0.000002 (-0.00001,0.00001)

-0.000008 (-0.00003,0.00001)

0.000001 (-0.00001,0.00001)

0.000002 (-0.00001,0.00002)

0.000007 (-0.000002,0.00002)


SNP 0.002 (-0.0087,0.0127)

-0.0022 (-0.0084,0.0039)

-0.0036 (-0.0114,0.0042)

0.001 (-0.0034,0.0054)

-0.0011 (-0.0079,0.0058)

0.0009 (-0.0031,0.0048)

-0.003 (-0.0095,0.0035)

0.0047 (0.001,0.0084)

SNP*age 0 (-0.0017,0.0018)

-0.0004 (-0.0014,0.0006)

-0.0007 (-0.002,0.0006)

0.0003 (-0.0005,0.001)

0 (-0.0012,0.0011)

0.0001 (-0.0006,0.0008)

-0.0005 (-0.0016,0.0006)

0.0009 (0.0002,0.0015)

SNP*age2 -0.00018 (-0.00037,0.00001)

0 (-0.00011,0.00011)

-0.00004 (-0.00019,0.0001)

-0.00002 (-0.0001,0.00006)

-0.00008 (-0.0002,0.00004)

-0.00001 (-0.00008,0.00006)

-0.00006 (-0.00018,0.00006)

-0.00005 (-0.00012,0.00002)

SNP*age3 -0.000015 (-0.000049,0.00002)

0.000005 (-0.00002,0.00003)

0 (-0.00003,0.00003)

-0.000008 (-0.00002,0.00001)

-0.000015 (-0.00004,0.00001)

0 (-0.00001,0.00001)

-0.000003 (-0.00003,0.00002)

-0.000012 (-0.000024,0)

Table 8: Bias and 95% confidence interval for the equal unbalanced design; bold and underlined cells are those whose confidence interval does not cover zero based on 1,000 simulations.

MAF 0.1 0.2 0.3 0.4



SNP -0.005 (-0.015,0.0051)

0.0008 (-0.005,0.0067)

0.0036 (-0.0041,0.0114)

-0.0052 (-0.0097,-0.0008)

0.0026 (-0.0041,0.0093)

-0.0024 (-0.0062,0.0015)

-0.0065 (-0.0126,-0.0003)

-0.0002 (-0.0037,0.0033)

SNP*age -0.0003 (-0.0021,0.0015)

0 (-0.001,0.0009)

0.0003 (-0.001,0.0015)

-0.0008 (-0.0015,-0.0001)

0 (-0.0011,0.0011)

-0.0003 (-0.001,0.0004)

-0.0008 (-0.0018,0.0002)

-0.0002 (-0.0008,0.0004)

SNP*age2 0.00012 (-0.00005,0.00028)

-0.00004 (-0.00013,0.00005)

-0.00003 (-0.00015,0.00009)

0.00006 (-0.00001,0.00013)

-0.00003 (-0.00014,0.00008)

0.00002 (-0.00005,0.00008)

0.00002 (-0.00008,0.00012)

-0.00003 (-0.00009,0.00003)

SNP*age3 -0.000008 (-0.00004,0.000023)

0.000009 (-0.00001,0.00003)

0.000011 (-0.00001,0.00003)

0.000001 (-0.00001,0.00001)

0.000009 (-0.00001,0.00003)

0.000004 (-0.00001,0.00002)

0.000001 (-0.00002,0.00002)

-0.000004 (-0.00001,0.00001)

t-distribution

SNP 0.0045 (-0.0058,0.0148)

0.0071 (0.001,0.0133)

0.0018 (-0.0063,0.0099)

0.0004 (-0.0042,0.005)

-0.0051 (-0.012,0.0017)

0.0026 (-0.0014,0.0066)

0.0024 (-0.0039,0.0086)

-0.0019 (-0.0056,0.0018)

SNP*age 0.0029 (0.001,0.0048)

0.0005 (-0.0006,0.0017)

-0.0009 (-0.0024,0.0005)

0.0001 (-0.0007,0.001)

-0.0008 (-0.002,0.0005)

0.0001 (-0.0007,0.0008)

-0.0003 (-0.0015,0.0009)

-0.0002 (-0.0009,0.0005)

SNP*age2 -0.00002 (-0.00021,0.00017)

-0.00006 (-0.00017,0.00005)

-0.00003 (-0.00018,0.00012)

-0.00006 (-0.00015,0.00002)

0.00011 (-0.00001,0.00024)

-0.0001 (-0.00017,-0.00003)

0.00002 (-0.0001,0.00013)

0.00001 (-0.00005,0.00008)

SNP*age3 -0.000047 (-0.00009,-0.00001)

-0.000005 (-0.00003,0.00002)

0.000029 (0,0.000059)

-0.000002 (-0.00002,0.00001)

0.000003 (-0.00002,0.00003)

-0.000004 (-0.00002,0.00001)

0.000021 (-0.000002,0.00005)

-0.000005 (-0.00002,0.00001)


SNP 0.0063 (-0.0038,0.0165)

-0.0029 (-0.009,0.0033)

-0.0006 (-0.008,0.0068)

0.0012 (-0.0033,0.0057)

0.0013 (-0.0051,0.0076)

-0.0013 (-0.0051,0.0025)

-0.0035 (-0.0097,0.0027)

0.0005 (-0.0031,0.0041)

SNP*age 0.0014 (-0.0003,0.0031)

-0.0006 (-0.0016,0.0003)

0.0004 (-0.0009,0.0017)

-0.0004 (-0.0011,0.0003)

0.0002 (-0.0009,0.0013)

-0.0003 (-0.001,0.0004)

-0.0002 (-0.0013,0.0009)

0.0001 (-0.0005,0.0007)

SNP*age2 0.00002 (-0.00015,0.00019)

-0.00003 (-0.00013,0.00006)

-0.00001 (-0.00013,0.00012)

-0.00006 (-0.00013,0.00001)

-0.00004 (-0.00015,0.00007)

-0.00001 (-0.00007,0.00006)

0.00001 (-0.00009,0.00011)

0.00004 (-0.00002,0.0001)

SNP*age3 -0.000001 (-0.00003,0.00003)

0.000008 (-0.00001,0.00003)

-0.00001 (-0.00003,0.00001)

0.000012 (-0.000001,0.00003)

-0.000006 (-0.00003,0.00001)

-0.000001 (-0.00001,0.00001)

0.000005 (-0.00001,0.00002)

0.000002 (-0.00001,0.00001)


SNP -0.0073 (-0.0177,0.0031)

0.0054 (-0.0006,0.0113)

0.0021 (-0.0055,0.0098)

-0.0012 (-0.0056,0.0032)

-0.0065 (-0.0129,0)

-0.0006 (-0.0044,0.0033)

-0.0028 (-0.009,0.0034)

0.0021 (-0.0015,0.0056)

SNP*age -0.001 (-0.0024,0.0004)

0.0007 (-0.0001,0.0016)

-0.0003 (-0.0013,0.0008)

0 (-0.0006,0.0006)

-0.0007 (-0.0016,0.0002)

-0.0001 (-0.0007,0.0004)

-0.0006 (-0.0015,0.0002)

0.0006 (0.0001,0.0011)

SNP*age2 -0.00003 (-0.00016,0.0001)

-0.00006 (-0.00013,0.00002)

-0.00005 (-0.00015,0.00005)

0 (-0.00005,0.00006)

0.00005 (-0.00004,0.00013)

0 (-0.00005,0.00005)

0.00004 (-0.00004,0.00012)

0.00003 (-0.00002,0.00007)

SNP*age3 0.000008 (-0.00001,0.00003)

-0.000007 (-0.00002,0.000004)

0.000015 (0.000001,0.000028)

-0.000004 (-0.00001,0.000004)

0.000007 (-0.000004,0.00002)

0.000005 (-0.000002,0.00001)

0.000005 (-0.00001,0.00002)

-0.000007 (-0.00001,-0.000001)


SNP 0.0053 (-0.0049,0.0154)

0.0005 (-0.0056,0.0066)

0.0033 (-0.0043,0.011)

-0.0038 (-0.0081,0.0005)

-0.0018 (-0.0086,0.0051)

0.0014 (-0.0025,0.0054)

-0.0047 (-0.0115,0.002)

-0.0003 (-0.0041,0.0034)

SNP*age 0.0002 (-0.0017,0.002)

0.0005 (-0.0006,0.0015)

0.0006 (-0.0008,0.002)

0 (-0.0008,0.0007)

-0.0006 (-0.0017,0.0006)

0.0004 (-0.0003,0.0011)

-0.0004 (-0.0016,0.0007)

-0.0006 (-0.0012,0.0001)

SNP*age2 0.0001 (-0.00008,0.00027)

0.00001 (-0.00009,0.00011)

-0.00001 (-0.00014,0.00012)

0.00004 (-0.00004,0.00012)

0.00008 (-0.00003,0.0002)

0.00001 (-0.00005,0.00007)

0.00004 (-0.00006,0.00015)

-0.00003 (-0.0001,0.00003)

SNP*age3 0.000025 (-0.00001,0.00006)

0.000002 (-0.00002,0.00002)

-0.000007 (-0.00003,0.00002)

-0.000011 (-0.00003,0.000003)

0.000009 (-0.00001,0.00003)

-0.000002 (-0.00002,0.00001)

0.00001 (-0.00001,0.00003)

0.000013 (0.000001,0.000026)


SNP -0.0007 (-0.0114,0.0099)

0.0105 (0.0046,0.0163)

0.0069 (-0.0012,0.0149)

-0.0031 (-0.0076,0.0014)

0.0011 (-0.0061,0.0082)

-0.0002 (-0.0041,0.0038)

-0.0015 (-0.008,0.0051)

-0.0013 (-0.0049,0.0024)

SNP*age -0.0003 (-0.0021,0.0014)

0.0009 (-0.0002,0.0019)

0.0008 (-0.0005,0.0021)

-0.0008 (-0.0016,-0.0001)

-0.0002 (-0.0014,0.001)

0 (-0.0006,0.0007)

-0.0002 (-0.0014,0.0009)

-0.0003 (-0.001,0.0003)

SNP*age2 0.0001 (-0.00007,0.00027)

-0.00002 (-0.00012,0.00008)

-0.00002 (-0.00015,0.0001)

-0.00004 (-0.00011,0.00003)

-0.00013 (-0.00024,-0.00001)

-0.00002 (-0.00008,0.00005)

0.00007 (-0.00003,0.00017)

-0.00004 (-0.0001,0.00002)

SNP*age3 -0.000002 (-0.00004,0.00003)

-0.000001 (-0.00002,0.000018)

-0.000006 (-0.00003,0.00002)

0.000007 (-0.00001,0.00002)

0.000002 (-0.00002,0.00002)

-0.00001 (-0.00002,0.000002)

0.000005 (-0.00002,0.00003)

0.000003 (-0.000008,0.000015)


SNP -0.0006 (-0.011,0.0098)

0.0013 (-0.0049,0.0075)

0.0023 (-0.0056,0.0102)

0.0007 (-0.0038,0.0053)

-0.0015 (-0.0086,0.0056)

-0.0023 (-0.0063,0.0016)

-0.0026 (-0.0088,0.0037)

-0.0004 (-0.0041,0.0033)

SNP*age 0.0004 (-0.0015,0.0023)

-0.0001 (-0.0012,0.001)

0 (-0.0015,0.0014)

0.0001 (-0.0008,0.0009)

-0.0008 (-0.0021,0.0005)

-0.0002 (-0.001,0.0005)

-0.0001 (-0.0013,0.0011)

-0.0006 (-0.0013,0.0001)

SNP*age2 -0.00007 (-0.00028,0.00014)

0 (-0.00013,0.00013)

0.00012 (-0.00005,0.00028)

-0.00005 (-0.00015,0.00004)

-0.00004 (-0.00019,0.00011)

0.00003 (-0.00005,0.00012)

-0.00012 (-0.00025,0.00002)

-0.00002 (-0.0001,0.00006)

SNP*age3 -0.000005 (-0.00005,0.00004)

0.000002 (-0.00002,0.00003)

0.000037 (0.000004,0.000071)

-0.000004 (-0.00002,0.00002)

0.000011 (-0.00002,0.00004)

0.000005 (-0.00001,0.00002)

-0.000012 (-0.00004,0.00002)

0.000006 (-0.00001,0.00002)

Table 9: Bias and 95% confidence interval for the unbalanced design with more samples around the adiposity rebound; bold and underlined cells are those whose confidence interval does not cover zero based on 1,000 simulations.

MAF 0.1 0.2 0.3 0.4



SNP 0.0019 (-0.0088,0.0126)

0.0009 (-0.0051,0.0069)

-0.0067 (-0.0143,0.0009)

-0.0031 (-0.0077,0.0014)

0.0039 (-0.0027,0.0105)

0.0041 (0,0.0082)

0.0035 (-0.0026,0.0095)

0.0012 (-0.0024,0.0049)

SNP*age -0.0012 (-0.003,0.0005)

-0.0004 (-0.0014,0.0006)

-0.0008 (-0.0021,0.0005)

-0.0007 (-0.0015,0)

0.0006 (-0.0005,0.0017)

0.0006 (-0.0001,0.0012)

0.001 (0,0.002)

0.0004 (-0.0002,0.001)

SNP*age2 0 (-0.00016,0.00017)

-0.00009 (-0.00019,0.00001)

-0.00001 (-0.00013,0.00012)

0.00002 (-0.00005,0.00009)

-0.00005 (-0.00015,0.00006)

-0.00003 (-0.0001,0.00003)

0.00003 (-0.00008,0.00013)

-0.00001 (-0.00007,0.00005)

SNP*age3 0.000031 (-0.000002,0.00006)

0.000006 (-0.00001,0.00002)

0.000008 (-0.00002,0.00003)

0.000003 (-0.00001,0.00002)

-0.000004 (-0.00003,0.00002)

-0.000007 (-0.00002,0.00001)

-0.000008 (-0.00003,0.00001)

0.000001 (-0.00001,0.000012)

t-distribution

SNP -0.0062 (-0.0165,0.0041)

-0.0024 (-0.0083,0.0036)

-0.0023 (-0.0104,0.0058)

0.0014 (-0.0032,0.006)

0.0024 (-0.0045,0.0093)

0.0007 (-0.0034,0.0047)

0.0047 (-0.0017,0.0111)

0.0018 (-0.002,0.0056)

SNP*age 0.0004 (-0.0016,0.0023)

0.0011 (0,0.0022)

0.0001 (-0.0015,0.0016)

0.0003 (-0.0006,0.0011)

-0.0001 (-0.0014,0.0012)

0.0002 (-0.0006,0.0009)

0.0003 (-0.0009,0.0016)

0.0003 (-0.0004,0.001)

SNP*age2 0.00004 (-0.00015,0.00023)

0.00009 (-0.00002,0.0002)

-0.0001 (-0.00024,0.00004)

0.00002 (-0.00006,0.0001)

0.00006 (-0.00006,0.00019)

0.00002 (-0.00006,0.00009)

-0.00003 (-0.00015,0.00008)

0.00002 (-0.00005,0.00009)

SNP*age3 -0.000016 (-0.00006,0.00002)

-0.000021 (-0.00004,0.000002)

-0.000017 (-0.00005,0.00001)

-0.000003 (-0.00002,0.00002)

0.000012 (-0.00001,0.00004)

-0.000007 (-0.00002,0.00001)

-0.000002 (-0.00003,0.00002)

0 (-0.00002,0.00002)


SNP 0.0076 (-0.0026,0.0177)

0.0056 (-0.0004,0.0116)

-0.0027 (-0.0101,0.0048)

0.0038 (-0.0005,0.0082)

-0.0059 (-0.0124,0.0006)

0.0028 (-0.001,0.0066)

-0.0002 (-0.0063,0.0059)

-0.0007 (-0.0042,0.0027)

SNP*age 0.0009 (-0.0008,0.0026)

0.0009 (-0.0001,0.002)

0 (-0.0012,0.0013)

0.0003 (-0.0004,0.001)

-0.0011 (-0.0022,0.0001)

0.0005 (-0.0002,0.0012)

0.0001 (-0.0009,0.0011)

0.0002 (-0.0004,0.0007)

SNP*age2 -0.00011 (-0.00027,0.00006)

-0.00001 (-0.00011,0.00009)

-0.00002 (-0.00014,0.00011)

0.00001 (-0.00006,0.00008)

-0.00001 (-0.00012,0.00009)

-0.00005 (-0.00011,0.00001)

0.00009 (-0.00002,0.00019)

-0.00001 (-0.00007,0.00005)

SNP*age3 0 (-0.00003,0.00003)

-0.000011 (-0.00003,0.00001)

0.00001 (-0.00001,0.00003)

0.000008 (-0.00001,0.00002)

0.000001 (-0.00002,0.00002)

-0.000011 (-0.00002,0)

0.000008 (-0.00001,0.00003)

0.000002 (-0.00001,0.00001)


SNP -0.0037 (-0.0142,0.0067)

-0.002 (-0.0079,0.0038)

0.0053 (-0.0023,0.013)

-0.0021 (-0.0064,0.0022)

-0.0001 (-0.0068,0.0066)

-0.0037 (-0.0075,0.0002)

-0.0015 (-0.0078,0.0048)

0.0014 (-0.0022,0.005)

SNP*age -0.0004 (-0.0018,0.0011)

-0.0002 (-0.001,0.0006)

0.0007 (-0.0004,0.0018)

-0.0005 (-0.0011,0.0001)

0.0003 (-0.0007,0.0012)

-0.0003 (-0.0009,0.0002)

-0.0008 (-0.0017,0)

0.0003 (-0.0002,0.0008)

SNP*age2 0.00008 (-0.00005,0.00022)

0.00001 (-0.00007,0.00009)

0.00001 (-0.00009,0.00011)

0.00002 (-0.00004,0.00008)

-0.00006 (-0.00014,0.00003)

0.00003 (-0.00002,0.00008)

0.00008 (0,0.00016)

-0.00003 (-0.00008,0.00002)

SNP*age3 0.000011 (-0.00001,0.00003)

0.000002 (-0.00001,0.00001)

-0.000003 (-0.00002,0.00001)

0.000005 (-0.000002,0.00001)

-0.000012 (-0.00002,0.000001)

0.000005 (-0.000002,0.00001)

0.000007 (-0.00001,0.00002)

0.000002 (-0.00001,0.00001)


SNP -0.0019 (-0.0124,0.0087)

0.0063 (0.0003,0.0123)

-0.0015 (-0.0093,0.0063)

0.0037 (-0.0007,0.0082)

-0.0006 (-0.0072,0.006)

-0.0016 (-0.0054,0.0023)

0.0009 (-0.0054,0.0072)

0.0025 (-0.0011,0.0061)

SNP*age -0.0006 (-0.0025,0.0013)

0.0004 (-0.0007,0.0014)

-0.0007 (-0.0021,0.0006)

0.0001 (-0.0006,0.0009)

0.0007 (-0.0006,0.0019)

0.0001 (-0.0006,0.0007)

-0.0008 (-0.002,0.0004)

0.0004 (-0.0002,0.0011)

SNP*age2 0.00021 (0.00004,0.00039)

-0.0001 (-0.00021,0)

0.00016 (0.00002,0.00029)

-0.00007 (-0.00015,0.00001)

0.00002 (-0.0001,0.00013)

0.00001 (-0.00005,0.00008)

0.00003 (-0.00007,0.00014)

-0.00002 (-0.00008,0.00005)

SNP*age3 0.000014 (-0.00002,0.00005)

0.000002 (-0.00002,0.00002)

0.000007 (-0.00002,0.00003)

0 (-0.00002,0.00002)

-0.000003 (-0.00003,0.00002)

-0.000001 (-0.00002,0.00001)

0.000015 (-0.00002,0.00004)

-0.000008 (-0.00002,0.000004)


SNP -0.0024 (-0.0128,0.008)

-0.0024 (-0.0083,0.0035)

0.0025 (-0.0055,0.0105)

-0.0007 (-0.0054,0.004)

0.0033 (-0.0034,0.01)

-0.0009 (-0.0045,0.0028)

0.0003 (-0.0057,0.0063)

-0.0005 (-0.004,0.003)

SNP*age 0.0009 (-0.001,0.0027)

-0.0005 (-0.0016,0.0005)

0.0004 (-0.001,0.0018)

0.0002 (-0.0006,0.0009)

0.0007 (-0.0005,0.0019)

-0.0005 (-0.0012,0.0001)

0.0005 (-0.0006,0.0016)

0.0005 (-0.0002,0.0011)

SNP*age2 -0.00003 (-0.00021,0.00014)

0.00008 (-0.00003,0.00018)

-0.00002 (-0.00015,0.0001)

0.00005 (-0.00003,0.00013)

-0.00009 (-0.0002,0.00002)

0.00001 (-0.00005,0.00007)

0.00005 (-0.00005,0.00016)

0 (-0.00006,0.00006)

SNP*age3 -0.000032 (-0.00007,0.000002)

0.000016 (-0.000004,0.00004)

0.000001 (-0.00003,0.00003)

-0.000003 (-0.00002,0.00001)

-0.000016 (-0.00004,0.00001)

0.000007 (-0.000005,0.00002)

0 (-0.00002,0.00002)

-0.00001 (-0.00002,0.000002)


SNP -0.0019 (-0.0121,0.0083)

-0.0037 (-0.0099,0.0026)

0.0013 (-0.0066,0.0093)

0.0001 (-0.0043,0.0046)

0.0056 (-0.0013,0.0125)

0.0004 (-0.0036,0.0044)

-0.0016 (-0.0081,0.0049)

-0.0015 (-0.0053,0.0022)

SNP*age -0.0002 (-0.0022,0.0018)

-0.0012 (-0.0023,0)

-0.0002 (-0.0017,0.0013)

0.0006 (-0.0002,0.0015)

0.0001 (-0.0012,0.0015)

-0.0002 (-0.001,0.0005)

-0.0004 (-0.0017,0.0008)

-0.0003 (-0.001,0.0004)

SNP*age2 -0.00001 (-0.00024,0.00021)

0.00003 (-0.0001,0.00016)

-0.00003 (-0.0002,0.00015)

0.00003 (-0.00006,0.00013)

-0.00002 (-0.00016,0.00012)

0.00006 (-0.00002,0.00014)

-0.00001 (-0.00015,0.00013)

-0.00001 (-0.00009,0.00007)

SNP*age3 -0.000001 (-0.00005,0.00005)

0.000018 (-0.00001,0.00005)

-0.000004 (-0.00004,0.00003)

-0.000007 (-0.00003,0.00001)

0.000008 (-0.00002,0.00004)

0.000005 (-0.00001,0.00002)

0.000003 (-0.00002,0.00003)

-0.000002 (-0.00002,0.00001)

Table 10: Bias and 95% confidence interval for the unbalanced design with fewer samples around the adiposity rebound; bold and underlined cells are those whose confidence interval does not cover zero based on 1,000 simulations.

MAF 0.1 0.2 0.3 0.4



SNP 0.0042 (-0.006,0.0144)

0.0027 (-0.0033,0.0087)

-0.0006 (-0.0083,0.007)

0 (-0.0046,0.0045)

0.0034 (-0.0032,0.01)

-0.003 (-0.0067,0.0008)

0.001 (-0.0052,0.0073)

0 (-0.0036,0.0037)

SNP*age -0.0008 (-0.0025,0.001)

0.0011 (0.0001,0.0021)

0.0001 (-0.0011,0.0014)

0.0001 (-0.0006,0.0009)

-0.0003 (-0.0014,0.0008)

-0.0001 (-0.0007,0.0005)

-0.0001 (-0.0011,0.0009)

0.0005 (-0.0001,0.0012)

SNP*age2 -0.00015 (-0.00032,0.00001)

0 (-0.00009,0.00009)

-0.00001 (-0.00013,0.00011)

0.00004 (-0.00003,0.00011)

-0.00009 (-0.00019,0.00001)

0 (-0.00006,0.00006)

0 (-0.0001,0.0001)

0.00002 (-0.00004,0.00007)

SNP*age3 0.000023 (-0.00001,0.00005)

-0.00002 (-0.00004,-0.000003)

-0.000007 (-0.00003,0.00002)

0.000008 (-0.00001,0.00002)

0.000009 (-0.00001,0.00003)

-0.000005 (-0.00002,0.00001)

-0.000009 (-0.00003,0.00001)

-0.000006 (-0.00002,0.00001)

t-distribution

SNP 0.0103 (-0.0004,0.0209)

0.0094 (0.0033,0.0156)

0.0026 (-0.0054,0.0105)

0.0007 (-0.004,0.0055)

0.003 (-0.0036,0.0096)

0.0025 (-0.0015,0.0065)

-0.0009 (-0.0073,0.0055)

0.0016 (-0.0021,0.0054)

SNP*age 0.0021 (0.0001,0.0041)

0.0006 (-0.0006,0.0017)

0.001 (-0.0005,0.0024)

0.0001 (-0.0007,0.001)

0.0007 (-0.0006,0.002)

0.0008 (0.0001,0.0016)

-0.0006 (-0.0018,0.0005)

0.0002 (-0.0005,0.0008)

SNP*age2 0.00007 (-0.00011,0.00025)

-0.00014 (-0.00025,-0.00003)

-0.00006 (-0.00019,0.00008)

0.00002 (-0.00007,0.0001)

-0.00004 (-0.00016,0.00008)

-0.00004 (-0.00011,0.00004)

-0.00014 (-0.00026,-0.00002)

0.00002 (-0.00004,0.00009)

SNP*age3 -0.000006 (-0.00005,0.00003)

-0.000003 (-0.00003,0.00002)

-0.000013 (-0.00004,0.00002)

-0.000004 (-0.00002,0.00001)

-0.000016 (-0.00004,0.00001)

-0.000006 (-0.00002,0.00001)

0.000007 (-0.00002,0.00003)

0.000002 (-0.00001,0.00002)


SNP -0.0021 (-0.0122,0.008)

-0.0018 (-0.0077,0.0041)

-0.0073 (-0.0148,0.0003)

-0.0053 (-0.0097,-0.001)

0.0048 (-0.0018,0.0114)

-0.0009 (-0.0047,0.003)

-0.0038 (-0.01,0.0024)

0.0012 (-0.0026,0.0049)

SNP*age -0.001 (-0.0027,0.0007)

-0.0005 (-0.0015,0.0004)

-0.0002 (-0.0015,0.0011)

-0.0003 (-0.001,0.0005)

0.0002 (-0.0009,0.0013)

-0.0001 (-0.0008,0.0006)

-0.0005 (-0.0016,0.0005)

0 (-0.0006,0.0006)

SNP*age2 -0.00013 (-0.00029,0.00003)

-0.00002 (-0.00012,0.00007)

0.00012 (-0.00001,0.00024) 0.00007 (0,0.00014)

-0.00003 (-0.00013,0.00008)

-0.00002 (-0.00008,0.00004)

0.00011 (0.00001,0.00021)

-0.00002 (-0.00007,0.00004)

SNP*age3 0.000004 (-0.00003,0.00004)

-0.000001 (-0.00002,0.00002)

-0.00002 (-0.00004,0.000003)

0 (-0.00001,0.00001)

0.000012 (-0.00001,0.00003)

-0.000002 (-0.000013,0.00001)

0.000003 (-0.00002,0.00002)

0.000003 (-0.00001,0.00001)


SNP 0.0034 (-0.0068,0.0136)

-0.0073 (-0.0132,-0.0014)

-0.0089 (-0.0165,-0.0012)

-0.0007 (-0.0051,0.0038)

-0.0027 (-0.0095,0.0041)

-0.0006 (-0.0046,0.0035)

0.0013 (-0.0052,0.0078)

-0.0001 (-0.0037,0.0036)

SNP*age 0.0014 (0,0.0027)

-0.0008 (-0.0016,0.0001)

-0.0006 (-0.0017,0.0004)

0 (-0.0006,0.0007)

-0.0005 (-0.0015,0.0004)

0.0003 (-0.0002,0.0009)

0 (-0.0009,0.0009)

0.0001 (-0.0003,0.0006)

SNP*age2 -0.00003 (-0.00015,0.0001)

0.00006 (-0.00002,0.00014)

-0.00002 (-0.00011,0.00008)

0.00002 (-0.00003,0.00008)

-0.00003 (-0.00012,0.00005) 0.00005 (0,0.00009) 0 (-0.00008,0.00008) 0 (-0.00005,0.00005)

SNP*age3 -0.000022 (-0.00004,-0.000005)

-0.000005 (-0.00002,0.00001)

0.000004 (-0.00001,0.00002)

-0.000003 (-0.00001,0.00001)

-0.000002 (-0.00001,0.00001)

0.000001 (-0.000005,0.000008)

0.000007 (-0.00001,0.00002)

-0.000001 (-0.00001,0.00001)


SNP 0.0013 (-0.009,0.0115)

-0.0026 (-0.0087,0.0035)

-0.0009 (-0.0089,0.0071)

0.0022 (-0.0024,0.0067)

-0.0056 (-0.0123,0.0011)

-0.0007 (-0.0045,0.0031)

-0.0037 (-0.0099,0.0025)

0.0003 (-0.0034,0.004)

SNP*age -0.0008 (-0.0026,0.001)

-0.0004 (-0.0014,0.0007)

-0.0011 (-0.0025,0.0003)

-0.0003 (-0.0011,0.0005)

0.0001 (-0.001,0.0013)

0.0002 (-0.0005,0.0009)

-0.0003 (-0.0014,0.0008) 0 (-0.0007,0.0006)

SNP*age2 -0.00003 (-0.0002,0.00014)

-0.00002 (-0.00012,0.00008)

-0.00004 (-0.00017,0.00009)

-0.00005 (-0.00013,0.00002)

0.00006 (-0.00006,0.00017)

0.00003 (-0.00004,0.00009)

-0.00003 (-0.00014,0.00007)

0.00002 (-0.00004,0.00008)

SNP*age3 0.000017 (-0.00002,0.00005)

-0.000003 (-0.00002,0.00002)

0.000024 (-0.000002,0.00005)

0.00001 (-0.00001,0.00002)

-0.00001 (-0.00003,0.00001)

-0.000006 (-0.00002,0.000007)

-0.000009 (-0.00003,0.00001)

0.000004 (-0.00001,0.00002)


SNP 0.0002 (-0.0101,0.0105)

-0.0001 (-0.006,0.0057)

0.0066 (-0.0007,0.0139)

-0.0007 (-0.0052,0.0039)

-0.0008 (-0.0076,0.0059)

0.0019 (-0.0022,0.006)

0.0008 (-0.0054,0.007)

0.0022 (-0.0014,0.0059)

SNP*age 0.0006 (-0.0011,0.0024)

-0.0008 (-0.0018,0.0002)

0.0007 (-0.0006,0.0019)

0.0003 (-0.0005,0.001)

-0.0003 (-0.0014,0.0008)

0.0002 (-0.0005,0.0008)

0.0005 (-0.0006,0.0016)

-0.0003 (-0.0009,0.0003)

SNP*age2 -0.00002 (-0.00018,0.00014)

-0.00006 (-0.00016,0.00004)

-0.00002 (-0.00015,0.00011) 0 (-0.00007,0.00007)

-0.00003 (-0.00014,0.00009)

-0.00001 (-0.00008,0.00005) 0 (-0.00011,0.0001) 0 (-0.00006,0.00007)

SNP*age3 -0.000024 (-0.00006,0.00001)

0.00001 (-0.00001,0.00003)

-0.000006 (-0.00003,0.00002)

-0.000005 (-0.00002,0.00001)

0.00001 (-0.000011,0.00003)

0.000008 (-0.000003,0.00002)

-0.000008 (-0.00003,0.00001)

0.000009 (-0.000002,0.00002)


SNP 0.0016 (-0.0087,0.0119)

-0.0013 (-0.0073,0.0047)

0.0041 (-0.0034,0.0116)

0.002 (-0.0024,0.0065)

-0.0024 (-0.0092,0.0044)

-0.0011 (-0.0051,0.0028) 0.0037 (-0.0026,0.01)

0.0011 (-0.0025,0.0047)

SNP*age 0.0008 (-0.0011,0.0028)

-0.0002 (-0.0013,0.0009)

0.0005 (-0.0009,0.0019)

0.0003 (-0.0006,0.0011) 0 (-0.0012,0.0013)

-0.0004 (-0.0011,0.0004)

0.0003 (-0.0009,0.0014) 0 (-0.0007,0.0007)

SNP*age2 0.00012 (-0.00009,0.00032)

-0.00002 (-0.00014,0.00011)

0.00002 (-0.00015,0.00018)

-0.00003 (-0.00012,0.00006)

-0.00013 (-0.00026,0.00001)

-0.00004 (-0.00012,0.00004)

0.00008 (-0.00005,0.00022)

-0.00005 (-0.00013,0.00002)

SNP*age3 0.000007 (-0.00004,0.00005)

-0.000005 (-0.00003,0.00002)

0.000006 (-0.00003,0.00004)

-0.000001 (-0.00002,0.00002)

-0.000038 (-0.00007,-0.00001)

0.000007 (-0.00001,0.00002)

0.00002 (-0.00001,0.00005)

-0.000009 (-0.00002,0.00001)

Table 11: Type one error for sparse complete design; bold and underlined cells are those that are

significantly different from the nominal α=0.05 based on 5,000 simulations.









Table 12: Type one error for intense complete design; bold and underlined cells are those that are










Table 13: Type one error for equal unbalanced design; bold and underlined cells are those that are










Table 14: Type one error for the design with more samples around the adiposity rebound; bold and

underlined cells are those that are significantly different from the nominal α=0.05 based on 5,000

simulations.









Table 15: Type one error for design with fewer samples around the adiposity rebound; bold and

underlined cells are those that are significantly different from the nominal α=0.05 based on 5,000

simulations.

MAF 0.1 0.1 0.2 0.2 0.3 0.3 0.4 0.4
Sample Size N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000 N=1,000 N=3,000

Gaussian Distribution
SNP 0.0580 0.0542 0.0442 0.0578 0.0560 0.0506 0.0534 0.0534
SNP*age 0.0556 0.0552 0.0548 0.0664 0.0562 0.0530 0.0574 0.0552
SNP*age2 0.0524 0.0492 0.0434 0.0506 0.0568 0.0542 0.0502 0.0516
SNP*age3 0.0690 0.0734 0.0702 0.0690 0.0786 0.0642 0.0706 0.0772

t-distribution
SNP 0.0498 0.0532 0.0432 0.0524 0.0518 0.0498 0.0498 0.0510
SNP*age 0.0586 0.0626 0.0588 0.0560 0.0538 0.0532 0.0540 0.0590
SNP*age2 0.0496 0.0536 0.0544 0.0598 0.0486 0.0532 0.0562 0.0484
SNP*age3 0.0686 0.0764 0.0708 0.0668 0.0674 0.0666 0.0712 0.0696

Skew-normal Distribution
SNP 0.0508 0.0498 0.0528 0.0474 0.0502 0.0462 0.0510 0.0494
SNP*age 0.0612 0.0550 0.0556 0.0522 0.0608 0.0558 0.0596 0.0556
SNP*age2 0.0480 0.0444 0.0558 0.0492 0.0482 0.0524 0.0522 0.0474
SNP*age3 0.0768 0.0734 0.0728 0.0732 0.0684 0.0686 0.0738 0.0738

Mixture of 2 Gaussian Distributions
SNP 0.0480 0.0474 0.0590 0.0476 0.0492 0.0492 0.0476 0.0504
SNP*age 0.0570 0.0528 0.0544 0.0556 0.0530 0.0550 0.0528 0.0570
SNP*age2 0.0506 0.0534 0.0492 0.0460 0.0518 0.0472 0.0506 0.0502
SNP*age3 0.0730 0.0690 0.0706 0.0650 0.0768 0.0712 0.0678 0.0718

Variance dependent on a covariate
SNP 0.0566 0.0484 0.0488 0.0516 0.0514 0.0534 0.0478 0.0576
SNP*age 0.0642 0.0554 0.0574 0.0616 0.0630 0.0606 0.0572 0.0638
SNP*age2 0.0524 0.0548 0.0512 0.0516 0.0508 0.0508 0.0556 0.0534
SNP*age3 0.0734 0.0714 0.0764 0.0772 0.0812 0.0784 0.0738 0.0796

Variance greater at adiposity rebound
SNP 0.0510 0.0528 0.0518 0.0480 0.0532 0.0540 0.0462 0.0434
SNP*age 0.0604 0.0632 0.0576 0.0544 0.0530 0.0622 0.0532 0.0600
SNP*age2 0.0548 0.0488 0.0522 0.0524 0.0554 0.0516 0.0514 0.0430
SNP*age3 0.0726 0.0612 0.0630 0.0718 0.0656 0.0700 0.0702 0.0676

Variance increasing over time
SNP 0.0528 0.0504 0.0472 0.0492 0.0508 0.0468 0.0456 0.0536
SNP*age 0.0496 0.0492 0.0506 0.0488 0.0484 0.0536 0.0502 0.0470
SNP*age2 0.0526 0.0538 0.0486 0.0496 0.0562 0.0548 0.0498 0.0544
SNP*age3 0.0954 0.1002 0.1054 0.0908 0.1098 0.1002 0.1056 0.0980
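The table captions flag cells that differ significantly from the nominal α=0.05 given the number of simulations. The exact test used is not stated in this section, so the following is only a minimal sketch, assuming a normal approximation to the binomial proportion and a two-sided 5% criterion. The function names are hypothetical.

```python
import math

def type1_error_bounds(alpha=0.05, n_sims=5000, z=1.96):
    """Band around the nominal rate within which an observed type 1
    error is compatible with alpha (normal approximation; an assumption)."""
    se = math.sqrt(alpha * (1.0 - alpha) / n_sims)
    return alpha - z * se, alpha + z * se

def differs_from_nominal(observed, alpha=0.05, n_sims=5000, z=1.96):
    """True if the observed type 1 error falls outside the band."""
    lower, upper = type1_error_bounds(alpha, n_sims, z)
    return observed < lower or observed > upper

lower, upper = type1_error_bounds()
print(f"Band for 5,000 simulations: ({lower:.4f}, {upper:.4f})")
# Two values taken from the Gaussian block, MAF 0.1, N=1,000:
print(differs_from_nominal(0.0690))  # SNP*age3
print(differs_from_nominal(0.0524))  # SNP*age2
```

With 5,000 simulations the band is roughly 0.044 to 0.056, so under this assumption the SNP*age3 rows above would be flagged while most SNP and SNP*age2 cells would not.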

Table 16: Results from additional simulations for comparison between missing data or variable measurement time under the intense design. Data was simulated with the error term following a Gaussian distribution using the missing/different age design so that:

1) there was a cubic function of age in both the fixed and random effects (i.e. $BMI_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \beta_4 MS_{ij} + \beta_5 SNP_i + \beta_6 t_{ij} SNP_i + \beta_7 t_{ij}^2 SNP_i + \beta_8 t_{ij}^3 SNP_i + b_{i0} + b_{i1} t_{ij} + b_{i2} t_{ij}^2 + b_{i3} t_{ij}^3 + \varepsilon_{ij}$);

2) there was a quadratic function of age in both the fixed and random effects (i.e. $BMI_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 MS_{ij} + \beta_4 SNP_i + \beta_5 t_{ij} SNP_i + \beta_6 t_{ij}^2 SNP_i + b_{i0} + b_{i1} t_{ij} + b_{i2} t_{ij}^2 + \varepsilon_{ij}$);

3) there was a linear function of age in both the fixed and random effects (i.e. $BMI_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 MS_{ij} + \beta_3 SNP_i + \beta_4 t_{ij} SNP_i + b_{i0} + b_{i1} t_{ij} + \varepsilon_{ij}$);

4) there was a cubic function of age in the fixed effects and a quadratic function of age in the random effects (i.e. $BMI_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \beta_4 MS_{ij} + \beta_5 SNP_i + \beta_6 t_{ij} SNP_i + \beta_7 t_{ij}^2 SNP_i + \beta_8 t_{ij}^3 SNP_i + b_{i0} + b_{i1} t_{ij} + b_{i2} t_{ij}^2 + \varepsilon_{ij}$);

5) there was a quadratic function of age in the fixed effects and a linear function of age in the random effects (i.e. $BMI_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 MS_{ij} + \beta_4 SNP_i + \beta_5 t_{ij} SNP_i + \beta_6 t_{ij}^2 SNP_i + b_{i0} + b_{i1} t_{ij} + \varepsilon_{ij}$).

N=1,000 N=3,000

SNP SNP*age SNP*age2 SNP*age3 SNP SNP*age SNP*age2 SNP*age3

Cubic fixed, cubic random 0.0499 0.0509 0.0501 0.0533 0.0472 0.0490 0.0496 0.0489

Quadratic fixed, quadratic random 0.0485 0.0487 0.0519 -- 0.0514 0.0531 0.0499 --

Linear fixed, linear random 0.0513 0.0543 -- -- 0.0517 0.0486 -- --

Cubic fixed, quadratic random 0.0487 0.0648 0.0490 0.0902 0.0501 0.0655 0.0498 0.0861

Quadratic fixed, linear random 0.0505 0.0502 0.0689 -- 0.0533 0.0519 0.0679 --

Table 17: Type 1 error when the fixed and random effects both include a cubic function for age. Data

was simulated under the equal unbalanced scenario with a sample size of 3,000; 1,000 simulations for

each MAF were conducted. Columns 2 and 3 (under the heading “Quadratic”) are the same as in Table

4.6 of Chapter 4 and are included here for comparison purposes. Bold and underlined cells are those

that are significantly different from the nominal α=0.05 under each design

Random effects: Quadratic (20,000 simulations – 5,000 each MAF); Cubic (4,000 simulations – 1,000 each MAF)

Standard Robust Standard Robust

Gaussian Distribution
SNP 0.0500 0.0508 0.0470 0.0480
SNP*age 0.0592 0.0531 0.0438 0.0440
Global Wald test 0.0598 0.0525

t-distribution
SNP 0.0491 0.0497 0.0448 0.0443
SNP*age 0.0629 0.0539 0.0483 0.0498
Global Wald test 0.0621 0.0535

Skew-normal Distribution
SNP 0.0495 0.0501 0.0558 0.0560
SNP*age 0.0589 0.0526 0.0500 0.0490
Global Wald test 0.0582 0.0513

Mixture of 2 Gaussian Distributions
SNP 0.0490 0.0490 0.0505 0.0498
SNP*age 0.0487 0.0459 0.0533 0.0550
Global Wald test 0.0581 0.0478

Variance dependent on a covariate
SNP 0.0482 0.0491 0.0555 0.0578
SNP*age 0.0607 0.0514 0.0460 0.0473
Global Wald test 0.0611 0.0473

Variance greater at adiposity rebound
SNP 0.0492 0.0495 0.0518 0.0528
SNP*age 0.0563 0.0483 0.0555 0.0550
Global Wald test 0.0559 0.0518

Variance increasing over time
SNP 0.0500 0.0502 0.0488 0.0490
SNP*age 0.0571 0.0540 0.0473 0.0473
Global Wald test 0.0929 0.0529 0.0495

Table 18: Type 1 error when the simulation and analysis models are different. Data was simulated with

a sample size of 3,000 under the missing/unbalanced scenario (each individual had 40% missing data

over the time period and were measured at different times within each year period) when the analysis

model was different to the simulation model; 5,000 simulations for each MAF were conducted. Bold and

underlined cells are those that are significantly different from the nominal α=0.05 under each design.

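Columns: simulated model; analysis model; type 1 error for SNP, SNP*age and SNP*age2.

Simulated: Quadratic fixed, linear random; Analysis: Quadratic fixed, quadratic random; SNP 0.0508; SNP*age 0.0503; SNP*age2 0.0484

Simulated: Quadratic fixed, quadratic random; Analysis: Quadratic fixed, linear random; SNP 0.0624; SNP*age 0.0521; SNP*age2 0.1393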

Figure 1: Overview of the simulations conducted in this study. Initial simulations were conducted to determine whether misspecification of the model affected coverage probability, bias, power or type 1 error. Upon discovering inflation in the type 1 error in the unbalanced sampling designs, we conducted steps 2 to 5 attempting to determine the source of the inflation.

STEP 1: Investigate whether different error distributions affect coverage probability, bias, power, type 1 error

Analysis of: • 5 sampling designs • 4 MAFs • 2 sample sizes • 7 models

Alternative hypothesis (β5 = 0.6 and β6 = 0.15):
• Derive coverage probabilities (1,000 simulations per scenario)
• Derive power (1,000 simulations per scenario)
• Derive bias (1,000 simulations per scenario)

Null hypothesis (β5 = 0 and β6 = 0):
• Derive type 1 error (5,000 simulations per scenario)

STEP 2: Investigate why type 1 error is increased in unbalanced designs

Analysis of: • 4 NEW sampling designs • 4 MAFs • 2 sample sizes • Gaussian random effects and error distribution

• Complete in all individuals and they were all measured at the same time. Derive type 1 error (5,000 simulations)
• Complete in all individuals but they were measured at different times within each year period. Derive type 1 error (5,000 simulations)
• Each individual had 40% missing data over the time period, but were all measured at the same time. Derive type 1 error (5,000 simulations)
• Each individual had 40% missing data over the time period and were measured at different times within each year period. Derive type 1 error (5,000 simulations)

STEP 3: Investigate why the type 1 error inflation is magnified in the presence of missing data

Analysis of: • Missing/different age design used in STEP 2 • 4 MAFs • 2 sample sizes • Gaussian random effects and error distribution

• Cubic function for age for the fixed and random effects. Derive type 1 error (5,000 simulations)
• Cubic function for age in the fixed effects and quadratic function for age in the random effects. Derive type 1 error (5,000 simulations)
• Linear function for age for the fixed and random effects. Derive type 1 error (5,000 simulations)
• Quadratic function for age for the fixed and random effects. Derive type 1 error (5,000 simulations)
• Quadratic function for age in the fixed effects and linear function for age in the random effects. Derive type 1 error (5,000 simulations)

STEP 4: Investigate whether other six models have nominal type 1 error when the fixed and random effects are the same

Analysis of: • Equal unbalanced sampling design • 4 MAFs • sample size of 3,000 • 7 models

Null hypothesis (β5 = 0 and β6 = 0): Derive type 1 error (1,000 simulations)

STEP 5: Investigate whether type 1 error is inflated when the analysis model is different to the model the data is simulated under

Analysis of: • Equal unbalanced sampling design • 4 MAFs • sample size of 3,000 • Gaussian random effects and error distribution

Null hypothesis (β5 = 0 and β6 = 0): Derive type 1 error (1,000 simulations per scenario)

Appendix E: Publication Arising from the Research in Chapter Six

Association of a Body Mass Index Genetic Risk Score with Growth throughout Childhood and Adolescence

Nicole M. Warrington1,2., Laura D. Howe3., Yan Yan Wu2, Nicholas J. Timpson3, Kate Tilling4,

Craig E. Pennell1, John Newnham1, George Davey-Smith3, Lyle J. Palmer2,5, Lawrence J. Beilin6,

Stephen J. Lye2, Debbie A. Lawlor3, Laurent Briollais2*

1 School of Women’s and Infants’ Health, The University of Western Australia, Perth, Western Australia, Australia, 2 Samuel Lunenfeld Research Institute, University of

Toronto, Toronto, Ontario, Canada, 3 MRC Centre for Causal Analyses in Translational Epidemiology, School of Social and Community Medicine, University of Bristol,

Bristol, United Kingdom, 4 School of Social and Community Medicine, University of Bristol, Bristol, United Kingdom, 5 Ontario Institute for Cancer Research, University of

Toronto, Toronto, Ontario, Canada, 6 School of Medicine and Pharmacology, The University of Western Australia, Perth, Western Australia, Australia

Abstract

Background: While the number of established genetic variants associated with adult body mass index (BMI) is growing, the relationships between these variants and growth during childhood are yet to be fully characterised. We examined the association between validated adult BMI associated single nucleotide polymorphisms (SNPs) and growth trajectories across childhood. We investigated the timing of onset of the genetic effect and whether it was sex specific.

Methods: Children from the ALSPAC and Raine birth cohorts were used for analysis (n = 9,328). Genotype data from 32 adult BMI associated SNPs were investigated individually and as an allelic score. Linear mixed effects models with smoothing splines were used for longitudinal modelling of the growth parameters and measures of adiposity peak and rebound were derived.

Results: The allelic score was associated with BMI growth throughout childhood, explaining 0.58% of the total variance in BMI in females and 0.44% in males. The allelic score was associated with higher BMI at the adiposity peak (females = 0.0163 kg/m2 per allele, males = 0.0123 kg/m2 per allele) and earlier age (-0.0362 years per allele in males and females) and higher BMI (0.0332 kg/m2 per allele in females and 0.0364 kg/m2 per allele in males) at the adiposity rebound. No gene:sex interactions were detected for BMI growth.

Conclusions: This study suggests that known adult genetic determinants of BMI have observable effects on growth from early childhood, and is consistent with the hypothesis that genetic determinants of adult susceptibility to obesity act from early childhood and develop over the life course.

Citation: Warrington NM, Howe LD, Wu YY, Timpson NJ, Tilling K, et al. (2013) Association of a Body Mass Index Genetic Risk Score with Growth throughout Childhood and Adolescence. PLoS ONE 8(11): e79547. doi:10.1371/journal.pone.0079547

Editor: Kristel Sleegers, University of Antwerp, Belgium

Received July 12, 2013; Accepted September 23, 2013; Published November 11, 2013

Copyright: © 2013 Warrington et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the following funding bodies and institutions. The UK Medical Research Council and the Wellcome Trust (Grant ref: 092731) and the University of Bristol provide core support for ALSPAC. The following Institutions provide funding for Core Management of the Raine Study: The University of Western Australia (UWA), Raine Medical Research Foundation, UWA Faculty of Medicine, Dentistry and Health Sciences, The Telethon Institute for Child Health Research, Curtin University and Women and Infants Research Foundation. This study was supported by project grants from the National Health and Medical Research Council of Australia (Grant ID 403981 and ID 003209) and the Canadian Institutes of Health Research (Grant ID MOP-82893). NM Warrington is funded by an Australian Postgraduate Award from the Australian Government of Innovation, Industry, Science and Research and a Raine Study PhD Top-Up Scholarship. LD Howe is funded by a UK Medical Research Council Population Health Scientist fellowship (G1002375). LD Howe, NJ Timpson, K Tilling, G Davey-Smith and DA Lawlor all work in a Centre that receives core funding from the University of Bristol and the UK Medical Research Council (Grant ref: G0600705). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

. These authors contributed equally to this work.

Introduction

Twin and family studies have provided evidence that body mass

index (BMI) is strongly heritable [1,2,3,4]. Recent genome-wide

association studies (GWAS) have begun to uncover genetic loci

contributing to increases in BMI in adulthood [5,6,7,8,9,10,11].

The largest genome-wide meta-analysis of BMI published to-date

included 249,796 individuals from the Genetic Investigation of

Anthropometric Traits (GIANT) Consortium; which confirmed 14

previously-reported loci and identified 18 novel loci for BMI [5].

There has been one GWAS to date that has focused on a

dichotomous indicator of childhood obesity [12], but none looking

at BMI on a continuous scale in childhood.

Once adult height is attained, changes in BMI are largely driven

by changes in weight. In contrast, during childhood and

adolescence, changes in BMI are influenced by both changes in

height and weight. Therefore, genetic variants that affect adult

BMI may influence change in weight, height or both during

childhood. Previous studies of adult BMI single nucleotide

polymorphisms (SNPs) in relation to infant and child change in

PLOS ONE | www.plosone.org 1 November 2013 | Volume 8 | Issue 11 | e79547

growth have shown little evidence of an association with birth

weight [13,14,15], but have shown evidence that these loci are

associated with more rapid height and weight gain in infancy

[13,15], and higher BMI and odds of obesity at multiple ages

across the life course [13,14,15,16,17].

BMI growth over childhood and adolescence is complex;

children tend to have rapidly increasing BMI from birth to

approximately 9 months of age where they reach their adiposity

peak, BMI then decreases until about the age of 5-6 years at

adiposity rebound and then steadily increases again until just after

puberty where it tends to plateau through adulthood. The BMI

and timing at the adiposity peak [18] and adiposity rebound

[19,20,21,22,23,24] have been shown to be associated with later

BMI. Genetic variants could also affect features of the growth

trajectory and shape key developmental milestones, including the

adiposity peak [25], adiposity rebound, and onset of puberty

between 10 and 13 years [17,26,27,28]. Sovio et al [17] and Belsky

et al [15] have recently shown that SNPs associated with adult

BMI are also associated with earlier age and higher BMI at

adiposity rebound. Genetic influences on the adiposity peak

remain poorly understood. Understanding whether and how

genetic loci are associated with BMI and other anthropometric

measures differentially across the life course may shed light on the

biological pathways involved, as well as insights into the

development of obesity to inform the design of interventions.

To date, there has been no comprehensive study of how all

known genetic variants of adult BMI influence growth over

childhood and adolescence (BMI, height and weight) and related

growth parameters (age and BMI at the adiposity peak and

rebound). One of the limitations of previous studies is they have

not stratified by sex, despite some evidence that sex-specific

differences in body composition may be partly due to genetics

[29,30]. Therefore, in the current study we:

1. Examine the association between an allelic score of 32 adult

BMI associated alleles and BMI, weight and height growth

trajectories from birth to age 17 in two birth cohorts.

2. Assess whether the association between BMI trajectories and

the 32 individual genetic loci are sex specific.

Materials and Methods

Study Populations

ALSPAC. The Avon Longitudinal Study of Parents and

Children (ALSPAC) is a prospective cohort study. The full study

methodology is published elsewhere [31] (www.bristol.ac.uk/

alspac). Pregnant women resident in one of three Bristol-based

health districts with an expected delivery date between 1 April

1991 and 31 December 1992 were invited to participate.

Invitation cards indicated that study consent was ‘opt out’, i.e.

women not actively declining participation would be included in

future data collection follow-up. Follow-up included parent and

child completed questionnaires, links to routine health care data,

and clinic attendance. 7,868 individuals were included in this

study based on the following criteria: at least one parent of

European descent, live singleton birth, unrelated to anyone in the

sample, no major congenital anomalies, genotype data, and at

least one measure of BMI throughout childhood. Ethical approval

for the study was obtained from the ALSPAC Law and Ethics

Committee and the Local Research Ethics Committees.

Raine. The Western Australian Pregnancy Cohort (Raine)

Study [32,33,34] is a prospective pregnancy cohort where 2,900

mothers were recruited prior to 18-weeks’ gestation between 1989

and 1991 (http://www.rainestudy.org.au/). 1,460 individuals were

included in this study using the same criteria as in the ALSPAC

cohort. The study was conducted with appropriate institutional

ethics approval from the King Edward Memorial Hospital and

Princess Margaret Hospital for Children ethics boards, and written

informed consent was obtained from all mothers.

BMI was calculated from weight and height measurements in

both cohorts. Additional information on the measurements in each

cohort is provided in the supplementary material (see Methods

S1). Access to data and associated protocols from the two cohorts

needs to follow the cohort guidelines outlined on their respective

websites.

Genotyping and allelic score

Imputed genotypic data used in both cohorts has been

previously described [35,36] (details in supplementary material;

see Methods S1). Speliotes et al [5] reported 32 variants to be

associated with BMI, while Belsky et al [15] selected a tag SNP

from each LD block that had previously been shown to be

associated with BMI-related traits. We selected 32 SNPs reported in either of these two manuscripts; SNPs reported in these two manuscripts that were within the genes of interest were all in high LD (r2 > 0.75), so one locus was selected for inclusion. All SNPs imputed well (all R2 for imputation quality > 0.7, mean = 0.981); therefore, dosages from the imputed data were used (i.e. the estimated number of BMI-increasing alleles). An ‘allelic score’ was

created by summing the dosages for the BMI-increasing alleles

across all 32 SNPs [37]. A sensitivity analysis was conducted

whereby the alleles were weighted by the published effect size for

adult BMI. The weighted score gave the same conclusions as the

unweighted score; therefore only the unweighted score is

presented.
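The construction of the allelic score can be illustrated with a short sketch. This is not the authors' code (the analyses were run in R); it assumes each individual's genotypes are available as a list of imputed dosages between 0 and 2 for the BMI-increasing allele, and the example dosages and weights below are made up for illustration.

```python
def allelic_score(dosages, weights=None):
    """Sum the BMI-increasing allele dosages across SNPs.

    dosages: per-SNP dosages (0 to 2) for one individual.
    weights: optional per-SNP effect sizes; if given, the weighted
             score used in the sensitivity analysis is returned.
    """
    if weights is None:
        return sum(dosages)
    if len(weights) != len(dosages):
        raise ValueError("weights and dosages must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))

# Hypothetical individual with three SNPs (the study used 32).
dosages = [1.0, 0.5, 2.0]
weights = [0.39, 0.10, 0.20]  # illustrative effect sizes only
print(allelic_score(dosages))           # unweighted score
print(allelic_score(dosages, weights))  # weighted score
```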

Longitudinal Modelling and derivation of growth parameters

Modelling BMI longitudinally from birth throughout childhood

is complex due to the two inflection points, adiposity peak in

infancy and adiposity rebound in childhood, and the increasing

variance in BMI throughout childhood. For this reason the

longitudinal models focused on data between 1 (when most

individuals will be post adiposity peak) and 17 years of age. A semi-

parametric linear mixed model, using smoothing splines to yield a

smooth growth curve estimate, was fitted to the BMI, weight and

height measures [38]. The basic model for the jth individual and at

the tth time-point is as follows:

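\[
\mathrm{Growth}_{jt} = \beta_0 + \sum_{i} \beta_i (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_{k} \gamma_k \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + \sum_{l} \beta_l \mathrm{Covariate}_l + u_{0j} + \sum_{i} u_{ij} (\mathrm{Age}_{jt} - \overline{\mathrm{Age}})^i + \sum_{k} \eta_{kj} \big((\mathrm{Age}_{jt} - \overline{\mathrm{Age}}) - \kappa_k\big)_+^i + \varepsilon_{jt}
\]

where Growth is BMI, weight or height, $\overline{\mathrm{Age}}$ is the mean age over the t time points in the sample (i.e. 8 years), $\kappa_k$ is the k-th knot and $(t - \kappa_k)_+ = 0$ if $t \le \kappa_k$ and $(t - \kappa_k)$ if $t > \kappa_k$, which is known as the truncated power basis that ensures smooth continuity between the time windows, and Covariate are the study specific (time independent) covariates. Three knot points were used, placed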

at two, eight and 12 years, with a cubic slope for each spline in the

BMI and height models; this model provided the best fit of the

data compared to other approaches [38]. The weight model had

BMI Allelic Score Associated with Childhood Growth


the same placement for the knots, but a linear spline from 1–2 years, a cubic slope for 2–8 years and 8–12 years, and a quadratic slope for over 12 years, which provided a better fit to the data based on having the lowest Akaike Information Criterion (AIC). All

models assumed a continuous autoregressive of order 1 correlation

structure.
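To make the truncated power basis concrete, the sketch below builds the fixed-effect spline terms for a single age. It is an illustration only: it assumes the knots (2, 8 and 12 years) are given on the raw age scale, that polynomial terms are centred at the mean age of 8 years, and that one cubic truncated term is used per knot. The function names are hypothetical and the study itself used the Spida library in R.

```python
def truncated_power(age, knot, degree):
    """Return (age - knot)_+ ** degree: zero at or below the knot."""
    diff = age - knot
    return diff ** degree if diff > 0 else 0.0

def spline_row(age, mean_age=8.0, knots=(2.0, 8.0, 12.0), degree=3):
    """Design-matrix row: centred polynomial terms followed by one
    truncated power term per knot (assumed layout)."""
    centred = age - mean_age
    polynomial = [centred ** i for i in range(1, degree + 1)]
    truncated = [truncated_power(age, knot, degree) for knot in knots]
    return polynomial + truncated

# A 10-year-old is past the knots at 2 and 8 years but not the one
# at 12, so the last truncated term is zero.
print(spline_row(10.0))
```

Multiplying each column by the allelic score would give the score-by-age interaction terms described in the Statistical Analysis section.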

Age and BMI at adiposity rebound were derived by setting the

first derivative of the fixed and random effects from the BMI

model between 2 and 8 years of age for each individual to zero (i.e.

the minimum point in the curve). In addition, a second model was

fit in the ALSPAC cohort only, between birth and 5 years to derive

the adiposity peak; individuals with greater than 2 measures

throughout this period were included [18], with 93% of included

individuals having at least one measure of BMI between six and 12

months. Adiposity peak was derived by setting the first derivative

of the fixed and random effects between birth and 2.5 years to

zero.
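The derivation of the adiposity rebound can be sketched as follows. Between the knots at 2 and 8 years a subject's predicted curve (fixed plus random effects) is a single cubic in age, so the minimum can be found by solving a quadratic. The coefficients c0 to c3 are assumed to be that subject's combined coefficients on the raw age scale; the function name and the example values are hypothetical.

```python
import math

def adiposity_rebound(c0, c1, c2, c3, lower=2.0, upper=8.0):
    """Return (age, BMI) at the minimum of
    BMI(age) = c0 + c1*age + c2*age**2 + c3*age**3
    within [lower, upper], or None if there is no such minimum."""
    # First derivative: c1 + 2*c2*age + 3*c3*age**2 = 0
    a, b, c = 3.0 * c3, 2.0 * c2, c1
    if a == 0.0:
        roots = [-c / b] if b != 0.0 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        roots = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    for age in sorted(roots):
        curvature = 2.0 * c2 + 6.0 * c3 * age  # second derivative
        if lower <= age <= upper and curvature > 0.0:
            bmi = c0 + c1 * age + c2 * age ** 2 + c3 * age ** 3
            return age, bmi
    return None

# Made-up coefficients chosen so the minimum falls at 5 years.
print(adiposity_rebound(18.0, -0.75, 0.0, 0.01))
```

The same approach applies to the adiposity peak, with the window set to birth to 2.5 years and the sign of the curvature check reversed to select a maximum.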

Statistical Analysis

Implausible height, weight and BMI measurements (> 4 SD

from the mean for sex and age specific category) were considered

as outliers and were recoded to missing. Genetic differences in the

trajectories were estimated by including an interaction between all

components of the spline function for age and the genetic variants.

The association between the allelic score and birth measures was

analysed using linear regression, adjusting for gestational age at

birth. Linear regression was used to investigate the associations of

the allelic score with age and BMI at adiposity peak and adiposity

rebound. In addition, we used the data from the final follow-up in

each of the cohorts (15–17 years) to investigate, with linear

regression, the association between the adiposity peak and

adiposity rebound parameters with final BMI.
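The outlier rule at the start of this section (measurements more than 4 SD from the sex- and age-specific mean recoded to missing) could be implemented along the following lines. This is a minimal sketch with a hypothetical record layout (dictionaries with "sex", "age_group" and "value" keys); how the age categories were defined is not stated here and is left to the caller.

```python
from collections import defaultdict
from statistics import mean, stdev

def recode_outliers(records, n_sd=4.0):
    """Return copies of the records with values more than n_sd standard
    deviations from their (sex, age_group) mean set to None."""
    groups = defaultdict(list)
    for rec in records:
        if rec["value"] is not None:
            groups[(rec["sex"], rec["age_group"])].append(rec["value"])
    # The SD needs at least two observations per group.
    stats = {key: (mean(vals), stdev(vals))
             for key, vals in groups.items() if len(vals) > 1}
    cleaned = []
    for rec in records:
        out = dict(rec)  # leave the input untouched
        key = (rec["sex"], rec["age_group"])
        if rec["value"] is not None and key in stats:
            centre, spread = stats[key]
            if spread > 0 and abs(rec["value"] - centre) > n_sd * spread:
                out["value"] = None
        cleaned.append(out)
    return cleaned
```

Note that the mean and SD here include the candidate outlier itself, so the rule can only flag a value in groups with a reasonable number of observations.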

The growth data were collected using three measurement

sources in the ALSPAC cohort; clinic visits, routine health care

visits, and parental reports in questionnaires. Trajectory analyses

in ALSPAC adjusted for a binary indicator of measurement source

(parent reports versus clinic/health care measurements) as a fixed

effect to allow for differential measurement error. To assess

population stratification, principal components were generated in the

EIGENSTRAT software [39]. These components revealed no

obvious population stratification and genome-wide analyses with

other phenotypes indicate a low lambda in the ALSPAC cohort;

however in the Raine cohort there was evidence of stratification so

all analyses were adjusted for the first five principal components.

FTO is the most replicated SNP for BMI, with the largest effect

size of the BMI-associated SNPs found to date, and has been

shown previously to affect childhood growth [16,17]. We therefore

repeated the analysis adjusting for the FTO locus. All results

remained unchanged indicating that the associations between

growth and the allelic score were not driven exclusively by the

FTO effect (data not shown).

We calculated the percentage of variation in BMI explained by

the allelic score at each time point in the ALSPAC cohort using

the residual sums of squares from the longitudinal BMI growth

model [40]. We did not calculate this in the Raine cohort as the

sample size was too small for accurate estimates.

Results from the two cohorts were meta-analysed. For the allelic

score analyses, a fixed-effects inverse-variance weighted meta-

analysis was conducted using the beta coefficients and standard

errors from the two studies. No heterogeneity using Cochran’s Q

was detected between the cohorts (all P > 0.05). The allelic score was considered statistically associated with the growth parameter if the P-value for the meta-analysis was less than 0.05. For the analyses of the individual SNPs with BMI, a P-value meta-analysis was conducted on the likelihood-ratio test (LRT) P-values from the two studies, without weighting, and a Bonferroni significance threshold of 0.0016 (0.05/32) was used to declare a statistically

significant association. All analyses were conducted in R version

2.12.1 [41], using the Spida library to estimate the spline

functions, the rmeta library for the effect-size meta-analysis and

the MADAM library for the P-Value meta-analysis.
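For the allelic score analyses, the fixed-effects inverse-variance weighted meta-analysis and Cochran's Q can be written out in a few lines. The sketch below is not the rmeta implementation; it is a plain restatement of the standard formulas, the heterogeneity P-value shortcut applies only to the two-cohort case (one degree of freedom), and the example betas and standard errors are invented.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooled estimate, its standard error
    and Cochran's Q statistic."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return pooled, pooled_se, q

def two_sided_p(beta, se):
    """Two-sided P-value from a normal (Wald) test."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def q_p_value_two_studies(q):
    """P-value for Cochran's Q with 1 degree of freedom (two cohorts)."""
    return math.erfc(math.sqrt(q / 2.0))

def bonferroni_threshold(alpha, n_tests):
    return alpha / n_tests

# Invented per-cohort estimates for illustration only.
pooled, pooled_se, q = fixed_effects_meta([0.006, 0.004], [0.001, 0.002])
print(pooled, pooled_se, two_sided_p(pooled, pooled_se))
print(q, q_p_value_two_studies(q))
print(bonferroni_threshold(0.05, 32))  # 32 SNPs tested individually
```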

Results

ALSPAC children had more BMI measures throughout

childhood than the Raine children with a median of 9

(interquartile range 5–12) and 6 (interquartile range 5–7) measures, respectively (Table 1). The minor allele frequency (MAF) for the 32 SNPs ranged from 0.04 to 0.49 (Table 2). The FTO locus had the largest effect on adult BMI, with an effect size of

0.39, while the effect size on adult BMI for the majority of the

remaining loci ranged from 0.06 to 0.2. All of the following results

are reported from the meta-analysis of the two cohorts, unless

otherwise specified.

Associations between the allelic score and growth trajectories

The allelic score was associated with higher mean levels of BMI

at the intercept of 8 years (Female: β = 0.0061 units, P < 0.0001; Male: β = 0.0044 units, P < 0.0001; Table S1) and faster BMI growth over childhood in both sexes (all age by score interaction P < 0.001). Due to the increasing rate of growth over time, the

trajectories of individuals with high and low allelic scores begin

together at age one but separate throughout childhood (Figure 1A

and 1B). In females, differences in BMI trajectories associated with

the allelic score were detectable from just after one year in the

ALSPAC cohort and approximately 2.5 years in the Raine cohort;

a difference was detected earlier in males, at 1 year in ALSPAC

and at 18 months in the Raine cohort.

To investigate whether the association of the allelic score with

BMI growth over childhood was due to skeletal growth or

adiposity, we tested associations between the allelic score and both

weight and height measurements. The allelic score was associated

with higher weight (Females: β = 0.0073 units, P < 0.0001; Males:

β = 0.0056 units, P < 0.0001; Table S1) and faster rates of weight

gain over childhood in both males and females (all age by score

interaction P < 0.001; Figure 1C and 1D). The association with

weight was seen earlier in males (by 1 year of age in ALSPAC)

than females (around 2 years of age in ALSPAC). The allelic score

was associated with increased height in females (β = 0.0949 m,

P = 0.0002) and males (β = 0.0838 m, P = 0.0008) (Table S1) and

also displayed evidence for an interaction with age (P < 0.001 in

ALSPAC, P = 0.001 in Raine females and P = 0.015 in Raine

males; Figure 1E and 1F). The effect size of the allelic score on

height growth increased over childhood until around 10 years of

age in females and slightly later in males and then decreased until

it became statistically non-significant (Figure 2C). These results

suggest that the association of the allelic score with BMI growth

over childhood was due to both skeletal growth and adiposity.

Associations between the allelic score and birth measures, adiposity peak and adiposity rebound

As expected, females were both lighter and shorter than males

at birth (Table 1). The allelic score was not associated with the

birth measures in either sex (Table 3). In addition, there was no

interaction between the allelic score and gestational age for either

weight or length at birth (data not shown).



Table 1. Phenotypic characteristics of the two birth cohorts used for analysis.

                                                      ALSPAC (n = 7,868)         Raine (n = 1,460)
Measure                            Age Stratum        N       Mean (SD)          N       Mean (SD)
                                   (years) or Sex
Sex [% male (N)]                   --                 7,868   51.25% (4,032)     1,460   51.58% (753)
Number of BMI measures per person  --                 --      8.75 (4.58)        --      5.94 (1.52)
Age (years)                        1–1.49             2,832   1.18 (0.18)        1,326   1.15 (0.09)
                                   1.5–2.49           7,113   1.76 (0.25)        387     2.14 (0.13)
                                   2.5–3.49           2,537   2.95 (0.28)        956     3.09 (0.09)
                                   3.5–4.49           6,915   3.77 (0.23)        20      3.69 (0.17)
                                   4.5–5.49           1,843   5.05 (0.33)        3       5.28 (0.14)
                                   5.5–6.49           3,848   5.90 (0.24)        1,269   5.91 (0.17)
                                   6.5–7.49           2,861   7.31 (0.30)        42      7.25 (0.38)
                                   7.5–8.49           3,975   7.74 (0.33)        1,040   8.02 (0.27)
                                   8.5–9.49           4,443   8.71 (0.22)        204     8.60 (0.12)
                                   9.5–10.49          6,777   9.94 (0.29)        303     10.44 (0.08)
                                   10.5–11.49         4,917   10.75 (0.23)       926     10.64 (0.15)
                                   11.5–12.49         5,240   11.82 (0.21)       4       11.91 (0.36)
                                   12.5–13.49         6,797   12.97 (0.22)       9       13.28 (0.17)
                                   13.5–14.49         4,690   13.89 (0.17)       1,196   14.06 (0.17)
                                   14.5–15.49         2,339   15.32 (0.15)       24      14.69 (0.17)
                                   15.5–16.49         1,645   15.72 (0.22)       2       16.16 (0.19)
                                   >16.5              90      16.83 (0.24)       976     17.05 (0.24)
BMI (kg/m²)                        1–1.49             2,832   17.42 (1.51)       1,326   17.11 (1.39)
                                   1.5–2.49           7,113   16.82 (1.49)       387     15.97 (1.19)
                                   2.5–3.49           2,537   16.48 (1.40)       956     16.14 (1.23)
                                   3.5–4.49           6,915   16.25 (1.39)       20      15.92 (1.41)
                                   4.5–5.49           1,843   16.02 (1.70)       3       15.94 (1.43)
                                   5.5–6.49           3,848   15.71 (1.87)       1,269   15.82 (1.62)
                                   6.5–7.49           2,861   16.10 (1.98)       42      16.41 (2.43)
                                   7.5–8.49           3,975   16.31 (2.01)       1,040   16.83 (2.38)
                                   8.5–9.49           4,443   17.15 (2.40)       204     16.90 (2.44)
                                   9.5–10.49          6,777   17.67 (2.81)       303     18.91 (3.34)
                                   10.5–11.49         4,917   18.25 (3.10)       926     18.55 (3.16)
                                   11.5–12.49         5,240   19.04 (3.35)       4       16.78 (2.64)
                                   12.5–13.49         6,797   19.64 (3.35)       9       21.11 (3.75)
                                   13.5–14.49         4,690   20.31 (3.45)       1,196   21.39 (4.02)
                                   14.5–15.49         2,339   21.28 (3.48)       24      21.66 (4.23)
                                   15.5–16.49         1,645   21.41 (3.51)       2       20.14 (3.26)
                                   >16.5              90      22.47 (3.40)       976     23.01 (4.28)
Birth Weight (kg)                  Males              3,001   3.52 (0.53)        752     3.42 (0.57)
                                   Females            2,855   3.40 (0.47)        707     3.31 (0.55)
Birth Length (cm)                  Males              3,001   51.13 (2.40)       675     50.12 (2.34)
                                   Females            2,855   50.41 (2.28)       616     49.31 (2.28)
Gestational Age (wks)              Males              3,001   39.52 (1.64)       753     39.42 (1.99)
                                   Females            2,855   39.65 (1.58)       707     39.42 (2.06)
BMI at Adiposity Peak (kg/m²)      Males              4,030   18.03 (0.76)       --      --
                                   Females            3,792   17.45 (0.69)       --      --
Age at Adiposity Peak (months)     Males              4,030   8.90 (0.33)        --      --
                                   Females            3,792   9.36 (0.49)        --      --
BMI at Adiposity Rebound (kg/m²)   Males              3,642   15.62 (1.04)       697     15.53 (0.93)



The estimated age and BMI at the peak were weakly correlated

in females (r = 0.08) and males (r = -0.30). Later age at adiposity

peak was associated with higher BMI at age 15–17 in females but

not males. In addition, higher BMI at adiposity peak was

associated with higher BMI at age 15–17 years in both sexes.

The allelic score was not associated with age of adiposity peak in

females or males (Table 3). However, the allelic score was

associated with a higher BMI at the peak (Females: β = 0.0163 kg/m²,

P = 0.0002; Males: β = 0.0123 kg/m², P = 0.0033).

Adjustment for age at the peak did not substantively alter the magnitude

Table 1. Cont.

                                                      ALSPAC                     Raine
Measure                            Sex                N       Mean (SD)          N       Mean (SD)
BMI at Adiposity Rebound (kg/m²)   Females            3,225   15.53 (1.06)       647     15.42 (0.95)
Age at Adiposity Rebound (years)   Males              3,642   6.07 (1.02)        697     5.30 (1.05)
                                   Females            3,225   5.61 (1.16)        647     4.64 (1.10)


Table 2. Descriptive statistics of the single nucleotide polymorphisms included in the allelic score.

                                                       Alleles (Effect /   GWAS Effect    Effect Allele Frequency
Chr  Nearest Gene                  SNP          Non-effect)         Size for BMI   ALSPAC    Raine
1    NEGR1                         rs2568958    A/G                 0.13           0.5956    0.6218
1    TNNI3K                        rs1514175    A/G                 0.07           0.4249    0.4360
1    PTBP2                         rs1555543    C/A                 0.06           0.5905    0.5942
1    SEC16B                        rs543874     G/A                 0.22           0.2075    0.2021
2    TMEM18                        rs2867125    C/T                 0.31           0.8325    0.8303
2    RBJ, ADCY3, POMC              rs713586     C/T                 0.14           0.4888    0.4841
2    FANCL                         rs887912     T/C                 0.1            0.2904    0.2929
2    LRP1B                         rs2890652    C/T                 0.09           0.1669    0.1627
3    CADM2                         rs13078807   G/A                 0.1            0.2025    0.2089
3    ETV5, DGKG, SFRS10            rs7647305    C/T                 0.14           0.7924    0.7934
4    SLC39A8                       rs13107325   T/C                 0.19           0.0764    0.0723
4    GNPDA2                        rs10938397   G/A                 0.18           0.4342    0.4359
5    FLJ35779, HMGCR               rs2112347    T/G                 0.1            0.6401    0.6347
5    ZNF608                        rs4836133    A/C                 0.07           0.4949    0.4920
6    TFAP2B                        rs987237     G/A                 0.13           0.1770    0.1897
9    LRRN6C                        rs10968576   G/A                 0.11           0.3167    0.3062
9    LMX1B                         rs867559     G/A                 0.24           0.1983    0.1968
11   RPL27A, TUB                   rs4929949    C/T                 0.06           0.5390    0.5210
11   BDNF                          rs6265       C/T                 0.19           0.8122    0.8119
11   MTCH2, NDUFS3, CUGBP1         rs3817334    T/C                 0.06           0.4000    0.4213
12   FAIM2                         rs7138803    A/G                 0.12           0.3592    0.3675
13   MTIF3, GTF3A                  rs4771122    G/A                 0.09           0.2304    0.2154
14   PRKD1                         rs11847697   T/C                 0.17           0.0467    0.0414
14   NRXN3                         rs10150332   C/T                 0.13           0.2112    0.2183
15   MAP2K5, LBXCOR1               rs2241423    G/A                 0.13           0.7850    0.7699
16   GPRC5B, IQCK                  rs12444979   C/T                 0.17           0.8620    0.8541
16   SH2B1, ATXN2L, TUFM, ATP2A1   rs7359397    T/C                 0.15           0.4166    0.3791
16   FTO                           rs9939609    A/T                 0.39           0.3933    0.3835
18   MC4R                          rs12970134   A/G                 0.23           0.2680    0.2547
19   KCTD15                        rs29941      G/A                 0.06           0.6848    0.6606
19   TMEM160, ZC3H4                rs3810291    A/G                 0.09           0.6941    0.6438
19   QPCTL, GIPR                   rs2287019    C/T                 0.15           0.8123    0.8127




of the association of the allelic score with BMI at the peak

(Females: β = 0.0157 kg/m², P = 0.0003; Males: β = 0.0135 kg/m²,

P = 0.0007).

Earlier age and higher BMI at the adiposity rebound were both

associated with higher BMI at age 15–17 years. The allelic score

was associated with an earlier age at the adiposity rebound for

females (β = -0.0362 years, P < 0.0001) and males

(β = -0.0362 years, P < 0.0001) (Table 3). The effect size was

attenuated after adjusting for BMI at the rebound (Females:

β = -0.0122 years, P = 0.0018; Males: β = -0.0096 years,

P = 0.0022). The allelic score was also associated with higher

BMI at the rebound in females (β = 0.0332 kg/m², P < 0.0001) and

males (β = 0.0364 kg/m², P < 0.0001). Again, the effect size

attenuated when adjusting for age at the rebound (Females:

β = 0.0094 kg/m², P = 0.0078; Males: β = 0.0109 kg/m²,

P = 0.0004).

There was a strong positive correlation between BMI at the

adiposity peak and the adiposity rebound (Female r = 0.65,

P < 0.0001; Male r = 0.59, P < 0.0001). BMI at the adiposity

rebound explains more of the variation in BMI at age 15–17

(45%) than the BMI at the adiposity peak (10%). Nevertheless, the

allelic score remains associated with BMI at the adiposity rebound

after adjusting for the BMI at the adiposity peak in both females

(β = 0.0171 kg/m², P < 0.0001) and males (β = 0.0269 kg/m²,

P < 0.0001).

Variance explained by the allelic score

We calculated the percentage of variation in BMI explained by

the allelic score at each time point in the ALSPAC cohort using

the residual sums of squares from the longitudinal BMI growth

model [40]. We did not calculate this in the Raine cohort as the

sample size was too small for accurate estimates. The allelic score

explained 0.58% of the variance in BMI across childhood overall

in females and slightly less in males (0.44%) in ALSPAC, but this

percentage varied with age (Figure 3). This is approximately a

third of the variance in adult BMI explained by these SNPs in the

Figure 1. Population average curves for individuals with 27, 29 or 31 BMI risk alleles in females (A, C and E) and males (B, D and F) from the ALSPAC cohort. Predicted population average BMI (A and B), weight (C and D) and height (E and F) trajectories from 1 – 16 years for individuals with 27 (lower quartile), 29 (median), and 31 (upper quartile) BMI risk alleles in the allelic score. doi:10.1371/journal.pone.0079547.g001



Figure 2. Associations between the allelic score and BMI, weight and height at each follow-up in females (A, C and E) and males (B, D and F) from the ALSPAC cohort. Regression coefficients (95% CI) derived from the longitudinal model at each year of follow-up between 1 and 16 years. doi:10.1371/journal.pone.0079547.g002

Table 3. Cross-sectional association analysis results for birth measures, BMI and age at adiposity peak (AP) and BMI and age at adiposity rebound (AR) in the ALSPAC and Raine cohorts.

                      Females                                   Males
Outcome               Beta (95% CI)                 P-Value     Beta (95% CI)                 P-Value
Birth weight (kg)     -0.0004 (-0.0043, 0.0035)     0.8283      0.0026 (-0.0017, 0.0069)      0.2334
Birth length (cm)     -0.0158 (-0.0352, 0.0036)     0.1111      -0.0002 (-0.0190, 0.0186)     0.9840
BMI at AP (kg/m²)     0.0163 (0.0079, 0.0248)       0.0002      0.0123 (0.0041, 0.0204)       0.0033
Age at AP (months)    0.0074 (-0.0002, 0.0151)      0.0566      0.0028 (-0.0025, 0.0080)      0.3020
BMI at AR (kg/m²)     0.0332 (0.0237, 0.0427)       <0.0001     0.0364 (0.0277, 0.0451)       <0.0001
Age at AR (years)     -0.0362 (-0.0467, -0.0257)    <0.0001     -0.0362 (-0.0450, -0.0274)    <0.0001
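
As a reading aid for Table 3, the sketch below shows how a standard error and an approximate P-value can be recovered from a reported estimate and its 95% confidence interval. It assumes the interval is a symmetric normal-theory (Wald) interval, which the source does not state explicitly; the function names are illustrative. The example uses the female estimate for BMI at the adiposity peak, 0.0163 (0.0079, 0.0248).

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_975)


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided P-value from a normal (Wald) test of beta = 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


# Female estimate for BMI at the adiposity peak, as reported in Table 3.
beta, lower, upper = 0.0163, 0.0079, 0.0248

se = se_from_ci(lower, upper)
p_value = two_sided_p(beta, se)
print(f"SE = {se:.4f}, P = {p_value:.5f}")
```

Under that assumption the recovered P-value is of the same order as the tabulated 0.0002; small differences are expected because the published estimate and limits are rounded.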




study that identified them [5]. Figure 3 displays the estimates over

childhood in females and males.

The allelic score accounted for a similar percentage of the variation in

BMI at the adiposity peak in both females (0.42%) and males (0.22%).

However, for the measures at the adiposity rebound, the allelic

score accounts for up to 1–2% of the variation in the two cohorts

(Age: 0.87% in ALSPAC females, 2.70% in Raine females, 1.46%

in ALSPAC males and 0.89% in Raine males; BMI: 1.01% in

ALSPAC females, 1.87% in Raine females, 1.46% in ALSPAC

males and 1.14% in Raine males). This is twice as much of the

variation as the allelic score accounted for at the time of

the adiposity peak or in the overall trajectory.

Single SNP analyses

In females, five of the 32 individual loci (RBJ, FTO, MC4R,

CADM2 and MTCH2) reached a Bonferroni significance threshold

of 0.0016 in the meta-analysis (Table S2). In males, four of the 32

individual loci (SEC16B, TMEM18, MC4R and FTO) were

associated with BMI trajectory at the Bonferroni significance

threshold (Table S3). Only FTO and MC4R reached statistical

significance in both males and females.
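
The single-SNP combination step can be sketched as follows. This is a hedged illustration that assumes the unweighted P-value meta-analysis is Fisher's combined probability method (the source names only the MADAM library); that assumption appears consistent with the rs713586 values reported for females in Appendix F (LRT P = 4.8x10^-10 in ALSPAC, 0.18 in Raine, combined P = 2.1x10^-9). The function name is hypothetical.

```python
import math

N_SNPS = 32
BONFERRONI_THRESHOLD = 0.05 / N_SNPS  # 0.0015625, reported as 0.0016


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k df."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the test statistic
    # Closed-form upper tail of a chi-square with an even number (2k) of df.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# LRT P-values for rs713586 in females, as reported in Appendix F.
lrt_p = [4.8e-10, 0.18]

combined = fisher_combined_p(lrt_p)
corrected = min(1.0, combined * N_SNPS)  # Bonferroni-corrected P-value
print(combined, corrected, combined < BONFERRONI_THRESHOLD)
```

Because the inputs are rounded, the result agrees with the tabulated combined and Bonferroni-corrected values only to about two significant figures.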

Sex differences

In analyses combining males and females, there was no evidence

for sex interactions for any of the 32 loci after Bonferroni

correction; however we report the following result here as an

exploratory finding. The sex interaction for the NRXN3 locus,

rs10150332 (including interaction with the spline function), had a

P-Value of 0.0039.

Discussion

We investigated the association of variants in genes known to be

associated with increased BMI in adulthood with growth measures

over childhood from two extensively characterized longitudinal

birth cohorts. Similar to previous studies [13,14,15,16,17], we

have shown that an allelic score of known adult BMI-associated

SNPs is not associated with birth measures but is associated with

BMI growth throughout childhood and adolescence, weight

changes, and also height changes (though with weaker

associations). Previous work by Elks et al [13] in the ALSPAC cohort

investigated the association of an 8 SNP allelic score with growth

trajectories from birth to 11 years of age. We have extended their

work by including an additional cohort, and by increasing the age

period over which the trajectories are examined and the number

of SNPs investigated. By extending the age range, we have shown

that the association between the allelic score and weight changes

increases in magnitude with age, whereas the association of the

allelic score with height growth stops after the onset of puberty.

Belsky et al [15] are the only other investigators to look at an allelic

score using the same set of SNPs; our conclusions are similar to

theirs in terms of the growth trajectories throughout childhood,

but we extend their work by i) having more detailed early growth

measurements, enabling us to show that the allelic score starts to

be associated with growth trajectories at an early age and to assess

Figure 3. A smooth curve of the estimates from the longitudinal models of the proportion of BMI variation explained (R2) at each time point in females and males from the ALSPAC cohort. R2 derived from the longitudinal model at each year of follow-up between 1 and 16 years. doi:10.1371/journal.pone.0079547.g003



associations between the allelic score and the adiposity peak in

infancy, and ii) reporting some exploratory findings regarding sex-specific

genes affecting BMI growth. The GIANT consortium found a

SNP 30,000 bp upstream from the RBJ locus and a SNP in the

MC4R gene to be associated with adult height [42], but the full

functional relevance of the 32 loci, and which of them affect

height, fat accumulation or both, is not yet understood, and our

study does not have sufficient power to address this. A useful

extension to the current study would be to investigate whether any

of the individual SNPs in the allelic score largely influence child

height growth rather than weight; however a larger sample size

would be required to consider this.

Although the effect sizes presented appear relatively small, they

are consistent with those previously reported in the adult studies.

At age 15, an increase of one BMI risk allele increases BMI by

approximately 0.15 kg/m², which is equivalent to some of the

mid-range effect sizes from adult GWASs as reported in

Table 2. The genetic basis of obesity remains

largely unknown, with only 1.45% of the variation in BMI

explained by known genetic variants [5]; however, this study sheds more

light on the mechanisms behind how these genetic variants

influence childhood growth, rather than describing particularly

large effect sizes for any individual SNP.

Our results suggest that known adult BMI increasing alleles

have a detectable effect on childhood growth as early as one year.

In addition, we investigated the association between the allelic

score and features of the growth curve thought to be associated

with later obesity and cardiovascular health [23,24,43,44,45]; the

allelic score was positively associated with higher BMI at the

adiposity peak, but only weakly associated with age at adiposity

peak. This contrasts with the findings for the association between the

FTO gene and adiposity peak shown in the Northern Finnish Birth

Cohort from 1966 [25], where the age but not BMI at adiposity

peak was associated with the FTO variant; however, subsequent

analysis in this cohort as part of a meta-analysis showed the

association was not statistically significant [17]. The explanation

for these differences is unclear; both of the cohorts investigated

had limited data available in the first few years of life, and

although data availability was greater than in previous studies and

we were able to estimate the emergence of the genetic association

and the parameters around the adiposity peak, it would be

beneficial to replicate this finding in cohorts with more regular

measurements in early infancy. Likewise, we saw differences in the

timing of the adiposity rebound between the ALSPAC and Raine

cohorts, with an earlier rebound being found in the Raine cohort.

This could be due to the lack of data between three and five and a

half years where the rebound is expected to occur. In contrast, the

ALSPAC cohort had an adequate number of measurements

throughout the adiposity rebound period although a portion of

them came from parental report questionnaires which have been

shown to be less accurate than the clinic measures [46]. Therefore,

the precision of the estimate for the BMI and age at the adiposity

rebound is very similar between the two cohorts, as seen by the

standard deviations in Table 1. In addition, we do not believe this

has influenced the genetic results as the effect sizes of the allelic

score were similar between the ALSPAC and Raine cohorts for

both the age and BMI at the adiposity rebound (data not shown).

Previous studies investigating the association between adult BMI

associated SNPs and childhood growth adjusted their analyses for

sex [13,14,15,16,17]; only Hardy et al [16] tested for a sex

interaction and found it to be non-significant. We detected a

statistically significant sex interaction for the allelic score, so

conducted sex-specific analyses. We found that the allelic score

begins to be associated with BMI and weight earlier in males than

females, but around the same age for height. Furthermore, other

than the FTO and MC4R SNPs, we found different genes

associated with childhood BMI trajectory in males and females.

However, these differences could not be replicated in the formal

interaction analysis and therefore further investigation in larger

sample sizes is required to confirm this observation. Our findings

provide additional evidence that there may be different, but

partially overlapping, genes that contribute to the body shape of

males and females from early childhood.

In conclusion, we have conducted an association analysis in a

large childhood population to investigate the effect of known adult

genetic determinants of BMI on childhood growth trajectory. We

have shown that the genetic effect begins very early in life, which is

consistent with the life course epidemiology hypotheses – the

determinants of adult susceptibility to obesity begin in early

childhood and develop over the life course.

Supporting Information

Table S1 Longitudinal allelic score association analysis results

for BMI, weight and height in ALSPAC and Raine, in addition to

the meta-analysis summary

(XLSX)

Table S2 Longitudinal association analysis results for each of the

32 BMI SNPs against BMI, weight and height in females from

ALSPAC and Raine, in addition to the meta-analysis summary

(XLSX)

Table S3 Longitudinal association analysis results for each of the

32 BMI SNPs against BMI, weight and height in males from

ALSPAC and Raine, in addition to the meta-analysis summary

(XLSX)

Methods S1 Additional information regarding the collection of

phenotypic measurements and genotyping methods in the

ALSPAC and Raine cohorts. Furthermore, additional details

regarding the longitudinal modelling and derivation of growth

phenotypes are provided.

(DOC)

Acknowledgments

ALSPAC: We are extremely grateful to all the families who took part in this

study, the midwives for their help in recruiting them, and the whole

ALSPAC team, which includes interviewers, computer and laboratory

technicians, clerical workers, research scientists, volunteers, managers,

receptionists and nurses.

Raine: The authors are grateful to the Raine Study participants, their

families, and to the Raine Study research staff for cohort coordination and

data collection. The authors gratefully acknowledge the assistance of the

Western Australian DNA Bank (National Health and Medical Research

Council of Australia National Enabling Facility).

Author Contributions

Conceived and designed the experiments: NMW LDH KT LJP DAL LB.

Analyzed the data: NMW. Contributed reagents/materials/analysis tools:

LDH YYW. Wrote the paper: NMW LDH DAL LB. Acquired data: CEP

JN GDS LJP LJB SJL DAL. Interpreted results and reviewed manuscript:

YYW NJT KT CEP JN GDS LJP LJB SJL. Approved manuscript for

submission: NMW LDH YYW NJT KT CEP JN GDS LJP LJB SJL DAL

LB.



References

1. Maes HH, Neale MC, Eaves LJ (1997) Genetic and environmental factors in relative body weight and human adiposity. Behav Genet 27: 325–351.

2. Haworth CM, Carnell S, Meaburn EL, Davis OS, Plomin R, et al. (2008) Increasing heritability of BMI and stronger associations with the FTO gene over childhood. Obesity (Silver Spring) 16: 2663–2668.

3. Wardle J, Carnell S, Haworth CM, Plomin R (2008) Evidence for a strong genetic influence on childhood adiposity despite the force of the obesogenic environment. Am J Clin Nutr 87: 398–404.

4. Parsons TJ, Power C, Logan S, Summerbell CD (1999) Childhood predictors of adult obesity: a systematic review. Int J Obes Relat Metab Disord 23 Suppl 8: S1–107.

5. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42: 937–948.

6. Liu JZ, Medland SE, Wright MJ, Henders AK, Heath AC, et al. (2010) Genome-wide association study of height and body mass index in Australian twin families. Twin Res Hum Genet 13: 179–193.

7. Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, et al. (2009) Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet 41: 18–24.

8. Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, et al. (2009) Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41: 25–34.

9. Loos RJ, Lindgren CM, Li S, Wheeler E, Zhao JH, et al. (2008) Common variants near MC4R are associated with fat mass, weight and risk of obesity. Nat Genet 40: 768–775.

10. Fox CS, Heard-Costa N, Cupples LA, Dupuis J, Vasan RS, et al. (2007) Genome-wide association to body mass index and waist circumference: the Framingham Heart Study 100K project. BMC Med Genet 8 Suppl 1: S18.

11. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316: 889–894.

12. Bradfield JP, Taal HR, Timpson NJ, Scherag A, Lecoeur C, et al. (2012) A genome-wide association meta-analysis identifies new childhood obesity loci. Nat Genet 44: 526–531.

13. Elks CE, Loos RJ, Sharp SJ, Langenberg C, Ring SM, et al. (2010) Genetic markers of adult obesity risk are associated with greater early infancy weight gain and growth. PLoS Med 7: e1000284.

14. Mei H, Chen W, Jiang F, He J, Srinivasan S, et al. (2012) Longitudinal replication studies of GWAS risk SNPs influencing body mass index over the course of childhood and adulthood. PLoS One 7: e31470.

15. Belsky DW, Moffitt TE, Houts R, Bennett GG, Biddle AK, et al. (2012) Polygenic risk, rapid childhood growth, and the development of obesity: evidence from a 4-decade longitudinal study. Arch Pediatr Adolesc Med 166: 515–521.

16. Hardy R, Wills AK, Wong A, Elks CE, Wareham NJ, et al. (2010) Life course variations in the associations between FTO and MC4R gene variants and body size. Hum Mol Genet 19: 545–552.

17. Sovio U, Mook-Kanamori DO, Warrington NM, Lawrence R, Briollais L, et al. (2011) Association between common variation at the FTO locus and changes in body mass index from infancy to late childhood: the complex nature of genetic association through growth and development. PLoS Genet 7: e1001307.

18. Silverwood RJ, De Stavola BL, Cole TJ, Leon DA (2009) BMI peak in infancy as a predictor for later BMI in the Uppsala Family Study. Int J Obes (Lond) 33: 929–937.

19. Adair LS (2008) Child and adolescent obesity: epidemiology and developmental perspectives. Physiol Behav 94: 8–16.

20. Dietz WH (1994) Critical periods in childhood for the development of obesity. Am J Clin Nutr 59: 955–959.

21. He Q, Karlberg J (2002) Probability of adult overweight and risk change during the BMI rebound period. Obes Res 10: 135–140.

22. Rolland-Cachera MF, Deheeger M, Bellisle F, Sempe M, Guilloud-Bataille M, et al. (1984) Adiposity rebound in children: a simple indicator for predicting obesity. Am J Clin Nutr 39: 129–135.

23. Rolland-Cachera MF, Deheeger M, Maillot M, Bellisle F (2006) Early adiposity rebound: causes and consequences for obesity in children and adults. Int J Obes (Lond) 30 Suppl 4: S11–17.

24. Whitaker RC, Pepe MS, Wright JA, Seidel KD, Dietz WH (1998) Early adiposity rebound and the risk of adult obesity. Pediatrics 101: E5.

25. Sovio U, Timpson NJ, Warrington NM, Briollais L, Mook-Kanamori D, et al. (2009) Association Between FTO Polymorphism, Adiposity Peak and Adiposity Rebound in The Northern Finland Birth Cohort 1966. Atherosclerosis 207: e4–e5.

26. Elks CE, Perry JR, Sulem P, Chasman DI, Franceschini N, et al. (2010) Thirty new loci for age at menarche identified by a meta-analysis of genome-wide association studies. Nat Genet 42: 1077–1085.

27. Dvornyk V, Waqar ul H (2012) Genetics of age at menarche: a systematic review. Hum Reprod Update 18: 198–210.

28. Wen X, Kleinman K, Gillman MW, Rifas-Shiman SL, Taveras EM (2012) Childhood body mass index trajectories: modeling, characterizing, pairwise correlations and socio-demographic predictors of trajectory characteristics. BMC Med Res Methodol 12: 38.

29. Zillikens MC, Yazdanpanah M, Pardo LM, Rivadeneira F, Aulchenko YS, et al. (2008) Sex-specific genetic effects influence variation in body composition. Diabetologia 51: 2233–2241.

30. Comuzzie AG, Blangero J, Mahaney MC, Mitchell BD, Stern MP, et al. (1993) Quantitative genetics of sexual dimorphism in body fat measurements. American Journal of Human Biology 5: 725–734.

31. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, et al. (2012) Cohort Profile: The ’Children of the 90s’--the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol.

32. Newnham JP, Evans SF, Michael CA, Stanley FJ, Landau LI (1993) Effects of frequent ultrasound during pregnancy: a randomised controlled trial. Lancet 342: 887–891.

33. Williams LA, Evans SF, Newnham JP (1997) Prospective cohort study of factors influencing the relative weights of the placenta and the newborn infant. BMJ 314: 1864–1868.

34. Evans S, Newnham J, MacDonald W, Hall C (1996) Characterisation of the possible effect on birthweight following frequent prenatal ultrasound examinations. Early Hum Dev 45: 203–214.

35. Paternoster L, Zhurov AI, Toma AM, Kemp JP, St Pourcain B, et al. (2012) Genome-wide association study of three-dimensional facial morphology identifies a variant in PAX3 associated with nasion position. Am J Hum Genet 90: 478–485.

36. Taal HR, St Pourcain B, Thiering E, Das S, Mook-Kanamori DO, et al. (2012) Common variants at 12q15 and 12q24 are associated with infant head circumference. Nat Genet 44: 532–538.

37. Janssens AC, Aulchenko YS, Elefante S, Borsboom GJ, Steyerberg EW, et al. (2006) Predictive testing for complex diseases using multiple genes: fact or fiction? Genet Med 8: 395–400.

38. Warrington NM, Wu YY, Pennell CE, Marsh JA, Beilin LJ, et al. (2013) Modelling BMI Trajectories in Children for Genetic Association Studies. PLoS One 8: e53897.

39. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.

40. Xu R (2003) Measuring explained variation in linear mixed effects models. Stat Med 22: 3527–3541.

41. Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299–314.

42. Lango Allen H, Estrada K, Lettre G, Berndt SI, Weedon MN, et al. (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467: 832–838.

43. Bhargava SK, Sachdev HS, Fall CH, Osmond C, Lakshmy R, et al. (2004) Relation of serial changes in childhood body-mass index to impaired glucose tolerance in young adulthood. N Engl J Med 350: 865–875.

44. Eriksson JG, Forsen T, Tuomilehto J, Osmond C, Barker DJ (2003) Early adiposity rebound in childhood and risk of Type 2 diabetes in adult life. Diabetologia 46: 190–194.

45. Taylor RW, Grant AM, Goulding A, Williams SM (2005) Early adiposity rebound: review of papers linking this to subsequent obesity in children and adults. Curr Opin Clin Nutr Metab Care 8: 607–612.

46. Dubois L, Girard M (2007) Accuracy of maternal reports of pre-schoolers’ weights and heights as estimates of BMI values. Int J Epidemiol 36: 132–138.



Appendix F: Additional Results from Allelic Score Analysis in Chapter Six

Table 1: Results for females of the 32 individual adult BMI associated SNPs with BMI trajectories in both cohorts and the combined meta-analysis; k1, k2 and k3 represent knot point 1, 2 and 3 respectively.

                              ALSPAC                      Raine                       ALSPAC      Raine    Combined    Combined Bonferroni
Term                          Beta        SE      P       Beta     SE      P       LRT P       LRT P    P-Value     corrected P-Value

NEGR1
rs2568958                     0.0042      0.0032  0.19    0.0090   0.0072  0.21    0.23        0.17     0.16        1.00
(Age-8):rs2568958             0.0005      0.0005  0.31    0.0013   0.0012  0.29
(Age-8)^2:rs2568958           -0.0006     0.0005  0.27    0.0019   0.0016  0.24
(Age-8)^3:rs2568958           -0.0002     0.0002  0.35    0.0011   0.0008  0.18
(Age-8)^3:k2:rs2568958        0.0005      0.0005  0.39    -0.0023  0.0016  0.14
(Age-8)^3:k3:rs2568958        -0.0004     0.0012  0.72    0.0027   0.0020  0.17
(Age-8)^3:k1:rs2568958        0.0500      0.0282  0.08    0.1762   0.0924  0.06

TNNI3K
rs1514175                     0.0109      0.0032  0.00    0.0165   0.0071  0.02    4.8x10^-3   0.47     0.02        0.52
(Age-8):rs1514175             0.0019      0.0005  0.00    0.0011   0.0012  0.34
(Age-8)^2:rs1514175           1.35x10^-5  0.0005  0.98    -0.0002  0.0015  0.89
(Age-8)^3:rs1514175           -0.0001     0.0002  0.81    0.0000   0.0008  0.97
(Age-8)^3:k2:rs1514175        -0.0001     0.0005  0.79    0.0001   0.0015  0.97
(Age-8)^3:k3:rs1514175        0.0008      0.0011  0.50    -0.0002  0.0019  0.91
(Age-8)^3:k1:rs1514175        -0.0165     0.0285  0.56    -0.0114  0.0888  0.90

PTBP2
rs1555543                     0.0011      0.0032  0.73    0.0134   0.0073  0.07    0.86        0.15     0.38        1.00
(Age-8):rs1555543             0.0003      0.0005  0.57    0.0031   0.0012  0.01
(Age-8)^2:rs1555543           0.0001      0.0005  0.92    0.0012   0.0016  0.46
(Age-8)^3:rs1555543           0.0001      0.0002  0.72    0.0004   0.0008  0.58
(Age-8)^3:k2:rs1555543        -0.0002     0.0005  0.67    -0.0015  0.0016  0.36
(Age-8)^3:k3:rs1555543        0.0008      0.0011  0.49    0.0029   0.0020  0.16
(Age-8)^3:k1:rs1555543        0.0180      0.0289  0.53    0.0876   0.0929  0.35

SEC16B
rs543874                      0.0074      0.0038  0.05    0.0260   0.0087  0.00    0.03        0.09     0.02        0.60
(Age-8):rs543874              0.0016      0.0006  0.01    0.0033   0.0014  0.02
(Age-8)^2:rs543874            -0.0007     0.0006  0.31    0.0001   0.0019  0.95
(Age-8)^3:rs543874            -0.0003     0.0003  0.28    0.0000   0.0010  0.97
(Age-8)^3:k2:rs543874         0.0006      0.0006  0.33    -0.0005  0.0019  0.81
(Age-8)^3:k3:rs543874         -0.0013     0.0014  0.36    0.0014   0.0024  0.55
(Age-8)^3:k1:rs543874         -0.0123     0.0350  0.72    -0.0357  0.1148  0.76

TMEM18
rs2867125                     0.0065      0.0041  0.12    0.0124   0.0090  0.17    0.01        0.85     0.05        1.00
(Age-8):rs2867125             0.0011      0.0007  0.11    0.0002   0.0015  0.91
(Age-8)^2:rs2867125           0.0006      0.0007  0.41    -0.0021  0.0020  0.29
(Age-8)^3:rs2867125           4.25x10^-5  0.0003  0.89    -0.0009  0.0010  0.35
(Age-8)^3:k2:rs2867125        -0.0003     0.0007  0.68    0.0021   0.0020  0.29
(Age-8)^3:k3:rs2867125        -0.0002     0.0015  0.90    -0.0024  0.0024  0.33
(Age-8)^3:k1:rs2867125        -0.0244     0.0362  0.50    -0.0841  0.1153  0.47

RBJ, ADCY3, POMC
rs713586                      0.0152      0.0031  0.00    0.0154   0.0068  0.02    4.8x10^-10  0.18     2.1x10^-9   6.73x10^-8
(Age-8):rs713586              0.0015      0.0005  0.00    0.0012   0.0011  0.29
(Age-8)^2:rs713586            0.0002      0.0005  0.73    -0.0006  0.0015  0.69
(Age-8)^3:rs713586            -0.0001     0.0002  0.67    -0.0003  0.0008  0.68
(Age-8)^3:k2:rs713586         -0.0002     0.0005  0.70    0.0002   0.0015  0.90
(Age-8)^3:k3:rs713586         0.0012      0.0011  0.26    0.0009   0.0019  0.63
(Age-8)^3:k1:rs713586         -0.0206     0.0276  0.45    -0.0222  0.0896  0.80

FANCL
rs887912                      0.0023      0.0035  0.51    -0.0002  0.0076  0.97    0.80        0.32     0.60        1.00
(Age-8):rs887912              0.0004      0.0006  0.50    0.0031   0.0013  0.02
(Age-8)^2:rs887912            -0.0004     0.0006  0.51    0.0015   0.0017  0.39
(Age-8)^3:rs887912            -0.0002     0.0003  0.37    0.0003   0.0009  0.70
(Age-8)^3:k2:rs887912         0.0005      0.0006  0.41    -0.0015  0.0017  0.38
(Age-8)^3:k3:rs887912         -0.0012     0.0012  0.31    0.0026   0.0021  0.21
(Age-8)^3:k1:rs887912         -0.0384     0.0329  0.24    0.0145   0.1034  0.89

CADM2
rs13078807                    0.0142      0.0039  0.00    0.0121   0.0090  0.18    1.5x10^-3   0.09     1.4x10^-3   0.04
(Age-8):rs13078807            0.0020      0.0006  0.00    -0.0009  0.0015  0.55
(Age-8)^2:rs13078807          -0.0006     0.0007  0.33    -0.0001  0.0020  0.94
(Age-8)^3:rs13078807          -0.0002     0.0003  0.46    0.0004   0.0010  0.69
(Age-8)^3:k2:rs13078807       0.0002      0.0007  0.78    0.0003   0.0020  0.90
(Age-8)^3:k3:rs13078807       0.0004      0.0014  0.77    -0.0021  0.0025  0.41
(Age-8)^3:k1:rs13078807       0.0470      0.0348  0.18    0.1199   0.1147  0.30

ETV5, DGKG, SFRS10
rs7647305                     0.0104      0.0040  0.01    0.0027   0.0090  0.76    0.02        0.62     0.08        1.00
(Age-8):rs7647305             0.0016      0.0007  0.01    0.0006   0.0015  0.71
(Age-8)^2:rs7647305           0.0002      0.0007  0.80    -0.0019  0.0020  0.33
(Age-8)^3:rs7647305           3.56x10^-5  0.0003  0.90    -0.0013  0.0010  0.17
(Age-8)^3:k2:rs7647305        -0.0004     0.0007  0.56    0.0023   0.0020  0.24
(Age-8)^3:k3:rs7647305        0.0009      0.0014  0.54    -0.0021  0.0025  0.40
(Age-8)^3:k1:rs7647305        0.0528      0.0364  0.15    -0.1877  0.1118  0.09

SLC39A8
rs13107325                    0.0167      0.0060  0.01    0.0138   0.0132  0.30    4.9x10^-4   0.88     3.8x10^-3   0.12
(Age-8):rs13107325            0.0033      0.0010  0.00    -0.0014  0.0022  0.53
(Age-8)^2:rs13107325          0.0009      0.0010  0.36    -0.0014  0.0029  0.62
(Age-8)^3:rs13107325          3.85x10^-5  0.0004  0.93    -0.0004  0.0015  0.77
(Age-8)^3:k2:rs13107325       -0.0010     0.0010  0.35    0.0014   0.0029  0.63
(Age-8)^3:k3:rs13107325       0.0036      0.0023  0.11    -0.0026  0.0036  0.47
(Age-8)^3:k1:rs13107325       0.0080      0.0539  0.88    -0.0111  0.1777  0.95

FLJ35779, HMGCR

rs2112347 0.0029 0.0033 0.38

0.16

0.0090 0.0075 0.23

0.77 0.38 1.00 (Age-8):rs2112347 0.0011 0.0005 0.03 0.0012 0.0012 0.34 (Age-8)2:rs2112347 -0.0003 0.0006 0.65 -0.0021 0.0016 0.19 (Age-8)3:rs2112347 -0.0003 0.0002 0.16 -0.0011 0.0008 0.20

(Age-8)3:k2:rs2112347 0.0002 0.0005 0.65 0.0021 0.0016 0.21 (Age-8)3:k3:rs2112347 0.0007 0.0012 0.57 -0.0022 0.0020 0.27 (Age-8)3:k1:rs2112347 -0.0533 0.0297 0.07 -0.0972 0.0968 0.32

ZNF608

rs4836133 0.0033 0.0032 0.31

0.68

0.0022 0.0072 0.76

0.15 0.33 1.00

(Age-8):rs4836133 -0.0002 0.0005 0.77 -0.0024 0.0012 0.05 (Age-8)2:rs4836133 -0.0001 0.0005 0.88 -0.0039 0.0016 0.01 (Age-8)3:rs4836133 3.55x10-5 0.0002 0.88 -0.0014 0.0008 0.08 (Age-8)3:k2:rs4836133 -0.0001 0.0005 0.85 0.0037 0.0016 0.02 (Age-8)3:k3:rs4836133 0.0009 0.0012 0.46 -0.0051 0.0020 0.01 (Age-8)3:k1:rs4836133 -0.0056 0.0293 0.85 -0.0687 0.0946 0.47

TFAP2B

rs987237 0.0118 0.0041 0.00

0.02

0.0091 0.0094 0.33

0.05 0.01 0.23

(Age-8):rs987237 0.0014 0.0007 0.05 -0.0009 0.0015 0.55 (Age-8)2:rs987237 0.0004 0.0007 0.59 -0.0009 0.0020 0.67 (Age-8)3:rs987237 0.0002 0.0003 0.45 -0.0001 0.0010 0.93 (Age-8)3:k2:rs987237 -0.0009 0.0007 0.22 0.0011 0.0021 0.58 (Age-8)3:k3:rs987237 0.0031 0.0015 0.04 -0.0041 0.0025 0.10 (Age-8)3:k1:rs987237 0.0518 0.0368 0.16 0.0998 0.1203 0.41

LRRN6C

rs10968576 -0.0003 0.0034 0.93

0.08

-0.0075 0.0078 0.33

0.77 0.23 1.00

(Age-8):rs10968576 -0.0006 0.0006 0.30 -0.0013 0.0013 0.32 (Age-8)2:rs10968576 -0.0017 0.0006 0.00 0.0001 0.0017 0.94 (Age-8)3:rs10968576 -0.0008 0.0003 0.00 0.0002 0.0009 0.84 (Age-8)3:k2:rs10968576 0.0017 0.0006 0.00 0.0001 0.0017 0.94 (Age-8)3:k3:rs10968576 -0.0026 0.0012 0.04 -0.0015 0.0021 0.48 (Age-8)3:k1:rs10968576 -0.0315 0.0309 0.31 0.0564 0.1051 0.59

LMX1B rs867559 0.0002 0.0039 0.95

0.32 0.0071 0.0092 0.44

0.15 0.19 1.00 (Age-8):rs867559 0.0014 0.0006 0.03 0.0036 0.0015 0.02 (Age-8)2:rs867559 0.0003 0.0007 0.60 0.0019 0.0020 0.34

(Age-8)3:rs867559 0.0001 0.0003 0.78 0.0002 0.0010 0.87 (Age-8)3:k2:rs867559 -0.0005 0.0006 0.40 -0.0018 0.0020 0.36 (Age-8)3:k3:rs867559 0.0018 0.0014 0.21 0.0043 0.0025 0.09 (Age-8)3:k1:rs867559 0.0272 0.0356 0.44 -0.1023 0.1211 0.40

RPL27A, TUB

rs4929949 0.0058 0.0032 0.07

0.03

0.0050 0.0072 0.49

0.40 0.06 1.00

(Age-8):rs4929949 0.0002 0.0005 0.76 0.0015 0.0012 0.22 (Age-8)2:rs4929949 -0.0014 0.0005 0.01 0.0009 0.0016 0.56 (Age-8)3:rs4929949 -0.0007 0.0002 0.00 0.0005 0.0008 0.56 (Age-8)3:k2:rs4929949 0.0014 0.0005 0.01 -0.0014 0.0016 0.38 (Age-8)3:k3:rs4929949 -0.0013 0.0012 0.25 0.0027 0.0020 0.18 (Age-8)3:k1:rs4929949 -0.0500 0.0298 0.09 0.0980 0.0923 0.29

BDNF

rs6265 0.0105 0.0040 0.01

0.07

0.0088 0.0090 0.33

0.03 0.01 0.43

(Age-8):rs6265 0.0013 0.0007 0.04 0.0050 0.0015 0.00 (Age-8)2:rs6265 0.0001 0.0007 0.87 0.0016 0.0020 0.43 (Age-8)3:rs6265 0.0001 0.0003 0.76 -0.0002 0.0010 0.86 (Age-8)3:k2:rs6265 -0.0003 0.0007 0.70 -0.0015 0.0020 0.47 (Age-8)3:k3:rs6265 0.0001 0.0015 0.93 0.0043 0.0025 0.09 (Age-8)3:k1:rs6265 -0.0450 0.0387 0.24 -0.1909 0.1164 0.10

MTCH2, NDUFS3, CUGBP1

rs3817334 0.0071 0.0032 0.02

1.7x10-4

-0.0062 0.0070 0.38

0.91 1.5x10-3 0.05

(Age-8):rs3817334 0.0027 0.0005 0.00 -0.0007 0.0012 0.55 (Age-8)2:rs3817334 0.0005 0.0005 0.37 -0.0011 0.0015 0.47 (Age-8)3:rs3817334 -6.92x10-6 0.0002 0.98 -0.0005 0.0008 0.47 (Age-8)3:k2:rs3817334 -0.0007 0.0005 0.17 0.0013 0.0015 0.40 (Age-8)3:k3:rs3817334 0.0028 0.0011 0.01 -0.0019 0.0019 0.33 (Age-8)3:k1:rs3817334 -0.0310 0.0286 0.28 -0.0462 0.0881 0.60

FAIM2 rs7138803 0.0093 0.0032 0.00

0.09 0.0108 0.0072 0.14

0.19 0.09 1.00 (Age-8):rs7138803 0.0008 0.0005 0.14 0.0023 0.0012 0.06

(Age-8)2:rs7138803 -0.0003 0.0005 0.58 0.0002 0.0016 0.90 (Age-8)3:rs7138803 -0.0001 0.0002 0.55 -0.0001 0.0008 0.90 (Age-8)3:k2:rs7138803 0.0002 0.0005 0.76 0.0000 0.0016 0.99 (Age-8)3:k3:rs7138803 0.0006 0.0011 0.61 -0.0006 0.0020 0.78 (Age-8)3:k1:rs7138803 -0.0372 0.0298 0.21 -0.0624 0.0939 0.51

MTIF3, GTF3A

rs4771122 -0.0019 0.0038 0.61

0.30

0.0212 0.0088 0.02

0.15 0.19 1.00

(Age-8):rs4771122 0.0010 0.0006 0.10 0.0033 0.0015 0.03 (Age-8)2:rs4771122 0.0014 0.0006 0.03 -0.0011 0.0019 0.58 (Age-8)3:rs4771122 0.0006 0.0003 0.04 -0.0004 0.0010 0.67 (Age-8)3:k2:rs4771122 -0.0015 0.0006 0.02 0.0005 0.0019 0.79 (Age-8)3:k3:rs4771122 0.0028 0.0014 0.04 0.0002 0.0024 0.93 (Age-8)3:k1:rs4771122 0.0447 0.0343 0.19 0.0439 0.1140 0.70

PRKD1

rs11847697 0.0023 0.0077 0.76

0.43

-0.0043 0.0185 0.81

0.59 0.60 1.00

(Age-8):rs11847697 -7.63x10-6 0.0013 1.00 -0.0029 0.0031 0.34 (Age-8)2:rs11847697 -0.0014 0.0013 0.27 -0.0041 0.0040 0.31 (Age-8)3:rs11847697 -0.0008 0.0006 0.18 -0.0023 0.0021 0.26 (Age-8)3:k2:rs11847697 0.0015 0.0013 0.24 0.0051 0.0041 0.21 (Age-8)3:k3:rs11847697 -0.0029 0.0029 0.31 -0.0071 0.0050 0.15 (Age-8)3:k1:rs11847697 0.0513 0.0709 0.47 -0.3977 0.2618 0.13

NRXN3

rs10150332 0.0001 0.0039 0.97

0.05

-0.0016 0.0084 0.85

0.01 3.3x10-3 0.11

(Age-8):rs10150332 0.0019 0.0006 0.00 -0.0032 0.0014 0.02 (Age-8)2:rs10150332 0.0006 0.0006 0.34 -0.0033 0.0018 0.07 (Age-8)3:rs10150332 1.96x10-5 0.0003 0.94 -0.0013 0.0009 0.14 (Age-8)3:k2:rs10150332 -0.0006 0.0006 0.37 0.0040 0.0018 0.03 (Age-8)3:k3:rs10150332 0.0022 0.0014 0.11 -0.0074 0.0023 0.00 (Age-8)3:k1:rs10150332 0.0177 0.0344 0.61 -0.1183 0.1037 0.25

MAP2K5, LBXCOR1 rs2241423 0.0057 0.0038 0.13 0.13 0.0113 0.0086 0.19 0.21 0.13 1.00

(Age-8):rs2241423 0.0013 0.0006 0.03 0.0022 0.0014 0.13 (Age-8)2:rs2241423 -0.0007 0.0006 0.28 -0.0014 0.0019 0.46 (Age-8)3:rs2241423 -0.0005 0.0003 0.10 -0.0008 0.0009 0.38 (Age-8)3:k2:rs2241423 0.0008 0.0006 0.22 0.0011 0.0019 0.56 (Age-8)3:k3:rs2241423 -0.0011 0.0013 0.38 -0.0005 0.0024 0.82 (Age-8)3:k1:rs2241423 -0.0168 0.0345 0.63 0.0046 0.1060 0.97

GPRC5B, IQCK

rs12444979 0.0050 0.0046 0.27

0.01

0.0151 0.0097 0.12

0.33 0.02 0.65

(Age-8):rs12444979 -0.0003 0.0007 0.70 0.0040 0.0016 0.01 (Age-8)2:rs12444979 0.0018 0.0008 0.02 -0.0005 0.0021 0.80 (Age-8)3:rs12444979 0.0010 0.0003 0.00 -0.0005 0.0011 0.61 (Age-8)3:k2:rs12444979 -0.0019 0.0008 0.01 0.0002 0.0021 0.91 (Age-8)3:k3:rs12444979 0.0009 0.0016 0.58 0.0015 0.0027 0.58 (Age-8)3:k1:rs12444979 0.0574 0.0423 0.17 -0.0929 0.1194 0.44


rs7359397 -0.0007 0.0032 0.83

0.33

0.0041 0.0071 0.57

0.67 0.55 1.00

(Age-8):rs7359397 0.0004 0.0005 0.41 0.0003 0.0012 0.83 (Age-8)2:rs7359397 -3.05x10-5 0.0005 0.95 -0.0005 0.0016 0.76 (Age-8)3:rs7359397 -0.0001 0.0002 0.67 -0.0003 0.0008 0.69 (Age-8)3:k2:rs7359397 0.0001 0.0005 0.78 0.0007 0.0016 0.66 (Age-8)3:k3:rs7359397 -0.0011 0.0011 0.33 -0.0009 0.0019 0.63 (Age-8)3:k1:rs7359397 -0.0140 0.0285 0.62 -0.1020 0.0937 0.28

FTO

rs9939609 0.0108 0.0032 0.00

3.8x10-10

0.0122 0.0073 0.09

0.26 2.4x10-9 7.7x10-8

(Age-8):rs9939609 0.0036 0.0005 0.00 0.0007 0.0012 0.57 (Age-8)2:rs9939609 0.0003 0.0005 0.63 0.0010 0.0016 0.51 (Age-8)3:rs9939609 -3.70x10-5 0.0002 0.87 0.0007 0.0008 0.40 (Age-8)3:k2:rs9939609 -0.0006 0.0005 0.29 -0.0014 0.0016 0.38 (Age-8)3:k3:rs9939609 0.0017 0.0011 0.12 0.0021 0.0020 0.30 (Age-8)3:k1:rs9939609 -0.0156 0.0289 0.59 0.0201 0.0957 0.83

MC4R

rs12970134 0.0154 0.0036 0.00

1.0x10-6

0.0018 0.0080 0.82

0.34 5.6x10-6 1.8x10-4

(Age-8):rs12970134 0.0028 0.0006 0.00 0.0032 0.0013 0.02 (Age-8)2:rs12970134 -0.0011 0.0006 0.07 0.0010 0.0018 0.57 (Age-8)3:rs12970134 -0.0007 0.0003 0.01 0.0001 0.0009 0.88 (Age-8)3:k2:rs12970134 0.0007 0.0006 0.24 -0.0013 0.0018 0.45 (Age-8)3:k3:rs12970134 0.0003 0.0013 0.84 0.0035 0.0022 0.11 (Age-8)3:k1:rs12970134 -0.0213 0.0317 0.50 0.0240 0.0984 0.81

KCTD15

rs29941 0.0028 0.0034 0.40

0.58

0.0095 0.0073 0.19

0.73 0.79 1.00

(Age-8):rs29941 -0.0003 0.0006 0.59 0.0018 0.0012 0.13 (Age-8)2:rs29941 0.0003 0.0006 0.60 -0.0009 0.0016 0.58 (Age-8)3:rs29941 0.0003 0.0002 0.25 -0.0004 0.0008 0.60 (Age-8)3:k2:rs29941 -0.0003 0.0006 0.63 0.0007 0.0016 0.68 (Age-8)3:k3:rs29941 -0.0008 0.0012 0.53 -0.0004 0.0020 0.86 (Age-8)3:k1:rs29941 0.0252 0.0307 0.41 0.0044 0.0942 0.96

TMEM160, ZC3H4

rs3810291 0.0039 0.0038 0.31

0.03

0.0137 0.0084 0.10

0.58 0.09 1.00

(Age-8):rs3810291 0.0014 0.0006 0.02 -0.0002 0.0014 0.87 (Age-8)2:rs3810291 0.0007 0.0006 0.27 0.0002 0.0018 0.89 (Age-8)3:rs3810291 0.0002 0.0003 0.56 0.0004 0.0009 0.69 (Age-8)3:k2:rs3810291 -0.0005 0.0006 0.46 -0.0004 0.0018 0.85 (Age-8)3:k3:rs3810291 0.0001 0.0013 0.92 -0.0002 0.0023 0.93 (Age-8)3:k1:rs3810291 -0.0002 0.0342 1.00 0.0015 0.1046 0.99

QPCTL, GIPR

rs2287019 0.0013 0.0040 0.74

0.05

-0.0078 0.0088 0.38

0.76 0.16 1.00

(Age-8):rs2287019 -0.0003 0.0007 0.66 0.0003 0.0015 0.84 (Age-8)2:rs2287019 -0.0007 0.0007 0.32 0.0007 0.0019 0.73 (Age-8)3:rs2287019 -0.0004 0.0003 0.17 0.0001 0.0010 0.93 (Age-8)3:k2:rs2287019 0.0007 0.0007 0.32 -0.0007 0.0019 0.71 (Age-8)3:k3:rs2287019 -0.0008 0.0014 0.57 0.0017 0.0024 0.48

(Age-8)3:k1:rs2287019 -0.1035 0.0355 0.00 0.0088 0.1109 0.94

GNPDA2

rs10938397 0.0045 0.0032 0.15

0.24

0.0055 0.0068 0.42

0.69 0.46 1.00

(Age-8):rs10938397 0.0010 0.0005 0.07 0.0018 0.0011 0.10 (Age-8)2:rs10938397 0.0005 0.0005 0.31 0.0017 0.0015 0.26 (Age-8)3:rs10938397 0.0002 0.0002 0.42 0.0007 0.0007 0.37 (Age-8)3:k2:rs10938397 -0.0007 0.0005 0.16 -0.0017 0.0015 0.27 (Age-8)3:k3:rs10938397 0.0021 0.0011 0.07 0.0021 0.0018 0.24 (Age-8)3:k1:rs10938397 -0.0277 0.0293 0.34 0.0206 0.0858 0.81

LRP1B

rs2890652 -0.0002 0.0042 0.96

0.09

-0.0079 0.0096 0.41

0.22 0.10 1.00

(Age-8):rs2890652 0.0014 0.0007 0.05 -0.0007 0.0016 0.64 (Age-8)2:rs2890652 0.0008 0.0007 0.25 0.0032 0.0021 0.13 (Age-8)3:rs2890652 0.0002 0.0003 0.55 0.0018 0.0010 0.08 (Age-8)3:k2:rs2890652 -0.0006 0.0007 0.35 -0.0030 0.0021 0.15 (Age-8)3:k3:rs2890652 0.0003 0.0015 0.86 0.0022 0.0027 0.41 (Age-8)3:k1:rs2890652 -0.0500 0.0371 0.18 0.1159 0.1200 0.33

Table 2: Results for males of the 32 individual adult BMI-associated SNPs with BMI trajectories in both cohorts and the combined meta-analysis; k1, k2 and k3 represent knot points 1, 2 and 3, respectively.

Columns: Gene; SNP; ALSPAC (Beta, SE, P, LRT P); Raine (Beta, SE, P, LRT P); Combined P-Value; Combined Bonferroni-corrected P-Value

NEGR1

rs2568958 0.0041 0.0029 0.15

0.44

0.0052 0.0068 0.44

0.49 0.55 1.00

(Age-8):rs2568958 -2.79x10-5 0.0005 0.96 0.0002 0.0012 0.87 (Age-8)2:rs2568958 -0.0002 0.0005 0.74 0.0022 0.0015 0.15 (Age-8)3:rs2568958 1.63x10-5 0.0002 0.94 0.0013 0.0008 0.08 (Age-8)3:k2:rs2568958 0.0001 0.0005 0.91 -0.0023 0.0015 0.13 (Age-8)3:k3:rs2568958 -0.0005 0.0011 0.64 0.0020 0.0019 0.29 (Age-8)3:k1:rs2568958 0.0422 0.0271 0.12 0.1354 0.0877 0.12

TNNI3K

rs1514175 0.0040 0.0028 0.16

0.07

0.0104 0.0067 0.12

0.42 0.13 1.00

(Age-8):rs1514175 0.0007 0.0005 0.19 0.0026 0.0011 0.02 (Age-8)2:rs1514175 0.0005 0.0005 0.32 0.0008 0.0015 0.60 (Age-8)3:rs1514175 0.0001 0.0002 0.65 0.0001 0.0008 0.86 (Age-8)3:k2:rs1514175 -0.0004 0.0005 0.45 -0.0009 0.0015 0.58 (Age-8)3:k3:rs1514175 0.0006 0.0011 0.62 0.0020 0.0019 0.29 (Age-8)3:k1:rs1514175 -0.0171 0.0266 0.52 -0.0499 0.0890 0.58

PTBP2

rs1555543 -0.0039 0.0029 0.18

0.58

0.0133 0.0068 0.05

0.11 0.25 1.00 (Age-8):rs1555543 0.0005 0.0005 0.32 0.0028 0.0012 0.02 (Age-8)2:rs1555543 0.0004 0.0005 0.41 0.0008 0.0015 0.61 (Age-8)3:rs1555543 1.75x10-5 0.0002 0.94 0.0004 0.0008 0.65 (Age-8)3:k2:rs1555543 -0.0003 0.0005 0.51 -0.0009 0.0015 0.55

(Age-8)3:k3:rs1555543 0.0014 0.0012 0.23 0.0013 0.0019 0.49 (Age-8)3:k1:rs1555543 -0.0072 0.0264 0.78 0.0498 0.0902 0.58

SEC16B

rs543874 0.0138 0.0035 0.00

5.8x10-7

-0.0015 0.0081 0.85

0.02 2.4x10-7 7.8x10-6

(Age-8):rs543874 0.0023 0.0006 0.00 -0.0009 0.0014 0.52 (Age-8)2:rs543874 -0.0007 0.0006 0.24 -0.0005 0.0018 0.78 (Age-8)3:rs543874 -0.0005 0.0003 0.05 0.0000 0.0009 0.96 (Age-8)3:k2:rs543874 0.0009 0.0006 0.14 0.0009 0.0019 0.65 (Age-8)3:k3:rs543874 -0.0017 0.0014 0.22 -0.0041 0.0023 0.07 (Age-8)3:k1:rs543874 -0.0215 0.0320 0.50 0.0380 0.1068 0.72

TMEM18

rs2867125 0.0113 0.0038 0.00

3.5x10-6

0.0127 0.0091 0.16

0.04 2.5x10-6 1.0x10-4

(Age-8):rs2867125 0.0037 0.0007 0.00 0.0040 0.0016 0.01 (Age-8)2:rs2867125 0.0007 0.0007 0.31 0.0013 0.0021 0.53 (Age-8)3:rs2867125 -0.0001 0.0003 0.79 0.0005 0.0011 0.62 (Age-8)3:k2:rs2867125 -0.0007 0.0007 0.28 -0.0013 0.0021 0.53 (Age-8)3:k3:rs2867125 0.0025 0.0015 0.10 0.0012 0.0026 0.63 (Age-8)3:k1:rs2867125 -0.0389 0.0342 0.26 0.1612 0.1248 0.20

RBJ, ADCY3, POMC

rs713586 0.0063 0.0028 0.02

0.22

0.0117 0.0069 0.09

0.25 0.22 1.00

(Age-8):rs713586 0.0008 0.0005 0.11 0.0013 0.0012 0.27 (Age-8)2:rs713586 0.0001 0.0005 0.81 -0.0003 0.0015 0.85 (Age-8)3:rs713586 1.12x10-6 0.0002 1.00 -0.0003 0.0008 0.69 (Age-8)3:k2:rs713586 -0.0002 0.0005 0.70 0.0004 0.0015 0.81 (Age-8)3:k3:rs713586 0.0009 0.0011 0.42 -0.0006 0.0019 0.74 (Age-8)3:k1:rs713586 0.0058 0.0257 0.82 -0.0390 0.0894 0.66

FANCL

rs887912 -0.0032 0.0031 0.30

0.21

-0.0049 0.0074 0.50

0.55 0.36 1.00 (Age-8):rs887912 0.0002 0.0005 0.66 -0.0003 0.0012 0.79 (Age-8)2:rs887912 0.0014 0.0006 0.01 0.0006 0.0017 0.70 (Age-8)3:rs887912 0.0006 0.0002 0.01 0.0002 0.0009 0.77

(Age-8)3:k2:rs887912 -0.0014 0.0005 0.01 -0.0008 0.0017 0.64 (Age-8)3:k3:rs887912 0.0019 0.0012 0.13 0.0018 0.0021 0.39 (Age-8)3:k1:rs887912 0.0412 0.0279 0.14 0.0563 0.0994 0.57

CADM2

rs13078807 0.0030 0.0035 0.40

0.98

-0.0063 0.0079 0.42

0.01 0.05 1.00

(Age-8):rs13078807 0.0003 0.0006 0.60 0.0012 0.0013 0.39 (Age-8)2:rs13078807 -0.0005 0.0006 0.46 0.0002 0.0018 0.93 (Age-8)3:rs13078807 -0.0002 0.0003 0.39 -0.0005 0.0009 0.59 (Age-8)3:k2:rs13078807 0.0004 0.0006 0.49 0.0009 0.0018 0.63 (Age-8)3:k3:rs13078807 -0.0004 0.0014 0.78 -0.0024 0.0022 0.28 (Age-8)3:k1:rs13078807 0.0000 0.0326 1.00 -0.1535 0.1026 0.13

ETV5, DGKG, SFRS10

rs7647305 0.0008 0.0035 0.82

0.48

0.0041 0.0084 0.62

0.35 0.47 1.00

(Age-8):rs7647305 0.0012 0.0006 0.05 0.0012 0.0014 0.39 (Age-8)2:rs7647305 0.0008 0.0006 0.20 0.0018 0.0019 0.35 (Age-8)3:rs7647305 0.0002 0.0003 0.51 0.0009 0.0010 0.36 (Age-8)3:k2:rs7647305 -0.0007 0.0006 0.26 -0.0020 0.0019 0.28 (Age-8)3:k3:rs7647305 0.0013 0.0014 0.38 0.0033 0.0024 0.17 (Age-8)3:k1:rs7647305 -0.0230 0.0328 0.48 0.1532 0.1043 0.14

SLC39A8

rs13107325 0.0075 0.0053 0.16

0.64

0.0047 0.0129 0.71

0.32 0.52 1.00

(Age-8):rs13107325 -0.0003 0.0009 0.79 0.0007 0.0022 0.76 (Age-8)2:rs13107325 -0.0013 0.0010 0.16 0.0016 0.0029 0.59 (Age-8)3:rs13107325 -0.0005 0.0004 0.22 0.0006 0.0014 0.69 (Age-8)3:k2:rs13107325 0.0014 0.0009 0.14 -0.0019 0.0029 0.51 (Age-8)3:k3:rs13107325 -0.0030 0.0021 0.15 0.0049 0.0036 0.18 (Age-8)3:k1:rs13107325 -0.0678 0.0480 0.16 0.0385 0.1711 0.82

FLJ35779, HMGCR rs2112347 0.0020 0.0030 0.50

0.01 -0.0009 0.0070 0.90

0.31 0.03 0.84 (Age-8):rs2112347 0.0018 0.0005 0.00 -0.0018 0.0012 0.13 (Age-8)2:rs2112347 0.0004 0.0005 0.44 -0.0001 0.0016 0.94

(Age-8)3:rs2112347 -0.0001 0.0002 0.83 0.0000 0.0008 0.99 (Age-8)3:k2:rs2112347 -0.0004 0.0005 0.47 0.0003 0.0016 0.87 (Age-8)3:k3:rs2112347 0.0014 0.0012 0.23 -0.0002 0.0020 0.90 (Age-8)3:k1:rs2112347 0.0209 0.0277 0.45 -0.0706 0.0929 0.45

ZNF608

rs4836133 0.0010 0.0029 0.74

0.06

0.0068 0.0071 0.34

0.56 0.15 1.00

(Age-8):rs4836133 -0.0013 0.0005 0.01 0.0018 0.0012 0.14 (Age-8)2:rs4836133 -0.0003 0.0005 0.52 -0.0019 0.0016 0.23 (Age-8)3:rs4836133 2.62x10-7 0.0002 1.00 -0.0013 0.0008 0.09 (Age-8)3:k2:rs4836133 0.0003 0.0005 0.52 0.0021 0.0016 0.19 (Age-8)3:k3:rs4836133 -0.0006 0.0011 0.58 -0.0015 0.0020 0.45 (Age-8)3:k1:rs4836133 0.0282 0.0264 0.28 -0.1404 0.0922 0.13

TFAP2B

rs987237 0.0066 0.0036 0.07

3.0x10-3

0.0204 0.0085 0.02

0.42 0.01 0.32

(Age-8):rs987237 0.0024 0.0006 0.00 0.0009 0.0015 0.56 (Age-8)2:rs987237 0.0012 0.0006 0.07 -0.0004 0.0019 0.82 (Age-8)3:rs987237 0.0004 0.0003 0.20 0.0000 0.0010 0.99 (Age-8)3:k2:rs987237 -0.0014 0.0006 0.04 0.0001 0.0019 0.98 (Age-8)3:k3:rs987237 0.0025 0.0015 0.08 0.0002 0.0024 0.95 (Age-8)3:k1:rs987237 0.0235 0.0329 0.48 -0.0332 0.1150 0.77

LRRN6C

rs10968576 0.0034 0.0030 0.26

0.02

0.0040 0.0073 0.58

0.22 0.02 0.75

(Age-8):rs10968576 0.0006 0.0005 0.28 -0.0005 0.0012 0.71 (Age-8)2:rs10968576 -0.0009 0.0005 0.08 0.0010 0.0017 0.56 (Age-8)3:rs10968576 -0.0004 0.0002 0.08 0.0007 0.0008 0.40 (Age-8)3:k2:rs10968576 0.0010 0.0005 0.07 -0.0012 0.0017 0.46 (Age-8)3:k3:rs10968576 -0.0024 0.0012 0.05 0.0020 0.0021 0.34 (Age-8)3:k1:rs10968576 0.0436 0.0281 0.12 0.0787 0.0953 0.41

LMX1B rs867559 0.0011 0.0036 0.76

0.07 0.0020 0.0082 0.81

0.86 0.22 1.00 (Age-8):rs867559 0.0009 0.0006 0.15 0.0005 0.0014 0.73

(Age-8)2:rs867559 0.0017 0.0006 0.01 -0.0005 0.0018 0.78 (Age-8)3:rs867559 0.0006 0.0003 0.02 -0.0003 0.0009 0.78 (Age-8)3:k2:rs867559 -0.0016 0.0006 0.01 0.0002 0.0018 0.92 (Age-8)3:k3:rs867559 0.0025 0.0014 0.08 0.0011 0.0023 0.62 (Age-8)3:k1:rs867559 -0.0228 0.0324 0.48 -0.0063 0.1036 0.95

RPL27A, TUB

rs4929949 0.0033 0.0029 0.25

0.58

0.0055 0.0068 0.42

0.98 0.89 1.00

(Age-8):rs4929949 -0.0001 0.0005 0.85 0.0008 0.0012 0.50 (Age-8)2:rs4929949 -0.0007 0.0005 0.19 -0.0007 0.0015 0.64 (Age-8)3:rs4929949 -0.0003 0.0002 0.21 -0.0005 0.0008 0.54 (Age-8)3:k2:rs4929949 0.0007 0.0005 0.18 0.0007 0.0016 0.67 (Age-8)3:k3:rs4929949 -0.0013 0.0012 0.26 -0.0002 0.0019 0.92 (Age-8)3:k1:rs4929949 0.0145 0.0266 0.59 -0.0763 0.0888 0.39

BDNF

rs6265 0.0015 0.0035 0.68

0.10

0.0121 0.0083 0.14

0.32 0.15 1.00

(Age-8):rs6265 0.0014 0.0006 0.03 0.0013 0.0014 0.35 (Age-8)2:rs6265 0.0018 0.0006 0.01 -0.0021 0.0018 0.26 (Age-8)3:rs6265 0.0007 0.0003 0.02 -0.0012 0.0009 0.19 (Age-8)3:k2:rs6265 -0.0018 0.0006 0.00 0.0024 0.0018 0.20 (Age-8)3:k3:rs6265 0.0036 0.0014 0.01 -0.0026 0.0023 0.25 (Age-8)3:k1:rs6265 0.0271 0.0333 0.42 -0.1286 0.1049 0.22

MTCH2, NDUFS3, CUGBP1

rs3817334 -0.0005 0.0029 0.86

0.47

0.0068 0.0070 0.33

0.89 0.78 1.00

(Age-8):rs3817334 0.0005 0.0005 0.37 0.0002 0.0012 0.86 (Age-8)2:rs3817334 0.0003 0.0005 0.62 0.0007 0.0016 0.63 (Age-8)3:rs3817334 -4.89x10-5 0.0002 0.83 0.0005 0.0008 0.55 (Age-8)3:k2:rs3817334 -0.0001 0.0005 0.87 -0.0008 0.0016 0.62 (Age-8)3:k3:rs3817334 0.0004 0.0012 0.75 0.0004 0.0020 0.82 (Age-8)3:k1:rs3817334 -0.0354 0.0267 0.19 0.0180 0.0917 0.84

FAIM2 rs7138803 0.0063 0.0030 0.03 0.62 0.0133 0.0067 0.05 0.44 0.63 1.00

(Age-8):rs7138803 0.0005 0.0005 0.39 0.0014 0.0011 0.21 (Age-8)2:rs7138803 -0.0005 0.0005 0.37 -0.0010 0.0015 0.52 (Age-8)3:rs7138803 -0.0001 0.0002 0.56 -0.0004 0.0008 0.61 (Age-8)3:k2:rs7138803 0.0004 0.0005 0.48 0.0009 0.0015 0.56 (Age-8)3:k3:rs7138803 -0.0008 0.0012 0.49 -0.0011 0.0019 0.55 (Age-8)3:k1:rs7138803 -0.0007 0.0272 0.98 -0.0209 0.0884 0.81

MTIF3, GTF3A

rs4771122 0.0014 0.0035 0.70

0.51

0.0293 0.0087 0.00

3.8x10-3 0.01 0.44

(Age-8):rs4771122 0.0002 0.0006 0.71 0.0032 0.0015 0.03 (Age-8)2:rs4771122 0.0011 0.0006 0.07 0.0002 0.0020 0.94 (Age-8)3:rs4771122 0.0005 0.0003 0.07 0.0002 0.0010 0.87 (Age-8)3:k2:rs4771122 -0.0012 0.0006 0.06 -0.0010 0.0020 0.61 (Age-8)3:k3:rs4771122 0.0024 0.0014 0.08 0.0026 0.0025 0.29 (Age-8)3:k1:rs4771122 0.0392 0.0321 0.22 0.0927 0.1166 0.43

PRKD1

rs11847697 0.0207 0.0067 0.00

1.5x10-3

0.0031 0.0179 0.86

0.98 0.01 0.35

(Age-8):rs11847697 0.0046 0.0012 0.00 0.0017 0.0030 0.57 (Age-8)2:rs11847697 0.0015 0.0012 0.22 0.0029 0.0040 0.46 (Age-8)3:rs11847697 0.0004 0.0005 0.45 0.0012 0.0020 0.55 (Age-8)3:k2:rs11847697 -0.0017 0.0012 0.15 -0.0028 0.0040 0.48 (Age-8)3:k3:rs11847697 0.0037 0.0028 0.18 0.0032 0.0050 0.52 (Age-8)3:k1:rs11847697 0.0160 0.0624 0.80 0.0691 0.2353 0.77

NRXN3

rs10150332 0.0040 0.0035 0.25

0.06

-0.0024 0.0080 0.76

0.01 3.1x10-3 0.10

(Age-8):rs10150332 0.0020 0.0006 0.00 -0.0016 0.0013 0.23 (Age-8)2:rs10150332 -0.0001 0.0006 0.92 -0.0001 0.0018 0.98 (Age-8)3:rs10150332 -0.0002 0.0003 0.56 0.0000 0.0009 0.96 (Age-8)3:k2:rs10150332 -0.0002 0.0006 0.78 -0.0003 0.0018 0.87 (Age-8)3:k3:rs10150332 0.0019 0.0014 0.17 0.0005 0.0022 0.84 (Age-8)3:k1:rs10150332 0.0139 0.0333 0.68 -0.0279 0.1041 0.79

MAP2K5, LBXCOR1

rs2241423 0.0039 0.0035 0.26

0.08

0.0065 0.0076 0.39

0.71 0.21 1.00

(Age-8):rs2241423 0.0016 0.0006 0.01 0.0020 0.0013 0.13 (Age-8)2:rs2241423 -0.0006 0.0006 0.30 0.0014 0.0017 0.42 (Age-8)3:rs2241423 -0.0005 0.0003 0.05 0.0005 0.0009 0.57 (Age-8)3:k2:rs2241423 0.0005 0.0006 0.42 -0.0013 0.0017 0.44 (Age-8)3:k3:rs2241423 0.0010 0.0014 0.48 0.0015 0.0021 0.47 (Age-8)3:k1:rs2241423 -0.0495 0.0314 0.12 0.0109 0.1034 0.92

GPRC5B, IQCK

rs12444979 0.0077 0.0041 0.06

0.07

0.0109 0.0098 0.27

0.51 0.15 1.00

(Age-8):rs12444979 0.0010 0.0007 0.15 0.0011 0.0017 0.52 (Age-8)2:rs12444979 0.0009 0.0007 0.23 -0.0029 0.0022 0.18 (Age-8)3:rs12444979 0.0003 0.0003 0.32 -0.0016 0.0011 0.16 (Age-8)3:k2:rs12444979 -0.0009 0.0007 0.22 0.0025 0.0022 0.25 (Age-8)3:k3:rs12444979 0.0009 0.0016 0.57 -0.0012 0.0027 0.67 (Age-8)3:k1:rs12444979 0.0151 0.0386 0.70 -0.0914 0.1273 0.47


rs7359397 0.0032 0.0029 0.27

0.23

0.0126 0.0069 0.07

0.66 0.44 1.00

(Age-8):rs7359397 0.0011 0.0005 0.03 0.0004 0.0012 0.70 (Age-8)2:rs7359397 -0.0005 0.0005 0.28 -0.0012 0.0016 0.43 (Age-8)3:rs7359397 -0.0003 0.0002 0.15 -0.0005 0.0008 0.56 (Age-8)3:k2:rs7359397 0.0005 0.0005 0.30 0.0011 0.0016 0.50 (Age-8)3:k3:rs7359397 -0.0007 0.0011 0.51 -0.0015 0.0019 0.44 (Age-8)3:k1:rs7359397 -0.0320 0.0266 0.23 0.0044 0.0907 0.96

FTO

rs9939609 0.0106 0.0029 0.00

8.4x10-12

0.0310 0.0069 0.00

1.4x10-4 4.0x10-14 1.36x10-12

(Age-8):rs9939609 0.0041 0.0005 0.00 0.0047 0.0012 0.00 (Age-8)2:rs9939609 0.0009 0.0005 0.07 -0.0031 0.0016 0.05 (Age-8)3:rs9939609 0.0002 0.0002 0.46 -0.0017 0.0008 0.03 (Age-8)3:k2:rs9939609 -0.0013 0.0005 0.01 0.0027 0.0016 0.08 (Age-8)3:k3:rs9939609 0.0044 0.0012 0.00 -0.0018 0.0020 0.37

(Age-8)3:k1:rs9939609 -0.0114 0.0278 0.68 -0.1378 0.0912 0.13

MC4R

rs12970134 0.0107 0.0032 0.00

1.6x10-4

0.0031 0.0078 0.69

0.79 1.2x10-3 0.04

(Age-8):rs12970134 0.0027 0.0006 0.00 -0.0001 0.0013 0.96 (Age-8)2:rs12970134 0.0007 0.0006 0.20 0.0011 0.0017 0.51 (Age-8)3:rs12970134 0.0001 0.0002 0.54 0.0008 0.0009 0.36 (Age-8)3:k2:rs12970134 -0.0009 0.0006 0.12 -0.0012 0.0018 0.51 (Age-8)3:k3:rs12970134 0.0026 0.0013 0.04 0.0004 0.0022 0.85 (Age-8)3:k1:rs12970134 0.0011 0.0292 0.97 0.1289 0.1028 0.21

KCTD15

rs29941 0.0044 0.0030 0.15

0.63

-0.0010 0.0071 0.89

0.26 0.46 1.00

(Age-8):rs29941 0.0005 0.0005 0.31 -0.0025 0.0012 0.04 (Age-8)2:rs29941 0.0004 0.0005 0.50 -0.0015 0.0016 0.36 (Age-8)3:rs29941 0.0002 0.0002 0.41 -0.0005 0.0008 0.56 (Age-8)3:k2:rs29941 -0.0004 0.0005 0.41 0.0017 0.0016 0.29 (Age-8)3:k3:rs29941 0.0003 0.0012 0.78 -0.0030 0.0020 0.14 (Age-8)3:k1:rs29941 -0.0006 0.0283 0.98 -0.0487 0.0913 0.59

TMEM160, ZC3H4

rs3810291 0.0007 0.0034 0.83

0.24

0.0042 0.0084 0.62

0.98 0.57 1.00

(Age-8):rs3810291 0.0015 0.0006 0.01 0.0011 0.0014 0.42 (Age-8)2:rs3810291 0.0001 0.0006 0.87 -0.0011 0.0019 0.57 (Age-8)3:rs3810291 -0.0002 0.0003 0.49 -0.0007 0.0010 0.49 (Age-8)3:k2:rs3810291 0.0000 0.0006 1.00 0.0009 0.0019 0.64 (Age-8)3:k3:rs3810291 0.0007 0.0014 0.59 0.0003 0.0024 0.91 (Age-8)3:k1:rs3810291 -0.0319 0.0320 0.32 -0.0438 0.1069 0.68

QPCTL, GIPR

rs2287019 0.0032 0.0036 0.37

0.91

0.0033 0.0088 0.71

0.54 0.84 1.00 (Age-8):rs2287019 0.0002 0.0006 0.75 0.0006 0.0015 0.69 (Age-8)2:rs2287019 0.0003 0.0006 0.63 -0.0003 0.0020 0.90 (Age-8)3:rs2287019 0.0002 0.0003 0.56 -0.0004 0.0010 0.68 (Age-8)3:k2:rs2287019 -0.0004 0.0006 0.55 0.0006 0.0020 0.75

(Age-8)3:k3:rs2287019 0.0007 0.0014 0.63 -0.0013 0.0025 0.60 (Age-8)3:k1:rs2287019 0.0297 0.0327 0.36 -0.0765 0.1176 0.52

GNPDA2

rs10938397 0.0076 0.0029 0.01

0.02

0.0075 0.0068 0.27

0.05 0.01 0.30

(Age-8):rs10938397 0.0008 0.0005 0.12 0.0034 0.0011 0.00 (Age-8)2:rs10938397 0.0002 0.0005 0.66 0.0031 0.0015 0.04 (Age-8)3:rs10938397 0.0002 0.0002 0.36 0.0011 0.0008 0.16 (Age-8)3:k2:rs10938397 -0.0004 0.0005 0.45 -0.0033 0.0015 0.03 (Age-8)3:k3:rs10938397 5.46x10-6 0.0012 1.00 0.0057 0.0019 0.00 (Age-8)3:k1:rs10938397 0.0579 0.0262 0.03 0.0408 0.0897 0.65

LRP1B

rs2890652 -0.0039 0.0038 0.30

0.26

0.0021 0.0093 0.82

0.30 0.28 1.00

(Age-8):rs2890652 0.0010 0.0007 0.13 0.0028 0.0016 0.07 (Age-8)2:rs2890652 0.0014 0.0007 0.04 0.0007 0.0021 0.73 (Age-8)3:rs2890652 0.0004 0.0003 0.17 -0.0002 0.0010 0.85 (Age-8)3:k2:rs2890652 -0.0012 0.0007 0.08 -0.0008 0.0021 0.70 (Age-8)3:k3:rs2890652 0.0023 0.0015 0.13 0.0031 0.0026 0.23 (Age-8)3:k1:rs2890652 -0.0129 0.0341 0.71 -0.1640 0.1162 0.16
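The two combined columns can be reproduced approximately from the cohort-specific LRT P-values. The sketch below assumes, based only on the tabulated numbers, that the combined P-value comes from Fisher's method applied to the two LRT P-values and that the Bonferroni correction multiplies by the 32 SNPs tested, capped at 1. Neither assumption is stated in the table itself, and the function names are illustrative only.

```r
# Fisher's method: under the null, -2 * sum(log(p)) follows a chi-squared
# distribution with 2k degrees of freedom for k independent P-values.
# (Assumption: inferred from the tabulated values, not stated in the table.)
fisher.combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Bonferroni correction for n.tests tests, capped at 1.
# (Assumption: n.tests = 32, the number of SNPs in the table.)
bonferroni <- function(p, n.tests = 32) {
  pmin(1, p * n.tests)
}

# Example: LRT P-values of 4.8x10-10 (ALSPAC) and 0.18 (Raine),
# as reported for one SNP in these appendix tables.
p.combined <- fisher.combine(c(4.8e-10, 0.18))
p.corrected <- bonferroni(p.combined)
print(c(combined = p.combined, corrected = p.corrected))
```

Because the tabulated inputs are rounded, values recomputed this way agree with the table only to about two significant figures.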

Appendix G: R Code For The Models Used In The Analysis Of Each Chapter

G.1 Chapter Two

G.1.1 Linear Mixed Model (LMM)

######################

### Load libraries ###

######################

library(nlme)

##################

### Female LMM ###

##################

female.lmm <- lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),

data=data.f, method="ML", random = ~ I(age-8) + I((age-8)^2)|ID,

na.action=na.omit, correlation=corCAR1(form = ~ 1 |ID))

summary(female.lmm)

female.lmm.genetic <- lme(log(bmi) ~ (I(age-8) + I((age-8)^2) +

I((age-8)^3)) * score, data=data.f, method="ML",

random = ~ I(age-8) + I((age-8)^2)|ID, na.action=na.omit,

correlation=corCAR1(form = ~ 1 |ID))

summary(female.lmm.genetic)

################

### Male LMM ###

################

male.lmm <- lme(log(bmi) ~ I(age-8) + I((age-8)^2) + I((age-8)^3),

data=data.m, method="ML", random = ~ I(age-8) + I((age-8)^2)|ID,

na.action=na.omit, correlation=corCAR1(form = ~ 1 |ID))

summary(male.lmm)

male.lmm.genetic <- lme(log(bmi) ~ (I(age-8) + I((age-8)^2) +

I((age-8)^3)) * score, data=data.m, method="ML",

random = ~ I(age-8) + I((age-8)^2)|ID, na.action=na.omit,

correlation=corCAR1(form = ~ 1 |ID))

summary(male.lmm.genetic)
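Because the models with and without the genetic term are both fitted by maximum likelihood (method="ML"), they can be compared with a likelihood ratio test. The sketch below illustrates this on simulated data with a simplified random-intercept model. The data frame sim, its column names and all simulated effect sizes are hypothetical, and this is not necessarily how the LRT P-values reported in the thesis tables were obtained.

```r
library(nlme)  # recommended package shipped with standard R installations

set.seed(1)

# Hypothetical simulated data: 100 individuals, 6 visits each
n.id <- 100
ages <- c(2, 5, 8, 10, 14, 17)
sim <- expand.grid(age = ages, ID = seq_len(n.id))
score <- rbinom(n.id, size = 2, prob = 0.3)   # hypothetical genetic score
b0 <- rnorm(n.id, sd = 0.10)                  # random intercepts
sim$score <- score[sim$ID]
sim$log.bmi <- 2.8 + 0.02 * (sim$age - 8) + 0.002 * (sim$age - 8)^2 +
  0.05 * sim$score + b0[sim$ID] + rnorm(nrow(sim), sd = 0.05)

# Both models are fitted by ML so that their log-likelihoods are
# comparable even though the fixed effects differ.
null.fit <- lme(log.bmi ~ I(age - 8) + I((age - 8)^2),
                data = sim, method = "ML", random = ~ 1 | ID)
gen.fit <- lme(log.bmi ~ (I(age - 8) + I((age - 8)^2)) * score,
               data = sim, method = "ML", random = ~ 1 | ID)

# Likelihood ratio test of the score main effect and its
# interactions with the age terms (3 extra parameters)
lrt <- anova(null.fit, gen.fit)
lrt.p <- lrt[["p-value"]][2]
print(lrt)
```

The same call, anova(male.lmm, male.lmm.genetic), would apply to the fitted objects in the listing above, provided both were fitted to the same rows of data.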

G.1.2 Skew-t Linear Mixed Model (STLMM)

#################

### Load code ###

#################

library(lattice)  # provides histogram(), used below to count observations per individual

source("http://www.ime.unicamp.br/~hlachos/RprogramSNI.r")

####################

### Female STLMM ###

####################

attach(data.f)

# Define lme model to get starting values for skew-t

female.lme.skew <- lme(bmi ~ I(age-8) + I((age-8)^2) + I((age-8)^3),

data=data.f, method="REML",

random = ~ 1 + I(age-8) + I((age-8)^2) |ID,

na.action=na.omit, correlation=corCAR1())

# Define data vectors and matrices

y <- as.vector(bmi)

X.mat <- as.matrix(cbind(1, data.f$age-8, (data.f$age-8)^2,

(data.f$age-8)^3 ) )

Z.mat <- as.matrix(cbind(1, (data.f$age-8)))

X.mat.complete <- X.mat[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi),]

Z.mat.complete <- Z.mat[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi),]

BMI.complete <- bmi[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi)]

ID.complete <- ID[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi)]

# Calculate number of observations per group

histo <- histogram(~BMI.complete | ID.complete)

nj <- as.vector(summary(histo)[[2]])

# Starting values for EM algorithm

beta1 <- fixed.effects(female.lme.skew)

sigmae <- female.lme.skew$sigma

D1 <- diag(1,dim(Z.mat)[2])

D1[1,1] <- as.numeric(VarCorr(female.lme.skew)[,"Variance"][1])

D1[2,2] <- as.numeric(VarCorr(female.lme.skew)[,"Variance"][2])

D1[1,2] <- as.numeric(VarCorr(female.lme.skew)[,"Corr"][2])

D1[2,1] <- as.numeric(VarCorr(female.lme.skew)[,"Corr"][2])

lambda <- rep(1,dim(Z.mat)[2])

nu <- 10

# Skew-t mixed Model

female.skewt <- EM.Skew(nj=nj, y=BMI.complete, x=X.mat.complete,

z=Z.mat.complete, beta1=beta1, sigmae=sigmae, D1=D1,

lambda=lambda, nu=nu, Ind=2, lb=-Inf, lu=Inf, precisão=0.001,

loglik=T, informa=T, calcbi=T)

bi.skewt.f <-read.table("bi.txt")

# Extract and derive parameters for use

bi.mat.st.f <- NULL

for (i in 1:nrow(bi.skewt.f)) {

bi.mat.st.f <- rbind(bi.mat.st.f,cbind(rep(bi.skewt.f[i,1],nj[i]),

rep(bi.skewt.f[i,2],nj[i])))

}

fixed.effects.st.f <- female.skewt$theta[1:4]

sigmae.st.f <- female.skewt$theta[5]

variance.intercept.st.f <- female.skewt$theta[6]

variance.slope.st.f <- female.skewt$theta[8]

corr.slope.intercept.st.f <- female.skewt$theta[7]

skewness.intercept.st.f <- female.skewt$theta[9]

skewness.slope.st.f <- female.skewt$theta[10]

kurtosis.st.f <- female.skewt$theta[11]

marginal.res.st.f <- BMI.complete - X.mat.complete %*%

(matrix(fixed.effects.st.f, nrow=length(fixed.effects.st.f)))

conditional.res.st.f <- BMI.complete - (X.mat.complete %*%

(matrix(fixed.effects.st.f, nrow=length(fixed.effects.st.f))))-

apply(Z.mat.complete*bi.mat.st.f, 1, sum)

bmi.fitted.skewt.f <- (X.mat.complete %*%

(matrix(fixed.effects.st.f, nrow=length(fixed.effects.st.f)))) +

apply(Z.mat.complete*bi.mat.st.f, 1, sum)

pt(abs(female.skewt$theta[1:10]/female.skewt$desvios[1:10]),

df=4000,lower.tail=F)*2

# Skew-t mixed model including genetics

female.lme.skew.score <- lme(bmi ~ (I(age-8) + I((age-8)^2) +

I((age-8)^3)) * score, data=data.f, method="REML",

random = ~ 1 + I(age-8)+ I((age-8)^2) |ID, na.action=na.omit,

correlation=corCAR1())

y <- as.vector(bmi)

X.mat <- as.matrix(cbind(1, data.f$age-8, (data.f$age-8)^2,

(data.f$age-8)^3, data.f$score, (data.f$age-8) * score,

(data.f$age-8)^2 * score, (data.f$age-8)^3 * score))

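Z.mat <- as.matrix(cbind(1, (data.f$age-8)))

X.mat.complete <- X.mat[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi),]

Z.mat.complete <- Z.mat[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi),]

BMI.complete <- bmi[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi)]

ID.complete <- ID[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi)]

histo <- histogram(~BMI.complete | ID.complete)

nj <- as.vector(summary(histo)[[2]])
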
beta1 <- fixed.effects(female.lme.skew.score)

sigmae <- female.lme.skew.score$sigma

D1 <- diag(1,dim(Z.mat)[2])

D1[1,1] <- 5; D1[2,2] <- 0.07; D1[1,2]<-0.2; D1[2,1]<-0.2

lambda <- rep(1,dim(Z.mat)[2])

nu<-10

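female.skewt.score <- EM.Skew(nj=nj, y=BMI.complete, x=X.mat.complete,

z=Z.mat.complete, beta1=beta1, sigmae=sigmae, D1=D1,

lambda=lambda, nu=nu, Ind=2, lb=-Inf, lu=Inf, precisão=0.001,

loglik=T, informa=T, calcbi=T)
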
bi.skewt.f.score <-read.table("bi.txt")

bi.mat.st.f.score <- NULL

for (i in 1:nrow(bi.skewt.f.score)) {

bi.mat.st.f.score <- rbind(bi.mat.st.f.score,

cbind(rep(bi.skewt.f.score[i,1],nj[i]),

rep(bi.skewt.f.score[i,2],nj[i])))

}

fixed.effects.st.f.score <- female.skewt.score$theta[1:8]

sigmae.st.f.score <- female.skewt.score$theta[9]

variance.intercept.st.f.score <- female.skewt.score$theta[10]

variance.slope.st.f.score <- female.skewt.score$theta[12]

corr.slope.intercept.st.f.score <- female.skewt.score$theta[11]

skewness.intercept.st.f.score <- female.skewt.score$theta[13]

skewness.slope.st.f.score <- female.skewt.score$theta[14]

kurtosis.st.f.score <- female.skewt.score$theta[15]

marginal.res.st.f.score <- BMI.complete - X.mat.complete %*%

(matrix(fixed.effects.st.f.score,

nrow=length(fixed.effects.st.f.score)))

conditional.res.st.f.score <- BMI.complete - (X.mat.complete %*%

(matrix(fixed.effects.st.f.score,

nrow=length(fixed.effects.st.f.score)))) -

apply(Z.mat.complete*bi.mat.st.f.score,1,sum)

bmi.fitted.skewt.f.score <- (X.mat.complete %*%

(matrix(fixed.effects.st.f.score,

nrow=length(fixed.effects.st.f.score)))) +

apply(Z.mat.complete*bi.mat.st.f.score,1,sum)

detach(data.f)

##################

### Male STLMM ###

##################

attach(data.m)

# Define lme model to get starting values for skew-t

male.lme.skew <- lme(bmi ~ I(age-8) + I((age-8)^2) + I((age-8)^3),

method = "REML", random = ~ 1 + I(age-8)+ I((age-8)^2) | ID,

data=data.m, na.action=na.omit, correlation=corCAR1(form=~1|ID))

summary(male.lme.skew)

# Define data vectors and matrices

y <- as.vector(bmi)

X.mat <- as.matrix(cbind(1, data.m$age-8, (data.m$age-8)^2,

(data.m$age-8)^3))

Z.mat <- as.matrix(cbind(1, (data.m$age-8)))

X.mat.complete <- X.mat[apply(is.na(X.mat),1,sum)==0 & !is.na(bmi),]




# Calculate number of observations per group


nj <- as.vector(summary(histo)[[2]])

# Starting values for EM algorithm

beta1 <- fixed.effects(male.lme.skew)

sigmae <- male.lme.skew$sigma


D1[1,1]<-as.numeric(VarCorr(male.lme.skew)[,"Variance"][1])

D1[2,2]<-as.numeric(VarCorr(male.lme.skew)[,"Variance"][2])

D1[1,2]<-as.numeric(VarCorr(male.lme.skew)[,"Corr"][2])

D1[2,1]<-as.numeric(VarCorr(male.lme.skew)[,"Corr"][2])

lambda <- rep(1,dim(Z.mat)[2])

nu <- 10

# Skew-t mixed model

male.skewt<-EM.Skew(nj=nj, y=BMI.complete, x=X.mat.complete,




bi.skewt.m <- read.table("bi.txt")

# Extract and derive parameters for use

bi.mat.st.m <- NULL

for (i in 1:nrow(bi.skewt.m)) {

bi.mat.st.m <- rbind(bi.mat.st.m,cbind(rep(bi.skewt.m[i,1],nj[i]),

rep(bi.skewt.m[i,2],nj[i])))

}

fixed.effects.st.m <- male.skewt$theta[1:4]

sigmae.st.m <- male.skewt$theta[5]

variance.intercept.st.m <- male.skewt$theta[6]

variance.slope.st.m <- male.skewt$theta[8]

corr.slope.intercept.st.m <- male.skewt$theta[7]

skewness.intercept.st.m <- male.skewt$theta[9]

skewness.slope.st.m <- male.skewt$theta[10]

kurtosis.st.m <- male.skewt$theta[11]

marginal.res.st.m <- BMI.complete-X.mat.complete %*%

(matrix(fixed.effects.st.m, nrow=length(fixed.effects.st.m)))

conditional.res.st.m <- BMI.complete-(X.mat.complete %*%

(matrix(fixed.effects.st.m, nrow=length(fixed.effects.st.m))))-

apply(Z.mat.complete*bi.mat.st.m, 1, sum)

bmi.fitted.skewt.m <-(X.mat.complete %*%

(matrix(fixed.effects.st.m, nrow=length(fixed.effects.st.m)))) +

apply(Z.mat.complete*bi.mat.st.m, 1, sum)

pt(abs(male.skewt$theta[1:10]/male.skewt$desvios[1:10]), df=4000,

lower.tail=F)*2

# Skew-t mixed model including genetics

male.lme.skew.score <- lme(bmi ~ (I(age-8) + I((age-8)^2) +

I((age-8)^3)) * score, data=data.m, method="REML",

random = ~ 1 + I(age-8)+ I((age-8)^2) |ID, na.action=na.omit,

correlation=corCAR1())


y <- as.vector(bmi)

X.mat <- as.matrix(cbind(1, data.m$age-8, (data.m$age-8)^2,

(data.m$age-8)^3, data.m$score, (data.m$age-8) * data.m$score,

(data.m$age-8)^2 * data.m$score, (data.m$age-8)^3 *

data.m$score) )

Z.mat <- as.matrix(cbind(1, (data.m$age-8)))







beta1 <- fixed.effects(male.lme.skew.score)

sigmae <- male.lme.skew.score$sigma

D1<-diag(1,dim(Z.mat)[2])

D1[1,1] <- 5; D1[2,2] <- 0.07; D1[1,2]<-0.2; D1[2,1]<-0.2

lambda <- rep(1,dim(Z.mat)[2])

nu <- 10

male.skewt.score <- EM.Skew(nj=nj, y=BMI.complete, x=X.mat.complete,




bi.skewt.m.score <-read.table("bi.txt")

bi.mat.st.m.score <- NULL

for (i in 1:nrow(bi.skewt.m.score)) {

bi.mat.st.m.score <- rbind(bi.mat.st.m.score,

cbind(rep(bi.skewt.m.score[i,1],nj[i]),

rep(bi.skewt.m.score[i,2],nj[i])))

}

fixed.effects.st.m.score <- male.skewt.score$theta[1:8]

sigmae.st.m.score <- male.skewt.score$theta[9]

variance.intercept.st.m.score <- male.skewt.score$theta[10]

variance.slope.st.m.score <- male.skewt.score$theta[12]

corr.slope.intercept.st.m.score <- male.skewt.score$theta[11]

skewness.intercept.st.m.score <- male.skewt.score$theta[13]

skewness.slope.st.m.score <- male.skewt.score$theta[14]

kurtosis.st.m.score <- male.skewt.score$theta[15]

marginal.res.st.m.score <- BMI.complete - X.mat.complete %*%

(matrix(fixed.effects.st.m.score,

nrow=length(fixed.effects.st.m.score)))

conditional.res.st.m.score <- BMI.complete - (X.mat.complete %*%

(matrix(fixed.effects.st.m.score,

nrow=length(fixed.effects.st.m.score))))-

apply(Z.mat.complete*bi.mat.st.m.score,1,sum)

bmi.fitted.skewt.m.score <- (X.mat.complete %*%

(matrix(fixed.effects.st.m.score,

nrow=length(fixed.effects.st.m.score)))) +

apply(Z.mat.complete*bi.mat.st.m.score,1,sum)

detach(data.m)

G.1.3 Semi-Parametric Linear Mixed Model (SPLMM)

######################


######################

library(spida)

####################

### Female SPLMM ###

####################

sp.f <- function(x) gsp( x, knots = c((2-8), (8-8),(12-8)),

degree = c(3,3,3,3), smooth = c(2,2,2))

female.spline <- lme(log(bmi) ~ sp.f(age-8), data=data.f,

random=~sp.f(age-8)[,1:2]|ID, na.action=na.omit, method="ML")

summary(female.spline)

female.spline.score <- lme(log(bmi) ~ sp.f(age-8)*score, data=data.f,

random=~sp.f(age-8)[,1:2]|ID, na.action=na.omit, method="ML")

summary(female.spline.score)

##################

### Male SPLMM ###

##################

sp.m <- function(x) gsp( x, knots = c((2-8),(8-8),(12-8)),

degree = c(3,3,3,3), smooth = c(2,2,2))

male.spline <- lme(log(bmi) ~ sp.m(age-8), data=data.m,

random=~sp.m(age-8)[,1:2]|ID, na.action=na.omit, method="ML")

summary(male.spline)

male.spline.score <- lme(log(bmi) ~ sp.m(age-8)*score, data=data.m,

random=~sp.m(age-8)[,1:2]|ID, na.action=na.omit, method="ML")

summary(male.spline.score)
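
The gsp() calls above define a cubic spline in age, centred at 8 years, with knots at 2, 8 and 12 years and a continuous second derivative at each knot. As a rough base-R stand-in (not the spida implementation), the sketch below builds a truncated power basis with the same knots. The helper name tp_basis, the column names and the factorial scaling of the columns are assumptions made for illustration only.

```r
# Hypothetical helper: cubic truncated power basis that is continuous up to
# the second derivative at every knot (an illustrative stand-in for gsp()).
tp_basis <- function(x, knots) {
  # Global polynomial part: linear, quadratic and cubic terms
  poly_part <- cbind(x, x^2 / 2, x^3 / 6)
  # One truncated cubic per knot; each is zero to the left of its knot
  trunc_part <- sapply(knots, function(k) pmax(x - k, 0)^3 / 6)
  trunc_part <- matrix(trunc_part, nrow = length(x))
  basis <- cbind(poly_part, trunc_part)
  colnames(basis) <- c("D1", "D2", "D3", paste0("K", seq_along(knots)))
  basis
}

# Example ages in years, centred at 8 as in the models above
age   <- c(1, 3, 8, 10, 14)
basis <- tp_basis(age - 8, knots = c(2 - 8, 8 - 8, 12 - 8))
print(dim(basis))
```

Each row of basis holds the three polynomial terms followed by one term per knot, so a model using it has six spline coefficients in addition to the intercept.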

G.1.4 Non-linear mixed model (NLMM)

## sitarlib (the R library for SITAR) was provided by the author,

## Professor Tim Cole (Cole, T. J., M. D. Donaldson, et al. (2010).

## "SITAR--a useful instrument for growth curve analysis." Int J

## Epidemiol 39(6): 1558-1566.)

##############################

### Set control parameters ###

##############################

con.nlme <- nlmeControl(maxIter=10, pnlsMaxIter=10, msMaxIter=10,

returnObject=TRUE, msVerbose=FALSE)

###################

### Female NLMM ###

###################

female.sitar <- sitar(x=log(age), y=log(bmi), id=ID, data=data.f,

nk=3, random="a+b+c", a.formula=~1, b.formula=~-1, c.formula=~1,

d.formula=~-1, correlation=corCAR1(), control=con.nlme)

female.sitar.para <- merge(female.sitar.para, gen[,c("SUBJECTID",

"score")], by.x="ID", by.y="SUBJECTID", all.x=T)

female.sitar.score.size <- lm(BMI.size ~ score + BMI.tempo +

BMI.velocity, data=female.sitar.para)

summary(female.sitar.score.size)

female.sitar.score.temp <- lm(BMI.tempo ~ score + BMI.size +

BMI.velocity, data=female.sitar.para)

summary(female.sitar.score.temp)

female.sitar.score.vel <- lm(BMI.velocity ~ score + BMI.size +

BMI.tempo, data=female.sitar.para)

summary(female.sitar.score.vel)

#################

### Male NLMM ###

#################

male.sitar <- sitar(x=log(age), y=log(bmi), id=ID, data=data.m, nk=4,

random="a+b+c", a.formula=~1, b.formula=~-1, c.formula=~1,

d.formula=~-1, correlation=corCAR1(), control=con.nlme)

male.sitar.para <- merge(male.sitar.para, gen[,c("SUBJECTID",

"score")], by.x="ID", by.y="SUBJECTID", all.x=T)

male.sitar.score.size <- lm(BMI.size ~ score + BMI.tempo +

BMI.velocity, data=male.sitar.para)

summary(male.sitar.score.size)

male.sitar.score.temp <- lm(BMI.tempo ~ score + BMI.size +

BMI.velocity, data=male.sitar.para)

summary(male.sitar.score.temp)

male.sitar.score.vel <- lm(BMI.velocity ~ score + BMI.size +

BMI.tempo, data=male.sitar.para)

summary(male.sitar.score.vel)

G.2 Chapter Three

########################################

### SPLMM Model – including genetics ###

########################################

nfitF <- lme((bmi) ~ sp(age8)*snp, data=ndd,

random=~sp(age8)[,1:2]|ID, na.action=na.omit, method="ML",


####################################

### SPLMM Model without genetics ###

####################################

nfitF_base <- tryCatch(lme((bmi) ~ sp(age8), data=ndd,

random=~sp(age8)[,1:2]|ID, na.action=na.omit, method="ML",


##############################

### Extract random effects ###

##############################

ranefs <- ranef(nfitF_base)

ranefs <- cbind(rownames(ranefs), ranefs)

names(ranefs) <- c("ID", "Intercept", "Slope", "Slope2")

ranefs <- merge(ranefs, SNP, by="ID")

#############################################

### Model genetics against random effects ###

#############################################

ranef_i <- lm(Intercept ~ snp, data=ranefs)

ranef_s <- lm(Slope ~ snp, data=ranefs)
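
The two-step approach above (fit the model without genetics, extract the random effects, then regress them on the SNP) can be tried on public data. The sketch below is only an illustration: it uses the Orthodont data shipped with nlme instead of the study data, a simulated genotype, and a simple linear age term with the default REML fit rather than the spline and ML fit used above. All object names ending in _demo are hypothetical.

```r
library(nlme)

set.seed(42)

# Step 1: base longitudinal model without genetics (random intercept and slope)
fit_demo <- lme(distance ~ I(age - 11), data = Orthodont,
                random = ~ I(age - 11) | Subject)

# Step 2: extract the predicted random effects, one row per individual
re_demo <- ranef(fit_demo)
ranefs_demo <- data.frame(ID = rownames(re_demo),
                          Intercept = re_demo[, 1],
                          Slope = re_demo[, 2])

# Simulated additive genotype (0/1/2) with an assumed allele frequency of 0.3
snp_demo <- data.frame(ID = levels(Orthodont$Subject),
                       snp = rbinom(nlevels(Orthodont$Subject), size = 2, prob = 0.3))
ranefs_demo <- merge(ranefs_demo, snp_demo, by = "ID")

# Step 3: regress each random effect on the genotype
ranef_i_demo <- lm(Intercept ~ snp, data = ranefs_demo)
ranef_s_demo <- lm(Slope ~ snp, data = ranefs_demo)
print(coef(summary(ranef_s_demo)))
```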

G.3 Chapter Four

## Code from the simulations assuming normally distributed random

## errors and residuals with constant variance under the equal

## unbalanced sampling design

#####################################

### Model using FTO SNP in ALSPAC ###

#####################################

rfitF <- lme((bmi) ~ (age8 + I((age8)^2) + I((age8)^3)) *

rs1121980_add + source.r, na.action=na.exclude, data=data,

random = ~ (age8) + I((age8)^2)|cid_724a, method="ML",

correlation=corCAR1())


summary(rfitF)

############################

### Base model in ALSPAC ###

############################

rfitF_base <- lme((bmi) ~ (age8 + I((age8)^2) + I((age8)^3)) +

source.r, data=data, method="ML", na.action=na.exclude,

random = ~ (age8) + I((age8)^2)|cid_724a, correlation=corCAR1())

summary(rfitF_base)

###############################################################

### Extract parameters from model fit to use in simulations ###

###############################################################

Phi = 0.3935524

bta <- as.numeric(fixef(rfitF))

varRan <- matrix(as.numeric(getVarCov(rfitF)),ncol=3)

varE <-as.numeric(VarCorr(rfitF)[4,1])

bta[5] <- 0

bta[7] <- 0

bta[8] <- 0

bta[9] <- 0

jr.begin=1

jr.finis=1000

maf=0.3

###############################################

### Creates design matrix for fixed effects ###

###############################################

N <- 1000

times <- 1:15

prob <- maf

ID <- as.factor(sort(rep(1:N, length(times))))

prob_s <- as.vector(c(0.4, 0.2, 0.4, 0.1, 0.6, 0.99, 0.1, 0.0, 0.0,

0.1, 0.0, 0.0, 0.3, 0.0, 0.0))

yr <- 1:(N*length(times))

ar <- sort(c((0:(N-1))*length(times)+5, (0:(N-1))*length(times)+6,

(0:(N-1))*length(times)+7))

#####################

### Generate data ###

#####################

beta_s <- data.frame()

se_s <- data.frame()

rse_s <- data.frame()

pval_s <- data.frame()

rpval_s <- data.frame()

lrt_s <- data.frame()

wald_s <- data.frame()

sample_s <- data.frame()

for(i in jr.begin:jr.finis){

repeat{

age <- as.data.frame(rep(times, N))

for(k in 1:nrow(age)){

age$age[k] <- runif(1, min=age[k,1]-0.5, max=age[k,1]+0.5)

age$age8[k] <- age$age[k]-8

age$source.r[k] <- rbinom(1, size=1,

prob=prob_s[round(age$age[k],0)])

}

snp <- sort(rep(rbinom(N, size=2, prob=prob), length(times)))

mdlMtx <- as.data.frame(cbind(1, age$age8, age$age8^2,

age$age8^3, snp, age$source.r, age$age8*snp,

(age$age8^2)*snp, (age$age8^3)*snp, ID))

names(mdlMtx) <- c("Intercept", "age8", "age8.2", "age8.3",

"rs1121980_add", "source.r", "age8.rs1121980_add",

"age8.2.rs1121980_add", "age8.3.rs1121980_add", "ID")

# delete these samples from between 5-7yrs (proportion = 0.4*(N*3))

u1 <- sample(ar, 0.4*(N*3), replace=F)

# delete this proportion of samples outside 5-7yrs

u2 <- sample(yr[!yr%in%ar], 0.4*(N*12), replace=F)

u <- c(u1,u2)

mdlMtx <- mdlMtx[-u,]

Zmtx <- mdlMtx[,1:3]

Zmtx$ID <- mdlMtx$ID

id <- data.frame(id=unique(mdlMtx$ID))

mdlMtx$age8 <- mdlMtx[,2]

# Normally distributed random effects and error

for (j in 1:nrow(id)) { #random effect, random error and correlation

Zj <- as.matrix(subset(Zmtx,Zmtx$ID == id$id[j])[, 1:3])

n <- nrow(Zj)

if(n==1){

Yj <- Zj%*% t(rmvnorm(1,sigma=varRan))+rnorm(n,mean=0,sd=sqrt(varE))

}

else{

Yj <- Zj%*% t(rmvnorm(1,sigma=varRan)) + t(rmvnorm(1,

sigma=CARmatrix(rho=Phi, Zj[,2])*varE))

}

if (j == 1 ){Y = Yj} else {Y =c(Y,Yj)}

}

# Add back in the population average

Y <- Y + as.matrix(mdlMtx[,1:9]) %*% as.matrix(bta)

# Create a data frame

ndd <- data.frame(mdlMtx$ID)

ndd$bmi <- Y

dx <-mdlMtx[,c("age8", "rs1121980_add", "source.r")]

ndd <-cbind(ndd,dx)

names(ndd)[1] <- "cid_724a"

# Update model with simulated data

nfitF <- tryCatch(update(rfitF,data=ndd), error=function(e) NA)

nfitF_base <- tryCatch(update(rfitF_base,data=ndd), error=function(e)

NA)

if(inherits(nfitF, "lme") && inherits(nfitF_base, "lme")){break}

}

# Generate robust standard errors

G <- matrix(as.numeric(getVarCov(nfitF)),ncol=3)

E <- cbind(mdlMtx$ID, nfitF$residuals[,1])

S <- matrix()

for(j in 1:nrow(id)){

Zj <- as.matrix(subset(Zmtx,Zmtx$ID == id$id[j])[, 1:3])


Xj <- as.matrix(subset(mdlMtx,mdlMtx$ID == id$id[j])[, 1:9])

Rj <- as.numeric(VarCorr(nfitF)[4,1]) * diag(nrow(Zj))

Sj <- t(Xj) %*% ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Xj

if (j == 1 ){S = Sj} else {S = S + Sj}

}

M <- matrix()

for(j in 1:nrow(id)){

Zj <- as.matrix(subset(Zmtx,Zmtx$ID == id$id[j])[, 1:3])



Xj <- as.matrix(subset(mdlMtx,mdlMtx$ID == id$id[j])[, 1:9])

Ej <- as.matrix(subset(E,E[,1] == id$id[j])[,2])

Rj <- as.numeric(VarCorr(nfitF)[4,1]) * diag(nrow(Zj))

Mj <- t(Xj) %*% ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Ej %*% t(Ej) %*%

ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Xj

if (j == 1 ){M = Mj} else {M = M + Mj}

}

rVarCov <- ginv(S) %*% M %*% ginv(S)

# Create output dataframes

beta_s <- rbind(beta_s, c(i, as.numeric(fixef(nfitF))))

se_s <- rbind(se_s, c(i, as.numeric(summary(nfitF)$tTable[,2])))

rse_s <- rbind(rse_s, c(i, sqrt(abs(diag(rVarCov)))))

pval_s <- rbind(pval_s, c(i, as.numeric(summary(nfitF)$tTable[,5])))

rpval_s <- rbind(rpval_s, c(i,

pt(abs(summary(nfitF)$tTable[,1]/sqrt(abs(diag(rVarCov)))),

df=4000,lower.tail=F)*2))

lrt_s <- rbind(lrt_s, c(i, as.numeric(anova(nfitF_base, nfitF)[2,9])))

wald_s <- rbind(wald_s, c(i,

wald(nfitF, 'rs1121980_add')[[1]]$anova[[4]][1,1]))

sample_s <- rbind(sample_s, c(i, length(unique(ndd$cid_724a)),

nrow(ndd), mean(table(ndd$cid_724a)), sd(table(ndd$cid_724a))))

}

# Add names to output dataframes

names(beta_s) <- c("beta_i", "beta_intercept", "beta_age",

"beta_age2", "beta_age3", "beta_snp", "beta_source.r",

"beta_snp_age", "beta_snp_age2", "beta_snp_age3")

names(se_s) <- c("se_i", "se_intercept", "se_age", "se_age2",

"se_age3", "se_snp", "se_source.r", "se_snp_age", "se_snp_age2",

"se_snp_age3")

names(rse_s) <- c("rse_i", "rse_intercept", "rse_age", "rse_age2",

"rse_age3", "rse_snp", "rse_source.r", "rse_snp_age",

"rse_snp_age2", "rse_snp_age3")

names(pval_s) <- c("pval_i", "pval_intercept", "pval_age",

"pval_age2", "pval_age3", "pval_snp", "pval_source.r",

"pval_snp_age", "pval_snp_age2", "pval_snp_age3")

names(rpval_s) <- c("rpval_i", "rpval_intercept", "rpval_age",

"rpval_age2", "rpval_age3", "rpval_snp", "rpval_source.r",

"rpval_snp_age", "rpval_snp_age2", "rpval_snp_age3")

names(lrt_s) <- c("lrt_i", "lrt")

names(wald_s) <- c("wald_i", "wald")

names(sample_s) <- c("sample_i", "N_ID", "N_obs", "Mean_obs_ID",

"SD_obs_ID")
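
The robust standard errors in the simulation loop above come from a sandwich estimator: a sum S of X'V^-1 X over individuals, a sum M of X'V^-1 e e'V^-1 X, and the product S^-1 M S^-1. The base-R sketch below reproduces that arithmetic on a small simulated data set. It is a minimal illustration under stated assumptions: the values of G and sigma2 are made up rather than taken from a fitted model, the within-person error covariance is taken as sigma2 times the identity (as in the Rj term above), and solve() is used in place of ginv().

```r
set.seed(1)

# Simulated toy data: 50 individuals with 4 observations each
n_id  <- 50
n_obs <- 4
id    <- rep(seq_len(n_id), each = n_obs)
age   <- rep(seq_len(n_obs) - 1, times = n_id)
X <- cbind(1, age)                    # fixed-effects design matrix
Z <- cbind(1, age)                    # random intercept and slope design matrix
G <- matrix(c(1, 0.1, 0.1, 0.2), 2)   # assumed random-effects covariance
sigma2 <- 0.5                         # assumed residual variance

b <- matrix(rnorm(n_id * 2), n_id, 2) %*% chol(G)   # random effects ~ N(0, G)
y <- as.vector(X %*% c(10, 0.5)) + rowSums(Z * b[id, ]) +
  rnorm(length(id), sd = sqrt(sigma2))

# "Bread": S = sum_j Xj' Vj^-1 Xj, where Vj = Zj G Zj' + sigma2 * I
p    <- ncol(X)
S    <- matrix(0, p, p)
XtVy <- matrix(0, p, 1)
Vinv <- vector("list", n_id)
for (j in seq_len(n_id)) {
  rows <- which(id == j)
  Xj <- X[rows, , drop = FALSE]
  Zj <- Z[rows, , drop = FALSE]
  Vinv[[j]] <- solve(Zj %*% G %*% t(Zj) + sigma2 * diag(length(rows)))
  S    <- S + t(Xj) %*% Vinv[[j]] %*% Xj
  XtVy <- XtVy + t(Xj) %*% Vinv[[j]] %*% y[rows]
}
beta_hat <- solve(S, XtVy)            # generalised least squares estimate
res <- y - as.vector(X %*% beta_hat)  # marginal residuals

# "Meat": M = sum_j Xj' Vj^-1 ej ej' Vj^-1 Xj
M <- matrix(0, p, p)
for (j in seq_len(n_id)) {
  rows <- which(id == j)
  u <- t(X[rows, , drop = FALSE]) %*% Vinv[[j]] %*% res[rows]
  M <- M + u %*% t(u)
}

rVarCov   <- solve(S) %*% M %*% solve(S)
robust_se <- sqrt(diag(rVarCov))
model_se  <- sqrt(diag(solve(S)))
print(cbind(model_se, robust_se))
```

When the assumed covariance structure matches the data-generating model, as it does here, the model-based and robust standard errors should be similar.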

G.4 Chapter Five

######################


######################

library(lattice)

library(nlme)

library(foreign)

source("wald.R")

library(MASS)

######################################

### Read in command line arguments ###

######################################

options(scipen=30)

args = commandArgs(TRUE)

print(args)

jr.begin=as.numeric(args[1])

jr.finis=as.numeric(args[2])

chr=as.numeric(args[3])

print(jr.begin)

print(jr.finis)

print(chr)

rm(args)

##########################

### Read in data files ###

##########################

data <- read.csv("RAINE_cleaned_GWAS.csv", na.strings=c("", " "))

inf.fname = paste("Chr",chr,"/step2.mlinfo", sep="")

info = read.table(file=inf.fname, header=T)

myinfo = info[(jr.begin-2):(jr.finis-2),]

rm(info)

dose.fpath = paste("Chr",chr,sep="")

setwd(dose.fpath)

unix.cmd = paste( "cut -d' ' -f 1,", jr.begin, "-", jr.finis,

" step2.mldose" ," ", sep="")

dose=read.table(pipe(unix.cmd))

#####################################

### Create ID column in dose file ###

#####################################

dose <- as.data.frame(dose)

dose$ID = substr(dose[,1], 1, (nchar(as.character(dose[,1]))/2)-1)

dose <- dose[,c(ncol(dose), 2:(ncol(dose)-1))]

dim(dose)

dose <- as.matrix(dose)

head(dose)[,1:5]

##################################

### Create function for models ###

##################################

sp <- function(x) gsp( x, knots = c((2-8), (8-8),(12-8)), degree =

c(3,3,3,3), smooth = c(2,2,2))

lme.fun =function(snp) {

geno = dose[,c(1,as.numeric(snp))]

geno <- as.data.frame(geno)

colnames(geno) <- c("ID", "g")

data <- merge(data, geno, by="ID")

data <- subset(data, !is.na(data$bmi) & !is.na(data$sex))

dim(data)

fit.snp = tryCatch( lme( log(bmi) ~ as.numeric(as.character(g))

* sp(age-8) + sex * sp(age-8) + PC1 + PC2 + PC3 + PC4 +

PC5, data=data, method="ML", na.action=na.exclude,

random = ~ sp(age-8)[,1:2]| ID,

correlation=corCAR1()),error=function(e) NA)

model <- summary(fit.snp)

mdlMtx <- as.data.frame(cbind(1,

as.numeric(as.character(data$g)), sp(data$age-8),

data$sex, data$PC1, data$PC2, data$PC3, data$PC4,

data$PC5, as.numeric(as.character(data$g))*sp(data$age-8),

data$sex*sp(data$age-8), data$ID))

Zmtx <- as.data.frame(cbind(1, sp(data$age-8)[,1:2], data$ID))

id <- as.data.frame(unique(data$ID))

data <- data[,-ncol(data)] # remove SNP from end column of data

if (inherits(fit.snp, "lme")) {

G <- matrix(as.numeric(getVarCov(fit.snp)),ncol=3)

E <- cbind(data$ID, fit.snp$residuals[,1])

S <- matrix()

for(j in 1:nrow(id)){


Zj <- as.matrix(subset(Zmtx,Zmtx[,4] == id[j,1])[, 1:3])

Xj <- as.matrix(subset(mdlMtx, mdlMtx[,ncol(mdlMtx)] ==

id[j,1])[, 1:(ncol(mdlMtx)-1)])

Rj <- as.numeric(VarCorr(fit.snp)[4,1]) * diag(nrow(Zj))

Sj <- t(Xj) %*% ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Xj

if (j == 1 ){S = Sj} else {S = S + Sj}

}

M <- matrix()

for(j in 1:nrow(id)){


Zj <- as.matrix(subset(Zmtx,Zmtx[,4] == id[j,1])[, 1:3])

Xj <- as.matrix(subset(mdlMtx, mdlMtx[,ncol(mdlMtx)] ==

id[j,1])[, 1:(ncol(mdlMtx)-1)])

Ej <- as.matrix(subset(E,E[,1] == id[j,1])[,2])

Rj <- as.numeric(VarCorr(fit.snp)[4,1]) * diag(nrow(Zj))

Mj <- t(Xj) %*% ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Ej %*%

t(Ej) %*% ginv((Zj %*% G %*% t(Zj)) + Rj) %*% Xj

if (j == 1 ){M = Mj} else {M = M + Mj}

}

rVarCov <- ginv(S) %*% M %*% ginv(S)

wald_t <- try(wald(fit.snp, c(2, 15:20)), silent=TRUE)

if(class(wald_t)[1] != 'try-error'){

wald_p <- wald_t[[1]]$anova[[4]][1,1]

}else{

wald_p <- NA

}

wald_t_int <- try(wald(fit.snp, c(15:20)), silent=TRUE)

if(class(wald_t_int)[1] != 'try-error'){

wald_p_int <- wald_t_int[[1]]$anova[[4]][1,1]

}else{

wald_p_int <- NA

}

snp.out <- c(as.numeric(fixef(fit.snp))[grep("as.numeric",

names(fixef(fit.snp)))],

as.numeric(model$tTable[grep("as.numeric",

rownames(model$tTable)),2]),

as.numeric(model$tTable[grep("as.numeric",

rownames(model$tTable)),5]),

sqrt(abs(diag(rVarCov)))[c(2,15:20)],

(pt(abs(summary(fit.snp)$tTable[,1]/sqrt(abs(diag(rVarCov)

))),df=4000,lower.tail=F)*2)[c(2,15:20)],

wald_p, wald_p_int)

}

if (!inherits(fit.snp, "lme")) {

snp.out <- rep("NA",37)

}

return(snp.out)

}

G.5 Chapter Six

## ALSPAC code (similar code was used for Raine)

#######################

### Spline function ###

#######################


c(3,3,3,3), smooth = c(2,2,2))

######################################

### SPLMM Model for BMI in females ###

######################################

female <- lme(log(bmi) ~ sp(age_yr-8) + source.r, data=data.f,

random=~sp(age_yr-8)[,1:2]|cid_724a, na.action=na.omit,

method="ML", correlation=corCAR1(form = ~ 1 |cid_724a))

summary(female)

female.score <- lme(log(bmi) ~ sp(age_yr-8)*score + source.r,

data=data.f, random=~sp(age_yr-8)[,1:2]|cid_724a, method="ML",

na.action=na.omit, correlation=corCAR1(form = ~ 1 |cid_724a))

summary(female.score)

anova(female, female.score)$"p-value"

wald(female.score, 10:15)

####################################

### SPLMM Model for BMI in males ###

####################################

male <- lme(log(bmi) ~ sp(age_yr-8) + source.r, data=data.m,



summary(male)

male.score <- lme(log(bmi) ~ sp(age_yr-8)*score + source.r,

data=data.m, random=~sp(age_yr-8)[,1:2]|cid_724a, method="ML",


summary(male.score)

anova(male, male.score)$"p-value"

wald(male.score, 10:15)

##############################

### Weight spline function ###

##############################


c(1,3,3,2), smooth = c(2,2,2))

###################################

### SPLMM for weight in females ###

###################################

female.wt <- lme(log(weight) ~ sp(age_yr-8) + source.r, data=data.f,



summary(female.wt)

female.score.wt <- lme(log(weight) ~ sp(age_yr-8)*score + source.r,



summary(female.score.wt)

anova(female.wt, female.score.wt)$"p-value"

wald(female.score.wt, 7:9)

#################################

### SPLMM for weight in males ###

#################################

male.wt <- lme(log(weight) ~ sp(age_yr-8) + source.r, data=data.m,



summary(male.wt)

male.score.wt <- lme(log(weight) ~ sp(age_yr-8)*score + source.r,



summary(male.score.wt)

anova(male.wt, male.score.wt)$"p-value"

wald(male.score.wt, 7:9)

L <- c(0,0,0,0,1,0,sp(1.75-8)*1) # Test when effect occurs

wald(female.score.wt, L)

L <- c(0,0,0,0,1,0,sp(1-8)*1)

wald(male.score.wt, L)

##############################

### Height spline function ###

##############################


c(3,3,3,3), smooth = c(2,2,2))

###################################

### SPLMM for height in females ###

###################################

female.ht <- lme(height ~ sp(age_yr-8) + source.r, data=data.f,



summary(female.ht)

female.score.ht <- lme(height ~ sp(age_yr-8)*score + source.r,



summary(female.score.ht)

anova(female.ht, female.score.ht)$"p-value"

wald(female.score.ht, 10:15)

#################################

### SPLMM for height in males ###

#################################

male.ht <- lme(height ~ sp(age_yr-8) + source.r, data=data.m,



summary(male.ht)

male.score.ht <- lme(height ~ sp(age_yr-8)*score + source.r,



summary(male.score.ht)

anova(male.ht, male.score.ht)$"p-value"

wald(male.score.ht, 10:15)

L <- c(0,0,0,0,0,0,0,1,0,sp(3.5-8)*1)

wald(female.score.ht, L)

L <- c(0,0,0,0,0,0,0,1,0,sp(3.5-8)*1)

wald(male.score.ht, L)

#####################################

### Calculating adiposity rebound ###

#####################################

sp <- function(x) gsp( x, knots = c((2), (8),(12)), degree =

c(3,3,3,3), smooth = c(2,2,2))

female2.ar <- lme(log(bmi) ~ sp(age_yr) + source.r, data=data.f,

random=~sp(age_yr)[,1:2]|cid_724a, na.action=na.omit,


male2.ar <- lme(log(bmi) ~ sp(age_yr) + source.r, data=data.m,

random=~sp(age_yr)[,1:2]|cid_724a, na.action=na.omit,


k1 <- 2

k2 <- 8

k3 <- 12

AR <- data.frame()

ar_alspac <- function(fit, dd1){

fxef <- as.numeric(fixef(fit)[2:8])

fxef2 <- as.numeric(fixef(fit))

for(i in 1:nrow(dd1)){

coeff <- fxef + c(as.numeric(ranef(fit)[i,2:3]), rep(0,5))

y <- try(uniroot(function(x) coeff[1] + coeff[2]*x +

coeff[3]*x^2/2 + coeff[4]*(x-k1)^2/2, lower=k1, upper=k2)$root, TRUE)

AR[i,1] <- ifelse(class(y) != "try-error", y, NA)

coeff2 <- fxef2 + c(as.numeric(ranef(fit)[i,]), rep(0,5))

AR[i,2] <- ifelse(class(y) != "try-error", exp(coeff2[1] +

coeff2[2]*y + (coeff2[3]*(y^2))/2 + (coeff2[4]*(y^3))/6 +

(coeff2[5]*((y-k1)^3))/6), NA) }

out <- data.frame(AR)

out$cid_724a <- dd1$cid_724a

out

}

dd1 <- as.data.frame(unique(data.f$cid_724a))

names(dd1)[1] <- "cid_724a"

AR_girls2 <- ar_alspac(female2.ar, dd1)

names(AR_girls2) <- c("Age2", "BMI2", "cid_724a")

dd1 <- as.data.frame(unique(data.m$cid_724a))

names(dd1)[1] <- "cid_724a"

AR_boys2 <- ar_alspac(male2.ar, dd1)

names(AR_boys2) <- c("Age2", "BMI2", "cid_724a")
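
The adiposity rebound calculation above finds, for each individual, the age between the first two knots at which the first derivative of the fitted log(BMI) curve is zero, then evaluates BMI at that age. The sketch below shows the same root-finding step for a single hypothetical curve. The coefficient values b0 to b4 are invented for illustration and are not estimates from either cohort.

```r
# Hypothetical coefficients for one individual's curve:
# log(BMI) = b0 + b1*x + b2*x^2/2 + b3*x^3/6 + b4*(x - k1)^3/6, for k1 <= x < k2
b0 <- log(16)
b1 <- -0.06
b2 <- 0.01
b3 <- 0.001
b4 <- 0.0005
k1 <- 2
k2 <- 8

# First derivative of the curve with respect to age on [k1, k2]
slope <- function(x) b1 + b2 * x + b3 * x^2 / 2 + b4 * (x - k1)^2 / 2

# Age at adiposity rebound: the root of the derivative between the knots.
# NA is returned if the derivative does not change sign on the interval.
ar_age <- tryCatch(uniroot(slope, lower = k1, upper = k2, tol = 1e-10)$root,
                   error = function(e) NA)

# BMI at adiposity rebound, back-transformed from the log scale
ar_bmi <- exp(b0 + b1 * ar_age + b2 * ar_age^2 / 2 + b3 * ar_age^3 / 6 +
                b4 * (ar_age - k1)^3 / 6)
print(c(age = ar_age, bmi = ar_bmi))
```

The derivative is negative at age 2 and positive at age 8 for these values, so the root is a minimum of the curve.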
