Received: 28 September 2016 Revised: 26 June 2017 Accepted: 10 July 2017

DOI: 10.1002/dmrr.2921

RESEARCH ARTICLE

Building and validating a prediction model for paediatric type 1 diabetes risk using next generation targeted sequencing of class II HLA genes

Lue Ping Zhao 1,2 | Annelie Carlsson 3 | Helena Elding Larsson 4 | Gun Forsander 5 | Sten A. Ivarsson 4 | Ingrid Kockum 6 | Johnny Ludvigsson 7 | Claude Marcus 8 | Martina Persson 9 | Ulf Samuelsson 7 | Eva Örtqvist 9 | Chul‐Woo Pyo 10 | Hamid Bolouri 11 | Michael Zhao 11 | Wyatt C. Nelson 10 | Daniel E. Geraghty 10 | Åke Lernmark 4 | The Better Diabetes Diagnosis (BDD) Study Group

1 Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

2 School of Public Health, University of Washington, Seattle, WA, USA

3 Department of Pediatrics, Lund University, Lund, Sweden

4 Department of Clinical Sciences, Lund University/CRC, Skåne University Hospital, Malmö, Sweden

5 Institute of Clinical Sciences, Department of Pediatrics and the Queen Silvia Children's Hospital, Sahlgrenska University Hospital, Gothenburg, Sweden

6 Department of Clinical Neurosciences, Karolinska Institutet, Solna, Sweden

7 Department of Clinical and Experimental Medicine, Linköping University, Linköping, Sweden

8 Department of Clinical Science, Karolinska Institutet, Huddinge, Sweden

9 Department of Medicine, Clinical Epidemiology, Karolinska University Hospital, Solna, Sweden

10 Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

11 School of Arts and Sciences, University of Washington, Seattle, WA, USA

Correspondence

Lue Ping Zhao, Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave NE, Seattle, WA 98109, USA.

Email: [email protected]

Funding information

European Foundation for the Study of Diabetes (EFSD); Swedish Child Diabetes Foundation (Barndiabetesfonden); National Institutes of Health, Grant/Award Number: DK26190 and DK63861; Swedish Research Council including a Linné grant to Lund University Diabetes Centre; Skåne County Council; Swedish Association of Local Authorities and Regions (SKL); National Institute of Diabetes and Digestive and Kidney Diseases, Grant/Award Number: ITN# 16‐05‐MH; Fred Hutchinson Cancer Research Center

Abstract

Aim: It is of interest to predict possible lifetime risk of type 1 diabetes (T1D) in young children for recruiting high‐risk subjects into longitudinal studies of effective prevention strategies.

Methods: Utilizing a case‐control study in Sweden, we applied a recently developed next generation targeted sequencing technology to genotype class II genes and applied an object‐oriented regression to build and validate a prediction model for T1D.

Results: In the training set, estimated risk scores were significantly different between patients and controls (P = 8.12 × 10^−92), and the area under the curve (AUC) from the receiver operating characteristic (ROC) analysis was 0.917. Using the validation data set, we validated the result with AUC of 0.886. Combining both training and validation data resulted in a predictive model with AUC of 0.903. Further, we performed a “biological validation” by correlating risk scores with 6 islet autoantibodies, and found that the risk score was significantly correlated with IA‐2A (Z‐score = 3.628, P < 0.001). When applying this prediction model to the Swedish population, where the lifetime T1D risk ranges from 0.5% to 2%, we anticipate identifying approximately 20 000 high‐risk subjects after testing all newborns, and this calculation would identify approximately 80% of all patients expected to develop T1D in their lifetime.

Conclusion: Through both empirical and biological validation, we have established a prediction model for estimating lifetime T1D risk, using class II HLA. This prediction model should prove useful for future investigations to identify high‐risk subjects for prevention research in high‐risk populations.

KEYWORDS

autoimmune disease, genetics, genome‐wide association study, islet autoantibodies, object‐oriented regression, type 1 diabetes

Members of the BDD Study Group are listed in Appendix 1.

Abbreviations: AUC, area under the receiver operating characteristic curve; GWAS, genome‐wide association study; MHC, major histocompatibility region; NGTS, next generation targeted sequencing; OOR, object‐oriented regression; ROC, receiver operating characteristic; T1D, type 1 diabetes

Diabetes Metab Res Rev. 2017;33:e2921. https://doi.org/10.1002/dmrr.2921

Copyright © 2017 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/dmrr

1 | INTRODUCTION

Type 1 diabetes (T1D) results from an autoimmune destruction of the pancreatic islet beta cells, usually initiated in early life and progressing at a variable rate until diagnosis.1,2 Incidence rates in Europe and the United States range from 8 to 63 per 100 000 per year, roughly 6 to 100 times the incidence in Asian populations, with a lifetime risk of 0.5%‐2%.3,4 Worldwide, the incidence of T1D is rising steadily at a rate of 2%‐5% a year.5 There is an increasing research demand for both earlier and better clinical diagnosis, treatment, and management. However, an even more important issue that needs to be addressed is prevention, based on the development of an early prediction and detection methodology. Because the first appearing beta‐cell autoantibody, be it against insulin only, GAD65 only, or both, signifies an etiological trigger of a long‐term prodrome,6-8 it is imperative that the overall T1D burden be reduced through early detection and early prevention. There are reports that children diagnosed with beta‐cell autoantibodies in longitudinal studies, in comparison with those in the community, required no or fewer hospitalizations,9 or had reduced frequency of ketoacidosis after being diagnosed.10,11 It has also been reported that participation in prospective follow‐up before diagnosis of T1D leads to earlier diagnosis with fewer symptoms, decreased incidence of ketoacidosis, as well as better metabolic control up to 2 years after diagnosis.12 Also, it cannot be excluded that several secondary prevention approaches that initially failed their end‐points, such as parenteral and oral insulin in DPT‐1,13,14 nicotinamide in ENDIT,15 nasal insulin in DIPP,16 or hydrolyzed infant formula in TRIGR,17 may eventually be successful, perhaps through primary prevention18 or more effective intervention studies.19-21 Autoantibody levels against insulin, GAD65, IA‐2, and the ZnT8 transporter and their longitudinal measurements have been proposed as early detection biomarkers, and have been shown to be effective in detecting T1D early in life.6-8 For earlier detection than islet autoantibodies, DNA‐based biomarkers, such as HLA genes, could be complementary, allowing us to identify high‐risk children at birth.22-25

Genetic factors in the HLA system have long been shown to be important to the aetiology of T1D.26 While earlier efforts have centred on HLA‐DR and DQ genes, recent genetic studies of T1D have been genome‐wide association studies (GWAS), surveying the entire human genome for discoveries.27-30 Again, the major histocompatibility region (MHC), covering HLA‐DR, DQ, DP, and other genes, has exhibited unambiguous associations.29,31 Numerous investigations of different populations have shown that the HLA association with T1D is robust.

The prospect of translating HLA associations with T1D into practice has stimulated much interest in developing prediction models. It was suggested earlier that combining HLA class I and II genes with islet autoantibody measurements should be useful to predict T1D.32,33 Recognizing the high linkage disequilibrium in the MHC region, a T1D prediction model with 6 single nucleotide polymorphisms flanking HLA genes was also suggested.34 In the TEDDY study, a defined set of HLA‐DR and ‐DQ allele‐specific probes was used in a “qualitative prediction model” to screen more than 420 000 newborns, and to recruit over 7000 high‐risk subjects into longitudinal monitoring.22 Although effective prevention of T1D is not yet available, it will be important to develop HLA gene‐based prediction models for T1D risk to recruit high‐risk subjects into prevention clinical trials35 and to develop future precision screening.36

When building HLA gene‐based prediction models, one challenge is that HLA genes are exceptionally complex, with characteristics such as high polymorphism, potential Hardy‐Weinberg disequilibrium, extensive linkage disequilibrium due to natural selection, allele‐specific or genotype‐specific associations, and possible interactions between genes.37 There is a continuing effort to identify allele‐specific, genotype‐specific, or peptide‐specific effects within a gene or within any MHC region, using ever‐improving genotyping technologies.37 While improving resolution, the next generation targeted sequencing (NGTS) technologies produce even more alleles/genotypes with lower frequencies, presenting challenges for data analytics. This challenge becomes particularly acute for constructing prediction models with many uncommon alleles/genotypes. Progress in translating known HLA‐T1D associations has been slow.

To circumvent this challenge, we recently developed an object‐oriented regression (OOR) to correlate complex genotypes with disease phenotypes.38 OOR transforms metrics, from a metric of genotype to a metric of similarities to selected genotypes (referred to as “exemplars”), and assesses T1D associations with genotype‐specific similarity. Using the OOR, we build a T1D risk prediction model with high‐resolution HLA‐DRB1, ‐DRB3, ‐DRB4, ‐DRB5, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1. After building the prediction model on the training set, we then assess its predictive performance in an independent validation data set. To establish the biological basis of the predictive score, we further performed a biological validation by correlating the risk score with the levels of 4 autoantibodies directed against insulin, GAD65, IA‐2, and any of the 3 variants (W, R, or Q) at amino acid position 325 of the ZnT8 transporter.

2 | METHODS

2.1 | Study participants

The present case‐control study includes 962 patients (cases) from the nation‐wide Swedish Better Diabetes Diagnosis (BDD) study and 448 geographically representative healthy normal subjects (controls).36 All of the patients were registered in the BDD study carried out in collaboration with 42 paediatric clinics in Sweden since 2005.39,40 The American Diabetes Association and World Health Organization criteria were used for the diagnosis of diabetes and to classify the disease.41

Here, however, we included only patients who at the time of clinical diagnosis had 1 or more autoantibodies against insulin (IAA), GAD65 (GADA), IA‐2 (IA‐2A), or any of the 3 ZnT8 variants (amino acid 325 being either R, W, or Q; ZnT8‐RA, ZnT8‐WA, or ZnT8‐QA, respectively).39,40,42 The Karolinska Institute Ethics Board approved the BDD study (2004/1:9). The controls, described in detail elsewhere,43 were randomly selected from the national population register and frequency matched for patient age, gender, and residential area.

Prior to all analyses, a total of 1410 patients and controls were randomly assigned into training and validation sets with 705 samples each. The training set with 479 patients and 226 controls was used to build the prediction model. The validation set with 483 patients and 222 controls was used to validate the prediction model independently.
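As a rough illustration of the random assignment described above, the following sketch shuffles subject indices and cuts them into two equal halves. The function name `split_half`, the fixed seed, and the use of Python's `random` module are assumptions made for this example; the source does not state how the randomization was carried out.

```python
import random


def split_half(subject_ids, seed=0):
    """Randomly assign subjects to two sets of (near) equal size."""
    ids = list(subject_ids)
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# 962 patients + 448 controls = 1410 subjects in total
training, validation = split_half(range(962 + 448))
print(len(training), len(validation))
```

Because case status is not used in the shuffle, the numbers of patients and controls in each half vary by chance, which matches the slightly different counts reported for the two sets.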

2.2 | DNA extraction and HLA next generation targeted sequencing

The Plasmid Maxiprep Kit (Qiagen, Stockholm, Sweden) was used to isolate DNA from frozen whole blood samples according to the manufacturer's instructions. The NGTS HLA typing approach utilized PCR‐based amplification of HLA and sequencing using Illumina MiSeq technology as described in detail elsewhere.36,44,45 In brief, the laboratory steps consisted of consecutive PCR reactions with bar coding incorporated in the PCRs for individual sample tracking, followed by application to the MiSeq. Robust assays for each of the target loci for all HLA‐DR alleles were then developed along with all A1 and B1 alleles of HLA‐DQ A1 and HLA‐DP A1 and B1. The depth of genotyping was extended to all of HLA‐DRB3, 4, 5 to include exons 2 and 3 for all DR alleles. The analytical tools used to define haplotypes and genotypes were developed in collaboration with Scisco Genetics (Seattle, WA). Data quality was assessed using a minimal read coverage of 100 reads with perfect concordance to the determined type. Using an amplicon‐based approach, the phase within each amplicon was determined directly by single read coverage, while phase between the 2 exons was deduced from database comparisons using the IMGT HLA version 3.10 (http://www.ebi.ac.uk/ipd/imgt/hla). To date, these tools have been tested, with 100% accuracy, on >2000 control samples genotyped with the Scisco Genetics IGS approach.44,45

2.3 | Islet autoantibodies

IAA, GADA, IA‐2A, and 3 variants of ZnT8A (ZnT8‐RA, ZnT8‐WA, or ZnT8‐QA, respectively) were determined in quantitative radio‐binding assays using in‐house standards to determine levels as previously described in detail.39,46 Each measurement was then dichotomized at its corresponding clinically accepted threshold value to classify the islet autoantibody as positive or negative. Measurements were made only among patients, because nearly all controls should be negative.

3 | STATISTICAL ANALYSIS

3.1 | Allelic and genotypic frequency estimations

All HLA genes under consideration are highly polymorphic. Without imposing frequency‐specific restrictions, we computed allelic and genotypic frequencies, stratified by patient and control status. To demonstrate comparability of these frequencies between training and validation sets, our calculation was also stratified by data set.
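The stratified frequency calculation can be sketched as below. The record layout `(status, dataset, genotype)`, the function name, and the toy genotypes are all assumptions for illustration and are not taken from the study data.

```python
from collections import Counter, defaultdict


def allele_frequencies(records):
    """Allelic frequencies per (status, dataset) stratum.

    records: iterable of (status, dataset, (allele_1, allele_2)).
    Returns {(status, dataset): {allele: relative frequency}}.
    """
    counts = defaultdict(Counter)
    for status, dataset, genotype in records:
        # Each subject contributes two alleles to its stratum.
        counts[(status, dataset)].update(genotype)
    return {
        stratum: {allele: n / sum(c.values()) for allele, n in c.items()}
        for stratum, c in counts.items()
    }


# Toy, made-up HLA-DRB1 genotypes
records = [
    ("control", "training", ("15:01:01", "07:01:01")),
    ("control", "training", ("15:01:01", "03:01:01")),
    ("patient", "training", ("03:01:01", "04:01:01")),
    ("patient", "training", ("03:01:01", "03:01:01")),
]
freqs = allele_frequencies(records)
print(freqs[("patient", "training")])
```

Genotypic frequencies follow the same pattern if each genotype is counted as a sorted pair instead of being split into its two alleles.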

3.2 | Assessing T1D association via OOR

A major challenge facing association analysis of HLA genes with T1D is that alleles and resulting genotypes are highly polymorphic. To build predictive models with polymorphic HLA genes, we have developed an object‐oriented regression (OOR) technique and have described the methodology fully elsewhere.38 Briefly, the key idea of OOR is to identify a set of genotype profiles, which are referred to as exemplars, to compute similarities of all study subjects with these exemplars, and then to assess disease associations with similarity measurements, instead of actual genotypes as used previously.47 By shifting association analysis from genotypes to similarities of genotypes to exemplars, OOR is able to assess HLA associations even if corresponding genotypes are relatively uncommon or have extremely unbalanced frequencies between patients and controls. For example, to examine T1D association with HLA‐DRB1*03:01:01/03:01:01, a conventional method counts its frequencies among patients and controls, and compares them with frequencies of all other genotypes among patients and controls. Summarizing all counts in a 2 × 2 table, one computes an odds ratio (OR) specific to this genotype that quantifies the genotypic association with T1D. Such a method performs well when the genotype under analysis is relatively frequent (but is not approaching 100%) and its frequencies among patients and controls do not approach 0% or 100%. However, this conventional method has difficulty dealing with genotypes with relatively low frequencies, especially when frequencies are unbalanced: for example, HLA‐DRB1*15:01:01/07:01:01 is common in controls but is absent in patients, for which odds ratios approach zero.
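To make the difficulty with unbalanced genotypes concrete, the sketch below computes a genotype‐specific odds ratio from a 2 × 2 table together with Woolf's standard error of the log odds ratio. The counts are hypothetical (a genotype carried by 10 of 226 controls and none of 479 patients), chosen only to mimic a genotype that is present in controls and absent in patients; the function name is likewise an assumption.

```python
import math


def odds_ratio_2x2(case_with, case_without, control_with, control_without):
    """Return (odds ratio, Woolf standard error of the log odds ratio)."""
    cells = (case_with, case_without, control_with, control_without)
    denominator = case_without * control_with
    if denominator == 0:
        odds_ratio = math.inf  # estimate diverges; reported here as infinity
    else:
        odds_ratio = (case_with * control_without) / denominator
    # Woolf's formula sums 1/count over all four cells, so a single
    # empty cell leaves the standard error undefined (infinite here).
    if 0 in cells:
        se_log_or = math.inf
    else:
        se_log_or = math.sqrt(sum(1 / n for n in cells))
    return odds_ratio, se_log_or


# Hypothetical counts: 0 of 479 patients, 10 of 226 controls
or_hat, se = odds_ratio_2x2(0, 479, 10, 226 - 10)
print(or_hat, se)
```

With an empty cell the point estimate collapses to zero and no finite standard error is available, so no confidence interval or test can be formed from this table alone.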

To circumvent this challenge of many uncommon and unbalanced genotypes in HLA data analysis, OOR treats all observed genotypes as exemplars, eg, HLA‐DRB1*03:01:01/03:01:01 and *15:01:01/07:01:01 are 2 exemplars. For each subject in the study, OOR compares its genotype (denoted as Gi for the ith subject) with each exemplar, taking value 0, 0.5, or 1, if Gi shares no allele, 1 allele, or both alleles with the exemplar, respectively. For the 2 chosen exemplars above, we create 2 similarity measurements for each Gi, denoted as Si1 and Si2, respectively. Instead of assessing T1D association with the genotype Gi, OOR assesses T1D association with exemplar‐specific similarities (Si1, Si2) via 2 regression coefficients (β1, β2) in a logistic regression model.38 If β1 is positive, it means that the similarity to the exemplar (HLA‐DRB1*03:01:01/03:01:01) associates with an increased risk for T1D. On the other hand, if β2 is negative, it means that the similarity to the exemplar (HLA‐DRB1*15:01:01/07:01:01) associates with a reduced risk. In other words, OOR assesses the T1D association with HLA genes via similarities with specific genotypes.
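The similarity measure for a single gene can be sketched as follows. Treating each genotype as an unordered pair and counting shared alleles by multiset intersection is one plausible reading of "shares no allele, 1 allele, or both alleles"; the function name and this handling of homozygous exemplars are assumptions, and the full OOR method is described in the cited methodology paper.

```python
from collections import Counter


def similarity(genotype, exemplar):
    """Similarity of a genotype to an exemplar: 0, 0.5, or 1.

    Both arguments are pairs of allele names; order does not matter.
    """
    shared = Counter(genotype) & Counter(exemplar)  # multiset intersection
    return sum(shared.values()) / 2


exemplar_1 = ("03:01:01", "03:01:01")
exemplar_2 = ("15:01:01", "07:01:01")
subject = ("03:01:01", "15:01:01")

s1 = similarity(subject, exemplar_1)  # shares one 03:01:01 allele
s2 = similarity(subject, exemplar_2)  # shares the 15:01:01 allele
print(s1, s2)
```

Under this reading, a subject heterozygous for 03:01:01 is half similar to the homozygous exemplar, and only a subject carrying both copies reaches a similarity of 1.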

There are at least 3 practical reasons that favour OOR. Again, we use the genotype HLA‐DRB1*15:01:01/07:01:01 as an example. First, this genotype has approximate frequencies of 4.4% and 0% in controls and in patients, respectively. In this case, the conventional method fails to provide a usable OR estimate, because it approaches zero with unbalanced frequency distributions between patients and controls. On the other hand, OOR counts frequencies of subjects sharing 0, 1, or 2 alleles with the exemplar in patients, and has an appreciable number of patients who share 1 allele with the exemplar, even though no one shares both alleles. Hence, OOR produces an interpretable and robust regression coefficient (β2) for statistical inference. Second, when extending a single HLA gene to all 8 HLA genes, one has combinations of multiple genotypes, referred to as genotype profiles. When there are many genotype profiles but each has a small frequency, OOR computes similarity measurements with a list of chosen exemplars and proceeds with the necessary association analysis, while the conventional method fails. Third, OOR relies directly on genotype profiles, without requiring any haplotype information. Hence, OOR retains the interpretation of genotypes and robustness without making undesirable assumptions, such as Hardy‐Weinberg equilibrium, typically required by haplotype‐based association analysis methods.48

The intuitiveness of OOR analysis is based on the supposition that one observes a T1D patient with a pair of high‐risk alleles and registers this patient as an exemplar. When seeing a new subject who shares no alleles, 1 allele, or both alleles with the exemplar, one may intuitively conclude that the subject has low, modest, or high risk for T1D, respectively.

3.3 | Building a T1D prediction model via OOR

After assessing T1D associations with all 8 class II genes, we are interested in building a prediction model for T1D, based on HLA class II genes. Using all genotype profiles present in the training set, OOR identifies a panel of informative exemplars and evaluates similarities of each subject's genotypes with exemplars. Using the penalized likelihood method for selecting informative predictors,38 OOR selects a final set of “informative exemplars” into a prediction model, with estimated coefficients corresponding to each informative exemplar. In Equation 1, one has a risk score for a genotype profile $\tilde{G}$ written as

$$\text{Risk Score}(\tilde{G}) = \beta_1 S_1(\tilde{G}) + \beta_2 S_2(\tilde{G}) + \cdots + \beta_q S_q(\tilde{G}), \quad (1)$$

in which the function $S_k(\tilde{G})$ $(k = 1, 2, \ldots, q)$ measures the similarity of the genotype profile $\tilde{G}$ with the kth exemplar, and βk, the estimated log odds ratio, is the weight on the similarity, estimated from the training data set. The exponentiation of βk leads to the estimated odds ratio (OR).

Despite its mathematical look, the risk score has an intuitive interpretation from a clinician's perspective. In this study, one treats the panel of exemplars as a collection of “case reports”; each has a particular genotype profile, either a protective one (βk < 0) or a risky one (βk > 0). Those regression coefficients quantify clinical experience associated with individual exemplars. When facing a new subject, one evaluates his/her similarity to all exemplars and computes the weighted sum of all similarity measurements, leading to a risk score for the clinician to make a judgement.
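The weighted sum in Equation 1 can be sketched in a few lines. The coefficient values below are hypothetical placeholders, not estimates from the study, and the function name is an assumption; the similarities are taken as already computed against each informative exemplar.

```python
import math


def risk_score(similarities, betas):
    """Weighted sum of exemplar similarities (the form of Equation 1)."""
    if len(similarities) != len(betas):
        raise ValueError("need one coefficient per exemplar")
    return sum(b * s for b, s in zip(betas, similarities))


# Hypothetical coefficients: one risky exemplar (beta > 0)
# and one protective exemplar (beta < 0)
betas = [1.2, -2.0]
similarities = [0.5, 0.0]  # half similar to the first, unlike the second

score = risk_score(similarities, betas)
odds_ratios = [math.exp(b) for b in betas]  # exponentiated coefficients
print(round(score, 3), [round(o, 3) for o in odds_ratios])
```

A subject resembling only risky exemplars accumulates a positive score, while resemblance to protective exemplars pulls the score down.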

3.4 | ROC analysis in training and validation sets

As noted previously, we use a training data set exclusively to build a prediction model. After building the prediction model, we compute the risk score through Equation 1. To evaluate the performance of this risk score, potentially as a testing criterion, one performs receiver operating characteristic (ROC) analysis. Basically, for each of a series of threshold values, one computes the sensitivity (θ), defined as the percentage of patients whose risk scores exceed the threshold value, and the specificity (λ), defined as the percentage of controls whose risk scores do not exceed it. Conventionally, ROC analysis plots sensitivity against 1 − specificity and computes the area under the curve (AUC) to measure the performance.
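A minimal sketch of this ROC calculation is given below, using every observed score as a candidate threshold and the trapezoidal rule for the area. The function names and toy scores are assumptions for illustration; the source does not say which software was used for the ROC analysis.

```python
def roc_points(case_scores, control_scores):
    """Return sorted (1 - specificity, sensitivity) points over all thresholds."""
    thresholds = sorted(set(case_scores) | set(control_scores))
    points = [(1.0, 1.0)]  # threshold below every score: all subjects positive
    for c in thresholds:
        sensitivity = sum(s > c for s in case_scores) / len(case_scores)
        false_positive = sum(s > c for s in control_scores) / len(control_scores)
        points.append((false_positive, sensitivity))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy risk scores (made up)
cases = [2.0, 3.0, 4.0]
controls = [0.0, 1.0, 2.5]
print(round(auc(roc_points(cases, controls)), 3))
```

For these toy scores, 8 of the 9 patient-control pairs are ordered correctly, so the area equals 8/9, illustrating the usual reading of AUC as the probability that a random patient scores higher than a random control.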

3.5 | Biological validation

In addition to an empirical validation via an independent validation set, a stronger validation is via biological validation, ie, assessing associations of risk scores with islet autoantibody levels. To achieve this objective, we use the logistic regression model, implemented in the R function “glm”,49 to regress each qualitative (positive or negative) autoantibody measure on the risk score. This analysis is restricted to patients, for whom autoantibody measurements are available.
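For readers without R at hand, the regression of a positive/negative autoantibody indicator on the risk score can be sketched with a small Newton-Raphson fit. This is a stand-in for a binomial `glm` fit, not the authors' code; the function names and the toy data are assumptions, and the Z-score is the usual Wald statistic from the information matrix.

```python
import math


def score_and_information(x, y, b0, b1):
    """Gradient of the log-likelihood and entries of the information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, iterations=25):
    """Fit logit Pr(y = 1) = b0 + b1 * x; return (b0, b1, Wald Z for b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = score_and_information(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, b1 / se_b1


# Toy data (made up): risk scores and autoantibody positivity among patients
risk_scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
positive = [0, 0, 1, 0, 1, 1]
b0, b1, z = fit_logistic(risk_scores, positive)
print(round(b1, 3), round(z, 3))
```

A positive slope with a large Z-score would indicate that higher risk scores go with autoantibody positivity. The sketch assumes the data are not perfectly separated, in which case the estimates would not converge.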

3.6 | Construction of a testing model

Following the validation of the risk scores both empirically and functionally, it is of interest to construct a testing model. Suppose that we have a population of N subjects. On each subject, we genotype HLA genes and compute the risk score by Equation 1, denoting the risk score Z. The testing rule is that for a chosen threshold value c, the subject is deemed a high‐risk subject if the risk score exceeds the threshold value. Formally, the testing rule may be written as

$$\text{Screening Rule} = \begin{cases} \text{Positive} & Z > c \\ \text{Negative} & Z \le c, \end{cases} \quad (2)$$

in which the threshold value c is chosen to indicate a positive or negative test result. Now let Pr(Z > c) denote the percentage of subjects who test positive. This percentage has a relationship with sensitivity (θ), specificity (λ), and the averaged lifetime risk (π) in the population, which may be written as

$$\Pr(Z > c) = \theta\pi + (1 - \lambda)(1 - \pi). \quad (3)$$

After testing N subjects, we expect to identify N × Pr(Z > c) positive subjects. Among all positive subjects, we estimate the percentage of subjects who will develop T1D in their lifetime (D = 1) by

$$\Pr(D = 1 \mid Z > c) = \frac{\theta\pi}{\theta\pi + (1 - \lambda)(1 - \pi)}. \quad (4)$$

When effective prevention strategies are available, the group of subjects with positive test results may benefit from receiving this test.
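Equations 3 and 4 are easy to evaluate numerically. The sketch below uses hypothetical inputs (sensitivity 0.8, specificity 0.9, lifetime risk 1%, and 100 000 subjects tested) purely to show the arithmetic; none of these values, nor the function names, are taken from the study's results.

```python
def fraction_positive(theta, lam, pi):
    """Equation 3: expected fraction of tested subjects with Z > c."""
    return theta * pi + (1 - lam) * (1 - pi)


def positive_predictive_value(theta, lam, pi):
    """Equation 4: fraction of test-positive subjects expected to develop T1D."""
    return theta * pi / fraction_positive(theta, lam, pi)


# Hypothetical inputs for illustration only
theta, lam, pi = 0.8, 0.9, 0.01
n_tested = 100_000

n_positive = n_tested * fraction_positive(theta, lam, pi)
ppv = positive_predictive_value(theta, lam, pi)
print(round(n_positive), round(ppv, 4))
```

With these inputs about 10.7% of subjects test positive and roughly 7.5% of them would be expected to develop T1D, which shows how strongly a low lifetime risk limits the positive predictive value even when sensitivity and specificity are high.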

4 | RESULTS

4.1 | Allelic distributions of all HLA‐DR, ‐DQ, and ‐DP

These HLA genes are highly polymorphic with many alleles (see online for updated information http://www.ncbi.nlm.nih.gov/projects/gv/mhc). Even within a relatively homogenous Swedish population, polymorphisms of these genes remain high. The allelic frequencies of all HLA genes (HLA‐DRB1, ‐DRB345, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1) by disease status (patients on the left and controls on the right in each panel) are depicted in Figure 1. The plotting scale for each individual allele is 50%, represented by the length of the dashed line. Alleles are sorted based upon observed allelic frequencies among controls in the training set. Clearly, DRB1 is the most polymorphic, with 44 distinct alleles observed in this population. At the other extreme, DPA1 is the least polymorphic, with 12 distinct alleles. Further, it is dominated by the major DPA1*01:03:01 allele, which has a frequency of more than 50%. Comparing allelic frequencies between patients and controls, one notes that several alleles have much greater allelic frequencies among controls than among patients, with a few extreme alleles with exceptionally high frequencies among controls but not among the patients, eg, DRB1*15:01:01, DRB5*01:01:01, and DQB1*06:02:01. Collectively, these alleles are negatively associated with T1D and are therefore considered protective. Conversely, some alleles have higher frequencies among patients than among controls and are therefore considered risk alleles, such as DRB1*04:01:01, DQA1*05:01:01, and DQB1*03:02:01. In addition, it is interesting to note that controls appear to be more diverse, with many more uncommon alleles than patients, for all genes.

FIGURE 1 Allelic frequencies of HLA‐DRB1, ‐DRB345, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1 among cases and controls, stratified over training and validation data sets

It is also important to compare estimated allelic frequencies between training and validation sets. Besides subtle random variations, most of the estimated allelic frequencies are comparable between the training and validation sets, indicating that there is no obvious bias in generating the training and validation data sets.

4.2 | Genotypic distributions of all HLA‐DR, ‐DQ,and ‐DP

Pairing polymorphic alleles at each locus creates even more diversity at

the genotype level, as expected. The data in Figure 2 uses an image

representation to present HLA‐DRB1 genotypic frequencies that are

proportional to intensity values. Two rows of image triangles corre-

spond to controls and patients, respectively, and 2 columns for training

compared with validation sets, respectively. Inspecting patterns of

genotypic frequencies among controls in the training set reveals the

anticipated “genotypic polymorphisms” induced by “random pairing”

of 2 alleles. Visually, the sporadic distribution of genotype frequencies

is consistent with Hardy‐Weinberg equilibrium, and the test statistic of

all genotypes is indeed unable to reject the equilibrium hypothesis

(not shown). However, when examining genotype‐specific deviations, there is a varying degree of disequilibrium, suggesting that natural selection may be occurring at the genotype level (not shown).

FIGURE 2 Genotypic frequencies of HLA‐DRB1 among cases and controls, stratified over training and validation data sets, proportional to colour intensity values

When comparing patients (panel A) and controls (panel B) in Figure 2, it is striking that controls exhibit greater diversity in genotype frequencies than patients do. Visually, one would conjecture that certain genotypes occur more frequently among controls than among patients, indicating possible genotype‐specific disease associations. However, the sparseness, and hence complexity, of the data limits the usefulness of genotype‐specific statistical tests, owing to small sample sizes per genotype and the multiple testing problem.

Contrasting genotype frequency patterns between training and validation sets (left and right columns), we note that corresponding patterns are largely symmetric, suggesting that the training and validation sets have comparable genotypic distributions in addition to comparable allelic distributions.

The above observations on HLA‐DRB1 hold true for the remaining 5 HLA genes. In summary, the exceptional polymorphism of these HLA genes, while underscoring their functional importance, has presented a substantial challenge for the scientific community to synthesize all evidence and to translate their disease associations from bench to bedside. Overcoming this challenge was the impetus to use OOR.
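The Hardy‐Weinberg check mentioned in this section can be illustrated with a small sketch. It is not the authors' implementation: the function name `hwe_chi_square` and the toy genotypes are hypothetical, and a plain Pearson chi‐square statistic is assumed. With as many rare genotypes as are seen for HLA‐DRB1, the chi‐square approximation would be poor, and an exact or permutation test would likely be preferred in practice.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Return (chi-square statistic, degrees of freedom) for a list of
    unordered genotypes, each given as a pair of allele names."""
    n = len(genotypes)
    # Observed genotype counts, ignoring the order of the two alleles.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    # Allele frequencies estimated from the 2n sampled chromosomes.
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freq)

    stat = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # Random pairing: p^2 for homozygotes, 2pq for heterozygotes.
        expected = n * (freq[a] ** 2 if a == b else 2 * freq[a] * freq[b])
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(alleles)
    # Genotype classes minus 1, minus (k - 1) estimated allele frequencies.
    dof = k * (k + 1) // 2 - 1 - (k - 1)
    return stat, dof


# Toy sample in exact equilibrium: 25 AA, 50 AB, 25 BB.
balanced = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
stat_balanced, dof_balanced = hwe_chi_square(balanced)

# Toy sample with heterozygotes only, a strong departure from equilibrium.
stat_skewed, _ = hwe_chi_square([("A", "B")] * 100)
```

A statistic near zero, as for the balanced toy sample, is what "unable to reject the equilibrium hypothesis" corresponds to; the heterozygote‐only sample gives a large statistic.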

4.3 | Gene‐specific association analysis

The initial association analysis explores T1D association with 1 exemplar at a time, gene by gene. Specifically, to analyse T1D

association with HLA‐DRB1, we use 155 unique genotypes as exem-

plars and compute the similarity vector of every individual with these

exemplars, leading to a matrix of similarity measurements. Through

OOR, we performed a univariate regression analysis with 1 exemplar

at a time, resulting in estimated coefficients, standard errors, Z‐scores,

and P‐values. Z‐scores for individual exemplars that are specific to

each genotype (paired alleles are assigned to rows and columns) are

shown in Figure 3. Z‐scores are truncated to integers and are shown in each cell only if their absolute value exceeds 2, and each cell is colour‐coded red (protective association) or green (risk association). For HLA‐DRB1 (Figure 3A), individuals who are similar to HLA‐DRB1*03:01:01/* or HLA‐DRB1*04:01:01 are at high risk for T1D, where “/*” denotes any other allele that is dominated by the first allele. On the other hand, individuals who are similar to HLA‐DRB1*07:01:01/*, HLA‐DRB1*11:01:01/*, or HLA‐DRB1*15:01:01/* have a reduced risk of, or are protected from, T1D.

With respect to HLA‐DRB3, ‐DRB4, and ‐DRB5, individuals similar to exemplars with HLA‐DRB3*01:01:02/* or ‐DRB4*01:03:01/* appear to be at an increased risk, whereas those similar to HLA‐DRB3*02:02:01/*, ‐DRB3*03:01:01/*, ‐DRB4*01:01:01/*, and ‐DRB5*01:01/* are protected from T1D.

HLA‐DQA1 was considered next. Individuals are at high risk for T1D if they are similar to exemplars with HLA‐DQA1*03:01:01/* or HLA‐DQA1*05:01:01. On the other hand, individuals similar to exemplars of HLA‐DQA1*01:01:01/*, HLA‐DQA1*01:02:01/*, HLA‐DQA1*01:03:01, HLA‐DQA1*02:01, and HLA‐DQA1*05:05:01 are at reduced risk for T1D.

By contrast, HLA‐DQB1 seems to have 2 major risk exemplars (*02:01:01/* and *03:02:01). Protective exemplars include HLA‐DQB1*02:02:01/*, HLA‐DQB1*03:01:01/*, HLA‐DQB1*03:03:02/*, HLA‐DQB1*04:02:01/*, HLA‐DQB1*05:01:01/*, HLA‐DQB1*05:02:01/*, and HLA‐DQB1*06:02:01/*.

FIGURE 3 Estimated Z‐scores from OOR association analysis of T1D with exemplar‐specific similarity measures, gene‐by‐gene (HLA‐DRB1, ‐DRB345, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1) in the training data set. The integers of Z‐scores are shown, with green for elevated risks, red for reduced risks, and black for no association. Each entry corresponds to a genotype, with corresponding alleles shown on each column

Finally, with respect to the 2 DP genes, HLA‐DPA1 has relatively weaker associations with T1D, probably involving HLA‐DPA1*01:03:01/* and HLA‐DPA1*02:01:02/*. It is of interest to note that individuals similar to HLA‐DPA1*02:01:01/*02:02:02 appear to be at reduced risk for T1D, suggesting a somewhat weak allele‐allele interaction. For HLA‐DPB1, the exemplar HLA‐DPB1*01:01:02/* appears to convey risk, while HLA‐DPB1*04:02:01/* represents a protective exemplar.
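The one‐exemplar‐at‐a‐time analysis in this section can be sketched as a univariate logistic regression of case status on a single similarity measure. This is an illustration only, not the OOR software: the function names, the toy similarity values, and the Newton‐Raphson fitting routine are assumptions, and the similarity measure itself (defined in the Methods) is taken as given.

```python
import math


def univariate_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson.

    Returns (b1, standard error of b1, Z-score of b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for the intercept
            g1 += (yi - p) * xi   # score for the slope
            h00 += w              # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score (2 x 2 case).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Slope variance is the lower-right entry of the inverse information,
    # taken from the last iteration (stable once the fit has converged).
    se = math.sqrt(h00 / det)
    return b1, se, b1 / se


def display_cell(z, cutoff=2.0):
    """Figure 3 style label: integer part of Z, blank when |Z| <= cutoff."""
    return str(int(z)) if abs(z) > cutoff else ""


# Hypothetical similarities of 10 subjects to one exemplar, with case status.
similarity = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
status = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
coef, se, z = univariate_logistic(similarity, status)
```

Repeating the fit for every exemplar of a gene would give the grid of Z‐scores summarized in Figure 3; `display_cell` mirrors the truncation rule described in the text, under the assumption that the cut‐off applies to the absolute value.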

4.4 | A prediction model with HLA‐DR, ‐DQ, and ‐DP

Results from the univariate analyses described earlier suggest that all HLA genes meaningfully associate with T1D. As expected, the association analysis provides insight into marginal associations of 1 exemplar at a time. Observed marginal associations could be biological or mediated by linkage disequilibrium between genes. Additionally, gene‐gene interactions may also contribute to the overall associations.37 Here, our goal is to build a prediction model, using OOR. When measuring

the overall similarity between subjects and exemplars, we assign equal

weights to all genes. After creating a similarity matrix, OOR filters out

those exemplars that are highly correlated with each other (if pairwise

correlation exceeds 0.95), and filters out those exemplars that do not

meet the marginal significance criteria at 5%. After applying the

variable selection procedure, OOR selects 26 exemplars into the

prediction model (Table 1). Table 1 lists the specific genotypes of these HLA genes for each exemplar, together with the estimated coefficients used in the prediction model. Among all exemplars, 14 have positive coefficients, meaning that similarity to these exemplars increases the individual's risk of acquiring T1D. By contrast, the remaining 12 exemplars have negative coefficients, and hence similarity to these exemplars reduces the T1D risk.
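The redundancy filter described above (dropping exemplars whose similarity columns correlate above 0.95) can be sketched as follows. The greedy keep‐the‐first rule, the function names, and the toy columns are assumptions made for illustration; the text does not state how OOR decides which member of a highly correlated pair to retain.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def drop_redundant(columns, cutoff=0.95):
    """Greedy filter: keep a column only if its correlation with every
    column kept so far does not exceed the cutoff. Returns kept indices."""
    kept = []
    for j, col in enumerate(columns):
        if all(pearson(col, columns[i]) <= cutoff for i in kept):
            kept.append(j)
    return kept


# Hypothetical similarity columns (one per exemplar, one entry per subject).
columns = [
    [0.1, 0.5, 0.9, 0.3],
    [0.2, 1.0, 1.8, 0.6],  # exactly twice the first column
    [0.9, 0.1, 0.4, 0.8],
]
kept = drop_redundant(columns)
```

The second column is a rescaled copy of the first and is dropped; the marginal significance filter at 5% would then be applied to the remaining exemplars.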

To gain insight into these 26 exemplars, we performed a cluster

analysis on the similarity matrix, grouping subjects with correlated sim-

ilarity measurements of exemplars in the training set to facilitate visual

interpretation via a heatmap (Figure 4). Clusters of subjects are hierar-

chically organized via a dendrogram placed on the top of the heatmap.

The rows are sorted by odds ratios (OR) that quantify association of

T1D with exemplar‐specific similarity measures, from risk (>1) to

protective (<1) associations. The colour map in the upper left corner of

Figure 4 shows the magnitudes of similarity, characterized by white,

blue, and red colour for low, medium, and high similarity. On the right

side, each row is labelled with exemplar ID, exemplar‐specific OR, and the associated genotype profile; rows are shown in the same order as Table 1. The genotype profile consists of genotypes of DRB1,

DRB345, DQA1, DQB1, DPA1, and DPB1. The coloured bar between

dendrogram and heatmap, across all samples, labels the disease status

as patients (green) and controls (red). Inspecting the hierarchical tree

suggests that 705 subjects appear to form distinct clusters. For example, the cluster of subjects on the far right side, labelled by a circled 1, appears to have comparable similarity measurements and tends to share high similarity with exemplars 1, 4, and 14. Interestingly, nearly all subjects in

this cluster appear to be patients. On the other hand, the cluster

labelled by a circled 2 suggests that these subjects appear to have

modest similarities to multiple high‐risk exemplars, and a large

proportion of them are patients. The third cluster includes a group of

subjects who tend to have relatively high similarities with those

exemplars with protective genotype profiles and includes more

controls than patients.

From the perspective of exemplars, one can gain insights into

which subjects are highly similar to corresponding exemplars. For example, consider exemplar 1, with an OR of nearly 18: reading across all subjects suggests that those with similarities greater than 0.5 and approaching 1 tend to be patients (marked as green by the crossbar). On the other hand, exemplars 21‐24 have protective associations, and associated subjects with relatively high similarities tend to be controls. Noticeably, for exemplar 18, most subjects tend to have relatively low similarity.
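The odds ratios quoted here follow from exponentiating the log odds ratios in Table 1. The short check below uses four coefficients copied from that table; the variable names are illustrative only.

```python
import math

# Log odds ratios copied from Table 1 for exemplars 1, 14, 15, and 26.
log_or = {1: 2.89, 14: 0.20, 15: -0.08, 26: -4.08}

# OR = exp(log OR); values above 1 indicate risk, below 1 protection.
odds_ratio = {k: math.exp(b) for k, b in log_or.items()}
for k in sorted(odds_ratio):
    direction = "risk" if odds_ratio[k] > 1 else "protective"
    print(f"exemplar {k}: OR = {odds_ratio[k]:.2f} ({direction})")
```

For exemplar 1, exp(2.89) is approximately 17.99, which matches the "nearly 18" quoted in the text.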

To gain further insights into clusters of both patients and

controls, we compute pairwise distances, approximated by truncated

correlation coefficients (0.60 or higher) of similarity measures

between subjects. Then, using the force‐directed placement

algorithm50 implemented in the igraph package in R, we display a

“clustering network” of all subjects (Figure 5). While actual shapes

of this “clustering network” are somewhat arbitrary and simply a

representation with “minimum crossing by edges (lines connecting

subjects)”, this visual representation provides an intuitive organization

of clustered subjects, with meaningful interpretations. First, patients appear to have a greater tendency to cluster together than controls do. Second, there are at least 3 clusters of patients, labelled as the DR3+ cluster, DR3/4 cluster, and DR4+ cluster, because subjects in these clusters tend to carry a DR3 allele with another allele, both DR3 and DR4 alleles, or a DR4 allele with another allele, respectively. Interestingly, other patients who carry different HLA‐DR genotypes tend to cluster sparsely with controls. Third, controls tend to have many different genotypes and hence tend to be more diverse, which is consistent with the observation of more diverse genotypes in controls than in patients. For those who are interested in exploring the “clustering network” in depth, we have included a high‐resolution version of Figure 5 in the supplementary material (Figure 5S).
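The construction of the “clustering network” can be sketched in two steps: connect two subjects when the correlation of their similarity profiles is at least 0.60, then lay out the resulting graph. The sketch below covers only the first step and, as a crude stand‐in for the force‐directed layout done with igraph in R, reports connected components. The function names and toy profiles are hypothetical.

```python
import math
from collections import deque


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def correlation_edges(profiles, cutoff=0.60):
    """Edges (i, j) between subjects whose profiles correlate >= cutoff."""
    n = len(profiles)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if pearson(profiles[i], profiles[j]) >= cutoff]


def connected_components(n, edges):
    """Group node indices 0..n-1 into connected components (breadth-first)."""
    neighbours = {i: [] for i in range(n)}
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(members))
    return components


# Hypothetical similarity profiles of 4 subjects against 4 exemplars.
profiles = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
]
edges = correlation_edges(profiles)
groups = connected_components(len(profiles), edges)
```

Connected components are a much coarser summary than a force‐directed layout, but they show how the 0.60 truncation alone already separates subjects with dissimilar profiles.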

4.5 | ROC analysis in training and validation sets

Utilizing estimated weights, we compute a risk score by Equation 1 above. Grouping subjects by disease status, we compare average risk scores among patients with those among controls and find that their means are significantly different (P‐value = 4.32 × 10⁻⁹²). By ROC analysis, we compute the diagnostic sensitivity and specificity for a series of threshold values and plot the corresponding ROC curve, resulting in a coloured curve in Figure 6. The corresponding AUC is estimated at 0.92. The risk scores, denoted on the right axis, are colour‐coded and range from approximately −5.4 to 3.9.
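As a rough illustration of how such a risk score could be computed, the sketch below assumes that the score is a weighted sum of a subject's similarities to the selected exemplars, with the Table 1 log odds ratios as weights and no intercept. The exact form is given by Equation 1, which is not reproduced in this section, so this form, the function name, and the subject's similarity values are all assumptions.

```python
def risk_score(similarities, log_odds_ratios):
    """Weighted sum of exemplar similarities (assumed form, no intercept)."""
    if len(similarities) != len(log_odds_ratios):
        raise ValueError("one similarity is needed per exemplar")
    return sum(s * b for s, b in zip(similarities, log_odds_ratios))


# Coefficients of exemplars 1, 2, 25, and 26 from Table 1.
weights = [2.89, 2.47, -3.68, -4.08]
# Hypothetical subject: close to risk exemplar 1, far from protective ones.
subject = [0.9, 0.4, 0.1, 0.0]
score = risk_score(subject, weights)
```

High similarity to exemplars with positive coefficients pushes the score up, and high similarity to protective exemplars pulls it down, which is the behaviour described in the text.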

Given the selected exemplars and associated weights, we evaluate risk scores by Equation 1 in the validation set. Again, by case‐control status, we compute mean risk scores in patients and controls, respectively, and the mean difference remains highly significant in the same direction (P‐value = 8.99 × 10⁻⁷²). In Figure 6, we show the ROC curve (solid dark line) for the validation data. The estimated AUC is around 0.89, and this comparable AUC value supports an empirical validation of the risk score calculation.
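The ROC quantities used in this section can be computed directly from risk scores. The sketch below is a minimal illustration with made‐up scores: `auc` uses the rank interpretation of the area under the ROC curve, and `sensitivity_specificity` evaluates one threshold. Both function names are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores above a random control
    (ties count one half); equals the area under the ROC curve."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, threshold):
    """Classify a score above the threshold as test-positive."""
    sens = sum(s > threshold for s in case_scores) / len(case_scores)
    spec = sum(s <= threshold for s in control_scores) / len(control_scores)
    return sens, spec


# Made-up risk scores for 3 patients and 3 controls.
cases = [3.0, 2.0, 1.0]
controls = [2.0, 0.0, -1.0]
area = auc(cases, controls)
sens, spec = sensitivity_specificity(cases, controls, threshold=1.5)
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity traces the ROC curve; validating the model amounts to computing the same quantities on scores from the validation set with the weights held fixed.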


TABLE 1 Estimated log odds ratios and their odds ratios for those exemplars characterized by all 6 HLA genes (HLA‐DRB1, ‐DRB345, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1) and also selected by object‐oriented regression

Ex | Subject | DRB1 | DRB345 | DQA1 | DQB1 | DPA1 | DPB1 | LogOR
1 | D459 | *04:01:01/*04:01:01 | DRB4*01:03:01/DRB4*01:03:01 | *03:01:01/*03:01:01 | *03:02:01/*03:02:01 | *01:03:01/*01:03:01 | *04:01:01/*04:01:01 | 2.89
2 | D1342 | *03:01:01/*03:01:01 | DRB3*01:01:02/DRB3*01:01:02 | *05:01:01/*05:01:01 | *02:01:01/*02:01:08 | *01:03:01/*01:03:01 | *02:01:02/*03:01:01 | 2.47
3 | D1868 | *07:01:01/*04:01:01 | DRB4*01:01:01/DRB4*01:03:01 | *02:01/*03:02 | *02:02:01/*03:02:01 | *01:03:01/*02:01:01 | *02:01:02/*03:01:01 | 2.41
4 | D1819 | *04:01:01/*04:05:01 | DRB4*01:03:01/DRB4*01:03:01 | *03:01:01/*03:02 | *03:02:01/*03:02:01 | *01:03:01/*02:01:01 | *04:01:01/*13:01:01 | 1.99
5 | D1209 | *04:05:01/*01:01:01 | DRB4*01:01:01 | *03:02/*01:01:01 | *03:02:01/*05:01:01 | *01:03:01/*02:01:01 | *02:01:02/*09:01:01 | 1.96
6 | D1344 | *03:01:01/*03:01:01 | DRB3*02:02:01/DRB3*02:02:01 | *05:01:01/*05:01:01 | *02:01:01/*02:01:01 | *01:03:01/*01:03:01 | *02:01:02/*03:01:01 | 1.90
7 | D1191 | *04:01:01/*13:02:01 | DRB3*03:01:01/DRB4*01:03:01 | *03:01:01/*01:02:01 | *03:02:01/*06:04:01 | *01:03:01/*02:01:02 | *01:01:01/*04:01:01 | 1.67
8 | D405 | *03:01:01/*03:01:01 | DRB3*02:02:01/DRB3*02:02:01 | *05:01:01/*05:01:01 | *02:01:01/*02:01:01 | *01:03:01/*01:03:01 | *04:01:01/*30:01 | 1.30
9 | D704 | *04:01:01/*09:01:02 | DRB4*01:03:01/DRB4*01:03:01 | *03:01:01/*03:02 | *03:02:01/*03:03:02 | *01:03:01/*02:02:02 | *04:01:01/*05:01:01 | 1.19
10 | D624 | *03:01:01/*03:01:01 | DRB3*01:01:02/DRB3*01:01:02 | *05:01:01/*05:01:01 | *02:01:01/*02:01:01 | *01:03:01/*02:01:04 | *02:01:02/*13:01:01 | 0.84
11 | D1214 | *04:01:01/*13:02:01 | DRB3*03:01:01/DRB4*01:03:01 | *03:01:01/*01:02:01 | *03:02:01/*06:09 | *01:03:01/*01:03:01 | *02:01:02/*03:01:01 | 0.65
12 | D1499 | *04:01:01/*01:01:01 | DRB4*01:03:01 | *03:02/*01:01:01 | *03:02:01/*05:01:01 | *01:03:01/*02:02:01 | *04:02:01/*19:01 | 0.45
13 | D1034 | *04:01:01/*04:04:01 |  | *03:01:01/*03:01:01 | *03:02:01/*03:02:01 | *01:03:01/*01:03:01 | *02:01:02/*04:01:01 | 0.21
14 | D2102 | *04:05:01/*09:01:02 | DRB4*01:03:01/DRB4*01:03:01 | *03:02/*03:02 | *03:02:01/*03:03:02 | *01:03:01/*02:01:01 | *04:01:01/*13:01:01 | 0.20
15 | N005938 | *11:04:01/*07:01:01 | DRB3*02:02:01/DRB4*01:03:01 | *05:05:01/*02:01 | *03:01:01/*03:03:02 | *01:03:01/*01:03:01 | *04:01:01/*04:02:01 | −0.08
16 | N001991 | *15:01:01/*13:01:01 | DRB3*02:02:01/DRB5*01:01:01 | *01:02:01/*01:03:01 | *06:02:01/*06:03:01 | *01:03:01/*02:02:01 | *02:01:02/*19:01:01 | −0.15
17 | N002842 | *03:01:01/*16:01:01 | DRB3*02:02:01/DRB5*02:02 | *05:01:01/*01:02:02 | *02:01:01/*05:02:01 | *01:03:01/*02:01:01 | *04:02:01/*14:01 | −0.40
18 | N005872 | *07:01:01/*07:01:01 | DRB4*01:01:01/DRB4*01:01:01 | *02:01/*02:01 | *02:02:01/*02:02:01 | *02:01:01/*02:01:01 | *10:01/*11:01:01 | −0.62
19 | N003698 | *12:01:01/*15:01:01 | DRB3*02:02:01/DRB5*01:01:01 | *05:05:01/*01:02:01 | *03:01:01/*06:02:01 | *01:03:01/*02:02:01 | *04:02:01/*19:01:01 | −1.21
20 | N001707 | *04:04:01/*14:54:01 | DRB3*02:02:01/DRB4*01:03:01 | *03:01:01/*01:01:01 | *03:02:01/*05:03:01 | *01:03:01/*01:04 | *04:01:01/*15:01 | −1.31
21 | N005182 | *07:01:01/*15:01:01 | DRB4*01:03:01/DRB5*01:01:01 | *02:01/*01:02:01 | *03:03:02/*06:02:01 | *01:03:01/*02:01:01 | *04:01:01/*13:01:01 | −1.38
22 | N002460 | *04:07:01/*13:01:01 | DRB3*01:01:02/DRB4*01:03:01 | *03:02/*01:03:01 | *03:01:01/*06:03:01 | *01:03:01/*01:03:01 | *04:01:01/*04:02:01 | −2.03
23 | N002709 | *07:01:01/*15:01:01 | DRB4*01:03:01/DRB5*01:01:01 | *02:01/*01:02:01 | *02:02:01/*06:02:01 | *01:03:01/*01:03:01 | *04:01:01/*04:02:01 | −2.42
24 | N004319 | *04:07:01/*07:01:01 | DRB4*01:03:01/DRB4*01:03:01 | *03:01:01/*02:01 | *03:01:01/*03:03:02 | *01:03:01/*01:03:01 | *02:01:02/*04:01:01 | −2.43
25 | N004385 | *14:02/*15:01:01 | DRB3*01:01:02/DRB5*01:01:01 | *05:03/*01:02:01 | *03:01:01/*06:02:01 | *01:03:01/*01:03:01 | *03:01:01/*06:01 | −3.68
26 | N000982 | *14:54:01/*15:02:01 | DRB3*02:02:01/DRB5*01:02 | *01:01:01/*01:03:01 | *05:03:01/*06:01:01 | *01:03:01/*02:01:01 | *02:01:02/*14:01 | −4.08


FIGURE 4 Clustered samples and exemplars by similarity measures in the training set, with subjects in columns and with 26 exemplars in rows. Similarity values of every subject with each exemplar, ranging from 0 to 1, are colour‐coded by the colour map. The colour bar across columns (subjects) indicates cases (green) and controls (red). Row labels on the right side include estimated odds ratio (OR) for genotype profiles (in the order of DRB1, DRB345, DQA1, DQB1, DPA1, DPB1) of all 26 exemplars. Three numbered labels, on the top of the hierarchical tree, indicate 3 clusters of subjects: Cluster 1 includes a group of subjects, mostly patients, that have exceptionally high similarity to exemplars 1, 4, and 14; subjects in cluster 2 tend to have relatively high similarities to exemplars 1‐15 and 17; and subjects in cluster 3 tend to be normal subjects with high similarity to exemplars 21‐24


4.6 | Biological validation of risk scores with islet autoantibodies

Beyond the empirical validation above, we seek a biological validation by correlating risk scores with islet autoantibody measurements. The biological validation uses all patients from both training and validation sets, because controls generally have no islet autoantibodies and no measurements were made on them in this study. We regress one autoantibody level at a time on the risk score. The estimated regression coefficients, standard errors, Z‐scores, and P‐values are listed in Table 2. Evidently, the risk score is significantly associated with IA‐2A (P‐value < 0.001). Interestingly, among T1D patients, the risk score does not appear to be associated with IAA, GADA, ZnT8WA, or ZnT8QA. Somewhat surprisingly, the association of the risk score with ZnT8RA is negative, although it is marginal (P‐value = 0.052).
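The P‐values in Table 2 can be recovered from the reported Z‐scores under a standard normal reference distribution. The sketch below assumes a two‐sided test; the function name is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a Z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z-scores as reported in Table 2.
z_scores = {"IAA": -0.584, "IA2A": 3.675, "ZnT8RA": -1.945}
p_values = {name: two_sided_p(z) for name, z in z_scores.items()}
```

For ZnT8RA this gives a P‐value of about 0.052, and for IA2A a value below 0.001, which is why the latter is printed as 0.000 when rounded to 3 decimals.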

4.7 | Evaluating the prediction model

As noted earlier, early detection and future prevention, by either primary or secondary prevention, is a major impetus for developing a T1D risk prediction model using DNA samples, from newborns in particular. Through the training and validation exercise described earlier, we have shown that the risk score with 26 exemplars has a remarkable AUC of 0.89 in the validation data set. Combining both training and validation data sets, we produce a final prediction model that may be useful to construct a testing rule (Equation 2). The ROC curves for the combined (black solid line), training (coloured line), and validation (red line) data sets are shown in Figure 6. From the ROC curve for the combined data, sensitivity and 1‐specificity are estimated at around θ = 0.80 and 0.17 (or specificity λ = 0.83), respectively. A birth cohort of 115 000 newborns (an approximation to the 2014 birth cohort in Sweden, http://www.scb.se/) would yield an expected 20 000 or so babies with positive test results in the 3 incidence scenarios π = 0.5%, 1%, or 2%, by Equation 3 (Table 3). Among all subjects with positive test results, we subtract actual T1D patients, resulting in between approximately 19 200 and 19 500 subjects who are false positive for T1D, given an estimated 575, 1150, and 2300 T1D patients under the 3 scenarios. By Equation 4, we estimate that this prediction model detects 460, 920, or 1840 T1D patients. Effectively, these detected T1D patients correspond to approximately 80% of all T1D patients. Indeed, these children, provided that effective prevention strategies are available, would benefit from knowing their T1D risks from birth.
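The screening arithmetic in this section can be reproduced with a few lines. Equations 3 and 4 are not reproduced in this section, so the sketch assumes the usual law of total probability expansion, Pr(Z > c) = θπ + (1 − λ)(1 − π), and counts detected patients as N × π × θ. The function name and dictionary keys are hypothetical.

```python
def screening_summary(n_births, incidence, sensitivity, specificity):
    """Expected screening counts for one lifetime-incidence scenario."""
    # Positives are true positives plus false positives.
    positives = n_births * (sensitivity * incidence
                            + (1.0 - specificity) * (1.0 - incidence))
    patients = n_births * incidence
    detected = patients * sensitivity
    return {
        "positives": positives,
        "false_positives": positives - detected,
        "patients": patients,
        "detected": detected,
        "coverage": detected / patients,
    }


# The 3 scenarios of Table 3: N = 115 000, sensitivity 0.80, specificity 0.83.
scenarios = {pi: screening_summary(115000, pi, 0.80, 0.83)
             for pi in (0.005, 0.01, 0.02)}
```

With a lifetime incidence of 0.005 this gives about 19 912 positive tests, of which about 460 are future patients, in agreement with the first column of Table 3. Coverage equals the sensitivity by construction, which is why it is 0.8 in every scenario.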

FIGURE 5 Organized “clustering network” of all cases (green) and controls (red), based on pairwise distances of similarity measures between all subjects. The visual display is created by the force‐directed placement algorithm implemented in the igraph package of R. All edges, connecting subjects (nodes), indicate that the corresponding correlation coefficient is at least 0.6. The blue text, mostly not visible, corresponds to low‐resolution alleles of HLA‐DRB1. Three distinct clusters of cases are indicated as DR4+, DR3/4, and DR3+ clusters that tend to include subjects carrying DR4 with another allele, both DR3 and DR4, or DR3 with another allele, respectively. The specific DR genotype is indicated by a blue label and can be explored in an enlarged figure (see Figure 5S)

5 | DISCUSSION

The major conclusion from the current study is that the prediction model for T1D risk based on HLA‐DR, ‐DQ, and ‐DP genes is able to differentiate high‐risk subjects from low‐risk subjects. Through an independent validation data set, we have shown that the prediction model has the desired diagnostic sensitivity and specificity, with an AUC of around 0.90. Meanwhile, the biological validation of this prediction model suggests that risk scores for being diagnosed with T1D are positively and significantly associated with IA‐2A levels. This observation supports previous reports that IA‐2A is a strong risk factor for a subsequent clinical diagnosis of T1D51,52 and can be used to select subjects at high risk for rapid progression to clinical diagnosis.53 It is noted that IA‐2A does not act alone, as nearly 80% of newly diagnosed T1D patients have 2 or more islet autoantibodies at the time of clinical diagnosis.54 Positive validations, empirically and biologically, provide

strong support for the validity of the current prediction model. Such

a prediction model is probably useful for identifying high‐risk subjects

to be recruited to both primary18 and secondary prevention clinical

trials once one or several islet autoantibodies have developed.55 After

effective preventive strategies are discovered, such a prediction model may be applicable to precision screening in a high‐risk population, e.g., newborns in Sweden. For example, we consider a test with the threshold value of 2.03, with corresponding sensitivity of 0.80 and specificity of 0.83. Given the population demography of Sweden with ~115 000 newborns in 2014 and assuming a lifetime risk of 0.5%, 1%, or 2%, this effort yields approximately 20 000 babies with a positive test result (Table 3). Among this birth cohort, we expect that 575, 1150, or 2300, respectively, could develop T1D and would benefit from this effort if there were effective preventive strategies. Of these, the test would be expected to flag 460, 920, or 1840 T1D patients, respectively, during their lifetime. We expect this effort would cover 80% of all T1D patients in this Swedish birth cohort.

Besides the application to screening newborns (if there were

effective prevention strategies), a DNA‐based prediction model may

have several other important applications. Among them, such a predic-

tion model should prove useful for T1D prevention studies to recruit

high‐risk subjects by testing newborns as in the ongoing Type 1

Diabetes Prediction and Prevention (DIPP) project in Finland56 and

the multicenter TEDDY study.1,22 This prediction model may also be

useful for counselling families at high risk, such as first‐degree relatives

of T1D patients. Because of a much elevated baseline risk, the prediction model has an improved probability of detecting those T1D patients before clinical symptoms. At the same time, it could be comforting for some relatives of T1D patients to learn that their T1D risks are not much higher than those in the general population.

Here, we consider one possible scenario for designing a test, by choosing comparable diagnostic sensitivity and specificity. Alternatively, by reducing the threshold value, one may increase sensitivity and reduce specificity, netting more subjects with positive test results and hence increasing coverage of all T1D patients. For those in the high‐risk population, one may implement longitudinal tests, for example, using islet autoantibodies to monitor the progressive loss of beta cells prior to the onset of T1D.35 Recently, Pepe and Janes (2013) showed that practitioners should choose the threshold by balancing costs and benefits with or without knowing these risk

scores.57 The cost refers to expenses associated with false positive prediction errors that lead to unnecessary monitoring and preventive treatment. In addition, there are psychological costs, with increased worry in the patient's guardians and in individuals caused by a false positive prediction. In T1D, the appearance of two or more islet autoantibodies predicts T1D during 15 years of follow‐up.58 Subjects found to have high‐risk HLA might therefore be screened for islet autoantibodies, as the benefits of a true positive prediction are a diagnosis of T1D without ketoacidosis and symptoms of diabetes, and initially more stable disease.11,59 Recognizing that judgments of costs and benefits are population specific, it is of interest that 2–5 year olds in Germany are now screened for islet autoantibodies in the Fr1da study, with the primary aim of preventing ketoacidosis at the onset of T1D.60,61

FIGURE 6 ROC analysis on computed risk scores in the training data (red dotted line), in the validation data set (red solid line), and in the combined data set (black solid line). Estimated AUCs in training, validation, and combined data sets are 0.917, 0.886, and 0.903, respectively. A tentative selection of the threshold value around 2.02, for the combined data set, yields sensitivity and specificity of 0.80 and 0.83, respectively

TABLE 2 Functional validation of estimated risk scores via their associations with 6 islet autoantibody measurements among cases

Autoantibody | Coef | SE | Z‐score | P‐value
IAA | −0.031 | 0.053 | −0.584 | 0.559
GADA | −0.029 | 0.051 | −0.575 | 0.565
IA2A | 0.217 | 0.059 | 3.675 | <0.001
ZnT8RA | −0.099 | 0.051 | −1.945 | 0.052
ZnT8WA | 0.027 | 0.050 | 0.552 | 0.581
ZnT8QA | −0.060 | 0.052 | −1.154 | 0.249
Overall | −0.002 | 0.014 | −0.176 | 0.860

TABLE 3 With the 2014 birth cohort with approximately 115 000 subjects, we estimate total numbers of subjects with positive test result (line 1), assuming the lifetime incidence rate at 0.005, 0.01, or 0.02. Also estimated are numbers of normal subjects with false positive test result (line 2), estimated numbers of T1D patients for the birth cohort, numbers of T1D patients with positive result, and the coverage of all T1D patients by positive screening results

Assumed lifetime incidence | π = 0.005 | π = 0.01 | π = 0.02
Number of newborns in Sweden (year 2014), N | 115 000 | 115 000 | 115 000
Number of subjects with positive screening test, N+ = N × Pr(Z > c) (Equation 3) | 19 912 | 20 275 | 20 999
Estimated number of normal subjects with positive screening test (= N+ − E+) | 19 452 | 19 355 | 19 159
Number of T1D patients, D+ = N × π | 575 | 1150 | 2300
Expected number of T1D patients with positive screening test, E+ = N+ × Pr(D = 1|Z > c) (Equation 4) | 460 | 920 | 1840
Coverage of T1D patients by positive screening test result (= E+/D+) | 0.8 | 0.8 | 0.8

A major justification for adopting an early detection protocol is the

eventual availability of a prevention strategy. These strategies need to

be developed through clinical trials such as the ongoing oral insulin trial

by the TrialNet consortium.62 Currently, all T1D patients receive

lifelong treatment with insulin from the day of clinical diagnosis. There

are no treatment options other than insulin. The question is often asked why screening newborns would have any value, as there is no treatment available that would prevent clinical onset. In favour of screening, there are at least 3 arguments. First,

newborn screening followed by monitoring high‐risk children for islet

autoantibodies may reduce and even prevent hospitalization9 and

ketoacidosis10,11 at the time of clinical diagnosis of T1D. Second,

newborn screening increases the possibilities to separately uncover

the aetiology and pathogenesis of the disease.7,8,63 Third, newborn

screening was initiated to include only children born to mothers or

fathers with T1D64,65 or in the general population using HLA typing

to select children at risk66,67 as well as screening and following all

newborns in high‐risk populations.68 The TEDDY study used a defined set of HLA‐DQ genotypes to identify children at risk.22 With these HLA genotypes, the eligibility rate was 4.8% on average but varied between participating sites, being lowest in Georgia/Florida (3.5%) and highest in Sweden (7.4%). More

importantly, although the study has proven useful to detect the early

appearance of a first islet cell autoantibody and hence may be impor-

tant to uncover the aetiology of the disease initiation, it would only

identify less than 50% of the children expected to develop diabetes

in the entire screened cohort. If 12‐15% of the high‐risk Swedish newborns had been included in the newborn screening, it could be estimated that this screening effort would include approximately 60% of the children who are expected to develop T1D before 18 years of age. It should be noted in this respect that screening

newborns in families already affected by T1D is not productive as only

13% of newly diagnosed T1D children and young adults have a father,

mother, or a sibling with the disease.69,70 The present prediction is that approximately 20% of newborns need to be selected and followed to represent 80% of those expected to develop T1D. It would mean that

80% of the population would not have to be screened for islet autoan-

tibodies, while in the remaining 20% annual islet autoantibody testing

would identify 80% of expected T1D patients. Screening for islet auto-

antibodies was associated with psychological stress71 and the news

that a child is at increased risk for T1D heightened maternal anxiety.72

The initial anxiety was reported to dissipate to normal levels over time, even though subjects with islet autoantibodies initiated lifestyle or health behaviour changes to delay or prevent a clinical onset of T1D.

Aside from practical considerations on when genetic tests should

be used, there are still several research topics to be considered in our

future research. First, our prediction model uses NGTS technology to

obtain diploid sequences for each HLA gene, which yield higher resolution than conventionally typed HLA genotypes. Because of the difference in cost, it is important to know whether the improved resolution leads to improved accuracy of T1D prediction. Second, it is noted that

flanking SNPs can be used to predict T1D risk, and the cost of

genotyping SNPs is much lower than typing HLA genes. For any

large‐scale screening effort, it will be important to identify comple-

mentary features of SNP‐based and NGTS genetic markers, to develop

a cost‐effective precision screening strategy. Third, it is known that

there are approximately 40 loci, other than HLA genes, associated with

T1D.73 Hence, as a general prediction model, it will be of interest to

integrate these 40 loci, together with HLA genes, to test if the predic-

tion model can be further improved.

Although the present proposed screening test would identify 80%

of those with a lifetime risk for T1D and represent 20% of the new-

borns, the HLA typing needs to be complemented with islet autoanti-

body tests. The appearance of a first islet autoantibody seems to

occur in response to an as yet unknown trigger(s) dependent on the HLA type of the child. IAA‐only occurs primarily in HLA‐DR4‐DQ8 children during the first 3 years of life, while GADA‐only is related to DR3‐DQ2 children and appears later.1,7 The latter group was reported earlier to be associated with a more slowly progressive T1D.74 It will therefore be important in the future to use NGTS HLA typing to determine to what extent the model fits the appearance of a first islet autoantibody better than the clinical onset of T1D. It is noted that data from 3 newborn screening programs merged into 1 data set suggest that over

20‐year follow‐up, 100% of children with 2 or more islet autoanti-

bodies were eventually diagnosed with T1D.6
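
As a rough illustration of the screening arithmetic in this paragraph, the following Python sketch computes the expected lifetime T1D risk among newborns flagged by the HLA test and among those not flagged. The function name `screening_yield` and its parameter names are hypothetical. The 20% flagged fraction and the 80% capture of future patients come from the text, and the 0.5% to 2% lifetime risk range is the one quoted in the Abstract. The sketch assumes these figures hold exactly and ignores sampling uncertainty.

```python
def screening_yield(lifetime_risk, flagged_fraction=0.20, sensitivity=0.80):
    """Return (risk among flagged newborns, risk among unflagged newborns).

    lifetime_risk    : population lifetime risk of T1D (assumed input)
    flagged_fraction : share of newborns classified as high risk
    sensitivity      : share of future T1D patients inside the flagged group
    """
    # Bayes' rule: P(T1D | flagged) = P(flagged | T1D) * P(T1D) / P(flagged)
    risk_flagged = sensitivity * lifetime_risk / flagged_fraction
    # The same rule applied to the complementary (unflagged) group
    risk_unflagged = (1 - sensitivity) * lifetime_risk / (1 - flagged_fraction)
    return risk_flagged, risk_unflagged


for p in (0.005, 0.02):
    flagged, unflagged = screening_yield(p)
    print(f"lifetime risk {p:.1%}: flagged {flagged:.1%}, unflagged {unflagged:.3%}")
```

Under these assumptions the risk in the flagged group is 4 times the population risk, which still leaves most flagged children unaffected. This is consistent with the point made above that HLA typing needs to be complemented with islet autoantibody tests.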

Finally, our “clustering network” has indicated that some controls

are clustered with patients, and some patients are unexpectedly clus-

tered with controls. Identification of these “outlier subjects” provides

an impetus for investigating other etiological factors that contribute

to their OOR scores. For example, “outlier patients” may represent

monogenic diabetes (such as MODY), secondary diabetes, or type 2 diabetes.

Taken together, the present case‐control study in Sweden

of newly diagnosed T1D patients (1‐18 years of age) and matched

controls, utilizing high‐resolution genotypes for HLA‐DRB1, ‐DRB3,

‐DRB4, ‐DRB5, ‐DQA1, ‐DQB1, ‐DPA1, and ‐DPB1 obtained by next‐generation

sequencing, made it possible to use the OOR technique to build a prediction

model for T1D. The model, developed in a training set and then tested in a

validation set, had an area under the receiver operating

characteristic (ROC) curve of 0.90. The risk score was strongly

associated with IA‐2A, negatively with ZnT8RA, but not with the other islet

autoantibodies. The model would select approximately 20% of all

newborns in Sweden to identify approximately 80% of those developing

T1D during their lifetime. Within this high‐risk group, combining

HLA typing at birth with islet autoantibody

measurements during follow‐up should help identify subjects who would

be eligible for research to develop effective prevention strategies.
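
For readers less familiar with ROC analysis, the area under the ROC curve can be read as the probability that a randomly chosen patient has a higher risk score than a randomly chosen control. The sketch below computes that quantity by direct pairwise comparison. The function `roc_auc` and the example scores are illustrative assumptions; they are not the OOR implementation and not study data.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals P(case score > control score), with ties counted as one half
    (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, invented for illustration only
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(roc_auc(cases, controls))  # 14 of 16 pairs are ordered correctly -> 0.875
```

An AUC of 0.5 corresponds to a score that does not separate patients from controls, and 1.0 to perfect separation.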

ACKNOWLEDGEMENT

This work is supported in part by the European Foundation for the

Study of Diabetes (EFSD), the Swedish Child Diabetes Foundation

(Barndiabetesfonden), the National Institutes of Health (DK63861,

DK26190), the Swedish Research Council including a Linné grant to

Lund University Diabetes Centre, the Skåne County Council for

Research and Development as well as the Swedish Association of Local

Authorities and Regions (SKL), and also in part by the National Institutes of

Health/National Institute of Diabetes and Digestive and Kidney

Diseases (ITN# 16‐05‐MH) and the institutional developmental fund

from Fred Hutchinson Cancer Research Center (LPZ).


ORCID

Lue Ping Zhao http://orcid.org/0000-0002-1387-7165

REFERENCES

1. Krischer JP, Lynch KF, Schatz DA, et al. The 6 year incidence of diabetes‐associated autoantibodies in genetically at‐risk children: the TEDDY study. Diabetologia. 2015;58(5):980‐987.

2. Atkinson MA, Eisenbarth GS, Michels AW. Type 1 diabetes. Lancet. 2014;383(9911):69‐82.

3. Tuomilehto J. The emerging global epidemic of type 1 diabetes. Curr Diab Rep. 2013;13(6):795‐804.

4. Rawshani A, Landin‐Olsson M, Svensson AM, et al. The incidence of diabetes among 0‐34 year olds in Sweden: new data and better methods. Diabetologia. 2014;57(7):1375‐1381.

5. Maahs DM, West NA, Lawrence JM, Mayer‐Davis EJ. Epidemiology of type 1 diabetes. Endocrinol Metab Clin North Am. 2010;39(3):481‐497.

6. Ziegler AG, Rewers M, Simell O, et al. Seroconversion to multiple islet autoantibodies and risk of progression to diabetes in children. JAMA. 2013;309(23):2473‐2479.

7. Ilonen J, Hammais A, Laine AP, et al. Patterns of beta‐cell autoantibody appearance and genetic associations during the first years of life. Diabetes. 2013;62(10):3636‐3640.

8. Krischer JP, Lynch KF, Schatz DA, et al. The 6 year incidence of diabetes‐associated autoantibodies in genetically at‐risk children: the TEDDY study. Diabetologia. 2015;58(5):980‐987.

9. Barker JM, Goehrig SH, Barriga K, et al. Clinical characteristics of children diagnosed with type 1 diabetes through intensive screening and follow‐up. Diabetes Care. 2004;27(6):1399‐1404.

10. Elding Larsson H, Vehik K, Gesualdo P, et al. Children followed in the TEDDY study are diagnosed with type 1 diabetes at an early stage of disease. Pediatr Diabetes. 2014;15(2):118‐126.

11. Elding Larsson H, Vehik K, Bell R, et al. Reduced prevalence of diabetic ketoacidosis at diagnosis of type 1 diabetes in young children participating in longitudinal follow‐up. Diabetes Care. 2011;34(11):2347‐2352.

12. Lundgren M, Sahlin A, Svensson C, et al. Reduced morbidity at diagnosis and improved glycemic control in children previously enrolled in DiPiS follow‐up. Pediatr Diabetes. 2014;15(7):494‐501.

13. Effects of insulin in relatives of patients with type 1 diabetes mellitus. N Engl J Med. 2002;346(22):1685‐1691.

14. Skyler JS, Krischer JP, Wolfsdorf J, et al. Effects of oral insulin in relatives of patients with type 1 diabetes: the diabetes prevention trial‐type 1. Diabetes Care. 2005;28(5):1068‐1076.

15. Gale EA, Bingley PJ, Emmett CL, Collier T. European Nicotinamide Diabetes Intervention Trial (ENDIT): a randomised controlled trial of intervention before the onset of type 1 diabetes. Lancet. 2004;363(9413):925‐931.

16. Nanto‐Salonen K, Kupila A, Simell S, et al. Nasal insulin to prevent type 1 diabetes in children with HLA genotypes and autoantibodies conferring increased risk of disease: a double‐blind, randomised controlled trial. Lancet. 2008;372(9651):1746‐1755.

17. Knip M, Akerblom HK, Becker D, et al. Hydrolyzed infant formula and early beta‐cell autoimmunity: a randomized clinical trial. JAMA. 2014;311(22):2279‐2287.

18. Bonifacio E, Ziegler AG, Klingensmith G, et al. Effects of high‐dose oral insulin on immune responses in children at high risk for type 1 diabetes: the pre‐POINT randomized clinical trial. JAMA. 2015;313(15):1541‐1549.

19. Skyler JS. Immune intervention for type 1 diabetes mellitus. Int J Clin Pract Suppl. 2011;170:61‐70.

20. Ludvigsson J. Combination therapy for preservation of beta cell function in type 1 diabetes: new attitudes and strategies are needed! Immunol Lett. 2014;159(1‐2):30‐35.

21. Ludvigsson J, Krisky D, Casas R, et al. GAD65 antigen therapy in recently diagnosed type 1 diabetes mellitus. N Engl J Med. 2012;366(5):433‐442.

22. Hagopian WA, Erlich H, Lernmark A, et al. The Environmental Determinants of Diabetes in the Young (TEDDY): genetic criteria and international diabetes risk screening of 421 000 infants. Pediatr Diabetes. 2011;12(8):733‐743.

23. Kiviniemi M, Hermann R, Nurmi J, et al. A high‐throughput population screening system for the estimation of genetic risk for type 1 diabetes: an application for the TEDDY (The Environmental Determinants of Diabetes in the Young) study. Diabetes Technol Ther. 2007;9(5):460‐472.

24. Barker JM, Barriga KJ, Yu L, et al. Prediction of autoantibody positivity and progression to type 1 diabetes: Diabetes Autoimmunity Study in the Young (DAISY). J Clin Endocrinol Metab. 2004;89(8):3896‐3902.

25. Ziegler AG, Hummel M, Schenker M, Bonifacio E. Autoantibody appearance and risk for development of childhood diabetes in offspring of parents with type 1 diabetes: the 2‐year analysis of the German BABYDIAB Study. Diabetes. 1999;48(3):460‐468.

26. Nerup J, Platz P, Andersen OO, et al. HL‐A antigens and diabetes mellitus. Lancet. 1974;2(7885):864‐866.

27. WTCCC. Genome‐wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447(7145):661‐678.

28. Todd JA, Walker NM, Cooper JD, et al. Robust associations of four new chromosome regions from genome‐wide analyses of type 1 diabetes. Nat Genet. 2007;39(7):857‐864.

29. Cooper JD, Smyth DJ, Smiles AM, et al. Meta‐analysis of genome‐wide association study data identifies additional type 1 diabetes risk loci. Nat Genet. 2008;40(12):1399‐1401.

30. Bradfield JP, Qu HQ, Wang K, et al. A genome‐wide meta‐analysis of six type 1 diabetes cohorts identifies multiple associated loci. PLoS Genet. 2011;7(9):e1002293.

31. Concannon P, Rich SS, Nepom GT. Genetics of type 1A diabetes. N Engl J Med. 2009;360(16):1646‐1654.

32. Noble JA, Valdes AM. Genetics of the HLA region in the prediction of type 1 diabetes. Curr Diab Rep. 2011;11(6):533‐542.

33. Erlich HA, Valdes AM, Noble JA. Prediction of type 1 diabetes. Diabetes. 2013;62(4):1020‐1021.

34. Clayton DG. Prediction and interaction in complex disease genetics: experience in type 1 diabetes. PLoS Genet. 2009;5(7):e1000540.

35. Xu P, Krischer JP. Prognostic classification factors associated with development of multiple autoantibodies, dysglycemia, and type 1 diabetes: a recursive partitioning analysis. Diabetes Care. 2016;39(6):1036‐1044.

36. Zhao LP, Bolouri H. Object‐oriented regression for building predictive models with high dimensional omics data from translational studies. J Biomed Inform. 2016;60:431‐445.

37. Hu X, Deutsch AJ, Lenz TL, et al. Additive and interaction effects at three amino acid positions in HLA‐DQ and HLA‐DR molecules drive type 1 diabetes risk. Nat Genet. 2015;47(8):898‐905.

38. Zhao LP, Bolouri H, Zhao M, Geraghty DE, Lernmark Å, Better Diabetes Diagnosis Study Group. An object‐oriented regression for building disease predictive models with multiallelic HLA genes. Genet Epidemiol. 2016;40(4):315‐332.

39. Delli AJ, Vaziri‐Sani F, Lindblad B, et al. Zinc transporter 8 autoantibodies and their association with SLC30A8 and HLA‐DQ genes differ between immigrant and Swedish patients with newly diagnosed type 1 diabetes in the Better Diabetes Diagnosis study. Diabetes. 2012;61(10):2556‐2564.

40. Carlsson A, Kockum I, Lindblad B, et al. Low risk HLA‐DQ and increased body mass index in newly diagnosed type 1 diabetes children in the Better Diabetes Diagnosis study in Sweden. Int J Obes (Lond). 2012;36(5):718‐724.

41. Diagnosis and classification of diabetes mellitus. Diabetes Care. 2014;37(Suppl 1):S81‐S90.


42. Delli AJ, Lindblad B, Carlsson A, et al. Type 1 diabetes patients born to immigrants to Sweden increase their native diabetes risk and differ from Swedish patients in HLA types and islet autoantibodies. Pediatr Diabetes. 2010;11(8):513‐520.

43. Hedstrom AK, Sundqvist E, Baarnhielm M, et al. Smoking and two human leukocyte antigen genes interact to increase the risk for multiple sclerosis. Brain. 2011;134(Pt 3):653‐664.

44. Smith AG, Pyo CW, Nelson W, et al. Next generation sequencing to determine HLA class II genotypes in a cohort of hematopoietic cell transplant patients and donors. Hum Immunol. 2014;75(10):1040‐1046.

45. Nelson WC, Pyo CW, Vogan D, et al. An integrated genotyping approach for HLA and other complex genetic systems. Hum Immunol. 2015;76(12):928‐938.

46. Vaziri‐Sani F, Delli AJ, Elding‐Larsson H, et al. A novel triple mix radiobinding assay for the three ZnT8 (ZnT8‐RWQ) autoantibody variants in children with newly diagnosed diabetes. J Immunol Methods. 2011;371(1‐2):25‐37.

47. Zhao LP, Alshiekh S, Zhao M, et al. Next‐generation sequencing reveals that HLA‐DRB3, ‐DRB4, and ‐DRB5 may be associated with islet autoantibodies and risk for childhood type 1 diabetes. Diabetes. 2016;65(3):710‐718.

48. Li SS, Wang H, Smith A, et al. Predicting multiallelic genes using unphased and flanking single nucleotide polymorphisms. Genet Epidemiol. 2011;35(2):85‐92.

49. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2008.

50. Fruchterman TMJ, Reingold EM. Graph drawing by force‐directed placement. Software Pract Exper. 1991;21(11):1129‐1164.

51. Decochez K, De Leeuw IH, Keymeulen B, et al. IA‐2 autoantibodies predict impending type I diabetes in siblings of patients. Diabetologia. 2002;45(12):1658‐1666.

52. Lundgren M, Lynch K, Larsson C, Elding Larsson H. Cord blood insulinoma‐associated protein 2 autoantibodies are associated with increased risk of type 1 diabetes in the population‐based diabetes prediction in Skane study. Diabetologia. 2015;58(1):75‐78.

53. De Grijse J, Asanghanwa M, Nouthe B, et al. Predictive power of screening for antibodies against insulinoma‐associated protein 2 beta (IA‐2beta) and zinc transporter‐8 to select first‐degree relatives of type 1 diabetic patients with risk of rapid progression to clinical onset of the disease: implications for prevention trials. Diabetologia. 2010;53(3):517‐524.

54. Andersson C, Vaziri‐Sani F, Delli A, et al. Triple specificity of ZnT8 autoantibodies in relation to HLA and other islet autoantibodies in childhood and adolescent type 1 diabetes. Pediatr Diabetes. 2013;14(2):97‐105.

55. Skyler JS, Greenbaum CJ, Lachin JM, et al. Type 1 Diabetes TrialNet: an international collaborative clinical trials network. Ann N Y Acad Sci. 2008;1150:14‐24.

56. Erkkola M, Salmenhaara M, Nwaru BI, et al. Sociodemographic determinants of early weaning: a Finnish birth cohort study in infants with human leucocyte antigen‐conferred susceptibility to type 1 diabetes. Public Health Nutr. 2013;16(2):296‐304.

57. Lee M‐LT. Risk Assessment and Evaluation of Predictions. New York: Springer; 2013.

58. Ziegler AG, Rewers M, Simell O, et al. Seroconversion to multiple islet autoantibodies and risk of progression to diabetes in children. JAMA. 2013;309(23):2473‐2479.

59. Lundgren M, Sahlin A, Svensson C, et al. Reduced morbidity at diagnosis and improved glycemic control in children previously enrolled in DiPiS follow‐up. Pediatr Diabetes. 2014;15(7):494‐501.

60. Raab J, Haupt F, Scholz M, et al. Capillary blood islet autoantibody screening for identifying pre‐type 1 diabetes in the general population: design and initial results of the Fr1da study. BMJ Open. 2016;6(5):e011144.

61. Insel RA, Dunne JL, Ziegler AG. General population screening for type 1 diabetes: has its time come? Curr Opin Endocrinol Diabetes Obes. 2015;22(4):270‐276.

62. Vehik K, Cuthbertson D, Ruhlig H, Schatz DA, Peakman M, Krischer JP. Long‐term outcome of individuals treated with oral insulin: diabetes prevention trial‐type 1 (DPT‐1) oral insulin trial. Diabetes Care. 2011;34(7):1585‐1590.

63. Insel RA, Dunne JL, Atkinson MA, et al. Staging presymptomatic type 1 diabetes: a scientific statement of JDRF, the Endocrine Society, and the American Diabetes Association. Diabetes Care. 2015;38(10):1964‐1974.

64. Roll U, Christie MR, Fuchtenbusch M, Payton MA, Hawkes CJ, Ziegler AG. Perinatal autoimmunity in offspring of diabetic parents. The German Multicenter BABY‐DIAB study: detection of humoral immune responses to islet antigens in early childhood. Diabetes. 1996;45(7):967‐973.

65. Ziegler AG, Hillebrand B, Rabl W, et al. On the appearance of islet associated autoimmunity in offspring of diabetic mothers: a prospective study from birth. Diabetologia. 1993;36(5):402‐408.

66. Kupila A, Muona P, Simell T, et al. Feasibility of genetic and immunological prediction of type I diabetes in a population‐based birth cohort. Diabetologia. 2001;44(3):290‐297.

67. Rewers M, Bugawan TL, Norris JM, et al. Newborn screening for HLA markers associated with IDDM: Diabetes Autoimmunity Study in the Young (DAISY). Diabetologia. 1996;39(7):807‐812.

68. Wahlberg J, Fredriksson J, Vaarala O, Ludvigsson J. Vaccinations may induce diabetes‐related autoantibodies in one‐year‐old children. Ann N Y Acad Sci. 2003;1005:404‐408.

69. Dahlquist G, Blom L, Holmgren G, et al. The epidemiology of diabetes in Swedish children 0‐14 years: a six‐year prospective study. Diabetologia. 1985;28(11):802‐808.

70. Patterson CC, Dahlquist GG, Gyurus E, Green A, Soltesz G. Incidence trends for childhood type 1 diabetes in Europe during 1989‐2003 and predicted new cases 2005‐20: a multicentre prospective registration study. Lancet. 2009;373(9680):2027‐2033.

71. Bennett Johnson S, Tercyak KP Jr. Psychological impact of islet cell antibody screening for IDDM on children, adults, and their family members. Diabetes Care. 1995;18(10):1370‐1372.

72. Roth R, Lynch K, Lernmark B, et al. Maternal anxiety about a child's diabetes risk in the TEDDY study: the potential role of life stress, postpartum depression, and risk perception. Pediatr Diabetes. 2015;16(4):287‐298.

73. Barrett JC, Clayton DG, Concannon P, et al. Genome‐wide association study and meta‐analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet. 2009;41(6):703‐707.

74. Ludvigsson J, Samuelsson U, Beauforts C, et al. HLA‐DR 3 is associated with a more slowly progressive form of type 1 (insulin‐dependent) diabetes. Diabetologia. 1986;29(4):207‐210.

SUPPORTING INFORMATION

Additional Supporting Information may be found online in the

supporting information tab for this article.

How to cite this article: Zhao LP, Carlsson A, Larsson HE,

et al. Building and validating a prediction model for paediatric

type 1 diabetes risk using next generation targeted sequencing

of class II HLA genes. Diabetes Metab Res Rev. 2017;33:e2921.

https://doi.org/10.1002/dmrr.2921


APPENDIX 1. MEMBERS OF THE BETTER DIABETES DIAGNOSIS (BDD) STUDY GROUP

Members of the BDD study group: Anita Ramelius (Malmö), Helena

Desaix (Borås), Kalle Snellman (Eskilstuna), Anna Olivecrona (Falun),

Åke Stenberg (Gällivare), Lars Skogsberg (Gävle), Nils Östen Nilsson

(Halmstad), Jan Neiderud (Helsingborg), Åke Lagerwall (Hudiksvall),

Kristina Hemmingsson (Härnösand), Karin Åkesson (Jönköping), Göran

Lundström (Kalmar), Magnus Ljungcrantz (Karlskrona), Eva Albinsson

(Karlstad), Karin Larsson (Kristianstad), Christer Gundewall

(Kungsbacka), Rebecka Enander (Lidköping), Agneta Brännström

(Luleå), Annelie Carlsson (Lund), Maria Nordwall (Norrköping), Lennart

Hellenberg (Nyköping), Elena Lundberg (Skellefteå), Henrik Tollig

(Skövde), Britta Björsell (Sollefteå), Björn Rathsman (Stockholm/

Sacchska), Torun Torbjörnsdotter (Stockholm/Huddinge), Björn

Stjernstedt (Sundsvall), Nils Wramner (Trollhättan), Ragnar Hanås

(Uddevalla), Ingemar Swenne (Uppsala), Anna Levin (Visby), Anders

Thåström (Västervik), Carl‐Göran Arvidsson (Västerås), Stig Edvardsson

(Växjö), Björn Jönsson (Ystad), Torsten Gadd (Ängelholm), Jan Åman

(Örebro), Rein Florell (Örnsköldsvik), and Anna‐Lena Fureman

(Östersund).

