Vanderbilt’s DNA Databank: BioVU
Personalized Medicine
• Integration of genomic information into clinical decision making
• Personalized disease treatment and also preventative therapies
Personalized Medicine
• A SNP (single-nucleotide polymorphism) is a single base-pair change at a specific site in the DNA sequence that occurs in at least 1% of the population
• SNPs are responsible for over 80% of the variation between two individuals; they are ideal for establishing correlations between genotype and phenotype
• Because some SNPs predispose individuals to a certain disease or trait, or to react differently to a drug, they will be highly useful in diagnostics and drug development
What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
• Linked to Synthetic Derivative: de-identified EMR
• Current sample number: 116,551
  o 105,910 adult samples
  o 10,641 pediatric samples
The “synthetic derivative” (SD): can be updated
[Figure: the record of an eligible patient (John Doe) passes through a one-way hash to produce a research identifier; the scrubbed record and the extracted DNA are both stored under that identifier.]
Synthetic Derivative vs. BioVU
[Figure: scrubbed records in the Synthetic Derivative, and scrubbed records linked to DNA in BioVU.]
Synthetic Derivative: ~1.9 million
BioVU: ~116,000
The Synthetic Derivative
• A Derivative of the EMR - information content reduced by ‘scrubbing’ identifiers
• Systematically shifted event dates
• Contains ~1.9 million records
  o ~1 million with detailed longitudinal data
  o averaging 100,000 bytes in size
  o an average of 27 codes per record
• Records are updated over time and are current through 9/31/09
• Can be searched restricting to records for which DNA is available
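The two de-identification steps named above (a one-way hash for the research identifier and systematically shifted event dates) can be sketched roughly as follows. This is only an illustration: the slides do not state which hash function, key, or shift window BioVU uses, so the keyed SHA-512 hash, the `SECRET_KEY` value, the 364-day window, and all function names here are assumptions.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret; a real system would keep this outside the research environment.
SECRET_KEY = b"hypothetical-secret"


def research_id(mrn: str) -> str:
    """Derive a research identifier from a medical record number with a keyed one-way hash."""
    return hmac.new(SECRET_KEY, mrn.encode("utf-8"), hashlib.sha512).hexdigest()


def date_offset_days(mrn: str, max_shift: int = 364) -> int:
    """Pick one fixed backward shift (1..max_shift days) per patient, derived from the MRN."""
    digest = hmac.new(SECRET_KEY, b"shift:" + mrn.encode("utf-8"), hashlib.sha256).digest()
    return -(int.from_bytes(digest[:4], "big") % max_shift + 1)


def shift_events(mrn: str, events: list[tuple[date, str]]) -> list[tuple[date, str]]:
    """Shift every event date of one patient by the same offset, preserving intervals."""
    offset = timedelta(days=date_offset_days(mrn))
    return [(event_date + offset, text) for event_date, text in events]


events = [(date(2009, 1, 1), "admission"), (date(2009, 1, 15), "follow-up")]
shifted = shift_events("12345", events)
```

Because the same offset applies to every event of a given patient, the interval between the two example events (14 days) is unchanged after shifting, while the absolute dates no longer match the clinical record.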
Synthetic Derivative Data Types
• Narratives, such as:
  o Clinical Notes
  o Discharge Summaries
  o History and Physicals
  o Problem Lists
  o Surgical Reports
  o Progress Notes
  o Letters
• Diagnostic Codes, Procedural Codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical Communications
• Lab Values and Vital Signs
• Medication Orders
• TraceMaster (ECGs)
BioVU Program Review
Sample accrual
[Figure: sample accrual from Jul-07 through Jan-13 on an axis of 0 to 180,000 samples; series: adult samples accrued, pediatric samples accrued, anticipated adult sample accrual, anticipated pediatric samples.]
Current accrual as of 3-28-2011: 105,910 adult, 10,641 pediatric
BioVU Sample Management
RTS SmaRTStore
Validation in BioVU
• Sample handling algorithms
  o Gender match
  o 1/384 gender mismatches
• Ancestry
  o Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR
  o Provide a panel of ancestry informative markers that define ancestry
  o No significant difference between the concordance of self-report or observer-report with genetic ancestry
• Demonstration project – American Journal of Human Genetics
  o Can known associations between genetic variants and common diseases be identified in the EMR?
The “demonstration project”
• Genotype “high-value” SNPs in the first 8,000 samples accrued.
  o including SNPs associated by replicated genome-wide experiments with common diseases & traits
    1. Atrial fibrillation
    2. Crohn’s disease
    3. Multiple Sclerosis
    4. Rheumatoid arthritis
    5. Type II Diabetes
• Develop Natural Language Processing methods to identify cases and controls
• Are genotype-phenotype relations replicated?
First results
[Figure: odds ratio for each marker (axis ticks at 0.5, 1.0, 2.0, 5.0), grouped by disease: atrial fibrillation, Crohn's disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes.]
marker        gene / region
rs2200733     Chr. 4q25
rs10033464    Chr. 4q25
rs11805303    IL23R
rs17234657    Chr. 5
rs1000113     Chr. 5
rs17221417    NOD2
rs2542151     PTPN22
rs3135388     DRB1*1501
rs2104286     IL2RA
rs6897932     IL7RA
rs6457617     Chr. 6
rs6679677     RSBN1
rs2476601     PTPN22
rs4506565     TCF7L2
rs12255372    TCF7L2
rs12243326    TCF7L2
rs10811661    CDKN2B
rs8050136     FTO
rs5219        KCNJ11
rs5215        KCNJ11
rs4402960     IGF2BP2
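For readers unfamiliar with the quantity on the plot's axis, the sketch below computes an odds ratio and an approximate 95% confidence interval from a 2x2 table of carriers and non-carriers among cases and controls. It uses the standard log-odds (Woolf) interval; the slides do not say how the plotted estimates were computed, and the counts are made up for illustration.

```python
import math


def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is exp(log(OR) +/- z * SE), with
    SE = sqrt(1/a + 1/b + 1/c + 1/d). All four cells must be non-zero.
    """
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    estimate = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Made-up counts, not BioVU data: 30 of 100 cases and 20 of 100 controls carry the allele.
estimate, lower, upper = odds_ratio(30, 70, 20, 80)
```

An interval that excludes 1.0 corresponds to a marker whose bar does not cross the vertical line at 1.0 on a plot of this kind.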
Types of projects
• Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses
• Discovery of new disease/susceptibility genes by resequencing in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth)
• Access samples without disease X, or “normals” of specified ancestry, or old normals
• Phenome-wide association study (PheWAS): in development
Research Use Cases
Retrospective chart reviews
Rapid preliminary data for grant submissions
Feasibility assessment
Hypothesis generation
Examples of ICD-9 codes for rare diseases
Example Rare Disease            Number in SD   Number in BioVU
Microcephalus                   1,070          85
Pica                            115            22
Septicemic Plague               21             0
Pick’s Disease                  45             8
Acromegaly and Gigantism        571            123
Ehlers-Danlos Syndrome          285            34
Narcolepsy without Cataplexy    438            76
Spina Bifida                    1,968          238
Stiff-Man Syndrome              82             17
Tourette Syndrome               667            34
Bell’s Palsy                    2,534          402
Bulimia Nervosa                 919            88
Cushing’s                       1,443          298
Peyronies Disease               694            157
Wilson’s Disease                140            49
Meningioma                      1,444          355
Wegener’s                       363            141
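Counts like these come from searching de-identified records by diagnostic code and then restricting to records with DNA available. A minimal sketch of that tally is shown below. The record layout (`icd9`, `has_dna`), the function name, and the toy records are hypothetical and do not reflect the actual Synthetic Derivative interface; the codes are used only as illustrative values.

```python
from collections import Counter

# Toy, made-up records: each has a list of ICD-9 codes and a DNA-availability flag.
toy_records = [
    {"id": "r1", "icd9": ["351.0", "401.9"], "has_dna": True},
    {"id": "r2", "icd9": ["351.0"], "has_dna": False},
    {"id": "r3", "icd9": ["742.1"], "has_dna": True},
    {"id": "r4", "icd9": ["250.00"], "has_dna": True},
]


def count_by_code(records: list[dict], codes: set[str]) -> dict[str, tuple[int, int]]:
    """For each code, return (records carrying the code, those that also have DNA)."""
    in_sd: Counter = Counter()
    in_biovu: Counter = Counter()
    for record in records:
        # A record is counted once per code of interest, even if the code repeats.
        for code in codes & set(record["icd9"]):
            in_sd[code] += 1
            if record["has_dna"]:
                in_biovu[code] += 1
    return {code: (in_sd[code], in_biovu[code]) for code in sorted(codes)}


counts = count_by_code(toy_records, {"351.0", "742.1"})
```

Each value pairs the total number of matching records with the number that also have a sample, mirroring the "Number in SD" and "Number in BioVU" columns.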
[Figure, built up over several slides: scrubbed Synthetic Derivative records are searched and sorted into cases and controls, which then move through the steps below.]
• Investigator query: cases + controls
• Data use agreement + IRB Approval
• Manual Review
• Sample retrieval
• Genotyping, genotype-phenotype relations
Data Use Agreement
Genotyping Data Accrual
Nationally Prevalent Diseases in the African American Population
Disease BioVU Count
Hypertension 1095
Type 2 Diabetes 714
Coronary Artery Disease 273
Kidney Disease 252
Asthma 210
Pneumonia 193
Stroke 133
Lupus 48
Lung cancer 21
BioVU Application Process
Proposal Review Process:
• Investigator(s) completes VICTR Resource Request
• Funding request reviewed by VICTR SRC
• Funding Decision
• Investigator(s) completes BioVU Application Process
• Proposal reviewed by BioVU Review Committee
• BioVU program office contacts investigators with any necessary revisions, if applicable
• Investigator resubmits proposal if necessary
• Proposal Approved/Access granted
Total Time: Data Requests: 4-6 weeks; DNA Access: 8-12 weeks
BioVU Genotyping Process
BioVU Genotyping Process:
• Investigator selects cases and controls from Synthetic Derivative
• Investigator signals BioVU program to initiate sample selection
• BioVU notifies DNA resources core that samples are ready for selection and picking
• Samples are provided to appropriate lab and are genotyped
• Investigator and BioVU program receive genotype data
• Genotyped data analyzed by investigator
BioVU Requests
[Figure: bar chart of BioVU requests and BioVU approvals on an axis of 0 to 40, split into DNA requests and data requests.]
37 Total Requests, 24 Approvals
FAQ “answers”
• SD access: “non-human subjects” IRB review (days)
• Current access costs: $4/sample
• Genotyping:
  o Investigator-funded
    - Consider VICTR as a funding source
  o Genotyping/sequencing performed in VUMC Core Facilities
    - Justification must be provided for outside genotyping, including quality control plans
  o Genotype “redeposit” part of the data use agreement
• Anticipate 16,000 BioVU subjects will have GWAS-type genotyping data by fall 2011
Questions?
Contact: Erica Bowton PhD
BioVU Program Manager
322-1975