BioVU and the Synthetic Derivative
Erica Bowton, PhD
Program Manager, Personalized Medicine
What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
• Linked to Synthetic Derivative: de-identified EMR
[Figure: De-identification workflow. A patient record ("John Doe") passes through a one-way hash (A7CCF99DE65732….) and is scrubbed to populate the Synthetic Derivative (~2 million records), which can be updated. For eligible patients, DNA is extracted and carries the hashed identifier.]
How BioVU Samples are Accepted
Accepted samples must:
• Be of good quality
• Have sufficient amount of blood
• Be from a patient who has signed the BioVU form
• Be from a patient who has not opted out
The BioVU Form
A component of the Consent for Treatment process
Awareness Generation
• Posters in phlebotomy areas in English and Spanish
• Brochures freely available to VUMC clinics in English and Spanish
• BioVU hotline available for questions and opt-out
BioVU Sample Accrual: 176,448
[Figure: BioVU sample accrual by date, Jul-07 through Jul-15 (y-axis 0 to 225,000 samples). Series: adult samples accrued, pediatric samples accrued, anticipated adult sample accrual, anticipated pediatric sample accrual.]
Current accrual as of 2-19-2014: 155,090 adult; 21,472 pediatric
Where are BioVU samples stored?
RTS SmaRTStore
BioVU Operations Oversight
[Figure: Organizational chart; the legend distinguishes oversight from input/advisory relationships.]
Bodies shown:
• Institutional Review Board
• General Counsel
• Med Ctr Ethics
• Vice Chancellor’s Office
• Operations Oversight Board**
• Community Advisory Board*
• Ethics Advisory Board*
• BioVU Protocol Review Committee
• Program staff
• BioVU
Operations Oversight Board** membership:
• Vice Chancellor (Chair)
• Clin. Pharmacology (PI)
• Ethics/ELSI (2)
• Ctr Human Genetics Research (2)
• Clinical genetic testing lab (1)
• Genetics/Genetic Medicine (6)
• Pediatric genetics (1)
• Patient advocacy (2)
• University counsel (1)
• Biostatistics (3)
• Cancer center (3)
* Includes (or exclusively) external membership
** (n) = number of members representing this discipline/area. Several members are represented in more than one area
Resources for EMR-based research at VUMC
The Synthetic Derivative
• A de-identified and continuously-updated image of the EMR (2 M records)
BioVU
• DNA samples available: >175,000
• Plasma collection underway
Redeposited genotypes
• Subjects with GWAS data: >12,000
• Subjects with any genotyping: >60,000
• >8,000,000,000 genotypes
The Synthetic Derivative
• Rich, multi-source database of de-identified clinical and demographic data
• A Derivative of the EMR - information content reduced by ‘scrubbing’ identifiers
• Systematically shifted event dates
• User Interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries (temporal queries, controls matching, etc)
• Contains ~2.1 million records
  o ~1 million with detailed longitudinal data
  o averaging 100,000 bytes in size
  o an average of 27 codes per record
• Records are updated over time and are current through 8/31/13
Synthetic Derivative Data Types
• Narratives, such as: Clinical Notes, Discharge Summaries, History and Physicals, Problem Lists, Surgical Reports, Progress Notes, Letters
• Diagnostic Codes, Procedural Codes
• Forms (intake, assessment)
• Reports (pathology, ECGs, echocardiograms)
• Clinical Communications
• Lab Values and Vital Signs
• Medication Orders
• TraceMaster (ECGs)
• Tumor Registry
Technology + policy
De-identification
• A 128-character identifier (RUI) is derived from the MRN using the Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using combination of custom techniques and established de-identification software
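As a rough illustration of the identifier derivation described above, the sketch below hashes an MRN with SHA-512 and returns the hexadecimal digest, which is always 128 characters long. The function name `derive_rui`, the sample MRN, and the UTF-8 encoding are assumptions made for the example; the slides do not say whether a secret key, salt, or other pre-processing is applied in the production system.

```python
import hashlib


def derive_rui(mrn: str) -> str:
    """Return the SHA-512 hex digest of an MRN (128 hexadecimal characters).

    Assumption: the MRN is hashed as plain UTF-8 text. The real pipeline may
    add a secret key or other steps that the slides do not describe.
    """
    return hashlib.sha512(mrn.encode("utf-8")).hexdigest()


# Hypothetical MRN used only for illustration.
rui = derive_rui("0012345678")
print(len(rui))  # 128
```

The same MRN always maps to the same identifier, which is what allows a de-identified record to be updated over time without storing the MRN itself.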
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to 364 days backwards) that is consistent within each record, but differs across records
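The date-shift rule above can be sketched as follows. This is a minimal illustration, not the BioVU algorithm: deriving the offset from a hash of the record identifier, the 1 to 364 day range, and the names `shift_days_for` and `shift_date` are all assumptions. What the sketch does preserve is the stated property that every date in a record moves backwards by the same amount, so intervals between events within a record are unchanged.

```python
import hashlib
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # upper bound stated on the slide


def shift_days_for(record_id: str) -> int:
    """Pick a per-record offset between 1 and 364 days (assumed scheme)."""
    digest = hashlib.sha512(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % MAX_SHIFT_DAYS + 1


def shift_date(record_id: str, event: date) -> date:
    """Shift one event date backwards by the record's fixed offset."""
    return event - timedelta(days=shift_days_for(record_id))


# Hypothetical record with two events 30 days apart.
visit = shift_date("record-A", date(2008, 3, 1))
follow_up = shift_date("record-A", date(2008, 3, 31))
print((follow_up - visit).days)  # 30
```

Because the offset depends only on the record, offsets vary between records but are not guaranteed to be unique.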
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human)
• Data Use Agreement
• Audit logs of all searches and data exports
Data Use Agreement
• No attempt at re-identification
• Inform BioVU staff if a record is identifiable
• Research confined to that which is described
• Genotypes to be re-deposited back to BioVU
Phenotyping Approach: Algorithm Development
[Figure: flowchart]
1. Identify phenotype of interest
2. Case & control algorithm development and refinement
3. Manual review; assess precision
   • ≥95%: Deploy in BioVU
   • <95%: return to algorithm development and refinement
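The precision check in this workflow can be sketched in a few lines. Precision is taken here in its usual sense, the fraction of algorithm-selected records that manual review confirms. The function names and the example counts are hypothetical, and treating anything below 95% as "refine" is an assumed reading of the flowchart.

```python
PRECISION_THRESHOLD = 0.95  # deployment cut-off shown on the slide


def precision(confirmed: int, rejected: int) -> float:
    """Fraction of algorithm-selected records confirmed by manual review."""
    reviewed = confirmed + rejected
    if reviewed == 0:
        raise ValueError("no records were reviewed")
    return confirmed / reviewed


def next_step(confirmed: int, rejected: int) -> str:
    """Decide whether to deploy the algorithm or keep refining it."""
    if precision(confirmed, rejected) >= PRECISION_THRESHOLD:
        return "deploy in BioVU"
    return "refine algorithm"


# Hypothetical review of 100 algorithm-selected charts.
print(next_step(confirmed=97, rejected=3))   # deploy in BioVU
print(next_step(confirmed=90, rejected=10))  # refine algorithm
```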
Disease Cohorts
                                Number in SD   Number in BioVU
Central Nervous System
  Alzheimer’s                          3,429               497
  Parkinson’s                          4,365               778
  Migraine                            15,699             3,299
Psychiatric
  Dementia                             3,747             1,045
  Major Depressive Disorder           20,008             3,385
  ADHD                                12,922             1,184
  Generalized Anxiety Disorder         5,828             1,195
  Schizophrenia                        4,069               495
BioVU Committee Review
• Pre-Review
• Expedited Review: genotyping data requests; reviewed by BioVU Chair
• Full Review: DNA sample access requests; reviewed and scored by Primary and Secondary reviewers
BioVU Projects
• Requests: 104
• Approved so far: 86
BioVU Utilization
[Figure: Bar chart of BioVU Requests and BioVU Approvals (y-axis 0 to 120), split into DNA Requests and Data Requests.]
Current BioVU Studies
[Figure: BioVU Study Areas. Bar chart of number of studies (y-axis 0 to 25) by area: Heart Disease, Cancer, Other, Neurological Disease, Diabetes, Immune System Disease, Obesity, Pharmacogenomics, Obstetrics & Gynecology, Lung Disease, Eye Disease, Privacy.]
USE CASE 1: Synthetic Derivative Study
Ability to analyze quantitative, longitudinal repeated measures
[Figure: BMI plotted by date, 11/1/2007 to 9/8/2008 (y-axis 15 to 40), with the Normal Range and a Zyprexa Prescription marked.]
[Figure: Chart by BMI (x-axis labels 0, 13.3, 26.2, 40.9, 73.4, 300+; y-axis 0 to 900).]
USE CASE 2: Existing Genetic Data
USE CASE 3: New Genotyping/Sequencing
[Figure: Workflow diagram for new genotyping/sequencing studies. Elements shown: Investigator query (cases + controls), Data use agreement, One-way hash, scrubbed records (e.g. B699tre563msd.., F5rt783mbncds…), Sample retrieval, Genotyping, genotype-phenotype relations, Data analysis.]
BioVU Project Life Cycle
BioVU
• Access approvals/application
• Cohort identification
• Clinical data extraction
• Programming support
• Study design
• Agreements
VANTAGE (Vanderbilt Technologies for Advanced Genomics)
• Genotyping/sequencing approaches
• Assay design
• SNP selection
• Sample pulling and plating
VANGARD (Vanderbilt Technologies for Advanced Genomics Analysis and Research Design)
• Genomic data analysis and research design
• Biostatistical/bioinformatic support
Timeline labels on the slide, in order: 2-3 months, 1-2 months, 1-2 months
For ALL BioVU Studies…
Resources:
1. BioVU Project Management: [email protected]
2. Programming services: IDASC CORE
3. Genomic technologies: VANTAGE CORE
4. Data analysis services: VANGARD CORE
https://starbrite.vanderbilt.edu/biovu/
END
Validating EMR phenotype algorithms
[Figure: Odds ratios (axis labels 0.5, 1.0, 2.0, 5.0), observed vs. published, by disease and marker. Ritchie et al, 2010]
Diseases: Atrial fibrillation, Crohn's disease, Multiple sclerosis, Rheumatoid arthritis, Type 2 diabetes
marker        gene / region
rs2200733     Chr. 4q25
rs10033464    Chr. 4q25
rs11805303    IL23R
rs17234657    Chr. 5
rs1000113     Chr. 5
rs17221417    NOD2
rs2542151     PTPN22
rs3135388     DRB1*1501
rs2104286     IL2RA
rs6897932     IL7RA
rs6457617     Chr. 6
rs6679677     RSBN1
rs2476601     PTPN22
rs4506565     TCF7L2
rs12255372    TCF7L2
rs12243326    TCF7L2
rs10811661    CDKN2B
rs8050136     FTO
rs5219        KCNJ11
rs5215        KCNJ11
rs4402960     IGF2BP2
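For readers less familiar with the quantity on the plot's axis, the sketch below computes an odds ratio from a 2x2 table of carrier counts in cases and controls. The counts are hypothetical and the function name is an assumption; the analysis behind the figure is not described in these slides and may have used a different model.

```python
def odds_ratio(case_carriers: int, case_noncarriers: int,
               control_carriers: int, control_noncarriers: int) -> float:
    """Odds of carrying the marker in cases divided by the odds in controls."""
    if case_noncarriers == 0 or control_carriers == 0:
        raise ValueError("odds ratio is undefined for a zero denominator")
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )


# Hypothetical counts, for illustration only.
print(odds_ratio(40, 60, 25, 75))  # 2.0
```

An odds ratio above 1.0 indicates the marker is more common among cases than controls, which is the direction the observed and published estimates are compared on.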