Enabling Large-Scale Analysis of Electronic Health Records in Europe through standardization: SNOMED in Action
Peter Rijnbeek
Associate Professor Health Data Science
Department of Medical Informatics
Erasmus MC Rotterdam
The Netherlands
Speaker disclosure of interests
Symposium Verbinden & Innoveren met SNOMED
13 February 2020
No (potential) conflict of interest: None
Relationships potentially relevant to this meeting: None
• Sponsorship or research funding: Innovative Medicines Initiative (IMI); Janssen Research & Development grant
• Fee or other (financial) compensation: None
• Shareholder: None
• Other relationship, namely: None
THE JOURNEY TO LARGE-SCALE ANALYTICS
• Introduction to the use of a Common Data Model and Standardized Vocabularies
• The Observational Health Data Sciences and Informatics (OHDSI) initiative
• The European Health Data and Evidence (EHDEN) Project
• SNOMED in Action: examples of large-scale studies
HEALTH DATA ORIGINATES FROM PATIENT JOURNEYS
[Figure: a patient timeline with rows for Conditions, Drugs, Procedures and Measurements over person time; Disease, Treatment and Outcome are marked along an axis showing time 0, baseline time and follow-up time.]
EACH OBSERVATIONAL DATABASE IS JUST AN (INCOMPLETE) COMPILATION OF PATIENT JOURNEYS
[Figure: the patient timeline (Conditions, Drugs, Procedures and Measurements over person time, with Disease, Treatment and Outcome around time 0) repeated for Person 1, Person 2, Person 3 and Person N.]
QUESTIONS ASKED ACROSS THE PATIENT JOURNEY
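[Figure: a patient timeline with rows for Conditions, Drugs, Procedures and Measurements over person time; Disease, Treatment and Outcome are marked along an axis showing time 0, baseline time and follow-up time.]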
Which treatment did patients choose after diagnosis?
Which patients chose which treatments?
How many patients experienced the outcome after treatment?
What is the probability I will experience the outcome?
Does treatment cause outcome?
Does one treatment cause the outcome more than an alternative?
What is the probability I will develop the disease?
GENERATING RELIABLE EVIDENCE
How can we do this at a large scale, i.e. on many data sources in Europe for many research questions?
How can we make sure these results are reliable?
MINIMUM REQUIREMENTS TO ACHIEVE REPRODUCIBILITY
[Figure: the journey from "Patient-level data in source system/schema" to "Reliable evidence", drawn as many intermediate steps labelled A to Z.]
• Complete documented specification that fully describes all data manipulations and statistical procedures
• Original source data, no staged intermediaries
• Full analysis code that executes end-to-end (from source to results) without manual intervention
Desired attribute | Question  | Researcher | Data      | Analysis  | Result
Reproducible      | Identical | Different  | Identical | Identical | = Identical
THE CHALLENGES OF REAL-WORLD DATA
Analytical method
Link to data
What will it require?
Data interoperability, standardised analytics, a data network and a strong community
The data…
GENERATING RELIABLE EVIDENCE USING THE OMOP CDM
[Figure: the journey from "Patient-level data in source system/schema" via "Patient-level data in CDM" to "Reliable evidence", with intermediate steps shown as letters.]
A Common Data Model will enable standardised analytics to generate reliable evidence.
HOW A COMMON DATA MODEL + COMMON ANALYTICS CAN SUPPORT REPRODUCIBILITY
[Figure: the journey from "Patient-level data in source system/schema" to "Reliable evidence", with intermediate steps shown as letters A to M.]
• Use of common data model splits the journey into two segments: 1) data standardization, 2) analysis execution
• ETL specification and source code can be developed and evaluated separately from analysis design
• CDM creates opportunity for re-use of data step and analysis step
Desired attribute | Question  | Researcher | Data      | Analysis  | Result
Reproducible      | Identical | Different  | Identical | Identical | = Identical
[Figure label: Patient-level data in CDM]
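The two segments described on this slide (data standardisation, then analysis execution) can be illustrated with a minimal Python sketch. It assumes two hypothetical source systems with different layouts and made-up local codes; the function names, field names and concept labels are illustrative only and are not part of the OMOP CDM specification or of any OHDSI tool. Each source gets its own ETL step, while a single analysis function runs unchanged on the standardised rows.

```python
from collections import Counter

# Hypothetical local-code -> standard-concept lookups (all values are made up).
SOURCE_A_MAP = {"DM2": "C-DIABETES", "HT": "C-HYPERTENSION"}
SOURCE_B_MAP = {"250.00": "C-DIABETES", "401.9": "C-HYPERTENSION"}


def etl_source_a(rows):
    """Segment 1 for source A: rows are (patient, local_code) tuples."""
    return [
        {"person_id": f"A{pid}", "condition_concept": SOURCE_A_MAP[code]}
        for pid, code in rows
    ]


def etl_source_b(rows):
    """Segment 1 for source B: rows are dicts with different field names."""
    return [
        {"person_id": f"B{r['pat']}", "condition_concept": SOURCE_B_MAP[r["dx"]]}
        for r in rows
    ]


def count_persons_per_condition(cdm_rows):
    """Segment 2: one analysis that only knows the standardised layout."""
    seen = {(r["person_id"], r["condition_concept"]) for r in cdm_rows}
    return Counter(concept for _, concept in seen)


# Toy source data in two different shapes.
source_a = [(1, "DM2"), (1, "DM2"), (2, "HT")]
source_b = [{"pat": 7, "dx": "250.00"}, {"pat": 8, "dx": "401.9"}]

# The same analysis code is re-used on both standardised data sets.
result_a = count_persons_per_condition(etl_source_a(source_a))
result_b = count_persons_per_condition(etl_source_b(source_b))
```

In this sketch the ETL functions can be reviewed and tested on their own, and the analysis function never needs to know which source a row came from.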
OBSERVATIONAL HEALTH DATA SCIENCES AND INFORMATICS
Mission: To improve health by empowering a community to collaboratively generate the evidence that promotes better health decisions and better care
A multi-stakeholder, interdisciplinary, international collaborative with a coordinating center at Columbia University
http://ohdsi.org
OHDSI’S GLOBAL RESEARCH COMMUNITY
• >200 collaborators from 25 different countries
• Experts in informatics, statistics, epidemiology, clinical sciences
• Active participation from academia, government, industry, providers
• Currently records on about 500 million unique patients in >100 databases
http://ohdsi.org/who-we-are/collaborators/
SECOND ANNUAL OHDSI SYMPOSIUM, MARCH 29TH 2019
- Provides a platform to stimulate community building: 250 participants from 27 countries
- Demonstrates the OHDSI approach to Reliable and Reproducible Evidence Generation: 35 posters, 8 software demos
- Educates and trains the community: 5 full-day tutorials
www.ohdsi-europe.org
ASIAN PACIFIC OHDSI COMMUNITY
South Korea, China, Taiwan, Japan and Australia are mapping data to the OMOP CDM at scale
DEEP INFORMATION MODEL: OMOP CDM VERSION 6
Standardized clinical data: Person, Observation_period, Specimen, Visit_occurrence, Visit_detail, Procedure_occurrence, Drug_exposure, Device_exposure, Condition_occurrence, Measurement, Note, Note_NLP, Survey_conduct, Observation, Fact_relationship
Standardized health system data: Location, Location_history, Care_site, Provider
Standardized health economics: Cost, Payer_plan_period
Standardized derived elements: Condition_era, Drug_era, Dose_era
Results schema: Cohort, Cohort_definition
Standardized metadata: CDM_source, Metadata
Standardized vocabularies: Concept, Vocabulary, Domain, Concept_class, Concept_relationship, Relationship, Concept_synonym, Concept_ancestor, Source_to_concept_map, Drug_strength
https://github.com/OHDSI/CommonDataModel/wiki
Single Concept Reference Table
Vocabulary ID
All vocabularies stacked up in one table
Ancestry Relationships: Higher-Level Relationships
[Figure: the concept "Atrial fibrillation" within its hierarchy.
Ancestors shown: Fibrillation, Atrial arrhythmia, Supraventricular arrhythmia, Cardiac arrhythmia, Heart disease, Disease of the cardiovascular system.
Descendants shown: Controlled atrial fibrillation, Persistent atrial fibrillation, Chronic atrial fibrillation, Paroxysmal atrial fibrillation, Rapid atrial fibrillation, Permanent atrial fibrillation.
Concepts are linked by concept relationships; ancestry relationships connect an ancestor to a descendant across several levels, with examples of 2 and 5 levels of separation marked.]
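How ancestry relationships with "levels of separation" can be derived from direct parent-child links is sketched below in Python. The is-a edges are assumptions that loosely follow the figure and are not copied from a SNOMED release; the function names are hypothetical. The flattened rows are similar in spirit to what the Concept_ancestor table stores, but this is only an illustration, not the procedure used to build the Standardized Vocabularies.

```python
from collections import deque

# Assumed is-a edges (child -> parents), loosely following the figure.
# They are illustrative and not taken from an actual SNOMED release.
PARENTS = {
    "Paroxysmal atrial fibrillation": ["Atrial fibrillation"],
    "Atrial fibrillation": ["Fibrillation", "Atrial arrhythmia"],
    "Atrial arrhythmia": ["Supraventricular arrhythmia"],
    "Supraventricular arrhythmia": ["Cardiac arrhythmia"],
    "Cardiac arrhythmia": ["Heart disease"],
    "Heart disease": ["Disease of the cardiovascular system"],
}


def ancestors_with_levels(concept, parents=PARENTS):
    """Breadth-first walk up the hierarchy.

    Returns {ancestor: minimum number of is-a steps from `concept`}.
    """
    levels = {}
    queue = deque([(concept, 0)])
    while queue:
        node, depth = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in levels:  # first visit is the shortest path
                levels[parent] = depth + 1
                queue.append((parent, depth + 1))
    return levels


def ancestor_rows(parents=PARENTS):
    """Flatten the hierarchy into (ancestor, descendant, levels) rows."""
    return [
        (ancestor, concept, level)
        for concept in parents
        for ancestor, level in ancestors_with_levels(concept, parents).items()
    ]


af_levels = ancestors_with_levels("Atrial fibrillation")
```

Because a concept may have more than one parent (a polyhierarchy), the walk keeps only the shortest distance to each ancestor; a fuller version could also record the longest path.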
[Figure: layers of the condition vocabularies. Source codes: ICD10CM, ICD9CM, ICD10, Read, Oxmis, Ciel, MeSH, SNOMED. Standard concepts: SNOMED-CT low-level concepts, SNOMED-CT higher-level classifications and a SNOMED-CT top-level classification. MedDRA levels: low-level terms, preferred terms, high-level terms, high-level group terms, system organ class.]
Use of SNOMED in the Standardized Vocabularies
• SNOMED is the standard in several domains, e.g. conditions and procedures.
• Powerful Polyhierarchical Structure.
SNOMED CHALLENGES
• We want to use SNOMED across the world: how to deal with countries that do not (yet) have a license?
• We will require SNOMED extensions to accommodate differences in granularity or classification.
• We have to make mappings from many source coding systems to SNOMED in Europe, for example mapping ICPC-1 to SNOMED.
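The mapping challenge in the last bullet can be made concrete with a small Python sketch that applies a source-to-standard lookup and reports how many records could not be mapped. Every vocabulary name, code and concept label below is a placeholder invented for illustration; no real ICPC-1 or SNOMED mapping is implied, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical mapping table: (source vocabulary, source code) -> standard concept.
# Every code and concept label here is a placeholder, not a real mapping.
SOURCE_TO_STANDARD = {
    ("LOCAL_GP", "X01"): "STD-0001",
    ("LOCAL_GP", "X02"): "STD-0002",
}


def map_records(records, mapping=SOURCE_TO_STANDARD):
    """Split records into mapped rows and a tally of unmapped source codes."""
    mapped, unmapped = [], Counter()
    for vocabulary, code in records:
        standard = mapping.get((vocabulary, code))
        if standard is None:
            unmapped[(vocabulary, code)] += 1
        else:
            mapped.append((vocabulary, code, standard))
    return mapped, unmapped


def coverage(records, mapping=SOURCE_TO_STANDARD):
    """Fraction of records that received a standard concept."""
    mapped, _ = map_records(records, mapping)
    return len(mapped) / len(records) if records else 0.0


records = [
    ("LOCAL_GP", "X01"),
    ("LOCAL_GP", "X02"),
    ("LOCAL_GP", "X01"),
    ("LOCAL_GP", "X99"),  # not in the mapping table
]
mapped, unmapped = map_records(records)
```

Tallying unmapped codes by frequency is one way to decide which source codes to map first, since a few frequent codes often account for most records.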
Enabling Large-Scale Analysis of Electronic Health Records in Europe
The EHDEN Project
Values
EHDEN: VISION AND MISSION
Vision: The European Health Data & Evidence Network (EHDEN) aspires to be the trusted observational research ecosystem to enable better health decisions, outcomes and care
Mission: Our mission is to provide a new paradigm for the discovery and analysis of health data in Europe, by building a large-scale, federated network of data sources standardised to a common data model
EHDEN CONSORTIUM
Start date: 1 Nov 2018
End date: 30 Apr 2024
Duration: 66 months
Not-for-profit organisations
Small to medium-sized companies
EFPIA & Associated partners
Universities, public bodies and research organisations
Almost €29 million
Academic coordinator
EFPIA Lead
22 partners
Innovative Medicines Initiative Project
EHDEN IS ABOUT ...
CALL PROCESS FOR DATA PARTNERS AND SMALL TO MEDIUM-SIZED ENTERPRISES (SMEs)
Tailored for project objectives and sustainability
Data Partners
Supporting SMEs
Open calls
Focusing on SMEs able to support mapping and sustainability
Open calls
Workshop
Source Data
Evaluation
Share of Mapping Process
Mapping
Audit
Mapping Cycle
Evaluated via a pre-defined set of criteria by the data source prioritisation committee
Harmonisation fund
Data sources can choose the SME from the pool of EHDEN-certified SMEs
SMEs are paid via grants from the harmonisation fund
Payments are milestone-based
Mapped data sources are encouraged to be active members of the EHDEN community, participating in research studies.
Grant Awarding
Training & Certification
SME certification committee prioritises SMEs for training and certification
THE EHDEN ACADEMY
Aim: To develop an e-learning environment to train all stakeholders in the project in the use of the tools and processes that are being adopted in EHDEN
Collaboration: Courses on the OMOP Common Data Model and the rich set of OHDSI tools are developed in collaboration with the OHDSI community
Infrastructure: The EHDEN Academy is being developed in Moodle and is hosted in the Amazon Web Services (AWS) cloud
academy.ehden.eu
SME CERTIFICATION
1) EHDEN Foundation: Introduction to IMI, EHDEN, OHDSI
2) OHDSI-IN-A-BOX Virtual Machine
3) OMOP CDM and Standardized Vocabularies
4) Extract, Transform and Load
5) Analytical Infrastructure
More courses will be added to the EHDEN Academy in the future.
• Final certification will include a two-day face-to-face meeting at the Erasmus MC in Rotterdam with all SMEs in the current batch. Multiple people per SME can participate.
• Final assessment will include a mapping exercise and installation of the Analytical Infrastructure.
Goal: to give the SMEs all the skills needed to standardise data to the OMOP CDM and to train them in installing the analytical infrastructure
SME OPEN CALL RESULTS – APRIL 2019
Applicant countries
34 SME profiles made
28 Eligible applications
11 SMEs initially selected
Batch 1
Batch 2
Now open for applications till end of February!
SME PILOT CALL
OPEN CALL FOR DATA PARTNERS
• The draft call description has been available on the website for public review since July. The pilot call opens on 1 September and closes on 15 September.
• Different types of grants (max. 100,000 euro):
  • Create new Data Transformation and Analytical Infrastructure
  • Revise Existing Data Transformation and Analytical Infrastructure
  • Inspect Completed Data Transformation and Analytical Infrastructure
• Data Partners from EU Member States and H2020 countries can apply through the online application portal.
For more information about the future Open Calls see the EHDEN website: www.ehden.eu
DATA PARTNER PILOT CALL RESULTS
Applicant countries
48 Data partner profiles made
28 Submitted applications
20 Data Sources selected
>170 million Patient Records
Hospital, GP, Registries, etc.
What type of Evidence do we generate?
OHDSI and EHDEN in Action
CLASSIFICATION BY SCIENTIFIC TASKS
Instead of defining health data science by its technical activities, e.g. management, processing, analysis, visualization, we should define the field by its scientific tasks:
1. Description -> Clinical Characterisation: What happened to them?
2. Prediction (inference) -> Patient-Level Prediction: What will happen to me?
3. Counterfactual Prediction (causal inference) -> Population-Level Effect Estimation: What are the causal effects?
Real-World Data is very valuable for all three of these Health Data Science tasks!
Hernán MA. A second chance to get causal inference right: A classification of data science tasks. Chance. 2019
Questions asked across the patient journey
• Clinical characterization
– Treatment utilization: Among patients with diabetes, which treatments are taken when?
– Natural history: Who has diabetes, and who takes metformin?
– Quality improvement: What proportion of patients with diabetes experience complications?
• Patient-level prediction
– Precision medicine: Given everything you know about me, now that I have started using metformin, what is the chance I will get lactic acidosis?
– Disease interception: Given everything you know about me, what is the chance I will develop diabetes?
• Population-level effect estimation
– Safety surveillance: Does metformin cause lactic acidosis?
– Comparative effectiveness: Does metformin cause lactic acidosis more than glyburide?
Goal of our work
• Develop transparent and fully reproducible analytical pipelines for all three scientific tasks
• Develop processes and tools to disseminate all the generated evidence
• Create an active community that collaboratively moves this field forward
• Train and educate all the stakeholders to maximally leverage the new paradigm
Power of a network of standardised data!
[Figure: panels for Type 2 Diabetes Mellitus, Hypertension and Depression across the databases OPTUM, GE, MDCD, CUMC, INPC, MDCR, CPRD, JMDC and CCAE.]
Clinical Characterization: Population-level heterogeneity across systems, and patient-level heterogeneity within systems
Population-Level Effect Estimation: Large-Scale Evidence Generation and Evaluation in a Network of Databases (LEGEND)
58 Outcomes, 9 databases
Comparisons of hypertension treatments
Not all analyses are valid
Journey of Patient-Level Prediction
An example of large-scale analysis enabled by data standardisation
Problem definition
Among a target population (T), we aim to predict which patients at a defined moment in time (t=0) will experience some outcome (O) during a time-at-risk. Prediction is done using only information about the patients in an observation window prior to that moment in time.
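The problem definition can be turned into labels and features with a short Python sketch. It is a minimal illustration under stated assumptions: the window lengths (time-at-risk from day 1 to day 365, a 365-day lookback), the function names and the data layout are hypothetical, and this is not how the PatientLevelPrediction R package is implemented. Design choices such as excluding patients with a prior outcome are left out.

```python
from datetime import date, timedelta


def build_labels(index_dates, outcome_dates, risk_start=1, risk_end=365):
    """Label each person in the target population T.

    index_dates:   {person_id: date of t=0}
    outcome_dates: {person_id: [dates on which outcome O was recorded]}
    Returns {person_id: 1 if O occurs in the time-at-risk window, else 0}.
    """
    labels = {}
    for person_id, t0 in index_dates.items():
        window_start = t0 + timedelta(days=risk_start)
        window_end = t0 + timedelta(days=risk_end)
        labels[person_id] = int(any(
            window_start <= d <= window_end
            for d in outcome_dates.get(person_id, [])
        ))
    return labels


def features_before_index(events, t0, lookback_days=365):
    """Keep only event codes observed in the window before t=0."""
    window_start = t0 - timedelta(days=lookback_days)
    return sorted({code for code, d in events if window_start <= d < t0})


# Toy target population with made-up dates.
index_dates = {1: date(2015, 1, 1), 2: date(2015, 6, 1), 3: date(2016, 3, 1)}
outcome_dates = {
    1: [date(2015, 3, 1)],   # inside the time-at-risk
    2: [date(2017, 1, 1)],   # after the time-at-risk
    3: [date(2016, 2, 1)],   # before t=0, so not a predicted outcome
}
labels = build_labels(index_dates, outcome_dates)
```

Keeping the feature window strictly before t=0 is what prevents information from the time-at-risk leaking into the predictors.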
Types of prediction problems in healthcare

Type: Disease onset and progression
Structure: Amongst patients who are newly diagnosed with <insert your favorite disease>, which patients will go on to have <another disease or related complication> within <time horizon from diagnosis>?
Example: Among newly diagnosed AFib patients, which will go on to have ischemic stroke in the next 3 years?

Type: Treatment choice
Structure: Amongst patients with <indicated disease> who are treated with either <treatment 1> or <treatment 2>, which patients were treated with <treatment 1> (on day 0)?
Example: Among AFib patients who took either warfarin or rivaroxaban, which patients got warfarin? (as defined for propensity score model)

Type: Treatment response
Structure: Amongst patients who are new users of <insert your favorite chronically-used drug>, which patients will <insert desired effect> in <time window>?
Example: Which patients with T2DM who start on metformin stay on metformin after 3 years?

Type: Treatment safety
Structure: Amongst patients who are new users of <insert your favorite drug>, which patients will experience <insert your favorite known adverse event from the drug profile> within <time horizon following exposure start>?
Example: Among new users of warfarin, which patients will have a GI bleed in 1 year?

Type: Treatment adherence
Structure: Amongst patients who are new users of <insert your favorite chronically-used drug>, which patients will achieve <adherence metric threshold> at <time horizon>?
Example: Which patients with T2DM who start on metformin achieve >=80% proportion of days covered at 1 year?
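The adherence example relies on the proportion of days covered (PDC): the number of days in the period on which the patient had drug supply, divided by the number of days in the period. A minimal Python sketch follows, assuming exposures are given as a start date plus days of supply and that overlapping supplies count once per calendar day; the function names and the toy prescriptions are hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(exposures, start, horizon_days=365):
    """Share of days in [start, start + horizon_days) with drug on hand.

    exposures: list of (first_day, days_supply) tuples; overlapping
    supplies are counted once per calendar day.
    """
    end = start + timedelta(days=horizon_days)
    covered = set()
    for first_day, days_supply in exposures:
        for offset in range(days_supply):
            day = first_day + timedelta(days=offset)
            if start <= day < end:
                covered.add(day)
    return len(covered) / horizon_days


def is_adherent(exposures, start, threshold=0.80, horizon_days=365):
    """True when PDC over the horizon meets the adherence threshold."""
    return proportion_of_days_covered(exposures, start, horizon_days) >= threshold


# Toy example: three prescriptions, the second overlapping the first.
start = date(2019, 1, 1)
exposures = [
    (date(2019, 1, 1), 90),
    (date(2019, 3, 22), 90),
    (date(2019, 7, 1), 120),
]
pdc = proportion_of_days_covered(exposures, start)
adherent = is_adherent(exposures, start)
```

The resulting true/false value per patient is the kind of outcome label a treatment-adherence prediction model would be trained on.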
Current status of predictive modelling
• Inadequate internal validation
• Small sets of features
• Incomplete dissemination of model and results
• No transportability assessment
• Impact on clinical decision making unknown
Relatively few prediction models are used in clinical practice
OHDSI aims to develop a systematic process to learn and evaluate large-scale patient-level prediction models using observational health data in a data network
OHDSI Mission for Patient-Level Prediction
Evidence Generation
Evidence Evaluation
Evidence Dissemination
Patient-Level Prediction
R-package
www.github.com/OHDSI/PatientLevelPrediction
• Vignettes
• Videos
• Online training material
Book-of-OHDSI: https://ohdsi.github.io/TheBookOfOhdsi/
Study Results: www.data.ohdsi.org
LARGE-SCALE PATIENT-LEVEL PREDICTION NOT THE FUTURE!
www.github.com/OHDSI/PatientLevelPrediction
Jenna M Reps, Martijn J Schuemie, Marc A Suchard, Patrick B Ryan, Peter R Rijnbeek; Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data, Journal of the American Medical Informatics Association, Volume 25, Issue 8, 1 August 2018, Pages 969–975, https://doi.org/10.1093/jamia/ocy032
Model Specification
Generate R-Package and share with the world
Share model and performance
Large scale analysis enables wide-spread dissemination
• The tool automatically generates a Word document containing all the model specifications, internal and external validation results, model details, etc., which serves as a kick-start for disseminating results.
• We can generate numerous visualisations of study results.
THE POWER OF DISRUPTIVE OPEN SCIENCE: THE STUDY-A-THON CONCEPT
Why do we seem to accept that answering important clinical questions takes a lot of time?
We have an obligation to be disruptive and push hard to change the current paradigm!!
This requires a team effort: no one has all the necessary competences (clinical knowledge, data source expertise, analytics, writing skills, etc.).
Why not bring them together at a nice location and focus!
OXFORD STUDY-A-THON
To compare the risk of post-operative complications and mortality between unicompartmental and total knee replacement.
WE CAN DO THIS IN ONE WEEK (STUDY-A-THON)??
Monday: Group consensus on the problem; draft cohort definitions
Tuesday: Review clinical characterisation; draft patient-level prediction design
Wednesday: Review patient-level prediction results; externally validate prediction model
Thursday: Draft population-level effect estimation design; review population-level effect estimation diagnostics
Friday: Review of results; plan for completing publications
“To compare the risk of post-operative complications and mortality between unicompartmental and total knee replacement.”
THE SECOND STUDY-A-THON IN BARCELONA
Different Location: Barcelona
Different Topic: Rheumatoid Arthritis
Different Team: RA Experts, Industry, Academia, Data Custodians
More data sources: 14
More countries: USA, Japan, Spain, The Netherlands, Estonia, UK, Germany, France, Belgium
Different approach:
Protocols were developed prior to the meeting and approved by the governance board, if applicable.
AIM: Submission of abstracts for European League Against Rheumatism (EULAR) and multiple publications
We will publish a video about this week soon!!
AN EXCITING JOURNEY AHEAD
The uptake of the OMOP CDM and the success of OHDSI enable the EHDEN project to build the European ecosystem that brings reliable evidence to our patients more quickly.
The EHDEN project is collaborating with OHDSI to further develop the CDM, Vocabularies, and analytical tools.
Discussions with the SNOMED community about collaboration are ongoing.
Expanding the Data Network, the Community, and the support system with SMEs will drive the sustainability of the ecosystem.
This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement No 806968. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA.
@IMI_EHDEN
IMI_EHDEN
www.ehden.eu
github.com/EHDEN
NEED MORE INFORMATION?
https://book.ohdsi.org
www.ohdsi-europe.org