REAL-WORLD BIG DATA ANALYTICS
FOR UNDERSTANDING TREATMENT
PATHWAYS
John Cai
Director, Medical Informatics
June 11, 2015
Big Data for Pharma
Decision
making
The Fourth Hurdle requires Real
World Evidence from RWD
Cost-effectiveness (or CER) has become
the “fourth hurdle” to market access
Real World Big Data
Complexity
Variety: Unstructured data types, e.g. clinical notes
Volume: Massive data sets, e.g. longitudinal claims/EMR
Velocity: Fast, real-time data collection and transmission, e.g. HIE, wearables
Volume: Real World Population and
Real World Data
• Real World Evidence (RWE) evaluates safety, effectiveness and outcomes using real
world data (RWD).
• Not RCT data and broader than observational data, RWD is health data collected from
actual practice by healthcare providers or in day-to-day situations by patients or
caregivers
[Figure: #patients (100 to 10,000,000) across Phase 1, Phase 2, Phase 3, Phase 4, 5 yrs and 10 yrs, contrasting Typical Pharma Data with Real World Data. Populations shown: Randomized Clinical Trial Population, Observational Study Population, Real World Population.]
Variety: Major Real World Data
Types and Sources
• Claims (from payers or data vendors): Truven (MarketScan), IMS (PharMetrics),
United Health Group (Optum), Wellpoint, Aetna, Humana, CMS, ...
• EMR/EHR (from Healthcare providers or EMR vendors):
Nation-wide: VA, DoD, GE Centricity, Allscripts, Cerner, Humedica, Flatiron, etc…
Regional: Kaiser, Regenstrief, Partners, Mayo, Intermountain, Geisinger, ...
Academic: Harvard, Univ of Utah, Vanderbilt, Cincinnati Children's Hospital, ...
• Surveys and registries: NCHS (NHANES, NHIS, NAMCS , NHAMCS, NSAS,
NHDS, NNHS, NNAS, etc.), SEER registries, MEPS, ACC registries, ...
• PBM/Pharmacy Databases: Medco, Walgreens, CVS, Walmart, …
• Lab databases: Quest, Labcorp, …
• PHRs: patient portals, MS HealthVault™, Indivo X, CMS PHR Pilots, …
• Patient forums/social media: Patientslikeme, inspire.com, smartpatients.com…
• Monitoring/wearables: medical device data, Apple ResearchKit, …
[Figure: Pharma uses of real world data: CER, EBM, Proactive Pharmacovigilance, Trial Design & Interpretation, PHC, Cost Effectiveness, Drug Repositioning/New Indications, Patient recruitment.]
Velocity: Real World Data
Transmission to Pharma
[Figure: Payer/PBM and Real World Data, with question marks (“?”) on the links.]
Complexity, Variability, Veracity
• Patient journeys are complex
• Real-world treatment pathways can be messy
• Physicians not following clinical practice guidelines
• Patients not adhering to medications
Treatment pathways are difficult to reconstruct using healthcare data:
• Technical hurdles: need to repeatedly query and merge across a large number of tables
• Conceptual hurdles of secondary use
- Claims are recorded for transactions
- EMRs are recorded for patient care
• Use business rules to translate data to events of interest
- Example: ndMM patient cohort
  One inpatient diagnosis or two outpatient diagnoses (on two separate dates), from a list of ICD9 codes
  One or more MM-specific treatments, from a list of drugs and procedures
  First diagnosis: “index date”
  At least 6 or 12 months of continuous coverage before the index date
  At least 12 or 24 months of continuous coverage after the index date
- What is a therapy line?
- What is a drug switch, discontinuation, add-on, combo, “drug holiday”?
• Addresses some parts of the conceptual challenge
• Creates new problems
- How sensitive are our results to the rule definitions?
Typical Solutions
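The ndMM cohort rules could be written down roughly as follows. This is only a sketch under stated assumptions: the ICD9 code set, the drug list, the `Diagnosis` record and every function name are hypothetical placeholders, since the slides do not give the actual lists or data model. Making the coverage windows parameters is one way to probe how sensitive results are to the rule definitions.

```python
import calendar
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative placeholders only; the slides do not give the actual code or drug lists.
MM_ICD9_CODES = {"203.00", "203.01", "203.02"}
MM_TREATMENTS = {"lenalidomide", "bortezomib"}


@dataclass(frozen=True)
class Diagnosis:
    service_date: date
    setting: str  # "inpatient" or "outpatient"
    icd9: str


def shift_months(d, months):
    """Move a date by whole calendar months, clamping the day to the month's end."""
    year, month0 = divmod(d.year * 12 + d.month - 1 + months, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def find_index_date(diagnoses):
    """Return the first MM diagnosis date if the diagnosis rule is met, else None."""
    mm = [d for d in diagnoses if d.icd9 in MM_ICD9_CODES]
    inpatient = any(d.setting == "inpatient" for d in mm)
    outpatient_dates = {d.service_date for d in mm if d.setting == "outpatient"}
    if inpatient or len(outpatient_dates) >= 2:
        return min(d.service_date for d in mm)
    return None


def continuously_covered(spans, start, end):
    """True if the enrollment spans cover [start, end] with no uncovered day."""
    merged = []
    for s, e in sorted(spans):
        if merged and s <= merged[-1][1] + timedelta(days=1):
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return any(s <= start and e >= end for s, e in merged)


def in_cohort(diagnoses, treatments, spans, months_before=6, months_after=12):
    """Apply the diagnosis, treatment and continuous-coverage rules together."""
    index = find_index_date(diagnoses)
    if index is None or not (set(treatments) & MM_TREATMENTS):
        return False
    return continuously_covered(
        spans, shift_months(index, -months_before), shift_months(index, months_after)
    )


# Example patient (made-up data).
dx = [
    Diagnosis(date(2014, 1, 10), "outpatient", "203.00"),
    Diagnosis(date(2014, 2, 5), "outpatient", "203.00"),
]
coverage = [(date(2013, 1, 1), date(2013, 12, 31)), (date(2014, 1, 1), date(2015, 6, 30))]
print(in_cohort(dx, ["lenalidomide"], coverage))                   # 6 months before, 12 after
print(in_cohort(dx, ["lenalidomide"], coverage, months_after=24))  # stricter follow-up window
```

In this made-up example the same patient qualifies under the 12-month follow-up rule but drops out under the 24-month rule, which is the kind of rule sensitivity the slide asks about.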
Potential Technical Solution:
Hadoop and MapReduce
• Hadoop: an open source software project
- Hadoop Distributed File System (HDFS)
- MapReduce: compute paradigm for parallel computing
- A whole ecosystem of additional products/services/tools
• History:
- 2003 Google File System paper
- 2004 Google MapReduce paper
- Adopted by Yahoo, donated to the open source community in 2009
• The gist of it:
- Distributed file system, “cheap” storage on computer clusters
- Compute paradigm that abstracts the parallelism by breaking down
operations to “map” and “reduce”
- Hadoop framework takes care of everything else
MapReduce in a Nutshell
• Mappers work on data and “emit” key-value pairs
• Shuffle-Sort: intermediary data is sorted and distributed by key
• Reducer works on all values (data) for the same key
• We write Mappers and Reducers; Hadoop takes care of everything else
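To make the three phases concrete, here is a minimal single-process imitation in Python. It is not Hadoop and uses no Hadoop API; `run_map_reduce`, `cost_mapper` and `total_cost_reducer` are hypothetical names chosen for illustration, and the records are made up. On a real cluster the framework would run the map and reduce calls in parallel and perform the grouping step across machines.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Single-process imitation of the map, shuffle-sort and reduce phases."""
    # Map: each record may emit zero or more (key, value) pairs.
    emitted = [pair for record in records for pair in mapper(record)]

    # Shuffle-sort: group values by key (Hadoop does this across the cluster).
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)

    # Reduce: one call per key, with all values for that key.
    return {key: reducer(key, grouped[key]) for key in sorted(grouped)}


def cost_mapper(claim):
    """Emit (patient ID, cost) for one claim record."""
    patient_id, cost = claim
    yield patient_id, cost


def total_cost_reducer(patient_id, costs):
    """Sum all costs that share the same patient ID."""
    return sum(costs)


claims = [("p1", 100), ("p2", 80), ("p1", 250)]  # made-up (patient ID, cost) records
print(run_map_reduce(claims, cost_mapper, total_cost_reducer))  # {'p1': 350, 'p2': 80}
```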
• Load data into HDFS
- “Transactional” data (claims, interactions)
• Reconstructing a patient’s timeline is a textbook MapReduce exercise:
- Mapper:
Read a piece of data. Example: claim
Figure out who it relates to. Example: patient ID
Return key-value pairs:
Key: patient ID
Value: the full piece of information (claim)
- Reducer:
Receives as input a key (patient ID) and the set of all values (claims) associated with that key
Organizes the values (claims) to produce a basic patient history
Building Patient Timelines using Hadoop and MapReduce
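The mapper and reducer described on this slide might look like the sketch below. It runs in a single Python process rather than on Hadoop, and it assumes a hypothetical pipe-delimited claim layout (`patient_id|service_date|code`) with made-up codes; real claims files have many more fields. The sort-and-group step stands in for Hadoop's shuffle-sort.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Read one claim and emit (patient ID, full claim)."""
    patient_id, service_date, code = line.strip().split("|")
    yield patient_id, {"patient_id": patient_id, "date": service_date, "code": code}


def reducer(patient_id, claims):
    """Order all claims for one patient by date to form a basic history."""
    # ISO-8601 date strings sort chronologically as plain text.
    return patient_id, sorted(claims, key=itemgetter("date"))


def build_timelines(lines):
    """Run map, then a stand-in shuffle-sort, then reduce, all in memory."""
    pairs = [pair for line in lines for pair in mapper(line)]
    pairs.sort(key=itemgetter(0))  # stand-in for Hadoop's shuffle-sort
    return dict(
        reducer(pid, [claim for _, claim in group])
        for pid, group in groupby(pairs, key=itemgetter(0))
    )


# Made-up claims in the assumed patient_id|service_date|code layout.
raw_claims = [
    "p2|2014-03-01|RX_B",
    "p1|2014-02-01|RX_A",
    "p1|2014-01-15|DX_A",
]
timelines = build_timelines(raw_claims)
for pid, history in timelines.items():
    print(pid, [(c["date"], c["code"]) for c in history])
```

Keeping the whole claim as the value, as the slide suggests, means the reducer has everything it needs to build the history without a second pass over the data.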
Building Patient Timelines using MapReduce Followed by Visual Analytics
[Figure: Mapper, Shuffle-Sort (“Hadoop magic”), Reducer]
Treatment Cost Trends
Cost analysis of PsA and PsO treatments
Biologics treatment costs have been high and are rising
Presented as posters at AMCP and ISPOR 2015
Co-medication Usage
Treatment Pathways
Patient timelines - “individual story”
Future Directions
Cost of care analysis, comparing across different pathways
Healthcare resource utilization analysis, comparing across different pathways
Patterns of care analysis: predictive modeling combining patient similarity measures and clustering
Comparison to Clinical Practice Guidelines (Compliance and Adherence)
Outcomes of care/CER: incorporating clinical outcomes using integrated claims/EMR data
Some Learning Points
• Some Hadoop functionality is perfectly suited for patient timeline analysis
- MapReduce for creating patient timelines
- Once patient timelines are created, everything else scales linearly
- Map(reduce) for calculating patient metrics and complex events
- MapReduce for analyzing treatment pathways
• Cheap scalable storage capacity and compute power
- Scalability allows robust analysis
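As an illustration of why later steps are cheap once timelines exist, the sketch below derives a treatment pathway per patient (a map over timelines) and counts how often each pathway occurs (a reduce). The rule used here, a new step whenever the drug changes, is a deliberate simplification and an assumption of this example; the slides leave the definition of a therapy line open. Function names and data are hypothetical.

```python
from collections import Counter


def pathway(timeline):
    """Collapse consecutive repeats: a new step begins whenever the drug changes."""
    steps = []
    for _, drug in sorted(timeline):  # (ISO date, drug) pairs, ordered by date
        if not steps or steps[-1] != drug:
            steps.append(drug)
    return tuple(steps)


def count_pathways(timelines):
    """Map each patient timeline to a pathway, then count identical pathways."""
    return Counter(pathway(t) for t in timelines.values())


# Made-up timelines keyed by patient ID.
example_timelines = {
    "p1": [("2014-01-01", "A"), ("2014-02-01", "A"), ("2014-03-01", "B")],
    "p2": [("2014-01-05", "A"), ("2014-04-01", "B")],
    "p3": [("2014-02-01", "B")],
}
for path, n in count_pathways(example_timelines).most_common():
    print(" -> ".join(path), n)
```

Each patient is processed independently, so the per-patient step involves one pass over that patient's timeline and parallelizes naturally.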
Healthcare Decision Making Requires
Real-world Big Data Analytics
Efficacy and Safety from RCT settings – FDA to approve
Cost effectiveness – Payer's willingness to pay
Clinical effectiveness (long-term efficacy and safety) – Physicians to prescribe, patients to adhere
Comparative effectiveness, patient-reported outcomes – Physicians to prescribe, patients to adhere
[Figure: To Innovate, To Approve, To Pay for, To Prescribe, To Adhere; stakeholders shown: Industry, FDA, Health Plan, IDS, Government, Physician, Patient]
Forthcoming
Thank You!
Leveraging Hadoop MapReduce in Building Patient Timelines and Analyzing
Health Resource Utilization
Special Issue on Big Data in Pharmacoeconomics
Saar Golde, Ph.D., Knowledgent Group and NYU
Zhaohui “John” Cai, M.D. Ph.D., Celgene Corporation