Using EMR Data for Population Registries
Diana Gumas, JHMCIS Senior Director for Research Systems, EPR and EPR2020/Amalga
David Thiemann, Center for Clinical Data Analysis
Potential Data Uses
• Sample size estimates (aggregate data without IRB approval)
– Feasibility, grant applications, statistical planning
• Identifying patients for enrollment/recruitment
– By diagnosis, pathology, stage, labs, meds
• Identifying/creating matched study controls
• Obtaining current demographics (name, address) for mail solicitation
– From research list or by clinic, provider, clinical criteria
• Obtaining ongoing clinical + administrative data on a registry panel
– Labs, visits, procedures, immunizations, CPT/ICD9 codes, resource use
Possible research data sources
• EPR (JHH & JHBMC)
• Sunrise Clinical Manager (JHH – inpatient)
• Meditech (Bayview)
• Casemix Datamart
• GE Centricity (JHCP)
• EPR2020
• Departmental Systems (ED, OR, Anesthesia)
• Clinical Research Management System (CRMS)
• IDX (professional fees)
• Death Registry
Methods for Data Access
• Historical: researcher negotiates access with each clinical system/DBA
– Logistical nightmare, technical challenge
• Clinical Research Management System (CRMS)
– Study cohort with real-time links to enterprise data
• Center for Clinical Data Analysis (CCDA)
– Monthly/quarterly data extracts from designated systems
Clinical Research Management System (CRMS)
• 1,054 users
• 1,079 active studies
• 25,430 participants
Data available in CRMS
– eIRB
– EPR (patient demographics)
– Study participants / accruals
– Electronic Case Report Forms (in next 2-3 months)
Clinical Research Management System (CRMS)
Ways to extract data
– Canned reports
– Ad hoc querying using SQL
– Automated study-specific data extracts (possible with CCDA support)
EPR2020 Data for Researchers
Today (from EPR)
• 4.2M patients, 23.4M visits
• 12.3M documents, 6.8M radiology reports
• 25.6M lab results
• 1.5M problems, 2.2M medications, 140K allergies
Planned
• Bayview & JHCP data
• ICD9 diagnosis codes and CPT charges (IDX)
Future
• Death Registry
• Blood product data for transfusions
• Eclipsys SCM order data
• HMED (ED), ORMIS, eADR/Medivision
My Participant’s Lab Data
Reliable. Driven by the CRMS Participant Registry. Exportable.
Registry Cohort Discovery using EPR2020
A JHM investigator wants to find and enroll diabetic patients:
• Aged 45-65 years
• With hemoglobin A1C between 7 and 9%
• With serum creatinine < 2 mg/dl
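The criteria above can be sketched as a simple filter. This is a minimal Python illustration only, not the EPR2020 query interface: the `PatientSummary` record and its field names are hypothetical, the age and A1C bounds are assumed to be inclusive, and a patient with a missing lab value is assumed to be excluded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PatientSummary:
    """Hypothetical one-row-per-patient summary; field names are illustrative."""
    mrn: str
    age: int
    has_diabetes: bool
    hba1c_pct: Optional[float]         # most recent hemoglobin A1C, in percent
    creatinine_mg_dl: Optional[float]  # most recent serum creatinine, in mg/dl


def is_candidate(p: PatientSummary) -> bool:
    """Apply the example criteria; a missing lab value excludes the patient."""
    if not p.has_diabetes or not (45 <= p.age <= 65):
        return False
    if p.hba1c_pct is None or p.creatinine_mg_dl is None:
        return False
    return 7.0 <= p.hba1c_pct <= 9.0 and p.creatinine_mg_dl < 2.0


def find_cohort(patients: Iterable[PatientSummary]) -> List[str]:
    """Return the MRNs of patients who meet every criterion."""
    return [p.mrn for p in patients if is_candidate(p)]
```

Whether "most recent" is the right lab value to test, and how to treat missing results, are study decisions that belong in the protocol rather than in the query.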
Center for Clinical Data Analysis (CCDA)
Provides periodic (monthly/quarterly) bulk data extracts (delimited/flat files, .xls):
• Preliminary, anonymous data for feasibility, grant applications and statistical sample-size estimates
• IRB-approved case-finding for study enrollment (mailings, phone solicitation), chart review, and cohort/case-control studies
• Research data extracts - monthly/quarterly integrated extracts from EPR, POE, ORMIS, lab/PDS, billing systems, vaccination/transfusion/culture data, etc.
How CCDA works
• Email [email protected], cc: [email protected]; phone 410-955-65558 (Thiemann)
• For IRB-approved research:
– Provide full protocol + IRB approval
– Meet to discuss query methods, format
– Iterate, then schedule production runs (email extracts, Jshare)
– Cost: $100/hour
• For non-IRB projects (exploratory analyses, QI):
– Same process, cost subsidized by ICTR/JHM
– Do NOT let a QI project implicitly morph into IRB-regulated research
The Basics: Getting Clinical Data Into a Registry Database
• Real work, not ad hoc/bootstrap
• Need $$$ and FTE(s)
• Smart analyst(s) who know database technology and understand (or can learn) nuances of the sources and content domain
• Hands-on PI management/guidance
• Statistical liaison early, before database schema and ETL methods are set in stone
The Extract-Transform-Load process: Getting Clinical Data into Research DB
• Raw clinical/administrative data is useless for research
• Build an intermediate (staging) database
– Don’t do data management in SAS/Stata/Excel
• Data dictionary: derivation for each field
• Templated, tested, documented cleanup scripts/routines
• Intermediate tables: log each step/modification
– For each batch, be able to re-create data transform from scratch
– Version control, change control and documentation are vital
– Build data versioning into the database
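One way to picture the staging approach above is a small sketch using Python's built-in `sqlite3` module. The table names (`raw_lab`, `clean_lab`, `etl_log`), the columns, and the two cleanup steps are all hypothetical; the point is only that each step is logged with a script version and that a batch can be rebuilt from the untouched raw rows.

```python
import sqlite3
from datetime import datetime, timezone


def init_staging(conn: sqlite3.Connection) -> None:
    """Create hypothetical raw, cleaned, and log tables in the staging database."""
    conn.executescript("""
        CREATE TABLE raw_lab   (batch_id TEXT, mrn TEXT, test_name TEXT, result_text TEXT);
        CREATE TABLE clean_lab (batch_id TEXT, mrn TEXT, test_name TEXT, result_text TEXT);
        CREATE TABLE etl_log   (batch_id TEXT, step TEXT, script_version TEXT,
                                rows_affected INTEGER, run_at TEXT);
    """)


def run_step(conn, batch_id, step, script_version, sql, params=()):
    """Run one transform statement and record it in etl_log."""
    cur = conn.execute(sql, params)
    conn.execute(
        "INSERT INTO etl_log VALUES (?, ?, ?, ?, ?)",
        (batch_id, step, script_version, cur.rowcount,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def transform_batch(conn, batch_id, script_version="v1"):
    """Rebuild the cleaned rows for one batch from the raw rows, from scratch."""
    # Raw rows are never modified, so re-running gives the same cleaned result.
    run_step(conn, batch_id, "clear_previous", script_version,
             "DELETE FROM clean_lab WHERE batch_id = ?", (batch_id,))
    run_step(conn, batch_id, "copy_trimmed", script_version,
             """INSERT INTO clean_lab
                SELECT batch_id, TRIM(mrn), UPPER(TRIM(test_name)), TRIM(result_text)
                FROM raw_lab WHERE batch_id = ?""", (batch_id,))
```

The cleanup scripts themselves would still live in version control; `script_version` in the log only records which version produced a given batch.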
Transforming Data
• Raw data typically string (char/text) fields
• Unanalyzable characters (* < >, comments) still have meaning
– Put non-numeric data in separate field. Avoid numerical recoding (999)
• ~3% of pts have multiple/non-preferred MRNs
– Need 1-to-many link table
• Assays/reference ranges/coding changes
– Avoid using raw codes (CPT/ICD) in research db
– Map clinical codes to research terms
• Defer analytic assumptions. When recoding data, anticipate problems. Keep options open.
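The advice to keep non-numeric content in a separate field can be sketched as follows. This is a deliberately simplified Python illustration; the `ParsedResult` layout and the regular expression are assumptions, and real result strings will need source-specific rules. Anything the pattern cannot read safely is kept as text rather than recoded to a sentinel number such as 999.

```python
import re
from typing import NamedTuple, Optional


class ParsedResult(NamedTuple):
    value: Optional[float]  # numeric part, or None when nothing numeric was found
    qualifier: str          # "<", ">", "<=", ">=" or "" when absent
    comment: str            # leftover text (flags such as "*", free-text comments)
    raw: str                # the original string, always preserved


_PATTERN = re.compile(r"^\s*(<=|>=|<|>)?\s*(-?\d+(?:\.\d+)?)\s*(.*)$")


def parse_result(raw: str) -> ParsedResult:
    """Split a raw result string into numeric value, qualifier, and comment."""
    m = _PATTERN.match(raw)
    if m is None:
        return ParsedResult(None, "", raw.strip(), raw)
    qualifier, number, rest = m.groups()
    # Strings such as "1,200" or "1:64" are not plain numbers; keep them as text.
    if rest[:1] in {",", ".", ":", "/"}:
        return ParsedResult(None, "", raw.strip(), raw)
    return ParsedResult(float(number), qualifier or "", rest.strip(), raw)
```

For example, `parse_result("<0.5")` yields a value of 0.5 with qualifier "<", while `parse_result("hemolyzed")` yields no value and keeps the word as the comment. Keeping `raw` alongside the parsed fields defers the analytic decision about how to treat censored values.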
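The 1-to-many MRN link table mentioned above might look like the following sketch, again using `sqlite3`. The `patient` and `mrn_link` tables and their columns are hypothetical; the idea is that every MRN seen in any source, preferred or not, resolves to a single research patient record.

```python
import sqlite3
from typing import Optional


def init_mrn_link(conn: sqlite3.Connection) -> None:
    """Create a hypothetical patient table and a 1-to-many MRN link table."""
    conn.executescript("""
        CREATE TABLE patient  (patient_id INTEGER PRIMARY KEY,
                               preferred_mrn TEXT UNIQUE);
        CREATE TABLE mrn_link (mrn TEXT PRIMARY KEY,
                               patient_id INTEGER NOT NULL
                                   REFERENCES patient(patient_id));
    """)


def preferred_mrn(conn: sqlite3.Connection, mrn: str) -> Optional[str]:
    """Map any known MRN to the patient's preferred MRN, or None if unknown."""
    row = conn.execute(
        """SELECT p.preferred_mrn
           FROM mrn_link l JOIN patient p ON p.patient_id = l.patient_id
           WHERE l.mrn = ?""",
        (mrn,),
    ).fetchone()
    return row[0] if row else None
```

Joining source data through the link table, rather than on the MRN directly, keeps records filed under a non-preferred MRN from being silently dropped.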
More Data Transform Challenges
• NEVER trust raw data. Learn business logic of source system.
– CPTs morph annually, with internal complexity/redundancy
– Lab assays/reference ranges/terms change
– Parsing is inherently unreliable
– Administrative names/groups change (clinic #s, departments)
• Duplicate-value problems (labs, orders)
• System-attribution source/datetime (POE, lab)
• Always run an aggregate (“group by”) query to identify alternative names (e.g. lab name) and values (number, result) before transforming. Otherwise you’ll miss something.
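A "group by" profile of this kind can be sketched in a few lines of Python with `sqlite3`. The helper name `profile_column` and the table it is run against are hypothetical; the query simply lists every distinct value of a column with its frequency, so that variant spellings surface before any transform is written.

```python
import sqlite3
from typing import List, Tuple


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> List[Tuple]:
    """Return (value, count) pairs for one column, most frequent first."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = (f"SELECT {column}, COUNT(*) AS n FROM {table} "
           f"GROUP BY {column} ORDER BY n DESC, {column}")
    return conn.execute(sql).fetchall()
```

Running this on a lab-name column would show, for instance, whether the same assay appears under several names, each of which must be mapped explicitly.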
Understanding Business Logic
• Trust but verify: test coding accuracy
– Providers may habitually use imprecise/inaccurate diagnosis codes (especially in profee data)
– ICD9 procedure indications often a billing fiction
– Trained coders may make systematic errors
– Different content domains may have different standards (inpt vs outpt coders)
– Don’t infer/assume dependencies unless enforced by source system
• Run min/max queries, aggregates, outer joins
– Confirm date ranges, data ranges, relative proportions by year
• Don’t assume that null rows actually are empty. Maybe the query missed something
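The checks listed above can be sketched as three small queries. This Python/`sqlite3` illustration assumes hypothetical `patient(mrn)` and `visit(mrn, visit_date)` tables with dates stored as ISO `YYYY-MM-DD` text; none of these names come from a JHM system.

```python
import sqlite3


def date_range(conn: sqlite3.Connection):
    """Min/max check: do the extract's dates span the period that was requested?"""
    return conn.execute(
        "SELECT MIN(visit_date), MAX(visit_date) FROM visit").fetchone()


def visits_by_year(conn: sqlite3.Connection):
    """Aggregate check: a sudden drop in one year often means a missed source."""
    return conn.execute(
        """SELECT substr(visit_date, 1, 4) AS yr, COUNT(*)
           FROM visit GROUP BY yr ORDER BY yr""").fetchall()


def patients_without_visits(conn: sqlite3.Connection):
    """Outer-join check: list patients for whom the query returned no visits."""
    rows = conn.execute(
        """SELECT p.mrn
           FROM patient p LEFT JOIN visit v ON v.mrn = p.mrn
           WHERE v.mrn IS NULL
           ORDER BY p.mrn""").fetchall()
    return [r[0] for r in rows]
```

Patients returned by the last query are candidates for review: they may truly have no visits, or the extract may have missed them (for example, data filed under a different MRN).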
JHM Clinical Data Landscape: Past, Present and Future
Past: Babble of unintegrated systems
• EPR (antiquated technology, VSAM files, DB2) contains text, not queryable, analyzable data
Present: EPR2020 (aka Amalga) – integrated data!!
• Has everything in EPR, plus JHCP, plus gradually adding data from clinical/departmental/administrative systems (IDX CPTs, transfusion medicine, ORMIS, HMED, eADR, death registry, ad infinitum)
Future: ? Epic, ? JHM Data Warehouse
• Epic: One system replacing all major JHM systems
• JHH timeline: 4+ years
JHM Data Sources: Casemix Datamart
• Gold standard for JHM (non-profee) administrative data, including payer/insurance data
• Combines data from Keane (hospital charges), ADT (admission/discharge/transfer), HDM (ICD9 diagnosis + procedure coding), HSCRC (regulatory submissions)
• Not a true data warehouse; meager reconciliation
• Best source for length of stay, resource use, ICD9 diagnoses
• Outpatient ICD9s limited
• Has JHH + BMC + HCGH data
JHM Data Sources: IDX (profee)
• Gold standard for inpatient + outpatient CPT (profee charge) data
• ICD9 diagnosis data problematic
• Limitation: No data from non-faculty providers (private physicians, etc.)
• Difficult to query. Has a data warehouse, limited access.
• Early target for EPR2020/Amalga integration.
JHH Data Sources: SCM/POE
• Sunrise Clinical Manager/Provider Order Entry
• Replicated transactional database, difficult to query
• For registry purposes POE has large attribution/process challenges: Stutter-step orders, multiple alerts, imputed times
• Great source for inpatient meds, labs, physiologic monitor data
• No codified ICD9/Snomed/RxNorm data
• No outpatient data
JHH Data Sources: SCC/AIM
• Sunrise Critical Care (aka Emtek, Eclipsys). JHH ICUs + stepdown units + oncology
• AIM analytic database contains selected but comprehensive batch extract
• Sunsets as ICUs switch to POE ClinDoc
• Challenging to query. Lots of denormalized fields
JHH + BMC Data Sources: PDS
• PDS = Pathology Data Systems
• Includes lab, transfusion medicine, anatomic pathology, cytopath, John Boitnott’s death registry
• Lab data also available via EPR2020/Amalga and POE
BMC Data Sources: Meditech
• Shrink-wrapped, comprehensive inpatient + outpatient clinical + financial system
• Difficult for ad hoc research queries.
• Exports data to Datamart and EPR2020
• BMC-JHH patient linkage doable but difficult, needs caution
JHCP Data Sources: GE Centricity
• All clinical + administrative data for JHCP clinics
• Largely opaque to research query; JHCP sometimes collaborates directly, especially for its physician/investigators
• Early target for EPR2020/Amalga integration
• Linkage challenges to BMC and JHH MRNs
JHH Departmental Data: ORMIS + eADR/Medivision
• ORMIS: Operating Room Management Information System
• Mostly transactional scheduling/tracking/administrative data, limited clinical data.
• Has diagnoses, procedures, case start/stop times
• eADR/Medivision (anesthesia) still evolving, limited research data access
• Design challenges similar to legacy SCC critical-care system.
JHH Departmental Data: HMED (Emergency Department)
• Mostly opaque to research
• Replicated data hosted by Datamart