Post on 25-Jan-2016
1
Using Electronic Medical Records for Research: Practical Issues and Implementation Hurdles
Prakash M. Nadkarni MD
2
Benefits of EMRs
Most of the data that you want is often in the EMR
  Sample Size Analyses
  Cohort identification/recruitment
  Detail Data
You can implement many research-related workflows
  Appointment scheduling enables interventions at the patient's convenience.
3
EMRs don't do everything
Even Epic warns you about the need to interoperate with software designed specifically for clinical research (CRIS = Clinical Research Information System).
Even CRISs are sub-specialized: project management/finance, grant management workflows, federal paperwork (FDA Investigational New Drug applications), general or specialized data capture (e.g., patient diaries, adaptive questionnaires).
4
Challenge: No Study Calendar
Not all patients are enrolled at the same time.
Specific evaluations or interventions are done at specific time points ("events") relative to the start of participation in the study (or some arbitrary point, e.g., working backwards from a scheduled MRI scan).
Each time point may have a permissible range or window (e.g., “6-mth follow up” may occur between 5-7 months).
Given a protocol/study calendar, a CRIS will *generate* a provisional patient calendar.
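A minimal sketch of how a provisional patient calendar could be generated from a protocol with visit windows. This is not how any particular CRIS does it: the class name, field names, the 30-day month approximation and the example enrollment date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProtocolEvent:
    name: str
    offset_days: int        # target day relative to the anchor date
    window_before: int = 0  # days the visit may occur early
    window_after: int = 0   # days the visit may occur late


def provisional_calendar(anchor, protocol):
    """Return (event name, earliest, target, latest) rows for one patient.

    `anchor` is the patient's own start date (or another reference point,
    such as a scheduled scan, in which case offsets may be negative).
    """
    rows = []
    for ev in protocol:
        target = anchor + timedelta(days=ev.offset_days)
        rows.append((
            ev.name,
            target - timedelta(days=ev.window_before),
            target,
            target + timedelta(days=ev.window_after),
        ))
    return rows


# Hypothetical protocol; a month is approximated as 30 days,
# so "5-7 months" becomes a target of day 180 with a 30-day window each side.
PROTOCOL = [
    ProtocolEvent("Baseline", 0),
    ProtocolEvent("6-mth follow up", 180, window_before=30, window_after=30),
]

ENROLLED = date(2016, 3, 1)  # hypothetical enrollment date
calendar = provisional_calendar(ENROLLED, PROTOCOL)
for name, earliest, target, latest in calendar:
    print(f"{name}: {earliest} to {latest} (target {target})")
```

Because each patient has a different anchor date, the same protocol yields a different calendar per patient, which is the piece an EMR scheduling module does not supply on its own.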
5
Study Calendar (2)
The protocol is worked out based on information yield of the evaluation and expected rate of change in the parameters evaluated, evaluation cost and patient risk. An Event-CRF Cross-Table enforces consistency.
CRISs use "Unscheduled" events to deal with emergency conditions.
An entire set of reports is calendar-driven, e.g., scheduled events, missing forms, out-of-range visits.
In Epic, the closest to Calendar functionality is the Chemotherapy module (Beacon).
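A rough sketch of a calendar-driven "missing forms" report built on an Event-CRF cross-table. The event names, form names and the shape of the `received` structure are invented for illustration and do not reflect any specific CRIS schema.

```python
# Hypothetical Event-CRF cross-table: which case report forms (CRFs)
# the protocol expects at each event.
EVENT_CRF = {
    "Baseline": {"Demographics", "Vitals", "Labs"},
    "6-mth follow up": {"Vitals", "Labs"},
}


def missing_forms(event_crf, received):
    """Map each event to the sorted list of expected forms not yet received.

    `received` maps event name -> set of form names already entered
    for one patient. Events with nothing missing are left out.
    """
    report = {}
    for event, expected in event_crf.items():
        missing = expected - received.get(event, set())
        if missing:
            report[event] = sorted(missing)
    return report


# Hypothetical patient: two baseline forms entered, follow-up not yet seen.
received = {"Baseline": {"Demographics", "Vitals"}}
report = missing_forms(EVENT_CRF, received)
print(report)
```

The same cross-table can be checked in the other direction (a form filed against an event where the protocol does not call for it), which is one way such a table enforces consistency.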
6
Non-adherence to Standards
If a vendor ignores national/international controlled terminology standards, data pooling in cross-institutional collaborations is difficult.
  For procedures, Epic does not use Current Procedural Terminology (CPT). Instead, procedures are identified by idiosyncratic abbreviations created by hurried users, which are hard to interpret except by those users and vary across institutions.
7
Standards Challenges (2)
Of the 15,000 laboratory tests in our instance of Epic, only about 8% have currently been mapped to the Logical Observation Identifiers Names and Codes (LOINC) vocabulary.
Sometimes the same procedure or lab test is defined more than once in a master table.
  The definitions are unhelpful, and one must look at the actual data to determine which are used, e.g., a histogram showing the number of tests performed over a period of time, and the maximum and minimum values.
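A small sketch of that kind of data profiling: for each test definition, count results per year and record the minimum and maximum values, so that a retired or never-used duplicate stands out. The test identifiers and result rows below are made up; real rows would come from the results table.

```python
from collections import Counter, defaultdict
from datetime import date


def profile_tests(rows):
    """Summarize usage per test definition.

    `rows` is an iterable of (test_id, performed_on, numeric_value).
    Returns {test_id: {"n", "per_year", "min", "max"}}.
    """
    by_test = defaultdict(list)
    for test_id, performed_on, value in rows:
        by_test[test_id].append((performed_on, value))

    profile = {}
    for test_id, observations in by_test.items():
        values = [value for _, value in observations]
        profile[test_id] = {
            "n": len(observations),
            "per_year": dict(Counter(d.year for d, _ in observations)),
            "min": min(values),
            "max": max(values),
        }
    return profile


# Invented rows for two hypothetical definitions of the same test.
ROWS = [
    ("GLU_OLD", date(2009, 5, 1), 92.0),
    ("GLU_NEW", date(2014, 3, 2), 101.0),
    ("GLU_NEW", date(2015, 7, 9), 88.0),
    ("GLU_NEW", date(2015, 11, 20), 140.0),
]

for test_id, summary in profile_tests(ROWS).items():
    print(test_id, summary)
```

Definitions that never appear in the results do not show up in the profile at all, which is itself a useful signal when compared against the master table.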
8
Redundancy and heterogeneity
The data may have been stored more than once, and in different ways, in different parts of the medical record.
  BMI is recorded in two different places.
"Uncontrolled" local terminologies
  Flowsheets where blood pressure is recorded redundantly as text "124/82". (Not in UIHC, fortunately.)
  Procedure and lab definition lists are also semi-controlled.
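When a value such as blood pressure exists only as text, it has to be parsed before it can be analyzed. A minimal sketch, assuming the text is a plain "systolic/diastolic" pair; anything else is returned as `None` for manual review rather than guessed at. The function name and pattern are illustrative.

```python
import re

# Two or three digits, a slash, two or three digits; surrounding
# whitespace is tolerated. Anything else is treated as unparseable.
BP_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*$")


def parse_blood_pressure(text):
    """Return (systolic, diastolic) as ints, or None if the text does not match."""
    match = BP_PATTERN.match(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_blood_pressure("124/82"))
print(parse_blood_pressure("see nursing note"))
```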
9
Duplicate Elements
Pseudo-redundancy: Subtly different data elements that are given the same label in the user interface
  Baby's birth weight is recorded both at the time of delivery and at the time of admission to a NICU. The two are not semantically the same: with interventions, the former may be significantly more (or less) than the latter.
10
“Wrong” structure
Much data (discharge summaries, etc.) is stored as text, requiring human abstraction or natural language processing (NLP).
NLP is not 100% accurate, requiring sensitivity and specificity to be traded off. It is especially hard with progress notes that are replete with abbreviations and that may have little grammatical structure.
Much of the published NLP work relies on idiosyncrasies of a particular dataset (e.g., the use of Epic templates) to achieve higher accuracy, and is not always generalizable.
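To make the sensitivity/specificity trade-off concrete, here is a small sketch that scores a hypothetical NLP classifier against human-abstracted labels at two decision thresholds. The scores and labels are invented; the function assumes both positive and negative cases are present.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold means 'positive'.

    `labels` holds the reference (human-abstracted) truth as booleans.
    Assumes at least one positive and one negative case.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        predicted_positive = score >= threshold
        if is_positive and predicted_positive:
            tp += 1
        elif is_positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented classifier scores and reference labels for six notes.
SCORES = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
LABELS = [True, True, False, True, False, False]

for threshold in (0.35, 0.7):
    sens, spec = sensitivity_specificity(SCORES, LABELS, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up numbers, the lower threshold catches every true case at the cost of a false positive, while the higher threshold removes the false positive but misses a true case.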
11
The Needle in the Haystack
Epic schema contains several thousand tables; many unused, or with empty fields.
  Incomplete or out-of-date documentation.
  The first time, one may spend more time locating a particular data element than actually pulling it out.
Persons doing data extraction need to add value by providing signposts and tips, to help others who have to do the same task later.
Even with a data warehouse, this problem will recur as long as data definitions are suboptimal.
12
Real-time cohort identification must be done judiciously
"Best Practice Alerts" can be a resource drain on responsiveness of systems.
Do you really need real-time subject identification? Would a 24-hour delay be acceptable?
  ICU-related clinical studies; transfusion in preemies.
13
Transforming the Data
The form in which data is recorded in the EMR is not necessarily the form in which it is most conveniently analyzed or reported.
Registries often require creating derived variables
  Converting numerical data into categories, e.g., binning children by birth weight
  Converting numeric values or existence/absence of data into Yes/No: Is the bilirubin > 5 mg/dl? Did the neonate receive nitric oxide inhalation for pulmonary hypertension?
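A sketch of the derived variables mentioned above. The birth-weight cut points used here are the commonly cited 1000 g, 1500 g and 2500 g thresholds; a given registry may define its bins differently, and matching medication names by substring is a deliberate simplification.

```python
def birth_weight_category(grams):
    """Bin a birth weight in grams (cut points are an assumption; check the registry)."""
    if grams < 1000:
        return "ELBW"   # extremely low birth weight
    if grams < 1500:
        return "VLBW"   # very low birth weight
    if grams < 2500:
        return "LBW"    # low birth weight
    return "Normal or above"


def bilirubin_over_5(mg_per_dl):
    """Yes/No flag for bilirubin > 5 mg/dl; None when no value was recorded."""
    if mg_per_dl is None:
        return None
    return "Yes" if mg_per_dl > 5 else "No"


def received_nitric_oxide(medication_names):
    """Yes if any administration record mentions nitric oxide, else No."""
    found = any("nitric oxide" in name.lower() for name in medication_names)
    return "Yes" if found else "No"


print(birth_weight_category(1200))
print(bilirubin_over_5(6.2))
print(received_nitric_oxide(["Inhaled Nitric Oxide 20 ppm"]))
```

Note the difference between the last two: a missing bilirubin value is unknown, whereas the absence of any nitric oxide record is itself taken to mean "No".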
14
Interfacing with statistical software
Before: sample size, randomization
After: Analysis, fitting to models
Some CRISs (e.g., REDCap, TrialDB) will output SAS/SPSS-formatted data files, with definitions for all variables (including enumerations for all categorical variables; SAS has a command called PROC FORMAT for categorical data). EMRs still lag.
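As an illustration of what "definitions for categorical variables" means in practice, the sketch below turns a code-to-label dictionary into PROC FORMAT text. It is not the output of REDCap or TrialDB; the format names and codes are hypothetical, and only numeric codes are handled (SAS character formats use a `$` prefix, which is omitted here).

```python
def proc_format(formats):
    """Build SAS PROC FORMAT text from {format_name: {numeric_code: label}}."""
    lines = ["proc format;"]
    for format_name, codes in formats.items():
        lines.append(f"  value {format_name}")
        for code, label in codes.items():
            escaped = label.replace("'", "''")  # SAS doubles embedded single quotes
            lines.append(f"    {code} = '{escaped}'")
        lines.append("  ;")
    lines.append("run;")
    return "\n".join(lines)


# Hypothetical enumerations for two categorical variables.
FORMATS = {
    "yesno": {0: "No", 1: "Yes"},
    "bwcat": {1: "ELBW", 2: "VLBW", 3: "LBW", 4: "Normal or above"},
}

sas_code = proc_format(FORMATS)
print(sas_code)
```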
15
Data Warehouse
A database that is optimized for fast query, preferably by end-users, without interactive updates
Solves some problems, but not others
  More homogeneous structure, i.e., a handful of tables rather than thousands.
  However, the problem of locating variables of interest doesn't go away. With indifferent documentation of the variables, the problem of hunting for variables of interest is transferred from the concierge/analyst to the end-user, which may worsen the problem.
16
Special Challenges in EMR Data Interpretation/Reliability
Data entry errors in source data, often a consequence of “copy and paste”.
Coding of categorical variables does not accommodate nuances in the medical history or diagnostic findings.
Depending on the source, billing data may have been up-coded (Humana).
Outcome data may be lacking: absence of return visit data may simply mean that the patient failed to improve and went elsewhere.
17
Special Challenges (2)
Data fragmentation: especially where healthcare is provided by separate institutions.
Data is observational: treatments and exposures are not assigned randomly.
  Confounding bias: socioeconomic factors might lead patients to use suboptimal treatments.
  Selection/sampling bias: atypical demographic attributes of the cohort whose data you are seeing may limit the inferences that you can make about the general population.
18
Frontiers: Genetic Data
There are no technical barriers to the incorporation of limited genetic data for an individual (e.g., SNPs or specific mutations) in structured (i.e., readily analyzable) form.
  Major current issue is the limited understanding of genetic data and definitions by EMR vendors.
Whole-genome is still a long way off. A single record would be larger than the bulk of existing non-image EMR data.
19
Conclusions
None of the challenges are insurmountable, but they take a lot of effort and resources to address.
Most of the fixes are long-term, involving:
  Manual mapping to controlled vocabulary terms
  Change in processes
  Maintaining descriptive documentation that must continually be checked for usability and currency.
20
Further Reading
Masys DR, et al. Technical desiderata for the integration of genomic data into Electronic Health Records. J Biomed Inform. 2012 Jun;45(3):419-22.
Nadkarni, Ohno-Machado and Chapman. Natural Language Processing: A Tutorial. Journal of the American Medical Informatics Association, 2011. PMC3168328.
Hoffman & Podgurski. “Big, bad data”. Journal of Law, Medicine and Ethics (2013) 41:1, pp 56-60. http://www.ncvhs.hhs.gov/130430b6.pdf
21
Questions?