Overview & Update of UCSF Clinical Data and Tools for Research
10/8/2018
Rick Larsen, Director of Research Informatics, Academic Research Systems, UCSF IT
Clinical Data on the Move
2018 Clinical Data Colloquium for Research
Clinical Decision Support: Intelligent clinical devices, Learning Health Systems
Patterns and Predictions: Information Commons
Retrospective Analytics: EMR, Data Marts, Data Warehouses
Agenda
Overview of Current UCSF Clinical Research Data Landscape
• Including what’s new/changed in the last 18 months
What clinical data are available at UCSF for research?
UCSF electronic medical record (EMR) data:
• APeX data dating back to 2012
• STOR data dating back to 1982
• Images
Plus additional data, such as:
• Geocoded address data
• CA Death Registry data
• ZSFG and other Department of Public Health data
• UC Health data (EMR data from UC Davis, UC Irvine, UCLA, UCSD, UCSF) for patient counts across UC Health
Current UCSF Clinical Research Data Landscape
Note: CDW = Clinical Data Warehouse
UCSF Tools that Support Counts
RDB (Research Data Browser)
• Cohort exploration
• De-ID record drill down
PatientExploreR
• Cohort exploration
• De-ID record drill down
• Potential replacement of RDB
UC ReX Data Explorer
• i2b2 interface
• Cohorts across the UCs
UCSF Tools that Support Counts
Tool | UCSF Data | UC Health Data Available | Drill Down to Row-Level Details | Notes
RDB | Yes | No | Yes | Evaluating replacement options (options include PatientExploreR, TriNetX, …)
UC ReX | Yes | Yes | No | Part of the X-UC CDW project is to introduce a new tool
PatientExploreR | Yes | No | Yes | Initial availability on Information Commons
De-Identified Data Sets
Tool | UCSF Data | UC Health Data Available | Drill Down to Row-Level Details | Notes
RDB – Flat Files | Yes | No | Yes | Looking to sunset in 2019
De-Identified CDW | Yes | No | Yes | Details in following slides
Information Commons | Yes | No | Yes | Details in following section
Identified Data Sets Availability & Services
Clarity
• Relational DB representation of the Epic EMR DB
• ~13,000 Tables
• ~110,000 Columns
Caboodle Clinical Data Warehouse (CDW)
• ~170 Tables
• ~2,700 Columns
• Data from Epic and other sources
ZSFG
• Multiple Legacy Systems
X-UC CDW
• Coming Soon for Research
Identified Data Sets Availability and Services
Data | Date Ranges | Notes
Epic EHR Data | 2012 - Current | Clarity & CDW
STOR Historical Data | 1982 - 2012 | CDW
ZSFG EHR Data | 1990 - Current | ZSFG legacy systems. Moving to Epic
X-UC CDW | 2012 - Current | Cross-UC Centralized CDW
• Delivery of UCSF & ZSFG Data
  • CTSI-ARS partnership to provide Centralized Honest Broker Services
  • Significant improvements in time of delivery
  • Limited resources and expanding demand
  • data.ucsf.edu to request data
• X-UC CDW (Cross UC, includes 5 UC Health EMRs)
  • Governance and procedures in the works
Getting to the Right Data can be Challenging
Tracing Data back to the Source System
The source data is focused on Clinical Care and Operations (Not Research)
Data elements that look similar but differ in meaning (e.g., multiple diagnosis codes)
It often takes a team effort
…
What’s been added to UCSF data for research in the last 18 months
STOR Data going back to 1982, now available
Ambulatory medical record system created at UCSF
~3,000,000 Patients
Demographics, encounters, diagnosis, procedures
Other clinical information (e.g. Lab System)
Physician-generated clinical data (e.g. Problem Lists)
Clinical notes (Dictated into STOR)
Geocoded Data now available
• ~6 million addresses extracted from the UCSF EHR
• 87% geocoded
  • (x,y) coordinates on map
  • Census tract/block, etc.
• Census tract/block/zip data added back into CDW
Use Cases Supported:
• Describe geographic distribution of different medical conditions
• Analyze neighborhood factors that might cause disease or contribute to disparities
• Look for place-based opportunities for interventions (e.g., schools, churches, community centers)
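The first use case above (geographic distribution of a condition) can be sketched as a distinct-patient count per census tract. This is only an illustration, not the CDW schema: the field names `patient_id`, `diagnosis`, and `census_tract`, the tract labels, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def patients_per_tract(rows, condition):
    """Count distinct patients with `condition` in each census tract.

    `rows` is an iterable of dicts with hypothetical keys
    patient_id, diagnosis, census_tract (None when not geocoded).
    """
    patients = defaultdict(set)
    for row in rows:
        tract = row.get("census_tract")
        if tract is None or row["diagnosis"] != condition:
            continue  # skip non-geocoded addresses and other conditions
        patients[tract].add(row["patient_id"])
    return {tract: len(ids) for tract, ids in patients.items()}

# Made-up sample rows, for illustration only
rows = [
    {"patient_id": 1, "diagnosis": "asthma", "census_tract": "T1"},
    {"patient_id": 1, "diagnosis": "asthma", "census_tract": "T1"},
    {"patient_id": 2, "diagnosis": "asthma", "census_tract": "T2"},
    {"patient_id": 3, "diagnosis": "diabetes", "census_tract": "T1"},
    {"patient_id": 4, "diagnosis": "asthma", "census_tract": None},
]
print(patients_per_tract(rows, "asthma"))
```

Counting distinct patients rather than rows keeps repeat encounters from inflating a tract's total, and rows without a tract are skipped because only part of the address data is geocoded.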
California Death Registry Data
We now have date of death for 163,000+
• Prior to this we had ~16,000
CA Death Registry date added as a new field in the CDW
Monthly Updates
Use Cases Supported: Exclude from Trial Recruitment
Availability in upcoming release of the De-ID CDW (Dates shifted)
Request death dates as part of an IRB-approved Clinical Data Request (Working on getting official permission from the CDPH)
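The trial-recruitment use case amounts to filtering a candidate list against recorded dates of death. A minimal sketch, assuming a hypothetical mapping from patient ID to date of death; the IDs and the date below are invented and do not reflect the CDW field layout.

```python
from datetime import date

def recruitable(candidate_ids, death_dates):
    """Return candidates with no recorded date of death.

    `death_dates` maps a hypothetical patient ID to a date of death;
    an ID is absent when no death record was matched.
    """
    return [pid for pid in candidate_ids if death_dates.get(pid) is None]

# Invented example data
death_dates = {"P2": date(2017, 3, 14)}
print(recruitable(["P1", "P2", "P3"], death_dates))
```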
Introducing the UCSF De-Identified Clinical Data Warehouse (De-ID CDW)
De-Identified Clinical Data Warehouse (CDW)
Starts with UCSF CDW (Caboodle)
Removal of all Protected Health Information (PHI)
Safe Harbor Approach
Structured data only (not notes)
Monthly Updates
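As a rough illustration of the idea, and not the actual UCSF pipeline, the sketch below drops identifier fields from a structured record and shifts its dates by a per-patient offset (the Death Registry slide mentions shifted dates). The field list, the record layout, the secret, and the one-year shift window are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical direct-identifier fields; the actual list is not in the slides.
IDENTIFIER_FIELDS = {"name", "mrn", "address", "phone"}

def patient_offset(patient_key, secret, max_days=365):
    """Derive a stable per-patient backward shift of 1..max_days days."""
    digest = hmac.new(secret.encode(), patient_key.encode(), hashlib.sha256).digest()
    days = int.from_bytes(digest[:4], "big") % max_days + 1
    return timedelta(days=-days)

def deidentify(record, secret):
    """Drop identifier fields and shift every date by the patient's offset."""
    offset = patient_offset(record["mrn"], secret)
    out = {}
    for field, value in record.items():
        if field in IDENTIFIER_FIELDS:
            continue  # identifiers never reach the output
        out[field] = value + offset if isinstance(value, date) else value
    return out

# Invented example record
record = {
    "name": "Jane Doe", "mrn": "000123",
    "admit_date": date(2015, 1, 10), "discharge_date": date(2015, 1, 14),
    "diagnosis": "J45.909",
}
clean = deidentify(record, secret="example-secret")
print(sorted(clean))
print((clean["discharge_date"] - clean["admit_date"]).days)
```

Using one offset for all of a patient's dates preserves the intervals between events, which is usually what retrospective analyses need, while hiding the true calendar dates.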
What's in De-ID CDW vs. Existing RDB
RDB: Contained a subset of CDW data from mid-2012 to present
• Clinical data for ~1,000,000 patients
De-ID CDW: Contains all of the data that is in RDB plus:
• Historical data going back to 1982 (~3,000,000 patients)
• Medication Administration & Dispensed
• Provider Data
• ED visit data
• Patient Registry data (APeX Registries)
• Surgical Episode detailed data (including Surg Procedures, supplies, …)
• And Much More…
Having a fully De-ID CDW allows users to do research against the full row level data set
When will it be available?
Q4 2018
It will eventually replace the Research Data Browser (RDB)
Audit of the data to certify de-identification happening now
UCSF policies for internal use and external sharing are being developed
Improvements in our Tools to use Data
MATLAB Site Wide License – Success!
MATLAB is a programming toolkit optimized for complex math, including image processing/vision systems, signal processing, statistics, text analysis, and machine learning.
• UCSF now has unlimited user licenses (single user, classroom, server)
• Additional Tool Boxes available
• Full Support from MathWorks
• Thanks to existing UCSF MATLAB Community, Library and IT
Self Service Analytics (Tableau)
• UCSF Site Wide License
• Dashboards & Visualizations
• Environment stood up
• Testing underway
• Training Materials
• Initial Availability at the end of the year
• Learn more at data.ucsf.edu
MyResearch – What’s New?
Significant Performance Improvements
• Hardware replacement completed. More CPU/RAM and Storage
• Microsoft Remote Desktop Services implemented to deliver applications faster and simpler.
Newly introduced application: Stata/MP (multiprocessor)
All other applications constantly updated/patched.
Grown to 1400+ Studies
MyResearch is the UCSF secure hosting environment for sensitive data, with web-based management, collaboration tools, and research software (SAS, Stata, R, MATLAB, and more)
LEARN MORE! Afternoon Breakout Track 1, Session 1
REDCap – What’s New in the last 12 months?
Three major releases
• Repeating Forms and Instruments
• External Modules: Extend REDCap functionality and customize base functionality
• Smart Variables: Dynamic variables that can be used in calculated fields, conditional/branching logic, and piping.
Doubled the size of the ARS REDCap team for support!
LEARN MORE! Afternoon Breakout Track 1, Session 2
REDCap (Research Electronic Data Capture) is a free secure web application for building and managing online surveys and databases.