Artificial Intelligence and Machine Learning: Innovations in Clinical Trial Data Automation
Presented by SDC
Richard B. Abelson, Ph.D. | President & CEO
Dale W. Usner, Ph.D. | Sr. VP, Strategic Scientific Consulting & CSO
DIA 2019 Innovation Theater | June 24, 2019
Revolution or Evolution
• The “Japanese Post-War Economic Miracle”
• Six-sigma and lean manufacturing
• A focus on quality, reduced touch time, and automation of repetitive tasks means the average vehicle has increased less than 30% in cost since 1970 on an inflation-adjusted basis (Energy.gov, 2016).
EVOLUTION
01010111 01100101 01101100 01100011 01101111 01101101 01100101 00100000 01110100 01101111 00100000 01000100 01001001 01000001
What Does This Mean For Us?
Through process automation, we can achieve:
• Higher Quality Data
• at a Lower Cost
• in Less Time
Leading to:
• Higher ROI on R&D
• Increased Profitability
• Better Patient Care
Understanding Automation
Level    Description
Level 0  System issues warnings
Level 1  The driver and system share control (cruise control, parking assistance, lane control)
Level 2  “Hands off”: The automated system takes control fully but is closely monitored by the driver
Level 3  “Eyes off”: The driver may do something else and the system will notify the driver if involvement is needed; the driver must be ready to intervene immediately
Level 4  “Mind off”: The driver may go to sleep
Level 5  “Steering wheel is optional”: No human intervention is required in any circumstance
The Case of Self-Driving Cars
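The levels in the table can be written down as a small data structure, which makes it easy to ask how much human oversight a given level still needs. This is only a sketch of the analogy; the names `AutomationLevel`, `needs_continuous_monitoring`, and `needs_human_on_call` are hypothetical and not part of the presentation.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    """Automation levels from the self-driving-car table."""
    WARNINGS_ONLY = 0     # system issues warnings
    SHARED_CONTROL = 1    # driver and system share control
    HANDS_OFF = 2         # system in control, closely monitored by the driver
    EYES_OFF = 3          # driver is notified when involvement is needed
    MIND_OFF = 4          # driver may go to sleep
    FULLY_AUTONOMOUS = 5  # no human intervention in any circumstance


def needs_continuous_monitoring(level: AutomationLevel) -> bool:
    """True when a human must watch the system the whole time (Levels 0-2)."""
    return level <= AutomationLevel.HANDS_OFF


def needs_human_on_call(level: AutomationLevel) -> bool:
    """True when a human must at least be ready to step in (Levels 0-3)."""
    return level <= AutomationLevel.EYES_OFF
```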
Artificial Intelligence and Machine Learning
• Artificial Intelligence (AI): AI is a general term for any computer system that simulates intelligent behavior
• Machine Learning (ML): ML is an application of AI in which the system learns automatically and improves based on observed data
• Similarity Models: Methods for determining how similar two pieces of text are
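A text similarity score can be illustrated with Python's standard library. The sketch below uses `difflib.SequenceMatcher` as a stand-in; the presentation does not say which similarity method was used, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(raw_name: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to raw_name, with its score."""
    return max(
        ((candidate, name_similarity(raw_name, candidate)) for candidate in candidates),
        key=lambda pair: pair[1],
    )


# The raw name "Ethnic" is matched against a few SDTM variable names.
match, score = best_match("Ethnic", ["SEX", "RACE", "ETHNIC", "AGE"])
```

Here `match` is `"ETHNIC"` with a score of 1.0, because the two names are identical once case is ignored.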
Applications in Clinical Trials
Patient Compliance
AI/ML Based Software as a Medical Device
Drug Discovery
Patient Recruitment
Applications in Clinical Data Sciences
Development of draft CRF and EDC visit schedule
Trending in trial data and key performance and quality indicators
EDC user acceptance testing
SDTM Mapping and aCRF generation
SDTM Mapping and aCRF Generation
• Standard specification and format for submitting data to the FDA
• Requires converting the format of collected clinical data and creating dataset structure (identifier) and trial design variables
• Requires annotating the CRF to show the SDTM-mapped variables
• These requirements increased the SAS programming resources needed for a study by approximately 20%
• Can AI and ML make this process more efficient?
AI/ML SDTM Auto Mapping Process
EDC, Labs, & Other Data
Raw Study data downloaded
Compile Raw Clinical Data
AI/ML SDTM Auto Mapping Process
EDC, Labs, & Other Data
Raw Study data downloaded
AI/ML Process
Get all saved study data already converted to SDTM
Train learning models and measure accuracy based on SDTM variable values
Measure name similarity
Baseline SDTM term prediction
Predict the SDTM Variable
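The steps on this slide (learn from values already mapped to SDTM, measure name similarity, predict the SDTM variable, measure accuracy) can be sketched in a few lines. This is a deliberately simple stand-in and not the presenters' model: the "learning" is just a lookup of values seen in earlier studies, the two scores are averaged with equal weight as an assumption, and `TRAINING_VALUES` holds illustrative values rather than real study data.

```python
from difflib import SequenceMatcher

# Hypothetical training data: values seen under each SDTM variable in
# previously mapped studies (illustrative only).
TRAINING_VALUES = {
    "SEX": {"M", "F", "MALE", "FEMALE"},
    "RACE": {"WHITE", "ASIAN", "BLACK OR AFRICAN AMERICAN"},
    "ETHNIC": {"HISPANIC OR LATINO", "NOT HISPANIC OR LATINO"},
}


def value_score(raw_values: set[str], known_values: set[str]) -> float:
    """Share of the raw values already seen under an SDTM variable."""
    if not raw_values:
        return 0.0
    normalized = {value.strip().upper() for value in raw_values}
    return len(normalized & known_values) / len(normalized)


def name_score(raw_name: str, sdtm_name: str) -> float:
    """Similarity of the raw variable name to the SDTM variable name."""
    return SequenceMatcher(None, raw_name.upper(), sdtm_name.upper()).ratio()


def predict_sdtm_variable(raw_name: str, raw_values: set[str]) -> tuple[str, float]:
    """Return the SDTM variable with the highest averaged score, and that score."""
    scored = {
        sdtm: (value_score(raw_values, known) + name_score(raw_name, sdtm)) / 2
        for sdtm, known in TRAINING_VALUES.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


def accuracy(labeled_examples: list[tuple[str, set[str], str]]) -> float:
    """Fraction of (raw name, raw values, true SDTM variable) examples predicted correctly."""
    hits = sum(
        predict_sdtm_variable(name, values)[0] == truth
        for name, values, truth in labeled_examples
    )
    return hits / len(labeled_examples)
```

For example, `predict_sdtm_variable("GENDER", {"Male", "Female"})` selects `SEX` on the strength of the values even though the names differ, which is the reason for combining both signals.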
Collection of Race in Demographics
AI Application Deliverable
[Figure: annotated Demographics CRF. SDTM annotations shown on the form:]
DM.BRTHDTC
DM.AGE
DM.SEX
DM.ETHNIC
DM.RACE
SUPPDM.QVAL WHEN SUPPDM.QNAM = ‘RACEOTH’
Predict the SDTM Variable
Raw EDC Data (one checkbox variable per race):
RACE_AMERICAN  RACE_ASIAN  RACE_BLACK  RACE_HAWAIIAN  RACE_WHITE  RACE_OTHER
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
FALSE          TRUE        FALSE       FALSE          FALSE       FALSE
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
TRUE           FALSE       FALSE       FALSE          FALSE       FALSE
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
FALSE          FALSE       FALSE       FALSE          TRUE        FALSE
FALSE          FALSE       TRUE        FALSE          FALSE       FALSE
Raw EDC Data (single RACE variable) and SDTM Mapped Data, same nine records:
RACE (raw)                        RACE (SDTM)
White                             WHITE
Asian                             ASIAN
White                             WHITE
White                             WHITE
White                             WHITE
American Indian or Alaska Native  AMERICAN INDIAN OR ALASKA NATIVE
White                             WHITE
White                             WHITE
Black or African American         BLACK OR AFRICAN AMERICAN
Predict the SDTM Variable
Study with a single Race variable:
Domain  Study ID     Raw Variable Name  SDTM Variable Name  Variable Label  Probability of Match  Model Method
DM      ###-##-####  Sex                SEX                 Sex             100%                  ML & Similarity
DM      ###-##-####  Race               RACE                Race            100%                  ML & Similarity
DM      ###-##-####  Ethnic             ETHNIC              Ethnicity       89%                   ML & Similarity
Study with one checkbox variable per race:
Domain  Study ID     Raw Variable Name                                                                SDTM Variable Name  Variable Label  Probability of Match  Model Method
DM      ###-##-####  Sex                                                                              SEX                 Sex             100%                  ML & Similarity
DM      ###-##-####  Race_American, Race_Asian, Race_Black, Race_Hawaiian, Race_White, Race_Other     RACE                Race            73%                   Similarity
DM      ###-##-####  Ethnic                                                                           ETHNIC              Ethnicity       98%                   ML & Similarity
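One way to use the "Probability of Match" column is to send low-probability rows to a human reviewer. The sketch below assumes a hypothetical 90% cut-off and hypothetical names (`MappingPrediction`, `needs_review`, `REVIEW_THRESHOLD`); none of these come from the presentation. The rows are taken from the second table above.

```python
from dataclasses import dataclass


@dataclass
class MappingPrediction:
    domain: str
    raw_variable: str
    sdtm_variable: str
    probability: float  # the "Probability of Match" column, as 0..1
    method: str         # the "Model Method" column


REVIEW_THRESHOLD = 0.90  # hypothetical cut-off, not from the presentation


def needs_review(prediction: MappingPrediction) -> bool:
    """Flag predictions whose probability falls below the review threshold."""
    return prediction.probability < REVIEW_THRESHOLD


predictions = [
    MappingPrediction("DM", "Sex", "SEX", 1.00, "ML & Similarity"),
    MappingPrediction("DM", "Race_American..Race_Other", "RACE", 0.73, "Similarity"),
    MappingPrediction("DM", "Ethnic", "ETHNIC", 0.98, "ML & Similarity"),
]

flagged = [p.sdtm_variable for p in predictions if needs_review(p)]
```

With this cut-off only the RACE row, matched by name similarity alone at 73%, would be flagged.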
AI/ML SDTM Auto Mapping Process
EDC, Labs, & Other Data
Raw Study data downloaded
Validate to SDTM standards by domain
Check CDISC code list for non-extensible variables
Calculate derived fields
AI/ML Process
Get all saved study data already converted to SDTM
Train learning models and measure accuracy based on SDTM variable values
Measure name similarity
Baseline SDTM term prediction
Reference Documents
• CDISC Study Data Tabulation Model Implementation Guide
• CDISC SDTM Controlled Terminology
• Pinnacle21
Derive Fields and Validation to SDTM Standards
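As an example of a derived field, AGE can be calculated from the birth date (BRTHDTC) and a reference date. This is a minimal sketch; which reference date applies is study-specific and is an assumption here, as are the function names.

```python
from datetime import date


def derive_age(birth_date: date, reference_date: date) -> int:
    """Completed years between birth_date and reference_date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def derive_age_from_iso(brthdtc: str, reference_dtc: str) -> int:
    """Same calculation from complete ISO 8601 dates such as '1970-06-24'."""
    return derive_age(date.fromisoformat(brthdtc), date.fromisoformat(reference_dtc))
```

Partial dates (for example a year-only birth date) are not handled and would need their own rule.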
Validation and Derived Fields
CDISC code list values
Code    Codelist Code  Codelist Name  CDISC Submission Value
C74457                 Race           RACE
C41259  C74457         Race           AMERICAN INDIAN OR ALASKA NATIVE
C41260  C74457         Race           ASIAN
C16352  C74457         Race           BLACK OR AFRICAN AMERICAN
C41219  C74457         Race           NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
C41261  C74457         Race           WHITE
Example
Similarity Model: Raw Variable Race Values to Mapped CDISC Value
race      CDISC Submission Value
american  AMERICAN INDIAN OR ALASKA NATIVE
asian     ASIAN
black     BLACK OR AFRICAN AMERICAN
hawaiian  NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER
white     WHITE
other     NOT IN LIST
Domain  Study ID     Raw Variable Name  SDTM Variable Name  Variable Label  Probability of Match  Model Method
DM      ###-##-####  Sex                SEX                 Sex             100%                  ML & Similarity
DM      ###-##-####  Race               RACE                Race            100%                  ML & Similarity
DM      ###-##-####  Ethnic             ETHNIC              Ethnicity       89%                   ML & Similarity
RACE
American Indian or Alaska Native
Asian
Black or African American
Native Hawaiian or Other Pacific Islander
White
Other
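Checking raw values against a non-extensible code list can be sketched as below. The code list and the short-code lookup are copied from the tables on this slide; the fuzzy fallback uses `difflib.get_close_matches` with an assumed cut-off of 0.8 and stands in for the similarity model, whose actual method the presentation does not describe. The function name `map_race` is hypothetical.

```python
from difflib import get_close_matches

# Non-extensible RACE code list (CDISC submission values from the slide).
RACE_CODELIST = [
    "AMERICAN INDIAN OR ALASKA NATIVE",
    "ASIAN",
    "BLACK OR AFRICAN AMERICAN",
    "NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER",
    "WHITE",
]

# Short raw codes and their mapped values, as shown in the example table.
RAW_TO_SUBMISSION = {
    "american": "AMERICAN INDIAN OR ALASKA NATIVE",
    "asian": "ASIAN",
    "black": "BLACK OR AFRICAN AMERICAN",
    "hawaiian": "NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER",
    "white": "WHITE",
}

NOT_IN_LIST = "NOT IN LIST"


def map_race(raw_value: str) -> str:
    """Map a raw race value to a CDISC submission value, or NOT_IN_LIST."""
    key = raw_value.strip().lower()
    if key in RAW_TO_SUBMISSION:
        return RAW_TO_SUBMISSION[key]
    # Fall back to fuzzy matching against the code list (assumed cut-off 0.8).
    close = get_close_matches(raw_value.strip().upper(), RACE_CODELIST, n=1, cutoff=0.8)
    return close[0] if close else NOT_IN_LIST
```

Because the code list is non-extensible, a value such as "other" stays flagged as not in the list instead of being added as a new term.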
AI/ML SDTM Auto Mapping Process
EDC, Labs, & Other Data
Raw Study data downloaded
Validate to SDTM standards by domain
Check CDISC code list for non-extensible variables
Auto-create new SDTM study data sets by domain
Auto-create annotated Case Report Form with SDTM term next to raw variable term
Calculate derived fields
AI/ML Process
Get all saved study data already converted to SDTM
Train learning models and measure accuracy based on SDTM variable values
Measure name similarity
Baseline SDTM term prediction
Reference Documents
• CDISC Study Data Tabulation Model Implementation Guide
• CDISC SDTM Controlled Terminology
• Pinnacle21
Create SDTM Data Sets and Annotate CRF
Annotate CRF
[Figure: annotated Demographics CRF. SDTM annotations shown on the form:]
DM.BRTHDTC
DM.AGE
DM.SEX
DM.ETHNIC
DM.RACE
SUPPDM.QVAL WHEN SUPPDM.QNAM = ‘RACEOTH’
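The annotation text shown on the CRF follows a regular pattern, so it can be generated from the mapping results. The sketch below is an assumption about how that might look: the CRF field labels, the `crf_fields` structure, and the function names are hypothetical, while the annotation formats match those shown on the slide (with straight quotes).

```python
def annotation(domain: str, variable: str) -> str:
    """Standard annotation: domain and SDTM variable joined by a dot."""
    return f"{domain}.{variable}"


def supp_annotation(domain: str, qnam: str) -> str:
    """Annotation for a value stored in the supplemental qualifier data set."""
    return f"SUPP{domain}.QVAL WHEN SUPP{domain}.QNAM = '{qnam}'"


# Hypothetical mapping output: CRF field -> (domain, SDTM variable, is_supplemental)
crf_fields = {
    "Date of Birth": ("DM", "BRTHDTC", False),
    "Age": ("DM", "AGE", False),
    "Sex": ("DM", "SEX", False),
    "Ethnicity": ("DM", "ETHNIC", False),
    "Race": ("DM", "RACE", False),
    "Race, Other (specify)": ("DM", "RACEOTH", True),
}

annotated = {
    field: supp_annotation(domain, variable) if is_supp else annotation(domain, variable)
    for field, (domain, variable, is_supp) in crf_fields.items()
}
```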
Lessons Learned for Implementing AI/ML
• Start Focused
  ✓ Define a homogeneous set of training data
    • Consistent version of SDTM Implementation Guide and Controlled Terminology
    • Same EDC system
    • SDTM implemented by the same company
  ✓ Choose a subset of Domains
• Look Beyond the Actual Measure Results
  ✓ Important information resides in the dataset and variable names
• Train Data Scientists on the Manual Process
Key Takeaways
• AI is any computer system that simulates intelligent behavior.
  ✓ Applications: patient recruitment, compliance, drug discovery, clinical data.
• You do not need to reach Level 5 automation to see the benefits of AI.
• Small investments in AI-driven automation can get you to Level 2 or 3.
  ✓ Minimal oversight and occasional human intervention
  ✓ Higher quality data, less time, less cost
• This drives shorter overall cycle times to meaningful therapeutics.
Visit SDC at Booth #1239