4/18/2018
Copyright © Daniel Rubin and Imon Banerjee 2018 1
Frontiers of AI in Medical Imaging:
Daniel L. Rubin and Imon Banerjee
Department of Biomedical Data Science
Department of Radiology
Stanford School of Medicine
Overcoming Current Challenges and Moving Beyond Classification
Outline
1. Need for image interpretation beyond image classification
2. Overcoming the challenge of insufficient training data
3. Integrating multiple data types with images
4. Making AI clinical predictions and providing visualizations for explanation
Image classification: Medical imaging
“Benign or cancer lesion?”
[Figure: example lesion images labeled Benign and Cancer]
Copyright © Stanford University 2018
There are other important medical needs beyond image classification…
Key medical applications beyond classification
1. Disease detection
2. Lesion segmentation
3. Treatment selection
4. Response assessment
5. Clinical prediction (of response or future disease)
People (and their diseases) differ…
Disease varies in different people
• Molecular diversity: heterogeneous genomic aberration landscape of individual tumors
• Phenotypic diversity: variable appearance of lesions on images
• Clinical diversity: patients respond differently to treatment
[Figure: molecular subtypes (Proneural, Neural, Classical, Mesenchymal) across 116 GBM patients]
The TCGA Research Network. Cancer Cell. 2010
“Precision Medicine”
• Patient care often lacks specificity (“One size fits all” does not usually apply in medicine)
• There are “subtypes” of disease (e.g., many types of “breast cancer” needing specific therapy for each type)
• Precise diagnoses based on “electronic phenotyping” and molecular profiling enable treatments tailored to the unique characteristics of each patient
• Requires accurate methods of prediction based on disease phenotypes
› Key opportunity for Big Data and AI methods
“Precision Health”
• A paradigm shift, focusing on prediction and prevention, rather than relying exclusively on diagnosis and treatment of existing disease
• Prevents or forestalls the development of disease
• Reduces costs and morbidity and improves patient care
• Requires accurate methods of prediction based on monitoring people’s health status
› Key opportunity for Big Data and AI methods
These are prediction problems…
The explosion in electronically accessible medical records data provides an opportunity to learn models to help with these prediction problems
Growth in electronic patient data
Source: https://www.healthit.gov/sites/default/files/data-brief/2014HospitalAdoptionDataBrief.pdf
Data-driven, precision medicine
Mine medical data to create AI models
Precision medicine: match patients to treatments
Precision health: predict future disease
Deploy AI models to aid medical decision making in people
Computerized model: Integration and phenotype extraction
Integrating various types of data (e.g., images + clinical notes) is needed for disease detection, diagnosis, treatment response evaluation, and clinical prediction.
[Figure: data sources (molecular characterization; laboratory and clinical testing; radiology imaging and reports; pathology imaging/reports) feed radiomics and deep learning models for clinical decision support]
Deep learning: Image classification
Input image → low-level representation (edges, shapes, object parts) → high-level representation (recognizable objects) → output label
Modified after slide by Jeff Dean, Google
Unsupervised, learned quantitative features:
• High-level abstractions of image features via hierarchical, non-linear transformations
• Higher-level features (layers) are defined from lower-level ones and represent higher levels of abstraction
• Most suitable for classification problems
Beyond classification: Annotation of images
• To build classifiers for images, we need to collect many annotated images
• Creating large annotated image datasets is costly
• Scalable NLP methods applied to the radiology reports associated with images could generate image annotations in large volume
• Word embeddings are a promising NLP strategy for this purpose
Beyond classification: Prediction
• Disease in patients evolves over time (longitudinally)
• Patient data (images and text reports/notes) are acquired longitudinally
• Prediction models need to account for longitudinal data inputs (e.g., RNN, LSTM)
Deep learning with text: Word embeddings
CBOW Neural Network Model
• Simple neural network with a single hidden layer and a linear activation function
• Skip-gram: input is w_i; output is its context, e.g., w_{i-1}, w_{i-2}, w_{i+1}, w_{i+2} (“predicting the context given a word”)
• Continuous bag-of-words (CBOW): input is the context, e.g., w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}; output is w_i (“predicting the word given its context”)
• Unsupervised learning from large corpora; word vectors (embeddings) are learned by stochastic gradient descent
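The difference between the two word2vec training objectives can be made concrete by looking at how training pairs are built from a sentence. This is a minimal sketch (the sentence, window size, and whitespace tokenizer are illustrative, not part of the original method):

```python
def training_pairs(tokens, window=2):
    """Build (context, target) pairs for CBOW and (target, context)
    pairs for skip-gram from one tokenized sentence."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        # context = up to `window` words on each side of position i
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        cbow.append((context, target))              # CBOW: context -> word
        skipgram += [(target, c) for c in context]  # skip-gram: word -> context
    return cbow, skipgram

tokens = "no acute intracranial hemorrhage seen".split()
cbow, sg = training_pairs(tokens)
print(cbow[2])  # (['no', 'acute', 'hemorrhage', 'seen'], 'intracranial')
```

The network then learns to predict the target from the averaged context vectors (CBOW) or each context word from the target (skip-gram), and the hidden-layer weights become the word vectors.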
Identifying core terms from unstructured narrative text
Unsupervised deep learning algorithms can discover annotations from text without the need to supply specific domain knowledge.
[Figure: word embedding learned by deep learning (4,442 words) projected in two dimensions]
Imon Banerjee, JDI 30:506-518, 2017
Outline
1. Need for image interpretation beyond image classification
2. Overcoming the challenge of insufficient training data
3. Integrating multiple data types with images
4. Making AI clinical predictions and providing visualizations for explanation
Radiological image annotation: leveraging clinical notes
• Each PACS database contains millions of images “labeled” in the form of unstructured notes.
• Why not use the notes to annotate the images?
• Unstructured free text cannot be directly interpreted by a machine because of the ambiguity and subtlety of natural language.
• How can we extract the semantic information from the clinical notes?
[Figure: CT image with its radiologist’s note]
Representing radiology notes for classification
f(x) = y?
What is the best representation of the document x for classification by a machine?
Dictionary-based approach
• Constructs a pattern dictionary (template)
• Extracts and structures clinically relevant information from textual radiology reports
• Translates the information to terms in a controlled vocabulary
MedLEE – Friedman et al. (1995)
Rule-based systems
• Create annotations for text that matches pre-defined regular-expression patterns
• Unlike the dictionary-based method, rule-based methods use several general rules instead of a dictionary to extract information from text
• The regular expressions are usually organized into rule files
Bozkurt et al. Using automatically extracted information from mammography reports for decision-support. 2016
Unsupervised feature extraction (BoW and tf-idf)
Suppose the document collection has n distinct words. Each document is characterized by an n-dimensional vector whose i-th component is the frequency (or tf-idf weight) of word i.
Example:
Report 1: [No tumor seen]
Report 2: [Large tumor detected]
W (vocabulary) = [No, tumor, seen, Large, detected] (n = 5)
Report_V1 = [1,1,1,0,0]
Report_V2 = [0,1,0,1,1]
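The two-report example above can be reproduced in a few lines; this sketch uses a simple whitespace tokenizer and term frequencies (no tf-idf weighting):

```python
def bow_vectors(docs):
    """Bag-of-words: map each document to an n-dim term-frequency vector."""
    vocab = []
    for doc in docs:
        for w in doc.split():
            if w not in vocab:      # vocabulary in order of first appearance
                vocab.append(w)
    vectors = [[doc.split().count(w) for w in vocab] for doc in docs]
    return vocab, vectors

vocab, vecs = bow_vectors(["No tumor seen", "Large tumor detected"])
print(vocab)  # ['No', 'tumor', 'seen', 'Large', 'detected']
print(vecs)   # [[1, 1, 1, 0, 0], [0, 1, 0, 1, 1]]
```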
Word embedding + classification model
• Stores each word as a point in space, represented by a vector with a fixed number of dimensions
• Unsupervised; built just by reading a huge corpus
• Embeddings can be used as features to train a supervised model with only a small subset of annotations
[Figure: corpus → word embedding → document embedding → classifier → positive/negative document classification]
Mikolov et al., Distributed representations of words and phrases and their compositionality
Word2Vec
Limitations of existing NLP methods
Dictionary-based and rule-based NLP systems:
• Limited coverage and generalizability
• Require tremendous manual labor to construct the dictionary or rule files
Unsupervised feature extraction (BoW and tf-idf):
• Encodes every word in the vocabulary as a one-hot vector, but a clinical vocabulary may run into millions of words
• Vectors corresponding to words with the same context are orthogonal
• Does not consider the order of words in a phrase
Word embedding techniques:
• Assign random vectors to out-of-vocabulary (OOV) words, even morphologically similar ones
• Not directly suitable for radiology, since synonyms and related words are used widely and some words may appear only infrequently in a large corpus
Intelligent Word Embedding: pipeline
Copyright © Stanford University 2018
Ontocrawler: generation of domain dictionary
• Created an ontology crawler using SPARQL that grabs the sub-classes and synonyms of domain-specific terms from the NCBO BioPortal
• Generates a focused dictionary for each domain of radiology
• Example: {‘apoplexy’, ‘contusion’, ‘hematoma’, ...} → ‘hemorrhage’
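Once the crawler has produced a focused dictionary, applying it is a simple token-level mapping. A minimal sketch (the dictionary entries here are illustrative stand-ins for what Ontocrawler would retrieve from BioPortal):

```python
# Hypothetical focused dictionary of the kind Ontocrawler builds from
# NCBO BioPortal sub-classes and synonyms (entries are illustrative).
SEMANTIC_MAP = {
    "apoplexy": "hemorrhage",
    "contusion": "hemorrhage",
    "hematoma": "hemorrhage",
}

def normalize(tokens):
    """Replace each token by its domain concept before embedding,
    so rare synonyms collapse onto one well-trained vector."""
    return [SEMANTIC_MAP.get(t, t) for t in tokens]

print(normalize(["small", "hematoma", "noted"]))  # ['small', 'hemorrhage', 'noted']
```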
Context-dependent document vector creation
For document vector creation we also used: (I) averaging, (II) max pooling, and (III) min pooling over the word vectors.
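The three pooling operations reduce a variable-length list of word vectors to one fixed-size document vector. A minimal NumPy sketch (the 2-D toy vectors are illustrative):

```python
import numpy as np

def document_vector(word_vectors, mode="average"):
    """Combine per-word embeddings into one document vector."""
    m = np.asarray(word_vectors)
    if mode == "average":
        return m.mean(axis=0)   # (I) averaging
    if mode == "max":
        return m.max(axis=0)    # (II) max pooling
    if mode == "min":
        return m.min(axis=0)    # (III) min pooling
    raise ValueError(mode)

words = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
print(document_vector(words, "average"))  # [2. 0.]
print(document_vector(words, "max"))      # [3. 2.]
print(document_vector(words, "min"))      # [ 1. -2.]
```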
IWE for classifying hemorrhage risk — Goals
Study 1: CT head exam reports
1. Extract 10,000 CT head reports from the Stanford hospital repository.
2. Annotate 1,188 reports with hemorrhage risk.
3. Tailor the popular word2vec method to the medical domain, particularly radiology.
Can be used at scale in the EMR to reduce the requirement for manual annotation by using only a small subset of annotated reports.
Banerjee I, Madhavan S, Goldman RE, Rubin DL. Intelligent Word Embeddings of Free-Text Radiology Reports. AMIA Annual Symposium 2017, arXiv preprint arXiv:1711.06968. 2017 Nov 19.
Case study
Task: classify radiology reports by the interpreting radiologist’s confidence in the diagnosis of intracranial hemorrhage.
Dataset:
• Unannotated corpus: 10,000 radiology reports of CT Head, CT Angiogram Head, and CT Head Perfusion studies.
• Gold-standard annotation: a subset of 1,188 reports was labeled independently by two radiologists; agreement was high (kappa ≈ 0.98).
Classification labels:
1) No intracranial hemorrhage
2) Diagnosis of intracranial hemorrhage unlikely, though it cannot be completely excluded
3) Diagnosis of intracranial hemorrhage possible
4) Diagnosis of intracranial hemorrhage probable, but not definitive
5) Definite intracranial hemorrhage
Classification
The aim is to assign a ‘risk’ label to free-text CT radiology reports, training on the subset of reports with ground-truth labels created by the experts.
We performed experiments using three well-known classifiers in their default configurations: Random Forests, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN).
Class distribution: no risk, 946 reports; medium risk, 43 reports; high risk, 199 reports.
AMIA 2017 | amia.org
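Fitting the three classifiers on document vectors is a few lines with scikit-learn. A minimal sketch on toy 2-D stand-ins for document embeddings (the synthetic data, dimensions, and train-set scoring are illustrative, not the study’s setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Toy stand-in for document embeddings: two well-separated classes
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 4.0])
y = np.array([0] * 20 + [1] * 20)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN (n=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM (radial)": SVC(kernel="rbf"),
}
scores = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    scores[name] = clf.score(X, y)  # training accuracy on the toy data
    print(name, scores[name])
```

In the study, the input X would be the IWE document vectors and y the expert risk labels; evaluation would of course use held-out reports rather than the training set.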
Hyperparameter tuning
We used a grid-search approach to tune the two main hyperparameters of our embedding for the targeted annotation: window size and vector dimension.
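Grid search over these two hyperparameters amounts to scoring every (window, dimension) pair and keeping the best. A minimal sketch; the candidate values and the stand-in scoring function are illustrative (in practice the score would come from training the embedding and classifier and measuring F1 on validation reports):

```python
from itertools import product

# Hypothetical candidate grid for the two embedding hyperparameters
window_sizes = [5, 10, 15]
vector_dims = [100, 300, 500]

def validation_f1(window, dim):
    # Stand-in scoring function peaking at (10, 300); replace with:
    # train embedding + classifier, evaluate F1 on validation set.
    return 1.0 / (1 + abs(window - 10) + abs(dim - 300) / 100)

best = max(product(window_sizes, vector_dims),
           key=lambda p: validation_f1(*p))
print(best)  # (10, 300)
```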
Results
Word analogies can be derived by computing the cosine similarity between word vectors:
similarity(a, b) = (a · b) / (‖a‖ ‖b‖) = Σᵢ aᵢbᵢ / (√(Σᵢ aᵢ²) · √(Σᵢ bᵢ²))
Similar words (positive cosine similarity):
Word 1 | Word 2 | Similarity
new | recent | 0.941
infarction | acute infarction | 0.928
hemorrhage | hemorrhage | 0.964
hemorrhage | subarachnoid hemorrhage | 0.968
Negated terms (negative cosine similarity):
Word 1 | Word 2 | Similarity
hemorrhage | NEGEX QUAL hemorrhage | -0.074
large | NEGEX enlarged | -0.245
abnormalities | NEGEX QUAL abnormalities | -0.283
mass effect | NEGEX QUAL mass effect | -0.170
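The cosine similarity behind these tables is straightforward to compute; a minimal sketch with hand-picked toy vectors (not real embeddings):

```python
from math import sqrt

def cosine(a, b):
    """cosine(a, b) = a.b / (|a||b|): near +1 for related words,
    negative for words that appear in opposite (e.g., negated) contexts."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

print(cosine([3.0, 4.0], [3.0, 4.0]))   # 1.0  (identical directions)
print(cosine([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite directions)
```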
Word embeddings in two dimensions
Constructed using the t-SNE approach, where each data point represents a word; a total of 4,442 words are visualized in the figure.
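Projecting an embedding matrix into two dimensions for this kind of plot is a one-liner with scikit-learn’s t-SNE. A minimal sketch on random stand-in vectors (the sizes and `perplexity` value are illustrative; `perplexity` must be smaller than the number of samples):

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for learned word embeddings: 60 random 50-d vectors
rng = np.random.RandomState(0)
embeddings = rng.randn(60, 50)

# Project to 2-D coordinates suitable for a scatter plot
coords = TSNE(n_components=2, perplexity=5.0, init="random",
              random_state=0).fit_transform(embeddings)
print(coords.shape)  # (60, 2)
```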
Document embeddings in two dimensions
1,188 expert-annotated CT radiology report vectors visualized in two dimensions.
Baseline performance with unigrams
Baseline model: bag-of-words with >10,000 words in the vocabulary.
Classifier | Precision | Recall | F1-score
Random Forest | 87.5% | 66.03% | 75.26%
KNN (n = 10) | 64.79% | 80.49% | 71.8%
KNN (n = 5) | 82.62% | 82.36% | 75.9%
SVM (radial kernel) | 60.52% | 77.80% | 68.08%
SVM (polynomial kernel) | 69.52% | 77.80% | 68.08%
Comparative performance
1. Out-of-box word2vec – without semantic mapping
2. Proposed model – with semantic mapping
Classifier | Out-of-box word2vec (Precision / Recall / F1) | Proposed model (Precision / Recall / F1)
Random Forest | 87.59% / 89.17% / 87.78% | 88.64% / 90.42% / 89.08%
KNN (n = 10) | 86.73% / 88.90% / 87.47% | 88.60% / 89.91% / 88.88%
KNN (n = 5) | 87.52% / 88.65% / 87.74% | 88.54% / 89.62% / 88.76%
SVM (radial kernel) | 63.98% / 79.96% / 71.07% | 64.19% / 80.09% / 71.25%
SVM (polynomial kernel) | 62.40% / 78.97% / 69.70% | 63.25% / 79.49% / 70.43%
IWE for annotating Pulmonary Embolism — Goals
Study 2: Chest CT images (Stanford & UPMC)
1. Extract 100k+ de-identified thoracic CT free-text reports from the Stanford hospital repository.
2. Use ~900 CT free-text reports from the University of Pittsburgh Medical Center.
3. Design machine learning algorithms to retrospectively classify PE-CTA imaging reports.
4. Compare performance to published state-of-the-art rule-based information extraction for PE.
Can be used at scale in the EMR for cohort analysis, machine vision, and cost-effectiveness / utilization analysis.
PE study results
Banerjee I, Chen MC, Lungren MP, Rubin DL. Radiology Report Annotation using Intelligent Word Embeddings. Journal of Biomedical Informatics, November 2017.
Proposed models
Results
• Benchmarked the performance against PeFinder and an out-of-box word2vec model.
• The IWE model had the lowest generalization error, with the highest F1 scores.
• Of particular interest, the IWE model (trained on the Stanford dataset) outperformed PeFinder on the UPMC dataset (the dataset used to create the PeFinder model).
PeFinder: B.E. Chapman, S. Lee, H.P. Kang, W.W. Chapman. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J. Biomed. Inform., 44(5) (2011), pp. 728-737.
Clustering of the word embedding space using K-means++
[Figure: clustered word vector space and clustered elements]
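Clustering an embedding space with K-means++ is directly supported by scikit-learn. A minimal sketch on synthetic stand-in vectors (cluster count and data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in word vectors: two well-separated blobs in 5-d space
rng = np.random.RandomState(0)
vectors = np.vstack([rng.randn(30, 5), rng.randn(30, 5) + 10.0])

# init="k-means++" picks spread-out initial centroids
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(vectors)
print(len(set(labels)))  # 2
```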
Unsupervised IWE report embeddings – holdout test set
PE positive: Stanford test set (left) and UPMC dataset (right)
Unsupervised IWE report embeddings – holdout test set
PE acute: Stanford test set (left) and UPMC dataset (right)
ROC curve measures
Stanford dataset UPMC dataset
Results
On the Stanford dataset
Results
On the UPMC dataset
IWE for inferring LI-RADS scores — Goals
Study 3: Liver ultrasound HCC screening
1. Extract 200K de-identified ultrasound reports from the Stanford hospital repository.
2. Use 2,000 free-text reports from UT Southwestern.
3. Design a scalable computerized approach for large-scale inference of Liver Imaging Reporting and Data System (LI-RADS) final assessment categories in narrative ultrasound (US) reports.
4. Infer LI-RADS scores for unstructured reports that were created before the LI-RADS guidelines were established.
Can be used at scale in the EMR for large-scale text mining and data gathering from standard hospital clinical data repositories.
Liver Ultrasound Cohorts
Stanford data:
• Used for testing: Year 2007–2016 (without LI-RADS template), 11,154 reports; Year 2017 (without LI-RADS template), 962 reports
• Used for training and validation: Year 2017 (with LI-RADS template) – LI-RADS 1: 1,589; LI-RADS 2: 93; LI-RADS 3: 62
UT Southwestern data:
• Used for testing: Year 2017 (with LI-RADS template) – LI-RADS 1: 1,867; LI-RADS 2: 162; LI-RADS 3: 118
Proposed model
[Figure: pipeline – unlabeled US reports (2007–2017) are used to learn word semantics and a LI-RADS vocabulary; annotated US reports formatted with the LI-RADS template (2017) train a classifier to learn LI-RADS coding; the trained classifier then infers LI-RADS coding for reports not formatted with the LI-RADS template (2007–2016)]
Synonyms of LI-RADS terminology derived by the model
Echogenicity:
• hyperechoic → hyperechogenic, hyperecho
• isoechoic → isoecho
• hypoechoic → hypoechogenicity, hypoechogen, hypoecho
• cystic → anecho, anechoic
• nonshadowing → non_shadowing
Doppler vascularity:
• hypovascular → nonenhancing, avascular, nonvascular
• hypervascular → hypervascularity
Architecture:
• septation → septat, septations, multicystic, septa, complex_cyst, intern_septation, thin_septation, multispet, reticul, fishnet, multilocul
• complex → complicated, solid_and_cystic
Morphology:
• lobulated → bilobe, macrolobulated, microlobulated
• round → oval, rounded, ovoid, oblong
• ill-defined → vague, indistinct
• exophytic → bulge
• well_defined → well_circumcribed, marginated
Ensemble classifier
The ensemble classifier is a weighted combination of:
1. Section embedding classifier: takes the vector representation of the liver section of the US exam report as input
2. Lesion measure classifier: takes two quantitative lesion measures as input:
   • the number of lesions present in the liver
   • the long-axis length of the largest lesion
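The weighted combination of the two component classifiers can be sketched as below; the weight, probability values, and helper name are illustrative assumptions, not the study’s actual parameters:

```python
import numpy as np

def ensemble_predict(p_section, p_lesion, w=0.7):
    """Weighted combination of the two component classifiers'
    class-probability vectors (w = 0.7 is an illustrative weight)."""
    p = w * np.asarray(p_section) + (1 - w) * np.asarray(p_lesion)
    return p / p.sum(), int(np.argmax(p))

# Hypothetical probabilities over LI-RADS categories 1, 2, 3
p_section = [0.6, 0.1, 0.3]   # from the section-embedding classifier
p_lesion = [0.2, 0.2, 0.6]    # from the lesion-measure classifier
probs, idx = ensemble_predict(p_section, p_lesion)
print(idx + 1)  # 1  (predicted LI-RADS category)
```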
Validation on LI-RADS formatted reports
Metric | BOW classifier | Proposed model | Rater 1 | Rater 2
Average precision | 0.59 | 0.93 | 0.92 | 0.93
Average recall | 0.49 | 0.88 | 0.92 | 0.92
Average F1 score | 0.52 | 0.90 | 0.92 | 0.92
The LI-RADS category reported by the original interpreting radiologist was used as the true label. Performance of the human raters and the proposed model on the validation set (147 reports).
Disagreement between the original US image reader and the raters
Case 1. Report: "liver length: 14.2 cm. liver appearance: mild steatosis. segment 5 lesion, previously characterized as a hemangioma on february 2014 mr now measures 1.0 x 1.0 x 0.9 cm, previously 0.7 x 0.6. previously seen optn class 5a lesion on february 2014 mr is not well seen on ultrasound. null a small right hepatic lobe cyst measures 7 x 6 x 7 mm, previously 10 x 9 x 9 mm. no new hepatic lesions."
Original coding: 3. Machine-derived probabilities (LI-RADS 1 / 2 / 3): 0.56 / 0.06 / 0.38. Imputed label: 1. Rater 1: 1. Rater 2: 1. Reason: previously characterized as hemangioma, therefore should be categorized as benign.
Case 2. Report: "liver length: 17.6 cm. liver appearance: normal. liver observations: 0.6 x 1.1 x 1.3 cm hyperechoic focus in the right liver with minimal flow likely representing a hemangioma or focal fat. liver doppler: hepatic veins: patent with normal triphasic waveforms."
Original coding: 1. Probabilities: 0.55 / 0.06 / 0.40. Imputed label: 1. Rater 1: 2. Rater 2: 3. Reason: includes hemangioma or fat (both benign), but this is not definite and needs characterization.
Case 3. Report: "liver length: 12.1 cm. liver appearance: mild steatosis. hypoechoic left hepatic lobe lesion measures 1.2 x 0.5 x 0.7 cm, decreased from 3/8/2017 ct and not significantly changed from more recent pet/ct."
Original coding: 1. Probabilities: 0.45 / 0.10 / 0.44. Imputed label: 1. Rater 1: 3. Rater 2: 3. Reason: observation is stable from prior imaging, but not definitely benign.
Case 4. Report: "liver length: 16 cm. liver appearance: severe steatosis. null no surface nodularity. liver observations: 1.7 x 1.4 x 1.0 cm hypoechoic focus in the gallbladder fossa likely reflects focal fatty sparing. null no surface nodularity. liver doppler: hepatic veins: patent with normal triphasic waveforms."
Original coding: 3. Probabilities: 0.43 / 0.17 / 0.40. Imputed label: 3. Rater 1: 1. Rater 2: 1. Reason: fatty sparing is benign.
The class probability values computed by the model clearly show that the model is deciding between the reader’s label and the raters’ observation.
Test on non-LI-RADS formatted reports
• HCC screening ultrasound reports formatted without the LI-RADS template (2007–2016): 11,154 exams
• No LI-RADS scoring available from the US image reader
• Raters were asked to annotate 216 reports for which the model’s highest predicted probability was either <0.5 (152 reports) or >0.9 (64 reports)
[Figure: confusion matrix on 216 reports; prediction confidence]
Test on UT Southwestern data
• Applied to data from a different institution without retraining
• Tested on 2,381 reports coded with the LI-RADS template
• Proposed model: average precision 0.89; average recall 0.84; average F1 score 0.85
• Main confusion: LI-RADS 2 is predicted as LI-RADS 3
Longitudinal patient tracking: HCC screening
[Figure: inferred LI-RADS scores plotted over time for Patient 1 (2007–2016) and Patient 2 (2009–2015), from 1/1/09 through 1/1/15]
Outline
1. Need for image interpretation beyond image classification
2. Overcoming the challenge of insufficient training data
3. Integrating multiple data types with images
4. Making AI clinical predictions and providing visualizations for explanation
Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients Utilizing Free-Text Clinical Narratives
• In the United States alone, around 500,000 patients develop metastatic cancer every year.
• Several studies have shown overutilization of aggressive medical interventions and protracted radiation treatment in patients close to the end of life.
• The inability to accurately estimate patient life expectancy likely explains why physicians tend to choose overly aggressive treatments for some patients. This leads to increased morbidity and healthcare costs, while other patients may be under-treated and denied access to effective treatments that could reduce symptoms or even extend survival.
• A robust ML model that predicts patient survival would have a major impact on the quality of care and quality of life of metastatic cancer patients.
Banerjee I, Gensheimer MF, Wood DJ, Henry S, Chang D, Rubin DL. Probabilistic Prognostic Estimates of Survival in Metastatic Cancer Patients (PPES-Met) Utilizing Free-Text Clinical Narratives. AMIA Informatics 2018, arXiv preprint arXiv:1801.03058. 2018 Jan 9.
Under-utilization of NLP in EHR-based research
[Figure: the number of natural language processing (NLP)-related articles compared to the number of electronic health record (EHR) articles, 2002 through 2015]
Yanshan Wang et al., Clinical information extraction applications: A literature review, JBI 2018
Objective
We created a dynamic model that takes as input a sequence of free-text clinical visit narratives ordered by visit date, and computes as output, for each visit, a probability of short-term life expectancy (>3 months) considering the current and all historic time points.
[Figure: input – unstructured visit notes t1…tn ordered by time stamp; model – analyzes current and historic visit data; output – a survival probability score at each visit]
Major challenges
How to create machine-intelligible dense vector representations of unstructured clinical notes?
How to model irregular time gaps between the longitudinal clinical events?
How to infer human interpretable justification of prediction while using longitudinal data?
Dataset used in the study
Characteristic | Metastatic cancer database (MetDB) | Palliative radiation dataset (PrDB)
No. of patients | 13,523 | 899
Age | 61.5 (IQR 51.2–70.5) | 65.0 (IQR 55.8–72.2)
Sex | M: 6,621 (49%); F: 6,902 (51%) | M: 460 (51.1%); F: 439 (48.9%)
Primary site | Breast: 1,493 (11.0%); Endocrine: 211 (1.6%); Gastrointestinal: 3,575 (26.4%); Genitourinary: 1,504 (11.1%); Gynecologic: 849 (6.3%); Head and neck: 506 (3.7%); Skin: 453 (3.3%); Thorax: 2,178 (16.1%); Other/Multiple/Unknown: 2,754 (20.4%) | Breast: 141 (15.7%); Endocrine: 0 (0%); Gastrointestinal: 145 (16.1%); Genitourinary: 112 (12.5%); Gynecologic: 50 (5.6%); Head and neck: 57 (6.3%); Skin: 122 (13.6%); Thorax: 252 (28.0%); Other/Multiple/Unknown: 20 (2.2%)
Note types | Oncology notes, progress notes, radiology reports, discharge summaries, nursing notes, critical care notes
[Figure: distribution of visits]
Survival data – challenges
• Patient 1: dense follow-up (multiple visits on the same day)
• Patient 2: minimal information (only 3 days)
• Patient 3: sparse follow-up (long and variable gaps between visits)
• Patient 4: no death information – long follow-up
PPES-Met model
[Figure: unsupervised embedding of free-text notes feeds a many-to-many RNN model]
RNN model with time-distributed weighted loss
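A many-to-many RNN emits one prediction per time step while its hidden state carries the patient’s history forward. A minimal NumPy forward-pass sketch (random weights and toy dimensions stand in for the trained PPES-Met model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def many_to_many_rnn(visit_vectors, Wx, Wh, wo):
    """Forward pass of a simple many-to-many RNN: one survival
    score per visit, conditioned on all previous visits."""
    h = np.zeros(Wh.shape[0])
    scores = []
    for x in visit_vectors:             # visits in chronological order
        h = np.tanh(Wx @ x + Wh @ h)    # hidden state carries history
        scores.append(sigmoid(wo @ h))  # survival probability at this visit
    return scores

rng = np.random.RandomState(0)
d_in, d_h, T = 8, 4, 5  # note-embedding dim, hidden dim, number of visits
Wx, Wh, wo = rng.randn(d_h, d_in), rng.randn(d_h, d_h), rng.randn(d_h)
visits = [rng.randn(d_in) for _ in range(T)]
scores = many_to_many_rnn(visits, Wx, Wh, wo)
print(len(scores))  # 5 — one score per visit
```

In the actual model the inputs would be the unsupervised note embeddings, an LSTM replaces the plain tanh cell, and a time-distributed weighted loss is applied to the per-visit outputs during training.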
Training and Evaluation
• Category 1: “Survival-positive” stands for survival up to 3 months starting from the current visit date.
• Category 2: “Survival-negative” flags non-survival.
• Category 3: “Zero padding” pads each input sequence shorter than 1000 visits and truncates the historic visits when a sequence is longer than 1000.
Model training and validation on MetDB: training, 10,293 patients; validation, 1,938 patients. Test: 1,818 patients (899 from PrDB + 919 randomly selected from MetDB).
Model evaluation used a dual strategy:
1. Quantitative: measure the overall prognosis estimation accuracy using standard statistical metrics.
2. Qualitative: evaluate patient-level performance and perform error analysis with an intelligible longitudinal graph summary for understanding the basis of prediction.
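The pad-or-truncate step for fixed-length sequences can be sketched as follows; the slide specifies truncating the historic (oldest) visits, while the padding side shown here is an assumption for illustration:

```python
def pad_or_truncate(seq, max_len=1000, pad_value=0):
    """Zero-pad short visit sequences; for long ones keep only the
    most recent `max_len` visits (truncate the oldest history)."""
    if len(seq) >= max_len:
        return seq[-max_len:]  # drop the oldest visits
    # Assumption: pad on the left so recent visits stay at the end
    return [pad_value] * (max_len - len(seq)) + seq

print(pad_or_truncate([5, 6, 7], max_len=5))        # [0, 0, 5, 6, 7]
print(pad_or_truncate(list(range(8)), max_len=5))   # [3, 4, 5, 6, 7]
```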
Results: Quantitative Evaluation on PrDB
Overall ROC AUC for predicting 3-month survival: 0.89; confidence interval [0.884–0.897].
Tested on 1,818 patients with multiple visits.
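The ROC AUC metric used here compares predicted scores against binary outcomes; a minimal sketch with toy labels and scores (not the study’s data):

```python
from sklearn.metrics import roc_auc_score

# Toy example: true 3-month survival labels and model scores
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one; here 3 of the 4 positive/negative pairs are ranked correctly, hence 0.75.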
Results: Quantitative Evaluation on PrDB
[Figure: ROC curves by primary site]
Comparison with systemic therapy decisions shows that the model’s predictions outperformed the oncologists’ expectation of survival and could contribute to treatment planning.
Tested on 1,818 patients with different primary sites.
Results: Qualitative Evaluation on PrDB
Patient-level performance analysis: Patient 1 and Patient 2
Results: Qualitative Evaluation on PrDB
Patient-level performance analysis: Patient 3 and Patient 4
Hover & discover
[Figure: intelligible longitudinal survival curve of a patient]
Prediction test with 30% polluted visit notes at the end of the sequence
Patient 5 and Patient 6
Prediction test with 30% polluted visit notes at the end of the sequence
Patient 9 and Patient 10
Future work
• Extend our AI framework by integrating imaging and non-imaging multi-source data to predict future hospitalizations and ER visits.
• Integrate semantic data mining and deep learning analysis to combine structured and unstructured clinical data.
Future vision
Conclusions
• There are important medical tasks for deep learning beyond classification
• Much informative medical data is longitudinal electronic patient records data
• Word embeddings are a powerful technique
• Information extraction
• Image annotation
• For generating features for prediction models
• Text data can be integrated with image data for prediction models
• Prediction models leveraging longitudinal patient data are promising
Thank you.
Contact info: [email protected]