What I have learned from the RSNA Bone Age Challenge
Alexandre Cadrin-Chênevert, MD, B.Ing, FRCPC
In collaboration with :
Alexander Bilbily, MD, PGY5
Mark Cicero, MD, BESc, FRCPC
Disclosures
Dr. Alexandre Cadrin-Chênevert : No financial conflict of interest to disclose
Dr. Alexander Bilbily : Co-founder and CEO of 16 Bit
Dr. Mark Cicero : Co-founder and COO of 16 Bit
Learning Objectives
1. To identify a new research model : public machine learning challenges applied to medical imaging
2. To describe state of the art results from the RSNA bone age challenge
3. To list educational resources and learning tools to participate in future competitions
Deep learning scalability
[Figure: performance versus amount of data for traditional machine learning and for shallow, medium, and deep neural networks]
Adapted from blog.easysol.net/building-ai-applications/ with permission
AI Research gap in healthcare/radiology
[Figure: data (healthcare facility) and expertise (academia, industry), separated by a gap]
• Technical : labeling, security, standard
• Ethical : patient consent, deidentification, data sharing
Machine learning competitions
• Expanding research paradigm
• Goal : Finding the best performing algorithm on a specific machine learning problem
• Tool : Publicly available dataset
• Open values : collaboration, education, communication, algorithm sharing
Expertise
Data Algorithms GPU
Performance/Accuracy
Imagenet – CNN architectures
Source : Eugenio Culurciello, medium.com/towards-data-science/neural-network-architectures-156e5bad51ba
RSNA bone age challenge
• Goal : Develop an algorithm which can most accurately determine skeletal age on a validation set of pediatric hand radiographs
• 260 participants registered
• Datasets : From 2 children's hospitals
– Lucile Packard Children’s Hospital at Stanford University
– Children’s Hospital Colorado
Larson DB et al. Radiology 2018; 287(1):313-322.
Bone age
• Degree of maturation of a child’s bones, used to evaluate for potentially advanced or delayed growth compared to chronological age.
• Most frequent evaluation method : left hand x-ray
– By a radiologist : Greulich-Pyle atlas (2nd edition, 1959)
– Automated : e.g. CE-approved BoneXpert
Phases/datasets
PHASE | TRAINING | LEADERBOARD | TEST
DATASET SIZE | 12,611 | 1,425 | 200
NO. HOSPITALS | 2 | 2 | 1
GROUND-TRUTH | REPORT | REPORT | REPORT + 5 REVIEWS
MEAN BONE AGE (years) | 10.6 | 10.6 | 11.0
SD BONE AGE (years) | 3.4 | 3.5 | 3.6
GENDER RATIO (M:F) | 1.18 : 1 | 1.19 : 1 | 1 : 1
Larson DB et al. Radiology 2018; 287(1):313-322.
Dataset division
@alexandrecadrin
Dataset | Learning | Performance
TRAINING | YES | During training
VALIDATION | NO | During training
TEST | NO | After training
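The three roles above can be sketched as a simple random split. This is only an illustration of the idea, not the partitioning used in the challenge; the function name `split_dataset`, the fractions, and the image identifiers are assumptions made for the example.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items once and cut them into training, validation and test subsets."""
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation + test fractions must be in [0, 1)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # never seen until training is finished
    val = shuffled[n_test:n_test + n_val]    # monitored during training, not learned from
    train = shuffled[n_test + n_val:]        # the only subset used to update the model
    return train, val, test


image_ids = list(range(1000))  # hypothetical image identifiers
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))
```

The key property is that the three subsets never overlap, so performance measured on the test subset is not contaminated by anything the model saw while learning.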
Competition metrics
1st : Mean Absolute Distance (MAD) in months
2nd : Concordance Correlation Coefficient (CCC)
Absolute distance = | Ground truth bone age − Predicted bone age |
Larson DB et al. Radiology 2018; 287(1):313-322 : Example of Bland-Altman plot comparison between model and reviewer
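Both metrics can be computed in a few lines. The sketch below assumes the CCC is Lin's concordance correlation coefficient computed with population variances; the exact variant used by the organizers is not stated here, and the bone ages in the example are made-up values in months.

```python
import numpy as np


def mean_absolute_distance(y_true, y_pred):
    """Mean of |ground truth - prediction|, in the units of the inputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC, using population (biased) variance and covariance."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    covariance = np.mean((y_true - mean_t) * (y_pred - mean_p))
    denominator = y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2
    return float(2 * covariance / denominator)


truth = [120, 96, 150, 60]   # hypothetical ground-truth bone ages (months)
pred = [114, 100, 156, 58]   # hypothetical model predictions (months)
print(mean_absolute_distance(truth, pred))
print(concordance_correlation_coefficient(truth, pred))
```

Unlike a plain correlation, the CCC also penalizes a systematic offset between predictions and ground truth, which is why it complements the MAD.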
Results
PHASE | LEADERBOARD | TEST
No. of images | 1425 | 200
MEAN AD (BEST) | 5.8 | 4.3
CCC (BEST) | 0.979 | 0.991
MEAN AD (TOP 10) | 5.8 – 6.4 | 4.3 – 4.9
MEAN AD (HUMAN) : 6.1*
MEAN AD (PUBLISHED) : 5.2*
* Larson DB et al. Radiology 2018; 287(1):313-322.
• Best mean absolute distance of 4.3 months compared to ground-truth
• No confidence intervals reported during the competition
• Compared to 6.1 months for radiologists and 5.2 months for the best previously published automated model
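Since no confidence intervals were reported, one common way to attach an interval to a MAD is a percentile bootstrap over the test cases. The sketch below is an illustration of that general technique on synthetic data, not something done in the competition; the function name `bootstrap_mad_ci` and all numbers are assumptions.

```python
import numpy as np


def bootstrap_mad_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Return (MAD, lower, upper) using a percentile bootstrap over cases."""
    rng = np.random.default_rng(seed)
    abs_err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    n = abs_err.size
    # Each row is one resample of the n cases, drawn with replacement.
    idx = rng.integers(0, n, size=(n_resamples, n))
    resampled_mads = abs_err[idx].mean(axis=1)
    lower, upper = np.quantile(resampled_mads, [alpha / 2, 1 - alpha / 2])
    return float(abs_err.mean()), float(lower), float(upper)


# Synthetic stand-in for a 200-image test set (values in months).
rng = np.random.default_rng(42)
truth = rng.uniform(12, 216, size=200)
pred = truth + rng.normal(0, 6, size=200)

mad, low, high = bootstrap_mad_ci(truth, pred)
print(f"MAD = {mad:.1f} months (95% CI {low:.1f} to {high:.1f})")
```

With only 200 test images the interval is fairly wide, which matters when the top entries differ by a fraction of a month.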
Technical approaches
• Mostly : deep convolutional neural networks
• Variable : network architecture, preprocessing, pretraining, data augmentation, image resolution, gender input, classification vs regression
Winning solution : 16bit.ai
Mark Cicero, Alex Bilbily : https://16bit.ai/blog/ml-and-future-of-radiology
Hyperparameter | Value
Input | 500 x 500 image
Weights initialization | Random
Optimizer | Adam
Batch size | 16
Data augmentation | Rotation, translation, H flip
Inference | 5 best models x 10 crops
Demo available (not approved for clinical use): https://16bit.ai/bone-age
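The "5 best models x 10 crops" inference step amounts to averaging many predictions for the same image. The sketch below shows that idea under stated assumptions: the crop scheme (four corners plus center, each with its horizontal flip), the 560 x 560 source image, and the stand-in "models" are all hypothetical and are not taken from the 16bit.ai implementation.

```python
import numpy as np


def ten_crop(image, crop_size):
    """Four corner crops plus the center crop, each with its horizontal flip
    (an assumed crop scheme; the winning team's exact crops are not described here)."""
    h, w = image.shape[:2]
    ch, cw = crop_size
    tops = [0, 0, h - ch, h - ch, (h - ch) // 2]
    lefts = [0, w - cw, 0, w - cw, (w - cw) // 2]
    crops = [image[t:t + ch, l:l + cw] for t, l in zip(tops, lefts)]
    crops += [crop[:, ::-1] for crop in crops]  # horizontal flips
    return crops


def ensemble_predict(models, image, crop_size):
    """Average the predictions of every model on every crop."""
    crops = ten_crop(image, crop_size)
    predictions = [model(crop) for model in models for crop in crops]
    return float(np.mean(predictions))


# Hypothetical stand-ins for five trained networks: each maps a crop to a number.
models = [lambda crop, bias=b: float(crop.mean()) + bias
          for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
image = np.full((560, 560), 100.0)  # hypothetical preprocessed radiograph

print(ensemble_predict(models, image, (500, 500)))
```

Averaging over models and crops reduces the variance of individual predictions, at the cost of 50 forward passes per image in this configuration.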
Top 5 solutions
Rank | 1 | 2 | 3 | 4 | 5
Team Name | 16bit.ai | Ian Pan | F. Kitamura | Visiana | Md.ai
MAD (months) | 4.27 | 4.35 | 4.38 | 4.51 | 4.53
Parameters
Model | Inception V3 | Resnet50 | 9 layers CNN | PCA + Linear Regression | Multiple deep CNNs
Input | 500x500 | 49x(224x224) | 550x500 | Hand-crafted | 299x299
Gender | Model Input | 2 Models | Model Input | Model Input | Model Input
Optimizer | Adam | Adam | NS | NS | Adam
Augmentation | Yes | Yes | Yes | NS | Yes
Batch size | 16 | NS | NS | NS | 32
Inference | 5 best models x 10 crops | Xth percentile of 49 patches x 9 models | Ensemble of 4 models | Single model | Weighted ensemble of 6 models
Visualization – Activation maps
Larson DB et al. Radiology 2018; 287(1):313-322.
Lee, H., Tajmir, S., Lee, J. et al. J Digit Imaging (2017) 30: 427. https://doi.org/10.1007/s10278-017-9955-8
Discussion
• Deep learning models matching human performance in research conditions for bone age estimation
• Reference gold-standard for bone age ?
– Human interpretation based on Greulich-Pyle atlas
– Chronological age from normal subjects
– Double-reading : radiologist + model
• Bone age : low-hanging fruit for deep learning
– Ground-truth labels easily extracted from written reports
– Single 2D image
– Single numerical output value
– Relatively simple pattern recognition
What we have learned
• Machine learning requires extensive experimentation : model architecture, data augmentation, image resolution, ensemble of models
• Large public labeled datasets likely have a high impact on research and future clinical applications
• Radiologists should be involved in machine learning research/challenges :
– To define clinically significant use case scenarios
– To help create large datasets with high quality ground truth labels
Deep learning = statistical learning
• Imaging
– Machine vendor
– Protocol
– Contrast, Noise
• Population
– Age, Gender
– Genetic
– Lifestyle habits
• Diagnosis
– Pretest probability
– Prevalence
– Ground-truth
Performance optimized for specific statistical research conditions
Research performance ≠ Clinical performance
[Figure: training, validation, and test sets for research data versus clinical data]
From research to clinical applications
[Figure: training, validation, and test sets for research data versus local data]
Optimal : Retrain
Minimal : Retest
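The difference between retesting and retraining can be illustrated with a toy simulation. Everything below is synthetic and assumed for the sake of the example: a single image-derived feature, a linear model standing in for a deep network, and a local site whose bone ages are offset from the research site by 12 months.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_site(n, offset):
    """Synthetic site: one image-derived feature and a bone age in months."""
    x = rng.uniform(0, 100, size=n)
    y = 2.0 * x + offset + rng.normal(0, 5, size=n)
    return x, y


def fit(x, y):
    """Least-squares line; a stand-in for training a model."""
    return np.polyfit(x, y, deg=1)


def mad(coeffs, x, y):
    """Mean absolute distance between model predictions and ground truth."""
    return float(np.mean(np.abs(np.polyval(coeffs, x) - y)))


x_res, y_res = make_site(1000, offset=10.0)   # research data
x_loc, y_loc = make_site(400, offset=22.0)    # local data with a shifted distribution

research_model = fit(x_res[:800], y_res[:800])
research_mad = mad(research_model, x_res[800:], y_res[800:])

# Minimal: retest the research model, unchanged, on a local test set.
retest_mad = mad(research_model, x_loc[300:], y_loc[300:])

# Optimal: retrain on local training data, then test on the same local test set.
retrained_model = fit(x_loc[:300], y_loc[:300])
retrain_mad = mad(retrained_model, x_loc[300:], y_loc[300:])

print(f"research test MAD : {research_mad:.1f}")
print(f"local retest MAD  : {retest_mad:.1f}")
print(f"local retrain MAD : {retrain_mad:.1f}")
```

In this toy setting the retest exposes the performance drop on local data, and retraining on local data removes it; real shifts in vendor, protocol, or population are of course more complex than a constant offset.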
Educational tools
Field | Online Resources | Conferences
Deep learning in radiology | ACR Data Institute | SIIM, MICCAI, EuSOMII, C-MIMI, MIDL
Convolutional neural networks | Coursera, Udemy, Stanford CS231n online |
Computer vision | Stanford CS231n online | CVPR, ECCV, ICCV
Deep learning | Coursera, Udemy, Fast.ai | ICLR, NIPS
Machine learning | Coursera, Udemy | ICML, KDD
Computer programming (Python) | Coursera, Udemy | –
Take home messages
• Machine learning challenge = new significant research paradigm using public data
• Bone age : deep learning models matching human performance in research conditions
• Research performance ≠ clinical performance
• Online educational resources available
Alexandre Cadrin-Chênevert, MD, B.Ing
Email : [email protected]
Twitter : @alexandrecadrin