3/22/2017
COMPELLING USE-CASES FOR IMMEDIATE DEPLOYMENT OF IMAGE-BASED ANALYTICS IN DIGITAL WHOLE SLIDE IMAGING PATHOLOGY WORKFLOW
Ulysses G. J. Balis, M.D., FCAP, FASCP, FAIMBE
Professor of Pathology & Director, Division of Pathology Informatics
Director, Computational Pathology Lab Section
Department of Pathology
Michigan [email protected]
Disclosure of Relevant Financial Relationships
USCAP requires that all planners (Education Committee) in a position to
influence or control the content of CME disclose any relevant financial
relationship WITH COMMERCIAL INTERESTS which they or their
spouse/partner have, or have had, within the past 12 months, which relates to
the content of this educational activity and creates a conflict of interest.
Dr. Balis declares affiliation with:
Inspirata, Inc. – Strategic Advisory Board
(This is included for completeness; no commercial or proprietary information is included in this presentation)
Outline
• Observations about data growth in Pathology
• Some thoughts on the “Hype Cycle”
• Maturation of the computational solutions needed in support of deploying WSI
– Workflow models
– High‐throughput computation (GPUs, both local and cloud‐based)
• Some thoughts on information theory and data compression
• Transitioning to cloud services to realize High‐Throughput WSI solutions.
• Example Opportunities, made possible by WSI‐based workflow:
• Rare micrometastasis detection
• Mitotic Figure detection
• Another Motivation: Content Based Image Retrieval
• Example Use‐case: Democratization of Image‐based Analytics on the Web
• Closing thoughts
Data Portfolio: Contemporary Pathology Setting ‐ 2017
[Figure: data portfolio chart (the present) with three categories: Diagnostic Text, Image‐based Data, All Other Metadata.]
Data Portfolio: Digital Pathology Workspace ca. 2022
[Figure: data portfolio chart, with near‐complete adoption, with three categories: Diagnostic Text, Image‐based Data, All Other Metadata.]
The Hype Cycle as Witnessed within Digital Pathology
[Figure: hype cycle curve. Horizontal axis: Time. Vertical axis: Expectations. Phases, in order: Innovation Trigger Phase, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity. Technologies plotted along the curve: Deep Learning; All-digital Whole Slide Imaging Workflow; Specific DP Reimbursement Models; High Throughput Scanners and GPU-based Computation; Digital Consultation Outreach; Multiplex assays; FDA Clearance for Primary Diagnosis; Quantitative Immunoscoring; Informed Detection; Conventional Machine Learning; NLP; DICOM; Liquid Bx; Effective Federated Integration with AP-LIS systems.]
High Level Integrated Diagnostics Architectural Map
[Figure: architecture diagram. Labeled components: Multiple Clinical Data Sources (Epic, LIS, AP‐LIS, PACS, Cancer Registry); Enterprise Service‐Oriented Architecture Message Bus; Discrete Numerical Data Parsing Pipeline; Free Text NLP Parsing Pipeline; Image Scanning Pipeline; Staging; Scanning Center; Numerical Validation; Lexical Validation; Image Aggregation; Relational DB; Image Analysis / Informed Detection; Final Data Transformation; Multi‐Axial Edge‐Connected and Relational Database with High‐Performance Cluster; Multiple User Classes.]
High Level Discovery Workflow Map
[Figure: workflow diagram with numbered callouts 1 through 5. Labeled components: EMR; ADT / Billing System; Data Warehouse; Other Clinical Repositories; Interface Engine (WBI, Cloverleaf, eGate, etc.); LIS; Digital Integration Engine; High‐Throughput Scanning Facility; High‐Performance Scanner and Analysis Server; Mirrored High‐performance On‐line and Near‐line Image Storage; Pathology Discovery Workstations; Barcode Tracking System; Application Server; DICOM‐Based PACS; Overall Application Suite; Mirrored High‐performance Image Server.]
Graphics Processing Units (GPUs) as a Computational Solution to Needed Scale
• Instead of running a program as a single linear set of operations (“a thread”), why not run 1,000 or 10,000 threads, concurrently?
• GPU‐based computation is usually only amenable to algorithms that can be subdivided without the need for substantial inter‐thread communication
• Whole Slide Imaging computation, by virtue of its tiled structure, is an ideal candidate for GPU acceleration of most image search and analytical operations
• GPU technology has become very affordable
• GPU solutions are now available as a commodity from cloud service providers.
Hence, conditions are perfect for transitioning to cloud‐based WSI analytics!
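The tiled structure mentioned above can be illustrated with a minimal sketch. It assumes a tiny synthetic grayscale "slide" stored as a list of rows, uses CPU threads from the Python standard library as a stand-in for GPU threads, and uses hypothetical names (`TILE`, `split_into_tiles`, `tile_mean`, `analyze`). The point is only that each tile is processed without any data from its neighbors.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 4  # hypothetical tile edge in pixels (real WSI tiles are far larger)

def split_into_tiles(image, tile=TILE):
    """Yield (row, col, block) for each tile of a 2-D list of gray values."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            block = [row[c:c + tile] for row in image[r:r + tile]]
            yield r, c, block

def tile_mean(args):
    """Per-tile work item: it needs no data from any other tile."""
    r, c, block = args
    values = [v for row in block for v in row]
    return (r, c), sum(values) / len(values)

def analyze(image, workers=4):
    """Run the per-tile function over all tiles concurrently and collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(tile_mean, split_into_tiles(image)))

# Synthetic 8 x 8 "slide": left half dark (10), right half bright (200).
slide = [[10] * 4 + [200] * 4 for _ in range(8)]
result = analyze(slide)
print(result)
```

Because no tile depends on another, the same decomposition maps onto thousands of GPU threads or onto separate cloud workers without inter-thread communication.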
Claude Elwood Shannon (April 30, 1916 – February 24, 2001)
• Formalized our modern understanding of Information Theory and entropy
• Remarkably, Information Theory has seen little systematic application to extracting meaningful information from WSI data sets
• Lossy and lossless compression are often applied without rigorous assessment of pre‐ and post‐operation information content
• It turns out that data compression is an incredibly important topic with respect to high‐throughput WSI analytics
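As a rough illustration of measuring information content before and after an operation, the sketch below computes the first-order Shannon entropy (bits per pixel) of a tile's gray-level histogram. It is a simplification: it ignores spatial correlation between pixels, and the tiles and the function name `shannon_entropy` are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Return the first-order Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(values)
    total = len(values)
    # H = sum over symbols of p * log2(1 / p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())

uniform_tile = [0] * 16                  # a single gray level
two_level_tile = [0] * 8 + [255] * 8     # two equally likely gray levels
four_level_tile = [0, 85, 170, 255] * 4  # four equally likely gray levels

for name, tile in [("uniform", uniform_tile),
                   ("two-level", two_level_tile),
                   ("four-level", four_level_tile)]:
    print(name, shannon_entropy(tile), "bits per pixel")
```

Comparing this value for a tile before and after compression gives one simple, if incomplete, check on how much information an operation has discarded.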
Information Theory as Applied to Digital Pathology Image Subject Matter and Image Search
3 x 5 effective pixels = 15 bytes of information
• Essentially all high frequency information is absent
• Compression ratio of 71,093:1 as compared to the original
6 x 10 effective pixels = 60 bytes
• Most high frequency information is absent
• Compression ratio of 17,773:1 as compared to the original
12 x 20 effective pixels = 240 bytes
• Minimal high frequency information is present
• Compression ratio of 4,443:1 as compared to the original
24 x 40 effective pixels = 960 bytes
• Probably enough high frequency information is present
• Compression ratio of 1,111:1 as compared to the original
48 x 80 effective pixels = 3,840 bytes
• Adequate high frequency information is present
• Compression ratio of 278:1 as compared to the original
800 x 1333 effective pixels = 1.1 MB
• All high frequency information present
• Original Image
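The ratios in this series can be reproduced with simple arithmetic. The sketch below assumes one byte per effective pixel, which is inferred from the byte counts on the slides rather than stated there; the function name `compression_ratio` is hypothetical.

```python
def compression_ratio(orig_w, orig_h, small_w, small_h, bytes_per_pixel=1):
    """Ratio of original size to reduced size, assuming a fixed byte count per pixel."""
    original_bytes = orig_w * orig_h * bytes_per_pixel
    reduced_bytes = small_w * small_h * bytes_per_pixel
    return original_bytes / reduced_bytes

ORIGINAL = (800, 1333)  # original image dimensions from the slide
ratios = {}
for w, h in [(3, 5), (6, 10), (12, 20), (24, 40), (48, 80)]:
    ratios[(w, h)] = compression_ratio(*ORIGINAL, w, h)
    print(f"{w} x {h}: {w * h} bytes, about {ratios[(w, h)]:,.0f}:1")
```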
4 x 3 effective pixels = 12 bytes
• 167,000:1
8 x 6 effective pixels = 48 bytes
• 41,752:1
16 x 13 effective pixels = 208 bytes
• 9,635:1
32 x 26 effective pixels = 832 bytes
• 2,409:1
64 x 51 effective pixels = 3,264 bytes
• 614:1
128 x 102 effective pixels = 13,056 bytes
• 153:1
256 x 205 effective pixels = 52,480 bytes
• 38:1
1583 x 1266 effective pixels = 2 MB
Malignant Melanoma
Use Case – Informed Detection
• Micro‐metastasis identification is time consuming
• A pre‐screening tool would save pathologist time if sufficient sensitivity is realized
• In such circumstances, the pathologist’s task is shifted to directed review, which is less fatiguing, allowing the pathologist to practice at their highest credentialed level
• This is no longer in the realm of fiction
Use Case – Mitosis Counting
• Mitotic figure identification is also time consuming
• A pre‐screening tool would similarly save pathologist time if sufficient sensitivity is realized
• Neuropathology and bone & soft tissue services can allocate substantial time for this task
A Foundational Model Building towards Image‐Based Differential Diagnosis Generation
Fully Automated Diagnosis
Semi‐Supervised Dx
Machine Learning Techniques
Normalized Discrete Vectorized and Scalar Data
Application of Individual Image Extraction Operators (focal, diffuse and global)
Mature Technologies (scanners, storage, servers, GPUs and Network bandwidth)
Use Case: Using Images Themselves to Search Image Repositories & Retrieve Associated Case Metadata:
The Dawn of Image‐Based Predictive Assays
• This is potentially a “killer app” for the field of Whole Slide Imaging
• This can extract information not directly available to human cognition, and therefore not available through optical microscopy alone; it can only be reached by means of DP
• When validated, predictive assays hold the potential to elevate WSI to a “must have” modality, essentially creating a new gold standard
Use Case: Searching Libraries of Pathology Images with Images Themselves:
A Schematic Perspective
[Figure: schematic of Content‐Based Image Retrieval (CBIR). Labeled elements: Initial Predicate Image with feature of interest; Extraction from image repositories based upon spatial information; Analysis of data in the digital domain (…001011010111010111..); Resultant gallery of matching images and any/all associated metadata.]
Use of CBIR to rapidly converge on a classifier
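The retrieval step in the schematic can be sketched generically. The example below assumes each repository image has already been reduced to a short numeric feature vector, ranks entries by cosine similarity to the query, and returns the best matches with their metadata. The repository contents, case identifiers, labels and function names are all synthetic and hypothetical; this is not the presenter's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vector, repository, top_k=2):
    """Rank repository entries by similarity to the query; return the top matches with metadata."""
    scored = [(cosine_similarity(query_vector, entry["features"]), entry)
              for entry in repository]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(round(score, 3), entry["case_id"], entry["metadata"])
            for score, entry in scored[:top_k]]

# Synthetic repository: feature vectors and metadata are made up for illustration.
repository = [
    {"case_id": "case-A", "features": [0.9, 0.1, 0.0], "metadata": {"diagnosis": "label 1"}},
    {"case_id": "case-B", "features": [0.1, 0.9, 0.2], "metadata": {"diagnosis": "label 2"}},
    {"case_id": "case-C", "features": [0.8, 0.2, 0.1], "metadata": {"diagnosis": "label 1"}},
]
matches = retrieve([1.0, 0.0, 0.0], repository)
for match in matches:
    print(match)
```

The metadata returned with the gallery is what makes the matches useful downstream, for example as labeled examples when converging on a classifier.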
…This raises the following questions…
• What is the actual useful (actionable) information content of the WSI data that we are generating?
• Does it make sense to query WSI datasets in native (non‐tokenized) format?
• Are there more efficient ways to represent the spatial data?
• Are there some data elements that are more important than others?
Three Generations of Texture‐Based Pattern Recognition Software
• I – Vector Quantization (VQ)
• II – Spatially Invariant Vector Quantization (SIVQ)
• III – Vector Invariant Pattern Recognition (VIPR)
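For readers unfamiliar with the first generation, the sketch below shows plain vector quantization in its textbook form: each small patch, flattened to a vector, is replaced by the index of its nearest codebook entry. The codebook, patches and function names are hypothetical, and the sketch does not attempt the spatial invariance of SIVQ or VIPR.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to the vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def encode(vectors, codebook):
    """Replace every vector by the index of its nearest codebook entry."""
    return [quantize(v, codebook) for v in vectors]

# Hypothetical codebook of flattened 2 x 2 gray-level patches.
codebook = [
    [0, 0, 0, 0],          # index 0: dark patch
    [255, 255, 255, 255],  # index 1: bright patch
    [0, 255, 0, 255],      # index 2: striped patch
]
patches = [[5, 3, 0, 8], [250, 240, 255, 251], [10, 245, 4, 250], [0, 0, 2, 1]]
codes = encode(patches, codebook)
print(codes)
```

Each four-value patch is now represented by one small integer, which is the sense in which quantization acts as compression.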
[Figure: processing schematic. Labeled components: Markov Field Synthesis grid (multiple instances); SIVQ individual search predicate feature match (multiple instances); Bayesian Probability Engine.]
A Matter of Degrees of Freedom…
Candidate Feature
How many ways can
this be sampled?
How Many Ways Can A Candidate Feature Be Matched During Training?
Y Translational Freedom
X Translational Freedom
Rotational Freedom
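One generic way to see how a rotational degree of freedom can be absorbed into a single stored vector is to sample a feature around a ring and compare rings under every cyclic shift. The sketch below is an illustration under that assumption only; it is not taken from the slides and is not presented as the SIVQ or VIPR implementation. The ring values and the name `ring_distance` are hypothetical, and both rings must have the same number of samples.

```python
def ring_distance(ring_a, ring_b):
    """Smallest squared distance between ring_a and every cyclic shift of ring_b."""
    best = None
    for shift in range(len(ring_b)):
        rotated = ring_b[shift:] + ring_b[:shift]
        d = sum((x - y) ** 2 for x, y in zip(ring_a, rotated))
        if best is None or d < best:
            best = d
    return best

predicate = [10, 200, 10, 10, 10, 10, 10, 10]     # one bright sample on the ring
rotated_copy = [10, 10, 10, 10, 10, 200, 10, 10]  # same ring, rotated
different = [10, 200, 10, 10, 10, 200, 10, 10]    # two bright samples

print(ring_distance(predicate, rotated_copy))
print(ring_distance(predicate, different))
```

A rotated copy of the same feature matches with zero distance, so one stored ring stands in for every rotation of it.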
The Compression Opportunity of SIVQ / VIPR:
It may be the same feature, but there is an enormous number of ways to sample it
• Typical Feature Vector:
– 25 x 25 pixels (x by y) or larger
• 625 translational degrees of freedom
– Effective radius of 12.5 pixels
– After Nyquist rotational sampling (2x spatial frequency)
• 2 x (2 x 12.5 x π) → 79 separate rotations
– 3 color planes
– 2 mirror symmetries
– At least 20 possible semi‐discrete length‐scale Nyquist samples
– All together, there are at least 625 x 79 x 3 x 2 x 20 = 5,925,000 possible ways to represent one possible vector (assuming twenty fixed magnifications in use)
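The count above is a straight product of the independent degrees of freedom, which the following sketch reproduces. All factor values are taken from the slide as stated, including the 79 rotations (for reference, the circumference of a circle of radius 12.5 pixels is about 78.5 pixels); the function and parameter names are hypothetical.

```python
import math

def sampling_variants(side=25, rotations=79, color_planes=3, mirrors=2, length_scales=20):
    """Multiply the independent degrees of freedom listed on the slide."""
    translations = side * side  # 25 x 25 = 625 translational positions
    return translations * rotations * color_planes * mirrors * length_scales

total = sampling_variants()
circumference = 2 * math.pi * 12.5  # ring circumference in pixels at radius 12.5

print(f"{total:,} ways to sample one feature")
print(f"circumference at radius 12.5: {circumference:.1f} pixels")
```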
Further Possible Reductions in Degrees of Freedom
• Length Scale
– Up to 20x impact on search space (40:2 magnification ratio)
• Dynamic Range (contrast)
– 3x impact on search space
• Black Level Offset (brightness)
– 5x impact on search space
• Biased distortion ellipsoid compression of fundamental circular vectors
– 30x (both angle of axis and degree of distortion)
• Total further reductions: at least 9,000 (20 x 3 x 5 x 30), or approximately 4 orders of magnitude, in addition to the initial 5.9 million‐to‐one reduction ratio
Total Realized Search Space Reductions
• RGB Images
– 5,925,000 x 10^4 ≈ 60 x 10^9
– (60 billion equivalent Cartesian vectors)
• Computational performance improves linearly with the reduction in required comparisons for each matching class (at least 60 billion times faster search for the predicate of interest)
• In many cases, a complete feature descriptor can be described with as little as a single vector.
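The totals on these two slides can be checked with a few lines of arithmetic. The sketch below uses only the factors stated on the slides; the dictionary keys and variable names are hypothetical. It shows both the exact product and the order-of-magnitude figure obtained when the further reduction of 9,000 is rounded up to 10^4.

```python
import math

# Further reduction factors as listed on the slide.
further = {
    "length scale": 20,
    "dynamic range": 3,
    "black level offset": 5,
    "ellipsoid distortion": 30,
}
further_total = math.prod(further.values())

base = 5_925_000                      # initial reduction ratio from the earlier slide
exact_total = base * further_total    # using the exact product of the further factors
rounded_total = base * 10 ** 4        # rounding the further factors to 4 orders of magnitude

print(f"further reductions: {further_total:,}")
print(f"exact total:        {exact_total:,}")
print(f"rounded total:      {rounded_total:,}")
```

The exact product is somewhat lower than the rounded figure, so the 60 billion on the slide is best read as an order-of-magnitude estimate.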
Motivation: Why Develop Semi‐Supervised and Unsupervised Tools for Differential Diagnosis Generation?
• Not to replace the pathologist, but rather to:
– Transition primary screening activities (time consuming and tedious) to a directed review paradigm (faster and less fatiguing)
– Add an interactive machine vision layer to the sign‐out process, conferring quantitative, prognostic and theragnostic data, as required
• Find all image‐based matches (or near‐matches) in a repository that correlate with the current image, based on spatially‐based acceptance criteria
• Use the matching images as a source of statistically convergent metadata that meets established thresholds for predictive power (standard ROC performance metrics) for key concepts such as:
– Diagnosis
– Biological potential of malignancy (e.g. survival)
– Expected disease‐free survival following specific therapies
– Image‐enhanced Kaplan‐Meier statistics
– Histology‐normalized response to therapeutic agents / regimen / clinical course
– Association with genomic data already known for the image‐matched cohort of cases (allowing for the constitutive image features to serve as a proxy for previously established multi‐dimensional correlates between morphology and the molecular basis of disease, once initial discovery has bridged the two)
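The "standard ROC performance metrics" mentioned above can be made concrete with a small sketch. It computes sensitivity and specificity at one threshold, plus the area under the ROC curve via its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The scores and labels are synthetic, and the function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else (0.5 if p == n else 0.0) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic classifier scores; label 1 = condition present, 0 = absent.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```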
Simple Use Case Already Reduced to Practice:
Ground Truth Cancer Mapping
• Useful for precisely identifying all areas of a whole‐slide image that are involved by malignancy
– Tumor quantitation
– Automated gating for LCM
– Fiducial mapping for multi‐modality fusion studies
• As vectors are internally derived for each case, inter‐slide variability from fixation and staining becomes inconsequential
Interactive Demonstration
• Web‐based deployment of WSI viewing in tandem with high‐performance computation
• Allows for real‐time analytical and diagnostic activities
• Publicly available
Final Observations
• The combined democratization of high‐throughput computing, as made possible with cloud‐based GPU solutions, in tandem with algorithms that can effectively operate in compressed data spaces, bodes well for the development of real time, high throughput WSI algorithms
• Continued migration to the cloud will accelerate the pace of discovery and implementation
• These tools are very real and the data they can extract will be incremental to our current diagnostic armamentarium
Important Information Regarding CME/SAMs
The Online CME/Evaluations/SAMs claim process will only be available on the USCAP website until September 30, 2017.
No claims can be processed after that date!
After September 30, 2017 you will NOT be able to obtain any CME or SAMs credits for attending this meeting.