Post on 12-Jun-2020
transcript
SUMMIT BERLIN
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.SUMMIT
Building an Image Analysis auto-scaling hybrid HPC to research cancer
Amador Pahim
Quality Assurance Engineer
Definiens AG
Our vision is to improve patient lives by matching patients to the
best therapies based on the most comprehensive digital profiling
Our proprietary technology finds structures, patterns and textures
in the tumor tissue image to better understand the disease biology
Project Steps
• Project Initiation
• Receiving Inspection
• Region annotations
• Image Analysis
• Data processing
• QC
• Report and Delivery
Definiens Proprietary Software
Internal Grid System
• Tissue
• Blur
• Nucleus Detection
• Tumor
• Stroma
• Annotations
• Cell Segmentation
• Level Aggregation
800 cores
35TB of RAM
Requirements
• Multiple data sets
• Different types of tasks
• Hybrid cloud support
• Auto scaling
• Job flow control
• Easy deployment
Architecture
• API
• Web UI
• Task Scheduler
• Task Dispatcher
• Executor (×4)
Deployment
• API
• Web UI
• Task Scheduler
• Task Dispatcher
• Executor (×4)
First Level of Parallelism: per input data
User Input
Tasks:
• Task1:
  • Type: python
  • App: convert_format.py
• Task2:
  • Type: spark
  • App: heatmaps_calculation.py
  • Upstream tasks: Task1
Input Data:
• Slide1
• Slide2
Resulting Tasks Workflow
• Task1/Slide1, Task1/Slide2
• Task2/Slide1, Task2/Slide2
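The expansion from user input to the resulting workflow can be sketched roughly as follows. This is a minimal illustration, not the Definiens implementation: the `TASKS` mapping, `expand_workflow` and `ready_nodes` are hypothetical names, and the sketch assumes each task instance depends only on its upstream task for the same slide.

```python
from itertools import product

# Hypothetical task specs mirroring the slide: task name -> upstream task names.
TASKS = {
    "Task1": [],         # python task, no upstream
    "Task2": ["Task1"],  # spark task, runs after Task1
}
INPUT_DATA = ["Slide1", "Slide2"]


def expand_workflow(tasks, input_data):
    """Return {node: [upstream nodes]} with one node per (task, input) pair."""
    workflow = {}
    for (task, upstream), item in product(tasks.items(), input_data):
        # Assumption: a node only waits for upstream tasks on the same input.
        workflow[f"{task}/{item}"] = [f"{up}/{item}" for up in upstream]
    return workflow


def ready_nodes(workflow, done):
    """Nodes not yet run whose upstream nodes have all finished."""
    return sorted(node for node, ups in workflow.items()
                  if node not in done and all(u in done for u in ups))


workflow = expand_workflow(TASKS, INPUT_DATA)
first_wave = ready_nodes(workflow, done=set())
second_wave = ready_nodes(workflow, done=set(first_wave))
print(first_wave)   # ['Task1/Slide1', 'Task1/Slide2']
print(second_wave)  # ['Task2/Slide1', 'Task2/Slide2']
```

Nodes in the same wave have no dependencies on each other, so a scheduler could hand them to executors in parallel, one per input slide.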
Second Level of Parallelism: multiprocessing
Task Dispatcher
Task1/Slide1
- Python Executor provisioning
- Task parameters
- Parallel execution of App run() methods
- Results report
- Executor teardown
Second Level of Parallelism: distributed processing
Task Dispatcher
Task2/Slide1
- Spark Driver provisioning
- Task parameters
- Spark Workers provisioning
- Processing orchestration
- Results report
- Cluster teardown
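The provision, run, report and teardown sequence above can be sketched with a context manager, which guarantees the teardown step even when the job fails. This sketch makes no real Spark calls: `spark_cluster`, `dispatch`, `count_partitions` and the `events` log are hypothetical stand-ins for the actual provisioning logic, which the slides do not show.

```python
from contextlib import contextmanager

events = []  # lifecycle log, stands in for real provisioning calls


@contextmanager
def spark_cluster(task, workers=3):
    """Hypothetical stand-in: provision a driver and workers, always tear down."""
    events.append(f"provision driver for {task}")
    events.extend(f"provision worker {i}" for i in range(workers))
    try:
        yield {"task": task, "workers": workers}
    finally:
        # Runs even if the job raises, so no cluster is left running.
        events.append("teardown cluster")


def dispatch(task, params, job):
    """Provision a cluster, run the job with its parameters, report, tear down."""
    with spark_cluster(task) as cluster:
        result = job(cluster, params)
        events.append(f"report {result}")
    return result


def count_partitions(cluster, params):
    # Placeholder for the driver orchestrating work across the workers.
    return params["partitions"] * cluster["workers"]


print(dispatch("Task2/Slide1", {"partitions": 4}, count_partitions))  # 12
print(events)
```

Tying the cluster lifetime to a single task is what lets the cost scale with the workload: nothing stays provisioned between tasks.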
Data Access
• DSS - Data Streaming Service
• Serves tiles from multiple file formats
• Standard data access service for internal applications
• Can be executed as a container
• Supports multiple storage backends (S3 included)
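One way to picture a tile service with pluggable storage backends is sketched below. It is purely illustrative and does not reflect the real DSS API: `TileService`, `InMemoryBackend`, the host-based routing, the `x_y` tile key layout and the `dss.example` host are all assumptions. Only the `dss://host/projects/.../slide` URL shape is taken from the deck.

```python
from urllib.parse import urlsplit


class InMemoryBackend:
    """Stand-in storage backend; a real one might wrap S3 or a file share."""

    def __init__(self, blobs):
        self._blobs = blobs

    def read(self, key):
        return self._blobs[key]


class TileService:
    """Hypothetical tile server that routes each request to a backend by host."""

    def __init__(self):
        self._backends = {}

    def register(self, host, backend):
        self._backends[host] = backend

    def get_tile(self, url, x, y):
        parts = urlsplit(url)  # e.g. dss://host/projects/<id>/<slide>
        backend = self._backends[parts.netloc]
        # Assumed key layout: <slide path>/<x>_<y>
        return backend.read(f"{parts.path.lstrip('/')}/{x}_{y}")


service = TileService()
service.register("dss.example", InMemoryBackend(
    {"projects/demo/slide1/0_0": b"tile-bytes"}))
print(service.get_tile("dss://dss.example/projects/demo/slide1", 0, 0))
```

Because callers only pass a URL and tile coordinates, an executor would not need to know which storage backend holds the slide.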
[Diagram: Data Access, showing Executor and DSS]
User Interface – First Version

from jpc import Job
from jpc import InputData
from jpc import PythonTask

task1 = PythonTask(name='heatmaps',
                   app='Heatmaps.py',
                   app_args=['-p', '-r'],
                   repository_url='git@git.definiens.com:projects/12312.git')

input_data = InputData([['dss://dss.definiens.com/projects/12312/slide1'],
                        ['dss://dss.definiens.com/projects/12312/slide2']])

job = Job(name='heatmaps_generation',
          tasks=[task1],
          input_data=input_data)

job_status = job.submit()
Next Steps
• Data Access Framework
• Final User Interface: Portal Integration
• More executors to come:
  • Amazon SageMaker
  • Amazon EMR
  • AWS Lambda experiment
• Projects billing
Thank you!
Amador Pahim
apahim@definiens.com