Date posted: 09-Jan-2017
Category: Healthcare
Uploaded by: matthieu-schapranow
Analyze Genomes: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data
Dr. Matthieu-P. Schapranow
SAPPHIRE, Orlando, USA
May 17, 2016
Where to find additional information?
■ Online: Visit we.analyzegenomes.com for the latest research results, slides, videos, tools, and publications
■ Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014
■ In Person: Join us for Intel Tech Talks at SAPPHIRE booth 625 daily!
□ May 17 12.30pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data
□ May 18 12.30pm: In-Memory Apps for Next Generation Life Sciences Research
□ May 19 11.30am: In-Memory Apps Supporting Precision Medicine
Schapranow, SAPPHIRE, May 17, 2016
A Federated In-Memory Database Computing Platform for Big Medical Data
Intelligent Healthcare Networks in the 21st Century?
[Figure: healthcare network whose actors are linked by direct and indirect interaction: Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, Patient Advocacy Group]
Intelligent Healthcare Networks in the 21st Century!
The Setting: Actors in Oncology
■ Patients
□ Individual anamnesis, family history, and background
□ Require fast access to individualized therapy
■ Clinicians
□ Identify root and extent of disease using laboratory tests
□ Evaluate therapy alternatives, adapt existing therapy
■ Researchers
□ Conduct laboratory work, e.g. analyze patient samples
□ Create new research findings and come up with treatment alternatives
IT Challenges: Distributed Heterogeneous Data Sources
■ Human genome/biological data: 600GB per full genome; 15PB+ in databases of leading institutes
■ Prescription data: 1.5B records from 10,000 doctors and 10M patients (100GB)
■ Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov
■ Human proteome: 160M data points (2.4GB) per sample; >3TB raw proteome data in ProteomicsDB
■ PubMed database: >23M articles
■ Hospital information systems: often more than 50GB
■ Medical sensor data: scan of a single organ in 1s creates 10GB of raw data
■ Cancer patient records: >160k records at NCT
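The figures above can be put into perspective with some back-of-the-envelope arithmetic. The following sketch derives average per-item sizes from the stated totals and estimates how long it would take to move a single full genome across a network. The decimal unit convention and the 1 Gbit/s link speed are assumptions made here for illustration only; they do not come from the slides.

```python
# Back-of-the-envelope arithmetic on the data volumes listed above.
# Assumption: decimal units (1 GB = 10**9 bytes); the slides do not state the convention.
GB = 10**9


def bytes_per_item(total_bytes: float, items: float) -> float:
    """Average size of one item given a total volume and an item count."""
    return total_bytes / items


def transfer_seconds(total_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time, ignoring protocol overhead and latency."""
    return total_bytes * 8 / link_bits_per_second


# Figures taken from the slide.
prescription_record = bytes_per_item(100 * GB, 1.5e9)  # 1.5B records in 100 GB
proteome_data_point = bytes_per_item(2.4 * GB, 160e6)  # 160M data points in 2.4 GB

# Hypothetical 1 Gbit/s link (an assumption, not a figure from the slides).
genome_transfer = transfer_seconds(600 * GB, 1e9)

print(f"{prescription_record:.1f} bytes per prescription record")
print(f"{proteome_data_point:.1f} bytes per proteome data point")
print(f"{genome_transfer / 60:.0f} minutes to move one 600 GB genome")
```

Under these assumptions, individual records are tiny while a single genome takes on the order of an hour to move, which is one way to see why moving data between sites is the expensive step.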
Our Approach: Analyze Genomes (Real-time Analysis of Big Medical Data)
[Figure: platform architecture with the following labeled components]
■ Apps
□ Drug Response Analysis
□ Pathway Topology Analysis
□ Medical Knowledge Cockpit
□ Oncolyzer
□ Clinical Trial Recruitment
□ Cohort Analysis
□ ...
■ In-Memory Database
□ Extensions for Life Sciences
□ Data Exchange, App Store
□ Access Control, Data Protection
□ Fair Use
□ Statistical Tools
□ Real-time Analysis
□ App-spanning User Profiles
■ Combined and Linked Data
□ Genome Data
□ Cellular Pathways
□ Genome Metadata
□ Research Publications
□ Pipeline and Analysis Models
□ Drugs and Interactions
■ Indexed Sources
Our Technology: In-Memory Database Technology
■ Combined column and row store
■ Map/Reduce
■ Single and multi-tenancy
■ Lightweight compression
■ Insert only for time travel
■ Real-time replication
■ Working on integers
■ SQL interface on columns and rows
■ Active/passive data store
■ Minimal projections
■ Group key
■ Reduction of software layers
■ Dynamic multi-threading
■ Bulk load of data
■ Object-relational mapping
■ Text retrieval and extraction engine
■ No aggregate tables
■ Data partitioning
■ Any attribute as index
■ No disk
■ On-the-fly extensibility
■ Analytics on historical data
■ Multi-core/parallelization
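Two of the keywords above, lightweight compression and working on integers, are commonly realized in column stores through dictionary encoding. The following is a minimal sketch of that general technique, not of the platform's actual implementation; the class name `DictionaryColumn` and the sample values are hypothetical.

```python
from typing import Dict, List


class DictionaryColumn:
    """A column stored as integer codes plus a dictionary of distinct values."""

    def __init__(self) -> None:
        self.dictionary: List[str] = []             # code -> value
        self._codes_by_value: Dict[str, int] = {}   # value -> code
        self.codes: List[int] = []                  # one integer code per row

    def append(self, value: str) -> None:
        """Append a row; unseen values extend the dictionary."""
        code = self._codes_by_value.get(value)
        if code is None:
            code = len(self.dictionary)
            self.dictionary.append(value)
            self._codes_by_value[value] = code
        self.codes.append(code)

    def count_equal(self, value: str) -> int:
        """Count matching rows by comparing integers instead of strings."""
        code = self._codes_by_value.get(value)
        if code is None:
            return 0
        return sum(1 for c in self.codes if c == code)

    def value_at(self, row: int) -> str:
        """Decode a single row back to its original value."""
        return self.dictionary[self.codes[row]]


# Hypothetical sample data: one chromosome label per row.
column = DictionaryColumn()
for chromosome in ["chr1", "chr2", "chr1", "chrX", "chr1"]:
    column.append(chromosome)

print(column.codes)                # [0, 1, 0, 2, 0]
print(column.count_equal("chr1"))  # 3
print(column.value_at(3))          # chrX
```

Each distinct string is stored once, and scans compare small integers, which is the property the two keywords point at.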
Where Are All Those Clouds Going?
[Figure: Gartner's 2014 Hype Cycle for Emerging Technologies]
Software Requirements in Life Sciences
■ Requirements
□ Real-time data analysis
□ Maintained software
■ Restrictions
□ Data privacy
□ Data locality
□ Volume of “big medical data”
■ Solution?
□ Federated In-Memory Database System vs. Cloud Computing
Approach I: Multiple Cloud Service Providers
[Figure: Site A and Site B each run a local system with local storage and a local synchronization service. Each site has its own cloud provider (Cloud Provider Site A, Cloud Provider Site B), which runs a cloud synchronization service with shared cloud storage.]
Approach II: A Single Service Provider
[Figure: Site A and Site B both access a single cloud provider, whose cloud system comprises a cloud synchronization service and shared cloud storage.]
Multiple Sites Forming the Federated In-Memory Database System (FIMDB)
[Figure: Federated In-Memory Database System spanning Site A, Site B, and the cloud provider. The cloud IMDB instance holds master data and shared algorithms; the local IMDB instances at Site A and Site B hold sensitive data, e.g. patient data.]
FIMDB: Cloud Service Provider
[Figure: The cloud service provider manages the federated in-memory database instance, algorithms, and applications. Federated in-memory database instances are labeled FIMDB A.1 to A.5, FIMDB B.1 to B.3, and FIMDB C.1, shown with Site A, Site B, and the cloud service provider. Master data is managed by the service provider; sensitive data resides at the sites.]
■ Change of cloud computing paradigm: Transfer (small) algorithms to (big) data
■ In-Memory Database (IMDB)
□ Landscape of IMDB nodes
□ Stored IMDB procedures and algorithms
□ Master data for applications
■ In-Memory File System (IMDBfs)
□ Integration of file-based tools
□ Managed services directory
□ OS binaries compiled and statically linked for individual platforms
FIMDB: Setup of a New Client
1. Establish a site-to-site VPN connection between the site and the cloud service provider
2. Mount the remote services directory
3. Install and configure a local IMDB instance from the services directory
4. Subscribe to and configure selected managed services
FIMDB: Incorporating Local Compute Resources
■ Data partitioning protects sensitive data by storing it on local hardware resources only
■ Supports parallel query execution, i.e. reduced processing time
■ Efficient use of existing hardware resources
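The combination of local data partitions and parallel query execution can be sketched as a scatter-gather pattern: the same small function runs at every site against that site's partition, and only aggregated results leave the site. This is a minimal illustration under assumptions, not the platform's API. The site names, the sample records, and the functions `count_variant_locally` and `federated_count` are hypothetical, and threads merely stand in for remote database instances.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical sensitive partitions; in a federated setup each list stays at its own site.
SITE_PARTITIONS: Dict[str, List[dict]] = {
    "site_a": [
        {"patient": "a-001", "variant": "BRAF V600E"},
        {"patient": "a-002", "variant": "KRAS G12D"},
        {"patient": "a-003", "variant": "BRAF V600E"},
    ],
    "site_b": [
        {"patient": "b-001", "variant": "BRAF V600E"},
        {"patient": "b-002", "variant": "EGFR L858R"},
    ],
}


def count_variant_locally(site: str, variant: str) -> int:
    """Runs 'at the site': scans the local records and returns only a count."""
    return sum(1 for record in SITE_PARTITIONS[site] if record["variant"] == variant)


def federated_count(variant: str) -> Dict[str, int]:
    """Send the same query to all sites in parallel and merge the partial counts."""
    sites = list(SITE_PARTITIONS)
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        partial_counts = list(
            pool.map(lambda site: count_variant_locally(site, variant), sites)
        )
    result = dict(zip(sites, partial_counts))
    result["total"] = sum(partial_counts)
    return result


print(federated_count("BRAF V600E"))  # {'site_a': 2, 'site_b': 1, 'total': 3}
```

Patient-level records never appear in the merged result; only per-site counts and their sum do, which mirrors the idea of keeping sensitive data on local hardware.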
What to Take Home? Test it Yourself: AnalyzeGenomes.com
■ Brings algorithms to data
■ Forms a single database across individual sites and locations
■ Master data managed by service provider whilst sensitive data resides locally
■ Pros
□ Single database license
□ Easy to consume services
□ Query propagation by IMDB
□ Only a single source of truth
■ Cons
□ Complex operation
□ Time-consuming infrastructure setup
Keep in contact with us!
Dr. Matthieu-P. Schapranow
Program Manager E-Health & Life Sciences
Hasso Plattner Institute
August-Bebel-Str. 88
14482 Potsdam, Germany
http://we.analyzegenomes.com/