
Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of...

Date post: 09-Jan-2017
Upload: matthieu-schapranow
Analyze Genomes: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data Dr. Matthieu-P. Schapranow SAPPHIRE, Orlando, USA May 17, 2016
Transcript
Page 1: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Analyze Genomes: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data

Dr. Matthieu-P. Schapranow

SAPPHIRE, Orlando, USA

May 17, 2016

Page 2: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications

■  Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014

■  In Person: Join us for Intel Tech Talks at SAPPHIRE booth 625 daily!

□  May 17 12.30pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data

□  May 18 12.30pm: In-Memory Apps for Next Generation Life Sciences Research

□  May 19 11.30am: In-Memory Apps Supporting Precision Medicine

Where to find additional information?

Schapranow, SAPPHIRE, May 17, 2016

A Federated In-Memory Database Computing Platform for Big Medical Data

2

Page 3: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

[Figure: healthcare network linking Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, and Patient Advocacy Group through direct and indirect interaction]

Intelligent Healthcare Networks in the 21st Century?


Page 4: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

[Figure: healthcare network linking Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, and Patient Advocacy Group through direct and indirect interaction]

Intelligent Healthcare Networks in the 21st Century?


Page 5: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

[Figure: healthcare network linking Clinician, Patient, Researcher, Pharmaceutical Company, Healthcare Providers, Hospital, Research Center, Laboratory, and Patient Advocacy Group through direct and indirect interaction]

Intelligent Healthcare Networks in the 21st Century!


Page 6: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Patients

□  Individual anamnesis, family history, and background

□  Require fast access to individualized therapy

■  Clinicians

□  Identify root and extent of disease using laboratory tests

□  Evaluate therapy alternatives, adapt existing therapy

■  Researchers

□  Conduct laboratory work, e.g. analyze patient samples

□  Create new research findings and come up with treatment alternatives

The Setting: Actors in Oncology


Page 7: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

IT Challenges: Distributed Heterogeneous Data Sources

■  Human genome/biological data
□  600 GB per full genome
□  15 PB+ in databases of leading institutes
■  Prescription data
□  1.5B records from 10,000 doctors and 10M patients (100 GB)
■  Clinical trials
□  Currently more than 30k recruiting on ClinicalTrials.gov
■  Human proteome
□  160M data points (2.4 GB) per sample
□  >3 TB raw proteome data in ProteomicsDB
■  PubMed database
□  >23M articles
■  Hospital information systems
□  Often more than 50 GB
■  Medical sensor data
□  Scan of a single organ in 1 s creates 10 GB of raw data
■  Cancer patient records
□  >160k records at NCT
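
To put these volumes in proportion, the following back-of-the-envelope sketch derives per-item sizes from the figures on this slide. It assumes decimal units (1 GB = 10^9 bytes), which the slide does not state, and the variable names are purely illustrative.

```python
# Back-of-the-envelope sizes derived from the slide's figures.
# Assumption: decimal units (1 MB = 1e6 bytes, 1 GB = 1e9, 1 PB = 1e15).
MB = 10**6
GB = 10**9
PB = 10**15


def bytes_per_item(total_bytes: int, items: int) -> float:
    """Average storage footprint of one item."""
    return total_bytes / items


# 1.5B prescription records in 100 GB
prescription = bytes_per_item(100 * GB, 1_500_000_000)
# 160M proteome data points in 2.4 GB (written as 2,400 MB to stay in integers)
proteome = bytes_per_item(2_400 * MB, 160_000_000)
# Number of 600 GB full genomes that add up to 15 PB
genomes_in_15pb = (15 * PB) // (600 * GB)

print(f"{prescription:.1f} bytes per prescription record")
print(f"{proteome:.1f} bytes per proteome data point")
print(f"{genomes_in_15pb} full genomes in 15 PB")
```

The per-item footprints are small; the challenge named in the slide title comes from the number of items and from the sources being distributed and heterogeneous.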


Page 8: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Our Approach: Analyze Genomes: Real-time Analysis of Big Medical Data

[Figure: architecture of the Analyze Genomes platform. Labels in extraction order:]
■  In-Memory Database; Extensions for Life Sciences
■  Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles; Combined and Linked Data
■  Genome Data; Cellular Pathways; Genome Metadata; Research Publications; Pipeline and Analysis Models; Drugs and Interactions
■  Drug Response Analysis; Pathway Topology Analysis; Medical Knowledge Cockpit; Oncolyzer; Clinical Trial Recruitment; Cohort Analysis; ...
■  Indexed Sources

Page 9: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Our Technology: In-Memory Database Technology

■  Combined column and row store
■  Map/Reduce
■  Single and multi-tenancy
■  Lightweight compression
■  Insert only for time travel
■  Real-time replication
■  Working on integers
■  SQL interface on columns and rows
■  Active/passive data store
■  Minimal projections
■  Group key
■  Reduction of software layers
■  Dynamic multi-threading
■  Bulk load of data
■  Object-relational mapping
■  Text retrieval and extraction engine
■  No aggregate tables
■  Data partitioning
■  Any attribute as index
■  No disk
■  On-the-fly extensibility
■  Analytics on historical data
■  Multi-core/parallelization
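
Two of the labels on this slide, "Lightweight compression" and "Working on integers", are commonly realized in column stores through dictionary encoding. The slide does not say which scheme the platform uses, so the following is only a minimal sketch of the general technique, with hypothetical function names and sample values.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its integer position in a sorted dictionary."""
    dictionary = sorted(set(column))
    position: Dict[str, int] = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [position[value] for value in column]


def count_equal(dictionary: List[str], codes: List[int], value: str) -> int:
    """Count matches by comparing integers instead of strings."""
    try:
        code = dictionary.index(value)  # single lookup in the dictionary
    except ValueError:
        return 0  # the value never occurs in the column
    return sum(1 for c in codes if c == code)


# Hypothetical sample column of chromosome names
chromosomes = ["chr1", "chrX", "chr1", "chr2", "chr1", "chrX"]
dictionary, codes = dictionary_encode(chromosomes)
print(dictionary)
print(codes)
print(count_equal(dictionary, codes, "chr1"))
```

Each distinct string is stored once, and the scan in `count_equal` touches only small integers, which is where both the space saving and the faster comparisons come from.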


Page 10: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Where Do All Those Clouds Go?


Gartner's 2014 Hype Cycle for Emerging Technologies

Page 11: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Requirements

□  Real-time data analysis

□  Maintained software

■  Restrictions

□  Data privacy

□  Data locality

□  Volume of “big medical data”

■  Solution?

□  Federated In-Memory Database System vs. Cloud Computing

Software Requirements in Life Sciences


Page 12: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Approach I: Multiple Cloud Service Providers


[Figure: Site A and Site B each run a Local System with Local Storage and a Local Synchronization Service; Cloud Provider Site A and Cloud Provider Site B each run a Cloud Synchronization Service with Shared Cloud Storage]

Page 13: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Approach II: A Single Service Provider


[Figure: Site A and Site B both access one Cloud System at a single Cloud Provider, which runs the Cloud Synchronization Service and Shared Cloud Storage]

Page 14: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Multiple Sites Forming the Federated In-Memory Database System (FIMDB)


[Figure: the Federated In-Memory Database System spans Site A, Site B, and the Cloud Provider. The Cloud IMDB Instance holds Master Data and Shared Algorithms; each site runs a Local IMDB Instance holding Sensitive Data, e.g. Patient Data]

Page 15: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

FIMDB: Cloud Service Provider


[Figure: FIMDB instances A.1 to A.5, B.1 to B.3, and C.1 spread across Site A, Site B, and the Cloud Service Provider. Labels: Federated In-Memory Database Instances; Federated In-Memory Database Instance, Algorithms, and Applications Managed by Service Provider; Master Data Managed by Service Provider; Sensitive Data Reside at Site]

■  Change of cloud computing paradigm: Transfer (small) algorithms to (big) data

■  In-Memory Database (IMDB)

□  Landscape of IMDB nodes

□  Stored IMDB procedures and algorithms

□  Master data for applications

■  In-Memory File System (IMDBfs)

□  Integration of file-based tools

□  Managed services directory

□  OS binaries compiled and statically linked for individual platforms

Page 16: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

1.  Establish site-to-site VPN connection between site and cloud service provider

2.  Mount remote services directory

3.  Install and configure local IMDB instance from services directory

4.  Subscribe to and configure selected managed services

FIMDB: Setup of a New Client


Page 17: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Data partitioning protects sensitive data by storing it on local hardware resources only

■  Supports parallel query execution, i.e. reduced processing time

■  Efficient use of existing hardware resources
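
The three points above can be illustrated together. The sketch below is not the platform's implementation: it assumes two hypothetical sites with made-up records, ships a small aggregation function to each site, runs the sites in parallel with Python's standard `concurrent.futures`, and merges only the partial aggregates, so raw patient rows never leave a site.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Partial = Tuple[int, float]  # (number of matching rows, sum of their ages)


class Site:
    """Hypothetical local IMDB instance; its rows stay inside this object."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self._rows = rows

    def run(self, algorithm: Callable[[List[Row]], Partial]) -> Partial:
        # The algorithm travels to the data; only its result is returned.
        return algorithm(self._rows)


def age_stats_for(variant: str) -> Callable[[List[Row]], Partial]:
    """Build the small algorithm that is shipped to every site."""
    def algorithm(rows: List[Row]) -> Partial:
        ages = [float(r["age"]) for r in rows if r["variant"] == variant]
        return len(ages), sum(ages)
    return algorithm


def federated_mean_age(sites: List[Site], variant: str) -> float:
    """Run the algorithm on all sites in parallel and merge the partials."""
    algorithm = age_stats_for(variant)
    with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
        partials = list(pool.map(lambda site: site.run(algorithm), sites))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else float("nan")


# Made-up example data for two sites
site_a = Site("A", [{"age": 61, "variant": "BRAF V600E"},
                    {"age": 47, "variant": "KRAS G12D"}])
site_b = Site("B", [{"age": 55, "variant": "BRAF V600E"},
                    {"age": 70, "variant": "BRAF V600E"}])
print(federated_mean_age([site_a, site_b], "BRAF V600E"))
```

Returning a count and a sum rather than a per-site mean is a deliberate choice: partial means cannot be merged correctly when sites hold different numbers of matching rows.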

FIMDB: Incorporating Local Compute Resources


Page 18: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Brings algorithms to data

■  Forms a single database across individual sites and locations

■  Master data managed by service provider whilst sensitive data resides locally

What to Take Home? Test it Yourself: AnalyzeGenomes.com


■  Pros
□  Single database license
□  Easy to consume services
□  Query propagation by IMDB
□  Only a single source of truth
■  Cons
□  Complex operation
□  Time-consuming infrastructure setup

Page 19: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

■  Online: Visit we.analyzegenomes.com for latest research results, slides, videos, tools, and publications

■  Offline: High-Performance In-Memory Genome Data Analysis: In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014

■  In Person: Join us for Intel Tech Talks at SAPPHIRE booth 625 daily!

□  May 17 12.30pm: A Federated In-Memory Database Computing Platform Enabling Real-time Analysis of Big Medical Data

□  May 18 12.30pm: In-Memory Apps for Next Generation Life Sciences Research

□  May 19 11.30am: In-Memory Apps Supporting Precision Medicine

Where to find additional information?


Page 20: Analyze Genomes: A Federated In-memory Database Computing Platform enabling real-time Analysis of Big Medical Data

Keep in contact with us!

Dr. Matthieu-P. Schapranow

Program Manager E-Health & Life Sciences

Hasso Plattner Institute

August-Bebel-Str. 88, 14482 Potsdam, Germany

[email protected]

http://we.analyzegenomes.com/


