Analyze Genomes: A Federated In-Memory Database System For Life Sciences

Dr. Matthieu-P. Schapranow HPI Future SOC Lab Day, Potsdam, Germany

Nov 4, 2015

Generously supported by

Important things first: Where do you find additional information?

■  Online: Visit we.analyzegenomes.com for the latest research results, tools, and news

■  Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014

■  In Person: Join us for the “Festival of Genomics”, Jan 19-21, 2016 in London, UK

Schapranow/Perscheid, FSOC Lab Day, Nov 4, 2015

A Federated In-Memory Database System For Life Sciences

The Setting: Actors in Oncology

■  Patients

□  Individual anamnesis, family history, and background

□  Require fast access to individualized therapy

■  Clinicians

□  Identify root and extent of disease using laboratory tests

□  Evaluate therapy alternatives, adapt existing therapy

■  Researchers

□  Conduct laboratory work, e.g. analyze patient samples

□  Create new research findings and come up with treatment alternatives

IT Challenges: Distributed Heterogeneous Data Sources

■  Human genome/biological data: 600GB per full genome, 15PB+ in databases of leading institutes

■  Prescription data: 1.5B records from 10,000 doctors and 10M patients (100GB)

■  Clinical trials: currently more than 30k recruiting on ClinicalTrials.gov

■  Human proteome: 160M data points (2.4GB) per sample, >3TB raw proteome data in ProteomicsDB

■  PubMed database: >24M articles

■  Hospital information systems: often more than 50GB

■  Medical sensor data: scan of a single organ in 1s creates 10GB of raw data

■  Cancer patient records: >160k records at NCT

Software Requirements in Life Sciences

■  Requirements

□  Real-time data analysis

□  Maintained software

■  Restrictions

□  Data privacy

□  Data locality

□  Volume of “big medical data”

■  Solution?

□  Federated In-Memory Database System vs. Cloud Computing

Where are all those Clouds going?

Gartner's 2014 Hype Cycle for Emerging Technologies

Multiple Cloud Service Providers

Schapranow, BIRTE/VLDB 2015, Aug 31, 2015

[Figure: architecture diagram. Recoverable labels: Site A and Site B, each with a Local System, Local Storage, and a Local Synchronization Service; Cloud Provider Site A and Cloud Provider Site B, each with a Cloud Synchronization Service and Shared Cloud Storage.]

Federated In-Memory Database (FIMDB) Incorporating Local Compute Resources

[Figure: federated landscape. Recoverable labels: Cloud Service Provider; Site A; Site B; instances FIMDB A.1 to A.5, FIMDB B.1 to B.3, and FIMDB C.1; "Federated In-Memory Database Instance, Algorithms, and Applications Managed by Service Provider"; "Federated In-Memory Database Instances"; "Master Data Managed by Service Provider"; "Sensitive Data reside at Site".]

■  Aim: Provision of managed Analyze Genomes services while sensitive data remain local

■  Process steps

□  Connect existing resources to join federated database landscape

□  Install Workers on local nodes to process sensitive data and store results in local DB instances
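The process steps above can be illustrated with a minimal Python sketch. It assumes a much simpler setup than the real system: `LocalSite`, `run_worker`, `federated_count`, and the sample records are hypothetical names and data invented for this illustration, not part of Analyze Genomes. The point it shows is that the worker runs next to the sensitive data, stores its result locally, and only a derived aggregate leaves the site.

```python
from dataclasses import dataclass, field


@dataclass
class LocalSite:
    """Hypothetical stand-in for one site's local DB instance plus worker."""
    name: str
    sensitive_rows: list  # patient-level records; never returned to callers
    local_results: dict = field(default_factory=dict)

    def run_worker(self, task_name, task):
        # The worker executes next to the data and stores the result locally.
        result = task(self.sensitive_rows)
        self.local_results[task_name] = result
        return result  # only the derived aggregate leaves the site


def count_egfr_carriers(rows):
    """Example task: count patients with a variant in EGFR (toy data)."""
    return sum(1 for row in rows if "EGFR" in row["mutated_genes"])


def federated_count(sites, task_name, task):
    """Collect per-site aggregates without moving patient-level rows."""
    return {site.name: site.run_worker(task_name, task) for site in sites}


site_a = LocalSite("Site A", [
    {"patient": "a1", "mutated_genes": {"EGFR", "TP53"}},
    {"patient": "a2", "mutated_genes": {"KRAS"}},
])
site_b = LocalSite("Site B", [
    {"patient": "b1", "mutated_genes": {"EGFR"}},
])

per_site = federated_count([site_a, site_b], "egfr_carriers", count_egfr_carriers)
total = sum(per_site.values())
print(per_site, total)
```

In this sketch the coordinator only ever sees the per-site counts; a real deployment would additionally need authentication, transport security, and checks that aggregates do not re-identify patients.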

Analyze Genomes: Real-time Analysis of Big Medical Data

[Figure: Analyze Genomes architecture. Recoverable labels: In-Memory Database; Extensions for Life Sciences (Data Exchange, App Store; Access Control, Data Protection; Fair Use; Statistical Tools; Real-time Analysis; App-spanning User Profiles); Combined and Linked Data (Genome Data; Cellular Pathways; Genome Metadata; Research Publications; Pipeline and Analysis Models; Drugs and Interactions); Drug Response Analysis; Pathway Topology Analysis; Medical Knowledge Cockpit; Oncolyzer; Clinical Trial Recruitment; Cohort Analysis; ...; Indexed Sources.]

Use Case: Identification of Best Treatment Option for Cancer Patient

■  Patient: 48 years, female, non-smoker, smoke-free environment

■  Diagnosis: Non-Small Cell Lung Cancer (NSCLC), stage IV

1.  Surgery to remove tumor

2.  Tumor sample is sent to laboratory to extract DNA

3.  DNA is sequenced resulting in up to 750 GB of raw data per sample

4.  Processing of raw data to perform analysis

5.  Identification of relevant driver mutations using international medical knowledge

6.  Informed decision making

From Raw Genome Data to Analysis

■  Sequencing: Acquire digital DNA data

■  Alignment: Reconstruction of complete genome with snippets

■  Variant Calling: Identification of genetic variants

■  Data Annotation: Linking genetic variants with research findings
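A toy Python sketch of the four stages may help make them concrete. Everything here is an assumption made for illustration: the reference string, the reads, the naive Hamming-distance aligner, the majority-vote variant caller, and the one-entry knowledge table are invented and far simpler than production tools. Sequencing is represented only by the list of reads it would produce.

```python
from collections import Counter, defaultdict

REFERENCE = "ACGTACGTTAGC"  # toy reference sequence (assumption)

# Sequencing output: short reads from a sample carrying T->C at position 8.
READS = ["ACGTAC", "GTACGTCA", "CGTCAGC"]


def align(read, reference):
    """Alignment: return the offset with the fewest mismatches (naive scan)."""
    offsets = range(len(reference) - len(read) + 1)

    def mismatches(offset):
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))

    return min(offsets, key=mismatches)


def call_variants(reads, reference):
    """Variant calling: pile up bases per position, report majority differences."""
    pileup = defaultdict(Counter)
    for read in reads:
        offset = align(read, reference)
        for i, base in enumerate(read):
            pileup[offset + i][base] += 1
    variants = []
    for pos in sorted(pileup):
        base, _count = pileup[pos].most_common(1)[0]
        if base != reference[pos]:
            variants.append((pos, reference[pos], base))
    return variants


def annotate(variants, knowledge):
    """Data annotation: attach known findings to each variant, if any."""
    return [(v, knowledge.get(v, "no annotation")) for v in variants]


KNOWLEDGE = {(8, "T", "C"): "hypothetical driver mutation"}  # invented entry

variants = call_variants(READS, REFERENCE)
annotated = annotate(variants, KNOWLEDGE)
print(annotated)
```

Real aligners and variant callers use indexed references, base qualities, and statistical models; the sketch only shows how each stage consumes the output of the previous one.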

Standardized Modeling of Genome Data Analysis Pipelines

■  Graphical modeling of analysis pipelines

□  Supports reproducible research

□  BPMN-2.0-compliant

■  Extension of modeling notation by

□  Modular structure

□  Degree of parallelization

□  Parameters/variables

■  Pipelines stored in IMDB and executed through our worker framework
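As a rough illustration of what such a model has to capture, the following Python sketch represents a pipeline with modular steps, a degree of parallelization per step, and parameters that reference pipeline variables. It is not BPMN and not the project's actual storage format; `Step`, `Pipeline`, `expand`, and all field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    parallel: int = 1                      # degree of parallelization
    params: dict = field(default_factory=dict)
    depends_on: tuple = ()                 # names of prerequisite steps


@dataclass
class Pipeline:
    name: str
    variables: dict                        # values substituted into params
    steps: list

    def expand(self):
        """Resolve variables and split each step into its parallel subtasks."""
        tasks = []
        for step in self.steps:
            resolved = {
                key: str(value).format(**self.variables)
                for key, value in step.params.items()
            }
            for part in range(step.parallel):
                tasks.append({
                    "step": step.name,
                    "part": part,
                    "of": step.parallel,
                    "params": resolved,
                    "depends_on": step.depends_on,
                })
        return tasks


pipeline = Pipeline(
    name="toy_variant_pipeline",
    variables={"sample": "S1"},
    steps=[
        Step("alignment", parallel=4, params={"input": "{sample}.fastq"}),
        Step("variant_calling", parallel=2, depends_on=("alignment",)),
        Step("annotation", depends_on=("variant_calling",)),
    ],
)

tasks = pipeline.expand()
print(len(tasks), tasks[0])
```

Keeping the model as plain data is what makes it storable in a database and reproducible: the same model and variables always expand to the same set of subtasks.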

Execution of Genome Data Analysis Pipelines

■  Dedicated scheduler for optimized pipeline execution

□  Assigns tasks to workers

□  Recovery of pipeline status

■  Scheduler uses IMDB logs for workload estimation

■  Different scheduling algorithms available, e.g.

□  High Throughput

□  Priority First

□  User-/Group-based
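The idea of interchangeable scheduling algorithms can be sketched in a few lines of Python. This is an assumption-laden illustration, not the project's scheduler: `Task`, `POLICIES`, and `schedule` are hypothetical names, "High Throughput" is interpreted here as shortest-estimated-runtime-first, lower priority numbers are treated as more urgent, and the runtime estimates are simply given rather than derived from database logs. A user- or group-based policy is omitted for brevity.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    user: str
    priority: int        # lower number = more urgent (assumption)
    est_seconds: float   # estimated runtime, e.g. from logged past runs


# Each policy is just a sort key over the pending tasks.
POLICIES = {
    "priority_first": lambda t: (t.priority, t.est_seconds),
    "high_throughput": lambda t: (t.est_seconds, t.priority),
}


def schedule(tasks, workers, policy):
    """Assign tasks in policy order to whichever worker becomes free first."""
    free_at = [(0.0, worker) for worker in workers]  # (time free, worker)
    heapq.heapify(free_at)
    plan = []
    for task in sorted(tasks, key=POLICIES[policy]):
        start, worker = heapq.heappop(free_at)
        plan.append((task.name, worker, start))
        heapq.heappush(free_at, (start + task.est_seconds, worker))
    return plan


pending = [
    Task("align", "alice", priority=2, est_seconds=30),
    Task("call", "bob", priority=1, est_seconds=20),
    Task("annotate", "alice", priority=3, est_seconds=5),
]

priority_plan = schedule(pending, ["w1", "w2"], "priority_first")
throughput_plan = schedule(pending, ["w1", "w2"], "high_throughput")
print(priority_plan)
print(throughput_plan)
```

Because the policy is only a sort key, adding another algorithm does not change how tasks are handed to workers.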

[Figure: pipeline execution. Recoverable labels: IMDB, Pipeline Tasks, Scheduler, four Workers, Pipeline Subtasks, Events, Data.]

Real-time Analysis of Genetic Variants

■  Genome Browser enables detailed exploration of genome loci and their associations

■  Ranks variants according to known diseases

■  Integrates latest international medical knowledge, annotations, and literature

■  Provides links back to primary data sources, e.g. EBI, NCBI, dbSNP, and UCSC

Medical Knowledge Cockpit

■  Uses patient specifics to provide more adequate results

■  Immediate exploration of relevant information, e.g.

□  Gene descriptions

□  Molecular impact and related pathways

□  Scientific publications

□  Suitable clinical trials

■  Translates manual searching for hours or days into finding

Drug Response Analysis

■  Incorporates knowledge about historical cases to optimize treatment of current cases

■  Enables real-time exploration of Xenograft experiments

■  Configurable medical model to predict drug response

Join us for upcoming projects!

■  Global Medical Knowledge (Master’s project)

■  Detect cardiovascular diseases and evaluate treatment options (DHZB)

■  Use health insurance data to improve health care research (AOK)

■  Pharmacogenetics (Bayer)

■  Generously supported by

Interdisciplinary Design Thinking Teams

You?

What to Take Home? Test it Yourself: AnalyzeGenomes.com

■  For patients

□  Identify relevant clinical trials and medical experts

□  Become an informed patient

■  For clinicians

□  Identify pharmacokinetic correlations

□  Scan for similar patient cases, e.g. to evaluate therapy efficiency

■  For researchers

□  Enable real-time analysis of medical data, e.g. assess pathways to identify impact of detected variants

□  Combined mining in structured and unstructured data, e.g. publications, diagnoses, and EMR data

Keep in contact with us!

Hasso Plattner Institute Enterprise Platform & Integration Concepts (EPIC)

August-Bebel-Str. 88 14482 Potsdam, Germany

Dr. Matthieu-P. Schapranow Program Manager E-Health

[email protected]

Cindy Perscheid Research Assistant [email protected]

