Date post: 03-Jun-2020
Big data technologies to support the integration of healthcare and research data Riccardo Bellazzi University of Pavia Italy
Transcript
Page 1: Big data technologies to support the integration of ... · SQL vs. NoSQL with genetic variants. Query. DB population. O’Connor et al. BMC Bioinformatics 2010 11 ... I2b2 –NGS/NOSQL

Big data technologies to support the integration of healthcare and research data

Riccardo Bellazzi
University of Pavia, Italy


Knowledge and data integration


Clinical Bioinformatics (i2b2)

Knowledge discovery

Knowledge to practice

Bedside to bench

Test new Knowledge


BIOINFORMATICS METHODOLOGY AND TECHNOLOGY TO INTEGRATE CLINICAL AND BIOLOGICAL KNOWLEDGE SUPPORTING ONCOLOGY TRANSLATIONAL RESEARCH

ONCO-i2b2 project


Architecture overview

[Diagram labels: HIS; Clinical patient management; Data; Laboratory; Research; Samples; Biobank; CRC; Anonymized data; Anonymized samples; i2b2; Researcher; Patient; Match IDs]


FSM - i2b2 instances

Active since  Patients  Visits   Observations  Genetic data  NLP
2011          5.611     7.726    23.175        n             n

Onco-i2b2
Active since  Patients  Visits   Observations  Genetic data  NLP
2010          28.838    142.464  2.341.771     y             y

Cardiology
Active since  Patients  Visits   Observations  Genetic data  NLP
2009          6.334     15.094   205.418       y             n

Administration
Active since  Patients  Visits   Observations  Genetic data  NLP
Biobank       923       -        8188          -             y

Sub-project: biobank


Predicting the development of diabetes complications and assessing the evolution of the disease


DATA: Clinical + Administrative information

FROM HOSPITALS
. Follow-up visits
. Medications
. Labs

FROM LOCAL HEALTHCARE AGENCIES
. Drug purchases
. Hospitalizations
. Environmental data

Data pre-processing and organization

Temporal and Workflow Data Mining

What happens to the patient inside the hospital

What happens to the patient outside the hospital

Administrative Data

Clinical Data Analysis

Models

1000 patients
Rich temporal characterization

Knowledge to practice


Big data – big opportunity for innovation [1]

[1] http://www.innovationexcellence.com/

2 directions: Analytics; Data storage and retrieval


Analytics on Map-Reduce Parallel Programming Paradigm

Relies on a distributed file system

Some libraries available for data mining
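
To make the paradigm concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps. It is not the implementation used in the talk: the toy records, the function names and the per-SNP allele total are all assumptions chosen for illustration, and a real deployment would run the same steps over a distributed file system rather than an in-memory list.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical toy input: (individual_id, snp_id, minor_allele_count) records.
RECORDS = [
    ("p1", "rs1", 0), ("p1", "rs2", 1),
    ("p2", "rs1", 2), ("p2", "rs2", 1),
    ("p3", "rs1", 1), ("p3", "rs2", 0),
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    _individual, snp, count = record
    return snp, count


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values, 0)


def map_reduce(records):
    """Run the three steps in sequence on an in-memory list of records."""
    pairs = map(map_phase, records)
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


allele_totals = map_reduce(RECORDS)
```

Because each call to `map_phase` and each call to `reduce_phase` depends only on its own input, both steps can be spread across worker nodes, which is what makes the paradigm attractive for genome-wide data.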

02/06/2014


“Whole genome” predictors


Map-reduce at work


Training

Testing


Results – predicting longevity

All SNPs

acc   0.6178  [0.6001-0.6356]
sens  0.1317  [0.0891-0.1738]
spec  0.9783  [0.9665-0.9902]
mcc   0.2102  [0.1497-0.2707]
ppv   0.8101  [0.7196-0.9005]
npv   0.6035  [0.5916-0.6153]
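
For readers unfamiliar with the abbreviations, the sketch below shows how these six metrics are conventionally derived from a binary confusion matrix. The counts in the example call are hypothetical and are not the data behind the slide; the function name is likewise an assumption.

```python
import math


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute the six metrics from true/false positive/negative counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "acc": safe_div(tp + tn, tp + fp + tn + fn),  # accuracy
        "sens": safe_div(tp, tp + fn),                # sensitivity (recall)
        "spec": safe_div(tn, tn + fp),                # specificity
        "ppv": safe_div(tp, tp + fp),                 # positive predictive value
        "npv": safe_div(tn, tn + fn),                 # negative predictive value
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),  # Matthews corr. coeff.
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=12)
```

A pattern like the one reported above (high specificity and ppv, low sensitivity) arises when a classifier rarely predicts the positive class but is usually right when it does.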

Serial implementation: 284 minutes

Map reduce (single slave): 34 minutes

Top predictors


Handling queries on variants from NGS

One patient - One exome - More than 20000 variants

The majority of them with role still unknown

Store them in files

Forget them

Query them ..


SQL vs. NoSQL with genetic variants

Query

DB population

O’Connor et al. BMC Bioinformatics 2010 11(Suppl 12):S2 doi:10.1186/1471-2105-11-S12-S2


NoSQL (CouchDB) pros

Advantages
• Mutations stored in JSON format (easily readable & exportable)
• Cloud-based system (highly scalable)
• Fast pre-computed queries (no limit on the number of queries)

Technologies
• Amazon Web Services
• CouchDB & BigCouch
• i2b2 (query platform & phenotype integration)
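
The idea of JSON-stored mutations with pre-computed queries can be sketched as follows. This is an in-memory imitation written in Python, not CouchDB itself (CouchDB views are normally defined as JavaScript map functions), and every field name, gene label and identifier here is hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical variant documents; the field names are illustrative only.
VARIANT_DOCS = [
    {"_id": "v1", "patient_id": "p1", "gene": "GENE_A", "effect": "missense"},
    {"_id": "v2", "patient_id": "p2", "gene": "GENE_A", "effect": "synonymous"},
    {"_id": "v3", "patient_id": "p2", "gene": "GENE_B", "effect": "nonsense"},
    {"_id": "v4", "patient_id": "p3", "gene": "GENE_A", "effect": "missense"},
]


def map_by_gene_effect(doc):
    """View map function: emit ((gene, effect), patient_id) for each document."""
    yield (doc["gene"], doc["effect"]), doc["patient_id"]


def build_view(docs, map_fn):
    """Pre-compute the index once, so that later lookups are plain key reads."""
    index = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            index[key].append(value)
    return index


view = build_view(VARIANT_DOCS, map_by_gene_effect)

# Querying is a key lookup on the pre-computed index, not a scan of all documents.
gene_a_missense = sorted(set(view[("GENE_A", "missense")]))

# Each document is already JSON-serializable, which is what makes export easy.
exported = json.dumps(VARIANT_DOCS[0], sort_keys=True)
```

The design trade-off is that the index costs time and space when documents are loaded, in exchange for cheap repeated queries afterwards.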


I2b2 – NGS/NOSQL Cell


I2b2 NGS plugin


A test

Deployed on Amazon AWS
55 exome variant sets (1.2 M variants)

Task 1: Extract individuals with a specific phenotype and with missense or nonsense mutations in a gene of interest

Task 2: Refine Task 1 results considering only mutations that PolyPhen-2 predicts to be damaging

Task 3: Apply Task 2 logic to more complex phenotypes

Task 4: Check individuals with a specific autosomal dominant variant
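
The logic of the first two tasks can be sketched as plain filters over in-memory records. This is only an illustration under stated assumptions: the phenotype labels, gene names, record layout and function names are hypothetical, and the actual test ran through i2b2 and CouchDB rather than Python lists.

```python
# Hypothetical phenotype assignments per individual.
PHENOTYPES = {"p1": {"phenoX"}, "p2": {"phenoX"}, "p3": {"phenoY"}}

# Hypothetical variant records; "polyphen2" is None where no prediction exists.
VARIANTS = [
    {"patient_id": "p1", "gene": "GENE_A", "effect": "missense", "polyphen2": "benign"},
    {"patient_id": "p2", "gene": "GENE_A", "effect": "missense", "polyphen2": "probably_damaging"},
    {"patient_id": "p2", "gene": "GENE_A", "effect": "nonsense", "polyphen2": None},
    {"patient_id": "p3", "gene": "GENE_A", "effect": "missense", "polyphen2": "probably_damaging"},
    {"patient_id": "p1", "gene": "GENE_B", "effect": "missense", "polyphen2": "possibly_damaging"},
]

PROTEIN_ALTERING = {"missense", "nonsense"}
DAMAGING = {"possibly_damaging", "probably_damaging"}


def task1(phenotype, gene):
    """Variants in `gene` that are missense/nonsense, in individuals with `phenotype`."""
    return [
        v for v in VARIANTS
        if phenotype in PHENOTYPES[v["patient_id"]]
        and v["gene"] == gene
        and v["effect"] in PROTEIN_ALTERING
    ]


def task2(phenotype, gene):
    """Task 1 hits restricted to variants with a damaging PolyPhen-2 prediction."""
    return [v for v in task1(phenotype, gene) if v["polyphen2"] in DAMAGING]


def individuals(hits):
    """Distinct individuals carrying at least one of the matching variants."""
    return sorted({v["patient_id"] for v in hits})


task1_ids = individuals(task1("phenoX", "GENE_A"))
task2_ids = individuals(task2("phenoX", "GENE_A"))
```

Under this sketch, Task 3 would only change the phenotype predicate (for example, requiring a combination of labels), while the variant filter stays the same.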


The ST-model (Ramoni, Stefanelli et al.)

• An epistemological model of scientific and medical reasoning

• Hypothesis selection/generation phase: abstraction and abduction

• Hypothesis testing phase: ranking, deduction, eliminative induction


Thanks from the BMI Labs “Mario Stefanelli”

