Data Stewardship and integration of Biomedical OMICS data

Posted on 29-Sep-2020



Thomas Hankemeier, Amy Harms

Netherlands Metabolomics Centre, Biomedical Metabolomics Facility Leiden

Leiden Academic Centre for Drug Research, Leiden University, The Netherlands


[Figure: Metabolomics workflow. Stages: biological question, experimental design, sampling, sample preparation, data acquisition, data pre-processing, data analysis and biological interpretation. Intermediate products: samples, raw data, list of peaks/biomolecules (identification), relevant biomolecules/connectivities & models. Other labels: metabolites, protocol.]

Biomedical Metabolomics Facility Leiden

• Robust & validated protocols; quality system & trained personnel

• > 15,000 samples/year

• Various types of samples: blood, urine, biopsies, cells, etc.

• Large number of clinical/preclinical studies with academia, clinics and industry (cardiovascular and metabolic diseases, diabetes, infectious diseases, CNS diseases and nutritional studies)

• Access for academic & clinical researchers & industry (international pharma & nutrition)

[Figure: Metabolomics workflow (biological question, experimental design, sampling, sample preparation, data acquisition, data pre-processing, data analysis, biological interpretation), with the Metabolomics Facility providing advice]

www.bmfl.nl

Validated metabolomics platforms

[Figure: Biology-driven profiling (oxidative stress, metabolic stress, inflammatory stress) and global profiling]

More details: www.bmfl.nl

Platform: number of metabolites, technique

• Medium polar (central carbon/energy metabolism): > 200, GC-MS
• Apolar metabolites: > 400, LC-MS
• Apolar lipids: > 800, LC-MS
• Polar lipids: > 150, LC-MS
• Biogenic amines: > 90, LC-MS/MS
• Endocannabinoids: > 40, LC-MS/MS
• Oxylipins (pro/anti-inflammatory lipid mediators): > 120, LC-MS/MS
• Oxidative/nitrosative stress: > 60, LC-MS/MS

In total: > 2,500 metabolites, > 1,000 identified, > 500 quantitative. Variation < 10%!
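A "variation < 10%" figure is commonly reported as the relative standard deviation (RSD) of a compound's response across repeated QC injections; that this is the metric meant on the slide is an assumption. A minimal sketch of the calculation, using made-up peak areas:

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """RSD in percent: sample standard deviation divided by the mean, times 100."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return stdev(values) / m * 100.0


def passes_rsd_threshold(values, threshold_pct=10.0):
    """True if the RSD is below the threshold (10% mirrors the 'variation < 10%' claim)."""
    return relative_std_dev(values) < threshold_pct


# Hypothetical peak areas of one metabolite in five QC injections
qc_areas = [1.02e6, 0.98e6, 1.05e6, 0.97e6, 1.00e6]
rsd = relative_std_dev(qc_areas)
print(f"RSD = {rsd:.1f}%, within threshold: {passes_rsd_threshold(qc_areas)}")
```

In practice such a check would be run per metabolite, and metabolites whose QC RSD exceeds the threshold flagged rather than reported as quantitative.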

Global profiling of lipids using RP-UPLC-TOF MS

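[Figure (human plasma): low-energy traces in +ve and -ve ESI mode, acquired on a Waters Synapt qTOF-MS and an Agilent qTOF MS. Annotated lipid groups: PG, PI, PSer, PE; FA; GPCho, SM, GPGro, GPEtn; DG, TG & ChoE; lyso-GPCho, lyso-GPEtn. Lipid classes covered: bile acids, FFA, LPC, LPE, PI, PE, PG, PC, SM, DG, TG, CE, CER. Source: Castro-Perez et al., J. Proteome Res., 2010.]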

Data processing: combining targeted & untargeted

• ‘Pseudo-targeted’ using a target list
  • Identified
  • Quantitative (if reference compounds are available)

• ‘Known unknowns’

• ‘Unknowns’

• MZextract (Van der Kloet/NMC, new!)

[Figure: Example lipid profile for TG(52:1) in QC samples and regular samples, conventional untargeted approach vs MZextract. Panel: comparison of quantification, targeted vs MZextract. Feature set: several m/z of one analyte.]
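The pseudo-targeted step can be illustrated with a generic matcher that assigns detected peaks to target-list entries within an m/z window (in ppm) and a retention-time window, collecting several m/z of one analyte into one feature set. This is a hypothetical sketch, not the MZextract algorithm; the tolerances, class names and all m/z and retention-time values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Target-list entry; several m/z values (e.g. different ions) belong to one analyte."""
    name: str
    mzs: tuple
    rt_min: float


@dataclass(frozen=True)
class Peak:
    mz: float
    rt_min: float
    area: float


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_to_target_list(peaks, targets, ppm_tol=10.0, rt_tol=0.1):
    """
    Assign each peak to the closest target (by ppm error) within both windows.
    Returns (feature_sets, unmatched): feature_sets maps target name -> list of peaks;
    unmatched peaks are left for untargeted follow-up.
    """
    feature_sets, unmatched = {}, []
    for peak in peaks:
        best, best_err = None, None
        for target in targets:
            if abs(peak.rt_min - target.rt_min) > rt_tol:
                continue  # outside the retention-time window
            err = min(abs(ppm_error(peak.mz, mz)) for mz in target.mzs)
            if err <= ppm_tol and (best_err is None or err < best_err):
                best, best_err = target, err
        if best is None:
            unmatched.append(peak)
        else:
            feature_sets.setdefault(best.name, []).append(peak)
    return feature_sets, unmatched


# Hypothetical target list and peaks (values are made up)
targets = [Target("analyte_A", (500.0, 522.0), 5.0), Target("analyte_B", (750.5,), 8.2)]
peaks = [
    Peak(500.003, 5.02, 1e5),
    Peak(522.004, 5.03, 2e4),
    Peak(750.505, 8.25, 2e5),
    Peak(612.3, 6.0, 5e4),
]
feature_sets, unmatched = match_to_target_list(peaks, targets)
```

Here the two ions of `analyte_A` end up in one feature set, while the peak at m/z 612.3 matches nothing and remains unassigned.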

Good Practices

Sample randomization:

• It is important to randomize case/control and treated/untreated samples to avoid artifacts introduced by instrumental drift

• The experimental design dictates the randomization strategy

• Within-batch variation is lower than between-batch variation, so the batch design should block related samples together to minimize variation
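One way to combine randomization with blocking is sketched below: related samples (for example a matched case/control pair) are kept together in one batch, while both the assignment of blocks to batches and the run order within each batch are randomized. The helper `design_batches` and its greedy packing are hypothetical; the actual strategy should follow the experimental design.

```python
import random


def design_batches(blocks, batch_size, seed=None):
    """
    Hypothetical helper: pack blocks of related samples into batches, then randomize.

    blocks: dict mapping block id -> list of sample ids that belong together
            (e.g. a matched case/control pair, or all time points of one subject).
    Whole blocks stay in one batch; block order and within-batch run order are shuffled.
    """
    rng = random.Random(seed)
    block_ids = list(blocks)
    rng.shuffle(block_ids)  # randomize which blocks end up in which batch
    batches, current = [], []
    for block_id in block_ids:
        members = list(blocks[block_id])
        if len(members) > batch_size:
            raise ValueError(f"block {block_id!r} does not fit in one batch")
        if len(current) + len(members) > batch_size:
            batches.append(current)  # batch is full: start a new one
            current = []
        current.extend(members)
    if current:
        batches.append(current)
    for batch in batches:
        rng.shuffle(batch)  # randomize run order within each batch
    return batches


# Ten hypothetical matched case/control pairs, at most six study samples per batch
blocks = {f"pair{i}": [f"case{i}", f"control{i}"] for i in range(10)}
batches = design_batches(blocks, batch_size=6, seed=42)
```

QC injections would then be inserted into each batch's run order; they are left out here to keep the sketch short.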

Blinded analysis:

• For important clinical studies, the person running the samples should be blinded to the sample identity

• The lab is unblinded for data analysis only after the data have been deposited in a database or with a collaborator
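Blinding can be as simple as replacing sample identities with random codes before the run list reaches the analyst, with the key held elsewhere until the data are deposited. A minimal sketch; the `B-` code format and the function names are hypothetical.

```python
import secrets


def blind(sample_ids_in_run_order):
    """
    Replace sample identities by random codes.
    Returns (run_labels, unblinding_key):
      run_labels     : codes in run order, the only labels the analyst sees
      unblinding_key : code -> original sample id, kept by a third party
    """
    used, run_labels, key = set(), [], {}
    for sample_id in sample_ids_in_run_order:
        code = "B-" + secrets.token_hex(4)
        while code in used:  # regenerate in the unlikely event of a collision
            code = "B-" + secrets.token_hex(4)
        used.add(code)
        run_labels.append(code)
        key[code] = sample_id
    return run_labels, key


def unblind(run_labels, key):
    """Map codes back to sample ids once the data have been deposited."""
    return [key[code] for code in run_labels]


ids = ["case1", "control1", "case2"]  # hypothetical sample ids
labels, key = blind(ids)
```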

Quality Control Tools

During routine analysis, calibration lines, blanks and QCs are prepared together with the samples. A statistical tool has been developed to apply corrections to the data and to output quality parameters.

For data analysis, all peak areas are corrected for the internal standard response, followed by a QC correction. This tool corrects for instrumental and experimental drift within and between batches. QC samples (pooled study samples) bracket groups of ~15 study samples within a batch.
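The two correction steps can be sketched as follows: each peak area is divided by its internal-standard area, and the ratio is then divided by the QC response linearly interpolated between the bracketing QC injections, rescaled to the median QC level. Linear interpolation between neighbouring QCs is an assumption made for illustration; the in-house tool may model drift differently, and between-batch correction is not shown.

```python
from bisect import bisect_left
from statistics import median


def is_corrected(area, is_area):
    """Step 1: peak area relative to the internal-standard (IS) area."""
    return area / is_area


def qc_drift_correct(orders, ratios, is_qc):
    """
    Step 2 (sketch): within-batch drift correction based on pooled QC injections.
    orders: injection order; ratios: IS-corrected responses; is_qc: True for QCs.
    Each response is divided by the QC response interpolated at its injection
    order and multiplied by the median QC response.
    """
    qc = sorted((o, r) for o, r, q in zip(orders, ratios, is_qc) if q)
    if len(qc) < 2:
        raise ValueError("need at least two QC injections")
    qc_orders = [o for o, _ in qc]
    qc_values = [r for _, r in qc]
    reference = median(qc_values)

    def qc_response_at(order):
        i = bisect_left(qc_orders, order)
        if i == 0:
            return qc_values[0]  # at or before the first QC
        if i == len(qc_orders):
            return qc_values[-1]  # after the last QC
        o0, o1 = qc_orders[i - 1], qc_orders[i]
        v0, v1 = qc_values[i - 1], qc_values[i]
        return v0 + (v1 - v0) * (order - o0) / (o1 - o0)

    return [r / qc_response_at(o) * reference for o, r in zip(orders, ratios)]


# Hypothetical batch: QCs at injections 1, 4 and 7, with a linear 20% signal drift
orders = [1, 2, 3, 4, 5, 6, 7]
is_qc = [True, False, False, True, False, False, True]
drift = [1.0 + (o - 1) * 0.2 / 6 for o in orders]
ratios = [is_corrected((1.0 if q else 0.5) * d * 1000.0, 1000.0) for q, d in zip(is_qc, drift)]
corrected = qc_drift_correct(orders, ratios, is_qc)
```

After correction the simulated drift is removed: all study samples share one value and the QCs sit at their median level.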

Assuring Traceability

Make sure that the results that we deliver can be proven and explained not only now but also 5 years from now.

Proper data management should facilitate research that builds on existing research.

An easy-to-use exchange format based on controlled vocabularies/ontologies gives certainty about what was measured and how it was measured.

Researchers need to share the information required to reproduce the results (https://biosharing.org/pages/about/). This means sharing:

• SOPs

• Scripts/software used to (pre-)process the data

• Decisions made, for example why data were discarded

• Etc.
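As an illustration of such an exchange format, the sketch below stores each measured metabolite with a compact identifier from a controlled vocabulary and a reference to the SOP used, plus a log of decisions such as why data were discarded. The record layout, the allowed prefixes (`CHEBI`, `HMDB`) and the placeholder identifier are assumptions; established formats such as ISA-Tab (used by MetaboLights) cover this in far more detail.

```python
import json
import re
from dataclasses import asdict, dataclass, field

# Assumed allow-list of controlled vocabularies for this sketch
ALLOWED_PREFIXES = {"CHEBI", "HMDB"}
# Compact identifier of the form "PREFIX:local_id"
COMPACT_ID = re.compile(r"^([A-Za-z]+):(\S+)$")


@dataclass
class MeasuredMetabolite:
    name: str
    identifier: str  # compact identifier, e.g. "CHEBI:<number>"
    platform: str    # e.g. "LC-MS"
    sop: str         # reference to the SOP that was followed


@dataclass
class StudyRecord:
    study_id: str
    metabolites: list = field(default_factory=list)
    decisions: list = field(default_factory=list)

    def add_metabolite(self, metabolite):
        """Accept only metabolites identified with an allowed controlled vocabulary."""
        match = COMPACT_ID.match(metabolite.identifier)
        if match is None or match.group(1).upper() not in ALLOWED_PREFIXES:
            raise ValueError(f"{metabolite.identifier!r} is not from an allowed vocabulary")
        self.metabolites.append(metabolite)

    def log_decision(self, what, why):
        """Record a processing decision, e.g. why a sample was discarded."""
        self.decisions.append({"what": what, "why": why})

    def to_json(self):
        return json.dumps(asdict(self), indent=2)


record = StudyRecord("STUDY-0001")  # hypothetical study id
record.add_metabolite(
    MeasuredMetabolite("metabolite_X", "CHEBI:0000000", "LC-MS", "SOP-LIPIDS-01")  # placeholders
)
record.log_decision("excluded sample S17", "internal standard not detected")
print(record.to_json())
```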

Our efforts to assure traceability

Experimental design: Ensure reproducible data that can be shared and linked to other resources (proteomics, transcriptomics and genomics)

Traceable data: Starting with an electronic lab notebook (ELN) coupled to our in-house-developed LIMS

Interpretable data: Use external identifiers and controlled vocabularies to present/report data

Data analysis: Freeze data and scripts/algorithms together with their output; rerunning the data analysis pipeline on the same data should produce the same output. Scripts and software should be open (accessible) so that others can understand what happens to the data.
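Freezing data, scripts and output can be made checkable by recording checksums: after a rerun, any file whose SHA-256 digest differs from the frozen manifest is reported. A minimal sketch; the manifest layout and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """SHA-256 digest of a file, read in chunks so large raw-data files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def freeze(paths, manifest_path):
    """Record checksums of data, scripts and output in a JSON manifest."""
    manifest = {str(p): sha256_of(p) for p in paths}
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def verify(manifest_path):
    """Return files that are missing or whose content no longer matches the manifest."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [
        path
        for path, expected in manifest.items()
        if not Path(path).exists() or sha256_of(path) != expected
    ]
```

Freeze the input data, the pipeline scripts and the outputs once; after rerunning the pipeline, an empty list from `verify` means the regenerated output is byte-identical to the frozen one.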

We are working hard to deposit our studies in MetaboLights (1 live, 3 under curation and 4 more in preparation).

Leiden leads MetabolomeXchange, an international data aggregation and notification service for metabolomics.

What we tried in order to make data available: Data support platform (NMC)

• Easily access and analyze experimental metabolomics data with the data support platform (DSP)

• A metabolomics data warehouse

• A data processing infrastructure

MetabolomeXchange

International data aggregation and notification service for metabolomics set up by Leiden

Easy to search for and subscribe to publicly available data sets

PhenoMeNal Consortium

• H2020 Societal Challenge in Health, Demographic Change and Well-being

• 3 years

• 13 partners

• €8 million

• 830 person-months (PM)

An e-infrastructure for the processing, analysis and information mining of the massive amounts of medical molecular phenotyping and genotyping data that will be generated by metabolomics applications now entering research and the clinic.

The Aim

[Figure: Workflows over biomedical data & metadata: data collection, QC, data pre-processing, statistical analysis]

DTL / FAIR DATA

Findable, Accessible, Interoperable and Re-usable

Leiden University, a partner of DTL (Dutch Techcentre for Life Sciences), supports the idea and development of the international FAIR Data principles.

Linking with vendors?

• Discussions are ongoing between Leiden and vendors on how the experience and expertise gained within the NMC can be used to enhance their workflows.

• Leiden has been participating in EU-funded grants for improving the infrastructure for metabolomics communication.

• Vendors and the community are both developing software tools to integrate metabolomics into a more systems biology approach, and we should work together.

Summary & discussion points

• Workflow and data management are crucial, and differ for each facility and field

• Share good practices

• For Metabolomics:

• Absolute concentrations are key!?

• Benefits for validation and replication!!

• Some main facilities for high throughput!?

• Benefits can be achieved in omics integration; NL/DTL to lead by example?

• Sharing metadata is often a bottleneck

www.bmfl.nl and www.metabolomicscentre.nl

• Ruud Berger: Biochemical interpretation
• Amy Harms: Leader, Metabolomics Facility
• Jan van der Greef: Systems approaches & SDPPM
• Ronan Fleming: Metabolic modelling (guest)
• Paul Vulto: Organ-on-a-chip & microfluidics (guest)
• Slavik Koval: Data analysis
• Peter Lindenburg: Metabolomics technology
• Rawi Ramautar: Metabolomics technology

Acknowledgement

PhD students: Amar Oedit, Vasu Kantae, Bas Trietsch, Junzeng Fu, Robert-Jan Raterink, Can Gulersonmez, Min He, Nelus Schoeman, Mengmeng Sun, Vincent van Duinen, Rosilene Rossetto-Burgos, Abidemi Junaid, Wei Zhang, Renate Buijink

Post docs: Oskar Gonzalez, Michel van Weeghel, Anne-Charlotte Dubbelman, Petri Kylli, Estefania Moreno-Gordaliza, Marek Noga

Technicians: Gerwin Spijksma, Faisa Guled, Anthanasis Giannitsis (clean room), Sabine Bos, Lieke Lamont-de Vries, Hyung Elfrink, Belèn Gonzàlez Amoros, Marian Martinez Zapata, Sandra Pous-Torres, Monique Nieman

Scientific Programmer: Michael van Vliet

Mechanical Workshop: Raphael Zwier
