PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Date post: 07-Jul-2015
Upload: yasset-riverol
Description:
Proteomics, ProteomeXchange, Proteins, Biohackathon
Transcript
Page 1: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Dr. Yasset Perez-Riverol Twitter: @ypriverol

Github: ypriverol

Bioinformatician - PRIDE Group

Proteomics Services Team

EMBL-EBI

Hinxton, Cambridge, UK

Page 2: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Yasset Perez-Riverol

BioHackathon 2014, Miyagi, Japan (Nov 9-14, 2014)

Proteomics Services, EMBL-EBI

•  IntAct: interactions
•  PRIDE: MS/MS data
•  UniProt: protein sequences
•  Reactome: pathways
•  BioModels

Page 3: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Overview

•  The ProteomeXchange (PX) consortium

•  PRIDE and ProteomeXchange

•  PRIDE Components.

•  Current and future developments.

Page 4: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

ProteomeXchange Consortium

•  Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

•  Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego).

•  Common identifier space (PXD identifiers)

•  Two supported data workflows: MS/MS and SRM.

•  Main objective: Make data available and reusable.

http://www.proteomexchange.org
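
The common identifier space mentioned above can be illustrated with a small check. This is a minimal sketch that assumes a PXD identifier is the prefix "PXD" followed by six digits, as in the identifiers quoted later in this talk (for example PXD000561). The function names are illustrative and not part of any ProteomeXchange tool.

```python
import re

# Assumption: "PXD" followed by exactly six digits, e.g. PXD000561.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_identifier(text: str) -> bool:
    """Return True if the whole string looks like a PXD identifier."""
    return PXD_PATTERN.fullmatch(text) is not None


def find_pxd_identifiers(text: str) -> list[str]:
    """Extract PXD-style identifiers from free text, in order of appearance."""
    return re.findall(r"\bPXD\d{6}\b", text)


if __name__ == "__main__":
    print(is_pxd_identifier("PXD000561"))
    print(find_pxd_identifiers("See PXD000561 and PXD000865 for details."))
```

A check like this is useful when mining manuscripts or metadata for dataset references, since the same identifier form is shared by all member repositories.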

Page 5: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

ProteomeXchange data workflow

[Figure: ProteomeXchange data workflow. Inputs: metadata / manuscript, raw data*, results. Receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data), other DBs. Central portal: ProteomeCentral (metadata). Downstream: journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs. Data flows shown: researcher's results, reprocessed results, raw data*, metadata.]

Vizcaíno et al., Nat Biotechnol, 2014

Page 6: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

MassIVE (UCSD)

http://proteomics.ucsd.edu/service/massive/

•  Joined ProteomeXchange in June 2014

Page 7: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PASSEL: repository for SRM data

•  Suitable for SRM assays.
•  Part of the PeptideAtlas set of resources.

http://www.peptideatlas.org/passel/

Farrah et al., Proteomics, 2012

Page 8: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE: PRoteomics IDEntifications database

Vizcaíno et al., Nucleic Acids Research, 2014

http://www.ebi.ac.uk/pride/archive/

Page 9: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PX Submission workflow for MS/MS data

1.  Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).

2.  Result files:
    a.  Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
    b.  Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.

3.  Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation and submitter, based on ontologies and controlled vocabularies.

4.  Other files (optional):
    a.  QUANT: Quantification related results
    b.  PEAK: Peak list files
    c.  OTHER: Any other file type
    e.  FASTA

[Figure labels: Published, Raw files, Other files]

Ternent et al., Proteomics, 2014
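
The distinction between complete and partial submissions described above can be sketched in a few lines. This is an illustration only: the `SubmittedFile` class, the `"RAW"`, `"RESULT"` and `"SEARCH"` labels and the `classify_submission` function are hypothetical names chosen for this example (only QUANT, PEAK, FASTA and OTHER appear on the slide), and the real submission tool may apply additional rules.

```python
from dataclasses import dataclass

# Result formats that the slide associates with complete submissions.
STANDARD_RESULT_FORMATS = {"PRIDE XML", "mzIdentML"}


@dataclass
class SubmittedFile:
    name: str
    file_type: str         # hypothetical labels: "RAW", "PEAK", "RESULT", "SEARCH", ...
    file_format: str = ""  # e.g. "mzIdentML", "PRIDE XML", "mzML"


def classify_submission(files: list[SubmittedFile]) -> str:
    """Return "complete" or "partial" following the rules listed on the slide."""
    # Step 1: mass spectrometer output (raw data or peak lists) is required.
    if not any(f.file_type in {"RAW", "PEAK"} for f in files):
        raise ValueError("no mass spectrometer output files")
    # Step 2a: result files in a supported standard give a complete submission.
    if any(f.file_type == "RESULT" and f.file_format in STANDARD_RESULT_FORMATS
           for f in files):
        return "complete"
    # Step 2b: otherwise, original search engine output gives a partial one.
    if any(f.file_type == "SEARCH" for f in files):
        return "partial"
    raise ValueError("no result or search engine output files")


if __name__ == "__main__":
    files = [
        SubmittedFile("run1.raw", "RAW"),
        SubmittedFile("run1.mzid", "RESULT", "mzIdentML"),
    ]
    print(classify_submission(files))
```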

Page 10: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Complete submissions using mzIdentML

[Diagram: search engine results + MS files are converted to mzIdentML]

An increasing number of tools support export to mzIdentML 1.1:

-  Mascot
-  MS-GF+
-  MyriMatch and related tools from D. Tabb’s lab
-  OpenMS
-  PEAKS
-  ProCon (ProteomeDiscoverer, Sequest)
-  Scaffold
-  TPP via the idConvert tool (ProteoWizard)
-  ProteinPilot (planned by the end of 2014)
-  Others: library for X!Tandem conversion, lab internal pipelines, …

Referenced spectral files need to be submitted as well (all open formats are supported).

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
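
The requirement that referenced spectral files accompany an mzIdentML file can be checked before upload. The sketch below reads the `location` attribute of `SpectraData` elements in the mzIdentML 1.1 namespace and compares the file names with the files in a submission. The embedded XML is a stripped-down stand-in for a real document, and the helper names are illustrative, not taken from any PRIDE tool.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import unquote, urlparse

MZID_NS = {"mzid": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Simplified example document; real files carry many more elements.
EXAMPLE_MZID = """
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run1.mzML"/>
      <SpectraData id="SD_2" location="run2.mgf"/>
    </Inputs>
  </DataCollection>
</MzIdentML>
"""


def referenced_spectra_files(mzid_xml: str) -> set[str]:
    """Return the base names of all spectra files referenced by the document."""
    root = ET.fromstring(mzid_xml)
    names = set()
    for spectra_data in root.iterfind(".//mzid:SpectraData", MZID_NS):
        location = spectra_data.get("location", "")
        # Locations may be file URIs or plain paths; keep only the file name.
        path = unquote(urlparse(location).path) or location
        names.add(os.path.basename(path.replace("\\", "/")))
    return names


def missing_spectra_files(mzid_xml: str, submitted: set[str]) -> set[str]:
    """Return referenced spectra files that are absent from the submission."""
    return referenced_spectra_files(mzid_xml) - submitted


if __name__ == "__main__":
    print(missing_spectra_files(EXAMPLE_MZID, {"run1.mzML", "results.mzid"}))
```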

Page 11: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

mzTab

http://mztab.googlecode.com

•  Metadata: basic information about experiment and sample (key-value pairs)
•  Protein: basic information about protein identifications (table-based)
•  Peptide: information about quantified peptides (table-based)
•  PSM: information about identified spectra (table-based)
•  Small Molecule: basic information about identified small molecules (table-based)

J. Griss et al., MCP, 2014
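
The section layout above maps naturally onto a line-oriented reader. The following is a minimal sketch, assuming the three-letter line prefixes of mzTab 1.0 (MTD for metadata, PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of each table, COM for comments). It ignores most of the format's rules, and the sample content is invented for illustration.

```python
from collections import defaultdict

# Header-row prefix mapped to the data-row prefix of the same table.
HEADER_TO_SECTION = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}

# Invented sample content, tab-separated as mzTab requires.
SAMPLE = "\n".join([
    "COM\tillustrative example only",
    "MTD\tmzTab-version\t1.0.0",
    "MTD\tdescription\tExample",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDEK\t1\tP12345",
])


def parse_mztab(text: str):
    """Split mzTab text into a metadata dict and one list of row dicts per table."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if not line.strip() or prefix == "COM":
            continue  # skip blank lines and comments
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]        # key-value pair
        elif prefix in HEADER_TO_SECTION:
            headers[HEADER_TO_SECTION[prefix]] = fields[1:]
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(tables)


if __name__ == "__main__":
    metadata, tables = parse_mztab(SAMPLE)
    print(metadata)
    print(tables["PSM"])
```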

Page 12: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE Components: Submission Process

•  PRIDE Converter
•  PRIDE Inspector
•  PX Submission Tool

Page 13: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE Components: PX submission tool

•  Captures the mappings between the different types of files.
•  Adds the mandatory metadata annotation.
•  Makes the file upload process straightforward for the submitter (it transfers all the files using Aspera or FTP).
•  Command line alternative: some scripting is needed.

http://www.proteomexchange.org/submission

[Figure labels: Published, Raw, Other files, PX submission tool]
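
Capturing the mappings between file types amounts to recording, for each result file, which spectra files it was derived from. The sketch below shows one way such a check could look. The `FileEntry` class, its fields and the type labels are hypothetical and do not reproduce the submission tool's own data model or file format.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    file_id: int
    file_type: str  # hypothetical labels: "RESULT", "RAW", "PEAK", "OTHER", ...
    path: str
    mapped_ids: list[int] = field(default_factory=list)  # ids of related files


def unmapped_result_files(entries: list[FileEntry]) -> list[str]:
    """Return paths of result files not mapped to any raw or peak list file."""
    by_id = {entry.file_id: entry for entry in entries}
    problems = []
    for entry in entries:
        if entry.file_type != "RESULT":
            continue
        targets = [by_id[i] for i in entry.mapped_ids if i in by_id]
        if not any(t.file_type in {"RAW", "PEAK"} for t in targets):
            problems.append(entry.path)
    return problems


if __name__ == "__main__":
    entries = [
        FileEntry(1, "RESULT", "results/run1.mzid", [2]),
        FileEntry(2, "RAW", "raw/run1.raw"),
        FileEntry(3, "RESULT", "results/run2.mzid"),
    ]
    print(unmapped_result_files(entries))
```

Running a check of this kind before upload would flag result files whose spectra are missing from the submission.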

Page 14: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE Inspector 2.0

Available for complete submissions.

PRIDE Inspector 2.0 supports:
-  PRIDE XML
-  mzIdentML + all types of spectra files
-  mzML
-  mzTab Quantitation (work in progress)

Wang et al., Nat. Biotechnology, 2012

https://github.com/PRIDE-Toolsuite/

Page 15: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE Components: Pipelines and Visualization

Submission validation pipeline

•  QC of submitted files.
•  Metadata check.

Submission pipeline

•  Add project to database (file locations, general statistics, metadata).

Publication pipeline

•  Conversion of files to mzTab.
•  Conversion of spectra peaks to MGF.
•  Indexing of the information in a Solr server.
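
As an illustration of the peak conversion step, the sketch below writes one spectrum as a Mascot Generic Format (MGF) record. It covers only the common `TITLE`, `PEPMASS` and `CHARGE` fields, and the function name and number formatting are choices made for this example rather than the behaviour of the PRIDE pipeline.

```python
def spectrum_to_mgf(title: str, precursor_mz: float, charge: int,
                    peaks: list[tuple[float, float]]) -> str:
    """Format one spectrum as an MGF record.

    peaks is a list of (m/z, intensity) pairs; they are written sorted by m/z.
    A charge of 0 means "unknown" and omits the CHARGE line.
    """
    lines = ["BEGIN IONS", f"TITLE={title}", f"PEPMASS={precursor_mz:.4f}"]
    if charge:
        sign = "+" if charge > 0 else "-"
        lines.append(f"CHARGE={abs(charge)}{sign}")
    lines += [f"{mz:.4f} {intensity:.1f}" for mz, intensity in sorted(peaks)]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(spectrum_to_mgf("scan=1", 500.25, 2, [(200.1, 10.0), (100.05, 5.0)]))
```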

Page 16: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PRIDE Components: Services & Web components

Page 17: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

ProteomeCentral: Portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset

Page 18: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

ProteomeXchange: 1329 datasets up until October 2014

Origin (datasets per country):
271 USA
166 Germany
115 United Kingdom
73 Switzerland
70 China
68 Netherlands
67 France
55 Canada
44 Spain
42 Belgium
33 Sweden
31 Australia
31 Denmark
31 Japan
20 India
20 Norway
19 Taiwan
17 Ireland
16 Austria
14 Finland
14 Italy
12 Republic of Korea
11 Brazil
9 Russia
8 Israel
7 Singapore
…

Type:
437 PRIDE complete
792 PRIDE partial
63 PeptideAtlas/PASSEL complete
14 MassIVE
23 reprocessed

Publicly accessible: 691 datasets, 52% of all
86% PRIDE
12% PASSEL
2% MassIVE

Data volume:
Total: ~55 TB
Number of all files: ~131,000
PXD000320-324: ~5 TB
PXD000065: ~1.4 TB

Top species studied by at least 10 datasets:
577 Homo sapiens
165 Mus musculus
56 Saccharomyces cerevisiae
53 Arabidopsis thaliana
29 Rattus norvegicus
22 Escherichia coli
17 Bos taurus
16 Mycobacterium tuberculosis
13 Oryza sativa
13 Drosophila melanogaster
13 Glycine max
~290 species in total

Datasets/year:
2012: 102
2013: 527
2014: 700
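
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above and confirms that the per-type and per-year counts both add up to the stated total and that the public share rounds to 52%. The variable and function names are illustrative.

```python
TOTAL_DATASETS = 1329
PUBLIC_DATASETS = 691

BY_TYPE = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}

BY_YEAR = {2012: 102, 2013: 527, 2014: 700}


def share_percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


if __name__ == "__main__":
    print(sum(BY_TYPE.values()) == TOTAL_DATASETS)
    print(sum(BY_YEAR.values()) == TOTAL_DATASETS)
    print(share_percent(PUBLIC_DATASETS, TOTAL_DATASETS))
```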

Page 19: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Journals and Data Deposition

[Chart: number of submissions per journal]

Page 20: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Data Access ?

[Chart; axis label: Total numbers]

Page 21: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Future developments

•  Make the data reusable.
•  Integration of different protein expression resources:
   •  PRIDE
   •  PeptideAtlas
   •  ProteomicsDB
   •  Human Proteome Map

PXD Identifier   Hits     Dataset title
PXD000561        153512   A draft map of the human proteome
PXD000865        51639    Mass spectrometry based draft of the human proteome

Page 22: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

PROXI

[Figure labels: Clients, Web Services, PROXI, Registry, Repositories & Databases, Data]

Perez-Riverol Y, Proteomics, 2014

Page 23: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Conclusions

•  ProteomeXchange is widely used.

•  PRIDE contains most of the MS/MS datasets.

•  ProteomeXchange now has a new consortium member: MassIVE (UCSD).

•  Around half of the datasets are already public.

•  Different open source tools are available to facilitate the process.

•  File transfer speed should not be a problem (Aspera support).

•  Data deposition enables and promotes data reuse.

•  ProteomeXchange is open to new members.

Page 24: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Acknowledgements

PRIDE Team: Juan A. Vizcaino (Group Leader), Attila Csordas, Rui Wang, Florian Reisinger, Jose A. Dianes, Tobias Ternent, Yasset Perez-Riverol, Noemi del Toro, Henning Hermjakob

PeptideAtlas Team (ISB, Seattle): Eric Deutsch, Terry Farrah, Zhi Sun

MassIVE: Nuno Bandeira

And many other PX partners and stakeholders

Page 25: PRIDE and ProteomeXchange – Making proteomics data accessible and reusable

Questions?

