ProteomeXchange: data deposition and data retrieval made easy

Date post: 29-Jun-2015
Upload: juan-antonio-vizcaino
Description:
Talk during the Annual Meeting of the EU PRIME-XS project in Avila. Highlights of ProteomeXchange in the last year in the context of the PRIME-XS project (JRA 1: Bioinformatics).
Transcript
Page 1: ProteomeXchange: data deposition and data retrieval made easy

ProteomeXchange: data deposition and data retrieval made easy

Proteomics Services Group

European Bioinformatics Institute

Hinxton, Cambridge

United Kingdom

[email protected]

Juan Antonio VIZCAINO, Ph.D.

PRIDE Group coordinator

Page 2: ProteomeXchange: data deposition and data retrieval made easy

• The ProteomeXchange (PX) consortium

• Highlights in the last year

• PRIME-XS datasets

Overview

Page 3: ProteomeXchange: data deposition and data retrieval made easy

ProteomeXchange Consortium

•Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

•Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).

•Common identifier space (PXD identifiers)

•Two supported data workflows: MS/MS and SRM.

•Main objective: Make life easier for researchers

http://www.proteomexchange.org

Page 4: ProteomeXchange: data deposition and data retrieval made easy

ProteomeXchange data workflow (Vizcaíno et al., Nat Biotechnol, 2014)

[Diagram labels. Submitted data: researcher’s results, raw data*, metadata. Receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data), other DBs. ProteomeCentral: metadata / manuscript. Also shown: raw data*, results, reprocessed results, journals, UniProt/neXtProt, PeptideAtlas, GPMDB, other DBs.]

Page 5: ProteomeXchange: data deposition and data retrieval made easy

MassIVE (UCSD)

http://proteomics.ucsd.edu/service/massive/

• Joined ProteomeXchange in June 2014
• Only partial submissions. A few datasets so far.

Page 6: ProteomeXchange: data deposition and data retrieval made easy

• The ProteomeXchange (PX) consortium

• Highlights in the last year

• PRIME-XS datasets

Overview

Page 7: ProteomeXchange: data deposition and data retrieval made easy

PX Data workflow for MS/MS data

1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).

2. Result files:

a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.

b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.

3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.

4. Other files: Optional files:
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
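The distinction between complete and partial submissions in point 2 can be sketched in a few lines. This is only an illustration: the function name, the format labels, and the rule that every result file must be in a supported standard are assumptions made for the example, not the behaviour of any PRIDE or ProteomeXchange tool.

```python
# Result formats that the slide names for complete submissions.
COMPLETE_FORMATS = {"PRIDE XML", "mzIdentML"}


def submission_type(result_formats):
    """Classify a submission from the formats of its result files.

    Assumption for this sketch: a submission counts as 'complete' only
    if it has at least one result file and all of them are in a
    supported standard format; anything else is 'partial'.
    """
    formats = list(result_formats)
    if formats and all(fmt in COMPLETE_FORMATS for fmt in formats):
        return "complete"
    return "partial"


if __name__ == "__main__":
    print(submission_type(["mzIdentML"]))       # complete
    print(submission_type(["search output"]))   # partial
```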

Published

RawFiles

Other files

Page 8: ProteomeXchange: data deposition and data retrieval made easy

Complete Partial

For complete submissions, it is possible to connect the spectra with the processed identification results, so they can be visualized.

Complete vs Partial submissions: processed results

PRIDE XML and mzIdentML are supported; mzTab to come.

Page 9: ProteomeXchange: data deposition and data retrieval made easy

Complete vs Partial submissions: experimental metadata

Complete Partial

General experimental metadata about the projects is similar. However, assay-level information in partial submissions is less annotated.

Page 10: ProteomeXchange: data deposition and data retrieval made easy

Complete submissions using mzIdentML

Search Engine Results + MS files

Search engines

mzIdentML

- Mascot
- MSGF+
- Myrimatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (planned by the end of 2014)
- Others: library for X!Tandem conversion, lab internal pipelines, …

An increasing number of tools support export to mzIdentML 1.1

- Referenced spectral files need to be submitted as well (all open formats are supported).

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.

Page 11: ProteomeXchange: data deposition and data retrieval made easy

Now: native file export

[Diagram: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) → ‘RESULT’ file generation by native file export → final ‘RESULT’ file (mzIdentML), together with the spectra files.]

Page 12: ProteomeXchange: data deposition and data retrieval made easy

Before: file conversion using PRIDE Converter

[Diagram: original data files (search output files, spectra files) → ‘RESULT’ file generation by file conversion with PRIDE Converter → final ‘RESULT’ file (PRIDE XML).]

Page 13: ProteomeXchange: data deposition and data retrieval made easy

PRIDE Inspector 2

Wang et al., Nat. Biotechnology, 2012

PRIDE Inspector 2.0

PRIDE Inspector 2.0 supports:

- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab (work in progress)

http://code.google.com/p/pride-toolsuite/wiki/PRIDEInspector

Page 14: ProteomeXchange: data deposition and data retrieval made easy

•Capture the mappings between the different types of files.

•Add the mandatory metadata annotation.

•Make the file upload process straightforward for the submitter (the tool transfers all the files using Aspera or FTP).

•Command line alternative: some scripting is needed.
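The first bullet, capturing the mappings between file types, can be pictured with a small data structure. The sketch below is hypothetical: the class, the field names, and the "RAW" label are assumptions for illustration and do not reproduce the submission tool's actual file format. It flags ‘RESULT’ files that are not linked to any raw file.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionFile:
    """One file in a submission (hypothetical structure for this sketch)."""
    file_id: int
    file_type: str                 # e.g. "RESULT", "RAW", "PEAK", "OTHER"
    path: str
    mapped_ids: list = field(default_factory=list)  # ids of related files


def unmapped_results(files):
    """Return the RESULT files that do not reference any RAW file."""
    raw_ids = {f.file_id for f in files if f.file_type == "RAW"}
    return [
        f for f in files
        if f.file_type == "RESULT" and not raw_ids.intersection(f.mapped_ids)
    ]


if __name__ == "__main__":
    files = [
        SubmissionFile(1, "RAW", "run1.raw"),
        SubmissionFile(2, "RESULT", "run1.mzid", mapped_ids=[1]),
        SubmissionFile(3, "RESULT", "run2.mzid"),  # no raw file linked
    ]
    print([f.path for f in unmapped_results(files)])  # ['run2.mzid']
```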

PX submission tool: data submission

Published

Raw

Other files

http://www.proteomexchange.org/submission

PX submission tool

Page 15: ProteomeXchange: data deposition and data retrieval made easy

Uploading large datasets: Aspera

- Aspera is the default file transfer protocol to PRIDE:
  - PX Submission tool
  - Command line
- Up to 50X faster than FTP

File transfer speed should not be a problem!!
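To get a feel for what "up to 50X faster" could mean in practice, here is a rough back-of-the-envelope sketch. The 10 MB/s FTP rate is a purely hypothetical figure chosen for the example, the 50X factor is the upper bound quoted above, and 1.4 TB is the approximate size the talk reports for dataset PXD000065. Real throughput depends on the network.

```python
def transfer_hours(size_tb, rate_mb_per_s):
    """Hours needed to move size_tb terabytes at rate_mb_per_s (decimal units)."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mb_per_s / 3600


FTP_RATE_MB_S = 10.0   # hypothetical sustained FTP rate, not a measurement
MAX_SPEEDUP = 50       # "up to 50X" from the slide, i.e. a best case
DATASET_TB = 1.4       # approximate size quoted for PXD000065

ftp_hours = transfer_hours(DATASET_TB, FTP_RATE_MB_S)
aspera_hours = transfer_hours(DATASET_TB, FTP_RATE_MB_S * MAX_SPEEDUP)

if __name__ == "__main__":
    print(f"FTP (assumed rate): {ftp_hours:.1f} h")
    print(f"Aspera (best case): {aspera_hours:.1f} h")
```

Under these assumptions the transfer drops from well over a day to under an hour, which is the point the slide makes.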

Page 16: ProteomeXchange: data deposition and data retrieval made easy

Tutorial manuscript detailing the process

Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission

Example dataset: PXD000764

- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data

Page 17: ProteomeXchange: data deposition and data retrieval made easy

Origin: 271 USA

166 Germany

115 United Kingdom

73 Switzerland

70 China

68 Netherlands

67 France

55 Canada

44 Spain

42 Belgium

33 Sweden

31 Australia

31 Denmark

31 Japan

20 India

20 Norway

19 Taiwan

17 Ireland

16 Austria

14 Finland

14 Italy

12 Republic of Korea

11 Brazil

9 Russia

8 Israel

7 Singapore …

ProteomeXchange: 1329 datasets up until October 2014

Type:

437 PRIDE complete

792 PRIDE partial

63 PeptideAtlas/PASSEL complete

14 MassIVE

23 reprocessed

Publicly Accessible:

691 datasets, 52% of all

86% PRIDE

12% PASSEL

2% MassIVE

Data volume:

Total: ~55 TB

Number of all files: ~131,000

PXD000320-324: ~ 5 TB

PXD000065: ~ 1.4TB

Top Species studied by at least 10 datasets:

577 Homo sapiens

165 Mus musculus

56 Saccharomyces cerevisiae

53 Arabidopsis thaliana

29 Rattus norvegicus

22 Escherichia coli

17 Bos taurus

16 Mycobacterium tuberculosis

13 Oryza sativa

13 Drosophila melanogaster

13 Glycine max

~ 290 species in total

Datasets/year:

2012: 102

2013: 527

2014: 700
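The counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers listed above; the variable names are arbitrary.

```python
# Figures copied from the slide.
total_datasets = 1329
public_datasets = 691

by_type = {
    "PRIDE complete": 437,
    "PRIDE partial": 792,
    "PeptideAtlas/PASSEL complete": 63,
    "MassIVE": 14,
    "reprocessed": 23,
}
by_year = {2012: 102, 2013: 527, 2014: 700}

# Both breakdowns should add up to the stated total.
type_sum = sum(by_type.values())
year_sum = sum(by_year.values())

# Share of publicly accessible datasets, rounded to a whole percent.
public_percent = round(100 * public_datasets / total_datasets)

if __name__ == "__main__":
    print(type_sum, year_sum, public_percent)  # 1329 1329 52
```

Both breakdowns sum to the stated 1329 datasets, and 691 public datasets corresponds to the quoted 52%.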

Page 18: ProteomeXchange: data deposition and data retrieval made easy

• The ProteomeXchange (PX) consortium

• Highlights in the last year

• PRIME-XS datasets

Overview

Page 19: ProteomeXchange: data deposition and data retrieval made easy

PX submission tool: PRIME-XS tags

37 Datasets in total (both public and private at present):

- 20 from the Netherlands
- 4 from UK
- 2 from Austria, Belgium, Denmark, Spain and Switzerland
- 1 from France and USA.

Page 21: ProteomeXchange: data deposition and data retrieval made easy

ProteomeCentral: Portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset
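A dataset page can be addressed directly from its PXD identifier. The sketch below builds such a link; the `ID` query parameter and the six-digit identifier pattern are assumptions based on how the identifiers in this talk look (for example PXD000561), so check them against the ProteomeCentral documentation before relying on them. No network request is made.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"

# Assumed identifier shape: "PXD" followed by six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def dataset_url(accession):
    """Build a ProteomeCentral link for one dataset.

    The 'ID' parameter name is an assumption made for this sketch.
    """
    if not PXD_PATTERN.fullmatch(accession):
        raise ValueError(f"not a PXD identifier: {accession!r}")
    return f"{BASE_URL}?{urlencode({'ID': accession})}"


if __name__ == "__main__":
    print(dataset_url("PXD000561"))
```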

Page 22: ProteomeXchange: data deposition and data retrieval made easy

Which are the most accessed datasets?

PXD Identifier | Total Hits | Dataset title | Publication
PXD000561 | 153512 | A draft map of the human proteome | Kim et al., Nature, 2014. PMID: 24870542
PXD000851 | 111587 | Membrane proteomic analysis of colorectal cancer tissue | Kume et al., MCP, 2014. PMID: 24687888
PXD000865 | 51639 | Mass spectrometry based draft of the human proteome | Wilhelm et al., Nature, 2014. PMID: 24870543

Page 23: ProteomeXchange: data deposition and data retrieval made easy

Which are the most accessed datasets?

[Chart; axis label: Total Numbers]

Page 24: ProteomeXchange: data deposition and data retrieval made easy

Reshake PRIDE data in PeptideShaker

Find the desired PRIDE project …
… inspect the project details …
… and start re-analyzing the data!

http://peptide-shaker.googlecode.com
Vaudel M, Burkhart J, Zahedi RP, Berven FS, Sickmann A, Martens L, Barsnes H. Nature Biotechnology (in press)

Page 25: ProteomeXchange: data deposition and data retrieval made easy

A little bit of perspective

Berlin 2011 Mallorca 2012

Annecy 2013 Split 2013

Page 26: ProteomeXchange: data deposition and data retrieval made easy

A little bit of perspective

2011 2012 2013 2014

mzIdentML, mzQuantML, qcML, mzTab

PRIDE web (2011)

PRIDE Converter

PRIDE Converter 2

PRIDE Inspector PX Submission Tool

PRIDE Inspector 2

PRIDE web (2014)

PRIDE/PX datasets

Page 27: ProteomeXchange: data deposition and data retrieval made easy

Conclusions

• ProteomeXchange is widely used.
  – PRIDE contains most of the MS/MS datasets.
  – It now has a new consortium member: MassIVE (UCSD).
  – Around half of the datasets are already public.

• Different open source tools are available to facilitate the process:
  – File transfer speed should not be a problem (Aspera support)

• Data deposition enables and promotes data reuse.

• ProteomeXchange is open to new members.

Page 28: ProteomeXchange: data deposition and data retrieval made easy

Acknowledgements: People

Attila Csordas
Tobias Ternent
Noemi del Toro
Rui Wang
Florian Reisinger
Jose A. Dianes
Johannes Griss
Steven Lewis
Yasset Perez-Riverol
Henning Hermjakob

All previous team members
ProteomeXchange partners

Page 29: ProteomeXchange: data deposition and data retrieval made easy

Acknowledgements: Funding

[email protected]@ebi.ac.uk

http://www.proteomexchange.org
http://code.google.com/p/pride-converter-2/

@pride_ebi

