EMBL-EBI Now and in the Future
PRIDE resources and ProteomeXchange
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
Proteomics Bioinformatics Course 2016
Hinxton, 8 December 2016
Data resources at EMBL-EBI
Resource categories: Genes, genomes & variation; Gene & protein expression; Protein sequences, families & motifs; Molecular structures; Chemical biology; Reactions, interactions & pathways; Systems; Literature & ontologies
Resources shown: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, ArrayExpress, Expression Atlas, PRIDE, InterPro, Pfam, UniProt, Protein Data Bank in Europe, Electron Microscopy Data Bank, ChEMBL, ChEBI, IntAct, Reactome, MetaboLights, BioModels, Enzyme Portal, BioSamples, Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
This slide shows the core resources at the EBI, to illustrate the range of data you can access through the EBI.
Overview
PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
How to submit data to PRIDE: PRIDE tools
How to access data in PRIDE Archive
PRIDE Cluster and PRIDE Proteomes
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
PRIDE stores mass spectrometry (MS)-based proteomics data:
Peptide and protein expression data (identification and quantification)
Post-translational modifications
Mass spectra (raw data and peak lists)
Technical and biological metadata
Any other related information
Full support for tandem MS approaches
Any type of data can be stored.
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
Data content in PRIDE Archive
Submission-driven resource
PRIDE is organised into datasets (groups of assays).
An assay represents one MS run (in most cases).
No data reprocessing at present. PRIDE aims to represent the authors' view of the data.
Supported formats: PRIDE XML and mzIdentML.
Raw data is now also stored.
ProteomeXchange: A Global, distributed proteomics database
PASSEL (SRM data)
PRIDE (MS/MS data)
MassIVE (MS/MS data)
jPOST (MS/MS data), new in 2016
Data types: Raw, ID/Q, Meta
Mandatory raw data deposition since July 2015
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeXchange data workflow
[Figure: ProteomeXchange data workflow. Labels: researchers' results, raw data and metadata submitted as datasets to the receiving repositories: PRIDE, MassIVE and jPOST (MS/MS data as complete submissions; any other workflow, mainly as partial submissions) and PASSEL (SRM data); ProteomeCentral (metadata / manuscript, raw data, results); journals; reanalysis of datasets by research groups, PeptideAtlas and MassIVE, giving reprocessed results.]
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017, in press
ProteomeXchange data workflow
[Figure: the same workflow diagram, extended with further resources that use the datasets: UniProt/neXtProt, PeptideAtlas, GPMDB, proteomicsDB, other DBs, and OmicsDI (integration with other omics datasets).]
PRIDE: Source of MS proteomics data
PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the Expression Atlas.
http://www.ebi.ac.uk/pride
PRIDE is working in two main directions: (1) developing submission and dissemination pipelines for MS proteomics data involving the main proteomics resources (the ProteomeXchange consortium), and (2) integrating proteomics information (peptide/protein expression data) with other EBI resources such as Ensembl (genomics), the Expression Atlas (transcriptomics) and UniProt (protein sequence information). Proteomics data is needed for a more complete picture of biology.
Overview
PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
How to submit data to PRIDE: PRIDE tools
How to access data in PRIDE Archive
A sneak peek at other PRIDE resources
Complete vs Partial submissions: processed results
[Figure panels: Complete, Partial]
For complete submissions, it is possible to connect the spectra with the processed identification results, and they can be visualized.
Complete vs Partial submissions: experimental metadata
[Figure panels: Complete, Partial]
General experimental metadata about the projects is similar. However, at the assay level, the information in partial submissions is less detailed.
How to perform a complete PX submission to PRIDE
Decide between a complete/partial submission
File conversion/export to mzIdentML (or PRIDE XML)
File check before submission (PRIDE Inspector)
Experimental annotation and actual file submission (PX submission tool)
Post-submission steps
PX Data workflow for MS/MS data
Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
Result files:
Complete submissions: result files can be converted to the mzIdentML data standard (or PRIDE XML).
Partial submissions: for workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
Metadata: sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
Other files: optional files (the list can be extended):
a. QUANT: Quantification related results
b. PEAK: Peak list files
c. GEL: Gel images
d. OTHER: Any other file type
e. FASTA
f. SP_LIBRARY
[Figure labels: Published, Raw Files, Other files]
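The file categories above can be turned into a simple pre-submission check. The following is only a minimal sketch and not the logic of the PX Submission Tool: the type labels RAW, RESULT and SEARCH, the function name and the rules in the docstring are assumptions made for illustration. Only the optional labels (QUANT, PEAK, GEL, OTHER, FASTA, SP_LIBRARY) are taken from the slide.

```python
from collections import Counter

# Optional file types as listed on the slide.
OPTIONAL_TYPES = {"QUANT", "PEAK", "GEL", "OTHER", "FASTA", "SP_LIBRARY"}
# Hypothetical labels for the mandatory categories (assumption of this sketch).
CORE_TYPES = {"RAW", "RESULT", "SEARCH"}


def classify_submission(files):
    """Return 'complete', 'partial' or 'invalid' for a list of (name, type) pairs.

    Assumed rules for this sketch:
      - every submission needs at least one RAW file;
      - a complete submission has at least one RESULT file
        (mzIdentML or PRIDE XML);
      - a partial submission has SEARCH files (search engine output
        in its original form) instead of RESULT files.
    """
    counts = Counter(file_type for _, file_type in files)
    if any(file_type not in OPTIONAL_TYPES | CORE_TYPES for file_type in counts):
        return "invalid"
    if counts["RAW"] == 0:
        return "invalid"
    if counts["RESULT"] > 0:
        return "complete"
    if counts["SEARCH"] > 0:
        return "partial"
    return "invalid"
```

For example, a list holding one RAW file and one RESULT file is classified as complete, while one RAW file plus one SEARCH file is classified as partial.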
PRIDE Components: Data Submission Process
PRIDE Converter 2, PRIDE Inspector, PX Submission Tool
mzIdentML, PRIDE XML
In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process.
Tools: RESULT file generation
[Figure: native file export to mzIdentML from Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+ and others. The mzIdentML file is the final RESULT file, accompanied by the spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl).]
Complete submissions: Search Engine Results + MS files
Search engines and tools with mzIdentML export:
Mascot
MSGF+
MyriMatch and related tools from D. Tabb's lab
OpenMS
PEAKS
PeptideShaker
ProCon (ProteomeDiscoverer, Sequest)
Scaffold
TPP via the idConvert tool (ProteoWizard)
ProteinPilot (from version 5.0)
X!Tandem native conversion (Beta, PILEDRIVER)
Others: library for X!Tandem conversion, lab internal pipelines, Crux
Soon: ProteomeDiscoverer (Thermo)
An increasing number of tools support export to mzIdentML 1.1.
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
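Because the referenced spectra files have to be submitted together with the mzIdentML file, it can help to list them before starting a submission. The sketch below is one possible way to do this with the Python standard library. It relies on mzIdentML storing each spectra file reference in a SpectraData element with a location attribute; the function names are made up for this example, and PRIDE Inspector remains the tool recommended in this talk for checking files.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def referenced_spectra_files(mzid_xml):
    """Return the file names referenced by SpectraData elements in an mzIdentML document."""
    root = ET.fromstring(mzid_xml)
    names = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # skip comments and processing instructions
        # Strip the XML namespace so the sketch does not depend on the schema version.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData" and "location" in element.attrib:
            location = element.attrib["location"].replace("\\", "/")
            names.append(PurePosixPath(location).name)
    return names


def missing_spectra_files(mzid_xml, submitted_files):
    """List referenced spectra files that are absent from the submission."""
    submitted = set(submitted_files)
    return [name for name in referenced_spectra_files(mzid_xml) if name not in submitted]
```

Only file names are compared, since the paths stored in a result file usually point to the machine where the search was run.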
Tools: RESULT file generation (mzTab)
[Figure: coming soon, support for mzTab as the final RESULT file, via native file export from Mascot, MaxQuant and others, accompanied by the spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl).]
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016
PRIDE Inspector 2 supports:
PRIDE XML
mzIdentML + all types of spectra files
mzML
mzTab identification and quantification (+ all types of spectra files)
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector Toolsuite: PRIDE Inspector 2
https://github.com/PRIDE-Toolsuite/
New visualisation functionality for Protein Groups
PX Submission Tool
Desktop application for data submissions to ProteomeXchange via PRIDE
Implemented in Java 7
Streamlines the submission process
Captures mappings between files
Retains metadata
Fast file transfer with Aspera (FASP transfer technology)
FTP also available
Command line option
[Screenshot: submission tool]
PX submission tool: screenshots
PRIDE Archive: over 5,000 datasets from over 51 countries and 2,000 groups
Datasets by country:
USA: 814
Germany: 528
UK: 338
China: 328
France: 222
Netherlands: 175
Canada: 137
Data volume:
Total: ~275 TB
Number of all files: ~560,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB
1,973 datasets, i.e. 52% of all, are publicly accessible
~90% of all ProteomeXchange datasets
PRIDE Archive growth
[Figure: submissions per year, all submissions and complete submissions]
In the last 12 months: ~165 submitted datasets per month
Top species studied by at least 100 datasets:
2,010 Homo sapiens
604 Mus musculus
191 Saccharomyces cerevisiae
140 Arabidopsis thaliana
127 Rattus norvegicus
>900 reported taxa in total
(> 922 processed by MaxQuant)
Public data release: when does it happen?
When the author tells us to do it (the authors can do it by themselves)
When we find out that a dataset has been published
We look for PXD identifiers in PubMed abstracts.
If your PXD identifier is not in the abstract, the paper may have been published while the data is still private. Let us know!
New web form on the PRIDE website to facilitate the process
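The search for PXD identifiers in abstracts can be illustrated with a regular expression. This is a minimal sketch, not the procedure actually used by the PRIDE team: it assumes that an identifier is the prefix PXD followed by six digits, as in the accessions quoted in this talk (for example PXD001471), and the function name is hypothetical.

```python
import re

# Assumed accession format: "PXD" followed by six digits, e.g. PXD000320.
PXD_PATTERN = re.compile(r"\bPXD\d{6}\b")


def find_pxd_identifiers(abstract):
    """Return the distinct PXD accessions in a text, in order of first appearance."""
    seen = []
    for match in PXD_PATTERN.findall(abstract):
        if match not in seen:
            seen.append(match)
    return seen
```

An abstract that never mentions the identifier returns an empty list, which is exactly the case where a published dataset can stay private unnoticed.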
Partial submissions can be used to store other data types
Everything can be stored, not only MS/MS data: a very flexible mechanism able to capture all types of datasets.
PRIDE does not actively store SRM data (PASSEL).
Top down proteomics datasets.
Mass Spectrometry Imaging datasets.
Data independent acquisition techniques: e.g. SWATH-MS, MSE, HD-MSE, etc.
A public repository for mass spectrometry imaging data
Römpp et al., 2015
[Figure: images from the original publication [13] compared with images reconstructed from ProteomeXchange data. Workflow labels: Thermo RAW data / UDP; Mirion Software (JLU); convert to imzML (vendor-independent data format); 3. Upload to the PRIDE database (European Bioinformatics Institute, Cambridge, UK); 4. Download from PRIDE; display in MSiReader (freely available, open source software). Open data, free to reuse. Anybody can do this!]
No file size limit!
Ways to access data in PRIDE Archive
PRIDE web interface
File repository
REST web service
PRIDE Inspector tool
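As an illustration of programmatic access through the REST web service, the sketch below builds a project URL and downloads its JSON description. The base URL and the /project/{accession} path are assumptions made for this example and may not match the current service, so they should be checked against the PRIDE web service documentation; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL and path layout of the PRIDE Archive REST service.
# Check the current web service documentation before relying on them.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive"


def project_url(accession, base_url=BASE_URL):
    """Build the (assumed) URL of the project resource for a PXD accession."""
    return f"{base_url.rstrip('/')}/project/{accession}"


def fetch_project(accession, base_url=BASE_URL, timeout=30):
    """Download and decode the JSON description of one project."""
    request = urllib.request.Request(
        project_url(accession, base_url),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping the base URL as a parameter means the same code can be pointed at a different service version without further changes.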
PRIDE Archive web interface
PRIDE Archive web interface (2)
ProteomeCentral: Centralised portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
RSS feed and Twitter for following announcements of public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
@proteomexchange
Added value resources: PRIDE Cluster and PRIDE Proteomes
Condensed, cross-dataset, QC-filtered view of PRIDE data.
PRIDE Cluster: peptide-centric.
PRIDE Proteomes: protein-centric (identification data).
PRIDE Cluster
Provide an aggregated peptide-centric view of PRIDE Archive.
Hypothesis: the same peptide will generate similar MS/MS spectra across experiments.
New version of the spectral clustering algorithm to reliably group spectra coming from the same peptide.
Enables QC of peptide-spectrum matches (PSMs).
Infer reliable identifications by comparing submitted identifications of spectra within a cluster.
After clustering, a representative spectrum is built for all peptides consistently identified across different datasets.
Used to build spectral libraries (for 16 species).
Griss et al., Nat. Methods, 2013
Griss et al., Nat. Methods, 2016
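The idea of comparing the submitted identifications within a cluster can be sketched as follows. This is a simplified illustration and not the PRIDE Cluster implementation: the function name and the thresholds min_ratio and min_size are assumptions chosen for the example. It computes which peptide was assigned most often to the spectra in one cluster and what fraction of the spectra agree with it.

```python
from collections import Counter


def consensus_identification(peptide_ids, min_ratio=0.7, min_size=3):
    """Summarise the submitted identifications of the spectra in one cluster.

    Returns (most_common_peptide, ratio, is_reliable), where ratio is the
    fraction of spectra in the cluster assigned to the most common peptide.
    min_ratio and min_size are illustrative thresholds, not PRIDE Cluster's.
    """
    if not peptide_ids:
        raise ValueError("cluster has no identified spectra")
    counts = Counter(peptide_ids)
    peptide, hits = counts.most_common(1)[0]
    ratio = hits / len(peptide_ids)
    is_reliable = len(peptide_ids) >= min_size and ratio >= min_ratio
    return peptide, ratio, is_reliable
```

A cluster in which every PSM reports the same peptide has a ratio of 1.0; spectra whose submitted identification disagrees with the consensus of a large, consistent cluster are candidates for incorrect PSMs.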
Example: one perfect cluster
880 PSMs give the same peptide ID
4 species
28 datasets
Same instruments
http://www.ebi.ac.uk/pride/cluster/
PRIDE Cluster as a Public Data Mining Resource
http://www.ebi.ac.uk/pride/cluster
Spectral libraries for 16 species.
All clustering results, as well as specific subsets of interest, are available.
Source code (open source) and Java API.
PRIDE Proteomes web interface: identification info
http://wwwdev.ebi.ac.uk/pride/proteomes/
[Screenshot annotations: unique/shared peptides; mass spec-based sequence coverage; PTMs detected; observed tissues; biological vs sample prep PTMs]
Conclusions
Main characteristics of PRIDE Archive and ProteomeXchange
PX/PRIDE submission workflow for MS/MS data: PRIDE Inspector, PX submission tool
PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
PRIDE Proteomes and PRIDE Cluster: new resources
PRIDE resources
Do you want to know a bit more?
http://www.slideshare.net/JuanAntonioVizcaino
Acknowledgements: The PRIDE Team
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
All data submitters!
@pride_ebi
Questions?