
Data Submission Guidelines for the ProteomeXchange Consortium


ProteomeXchange and proteomics data submission v2.0, 15 September 2016


This document aims to provide detailed guidelines for users who want to submit mass spectrometry (MS)-derived proteomics data to the ProteomeXchange (PX) Consortium of proteomics resources (1) (http://www.proteomexchange.org).

Table of contents
1 Types of dataset submissions
2 Proteomics data resources in ProteomeXchange
  2.1 List of Universal Archival resources
  2.2 List of Focused Archival resources
  2.3 List of Secondary Data Resources
  2.4 ProteomeCentral: the common Portal for PX datasets
3 Data workflow for original datasets
  3.1 Submission workflow for Selected Reaction Monitoring (SRM) datasets
4 Workflow for reprocessed datasets
5 Data ownership
6 Data privacy
7 References
8 Appendix I: Data types
9 Appendix II: Metadata and the PX XML message
10 Appendix III: How to get notified about new PX datasets
11 Appendix IV: Membership in the ProteomeXchange Consortium


1 Types of dataset submissions

The PX resources support two types of dataset submissions, depending on the proteomics data workflow and the data formats available.

a) Complete submission: A complete (also known as “supported”) submission ensures that the identification results and the corresponding mass spectra (see the definitions of data types in Appendix I) can be parsed, integrated and visualised by the PX resource and/or in free-to-use stand-alone tools such as PRIDE Inspector (available at https://github.com/PRIDE-Toolsuite/pride-inspector/releases). To achieve this, processed identification results need to be provided in a standard format (e.g. mzIdentML (2), mzTab (3)), and optionally also in a different open data format (e.g. PRIDE XML). In addition, all the submitted files are made available for download.

b) Partial submission: In this case (also known as “unsupported”), processed identification results are provided in data formats other than those indicated above for complete submissions. The PX resource then cannot parse, integrate and visualise the identifications and/or connect the identification data to the corresponding mass spectra. However, all the submitted files are made available for download. This mechanism allows data generated by software that cannot yet export to standard formats, or from novel experimental approaches, to be deposited in the PX resources.
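To make the distinction concrete, the following Python sketch decides the submission type from the formats of the identification results alone. It is a simplification under stated assumptions: the function name and the default set of standard formats (mzIdentML and mzTab, the examples named above) are illustrative, the rule "at least one standard format present means complete" is an assumption, and each resource's actual rules (Table 1, plus the peak list requirement in Section 3) take precedence.

```python
# Hypothetical sketch: decide between a "complete" and a "partial" PX submission
# from the formats of the processed identification results only.
# ASSUMPTION: the default standard formats are the examples named in Section 1;
# each resource's actual rules (Table 1) take precedence.
from typing import Iterable

DEFAULT_STANDARD_FORMATS = frozenset({"mzIdentML", "mzTab"})


def submission_type(
    identification_formats: Iterable[str],
    standard_formats: frozenset = DEFAULT_STANDARD_FORMATS,
) -> str:
    """Return "complete" if at least one identification result file is in a
    standard format the resource can parse, otherwise "partial"."""
    if any(fmt in standard_formats for fmt in identification_formats):
        return "complete"
    # Only non-standard search engine output files: the files are still
    # downloadable, but the resource cannot parse or visualise them.
    return "partial"


print(submission_type(["mzIdentML", "Mascot .dat"]))  # complete
print(submission_type(["Mascot .dat"]))               # partial
```

A resource that also accepts another open format for complete submissions (for example PRIDE XML at PRIDE Archive) could be modelled by passing its own `standard_formats` set.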

2 Proteomics data resources in ProteomeXchange

In the ProteomeXchange Consortium there are currently two types of proteomics data resources (Figure 1):

a) Archival resources: Their main mission is to store MS-based proteomics data. There are two types:

a. Universal resources: They can store any type of proteomics dataset, coming from any data workflow. However, they normally focus on supporting “complete” submissions for particular data workflows (e.g. bottom-up proteomics data-dependent acquisition (DDA) workflows). The current examples in the Consortium are PRIDE Archive, MassIVE and jPOST (see Section 2.1).

b. Focused resources: They specifically support one type of data workflow and will not store data from other proteomics approaches. An example is the PASSEL component of PeptideAtlas, which is the dedicated resource for Selected Reaction Monitoring (SRM) approaches (see Section 2.2).

b) Secondary data resources: These build upon the primary data provided by submitters, which are stored in the Archival resources. There are two representative resources: PeptideAtlas and MassIVE (see Section 2.3). MassIVE is therefore both an Archival and a Secondary data resource.

2.1 List of Universal Archival resources

Currently, there are three universal Archival resources available:
1- PRIDE Archive (http://www.ebi.ac.uk/pride/archive, EMBL-European Bioinformatics Institute, Cambridge, UK). Data submission documentation is available on the PRIDE Archive website and in a dedicated publication (4).
2- MassIVE (https://massive.ucsd.edu/, University of California San Diego (UCSD), San Diego, CA, USA). Data submission documentation is available at https://massive.ucsd.edu/ProteoSAFe/help.jsp.


3- jPOST (http://jpost.org/, jPOST Project Team, Japan). Data submission documentation is available at https://repository.jpostdb.org/help.

2.2 List of Focused Archival resources

1- PASSEL (Institute for Systems Biology, Seattle, WA, USA) is the only focused resource at present. Data submission documentation is available at http://www.peptideatlas.org/passel/.

2.3 List of Secondary Data Resources

There are currently two secondary data resources:
1- PeptideAtlas (http://www.peptideatlas.org/, Institute for Systems Biology, Seattle, WA, USA). Documentation is available at http://www.peptideatlas.org/.
2- MassIVE (https://massive.ucsd.edu/, University of California San Diego (UCSD), San Diego, CA, USA).
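The roles described in Sections 2.1 to 2.3 can be captured in a small lookup structure, which makes the dual role of MassIVE explicit. This is a hypothetical Python sketch reflecting only the resources listed in this version of the guidelines; the names `Role`, `PX_RESOURCES` and `resources_with_role` are illustrative and not part of any PX software.

```python
# Hypothetical registry of PX resources and their roles, as listed in
# Sections 2.1-2.3 of this version of the guidelines (not an official API).
from enum import Enum


class Role(Enum):
    UNIVERSAL_ARCHIVAL = "universal archival"
    FOCUSED_ARCHIVAL = "focused archival"
    SECONDARY = "secondary"


PX_RESOURCES = {
    "PRIDE Archive": {Role.UNIVERSAL_ARCHIVAL},
    "MassIVE": {Role.UNIVERSAL_ARCHIVAL, Role.SECONDARY},  # both roles
    "jPOST": {Role.UNIVERSAL_ARCHIVAL},
    "PASSEL": {Role.FOCUSED_ARCHIVAL},  # SRM/MRM data only
    "PeptideAtlas": {Role.SECONDARY},
}


def resources_with_role(role: Role) -> list:
    """Names of the resources that play the given role, in sorted order."""
    return sorted(name for name, roles in PX_RESOURCES.items() if role in roles)


print(resources_with_role(Role.SECONDARY))  # ['MassIVE', 'PeptideAtlas']
```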

2.4 ProteomeCentral: the common Portal for PX datasets

ProteomeCentral (available at http://proteomecentral.proteomexchange.org) is the portal for all PX datasets, independently of the original resource where the datasets were stored. This queryable portal provides users with an efficient way to identify datasets of interest.
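As an illustration, the sketch below validates a PX identifier and builds its ProteomeCentral page address, following the GetDataset URL pattern used for dataset RPXD000665 in Section 4. It assumes that identifiers consist of "PXD" (or "RPXD" for reprocessed datasets) followed by six digits; the MIRIAM registry entry cited in Section 3 is the authoritative source for the pattern, and the function names are hypothetical.

```python
# Sketch: validate a PX dataset identifier and build its ProteomeCentral URL.
# ASSUMPTION: identifiers are "PXD" or "RPXD" (reprocessed) plus six digits.
import re

PX_ID_PATTERN = re.compile(r"(R?)PXD(\d{6})")
GET_DATASET_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID={}"


def parse_px_identifier(identifier: str) -> dict:
    """Split an identifier into its parts; raise ValueError if it is malformed."""
    match = PX_ID_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a PX identifier: {identifier!r}")
    return {
        "identifier": match.group(0),
        "reprocessed": match.group(1) == "R",  # RPXD = reprocessed dataset
        "number": int(match.group(2)),
    }


def proteomecentral_url(identifier: str) -> str:
    """ProteomeCentral page for a dataset, using the GetDataset URL pattern."""
    return GET_DATASET_URL.format(parse_px_identifier(identifier)["identifier"])


print(proteomecentral_url("RPXD000665"))
# http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=RPXD000665
```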


3 Data workflow for original datasets

The overall ProteomeXchange data workflow is summarized in Figure 1.

Figure 1: Overview of the ProteomeXchange data flow.

Original datasets coming from any proteomics data workflow can be submitted to any of the universal Archival resources (PRIDE Archive, MassIVE or jPOST). Examples of data workflows are shotgun (bottom-up) proteomics (Data Dependent Acquisition, DDA), top-down, or Data Independent Acquisition (DIA) approaches (e.g. SWATH-MS), among many others. All submitted datasets will get a unique PXD identifier (see details at http://www.ebi.ac.uk/miriam/main/collections/MIR:00000513).

However, it is highly RECOMMENDED that datasets from data workflows other than shotgun proteomics (the most universal and widely used approach) that are explicitly supported by an existing focused archival resource are submitted to that resource, and not to any of the universal archival resources. At present, SRM/MRM datasets should be submitted to PASSEL (the only PX resource of this type). The same recommendation will be implemented for additional proteomics approaches if other focused resources join the Consortium in the future.

Otherwise, users can freely choose the universal Archival resource for the submission of their datasets. User preferences can be based, for instance, on geographical proximity, the availability of “complete” submissions for particular workflows, or technical specifications (e.g. speed of data uploads and downloads), among other considerations.

In any case, for each submitted PX dataset it is mandatory to include the following data types:


(i) Mass spectrometer output files (see Appendix I).
(ii) Protein/peptide identifications. Depending on the type of submission, ‘supported identification results’ (e.g. mzIdentML) will be needed for complete submissions. In the case of partial submissions, any type of search engine output file is supported.
(iii) Processed peak list spectra. These files are needed to enable the connection between the identifications and the mass spectra. In the case of complete submissions performed with mzIdentML, these files are mandatory (since peak lists are not included in mzIdentML per se). They are optional in the case of partial submissions (since mass spectrometer output files are available anyway).
(iv) Metadata: Related biological and technological metadata provide the experimental context. Different resources have different metadata requirements (see the individual documentation for each resource), but at least the information needed to generate the PX XML format (used by the ProteomeCentral resource, see Appendix II) must be provided.

Other optional data types can also be included in a submitted dataset, for instance:
(i) Quantification software output files: quantification results.
(ii) Gel images.
(iii) Files used to perform the mass spectral searches, either sequence database files or spectral library files.
(iv) Any other data type (e.g. scripts, PDF files, etc.).

In addition, a mechanism to submit mass spectrometry imaging data (as a partial submission) has been described in this publication (5). See Table 1 below for more details.

Table 1. Summary of submission guidelines for each PX resource, depending on the data workflow involved.

Workflow / submission type     PRIDE          PASSEL                 MassIVE                jPOST
DDA MS/MS
  Partial                      Yes            No                     Yes                    Yes
  Complete: mzIdentML          Yes            No                     Yes                    Yes
  Complete: mzTab              No             No                     Yes                    Yes
  Complete: TSV                No             No                     Yes                    No
  Complete: PRIDE XML          Yes            No                     No                     No
Other workflows
  Targeted SRM/MRM             Partial only   Partial and complete   Partial only           Partial only
  DIA MS/MS                    Partial only   No                     Partial and complete   Partial only
  Top-down                     Partial only   No                     Partial only           Partial only
  Mass spectrometry imaging    Partial only   No                     Partial only           Partial only
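For programmatic checks, Table 1 can be transcribed into a lookup. The Python sketch below does so and applies the Section 3 recommendation that SRM/MRM datasets go to PASSEL rather than to a universal resource. The function and variable names are hypothetical, and the data mirror this version of the table only; each resource's current documentation should be checked before relying on it.

```python
# Sketch: Table 1 transcribed into lookups (this version of the guidelines only).
RESOURCES = ("PRIDE", "PASSEL", "MassIVE", "jPOST")

# DDA MS/MS rows: submission option -> support level per resource.
DDA_SUPPORT = {
    "Partial": ("Yes", "No", "Yes", "Yes"),
    "Complete: mzIdentML": ("Yes", "No", "Yes", "Yes"),
    "Complete: mzTab": ("No", "No", "Yes", "Yes"),
    "Complete: TSV": ("No", "No", "Yes", "No"),
    "Complete: PRIDE XML": ("Yes", "No", "No", "No"),
}

# "Other workflows" rows: workflow -> support level per resource.
OTHER_SUPPORT = {
    "Targeted SRM/MRM": ("Partial only", "Partial and complete", "Partial only", "Partial only"),
    "DIA MS/MS": ("Partial only", "No", "Partial and complete", "Partial only"),
    "Top-down": ("Partial only", "No", "Partial only", "Partial only"),
    "Mass spectrometry imaging": ("Partial only", "No", "Partial only", "Partial only"),
}


def support(workflow: str, option: str = "Partial") -> dict:
    """Return {resource: support level} for one row of Table 1.
    `option` is only used for DDA MS/MS rows."""
    row = DDA_SUPPORT[option] if workflow == "DDA MS/MS" else OTHER_SUPPORT[workflow]
    return dict(zip(RESOURCES, row))


def recommended_resources(workflow: str, option: str = "Partial") -> list:
    """Resources that accept the data, applying the Section 3 rule that
    SRM/MRM datasets go to the focused resource (PASSEL)."""
    if workflow == "Targeted SRM/MRM":
        return ["PASSEL"]
    return [name for name, level in support(workflow, option).items() if level != "No"]


print(recommended_resources("DDA MS/MS", "Complete: mzTab"))  # ['MassIVE', 'jPOST']
```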


3.1 Submission workflow for Selected Reaction Monitoring (SRM) datasets

New datasets acquired via SRM should be submitted to PASSEL, the only focused Archival resource currently supporting this type of approach. For such datasets, three main items are required:

1. Mass spectrometer output files, preferably raw files (Appendix I).
2. Transition list describing the peptides that the instrument targeted.
3. Analysis results.

Once submissions are received, they are checked by a curator, run through the PASSEL pipeline, and then loaded into the PASSEL database.

Figure 2. Workflow for original SRM data submissions to PASSEL.
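A submitter could check that the three required items are present before uploading to PASSEL. The following is a minimal, hypothetical Python checklist; the class name, field names and example file names are assumptions, and PASSEL's own submission documentation defines the actual requirements.

```python
# Hypothetical pre-upload checklist for an SRM dataset destined for PASSEL.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SrmSubmission:
    ms_output_files: List[str] = field(default_factory=list)   # preferably raw files
    transition_lists: List[str] = field(default_factory=list)  # targeted peptides
    analysis_results: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Names of the required items for which no file was provided."""
        required = {
            "mass spectrometer output files": self.ms_output_files,
            "transition list": self.transition_lists,
            "analysis results": self.analysis_results,
        }
        return [item for item, files in required.items() if not files]


draft = SrmSubmission(ms_output_files=["run01.raw"], transition_lists=["transitions.tsv"])
print(draft.missing_items())  # ['analysis results']
```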

4 Workflow for reprocessed datasets

The workflow for reprocessed datasets starts when any secondary data resource of the PX Consortium (at present PeptideAtlas and MassIVE) makes a reinterpretation of existing data in any of the Archival resources. A new ProteomeXchange identifier is then obtained from ProteomeCentral (an RPXD identifier instead of the standard PXD identifier). As an example, see dataset RPXD000665 in ProteomeCentral (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=RPXD000665). However, the original PX accession number is retained in the PX XML message to allow a coordinated search for different views of the data from a given submission. This ensures that a simple one-time submission from a contributor is automatically distributed to all PX repositories with sufficient information. When the reanalysis is done by a PX member, an XML broadcast is produced that includes the new RPXD identifier as well as the original one. All the relevant information about the connection between the datasets is stored in ProteomeCentral.

Three main situations may arise when a PX dataset is reanalysed:

a) The data reinterpretation is published in a separate publication as ‘independent’ findings:
- Data must go to a universal Archival resource (as for any other new MS/MS dataset).
- It is not mandatory to re-upload the raw data (references to URLs are allowed in this case, if this is supported by the Archival resource).


b) The reinterpretation is not published as ‘independent’ new findings. In this case, the data can be kept in a Secondary data resource. For instance, this applies to all new PeptideAtlas builds that get published.

c) In the case of a mixture of new and reprocessed data in one given dataset, the data should be considered a new dataset, so the dataset should be submitted to the corresponding universal Archival resource.
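The three cases above can be summarised as a decision function. This Python sketch is an illustrative simplification with a hypothetical name; it only returns the type of resource, not a specific repository, and it assumes the two yes/no questions fully determine the case.

```python
# Hypothetical decision function for where a reanalysed PX dataset goes (Section 4).
def reanalysis_destination(published_as_independent_findings: bool,
                           includes_new_data: bool) -> str:
    if includes_new_data:
        # Case c: new plus reprocessed data is treated as a new dataset.
        return "universal Archival resource"
    if published_as_independent_findings:
        # Case a: submit like any new MS/MS dataset; raw data may be
        # referenced by URL if the Archival resource supports it.
        return "universal Archival resource"
    # Case b: the reinterpretation can stay in a Secondary data resource.
    return "Secondary data resource"


print(reanalysis_destination(published_as_independent_findings=False,
                             includes_new_data=False))  # Secondary data resource
```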

5 Data ownership

No ProteomeXchange resource assumes editorial control or ownership over the submitted data; the original submitter is maintained as the owner of these data. All ProteomeXchange resources require that a submitter is explicitly identified for each dataset. Once the data become publicly available, the original data ownership is maintained in the database, although dissemination and reuse of the released data are obviously no longer restricted at that point.

PASSEL and jPOST also do not assume editorial control of the submitted data. Users specify at submission time the date on which the data become publicly accessible. On this date, the data are automatically released. The data owner has the option of adjusting this date in case of review delays, etc.

6 Data privacy

All ProteomeXchange resources allow data to be kept private for any length of time, until the owner of the data (as identified by the associated user account) gives explicit permission to release them. However, a variant occurs when privately submitted data are associated with a manuscript submitted to a journal. Once the paper is published, the public release of the corresponding submitted data will be triggered without asking the submitters for permission. In the particular case of PASSEL and jPOST, data become automatically available on the date that the submitter specifies.

All PX resources can automatically provide reviewer accounts for each submitted experiment, which can be communicated to journal editors and referees in a submitted manuscript, thus allowing confidential review of the privately submitted data.
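The release rules in Sections 5 and 6 can be sketched as a single predicate. This hypothetical Python function simplifies the policy (for example, it does not model reviewer accounts or date adjustments made by the owner) and should not be read as how any PX resource actually implements releases.

```python
# Hypothetical, simplified reading of the release rules in Sections 5 and 6.
from datetime import date
from typing import Optional

DATE_BASED_RELEASE = {"PASSEL", "jPOST"}  # release on the submitter-specified date


def should_be_public(today: date, resource: str, owner_released: bool = False,
                     linked_paper_published: bool = False,
                     release_date: Optional[date] = None) -> bool:
    if owner_released or linked_paper_published:
        # Explicit permission, or publication of the associated manuscript.
        return True
    if resource in DATE_BASED_RELEASE and release_date is not None:
        # PASSEL and jPOST release automatically on the chosen date.
        return today >= release_date
    # Otherwise the data stay private.
    return False
```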


7 References

1. Vizcaino, J.A., Deutsch, E.W., Wang, R., Csordas, A., Reisinger, F., Rios, D., Dianes, J.A., Sun, Z., Farrah, T., Bandeira, N. et al. (2014) ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol, 32, 223-226.

2. Jones, A.R., Eisenacher, M., Mayer, G., Kohlbacher, O., Siepen, J., Hubbard, S.J., Selley, J.N., Searle, B.C., Shofstahl, J., Seymour, S.L. et al. (2012) The mzIdentML data standard for mass spectrometry-based proteomics results. Mol Cell Proteomics, 11, M111.014381.

3. Griss, J., Jones, A.R., Sachsenberg, T., Walzer, M., Gatto, L., Hartler, J., Thallinger, G.G., Salek, R.M., Steinbeck, C., Neuhauser, N. et al. (2014) The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience. Mol Cell Proteomics, 13, 2765-2775.

4. Ternent, T., Csordas, A., Qi, D., Gomez-Baena, G., Beynon, R.J., Jones, A.R., Hermjakob, H. and Vizcaino, J.A. (2014) How to submit MS proteomics data to ProteomeXchange via the PRIDE database. Proteomics, 14, 2233-2241.

5. Rompp, A., Wang, R., Albar, J.P., Urbani, A., Hermjakob, H., Spengler, B. and Vizcaino, J.A. (2015) A public repository for mass spectrometry imaging data. Anal Bioanal Chem, 407, 2027-2033.

6. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R. et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat Biotechnol, 22, 1459-1466.

7. Martens, L., Chambers, M., Sturm, M., Kessner, D., Levander, F., Shofstahl, J., Tang, W.H., Rompp, A., Neumann, S., Pizarro, A.D. et al. (2011) mzML--a community standard for mass spectrometry data. Mol Cell Proteomics, 10, R110.000133.

8. Walzer, M., Qi, D., Mayer, G., Uszkoreit, J., Eisenacher, M., Sachsenberg, T., Gonzalez-Galarza, F.F., Fan, J., Bessant, C., Deutsch, E.W. et al. (2013) The mzQuantML data standard for mass spectrometry-based quantitative studies in proteomics. Mol Cell Proteomics, 12, 2332-2340.


8 Appendix I: Data types

Proteomics data come in a variety of forms, which are defined here:

- Mass spectrometer output files: the data and metadata generated by mass spectrometers, usually one file per run (although some instruments put multiple runs in one file). The data may be the original profile-mode scans or may already have had some basic processing, such as centroiding, applied. They may be:
  o i) raw data (as described below).
  o ii) peak list spectra in a standardized format such as mzML, mzXML or mzData (see below), but not ‘processed peak lists’ (see below). In this case, it is important that all of the scans that were generated are included, with the applicable metadata.

- Raw data: the binary, vendor-specific output files directly created by the instrument software. These files are typically large and require specialized software in order to be read.

- Standardized MS data formats: There are currently three widely known mass spectrometry data formats in proteomics: mzXML (6) (developed at the Institute for Systems Biology (ISB), Seattle, USA), mzData (now obsolete, originally developed by the HUPO Proteomics Standards Initiative (PSI)), and the successor to both: mzML (7) (currently v1.1, jointly developed by the ISB and PSI, http://www.psidev.info/mzml). These data formats can be used to represent processed peak lists as well as raw data. In addition to the mass spectra, they contain detailed metadata that provide context to the measurements.

- Processed peak lists: a heavily processed form of mass spectrometry data, usually derived from the raw data files through various (semi-)automatic steps, e.g. centroiding, deisotoping, and charge deconvolution. These files are formatted in plain text, with typical formats like dta, pkl, ms2 or mgf. They usually contain only a subset of the MS2 scans (MS1 scans are excluded), and lack significant amounts of the metadata that were present in the source format.

- Protein/peptide identifications: Proteomics mass spectra can be matched to peptides or proteins, resulting in identifications for those spectra. Typically, a spectrum is considered identified if the score attributed to a peptide or protein match passes an a priori or a posteriori defined threshold. In the case of fragmentation spectra, the initial identification will consist of a peptide sequence; subsequent steps will derive a list of proteins from the identified peptides. The protein assembly step can be a discernible process with its own input and output files, or it can be implicit in the overall identification software. This information can be represented by a variety of data formats called ‘search engine output files’ (see below).

- Protein/peptide quantification: Protein/peptide expression values can also be obtained from an MS-based proteomics experiment. The high diversity of approaches has resulted in very heterogeneous software and data analysis pipelines. Some search engines are able to perform both identification and quantification, and produce ‘search engine output files’ containing both types of data. However, if a software tool performs only the quantification part of the analysis, the generated data are represented in ‘quantification software output files’ (see below).

- Search engine output files: They contain the data and metadata generated by the software (usually called search engines) used for performing the identification, and often the quantification, of peptides and proteins. Each search engine has its own specific output file. The formats are typically plain text or XML, with typical formats like Mascot .dat, OMSSA xml, etc. In addition to each specific format, a standard data format called mzIdentML (currently v1.1, http://www.psidev.info/mzidentml) (2) has been developed by the PSI to represent this kind of information. Some search engine output files can also represent quantification results, but this is not the case for mzIdentML. A second standard data format called mzTab (tab-delimited file, http://www.psidev.info/mztab) (3) can represent both identification and quantification results.

- Supported protein/peptide identification results: This definition includes all processed protein/peptide identification data that can be fully represented by the receiving repository in ProteomeXchange. PRIDE Archive and MassIVE fully support mzIdentML, which can now be exported from a variety of tools (see the updated list at http://www.psidev.info/tools-implementing-mzidentml). The PRIDE XML format is also supported by PRIDE Archive (it was the original PRIDE data format), although it is NOT RECOMMENDED to use it if the same data can be represented in mzIdentML.

- Quantification software output files: the data and metadata generated by the software used exclusively for the quantification analysis of peptides and proteins. In addition to each tool-specific format, a standard data format called mzQuantML (currently v1.0, http://www.psidev.info/mzquantml) has been released by the PSI to represent this kind of information (8). As mentioned before, a second data format called mzTab (http://www.psidev.info/mztab) can also represent quantification results, although this is currently not yet finished.

- Metadata: Whereas mass spectra represent the core output of any mass spectrometer, a simple collection of spectra does not provide sufficient information for confident interpretation. The same holds for peptide and protein identifications and their expression values. This lack of context can be solved by providing relevant metadata along with the spectra and/or the identification and quantification data. Mass spectrometer, search engine, and quantification software output files (see above) typically accommodate this information.
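To illustrate how the data types above map onto typical files, the sketch below guesses a data type from a file name. The extension mapping is an assumption based on common conventions (for example .mzid for mzIdentML, .mzq for mzQuantML, .raw/.wiff/.d for vendor raw data) and is not defined by these guidelines; extensions such as .dat or .raw are ambiguous across vendors and tools, so the result is only a first guess.

```python
# Sketch: guess the Appendix I data type of a file from its extension.
# ASSUMPTION: common extension conventions, not defined by these guidelines.
from pathlib import PurePath

EXTENSION_TYPES = {
    ".raw": "raw data", ".wiff": "raw data", ".d": "raw data",
    ".mzml": "standardized MS data (mzML)",
    ".mzxml": "standardized MS data (mzXML)",
    ".mzdata": "standardized MS data (mzData)",
    ".mgf": "processed peak list", ".dta": "processed peak list",
    ".pkl": "processed peak list", ".ms2": "processed peak list",
    ".mzid": "identification results (mzIdentML)",
    ".mztab": "identification/quantification results (mzTab)",
    ".mzq": "quantification results (mzQuantML)",
    ".dat": "search engine output file (e.g. Mascot)",
}
COMPRESSION_SUFFIXES = {".gz", ".zip", ".bz2"}


def guess_data_type(filename: str) -> str:
    """Case-insensitive lookup of the last meaningful file extension."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    # Look past a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return "other"
    return EXTENSION_TYPES.get(suffixes[-1], "other")


print(guess_data_type("run01.mzML.gz"))  # standardized MS data (mzML)
```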


9 Appendix II: Metadata and the PX XML message

An XML Schema Definition (XSD) file has been drafted for use in generating the XML messages used by ProteomeCentral. The PX XML schema contains the common metadata agreed upon by all the PX members. The philosophy behind the design of the schema was to keep it as flexible as possible, with an overall structure based on the heavy use of controlled vocabulary (CV) terms. All elements in the schema are mandatory apart from the last ones (ChangeLog, DatasetFileList, RepositoryRecordList and AdditionalInformation). The corresponding .xsd file is available at https://raw.githubusercontent.com/proteomexchange/proteomecentral/master/lib/schemas/proteomeXchange-1.3.0.xsd.

This is the list of elements in the schema:
- ProteomeXchangeDataset: This is the root element, with mandatory attributes. The formatVersion attribute could be used if an announcement has to be repeated with some (minor) changes, e.g. the addition of a publication reference.
- CvList: This element lists all CVs/ontologies that were used to populate the file. This ensures that the CV terms used can be traced to their origin and definition.
- DatasetSummary: This element contains some basic information about the submission, like ‘title’, ‘announcement date’ or ‘project description’. It also indicates the type of submission (fully supported (‘complete’) or not (‘partial’) by the receiving repository), and whether a related manuscript has already been published.
- DatasetIdentifierList: This element includes the identifiers that unambiguously characterize the dataset: for instance, the PX accession number and the Digital Object Identifier (DOI), if relevant.
- DatasetOriginList: This element indicates whether the dataset constitutes a new submission, or describes the reprocessing of a previously submitted dataset. Every reanalysis performed on a particular dataset gets a different PX accession number.
- SpeciesList: Contains information about the species included in the dataset.
- InstrumentList: Element holding the overall information about the instrumentation used in the generation of the data.
- ModificationList: All protein modifications (natural and artificial) are listed in this record (specified as CV terms). If a dataset does not contain any modifications, this is also explicitly stated here with a specific CV term.
- ContactList: Information about the researchers involved in the generation and submission of the dataset.
- PublicationList: The list of publications that the dataset has generated.
- KeywordList: One or more CV terms that define a list of keywords that may be attributed to the dataset.
- FullDatasetLinkList: List of links that allow access to the data. Different links may be used for different ways of accessing the data (for example FTP download or repository web link) or for different repositories hosting the same data.
- DatasetFileList: Optional element to provide individual links to all the submitted files (mass spectrometer output files, search engine output files, etc.) belonging to the dataset.
- RepositoryRecordList: This optional element allows a repository to report information with more granularity, if available. For example, links and information could be provided for each part/result file of a larger dataset.
- AdditionalInformation: Optional element that includes any other CV terms that can be used to describe the dataset.
- ChangeLog: An element that records comments for all changes made to the file since its first release. This element is optional for the first release of the PX XML only; all subsequent releases must provide a change log entry.

Different versions of the PX XML announcement for the same PX dataset can be made available to ProteomeCentral. This happens if some information included in it is updated (for instance, the final version of the reference of a publication). All versions are tracked and kept in ProteomeCentral. After reprocessing of a dataset, if the new results are submitted to PX, a new PX identifier will be generated, but the original PX accession number will also be retained, to allow a coordinated search for different views of the data from one submission. This ensures that a simple one-time submission from a contributor is automatically distributed to all PX repositories with sufficient information.
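A minimal consistency check for the mandatory elements listed above can be written with Python's standard xml.etree.ElementTree module. This sketch only tests for the presence of top-level child elements by local name; it assumes they appear as direct children of the root, does not check their order, attributes or content, and is no substitute for validating against the published XSD. The function names are hypothetical.

```python
# Sketch: report which PX XML top-level elements are missing (presence only;
# not a substitute for validating against the published XSD).
import xml.etree.ElementTree as ET
from typing import List

MANDATORY_ELEMENTS = (
    "CvList", "DatasetSummary", "DatasetIdentifierList", "DatasetOriginList",
    "SpeciesList", "InstrumentList", "ModificationList", "ContactList",
    "PublicationList", "KeywordList", "FullDatasetLinkList",
)


def _local_name(tag: str) -> str:
    """Drop a "{namespace}" prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def missing_elements(px_xml: str, first_release: bool = True) -> List[str]:
    root = ET.fromstring(px_xml)
    if _local_name(root.tag) != "ProteomeXchangeDataset":
        raise ValueError("root element must be ProteomeXchangeDataset")
    present = {_local_name(child.tag) for child in root}
    required = list(MANDATORY_ELEMENTS)
    if not first_release:
        required.append("ChangeLog")  # mandatory for every later release
    return [name for name in required if name not in present]
```

Passing `first_release=False` reflects the rule that every release after the first must carry a ChangeLog entry.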


10 Appendix III: How to get notified about new PX datasets

Each PX dataset becomes publicly available on acceptance or publication of the manuscript supported by the dataset. When a submission becomes publicly available, a short summary is released through a public announcement system, via an RSS feed containing a link to a file with a defined XML schema (PX XML file). The PX XML file contains key experimental metadata such as dataset identifiers, sample details (e.g. species and protein modifications, which are mandatory), mass spectrometer, publication, list of keywords, etc.

In addition, this file contains links to all the data, and allows PeptideAtlas, UniProt, and/or other resources to evaluate, reprocess and integrate the data. In fact, any member of the community can subscribe to this service. There are three ways to do so:
1) [email protected]
2) One can receive these updates by e-mail. To do so, you need to join the PX Google Group:
- Log in to Google with your preferred e-mail address.
- Go to https://groups.google.com/group/proteomexchange/
- Click on the "Join the Group" button (the exact location depends on your preferences for how groups are displayed in your web browser).
- Choose your preferred option for receiving the e-mails with the new datasets.
3) One can subscribe to the following RSS feed: http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
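Subscribers to the RSS feed can extract announcements with Python's standard library. The sketch below parses an RSS 2.0 document that has already been downloaded and returns the title and link of each item; it assumes the standard RSS 2.0 channel/item layout, the function name is hypothetical, and the feed address given above may have changed since this document was written.

```python
# Sketch: list announcements from an already-downloaded RSS 2.0 feed.
import xml.etree.ElementTree as ET
from typing import List, Tuple


def parse_rss_items(rss_xml) -> List[Tuple[str, str]]:
    """Return (title, link) for every <item> element; accepts str or bytes."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        # Missing <title> or <link> children become empty strings.
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items
```

For example, `parse_rss_items(urllib.request.urlopen(feed_url).read())` would fetch and parse a live feed, where `feed_url` holds the RSS address above; `urlopen(...).read()` returns bytes, which `ET.fromstring` accepts.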


11 Appendix IV: Membership in the ProteomeXchange Consortium

Applications for recognition as archival resources are welcome, and will be decided upon by the ProteomeXchange Consortium based on the following key criteria:

1. Experience and funding level of the resource.
2. Stability.
3. Availability of dedicated curation staff.
4. Ability to store and make accessible raw data, metadata, and interpretations.
5. Worldwide unrestricted availability of stored datasets for download.

The latest version of the ProteomeXchange collaborative agreement (available at http://www.proteomexchange.org/documents/proteomexchange-collaborative-agreement) describes the steps needed to become a member of the Consortium.

