PSI Mass Spectrometry Standards Working Group Summary
HUPO PSI MS Standards Working Group
PSI MSS & PI WGs Joint Session, 2:00 – 5:30
10 minutes each:
• mzIdentML
• mzQuantML (incl. support for SRM)
• mzTab (incl. support for SRM)
• TraML
• mzML & imzML & compression
• metabolomics & ion mobility & SWATH-MS
• PEFF
• MIAPE: MS & MSI & Quant
• Controlled vocabulary
• Cross-linking
• Protein grouping
• PTM localization
• RNA-seq assisted proteomics
TraML
• Discussed ongoing implementation and use efforts
• Identified some outdated information in documentation. TODO: update
• Online validator has outdated CV. TODO: update
• No schema or CV updates needed at this time
• Discussed adding Waters format to jTraML converter
mzML
• Schema remains stable with no needed changes
• Many updates to the CV continue
• imzML continues to be aligned with mzML
• The mz5 format was recently published as an mzML clone that uses HDF5 instead of XML
• Discussions ongoing by metabolomics groups about how mzML would meet the needs of the metabolomics community. Some terms already added. Expect some more proposals soon.
• File size continues to be a significant problem…
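To make the file-size point concrete, here is a minimal sketch of how a numeric array is commonly stored in an mzML binary data array: packed as little-endian 64-bit floats, optionally zlib-compressed, then base64-encoded. The function names and the synthetic m/z values are assumptions made for illustration only; this is not taken from any PSI reference implementation.

```python
import base64
import struct
import zlib


def encode_binary_array(values, compress=True):
    """Pack floats as little-endian float64, optionally zlib-compress, then base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    payload = zlib.compress(raw) if compress else raw
    return base64.b64encode(payload).decode("ascii")


def decode_binary_array(text, compressed=True):
    """Reverse encode_binary_array: base64-decode, optionally inflate, unpack float64."""
    payload = base64.b64decode(text)
    raw = zlib.decompress(payload) if compressed else payload
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Synthetic, evenly spaced m/z axis (made-up values, not real instrument data).
mz = [400.0 + 0.001 * i for i in range(5000)]
plain = encode_binary_array(mz, compress=False)
packed = encode_binary_array(mz, compress=True)
print("base64 only:", len(plain), "characters")
print("zlib + base64:", len(packed), "characters")
```

Base64 alone inflates the 40,000 raw bytes by about one third, which is part of why profile-mode files grow so quickly.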
mzML (or other) compression
• Andy Dowsey & Faviel Gonzalez Galarza presented their work on compression
• mz5 claims 50%+ space savings by using HDF5
• Implementation discussed vs. alternate HDF5 implementations
• But significant space savings were achieved via tricks that could also work in mzML
• Discussed other work and proposed work in mass-spec aware compression
• Possibility: a mass-spec aware “mszip” compression scheme as an alternative to the currently supported internal zlib compression, provided as a simple, open-source routine available in many languages
• Possibility: develop a variant of zlib that creates files that can be uncompressed normally, but also allows indexing into the compressed file
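The last possibility can be approximated with stock zlib. The sketch below is an assumption about how such a scheme might work, not the design discussed at the meeting: each chunk is followed by a `Z_FULL_FLUSH`, which byte-aligns the output and resets the compressor history, so a reader can start inflating at any recorded chunk offset. The helper names `compress_indexed` and `read_chunk` are hypothetical.

```python
import zlib

ZLIB_HEADER_LEN = 2  # CMF + FLG bytes of a zlib stream without a preset dictionary


def compress_indexed(chunks):
    """Compress chunks into one ordinary zlib stream and record each chunk's start offset."""
    comp = zlib.compressobj()
    out = bytearray()
    starts = []
    for chunk in chunks:
        # The first chunk's data begins right after the 2-byte zlib header.
        starts.append(max(len(out), ZLIB_HEADER_LEN))
        out += comp.compress(chunk)
        # Z_FULL_FLUSH byte-aligns the output and resets the history window,
        # so decoding can restart at the next chunk boundary.
        out += comp.flush(zlib.Z_FULL_FLUSH)
    out += comp.flush()  # finish the stream (final block + Adler-32 trailer)
    return bytes(out), starts


def read_chunk(blob, starts, i):
    """Inflate only chunk i, using raw deflate from its recorded offset."""
    end = starts[i + 1] if i + 1 < len(starts) else len(blob)
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no header expected
    return inflater.decompress(blob[starts[i]:end])


spectra = [b"spectrum-1 " * 50, b"spectrum-2 " * 30, b"spectrum-3 " * 10]
blob, starts = compress_indexed(spectra)
print(read_chunk(blob, starts, 1)[:10])
```

The resulting file still decompresses with a plain `zlib.decompress`, at the cost of slightly worse compression because back-references cannot cross chunk boundaries.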
File compression results (April 2013)
[Chart: file sizes in MB for Orbitrap profile-mode spectra; ~50% compression using mz5]
Compressing mzML (April 2013)
[Chart: SYNAPT data; value labels 57.1%, 50.6%, 36.8%, 45.1%, 43.3%]
Other presentations
• Shin presented on the use of RDF & TogoDB
• Mathias presented about qcML
Ion mobility MS & SWATH-MS
• Discussed with Waters their ion mobility data
• Discussed required terms and practices for encoding raw IMS and peak-picked IMS data. Proposal to be publicized on lists for further comment
• No schema change necessary
RNA-seq assisted proteomics
• Good discussion of the state of the field on this workflow
• Discussed using/promoting the PEFF format as a useful mechanism for encoding some of the RNA-seq results for use by proteomics searches
• Discussed possible need to update MIAPE documents to capture information about what is done in this type of a workflow
Controlled Vocabulary
• It is time to get the vendors to update their instrument and software terms again. Gerhard will repeat the effort done by Luisa years ago.
• Worked to get rid of purgatory branch in CV
• Discussed what to do with multiple SoftwareName:specificTerm entries that are effectively the same concept. Start by grouping similar terms under a common parent
• Discussed constraining some terms with an is_a relationship to a concept like “value between 0 and 1 inclusive”
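The last two ideas, a common parent for equivalent software-specific terms and a value constraint inherited through is_a, can be sketched as follows. Every term ID, term name, and range here is hypothetical and chosen only for illustration; none of them are actual PSI-MS CV entries.

```python
# Toy ontology: all IDs, names, and ranges are hypothetical, not real PSI-MS CV terms.
TERMS = {
    "X:0001": {"name": "value between 0 and 1 inclusive", "is_a": [], "range": (0.0, 1.0)},
    "X:0002": {"name": "search engine probability", "is_a": ["X:0001"]},
    "X:0003": {"name": "ExampleEngine:probability", "is_a": ["X:0002"]},
    "X:0004": {"name": "OtherEngine:probability", "is_a": ["X:0002"]},
    "X:0005": {"name": "ExampleEngine:raw score", "is_a": []},
}


def ancestors(term_id):
    """Return the term itself plus every term reachable through is_a."""
    seen, stack = [], [term_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(TERMS[current]["is_a"])
    return seen


def value_range(term_id):
    """Return the first (low, high) constraint found on the term or its ancestors, else None."""
    for tid in ancestors(term_id):
        if "range" in TERMS[tid]:
            return TERMS[tid]["range"]
    return None


def is_valid(term_id, value):
    """Check a value against the inherited range; unconstrained terms accept anything."""
    bounds = value_range(term_id)
    return bounds is None or bounds[0] <= value <= bounds[1]


def children(parent_id):
    """List terms grouped directly under a common parent."""
    return sorted(t for t, d in TERMS.items() if parent_id in d["is_a"])


print(is_valid("X:0003", 0.97), is_valid("X:0003", 1.2), children("X:0002"))
```

With this layout a validator needs only the constraint on the shared ancestor, and software-specific terms that mean the same thing become visible as siblings.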
PEFF (PSI Extended Fasta Format)
• Interest in finalising the format specification and making it available
• Cannot expect that (most of the) DB providers will produce it in addition to their existing format
• Cannot expect that (most of the) search engines will fully take advantage of its structure (variants, PTMs, …) in the identification jobs
• A converter («source»-to-PEFF) and a reader already exist. Could be a reference implementation
=> Resolve a few minor open issues and finalise the recommendation
MIAPE-MSI
• Document is updated
• Mapping to mzIdentML is validated
• Collection of up-to-date example instance documents ongoing
• Semantic validator for mzIdentML ongoing
=> Prepare submission of version 1.2 to PSI doc process