Post on 02-Oct-2020
Temesgen Dadi, Julianus Pfeuffer, Alexander Fillbrunn
The Center for Integrative Bioinformatics (CIBI)
SeqAn and OpenMS Integration Workshop
Julianus Pfeuffer, Alexander Fillbrunn
Mass-spectrometry data analysis in KNIME
OpenMS
• OpenMS – an open-source C++ framework for computational mass spectrometry
• Jointly developed at ETH Zürich, FU Berlin, University of Tübingen
• Open source: BSD 3-clause license
• Portable: available on Windows, OSX, Linux
• Vendor-independent: supports all standard formats, and vendor formats through ProteoWizard
• OpenMS TOPP tools – The OpenMS Proteomics Pipeline tools
– Building blocks: One application for each analysis step
– All applications share identical user interfaces
– Uses PSI standard formats
• Can be integrated in various workflow systems
– Galaxy
– WS-PGRADE/gUSE
– KNIME
Kohlbacher et al., Bioinformatics (2007), 23:e191
OpenMS Tools in KNIME
• Wrapping of OpenMS tools in KNIME via GenericKNIMENodes (GKN)
• Every tool writes its CommonToolDescription (CTD) via its command line parser
• GKN generates Java source code for nodes to show up in KNIME
• Wraps C++ executables and provides file handling nodes
Installation of the OpenMS plugin
• Community-contributions update site (stable & trunk)
– Bioinformatics & NGS
• Provides > 180 OpenMS TOPP tools as Community nodes
– SILAC, iTRAQ, TMT, label-free, SWATH, SIP, …
– Search engines: OMSSA, MASCOT, X!TANDEM, MSGFplus, …
– Protein inference: FIDO
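Data Flow in Shotgun Proteomics
[Figure: pipeline from Sample via HPLC/MS to Raw Data, followed by the processing steps Sig. Proc., Data Reduction, Diff. Quant. and Identification. Intermediate data: Peak Data, Maps, Annotated Maps; final result: Differentially Expressed Proteins. Data volumes shown along the pipeline: 100 GB, 1 GB, 50 MB, 50 MB, 50 kB]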
Quantification Strategies
Quantitative Proteomics
• Relative Quantification
– Labeled, in vivo: 14N/15N, SILAC
– Labeled, in vitro: iTRAQ, TMT, 16O/18O
– Label-free: Spectral Counting, MRM, Feature-Based
• Absolute Quantification
– AQUA, SISCAPA
After: Lau et al., Proteomics, 2007, 7, 2787
Quantitative Data – LC-MS Maps
• Spectra are acquired with rates up to dozens per second
• Stacking the spectra yields maps
• Resolution:
– Up to millions of points per spectrum
– Tens of thousands of spectra per LC run
• Huge 2D datasets of up to hundreds of GB per sample
• MS intensity follows the chromatographic concentration
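A rough back-of-the-envelope check shows how the resolution figures above add up to maps of this size. This is only a sketch: the storage layout (one m/z value and one intensity per point, each a 4-byte float, no compression) and the name `map_size_gb` are assumptions for illustration, not taken from the slides.

```python
# Back-of-the-envelope size of one LC-MS map.
# Assumption (not from the slides): each data point is an (m/z, intensity)
# pair stored as two 4-byte floats, without any compression.
BYTES_PER_POINT = 2 * 4


def map_size_gb(points_per_spectrum: int, spectra_per_run: int) -> float:
    """Return the uncompressed size of one LC run in gigabytes (10^9 bytes)."""
    total_points = points_per_spectrum * spectra_per_run
    return total_points * BYTES_PER_POINT / 1e9


# One million points per spectrum and 20,000 spectra per LC run.
size = map_size_gb(1_000_000, 20_000)
print(f"{size:.0f} GB")  # prints "160 GB"
```

With these assumed numbers a single run already reaches the order of magnitude quoted above.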
LC-MS Data (Map)
Quantification (15 nmol/µl, 3x over-expressed, …)
Label-Free Quantification (LFQ)
• Label-free quantification is probably the most natural way of quantifying
– No labeling required, removing further sources of error, no restriction on sample generation, cheap
– Data on different samples acquired in different measurements – higher reproducibility needed
– Manual analysis difficult
– Scales very well with the number of samples, basically no limit, no difference in the analysis between 2 or 100 samples
LFQ – Analysis Strategy
1. Find features in all maps
2. Align maps
3. Link corresponding features
4. Identify features (e.g. GDAFFGMSCK)
5. Quantify (e.g. GDAFFGMSCK at 1.0 : 1.2 : 0.5)
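The final quantification step can be illustrated with a minimal sketch: once a feature has been linked across maps and identified, its intensities are expressed relative to one map. The intensity values and the function name `intensity_ratios` are invented for illustration; they are chosen only so that the result reproduces the ratio shown for GDAFFGMSCK.

```python
def intensity_ratios(intensities, reference_index=0):
    """Express linked feature intensities relative to one reference map."""
    reference = intensities[reference_index]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [round(value / reference, 2) for value in intensities]


# Hypothetical intensities of one linked, identified feature in three maps.
linked_features = {"GDAFFGMSCK": [2.0e6, 2.4e6, 1.0e6]}

for peptide, intensities in linked_features.items():
    ratios = intensity_ratios(intensities)
    print(peptide, " : ".join(str(r) for r in ratios))
```

Real workflows normalize intensities between maps before forming ratios; that step is left out here to keep the sketch short.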
Feature-Based Alignment
• LC-MS maps can contain millions of peaks
• Retention time of peptides and metabolites can shift between
experiments
• In label-free quantification, maps thus need to be aligned in
order to identify corresponding features
• Alignment can be done on the raw maps (where it is usually
called ‘dewarping’) or on already identified features
• The latter is simpler, as it does not require the alignment of
millions of peaks, but just of tens of thousands of features
• Disadvantage: it relies on accurate feature finding
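As an illustration of alignment on features rather than raw peaks, the sketch below fits a linear retention-time transformation from a handful of already matched features. The linear model, the function name `fit_rt_transform` and the example retention times are assumptions for illustration; real alignment tools may use other transformation models.

```python
from statistics import mean


def fit_rt_transform(pairs):
    """Least-squares line rt_ref = slope * rt + intercept.

    `pairs` holds (rt_in_map, rt_in_reference) for features that are
    assumed to be the same analyte in both maps.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two matched features")
    xs = [rt for rt, _ in pairs]
    ys = [rt_ref for _, rt_ref in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("matched features must differ in retention time")
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


# Hypothetical matched features: this map elutes 15 s later than the reference.
pairs = [(115.0, 100.0), (215.0, 200.0), (315.0, 300.0)]
slope, intercept = fit_rt_transform(pairs)
aligned = [slope * rt + intercept for rt, _ in pairs]
print(slope, intercept, aligned)
```

Only a few matched features are needed to estimate the transformation, which is why this is cheaper than aligning millions of raw peaks.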
Feature-Based Alignment
~350,000 peaks
~700 features
Feature Finding
• Identify all peaks belonging to one peptide
• Key idea:
– Identify suspicious regions (e.g. highest peaks)
– Fit a model to that region and identify peaks explained by it
Feature Finding
• Extension: collect all data points close to the seed
• Refinement: remove peaks that are not consistent with the model
• Fit an optimal model for the reduced set of peaks
• Iterate this until no further improvement can be achieved
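The seed, extend, refine and refit loop can be sketched in one dimension, along the retention-time axis of a single mass trace. This is a simplified stand-in, not the OpenMS algorithm: the model is a Gaussian elution profile estimated from intensity-weighted moments instead of an optimal fit, one point is removed per iteration, and all names, parameters and data are invented for illustration.

```python
import math


def fit_gaussian(points):
    """Estimate (center, sigma, height) from intensity-weighted moments."""
    total = sum(i for _, i in points)
    center = sum(rt * i for rt, i in points) / total
    variance = sum(i * (rt - center) ** 2 for rt, i in points) / total
    sigma = max(math.sqrt(variance), 1e-6)  # guard against a zero width
    height = max(i for _, i in points)
    return center, sigma, height


def gaussian(rt, center, sigma, height):
    return height * math.exp(-0.5 * ((rt - center) / sigma) ** 2)


def find_feature(trace, window=10.0, tolerance=0.3, max_iter=20):
    # Seed: the highest peak in the trace.
    seed_rt, _ = max(trace, key=lambda p: p[1])
    # Extension: collect all data points close to the seed.
    region = [p for p in trace if abs(p[0] - seed_rt) <= window]
    for _ in range(max_iter):
        params = fit_gaussian(region)

        def residual(p):
            return abs(p[1] - gaussian(p[0], *params))

        # Refinement: drop the point that fits the model worst, then refit.
        worst = max(region, key=residual)
        if residual(worst) <= tolerance * params[2] or len(region) <= 3:
            break  # no further improvement possible
        region.remove(worst)
    return fit_gaussian(region), region


# Synthetic trace: Gaussian peak at rt = 50 with one interfering point at rt = 58.
trace = [(float(rt), 600.0 if rt == 58 else gaussian(rt, 50.0, 2.0, 1000.0))
         for rt in range(40, 61)]
(center, sigma, height), kept = find_feature(trace)
print(round(center, 2), round(sigma, 2), len(kept))
```

On this synthetic trace the interfering point is discarded in the first iteration, after which the refitted model explains all remaining points and the loop stops.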
Multiple Alignment
[Figure: maps 1, 2, …, k (axes: rt, m/z) are mapped by the transformations T1, T2, …, Tk onto a consensus map]
• Dewarp k maps onto a comparable coordinate system
• Choose one map (usually the one with the largest number of features) as reference map (here: map 2 -> T2 = 1)
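A minimal sketch of the reference choice: the map with the most features becomes the reference and keeps the identity transformation, and every other map is shifted onto it. The constant retention-time offset (median difference over features paired by m/z), the tolerance, the function names and the data are all assumptions for illustration.

```python
from statistics import median


def choose_reference(maps):
    """Pick the map with the largest number of features."""
    return max(maps, key=lambda name: len(maps[name]))


def rt_offset(reference, other, mz_tol=0.01):
    """Median RT difference over features paired by m/z (hypothetical rule)."""
    diffs = []
    for rt_o, mz_o in other:
        close = [rt_r for rt_r, mz_r in reference if abs(mz_r - mz_o) <= mz_tol]
        if close:
            nearest = min(close, key=lambda rt_r: abs(rt_r - rt_o))
            diffs.append(nearest - rt_o)
    return median(diffs) if diffs else 0.0


def align(maps):
    ref_name = choose_reference(maps)
    # The reference keeps the identity transformation (offset 0).
    offsets = {
        name: 0.0 if name == ref_name else rt_offset(maps[ref_name], feats)
        for name, feats in maps.items()
    }
    aligned = {
        name: [(rt + offsets[name], mz) for rt, mz in feats]
        for name, feats in maps.items()
    }
    return ref_name, offsets, aligned


# Hypothetical feature lists as (rt, m/z) pairs.
maps = {
    "map1": [(110.0, 500.25), (210.0, 612.30)],
    "map2": [(100.0, 500.25), (200.0, 612.30), (300.0, 733.41)],
    "map3": [(95.0, 500.25), (295.0, 733.41)],
}
ref_name, offsets, aligned = align(maps)
print(ref_name, offsets)
```

After this step all maps share the retention-time scale of the reference, so corresponding features can be linked into consensus features.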
LFQ with OpenMS in KNIME
• Identification
• Feature finding and mapping
• Map alignment
• Feature linking
• Statistical analysis with R Snippets
• Visualization with KNIME plotting nodes
Preprocessing of single maps
Combining information of maps
Statistical post-processing and visualization