Large-scale analysis of non-targeted LC-MS metabolomics data with OpenMS in the Compound Discoverer™ platform
Fabian Aicheler¹, Timo Sachsenberg¹, Erhan Kenar¹, Sebastian Kusch², Hans Grensemann², Oliver Kohlbacher¹
¹ Center for Bioinformatics, Tübingen, Germany; ² Thermo Fisher Scientific GmbH, Germany
OVERVIEW
Purpose: Integration of a pipeline for feature identification and quantification into Thermo Fisher Compound Discoverer™ (CD).
Implementation: Performance of a feature detection algorithm was demonstrated on dilution series. An OpenMS [1] pipeline was constructed around this algorithm and wrapped into a CD community node. Pipeline output was incorporated into the CD reporting format.
Results: We announce the first integration of an automated workflow for metabolite quantification into the novel CD platform, providing a community extension that enables the differential analysis of multiple LC-MS runs.
INTRODUCTION
• Label-free quantification of small molecules using LC-MS has become a standard analytical technology.
• Complex LC-MS datasets require automated processing such as mass trace detection and assembly of isotopic traces to features, followed by quantification and identification of compounds.
• Recently, Kenar et al. [2] presented a sensitive feature detection algorithm which results in reproducible metabolite quantification for small molecules.
• CD is a new mass spectrometry analysis platform scheduled for release later this year. Analogous to Proteome Discoverer, which is tailored to proteins, CD is adapted for small molecule analysis. CD has been designed to allow integration of external tools and algorithms as so-called community nodes.
• Our aim was the integration of a metabolic feature identification and quantification pipeline into CD. Besides creating the necessary interfaces, consistent presentation of CD results and their export for straightforward downstream analysis outside of CD were declared goals.
IMPLEMENTATION
CC© Sources: tinyurl.com/lm85xsu, tinyurl.com/ldtvazo.
Figure 1: The OpenMS pipeline is implemented as a community node in CD. Exporting results as tables allows downstream analysis outside CD, for example in KNIME.
The open-source software library OpenMS allows for rapid development of mass spectrometry algorithms and tools. It includes methods for retention time alignment and feature linking [3]. We expanded this toolset with a novel algorithm for feature detection and non-targeted quantification of small molecule LC-MS data. Our OpenMS metabolite quantification workflow was encapsulated in a single CD node. Evaluations of the method by Kenar et al. included human plasma samples with spiked-in metabolites.
Figure 2: Methodology overview of the feature detection algorithm. Panels: (a) mass trace detection, (b) peak separation, (c) hypothesis testing, (d) feature assembly. Axes: retention time (RT), m/z and intensity.
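As a rough illustration of the mass trace detection step named in Figure 2, the following Python sketch links centroided peaks from consecutive scans into mass traces using a ppm tolerance. It is a simplified stand-in under stated assumptions, not the algorithm of Kenar et al. [2] and not the OpenMS API; the function name, the 10 ppm tolerance, the minimum trace length and the toy peaks are all hypothetical.

```python
from typing import List, Tuple

# A peak is (retention time in s, m/z, intensity); this layout is an assumption.
Peak = Tuple[float, float, float]


def detect_mass_traces(scans: List[List[Peak]],
                       ppm: float = 10.0,
                       min_len: int = 3) -> List[List[Peak]]:
    """Greedily extend traces scan by scan with the closest peak within `ppm`."""
    open_traces: List[List[Peak]] = []
    finished: List[List[Peak]] = []
    for scan in scans:
        used = set()          # indices of peaks already assigned in this scan
        still_open = []
        for trace in open_traces:
            last_mz = trace[-1][1]
            tol = last_mz * ppm * 1e-6
            best = None
            for i, peak in enumerate(scan):
                if i in used:
                    continue
                dist = abs(peak[1] - last_mz)
                if dist <= tol and (best is None
                                    or dist < abs(scan[best][1] - last_mz)):
                    best = i
            if best is None:
                finished.append(trace)      # trace ends when no peak continues it
            else:
                used.add(best)
                trace.append(scan[best])
                still_open.append(trace)
        # Peaks that continue no existing trace start a new one.
        for i, peak in enumerate(scan):
            if i not in used:
                still_open.append([peak])
        open_traces = still_open
    finished.extend(open_traces)
    # Discard short traces, which are likely noise.
    return [t for t in finished if len(t) >= min_len]


# Toy data (hypothetical): one compound near m/z 200 and one noise peak.
toy_scans = [
    [(0.0, 200.0000, 1.0e4)],
    [(1.0, 200.0005, 2.0e4), (1.0, 350.0000, 5.0e2)],
    [(2.0, 200.0010, 1.5e4)],
    [(3.0, 200.0008, 8.0e3)],
]
traces = detect_mass_traces(toy_scans)
```

In this toy input the four peaks near m/z 200 form one trace, while the isolated peak at m/z 350 is dropped by the minimum-length filter.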
RESULTS
Figure 3: OpenMS community node in a CD example workflow.
OpenMS results are incorporated into the data processing and visualization capabilities of CD, offering tightly integrated presentation and downstream analysis in the CD software. Restriction to Thermo Fisher instruments allowed optimized parameter choices for the OpenMS algorithms.
Figure 4: Result view of CD with integrated OpenMS results.
In the evaluation of the feature detection algorithm, correlations above 0.98 between feature intensities and corresponding compound concentrations were reported.
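To make the kind of check behind such a correlation value concrete, here is a minimal Python sketch that computes the Pearson correlation between spiked-in concentrations and measured feature intensities across a dilution series. Whether the reported values are Pearson correlations is an assumption, and the concentrations and intensities below are invented for illustration; they are not data from the evaluation.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical dilution series for one metabolite.
concentrations = [1.0, 2.0, 4.0, 8.0, 16.0]            # arbitrary units
intensities = [1.1e5, 1.9e5, 4.2e5, 7.8e5, 1.62e6]      # feature intensities

r = pearson(concentrations, intensities)
```

A value of r close to 1 indicates that the feature intensity scales almost linearly with the compound concentration over the dilution steps.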
Figure 5: Correlations for chosen metabolites in dilution experiments.
To assess the quality of our small molecule detection pipeline, we investigated reproducibility in terms of feature recurrence over multiple measurements. Our integrated feature detection algorithm was compared with XCMS/CAMERA in a dilution series (33 MS runs). A time series of Prazosin metabolism in rats was used to compare our method with the feature detection algorithm provided by CD (6 MS runs).
Found in # samples   OpenMS   XCMS/CAMERA
1-5                  5590     1837
6-9                  591      242
10-13                258      115
14-17                183      78
18-21                124      82
22-25                124      52
26-29                128      52
30-33                744      341

Found in # samples   OpenMS   Component Elucidator
1                    5369     5777
2                    2480     2684
3                    1719     1434
4                    2288     1543
5                    1491     903
6                    1446     1173

Table 1: First table: Reproducible features for OpenMS and XCMS/CAMERA (dilution series, 33 MS runs). Second table: Reproducible features for OpenMS and Component Elucidator (Prazosin time series, 6 MS runs).
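The recurrence counts in Table 1 can be thought of as a histogram over the number of runs in which each linked feature was found. The Python sketch below shows one way to compute such a histogram; it assumes that features have already been linked across runs and are given as lists of run identifiers, which is a hypothetical input format. The bin boundaries follow the dilution series table, and the toy features are invented.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Bins as used for the dilution series in Table 1 (inclusive bounds).
DILUTION_BINS = [(1, 5), (6, 9), (10, 13), (14, 17),
                 (18, 21), (22, 25), (26, 29), (30, 33)]


def recurrence_histogram(
        features: Iterable[Sequence[str]],
        bins: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Count linked features per bin of 'number of runs the feature occurs in'."""
    counts: Counter = Counter()
    for runs in features:
        n_runs = len(set(runs))       # each run counts once per feature
        for low, high in bins:
            if low <= n_runs <= high:
                counts[(low, high)] += 1
                break
    return [(low, high, counts[(low, high)]) for low, high in bins]


# Toy input (hypothetical): three linked features and the runs they occur in.
toy_features = [
    ["run01", "run02"],                          # found in 2 runs
    [f"run{i:02d}" for i in range(1, 8)],        # found in 7 runs
    [f"run{i:02d}" for i in range(1, 34)],       # found in all 33 runs
]
histogram = recurrence_histogram(toy_features, DILUTION_BINS)
```

Features in the highest bin are the ones recovered in nearly every run, which is the sense of reproducibility used in the comparison.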
Figure 6: Overlap of detected features measured in a time series of Prazosin metabolism (Venn diagram of OpenMS and Component Elucidator; values shown in the diagram: 19032, 11385 and 21786).
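An overlap such as the one in Figure 6 requires a rule for deciding when two tools report the same feature. The Python sketch below matches features greedily by m/z (ppm tolerance) and retention time (absolute tolerance). The matching rule, both tolerance values and the toy feature lists are assumptions for illustration; the criteria actually used for Figure 6 are not stated here.

```python
from typing import List, Tuple

Feature = Tuple[float, float]  # (m/z, retention time in s); assumed layout


def feature_overlap(a: List[Feature], b: List[Feature],
                    ppm: float = 10.0,
                    rt_tol: float = 10.0) -> Tuple[int, int, int]:
    """Return (only in a, shared, only in b) using greedy one-to-one matching."""
    matched_b = set()
    shared = 0
    for mz_a, rt_a in a:
        for j, (mz_b, rt_b) in enumerate(b):
            if j in matched_b:
                continue              # each feature in b is matched at most once
            mz_ok = abs(mz_a - mz_b) <= mz_a * ppm * 1e-6
            rt_ok = abs(rt_a - rt_b) <= rt_tol
            if mz_ok and rt_ok:
                matched_b.add(j)
                shared += 1
                break
    return len(a) - shared, shared, len(b) - shared


# Toy feature lists (hypothetical values).
tool_a = [(200.0000, 120.0), (384.1500, 300.0), (512.3000, 450.0)]
tool_b = [(200.0010, 123.0), (512.3020, 455.0), (610.2000, 90.0)]

only_a, both, only_b = feature_overlap(tool_a, tool_b)
```

Greedy matching keeps the sketch short; for dense data a more careful assignment would be needed to avoid order-dependent matches.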
CD results can be exported to tabular file formats, allowing downstream analysis outside of CD. Here we use the Konstanz Information Miner (KNIME) [4]. KNIME supports a multitude of processing modules for cheminformatics, machine learning and statistics. A downstream analysis workflow in KNIME (see Figure 7) allows elaborate analysis of CD results.
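As a small illustration of what downstream analysis of an exported table can look like outside a workflow tool, the Python sketch below reads a comma-separated table and computes a log2 fold change per compound between two sample groups. The column names, the CSV layout and all values are hypothetical; the actual CD export format may differ.

```python
import csv
import io
import math
from typing import Dict, List

# Hypothetical export: one row per compound, one intensity column per run.
EXPORT = """compound,ctrl_1,ctrl_2,treat_1,treat_2
A,100,120,400,480
B,200,220,210,190
"""


def log2_fold_changes(text: str,
                      ctrl_cols: List[str],
                      treat_cols: List[str]) -> Dict[str, float]:
    """Log2 ratio of mean treated intensity to mean control intensity."""
    result = {}
    for row in csv.DictReader(io.StringIO(text)):
        ctrl = sum(float(row[c]) for c in ctrl_cols) / len(ctrl_cols)
        treat = sum(float(row[c]) for c in treat_cols) / len(treat_cols)
        result[row["compound"]] = math.log2(treat / ctrl)
    return result


fold_changes = log2_fold_changes(EXPORT,
                                 ["ctrl_1", "ctrl_2"],
                                 ["treat_1", "treat_2"])
```

In this toy table compound A is fourfold higher in the treated group (log2 fold change of 2), while compound B is nearly unchanged.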
Figure 7: Example analysis in KNIME.
CONCLUSION
• We successfully integrated a robust, sensitive feature quantification method into CD, enabling joint analysis of multiple runs.
• Reduction of parameters and integration into CD significantly improved accessibility of this state-of-the-art metabolite quantification workflow.
• Source code (C#) of our community node will be freely available under an open-source license in parallel with the release of CD.
REFERENCES
[1] Sturm et al. OpenMS - an open-source software framework for mass spectrometry. BMC Bioinformatics, 9:163, 2008.
[2] Kenar et al. Automated label-free quantification of metabolites from liquid chromatography-mass spectrometry data. Mol. Cell. Proteomics, 13(1):348–59, 2014.
[3] Weisser et al. An automated pipeline for high-throughput label-free quantitative proteomics. J. Proteome Res., 12(4):1628–1644, 2013.
[4] Berthold et al. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007). Springer, 2007.