Advanced strategies for Metabolomics Data Analysis

Date post: 10-May-2015
Upload: dmitry-grapov
Description:
Part of a lecture series for the international summer course in metabolomics 2013 (http://metabolomics.ucdavis.edu/courses-and-seminars/courses). Get more material and information here (http://imdevsoftware.wordpress.com/2013/09/08/sessions-in-metabolomics-2013/).
Transcript
Page 1: Advanced strategies for Metabolomics Data Analysis

Advanced Strategies for Metabolomic Data Analysis

Dmitry Grapov, PhD

Metabolomic Data Analysis

Analysis at the Metabolomic Scale

Multivariate Analysis

[Figure: data matrix with samples as rows and variables as columns]

Multivariate Analysis

Simultaneous analysis of many variables:

• Visualization
• Clustering
• Projection
• Modeling
• Networks

Clustering

• Identify patterns, group structure, and relationships
• Evaluate/refine hypotheses
• Reduce complexity

Artist: Chuck Close

Cluster Analysis

Use the concept of similarity/dissimilarity to group a collection of samples or variables.

Approaches:
• hierarchical (HCA)
• non-hierarchical (k-NN, k-means)
• distribution (mixture models)
• density (DBSCAN)
• self-organizing maps (SOM)

[Figure panels: linkage, k-means, distribution, density]
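As a minimal illustration of one non-hierarchical approach, the sketch below implements Lloyd's k-means algorithm in NumPy: each sample is assigned to its nearest centroid, and each centroid is then moved to the mean of its samples, until nothing changes. The function name `kmeans`, the random initialization from existing samples, and the toy data are illustrative assumptions, not the software used in the lecture.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assigning samples to the nearest centroid
    and moving each centroid to the mean of its assigned samples."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct samples (rows) chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep a centroid in place if no samples were assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of samples
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
labels, centroids = kmeans(X, k=2)
```

Note that k must be chosen in advance and the result can depend on the starting centroids, which is one reason hierarchical methods are often used for an initial overview.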

Hierarchical Cluster Analysis

• Similarity/dissimilarity defines “nearness,” or distance

[Figure: distance between points X and Y under Euclidean, Manhattan, Mahalanobis, and non-Euclidean measures]
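The distance measures named above can be sketched in a few lines of NumPy. The helper names are hypothetical; for Mahalanobis distance the covariance matrix would normally be estimated from the data, and here it is simply passed in to show how it rescales the result.

```python
import numpy as np

def euclidean(x, y):
    # Straight-line distance: square root of the summed squared differences
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    # City-block distance: sum of absolute differences along each axis
    return float(np.sum(np.abs(x - y)))

def mahalanobis(x, y, cov):
    # Distance weighted by the inverse covariance, so that high-variance
    # and correlated directions contribute less
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(euclidean(x, y))   # 5.0
print(manhattan(x, y))   # 7.0
# With an identity covariance, Mahalanobis distance equals Euclidean distance
print(mahalanobis(x, y, np.eye(2)))        # 5.0
# If both variables have variance 4, the same difference counts half as much
print(mahalanobis(x, y, 4.0 * np.eye(2)))  # 2.5
```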

Hierarchical Cluster Analysis

The agglomerative (linkage) algorithm defines how points are grouped.

[Figure panels: single, complete, centroid, and average linkage]
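A naive agglomerative procedure makes the role of the linkage explicit: every sample starts as its own cluster, and the two closest clusters are merged repeatedly, where "closest" depends on the linkage rule. This is an illustrative O(n³) sketch with a hypothetical function name, covering single, complete, and average linkage only (centroid linkage is omitted); real analyses would use an optimized library routine.

```python
import numpy as np

def agglomerative(X, n_clusters, linkage="average"):
    """Naive agglomerative clustering: start with one cluster per sample and
    repeatedly merge the two closest clusters until n_clusters remain."""
    # Pairwise Euclidean distances between samples
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = [[i] for i in range(len(X))]

    def cluster_distance(a, b):
        d = D[np.ix_(a, b)]  # all point-to-point distances between two clusters
        if linkage == "single":    # nearest pair of points
            return d.min()
        if linkage == "complete":  # farthest pair of points
            return d.max()
        if linkage == "average":   # mean of all pairwise distances
            return d.mean()
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage rule
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected

    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical toy data: two well-separated groups of samples
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])
for method in ("single", "complete", "average"):
    print(method, agglomerative(X, n_clusters=2, linkage=method))
```

On clearly separated groups all three rules agree; they diverge on elongated or noisy clusters, where single linkage tends to "chain" points together and complete linkage favors compact groups.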

Hierarchical Cluster Analysis (cont.)

[Figure: hierarchical clustering result plotted against a similarity axis]

Hierarchical Cluster Analysis (cont.)

How does my metadata match my data structure?

[Figure panels: overview; confirmation]

Multidimensional Scaling

PLoS ONE 7(11): e48852. doi:10.1371/journal.pone.0048852
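Multidimensional scaling places samples in a low-dimensional space so that their pairwise distances approximate a given dissimilarity matrix. The slide does not state which MDS variant the cited study used; the sketch below assumes classical (Torgerson) metric MDS, and the function name and toy data are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: embed samples so that Euclidean distances
    between the embedded points approximate the distance matrix D."""
    n = D.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11'/n
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # The top eigenvectors of B, scaled by sqrt(eigenvalue), give the coordinates
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # guard tiny negative values
    return eigvecs[:, order] * scale

# Points on a line at 0, 1, and 3: their distances are recovered exactly in 1-D
pts = np.array([[0.0], [1.0], [3.0]])
D = np.abs(pts - pts.T)
coords = classical_mds(D, n_components=1)
```

The recovered coordinates may be shifted or mirrored relative to the originals, since distances do not change under translation or reflection.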

Projection of Data

The algorithm defines the position of the light source, that is, which lower-dimensional “shadow” of the data is shown.

Principal Components Analysis (PCA)
• unsupervised
• maximizes variance (X)

Partial Least Squares Projection to Latent Structures (PLS)
• supervised
• maximizes covariance (Y ~ X)

PCA: Goals

Principal Components (PCs)
• unsupervised
• projections of the data that maximize the variance explained

Results
1. eigenvalues = variance explained
2. scores = new coordinates for samples (rows)
3. loadings = coefficients that define each PC as a linear combination of the original variables (columns)

James X. Li, 2009, VisuMap Tech.
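The three PCA results listed above can all be read off a singular value decomposition of the column-centered data. The sketch below is one standard way to compute them; the function name and the simulated matrix are illustrative assumptions.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via singular value decomposition of the column-centered data.
    Returns eigenvalues (variance explained), scores, and loadings."""
    Xc = X - X.mean(axis=0)                   # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s ** 2 / (X.shape[0] - 1)   # variance along each PC
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates (rows)
    loadings = Vt[:n_components].T            # variable coefficients (columns)
    return eigenvalues[:n_components], scores, loadings

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))                  # simulated: 20 samples x 5 variables
eigenvalues, scores, loadings = pca(X, n_components=2)
# Fraction of the total variance explained by each retained PC
explained = eigenvalues / X.var(axis=0, ddof=1).sum()
```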

Interpreting PCA Results

Variance explained (eigenvalues)

Row (sample) scores and column (variable) loadings

PCA Example

[Figure: PCA of data without scaling or centering; glucose highlighted]

How are scores and loadings related?
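For PCA, one standard way to state the relationship: with a centered data matrix $\mathbf{X}$ ($n$ samples by $p$ variables) and a loading matrix $\mathbf{P}$ ($p$ by $k$) whose columns are orthonormal, the scores $\mathbf{T}$ ($n$ by $k$) are the data projected onto the loadings, and scores times transposed loadings rebuild the data up to a residual $\mathbf{E}$ (which vanishes when all components are kept):

```latex
\mathbf{T} = \mathbf{X}\mathbf{P},
\qquad
t_{ik} = \sum_{j=1}^{p} x_{ij}\, p_{jk},
\qquad
\mathbf{X} = \mathbf{T}\mathbf{P}^{\top} + \mathbf{E}
```

A sample therefore scores high on a component when it has high values for the variables with large positive loadings on that component, which is why score and loading plots are read side by side.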

Centering and Scaling

van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ (2006) Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 7: 142.
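Several of the pretreatments compared by van den Berg et al. (2006) are one-liners once the data are centered. The sketch below shows centering, autoscaling, Pareto scaling, and range scaling as column-wise operations; the helper names and the two-variable matrix are hypothetical.

```python
import numpy as np

def center(X):
    # Subtract each variable's (column's) mean
    return X - X.mean(axis=0)

def autoscale(X):
    # Unit-variance scaling: center, then divide by the standard deviation
    return center(X) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    # Pareto scaling: center, then divide by the square root of the standard deviation
    return center(X) / np.sqrt(X.std(axis=0, ddof=1))

def range_scale(X):
    # Range scaling: center, then divide by the range (max - min)
    return center(X) / (X.max(axis=0) - X.min(axis=0))

# Hypothetical variables measured on very different scales
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0], [4.0, 6000.0]])
Z = autoscale(X)
```

Autoscaling gives every variable equal weight, Pareto scaling only partly shrinks large-variance variables, and range scaling makes every variable span a range of 1.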

Data scaling is very important!

[Figure: PCA after autoscaling (mean-centered, unit variance); glucose (GC/TOF) and glucose (clinical) highlighted]
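A small simulation shows why scaling matters: when the same trend is measured in units with very different magnitudes, the first PC of the unscaled data is dominated by the large-valued variable, while after autoscaling the two measurements load about equally. The data below are simulated, and the two columns only loosely mimic the two glucose measurements in the figure.

```python
import numpy as np

def pc1_loadings(X):
    # First PCA loading vector of the column-centered data (sign is arbitrary)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(0)
signal = rng.normal(size=50)
# Simulated: one trend recorded in two different units, plus an unrelated variable
X = np.column_stack([
    signal * 1.0,                            # e.g. a relative intensity
    signal * 100.0 + rng.normal(size=50),    # e.g. a concentration with large values
    rng.normal(size=50),                     # unrelated variable
])
raw = np.abs(pc1_loadings(X))                # PC1 is almost entirely column 1
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
auto = np.abs(pc1_loadings(Z))               # columns 0 and 1 now load about equally
```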

Use PLS to test a hypothesis

Loadings on the first latent variable (x-axis) can be used to interpret the multivariate changes in metabolites that are correlated with time.

[Figure: PLS model of metabolite changes over time, 0 to 120 min]
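The lecture does not say which PLS algorithm was used. As a minimal sketch, assuming a single response (time), the first PLS latent variable has a closed form: its weight vector is proportional to the covariance between each centered metabolite and centered time, which is the first step of the NIPALS algorithm for PLS1. The function name and simulated data are hypothetical.

```python
import numpy as np

def pls1_first_component(X, y):
    """First latent variable of a PLS1 model (one response, e.g. time).
    The weights give the direction in X with maximal covariance with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)      # unit-length weights
    t = Xc @ w                     # sample scores on the first latent variable
    p = Xc.T @ t / (t @ t)         # X loadings for this latent variable
    return w, t, p

rng = np.random.default_rng(2)
times = np.repeat([0.0, 30.0, 60.0, 120.0], 5)   # hypothetical sampling times (min)
X = rng.normal(size=(20, 4))                     # simulated metabolite levels
X[:, 0] += 0.05 * times                          # metabolite 0 rises with time
X[:, 1] -= 0.05 * times                          # metabolite 1 falls with time
w, t, p = pls1_first_component(X, times)
```

The signs and magnitudes of `w` (or `p`) then indicate which metabolites increase or decrease with time, which is how the first-latent-variable loadings on the slide are read.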

Modeling multifactorial relationships

Dynamic changes among groups (analogous to a two-way ANOVA)
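For reference, the univariate analogue can be sketched as a balanced two-way ANOVA for one metabolite, with group and time as factors. "Dynamic changes among groups" corresponds to the group × time interaction term. This is a hypothetical helper for balanced designs only, with simulated data; it is not the multivariate model shown on the slide.

```python
import numpy as np

def two_way_anova(Y):
    """Sums of squares, degrees of freedom, and F statistics for a balanced
    two-way ANOVA. Y has shape (a, b, r): levels of factor A, levels of factor B, replicates."""
    a, b, r = Y.shape
    grand = Y.mean()
    mean_a = Y.mean(axis=(1, 2))    # factor A (e.g. group) means
    mean_b = Y.mean(axis=(0, 2))    # factor B (e.g. time) means
    cell = Y.mean(axis=2)           # A x B cell means
    ss = {
        "A": b * r * np.sum((mean_a - grand) ** 2),
        "B": a * r * np.sum((mean_b - grand) ** 2),
        "AxB": r * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2),
        "error": np.sum((Y - cell[:, :, None]) ** 2),
    }
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]
    F = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, F

rng = np.random.default_rng(5)
# Simulated metabolite: flat in group 0, rising over three time points in group 1
effects = np.array([[0.0, 0.0, 0.0],
                    [0.0, 0.5, 1.0]])
Y = 10.0 + effects[:, :, None] + rng.normal(scale=0.2, size=(2, 3, 4))
ss, df, F = two_way_anova(Y)
```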

The “goodness” of a model is all about perspective

Estimate the cross-validated error (Q2) and the out-of-sample error (RMSEP), and compare both to those of a random model:

• permutation tests
• training/testing
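A permutation test compares the observed Q2 with the Q2 values of models fit after the response has been randomly shuffled, which destroys any real X–y relationship. In the sketch below an ordinary least-squares model stands in for PLS (an assumption made to keep the example short), and Q2 is computed by leave-one-out cross-validation as 1 − PRESS / total sum of squares. RMSEP would instead be the root-mean-square prediction error on a separate test set.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 = 1 - PRESS / total sum of squares.
    Ordinary least squares stands in for PLS here (assumption)."""
    n = len(y)
    press = 0.0
    for i in range(n):
        train = np.arange(n) != i
        # Fit an intercept plus linear terms on the training samples only
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = coef[0] + X[i] @ coef[1:]
        press += (y[i] - pred) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def permutation_test(X, y, n_perm=200, seed=0):
    """Compare the observed Q2 with Q2 values from models fit to randomly
    permuted responses (random models with no real X-y relationship)."""
    rng = np.random.default_rng(seed)
    observed = q2_loo(X, y)
    null = np.array([q2_loo(X, rng.permutation(y)) for _ in range(n_perm)])
    # Add one to numerator and denominator so the p-value is never exactly zero
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, null, p_value

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)   # simulated response
observed, null, p_value = permutation_test(X, y, n_perm=99)
```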

Biological Interpretation

Projection or mapping of analysis results into a biological context.

• Visualization
• Enrichment
• Networks
  – biochemical
  – structural
  – spectral
  – empirical
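The slide lists enrichment without details. One common approach, assumed here, is over-representation analysis with a one-sided hypergeometric test: given how many measured metabolites belong to a pathway, how surprising is the number of significant metabolites that fall in it? The function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k) for a hypergeometric X:
    N measured metabolites, K of them in the pathway,
    n significant metabolites, k of which fall in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 100 measured, 10 in the pathway, 20 significant, 6 of them in the pathway
p = hypergeom_enrichment_p(k=6, n=20, K=10, N=100)
```

With these counts about 2 pathway hits would be expected by chance, so 6 hits gives a small p-value; in practice the test is repeated for every pathway and the p-values are corrected for multiple testing.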

Ingredients for Network Analysis

1. Determine connections
   • biochemical (substrate/product)
   • chemical similarity
   • spectral similarity
   • empirical dependency (correlation)

2. Determine vertex properties
   • magnitude
   • importance
   • direction
   • relationships

Making Connections Based on Biochemistry

Organism-specific biochemical relationships and information

Multiple-organism databases:
• KEGG
• BioCyc
• Reactome

H:
• HMDB
• SMPDB
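Biochemical connections can be sketched as undirected edges between measured metabolites that appear as substrate and product of the same reaction. The reaction list below is a hypothetical, simplified fragment of glycolysis (cofactors such as ATP omitted); in practice reactions would be retrieved from one of the databases listed above.

```python
from itertools import product

# Hypothetical reaction list: (substrates, products), cofactors omitted
reactions = [
    (["glucose"], ["glucose-6-phosphate"]),
    (["glucose-6-phosphate"], ["fructose-6-phosphate"]),
    (["fructose-6-phosphate"], ["fructose-1,6-bisphosphate"]),
]

def biochemical_edges(reactions, measured):
    """Connect measured metabolites that are substrate and product of the same reaction."""
    edges = set()
    for substrates, products in reactions:
        for s, p in product(substrates, products):
            if s in measured and p in measured and s != p:
                edges.add(tuple(sorted((s, p))))  # store as an undirected edge
    return edges

# Only metabolites that were actually measured become vertices
measured = {"glucose", "glucose-6-phosphate", "fructose-6-phosphate"}
edges = biochemical_edges(reactions, measured)
```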

Biochemical Networks

Making Connections Based on Structural Similarity

• Use structure to generate a molecular fingerprint
• Calculate similarities between metabolites based on their fingerprints
• PubChem service for similarity calculations: http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
• Online tools: http://uranus.fiehnlab.ucdavis.edu:8080/MetaMapp/homePage

BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
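A widely used fingerprint similarity is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The sketch below represents fingerprints as sets of "on" bit positions and keeps pairs above a similarity cutoff. The fingerprints and the 0.5 threshold are hypothetical, and this is not a claim about the exact scoring used by the PubChem service or MetaMapp.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets
    of 'on' bit positions: |A & B| / |A | B|."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprints (bit positions set to 1)
fingerprints = {
    "A": {1, 4, 7, 9},
    "B": {1, 4, 7, 12},
    "C": {2, 3, 5},
}
threshold = 0.5  # hypothetical similarity cutoff for drawing an edge
edges = [(a, b, tanimoto(fingerprints[a], fingerprints[b]))
         for a in fingerprints for b in fingerprints
         if a < b and tanimoto(fingerprints[a], fingerprints[b]) >= threshold]
```

Here A and B share 3 of 5 distinct bits (similarity 0.6) and are connected, while C shares no bits with either.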

Structural Similarity Network

Making Connections Based on Spectral Similarity

Watrous J et al. PNAS 2012;109:E1743-E1752

• Connect molecules based on EI or MS/MS spectral similarity
• Useful for linking annotated (known) analytes to unknowns
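A basic spectral similarity score is the cosine between two spectra after peaks are grouped into m/z bins. This plain binned cosine is a simplification chosen for illustration; the cited molecular networking work uses its own scoring scheme. The function name, bin width, and spectra are hypothetical.

```python
import math

def binned_cosine(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two mass spectra given as {m/z: intensity}.
    Peaks are summed into m/z bins of width bin_width before comparison."""
    def binned(spec):
        bins = {}
        for mz, intensity in spec.items():
            key = round(mz / bin_width)
            bins[key] = bins.get(key, 0.0) + intensity
        return bins

    a, b = binned(spec_a), binned(spec_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())   # shared bins only
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical spectra of an annotated compound and an unknown
known = {73.0: 100.0, 147.1: 40.0, 217.1: 25.0}
unknown = {73.0: 90.0, 147.1: 45.0, 205.1: 30.0}
score = binned_cosine(known, unknown)   # high score: candidate edge known-unknown
```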

Spectral Similarity Network

Watrous J et al. PNAS 2012;109:E1743-E1752

Making Connections Based on Empirical Relationships

• Connect molecules based on the strength of correlation or partial correlation
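The difference between the two options shows up in a chain a → b → c: a and c are strongly correlated, but only through b, so their partial correlation (conditioning on all other variables) is near zero. The sketch computes partial correlations from the inverse of the correlation matrix, pcor_ij = −P_ij / sqrt(P_ii · P_jj). The function name, the 0.5 threshold, and the simulated chain are hypothetical.

```python
import numpy as np

def correlation_network(X, names, threshold=0.5, partial=False):
    """Edges between variables whose |correlation| (or |partial correlation|)
    is at least threshold. Partial correlations come from the inverse
    correlation matrix P: pcor_ij = -P_ij / sqrt(P_ii * P_jj)."""
    R = np.corrcoef(X, rowvar=False)   # variables are columns
    if partial:
        P = np.linalg.inv(R)
        d = np.sqrt(np.diag(P))
        R = -P / np.outer(d, d)
        np.fill_diagonal(R, 1.0)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) >= threshold:
                edges.append((names[i], names[j], float(R[i, j])))
    return edges

rng = np.random.default_rng(4)
a = rng.normal(size=200)
b = a + 0.3 * rng.normal(size=200)   # b depends on a
c = b + 0.3 * rng.normal(size=200)   # c depends on b, and on a only through b
X = np.column_stack([a, b, c])
names = ["a", "b", "c"]
marginal_edges = correlation_network(X, names)              # includes a-c
partial_edges = correlation_network(X, names, partial=True) # a-c link removed
```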

Treatment Effects Network

Metabolites
• Shape = increase/decrease
• Size = importance (loading)
• Color = correlation

Connections
• Red = biochemical relationships
• Violet = structural similarity

Summary

Multivariate analysis is useful for:
• Visualization
• Exploration and overview
• Complexity reduction
• Identification of multidimensional relationships and trends
• Mapping to networks
• Generating holistic summaries of findings

Resources

• Mapping tools (review): Brief Bioinform (2012) doi: 10.1093/bib/bbs055
• Tutorials and examples:
  – http://imdevsoftware.wordpress.com/category/uncategorized/
  – https://github.com/dgrapov/TeachingDemos
• Chemical translation services:
  – CTS (Chemical Translation Service): http://cts.fiehnlab.ucdavis.edu/
    R interface: https://github.com/dgrapov/CTSgetR
  – CIR (Chemical Identifier Resolver): http://cactus.nci.nih.gov/chemical/structure
    R interface: https://github.com/dgrapov/CIRgetR

