Multivariate NMR analysis of human disease models

University of Calgary PRISM: University of Calgary's Digital Repository Graduate Studies The Vault: Electronic Theses and Dissertations 2013-02-01 Multivariate NMR analysis of human disease models Duggan, Gavin Duggan, G. (2013). Multivariate NMR analysis of human disease models (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/27043 http://hdl.handle.net/11023/540 doctoral thesis University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

“1% inspiration and 99% perspiration”

– Thomas Edison

UNIVERSITY OF CALGARY

Multivariate NMR analysis of human disease models

By

Gavin Duggan

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF BIOLOGICAL SCIENCES

CALGARY, ALBERTA

January, 2013

© Gavin Duggan 2013

ABSTRACT

In the late 1990s, the field of metabolic profiling evolved into metabolomics

following the general move towards systems biology and other omics techniques.

Using sensitive analytical platforms such as NMR, metabolomics aims to gather an

unbiased, broad perspective of the active biochemistry in biofluids. The result was

an explosive growth in the data available to study short-term physiological effects,

followed perforce by the application of multivariate pattern-recognition techniques

to aid in its interpretation.

Given the sensitive and comprehensive nature of the technique, it quickly became

apparent that any number of artifactual or spurious relationships appear in the

results. To alleviate those concerns, a variety of improved experimental designs,

analytical techniques, and validation paradigms can be applied. Starting with a

basic experimental design, the aim of this work is to explore the ability of properly

validated metabolomics to provide useful information about the metabolic shifts

seen in established animal models of insulin resistance, a human disease with

increasing medical significance. Different two-factor experimental designs are used

to refine the results of this early study, validate the resulting hypothesis and

reinforce its interpretation.

Having seen significant differences in ostensibly identical batches of animals in the

first three experiments, further analysis of the differences is performed.

Techniques for comparing batch models, as a form of multivariate hypothesis

validation, are evaluated and the ability of statistical techniques to predict or

ameliorate these “batch effects” is studied. Finally, a rat model of vitamin C

deficiency, another condition with ongoing pathological implications in the third

world, is studied using the same metabolomic techniques. The identified metabolic

shifts are subjected to a complete pathway analysis, the context of which provides a

potentially interesting insight into the regulation of an important human oxidative

damage control mechanism.

ACKNOWLEDGEMENTS

Research is never a solitary venture, and all of the work presented herein was

accomplished with significant assistance. I would like to thank Dr. Vogel and Dr.

Weljie for their ongoing guidance, patience, and encouragement. Additionally, many

collaborators were instrumental to the success of each project.

At every stage, Dr. Jane Shearer has provided pivotal assistance.

The experimental designs for chapters 2, 3, and 4 were developed in whole or in part

with her assistance. Her expertise in animal management was essential to the

acquisition of high quality data, upon which any analytical techniques are

dependent.

The vitamin-C analysis performed in chapter 6 is based on the Gulo -/- mouse

model developed by Dr Frank Jirik. All of the animal handling and fluid acquisition

was performed by Joan Miller and other members of Dr Jirik’s lab.

Over the years, my teachers, friends, and family have provided support and

encouragement without which I could never have started, let alone completed, this

endeavour. I’d like to thank all of them, but most of all Sarah, with whose help a

great many more things are possible, and everything is more enjoyable.

DEDICATION

To the many great teachers I’ve had – most especially

Pierre Poitras

Ruth Benda

Morris Tchir

for their willingness and ability to inspire, motivate, and believe.

TABLE OF CONTENTS

Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Citations for previously published chapters

CHAPTER 1 - INTRODUCTION
1.1 Preamble and Context
1.2 What is metabolomics?
1.3 What is metabolomics used to study?
1.3.1 Disease Models
1.3.2 Epidemiology, toxicology, and in situ pathology
1.3.3 Insulin resistance related disease
1.4 How is metabolomics studied?
1.4.1 Collection
1.4.2 Quantification
1.4.3 Modeling
1.4.4 Interpretation
1.5 Where does metabolomics break down?
1.6 Research Scope

CHAPTER 2 - METABOLOMIC PROFILING OF DIETARY-INDUCED INSULIN RESISTANCE IN THE FAT–FED MOUSE
2.1 Abstract
2.2 Introduction
2.3 Experimental procedures
2.4 Results
2.5 Discussion

CHAPTER 3 - DIFFERENTIATING SHORT- AND LONG-TERM EFFECTS OF DIET IN THE OBESE MOUSE USING 1H-NUCLEAR MAGNETIC RESONANCE METABOLOMICS
3.1 Abstract
3.2 Introduction
3.3 Methods
3.4 Results
3.5 Conclusions

CHAPTER 4 - METABOLOMIC RESPONSE TO EXERCISE TRAINING IN LEAN AND DIET-INDUCED OBESE MICE
4.1 Abstract
4.2 Introduction
4.3 Methods
4.4 Results
4.5 Discussion

CHAPTER 5 - COMPARISON OF MULTIPLE HIGH-FAT MOUSE METABOLOMICS EXPERIMENTS
5.1 Introduction
5.2 Comparison of resulting models
5.2.1 Branched Chain Amino Acids
5.2.2 Aromatic amino acids
5.2.3 Single-carbon amino acids
5.2.4 Carnitine and Ketone bodies
5.2.5 Choline
5.3 Limitations and Omissions
5.4 Estimating batch effects and bias
5.4.2 Variable importance estimators

CHAPTER 6 - METABOLIC PROFILING OF VITAMIN C DEFICIENCY IN GULO−/− MICE USING PROTON NMR SPECTROSCOPY
6.1 Abstract
6.2 Introduction
6.3 Experimental procedures
6.4 Results
6.5 Discussion
6.6 Conclusions

CHAPTER 7 - CONCLUSIONS AND FUTURE WORK
Time Series: The Future of Metabolomics
Closing Thoughts

CHAPTER 8 - BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Positive contributors to multivariate analysis
Table 2.2: Negative contributors to multivariate analysis
Table 3.1: Univariate measures of obesity
Table 4.1: Change in body mass, glucose, NEFA and Insulin levels
Table 4.2: Individual metabolite changes in response to exercise and diet
Table 4.3: Univariate and multivariate significance scores
Table 6.1: Cross validation metrics for model training

LIST OF FIGURES

Figure 1.1: mTOR/AKT pathway diagram
Figure 1.2: Overview of Metabolomics experiments. (Madsen 2011)
Figure 1.3: Examples of NMR and MS spectra
Figure 1.4: Example loading plots
Figure 1.5: Example score plots
Figure 1.6: Example Shared and Unique Structure (SUS) plot
Figure 1.7: KEGG pathway diagram for the TCA cycle
Figure 2.1: Mean glucose infusion rates during a E-H clamp
Figure 2.2: OPLS-DA Scores Plot
Figure 2.3: (VIP) from model 1
Figure 2.4: SUS plot comparing loadings for model 1 vs 2
Figure 3.1: OPLS Results
Figure 4.1: Class separation and comparison of SED vs EX response
Figure 4.2: Significance of changes in metabolite levels
Figure 4.3: mTOR and Akt western blot responses in liver and muscle
Figure 5.1: BCAA mechanism of IR induction proposed by Newgard et al
Figure 5.2: OPLS loadings comparison schema
Figure 5.3: Loadings replication vs significance estimation for mouse replicates
Figure 6.1: Effect of Gulo -/- knockout on mass gain
Figure 6.2: Cross-validation predicted vs actual body mass for batch 1 mice
Figure 6.3: WT vs KO hierarchical OPLS scores
Figure 6.4: External validation results
Figure 6.5: First component OPLS loadings of WT vs KO mice
Figure 6.6: KEGG pathway diagram for serine and glycine metabolism

LIST OF ABBREVIATIONS

1H      Single-proton hydrogen
13C     Carbon-13
DNA     Deoxyribonucleic acid
HSQC    Heteronuclear Single Quantum Coherence, an NMR Spectroscopy technique
KEGG    The Kyoto Encyclopedia of Genes and Genomes
MS      Mass Spectrometry
mTOR    Mammalian Target of Rapamycin protein
NMR     Nuclear Magnetic Resonance Spectroscopy
OPLS    Orthogonal PLS
PCA     Principal Component Analysis
PI3K    Phosphatidylinositol 3-kinase family of enzymes
PLS     Projection to Latent Structures
-DA     Discriminant Analysis
RNA     Ribonucleic Acid
TCA     Tricarboxylic acid cycle

CITATIONS FOR PREVIOUSLY PUBLISHED CHAPTERS

Chapter 2 – (Shearer et al., 2008)

Shearer, J, Duggan, G., Weljie, A., Hittel, D. S., Wasserman, D. H., & Vogel, H. J. (2008). Metabolomic profiling of dietary-induced insulin resistance in the high fat-fed C57BL/6J mouse. Diabetes, Obesity & Metabolism, 10(10), 950–958. doi:10.1111/j.1463-1326.2007.00837.x

Chapter 3 – (Duggan et al., 2011a)

Duggan, G.E., Hittel, D.S., Hughey, C.C., Weljie, A., Vogel, H.J., and Shearer, J. (2011). Differentiating short-and long-term effects of diet in the obese mouse using 1H-nuclear magnetic resonance metabolomics. Diabetes, Obesity and Metabolism 13, 859–862.

Chapter 4 – (Duggan et al., 2011b)

Duggan, G.E., Hittel, D.S., Sensen, C.W., Weljie, A.M., Vogel, H.J., and Shearer, J. (2011b). Metabolomic response to exercise training in lean and diet-induced obese mice. Journal of Applied Physiology 110, 1311–1318.

Chapter 6 – (Duggan et al., 2011c)

Duggan, G.E., Joan Miller, B., Jirik, F.R., and Vogel, H.J. (2011). Metabolic profiling of vitamin C deficiency in Gulo-/- mice using proton NMR spectroscopy. Journal of Biomolecular NMR 49, 165–173.

Chapter 1 - Introduction

1.1 Preamble and Context

Systems biology has been an important renaissance in the analysis of living systems,

human or otherwise. Where the study of individual elements yields insight into their

properties, examining the connections between them gives them context (Aristotle, 1985).

Connections, and the networks they form, represent the true combinatorial

complexity of life, and only by studying the interactions between elements can the

entirety of a process, ergo its function, be elucidated (Oliver et al., 1998; Kell, 2006;

Trewavas, 2006; van der Greef et al., 2007).

Perhaps the most visible manifestation of this revolution has been the proliferation

of -omics technologies. In the last two decades of the previous century, high-

throughput sequencing techniques gave rise to the study of Genomics, with the

promise of completely characterising our biology by studying the genetic blueprint

from which it is constructed (Collins et al., 1998). The focus of genetic

experiments shifted from individual genetic loci, identified by their relationship to

some phenomenon, to an unbiased characterisation of an entire DNA sequence

which could be mined and analyzed a priori (McElheny, 2012). This shift in

perspective was not the precursor of systems biology but rather the harbinger of its

rise to prominence in the scientific and popular awareness; the study of interactions

in biological systems dates back decades (Woodger, 1929; Von Bertalanffy, 1950;

Pauling et al., 1971), if not longer (Descartes, 1643).

Because of the relatively static nature of genetic material, genomics was a natural
first frontier for this form of automated, unbiased analysis of biological state
(Lander et al., 2001). Unfortunately, pure¹ genomics also suffers because of the same
relative intransience of its subject matter: genetic material ostensibly does not
change over the course of an organism’s lifespan. Life is dynamic by definition, not
least because of its inseparable relationship with its environment, and hence it
was desirable to develop techniques for the study of other, more dynamic biological
entities in the same unbiased fashion.

¹ “Pure” genomics refers to the study of exclusively the DNA sequence, which
we now know (in large part due to genomics’ inability to explain many biological
phenomena) represents only some of the information contained within genetic
material, hence the development of epigenomics.

Proteins were the next step in the analysis of ongoing changes in an organism’s
state; whether receptors or enzymes, they are the primary means by which a living
cell recognizes and reacts to changes in its external or internal state. Moreover,
proteins’ relatively large mass, segmental structure, and chemical composition made
them amenable to examination by mass spectrometry: the field of Proteomics was
the result, fast on the heels of genomics (James, 1997). The mechanisms of
production and expression being inextricably linked to the operation of protein
networks within the cell, transcriptomics followed almost immediately by
leveraging microarray technology to analyse RNA molecules in parallel (Clark et
al., 2002; Bertone et al., 2004).

Like genomics though, protein-centric -omics failed to capture many temporal facets
of an organism’s state (Vitzthum et al., 2005; Mai et al., 2011; Zhu et al., 2011).
Protein expression being an energetically expensive, tightly regulated, and
physically constrained process, the time scale on which protein levels change is still
relatively slow compared to many physiological and environmental changes (Bino
et al., 2004; Scalbert et al., 2009). The smaller, chemical-scale molecules with
which proteins interact represent the largest source of energy, entropy, and (as a
combination of the two) dynamic information content in a cell (Bino et al., 2004;
van der Greef et al., 2007).

Small molecule metabolites act as long distance, concentration-based signals and

effectors of change within and outside the cell. They turn over on much faster

time scales within a cell, making them a closer indicator of environmental

response, and changes in small-molecule concentrations can affect the activity level

of enzymes in a cell via mechanisms such as allosteric modification or classical

Michaelis-Menten enzyme kinetics (Raamsdonk et al., 2001; Fiehn, 2002; Goodacre

et al., 2004; Kell, 2006). As a result, metabolites are both products of, and

information conduits within, protein function networks, which makes them

extremely informative targets of investigation.
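
As a rough illustration of the Michaelis-Menten relationship mentioned above, the sketch below computes the reaction rate v = Vmax[S] / (Km + [S]) for a few substrate concentrations. The function name and the parameter values are hypothetical and chosen only for illustration; they are not taken from any experiment described in this thesis.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Return the Michaelis-Menten rate v = v_max * [S] / (k_m + [S]).

    substrate: substrate concentration [S] (same units as k_m)
    v_max:     maximum rate at saturating substrate
    k_m:       Michaelis constant, the [S] at which v = v_max / 2
    """
    if substrate < 0 or k_m <= 0:
        raise ValueError("substrate must be >= 0 and k_m must be > 0")
    return v_max * substrate / (k_m + substrate)


# Illustrative values only (arbitrary units): the rate responds almost
# linearly to [S] well below k_m and saturates towards v_max well above it.
for s in (0.1, 1.0, 10.0):
    rate = michaelis_menten_rate(s, v_max=1.0, k_m=1.0)
    print(f"[S] = {s:5.1f}  ->  v = {rate:.3f}")
```

The point of the sketch is only that enzyme activity is a direct function of metabolite concentration, so a shift in a small-molecule pool changes flux without any change in protein abundance.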

Metabolites are also relatively easy to collect. Whether they’re exported from the

cell, the result of membrane-local processes, or released from the cellular membrane

by experimental permeation, small molecules provide an accessible window into the

state of, or interactions between, biological processes (Fiehn, 2002). The ability of a

single protein molecule to catalyse changes in numerous metabolites enhances this

accessibility further by decreasing the need for sensitivity; when multiple proteins

form a metabolic cascade as in the case of insulin signalling, the result is an

exponential growth in the metabolic information content (Fell, 1997).

As such, it’s no coincidence that studies of metabolism are some of the oldest forms of

biological investigation, a mainstay of medical diagnosis and an important (arguably

indispensable) level at which to study systems biology (Bollard et al., 2005;

Nicholson, 2006). Historically though, the only means available for quantifying the

presence or concentration of molecular entities was with highly specific assays

dependent on their chemical properties (Fell, 1997). Starting in the early 1980s, early

systems biology efforts used NMR to fish potential biomarkers from serum samples

using e.g. lipid peak widths (Gadian and Radda, 1981; Bottomley, 1985;

Dawidowicz, 1987; Cerdan et al., 1990). As with genomics and proteomics, the

inflection point in the advancement of metabolic analysis was the advent of

technical platforms which can segregate and quantify the concentrations of multiple

metabolites simultaneously from a single sample (Gadian and Radda, 1981). With

the adoption of sufficiently high-resolution instrumentation, it became possible to

deconstruct a single sample’s composite spectrum into components (Nicholson and

Wilson, 1987; Fiehn, 2002). By analyzing the resulting concentrations in parallel,

and without a priori knowledge of their significance to a particular phenomenon, the

field of metabolomics was conceived.

Within a decade, metabolomics showed great promise in agriculture (Dixon et al.,

2006), disease research (Griffin, 2006), nutrition (Zivkovic and German, 2009),

medical diagnostics (Vitzthum et al., 2005), and pharmaceutical development (Frank

and Hargreaves, 2003; Malet-Martino and Holzgrabe, 2010; Robertson and Reily,

2012), among other fields. On its own or as part of a larger systems biology

approach, the ability to capture a comprehensive metabolic snapshot from biofluids

was extremely enticing. Once a standard methodology was established, it was

applied to a vast array of diseases or biological conditions.

Like any nascent field of study (arguably any active field) however, metabolomics is

not without its faults and limitations (Ransohoff and Feinstein, 1978). There have

been notable failures in experiments’ accuracy (Bantscheff et al., 2007; Zhu et al.,

2011), and numerous studies purporting to yield breakthrough insights have failed to

have useful implications. The same automated collection of large data sets which

allowed for the shift in focus from individual metabolites to comprehensive

metabolic state also overwhelmed classical analysis tools. Instrumental procedures

required refinement to ensure the data was properly annotated (Brown et al., 2005),

and novel statistical techniques had to be developed or adapted to the inherent

character of metabolomics data. Biological interpretation of the resulting numerical

inferences has proven difficult and has often been left unfinished (Robertson and

Reily, 2012).

Perhaps most importantly, the exploratory nature of metabolomics methods and the

ability to uncover potential patterns in data presents a strong temptation to label

interesting hypotheses as unequivocal conclusions or game-changing insights. The

same trait which makes metabolomics so valuable, its ability to examine the

interconnected gestalt of a biological organism, also makes it difficult to segregate

any one element.

As a result, there is perhaps no such thing as a “simple” metabolomics study and

strongly meaningful conclusions are difficult to come by (Nicholls, 2012). For the

past decade, there has been, and continues to be, a need to properly develop, refine,

and reinforce metabolomics methods in order to make them more robust and

meaningful.

1.2 What is metabolomics?

Varied implementations, definitions, and labels notwithstanding, the term

metabolomics has had a reasonably consistent core meaning in the decade since its

inception (Nicholson and Wilson, 1987; Oliver et al., 1998). Ostensibly unbiased

selection of targets has always been the basis for the -omic nomenclature, and other

facets stem from the interaction of goals and technical limitations. For the purposes

of this thesis, the process of analyzing samples can be broken down into four steps

which, while varied in their implementation, are consistent in all metabolomics

methodologies.

Small molecule focus. While the study of metabolic-scale small molecules is

advantageous for a number of reasons, the technical limitations of both NMR and MS

instruments are an unavoidable motivator. In either case, the maximum structural

complexity which can be uniquely deconvolved from a complex spectrum of

multiple known compounds is far smaller than what can be identified in an

unknown but pure sample. Both platforms center on the detection of molecular

elements, returning spectra whose features correspond to the presence of a particular

molecular substructure. The physical limitations of mass/charge ratios and

fragmentation (in the MS paradigm), or spin relaxation times (in the NMR

paradigm), are dependent on increasing but finite instrumental limits. As a result,

while “metabolites” in the technical sense may comprise a larger range of molecular

entities, the focus of “metabolomics” is usually restricted to molecules below a

fairly low molecular weight and structural complexity, usually only a few hundred

g/mol.

Nevertheless, exceptions to this principle do occur, when investigators trade off

specificity for the ability to handle larger or more complicated metabolic

mixtures/identities. Examples include the field of lipidomics, which analyses lipid

concentrations in a manner analogous to metabolomics’ perspective of smaller

molecules. Notwithstanding the order of magnitude difference in size,

metabolomics and lipidomics are often referred to collectively because of the similar

techniques used.

Unbiased selection. Within the scope of metabolic entities, the essence of systems

biology is the goal of characterising as much as possible of the system before

identifying which elements are relevant to a particular problem. Regardless of

platform or processing technique, the pivotal aspect is that data is not excluded at

acquisition, but only later, by topical analysis.

Simultaneous Acquisition. An experiment which measured numerous metabolites

individually and then combined them into a single dataset for analysis could be

considered “metabolomics”, but due to practical constraints this is almost never the

case. The number of molecular species which need to be captured for the

experiment to be considered unbiased precludes anything but parallel, simultaneous

acquisition using one of the instruments outlined below.

Numerical analysis of concentrations. The true value of high-throughput

metabolite capture is the structured, numerical format of the data. The size of the

resulting matrix dictates the application of both statistical analysis and pattern

recognition techniques in order to draw biological conclusions. Biological states

being highly dynamic, it is the consistent differences (labelled “variance”) between

time points or treatments which usually convey the necessary information;

biological processes being stochastic in nature and varied in circumstance, context-

free analysis of concentrations is much harder to translate into practical application.

Taken together, these four aspects represent the commonality between almost all

metabolomics experimentation, and the definition given above. Beyond that,

metabolomics has seen a wide variety of implementations. Even the nomenclature

has been fragmented, with groups attempting first to modify the terminology, and

then trying to spin off sub-fields below the useful linguistic resolution. Taken as a

whole, however, metabolomics has produced a remarkably expansive body of

insight into numerous problems.

1.3 What is metabolomics used to study?

Given that changes in small-molecule metabolism are a major response mechanism

of living organisms at every scale, it’s unsurprising that efforts have been made to

apply metabolomics to a wide range of problems and perspectives.

1.3.1 Disease Models

Bacteria are easy to manipulate and devoid of ethical concerns, leading to numerous

studies of bacterial metabolism (Mashego et al., 2007). Gram-negative bacteria

have been particularly interesting because of their tendency to express

numerous metabolites into their growth medium, resulting in a metabolic “footprint”,

in contrast to the “fingerprint” obtained by extracting intracellular state. From a

pharmacological perspective, metabolomics has been used to study both established

and potential antibiotic agents for both efficacy and method of action. Identification

of bacterial gene functionality by knockout (Idle and Gonzalez, 2007) has helped

annotation efforts, and analysis of flux facilitated by 13C labelling is an important

bacterial platform (Zamboni and Sauer, 2009). Interactions between gut microflora

and human metabolism have been highlighted by numerous studies (Nicholls et al.,

2003; Wikoff et al., 2009) as both common and potentially important for human

health. Where bacterial culture interacts pathologically with mammalian biology,

metabolomics has provided potential for both diagnostic and functional insight

(Stringer et al., 2011).

Moving up the evolutionary ladder, yeast, plant, and animal species have been

studied extensively via metabolomics. Agricultural biotechnologies were among the

first common applications for metabolomics (Hall et al., 2002; Dixon et al., 2006)

and continue to be integrated with other systems biology to improve crop science

(Sakurai et al., 2011). Both plant and microbial metabolomics have been

leveraged to study extracted and expressed natural products (Oksman-Caldentey and

Inzé, 2004). Animal health studies have been conducted to improve livestock,

husbandry, and ecological conditions (Moore, Kirwan, Doherty, & Whitfield, n.d.).

1.3.2 Epidemiology, toxicology, and in situ pathology

Of course, non-invasive metabolic profiling has the highest

potential in areas related to human health. By applying metabolomics methods,

studies of nutrition have supplemented work in agriculture to examine the human

effects of energy intake, vitamin deficiency, nutritional supplements, dietary

composition, etc. In the realm of translational drug research, pharmacodynamics is

a high-value target which is difficult to evaluate. Systematic factors like efficacy,

off-target effects, and dosing calibration can be facilitated by whole-body metabolic

profiling (Kaddurah-Daouk et al., 2008). These studies of drug dynamics are

dependent to a degree on characterisation of the recipient’s disease state, which has

been a primary focus of metabolomics since its inception.

Regardless of the application, characterising disease state can be broken down,

under various nomenclature, into diagnostic and explanatory studies. The

dichotomy is artificial but pragmatic, with the former indicating a desire solely to

differentiate between outcomes, and the latter focusing on the ostensible

biochemical mechanism. Modern clinical practice being primarily a mapping of

disease to established treatment options, diagnostic power is held in high regard in

those circles. Developing treatments, understanding of disease, and other

epidemiological advances hinge on explaining the underlying mechanisms of

disease, however. Black box diagnostic tools can be developed without assigning

biochemical function to metabolic shifts, and indeed can be useful for tracking

disease progress or physiological response to interventions (pharmacological (Malet-Martino

and Holzgrabe, 2010), environmental (Cartee et al., 1989; Boulé et al.,

2001; Miccheli et al., 2009), etc.) (Madsen et al., 2010), but are still a dead end

vis-à-vis future development.

As such the potential mechanistic insights bear consideration in the design of an

experiment and selection of analytical tools. Whether or not the practitioners

understood the implications at the biochemical level, metabolism has always been

one of the most important levels at which pathology is understood, studied, and

diagnosed. Genetic diseases which disrupt molecular machinery were among the

earliest conditions studied by targeted metabolic profiling (Wilcken et al., 2003)

because of the direct links. The technique has since been applied to many

conditions, some only indirectly related to metabolism, including almost every entry

in the WHO’s top causes of human mortality: cardiovascular disease (Stamler et al.,

1993; Bonora et al., 1996; Roussel et al., 2007), Alzheimer’s (Barba et al., 2008;

Wood, 2012), cancer (Abate-Shen and Shen, 2009; Fan et al., 2009; Bathe et al.,

2011), diabetes (Bain et al., 2009; Lanza et al., 2010), and HIV (Pendyala et al.,

2007; Ghannoum et al., 2011). Even psychiatric and neurological conditions, whose

connections to systematic biochemistry are inherently difficult to quantify, “are

linked to disturbances in metabolic pathways related to neurotransmitter systems

(dopamine, serotonin, GABA and glutamate); fatty acids such as arachidonic acid-

cascade; oxidative stress and mitochondrial function.” (Quinones and Kaddurah-

Daouk, 2009)

As a class of diseases, cancer is probably the most metabolically studied condition

in the past three decades, because of its systematic effects, growth-related nature,

pathological complexity, population prevalence, and available funding (Carroll and

Kritchevsky, 1994; Abate-Shen and Shen, 2009; Chan et al., 2009; Bathe et al.,

2011; Mai et al., 2011). Cancerous cells undergo a variety of metabolic shifts,

depending on the origin, location, and genetic causes of the growth. The glycolytic

upregulation known as the Warburg effect is almost universal (Weljie and Jirik;

Vander Heiden et al., 2009), while other changes accompanying angiogenesis (Li,

2000), dysregulation of apoptosis, nutrient scavenging, and oxidative stress vary

depending on cancer type. As a result, metabolomics has been applied with varying

degrees of success to the understanding and/or diagnosis of pancreatic (Bathe et al.,

2011), prostate (Abate-Shen and Shen, 2009), breast (Claudino et al., 2007;

Giskeødegård et al., 2010), oral (Tiziani et al., 2009), lung (Hilario et al., 2003),

kidney (Kind et al., 2007; Kim et al., 2009a), and brain cancers (Griffin and

Kauppinen, 2007) among others (Chung and Griffiths, 2008; Kim et al., 2008;

Spratlin et al., 2009).

1.3.3 Insulin resistance related disease

Type II Diabetes Mellitus, and the related phenomena of insulin resistance,

metabolic syndrome, and obesity, are among the most studied health conditions in

the field (Griffin and Nicholls, 2006; Sébédio et al., 2009). Ostensibly caused by a

developed resistance to insulin which causes dysregulation of sugar levels in the

body, diabetes exhibits crippling pathology which is eventually fatal (Brownlee and

Cerami, 1981). Worse, occurrences of metabolic syndrome are growing

exponentially in North America. Increases are partially due to an aging population,

but the prevalence in children is increasing along with childhood obesity. After

onset of acute diabetes, treatment requires a life-long regimen of insulin to prevent

nervous, coronary, renal, optical, hematologic, and skeletal (Heath et al., 1980;

Laakso et al., 1992; Bouillon et al., 1995; Kelley et al., 2002; Wong et al., 2007;

Wang et al., 2011) failures, among others.

Figure 1.1: mTOR/AKT pathway diagram

The mammalian target of Rapamycin pathway, showing relationships

between PI3K, mTOR and AKT actors believed to be involved in diabetic

pathology. Figure from Wikimedia commons, created by Charles Betz and

released under Creative Commons License v3.0.

Mechanistically, acquired insulin resistance has a complicated etiology which has

impeded discovery of better treatments or cures. Changing insulin sensitivity in

muscle and fat cells causes chronic hyperglycaemia, but sugar levels are

reciprocally a causal factor. Known mechanisms include overloading mTOR or

PI3K effectors, but non-enzymatic chemical activities such as Schiff-base

glycosylation have also been identified as part of downstream diabetic pathology,

and postulated as part of the causal mechanism (Brownlee and Cerami, 1981). Such a

process would elude any protein or expression based analysis, but could influence

amino acid and sugar metabolism in a noticeable way. Complicating any analysis is

the influence of genetic and environmental factors on susceptibility, with both

variable and interacting second-order sensitivities (Christian and Stewart, 2010).

Insulin related conditions have therefore been a worthwhile target for analysis with

metabolomics. Overall nutrition and dietary health, both inextricably linked to

metabolic syndrome, have since been studied in this manner (Zivkovic and German,

2009), along with the various causes of obesity (Griffin, 2006). Urine from

established diabetic cases has also been the focus of NMR analysis (Jankevics et al.,

2009; Lanza et al., 2010), but the extended and unpredictable development of

insulin resistance makes direct examination of its onset difficult. In place of

prohibitively large human cohorts, effective animal models for insulin resistance

allow for more stringent controls and an induced onset. The C57BL/6 mouse strain

offers genetic homogeneity, human analogue metabolism, and strongly inducible

insulin resistance using a high-fat diet, making it an ideal model for the study of

insulin related disease.

All told, metabolomics can be and has been used to study a wide variety of diseases,

human health conditions, and industrial applications. Direct and indirect influences

on metabolism can both be detected, with almost any significant change in biology

being expressed in the biofluid metabolome of a living organism. The field, however, is

not without limitations and failures.

1.4 How is metabolomics studied?

Notwithstanding the variety of definitions and implementations, most metabolomics

experiments follow a standard methodology which can be broken down into stages

based (to first approximation) on the physical location in which they’re conducted.

While the nomenclature defined here is not universal, the approach is derived from

the work of Douglas Kell and made popular by a number of academic labs.

Organizations such as the Metabolomics Society and NIH special interest groups

have established standard operating procedures to facilitate deployment and

consistency (Fiehn et al., 2007; Goodacre et al., 2007), but every stage provides

opportunity for improvement and innovation.

Figure 1.2: Overview of Metabolomics experiments.

[Flow diagram of the four experimental stages, with separate Mass Spectrometry and NMR tracks during quantification.]

Collection: selection, manipulation, harvesting, stabilization (clotting, freezing, filtration).

Quantification (Mass Spectrometry): extraction, derivatization, acquisition, alignment, baseline adjustment, peak identification.

Quantification (NMR): pH adjustment, buffering, acquisition, baselining, peak picking/deconvolution.

Modelling: dilution normalization, scaling, outlier detection, multivariate/univariate model construction, model optimization, model validation.

Interpretation: variable selection and model reconstruction, variable importance, variable identity confirmation, pathway analysis, hypothesis generation.

1.4.1 Collection

Collection of samples is, nominally, the same as for any experiment in metabolism.

Samples are manipulated or selected from some larger population in such a way as

to focus on the results of some treatment of interest.

While solid state experiments have been performed, fluid samples make up the vast

majority of studies because of the simplicity in their handling and acquisition with

either NMR or MS instrumentation. Serum and urine are the most common

collection media for vertebrate biology because of their relative ease of

acquisition, but tissue extracts and more exotic samples such as amniotic or

cerebrospinal fluid have been used (Dunne et al., 2005). Plant and bacterial extracts

have also yielded significant insights, with the latter (Bock, 1982) providing both

fingerprint (cellular lysate) and footprint (media content) analysis.

Regardless of fluid extraction method, the selection or manipulation of samples to

optimize the information content is critical (Jonsson et al., 2004, 2005; Broadhurst

and Kell, 2006). The complexity and variance inherent to biological systems

implies that each sample provides limited perspective, a problem which is

exacerbated by the large number of variables (metabolites) examined (Hastie et al.,

2009). In higher order systems, such as mammalian biology, many factors such as

environmental inconsistency, genetic inhomogeneity, and experimental handling all

introduce confounding, undesirable, unrelated, or unexpected variance (“noise”)

which must be filtered out using replicates and experimental design (Dumas et al.,

2006; Dunn et al., 2011).

Information content is maximized when samples are selected or manipulated such

that the only sources of variation present are those of interest. Doing so is

exceptionally difficult in humans, because of extreme inconsistencies in diet,

exercise, genetics, and disease and health (Bijlsma et al., 2006; Scalbert et al., 2009).

Every factor affects metabolite levels either directly, as a result of uptake in

nutrients, indirectly via biological response, or both. Among the established

epidemiological methods for coping with human inconsistencies, several are directly

applicable to metabolomics:

Experimental control structures

When the selective application of treatment is not ethical, for example, structured

sample cohorts control variance by selecting samples from a larger pool in a

balanced way with respect to both primary attributes (e.g. disease state) and

unrelated variables like age, race, weight, etc. Similar samples are then arranged

into blocks which can be contrasted in order to identify and eliminate the effect of

those nuisance factors (Addelman, 1969). Further, blocking has application to

both hypothesis-generation and biomarker validation studies, although the latter brings

with it other issues of sample size and logistics (Pepe et al., 2008).

Longitudinal studies use each individual as its own control, sampling biological

state from each both before and after one or more treatments of interest, or along

some time series. The differential of each sample in the time dimension then

becomes the scope of the experiment, with consistent change of a sample relative to

time zero rejecting a null hypothesis. Of note, longitudinal studies can only

compensate for any variance between individuals prior to the beginning of the

experiment. Differences in response between individuals will still appear as

inconsistencies in the final result.
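
The longitudinal design described above can be sketched numerically. The following is a minimal illustration, assuming a single metabolite measured in the same individuals at time zero and after treatment; the function name and the concentrations are invented for the example and are not data from this work. It computes each individual's change relative to time zero and the corresponding paired t statistic.

```python
import math
import statistics

def paired_t_statistic(baseline, followup):
    """Paired t statistic for per-individual change relative to time zero."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need matched samples from at least two individuals")
    # Each individual serves as its own control: only the within-individual
    # difference enters the test, so pre-existing offsets cancel out.
    deltas = [after - before for before, after in zip(baseline, followup)]
    mean_delta = statistics.mean(deltas)
    sd_delta = statistics.stdev(deltas)  # sample standard deviation (n - 1)
    if sd_delta == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    return mean_delta / (sd_delta / math.sqrt(len(deltas)))

# Hypothetical concentrations (arbitrary units) for four individuals.
time_zero = [10.0, 20.0, 30.0, 40.0]
after_treatment = [12.0, 21.0, 33.0, 42.0]
t_value = paired_t_statistic(time_zero, after_treatment)
```

In this toy case the four-fold spread in starting concentrations does not affect the statistic, whereas the spread of the individual responses (changes of 1 to 3 units) does, which mirrors the limitation noted above.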

Careful use of animal models of human biology is particularly powerful in

metabolomics. The ability to control or manipulate sample biology using diet,

medication, genetic alteration, etc. means smaller intra-class noise and higher inter-

class signal. Care must be taken when using models though, as the extreme

sensitivity of the method again makes it possible to detect many artifactual changes;

even different strains have been shown to differ in metabolomic profile (Griffin,

2006).

Crossover studies are a form of longitudinal study, frequently employed in

mammalian contexts, which provide evidence for the causal nature of a change in

metabolism by switching the treatment status of some samples midway through the

course of the experiment. Typically, half of the control samples switch to undergo

treatment or induction of disease, while half of the “affected” samples are switched

to a control protocol. The three-point time-course controls for many factors and

shows more clearly how the manipulated variable is responsible for the change in

response.

Of all the factors which can be controlled in animal models, gut bacteria are

among the most important (Nicholls et al., 2003; Wikoff et al., 2009). Mice can be

raised in a sterile environment if necessary, which ostensibly eliminates

gastrointestinal microbial influences, but normal laboratory conditions quickly

induce bacterial effects. Whether such factors are significant to a disease varies, but

they have a clear impact on serum or urine metabolome regardless (Nicholls et al.,

2003). Given that, adjustments or controls for gut culture differences, such as

cohabitation acclimatization periods, standardized diets, and/or sterile handling, are

worth considering as both pros and cons of any animal model study (Bell et al.,

1991; Barnard et al., 2009; Wikoff et al., 2009).

1.4.2 Quantification

As with any experiment, translation of real-world biological samples into numerical

data requires appropriate instrumentation. Because of their ability to (A)

differentiate metabolites based on single-atom chemical structure differences, and (B)

provide quantitative[2] measurements of each metabolite’s presence in solution, two

instrumental paradigms are commonly applied:

Nuclear Magnetic Resonance (NMR) quantifies metabolites on the basis of their

component atoms’ spin state reaction to RF fields in a strong magnetic field (Canet,

1996). After Fourier transformation into the frequency domain, NMR analysis

of a pure compound yields a quantized spectrum (Figure 1.3a) with peaks whose

frequency shift relative to a standard reflects the structural identity of an atom

therein.

[2] The term quantitation or quantification is used in some metabolomics literature to refer to a combination of metabolite identification and concentration measurement with linear, homoscedastic accuracy. From an information theory perspective, however, any measurement of physical properties and its capture in a numerical matrix (as opposed to qualitative descriptors) is a quantification process. It is this numerical capture, regardless of any metabolite identification which precedes or follows it, which is referred to here as quantification: the second fundamental and necessary step in all metabolomics analysis.

Figure 1.3: Examples of NMR and MS spectra

The popularity of NMR stems from several factors. The fact that NMR probe

hardware never comes into contact with, or interacts chemically with, samples improves

the reproducibility of the technique. Moreover, because the RF and magnetic fields

in an ideal NMR probe are homogeneous throughout the sample, every molecule in

the sample undergoes approximately the same process, and exhibits the same

response. The result is highly coherent signal, absent artifacts resulting from

interaction with the matrix or other molecules as a result of the instrument: the

response (in each peak) is linear with atomic concentration (e.g. proton

concentration in 1H NMR) and strictly additive in peak height, even across multiple

molecular species which overlap in the spectrum (Nicholson and Wilson, 1989;

Nicholson et al., 1995; Canet, 1996). In addition to yielding extremely consistent

data, these properties allow the use of known patterns to deconvolve a spectrum of

multiple compounds resulting from complex mixtures such as biofluids.
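
Because the response is linear and additive, deconvolving overlapping compounds can be posed as a least-squares fit of known reference patterns to the measured mixture. The sketch below is illustrative only: it assumes two hypothetical unit-concentration reference patterns sampled on a shared axis, and the function name is invented for the example rather than taken from any software used in this work.

```python
def fit_two_components(mixture, ref_a, ref_b):
    """Least-squares (c_a, c_b) such that mixture ~ c_a*ref_a + c_b*ref_b."""
    # Normal equations for the 2x2 system; valid because the NMR response
    # is assumed linear in concentration and additive across species.
    aa = sum(x * x for x in ref_a)
    bb = sum(y * y for y in ref_b)
    ab = sum(x * y for x, y in zip(ref_a, ref_b))
    am = sum(x * m for x, m in zip(ref_a, mixture))
    bm = sum(y * m for y, m in zip(ref_b, mixture))
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("reference patterns are collinear; cannot separate them")
    c_a = (am * bb - bm * ab) / det
    c_b = (bm * aa - am * ab) / det
    return c_a, c_b

# Hypothetical reference patterns on a 6-point axis; they overlap at index 2.
ref_a = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
ref_b = [0.0, 0.0, 0.5, 1.0, 0.2, 0.0]
# Synthetic mixture built from 3 units of compound A and 2 units of compound B.
mixture = [3.0 * a + 2.0 * b for a, b in zip(ref_a, ref_b)]
c_a, c_b = fit_two_components(mixture, ref_a, ref_b)
```

The fit recovers both concentrations despite the overlapping point, which would not be possible if peak heights combined non-linearly.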

The quantum-mechanical details of NMR which yield these properties are usually

abstracted for the purposes of metabolomics, but remain important insofar as they

dictate instrumental allocation and properties of the resulting data: central to both is

the square-root relationship between NMR sensitivity (signal-to-noise ratio) and

time spent scanning each sample (Canet, 1996). Because only atoms with non-zero

quantum spin are detectable by NMR, proton (1H) resonance spectra comprise the

vast majority of metabolomics experiments; 1H’s high abundance in natural

(non-labelled) metabolites leads to high sensitivity and hence low required acquisition

times. Of the other elements available for NMR detection, 13C NMR has been

performed (Fan et al., 2009; Shaykhutdinov et al., 2009; Zamboni and Sauer, 2009),

but its low natural abundance leads to low sensitivity, so its utility is mostly

restricted to 2D 1H-13C experiments whose purpose is to identify unknown spectral

elements (Duggan et al., 2011c).
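
The trade-off between sensitivity and acquisition time can be made concrete with the standard signal-averaging result, in which signal-to-noise ratio grows with the square root of the number of co-added scans. The following sketch assumes uncorrelated noise and identical scans; the function names are illustrative only.

```python
import math

def relative_snr(n_scans, reference_scans=1):
    """SNR gain from signal averaging, relative to `reference_scans` scans."""
    # Coherent signal grows with n; uncorrelated noise grows with sqrt(n).
    return math.sqrt(n_scans / reference_scans)

def scans_needed(snr_gain, reference_scans=1):
    """Scans required to improve SNR by `snr_gain` over the reference."""
    return math.ceil(reference_scans * snr_gain ** 2)

gain = relative_snr(64, reference_scans=16)      # quadrupling scans doubles SNR
cost = scans_needed(10)                          # a ten-fold gain costs 100x the scans
```

Read in the other direction, the instrument time needed per sample rises with the square of the desired sensitivity, which is why low-abundance nuclei are costly to acquire.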

Specificity in NMR, in the form of accurate spectral-peak-to-metabolite assignment,

stems from the strength of the magnet used (Nicholson and Wilson, 1989). A well

maintained and shimmed 400 MHz magnet (that is, a magnet with a sufficiently

powerful field to cause 1H nuclear spins to precess at 400 MHz) can generate

sufficiently narrow spectral peaks to differentiate numerous compounds, but the 600

MHz magnet used for this work was much closer to the norm, and a necessary

baseline against which to demonstrate improvements (Nicholson et al., 1995;

Duggan et al., 2011c). With proper compensation for water signal and shimming to

optimize peak shape, concentrations as low as 1-2 µM were clearly discernible for

up to one hundred metabolites in practical circumstances.
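
The relationship between a magnet's quoted 1H frequency and its field strength follows from the Larmor relation, frequency = (gamma / 2 pi) x B0. As a rough sketch, assuming the approximate 1H value of 42.577 MHz per tesla (the constant and function names below are introduced for illustration), the conversion and the resulting spread of a fixed chemical-shift interval can be computed as follows.

```python
GAMMA_1H_MHZ_PER_TESLA = 42.577  # 1H gyromagnetic ratio / (2*pi), approximate

def field_strength_tesla(proton_frequency_mhz):
    """Static field B0 (tesla) at which 1H nuclei precess at the given frequency."""
    return proton_frequency_mhz / GAMMA_1H_MHZ_PER_TESLA

def ppm_to_hz(shift_ppm, proton_frequency_mhz):
    """Width in Hz of a chemical-shift interval at a given spectrometer frequency."""
    # 1 ppm of an f MHz carrier is f Hz.
    return shift_ppm * proton_frequency_mhz

b0_400 = field_strength_tesla(400.0)   # roughly 9.4 T
b0_600 = field_strength_tesla(600.0)   # roughly 14.1 T
sep_400 = ppm_to_hz(0.01, 400.0)       # 0.01 ppm separation in Hz at 400 MHz
sep_600 = ppm_to_hz(0.01, 600.0)       # the same separation in Hz at 600 MHz
```

Two peaks a fixed number of ppm apart are separated by more hertz at higher field, which is one reason a stronger magnet makes neighbouring resonances easier to distinguish.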

NMR-based analysis was the primary analytical platform used in work for this

thesis. For comparison, two other technologies used for the quantification of

metabolomics samples are Mass Spectrometry (MS), and Fourier Transform

Infrared spectroscopy (FT-IR). Of the two, the former is far more common due to

its increased sensitivity, up to one hundred fold above that of NMR, and is usually

coupled to some form of chromatographic separation in the form of Gas

Chromatography (GC), Liquid Chromatography (LC), etc. (Weckwerth and

Morgenthal, 2005). The GC- or LC- component of the hybridized instrument serves

to increase the accuracy and resolution of the ensuing MS instrumental readings

(Dunn et al., 2011). By segregating the contents of a sample on the basis of some

physical property, the chromatographic step allows MS to be applied in serial to

some smaller subset of the metabolites therein, while still ensuring coverage of the

entire molecular range.

Mass spectrometry itself operates by modifying the charge state of molecules in the

analyte, then separating and identifying the resulting ions on the basis of their mass

(a magnetic field exerting a constant force, and hence variable acceleration on them

within the detector). Like NMR, the result is a spectrum (Figure 1.3b) of peaks

separated by chemical identity and proportional in height to concentration. With

MS however, the relationship is not linear or additive, owing to frequent chemical

interactions between ions, the sample matrix, and/or the instrument (Pigini et al.,

2006; Bantscheff et al., 2007). Separation of metabolites on some orthogonal basis

prior to MS analysis notwithstanding, the result is data which is more sensitive but

less quantitatively reliable. Internal standards and better separation using more

precise detectors have been employed to eliminate such errors, which are not

necessarily as obvious as mis-assignment or changes in mean accuracy but can

include more insidious artefacts such as heteroskedastic or inconsistent sensitivity,

the bane of statistical analysis.

Optical/vibrational spectroscopies such as FT-IR, Near Infrared (NIR) or Raman

Spectroscopy have also been employed, with the significant advantage that IR-based

instrumentation is often orders of magnitude less expensive than either MS or NMR

based platforms. While this renders them attractive as an end-point application

(Dunn and Ellis, 2005) for some sample types, the absorbance of IR by water makes

the normal aqueous biofluid samples less amenable to full range (FT) IR analysis.

Limiting the spectrum to NIR analysis has yielded some improvement in the

analysis of aqueous samples when the experiment is focused on single compounds

with dominating concentrations (Lafrance et al., 2004). In general however, IR

based techniques indicate only the presence of gross structural features somewhere

in molecular structure, hence cannot differentiate between e.g. structural isomers or

identify low-concentration metabolites, which limits their use as a wide-perspective

platform (McGovern et al., 2002; Kaderbhai et al., 2003; Lafrance et al., 2004).

Regardless of the instrument used, sample preparation specific to the instrument is

performed prior to spectral analysis (Lu et al., 2008; Dunn et al., 2011). Steps

include removal of unrelated macromolecules such as proteins and lipids, chemical

derivatization to render metabolites volatile, and/or protection from bacterial decay.

The variations, based on experimental goals, are an important example of the many

sources of error and inconsistency inherent to the complex sample processing

pipeline required. Inconsistent sample preparation and handling can result in “batch

effects”, which differentiate samples based on acquisition date, instrument, analyst

or other irrelevant factor (Villas-Bôas et al., 2007; Alam et al., 2009; Zelena et al.,

2009; Draisma et al., 2010). Because of the wide scope and high specificity of the

methods used, subtle variations in batch processing are sometimes captured.

Metabolomics has been shown to be at least as reproducible as other –omics methods

(Dumas et al., 2006), but the presence of batch effects is something which must be

considered, notwithstanding standard operating procedures (SOPs) and other

attempts to achieve consistent acquisition (Villas-Bôas et al., 2007; Alam et al.,

2009). Given the complexity of the biological systems in question, it would seem

unavoidable that refining instrument sensitivity and sample processing results

in other inconsistencies becoming more prominent.

As such, the character of measurements taken and the nature of errors are influential

on the resulting analysis. Neither NMR nor MS can be treated as a black box which

outputs absolute concentrations without regard to instrumental artefacts.

Spectral processing

Two important steps for dealing with procedural artefacts are variable normalization

and assignment (Broadhurst and Kell, 2006; Goodacre et al., 2007). Either could be

considered part of the modeling phase, but the focus on compensating for

differences between reality and quantified data suggests they are better considered

pre-modeling steps.

Normalization, collectively, refers here to an array of pre-processing techniques

designed to justify assumptions used in the statistical modeling phase, in particular

regarding the distribution of variance in metabolite concentrations (Craig et al.,

2006; Sysi-Aho et al., 2007; Torgrip et al., 2008; Warrack et al., 2009). Separate

from the scaling and mean centering performed as part of the statistical analysis,

whose goal is to make each variable numerically comparable, normalization

addresses the unavoidable differences in quantification of samples.

Differences in sample dilution are the most common quantification artefact, which

can result in scale differences across all measured metabolites (Craig et al., 2006;

Torgrip et al., 2008). To compensate, the measured values for each sample are

usually divided by some value based on an estimation of that sample’s “dilution

factor” (DF) (Craig et al., 2006; Dieterle et al., 2006). An obvious but relatively

unavoidable caveat of this approach is that differences in biofluid concentration can

occur in vivo, especially in fluids such as urine.

Nevertheless, various DF estimation algorithms may result in better final results.

Historically, the total concentration of all metabolites in a sample was used as a

simple estimation of its DF, with some manual curation to mitigate the effects of

dynamic range (such as removing glucose, lactate, or other high-concentration

metabolites) (Weljie et al., 2006). More recently, other algorithms have been

proposed that obviate the need for manual interference or awkward threshold

effects. Normalization relative to an average spectrum, rank transforms, internal

standards, or scaling the effect of each variable on DF estimation by its variance

across spectra have all been applied in the past (Dieterle et al., 2006; Sysi-Aho et al.,

2007); other techniques drawn from e.g. microarray analysis have been adapted and

may prove valuable (Li and Wong, 2001; Bolstad et al., 2003; Shurubor et al.,

2005).
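
The difference between these dilution-factor estimates can be illustrated with a small sketch. It assumes each sample is a list of measured values on a common set of variables, and it contrasts a total-sum estimate with a median-of-quotients estimate taken against a reference spectrum. The latter is a simplified illustration of normalizing relative to a reference, not a reproduction of any cited algorithm; all names and values are hypothetical.

```python
import statistics

def total_sum_df(spectrum):
    """Dilution factor estimated as the total of all measured values."""
    return sum(spectrum)

def median_quotient_df(spectrum, reference):
    """Dilution factor estimated as the median ratio to a reference spectrum."""
    quotients = [s / r for s, r in zip(spectrum, reference) if r > 0]
    if not quotients:
        raise ValueError("reference has no positive values to compare against")
    return statistics.median(quotients)

def normalize(spectrum, dilution_factor):
    """Divide every variable in one sample by that sample's dilution factor."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return [value / dilution_factor for value in spectrum]

# Hypothetical 5-variable profiles. `sample` is the reference diluted two-fold,
# except that the fourth variable (a dominant metabolite) is strongly elevated.
reference = [10.0, 20.0, 30.0, 40.0, 50.0]
sample = [5.0, 10.0, 15.0, 200.0, 25.0]

df_total = total_sum_df(sample) / total_sum_df(reference)  # skewed by the outlier
df_quotient = median_quotient_df(sample, reference)        # unaffected by it
restored = normalize(sample, df_quotient)
```

Here the total-sum estimate suggests the sample is more concentrated than the reference, while the median quotient recovers the two-fold dilution without manually excluding the dominant variable.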

Other sources of error which NMR quantification can introduce include baseline

distortions, peak shifts, and metastable peaks. Spectrum-wide phenomena are

handled by subtracting a spline fit to the baseline (Weljie et al., 2006). Metastable

peaks, like urea, and peak shifts caused by salts/pH inconsistencies in the sample,

are part of the larger problem of assigning chemical identities to measured variables:

As a consequence of analog-to-digital processing, after Fourier transformation NMR

spectral peaks are each captured in an unknown number of variables (Nicholson and

Wilson, 1989). The same phenomenon occurs in MS data, regardless of

chromatographic separation, but for different reasons. As a result, the exact number

of data points per peak varies across the spectrum based on peak width, spectral

resolution, and alignment between the two. Peak overlaps, peak shifts, and

occasionally variable peak widths exacerbate the problem of aggregating and

attributing data points to specific resonance peaks, which is necessary before

deconvolution of peaks (and multiplets) into chemical identities is possible (Åberg

et al., 2009). Unlike normalization, some assignment tasks can be performed after

statistical modeling (i.e. only for variables identified as significant), but others (such

as dealing with peak shifts) cannot, so it remains primarily an issue of accurate

spectrum-wide processing. Proposed approaches to simplify assignment include

binning (Davis et al., 2007; Anderson et al., 2008, 2010), targeted profiling (Weljie

et al., 2006), and correlation based analysis (Cloarec et al., 2005; Cho et al., 2008).

By summing all of the points within fixed regions of the chemical shift dimension

(either evenly spaced bins, or buckets believed to segregate metabolites based on a

priori knowledge), binning reduces the number of variables, which facilitates both

modeling and assignment (Beckwith-Hall et al., 1998; Davis et al., 2007; Anderson

et al., 2010). Drawbacks include boundary effects, which introduce errors when one

or more windows incompletely cover a spectral peak. The unpredictable shift in

peaks necessitates human curation, or some kind of automated peak detection

algorithm. While peak detection is relatively simple at high signal-to-noise ratios, it

becomes increasingly difficult as concentrations (and peaks) get smaller; peak

boundaries become intractable and assignment quality suffers.
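
The binning procedure and its boundary effect can be sketched as follows. This is a minimal example assuming evenly spaced 0.04 ppm bins and a single synthetic Gaussian peak; the bin width, peak shape, and function name are illustrative assumptions.

```python
import numpy as np

def bin_spectrum(ppm, intensity, edges):
    """Sum intensities falling in each [edges[i], edges[i+1]) window."""
    idx = np.digitize(ppm, edges) - 1           # bin index per data point
    binned = np.zeros(len(edges) - 1)
    inside = (idx >= 0) & (idx < len(binned))   # drop points outside the range
    np.add.at(binned, idx[inside], intensity[inside])
    return binned

# Hypothetical spectrum: one Gaussian peak centred in the middle of a bin.
ppm = np.linspace(0.0, 2.0, 2001)
peak = np.exp(-0.5 * ((ppm - 1.02) / 0.01) ** 2)

edges = np.arange(0.0, 2.0 + 1e-9, 0.04)        # evenly spaced 0.04 ppm bins
binned = bin_spectrum(ppm, peak, edges)

# Boundary effect: a small shift (e.g. from pH) moves the peak onto a bin edge,
# so its area is split between two neighbouring variables.
shifted = np.exp(-0.5 * ((ppm - 1.04) / 0.01) ** 2)
binned_shifted = bin_spectrum(ppm, shifted, edges)
```

The total area is preserved in both cases, but the shifted peak no longer maps onto a single variable, which is why peak shifts require curation or peak detection before binning.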

More sophisticated approaches to assigning spectral elements are based on the

correlation between peaks stemming from the same molecule, an important

implication of the aforementioned linear relationship between peak height and

molecular concentration. Targeted Profiling is a manual assignment strategy which

will be discussed in depth as it pertains to individual applications (Weljie et al.,

2006). Developed by Chenomx Inc, it depends heavily on software libraries to

inform the user of known correspondences between spectral peaks and particular

metabolites. The primary advantage of such an approach is that the additive

property of NMR spectra is leveraged to deconvolve overlapping peaks, recovering

more metabolite-specific concentration information. The result is a much simplified

dataset with theoretically minimal loss of information.

Other correlation-based assignment strategies include statistical total correlation

spectroscopy (STOCSY) (Cloarec et al., 2005), which simplifies spectra by filtering

the spectral autocorrelation matrix before applying automated peak detection or

presenting the same data for manual assignment.
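
The core idea behind correlation-based assignment can be sketched as a correlation of every spectral variable against a chosen "driver" variable: variables arising from the same molecule should correlate strongly across samples. The example below is a simplified sketch with synthetic data; the 0.9 threshold and the variable layout are arbitrary assumptions, and the published STOCSY method includes further steps not shown here.

```python
import numpy as np

def stocsy(X, driver):
    """Correlate every variable with the chosen driver variable.

    X is samples x variables; returns one Pearson r per variable.
    """
    Xc = X - X.mean(axis=0)
    d = Xc[:, driver]
    cov = Xc.T @ d
    norm = np.sqrt((Xc ** 2).sum(axis=0) * (d ** 2).sum())
    return cov / norm

rng = np.random.default_rng(0)
n_samples = 30
conc_a = rng.uniform(1.0, 5.0, n_samples)   # hypothetical metabolite A
conc_b = rng.uniform(1.0, 5.0, n_samples)   # hypothetical metabolite B

# Variables 0 and 2 are peaks of A; variable 1 is a peak of B.
X = np.column_stack([conc_a, conc_b, 0.5 * conc_a])
X += rng.normal(0.0, 0.01, X.shape)          # small measurement noise

r = stocsy(X, driver=0)
same_molecule = np.where(r > 0.9)[0]         # threshold is an arbitrary choice
```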

1.4.3 Modeling

Having collected samples and quantified the contents of each into a vector (usually

proportional to concentration values, with some error characteristic of the

instrument used for quantitation), numerical analysis techniques are required to

draw meaningful inferences about samples, and from there the population.

Regardless of which approach is used to analyse sample variations, certain terminology

and steps are consistent. A training set of samples is selected from among the

samples collected, and used to train a model of the variation seen, with the hope that

the distribution of each variable (metabolite concentrations) in the training set will

reflect its overall distribution in the greater populations (Trygg et al., 2007; Hastie et

al., 2009). The structure of the model is optimized using a testing dataset of some

kind, and then its accuracy must be confirmed on validation data before its utility

can be ascertained.
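
The partition into training, testing, and validation sets can be sketched as a random three-way split of sample indices. The 60/20/20 proportions and the function name below are assumptions chosen for illustration; in practice the proportions depend on the number of samples available.

```python
import numpy as np

def three_way_split(n_samples, frac_train=0.6, frac_test=0.2, seed=0):
    """Shuffle sample indices and partition into train/test/validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(frac_train * n_samples))
    n_test = int(round(frac_test * n_samples))
    train = order[:n_train]                       # used to fit the model
    test = order[n_train:n_train + n_test]        # used to tune its structure
    validation = order[n_train + n_test:]         # held back to confirm accuracy
    return train, test, validation

train_idx, test_idx, val_idx = three_way_split(50)
```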

Training a metabolomics model can occur in a number of ways, but the most general

schema differentiates between supervised and unsupervised techniques (Goodacre et

al., 2004; Hastie et al., 2009) – semi-supervised learning techniques have seen little

application, if any, in metabolomics. Unsupervised analysis looks to model the

most prevalent sources of variation within the population, while supervised analysis

looks for specific patterns. Patterns of interest may be those related to a response

measure such as change in bodyweight, or a manipulated property of each sample

such as dosage. Other experiments look for the difference between discrete sample

classes, in which case the experiment is referred to as a discriminant or

classification problem (Hastie et al., 2009). In all cases, a supervisory variable of

some kind is used to inform the model training process what patterns to prioritize.

Because successful quantification often results in dozens, if not hundreds or

thousands of variables, multivariate considerations are significant to any

metabolomics study (Broadhurst and Kell, 2006). Application of univariate

techniques with adjustment for multiple hypothesis testing can avoid some

problems, but quantitation of NMR spectra provides no guarantee of linear

independence between metabolite concentrations (in fact, metabolites are often co-

linear for both biological and instrumental reasons). Thus, consideration of

metabolites on an individual basis discards important context for each, and falls prey

to Bellman’s curse of dimensionality (Broadhurst and Kell, 2006).
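
For reference, one common form of adjustment for multiple hypothesis testing is the Benjamini-Hochberg false discovery rate procedure. The source does not name a specific procedure, so its use here is an assumption, and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank meeting its threshold
        rejected[order[:k + 1]] = True
    return rejected

# Hypothetical per-metabolite p-values from univariate tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
rejected = benjamini_hochberg(p_values)
```

Four of these p-values fall below an unadjusted 0.05 threshold, but only the two smallest survive the adjustment. The adjustment treats each metabolite separately and does nothing to account for co-linearity between them.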

A number of analytical approaches have been applied to metabolomics data in the

past, drawn from various machine learning, signal processing, and chemometric

disciplines (Holmes and Antti, 2002; Goodacre et al., 2004). The related techniques

of probabilistic neural networks and Support Vector Machines have been applied

separately to the problems of assignment and biological modeling (Truong et al.,

2004; Xu et al., 2006; Mahadevan et al., 2008; Giskeødegård et al., 2010); however,

they suffer from the problem that the resulting model structure is difficult to

interpret by mapping back to the original biological context. As stated by Madsen

et al.:

For non-linear methods it is often impossible to interpret the model

biologically, since statistical significance of model coefficient size

and its direction may not be directly related to the importance of the

corresponding variables for the model building. In our experience,

we have yet to discover an example where non-linear methods

indeed provide superior classification results (e.g. in an independent

follow-up study and not by cross-validation testing) compared to

linear methods to justify this significant disadvantage in biological

interpretation (Madsen et al., 2010)

Tree based hierarchical clustering, rule based inference, soft independent modelling

of class analogies, ANOVA-simultaneous component analysis (Smilde et al., 2005),

and genetic algorithms (Gilbert et al., 1997; Hageman et al., 2008; Koza et al.,

2008) have also been applied to both problems, but suffer from overfitting when the

number of samples is small – as previously mentioned, metabolomics

instrumentation can usually quantify several hundred metabolites or more, so the

number of samples present is usually much smaller than the number of variables.

The exception is agricultural, bacterial, or plant metabolomics, where replicates

are easier to come by and population variance smaller (Catchpole et al., 2005;

Fukusaki and Kobayashi, 2005; Tikunov et al., 2005). Markov-chain based

modeling, especially as a fitting tool for Bayesian estimators, has been applied to

the spectral processing correspondence problem (Kim et al., 2006; Rubtsov and

Griffin, 2007), but very little work has been done on its application to metabolic

interpretation; the best examples build on top of other techniques and as yet still

suffer from interpretation issues (Gavai, 2009; Mcateer et al., 2009; Franken et al.,

2012; Krumsiek et al., 2012).

Projection-based methods are a family of analytical techniques which handle both

smaller sample sizes and numerous co-linear variables well (Hastie et al., 2009). As

a result, they have been the most popular and successful tools employed to analyse

metabolomics (Trygg et al., 2007). Projection techniques identify weighted linear

combinations of variables (hyperplanes) of maximal covariation within the Hilbert

space defined by the measured concentrations. Doing so intrinsically identifies so-

called latent variables: underlying combinations of variables which drive the

information content of the experiment. In projection based methods, each latent

variable is attributed to a perpendicular eigenvector of maximal variation, and hence

maximal information content. Among other things, this helps to solve the problem

of highly co-linear variables. The result is the identification of patterns of consistent

co-variation between metabolites which simplifies the difference between samples,

and aids both identification and application of the model.

Two projection-based techniques, principal component analysis (PCA) and

projection to latent structures (PLS), constitute the vast majority of the analysis

done in metabolomics, especially as it applies to human health science (Trygg et al.,

2007; Madsen et al., 2010). PCA and PLS correspond approximately to the

unsupervised and supervised versions of the same algorithm, and share a number of

facets in both data preparation and results interpretation:

Analysis with projection-based models begins by concatenation of the samples'

concentration vectors into a matrix, at which point the importance of consistent

treatment of samples in earlier steps becomes obvious. The coherence implied by

samples’ row-wise unification is only true to the extent that quantified

measurements are either accurate to a biological comparison or can be rendered

comparable using some mathematical equilibration. In either case, exploratory data

analysis or strong assumptions must be used to justify the use of projection

techniques because of their dependence on (at least) a center-weighted distribution

for each variable (Hastie et al., 2009).

Before the resulting data matrix can be used to train a model of the sample variance,

a number of “column-wise” adjustments are required. Mean centering and scaling

to a comparable measured second moment (i.e. variance; either unit variance or, in

some cases, unit root variance “Pareto” scaling) are almost universal because they

render variables comparable within the framework of a normal probability

distribution. Where necessary, additional transforms such as logarithm or cube root

transforms can further coerce the data into an approximately normal distribution

(Trygg et al., 2007).
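
The column-wise adjustments described above can be sketched as follows. This is a minimal example assuming a samples-by-metabolites matrix; the function name and toy values are illustrative assumptions. Unit variance scaling divides each centred column by its standard deviation, whereas Pareto scaling divides by the square root of the standard deviation.

```python
import numpy as np

def scale_columns(X, method="uv"):
    """Mean-centre each column, then scale by a power of its std. dev.

    method="uv"     -> divide by the standard deviation (unit variance)
    method="pareto" -> divide by the square root of the standard deviation
    method="none"   -> mean-centre only
    """
    centred = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    if method == "uv":
        return centred / sd
    if method == "pareto":
        return centred / np.sqrt(sd)
    if method == "none":
        return centred
    raise ValueError(f"unknown scaling method: {method}")

# Hypothetical matrix: 4 samples x 2 metabolites on very different scales.
X = np.array([[100.0, 1.0],
              [200.0, 2.0],
              [300.0, 3.0],
              [400.0, 4.0]])

X_uv = scale_columns(X, "uv")
X_pareto = scale_columns(X, "pareto")
```

After Pareto scaling the high-concentration metabolite still carries more variance than the low-concentration one, but far less disproportionately than in the raw data.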

One final point of note is an inversion of the nomenclature with regard to dependent

and independent variables in a PCA or PLS model. The nature of metabolomics

experiments is that sample class or treatment is manipulated and the effects on

metabolite levels measured, making metabolites dependent variables. In both PLS

and PCA analysis however, the metabolite concentrations are usually labelled X

variables, and the sample treatment is the response (Y) variable when used

(Umetrics, 2006).

Principal Component Analysis (PCA)

PCA is a relatively simple form of factor analysis which uses no supervisory

variables: it identifies patterns in the measured metabolite data without regard to

descriptive sample characteristics (Wold et al., 1987). Each component is an

orthogonal hyperplane, also referred to as a dimension, which represents a dominant

or recurring pattern of correlated variables in the X-matrix. The structured variation

in the samples is then estimated using the normal vectors as an orthogonal basis of

“loadings”, each a set of correlated changes orthogonal to every other component.

Figure 1.4: Example loading plots

1D and 2D plots showing the loadings, for one and two components,

representing patterns of correlated change seen in the samples.

Using the orthogonal loadings as basis vectors, the difference of each spectrum from

the mean can be reconstructed uniquely using a weighted combination.

Reconstruction weights comprise scalar “scores” in each dimension, with separation

in each score dimension representing modeled differences between samples.

[Figure 1.4 plot area: a 1D loading plot of w*c[1] against variable (metabolite) ID, and a 2D loading plot of w*c[1] against w*c[2]; generated with SIMCA-P+ 12.0.1]

Figure 1.5: Example score plots

One- and two-dimensional score plots, representing the quantity of the first (and

second) component pattern found in each sample. Legend: Control

(Dark Blue) vs Treated (Light Blue)

Total decomposition of a dataset into principal components will yield the same

number of dimensions as X variables. Dimensions are ranked according to the

percentage of total X-variable variance captured, but not every component

contributes significantly to biological variation (Wold et al., 1987; Umetrics, 2006;

Hastie et al., 2009). Selection of only statistically significant dimensions,

representing non-coincidental variation, is an important step described below. If the

number of significant components is less than the number of variables (as it always

is for biological data), the residual error resulting from incomplete reconstruction of

each spectrum represents the data reduction power (alternately, loss) of the model.
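
The relationship between loadings, scores, explained variance, and residual error can be sketched with a singular value decomposition of the mean-centred data matrix. This is one standard way to compute PCA and is offered only as an illustration; commercial packages commonly use iterative algorithms instead, and the synthetic data and loading pattern below are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """PCA by singular value decomposition of the mean-centred matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T                    # variables x components
    scores = U[:, :n_components] * s[:n_components]   # samples x components
    explained = (s ** 2) / (s ** 2).sum()             # fraction of X variance
    return mean, scores, loadings, explained[:n_components]

rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 1))                     # one hidden driver
pattern = np.array([[1.0, 0.8, -0.6, 0.0]])           # hypothetical loading pattern
X = latent @ pattern + rng.normal(scale=0.05, size=(20, 4))

mean, scores, loadings, explained = pca(X, n_components=2)
X_hat = mean + scores @ loadings.T                    # reconstruction from 2 components
residual = X - X_hat                                  # what the model leaves out
```

Here a single latent driver generates almost all of the structured variation, so the first component captures most of the variance and the residual is dominated by noise.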

Because PCA dimensions may not correspond with the manipulated variables in an

experiment, it is useful to identify potential causes associated with each response

pattern. In an ideal scenario, the design of the experiment will structure inter-

sample variance in such a way that the first components (causing the largest

variation) are those of interest but confirmation using extraneous data regression or

visualisation can be informative.

[Figure 1.5 plot area: Component 1 scores plotted by Sample ID, and Component 1 scores plotted against Component 2 scores]

An important strength of PCA is its relative insensitivity to data properties. It is

stable in the presence of reasonable non-normality in the concentration distributions,

but is notably dependent on (or at least restricted to finding) linear relationships

between metabolites. Non-linear relationships within the concentration matrix can

cause large errors, and no instances of successfully applying e.g. kernel tricks to

model squared variables have been published, although this is one area where e.g.

neural networks might prove valuable (Kramer, 1991).

Projection to Latent Structures (PLS)

PLS is a supervised modeling technique which bears a number of mathematical

similarities to PCA, with the notable exception that components are not identified

on the basis of explained X-variation. Instead, latent variables (components) are

selected stepwise based on their covariance with, and ability to explain, variation in

supervisory Y-variables (Hastie et al., 2009). Selection and control of Y-variables,

which can be class-discriminating contrasts or progressive scalar values, can

optimize component discovery (Vinzi, 2010). Like PCA, selecting the number of

non-spurious components is an important step (Eriksson et al., 2008), but PLS

benefits from the fact that patterns are ranked based on their utility for a certain

problem (discriminatory or explanatory). Avoiding non-significant, spurious, or

unintentional patterns increases descriptive power (Hastie et al., 2009; Vinzi, 2010).
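
The stepwise selection of components by covariance with a supervisory variable can be sketched for a single Y-variable using the NIPALS form of PLS. This is a minimal sketch assuming mean-centred inputs; the function name and synthetic data are illustrative assumptions, not the software used in this work.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS regression for a single y by NIPALS (X, y already centred)."""
    X = X.copy()
    y = y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight: direction of max covariance with y
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        c = (y @ t) / tt                # y loading
        X -= np.outer(t, p)             # deflate X
        y -= c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients in terms of the original (centred) X.
    return W @ np.linalg.solve(P.T @ W, q)

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=25)
Xc, yc = X - X.mean(axis=0), y - y.mean()

b1 = pls1(Xc, yc, 1)    # one latent variable
b3 = pls1(Xc, yc, 3)    # as many components as variables
```

With as many components as variables the PLS coefficients coincide with ordinary least squares, which illustrates why restricting the model to a few non-spurious components is what distinguishes it from a direct regression.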

Orthogonalisation of PLS loadings

In addition to the problem of selecting only meaningful components, the ability of

PCA and PLS to capture extremely complex systems results in two problems:

complex results and potential over-fitting. Complexity is relative, as scores and

loadings plots for either method are more easily interpreted than other multivariate

techniques (Madsen et al., 2010). Nevertheless, in both methods more than one

component pattern may be required to model the difference between samples

(whether grouped, in discriminant analysis, or continuous response variables). An

extensive body of literature has been written on the correct interpretation of PLS

coefficients, highlighting the difficulty of the problem (Wold et al., 2001; Næs et al.,

2004).

An innovative approach to this problem was formulated by Johan Trygg’s group in

Sweden (Trygg and Wold, 2002). Based on the extant technique of orthogonalising

results by filtering unrelated variance from input variables (Kemsley and Tapp,

2009), the OPLS algorithm integrates the filtration into the iterative NIPALS

process for constructing PLS models. As each component is modeled, directly

orthogonal variation is isolated from the X-variables before the next iteration. The

predictive (first) component is then remodelled on this filtered data, resulting in a

single component uniting all predictive value available. Each orthogonal

component can be visualized to identify unrelated sources of variation, or rejected in

favour of analysing only the primary component. This has the significant advantage

that predictive elements are summarized with a single scalar loading value per

metabolite, which greatly simplifies interpretation: in particular, comparison of

models is easier because only one vector per model needs to be compared. The

result of their direct regression is the SUS plot proposed by Wiklund et al. (2007).

Figure 1.6: Example Shared and Unique Structure (SUS) plot
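
The orthogonal filtering step at the heart of OPLS can be sketched for a single Y-variable and a single orthogonal component. The example follows the general outline of the published algorithm (Trygg and Wold, 2002) but is a simplified sketch under stated assumptions: inputs are mean-centred, only one orthogonal component is removed, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def opls_filter(X, y):
    """Remove one y-orthogonal component from centred X (single-y case)."""
    w = X.T @ y
    w /= np.linalg.norm(w)                  # predictive weight
    t = X @ w
    p = X.T @ t / (t @ t)                   # predictive loading
    w_o = p - (w @ p) * w                   # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                           # orthogonal scores
    p_o = X.T @ t_o / (t_o @ t_o)           # orthogonal loading
    X_filtered = X - np.outer(t_o, p_o)     # X with orthogonal variation removed
    return X_filtered, t_o, p_o

rng = np.random.default_rng(3)
n = 30
y = rng.normal(size=n)
nuisance = rng.normal(size=n)               # structured variation unrelated to y
X = (np.outer(y, [1.0, 1.0, 0.0, 0.0])
     + np.outer(nuisance, [0.0, 1.0, 1.0, 1.0])
     + rng.normal(scale=0.05, size=(n, 4)))
Xc, yc = X - X.mean(axis=0), y - y.mean()

X_filtered, t_o, p_o = opls_filter(Xc, yc)
```

By construction the orthogonal scores have no covariance with y, so removing them leaves the covariance between X and y unchanged while discarding variation that would otherwise complicate the predictive loading.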

One consequence of filtering orthogonal variation from model construction is that

prediction of new samples’ response values requires pre-processing. The algorithm

for doing so is both published and codified, but the implications for collection of

future samples are significant. Filtration of “orthogonal” variation from samples

collected at a different time is necessarily dependent on consistent collection and

quantitation (Trygg and Wold, 2002).

The first implementation of the OPLS algorithm supported a single supervisory Y-

variable. This had an important implication for discriminant analysis of samples

assigned to a class, rather than a continuous response variable such as change in

body mass, age, etc. In a discriminant model, the difference between classes is

represented by a contrast variable, and in PLS models with more than two classes a

separate contrast is used for each additional class. In the OPLS algorithm, the

single supervisory variable can capture the differences between only two classes.

The more advanced O2PLS algorithm can support multiple supervisory variables,

and hence additional class contrasts, but it is a mistake to represent 3 or more classes

using a single supervisory “dummy” variable, as doing so forces the model to

construct a single, progressive “ordered” response between classes.
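
The difference between separate class contrasts and a single ordered "dummy" variable can be illustrated with a small encoding example. The class names are hypothetical, and the one-column-per-class layout is one common convention assumed here for illustration.

```python
import numpy as np

def one_hot_contrasts(labels):
    """One 0/1 column per class: no ordering between classes is implied."""
    classes = sorted(set(labels))
    Y = np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])
    return classes, Y

labels = ["control", "low_dose", "high_dose", "control"]  # hypothetical classes
classes, Y = one_hot_contrasts(labels)

# By contrast, a single integer "dummy" variable imposes an ordering and spacing
# (control < high_dose < low_dose here) that the experiment may not support.
ordered_dummy = np.array([classes.index(lab) for lab in labels], dtype=float)
```

The single-column encoding forces the model to treat one class as lying between the other two, which is the "progressive" response the text warns against.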

Model optimization and validation

If complex results are an intrinsic problem of any real-world multivariate

application, the risk of over-fitting is inherent to all numerical modeling based on

imperfect real-world sampling (Hastie et al., 2009). In the projection methods

outlined above, avoiding training problems (“optimizing the model”) is

conceptualized as the selection of only significant (meaningful) components.

Spurious relationships, represented by additional components, are identified using a

number of techniques, including cross-validation to test the predictive power of each

component on samples excluded from its training (Broadhurst and Kell, 2006;

Goodacre et al., 2007).

R2 and Q2 are two measures used to estimate the amount of variation captured by each

component, in training and cross-validation samples respectively (Osten, 1988;

Umetrics, 2006). R2 represents the percentage of variation in the training sample

data which can be represented by a scalar multiple of that component’s loadings,

and approximates the squared linear correlation coefficient between the actual sample Y

values and the scores predicted by the model approximation. Q2 is a Prediction Error

Sum of Squares (PRESS) estimate of the component’s accuracy in predicting the

value of testing samples in either X or Y (Q2X and Q2Y respectively). The

quantitative difference between R2 and Q2 for a component (or multi-component

model), or an incremental comparison of significant changes to the PRESS measure

(Osten, 1988), is an established metric for the goodness-of-fit of a metabolomics

model but is held by some to be overly optimistic (Goodacre et al., 2007); while the

complexity of biological datasets and small sample sizes often limits the total

(cumulative) measures to the 20-40% range, this is only an indicator of low power,

not over-fitting if the two numbers are similar.
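
The calculation of R2 and Q2 for a Y-variable can be sketched as follows. To keep the example short an ordinary least squares model stands in for PLS, and seven interleaved cross-validation groups are used; both choices, and the synthetic data, are assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in model: least squares with intercept (PLS could be swapped in)."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_ols(coef, X):
    """Predict y from the fitted intercept and slopes."""
    return coef[0] + X @ coef[1:]

def r2_q2(X, y, n_folds=7):
    """R2Y from the full fit; Q2Y = 1 - PRESS / SS from k-fold cross-validation."""
    ss_total = ((y - y.mean()) ** 2).sum()
    rss = ((y - predict_ols(fit_ols(X, y), X)) ** 2).sum()
    press = 0.0
    folds = np.arange(len(y)) % n_folds          # interleaved fold assignment
    for k in range(n_folds):
        held_out = folds == k
        coef = fit_ols(X[~held_out], y[~held_out])
        press += ((y[held_out] - predict_ols(coef, X[held_out])) ** 2).sum()
    return 1.0 - rss / ss_total, 1.0 - press / ss_total

rng = np.random.default_rng(4)
X = rng.normal(size=(28, 5))
y_signal = X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=28)
y_noise = rng.normal(size=28)                    # no real relationship to X

r2_s, q2_s = r2_q2(X, y_signal)
r2_n, q2_n = r2_q2(X, y_noise)
```

For the response with a real relationship to X the two measures are similar, whereas for the pure-noise response the gap between R2 and Q2 widens, which is the pattern used to flag over-fitting.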

As a method for optimizing model design, Cross Validation (CV) is widely accepted

but dependent on sample size (Kohavi, 1995; Umetrics, 2006; Madsen et al., 2010).

Unfortunately, estimating the correct minimum number of samples for significant

PLS analysis is a difficult problem for which no good “power estimation” work has

been published. Instead, best guesses based on the magnitude of the effect are used,

and the granularity of the R2/Q2 values is used to gauge whether enough variance

has been incorporated. Additionally, non-parametric tests, such as the loss of model

quality expected with permutation of sample labels, are used to good effect

(Broadhurst and Kell, 2006; Eriksson et al., 2008): A well trained model will

slowly but regularly decrease in predictive power as the accuracy of the training

data is decreased by intentionally introducing false supervisory data.
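
A label permutation test of the kind described above can be sketched as follows. The example scores a one-component PLS model by its cross-validated Q2 and compares the observed value with the distribution obtained after shuffling the class labels. The number of permutations, the fold layout, and the synthetic two-class data are all assumptions made for illustration.

```python
import numpy as np

def cv_q2(X, y, n_folds=7):
    """Cross-validated Q2 of a one-component PLS model (weights w = X'y)."""
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for k in range(n_folds):
        test = folds == k
        x_mean, y_mean = X[~test].mean(axis=0), y[~test].mean()
        Xtr, ytr = X[~test] - x_mean, y[~test] - y_mean
        w = Xtr.T @ ytr
        w /= np.linalg.norm(w)
        t = Xtr @ w
        c = (ytr @ t) / (t @ t)
        y_pred = y_mean + c * ((X[test] - x_mean) @ w)
        press += ((y[test] - y_pred) ** 2).sum()
    return 1.0 - press / ((y - y.mean()) ** 2).sum()

def permutation_test(X, y, n_permutations=200, seed=0):
    """Fraction of label permutations whose Q2 matches or beats the real Q2."""
    rng = np.random.default_rng(seed)
    observed = cv_q2(X, y)
    permuted = np.array([cv_q2(X, rng.permutation(y))
                         for _ in range(n_permutations)])
    p_value = (1 + (permuted >= observed).sum()) / (1 + n_permutations)
    return observed, permuted, p_value

rng = np.random.default_rng(5)
y = np.repeat([0.0, 1.0], 14)                     # two hypothetical classes
X = rng.normal(size=(28, 10))
X[:, :3] += 1.5 * y[:, None]                      # three variables respond to class

observed, permuted, p_value = permutation_test(X, y)
```

A model that performs no better on the true labels than on shuffled labels has most likely captured coincidental structure.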

CV is also limited in its ability to detect bias in sample selection (Efron and Gong,

1983; Broadhurst and Kell, 2006), because it is predicated on the assumption that the

distribution of variables in the sample set matches their distribution in the study

population. That can be particularly problematic given the aforementioned

presence of batch effects: Spectrum of response issues (Ransohoff and Feinstein,

1978) and bias introduced by sample handling can both violate the distribution

assumptions by changing the means, variance, or other properties (Ransohoff,

2005). In order to identify batch effects, an additional round of validation is

required using samples not just withheld from model training but acquired (or

ideally collected) at different times. The inability of a model to accurately assign a

response to samples acquired separately is a serious, if not fatal, problem.

Unfortunately, little literature has been published on the magnitude of batch effects

in either NMR or MS metabolomics.

1.4.4 Interpretation

The final step in any metabolomics experiment is most often conspicuous by its

absence. While projection models produce results which are easier to interpret than

some techniques, the models generated are still complex and too often have been

presented as a final conclusion with limited meaning, or as just a hypothesis for

future validation.

Metabolomics to date is primarily approached as an exploratory, hypothesis

generation technique. While it excels at this task because of its wide scope, it

nevertheless suffers because of the myriad possibilities resulting from such large

data sets. The nature of a quantitative multivariate “hypothesis” is not well

understood by many biochemists, and the tools to compare future datasets to such a

hypothesis are not readily available. To even perform such a follow-up experiment,

some measure of “range of applicability” is necessary but such details are almost

never available in publication, or leveraged internally.

Whether the goal is to establish a hypothesis (exploratory analysis) or validate it,

establishing the biological meaning of modeled variance in metabolite concentrations is a powerful

and important step (Mamas et al., 2010). The interpretation of the changes in terms

of biochemistry or physiology incorporates extant systems biology knowledge into

the results, serves as an additional check against experimental flaws, highlights

possible applications, and provides the details necessary to generate focused,

biochemically targeted hypotheses for future validation (Sreekumar et al., 2009).

Some of the most effective and convincing applications of metabolomics have

included biochemical interpretation part-and-parcel:

“… a few notable metabonomic investigations stand out with

regard to their mechanistic insights. In 2001 Nicholls et al.

published data on hydrazine toxicity that mechanistically linked the

neurotoxic effects of hydrazine to markedly increased levels of 2-

aminoadipate (2AA), which is known to affect kynurenic acid

levels in the brain, thus providing a plausible hypothesis for the

heretofore unexplained neurotoxic effects of the compound

(Nicholls et al., 2001). Slim et al., demonstrated that the urinary

metabolite changes induced by Type 4 phosphodiesterase (PDE4)

inhibitors were not the indirect result of concurrent inflammation

but were directly associated with vascular pathology (Slim et al.,

2002). Clayton et al., mechanistically linked the “usual suspect”

creatine to hepatotoxicity via effects on cysteine synthesis. They

later related elevated creatine levels in serum and urine with

hepatotoxicity and nutritional effects (Clayton et al., 2003, 2004).

Mortishire-Smith linked urinary dicarboxylic aciduria to impaired

fatty acid metabolism, which may be common to some hepatotoxic

mechanisms (Mortishire-Smith et al., 2004).” (Robertson, 2005)

To accomplish this bridging between metabolite concentration changes and current

models of relevant biological systems, knowledge bases (personal, literature,

electronic) are used to identify known or potential implications for each significant

metabolite in the model results (Duarte et al., 2007; Wishart et al., 2007, 2012;

Kanehisa et al., 2010, 2011). Determining significance is a somewhat contested

issue, but usually depends on relatively large PCA or PLS loading values in a

statistically significant component dimension (Broadhurst et al., 1997; Alsberg et

al., 1998; Broadhurst and Kell, 2006).

Unfortunately, establishing significant evidence for the lack of change in a

metabolite’s level, as the proverb goes, is more difficult and often overlooked.

Stability of a metabolite’s level in the presence of some environmental change or

other systematic perturbation could invalidate or reinforce a biochemical hypothesis

but variables with unchanging levels are not clearly identified by the statistical

techniques normally applied to PLS. Instead, small loadings values only indicate

that they do not correlate with other changes, a fact which must be contextualized

manually.

Having identified significant or potentially significant changes in metabolite levels,

interpretation can leverage electronic tools to visualise the relationships between

those elements and other nodes in the known biochemical/physiological networks

(Ingenuity, 2011). The Ingenuity software package uses similar databases to

automatically highlight possible pathways or cellular processes in which many

significant metabolites have changed, which speeds up the interpretive process, but

necessarily generates many spurious results which slow it down. Moreover, the

automated pathway analysis tools available cannot (easily) use the direction of

changes to identify putative shifts in metabolism. Nevertheless, expert systems,

probabilistic network inference engines, and other automated tools will likely

continue to increase in both prevalence and utility in the future. Because biochemical

networks are strongly, if sparsely, characterised, their application is more common

than that of physiological networks; the connections between tissues, organs, or cultures

might benefit from metabolite profiling but would require additional experimental methods.

Figure 1.7: KEGG pathway diagram for the TCA cycle

Finally, it bears noting that perhaps the most fundamental limitation of most ex vivo

metabolomics experiments is the assumption that metabolite levels measured in

extracellular fluids (such as serum or urine) are the direct result of changes in

intracellular biochemical processes. Without this assumption, interpretation of

metabolomics data is restricted to physiological signals, extracellular by-products,

or the analysis of cellular extracts via e.g. biopsy or controlled cell lysis and

quenching (Amantonico et al., 2010; Sellick et al., 2010). Tissue extraction must be

performed carefully but can yield a more direct measurement of intracellular

metabolites and hence a closer biological picture. The cost comes in terms of other

caveats in the collection and acquisition steps.

1.5 Where does metabolomics break down?

The wide range of potential applications has led to thousands of successful

metabolomics publications in peer reviewed journals in the past decade. While

some results have been validated by following studies, others have been abandoned,

refuted (Brindle et al., 2002; Kirschenlohr et al., 2006; Roussel et al., 2007) or

contested as inconsistent (Sreekumar et al., 2009, 2010; Jentzmik et al., 2010a,

2010b, 2011). Given that each such shortcoming is an opportunity for improvement

in future, identifying potential causes of failure offers both preventative assurances

and the possibility for innovation.

The search for biomarkers, ostensibly key elements of the physiological shift

associated with some disease or condition, has been a recurrent promise of

every -omics field. The simplicity of a diagnostic tool based on serum or urine

samples is extremely attractive (Robertson, 2005), and framing the experiment in

terms of only end goals puts less pressure on the investigator by avoiding the need

for difficult and uncertain biochemical interpretation. Diagnostic biomarkers have

been difficult to validate in almost every field however (Ransohoff and Feinstein,

1978), and metabolomics is no exception (Mamas et al., 2010).

Traditional metabolic markers consisted of a single protein or other measurable

whose level changed dramatically (Rao et al., 2008; Rocconi et al., 2009), but the

multivariate techniques commonly applied to metabolomic analysis yield multi-

variable or multi-dimensional results which cannot be accurately reduced to a single

compound or assay. Instead, the concept of a biomarker pattern comprising

multiple less significant shifts was proposed (Pohjanen et al., 2006; Weljie et al.,

2007; Xuan et al., 2011). Interpretation and application of such a pattern requires

computational assistance, which will be a barrier to acceptance in clinical settings

that require an established modus operandi.

The failure of proposed biomarkers is not just a problem of adoption though, and is

not restricted to metabolomics (Ransohoff and Feinstein, 1978; Defernez and

Kemsley, 1997; Forbes et al., 2006; Zhu et al., 2011). Shortfalls of either

reproducibility or meaningful insight have both been reported in metabolomics

literature, implying the need for better validation techniques (Robertson, 2005).

Statistical characterisation of results may be able to improve applications by

delineating a model’s limits or strengths, but attention to biochemical red flags is

also important.

One such flag was an early observation in metabolomics literature that certain

patterns, or subsets of overall response patterns, arise frequently and likely represent

nonspecific or artifactual responses. Energy metabolism is a particularly tricky area

of study, because many deleterious conditions have pervasive effects on the TCA

cycle, gluconeogenesis, fatty acid oxidation, etc. (Robertson, 2005). Historical

knowledge of biochemistry reinforces the fact that changes in energy regulation are

a fundamental requirement of most stress responses, immune response, fight-or-

flight, and even psychological condition in higher order species. Unfortunately,

many energy related metabolites are among the highest concentration small

molecules found in both intracellular and extracellular samples, which means they

feature prominently in NMR results, and to a lesser extent those generated by MS. As

such, any results which highlight such elements as potential biomarkers will (and

should) come under higher scrutiny and require an extra level of reinforcement at

the biochemical interpretative level.

Also frequently flagged as significant responders in metabolomics, but likely

spurious when considering specific conditions, are general classes of signalling

metabolites and markers of oxidative stress (Fiehn, 2002; Robertson et al., 2010).

Any experiment which manipulates sample biology enough to detect (or, in the case

of humans, measures effects with significant impact on human health) will induce

such nonspecific responses but reporting such a result has little utility for diagnostic,

prognostic, or explanatory goals. Possible solutions include methodological or

interpretive focus on more disease-specific responses, performing comparative

analysis of “non-healthy” but unrelated controls, and/or increasing the stringency of

validation depending on the purpose of the study.

1.6 Research Scope

Weighing the aforementioned caveats against the potential for biological insight, it

remains obvious that developing metabolomics methods is worth the effort. The

ensuing question of how to improve outcomes, in terms of specificity, sensitivity,

comprehensiveness, and other metrics, is therefore an interesting and non-trivial

one.

An old computer science adage states that the best way to improve the output of any

analysis is to improve the quality of the input data. Identifying sources of bias or

noise, and controlling for them in the experimental design and sample handling, are

established parts of an increasing number of metabolomics studies (Broadhurst and

Kell, 2006), but the vast majority of experiments conducted have been one-factor,

two class discriminant experiments comparing “affected” samples to nominal

“controls”. As a baseline for applying multifactor experimental designs, Chapter

Two of this work is a previously published traditional metabolomics experiment,

using the aforementioned high-fat diet-induced insulin resistance mouse model.

In contrast, numerous univariate techniques for analysing multi-factor data (e.g.

2-way ANOVA) have been developed and, more importantly, assimilated into the

scientific and clinical community. Based on those, “pre-omic” studies in areas such

as physiology and microbiology have for decades correctly leveraged multi-class,

multi-factor experiments to study the interaction between factors driving metabolic

response. As interactions are of fundamental value to systems biology, applying

similar experimental designs to metabolomics would further capitalize on the power

of the field. More importantly, analysis of confounding factors using the same

approaches can identify, explain, and isolate sources of interference, leading to

better data and better results.

To that end, Chapters Three and Four will present multi-factor studies of

metabolomics. The same insulin-resistant mouse model is used, to show how

experimental design can augment the standard methodology in a clear and

progressive manner.

In Chapter Three, a form of crossover design is used to illustrate the confounding

effects of the mice’s short-term diet on the more pertinent, insulin-resistance

inducing, effects of long-term dietary manipulation. Because mice must be killed to

acquire a sufficient volume of serum for NMR analysis, a traditional cross-over

design is not possible. While the treatment (diet) of individual mice can be changed

mid-course, it is not possible to sample an individual before and after the switch in

order to establish a time course. Instead, the metabolic profile of animals after a full

term on one diet is contrasted with those whose diet changes partway. In this way,

the long- and short-term effects of diet are isolated. The resulting models are

recombined into a single visualization of the comparison, similar to 2-way

univariate results.

In Chapter Four, a two-factor analysis is demonstrated. Exercise is a well-known

antagonist of the deleterious effects of a high-fat diet. With respect to caloric

consumption, some of the underlying mechanisms for exercise’s mediating effect are

well known and obvious. For some time, however, other implications have been

postulated at the level of biochemical regulation (Vial et al., 1974; Boulé et al.,

2001; Atalay and Laaksonen, 2002; Kadoglou et al., 2007). By contrasting mice

under a well-controlled, moderate exercise regimen, the metabolic effects of

exercise on both normal and obese mice are examined using metabolomic

techniques. The resulting models are used to examine the interaction between diet

and exercise effects in an OPLS-based analogue of the traditional two-factor,

univariate experiment.

Previously published, chapters 2-4 have become part of a growing literature

examining the metabolomic analysis of insulin resistance and Type II diabetes.

Chapter 5 presents a recap of the models previously produced as well as an analysis

of their similarities, significance, and possible sources of error in the context of

more recent publications. The presence of batch effects, even in tightly regulated

and thoroughly controlled methods, is also examined by looking at the differences

between models built on the identically treated samples from all three studies.

Differences are examined in terms of bias vs. simple overfitting, with an effort to

characterise the potential risk for other metabolomics studies. Numerical estimators

of overfitting in the form of jackknifing and resampling-based variable importance

are discussed, showing the inability of a commonly applied resampling

technique to predict conserved elements. The results suggest the existence of bias

or otherwise inconsistent batch treatment effects, underscoring the necessity of

multiple-batch analysis for even “well controlled” animal model experiments.

Finally, having demonstrated methods for improving and quantifying the quality of

metabolomics data, Chapter Six endeavours to extend the interpretation of the

results into the biochemical realm. Many published studies in the field have

presented only the multivariate scores for each sample, as a means to visualizing the

discriminative ability of the method. Doing so falls well short of the maximum

potential for biochemical insight, as represented by the integration of pathway

analysis. Using a knockout mouse model of vitamin C deficiency, a condition with

established but profound metabolic implications, network analysis using HMDB and

KEGG is used to formulate and reinforce an explanation of several changes seen,

beyond the cursory level usually provided. In particular, a hypothetical pathway for

the upregulation of secondary antioxidants is proposed, illustrating the potential for

insight at the systems biology level.

Chapter 2 - Metabolomic profiling of dietary-induced insulin resistance in the

fat–fed mouse

J. Shearer, G. Duggan, A. Weljie, D. S. Hittel, D. H. Wasserman, H. J. Vogel

Published: 2008-01-22 in Diabetes, Obesity, and Metabolism

2.1 Abstract

The predictive ability of metabolic profiling to detect obesity-induced perturbations

in metabolism has not been clearly established. Complex aetiologies interacting with

environmental factors highlight the need to understand how specific manipulations

alter metabolite profiles in this state. The aim of this study was to determine if

targeted metabolomic profiling could be employed as a reliable tool to detect

dietary-induced insulin resistance in a small subset of experimental animals

(n = 10/treatment). Following weaning, male C57BL/6J littermates were randomly

divided into two dietary groups: chow and high fat. Following 12 weeks of dietary

manipulation, mice were fasted for 5 h prior to serum collection. The resultant high

fat–fed animals were obese and insulin resistant as shown by a euglycaemic-

hyperinsulinaemic clamp. Sera were analysed by proton nuclear magnetic resonance

spectroscopy, and 46 known compounds were identified and quantified.

Multivariate analysis by orthogonal partial least squares discriminant analysis, a

projection method for class separation, was then used to establish models of each

treatment. Models were able to predict class separation between diets with 90%

accuracy. Variable importance plots revealed the most important metabolites in this

discrimination to include lysine, glycine, citrate, leucine, suberate and acetate. These

metabolites are involved in energy metabolism and may be representative of the

perturbations taking place with insulin resistance. Results show metabolomics to

reliably describe the metabolic effects of insulin resistance in a small subset of

samples and are an initial step in establishing metabolomics as a tool to understand

the biochemical signature of insulin resistance.

2.2 Introduction

Metabolic profiling or ‘metabolomics’ describes the identification and

quantification of numerous small molecular weight compounds in biological fluid

samples. To date, the ability of metabolomics to predict nutritional responsiveness

and chronic disease states is largely debatable because of the variability and

complexity of data sets (Kirschenlohr et al., 2006; Roussel et al., 2007). Diseases

such as obesity, diabetes and cardiovascular disease not only involve multiple genes

but also are strongly influenced by environmental factors including diet and

exercise. In addition, factors such as age, sex and ethnicity are known to influence

metabolomic profiles (Kardia et al., 2006; Lawton et al., 2008; Lutz et al., 2008). As

such, there is an acute need to generate metabolomic data in highly controlled

experimental models to determine the effects of specific manipulations on proton

nuclear magnetic resonance spectroscopy profiles.

The aim of the present study was to determine whether metabolomics can be used as

a tool to detect whole body insulin resistance from a single serum sample. As

human samples are highly variable, the C57BL/6J laboratory mouse was chosen as

an experimental model. When fed a high-fat diet, the C57BL/6J mouse became

obese and insulin resistant compared with chow-fed mice of the same strain (Fueger

et al., 2004a). As insulin resistance is dietary induced over a prolonged period of

time, this model shares many similarities with human insulin resistance and type 2

diabetes. Common features include obesity, hyperinsulinaemia, hyperlipidaemia and

mild hypertension. These similarities combined with genetic homology and strict

control over environmental conditions make this model ideal for exploring the

influence of dietary-induced insulin resistance on metabolic profiling. In addition, a

small subset of animals was examined and utilized for data generation as this most

closely reflects the numbers used in laboratory experiments. This detail is important

if metabolomics is to be used in a phenotyping capacity for the characterization of

mutant and transgenic mouse models of human disease.

Employing 1H-NMR spectroscopy to analyse serum in control and insulin-resistant

animals, we show that targeted profiles derived from the global spectra are clearly distinguishable.

In addition, results show that both body mass and diet class are important

determinants of metabolite profiles in insulin resistance. Such data are the initial

steps in establishing and validating metabolomics as a tool to characterize and

understand the biochemical signature of insulin resistance.

2.3 Experimental procedures

Mouse maintenance

Procedures were approved by the University of Calgary Animal Care and Use

Committee and abide by the Canadian Association for Laboratory Animal Science

guidelines for animal experimentation. Insulin clamp experiments were approved by

the Vanderbilt University Animal Care and Use Committee. Animals were

maintained in a humidity-controlled room with a 12-h light : dark cycle. Following

weaning (3 weeks), male C57BL/6J littermates were randomly segregated into two

groups and maintained in microisolator cages for 1 week. Following this

acclimatization period, animals received either chow or high-fat diet for 12 weeks

(58R3 – Testdiet; Purina, Richmond, IN, USA). Energy density (%kcal/g) for chow

and high-fat diets was 23% protein, 21% fat and 55% carbohydrate and 15%

protein, 59% fat and 26% carbohydrate respectively.

Animal experimentation

At the end of the dietary period, animals (n = 10/treatment) were fasted for 5 h prior

to being anaesthetized with pentobarbital and weighed. Whole blood (~1 ml) was

obtained by a cardiac puncture and placed on ice and allowed to clot for 30 min.

Samples were then centrifuged for 10 min at 1000 g and sera collected prior to

storage at −80 °C. Blood glucose was assessed in anaesthetized mice (One Touch;

LifeScan, British Columbia, Canada). Non-esterified fatty acids (NEFA) were

measured spectrophotometrically (Wako NEFA C kit; Wako Chemicals, Richmond,

VA, USA). Immunoreactive insulin was assayed with a double antibody method

(Morgan and Lazarow, 1962). Abundant data on chow and high fat–fed

C57BL/6J mice have been previously published (Park et al., 2005; Shearer et al.,

2005; Ayala et al., 2007, 2007). Insulin sensitivity was assessed by in vivo,

euglycaemic-hyperinsulinaemic clamp (n = 16/treatment, chow and high fat) as

previously described (Shearer et al., 2005). Briefly, clamps were conducted

following a postoperative recovery period of ~5 days. The recovery period was a

sufficient time for body weight to be restored within 10% of presurgery body

weight. On the day of the study, conscious and unrestrained mice were placed in a

1-l plastic container lined with bedding and fasted for 5 h before an experiment;

Micro-Renathane (0.033 OD) tubing was connected to the catheter leads and

infusion syringes. Following this, a baseline (t = −90 min) arterial blood sample

(150 ul) was drawn for the measurement of arterial blood glucose, haematocrit,

plasma insulin and NEFA. The remaining erythrocytes were washed with 0.9%

heparinized saline and reinfused. Mice were then infused with insulin

(4 mU/kg/min). To maintain glycaemia during insulin experiments, arterial blood

glucose (5 ul; HemoCue, Lake Forest, CA, USA) was measured at ~10-min

intervals and glucose (50%) administered into the venous catheter. Mice also

received saline-washed erythrocytes from a donor mouse as needed in order to

maintain haematocrit within 5% of incoming haematocrit.

Metabolite sample preparation

Serum samples were thawed and filtered twice using 3-kDa Nanosep

microcentrifuge filters, prewashed to reduce contamination. The filtrate was

transferred to clean microfuge tubes; the final sample volume ranged from 100 to

400 ul. Samples were brought to 650 ul by addition of D2O, 140 ul of phosphate

buffer containing dimethyl-silapentane-sulphonate (DSS, final concentration

0.5 mM) and 40 ul of sodium azide. Final sample pH was adjusted to 7 ± 0.2.

Spectrum acquisition

One-dimensional nuclear Overhauser effect spectroscopy (NOESY) spectra were

acquired using an automated NMR Case sample changer on a 600-MHz Bruker

Ultrashield spectrometer. The NOESY pulse sequence had a mixing time of 100 ms

and a water presaturation pulse. Initial samples for each batch were shimmed to

ensure half-height linewidth of ~1.1 Hz for the DSS peak at 0.0 ppm. Spectra were

acquired with 256 scans (batch 1) or 768 scans (batch 2), then zero filled and

Fourier transformed to 32 k or 64 k points. Baseline correction was performed

manually using a spline function.
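
The zero-filling and Fourier transform steps described above can be illustrated with a minimal sketch. It uses a synthetic decaying sinusoid in place of a real free induction decay, a naive discrete Fourier transform instead of the spectrometer software, and toy sizes (32 points filled to 64) rather than the 32 k or 64 k points used here; all function names and parameters are assumptions made for illustration.

```python
import cmath
import math


def synthetic_fid(n_points, cycles, decay):
    """Decaying complex sinusoid standing in for a free induction decay."""
    return [
        cmath.exp(2j * math.pi * cycles * k / n_points) * math.exp(-k / decay)
        for k in range(n_points)
    ]


def zero_fill(fid, target_length):
    """Pad the FID with zeros up to target_length points."""
    return list(fid) + [0j] * (target_length - len(fid))


def dft(signal):
    """Naive discrete Fourier transform, O(n^2); adequate for a toy example."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


# Hypothetical toy signal: 8 cycles across 32 acquired points.
fid = synthetic_fid(n_points=32, cycles=8, decay=20.0)
spectrum = dft(zero_fill(fid, 64))

# Zero filling doubles the number of frequency points, so the peak
# moves from bin 8 (of 32) to bin 16 (of 64) without changing frequency.
magnitudes = [abs(x) for x in spectrum]
peak_index = magnitudes.index(max(magnitudes))
```

Zero filling adds no new information to the signal; it interpolates the spectrum onto a finer frequency grid, which helps when fitting narrow peaks.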

Sample fitting

Processed spectra were imported into Chenomx software (version 4.6) for

quantification. For some samples, additional preprocessing was required in

Chenomx Processor, in the form of a reference deconvolution, water region deletion

and/or additional baseline correction. In total, 46 compounds were quantified from

each spectrum, for a total of 1000 individual measurements. Three test spectra were

randomly chosen and concentrations for each compound were averaged over the

three test profiles. These averages were used as the starting concentration vector for

fitting or refitting all the acquired spectra (including the three used for the previous

spectra). All 20 spectra were randomly ordered (within acquisition batches) for

fitting in Chenomx Profiler. For each spectrum, the 46 compounds were sorted by

decreasing concentration and then fit for concentration and translation in that order.

The three test profiles were compared with the corresponding refitted profiles to test for

consistency. Each compound concentration was then normalized by dividing the

measured concentration by the total concentration of all metabolites in that sample

(excluding glucose and lactate because of excessively large concentrations which

otherwise dominate the normalization) using a Perl script.
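
The normalization step can be sketched as follows. The original was a Perl script that is not reproduced here; this Python version is an illustrative assumption, as are the function name and the example concentrations. It also assumes that glucose and lactate are left out of the total only, while still being normalized themselves.

```python
def normalise_profile(concentrations, excluded=("Glucose", "Lactate")):
    """Divide each concentration by the sample total.

    The total omits the excluded compounds so that very abundant
    metabolites do not dominate the normalization. Excluded compounds
    are still returned, scaled by the same total (an assumption).
    """
    total = sum(
        value for name, value in concentrations.items() if name not in excluded
    )
    return {name: value / total for name, value in concentrations.items()}


# Hypothetical concentrations (uM) for a single serum sample.
sample = {
    "Citrate": 100.0,
    "Glycine": 50.0,
    "Alanine": 50.0,
    "Lactate": 3000.0,
    "Glucose": 8000.0,
}
normalised = normalise_profile(sample)
```

With these invented values the total is 200 uM, so citrate normalizes to 0.5, whereas including glucose and lactate in the total would have compressed every other metabolite to below 1% of the sum.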

Statistical analysis

For measures of body mass, blood glucose, NEFA and insulin sensitivity, a two-way

ANOVA was performed to detect statistical differences (p < 0.05). Differences within

the ANOVA were determined using Tukey’s post hoc test. All data are reported as

means ± s.e. Urea and propylene glycol were excluded from the data set because of

difficulty in quantification and contamination by experimental methods respectively.

In addition to univariate tests, multivariate analysis was conducted using SIMCA-P

software (Umetrics, Sweden) to better assess the concentration changes. A

supervised orthogonal partial least squares discriminant analysis (OPLS-DA)

approach was chosen (model 1 – diet). This allows for a direct comparison of the

variance between diet type (y variable) and metabolite concentrations (x variable)

(Beckwith-Hall et al., 2002). Additional modelling was performed by OPLS to

examine variation of body mass (y variable) in relation to metabolite concentrations

(x variable) (model 2 – mass).
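
To make the idea of separating predictive from orthogonal variation concrete, the following is a minimal sketch of a single OPLS filtering step as commonly described in the chemometrics literature. It is not the SIMCA-P implementation used in this study; the data matrix, the class coding (chow = -1, high fat = +1) and all function names are assumptions for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centre_columns(X):
    """Subtract each column mean from a row-major matrix."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def xt_times(X, v):
    """Return X^T v, with one entry of v per row (sample) of X."""
    return [dot([row[j] for row in X], v) for j in range(len(X[0]))]


def x_times(X, w):
    """Return X w, with one entry of w per column (variable) of X."""
    return [dot(row, w) for row in X]


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def opls_one_orthogonal(X, y):
    """Remove one y-orthogonal component from column-centred X."""
    w = unit(xt_times(X, y))                      # predictive weights
    t = x_times(X, w)                             # predictive scores
    tt = dot(t, t)
    p = [v / tt for v in xt_times(X, t)]          # predictive loadings
    wp = dot(w, p)
    w_ortho = unit([pj - wp * wj for pj, wj in zip(p, w)])
    t_ortho = x_times(X, w_ortho)                 # orthogonal scores
    tt_ortho = dot(t_ortho, t_ortho)
    p_ortho = [v / tt_ortho for v in xt_times(X, t_ortho)]
    X_filtered = [
        [xij - ti * pj for xij, pj in zip(row, p_ortho)]
        for row, ti in zip(X, t_ortho)
    ]
    return w, t_ortho, X_filtered


# Hypothetical data: 4 samples x 3 metabolites, two per diet class.
X = centre_columns([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.5, 3.0],
])
y = [-1.0, -1.0, 1.0, 1.0]
w, t_ortho, X_filtered = opls_one_orthogonal(X, y)
```

By construction the orthogonal scores are uncorrelated with y, which is why variation such as batch or subject effects can be set aside in that component instead of being interpreted as part of the diet response.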

Validation

The accuracy of each model was tested with three established methods. Leave-

some-out testing is a process whereby seven ‘partial models’ are built, each based

on a different subset of samples. The (average) ability of the submodels to predict

the diet (or mass) of the excluded animals gives a measure of the original model’s

strength. Secondly, the models were tested by randomly shuffling the y variables

and building alternate models based on this partially incorrect data.
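
The two internal checks above can be sketched in outline. The example below divides 20 samples into seven leave-out groups and repeats the procedure with shuffled class labels. A nearest-centroid classifier stands in for the OPLS-DA submodels, and the group assignment, data and function names are assumptions, not the procedure implemented in the software used here.

```python
import random


def make_folds(n_samples, n_folds=7):
    """Assign sample indices round-robin so each is left out exactly once."""
    return [list(range(start, n_samples, n_folds)) for start in range(n_folds)]


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train_X, train_y, x):
    """Nearest-centroid stand-in for a real OPLS-DA submodel."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, l in zip(train_X, train_y) if l == label])
        d = sum((a - b) ** 2 for a, b in zip(x, c))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def cross_validated_accuracy(X, y, n_folds=7):
    """Fraction of held-out samples whose class is predicted correctly."""
    correct = 0
    for held_out in make_folds(len(X), n_folds):
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(predict(train_X, train_y, X[i]) == y[i] for i in held_out)
    return correct / len(X)


def permutation_accuracies(X, y, n_permutations=20, seed=0):
    """Cross-validated accuracy after randomly shuffling the class labels."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        results.append(cross_validated_accuracy(X, shuffled))
    return results


# Hypothetical, well-separated two-metabolite profiles for 10 + 10 animals.
X = [[1.0 + 0.1 * i, 2.0 - 0.05 * i] for i in range(10)] + \
    [[3.0 + 0.1 * i, 0.5 + 0.05 * i] for i in range(10)]
y = ["chow"] * 10 + ["high fat"] * 10

folds = make_folds(len(X))
true_accuracy = cross_validated_accuracy(X, y)
shuffled_accuracies = permutation_accuracies(X, y)
```

A model is credible when its accuracy on the true labels clearly exceeds the distribution obtained from shuffled labels; if the two are similar, the apparent separation is likely an artefact of overfitting.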

The strongest form of model validation is external validation (EV) using entirely

separate samples. Ten additional samples were acquired using the same diet and

acquisition protocol. The spectra were profiled in random order without knowledge

of diet or mass and then imported into SIMCA as a secondary data set. The y values of

the EV samples were then predicted using model 1 – diet and model 2 – mass,

which is indicative of each experiment’s reproducibility.

2.4 Results

Animal characteristics

Following dietary manipulation, high fat–fed animals became obese with average

weights of 30 ± 0.9 g and 41 ± 1.5 g for control and high fat–fed animals

respectively (p < 0.05). Fasting blood glucose levels in anaesthetized animals were

11.0 ± 0.5 mM for chow and 14.8 ± 0.8 mM for high fat–fed animals (p < 0.05).

Fasting NEFA levels were not different between groups with values of 0.90 ± 0.04

and 1.02 ± 0.05 mM for chow and high fat–fed groups respectively. A separate

subset of animals was used for the in vivo, euglycaemic and hyperinsulinaemic

clamps. During the clamps, blood glucose did not differ between chow and high fat–

fed groups at any time. Baseline insulin levels were 21 ± 10 uU/ml for chow and

68 ± 10 uU/ml for high fat–fed animals (p < 0.05). Average blood glucose during

the clamps was 7.4 ± 0.3 and 7.6 ± 0.2 mM for chow and high fat–fed animals

respectively (p > 0.05). Glucose infusion rates required to maintain euglycaemia are

shown in Figure 2.1. This study and other studies (Fueger et al., 2004a; Park et al.,

2005; Wu et al., 2006; Ayala et al., 2007; Rao et al., 2007) showed high-fat feeding

in the C57BL/6J mouse to cause significant obesity and insulin resistance, disposing

of ~50% less glucose compared with chow-fed animals.

Figure 2.1: Mean glucose infusion rates during an E-H clamp.

Euglycaemic–hyperinsulinaemic clamp (4 mU/kg/min insulin and 7.0 mM

glucose clamp with variable infusion) results for chow and high fat–fed

C57BL/6J mice. *p < 0.05 for chow vs. high fat–fed mice. Data are reported

as means ± s.e., n = 16 mice/dietary treatment.

Metabolites

The univariate mean and standard deviation of each compound’s original serum

concentration along with the ANOVA p value for significant separation between those

compounds in a univariate model (model 1) are shown in Table 2.1. According to

the multivariate model, 22 metabolites were higher in the high fat–fed animals. Of

these, only absolute concentrations of citrate were significantly different (univariate

p < 0.05). Another 24 metabolites were lower with high-fat feeding (Table 2.2). In

this group, glycine, lysine, suberate, acetate, leucine, valine, trimethylamine

N-oxide (TMAO), hippurate and arginine were significantly lower.

Metabolite Control High fat p Value

Citrate 92 ± 12 154 ± 30 0.011

Citrulline 19 ± 3 24 ± 3 0.097

Asparagine 11 ± 2 14 ± 2 0.108

Glycerol 157 ± 35 225 ± 49 0.115

3-Hydroxybutyrate 86 ± 25 125 ± 33 0.190

Carnitine 13 ± 1 11 ± 1 0.234

2-Hydroxybutyrate 12 ± 2 19 ± 8 0.267

O-Acetylcarnitine 6 ± 1 9 ± 3 0.271

Threonine 60 ± 10 69 ± 9 0.390

3-Methyl-2-oxovalerate 6 ± 2 5 ± 1 0.549

Aspartate 9 ± 1 8 ± 0 0.564

O-Phosphocholine 4 ± 2 6 ± 4 0.645

Alanine 165 ± 21 156 ± 19 0.667

Serine 49 ± 8 51 ± 8 0.750

Glutamine 233 ± 34 242 ± 20 0.763

2-Oxoglutarate 10 ± 1 11 ± 2 0.806

Urea 259 ± 67 273 ± 66 0.826

Taurine 361 ± 70 374 ± 102 0.879

Nicotinate 4 ± 1 4 ± 1 0.883

Methionine 13 ± 1 13 ± 1 0.893

Creatine 71 ± 9 70 ± 11 0.916

Ornithine 30 ± 5 31 ± 2 0.970

Table 2.1: Positive contributors to multivariate analysis.

These metabolites increase with high-fat feeding compared with controls

(chow fed). Values are stated as mean (uM) ± s.e., n = 10/dietary treatment,

p values for each individual metabolite are listed.

Metabolite Control High fat P Value

Glycine 97 ± 16 42 ± 16 0.002

Lysine 88 ± 12 60 ± 5 0.005

Suberate 8 ± 2 4 ± 1 0.005

Acetate 148 ± 29 88 ± 14 0.014

Leucine 58 ± 8 41 ± 5 0.019

Valine 73 ± 11 51 ± 6 0.023

TMAO 13 ± 3 7 ± 3 0.038

Hippurate 79 ± 21 48 ± 11 0.042

Arginine 44 ± 15 22 ± 3 0.049

Phenylalanine 30 ± 4 23 ± 2 0.064

Lactate 3390 ± 483 2546 ± 532 0.103

Isobutyrate 9 ± 1 7 ± 1 0.114

Succinate 33 ± 22 8 ± 3 0.117

Tyrosine 29 ± 4 24 ± 3 0.149

Isoleucine 38 ± 10 26 ± 7 0.166

Uridine 6 ± 2 5 ± 1 0.202

Allantoin 39 ± 7 31 ± 7 0.250

Tryptophan 8 ± 2 6 ± 1 0.290

Glutamate 43 ± 24 29 ± 5 0.334

Fumarate 4 ± 1 3 ± 1 0.373

Formate 19 ± 6 15 ± 3 0.374

Pyruvate 72 ± 21 60 ± 12 0.499

Benzoate 139 ± 49 115 ± 34 0.570

Choline 13 ± 3 11 ± 3 0.625

Proline 65 ± 7 64 ± 5 0.906

Table 2.2: Negative contributors to multivariate analysis.

These metabolites decrease with high-fat feeding compared with controls

(chow fed). Values are stated as mean (uM) ± s.e., n = 10/dietary treatment,

p values for each individual metabolite are listed.

Model generation

The importance of the multivariate model is the ability to identify the set of

metabolites most directly responsive to the high-fat diet. In OPLS models, variation

in the x component represents changes in metabolite concentration, while y is

correlated to variation in either diet (model 1) or body mass (model 2). Other

sources of variation can be isolated in subsequent orthogonal components so that

their impact is not interpreted as part of the diet response but rather alternate sources

of variation (batch, exercise, instrument instability, etc.).

Model 1 – diet

Chow and high-fat diets resulted in clearly distinguishable spectra. Acetate, formate

and pyruvate were elevated, while uridine, creatine, tyrosine and the ketone body 3-

hydroxybutyrate were simultaneously depressed. The variable importance of each

metabolite in model 1 (Figure 2.2) is shown in Figure 2.3. The most significant

metabolites included amino acids (Lys, Gly, Leu, Phe and Val) and energy

metabolites (citrate, acetate, glycerol, suberate and lactate). Model 1 (Figure 2.2)

had one sample (of 20) outside the 95% confidence interval, which is within

acceptable tolerances.

Figure 2.2: OPLS-DA Scores Plot

An orthogonal partial least squares discriminant analysis scores plot shows the separation between samples along each of the two model components in model 1. The first (horizontal) component is the variation associated by the model with metabolite-based interclass differences. The second, vertical component represents subject variation unrelated to diet. n = 10 mice/dietary treatment

Figure 2.3: Variable importance plot (VIP) from model 1

Variable Importance Plot shows the relative contribution that each metabolite makes to the difference between classes in model 1. Metabolites with VIP scores above 1.0 are considered to be strong contributors; anything below 0.5 is considered non-significant. Only compounds with an error smaller than their magnitude (95% confidence in their significance) are shown.
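
The VIP thresholds quoted above follow from how the score is usually defined for PLS-type models: each variable's squared weights are combined across components in proportion to the y variance each component explains, then scaled so that the mean squared VIP over all variables equals one. The sketch below uses this textbook definition; the weights and explained sums of squares are invented, and the commercial software may compute VIP for OPLS models somewhat differently.

```python
import math


def vip_scores(weights, explained_ss):
    """Textbook VIP for a PLS-type model.

    weights[a][j]   weight of variable j in component a
    explained_ss[a] y sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(explained_ss)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, explained_ss):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


# Hypothetical two-component model with four metabolites.
weights = [
    [0.8, 0.5, 0.3, 0.1],
    [0.2, -0.6, 0.7, 0.1],
]
explained_ss = [6.0, 2.0]
vip = vip_scores(weights, explained_ss)
mean_squared_vip = sum(v * v for v in vip) / len(vip)
```

Because the mean squared VIP is one by construction, a score above 1.0 marks a variable that contributes more than the average, which is the rationale for that cut-off.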

Model 2 – mass

A second OPLS model, created from the same metabolite data, used the mass (y

variable) for regression (Figure 2.4, r2 = 0.89, q2 = 0.52). The purpose of this second

model was to compare the metabolic changes associated with the cause (diet) with

those associated with the effect (change in mass). Validation was performed on this

model as well as scores, loadings and cross-validation statistics. The mass

measurements of the chow-fed animals were clustered in a small range

(35.4 ± 4.3 g). However, the high fat–fed animals were more dispersed (42.8 ± 9.1 g),

showing some overlap with the chow-fed mass range.

Figure 2.4: SUS plot comparing loadings for model 1 vs 2

Co-efficients on the diagonal react similarly in both models, whereas off-

axis metabolites show a differential change. Those in the top-right corner are

elevated in high-fat animals and high-mass animals and vice versa. Citrate,

which falls to the right of the x axis, shows elevated concentrations in fat-fed

animals, but does not change in concert with body mass. Conversely,

alanine and benzoate concentrations are higher in animals with elevated

body mass, but do not respond directly to diet.
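
The reading of the SUS plot described in this caption can be expressed as a simple rule on each metabolite's pair of coefficients. The sketch below is illustrative only: the threshold of 0.3 and the coefficient values are invented to mirror the qualitative description, and are not taken from the models.

```python
def sus_region(coef_model1, coef_model2, threshold=0.3):
    """Classify a metabolite by where it falls on a SUS plot.

    coef_model1 is the x axis (diet model), coef_model2 the y axis
    (mass model). The threshold is an arbitrary illustrative cut-off.
    """
    strong1 = abs(coef_model1) >= threshold
    strong2 = abs(coef_model2) >= threshold
    if strong1 and strong2:
        if coef_model1 * coef_model2 > 0:
            return "shared"
        return "shared (opposite sign)"
    if strong1:
        return "unique to model 1"
    if strong2:
        return "unique to model 2"
    return "neither"


# Invented coefficients chosen to match the caption's description.
examples = {
    "citrate": (0.70, 0.05),        # responds to diet, not to mass
    "alanine": (0.05, 0.60),        # tracks mass, not diet
    "metabolite_X": (-0.70, -0.60), # hypothetical shared responder
}
regions = {name: sus_region(a, b) for name, (a, b) in examples.items()}
```

Metabolites classed as shared lie near the diagonal, while those unique to one model lie close to the corresponding axis.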

Upon inspection, the high-fat animals showed a bimodal distribution and as a result,

the body mass model (model 2) could not separate animals into diet classes. All

animals with higher body mass were among those fed a high-fat diet (80%);

however, not all animals on a high-fat diet exhibited a significant increase in body

mass (20%). It was, however, a good cross-validation predictor of body mass based

on metabolic profile. Model 2 showed substantially different compounds of

importance compared with model 1. Of possible interest was the substantial drop in

the significance of citrate and suberate. In contrast, the significance of alanine, 2-

oxoglutarate, benzoate and formic acid all increased in this model.

Internal validation

Model 1 cross-validation accuracy (YPredCV) was 18/20 or 90%. If Hotelling’s

95% confidence interval was used to exclude outliers from this prediction, the

accuracy improved but the sample size shrank. Overall, the sample size is too small

for the model to serve as a classifier (predictor for third-party samples), but is

sufficiently robust for biological interpretation.

External validation

Because cross validation cannot detect sampling bias, particularly in small sample

sets, additional EV on samples was performed. Seven of 10 EV samples fell within

the model’s expected parameter range. The body mass of the EV samples predicted

using model 2 correlated reasonably well with the experimental mass of the animals,

with a correlation co-efficient of 0.28. It is important to restate that these predictions

are not intended to describe the power of the model as a clinical or externally

predictive tool. Rather, they are intended to verify that the model is not overfit to the

small number of samples used. Given the limited sample size to build the models

(n = 10/dietary treatment), a 70% membership is reasonable.
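
The agreement between predicted and measured body mass can be summarized with a Pearson correlation coefficient. The following is a minimal sketch of that calculation; the function name and the example values are assumptions and do not reproduce the external validation data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented masses (g): measured values and perfectly proportional predictions.
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 36.0, 41.0, 46.0]
r_perfect = pearson_r(measured, predicted)
r_inverse = pearson_r(measured, list(reversed(predicted)))
```

A coefficient near 1 indicates that predictions rank and scale with the measured masses, while values near 0 indicate little linear agreement.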

2.5 Discussion

The C57BL/6J mouse is a commonly employed experimental model of metabolic

disease as it readily develops obesity and insulin resistance when fed a high-fat diet

(Leiter, 1993; Raab et al., 2005; Alevizos et al., 2007; Clee and Attie, 2007). The

model is advantageous over genetically induced mouse models (e.g. db/db and

ob/ob) as obesity occurs gradually and is environmentally induced. Specifically,

high-fat feeding results in a decline in whole body glucose disposal, increased fatty

acid utilization, impaired glucose tolerance and cardiac dysfunction (Fueger et al.,

2004a; Park et al., 2005; Shearer et al., 2005; Wu et al., 2006; Ayala et al., 2007;

Rao et al., 2007). Here, we present a first step in validating 1H-NMR-based

metabolic profiling techniques to detect dietary-induced insulin resistance in this

model.

Results showed NMR-based spectroscopy to be a predictor of dietary-induced

insulin resistance in an experimental model of limited sample size. EV showed that

both diet and body mass could be predicted with reasonable success. The variable

importance plot, a measure of which metabolites significantly discriminate chow

and high fat–fed animals, shows lysine, glycine, citrate, leucine, suberate and

acetate to be altered. These metabolites are all involved in energy metabolism and

may be representative of the perturbations taking place with insulin resistance.

Of interest, results show the amino acids leucine, glycine and lysine to be depressed

in high fat–fed animals compared with chow-fed animals. There has been renewed

interest in the role of leucine in dietary-induced insulin resistance (Layman and

Walker, 2006; Bain et al., 2009; Newgard et al., 2009). This amino acid interacts

with insulin signalling through the rapamycin (mTOR) pathway and is important in

protein synthesis and substrate selection. Analogous to the present findings, plasma

leucine has been shown to be depressed with high-fat feeding in the Sprague–Dawley

rat, with a 22% reduction in levels following 4 weeks of dietary

manipulation (Calles-Escandon et al., 1984). In humans, plasma leucine levels are

depressed with type 2 diabetes compared with healthy volunteers but restored with

6 weeks of rosiglitazone treatment (Van Doorn et al., 2007). Of note, these results

are for plasma and not for urine, which typically shows elevated branched-chain amino

acid levels with the disease (Van Doorn et al., 2007).

Other altered metabolites include glycine and citrate that were elevated with high-fat

feeding compared with controls. Glycine is a key metabolite in nucleic acid

synthesis and is known to fluctuate with changes in energy status. Starvation results

in large increases in glycine, signalling energy transition and energy conservation

(Adibi, 1968). Likewise, citrate is produced in the tricarboxylic acid (TCA)

cycle and has been previously shown to be increased in alloxan diabetic rats as well

as ketoacidotic humans with type 1 diabetes (DeVilliers et al., 1966). Endogenously

produced from fatty acid and glucose metabolism, concentrations are known to be

sensitive to insulin and glucose levels (Piloquet et al., 2003). Citrate becomes

elevated with high-fat feeding as a consequence of hyperglycaemia, insulin

resistance, a heavy reliance on fatty acid utilization and a decreased liver clearance

(Sjostrom, 1937). Previous reports show this metabolite to be increased with

hyperglycaemia and lowered upon insulin administration in both experimental

animal models and humans (Penttila and Pollanen; Natelson et al., 1948; Pincus et

al., 1948; Todorow and Dikow, 1960). Despite these findings, the use and

applicability of citrate as a disease marker in metabolomic studies are controversial

and should be interpreted with caution (Robertson, 2005).

Correlations between the degree of insulin resistance induced by high-fat feeding

and metabolomic profile could not be performed in the present study as insulin

resistance and metabolite profiling were performed on separate subsets of animals

because of limited serum volume. However, results show metabolite profiling to be

a predictor of body mass. This correlation may be weaker than expected as only

80% of the animals on high-fat diet became obese; the other 20% maintained body

masses in the range of chow-fed animals (>38 g). Such variation in dietary-induced

obesity in inbred rodent strains has been previously reported and is thought to arise

from subtle polygenic differences between animals (Schemmel et al., 1970; Levin et

al., 1987, 1997; West et al., 1992). Additionally, the present study examined

metabolites from a single serum sample. Given this, we could not determine which

tissues were primarily responsible for the observed metabolomic alterations with

high-fat feeding.

Examination of the literature shows the use of 1H-NMR spectroscopy to detect

complex disease states in humans to be highly speculative at best (Kirschenlohr et

al., 2006; Roussel et al., 2007). Mixed results are, in part, because of the complex

aetiology of these conditions and the heterogeneous nature of the populations

sampled. To date, the majority of studies have examined freezer samples of

sera/plasma collected from a large number of patients and show poor assessment of

disease in metabolomic profiles. A more appropriate strategy is to examine samples

collected under more defined conditions. An excellent example of this can be found

in the work of Salek et al. (2007), who compared spectral profiles in

urine from humans, rats and mice with type 2 diabetes. In this study, metabolomic

profiles of unmedicated human subjects, the db/db mouse and fa/fa rat (genetically

induced models of diabetes caused by leptin receptor mutation) were analysed.

Changes in metabolites involved in nucleotide and methylamine metabolism and in TCA

cycle intermediates were common between the animal and the human models. There

were a greater number of metabolites altered in the animal vs. human models, an

expected finding considering that these models exhibit extreme type 2 diabetic

phenotypes. Likewise, these models had greater changes in the number, severity and

type of metabolites with diabetes compared with the present study that employs a

milder, dietary-induced model of the disease. Overall, the study showed that control

and disease samples from each model could be clearly distinguished, data that may

eventually provide novel biomarkers for tracking type 2 diabetes in urine.

Metabolomic technologies encompassing 1H-NMR allow the examination of a large

number of metabolites from small volumes of serum. To the authors’ knowledge,

this is the first examination of mouse serum from chow and high fat–fed C57BL/6J

mice. Of note, this predictive ability was generated on a small subset of

experimental animals as they most closely reflect numbers used in common

laboratory experiments. Animals in the present study were fasted, negating any

direct effect of the diets on metabolomic profiles. Limitations of this study include

the measurement of whole serum and the presence of anaesthesia. The specific

effects of anaesthesia on metabolite profiles are not known. However, given the

volume of blood collected from each animal (1 ml), this was unavoidable.

In conclusion, results show NMR-based spectroscopy to be a predictor of dietary-

induced insulin resistance and body mass in mice. Such results are a first step in

validating the technique for further experimental use. It is important to note that we

are not suggesting that metabolomic profiling be used as a surrogate for the

detection of insulin resistance or insulin clamp studies. Indeed, future work will

need to establish the relationships between the severity of insulin resistance and

metabolomic profile.

Chapter 3 - Differentiating short- and long-term effects of diet in the obese

mouse using 1H-nuclear magnetic resonance metabolomics

G. E. Duggan, D. S. Hittel, C. C. Hughey, A. Weljie, H. J. Vogel, J. Shearer

Published: 2011-07-26 in Diabetes, Obesity and Metabolism

3.1 Abstract

This study determined whether targeted metabolomic profiling of serum, using 1H

nuclear magnetic resonance, could be employed to distinguish the effects of obesity

from those of diet in mice. Following weaning, littermates were randomly divided

into two diet groups: chow and high fat. After 12 weeks of dietary manipulation, fat-

fed animals were obese and hyperglycaemic. Mice from each treatment either

maintained their current diet or switched to the opposite diet for a final week.

Differences in metabolite levels were determined using orthogonal projection to

latent structures and cross-validated discriminant analysis. The short- and long-term

effects of each diet could be clearly distinguished. Short-term diet effects are the

major contributor to the metabolic profile, underscoring the need for controls

beyond the standard fast before serum collection. This work shows the importance

of dietary controls when attempting to isolate obesity-related changes and highlights

the ability of metabolomics to identify subtle changes when experiments are

properly structured.

3.2 Introduction

Obesity results from excess caloric consumption and a chronic mismatch between

energy consumption and expenditure. These conditions induce systemic changes in

metabolism, altering a number of biochemical pathways. While the study of

individual metabolites has been a staple of nutrition research, technological

advancements now allow for the simultaneous measurement of numerous

metabolites in small sample volumes. These ‘metabolomic’ studies are seeing an

expanded use as a tool for the study of obesity (Kim et al., 2009b). The technique is

relatively inexpensive, rapid, highly sensitive and provides an abundance of data. To

date, the technology has been proposed to monitor disease progression, identify

biomarkers and generate new therapeutic targets for obesity. Despite these perceived

opportunities, significant challenges remain.

One such caveat when applying metabolomics to the study of obesity is

distinguishing metabolites that are altered as a consequence of diet versus those

changing as a result of obesity (Zivkovic and German, 2009). Excess caloric intake,

independent of obesity, results in a number of metabolic changes. Overfeeding

(+50%) in healthy males for 5 days increases hepatic glucose production and insulin

secretion and alters incretin response (Morgan and Lazarow, 1962). All these

physiological changes would be expected to alter an individual's metabolic profile.

Given this, we sought to employ a factorial study design to distinguish the specific

effects of diet from those of obesity. We test both the sensitivity and specificity of

the technique by employing a ‘diet switch’ in the C57BL/6J mouse.

3.3 Methods

Procedures were approved by the University of Calgary Animal Care and Use

Committee and abide by the Canadian Association for Laboratory Animal Science

guidelines for animal experimentation. Animals were maintained in a humidity-

controlled room with a 12-h light : dark cycle. Following weaning (3 weeks), male

C57BL/6J littermates were randomly segregated into two groups and maintained in

microisolator cages for 1 week. Following this acclimation period, animals received

either chow (C) or high-fat diets (H; 60%) for 12 weeks (58M1, 58R3—TestDiet;

Purina, Richmond, IN, USA) (n = 20–24). At the end of 12 weeks, half of the chow-

fed animals (n = 10–12 per class) were switched to a diet of high fat (C-H) while the

remainder continued to receive chow feed for a final week (C-C). At the same time,

one half of the obese animals previously fed a high-fat diet were switched to the

chow diet (H-C). Animals remaining on a high-fat diet for the final week were

labelled H-H. Animal weight was measured every week during initial and final

dietary treatments (Figure S1, Table S1).

On the day of the experiment, animals were fasted for 6 h prior to being weighed

and euthanized (pentobarbital). Whole blood (~1 ml) was obtained by immediate

cardiac puncture to minimize anaesthetic effect on metabolism. Blood was placed

on ice and allowed to clot for 30 min. Samples were then centrifuged for 10 min at

3000 rpm and sera collected prior to storage at −80 °C. Blood glucose was assessed

using One Touch (Lifescan, Burnaby, BC, Canada). Non-esterified fatty acids were

measured spectrophotometrically (Wako NEFA C kit; Wako Chemicals, Richmond,

VA, USA). Immunoreactive insulin was assayed with a double-antibody method

(Morgan and Lazarow, 1962). Following blood collection, a small section of

the left liver lobe was isolated, cleaned and rapidly frozen in liquid nitrogen prior to

triglyceride analysis. Liver triglycerides were determined by a standard kit (Point

Scientific, Canton, MI, USA). Abundant data on the chow- and high-fat-fed

C57BL/6J mouse have been previously published (Fueger et al., 2007; Shearer et

al., 2008). Serum metabolites were prepared as previously described (Appendix S1)

(Shearer et al., 2008). Briefly, one-dimensional nuclear Overhauser effect

spectroscopy spectra were acquired using an automated Case sample changer on a

600-MHz Bruker Ultrashield Nuclear Magnetic Resonance (NMR) spectrometer

(Bruker Biospin, Karlsruhe, Germany). Processed spectra were imported into

Chenomx 4.6 for quantification (Chenomx Inc, Edmonton, Alberta, Canada). Urea,

which is often excluded from spectral profiling, was included notwithstanding its

semiquantitative nature, but caution is advised in its interpretation. For measures of

body mass, blood glucose and insulin, a two-way analysis of variance (ANOVA) was

performed to detect statistical differences (p < 0.05). Differences within the ANOVA

were determined using Tukey's post hoc test. All data are reported as means ± s.e.

For metabolomics data, all concentration measurements were exported into SIMCA-P

(Umetrics, Umeå, Sweden) for multivariate analysis. The SIMCA-P software suite

was used to construct an orthogonal projection to latent structures discriminant

analysis (OPLS-DA) model of both traditional and diet-switched samples. Here, the

relationships between dietary treatment and NMR spectra were assessed.
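The two-factor structure of the design (initial 12-week diet crossed with final-week diet) and the reporting of means ± s.e. can be sketched as follows. This is an illustrative outline only: the dictionary and function names and the glucose readings are hypothetical, and the ANOVA and Tukey tests themselves are not reproduced here.

```python
import math
from statistics import mean, stdev

# Class label -> (initial 12-week diet, final-week diet), as defined in the Methods.
DESIGN = {
    "C-C": ("chow", "chow"),
    "C-H": ("chow", "high fat"),
    "H-H": ("high fat", "high fat"),
    "H-C": ("high fat", "chow"),
}


def mean_and_se(values):
    """Return (mean, standard error of the mean) for one class."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def summarize(measurements):
    """Summarize {class label: [values]} as {class label: (mean, s.e.)}."""
    return {label: mean_and_se(vals) for label, vals in measurements.items()}


# Hypothetical glucose readings (mM), for illustration only.
example = {
    "C-C": [14.0, 15.0, 13.0, 14.0],
    "H-H": [18.0, 19.0, 17.0, 20.0],
}
summary = summarize(example)
```

Keeping the two diet periods as separate factors is what allows a two-way analysis to attribute a difference to the long-term diet, the final-week diet, or their interaction.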

3.4 Results

Metabolomics is highly sensitive and broad in scope, but must be coupled with

proper experimental design to ensure specificity. This is especially true when

quantifying complex and subtle biochemical effects such as those occurring in

obesity. In this study, we set out to identify whether metabolomics could

differentiate the short- and long-term effects of chow and high-fat diets. Employing

OPLS-DA, we show clear separation between classes (Figure 3.1A). The major

finding of this work shows that the final week's diet was the predominant

contributor to the metabolic profile (Figure 3.1B). When variation in weight gains

was considered by comparing outliers to distribution of body weights within the

class, the association became even stronger.

The study design allowed us to distinguish the specific effects of diet from those of

obesity in the mouse. The fat-fed C57BL/6J mouse is an established model for

studying obesity (Park et al., 2005). Data on body mass, plasma values and liver

triglycerides are shown in Table 3.1. Exposure to a fat-enriched diet (H-H) for an

extended period results in long-term obesity and hyperglycaemia. Switching from a

high-fat diet to a chow diet (H-C) had no impact on obesity, but normalized plasma

glucose levels. Unexpectedly, the diet switch protocol (H-C), although brief in

duration, normalized liver triglyceride levels. Previous studies from our laboratory

have shown a 50% reduction in whole-body glucose disposal in the high-fat-fed

C57BL/6J mouse using a hyperinsulinaemic–euglycaemic clamp as well as

dysregulation of a number of energy- and signalling-related pathways (Shearer et

al., 2008).

Figure 3.1: OPLS results

(A) Orthogonal projection to latent structures discriminant analysis (OPLS-DA) scores showing separation between C-C and H-H samples, the two groups employed in the majority of obesity studies. (B) Predicted scores for diet-switched samples. Comparison with (A) shows that C-H samples fall into the same range as H-H samples while H-C samples were more similar to C-C samples. This shows how the final week diet masks the initial obesogenic diet of interest. Upon reinspection, the C-H outliers were found to have gained a significantly higher weight during the initial dietary treatment, which may explain their differentiated metabolism. (C) Shared and unique structure plot comparing individual metabolite changes in the simple and diet-switched studies. OPLS coefficients are shown on the x-axis for animals remaining on the same diet throughout (H-H vs. C-C, no diet switch). The y-axis shows a second model comparing (diet switch controlled) H-C animals to the same C-C baseline. Significant metabolites (as determined by a t-test on loadings in each model) are highlighted in black. Metabolites found on the diagonal are responsive to obesity (valine, leucine and glutamine), whereas off-axis metabolites (pyruvate and alanine) change as a result of diet. Urea (marked with an asterisk) should be regarded as a semiquantitative measurement because of peak suppression in the nuclear magnetic resonance spectrum; nevertheless its biochemical significance merits some consideration, albeit with caution.

Class label                    C-C             C-H             H-H             H-C
Dietary fat content            Low/low         Low/high        High/high       High/low
Number of animals              10              9               9               10
12-week preswitch mass (g)     40 ± 1ᵃ         39 ± 2ᵃ         49 ± 1ᵇ         47 ± 2ᵇ
13-week postswitch mass (g)    40 ± 1ᵃ         38 ± 2ᵃ         48 ± 1ᵇ         45 ± 2ᵇ
Body mass change (g)           0.1 ± 0.2ᵃ      1.2 ± 0.3ᵃᵇ     0.8 ± 0.3ᵃᶜ     1.8 ± 0.6ᵇ
Glucose (mM)                   14.3 ± 0.6ᵃ     14.0 ± 1.0ᵃ     18.4 ± 1.1ᵇ     13.4 ± 0.9ᵃ
Insulin (pg/ml)                6.0 ± 1.0       6.5 ± 1.1       7.4 ± 1.4       7.2 ± 1.4
NEFA (mM)                      1.12 ± 0.17ᵃᵇ   1.62 ± 0.23ᵃ    1.41 ± 0.13ᵃᵇ   0.91 ± 0.12ᵇ
Liver triglycerides (mg/g)     112 ± 11ᵃ       196 ± 30ᵇ       237 ± 24ᵇ       128 ± 15ᵃ

Table 3.1: Univariate measures of obesity

Body mass, plasma glucose, insulin, non-esterified fatty acids (NEFAs) and liver triglyceride levels are shown for C57BL/6J mice. Different superscripts indicate significantly different classes within each row (p < 0.05). Some indicators vary with initial 12-week diet, whereas others align to the final week's diet. Data represent means ± s.e., n = 9–12 animals per treatment.

Comparison of all four treatments in this study allowed us to unmask the short-term

changes of diet from those specifically related to obesity. Individual metabolites

changing in response to the diet switch are shown in Figure 3.1C. Metabolites found

near the diagonal (e.g. valine, leucine, isoleucine, glutamine and glutamate)

responded similarly regardless of the treatment in the final week. These compounds

show bona fide changes associated with obesity. Numerous laboratories, including

our own, have previously identified these metabolites to be altered with

obesity (Shearer et al., 2008; Newgard et al., 2009; Connor et al., 2010), but the

results are often confounded with changes in other metabolites. Although the

metabolomics platforms utilized in these studies are sensitive, they often lack

dietary controls resulting in low specificity.
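The reading of the shared and unique structure plot described above can be expressed as a simple rule: a metabolite whose coefficient is similar in both models lies near the diagonal, while one whose coefficient changes once the final-week diet is controlled lies off the axis. The sketch below illustrates that rule only. The coefficient values, the tolerance and the minimum effect size are hypothetical, and the significance testing on loadings used in the figure is not reproduced.

```python
def classify_sus(coef_no_switch, coef_switch, tol=0.1, min_effect=0.1):
    """Label each metabolite by its position on a shared/unique structure plot.

    coef_no_switch: coefficients from the H-H vs. C-C model (x-axis).
    coef_switch:    coefficients from the H-C vs. C-C model (y-axis).
    tol and min_effect are illustrative cut-offs, not values from the study.
    """
    labels = {}
    for name, x in coef_no_switch.items():
        y = coef_switch[name]
        if max(abs(x), abs(y)) < min_effect:
            labels[name] = "unchanged"          # close to the origin
        elif abs(x - y) <= tol:
            labels[name] = "shared (obesity)"   # near the diagonal
        else:
            labels[name] = "unique (diet)"      # off the diagonal
    return labels


# Hypothetical coefficients, chosen only to mirror the pattern described in the text.
x_model = {"valine": 0.40, "leucine": 0.35, "pyruvate": 0.45, "alanine": 0.30, "acetate": 0.03}
y_model = {"valine": 0.38, "leucine": 0.30, "pyruvate": 0.02, "alanine": -0.05, "acetate": 0.02}
labels = classify_sus(x_model, y_model)
```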

Examination of off-axis metabolites in Figure 3.1C showed that several compounds

responded differently when the final week diet returned to a normal fat content (H-C).

These transient changes result from the diet and not obesity per se. For example,

glucose levels were lowered but succinate was elevated with H-C. Metabolites

involved in short-term energy regulation (e.g. creatine, ornithine and taurine) were

also clustered. All these metabolites are influenced by the animals' energy intake

and general health, but revert to normal levels in the H-C animals, casting doubt

on their biological relevance to obesity. Pyruvate, alanine and lactate

(lower right quadrant in Figure 3.1C) were elevated in H-H animals (and C-H

animals), which suggests a suppression of carbohydrate metabolism on a high-fat

diet. As this ‘metabotype’ does not persist in H-C animals, it may result from

hyperglycaemia that was resolved with the diet switch in this group.

Limitations of this study include no information on energy consumption and energy

balance in individual animals. Analysis of body mass as a function of metabolomic

profile also reveals the presence of outliers (n = 2), animals that did not gain the

expected weight when placed on a high-fat diet (Figure S2). The presence of these

animals did not alter the model outcomes. Finally, the diet switch results in slight

weight loss in both C-H and H-C. The effects of this loss on the terminal

metabolomic profile cannot be discounted.

3.5 Conclusions

In this study, NMR was employed as a highly sensitive tool to discriminate the

effects of diet from those of obesity in the mouse. Novel findings show

metabolomic profile to be predominantly influenced by recent diet, despite fasting

prior to serum collection. Short-term diet effects were mainly centred around

changes in energy metabolism and glucose utilization, whereas obesity-related

changes were focused on amino acids and large non-polar molecules. This

underscores the importance of dietary controls when profiling obese states,

especially when highly sensitive techniques are employed.

Chapter 4 - Metabolomic response to exercise training in lean and diet-induced

obese mice

Gavin E. Duggan, Dustin S. Hittel, Christoph W. Sensen, Aalim M. Weljie, Hans J. Vogel, and Jane Shearer

Published: 2011-01-23 in the Journal of Applied Physiology

4.1 Abstract

Exercise training is a common therapeutic approach known to antagonize the

metabolic consequences of obesity. The aims of the present study were to examine

1) whether short-term, moderate-intensity exercise training alters the basal

metabolite profile and 2) if 10 days of mild exercise training can correct obesity-

induced shifts in metabolic spectra. After being weaned, male C57BL/6J littermates

were randomly divided into two diet groups: low fat (LF) or high fat (HF). After 12

wk of dietary manipulation, HF animals were obese and hyperglycemic compared

with LF animals. Mice from each group were further divided into sedentary or

exercise treatments. Exercise training consisted of wheel running exercise (2 h/day,

10 days, 5.64 m/min). After exercise training, animals were rested (36 h) and fasted

(6 h) before serum collection. Samples were analyzed by high-resolution one-

dimensional proton NMR. Fifty high- and medium-concentration metabolites were

identified. Pattern recognition algorithms and multivariate modeling were used to

identify and isolate significant metabolites changing in response to HF and exercise

training. The results showed that while exercise can mitigate some of the abnormal

patterns in metabolic spectra induced by HF diet feeding, it cannot negate them. In

fact, when the effects of diet and exercise were compared, diet was a stronger

predictor and had the larger influence on the metabolic profile. External validation

of models showed that diet could be correctly classified with an accuracy of 89%,

whereas exercise training could be classified 73% of the time. The results

demonstrate metabolomics to effectively characterize obesity-induced perturbations

in metabolism and support the concept that exercise is beneficial for this condition.

4.2 Introduction

Obesity is reaching epidemic proportions in the population. The health implications

of obesity are well known and include an increased mortality from cardiovascular

disease, type 2 diabetes, and cancer (Stamler et al., 1993; Bonora et al., 1996;

Giovannucci and Michaud, 2007). A common therapeutic approach to treating

obesity is exercise. Exercise increases caloric expenditure, alters patterns of

substrate utilization, and enhances whole body insulin sensitivity (Yamanouchi et

al., 1995). However, the volume, intensity, and specific mechanisms underlying the

beneficial health effects of exercise are debatable and often difficult to assess.

A largely unexplored yet highly sensitive tool to examine the effects of exercise

training is metabolomics. Metabolomics is the quantitative assessment of small low-

molecular-weight metabolites in response to physiological or pathophysiological

stimuli. While the study of metabolites has been a staple of exercise physiology

(e.g., blood lactate), advances in technology now allow for the simultaneous

quantification of numerous metabolites in a small sample volume. This unbiased,

systematic approach, or “metabolomic profile,” is powerful in that it permits the

investigator to evaluate the fluctuations of a biochemical pathway and its individual

components. Unlike transcripts or proteins, which often take hours, days, or even

weeks to change, metabolites are the end products of a reaction and closely reflect

underlying physiological processes (Weljie et al., 2006). For this reason, the outputs

of metabolomics are highly sensitive indicators of an organism's physiological or

disease status (Glassbrook et al., 2000; Glassbrook and Ryals, 2001).

Using the common laboratory mouse, the present study aimed to determine 1)

whether short-term, moderate-intensity exercise training alters basal, whole body

metabolite profiles and 2) if exercise training can correct obesity-induced shifts in

metabolic spectra. Our laboratory has previously characterized the metabolomic

profile of diet-induced obesity in this model (Shearer et al., 2008). The results of the

present study showed that 10 days of low-intensity exercise training was sufficient

to normalize blood glucose in obese animals and induce global metabolomic shifts

in both lean and obese animals. When diet and exercise treatments of rested animals

were compared, diet rather than exercise training had the most pronounced effects

on the metabolomic profile, with obese animals displaying perturbations in

branched-chain amino acids (BCAAs), large noncharged amino acids, and numerous

mediators of insulin signaling.

4.3 Methods

Mouse maintenance

Procedures were approved by the Animal Care and Use Committee of the

University of Calgary and abided by the Canadian Association for Laboratory

Animal Science guidelines for experimentation. Animals were maintained in a

humidity-controlled room with a 12:12-h light-dark cycle. After being weaned (3

wk of age), male C57BL/6J littermates were randomly segregated into two groups

and maintained in microisolator cages for 1 wk. After this acclimation period,

animals received either a low-fat (LF) diet or a high-fat (HF) diet for 12 wk (58R3,

TestDiet, Purina, Richmond, IN). The energy density of the LF diet (5001

Laboratory Rodent Diet, Purina) was 234.0 g/kg of energy as protein, 45.0 g/kg of

energy as fat, and 499.0 g/kg of energy as carbohydrate. The HF diet contained

197.4 g/kg of energy as protein, 358.0 g/kg of energy as fat, and 358.2 g/kg of

energy as carbohydrate. The primary source of fat in this diet was lard. Both diets

met all nutritional requirements of adult mice. To encourage the development of

diet-induced obesity, food and water were provided ad libitum throughout the

experiment. Food consumption was not monitored or controlled throughout the

study.

Exercise training

After 12 wk of dietary manipulation, mice were randomly assigned to either

sedentary (SED) or exercise treatment (EX) groups (n = 9–12 mice/treatment).

Exercise was performed on a rotating treadmill wheel system (model 80800A-10,

Lafayette Instruments, Lafayette, IN) during daylight hours (9–11 AM).

Acclimation to exercise consisted of 15 min of treadmill exercise at 5.64 m/min in

EX animals followed by a 3-day rest period. Exercise was then performed at a

constant intensity for a defined duration (5.64 m/min, 2 h/day) for 10 consecutive

days. Exercise intensity was chosen so that obese mice could complete the entire

protocol. To account for stress induced by animal handling, SED animals were also

placed in a stationary treadmill for both acclimation and exercise treatments.

Animal experimentation.

At the end of the exercise period, animals were rested for 36 h. This rest period was

chosen to negate any direct effects of exercise. Previous work (Cartee et al.,

1989) has shown that 18 h of rest is sufficient for the acute effects of exercise on

insulin responsiveness to subside. This rest included a 6-h fasting period to negate

the effects of postprandial food absorption. Animals were then weighed and

anesthetized (pentobarbital). Whole blood (~1 ml) was obtained by a cardiac

puncture, placed on ice, and allowed to clot for 30 min. Samples were then

centrifuged for 10 min (3,000 rpm), and sera were collected before storage at

−80°C. Blood glucose was assessed in anesthetized mice (One Touch, Lifescan,

Burnaby, BC, Canada). Nonesterified fatty acids (NEFAs) were measured

spectrophotometrically (Wako NEFA C kit, Wako Chemicals, Richmond, VA).

Immunoreactive insulin was assayed with a double-antibody method (Morgan

and Lazarow, 1962). Abundant data on LF and HF diet-fed C57BL/6J mice have

been previously published (Fueger et al., 2005; Ayala et al., 2007).

Metabolite sample preparation.

Serum samples of ~0.3 ml were stored at −80°C before data acquisition. Samples

were thawed and filtered twice using 3-kDa NanoSep microcentrifuge filters, which

had been prewashed to reduce preservative contamination. The filtrate was

transferred to clean microfuge tubes; the final sample volume ranged from 100 to

300 µl. Samples were brought to 450 µl by the addition of 140 µl phosphate buffer

containing dimethyl silapentane sulfonate (DSS; final concentration: 0.5 mM), 40 µl

sodium azide, and distilled H2O. The final sample pH was adjusted to 7.0 ± 0.01.

Spectrum acquisition.

One-dimensional nuclear Overhauser effect spectroscopy (NOESY) spectra were

acquired, in two batches, using an automated NMR case sample changer on a 600-

MHz Bruker Ultrashield spectrometer. The NOESY pulse sequence had a mixing

time of 100 ms and a water presaturation pulse. Samples were individually shimmed

to ensure a half-height line width of ~0.8 Hz for the major DSS peak, which was

calibrated to 0.0 ppm. Spectra were acquired with 768 scans, zero padded, and

Fourier transformed to 64,000 points. Standard postprocessing of the spectra

included deletion of the water region, B-spline baseline correction, reference

deconvolution, and calibration of the DSS peak. A 1H-13C heteronuclear single

quantum coherence (HSQC) spectrum of one sample, chosen at random, was

acquired for peak assignment and verification.

Metabolite concentration profiling.

Processed spectra were imported into Chenomx software (version 4.6; Chenomx,

Edmonton, AB, Canada) for quantification. Fifty compounds were profiled based on

chemical shift assignments verified by 1H-13C two-dimensional HSQC. Spectra were

randomly ordered for fitting in the Chenomx Profiler to avoid progressive bias.

Compounds were fit from the highest initial concentration to the lowest, with an

iterative reinspection. After all samples had been quantified, Perl scripts were used

to detect inconsistencies in peak assignment for further reinspection. Each

measurement was normalized to the mean sample concentration by dividing each

profiled spectral concentration by the total concentration of all profiled metabolites

in that sample (Weljie et al., 2006). For univariate reporting and analysis, profiled

concentrations were scaled back to the mean sample dilution.
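
The normalization described above can be sketched as follows. This is an illustration only, not the scripts used in this study: the function names and the example concentrations are hypothetical, and rescaling by the mean total concentration across samples is an assumed reading of "scaled back to the mean sample dilution".

```python
def normalize_to_total(samples):
    """Divide each concentration by the total profiled concentration of its sample."""
    normalized = []
    for sample in samples:
        total = sum(sample.values())
        normalized.append({name: conc / total for name, conc in sample.items()})
    return normalized


def rescale_to_mean_total(samples):
    """Rescale the normalized fractions by the mean total across all samples.

    Assumption: this is what scaling back to the mean sample dilution means.
    """
    mean_total = sum(sum(s.values()) for s in samples) / len(samples)
    return [
        {name: frac * mean_total for name, frac in s.items()}
        for s in normalize_to_total(samples)
    ]


# Hypothetical concentrations (uM): the second sample is a 2x dilution of the first.
samples = [
    {"valine": 100.0, "leucine": 60.0, "lactate": 40.0},
    {"valine": 50.0, "leucine": 30.0, "lactate": 20.0},
]
fractions = normalize_to_total(samples)
rescaled = rescale_to_mean_total(samples)
```

In this toy case the two samples differ only in dilution, so they yield identical normalized and rescaled values, which is the purpose of the normalization.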

Multivariate analysis.

Projection to Latent Structures (PLS) is a family of pattern recognition tools used to

model differences between treatment groups. Normalized concentration measures

were imported into SIMCA-P software (Umetrics, Sweden) for multivariate pattern

analysis. To evaluate the interaction and separation of diet- and exercise-related

effects, PLS was used to characterize the metabolomic changes seen within the four

treatment groups. Centered and scaled PLS coefficients were used to evaluate the

relative importance of metabolites in each model. Because of its ability to further

isolate changes in metabolism specific to each treatment, Orthogonal PLS (OPLS)

was subsequently used to isolate diet-related changes in each exercise group. An

OPLS model comparing LF and HF diet-fed animals in the SED treatment was

created (Y = 0 and Y = 1, respectively). An identical model was constructed for the

EX animals, and the resulting model structures were compared.
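
To illustrate how a class variable coded Y = 0 and Y = 1 drives such a model, the following is a minimal one-component PLS sketch on centered and scaled data. It is not the SIMCA-P implementation and it omits the orthogonal filtering step that distinguishes OPLS; the function names and the concentrations are hypothetical.

```python
import math


def autoscale(columns):
    """Center each variable to mean 0 and scale it to unit sample standard deviation."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def pls1_one_component(columns, y):
    """Return weights w, sample scores t and y-loading q of a one-component PLS1 model."""
    x = autoscale(columns)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Each weight is the covariance of a variable with y, normalized to unit length.
    w = [sum(xi * yi for xi, yi in zip(col, yc)) for col in x]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Each score is the projection of a sample onto the weight vector.
    t = [sum(w[j] * x[j][i] for j in range(len(x))) for i in range(len(y))]
    # Regress centered y on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, t, q


# Hypothetical data: three LF (Y = 0) and three HF (Y = 1) animals.
diet = [0, 0, 0, 1, 1, 1]
valine = [130.0, 135.0, 128.0, 100.0, 98.0, 104.0]
glucose = [12.9, 13.1, 12.5, 17.5, 18.0, 17.2]
w, t, q = pls1_one_component([valine, glucose], diet)
```

A negative weight marks a metabolite that is lower in the Y = 1 class and a positive weight one that is higher; the scores then separate the two classes along a single axis.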

Model validation

The strength and reliability (“goodness of fit”) of models derived from

metabolomics data were assessed by comparing the percentage of variation captured

in known (modeling) and unknown (testing) samples. Here, two samples from each

treatment group or 26% of the samples collected in the study were randomly

selected and blinded to the investigator. Samples were then examined for fit into

previously described OPLS models. Cross validation was then performed to test

each model's ability to predict the class of samples not used in its creation.

Jackknifing was used to calculate SEs and t-value 95% confidence intervals for PLS

coefficients.
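
The jackknife calculation can be sketched generically as below. The internal procedure of SIMCA-P is not reproduced here; the statistic (a simple mean), the input values, and the function names are hypothetical stand-ins for PLS coefficients re-estimated with one sample left out.

```python
import math


def jackknife_se(values, statistic):
    """Leave-one-out jackknife estimate and standard error of a statistic."""
    n = len(values)
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return mean_loo, math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Hypothetical coefficient-like values from seven samples.
coefs = [-0.05, -0.04, -0.06, -0.05, -0.03, -0.07, -0.05]
estimate, se = jackknife_se(coefs, mean)

# Two-sided 95% Student t critical value for n - 1 = 6 degrees of freedom.
t_crit = 2.447
ci = (estimate - t_crit * se, estimate + t_crit * se)
```

A coefficient whose interval excludes zero would be treated as significant at the 5% level under this scheme.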

Network analysis

To establish the relationships between metabolites and insulin signaling, metabolite

loadings were exported into the Ingenuity Pathway Analysis (IPA) tool (Ingenuity

Systems, Redwood City, CA; http://www.ingenuity.com). A separate IPA was

performed for SED and EX animals; in each case, only loadings for metabolites

changing significantly (P < 0.05) in response to the HF diet were included in the

IPA.

Protein determination

Total protein and phosphorylation of Akt and mammalian target of rapamycin

(mTOR) were determined in skeletal muscle (gastrocnemius) and liver lysates.

Lysates were prepared in Laemmli buffer and separated by SDS-PAGE. Samples

were resolved on 4–12% bis-Tris SDS-PAGE gels (Invitrogen, Carlsbad, CA)

followed by electrophoretic transfer to polyvinylidene difluoride membranes

(Millipore, Billerica, MA). Membranes were blocked in 2% nonfat milk diluted in

Tris-buffered saline (TBS) containing 0.05% Tween 20. Membranes were probed

with primary antibodies overnight at 4°C and then incubated with secondary

antibodies for 1 h at room temperature. Phospho-Akt (Ser473) and phospho-mTOR

(Ser2448) were normalized to total Akt and mTOR as well as GAPDH, which was

run as a secondary loading control (data not shown). Primary antibodies were as

follows: total Akt, phospho-Akt (Ser473), total mTOR, and phospho-mTOR (Ser2448)

(all from Cell Signaling Technology, Boston, MA) and GAPDH (Abcam,

Cambridge, MA). All antibodies were diluted in 2% nonfat milk diluted in TBS

containing 0.05% Tween 20. Membranes were washed in TBS containing 0.05%

Tween 20. Densitometry was performed using GeneTools (Syngene, Frederick, MD).

Statistical analysis

Differences between body weight, blood glucose, insulin, and NEFAs were

determined using two-way ANOVA. A significance level of P < 0.05 was used, and

differences with ANOVA were determined using a Tukey's post hoc test. Two-way

ANOVA was used to calculate individual F-test significance for each metabolite.

Metabolites were considered to be significant if they had a P value of <0.05 in both

OPLS models or a univariate significance of P < 0.05 for either factor. All data are

reported as means ± SE.
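
The inclusion rule for significant metabolites can be written as a short predicate. This is a sketch under stated assumptions: the function name and the example P values are hypothetical, and the list of univariate P values is assumed to hold whichever ANOVA factor tests are being considered.

```python
ALPHA = 0.05


def is_significant(anova_p_values, opls_p_sed, opls_p_ex, alpha=ALPHA):
    """Apply the inclusion rule: any univariate factor test, or both OPLS models."""
    univariate = any(p < alpha for p in anova_p_values)
    multivariate = opls_p_sed < alpha and opls_p_ex < alpha
    return univariate or multivariate


# Hypothetical P values for three metabolites.
by_anova = is_significant([0.003, 0.61], opls_p_sed=0.20, opls_p_ex=0.30)
by_opls = is_significant([0.40, 0.55], opls_p_sed=0.01, opls_p_ex=0.04)
excluded = is_significant([0.40, 0.55], opls_p_sed=0.01, opls_p_ex=0.30)
```

The third case shows that a metabolite significant in only one of the two OPLS models does not qualify on multivariate grounds alone.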

4.4 Results

Animal characteristics

After dietary manipulation, HF diet-fed animals became obese, with average

weights of 30.8 ± 0.9 and 46.6 ± 1.5 g for LF and HF diet-fed animals, respectively

(P < 0.05). These data confirm those of other studies (Ayala et al., 2007)

demonstrating that HF diet feeding in the C57BL/6J mouse causes significant

obesity and insulin resistance, disposing of ~50% less glucose compared with chow-

fed animals. Animal characteristics are shown in Table 4.1. Exercise training

resulted in a slight weight loss

in both LF and HF diet-fed animals (P < 0.05). This finding was unexpected given

the short duration and mild nature of the EX regime. Fasting blood glucose levels

were elevated in HF diet-fed SED animals but not HF diet-fed EX animals,

indicating the EX protocol was effective in abolishing diet-induced hyperglycemia

(P < 0.05). Fasting NEFA levels were not different between groups. Plasma insulin

levels were greater in HF diet-fed animals compared with LF diet-fed animals

regardless of the exercise intervention (P < 0.05). However, no differences were

noted between HF diet-fed SED animals and HF diet-fed EX animals.

                        LF-SED       LF-EX        HF-SED       HF-EX
Animals/Group           10           10           10           9
Initial Body Mass (g)   32.1±1.3     29.4±1.2     47.8±1.8 *   45.4±2.7 *
Final Body Mass (g)     32.5±1.3     28.6±0.8     47.7±1.5 *   39.9±1.8 *†
Percent Change Mass     1.3±1.0      -2.6±1.6 †   0.0±1.1      -11.4±2.5 *†
Glucose (mM)            12.9±0.8     13.7±1.4     17.7±1.0 *   12.8±1.5 †
NEFA (mM)               1.1±0.2      1.1±0.1      1.3±0.2      1.4±0.3
Insulin (ng/ml)         5.4±0.9      4.1±1.1      7.1±0.8 *    7.2±1.2 *

Table 4.1: Change in body mass, glucose, NEFA and Insulin levels

Values are means ± SE. NEFA, non-esterified fatty acids; LF, low fat; HF, high fat; SED, sedentary; EX, exercise. Body weight was assessed before and after the exercise intervention. * P < 0.05, significance between LF and HF diets within a treatment (SED or EX); † P < 0.05 between SED and EX groups within a diet.

Model validation

Test samples used for validation were blinded to the investigator and not used in

model creation. While n = 9–12 samples were obtained, only n = 7–8 samples are

reported in the models. The remaining samples were used for validation purposes.

The results showed that there was a greater variation of metabolites due to diet than

exercise treatment. This can be seen in the greater spread between samples in the

score plot (Figure 4.1A). Cross-validation tests of model quality demonstrated diet

class prediction with 89% accuracy in EX animals and 73% in SED animals. Given

the limited sample size used in model creation (n = 7–8 samples/treatment) and the

complex physiology of the diet and exercise regimes, these validation scores

indicate that the generated PLS models were veridical.

Model interpretation.

Although diet and exercise manipulations resulted in distinct separation of

treatments, significant interactions between groups were visible (Figure 4.1A). To

separate these interactions, the data were modeled with OPLS (Figure 4.1B).

Analysis of the resulting loadings (Figure 4.1B) showed the response of both SED

and EX animals to HF diet feeding. Metabolites on the dashed line of identity in

Figure 4.1B responded similarly to HF diet feeding, regardless of exercise.

Conversely, metabolites closer to the top left or bottom right corner of Figure 4.1B

responded oppositely to the HF diet depending on the animal's exercise regime.
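
The reading of Figure 4.1B described above amounts to comparing the sign of each metabolite's OPLS diet coefficient in SED and EX animals. The sketch below uses coefficients reported in Table 4.3; the function name is hypothetical, and the comparison ignores both the significance of each coefficient and its distance from the line of identity.

```python
def response_pattern(coef_sed, coef_ex):
    """Compare the direction of the HF diet response in SED and EX animals."""
    product = coef_sed * coef_ex
    if product > 0:
        return "same direction"
    if product < 0:
        return "opposite direction"
    return "no change in at least one group"


# OPLS diet coefficients (SED, EX) taken from Table 4.3.
coefficients = {
    "isobutyrate": (-0.050, -0.085),
    "acetate": (-0.071, -0.229),
    "glutamine": (0.027, 0.048),
    "glucose": (0.036, -0.073),
    "ornithine": (0.037, -0.051),
}
patterns = {name: response_pattern(sed, ex) for name, (sed, ex) in coefficients.items()}
```

Glucose and ornithine, the two metabolites named in the caption of Figure 4.1B, are the ones in this subset whose coefficients change sign between SED and EX animals.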

Individual metabolites.

To determine the individual metabolites changing in response to exercise, diet, and

the diet-exercise interaction, two-way ANOVA was performed. Individual

concentrations of significant metabolites in each treatment group are shown in Table

4.2. A metabolite was considered significant if it had a univariate ANOVA factor P

< 0.05 or a multivariate P value of <0.05 for both EX and SED animals. Statistical

differences and interactions for each metabolite in Table 4.2 are shown in Table 4.3.

A column plot showing the magnitude and direction of change for individual

metabolites for SED and EX animals is shown in Figure 4.2A.

When exercise and diet treatments were directly compared, diet had the most

profound effects on the metabolite profile. Given the animals were rested for 36 h

before serum collection, this finding was not unexpected. Of note, the HF diet

caused a lower concentration in BCAAs (Figure 4.2B) as well as many large polar

amino acids, including phenylalanine and tyrosine. In addition, taurine and

methionine, key metabolites involved in lipid homeostasis and insulin sensitivity,

were lower in HF diet-fed animals (Table 4.2).

Figure 4.1: Class separation and comparison of SED vs EX response

A: two-dimensional score plot showing the Projection to Latent Structures (PLS) model's ability to separate samples based on diet (horizontal dimension) and exercise (vertical dimension). Multivariate scores represent each sample's composition; the corresponding axes of the score plot represent how much of each pattern is seen in a sample. Validation scores for the PLS model for both diet and exercise were 44%, 72%, and 27% for R²X, R²Y, and Q²Y, respectively. The model was generated with n = 7–8 samples/treatment. LF, low-fat diet; HF, high-fat diet; SED, sedentary treatment; EX, exercise treatment.

B: comparison of metabolic responses in EX and SED animals subjected to a HF diet. Metabolites on the dashed line of identity responded similarly regardless of exercise; in contrast, metabolites further to the right rose more in SED animals and did not respond in EX animals (glucose and ornithine). Multivariate significance of each response is indicated by the open circles (significant in one population) or solid circles (both populations) [by Orthogonal PLS (OPLS)]. OPLS validation scores of diet for SED animals were 79%, 95%, and 73%, whereas EX animals had scores of 73%, 98%, and 89% for R²X, R²Y, and Q²Y, respectively. The plot represents 7–8 animals/treatment. TMAO, trimethylamine oxide.

                          LF Diet                            HF Diet
Metabolite                SED              EX                SED               EX
Isobutyrate               15 ± 1           17 ± 2            10 ± 1            9 ± 1
Trimethylamine oxide      29 ± 4           39 ± 7            16 ± 2            19 ± 3
Creatine                  117 ± 23         139 ± 21          80 ± 16           73 ± 7
Valine                    132 ± 8          150 ± 13          100 ± 7           99 ± 17
3-Methyl-2-oxovalerate    26 ± 3           35 ± 9            23 ± 1            21 ± 4
Phenylalanine             46 ± 2           59 ± 8            39 ± 2            34 ± 3
Isoleucine                62 ± 6           79 ± 6            48 ± 4            42 ± 9
Leucine                   100 ± 8          106 ± 10          75 ± 8            66 ± 7
Taurine                   605 ± 75         860 ± 201         502 ± 47          402 ± 42
Glycine                   213 ± 17         277 ± 42          193 ± 35          158 ± 21
O-acetylcarnitine         10 ± 0           10 ± 0            6 ± 1             8 ± 1
Choline                   56 ± 9           80 ± 22           40 ± 6            36 ± 3
Glutamate                 69 ± 10          97 ± 16           52 ± 5            51 ± 9
Lactate                   6,639 ± 598      7,766 ± 1058      5,451 ± 656       4,479 ± 624
Tyrosine                  52 ± 2           65 ± 4            47 ± 4            40 ± 6
Methionine                32 ± 2           37 ± 4            23 ± 2            24 ± 5
Acetate                   233 ± 22         161 ± 16          100 ± 8           75 ± 9
Glutamine                 493 ± 16         446 ± 39          517 ± 29          414 ± 39
Glucose                   12,900 ± 788     13,677 ± 1394     17,730 ± 1009     12,766 ± 1446
Ornithine                 55 ± 8           64 ± 10           64 ± 10           37 ± 4

Table 4.2 Individual metabolite changes in response to exercise and diet

Values are means ± SE (in µM); n = 7–8 samples/treatment. Data were categorized by treatment groups for all significant metabolites. Metabolites were considered significant if any factor ANOVA P values or both Orthogonal Projection to Latent Structures (OPLS) coefficient P values were <0.05. Statistical data for all metabolites are shown in Table 4.3.

                          Two-Way ANOVA (F-test P Value)         OPLS Diet Coefficient
Metabolite                Diet       Exercise   Interaction      SED         EX
Isobutyrate               <0.005     0.858      0.232            −0.050†     −0.085†
Trimethylamine oxide      <0.005     0.491      0.898            −0.050†     −0.158†
Creatine                  <0.005     0.434      0.559            −0.038*     −0.016
Valine                    <0.005     0.963      0.554            −0.050*     −0.027
3-Methyl-2-oxovalerate    <0.005     0.770      0.137            0.002*      −0.052*
Phenylalanine             <0.005     0.932      0.092            −0.041†     −0.068†
Isoleucine                <0.005     0.930      0.078            −0.048†     −0.204†
Leucine                   <0.005     0.375      0.845            −0.050*     −0.099†
Taurine                   <0.005     0.73       <0.05            −0.033†     −0.061*
Glycine                   <0.05      0.572      0.226            −0.032*     −0.040
O-acetylcarnitine         <0.05      0.666      0.431            −0.045*     −0.058*
Choline                   <0.05      0.550      0.255            −0.022      −0.100*
Glutamate                 <0.05      0.301      0.239            −0.022*     −0.042*
Lactate                   <0.05      0.861      0.493            −0.039†     −0.029
Tyrosine                  <0.05      0.902      0.198            −0.028*     −0.044†
Methionine                <0.05      0.959      0.946            −0.056*     0.016
Acetate                   <0.005     <0.005     0.179            −0.071*     −0.229†
Glutamine                 0.374      <0.005     0.624            0.027*      0.048
Glucose                   0.300      <0.005     <0.05            0.036†      −0.073*
Ornithine                 0.892      0.103      <0.05            0.037*      −0.051

Table 4.3 Univariate and multivariate significance scores

n=7-8 samples/treatment. * P < 0.05; † < 0.05

Figure 4.2 Significance of changes in metabolite levels

(A) column plot representing the magnitude and directionality of change for significant metabolites (OPLS diet coefficients) in SED animals (solid bars) and EX animals (shaded bars). Each bar represents the difference between HF and LF (CH) diet-fed animals within an exercise treatment. For example, acetate was lower in both SED and EX animals; however, the magnitude of the change was greater in the EX group. Nonsignificant coefficients are indicated by the open bars for comparison. The plot represents 7–8 animals/treatment.

(B) metabolic responses of branched-chain amino acids in LF or HF diet-fed animals subjected to either SED or EX conditions. *Significant difference between dietary treatments (P < 0.05). The plot represents 7–8 animals/treatment. Data are means ± SE.

Network analysis

To leverage the system-wide perspective of metabolomics and to further investigate

the relationships between individual metabolites, IPA (Ingenuity Systems;

http://www.ingenuity.com) was used. While significantly changed metabolites were

relevant to a number of biological networks and functions (Supplemental Material,

Supplemental Fig. S1), pathways related to insulin signaling were of particular interest

and were highlighted. In both SED and EX samples, numerous metabolites

clustered with the functional groupings associated with insulin, proinsulin, and p38

MAPK. This metabolic “neighborhood,” and the changes seen in each sample

group, are shown in Supplemental Figs. S2 and S3 of the Supplemental Material.

Protein content and phosphorylation.

Total and phosphorylated levels of Akt and mTOR were evaluated in skeletal

muscle and the liver. Levels of total Akt, mTOR, and GAPDH did not change with

genotype or treatment and were used as loading controls. All graphical data are

shown as values normalized for the respective total protein. Levels of phospho-mTOR

(Ser2448) were elevated in the liver with HF diet feeding in SED animals but not in

EX animals (Figure 4.3A). No differences in the phosphorylation of this protein

were observed in skeletal muscle (Figure 4.3B). Examination of phospho-Akt

(Ser473) revealed that HF diet feeding increased the phosphorylation of this protein

in the liver of SED animals (Figure 4.3C). EX lowered the levels of HF diet-induced

phospho-Akt (Ser473) with no differences between LF and HF diet-fed animals. In

skeletal muscle, no differences in phospho-Akt (Ser473) were observed in SED

animals. However, EX resulted in an increase in phospho-Akt (Ser473) in HF diet-

fed animals but not in LF diet-fed animals, implying a higher relative exercise

training intensity in these animals (P < 0.05; Figure 4.3D).

Figure 4.3 mTOR and Akt western blot responses in liver and muscle

Total and site-specific phosphorylation of mammalian target of rapamycin (mTOR) and Akt in skeletal muscle and liver lysates. A and B: phosphorylation of mTOR (Ser2448) in the liver (A) and skeletal muscle (B). As mTOR (Ser2448) is phosphorylated by Akt, phosphorylation of this protein was also assessed. C and D: phosphorylation of Akt (Ser473) in the liver (C) and skeletal muscle (D). Values were normalized to their respective total protein contents and are expressed in arbitrary units. Data are means ± SE; n = 6–8 animals/treatment. *P < 0.05, LF vs. HF diet-fed animals within SED or EX conditions; **P < 0.05 vs. all other treatments.

4.5 Discussion

Exercise training effectively enhances the rates of energy expenditure and substrate

flux, creating an ideal situation for large-scale metabolomic profiling. While single-

metabolite measures have been a staple of physiological analysis for over a century,

using magnetic resonance-based strategies in combination with targeted metabolite

profiling represents a relatively novel tool. The goals of the present study were to

determine whether short-term, moderate-intensity exercise training could alter basal

metabolite profiles. A secondary aim was to examine whether exercise training

could correct obesity-induced shifts in metabolic spectra.

The results demonstrate that 1H NMR profiling can clearly distinguish between SED

and EX in both LF and HF diet-fed animals. Additional findings demonstrate that

while exercise can mitigate some of the abnormal patterns in metabolic spectra

induced by HF diet feeding, it cannot negate them. In fact, when the effects of diet

and exercise training were compared in the basal state, diet was a stronger predictor

and had the larger influence on the metabolic profile. To test the reliability of the

models generated in this study, external validation of the models was performed.

Here, two samples from each treatment group or 26% of the samples collected in the

study were randomly selected and blinded to the investigator. Samples were then

examined for fit into PLS models. External validation samples were not used in

model creation. The results demonstrated that diet could be correctly classified with

an accuracy of 89%, whereas exercise training could be classified 73% of the time.

These predictive percentages are high considering the limited sample sizes and the

complex physiology of HF diet feeding and exercise training. As such, we can

conclude that metabolomic profiling can create biologically relevant models to

predict responses to both diet and exercise treatments.

In the present study, the common laboratory mouse was chosen to examine the

effects of diet and exercise on the basal metabolomic profile. In addition to offering

strict control over genetic background, diet, exercise, and age, the HF diet-fed

C57BL/6J mouse also recapitulates human obesity. Much like the human condition,

obesity occurs over a prolonged time period and is multifactorial (as opposed to a

single genetic mutation) and results in impaired insulin sensitivity (Fueger et al.,

2004a). When fed a HF diet for 12 wk, C57BL/6J mice experience considerable

insulin resistance, as measured by a hyperinsulinemic-euglycemic clamp (Fueger et

al., 2004b; Shearer et al., 2008). Network analysis of significant metabolites in both

diet and exercise treatments showed them to converge on insulin signaling

(Supplemental Material, Supplemental Figs. S1 and S2). Amino acids are of

particular interest as they are intimately involved in cell signaling, gene expression,

and protein phosphorylation (Wu, 2009). Despite near-identical concentrations in

the diets, BCAAs were lower with HF diet feeding but largely unaffected by

exercise. These findings are in agreement with studies in Sprague-Dawley rats

demonstrating a 22% reduction in BCAA levels after 4 wk of HF diet feeding

(Calles-Escandon et al., 1984) as well as in humans, where plasma leucine levels

were depressed in type 2 diabetes and restored with 6 wk of rosiglitazone treatment

(Van Doorn et al., 2007). It remains unknown whether differences in BCAAs with

obesity stem from differential dietary consumption or metabolism. Recent work by

Newgard et al. (Newgard et al., 2009) has shown a strong relationship between a

BCAA metabolite model (principal component analysis score) and insulin resistance

(homeostatic model assessment). As BCAAs have been implicated in insulin

signaling, in part through activation of the mTOR signaling cascade, phospho-

mTOR (Ser2448) and phospho-Akt (Ser473) were examined in the liver and skeletal

muscle. The results showed increased phosphorylation of both proteins in the liver

with HF diet feeding in SED animals. Exercise training eliminated these differences.

No differences in skeletal muscle phospho-mTOR (Ser2448) were noted. However,

there was an increase in skeletal muscle phospho-Akt (Ser473) with EX in HF diet-

fed animals. This differential response is likely due to a greater relative exercise

training intensity in this group compared with LF diet-fed animals. No direct

relationships between BCAAs and either mTOR or Akt phosphorylation could be

established from the present data.

Despite animals resting for 36 h before sample collection, serum lactate levels were

also lower in HF diet-fed animals in both SED and EX conditions. The lower lactate

levels in HF diet-fed animals may reflect a diminished turnover of carbohydrate

stores. Additional metabolites declining with HF diet feeding in both SED and EX

were taurine and its precursor, methionine. Taurine, a semiessential amino acid, has been

implicated in the regulation of lipid homeostasis, insulin secretion, glucose uptake,

and antioxidant defence (Franconi et al., 2006). The fact that this pathway was

downregulated in HF diet-fed animals highlights that their obese, insulin-resistant

phenotype was largely unaltered by the exercise regime. Taurine supplementation

in animal and human diabetes has been shown to reduce disease severity (Franconi et

al., 2006; Wu, 2009). Besides individual metabolites, clusters of metabolites that are

closely related and react similarly can be useful in identifying points of

dysregulation. Taken individually, the significance of some of these metabolites

might be overlooked, but their concerted pattern may be of interest. In the present

study, arginine, citrulline, and proline are all closely related via nitric oxide

synthesis to vasculature regulation and have been implicated in insulin delivery,

sensitivity, and glucose uptake (Wu and Meininger, 2009). These metabolites

clustered in the OPLS model (Figure 4.1B). The results showed all of these

metabolites were elevated in HF diet-fed animals, regardless of exercise; coupled

with the one-directional nature of arginine synthesis from ornithine, its depletion in

EX animals invites further study.

Exercise also had profound effects on a number of individual metabolites. In HF

diet-fed animals undergoing EX, there was a normalization of circulating glucose,

indicating a therapeutic benefit. Exercise training has been shown to increase

glucose transporter (GLUT4) expression and glucose tissue utilization in this model

(Fueger et al., 2004b, 2004c). Additional metabolites changing in response to

exercise included glutamine and acetate. Primarily produced by skeletal muscle,

resting levels of glutamine declined after EX. The ratio of glutamine to glutamate,

which is often used as an indicator of training, was also altered, with a lower value

of 4.6 in the LF diet-fed EX group and a high value of 9.9 in the HF diet-fed SED

group. This fact is reinforced by the network analysis, where glutamate was

centrally featured in both SED and EX response to a HF diet (Supplemental Figs. S2

and S3). Finally, acetate is a precursor to acetyl-CoA, which is metabolized by the

tricarboxylic acid cycle. Lower concentrations in EX animals are indicative of a

greater reliance on glucose and enhanced accumulation of glycogen in tissues

(Imoto and Namioka, 1983a, 1983b).
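
The glutamine to glutamate ratios quoted above can be recovered from the group means in Table 4.2. The short calculation below is a check on that arithmetic; it uses the ratio of group means, which is assumed to be how the quoted values were obtained, rather than the mean of per-animal ratios.

```python
# Group mean concentrations (uM) from Table 4.2.
means_um = {
    "LF-SED": {"glutamine": 493, "glutamate": 69},
    "LF-EX": {"glutamine": 446, "glutamate": 97},
    "HF-SED": {"glutamine": 517, "glutamate": 52},
    "HF-EX": {"glutamine": 414, "glutamate": 51},
}

# Ratio of group means, rounded to one decimal place as in the text.
ratios = {
    group: round(values["glutamine"] / values["glutamate"], 1)
    for group, values in means_um.items()
}
lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)
```

The lowest ratio falls in the LF diet-fed EX group and the highest in the HF diet-fed SED group, matching the values of 4.6 and 9.9 given in the text.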

As metabolomic profiling can quantify entire spectral profiles, metabolite clusters,

and individual metabolites, this technology has typically been used to examine

disease states including cardiovascular disease (Kirschenlohr et al., 2006), arthritis

(Weljie et al., 2007), and cancer (Chan et al., 2009). It is only recently that

metabolomics has been used to examine health, obesity, insulin resistance, and

exercise training. Of note, a study by Yan et al. (Yan et al., 2009) specifically

examined the effects of exercise training. Analysis of sera from professional rowers

undergoing rigorous training showed the technology to successfully detect

differences between control subjects and athletes as well as differences in training

duration and years of athlete experience. Metabolomic profiling was also sensitive

enough to discriminate changes in metabolites occurring with 1 versus 2 wk of

exercise training. Indeed, metabolomics technology may eventually be used to

determine individual responses to exercise training, overtraining, and nutritional

supplementation.

Limitations of the present study include the measurement of metabolites by NMR.

This technology is not as sensitive as mass spectrometry and detects only

metabolites at higher concentrations within the metabolome. Ideally, both

methodologies would be included. Additional limitations include the presence of

anesthesia, a factor that could undoubtedly alter metabolomic profiling in these

animals. However, given the large volume of blood collected from animals, this

confounder was unavoidable. Other limitations include the measurement of serum

and not tissue metabolites. Although we can confirm LF and HF diet-fed animals

had differential responses to the exercise training regime, we cannot pinpoint the

source or tissue responsible for these differences. It is likely that differences in

relative exercise intensity and skeletal muscle and hepatic metabolism were all

contributors to the observed metabolomic profiles. In summary, we show that

metabolomics is capable of discriminating prior exercise training in a basal state in

lean and obese mice. Our results also show that, when compared, diet rather than

exercise training is the predominant determinant of the resting metabolic profile.

Chapter 5 Comparison of multiple high-fat mouse metabolomics experiments

5.1 Introduction

Chapters 3 and 4 are variations on the theme presented in chapter 2, analysing the

metabolic effects of a high fat diet in a susceptible mouse model of insulin resistance.

By introducing additional factors, such as a short term diet change or exercise

intervention, we can establish connections between our metabolomic findings and

the known etiology of the disease. They formed a natural progression in

experimental design, driven by the exposure of potential confounders or

opportunities in earlier studies and allowing an exploration of the potential of

metabolomics.

Moreover, since each experiment contains, at its core, the same experiment

comparing chow-fed animals to those on a high fat diet, we may be able to draw

some additional information from comparisons of the results. To the extent they are

the same, we can take the similarities as evidence that the models are, at least,

reproducible and more likely biologically driven. To the extent the results differ, they

may also provide insight to the limitations of the process applied.

It bears noting that the studies published as chapters 2-4 were conducted over a span

of 5 years, during which the rapidly growing field of metabolomics has progressed.

The author’s knowledge of metabolomics, statistics, and biochemistry has also

advanced during the same window, and some of the results published in those

chapters are incomplete by modern standards. The fact that those chapters were

written with an eye to translational acceptance by a non-technical audience gave rise

to some incompleteness, but most of the limitations were not yet well characterised

when published. Thus, while a complete reanalysis of the resulting data is not

within the scope of this work, a comparison of the results should take into account

their limitations.

5.2 Comparison of resulting models

In terms of specific metabolites, a number of compounds were consistently

identified as existing at different concentrations in high-fat and chow-fed animals.

Both individual compounds and combinations thereof, especially as identified by

pathway analysis (either curated or automated) are of interest; all have some history

in the insulin resistance literature, and many have been investigated further in the

time since publishing.

5.2.1 Branched Chain Amino Acids

The most conserved response to the introduction of a high fat diet, and presumably

the induction of insulin resistance, was a drop in the level of branched chain amino

acids. Across all three studies, three large non-polar amino acids (leucine,

isoleucine, and valine) were consistently depleted in animals on a high fat diet,

regardless of any diet-switch or exercise considerations.

The disruption of BCAA levels in insulin resistant individuals (Luetscher Jr, 1942;

Harper et al., 1984; Garlick and Grant, 1988; Marchesini et al., 1991) and the

interplay between their levels have long been established (Forlani et al., 1984; Brooks

et al., 1986), so this was both reassuring and interesting as we hypothesized that

correlated shifts in other metabolites could suggest potential mechanisms of action.

As previously discussed, leucine in particular has been the focus of much research

in insulin resistance as it shows significant changes in numerous models and human

studies. In light of this, and its strong recurrence as a signal in these metabolomic

studies, it may be worth examining models with leucine as a supervisory variable to

draw out strongly correlated mechanistic shifts in metabolism.

Figure 5.1 BCAA mechanism of IR induction proposed by Newgard et al

When introduced as part of, or in supplement to, a high fat diet, branched

chain amino acids overwhelm amino acid catabolism and result in an

increase of ketone bodies. The outcome is modified availability of

acylcarnitine transport and single carbon flux to gluconeogenesis.

Reproduced from (Newgard et al., 2009)

One of the most prominent metabolomic analyses since the publishing of our initial

findings was that of Newgard et al in 2009, which reinforced and clarified the amino

acid signature associated with the onset of insulin resistance. Newgard’s analysis

was particularly elegant because it combined human studies with an animal model

while studying not just amino acids but fatty acid derivatives (triglycerides and

acylcarnitines, in particular), hormone levels and phosphorylation assays similar to

those performed in chapter 4. Backed by a strong correlation with the Homeostasis

Model Assessment (HOMA) score, an established metric, and the interaction with

rapamycin (an mTOR inhibitor), they used supplementation with BCAA and

general-composition amino acid diets to tease out a specific interaction between the

branched chain amino acids and the high fat diet. In particular, the use of measured

respiratory energy requirement would constitute another potentially strong

supervisory variable for future metabolomic studies of insulin resistance.

5.2.2 Aromatic amino acids

Phenylalanine and tyrosine are not traditionally grouped in with branched chain

amino acids, but both have some history of association with insulin resistance and were

included in Newgard’s amino acid signature (Newgard et al., 2009; Friedrich, 2012).

In our datasets tyrosine did not show significant changes in response to the high fat

diet, but phenylalanine showed a significant drop in response, correlated with

an increase in bodymass.

Therefore, it is particularly interesting that switching animals to a lower fat chow

diet for the final week of their treatment appears to abolish this effect, suggesting it

may be a symptom of insulin resistant animals’ inability to process dietary sources.

Newgard hypothesizes that the change in phenylalanine level may be because of

oversaturation of the large neutral amino acid transporter, while in 2011 Wang et al

noted in a large human cohort that phenylalanine increases in serum up to a decade

before the onset of diabetes (Wang et al., 2011). One of the more difficult

etymological and etiological challenges posed by diabetes is its existence as a

relatively late endpoint, before which many biological systems have been altered

and/or disrupted (Li et al., 2010). As such, the diet-switch protocol should likely be

expanded to investigate the response of specific amino acids and isolate diverse

involvements. The possibility of clamping phenylalanine or other potential

signalling amino acids (Castellino et al., 1987; Chevalier et al., 2004), rather than

glucose in a metabolomics context would likely prove informative.

5.2.3 Single-carbon amino acids

Alanine was another amino acid whose response differed in the presence of a final-week

chow diet, but was otherwise elevated in multiple high-fat fed populations.

Glycine, in contrast, was depressed in both HC and HH diets, but did not change

significantly in either exercised or sedentary high-fat animals in chapter 4. Since

glycine is known to provide a single carbon source for several nucleic acid synthetic

pathways, these results may provide a small insight into the hypothetical

overburdening of amino acid catabolism. Under such a disruption, glycine levels

would be elevated not because of lowered uptake but a surplus of free amino acids,

resulting in decreased flux through nucleosynthetic pathways.

5.2.4 Carnitine and Ketone bodies

While a number of energy-related metabolites were consistently lowered in high fat

animals (acetate, TMAO), the consistent increase in ketone bodies such as

hydroxybutyrate may shed more light on the biochemical disruptions associated

with insulin resistance. Energy metabolites represent a difficult area in which to

draw strong conclusions given the differences in diet and energy source, even under

the diet switch. Ketone bodies, on the other hand, factor heavily into both amino

acid synthesis and fatty-acid metabolism (Salek et al., 2007), and occupy a central

position in a hypothetical link between branched chain amino acid metabolism, fatty

acid metabolism, and a shift in energy usage towards carbohydrates (Honors et al.;

Newgard et al., 2009; Friedrich, 2012). The fact that 3-hydroxybutyrate levels

remained elevated notwithstanding the diet switch, and responded identically in

exercised and sedentary animals but did not change in correlation with bodymass

highlights its value as a potential central indicator for this pathway, and reinforces

the hypotheses put forward about the BCAA/fatty acid metabolism link.

On a similar note, the level of several carnitine metabolites also showed significant

differentiation in high-fat fed animals. While their large size prevents the

measurement of acylcarnitines using NMR instrumentation, it’s of note that the

level of the smaller acetylcarnitine molecule was higher in two of our four HF

sample sets, but lower in the diet-switched HF samples, lower in the sedentary HF

animals and showed a muted decrease in exercised animals. According to

Friedman, serum acylcarnitine levels were elevated in several HF diet studies, which

suggests that the pool of carnitine used for mitochondrial transport may be depleted

or destroyed by oxidative radicals in the insulin resistant state, but the inconsistent

responses seen here are perplexing.

5.2.5 Choline

One final metabolite of interest which showed a consistent change in HF fed

animals was choline, which was significantly depressed in all HF datasets

notwithstanding a final week diet shift. In contrast, the change in phosphocholine

levels was not consistent but went up in some models and down in others. Choline

plays a smaller role in the literature of insulin resistance (Zhang et al., 2008) and

does not factor into the BCAA-mediated shift in fatty acid metabolism proposed

earlier. Indeed, the paper published as chapter 2 represents the primary publication

directly linking choline to insulin resistance with metabolomics, but other

publications (Mooradian, 1987; Brenner et al., 2000) have linked this salt with

insulin resistance and the CDP-choline pathway in non-metabolomics contexts, so its

role as symptom, upstream effector or side effect bears further investigation.

5.3 Limitations and Omissions

Several caveats bear mentioning as they apply to all of the datasets analysed. While

the small number of samples was potentially informative vis-à-vis the predictive

power of metabolomics in a typical experimental cohort, it would have been more

valuable to ascertain the true metabolic effect of insulin resistance and the ability of

a metabolomics model to identify insulin resistance in each singular, new sample

outside of the training data. As such, a larger dataset would have been useful for

both increasing power and better characterising the magnitude of the effect based on

diet feeding, etc.

While some tissue-specific assays were performed to characterise the level of

phosphorylation in mTOR, only system-wide serum levels were analysed in a

metabolomics context. Since many of the proposed causes of insulin resistance

occur specifically in the liver as a result of hepatic steatosis (Friedrich, 2012),

especially when it comes as a result of dietary manipulation, analysis of tissue

extracts would have been worthwhile, even given the difficulties of sample

extraction.

Finally, while steps were taken to eliminate confounders and bias, such as

transferring sedentary animals to an immobile treadmill and controlling for (or at

least identifying correlations with) age and susceptibility (bodymass change), by no

means was every potential confounder eliminated. Changes in bodymass and

animal health (in response to diet shock) are known to cause substantial metabolic

changes, so experiments should be designed to differentiate between these potential

confounders and a causal relationship with insulin resistance. Moreover, the

direction of the causality between that resistance and several of the changes must be

established using the aforementioned amino acid clamps, supplementation, or some

similar technique.

5.4 Estimating batch effects and bias

In addition to the similarities between batches, there are notable differences in what

should hypothetically be identical models. This similarity in experimental structure

(insofar as the CC-HH comparison was performed for at least some samples)

suggests that, in addition to systematic biases which may be present, there is a

substantial “batch effect” present which differentiates animals on the basis of

process defects rather than experimental treatment.

Notwithstanding our extensive precautions to standardize the acquisition and

processing of samples, including precise pH adjustment, randomization of sample

processing, subsequent randomization of sample quantification, controlled baseline

adjustment and dilution normalization, PCA analysis of all CC or HH samples from

the three experiments showed substantial differences, even when they were re-

normalized to the same mean concentration.

Within batches, cross validation of the PLS-DA models was repeated using 5-fold

rather than 7-fold splits to avoid problems with small sample sizes, and CV-ANOVA

was used to minimize over-fitting. Nevertheless, although PLS-DA models of each

batch yielded similar differential metabolites, models trained on one batch had

much lower prediction accuracy on other batches than expected based on

cross-validation metrics. This held for models quantified with either targeted

profiling or binning.
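As an illustration only, the fold assignment behind a k-fold cross-validation can be sketched in Python. The function name `stratified_folds`, the 14-sample batch and the round-robin dealing are assumptions made for this sketch; they do not reproduce the cross-validation routine of the software used for the models above.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, balancing class labels."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal samples round-robin so each fold receives a similar class mix.
        for idx in members:
            folds[position % k].append(idx)
            position += 1
    return folds


# Hypothetical batch: 7 chow (CC) and 7 high-fat (HH) animals.
labels = ["CC"] * 7 + ["HH"] * 7
for k in (5, 7):
    folds = stratified_folds(labels, k)
    print(k, [len(fold) for fold in folds])
```

With so few samples, the choice of k mainly determines how many animals are held out per iteration, which is why the fold count matters for small cohorts.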

5.4.2 Variable importance estimators

Because the number of samples used for modeling each experiment is relatively

small (the total number of samples used in all three experiments is fairly small,

compared to the complexity of the effect in question), it is possible that differences

between experimental datasets (replicates or batches, when used for comparative

analysis) are caused by insufficient coverage of population variance.

In addition to cross-validation as a means to optimize model structure, a commonly

applied technique for estimating the coverage of a dataset is resampling based

estimates of the variance in each variable. By retraining a model with a variety of

subsets from the training data, techniques such as jackknifing or bootstrapping

provide an estimation of the consistency in each model parameter: assuming the

distribution of each parameter is normal (or at least, known) the mean and variance

of each loading value can be used to calculate confidence intervals or other

significance tests for the probability that a particular loading value is non-spurious

and will be conserved in future experimental replicates.
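The resampling idea described above can be sketched as a leave-one-out jackknife. This is a minimal sketch under stated assumptions: the statistic is a simple mean standing in for a model loading (in practice the model would be retrained on each subset), the interval uses a normal approximation as assumed in the text, and the function name and input values are invented for illustration.

```python
import math
import statistics


def jackknife_interval(values, statistic, confidence=0.95):
    """Leave-one-out jackknife: estimate, standard error, normal-theory interval."""
    n = len(values)
    full = statistic(values)
    # Recompute the statistic with each observation removed in turn.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)
    se = math.sqrt(variance)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return full, se, (full - z * se, full + z * se)


# Invented values standing in for one loading across resampled models.
loading_like = [0.21, 0.18, 0.25, 0.19, 0.23, 0.17, 0.22]
estimate, se, (low, high) = jackknife_interval(loading_like, statistics.mean)
print(estimate, se, low, high)
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which gives a convenient check on the implementation.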

[Figure 5.2 flowchart: serum samples, NMR spectra, profiled metabolite concentrations, random assortment into Batch 1 and Batch 2, one OPLS model per batch (Model 1, Model 2), jackknifed reliability estimation giving Model 1 with confidence, then comparison of loadings.]

Figure 5.2 OPLS loadings comparison schema

As a baseline, each experimental dataset was divided into two subgroups

randomly (balanced for treatment), which were then modelled separately and

compared in light of the confidence estimates based on batch #1. The same

procedure was then repeated, but with each batch representing a separate

experimental dataset. The efficacy of confidence estimates in the former,

artificially segregated batches, was compared to its efficacy in true

experimental replicates.

By implementing OPLS in GNU-R with a nested resampling framework (Figure 5.2),

we calculated 95% confidence

intervals for each loading in the models constructed for each CC/HH experimental

dataset above. We compared the loadings from experimental batches to each other,

and also subdivided each batch into two halves and modelled each separately with

the same technique.

Figure 5.3 Loadings replication vs significance estimation for mouse replicates

Font size is proportional to the significance estimate based on batch one.

The “volcano plot” effect visible on the right dominates over

the consistency between replicates.

While the confidence intervals were successful in predicting the value of loadings

for models built on the other “half” of the same batch, they were much less

effective at predicting the outcome of other, independent, batches. The magnitude

of the loadings had a dominating impact on their significance (Figure 5.3). As such,

we conclude that the distribution of loadings in models resampled from a particular

batch does not match the distribution of loadings in other batches, reinforcing the

existence of bias which can’t be captured by resampling for exactly this reason.

Even when extensive experimental controls are used, it is necessary to acquire

samples in multiple experimental replicates and then combine them together before

modeling in order to account for experimental drift/bias.
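One way to quantify this comparison is to count how many loadings from a second model fall inside the confidence intervals estimated from the first. The sketch below assumes intervals and loadings are held in dictionaries keyed by metabolite name; the function name and every number are invented for illustration and are not results from this work.

```python
def coverage(intervals, observed):
    """Fraction of shared variables whose observed loading lies inside its interval."""
    shared = [name for name in intervals if name in observed]
    if not shared:
        raise ValueError("no variables in common")
    hits = sum(
        1
        for name in shared
        if intervals[name][0] <= observed[name] <= intervals[name][1]
    )
    return hits / len(shared)


# Invented 95% intervals from a hypothetical batch 1 model.
batch1_ci = {
    "Leucine": (-0.25, -0.15),
    "Choline": (-0.20, -0.05),
    "Alanine": (0.05, 0.20),
    "Glycine": (-0.15, 0.00),
}
# Invented loadings from the other half of the same batch and from another batch.
same_batch_half = {"Leucine": -0.19, "Choline": -0.12, "Alanine": 0.11, "Glycine": -0.02}
independent_batch = {"Leucine": -0.05, "Choline": -0.10, "Alanine": 0.25, "Glycine": 0.08}

print(coverage(batch1_ci, same_batch_half))
print(coverage(batch1_ci, independent_batch))
```

A coverage well below the nominal confidence level for independent batches, alongside good coverage within a batch, is the pattern that would point to bias that resampling cannot capture.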

[Figure 5.3 plot residue. Left panel: “Mouse B1 vs. Mouse B2 (opls:P 1/3 comps)”, Batch 1 loading against Batch 2 loading, points labelled by metabolite name. Right panel: “pvalue vs Loading, Rat C1 [ 3 ]”, sigval against Batch 1 loading, with a distance scale.]

Chapter 6 - Metabolic profiling of vitamin C deficiency in Gulo−/− mice using

proton NMR spectroscopy

Gavin E. Duggan1, B. Joan Miller2, Frank R. Jirik2 and Hans J. Vogel1

Published: 2011-03-1 in the Journal of Biomolecular NMR, Vol 49, pp165-173

6.1 Abstract

Nutrient deficiencies are an ongoing problem in many populations and ascorbic acid

is a key vitamin whose mild or acute absence leads to a number of conditions

including the famously debilitating scurvy. As such, the biochemical effects of

ascorbate deficiency merit ongoing scrutiny, and the Gulo knockout mouse provides

a useful model for the metabolomic examination of vitamin C deficiency. Like

humans, these animals are incapable of synthesizing ascorbic acid but with dietary

supplements are otherwise healthy and grow normally. In this study, all vitamin C

sources were removed after weaning from the diet of Gulo−/− mice (n = 7) and wild

type controls (n = 7) for 12 weeks before collection of serum. A replicate study was

performed with similar parameters but animals were harvested pre-symptomatically

after 2–3 weeks. The serum concentration of 50 metabolites was determined by

quantitative profiling of 1D proton NMR spectra. Multivariate statistical models

were used to describe metabolic changes as compared to control animals; replicate

study animals were used for external validation of the resulting models. The results

of the study highlight the metabolites and pathways known to require ascorbate for

proper flux.

6.2 Introduction

Vitamin C deficiency in humans results in scurvy, a debilitating condition that is

well known historically as it affected sailors that were circumnavigating the globe.

In 1747, in what was probably the first clinical trial, the Scottish naval surgeon

James Lind discovered that lemon and orange juice could be used to prevent or

reverse the effects of scurvy. It took until the 1800s however for this cure to become

widely accepted (Harvie, 2005). The chemical structure of the active ingredient

vitamin C was conclusively established in 1932 (Svirbely and Szent-Györgyi,

1932). Most animals, including mice, are capable of synthesizing ascorbic acid to

satisfy their endogenous needs; it is produced through a well-defined metabolic

pathway from D-glucose and D-galactose. Guinea pigs, apes and humans however

have lost the ability to synthesize ascorbic acid, because they do not have a

functional L-gulonolactone oxidase gene (GULO), which codes for the enzyme that

catalyses the final step in the biosynthesis of vitamin C (Chatterjee et al.,

1961). Fortunately many fresh fruits and vegetables contain high concentrations of

ascorbic acid and hence humans and simians can acquire a sufficient supply of this

vitamin through regular consumption of appropriate foods (Nishikimi and Yagi,

1991; Naidu, 2003). It is noteworthy that humans still possess a nonfunctional

GULO pseudogene on chromosome 8p21.1, with several introns missing and

numerous insertions, and it has been estimated that most primates lost the ability to

synthesize their own vitamin C over 40 million years ago (Nishikimi et al., 1994).

Similarly, guinea pigs still contain the highly mutated remnants of an ancestral

GULO gene (Nishikimi et al., 1992).

Without access to environmental sources of vitamin C, humans suffer system-wide

failures in health; long term, severe shortages lead to scurvy, which is invariably

fatal if untreated. Some researchers have referred to this ‘hypoascorbemia’ as a

‘public inborn error of metabolism’ (Stone, 1979). Interestingly several studies have

shown that gene therapy can restore the ability of human cells to produce ascorbic

acid endogenously (Ha et al., 2004; Li et al., 2008).

High levels of vitamin C are thought to have beneficial effects on various diseases

ranging from cancer, all the way to the common cold (for a review see (Naidu,

2003)). In order to facilitate studies of the effects of vitamin C on the progression of

various diseases in mouse models, a Gulo−/− knockout strain has been developed

(Maeda et al., 2000). Like humans, this mouse strain needs to obtain vitamin C from

its food or drinking water. As a result, they suffer several deficiencies, including

partial loss of neutrophil function and increased oxidative stress during development

(Vissers and Wilkie, 2007; Harrison et al., 2010). By crossing Gulo knockout mice

with other disease model strains, the protective effects of ascorbate can be

investigated in a controlled environment. In this work we have used the Gulo−/−

strain to determine the effects of vitamin C deficiency on the metabolic profile of

these mice.

Proton Nuclear Magnetic Resonance (1H NMR) analysis of serum has the advantage

of quantitative, simultaneous measurements for many metabolites (Schicho et al., 0;

Weljie et al., 2007; Shearer et al., 2008; Dunn et al., 2011). This approach facilitates

interpretation and visualization of the resulting metabolic snapshot and is widely

used for mouse model studies (Griffin, 2006). The spectral area of each spectrum

can be segregated and attributed to individual metabolites. Targeted

profiling provides a significant advantage in metabolite identification by virtue of

correlated and corroborated grouping in peaks based on latent biochemical

knowledge. We have used the Chenomx Profiler approach (Weljie et al., 2006) to

identify specific metabolites and translate peak intensities into metabolite

concentrations using a library of spectral profiles. The resulting metabolite

concentrations provide a simplified dataset on which to perform multivariate

statistical analysis, which is easier to interpret and assess for significance.

These data were further analyzed using Orthogonal Projection to Latent Structures

(OPLS), a multivariate analysis technique that can separate the metabolic variation

in samples into noise and two classes of structured variation: patterns of change

which are correlated to the onset of vitamin deficiency, and coherent patterns of

change which are however not correlated to a particular phenomenon (Trygg and

Wold, 2002). As such, OPLS can be used to identify and remove variation between

sample batches and highlight metabolic changes of interest.
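The separation of response-correlated and response-orthogonal variation can be illustrated with a single filtering step for one response variable, following the usual textbook description of OPLS. This is a sketch under assumptions: the helper names and the small mean-centred matrix are invented, only one orthogonal component is removed, and it is not the implementation used by the software in this work.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def xt_times(X, v):
    """Return X^T v for a matrix stored as a list of rows."""
    return [sum(row[j] * v[i] for i, row in enumerate(X)) for j in range(len(X[0]))]


def x_times(X, w):
    """Return X w."""
    return [dot(row, w) for row in X]


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def remove_orthogonal_component(X, y):
    """Strip one y-orthogonal component from X (single-response sketch)."""
    w = unit(xt_times(X, y))                      # predictive weights, proportional to X^T y
    t = x_times(X, w)                             # predictive scores
    p = [v / dot(t, t) for v in xt_times(X, t)]   # predictive loadings
    proj = dot(w, p)                              # w has unit length
    w_orth = unit([pj - proj * wj for pj, wj in zip(p, w)])
    t_orth = x_times(X, w_orth)                   # scores of the y-orthogonal variation
    p_orth = [v / dot(t_orth, t_orth) for v in xt_times(X, t_orth)]
    X_filtered = [
        [xij - ti * pj for xij, pj in zip(row, p_orth)]
        for row, ti in zip(X, t_orth)
    ]
    return X_filtered, t_orth, p_orth


# Invented, mean-centred data: 4 samples by 3 variables, one response.
X = [
    [1.0, 0.5, -1.0],
    [-1.0, 0.5, 1.0],
    [0.5, -1.0, 0.5],
    [-0.5, 0.0, -0.5],
]
y = [1.0, -1.0, 0.5, -0.5]
X_filtered, t_orth, p_orth = remove_orthogonal_component(X, y)
print(dot(t_orth, y))
```

The orthogonal scores are uncorrelated with the response by construction, so subtracting them removes structured variation (such as a batch difference) without discarding the signal of interest.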

6.3 Experimental procedures

Mouse maintenance

Procedures were approved by the University of Calgary Animal Care and Use

Committee and abided by the Canadian Association for Laboratory Animal Science

guidelines for animal experimentation. Animals were maintained in a humidity-controlled

room with a 12-h light:dark cycle. Following weaning (3 weeks of age),

male Gulo−/− (KO) and Gulo +/+ (WT) mice (on a C57BL/6 background) were

maintained in microisolator cages for one week. Following this acclimation period,

all animals were switched to a vitamin C deficient diet (58R3—TestDiet, Purina,

Richmond, IN). The diet met all other nutritional requirements of adult mice. Food

and water were provided ad libitum throughout the experiment.

Animal experimentation

Animal mass was recorded on the date of vitamin C restriction and weekly

thereafter. Following 12 weeks of experimental observation, WT animals (n = 7) in

study 1 (model training) were healthy and KO animals (n = 7) showed signs of

vitamin C deficiency. WT animals in study group 2 (model testing) were harvested

after 11 (n = 4) or 19 days (n = 4). KO animals in group 2 were harvested after

19 days (n = 4). Final mass was recorded for all animals before euthanasia by

carbon dioxide inhalation (Weljie et al., 2007). Whole blood (~1 mL) was obtained

by a cardiac puncture and placed on ice and allowed to clot for 30 min. Samples

were then centrifuged for 10 min (3,000 rpm) and sera collected and divided into

two aliquots. Serum was stored at −80 °C until ready for data acquisition.

Metabolite sample preparation

Serum samples (0.2–0.3 mL) were thawed and filtered twice using 3 kDa NanoSep

microcentrifuge filters, pre-washed to reduce preservative contamination. Filtrates

were then transferred to clean microfuge tubes; final sample volume ranged from

150 to 250 µL. Samples were brought to 650 µL by addition of 140 µL of phosphate

buffer containing dimethyl-silapentane-sulfonate (DSS, final concentration

0.5 mM), 40 µL of sodium azide to limit bacterial growth, and D2O. Sample pH was

then adjusted to a range of 7.0 ± 0.01.

Spectrum acquisition

One-dimensional 1H NMR spectra with optimal water suppression were acquired

using a standard pulse sequence (Bruker pr1d_noesy pulse definition) (Nicholson et

al., 1995). Spectra were acquired in two batches, using an automated NMR Case

sample changer on a 600 MHz Bruker Ultrashield Plus spectrometer. The pulse

sequence had a mixing time of 100 ms, a Z-gradient water presaturation pulse, and a

3 s acquisition time. Samples were individually shimmed to ensure half-height line

width of approximately 0.8 Hz for the major DSS peak calibrated to 0.0 ppm.

Spectra were acquired with 512 scans, zero-filled and Fourier transformed to 64 k

points. Standard post-processing of the spectra included deletion of the water region,

B-spline baseline correction, reference deconvolution and calibration of the DSS

peak. 1H,13C heteronuclear single quantum coherence (HSQC) spectra of two

samples, chosen at random, were acquired for peak assignment and verification.

Metabolite concentration profiling

Processed spectra were imported into Chenomx Profiler version 5 (Edmonton,

Alberta, Canada) for quantification. A standard set of fifty (50) compounds was

profiled based on chemical shift assignments verified by the 1H,13C HSQC results.

These metabolites represent an average of 84% of the total spectral area, excluding

the water, lactate and glucose areas. Spectra were randomly ordered for fitting in

Chenomx Profiler to avoid progressive bias. Compounds were fit from highest

initial concentration to lowest, with an iterative reinspection. Each measurement

was normalized to the mean sample concentration, dividing profiled spectral

concentrations by the total concentration of all profiled metabolites in that sample

(Weljie et al., 2006).
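The normalization step can be sketched as follows. The function name and the concentrations are hypothetical; the sketch only shows division of each profiled concentration by the summed concentration of all profiled metabolites in the same sample.

```python
def normalize_to_total(sample):
    """Divide each metabolite concentration by the sample's summed concentration."""
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return {name: value / total for name, value in sample.items()}


# Invented concentrations (µM) for a single serum sample.
sample = {"Lactate": 3000.0, "Glucose": 6000.0, "Alanine": 400.0, "Valine": 200.0}
normalized = normalize_to_total(sample)
print(normalized)
```

After this step each sample sums to one, so differences in overall dilution between samples no longer dominate the comparison.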

Multivariate statistical analysis

Normalized concentration values were imported into SIMCA-P software (Umetrics,

Sweden) for multivariate pattern analysis. Orthogonal Projection to Latent

Structures (OPLS) was used to capture artifactual differences between Batch 1 and 2

(irrespective of strain) and to compensate for batch variation. Hierarchical OPLS

was then applied to the residuals of Batch 1 samples to isolate changes between KO

and WT animals. 50 metabolite concentrations were used as X variables; animal

body mass at time of harvesting was used as the supervisory (Y) variable (Schicho

et al., 0).

Cross validation

A seven-fold rotating cross-validation (CV) was used to estimate the robustness of

the OPLS model. In each iteration, a model based on ~85% of Batch 1 samples was

used to predict the response (final body mass) for the other 15%. As a measure of

model strength (Eastment and Krzanowski, 1982), the average ratio of total sum of

squared errors (Q2Y) was compared to the percentage of Y variance captured in the

total model (R2Y). The distribution of predicted body-mass was compared to that of

actual training sample body masses.
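For orientation, R2Y and Q2Y are commonly computed as one minus the ratio of an error sum of squares to the total sum of squares of Y, using fitted values for R2Y and cross-validated predictions for Q2Y. The sketch below assumes those common definitions; the function name and all body-mass values are invented, and the exact formula used by the modeling software may differ.

```python
def explained_fraction(actual, predicted):
    """Return 1 - (error sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    total_ss = sum((a - mean) ** 2 for a in actual)
    error_ss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - error_ss / total_ss


# Invented body masses (g) with fitted and cross-validated predictions.
mass = [20.0, 22.0, 24.0, 26.0, 28.0]
fitted = [20.5, 21.5, 24.0, 26.5, 27.5]          # all samples in the model
cv_predicted = [22.0, 21.0, 25.0, 25.0, 27.0]    # each sample left out in turn

r2y = explained_fraction(mass, fitted)
q2y = explained_fraction(mass, cv_predicted)
print(r2y, q2y)
```

A Q2Y well below R2Y suggests the model describes the training samples better than it predicts held-out ones, which is the gap cross-validation is meant to expose.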

External validation

The model constructed from Batch 1 samples was used to predict body mass for

Batch 2 samples. Predicted body mass was compared to the actual strain and actual

mass of animals to determine the predictive ability of the model on external samples

not used in the model’s construction.

Model interpretation

Primary component OPLS coefficients for the Batch 1 model, multiplied by the

weighting, were used to interpret the response pattern in affected animals (Trygg et

al., 2007). 95% confidence intervals were determined from CV jack-knifing and t-

value distributions (Efron et al., 1993).

6.4 Results

Animal characteristics

Wild-type animals in Batch 1 (training samples) showed no visible signs of vitamin

C deficiency after 12 weeks. In contrast, Gulo−/− animals exhibited significant signs

of scurvy, including joint bruising and osteodegeneration (Hirschmann and Raugi,

1999). Batch 2 (testing) animals were harvested prior to the onset of symptoms, for

external validation purposes.

Animal body mass was reasonably consistent within strains, with a significant

difference (P < 0.05) induced between them when vitamin C was removed from the

diet (Figure 6.1). The body mass of wild-type animals in Batch 1 increased steadily

after the weaning period finished. The mass of KO mice lagged in the first month,

tailed off in the second and fell steeply in the last 4 weeks. Batch 2 KO mice already

had significantly lower body mass than WT mice when harvested after 3 weeks.

Figure 6.1: Effect of Gulo -/- knockout on mass gain

Statistical validation

The multivariate OPLS model of Batch 1 animals was supervised using the animals’

body mass after 12 weeks as the Y variable. Seven-fold cross validation was used,

with CV predictions of each training sample’s mass when left out of the model

construction. Overall, 96.8% of the variance in body mass (Table 6.1) could be

accounted for, with 71% accuracy

(mean normalized sum of squares across cross validation iterations). Given the

relatively small number of samples in the training data set, good agreement between

predicted and actual body mass (Figure 6.2) indicated a strong model of the

metabolic changes between WT and KO mice when vitamin C was withheld.

Model                Samples   R2Y (Coverage)   Q2Y (Quality)

KO vs WT (Y=mass)    Batch 1   96.8%            71% (very good)

Batch differences    All       96.5%            96% (excellent)

Table 6.1 Cross validation metrics for model training

Figure 6.2: Cross-validation predicted vs actual body mass for batch 1 mice.

Average predicted (white) vs actual (dark) bodymass. Predictions were made for samples left out of each CV model. Intraclass overlap and interclass separation are a measure of model quality. One sample, believed to be a knockout, fell between the expected range for both classes. Re-typing showed it to be an unexpected heterozygote

External validation

A principal component analysis (PCA) model of all samples showed that the

differences between Batches 1 and 2 were more significant than differences in KO

and WT mice (Supplementary Figure S1). This is quite common in metabolomics

studies, where sensitive techniques such as serum NMR-profiling can detect

differences in animal environment (Bollard et al., 2005) but in this case is caused by

differences in animal age. An OPLS model of inter-batch differences showed

excellent quality measures (Table 6.1), indicating the model’s ability to put

samples on a common basis for comparison, independent of age (Figure 6.3). Based

on these analyses, the Batch 1 model was able to significantly (P < 0.05, Figure 6.4)

differentiate WT and KO mice in Batch 2, pre-symptomatic early harvesting of

Batch 2 samples notwithstanding.

Figure 6.3: WT vs KO hierarchical OPLS scores

WT (black) vs KO (white) OPLS model scores of modeling samples after removal of intrabatch variation using hierarchical OPLS. First component (horizontal) score indicates the degree of vitamin C deficiency perceived by the model. The one modeling sample (HET/Triangle) misclassified by the model was the heterozygous mutant (see Figure 6.2)

Figure 6.4: External validation results

Predicted (light) and actual (dark) masses for batch 2 samples not used in

any model construction. Standard error bars show good separation between

predictions for samples, notwithstanding the fact that they were harvested

pre-symptomatically

Model interpretation

OPLS coefficients were multiplied by the scaling weights (Trygg et al., 2007) for

interpretation of the metabolite changes between strains. Jackknifing was used to

calculate 95% confidence intervals, and only metabolites with significantly

increased or decreased concentrations are reported (Figure 6.5). Several groups of

metabolites with similar biochemical functions (Kanehisa et al., 2010) were seen to

respond in concert.

Figure 6.5: First component OPLS loadings of WT vs KO mice

Values indicate relative change in compounds in knockout mice after

12 weeks of vitamin C deficiency

6.5 Discussion

An estimated 15–30% of low-income North Americans suffer from varying degrees

of vitamin C deficiency (Frikke-Schmidt and Lykkesfeldt, 2009), of which even

mild shortfall undermines the body’s ability to defend against damaging oxidizing

byproducts. Vitamin C deficiency has been implicated in joint trauma (Padh, 1990),

heart disease (Heinig and Johnson, 2006; Cangemi et al., 2007), cancer (Carroll and

Kritchevsky, 1994), as well as motor control and degenerative conditions

(Sauberlich, 1994). Linus Pauling proposed that high doses of vitamin C would

prevent the common cold, a claim that has been hotly debated (Harri and others,

2009), and some investigators have even suggested that infusion of extremely high

doses of vitamin C may have beneficial effects as a prodrug in the treatment of some

cancers (Chen et al., 2005, 2008; Cabanillas, 2010). Its popular appeal as a panacea

may be grounded in some degree of biochemical fact since ascorbic acid is a known

cofactor for over a dozen biological pathways (Padh, 1990), and a prominent

antioxidant in many others. Vitamin C is one of the nutrients being studied in the

European Prospective Investigation into Cancer and Nutrition (EPIC) population

study (e.g. Gonzalez and Riboli, 2010).

Its significance notwithstanding, the mechanisms by which vitamin C affects

various conditions have not been fully characterised (Naidu, 2003). Broad-spectrum

metabolic state profiling, in conjunction with known networks of metabolic

functionality, gives a contextualized view of changes and the Gulo knockout mouse

provides an ideal platform for the study of vitamin C deficiency (Y et al., 2003;

Harrison et al., 2010). The ability of this mutant to be crossed with other disease

models also provides abundant avenues for further investigation. Herein, we have

used 1D proton NMR of serum from Gulo−/− mice to study the differences in

metabolism induced when vitamin C was withheld after weaning.

Because direct analysis of NMR spectra can be difficult, 64,000 spectral intensity

points for each serum sample were translated to corresponding concentrations for 50

metabolites. This translation to named compounds prior to analysis was

instrumental in model interpretation, refinement, and establishment of significance

in differences between knockout and wild-type animals. In this study, the animals’

body mass was used as a supervisory measure of signs of vitamin C deficiency

(Figure 6.1). While the training cohort was small, testing of the models supported

their validity as the predicted values were in good agreement with actual values for

Batch 2 samples (Figure 6.2). Moreover, despite the early harvesting of the external

validation samples, prior to the onset of symptoms, the model showed a significant

(P < 0.05) ability to separate KO and WT samples from the second batch of unseen

animals (Figure 6.4). Taken together with the confidence intervals for the metabolite

changes, we have evidence that the model accurately describes the changes induced

by vitamin C shortfalls and not other phenomena or random variation.

The model loadings (Figure 6.5) reflect the significant changes between WT and

KO mice. The difficulty of interpretation stems from the complexity of the

interactions within the animal systems; no single metabolite can both capture and

segregate the multiple responses to various oxidative stresses. Some are ubiquitous

indicators of non-specific pathology, while others are involved in numerous

pathways. By viewing changes in the context of the total response the loadings can

highlight specific pathway segments of interest. In addition to literature sources, the

Kyoto Encyclopedia of Genes and Genomes (Kanehisa et al., 2010) and the Human

Metabolome Database (Wishart et al., 2007) were used to identify functional groups

of metabolites with significant changes. Several functional groups were associated

with changes in vitamin C availability, which together highlight major shifts in

metabolism: an increase in glutathione production to compensate for ascorbic acid’s

loss, and/or an increase in glycerophospholipid metabolism for either energetic or

precursor-provision reasons.

Energy regulation (Pyruvate, Acetate, Oxaloacetate, Succinate, Fumarate)

The TCA cycle’s central role in energy metabolism features prominently in many

metabolomic profiles, and central TCA players can be affected by so many

responses (e.g. injury, diet, uptake, repair) that direct attribution is often difficult.

The known role of ascorbate in activating prolyl 4-hydroxylase to increase succinate

creation from 2-oxoglutarate raises some questions about succinate’s perceived

increase despite the decreasing levels of fumarate, pyruvate, and oxaloacetate. This

may imply an ascorbate-shortage induced shunt from either pyruvate or succinate

(or both) via alanine (which demonstrated a significant increase) and aspartate.

While aspartic acid exhibited no consistent increase in concentration there was a

large increase in variance indicating possible increases in flux; the changes in both

lysine and phenylalanine levels could be attributed to disruption of aspartic acid

production from alanine and oxaloacetate.

Carnitine biosynthesis

Two of the most significant and coherent changes seen are the decrease in carnitine

levels and an increase in lysine. Carnitine can act as a transporter of fatty acids into

the mitochondria, and the twofold role of ascorbate in its synthesis is well

established (Padh, 1990). Ascorbic acid acts as a cofactor for hydroxylation of

trimethyl-lysine and butyrobetaine, both precursors for the production of carnitine.

The direct implications of this can be seen in the significant drop in carnitine and O-

acetylcarnitine in KO mice, concurrent with a significant increase in lysine and

betaine. Whether it is a cause or effect requires further investigation: the resulting

shift in aspartate-to-lysine conversion would result in increased flux towards

cysteine and glutathione production, or an increase in lipid metabolism could result

from other oxidative stress responses (Hopps et al., 2010) (see below).

Glutathione synthesis

As part of the GSH/ascorbate redox chain the significant change in glutathione

levels is of immediate interest. In the absence of dehydroascorbic acid, GSSG

conversion to GSH is impeded and the normal cycle of reductive protection is cut

off from the terminal outlet of vitamin E quinones (Winkler et al., 1994). While

increased production in GSH would eventually be overwhelmed it could serve as a

short-term compensatory measure, and a number of metabolic shifts support the

possibility of increased glutathione production in these knockout mice.

Cystathionine-β-synthase (CBS) is the committed enzyme step in production of

cysteine from methionine, and is regulated by the oxidative state of its heme group.

The decrease in methionine levels supports the notion that an absence of ascorbate

results in oxidation of CBS and upregulation of cysteine production, one of two

required precursors for the increased production of glutathione. The other precursor

for glutathione production, glycine, is also seen to increase while serine (its

precursor) decreased significantly (Figure 6.6). Significant depletion of serine is

also supported by a potential increase in glycerophospholipid metabolism (via

conversion to O-phosphatidylserine, see below).

Figure 6.6: KEGG pathway diagram for serine and glycine metabolism

Glycerophospholipid metabolism

The additional role of ascorbate as a primary regulator of glycerophosphocholine-

cholinephosphodiesterase enzymes (Sok, 1998) may explain the differential levels

of choline, phosphocholine, and glycerol in our results. If the loss of ascorbate’s

stabilizing action on GPC-cholinediesterase gave rise to an increase in free choline,

the corresponding increase in betaine would feed the glutathione production above.

It seems reasonable that the concomitant release of diacylglycerols into the cell may

be an evolutionary response. DAG is a known secondary messenger, activating

protein kinase C. Moreover, release of those free lipids may be related to the reports

of insulin resistance in some cases of long term, mild, ascorbate deficiency. Further

investigation of intermediate energy metabolism using other techniques (e.g.

lipidomics, hyperinsulinaemic clamp) might be used to distinguish between these

possibilities. Regardless, it suffices to say that the implications of low ascorbate

levels on lipid metabolism seem both profound and far-reaching.

The one metabolite of high significance that cannot be directly ascribed to any

particular ascorbate-related reaction within the scope of this study is 1,3-

dimethyluric acid. The importance of uric acid as a secondary antioxidant (Ames et

al., 1981), particularly in primates that lack the GULO gene (Sevanian et al., 1991),

suggests a number of possible functions; however, currently no literature linking the

two phenomena is available. A number of dimethylxanthine-related interactions

(Lee, 2000) have been described in the literature as having ascorbic acid

associations, but in light of the numerous possibilities there is insufficient evidence

in these results to extrapolate. The lack of significant change in some related

metabolites, such as allantoin, further complicates the issue.

6.6 Conclusions

Our results highlight the ability of the Gulo knockout mouse to present a clear

picture of the metabolic changes induced by vitamin C deficiency, and of 1D proton

NMR to capture those changes. Quantitative profiling simplified the NMR results to

a manageable set of identified metabolites, and PLS was used to identify coherent

patterns (statistical models) of change between wild-type and KO mice.

Orthogonalization of those PLS loadings clarified the results and allowed

interpretation of the models with respect to vitamin C deficiency induced changes.

They also allowed comparison of model loadings between two sample batches, and

validation of the resulting models by virtue of accurately predicting the condition of

unknown animals. Finally, our results are in good agreement with several

metabolites known to require ascorbate for enzymatic activation and proper flux.

Almost all of the metabolic changes seen in the animal can be attributed to a few

pathways of known importance, which provides a coherent picture of how the

absence of ascorbate’s reducing action upregulates production of other antioxidants,

primarily glutathione, and possibly uric acid. The links between those pathways and

the perceived changes underscore the ability of NMR-based metabolomics to

capture the impact of complex diseases in a quantitative way.

Chapter 7 Conclusions and Future Work

Metabolomics as a field continues to evolve to meet the expectations of clinical,

pharmaceutical, biochemical and agricultural interests. This rapid growth can be

seen in the development of novel applications for metabolic profiling in the last five

years, but also in the frequent questions raised by end users about the interpretation,

implications and limitations of past results.

In this early, developmental phase, questions answered with metabolomics

techniques often present opportunities for further study, new biochemical

hypotheses or variations on technical themes which might improve the information

available. These novel avenues for exploration have comprised the crux of the work

performed herein, both in terms of specific innovation and, more importantly, in

addressing the larger concept that an “engineered” approach to experimental design

may be both possible and appropriate to a given biological problem.

There is a saying about those who fail to study history, and thus a proper

examination of metabolomic techniques began with a baseline experiment on a well-

established system of interest. Dr Shearer’s work on the medically pressing issue of

insulin resistance, using mouse models to control as many factors as possible,

generated high quality serum samples. Analysis via 1H NMR (with select HSQC

spectra to calibrate our chemical assignment) provided successful insight into the

power of metabolomics. Unseen in the published results were the various

incremental steps required to optimize acquisition procedures, including pulse

sequence, relaxation times, sample concentration, and proper methods for targeted

profiling. The resulting data was an early example of physiological experimentation

which built on early work by Dr Weljie and has since been expanded on by us and

others (Zhao et al., 2010; Kim et al., 2011). The significance of branched-chain

amino acids, a previously known class of signalling metabolites (Newgard et al.,

2009), was the key biochemical assertion made as a result, which was later held up

in our other studies.

Incidental to the biochemical implications, several other phenomena became

apparent during the unfolding investigative process. To calibrate the profiling

process better, the same set of samples was profiled multiple times and the results

compared. The variance revealed the existence of a significant acclimatization

and/or training period in each individual’s use of the Chenomx software, and a

heteroskedastic nature of human error in the profiling. The latter is likely an

unavoidable consequence of the univariate scaling technique used: a consistent error

variance in the profiling is exacerbated when the variances are multiplied by the

inverse of the metabolite’s average concentration. This became relevant in later

chapters looking at the use of variance-based estimation techniques.
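
A small worked example shows why a constant absolute profiling error becomes concentration-dependent after this kind of scaling. The numbers and names below are purely illustrative assumptions, not values from the study. The only operation shown is division by the metabolite's mean concentration.

```python
def scaled_error_sd(absolute_error_sd, mean_concentration):
    """Standard deviation of the profiling error after dividing by the mean."""
    return absolute_error_sd / mean_concentration


# Assume the same absolute profiling error (in arbitrary concentration units)
# for every metabolite, regardless of how abundant it is.
ABSOLUTE_ERROR_SD = 5.0
mean_concentrations = {"abundant metabolite": 500.0, "scarce metabolite": 20.0}

scaled = {
    name: scaled_error_sd(ABSOLUTE_ERROR_SD, mean_conc)
    for name, mean_conc in mean_concentrations.items()
}
# After scaling, the scarce metabolite carries a 25-fold larger relative error,
# so the error is no longer uniform across variables.
ratio = scaled["scarce metabolite"] / scaled["abundant metabolite"]
```

Because the scaling factor differs per metabolite, an error that was uniform in concentration units becomes heteroskedastic in the scaled data.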

Even with this early study design though, limitations were readily apparent.

Reviewers noted the potentially confounded short and long term effects of diet, and

the presence of energy metabolites’ responses in any kind of health-affecting

intervention is established in the literature. The diet switch study described in

chapter three showed that the aforementioned amino acids, with established

physiological signalling implications, responded similarly regardless of the final

week’s diet and the normalization of glucose levels and liver triglycerides. This was in

contrast, as suspected, to the changes in immediate energy metabolites, which fell

off-axis in the SUS plot comparing regular- and high-fat-chow fed metabolic

changes.

Notwithstanding its well established use in other fields, there have been few

metabolomics studies using similar controls (Miccheli et al., 2009; Scalbert et al.,

2009; Zivkovic and German, 2009; Westerhuis et al., 2010). The complexity of

normal multivariate results is compounded, exponentially, by the introduction of

four different sample treatments. Because no linear order exists between treatments,

a simple OPLS model can’t capture both relationships, but the SUS-based approach

presents a simple and intuitive way to present the results. By using two models and

then juxtaposing the two related phenomena in a graphical form, the relationships

between samples are easy to grasp.
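
The geometric reading of such a plot can be sketched in code. The following is a minimal illustration, assuming each metabolite has one loading from each of the two models. The function name, the angular tolerance, and the magnitude cut-off are hypothetical choices rather than values used in this work.

```python
import math


def classify_sus(loading_a, loading_b, tolerance_deg=22.5, min_magnitude=0.1):
    """Label a metabolite by its position in a two-model loadings plot."""
    if math.hypot(loading_a, loading_b) < min_magnitude:
        return "unchanged"
    # Fold the direction into [0, 180): 0 is the model A axis, 45 the diagonal,
    # 90 the model B axis, and 135 the anti-diagonal.
    angle = math.degrees(math.atan2(loading_b, loading_a)) % 180.0

    def near(target):
        difference = abs(angle - target)
        return min(difference, 180.0 - difference) <= tolerance_deg

    if near(45.0):
        return "shared"            # same direction of change in both models
    if near(135.0):
        return "opposite"          # opposite direction of change
    if near(0.0):
        return "unique to model A"
    return "unique to model B"


# Invented loadings for five metabolites: (model A, model B).
examples = {
    "m1": (0.5, 0.5),
    "m2": (0.6, -0.6),
    "m3": (0.7, 0.02),
    "m4": (0.01, -0.8),
    "m5": (0.01, 0.02),
}
labels = {name: classify_sus(a, b) for name, (a, b) in examples.items()}
```

Metabolites labelled as unique to one model are the ones that fall off the diagonal and lie close to a single axis.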

The concept of data “purity”, protected by proper controls and unadulterated by

other effects, can be augmented and complemented by improving an experiment’s

perspective and depth. Once the data is focused to the condition under study, the

quality can still be improved by taking metabolomics beyond the realm of simple

case-controls.

In the case of insulin resistance, a phenomenon with astronomical cost implications

and diabolical complexity (Bain et al., 2009), diagnostic potential is undeniably

important but arguably secondary to insight about the physiological mechanisms

and interactions. The effects of exercise are known to interact with both causes and

effects of insulin resistance in complex ways (Atalay and Laaksonen, 2002), making

it an interesting subject for two-factor analysis. If metabolomics could identify and

isolate those effects, and interactions, of exercise and high-fat feeding then it could

provide a perspective unavailable to one-factor experiments.

When analysing the resulting metabolite concentration matrix, a PLS model with

both feeding and exercise as Y-variables gave a serendipitous model but the contrast

of two one-factor OPLS models was clearer (Figure 4.1a). By separating the factors

as seen in chapter 4, this technique leveraged the experimental cohorts better in the

same way that analysis of age- or gender-paired controls provides contrast analysis.

Doing so also establishes an important continuity and context, allowing for easier

comparison with other experiments.

Biochemically, the initial experiments with HF-diet induced changes highlighted

energy metabolism and signalling as major shifts associated with the diet, which the

non-metabolomic glycemic clamp technique suggests also caused insulin resistance

in the susceptible mice. The two-factor analysis shows that, while the changes in some

energy metabolites persist when exercised, the hyperglycaemia, the changes in

signalling amino acids, and shifts in other energy metabolites were ablated. It’s

worth noting that the significance estimates for the loadings in this study were

generated with a jackknife-based technique similar to that studied in chapter five.

They should therefore be viewed with a degree of skepticism. Nevertheless, they do

show a degree of biochemical logic which is encouraging.

Given the assumption that exercise mitigates the onset of insulin resistance, the

results suggest a potential method of activity at the regulatory level, and highlight a

potential difficulty in solely glucose-based non-metabolomic diagnostic measures.

Considering the relative ease of manipulating exercise and dietary fat content in

humans, this suggests that a potential next step would be human studies using

calibrated diet or health measures. While withholding treatment in humans is often

an ethical non-starter, the impact of controlling caloric intake and fat/protein

balance is more acceptable. The obvious, and unavoidable, limitation is the long

onset time of any insulin-resistance related condition, but it would appear that

metabolomics investigation could extract meaningful data from the non-pathological

initial changes seen in a months-scale crossover timeline. Unfortunately, the strong

glucose-screening effects of the kidneys make the easier urine-based studies

unlikely to bear useful information in this case.

Interesting alternate applications of the same methodology could include the two

factor analysis of antibiotic treatments in sepsis models (Izquierdo-García et al.,

2011; Lacy, 2011; Stringer et al., 2011) or plaque formation in Alzheimer’s

(Irizarry, 2004; Barba et al., 2008). These types of pathological conditions with

unknown etiology and muddled biochemistry benefit from the contrast provided by

a second factor. Moreover, both conditions are amenable to either simple case-

control structure or a more sophisticated dose/response model using treatment

magnitude as the supervisory variable.

In general, it seems the suitability of a particular condition to metabolomic analysis

will likely depend on the breadth of biochemical disruption caused by the

experimental manipulation. Treatments or conditions which induce changes in

immune response, primary energy, or other generic systems will require adaptation

to clarify the resulting interpretation. Two-factor analysis is one potential means of doing

so; however, the use of PLS is not without its drawbacks. One difficulty which arose

in comparing HF-induced changes in sedate and exercised animals was the scaled

nature of the loadings resulting from the OPLS (or any PLS-derived) model. The

use of weighted coefficients aids interpretation by controlling the spread of the

loadings over a fixed range, but renders them harder to compare because absolute

values are unknown. Instead, the comparison has to be done in a geometric context,

with the relative slope of point clusters providing context for other changes.

The same drawback became more significant when the comparison of multiple

mouse/diet models was performed in chapter 5. While the orthogonalization of the

model into single predictive components makes it easier to compare studies, the

different numerical ranges spanned by loadings still led to the possibility of non-unit

“slope” appearing between models. Such a trend would render either angle- or

distance-from-identity metrics of conservation invalid, and hence may be a reason

why variance-based estimates of significance were less accurate than expected.

Unfortunately, the scattering of a small number of metabolites at the extremes of the

loading ranges is usually insufficient to estimate the slope effect, if it exists.

Possible solutions would be further application of median-based slope estimators, or

forcing the loadings into the same range, but doing so would require significant

validation as a method. It may also be possible that the use of weights to scale

loadings may render the different loadings ranges meaningful and/or consistent.

Confirming the latter would still require investigation, but could be an important

step towards the accurate interpretation of OPLS models. Doing so would likely

require a properly designed control experiment with known sample variances, but

could be done on inexpensive bacterial studies.
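
As one concrete example of a median-based slope estimator, the Theil–Sen estimator takes the median of the slopes between all pairs of points. Whether this is the estimator intended above is an assumption, and the loadings below are invented for illustration.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the slopes between all pairs of points with distinct x values."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)


# Invented loadings from two models: model B spans half the range of model A,
# and one metabolite (index 3) disagrees strongly between the models.
loadings_a = [-0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9]
loadings_b = [0.5 * value for value in loadings_a]
loadings_b[3] = 0.6

slope = theil_sen_slope(loadings_a, loadings_b)
```

Because the median ignores the few pairs that involve the discordant metabolite, the estimate stays close to the underlying non-unit slope of 0.5 in this toy case.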

Other possible reasons abound for the unexpected failure of jackknifing as a

significance predictor. Since it appears unlikely that practitioners will (or even

should) cease the direct interpretation of PLS loadings, their relative interpretability

being the primary reason for their use, further investigation of both jackknifing and

other significance estimators remains an important area of methodological research.

Extending the comparison of batches to projection coefficients from PLS, as they

pertain to each Y variable, should be a relatively simple and immediate next step; it

may prove that variance eliminated from the predictive components by

orthogonalization is damaging the ability of jackknifing, and that simple PLS in fact

yields better results. In general, the interpretation of PLS-based terms is

insufficiently understood by metabolomics practitioners.

In addition to being our first use of multi-factor metabolomics, the study of exercise

and high fat diet marked our first effective application of network analysis tools

such as the IPA™ software suite from Ingenuity Systems (Ingenuity, 2011), and

KEGG pathway diagrams (Kanehisa et al., 2010). Because energy metabolites are

directly implicated in insulin-related pathology, and tightly coupled with many

processes at the cellular and systemic level, the pathway analysis didn’t provide

significant benefit in this case. Its subsequent application to the vitamin C study in

chapter 6, however, yielded much greater insight. While the postulated method of

interaction of vitamin C deficiency with the regulation of glutathione synthesis is

only a hypothesis, there was significant literature support for such a relationship.

Such an interpretation would not have been possible without a graphical network-

based perspective, which is increasingly becoming a minimum, and mandatory,

element of any serious metabolomics investigation of biochemistry.

The interaction between network analysis and significance prediction occurs at

multiple levels. While good numerical estimates of significance can aid

interpretation of the results in a network context by paring down the relevant

entities, the biochemical implications and connections contribute a large margin to

the final confidence in the results. Metabolite concentration shifts which were not

flagged as numerically significant can be interpreted with increased confidence

when they’re bracketed by complementary changes in network neighbours. In the

same way that Chenomx’s targeted profiling uses a priori molecular structural

knowledge to enhance the interpretation of the binned spectra, biochemical structure

data can enhance interpretation of multivariate results. The IPA™ suite represents a

first generation attempt to automate the application of this information, but as yet is

still very raw.

One area of research which represents a major frontier for metabolomics in the

future, and on which such next-generation biochemical interpretation tools will

likely be based, is the integration of data from multiple sources. A manual

incarnation of this integration can be seen in several of the studies presented here:

the antibody work, glycaemic clamps, and additional measures such as non-

esterified fatty acids (NEFA) gathered by Dr. Shearer gave important supporting

evidence to the multivariate metabolomics results. Automating this process is an

oft-discussed goal for metabolomics (Westerhuis et al., 2010), whether it’s single

additional variables such as this, or entire independent multivariate platforms such

as mass spectrometry (Dettmer et al., 2007; Amtmann and Armengaud, 2009;

Connor et al., 2010).

The naïve approach of concatenating input matrices discards far too much

information about the source properties of the data, but hierarchical multi-block PLS

presents an established tool which is more likely to accommodate the differences.

Hierarchical modeling can present a danger of over-fitting, further reinforcing the

need for effective confidence estimation at the base level, but also presents several

interesting possibilities. Residuals from the base models can be used to

factor out known issues (Schicho et al., 2010), while the use of base-scores as

variables for a top level model could theoretically integrate data from diverse

sources, effectively normalising and balancing each source. Unfortunately, at this

time there doesn’t appear to be any research into “carrying” variable confidence

from base models to higher orders.

Several other options exist for addressing the problem of integrating multisource

data, and developing techniques for application in metabolomics specifically will be

much more an exercise in statistical literature mining and testing than novel

methodological research. Examples include block-scaling, an intermediate option

which is offered by the commonly used SIMCA software package, but which is as

yet unexplored in a metabolomics context. As always, a properly structured

estimation for concentration distributions, trained with probabilistic integration

methods, represents the most flexible and powerful option but would require

significant development work and potentially larger sample sizes. One significant

advantage to such an approach would be the potential for relatively direct

integration of both molecular structure and biochemical network information,

making numerical and biological modeling an interactive two-way street.
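
Block-scaling can be illustrated with a short sketch. It assumes one common convention, in which each variable is first scaled to unit variance and then down-weighted by the square root of the number of variables in its block, so that every block contributes the same total variance. This is not taken from the SIMCA implementation, and the block names and values are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def autoscale(column):
    """Centre a variable and scale it to unit (sample) variance."""
    centre, spread = mean(column), stdev(column)
    return [(value - centre) / spread for value in column]


def block_scale(blocks):
    """Scale every block of variables to a total variance of one.

    blocks maps a block name to a list of columns; each column holds the
    values of one variable across all samples.
    """
    scaled = {}
    for name, columns in blocks.items():
        weight = 1.0 / math.sqrt(len(columns))
        scaled[name] = [
            [weight * value for value in autoscale(column)] for column in columns
        ]
    return scaled


# Hypothetical data for four samples: three NMR concentrations and one
# clamp-derived measurement.
blocks = {
    "nmr": [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [5.0, 5.0, 6.0, 8.0]],
    "clamp": [[0.1, 0.4, 0.2, 0.3]],
}
scaled = block_scale(blocks)
total_variance = {
    name: sum(variance(column) for column in columns)
    for name, columns in scaled.items()
}
```

With this weighting, a block of three variables and a block of one variable carry equal weight in a concatenated matrix, instead of the larger block dominating.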

Time Series: The Future of Metabolomics

Having established methods for improving sample manipulation, acquisition,

comparison, and interpretation, there remains potential for improvement at every

stage of the metabolomics pipeline. A major frontier and overarching theme to many

potential improvements is the concept of time series analysis. Samples which can

be compared longitudinally to others taken from the same biological context

(individual, animal, etc), prior to manipulation, offer the strongest form of controls

in a multivariate setting. Time series have been an increasingly important part of

the metabolomics literature and will likely continue to grow in significance for

several reasons.

In humans, the potential for variation in environmental exposure, dietary variation,

and genetic background is well established. Not all of this variation can be

eliminated or calibrated by comparison to a baseline, but the ability of self-controls

to focus on the changes relative to treatment in a particular window increases both

the purity and focus of the data. Moreover, the animal samples examined herein

have shown that even genetically controlled model organisms such as mice are

subject to variations in infection, dietary consumption, and other random factors

which differentiate individuals. Severity of onset in manipulated pathology, and the

benefits of pharmacological interventions, are likewise unpredictable and among the

most significant potential areas of systems biology application, so time series

analysis would likely benefit all such studies.

Acquiring and processing time series data will require, or at least benefit from, a

number of innovations or engineering improvements. At the sample collection and

instrumental acquisition level, the ability to analyse smaller volumes will be an

important development. Mass spectrometry presents an appealing sensitivity on this

front, but smaller bore NMR probes or capillary-NMR may eventually provide

complementary means to analyse small volume samples. The ability to quantify

samples in the 5–10 µL range would facilitate both human and animal experiments.

In murine studies, it could allow temporal perspectives of the same animal by

collecting a non-fatal serum volume at each time point. In humans, it could

minimize the physiological effects of repeated blood draws, facilitate the

application of metabolomics to cerebrospinal fluid from lumbar punctures, and

potentially enable the development of a portable, continuous-sampling apparatus. Such a

device could be derived from the insulin pump technology currently in public use,

inverted to provide an intimate perspective into human activity. Beyond

establishing a baseline, continuous sampling would allow analysis at a much higher

temporal resolution. This could eliminate, or at least capture and calibrate, many of

the problematic sources of variation such as food and drug metabolism, diurnal

biological cycles, and shifts caused by changes in energy expenditure.

From a numerical perspective, tools exist for the analysis of time-series data;

however, additional assumptions are often required about the nature (transient,

periodic, monotonic) of the trends involved. Moreover, training and testing will be

required to validate their application in metabolomics data, and the results will once

again need to be translated for consumption by end users. Visualization, whether

customised or standardized, will likely remain the most powerful tool available for

doing so.

One final area of investigation which would benefit from time series analysis, but

remains a distant goal for metabolomics, is that of measuring biochemical flux

within both cells and individuals. Very little quality work (Lee et al., 2006; Zamboni

and Sauer, 2009) has been done on the measurement of flux by metabolomic

methods, but it remains an important stepping stone for both biochemical

investigation and pharmacodynamics efforts. It may be that additional work is

required to clarify the relationships between cellular mechanics and biofluid-level

effects, before assertions can be made about flux based on metabolomics

measurements, but a higher temporal resolution between sampling would greatly

increase their utility.

Closing Thoughts

As an insider, it often seems that metabolomics has had a difficult adolescence,

failing to gain traction with external audiences. While there have been numerous

applications, the integration with other types of analysis has been slow in coming.

Moreover, there is no clear roadmap for the integration of metabolomics with

clinical or pharmaceutical efforts. Achieving that acceptance will be as much

political as scientific, but a foundation of clearer, more reliable data will be

necessary.

With some more reflection, however, it’s clear that in chasing its potential as a

method the field’s growth has been even more accelerated than other areas of

systems biology; the growing pains of the past decade are to be expected. With this

work, we’ve shown that metabolomics comprises a pipeline with the potential

for improvements at each step. Adapting established experimental structures to

metabolomics makes it clear that proper controls and well-thought-out comparisons yield

robust results and clearer biological insight. Nevertheless, the need for continued

research into data processing, confidence estimation, and visualization techniques

remains clear. Combining those innovations will invariably lead to more efficient

diagnostic techniques, more insight into disease causes, greater acceptance of the

results, and connections between the metabolite pipeline and those of other systems

biology fields. Only then will metabolomics have fully matured as an investigative

field.

Chapter 8 Bibliography

Abate-Shen, C., and Shen, M.M. (2009). Diagnostics: The prostate-cancer metabolome. Nature 457, 799–800.

Åberg, K.M., Alm, E., and Torgrip, R.J.O. (2009). The correspondence problem for metabonomics datasets. Anal Bioanal Chem 394, 151–162.

Addelman, S. (1969). The Generalized Randomized Block Design. The American Statistician 23, 35–36.

Adibi, S.A. (1968). Influence of dietary deprivations on plasma concentration of free amino acids of man. J Appl Physiol 25, 52–57.

Alam, T.M., Alam, M.K., McIntyre, S.K., Volk, D.E., Neerathilingam, M., and Luxon, B.A. (2009). Investigation of Chemometric Instrumental Transfer Methods for High-Resolution NMR. Anal. Chem. 81, 4433–4443.

Alevizos, I., Misra, J., Bullen, J., Basso, G., Kelleher, J., Mantzoros, C., Stephanopoulos, G., et al. (2007). Linking hepatic transcriptional changes to high-fat diet induced physiology for diabetes-prone and obese-resistant mice. Cell Cycle 6, 1631.

Alsberg, B.K., Kell, D.B., and Goodacre, R. (1998). Variable selection in discriminant partial least-squares analysis. Analytical Chemistry 70, 4126–4133.

Amantonico, A., Urban, P.L., and Zenobi, R. (2010). Analytical techniques for single-cell metabolomics: state of the art and trends. Anal Bioanal Chem 398, 2493–2504.

Ames, B.N., Cathcart, R., Schwiers, E., and Hochstein, P. (1981). Uric acid provides an antioxidant defense in humans against oxidant- and radical-caused aging and cancer: a hypothesis. Proc. Natl. Acad. Sci. U.S.A 78, 6858–6862.

Amtmann, A., and Armengaud, P. (2009). Effects of N, P, K and S on metabolism: new knowledge gained from multi-level analysis. Current Opinion in Plant Biology 12, 275–283.

Anderson, P.E., Mahle, D.A., Doom, T., Reo, N.V., DelRaso, N.J., and Raymer, M.L. (2010). Dynamic adaptive binning: an improved quantification technique for NMR spectroscopic data. Metabolomics.

Anderson, P.E., Reo, N.V., DelRaso, N.J., Doom, T.E., and Raymer, M.L. (2008). Gaussian binning: a new kernel-based method for processing NMR spectroscopic data for metabolomics. Metabolomics 4, 261–272.

Aristotle (1985). Metaphysics (Hackett Publishing).

Atalay, M., and Laaksonen, D.E. (2002). Diabetes, oxidative stress and physical exercise. Journal of Sports Science and Medicine 1, 1–14.

Ayala, J.E., Bracy, D.P., Julien, B.M., Rottman, J.N., Fueger, P.T., and Wasserman, D.H. (2007). Chronic treatment with sildenafil improves energy balance and insulin action in high fat-fed conscious mice. Diabetes 56, 1025–1033.

Bain, J.R., Stevens, R.D., Wenner, B.R., Ilkayeva, O., Muoio, D.M., and Newgard, C.B. (2009). Metabolomics Applied to Diabetes Research. Diabetes 58, 2429.

Bantscheff, M., Schirle, M., Sweetman, G., Rick, J., and Kuster, B. (2007). Quantitative mass spectrometry in proteomics: a critical review. Analytical and Bioanalytical Chemistry 389, 1017–1031.

Barba, I., Fernandez‐Montesinos, R., Garcia‐Dorado, D., and Pozo, D. (2008). Alzheimer’s disease beyond the genomic era: nuclear magnetic resonance (NMR) spectroscopy‐based metabolomics. Journal of Cellular and Molecular Medicine 12, 1477–1485.

Barnard, D.E., Lewis, S.M., Teter, B.B., and Thigpen, J.E. (2009). Open- and Closed-Formula Laboratory Animal Diets and Their Importance to Research. Journal of the American Association for Laboratory Animal Science: JAALAS 48, 709.

Bathe, O.F., Shaykhutdinov, R., Kopciuk, K., Weljie, A.M., McKay, A., Sutherland, F.R., Dixon, E., Dunse, N., Sotiropoulos, D., and Vogel, H.J. (2011). Feasibility of Identifying Pancreatic Cancer Based on Serum Metabolomics. Cancer Epidemiol Biomarkers Prev 20, 140–147.

Beckwith-Hall, B.M., Brindle, J.T., Barton, R.H., Coen, M., Holmes, E., Nicholson, J.K., and Antti, H. (2002). Application of orthogonal signal correction to minimise the effects of physical and biological variation in high resolution 1H NMR spectra of biofluids. Analyst 127, 1283–1288.

Beckwith-Hall, B.M., Nicholson, J.K., Nicholls, A.W., Foxall, P.J.D., Lindon, J.C., Connor, S.C., Abdi, M., Connelly, J., and Holmes, E. (1998). Nuclear Magnetic Resonance Spectroscopic and Principal Components Analysis Investigations into Biochemical Effects of Three Model Hepatotoxins. Chem. Res. Toxicol. 11, 260–272.

Bell, J.D., Sadler, P.J., Morris, V.C., and Levander, O.A. (1991). Effect of aging and diet on proton NMR spectra of rat urine. Magn. Reson. Med. 17, 414–422.

Von Bertalanffy, L. (1950). An outline of general system theory. British Journal for the Philosophy of Science 1, 134–165.

Bertone, P., Stolc, V., Royce, T.E., Rozowsky, J.S., Urban, A.E., Zhu, X., Rinn, J.L., Tongprasit, W., Samanta, M., Weissman, S., et al. (2004). Global Identification of Human Transcribed Sequences with Genome Tiling Arrays. Science 306, 2242–2246.

Bijlsma, S., Bobeldijk, I., Verheij, E.R., Ramaker, R., Kochhar, S., Macdonald, I.A., Van Ommen, B., and Smilde, A.K. (2006). Large-Scale Human Metabolomics Studies: A Strategy for Data (Pre-) Processing and Validation. Anal. Chem. 78, 567–574.

Bino, R.J., Hall, R.D., Fiehn, O., Kopka, J., Saito, K., Draper, J., Nikolau, B.J., Mendes, P., Roessner-Tunali, U., Beale, M.H., et al. (2004). Potential of metabolomics as a functional genomics tool. Trends in Plant Science 9, 418–425.

Bock, J.L. (1982). Analysis of serum by high-field proton nuclear magnetic resonance. Clinical Chemistry 28, 1873–1877.

Bollard, M.E., Stanley, E.G., Lindon, J.C., Nicholson, J.K., and Holmes, E. (2005). NMR-based metabonomic approaches for evaluating physiological influences on biofluid composition. NMR Biomed. 18, 143–162.

Bolstad, B.M., Irizarry, R.A., Åstrand, M., and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193.

Bonora, E., Targher, G., Branzi, P., Zenere, M., Saggiani, F., Zenti, M.G., Travia, D., Tonoli, M., Muggeo, M., and Cigolini, M. (1996). Cardiovascular risk profile in 38-year and 18-year-old men. Contribution of body fat content and regional fat distribution. Int. J. Obes. Relat. Metab. Disord 20, 28–36.

Bottomley, P. (1985). Noninvasive study of high-energy phosphate metabolism in human heart by depth-resolved 31P NMR spectroscopy. Science 229, 769–772.

Bouillon, R., Bex, M., Herck, E.V., Laureys, J., Dooms, L., Lesaffre, E., and Ravussin, E. (1995). Influence of age, sex, and insulin on osteoblast function: osteoblast dysfunction in diabetes mellitus. JCEM 80, 1194–1202.

Boulé, N.G., Haddad, E., Kenny, G.P., Wells, G.A., and Sigal, R.J. (2001). Effects of exercise on glycemic control and body mass in type 2 diabetes mellitus. JAMA: The Journal of the American Medical Association 286, 1218–1227.

Brenner, R.R., Bernasconi, A.M., and Garda, H.A. (2000). Effect of experimental diabetes on the fatty acid composition, molecular species of phosphatidyl-choline and physical properties of hepatic microsomal membranes. Prostaglandins, Leukotrienes and Essential Fatty Acids 63, 167–176.

Brindle, J.T., Antti, H., Holmes, E., Tranter, G., Nicholson, J.K., Bethell, H.W.L., Clarke, S., Schofield, P.M., McKilligin, E., Mosedale, D.E., et al. (2002). Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics. Nature Medicine 8, 1439–1445.

Broadhurst, D., Goodacre, R., Jones, A., Rowland, J.J., and Kell, D.B. (1997). Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry. Analytica Chimica Acta 348, 71–86.

Broadhurst, D.I., and Kell, D.B. (2006). Statistical strategies for avoiding false discoveries in metabolomics and related experiments. Metabolomics 2, 171–196.

Brooks, D.C., Bessey, P.Q., Black, P.R., Aoki, T.T., and Wilmore, D.W. (1986). Insulin stimulates branched chain amino acid uptake and diminishes nitrogen flux from skeletal muscle of injured patients. Journal of Surgical Research 40, 395–405.

Brown, M., Dunn, W.B., Ellis, D.I., Goodacre, R., Handl, J., Knowles, J.D., O’Hagan, S., Spasić, I., and Kell, D.B. (2005). A metabolome pipeline: from concept to data to knowledge. Metabolomics 1, 39–51.

Brownlee, M., and Cerami, A. (1981). The Biochemistry of the Complications of Diabetes Mellitus. Annual Review of Biochemistry 50, 385–432.

Cabanillas, F. (2010). Vitamin C and cancer: what can we conclude-1,609 patients and 33 years later? PR Health Sciences Journal 29.

Calles-Escandon, J., Cunningham, J., and Felig, P. (1984). The plasma amino acid response to cafeteria feeding in the rat: influence of hyperphagia, sucrose intake, and exercise. Metabolism 33, 364–368.

Canet, D. (1996). Nuclear magnetic resonance: concepts and methods (Chichester; New York: Wiley).

Cangemi, R., Angelico, F., Loffredo, L., Del Ben, M., Pignatelli, P., Martini, A., and Violi, F. (2007). Oxidative stress-mediated arterial dysfunction in patients with metabolic syndrome: Effect of ascorbic acid. Free Radic. Biol. Med 43, 853–859.

Carroll, K.K., and Kritchevsky, D. (1994). Nutrition and disease update: Cancer (The American Oil Chemists Society).

Cartee, G.D., Young, D.A., Sleeper, M.D., Zierath, J., Wallberg-Henriksson, H., and Holloszy, J.O. (1989). Prolonged increase in insulin-stimulated glucose transport in muscle after exercise. Am J Physiol Endocrinol Metab 256, E494–E499.

Castellino, P., Luzi, L., Simonson, D.C., Haymond, M., and DeFronzo, R.A. (1987). Effect of insulin and plasma amino acid concentrations on leucine metabolism in man. Role of substrate availability on estimates of whole body protein synthesis. Journal of Clinical Investigation 80, 1784.

Catchpole, G.S., Beckmann, M., Enot, D.P., Mondhe, M., Zywicki, B., Taylor, J., Hardy, N., Smith, A., King, R.D., and Kell, D.B. (2005). Hierarchical metabolomics demonstrates substantial compositional similarity between genetically modified and conventional potato crops. Proceedings of the National Academy of Sciences of the United States of America 102, 14458–14462.

Cerdan, S., Künnecke, B., and Seelig, J. (1990). Cerebral metabolism of [1,2-13C2]acetate as detected by in vivo and in vitro 13C NMR. J. Biol. Chem. 265, 12916–12926.

Chan, E.C.Y., Koh, P.K., Mal, M., Cheah, P.Y., Eu, K.W., Backshall, A., Cavill, R., Nicholson, J.K., and Keun, H.C. (2009). Metabolic profiling of human colorectal cancer using high-resolution magic angle spinning nuclear magnetic resonance (HR-MAS NMR) spectroscopy and gas chromatography mass spectrometry (GC/MS). J. Proteome Res 8, 352–361.

Chatterjee, I.B., Kar, N.C., Ghosh, N.C., and Guha, B.C. (1961). Biosynthesis of L-ascorbic acid: missing steps in animals incapable of synthesizing the vitamin. Nature 192, 163–164.

Chen, Q., Espey, M.G., Krishna, M.C., Mitchell, J.B., Corpe, C.P., Buettner, G.R., Shacter, E., and Levine, M. (2005). Pharmacologic ascorbic acid concentrations selectively kill cancer cells: Action as a pro-drug to deliver hydrogen peroxide to tissues. PNAS 102, 13604–13609.

Chen, Q., Espey, M.G., Sun, A.Y., Pooput, C., Kirk, K.L., Krishna, M.C., Khosh, D.B., Drisko, J., and Levine, M. (2008). Pharmacologic doses of ascorbate act as a prooxidant and decrease growth of aggressive tumor xenografts in mice. PNAS 105, 11105–11109.

Chevalier, S., Gougeon, R., Kreisman, S.H., Cassis, C., and Morais, J.A. (2004). The hyperinsulinemic amino acid clamp increases whole-body protein synthesis in young subjects. Metabolism 53, 388–396.

Cho, H.-W., Kim, S.B., Jeong, M.K., Park, Y., Ziegler, T.R., and Jones, D.P. (2008). Genetic algorithm-based feature selection in high-resolution NMR spectra. Expert Systems with Applications 35, 967–975.

Christian, P., and Stewart, C.P. (2010). Maternal Micronutrient Deficiency, Fetal Development, and the Risk of Chronic Disease. The Journal of Nutrition 140, 437–445.

Chung, Y.-L., and Griffiths, J.R. (2008). Using Metabolomics to Monitor Anticancer Drugs. In Oncogenes Meet Metabolism, G. Kroemer, D. Mumberg, H. Keun, B. Riefke, T. Steger-Hartmann, and K. Petersen, eds. (Springer Berlin Heidelberg), pp. 55–78.

Clark, T.A., Sugnet, C.W., and Ares, M. (2002). Genomewide Analysis of mRNA Processing in Yeast Using Splicing-Specific Microarrays. Science 296, 907–910.

Claudino, W.M., Quattrone, A., Biganzoli, L., Pestrin, M., Bertini, I., and Di Leo, A. (2007). Metabolomics: available results, current research projects in breast cancer, and future applications. Journal of Clinical Oncology 25, 2840.

Clee, S.M., and Attie, A.D. (2007). The genetic landscape of type 2 diabetes in mice. Endocrine Reviews 28, 48–83.

Cloarec, O., Dumas, M.-E., Craig, A., Barton, R.H., Trygg, J., Hudson, J., Blancher, C., Gauguier, D., Lindon, J.C., Holmes, E., et al. (2005). Statistical total correlation spectroscopy: an exploratory approach for latent biomarker identification from metabolic 1H NMR data sets. Anal. Chem 77, 1282–1289.

Collins, F.S., Patrinos, A., Jordan, E., Chakravarti, A., Gesteland, R., and Walters, L. (1998). New Goals for the U.S. Human Genome Project: 1998-2003. Science 282, 682–689.

Connor, S.C., Hansen, M.K., Corner, A., Smith, R.F., and Ryan, T.E. (2010). Integration of metabolomics and transcriptomics data to aid biomarker discovery in type 2 diabetes. Mol. BioSyst. 6, 909.

Craig, A., Cloarec, O., Holmes, E., Nicholson, J.K., and Lindon, J.C. (2006). Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal. Chem. 78, 2262–2267.

Davis, R.A., Charlton, A.J., Godward, J., Jones, S.A., Harrison, M., and Wilson, J.C. (2007). Adaptive binning: An improved binning method for metabolomics data using the undecimated wavelet transform. Chemometrics and Intelligent Laboratory Systems 85, 144–154.

Dawidowicz, E.A. (1987). Dynamics of Membrane Lipid Metabolism and Turnover. Annual Review of Biochemistry 56, 43–57.

Defernez, M., and Kemsley, E.K. (1997). The use and misuse of chemometrics for treating classification problems. TrAC Trends in Analytical Chemistry 16, 216–221.

Descartes, R. (1643). Discours de la méthode; Méditations y Description du corps humain. Oeuvres Philosophiques, Editions Alquie, París 1967.

Dettmer, K., Aronov, P.A., and Hammock, B.D. (2007). Mass spectrometry-based metabolomics. Mass Spectrom Rev 26, 51–78.

DeVilliers, D.C., Dixit, P.K., and Lazarow, A. (1966). Citrate metabolism in diabetes: I. Plasma citrate in alloxan-diabetic rats and in clinical diabetes. Metabolism 15, 458–465.

Dieterle, F., Ross, A., Schlotterbeck, G., and Senn, H. (2006). Probabilistic Quotient Normalization as Robust Method to Account for Dilution of Complex Biological Mixtures. Application in 1H NMR Metabonomics. Anal. Chem. 78, 4281–4290.

Dixon, R.A., Gang, D.R., Charlton, A.J., Fiehn, O., Kuiper, H.A., Reynolds, T.L., Tjeerdema, R.S., Jeffery, E.H., German, J.B., Ridley, W.P., et al. (2006). Applications of Metabolomics in Agriculture. J. Agric. Food Chem. 54, 8984–8994.

Van Doorn, M., Vogels, J., Tas, A., Van Hoogdalem, E.J., Burggraaf, J., Cohen, A., and Van Der Greef, J. (2007). Evaluation of metabolite profiles as biomarkers for the pharmacological effects of thiazolidinediones in Type 2 diabetes mellitus patients and healthy volunteers. British Journal of Clinical Pharmacology 63, 562–574.

Draisma, H.H.M., Reijmers, T.H., Van der Kloet, F., Bobeldijk-Pastorova, I., Spies-Faber, E., Vogels, J.T.W.E., Meulman, J.J., Boomsma, D.I., Van der Greef, J., and Hankemeier, T. (2010). Equating, or Correction for Between-Block Effects with Application to Body Fluid LC−MS and NMR Metabolomics Data Sets. Anal. Chem. 82, 1039–1046.

Duarte, N.C., Becker, S.A., Jamshidi, N., Thiele, I., Mo, M.L., Vo, T.D., Srivas, R., and Palsson, B.Ø. (2007). Global reconstruction of the human metabolic network based on genomic and bibliomic data. PNAS 104, 1777–1782.

Duggan, G.E., Hittel, D.S., Hughey, C.C., Weljie, A., Vogel, H.J., and Shearer, J. (2011a). Differentiating short-and long-term effects of diet in the obese mouse using 1H-nuclear magnetic resonance metabolomics. Diabetes, Obesity and Metabolism 13, 859–862.

Duggan, G.E., Hittel, D.S., Sensen, C.W., Weljie, A.M., Vogel, H.J., and Shearer, J. (2011b). Metabolomic response to exercise training in lean and diet-induced obese mice. Journal of Applied Physiology 110, 1311–1318.

Duggan, G.E., Joan Miller, B., Jirik, F.R., and Vogel, H.J. (2011c). Metabolic profiling of vitamin C deficiency in Gulo-/- mice using proton NMR spectroscopy. Journal of Biomolecular NMR 49, 165–173.

Dumas, M.-E., Maibaum, E.C., Teague, C., Ueshima, H., Zhou, B., Lindon, J.C., Nicholson, J.K., Stamler, J., Elliott, P., Chan, Q., et al. (2006). Assessment of analytical reproducibility of 1H NMR spectroscopy based metabonomics for large-scale epidemiological research: the INTERMAP Study. Anal. Chem. 78, 2199–2208.

Dunn, W.B., Broadhurst, D.I., Atherton, H.J., Goodacre, R., and Griffin, J.L. (2011). Systems level studies of mammalian metabolomes: the roles of mass spectrometry and nuclear magnetic resonance spectroscopy. Chem Soc Rev 40, 387–426.

Dunn, W.B., and Ellis, D.I. (2005). Metabolomics: Current analytical platforms and methodologies. TrAC Trends in Analytical Chemistry 24, 285–294.

Dunne, V.G., Bhattachayya, S., Besser, M., Rae, C., and Griffin, J.L. (2005). Metabolites from cerebrospinal fluid in aneurysmal subarachnoid haemorrhage correlate with vasospasm and clinical outcome: a pattern-recognition 1H NMR study. NMR in Biomedicine 18, 24–33.

Eastment, H.T., and Krzanowski, W.J. (1982). Cross-Validatory Choice of the Number of Components from a Principal Component Analysis. Technometrics 24, 73–77.

Efron, B., and Gong, G. (1983). A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation. The American Statistician 37, 36–48.

Efron, B., and Tibshirani, R.J. (1993). An introduction to the bootstrap (CRC Press).

Eriksson, L., Trygg, J., and Wold, S. (2008). CV‐ANOVA for significance testing of PLS and OPLS® models. Journal of Chemometrics 22, 594–600.

Fan, T.W., Lane, A.N., Higashi, R.M., Farag, M.A., Gao, H., Bousamra, M., and Miller, D.M. (2009). Altered regulation of metabolic pathways in human lung cancer discerned by 13C stable isotope-resolved metabolomics (SIRM). Mol Cancer 8, 41.

Fell, D. (1997). Understanding the control of metabolism (Portland Press Ltd.).

Fiehn, O. (2002). Metabolomics – the link between genotypes and phenotypes. Plant Molecular Biology 48, 155–171.

Fiehn, O., Robertson, D., Griffin, J., Van der Werf, M., Nikolau, B., Morrison, N., Sumner, L.W., Goodacre, R., Hardy, N.W., Taylor, C., et al. (2007). The metabolomics standards initiative (MSI). Metabolomics 3, 175–178.

Forbes, V.E., Palmqvist, A., and Bach, L. (2006). The use and misuse of biomarkers in ecotoxicology. Environmental Toxicology and Chemistry 25, 272–280.

Forlani, G., Vannini, P., Marchesini, G., Zoli, M., Ciavarella, A., and Pisi, E. (1984). Insulin-dependent metabolism of branched-chain amino acids in obesity. Metabolism 33, 147–150.

Franconi, F., Loizzo, A., Ghirlanda, G., and Seghieri, G. (2006). Taurine supplementation and diabetes mellitus. Current Opinion in Clinical Nutrition & Metabolic Care 9, 32.

Frank, R., and Hargreaves, R. (2003). Clinical biomarkers in drug discovery and development. Nature Reviews Drug Discovery 2, 566–580.

Franken, H., Seitz, A., Lehmann, R., Häring, H.U., Stefan, N., and Zell, A. (2012). Inferring disease-related metabolite dependencies with a Bayesian optimization algorithm. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics 62–73.

Friedrich, N. (2012). Metabolomics in diabetes research. J Endocrinol 215, 29–42.

Frikke-Schmidt, H., and Lykkesfeldt, J. (2009). Role of marginal vitamin C deficiency in atherogenesis: in vivo models and clinical studies. Basic Clin. Pharmacol. Toxicol 104, 419–433.

Fueger, P.T., Bracy, D.P., Malabanan, C.M., Pencek, R.R., Granner, D.K., and Wasserman, D.H. (2004a). Hexokinase II overexpression improves exercise-stimulated but not insulin-stimulated muscle glucose uptake in high-fat-fed C57BL/6J mice. Diabetes 53, 306–314.

Fueger, P.T., Bracy, D.P., Malabanan, C.M., Pencek, R.R., and Wasserman, D.H. (2004b). Distributed control of glucose uptake by working muscles of conscious mice: roles of transport and phosphorylation. Am J Physiol Endocrinol Metab 286, E77–E84.

Fueger, P.T., Hess, H.S., Bracy, D.P., Pencek, R.R., Posey, K.A., Charron, M.J., and Wasserman, D.H. (2004c). Regulation of Insulin-Stimulated Muscle Glucose Uptake in the Conscious Mouse: Role of Glucose Transport Is Dependent on Glucose Phosphorylation Capacity. Endocrinology 145, 4912–4916.

Fueger, P.T., Lee-Young, R.S., Shearer, J., Bracy, D.P., Heikkinen, S., Laakso, M., Rottman, J.N., and Wasserman, D.H. (2007). Phosphorylation Barriers to Skeletal and Cardiac Muscle Glucose Uptakes in High-Fat–Fed Mice: Studies in Mice With a 50% Reduction of Hexokinase II. Diabetes 56, 2476–2484.

Fueger, P.T., Shearer, J., Bracy, D.P., Posey, K.A., Pencek, R.R., McGuinness, O.P., and Wasserman, D.H. (2005). Control of muscle glucose uptake: test of the rate-limiting step paradigm in conscious, unrestrained mice. J Physiol 562, 925–935.

Fukusaki, E., and Kobayashi, A. (2005). Plant metabolomics: potential for practical operation. Journal of Bioscience and Bioengineering 100, 347–354.

Gadian, D.G., and Radda, G.K. (1981). NMR Studies of Tissue Metabolism. Annual Review of Biochemistry 50, 69–83.

Garlick, P.J., and Grant, I. (1988). Amino acid infusion increases the sensitivity of muscle protein synthesis in vivo to insulin. Effect of branched-chain amino acids. Biochemical Journal 254, 579.

Gavai, A.K. (2009). Bayesian networks for omics data analysis (Wageningen Universiteit (Wageningen University)).

Ghannoum, M.A., Mukherjee, P.K., Jurevic, R.J., Retuerto, M., Brown, R.E., Sikaroodi, M., Webster-Cyriaque, J., and Gillevet, P.M. (2011). Metabolomics Reveals Differential Levels of Oral Metabolites in HIV-Infected Patients: Toward Novel Diagnostic Targets. OMICS: A Journal of Integrative Biology.

Gilbert, R.J., Goodacre, R., Woodward, A.M., and Kell, D.B. (1997). Genetic Programming: A Novel Method for the Quantitative Analysis of Pyrolysis Mass Spectral Data. Analytical Chemistry 69, 4381–4389.

Giovannucci, E., and Michaud, D. (2007). The role of obesity and related metabolic disturbances in cancers of the colon, prostate, and pancreas. Gastroenterology 132, 2208–2225.

Giskeødegård, G.F., Grinde, M.T., Sitter, B., Axelson, D.E., Lundgren, S., Fjøsne, H.E., Dahl, S., Gribbestad, I.S., and Bathen, T.F. (2010). Multivariate Modeling and Prediction of Breast Cancer Prognostic Factors Using MR Metabolomics. J. Proteome Res. 9, 972–979.

Glassbrook, N., Beecher, C., and Ryals, J. (2000). Metabolic profiling on the right path. Nat. Biotechnol 18, 1142–1143.

Glassbrook, N., and Ryals, J. (2001). A systematic approach to biochemical profiling. Current Opinion in Plant Biology 4, 186–190.

Gonzalez, C.A., and Riboli, E. (2010). Diet and cancer prevention: Contributions from the European Prospective Investigation into Cancer and Nutrition (EPIC) study. European Journal of Cancer 46, 2555–2562.

Goodacre, R., Broadhurst, D., Smilde, A.K., Kristal, B.S., Baker, J.D., Beger, R., Bessant, C., Connor, S., Capuani, G., Craig, A., et al. (2007). Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics 3, 231–241.

Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G., and Kell, D.B. (2004). Metabolomics by numbers: acquiring and understanding global metabolite data. TRENDS in Biotechnology 22, 245–252.

Van der Greef, J., Martin, S., Juhasz, P., Adourian, A., Plasterer, T., Verheij, E.R., and McBurney, R.N. (2007). The Art and Practice of Systems Biology in Medicine: Mapping Patterns of Relationships. J. Proteome Res. 6, 1540–1559.

Griffin, J.L. (2006). Understanding mouse models of disease through metabolomics. Current Opinion in Chemical Biology 10, 309–315.

Griffin, J.L., and Kauppinen, R.A. (2007). A metabolomics perspective of human brain tumours. FEBS Journal 274, 1132–1139.

Griffin, J.L., and Nicholls, A.W. (2006). Metabolomics as a functional genomic tool for understanding lipid dysfunction in diabetes, obesity and related disorders. Pharmacogenomics 7, 1095+.

Ha, M.N., Graham, F.L., D’Souza, C.K., Muller, W.J., Igdoura, S.A., and Schellhorn, H.E. (2004). Functional rescue of vitamin C synthesis deficiency in human cells using adenoviral-based expression of murine l-gulono-γ-lactone oxidase. Genomics 83, 482–492.

Hageman, J.A., Van den Berg, R.A., Westerhuis, J.A., Van der Werf, M.J., and Smilde, A.K. (2008). Genetic algorithm based two-mode clustering of metabolomics data. Metabolomics 4, 141–149.

Hall, R., Beale, M., Fiehn, O., Hardy, N., Sumner, L., and Bino, R. (2002). Plant metabolomics: the missing link in functional genomics strategies. The Plant Cell Online 14, 1437–1440.

Harper, A.E., Miller, R.H., and Block, K.P. (1984). Branched-chain amino acid metabolism. Annual Review of Nutrition 4, 409–454.

Harri, H., et al. (2009). Vitamin C supplementation and the common cold-was Linus Pauling right or wrong?

Harrison, F.E., Meredith, M.E., Dawes, S.M., Saskowski, J.L., and May, J.M. (2010). Low ascorbic acid and increased oxidative stress in gulo(-/-) mice during development. Brain Res 1349, 143–152.

Harvie, D.I. (2005). Limeys: The Conquest of Scurvy (The History Press Ltd).

Hastie, T., Tibshirani, R., and Friedman, J.H. (2009). The elements of statistical learning: data mining, inference, and prediction (Springer).

Heath, H., Melton, L.J., and Chu, C.-P. (1980). Diabetes Mellitus and Risk of Skeletal Fracture. New England Journal of Medicine 303, 567–570.

Heinig, M., and Johnson, R.J. (2006). Role of uric acid in hypertension, renal disease, and metabolic syndrome. Cleve Clin J Med 73, 1059–1064.

Hilario, M., Kalousis, A., Müller, M., and Pellegrini, C. (2003). Machine learning approaches to lung cancer prediction from mass spectra. Proteomics 3, 1716–1719.

Hirschmann, J.V., and Raugi, G.J. (1999). Adult scurvy. J. Am. Acad. Dermatol 41, 895–906; quiz 907–910.

Holmes, E., and Antti, H. (2002). Chemometric contributions to the evolution of metabonomics: mathematical solutions to characterising and interpreting complex biological NMR spectra. Analyst 127, 1549–1557.

Honors, M.A., Davenport, B.M., and Kinzig, K.P. Effects of consuming a high carbohydrate diet after eight weeks of exposure to a ketogenic diet. Nutr Metab (Lond) 6, 46.

Hopps, E., Noto, D., Caimi, G., and Averna, M.R. (2010). A novel component of the metabolic syndrome: the oxidative stress. Nutr Metab Cardiovasc Dis 20, 72–77.

Idle, J.R., and Gonzalez, F.J. (2007). Metabolomics. Cell Metabolism 6, 348–351.

Imoto, S., and Namioka, S. (1983a). Acetate-Glucose Relationship in Growing Pigs. J Anim Sci 56, 867–875.

Imoto, S., and Namioka, S. (1983b). Nutritive Value of Acetate in Growing Pigs. J Anim Sci 56, 858–866.

Ingenuity, I. (2011). IPA (Ingenuity Pathway Analysis).

Irizarry, M.C. (2004). Biomarkers of Alzheimer Disease in Plasma. NeuroRX 1, 226–234.

Izquierdo-García, J.L., Nin, N., Ruíz-Cabello, J., Rojas, Y., De Paula, M., López-Cuenca, S., Morales, L., Martínez-Caro, L., Fernández-Segoviano, P., Esteban, A., et al. (2011). A metabolomic approach for diagnosis of experimental sepsis. Intensive Care Medicine 1–10.

James, P. (1997). Protein identification in the post-genome era: the rapid rise of proteomics. Quarterly Reviews of Biophysics 30, 279–331.

Jankevics, A., Liepinsh, E., Liepinsh, E., Vilskersts, R., Grinberga, S., Pugovics, O., and Dambrova, M. (2009). Metabolomic studies of experimental diabetic urine samples by 1H NMR spectroscopy and LC/MS method. Chemometrics and Intelligent Laboratory Systems 97, 11–17.

Jentzmik, F., Stephan, C., and Jung, K. (2010a). Reply to Arun Sreekumar, Laila M. Poisson, Thekkelnaycke M. Rajendiran, et al.’s Letter to the Editor re: Florian Jentzmik, Carsten Stephan, Kurt Miller, et al. Sarcosine in Urine after Digital Rectal Examination Fails as a Marker in Prostate Cancer Detection and Identification of Aggressive Tumours. Eur Urol 2010;58:12-8. European Urology 58, e31–e32.

Jentzmik, F., Stephan, C., Lein, M., Miller, K., Kamlage, B., Bethan, B., Kristiansen, G., and Jung, K. (2011). Sarcosine in Prostate Cancer Tissue is Not a Differential Metabolite for Prostate Cancer Aggressiveness and Biochemical Progression. The Journal of Urology 185, 706–711.

Jentzmik, F., Stephan, C., Miller, K., Schrader, M., Erbersdobler, A., Kristiansen, G., Lein, M., and Jung, K. (2010b). Sarcosine in Urine after Digital Rectal Examination Fails as a Marker in Prostate Cancer Detection and Identification of Aggressive Tumours. European Urology 58, 12–18.

Jonsson, P., Bruce, S.J., Moritz, T., Trygg, J., Sjöström, M., Plumb, R., Granger, J., Maibaum, E., Nicholson, J.K., Holmes, E., et al. (2005). Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets. Analyst 130, 701–707.

Jonsson, P., Gullberg, J., Nordström, A., Kusano, M., Kowalczyk, M., Sjöström, M., and Moritz, T. (2004). A strategy for identifying differences in large series of metabolomic samples analyzed by GC/MS. Anal. Chem 76, 1738–1745.

Kaddurah-Daouk, R., Kristal, B.S., and Weinshilboum, R.M. (2008). Metabolomics: a global biochemical approach to drug response and disease. Annu. Rev. Pharmacol. Toxicol. 48, 653–683.

Kaderbhai, N.N., Broadhurst, D.I., Ellis, D.I., Goodacre, R., and Kell, D.B. (2003). Functional genomics via metabolic footprinting: monitoring metabolite secretion by Escherichia coli tryptophan metabolism mutants using FT-IR and direct injection electrospray mass spectrometry. Comp. Funct. Genom. 4, 376–391.

Kadoglou, N.P.E., Iliadis, F., Angelopoulou, N., Perrea, D., Ampatzidis, G., Liapis, C.D., and Alevizos, M. (2007). The anti-inflammatory effects of exercise training in patients with type 2 diabetes mellitus. European Journal of Cardiovascular Prevention & Rehabilitation 14, 837–843.

Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M., and Hirakawa, M. (2010). KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 38, D355–360.

Kanehisa, M., Goto, S., Sato, Y., Furumichi, M., and Tanabe, M. (2011). KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Research 40, D109–D114.

Kardia, S.R., Chu, J., and Sowers, M.F.R. (2006). Characterizing variation in sex steroid hormone pathway genes in women of 4 races/ethnicities: The study of women’s health across the nation (SWAN). The American Journal of Medicine 119, S3–S15.

Kell, D.B. (2006). Systems biology, metabolic modelling and metabolomics in drug discovery and development. Drug Discovery Today 11, 1085–1092.

Kelley, D.E., He, J., Menshikova, E.V., and Ritov, V.B. (2002). Dysfunction of Mitochondria in Human Skeletal Muscle in Type 2 Diabetes. Diabetes 51, 2944–2950.

Kemsley, E.K., and Tapp, H.S. (2009). OPLS filtered data can be obtained directly from non-orthogonalized PLS1. Journal of Chemometrics 23, 263–264.

Kim, H.-J., Kim, J.H., Noh, S., Hur, H.J., Sung, M.J., Hwang, J.-T., Park, J.H., Yang, H.J., Kim, M.-S., Kwon, D.Y., et al. (2011). Metabolomic Analysis of Livers and Serum from High-Fat Diet Induced Obese Mice. J. Proteome Res. 10, 722–731.

Kim, K., Aronov, P., Zakharkin, S.O., Anderson, D., Perroud, B., Thompson, I.M., and Weiss, R.H. (2009a). Urine metabolomics analysis for kidney cancer detection and biomarker discovery. Molecular & Cellular Proteomics 8, 558–570.

Kim, S.B., Wang, Z., and Duran, C.M. (2006). A Bayesian approach for the alignment of high-resolution NMR spectra. In Proceedings of the INFORMS Artificial Intelligence and Data Mining Workshop, Pittsburgh, PA, USA, pp. 1–6.

Kim, S.-H., Yang, S.-O., Kim, H.-S., Kim, Y., Park, T., and Choi, H.-K. (2009b). 1H-nuclear magnetic resonance spectroscopy-based metabolic assessment in a rat model of obesity induced by a high-fat diet. Anal Bioanal Chem 395, 1117–1124.

Kim, Y.S., Maruvada, P., and Milner, J.A. (2008). Metabolomics in biomarker discovery: future uses for cancer prevention. Future Oncology 4, 93–102.

Kind, T., Tolstikov, V., Fiehn, O., and Weiss, R.H. (2007). A comprehensive urinary metabolomic approach for identifying kidney cancer. Analytical Biochemistry 363, 185–195.

Kirschenlohr, H.L., Griffin, J.L., Clarke, S.C., Rhydwen, R., Grace, A.A., Schofield, P.M., Brindle, K.M., and Metcalfe, J.C. (2006). Proton NMR analysis of plasma is a weak predictor of coronary artery disease. Nat Med 12, 705–710.

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In International Joint Conference on Artificial Intelligence, pp. 1137–1145.

Koza, J.R., Streeter, M.J., and Keane, M.A. (2008). Routine high-return human-competitive automated problem-solving by means of genetic programming. Information Sciences 178, 4434–4452.

Kramer, M.A. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal 37, 233–243.

Krumsiek, J., Suhre, K., Illig, T., Adamski, J., and Theis, F.J. (2012). Bayesian Independent Component Analysis Recovers Pathway Signatures from Blood Metabolomics Data. Journal of Proteome Research 11, 4120–4131.

Laakso, M., Edelman, S.V., Brechtel, G., and Baron, A.D. (1992). Impaired Insulin-Mediated Skeletal Muscle Blood Flow in Patients With NIDDM. Diabetes 41, 1076–1083.

Lacy, P. (2011). Metabolomics of sepsis-induced acute lung injury: a new approach for biomarkers. American Journal of Physiology-Lung Cellular and Molecular Physiology 300, L1–L3.

Lafrance, D., Lands, L.C., and Burns, D.H. (2004). In vivo lactate measurement in human tissue by near-infrared diffuse reflectance spectroscopy. Vibrational Spectroscopy 36, 195–202.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Lanza, I.R., Zhang, S., Ward, L.E., Karakelides, H., Raftery, D., and Nair, K.S. (2010). Quantitative metabolomics by 1H-NMR and LC-MS/MS confirms altered metabolic pathways in diabetes. PloS One 5, e10538.

Lawton, K.A., Berger, A., Mitchell, M., Milgram, K.E., Evans, A.M., Guo, L., Hanson, R.W., Kalhan, S.C., Ryals, J.A., and Milburn, M.V. (2008). Analysis of the adult human plasma metabolome. Pharmacogenomics 9, 383–397.

Layman, D.K., and Walker, D.A. (2006). Potential importance of leucine in treatment of obesity and the metabolic syndrome. The Journal of Nutrition 136, 319S–323S.

Lee, C. (2000). Antioxidant ability of caffeine and its metabolites based on the study of oxygen radical absorbing capacity and inhibition of LDL peroxidation. Clin. Chim. Acta 295, 141–154.

Lee, J.M., Gianchandani, E.P., and Papin, J.A. (2006). Flux balance analysis in the era of metabolomics. Briefings in Bioinformatics 7, 140–150.

Leiter, E.H. (1993). Obesity genes and diabetes induction in the mouse. Critical Reviews in Food Science & Nutrition 33, 333–338.

Levin, B.E., Dunn-Meynell, A.A., Balkan, B., and Keesey, R.E. (1997). Selective breeding for diet-induced obesity and resistance in Sprague-Dawley rats. Am J Physiol Regul Integr Comp Physiol 273, R725–R730.

Levin, B.E., Triscari, J., Hogan, S., and Sullivan, A.C. (1987). Resistance to diet-induced obesity: food intake, pancreatic sympathetic tone, and insulin. Am J Physiol Regul Integr Comp Physiol 252, R471–R478.

Li, C., and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biology 2, research0032.1.

Li, L.O., Hu, Y.-F., Wang, L., Mitchell, M., Berger, A., and Coleman, R.A. (2010). Early Hepatic Insulin Resistance in Mice: A Metabolomics Analysis. Molecular Endocrinology 24, 657–666.

Li, W.W. (2000). Tumor angiogenesis: Molecular pathology, therapeutic targeting, and imaging. Academic Radiology 7, 800–811.

Li, Y., Shi, C.-X., Mossman, K.L., Rosenfeld, J., Boo, Y.C., and Schellhorn, H.E. (2008). Restoration of Vitamin C Synthesis in Transgenic Gulo −/− Mice by Helper-Dependent Adenovirus-Based Expression of Gulonolactone Oxidase. Human Gene Therapy 19, 1349–1358.

Lu, W., Bennett, B.D., and Rabinowitz, J.D. (2008). Analytical strategies for LC–MS-based targeted metabolomics. Journal of Chromatography B 871, 236–242.

Luetscher Jr, J.A. (1942). The metabolism of amino acids in diabetes mellitus. Journal of Clinical Investigation 21, 275.

Lutz, U., Bittner, N., Lutz, R.W., and Lutz, W.K. (2008). Metabolite profiling in human urine by LC-MS/MS: method optimization and application for glucuronides from dextromethorphan metabolism. Journal of Chromatography B 871, 349–356.

Madsen, R., Lundstedt, T., and Trygg, J. (2010). Chemometrics in metabolomics--A review in human disease diagnosis. Analytica Chimica Acta 659, 23–33.

Maeda, N., Hagihara, H., Nakata, Y., Hiller, S., Wilder, J., and Reddick, R. (2000). Aortic wall damage in mice unable to synthesize ascorbic acid. Proc. Natl. Acad. Sci. U.S.A 97, 841–846.

Mahadevan, S., Shah, S.L., Marrie, T.J., and Slupsky, C.M. (2008). Analysis of metabolomic data using support vector machines. Analytical Chemistry 80, 7562–7570.

Mai, P.L., Wentzensen, N., and Greene, M.H. (2011). Challenges Related to Developing Serum-Based Biomarkers for Early Ovarian Cancer Detection. Cancer Prev Res 4, 303–306.

Malet-Martino, M., and Holzgrabe, U. (2010). NMR techniques in biomedical and pharmaceutical analysis. J Pharm Biomed Anal.

Mamas, M., Dunn, W.B., Neyses, L., and Goodacre, R. (2010). The role of metabolites and metabolomics in clinically applicable biomarkers of disease. Archives of Toxicology 85, 5–17.

Marchesini, G., Bianchi, G.P., Vilstrup, H., Capelli, M., Zoli, M., and Pisi, E. (1991). Elimination of infused branched-chain amino-acids from plasma of patients with non-obese type 2 diabetes mellitus. Clinical Nutrition 10, 105–113.

Mashego, M.R., Rumbold, K., De Mey, M., Vandamme, E., Soetaert, W., and Heijnen, J.J. (2007). Microbial metabolomics: past, present and future methodologies. Biotechnology Letters 29, 1–16.

Mcateer, J.G.P., Skerrett, S.J., Liggitt, D., and Frevert, C.W. (2009). A Bayesian integration model of high-throughput proteomics and metabolomics data for improved early detection of microbial infections. In Pacific Symposium on Biocomputing 2009: Kohala Coast, Hawaii, USA, 5-9 January 2009, p. 451.

McElheny, V. (2012). Drawing the Map of Life: Inside the Human Genome Project (Basic Books).

McGovern, A.C., Broadhurst, D., Taylor, J., Kaderbhai, N., Winson, M.K., Small, D.A., Rowland, J.J., Kell, D.B., and Goodacre, R. (2002). Monitoring of complex industrial bioprocesses for metabolite concentrations using modern spectroscopies and machine learning: Application to gibberellic acid production. Biotechnology and Bioengineering 78, 527–538.

Miccheli, A., Marini, F., Capuani, G., Miccheli, A.T., Delfini, M., Cocco, M.E.D., Puccetti, C., Paci, M., Rizzo, M., and Spataro, A. (2009). The Influence of a Sports Drink on the Postexercise Metabolism of Elite Athletes as Investigated by NMR-Based Metabolomics. J Am Coll Nutr 28, 553–564.

Mooradian, A.D. (1987). Blood-brain barrier choline transport is reduced in diabetic rats. Diabetes 36, 1094–1097.

Morgan, C.R., and Lazarow, A. (1962). Immunoassay of insulin using a two-antibody system. Proc. Soc. Exp. Biol. Med 110, 29–32.

Næs, T., Isaksson, T., Fearn, T., and Davies, T. (2004). Interpreting PCR and PLS solutions. A User-Friendly Guide to Multivariate Calibration and Classification 1, 39–54.

Naidu, K.A. (2003). Vitamin C in human health and disease is still a mystery? An overview. Nutrition Journal 2, 7.

Natelson, S., Pincus, J.B., and Lugovoy, J.K. (1948). Response of citric acid levels to oral administration of glucose. I. Normal adults and children. Journal of Clinical Investigation 27, 446–449.

Newgard, C.B., An, J., Bain, J.R., Muehlbauer, M.J., Stevens, R.D., Lien, L.F., Haqq, A.M., Shah, S.H., Arlotto, M., and Slentz, C.A. (2009). A Branched-Chain Amino Acid-Related Metabolic Signature that Differentiates Obese and Lean Humans and Contributes to Insulin Resistance. Cell Metabolism 9, 311–326.

Nicholls, A.W. (2012). Realising the potential of metabolomics. Bioanalysis 4, 2195–2197.

Nicholls, A.W., Mortishire-Smith, R.J., and Nicholson, J.K. (2003). NMR Spectroscopic-Based Metabonomic Studies of Urinary Metabolite Variation in Acclimatizing Germ-Free Rats. Chem. Res. Toxicol. 16, 1395–1404.

Nicholson, J.K. (2006). Global systems biology, personalized medicine and molecular epidemiology. Molecular Systems Biology 2.

Nicholson, J.K., Foxall, P.J., Spraul, M., Farrant, R.D., and Lindon, J.C. (1995). 750 MHz 1H and 1H-13C NMR spectroscopy of human blood plasma. Anal. Chem 67, 793–811.

Nicholson, J.K., and Wilson, I.D. (1987). High resolution nuclear magnetic resonance spectroscopy of biological samples as an aid to drug development. Prog Drug Res 31, 427–479.

Nicholson, J.K., and Wilson, I.D. (1989). High resolution proton magnetic resonance spectroscopy of biological fluids. Progress in Nuclear Magnetic Resonance Spectroscopy 21, 449–501.

Nishikimi, M., Fukuyama, R., Minoshima, S., Shimizu, N., and Yagi, K. (1994). Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man. Journal of Biological Chemistry 269, 13685–13688.

Nishikimi, M., Kawai, T., and Yagi, K. (1992). Guinea pigs possess a highly mutated gene for L-gulono-gamma-lactone oxidase, the key enzyme for L-ascorbic acid biosynthesis missing in this species. Journal of Biological Chemistry 267, 21967–21972.

Nishikimi, M., and Yagi, K. (1991). Molecular basis for the deficiency in humans of gulonolactone oxidase, a key enzyme for ascorbic acid biosynthesis. Am. J. Clin. Nutr 54, 1203S–1208S.

Oksman-Caldentey, K.M., and Inzé, D. (2004). Plant cell factories in the post-genomic era: new ways to produce designer secondary metabolites. Trends in Plant Science 9, 433–440.

Oliver, S.G., Winson, M.K., Kell, D.B., and Baganz, F. (1998). Systematic functional analysis of the yeast genome. In European Symposium of Life Sciences Research in Space (Oser), p. 583.

Osten, D.W. (1988). Selection of optimal regression models via cross-validation. Journal of Chemometrics 2, 39–48.

Padh, H. (1990). Cellular functions of ascorbic acid. Biochem. Cell Biol 68, 1166–1173.

Park, S.-Y., Cho, Y.-R., Kim, H.-J., Higashimori, T., Danton, C., Lee, M.-K., Dey, A., Rothermel, B., Kim, Y.-B., Kalinowski, A., et al. (2005). Unraveling the temporal pattern of diet-induced insulin resistance in individual organs and cardiac dysfunction in C57BL/6 mice. Diabetes 54, 3530–3540.

Pauling, L., Robinson, A.B., Teranishi, R., and Cary, P. (1971). Quantitative Analysis of Urine Vapor and Breath by Gas-Liquid Partition Chromatography. PNAS 68, 2374–2376.

Pendyala, G., Want, E.J., Webb, W., Siuzdak, G., and Fox, H.S. (2007). Biomarkers for neuroAIDS: the widening scope of metabolomics. Journal of Neuroimmune Pharmacology 2, 72–80.

Penttila, I., and Pollanen, O. Effect of Insulin and Tolbutamide on Blood Citric Acid in Rabbits. Scandinavian Journal of Clinical & Laboratory Investigation, Informa Healthcare.

Pepe, M.S., Feng, Z., Janes, H., Bossuyt, P.M., and Potter, J.D. (2008). Pivotal Evaluation of the Accuracy of a Biomarker Used for Classification or Prediction: Standards for Study Design. JNCI J Natl Cancer Inst 100, 1432–1438.

Pigini, D., Cialdella, A.M., Faranda, P., and Tranfo, G. (2006). Comparison between external and internal standard calibration in the validation of an analytical method for 1-hydroxypyrene in human urine by high-performance liquid chromatography/tandem mass spectrometry. Rapid Communications in Mass Spectrometry 20, 1013–1018.

Piloquet, H., Ferchaud-Roucher, V., Duengler, F., Zair, Y., Maugere, P., and Krempf, M. (2003). Insulin effects on acetate metabolism. Am J Physiol Endocrinol Metab 285, E561–E565.

Pincus, J.B., Natelson, S., and Lugovoy, J.K. (1948). Response of citric acid levels to oral administration of glucose. II. Abnormalities observed in the diabetic and convulsive state. Journal of Clinical Investigation 27, 450–453.

Pohjanen, E., Thysell, E., Lindberg, J., Schuppe-Koistinen, I., Moritz, T., Jonsson, P., and Antti, H. (2006). Statistical multivariate metabolite profiling for aiding biomarker pattern detection and mechanistic interpretations in GC/MS based metabolomics. Metabolomics 2, 257–268.

Quinones, M.P., and Kaddurah-Daouk, R. (2009). Metabolomics tools for identifying biomarkers for neuropsychiatric diseases. Neurobiology of Disease 35, 165–176.

Raab, R.M., Bullen, J., Kelleher, J., Mantzoros, C., and Stephanopoulos, G. (2005). Regulation of mouse hepatic genes in response to diet induced obesity, insulin resistance and fasting induced weight reduction. Nutr Metab (Lond) 2, 15.

Raamsdonk, L.M., Teusink, B., Broadhurst, D., Zhang, N., Hayes, A., Walsh, M.C., Berden, J.A., Brindle, K.M., Kell, D.B., Rowland, J.J., et al. (2001). A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nature Biotechnology 19, 45–50.

Ransohoff, D.F. (2005). Bias as a threat to the validity of cancer molecular-marker research. Nature Reviews Cancer 5, 142–149.

Ransohoff, D.F., and Feinstein, A.R. (1978). Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. New England Journal of Medicine 299, 926–930.

Rao, A.R., Motiwala, H.G., and Karim, O.M.A. (2008). The discovery of prostate-specific antigen. BJU Int 101, 5–10.

Rao, R., Hao, C.-M., Redha, R., Wasserman, D., McGuinness, O., and Breyer, M. (2007). Glycogen synthase kinase 3 inhibition improves insulin-stimulated glucose metabolism but not hypertension in high-fat-fed C57BL/6J mice. Diabetologia 50, 452–460.

Robertson, D.G. (2005). Metabonomics in Toxicology: A Review. Toxicol. Sci. 85, 809–822.

Robertson, D.G., and Reily, M.D. (2012). The Current Status of Metabolomics in Drug Discovery and Development. Drug Development Research 73, 535–546.

Robertson, D.G., Watkins, P.B., and Reily, M.D. (2010). Metabolomics in Toxicology: Preclinical and Clinical Applications. Toxicological Sciences 120, S146–S170.

Rocconi, R.P., Matthews, K.S., Kemper, M.K., Hoskins, K.E., Huh, W.K., and Straughn, J.M., Jr (2009). The timing of normalization of CA-125 levels during primary chemotherapy is predictive of survival in patients with epithelial ovarian cancer. Gynecol. Oncol. 114, 242–245.

Roussel, R., Mentre, F., Bouchemal, N., Hadjadj, S., Lievre, M., Chatellier, G., Menard, J., Panhard, X., Le Henanff, A., Marre, M., et al. (2007). NMR-based prediction of cardiovascular risk in diabetes. Nat Med 13, 399–400.

Rubtsov, D.V., and Griffin, J.L. (2007). Time-domain Bayesian detection and estimation of noisy damped sinusoidal signals applied to NMR spectroscopy. Journal of Magnetic Resonance 188, 367–379.

Sakurai, T., Akiyama, K., and Saito, K. (2011). Analytical Platforms and Databases Ranging from Plant Transcriptomics to Metabolomics. Medicinal Plant Biotechnology 222.

Salek, R.M., Maguire, M.L., Bentley, E., Rubtsov, D.V., Hough, T., Cheeseman, M., Nunez, D., Sweatman, B.C., Haselden, J.N., Cox, R.D., et al. (2007). A metabolomic comparison of urinary changes in type 2 diabetes in mouse, rat, and human. Physiol. Genomics 29, 99–108.

Sauberlich, H.E. (1994). Pharmacology of vitamin C. Annu. Rev. Nutr 14, 371–391.

Scalbert, A., Brennan, L., Fiehn, O., Hankemeier, T., Kristal, B.S., Ommen, B., Pujos-Guillot, E., Verheij, E., Wishart, D., and Wopereis, S. (2009). Mass-spectrometry-based metabolomics: limitations and recommendations for future progress with particular focus on nutrition research. Metabolomics 5, 435–458.

Schemmel, R., Mickelsen, O., and Gill, J.L. (1970). Dietary Obesity in Rats: Body Weight and Body Fat Accretion in Seven Strains of Rats. J. Nutr. 100, 1041–1048.

Schicho, R., Nazyrova, A., Shaykhutdinov, R., Duggan, G., Vogel, H.J., and Storr, M. (0). Quantitative metabolomic profiling of serum and urine in DSS-induced ulcerative colitis of mice by 1H NMR spectroscopy. Journal of Proteome Research 0,.

Sébédio, J.-L., Pujos-Guillot, E., and Ferrara, M. (2009). Metabolomics in evaluation of glucose disorders. Current Opinion in Clinical Nutrition and Metabolic Care 12, 412–418.

Sellick, C.A., Knight, D., Croxford, A.S., Maqsood, A.R., Stephens, G.M., Goodacre, R., and Dickson, A.J. (2010). Evaluation of extraction processes for intracellular metabolite profiling of mammalian cells: matching extraction approaches to cell type and metabolite targets. Metabolomics 6, 427–438.

Sevanian, A., Davies, K.J., and Hochstein, P. (1991). Serum urate as an antioxidant for ascorbic acid. Am. J. Clin. Nutr 54, 1129S–1134S.

Shaykhutdinov, R., MacInnis, G., Dowlatabadi, R., Weljie, A., and Vogel, H. (2009). Quantitative analysis of metabolite concentrations in human urine samples using 13C1H NMR spectroscopy. Metabolomics 5, 307–317.

Shearer, J., Duggan, G., Weljie, A., Hittel, D.S., Wasserman, D.H., and Vogel, H.J. (2008). Metabolomic profiling of dietary-induced insulin resistance in the high fat-fed C57BL/6J mouse. Diabetes Obes Metab 10, 950–958.

Shearer, J., Fueger, P.T., Bracy, D.P., Wasserman, D.H., and Rottman, J.N. (2005). Partial gene deletion of heart-type fatty acid-binding protein limits the severity of dietary-induced insulin resistance. Diabetes 54, 3133–3139.

Shurubor, Y.I., Paolucci, U., Krasnikov, B.F., Matson, W.R., and Kristal, B.S. (2005). Analytical precision, biological variation, and mathematical normalization in high data density metabolomics. Metabolomics 1, 75–85.

Smilde, A.K., Jansen, J.J., Hoefsloot, H.C.J., Lamers, R.-J.A.N., Greef, J. van der, and Timmerman, M.E. (2005). ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. Bioinformatics 21, 3043–3048.

Sok, D.-E. (1998). Ascorbate-Induced Oxidative Inactivation of Zn2+-Glycerophosphocholine Cholinephosphodiesterase. Journal of Neurochemistry 70, 1167–1174.

Spratlin, J.L., Serkova, N.J., and Eckhardt, S.G. (2009). Clinical applications of metabolomics in oncology: a review. Clinical Cancer Research 15, 431–440.

Sreekumar, A., Poisson, L.M., Rajendiran, T.M., Khan, A.P., Cao, Q., Yu, J., Laxman, B., Mehra, R., Lonigro, R.J., Li, Y., et al. (2009). Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 457, 910–914.

Sreekumar, A., Poisson, L.M., Rajendiran, T.M., Khan, A.P., Cao, Q., Yu, J., Laxman, B., Mehra, R., Lonigro, R.J., Li, Y., et al. (2010). Re: Florian Jentzmik, Carsten Stephan, Kurt Miller, et al. Sarcosine in Urine after Digital Rectal Examination Fails as a Marker in Prostate Cancer Detection and Identification of Aggressive Tumours. Eur Urol 2010;58:12-8. European Urology 58, e29–e30.

Stamler, J., Vaccaro, O., Neaton, J.D., and Wentworth, D. (1993). Diabetes, other risk factors, and 12-yr cardiovascular mortality for men screened in the Multiple Risk Factor Intervention Trial. Diabetes Care 16, 434–444.

Stone, I. (1979). Homo sapiens ascorbicus, a biochemically corrected robust human mutant. Medical Hypotheses 5, 711–721.

Stringer, K.A., Serkova, N.J., Karnovsky, A., Guire, K., Paine, R., and Standiford, T.J. (2011). Metabolic consequences of sepsis-induced acute lung injury revealed by plasma 1H-nuclear magnetic resonance quantitative metabolomics and computational analysis. Am J Physiol Lung Cell Mol Physiol 300, L4–L11.

Svirbely, J.L., and Szent-Györgyi, A. (1932). The chemical nature of vitamin C. Biochemical Journal 26, 865.

Sysi-Aho, M., Katajamaa, M., Yetukuri, L., and Orešič, M. (2007). Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics 8, 93.

Tikunov, Y., Lommen, A., De Vos, C.H.R., Verhoeven, H.A., Bino, R.J., Hall, R.D., and Bovy, A.G. (2005). A novel approach for nontargeted data analysis for metabolomics. Large-scale profiling of tomato fruit volatiles. Plant Physiology 139, 1125–1137.

Tiziani, S., Lopes, V., and Günther, U.L. (2009). Early stage diagnosis of oral cancer using 1H NMR-based metabolomics. Neoplasia (New York, NY) 11, 269.

Todorow, J.T., and Dikow, A.L. (1960). Changes of blood citric acid level during glucose tolerance test in normals and diabetic and hyperthyreotic patients. Clinica Chimica Acta 5, 762–765.

Torgrip, R.J.O., Åberg, K.M., Alm, E., Schuppe-Koistinen, I., and Lindberg, J. (2008). A note on normalization of biofluid 1D 1H-NMR data. Metabolomics 4, 114–121.

Trewavas, A. (2006). A Brief History of Systems Biology “Every object that biology studies is a system of systems.” Francois Jacob (1974). Plant Cell 18, 2420–2430.

Truong, Y., Lin, X., and Beecher, C. (2004). Learning a complex metabolomic dataset using random forests and support vector machines. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 835–840.

Trygg, J., Holmes, E., and Lundstedt, T. (2007). Chemometrics in metabonomics. J. Proteome Res 6, 469–479.

Trygg, J., and Wold, S. (2002). Orthogonal projections to latent structures (O-PLS). Journal of Chemometrics 16, 119–128.

Umetrics (2006). Multi- and Megavariate Data Analysis, Part 1, Basic Principles and Applications (MKS Umetrics AB).

Vander Heiden, M.G., Cantley, L.C., and Thompson, C.B. (2009). Understanding the Warburg effect: the metabolic requirements of cell proliferation. Science 324, 1029.

Vial, R., Balart, L., Arroyave, G., and others (1974). Effect of exercise and physical fitness on serum lipids and lipoproteins. Atherosclerosis 20, 1–9.

Villas-Bôas, S.G., Koulman, A., and Lane, G.A. (2007). Analytical methods from the perspective of method standardization. In Molecular Genetics of Recombination, A. Aguilera, and R. Rothstein, eds. (Springer Berlin Heidelberg), pp. 487–513.

Vinzi, V.E. (2010). Handbook of Partial Least Squares: Concepts, Methods and Applications (Springer).

Vissers, M.C.M., and Wilkie, R.P. (2007). Ascorbate deficiency results in impaired neutrophil apoptosis and clearance and is associated with up-regulation of hypoxia-inducible factor 1alpha. J. Leukoc. Biol 81, 1236–1244.

Vitzthum, F., Behrens, F., Anderson, N.L., and Shaw, J.H. (2005). Proteomics: From Basic Research to Diagnostic Application. A Review of Requirements & Needs. J. Proteome Res. 4, 1086–1097.

Wang, T.J., Larson, M.G., Vasan, R.S., Cheng, S., Rhee, E.P., McCabe, E., Lewis, G.D., Fox, C.S., Jacques, P.F., Fernandez, C., et al. (2011). Metabolite profiles and the risk of developing diabetes. Nature Medicine 17, 448–453.

Warrack, B.M., Hnatyshyn, S., Ott, K.H., Reily, M.D., Sanders, M., Zhang, H., and Drexler, D.M. (2009). Normalization strategies for metabonomic analysis of urine samples. Journal of Chromatography B 877, 547–552.

Weckwerth, W., and Morgenthal, K. (2005). Metabolomics: from pattern recognition to biological interpretation. Drug Discovery Today 10, 1551–1558.

Weljie, A.M., Dowlatabadi, R., Miller, B.J., Vogel, H.J., and Jirik, F.R. (2007). An inflammatory arthritis-associated metabolite biomarker pattern revealed by 1H NMR spectroscopy. J. Proteome Res 6, 3456–3464.

Weljie, A.M., and Jirik, F.R. Hypoxia-induced metabolic shifts in cancer cells: Moving beyond the Warburg effect. The International Journal of Biochemistry & Cell Biology, In Press, Corrected Proof.

Weljie, A.M., Newton, J., Mercier, P., Carlson, E., and Slupsky, C.M. (2006). Targeted profiling: quantitative analysis of 1H NMR metabolomics data. Anal. Chem 78, 4430–4442.

West, D.B., Boozer, C.N., Moody, D.L., and Atkinson, R.L. (1992). Dietary obesity in nine inbred mouse strains. Am J Physiol Regul Integr Comp Physiol 262, R1025–R1032.

Westerhuis, J., Van Velzen, E., Hoefsloot, H., and Smilde, A. (2010). Multivariate paired data analysis: multilevel PLSDA versus OPLSDA. Metabolomics 6, 119–128.

Wiklund, S., Johansson, E., Sjostrom, L., Mellerowicz, E.J., Edlund, U., Shockcor, J.P., Gottfries, J., Moritz, T., and Trygg, J. (2007). Visualization of GC/TOF-MS-Based Metabolomics Data for Identification of Biochemically Interesting Compounds Using OPLS Class Models. Anal. Chem. 80, 115–122.

Wikoff, W.R., Anfora, A.T., Liu, J., Schultz, P.G., Lesley, S.A., Peters, E.C., and Siuzdak, G. (2009). Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites. Proceedings of the National Academy of Sciences 106, 3698.

Wilcken, B., Wiley, V., Hammond, J., and Carpenter, K. (2003). Screening newborns for inborn errors of metabolism by tandem mass spectrometry. New England Journal of Medicine 348, 2304–2312.

Winkler, B.S., Orselli, S.M., and Rex, T.S. (1994). The redox couple between glutathione and ascorbic acid: a chemical and physiological perspective. Free Radic. Biol. Med 17, 333–349.

Wishart, D.S., Jewison, T., Guo, A.C., Wilson, M., Knox, C., Liu, Y., Djoumbou, Y., Mandal, R., Aziat, F., Dong, E., et al. (2012). HMDB 3.0--The Human Metabolome Database in 2013. Nucleic Acids Research 41, D801–D807.

Wishart, D.S., Tzur, D., Knox, C., Eisner, R., Guo, A.C., Young, N., Cheng, D., Jewell, K., Arndt, D., Sawhney, S., et al. (2007). HMDB: the Human Metabolome Database. Nucleic Acids Res 35, D521–526.

Wold, S., Esbensen, K., and Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2, 37–52.

Wold, S., Trygg, J., Berglund, A., and Antti, H. (2001). Some recent developments in PLS modeling. Chemometrics and Intelligent Laboratory Systems 58, 131–150.

Wong, M.-C., Chung, J.W.Y., and Wong, T.K.S. (2007). Effects of treatments for symptoms of painful diabetic neuropathy: systematic review. BMJ 335, 87–87.

Wood, P. (2012). Lipidomics of Alzheimer’s disease: current status. Alzheimer’s Research & Therapy 4, 1–10.

Woodger, J.H. (1929). Biological principles: A critical study (Taylor & Francis).

Wu, G. (2009). Amino acids: metabolism, functions, and nutrition. Amino Acids 37, 1–17.

Wu, G., and Meininger, C.J. (2009). Nitric oxide and vascular insulin resistance. BioFactors 35, 21–27.

Wu, J.J., Roth, R.J., Anderson, E.J., Hong, E.-G., Lee, M.-K., Choi, C.S., Neufer, P.D., Shulman, G.I., Kim, J.K., and Bennett, A.M. (2006). Mice lacking MAP kinase phosphatase-1 have enhanced MAP kinase activity and resistance to diet-induced obesity. Cell Metabolism 4, 61–73.

Xu, Y., Zomer, S., and Brereton, R.G. (2006). Support vector machines: a recent method for classification in chemometrics. Critical Reviews in Analytical Chemistry 36, 177–188.

Xuan, J., Pan, G., Qiu, Y., Yang, L., Su, M., Liu, Y., Chen, J., Feng, G., Fang, Y., and Jia, W. (2011). Metabolomic profiling to identify potential serum biomarkers for schizophrenia and risperidone action. Journal of Proteome Research 10, 5433–5443.

Y, I., Y, O., and M, N. (2003). The Whole Structure of the Human Nonfunctional L-Gulono-gamma-Lactone Oxidase Gene-the Gene Responsible for Scurvy-and the Evolution of Repetitive Sequences Thereon. J Nutr Sci Vitaminol 49, 315–319.

Yamanouchi, K., Shinozaki, T., Chikada, K., Nishikawa, T., Ito, K., Shimizu, S., Ozawa, N., Suzuki, Y., Maeno, H., Kato, K., et al. (1995). Daily Walking Combined With Diet Therapy Is a Useful Means for Obese NIDDM Patients Not Only to Reduce Body Weight But Also to Improve Insulin Sensitivity. Dia Care 18, 775–778.

Yan, B., A, J., Wang, G., Lu, H., Huang, X., Liu, Y., Zha, W., Hao, H., Zhang, Y., Liu, L., et al. (2009). Metabolomic investigation into variation of endogenous metabolites in professional athletes subject to strength-endurance training. J Appl Physiol 106, 531–538.

Zamboni, N., and Sauer, U. (2009). Novel biological insights through metabolomics and 13C-flux analysis. Current Opinion in Microbiology 12, 553–558.

Zelena, E., Dunn, W.B., Broadhurst, D., Francis-McIntyre, S., Carroll, K.M., Begley, P., O’Hagan, S., Knowles, J.D., Halsall, A., Wilson, I.D., et al. (2009). Development of a Robust and Repeatable UPLC−MS Method for the Long-Term Metabolomic Study of Human Serum. Anal. Chem. 81, 1357–1364.

Zhang, S., Nagana Gowda, G.A., Asiago, V., Shanaiah, N., Barbas, C., and Raftery, D. (2008). Correlative and quantitative 1H NMR-based metabolomics reveals specific metabolic pathway disturbances in diabetic rats. Analytical Biochemistry 383, 76–84.

Zhao, X., Fritsche, J., Wang, J., Chen, J., Rittig, K., Schmitt-Kopplin, P., Fritsche, A., Häring, H.-U., Schleicher, E., Xu, G., et al. (2010). Metabonomic fingerprints of fasting plasma and spot urine reveal human pre-diabetic metabolic traits. Metabolomics 6, 362–374.

Zhu, C.S., Pinsky, P.F., Cramer, D.W., Ransohoff, D.F., Hartge, P., Pfeiffer, R.M., Urban, N., Mor, G., Bast, R.C., Moore, L.E., et al. (2011). A Framework for Evaluating Biomarkers for Early Detection: Validation of Biomarker Panels for Ovarian Cancer. Cancer Prev Res 4, 375–383.

Zivkovic, A.M., and German, J.B. (2009). Metabolomics for assessment of nutritional status. Curr Opin Clin Nutr Metab Care 12, 501–507.

