CH24 09/26/2012 12:15:0 Page 215
24 Experimental Proteomics
Thierry Le Bihan
School of Biological Sciences, University of Edinburgh, UK
24.1 Basic ‘how-to-do’ and ‘why-do’ section
24.1.1 What is proteomics used for?
Proteomics can be defined as the analysis of:
1. how much of a given protein or proteins is expressed (under specific conditions);
2. how proteins interact with each other; and
3. the nature and the dynamics of their modification (post-translational
modification).
Although it is often considered that the proteome is to proteins what the genome
is to genes, several major divergences emerge, due to the differences both in their
nature and in their respective roles.
24.1.2 How do proteomics and genomics differ?
A major difference between genomics (see Primer 23) and proteomics is that there is
no protein equivalent to the polymerase chain reaction used for DNA amplification
(PCR; Primer 20). Moreover, cellular proteins have expression levels that vary over
a very large dynamic range (i.e. very low to very high levels), which is a significant
challenge to proteomic techniques. Essentially, the dynamic range of a mass
spectrometer must be able to cover at least four to five orders of magnitude in
protein abundance, knowing that, within a simple cell like yeast, this difference can
reach more than six orders of magnitude. At the extreme, this can mean attempting
Essential Guide to Reading Biomedical Papers: Recognising and Interpreting Best Practice, First Edition.
Edited by Phil Langton.
© 2013 by John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd.
to detect a single molecule (copy) of one protein in a cell that may contain over a
million copies of several other proteins and smaller quantities of many hundreds
of others.
This difference, between what can be technically achieved in terms of detection
and what is actually present within a sample, is often referred to in the literature as
being simply the ‘tip of the iceberg’. Having such a vast dynamic range to be
covered is probably one of the most important challenges to tackle in the proteomics
field, and it ultimately defines the chosen experimental design. In addition, proteins
are continuously being synthesized and degraded, which adds the important variable
of ‘time’ to the protein detection equation. This characteristic is not captured at the
mRNA level and explains why, in some cases, a poor correlation between mRNA
and protein level exists.
This last example clearly supports the notion that for protein quantitation, it is
better to use a direct approach (i.e. protein measurement) instead of an indirect
approach based on mRNA transcript measurement.
Although proteomics does have its limitations, it has some advantages over other
techniques as well. For example, it provides the ability to acquire, relatively quickly,
either a global or a specific snapshot of the protein composition within a sample.
This ‘snapshot’ can be obtained either with or without prior knowledge of cellular or
tissue-specific protein abundance.
To some extent, some of the above-mentioned challenges in proteomics have
been tackled by significant improvements in sample preparation and separation
techniques, in the mass spectrometers themselves, and in data analysis methods
and bioinformatics.
24.2 Important considerations
24.2.1 Sample preparation and separation techniques
A typical proteomic mass spectrometry-based analysis is roughly based on protein
or peptide separation, followed by analysis using a mass spectrometer with the
peptide sequence often being confirmed from an already available database of
known protein sequences.
In a typical proteomic approach (often referred to as ‘bottom-up’), the protein
samples are digested with a protease such as trypsin. The peptides extracted are then
analyzed using a mass spectrometer, where peptide masses are measured (in MS
mode) and selected peptides are then isolated and fragmented by collisional
activation in tandem MS (often described as MSMS or MS2).
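The digestion and mass measurement steps above can be sketched in a few lines. This is a minimal illustration, not the workflow of any particular search engine: it assumes the common simplification that trypsin cleaves after K or R unless the next residue is P, it ignores missed cleavages and modifications, and the function names and the test sequence are hypothetical. Residue masses are standard monoisotopic values rounded to five decimals.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of one proton, added per positive charge


def tryptic_digest(protein):
    """Split after K or R, except before P (simplified rule, no missed cleavages).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptide_mz(peptide, charge):
    """m/z observed in MS mode for a peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical sequence, used only to exercise the functions.
    for pep in tryptic_digest("AKPGRSTKGVFR"):
        print(pep, round(peptide_mass(pep), 4), round(peptide_mz(pep, 2), 4))
```

In a real analysis, these theoretical masses would be computed for every protein in a sequence database and compared with the measured values.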
The peptide sequence can be deduced from the mass/charge of its different
fragments. Although a bottom-up approach is appropriate for protein identification
in complex mixtures, not all the peptides will be identified in MS mode.
Furthermore, not all of the peptide fragments generated by ionization will be detected by
the mass spectrometer, due to some of the physico-chemical properties of the
peptide itself. Therefore, the incomplete protein coverage which is typical of a
bottom-up approach often leaves gaps in the information obtained for a given
protein (e.g. information may be missing regarding a protein’s post-translational
modification, splice isoform, specific site mutations). In addition, information about
the mature protein sequence or its potential degradation may not be captured. All of
these pieces of missing information can be important elements of a protein’s
biological function.
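To illustrate how a sequence can be read from fragment masses, the sketch below computes the singly charged b and y ions that collisional fragmentation typically produces along the peptide backbone. It is a simplified model under stated assumptions: only b and y ions, charge 1+, no neutral losses and no modifications. The function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1), the N-terminal fragment of i+1 residues;
    y_ions[i] is y(i+1), the C-terminal fragment of i+1 residues.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GVFR")
    for i, (b_mz, y_mz) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {b_mz:.4f}   y{i} = {y_mz:.4f}")
```

The mass difference between consecutive ions of the same series equals one residue mass, which is what allows the sequence to be read off a spectrum. Missing fragments leave gaps in that ladder.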
In most cases, bottom-up approaches fall into two main categories, both of which
are designed to reduce the complexity of the sample being analyzed:
1. Protein separation by 2-dimensional gel electrophoresis (2DE): Protein
mixtures are separated in two orthogonal dimensions. Typically, proteins
are first separated based on their net charge (their isoelectric point). Next they are
transferred to an SDS-PAGE gel (see Primer 15 for more detail), where they are
separated again, this time based on their molecular weight. Although this is
both a robust and valuable approach, it is being slowly excluded from the
‘mainstream’ proteomic field as other techniques become more popular.
Nevertheless, the 2DE approach is often used in cases where an organism’s
genome has not been fully sequenced (e.g. in the fields of agronomy and
environmental sciences). Following separation on the gel, the proteins are
visualized by the use of specific protein stains. The gel area containing a given
protein is then excised from the rest of the gel, and the gel spot is subjected to
proteolytic digestion and the peptides analysed by MS. Even though 2DE is a
well-established approach, it is characterized by a very limited dynamic range
and often poor transfer of hydrophobic membrane proteins from one
dimension to the other.
2. Multidimensional chromatography: Another approach, developed in order to
tackle the complexity challenge, is the multidimensional protein identification
technology (MudPIT). This approach is based on the protease digestion of the
entire protein extract. The resulting peptide mixture is separated first on a strong
cation-exchange column and, subsequently, each fraction is separated by
reversed-phase chromatography coupled to mass spectrometry. The MudPIT approach
is now considered to be a minimum standard for any shotgun proteomic study. In
studies where each sample is fractionated to generate between 3 and 24 fractions,
each of those fractions can take between 60 and 120 minutes on average to be
analyzed by LC-MS. Therefore, each sample in a given shotgun proteomic study may
require a complete day of LC-MS time to be analyzed. The ability of MudPIT to dig
deeper into a given proteome comes at the cost of the time invested in a given study.
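The instrument-time cost quoted for fractionated samples is simple arithmetic, sketched below using the ranges given in the text (3 to 24 fractions, 60 to 120 minutes per fraction). The function name is hypothetical, and the estimate ignores column washes, blanks and re-injections.

```python
def lcms_hours(n_fractions, minutes_per_fraction):
    """LC-MS acquisition time for one fractionated sample, in hours."""
    return n_fractions * minutes_per_fraction / 60


if __name__ == "__main__":
    # Lower and upper bounds from the ranges quoted in the text.
    print("shortest:", lcms_hours(3, 60), "h")
    print("longest: ", lcms_hours(24, 120), "h")
    # A mid-range design: 12 fractions at 120 minutes fills a full day.
    print("12 x 120 min:", lcms_hours(12, 120), "h")
```

Multiplying the per-sample figure by the number of conditions and replicates shows quickly why deep fractionation and high replicate numbers compete for the same instrument time.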
In addition to these main categories, there are specialist technical approaches that
reflect a desire to understand a variety of specific modifications that are made to
proteins and which have functional significance.
For example, post-translational modifications (PTMs) are ‘alterations’ of
specific amino acids within a protein sequence which change the properties of the
protein. Protein structure, activity, function and stability are all influenced by
various forms of PTMs. A few examples of post-translational modifications and
their effects on proteins are:
• Acetylation for protein stability and regulation.
• S-nitrosylation and phosphorylation for signal transduction.
• Ubiquitination for proteolysis and protein sorting.
• Disulphide bond formation for stability or redox sensing.
Due to their low abundance and, in some cases, their instability, detecting any
PTM using one of the previously mentioned methods mostly relies on chance and is
very inefficient. A more pragmatic approach consists of enriching for the PTM in a
sufficiently large quantity, which is only possible for a few PTMs.
The various methods for PTM enrichment can be divided into several classes.
They are:
1. chemical/physical affinity, such as immobilized metal affinity chromatography,
which is based on coordinate binding between a metal ion (Fe, Ga) and a
phosphate group on a peptide;
2. a depletion-based approach such as a CNBr column which, under specific
conditions, will capture free N-terminal peptides (acetylated N-terminal
peptides which do not bind to the column are found in the flow-through);
3. immunoaffinity purification techniques such as those that have been developed
for ubiquitin and phosphotyrosine, for example;
4. other ‘hybrid’ or ‘tandem’ methods which are a combination of the above-
mentioned methods.
A typical post-translational modification analysis could have the following
pattern: PTM-modified peptides are enriched using one of the methods described
above and then analysed by mass spectrometry. The experimental peptide masses
observed in MS mode are matched against theoretical peptide masses calculated with
and without the mass difference associated with a given PTM. For some PTMs (for
example, acetylation), the modification remains attached to the amino acid during
fragmentation: in MSMS mode, the peptide fragments in the presence of a collision
gas (collision-induced dissociation, CID) in a similar manner to its unmodified
counterpart, except that a mass shift is observed for the modified amino acid.
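The matching step described above can be sketched as a comparison of an observed peptide mass against the theoretical mass with and without a PTM mass shift. This is a minimal sketch under stated assumptions: at most one modification per peptide, a hypothetical 10 ppm tolerance, and only two example PTMs. The shifts used are the standard monoisotopic values for acetylation (+42.010565 Da) and phosphorylation (+79.966331 Da). The function names are hypothetical.

```python
# Monoisotopic mass shifts (Da) for two example modifications.
PTM_SHIFT = {
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_modifications(observed_mass, unmodified_mass, tol_ppm=10.0):
    """Return the candidate states whose theoretical mass fits the observation."""
    candidates = {"unmodified": unmodified_mass}
    for name, shift in PTM_SHIFT.items():
        candidates[name] = unmodified_mass + shift
    return [
        name
        for name, mass in candidates.items()
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    ]


if __name__ == "__main__":
    # Hypothetical peptide with an unmodified neutral mass of 1000.0 Da.
    print(match_modifications(1042.0106, 1000.0))
    print(match_modifications(1079.9663, 1000.0))
    print(match_modifications(1000.0, 1000.0))
```

A mass match in MS mode only suggests the modification. Locating it on a specific residue relies on the mass shift seen in the fragment ions in MSMS mode.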
Finally, in the case of peptide PTM enrichment, interpretation of the relative
comparisons between samples can be ambiguous, as one cannot distinguish
between variations in the levels of protein expression and the degree of PTM of
a given protein.
24.2.2 The mass spectrometer: different ionization modes and instrument design
Ionization modes To this point, we have described how to reduce sample
complexity (2DE, MudPIT) and how to ensure the sample is compatible with
the MS analysis (protease digestion). In this section, we will describe the two main
approaches used for protein/peptide ionization and briefly introduce the different
types of mass spectrometers. Proteins and peptides are non-volatile polar molecules
requiring a soft ionization method for their transfer into a gaseous phase. The two
main approaches used to achieve this are MALDI and ESI:
1. Matrix-assisted laser desorption ionization (MALDI). This involves mixing
the sample with a matrix which absorbs laser energy that is transferred to the
peptides. The laser heat simultaneously induces both the desorption of the matrix
and the transfer of singly, positively charged peptide ions into the gas phase.
Some of the known drawbacks of this approach are the generation of singly
charged ions, which are often difficult to fragment for sequencing purposes, and
variation of the signal intensity due to the sample preparation.
2. Electrospray ionization (ESI). In this approach, ions are generated from a
solution and are produced by applying a high voltage (2–6 kV) between the
end of the solution separation device (commonly an HPLC column) and the
inlet of the mass spectrometer. Under these conditions, an electrically charged
droplet is created, which results in the formation and desolvation of analyte-
solvent droplets. Under these ionization conditions, peptides often carry
multiple charges and there is some interdependence between ion intensity,
the ion concentration and the flow rate.
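Because ESI produces multiply charged ions, the same peptide or protein appears at several m/z values. The sketch below shows the relation m/z = (M + z × proton mass) / z and how the charge and neutral mass can be recovered from two adjacent charge-state peaks. It assumes positive-mode ions charged only by protonation, and the function names are hypothetical.

```python
PROTON = 1.00728  # mass of one proton (Da)


def mz_of(neutral_mass, charge):
    """m/z of a species of given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def deconvolute(mz_low, mz_high):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_low carries charge z + 1 and mz_high carries charge z, so
    z = (mz_low - PROTON) / (mz_high - mz_low).
    Returns (z, neutral_mass), where z is the charge of the mz_high peak.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)


if __name__ == "__main__":
    # Hypothetical 10 kDa species observed at charge states 10+ and 11+.
    low, high = mz_of(10000.0, 11), mz_of(10000.0, 10)
    print(round(low, 4), round(high, 4))
    print(deconvolute(low, high))
```

Singly charged MALDI ions correspond to the special case z = 1, where m/z is simply the neutral mass plus one proton.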
Mass spectrometers Specific types of proteomic applications are better suited to
specific types of mass spectrometers. The main characteristics that differ between
instrument designs are:
• mass accuracy;
• resolving power;
• sensitivity (limit of detection, LOD);
• sampling rate;
• dynamic range.
These parameters are summarized in Figure 24.1. For most proteomic
applications, hybrid or tandem mass spectrometers are used because
the exact peptide mass needs to be measured (in MS mode) and the peptide also
needs to be isolated and fragmented (in
MSMS mode).
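Three of the instrument characteristics listed above are usually reported as simple ratios. The sketch below uses the conventional definitions: mass accuracy as a relative error in parts per million, resolving power as m/z divided by the peak width (taken here as full width at half maximum, an assumption), and dynamic range in orders of magnitude. The function names and the example numbers are hypothetical.

```python
import math


def ppm_error(measured_mz, theoretical_mz):
    """Mass accuracy as a relative error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz, peak_width):
    """Resolving power m / delta-m, with delta-m taken as the peak width (FWHM)."""
    return mz / peak_width


def dynamic_range_orders(highest_signal, lowest_signal):
    """Dynamic range expressed in orders of magnitude (log10 of the ratio)."""
    return math.log10(highest_signal / lowest_signal)


if __name__ == "__main__":
    print(round(ppm_error(1000.002, 1000.0), 2), "ppm")
    print(round(resolving_power(500.0, 0.01)), "resolving power")
    print(dynamic_range_orders(1e6, 1.0), "orders of magnitude")
```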
Figure 24.1 Visual representation of the different major characteristics that distinguish the different mass spectrometers. (a): mass accuracy. (b): resolving power. (c and d): sensitivity and limit of detection (LOD). (e): dynamic range. Courtesy of Dr. Thierry Lebihan. A full colour version of this figure appears in the colour plate section.
In general, MS instruments fall into three broad types:
1. Time of flight (TOF). These instruments are based on the principle that, for
a given charge state, the time ions will take to reach the detector (under
a potential) depends on their mass: the flight time increases with the square
root of the mass-to-charge ratio (i.e. low
mass ions will reach the detector before high mass ions). TOF instruments
are often used in a hybrid configuration with a quadrupole (Q-Q-TOF), as
well as in tandem (TOF-TOF), or even in a triple-TOF configuration. These
instruments have a good mass accuracy and good quantitation capability.
Furthermore, they are often used in discovery proteomics (Q-TOF), although
the triple TOF from AB-Sciex, according to the vendor’s claim, has been
designed as a single platform instrument for both discovery and targeted
quantitative proteomics.
2. Quadrupole-based instruments. A quadrupole is a structure composed of
four parallel rods, where a radio frequency (RF) quadrupole field is
generated which stabilizes the path of an ion having a given m/z ratio.
This RF field can be adjusted incrementally, allowing the analysis of a wider
m/z range. As for the TOF-based instrument, some of the quadrupole-based
instruments are found in a combination of three quadrupoles, where the first
and third ‘quads’ are mass filters and the second quadrupole is a collision
chamber (ion fragmentation). These instruments have a lower mass accuracy
and a lower resolving power than a TOF. Depending on how they are used,
they can have a high dynamic range, and they are often used in targeted
proteomics.
3. Ion traps. There are several types of ion traps. They are:
• The three-dimensional quadrupole ion trap, which is based on the same
principle as the quadrupole mass spectrometer described above. In this case,
however, the ions are instead trapped and consecutively ejected.
• The linear ion trap (LIT) differs slightly from the 3-D quadrupole ion trap
because it uses a two-dimensional instead of a three-dimensional quadrupole
field. This allows the trapping of a higher number of ions (and increases
the dynamic range of the instrument).
• The Orbitrap, in which ions are trapped in an electrostatic field and rotate
around a central electrode. Ions are characterized by two different
oscillations; one is around the electrode, while the other is a back-and-forth
movement along the electrode. The latter generates an image current whose
frequency depends on the mass-to-charge ratio. As a result, Orbitrap instruments
have high mass accuracy and sensitivity, as well as a better dynamic range,
compared to the two previous ion traps.
• Fourier transform ion cyclotron resonance instruments share some common
features with the Orbitrap. In this case, the ions are trapped and
oscillate within a magnetic field instead. Although they are more demanding
instruments to operate, they have high mass accuracy and sensitivity as well as a good
dynamic range.
Figure 24.2 illustrates the strength of each type of mass spectrometer. None are
perfect.
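The time-of-flight principle can be made concrete with the idealised relation for a linear TOF analyser: an ion accelerated through a potential U gains kinetic energy zeU, so its flight time over a drift length d is t = d × sqrt(m / (2zeU)). The sketch below assumes a hypothetical 1 m drift tube and 20 kV accelerating voltage, and it ignores reflectrons, initial energy spread and delayed extraction.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms


def flight_time(mz, drift_length_m=1.0, accel_voltage=20000.0):
    """Flight time in seconds for an ion of given m/z (Da per elementary charge).

    Idealised linear TOF: t = d * sqrt(m / (2 * z * e * U)).
    The default drift length and voltage are illustrative assumptions.
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE  # kg per coulomb
    return drift_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage))


if __name__ == "__main__":
    for mz in (500.0, 1000.0, 2000.0, 4000.0):
        print(f"m/z {mz:7.1f}: {flight_time(mz) * 1e6:6.2f} microseconds")
```

Quadrupling m/z doubles the flight time, so light ions arrive first and the arrival-time axis can be converted into an m/z axis after calibration.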
24.2.3 Issues of quantitation
Proteomics has moved from global analysis of organisms to the possibility of
providing more information about proteins – for example, protein abundance,
either in a relative manner (i.e. comparing control vs. disease state or drug
treatment) or absolute quantitation. However, it is quite surprising to see
quantitative measurements still being reported without any measure of confidence
in the measurements (either a standard deviation or a p value associated with the
group comparisons). Quantitative proteomics often involves a sample preparation
component as well as a bioinformatics one. In this section, we will concentrate on
sample preparation.
Relative quantitation can be divided into three main groups:
1. In vivo metabolic labelling.
2. In vitro labelling.
3. Label-free.
Figure 24.2 Main types of mass spectrometers and their characteristics. Courtesy of Dr. Thierry Lebihan. A full colour version of this figure appears in the colour plate section.
Labelling refers to the use of a reagent composed of light isotopes (based on 12C
and 14N) in combination with a version containing stable heavy isotopes (commonly
based on 13C and 15N). Peptides from one sample in an experiment are labelled with
the light isotope, and peptides from another sample are labelled with the heavy
isotope. The two labelled peptide samples are then mixed together; they are similar
enough to behave alike throughout the overall procedure, whereas, for a given
peptide, both forms (light and heavy) will be separated at the mass detection level.
The peak intensity ratio provides information about the relative abundance of the
corresponding protein.
In vivo labelling strategies These can be performed by supplying cells with
labelled amino acids (SILAC) or, in the case of autotrophic organisms, using 15N
through nitrate or ammonium salt or 13C (glucose, acetate and even CO2). Stable
Isotope Labelling by Amino acids in Cell culture (SILAC) has become a gold
standard in the field of quantitative proteomics. Several amino acids can be used, but
a ‘classical’ SILAC experiment is often based on a medium containing 13C6-arginine
and 13C6-lysine. In this way, all tryptic peptides should have at least one labelled
amino acid.
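For a 13C6 label, the light and heavy forms of a peptide differ by six times the 13C to 12C mass difference, about 6.0201 Da per labelled residue, and the observed spacing in m/z is that shift divided by the charge. The sketch below computes the expected heavy partner of a light peak and the log2 intensity ratio. The function names and intensities are hypothetical, and one labelled residue per peptide is assumed by default.

```python
import math

# Mass difference between 13C and 12C, times six carbons (Da).
LABEL_SHIFT_13C6 = 6 * (13.0033548 - 12.0)


def heavy_mz(light_mz, charge, n_labels=1):
    """Expected m/z of the heavy partner of a light peptide peak."""
    return light_mz + n_labels * LABEL_SHIFT_13C6 / charge


def log2_ratio(heavy_intensity, light_intensity):
    """log2 of the heavy/light peak intensity ratio (0 means no change)."""
    return math.log2(heavy_intensity / light_intensity)


if __name__ == "__main__":
    # Hypothetical doubly charged light peptide at m/z 500.0.
    print(round(heavy_mz(500.0, 2), 4))
    # Hypothetical intensities: heavy twice as intense as light.
    print(log2_ratio(2.0e6, 1.0e6))
```

Working on the log2 scale makes a doubling and a halving symmetric around zero, which simplifies later statistics.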
In vitro labelling strategies These are quite numerous in the field of mass
spectrometry-based proteomics. One of the very first approaches developed was the
Isotope-Coded Affinity Tag (ICAT). This method is based on the modification of
Cys-containing peptides with reagents of a different isotopic composition that yield
a pair of ions 8 Da apart, followed by affinity enrichment of the labelled peptides.
Several other in vitro labelling approaches have been developed, mostly targeting
primary amine modification (i.e. the peptide N-terminus and the lysine side-chain).
Several issues with in vitro labelling have been identified, such as possible side
reactions with amino acids other than the ones being targeted or incomplete
reactions. Another main drawback of these labelling strategies is that they suffer
from an increased sample complexity after the different samples have been mixed
together (a reminder that complexity is the main challenge in the proteomic field). It
is also difficult in some cases, if not impossible, to compare more than 2–8 samples
at the same time. This imposes some constraints in terms of experimental design
using these labelling strategies.
Label-free approaches A label-free differential approach compares LC-MS datasets
based on relative peptide peak intensities, or by comparing the number of
spectra acquired. A label-free quantitation strategy has fewer limitations in terms of
the number of runs to compare. Using this method, either a few runs or up to one
hundred runs can be technically compared.
Although label-free quantitation is a robust quantitative method, its performance
depends on temporal LC-MS alignment of the different runs, which can be
challenging. Since none of the samples being analyzed are encoded by isotopic
labelling or mixed at any level, label-free quantitation is also strongly dependent on
the reproducibility of the overall platform from the sample preparation to the
LC-MS analysis. As more and more label-free platforms are being made available
and are priced in order to make them accessible to standard proteomics labs, this
approach will undoubtedly increase in popularity. A comparison of the three main
quantitative proteomics strategies, and of the point at which samples can be
combined, is illustrated in Figure 24.3.
Figure 24.3 Typical quantitative mass spectrometry workflows. The left part is a generic process presented for a bottom-up proteomics analysis. Mixed red and blue lines indicate when the two samples are normally combined together. The longer the process is run in parallel prior to mixing the samples, the more technical variations are introduced, which can affect both samples independently. Courtesy of Dr. Thierry Lebihan. A full colour version of this figure appears in the colour plate section.
24.3 Required controls
24.3.1 Control definition
Depending on the type of experiment (immunoprecipitation or global proteomic
survey), as well as the number of replicates and fractions per sample replicate, the
type of control and the experimental design can vary significantly. For an
immunoprecipitation experiment, where antibodies are attached to beads and protein
complexes are captured, it is important to use a well-defined control (see Primers 13
and 14). The control can be, for example, a non-specific antibody used
with the same type of samples as used for the experiments. If the effect of a
perturbation is studied, then performing the immunoprecipitation enrichment both
before and after the perturbation could highlight significant differences. As
the performance of the mass spectrometer will vary with time, and some
experiments may require several days of mass spectrometry time, it can be
important to randomize the order in which the samples are run.
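Randomizing the run order is straightforward to script, which also leaves a record of the order used. The sketch below is one simple way to do it: a plain shuffle with a recorded seed so that the order can be reproduced. The sample names are hypothetical, and more elaborate block designs are possible.

```python
import random


def randomized_run_order(samples, seed=None):
    """Return the samples in a random acquisition order.

    Passing a seed makes the order reproducible, so it can be reported
    alongside the results.
    """
    order = list(samples)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    # Hypothetical design: three replicates each of control and treated.
    samples = [f"{group}_{rep}" for group in ("control", "treated") for rep in (1, 2, 3)]
    for position, name in enumerate(randomized_run_order(samples, seed=2012), start=1):
        print(position, name)
```

Running all controls first and all treated samples last would confound any drift in instrument performance with the biological effect, which is what randomization guards against.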
24.3.2 Sample normalization
Normalization has to happen at least at the following two steps:
1. Protein assay: to ensure that the same amount of sample is analyzed by LC-MS.
2. At the LC-MS trace: each sample signal output has to be normalized for proper
quantitation.
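As an illustration of the second step, the sketch below applies median normalization to LC-MS intensities: each run is rescaled so that its median intensity matches a common target. This is only one of several possible schemes, chosen here for simplicity. It assumes that most peptides do not change between samples, and the run names and values are hypothetical.

```python
from statistics import median


def median_normalize(runs):
    """Scale each run so that all runs share the same median intensity.

    `runs` maps a run name to a list of peptide intensities. The common
    target is the median of the per-run medians.
    """
    medians = {name: median(values) for name, values in runs.items()}
    target = median(medians.values())
    return {
        name: [value * target / medians[name] for value in values]
        for name, values in runs.items()
    }


if __name__ == "__main__":
    # Hypothetical runs: 'treated' was loaded at twice the amount.
    runs = {
        "control": [100.0, 200.0, 300.0],
        "treated": [200.0, 400.0, 600.0],
    }
    for name, values in median_normalize(runs).items():
        print(name, values)
```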
In the next section, I discuss in more detail the importance of replicates and their
nature.
24.4 Common problem or errors in literature and pitfalls in execution or interpretation
New and cutting-edge techniques in mass spectrometry are often limited to the
inventor’s research laboratories (and their close collaborators). Moreover, the newly
introduced methods are often verified using rather simple models (e.g. BSA digests,
or casein digests as a model for phospho-protein enrichment or analysis). In the best
cases, the results obtained along these lines of experimentation should be considered
simply as interesting proofs of principle. Their performance cannot safely be
extrapolated to more realistic samples; it needs to be proven experimentally on
relevant samples.
Even with the current improvements in instrumentation, recent MS-based
techniques can only cover a fraction of a given proteome. Any small increase in
proteome coverage is often accomplished by a concomitant increase in the amount
of sample fractionation, which is a time-consuming operation. In these
circumstances, the choice of digging deeper into a given proteome is often made at the
expense of acquiring a higher number of replicates. Ultimately, this has an impact on
the conclusions drawn from the observed data.
How to define the number of replicates, as well as the type of replicates (technical
versus biological), is important (see Primers 2 and 3). A technical replicate consists
of repeating the analysis of the same sample several times. Such an approach
allows for the evaluation of the variability inherent to the technique being applied.
While this information about the technique used may be useful to extract,
ideally no biological interpretation should be drawn in these
circumstances.
The definition of biological replicates in the proteomic context has not been
adequately addressed as of yet, and is currently broadly defined. For example, a
different flask of the same culture can be considered a biological replicate.
Furthermore, different mice of the same genetic background are also considered
to be biological replicates. It is essential to define the question needing to be
answered by the proteomic study before deciding upon the experimental design.
Only then can an appropriate experimental design and adequate sampling be
performed in order to allow for the correct conclusions to be drawn (see Primers 2
and 3).
A high similarity between the different ‘biological replicates’ (i.e. simply
different flasks of the same cells cultured in exactly the same way, or the same mouse
genotype) will probably allow for the observation and reporting of nice,
well-defined trends. However, the conclusions drawn should ideally be limited to the
observed specific culture conditions. Using two different mouse genotypes also
generates a wide range of different background proteins which, under certain
circumstances, render data analysis almost impossible.
In the case of the number of replicates to consider in a study, triplicates are often
arbitrarily considered ‘safe’. However, a more robust approach, such as the use of a
power analysis, which allows for the evaluation of an adequate sample size, should
be considered. This step can be important, as the sample size may be either too low
or potentially too high (rarely the case in the proteomic field). If too low, the
experiment will ultimately lack the precision to give reliable answers in relation to
the questions. On the other hand, if the sample size is too large, then resources and
time will be unnecessarily lost, with little information gained. A power analysis can
also be an extremely useful tool for determining ratio cut-offs between two groups
being compared, instead of using the often employed and arbitrarily defined ratio
cut-off of 1.5–2. In time, some standardization rules, not defined arbitrarily, will
emerge in the field of quantitative proteomics. For example, proper power analysis
should be mandatory in order to justify the chosen experimental design.
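A basic power analysis for a two-group comparison can be sketched with the normal approximation n = 2 × ((z(1 − α/2) + z(power)) × σ / δ)² replicates per group, where δ is the smallest difference worth detecting and σ the standard deviation of the measurement (both, for example, on a log2 scale). This is a simplified sketch: it assumes equal variance and group sizes, it slightly underestimates the number needed for a t-test with few replicates, and it ignores the multiple-testing correction a proteome-wide study requires. The function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def replicates_per_group(delta, sd, alpha=0.05, power=0.8):
    """Replicates per group needed to detect a difference `delta`.

    Two-sided, two-sample comparison using the normal approximation;
    `delta` and `sd` must be expressed on the same scale.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical measurement noise: sd of 0.5 on the log2 scale.
    for fold_change in (1.5, 2.0, 4.0):
        n = replicates_per_group(math.log2(fold_change), 0.5)
        print(f"fold change {fold_change}: {n} replicates per group")
```

The same relation can be read in the other direction: for a fixed number of replicates and a measured variability, it gives the smallest ratio that can be detected reliably, which is a more defensible cut-off than an arbitrary 1.5 or 2.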
The reader would perhaps have appreciated a clear and definitive description of
which methods should be used for mass spectrometry-based proteomic studies.
However, different proteomic platforms provide solutions to different proteomic
questions. Several major challenges still exist, while new ones are continuously
emerging. Therefore, the overall state of the field of proteomics is far from being
stagnant. In this primer, we haven’t proposed a single solution; however, we
have raised some key points to consider when reading or writing a proteomics
research paper.
With time, a proteomic experiment will become easier to perform and more
reproducible. Digging down into a proteome will hopefully become attainable, with
a greater ease, and thereby allow researchers to explore other dimensions of
biological significance, such as time series, several perturbations and a higher
number of biological replicates. Under those circumstances, the field of proteomics
will be able to realize its full potential and deliver on its tremendous promise.
24.5 Complementary techniques
A transcriptomic study offers several advantages over a proteomics one, as it is
slightly easier to perform; often, a more complex experimental design and a higher
number of replicates can be used. However, as mentioned earlier, there are some time
differences between the levels of proteins and their corresponding mRNAs. The
transcriptomic approach is often done with prior knowledge, whereas a proteomics
approach can be used as a non-hypothesis-driven approach.
Protein array, based on antibody capture, is a technology which is developing fast
and will soon be a serious competitor to the classical proteomic approaches; the
expression and purification of diverse populations of proteins will get easier, and
there will be a larger bank of antibodies available.
Primer 13: Immunocytochemistry
Primer 14: Immunoprecipitation
Primer 15: Immunoblotting
Primer 19: Viral vector transgenesis
Primer 26: Genetically modified models
Acknowledgements
SynthSys is a Centre for Integrative Systems Biology (CISB) funded by BBSRC and
EPSRC, reference BB/D019621/1.
Further reading and resources
Several good reviews on the topic exist, as well as good web links.
Domon, B. & Aebersold, R. (2006). Mass spectrometry and protein analysis. Science 312,
214–217.
Elliott, M.H., Smith, D.S., Parker, C.E. & Borchers, C. (2009). Current trends in quantitative
proteomics. Journal of Mass Spectrometry 44(12), 1637–1660.
A tutorial on the internet: http://www.i-mass.com/guide/tutorial.html
Some tools for proteomics analysis: http://expasy.org/proteomics