NON-GEL BASED APPROACHES of Expression Proteomics:
Multi-dimensional chromatography-based methods for profiling intact proteins have
emerged as in-line, solution-based alternatives to two-dimensional gel
electrophoresis (2DGE) for the separation and quantitation of proteins prior to
their identification by mass spectrometry.
One of the most promising approaches has been chromatofocusing followed by
nonporous reversed-phase HPLC (RP-HPLC):
Two-dimensional liquid chromatographic (2DLC) approaches to protein profiling
(e.g., the Beckman Coulter Proteome Lab PF2d System) are more amenable to
automation than 2DGE. In the first dimension, chromatofocusing focuses proteins
into distinct fractions based on each protein’s isoelectric point (pI).
Subsequently, each pI-focused fraction is further separated using nonporous silica
(NPS)-RP-HPLC, which separates proteins based on their surface hydrophobicity.
The NPS, as opposed to conventional porous chromatographic stationary phase
silica media, allows much faster separation and detection of proteins when coupled
to fast “scanning” ESI TOF-based mass spectrometers (Banks and Gulcicek,
1997).
A significant advantage of using RP-HPLC is that the mobile phase is volatile, which
facilitates using MS to obtain accurate intact-protein molecular-weight
information, which can be used to detect post-translational modifications often
missed in traditional peptide mass mapping techniques.
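As a minimal sketch of how an intact mass can flag a modification, the snippet below compares an observed intact mass with the mass predicted from the sequence and asks how many phosphate groups would explain the difference. The function name, the tolerance and the example masses are illustrative assumptions, not values from the text above; the only fixed constant is the average mass added by one phosphorylation (HPO3, about 79.98 Da).

```python
# Average mass added per phosphorylation (HPO3), in daltons.
PHOSPHO_SHIFT_DA = 79.98


def estimate_phospho_count(observed_da, predicted_da, tol_da=1.0, max_sites=10):
    """Return the number of phosphates that explains the mass difference.

    Hypothetical helper: returns None when no count from 0..max_sites
    fits the observed shift within tol_da.
    """
    delta = observed_da - predicted_da
    for n in range(max_sites + 1):
        if abs(delta - n * PHOSPHO_SHIFT_DA) <= tol_da:
            return n
    return None


# Made-up example: predicted 23,450.0 Da, observed 23,610.1 Da.
print(estimate_phospho_count(23610.1, 23450.0))
```

A real workflow would also consider other modifications and adducts with similar mass shifts; this sketch tests a single hypothesis only.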
Multidimensional Protein Identification Technology (MudPIT):
One of the most common protein-profiling techniques that identifies and sequences
peptides directly from complex proteolytic digests is termed multidimensional
protein identification technology (MudPIT). The technique uses liquid
chromatography columns coupled in tandem: generally, strong cation exchange
prefractionation is followed by reversed-phase HPLC separation and MS/MS
analysis. MudPIT analyzes the entire complex mixture of tryptically digested
proteins. A subset of peptides is eluted, or fractionated, from the cation-exchange
column using a step or a continuous gradient of increasing salt concentration. For
every salt step or gradient fraction, the peptides are loaded onto a reversed-phase
HPLC column for second-dimension separation and salt removal, and then enter
the mass spectrometer for MS/MS analysis. After the first RP gradient is
completed, the process may be repeated as many times as needed to match the
amount and complexity of the sample.
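MudPIT starts from a tryptic digest, so it can help to see what such a digest looks like in silico. The sketch below applies the commonly used trypsin rule (cleave C-terminal to K or R, but not when the next residue is P) to a made-up sequence. The function name and the example sequence are assumptions for illustration, and missed cleavages are ignored.

```python
def tryptic_digest(sequence, min_length=1):
    """Split a protein sequence into tryptic peptides.

    Rule used here: cleave after K or R unless the next residue is P.
    Missed cleavages are not modelled in this sketch.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


# Made-up sequence; the R followed by P is not cleaved.
print(tryptic_digest("MKWVTFRPLLKAA"))
```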
Stable Isotope Labeling and Quantitative Protein Profiling:
The isotope-coded affinity tag (ICAT) technique uses stable isotope labeling
together with tandem mass spectrometry to identify and quantify peptides. ICAT
analyses the subset of cysteine-containing peptides in a proteolyzed sample using
2-D chromatographic techniques similar to those described above, coupled with
tandem MS. ICAT is one of the most commonly used quantitative techniques in
proteomics.
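The quantitative step of an isotope-labeling experiment comes down to comparing the signal of the light- and heavy-labeled forms of the same peptide. The sketch below assumes that peak pairing has already been done and that intensities are available per peptide; the function name, the data layout and the numbers are hypothetical.

```python
import math


def icat_log2_ratios(pairs):
    """Map each peptide to log2(heavy / light).

    pairs: {peptide: (light_intensity, heavy_intensity)}.
    Peptides missing one partner (intensity <= 0) are skipped.
    """
    ratios = {}
    for peptide, (light, heavy) in pairs.items():
        if light <= 0 or heavy <= 0:
            continue
        ratios[peptide] = math.log2(heavy / light)
    return ratios


# Made-up intensities for two cysteine-containing peptides.
example = {"ACDEK": (100.0, 200.0), "CLMNR": (50.0, 0.0)}
print(icat_log2_ratios(example))
```

A log2 ratio of 1 means the peptide is twice as abundant in the heavy-labeled sample; a protein-level estimate would typically combine the ratios of several of its peptides.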
Post-Translational Modifications:
Phosphoprotein Profiling:
Reversible protein phosphorylation is probably the most important mechanism
used for intracellular signal transduction and is involved in regulating cell-cycle
progression, differentiation, transformation, development, peptide hormone
response, and adaptation. This post-translational modification is among the most
important and widespread. It is important that studies directed at identifying
phosphoproteins and carrying out comparative phosphoproteome profiling
incorporate either a phosphoprotein or a phosphopeptide enrichment step.
Enrichment techniques for phosphoproteins are somewhat limited. Traditional
nickel-affinity techniques can be used, but give suboptimal results because of
nonspecific binding of other acidic proteins; phospho-specific antibody-based
immunoprecipitation techniques are most helpful. Unfortunately, these enrichment
approaches are limited to Ser- and Thr-containing phosphopeptides. The
immobilized metal-affinity chromatography (IMAC)-based phosphopeptide
enrichment approach, on the other hand, is capable of enriching for serine-,
threonine-, and tyrosine-containing phosphopeptides, and has the capacity to
profile large numbers of phosphopeptides. The phosphoprotein isotope-coded
affinity tag (PhIAT) approach differentially labels phosphoserine and
phosphothreonine residues by carrying out β-elimination of the phosphate before
labeling with a 1,2-ethanedithiol containing either four alkyl hydrogens (EDT-D0)
or four alkyl deuteriums (EDT-D4), followed by biotinylation of the EDT-D0/D4
moiety to allow affinity purification. Another simple approach is to use trypsin
digestion in heavy/light water with IMAC enrichment followed by tandem MS analysis.
Generalized flowchart of important steps in the analysis of phosphorylation by mass
spectrometry (figure uploaded by David K Han; content may be subject to copyright)
Glycoprotein Profiling:
Protein glycosylation has been shown to play critical roles in cell recognition,
regulation, cancer, protein folding, Alzheimer’s disease, muscular dystrophies, and
immune responses. The traditional and most widely used analytical approaches in
glycobiology involve glycan analysis and lectin affinity chromatography. Due to both
glycan heterogeneity and complexity, many of the isolation and MS approaches
are slow, and are usually limited to purified proteins. 2DGE approaches have been
found to be useful for glycoprotein detection in complex samples.
Enrichment methods like hydrophilic interaction liquid chromatography (HILIC)
have also enabled the isolation and identification of larger numbers of N-linked
glycosylated proteins.
Schematic overview of systematic characterization of glycoproteins (Ruiz-May, 2012)
SELDI/MALDI-MS-BASED DISEASE BIOMARKER
The only protein-profiling technology currently in use that is directed entirely
at identifying the relative expression levels of a small number (e.g., 25 to 75) of
peptides/proteins that best differentiate control from experimental/disease
samples is SELDI/MALDI-MS-based disease biomarker profiling.
Surface-Enhanced Laser Desorption/Ionization Time-Of-Flight (SELDI-TOF) mass
spectrometry, a modification of matrix-assisted laser desorption/ionization mass
spectrometry (MALDI-TOF), combines the precision of mass spectrometry with the
high-throughput nature of protein arrays known as Protein Chips. Three major
components constitute SELDI-TOF: the Protein Chip arrays, the mass analyser, and
the data analysis software. Protein Chips are solid-phase ligand-binding assay
systems using immobilized proteins on surfaces such as glass, cellulose membranes,
mass spectrometer plates, microbeads, or micro/nanoparticles.
The proteins of interest are captured on the chromatographic surface by
adsorption, partition, electrostatic interaction or affinity chromatography,
depending on their properties, and analysed by TOF mass spectrometry. The result
is a mass spectrum comprising the mass-to-charge (m/z) values and intensities
of the bound proteins/peptides. Using these chromatographic surfaces, a
laser desorption (LD) time-of-flight (TOF) mass spectrometer can generate an
accurate protein profile of a biological sample from minimal amounts of
sample. The protein sample may be analysed directly or after proteolytic digestion
of the adsorbed material. Proteolysis of a protein after adsorption can be used to
identify binding sites such as antibody epitope sites. One of the unique strengths
of SELDI-TOF is its ability to analyse proteins from a variety of crude sample types,
with minimal sample consumption and processing. SELDI-TOF is very rapid and
can directly test native, undigested biological samples.
A schematic diagram showing the principle of SELDI-TOF MS to detect protein biomarkers. Urine samples
are incubated with protein microchips. Activation of protein samples by laser results in ionization. The
protein ions are accelerated in the presence of a high-voltage electrical field. The flight of protein ions is
separated by the mass/charge ratio. The TOF of protein ions is detected to generate an MS-TOF profile.
Using this approach, protein peaks with specific mass/charge ratios were used as non-invasive urinary
biomarkers to distinguish patients with active lupus nephritis from those in remission (part of the figure
reproduced from Mosley et al. with permission).
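The separation by flight time described above can be sketched numerically. For an idealized linear TOF analyser, an ion accelerated through a potential U gains kinetic energy zeU, so its flight time over a field-free tube of length L is t = L * sqrt(m / (2 z e U)), and m/z grows with the square of the flight time. The voltage and tube length below are made-up values, and real instruments are calibrated against standards rather than computed from first principles.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def flight_time_s(mz, voltage_v, length_m):
    """Flight time (s) of an ion with the given m/z (Da per charge)."""
    return length_m * math.sqrt(mz * DALTON_KG / (2 * ELEMENTARY_CHARGE_C * voltage_v))


def mz_from_flight_time(time_s, voltage_v, length_m):
    """Invert the relation above: m/z is proportional to t squared."""
    return 2 * ELEMENTARY_CHARGE_C * voltage_v * time_s ** 2 / (length_m ** 2 * DALTON_KG)


# Assumed instrument settings: 20 kV acceleration, 1 m flight tube.
for mz in (5000.0, 20000.0):
    t = flight_time_s(mz, 20000.0, 1.0)
    print(f"m/z {mz:.0f} -> {t * 1e6:.1f} microseconds")
```

Because the time scales with the square root of m/z, an ion four times heavier arrives only twice as late.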
PROTEIN MICROARRAYS
Review of planar antibody microarray technologies and their applications in the field of proteomics.
Images were adapted from Servier Medical Art by Servier
(http://www.servier.com/Powerpoint-image-bank) and modified by the authors under the
following terms: CREATIVE COMMONS
Attribution 3.0 Unported (CC BY 3.
The traditional method for quantitative proteome analysis combines protein
separation by high-resolution two-dimensional isoelectric focusing (IEF)/SDS-PAGE
(2DE) with mass spectrometric (MS) or tandem mass spectrometric (MS/MS)
identification of selected protein spots detected in the 2DE gels by use of specific
protein stains. Typically, amino acid sequence information is collected in a tandem
mass spectrometer and is correlated with protein sequence databases.
Fragment-ion series sequencing inward from both the N- and C-termini are
concurrently present in each spectrum. The technique generates sequence
information from many peptides from a protein and enables the redundant and
unambiguous identification of the protein from the database.
The complexity of protein analysis can vary greatly depending on the sample-
simplification steps chosen prior to MS analysis. In some cases, it may be a simple
protein identification step from relatively purified protein mixtures obtained from
gels or chromatographic fractions. In other cases, it may be that the proteins are
well known recombinant or purified proteins, but they contain post-translationally
modified isoforms that need to be characterized and quantified. In situations
involving more complex experiments, large-scale profiling and perhaps
quantitation of proteins from biological samples in different cellular states may be
required. In more recent applications that do not involve the identification of
proteins, examination of mass-spectral peptide patterns from biological fluids has
been used to uncover biomarkers of potential disease states. Regardless of the
complexity of the samples, as a first step, any small- or large-scale MS-based
proteomics effort will require a basic set of mass-spectral raw data analysis tools.
In addition, when used with MS, gel electrophoresis–based approaches will require
the use of quantitative image analysis software to discern distinct protein patterns
from the gel background as well as across many gels. Many of the gel-imaging
software programs will also have to incorporate some level of two-dimensional
“triangulation” component to aid in the precise excision of gel spots for further
mass spectrometric analysis.
2) Functional Proteomics/Interaction Proteomics: Advanced Methods for
Protein-Protein Interaction Mapping
Proteins carry out the most intricate functions in the cell, so analysing them is
the most direct approach to defining gene function. This branch of proteomics is
therefore the science of understanding how the genome works through the control
of gene expression.
It is characterized by the use of
1) high-throughput comprehensive analysis of protein complexes,
2) protein-protein interaction mapping and
3) analysis of the post-translational modifications that may be required to
perform the function.
Examples include co-precipitation studies with a bait protein followed by MS
analysis of the bound proteins, as well as a range of computational analyses of
the data.
The transcriptome, the cell’s global mRNA content, is context-dependent: it changes
with the conditions prevailing inside and outside the cell and with the tissue or organ
type. Although the amount of an mRNA often corresponds poorly to the amount of the
protein translated from it, the mRNA complement still reflects the proteome of a
particular cell.
The following techniques have been employed to establish the functional link from
gene to mRNA to protein to metabolite in a cell.
Non-Proteome Based Methods
1. Expression Profiling
2. Computational methods for detection of functional linkages
i. Phylogenetic profile
ii. Domain fusion method
iii. Gene-neighbour method
iv. Homology method
Protein Engineering Based Method
1. Yeast Two-hybrid screen
Expression Profiling
Quantitative analysis of the expression levels of novel genes under a variety of
physiological or developmental conditions helps to elucidate gene function when
these are compared with the expression patterns of functionally characterized genes.
1. Serial Analysis of Gene Expression (SAGE): Determines the complete set
of yeast genes expressed under a given set of conditions, or “transcriptome.”
This technique helps to identify genes that had not been predicted from
sequence information alone. mRNAs are fished out on affinity columns, and
correlating those mRNAs whose expression levels change helps to establish
functional linkages between the proteins they encode.
2. DNA microarrays: Hybridization-array technologies for high-throughput
analysis of the transcriptome.
Computational methods for detection of functional linkages
The advent of fully sequenced genomes has facilitated the development of
computational methods for establishing functional linkages between proteins.
i) Phylogenetic profile
A phylogenetic profile describes the pattern of presence or absence of a particular
protein across a set of organisms whose genomes have been sequenced. If two
proteins, A and B, have the same phylogenetic profile across all surveyed genomes,
they tend to be linked in a common function. Hence any two proteins having
identical or similar phylogenetic profiles are likely to be engaged in a common
pathway or complex.
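The profile comparison can be sketched in a few lines. In this illustration each genome is represented simply as a set of protein family names (a made-up simplification; real pipelines decide presence or absence from homology searches), a profile is a tuple of 1/0 flags, and two proteins are reported as linked when their profiles differ in at most `max_distance` genomes. All names and data are hypothetical.

```python
from itertools import combinations


def phylogenetic_profile(protein, genomes):
    """Tuple of 1/0 flags: is the protein present in each genome?"""
    return tuple(int(protein in genome) for genome in genomes)


def linked_pairs(proteins, genomes, max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    profiles = {p: phylogenetic_profile(p, genomes) for p in proteins}
    pairs = []
    for a, b in combinations(sorted(proteins), 2):
        distance = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if distance <= max_distance:
            pairs.append((a, b))
    return pairs


# Four made-up genomes; A and B always occur together, C does not.
genomes = [{"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C"}]
print(linked_pairs(["A", "B", "C"], genomes))
```

Profiles that are present in every genome carry no information, so a practical analysis would filter them out before pairing.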
ii) Domain fusion method:
Functional linkages between proteins can be detected by analysing fusion patterns
of protein domains. If proteins A and B, which are separate in one organism, are
expressed as a single fused protein in another organism, the domains A and B are
almost certainly linked in function. Thus a successful search through other genome
sequences for the corresponding fused protein is powerful evidence that A and B
are functionally linked. If A and B have unrelated sequences, this kind of
functional linkage could not be detected by a homology search.
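A minimal sketch of the fusion search, assuming that domain assignments are already available: each protein is represented as a set of domain identifiers, and two proteins of the query organism are reported as linked when some single protein in another organism contains domains from both. Protein names, domain identifiers and the data layout are all hypothetical.

```python
from itertools import combinations


def fusion_links(query_proteins, other_proteins):
    """Pairs of query proteins whose domains co-occur in one other protein.

    Both arguments map protein name -> set of domain identifiers.
    """
    links = set()
    for a, b in combinations(sorted(query_proteins), 2):
        doms_a, doms_b = query_proteins[a], query_proteins[b]
        if doms_a & doms_b:
            continue  # shared domains indicate homology, not a fusion signal
        for fused_doms in other_proteins.values():
            if doms_a & fused_doms and doms_b & fused_doms:
                links.add((a, b))
                break
    return links


# Made-up data: protA and protB are fused into one protein elsewhere.
query = {"protA": {"d1"}, "protB": {"d2"}, "protC": {"d3"}}
other = {"fusedAB": {"d1", "d2"}, "solo": {"d3"}}
print(fusion_links(query, other))
```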
iii) Gene-neighbour method:
If the genes that encode two proteins are neighbours on the chromosome in several
genomes, the proteins tend to be functionally linked. This method is most robust
and powerful for uncovering functional linkages in prokaryotes, where operons are
common, but it may work to some extent even for human genes, where operon-like
clusters such as the histone gene clusters are observed.
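The neighbour test can be illustrated as follows, under simplifying assumptions: each genome is given as a list of gene names in chromosomal order, chromosomes are treated as linear, strand is ignored, and two genes count as neighbours when their positions differ by at most `max_gap`. Gene names and thresholds are made up for the example.

```python
from collections import Counter
from itertools import combinations


def neighbour_counts(genomes, max_gap=1):
    """Count, per gene pair, the genomes in which the two genes are neighbours."""
    counts = Counter()
    for gene_order in genomes:
        position = {gene: i for i, gene in enumerate(gene_order)}
        for a, b in combinations(sorted(position), 2):
            if abs(position[a] - position[b]) <= max_gap:
                counts[(a, b)] += 1
    return counts


def conserved_neighbours(genomes, min_genomes=2, max_gap=1):
    """Gene pairs that are neighbours in at least min_genomes genomes."""
    counts = neighbour_counts(genomes, max_gap)
    return sorted(pair for pair, n in counts.items() if n >= min_genomes)


# Three made-up gene orders; geneA and geneB are adjacent in two of them.
genomes = [
    ["geneA", "geneB", "x", "y"],
    ["y", "geneB", "geneA", "z"],
    ["geneA", "q", "geneB"],
]
print(conserved_neighbours(genomes))
```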
iv) Homology method:
Once the function of an individual protein is known, one can search for other
proteins with related functions by seeking proteins whose amino-acid sequences
are similar to that of the original protein. This technique assigns a role or
function to an unknown protein simply by comparing its sequence with those of
known proteins. Since the arrangement of amino acids determines the formation of
alpha helices and beta sheets, followed by tertiary and subunit associations, this
method also makes it easier to predict the protein family to which a particular
protein belongs.
Protein Engineering Based Method
Yeast Two-hybrid screen
The yeast two-hybrid system exploits the finding that many transcription
activators are composed of two domains that are physically separable and remain
active provided they are in close proximity. The DNA-binding domain (BD) binds
specific DNA sequences located in the promoter region of genes regulated by the
transcription activator, whereas the transcription activation domain (AD) induces
transcription.
Thus, two-hybrid analysis works by separating the coding sequences for the
DNA-binding and activation domains of a transcriptional activator and cloning
them into separate vector molecules. The coding sequence of a candidate
protein whose partners are sought, known as the “bait,” is then fused with the
DNA-binding domain. A library of coding sequences for proteins that might
interact with the bait, called the “prey,” is made in fusion with the activation
domain. The plasmids carrying the DNA sequences coding for the bait and prey
proteins are generally introduced into the same yeast cell using a mating strategy.
If the two proteins physically interact, the DNA-binding and activation domains
are closely juxtaposed and the reconstituted transcriptional activator can
switch on a reporter gene that gives either a change in colour (lacZ on X-gal) or
growth on selective medium lacking a nutrient (his/leu).
The principle of the yeast two-hybrid system. Two
plasmids are constructed, the bait-encoding protein
X fused to the C-terminus of a transcription factor
DNA-binding domain (BD) and the prey-encoding
protein Y fused to an activation domain (AD).
Alternatively, the prey can consist of proteins
encoded by an expression library. Each plasmid is
introduced into an appropriate yeast strain either by
co-transformation, sequential transformation, or by
yeast mating. Only if proteins X and Y physically
interact with one another are the BD and AD brought
together to reconstitute a functionally active
transcription factor that binds to upstream specific
activation sequences (UAS) in the promoters of the
reporter genes, and to activate their expression.
The ultimate goal of proteomics is to comprehensively identify all proteins, their
associated biological activities, post-translational modifications, and protein-
protein interactions occurring in a given cell, and determine how this “proteome”
is altered in response to a modifier.
Bioinformatics/Protein Database Search
Emerging computer technologies and global database networking facilities
have led to an explosion of proteomic analysis reports. Protein identification
can be done by accessing protein databases such as
Swiss-Prot: The SWISS-PROT protein sequence data bank is composed of
sequence entries. Each entry corresponds to a single contiguous sequence as
contributed to the bank or reported in the literature. SWISS-PROT (Bairoch and
Apweiler, 1996) is an annotated protein sequence database established in 1986
and maintained collaboratively, since 1987, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library. It is a curated
protein sequence database, which strives to provide a high level of annotation
(such as the description of the function of a protein, its domain
structure, posttranslational modifications and variants), a minimal level of
redundancy, and a high level of integration with other databases. TrEMBL is a
computer-annotated supplement of SWISS-PROT that contains all the translations
of EMBL nucleotide sequence entries, which are not yet integrated in SWISS-
PROT. Currently, SWISS-PROT and TrEMBL have 0.5 and 7.6 million sequences,
respectively. These databases are freely available
at http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/.
SWISS-PROT contains the information about the name and origin of the protein,
protein attributes, general information, ontologies, sequence annotation, amino
acid sequence, bibliographic references, cross-references with sequence, structure
and interaction databases, and entry information.
ExPASy: The ExPASy (Expert Protein Analysis System) World Wide Web server
(http://www.expasy.org) is provided as a service to the life science community
by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It
provides access to a variety of databases and analytical tools dedicated to proteins
and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-
2DPAGE, PROSITE, ENZYME and the SWISS-MODEL repository. Analysis tools are
available for specific tasks relevant to proteomics, similarity searches, pattern and
profile searches, post-translational modification prediction, topology prediction,
primary, secondary and tertiary structure analysis and sequence alignment. These
databases and tools are tightly interlinked: a special emphasis is placed on
integration of database entries with related resources developed at the SIB and
elsewhere, and the proteomics tools have been designed to read the annotations
in SWISS-PROT in order to enhance their predictions. ExPASy started to operate
in 1993, as the first WWW server in the field of life sciences. In addition to the
main site in Switzerland, seven mirror sites in different continents currently serve
the user community.
UniProt: The Universal Protein Resource (UniProt) provides a stable,
comprehensive, freely accessible, central resource on protein sequences and
functional annotation. The UniProt Consortium is a collaboration between the
European Bioinformatics Institute (EBI), the Protein Information Resource (PIR)
and the Swiss Institute of Bioinformatics (SIB). The core activities include manual
curation of protein sequences assisted by computational analysis, sequence
archiving, development of a user-friendly UniProt website, and the provision of
additional value-added information through cross-references to other databases.
UniProt is comprised of four major components, each optimized for different uses:
the UniProt Knowledgebase, the UniProt Reference Clusters, the UniProt Archive
and the UniProt Metagenomic and Environmental Sequences database. UniProt is
updated and distributed every three weeks, and can be accessed online for
searches or download at http://www.uniprot.org.
BLAST (Basic Local Alignment Search Tool; a very fast search algorithm that is
used to separately search protein or DNA databases): The Basic Local Alignment
Search Tool (BLAST) finds regions of local similarity between sequences. The
program compares nucleotide or protein sequences to sequence databases and
calculates the statistical significance of matches. BLAST can be used to infer
functional and evolutionary relationships between sequences as well as help
identify members of gene families.
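To make "regions of local similarity" concrete, the sketch below scores the best local alignment between two sequences with the Smith-Waterman dynamic-programming recurrence. This is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic and adds statistical significance estimates, and protein searches use a substitution matrix rather than the simple match/mismatch scores assumed here. The scoring values and the example sequences are illustrative only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0, so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared core "ACGT" is found despite unrelated flanking residues.
print(local_alignment_score("XXACGTXX", "YYACGTYY"))
```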
(Link: https://www.yumpu.com/en/document/read/20871204/protein-motifs-and-proteomics-tools-galter-health-sciences-library)
MMDB: The Molecular Modeling DataBase (MMDB) is a database of experimentally determined
three-dimensional biomolecular structures, and is also referred to as the Entrez Structure
database. It is a subset of three-dimensional structures obtained from the RCSB Protein Data Bank
(PDB), excluding theoretical models.
FASTA: a text-based sequence format in which a descriptive header line is followed
by lines of nucleotide or amino acid sequence; the name also refers to a sequence
similarity search program.
PDB (protein databank; http://www.pdb.org/):
The Protein Data Bank (PDB) archive is the single worldwide repository of
information about the 3D structures of large biological molecules, including
proteins and nucleic acids. These are the molecules of life that are found in all
organisms including bacteria, yeast, plants, flies, other animals, and humans.
Understanding the shape of a molecule helps to deduce a structure's role in human
health and disease, and in drug development. The structures in the archive range from
tiny proteins and bits of DNA to complex molecular machines like the ribosome.
The PDB archive is available at no cost to users. The PDB archive is updated
weekly.
The PDB was established in 1971 at Brookhaven National Laboratory under the
leadership of Walter Hamilton and originally contained 7 structures. After
Hamilton's untimely death, Tom Koetzle began to lead the PDB in 1973, and then
Joel Sussman in 1994. Led by Helen M. Berman, the Research Collaboratory for
Structural Bioinformatics (RCSB) became responsible for the management of the
PDB in 1998. In 2003, the wwPDB was formed to maintain a single PDB archive
of macromolecular structural data that is freely and publicly available to the global
community. It consists of organizations that act as deposition, data processing
and distribution centres for PDB data.
and NCBI. These databases provide information on protein sequence, domain
structure and 3-D protein structures.
The National Center for Biotechnology Information (NCBI) develops and
maintains molecular and bibliographic databases as a part of the
National Library of Medicine (NLM). NCBI does not generate its own data, but
it does receive data submissions from researchers, develop software for
searching and analysing these data, and provide web access to them.
Several input and output data formats are used for accessing data from the
various databases, such as raw text format, FASTA format, GCG format, Rich Sequence
Format (RSF format), ABI trace file format, AceDB format, CODATA format, etc.
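Of these formats, FASTA is the simplest to handle programmatically: each record is a header line starting with ">" followed by one or more sequence lines. The parser below is a minimal sketch; the function name and the two example records are made up, and it assumes that headers are unique within the input.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text.

    Sequence lines belonging to one record are concatenated;
    blank lines are ignored.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Two made-up records, the first wrapped over two lines.
example = ">entry1 hypothetical protein\nMKTAYIAK\nQRQISFVK\n>entry2\nGGSLLK\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```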
This branch of proteomics is growing rapidly as new, advanced and sophisticated
software is developed for efficient analysis and data storage, along with
partially automated comparison modules that allow fast and efficient
comparison of protein structures or functions.
Networks of interacting, functionally linked proteins can be traced out by detecting
the functional linkages of all the proteins of an organism. Computational as well as
other methods can infer function fairly reliably. The homology method is widely
used to extend knowledge of protein function from one protein to its cousins,
which are presumably descended from the same common ancestral protein.
These powerful programs are used to extend experimental knowledge of protein
function to new sequences, culminating in the assignment of function to roughly
40–70% of new genome sequences by such homology techniques. The Protein Data
Bank (PDB) is now the sole repository for three-dimensional structure data of
biological macromolecules, and it uses advanced methods for annotation, archiving
and access. PDB also supports structural genomics through development and
maintenance of the Target Registration Database (TargetDB), organization of data
dictionaries that define specifications for the exchange and deposition of data with
structural genomics centres, and creation of software tools to capture data from
standard structure determination applications.
In most cases proteins do not act independently but form transient or stable
complexes with other proteins. These complexes can be intricate and of variable
composition, so it is essential to study protein complexes, along with the
conditions that result in their formation or dissociation, for a complete
understanding of a biological system.
Protein pathways are series of reactions inside the cell that exert a particular
biological effect. The proteins that are directly involved in the reactions, along
with those that regulate the pathways, are combined in pathway databases, and
a number of resources and databases are available for protein pathways.
KEGG, Ingenuity Pathway Knowledge Base, Reactome and BioCarta are some of
the pathway databases that include comprehensive data regarding metabolism,
signalling and interactions. In addition to these comprehensive databases,
specific databases for signal transduction pathways such as GenMAPP or PANTHER
have been developed. Moreover, databases such as NetPath have been developed,
which cover the pathways active in cancer and are helpful for identifying
proteins relevant to a cancer type. These public databases are highly
interconnected, which allows novel findings for proteins. Databases such as
BioGRID, IntAct, MINT and HPRD contain information on protein
interactions in complexes. STRING is not only a widely used database for protein
interaction data, but it also connects to various other resources for literature
mining. Furthermore, protein networks can be drawn from a list of genes and the
available interactions using the STRING database.
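Once a list of pairwise interactions has been exported from such a database, drawing a network amounts to building a graph. The sketch below assumes the interactions are available as simple pairs of protein names (real exports carry scores and identifiers that are ignored here), builds an undirected adjacency map, and collects all proteins reachable from a starting protein. Function names and example pairs are hypothetical.

```python
from collections import defaultdict, deque


def build_network(interactions):
    """Undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def connected_component(network, start):
    """All proteins reachable from start, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Made-up interaction pairs forming two separate complexes.
pairs = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
net = build_network(pairs)
print(sorted(connected_component(net, "P1")))
```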