to fast “scanning” ESI TOF · 2020-06-01 · protein ions are accelerated in the presence of a...

NON-GEL BASED APPROACHES of Expression Proteomics:

Multi-dimensional chromatography-based approaches to profiling intact proteins have emerged as in-line, solution-based alternatives to two-dimensional gel electrophoresis for separating and quantitating proteins prior to their identification by mass spectrometry.

One of the most promising approaches has been chromatofocusing followed by nonporous reversed-phase HPLC (RP-HPLC):

Two-dimensional liquid chromatographic (2DLC) approaches to protein profiling (e.g., the Beckman Coulter ProteomeLab PF 2D system) are more amenable to automation than 2DGE. In the first dimension, chromatofocusing separates proteins into distinct fractions based on each protein's isoelectric point (pI). Subsequently, each pI-focused fraction is further separated using nonporous silica (NPS)-RP-HPLC, which separates proteins based on their surface hydrophobicity. Compared with conventional porous silica stationary phases, NPS allows much faster separation and detection of proteins when coupled to fast “scanning” ESI-TOF-based mass spectrometers (Banks and Gulcicek, 1997).

A significant advantage of using RP-HPLC is that the mobile phase is volatile, which

facilitates using MS to obtain accurate intact-protein molecular-weight

information, which can be used to detect post-translational modifications often

missed in traditional peptide mass mapping techniques.
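As a rough illustration of how an intact-protein mass can point to a post-translational modification, the sketch below compares an observed mass with the mass predicted from the sequence and looks up the difference in a small table of mass shifts. The table entries are standard monoisotopic values; the function name, the tolerance and the example masses are assumptions made for this example, not part of any instrument software.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
# The selection of PTMs and the default tolerance are illustrative assumptions.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def explain_mass_shift(observed, theoretical, tolerance=0.5):
    """Return the PTMs whose mass shift matches observed - theoretical (Da)."""
    delta = observed - theoretical
    return [
        name
        for name, shift in PTM_SHIFTS.items()
        if abs(delta - shift) <= tolerance
    ]


if __name__ == "__main__":
    # Hypothetical protein: predicted 15000.0 Da, measured 15080.0 Da.
    print(explain_mass_shift(15080.0, 15000.0))
```

A real analysis would also consider multiple modifications on the same protein and the mass accuracy of the instrument; this sketch only matches a single shift.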

Multidimensional Protein Identification Technology (MudPIT):

One of the most common protein-profiling techniques that identifies/sequences

peptides directly from complex proteolytic digests is termed multidimensional

protein identification technology (MudPIT). The technique utilizes tandemly

coupled liquid chromatography columns. Generally, strong cation exchange

prefractionation is followed by reversed-phase HPLC separation, and MS/MS

analysis. MudPIT technology analyzes the entire complex mixture of tryptically digested proteins. A subset of peptides is eluted or fractionated from the cation-exchange column using a step or a continuous gradient of increasing salt concentration. For every salt step or gradient fraction, the peptides are loaded onto a reversed-phase HPLC column for second-dimension separation and salt removal, and then enter the mass spectrometer for tandem MS/MS analysis. After the first RP gradient is completed, the process may be repeated as many times as needed to match the sample amount and complexity.
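The input to MudPIT is a tryptic digest. As a minimal sketch of what that digest looks like in silico, the function below applies the commonly used trypsin rule (cleave after lysine or arginine unless the next residue is proline). The function name and the example sequence are hypothetical; missed cleavages are not modelled.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal peptide if the sequence does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence: the R before P is not cleaved.
    print(tryptic_digest("AKRPGKLM"))
```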

Stable Isotope Labeling and Quantitative Protein Profiling:

Peptides can be identified and quantified by tandem mass spectrometry using the isotope-coded affinity tag (ICAT) technique. ICAT analyses the subset of cysteine-containing peptides in a proteolyzed sample using similar 2-D chromatographic techniques with tandem MS. ICAT is one of the most commonly used quantitative techniques in proteomics.
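Relative quantitation with an isotope-coded tag comes down to finding light/heavy peak pairs and taking their intensity ratio. The sketch below assumes the original d0/d8 ICAT reagent, whose two forms differ by about 8.05 Da per labelled cysteine, and one labelled cysteine per peptide; the function name, the tolerance and the peak list are made up for illustration.

```python
# Mass difference between heavy (d8) and light (d0) tags, per labelled
# cysteine. Assumption: original d0/d8 reagent, one cysteine per peptide.
ICAT_DELTA = 8.0502


def icat_ratios(peaks, charge=1, tolerance=0.02):
    """Find light/heavy pairs in a list of (m/z, intensity) peaks.

    Returns a list of (light_mz, heavy_mz, heavy/light intensity ratio).
    """
    spacing = ICAT_DELTA / charge  # spacing on the m/z axis
    pairs = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if light_int > 0 and abs((heavy_mz - light_mz) - spacing) <= tolerance:
                pairs.append((light_mz, heavy_mz, heavy_int / light_int))
    return pairs


if __name__ == "__main__":
    # Hypothetical singly charged peaks: one ICAT pair and one unrelated peak.
    spectrum = [(1000.50, 200.0), (1008.55, 400.0), (1200.00, 50.0)]
    for light, heavy, ratio in icat_ratios(spectrum):
        print(light, heavy, ratio)
```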

Post-Translational Modifications:

Phosphoprotein Profiling:

Reversible protein phosphorylation is probably the most important mechanism

used for intracellular signal transduction and is involved in regulating cell-cycle

progression, differentiation, transformation, development, peptide hormone

response, and adaptation. This post-translational modification is among the most important and widespread. It is therefore important that studies directed at identifying phosphoproteins and carrying out comparative phosphoproteome profiling incorporate either a phosphoprotein or a phosphopeptide enrichment step. Enrichment techniques for phosphoproteins are somewhat limited. Although traditional nickel-affinity techniques can be used, they give suboptimal results due to nonspecific binding of other acidic proteins; phospho-specific antibody-based immunoprecipitation techniques are most helpful. Unfortunately, these enrichment approaches are limited to Ser- and Thr-containing phosphopeptides. The immobilized metal-affinity chromatography (IMAC)-based phosphopeptide enrichment approach, on the other hand, is capable of enriching for serine-, threonine-, and tyrosine-containing phosphopeptides, and has the capacity to profile large numbers of phosphopeptides. The phosphoprotein isotope-coded affinity tag (PhIAT) approach differentially labels phosphoserine and phosphothreonine residues by carrying out β-elimination of the phosphate before labeling with a 1,2-ethanedithiol containing either four alkyl hydrogens (EDT-D0) or four alkyl deuteriums (EDT-D4), followed by biotinylation of the EDT-D0/D4 moiety to allow affinity purification. Another simple approach is to use trypsin digestion in heavy/light water with IMAC enrichment followed by tandem MS analysis.

Generalized flowchart of important steps in the analysis of phosphorylation by mass spectrometry (figure uploaded by David K Han, https://www.researchgate.net/profile/David_Han7; content may be subject to copyright).

Glycoprotein Profiling:

Protein glycosylation has been shown to play critical roles in cell recognition,

regulation, cancer, protein folding, Alzheimer’s disease, muscular dystrophies, and

immune responses. The traditional and most widely used analytical glycobiology approaches involve glycan analysis and lectin affinity chromatography. Because of glycan heterogeneity and complexity, many of the isolation and MS approaches are slow and are usually limited to purified proteins. 2DGE approaches have been found to be useful for glycoprotein detection in complex samples. Enrichment methods such as hydrophilic interaction liquid chromatography (HILIC) have also enabled the isolation and identification of larger numbers of N-linked glycosylated proteins.

Schematic overview of the systematic characterization of glycoproteins (Ruiz-May, 2012).

The only protein-profiling technology currently in use that is directed entirely at identifying the relative expression levels of a small number (e.g., 25 to 75) of peptides/proteins that best differentiate control from experimental/disease samples is SELDI/MALDI-MS-based disease biomarker profiling.

Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry is a modification of matrix-assisted laser desorption/ionization mass spectrometry (MALDI-TOF) that combines the precision of mass spectrometry with the high-throughput nature of protein arrays known as Protein Chips. Three major components constitute SELDI-TOF: the Protein Chip arrays, the mass analyser, and the data analysis software. Protein Chips are solid-phase ligand-binding assay systems using immobilized proteins on surfaces such as glass, cellulose membranes, mass spectrometer plates, microbeads, or micro/nanoparticles.

The proteins of interest are captured on the chromatographic surface by

adsorption, partition, electrostatic interaction or affinity chromatography

depending on their properties, and analysed by TOF mass spectrometry. The result

is a mass spectrum comprised of the mass to charge (m/z) values and intensities

of the bound proteins/peptides. Then, using these chromatographic surfaces, a

laser desorption (LD) time-of-flight (TOF) mass spectrometer can generate an

accurate protein profile of a biological sample requiring minimal amounts of

sample. The protein sample may be analysed directly or after proteolytic digestion

of the adsorbed material. Proteolysis of a protein after desorption can be used to

identify binding sites such as antibody epitope sites. One of the unique strengths

of SELDI-TOF is its ability to analyse proteins from a variety of crude sample types,

with minimal sample consumption and processing. SELDI-TOF is very rapid and it

can directly test native undigested biological samples.

A schematic diagram showing the principle of SELDI-TOF MS to detect protein biomarkers. Urine samples

are incubated with protein microchips. Activation of protein samples by laser results in ionization. The

protein ions are accelerated in the presence of a high-voltage electrical field. The flight of protein ions is

separated by the mass/charge ratio. The TOF of protein ions is detected to generate an MS-TOF profile.

Using this approach, protein peaks with specific mass/charge ratios were used as non-invasive urinary

biomarkers to distinguish patients with active lupus nephritis from those in remission (part of the figure

reproduced from Mosley et al. with permission).
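The separation by mass/charge ratio described in the caption follows from energy conservation in an idealised linear TOF analyser: an ion of mass m and charge z accelerated through a voltage U gains kinetic energy zeU = mv²/2, so its flight time over a drift length L is t = L·sqrt(m / (2zeU)). The sketch below implements that relation and its inverse. The voltage and tube length are illustrative assumptions, and real instruments apply empirical calibration rather than this ideal formula.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # unified atomic mass unit, kg


def flight_time(mass_da, charge, voltage, length):
    """Ideal flight time (s) of an ion in a linear TOF analyser."""
    mass_kg = mass_da * DALTON
    return length * math.sqrt(mass_kg / (2 * charge * E_CHARGE * voltage))


def mz_from_time(time_s, voltage, length):
    """Invert the ideal TOF relation to recover m/z (Da per charge)."""
    return 2 * E_CHARGE * voltage * time_s ** 2 / (length ** 2 * DALTON)


if __name__ == "__main__":
    # Assumed settings: 20 kV acceleration, 1 m flight tube.
    for mass in (5000.0, 10000.0, 20000.0):
        t = flight_time(mass, 1, 20000.0, 1.0)
        print(mass, t, mz_from_time(t, 20000.0, 1.0))
```

Because the flight time grows with the square root of m/z, an ion four times heavier takes twice as long to reach the detector.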

PROTEIN MICROARRAYS

Review of planar antibody microarray technologies and their applications in the field of proteomics.

Images were adopted from Servier Medical Art by Servier (http://www.servier.com/Powerpoint-image-bank) and modified by the authors under the following terms: Creative Commons Attribution 3.0 Unported (CC BY 3.0).

The traditional method for quantitative proteome analysis combines protein

separation by high-resolution 2-dimensional isoelectric focusing (IEF)/SDS-PAGE

(2DE) with mass spectrometric (MS) or tandem mass spectrometric (MS/MS)

identification of selected protein spots detected in the 2DE gels by use of specific

protein stains. Typically, amino acid sequence information is collected in a tandem

mass spectrometer and is correlated with protein sequence databases.

sequencing inward from both the N- and C-termini are concurrently present in

each spectrum. The technique generates sequence information from many peptides

from a protein and enables the redundant and unambiguous identification of the

protein from the database.

The complexity of protein analysis can vary greatly depending on the sample-

simplification steps chosen prior to MS analysis. In some cases, it may be a simple

protein identification step from relatively purified protein mixtures obtained from

gels or chromatographic fractions. In other cases, it may be that the proteins are

well known recombinant or purified proteins, but they contain post-translationally

modified isoforms that need to be characterized and quantified. In situations

involving more complex experiments, large-scale profiling and perhaps

quantitation of proteins from biological samples in different cellular states may be

required. In more recent applications that do not involve the identification of

proteins, examination of mass-spectral peptide patterns from biological fluids has

been used to uncover biomarkers of potential disease states. Regardless of the

complexity of the samples, as a first step, any small- or large-scale MS-based

proteomics effort will require a basic set of mass-spectral raw data analysis tools.

In addition, when used with MS, gel electrophoresis–based approaches will require

the use of quantitative image analysis software to discern distinct protein patterns

from the gel background as well as across many gels. Many of the gel-imaging

software programs will also have to incorporate some level of two dimensional

“triangulation” component to aid in the precise excision of gel spots for further

mass spectrometric analysis.

2) Functional Proteomics/Interaction Proteomics: Advanced Methods for

Protein-Protein Interaction Mapping

Because proteins carry out the most intricate functions in the cell, analysing them is the most direct approach to defining gene function. This branch of proteomics therefore aims to understand how the genome works through the control of gene expression.

It is characterized by the use of
1) high-throughput, comprehensive analysis of protein complexes,
2) protein-protein interaction mapping and
3) analysis of the possible post-translational modifications required to perform the function,
for example co-precipitation studies with a bait protein followed by MS analysis of the bound proteins, as well as a range of computational analyses of the data.

The transcriptome, the cell's global mRNA content, is context-dependent: it changes with the conditions prevailing inside and outside the cell and with the tissue or organ type. Although the amount of an mRNA often corresponds only loosely to the amount of the protein into which it is translated, the mRNA complement still reflects the proteome of a particular cell. Several techniques have been employed to establish the functional link from gene to mRNA to protein to metabolite in a cell.

Non-Proteome Based Methods
1. Expression Profiling
2. Computational methods for detection of functional linkages
i. Phylogenetic profile
ii. Domain fusion method
iii. Gene-neighbour method
iv. Homology method

Protein Engineering Based Method
1. Yeast Two-hybrid screen

Expression Profiling

The quantitative analysis of expression levels of novel genes under a variety of physiological or developmental conditions helps elucidate gene function when compared with the expression patterns of functionally characterized genes.

1. Serial Analysis of Gene Expression (SAGE): SAGE has been used to determine the complete set of yeast genes expressed under a given set of conditions, the “transcriptome.” The technique helps identify genes that had not been predicted from sequence information alone. mRNAs are captured on affinity columns, and correlating those mRNAs whose expression levels change helps establish functional linkages between the proteins they encode.

2. DNA microarrays: hybridization-array technologies for high-throughput analysis of the transcriptome.

Computational methods for detection of functional linkages

The advent of fully sequenced genomes has facilitated the development of

computational methods for establishing functional linkages between proteins.

i) Phylogenetic profile

A phylogenetic profile describes the pattern of presence or absence of a particular protein across a set of organisms whose genomes have been sequenced. If two proteins, A and B, have the same phylogenetic profile across all surveyed genomes, this suggests that they share a functional link. Hence any two proteins with identical or similar phylogenetic profiles are likely to be engaged in a common pathway or complex.
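The comparison of phylogenetic profiles can be sketched as follows. Each protein is represented by a tuple of 1s and 0s (present or absent in each surveyed genome), and pairs whose profiles agree in at least a chosen fraction of genomes are reported as candidate functional links. The protein names, the profiles and the threshold are hypothetical.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)


def linked_pairs(profiles, threshold=1.0):
    """Return protein pairs whose profile similarity is >= threshold."""
    names = sorted(profiles)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if profile_similarity(profiles[p], profiles[q]) >= threshold
    ]


if __name__ == "__main__":
    # Hypothetical profiles over four genomes (1 = present, 0 = absent).
    profiles = {
        "A": (1, 0, 1, 1),
        "B": (1, 0, 1, 1),
        "C": (0, 1, 1, 0),
    }
    print(linked_pairs(profiles))
```

Lowering the threshold below 1.0 admits similar rather than identical profiles, at the cost of more false links.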

ii) Domain fusion method:

Functional linkages between proteins can be detected by analysing fusion patterns of protein domains. If proteins A and B, which are separate in one organism, are expressed as a single fused protein in another organism, the domains A and B are almost certainly linked in function. Thus a successful search through other genome sequences for the corresponding fused protein is powerful evidence that A and B are functionally linked. If A and B have unrelated sequences, this kind of functional linkage cannot be detected by a homology search.
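A toy version of the domain fusion search is shown below. It assumes that every protein has already been annotated with a set of domain identifiers (in practice this comes from sequence comparison, which is not shown), and it reports pairs of proteins in the query genome whose domains appear together in a single protein of another genome. All names and domain labels are hypothetical.

```python
from itertools import combinations


def fusion_links(query_proteins, other_proteins):
    """Find protein pairs in the query genome that are fused elsewhere.

    Both arguments map a protein name to the set of domain ids it contains.
    Returns a list of (protein_a, protein_b, fused_protein) tuples.
    """
    links = []
    for name_a, name_b in combinations(sorted(query_proteins), 2):
        domains_a = query_proteins[name_a]
        domains_b = query_proteins[name_b]
        if domains_a & domains_b:
            continue  # shared domains would be found by plain homology
        for fused_name in sorted(other_proteins):
            fused_domains = other_proteins[fused_name]
            if domains_a <= fused_domains and domains_b <= fused_domains:
                links.append((name_a, name_b, fused_name))
    return links


if __name__ == "__main__":
    query = {"A": {"d1"}, "B": {"d2"}, "C": {"d3"}}
    other = {"AB_fusion": {"d1", "d2"}, "X": {"d3"}}
    print(fusion_links(query, other))
```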

iii) Gene-neighbour method:

If the genes that encode two proteins are neighbours on the chromosome in several genomes, the proteins tend to be functionally linked. This method is most robust in prokaryotes, where operons are common, but it may work to some extent even for human genes, where operon-like clusters such as the histone gene clusters are observed.
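The gene-neighbour idea can be illustrated by counting the genomes in which two genes lie close together. In this sketch each genome is simply an ordered list of gene identifiers along one chromosome; the genome names, gene labels and the `max_gap` parameter are assumptions for the example, and strand and circular chromosomes are ignored.

```python
def neighbour_count(genomes, gene_a, gene_b, max_gap=1):
    """Count genomes in which gene_a and gene_b are within max_gap positions.

    genomes maps a genome name to the list of genes in chromosomal order.
    """
    count = 0
    for order in genomes.values():
        if gene_a in order and gene_b in order:
            distance = abs(order.index(gene_a) - order.index(gene_b))
            if distance <= max_gap:
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical gene orders in three genomes.
    genomes = {
        "genome1": ["x", "a", "b", "y"],
        "genome2": ["b", "a", "z"],
        "genome3": ["a", "z", "y", "b"],
    }
    print(neighbour_count(genomes, "a", "b"))
```

The more genomes in which the two genes stay adjacent, the stronger the evidence for a functional link.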

iv) Homology method:

Once the function of an individual protein is known, one can search for other proteins with related functions by seeking proteins whose amino-acid sequences are similar to that of the original protein. This technique assigns a role or function to an unknown protein simply by comparing its sequence with the sequences of known proteins. Since the arrangement of amino acids determines the formation of alpha helices and beta sheets, followed by tertiary and subunit associations, this method makes it easier to predict the protein family to which a particular protein belongs.

Protein Engineering Based Method

Yeast Two-hybrid screen

The yeast two-hybrid system exploits the finding that many transcription activators are composed of two domains that are physically separable and remain active provided they are in close proximity. The DNA-binding domain (BD) binds specific DNA sequences located in the promoter region of genes regulated by the transcription activator, whereas the transcription activation domain (AD) induces transcription.

Thus, two-hybrid analysis works by separating the coding sequences for the DNA-binding and activation domains of a transcriptional activator and cloning them into separate vector molecules. The coding sequence of a candidate protein whose partners are sought, known as the “bait,” is then fused with the DNA-binding domain. A library of coding sequences for proteins that might interact with the bait, called the “prey,” is made in fusion with the activation domain. The plasmids carrying the DNA sequences coding for the bait and prey proteins are generally introduced into the same yeast cell using a mating strategy.

If the two proteins physically interact, the DNA-binding and activation domains are closely juxtaposed, and the reconstituted transcriptional activator can switch on a reporter gene that gives either a change in colour (lacZ on X-gal) or growth on selective medium by complementing a nutritional auxotrophy (his/leu).

The principle of the yeast two-hybrid system. Two

plasmids are constructed, the bait-encoding protein

X fused to the C-terminus of a transcription factor

DNA-binding domain (BD) and the prey-encoding

protein Y fused to an activation domain (AD).

Alternatively, the prey can consist of proteins

encoded by an expression library. Each plasmid is

introduced into an appropriate yeast strain either by

co-transformation, sequential transformation, or by

yeast mating. Only if proteins X and Y physically

interact with one another are the BD and AD brought

together to reconstitute a functionally active

transcription factor that binds to upstream specific

activation sequences (UAS) in the promoters of the

reporter genes, and to activate their expression.

The ultimate goal of proteomics is to comprehensively identify all proteins, their

associated biological activities, post-translational modifications, and protein-

protein interactions occurring in a given cell, and determine how this “proteome”

is altered in response to a modifier.

Bioinformatics/Protein Database Search

Emerging computer technologies and global database networking facilities have caused an explosion in proteomic analysis reports. Protein identification can be carried out by accessing protein databases such as

Swiss-Prot: The SWISS-PROT protein sequence data bank is composed of

sequence entries. Each entry corresponds to a single contiguous sequence as

contributed to the bank or reported in the literature. SWISS-PROT (Bairoch and

Apweiler, 1996) is an annotated protein sequence database established in 1986

and maintained collaboratively, since 1987, by the Department of Medical

Biochemistry of the University of Geneva and the EMBL Data Library. It is a curated

protein sequence database, which strives to provide a high level of annotation

(such as the description of the function of a protein, its domain

structure, posttranslational modifications and variants), a minimal level of

redundancy, and a high level of integration with other databases. TrEMBL is a

computer-annotated supplement of SWISS-PROT that contains all the translations

of EMBL nucleotide sequence entries, which are not yet integrated in SWISS-

PROT. Currently, SWISS-PROT and TrEMBL have 0.5 and 7.6 million sequences,

respectively. These databases are freely available

at http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/.

SWISS-PROT contains the information about the name and origin of the protein,

protein attributes, general information, ontologies, sequence annotation, amino

acid sequence, bibliographic references, cross-references with sequence, structure

and interaction databases, and entry information.


ExPASy: The ExPASy (Expert Protein Analysis System) World Wide Web server (http://www.expasy.org) is provided as a service to the life science community

by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It

provides access to a variety of databases and analytical tools dedicated to proteins

and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-

2DPAGE, PROSITE, ENZYME and the SWISS-MODEL repository. Analysis tools are

available for specific tasks relevant to proteomics, similarity searches, pattern and

profile searches, post-translational modification prediction, topology prediction,

primary, secondary and tertiary structure analysis and sequence alignment. These

databases and tools are tightly interlinked: a special emphasis is placed on

integration of database entries with related resources developed at the SIB and

elsewhere, and the proteomics tools have been designed to read the annotations

in SWISS-PROT in order to enhance their predictions. ExPASy started to operate

in 1993, as the first WWW server in the field of life sciences. In addition to the

main site in Switzerland, seven mirror sites in different continents currently serve

the user community.

UniProt: The Universal Protein Resource (UniProt) provides a stable,

comprehensive, freely accessible, central resource on protein sequences and

functional annotation. The UniProt Consortium is a collaboration between the

European Bioinformatics Institute (EBI), the Protein Information Resource (PIR)

and the Swiss Institute of Bioinformatics (SIB). The core activities include manual

curation of protein sequences assisted by computational analysis, sequence

archiving, development of a user-friendly UniProt website, and the provision of

additional value-added information through cross-references to other databases.

UniProt is comprised of four major components, each optimized for different uses:

the UniProt Knowledgebase, the UniProt Reference Clusters, the UniProt Archive

and the UniProt Metagenomic and Environmental Sequences database. UniProt is

updated and distributed every three weeks, and can be accessed online for

searches or download at http://www.uniprot.org.


BLAST (Basic Local Alignment Search Tool; a very fast search algorithm that is

used to separately search protein or DNA databases): The Basic Local Alignment

Search Tool (BLAST) finds regions of local similarity between sequences. The

program compares nucleotide or protein sequences to sequence databases and

calculates the statistical significance of matches. BLAST can be used to infer

functional and evolutionary relationships between sequences as well as help

identify members of gene families.

(Link: https://www.yumpu.com/en/document/read/20871204/protein-motifs-and-proteomics-tools-galter-health-sciences-library)

MMDB: The Molecular Modeling DataBase (MMDB) is a database of experimentally determined

three-dimensional biomolecular structures, and is also referred to as the Entrez Structure

database. It is a subset of three-dimensional structures obtained from the RCSB Protein Data Bank

(PDB), excluding theoretical models.


FASTA (a text format in which a description line is followed by lines of nucleotide or amino acid sequence; also the name of a program used for sequence similarity searching),

PDB (protein databank; http://www.pdb.org/):

The Protein Data Bank (PDB) archive is the single worldwide repository of

information about the 3D structures of large biological molecules, including

proteins and nucleic acids. These are the molecules of life that are found in all

organisms including bacteria, yeast, plants, flies, other animals, and humans.

Understanding the shape of a molecule helps to deduce the structure's role in human health

and disease, and in drug development. The structures in the archive range from

tiny proteins and bits of DNA to complex molecular machines like the ribosome.

The PDB archive is available at no cost to users. The PDB archive is updated

weekly.

The PDB was established in 1971 at Brookhaven National Laboratory under the

leadership of Walter Hamilton and originally contained 7 structures. After

Hamilton's untimely death, Tom Koetzle began to lead the PDB in 1973, and then

Joel Sussman in 1994. Led by Helen M. Berman, the Research Collaboratory for

Structural Bioinformatics (RCSB) became responsible for the management of the

PDB in 1998. In 2003, the wwPDB (http://wwpdb.org/) was formed to maintain a single PDB archive

of macromolecular structural data that is freely and publicly available to the global

community. It consists of organizations that act as deposition, data processing

and distribution centres for PDB data.

and NCBI, which provide information on protein sequence, domain structure and 3-D protein structures.

The National Center for Biotechnology Information (NCBI) develops and maintains molecular and bibliographic databases as a part of the National Library of Medicine (NLM). NCBI does not generate its own data; instead, it receives data submissions from researchers, develops software for searching and analysing these data, and provides web access to them.

Several input and output data formats are used for accessing data from the various databases, such as raw text format, FASTA format, GCG format, Rich Sequence Format (RSF), ABI trace file format, AceDB format and CODATA format. This branch of proteomics is growing rapidly as new, advanced and sophisticated software is developed for efficient analysis and data storage, together with partially automated comparison modules that enable fast and efficient comparison of protein structures or functions.
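Of the formats listed, FASTA is the simplest: each record starts with a description line beginning with ">", followed by one or more lines of sequence. A minimal reader might look like the sketch below. The function name and the example records (including the accession-like header) are hypothetical, and established libraries handle many edge cases that this sketch ignores.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    # Hypothetical records; the identifiers are made up for illustration.
    example = ">sp|P00001|EXAMPLE hypothetical protein\nMKWVT\nFISLL\n>seq2\nACGT\n"
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence))
```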

Networks of interacting functionally linked proteins can be traced out by detecting

functional linkages of all the proteins of an organism. Computational as well as

other methods can infer function fairly reliably. This homology method is widely

used to extend knowledge of protein function from one protein to its cousins,

which are presumably descended from the same common ancestral protein.

These powerful programs are used to extend experimental knowledge of protein function to new sequences, so that a function can be assigned to roughly 40 to 70% of new genome sequences by such homology techniques. The Protein Data Bank (PDB) is now the sole repository for three-dimensional structure data of biological macromolecules, and it uses advanced methods for annotation, archiving and access. PDB also supports various aspects of structural genomics, such as development and maintenance of the Target Registration Database (TargetDB), organization of data dictionaries that define specifications for the exchange and deposition of data with structural genomics centres, and creation of software tools to capture data from standard structure determination applications.

In most cases proteins do not act independently; they form transient or stable complexes with other proteins. These complexes can be intricate and of variable composition, so a complete understanding of a biological system requires studying the protein complexes together with the conditions that lead to their formation or dissociation.

Protein pathways are series of reactions inside the cell that exert a particular biological effect. The proteins that are directly involved in the reactions, along with those that regulate the pathways, are combined in pathway databases, and a number of such resources are available. KEGG, Ingenuity, the Reactome pathway knowledgebase and BioCarta are some of the pathway databases that include comprehensive data on metabolism, signalling and interactions. In addition to these comprehensive databases, specific databases for signal transduction pathways, such as GenMAPP or PANTHER, have been developed. Moreover, databases such as NetPath cover pathways active in cancer and help identify proteins relevant to a particular cancer type. These public databases are highly interconnected, which allows novel findings about proteins. Databases such as BioGRID, IntAct, MINT and HPRD contain information on protein interactions in complexes. STRING is not only a widely used database for protein interaction data; it also connects to various other resources for literature mining. Furthermore, protein networks can be drawn with the STRING database from a supplied list of genes and the available interactions.
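Drawing a network from a gene list and a set of known interactions amounts to building a graph. The sketch below does this with plain Python data structures and then extracts the group of proteins reachable from a given starting protein. It does not call STRING or any other database; the interaction pairs and protein names are hypothetical stand-ins for data downloaded from such a resource.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def connected_component(network, start):
    """Return all proteins reachable from start through interactions."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in network[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


if __name__ == "__main__":
    # Hypothetical interaction list.
    interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
    network = build_network(interactions)
    print(sorted(connected_component(network, "P1")))
```

Proteins that fall into the same connected component are candidates for membership in the same complex or pathway, subject to the quality of the underlying interaction data.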

Date post:	11-Jul-2020