Date post: | 21-Dec-2015 |
Intro to Bioinformatics
Computational Approaches to Receptor Structure Prediction
Uğur Sezerman
Biological Sciences and Bioengineering Program
Sabancı University, Istanbul
Protein Folding
Determining Protein Structure
There are O(100,000) distinct proteins in the human proteome.
3D structures have been determined for over 60,000 proteins, from all organisms
• Includes duplicates with different ligands bound, etc.
Coordinates are determined by X-ray crystallography or NMR
X-Ray Crystallography
[Figure: protein crystal, scale ~0.5 mm]
• The crystal is a mosaic of millions of copies of the protein.
• As much as 70% is solvent (water)!
• May take months (and a “green” thumb) to grow.
X-Ray diffraction
Image is averaged over:
• Space (many copies)
• Time (of the diffraction experiment)
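Electron Density Maps
Resolution is dependent on the quality/regularity of the crystal.
R-factor measures the disagreement between the observed diffraction data and the data calculated from the model, i.e. the "leftover" electron density the model does not explain; lower is better.
[Figure panels: solvent fitting; refinement]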
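The R-factor is conventionally computed from structure-factor amplitudes rather than from the map itself: R = Σ | |F_obs| - |F_calc| | / Σ |F_obs|. A minimal Python sketch, assuming the calculated amplitudes are already on the same scale as the observed ones (real refinement programs fit a scale factor first). The function name `r_factor` and the four amplitudes are hypothetical, for illustration only:

```python
def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over all reflections.

    Assumes f_calc is already on the same scale as f_obs (no scale factor fitted).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two equal-length, non-empty lists of amplitudes")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for four reflections, for illustration only
observed = [100.0, 80.0, 60.0, 40.0]
calculated = [90.0, 84.0, 57.0, 41.0]
print(f"R = {r_factor(observed, calculated):.3f}")
```

A model that reproduced every observed amplitude exactly would give R = 0.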
The Protein Data Bank
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
http://www.rcsb.org/pdb/
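The ATOM records above can be read with a short script. This is only a sketch: it splits each line on whitespace, which happens to work for these records, whereas the PDB format is defined by fixed column positions, so a robust parser should slice columns instead. The helper name `parse_atoms` is hypothetical. As a sanity check it measures the distance between the first two Cα atoms, which for consecutive residues is about 3.8 Å:

```python
import math

# Three of the ATOM records shown above.
PDB_LINES = """\
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
"""


def parse_atoms(text: str) -> list[dict]:
    """Parse ATOM records by splitting on whitespace.

    Simplification: real PDB files are fixed-column, and whitespace splitting
    breaks when adjacent fields run together (e.g. large coordinates).
    """
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "ATOM":
            continue
        atoms.append({
            "serial": int(fields[1]),
            "name": fields[2],            # atom name, e.g. CA = alpha carbon
            "res_name": fields[3],
            "chain": fields[4],
            "res_seq": int(fields[5]),
            "xyz": tuple(float(v) for v in fields[6:9]),
            "occupancy": float(fields[9]),
            "b_factor": float(fields[10]),  # temperature factor
        })
    return atoms


atoms = parse_atoms(PDB_LINES)
ca = [atom["xyz"] for atom in atoms if atom["name"] == "CA"]
print(f"CA(1)-CA(2) distance: {math.dist(ca[0], ca[1]):.2f} Å")
```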
A Peek at Protein Function
Serine proteases – cleave other proteins
• Catalytic Triad: ASP, HIS, SER
Cleaving the peptide bond
Three Serine Proteases
Chymotrypsin – Cleaves the peptide bond on the carboxyl side of aromatic (ring) residues: Trp, Phe, Tyr; and large hydrophobic residues: Met.
Trypsin – Cleaves after Lys (K) or Arg (R)
• Positive charge
Elastase – Cleaves after small residues: Gly, Ala, Ser, Cys
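The cleavage rules above can be turned into a toy in-silico digest. This sketch encodes exactly the residues listed on the slide and nothing more; real proteases also depend on neighbouring residues (for example, trypsin cleaves poorly before proline), which is not modelled. `CLEAVAGE_RULES` and `digest` are hypothetical names:

```python
# Residues after which each enzyme cuts, as listed on the slide (one-letter codes).
CLEAVAGE_RULES = {
    "chymotrypsin": set("WFYM"),
    "trypsin": set("KR"),
    "elastase": set("GASC"),
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Split a protein sequence after every residue the enzyme recognizes."""
    targets = CLEAVAGE_RULES[enzyme]
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Cleave on the carboxyl side: the recognized residue ends the fragment.
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


for enzyme in CLEAVAGE_RULES:
    print(enzyme, digest("MAKGRSFW", enzyme))
```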
Specificity Binding Pocket
Protein Folding – Biological perspective
"Central dogma": Sequence specifies structure
Denature – to "unfold" a protein back to random coil configuration
• β-mercaptoethanol – breaks disulfide bonds
• Urea or guanidine hydrochloride – denaturant
• Also heat or pH
Anfinsen's experiments
• Denatured ribonuclease
• Spontaneously regained enzymatic activity
• Evidence that it re-folded to native conformation
PROTEIN FOLDING PROBLEM
Starting from the amino acid sequence, finding the structure of the protein is called the protein folding problem.
The Protein Folding Problem
Central question of molecular biology:
"Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?"
Input: AAVIKYGCAL…
Output: φ1ψ1, φ2ψ2, … = backbone conformation (no side chains yet)
Folding intermediates
Levinthal's paradox – Consider a 100-residue protein. If each residue can take only 3x3 = 9 positions, there are 9^100 possible conformations, far too many to search one by one.
Folding must proceed by progressive stabilization of intermediates
• Molten globules – most secondary structure formed, but much less compact than the "native" conformation.
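The arithmetic behind Levinthal's paradox, as a quick Python check. The 3x3 = 9 states per residue follow the slide (read here as three choices for each of the two backbone angles, which is an interpretation); the sampling rate of 10^13 conformations per second is an assumed, deliberately generous figure:

```python
import math

RESIDUES = 100
STATES_PER_RESIDUE = 3 * 3   # assumed: 3 choices each for the two backbone angles
SAMPLING_RATE = 1e13         # assumed conformations tried per second (illustrative)
SECONDS_PER_YEAR = 3.156e7

conformations = STATES_PER_RESIDUE ** RESIDUES  # exact integer 9^100
years = conformations / SAMPLING_RATE / SECONDS_PER_YEAR

print(f"9^100 = 10^{math.log10(conformations):.1f} conformations")
print(f"exhaustive search at {SAMPLING_RATE:.0e}/s: {years:.1e} years")
```

Even at that rate the search would take roughly 10^75 years, vastly longer than the age of the universe (about 1.4 × 10^10 years), which is why folding cannot be a random search.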
Protein Packing
• occurs in the cytosol (~60% bulk water, ~40% water of hydration)
• involves interaction between secondary structure elements and solvent
• may be promoted by chaperones, membrane proteins
• tumbles into molten globule states
• the overall entropy loss is small enough that enthalpy determines the sign of ΔG, so the free energy decreases (the loss in entropy from packing is counteracted by the gain from desolvation and reorganization of water, i.e. the hydrophobic effect)
• yields tertiary structure
Folding help
Proteins are, in fact, only marginally stable
• Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form
Many proteins help in folding
• Protein disulfide isomerase – catalyzes shuffling of disulfide bonds
• Chaperones – break up aggregates and (in theory) unfold misfolded proteins
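To see what "5 to 10 kcal/mole" means in practice, a two-state model (folded ⇌ unfolded, an assumption) gives the equilibrium constant K = exp(ΔG / RT). A minimal sketch at 25 °C; `folding_equilibrium` is a hypothetical helper:

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol·K)
T = 298.15    # 25 °C in kelvin


def folding_equilibrium(delta_g: float) -> tuple[float, float]:
    """Two-state model: return (K = [folded]/[unfolded], fraction unfolded).

    delta_g is how many kcal/mol the native state is more stable than the
    unfolded state (a positive number), so K = exp(delta_g / RT).
    """
    k = math.exp(delta_g / (R * T))
    return k, 1.0 / (1.0 + k)


for dg in (5.0, 10.0):
    k, unfolded = folding_equilibrium(dg)
    print(f"{dg:4.1f} kcal/mol: K = {k:.1e}, fraction unfolded = {unfolded:.1e}")
```

Because ΔG sits in an exponent, a change of only a few kcal/mol, such as a destabilizing mutation, shifts the folded/unfolded ratio by orders of magnitude; this is the sense in which the native state is only marginally stable.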
Forces driving protein folding
It is believed that hydrophobic collapse is a key driving force for protein folding
• Hydrophobic core
• Polar surface interacting with solvent
Minimum volume (no cavities)
Disulfide bond formation stabilizes
Hydrogen bonds
Polar and electrostatic interactions
Secondary Structure
• non-linear
• 3-dimensional
• localized to regions of an amino acid chain
• formed and stabilized by hydrogen bonding, electrostatic and van der Waals interactions
Common motifs
The Hydrophobic Core
Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin.
The resulting "sticky patch" causes hemoglobin S to agglutinate (stick together) and form fibers, which deform the red blood cell so that it does not carry oxygen efficiently.
Sickle cell anemia was the first identified molecular disease.
Sickle Cell Anemia
Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.
Computational Approaches
• Ab initio methods
• Threading
• Comparative Modelling
• Fragment Assembly
Why is ab-initio prediction hard?
Ab-initio protein structure prediction as an optimization problem
1. Define a function that maps protein structures to some quality measure.
2. Solve the computational problem of finding an optimal structure.
[Figure: energy as a function of conformation]
A dream function
• Has a clear minimum in the native structure.
• Has a clear path towards the minimum.
• A global optimization algorithm should find the native structure.
Chen Keasar, BGU
An approximate function
• Easier to design and compute.
• Native structure not always the global minimum.
• Global optimization methods do not converge. Many alternative models (decoys) should be generated.
• No clear way of choosing among them.
[Figure: decoy set]
Fold Optimization
Simple lattice models (HP-models)
• Two types of residues: hydrophobic and polar
• 2-D or 3-D lattice
• The only force is hydrophobic collapse
• Score = number of HH contacts
Scoring Lattice Models
H/P model scoring: count noncovalent hydrophobic interactions.
Sometimes:
• Penalize for buried polar or surface hydrophobic residues
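The H/P scoring rule above can be written directly. This sketch assumes a 2-D square lattice and represents a fold as the list of lattice sites visited by the chain; it implements only the basic contact count, not the optional penalties. `hp_score` is a hypothetical name:

```python
Coord = tuple[int, int]


def hp_score(sequence: str, fold: list[Coord]) -> int:
    """Number of H-H contacts in a 2-D square-lattice fold.

    A contact is a pair of H residues on neighbouring lattice sites that are
    not consecutive in the chain (i.e. a non-covalent interaction).
    """
    if len(sequence) != len(fold):
        raise ValueError("need one lattice site per residue")
    if len(set(fold)) != len(fold):
        raise ValueError("fold is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(fold, fold[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent sites")
    index_at = {site: i for i, site in enumerate(fold)}
    contacts = 0
    for i, (x, y) in enumerate(fold):
        if sequence[i] != "H":
            continue
        # Checking only the right and upper neighbours counts each pair once.
        for site in ((x + 1, y), (x, y + 1)):
            j = index_at.get(site)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return contacts


square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # chain bent into a square
line = [(0, 0), (1, 0), (2, 0), (3, 0)]    # fully extended chain
print(hp_score("HPPH", square), hp_score("HPPH", line))
```

Bending "HPPH" into a square brings the two H ends together, so it scores one contact, while the extended chain scores none.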
What can we do with lattice models?
For smaller polypeptides, exhaustive search can be used
• Looking at the "best" fold, even in such a simple model, can teach us interesting things about the protein folding process
For larger chains, other optimization and search methods must be used
• Greedy, branch and bound
• Evolutionary computing, simulated annealing
• Graph theoretical methods
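For short chains the exhaustive search mentioned above is feasible. This sketch, again on a 2-D square lattice, enumerates every walk of unit steps, discards those that self-intersect, and keeps the one with the most H-H contacts; the function names are hypothetical:

```python
from itertools import product
from typing import Optional

Coord = tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hh_contacts(sequence: str, fold: list[Coord]) -> int:
    """Count non-consecutive H-H pairs on neighbouring lattice sites."""
    index_at = {site: i for i, site in enumerate(fold)}
    contacts = 0
    for i, (x, y) in enumerate(fold):
        if sequence[i] != "H":
            continue
        for site in ((x + 1, y), (x, y + 1)):  # right/up only: each pair counted once
            j = index_at.get(site)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return contacts


def walk(steps) -> Optional[list[Coord]]:
    """Turn unit steps into lattice sites; None if the walk hits itself."""
    path = [(0, 0)]
    occupied = {(0, 0)}
    for dx, dy in steps:
        x, y = path[-1]
        site = (x + dx, y + dy)
        if site in occupied:
            return None
        path.append(site)
        occupied.add(site)
    return path


def best_fold(sequence: str) -> tuple[int, list[Coord]]:
    """Exhaustive search over all 4^(n-1) step sequences; keeps the best score."""
    best_score, best_path = -1, []
    for steps in product(STEPS, repeat=len(sequence) - 1):
        path = walk(steps)
        if path is None:
            continue
        score = hh_contacts(sequence, path)
        if score > best_score:
            best_score, best_path = score, path
    return best_score, best_path


score, fold = best_fold("HHPPHH")
print(score, fold)
```

The number of step sequences grows as 4^(n-1), so this only works for very short chains; longer chains need heuristic methods such as those listed above.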
Learning from Lattice Models
The "hydrophobic zipper" effect
Ken Dill, ~1997
Threading: Fold recognition
Given:
• Sequence: IVACIVSTEYDVMKAAR…
• A database of molecular coordinates
Map the sequence onto each fold
Evaluate
• Objective 1: improve scoring function
• Objective 2: folding
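A toy version of "map the sequence onto each fold, then evaluate". Real threading uses 3-D coordinates and statistical contact potentials; this sketch instead assumes each template fold is summarized as a string of buried (B) and exposed (E) positions, threads the sequence without gaps, and rewards hydrophobic residues in buried sites and polar residues in exposed ones. The hydrophobic residue set, the three templates and all function names are hypothetical; the query is only the visible part of the slide's sequence:

```python
from typing import Optional

# Assumed, simplified hydrophobic class; real methods use statistical potentials.
HYDROPHOBIC = set("AVILMFWC")

# Hypothetical template folds, each reduced to a buried (B) / exposed (E) pattern.
TEMPLATES = {
    "template_1": "B" * 17,
    "template_2": "BBBBBEEEEEEBBEBBE",
    "template_3": "E" * 17,
}


def site_score(residue: str, environment: str) -> int:
    """+1 if a hydrophobic residue is buried or a polar one exposed, otherwise -1."""
    return 1 if (residue in HYDROPHOBIC) == (environment == "B") else -1


def thread(sequence: str, template: str) -> Optional[tuple[int, int]]:
    """Best gapless placement of the sequence on the template: (score, offset)."""
    best = None
    for offset in range(len(template) - len(sequence) + 1):
        score = sum(site_score(aa, template[offset + i]) for i, aa in enumerate(sequence))
        if best is None or score > best[0]:
            best = (score, offset)
    return best  # None when the template is shorter than the sequence


def rank_folds(sequence: str, templates: dict[str, str]) -> list[tuple[int, str, int]]:
    """Thread the sequence onto every template and sort by score, best first."""
    hits = []
    for name, template in templates.items():
        result = thread(sequence, template)
        if result is not None:
            hits.append((result[0], name, result[1]))
    return sorted(hits, reverse=True)


for score, name, offset in rank_folds("IVACIVSTEYDVMKAAR", TEMPLATES):
    print(f"{name}: score {score} at offset {offset}")
```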
Protein Fold Families
CATH website: www.cathdb.info
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…
Secondary Structure Prediction
Easier than folding
• Current algorithms can predict secondary structure with 70-80% accuracy
Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
• Based on frequencies of occurrence of residues in helices and sheets
PhD – Neural network based
• Uses a multiple sequence alignment
• Rost & Sander, Proteins, 1994, 19, 55-72
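The accuracy figure above is commonly reported as Q3, the percentage of residues assigned the correct one of three states (helix H, strand E, coil). A minimal sketch with hypothetical strings; treating lower-case letters as their upper-case state (the slide uses both "h" and "H") is an assumption, and `q3` is a hypothetical name:

```python
def q3(predicted: str, observed: str) -> float:
    """Q3: percentage of residues assigned the correct state (H, E or coil '-').

    Lower-case letters are treated like upper-case ones (an assumption).
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two equal-length, non-empty state strings")
    correct = sum(p == o for p, o in zip(predicted.upper(), observed.upper()))
    return 100.0 * correct / len(observed)


observed = "--HHHHHH--EEEE--"   # hypothetical reference assignment
predicted = "--HHHH----EEEEEE"  # hypothetical prediction
print(f"Q3 = {q3(predicted, observed):.1f}%")
```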
Chou-Fasman Parameters
(P(a), P(b), P(turn): propensities ×100 for α-helix, β-sheet and turn; f(i) to f(i+3): frequency of the residue at positions i to i+3 of a β-turn)
Name           Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142   83    66       0.06   0.076   0.035   0.058
Arginine       R      98    93    95       0.07   0.106   0.099   0.085
Aspartic Acid  D      101   54    146      0.147  0.11    0.179   0.081
Asparagine     N      67    89    156      0.161  0.083   0.191   0.091
Cysteine       C      70    119   119      0.149  0.05    0.117   0.128
Glutamic Acid  E      151   37    74       0.056  0.06    0.077   0.064
Glutamine      Q      111   110   98       0.074  0.098   0.037   0.098
Glycine        G      57    75    156      0.102  0.085   0.19    0.152
Histidine      H      100   87    95       0.14   0.047   0.093   0.054
Isoleucine     I      108   160   47       0.043  0.034   0.013   0.056
Leucine        L      121   130   59       0.061  0.025   0.036   0.07
Lysine         K      114   74    101      0.055  0.115   0.072   0.095
Methionine     M      145   105   60       0.068  0.082   0.014   0.055
Phenylalanine  F      113   138   60       0.059  0.041   0.065   0.065
Proline        P      57    55    152      0.102  0.301   0.034   0.068
Serine         S      77    75    143      0.12   0.139   0.125   0.106
Threonine      T      83    119   96       0.086  0.108   0.065   0.079
Tryptophan     W      108   137   96       0.077  0.013   0.064   0.167
Tyrosine       Y      69    147   114      0.082  0.065   0.114   0.125
Valine         V      106   170   50       0.062  0.048   0.028   0.053
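A simplified use of the P(a) and P(b) columns above. The full Chou-Fasman method finds helix and strand nucleation sites and then extends them; this sketch only averages the two propensities over a sliding window and labels each residue H, E or coil ("-"). The window size of 6 and the cut-offs of 103 and 105 follow commonly quoted versions of the rules and should be treated as assumptions; `predict_ss` is a hypothetical name:

```python
# P(a) and P(b) columns of the Chou-Fasman table (values x100).
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def predict_ss(sequence: str, window: int = 6) -> str:
    """Label each residue H, E or '-' from window-averaged propensities."""
    half = window // 2
    states = []
    for i in range(len(sequence)):
        # Window roughly centred on residue i, truncated at the chain ends.
        segment = sequence[max(0, i - half): i + half]
        p_alpha = sum(P_ALPHA[aa] for aa in segment) / len(segment)
        p_beta = sum(P_BETA[aa] for aa in segment) / len(segment)
        if p_alpha > 103 and p_alpha > p_beta:
            states.append("H")
        elif p_beta > 105 and p_beta > p_alpha:
            states.append("E")
        else:
            states.append("-")
    return "".join(states)


print(predict_ss("EEEEAAAA"), predict_ss("VVVVIIII"), predict_ss("GPGPNG"))
```

This per-residue averaging is much cruder than the published procedure and is meant only to show how the propensity table is used.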
HOMOLOGY MODELLING
Using database search algorithms, find the sequence with known structure that best matches the query sequence.
Assign the structure of the core regions obtained from the structure database to the query sequence.
Find the structure of the intervening loops using loop closure algorithms.
Homology Modeling: How it works
o Find template
o Align target sequence with template
o Generate model:
  - add loops
  - add sidechains
o Refine model
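The "find template" and "align" steps can be sketched with a global (Needleman-Wunsch) alignment and percent identity. Practical pipelines search databases with tools such as BLAST and score with substitution matrices such as BLOSUM62; this sketch instead uses simple match/mismatch/gap scores, and the scores, template sequences and function names are all hypothetical:

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical aligned positions as a percentage of alignment length (one convention)."""
    identical = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return 100.0 * identical / len(top)


def best_template(target: str, templates: dict[str, str]) -> tuple[str, float]:
    """Align the target to every template and return the most identical one."""
    identities = {name: percent_identity(*global_align(target, seq))
                  for name, seq in templates.items()}
    name = max(identities, key=identities.get)
    return name, identities[name]


print(*global_align("AGVGTVPMTAYG", "AGVGTPMTAYG"), sep="\n")
print(best_template("AGVGTVPMTAYG", {"template_1": "AGVGTVPMSAYG",
                                     "template_2": "PPPPPPPPPPPP"}))
```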
Prediction of Protein Structures
Examples – a few good examples
[Figure: four pairs of actual vs. predicted structures]
Prediction of Protein Structures
Not so good example
1esr
How can we predict protein structures?
Are we lucky?
• yes – homology
• a bit – fold recognition
• no – ab initio
G-protein coupled receptors (GPCRs)
Vital seven-transmembrane helix bundles with versatile functions.
Play a key role in cellular signaling and in the regulation of basic physiological processes; more than 50% of prescription drugs interact with them.
They are therefore excellent potential therapeutic targets for drug design and a focus of current pharmaceutical research.
GPCR Functional Classification Problem
Although thousands of GPCR sequences are known, a crystal structure has been solved for only one GPCR at medium resolution to date.
For many of them, the activating ligand is unknown.
Functional classification methods for the automated characterization of such GPCRs are therefore imperative.
GPCRs are not suitable for homology modelling, but hybrid methods may work (A. Rayan, J. Mol. Modelling, 2010, pp. 183-191).
Schematic overview of the MHC-I antigen processing and presentation pathway
Pathway and MHC Molecule
Cytotoxic T-cells recognize antigen peptides (8-10 residues) bound to an MHC class I molecule on the cell surface.
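Because the bound peptides are 8 to 10 residues long, a simple epitope scan starts by listing every peptide of those lengths in the antigen; a binding predictor (not shown, and not described on the slide) would then score each one against a given MHC class I molecule. A minimal sketch using a short example sequence; `candidate_epitopes` is a hypothetical name:

```python
def candidate_epitopes(protein: str, lengths: range = range(8, 11)) -> list[tuple[int, str]]:
    """Every contiguous peptide of the given lengths, with its 0-based start position."""
    return [(start, protein[start:start + k])
            for k in lengths
            for start in range(len(protein) - k + 1)]


peptides = candidate_epitopes("IVACIVSTEYDVMKAAR")
print(len(peptides), peptides[:2])
```

A protein of length n yields (n - 7) + (n - 8) + (n - 9) candidates, so even a modest antigen produces hundreds of peptides to score.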
MHC-I bound epitope is scanned by T-cell receptor