Intro to Bioinformatics Computational Approaches to Receptor Structure Prediction Uğur Sezerman...

Intro to Bioinformatics

Computational Approaches to Receptor Structure Prediction

Uğur Sezerman

Biological Sciences and Bioengineering Program

Sabancı University, Istanbul


Determining Protein Structure

There are O(100,000) distinct proteins in the human proteome.

3D structures have been determined for over 60,000 proteins, from all organisms
• Includes duplicates with different ligands bound, etc.

Coordinates are determined by X-ray crystallography or NMR


X-Ray Crystallography

~0.5mm

• The crystal is a mosaic of millions of copies of the protein.

• As much as 70% is solvent (water)!

• May take months (and a “green” thumb) to grow.


X-Ray diffraction

Image is averaged over:
• Space (many copies)
• Time (of the diffraction experiment)


Electron Density Maps

Resolution is dependent on the quality/regularity of the crystal

R-factor is a measure of “leftover” electron density

Solvent fitting

Refinement
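As a worked form of the R-factor statement above, the conventional crystallographic definition compares the observed structure-factor amplitudes with those calculated from the model. The symbols F_obs and F_calc and the reflection index hkl are not named on the slide and are introduced here for illustration only.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

A lower R means the refined model accounts for more of the measured diffraction, that is, less electron density is left unexplained.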


The Protein Data Bank

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228

http://www.rcsb.org/pdb/
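To make the ATOM records above concrete, here is a minimal Python sketch that reads three of the C-alpha lines and measures the distance between consecutive C-alpha atoms. The names `Atom`, `parse_atom_record` and `distance` are illustrative, not part of any PDB library. The sketch assumes whitespace-separated fields, which works for these lines; real PDB files are fixed-column, so a robust parser should slice by column instead.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str        # atom name, e.g. "CA" for the alpha carbon
    res_name: str    # three-letter residue name
    chain: str
    res_seq: int     # residue number
    x: float         # coordinates in Angstroms
    y: float
    z: float
    occupancy: float
    b_factor: float  # temperature factor


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM line, assuming whitespace-separated fields (a simplification)."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


# Three C-alpha records copied from the listing above.
RECORDS = """\
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219
ATOM 11 CA VAL E 3 27.594 42.879 110.975 1.00 16.02 3APR 223
"""

atoms = [parse_atom_record(line) for line in RECORDS.splitlines()]
ca_ala_gly = distance(atoms[0], atoms[1])
print(f"CA(Ala1) to CA(Gly2): {ca_ala_gly:.2f} Angstroms")
```

The Ala1 to Gly2 distance comes out close to 3.8 Å, the usual spacing between neighbouring C-alpha atoms, which is a quick sanity check that the fields were read in the right order.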


A Peek at Protein Function

Serine proteases – cleave other proteins

• Catalytic Triad: ASP, HIS, SER


Cleaving the peptide bond


Three Serine Proteases

Chymotrypsin – Cleaves the peptide bond on the carboxyl side of aromatic (ring) residues: Trp, Phe, Tyr; and large hydrophobic residues: Met.

Trypsin – Cleaves after Lys (K) or Arg (R)
• Positive charge

Elastase – Cleaves after small residues: Gly, Ala, Ser, Cys
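The three specificities above can be sketched as a small lookup in Python. This is an illustration of the rules exactly as stated on the slide; the names `CLEAVES_AFTER`, `cleavage_sites` and `digest` are hypothetical, and real enzymes have exceptions that this sketch ignores. The test peptide is the example sequence from the threading slide, without its trailing ellipsis.

```python
# Residues after which each protease cuts, taken from the slide text.
CLEAVES_AFTER = {
    "chymotrypsin": set("WFYM"),  # Trp, Phe, Tyr, Met
    "trypsin": set("KR"),         # Lys, Arg
    "elastase": set("GASC"),      # Gly, Ala, Ser, Cys
}


def cleavage_sites(seq: str, protease: str) -> list[int]:
    """Return 1-based positions i such that the bond after residue i is cut.

    The last residue is skipped because there is no bond after it.
    """
    residues = CLEAVES_AFTER[protease]
    return [i + 1 for i, r in enumerate(seq[:-1]) if r in residues]


def digest(seq: str, protease: str) -> list[str]:
    """Split the sequence at every predicted cleavage site."""
    fragments, start = [], 0
    for site in cleavage_sites(seq, protease):
        fragments.append(seq[start:site])
        start = site
    fragments.append(seq[start:])
    return fragments


peptide = "IVACIVSTEYDVMKAAR"
for enzyme in CLEAVES_AFTER:
    print(enzyme, digest(peptide, enzyme))
```

For this peptide, trypsin cuts once (after the Lys) and chymotrypsin cuts twice (after Tyr and Met), which shows how the binding pocket determines the fragment pattern.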


Specificity Binding Pocket


Protein Folding – Biological perspective

“Central dogma”: Sequence specifies structure

Denature – to “unfold” a protein back to random coil configuration
• β-mercaptoethanol – breaks disulfide bonds
• Urea or guanidine hydrochloride – denaturant
• Also heat or pH

Anfinsen’s experiments
• Denatured ribonuclease
• Spontaneously regained enzymatic activity
• Evidence that it re-folded to native conformation


PROTEIN FOLDING PROBLEM

STARTING FROM AMINO ACID SEQUENCE, FINDING THE STRUCTURE OF PROTEINS IS CALLED THE PROTEIN FOLDING PROBLEM


The Protein Folding Problem

Central question of molecular biology:

“Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”

Input: AAVIKYGCAL…

Output: φ1ψ1, φ2ψ2… = backbone conformation (no side chains yet)


Folding intermediates

Levinthal’s paradox – Consider a 100 residue protein. If each residue can take only 3x3=9 positions, there are 9^100 possible conformations.

Folding must proceed by progressive stabilization of intermediates
• Molten globules – most secondary structure formed, but much less compact than “native” conformation.
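The arithmetic behind Levinthal's paradox can be checked in a few lines of Python. The sampling rate of 10^13 conformations per second is an assumption chosen for illustration; it is not a figure from the slide.

```python
# Levinthal-style estimate: 9 states per residue, 100 residues.
states_per_residue = 3 * 3
n_residues = 100
conformations = states_per_residue ** n_residues  # exact integer, 9^100

# ASSUMPTION: the chain samples 1e13 conformations per second.
sampling_rate_per_s = 1e13
seconds = conformations / sampling_rate_per_s
years = seconds / (3600 * 24 * 365)

print(f"conformations: about {conformations:.2e}")
print(f"time for exhaustive sampling: about {years:.1e} years")
```

9^100 is roughly 2.7 × 10^95, so even at this generous rate a random search would take vastly longer than the age of the universe. That gap is why folding has to proceed through stabilized intermediates rather than by trying every conformation.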


Protein Packing

• occurs in the cytosol (~60% bulk water, ~40% water of hydration)

• involves interaction between secondary structure elements and solvent

• may be promoted by chaperones, membrane proteins

• tumbles into molten globule states

• overall entropy loss is small enough so enthalpy determines sign of E, which decreases (loss in entropy from packing counteracted by gain from desolvation and reorganization of water, i.e. hydrophobic effect)

• yields tertiary structure


Folding help

Proteins are, in fact, only marginally stable
• Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form

Many proteins help in folding
• Protein disulfide isomerase – catalyzes shuffling of disulfide bonds
• Chaperones – break up aggregates and (in theory) unfold misfolded proteins
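To put the 5 to 10 kcal/mole stability figure in perspective, the sketch below converts a folding free energy into the equilibrium fraction of unfolded molecules. It assumes a simple two-state model (folded versus unfolded) at 298 K; the function name `fraction_unfolded` is illustrative.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.0    # ASSUMPTION: room temperature


def fraction_unfolded(stability_kcal_per_mol: float) -> float:
    """Two-state estimate: K = [folded]/[unfolded] = exp(dG / RT).

    stability_kcal_per_mol is how much more stable the native state is.
    """
    k_fold = math.exp(stability_kcal_per_mol / (R_KCAL * T_KELVIN))
    return 1.0 / (1.0 + k_fold)


for dg in (5.0, 10.0):
    print(f"{dg:4.1f} kcal/mol -> fraction unfolded {fraction_unfolded(dg):.1e}")
```

With a 5 kcal/mole margin a few molecules in ten thousand are unfolded at any moment. The margin is only about as large as a handful of hydrogen bonds, which is why a single mutation or a modest change in temperature or pH can destabilize a protein.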


Forces driving protein folding

It is believed that hydrophobic collapse is a key driving force for protein folding
• Hydrophobic core
• Polar surface interacting with solvent

Minimum volume (no cavities)
Disulfide bond formation stabilizes
Hydrogen bonds
Polar and electrostatic interactions


Secondary Structure

non-linear
3 dimensional
localized to regions of an amino acid chain
formed and stabilized by hydrogen bonding, electrostatic and van der Waals interactions


Common motifs


The Hydrophobic Core

Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.

The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin

The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently

Sickle cell anemia was the first identified molecular disease


Sickle Cell Anemia

Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.


Computational Approaches

Ab initio methods
Threading
Comparative Modelling
Fragment Assembly


Why is ab-initio prediction hard?

Ab-initio protein structure prediction as an optimization problem

1. Define a function that maps protein structures to some quality measure.

2. Solve the computational problem of finding an optimal structure.

[Figure: energy plotted against conformation]


A dream function

Has a clear minimum in the native structure.
Has a clear path towards the minimum.
Global optimization algorithm should find the native structure.

Chen Keasar, BGU



An approximate function

Easier to design and compute.
Native structure not always the global minimum.
Global optimization methods do not converge.
Many alternative models (decoys) should be generated.
No clear way of choosing among them.

[Figure: decoy set]

Chen Keasar, BGU


Fold Optimization

Simple lattice models (HP-models)
• Two types of residues: hydrophobic and polar
• 2-D or 3-D lattice
• The only force is hydrophobic collapse
• Score = number of HH contacts


Scoring Lattice Models

H/P model scoring: count noncovalent hydrophobic interactions.

Sometimes:
• Penalize for buried polar or surface hydrophobic residues
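The basic H/P score can be sketched in Python for a 2-D square lattice: count pairs of H residues that sit on neighbouring lattice sites but are not neighbours along the chain. The function name `hh_contacts` and the coordinate representation are illustrative choices, and the optional penalty terms mentioned above are left out.

```python
def hh_contacts(sequence: str, coords: list[tuple[int, int]]) -> int:
    """Count noncovalent H-H contacts for a chain placed on a 2-D square lattice.

    sequence: string of 'H' (hydrophobic) and 'P' (polar), one letter per residue.
    coords:   lattice position of each residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    contacts = 0
    for i in range(len(sequence)):
        # j starts at i + 2 so that covalently bonded neighbours are not counted.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    contacts += 1
    return contacts


# A four-residue chain folded into a square brings its two ends together.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hh_contacts("HPPH", square))  # one H-H contact
print(hh_contacts("HPPH", line))    # no contacts
```

A higher count corresponds to a more favourable (more compact, more buried) conformation in this model.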


What can we do with lattice models?

For smaller polypeptides, exhaustive search can be used
• Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process

For larger chains, other optimization and search methods must be used
• Greedy, branch and bound
• Evolutionary computing, simulated annealing
• Graph theoretical methods


Learning from Lattice Models

The “hydrophobic zipper” effect:

Ken Dill ~ 1997


Threading: Fold recognition

Given:
• Sequence: IVACIVSTEYDVMKAAR…
• A database of molecular coordinates

Map the sequence onto each fold

Evaluate
• Objective 1: improve scoring function
• Objective 2: folding
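The "map the sequence onto each fold, then evaluate" loop can be sketched in Python with a deliberately crude scoring function. Everything specific here is an assumption made for illustration: the two-entry `FOLD_LIBRARY`, the buried/exposed encoding of each fold, and the `HYDROPHOBIC` residue set. Real threading programs use gapped alignments and statistical pair potentials derived from known structures.

```python
# ASSUMPTION: a simple hydrophobic residue set, chosen for this sketch.
HYDROPHOBIC = set("AVILMFWC")

# HYPOTHETICAL fold library. Each fold is reduced to a string of environments:
# 'b' = buried position, 'e' = solvent-exposed position.
FOLD_LIBRARY = {
    "fold_A": "bbbebbeeeeebbebbe",
    "fold_B": "ebebebebebebebebe",
}


def thread_score(sequence: str, environments: str) -> int:
    """+1 when a hydrophobic residue is buried or a polar one exposed, else -1."""
    score = 0
    for residue, env in zip(sequence, environments):
        buried = env == "b"
        if buried == (residue in HYDROPHOBIC):
            score += 1
        else:
            score -= 1
    return score


def recognize_fold(sequence: str, library: dict[str, str]):
    """Score the sequence on every fold and return the best name plus all scores."""
    scores = {name: thread_score(sequence, env) for name, env in library.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = recognize_fold("IVACIVSTEYDVMKAAR", FOLD_LIBRARY)
print(best, scores)
```

Improving the scoring function (Objective 1 above) corresponds to replacing `thread_score` with something more discriminating, while the surrounding loop stays the same.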


Protein Fold Families

CATH website

www.cathdb.info


Secondary Structure Prediction

AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…


Secondary Structure Prediction

Easier than folding
• Current algorithms can predict secondary structure with 70-80% accuracy

Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
• Based on frequencies of occurrence of residues in helices and sheets

PHD – Neural network based
• Uses a multiple sequence alignment
• Rost & Sander, Proteins, 1994, 19, 55-72


Chou-Fasman Parameters

Name           Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142   83    66       0.06   0.076   0.035   0.058
Arginine       R      98    93    95       0.07   0.106   0.099   0.085
Aspartic Acid  D      101   54    146      0.147  0.11    0.179   0.081
Asparagine     N      67    89    156      0.161  0.083   0.191   0.091
Cysteine       C      70    119   119      0.149  0.05    0.117   0.128
Glutamic Acid  E      151   37    74       0.056  0.06    0.077   0.064
Glutamine      Q      111   110   98       0.074  0.098   0.037   0.098
Glycine        G      57    75    156      0.102  0.085   0.19    0.152
Histidine      H      100   87    95       0.14   0.047   0.093   0.054
Isoleucine     I      108   160   47       0.043  0.034   0.013   0.056
Leucine        L      121   130   59       0.061  0.025   0.036   0.07
Lysine         K      114   74    101      0.055  0.115   0.072   0.095
Methionine     M      145   105   60       0.068  0.082   0.014   0.055
Phenylalanine  F      113   138   60       0.059  0.041   0.065   0.065
Proline        P      57    55    152      0.102  0.301   0.034   0.068
Serine         S      77    75    143      0.12   0.139   0.125   0.106
Threonine      T      83    119   96       0.086  0.108   0.065   0.079
Tryptophan     W      108   137   96       0.077  0.013   0.064   0.167
Tyrosine       Y      69    147   114      0.082  0.065   0.114   0.125
Valine         V      106   170   50       0.062  0.048   0.028   0.053
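The P(a) and P(b) columns above can be used in a simplified propensity-based predictor. The sketch below averages each propensity over a sliding window and labels a residue H (helix) or E (strand) when the winning average exceeds 100. This is a simplification written for illustration, not the full Chou-Fasman procedure, which also has nucleation, extension and turn rules. The window size of 5 and the function name `predict_simple` are assumptions.

```python
# P(a) and P(b) copied from the parameter table (values are propensity x 100).
P_ALPHA = {"A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
           "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
           "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106}
P_BETA = {"A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
          "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
          "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170}


def predict_simple(seq: str, window: int = 5) -> str:
    """Label each residue H, E or '-' from window-averaged propensities."""
    half = window // 2
    labels = []
    for i in range(len(seq)):
        # The window is truncated at the chain ends.
        segment = seq[max(0, i - half): i + half + 1]
        avg_a = sum(P_ALPHA[r] for r in segment) / len(segment)
        avg_b = sum(P_BETA[r] for r in segment) / len(segment)
        if avg_a > 100 and avg_a >= avg_b:
            labels.append("H")
        elif avg_b > 100 and avg_b > avg_a:
            labels.append("E")
        else:
            labels.append("-")
    return "".join(labels)


query = "AGVGTVPMTAYGNDIQYYGQVT"
print(query)
print(predict_simple(query))
```

Strong helix formers such as Ala, Glu, Met and Leu push a window toward H, while branched and aromatic residues such as Val, Ile and Tyr push it toward E, and Gly or Pro tend to break both.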


HOMOLOGY MODELLING

Using database search algorithms, find the sequence with known structure that best matches the query sequence

Assign the structure of the core regions obtained from the structure database to the query sequence

Find the structure of the intervening loops using loop closure algorithms


Homology Modeling: How it works

o Find template

o Align target sequence with template

o Generate model:
   - add loops
   - add sidechains

o Refine model
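The first two steps (find a template, align the target to it) can be sketched with a plain Needleman-Wunsch global alignment and a percent-identity ranking. The scoring values (match +1, mismatch -1, gap -2), the template names and the template sequences are assumptions for illustration. Practical pipelines use substitution matrices, affine gap penalties and database search tools, and the later steps (loops, sidechains, refinement) are not covered here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x, y):
    """Identical aligned residues as a percentage of alignment length."""
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


# HYPOTHETICAL template library: name -> sequence of a known structure.
TEMPLATES = {
    "tmpl_1": "AVGIVPMAYGQDIQYGQVT",
    "tmpl_2": "MKLLNRSWTPEHC",
}

target = "AGVGTVPMTAYGNDIQYYGQVT"
ranked = {name: percent_identity(*needleman_wunsch(target, seq))
          for name, seq in TEMPLATES.items()}
best_template = max(ranked, key=ranked.get)
print(best_template, ranked)
```

Gap columns in the resulting alignment mark where the target has no counterpart in the template; those stretches are the loops that have to be built separately in the model-generation step.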


Prediction of Protein Structures

Examples – a few good examples

[Figure: four pairs of actual and predicted structures]


Prediction of Protein Structures

Not so good example


1esr




How can we predict protein structures?

Are we lucky?
• yes: homology
• a bit: fold recognition
• no: ab initio



G-protein coupled receptors (GPCRs)

Vital protein bundles with versatile functions.

Play a key role in cellular signaling and in the regulation of basic physiological processes; more than 50% of prescription drugs act by interacting with them.

Therefore an excellent potential therapeutic target for drug design and the focus of current pharmaceutical research.


GPCR Functional Classification Problem

Although thousands of GPCR sequences are known, the crystal structure has been solved for only one GPCR sequence, at medium resolution, to date.

For many of them, the activating ligand is unknown.

Functional classification methods for automated characterization of such GPCRs are imperative.

Not suitable for homology modelling, but hybrid methods may work. A. Rayan, J. Mol. Modelling (2010), pp. 183-191


Schematic overview of the MHC-I antigen processing and presentation pathway


Pathway and MHC Molecule

Cytotoxic T-cells recognize antigen peptides (8-10 residues) bound to an MHC class I molecule on the cell surface.


MHC-I bound epitope is scanned by T-cell receptor
