Aka, The Inverse Folding Problem
Topic 18
Chapter 39, Gu and Bourne, “Structural Bioinformatics”
Protein Design is an Inverse Problem of Structure Prediction
MDVGQAVIFLGPPGAGKGTQASRLAQELGFKKLSTGDILRDHVARGTPLGERVRPIMERGDLVPDDLILELIREELAERVIFDGFPRTLAQAEALDRLLSETGTRLLGVVLVEVPEEELVRRIL…
Biology
Adapted from Amy Keating’s slides at MIT.
Different Types of Protein Design
Protein design
Grand challenge: De novo design
Design of new proteins
-- novel protein folds
-- binding interfaces
-- enzymatic activities
-- etc.
Immediate practical applications
Redesign of existing proteins
-- increased thermostability
-- altered binding specificity
-- improved binding affinity
-- enhanced enzymatic activity
-- altered substrate specificity
Current Opinion in Biotechnology 2007, 18:1-7.
Protein Design Problems
Annu. Rev. Biochem. 2008. 77:363-382.
Goal: design a protein that adopts a given structure
Open problems with assessment:
-- What resolution is required? (fold, sidechain, loop, etc.?)
-- Stability of the designed protein
-- Structural uniqueness
-- Must solve the structure to know how you did!
There are typically many sequences that adopt the fold, so you must try to find the one that is most stable.
That is, minimize the quantity:
ΔG_fold = G_folded - G_unfolded
Search through many possible sequences, and then pick the one with the best (lowest) ΔG_fold.
Design target Designed protein
The big challenges
Search
The search space is astronomical: 20^n sequences for a protein of n residues.
Except in rare subspace search problems, this is computationally intractable.
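To get a feel for the size of the sequence space, here is a minimal Python sketch. The function name `sequence_space_size` is made up for this illustration; the only assumption is an alphabet of 20 amino acids at every position.

```python
import math

def sequence_space_size(n_residues: int, alphabet: int = 20) -> int:
    """Number of distinct sequences of length n_residues (20^n by default)."""
    return alphabet ** n_residues

# Print the order of magnitude for a few chain lengths.
for n in (10, 50, 100):
    size = sequence_space_size(n)
    print(f"n = {n:3d}: 20^n is about 10^{math.log10(size):.0f}")
```

Even a 100-residue protein has on the order of 10^130 possible sequences, so enumerating them is out of the question.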
It is practically impossible to compute ΔG_fold because…
-- What is the structure of the folded state? (sidechain and loop positions)
-- How do we model the unfolded state?
-- Entropy?!
Instead, we focus on the energy of the folded protein, meaning native structure interactions. That is, replace ΔG_fold with ΔE_fold, computed using molecular mechanics (MM) force fields.
Energy
Sidechain packing
Design target Designed protein
As we did with structure prediction in homology modeling, we will typically use a rotamer library-based approach.
Search algorithms for large spaces
Exhaustive search – too slow!
Stochastic methods
-- Monte Carlo
-- Genetic algorithms
Pruning algorithms (which are deterministic)
-- Branch and Bound
-- Dead End Elimination
For all-atom protein design, some amount of stochasticity is generally required. Purely deterministic approaches rarely succeed in designing complete proteins.
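As an illustration of the stochastic option, here is a minimal Metropolis Monte Carlo sketch over rotamer choices. It is a toy under stated assumptions, not the procedure of any particular design program: the energy tables `self_e` and `pair_e`, their values, and the function names are all invented for this example.

```python
import math
import random

# Hypothetical toy tables: 3 positions, 2 rotamers each.
# self_e[i][r] is the energy of rotamer r at position i against the backbone.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
# pair_e[(i, j)][r][s] is the interaction of rotamer r at i with rotamer s at j (i < j).
pair_e = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.0]],
    (0, 2): [[0.3, 0.0], [0.0, 0.0]],
    (1, 2): [[-0.5, 0.0], [0.0, 1.0]],
}

def total_energy(conf, self_e, pair_e):
    """Sum of self energies plus all pairwise energies for one rotamer assignment."""
    n = len(conf)
    e = sum(self_e[i][conf[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][conf[i]][conf[j]]
    return e

def monte_carlo(self_e, pair_e, steps=5000, kT=1.0, seed=0):
    """Metropolis search: change one rotamer at a time, track the best state seen."""
    rng = random.Random(seed)
    n = len(self_e)
    conf = [rng.randrange(len(self_e[i])) for i in range(n)]
    e = total_energy(conf, self_e, pair_e)
    best, best_e = list(conf), e
    for _ in range(steps):
        i = rng.randrange(n)                      # pick a position at random
        old = conf[i]
        conf[i] = rng.randrange(len(self_e[i]))   # propose a new rotamer there
        new_e = total_energy(conf, self_e, pair_e)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / kT):
            e = new_e
            if e < best_e:
                best, best_e = list(conf), e
        else:
            conf[i] = old                         # reject: restore previous rotamer
    return best, best_e

best, best_e = monte_carlo(self_e, pair_e)
print(best, best_e)
```

Unlike the pruning algorithms, this search offers no guarantee of finding the global minimum; it only reports the lowest-energy assignment it happened to visit.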
Dead End Elimination
Eliminate, one at a time, rotamer choices that cannot under any circumstance be part of the minimum energy solution.
From Wikipedia: DEE is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends",
i.e., "bad" combinations of variables that cannot possibly yield the global minimum and to refrain from searching such combinations further. Hence,
dead-end elimination is a mirror image of dynamic programming techniques in which "good" combinations are identified and explored further.
Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins.
Dead End Elimination
There is a global minimum energy conformation (GMEC) in which each residue has a unique rotamer; that is, the GMEC is the set of rotamers that has the lowest energy.
Energy is defined as pairwise decomposable, meaning the total energy is broken down into pairwise interactions + the energy of the rotamer interacting
with the backbone.
Self energy Pairwise energy
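Written out, a pairwise-decomposable energy has the form below. This is a sketch in assumed notation: i_r denotes rotamer r at position i, E(i_r) is the self energy (rotamer with backbone), and E(i_r, j_u) is the pairwise energy between the rotamers chosen at positions i and j.

```latex
E_{\text{total}}
  = \underbrace{\sum_{i} E(i_r)}_{\text{self energy}}
  + \underbrace{\sum_{i} \sum_{j > i} E(i_r, j_u)}_{\text{pairwise energy}}
```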
In this example, assuming a fixed background (black), the rotamer that has the lower energy is chosen.
Dead End Elimination
However, the other rotamers are not fixed. (Nor is it realistic to assume that the backbone is, but we’ll brush that issue aside for now.)
If the blue rotamer is always lower in energy than the red, for example, then we can eliminate the red rotamer from all future configurations.
Iterate till completion.
Put otherwise, if the “worst case scenario” for blue is better than the “best case scenario” for red, then you always choose blue.
Dead End Elimination in Words
Dead-end elimination algorithms provide a deterministic approach to finding the global minimum energy conformation (GMEC) of a set of amino
acid side chains anchored to specified backbone coordinates. All of the rotamers at a particular residue position are essentially in competition for
inclusion in the GMEC. The idea underlying DEE algorithms is that, by comparing the energy contributions of different candidate rotamers at a given
position, it is possible to identify certain rotamers which cannot exist in the GMEC. These dead-ending rotamers can be eliminated from future
consideration, thus decreasing the combinatorial size of the problem.
To follow this approach, the potential function used to evaluate the conformational energy must be expressed solely in terms of pairwise interactions.
The relative merits of candidate rotamers at a given position can then be ascertained without having to evaluate the total energy of all conformations
using each of the candidates. Instead, only the portion of the total energy that arises from pairwise interactions with the position in question need be
considered. By comparing the relative size of the pairwise energy contributions using each of the candidate rotamers at this position, it is possible to
identify incompatibility with the GMEC without knowledge of the actual minimum energy. The combinatorial cost of this procedure is far less
than the cost of complete enumeration of the energy of each conformation.
Pierce et al., 2000, J Comp Chem, 999-1009.
Dead End Elimination
This condition implies that i_r can be eliminated if the net energy contribution resulting from its best-case pairwise interactions with rotamers at all other positions (spanned by j_u) is still worse than that produced by the worst-case pairwise interactions of some other candidate rotamer, i_t, at the same position.
Pierce et al., 2000, J Comp Chem, 999-1009.
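In symbols, the original DEE condition described above is usually written as follows. The notation is assumed here: E(i_r) is the self energy of rotamer r at position i, and E(i_r, j_u) is its pairwise energy with rotamer u at position j.

```latex
E(i_r) + \sum_{j \neq i} \min_{u} E(i_r, j_u)
  \;>\;
E(i_t) + \sum_{j \neq i} \max_{u} E(i_t, j_u)
```

The left side is the best case for i_r and the right side is the worst case for i_t; if the inequality holds, i_r cannot be part of the GMEC.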
Different dead-end elimination criteria for sample energy profiles. The abscissa represents all possible conformations of the protein and the ordinate describes the net energy contribution produced by interactions with specific rotamers at position i. (a) Original DEE: i_r is eliminated by i_t1, but not by i_t2.
Dead End Elimination
Given two rotamers (red, blue) at residue position i, identify and eliminate rotamers that cannot be part of the best solution. Here, the red rotamer is eliminated by the blue.
Note: Cannot afford to calculate energies for all of these configurations!
Most favorable interaction of red with conformational background
Most unfavorable interaction of blue with conformational background
DEE algorithm applied to protein design
Dead End Elimination: Goldstein criterion
Energy is composed of two parts: interaction with the template and pairwise interactions between residues.
What is the least energy it would cost to replace i_t with i_r? We use the simple Goldstein criterion, which eliminates rotamer r at position i if, when compared to some other rotamer t at the same position, the following inequality is satisfied:
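In its standard form, the Goldstein criterion reads as below. The notation is an assumption carried over from the pairwise energy definition: E(i_r) is the template (self) energy of rotamer r at position i, and E(i_r, j_u) is its pairwise energy with rotamer u at position j.

```latex
\Delta E
  = E(i_r) - E(i_t)
  + \sum_{j \neq i} \min_{u} \bigl[ E(i_r, j_u) - E(i_t, j_u) \bigr]
  \;>\; 0
```

Because the minimum is taken over the difference for the same background rotamer j_u, this test is less demanding than comparing the best case of i_r against the worst case of i_t, so it eliminates more rotamers.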
If ΔE > 0, then eliminate i_r.
Apply iteratively to all rotamer pairs.
The energy profile changes as rotamers are eliminated, leading to
elimination of further rotamers.
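The iterative procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in any published design study: the tables `self_e` and `pair_e`, their values, and the function names are invented for the example.

```python
# Hypothetical toy tables: 3 positions, 2 rotamers each.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
pair_e = {
    (0, 1): [[0.0, 2.0], [-1.0, 0.0]],
    (0, 2): [[0.3, 0.0], [0.0, 0.0]],
    (1, 2): [[-0.5, 0.0], [0.0, 1.0]],
}

def pair(pair_e, i, r, j, u):
    """Pairwise energy of rotamer r at position i with rotamer u at position j."""
    return pair_e[(i, j)][r][u] if i < j else pair_e[(j, i)][u][r]

def total_energy(conf, self_e, pair_e):
    """Pairwise-decomposable energy of a full rotamer assignment."""
    n = len(conf)
    e = sum(self_e[i][conf[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair(pair_e, i, conf[i], j, conf[j])
    return e

def goldstein_dee(self_e, pair_e):
    """Prune rotamers with the Goldstein criterion until nothing more can be removed.

    Returns, for each position, the list of surviving rotamer indices.
    """
    n = len(self_e)
    alive = [list(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:                       # the energy profile changes after each removal
        changed = False
        for i in range(n):
            for r in list(alive[i]):     # iterate over a copy; alive[i] may shrink
                for t in alive[i]:
                    if t == r:
                        continue
                    delta = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            delta += min(
                                pair(pair_e, i, r, j, u) - pair(pair_e, i, t, j, u)
                                for u in alive[j]
                            )
                    if delta > 0:        # r is always worse than t: dead end
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

print(goldstein_dee(self_e, pair_e))
```

On this toy table, the first pass removes one rotamer at positions 1 and 2 but none at position 0; only in the second pass, with the reduced background, does a rotamer at position 0 become eliminable. In realistic problems DEE usually leaves several rotamers per position, and the remaining combinations must still be searched.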
Coiled-coil design (Mayo et al.)
Biosensor design (Hellinga et al.)
The Hellinga lab has designed many different receptors based on the bPBP fold.
Protein-protein interface design (Love, Mayo, et al.)
Rosetta Design
Initial sequence selection
(primarily 12-6, HB, and Born terms)
Monte Carlo minimization
(both at rotamer and backbone levels)
Sequence optimization
Sketch input structure (the fold) Final structure
Note: this step is analogous to structure prediction!
Repeat till convergence
Top7 (Baker, Kuhlman, et al.)
Conformational switch (Kuhlman, et al.)
[Figure: folded-to-unfolded transition as zinc is titrated in; states labeled "folded" and "unfolded"]
The ideal: Designed sequences that meet both criteria
state of the art
The Holy Grail
TS: transition state
Design model: purple
X-ray crystal structure: green
The dirty little secret of protein design…
For every high impact success in the protein design literature, there are dozens (perhaps hundreds) of spectacular failures that go unreported.
Paraphrased from S. Mayo (Protein Society Meeting, 2006).
Scientific misconduct?
Design of a novel triosephosphate isomerase
DEE repacking around catalytic site
Lineweaver-Burk plots
As do I!