Post on 17-Jan-2016
11/14/02 Systematic Conformational Search with Constraint Satisfaction
Systematic Conformational Search with Constraint Satisfaction
Lisa Tucker Kellogg
Ph.D Thesis
Massachusetts Institute of Technology
June 2002
Protein Conformation
• Proteins can have more than one shape
  – Three rotatable bonds per residue
• Conformations:
  – Set of possible 3-D arrangements of atoms
• Related to the protein folding problem:
  – Find the conformation with the lowest Gibbs free energy
• Finding conformations is a high scientific priority
NMR Constraints
• Two types:
  – Distances between particular pairs of atoms
  – Dihedral angles for rotatable bonds
• The ability to determine all conformations consistent with constraints aids in NMR error analysis and confidence measures
• NMR is an important tool for solving structures
• The algorithms can also be used with docking and homology modeling
This is Hard!
• Proteins can have 10^3 to 10^4 degrees of freedom
• The size of conformation space is exponential in the number of degrees of freedom
• Non-linear changes
• Searching the whole space may be as hard as folding
• GOAL: Provide methods to guarantee a minimum interval of coverage
Conformational Search Algorithms
• Usually consist of two parts:
  – Engine that generates conformations
  – Module to evaluate conformations
• Systematic (exhaustive)
  – Search using predefined intervals and order
  – Guaranteed coverage
  – Slow
• Stochastic
  – Simulated Annealing, Distance Geometry algorithms
  – Currently more popular
  – Creates good hypotheses quickly
• Some algorithms rank conformations
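The two-part structure (an engine that generates conformations plus a module that evaluates them) can be sketched for the systematic case as follows. This is only an illustration: the function names and the toy constraint are assumptions made for the example, not code from the thesis.

```python
import itertools

def grid_conformations(n_torsions, step_deg):
    """Engine: yield every combination of torsion angles on a regular grid."""
    angles = range(0, 360, step_deg)
    return itertools.product(angles, repeat=n_torsions)

def systematic_search(n_torsions, step_deg, satisfies):
    """Evaluate every grid conformation; keep those the evaluator accepts."""
    return [c for c in grid_conformations(n_torsions, step_deg) if satisfies(c)]

def toy_constraint(conf):
    """Hypothetical evaluator: the torsions must sum to a multiple of 360."""
    return sum(conf) % 360 == 0

# 3 torsions at 120-degree resolution: 27 grid points are visited in a fixed order.
hits = systematic_search(n_torsions=3, step_deg=120, satisfies=toy_constraint)
```

Every grid point is visited, which is what gives the guaranteed coverage and also why the approach is slow: the number of points grows as (360 / step) raised to the number of torsions.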
Cool Stuff in this Thesis
• Improvements to Systematic Conformational Search
  – Voxel Model
  – Divide & Conquer
  – OmniMerge
  – Propagation
  – A*
• Uses Systematic Conformational Search to solve a new structure
• Goals:
  – Enumerate all conformations that satisfy a set of constraints
  – Use a systematic method to guarantee an interval of coverage
  – Invest in up-front computations to save time later
Systematic Conformational Search
• Basic Algorithms
• Voxel Model
• Minimization
• Divide-and-Conquer
Gridsearch
TreeSearch
Comparing Gridsearch & TreeSearch
Search Completeness
• Want to rule out a higher-dimensional slab of space based on evaluating a lower-dimensional fragment
• However, conformations that correspond to regularly spaced grid points don’t capture everything
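The idea of ruling out a whole slab of space from a lower-dimensional fragment can be sketched as a depth-first tree search that stops extending a partial assignment as soon as it fails. The names `treesearch` and `partial_ok`, and the toy constraint, are assumptions for illustration only.

```python
def treesearch(n_torsions, angles, partial_ok):
    """Depth-first search over torsions, pruning any prefix that already fails."""
    solutions = []

    def extend(prefix):
        if not partial_ok(prefix):
            return  # every conformation that starts with this prefix is ruled out
        if len(prefix) == n_torsions:
            solutions.append(prefix)
            return
        for angle in angles:
            extend(prefix + (angle,))

    extend(())
    return solutions

def no_equal_neighbors(prefix):
    """Hypothetical constraint: adjacent torsions may not take the same value."""
    return all(a != b for a, b in zip(prefix, prefix[1:]))

found = treesearch(3, (60, 180, 300), no_equal_neighbors)
```

Because only grid points are tested, a prefix can be rejected even though a nearby off-grid value would have passed, which is the completeness problem this slide raises.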
Rotamers
• Certain angles of rotation cause steric clashes
  – This divides the set of torsional angles into ranges
• Rotamers refer to the likely neighborhoods of conformations for any molecular fragment
  – Patterns of low-energy conformations
  – Ranges of 60° ± ø, 180° ± ø, 300° ± ø
• Have the option of searching rotamers first, or of calibrating the resolution to match the regular intervals of rotamers
  – Treesearch uses [60°, 180°, 300°]
  – “Succeed first” approach
Voxels
• Unit of higher-dimensional volume
  – Like a pixel in 2-D
  – New idea
  – This thesis is the first application of the voxel model
• Evaluate voxels instead of points on a grid
• Ask whether there exists any conformation in the voxel that satisfies the constraints
• No general, perfectly accurate way to test whether a conformation exists in a volume
  – But we can do much better than just one point
• Try to center voxels on rotamers
Voxelized Treesearch
Minimization as Search
• Convert constraints to an objective function
• Use minimization of the constraint violation function as a heuristic for searching within a voxel for satisfying conformations
  – Always start in the same place
  – Search for points that minimize the objective function
• If sufficiently close to zero: SUCCESS
• If max iterations reached: FAILURE
• Choose local minimization for finding conformations in a voxel
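A minimal sketch of the success/failure rule above, assuming a voxel is a list of (low, high) torsion ranges and using a simple bounded pattern search in place of whatever local minimizer the thesis actually uses. All names and the toy violation function are hypothetical.

```python
def minimize_in_voxel(violation, voxel, tol=1e-6, max_iter=200):
    """Pattern search confined to the voxel; returns a satisfying point or None."""
    point = [(lo + hi) / 2.0 for lo, hi in voxel]    # always start in the same place
    step = max(hi - lo for lo, hi in voxel) / 4.0
    best = violation(point)
    for _ in range(max_iter):
        if best <= tol:
            return point                             # SUCCESS: violation is ~zero
        improved = False
        for i, (lo, hi) in enumerate(voxel):
            for delta in (step, -step):
                trial = list(point)
                trial[i] = min(hi, max(lo, point[i] + delta))  # stay inside the voxel
                value = violation(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0                              # refine the step and retry
    return point if best <= tol else None            # FAILURE: max iterations reached

def outside(x, lo, hi):
    """Squared distance of x from the allowed interval [lo, hi]."""
    return max(0.0, lo - x, x - hi) ** 2

def toy_violation(p):
    """Hypothetical constraints: torsion 0 in [70, 80], torsion 1 in [175, 185]."""
    return outside(p[0], 70, 80) + outside(p[1], 175, 185)

hit = minimize_in_voxel(toy_violation, [(0, 120), (120, 240)])
miss = minimize_in_voxel(toy_violation, [(120, 240), (120, 240)])
```

A `None` result only means the minimizer did not find a satisfying point, not that none exists, so this test can produce false negatives.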
Minimization within Treesearch
• First voxels will have one dimension, then two dimensions, etc.
• Initialize starting conformations so the first d torsions have the same solution found previously
• Only choose torsion d+1 arbitrarily
  – If first pass fails, start at midpoint
  – If performing more than two passes, start at random points
• This only works with minimization, as grid-based algorithms always start in the same place
Multi-Resolution Search
• Evaluating voxels using minimization followed by gridsearch
• If low resolution search fails, search systematically at higher resolution
• Start at 120°, then go to 30°
• Much faster than performing gridsearch on the entire space
• Allows for the stochastic nature of minimization
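The coarse-then-fine fallback can be sketched as below. For brevity this sketch uses a plain grid test at both resolutions rather than minimization followed by gridsearch, and the voxel representation and function names are assumptions.

```python
import itertools

def grid_points(voxel, step):
    """Regularly spaced points inside a voxel, each centered in its sub-interval."""
    axes = []
    for lo, hi in voxel:
        n = max(1, int((hi - lo) // step))
        width = (hi - lo) / n
        axes.append([lo + width * (k + 0.5) for k in range(n)])
    return itertools.product(*axes)

def multires_search(voxel, satisfies, steps=(120, 30)):
    """Try the coarsest resolution first; go finer only if it fails."""
    for step in steps:
        for point in grid_points(voxel, step):
            if satisfies(point):
                return point, step
    return None, None

def narrow_target(p):
    """Hypothetical constraint that the single coarse point misses."""
    return 70 <= p[0] <= 80 and 160 <= p[1] <= 170

voxel = [(0, 120), (120, 240)]
point, step_used = multires_search(voxel, narrow_target)
easy_point, easy_step = multires_search(voxel, lambda p: p[0] == 60)
```

The fine pass is only paid for in voxels where the cheap pass fails, which is where the saving over a full high-resolution gridsearch comes from.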
Results
• Tetrapeptide of polyalanine
• Two tests with treesearch
  1. Easily satisfied distance constraints
  2. Extensive set of distance constraints
• Random sampling & gridsearch on both problems
• Minimization used on both
• Compare different methods of voxel evaluation
Voxel Evaluation
Minimization with voxels finds the most conformations.
Evaluating Voxels with Tighter Constraints
Divide-and-Conquer
• Want to prune regions of conformational space based on evaluation of low dimensional pieces
• Improves on treesearch
  – Evaluates every piece before adding
  – Once a subproblem is solved, the answer is saved
  – Can define subproblems so the average size is smaller than in treesearch
Active Constraints
• Constraints on a subchain denote set of constraints active when the torsions in only that subchain are instantiated
• Satisfying conformation is one that satisfies all active constraints
• Example:
  – Inter-atomic distance constraint on residue 6
  – Applies to subchain 1-99
  – Not subchain 3-5
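The notion of an active constraint can be illustrated with a small filter, assuming each constraint is stored as a (residue_i, residue_j, max_distance) tuple. That representation and the numbers are assumptions for the sketch.

```python
def active_constraints(constraints, first, last):
    """Constraints whose residues all lie inside the subchain first..last."""
    return [c for c in constraints
            if first <= c[0] <= last and first <= c[1] <= last]

# Hypothetical constraint list: one constraint touches residue 6, one does not.
constraints = [(6, 8, 5.0), (3, 4, 4.0)]

whole = active_constraints(constraints, 1, 99)   # both constraints are active
piece = active_constraints(constraints, 3, 5)    # the residue-6 constraint is not
```

A conformation of subchain 3-5 is then "satisfying" if it meets only the constraints in `piece`.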
Combine Operation
• Use satisfying conformations for two pieces of a subchain to define possible candidates for the whole chain
• For residues x-z
  – with subchains x-y and (y+1)-z
  – Evaluate each candidate individually as a voxel with minimization
  – Higher-dimension voxels are significantly more difficult
• Each candidate is a unique voxel
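A rough sketch of the combine operation, assuming each satisfying conformation is a tuple of torsion values and that the per-candidate voxel evaluation is passed in as a predicate. Both are simplifications chosen for illustration.

```python
import itertools

def combine(left_solutions, right_solutions, satisfies):
    """Pair every left conformation with every right one, then keep the survivors."""
    candidates = (l + r for l, r in itertools.product(left_solutions, right_solutions))
    return [c for c in candidates if satisfies(c)]

left = [(60,), (180,)]     # satisfying conformations of subchain x-y
right = [(60,), (300,)]    # satisfying conformations of subchain (y+1)-z

def newly_active(c):
    """Hypothetical constraint that only becomes active on the joined chain."""
    return c[0] != c[1]

merged = combine(left, right, newly_active)
```

The candidate count is the product of the two solution counts, so keeping the child solution sets small matters far more than the cost of any single evaluation.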
Merge-Trees
• Each node corresponds to a subchain
• Root is the whole molecule
• Traverse starting at the leaves
• Legal Tree
  1. Leaves have one-to-one correspondence with residues
  2. Left-to-right order of leaves is same as residues
  3. Each internal node has two children
D&C as Merge-Tree Traversal
• Algorithm
  – Create default merge tree based on number of residues
  – Traverse tree from bottom to top
  – Solve subproblem at each node
• Subproblem:
  – Enumerate satisfying voxels for subchain at node
  – Leaf node: use treesearch w/voxels, minimization
  – Internal node: combine operation on child subchains
• Tree should be as balanced as possible
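The traversal can be sketched with merge trees encoded as nested tuples (an int is a leaf residue, a pair is an internal node). The encoding, the leaf search and the combine rule here are toy assumptions, not the thesis implementation.

```python
def balanced_merge_tree(first, last):
    """Default tree: split the residue range as evenly as possible."""
    if first == last:
        return first
    mid = (first + last) // 2
    return (balanced_merge_tree(first, mid), balanced_merge_tree(mid + 1, last))

def solve(node, search_leaf, combine):
    """Bottom-up traversal: solve both children, then combine their solutions."""
    if isinstance(node, int):
        return search_leaf(node)
    left, right = node
    return combine(solve(left, search_leaf, combine),
                   solve(right, search_leaf, combine))

def toy_leaf(residue):
    """Hypothetical leaf search: one torsion per residue, three rotamer values."""
    return [(60,), (180,), (300,)]

def toy_combine(left, right):
    """Hypothetical rule: torsions that meet at the junction must differ."""
    return [a + b for a in left for b in right if a[-1] != b[0]]

tree = balanced_merge_tree(1, 4)
solutions = solve(tree, toy_leaf, toy_combine)
```

Each subproblem is solved once and its answer reused by the parent, which is the saving over re-deriving fragments inside a single deep treesearch.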
Treesearch vs D & C
(a) Treesearch
(b) D&C, linear merge tree
(c) D&C, balanced merge tree
• Treesearch is like divide-and-conquer with a linear tree
Effect of Divide-and-Conquer
1. Polyalanine tetrapeptide
2. 16 residue Polyalanine helix
Divide-and-Conquer Results
• 1RST Peptide
• 9-residue Strep-tag peptide from peptide-streptavidin complex
• Long, flexible side chains
  – 40 rotatable torsions
  – Searched sidechains at 120° vs 40° for backbone
Real World Results
Solving a new structure!
Structure Experiment
• Used systematic conformational search to solve a new tripeptide
  – N-formyl-L-Met-L-Leu-L-Phe-OH
  – (f-MLF-OH)
• NMR Data
  – 16 distances
  – 18 torsion angle constraints
• Simulated annealing found 24 conformations
• New algorithm found 56,975 conformations
…And the Rest of the Story
• First fMLF structure different from the published one.
  – The constraints were slightly different
  – The "correct" structure was disallowed by the constraints.
  – Another completely different structure was allowed.
• NMR folks tried some fixes:
  – Added error padding to all the constraints to allow the "right" answer
  – Came up with one more new constraint to rule out the "wrong" answer
  – Rethought their methods for processing the raw data
• New analysis yielded constraints that were satisfied by the "right" answer and didn't allow any other answers.
The Need for Systematic Search
• If the NMR folks had used simulated annealing/distance geometry:
  – They never would have discovered the flaw in their data processing
  – This is the greatest real-world triumph of systematic search methods that the author knows of
• Many of the better structural biologists are aware of the potential for this sort of thing and are therefore primed to embrace systematic methods
Break!
• When we return:
  – Merge Strategy Optimization
Merge Strategy Optimization
• How does the choice of merge-tree influence runtime?
• Low-dimensional searches are so much faster than high-dimensional searches that they are practically free
• Do extra searches
  – Search all possible subchains of size n before searching size n+1
  – Choose which subchains to merge
• Adapt divide-and-conquer for constraint satisfaction
• Invest computational time to save time later
Innovations
• Omni-Merge
  – Search all possible subchains
  – Merge-Tree cost functions
  – Find optimal merge-tree
• Propagation
  – Enforces compatibility between overlapping subchains
  – Share information to filter out bad conformations
• A*
  – Order subchain searches based on costs
Importance of Choosing a Good Merge-Tree
• Divide-and-conquer works best with equal sized subproblems
• Not all subproblems have equal runtime– Constraints are not uniformly distributed
• Default balanced-tree
  – Not always optimal
  – Can be exponentially worse than optimal
Example
• Polyalanine alpha-helix, N = 2^k residues
• 2^k is ideal for default trees: k+1 levels
• Constrain atom i and atom j if:
  – Residue of i is <= N/2
  – Residue of j is > N/2
  – No constraints within either half
  – Constraints won’t appear until the final combine operation
• High-dimensional, unconstrained subchains
• Slow!
Counter-Example
• Same molecule
• Same constraints
  – Residue of i is <= N/2
  – Residue of j is > N/2
  – No constraints within either half
  – Won’t appear until final combine operations
• All merges cross the boundary
• Usually this tree is bad!
Example - Results
• Number of good conformations for a subchain is exponential in the length of the subchain
• So the cost of searching this molecule with the default merge tree is O(e^N)
Choosing Good Merge-Trees
• Considerations:
  – Small subchains can be searched more quickly than large ones
  – Constraints are not distributed evenly
  – Constraints are a more important consideration than the fullness of the tree
• Want to construct merge-trees that reflect this
• Divide and Conquer is still useful for isolating highly constrained parts of the molecule
• Exploit Locality and Ordering
Locality
• Good merge-trees define sub-problems with as few satisfying conformations as possible
• Want to define subchains with lots of constraints
  – Few satisfying conformations
• If there is a constraint between atoms i and j
  – It is more likely that atoms near i and j will also be constrained
  – Short subchains are not necessarily the ones with the fewest satisfying conformations
Ordering
• Search order has a big effect on treesearch
• We want to search subchains in a similar manner
  – Add poorly-constrained subchains as late as possible
  – Put unconstrained chains near the root
Cost Function
• Provide lower bound on run time
• Cost of non-leaf from scratch is cost of all nodes in subtree
Computing the Optimum Cost
• Root just another internal node
• TreeCost
• Enumerate all possible merge-trees
Computing the Optimum Merge Tree
• Use dynamic programming
• Computing table of BestTreeCosts gives optimal merge tree
• Start at top, work down
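A sketch of the dynamic program over subchains, assuming the cost of a tree is the sum of a per-node cost over all of its nodes and that the per-node cost is known for every subchain. The `node_cost` function and the toy numbers below are hypothetical; the slides do not give the actual cost formula.

```python
from functools import lru_cache

def optimal_merge_tree(n, node_cost):
    """Return (best total cost, best tree) for residues 1..n via memoized recursion."""
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return node_cost(i, j), i            # leaf: a single residue
        options = []
        for k in range(i, j):                    # split into i..k and k+1..j
            left_cost, left_tree = best(i, k)
            right_cost, right_tree = best(k + 1, j)
            options.append((left_cost + right_cost, k, (left_tree, right_tree)))
        children_cost, _, tree = min(options)    # ties go to the leftmost split
        return node_cost(i, j) + children_cost, tree
    return best(1, n)

def toy_node_cost(i, j):
    """Hypothetical: a tight constraint links residues 3 and 4, so any subchain
    containing both is cheap; otherwise cost grows as 3**length."""
    if i <= 3 and j >= 4:
        return 1
    return 3 ** (j - i + 1)

cost, tree = optimal_merge_tree(4, toy_node_cost)
```

In this toy case the cheapest tree is the lopsided one that merges the constrained pair 3-4 first; the balanced tree ((1, 2), (3, 4)) would cost 23 under the same cost function.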
Omnimerge
• Final combine operation consumes a large part of the total runtime
• Can we benefit from extra lower-dimensional searches?
• Search all subchains
Locally Optimal Merges
• Perform all possible dipeptide merges – (1-2, 2-3, 3-4, etc...)
– Assume cost of extra work is insignificant
– Won't help if we're making tetrapeptides
– Gives us choices in making tripeptides
• Choose cheaper merge:
  – 1 & 2-3 or 1-2 & 3
Omnimerge
• Search all possible subchains in order of increasing size
• Choose combinations based on minimizing node cost
• Add cheapest merge to D.P. table
• Trivially fills in successive rows of BestTreeCosts table
Mitigating the Cost of Omnimerge
• OmniMerge can be expensive
• Several variations can run faster but provide less "insurance"
• Omit some subchains
• Globally optimal merge-tree cannot be computed as side effect
• Useful when conformational search result is more important than knowing the optimal merge-tree
Limited Omnimerge
• Set an upper limit on subchain size
• Search all subchains up to length L
  – L >= N/2
  – Choose L = 0.667 N for tests
  – Searches more than the default merge tree
  – Searches less than regular OmniMerge
• Each subchain gets to use optimal children
Low Resolution OmniMerge
• Use OmniMerge to compute optimal merge-tree at lower resolution than eventual search
• Then do ordinary divide-and-conquer using merge tree at high resolution
Manual vs. Optimal Merge Trees
• Manual (left) and optimal (right) trees for 1RST peptide
• Optimal tree found by OmniMerge at low resolution (120°)
Results
Manual Search Conformations & Costs
[Table columns: Subtree Cost, Conformations]
Locally Optimal Merges
Subchain length:
• 6 Red
• 7 Blue
• 8 Yellow
• 9 Green
Candidate->Satisfying Conformations
Propagation (Arc Consistency)
• When a conformation violates constraints, want to prune that combination from consideration in all other subchains
  – Propagate info about disqualified conformations to overlapping subchains
  – Avoid deducing the same conclusion repeatedly
  – Arc Consistency addresses this problem well
• Most promising upgrade to systematic conformational search
Arc Consistency Defined
• Arc Consistency between variable A and variable B means that for each value of A, there exists a value for B such that the value of A and the value of B are consistent according to the constraint arc
• Variables are vertices in a graph, pair-wise constraints are arcs
• Each subchain is a variable, each conformation is a value
• Two vertices are connected by an arc if subchains overlap
Arc Consistency Applied
• For each conformation of subchain A, there must exist a conformation of subchain B such that the two conformations assign compatible ranges of angles to each of the torsional bonds in common
• Only forbid a value when it is found to be inconsistent with every possible value of the linked variable
• We do not forbid a value when it is inconsistent with just one value of an overlapping subchain
Propagation Algorithm
• Enforces arc-consistency
• Run after each non-leaf subchain is searched
  – Once per subproblem
  – Rather than repeatedly during combine operation
• Only applicable when searching redundant subchains (OmniMerge)
• Propagation queue holds subchains with recently-modified solution sets
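A sketch of queue-driven propagation in the spirit of standard arc-consistency algorithms. It assumes subchains are keyed by (first, last) residue and each conformation is a dict from residue to an angle bin; this data layout is an assumption for the example, not the thesis data structure.

```python
from collections import deque

def overlap(a, b):
    """Two distinct subchains overlap if their residue ranges intersect."""
    return a != b and max(a[0], b[0]) <= min(a[1], b[1])

def compatible(conf_a, conf_b):
    """Conformations agree on every residue they have in common."""
    return all(conf_b[r] == v for r, v in conf_a.items() if r in conf_b)

def propagate(solutions):
    """Drop any conformation with no compatible partner in an overlapping subchain."""
    queue = deque(solutions)                  # subchains with recently modified sets
    while queue:
        changed = queue.popleft()
        for other in solutions:
            if not overlap(changed, other):
                continue
            kept = [c for c in solutions[other]
                    if any(compatible(c, d) for d in solutions[changed])]
            if len(kept) < len(solutions[other]):
                solutions[other] = kept       # a value is forbidden only if it
                if other not in queue:        # matches none of the neighbor's values
                    queue.append(other)
    return solutions

sets = propagate({
    (1, 2): [{1: 60, 2: 60}, {1: 60, 2: 180}],
    (2, 3): [{2: 180, 3: 300}],
    (3, 4): [{3: 60, 4: 60}, {3: 300, 4: 60}],
})
```

If an earlier search wrongly reported too few conformations for one subchain, this pruning spreads that loss to its neighbors.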
Results with Propagation
• Speed increases, completeness deteriorates for OmniMerge
  – Minimizer is random
  – False negatives
  – Search failure in one subchain might spread to others
Augmenting Omnimerge with A*
• Some molecules contain bottleneck:
– "Bad" subchain
– No good combination of right and left children
• Want to choose order in which subchains are searched
• Choose which subchains to skip based on info obtained during searches of smaller subchains
• Prioritize subchain based on immediate cost, expected usefulness of subchain toward other merges
Buildable Sets
• Formal definition of the space of possible merge strategies
• A buildable set of subchains is a set of subchains such that every element is either a leaf or is buildable from other subchains in the set
• Subchain Si is buildable from subchains Li and Ri if they can function as left and right children for a combine operation to create Si
• A complete set is a buildable set that includes the whole molecule as one of its subchains
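These definitions translate directly into a membership check, assuming a subchain is written as a (first, last) residue pair and a leaf is a single residue. The function names are hypothetical.

```python
def is_buildable(subchains):
    """Every non-leaf (i, j) must split into two children that are both in the set."""
    members = set(subchains)
    return all(
        i == j or any((i, k) in members and (k + 1, j) in members
                      for k in range(i, j))
        for i, j in members
    )

def is_complete(subchains, n):
    """A complete set is buildable and contains the whole molecule 1..n."""
    return is_buildable(subchains) and (1, n) in set(subchains)

good = [(1, 1), (2, 2), (3, 3), (1, 2), (1, 3)]   # (1,3) built from (1,2) and (3,3)
bad = [(1, 1), (2, 2), (1, 3)]                    # (1,3) has no usable pair of children
```

Any legal merge tree yields a complete set (its nodes), so this check describes the space of strategies a search-ordering method can choose from.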
A*
• Score each option according to a cost function that reflects the total cost of reaching the goal by way of that option
• Use heuristic to estimate cost from decision point to goal
• Resolve uncertainty optimistically
  – Provide guaranteed lower bound on actual future cost
• Contrast with OmniMerge, which chooses the next subchain by standard left-to-right order
A* Cost Function
• Define ƒ*(S) as the Estimated-TreeCost for searching the whole molecule using the best merge-tree that
  – Includes subchain S
  – Only uses searched subchains for constructing the cost of the subtree of S
• ƒ*(S) is broken into two parts:
  – g*(S): actual cost to reach S (without estimate)
  – h*(S): estimated cost from S to the goal
Computing BestTreeCosts w/A*
Regions influencing (red) and influenced by (yellow) the black entry
Best Merge Tree Using A*
• For leaf subchains, define A* to be small
• Find cost of best merge tree restricted to S via dynamic programming table
Searched 1RST with same constraints, parameters
Raising the Bar
• 16 residue polyalanine helix
• Randomly distributed short-range constraints
• Default divide-and-conquer should do well
Hard Test for OmniMerge
• Little difference in cost between alternative merge strategies
  – No constraints separated by more than 5 residues
  – Merges of large, overlapping subchains are a waste of time
• Five different trials
Results
Engrailed QK50
• Protein/DNA complex solved by crystallography
• 50 residues, 300 torsional degrees of freedom
• Last stage of search requires many newly-active constraints
  – Too hard
  – Added extra constraints
  – Found 54 conformations in ~10 hours
Conclusion
• Core Systematic Conformational Search Algorithm
• Unique challenges of constraint satisfaction
• Demonstrated value of systematic methods via structure determination project
• Improvements
  – Voxels
  – Omnimerge
  – Propagation
  – A*
Outlook for Larger Molecules
• Current algorithm is good for small peptides
• Larger molecules are problematic
• Moderate resolution
  – Prohibitive number of voxels
  – High search time
• Low resolution
  – Leads to incompleteness
  – False negatives
Future Work
• Enumerating all voxels is only practical for molecules with small number of unconstrained degrees of freedom
• Suggest resolution intervals for each torsion based on preliminary search
• Single erroneous constraint that cannot be satisfied causes whole search to fail
– Designed this way
– In practice a bug
– Investigate methods to relax this
• Combine torsion-based methods for small subchains with cartesian or distance-based methods for longer chains
Acknowledgments
• Thanks to Lisa Tucker-Kellogg for her enthusiastic help and advice!