Systematic Conformational Search with Constraint Satisfaction

Post on 17-Jan-2016

11/14/02 Systematic Conformational Search with Constraint Satisfaction

Systematic Conformational Search with Constraint Satisfaction

Lisa Tucker-Kellogg

Ph.D. Thesis

Massachusetts Institute of Technology

June 2002

Protein Conformation

• Proteins can have more than one shape
– Three rotatable bonds per residue

• Conformations:
– Set of possible 3-D arrangements of atoms

• Related to the protein folding problem:
– Find the conformation with the lowest Gibbs free energy

• Finding conformations is a high scientific priority

NMR Constraints

• Two types:
– Distances between particular pairs of atoms
– Dihedral angles for rotatable bonds

• The ability to determine all conformations consistent with the constraints aids NMR error analysis and confidence measures

• NMR is an important tool for solving structures

• The algorithms can also be used with docking and homology modeling

This is Hard!

• Proteins can have 10^3 to 10^4 degrees of freedom
• The size of conformation space is exponential in the number of degrees of freedom
• Non-linear changes
• Searching the whole space may be as hard as folding
• GOAL: Provide methods to guarantee a minimum interval of coverage

Conformational Search Algorithms

• Usually consist of two parts:
– Engine that generates conformations
– Module to evaluate conformations

• Systematic (exhaustive)
– Search using predefined intervals and order
– Guaranteed coverage
– Slow

• Stochastic
– Simulated annealing, distance geometry algorithms
– Currently more popular
– Creates good hypotheses quickly

• Some algorithms rank conformations
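
The difference between exhaustive grid enumeration and a tree search that prunes partial assignments can be sketched as below. This is a minimal illustration, not the thesis implementation: the function names, the three-angle grid, and the toy constraint (`partial_ok`) are assumptions made for the example.

```python
from itertools import product

def gridsearch(n_torsions, angles, satisfies):
    """Evaluate every grid point; the work grows as len(angles) ** n_torsions."""
    return [c for c in product(angles, repeat=n_torsions) if satisfies(c)]

def treesearch(n_torsions, angles, partial_ok, prefix=()):
    """Assign torsions one at a time and drop any prefix that already fails."""
    if len(prefix) == n_torsions:
        return [prefix]
    found = []
    for a in angles:
        candidate = prefix + (a,)
        if partial_ok(candidate):  # a failing prefix prunes its whole subtree
            found.extend(treesearch(n_torsions, angles, partial_ok, candidate))
    return found

# Toy constraint (hypothetical): consecutive torsions must differ.
def partial_ok(prefix):
    return all(a != b for a, b in zip(prefix, prefix[1:]))

ANGLES = (60, 180, 300)
grid = gridsearch(4, ANGLES, partial_ok)   # tests all 3 ** 4 grid points
tree = treesearch(4, ANGLES, partial_ok)   # never expands a failed prefix
```

Both functions return the same set of conformations for this toy constraint; the tree version simply avoids expanding prefixes that can no longer succeed.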

Cool Stuff in this Thesis

• Improvements to Systematic Conformational Search
– Voxel Model
– Divide & Conquer
– OmniMerge
– Propagation
– A*

• Uses Systematic Conformational Search to solve a new structure

• Goals:
– Enumerate all conformations that satisfy a set of constraints
– Use a systematic method to guarantee an interval of coverage
– Invest in up-front computations to save time later

Systematic Conformational Search

• Basic Algorithms

• Voxel Model

• Minimization

• Divide-and-Conquer

Gridsearch

TreeSearch

Comparing Gridsearch & TreeSearch

Search Completeness

• Want to rule out a higher-dimensional slab of space based on evaluating a lower-dimensional fragment

• However, conformations that correspond to regularly spaced grid points don’t capture everything

Rotamers

• Certain angles of rotation cause steric clashes
– This divides the set of torsional angles into ranges

• Rotamers refer to the likely neighborhoods of conformations for any molecular fragment
– Patterns of low-energy conformations
– Ranges of 60° ± φ, 180° ± φ, 300° ± φ

• Have the option of searching rotamers first or calibrating resolution to match the regular intervals of rotamers
– Treesearch uses [60°, 180°, 300°]
– “Succeed first” approach

Voxels

• Unit of higher-dimensional volume
– Like a pixel is to 2-D
– New idea
– This thesis is the first application of the voxel model

• Evaluate voxels instead of points on a grid
• Ask whether there exists any conformation in the voxel that satisfies the constraints
• No general, perfectly accurate way to test whether a conformation exists in a volume
– But we can do much better than just one point

• Try to center voxels on rotamers

Voxelized Treesearch

Minimization as Search

• Convert constraints to an objective function
• Use minimization of the constraint violation function as a heuristic for searching within a voxel for satisfying conformations
– Always start in the same place
– Search for points that minimize the objective function
• If sufficiently close to zero: SUCCESS
• If max iterations reached: FAILURE

• Choose local minimization for finding conformations in a voxel
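
A rough sketch of minimization inside a single voxel follows. It assumes each constraint is a triple `(f, lo, hi)` requiring `lo <= f(x) <= hi`, and it swaps in a simple bounded pattern search as the local minimizer; the slides do not say which minimizer the thesis uses, so treat every name here as hypothetical.

```python
def violation(x, constraints):
    """Sum of squared constraint violations; zero means all are satisfied."""
    total = 0.0
    for f, lo, hi in constraints:
        value = f(x)
        total += max(0.0, lo - value) ** 2 + max(0.0, value - hi) ** 2
    return total

def search_voxel(lower, upper, constraints, tol=1e-9, max_iter=200):
    """Pattern search confined to one voxel, always starting at its centre.

    Returns a satisfying point, or None when max_iter is reached. A None
    result does not prove that the voxel is empty.
    """
    x = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    step = [(hi - lo) / 4 for lo, hi in zip(lower, upper)]
    best = violation(x, constraints)
    for _ in range(max_iter):
        if best <= tol:
            return x  # SUCCESS: violation is close enough to zero
        improved = False
        for i in range(len(x)):
            for delta in (step[i], -step[i]):
                trial = list(x)
                # Clamp the move so the search never leaves the voxel.
                trial[i] = min(upper[i], max(lower[i], x[i] + delta))
                score = violation(trial, constraints)
                if score < best:
                    x, best, improved = trial, score, True
        if not improved:
            step = [s / 2 for s in step]  # refine the step and retry
    return x if best <= tol else None  # FAILURE: iteration budget exhausted

# Hypothetical constraint: the two torsions must sum to 200-210 degrees.
constraints = [(lambda x: x[0] + x[1], 200.0, 210.0)]
hit = search_voxel([0.0, 60.0], [120.0, 180.0], constraints)
```

Because a failed search is only evidence, not proof, that a voxel is empty, this kind of evaluation can produce false negatives.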

Minimization within Treesearch

• First voxels will have one dimension, then two dimensions, etc.
• Initialize starting conformations so the first d torsions have the same solution found previously
• Only the (d+1)th torsion is chosen arbitrarily
– If the first pass fails, start at the midpoint
– If performing more than two passes, start at random points

• This only works with minimization, as grid-based algorithms always start in the same place

Multi-Resolution Search

• Evaluate voxels using minimization followed by gridsearch

• If the low-resolution search fails, search systematically at higher resolution

• Start at 120°, then go to 30°
• Much faster than performing gridsearch on the entire space
• Allows for the stochastic nature of minimization

Results

• Tetrapeptide of polyalanine
• Two tests with treesearch
1. Easily satisfied distance constraints
2. Extensive set of distance constraints

• Random sampling & gridsearch on both problems
• Minimization used on both
• Compare different methods of voxel evaluation

Voxel Evaluation

Minimization with voxels finds the most conformations.

Evaluating Voxels with Tighter Constraints

Divide-and-Conquer

• Want to prune regions of conformational space based on evaluation of low-dimensional pieces

• Improves on treesearch
– Evaluates every piece before adding
– Once a subproblem is solved, the answer is saved
– Can define subproblems so their average size is smaller than in treesearch

Active Constraints

• The constraints on a subchain are the set of constraints that are active when only the torsions in that subchain are instantiated

• A satisfying conformation is one that satisfies all active constraints

• Example:
– Inter-atomic distance constraint on residue 6
– Applies to subchain 1-99
– Not subchain 3-5
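
The residue 6 example can be sketched in a few lines. The representation is an assumption made for illustration: each constraint is a pair of residue indices, and it is treated as active on a subchain when both residues fall inside that subchain.

```python
def active_constraints(constraints, first, last):
    """Return the constraints whose residues both lie in subchain first..last."""
    return [c for c in constraints
            if first <= c[0] <= last and first <= c[1] <= last]

# Hypothetical constraint list: one distance constraint within residue 6
# and one between residues 4 and 5.
constraints = [(6, 6), (4, 5)]
whole = active_constraints(constraints, 1, 99)  # both constraints are active
piece = active_constraints(constraints, 3, 5)   # residue 6 constraint is not
```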

Combine Operation

• Use satisfying conformations for two pieces of a subchain to define possible candidates for the whole chain

• For residues x-z
– with subchains x-y, y+1-z
– Evaluate individually as voxel with minimization
– Higher-dimension voxels are significantly more difficult

• Each candidate is a unique voxel
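
A minimal sketch of the combine step, assuming each child solution is a tuple of torsion values and using a plain predicate (`satisfies`, hypothetical) in place of the voxel evaluation by minimization described above:

```python
from itertools import product

def combine(left_solutions, right_solutions, satisfies):
    """Pair every solution of x..y with every solution of y+1..z and keep
    the concatenations that pass the constraints of the merged subchain."""
    return [l + r for l, r in product(left_solutions, right_solutions)
            if satisfies(l + r)]

# Toy data (assumed): one torsion per child and a made-up constraint.
left = [(60,), (180,)]
right = [(60,), (300,)]
merged = combine(left, right, lambda c: sum(c) <= 360)
```

The number of candidates is the product of the two child solution counts, which is why subchains with few satisfying conformations make cheap merges.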

Merge-Trees

• Each node corresponds to a subchain
• Root is the whole molecule
• Traverse starting at the leaves
• Legal Tree
1. Leaves have one-to-one correspondence with residues
2. Left-to-right order of leaves is same as residues
3. Each internal node has two children

D&C as Merge-Tree Traversal

• Algorithm
– Create default merge tree based on number of residues
– Traverse tree from bottom to top
– Solve subproblem at each node

• Subproblem:
– Enumerate satisfying voxels for subchain at node
– Leaf node: use treesearch w/voxels, minimization
– Internal node: combine operation on child subchains

• Tree should be as balanced as possible
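
The traversal can be sketched as a short recursion. The nested-tuple tree encoding, the toy leaf search, and the toy combine rule below are all assumptions for illustration; in the thesis the leaf step is a voxelized treesearch and the internal step is the combine operation.

```python
def balanced_tree(first, last):
    """Default merge tree as nested tuples; a leaf is one residue index."""
    if first == last:
        return first
    mid = (first + last) // 2
    return (balanced_tree(first, mid), balanced_tree(mid + 1, last))

def solve(tree, search_leaf, combine):
    """Bottom-up traversal: search the leaves, merge at internal nodes."""
    if not isinstance(tree, tuple):
        return search_leaf(tree)
    left, right = tree
    return combine(solve(left, search_leaf, combine),
                   solve(right, search_leaf, combine))

# Toy problem (hypothetical): one torsion per residue, and neighbouring
# residues may not share the same angle.
def toy_leaf(residue):
    return [(60,), (180,), (300,)]

def toy_combine(lefts, rights):
    return [l + r for l in lefts for r in rights if l[-1] != r[0]]

tree = balanced_tree(1, 4)
solutions = solve(tree, toy_leaf, toy_combine)
```

Passing a linear tree such as `(((1, 2), 3), 4)` to the same `solve` function gives the same solutions, added one residue at a time.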

Treesearch vs D & C

(a) Treesearch

(b) D&C, linear merge tree

(c) D&C, balanced merge tree

• Treesearch is like divide-and-conquer with a linear tree

Effect of Divide-and-Conquer

1. Polyalanine tetrapeptide

2. 16-residue polyalanine helix

Divide-and-Conquer Results

• 1RST Peptide
• 9-residue Strep-tag peptide from peptide-streptavidin complex
• Long, flexible side chains
– 40 rotatable torsions
– Searched side chains at 120° vs 40° for backbone

Real World Results

Solving a new structure!

Structure Experiment

• Used systematic conformational search to solve a new tripeptide
– N-formyl-L-Met-L-Leu-L-Phe-OH
– (f-MLF-OH)

• NMR Data
– 16 distances
– 18 torsion angle constraints

• Simulated annealing found 24 conformations
• New algorithm found 56,975 conformations

…And the Rest of the Story

• First fMLF structure different from the published one.
– The constraints were slightly different
– The "correct" structure was disallowed by the constraints.
– Another completely different structure was allowed.

• NMR folks tried some fixes:
– Added error padding to all the constraints to allow the "right" answer
– Came up with one more new constraint to rule out the "wrong" answer
– Rethought their methods for processing the raw data

• New analysis yielded constraints that were satisfied by the "right" answer and didn't allow any other answers.

The Need for Systematic Search

• If the NMR folks had used only simulated annealing/distance geometry:
– They never would have discovered the flaw in their data processing
– This is the greatest real-world triumph of systematic search methods that the author knows of

• Many of the better structural biologists are aware of the potential for this sort of thing and are therefore primed to embrace systematic methods

Break!

• When we return:
– Merge Strategy Optimization

Merge Strategy Optimization

• How does the choice of merge-tree influence runtime?
• Low-dimensional searches are so much faster than high-dimensional searches that they are practically free
• Do extra searches
– Search all possible subchains of size n before searching size n+1
– Choose which subchains to merge

• Adapt divide-and-conquer for constraint satisfaction
• Invest computational time to save time later

Innovations

• OmniMerge
– Search all possible subchains
– Merge-Tree cost functions
– Find optimal merge-tree

• Propagation
– Enforces compatibility between overlapping subchains
– Share information to filter out bad conformations

• A*
– Order subchain searches based on costs

Importance of Choosing a Good Merge-Tree

• Divide-and-conquer works best with equal sized subproblems

• Not all subproblems have equal runtime
– Constraints are not uniformly distributed

• Default balanced tree
– Not always optimal
– Can be exponentially worse than optimal

Example

• Polyalanine alpha-helix, N = 2^k residues
• 2^k is ideal for default trees: k+1 levels
• Constrain atom i and atom j if:
– Residue of i is <= N/2
– Residue of j is > N/2
– No constraints in either half
– Won’t appear until final combine operation

• High-dimensional, unconstrained subchains

• Slow!

Counter-Example

• Same molecule
• Same constraints
– Residue of i is <= N/2
– Residue of j is > N/2
– No constraints in either half
– Won’t appear until final combine operations

• All merges cross boundary
• Usually this tree is bad!

Example - Results

• Number of good conformations for a subchain is exponential in the length of the subchain
• So the cost of searching this molecule with the default merge-tree is O(e^N)

Choosing Good Merge-Trees

• Considerations:
– Small subchains can be searched more quickly than large ones
– Constraints are not distributed evenly
– Constraints are a more important consideration than the fullness of the tree

• Want to construct merge-trees that reflect this
• Divide-and-conquer is still useful for isolating highly constrained parts of the molecule
• Exploit Locality and Ordering

Locality

• Good merge-trees define sub-problems with as few satisfying conformations as possible

• Want to define subchains with lots of constraints
– Few satisfying conformations

• If there is a constraint between atoms i and j
– It is more likely that atoms near i and j will also be constrained
– Subchains with short length are not necessarily the ones with the fewest satisfying conformations

Ordering

• Search order has a big effect on treesearch

• We want to search subchains in a similar manner
– Add poorly-constrained subchains as late as possible
– Put unconstrained chains near root

Cost Function

• Provide lower bound on run time

• Cost of non-leaf from scratch is cost of all nodes in subtree

Computing the Optimum Cost

• Root just another internal node

• TreeCost

• Enumerate all possible merge-trees

Computing the Optimum Merge Tree

• Use dynamic programming

• Computing table of BestTreeCosts gives optimal merge tree

• Start at top, work down
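
The dynamic program can be sketched as follows. The cost model is an assumption, since the formulas on these slides did not survive in the transcript: each leaf is charged a fixed cost, and merging subchain i..k with k+1..j is charged the number of candidate pairs, `count(i, k) * count(k + 1, j)`. The names `best_merge_tree` and `count` are hypothetical.

```python
from functools import lru_cache

def best_merge_tree(n, count, leaf_cost=1):
    """Return (cost, tree) minimising the total cost over residues 1..n.

    count(i, j) is the number of satisfying conformations of subchain i..j.
    """
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return leaf_cost, i
        options = []
        for k in range(i, j):  # try every split point
            left_cost, left_tree = best(i, k)
            right_cost, right_tree = best(k + 1, j)
            node_cost = count(i, k) * count(k + 1, j)
            options.append((left_cost + right_cost + node_cost, k,
                            (left_tree, right_tree)))
        cost, _, tree = min(options)  # ties go to the smallest split point
        return cost, tree
    return best(1, n)

def count(i, j):
    # Toy numbers (assumed): any subchain spanning residues 2 and 3 is
    # tightly constrained; all others have 3 conformations per residue.
    return 1 if i <= 2 and j >= 3 else 3 ** (j - i + 1)

cost, tree = best_merge_tree(4, count)
```

With these toy numbers the cheapest tree merges residues 2 and 3 first and costs 19, while the balanced split into 1-2 and 3-4 costs 103, which illustrates why constraints can matter more than balance.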

Omnimerge

• Final combine operation consumes a large part of the total runtime

• Can we benefit from extra lower-dimensional searches?

• Search all subchains

Locally Optimal Merges

• Perform all possible dipeptide merges
– (1-2, 2-3, 3-4, etc...)
– Assume cost of extra work is insignificant
– Won't help if we're making tetrapeptides
– Gives us choices in making tripeptides

• Choose cheaper merge:
– 1 & 2-3 or 1-2 & 3

Omnimerge

• Search all possible subchains in order of increasing size

• Choose combinations based on minimizing node cost

• Add cheapest merge to D.P. table

• Trivially fills in successive rows of BestTreeCosts table

Mitigating the Cost of Omnimerge

• OmniMerge can be expensive

• Several variations can run faster but provide less "insurance"

• Omit some subchains

• Globally optimal merge-tree cannot be computed as side effect

• Useful when conformational search result is more important than knowing the optimal merge-tree

Limited Omnimerge

• Set an upper limit on subchain size

• Search all subchains up to length L
– L >= N/2

– Choose L = 0.667 N for tests

– Searches more than default merge tree

– Less than regular omnimerge

• Each subchain gets to use optimal children

Low Resolution OmniMerge

• Use OmniMerge to compute optimal merge-tree at lower resolution than eventual search

• Then do ordinary divide-and-conquer using merge tree at high resolution

Manual vs. Optimal Merge Trees

• Manual (left) and optimal (right) trees for 1RST peptide

• Optimal tree found by OmniMerge at low resolution (120°)

Results

Manual Search Conformations & Costs

Subtree Cost, Conformations

Locally Optimal Merges

Subchain length:
• 6 Red
• 7 Blue
• 8 Yellow
• 9 Green

Candidate->Satisfying Conformations

Propagation (Arc Consistency)

• When a conformation violates constraints, want to prune that combination from consideration in all other subchains
– Propagate info about disqualified conformations to overlapping subchains
– Avoid deducing the same conclusion repeatedly
– Arc consistency addresses this problem well

• Most promising upgrade to systematic conformational search

Arc Consistency Defined

• Arc Consistency between variable A and variable B means that for each value of A,

– There exists a value for B such that the value of A and the value of B are consistent according to the constraint arc

• Variables are vertices in a graph, pair-wise constraints are arcs

• Each subchain is a variable, each conformation is a value

• Two vertices are connected by an arc if subchains overlap

Arc Consistency Applied

• For each conformation of subchain A, there must exist a conformation of subchain B such that the two conformations assign compatible ranges of angles to each of the torsional bonds in common

• Only forbid a value when it is found to be inconsistent with every possible value of the linked variable

• We do not forbid a value when it is inconsistent with just one value of an overlapping subchain

Propagation Algorithm

• Enforces arc-consistency

• Run after each non-leaf subchain is searched
– Once per subproblem
– Rather than repeatedly during combine operation

• Only applicable when searching redundant subchains (OmniMerge)

• Propagation queue holds subchains with recently-modified solution sets
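
A queue-based propagation pass in the style of the standard AC-3 algorithm might look like the sketch below. It is not the thesis code. It assumes subchains are `(first, last)` residue ranges, each conformation stores one value per residue, and two conformations are compatible when they agree on every shared residue (a simplification of "compatible ranges of angles").

```python
from collections import deque

def compatible(sub_a, conf_a, sub_b, conf_b):
    """True if two conformations agree on every residue the subchains share."""
    lo, hi = max(sub_a[0], sub_b[0]), min(sub_a[1], sub_b[1])
    return all(conf_a[r - sub_a[0]] == conf_b[r - sub_b[0]]
               for r in range(lo, hi + 1))

def propagate(solutions):
    """Enforce arc consistency between overlapping subchains, in place.

    solutions maps (first, last) to a set of conformations.
    """
    def overlap(a, b):
        return a != b and max(a[0], b[0]) <= min(a[1], b[1])

    queue = deque(solutions)  # subchains whose solution sets need checking
    while queue:
        b = queue.popleft()
        for a in solutions:
            if not overlap(a, b):
                continue
            # Keep a value of a only if some value of b supports it.
            keep = {ca for ca in solutions[a]
                    if any(compatible(a, ca, b, cb) for cb in solutions[b])}
            if keep != solutions[a]:
                solutions[a] = keep
                if a not in queue:
                    queue.append(a)  # a changed, so its neighbours recheck
    return solutions

# Toy solution sets (assumed) for three overlapping dipeptide subchains.
sols = propagate({
    (1, 2): {(60, 60), (60, 180), (180, 300)},
    (2, 3): {(180, 60), (300, 300)},
    (3, 4): {(60, 180)},
})
```

A value is removed only when no value of the overlapping subchain supports it, which matches the rule stated on the "Arc Consistency Applied" slide.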

Results with Propagation

• Speed increases, completeness deteriorates for OmniMerge
– Minimizer is random
– False negatives
– Search failure in one subchain might spread to others

Augmenting Omnimerge with A*

• Some molecules contain bottleneck:

– "Bad" subchain

– No good combination of right and left children

• Want to choose order in which subchains are searched

• Choose which subchains to skip based on info obtained during searches of smaller subchains

• Prioritize subchain based on immediate cost, expected usefulness of subchain toward other merges

Buildable Sets

• Formal definition of the space of possible merge strategies

• A buildable set of subchains is a set of subchains such that every element is either a leaf or is buildable from other subchains in the set

• Subchain Si is buildable from subchains Li and Ri if they can function as left and right children for a combine operation to create Si

• A complete set is a buildable set that includes the whole molecule as one of its subchains
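
These definitions translate directly into a membership check. The sketch assumes subchains are `(first, last)` residue ranges, a leaf is a single residue, and the children of a combine operation are adjacent ranges x-y and y+1-z; the function names are hypothetical.

```python
def is_buildable(subchains):
    """True if every non-leaf subchain can be merged from two members."""
    members = set(subchains)
    for first, last in members:
        if first == last:
            continue  # a leaf needs no children
        if not any((first, k) in members and (k + 1, last) in members
                   for k in range(first, last)):
            return False
    return True

def is_complete(subchains, n):
    """True if the set is buildable and contains the whole molecule 1..n."""
    return (1, n) in set(subchains) and is_buildable(subchains)

good = {(1, 1), (2, 2), (3, 3), (1, 2), (1, 3)}
bad = {(1, 1), (3, 3), (1, 3)}  # 1-3 has no usable pair of children
```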

A*

• Score each option according to a cost function that reflects the total cost of reaching the goal by way of that option

• Use heuristic to estimate cost from decision point to goal

• Resolve uncertainty optimistically
– Provide guaranteed lower bound on actual future cost

• In contrast, OmniMerge chooses the next subchain by standard left-to-right order

A* Cost Function

• Define ƒ*(S) as the Estimated-TreeCost for searching the whole molecule using the best merge-tree that
– Includes subchain S
– Only uses searched subchains for constructing the cost of the subtree of S

• ƒ*(S) is broken into two parts:
– g*(S): actual cost to reach S (without estimate)
– h*(S): estimated cost from S to the goal
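
In standard A* notation, the two-part cost can be written as below. The symbol c(S), standing for the true remaining cost from S to the goal, is introduced here only for illustration; the inequality restates the requirement that the estimate be a guaranteed lower bound on the actual future cost.

```latex
f^*(S) = g^*(S) + h^*(S), \qquad h^*(S) \le c(S)
```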

Computing BestTreeCosts w/A*

Regions influencing (red) and influenced by (yellow) the black entry

Best Merge Tree Using A*

• For leaf subchains, define A* to be small

• Find cost of best merge tree restricted to S via dynamic programming table

Searched 1RST with same constraints, parameters

Raising the Bar

• 16 residue polyalanine helix

• Randomly distributed short-range constraints

• Default divide-and-conquer should do well

Hard test for OmniMerge

• Little difference in cost between alternative merge strategies
– No constraints separated by more than 5 residues
– Merges of large, overlapping subchains are a waste of time

• Five different trials

Results

Engrailed QK50

• Protein/DNA complex solved by crystallography

• 50 residues, 300 torsional degrees of freedom

• Last stage of search requires many newly-active constraints
– Too hard
– Added extra constraints
– Found 54 conformations in ~10 hours

Conclusion

• Core Systematic Conformational Search Algorithm

• Unique challenges of constraint satisfaction

• Demonstrated value of systematic methods via structure determination project

• Improvements
– Voxels
– Omnimerge
– Propagation
– A*

Outlook for Larger Molecules

• Current algorithm is good for small peptides

• Larger molecules are problematic

• Moderate resolution
– Prohibitive number of voxels
– High search time

• Low resolution
– Leads to incompleteness
– False negatives

Future Work

• Enumerating all voxels is only practical for molecules with small number of unconstrained degrees of freedom

• Suggest resolution intervals for each torsion based on preliminary search

• Single erroneous constraint that cannot be satisfied causes whole search to fail

– Designed this way

– In practice a bug

– Investigate methods to relax this

• Combine torsion-based methods for small subchains with Cartesian or distance-based methods for longer chains

Acknowledgments

• Thanks to Lisa Tucker-Kellogg for her enthusiastic help and advice!