Dynamic Maintenance of Molecular Surfaces under Conformational Changes
Eran Eyal and Dan Halperin
Tel-Aviv University
2
Molecular Simulations
Molecular simulations help to understand the structure (and function) of protein molecules
• Monte Carlo Simulation (MCS)
• Molecular Dynamics Simulation (MDS)
3
Solvent Models
• Explicit Solvent Models : using solvent molecules
• Implicit Solvent Models : all the effects of the solvent molecules are included in an effective potential : W = W_elec + W_np
W_np = Σ_i γ_i A_i(X)
A_i(X) – the area of atom i accessible to solvent for a given conformation X
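As a small illustration of the nonpolar term, the sum can be evaluated directly once the per-atom accessible areas are known. This is only a sketch: the function name and the γ and area values below are made up for the example and do not come from the talk.

```python
def nonpolar_energy(gammas, areas):
    """Evaluate W_np = sum_i gamma_i * A_i(X) for one conformation X."""
    if len(gammas) != len(areas):
        raise ValueError("need exactly one gamma per atom area")
    return sum(g * a for g, a in zip(gammas, areas))


# Hypothetical solvation parameters and accessible areas for three atoms.
gammas = [0.5, 0.25, 1.0]
areas = [10.0, 4.0, 0.0]  # a fully buried atom contributes zero area
w_np = nonpolar_energy(gammas, areas)
```

Because each A_i(X) changes with the conformation, keeping the areas up to date is the expensive part, which is what the rest of the talk addresses.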
4
Molecular Surfaces
• van der Waals surface
• Solvent Accessible surface
• Smooth molecular surface (solvent excluded)
Taken from http://www.netsci.org/Science/Compchem/feature14.html (Connolly)
5
Related Work
• Lee and Richards, 1971 – Solvent accessible surface
• Richards, 1977 – Smooth molecular surface
• Connolly, 1983 – First computation of smooth molecular surface
• Edelsbrunner, 1995 – Computing the molecular surface using Alpha Shapes
• Sanner and Olson, 1997 – Dynamic reconstruction of the molecular surface when a small number of atoms move
• Edelsbrunner et al, 2001 – algorithm to maintain an approximating triangulation of a deforming 3D surface
• Bajaj et al, 2003 – dynamically maintain molecular surfaces as the solvent radius changes
6
Our Results
• a fast method to maintain a highly accurate surface area of a molecule dynamically during conformation changes
• robust while using floating point
• efficiently accounting for topological changes : theory and practice
7
Initial Construction of the Surface
• Finding all pairs of intersecting atoms
• Construction of spherical arrangements
• Controlled Perturbation
• Combining the spherical arrangements
• Constructing the boundary and calculating its surface area
8
Finding the Intersecting Atoms
Using a grid based solution introduced by Halperin and Overmars :
Theorem : Given S = {S1,…,Sn} spheres with radii r1,…,rn such that
• rmax/rmin < c for some constant c
• There’s a constant ρ such that for each sphere Si, the concentric sphere with radius ρ·ri does not contain the center of any other sphere
Then :
(1) The maximum number of spheres that intersect any given sphere in S is bounded by a constant
(2) The maximum complexity of the boundary of the union of the spheres is O(n)
9
The Grid Algorithm
• Subdivide space into cubes of side length 2·rmax
• For each sphere compute the cubes it intersects (up to 8 cubes)
• For each sphere check intersection with the spheres located in its cubes
• Constructed in O(n) time with O(n) space
• Finding all pairs of intersecting spheres takes O(n) time
10
Construction of Spherical Arrangements
Spherical Arrangement
Full trapezoidal decomposition
Partial trapezoidal decomposition
11
Controlled Perturbation
• A method of robust computation while using floating point arithmetic
• Handles two types of degeneracies :
– Type I : intrinsic degeneracies of the spherical arrangement
– Type II : degeneracies induced by the trapezoidal decomposition
12
Type I Degeneracies
We wish to ensure the following conditions :
1. No inner or outer tangency of two atoms
2. No three atoms intersecting in a single point
3. No four atoms intersecting in a common point
We achieve these conditions by randomly perturbing the center of each atom that induces a degeneracy by at most δ (the perturbation parameter). δ is a function of ε (the resolution parameter), m (the maximum number of atoms that intersect any given atom) and R (the maximum atom radius)
δ = 2m ε^(1/3) R^(2/3) - ensures elimination of all Type I degeneracies in expected O(n) time
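To make the perturbation step concrete, here is a rough sketch that computes δ from the formula above, tests one kind of Type I degeneracy (near tangency of two atoms), and draws a new center inside the ball of radius δ. The tangency test with tolerance ε is a simplified stand-in for the actual degeneracy predicates, and all function names are hypothetical.

```python
import math
import random


def perturbation_bound(m, eps, R):
    """delta = 2 m eps^(1/3) R^(2/3), as given on the slide."""
    return 2.0 * m * eps ** (1.0 / 3.0) * R ** (2.0 / 3.0)


def nearly_tangent(c1, r1, c2, r2, eps):
    """Simplified check (an assumption): outer or inner tangency up to eps."""
    d = math.dist(c1, c2)
    return abs(d - (r1 + r2)) < eps or abs(d - abs(r1 - r2)) < eps


def perturb_center(center, delta, rng):
    """Pick a point uniformly in the ball of radius delta around center."""
    while True:  # rejection sampling from the enclosing cube
        offset = [rng.uniform(-delta, delta) for _ in range(3)]
        if sum(o * o for o in offset) <= delta * delta:
            return tuple(c + o for c, o in zip(center, offset))


rng = random.Random(0)
delta = perturbation_bound(m=10, eps=1e-9, R=2.0)
a, b = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)  # two unit atoms touching at one point
while nearly_tangent(a, 1.0, b, 1.0, eps=1e-9):
    b = perturb_center((2.0, 0.0, 0.0), delta, rng)
```

Each retry samples around the original center, so the atom never drifts more than δ from where it started.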
13
Type II Degeneracies
• Happens when two arcs added by the trapezoidal decomposition are too close (the angle between them is less than a certain ω threshold)
• These degeneracies are prevented by randomly choosing a direction for the north pole of an atom that induces no degeneracies
• sin ω < 1/(2m(m-1)) – ensures finding a good pole direction in expected O(n) time
14
Combining the Spherical Arrangements
• For each atom, the arc of each intersection circle points to the same arc on the intersection circle of the second atom.
• Now we have a subset of the arrangement of the spheres (contains all features of the arrangement except the 3 dimensional cells)
15
Building the Boundary of the Molecule
• Start with the lowest region (2D face) of the bottommost atom
• Traverse the outer boundary of the 3D arrangements : Whenever an arc of an intersection circle is reached, we jump to the opposite region on the other atom that shares this arc
• During the traversal, the area of each encountered region is calculated, and summed up
16
Finding the voids
• Find for each atom the exposed regions (regions not covered by other atoms)
• Find the difference between the set of exposed regions on all atoms and the outer boundary
• Traverse the difference to construct the boundary of the voids
17
Screenshot
18
Dynamic Maintenance of the Surface
• We wish to maintain the boundary of the protein molecule and its area as the molecule undergoes conformational changes
• The grid algorithm requires reconstructing the entire structure from scratch on each step. This takes O(n) time, where n is the number of atoms, which is asymptotically optimal in the worst case but slow for large molecules
19
The Problem
• We perform a simulation where each time several DOFs of the backbone change (Φ and Ψ angles)
• A simulation step is accepted when it causes no self collisions
• After a step is accepted, we wish to quickly update the boundary of the molecule and its surface area
20
A Step of the Simulation
• Perform a k-DOF change
• Check if the change incurs self collisions
• If not :
– Find all the pairs of intersecting atoms affected by the change
– Modify the spherical arrangements
– Modify the boundary of the molecule and its surface area : account for topological changes
21
Attaching Frames to the Backbone
The backbone of a protein with the reference frames of each link
For each atom center we calculate its coordinates within its frame
22
Detecting Self Collision
• We use the ChainTree introduced by Lotan et al
Courtesy of Itay Lotan
23
ChainTree Performance
• Update Algorithm – Modifies the ChainTree after a k-DOF change in O(k log(n/k)) time
• Testing Algorithm – Finds self collision in O(n^(4/3)) time
24
Finding intersecting atom pairs
• After a DOF change is accepted, we use the ChainTree to find all the pairs of intersecting atoms affected by the change:
– Deleted pairs
– Inserted pairs
– Updated pairs
25
The IntersectionsTree
• A tree used for efficient retrieval of modified intersections
• Updated in a similar way to the testing algorithm of the ChainTree
• Worst case running time : O(n^(4/3)) (in practice very efficient)
26
The Modified Intersections List
• During the update of the IntersectionsTree we store in a separate list all the changes done in the IntersectionsTree :
– Deleted intersecting atom pairs
– Inserted intersecting atom pairs
– Updated intersecting atom pairs
• The Modified Intersections List is used to update the spherical arrangements
27
Updating the Spherical Arrangements
• For each pair of inserted intersecting atoms – add their intersection circle to the spherical arrangements of both atoms
• For each pair of updated intersecting atoms – remove their old intersection circle from the two spherical arrangements and add their new intersection circle
• For each pair of deleted intersecting atoms – remove their old intersection circle from the two spherical arrangements
The Cost : O(p), where p is the number of atoms whose spherical arrangements were modified
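The three update rules can be sketched with a plain dictionary standing in for the spherical arrangements. This is an illustration under simplifying assumptions: a real arrangement also stores the vertices, arcs and faces induced by the circles, and the names `intersection_circle` and `apply_modified_list` are hypothetical.

```python
import math


def intersection_circle(c1, r1, c2, r2):
    """Circle where two spheres meet: (center, radius, unit normal), or None."""
    d = math.dist(c1, c2)
    if d == 0 or d >= r1 + r2 or d <= abs(r1 - r2):
        return None  # disjoint, tangent, concentric, or one inside the other
    t = (d * d + r1 * r1 - r2 * r2) / (2 * d)  # distance from c1 to the circle's plane
    normal = tuple((b - a) / d for a, b in zip(c1, c2))
    center = tuple(a + t * n for a, n in zip(c1, normal))
    return center, math.sqrt(r1 * r1 - t * t), normal


def apply_modified_list(circles, centers, radii, deleted, inserted, updated):
    """circles[i][j] holds the circle that atom j cuts on atom i.

    Returns the set of atoms whose entries were touched (the p atoms).
    """
    touched = set()
    for i, j in deleted:
        circles[i].pop(j, None)
        circles[j].pop(i, None)
        touched.update((i, j))
    for i, j in list(inserted) + list(updated):
        circle = intersection_circle(centers[i], radii[i], centers[j], radii[j])
        # Overwriting the entry removes the old circle and adds the new one.
        circles[i][j] = circle
        circles[j][i] = circle
        touched.update((i, j))
    return touched
```

Only the atoms named in the Modified Intersections List are visited, which mirrors the O(p) cost stated above.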
28
Example
Backbone of 4PTI - A single 180° DOF change of the Ψ angle of the 13th amino acid
Affected atoms : 14 out of 454 (p out of n)
Modified intersection circles : 13
29
Example - Continued
(Hemi)spherical arrangement of one of the affected atoms (the N atom of the 14th amino acid) of 4PTI before (left) and after (right) the mentioned DOF change
30
Dynamic Controlled Perturbation
Goals :
• Perturb as few atoms as possible
– For efficiency
– To reduce errors
• Avoid cascading errors caused by
– Perturbing an atom several times in different simulation steps
– Changing a torsion angle several times
Type I Degeneracies
• Extend the Modified Intersections List to include also pairs of atoms that almost intersect
• Check all atoms in the Modified Intersections List that belong to inserted and updated pairs and the atoms that belong to near intersecting pairs
• Each of these atoms is checked against the atoms that intersect it or almost intersect it
• The center of an atom that causes a degeneracy is perturbed within a sphere of radius δ around the original center of the atom within its reference frame
• The spherical arrangement of a perturbed atom must be re-computed from scratch
32
Avoiding Errors in the Transformations
• In each DOF, accumulate the sum of the angle changes, and calculate a single rotation matrix (instead of combining several rotations)
• Use exact arithmetic with arbitrary-precision rational numbers to compute the sines and cosines of the rotations – turned off in current experiments, too slow
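The first bullet can be illustrated with a short sketch: each DOF stores only the running sum of its angle changes and builds one rotation matrix from that sum on demand, so rounding errors from multiplying many matrices never accumulate. Treating the torsion axis as the local z axis is an assumption made for brevity, and the class name is hypothetical.

```python
import math


class TorsionDOF:
    """Keeps the accumulated torsion angle instead of a product of rotations."""

    def __init__(self):
        self.total_angle = 0.0  # radians

    def change(self, delta):
        self.total_angle += delta

    def rotation(self):
        # One matrix computed from the summed angle (rotation about local z).
        c, s = math.cos(self.total_angle), math.sin(self.total_angle)
        return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def apply(matrix, point):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m * p for m, p in zip(row, point)) for row in matrix)


dof = TorsionDOF()
for step in (math.pi / 4, math.pi / 4):  # two successive changes of 45 degrees
    dof.change(step)
rotated = apply(dof.rotation(), (1.0, 0.0, 0.0))
```

The matrix produced this way is always an exact-form rotation (its entries come from a single sine and cosine), whereas a long product of matrices can slowly lose orthogonality.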
33
Type II Degeneracies
• The same set of atoms is tested
• For perturbed atoms we re-calculate their spherical arrangements from scratch
34
Running Time
• The expected update time of the spherical arrangements including the perturbation time is O(p)
35
Modify the Boundary and Surface Area
Naïve method :
• The same method used for the initial construction – traverse the outer boundary, and then traverse the voids
• Some savings :
– No need to recalculate the surface area of regions that weren’t updated
– No need to recalculate the exposed regions of atoms that weren’t updated
The Cost : O(n)
36
Dynamic Graph Connectivity
• We use a Dynamic Graph Connectivity algorithm introduced by Holm, De Lichtenberg & Thorup (2001)
• We define the boundary graph :
– Each exposed region of the spherical arrangements is a vertex of the graph
– Two vertices of the graph are connected by an edge if their respective regions are adjacent on the boundary of the molecule
– A connected component of the graph corresponds to a connected component of the boundary of the molecule (outer boundary or voids)
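A minimal sketch of the boundary graph idea: given the exposed regions with their areas and the adjacency edges between them, group the regions into connected components and sum the area of each. Note that this uses a static union-find pass over the whole graph, so it corresponds to recomputing from scratch rather than to the dynamic algorithm of Holm, De Lichtenberg & Thorup; the region names and areas are invented.

```python
def component_areas(region_area, adjacency):
    """region_area: dict region -> area; adjacency: iterable of (region, region).

    Returns the total area of every connected component, largest first.
    """
    parent = {v: v for v in region_area}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in adjacency:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    totals = {}
    for v, area in region_area.items():
        root = find(v)
        totals[root] = totals.get(root, 0.0) + area
    return sorted(totals.values(), reverse=True)


# Hypothetical regions: a and b are adjacent on one component, c is a separate one.
areas = component_areas({"a": 1.0, "b": 2.0, "c": 4.0}, [("a", "b")])
```

Union-find handles insertions well but not deletions, which is why a fully dynamic connectivity structure is needed once regions disappear during the simulation.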
37
Boundary Graph Illustration
38
Updating the Boundary Graph
• After the spherical arrangements are modified (in an accepted DOF change) :
– Remove all the vertices corresponding to modified or deleted regions (with their incident edges)
– Add new vertices corresponding to modified or new regions
– Add new edges connecting the new vertices to each other and to the rest of the graph
39
HDT Graph Connectivity Algorithm
• A poly-logarithmic deterministic fully-dynamic algorithm for graph connectivity :
– Maintains a spanning forest of a graph
– Answers connectivity queries in O(log n) time in the worst case
– Uses O(log² n) amortized time per insertion or deletion of an edge
– n, the number of vertices of the graph, is fixed as edges are added and removed
40
The General Idea of the Algorithm
• A spanning forest F of the input graph G is maintained
• Each tree in each spanning forest is represented by a data structure called ET-tree, which allows for O(log n) time splits and merges
41
ET-tree
[Figure: a spanning tree, its Euler tour, and the resulting ET-tree]
42
ET-tree properties
• Merging two ET-trees or splitting an ET-tree takes O(log n) time while maintaining the balance of the trees
• Each vertex of the original tree may appear several times in the ET-tree. One occurrence is chosen arbitrarily as representative
• Each internal node of the ET-tree represents all the representative leaves on its sub-tree, and may hold data that represent these leaves
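To show what the ET-tree is built over, the sketch below computes the Euler tour of a small spanning tree and picks one representative occurrence per vertex. It does not build the balanced tree itself, and choosing the first occurrence as representative is just one of the arbitrary choices the slide allows; the function names are hypothetical.

```python
def euler_tour(tree, root):
    """tree: dict vertex -> list of neighbours in an undirected tree.

    Returns the sequence of vertex occurrences visited by an Euler tour.
    """
    tour = []

    def visit(v, parent):
        tour.append(v)
        for w in tree[v]:
            if w != parent:
                visit(w, v)
                tour.append(v)  # v occurs again after returning from each child

    visit(root, None)
    return tour


def representatives(tour):
    """Index of the chosen occurrence of each vertex (here: the first one)."""
    chosen = {}
    for index, v in enumerate(tour):
        chosen.setdefault(v, index)
    return chosen


tree = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
tour = euler_tour(tree, "a")
reps = representatives(tour)
```

A tree with n vertices yields a tour of 2n - 1 occurrences, and the ET-tree is a balanced binary tree whose leaves are these occurrences in order.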
43
Spanning Forests Hierarchy
• The edges of the graph are split into lmax = log2 n levels
• A hierarchy F = F0 ⊇ F1 ⊇ … ⊇ Flmax of spanning forests is maintained, where Fi is the sub forest of F induced by the edges of level ≥ i
• Invariants :
– If (v,w) is a non-tree edge, v and w are connected in F_l(v,w)
– The maximal number of nodes in a tree (component) of Fi is n/2^i
44
Updating the Graph
• Insert an edge – added to level 0. If it connects two components, it becomes a tree edge (the components are merged)
• Remove a non-tree edge – trivial
• Remove a tree edge - more difficult. We must search for an edge that replaces the removed edge on the relevant spanning tree
45
Removing a Tree Edge
• The removal of a tree edge e=(v,w) splits its tree to Tv and Tw (Tv is the smaller one)
• The replacement edge can be found only on levels ≤ l(e)
• On each level i ≤ l(e) (starting with l(e) and working down) :
– Promote the level-i tree edges of Tv to the next level
– Each level-i non-tree edge incident to vertices of Tv is tested
  • If it reconnects the split component, we are done
  • If not, we promote it to the next level
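The core of the search can be sketched on a single level. This simplified version collects the smaller side of the split tree and scans the non-tree edges for one with exactly one endpoint in it; it deliberately omits the level hierarchy and edge promotion, so it does not have the poly-logarithmic amortized bound. The function name and data layout are assumptions.

```python
def find_replacement(tree_adj, non_tree_edges, v, w):
    """tree_adj: spanning forest adjacency AFTER tree edge (v, w) was removed.

    Returns a non-tree edge that reconnects the two sides, or None.
    """
    def side(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for x in tree_adj[u]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return seen

    t_v, t_w = side(v), side(w)
    small = t_v if len(t_v) <= len(t_w) else t_w  # search from the smaller tree
    for a, b in non_tree_edges:
        if (a in small) != (b in small):  # exactly one endpoint in the small side
            return (a, b)
    return None


# Path 1-2-3-4 with tree edge (2, 3) removed; (1, 4) is a non-tree edge.
forest = {1: {2}, 2: {1}, 3: {4}, 4: {3}}
replacement = find_replacement(forest, [(1, 4)], 2, 3)
```

In the full algorithm every edge that fails the test is promoted one level, which is what pays for the scan in the amortized analysis.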
46
Amortization Argument
• The amortization argument of the algorithm is based on increasing the levels of the edges (the level of each edge can be increased at most lmax times)
47
Illustration of the Algorithm
48
Our Extensions
• We allow vertices of the graph to be inserted and removed. This has no effect on the amortized running time, because throughout the simulation the number of vertices remains O(n)
• In each representative occurrence of each ET-tree we store the area of the relevant region
• Each internal node of each ET-tree holds the sum of the areas of the representative leaves in its sub-tree
• Maintaining the area information takes O(log n) time per split or merge of the ET-trees
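The area bookkeeping can be sketched with any balanced sequence tree that supports split and merge. The example below uses a randomized treap as a stand-in (an assumption: unlike the ET-tree described above, it stores an item in every node rather than only in leaves, and its O(log n) bounds are expected rather than worst case). Non-representative occurrences would simply carry area 0.

```python
import random


class Node:
    def __init__(self, area, rng):
        self.area = area        # area stored at this occurrence
        self.total = area       # sum of areas in this subtree
        self.size = 1
        self.prio = rng.random()
        self.left = None
        self.right = None


def _pull(n):
    """Recompute the subtree sum and size from the children."""
    n.total = n.area + (n.left.total if n.left else 0.0) + (n.right.total if n.right else 0.0)
    n.size = 1 + (n.left.size if n.left else 0) + (n.right.size if n.right else 0)
    return n


def merge(a, b):
    """Concatenate sequence a followed by sequence b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return _pull(a)
    b.left = merge(a, b.left)
    return _pull(b)


def split(n, k):
    """Split into (first k items, remaining items)."""
    if n is None:
        return None, None
    left_size = n.left.size if n.left else 0
    if k <= left_size:
        a, b = split(n.left, k)
        n.left = b
        return a, _pull(n)
    a, b = split(n.right, k - left_size - 1)
    n.right = a
    return _pull(n), b


rng = random.Random(0)
root = None
for area in [1.0, 2.0, 3.0, 4.0]:  # hypothetical region areas in tour order
    root = merge(root, Node(area, rng))
head, tail = split(root, 1)
```

After any split or merge, the `total` field of a root is the area of the whole component, so reading a component's area needs no traversal.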
49
ET-tree with Areas
50
The Running Time
• Maintaining the area information for the spanning forest F takes O(log² n) amortized time for each insertion or deletion of an edge
• Finding the connected component of a given region of the boundary takes O(log n) time
• The amortized cost of recalculating the surface area of the outer boundary and voids of the molecule is O(p log² n)
• The cost of computing the contribution of a given atom to the boundary and all the voids is O(log n)
51
Implementation Details
• Order of edge deletion
• Recycling of deleted vertices
• Heuristics
52
Heuristics
• Sampling – Search for a replacement edge within the first s non-tree edges, without promotion
• Truncating Levels – Perform simple search (no promotion) for trees with less than b nodes
53
Complexity Summary
Initial construction of the arrangements and boundary (including perturbation) : O(n)
Updating the ChainTree : O(k log(n/k))
Testing for self collision : Θ(n^(4/3))
Updating the IntersectionsTree : Θ(n^(4/3))
Updating the arrangements (including perturbation) : O(p)
Updating the boundary : O(n) or O(p log² n)
54
Breakdown of Running Time
55
Experimental Results : Inputs
Input File   # of Atoms   # of Amino Acids   # of Links   Max m   Mean m   Graph Size |V|, |E|
4PTI         454          58                 117          10      5.79     3405, 10553
1BZM         2034         260                521          10      5.74     15254, 47266
2GLS         3636         468                937          13      6.33     29385, 90820
1JKY         5614         748                1497         13      6.24     45558, 138818
1KEE         8181         1058               2117         13      5.87     62308, 191317
1EA0         11180        1452               2905         13      6.14     84536, 260096
56
The Experiments
• Executed on a 1 GHz Pentium III machine with 2 GB of RAM
• Only one chain is read from each PDB file
• 1000 simulation steps
• Each step k DOFs are chosen uniformly at random
• For each chosen DOF a uniform random change is chosen between -1° and 1°
• The results reflect the average running times of accepted simulation steps (usually several hundreds)
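The way a step is proposed in these experiments can be sketched in a few lines. The function name is hypothetical, and choosing the k DOFs without repetition is an assumption; the slide only says they are chosen uniformly at random.

```python
import random


def propose_step(num_dofs, k, rng, max_change_deg=1.0):
    """Pick k distinct DOFs and a uniform angle change in [-1, 1] degrees for each."""
    chosen = rng.sample(range(num_dofs), k)
    return {dof: rng.uniform(-max_change_deg, max_change_deg) for dof in chosen}


rng = random.Random(42)
# 4PTI has 58 amino acids, so roughly two backbone DOFs (Φ and Ψ) per residue.
step = propose_step(num_dofs=116, k=5, rng=rng)
```

A proposed step like this is then tested for self collisions, and only accepted steps contribute to the reported averages.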
57
Average Number of Modified Atoms and Circles
58
Modification Times for Accepted Steps
Input File   #Atoms   Initial Construct.   1-DOF          5-DOFs          20-DOFs         50-DOFs
4PTI         454      1.95                 0.11 (5.5%)    0.48 (24.4%)    0.83 (42.6%)    1.32 (67.5%)
1BZM         2034     8.79                 0.61 (7%)      1.49 (16.9%)    2.24 (25.5%)    2.79 (31.7%)
2GLS         3636     18.25                0.57 (3.1%)    1.45 (7.9%)     2.65 (14.5%)    4.3 (23.5%)
1JKY         5614     27.31                0.61 (2.3%)    1.43 (5.2%)     2.81 (10.3%)    4.15 (15.2%)
1KEE         8181     36.48                1.1 (3%)       2.29 (6.3%)     3.51 (9.6%)     4.92 (13.5%)
1EA0         11180    53.53                1.29 (2.4%)    2.91 (5.4%)     4.79 (8.9%)     6.25 (11.7%)
59
Observations
• Strong connection between the number of simultaneous DOF changes and the number of modified atoms
• The algorithm is more effective for larger molecules
• Faster update times for small number of simultaneous DOF changes
• The implementation runs in time proportional to p
60
Dynamic Connectivity Implementation
• Using the implementation by Iyer, Karger, Rahul & Thorup of the dynamic graph connectivity algorithm of Holm, De Lichtenberg & Thorup
• Improved performance for small number of simultaneous DOF changes
61
Naive vs. Dynamic connectivity
Input File   #Atoms   Naïve algorithm (1-DOF)   Dynamic connectivity (1-DOF)   Improvement
4PTI         454      0.11                      0.09                           11%
1BZM         2034     0.61                      0.56                           9%
2GLS         3636     0.57                      0.37                           36%
1JKY         5614     0.61                      0.27                           55%
1KEE         8181     1.1                       0.65                           41%
1EA0         11180    1.29                      0.64                           50%
62
Naive vs. Dynamic connectivity
Input File   #Atoms   Naïve algorithm (5-DOF)   Dynamic connectivity (5-DOF)   Improvement
4PTI         454      0.48                      0.51                           -7%
1BZM         2034     1.49                      1.57                           -6%
2GLS         3636     1.45                      1.39                           4%
1JKY         5614     1.43                      1.18                           18%
1KEE         8181     2.29                      2.03                           11%
1EA0         11180    2.91                      2.54                           13%
63
Breakdown of Running Time – Naïve vs. Dynamic Connectivity
[Figure: two panels, Naïve Connectivity and Dynamic Connectivity]
64
Heuristics
[Figure: two panels, 1-DOF and 20-DOFs]
65
Future Work
• Allow DOFs in side chains of the protein
• Extend the work to volume calculations
• Extend the implementation to smooth molecular surfaces
• Speed up the implementation
66
References
The material presented in class is mainly based on the following papers:
Eyal and Halperin ’05, Dynamic maintenance of molecular surfaces under conformational changes, To appear in Proceedings of the 21st ACM Symposium on Computational Geometry (SoCG’05)
http://www.cs.tau.ac.il/~eyaleran/dynamic_surfaces.pdf
Eyal and Halperin ’05, Improved maintenance of molecular surfaces using dynamic graph connectivity, Manuscript
http://www.cs.tau.ac.il/~eyaleran/dynamic_connectivity.pdf
67
Additional References
Our work combines and extends the following previous work:
• Halperin and Overmars ’98, Spheres, molecules and hidden surface removal, Computational Geometry: Theory & Applications, Vol. 11(2), pp. 83-102
• Halperin and Shelton ’98, A perturbation scheme for spherical arrangements with application to molecular modeling, Computational Geometry: Theory & Applications, Vol. 10, pp. 273-287
• Lotan et al ’04, Algorithm and data structures for efficient energy maintenance during Monte Carlo simulation of proteins, Journal of Computational Biology, Vol. 11(5), pp. 902-932
68
Some More References
The dynamic graph connectivity we use is based on the following paper:
Holm, De Lichtenberg & Thorup ’01, Poly-logarithmic deterministic fully-dynamic algorithms for connectivity…, Journal of the ACM, Vol. 48(4), pp. 723-760
and its implementation:
Iyer, Karger, Rahul & Thorup ’01, An experimental study of poly-logarithmic, fully dynamic, connectivity algorithms, J. Exp. Algorithmics, Vol. 6, pp. 4-