
Algorithms for Faster Molecular Energetics, Forces and Interfaces

by

Rezaul Chowdhury and Chandrajit Bajaj

The Institute for Computational Engineering and Sciences
The University of Texas at Austin
Austin, Texas 78712

ICES REPORT 10-32

August 2010

Reference: Rezaul Chowdhury and Chandrajit Bajaj, "Algorithms for Faster Molecular Energetics, Forces and Interfaces", ICES REPORT 10-32, The Institute for Computational Engineering and Sciences, The University of Texas at Austin, August 2010.


Algorithms for Faster Molecular Energetics, Forces and Interfaces ∗

Rezaul Alam Chowdhury Chandrajit Bajaj

August 6, 2010

Abstract

Bio-molecules reach their stable configuration in solvent, which is primarily water with a small concentration of salt ions. One approximation of the total free energy of a bio-molecule includes the classical molecular mechanical energy $E_{MM}$ (understood as the intra-molecular self-energy in vacuum) and the solvation energy $G_{sol}$, which is caused by the change of the molecule's environment from vacuum to solvent (and hence is also known as the molecule-solvent interaction energy). This total free energy is used to model and study the stability of bio-molecules in isolation or in their interactions with drugs. In this technical report we present fast octree-based approximation schemes for estimating the compute-intensive terms of $E_{MM}$ and $G_{sol}$. The algorithms run in $O(M \log M)$ time and use $O(M)$ space, where $M$ is the number of atoms in the molecule. Additionally, we show how to approximate the polarization force (i.e., the derivatives of the polarization energy) acting on all $M$ atoms of the molecule within the same time and space bounds. The algorithms for $G_{sol}$ and polarization forces depend on an $O(M)$-size sampling of the biomolecular surface and its spatial derivatives (normals). We also present fast octree-based algorithms for approximating interface areas (plain as well as hydrophobic and hydrophilic) of bio-molecular complexes. We include several examples with timing results and speed/accuracy tradeoffs, demonstrating the efficiency and scalability of our fast free energy estimation for bio-molecules, potentially with millions of atoms.

1 Introduction

Bio-molecules (primarily proteins and nucleic acids) gain their stable configuration in solvent, which is primarily water with a small concentration of salt ions. The total free energy (defined below) of a bio-molecule includes its (intra-)molecular mechanical energy in vacuum and the molecule-solvent interaction energy (or solvation energy) caused by moving the molecule from vacuum to solvent. This total free energy is used to model and study the stability of bio-molecules in isolation or in their interactions with drugs. In molecular dynamics simulations, one samples and steers the dynamic molecular motion in an attempt to determine the molecular conformation with minimal total free energy. This energy is also used in ranking (or re-ranking) protein-drug bindings in stable molecular docking. The binding energy is the total free energy of the protein-drug complex in the bound state minus the individual free energies of the isolated protein and drug molecules. If the drug is modified, the change in the binding energy specifies how the modification affects the molecule-drug interaction. This technique is basic to the drug design process, where the drug with the best binding to some target protein is chosen so as to either enhance or inhibit the behavior of the target.

∗This research was supported in part by NIH grants R01-GM074258, R01-GM073087, and R01-EB004873, and by a grant from the UT-Portugal CoLab project.


Binding Energy. The binding free energy of a complex formed by two given molecules $A$ and $B$ is given by

$$\Delta E_{A,B} = E_{A+B} - (E_A + E_B), \tag{1}$$

where $E_A$, $E_B$ and $E_{A+B}$ are the total free energies of molecule $A$, molecule $B$ and the complex $A+B$, respectively. The total free energy of a system consisting of a solute biomolecule in a solvent environment is given by

$$E = \underbrace{E_{MM} + E_{sol}}_{\text{potential energy}} - TS,$$

where $E_{MM}$ is the classical molecular mechanical energy of the solute, $E_{sol}$ is the solvation energy, $T$ is the system temperature, and $S$ is the solute entropy.

Molecular Mechanical Energy. The molecular mechanical energy is defined as follows [42].

$$E_{MM} = \underbrace{E_d + E_\theta + E_\phi}_{\text{bonded interactions}} + \underbrace{E_{vdw} + E_{coul}}_{\text{nonbonded interactions}}$$

The first three terms represent bonded interactions based on bond lengths ($E_d$), bond angles ($E_\theta$), and torsions around bonds ($E_\phi$). The last two terms represent nonbonded interactions: the Lennard-Jones potential for van der Waals forces ($E_{vdw}$) and the Coulomb potential for electrostatics ($E_{coul}$).

The bonded interactions are computed as follows.

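$$E_d = \sum_{\text{bond lengths } d} k_d (d - d_{eq})^2, \qquad E_\theta = \sum_{\text{bond angles } \theta} k_\theta (\theta - \theta_{eq})^2, \qquad E_\phi = \sum_{\text{torsions } \phi} k_\phi \left(1 - \cos[n(\phi - \phi_{eq})]\right),$$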

where $k_d$, $k_\theta$ and $k_\phi$ are force constants for bond lengths, bond angles and bond dihedrals, respectively, and $n$ is the multiplicity (periodicity) of the torsional term. Bond lengths, angles and torsions in the current configuration are denoted by $d$, $\theta$ and $\phi$, respectively, while $d_{eq}$, $\theta_{eq}$ and $\phi_{eq}$ represent their values at equilibrium.
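To make the three bonded sums concrete, the following is a minimal Python sketch that evaluates $E_d$, $E_\theta$ and $E_\phi$ for explicitly listed terms and computes a torsion angle $\phi$ from four atom positions. It is not the report's code: the helper names, the tuple layouts `(k, value, equilibrium value)` and `(k, n, phi, phi_eq)`, and the use of radians are illustrative assumptions, and force-field parameter lookup is omitted.

```python
import math


def bond_energy(bonds):
    """E_d = sum of k_d * (d - d_eq)^2 over (k_d, d, d_eq) tuples."""
    return sum(k * (d - d_eq) ** 2 for k, d, d_eq in bonds)


def angle_energy(angles):
    """E_theta = sum of k_theta * (theta - theta_eq)^2; angles in radians."""
    return sum(k * (t - t_eq) ** 2 for k, t, t_eq in angles)


def torsion_energy(torsions):
    """E_phi = sum of k_phi * (1 - cos(n * (phi - phi_eq))); angles in radians."""
    return sum(k * (1.0 - math.cos(n * (phi - phi_eq)))
               for k, n, phi, phi_eq in torsions)


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) about the p1-p2 bond; 0 = cis, pi = trans."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Usage: one stretched bond, and a trans torsion measured against a cis equilibrium.
phi = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(bond_energy([(2.0, 1.1, 1.0)]))        # approximately 0.02
print(torsion_energy([(1.0, 1, phi, 0.0)]))  # 1 - cos(pi) = 2.0
```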

The two nonbonded interactions are defined as follows.

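$$E_{vdw} = \sum_i \sum_{j>i} \left( \frac{a_{ij}}{r_{ij}^{12}} - \frac{b_{ij}}{r_{ij}^{6}} \right) \quad \text{and} \quad E_{coul} = \sum_i \sum_{j>i} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}},$$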

where $r_{ij}$ is the distance between atoms $i$ and $j$, $a_{ij}$ and $b_{ij}$ are constants based on atom types, $q_i$ and $q_j$ are Coulombic charges, and $\varepsilon(r_{ij})$ is a distance-dependent dielectric constant. Since the bonded interactions are typically computed from efficient lookup tables, the primary challenge in computing $E_{MM}$ lies in the nonbonded pairwise summation terms, which involve $O(M^2)$ atom pairs when evaluated directly. Various methods have been used for approximating such pairwise summations, e.g., Barnes-Hut clustering [7], the fast multipole method (FMM) [27, 10, 14], particle-particle particle-mesh (PPPM) [33, 45], particle-mesh Ewald (PME) [17, 21], and multilevel summation [51, 29, 54].
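The cost that these methods avoid is easiest to see in the direct evaluation. The Python sketch below computes $E_{vdw}$ and $E_{coul}$ by visiting every pair $j > i$, which takes $O(M^2)$ time. The function name, the nested-list layout of $a_{ij}$ and $b_{ij}$, the `excluded` set used to skip bonded pairs, and the example dielectric $\varepsilon(r) = 4r$ are illustrative assumptions; as in the formula above, no unit-conversion constant is included in the Coulomb term.

```python
import math


def nonbonded_energy(coords, charges, a, b, eps=lambda r: 1.0, excluded=frozenset()):
    """Direct O(M^2) evaluation of E_vdw and E_coul over atom pairs i < j.

    coords: list of (x, y, z); charges: list of q_i; a, b: M x M nested lists of
    Lennard-Jones constants; eps: distance-dependent dielectric epsilon(r);
    excluded: set of (i, j) pairs with i < j (e.g. bonded pairs) to skip.
    """
    e_vdw = e_coul = 0.0
    m = len(coords)
    for i in range(m):
        for j in range(i + 1, m):
            if (i, j) in excluded:
                continue
            r = math.dist(coords[i], coords[j])
            r6 = r ** 6
            e_vdw += a[i][j] / (r6 * r6) - b[i][j] / r6
            e_coul += charges[i] * charges[j] / (eps(r) * r)
    return e_vdw, e_coul


# Usage: two atoms 2 units apart, with the distance-dependent dielectric 4r.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ab = [[0.0, 1.0], [1.0, 0.0]]
print(nonbonded_energy(coords, [1.0, -1.0], ab, ab, eps=lambda r: 4.0 * r))
```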

Solvation Energy. The solvation energy $E_{sol}$ consists of the energy to form a cavity in the solvent ($E_{cav}$), the solute-solvent van der Waals interaction energy ($E_{vdw(s\text{-}s)}$), and the electrostatic potential energy change due to solvation (also known as the polarization energy, $E_{pol}$) [20, 25, 32, 49, 50].

$$E_{sol} = \underbrace{E_{cav} + E_{vdw(s\text{-}s)}}_{\text{nonpolar}} + \underbrace{E_{pol}}_{\text{polar}}$$


The first two terms are often modeled as [20, 56]

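$$E_{cav} = pV + \sum_i \gamma_i A_i \quad \text{and} \quad E_{vdw(s\text{-}s)} = \rho_0 \sum_i \int_{ex} u_i^{(att)}(\mathbf{x}_i, \mathbf{r})\, d^3\mathbf{r},$$

where $p$ is the solvent pressure, $V$ is the molecular volume, $A_i$ is the solvent accessible surface area of atom $i$ and $\gamma_i$ is its solvation parameter, $\rho_0$ is the bulk density, $u_i^{(att)}$ is the van der Waals dispersive component of the interaction between atom $i$ and the solvent, and the integral runs over the solvent region $ex$ exterior to the molecule.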

The polarization energy has the form

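$$E_{pol} = \frac{1}{2} \int \phi_{reaction}(\mathbf{r})\, \rho(\mathbf{r})\, d\mathbf{r}, \tag{2}$$

where $\phi_{reaction} = \phi_{solvent} - \phi_{gas\text{-}phase}$, and $\phi(\mathbf{r})$ and $\rho(\mathbf{r})$ are the electrostatic potential and the charge density at $\mathbf{r}$, respectively.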

The Poisson-Boltzmann (PB) model is used to compute $E_{pol}$ by solving the following equation for the electrostatic potential $\phi$: $-\nabla \cdot (\varepsilon(\mathbf{x}) \nabla \phi(\mathbf{x})) = \rho(\mathbf{x})$. Numerical methods for solving this equation include the finite difference method [49, 38], the finite element method [34, 6], and the boundary element method [43]. However, due to their high computational costs, PB methods are rarely used for large molecules such as proteins. Instead, Equation (2) is approximated using the Generalized Born (GB) model, which we describe later in this section.

Now let us consider the binding free energy given by Equation (1) again. Assuming that the solvent temperature $T$ and the solute entropy $S$ are constant, and that the two molecules are mostly rigid (i.e., undergo negligible conformational change upon binding), we have

$$\Delta E_{A,B} \approx \Delta E_{vdw} + \Delta E_{coul} + \Delta E_{sol}.$$

Hence, for rigid-body binding ∆EA,B has three major components.

- $\Delta E_{vdw}$: $\sum_{i \in A, j \in B} \left( \frac{a_{ij}}{r_{ij}^{12}} - \frac{b_{ij}}{r_{ij}^{6}} \right)$.

- $\Delta E_{coul}$: $\sum_{i \in A, j \in B} \frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}}$ (long-range electrostatic potential). Inter-molecular hydrogen bonds and disulphide bonds are responsible for short-range electrostatic interactions.

- $\Delta E_{sol}$: The desolvation free energy is defined as the change in energy due to the displacement of solvent molecules from the interface. The existence of a large hydrophobic area at the interface is often considered a favorable condition for binding.

Generalized Born (GB) Model of Polarization Energy. The GB method has been shown to be very effective in approximating the polarization energy of large systems. This model was originally presented in [53], where the polar part was approximated by a pairwise sum over interacting charges:

$$E_{pol} = -\frac{1}{2}\left(1 - \frac{1}{\varepsilon_{solv}}\right) \sum_{i,j} \frac{q_i q_j}{\left[ r_{ij}^2 + R_i R_j \exp\left( -\frac{r_{ij}^2}{4 R_i R_j} \right) \right]^{\frac{1}{2}}}, \tag{3}$$


Figure 1: (a) The effective Born radius reflects how deeply a charge is buried inside the molecule. The Born radius of an atom is small if the atom is close to the molecular surface; otherwise the Born radius is large and the atom has a weaker interaction with the solvent. (b) Distances from the quadrature points sampled from the molecular surface, and the surface normals at those points, are used to approximate the Born radius of an atom.

where $R_i$ is the effective Born radius of atom $i$ (see Figure 1(a) for an intuitive explanation of $R_i$), $r_{ij}$ is the distance between atoms $i$ and $j$, and $\varepsilon_{solv}$ is the solvent dielectric. The evaluation of the $R_i$'s is essentially based on the Coulomb field approximation, which assumes that the electric displacement is in the Coulombic form [8]:

$$\frac{1}{R_i} = \frac{1}{4\pi} \int_{ex} \frac{1}{|\mathbf{r} - \mathbf{x}_i|^4}\, d^3\mathbf{r}, \tag{4}$$

where $\mathbf{x}_i$ is the center of atom $i$. Modifications of (4) have been developed, including adding empirical correction terms to compensate for the error caused by the Coulomb field approximation [41, 40, 36], and using pairwise descreening with rescaling parameters fitted to experimental or numerical energy results on training sets [30, 31].

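A surface formulation of (4) can be obtained by applying the divergence theorem to the volume integral [24]:

$$\frac{1}{R_i} = \frac{1}{4\pi} \int_{\Gamma} \frac{(\mathbf{r} - \mathbf{x}_i) \cdot \vec{n}(\mathbf{r})}{|\mathbf{r} - \mathbf{x}_i|^4}\, d^2\mathbf{r}, \tag{5}$$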

where $\Gamma$ is the molecular surface, and $\vec{n}(\mathbf{r})$ is the outward surface normal at $\mathbf{r}$. Equation (5) can be evaluated more efficiently than Equation (4) because of the decreased dimension. A discrete approximation of $R_i^{-1}$ can be obtained as follows by applying Gaussian quadrature [19]:

$$\frac{1}{R_i} = \frac{1}{4\pi} \sum_{k=1}^{N} \frac{w_k\, (\mathbf{r}_k - \mathbf{x}_i) \cdot \vec{n}_k}{|\mathbf{r}_k - \mathbf{x}_i|^4}, \tag{6}$$

where the $\mathbf{r}_k$'s are $N$ Gaussian quadrature points [19] on the molecular surface, $\vec{n}_k$ is the outward surface normal at $\mathbf{r}_k$, and $w_k$ is a weight assigned to $\mathbf{r}_k$ in order to achieve a higher order of accuracy for small $N$ ([5] explains how these weights are chosen). In [40, 36] a higher-order correction term is used to compensate for the error caused by the Coulomb field approximation of $\frac{1}{R_i}$:

$$\frac{1}{R'_i} = \left(1 - \frac{1}{\sqrt{2}}\right) \frac{1}{R_i} + \text{corr}, \tag{7}$$

where

$$\text{corr} = \left( \frac{1}{4\pi} \int_{ex} \frac{1}{|\mathbf{r} - \mathbf{x}_i|^7}\, d^3\mathbf{r} \right)^{\frac{1}{4}}. \tag{8}$$


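By applying the divergence theorem and Gaussian quadrature, we obtain a discrete surface-based approximation of (8):

$$\text{corr} = \left( \frac{1}{16\pi} \sum_{k=1}^{N} \frac{w_k\, (\mathbf{r}_k - \mathbf{x}_i) \cdot \vec{n}_k}{|\mathbf{r}_k - \mathbf{x}_i|^7} \right)^{\frac{1}{4}}. \tag{9}$$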
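The constants $\frac{1}{4\pi}$ in Equations (5) and (6), $\frac{1}{16\pi}$ in Equation (9) and $\frac{1}{4\pi}$ in Equation (11) can all be traced to one identity. The short derivation below assumes that $\vec{n}$ points out of the molecule (so the exterior region $ex$ has outward normal $-\vec{n}$ on $\Gamma$), that $\mathbf{x}_i$ lies inside the molecule, and that the surface term at infinity vanishes, which holds for the exponents $p > 3$ used here.

$$
\begin{gathered}
\nabla \cdot \frac{\mathbf{r}-\mathbf{x}_i}{|\mathbf{r}-\mathbf{x}_i|^{p}} = \frac{3-p}{|\mathbf{r}-\mathbf{x}_i|^{p}}
\quad\Longrightarrow\quad
\int_{ex} \frac{d^3\mathbf{r}}{|\mathbf{r}-\mathbf{x}_i|^{p}} = \frac{1}{p-3} \int_{\Gamma} \frac{(\mathbf{r}-\mathbf{x}_i)\cdot\vec{n}(\mathbf{r})}{|\mathbf{r}-\mathbf{x}_i|^{p}}\, d^2\mathbf{r}, \\
p = 4:\ \tfrac{1}{p-3} = 1, \text{ so the factor } \tfrac{1}{4\pi} \text{ of (4) carries over to (5) and (6)}; \\
p = 7:\ \tfrac{1}{4\pi}\cdot\tfrac{1}{4} = \tfrac{1}{16\pi} \text{ in (9)}; \qquad
p = 6:\ \tfrac{3}{4\pi}\cdot\tfrac{1}{3} = \tfrac{1}{4\pi} \text{ in (11)}.
\end{gathered}
$$

Replacing $\int_\Gamma (\cdot)\, d^2\mathbf{r}$ with the quadrature sum $\sum_k w_k (\cdot)$ then yields the discrete forms.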

The following alternative approximation of $R_i^{-1}$ [28] shows better accuracy for spherical solutes as well as for proteins [55]:

$$\frac{1}{R_i^3} = \frac{3}{4\pi} \int_{ex} \frac{1}{|\mathbf{r} - \mathbf{x}_i|^6}\, d^3\mathbf{r}. \tag{10}$$

Following [55], we will refer to Equation (10) as the $r^6$-approximation of $R_i$, while Equation (4) will be referred to as the $r^4$-approximation.

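We obtain the following discrete surface formulation of Equation (10) by applying the divergence theorem and Gaussian quadrature:

$$\frac{1}{R_i^3} \approx \frac{1}{4\pi} \sum_{k=1}^{N} \frac{w_k\, (\mathbf{r}_k - \mathbf{x}_i) \cdot \vec{n}_k}{|\mathbf{r}_k - \mathbf{x}_i|^6}. \tag{11}$$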
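The discrete formulas (6), (7), (9) and (11) can be exercised on a case with a known answer. The Python sketch below is a minimal illustration under stated assumptions: it replaces the Gauss quadrature points of the surface triangulation with equal-weight points on a sphere (a Fibonacci lattice, which is not Gaussian quadrature), and the names `fibonacci_sphere` and `born_radii` are hypothetical. For an atom at the center of a spherical molecule of radius $a$, every radius should come out as $a$, and an atom closer to the surface should get a smaller radius, as in Figure 1(a).

```python
import math


def fibonacci_sphere(center, radius, n):
    """Equal-weight points, outward normals and weights on a sphere (not Gaussian quadrature)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts, nrm = [], []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        u = (rho * math.cos(golden * k), rho * math.sin(golden * k), z)
        nrm.append(u)
        pts.append(tuple(c + radius * ui for c, ui in zip(center, u)))
    return pts, nrm, [4.0 * math.pi * radius ** 2 / n] * n


def born_radii(centers, points, normals, weights):
    """Per atom: (r^4 radius, Eq. 6; corrected radius, Eqs. 7 and 9; r^6 radius, Eq. 11)."""
    out = []
    for x in centers:
        s4 = s6 = s7 = 0.0
        for r, n, w in zip(points, normals, weights):
            d = [ri - xi for ri, xi in zip(r, x)]
            dist = math.sqrt(sum(di * di for di in d))
            proj = sum(di * ni for di, ni in zip(d, n))  # (r_k - x_i) . n_k
            s4 += w * proj / dist ** 4
            s6 += w * proj / dist ** 6
            s7 += w * proj / dist ** 7
        if min(s4, s6, s7) <= 0.0:
            raise ValueError("non-positive surface sum; is the atom inside the surface?")
        inv_r4 = s4 / (4.0 * math.pi)                                  # Eq. (6)
        corr = (s7 / (16.0 * math.pi)) ** 0.25                          # Eq. (9)
        inv_corrected = (1.0 - 1.0 / math.sqrt(2.0)) * inv_r4 + corr    # Eq. (7)
        inv_r6_cubed = s6 / (4.0 * math.pi)                             # Eq. (11)
        out.append((1.0 / inv_r4, 1.0 / inv_corrected, inv_r6_cubed ** (-1.0 / 3.0)))
    return out


# Usage: a spherical "molecule" of radius 2 with one atom at its center and one off-center.
pts, nrm, wts = fibonacci_sphere((0.0, 0.0, 0.0), 2.0, 4000)
for radii in born_radii([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], pts, nrm, wts):
    print(radii)
```

Each atom costs $O(N)$ work here, so all $M$ atoms cost $O(MN)$; with $N = O(M)$ surface samples this is the quadratic cost that the fast schemes in this report are designed to avoid.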

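Traditional GB models do not consider the salt effect. The influence of the ionic atmosphere on the Coulomb interaction can be derived from the Debye-Hückel solution for the electrostatic potential [52]:

$$E_{pol} = -\frac{1}{2} \sum_{i,j} \left(1 - \frac{e^{-\kappa f^{GB}_{ij}}}{\varepsilon_{solv}}\right) \frac{q_i q_j}{f^{GB}_{ij}}, \tag{12}$$

where $\kappa$ is the Debye-Hückel screening parameter, which is proportional to the square root of the ionic concentration, and

$$f^{GB}_{ij} = \left[ r_{ij}^2 + R_i R_j \exp\left( -\frac{r_{ij}^2}{4 R_i R_j} \right) \right]^{\frac{1}{2}}.$$

For $\kappa = 0$, Equation (12) reduces to Equation (3).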
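A direct evaluation of Equations (3) and (12) is a useful reference against which any fast approximation can be checked. The Python sketch below is illustrative only: `gb_polarization_energy` is a hypothetical name, the Born radii are taken as inputs (for example from Equation (6) or (11)), the default $\varepsilon_{solv} = 80$ is an assumed value close to that of water, and no unit-conversion constant is applied. It runs in $O(M^2)$ time.

```python
import math


def gb_polarization_energy(coords, charges, born_radii, eps_solv=80.0, kappa=0.0):
    """Direct O(M^2) evaluation of Eq. (12); kappa = 0 recovers Eq. (3).

    The double sum runs over all ordered pairs (i, j), including i == j
    (the Born self-energy terms). No unit-conversion constant is applied.
    """
    total = 0.0
    for xi, qi, ri in zip(coords, charges, born_radii):
        for xj, qj, rj in zip(coords, charges, born_radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            rr = ri * rj
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            screening = 1.0 - math.exp(-kappa * f_gb) / eps_solv
            total += screening * qi * qj / f_gb
    return -0.5 * total


# Usage: a single unit charge with Born radius 2 reproduces the Born formula
# -(1/2) (1 - 1/eps_solv) q^2 / R when kappa = 0.
print(gb_polarization_energy([(0.0, 0.0, 0.0)], [1.0], [2.0]))
```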

The first two terms of $E_{sol}$ are related to the estimation of an analytic molecular surface (MS) area, which is easily estimated from the sparse $O(M)$ triangulation of the MS given in Section 2, where $M$ is the number of atoms in the molecule. In [5] we presented a fast $O(M \log M)$-time approximation algorithm based on non-uniform fast Fourier transforms for estimating the $M$ Born radii. However, $E_{pol}$ was estimated naïvely in $O(M^2)$ time.

Solvation Force. Solvation forces acting on atoms form an important part of the forces driving molecular dynamics [35], and deriving these forces computationally is much easier than detecting active sites on bio-molecules by experimental methods. In general, strong solvation forces acting on an atom indicate that the atom is sensitive to changes in the solvent environment, and hence is likely to be part of an active site of the molecule. Weak solvation forces, on the other hand, imply stability, and such stable atoms can be updated less frequently than active atoms during MD simulations to save computational cost.

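The solvation force acting on an atom $\alpha$ located at $\mathbf{x}_\alpha$ is defined as

$$F^{sol}_{\alpha} = -\frac{\partial E_{sol}}{\partial \mathbf{x}_\alpha} = -\frac{\partial}{\partial \mathbf{x}_\alpha}\left(E_{cav} + E_{vdw(s\text{-}s)}\right) - \frac{\partial E_{pol}}{\partial \mathbf{x}_\alpha}.$$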
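The definition of $F^{sol}_\alpha$ can be checked numerically. The Python sketch below estimates the polar part $-\partial E_{pol}/\partial \mathbf{x}_\alpha$ of Equation (3) by central differences. It is a deliberately simplified illustration, not the method of [5]: it holds every Born radius $R_i$ fixed, so it omits the terms arising from the dependence of the $R_i$ on atom positions (through the molecular surface), which a complete GB force must include. The function names and the step size `h` are assumptions.

```python
import math


def gb_energy(coords, charges, radii, eps_solv=80.0):
    """Eq. (3): direct O(M^2) GB polarization energy with fixed Born radii."""
    total = 0.0
    for xi, qi, ri in zip(coords, charges, radii):
        for xj, qj, rj in zip(coords, charges, radii):
            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
            rr = ri * rj
            total += qi * qj / math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
    return -0.5 * (1.0 - 1.0 / eps_solv) * total


def polarization_force_fd(coords, charges, radii, alpha, h=1e-5, eps_solv=80.0):
    """Central-difference estimate of -dE_pol/dx_alpha, holding all Born radii fixed."""
    force = []
    for d in range(3):
        plus = [list(x) for x in coords]
        minus = [list(x) for x in coords]
        plus[alpha][d] += h
        minus[alpha][d] -= h
        e_plus = gb_energy(plus, charges, radii, eps_solv)
        e_minus = gb_energy(minus, charges, radii, eps_solv)
        force.append(-(e_plus - e_minus) / (2.0 * h))
    return force


# Usage: force on atom 0 of a two-charge system.
print(polarization_force_fd([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.0, 1.0], [1.5, 1.5], 0))
```

Each component needs two full energy evaluations, so this serves only as a correctness check, not as a practical force routine.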

While the non-polar part of the force can be approximated using derivatives of the surface area and/or the volume of the molecule with respect to the location of the atom, computing the gradients of the polar part is not easy. Computing polarization forces using the PB method is difficult because of the discontinuous energy change arising from the sudden change in the value of the dielectric constant across the molecular surface [25]. However, there have been several attempts to compute polarization forces using the GB method [36, 57], including our fast summation based algorithm [5] that computes the polarization forces acting on all atoms of a molecule in $O(M^2 \log M)$ time and $O(M)$ space, where $M$ is the number of atoms in the molecule.

Figure 2: Molecular surfaces (MS) of the C60 fullerene: (a) the van der Waals MS; (b) solid sphere MS patch complex consisting of spherical and toroidal patches; (c) triangulated analytic MS; (d) quadrature points sampled from the MS.

Molecular Surfaces for Solvation Energy Computation. There is considerable prior work on various molecular surface representations of bio-molecules for visualization (see [16]), and hence we do not attempt a review here.

A commonly used spatial occupancy and molecular surface model for bio-molecules, which lacks derivative continuity, is the van der Waals surface (VWS) (see Figure 2): the union of solid spheres representing atoms, with van der Waals radii based on atom type [47]. Approximating the water molecule as a single solid sphere of radius 1.4 Å, and considering the envelope of all water spheres that contact and coincide with the VWS, yields a solid sphere molecular surface (MS) model [39]. Since the VWS has several regions that do not make contact with water, the MS is the preferred choice for estimating molecular free energies, especially the solvation terms. The problem with the solid sphere contact MS model, however, is the occurrence of cusp-like singularities, caused by self-intersections of re-entrant patches [15]. An analytic modification of the MS circumvents this issue [26]. The atomic solid sphere is replaced with a smooth Gaussian function, and the MS becomes a level set of the sum of these atomic Gaussian functions,

$$f(\mathbf{x}) = \sum_{k=1}^{M} e^{\beta\left(|\mathbf{x} - \mathbf{x}_k|^2 / r_k^2 - 1\right)},$$

where $\mathbf{x}_k$ is the center of atom $k$, $r_k$ is its van der Waals radius, $M$ is the number of atoms, and $\beta$ is a parameter that controls the rate of decay of the Gaussian. In [48], $\beta = -2.3$ with isovalue 1 is suggested as a good approximation to the molecular surface. Similar Gaussian-based MS with various improvements are used in [12, 23]; constructions using algebraic surface splines are given in [58], and using tensor product B-splines in [4]. However, for $E_{sol}$ estimation (as will be apparent in this technical report), smooth molecular surface models are needed, as they act as interfaces between the regions encapsulating the molecule's atoms and their contact with water. These molecular interfaces are implicitly used to estimate the interaction energy of the molecule with water. Smooth molecular surfaces with derivative continuity (i.e., analytic molecular surfaces) therefore provide more accurate and stable estimations of molecular energetics.
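The Gaussian level-set definition is short enough to state directly in code. The minimal Python sketch below evaluates $f(\mathbf{x})$ with the parameters quoted from [48] ($\beta = -2.3$, isovalue 1) as defaults; the function names are hypothetical, and the inside/outside test relies on $\beta < 0$, so that each atomic Gaussian decreases away from its center. A direct evaluation costs $O(M)$ per query point.

```python
import math


def gaussian_surface(x, centers, radii, beta=-2.3):
    """f(x) = sum_k exp(beta * (|x - x_k|^2 / r_k^2 - 1)) for the atoms (x_k, r_k)."""
    total = 0.0
    for c, r in zip(centers, radii):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += math.exp(beta * (d2 / (r * r) - 1.0))
    return total


def inside_molecule(x, centers, radii, beta=-2.3, isovalue=1.0):
    """With beta < 0, f decreases away from the atoms, so f >= isovalue means inside."""
    return gaussian_surface(x, centers, radii, beta) >= isovalue


# Usage: a point on the van der Waals sphere of a lone atom lies on the level set f = 1.
print(gaussian_surface((1.7, 0.0, 0.0), [(0.0, 0.0, 0.0)], [1.7]))
```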


Our Contributions. In this technical report we first show how to find a sparse triangulation¹ of any smooth analytic molecular surface (MS) in $O(M \log M)$ time and $O(M)$ space, where $M$ is the number of atoms in the molecule. The triangulation additionally yields an estimate of the molecular surface normals at the triangulation vertices and at the Gauss quadrature integration points in each triangle's interior [18]. A constant number of Gauss quadrature points per triangle suffices for high accuracy of the Born radii calculation [19]. This sparse triangulation is used by many of the algorithms described later in this paper.

We provide efficient approximation schemes for computing the Lennard-Jones potential ($E_{vdw}$), Coulomb potential ($E_{coul}$), Born radii, GB polarization energy ($E_{pol}$), dispersion energy ($E_{vdw(s\text{-}s)}$), and GB polarization force. We also present approximation algorithms for computing the interface area (both plain and hydrophobic/hydrophilic) of a complex formed by a pair of bio-molecules. All algorithms are based on a Barnes-Hut [7] type near and far decomposition of the data points in 3D using octrees [37]. An octree is a tree data structure that recursively and adaptively subdivides 3D space into 8 octants, and is often used as a container for rectilinear scalar field data. It uses space linear in the number of data points it holds. For a molecule with $M$ atoms, our approximation schemes for energy and force calculations run in $O\left(\frac{1}{\varepsilon^k} \cdot M \log M\right)$ time using $O(M)$ space, where $\varepsilon > 0$ is an approximation parameter and $k \in \{2, 3\}$. The larger the value of $\varepsilon$, the more approximate the results and the faster the algorithm runs, and vice versa. Thus these algorithms provide useful speed-accuracy tradeoffs, and can be tuned to the speed and accuracy needs of the application. Our algorithms for interface calculations run in $O(M \log M)$ time using space linear in $M$. All algorithms except those for the Lennard-Jones and Coulomb potentials use our sparse triangulation of the molecular surface. All algorithms in this paper are easily parallelizable.

The rest of this paper is organized as follows. In Section 2 we describe how to efficiently obtain a sparse triangulation of an analytic molecular surface. Our approximation schemes for the Lennard-Jones potential, Coulomb potential, Born radii, GB polarization energy, and dispersion energy are given in the subsections of Section 3. Section 4 describes our algorithm for approximating the GB polarization forces acting on all atoms of a molecule. In Section 5 we present our algorithms for approximating the interfaces of bio-molecular complexes. We describe the general strategy for parallelizing the algorithms presented in this paper in Section 6. Some experimental results on our energetics approximation schemes are given in Section 7. Finally, Section 8 includes some concluding remarks.

2 Sparse Triangulation of Analytic Molecular Surfaces

We first construct the solid sphere molecular surface (MS) of the molecule, which can be decomposed into a set of spherical and toroidal patches using the method in [15]. An efficient construction of this patch complex using power diagrams is given in [3]. The triangulation of this solid MS patch complex can be constructed patch by patch (Figure 2), since each of the spherical and toroidal surface patches is rational parametric with rational parametric curve boundaries (i.e., NURBS) [2]. The number of patches forming the MS is $O(M)$, where $M$ is the number of atoms in the molecule; furthermore, triangulating each patch with a constant number of triangles yields an overall MS triangulation $T$ of size $O(M)$.

¹A surface triangulation of a molecule is sparse provided the number of triangles in it is linear in the number of atoms in the molecule.


Figure 3: Triangulation of the MS patch complex: (a) convex spherical patch; (b) toroidal patch; (c) re-entrant concave spherical patch. (d) A prism $D_{ijk}$ constructed based on the triangle $[v_i v_j v_k]$ with vertices $v_l$ lying on the SES. The vertices $v_l$ are projected to $p_l$ on the Gaussian surface (the shaded surface) along their normal directions.

Our next step is to project the vertices of $T$ onto any of the smooth analytic molecular surfaces [26, 12, 23, 4], and construct an algebraic spline molecular surface (ASMS) [58] approximation of the analytic MS. The ASMS construction replaces each triangle of $T$, with smooth normals at its vertices, by a single cubic Hermite algebraic patch. The rational parameterization of each $C^1$ algebraic patch additionally allows us to sample a constant number of (Gauss) integration points (and normals) from the Gaussian MS for use in $E_{pol}$ estimation (see Section 3.3).

For any triangle $[v_i v_j v_k] \in T$, let $n_l$ be the pre-specified unit normal vector at vertex $v_l$ ($l \in \{i, j, k\}$). We define a prism $D_{ijk} := \{p : p = b_1 v_i(\lambda) + b_2 v_j(\lambda) + b_3 v_k(\lambda),\ \lambda \in I_{ijk}\}$, where $v_l(\lambda) = v_l + \lambda n_l$, and $(b_1, b_2, b_3)$ are the barycentric coordinates of points in $[v_i v_j v_k]$. Also, $I_{ijk}$ is a maximal open interval such that $0 \in I_{ijk}$ and, for all $\lambda \in I_{ijk}$, $v_i(\lambda)$, $v_j(\lambda)$ and $v_k(\lambda)$ are not collinear. The normals $n_i$, $n_j$ and $n_k$ should point to the same side of the plane $P_{ijk}(\lambda) := \{v : v = b_1 v_i(\lambda) + b_2 v_j(\lambda) + b_3 v_k(\lambda)\}$. Our paper [58] gives an explicit construction of a $C^1$ interpolatory (Hermite) cubic Bézier implicit algebraic surface patch $F(\lambda)$, as well as a rational parameterization using cubic Bézier bivariate polynomials with rational functions of $\lambda$ as coefficients.

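The vertex $v_l$ is projected to the point $p_l = v_l(\lambda_0)$ on the analytic Gaussian MS, where $\lambda_0$ is the solution of the equation $F(\lambda) = 1$, solved iteratively by Newton's method

$$\lambda_{n+1} = \lambda_n - \frac{F(\lambda_n) - 1}{F'(\lambda_n)}$$

with initial value $\lambda = 0$. Writing $f(\lambda, k) = e^{\beta\left(|v_l(\lambda) - \mathbf{x}_k|^2 / r_k^2 - 1\right)}$, $F$ and $F'$ are

$$F(\lambda) = \sum_{k=1}^{M} f(\lambda, k) \quad \text{and} \quad F'(\lambda) = 2\beta \sum_{k=1}^{M} \left(\frac{v_l(\lambda) \cdot n_l}{r_k^2}\right) f(\lambda, k) - 2\beta \sum_{k=1}^{M} \left(\frac{\mathbf{x}_k \cdot n_l}{r_k^2}\right) f(\lambda, k).$$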
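The projection step can be sketched as follows. This is a minimal Python illustration of the Newton iteration above, assuming a unit normal `n`, the default $\beta = -2.3$, and a hypothetical name `project_vertex`; it evaluates $F$ and $F'$ by direct $O(M)$ summation rather than with the fast summation described next, and it combines the two sums of $F'$ into one, which is algebraically equivalent.

```python
import math


def project_vertex(v, n, centers, radii, beta=-2.3, tol=1e-12, max_iter=50):
    """Newton iteration for F(lambda) = 1 along the ray v + lambda * n, from lambda = 0."""
    lam = 0.0
    for _ in range(max_iter):
        p = [vi + lam * ni for vi, ni in zip(v, n)]
        F = dF = 0.0
        for c, r in zip(centers, radii):
            diff = [pi - ci for pi, ci in zip(p, c)]
            f = math.exp(beta * (sum(d * d for d in diff) / (r * r) - 1.0))
            F += f
            # d/dlambda of |v(lambda) - x_k|^2 / r_k^2 equals 2 (v(lambda) - x_k) . n / r_k^2,
            # i.e. the two sums of F' in the text combined into one.
            dF += 2.0 * beta * sum(d * ni for d, ni in zip(diff, n)) / (r * r) * f
        step = (F - 1.0) / dF  # raises ZeroDivisionError if F' vanishes
        lam -= step
        if abs(step) < tol:
            break
    return [vi + lam * ni for vi, ni in zip(v, n)], lam


# Usage: a vertex 2 units from a lone atom of radius 1.5 moves inward to its sphere.
print(project_vertex((2.0, 0.0, 0.0), (1.0, 0.0, 0.0), [(0.0, 0.0, 0.0)], [1.5]))
```

Plain Newton iteration can overshoot and converge to a different part of the level set when the starting vertex is far from the Gaussian surface; a damped or bracketed variant may be safer in that situation.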

By projecting all the vertices of $T$ we obtain an $O(M)$ (point, normal) sampling of the analytic (Gaussian) MS (see Figure 3(d)). The procedure of generating a coarse $O(M)$ patch MS triangulation, followed by an ASMS and a projection to the Gaussian MS, additionally yields the greatest approximation accuracy. If naïve summation methods are used for evaluating $F$ and $F'$, then generating $O(M)$ (point, normal) samples requires $O(M^2)$ time. However, using fast summation methods based on non-uniform fast Fourier transforms [46], with the smooth kernel function $g(k) = e^{\beta} f(\lambda, k)$ and coefficients $c_k = 1$, $\frac{v_l(\lambda) \cdot n_l}{r_k^2}$ and $\frac{\mathbf{x}_k \cdot n_l}{r_k^2}$ for $F$ and the first and second summations of $F'$, respectively, we can reduce the time required to produce this new triangulation to $O(M \log M)$.


3 Approximating Energy Terms

In this section we present approximation schemes for computing the various compute-intensive terms of the molecular mechanical energy ($E_{MM}$) and the solvation energy ($E_{sol}$).

3.1 Lennard-Jones Potential

The Lennard-Jones (LJ) potential between molecules A and B is given by the following expression.

$$LJ(A, B) = \sum_{i \in A,\, j \in B} lj(i, j), \qquad lj(i, j) = \frac{a_{ij}}{r_{ij}^{12}} - \frac{b_{ij}}{r_{ij}^{6}},$$

where $r_{ij}$ is the distance between atoms $i \in A$ and $j \in B$, and the constants $a_{ij}$ and $b_{ij}$ depend on the types (e.g., C, H, O, etc.) of the two atoms involved. The same expression can be used to evaluate the Lennard-Jones potential among the atoms of a single molecule $A$ by setting $B = A$ and considering only non-bonded atom pairs.

Observe that direct computation of $LJ(A, B)$ requires $O(M_A M_B)$ time, where $M_A$ (resp. $M_B$) is the number of atoms in molecule $A$ (resp. $B$). However, since the terms in the summation diminish very quickly as $r_{ij}$ increases, distance cutoffs are often used to approximate it. For a given distance cutoff $\delta$, one then evaluates the following expression exactly as an approximation of $LJ(A, B)$:

$$LJ_{\delta^-}(A, B) = \sum_{\substack{i \in A,\, j \in B \\ r_{ij} \le \delta}} lj(i, j)$$

Suppose $M_A > M_B$. Then one straightforward way of evaluating $LJ_{\delta^-}(A, B)$, with the hope of improving over the naïve $O(M_A M_B)$ running time, is to use a 3D grid to store/hash the atoms of $A$ based on the coordinates of their centers, and then, for each atom $j \in B$, directly probe the grid cells that may contain atoms $i \in A$ within distance $\delta$ from $j$'s center. If the grid spacing is $\Delta$, the grid size is $n$ (i.e., $n = n_x n_y n_z$, where $n_x$, $n_y$ and $n_z$ are the numbers of cells in the $x$, $y$ and $z$ directions, respectively), and the total number of $\langle i \in A, j \in B \rangle$ pairs to evaluate is $m$, then the total time required to evaluate $LJ_{\delta^-}(A, B)$ is $O\left(M_A + M_B \left(\frac{\delta}{\Delta}\right)^3 + m + n\right)$², and the space requirement is $O(M_A + M_B + n)$. However, in the worst case $n = \Theta(M_B^3)$, and so, depending on the size of $B$, both the time and the space requirements can become prohibitively large.

One way of eliminating the cubic dependence on $M_B$ of the time and space required for computing $LJ_{\delta^-}(A, B)$ is to use an adaptive grid or octree [37] instead of the full 3D grid. An octree is a tree data structure that recursively subdivides 3D space into 8 octants, and is often used as a container for rectilinear scalar field data. The data structure is adaptive in the sense that an octree node is not subdivided unless it contains at least two data points. Thus the size of an octree is linear in the number of data points it holds. Now, instead of storing the atoms of $A$ in a full 3D grid, we store the atoms of $B$ in an octree. Since the construction of an octree involves implicit sorting of the data points, the octree construction takes $O(M_B \log M_B)$ time. Also, since a molecule does not have any isolated atoms or components, the height of the octree is $O(\log M_B)$. One then takes each atom $i \in A$ and traverses the octree using atom $i$'s center in order to locate the octree nodes containing atoms $j \in B$ that lie within distance $\delta$ from $i$'s center. The total time and space required to evaluate $LJ_{\delta^-}(A, B)$ are $O((M_A + M_B) \log M_B + m)$ and $O(M_A + M_B)$, respectively.

²$O(n)$ time to initialize the grid, $O(M_A)$ time to insert $A$'s atoms, and $O\left(M_B \left(\frac{\delta}{\Delta}\right)^3 + m\right)$ total time to retrieve $A$'s atoms within distance $\delta$ from $B$'s atoms.
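The grid approach described above can be sketched in a few lines. This Python version is an illustration under stated assumptions, not the DPG of [1]: it uses cubic cells of side $\delta$ (so $\Delta = \delta$ and only the 27 surrounding cells are probed), it stores only occupied cells in a hash map instead of a full $n_x n_y n_z$ array (which avoids the $O(n)$ initialization and space), and it assumes a single atom-type pair with constants `a` and `b`. A brute-force version is included for checking.

```python
import math
import random
from collections import defaultdict


def lj_cutoff_grid(atoms_a, atoms_b, a, b, delta):
    """Exact LJ_{delta-}(A, B) for a single atom-type pair (constants a, b).

    Atoms of A are hashed into cubic cells of side delta, so every atom of A
    within distance delta of an atom j of B lies in j's cell or one of its
    26 neighbours. Only occupied cells are stored (a sparse, hashed grid).
    """
    def cell(p):
        return tuple(math.floor(c / delta) for c in p)

    grid = defaultdict(list)
    for p in atoms_a:
        grid[cell(p)].append(p)
    total = 0.0
    for q in atoms_b:
        cx, cy, cz = cell(q)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for p in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        r2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
                        if r2 <= delta * delta:
                            r6 = r2 ** 3
                            total += a / (r6 * r6) - b / r6
    return total


def lj_cutoff_brute(atoms_a, atoms_b, a, b, delta):
    """Reference O(M_A M_B) evaluation of the same cutoff sum."""
    total = 0.0
    for p in atoms_a:
        for q in atoms_b:
            r2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            if r2 <= delta * delta:
                r6 = r2 ** 3
                total += a / (r6 * r6) - b / r6
    return total


rng = random.Random(0)
A = [tuple(rng.uniform(0, 10) for _ in range(3)) for _ in range(300)]
B = [tuple(rng.uniform(0, 10) for _ in range(3)) for _ in range(200)]
print(lj_cutoff_grid(A, B, 1.0, 1.0, 2.5), lj_cutoff_brute(A, B, 1.0, 1.0, 2.5))
```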

In the rest of this section we first outline our approach to evaluating $LJ_{\delta^-}(A, B)$ faster than $O((M_A + M_B) \log M_B + m)$ time while still using $O(M_A + M_B)$ space, followed by our approach to fast approximation of $LJ(A, B)$ to within a factor of $1 + \varepsilon$ of the exact value for any given $\varepsilon > 0$.

Faster Evaluation of LJ(A,B) with a Distance Cutoff. For faster evaluation of $LJ_{\delta^-}(A, B)$ we store the atoms of molecule $B$ in our Dynamic Packing Grid (DPG) [1] data structure instead of an octree. The DPG maintains the atoms of a molecule in space linear in the size of the molecule (i.e., its number of atoms) while supporting spherical range queries and updates (i.e., insertions/deletions) very efficiently. An update takes $O(1)$ time (w.h.p.³), while a range query returns all atoms within a given distance $\delta$ from any given atom center in $O(k)$ time⁴ (w.h.p.), where $k$ is the number of atoms returned. Therefore, all atoms of $B$ can be inserted into the DPG in $O(M_B)$ time (w.h.p.), and the total time required to find all atoms of $B$ within distance $\delta$ of the atoms of $A$ is $O(M_A + m)$. Hence, $LJ_{\delta^-}(A, B)$ can be evaluated exactly in $O(M_A + M_B + m)$ time (w.h.p.) and $O(M_A + M_B)$ space.

Fast (1 + ε)-Approximation of LJ(A,B). Observe that LJ(A,B) can be written as follows.

$$LJ(A, B) = LJ_{\delta^-}(A, B) + LJ_{\delta^+}(A, B), \quad \text{where} \quad LJ_{\delta^+}(A, B) = \sum_{\substack{i \in A,\, j \in B \\ r_{ij} > \delta}} lj(i, j)$$

A distance cutoff based algorithm evaluates $LJ_{\delta^-}(A, B)$ exactly, but ignores $LJ_{\delta^+}(A, B)$ completely. We outline below how to obtain an error-bounded approximation of $LJ(A, B)$ through a fast approximation of $LJ_{\delta^+}(A, B)$ in addition to the exact evaluation of $LJ_{\delta^-}(A, B)$. More precisely, given any user-defined constant $\varepsilon > 0$, we approximate $LJ(A, B)$ to within a $(1 + \varepsilon)$ factor of its exact value.

In the expression of $LJ(A, B)$, $a_{ij}$ and $b_{ij}$ are fixed for any fixed pair of atom types, and can be calculated from the Amber force field using the well depths $\mu_{XY}$ and the equilibrium contact distances $r_{eqm,XY}$ of homogeneous pairs, where $X = \text{atomType}(i \in A)$ and $Y = \text{atomType}(j \in B)$. By definition, $a_{ij}/b_{ij} = r_{eqm,XY}^6/2$. We assume $X, Y \in \{\text{C, H, N, O, P, S}\}$. We decompose $LJ(A, B)$ as follows, where by $\mathcal{M}_X$ we denote the subset of atoms of type $X$ in molecule $\mathcal{M} \in \{A, B\}$:

$$LJ(A, B) = \sum_{X, Y \in \{\text{C, H, N, O, P, S}\}} LJ(A_X, B_Y),$$

where, for some constant $\delta_{XY} \ge 0$ (to be defined later),

$$LJ(A_X, B_Y) = \sum_{i \in A_X \wedge j \in B_Y} lj(i, j) = LJ_{\delta_{XY}^-}(A_X, B_Y) + LJ_{\delta_{XY}^+}(A_X, B_Y).$$
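The relation $a_{ij}/b_{ij} = r_{eqm,XY}^6/2$ follows from requiring that $lj$ attain its minimum value $-\mu_{XY}$ at $r = r_{eqm,XY}$, which gives $a_{ij} = \mu_{XY} r_{eqm,XY}^{12}$ and $b_{ij} = 2\mu_{XY} r_{eqm,XY}^{6}$. The small Python sketch below checks this and also evaluates the threshold $\delta_{XY} = (1/2 + 1/\varepsilon)^{1/6} r_{eqm,XY}$ used later in this section. The function names and the numerical parameters are hypothetical and are not taken from the Amber tables.

```python
import math


def lj_coefficients(mu, r_eqm):
    """a = mu * r^12 and b = 2 * mu * r^6, so lj has its minimum -mu at r = r_eqm."""
    return mu * r_eqm ** 12, 2.0 * mu * r_eqm ** 6


def lj(r, a, b):
    return a / r ** 12 - b / r ** 6


def delta_xy(eps, r_eqm):
    """Distance threshold (1/2 + 1/eps)^(1/6) * r_eqm used for the far-field sum."""
    return (0.5 + 1.0 / eps) ** (1.0 / 6.0) * r_eqm


# Usage: hypothetical parameters mu = 0.1, r_eqm = 3.8, eps = 0.1.
a, b = lj_coefficients(0.1, 3.8)
print(a / b, 3.8 ** 6 / 2)   # equal by construction
print(lj(3.8, a, b))         # about -0.1, the well depth
print(delta_xy(0.1, 3.8))
```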

We outline below how to approximate $LJ(A_X, B_Y)$ for a given pair of $X$ and $Y$. We evaluate $LJ_{\delta_{XY}^-}(A_X, B_Y)$ exactly, and approximate $LJ_{\delta_{XY}^+}(A_X, B_Y)$ to within a factor of $(1 + \varepsilon)$ of its exact value.

³For an input of size $n$, an event $E$ occurs w.h.p. (with high probability) if, for any $\alpha \ge 1$ and a constant $c$ independent of $n$, $\Pr(E) \ge 1 - \frac{c}{n^{\alpha}}$.

⁴The actual complexities also depend on $\log w$ and $\log \log w$, respectively, where $w$ is the RAM word size (e.g., 32 or 64) of the machine, which is a constant for a given machine.


Figure 4: Explanation of the (1 + ε) approximation algorithm for the Lennard-Jones (LJ) potential in 2D using quadtrees [22] (i.e., the 2D variant of octrees): In the leftmost figure the bounding box of molecule A (resp. B) represents the root node of the quadtree storing molecule A (resp. B). The smallest boxes in the middle and the rightmost figures represent quadtree nodes at level 2 (i.e., children of the root node) and level 3, respectively. We assume for simplicity that if two nodes of the two quadtrees do not intersect, they are far enough apart that the LJ potential between their atoms can be approximated by treating them as pseudo-atoms. In the leftmost figure the two root nodes (nodes A and B) intersect, and so we move to their child nodes in the middle figure. In the middle figure only nodes A2 and B3 intersect, and so while the potential between the atoms of all other 〈Ai, Bj〉 pairs can be approximated, we need to move to the children of A2 and B3 in order to compute the potential between them (shown in the rightmost figure).

Let $\delta_{XY} \ge \left(\frac{1}{2} + \frac{1}{\varepsilon}\right)^{1/6} r_{eqm,XY}$. Then if we approximate each $b_{ij}/r_{ij}^6$ with $r_{ij} > \delta_{XY}$ to within a factor of $1 + \frac{\varepsilon}{2+\varepsilon}$, it can be shown that

$$|lj(i,j)| < \left[\frac{b_{ij}}{r_{ij}^6}\right]_{approx} < (1+\varepsilon)\,|lj(i,j)|.$$

(Since $a_{ij}/b_{ij} = r_{eqm,XY}^6/2$, for $r_{ij} > \delta_{XY}$ the repulsive term satisfies $a_{ij}/r_{ij}^{12} < \frac{\varepsilon}{2+\varepsilon}\, b_{ij}/r_{ij}^6$, and $\left(1 + \frac{\varepsilon}{2+\varepsilon}\right) / \left(1 - \frac{\varepsilon}{2+\varepsilon}\right) = 1 + \varepsilon$.) In order to approximate $LJ(A_X, B_Y)$ as described above, we construct two octrees $T_{A_X}$ and $T_{B_Y}$ from the atoms in $A_X$ and $B_Y$, respectively, and compute a (1 + ε)-approximation of $LJ(A_X, B_Y)$ by simultaneous recursive traversals of $T_{A_X}$ and $T_{B_Y}$ starting from their root nodes. Suppose at some point we are at node x of $T_{A_X}$ and node y of $T_{B_Y}$. If both x and y are leaf nodes, the potential between the atoms contained in x (say, $M_x$) and y (say, $M_y$) is computed exactly. Otherwise, if x and y are far enough apart (i.e., at least $\delta_{XY}$ apart) and small enough⁵, the potential between $M_x$ and $M_y$ is approximated by treating x and y as single pseudo-atoms centered at the centers of gravity of $M_x$ and $M_y$, respectively, and taking $-|M_x||M_y|\, b_{XY}/r_{xy}^6$ as the approximated potential, where $r_{xy}$ is the distance between the centers of the two pseudo-atoms. If neither of the two conditions above holds, we subdivide x and/or y (i.e., move to their children), and approximate the potential recursively. Figure 4 explains the approach in 2D.
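The choice of $\delta_{XY}$ can be checked numerically. The Python sketch below assumes Amber-style coefficients $a = \mu r_{eqm}^{12}$ and $b = 2\mu r_{eqm}^{6}$ (one parameterisation satisfying $a/b = r_{eqm}^6/2$); the helper names and the sample values of μ, $r_{eqm}$ and ε are illustrative only. It over-estimates the attractive term $b/r^6$ by a given factor in $[1, 1 + \varepsilon/(2+\varepsilon)]$ and tests whether the result lies in $(|lj|, (1+\varepsilon)|lj|]$.

```python
def lj_coefficients(mu, r_eqm):
    """Assumed Amber-style pair coefficients: lj(r) = a/r^12 - b/r^6 with a/b = r_eqm^6 / 2."""
    a = mu * r_eqm ** 12
    b = 2.0 * mu * r_eqm ** 6
    return a, b

def delta_xy(eps, r_eqm):
    """Smallest cutoff allowed by the text: (1/2 + 1/eps)^(1/6) * r_eqm."""
    return (0.5 + 1.0 / eps) ** (1.0 / 6.0) * r_eqm

def bound_holds(mu, r_eqm, eps, r, inflation):
    """Check |lj| < inflation * b/r^6 <= (1 + eps) |lj| at distance r."""
    a, b = lj_coefficients(mu, r_eqm)
    attract = b / r ** 6
    abs_lj = abs(a / r ** 12 - attract)
    approx = inflation * attract  # over-estimate of b/r^6 by the given factor
    return abs_lj < approx <= (1.0 + eps) * abs_lj
```

For example, with ε = 0.1 and $r_{eqm}$ = 3.8, `bound_holds` returns True at every distance beyond `delta_xy(0.1, 3.8)`, even with the maximal inflation $1 + \varepsilon/(2+\varepsilon)$, and returns False with that inflation at 0.9 times the cutoff.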

In order to obtain an upper bound on the time required for approximating $LJ(A_X, B_Y)$, we assume that the initial bounding boxes of $A_X$ and $B_Y$ have exactly the same size. Then it can be shown that each node $x \in T_{A_X}$ will be paired with $O(1/\varepsilon^3)$ nodes of $T_{B_Y}$ of the same or larger size during recursive calls, and vice versa. Observing that there are $O(|A_X|)$ (resp. $O(|B_Y|)$) nodes in $T_{A_X}$

⁵i.e., $r_{x,y} + (r_x + r_y) < \left(1 + \frac{\varepsilon}{2+\varepsilon}\right)^{1/6} \left(r_{x,y} - (r_x + r_y)\right)$, where $r_x$ (resp. $r_y$) is the radius of the smallest ball centered at the center of gravity of the atom centers of x (resp. y) that encloses all atom centers of x (resp. y).


ApproxLJ( x, y )

(Inputs are two octree nodes $x \in T_{A_X}$ and $y \in T_{B_Y}$, and the output is a floating-point number V such that $|U| \le |V| \le (1+\varepsilon)\,|U|$, where $U = \sum_{i \in M_x \wedge j \in M_y} \left(a_{ij}/r_{ij}^{12} - b_{ij}/r_{ij}^{6}\right)$. By child(x) (resp. child(y)) we denote the set of non-empty octree nodes obtained by subdividing node x (resp. y). We denote by $b_{XY}$ the value of the constant $b_{ij}$ for atom types X and Y, and by $r_{x,y}$ the distance between the centers of x and y.)

1. if leaf(x) ∧ leaf(y) then return $\sum_{i \in M_x \wedge j \in M_y} \left(\frac{a_{ij}}{r_{ij}^{12}} - \frac{b_{ij}}{r_{ij}^{6}}\right)$ {exact value}

2. else if $r_{x,y} - (r_x + r_y) > \delta_{XY}$ ∧ $\frac{r_{x,y} + (r_x + r_y)}{r_{x,y} - (r_x + r_y)} < \left(1 + \frac{\varepsilon}{2+\varepsilon}\right)^{1/6}$ then return $-\frac{|M_x| \cdot |M_y| \cdot b_{XY}}{\left(r_{x,y} - (r_x + r_y)\right)^6}$ {approximation}

3. else if leaf(x) return $\sum_{c_y \in child(y)}$ ApproxLJ( x, $c_y$ ) {recursive approximation}

4. else if leaf(y) return $\sum_{c_x \in child(x)}$ ApproxLJ( $c_x$, y ) {recursive approximation}

5. else return $\sum_{c_x \in child(x) \wedge c_y \in child(y)}$ ApproxLJ( $c_x$, $c_y$ ) {recursive approximation}

ApproxLJ Ends

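Figure 5: Recursive approximation of $\sum_{i \in M_x \wedge j \in M_y} \left(a_{ij}/r_{ij}^{12} - b_{ij}/r_{ij}^{6}\right)$ to within a factor of 1 + ε. The initial call is ApproxLJ( root($T_{A_X}$), root($T_{B_Y}$) ) for the approximation of $\sum_{i \in A_X \wedge j \in B_Y} \left(a_{ij}/r_{ij}^{12} - b_{ij}/r_{ij}^{6}\right)$.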

(resp. $T_{B_Y}$), and taking the construction times of the octrees into account, the total running time of the algorithm for atom-type pair 〈X, Y〉 is $O\left(|A_X|\log|A_X| + |B_Y|\log|B_Y| + \frac{1}{\varepsilon^3}\left(|A_X| + |B_Y|\right)\right)$.

Summing over all possible pairs of atom types, the total running time for approximating LJ(A,B) is $O\left(\left(\frac{1}{\varepsilon^3} + \log(M_A + M_B)\right)(M_A + M_B)\right)$.
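Figure 5 translates almost line by line into a recursive Python function. This is a sketch for a single atom-type pair under stated assumptions, not the authors' implementation: it assumes scalar coefficients a and b for the pair, and a minimal point octree (the hypothetical `Node` class) that splits at bounding-box midpoints and measures each node's radius from the centroid of its atoms.

```python
import math

class Node:
    """Point octree node: centroid center, centroid-based radius, bounding-box midpoint splits."""
    def __init__(self, pts, leaf_size=8, depth=0):
        self.pts = pts
        self.center = tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))
        self.radius = max(math.dist(p, self.center) for p in pts)
        self.children = []
        if len(pts) > leaf_size and depth < 20 and self.radius > 0.0:
            mid = [(min(p[k] for p in pts) + max(p[k] for p in pts)) / 2 for k in range(3)]
            octants = {}
            for p in pts:
                octants.setdefault(tuple(p[k] > mid[k] for k in range(3)), []).append(p)
            self.children = [Node(o, leaf_size, depth + 1) for o in octants.values()]
        self.leaf = not self.children

def approx_lj(x, y, a, b, delta, eps):
    """ApproxLJ(x, y) of Figure 5 for one atom-type pair with scalar coefficients a, b."""
    if x.leaf and y.leaf:  # step 1: exact value
        total = 0.0
        for p in x.pts:
            for q in y.pts:
                r6 = math.dist(p, q) ** 6
                total += a / (r6 * r6) - b / r6
        return total
    d = math.dist(x.center, y.center)
    gap = d - (x.radius + y.radius)  # lower bound on every pair distance
    if gap > delta and (d + x.radius + y.radius) / gap < (1 + eps / (2 + eps)) ** (1 / 6):
        return -len(x.pts) * len(y.pts) * b / gap ** 6  # step 2: pseudo-atom approximation
    if x.leaf:  # step 3
        return sum(approx_lj(x, cy, a, b, delta, eps) for cy in y.children)
    if y.leaf:  # step 4
        return sum(approx_lj(cx, y, a, b, delta, eps) for cx in x.children)
    return sum(approx_lj(cx, cy, a, b, delta, eps)  # step 5
               for cx in x.children for cy in y.children)
```

Steps 3 to 5 differ only in which node is subdivided; a leaf is never split, so the recursion ends once both nodes are leaves or the pair is far and small enough to be approximated.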

3.2 Coulomb Potential

The long-range Coulomb potential plays a role in forming stable complexes due to partially charged bio-molecules and solvent atoms. The Coulomb potential is given by the quadratic sum $Q = \sum_{i,j} \frac{q_i q_j}{\epsilon(r_{ij})\, r_{ij}}$, where $q_i$ and $q_j$ are the Coulombic charges of atoms i and j, respectively, $r_{ij}$ is the distance between their centers, and $\epsilon(r_{ij})$ is a distance-dependent dielectric constant. Assuming $\epsilon(r_{ij}) = r_{ij}$, Q is also approximated as $\sum_{i,j} q_i q_j / r_{ij}^2$, where pairwise interactions fall off more sharply with distance.

The Coulomb potential can be approximated using an algorithm similar to the one given for the Lennard-Jones potential. However, since contributions due to positive and negative charges tend to cancel out, this method does not guarantee an error bound similar to that for approximating the Lennard-Jones potential. Instead, the error bound depends on whether the exact potential is positive or negative, and we cannot guarantee a multiplicative error bound if the exact potential is zero. Suppose $Q = Q_P - Q_N$ and $\alpha = Q_N/Q_P$, where $Q_P$ is the sum of all positive pairwise potentials in Q and $Q_N$ is the magnitude of the sum of all negative ones. Now if Q > 0 (i.e., α < 1), and we get an approximate Coulomb potential Q′ by running a (1 + ε)-approximation algorithm similar to the Lennard-Jones potential algorithm described in Section 3.1, then it can be shown that

$$\left(1 - \frac{\varepsilon}{1-\alpha}\right) Q \le Q' \le \left(1 + \frac{\varepsilon}{1-\alpha}\right) Q \;\Rightarrow\; Q - \varepsilon Q_P \le Q' \le Q + \varepsilon Q_P,$$

where the implication uses $Q = (1-\alpha)\,Q_P$.


Figure 6: In our octree-based Born radius approximation algorithm we construct two octrees: one for the atoms in the molecule, and the other for the quadrature points. The Born radii of all atoms are approximated by a simultaneous recursive traversal of both octrees. In this figure the octrees are drawn as quadtrees [22] for simplicity.

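Similarly, if Q < 0 (i.e., α > 1), the bound is

$$\left(1 - \frac{\varepsilon\alpha}{1-\alpha}\right) Q \le Q' \le \left(1 + \frac{\varepsilon\alpha}{1-\alpha}\right) Q \;\Rightarrow\; Q - \varepsilon Q_N \le Q' \le Q + \varepsilon Q_N.$$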

3.3 Generalized Born Polarization Energy

In this section we first describe an $O(M \log M)$ algorithm for fast approximation of the Born radii (Equation 6) with the correction term (Equation 9) of all M atoms in a molecule, followed by another $O(M \log M)$ time algorithm for approximating $E_{pol}$ (Equation 3) from the approximated Born radii.

3.3.1 Born Radii

Let A be a set of M atoms in a molecule, and let Q be a set of $m = O(M)$ Gauss quadrature/integration points sampled on the molecular surface (as per Section 2). For each quadrature point $q \in Q$, let $\tilde{\mathbf n}_q = w_q \mathbf n_q$, where $\mathbf n_q$ is the outward unit normal on the molecular surface at point q, and $w_q$ is the weight assigned to q.

As in Section 3.1, our approach is to use a near and far decomposition of the atoms in A and the quadrature points in Q. Hence, we build two octrees $T_A$ and $T_Q$ for A and Q, respectively (see Figure 6). We traverse $T_A$ and $T_Q$ simultaneously starting at their root nodes, and collect the approximated integrals at appropriate internal nodes of $T_A$ and atoms of A. Suppose at some point during this traversal we are at node A of $T_A$ and node Q of $T_Q$. Let $r_A$ (resp. $r_Q$) be the radius⁶ of A (resp. Q). If A and Q are far enough apart, i.e., the distance between their centers is larger than $(r_A + r_Q)\left(1 + \frac{1}{\varepsilon}\right)$ for some user-defined approximation parameter ε > 0, then the contribution of all quadrature points in Q to the Born radius integral of each atom in A can be approximated by treating A as a single pseudo-atom centered at the geometric center of the atoms under it, and Q as a single pseudo quadrature point located at the geometric center of the quadrature points under it with $\tilde{\mathbf n}_Q = \sum_{q \in Q} \tilde{\mathbf n}_q$. This approximated contribution is collected in A. If A and Q are not far enough apart but both are leaves, then we compute this contribution exactly using the atoms under A and the quadrature points under Q, and collect it in the respective atoms. If at least one of A and Q is not a leaf, we recurse using the children of the non-leaf node(s). After we are done with this simultaneous traversal, we traverse $T_A$ top-down, add to each atom the partial integrals collected at its ancestors, and compute its Born radius from these accumulated values. The pseudocode of the approximation algorithm is given in Figure 7.

⁶i.e., $r_A$ = radius of the smallest ball centered at the geometric center of the atom centers in A that encloses all atom centers of A.

The accuracy and running time of the algorithm depend on the approximation parameter ε > 0. The smaller ε is, the more accurate the approximated Born radii are; the larger ε is, the faster the algorithm runs. The running time of the algorithm is dominated by the time required for approximating the interactions between the atoms and the quadrature points through the simultaneous traversal of the two octrees. We observe that during that process, for any given r, each quadrature point is used in computations involving $T_A$ nodes of radius between r and 2r only $O(1/\varepsilon^2)$ times. Similarly, each atom is involved in computations using $T_Q$ nodes of radius between r and 2r only $O(1/\varepsilon^2)$ times. Since the heights of $T_A$ and $T_Q$ are $O(\log M)$ and $O(\log m)$, respectively, the algorithm runs in $O\left(\frac{1}{\varepsilon^2}\left(M\log m + m\log M\right)\right)$ time, which reduces to $O\left(\frac{1}{\varepsilon^2} M\log M\right)$ for $m = O(M)$.

3.3.2 Polarization Energy

In this section we describe a fast approximation scheme for computing the Generalized Born polarization energy ($E_{pol}$).

We have developed an octree-based algorithm for fast computation of $E_{pol}$ (Equation 3). As in Section 3.1, the algorithm is based on a near and far decomposition of the given set of atoms.

Consider a set A of M atoms with $R_{min}$ and $R_{max}$ being the minimum and the maximum of the Born radii in A, respectively. Now, given an approximation parameter ε > 0, we divide the atoms into $M_\varepsilon = \log_{1+\varepsilon}(R_{max}/R_{min})$ groups, place each atom a with Born radius $R_a \in [R_{min}(1+\varepsilon)^k, R_{min}(1+\varepsilon)^{k+1})$ in group $k \in [0, M_\varepsilon)$, and approximate $R_a$ with $R_{min}(1+\varepsilon)^k$. We use the atoms octree $T_A$ built in Section 3.3.1. For every node $A \in T_A$ and $0 \le k < M_\varepsilon$, we precompute

$$q_A[k] = \sum_{(a \in A) \,\wedge\, (R_a \in [R_{min}(1+\varepsilon)^k,\; R_{min}(1+\varepsilon)^{k+1}))} q_a.$$

We now traverse $T_A$ simultaneously using two pointers, both of which initially point to the root node of $T_A$. Suppose at some point during this traversal the two pointers point to nodes U and V. We first check whether both U and V are leaves, and if so, we compute the interaction between the two sets of atoms under U and V directly using actual charges, Born radii and inter-atomic distances. Otherwise, if the two nodes are far enough from each other, the interaction between the sets of atoms under them is approximated using the approximate Born radii described above and the grouped charge sums $q_U[\cdot]$ and $q_V[\cdot]$. If the two nodes are too close for approximation, we recurse on the non-leaf node(s). The pseudocode of the approximation algorithm is given in Figure 8.
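The grouping step is straightforward to express in code. In the Python sketch below (function names are hypothetical), `born_groups` maps each Born radius to its group index and rounded radius, and `group_charges` forms $q_A[k]$ for the atoms under one node; a full implementation would store one such array per node of $T_A$. As an assumption of the sketch, the number of groups is taken as one more than the largest index, so that an atom with $R_a = R_{max}$ also gets a bucket.

```python
import math

def born_groups(radii, eps):
    """Group index k with R in [R_min(1+eps)^k, R_min(1+eps)^(k+1)) for every Born radius R.

    Returns the group indices, the rounded radii R_min(1+eps)^k, and the number of
    groups needed to hold every index.
    """
    r_min = min(radii)
    ks = [math.floor(math.log(r / r_min, 1.0 + eps)) for r in radii]
    return ks, [r_min * (1.0 + eps) ** k for k in ks], max(ks) + 1

def group_charges(charges, ks, m_eps):
    """q_A[k]: total charge of the given atoms (e.g., those under one octree node) in group k."""
    q = [0.0] * m_eps
    for q_a, k in zip(charges, ks):
        q[k] += q_a
    return q
```

Each rounded radius under-estimates its true radius by less than a factor of 1 + ε, which is the error that the far-field step of Figure 8 inherits.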

The algorithm computes a (1 + ε)-approximation provided all terms in the $E_{pol}$ sum are of the same sign. Otherwise the multiplicative approximation error can be obtained as in Section 3.2. The running time of the algorithm can be shown to be $O\left(\frac{1}{\varepsilon^2} M\log M\right)$.


Approx-Integrals( A, Q )

(For each atom a under the subtree rooted at the given node A in the atoms octree, approximate $\sum_{q \in S_Q} w_q \frac{(\mathbf p_q - \mathbf p_a)\cdot \mathbf n_q}{|\mathbf p_q - \mathbf p_a|^4}$, where $S_Q$ is the set of integration/quadrature points under the subtree rooted at the given node Q in the quadrature points octree. By $\mathbf p_a = \langle x_a, y_a, z_a\rangle$ we denote the center of an atom a, while by $\mathbf p_q = \langle x_q, y_q, z_q\rangle$, $w_q$ and $\mathbf n_q = \langle n_{x_q}, n_{y_q}, n_{z_q}\rangle$ we denote the location of a quadrature point q, the weight assigned to q, and the unit outward normal on the molecular surface at q, respectively. By $\langle x_A, y_A, z_A\rangle$ (resp. $\langle x_Q, y_Q, z_Q\rangle$) we denote the geometric center of the atoms (resp. integration points) under A (resp. Q). By $r_A$ (resp. $r_Q$) we denote the radius of the smallest ball centered at $\langle x_A, y_A, z_A\rangle$ (resp. $\langle x_Q, y_Q, z_Q\rangle$) that encloses all atom centers (resp. integration points) under A (resp. Q). The distance between the geometric centers of A and Q is given by $r_{A,Q}$, and the distance between $\mathbf p_a$ and $\mathbf p_q$ by $r_{a,q}$. We also assume $n_{x_Q} = \sum_{q \in Q} w_q n_{x_q}$, and similarly for $n_{y_Q}$ and $n_{z_Q}$. Each atom a has two fields $s_a$ and $c_a$, and each node A in the atoms octree has fields $s_A$ and $c_A$, all of which are initialized to zero. The approximated sum is added to $s_A$ provided A and Q are far enough apart in space that the sum can be approximated reasonably well (controlled by an approximation parameter ε > 0). Otherwise the sums are computed recursively and added to the s field of appropriate descendants of A. We also approximate a correction term $\sum_{q \in S_Q} w_q \frac{(\mathbf p_q - \mathbf p_a)\cdot \mathbf n_q}{|\mathbf p_q - \mathbf p_a|^7}$ and add it to $c_A$ or the c field of the appropriate descendants of A.)

1. if $r_{A,Q} > (r_A + r_Q)\left(1 + \frac{1}{\varepsilon}\right)$ then {far enough to approximate}

   $x_\Delta = x_Q - x_A,\; y_\Delta = y_Q - y_A,\; z_\Delta = z_Q - z_A$

   $s_A = s_A + \frac{n_{x_Q} x_\Delta + n_{y_Q} y_\Delta + n_{z_Q} z_\Delta}{(r_{A,Q})^4}, \quad c_A = c_A + \frac{n_{x_Q} x_\Delta + n_{y_Q} y_\Delta + n_{z_Q} z_\Delta}{(r_{A,Q})^7}$

2. else if leaf(A) ∧ leaf(Q) then {too close to approximate; compute exact value}

   for each atom a ∈ A and each quadrature point q ∈ Q do

   $x_\delta = x_q - x_a,\; y_\delta = y_q - y_a,\; z_\delta = z_q - z_a$

   $s_a = s_a + \frac{w_q (n_{x_q} x_\delta + n_{y_q} y_\delta + n_{z_q} z_\delta)}{(r_{a,q})^4}, \quad c_a = c_a + \frac{w_q (n_{x_q} x_\delta + n_{y_q} y_\delta + n_{z_q} z_\delta)}{(r_{a,q})^7}$

3. else if leaf(A) then ∀Q′ ∈ children(Q) : Approx-Integrals( A, Q′ ) {recurse on Q}

4. else if leaf(Q) then ∀A′ ∈ children(A) : Approx-Integrals( A′, Q ) {recurse on A}

5. else ∀A′ ∈ children(A) ∧ ∀Q′ ∈ children(Q) : Approx-Integrals( A′, Q′ ) {recurse on A and Q}

Approx-Integrals Ends

Push-Integrals-to-Atoms( A, s, c )

(A is a node in the atoms octree, $s = \sum_{A' \in ancestors(A)} s_{A'}$ and $c = \sum_{A' \in ancestors(A)} c_{A'}$. This function pushes $s + s_A$ and $c + c_A$ to each descendant of A. If A is a leaf, it computes the Born radius of each atom a ∈ A using $s + s_A + s_a$ and $c + c_A + c_a$.)

1. if leaf(A) then ∀a ∈ A : $R_a = \max\left\{ r_a,\; \left[\left(1 - \frac{1}{\sqrt 2}\right)\frac{s_a + s + s_A}{4\pi} + \left(\frac{c_a + c + c_A}{16\pi}\right)^{1/4}\right]^{-1} \right\}$ {compute Born radii of A's atoms}

2. else ∀A′ ∈ children(A) : Push-Integrals-to-Atoms( A′, s + s_A, c + c_A ) {push integrals to A's descendants}

Push-Integrals-to-Atoms Ends

Figure 7: Octree-based algorithm for approximating Born radii. Given the atoms octree $T_A$ and the quadrature/integration points octree $T_Q$, the Born radii of all atoms in $T_A$ can be approximated (controlled by a given approximation parameter ε > 0) by making the following sequence of function calls: Approx-Integrals( root($T_A$), root($T_Q$) ), and Push-Integrals-to-Atoms( root($T_A$), 0, 0 ).
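Figure 7 can be prototyped in Python as below. Everything here is an illustrative sketch rather than the authors' code: the minimal `Tree` octree (midpoint splits, centroid-based radii), the function names, and clamping the correction sum at zero before taking its fourth root are assumptions. The quadrature points and their weighted normals $w_q\mathbf n_q$ are taken as given, the displacement is $\mathbf p_q - \mathbf p_a$ as in the integrand, and steps 3 to 5 are folded into one loop. A brute-force reference is included for comparison.

```python
import math

class Tree:
    """Minimal octree over points[idx] (centroid center, centroid-based radius, midpoint splits)."""
    def __init__(self, points, idx, leaf_size=8, depth=0):
        self.idx, self.children, self.s, self.c = idx, [], 0.0, 0.0
        self.center = tuple(sum(points[i][k] for i in idx) / len(idx) for k in range(3))
        self.radius = max(math.dist(points[i], self.center) for i in idx)
        if len(idx) > leaf_size and depth < 20 and self.radius > 0.0:
            mid = [(min(points[i][k] for i in idx) + max(points[i][k] for i in idx)) / 2
                   for k in range(3)]
            octants = {}
            for i in idx:
                octants.setdefault(tuple(points[i][k] > mid[k] for k in range(3)), []).append(i)
            self.children = [Tree(points, o, leaf_size, depth + 1) for o in octants.values()]
        self.leaf = not self.children

def add_normal_sums(node, wn):
    """Store n_Q = sum of w_q * n_q over the quadrature points under every node."""
    node.n = tuple(sum(wn[i][k] for i in node.idx) for k in range(3))
    for child in node.children:
        add_normal_sums(child, wn)

def kernel_terms(p_atom, p_quad, n):
    """Return (d.n / |d|^4, d.n / |d|^7) with d = p_quad - p_atom."""
    d = [p_quad[k] - p_atom[k] for k in range(3)]
    r = math.hypot(*d)
    dot = sum(n[k] * d[k] for k in range(3))
    return dot / r ** 4, dot / r ** 7

def approx_integrals(A, Q, atoms, qpts, wn, eps, s, c):
    """Approx-Integrals(A, Q): far-field sums go to A.s / A.c, exact ones to s[a] / c[a]."""
    if math.dist(A.center, Q.center) > (A.radius + Q.radius) * (1 + 1 / eps):  # step 1
        ds, dc = kernel_terms(A.center, Q.center, Q.n)
        A.s, A.c = A.s + ds, A.c + dc
    elif A.leaf and Q.leaf:  # step 2: exact
        for a in A.idx:
            for q in Q.idx:
                ds, dc = kernel_terms(atoms[a], qpts[q], wn[q])
                s[a], c[a] = s[a] + ds, c[a] + dc
    else:  # steps 3-5: recurse on whichever nodes are internal
        for a_node in ([A] if A.leaf else A.children):
            for q_node in ([Q] if Q.leaf else Q.children):
                approx_integrals(a_node, q_node, atoms, qpts, wn, eps, s, c)

def born_radius(vdw, s_total, c_total):
    """R = max{vdw, 1 / [(1 - 1/sqrt 2) s / (4 pi) + (c / (16 pi))^(1/4)]}."""
    # Assumption: clamp c at 0 so round-off cannot produce a complex fourth root.
    inv = ((1 - 1 / math.sqrt(2)) * s_total / (4 * math.pi)
           + (max(c_total, 0.0) / (16 * math.pi)) ** 0.25)
    return max(vdw, 1 / inv) if inv > 0 else math.inf

def push_integrals(A, s_acc, c_acc, s, c, vdw, R):
    """Push-Integrals-to-Atoms(A, s, c): add ancestor sums, then finish each atom's radius."""
    s_acc, c_acc = s_acc + A.s, c_acc + A.c
    if A.leaf:
        for a in A.idx:
            R[a] = born_radius(vdw[a], s[a] + s_acc, c[a] + c_acc)
    for child in A.children:
        push_integrals(child, s_acc, c_acc, s, c, vdw, R)

def born_radii(atoms, vdw, qpts, wn, eps, leaf_size=8):
    """Octree approximation of all Born radii; wn[q] is the weighted normal w_q * n_q."""
    TA = Tree(atoms, list(range(len(atoms))), leaf_size)
    TQ = Tree(qpts, list(range(len(qpts))), leaf_size)
    add_normal_sums(TQ, wn)
    s, c, R = [0.0] * len(atoms), [0.0] * len(atoms), [0.0] * len(atoms)
    approx_integrals(TA, TQ, atoms, qpts, wn, eps, s, c)
    push_integrals(TA, 0.0, 0.0, s, c, vdw, R)
    return R

def born_radii_direct(atoms, vdw, qpts, wn):
    """O(M m) reference evaluating the same quadrature sums exactly for every atom."""
    R = []
    for a, p in enumerate(atoms):
        terms = [kernel_terms(p, q, n) for q, n in zip(qpts, wn)]
        R.append(born_radius(vdw[a], sum(t[0] for t in terms), sum(t[1] for t in terms)))
    return R
```

With a sufficiently small ε the far-field branch only fires for single-atom/single-point node pairs, where it is exact, so `born_radii` should then agree with `born_radii_direct` up to round-off; larger ε trades accuracy for speed as described in Section 3.3.1.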


Approx-Epol( U, V )

(For two given nodes U and V in the atoms octree $T_A$, approximate the part of $E_{pol}$ resulting from the interaction between the sets of atoms under U and V. By $(x_U, y_U, z_U)$ we denote the geometric center of the atoms under U. By $r_U$ we denote the radius of the smallest ball centered at $(x_U, y_U, z_U)$ that encloses all atom centers under U, and by $r_{U,V}$ the distance between the geometric centers of U and V. For any atom u ∈ U, its center, radius, charge and Born radius are given by $(x_u, y_u, z_u)$, $r_u$, $q_u$ and $R_u$, respectively. For $0 \le k < M_\varepsilon = \log_{1+\varepsilon}\left(\frac{R_{max}}{R_{min}}\right)$, $q_U[k] = \sum_{(u \in U) \wedge (R_u \in [R_{min}(1+\varepsilon)^k,\, R_{min}(1+\varepsilon)^{k+1}))} q_u$, where $R_{min}$ and $R_{max}$ are the minimum and the maximum Born radius among all atoms in A.)

1. if leaf(U) ∧ leaf(V) then return $-\frac{\tau}{2}\sum_{(u \in U) \wedge (v \in V)} \frac{q_u q_v}{\sqrt{r_{uv}^2 + R_u R_v\, e^{-\frac{r_{uv}^2}{4 R_u R_v}}}}$ {exact value}

2. else if $r_{U,V} > (r_U + r_V)\left(1 + \frac{1}{\varepsilon}\right)$ then return $-\frac{\tau}{2}\sum_{0 \le i,j < M_\varepsilon} \frac{q_U[i] \cdot q_V[j]}{\sqrt{r_{U,V}^2 + R_{min}^2(1+\varepsilon)^{i+j}\, e^{-\frac{r_{U,V}^2}{4 R_{min}^2 (1+\varepsilon)^{i+j}}}}}$ {approximate}

3. else if leaf(U) then return $\sum_{V' \in children(V)}$ Approx-Epol( U, V′ ) {recurse on V}

4. else if leaf(V) then return $\sum_{U' \in children(U)}$ Approx-Epol( U′, V ) {recurse on U}

5. else return $\sum_{(U' \in children(U)) \wedge (V' \in children(V))}$ Approx-Epol( U′, V′ ) {recurse on U and V}

Approx-Epol Ends

Figure 8: Octree-based algorithm for approximating $E_{pol}$ from Born radii. Given the atoms octree $T_A$ with all Born radii already computed, $E_{pol}$ can be approximated (controlled by a given approximation parameter ε > 0) by making the following function call: Approx-Epol( root($T_A$), root($T_A$) ).
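Step 2 of Figure 8 is where the Born-radius grouping pays off: $M_\varepsilon^2$ grouped terms replace $|U|\cdot|V|$ pairwise ones. The Python sketch below (hypothetical names; atoms given as `(center, charge, born_radius)` tuples; $R_uR_v$ rounded to $R_{min}^2(1+\varepsilon)^{i+j}$) implements only this per-node-pair step and the exact step 1, so the two can be compared on a pair of well-separated groups; it does not include the octree traversal.

```python
import math

def gb_term(q_u, q_v, r2, r_uv):
    """q_u q_v / sqrt(r^2 + R_u R_v exp(-r^2 / (4 R_u R_v))), with r_uv = R_u R_v."""
    return q_u * q_v / math.sqrt(r2 + r_uv * math.exp(-r2 / (4.0 * r_uv)))

def epol_exact(U, V, tau):
    """Step 1 of Figure 8; U and V are lists of (center, charge, born_radius) tuples."""
    return -tau / 2 * sum(gb_term(qu, qv, math.dist(pu, pv) ** 2, Ru * Rv)
                          for pu, qu, Ru in U for pv, qv, Rv in V)

def epol_far(U, V, tau, eps, r_min, m_eps):
    """Step 2 of Figure 8: charges bucketed by Born-radius group, distance between centers.

    m_eps must exceed the largest group index of any atom in U or V.
    """
    def bucket(atoms):
        q = [0.0] * m_eps
        for _, q_a, R_a in atoms:
            q[math.floor(math.log(R_a / r_min, 1 + eps))] += q_a
        return q

    def center(atoms):
        return tuple(sum(p[k] for p, _, _ in atoms) / len(atoms) for k in range(3))

    q_U, q_V = bucket(U), bucket(V)
    r2 = math.dist(center(U), center(V)) ** 2
    return -tau / 2 * sum(gb_term(q_U[i], q_V[j], r2, r_min ** 2 * (1 + eps) ** (i + j))
                          for i in range(m_eps) for j in range(m_eps))
```

In such a comparison the difference between the two results comes mostly from replacing inter-atomic distances with the distance between the group centers, and it shrinks as the separation grows relative to $r_U + r_V$.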

3.4 Dispersion Energy

As mentioned in Section 1, the solute-solvent van der Waals interaction energy (also known as dispersion energy) is modeled as [20, 56]:

$$E_{vdw(s\text{-}s)} = \rho_0 \sum_{i=1}^{M} \int_{ex} u_i^{(att)}(\mathbf x_i, \mathbf r)\, d^3\mathbf r,$$

where $\rho_0$ is the bulk density, and $u_i^{(att)}$ is the van der Waals dispersive component of the interaction between atom $i \in [1, M]$ and the solvent, which is given as follows.

$$u_i^{(att)}(\mathbf x_i, \mathbf r) = \frac{1}{|\mathbf r - \mathbf x_i|^6}$$

Thus

$$E_{vdw(s\text{-}s)} = \rho_0 \sum_{i=1}^{M} \int_{ex} \frac{1}{|\mathbf r - \mathbf x_i|^6}\, d^3\mathbf r \qquad (13)$$

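The following discrete surface formulation of Equation (13) is obtained by applying the divergence theorem (using $\nabla \cdot \frac{\mathbf r - \mathbf x_i}{|\mathbf r - \mathbf x_i|^6} = -\frac{3}{|\mathbf r - \mathbf x_i|^6}$, which gives the factor 1/3) and Gaussian quadrature (similar to Equations 6 and 11 in Section 1).

$$E_{vdw(s\text{-}s)} = \frac{\rho_0}{3} \sum_{i=1}^{M} \sum_{k=1}^{m} w_k \frac{(\mathbf r_k - \mathbf x_i)\cdot \vec n_k}{|\mathbf r_k - \mathbf x_i|^6} \qquad (14)$$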


If $R_i$ is the Born radius of atom i calculated using the $r^6$-approximation (i.e., Equation 11), then Equation (14) can be rewritten as:

$$E_{vdw(s\text{-}s)} = \frac{4\pi\rho_0}{3} \sum_{i=1}^{M} \frac{1}{R_i^3} \qquad (15)$$

Therefore, $E_{vdw(s\text{-}s)}$ can be approximated in $O\left(\frac{1}{\varepsilon^2} M\log M\right)$ time and $O(M)$ space using $m = O(M)$ quadrature points and the technique described in Section 3.3.1 for the simultaneous approximation of the Born radii of all atoms in a molecule, where ε > 0 is the approximation parameter used for Born radius approximation.
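Equations (14) and (15) can be cross-checked on the simplest possible case. The sketch below (hypothetical names) evaluates both; for a lone atom at the center of a spherical surface of radius R sampled with equal weights $4\pi R^2/m$, the quadrature in (14) is exact and both return $4\pi\rho_0/(3R^3)$, i.e., the $r^6$ Born radius equals the sphere radius.

```python
import math

def evdw_surface(atoms, qpts, weights, normals, rho0):
    """Equation (14): (rho0/3) * sum_i sum_k w_k (r_k - x_i).n_k / |r_k - x_i|^6."""
    total = 0.0
    for x in atoms:
        for r, w, n in zip(qpts, weights, normals):
            d = [r[k] - x[k] for k in range(3)]
            total += w * sum(d[k] * n[k] for k in range(3)) / math.hypot(*d) ** 6
    return rho0 / 3.0 * total

def evdw_from_born(radii, rho0):
    """Equation (15): rho0 * (4 pi / 3) * sum_i 1 / R_i^3, with r^6-approximated Born radii."""
    return rho0 * 4.0 * math.pi / 3.0 * sum(1.0 / R ** 3 for R in radii)
```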

4 Approximating Forces

In this section we describe an efficient algorithm for approximating the polar part of the solvation force. The non-polar part can be approximated by taking derivatives of the surface area and/or the volume of the molecule w.r.t. the atom locations.

4.1 Polarization Force

In [5] we described an $O(M^2 \log M)$ time algorithm for approximating the polarization force acting at the center of all atoms of a molecule $\mathcal M$, where M is the number of atoms in $\mathcal M$. In this section we show that the force vectors for all M atoms can be approximated in $O(M\log M)$ time (w.h.p.) and $O(M)$ space. Thus the time complexity improves by a factor of M, and is essentially the same as that needed for approximating the polarization energy (see Section 3.3).

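For an atom $i \in \mathcal M$ (i.e., $i \in [1, M]$), let $\mathbf x_i$ be its center, $a_i$ its van der Waals radius, $q_i$ its charge, and $R_i$ its Born radius. Also, for $i, j \in \mathcal M$, let

$$\mathbf x_{ij} = \mathbf x_i - \mathbf x_j, \quad q_{ij} = q_i q_j, \quad R_{ij} = R_i R_j,$$

$$e_{ij} = e^{-\frac{r_{ij}^2}{4R_{ij}}}, \quad f_{ij} = r_{ij}^2 + R_{ij} e_{ij}, \quad h_{ij} = r_{ij}^2 + 4R_{ij}, \quad\text{and}\quad G_{ij} = \frac{q_{ij}}{f_{ij}^{1/2}}.$$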

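Then the polarization force acting at the center of atom $\alpha \in \mathcal M$ is obtained by taking the derivative of the polarization energy $E_{pol}$ w.r.t. the atom center $\mathbf x_\alpha$:

$$\mathbf F^{pol}_\alpha = -\frac{\partial E_{pol}}{\partial \mathbf x_\alpha} = -\frac{\tau}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}\frac{\partial G_{ij}}{\partial \mathbf x_\alpha} = -\frac{\tau}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}\left[\frac{\partial G_{ij}}{\partial r_{ij}}\frac{\partial r_{ij}}{\partial \mathbf x_\alpha} + \frac{\partial G_{ij}}{\partial R_i}\frac{\partial R_i}{\partial \mathbf x_\alpha} + \frac{\partial G_{ij}}{\partial R_j}\frac{\partial R_j}{\partial \mathbf x_\alpha}\right],$$

where $\tau = 1 - \frac{1}{\varepsilon_{solv}}$. In order to facilitate the computation of the derivatives of the $R_i$'s, a volumetric density function $\varrho_i$ is defined [5] for each atom i, which includes a cubic spline within a band of width w⁷ near the atom boundary.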

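$$\varrho_i(\mathbf r) = \begin{cases} 1, & |\mathbf r - \mathbf x_i| \le a_i \\ \frac{2}{w^3}\left(|\mathbf r - \mathbf x_i| - a_i\right)^3 - \frac{3}{w^2}\left(|\mathbf r - \mathbf x_i| - a_i\right)^2 + 1, & a_i < |\mathbf r - \mathbf x_i| < a_i + w \\ 0, & |\mathbf r - \mathbf x_i| \ge a_i + w \end{cases}$$

⁷Typically, w = 1.4 Å, i.e., the radius of a water molecule.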
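The spline is easy to transcribe, and doing so makes its smoothness visible. The sketch below (hypothetical names; the band width defaults to the typical w = 1.4 Å of footnote 7, and the van der Waals radius a is an input) evaluates $\varrho_i$ and its radial slope as functions of $|\mathbf r - \mathbf x_i|$: the cubic equals 1 with zero slope at the inner edge and 0 with zero slope at the outer edge, so $\varrho_i$ is continuously differentiable.

```python
def density(dist, a, w=1.4):
    """rho_i as a function of dist = |r - x_i| (van der Waals radius a, band width w)."""
    if dist <= a:
        return 1.0
    if dist >= a + w:
        return 0.0
    t = dist - a
    return 2.0 * t ** 3 / w ** 3 - 3.0 * t ** 2 / w ** 2 + 1.0

def density_slope(dist, a, w=1.4):
    """d rho_i / d|r - x_i|; zero outside the band and at both band edges."""
    if dist <= a or dist >= a + w:
        return 0.0
    t = dist - a
    return 6.0 * t ** 2 / w ** 3 - 6.0 * t / w ** 2
```

The gradient with respect to the atom center then follows by the chain rule, $\partial \varrho_i/\partial \mathbf x_i = -\varrho_i'(|\mathbf r - \mathbf x_i|)\,(\mathbf r - \mathbf x_i)/|\mathbf r - \mathbf x_i|$.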


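Assuming that no more than four atoms overlap simultaneously, the following function $h_\alpha$ is defined, where j, k, l are the atoms overlapping with atom α.

$$h_\alpha(\mathbf r) = \frac{\partial \varrho_\alpha}{\partial \mathbf x_\alpha}(\mathbf r)\, g_\alpha(\mathbf r), \quad\text{where}\quad g_\alpha = 1 - \sum_j \varrho_j + \sum_{j<k}\varrho_j\varrho_k - \sum_{j<k<l}\varrho_j\varrho_k\varrho_l$$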
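Since at most three atoms j, k, l overlap α, the inclusion-exclusion sum defining $g_\alpha$ is exactly the expansion of $\prod_j (1 - \varrho_j)$ (when the $\varrho_j$ are 0/1 indicators, this is the indicator of lying outside every neighbouring atom). The Python check below (hypothetical names) confirms the identity numerically; the product form is a convenient way to evaluate $g_\alpha$ in code.

```python
from itertools import combinations
from math import prod

def g_inclusion_exclusion(rhos):
    """1 - sum rho_j + sum_{j<k} rho_j rho_k - sum_{j<k<l} rho_j rho_k rho_l."""
    assert len(rhos) <= 3, "the text assumes at most three atoms overlap atom alpha"
    return (1.0 - sum(rhos)
            + sum(x * y for x, y in combinations(rhos, 2))
            - sum(x * y * z for x, y, z in combinations(rhos, 3)))

def g_product(rhos):
    """Equivalent closed form: prod_j (1 - rho_j)."""
    return prod(1.0 - x for x in rhos)
```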

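Also let Γ be the molecular surface of $\mathcal M$, and $\vec n(\mathbf r)$ be the outward unit normal at any given point $\mathbf r$ on Γ. Then it can be shown that

$$\mathbf F^{pol}_\alpha = -\tau\left(\mathbf U_\alpha + \mathbf V_\alpha - P_\alpha \mathbf W_\alpha\right),$$

where

$$P_\alpha = -\frac{1}{32\pi}\sum_{j=1}^{M}\frac{q_{\alpha j}\, e_{\alpha j}\, h_{\alpha j}}{f_{\alpha j}^{3/2}}, \qquad \mathbf U_\alpha = \frac{1}{4}\sum_{j=1}^{M}\frac{q_{\alpha j}\,(e_{\alpha j} - 4)\,\mathbf x_{\alpha j}}{f_{\alpha j}^{3/2}},$$

$$\mathbf V_\alpha = \sum_{j=1}^{M} P_j \int_{a_\alpha \le |\mathbf r - \mathbf x_\alpha| \le a_\alpha + w} h_\alpha(\mathbf r)\,\frac{1}{|\mathbf r - \mathbf x_j|^4}\, d^3\mathbf r, \qquad\text{and}\qquad \mathbf W_\alpha = \int_\Gamma \frac{\vec n(\mathbf r)}{|\mathbf r - \mathbf x_\alpha|^4}\, d^2\mathbf r.$$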

We describe below how to approximate each of these four quantities (i.e., $P_\alpha$, $\mathbf U_\alpha$, $\mathbf V_\alpha$ and $\mathbf W_\alpha$) for all M atoms of $\mathcal M$ in $O(M\log M)$ time.
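Before approximating, it helps to have a direct $O(M^2)$ evaluation of $P_\alpha$ and $\mathbf U_\alpha$ to test against. The Python sketch below (hypothetical function name; plain tuples for $\mathbf x_i$) transcribes the two definitions term by term, with $q_{\alpha j}$, $e_{\alpha j}$, $f_{\alpha j}$ and $h_{\alpha j}$ as defined earlier in this section.

```python
import math

def p_and_u(x, q, R):
    """Direct O(M^2) evaluation of P_alpha (scalar) and U_alpha (3-vector) for every atom.

    x: atom centers as 3-tuples, q: charges, R: Born radii.  The sum over j includes
    j = alpha, whose contribution to U_alpha vanishes because x_{alpha alpha} = 0.
    """
    M = len(x)
    P, U = [0.0] * M, []
    for al in range(M):
        u = [0.0, 0.0, 0.0]
        for j in range(M):
            x_aj = [x[al][k] - x[j][k] for k in range(3)]
            r2 = sum(v * v for v in x_aj)
            R_aj = R[al] * R[j]
            e = math.exp(-r2 / (4.0 * R_aj))
            f = r2 + R_aj * e
            h = r2 + 4.0 * R_aj
            q_aj = q[al] * q[j]
            P[al] -= q_aj * e * h / (32.0 * math.pi * f ** 1.5)
            for k in range(3):
                u[k] += 0.25 * q_aj * (e - 4.0) * x_aj[k] / f ** 1.5
        U.append(tuple(u))
    return P, U
```

A transcription like this can be checked by finite differences: with $S = \sum_{i,j} G_{ij}$, one has $\partial S/\partial \mathbf x_\alpha = 2\mathbf U_\alpha$ at fixed Born radii and $\partial S/\partial R_\alpha = 8\pi P_\alpha / R_\alpha$ at fixed centers.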

Approximating $P_\alpha$ and $\mathbf U_\alpha$. Both these terms can be approximated using a technique similar to the one used for approximating $E_{pol}$ in Section 3.3.2. Suppose $R_{min}$ is the smallest Born radius in $\mathcal M$ and $R_{max}$ is the largest. As in Section 3.3.2, we divide the atoms of $\mathcal M$ into $M_\varepsilon = \log_{1+\varepsilon}(R_{max}/R_{min})$ groups, where ε > 0 is a given approximation parameter. Each atom with Born radius in $[R_{min}(1+\varepsilon)^{k-1}, R_{min}(1+\varepsilon)^k)$ is placed in group $\mathcal M_k$, $k \in [1, M_\varepsilon]$, and we approximate the Born radius of each such atom with $R_{min}(1+\varepsilon)^{k-1}$. Now each $P_\alpha$ (and similarly $\mathbf U_\alpha$) can be decomposed as follows.

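$$P_\alpha = \sum_{k=1}^{M_\varepsilon} P_{\alpha,k}, \quad\text{where}\quad P_{\alpha,k} = -\frac{1}{32\pi}\sum_{j \in \mathcal M_k}\frac{q_{\alpha j}\, e_{\alpha j}\, h_{\alpha j}}{f_{\alpha j}^{3/2}}$$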

Now for each pairing $\langle \mathcal M_p, \mathcal M_k\rangle$ of the groups, we traverse the two octrees of the two groups simultaneously and approximate the $P_{\alpha,k}$ value for each atom $\alpha \in \mathcal M_p$ using the technique we used to approximate the Born radii of atoms (in Section 3.3.1), i.e., by collecting the partial sums at various internal nodes of $\mathcal M_p$'s octree and finally pushing them to the atoms through a top-down traversal of the tree.

The overall running time of this algorithm for all possible pairings is $O\left(\frac{1}{\varepsilon^2} M\log M\right)$, and it uses $O(M)$ space.

Approximating $\mathbf V_\alpha$. Observe that $\mathbf V_\alpha$ can be written as follows.

$$\mathbf V_\alpha = \int_{a_\alpha \le |\mathbf r - \mathbf x_\alpha| \le a_\alpha + w} h_\alpha(\mathbf r) \sum_{j=1}^{M}\frac{P_j}{|\mathbf r - \mathbf x_j|^4}\, d^3\mathbf r$$

The integration domain of $\mathbf V_\alpha$ is a regular spherical shell of width w around atom α. In order to numerically evaluate $\mathbf V_\alpha$, we sample a constant number of integration points from this shell in O(1) time as described in [5]. Let $S_\alpha$ be the set of integration points sampled from α's shell. Then $\mathbf V_\alpha$ can be decomposed as follows.

$$\mathbf V_\alpha \approx \sum_{\mathbf r \in S_\alpha} \mathbf V_{\alpha,\mathbf r}, \quad\text{where}\quad \mathbf V_{\alpha,\mathbf r} = h_\alpha(\mathbf r)\sum_{j=1}^{M}\frac{P_j}{|\mathbf r - \mathbf x_j|^4}$$

Observe that we have already computed $P_j$ for each atom j. For each $\mathbf r \in S_\alpha$, we can precompute $h_\alpha(\mathbf r)$ in constant time as follows. Packing properties of molecules ensure that each atom can be overlapped by only a constant number of other atoms. Hence for each atom α we can locate all atoms overlapping it in O(1) time (w.h.p.) using our Dynamic Packing Grid (DPG) data structure [1]. Initializing the DPG data structure for this purpose requires inserting the centers of all atoms of the molecule into the data structure in O(M) time (w.h.p.). The data structure uses O(M) space. Then each $h_\alpha(\mathbf r)$ can be evaluated in constant time.

Let $S = \cup_{\alpha \in \mathcal M} S_\alpha$. We now construct an octree for S and another octree for $\mathcal M$, and traverse both simultaneously in order to approximate $\mathbf V_{\alpha,\mathbf r}$ for each integration point $\mathbf r \in S_\alpha$, $\forall \alpha \in \mathcal M$. The algorithm is similar to the one for Born radius computation described in Section 3.3.1. Since $|S| = \Theta(M)$, the algorithm runs in $O\left(\frac{1}{\varepsilon^2} M\log M\right)$ time and uses $O(M)$ space, where ε > 0 is a given approximation parameter.

Approximating $\mathbf W_\alpha$. We use the following discrete form:

$$\mathbf W_\alpha \approx \sum_{k=1}^{m} \frac{w_k\, \vec n_k}{|\mathbf r_k - \mathbf x_\alpha|^4},$$

where $\mathbf r_k$, $k \in [1, m]$, are $m = O(M)$ Gaussian integration points sampled from Γ as in Section 3.3.1. Each $\mathbf r_k$ has a weight $w_k$ and a unit outward normal $\vec n_k$ drawn on Γ at $\mathbf r_k$. Now the $\mathbf W_\alpha$ values for all $\alpha \in \mathcal M$ can be approximated simultaneously using an octree-based algorithm similar to the one used for approximating Born radii in Section 3.3.1. For any given approximation parameter ε > 0, the algorithm runs in $O\left(\frac{1}{\varepsilon^2} M\log M\right)$ time and uses $O(M)$ space.

The approximated values of $P_\alpha$, $\mathbf U_\alpha$, $\mathbf V_\alpha$ and $\mathbf W_\alpha$ for each $\alpha \in \mathcal M$ can now be combined in constant time to obtain $\mathbf F^{pol}_\alpha$. Hence, the force vectors for all atoms in $\mathcal M$ can be approximated in $O(M\log M)$ time (w.h.p.), using space linear in M.

5 Approximating Interfaces

We describe algorithms for fast approximation of the interface area of bio-molecular complexes. We consider both plain and hydrophobic/hydrophilic interfaces.

5.1 Interface Area

Suppose molecules A and B form a complex, and let $S_P$ denote the molecular surface of molecule P ∈ {A, B}. Suppose both surfaces are decomposed into numerous tiny patches of almost equal area, and each such patch π is represented by a tuple $\langle c_\pi, a_\pi\rangle$, where $c_\pi$ is a point on π (preferably its center or close to its center), which we call the patch center, and $a_\pi$ is its area. Then the interface area can be approximated by first identifying each patch $\pi \in S_A$ such that there exists another patch $\pi' \in S_B$ with $dist(c_\pi, c_{\pi'}) < \mu$ and vice versa, where $\mu$ is a user-defined distance cutoff, and


Figure 9: Explanation of the interface area approximation algorithm in 2D using quadtrees [22] (i.e., the 2D variant of octrees): In the leftmost figure the bounding box of molecule A (resp. B) represents the root node of the quadtree storing molecule A (resp. B). The smallest boxes in the middle and the rightmost figures represent quadtree nodes at level 2 (i.e., children of the root node) and level 3, respectively. In the leftmost figure the two root nodes (nodes A and B) intersect, and so we move to their child nodes in the middle figure. In the middle figure only nodes A2 and B3 intersect, and so we ignore all node pairs except 〈A2, B3〉. In the rightmost figure we move to the children of A2 and B3. Among the 16 possible pairings in the rightmost figure we need to consider only the following 7, since only these pairs intersect and both nodes in each pair contain surface patches: 〈A21, B31〉, 〈A21, B33〉, 〈A22, B31〉, 〈A22, B33〉, 〈A22, B34〉, 〈A24, B33〉 and 〈A24, B34〉.

then summing up the areas of all such identified interface patches. If exact interface areas are not required and a value close to the exact value suffices, we can use the weighted surface integration points described in Section 2 instead of actual patches, treating each integration point as a point on a patch whose area is the integration point's weight.

Given the two molecules A and B along with their molecular surfaces decomposed into tiny patches as described above, we would like to evaluate the following expression.

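$$InterfaceArea(A,B) = \sum_{\pi \in IP_A(B)} a_\pi + \sum_{\pi \in IP_B(A)} a_\pi,$$

where $IP_{P_1}(P_2) = \left\{\pi_1 \mid (\pi_1 \in S_{P_1}) \wedge \exists_{\pi_2 \in S_{P_2}}\left(dist(c_{\pi_1}, c_{\pi_2}) < \mu\right)\right\}$.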
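The definition above can be evaluated directly once nearby patch centers can be found quickly. The sketch below stands in for the DPG with a plain uniform grid of cell side $\mu$ (an assumption, without the DPG's w.h.p. guarantees), so each patch needs to inspect only the 27 surrounding cells; it computes the same quantity as the traversal in Figure 10 but without the octree pruning. All names are hypothetical, and patches are given as `(center, area)` pairs.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(centers, cell):
    """Uniform grid of side `cell`: integer cell -> list of patch centers in it."""
    grid = defaultdict(list)
    for c in centers:
        grid[tuple(math.floor(v / cell) for v in c)].append(c)
    return grid

def has_partner(c, grid, mu):
    """True if some center stored in `grid` (cell side mu) is strictly closer than mu to c."""
    key = tuple(math.floor(v / mu) for v in c)
    for off in product((-1, 0, 1), repeat=3):
        for p in grid.get(tuple(k + o for k, o in zip(key, off)), ()):
            if math.dist(c, p) < mu:
                return True
    return False

def interface_area(patches_a, patches_b, mu):
    """InterfaceArea(A, B) for patches given as (center, area) pairs."""
    grid_a = build_grid([c for c, _ in patches_a], mu)
    grid_b = build_grid([c for c, _ in patches_b], mu)
    return (sum(area for c, area in patches_a if has_partner(c, grid_b, mu))
            + sum(area for c, area in patches_b if has_partner(c, grid_a, mu)))
```

For instance, with $\mu$ = 1, patches of A centered at (0, 0, 0), (5, 0, 0) and (0.5, 0, 0) with areas 1, 2 and 0.5, and patches of B centered at (0.9, 0, 0) and (20, 0, 0) with areas 3 and 4, the interface area is 1 + 0.5 + 3 = 4.5.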

Our approach for approximating the interface area is based on octrees [37] and our Dynamic Packing Grid (DPG) data structure [1] mentioned in Section 3.1.

For each molecule P ∈ {A, B}, we construct an octree $T_P$ based on the patches on its surface $S_P$. For each octree node $c \in T_P$, we denote by $N_c$ the set of all patches $\pi \in S_P$ such that $c_\pi$ belongs to c. Node c is marked as a leaf and is not subdivided provided the number of patches it contains is below some user-defined threshold. Each leaf c has a DPG data structure $PG_c$, which is initialized with the patch centers of $N_c$; given any query point p, it can answer in O(1) time (w.h.p.⁸) whether there exists a patch center in $N_c$ within any given constant distance cutoff (say, $\mu$)

⁸For an input of size n, an event E occurs w.h.p. (with high probability) if, for any α ≥ 1 and c independent of n, $\Pr(E) \ge 1 - c/n^{\alpha}$.


CollectInterfacePatches( x, y, PG )

(Inputs are two octree nodes $x \in T_{S_A}$ and $y \in T_{S_B}$. The function collects all interface patches of $S_A$ and $S_B$ in a packing grid data structure PG. By child(x) (resp. child(y)) we denote the set of non-empty octree nodes obtained by subdividing node x (resp. y). We denote by $r_{x,y}$ the distance between the centers of x and y.)

1. if $r_{x,y} - (r_x + r_y) \ge \mu$ then return {too far: no interface between $N_x$ and $N_y$}

2. else if leaf(x) ∧ leaf(y) then {collect interface patches, if any}

   for each $\pi \in N_x$ do: if $(\pi \notin PG) \wedge \exists_{p \in PG_y}\,(dist(c_\pi, p) < \mu)$ then insert π into PG {collect interface patches of $S_A$}

   for each $\pi \in N_y$ do: if $(\pi \notin PG) \wedge \exists_{p \in PG_x}\,(dist(c_\pi, p) < \mu)$ then insert π into PG {collect interface patches of $S_B$}

3. else if leaf(x) then

   for each $c_y \in child(y)$ do CollectInterfacePatches( x, $c_y$, PG ) {recursive collection}

4. else if leaf(y) then

   for each $c_x \in child(x)$ do CollectInterfacePatches( $c_x$, y, PG ) {recursive collection}

5. else for all $c_x \in child(x)$ and $c_y \in child(y)$ do

   CollectInterfacePatches( $c_x$, $c_y$, PG ) {recursive collection}

CollectInterfacePatches Ends

ApproxInterfaceArea( A, B )

(Inputs are two molecules A and B. We assume that for each molecule P ∈ {A, B}, the octree $T_{S_P}$ and the packing grid $PG_{S_P}$ based on the patches on its surface $S_P$ have already been constructed as discussed in Section ??. The function evaluates and returns InterfaceArea(A, B).)

1. initialize an empty packing grid PG {for collecting interface patches without any duplicates}

2. CollectInterfacePatches( root($T_{S_A}$), root($T_{S_B}$), PG ) {collect the interface patches in PG}

3. return $\sum_{\pi \in PG} a_\pi$ {sum up the patch areas and return}

ApproxInterfaceArea Ends

Figure 10: Identify the interface patches on $S_A$ and $S_B$ and return the sum of their areas. The function CollectInterfacePatches( x, y, PG ) collects in PG (without duplicates) each patch from $N_x$ that is within distance $\mu$ from at least one patch in $N_y$ (center-to-center distance), and vice versa. The initial call is CollectInterfacePatches( root($T_{S_A}$), root($T_{S_B}$), PG ) for computing InterfaceArea(A, B).

from p [1].

We traverse $T_A$ and $T_B$ starting from their roots in order to identify all interface patches of $S_A$ and $S_B$, and collect them in a DPG data structure PG. A DPG allows O(1) time (w.h.p.) insertion of points with satellite data (e.g., patches) as well as O(1) time (w.h.p.) checking for duplicates (i.e., whether the point to be inserted already exists in the data structure) [1]. After all interface patches (without duplicates) are identified, the sum of their areas is returned. Suppose we are at nodes $x \in T_A$ and $y \in T_B$ at some point during our traversal of the octrees. Our goal is to collect in PG (without duplicates) each patch $\pi_1 \in N_x$ that is within distance $\mu$ from at least one patch $\pi_2 \in N_y$ using center-to-center distance, and vice versa. We first check whether x and y are far enough apart that no interface between $N_x$ and $N_y$ is possible, and if so we return. Otherwise, if both nodes are leaves, it

21

Page 23: ICES REPORT 10-32ICES REPORT 10-32 August 2010 Reference: Rezaul Chowdhury and Chandrajit Bajaj, "Algorithms for Faster Molecular Energetics, Forces and Interfaces", ICES REPORT 10-32,

ComputeHydroVec( A, B )

(Inputs are two molecules A and B. We assume that for each molecule P ∈ {A,B}, the octree TSP and the packing gridPGSP based on the patches on its surface SP have already been constructed as discussed in Section ??. The functionevaluates and returns HydroV ec(A,B).)

1. initialize an empty packing grid PG {for collecting interface patches without any duplicates}

2. CollectInterfacePatches( root(TSA ), root(TSB ), PG ) {collect the interface patches in PG: see Figure 10}

3. v1 ←∑

π ∈ PG ∧ π ∈ PGSAhπ < 0

hπaπ , v2 ←∑

π ∈ PG ∧ π ∈ PGSAhπ > 0

hπaπ ,

v3 ←∑

π ∈ PG ∧ π ∈ PGSBhπ < 0

hπaπ , v4 ←∑

π ∈ PG ∧ π ∈ PGSBhπ > 0

hπaπ {compute vector entries}

4. return 〈 v1, v2, v3, v4 〉 {return vector}

ComputeHydroVec Ends

Figure 11: This function computes and returns the hydrophobicity vector HydroV ec(A,B). FunctionCollectInterfacePatches from Section 5.1 is used for identifying the interface patches on SA and SB .

checks for each patch π ∈ Nx if there exists a patch in Ny within interface distance (i.e., center tocenter distance is less than µ), and if so, and π is not already in PG, we add π to PG. The patchesin Nx are handled in the same way. The use of PGx and PGy for identifying the interface patchesallows one to identify all such patches in O (|Nx|+ |Ny|) time (w.h.p.) instead of O (|Nx||Ny|)time using direct checks. This improves efficiency significantly because though the total number ofsurface patches is within a constant factor of the total number of atoms, this constant is large. Ifneither of the two conditions mentioned above holds, we identify the interface patches recursively.Figure 9 shows an example in 2D. The pseudocode of the algorithm is given in Figure 10.

It can be shown that the running time of the algorithm is O((M_A + M_B) log(M_A + M_B)) (w.h.p.), where M_P is the number of atoms in P ∈ {A,B}. In practice, however, it runs much faster: in O(N_int) time (w.h.p.), where N_int is the total number of patches in the interface.
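The traversal of Figure 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a patch is a hypothetical (key, center, area) tuple; the octree is a simple bounding-box octree with a hypothetical leaf capacity LEAF_SIZE; a Python dict keyed by integer cell indices (cell side µ) stands in for the per-leaf packing grids PG_x; and a plain dict keyed by patch key stands in for the DPG of [1], giving expected rather than w.h.p. O(1) insertion and duplicate checks. All function names are illustrative.

```python
import math
from collections import defaultdict

LEAF_SIZE = 8  # hypothetical leaf capacity of the octree


class Node:
    """Octree node over patch centers; a patch is a (key, center, area) tuple."""

    def __init__(self, patches):
        self.patches = patches            # N_x: all patches under this node
        cs = [c for _, c, _ in patches]
        self.lo = tuple(min(c[i] for c in cs) for i in range(3))
        self.hi = tuple(max(c[i] for c in cs) for i in range(3))
        self.children = []
        self.grids = {}                   # leaf hash grids (stand-in for PG_x), keyed by mu
        if len(patches) > LEAF_SIZE and self.lo != self.hi:
            mid = [(self.lo[i] + self.hi[i]) / 2 for i in range(3)]
            octants = defaultdict(list)
            for p in patches:
                octants[tuple(p[1][i] > mid[i] for i in range(3))].append(p)
            if len(octants) > 1:          # guard against a degenerate split
                self.children = [Node(ps) for ps in octants.values()]


def build_tree(name, patches):
    """patches: list of (center, area); keys (name, index) keep A and B distinct."""
    return Node([((name, i), c, a) for i, (c, a) in enumerate(patches)])


def box_dist(x, y):
    """Minimum distance between the bounding boxes of two nodes."""
    gaps = [max(0.0, x.lo[i] - y.hi[i], y.lo[i] - x.hi[i]) for i in range(3)]
    return math.sqrt(sum(g * g for g in gaps))


def cell(c, mu):
    return tuple(math.floor(c[i] / mu) for i in range(3))


def leaf_grid(node, mu):
    """Uniform hash grid with cell side mu over the patches of a leaf."""
    if mu not in node.grids:
        g = defaultdict(list)
        for p in node.patches:
            g[cell(p[1], mu)].append(p)
        node.grids[mu] = g
    return node.grids[mu]


def has_neighbor(p, grid, mu):
    """True if some patch in grid has its center closer than mu to p's center."""
    cx, cy, cz = cell(p[1], mu)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for q in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if math.dist(p[1], q[1]) < mu:
                        return True
    return False


def collect_interface_patches(x, y, mu, pg):
    if box_dist(x, y) >= mu:
        return                            # no interface between N_x and N_y is possible
    if not x.children and not y.children:
        gx, gy = leaf_grid(x, mu), leaf_grid(y, mu)
        for p in x.patches:
            if p[0] not in pg and has_neighbor(p, gy, mu):
                pg[p[0]] = p
        for q in y.patches:
            if q[0] not in pg and has_neighbor(q, gx, mu):
                pg[q[0]] = q
        return
    # A leaf acts as its own single child, so the other side keeps shrinking.
    for cx in x.children or [x]:
        for cy in y.children or [y]:
            collect_interface_patches(cx, cy, mu, pg)


def approx_interface_area(tree_a, tree_b, mu):
    pg = {}                               # patch key -> patch, so duplicates are skipped
    collect_interface_patches(tree_a, tree_b, mu, pg)
    return sum(p[2] for p in pg.values())
```

Because each grid cell has side µ, any patch whose center lies within distance µ of a query center must fall in one of the 27 cells around it, which is what keeps each leaf-leaf check linear in |N_x| + |N_y| in expectation.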

5.2 Hydrophobic and Hydrophilic Interface Area

We approximate the hydrophobic and hydrophilic interface area of a complex based on per-residue hydrophobicity values obtained from computational log(P) determinations by the "Small Fragment Approach" [9]. We use the following equation to rescale raw log(P) values to the range [−1, +1], with −1 being the most hydrophilic and +1 the most hydrophobic:

$$\text{Scaled Parameter} = \frac{-\,\text{Raw Parameter} - 0.181}{2.242}$$

Suppose, as in Section 5.1, that we have the patch decomposition of the surface of each molecule P ∈ {A,B}. However, each patch π is now represented as a triple ⟨c_π, a_π, h_π⟩, where the additional value h_π is a measure of the hydrophobicity of π in the following sense. We assume that each atom k ∈ P is assigned a hydrophobicity value h_k, and

$$h_\pi = \frac{1}{|S_{P,\pi}|} \sum_{k \in S_{P,\pi}} h_k, \qquad S_{P,\pi} = \{\, k \mid (k \in P) \wedge (\mathrm{dist}(c_k, c_\pi) < \nu r_k) \,\},$$

where c_k is the center of atom k, r_k is its van der Waals radius, and ν is a user-defined constant.
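The per-patch hydrophobicity h_π defined above is simply an average over the atoms near the patch center. A minimal Python sketch, assuming atoms are given as hypothetical (center, vdw_radius, h_k) tuples for a single molecule P; the definition leaves h_π undefined when S_{P,π} is empty, and treating that case as neutral (0.0) is our assumption. The brute-force scan costs O(M) per patch; in practice one would query the atom octree for candidate atoms instead.

```python
import math


def patch_hydrophobicity(patch_center, atoms, nu):
    """h_pi: mean h_k over atoms k with dist(c_k, c_pi) < nu * r_k.

    atoms is a list of (center, vdw_radius, h_k) tuples for one molecule P.
    """
    near = [h for c, r, h in atoms if math.dist(c, patch_center) < nu * r]
    if not near:
        return 0.0  # assumption: an empty S_{P,pi} is treated as neutral
    return sum(near) / len(near)
```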

We compute the following vector.


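$$\mathrm{HydroVec}(A,B) = \Big\langle \sum_{\pi \in IP_A(B),\ h_\pi < 0} h_\pi a_\pi,\ \sum_{\pi \in IP_A(B),\ h_\pi > 0} h_\pi a_\pi,\ \sum_{\pi \in IP_B(A),\ h_\pi < 0} h_\pi a_\pi,\ \sum_{\pi \in IP_B(A),\ h_\pi > 0} h_\pi a_\pi \Big\rangle,$$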

where IP_{P_1}(P_2) is as defined in Section 5.1.

Our approach for approximating the hydrophobicity vector is exactly the same as the interface area approximation algorithm in Section 5.1 and has the same running time, so we do not analyze the function here. The pseudocode of the algorithm is given in Figure 11. In fact, the hydrophobicity vector can be computed simultaneously with the approximate interface area with negligible overhead.
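The remark that the hydrophobicity vector comes almost for free can be illustrated with a single pass over the collected interface patches. A minimal Python sketch, assuming each interface patch is given as a hypothetical (owner, h, a) tuple with owner "A" or "B" (i.e., whether the patch lies on S_A or S_B); it returns InterfaceArea(A,B) and ⟨v1, v2, v3, v4⟩ together.

```python
def interface_area_and_hydro_vec(interface_patches):
    """interface_patches: iterable of (owner, h, a) with owner in {"A", "B"}.

    Returns (InterfaceArea(A, B), HydroVec(A, B)) from a single pass.
    """
    area = 0.0
    v = [0.0, 0.0, 0.0, 0.0]             # <v1, v2, v3, v4>
    for owner, h, a in interface_patches:
        area += a
        base = 0 if owner == "A" else 2   # v1/v2 for S_A, v3/v4 for S_B
        if h < 0:
            v[base] += h * a
        elif h > 0:
            v[base + 1] += h * a
        # h == 0 contributes to the area but to no vector entry
    return area, tuple(v)
```

Patches with h_π = 0 count toward the area but toward none of the four entries, matching the strict inequalities in the definition of HydroVec(A,B).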

6 Parallelization

All approximation algorithms presented in this paper are based on a divide-and-conquer strategy, and are thus particularly suitable for multithreaded execution on multicore machines. All of them are based on the simultaneous traversal of one or two octrees using two node pointers. In general, if at any stage of the traversal the two current nodes are very close to each other, we pair each child of one node with each child of the other and recursively consider each such pair. Each such pair can be processed independently of the others, and thus in parallel. For a molecule with M atoms, the total sequential work performed by each algorithm is O(M log M) and the length of the longest path in its computational DAG is O(log M) (see footnote 9). Hence the parallel running time of the algorithm on a machine with p cores is $O\left(\left(\frac{M}{p} + 1\right)\log M\right)$, assuming that no core is left idle when there are jobs in the queue that can be processed in parallel (Brent's principle [11]).
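The parallel bound follows in one step from Brent's principle for a greedy scheduler. In the derivation below, W (total work), D (DAG depth) and T_p (running time on p cores) are our notation, not the report's.

```latex
% Brent's principle: a greedy scheduler on p cores finishes in T_p <= W/p + D
\begin{align*}
W &= O(M \log M), \qquad D = O(\log M),\\
T_p &\le \frac{W}{p} + D
     = O\!\left(\frac{M \log M}{p}\right) + O(\log M)
     = O\!\left(\left(\frac{M}{p} + 1\right)\log M\right).
\end{align*}
```

For p ≤ M the M/p term dominates, so the bound reduces to O((M log M)/p), i.e., linear speed-up in the number of cores.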

7 Experimental Results for Free Energy Terms

In this section we present performance figures for the octree-based algorithms for approximating the Lennard-Jones potential, Born radii and Generalized Born polarization energy. The experiments in Section 7.1 were performed on a 16-core computation node containing 4 quad-core 2.3 GHz AMD Opteron processors with 32 GB RAM, while all other experiments were run on 3 GHz 2×dual-core (i.e., 4 cores) AMD Opteron 2222 processors with 4 GB RAM. All algorithms were tested on ZDock Benchmark Suite 2.0 [44], which contains a total of 80 complexes (160 proteins). The number of atoms per protein varied from 436 to 11,238, with an average of 3,781 and a median of 3,275. We have also included some results on very large virus structures.

7.1 Approximating Lennard-Jones Potential

We have implemented a multithreaded version of the Lennard-Jones potential approximation algorithm described in Section 3.1. In Figure 12 we compare this algorithm with the naïve quadratic-time algorithm for computing the Lennard-Jones potential. Both algorithms were run on 60 rigid-body complexes from ZDock Benchmark 2.0.

9 This is equal to the height of the octree, assuming that all atoms of the molecule form a single connected component.


Figure 12: Performance of the octree-based multithreaded Lennard-Jones potential approximation algorithm on the 60 rigid-body complexes from ZDock Benchmark 2.0. (a) Speed-up factors achieved over naïve computation as the number of threads varies. (b) Speed-up over naïve computation as the error margin varies.

Figure 12(a) plots the speed-up factors achieved by the approximation algorithm as the number of threads is varied while ε is kept fixed so that the maximum error (from the exact result) is at most 1.5%. With 8 threads (on 8 cores) the approximation algorithm runs up to 225 times faster than the naïve algorithm. We have not included results for more than 8 cores because most complexes used in this experiment are too small to achieve any significant speed-up with more than 10 cores.

Figure 12(b) shows how the speed-up of the approximation algorithm over the naïve algorithm changes as the error bound (i.e., ε) is varied while the number of threads is kept fixed at 8. It reached a speed-up factor of more than 100 even when the error bound was less than 1%.


Figure 13: Performance of the octree-based Born radii calculation algorithm on ZDock Benchmark 2.0 complexes (individually for both the receptor and the ligand). (a) Speed-up factors achieved over the NFFT-based algorithm. (b) Average absolute error from the exact value.


7.2 Approximating Born Radii

Figure 13 compares the performance of our implementation of the octree-based Born radii approximation algorithm with the NFFT-based algorithm. The $r^4$ approximation (i.e., Equation (4)) was used. Both algorithms were multithreaded and hence used all 4 cores of the machine they were run on. For both algorithms the average absolute error was calculated by averaging, over all atoms, the absolute difference from the exact Born radius computed using Equation (6). In our experiments the octree-based algorithm ran 25 to 133 times faster than the NFFT-based algorithm while still being 5 to 10 times more accurate. The average absolute error of the octree-based algorithm was consistently around 1%, while that of the NFFT-based algorithm varied from 4% to 12%.


Figure 14: Performance of the octree-based polarization energy calculation algorithm on ZDock Benchmark 2.0 complexes (individually for both the receptor and the ligand). (a) Speed-up factors achieved over the naïve algorithm. (b) Average absolute error from the exact value.

7.3 Approximating Polarization Energy

In Figure 14 we compare the performance of an unthreaded version of the octree-based G_pol approximation algorithm with that of the naïve quadratic-time algorithm. The octree-based algorithm ran up to 2 times faster with less than 4% relative error, and the speed-up factor improved with the size of the system.

Figure 15 shows how our approximation algorithms for estimating the Born radii (using the $r^6$ approximation, i.e., Equation (10)) and the polarization energy scale as the system becomes very large. The figure plots % error against the running time of our multithreaded algorithm (Born radii approximation followed by E_pol approximation), expressed as a percentage of the running time of the unthreaded naïve algorithm, as the approximation parameters vary. The experiments were run on a 16-core computation node with 4 quad-core 2.3 GHz AMD Opteron processors and 32 GB RAM. The plots are for the following four virus capsids, each containing around half a million atoms: Canine Parvovirus (2CAS), Cowpea Chlorotic Mosaic Virus (1CWP), Cucumber Mosaic Virus (1F15) and Human Hepatitis B Virus (1QGT). For these experiments we used a more optimized multithreaded version of the octree-based E_pol algorithm, which used vector instructions and approximate math functions. As the plots show, our octree-based algorithms ran around 1,000 to 2,000 times faster than the naïve algorithm for an error margin of less than 25%, and even for less than 1% error the running times of our algorithm were no more than 1.5% of that of the naïve algorithm.

Figure 15: Scaling and speed-accuracy tradeoff of the octree-based polarization energy approximation algorithm.

Figure 16 compares our approximation algorithm with AMBER 9 [13] (as well as with the naïve algorithm) when run on Cucumber Mosaic Virus (1F15). For better accuracy we used three times as many integration points on the virus surface as for the 1F15 results in Figure 15. All algorithms were run on the same 16-core computation node as in the previous experiment. Neither AMBER 9 nor the naïve algorithm was multithreaded, but the octree-based algorithm utilized all 16 available cores on the machine. While the naïve algorithm and AMBER 9 took 57 hours and 1.5 hours, respectively, to compute E_pol, our algorithm approximated it to within 18% of the values computed by AMBER and the naïve algorithm in only 23 seconds, and to within 3% in about 9 minutes.
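The timings above translate into speed-up ratios as follows. This short Python sketch only converts the reported wall-clock times (57 h, 1.5 h, 23 s, about 9 min) into common units and divides; since "about 9 minutes" is approximate, the two 3%-error ratios are rough.

```python
def speedups(naive_h=57.0, amber_h=1.5, fast_s=23.0, accurate_min=9.0):
    """Speed-up ratios implied by the timings reported for 1F15."""
    naive_s, amber_s, accurate_s = naive_h * 3600, amber_h * 3600, accurate_min * 60
    return {
        "vs_naive_18pct": naive_s / fast_s,      # within 18% error
        "vs_amber_18pct": amber_s / fast_s,
        "vs_naive_3pct": naive_s / accurate_s,   # within 3% error
        "vs_amber_3pct": amber_s / accurate_s,
    }


for name, ratio in speedups().items():
    print(f"{name}: {ratio:,.0f}x")
```

That is, roughly 8,900× over the naïve algorithm and 235× over AMBER 9 at 18% error, and about 380× and 10×, respectively, at 3% error.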

8 Conclusion

We have presented octree-based algorithms for efficiently estimating the compute-intensive terms of E_MM and G_sol, polarization forces, as well as interfaces of bio-molecular complexes. The main features of these algorithms are as follows.


Figure 16: Speed-accuracy tradeoff of the octree-based polarization energy approximation algorithm w.r.t. Amber 9 [13] and the naïve algorithm when run on Cucumber Mosaic Virus with half a million atoms and around 2 billion integration points.

− All algorithms are designed within the same structural framework, and basically have the same O(M log M) running time using O(M) space, where M is the number of atoms in the molecule. All are based on near and far decomposition of the data points in 3D space using octrees, and are very easy to implement. Most of them also depend on our sparse triangulation of analytical molecular surfaces, for which we provide an O(M log M) time and O(M) space algorithm.

− The approximation schemes presented for energy and forces can be easily tuned (by varying the approximation parameter ε) to obtain useful speed-accuracy tradeoffs based on the needs of the application. We have included several examples with timing results and speed/accuracy tradeoffs, demonstrating the efficiency and scalability of our fast free energy estimation of bio-molecules, potentially with millions of atoms.

− In practice, the algorithms for computing the interfaces run in time linear in the size of the interface, provided the octrees are computed in a preprocessing step. They are particularly suitable for applications where one needs to evaluate the interfaces of a pair of bio-molecules for a large number of relative positions/orientations of the two molecules (e.g., pairwise docking programs).

− The algorithms are easily parallelizable, and thus can achieve speed-ups from both parallelization and approximation.

We have compared our Born radii and G_pol calculations with our own previously published NFFT-based $O(M^2)$-time results on GB energy [5]. The new algorithms have produced more accurate results while still running much faster than our previous algorithms. We have demonstrated the scalability of these algorithms by running them on virus structures with millions of atoms at various levels of accuracy and comparing the results with AMBER [13].


References

[1] Bajaj, C., Chowdhury, R. A., and Rasheed, M. A dynamic data structure for flexible molecular maintenance and informatics. In SIAM/ACM Conf. Geom. Phys. Model. (New York, NY, USA, 2009), ACM, pp. 259–270.

[2] Bajaj, C., Lee, H., Merkert, R., and Pascucci, V. NURBS based b-rep models from macromolecules and their properties. In Proceedings of Fourth Symposium on Solid Modeling and Applications (1997), pp. 217–228.

[3] Bajaj, C., Pascucci, V., Shamir, A., Holt, R., and Netravali, A. Dynamic maintenance and visualization of molecular surfaces. Discrete Applied Mathematics 127 (2003), 23–51.

[4] Bajaj, C., Xu, G., and Zhang, Q. A fast variational method for the construction of resolution adaptive C2-smooth molecular surfaces. Computer Methods in Applied Mechanics and Engineering 198 (2009), 1684–1690.

[5] Bajaj, C., and Zhao, W. Fast molecular solvation energetics and forces computation. SIAM Journal on Scientific Computing 31, 6 (2010), 4524–4552.

[6] Baker, N., Holst, M., and Wang, F. Adaptive multilevel finite element solution of the Poisson-Boltzmann equation II. Refinement at solvent-accessible surfaces in biomolecular systems. J. Comput. Chem. 21 (2000), 1343–1352.

[7] Barnes, J., and Hut, P. A hierarchical O(N log N) force-calculation algorithm. Nature 324, 6096 (1986), 446–449.

[8] Bashford, D., and Case, D. A. Generalized Born models of macromolecular solvation effects. Annu. Rev. Phys. Chem. 51 (2000), 129–152.

[9] Black, S., and Mould, D. Development of hydrophobicity parameters to analyze proteins which bear post- or cotranslational modifications. Anal. Biochem. 193 (1991), 72–82.

[10] Board, J. A., Causey, J. W., Leathrum, J. F., Windemuth, A., and Schulten, K. Accelerated molecular dynamics simulation with the parallel fast multipole algorithm. Chemical Physics Letters 198, 1-2 (October 1992), 89–94.

[11] Brent, R. P. The parallel evaluation of general arithmetic expressions. J. ACM 21, 2 (1974), 201–206.

[12] Calimet, N., Schaefer, M., and Simonson, T. Protein molecular dynamics with the generalized Born/ACE solvent model. Proteins: Structure, Function, and Genetics 45 (2001), 144–158.

[13] Case, D., Cheatham, III, T., Darden, T., Gohlke, H., Luo, R., Merz, Jr., K., Onufriev, A., Simmerling, C., Wang, B., and Woods, R. The Amber biomolecular simulation programs. J. Comput. Chem. 26 (2005), 1668–1688.

[14] Cheng, H., Greengard, L., and Rokhlin, V. A fast adaptive multipole algorithm in three dimensions. J. Comput. Phys. 155, 2 (1999), 468–498.


[15] Connolly, M. Analytical molecular surface calculation. J. Appl. Cryst. 16 (1983), 548–558.

[16] Connolly, M. L. Molecular surfaces: A review. http://www.netsci.org/Science/Compchem/feature14.html.

[17] Darden, T., York, D., and Pedersen, L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. Journal of Chemical Physics 98 (1993), 10089–10092.

[18] Davis, P., and Rabinowitz, P. Methods of numerical integration, second ed. Dover, 2007.

[19] Dunavant, D. High degree efficient symmetrical Gaussian quadrature rules for the triangle. Int. J. Numer. Meth. Engng. 21 (1985), 1129–1148.

[20] Eisenberg, D., and McLachlan, A. Solvation energy in protein folding and binding. Nature (London) 319 (1986), 199–203.

[21] Essmann, U., Perera, L., Berkowitz, M. L., Darden, T., Lee, H., and Pedersen, L. G. A smooth particle mesh Ewald method. The Journal of Chemical Physics 103, 19 (1995), 8577–8593.

[22] Finkel, R. A., and Bentley, J. L. Quad trees: A data structure for retrieval on composite keys. Acta Informatica 4 (1974), 1–9.

[23] Gallicchio, E., and Levy, R. M. AGBNP: An analytic implicit solvent model suitable for molecular dynamics simulations and high-resolution modeling. J. Comput. Chem. 25 (2004), 479–499.

[24] Ghosh, A., Rapp, C., and Friesner, R. Generalized Born model based on a surface integral formulation. J. Phys. Chem. B 102 (1998), 10983–10990.

[25] Gilson, M., Davis, M., Luty, B., and McCammon, J. Computation of electrostatic forces on solvated molecules using the Poisson-Boltzmann equation. J. Phys. Chem. 97 (1993), 3591–3600.

[26] Grant, J., and Pickup, B. A Gaussian description of molecular shape. Journal of Physical Chemistry 99 (1995), 3503–3510.

[27] Greengard, L., and Rokhlin, V. A fast algorithm for particle simulations. J. Comput. Phys. 73, 2 (1987), 325–348.

[28] Grycuk, T. Deficiency of the Coulomb-field approximation in the generalized Born model: An improved formula for Born radii evaluation. J. Chem. Phys. 119 (2003), 4817–4826.

[29] Hardy, D. J. Multilevel summation for the fast evaluation of forces for the simulation of biomolecules. PhD thesis, Champaign, IL, USA, 2006. Adviser-Skeel, Robert D.

[30] Hawkins, G., Cramer, C., and Truhlar, D. Pairwise solute descreening of solute charges from a dielectric medium. Chem. Phys. Lett. 246 (1995), 122–129.

[31] Hawkins, G., Cramer, C., and Truhlar, D. Parametrized models of aqueous free energies of solvation based on pairwise descreening of solute atomic charges from a dielectric medium. J. Phys. Chem. 100 (1996), 19824–19839.


[32] Hermann, R. Theory of hydrophobic bonding. II. Correlation of hydrocarbon solubility in water with solvent cavity surface area. J. Phys. Chem. 76 (1972), 2754–2759.

[33] Hockney, R. W., and Eastwood, J. W. Computer simulation using particles. Taylor & Francis, Inc., Bristol, PA, USA, 1988.

[34] Holst, M., Baker, N., and Wang, F. Adaptive multilevel finite element solution of the Poisson-Boltzmann equation I. Algorithms and examples. J. Comput. Chem. 21 (2000), 1319–1342.

[35] Im, W., Beglov, D., and Roux, B. Continuum solvation model: Computation of electrostatic forces from numerical solutions to the Poisson-Boltzmann equation. Comput. Phys. Comm. 111 (1998), 59–75.

[36] Im, W., Lee, M., and Brooks, III, C. Generalized Born model with a simple smoothing function. J. Comput. Chem. 24 (2003), 1691–1702.

[37] Jackins, C. L., and Tanimoto, S. L. Oct-trees and their use in representing three-dimensional objects. Computer Graphics and Image Processing 14, 3 (1980), 249–270.

[38] Kollman, P., Massova, I., Reyes, C., Kuhn, B., Huo, S., Chong, L., Lee, M., Lee, T., Duan, Y., Wang, W., Donini, O., Cieplak, P., Srinivasan, J., Case, D., and Cheatham, T. Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc. Chem. Res. 33 (2000), 889–897.

[39] Lee, B., and Richards, F. The interpretation of protein structures: estimation of static accessibility. Journal of Molecular Biology 55, 3 (February 1971), 379–400.

[40] Lee, M., Feig, M., Salsbury, F., and Brooks, III, C. New analytical approximation to the standard molecular volume definition and its application to generalized Born calculations. J. Comput. Chem. 24 (2003), 1348–1356.

[41] Lee, M., Salsbury, F., and Brooks, III, C. Novel generalized Born methods. J. Chem. Phys. 116, 24 (2002), 10606–10614.

[42] Levitt, M., Hirshberg, M., Sharon, R., and Daggett, V. Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Comp. Phys. Comm. 91 (1995), 215–231.

[43] Lu, B., Zhang, D., and McCammon, J. Computation of electrostatic forces between solvated molecules determined by the Poisson-Boltzmann equation using a boundary element method. J. Chem. Phys. 122 (2005), 214102–214109.

[44] Mintseris, J., Wiehe, K., Pierce, B., Anderson, R., Chen, R., Janin, J., and Weng, Z. Protein-protein docking benchmark 2.0: An update. Proteins: Structure, Function, and Bioinformatics 60, 2 (2005), 214–216.

[45] Pollock, E. L., and Glosli, J. Comments on PPPM, FMM, and the Ewald method for large periodic Coulombic systems, Nov 1995.


[46] Potts, D., and Steidl, G. Fast summation at nonequispaced knots by NFFTs. SIAM J. Sci. Comput. 24 (2003), 2013–2037.

[47] Richards, F. Areas, volumes, packing, and protein structure. Annual Review of Biophysics and Bioengineering 6 (June 1977), 151–176.

[48] Ritchie, D. Evaluation of protein docking predictions using Hex 3.1 in CAPRI rounds 1 and 2. Proteins: Structure, Function, and Genetics 52, 1 (July 2003), 98–106.

[49] Sharp, K. Incorporating solvent and ion screening into molecular dynamics using the finite-difference Poisson-Boltzmann method. J. Comput. Chem. 12 (1991), 454–468.

[50] Simonson, T., and Bruenger, A. Solvation free energies estimated from macroscopic continuum theory: An accuracy assessment. J. Phys. Chem. 98 (1994), 4683–4694.

[51] Skeel, R. D., Tezcan, I., and Hardy, D. J. Multiple grid methods for classical molecular dynamics. Journal of Computational Chemistry 23, 6 (2002), 673–684.

[52] Srinivasan, J., Trevathan, M., Beroza, P., and Case, D. Application of a pairwise generalized Born model to proteins and nucleic acids: inclusion of salt effects. Theor. Chem. Accts. 101 (1999), 426–434.

[53] Still, W., Tempczyk, A., Hawley, R., and Hendrickson, T. Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112 (1990), 6127–6129.

[54] Stone, J., Phillips, J., Freddolino, P., Hardy, D., Trabuco, L., and Schulten, K. Accelerating molecular modeling applications with graphics processors. J. Comput. Chem. 28, 16 (September 2007), 2618–2640.

[55] Tjong, H., and Zhou, H.-X. GBr6: A parameterization-free, accurate, analytical generalized Born method. J. Phys. Chem. B 111 (2007), 3055–3061.

[56] Wagoner, J., and Baker, N. A. Assessing implicit models for nonpolar mean solvation forces: The importance of dispersion and volume terms. Proc. Natl. Acad. Sci. USA 103 (2006), 8331–8336.

[57] Yu, Z., Jacobson, M., and Friesner, R. What role do surfaces play in GB models? A new-generation of surface-generalized Born model based on a novel Gaussian surface for biomolecules. J. Comput. Chem. 27 (2006), 72–89.

[58] Zhao, W., Xu, G., and Bajaj, C. An algebraic spline model of molecular surfaces. IEEE/ACM Transactions on Computational Biology and Bioinformatics, accepted for publication (2009).
