Reviews in Computational Chemistry

Volume 23

Edited by

Kenny B. Lipkowitz

Thomas R. Cundari

Editor Emeritus

Donald B. Boyd

WILEY-VCH




Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee of the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the Publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:

ISBN 978-0-470-08201-0

ISSN 1069-3599

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Kenny B. Lipkowitz
Department of Chemistry
Howard University
525 College Street, N.W.
Washington, D.C., 20059, [email protected]

Thomas R. Cundari
Department of Chemistry
University of North Texas
Box 305070,
Denton, Texas 76203-5070, [email protected]

Donald B. Boyd
Department of Chemistry and Chemical Biology
Indiana University-Purdue University at Indianapolis
402 North Blackford Street
Indianapolis, Indiana 46202-3274, U.S.A.
[email protected]


Preface

Students wanting to become computational chemists face a steep learning curve that can be intellectually and emotionally challenging. Those students are expected to know basic physics, from quantum mechanics to statistical mechanics, along with a full comprehension of electricity and magnetism; they are required to be conversant in calculus, algebra, graph theory, and statistics; they are expected to be cognizant of algorithmic issues of computer science, and they are expected to be well versed in the experimental aspects of the topic they intend to model, whether it be in the realm of materials science, biology, or engineering. Beginning in the mid-1990s, and continuing into this century, there appeared a series of books on molecular modeling and computational chemistry that addressed the needs of such students. Those books are very well organized, are extremely well written, and have been received enthusiastically by the community at large.

The editors of Reviews in Computational Chemistry knew that such books would find a hungry public, and further knew that only an introduction to the wide range of topics in this discipline could be covered in a single book. Accordingly, a decision was made to provide lengthier, more detailed descriptions of the many computational tools that a computational scientist would need for his or her career. Reviews in Computational Chemistry thus set out on a trajectory of providing pedagogically driven chapters both for the novice who wants to become a computational molecular scientist and for the seasoned professional who wants to quickly learn about a computational method outside of his or her area of expertise. In this, the 23rd volume of the series, we continue that tradition by providing seven chapters on a wide variety of topics.

Most bench chemists who use software for computing quantum mechanical properties, structures, and energies of molecular systems are well aware of the n^4 bottleneck associated with the calculation of the required electron repulsion integrals and quickly find this scaling problem to be a major impediment to their studies. In Chapter 1, Christian Ochsenfeld, Jörg Kussmann, and Daniel Lambrecht provide a tutorial on the topic of linear scaling methods in quantum chemistry. The authors begin by putting into perspective the existing scaling problems associated with approximating the solution to the Schrödinger equation. They review the basics of self-consistent field (SCF) theory within the Born–Oppenheimer (BO) approximation and focus the readers’ attention on the interplay between the cubic scaling for diagonalization of the Fock matrix and the quadratic scaling for the formation of that matrix. They then describe how one can reduce this problem by selecting numerically significant integrals using Schwarz or multipole-based integral estimates, illustrating those concepts with easy-to-follow diagrams and demonstrating the results with simple plots. The calculation of integrals by multipole expansion is then presented, beginning with a very simple example that shows the novice how individual pair-wise interactions between point charges can be collected into charge distributions that, when combined with a clever tree algorithm, can avoid the quadratic scaling step. The authors provide the reader with a basic understanding of multipole expansions and then describe the fast multipole method (FMM) and its generalization for continuous (Gaussian) distributions, the continuous FMM method, before providing an overview of other multipole expansions and tree codes used to speed up the calculation of two-electron integrals. Exactly how this linear scaling is accomplished is illustrated nicely by the authors through the use of an example of a molecule of substantial size, an octameric fragment of DNA. Having described how to reduce the scaling behavior for the construction of the Coulomb part of the Fock matrix, the authors bring to the fore the remaining component within Hartree–Fock (HF) theory, the exchange part [also required for hybrid density functional theory (DFT)]. As before, with a didactic style, the authors show how one can exploit localization properties of the density matrix to achieve linear scaling of the exchange part of the Hamiltonian. Then the authors show how one can avoid the conventional diagonalization of the assembled Fock matrix and reduce what would be a cubic scaling process to one that is linear. The tensor formalism is introduced, properties of the one-particle density matrix are described, and the density-matrix-based energy functional is introduced to solve the SCF problem. The authors go beyond just the computation of energies by then explaining energy gradients and molecular response properties. The chapter concludes with an overview of what it takes to reduce the scaling behavior of post-HF methods for large systems. Linear scaling techniques in quantum chemistry are becoming more widely implemented in software packages. The bench chemist who is inclined to use this software ought not treat it as a black box, but instead should be cognizant of the assumptions, approximations, and pitfalls associated with linear scaling methodology. This chapter makes all of this visible to the user in a clear and coherent manner.
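The idea of collecting individual point charges into a single charge distribution can be illustrated with a small sketch. It is not taken from Chapter 1: the cluster geometries, the function names, and the truncation at the dipole term are assumptions made here for illustration, and atomic units are assumed throughout. The exact energy needs one term per pair of charges, whereas the approximate energy replaces the whole source cluster by its total charge and dipole moment.

```python
import math

# Hypothetical test clusters: (charge, (x, y, z)) in atomic units.
CLUSTER_A = [(1.0, (0.0, 0.0, 0.0)), (0.5, (1.0, 0.0, 0.0)),
             (0.5, (0.0, 1.0, 0.0)), (1.0, (0.0, 0.0, 1.0))]
# Same shape, shifted 20 bohr along x so the clusters are well separated.
CLUSTER_B = [(q, (x + 20.0, y, z)) for q, (x, y, z) in CLUSTER_A]


def pairwise_energy(src, tgt):
    """Exact Coulomb energy: one term per (source, target) pair."""
    return sum(qs * qt / math.dist(rs, rt) for qs, rs in src for qt, rt in tgt)


def multipole_energy(src, tgt):
    """Approximate energy from the monopole and dipole of the source cluster."""
    n = len(src)
    # Expansion center: geometric centroid of the source cluster (a choice).
    center = tuple(sum(r[k] for _, r in src) / n for k in range(3))
    total_charge = sum(q for q, _ in src)
    dipole = tuple(sum(q * (r[k] - center[k]) for q, r in src) for k in range(3))
    energy = 0.0
    for qt, rt in tgt:
        d = tuple(rt[k] - center[k] for k in range(3))
        dist = math.sqrt(sum(x * x for x in d))
        # Far-field potential of the cluster, truncated after the dipole term.
        potential = (total_charge / dist
                     + sum(m * x for m, x in zip(dipole, d)) / dist ** 3)
        energy += qt * potential
    return energy


if __name__ == "__main__":
    print("exact      :", pairwise_energy(CLUSTER_A, CLUSTER_B))
    print("multipole  :", multipole_energy(CLUSTER_A, CLUSTER_B))
```

The truncated expansion is only trustworthy when the clusters are far apart compared with their own size; tree codes such as the FMM apply this kind of replacement hierarchically to boxes of charges.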

The BO approximation is sufficient in quantum chemistry for describing most chemical processes. However, many nonadiabatic processes exist in nature that cannot be described adequately within this context, examples of which include the ubiquitous photophysical and photochemical processes associated with photosynthesis, vision, and charge transfer reactions, among others. Nonadiabatic phenomena occur when two or more potential energy surfaces approach each other, and the coupling between those surfaces becomes important. Conical intersections are the actual crossings of those surfaces. In Chapter 2, Spiridoula Matsika highlights where the BO approximation breaks down, the differences between adiabatic and diabatic representations for studying nuclear dynamics, and the significance of the noncrossing rule. She follows this with an introduction to, and explanation of, conical intersections by addressing the Jahn–Teller effect, symmetry-allowed conical intersections, accidental intersections, the branching plane, and how topography is used to characterize conical intersections. Post-HF methods, including MCSCF, MRCI, and CASPT2, and single reference methods are surveyed along with the many considerations a user must take into account when choosing a particular electronic structure method for computing conical intersections. These explanations are followed by a description of how to actually locate conical intersections using Lagrange multipliers and projected gradient techniques. Then, with this background in hand, the author provides us with several applications to show what can be done to analyze such intersections. Matsika’s review of the field covers inorganic and organic molecules but focuses primarily on biologically relevant systems, especially on nucleic acids. Most of the tutorial focuses on two-state conical intersections, but a description of three-state intersections is also given. That is followed by a discussion of spin-orbit coupling that, when included in the Hamiltonian, provides new and qualitatively different effects in the radiationless behavior of chemical systems. In this part of the review, the author points out how such effects can couple states of different spin multiplicity whose intersections would otherwise not be conical, along with explaining the influence of such coupling on systems with an odd number of electrons for which there are qualitative changes in the characteristics of the intersection. Novice molecular modelers intending to carry out quantum mechanical calculations are encouraged to peruse this chapter to determine whether their systems are susceptible to nonadiabatic processes that would require evaluation of conical intersections, and then to read this tutorial to ensure that a proper treatment of the system is being pursued. We also urge the reader to see the chapter by Michael A. Robb, Marco Garavelli, Massimo Olivucci, and Fernando Bernardi in Volume 15, which also examined some of these issues.

Most of us are content with computing structures, energies, and some properties of molecules in the ground or excited states. For other researchers, however, kinetic information is required when rate constants for chemical reactions must be evaluated. How does one go about computing rate constants for, say, a large catalytic system like an enzyme in which a critical step is the transfer of a light particle such as a hydride or a proton that is subject to quantum tunneling effects? In Chapter 3, Antonio Fernández-Ramos, Benjamin Ellingson, Bruce Garrett, and Donald Truhlar provide a tutorial on the topic of variational transition state theory (VTST) with multidimensional tunneling that gives us a good starting point to answer that question. The tutorial begins with a description of conventional transition state theory (TST), highlighting the tenets upon which it is constructed and the merits of its use, and pointing out that it only provides an approximation to the true rate constant because it assesses the one-way flux through a dividing surface that is appropriate for small, classical vibrations around a saddle point. Canonical and other types of variational TSTs are introduced by the authors at this point along with highlighting the influences that quantum effects have on the reaction coordinate; a section on practical methods for quantized VTST calculations is subsequently presented to address these concerns. Here the authors cover some algorithms used to calculate the reaction path by describing the minimum energy path and an algorithm for a variational reaction path, showing us how to evaluate partition functions using both rectilinear and curvilinear coordinates, describing the influence of anharmonic vibrational levels on those partition functions, and demonstrating how to calculate the number of states needed for microcanonical variational theory calculations. The authors then focus on quantum effects on reaction coordinate motion; such effects are usually dominated by tunneling but also include nonclassical reflection, both of which are incorporated in a multiplicative transmission coefficient. In this part of the tutorial, multidimensional tunneling corrections are highlighted for the novice. Because the reaction path is a curvilinear coordinate, the curvature of that path couples motion along the reaction coordinate to local vibrational modes that are perpendicular to it. The coupling causes the system to take a shorter path than the reaction coordinate by tunneling. Both small- and large-curvature tunneling motions, with and without vibrational excitations, are compared in this part of the tutorial. In the second part of Chapter 3, the authors deal with VTST in complex systems. Because an analytical potential energy surface (PES) is usually not available, the authors begin by describing how one can build the PES from electronic structure calculations using “on the fly” quantum methods for direct dynamics calculations, i.e., without the fitting of those energies in the form of a potential function, and then they explain how one can derive those surfaces by interpolation through the use of their multiconfiguration molecular mechanics (MCMM) algorithm and by a mapping procedure. This, in turn, is followed by a description of how to incorporate both low-level and high-level calculations to generate the PES so as to make the calculation of rate constants very fast. In this chapter, the authors also cover other topics of relevance to the accurate prediction of rate constants, including reactions in liquids and, because there is more than one reaction coordinate, how to use ensemble-averaged VTST. Finally, the authors provide two insightful examples, one in the gas phase and the other in solution, to demonstrate the speed and accuracy of modern methods for predicting rate constants.
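As a rough numerical illustration of how a multiplicative transmission coefficient enters a rate constant, the sketch below evaluates the textbook thermodynamic (Eyring) form of conventional TST for a unimolecular step. This is not the VTST machinery of Chapter 3; the function name, the choice of a free energy of activation as input, and the use of a user-supplied `kappa` are assumptions made here.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # molar gas constant, J/(mol*K)


def tst_rate_constant(delta_g_act, temperature, kappa=1.0):
    """Eyring-type rate constant in s^-1 for a unimolecular step.

    delta_g_act: free energy of activation in J/mol (assumed input).
    temperature: absolute temperature in K.
    kappa: transmission coefficient; 1.0 means no tunneling or
           recrossing correction is applied.
    """
    prefactor = K_B * temperature / H
    return kappa * prefactor * math.exp(-delta_g_act / (R * temperature))


if __name__ == "__main__":
    # Hypothetical 50 kJ/mol barrier at room temperature.
    print(tst_rate_constant(50_000.0, 298.15))
    # Same barrier with an assumed tunneling enhancement of 2.
    print(tst_rate_constant(50_000.0, 298.15, kappa=2.0))
```

In this form the transmission coefficient simply scales the classical result, which is why an accurate multidimensional tunneling estimate matters most for light-particle transfers at low temperature.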

viii Preface

Chemists have traditionally worked on a scale of size ranging from Angstrom units to nanometers; we take a molecular view of the scientific problems at hand in which atomic-level detail is de rigueur. What happens, though, if the career path you take or the research project you are engaged with involves, say, long-chain polymeric systems consisting of a few thousand monomers in a melt where the relevant length scales run from bond lengths to the contour length of the chain, which is on the order of micrometers, and where the relevant relaxation times increase as N^3.4 for chains of length N? One approach to addressing such a problem is to invoke coarse-grained techniques, and in Chapter 4, Roland Faller shows us how this is accomplished. The author sets the stage for such a computational scene by first pointing out that one needs to define the system to be evaluated and then one needs to select a suitable model to combine simulations on a variety of length and time scales. An explanation is then provided about how one assigns interaction sites on the coarse-grain scale. Because it may be necessary to use two or more models to cover the range of relevant interactions of interest to the scientist or engineer, the author emphasizes that a meaningful mapping between scales is needed for meaningful results. For example, atomistic models can treat length scales from a few hundred picometers to tens of nanometers, whereas meso-scale models are useful from the multi-nanometer scale up to a few micrometers in size, but if we want to enter the realm of micrometers and beyond, a second or third mapping is needed. A brief tutorial on the various types of existing mapping strategies is given for the novice modeler. First, static mapping methods are discussed, including single-chain distribution models, iterative structural coarse-graining, and mapping onto simple models. Then the author teaches us about dynamic mapping, including mapping by chain diffusion, mapping through local correlation times, and direct mapping of the Lennard–Jones time. Following that part of the tutorial, the author describes coarse-grained Monte Carlo simulations and reverse mapping, in which atomistic detail is reintroduced at the end of the simulation. Faller then goes beyond polymers to describe examples of coarse-grain modeling of lipid bilayer systems. Nowadays the scientific community is expecting more than it has in the past from a computational chemist in terms of both the quality and the scope of the modeling endeavor. Because advances in computing machinery will not likely allow us to take a fully atomistic approach to such modeling in the next decade, this chapter, written from the perspective of an engineer, gives us the insights needed to carry out simulations on both small and large scales.
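One simple way to picture the assignment of interaction sites is to place each coarse-grained bead at the center of mass of a group of atoms. The sketch below assumes exactly that; the grouping, the function name, and the center-of-mass choice are illustrative assumptions rather than the scheme Chapter 4 prescribes, and other site definitions are equally possible.

```python
def map_to_beads(masses, positions, groups):
    """Map atoms onto coarse-grained beads.

    masses:    list of atomic masses.
    positions: list of (x, y, z) tuples, one per atom.
    groups:    list of lists of atom indices; each inner list becomes one bead.
    Returns a list of (bead_mass, bead_position) tuples, with each bead
    placed at the center of mass of its atoms.
    """
    beads = []
    for group in groups:
        total = sum(masses[i] for i in group)
        center = tuple(
            sum(masses[i] * positions[i][k] for i in group) / total
            for k in range(3)
        )
        beads.append((total, center))
    return beads


if __name__ == "__main__":
    # Hypothetical four-atom chain mapped onto two beads.
    masses = [1.0, 1.0, 1.0, 2.0]
    positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0),
                 (4.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
    for mass, pos in map_to_beads(masses, positions, [[0, 1], [2, 3]]):
        print(mass, pos)
```

Reverse mapping, as mentioned above, runs in the opposite direction and has to rebuild plausible atomic positions around each bead, which this sketch does not attempt.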

Many of the readers of this book series work in the pharmaceutical industry where informatics is especially relevant. Different databases are available free of charge in some cases but more usually for a fee, even if that fee comes from within the company where large investments are made in developing a proprietary database. One might want to know in advance if a given database is more diverse than another, or one might want to answer the question: “How much additional chemical diversity is added when we double the size of the current database?” Given the costs of generating compound libraries (real or virtual), answering such a question requires that a management team have good insights about information in general; otherwise, poor decisions could have costly ripple effects that negatively influence big and small companies alike. In Chapter 5, Jeffrey Godden and Jürgen Bajorath provide a tutorial on the analysis of information content that focuses on Shannon entropy (SE) applied to molecules. Here, any structural representation of a molecule, including the limitless number of molecular descriptors in use today or to be used in the future, is to be understood as a communication carrying a specific amount of information. The authors begin their tutorial by providing a historical account of how this area of informatics developed, and they explain the relationship between Shannon entropy used in the telecommunications industry and information being conveyed in a typical molecular database. Simple equations and simple examples are used to illustrate the concepts. The authors then use these concepts to show the reader how one would compare descriptors in, say, the Available Chemicals Directory (ACD) with those in the MDL Drug Data Report (MDDR). Here it is emphasized that because SE is a nonparametric distribution measure, this entropy-based approach is well suited for information content analysis of descriptors with different units, different numerical ranges, and different variability. Now one can begin answering questions such as “Which descriptors carry high levels of information for a specific compound set?” [which in turn could be used for deriving a statistically meaningful quantitative structure-activity relationship/quantitative structure-property relationship (QSAR/QSPR) model] and “Which descriptors carry low levels of information?” The authors continue their tutorial by describing the influence of boundary effects on such analyses, and they give hints about what to do and what not to do for the novice modeler who would otherwise become trapped in one or more computational pitfalls that are not visible to a beginner. An extension of the method called differential Shannon entropy (DSE) analysis is then introduced, and the reader is shown how DSE can reveal descriptors that are sensitive to systematic differences in the properties of different databases or classes of molecules. A brief glimpse into the information content of organic molecules is given, and then uses of SE in quantum mechanical calculations, molecular dynamics simulations, and other types of modeling are presented. The authors end this chapter with examples of SE and DSE analysis for the modeling of physicochemical properties and for accurate classification of molecules, a topic that is described in the following chapter.
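A minimal sketch of the Shannon entropy of a single descriptor is given below. It assumes that the descriptor values are first binned into a histogram with equal-width bins, which is a choice made here for illustration; the function name, the default of ten bins, and the optional `lo` and `hi` range arguments are likewise assumptions and not taken from Chapter 5.

```python
import math
from collections import Counter


def shannon_entropy(values, bins=10, lo=None, hi=None):
    """Shannon entropy, in bits, of a histogram of descriptor values.

    values: non-empty list of numbers (one descriptor over a compound set).
    bins:   number of equal-width histogram bins.
    lo, hi: optional common range, useful when two data sets must share bins.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return 0.0  # every value falls in the same bin: no information
    counts = Counter()
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1  # clamp onto valid bins
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical descriptor values for two small compound sets.
    spread_out = [0.0, 1.0, 2.0, 3.0]
    clustered = [1.0, 1.0, 1.0, 3.0]
    print(shannon_entropy(spread_out, bins=4))
    print(shannon_entropy(clustered, bins=4))
```

Because only the bin populations enter the sum, descriptors with different units or numerical ranges can be compared on the same footing, provided the number of bins is held fixed.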

Many of us make binary, black/white, either/or type decisions every day: “Should I buy this house now or wait?”, “Should I say something to my boss or not?”, and so on. These types of queries are commonly posed in a scientific setting as well, where, for example, the question might be on a health-related issue like “Is this cell cancerous or not?”, and in the business world, where we might ask, “Is this lead molecule toxic or not?” Robust methods for simple classification do exist. One of the more popular and successful techniques involves a group of supervised learning methods called support vector machines (SVM) that can be applied to classification as well as to regression. In Chapter 6, Ovidiu Ivanciuc covers the topic of SVMs in chemistry. Following a historical introduction that covers the development of SVM and other kernel-based techniques, the author provides a non-mathematical introduction to SVM, beginning with the classification of linearly separable classes and then continuing with the partitioning of classes that cannot be separated with a linear classifier, a situation where mapping into a high-dimensional feature space is accomplished with nonlinear functions called kernels. The author uses a simple “synthetic” dataset to demonstrate the concepts for the beginner, and he provides simple MATLAB-generated plots to illustrate what should and should not be done for both classification and regression. The next topic unveiled in this tutorial is pattern classification, which is used, for example, in clinical diagnostics and in speech analysis as well as for chemical problems where one might need to recognize, say, the provenance of agricultural products like wine, olive oil, or honey based on chemical composition or spectral analysis. Again, very simple examples and clear plots are presented to show the utility of this method for pattern classification along with restrictions on its use. Because SVMs are based on structural risk minimization, which in turn is derived from statistical learning theory, the machine algorithm is considered deterministic. Accordingly, concepts related to the expected risk or to the expected error are next introduced by describing the Vapnik–Chervonenkis dimension, a construct used to indicate how high in complexity a classifier must be to minimize the empirical risk. With this background, pattern classification with linear support vector machines is described for the reader, showing how to establish the optimum separation hyperplane for a given finite set of learning patterns. The equations needed to accomplish this are developed in a clear and concise manner, and again, simple examples are given for SVM classification of linearly separable data, and then for nonlinearly separable data. Because there are cases where complex relationships exist between input parameters and the class of a pattern, the author devotes a full section to nonlinear support vector machines, showing first how patterns are mapped to a feature space, and then describing feature functions and kernels, including linear kernels, polynomial kernels, and Gaussian and exponential radial basis function kernels, along with others like neural, Fourier series, spline, additive, and tensor product kernels. Also covered in this section of the chapter are weighted SVMs for imbalanced classification, and multiclass SVM classification. A significant portion of the review describes SVM regression. Here simple examples and clear diagrams are used to illustrate the concepts being described. This precedes a section on optimizing the SVM model, i.e., finding good prediction statistics. Given this background, the author then spends the remainder of the chapter first on practical aspects of SVM classification, providing guidelines for their use in cheminformatics and QSAR, and then on applications of SVM regression. Several examples from this section of the review include predicting the mechanism of action for polar and nonpolar narcotic compounds, classifying the carcinogenic activity of polycyclic aromatic hydrocarbons, and using SVM regression for developing a QSAR for benzodiazepine receptor ligands. The chapter ends with a literature review of SVM applications in chemistry. SVM resources on the Web are identified, and then SVM software for chemists interested in cheminformatics and computational chemistry is tabulated in a convenient, easy-to-read list that describes what those programs can do.
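Three of the kernels named above can be written in a few lines. The sketch below uses common textbook parameterizations, which may differ from the forms used in Chapter 6; the function names and the parameter names `degree`, `offset`, and `sigma` are assumptions made here.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, offset=1.0):
    """(x . y + offset) ** degree, one common polynomial form."""
    return (linear_kernel(x, y) + offset) ** degree


def gaussian_rbf_kernel(x, y, sigma=1.0):
    """exp(-|x - y|^2 / (2 sigma^2)), one common Gaussian RBF form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(patterns, kernel):
    """Kernel value for every pair of patterns; an SVM solver works on
    this matrix instead of on explicit feature-space coordinates."""
    return [[kernel(p, q) for q in patterns] for p in patterns]


if __name__ == "__main__":
    # Hypothetical two-descriptor patterns.
    patterns = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for row in gram_matrix(patterns, gaussian_rbf_kernel):
        print([round(v, 3) for v in row])
```

A kernel returns the inner product the patterns would have after the nonlinear mapping, so the high-dimensional feature space never has to be constructed explicitly.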

In the final chapter, Donald B. Boyd presents a historical account of the growth of computational chemistry that covers hardware, software, events, trends, hurdles, successes, and people in the pharmaceutical industry. In the 1960s, there were no computational chemists in that industry. That term had not yet been invented. A smattering of theoretical chemists, statisticians, quantum chemists, and crystallographers were among the computer-savvy scientists at that time who set the stage for modern computational science and informatics to be played out in the pharmaceutical industry. The chapter conveys to the novice molecular modeler what it was like to rely on huge, offsite mainframes like the IBM 7094 or onsite machines used mostly for payroll and bookkeeping with little time available for scientific computing. The smell and sounds of a computer center, replete with loud chunking noises from card punch machines and high-pitched ripping sounds from line printers, are well depicted for the young reader who grew up with quiet personal computers, graphical interfaces, and the Internet, all of which were only futuristic thoughts in the 1960s. Also brought to light is the fact that the armamentarium of the computational scientist in those days consisted of programs like Extended Hückel Theory and early versions of semiempirical HF-based quantum methods like CNDO. Molecular mechanics for fast geometry optimization of pharmaceutically relevant molecules was just being developed in academic laboratories. Preparation of input data involving atomic coordinates was a tedious process because it involved using handheld mechanical models, protractors, and tables of standard bond lengths and bond angles. But despite hardware and software limitations, there were useful insights from such computational endeavors deemed valuable by management and that, in turn, led eventually to the accelerated growth of computational chemistry in the 1980s and the fruition of such research in the 1990s. Woven into this historical tapestry are the expensive threads of hardware purchases beginning with IBM and CDC mainframes and followed by interactive machines like the DEC-10, department-sized super-minicomputers like the VAX 11/780, PCs, Macintoshes, UNIX workstations, supercomputers, array processors, servers, and now clusters of PCs. Interlaced throughout this story are the historical strands of software availability and use of venerable programs like CNDO/2, MINDO/3, MOPAC, MM2, and molecular modeling packages like CHEMGRAF, SYBYL, and MacroModel along with other programs of utility to the pharmaceutical industry, including MACCS and REACCS. Along with the inanimate objects of computers and software, this chapter reveals some social dynamics involving computational chemists, medicinal chemists, and management. Stitched throughout this chapter are the nascent filaments of what we now call informatics, showing how the fabric of that industry evolved from dealing with a small number of molecules to now treating the enormous numbers of potential drug candidates coming from experimental combi-chem studies or from virtual screening by computer. This chapter conveys to the reader, in a compelling way, both the hardships and the successes of computational chemistry in the pharmaceutical industry.

Reviews in Computational Chemistry is highly rated and well received by the scientific community at large; the reason for these accomplishments rests firmly on the shoulders of the authors whom we have contacted to provide the pedagogically driven reviews that have made this ongoing book series so popular. To those authors we are especially grateful.

We are also glad to note that our publisher has plans to make our most recent volumes available in an online form through Wiley InterScience. Please check the Web (http://www.interscience.wiley.com/onlinebooks) or [email protected] for the latest information. For readers who appreciate the permanence and convenience of bound books, these will, of course, continue.

We thank the authors of this and previous volumes for their excellent chapters.

Kenny B. Lipkowitz
Washington

Thomas R. Cundari
Denton

April 2006


Contents

1. Linear-Scaling Methods in Quantum Chemistry
Christian Ochsenfeld, Jörg Kussmann, and Daniel S. Lambrecht

Introduction
Some Basics of SCF Theory
Direct SCF Methods and Two-Electron Integral Screening
    Schwarz Integral Estimates
    Multipole-Based Integral Estimates (MBIE)
Calculation of Integrals via Multipole Expansion
    A First Example
    Derivation of the Multipole Expansion
    The Fast Multipole Method: Breaking the Quadratic Wall
    Fast Multipole Methods for Continuous Charge Distributions
    Other Approaches
Exchange-Type Contractions
The Exchange-Correlation Matrix of KS-DFT
Avoiding the Diagonalization Step: Density Matrix-Based SCF
    General Remarks
    Tensor Formalism
    Properties of the One-Particle Density Matrix
    Density Matrix-Based Energy Functional
    “Curvy Steps” in Energy Minimization
    Density Matrix-Based Quadratically Convergent SCF (D-QCSCF)
    Implications for Linear-Scaling Calculation of SCF Energies
SCF Energy Gradients
Molecular Response Properties at the SCF Level
    Vibrational Frequencies
    NMR Chemical Shieldings
    Density Matrix-Based Coupled Perturbed SCF (D-CPSCF)
Outlook on Electron Correlation Methods for Large Systems
    Long-Range Behavior of Correlation Effects
    Rigorous Selection of Transformed Products via Multipole-Based Integral Estimates (MBIE)
    Implications
Conclusions
References

2. Conical Intersections in Molecular Systems
Spiridoula Matsika

Introduction
General Theory
    The Born–Oppenheimer Approximation and its Breakdown: Nonadiabatic Processes
    Adiabatic-Diabatic Representation
    The Noncrossing Rule
    The Geometric Phase Effect
    Conical Intersections and Symmetry
    The Branching Plane
    Characterizing Conical Intersections: Topography
    Derivative Coupling
Electronic Structure Methods for Excited States
    Multiconfiguration Self-Consistent Field (MCSCF)
    Multireference Configuration Interaction (MRCI)
    Complete Active Space Second-Order Perturbation Theory (CASPT2)
    Single Reference Methods
    Choosing Electronic Structure Methods for Conical Intersections
Locating Conical Intersections
Dynamics
Applications
    Conical Intersections in Biologically Relevant Systems
Beyond the Double Cone
    Three-State Conical Intersections
    Spin-Orbit Coupling and Conical Intersections
Conclusions and Future Directions
Acknowledgments
References


3. Variational Transition State Theory with Multidimensional Tunneling
Antonio Fernández-Ramos, Benjamin A. Ellingson, Bruce C. Garrett, and Donald G. Truhlar

Introduction
Variational Transition State Theory for Gas-Phase Reactions
    Conventional Transition State Theory
    Canonical Variational Transition State Theory
    Other Variational Transition State Theories
    Quantum Effects on the Reaction Coordinate
Practical Methods for Quantized VTST Calculations
    The Reaction Path
    Evaluation of Partition Functions
    Harmonic and Anharmonic Vibrational Energy Levels
    Calculations of Generalized Transition State Number of States
Quantum Effects on Reaction Coordinate Motion
    Multidimensional Tunneling Corrections Based on the Adiabatic Approximation
    Large Curvature Transmission Coefficient
    The Microcanonically Optimized Transmission Coefficient
Building the PES from Electronic Structure Calculation
    Direct Dynamics with Specific Reaction Parameters
    Interpolated VTST
    Dual-Level Dynamics
Reactions in Liquids
Ensemble-Averaged Variational Transition State Theory
Gas-Phase Example: H + CH4
Liquid-Phase Example: Menshutkin Reaction
Concluding Remarks
Acknowledgments
References

4. Coarse-Grain Modeling of Polymers
Roland Faller

Introduction
Defining the System
    Choice of Model
    Interaction Sites on the Coarse-Grained Scale
Static Mapping
    Single-Chain Distribution Potentials
    Simplex
    Iterative Structural Coarse-Graining
    Mapping Onto Simple Models
Dynamic Mapping
    Mapping by Chain Diffusion
    Mapping through Local Correlation Times
    Direct Mapping of the Lennard-Jones Time
Coarse-Grained Monte Carlo Simulations
Reverse Mapping
A Look Beyond Polymers
Conclusions
Acknowledgments
References

5. Analysis of Chemical Information Content Using Shannon Entropy
Jeffrey W. Godden and Jürgen Bajorath

Introduction
Shannon Entropy Concept
Descriptor Comparison
Influence of Boundary Effects
Extension of SE Analysis for Profiling of Chemical Libraries
Information Content of Organic Molecules
Shannon Entropy in Quantum Mechanics, Molecular Dynamics, and Modeling
Examples of SE and DSE Analysis
Conclusions
References

6. Applications of Support Vector Machines in Chemistry
Ovidiu Ivanciuc

Introduction
A Nonmathematical Introduction to SVM
Pattern Classification
The Vapnik–Chervonenkis Dimension
Pattern Classification with Linear Support Vector Machines
    SVM Classification for Linearly Separable Data
    Linear SVM for the Classification of Linearly Non-Separable Data
Nonlinear Support Vector Machines
    Mapping Patterns to a Feature Space
    Feature Functions and Kernels
    Kernel Functions for SVM
    Hard Margin Nonlinear SVM Classification
    Soft Margin Nonlinear SVM Classification
    ν-SVM Classification
    Weighted SVM for Imbalanced Classification
    Multi-class SVM Classification
SVM Regression
Optimizing the SVM Model
    Descriptor Selection
    Support Vectors Selection
    Jury SVM
    Kernels for Biosequences
    Kernels for Molecular Structures
Practical Aspects of SVM Classification
    Predicting the Mechanism of Action for Polar and Nonpolar Narcotic Compounds
    Predicting the Mechanism of Action for Narcotic and Reactive Compounds
    Predicting the Mechanism of Action from Hydrophobicity and Experimental Toxicity
    Classifying the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons
    Structure-Odor Relationships for Pyrazines
Practical Aspects of SVM Regression
    SVM Regression QSAR for the Phenol Toxicity to Tetrahymena pyriformis
    SVM Regression QSAR for Benzodiazepine Receptor Ligands
    SVM Regression QSAR for the Toxicity of Aromatic Compounds to Chlorella vulgaris
    SVM Regression QSAR for Bioconcentration Factors
Review of SVM Applications in Chemistry
    Recognition of Chemical Classes and Drug Design
    QSAR
    Genotoxicity of Chemical Compounds
    Chemometrics
    Sensors
    Chemical Engineering
    Text Mining for Scientific Information
SVM Resources on the Web
SVM Software
Conclusions
References

7. How Computational Chemistry Became Important in the Pharmaceutical Industry
Donald B. Boyd

Introduction
Germination: The 1960s
Gaining a Foothold: The 1970s
Growth: The 1980s
Gems Discovered: The 1990s
Final Observations
Acknowledgments
References

Author Index

Subject Index


Contributors

Jürgen Bajorath, Department of Life Science Informatics, B-IT International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität, Görresstrasse 13, D-53113 Bonn, Germany (Electronic mail: [email protected])

Donald B. Boyd, Department of Chemistry and Chemical Biology, Indiana University-Purdue University at Indianapolis (IUPUI), 402 North Blackford Street, Indianapolis, Indiana 46202-3274, U.S.A. (Electronic mail: [email protected])

Benjamin A. Ellingson, Department of Chemistry and Supercomputing Institute, University of Minnesota, 207 Pleasant Street S.E., Minneapolis, MN 55455, U.S.A. (Electronic mail: [email protected])

Roland Faller, Department of Chemical Engineering and Materials Science, University of California-Davis, 1 Shields Avenue, Davis, CA 95616, U.S.A. (Electronic mail: [email protected])

Antonio Fernández-Ramos, Departamento de Química Física, Universidade de Santiago de Compostela, Facultade de Química, 15782 Santiago de Compostela, Spain (Electronic mail: [email protected])

Bruce C. Garrett, Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, MS K9-90, P.O. Box 999, Richland, WA 99352, U.S.A. (Electronic mail: [email protected])

Jeffrey W. Godden, Department of Life Science Informatics, B-IT International Center for Information Technology, Rheinische Friedrich-Wilhelms-Universität, Görresstrasse 13, D-53113 Bonn, Germany (Electronic mail: [email protected])

Ovidiu Ivanciuc, Sealy Center for Structural Biology, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Boulevard, Galveston, TX 77555, U.S.A. (Electronic mail: [email protected])

Jörg Kussmann, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Daniel S. Lambrecht, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Spiridoula Matsika, Department of Chemistry, Temple University, 1901 N. 13th Street, Philadelphia, PA 19122, U.S.A. (Electronic mail: [email protected])

Christian Ochsenfeld, Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Auf der Morgenstelle 8, D-72076 Tübingen, Germany (Electronic mail: [email protected])

Donald G. Truhlar, Department of Chemistry and Supercomputing Institute, University of Minnesota, 207 Pleasant Street S.E., Minneapolis, MN 55455, U.S.A. (Electronic mail: [email protected])

Contributors to Previous Volumes

Volume 1 (1990)

David Feller and Ernest R. Davidson, Basis Sets for Ab Initio Molecular Orbital Calculations and Intermolecular Interactions.

James J. P. Stewart, Semiempirical Molecular Orbital Methods.

Clifford E. Dykstra, Joseph D. Augspurger, Bernard Kirtman, and David J. Malik, Properties of Molecules by Direct Calculation.

Ernest L. Plummer, The Application of Quantitative Design Strategies in Pesticide Design.

Peter C. Jurs, Chemometrics and Multivariate Analysis in Analytical Chemistry.

Yvonne C. Martin, Mark G. Bures, and Peter Willett, Searching Databases of Three-Dimensional Structures.

Paul G. Mezey, Molecular Surfaces.

Terry P. Lybrand, Computer Simulation of Biomolecular Systems Using Molecular Dynamics and Free Energy Perturbation Methods.

Donald B. Boyd, Aspects of Molecular Modeling.

Donald B. Boyd, Successes of Computer-Assisted Molecular Design.

Ernest R. Davidson, Perspectives on Ab Initio Calculations.


Volume 2 (1991)

Andrew R. Leach, A Survey of Methods for Searching the Conformational Space of Small and Medium-Sized Molecules.

John M. Troyer and Fred E. Cohen, Simplified Models for Understanding and Predicting Protein Structure.

J. Phillip Bowen and Norman L. Allinger, Molecular Mechanics: The Art and Science of Parameterization.

Uri Dinur and Arnold T. Hagler, New Approaches to Empirical Force Fields.

Steve Scheiner, Calculating the Properties of Hydrogen Bonds by Ab Initio Methods.

Donald E. Williams, Net Atomic Charge and Multipole Models for the Ab Initio Molecular Electric Potential.

Peter Politzer and Jane S. Murray, Molecular Electrostatic Potentials and Chemical Reactivity.

Michael C. Zerner, Semiempirical Molecular Orbital Methods.

Lowell H. Hall and Lemont B. Kier, The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure-Property Modeling.

I. B. Bersuker and A. S. Dimoglo, The Electron-Topological Approach to the QSAR Problem.

Donald B. Boyd, The Computational Chemistry Literature.

Volume 3 (1992)

Tamar Schlick, Optimization Methods in Computational Chemistry.

Harold A. Scheraga, Predicting Three-Dimensional Structures of Oligopeptides.

Andrew E. Torda and Wilfred F. van Gunsteren, Molecular Modeling Using NMR Data.

David F. V. Lewis, Computer-Assisted Methods in the Evaluation of Chemical Toxicity.

Volume 4 (1993)

Jerzy Cioslowski, Ab Initio Calculations on Large Molecules: Methodology and Applications.

Michael L. McKee and Michael Page, Computing Reaction Pathways on Molecular Potential Energy Surfaces.

Robert M. Whitnell and Kent R. Wilson, Computational Molecular Dynamics of Chemical Reactions in Solution.

Roger L. DeKock, Jeffry D. Madura, Frank Rioux, and Joseph Casanova, Computational Chemistry in the Undergraduate Curriculum.

Volume 5 (1994)

John D. Bolcer and Robert B. Hermann, The Development of Computational Chemistry in the United States.

Rodney J. Bartlett and John F. Stanton, Applications of Post-Hartree–Fock Methods: A Tutorial.

Steven M. Bachrach, Population Analysis and Electron Densities from Quantum Mechanics.

Jeffry D. Madura, Malcolm E. Davis, Michael K. Gilson, Rebecca C. Wade, Brock A. Luty, and J. Andrew McCammon, Biological Applications of Electrostatic Calculations and Brownian Dynamics Simulations.

K. V. Damodaran and Kenneth M. Merz Jr., Computer Simulation of Lipid Systems.

Jeffrey M. Blaney and J. Scott Dixon, Distance Geometry in Molecular Modeling.

Lisa M. Balbes, S. Wayne Mascarella, and Donald B. Boyd, A Perspective of Modern Methods in Computer-Aided Drug Design.

Volume 6 (1995)

Christopher J. Cramer and Donald G. Truhlar, Continuum Solvation Models: Classical and Quantum Mechanical Implementations.

Clark R. Landis, Daniel M. Root, and Thomas Cleveland, Molecular Mechanics Force Fields for Modeling Inorganic and Organometallic Compounds.

Vassilios Galiatsatos, Computational Methods for Modeling Polymers: An Introduction.

Rick A. Kendall, Robert J. Harrison, Rik J. Littlefield, and Martyn F. Guest, High Performance Computing in Computational Chemistry: Methods and Machines.

Donald B. Boyd, Molecular Modeling Software in Use: Publication Trends.

Eiji Ōsawa and Kenny B. Lipkowitz, Appendix: Published Force Field Parameters.

Volume 7 (1996)

Geoffrey M. Downs and Peter Willett, Similarity Searching in Databases of Chemical Structures.

Andrew C. Good and Jonathan S. Mason, Three-Dimensional Structure Database Searches.

Jiali Gao, Methods and Applications of Combined Quantum Mechanical and Molecular Mechanical Potentials.

Libero J. Bartolotti and Ken Flurchick, An Introduction to Density Functional Theory.

Alain St-Amant, Density Functional Methods in Biomolecular Modeling.

Danya Yang and Arvi Rauk, The A Priori Calculation of Vibrational Circular Dichroism Intensities.

Donald B. Boyd, Appendix: Compendium of Software for Molecular Modeling.

Volume 8 (1996)

Zdeněk Slanina, Shyi-Long Lee, and Chin-hui Yu, Computations in Treating Fullerenes and Carbon Aggregates.

Gernot Frenking, Iris Antes, Marlis Böhme, Stefan Dapprich, Andreas W. Ehlers, Volker Jonas, Arndt Neuhaus, Michael Otto, Ralf Stegmann, Achim Veldkamp, and Sergei F. Vyboishchikov, Pseudopotential Calculations of Transition Metal Compounds: Scope and Limitations.

Thomas R. Cundari, Michael T. Benson, M. Leigh Lutz, and Shaun O. Sommerer, Effective Core Potential Approaches to the Chemistry of the Heavier Elements.

Jan Almlöf and Odd Gropen, Relativistic Effects in Chemistry.

Donald B. Chesnut, The Ab Initio Computation of Nuclear Magnetic Resonance Chemical Shielding.

Volume 9 (1996)

James R. Damewood, Jr., Peptide Mimetic Design with the Aid of Computational Chemistry.

T. P. Straatsma, Free Energy by Molecular Simulation.

Robert J. Woods, The Application of Molecular Modeling Techniques to the Determination of Oligosaccharide Solution Conformations.

Ingrid Pettersson and Tommy Liljefors, Molecular Mechanics Calculated Conformational Energies of Organic Molecules: A Comparison of Force Fields.

Gustavo A. Arteca, Molecular Shape Descriptors.

Volume 10 (1997)

Richard Judson, Genetic Algorithms and Their Use in Chemistry.

Eric C. Martin, David C. Spellmeyer, Roger E. Critchlow Jr., and Jeffrey M. Blaney, Does Combinatorial Chemistry Obviate Computer-Aided Drug Design?

Robert Q. Topper, Visualizing Molecular Phase Space: Nonstatistical Effects in Reaction Dynamics.

Raima Larter and Kenneth Showalter, Computational Studies in Nonlinear Dynamics.

Stephen J. Smith and Brian T. Sutcliffe, The Development of Computational Chemistry in the United Kingdom.

Volume 11 (1997)

Mark A. Murcko, Recent Advances in Ligand Design Methods.

David E. Clark, Christopher W. Murray, and Jin Li, Current Issues in De Novo Molecular Design.

Tudor I. Oprea and Chris L. Waller, Theoretical and Practical Aspects of Three-Dimensional Quantitative Structure–Activity Relationships.

Giovanni Greco, Ettore Novellino, and Yvonne Connolly Martin, Approaches to Three-Dimensional Quantitative Structure–Activity Relationships.

Pierre-Alain Carrupt, Bernard Testa, and Patrick Gaillard, Computational Approaches to Lipophilicity: Methods and Applications.

Ganesan Ravishanker, Pascal Auffinger, David R. Langley, Bhyravabhotla Jayaram, Matthew A. Young, and David L. Beveridge, Treatment of Counterions in Computer Simulations of DNA.

Donald B. Boyd, Appendix: Compendium of Software and Internet Tools for Computational Chemistry.

Volume 12 (1998)

Hagai Meirovitch, Calculation of the Free Energy and the Entropy of Macromolecular Systems by Computer Simulation.

Ramzi Kutteh and T. P. Straatsma, Molecular Dynamics with General Holonomic Constraints and Application to Internal Coordinate Constraints.

John C. Shelley and Daniel R. Berard, Computer Simulation of Water Physisorption at Metal–Water Interfaces.

Donald W. Brenner, Olga A. Shenderova, and Denis A. Areshkin, Quantum-Based Analytic Interatomic Forces and Materials Simulation.

Henry A. Kurtz and Douglas S. Dudis, Quantum Mechanical Methods for Predicting Nonlinear Optical Properties.

Chung F. Wong, Tom Thacher, and Herschel Rabitz, Sensitivity Analysis in Biomolecular Simulation.

Paul Verwer and Frank J. J. Leusen, Computer Simulation to Predict Possible Crystal Polymorphs.

Jean-Louis Rivail and Bernard Maigret, Computational Chemistry in France: A Historical Survey.

Volume 13 (1999)

Thomas Bally and Weston Thatcher Borden, Calculations on Open-Shell Molecules: A Beginner’s Guide.

Neil R. Kestner and Jaime E. Combariza, Basis Set Superposition Errors: Theory and Practice.

James B. Anderson, Quantum Monte Carlo: Atoms, Molecules, Clusters, Liquids, and Solids.

Anders Wallqvist and Raymond D. Mountain, Molecular Models of Water: Derivation and Description.

James M. Briggs and Jan Antosiewicz, Simulation of pH-dependent Properties of Proteins Using Mesoscopic Models.

Harold E. Helson, Structure Diagram Generation.

Volume 14 (2000)

Michelle Miller Francl and Lisa Emily Chirlian, The Pluses and Minuses of Mapping Atomic Charges to Electrostatic Potentials.

T. Daniel Crawford and Henry F. Schaefer III, An Introduction to Coupled Cluster Theory for Computational Chemists.

Bastiaan van de Graaf, Swie Lan Njo, and Konstantin S. Smirnov, Introduction to Zeolite Modeling.

Sarah L. Price, Toward More Accurate Model Intermolecular Potentials For Organic Molecules.

Christopher J. Mundy, Sundaram Balasubramanian, Ken Bagchi, Mark E. Tuckerman, Glenn J. Martyna, and Michael L. Klein, Nonequilibrium Molecular Dynamics.

Donald B. Boyd and Kenny B. Lipkowitz, History of the Gordon Research Conferences on Computational Chemistry.

Mehran Jalaie and Kenny B. Lipkowitz, Appendix: Published Force Field Parameters for Molecular Mechanics, Molecular Dynamics, and Monte Carlo Simulations.

Volume 15 (2000)

F. Matthias Bickelhaupt and Evert Jan Baerends, Kohn-Sham Density Functional Theory: Predicting and Understanding Chemistry.

Michael A. Robb, Marco Garavelli, Massimo Olivucci, and Fernando Bernardi, A Computational Strategy for Organic Photochemistry.

Larry A. Curtiss, Paul C. Redfern, and David J. Frurip, Theoretical Methods for Computing Enthalpies of Formation of Gaseous Compounds.

Russell J. Boyd, The Development of Computational Chemistry in Canada.

Volume 16 (2000)

Richard A. Lewis, Stephen D. Pickett, and David E. Clark, Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design.

Keith L. Peterson, Artificial Neural Networks and Their Use in Chemistry.

Jörg-Rüdiger Hill, Clive M. Freeman, and Lalitha Subramanian, Use of Force Fields in Materials Modeling.

M. Rami Reddy, Mark D. Erion, and Atul Agarwal, Free Energy Calculations: Use and Limitations in Predicting Ligand Binding Affinities.

Volume 17 (2001)

Ingo Muegge and Matthias Rarey, Small Molecule Docking and Scoring.

Lutz P. Ehrlich and Rebecca C. Wade, Protein-Protein Docking.

Christel M. Marian, Spin-Orbit Coupling in Molecules.

Lemont B. Kier, Chao-Kun Cheng, and Paul G. Seybold, Cellular Automata Models of Aqueous Solution Systems.

Kenny B. Lipkowitz and Donald B. Boyd, Appendix: Books Published on the Topics of Computational Chemistry.

Volume 18 (2002)

Geoff M. Downs and John M. Barnard, Clustering Methods and Their Uses in Computational Chemistry.

Hans-Joachim Böhm and Martin Stahl, The Use of Scoring Functions in Drug Discovery Applications.

Steven W. Rick and Steven J. Stuart, Potentials and Algorithms for Incorporating Polarizability in Computer Simulations.

Dmitry V. Matyushov and Gregory A. Voth, New Developments in the Theoretical Description of Charge-Transfer Reactions in Condensed Phases.

George R. Famini and Leland Y. Wilson, Linear Free Energy Relationships Using Quantum Mechanical Descriptors.

Sigrid D. Peyerimhoff, The Development of Computational Chemistry in Germany.

Donald B. Boyd and Kenny B. Lipkowitz, Appendix: Examination of the Employment Environment for Computational Chemistry.

Volume 19 (2003)

Robert Q. Topper, David L. Freeman, Denise Bergin, and Keirnan R. LaMarche, Computational Techniques and Strategies for Monte Carlo Thermodynamic Calculations, with Applications to Nanoclusters.

David E. Smith and Anthony D. J. Haymet, Computing Hydrophobicity.

Lipeng Sun and William L. Hase, Born-Oppenheimer Direct Dynamics Classical Trajectory Simulations.

Gene Lamm, The Poisson–Boltzmann Equation.

Volume 20 (2004)

Sason Shaik and Philippe C. Hiberty, Valence Bond Theory: Its History, Fundamentals and Applications. A Primer.

Nikita Matsunaga and Shiro Koseki, Modeling of Spin Forbidden Reactions.

Stefan Grimme, Calculation of the Electronic Spectra of Large Molecules.

Raymond Kapral, Simulating Chemical Waves and Patterns.

Costel Sarbu and Horia Pop, Fuzzy Soft-Computing Methods and Their Applications in Chemistry.

Sean Ekins and Peter Swaan, Development of Computational Models for Enzymes, Transporters, Channels and Receptors Relevant to ADME/Tox.

Volume 21 (2005)

Roberto Dovesi, Bartolomeo Civalleri, Roberto Orlando, Carla Roetti, and Victor R. Saunders, Ab Initio Quantum Simulation in Solid State Chemistry.

Patrick Bultinck, Xavier Gironés, and Ramon Carbó-Dorca, Molecular Quantum Similarity: Theory and Applications.

Jean-Loup Faulon, Donald P. Visco, Jr., and Diana Roe, Enumerating Molecules.

David J. Livingstone and David W. Salt, Variable Selection - Spoilt for Choice.

Nathan A. Baker, Biomolecular Applications of Poisson-Boltzmann Methods.

Baltazar Aguda, Georghe Craciun, and Rengul Cetin-Atalay, Data Sources and Computational Approaches for Generating Models of Gene Regulatory Networks.

Volume 22 (2006)

Patrice Koehl, Protein Structure Classification.

Emilio Esposito, Dror Tobi, and Jeffry Madura, Comparative Protein Modeling.

Joan-Emma Shea, Miriam Friedel, and Andrij Baumketner, Simulations of Protein Folding.

Marco Saraniti, Shela Aboud, and Robert Eisenberg, The Simulation of Ionic Charge Transport in Biological Ion Channels: An Introduction to Numerical Methods.

C. Matthew Sundling, Nagamani Sukumar, Hongmei Zhang, Curt Breneman, and Mark Embrechts, Wavelets in Chemistry and Chemoinformatics.

CHAPTER 1

Linear-Scaling Methods in Quantum Chemistry

Christian Ochsenfeld, Jörg Kussmann, and Daniel S. Lambrecht

Institut für Physikalische und Theoretische Chemie, Universität Tübingen, D-72076 Tübingen, Germany

INTRODUCTION

With the introduction of the Schrödinger equation in 1926,1 it was in principle clear how to describe a molecular system and its properties exactly in a nonrelativistic sense. However, for most molecular systems of chemical interest, the analytic solution of the Schrödinger equation is not possible. Therefore, since 1926, a multitude of hierarchical approximations (some of which are displayed in Figure 1) have been devised that allow for a systematic approach to the exact solution of the Schrödinger equation. Although the Schrödinger equation as the fundamental equation in electronic structure theory is already quite old, the field of quantum chemistry is still fairly young and fast moving, and much can be expected in the future for developing and applying quantum chemical methods for the treatment of molecular systems.

The importance of the systematic hierarchy for solving the Schrödinger equation cannot be overemphasized, because it allows one, in principle, to systematically approach the exact result for a molecular property of interest. The simplest approach in this hierarchy is the Hartree–Fock (HF) method, which describes electron–electron interactions within a mean-field approach.2–4 The electron-correlation effects neglected in this approach can be described by the so-called post-HF methods, with prominent examples such as perturbation theory (e.g., MP2: Møller–Plesset second-order perturbation theory5) or the coupled-cluster (CC) expansion (see, e.g., Ref. 6 for a review; CCSD: CC singles doubles; CCSD(T): CCSD with perturbative triples; or CCSDT: CC singles doubles triples). In this way, the hierarchy of ab initio methods allows for reliable “measurements” and for estimating the error bars of simpler approximations. In Figure 1, we also list density functional theory (DFT),7–9 although it does not provide (at least in its current form) a systematic way of improving upon the result. Despite this deficiency in its current form, DFT has pragmatically proven to be highly useful for the description of many molecular systems, while offering a good compromise between accuracy and computational cost. Therefore, DFT has become a standard tool of modern quantum chemistry.

Reviews in Computational Chemistry, Volume 23, edited by Kenny B. Lipkowitz and Thomas R. Cundari. Copyright © 2007 Wiley-VCH, John Wiley & Sons, Inc.

The main difficulty associated with the hierarchy of quantum chemical methods is the strong increase of the computational effort with molecular size (M) (compare Figures 1 and 2), especially when approaching the exact solution. Even the simplest approach, the HF method,4 scales conventionally as $O(M^3)$, where $O(\cdot)$ denotes the order of the asymptotic scaling behavior. This means that when choosing another molecule to study that is 10 times larger than the current molecule, the computational effort is increased by a factor of 1000. The increase becomes even more dramatic if the electron correlation effects neglected in the HF approach are accounted for by, e.g., MP2 or CCSD, for which the scaling behavior is $O(M^5)$ or $O(M^6)$, respectively. The $O(M^6)$ scaling entails an increase of the computational effort by a factor of 1,000,000 for a 10-fold larger system.
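The factors quoted above follow directly from the power law. As a minimal sketch, assuming the effort behaves as a·M^n with a fixed prefactor a, the growth factor for a 10-fold larger molecule can be tabulated as follows. The function name `cost_factor` and the dictionary of exponents are illustrative choices, not part of the chapter.

```python
def cost_factor(size_ratio, scaling_exponent):
    """Growth in effort when the molecule grows by `size_ratio`,
    assuming effort = a * M**n with an unchanged prefactor a."""
    return size_ratio ** scaling_exponent

# Conventional scaling exponents quoted in the text, plus the linear-scaling target.
exponents = {"linear scaling": 1, "HF": 3, "MP2": 5, "CCSD": 6}

for method, n in exponents.items():
    factor = cost_factor(10, n)
    print(f"{method:>14}: O(M^{n}) -> effort x {factor:,} for a 10-fold larger molecule")
```

The prefactor cancels in the ratio, which is why only the exponent matters for this comparison.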

At this stage it is worthwhile to spend some time to clarify the scaling behavior. The focus of this chapter is on methods whose efforts increase only linearly with molecular size, M (defined by, e.g., the number of atoms), while the atom-centered basis set framework is retained. Within the same atomic-orbital (AO) basis, the total number of basis functions (N) scales similarly with the molecular size, so that the scaling behavior can be described as well by the number of AOs. However, increasing the number of basis functions for a specific molecule would typically not lead to a linear-scaling behavior. The size of the atom-centered basis simply defines the prefactor of the calculation (i.e., the constant factor with which the scaling behavior is multiplied; see Figure 2). In the current tutorial, we therefore mainly employ the molecule size M for describing the scaling property.

Figure 1 The hierarchy of ab initio methods: A selection of common approximations for solving the electronic Schrödinger equation is displayed. In addition, the asymptotic scaling order ($O(\cdot)$) with respect to molecular size M is listed.

To illustrate how prohibitive even an $O(M^3)$ scaling would be for the calculation of large molecules, we can think about “Moore’s law.”10 It is an empirical observation proposed in 1965, leading to the statement that computer speed roughly doubles every 1.5 years, and it has been, as a rule of thumb, astonishingly valid over the last decades. The factor of 1000 for a 10-fold larger molecule can be described as roughly $2^{10}$, which would, with Moore’s assumption, correspond to 15 years of computer development required, whereas an $O(M^6)$ scaling would correspond to as many as 30 years. In other words, one would need to wait 15 years for the computers to evolve to perform an HF calculation for a 10-fold larger molecule within the same time frame. This is clearly not an option for any enthusiastic researcher attempting to grasp deeper insights into molecular processes in chemistry, biochemistry, or even biology.
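The waiting times above can be reproduced with a few lines of arithmetic. This is a sketch that assumes the idealized doubling time of 1.5 years stated in the text. The function name `years_to_wait` is hypothetical.

```python
import math

def years_to_wait(size_ratio, scaling_exponent, doubling_time_years=1.5):
    """Years of hardware progress needed so that a molecule `size_ratio`
    times larger runs in the same wall time, assuming effort ~ M**n and
    a computer speed that doubles every `doubling_time_years`."""
    speedup_needed = size_ratio ** scaling_exponent
    return doubling_time_years * math.log2(speedup_needed)

for n in (1, 3, 6):
    print(f"O(M^{n}): about {years_to_wait(10, n):.1f} years for a 10-fold larger molecule")
```

Because 1000 is slightly less than 2^10, the cubic case comes out just under 15 years. For a linear-scaling method the same 10-fold increase needs only about 5 years.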

Therefore, the aim of this didactical review is to provide some insights into reducing the scaling behavior of quantum chemical methods so that they scale linearly with molecular size. In this way, any increase in computer speed translates directly into an increase of the treatable molecular size with respect to time requirements. The focus of this review is on presenting some basic ideas of these linear-scaling methods, without giving a complete overview of the many different approaches introduced in the literature. For basic aspects of quantum mechanics and quantum chemical methods, the reader is referred to the textbook literature such as, e.g., Refs. 4 and 11–13.

[Figure 2: plots of computation time versus molecule size (M). Panel (a): curves for O(M), O(M^3), and O(M^5). Panel (b): curves for M, 2.7M, M^2, and 2.7M^2.]

Figure 2 The computation time behaves approximately as: computation time = $a \cdot M^n$. Here, $M^n$ is called the scaling behavior, and $a$ is the prefactor. The graphs provide a schematic comparison of computation times for (a) different scaling behaviors and (b) different prefactors ($a = 1$ and $a = 2.7$).

In this chapter, we describe mainly linear-scaling self-consistent field (SCF) methods such as HF and DFT, which are closely related in the way energies, energy gradients, and molecular properties are computed. With these linear-scaling methods, molecular systems with more than 1000 atoms can nowadays be computed on simple one-processor workstations. In addition, we provide a brief outlook concerning electron-correlation methods and what might be expected in the future for reducing their scaling behavior while preserving rigorous error bounds. The review is structured as follows:

• After a brief introduction to some basics of SCF theories, we describe in the following four sections how Fock-type matrices can be built in a linear-scaling fashion, which is one of the key issues in SCF theories.

• The reduction of the scaling for forming Fock-type matrices leads then to the necessity for avoiding the second rate-determining step in SCF energy computations, the cubically scaling diagonalization step.

• With the described methods, the linear-scaling calculation of SCF energies becomes possible. However, for characterizing stationary points on potential energy surfaces, the calculation of energy gradients is crucial, which is described in the succeeding section.

• To obtain a link to experimental studies, the computation of response properties is often very important. Examples include vibrational frequencies or nuclear magnetic resonance (NMR) chemical shifts, for which the response of the one-particle density matrix with respect to a perturbation needs to be computed. Therefore, we describe ways to reduce the strong increase of the computational effort with molecular size.

• Finally, we provide in the last section a brief outlook on the long-range behavior of electron correlation effects for the example of MP2 theory and show how significant contributions to the correlation energy can be preselected, so that the scaling behavior can be reduced to linear.

SOME BASICS OF SCF THEORY

The simplest approximation used to solve the time-independent Schrödinger equation

\[ \hat{H}\Psi = E\Psi \qquad [1] \]


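within the commonly used Born–Oppenheimer approach14,15 of clamped nuclei and the electronic Hamiltonian

\[ \hat{H}_{\mathrm{el}} = -\frac{1}{2}\sum_i \nabla_i^2 - \sum_i \sum_A \frac{Z_A}{r_{iA}} + \sum_i \sum_{j>i} \frac{1}{r_{ij}} = \sum_i \hat{h}_i + \sum_i \sum_{j>i} \frac{1}{r_{ij}} \qquad [2] \]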

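is the expansion of the wave function in a Slater determinant16 as an antisymmetrized product of one-particle functions $\varphi_i$ (spin orbitals):

\[ \Psi(\mathbf{r}_1 \mathbf{r}_2 \cdots \mathbf{r}_N) = |\varphi_1 \varphi_2 \cdots \varphi_N\rangle = \frac{1}{\sqrt{N!}} \begin{vmatrix} \varphi_1(\mathbf{r}_1) & \varphi_2(\mathbf{r}_1) & \cdots & \varphi_N(\mathbf{r}_1) \\ \varphi_1(\mathbf{r}_2) & \varphi_2(\mathbf{r}_2) & \cdots & \varphi_N(\mathbf{r}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_1(\mathbf{r}_N) & \varphi_2(\mathbf{r}_N) & \cdots & \varphi_N(\mathbf{r}_N) \end{vmatrix} \qquad [3] \]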

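With this expansion for the wave function, the expectation value using the electronic Hamiltonian (Eq. [2]) can be calculated using the Slater–Condon rules.4 The result is (in Dirac notation):

\[ E_{\mathrm{HF}} = \sum_i \langle \varphi_i | \hat{h} | \varphi_i \rangle + \frac{1}{2} \sum_i \sum_j \langle \varphi_i \varphi_j \| \varphi_i \varphi_j \rangle \qquad [4] \]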

Minimizing the HF expectation value (Eq. [4]) with respect to orbital rotations while imposing orthonormality constraints leads to the well-known HF equation:2–4

\[ \hat{F}\varphi_i = \varepsilon_i \varphi_i \qquad [5] \]

with $\hat{F}$ as the Fock operator and $\varepsilon_i$ as the orbital energy. To algebraize this equation and allow for a suitable solution on computers, it is necessary to expand the one-particle functions in a set of fixed basis functions $\chi_\mu$ (typically contracted Gaussian basis functions are used in quantum chemistry):

\[ \varphi_i = \sum_\mu C_{\mu i} \chi_\mu \qquad [6] \]

leading to the Roothaan–Hall equations17,18

$$\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\varepsilon} \qquad [7]$$


where $\mathbf{F}$ is the Fock matrix, $\mathbf{S}$ is the overlap matrix, $\mathbf{C}$ is the coefficient matrix of the molecular orbitals (MOs), and $\boldsymbol{\varepsilon}$ is the diagonal matrix of the molecular-orbital energies. The Fock matrix of a closed-shell molecule is built by contracting the one-particle density matrix

$$P_{\mu\nu} = \sum_i^{N_{occ}} C_{\mu i} C_{\nu i}^{*} \qquad [8]$$

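with the four-center two-electron integrals and adding the one-electron part $h_{\mu\nu}$:

$$F_{\mu\nu} = h_{\mu\nu} + \sum_{\lambda\sigma} P_{\lambda\sigma} \left[ 2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu) \right] \qquad [9]$$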

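We use the Mulliken notation for two-electron integrals over (real-valued) Gaussian atomic basis functions in the following:

$$(\mu\nu|\lambda\sigma) = \int \chi_\mu(\mathbf{r}_1)\chi_\nu(\mathbf{r}_1) \frac{1}{r_{12}} \chi_\lambda(\mathbf{r}_2)\chi_\sigma(\mathbf{r}_2)\, d\mathbf{r}_1 d\mathbf{r}_2 \qquad [10]$$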

Because the Fock matrix depends on the one-particle density matrix $\mathbf{P}$, constructed conventionally using the MO coefficient matrix $\mathbf{C}$ as the solution of the pseudo-eigenvalue problem (Eq. [7]), the SCF equation needs to be solved iteratively. The same holds for Kohn–Sham density functional theory (KS–DFT)8,9 where the exchange part in the Fock matrix (Eq. [9]) is at least partly replaced by a so-called exchange-correlation functional term. For both HF and DFT, Eq. [7] needs to be solved self-consistently, and accordingly, these methods are denoted as SCF methods.

Two rate-determining steps occur in the iterative SCF procedure. The first is the formation of the Fock matrix, and the second is the solution of the pseudo-eigenvalue problem. The latter step is conventionally done as a diagonalization to solve the generalized eigenvalue problem (Eq. [7]), and thus, the computational effort of conventional SCF scales cubically with system size [$O(M^3)$].

The construction of the Fock matrix scales formally with $M^4$ (or more precisely with $N^4$; see discussion above) due to the two-electron integrals being four-index quantities. However, the asymptotic scaling of the number of two-electron integrals reduces to $O(M^2)$ for larger molecular systems. This can be understood by considering the following example: The charge distribution of electron 1 in a two-electron integral (Eq. [10]) is described by the product of basis functions $\chi_\mu \cdot \chi_\nu$. If we consider a selected basis function $\chi_\mu$, then only basis functions $\chi_\nu$ that are "close" to the center of $\chi_\mu$ will form non-vanishing charge distributions. This is because the Gaussian basis functions decay exponentially with distance. Therefore, the number of basis functions $\chi_\nu$ overlapping with the function $\chi_\mu$ will asymptotically (for large molecules) remain constant with increasing molecular size (in a way one can imagine a "sphere" around the selected basis function as shown in Figure 3). Overall there are $O(M)$ basis-function pairs describing each of the two electrons, so that a total of $O(M^2)$ two-electron integrals results:

$$\int \underbrace{\chi_\mu(\mathbf{r}_1)\chi_\nu(\mathbf{r}_1)}_{O(M)} \frac{1}{r_{12}} \underbrace{\chi_\lambda(\mathbf{r}_2)\chi_\sigma(\mathbf{r}_2)}_{O(M)}\, d\mathbf{r}_1 d\mathbf{r}_2 \qquad [11]$$

Figure 3 Illustration of the basis-function pair domain behavior. For a given basis function (or shell) $\mu$, only those $\nu$'s must be considered whose overlap integrals $S_{\mu\nu} = (\mu|\nu)$ exceed a certain threshold. In the upper graph the value of the overlap integral is depicted as a function of the $\mu$–$\nu$ distance. In the lower graph we consider a linear chain of 15 Gaussian functions (circles) at selected points in space. For $\mu = 8$ (center atom) and a numerical threshold of $10^{-7}$, only the shell pairs closer than 4.01 a.u. = 2.12 Å ($\nu$ = 4–12) are numerically significant (shaded area); all other $\nu$'s (1–3, 13–15) are numerically insignificant for the formation of the selected charge distribution $\Omega_{\mu\nu}$ and may be neglected. (Chosen distance is 1 a.u. Each Gaussian is an s function of unit exponent.)
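The linear growth of the number of significant basis-function pairs can be checked with a small sketch. It assumes a hypothetical linear chain of normalized s-type Gaussians sharing one exponent, for which the overlap at distance d is exp(-alpha d^2 / 2). The function name, spacing, and threshold are illustrative choices. The cutoff radius depends on the overlap convention, so it need not coincide with the value quoted in the caption of Figure 3.

```python
import math


def significant_pairs(n_functions, spacing=1.0, alpha=1.0, threshold=1e-7):
    """Count pairs (mu, nu), mu <= nu, whose overlap exceeds the threshold.

    Assumption: normalized s-type Gaussians with a common exponent alpha on
    a linear chain, so that S(d) = exp(-alpha * d**2 / 2) at distance d.
    """
    count = 0
    for mu in range(n_functions):
        for nu in range(mu, n_functions):
            d = (nu - mu) * spacing
            if math.exp(-0.5 * alpha * d * d) > threshold:
                count += 1
    return count


if __name__ == "__main__":
    for m in (50, 100, 200, 400):
        # significant pairs versus all M(M+1)/2 pairs
        print(m, significant_pairs(m), m * (m + 1) // 2)
```

With these settings each function has significant partners only up to five sites away, so the count grows as 6M - 15 while the number of all pairs grows quadratically.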


Although the diagonalization of the Fock matrix scales cubically with system size as compared with the quadratic scaling for the formation of the Fock matrix, its prefactor is rather small as shown schematically in Figure 4. Therefore, the diagonalization dominates only for large molecules and/or for fast Fock formation methods as described later.

DIRECT SCF METHODS AND TWO-ELECTRON INTEGRAL SCREENING

In "nondirect" SCF methods, all two-electron integrals are calculated once, stored on disk, and later reused in the subsequent SCF iterations. Because the number of integrals scales formally as $M^4$, storing and retrieving two-electron integrals is an extremely expensive step as far as disk space and input/output (I/O) time are concerned. For large molecules the required disk space (and calculation time, see discussion below) easily exceeds all available capacities.

Almlöf et al.19 observed in a seminal paper that recomputing integrals whenever needed, rather than storing them to disk, could not only be competitive with the methods used so far, but could even surpass them as far as computational and storage efficiency are concerned.

[Figure 4: plot of CPU time [h] versus number of basis functions (0 to 10000) with curves for conventional Fock matrix formation, diagonalization, and CFMM/LinK Fock matrix formation.]

Figure 4 Typical timing behavior of the quadratic Fock matrix formation versus the cubically scaling diagonalization step (small prefactor) in SCF energy calculations. The timings for a conventional Fock matrix formation, the linear-scaling CFMM/LinK schemes (as explained later in this review), and Fock matrix diagonalization for a series of DNA molecules (A-T)$_n$, n = 1–16, are depicted. Integral threshold is $10^{-6}$, basis set 6-31G*.

Direct schemes have two advantages: First, storage requirements are greatly decreased, and second, calculations for large molecules can actually be made much faster than for nondirect methods, because within the SCF iterations, information on the locality of the molecular system (via the one-particle density matrix; see discussion below) can be exploited. In this way, the introduction of the "direct" SCF approach constitutes an important step toward the applicability of quantum chemical methods to large molecules.

The formation of Fock-type matrices can be schematically divided into two steps:

• Selection of numerically significant integrals: integral screening.

• Calculation of integrals and formation of final matrices.

In the following section, we focus first on the selection of numerically significant integrals, and later we discuss the different contractions of the two-electron integrals.

Schwarz Integral Estimates

Although the asymptotic $O(M^2)$ scaling of the four-center two-electron integrals had been known at least since 1973,20 it was only in 1989 in the seminal work of Häser and Ahlrichs21 that an efficient and widely accepted way of rigorously preselecting the numerically significant two-electron integrals was introduced:

$$|(\mu\nu|\lambda\sigma)| \le |(\mu\nu|\mu\nu)|^{1/2} \cdot |(\lambda\sigma|\lambda\sigma)|^{1/2} = Q_{\mu\nu} Q_{\lambda\sigma} \qquad [12]$$

This so-called Schwarz integral screening provides a rigorous upper bound to the four-index integrals, while requiring just the computation of simple two-index quantities. In this way, small four-index integrals below a chosen threshold can be neglected and the formal $M^4$ scaling associated with the formation of the Fock matrix in HF theory is reduced to $O(M^2)$. As we will see shortly, this was a breakthrough for direct SCF methods19,21,22 and increased the applicability of SCF methods dramatically.
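The Schwarz bound can be illustrated with a minimal sketch. It assumes a hypothetical chain of unnormalized s-type Gaussians on a line, for which the two-electron integral has a closed form involving the zeroth-order Boys function. The helper names `boys0`, `eri_ssss`, and `schwarz_screen`, the chain geometry, and the threshold are illustrative choices and are not taken from any particular program.

```python
import math
from itertools import product


def boys0(t):
    """Zeroth-order Boys function F0(t) = 0.5*sqrt(pi/t)*erf(sqrt(t))."""
    if t < 1e-12:
        return 1.0 - t / 3.0  # leading terms of the Taylor series near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def eri_ssss(a, xa, b, xb, c, xc, d, xd):
    """(ab|cd) for unnormalized s Gaussians with exponents a..d at xa..xd."""
    p, q = a + b, c + d
    xp = (a * xa + b * xb) / p  # center of the bra product Gaussian
    xq = (c * xc + d * xd) / q  # center of the ket product Gaussian
    prefactor = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    k_bra = math.exp(-a * b / p * (xa - xb) ** 2)
    k_ket = math.exp(-c * d / q * (xc - xd) ** 2)
    return prefactor * k_bra * k_ket * boys0(p * q / (p + q) * (xp - xq) ** 2)


def schwarz_screen(centers, alpha=1.0, threshold=1e-6):
    """Return (kept, total, violations) for Schwarz screening on a chain."""
    n = len(centers)
    # Two-index Schwarz factors Q[m][v] = sqrt((mv|mv)), as in Eq. [12].
    q_factor = [[math.sqrt(eri_ssss(alpha, centers[m], alpha, centers[v],
                                    alpha, centers[m], alpha, centers[v]))
                 for v in range(n)] for m in range(n)]
    kept = violations = 0
    for m, v, l, s in product(range(n), repeat=4):
        estimate = q_factor[m][v] * q_factor[l][s]
        # A production code would skip the integral when the estimate is
        # small; here it is always evaluated so the bound can be checked.
        exact = eri_ssss(alpha, centers[m], alpha, centers[v],
                         alpha, centers[l], alpha, centers[s])
        if abs(exact) > estimate * (1.0 + 1e-12):
            violations += 1
        if estimate > threshold:
            kept += 1
    return kept, n ** 4, violations


if __name__ == "__main__":
    print(schwarz_screen([1.5 * i for i in range(6)]))
```

In this toy setting the estimate never falls below the exact integral, and only a fraction of the formal n^4 quadruples survives the threshold.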

In contrast to computing the two-electron integrals before an SCF calculation, the key feature of the direct SCF method is the recalculation of two-electron integrals in each SCF iteration. Although this recalculation has the disadvantage of computing the integrals multiple times for building the Coulomb and exchange parts (denoted as J and K) of the Hamiltonian,

$$J_{\mu\nu} - K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma} \left[ 2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu) \right] \qquad [13]$$


it not only avoids the bottleneck of storing the huge number of four-center two-electron integrals, but also it becomes possible to screen the two-electron integrals in combination with the corresponding one-particle density matrix $\mathbf{P}$ available in each iteration. For example, when calculating the Coulomb part ($J_{\mu\nu}$), integrals are neglected if their contributions to the Coulomb matrix are below a selected threshold of $10^{-\vartheta}$:

$$\text{neglect } (\mu\nu|\lambda\sigma), \text{ if } |P_{\lambda\sigma}| \cdot Q_{\mu\nu} Q_{\lambda\sigma} \le 10^{-\vartheta} \qquad [14]$$

An analogous screening criterion may be formulated for the exchange matrix $K_{\mu\nu}$.

It has to be pointed out that in nondirect SCF, the density matrix is not available for screening, because all integrals are calculated prior to the SCF run. Equation [12] has to be employed for screening instead of Eq. [14]. This procedure has a severe disadvantage: Although an integral $(\mu\nu|\lambda\sigma)$ itself may be large, its contribution $P_{\lambda\sigma}(\mu\nu|\lambda\sigma)$ to the Coulomb (or, analogously, the exchange) matrix and finally the total energy may be negligible, because the density matrix elements $P_{\lambda\sigma}$ are often small. Integral screening for nondirect SCF is, therefore, much less efficient than for direct SCF, because a large number of integrals whose contribution to the final result is negligible cannot be discarded due to the missing coupling with the density matrix.

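A further improvement on integral screening can be achieved by employing difference densities (cf. Refs. 19 and 21). The Fock matrices of iterations n and n − 1 are given by

$$\mathbf{F}^{(n)} = \mathbf{h} + \mathbf{P}^{(n)} \cdot \mathbf{II}, \qquad \mathbf{F}^{(n-1)} = \mathbf{h} + \mathbf{P}^{(n-1)} \cdot \mathbf{II} \qquad [15]$$

with $\mathbf{II}$ as the antisymmetrized two-electron integrals. Instead of constructing the full Fock matrix in each iteration, a recursive scheme as in the following equation may be used:

$$\mathbf{F}^{(n)} = \mathbf{F}^{(n-1)} + \Delta\mathbf{P}^{(n)} \cdot \mathbf{II} \qquad [16]$$

with the difference density $\Delta\mathbf{P}^{(n)}$ for the nth iteration defined as

$$\Delta\mathbf{P}^{(n)} = \mathbf{P}^{(n)} - \mathbf{P}^{(n-1)} \qquad [17]$$

Within this scheme, the number of two-electron integrals needed for the Fock matrix updates $\Delta\mathbf{P}^{(n)} \cdot \mathbf{II}$ in each iteration may be screened by replacing $P_{\lambda\sigma}$ with $\Delta P_{\lambda\sigma}$ in Eq. [14] for the Coulomb part

$$\text{neglect } (\mu\nu|\lambda\sigma), \text{ if } |\Delta P^{(n)}_{\lambda\sigma}| \cdot Q_{\mu\nu} Q_{\lambda\sigma} \le 10^{-\vartheta} \qquad [18]$$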


and in an analogous fashion for the exchange part. As the SCF calculation approaches convergence, the change $\Delta\mathbf{P}^{(n)}$ in the density matrix becomes smaller and smaller and finally approaches zero (within numerical accuracy). The number of two-electron integrals surviving the screening of Eq. [18] is therefore significantly smaller than without difference density screening (Eq. [14]). For an improved algorithm by Häser and Ahlrichs, where the norm of the difference densities $\Delta\mathbf{P}^{(n)}$ is minimized further, the reader is referred to Ref. 21.
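The recursive update of Eq. [16] relies only on the linearity of the two-electron part in the density matrix. The following sketch checks this numerically. It assumes random numbers as stand-ins for the one-electron matrix, the integrals, and the densities; the names `g_matrix`, `incremental_fock`, and `demo` are hypothetical, and no screening is performed.

```python
import random


def g_matrix(density, eri):
    """G[m][v] = sum over l, s of P[l][s] * (2*(mv|ls) - (ms|lv))."""
    n = len(density)
    g = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for v in range(n):
            g[m][v] = sum(
                density[l][s] * (2.0 * eri[m][v][l][s] - eri[m][s][l][v])
                for l in range(n) for s in range(n))
    return g


def incremental_fock(fock_prev, density_prev, density_new, eri):
    """F(n) = F(n-1) + dP(n) . II with dP(n) = P(n) - P(n-1)."""
    n = len(density_new)
    delta = [[density_new[i][j] - density_prev[i][j] for j in range(n)]
             for i in range(n)]
    dg = g_matrix(delta, eri)
    return [[fock_prev[i][j] + dg[i][j] for j in range(n)] for i in range(n)]


def demo(n=4, seed=0):
    """Return the largest deviation between incremental and full builds."""
    rng = random.Random(seed)
    # Random stand-ins (assumption): not real integrals or densities.
    eri = [[[[rng.random() for _ in range(n)] for _ in range(n)]
            for _ in range(n)] for _ in range(n)]
    h = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    p_old = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    p_new = [[p_old[i][j] + 1e-3 * rng.uniform(-1.0, 1.0) for j in range(n)]
             for i in range(n)]
    g_old = g_matrix(p_old, eri)
    f_old = [[h[i][j] + g_old[i][j] for j in range(n)] for i in range(n)]
    f_inc = incremental_fock(f_old, p_old, p_new, eri)
    g_new = g_matrix(p_new, eri)
    return max(abs(f_inc[i][j] - (h[i][j] + g_new[i][j]))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(demo())
```

Because the update is built from the small difference density alone, a screening criterion of the form of Eq. [18] can discard many more integrals than one based on the full density.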

Multipole-Based Integral Estimates (MBIE)

The Schwarz estimates introduced by Häser and Ahlrichs21 are now used in almost every quantum chemical code for two-electron integral screening. However, they are not optimal in a certain sense: They do not describe the 1/R decay behavior between the charge distributions of the two-electron integrals, as we will explain shortly.

Consider a two-electron repulsion integral (ERI)

$$(\mu\nu|\lambda\sigma) \equiv (A|B) = \int \Omega_A(\mathbf{r}_1) \frac{1}{r_{12}} \Omega_B(\mathbf{r}_2)\, d\mathbf{r}_1 d\mathbf{r}_2 \qquad [19]$$

which consists of the two charge distributions $\Omega_A$ and $\Omega_B$ describing the spatial distribution of electrons 1 and 2, respectively (see Figure 5). Here, A and B are collective indices for the "bra" and "ket" basis functions, i.e., $A = \mu\nu$ and $B = \lambda\sigma$. The $\Omega_A(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2)$ are Gaussian distributions built as products of two Gaussians $\Omega_A(\mathbf{r}_1) = \chi_\mu(\mathbf{r}_1) \cdot \chi_\nu(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2) = \chi_\lambda(\mathbf{r}_2) \cdot \chi_\sigma(\mathbf{r}_2)$, respectively. The integral describes the Coulomb repulsion between electrons $e_1^-$ and $e_2^-$, whose spatial distribution is represented by $\Omega_A$ and $\Omega_B$, respectively, as illustrated in Figure 5. As stated by Coulomb's law, the repulsion energy between two charges is proportional to 1/R, where R is the distance between the two particles. Similarly, for the two-electron integral, one finds (see Ref. 13) that

$$(\mu\nu|\lambda\sigma) \approx \frac{S_{\mu\nu} S_{\lambda\sigma}}{R} \qquad [20]$$

Figure 5 The spatial distributions of electrons $e_1^-$ and $e_2^-$ are described by the orbital products $\Omega_A = \chi_\mu\chi_\nu$ and $\Omega_B = \chi_\lambda\chi_\sigma$ centered about A and B, respectively. The distance between both centers is denoted as R.


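for sufficiently large separations. We will refer to this in the following as exponential and 1/R-coupling as denoted by

$$\text{exponential coupling:} \quad (\mu\nu|\lambda\sigma) \sim S_{\mu\nu} S_{\lambda\sigma} \sim e^{-a_{\mu\nu} R_{\mu\nu}^2}\, e^{-a_{\lambda\sigma} R_{\lambda\sigma}^2}$$
$$1/R\text{-coupling:} \quad (\mu\nu|\lambda\sigma) \sim \frac{1}{R} \qquad [21]$$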

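where $R_{\mu\nu}$ ($R_{\lambda\sigma}$) is the distance between the centers of the basis functions $\chi_\mu$ and $\chi_\nu$ ($\chi_\lambda$ and $\chi_\sigma$). $a_{\mu\nu}$ and $a_{\lambda\sigma}$ are some constants irrelevant for the following discussion.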

Although the Schwarz integral estimates account correctly for the exponential coupling of $\mu\nu$ and $\lambda\sigma$, the 1/R decay when increasing the distance R between the charge distributions is entirely missing. This is illustrated in Figure 6 where both the exact behavior of a two-electron integral and the Schwarz estimate (abbreviated as QQ) behavior are shown. The 1/R distance decay is important not only for the treatment of large molecules in SCF theories, but also in electron correlation methods, where the decay behavior is at least $1/R^4$.23,24 We will return to the latter issue in our outlook on electron correlation methods later in this review.
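The missing distance dependence of the Schwarz estimate can be made concrete with a small sketch. It assumes two unnormalized s-type charge distributions exp(-p r^2) and exp(-q r^2), each centered on a single point, for which the integral is available in closed form; the names `coulomb_ss` and `decay_table` and the chosen exponents are illustrative.

```python
import math


def boys0(t):
    """Zeroth-order Boys function F0(t) = 0.5*sqrt(pi/t)*erf(sqrt(t))."""
    if t < 1e-12:
        return 1.0 - t / 3.0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def coulomb_ss(p, q, distance):
    """(A|B) for s-type distributions exp(-p r^2) and exp(-q r^2)."""
    prefactor = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    return prefactor * boys0(p * q / (p + q) * distance ** 2)


def decay_table(p=2.0, q=2.0, distances=(2.0, 5.0, 10.0, 20.0)):
    """Rows of (R, exact integral, S_A*S_B/R, Schwarz estimate QQ)."""
    s_a = (math.pi / p) ** 1.5  # total charge (overlap) of distribution A
    s_b = (math.pi / q) ** 1.5  # total charge (overlap) of distribution B
    # Schwarz factors use each distribution with itself at zero separation,
    # so the estimate carries no information about R.
    qq = math.sqrt(coulomb_ss(p, p, 0.0) * coulomb_ss(q, q, 0.0))
    return [(r, coulomb_ss(p, q, r), s_a * s_b / r, qq) for r in distances]


if __name__ == "__main__":
    for row in decay_table():
        print("R = %5.1f  exact = %.6e  SS/R = %.6e  QQ = %.6e" % row)
```

In this model the exact integral approaches the $S_A S_B / R$ form of Eq. [20] at large separation, whereas the QQ column stays constant.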

Almlöf pointed out in 1972 (Ref. 25) that the missing 1/R dependence in the Schwarz screening might be approximated by the following equation via overlap integrals ($S_A$ and $S_B$):

$$(\mu\nu|\lambda\sigma) \equiv (A|B) \approx \frac{S_A S_B}{R_{AB}} \qquad [22]$$

[Figure 6: plot of integral value (0 to 0.8) versus H-F distance (0 to 20 Å) with curves QQ, MBIE-0, MBIE-1, and exact.]

Figure 6 Comparison of integral estimates MBIE-0, MBIE-1, QQ (Schwarz), and exact 1/R-dependence of two-electron repulsion integrals in a hydrogen-fluoride dimer for integral $(d_{zz}d_{zz}|p_z p_z)$ with minimum exponents of $8.000000 \times 10^{-1}$ on the bra side and $6.401217 \times 10^{-1}$ on the ket side, using a 6-31G** basis.


However, Eq. [22] does not represent a rigorous upper bound to the two-electron integral. Almlöf26 as well as Häser and Ahlrichs21 noted later that nonrigorous bounds for integrals cannot be used in screening as efficiently as rigorous bounds, because the error is uncontrollable. To achieve sufficient accuracy with nonrigorous integral bounds, the thresholds would need to be lowered to an extent that renders them virtually useless for practical applications.

Recently, new multipole-based integral estimates (MBIE) have been introduced by Lambrecht and Ochsenfeld.23 They are simple, rigorous, and tight upper bounds to the two-electron integrals, and at the same time, they account for the 1/R decay behavior. Because these estimates can be applied generally in quantum chemistry and are expected to be particularly important in view of electron-correlation theories for larger molecules, we briefly outline the main ideas of this MBIE method. For a discussion of the latter in the context of electron correlation, see also the last section of this tutorial.

For a two-electron integral with well-separated charge distributions (we will define this in more detail in the section on multipole expansions of two-electron integrals), it is possible to expand the $1/r_{12}$ operator in a multipole series as13,27–29

$$(\mu\nu|\lambda\sigma) = \frac{MM^{(0)}}{R} + \frac{MM^{(1)}}{R^2} + \frac{MM^{(2)}}{R^3} + \ldots \qquad [23]$$

where the $MM^{(n)}$ denote n-th order multipole terms. For example, $MM^{(0)}$ describes the monopole-monopole (overlap-overlap) interaction, $MM^{(1)}$ stands for dipole-monopole terms and $MM^{(2)}$ contains the quadrupole-monopole and dipole-dipole interactions. This series intrinsically contains the 1/R-dependence for which we aim.

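With the definition of "absolute spherical multipoles" of order n, $\mathbb{M}^{(n)}$, as the absolute value of the radial part of spherical multipoles

$$\mathbb{M}^{(n)}_A \equiv \int |\Omega_A(\mathbf{r})\, r^n|\, r^2\, dr \qquad [24]$$

and collecting all the nth order terms over the absolute multipole integrals by $\mathbb{MM}^{(n)}$, we obtain an upper bound to the two-electron integral:

$$|(\mu\nu|\lambda\sigma)| \le \frac{\mathbb{MM}^{(0)}}{R} + \frac{\mathbb{MM}^{(1)}}{R^2} + \frac{\mathbb{MM}^{(2)}}{R^3} + O(R^{-4}) \qquad [25]$$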

Here $\mathbb{MM}^{(n)}$ stands for expressions involving absolute multipole integrals of order n. Although this expansion represents a rigorous upper bound to the two-electron integral, it is of no practical use in this form, because the series involves, in principle, an infinite (or, at least, a high) number of terms. Discarding the higher order terms would, of course, not lead to a rigorous upper bound.

The key feature of the MBIE method is to replace the higher order terms by lower order ones, while preserving the rigorous upper bound. This is not trivial, but analytical expressions can be derived that relate higher order multipoles to lower order terms.23 The key idea is illustrated in the following equation:

$$|(\mu\nu|\lambda\sigma)| \le \left| \frac{\mathbb{MM}^{(0)}_{\mu\nu\lambda\sigma}}{R'} \sum_{n=0}^{\infty} \left( \frac{1}{R'} \right)^n \right| \qquad [26]$$

Here, all multipoles with $n \ge 1$ are related to the monopole (overlap) term $\mathbb{MM}^{(0)}$ by virtue of the analytically derived modified distance $R'$ (to be described later). This replacement greatly simplifies the form of the series. Summing up the geometric series, which for $R' > 1$ gives $\frac{1}{R'} \sum_{n=0}^{\infty} (1/R')^n = \frac{1}{R' - 1}$, we obtain the estimate to zeroth order (MBIE-0):

$$|(\mu\nu|\lambda\sigma)| \le \left| \frac{\mathbb{M}^{(0)}_{\mu\nu} \mathbb{M}^{(0)}_{\lambda\sigma}}{R' - 1} \right| \qquad [27]$$

Note that this integral bound contains the 1/R-coupling through the modified distance $R'$.

The crucial point of MBIE is that $R'$ must be modified analytically such that the MBIE expression is a rigorous upper bound. After a tedious derivation,23 it was found that

$$R' \equiv R - R_{A+B} = R_{AB} - R_A - R_B \qquad [28]$$

and

$$R_A \equiv K \cdot \left( \frac{1}{2\zeta} \right)^{\frac{n+1}{2n}}, \quad \text{with} \quad K = \frac{(n+1+l)^{\frac{n+1+l}{2n}}}{l^{\,l/2n}} \left( \frac{1}{e} \right)^{\frac{n+1}{2n}} \qquad [29]$$

guarantees that MBIE-0 is indeed a rigorous upper bound. In the previous equation, n is the multipole order up to which MBIE is valid. $l$ and $\zeta$ denote the total angular momentum and the orbital exponents of the Gaussian basis function product, respectively.

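In the foregoing outline, all terms with $n \ge 1$ were related back to monopoles ($n' = 0$). In a similar fashion, we can relate higher order terms back to dipoles (MBIE-1), quadrupoles (MBIE-2), etc. ($n' = 1, 2, \ldots$). For example, the MBIE-1 criterion, where all higher order terms are related back to expressions over dipoles, has the following form:

$$|(\mu\nu|\lambda\sigma)| \le \left| \frac{\mathbb{M}^{(0)}_{\mu\nu} \mathbb{M}^{(0)}_{\lambda\sigma}}{R} \right| + \left| \frac{\mathbb{M}^{(1)}_{\mu\nu} \mathbb{M}^{(0)}_{\lambda\sigma} + \mathbb{M}^{(0)}_{\mu\nu} \mathbb{M}^{(1)}_{\lambda\sigma}}{R'^2 - R'} \right| \qquad [30]$$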


It is important to note that independent of the order, MBIE always guarantees upper bounds to the two-electron integral.23 The efficiency of the MBIE integral estimates is illustrated in Figure 6, which shows that in contrast to the Schwarz estimates (QQ), MBIE accounts for the 1/R decay behavior of two-electron integrals. For SCF methods, we have found MBIE-1 to be a sufficiently good screening criterion. It overestimates the true number of significant two-electron integrals by just a few percent, while the screening overhead is negligible. The presentation of actual timings with the MBIE screening will be deferred to a later section, once we have introduced the linear-scaling methods for forming the Fock matrix.

As mentioned, it is clear that the MBIE estimates require the validity of the multipole expansion for two-electron integrals, similar to the requirements for the fast multipole methods (these multipole expansions will be presented in detail in the next section). For the near-field part of the integrals, i.e., for charge distributions that are so close that the multipole expansion is not applicable, MBIE cannot be used. Here one can resort to, for example, the Schwarz bounds.

MBIE23 is the first rigorous integral screening criterion that takes both the exponential and the 1/R-coupling into account. We also point out that the Schwarz bound significantly overestimates the true integral value, if the "bra" and "ket" basis-function exponents are very different, as has been discussed in the work of Gill et al.30 In contrast, MBIE does not suffer from such a problem.

Because the computation, handling, and contraction of two-electron integrals is central to many quantum chemical methods, it is clear that MBIE can be widely applied. Therefore, by introducing the 1/R dependence in the two-electron integral estimates within MBIE, we not only gain performance for the treatment of large molecules in SCF theories, but MBIE becomes the first screening criterion that allows for the rigorous preselection of contributions to the computation of electron correlation effects in AO-based theories. In these a coupling of at least $1/R^4$ is observed, which finally leads to linear scaling for electron correlation methods, as we will outline in our outlook on electron correlation methods in a later section of this review.

CALCULATION OF INTEGRALS VIA MULTIPOLE EXPANSION

As we have seen, the number of Coulomb integrals scales as $O(M^2)$ for sufficiently large molecules. To overcome this potential bottleneck, the naive pair-wise summation over electron–electron interactions has to be circumvented. We will see that the multipole expansion of the two-electron integrals can be used, thus allowing us to achieve an overall $O(M)$ scaling for calculating the Coulomb matrix.


One advantage of using the multipole expansion in tackling the Coulomb problem is that instead of treating individual pair-wise interactions between point charges, one can collect them into charge distributions and use the total net interaction between these distributions as illustrated in Figure 7. Combined with a clever tree algorithm, the multipole expansion can be used to avoid the quadratic step of summing over pair-wise interactions to obtain an $O(M)$ scaling behavior. Another advantage is the separation of "bra" and "ket" quantities, making it possible to precalculate some auxiliary entities before the integral calculation itself starts, thus reducing the scaling prefactor.

In the next section, we consider as an introductory example the replacement of individual interactions with effective interactions by using a multipole series. After gaining some basic understanding of the multipole expansion, we derive in detail the spherical multipole expansion, which is one of the most prominent types of multipole expansions used in the calculation of molecular integrals. Once our mathematical tools are derived, we explain an algorithm that scales linearly with the number of interacting particles, namely the fast multipole method (FMM), but that is only suitable for point charges. Then we consider continuous (Gaussian) charge distributions by introducing a generalization of FMM, the continuous fast multipole method (CFMM), in the next section. We complete our tour through multipole methods with a brief overview of other approaches that make use of multipole expansions and tree codes to speed up the calculation of two-electron integrals.

A First Example

Before deriving and discussing the multipole expansion in detail, let us first have a glimpse at its usefulness by means of a simple example. Imagine we want to calculate the Coulomb interaction energy between a point charge $q_1$ and a set of point charges $\{q_2, q_3, q_4, q_5\}$ (all of unit charge) as depicted in Figure 8(a).

Figure 7 The interaction of several point charges $\{q_i\}$ and $\{q_j\}$ (small open circles) can be approximated as the net interaction term between two charge distributions $\Omega_A$ and $\Omega_B$ (large circles).


The simplest way to calculate the interaction energy $U_{1B}$ between $q_1$ and all other charges is by summing over all four pair-wise interaction terms. Using the geometry described in Figure 8(a), this yields the following result:

$$U_{1B} = q_1 \cdot \sum_{i=2}^{5} \phi_i(\mathbf{r}_1) = q_1 \cdot \sum_{i=2}^{5} \frac{q_i}{r_{1i}} = 0.4010 \text{ a.u.} \qquad [31]$$

Here $\phi_i(\mathbf{r}_1)$ denotes the electrostatic potential generated by charge i as it is felt at the location of $q_1$. This calculation of the repulsion energy is neither difficult nor time-consuming, but it must not be forgotten that the total number of interaction terms in this naive approach scales like $O(M^2)$: If we want to evaluate all interaction energies between two sets of M particles $\{q_{1a}, q_{2a}, \ldots, q_{Ma}\}$ and $\{q_{1b}, q_{2b}, \ldots, q_{Mb}\}$, we end up with a number of pair-wise interaction terms on the order of $M^2$. As the number of interacting particles grows, the number of interaction terms soon becomes intractable. For example, a DNA molecule with 1052 atoms would require the calculation of 1,359,959 × (1,359,959 − 1)/2 = 924,743,560,861 pair interactions (with a 6-31G* basis set and an integral screening threshold of $10^{-6}$), presenting the researcher with the enormous number of almost one trillion interaction terms! For large molecules described by several thousand basis functions, we must therefore avoid the naive quadratic loop over pairs of interacting particles.

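This is where the strength of the multipole expansion comes into play. As we will derive later, the potential $\phi$ arising from an arbitrary charge can be expanded in terms of monopole (q), dipole ($\mathbf{D}$), quadrupole ($\mathbf{Q}$), and higher order multipole interactions:

$$\phi(\mathbf{r}) = \phi^{(0)}(\mathbf{r}) + \phi^{(1)}(\mathbf{r}) + \phi^{(2)}(\mathbf{r}) + \ldots = \frac{q}{r} + \frac{\mathbf{D} \cdot \hat{\mathbf{r}}}{r^2} + \frac{1}{2} \frac{\hat{\mathbf{r}} \cdot \mathbf{Q} \cdot \hat{\mathbf{r}}}{r^3} + \ldots \qquad [32]$$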

Figure 8 Cartoon illustrating the usefulness of the multipole expansion. (a) Naive approach: The interaction between a point charge (open circle) located at A and a set of four point charges distributed around B is calculated by summing over each individual pair-wise interaction term. (b) Multipole expansion: The four individual charges have been replaced by their net effect, where they behave like a single new (more complicated) charge distribution $\Omega_B$. The separation between A and B is R = 10 a.u. The point charges of $\Omega_B$ have a distance of r = 1 a.u. from B.

where $\hat{\mathbf{r}}$ denotes the unit vector in the direction of $\mathbf{r}$ and $r = |\mathbf{r}|$ is the length of $\mathbf{r}$. Even if a charge has a complicated structure, for example, is composed of several point charges like in our examples (Figures 7 and 8), its potential can always be expanded in the form of Eq. [32], where the multipoles are those of the composite charge distribution $\Omega$. Instead of looking at the field generated by the individual point charges, we can therefore consider their net effect [Figure 8(b)].

The expansion can be truncated after a finite number of, say, L terms. For the spherical multipole expansion, L = 15–21 is known to provide accuracies on the order of $10^{-7}$ a.u. and better in the total energy.31 Instead of taking into account all M field terms of the individual charges $q_i$, the total field is then given by the L terms of the multipole expansion of the total charge distribution $\Omega$ as

$$\phi_{total}(\mathbf{r}) = \sum_i^{M} \phi_i(\mathbf{r}) \rightarrow \phi^{net}_{MP}(\mathbf{r}) \approx \sum_{n=0}^{L} \phi^{(n)}(\mathbf{r}) \qquad [33]$$

In the same manner, the total interaction energy of a point charge $q_i$ with all other charges $q_j$ is replaced with the net interaction

$$U_{total} = q_i \sum_j^{M} \frac{q_j}{r_{ij}} \rightarrow U^{net}_{MP} \approx q_i \sum_{n=0}^{L} \phi^{(n)}(\mathbf{r}_i) \qquad [34]$$

Thus, instead of having $O(M)$ interaction terms for each point charge $q_i$, we end up with a sum over the different orders of the multipoles of the charge distribution(s) $\Omega$. Because L is constant for a certain level of accuracy, we can thus calculate the interaction energy between $q_i$ and one composite charge $\Omega$ with complexity that is independent of the number of particles belonging to $\Omega$.

To see how this works, let us return to our previous example [Figure 8(a,b)], where we combine the four point particles $\{q_2, q_3, q_4, q_5\}$ into one new charge distribution $\Omega_B$ using the multipole series. Index B denotes the center of the new charge distribution and also the center of the multipole expansion. The new "particle" $\Omega_B$ certainly has a more complicated structure than its constituent point-charge particles; the composite charge distribution has not only a charge (monopole), but in general also a dipole, quadrupole, and higher moments. We now calculate these lower order multipole moments to evaluate the interaction energy.

The monopole is simply the total charge of the constituting point charges:

$$q_B = \sum_{j=2}^{5} q_j = 4.0000 \qquad [35]$$


The interaction energy between $q_1$ and $\Omega_B$ is, therefore, to zeroth order:

$$U^{(0)}_{1B} = \frac{q_1 q_B}{R} = 0.4000 \text{ a.u.} \qquad [36]$$

For the next-highest term, the dipole interaction, we obtain exactly zero because of symmetry reasons:

$$\mathbf{d}_B = \sum_{j \in B} q_j \mathbf{r}_j = 0, \qquad U^{(1)}_{1B} = 0.0000 \text{ a.u.} \qquad [37]$$

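The quadrupole moment tensor of $\Omega_B$ (for unit charges) is given as

$$\mathbf{Q}_B = \sum_i \begin{pmatrix} 3x_i^2 - r_i^2 & 3x_i y_i & 3x_i z_i \\ 3y_i x_i & 3y_i^2 - r_i^2 & 3y_i z_i \\ 3z_i x_i & 3z_i y_i & 3z_i^2 - r_i^2 \end{pmatrix} = \begin{pmatrix} 2.0000 & 0.0000 & 0.0000 \\ 0.0000 & -4.0000 & 0.0000 \\ 0.0000 & 0.0000 & 2.0000 \end{pmatrix} \qquad [38]$$

and the second-order quadrupole interaction term gives the result

$$U^{(2)}_{1B} = \frac{1}{2} \frac{\hat{\mathbf{r}}_1 \cdot \mathbf{Q}_B \cdot \hat{\mathbf{r}}_A}{R^3} = 0.0010 \text{ a.u.} \qquad [39]$$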

We truncate our expansion here because, at this separation, the terms beyond the quadrupole contribute less than $10^{-4}$ a.u. For high-accuracy calculations on more complicated distributions, the expansion must be carried out to higher orders.

Putting everything together, we can approximate the interaction energy by

$$U_{1B} \approx U^{(0)}_{1B} + U^{(1)}_{1B} + U^{(2)}_{1B} = 0.4010 \text{ a.u.} \qquad [40]$$

Note that for this example the result of Eq. [31] is reproduced to the four decimal places shown.

Instead of treating each interaction term of $q_1$ with $\{q_2, \ldots, q_5\}$ explicitly, we approximated the total net influence of all charges using the multipole expansion and ended up with L (instead of M) interaction terms. The computational workload is thus of order $O(L)$ instead of $O(M)$ when evaluating the total interaction energy between a single point charge and a set of other point charges. It is clear that if the number of point charges in $\Omega_B$ is large, using the multipole series with only L net field terms may lead to significant savings in CPU time.
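The numbers of this example can be reproduced with a short sketch. The geometry is an assumption inferred from the quadrupole tensor of Eq. [38]: the four unit charges sit at a distance of 1 a.u. from B on the ±x and ±z axes, and $q_1$ lies on the x axis at R = 10 a.u. The function name `multipole_example` is illustrative.

```python
import math


def multipole_example(separation=10.0, radius=1.0):
    """Return (exact, u0, u1, u2, Q_B) for the four-charge example."""
    q1 = 1.0
    a_pos = (separation, 0.0, 0.0)  # position of q1 relative to B (assumed)
    charges = [(1.0, (radius, 0.0, 0.0)), (1.0, (-radius, 0.0, 0.0)),
               (1.0, (0.0, 0.0, radius)), (1.0, (0.0, 0.0, -radius))]

    # Naive pair-wise sum, Eq. [31].
    exact = q1 * sum(q / math.dist(a_pos, pos) for q, pos in charges)

    # Monopole, dipole, and traceless quadrupole of the composite charge.
    q_b = sum(q for q, _ in charges)
    d_b = [sum(q * pos[k] for q, pos in charges) for k in range(3)]
    quad = [[sum(q * (3.0 * pos[i] * pos[j]
                      - (1.0 if i == j else 0.0) * sum(c * c for c in pos))
                 for q, pos in charges)
             for j in range(3)] for i in range(3)]

    unit = [c / separation for c in a_pos]  # unit vector from B to q1
    u0 = q1 * q_b / separation
    u1 = q1 * sum(d * u for d, u in zip(d_b, unit)) / separation ** 2
    u2 = q1 * 0.5 * sum(unit[i] * quad[i][j] * unit[j]
                        for i in range(3) for j in range(3)) / separation ** 3
    return exact, u0, u1, u2, quad


if __name__ == "__main__":
    exact, u0, u1, u2, quad = multipole_example()
    print("exact sum      : %.6f a.u." % exact)
    print("multipole sum  : %.6f a.u." % (u0 + u1 + u2))
    print("quadrupole diag:", [quad[k][k] for k in range(3)])
```

Under this assumed geometry the pair-wise sum and the truncated multipole series agree to the four decimal places quoted in the text; the small remaining difference stems from terms beyond the quadrupole.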


Derivation of the Multipole Expansion

The multipole expansion may be carried out in several coordinate systems, which may be chosen depending on the symmetry properties of the problem under investigation. Spherical polar and Cartesian coordinates are used most commonly when calculating two-electron integrals. We outline here the derivation for the spherical series. The interested reader may find more detailed discussions, for example, in the books of Eyring, Walter and Kimball27 or Morse and Feshbach.32,33 A discussion of the multipole expansion in the framework of atomic and molecular interactions and potentials may be found in the article of Williams in this series of reviews (Ref. 34) or the book by Hirschfelder, Curtiss and Bird.28

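Before deriving the multipole series, let us start with some nomenclature. We want to evaluate the electron repulsion integral

$$(\mu\nu|\lambda\sigma) \equiv (A|B) = \int \frac{\Omega_A(\mathbf{r}_1)\, \Omega_B(\mathbf{r}_2)}{r_{12}}\, d\mathbf{r}_1 d\mathbf{r}_2 \qquad [41]$$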

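Here, A and B are collective indices for the "bra" and "ket" basis functions; i.e., $A = \mu\nu$ and $B = \lambda\sigma$. $\Omega_A(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2)$ are Gaussian distributions, which are products of two Gaussians $\Omega_A(\mathbf{r}_1) = \chi_\mu(\mathbf{r}_1) \cdot \chi_\nu(\mathbf{r}_1)$ and $\Omega_B(\mathbf{r}_2) = \chi_\lambda(\mathbf{r}_2) \cdot \chi_\sigma(\mathbf{r}_2)$. The centers of the Gaussian distributions are $\mathbf{A}$ and $\mathbf{B}$, respectively. For our task it is handy to express the electronic coordinates by their position relative to the centers $\mathbf{A}$ and $\mathbf{B}$ as

$$\mathbf{r}_1 = \mathbf{r}_{1A} + \mathbf{A}, \qquad \mathbf{r}_2 = \mathbf{r}_{2B} + \mathbf{B} \qquad [42]$$

and introduce the vector $\Delta\mathbf{r}_{12}$ as

$$\Delta\mathbf{r}_{12} = \mathbf{r}_{1A} - \mathbf{r}_{2B}, \qquad \Delta r_{12} = |\Delta\mathbf{r}_{12}| \qquad [43]$$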

The separation between the centers is given by

$$\mathbf{R} = \mathbf{B} - \mathbf{A}, \qquad R = |\mathbf{R}| \qquad [44]$$

Our objective is to find a series expansion of the interelectronic distance $r_{12}$, which facilitates the separation into an angular and a radial part and which decouples the coordinates of electrons 1 and 2. We follow here the derivation of Eyring et al.27


With the definitions introduced in the previous paragraph (Eqs. [42]–[44], which give $\mathbf{r}_1 - \mathbf{r}_2 = \Delta\mathbf{r}_{12} - \mathbf{R}$), the interelectronic separation can be expressed as

$$r_{12} = |\mathbf{r}_{1A} - \mathbf{r}_{2B} - \mathbf{R}| = \sqrt{\Delta r_{12}^2 + R^2 - 2\cos\theta \cdot \Delta r_{12} \cdot R} \qquad [45]$$

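Here $\theta$ is the angle subtended between the vectors $\Delta\mathbf{r}_{12}$ and $\mathbf{R}$. We denote the larger radial part of the two vectors with $r_>$ and the smaller with $r_<$ as

$$r_> = \begin{cases} R & R > \Delta r_{12} \\ \Delta r_{12} & R < \Delta r_{12} \end{cases}, \qquad r_< = \begin{cases} \Delta r_{12} & R > \Delta r_{12} \\ R & R < \Delta r_{12} \end{cases} \qquad [46]$$

and introduce the fraction x as

$$x = \frac{r_<}{r_>} \qquad [47]$$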

The interelectronic distance may now be expressed in terms of x and $r_>$ (containing all radial dependence) and the angle $\theta$ (containing all angular dependence):

$$r_{12} = r_> \sqrt{1 + x^2 - 2x\cos\theta} \qquad [48]$$

We are now ready to separate the angular and the radial parts of $r_{12}$. To this end the angular part of the Coulomb term is expanded in Legendre polynomials $P_n(\cos\theta)$, which form a complete and orthogonal set of eigenfunctions on the interval $\cos\theta \in\, ]-1, 1[$, and the radial part is expressed through coefficients $a_n(x)$ that will be determined shortly:

$$r_{12}^{-1} = \frac{1}{r_>} \frac{1}{\sqrt{1 + x^2 - 2x\cos\theta}} = \frac{1}{r_>} \sum_{n=0}^{\infty} a_n(x) P_n(\cos\theta) \qquad [49]$$

The coefficients are determined by exploiting the orthonormality of theLegendre polynomials:

ð1�1

Pnðcos yÞPmðcos yÞ sin ydy ¼ 2

2nþ 1dnm ½50�


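Squaring the left- and right-hand side terms of Eq. [49], multiplying with the surface element of the unit sphere ($\sin\theta\,d\theta$), and integrating, we obtain

$$ \frac{1}{r_>^2}\int_{0}^{\pi} \frac{\sin\theta\,d\theta}{1 + x^2 - 2x\cos\theta} = \frac{1}{r_>^2}\int_{0}^{\pi} \sum_{n=0}^{\infty}\sum_{m=0}^{\infty} a_n(x)\,a_m(x)\,P_n(\cos\theta)\,P_m(\cos\theta)\,\sin\theta\,d\theta = \frac{1}{r_>^2}\sum_{n=0}^{\infty} \frac{2}{2n+1}\,a_n^2(x) \qquad [51] $$

The left-hand side integral may be calculated easily and yields a logarithmic function of $x$. Taylor-expanding this gives

$$ \frac{1}{r_>^2}\int_{0}^{\pi} \frac{\sin\theta\,d\theta}{1 + x^2 - 2x\cos\theta} = \frac{1}{r_>^2}\,\frac{1}{x}\ln\frac{1+x}{1-x} = \frac{1}{r_>^2}\sum_{n=0}^{\infty} \frac{2}{2n+1}\,x^{2n} \qquad [52] $$

Comparing the right-hand side terms of Eqs. [51] and [52], we find that $a_n(x) = x^n$ and our expansion reads as

$$ r_{12}^{-1} = \frac{1}{r_>}\sum_{n=0}^{\infty} x^n\,P_n(\cos\theta) \qquad [53] $$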
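The expansion of Eq. [53] can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function names and the sample geometry are illustrative assumptions, and the Legendre polynomials are generated with the standard three-term (Bonnet) recurrence.

```python
import math


def legendre(n_max, u):
    """Return [P_0(u), ..., P_n_max(u)] from the Bonnet recurrence."""
    p = [1.0, u]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * u * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def inverse_distance_series(dr, big_r, theta, n_max):
    """Truncated Eq. [53]: (1/r_>) * sum_n x^n P_n(cos theta), x = r_< / r_>."""
    r_small, r_large = min(dr, big_r), max(dr, big_r)
    x = r_small / r_large
    p = legendre(n_max, math.cos(theta))
    return sum(x ** n * p[n] for n in range(n_max + 1)) / r_large


# Hypothetical sample geometry: |Delta r_12| = 0.8, R = 3.0, theta = 1.1 rad.
dr, big_r, theta = 0.8, 3.0, 1.1
exact = 1.0 / math.sqrt(dr ** 2 + big_r ** 2 - 2.0 * dr * big_r * math.cos(theta))
for n_max in (2, 5, 10, 30):
    approx = inverse_distance_series(dr, big_r, theta, n_max)
    print(n_max, abs(approx - exact))
```

Because the series is written in terms of $r_<$ and $r_>$, swapping the two lengths gives the same value; the truncation error decreases geometrically with the ratio $x$.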

This expansion facilitates the separation of radial and angular parts of the electron–electron and nucleus–nucleus distance vectors.

But we are not yet done with our derivation, because one important goal has not yet been achieved, namely the separation of the electronic coordinates $\mathbf{r}_1$ and $\mathbf{r}_2$. These are still coupled through the angular parts in the Legendre polynomials and inverse powers of $\Delta r_{12}$ occurring in $x^n$. Without decoupling the electron coordinates, we cannot precalculate terms of the series that depend on electrons 1 and 2 independently, and accordingly, we cannot circumvent a quadratic loop over the electronic coordinates, which would spoil our aim of an $O(M)$ method.

Invoking the addition theorem of the spherical harmonic functions $Y_n^m(\theta, \phi)$ as expressed in

$$ P_n(\cos\theta) = \sum_{m=-n}^{n} \frac{4\pi}{2n+1}\,Y_n^m(\theta_{12}, \phi_{12})\,Y_n^{m*}(\theta_{AB}, \phi_{AB}) \qquad [54] $$

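the angular parts of the electronic and nuclear coordinates can be decoupled and the multipole series becomes

$$ r_{12}^{-1} = \frac{1}{r_>}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} \frac{4\pi}{2n+1}\,\frac{r_<^n}{r_>^n}\,Y_n^m(\theta_{12}, \phi_{12})\,Y_n^{m*}(\theta_{AB}, \phi_{AB}) = \begin{cases} \dfrac{1}{R}\displaystyle\sum_{n=0}^{\infty}\sum_{m=-n}^{n} \dfrac{4\pi}{2n+1}\left(\dfrac{\Delta r_{12}}{R}\right)^{n} Y_n^m(\theta_{12}, \phi_{12})\,Y_n^{m*}(\theta_{AB}, \phi_{AB}) & \text{for } \Delta r_{12} < R \\[2ex] \dfrac{1}{\Delta r_{12}}\displaystyle\sum_{n=0}^{\infty}\sum_{m=-n}^{n} \dfrac{4\pi}{2n+1}\left(\dfrac{R}{\Delta r_{12}}\right)^{n} Y_n^m(\theta_{12}, \phi_{12})\,Y_n^{m*}(\theta_{AB}, \phi_{AB}) & \text{for } \Delta r_{12} > R \end{cases} \qquad [55] $$

Here $\Delta r_{12}, \theta_{12}, \phi_{12}$ are the components of $\Delta\mathbf{r}_{12}$ in spherical coordinates and those of $\mathbf{R} = \mathbf{R}_{AB}$ are denoted accordingly.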

So far this expansion is obviously convergent and holds exactly for both cases $\Delta r_{12} < R$ and $R < \Delta r_{12}$, because $|P_n(\cos\theta)| \le 1$. (For a more detailed discussion the mathematically inclined reader is referred to Refs. 32, 33, and 35.)

Clearly the upper branch of the series complies with our goal: It decouples the radial parts of the interelectronic coordinates, because only positive powers of $\Delta r_{12}$ occur. This, in the end (see next section), facilitates the factorization of the resulting integrals into parts depending only on $\mathbf{r}_{1A}$ and $\mathbf{r}_{2B}$. The integrals are also easy to compute using some standard algorithm for molecular integral evaluation.36–41

The lower branch of the expansion, however, does not allow factorization of the integrals and is difficult to calculate. For that reason, when employing the multipole expansion of the two-electron integral, one usually assumes that $\Delta r_{12} < R$. That is to say, the charge distributions $A$ and $B$ are required to be nonoverlapping. With this presumption, only the upper part of the series remains and the multipole expansion of the Coulomb operator for nonoverlapping distributions is

$$ r_{12}^{-1} = \frac{1}{R}\sum_{n=0}^{\infty}\sum_{m=-n}^{n} \frac{4\pi}{2n+1}\left(\frac{\Delta r_{12}}{R}\right)^{n} Y_n^m(\theta_{12}, \phi_{12})\,Y_n^{m*}(\theta_{AB}, \phi_{AB}) \qquad [56] $$

The presumption $\Delta r_{12} < R$ deserves some comment. The cautious reader will have noticed that when dealing with continuous charge distributions like Gaussian products, there are always regions of integration in which the charge distributions overlap to some extent such that $\Delta r_{12} > R$. Strictly speaking, one would always have to include both branches of the expansion when dealing with charge distributions extending over whole space in order to obtain a convergent and exact series representation of the two-electron integral. In the literature, this is sometimes described by the notion of "asymptotic convergence" of the multipole expansion,13 which means that the series only converges exactly if the overlap tends to zero or the separation $R$ between the "bra" and "ket" charge distributions $\Omega_A$ and $\Omega_B$ goes to infinity. For an in-depth investigation of this mathematically involved topic, the reader is referred to the original literature, cf. Ref. 42 and references therein.

In practice, however, this does not pose serious problems, because one can derive useful estimates of the error introduced by dropping the second term of the series. One often defines the extent of a Gaussian distribution $\Omega_A$ as

$$ R_A = \frac{\mathrm{erfc}^{-1}(10^{-\vartheta})}{\sqrt{\zeta_A}} \qquad [57] $$

where $10^{-\vartheta}$ denotes the desired accuracy. For Gaussian distributions of s angular momentum being separated by

$$ R_{A+B} > R_A + R_B \qquad [58] $$

one can then show that employing the multipole expansion (Eq. [56]) for calculating the $[ss|ss]$ integral leads to an error on the order of $10^{-\vartheta}$ times the size of the integral:

$$ [A|B]^{(\mathrm{exact})} = [A|B]^{(\mathrm{MP})} + \mathrm{error}, \qquad \mathrm{error} \approx 10^{-\vartheta} \cdot [A|B]^{(\mathrm{MP})} \qquad [59] $$

For example, to calculate an $[ss|ss]$ integral with exponents $\zeta_A = \zeta_B = 1$ to an accuracy of $10^{-7}$ using the multipole expansion, centers $\mathbf{A}$ and $\mathbf{B}$ have to be separated by $R_A + R_B = 4.0$ Å. Similar expressions can be derived for higher angular momenta, but the expression for s Gaussians is usually sufficiently accurate. Together with judicious convergence criteria, the multipole expansion for ERIs produces results that are numerically exact for all practical purposes.
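The numerical example above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the exponents are taken to be in atomic units (inverse squared bohr), the conversion factor to Å is supplied by hand, and the inverse complementary error function is obtained by simple bisection on `math.erfc` rather than from a special-function library. The helper names are hypothetical.

```python
import math

BOHR_TO_ANGSTROM = 0.529177  # assumed unit conversion (exponents in atomic units)


def erfc_inv(y, lo=0.0, hi=10.0):
    """Invert math.erfc on [lo, hi] by bisection (erfc decreases monotonically)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def extent(zeta, accuracy_digits):
    """Extent of an s-type Gaussian distribution following Eq. [57]."""
    return erfc_inv(10.0 ** (-accuracy_digits)) / math.sqrt(zeta)


r_a = extent(1.0, 7)
r_b = extent(1.0, 7)
separation_bohr = r_a + r_b
separation_angstrom = separation_bohr * BOHR_TO_ANGSTROM
print(f"R_A = {r_a:.4f} bohr, R_A + R_B = {separation_angstrom:.2f} Angstrom")
```

Under these assumptions the required separation comes out close to the 4.0 Å quoted in the text.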

Spherical Multipole Expansion for Two-Electron Integrals

The spherical multipole expansion as derived can be cast into a different form that achieves higher efficiency for computer implementations and finally decouples the angular parts of the electron coordinates.

Notice that when carrying out the multipole summation in the above formulation each term has to be multiplied with a normalization constant $4\pi/(2n+1)$ and an inverse power of $R$. We can introduce new angular functions replacing the spherical harmonics, which already include the normalization constants, and we can precompute them prior to the evaluation of the two-electron integrals.

To remove the constant fraction in front of each term, the spherical harmonics in Racah's normalization are used:

$$ C_{nm}(\theta, \phi) = \sqrt{\frac{4\pi}{2n+1}}\,Y_n^m(\theta, \phi) \qquad [60] $$

These are employed to define the scaled regular and irregular solid harmonics

$$ R_{nm}(\mathbf{r}) = \frac{1}{\sqrt{(n-m)!\,(n+m)!}}\,r^{n}\,C_{nm}(\theta, \phi), \qquad I_{nm}(\mathbf{r}) = \sqrt{(n-m)!\,(n+m)!}\;r^{-(n+1)}\,C_{nm}(\theta, \phi) \qquad [61] $$

In terms of these functions, the one-center multipole expansion of the Coulomb operator reads as

$$ r_{12}^{-1} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} R_{nm}(\Delta\mathbf{r}_{12})\,I^{*}_{nm}(\mathbf{R}) \qquad [62] $$

Using the addition theorem for regular solid harmonics

$$ R_{nm}(\mathbf{r} + \mathbf{s}) = \sum_{k=0}^{n}\sum_{l=-k}^{k} R_{n-k,m-l}(\mathbf{r})\,R_{kl}(\mathbf{s}) \qquad [63] $$

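we can finally separate the electronic coordinates. Exploiting the behavior of $R_{kl}(\mathbf{s})$ in going from $\mathbf{s}$ to $-\mathbf{s}$ and rewriting the summation indices, we obtain our final multipole expansion of $r_{12}^{-1}$ with electron coordinates 1 and 2 separated:

$$ r_{12}^{-1} = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\sum_{k=0}^{\infty}\sum_{l=-k}^{k} (-1)^{k}\,R_{nm}(\mathbf{r}_{1A})\,I^{*}_{n+k,m+l}(\mathbf{R})\,R_{kl}(\mathbf{r}_{2B}) \qquad [64] $$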

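Up to this point we have concentrated on the multipole expansion of the Coulomb operator. Now we obtain the series for the two-electron integrals. Inserting the expansion into the two-electron integral and absorbing the $(-1)^k$ prefactor in the interaction matrix $\mathbf{T}$, we arrive at an efficient multipole expansion of the two-electron integrals:

$$ [A|B] = \sum_{n=0}^{\infty}\sum_{m=-n}^{n}\sum_{k=0}^{\infty}\sum_{l=-k}^{k} q^{A}_{nm}(\mathbf{A})\,T_{nm,kl}(\mathbf{R})\,q^{B}_{kl}(\mathbf{B}) $$

$$ q^{A}_{nm}(\mathbf{A}) = \int \Omega_A(\mathbf{r})\,R_{nm}(\mathbf{r}_A)\,d\mathbf{r}, \qquad q^{B}_{kl}(\mathbf{B}) = \int \Omega_B(\mathbf{r})\,R_{kl}(\mathbf{r}_B)\,d\mathbf{r} $$

$$ T_{nm,kl}(\mathbf{R}) = (-1)^{k}\,I^{*}_{n+k,m+l}(\mathbf{R}) \qquad [65] $$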

Here the $q$'s are spherical multipole moments (monopole, dipole, quadrupole, etc.) of charge distributions $\Omega_A$ and $\Omega_B$, respectively. Note that we used square brackets to denote an uncontracted two-electron integral. A suitable generalization for contracted integrals is described in the next section.

Collecting all multipole moments of center $\mathbf{A}$ into a vector $\mathbf{q}^A(\mathbf{A})$, those of $\mathbf{B}$ into $\mathbf{q}^B(\mathbf{B})$, and arranging the elements of the interaction tensor in matrix form $\mathbf{T}(\mathbf{R})$, the multipole expansion can also be formulated in matrix notation as

$$ [A|B] = \mathbf{q}^A(\mathbf{A}) \cdot \mathbf{T}(\mathbf{R}) \cdot \mathbf{q}^B(\mathbf{B}) \qquad [66] $$

In the following, we will often drop the arguments, because it is clear on which variables the terms depend.
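The structure of Eqs. [65] and [66] (moments of each distribution computed separately, then contracted with an interaction matrix that depends only on the center separation) can be illustrated with a one-dimensional analog. This is a sketch under assumptions, not the spherical formulation of the text: point charges on a line replace Gaussian distributions, plain power moments $\sum_i q_i a_i^p$ replace the solid-harmonic moments, and the binomial expansion of $1/(R - d)$ replaces the addition theorem. All function names are hypothetical.

```python
import math


def moments(charges, offsets, order):
    """Power moments sum_i q_i * a_i^p about the box center, p = 0..order."""
    return [sum(q * a ** p for q, a in zip(charges, offsets))
            for p in range(order + 1)]


def interaction_matrix(sep, order):
    """1D analog of T: T[p][k] = (-1)^k * C(p+k, k) / sep^(p+k+1)."""
    return [[(-1) ** k * math.comb(p + k, k) / sep ** (p + k + 1)
             for k in range(order + 1)]
            for p in range(order + 1)]


def multipole_energy(q_a, t, q_b):
    """Contraction q_A . T . q_B, the analog of Eq. [66]."""
    return sum(q_a[p] * t[p][k] * q_b[k]
               for p in range(len(q_a)) for k in range(len(q_b)))


# Hypothetical data: box A centered at 0, box B centered at sep > 0.
charges_a, offsets_a = [1.0, -0.5, 0.7], [0.2, -0.3, 0.1]
charges_b, offsets_b = [0.4, 0.9], [-0.25, 0.15]
sep, order = 4.0, 12

approx = multipole_energy(moments(charges_a, offsets_a, order),
                          interaction_matrix(sep, order),
                          moments(charges_b, offsets_b, order))
exact = sum(qa * qb / (sep + b - a)
            for qa, a in zip(charges_a, offsets_a)
            for qb, b in zip(charges_b, offsets_b))
print(approx, exact)
```

As in the three-dimensional case, the moments of A and B never see each other's coordinates; only the matrix depends on the separation, and the sign factor $(-1)^k$ plays the same role as in Eq. [65].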

We notice that because each multipole vector has $O(L^2)$ components [$n = 0, \ldots, L$; for each $n$, there are $2n+1$ $m$-components, thus a total of $(L+1)^2$], the total cost of evaluating a single integral using the multipole expansion has $O(L^4)$ complexity. The spherical multipole integrals may be calculated by means of some well-known recursive algorithms (cf. Refs. 13, 36–41, 43, and 44) in $O(L^2 M)$ work. For the interaction tensor, efficient recursion algorithms also exist (cf. Ref. 13).

The two-electron integrals are for our purposes real entities, so it is clear that using complex terms (solid harmonics) in the multipole expansion is unnecessary and only makes the computer implementation slower and more difficult. A reformulation in terms of real multipole integrals and interaction matrix elements is possible by splitting each term into a real and an imaginary part and dealing with them separately. After some algebra, one can see that the imaginary part drops out and one obtains the multipole expansion in terms of real-valued multipole moments and interaction matrix elements. The real-valued multipole expansion may be cast into exactly the same form as that of the complex series. As both formulations are formally very similar, we do not introduce the real formulation here but instead refer the interested reader to the literature, e.g., Ref. 13.

The Multipole Translation Operator

So far we have only derived the multipole expansion for primitive Gaussian distributions. As pointed out in the introductory example, one of the main strengths of the multipole expansion is that it can be used to treat the interactions of several primitive charge distributions simultaneously by combining them into one single, albeit more complicated, distribution. It will turn out to be useful to translate the centers of multipole expansions to different points in space; e.g., if $\mathbf{q}(\mathbf{A})$ is an expansion about $\mathbf{A}$, we must find a way to convert it to a series about $\mathbf{A} - \mathbf{t}$, where $\mathbf{t}$ is the translation vector.

We first consider a simple case: the transition from primitive to contracted Gaussian distributions. To this end, the multipole expressions for primitive charge distributions $\Omega_{\alpha\beta} \equiv \Omega_a$ and $\Omega_{\gamma\delta} \equiv \Omega_b$ have to be contracted with the contraction coefficients $\kappa^{\mu\nu}_{\alpha\beta}$ and $\kappa^{\lambda\sigma}_{\gamma\delta}$ as

$$ (\mu\nu|\lambda\sigma) = (A|B) = \sum_{\alpha\beta}\sum_{\gamma\delta} \kappa^{\mu\nu}_{\alpha\beta}\,\kappa^{\lambda\sigma}_{\gamma\delta}\;\underbrace{\mathbf{q}^{\alpha\beta}(\mathbf{a}_{\alpha\beta}) \cdot \mathbf{T}^{\alpha\beta\gamma\delta}(\mathbf{R}_{\alpha\beta\gamma\delta}) \cdot \mathbf{q}^{\gamma\delta}(\mathbf{b}_{\gamma\delta})}_{=[\alpha\beta|\gamma\delta]} \qquad [67] $$

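Here, $\mathbf{R}_{\alpha\beta\gamma\delta}$ is the distance vector between the primitive centers $\mathbf{a}_{\alpha\beta}$ and $\mathbf{b}_{\gamma\delta}$. Using contracted multipole integrals instead of primitive integrals, we obtain the following expansion for contracted integrals:

$$ (A|B) = \mathbf{q}^{\mu\nu}(\mathbf{A}) \cdot \mathbf{T}(\mathbf{R}) \cdot \mathbf{q}^{\lambda\sigma}(\mathbf{B}) $$

$$ q^{\mu\nu}_{nm}(\mathbf{A}) = \sum_{\alpha\beta} \kappa^{\mu\nu}_{\alpha\beta} \int \Omega_{\alpha\beta}(\mathbf{r})\,R_{nm}(\mathbf{r}_a)\,d\mathbf{r}, \qquad q^{\lambda\sigma}_{nm}(\mathbf{B}) = \sum_{\gamma\delta} \kappa^{\lambda\sigma}_{\gamma\delta} \int \Omega_{\gamma\delta}(\mathbf{r})\,R_{nm}(\mathbf{r}_b)\,d\mathbf{r} \qquad [68] $$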


Note that the primitive expansion is centered at points $\mathbf{a}_{\alpha\beta}$ and $\mathbf{b}_{\gamma\delta}$, which are, in general, different from the centers $\mathbf{A}$ and $\mathbf{B}$ of the contracted charge distributions. By contracting we have in fact carried out a translation from the primitive to the contracted centers. What we now must find out is how to translate expansions to an arbitrary center.

Recalling the addition theorem for regular solid harmonics (Eq. [63]), we see that a multipole expansion centered at $\mathbf{a}$ can be translated by a vector $\mathbf{t}$ to a new center $\mathbf{A} = \mathbf{a} - \mathbf{t}$ as

$$ q^{A}_{nm}(\mathbf{A}) = \int \Omega_a(\mathbf{r})\,R_{nm}(\mathbf{r}_a - \mathbf{t})\,d\mathbf{r} = \sum_{k=0}^{n}\sum_{l=-k}^{k} \underbrace{R_{n-k,m-l}(-\mathbf{t})}_{=W_{nm,kl}(\mathbf{t})}\;\underbrace{\int \Omega_a(\mathbf{r})\,R_{kl}(\mathbf{r}_a)\,d\mathbf{r}}_{=q^{a}_{kl}(\mathbf{a})} \qquad [69] $$

We can therefore translate an expansion by simply multiplying with the "translation operator" $W_{nm,kl}(\mathbf{t})$ and summing over the angular momenta.

Translating an expansion from a center $\mathbf{a}$ by $\mathbf{t}_a$ to a new center $\mathbf{A}$ may be written in matrix form as

$$ \mathbf{q}^A(\mathbf{A}) = \mathbf{W}(\mathbf{t}_a) \cdot \mathbf{q}^a(\mathbf{a}) \qquad [70] $$

with a similar expression for the translation from $\mathbf{b}$ by $\mathbf{t}_b$ to $\mathbf{B}$. With this the multipole expansion reads as

$$ (A|B) = \sum_{a \in A}\sum_{b \in B} \left[\mathbf{W}(\mathbf{t}_a)\,\mathbf{q}^a(\mathbf{a})\right] \cdot \mathbf{T}(\mathbf{R}) \cdot \left[\mathbf{W}(\mathbf{t}_b)\,\mathbf{q}^b(\mathbf{b})\right] \qquad [71] $$

where the summation runs over all $a$ and $b$ centers that need to be shifted to the new centers $\mathbf{A}$ and $\mathbf{B}$, respectively.

To achieve a specified numerical accuracy for the translation, it is sufficient to truncate the translation series roughly after $O(L^2)$ terms. Translating the expansion center of a charge distribution is therefore an $O(L^4)$ step. In the next section, we will see that the translation operator is necessary to obtain a true linear-scaling method.

The Fast Multipole Method: Breaking the Quadratic Wall

We have seen in the introductory example that the multipole expansion may be used to replace particle interactions with their net effects (see Figure 7). But this alone does not reduce the scaling exponent: If there are $d$ distributions with $p$ particles on average, we will have to evaluate $O(d^2)$ interaction terms. As $d = M/p$, this so-called "Naive Multipole Method" still has a complexity of $O(M^2)$ (but with a reduced scaling prefactor). For very large molecules, we would again run into problems due to quadratic scaling. We now outline the FMM of Greengard and coworkers (cf. Refs. 45–47) as a way to reduce the scaling exponent to linear without sacrificing accuracy. An efficient derivation and implementation of the FMM has been presented by White and Head-Gordon.48,49 It is important to know that FMM was designed for point charges. Molecular integrals, however, involve (Gaussian) charge distributions, which are continuous in space. A generalization of FMM to continuous charge distributions was developed by White, Johnson, Gill, and Head-Gordon31 and is explained in the next section.

First let us consider an example that captures one of the essentials of FMM. Imagine six boxes A, B, and C1 to C4, which, for example, represent fragments of a DNA molecule [Figure 9(a)]. Say that A and B are rather close, whereas C1 to C4 are approximately four times as far from A. Throughout the following discussion, we will assume that the interacting particles are evenly distributed among the boxes (for a discussion of approaches to nonevenly distributed charges, see Ref. 46). The interaction energies $U_{AC_1}, \ldots, U_{AC_4}$ will then be approximately four times smaller than $U_{AB}$, because the interaction energy is proportional to $1/R$.

Figure 9 In the naive multipole method, (a) the molecule is divided into boxes of equal sizes (for simplicity, only a few selected boxes are shown). For calculating interactions between remote boxes, e.g., A and C1–C4, one can use much larger boxes (b) and still achieve high numerical accuracy. This reduces the computational complexity considerably and is an important stepping stone on the way to linear scaling.


The absolute error caused by truncating the multipole expansion is also proportional to $1/R$.13 The error is therefore larger for close pairs of boxes of the same size and smaller for remote pairs of the same size. In our example, we could thus combine all four boxes C1 to C4 into a four times larger box C [Figure 9(b)], and the errors in the interaction energies of the close boxes ($U_{AB}$) and of the remote boxes ($U_{AC}$) would still be on the same order of magnitude. The computational cost, however, is reduced drastically by increasing the size of remote boxes.

This idea of resorting to fine grains when describing close interactions and coarse grains for remote interactions is one of the key concepts of the FMM.45 In the framework of FMM, the graining is realized by a hierarchical boxing scheme, which we outline now for the sake of simplicity in one (rather than three) dimension.

The molecule is divided into a hierarchy of boxes. Each box is divided into two smaller boxes when going from one level to the next: At level 0 there is one box, at level 1 there are two, level 2 contains four boxes, and so on. This is illustrated for four hierarchy levels in Figure 10 (in general there will be more levels). The box that is divided we call the parent (abbreviated as P), and the two boxes it is divided into are its child boxes (abbreviated as C). In Figure 10 at the bottom, for example, the parent of box A is called P(A).

In FMM, one distinguishes between near-field (NF) and far-field (FF) interactions. All interactions that can be treated by multipole expansions belong to the far field, all others are near-field. In our illustration of Figure 10, all charges separated by more than two boxes (WS = 2) are far-field. Here, WS is the so-called well-separatedness (WS) criterion.

The NF interactions are calculated by conventional methods, e.g., summing over all pairs of charges within the near field. For each level 3 box, there are two to four near-field boxes, depending on whether the box is located at the end or in the inner region of the molecule. The number of particles within the near-field of a particular box therefore remains constant. Altogether, there are $O(M)$ lowest level boxes, and therefore, the total computational cost for evaluating the near-field interactions scales linearly.

Figure 10 Illustration of the far-field calculation in FMM. [Schematic of box levels 0 to 3: at level 3, box A with its NF and FF boxes; at level 2, the parent P(A) and its parent far-field box PFF(A).]

The calculation of the far-field interactions in $O(M)$ work is the crucial step of the algorithm. We outline it now for the electrostatic field $\mathbf{V}$ felt by the charges in box A (see Figure 10). Note that $\mathbf{V}$ is the algebraized form of the multipole expanded electrostatic potential $\phi^{(n)}(\mathbf{r})$ we used in the introductory example. For A's parent P(A), there is only one far-field box with which it interacts: the "parent far-field" PFF(A), as indicated by an arrow in the illustration. We denote this field as $\mathbf{V}^{\mathrm{PFF}}_{P(A)}$. Box A feels the field of, in principle, five level 3 far-field boxes. Three of them are marked "FF" in the figure. Their field will be denoted as $\mathbf{V}^{\mathrm{FF}}_{A}$. The interaction with the remaining two level 3 boxes can be described at a coarser grain, that is, at level 2. We call every far-field interaction that can be described at a higher boxing level parent far-field (PFF), and all remaining (same-level) interactions are denoted far-field (FF). The union of PFF and FF we call total far-field.

The interaction of box A with the remaining two far-field boxes is contained in the parent far-field $\mathbf{V}^{\mathrm{PFF}}_{P(A)}$. Here the translation operator $\mathbf{W}$ comes into play: $\mathbf{V}^{\mathrm{PFF}}_{P(A)}$ is the field felt at the center of P(A); to obtain the field experienced at the center of A, we must apply the translation operator. Altogether, the total far field of A is given as

$$ \mathbf{V}^{\mathrm{total\,FF}}_{A} = \mathbf{V}^{\mathrm{FF}}_{A} + \mathbf{W}_{AP(A)} \cdot \mathbf{V}^{\mathrm{PFF}}_{P(A)} \qquad [72] $$

Note that the interaction of A with the three closest boxes is calculated at level 3, whereas the more remote interactions are calculated at level 2. In general, FF interactions are evaluated at the highest possible FMM level, i.e., using the largest possible boxes.
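The box bookkeeping of this one-dimensional illustration can be written down compactly. The sketch below is an assumption-laden reading of the text, not a reference implementation: boxes at a level are numbered from 0, the parent of box `b` is taken to be `b // 2`, A is placed at the end of the level 3 row, and the function names are hypothetical. "Same-level far field" means boxes that are well separated from the given box while their parents are not yet well separated.

```python
WS = 2  # well-separatedness criterion of the one-dimensional illustration


def near_field(box, n_boxes):
    """Boxes (other than `box`) that are at most WS boxes away."""
    return [b for b in range(n_boxes) if b != box and abs(b - box) <= WS]


def far_field(box, n_boxes):
    """Same-level far field: well separated from `box`, parents still near."""
    return [b for b in range(n_boxes)
            if abs(b - box) > WS and abs(b // 2 - box // 2) <= WS]


def inherited_far_field(box, n_boxes):
    """Boxes already covered at a coarser level (parents well separated)."""
    return [b for b in range(n_boxes) if abs(b // 2 - box // 2) > WS]


# Level 3 has 8 boxes; A = box 0 reproduces the counts quoted in the text.
print("NF :", near_field(0, 8))           # two near-field boxes at the end
print("FF :", far_field(0, 8))            # three same-level far-field boxes
print("PFF:", inherited_far_field(0, 8))  # two boxes handled by the parent
print("FF of P(A) at level 2:", far_field(0, 4))

# The same-level far field stays bounded as the system grows.
for n in (8, 64, 1024):
    print(n, max(len(far_field(b, n)) for b in range(n)))
```

With these conventions an end box has two near-field boxes and an inner box has four, and the number of same-level far-field boxes per box does not grow with the number of boxes, which is the property the linear-scaling argument relies on.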

Consider now a molecule that is twice as large as the previous one, as illustrated in Figure 11. The area shaded in gray denotes a subunit of the size of our previous example (for comparison). Considering again the FF of a box A, then in proceeding from the gray subunit to the total system (doubling the size of the system) displayed in Figure 11, there are only three additional interactions with the new part of the system at levels 2 and 3. The number of these additional interactions is constant and does not scale with the total system size in choosing larger systems, because they are done at the highest possible level of boxes. Therefore, because we have a total number of $O(M)$ child boxes, the total effort scales with $O(M)$.

Note that the inheritance of the field vectors from parent to child (and vice versa) is crucial for the overall linear scaling of the algorithm. If this could not be done, forming a $\mathbf{V}$ vector would involve a summation over the higher level boxes, which would lead to an $O(M \log M)$ scaling. Finally we note that "inheritance" is only possible through the multipole translation operator, because the field experienced at a parent's center must be shifted to the child's center.


Now we are ready to give a detailed description of the algorithm. The FMM can be divided into four steps or, in FMM language, "passes." In Pass 1, the multipole expansions of all boxes at the lowest level are calculated. In addition, multipole expansions at higher levels are formed by translating the expansions of the lowest box level to higher ones. These are then converted to the multipole-expanded far fields in Pass 2. Then, in Pass 3, the FF is calculated for all boxes by adding their own FF and their parent's FF (the inheritance trick). Finally, in Pass 4, the interaction energies of all lowest level boxes with their NF and FF are evaluated to yield the total interaction energy. Altogether, FMM calculates the total interaction energy in $O(M)$ work.

Pass 1:

1. Calculate multipole expansions for all particles $i$ and boxes $A$ at the lowest level:

$$ \mathbf{q}_A = \sum_{i \in A} \mathbf{q}_{iA} $$

2. Generate multipole expansions for all boxes at higher levels by translating the children's expansions to the parent's center:

$$ \mathbf{q}_A = \sum_{B \in C(A)} \mathbf{W}_{AB} \cdot \mathbf{q}_B $$

Figure 11 Illustration of the far field calculation in FMM for a molecule that is twice as large as the example of Figure 10.

Pass 2:

3. Calculate far-field vector for each box at every level:

$$ \mathbf{V}^{\mathrm{FF}}_{A} = \sum_{B \in \mathrm{FF}(A)} \mathbf{T}_{AB}\,\mathbf{q}_B $$

Pass 3:

4. Generate FF vector for each box at every level by adding the current box's FF and the parent box's FF:

$$ \mathbf{V}^{\mathrm{total\,FF}}_{A} = \mathbf{V}^{\mathrm{FF}}_{A} + \mathbf{V}^{\mathrm{PFF}}_{A} = \mathbf{V}^{\mathrm{FF}}_{A} + \mathbf{W}_{AP(A)}\,\mathbf{V}^{\mathrm{FF}}_{P(A)} $$

Special cases: Levels 0 and 1 have no FF. For Level = 2 there is no PFF:

$$ \mathbf{V}^{\mathrm{total\,FF}}_{A} = \mathbf{V}^{\mathrm{FF}}_{A} $$

Pass 4:

5. Calculate interaction energy of each lowest level box with its total FF:

$$ U^{\mathrm{total\,FF}}_{A} = \sum_{i \in A} \mathbf{q}_{iA} \cdot \mathbf{V}^{\mathrm{total\,FF}}_{A} $$

6. Calculate total interaction energy of each lowest level box (NF + total FF):

$$ U_A = U^{\mathrm{NF}}_{A} + U^{\mathrm{total\,FF}}_{A} = \sum_{i,j \in \mathrm{NF}(A)} \frac{q_i q_j}{R_{ij}} + U^{\mathrm{total\,FF}}_{A} $$

7. The total Coulomb interaction energy is given by the sum over the interaction energies of all lowest-level boxes (factor of one half accounts for double counting of interactions):

$$ U = \frac{1}{2}\sum_{A} U_A $$
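The four passes can be traced in a toy one-dimensional setting. What follows is a sketch under several assumptions, not the spherical-harmonic FMM of the text: point charges sit on a line segment, power moments about the box centers stand in for the multipole vectors, binomial shifts stand in for the translation operator W, a binomial series of the inverse distance stands in for T, and for levels below 2 the parent's accumulated far field is the quantity inherited. The truncation order `L`, the four levels, and all names are illustrative choices.

```python
import math
import random

L = 16       # truncation order of all expansions (illustrative choice)
WS = 2       # boxes more than WS boxes apart count as far field
LEVELS = 4   # box levels 0..3, as in the four-level illustration


def direct_energy(x, q):
    """Reference O(M^2) pair sum of q_i q_j / |x_i - x_j|."""
    return sum(q[i] * q[j] / abs(x[i] - x[j])
               for i in range(len(x)) for j in range(i))


def fmm_energy(x, q, length=1.0):
    """Energy of charges q at positions x in [0, length) via Passes 1 to 4."""
    low = LEVELS - 1
    nbox = [2 ** lev for lev in range(LEVELS)]

    def center(lev, b):
        return (b + 0.5) * length / nbox[lev]

    # Pass 1: moments of lowest-level boxes, then child -> parent translation.
    members = [[] for _ in range(nbox[low])]
    for i, xi in enumerate(x):
        members[min(int(xi / length * nbox[low]), nbox[low] - 1)].append(i)
    mom = [[[0.0] * (L + 1) for _ in range(n)] for n in nbox]
    for b, idx in enumerate(members):
        for i in idx:
            for p in range(L + 1):
                mom[low][b][p] += q[i] * (x[i] - center(low, b)) ** p
    for lev in range(low - 1, -1, -1):
        for c in range(nbox[lev + 1]):
            t = center(lev + 1, c) - center(lev, c // 2)
            for p in range(L + 1):
                mom[lev][c // 2][p] += sum(
                    math.comb(p, k) * t ** (p - k) * mom[lev + 1][c][k]
                    for k in range(p + 1))

    # Passes 2 and 3: same-level far field plus the field inherited from the parent.
    field = [[[0.0] * (L + 1) for _ in range(n)] for n in nbox]
    for lev in range(2, LEVELS):  # levels 0 and 1 have no far field
        for a in range(nbox[lev]):
            t = center(lev, a) - center(lev - 1, a // 2)
            par = field[lev - 1][a // 2]
            v = [sum(par[p] * math.comb(p, m) * t ** (p - m)
                     for p in range(m, L + 1)) for m in range(L + 1)]
            for b in range(max(0, a - 2 * WS - 1), min(nbox[lev], a + 2 * WS + 2)):
                if abs(a - b) > WS and abs(a // 2 - b // 2) <= WS:
                    d = center(lev, a) - center(lev, b)
                    s = 1.0 if d > 0 else -1.0
                    for p in range(L + 1):
                        v[p] += sum(
                            (-s) ** (p + k) * (-1.0) ** k * math.comb(p + k, k)
                            * mom[lev][b][k] / abs(d) ** (p + k + 1)
                            for k in range(L + 1))
            field[lev][a] = v

    # Pass 4: near field by direct summation, far field from the expansions.
    total = 0.0
    for a, idx in enumerate(members):
        near = [j for b in range(max(0, a - WS), min(nbox[low], a + WS + 1))
                for j in members[b]]
        for i in idx:
            da = x[i] - center(low, a)
            u_ff = q[i] * sum(field[low][a][p] * da ** p for p in range(L + 1))
            u_nf = sum(q[i] * q[j] / abs(x[i] - x[j]) for j in near if j != i)
            total += u_nf + u_ff
    return 0.5 * total


random.seed(0)
xs = [random.random() for _ in range(200)]
qs = [random.uniform(0.5, 1.5) for _ in range(200)]
print(fmm_energy(xs, qs), direct_energy(xs, qs))
```

In this sketch every far-field pair is separated by at least three box widths while the offsets span at most one, so the neglected terms decay roughly like $(1/3)^{L+1}$; the final factor of one half removes the double counting exactly as in step 7.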

Fast Multipole Methods for Continuous Charge Distributions

So far the FMM considers only point charges. In quantum chemistry, however, we must deal with continuous charge distributions as, for example, in the form of Gaussian distributions. For these continuous distributions, one encounters two difficulties: How to define the spatial extent of a continuous charge distribution (they may extend over the whole space in general), and how to treat different extents of charge distributions in an efficient way. We discuss here a prominent generalization of FMM to continuous charge distributions, the CFMM,31 which addresses these issues.

When treating continuous charge distributions with the multipole expansion, we must ensure that the "bra" and "ket" distributions are nonoverlapping to guarantee convergence of the multipole series. Because Gaussian distributions extend over the whole space, they are never nonoverlapping in a strict sense. However, if the contributions of the overlapping regions to the two-electron integrals are numerically negligible, employing the multipole series causes no problems in practice.13

From the analytic expressions for the two-electron integral $(\Omega_A|\Omega_B)$ over s Gaussians, we pointed out that the error caused by employing the multipole expansion for calculating the ERI is on the order of $\varepsilon$, if the "bra–ket" distance $R$ is chosen such that the following equation holds:

$$ R > R_A + R_B, \qquad R_A = \frac{\mathrm{erfc}^{-1}(\varepsilon)}{\sqrt{\zeta_A}}, \qquad R_B = \frac{\mathrm{erfc}^{-1}(\varepsilon)}{\sqrt{\zeta_B}} \qquad [73] $$

Extended criteria can be derived for higher angular momenta, but in practice, it is usually sufficient to use Eq. [73]. With it, we have a criterion at hand that ensures convergence of the multipole series to the exact value of the integral within numerical accuracy of $O(\varepsilon)$ even for Gaussians.

We now outline the CFMM as first formulated by White et al.31 For a discussion of performance issues and a detailed description of implementational considerations, the interested reader is referred to the original literature. A pedagogical introduction to (C)FMM can be found in the book by Helgaker, Jørgensen, and Olsen.13 It is important to notice that although the error estimate of Eq. [73] holds only rigorously for s functions, the maximum box–box interaction error is used in CFMM. The CFMM error estimate is therefore generally considered an upper bound to the true error.31

We must keep track of the spatial extents of Gaussian distributions in order to know when the multipole expansion is applicable and when it is not; in other words, we must know the size of the NF for each Gaussian. To that end, a "well-separatedness criterion" (WS) or, synonymously, an NF width parameter is introduced, which stores the number of boxes by which a pair of equal Gaussian distributions has to be separated to be treated as an FF pair:

$$ \mathrm{WS}_n = 2n, \qquad n = 1, 2, \ldots \qquad [74] $$

With this nomenclature, $\mathrm{WS}_1$ means that the NF is two boxes wide, whereas $\mathrm{WS}_2$ stands for four boxes and so on (in one dimension).

All Gaussian distributions are sorted into boxes (like in FMM) and into branches of WS parameters according to their extents. That is, a Gaussian of extent $R_A$ is sorted into the branch with

$$ \mathrm{WS}_n = \max\left(\mathrm{WS}_1,\; 2\left\lceil \frac{R_A}{L_{\mathrm{box}}} \right\rceil\right) \qquad [75] $$

where $\lceil\;\rceil$ is the ceiling function (smallest integer greater than or equal to argument) and $L_{\mathrm{box}}$ denotes the size of a lowest level box. Accordingly, the tightest distributions, which have only very small extents, are assigned to the $\mathrm{WS}_1$ branch; less tight distributions are assigned to branches $\mathrm{WS}_n$ with larger $n$; and the most diffuse distributions belong to the branch with largest well-separatedness criterion. WS is chosen to be as small as possible while containing the distribution completely; i.e., the far field is chosen as large as possible so as to benefit from the multipole expansion.

The WS criterion for two distributions of (in general) different extents is given by

$$ \mathrm{WS}_{nm} = \frac{\mathrm{WS}_n + \mathrm{WS}_m}{2} \qquad [76] $$

This means the two distributions have to be separated by $\mathrm{WS}_{nm}$ boxes in order to be treated as FF interactions.
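The branch assignment of Eqs. [74] to [76] amounts to a few integer operations. The following lines are a minimal sketch with hypothetical function names and made-up extents; they only restate the formulas and are not taken from any CFMM implementation.

```python
import math

WS_1 = 2  # smallest branch, Eq. [74] with n = 1


def ws_branch(extent, box_length):
    """Eq. [75]: WS_n = max(WS_1, 2 * ceil(R_A / L_box))."""
    return max(WS_1, 2 * math.ceil(extent / box_length))


def ws_pair(ws_n, ws_m):
    """Eq. [76]: mean of two branch parameters (both even, so the mean is an integer)."""
    return (ws_n + ws_m) // 2


# Hypothetical extents in units of the lowest-level box length L_box = 1.
tight = ws_branch(0.3, 1.0)     # tight distribution
diffuse = ws_branch(2.4, 1.0)   # diffuse distribution
print(tight, diffuse, ws_pair(tight, diffuse))
```

A tight and a diffuse distribution thus share a near-field width that lies between their individual branch parameters.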

Apart from the additional assignment of charge distributions into WS branches, the CFMM steps are formally similar to those of FMM. The only important difference between CFMM and FMM is that the width of the NF is chosen according to the spatial extent of the charge distributions and the multipole moments are calculated by integration (instead of summation for point charges) with the former method; everything else stays essentially the same.

Finally, we comment on the computational complexity of CFMM. It is important to notice that the overall computational complexity of CFMM is $O(M)$ for calculating the Coulomb integral matrix, if the Gaussian charge distributions are not extremely diffuse. In the limit of exceedingly diffuse distributions, the NF would extend over the whole molecule, which ultimately results in a late onset of linear scaling (but with reduced prefactor in comparison with conventional methods). In practice, however, this problem is usually not observed for the basis sets commonly used in quantum chemistry for calculations on large molecules: Calculating the Coulomb matrix via CFMM is a linear-scaling step.

Other Approaches

We concentrated here on the linear-scaling calculation of the Coulomb matrix in the frame of (C)FMM, which is used commonly in quantum chemical calculations of large molecules. It should be noted that other tree codes for large molecules exist like, for example, Barnes–Hut (BH) tree methods50 or the quantum chemical tree code (QCTC) of Challacombe, Schwegler, and Almlöf.51,52 These differ in the structure of the tree used for organizing the boxes, or the kinds of expansions used for the integrals. BH methods traditionally use Cartesian multipole expansions,29,53 whereas the QCTC employs the fast Gauss transform.54 Several variations of these methods have been reported; see the references in Ref. 52 for example.

Another variation of FMM is the very fast multipole method (vFMM).55 In the original FMM formulation, the multipole series is truncated after a maximum angular momentum $L$, which is kept constant during the whole calculation. Depending on the shape of the charge distributions and the box size, it may not be necessary to carry out the summation up to $L$ but, rather, to a smaller angular momentum $L_{\mathrm{eff}} < L$ to reach a defined level of accuracy for some boxes. This is essentially the idea behind vFMM, which truncates the multipole series at a certain angular momentum $L_{\mathrm{eff}}$ based on an empirical criterion. Strain, Scuseria, and Frisch developed a variation of CFMM, the Gaussian very fast multipole method (GvFMM),56 which truncates the series after $L_{\mathrm{eff}} < L$ terms in the spirit of vFMM.55

Finally, we note that one of the most costly steps in calculating the Coulomb matrix using CFMM or GvFMM is the explicit evaluation of the near-field integrals. Although this step scales linearly with the size of the molecule, one can further decrease the prefactor by resorting to special methods that speed up the near-field integral calculation. Here we would like to mention just two recently developed methods: The use of auxiliary basis set expansions57–61 in the multipole accelerated resolution of the identity (MARI-J) approach62 and the Fourier transform Coulomb (FTC) method.63–66

EXCHANGE-TYPE CONTRACTIONS

Now that we have described how to reduce the scaling behavior for the construction of the Coulomb part in the Fock matrix (Eq. [9]), the remaining part within HF theory, which is also required in hybrid DFT, is the exchange part. The exchange matrix is formed by contracting the two-electron integrals with the one-particle density matrix $\mathbf{P}$, where the density matrix elements couple the two sides of the integral:

$$ K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\,(\mu\lambda|\nu\sigma) \qquad [77] $$

At first sight, it seems that the formation requires asymptotically $O(M^2)$ four-center two-electron integrals, so that the overall scaling of the computational effort for building the exchange part would be quadratic. However, the coupling of the two charge distributions of the two-electron integral by the one-particle density matrix is of central importance for the scaling behavior as will become clear from the discussion below. Therefore, it is crucial to discuss first the behavior of the one-particle density matrix (Equation [8]).

The canonical MO coefficient matrix (C) and the one-particle density matrix (P) are depicted in Figure 12 as computed at the HF/6-31G* level for a DNA fragment with four base pairs (DNA$_4$). Here, negligible matrix elements below a threshold of $10^{-7}$ are plotted in white, whereas significant elements are shown in black. The figure clearly illustrates that essentially no negligible elements occur in the canonical MO coefficient matrix, because the canonical MOs are typically delocalized over the entire system. This is different for the one-particle density matrix, where a considerable number of negligible elements occurs. To reduce the computational scaling behavior by exploiting the localization of the one-particle density matrix, it is not sufficient to have many zero elements in the matrix; it is necessary that the number of significant elements scales only linearly with system size. This favorable behavior of the density matrix is indeed observed, as shown in Figure 13 again for DNA fragments, in comparison with the $O(M^2)$ behavior of the canonical coefficient matrix. It is important to note that the scaling behavior of the number of significant elements in the one-particle density matrix is closely related to the highest occupied molecular orbital-lowest unoccupied molecular orbital (HOMO–LUMO) gap of molecular systems; see Refs. 67–70. Therefore, the asymptotic linear scaling behavior holds only for systems with a nonvanishing HOMO–LUMO gap, so that for a "truly metallic" system, for instance, a quadratic behavior would result. Nevertheless, for a multitude of important chemical and biochemical systems, the scaling of the one-particle density matrix is asymptotically linear. Therefore, the main goal in many linear-scaling theories is to exploit the scaling behavior of the density matrix and to avoid entirely the use of the nonlocal molecular orbital coefficient matrix.

Figure 12 Significant elements in (a) the canonical MO coefficient matrix (C) and (b) the one-particle density matrix (P) computed at the HF/6-31G* level for four DNA base pairs (DNA$_4$). Significant elements with respect to a threshold of $10^{-7}$ are colored in black.
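The sparsity argument above can be tried out on a toy model. The following sketch is not taken from the chapter: it assumes a half-filled tight-binding chain with alternating hopping strengths (a simple gapped system), and the function names `dimerized_chain_density` and `count_significant` as well as the hopping values and the threshold are hypothetical choices made only for illustration.

```python
import numpy as np

def dimerized_chain_density(n_sites, t_strong=1.0, t_weak=0.2):
    """Toy model (assumption): half-filled chain with alternating hoppings.

    Returns the delocalized MO coefficient matrix C and the density matrix P
    built from the lowest n_sites // 2 orbitals (orthonormal basis, S = 1).
    """
    # Bonds alternate strong, weak, strong, ...; for even n_sites the chain
    # starts and ends with a strong bond, so the spectrum has a clear gap.
    hop = np.where(np.arange(n_sites - 1) % 2 == 0, t_strong, t_weak)
    H = -(np.diag(hop, 1) + np.diag(hop, -1))
    _, C = np.linalg.eigh(H)            # eigenvalues come out in ascending order
    C_occ = C[:, : n_sites // 2]
    return C, C_occ @ C_occ.T

def count_significant(A, thresh=1e-5):
    """Number of matrix elements with magnitude above the threshold."""
    return int(np.count_nonzero(np.abs(A) > thresh))

if __name__ == "__main__":
    for n in (100, 200, 400):
        C, P = dimerized_chain_density(n)
        print(n, count_significant(C), count_significant(P))
```

In this model the count for P grows roughly in proportion to the chain length, whereas the count for C grows much faster, which mirrors the qualitative behavior described for the DNA fragments.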

For the linear-scaling formation of the exchange part of the Hamiltonian, the favorable scaling behavior of the one-particle density matrix P needs to be exploited. If we have a linear-scaling density P, then for each index $\mu$ of a matrix element $P_{\mu\nu}$ there can be only a constant number of elements with an index $\nu$ that are significant with respect to a given threshold. This is simply the definition of a linear-scaling matrix. It means, however, that in the formation of the exchange part of the Fock matrix (Eq. [77]), the asymptotically $O(M^2)$ four-center two-electron integrals are coupled through the one-particle density matrix elements, so that the overall number of required two-electron integrals is reduced to linear for a linear-scaling density matrix:

$$K_{\mu\nu} = \sum_{\lambda\sigma} P_{\lambda\sigma}\, \underbrace{(\underbrace{\mu\lambda}_{M} \,|\, \underbrace{\nu\sigma}_{M})}_{M\ \text{via}\ P_{\lambda\sigma}} \qquad [78]$$

[Figure 13 plot: number of significant elements (0 to 15,000,000) versus number of atoms (0 to 400) for DNA fragments (6-31G*). Curves: C, $O(N^2)$; P, $O(N)$, threshold $10^{-5}$; P, $O(N)$, threshold $10^{-6}$.]

Figure 13 Scaling behavior of significant elements in the one-particle density matrix for DNA fragments computed at the HF/6-31G* level as compared with the scaling behavior of the MO coefficient matrix. Two different thresholds of $10^{-5}$ and $10^{-6}$ are shown as compared with the $M^2$ behavior of the coefficient matrix.


This result also becomes clear on looking at the pseudo-code for the formation of the exchange part:

    Loop over μ → O(M)
      Loop over λ → O(1): coupled to μ via overlap
        Loop over σ → O(1): coupled to λ via P_λσ
          Loop over ν → O(1): coupled to σ via overlap
            K_μν += P_λσ (μλ|νσ)
          endloop
        endloop
      endloop
    endloop

Here, the outside loop runs over all basis functions $\mu$ [scaling as $O(M)$]. The second loop is over the second index of all significant charge distributions $\Omega_{\mu\lambda}$ (see above), so that this second loop scales as $O(1)$ (asymptotically independent of molecular size). The third loop is coupled to the $O(1)$ $\lambda$ loop through the linear-scaling one-particle density matrix and therefore also scales independently of system size [$O(1)$]. Finally, the last loop behaves as $O(1)$ because of the coupling in forming the charge distributions $\Omega_{\nu\sigma}$.
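The loop structure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the LinK or ONX implementation: the integrals are a made-up dense array, significance is decided by a plain magnitude test on the overlap and density matrices instead of integral estimates, and the names `toy_model` and `exchange_screened` are hypothetical.

```python
import numpy as np

def toy_model(n_basis, width=2, seed=0):
    """Hypothetical banded test data: overlap S, density P, and 'integrals'."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_basis)
    dist = np.abs(idx[:, None] - idx[None, :])
    band = dist <= width
    S = np.where(band, 0.5 ** dist, 0.0)
    P = np.where(band, rng.standard_normal((n_basis, n_basis)), 0.0)
    P = 0.5 * (P + P.T)
    # Stand-in for (mu lam | nu sig): nonzero only if both pairs overlap.
    eri = np.einsum("ml,ns->mlns", S, S)
    return eri, P, S

def exchange_screened(eri, P, S, thresh=1e-10):
    """K_mn = sum_ls P_ls (m l | n s), visiting only significant partners."""
    n_basis = P.shape[0]
    overlap_partners = [np.flatnonzero(np.abs(S[m]) > thresh) for m in range(n_basis)]
    density_partners = [np.flatnonzero(np.abs(P[l]) > thresh) for l in range(n_basis)]
    K = np.zeros((n_basis, n_basis))
    n_terms = 0
    for mu in range(n_basis):                      # O(M)
        for lam in overlap_partners[mu]:           # O(1): coupled to mu via overlap
            for sig in density_partners[lam]:      # O(1): coupled to lam via P
                for nu in overlap_partners[sig]:   # O(1): coupled to sig via overlap
                    K[mu, nu] += P[lam, sig] * eri[mu, lam, nu, sig]
                    n_terms += 1
    return K, n_terms

if __name__ == "__main__":
    eri, P, S = toy_model(12)
    K, n_terms = exchange_screened(eri, P, S)
    K_ref = np.einsum("ls,mlns->mn", P, eri)       # unscreened reference
    print(np.allclose(K, K_ref), n_terms)
```

Because each inner loop visits a bounded number of partners, the number of accumulated terms grows in proportion to the number of basis functions in this toy setting.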

These two simple considerations illustrate that the scaling of the exchange part is linked closely to the scaling behavior of the one-particle density matrix. It is important to note, however, that the onset of the scaling behavior for the exchange formation can be significantly earlier than the one for the density matrix using the same thresholds, because of the coupling over the two-electron integrals. We will discuss this in more detail and present timings in the context of the calculation of energy gradients.

The key to a truly linear-scaling exchange is the efficient implementation of the screening for significant contributions to the exchange matrix in a nonquadratic manner. The time effort for a quadratic screening can be reduced significantly (see Ref. 71), but this depends strongly on the molecular system being studied and the type of exchange contraction (e.g., exchange-type contraction of perturbed densities). It is clear that asymptotically an $O(M^2)$ screening procedure would dominate the calculation.

The first attempts to reduce the scaling of the exchange formation required assumptions about the long-range density and exchange behavior (Refs. 72, 73). Because of these assumptions they were not able to readily ensure a prescribed accuracy. This difficulty was overcome by Schwegler et al. in 1997 (Ref. 74) in their ONX (order N exchange) algorithm by employing the traditional density-weighted Schwarz integral estimates of direct SCF methods (Ref. 21) within a novel loop structure. For nonmetallic systems, i.e., systems with a nonvanishing HOMO–LUMO gap, this achieves effective linear scaling by using preordered integral estimates, which allow the calculation to leave a loop early and to avoid unnecessary computational effort. However, unlike conventional direct integral contraction schemes (Refs. 19, 21, 22), the original ONX (compare as well Ref. 75) does not exploit permutational symmetry of the two-electron integrals. It is clear that as long as the exchange formation is dominated by the integral computation, it is favorable to avoid sacrificing permutational symmetry, which in the ideal case amounts to a factor of four. The need to exploit permutational symmetry is of particular importance if the algorithm reverts to quadratic scaling for small molecules or small band gap systems.

To avoid these problems, the LinK method was introduced by Ochsenfeld et al. in 1998 (Ref. 71). It reduces the computational scaling for the exchange part to linear for systems with a nonvanishing HOMO–LUMO gap, while preserving the highly optimized structure of conventional direct SCF methods with only negligible prescreening overhead and without imposing predefined decay properties. The LinK method leads to early advantages as compared with conventional methods for systems with larger band gaps. Because of its negligible screening overhead, it is also competitive with conventional SCF schemes both for small molecules and for systems with small band gaps. For the formation of an exchange-type matrix in, e.g., coupled perturbed SCF theory, the LinK method achieves sublinear scaling, or more precisely, independence of the computational effort from molecular size for local perturbations (Ref. 71). Because implementing the linear-scaling screening is tricky and does not provide much further insight for the current tutorial, we refer the reader to the original literature for details (Refs. 71, 76).

We conclude this section by presenting some illustrative results for comparing Schwarz and MBIE screening using test calculations on DNA molecules with up to 1052 atoms. In Figure 14, the calculation time for building one Hartree–Fock exchange matrix using Schwarz (QQ) and MBIE screening, respectively, is shown; in both schemes, the LinK method is used. We observe that, in both schemes, the calculation time indeed scales linearly with the molecule size, as pointed out in the foregoing discussion. A speed-up of the calculation by a factor of 2.1 is observed by employing MBIE as compared with the QQ screening, whereas the numerical error in the exchange energy is preserved and is on the order of 0.1 mHartree for both screening approaches using a threshold of $10^{-7}$. We have also compared these timings to "exact" screening, that is, the estimated calculation time that would result if the two-electron integrals were known exactly in the screening process. From the fact that the MBIE and "exact" graphs almost coincide, it is evident that MBIE screening is close to optimal for SCF.

[Figure 14 plot: CPU time [h] (0 to 1) versus number of DNA base pairs (0 to 16). Curves: QQ, MBIE-1, exact.]

Figure 14 Illustrative timings comparing Schwarz (QQ) and MBIE screening for calculating the exchange matrix for a series of DNA$_n$ molecules ($n = 1, 2, 4, 8, 16$) with up to 1052 atoms (10674 basis functions). All calculations were performed within the LinK method and with a 6-31G* basis at a threshold of $10^{-7}$ on an Intel Xeon 3.6 GHz machine.

THE EXCHANGE-CORRELATION MATRIX OF KS-DFT

Although the Kohn–Sham-DFT method has been well established in solid-state physics for many years, it was introduced to the computational chemistry community by a reformulation within a finite Gaussian basis set (Refs. 59, 77–80). Nowadays essentially all popular ab initio packages provide a variety of exchange-correlation (XC) functionals that are widely used in computational chemistry and physics.

In this section, we will not present the different types of XC functionals (see Ref. 9 and references therein; Refs. 81 and 82 also treat the recently developed meta-GGA functionals) but discuss only briefly the $O(M)$ formation of the XC potential matrix $\mathbf{V}_{xc}$ in the given basis (Refs. 83, 84). It has to be mentioned that hybrid XC functionals (Ref. 85) also contain a certain amount of exact exchange K, which can be formed in $O(M)$ fashion within the LinK scheme (Refs. 71, 76) as described above.

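The XC energy $E_{xc}$ is in general a functional of the density $\rho$. Within the GGA, $E_{xc}$ is also a functional of the density gradient $\nabla\rho$, and within the meta-GGA, it is a functional of $\rho$, $\nabla\rho$, and additionally the kinetic energy density $\tau$:

$$E_{xc} = \int f_{xc}\left[\rho_\alpha(\mathbf{r}), \rho_\beta(\mathbf{r}), \nabla\rho_\alpha(\mathbf{r}), \nabla\rho_\beta(\mathbf{r}), \tau_\alpha(\mathbf{r}), \tau_\beta(\mathbf{r})\right] d\mathbf{r} \qquad [79]$$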

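The potential $v_{xc}$ arising from exchange-correlation interactions between electrons is defined by the derivative of the XC energy functional $E_{xc}$ with respect to the one-particle density $\rho(\mathbf{r})$ as

$$v_{xc}(\mathbf{r}) = \frac{\partial f_{xc}(\mathbf{r})}{\partial \rho(\mathbf{r})} \qquad [80]$$

and the discrete representation in the given basis results from integration over $\mathbf{r}$ as

$$(\mathbf{V}_{xc})_{\mu\nu} = \langle \chi_\mu | \hat{v}_{xc} | \chi_\nu \rangle = \int \frac{\partial f_{xc}(\mathbf{r})}{\partial \rho(\mathbf{r})}\, \chi_\mu(\mathbf{r}) \chi_\nu(\mathbf{r})\, d\mathbf{r} \qquad [81]$$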

Because it is typically not possible to determine $E_{xc}$ and $\mathbf{V}_{xc}$ by analytic integration, a numerical quadrature has to be used. Therefore, Eq. [79] is rewritten to become

$$E_{xc} = \sum_{A}^{N_A} \sum_{i}^{N^{A}_{\mathrm{grid}}} p_A\, w_i\, f_{xc}(\mathbf{r}_i) \qquad [82]$$

where $N^{A}_{\mathrm{grid}}$ is the number of grid points $\{\mathbf{r}_i\}$ and $w_i$ is the weight of the given grid point of atom A. $p_A$ is the nuclear partition function that enables a split of the molecular grid into single atomic integral contributions. In a first step, the atomic grids are constructed, usually by a combination of radial and angular grid points (Ref. 86). After determining the partition factors $p_A$ by, e.g., the popular method proposed by Becke (Ref. 87), the different atomic grids are merged to yield the molecular grid in $O(M)$ fashion.

For each atomic grid, the integral contribution is calculated with a scaling behavior independent of system size. After determining the constant number of basis functions $\chi_\mu$ required for the actual subgrid, as well as the corresponding basis function pairs $\chi_\mu\chi_\nu$, the representation of the one-particle density within the partial grid is formed by

$$\rho(\mathbf{r}_i) = \sum_{\mu\nu} P_{\mu\nu}\, \chi_\mu(\mathbf{r}_i) \chi_\nu(\mathbf{r}_i) \qquad [83]$$

with analogous equations for $\nabla\rho(\mathbf{r}_i)$ and $\tau(\mathbf{r}_i)$. At this point it is important to note that the localization or delocalization of the electrons, resulting in a sparse or dense discrete density matrix P, does not affect the scaling behavior of the algorithm; i.e., the strict $O(M)$ scaling holds even for metallic systems because of the overlap-type coupling of $\chi_\mu$ and $\chi_\nu$ (see also the discussion for extremely diffuse basis functions in the context of CFMM).

The evaluation of the XC functional and its derivatives at each point of the subgrid is followed by the summation of the energy functional values to yield the XC energy $E_{xc}$. To form the matrix representation of the corresponding XC potential $\mathbf{V}_{xc}$ in the given basis, the different first-order derivatives have to be contracted with the corresponding basis function values as

$$\langle \chi_\mu | \hat{v}_{xc} | \chi_\nu \rangle = \sum_{A}^{N_A} \sum_{i}^{N^{A}_{\mathrm{grid}}} p_A\, w_i\, \frac{\partial f_{xc}(\mathbf{r}_i)}{\partial \rho(\mathbf{r}_i)}\, \chi_\mu(\mathbf{r}_i) \chi_\nu(\mathbf{r}_i) \qquad [84]$$

For determining higher order derivatives of the XC potential, which are needed for response properties, the implementation can be done in a similar fashion, so that an $O(M)$ scaling behavior is ensured as well.
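The quadrature of Eqs. [82] to [84] can be sketched for a toy case. The following is a minimal illustration under several assumptions that are not part of the chapter: a one-dimensional grid with uniform weights (the partition factors $p_A$ are taken as already folded into the weights), Gaussian basis functions, and an LDA-like integrand $f(\rho) = -c\,\rho^{4/3}$ with an arbitrary constant $c$. The function names are hypothetical.

```python
import numpy as np

def basis_on_grid(grid, centers, alpha=1.0):
    """Values chi[i, mu] of 1D Gaussian basis functions at the grid points."""
    return np.exp(-alpha * (grid[:, None] - centers[None, :]) ** 2)

def xc_energy_and_matrix(P, chi, weights, c=0.75):
    """Quadrature for E_xc and V_xc with a toy integrand f = -c * rho**(4/3)."""
    # Density on the grid, Eq. [83]: rho(r_i) = sum_mn P_mn chi_m(r_i) chi_n(r_i)
    rho = np.einsum("im,mn,in->i", chi, P, chi)
    rho = np.maximum(rho, 0.0)          # guard against tiny negative round-off
    f = -c * rho ** (4.0 / 3.0)
    df_drho = -(4.0 / 3.0) * c * rho ** (1.0 / 3.0)
    # Energy, Eq. [82] (partition factors assumed folded into the weights)
    energy = float(np.sum(weights * f))
    # Potential matrix, Eq. [84]
    V = np.einsum("i,im,in->mn", weights * df_drho, chi, chi)
    return energy, V

if __name__ == "__main__":
    grid = np.linspace(-6.0, 8.0, 400)
    weights = np.full(grid.size, grid[1] - grid[0])
    chi = basis_on_grid(grid, np.array([0.0, 1.0, 2.0]))
    coeff = np.array([0.6, 0.5, 0.4])
    P = np.outer(coeff, coeff)          # simple positive test density matrix
    energy, V = xc_energy_and_matrix(P, chi, weights)
    print(energy)
    print(V)
```

A useful consistency check for such a sketch is that a finite-difference derivative of the energy with respect to one density matrix element reproduces the corresponding element of the potential matrix.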


AVOIDING THE DIAGONALIZATION STEP: DENSITY MATRIX-BASED SCF

In the last sections, we have seen that the Fock matrix can be formed in a linear-scaling fashion. This means, however, that the second rate-determining step within the SCF approach becomes more important for large molecules because of its $O(M^3)$ scaling, although it shows a rather small prefactor: The solution of the generalized eigenvalue problem is typically done by a diagonalization of the Fock matrix. We now discuss general approaches for avoiding the diagonalization step entirely and reducing the cubic scaling to linear.

The necessity for diagonalization alternatives is illustrated by the following example: Both the Fock matrix diagonalization and the Fock matrix construction using LinK/CFMM require for a DNA$_8$ molecule (eight stacked DNA base pairs) with 5290 basis functions approximately 22 minutes on a 3.6-GHz Xeon processor (HF/6-31G*, threshold $10^{-7}$, MBIE screening; Ref. 23). This changes for a DNA$_{16}$ system (with 1052 atoms and 10674 basis functions), where the diagonalization is already more costly than calculating the two-electron integral matrices (141 versus 51 minutes). Therefore it is clearly necessary to circumvent the diagonalization for large molecules.

By diagonalizing the Fock matrix, the canonical MO coefficient matrix (C) is obtained (see Eq. [7]). However, we have seen in a previous section that almost all elements in the coefficient matrix are significant, which contrasts with the favorable behavior of the one-particle density matrix (P). The density matrix is conventionally constructed from the coefficient matrix by a matrix product (Eq. [8]). Although the Roothaan–Hall equations are useful for small- to medium-sized molecules, it makes no sense to solve first for a nonlocal quantity (C) and generate from this the local quantity (P) in order to compute the Fock matrix or the energy of a molecule. Therefore, the goal is to solve directly for the one-particle density matrix as a local quantity and avoid entirely the use of the molecular orbital coefficient matrix.

General Remarks

In the following sections, we will provide an overview of density matrix-based SCF theory that allows one to exploit the naturally local behavior of the one-particle density matrix for molecular systems with a nonvanishing HOMO–LUMO gap. Besides the density matrix-based theories sketched below (Refs. 68, 88–94), a range of other methods exists, including divide-and-conquer methods (Refs. 95–98), Fermi operator expansions (FOE; Refs. 99, 100), Fermi operator projection (FOP; Ref. 101), orbital minimization (OM; Refs. 102–105), and optimal basis density-matrix minimization (OBDMM; Refs. 106, 107). Although different in detail, many share as a common feature the idea of (imposed or natural) localization regions in order to achieve an overall $O(M)$ complexity. This notion implies that the density matrix (or the molecule) may be divided into smaller submatrices (submolecules), of which only a linear-scaling number of fragments may interact with each other. For an overview of these methods, the reader is referred to reviews by Goedecker (Refs. 108, 109), Scuseria (Refs. 84, 110), and Bowler et al. (Refs. 111, 112). In the field of ab initio quantum chemistry, it seems that density matrix-based schemes are (so far) favored, whereas other diagonalization alternatives are mainly applied to large tight-binding or semi-empirical calculations. We will therefore focus on density matrix-based approaches, which not only allow one to avoid the diagonalization step, but also provide a way for the efficient calculation of molecular response properties such as NMR chemical shifts for large systems (Ref. 113). In the next section, we begin by describing some basics of tensor formalisms that are useful (but not necessary) for understanding methods employing nonorthogonal basis functions. That section is followed by a brief outline of selected properties of the density matrix. With this we then turn to the formulation of diagonalization alternatives based on solving directly for the one-particle density matrix.

Tensor Formalism

To account correctly for the metric of the space spanned by the basis functions (overlap matrix $S_{\mu\nu}$; see also Figure 15), it is convenient to handle operations in a nonorthogonal basis (like the AO basis) using tensor notation. As we will be concerned with AO formulations of quantum chemical methods in the following, a basic understanding of this topic is useful for comprehending the succeeding sections, although we can only give a very brief introduction here. For a more thorough and yet pedagogical introduction to tensor theory in quantum chemistry along with a detailed list of references to the literature, the interested reader is referred to the review of Head-Gordon et al. (Ref. 114). A general introduction to tensor analysis and its relation to Dirac's notation may be found in the book of Schouten (Ref. 115).

Figure 15 Basis vectors and functions may be (a) orthogonal, (b) nonorthogonal, or even (c) curvilinear like (a) $(x, y)$, (b) $(f, g)$, and (c) $(\chi_1, \chi_2)$ in this illustration. The metric $g_{\mu\nu} \equiv S_{\mu\nu}$ (which can be identified with the overlap matrix in quantum chemistry) describes uniquely the kind of coordinate system (a–c) spanned by the basis and provides a measure for distances, volumes, etc.

Every vector $\mathbf{x}$ in an $n$-dimensional linear vector space may be expressed as a linear combination of basis vectors $\mathbf{e}_i$:

$$\mathbf{x} = \sum_{i=1}^{n} x^i \mathbf{e}_i \equiv x^i \mathbf{e}_i \qquad [85]$$

where $x^i$ are the components of $\mathbf{x}$ in the $\mathbf{e}_i$ representation. On the right-hand side, we used Einstein's sum convention, which we employ for the sake of brevity whenever applicable. Vectors with lower indices will be called "covariant," e.g., the $\mathbf{e}_i$ are covariant basis vectors.

The basis vectors are nonorthogonal in general. That is, the scalar product of every pair gives a number $S$:

$$\mathbf{e}_i \cdot \mathbf{e}_j = S_{ij}, \qquad 0 \le |S_{ij}| \le 1 \qquad [86]$$

where we assume that the basis vectors are normalized such that $S_{ii} = 1$. One could wish, for reasons that will become clear later, to find a second set of basis vectors $\mathbf{e}^j$, such that for every $\mathbf{e}_i$, there is an $\mathbf{e}^j$ with

$$\mathbf{e}_i \cdot \mathbf{e}^j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases} \qquad [87]$$

Equation [87] is similar to the case of normalized orthogonal basis sets (where $\mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij}$), with the difference being that one vector comes from the first basis set and the second from the other basis set. For that reason, Eq. [87] is referred to as the "biorthogonality" or "biorthonormality" condition. Basis vectors meeting the biorthogonality requirement with respect to a covariant basis $\mathbf{e}_i$ will be denoted with an upper index and are called "contravariant." The contravariant basis vectors will also be nonorthogonal in general; i.e., $\mathbf{e}^i \cdot \mathbf{e}^j = S^{ij}$.

Instead of expanding $\mathbf{x}$ in terms of covariant basis vectors, one may use equally well the contravariant basis as

$$\mathbf{x} = \sum_{i=1}^{n} x_i \mathbf{e}^i \equiv x_i \mathbf{e}^i \qquad [88]$$

Here the $x_i$ are the components of $\mathbf{x}$ in contravariant representation.

So far we have restricted ourselves to vectors so as to simplify the discussion. Now we turn to tensors. A tensor $T^{(k)}$ of rank $k$ may be seen as an entity whose components are described by

$$T^{(k)} = \sum_{i_1, i_2, \ldots, i_k} T_{i_1, i_2, \ldots, i_k}\, \mathbf{e}_{i_1} \mathbf{e}_{i_2} \cdots \mathbf{e}_{i_k} \qquad [89]$$


with $k$ indices $i_1, i_2, \ldots, i_k$. Note that here the indices are placed below $T$ and the $\mathbf{e}$'s to denote that each may either be co- or contravariant depending on the chosen representation (Ref. 115). For example, one could choose a set of covariant basis vectors ($T^{i_1, i_2, \ldots, i_k}\, \mathbf{e}_{i_1} \mathbf{e}_{i_2} \cdots \mathbf{e}_{i_k}$), a contravariant set ($T_{i_1, i_2, \ldots, i_k}\, \mathbf{e}^{i_1} \mathbf{e}^{i_2} \cdots \mathbf{e}^{i_k}$), or even a mixed representation. To be called a tensor, such an entity must obey certain rules concerning coordinate transformations, which we will, however, not discuss here and assume to be fulfilled in the following.

Consider the following examples for illustration: A vector $\mathbf{a}$ in $n$-dimensional space is described completely by its $n$ components $a_i$. It may therefore be seen as a one-index quantity or a tensor of rank one. A matrix $\mathbf{A}$ has $n^2$ components $A_{ij}$ (two indices) and is a rank two tensor. A tensor of rank three has $n^3$ components, and its components have three indices, $T_{ijk}$, and so on. As a special case, scalars have only $n^0 = 1$ component and are tensors of rank zero.

The following important rules of tensor analysis should be mentioned: Addition and subtraction are only defined for tensors of the same rank and of the same transformation properties (co-/contravariance). For example, adding a matrix and a vector is not valid. Multiplication (also called tensor contraction) is only defined for pairs of indices, where one index is co- and the other is contravariant. As another example, $x^i x_i$ is a valid tensor contraction, but $x_i x_i$ is not.

Tensor notation may be applied to quantum chemical entities such as basis functions and matrix elements. For example, $|\chi_\mu\rangle$ is a covariant tensor of rank one. As before, superscripts, e.g., $|\chi^\mu\rangle$, denote contravariant tensors. Co- and contravariant basis functions are defined to be biorthogonal; that is, they obey the conditions of

$$\langle \chi^\mu | \chi_\nu \rangle = \delta^{\mu}{}_{\nu} \quad \text{and} \quad \langle \chi_\mu | \chi^\nu \rangle = \delta_{\mu}{}^{\nu} \qquad [90]$$

where the first index refers to the "bra" side and the second to the "ket." For the sake of simplicity, we do not pay attention to the order of indices in the following; e.g., we use $\delta^{\mu}_{\nu}$ instead of $\delta^{\mu}{}_{\nu}$ or $\delta_{\nu}{}^{\mu}$. For an in-depth discussion of this point, see Ref. 116.

Co- and contravariant basis functions are nonorthogonal in general; i.e., the following equation holds:

$$\langle \chi_\mu | \chi_\nu \rangle = S_{\mu\nu} \quad \text{and} \quad \langle \chi^\mu | \chi^\nu \rangle = S^{\mu\nu} \qquad [91]$$

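It can be shown that co- and contravariant tensors may be converted into each other by applying the contravariant and covariant metric tensors $g^{\mu\nu} = (S^{-1})_{\mu\nu} \equiv S^{\mu\nu}$ and $g_{\mu\nu} = S_{\mu\nu}$ as

$$|\chi^\mu\rangle = g^{\mu\nu} |\chi_\nu\rangle \quad \text{and} \quad |\chi_\mu\rangle = g_{\mu\nu} |\chi^\nu\rangle \qquad [92]$$

where $S_{\mu\nu}$ is the well-known overlap matrix and $S^{\mu\nu} = (S^{-1})_{\mu\nu}$ is its inverse.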
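The relations between co- and contravariant bases can be checked numerically for ordinary vectors. The following sketch assumes a small nonorthogonal basis stored as the rows of a matrix `E`; the helper names `dual_basis` and `components` are hypothetical and serve only to illustrate Eqs. [85] to [88] and the index raising of Eq. [92].

```python
import numpy as np

def dual_basis(E):
    """Rows of E are covariant basis vectors e_i; returns S, S^-1, and the e^i."""
    S = E @ E.T                    # covariant metric, S_ij = e_i . e_j
    S_inv = np.linalg.inv(S)       # contravariant metric, S^ij
    E_dual = S_inv @ E             # e^i = S^ij e_j
    return S, S_inv, E_dual

def components(E, x):
    """Contravariant components x^i = e^i . x and covariant ones x_i = e_i . x."""
    _, _, E_dual = dual_basis(E)
    return E_dual @ x, E @ x

if __name__ == "__main__":
    # Two unit vectors enclosing an angle of 60 degrees.
    E = np.array([[1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
    S, S_inv, E_dual = dual_basis(E)
    x = np.array([2.0, 1.0])
    x_contra, x_co = components(E, x)
    print(E @ E_dual.T)            # biorthogonality: e_i . e^j
    print(x_contra @ x_co, x @ x)  # the contraction x^i x_i reproduces |x|^2
```

The contraction of one upper with one lower index gives the squared length of the vector without any explicit metric, which is the practical appeal of the mixed representation discussed in the text.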


For tensors $T^{(2)}$ of rank two, we have the following choices as far as co- and contravariance of the component indices are concerned: (1) $T_{\mu\nu}$, (2) $T^{\mu\nu}$, (3) $T_{\mu}{}^{\nu}$, and (4) $T^{\mu}{}_{\nu}$. Alternative (1) is said to be "fully covariant," (2) is "fully contravariant," and the other two are "mixed" representations. In principle, one is free to formulate physical laws and quantum chemical equations in any of these alternative representations, because the results are independent of the choice of representation. Furthermore, by applying the metric tensors, one may convert between all of these alternatives. It turns out, however, that it is convenient to use representations (3) or (4), which are sometimes called the "natural representation." In this notation, every "ket" is considered to be a covariant tensor, and every "bra" is contravariant, which is advantageous as a result of the condition of biorthogonality; in the natural representation, one obtains equations that are formally identical to those in an orthogonal basis, and operator equations may be translated directly into tensor equations in this natural representation. By contrast, in fully co- or contravariant equations, one has to take the metric into account in many places, leading to formally more difficult equations.

Let us look, for example, at how to translate the idempotency requirement of the density operator (which will be discussed more extensively in the next section),

$$\hat{\rho}^2 = \hat{\rho} \qquad [93]$$

into a tensor equation. Introducing the matrix elements of the density operator in the natural representation, $P^{\mu}{}_{\nu} = \langle \chi^\mu | \hat{\rho} | \chi_\nu \rangle$, one may easily cast this operator equation into tensor form:

$$P^{\mu}{}_{\lambda} P^{\lambda}{}_{\nu} = P^{\mu}{}_{\nu} \qquad [94]$$

This natural tensor equation is formally similar to the operator equation. If we wish to cast this equation into another (nonorthogonal) representation, we can do so by applying the metric tensor as described above. Let us, for example, rewrite Eq. [94] using the fully contravariant form of the density matrix:

$$\underbrace{P^{\mu\alpha} g_{\alpha\lambda} P^{\lambda\beta} g_{\beta\nu}}_{P^{\mu}{}_{\lambda} P^{\lambda}{}_{\nu}} = \underbrace{P^{\mu\beta} g_{\beta\nu}}_{P^{\mu}{}_{\nu}} \qquad [95]$$

Because $g_{\beta\nu}$ occurs on both sides of the equation, we can remove it, and inserting $g_{\alpha\lambda} = S_{\alpha\lambda}$, we arrive at

$$P^{\mu\alpha} S_{\alpha\lambda} P^{\lambda\nu} = P^{\mu\nu}, \quad \text{or in matrix notation:} \quad \mathbf{P}\mathbf{S}\mathbf{P} = \mathbf{P} \qquad [96]$$

This is the same result that is used in the AO-based density matrix formulations of quantum chemistry discussed in later sections.


Atomic basis functions in quantum chemistry transform like covariant tensors. Matrices of molecular integrals are therefore fully covariant tensors; e.g., the matrix elements of the Fock matrix are $F_{\mu\nu} = \langle \chi_\mu | \hat{F} | \chi_\nu \rangle$. In contrast, the density matrix is a fully contravariant tensor, $P^{\mu\nu} = \langle \chi^\mu | \hat{\rho} | \chi^\nu \rangle$. This representation is called the "covariant integral representation" (Refs. 114, 116). The derivation of working equations in AO-based quantum chemistry can therefore be divided into two steps: (1) formulation of the basic equations in natural tensor representation, and (2) conversion to covariant integral representation by applying the metric tensors. The first step yields equations that are similar to the underlying operator or orthonormal-basis equations and are therefore simple to derive. The second step automatically yields tensorially correct equations for nonorthogonal basis functions, whose derivation may become unwieldy without tensor notation because of the frequent occurrence of the overlap matrix and its inverse.

In the following we will tacitly assume some basic knowledge of tensor analysis, especially as far as co- and contravariance is concerned. We will, however, in general not use upper and lower indices to discriminate co- and contravariance, because this is traditionally omitted in quantum chemistry and would greatly complicate the notation. The rare occasions where this tensor notation is needed will be pointed out explicitly.

Properties of the One-Particle Density Matrix

A system with $N$ electrons is fully described by the corresponding wave function $\Psi$ and, following the interpretation of Born (Ref. 117),

$$|\Psi|^2\, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N = \Psi^*(\mathbf{r}_1 \mathbf{r}_2 \ldots \mathbf{r}_N)\, \Psi(\mathbf{r}_1 \mathbf{r}_2 \ldots \mathbf{r}_N)\, d\mathbf{r}_1 d\mathbf{r}_2 \ldots d\mathbf{r}_N \qquad [97]$$

represents the probability for finding electron 1 in $d\mathbf{r}_1$, electron 2 in $d\mathbf{r}_2$, and so on. The probability for an arbitrary electron to be found at $\mathbf{r}_1$ is obtained by integrating over the positions of the remaining electrons and accounting for the indistinguishability of fermions:

$$\rho(\mathbf{r}_1) = N \int \Psi(\mathbf{r}_1 \mathbf{r}_2 \ldots \mathbf{r}_N)\, \Psi^*(\mathbf{r}_1 \mathbf{r}_2 \ldots \mathbf{r}_N)\, d\mathbf{r}_2 \ldots d\mathbf{r}_N \qquad [98]$$

which defines the so-called one-particle density function (see Ref. 118). It is important to note that these functions are quadratic in the wave function and invariant to unitary transformations of the wave function.

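If we consider the Hartree–Fock approach of building a Slater determinant from one-particle functions $\varphi_i$ (compare our section on the basics of SCF theory), we can similarly define the one-particle density as

$$\rho(\mathbf{r}) = \sum_{i \in \mathrm{occ}} \varphi_i(\mathbf{r})\, \varphi_i^*(\mathbf{r}) \qquad [99]$$

and the corresponding density operator $\hat{\rho}$ as

$$\hat{\rho} = \sum_{i \in \mathrm{occ}} |\varphi_i\rangle \langle \varphi_i| \qquad [100]$$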

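The density operator is a projector onto the occupied space, which becomes clearer if one considers, for example, an arbitrary function $f$ that is expanded in the basis $\{\varphi_r\}$ as

$$f = \sum_{r} a_r |\varphi_r\rangle \qquad [101]$$

By projection with the density operator, the orthonormality causes all components other than those corresponding to the occupied space to disappear:

$$\hat{\rho} f = \sum_{i \in \mathrm{occ}} \sum_{r} a_r |\varphi_i\rangle \langle \varphi_i | \varphi_r \rangle = \sum_{i \in \mathrm{occ}} \sum_{r} a_r |\varphi_i\rangle\, \delta_{ir} = \sum_{i \in \mathrm{occ}} a_i |\varphi_i\rangle \qquad [102]$$

Projecting a second time gives the same result, leading to the idempotency property of projection operators ($\hat{\rho}^2 = \hat{\rho}$) in

$$\hat{\rho}^2 = \sum_{i \in \mathrm{occ}} \sum_{j \in \mathrm{occ}} |\varphi_i\rangle \underbrace{\langle \varphi_i | \varphi_j \rangle}_{\delta_{ij}} \langle \varphi_j| = \sum_{i \in \mathrm{occ}} |\varphi_i\rangle \langle \varphi_i| = \hat{\rho} \qquad [103]$$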

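If the one-particle functions $\varphi_i$ are expanded in basis functions $\chi_\mu$ as

$$\varphi_i = \sum_{\mu} C_{\mu i}\, \chi_\mu \qquad [104]$$

the density operator can be written as

$$\hat{\rho} = \sum_{i \in \mathrm{occ}} |\varphi_i\rangle \langle \varphi_i| = \sum_{\mu\nu} \sum_{i \in \mathrm{occ}} C_{\mu i} C^*_{\nu i}\, |\chi_\mu\rangle \langle \chi_\nu| = \sum_{\mu\nu} P_{\mu\nu}\, |\chi_\mu\rangle \langle \chi_\nu| \qquad [105]$$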

with the one-particle density matrix P introduced in Eq. [8]. If we consider again the idempotency property of the density operator,

$$\hat{\rho}^2 = \sum_{\mu\nu} \sum_{\lambda\sigma} P_{\mu\nu}\, |\chi_\mu\rangle \underbrace{\langle \chi_\nu | \chi_\lambda \rangle}_{S_{\nu\lambda}} \langle \chi_\sigma|\, P_{\lambda\sigma} = \sum_{\mu\nu} P_{\mu\nu}\, |\chi_\mu\rangle \langle \chi_\nu| = \hat{\rho} \qquad [106]$$

then it becomes immediately clear that the following holds:

$$\mathbf{P}\mathbf{S}\mathbf{P} = \mathbf{P} \qquad [107]$$

Note that we have already derived this equation with the help of tensor notation in the previous section. The overlap matrix S appears in a nonorthogonal basis and is important for correct contraction with co- and contravariant basis sets. Therefore, either PS or SP is a projector onto the occupied space, depending on the tensor properties of the quantity to which it is applied. The same holds for the complementary projector onto the virtual space, $(\mathbf{1} - \mathbf{P}\mathbf{S})$ or $(\mathbf{1} - \mathbf{S}\mathbf{P})$.
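The properties PSP = P and tr(PS) = N can be verified numerically for a density matrix obtained in the conventional way. The sketch below is an illustration under assumptions, not the chapter's procedure: it solves the generalized eigenvalue problem by symmetric (Löwdin) orthogonalization with NumPy, uses random symmetric test matrices in place of a real Fock matrix, and builds P from the occupied coefficients without a factor of two, so that it is idempotent in the sense of Eq. [107]. The function name `density_matrix` is hypothetical.

```python
import numpy as np

def density_matrix(F, S, n_occ):
    """P = C_occ C_occ^T from the generalized problem F C = S C eps."""
    s, U = np.linalg.eigh(S)
    X = (U / np.sqrt(s)) @ U.T            # S^(-1/2) (symmetric orthogonalization)
    _, C_orth = np.linalg.eigh(X @ F @ X) # ordinary problem in the orthogonal basis
    C = X @ C_orth                        # back-transform: C^T S C = 1
    C_occ = C[:, :n_occ]                  # lowest n_occ orbitals are occupied
    return C_occ @ C_occ.T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 6))
    B = rng.standard_normal((6, 6))
    F = A + A.T                           # arbitrary symmetric test "Fock" matrix
    S = np.eye(6) + 0.1 * (B @ B.T) / 6   # positive definite test overlap
    P = density_matrix(F, S, n_occ=2)
    print(np.allclose(P @ S @ P, P))      # idempotency, Eq. [107]
    print(np.trace(P @ S))                # number of occupied orbitals
```

In the same setting PS and SP are idempotent as well, which is what makes them usable as projectors onto the occupied space.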

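An important technique that we will exploit in the next sections is the decomposition of matrices into occupied–occupied (oo), occupied–virtual (ov/vo), and virtual–virtual (vv) blocks (Ref. 118). This is done by projecting these matrices onto the occupied and/or virtual space using the projectors PS or SP (occupied) and $(\mathbf{1} - \mathbf{P}\mathbf{S})$ or $(\mathbf{1} - \mathbf{S}\mathbf{P})$ (virtual), as shown in the following equation for a covariant matrix A:

$$\begin{aligned} \mathbf{A} = \mathbf{1}\mathbf{A}\mathbf{1} &= \mathbf{S}\mathbf{P}\mathbf{A}\mathbf{P}\mathbf{S} + \mathbf{S}\mathbf{P}\mathbf{A}(\mathbf{1} - \mathbf{P}\mathbf{S}) + (\mathbf{1} - \mathbf{S}\mathbf{P})\mathbf{A}\mathbf{P}\mathbf{S} + (\mathbf{1} - \mathbf{S}\mathbf{P})\mathbf{A}(\mathbf{1} - \mathbf{P}\mathbf{S}) \\ &= \mathbf{A}_{oo} + \mathbf{A}_{ov} + \mathbf{A}_{vo} + \mathbf{A}_{vv} \end{aligned} \qquad [108]$$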
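The block decomposition of Eq. [108] is straightforward to write down with matrix products. The following sketch assumes an idempotent test density matrix constructed as $\mathbf{P} = \mathbf{C}(\mathbf{C}^T\mathbf{S}\mathbf{C})^{-1}\mathbf{C}^T$ from random coefficients, which satisfies PSP = P by construction; this construction and the function name `block_decompose` are illustrative assumptions.

```python
import numpy as np

def block_decompose(A, P, S):
    """Split a covariant matrix A into oo, ov, vo, and vv blocks (Eq. [108])."""
    one = np.eye(A.shape[0])
    SP, PS = S @ P, P @ S
    A_oo = SP @ A @ PS
    A_ov = SP @ A @ (one - PS)
    A_vo = (one - SP) @ A @ PS
    A_vv = (one - SP) @ A @ (one - PS)
    return A_oo, A_ov, A_vo, A_vv

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    B = rng.standard_normal((6, 6))
    S = np.eye(6) + 0.1 * (B @ B.T) / 6               # positive definite test overlap
    C = rng.standard_normal((6, 2))                   # two "occupied" test vectors
    P = C @ np.linalg.inv(C.T @ S @ C) @ C.T          # satisfies P S P = P
    A = rng.standard_normal((6, 6))
    A = A + A.T
    blocks = block_decompose(A, P, S)
    print(np.allclose(sum(blocks), A))                # the four blocks add up to A
```

Projecting the occupied–occupied block once more leaves it unchanged, while projecting the virtual–virtual block onto the occupied space gives zero.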

Density Matrix-Based Energy Functional

As discussed, for a formulation of SCF theories suitable for large molecules, it is necessary to avoid the nonlocal MO coefficient matrix, which is conventionally obtained by diagonalizing the Fock matrix. Instead we employ the one-particle density matrix throughout. To achieve such a density matrix-based reformulation of SCF theory, we can start by looking at SCF theory from a slightly different viewpoint. To solve the SCF problem, we need to minimize the energy functional of

$$E = \mathrm{tr}\left[ \mathbf{P}\mathbf{h} + \frac{1}{2} \mathbf{P}\mathbf{G}(\mathbf{P}) \right] \qquad [109]$$

where $\mathbf{G}(\mathbf{P})$ denotes the two-electron integral matrices contracted with the density matrix. We minimize the energy with respect to changes in the one-particle density matrix,

$$\frac{dE}{d\mathbf{P}} \overset{!}{=} 0 \qquad [110]$$

enforcing two constraints: First, the idempotency condition of the following equation needs to be accounted for:

$$\mathbf{P}\mathbf{S}\mathbf{P} = \mathbf{P} \qquad [111]$$

and, second, the number of electrons $N$ must be correct:

$$\mathrm{tr}(\mathbf{P}\mathbf{S}) = N \qquad [112]$$


These conditions are automatically fulfilled upon diagonalization of the Fock or the Kohn–Sham matrices and the formation of the density matrix (Eq. [8]).

The question now becomes: How do we impose these properties without diagonalization? Li, Nunes, and Vanderbilt (abbreviated as LNV; Ref. 88) first realized in the context of tight-binding (TB) calculations (see also the related work of Daw, Ref. 119) that insertion of a purification transformation introduced by McWeeny in 1959 (Refs. 120, 121) allows one to incorporate the idempotency constraint directly into the energy functional (Eq. [109]). In addition, the constraint of having the correct number of electrons was imposed by fixing the chemical potential $\mu_{\mathrm{chem}}$ as in:

$$E_{\mathrm{LNV}} = \mathrm{tr}\left[ \tilde{\mathbf{P}} \left( \mathbf{H}_{\mathrm{TB}} - \mu_{\mathrm{chem}} \mathbf{1} \right) \right] \qquad [113]$$

where $\mathbf{1}$ is the unit matrix and $\tilde{\mathbf{P}}$ denotes the purified density matrix:

$$\tilde{\mathbf{P}} = 3\mathbf{P}\mathbf{S}\mathbf{P} - 2\mathbf{P}\mathbf{S}\mathbf{P}\mathbf{S}\mathbf{P} \qquad [114]$$

This purification transformation of McWeeny (Refs. 120, 121) allows one to create, out of a nearly idempotent density matrix P, a more idempotent matrix $\tilde{\mathbf{P}}$. The function $\tilde{x}(x) = 3x^2 - 2x^3$ (for S = 1) is shown in Figure 16 and possesses two stationary points, at $\tilde{x}(0) = 0$ and $\tilde{x}(1) = 1$. The purification transformation (Eq. [114]) converges quadratically (Ref. 120) to an idempotent density matrix whose eigenvalues are either 0 or 1, which correspond to virtual or occupied states, respectively. The necessary convergence condition is that the starting eigenvalues of P are in the range $(-0.5, 1.5)$.

Figure 16 "Purification" function $\tilde{x} = 3x^2 - 2x^3$.


Let us illustrate the purification transformation by some numerical examples. Suppose $x$ is close to zero, say, $x = 0.1$. Purification will then bring it closer to zero: $\tilde{x} = 3(0.1)^2 - 2(0.1)^3 = 0.028$. Suppose, on the other hand, that $x = 0.9$, that is, close to one. This time the purification transformation brings it closer to 1: $\tilde{x} = 3(0.9)^2 - 2(0.9)^3 = 0.972$. We have also illustrated the convergence of the purification transformation for several starting values of $x$ and the density matrix in Figure 17.
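The numerical examples above, and the matrix form of Eq. [114], can be reproduced with a short sketch. The scalar function follows the text directly; the matrix test case (a slightly perturbed idempotent matrix in a random nonorthogonal basis) and the function names are assumptions made for illustration.

```python
import numpy as np

def purify_scalar(x):
    """McWeeny purification of a single eigenvalue: 3x^2 - 2x^3."""
    return 3.0 * x**2 - 2.0 * x**3

def purify(P, S):
    """One McWeeny step in a nonorthogonal basis: 3 PSP - 2 PSPSP (Eq. [114])."""
    PS = P @ S
    return 3.0 * PS @ P - 2.0 * PS @ PS @ P

def purify_repeatedly(P, S, n_iter=8):
    """Apply the purification step a fixed number of times."""
    for _ in range(n_iter):
        P = purify(P, S)
    return P

if __name__ == "__main__":
    print(purify_scalar(0.1), purify_scalar(0.9))

    rng = np.random.default_rng(3)
    B = rng.standard_normal((6, 6))
    S = np.eye(6) + 0.1 * (B @ B.T) / 6
    C = rng.standard_normal((6, 2))
    P_exact = C @ np.linalg.inv(C.T @ S @ C) @ C.T    # idempotent: P S P = P
    R = rng.standard_normal((6, 6))
    P_noisy = P_exact + 0.005 * (R + R.T)             # nearly idempotent start
    P_pure = purify_repeatedly(P_noisy, S)
    print(np.linalg.norm(P_pure @ S @ P_pure - P_pure))
```

The idempotency error shrinks rapidly from one step to the next as long as the starting matrix is close enough to idempotency, in line with the quadratic convergence stated in the text.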

In passing, we note that the original LNV approach was designed for orthogonal basis functions. Nunes and Vanderbilt later presented a generalization to nonorthogonal problems (Ref. 89; see as well later work by White et al. in Ref. 122). A modified LNV scheme for SCF theories was introduced by Ochsenfeld and Head-Gordon (Ref. 68). Similarly, Millam and Scuseria (Ref. 90) presented an extension of the LNV algorithm to the HF method.

In the derivation of density matrix-based SCF theory below, we do not employ the chemical potential introduced by LNV (Ref. 88), but instead we follow the derivation of Ochsenfeld and Head-Gordon, because McWeeny's purification automatically preserves the electron number (Ref. 68). Therefore, to avoid the diagonalization within the SCF procedure, we minimize the energy functional

$$\tilde{E} = \mathrm{tr}\left[ \tilde{\mathbf{P}}\mathbf{h} + \frac{1}{2} \tilde{\mathbf{P}}\mathbf{G}(\tilde{\mathbf{P}}) \right] \qquad [115]$$

with respect to density-matrix changes, where ~PP is the inserted purified density.The simplest approach is therefore to optimize the density matrix (e.g., starting

Figure 17 Convergence of purification transformation for different starting values (left).Purification of the density matrix after a typical geometry optimization step within D-QCSCF (see the following section for a definition) calculation (right), the logarithmicvalue of the norm of the residual (logjjPðiÞ � Pði�1Þjj) is plotted.


with a trial density matrix PðiÞ) by searching for an energy minimum along thedirection of the negative energy gradient:

Pðiþ1Þ ¼ PðiÞ � s � qE½PðiÞ�

qPðiÞ½116�

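where $s$ is the step length. The gradient is built by forming the derivative of the energy functional (Eq. [115]):

$$\frac{\partial\tilde{E}}{\partial P} = \frac{\partial\tilde{E}}{\partial\tilde{P}}\,\frac{\partial\tilde{P}}{\partial P} = 3FPS + 3SPF - 2SPFPS - 2FPSPS - 2SPSPF \qquad [117]$$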

At convergence this energy gradient expression reduces to the usual criterion of $FPS - SPF = 0$. It is important to note that the covariant energy gradient (Eq. [117]) cannot be added directly to the contravariant one-particle density matrix, so that a transformation with the metric is required.
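The reduction to the commutator-type criterion can be made explicit in a few lines. The following worked step is a sketch that assumes an idempotent density, $PSP = P$, and uses the projected blocks $F_{vo} = (1 - SP)FPS$ and $F_{ov} = SPF(1 - PS)$ in the form in which they appear later in this chapter (Eqs. [147] and [149]).

```latex
\begin{align*}
\frac{\partial\tilde{E}}{\partial P}
  &= 3FPS + 3SPF - 2SPFPS - 2FPSPS - 2SPSPF \\
  &= FPS + SPF - 2SPFPS \qquad (\text{using } PSP = P) \\
  &= (1 - SP)\,FPS + SPF\,(1 - PS) \\
  &= F_{vo} + F_{ov}
\end{align*}
```

The gradient therefore vanishes exactly when the blocks of $F$ coupling occupied and virtual subspaces vanish, and for an idempotent $P$ this is equivalent to $FPS = SPF$.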

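Therefore, let us look briefly at the tensor properties of the energy gradient. Rewriting the energy gradient (Eq. [117]) in tensor notation,

$$\left(\frac{\partial\tilde{E}}{\partial P}\right)_{\mu\nu} = (\nabla E)_{\mu\nu} = 3F_{\mu\lambda}P^{\lambda\sigma}S_{\sigma\nu} + 3S_{\mu\lambda}P^{\lambda\sigma}F_{\sigma\nu} - 2S_{\mu\lambda}P^{\lambda\sigma}F_{\sigma\alpha}P^{\alpha\beta}S_{\beta\nu} - 2F_{\mu\lambda}P^{\lambda\sigma}S_{\sigma\alpha}P^{\alpha\beta}S_{\beta\nu} - 2S_{\mu\lambda}P^{\lambda\sigma}S_{\sigma\alpha}P^{\alpha\beta}F_{\beta\nu} \qquad [118]$$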

we note immediately that this gradient is a fully covariant tensor. Because the density matrix is fully contravariant in "covariant integral representation," it is tensorially inconsistent to generate a new density matrix by adding the fully covariant gradient to the fully contravariant density matrix $P^{(i)}$. Converting the covariant to contravariant indices by applying the inverse metric, we find the tensorially consistent formulation of the energy gradient$^{116}$ as follows:

$$(\nabla E)^{\mu\nu} = g^{\mu\lambda}(\nabla E)_{\lambda\sigma}\,g^{\sigma\nu} = (S^{-1})^{\mu\lambda}(\nabla E)_{\lambda\sigma}(S^{-1})^{\sigma\nu} \qquad [119]$$

With this the fully contravariant energy gradient results:

$$(\nabla E) = \frac{\partial E}{\partial P} = 3S^{-1}FP + 3PFS^{-1} - 2PFP - 2S^{-1}FPSP - 2PSPFS^{-1} \qquad [120]$$

Because all matrices for its formation can be built in a linear-scaling fashion and because they are sparse for systems with a nonvanishing HOMO–LUMO gap, the energy gradient with respect to the density matrix can be built with linear-scaling effort. Due to symmetries, only a few sparse matrix multiplications are required for the computation of the gradient. In this way, it is possible to avoid the diagonalization in the SCF procedure, thereby reducing the computational scaling asymptotically to linear$^{68,122}$ for large molecules.
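The two gradient representations can be checked numerically on a small model. The sketch below uses dense NumPy arrays and a fixed, randomly generated symmetric matrix in place of a real Fock matrix (so there is no dependence of $F$ on $P$); these model matrices and all function names are assumptions made for illustration and are not part of any linear-scaling implementation.

```python
import numpy as np

def solve_roothaan(F, S, n_occ):
    """Solve FC = SCe by symmetric orthogonalization; return P = C_occ C_occ^T."""
    s, U = np.linalg.eigh(S)
    X = (U / np.sqrt(s)) @ U.T              # S^{-1/2}
    _, Cp = np.linalg.eigh(X @ F @ X)
    C_occ = X @ Cp[:, :n_occ]               # lowest n_occ orbitals
    return C_occ @ C_occ.T

def grad_covariant(F, P, S):
    """Covariant gradient, Eq. [117]."""
    return (3*F@P@S + 3*S@P@F - 2*S@P@F@P@S
            - 2*F@P@S@P@S - 2*S@P@S@P@F)

def grad_contravariant(F, P, S):
    """Contravariant gradient written out term by term, Eq. [120]."""
    Si = np.linalg.inv(S)
    return (3*Si@F@P + 3*P@F@Si - 2*P@F@P
            - 2*Si@F@P@S@P - 2*P@S@P@F@Si)

rng = np.random.default_rng(1)
n, n_occ = 6, 3
A = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)                          # fixed model "Fock" matrix (assumption)
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * B @ B.T / n            # positive definite model overlap

P = solve_roothaan(F, S, n_occ)
g_conv = grad_covariant(F, P, S)             # should vanish at the stationary point

# Away from convergence, Eq. [120] equals S^{-1} (Eq. [117]) S^{-1}, cf. Eq. [119]
E = rng.standard_normal((n, n))
P_trial = P + 0.01 * (E + E.T)
Si = np.linalg.inv(S)
g_lower = grad_covariant(F, P_trial, S)
g_upper = grad_contravariant(F, P_trial, S)
print(np.linalg.norm(g_conv))
print(np.allclose(Si @ g_lower @ Si, g_upper))
```

A dense inverse of $S$ is used here only for brevity; it would defeat the purpose in a sparse, linear-scaling code.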

‘‘Curvy Steps’’ in Energy Minimization

In the simplest density matrix-based method, optimization steps are taken in the direction of the negative gradient $-\nabla E$. The best one can therefore do is to find the minimum energy along a straight line defined by the gradient direction in each step. One can show, however, that it is possible to find the minimum along a curved path at essentially no additional cost,$^{91,92}$ which potentially leads to more efficient minimization steps. With this approach, the idempotency condition is fulfilled through higher orders than in the density matrix-based scheme described above, where just the lowest-order purification transformation of McWeeny$^{121}$ enters the formulation.

It is useful to describe the generation of a new density matrix from the previous matrix by unitary transformation:

$$\tilde{P} = U^{\dagger}PU \qquad [121]$$

Every unitary matrix $U$ can be represented by an exponential function of an anti-Hermitian matrix $\Delta$$^{13}$

$$U = e^{\Delta} = 1 + \Delta + \frac{1}{2!}\Delta^2 + \dots \qquad [122]$$

or in the tensor notation introduced in the previous section as

$$U^{\mu}{}_{\nu} = (e^{\Delta})^{\mu}{}_{\nu} \equiv e^{\Delta^{\mu}{}_{\nu}} \qquad [123]$$

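In this notation, the exponential parametrization of the new density matrix becomes

$$\tilde{P}^{\mu}{}_{\nu} = (U^{\dagger})^{\mu}{}_{\lambda}P^{\lambda}{}_{\sigma}U^{\sigma}{}_{\nu} = e^{-\Delta^{\mu}{}_{\lambda}}P^{\lambda}{}_{\sigma}\,e^{\Delta^{\sigma}{}_{\nu}} \qquad [124]$$

The density matrix (and thus the Hartree–Fock energy) can now be seen as a function of the parameter $\Delta$, and the requirement for an energy minimum becomes

$$(\nabla E)^{\nu}{}_{\mu} = \frac{\partial E}{\partial\Delta^{\mu}{}_{\nu}} = \frac{\partial E}{\partial\tilde{P}^{\lambda}{}_{\sigma}}\,\frac{\partial\tilde{P}^{\lambda}{}_{\sigma}}{\partial\Delta^{\mu}{}_{\nu}} = F^{\sigma}{}_{\lambda}\,\frac{\partial\tilde{P}^{\lambda}{}_{\sigma}}{\partial\Delta^{\mu}{}_{\nu}} = 0 \qquad [125]$$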


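If one is not yet at a minimum, one can for example use the method of steepest descent to arrive at the optimum density matrix. Inserting the explicit dependence of $\tilde{P}^{\mu}{}_{\nu}$ on $\Delta^{\mu}{}_{\nu}$, e.g., in the form of the Taylor expansion of Eq. [124] around $\Delta^{\mu}{}_{\nu} = 0$, one obtains for the direction of steepest descent:

$$\Delta^{\mu}{}_{\nu} = -\left(\frac{\partial E}{\partial\Delta^{\nu}{}_{\mu}}\right)_{\Delta=0} = -\left(F^{\mu}{}_{\lambda}P^{\lambda}{}_{\nu} - P^{\mu}{}_{\lambda}F^{\lambda}{}_{\nu}\right) \qquad [126]$$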

Until now this ansatz is similar to the LNV approach (reformulated in natural tensor notation), starting from an almost idempotent density matrix. (Inserting the purity-transformed density matrix and going to covariant integral representation, the previous equation yields exactly the same result as in the LNV approach.) One could search along this direction in a steepest descent manner to reach the energy minimum. It is instructive to notice that these searches along a straight line may be interpreted as truncating the Taylor series of the exponential transformation after the linear term:

$$\tilde{P} = \left(1 - \Delta + \frac{1}{2!}\Delta^2 + \dots\right)P\left(1 + \Delta + \frac{1}{2!}\Delta^2 + \dots\right) \approx P - \Delta P + P\Delta \qquad [127]$$

Now we introduce "curvy steps." An intuitive interpretation of what is done here is to expand the Taylor series of the exponential transformation to higher orders, such that the step directions are no longer straight lines, but instead they are curved. Invoking the Baker–(Campbell–)Hausdorff lemma (see, e.g., Ref. 123), the unitary transformation of the density matrix can be written as

$$\tilde{P} = P + [P,\Delta] + \frac{1}{2!}[[P,\Delta],\Delta] + \frac{1}{3!}[[[P,\Delta],\Delta],\Delta] + \dots \qquad [128]$$

or

$$\tilde{P}^{\mu}{}_{\nu} = \sum_{j=0}^{\infty}\frac{1}{j!}\left(P^{[j]}\right)^{\mu}{}_{\nu}$$

where the $P^{[j]}$ are short-hand notations for nested commutators, which can be calculated by recursion using

$$\left(P^{[j+1]}\right)^{\mu}{}_{\nu} = [P^{[j]},\Delta]^{\mu}{}_{\nu} \qquad [129]$$
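The nested-commutator recursion of Eq. [129] is easy to test against the exact unitary transformation. The following is a minimal sketch in an orthonormal basis with a random real antisymmetric $\Delta$; the helper names and the way the matrix exponential is evaluated (through the Hermitian matrix $-i\Delta$) are choices made here for illustration only.

```python
import numpy as np

def expm_antihermitian(D):
    """exp(D) for anti-Hermitian D, using the eigendecomposition of -iD."""
    lam, V = np.linalg.eigh(-1j * D)
    return (V * np.exp(1j * lam)) @ V.conj().T

def curvy_density(P, D, order):
    """Truncated series sum_j P^[j]/j! with P^[j+1] = [P^[j], D] (Eqs. [128], [129])."""
    term = P.astype(complex)
    total = term.copy()
    for j in range(1, order + 1):
        term = (term @ D - D @ term) / j    # running value of P^[j] / j!
        total += term
    return total

rng = np.random.default_rng(3)
n, n_occ = 6, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = Q[:, :n_occ] @ Q[:, :n_occ].T           # idempotent starting density
A = rng.standard_normal((n, n))
D = 0.05 * (A - A.T)                        # real antisymmetric step (assumed test data)

U = expm_antihermitian(D)
P_exact = (U.conj().T @ P @ U).real         # Eq. [121]
P_linear = curvy_density(P, D, 1).real      # straight-line step, Eq. [127]
P_curvy = curvy_density(P, D, 12).real      # higher-order "curvy" step

err_linear = np.linalg.norm(P_linear - P_exact)
err_curvy = np.linalg.norm(P_curvy - P_exact)
idem_exact = np.linalg.norm(P_exact @ P_exact - P_exact)
idem_linear = np.linalg.norm(P_linear @ P_linear - P_linear)
print(err_linear, err_curvy, idem_exact, idem_linear)
```

In this model the exactly transformed density stays idempotent to rounding accuracy, whereas the linear truncation violates idempotency at second order in $\Delta$.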

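In a similar way, the Hartree–Fock trial energy, as a function of the transformed density matrix $\tilde{E}[\tilde{P}]$, can be written as a series in the step length $s$, as

$$\tilde{E} = \sum_{j=0}\frac{s^{j}}{j!}\tilde{E}^{[j]}, \quad\text{with}\quad \tilde{E}^{[j]} = \left(P^{[j]}\right)^{\mu}{}_{\nu}F^{\nu}{}_{\mu} \qquad [130]$$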


This equation describes the dependence of the trial energy on the step length along a curved step direction given by the $\tilde{E}^{[j]}$'s and $P^{[j]}$'s.

In the "curvy steps" approach, higher terms of the Taylor expansion may be retained by including higher order commutators in the sum of Eq. [130] (letting $j$ run to high orders), which corresponds to making steps along curved (polynomial) directions. If the series of Eq. [130] were truncated after $j = 1$, one would obtain essentially the same step directions as in the LNV approach (starting from an idempotent density matrix).

A step along a curved direction is superior to one along a straight line and should lead to faster convergence as far as the number of iterations is concerned, because higher order terms of the Taylor expansion are kept in the transformation of the density matrix. If all intermediate matrices are stored in memory, searching along curved directions is not more expensive than for straight-line steps; therefore, Head-Gordon and coworkers$^{91}$ find the curvy steps method to be faster than the LNV approach.

Density Matrix-Based Quadratically Convergent SCF (D-QCSCF)

We have shown, in principle, how to circumvent the diagonalization and introduced two alternatives for choosing the density updates: the methods of steepest descent and "curvy steps." Now we derive another density update, on which the density matrix-based quadratically convergent SCF method (D-QCSCF) of Ochsenfeld and Head-Gordon$^{68}$ is based. This will also be our starting point in deriving linear-scaling methods for energy derivatives needed to determine response properties like vibrational frequencies or NMR chemical shifts, which are described in the next two sections.

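To minimize the energy functional (Eq. [115]) with respect to density changes

$$\left(\frac{d\tilde{E}}{dP}\right) = 0 \qquad [131]$$

we can use, for example, a Newton–Raphson scheme.$^{124}$ The Taylor expansion of the energy functional around $P$ in changes of the density matrix ($P_{\Delta}$) is given as

$$\tilde{E}(P + P_{\Delta}) = \tilde{E}(P) + \left.\frac{d\tilde{E}}{dP}\right|_{P_{\Delta}=0}(P_{\Delta}) + \frac{1}{2}\left.\frac{d^{2}\tilde{E}}{dP^{2}}\right|_{P_{\Delta}=0}(P_{\Delta})^{2} + \dots \qquad [132]$$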

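For small changes $P_{\Delta}$ in the density $P$, the higher-order terms in the expansion can be discarded, so that the energy gradient is retained only up to terms linear in $P_{\Delta}$. We require the energy gradient obtained from Eq. [132] to vanish:

$$\left(\frac{d\tilde{E}(P + P_{\Delta})}{dP}\right) = 0 \qquad [133]$$

Differentiating Eq. [132] and neglecting all terms higher than linear in $P_{\Delta}$, we immediately arrive at the Newton–Raphson equation, which has to be solved iteratively to obtain the density update $P_{\Delta}$:

$$\frac{d^{2}\tilde{E}}{dP^{2}}(P_{\Delta}) = -\frac{d\tilde{E}}{dP} \qquad [134]$$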

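The term on the right-hand side of Eq. [134] is already known from the simple energy gradient (Eq. [117]) and the left-hand side can be calculated as

$$\begin{aligned}
\frac{\partial}{\partial P}\,\mathrm{tr}\left[\frac{\partial\tilde{E}(P)}{\partial P}P_{\Delta}\right] ={}& 3FP_{\Delta}S + 3SP_{\Delta}F - 2FP_{\Delta}SPS - 2FPSP_{\Delta}S \\
& - 2SP_{\Delta}FPS - 2SPFP_{\Delta}S - 2SP_{\Delta}SPF - 2SPSP_{\Delta}F \\
& + 3G(X)PS + 3SPG(X) - 2G(X)PSPS - 2SPSPG(X) - 2SPG(X)PS
\end{aligned} \qquad [135]$$

with

$$X = 3P_{\Delta}SP + 3PSP_{\Delta} - 2P_{\Delta}SPSP - 2PSP_{\Delta}SP - 2PSPSP_{\Delta} \qquad [136]$$

After the density update has been determined using the Newton–Raphson equations shown above, the density matrix may be updated as

$$P^{(i+1)} = P^{(i)} + s\,P^{(i)}_{\Delta} \qquad [137]$$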

The procedure of determining $P_{\Delta}$ and updating the density is iterated until self-consistency is obtained. For molecules with a nonvanishing HOMO–LUMO gap, all matrices involved are sparse, such that solving the SCF eigenvalue problem is altogether an asymptotically linear-scaling step.
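The intermediate $X$ of Eq. [136] can be read as the first-order change of the purified density when $P$ is changed by $P_{\Delta}$. The sketch below checks this by a central finite difference, assuming the purification has the form $3PSP - 2PSPSP$ (Eq. [114]); the random matrices are hypothetical test data and the function names are illustrative.

```python
import numpy as np

def purify(P, S):
    """Assumed form of the McWeeny purification: 3PSP - 2PSPSP."""
    return 3*P@S@P - 2*P@S@P@S@P

def purified_response(P, Pd, S):
    """X of Eq. [136] for a density change Pd."""
    return (3*Pd@S@P + 3*P@S@Pd - 2*Pd@S@P@S@P
            - 2*P@S@Pd@S@P - 2*P@S@P@S@Pd)

rng = np.random.default_rng(4)
n = 5
sym = lambda M: 0.5 * (M + M.T)
P = sym(rng.standard_normal((n, n)))          # arbitrary symmetric trial density
Pd = sym(rng.standard_normal((n, n)))         # arbitrary symmetric update
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * B @ B.T / n             # positive definite model overlap

h = 1e-4
X_fd = (purify(P + h*Pd, S) - purify(P - h*Pd, S)) / (2*h)
X = purified_response(P, Pd, S)
print(np.max(np.abs(X - X_fd)))
```

Because the purification is a cubic polynomial in $P$, the central difference agrees with Eq. [136] up to a small term of order $h^2$.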

Implications for Linear-Scaling Calculation of SCF Energies

This concludes our derivation of tools for the linear-scaling calculation of SCF energies. We have outlined methods that enable the linear-scaling execution of the two expensive steps of SCF calculations: first, efficient integral screening and linear-scaling formation of Fock-type matrices, and second, methods for circumventing the diagonalization step, used conventionally for solving the SCF pseudo-eigenvalue problem. With these methods, it is now possible to calculate HF and DFT energies with an effort scaling asymptotically linearly, so that molecular systems with 1000 and more atoms can be handled with today's computers.

SCF ENERGY GRADIENTS

Up to this stage of our review, we have focused mainly on the computation of the energy of a molecule. However, to obtain suitable instruments for studying molecular systems and to be able to establish useful connections to experimental investigations, we also need to compute molecular properties other than just the energy. The first step toward this is the calculation of energy gradients, e.g., with respect to nuclear coordinates, which allow us to locate stationary points on a potential energy surface. In addition, energy gradients are crucial for performing direct Born–Oppenheimer molecular dynamics.

The energy gradients with respect to a nuclear coordinate, as an example, can be obtained by differentiating the SCF energy expression of Eq. [109]:

$$\frac{\partial E}{\partial x} = \mathrm{tr}(Ph^{x}) + \frac{1}{2}\mathrm{tr}(PG^{x}(P)) + \mathrm{tr}(P^{x}F) \qquad [138]$$

where $G^{x}(P)$ denotes the contraction of derivative two-electron integrals with the one-particle density matrix and $h^{x}$ the derivative of the core-Hamiltonian matrix. Note that the computation of these integral derivatives can be done in a linear-scaling fashion by slight modifications of the previously introduced $O(M)$ algorithms like CFMM and LinK, for example. Although the derivative density matrix $P^{x}$ occurs, Pulay pointed out$^{125–127}$ that it can be avoided by exploiting the solved Roothaan–Hall equations and the derivative of the orthonormality relation. In this way, the perturbed density matrix is replaced by an expression containing the overlap integral derivative $S^{x}$ as

$$\frac{\partial E}{\partial x} = \mathrm{tr}(Ph^{x}) + \frac{1}{2}\mathrm{tr}(PG^{x}(P)) - \mathrm{tr}(WS^{x}) \qquad [139]$$

where $W$ is the "energy-weighted density matrix" expressed as

$$W_{\mu\nu} = \sum_{i}\varepsilon_{i}C^{*}_{\mu i}C_{\nu i} \qquad [140]$$

This formulation requires one to compute the energy-weighted density matrix using the molecular orbital coefficient matrix $C$, which must be avoided to achieve an overall linear-scaling behavior. Therefore, we need to derive an alternative expression for substituting $\mathrm{tr}(P^{x}F)$ (see also Ref. 128).

To obtain equations that are independent of $P^{x}$, it is necessary to consider the different contributions to the derivative density. As for any matrix representation of operators, it is possible to split the contributions into different subspace projections (compare Eq. [108]):

$$P^{x} = P^{x}_{oo} + P^{x}_{ov} + P^{x}_{vo} + P^{x}_{vv} \qquad [141]$$

These projections will be analyzed below. At SCF convergence, the following equations are valid:

$$FPS = SPF \qquad [142]$$

$$P = PSP \qquad [143]$$

In addition, after introducing the perturbation "x," the derivative of the idempotency relation (Eq. [143]) has to be fulfilled:

$$P^{x} = P^{x}SP + PS^{x}P + PSP^{x} \qquad [144]$$

Projecting Eq. [144] onto the occupied space and employing the idempotency relation (Eq. [143]) allows us to identify $P^{x}_{oo}$:

$$P^{x}_{oo} = PSP^{x}SP = PSP^{x}SP + PS^{x}P + PSP^{x}SP = -PS^{x}P \qquad [145]$$

This shows that the occupied–occupied part of $P^{x}$ is directly linked to the derivative of the overlap integrals. In addition, the virtual–virtual block of $P^{x}$ vanishes:

$$P^{x}_{vv} = (1 - PS)P^{x}(1 - SP) = \underbrace{P^{x} - PSP^{x} - P^{x}SP}_{PS^{x}P} + \underbrace{PSP^{x}SP}_{-PS^{x}P} = 0 \qquad [146]$$
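Equations [145] and [146] can be verified numerically by differentiating a model density matrix with respect to a parameter $x$. The sketch below is only an illustration under stated assumptions: $F(x)$ and $S(x)$ are hypothetical matrices that depend linearly on $x$, $F$ does not depend on $P$, and $P^{x}$ is obtained by a central finite difference rather than by solving response equations.

```python
import numpy as np

def density(F, S, n_occ):
    """P = C_occ C_occ^T from FC = SCe (symmetric orthogonalization)."""
    s, U = np.linalg.eigh(S)
    X = (U / np.sqrt(s)) @ U.T
    _, Cp = np.linalg.eigh(X @ F @ X)
    C_occ = X @ Cp[:, :n_occ]
    return C_occ @ C_occ.T

rng = np.random.default_rng(2)
n, n_occ = 6, 3
sym = lambda M: 0.5 * (M + M.T)

# Model matrices with a clear occupied-virtual gap (assumed test data)
F0 = np.diag([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]) + 0.1 * sym(rng.standard_normal((n, n)))
S0 = np.eye(n) + 0.05 * sym(rng.standard_normal((n, n)))
Fx = sym(rng.standard_normal((n, n)))          # dF/dx
Sx = 0.1 * sym(rng.standard_normal((n, n)))    # dS/dx

h = 1e-4
P = density(F0, S0, n_occ)
Px = (density(F0 + h*Fx, S0 + h*Sx, n_occ)
      - density(F0 - h*Fx, S0 - h*Sx, n_occ)) / (2*h)

one = np.eye(n)
Px_oo = P @ S0 @ Px @ S0 @ P                      # occupied-occupied projection
Px_vv = (one - P @ S0) @ Px @ (one - S0 @ P)      # virtual-virtual projection
print(np.max(np.abs(Px_oo + P @ Sx @ P)), np.max(np.abs(Px_vv)))
```

Both printed numbers should be small, limited only by the finite-difference error.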

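With these properties of $P^{x}$ at hand, the remaining part of $\mathrm{tr}(P^{x}F)$ is the occ/virt and the virt/occ blocks:

$$\mathrm{tr}(P^{x}_{ov}F) = \mathrm{tr}(PSP^{x}(1 - SP)F) = \mathrm{tr}(P^{x}(1 - SP)FPS) = \mathrm{tr}(P^{x}F_{vo}) \qquad [147]$$

$$\mathrm{tr}(P^{x}_{vo}F) = \mathrm{tr}(P^{x}F_{ov}) \qquad [148]$$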

In Eqs. [147] and [148], the cyclic permutation possible within a trace has been exploited, which shows that the projection of $P^{x}$ can be transferred to the Fock matrix. At this stage it is important to recall what is done in solving the SCF equations at the HF or DFT level: The Fock matrix is diagonalized, so that blocks coupling virtual and occupied parts vanish:

$$F_{ov} = SPF(1 - PS) = SPF - SP\underbrace{FPS}_{SPF} = SPF - S\underbrace{PSP}_{P}F = 0 \qquad [149]$$

$$F_{vo} = 0 \qquad [150]$$

Therefore, the term involved in the energy gradient simplifies to

$$\mathrm{tr}(P^{x}F) = \mathrm{tr}(P^{x}_{oo}F) = \mathrm{tr}(-PS^{x}PF) = \mathrm{tr}(-PFPS^{x}) \qquad [151]$$

In this way the final density matrix-based energy gradient expression results, which avoids the use of the conventional energy-weighted density matrix (which is conventionally constructed via the delocalized, canonical MO coefficients):

$$\frac{\partial E}{\partial x} = \mathrm{tr}(Ph^{x}) + \frac{1}{2}\mathrm{tr}(PG^{x}(P)) - \mathrm{tr}(PFPS^{x}) \qquad [152]$$

Here, only quantities that are sparse for systems with a nonvanishing HOMO–LUMO gap enter the formulation, whereas the derivative quantities $h^{x}$, $S^{x}$, and the contraction of the one-particle density matrix with the derivative integrals can be computed in a linear-scaling fashion. For the integral contractions, slightly modified CFMM- and LinK-type schemes can be used to reduce the scaling.$^{76,129}$ It is important to note that, for example, in the formation of the exchange energy gradient $E^{x}_{K}$, the derivative two-electron integrals are contracted with two one-particle density matrices:

$$E^{x}_{K} = \sum_{\mu\nu}\sum_{\lambda\sigma}P_{\mu\nu}\,(\mu\lambda|\nu\sigma)^{x}\,P_{\lambda\sigma} \qquad [153]$$

Therefore, the coupling in reducing the scaling behavior is even stronger than for the calculation of the Fock matrix. An example for the scaling behavior of the energy-gradient calculation as compared with the conventional quadratic scaling behavior is displayed in Figure 18 for DNA fragments. It is important to note that the numerical accuracy is the same in both cases.
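That the energy-weighted density matrix of Eq. [140] can be replaced by $PFP$, as in the step from Eq. [139] to Eq. [152], is easily confirmed on a small model. The following sketch uses real orbitals, a fixed random symmetric matrix in place of a converged Fock matrix, and a random symmetric matrix in place of $S^{x}$; all of these are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_occ = 6, 3
A = rng.standard_normal((n, n))
F = 0.5 * (A + A.T)                         # model "Fock" matrix (assumption)
B = rng.standard_normal((n, n))
S = np.eye(n) + 0.1 * B @ B.T / n           # positive definite model overlap

# Solve FC = SCe by symmetric orthogonalization
s, U = np.linalg.eigh(S)
X = (U / np.sqrt(s)) @ U.T
eps, Cp = np.linalg.eigh(X @ F @ X)
C = X @ Cp
C_occ, eps_occ = C[:, :n_occ], eps[:n_occ]

P = C_occ @ C_occ.T
W = (C_occ * eps_occ) @ C_occ.T             # Eq. [140] for real orbitals
PFP = P @ F @ P                             # density matrix-based replacement
F_ov = S @ P @ F @ (np.eye(n) - P @ S)      # Eq. [149], should vanish

Sx = rng.standard_normal((n, n))
Sx = 0.5 * (Sx + Sx.T)                      # stand-in for an overlap derivative
print(np.trace(W @ Sx), np.trace(PFP @ Sx), np.linalg.norm(F_ov))
```

The two traces agree because $C_{occ}^{T}FC_{occ}$ is the diagonal matrix of occupied orbital energies, so no MO coefficients are needed once $P$ and $F$ are available.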

MOLECULAR RESPONSE PROPERTIES AT THE SCF LEVEL

With the energy gradients at hand, one can now locate stationary points on a potential hypersurface or study the dynamics of a system. However, a huge class of important molecular properties is more complicated to calculate, because they are linked to the response of the molecular system with respect to a perturbation. Examples include vibrational frequencies, NMR chemical shifts, and polarizabilities. An excellent review is given in Ref. 130, so that we focus in the following on the key issues that need to be resolved to reduce the scaling behavior for the computation of response properties to being linear at the SCF level.

In the following, we first describe briefly how properties such as vibrational frequencies and NMR chemical shifts are computed. Then we focus on how to calculate the common difficult part, namely the response of the one-particle density matrix with respect to a perturbation, and how to reduce the computational scaling to linear.

Vibrational Frequencies

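Figure 18 Comparison of timings for standard $O(M^{2})$ (STD JKgrad) and linear-scaling energy gradients for DNA fragments (HF/6-31G*) using the LinK$^{76}$ and CFMM$^{129}$ methods.

The second derivatives with respect to nuclear displacements are crucial for characterizing stationary points on a potential hypersurface. They also provide the normal modes of the system and can be linked within the harmonic approximation to the vibrational frequencies of the system, which can be measured experimentally by IR or Raman spectroscopy. By taking the derivative of the SCF energy gradient expression (Eq. [152]) with respect to another nuclear coordinate $y$, we obtain the following expression for the second derivatives:

$$\begin{aligned}
\frac{\partial^{2}E}{\partial x\,\partial y} ={}& \mathrm{tr}\left(P\frac{\partial^{2}h}{\partial x\,\partial y}\right) + \frac{1}{2}\mathrm{tr}\left(P\frac{\partial^{2}II}{\partial x\,\partial y}P\right) - \mathrm{tr}\left(PFP\frac{\partial^{2}S}{\partial x\,\partial y}\right) + \mathrm{tr}\left(P\frac{\partial F}{\partial y}P\frac{\partial S}{\partial x}\right) \\
& + \mathrm{tr}\left(\frac{\partial P}{\partial y}\frac{\partial h}{\partial x}\right) + \mathrm{tr}\left(\frac{\partial P}{\partial y}\frac{\partial II}{\partial x}P\right) - \mathrm{tr}\left(\frac{\partial P}{\partial y}FP\frac{\partial S}{\partial x} + PF\frac{\partial P}{\partial y}\frac{\partial S}{\partial x}\right)
\end{aligned} \qquad [154]$$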

where $II$ abbreviates the antisymmetrized two-electron integrals. In contrast to the simple energy gradient expression, the computation of the perturbed one-particle density matrix cannot be avoided anymore for the second derivatives. To obtain this response of the one-particle density matrix with respect to the perturbation $y$, the coupled-perturbed Hartree–Fock or coupled-perturbed Kohn–Sham equations need to be solved.$^{131–136}$ The standard path for a solution in the MO basis scales as $O(M^{5})$,$^{135,136}$ whereas an AO formulation introduced later$^{128,136}$ reduces the computational effort to $O(M^{4})$. The scaling behavior of the latter scheme is due to a partial solution of the coupled-perturbed self-consistent field (CPSCF) equations in the MO basis. To reduce this scaling behavior, Ochsenfeld and Head-Gordon$^{68}$ reformulated the CPSCF theory in a density matrix-based scheme (D-CPSCF), so that asymptotically a linear-scaling behavior becomes possible. Closely related to this density matrix-based approach, Ochsenfeld, Kussmann, and Koziol recently introduced an even more efficient density-based approach for the solution of the CPSCF equations in the context of the calculation of NMR chemical shieldings.$^{113}$

Therefore, we focus in the following on the calculation of NMR shieldings and formulate the corresponding D-CPSCF theory within this context.

NMR Chemical Shieldings

The routine calculation of NMR chemical shifts$^{137–141}$ using quantum chemical methods has become possible since the introduction of local gauge-origin methods,$^{142–148}$ which provide a solution to the gauge-origin problem within approximated schemes. In our formulation, we use gauge-including atomic orbitals (GIAO):$^{144–146}$

$$\chi_{\mu}(B) = \chi_{\mu}(0)\exp\left\{-\frac{i}{2c}\left[B\times(R_{\mu} - R_{0})\right]\cdot r\right\} \qquad [155]$$

where $B$ is the magnetic field vector and $\chi_{\mu}(0)$ denotes the standard field-independent basis functions. The location of the basis functions and the gauge origin are described by $R_{\mu}$ and $R_{0}$, respectively. The use of the GIAO functions permits us to avoid the gauge-origin problem and has proven to be particularly successful.$^{137}$ In the following, we constrain ourselves to the HF method (GIAO-HF),$^{145,146,148}$ which provides useful results for many molecular systems. For example, in many systems, we found the GIAO-HF method to yield $^{1}$H-NMR chemical shifts with an accuracy of typically 0.2–0.5 ppm.$^{149–152}$ For other nuclei the inclusion of correlation effects can become more important.$^{137,139,153,154}$ The computation at the GIAO-DFT level is closely related.

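NMR chemical shifts are calculated as second derivatives of the energy with respect to the external magnetic field $B$ and the nuclear magnetic spin $m_{N_j}$ of a nucleus $N$:

$$\sigma^{N}_{ij} = \frac{\partial^{2}E}{\partial B_{i}\,\partial m_{N_j}} \qquad [156]$$

where $i$, $j$ are $x$, $y$, $z$ coordinates. This leads to:

$$\sigma^{N}_{ij} = \sum_{\mu\nu}P_{\mu\nu}\frac{\partial^{2}h_{\mu\nu}}{\partial B_{i}\,\partial m_{N_j}} + \sum_{\mu\nu}\frac{\partial P_{\mu\nu}}{\partial B_{i}}\frac{\partial h_{\mu\nu}}{\partial m_{N_j}} \qquad [157]$$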

The equation shows that, similar to the calculation of vibrational frequencies, the response of the one-particle density matrix to a perturbation is necessary; in the case of NMR shieldings the perturbation is the magnetic field $B_{i}$. Therefore, the CPSCF equations need to be solved for the perturbed one-particle density matrices $\partial P_{\mu\nu}/\partial B_{i}$ (short: $P^{B_i}$). In the context of NMR shieldings, the computational effort of conventional schemes$^{146,148}$ scales cubically with molecular size.

To reduce the scaling behavior for the calculation of response properties, we focus in the following on a reformulation of the CPSCF equations in a density matrix-based scheme, so that the scaling of the computational effort can be reduced to linear for systems with a nonvanishing HOMO–LUMO gap.

Density Matrix-Based Coupled Perturbed SCF (D-CPSCF)

The solution of the CPSCF equations is necessary for obtaining the response of the one-particle density matrix with respect to a perturbation. As mentioned, in conventional formulations of CPSCF theory, AO–MO transformations are involved, so that again the delocalized, canonical MO coefficients are required. In this way, it is not possible to reduce the computational effort to linear. Therefore, the key feature of linear-scaling CPSCF theory is to avoid these MO transformations and, instead, to solve directly for the perturbed one-particle density matrices. The quadratically convergent density matrix-based SCF method described above in the context of avoiding the diagonalization within the SCF cycle can be used as the basis for the reformulation of the response equations.$^{68}$ Related alternative approaches were later proposed in Refs. 155 and 156, but we follow in this review our derivation presented in Refs. 68 and 113, which we have found to be useful in obtaining an efficient density matrix-based CPSCF scheme for large molecules.


In the following, we focus on the determination of the density matrix as perturbed with respect to the magnetic field ($P^{B_i}$), whereas an extension to other perturbations is straightforward. Within a linear response formalism (only terms linear in the external perturbation are considered), we can solve for the perturbed density matrix $P^{B_i}$ directly

$$\left[\frac{\partial^{2}\tilde{E}[P]}{\partial P^{2}}\right]P^{B_i} = -\frac{\partial}{\partial B_{i}}\left[\frac{\partial\tilde{E}[P]}{\partial P}\right] \qquad [158]$$

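where $\tilde{E}$ is the functional described in Eq. [115]. Inserting Eq. [135] and the derivative of Eq. [117] with respect to the perturbation $B_{i}$ into Eq. [158], we obtain$^{68}$

$$\begin{aligned}
& 3FP^{B_i}S + 3SP^{B_i}F - 2FP^{B_i}SPS - 4FPSP^{B_i}S - 4SP^{B_i}SPF - 2SPSP^{B_i}F \\
& \quad + G(X)PS + SPG(X) - 2SPG(X)PS \\
& = FPS^{B_i} + S^{B_i}PF + 2FPS^{B_i}PS + 2SPS^{B_i}PF \\
& \quad - F^{(B_i)}PS - SPF^{(B_i)} + 2SPF^{(B_i)}PS
\end{aligned} \qquad [159]$$

with

$$F^{(B_i)} = h^{B_i} + G^{B_i}(P) \qquad [160]$$

$$X = P^{B_i}SP + PSP^{B_i} - 2PSP^{B_i}SP - PS^{B_i}P = P^{B_i} \qquad [161]$$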

At this stage, it is worthwhile to consider some properties of the derivative density matrix, which can (as described in the section on density matrix-based energy gradients) be split into a sum of subspace projections (Eq. [108]):

$$P^{B_i} = P^{B_i}_{oo} + P^{B_i}_{ov} + P^{B_i}_{vo} + P^{B_i}_{vv} \qquad [162]$$

The comparison with the derivative of the idempotency relation

$$P^{B_i} = P^{B_i}SP + PS^{B_i}P + PSP^{B_i} \qquad [163]$$

clarifies the different contributions of $P^{B_i}$:

$$P^{B_i}_{oo} = -S^{B_i}_{oo} \qquad [164]$$

$$P^{B_i}_{ov} = -P^{B_i}_{vo} \qquad [165]$$

$$P^{B_i}_{vv} = 0 \qquad [166]$$


where the sign on the right of Eq. [164] originates from the fact that the first-order matrices with respect to the magnetic perturbation are skew-symmetric. As we can directly calculate $S^{B_i}$, we only have to determine the occupied–virtual part $P^{B_i}_{ov}$. To solve only for $P^{B_i}_{ov}$ and $P^{B_i}_{vo}$, the equation system of Eq. [159] can be projected by $SP$ from the left and $PS$ from the right, respectively, and the two resulting equations are added. In this way, we obtain the following density matrix-based CPSCF equations,$^{113}$ which provide superior convergence properties, in particular if sparse algebra is employed:

$$\begin{aligned}
& FP^{B_i}SPS + SPSP^{B_i}F - FPSP^{B_i}S - SP^{B_i}SPF + G(P^{B_i})PS + SPG(P^{B_i}) - 2SPG(P^{B_i})PS \\
& = FPS^{B_i} + S^{B_i}PF - FPS^{B_i}PS - SPS^{B_i}PF - F^{(B_i)}PS - SPF^{(B_i)} + 2SPF^{(B_i)}PS
\end{aligned} \qquad [167]$$

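with

$$F^{(B_i)}_{\mu\nu} = \frac{\partial h_{\mu\nu}}{\partial B_{i}} + \sum_{\lambda\sigma}P_{\lambda\sigma}\frac{\partial\left[(\mu\nu|\lambda\sigma) - \frac{1}{2}(\mu\lambda|\nu\sigma)\right]}{\partial B_{i}} \qquad [168]$$

$$G_{\mu\nu}(P^{B_i}_{\lambda\sigma}) = -\frac{1}{2}\sum_{\lambda\sigma}\frac{\partial P_{\lambda\sigma}}{\partial B_{i}}(\mu\lambda|\nu\sigma) \qquad [169]$$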

The convergence properties of the density matrix-based equations, i.e., the number of iterations to converge $P^{B_i}$, are similar to the ones encountered for a solution in the MO space, so that the advantage of using sparse multiplications within the density-based approach allows us to reduce the scaling property of the computational effort in an efficient manner. In this way, NMR chemical shift calculations with linear-scaling effort become possible and systems with 1000 and more atoms can be treated at the HF or DFT level on today's computers.$^{113}$ Extensions to other molecular properties can be formulated in a similar fashion.

OUTLOOK ON ELECTRON CORRELATION METHODS FOR LARGE SYSTEMS

Although the main focus of the current review is to provide insights into reducing the scaling behavior of HF and DFT methods, it seems appropriate to provide a brief outlook on the behavior of post-HF methods. The importance of these methods cannot be overemphasized, because it is the systematic hierarchy of approaches to the exact solution of the electronic Schrödinger equation that allows for systematic and reliable studies of molecular systems. A concise overview of the huge amount of interesting and successful work done in the field of reducing the scaling behavior of post-HF methods is beyond the scope of this chapter; therefore, we just provide some insights into why a reduction of the scaling behavior with rigorous error bounds should also be possible here. For a firsthand account of the impressive progress made in the field, the reader is referred to the work of Pulay and Saebø,$^{157–160}$ Werner and coworkers,$^{161–163}$ Head-Gordon and coworkers,$^{164,165}$ Almlöf and Häser,$^{166–168}$ Ayala and Scuseria,$^{169,170}$ Friesner and coworkers,$^{171}$ Carter and coworkers,$^{172}$ Schütz and Werner,$^{173}$ and Schütz.$^{174}$ Recent reviews may be found in Refs. 175 and 114.

To explain some principles, we focus here on the simplest of these approaches, the Møller–Plesset perturbation theory to second order (MP2). In the conventional, canonical MO-based formulation, the closed-shell MP2 correlation energy is given by

$$E_{\mathrm{MP2}} = -\sum_{ijab}\frac{(ia|jb)\left[2(ia|jb) - (ib|ja)\right]}{\varepsilon_{a} + \varepsilon_{b} - \varepsilon_{i} - \varepsilon_{j}} \qquad [170]$$

with the MO integrals

$$(ia|jb) = \int\varphi^{*}_{i}(r_{1})\varphi_{a}(r_{1})\frac{1}{r_{12}}\varphi^{*}_{j}(r_{2})\varphi_{b}(r_{2})\,dr_{1}\,dr_{2} \qquad [171]$$

Indices $i$, $j$ denote occupied orbitals, whereas $a$, $b$ are virtuals. The difficulty is that the integrals computed in the AO basis need to be transformed into the MO basis:

$$(ia|jb) = \sum_{\mu\nu\lambda\sigma}C_{\mu i}C_{\nu a}C_{\lambda j}C_{\sigma b}\,(\mu\nu|\lambda\sigma) \qquad [172]$$

If the transformations are done in a successive instead of a simultaneous way, the computational effort reduces from formally $O(M^{8})$ to $O(M^{5})$. However, due to the nonlocality of the canonical MOs (discussed above in the context of SCF methods), this $O(M^{5})$ effort also holds in the asymptotic limit, so that no reduction can be expected. The only mitigating factor is that the four transformations scale differently depending on whether occupied or virtual indices are transformed.
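The difference between the simultaneous and the successive transformation of Eq. [172] can be sketched with `numpy.einsum`. The model below is an illustration only: the "integrals" are built from a random factorization, the orbitals are random orthonormal vectors, and the orbital energies are made-up numbers; none of these come from an actual SCF calculation.

```python
import numpy as np

rng = np.random.default_rng(4)
M, n_occ = 6, 2

# Hypothetical model data
B = rng.standard_normal((8, M, M))
B = 0.5 * (B + B.transpose(0, 2, 1))
ao = np.einsum('Pmn,Pls->mnls', B, B)                 # (mu nu|lambda sigma)
C, _ = np.linalg.qr(rng.standard_normal((M, M)))
C_occ, C_virt = C[:, :n_occ], C[:, n_occ:]
e_occ = np.array([-1.0, -0.8])
e_virt = np.array([0.5, 0.7, 0.9, 1.1])

# Simultaneous transformation: one sum over all four AO indices, formally O(M^8)
mo_sim = np.einsum('mi,na,lj,sb,mnls->iajb', C_occ, C_virt, C_occ, C_virt, ao)

# Successive quarter transformations: each step is formally O(M^5)
t = np.einsum('mi,mnls->inls', C_occ, ao)
t = np.einsum('na,inls->ials', C_virt, t)
t = np.einsum('lj,ials->iajs', C_occ, t)
mo_step = np.einsum('sb,iajs->iajb', C_virt, t)

# Closed-shell MP2 energy, Eq. [170]
D = (e_virt[None, :, None, None] + e_virt[None, None, None, :]
     - e_occ[:, None, None, None] - e_occ[None, None, :, None])
e_mp2 = -np.sum(mo_step * (2.0 * mo_step - mo_step.transpose(0, 3, 2, 1)) / D)
print(np.allclose(mo_sim, mo_step), e_mp2)
```

The first quarter transformation over an occupied index already shrinks the intermediate from $M^4$ to $n_{occ}M^3$ elements, which is why the order of the steps matters in practice.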

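To avoid the canonical, delocalized orbitals, Almlöf suggested in 1991$^{166}$ using a Laplace transform for eliminating the disturbing denominator $x_{q} \equiv \varepsilon_{a} + \varepsilon_{b} - \varepsilon_{i} - \varepsilon_{j}$:

$$\frac{1}{x_{q}} = \int_{0}^{\infty}\exp(-x_{q}t)\,dt \approx \sum_{\alpha=1}^{\tau}\omega^{(\alpha)}\exp(-x_{q}t^{(\alpha)}) \qquad [173]$$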

where the integral can be replaced by a summation over a few grid points. In typical applications, it has been found by Häser and Almlöf$^{167}$ that $\tau = 5$–$8$ provides $\mu$Hartree accuracy. This approach was employed by Häser to formulate an AO–MP2 formalism,$^{168}$ which we briefly review in the following. With the definition of two pseudo-density matrices,

$$\underline{P}^{(\alpha)}_{\mu\nu} = |\omega^{(\alpha)}|^{1/4}\sum_{i}^{\mathrm{occ}}C_{\mu i}\exp\left((\varepsilon_{i} - \varepsilon_{F})t^{(\alpha)}\right)C_{\nu i} \qquad [174]$$

and

$$\overline{P}^{(\alpha)}_{\mu\nu} = |\omega^{(\alpha)}|^{1/4}\sum_{a}^{\mathrm{virt}}C_{\mu a}\exp\left((\varepsilon_{F} - \varepsilon_{a})t^{(\alpha)}\right)C_{\nu a} \qquad [175]$$

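where $\varepsilon_{F}$ is $(\varepsilon_{\mathrm{HOMO}} + \varepsilon_{\mathrm{LUMO}})/2$,$^{166–168}$ the MP2 energy expression becomes

$$E_{\mathrm{MP2}} = -\sum_{\alpha=1}^{\tau}e^{(\alpha)}_{JK} \qquad [176]$$

with

$$e^{(\alpha)}_{JK} = \sum_{\mu\nu\lambda\sigma}\sum_{\mu'\nu'\lambda'\sigma'}\underline{P}^{(\alpha)}_{\mu\mu'}\overline{P}^{(\alpha)}_{\nu\nu'}(\mu'\nu'|\lambda'\sigma')\underline{P}^{(\alpha)}_{\lambda\lambda'}\overline{P}^{(\alpha)}_{\sigma\sigma'}\left[2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu)\right] \qquad [177]$$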

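For each integration point ($\alpha = 1 \dots \tau$), four formally $O(M^{5})$ scaling transformations are necessary to yield the transformed two-electron integrals

$$(\underline{\mu}\overline{\nu}|\underline{\lambda}\overline{\sigma})^{(\alpha)} = \sum_{\mu'\nu'\lambda'\sigma'}\underline{P}^{(\alpha)}_{\mu\mu'}\overline{P}^{(\alpha)}_{\nu\nu'}(\mu'\nu'|\lambda'\sigma')\underline{P}^{(\alpha)}_{\lambda\lambda'}\overline{P}^{(\alpha)}_{\sigma\sigma'} \qquad [178]$$

which are contracted in a final, formally $O(M^{4})$ scaling step in a Coulomb- ($e^{(\alpha)}_{J}$) and an exchange-type ($e^{(\alpha)}_{K}$) fashion:

$$e^{(\alpha)}_{JK} = 2e^{(\alpha)}_{J} - e^{(\alpha)}_{K} = \sum_{\mu\nu\lambda\sigma}(\underline{\mu}\overline{\nu}|\underline{\lambda}\overline{\sigma})^{(\alpha)}\left[2(\mu\nu|\lambda\sigma) - (\mu\sigma|\lambda\nu)\right] \qquad [179]$$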

Here $\mu$ and $\underline{\mu}$ denote the same index, where the bar of $\underline{\mu}$ (or $\overline{\mu}$) only indicates that the index has been transformed with $\underline{P}$ (or $\overline{P}$, respectively).

In contrast to the conventional MO-based formulation, the AO-based Laplace formalism allows one to reduce the conventional $O(N^{5})$ scaling of the computational cost for large molecules. However, for small molecules, the overhead consists of the need to compute $\tau = 5$–$8$ exponentials and the larger prefactor for the transformations scaling formally as $N^{5}$ compared with the $n_{occ}\cdot N^{4}$, . . ., and $n^{2}_{occ}\cdot n^{2}_{virt}\cdot N$ scaling for the different MO-based transformations ($n_{occ}$ and $n_{virt}$ denote the number of occupied and virtual orbitals, respectively). Despite this overhead for small molecules, the central drawback in MO-based transformations caused by the delocalized nature of canonical MOs is avoided and the scaling can be reduced for large molecules.
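The equivalence of the canonical expression (Eq. [170]) and the AO-based Laplace form (Eqs. [174] to [179]) can be checked on a toy model. The sketch below makes several assumptions that are not taken from Refs. 166–168: the AO basis is orthonormal, orbitals and "integrals" are random, the orbital energies are made-up numbers with a narrow range of denominators, and the Laplace grid is obtained from scaled Gauss–Laguerre nodes purely for convenience rather than from an optimized fit.

```python
import numpy as np

rng = np.random.default_rng(5)
M, n_occ = 5, 2

# Hypothetical model data
C, _ = np.linalg.qr(rng.standard_normal((M, M)))
C_occ, C_virt = C[:, :n_occ], C[:, n_occ:]
e_occ = np.array([-1.0, -0.9])
e_virt = np.array([1.0, 1.1, 1.2])
B = rng.standard_normal((7, M, M))
B = 0.5 * (B + B.transpose(0, 2, 1))
ao = np.einsum('Pmn,Pls->mnls', B, B)                 # (mu nu|lambda sigma)

# Canonical MO-based MP2, Eq. [170]
mo = np.einsum('mi,na,lj,sb,mnls->iajb', C_occ, C_virt, C_occ, C_virt, ao,
               optimize=True)
D = (e_virt[None, :, None, None] + e_virt[None, None, None, :]
     - e_occ[:, None, None, None] - e_occ[None, None, :, None])
e_canonical = -np.sum(mo * (2.0 * mo - mo.transpose(0, 3, 2, 1)) / D)

# Laplace points t and weights w so that 1/x ~ sum_k w_k exp(-x t_k), Eq. [173]
u, wu = np.polynomial.laguerre.laggauss(6)
x0 = D.min()
t_grid = u / x0
w_grid = wu * np.exp(u) / x0

e_F = 0.5 * (e_occ.max() + e_virt.min())
e_laplace = 0.0
for t, w in zip(t_grid, w_grid):
    P_under = w**0.25 * (C_occ * np.exp((e_occ - e_F) * t)) @ C_occ.T    # Eq. [174]
    P_over = w**0.25 * (C_virt * np.exp((e_F - e_virt) * t)) @ C_virt.T  # Eq. [175]
    T = np.einsum('mp,nq,pqrs,lr,ts->mnlt', P_under, P_over, ao,
                  P_under, P_over, optimize=True)                        # Eq. [178]
    e_laplace -= np.einsum('mnls,mnls->', T,
                           2.0 * ao - ao.transpose(0, 3, 2, 1))          # Eq. [179]
print(e_canonical, e_laplace)
```

No MO-basis integrals enter the Laplace loop; the orbital coefficients appear only inside the two pseudo-density matrices.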

The AO–MP2 method introduced in 1993 by Häser$^{168}$ applies screening criteria to the intermediate four-index quantities in order to reduce the computational scaling for larger molecules. Here, the Schwarz inequality introduced earlier in this review$^{21,168,177}$

$$\left|(\mu\nu|\lambda\sigma)\right| \le \left|(\mu\nu|\mu\nu)\right|^{\frac{1}{2}}\left|(\lambda\sigma|\lambda\sigma)\right|^{\frac{1}{2}} = Q_{\mu\nu}Q_{\lambda\sigma} \qquad [180]$$

is used, which we denote as QQ-screening. Häser$^{168}$ adapted the Schwarz screening for estimating the transformed quantities occurring in AO–MP2 theory, which we abbreviate in the following as QQZZ or Pseudo-Schwarz screening, where $Z$ is defined as an upper bound approximation to the transformed Schwarz criterion (see Ref. 168):

$$\left|(\underline{\mu}\overline{\nu}|\underline{\mu}\overline{\nu})\right|^{\frac{1}{2}} \le Z_{\mu\nu} \qquad [181]$$

As pointed out by Häser,$^{168}$ this screening protocol yields asymptotically a quadratically scaling MP2 method for systems with a significant HOMO–LUMO gap. This quadratic scaling of AO–MP2 was further reduced to become linear by Ayala and Scuseria by "introducing interaction domains and neglecting selective domain-domain interactions".$^{169}$
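The QQ-screening of Eq. [180] can be illustrated with a small model tensor. In the sketch below the "integrals" are generated from a random factorization with an artificial exponential decay in the pair index distance, which only mimics the decay of real charge distributions; the threshold value is likewise an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
M = 6
idx = np.arange(M)
decay = np.exp(-2.0 * np.abs(idx[:, None] - idx[None, :]))   # artificial pair decay

# Hypothetical positive semidefinite "integral" tensor (mu nu|lambda sigma)
B = rng.standard_normal((8, M, M))
B = 0.5 * (B + B.transpose(0, 2, 1)) * decay
ao = np.einsum('Pmn,Pls->mnls', B, B)

# Schwarz factors Q_{mu nu} = |(mu nu|mu nu)|^(1/2) and the bound Q_{mu nu} Q_{lambda sigma}
Q = np.sqrt(np.einsum('mnmn->mn', ao))
bound = Q[:, :, None, None] * Q[None, None, :, :]

threshold = 1e-6
kept = bound >= threshold                    # integrals that survive the screening
print(kept.sum(), kept.size)
print(np.all(np.abs(ao) <= bound * (1 + 1e-12) + 1e-15))
```

Only the two-index quantity $Q$ is needed to decide whether a four-index integral can be skipped, which is what makes the estimate cheap.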

In this tutorial, we use the Laplace approach to explain some aspects of the long-range behavior of electron-correlation methods, without commenting on which one of the many approaches for reducing the computational effort will become the standard replacement of conventional correlation formulations. We follow here our discussion presented in a recent publication,$^{24}$ which for the first time permits a rigorous determination of which of the transformed integral products contribute to the MP2 energy.

Long-Range Behavior of Correlation Effects

The formation of the correlation energy in AO–MP2 consists of the transformation (Eq. [178]) and the contraction step (Eq. [179]). We start our discussion by considering the distance dependence of correlation contributions.

Transformed Integrals

Some discussion in this section is similar to considerations of Ayala and Scuseria.$^{169}$ However, we present here a different argument for deriving rigorous and tight upper bounds for estimating transformed integral products following the work in Ref. 24.


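For nonoverlapping charge distributions $\Omega_{A} = \Omega_{\mu\nu} = \chi_{\mu}\chi_{\nu}$ and $\Omega_{B} = \Omega_{\lambda\sigma}$ centered at $A$ and $B$, respectively, the two-electron integral $(\mu\nu|\lambda\sigma)$ is bounded from above (see Refs. 23 and 31) by

$$\left|\left\langle\frac{1}{r_{12}}\right\rangle\right| \le \frac{1}{R}\sum_{n=0}^{\infty}\frac{\left|\left\langle(r_{1A} - r_{2B})^{n}\right\rangle\right|}{R^{n}} \qquad [182]$$

with $R = |B - A|$ and the position of the electrons $r_{1} = r_{1A} + A$ and $r_{2} = r_{2B} + B$, whereas $\langle\ \rangle$ abbreviates the two-electron integral. This expansion in multipoles such as overlap $M^{(0)} = S$, dipole $M^{(1)}$, and higher order terms $M^{(2)}$, $M^{(3)}$, . . . leads to

$$\begin{aligned}
\left|(\mu\nu|\lambda\sigma)\right| \le{}& R^{-1}\left|M^{(0)}_{\mu\nu}M^{(0)}_{\lambda\sigma}\right| \\
& + R^{-2}\left|M^{(1)}_{\mu\nu}M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu}M^{(1)}_{\lambda\sigma}\right| \\
& + R^{-3}\left|M^{(2)}_{\mu\nu}M^{(0)}_{\lambda\sigma} - 2M^{(1)}_{\mu\nu}M^{(1)}_{\lambda\sigma} + M^{(0)}_{\mu\nu}M^{(2)}_{\lambda\sigma}\right| \\
& + R^{-4}\left|M^{(3)}_{\mu\nu}M^{(0)}_{\lambda\sigma} - 3M^{(2)}_{\mu\nu}M^{(1)}_{\lambda\sigma} + 3M^{(1)}_{\mu\nu}M^{(2)}_{\lambda\sigma} - M^{(0)}_{\mu\nu}M^{(3)}_{\lambda\sigma}\right| \\
& + O(R^{-5})
\end{aligned} \qquad [183]$$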

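Due to the orthogonality properties of $\underline{P}$ and $\overline{P}$ (similar to the standard one-particle density matrix $P$ of SCF theory and its complement $(1 - P)$ in an orthogonal basis), the transformation of the overlap leads to

$$\sum_{\mu'}\sum_{\nu'}\underline{P}_{\mu\mu'}S_{\mu'\nu'}\overline{P}_{\nu\nu'} = S_{\underline{\mu}\overline{\nu}} = M^{(0)}_{\underline{\mu}\overline{\nu}} = 0 \qquad [184]$$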

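so that all terms involving the overlap ($M^{(0)}$) are zero. Therefore, the expansion for the transformed integrals becomes

\[
\bigl| (\mu\overline{\nu}|\lambda\overline{\sigma}) \bigr| \le R^{-3} \bigl| -2 M^{(1)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} \bigr| + R^{-4} \bigl| -3 M^{(2)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} + 3 M^{(1)}_{\mu\overline{\nu}} M^{(2)}_{\lambda\overline{\sigma}} \bigr| + O(R^{-5}) \tag{185}
\]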

and an $O(1/R^3)$ dependence for the transformed integrals results. Together with the $O(1/R)$ behavior of the untransformed integrals, this leads to an overall $O(1/R^4)$ decay in the contraction step (Eq. [179]). It is important to note that this distance dependence results only from the orthogonality properties of the pseudo-density matrices, where the only requirement is the validity of the multipole expansion (Eq. [182]) for the untransformed integrals. No locality of the pseudo-density matrices has been exploited at this stage; exploiting it leads to an even stronger decay, as discussed below.
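The vanishing of the transformed overlap in Eq. [184] can be traced back to the orthonormality of occupied and virtual molecular orbitals. The short derivation below is a sketch that assumes the usual Laplace-weighted form of the pseudo-density matrices; the coefficient matrix $C$ and the positive weights $w^{(\alpha)}_i$ and $w^{(\alpha)}_a$ are notation introduced here for illustration and are not defined in this section.

```latex
\begin{align*}
P^{(\alpha)}_{\mu\mu'} &= \sum_{i}^{\mathrm{occ}} C_{\mu i}\, w^{(\alpha)}_{i}\, C_{\mu' i},
\qquad
\overline{P}^{(\alpha)}_{\nu\nu'} = \sum_{a}^{\mathrm{virt}} C_{\nu a}\, w^{(\alpha)}_{a}\, C_{\nu' a}, \\
\sum_{\mu'\nu'} P^{(\alpha)}_{\mu\mu'}\, S_{\mu'\nu'}\, \overline{P}^{(\alpha)}_{\nu\nu'}
&= \sum_{i}^{\mathrm{occ}} \sum_{a}^{\mathrm{virt}} C_{\mu i}\, w^{(\alpha)}_{i}
\Bigl[ \sum_{\mu'\nu'} C_{\mu' i}\, S_{\mu'\nu'}\, C_{\nu' a} \Bigr]
w^{(\alpha)}_{a}\, C_{\nu a} \\
&= \sum_{i}^{\mathrm{occ}} \sum_{a}^{\mathrm{virt}} C_{\mu i}\, w^{(\alpha)}_{i}\, \delta_{ia}\, w^{(\alpha)}_{a}\, C_{\nu a} = 0 .
\end{align*}
```

The bracket is the overlap between an occupied and a virtual molecular orbital, so it vanishes for every pair $(i, a)$. Under these assumptions the result uses only this orthogonality and does not depend on the values of the weights.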

Coulomb-Type Contraction

If the charge distributions $\Omega_{\mu\nu}$, $\Omega_{\lambda\sigma}$ and $\Omega_{\mu'\nu'}$, $\Omega_{\lambda'\sigma'}$, respectively, are nonoverlapping in the sense that the multipole expansion (Eq. [183]) is applicable to the untransformed two-electron integrals $(\mu\nu|\lambda\sigma)$ and $(\mu'\nu'|\lambda'\sigma')$, then the corresponding Coulomb terms can be written as

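\[
\begin{aligned}
\bigl| e^{(\alpha)}_{J} \bigr| \le{}& \sum_{\mu\nu\lambda\sigma} \bigl| (\mu\nu|\lambda\sigma) \bigr| \,
\Bigl| \sum_{\mu'\nu'\lambda'\sigma'} P^{(\alpha)}_{\mu\mu'} \overline{P}^{(\alpha)}_{\nu\nu'} (\mu'\nu'|\lambda'\sigma') P^{(\alpha)}_{\lambda\lambda'} \overline{P}^{(\alpha)}_{\sigma\sigma'} \Bigr| \\
\le{}& \sum_{\mu\nu\lambda\sigma} \Bigl\{
R^{-1} \bigl| M^{(0)}_{\mu\nu} M^{(0)}_{\lambda\sigma} \bigr| \,
R^{-3} \bigl| -2 M^{(1)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} \bigr|^{(\alpha)} \\
&+ R^{-1} \bigl| M^{(0)}_{\mu\nu} M^{(0)}_{\lambda\sigma} \bigr| \,
R^{-4} \bigl| 3 M^{(1)}_{\mu\overline{\nu}} M^{(2)}_{\lambda\overline{\sigma}} - 3 M^{(2)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} \bigr|^{(\alpha)} \\
&+ R^{-2} \bigl| M^{(1)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(1)}_{\lambda\sigma} \bigr| \,
R^{-3} \bigl| -2 M^{(1)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} \bigr|^{(\alpha)} \\
&+ R^{-2} \bigl| M^{(1)}_{\mu\nu} M^{(0)}_{\lambda\sigma} - M^{(0)}_{\mu\nu} M^{(1)}_{\lambda\sigma} \bigr| \,
R^{-4} \bigl| 3 M^{(1)}_{\mu\overline{\nu}} M^{(2)}_{\lambda\overline{\sigma}} - 3 M^{(2)}_{\mu\overline{\nu}} M^{(1)}_{\lambda\overline{\sigma}} \bigr|^{(\alpha)}
+ \cdots \Bigr\}
\end{aligned}
\tag{186}
\]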

For the sake of notational simplicity, we have not made a distinction between distances of centers of untransformed or transformed charge distributions, because it is clear from the context. Considering in more detail the summation over the $\mu\nu$ part (the $\lambda\sigma$ terms are omitted) of the first term of order $1/R^4$,

\[
\sum_{\mu\nu} \frac{M^{(0)}_{\mu\nu}}{R} \cdot \left[ \sum_{\mu'\nu'} P^{(\alpha)}_{\mu\mu'} \frac{M^{(1)}_{\mu'\nu'}}{R^{3}} \overline{P}^{(\alpha)}_{\nu\nu'} \right]
= \sum_{\mu'\nu'} \left[ \sum_{\mu\nu} P^{(\alpha)}_{\mu\mu'} \frac{M^{(0)}_{\mu\nu}}{R} \overline{P}^{(\alpha)}_{\nu\nu'} \right] \cdot \frac{M^{(1)}_{\mu'\nu'}}{R^{3}} \tag{187}
\]

makes clear that we can either perform first the $\mu', \nu'$ summation or the summation over $\mu, \nu$. In the second representation, the $M^{(0)}$ term is multiplied by $P$ (and $\overline{P}$), which would show that the $1/R^4$ term becomes zero. However, this is only true if $P$ and $\overline{P}$ are still fully orthogonal in the restricted space of indices where the multipole expansion is valid. Otherwise, missing indices would lead to nonvanishing overlap contributions after the projection with $P$ and $\overline{P}$. This leads to the following requirements:

• $\Omega_{\mu\nu}$ and $\Omega_{\lambda\sigma}$ are nonoverlapping: valid multipole expansion for $(\mu\nu|\lambda\sigma)$.

• $\Omega_{\mu'\nu'}$ and $\Omega_{\lambda'\sigma'}$ are nonoverlapping: valid multipole expansion for $(\mu'\nu'|\lambda'\sigma')$.

• For each $\mu$ and $\nu$ of $\Omega_{\mu\nu}$, the elements $\mu'$ and $\nu'$ coupled via the significant elements of $P_{\mu\mu'}$ and $\overline{P}_{\nu\nu'}$ have to be contained in the sum $\sum_{\mu'\nu'}$. In other words: for each $\mu$ and $\nu$ of $\Omega_{\mu\nu}$, the shell pairs coupled via $P_{\mu\mu'}$ and $\overline{P}_{\nu\nu'}$ (the significant elements) have to be nonoverlapping with $\Omega_{\lambda'\sigma'}$, so that the multipole expansion can be applied.
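The role of the restricted index space can be illustrated numerically. The NumPy sketch below is a toy model rather than an AO–MP2 implementation: the overlap and Fock-like matrices are random, the Laplace point `t` and the Laplace-weighted form of the pseudo-densities are assumptions made for illustration, and the index subset `keep` is arbitrary. It compares the projected overlap summed over all indices with the same sum truncated to a subset.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_occ = 8, 3  # toy basis size and number of occupied orbitals (assumed)

# Random symmetric positive-definite "overlap" S and symmetric "Fock" matrix F.
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)
F = rng.standard_normal((n, n))
F = 0.5 * (F + F.T)

# Solve F C = S C eps via symmetric orthogonalization, so that C.T @ S @ C = 1.
s, U = np.linalg.eigh(S)
X = U @ np.diag(s ** -0.5) @ U.T
eps, C_prime = np.linalg.eigh(X @ F @ X)
C = X @ C_prime
C_occ, C_virt = C[:, :n_occ], C[:, n_occ:]

# Laplace-weighted pseudo-densities (assumed form; t is an arbitrary Laplace point).
t = 0.7
e_mid = 0.5 * (eps[n_occ - 1] + eps[n_occ])
P = C_occ @ np.diag(np.exp((eps[:n_occ] - e_mid) * t)) @ C_occ.T
P_bar = C_virt @ np.diag(np.exp(-(eps[n_occ:] - e_mid) * t)) @ C_virt.T

# Full summation over mu, nu: the projected overlap vanishes.
full = P @ S @ P_bar

# Summation restricted to a subset of mu, nu: orthogonality is lost.
keep = np.arange(n) < 5
restricted = P[:, keep] @ S[np.ix_(keep, keep)] @ P_bar[keep, :]

print("full sum norm:      ", np.linalg.norm(full))
print("restricted sum norm:", np.linalg.norm(restricted))
```

In this toy model the full sum vanishes to numerical precision, whereas the truncated sum does not. This is the reason the third requirement demands that all significantly coupled elements be contained in the summation.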

If these criteria are fulfilled in the restricted space of indices defined by the multipole expansion within a threshold, then $(P M^{(0)} \overline{P})_{\mu'\nu'}$ is zero and with that the $1/R^4$ term in Eqs. [186] and [187], and two of the three $1/R^5$ terms in Eq. [186] disappear. If the analogous argumentation holds at the same time for $\Omega_{\lambda\sigma}$, the third $1/R^5$ contribution is zero as well, so that an overall $1/R^6$ dependence of the required transformed integrals for the Coulomb-type contraction results.

Therefore, for well-separated charge distributions in the above sense, the $1/R^4$ behavior of transformed integrals turns into a $1/R^6$ distance dependence. Such a behavior is well known for van der Waals/dispersion-type interactions. In contrast to the $1/R^4$ dependence, however, the $1/R^6$ behavior is linked closely to the exponential decay of the pseudo-density matrices $P$ and $\overline{P}$ for systems with nonvanishing HOMO–LUMO gaps. Our experience from SCF theories shows that the one-particle density matrix is fairly long-ranged. Although we get for, e.g., DNA fragments a relatively early onset of a linear-scaling behavior for the computation of the Hartree–Fock exchange, it has to be stressed that this feature is strongly enhanced by the integral contractions (see Refs. 71 and 76) and not solely due to the locality of the one-particle density by itself. In this context, the true locality of the one-particle density matrix is needed for the $1/R^6$ decay, so that this behavior is expected to start only at significantly larger distances as compared with the $1/R^4$ decay. Nevertheless, it is clear that the $1/R^6$ decay can be exploited in an analogous fashion by imposing the criteria listed above.
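The power counting behind the $1/R^4$, $1/R^5$, and $1/R^6$ statements can be checked with a few lines of bookkeeping. The sketch below is only an illustration: it assumes that a term $M^{(a)}_{\mu\nu} M^{(b)}_{\lambda\sigma}$ of the untransformed expansion carries $R^{-(a+b+1)}$ (Eq. [183]), that transformed terms start at rank 1 on both sides (Eq. [185]), and that fulfilling the criteria for $\Omega_{\mu\nu}$ or $\Omega_{\lambda\sigma}$ removes the untransformed terms containing $M^{(0)}_{\mu\nu}$ or $M^{(0)}_{\lambda\sigma}$, respectively. The function name and its flags are hypothetical.

```python
from itertools import product


def leading_order(mu_nu_projected, lambda_sigma_projected, max_rank=4):
    """Lowest power p of 1/R**p among the surviving Coulomb-type products."""
    orders = []
    # (a, b): multipole ranks of the untransformed factor M^(a)_{mu nu} M^(b)_{lambda sigma}.
    for a, b in product(range(max_rank + 1), repeat=2):
        if mu_nu_projected and a == 0:
            continue  # terms with M^(0)_{mu nu} vanish after projection
        if lambda_sigma_projected and b == 0:
            continue  # terms with M^(0)_{lambda sigma} vanish after projection
        # (a2, b2): ranks of the transformed factor; rank 0 is absent (Eq. [184]).
        for a2, b2 in product(range(1, max_rank + 1), repeat=2):
            orders.append((a + b + 1) + (a2 + b2 + 1))
    return min(orders)


print(leading_order(False, False))  # orthogonality only
print(leading_order(True, False))   # criteria fulfilled for the mu-nu side only
print(leading_order(True, True))    # criteria fulfilled for both sides
```

Under these assumptions the three calls give leading orders of 4, 5, and 6, matching the sequence of decay laws described above.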

The implications of this decay behavior for the Coulomb-type products are illustrated in Figure 19 for the example of linear alkanes. For an alkane with four to five carbon atoms, the exact number of required transformed products (MP2, 6-31G*, providing an accuracy of 0.1 mHartree for the first Laplace coefficient) scales already as low as $N^{1.48}$, approaching the asymptotic linear scaling.24 However, the pseudo-Schwarz screening drastically overestimates the number of required products, and no linear scaling can be achieved using this criterion, because the distance dependence of the transformed products is not accounted for.
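A scaling exponent such as the one quoted above is commonly estimated from counts at two system sizes via $\log(y_2/y_1)/\log(N_2/N_1)$. The sketch below shows this arithmetic only. The counts are synthetic values generated from an assumed power law and are not data from Figure 19 or Ref. 24; the function names and the prefactor are likewise hypothetical.

```python
import math


def scaling_exponent(n1, y1, n2, y2):
    """Effective exponent x in y ~ N**x estimated from two (size, count) points."""
    return math.log(y2 / y1) / math.log(n2 / n1)


def synthetic_count(n, prefactor=2.0, exponent=1.48):
    """Synthetic product count following an assumed power law (not measured data)."""
    return prefactor * n ** exponent


n1, n2 = 100, 200  # hypothetical numbers of basis functions
x = scaling_exponent(n1, synthetic_count(n1), n2, synthetic_count(n2))
print(f"estimated exponent: {x:.2f}")
```

Because the estimate uses only two neighboring sizes, it describes the local slope and can keep changing until the asymptotic regime is reached.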

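Exchange-Type Contraction

The exchange-type part of the AO–MP2 energy is computed as

\[
e^{(\alpha)}_{K} = \sum_{\mu\nu\lambda\sigma} (\mu\overline{\nu}|\lambda\overline{\sigma})^{(\alpha)} (\mu\sigma|\lambda\nu) \tag{188}
\]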


As discussed, the transformed two-electron integrals decay as $1/R^3$, whereas the untransformed ones decay as $1/R$, resulting in a total distance dependence of $1/R^4$. In addition, in the exchange contraction step, the exponentially decaying charge densities of the untransformed integral, $\Omega_{\mu\sigma}$ and $\Omega_{\lambda\nu}$, couple the two sides of the transformed integral. Therefore, as long as the transformed charge distributions $\Omega_{\mu\overline{\nu}}$ and $\Omega_{\lambda\overline{\sigma}}$ decay exponentially, an overall exponential decay for the exchange-type contraction results. The exponential coupling is similar to the one encountered for the formation of exchange-type contributions in SCF theories using our LinK method for computing energies,71 energy gradients,76 or NMR chemical shifts,113 where the coupling of the two sides of the two-electron integrals is mediated by the one-particle density or its derivatives. Therefore, in this context, the exchange-type contribution (Eq. [188]) to the correlation energy decays not only as $1/R^4$, but exponentially for systems with a nonvanishing HOMO–LUMO gap.24
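The index pattern of Eq. [188] can be written compactly with `numpy.einsum`. The sketch below uses small random four-index arrays as stand-ins for the transformed and untransformed integrals, so it demonstrates only the contraction pattern and no screening or sparsity; the function names are hypothetical.

```python
import numpy as np


def exchange_contraction(T, I):
    """e_K = sum over m, n, l, s of T[m, n, l, s] * I[m, s, l, n] (pattern of Eq. [188])."""
    return np.einsum("mnls,msln->", T, I)


def exchange_contraction_loops(T, I):
    """Reference implementation with explicit loops, for checking the einsum string."""
    n = T.shape[0]
    total = 0.0
    for m in range(n):
        for nu in range(n):
            for l in range(n):
                for s in range(n):
                    total += T[m, nu, l, s] * I[m, s, l, nu]
    return total


rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4, 4, 4))  # stand-in for transformed integrals
I = rng.standard_normal((4, 4, 4, 4))  # stand-in for untransformed integrals
print(np.isclose(exchange_contraction(T, I), exchange_contraction_loops(T, I)))
```

The swapped positions of the second and fourth indices in the untransformed factor are what couple the two sides of the transformed integral, in contrast to a Coulomb-type contraction, where both factors carry the same index order.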

[Figure 19: plot of the number of significant products (in units of 10^9) versus the number of basis functions (0 to 800) for C5H12, C10H22, C20H42, and C40H82; curves: QQZZ, MBIE, exact.]

Figure 19 Comparison of the number of significant Coulomb-type integral products ($\mathrm{C}_n\mathrm{H}_{2n+2}$/6-31G* basis; in units of $10^9$) as estimated over shells by Schwarz-type screening (QQZZ; $10^{-5}$) and MBIE ($10^{-5}$) with the exact number of products selected via basis functions. For the latter, a threshold of $10^{-8}$ has been selected to provide comparable accuracy in the absolute energies of 0.1 mH (only data for the first Laplace coefficient in computing the MP2 energy are listed).


Rigorous Selection of Transformed Products via Multipole-Based Integral Estimates (MBIE)

The discussion above suggests that for the exploitation of the strong long-range decay behavior of at least $1/R^4$ for electron correlation effects, it is crucial to introduce distance dependence into the integral estimates for transformed and untransformed two-electron integrals. Here, the MBIE scheme introduced by Lambrecht and Ochsenfeld,23 discussed in the introductory parts of the current review, allows one to rigorously preselect which of the transformed products actually contribute to the correlation energy.24

To preselect the transformed integral products required for computing the MP2 energy, one can modify the MBIE integral bounds23,24 so that an upper bound to the transformed integrals is obtained. In addition to this screening for the number of contributing products, one needs to select the significant untransformed integrals required for integral transformations. We will not discuss the derivation of MBIE bounds for AO-MP2 further in the current context, because the details would not provide more insight. For a detailed derivation the reader is referred to Ref. 24.

The performance of our MBIE method in its current stage for preselecting the significant number of contributing transformed products is illustrated in Figure 19. Although MBIE in its current stage still overestimates the number of products, it is always a true upper bound, so that noncontributing products can be safely discarded and only a linear-scaling number of products can be preselected. The MBIE estimate in Figure 19 has been optimized with respect to the transformation as compared with the one described in Ref. 24. We expect further improvements in the future in order to approach the true number of required products, so as to reduce the computational effort.

The MBIE screening formulas are crucial for the correct estimation of the long-range behavior of correlation effects:

• First, MBIE describes the exponential coupling of the “bra” and “ket” indices as does the QQZZ screening.

• Second, and most importantly, MBIE correctly describes the $1/R^4$ dependence of the transformed products, so that for larger separations between “bra” and “ket” centers, the integral products vanish.

• Third, the MBIE estimates are rigorous upper bounds.

In addition, it is possible to exploit the $1/R^6$ behavior as described above; however, the onset is expected to occur for a significantly larger “bra-ket” separation, so we focused here on the exploitation of the $1/R^4$ decay.
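Why a distance-dependent bound is needed for a linear-scaling number of products can be seen in a deliberately simple toy model. The sketch below is not the MBIE formula: it places sites on a line with unit spacing, assigns every pair a distance-independent estimate of 1 (standing in for a Schwarz-type bound) or a distance-dependent estimate of $1/d^4$ (standing in for a bound that captures the $1/R^4$ decay), and counts the pairs at or above a threshold. The function name, the unit prefactor, and the geometry are assumptions made for illustration.

```python
def count_selected(n_sites, threshold=1e-5, distance_dependent=True):
    """Count ordered site pairs whose estimate is not below the threshold."""
    count = 0
    for i in range(n_sites):
        for j in range(n_sites):
            d = abs(i - j)
            if distance_dependent and d > 0:
                bound = 1.0 / d ** 4  # toy estimate with 1/R^4 decay
            else:
                bound = 1.0           # toy distance-independent estimate
            if bound >= threshold:
                count += 1
    return count


for n_sites in (100, 200, 400):
    print(
        n_sites,
        count_selected(n_sites, distance_dependent=False),
        count_selected(n_sites, distance_dependent=True),
    )
```

In this model the distance-independent estimate retains all $N^2$ pairs, whereas the distance-dependent one retains only pairs up to a fixed separation, so the retained count grows linearly with the number of sites. A rigorous upper bound additionally guarantees that no pair above the threshold is discarded.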

Implications

The considerations presented in this last section of the chapter illustrate that dynamic correlation is a local effect and that its description should, therefore, scale linearly with the size of the molecule. This is not only true for the simple MP2 theory (where the “correlation interaction” between electrons decays as $1/R^4$ and faster) on which we based our argumentation, but also for more sophisticated approaches like, for example, coupled-cluster theory. Although a tremendous amount of work has been done by many research groups in this field, much remains to be done. The path to such improvements is, in principle, set, and based on the example of the foregoing analysis, we can be optimistic that the scientific community will eventually reach this exciting goal of performing highly accurate ab initio calculations for very large molecular systems.

CONCLUSIONS

Much work has been done by many scientists over the last decades to bring quantum chemistry to the impressive stage it has reached today. Thinking back to just a bit more than 15 years ago, computing a non-symmetric molecule with, say, 10–20 atoms at the Hartree–Fock level was painful. Today molecules with more than 1000 atoms can be tackled at the HF or DFT level on one-processor computers, and widespread applicability to a multitude of chemical and biochemical problems has been achieved. Although advances in quantum chemistry certainly go hand in hand with the fast-evolving increase of computer speed, it is clear that the introduction of linear-scaling methods over the last ten years or so has made important contributions to this success.

In this tutorial we have described some of the basic ideas for reducing the computational scaling of quantum chemical methods, without going into the details of the many different approaches followed by the numerous research groups involved in this field. We have presented linear-scaling methods for the calculation of SCF energies, energy gradients, and response properties, which open new possibilities for studying molecular systems with 1000 and more atoms on today’s computers. In addition, the given outlook on linear-scaling electron correlation methods indicates that much more can be expected and that more and more highly accurate approaches in the ab initio hierarchy will be available as well for large molecules.

Despite the success of linear-scaling methods, a multitude of challenges and open questions remain in the linear-scaling community. Some of the more important challenges include the following issues:

• Many molecular properties remain for which so far no linear-scaling methods have been devised and implemented.

• Reducing the prefactors stays an important issue and becomes even more important for linear-scaling methods (because any gain in the prefactor directly translates into the treatable molecule size).

• The results for some molecular properties or electron correlation energies depend strongly on the size of the basis set; post-HF methods, in particular, require large basis sets. Even if a method scales linearly with molecular size, the computational cost may increase dramatically with the basis set size. Therefore, much more work needs to be devoted to tackling this basis set problem.

• The more “metallic” a system is (small HOMO–LUMO gap), the less local is the one-particle density matrix. Therefore, the question about how to deal with strong delocalization in an efficient manner remains an important challenge.

• Because matrix multiplications are central to many aspects of linear-scaling schemes, any further speed-up in sparse matrix multiplications will be of importance, in particular if the systems are more “metallic”.

• Although a multitude of open questions still exists even for HF and DFT linear-scaling schemes, the rigorous and efficient reduction of scaling in post-HF methods to account for the missing electron correlation effects remains one of the central challenges for the success of quantum chemistry.

• Many large molecular systems are flexible, and dynamic effects are necessary for a realistic description. Therefore, molecular dynamics simulations are needed that require the computation of a huge number of points on a hypersurface, resulting in an extremely high computational cost for reliable methods.

From this small list of challenges, it becomes clear that there is still a great need for developing and improving linear-scaling methods. Nevertheless, the foregoing discussion has shown that much has been achieved for the approximate solution of the Schrödinger equation even for large molecules. For the future, the ultimate goal of solving the molecular Schrödinger equation to highest accuracy and efficiency appears to be reachable. Accomplishing this goal will allow us to rationalize and understand, to predict, and ultimately, to control the chemical and biochemical processes of very large molecular systems.

REFERENCES

1. E. Schrödinger, Ann. Phys., 79, 361 (1926). Quantisierung als Eigenwertproblem (erste Mitteilung).

2. D. R. Hartree, Proc. Cambridge Philos. Soc., 24, 89 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. I. Theory and Methods.

3. V. Fock, Z. Phys., 61, 126 (1930). Näherungsmethode zur Lösung des Quantenmechanischen Mehrkörperproblems; Z. Phys., 62, 795 (1930). “Self-Consistent Field” mit Austausch für Natrium.

4. A. Szabo and N. S. Ostlund, Modern Quantum Chemistry - Introduction to Advanced Electronic Structure Theory, Dover Publications, Inc., Mineola, New York, 1989.

5. C. Møller and M. S. Plesset, Phys. Rev., 46, 618 (1934). Note on an Approximation Treatment for Many-Electron Systems.

6. R. J. Bartlett and J. F. Stanton, in Reviews in Computational Chemistry, Vol. 5, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, pp. 65–169, 1990. Application of Post-Hartree-Fock Methods: A Tutorial.

7. P. Hohenberg and W. Kohn, Phys. Rev. B, 136, 864 (1964). Inhomogeneous Electron Gas.

8. W. Kohn and L. J. Sham, Phys. Rev., 140, A1133 (1965). Self-Consistent Equations Including Exchange and Correlation Effects.

9. R. G. Parr and W. Yang, Density-Functional Theory of Atoms and Molecules, International Series of Monographs on Chemistry 16, Oxford Science Publications, Oxford, United Kingdom, 1989.

10. G. E. Moore, Electronics Magazine, 19 April, 1965. Cramming More Components onto Integrated Circuits.

11. W. Kutzelnigg, Einführung in die Theoretische Chemie, VCH Weinheim, Weinheim, Germany, 2001.

12. I. N. Levine, Quantum Chemistry, Fifth ed., Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 2000.

13. T. Helgaker, P. Jørgensen, and J. Olsen, Molecular Electronic Structure Theory, Wiley, Chichester, United Kingdom, 2000.

14. M. Born and R. A. Oppenheimer, Ann. Phys., 84, 457 (1927). Zur Quantentheorie der Molekeln.

15. B. T. Sutcliffe, in Computational Techniques in Quantum Chemistry, G. H. F. Diercksen, B. T. Sutcliffe, and A. Veillard, Eds., Reidel, Boston, Massachusetts, 1975, pp. 1–105. Fundamentals of Computational Quantum Chemistry.

16. J. C. Slater, Quantum Theory of Matter, 2nd ed., McGraw-Hill, New York, 1968.

17. C. C. J. Roothaan, Rev. Mod. Phys., 23, 69 (1951). New Developments in Molecular Orbital Theory.

18. G. G. Hall, Proc. Roy. Soc., A205, 541 (1951). The Molecular-Orbital Theory of Chemical Valency. VIII. A Method of Calculating Ionization Potentials.

19. J. Almlöf, K. Faegri, and K. Korsell, J. Comput. Chem., 3, 385 (1982). Principles for a Direct SCF Approach to LCAO-MO Ab-initio Calculations.

20. V. Dyczmons, Theoret. Chim. Acta, 28, 307 (1973). No N^4-dependence in the Calculation of Large Molecules.

21. M. Häser and R. Ahlrichs, J. Comput. Chem., 10, 104 (1989). Improvements on the Direct SCF Method.

22. D. Cremer and J. Gauss, J. Comput. Chem., 7, 274 (1986). An Unconventional SCF Method for Calculations on Large Molecules.

23. D. S. Lambrecht and C. Ochsenfeld, J. Chem. Phys., 123, 184101 (2005). Multipole-Based Integral Estimates for the Rigorous Description of Distance Dependence in Two-Electron Integrals.

24. D. S. Lambrecht, B. Doser, and C. Ochsenfeld, J. Chem. Phys., 123, 184102 (2005). Rigorous Integral Screening for Electron Correlation Methods.

25. J. E. Almlöf, USIP Report 72-09 (1972), republished in Theor. Chem. Acc. memorial issue: P. R. Taylor, Theor. Chem. Acc., 97, 10 (1997). Methods for the Rapid Evaluation of Electron Repulsion Integrals in Large-scale LCGO Calculations.

26. J. E. Almlöf, in Modern Electronic Structure Theory, D. Yarkony, C.-Y. Ng, Eds., World Scientific, Singapore, 1994, pp. 121–151. Direct Methods in Electronic Structure Theory.

27. H. Eyring, J. Walter, and G. E. Kimball, Quantum Chemistry, Wiley, New York, 1947.


28. J. O. Hirschfelder, C. F. Curtiss, and R. B. Bird, Molecular Theory of Gases and Liquids, Wiley, New York, 1954.

29. A. D. Buckingham, in Intermolecular Interactions: From Diatomics to Biopolymers, B. Pullman, Ed., Wiley, New York, 1987, pp. 1–67. Basic Theory of Intermolecular Forces: Applications to Small Molecules.

30. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Chem. Phys. Lett., 217, 65 (1994). A Simple yet Powerful Upper Bound for Coulomb Integrals.

31. C. A. White, B. G. Johnson, P. M. W. Gill, and M. Head-Gordon, Chem. Phys. Lett., 230, 8 (1994). The Continuous Fast Multipole Method.

32. P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Volume I, McGraw-Hill Education, Tokyo, Japan, 1953.

33. P. M. Morse and H. Feshbach, Methods of Theoretical Physics, Volume II, McGraw-Hill Education, Tokyo, Japan, 1953.

34. D. E. Williams, in Reviews in Computational Chemistry, Vol. 2, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 219–271. Net Atomic Charge and Multipole Models for the ab Initio Molecular Electric Potential.

35. G. B. Arfken and H. J. Weber, Mathematical Methods for Physicists, Academic Press, London, United Kingdom, 2001.

36. S. Obara and A. Saika, J. Chem. Phys., 84, 3963 (1985). Efficient Recursive Computation of Molecular Integrals over Cartesian Gaussian Functions.

37. M. Head-Gordon and J. A. Pople, J. Chem. Phys., 89, 5777 (1988). A Method for Two-electron Gaussian Integral and Integral Derivative Evaluation Using Recurrence Relations.

38. P. M. W. Gill and J. A. Pople, J. Quantum Chem., 40, 753 (1991). The Prism Algorithm for Two-Electron Integrals.

39. L. E. McMurchie and E. R. Davidson, J. Comput. Phys., 26, 218 (1978). One- and Two-Electron Integrals over Cartesian Gaussian Functions.

40. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Int. J. Quantum Chem., 40, 745 (1991). Two-Electron Repulsion Integrals Over Gaussian s Functions.

41. P. M. W. Gill, M. Head-Gordon, and J. A. Pople, Int. J. Quantum Chem., 23, 269 (1989). An Efficient Algorithm for the Generation of Two-Electron Repulsion Integrals over Gaussian Basis Functions.

42. A. V. Scherbinin, V. I. Pupyshev, and N. F. Stepanov, Int. J. Quantum Chem., 60, 843 (1996). On the Use of Multipole Expansion of the Coulomb Potential in Quantum Chemistry.

43. V. R. Saunders, in Methods in Computational Molecular Physics, G. H. F. Diercksen and S. Wilson, Eds., NATO ASI Series, Series C: Mathematical and Physical Sciences, Vol. 113, D. Reidel Publishing Company, Dordrecht, The Netherlands, 1983, pp. 1–36. Molecular Integrals for Gaussian Type Functions.

44. T. Helgaker and P. R. Taylor, Modern Electronic Structure Theory, Vol. 2, D. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 725–856. Gaussian Basis Sets and Molecular Integrals.

45. L. Greengard and V. Rokhlin, J. Comput. Phys., 60, 187 (1990). Rapid Solution of Integral Equations of Classical Potential Theory.

46. R. Beatson and L. Greengard, Available: www.math.nyu.edu/faculty/greengar/shortcourse_fmm.pdf. A Short Course on Fast Multipole Methods.

47. L. Greengard, Science, 265, 909 (1994). Fast Algorithms for Classical Physics.

48. C. A. White and M. Head-Gordon, J. Chem. Phys., 101, 6593 (1994). Derivation and Efficient Implementation of the Fast Multipole Method.

49. C. A. White and M. Head-Gordon, J. Chem. Phys., 105, 5061 (1996). Rotating Around the Angular Momentum Barrier in Fast Multipole Method Calculations.

50. J. Barnes and P. Hut, Nature (London), 324, 446 (1986). A Hierarchical O(N log N) Force-Calculation Algorithm.


51. M. Challacombe, E. Schwegler, and J. Almlöf, in Computational Chemistry: Review of Current Trends, Vol. 53, J. Leszczynski, Ed., World Scientific, Singapore, 1996, pp. 4685–4695. Modern Developments in Hartree-Fock Theory: Fast Methods for Computing the Coulomb Matrix.

52. M. Challacombe, E. Schwegler, and J. Almlöf, J. Chem. Phys., 104, 4685 (1995). Fast Assembly of the Coulomb Matrix: A Quantum Chemical Tree Code.

53. J. Cipriani and B. Silvi, Mol. Phys., 45, 259 (1982). Cartesian Expression of Electric Multipole Moments.

54. L. Greengard and J. Strain, J. Sci. Stat. Comp., 12, 79 (1991). The Fast Gauss Transform.

55. H. G. Petersen, D. Soelvason, J. W. Perram, and E. R. Smith, J. Chem. Phys., 101, 8870 (1994). The Very Fast Multipole Method.

56. M. C. Strain, G. E. Scuseria, and M. J. Frisch, Science, 271, 51 (1996). Achieving Linear Scaling for the Electronic Quantum Coulomb Problem.

57. O. Vahtras, J. Almlöf, and M. W. Feyereisen, Chem. Phys. Lett., 213, 514 (1993). Integral Approximations for LCAO-SCF Calculations.

58. R. A. Kendall and H. A. Früchtl, Theor. Chem. Acc., 97, 158 (1997). The Impact of the Resolution of the Identity Approximate Integral Method on Modern Ab Initio Algorithm Development.

59. B. I. Dunlap, J. W. D. Connolly, and J. R. Sabin, J. Chem. Phys., 71, 3396 (1979). On Some Approximations in Applications of Xα Theory.

60. K. Eichkorn, O. Treutler, H. Oehm, M. Häser, and R. Ahlrichs, Chem. Phys. Lett., 240, 283 (1995). Auxiliary Basis Sets to Approximate Coulomb Potentials.

61. F. Weigend, Phys. Chem. Chem. Phys., 4, 4285 (2002). A Fully Direct RI-HF Algorithm: Implementation, Optimised Auxiliary Basis Sets, Demonstration of Accuracy and Efficiency.

62. M. Sierka, A. Hogekamp, and R. Ahlrichs, J. Chem. Phys., 118, 9136 (2003). Fast Evaluation of The Coulomb Potential for Electron Densities Using Multipole Accelerated Resolution of Identity Approximation.

63. L. Füsti-Molnár and P. Pulay, J. Chem. Phys., 117, 7827 (2002). The Fourier Transform Coulomb Method: Efficient and Accurate Calculation of the Coulomb Operator in a Gaussian Basis.

64. L. Füsti-Molnár and P. Pulay, J. Mol. Struct. (THEOCHEM), 666–667, 25 (2003). Gaussian-based First-principles Calculations on Large Systems Using the Fourier Transform Coulomb Method.

65. L. Füsti-Molnár and P. Pulay, J. Chem. Phys., 119, 11080 (2003). New Developments in the Fourier Transform Coulomb Method: Efficient and Accurate Localization of the Filtered Core Functions and Implementation of the Coulomb Energy Forces.

66. L. Füsti-Molnár and J. Kong, J. Chem. Phys., 122, 074108 (2005). Fast and Accurate Coulomb Calculation with Gaussian Functions.

67. R. Ahlrichs, M. Hoffmann-Ostenhof, T. Hoffmann-Ostenhof, and J. D. Morgan III, Phys. Rev. A, 23, 2106 (1981). Bounds on Decay of Electron Densities with Screening.

68. C. Ochsenfeld and M. Head-Gordon, Chem. Phys. Lett., 270, 399 (1997). A Reformulation of the Coupled Perturbed Self-consistent Field Equations Entirely Within a Local Atomic Orbital Density Matrix-based Scheme.

69. P. E. Maslen, C. Ochsenfeld, C. A. White, M. S. Lee, and M. Head-Gordon, J. Phys. Chem., 102, 2215 (1998). Locality and Sparsity of ab initio One-Particle Density Matrices and Localized Orbitals.

70. W. Kohn, Phys. Rev. Lett., 76, 3168 (1996). Density Functional and Density Matrix Method Scaling Linearly with the Number of Atoms.

71. C. Ochsenfeld, C. A. White, and M. Head-Gordon, J. Chem. Phys., 109, 1663 (1998). Linear and Sublinear Scaling Formation of Hartree-Fock-type Exchange Matrices.


72. E. Schwegler and M. Challacombe, J. Chem. Phys., 105, 2726 (1996). Linear Scaling Computation of the Hartree-Fock Exchange Matrix.

73. J. C. Burant, G. E. Scuseria, and M. J. Frisch, J. Chem. Phys., 105, 8969 (1996). A Linear Scaling Method for Hartree-Fock Exchange Calculations of Large Molecules.

74. E. Schwegler, M. Challacombe, and M. Head-Gordon, J. Chem. Phys., 106, 9708 (1997). Linear Scaling Computation of the Fock Matrix. II. Rigorous Bounds on Exchange Integrals and Incremental Fock Build.

75. E. Schwegler and M. Challacombe, Theoret. Chim. Acta, 104, 344 (2000). Linear Scaling Computation of the Hartree-Fock Exchange Matrix. III. Formation of the Exchange Matrix with Permutational Symmetry.

76. C. Ochsenfeld, Chem. Phys. Lett., 327, 216 (2000). Linear Scaling Exchange Gradients for Hartree-Fock and Hybrid Density Functional Theory.

77. H. Sambe and R. H. Felton, J. Chem. Phys., 62, 1122 (1975). A New Computational Approach to Slater’s SCF-Xα Equation.

78. C. Satoko, Chem. Phys. Lett., 82, 111 (1981). Direct Force Calculation in the Xα Method and its Application to Chemisorption of an Oxygen Atom on the Al(111) Surface.

79. R. Fournier, J. Andzelm, and D. R. Salahub, J. Chem. Phys., 90, 6371 (1989). Analytical Gradient of the Linear Combination of Gaussian-type Orbitals - Local Spin Density Energy.

80. J. A. Pople, P. M. W. Gill, and B. G. Johnson, Chem. Phys. Lett., 199, 557 (1992). Kohn-Sham Density-Functional Theory within a Finite Basis Set.

81. J. Tao, J. P. Perdew, V. N. Staroverov, and G. E. Scuseria, Phys. Rev. Lett., 91, 146401 (2003). Climbing the Density Functional Ladder: Nonempirical Meta-Generalized Gradient Approximation Designed for Molecules and Solids.

82. J. P. Perdew, A. Ruzsinszky, J. Tao, V. N. Staroverov, G. E. Scuseria, and G. I. Csonka, J. Chem. Phys., 123, 062201 (2005). Prescription for the Design and Selection of Density Functional Approximations: More Constraint Satisfaction with Fewer Fits.

83. B. G. Johnson, C. A. White, Q. Zhang, B. Chen, R. L. Graham, P. M. W. Gill, and M. Head-Gordon, in Recent Developments in Density Functional Theory, J. M. Seminario, Ed., Vol. 4, Elsevier, Amsterdam, The Netherlands, 1996, pp. 441–463. Advances in Methodologies for Linear-Scaling Density Functional Calculations.

84. G. E. Scuseria, J. Phys. Chem. A, 103, 4782 (1999). Linear Scaling Density Functional Calculations with Gaussian Orbitals.

85. A. D. Becke, J. Chem. Phys., 98, 5648 (1992). Density-Functional Thermochemistry. III. The Role of Exact Exchange.

86. P. M. W. Gill, B. G. Johnson, and J. A. Pople, Chem. Phys. Lett., 209, 506 (1993). A Standard Grid for Density Functional Calculations.

87. A. D. Becke, J. Chem. Phys., 88, 2547 (1988). A Multicenter Numerical Integration Scheme for Polyatomic Molecules.

88. X.-P. Li, R. W. Nunes, and D. Vanderbilt, Phys. Rev. B, 47, 10891 (1993). Density-matrix Electronic-structure Method with Linear System-size Scaling.

89. R. W. Nunes and D. Vanderbilt, Phys. Rev. B, 50, 17611 (1994). Generalization of the Density-matrix Method to a Nonorthogonal Basis.

90. J. M. Millam and G. E. Scuseria, J. Chem. Phys., 106, 5569 (1997). Linear Scaling Conjugate Gradient Density Matrix Search as an Alternative to Diagonalization for First Principles Electronic Structure Calculations.

91. M. Head-Gordon, Y. Shao, C. Saravanan, and C. A. White, Mol. Phys., 101, 37 (2003). Curvy Steps for Density Matrix Based Energy Minimization: Tensor Formulation and Toy Applications.

92. T. Helgaker, H. Larsen, J. Olsen, and P. Jørgensen, Chem. Phys. Lett., 327, 397 (2000). Direct Optimization of the AO Density Matrix in Hartree-Fock and Kohn-Sham Theories.


93. H. Larsen, J. Olsen, P. Jørgensen, and T. Helgaker, J. Chem. Phys., 115, 9685 (2001). Direct Optimization of the Atomic-orbital Density Matrix Using the Conjugate-gradient Method with a Multilevel Preconditioner.

94. M. Challacombe, J. Chem. Phys., 110, 2332 (1999). A Simplified Density Matrix Minimization for Linear Scaling Self-Consistent Field Theory.

95. W. Yang, Phys. Rev. Lett., 66, 1438 (1991). Direct Calculation of Electron Density in Density-Functional Theory.

96. W. Yang, J. Chem. Phys., 94, 1208 (1991). A Local Projection Method for the Linear Combination of Atomic Orbital Implementation of Density-Functional Theory.

97. Q. Zhao and W. Yang, J. Chem. Phys., 102, 9598 (1995). Analytical Energy Gradients and Geometry Optimization in the Divide-and-conquer Method for Large Molecules.

98. W. Yang and T.-S. Lee, J. Chem. Phys., 103, 5674 (1995). A Density-matrix Divide-and-conquer Approach for Electronic Structure Calculations of Large Molecules.

99. S. Goedecker and L. Colombo, Phys. Rev. Lett., 73, 122 (1994). Efficient Linear Scaling Algorithm for Tight-binding Molecular Dynamics.

100. S. Goedecker and M. Teter, Phys. Rev. B, 51, 9455 (1995). Tight-binding Electronic-structure Calculations and Tight-binding Molecular Dynamics with Localized Orbitals.

101. S. Goedecker, J. Comput. Phys., 118, 261 (1995). Low Complexity Algorithms for Electronic Structure Calculations.

102. J. Kim, F. Mauri, and G. Galli, Phys. Rev. B, 52, 1640 (1995). Total-energy Global Optimizations using Nonorthogonal Localized Orbitals.

103. F. Mauri and G. Galli, Phys. Rev. B, 50, 4316 (1994). Electronic-structure Calculations and Molecular-dynamics Simulations with Linear System-size Scaling.

104. F. Mauri, G. Galli, and R. Car, Phys. Rev. B, 47, 9973 (1993). Orbital Formulation for Electronic-structure Calculations with Linear System-size Scaling.

105. P. Ordejón, Comput. Mater. Sci., 12, 157 (1998). Order-N Tight-binding Methods for Electronic-structure and Molecular Dynamics.

106. E. Hernández and M. Gillan, Phys. Rev. B, 51, 10157 (1995). Self-consistent First-principles Technique with Linear Scaling.

107. W. Hierse and E. Stechel, Phys. Rev. B, 50, 17811 (1994). Order-N Methods in Self-consistent Density-functional Calculations.

108. S. Goedecker, Rev. Mod. Phys., 71, 1085 (1999). Linear Scaling Electronic Structure Methods.

109. S. Goedecker and G. E. Scuseria, Commun. Science & Engineering, 5, 14 (2003). Linear Scaling Electronic Structure Methods in Chemistry and Physics.

110. A. D. Daniels and G. E. Scuseria, J. Chem. Phys., 110, 1321 (1999). What Is the Best Alternative to Diagonalization of the Hamiltonian in Large Scale Semiempirical Calculations?

111. D. R. Bowler, T. Miyazaki, and M. J. Gillan, J. Phys.: Condens. Matter, 14, 2781 (2002). Recent Progress in Linear Scaling Ab Initio Electronic Structure Techniques.

112. D. R. Bowler, I. J. Bush, and M. J. Gillan, Int. J. Quantum Chem., 77, 831 (2000). Practical Methods for Ab Initio Calculations on Thousands of Atoms.

113. C. Ochsenfeld, J. Kussmann, and F. Koziol, Angew. Chem., 116, 4585 (2004); Angew. Chem. Int. Ed., 43, 4485 (2004). Ab Initio NMR Spectra for Molecular Systems with a Thousand and More Atoms: A Linear-Scaling Method.

114. M. Head-Gordon, M. S. Lee, P. E. Maslen, T. van Voorhis, and S. Gwaltney, Modern Methods and Algorithms of Quantum Chemistry, Proceedings, Second ed., J. Grotendorst, Ed., John von Neumann Institute for Computing, Jülich, Germany, NIC Series, Vol. 3, 2000, pp. 593–638. Tensors in Electronic Structure Theory: Basic Concepts and Applications to Electron Correlation Models.

References 79

115. J. A. Schouten, Tensor Analysis for Physicists, 2nd ed., Dover Publications, Mineola, NewYork, 1988.

116. M. Head-Gordon, P. E. Maslen, and C. A. White, J. Chem. Phys., 108, 616 (1998). A TensorFormulation of Many-electron Theory in a Nonorthogonal Single-particle Basis.

117. A. Messiah, Quantum Mechanics, Dover Publications, Mineola, New York, 1999.

118. R. McWeeny, Methods of Molecular Quantum Mechanics (Theoretical Chemistry), 2nd ed.,Academic Press Limited, London, United Kingdom, 1989.

119. M. S. Daw, Phys. Rev. B, 47, 10895 (1993). Model for Energetics of Solids based on theDensity Matrix.

120. R. McWeeny, Phys. Rev., 114, 1528 (1959). Hartree-Fock Theory with Nonorthogonal BasisFunctions.

121. R. McWeeny, Rev. Mod. Phys., 32, 335 (1960). Some Recent Advances in Density MatrixTheory.

122. C. A. White, P. E. Maslen, M. S. Lee, and M. Head-Gordon, Chem. Phys. Lett., 276, 133(1997). The Tensor Properties of Energy Gradients Within a Non-orthogonal Basis.

123. J. J. Sakurai,Modern Quantum Mechanics, Addison Wesley, Reading, Massachusetts, 1993.

124. W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes inFortran, 2nd ed., Cambridge University Press, Cambridge, United Kingdom, 1996.

125. P. Pulay,Mol. Phys. 17, 197 (1969). Ab initio Calculation of Force Constants and EquilibriumGeometries. I. Theory.

126. P. Pulay, in Modern Electronic Structure Theory, D. Yarkony, Ed., World Scientific,Singapore, 1995, pp. 1191–1240. Analytical Derivative Techniques and the Calculationof Vibrational Properties, in Modern Electronic Structure Theory.

127. P. Pulay, in Ab Initio Methods in Quantum Chemistry, K. P. Lawley, Ed., Wiley, New York,1987, pp. 241–286. Analytic Derivative Methods in Quantum Chemistry.

128. M. Frisch, M. Head-Gordon, and J. A. Pople, Chem. Phys., 141, 189 (1990). Direct AnalyticSCF Second Derivatives and Electric Field Properties.

129. Y. Shao, C. A. White, and M. Head-Gordon, J. Chem. Phys., 114, 6572 (2001). EfficientEvaluation of the Coulomb Force in Density Functional Theory Calculations.

130. J. Gauss, inModernMethods and Algorithms ofQuantumChemistry, Proceedings, 2nd ed., J.Grotendorst, Ed., John von Neumann Institute for Computing, Julich, Germany, NIC SeriesVol. 3, 2000, pp. 541–592. Molecular Properties.

131. J. Gerratt and I. M. Mills, J. Chem. Phys., 49, 1968 (1719). Force Constants and Dipole-Moment Derivatives of Molecules form Perturbed Hartree-Fock Calculations.

132. C. E. Dykstra and P. G. Jasien, Chem. Phys. Lett., 109, 388 (1984). Derivative Hartree-FockTheory to all Orders.

133. N. C. Handy, D. J. Tozer, G. J. Laming, C.W.Murray, and R. D. Amos, Isr. J. Chem. 33, 331(1993). Analytic Second Derivatives of the Potential Energy Surface.

134. B. G. Johnson and M. J. Fisch, J. Chem. Phys., 100, 7429 (1994). An Implementation ofAnalytic Second Derivatives of the Gradient-corrected Density Functional Energy.

135. J. A. Pople, R. Krishnan, H. B. Schlegel, and J. S. Binkley, Int. J. Quantum Chem. Symp., S13225 (1979). Derivative Studies in Hartree-Fock and Møller-Plesset Theories.

136. Y. Osamura, Y. Yamaguchi, P. Saxe, D. J. Fox, M. A. Vincent, and H. F. Schafer III.,J. Mol. Struct.: THEOCHEM, 103, 183 (1983). Analytic Second Derivative Techniquesfor Self-Consistent-Field Wave Functions. A new Approach to the Solution of the coupledPerturbed Hartree-Fock Equations.

137. J. Gauss, Ber. Bunsenges. Phys. Chem., 99, 1001 (1995). Accurate Calculation of NMRChemical Shifts.

138. T. Helgaker, M. Jaszunski, and K. Ruud, Chem. Rev., 99, 293 (1999). Ab Initio Methods forthe Calculation of NMR Shielding and Indirect Spin-Spin Coupling Constants.


139. U. Fleischer, W. Kutzelnigg, and C. van Wullen, in Encyclopedia of Computational Chem-istry, P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman,H. F. Schaefer III,and P. R. Schreiner, Eds., Wiley, Chichester, United Kingdom, 1998, pp. 1827. Ab initioNMR Chemical Shift Computation.

140. T. Helgaker, P. J. Wilson, R. D. Amos, and N. C. Handy, J. Chem. Phys., 113, 2983 (2000).Nuclear Shielding Constants by Density Functional Theory with Gauge Including AtomicOrbitals.

141. G. Schreckenbach and T. Ziegler, J. Phys. Chem., 99, 606 (1995). Calculation of NMRShielding Tensors Using Gauge-Including Atomic Orbitals and Modern Density FunctionalTheory.

142. W. Kutzelnigg, Isr. J. Chem., 19, 193 (1980). Theory of Magnetic Susceptibilities and NMRChemical Shifts in Terms of Localized Quantities.

143. A. E. Hansen and T. D. Bouman, J. Chem. Phys., 82, 5035 (1985). Localized Orbital/LocalOrigin Method for Calculation and Analysis of NMR Shieldings. Applications to 13CShielding Tensors.

144. F. London, J. Phys. Radium, 8, 397 (1937). Quantum Theory of Interatomic Currents inAromatic Compounds.

145. R. Ditchfield, Molecular Physics, 27, 789 (1974). Self-consistent Perturbation Theory ofDiamagnetism. I. A Gauge-invariant LCAO Method for N.M.R. Chemical Shifts.

146. K. Wolinski, J. F. Hinton, and P. Pulay, J. Am. Chem. Soc., 112, 8251 (1990). EfficientImplementation of the Gauge-Independent AtomicOrbitalMethod for NMRChemical ShiftCalculations.

147. J. R. Cheeseman, G. W. Trucks, T. A. Keith, and M. J. Frisch, J. Chem. Phys., 104, 5497(1996). A Comparison of Models for Calculating Nuclear Magnetic Resonance ShieldingTensors.

148. M. Haser, R. Ahlrichs, H. P. Baron, P. Weis, and H. Horn, Theoret. Chim. Acta, 83, 455(1992). Direct Computation of Second-order SCF Properties of Large Molecules on Work-station Computers with an Application to Large Carbon Clusters.

149. C. Ochsenfeld, Phys. Chem. Chem. Phys., 2, 2153 (2000). An Ab Initio Study of theRelation between NMR Chemical Shifts and Solid-State Structures: HexabenzocoroneneDerivatives.

150. C. Ochsenfeld, S. P. Brown, I. Schnell, J. Gauss, and H. W. Spiess, J. Am. Chem. Soc., 123,2597 (2001). Structure Assignment in the Solid State by the Coupling of Quantum ChemicalCalculations with NMR Experiments: A Columnar Hexabenzocoronene Derivative.

151. S. P. Brown, T. Schaller, U. P. Seelbach, F. Koziol, C. Ochsenfeld, F.-G. Klarner, and H. W.Spiess, Angew. Chem. Int. Ed., 40, 717 (2001). Structure and Dynamics of the Host-GuestComplex of a Molecular Tweezer: Coupling Synthesis, Solid-State NMR, and Quantum-Chemical Calculations.

152. C. Ochsenfeld, F. Koziol, S. P. Brown, T. Schaller, U. P. Seelbach, and F.-G. Klarner, SolidState Nucl. Magn. Reson., 22, 128 (2002). A Study of a Molecular Tweezer Host-GuestSystem by a Combination of Quantum-Chemical Calculations and Solid-State NMRExperiments.

153. J. Gauss, and J. F. Stanton, Adv. Chem. Phys. 123, 355 (2002). Electron-CorrelatedApproaches for the Calculation of NMR Chemical Shifts.

154. M. Kaupp, M. Buhl, and V. G. Malkin (Eds.), Calculation of NMR and EPR Parameters,Wiley-VCH Weinheim, 2004.

155. H. Larsen, T. Helgaker, J. Olsen, and P. Jørgensen, J. Chem. Phys., 115, 10344 (2001).Geometrical Derivatives and Magnetic Properties in Atomic-orbital Density-based Hartree-Fock Theory.

156. V.Weber andM. Challacombe, J. Chem. Phys., 123, 044106 (2005). Higher-Order Responsein O (N) by Perturbed Projection.

157. P. Pulay,Chem. Phys. Lett., 100, 151 (1983). Localizability of Dynamic Electron Correlation.

References 81

158. S. Saebø and P. Pulay,Chem. Phys. Lett., 113, 13 (1985). Local Configuration Interaction: AnEfficient Approach for Larger Molecules.

159. P. Pulay and S. Saebø, Theoret. Chim. Acta, 69, 357 (1985). Orbital-invariant Formulationand Second-order Gradient Evaluation in Møller-Plesset Perturbation Theory.

160. S. Saebø and P. Pulay, J. Chem. Phys., 86, 914 (1987). Fourth-order Møller-Plesset Perturba-tion Theory in the Local Correlation Treatment. I. Method.

161. C. Hampel and H.-J. Werner, J. Chem. Phys., 104, 6286 (1996). Local Treatment of ElectronCorrelation in Coupled Cluster Theory.

162. M. Schutz, G. Hetzer, andH.-J.Werner, J. Chem. Phys., 111, 5691 (1999). Low-order ScalingLocal Electron Correlation Methods. I. Linear Scaling Local MP2.

163. G. Hetzer, M. Schutz, H. Stoll, and H.-J. Werner, J. Chem. Phys., 113, 9443 (2000). Low-Order Scaling Local Correlation Methods II: Splitting the Coulomb Operator in LinearScaling Local Second-Order Møller-Plesset Perturbation Theory.

164. P. E. Maslen and M. Head-Gordon, Chem. Phys. Lett., 283, 102 (1998). Non-iterative LocalSecond Order Møller-Plesset Theory.

165. M. S. Lee, P. E. Maslen, and M. Head-Gordon, J. Chem. Phys., 112, 3592 (2000). CloselyApproximating Second-orderMøller-Plesset Perturbation Theory with a Local Triatomics inMolecules Model.

166. J. Almlof, Chem. Phys. Lett., 181, 319 (1991). Elimination of Energy Denominators inMøller-Plesset Perturbation Theory by a Laplace Transform Approach.

167. M. Haser and J. Almlof, J. Chem. Phys., 96, 489 (1992). Laplace Transform Techniques inMøller-Plesset Perturbation Theory.

168. M. Haser, Theoret. Chim. Acta, 87, 147 (1993). Møller-Plesset (MP2) Perturbation Theoryfor Large Molecules.

169. P. Y. Ayala and G. E. Scuseria, J. Chem. Phys., 110, 3660 (1999). Linear Scaling Second-orderMøller-Plesset Theory in the Atomic Orbital Basis for Large Molecular Systems.

170. G. E. Scuseria and P. Y. Ayala, J. Chem. Phys., 111, 8330 (1999). Linear Scaling CoupledCluster and Perturbation Theories in the Atomic Orbital Basis.

171. R. Friesner, R. B.Murphy,M. D. Beachy,M. N. Ringnalda,W. T. Pollard, B. D. Dunietz, andY. Cao, J. Phys. Chem., 103, 1913 (1999). Correlated ab Initio Electronic StructureCalculations for Large Molecules.

172. D. Walter, A. B. Szilva, K. Niedfeldt, and E. A. Carter, J. Chem. Phys., 117, 1982 (2002).Local Weak-pairs Pseudospectral Multireference Configuration Interaction.

173. M. Schutz and H.-J. Werner, J. Chem. Phys., 114, 661 (2001). Low-order Scaling LocalElectron Correlation Methods. IV. Linear Scaling Local Coupled-Cluster (LCCSD).

174. M. Schutz, J. Chem. Phys., 116, 8772 (2002). Low-order Scaling Local Electron CorrelationMethods. V. Connected Triples beyond (T): Linear Scaling Local CCSDT-1b.

175. G. E. Scuseria and P. Y. Ayala, J. Chem. Phys., 111, 8330 (1999). Linear Scaling CoupledCluster and Perturbation Theories in the Atomic Orbital Basis.

176. P. Knowles, M. Schutz, and H.-J. Werner, in Modern Methods and Algorithms of QuantumChemistry, Proceedings, Second ed., J. Grotendorst, Ed., John von Neumann Institute forComputing, Julich, Germany NIC Series, Vol. 3, 2000, pp. 97–197. Ab Initio Methods forElectron Correlation in Molecules.

177. J. L. Whitten, J. Chem. Phys., 58, 4496 (1973). Coulombic Potential Energy Integrals andApproximations.


CHAPTER 2

Conical Intersections in Molecular Systems

Spiridoula Matsika

Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122

INTRODUCTION

The study of molecular systems using quantum mechanics is based on the Born–Oppenheimer approximation.¹ This approximation relies on the fact that the electrons, because of their smaller mass, move much faster than the heavier nuclei, so they follow the motion of the nuclei adiabatically, whereas the latter move on the average potential of the former. The Born–Oppenheimer approximation is sufficient to describe most chemical processes. In fact, our notion of molecular structure is based on the Born–Oppenheimer approximation, because the molecular structure is formed by nuclei being placed in fixed positions. There are, however, essential nonadiabatic processes in nature that cannot be described within this approximation. Nonadiabatic processes are ubiquitous in photophysics and photochemistry, and they govern such important phenomena as photosynthesis, vision, and charge-transfer reactions.

Based on the Born–Oppenheimer approximation, the behavior of molecules is described by the dynamics of the nuclei moving along a single potential energy surface (PES) generated by the electrons. Nonadiabatic phenomena occur when at least two potential energy surfaces approach each other and the coupling between them becomes important. The traditional way of studying nonadiabatic phenomena involves the concepts of avoided and intersystem crossings. As two PESs approach each other, the rate of nonadiabatic processes depends on the energy separating those two surfaces. In recent years, ultra-fast experimental techniques have allowed the observation of nonadiabatic processes that take place in femtoseconds,² and these ultra-fast rates cannot be explained with the traditional theories. Conical intersections, which are the actual crossings of two PESs, can however facilitate rapid nonadiabatic transitions. Conical intersections have been known mathematically since the 1930s,³,⁴ but they were regarded as mathematical curiosities rather than a useful concept for explaining photochemistry. The reason for this neglect is stated in a review written by Michl in 1974 that summarized the conception of conical intersections at that time.⁵ Michl stated that true surface touching “is a relatively uncommon occurrence and along most paths such crossings, even if ‘intended’, are more or less strongly avoided.”

The modern era of nonadiabatic studies started in the 1990s when algorithms were developed that allowed for the location of conical intersections without the presence of symmetry.⁶,⁷ These algorithms have since revealed that conical intersections occur in the excited states of many molecules and are far from uncommon.⁸⁻¹⁴ In fact, Mead and Truhlar have shown that if an avoided crossing is found, it is more likely that a true conical intersection will be close by.¹⁵ We have progressed a long way since the first theoretical descriptions of conical intersections, and the abstract mathematical formulations of the previous century can now be used to study, even in quantitative terms, systems important in real life. Conical intersections can, and do, affect the photophysics and photochemistry of systems, or their spectroscopy, especially when the ground state is one of the intersecting states.

In the last few years, this field has become a “hot” area with a growing appreciation by scientists of the importance of conical intersections in chemical dynamics. Several reviews have been written on the subject,⁸⁻¹⁴,¹⁶ and a book was published recently giving the theoretical formulation for conical intersections in structure and dynamics.¹⁷ A recent Faraday Discussion meeting brought together leaders in the field to discuss the current state of nonadiabatic methods, where conical intersections played a central role.¹⁸ In this pedagogically driven chapter, we present a basic introduction to the field and provide some examples that illustrate how conical intersections can explain the mechanism of nonadiabatic processes. The last section of this review presents recent developments that have extended the computational tools beyond the most common case of two nonrelativistic intersecting states. This section includes cases of (1) three nonrelativistic intersecting states, and (2) two intersecting states that incorporate the spin-orbit coupling. This chapter is not intended as a comprehensive review of the field.


GENERAL THEORY

The Born–Oppenheimer Approximation and its Breakdown: Nonadiabatic Processes

The time-independent Schrödinger equation for a molecule with $N$ nuclei and $M$ electrons is

$$ H(\mathbf{r};\mathbf{R})\,\Psi(\mathbf{r};\mathbf{R}) = E\,\Psi(\mathbf{r};\mathbf{R}) \qquad [1] $$

where $\mathbf{R}$ denotes all the nuclear coordinates and $\mathbf{r}$ all the electronic coordinates. The total nonrelativistic Hamiltonian of the system is given by

$$ H(\mathbf{r};\mathbf{R}) = T_{\mathrm{nuc}} + H^e(\mathbf{r};\mathbf{R}) \qquad [2] $$

where $T_{\mathrm{nuc}}$ is the nuclear kinetic energy operator and $H^e(\mathbf{r};\mathbf{R})$ is the electronic Hamiltonian, which includes the electronic kinetic energy and the Coulomb interactions between the particles. $H^e(\mathbf{r};\mathbf{R})$ depends parametrically on $\mathbf{R}$. Within the Born–Oppenheimer (adiabatic) approximation,¹ the coupling between nuclear and electronic degrees of freedom is ignored and the total wavefunction is assumed to be a product of a nuclear wavefunction $\chi(\mathbf{R})$ and an electronic wavefunction $\psi(\mathbf{r};\mathbf{R})$,

$$ \Psi(\mathbf{r};\mathbf{R}) = \psi(\mathbf{r};\mathbf{R})\,\chi(\mathbf{R}) \qquad [3] $$

Thus, the nuclear and electronic parts are separated, and solving the electronic Schrödinger equation provides the electronic eigenfunctions $\psi_I$

$$ H^e(\mathbf{r};\mathbf{R})\,\psi_I(\mathbf{r};\mathbf{R}) = E^e_I(\mathbf{R})\,\psi_I(\mathbf{r};\mathbf{R}) \qquad [4] $$

Inserting the electronic solution back into the Schrödinger equation for the whole system, Eq. [1], and neglecting the effect of the nuclear kinetic energy operator on the electronic wavefunction, $\psi_I$, gives

$$ (T_{\mathrm{nuc}} + E^e_I)\,\chi_I = E\,\chi_I \qquad [5] $$

which will provide the nuclear wavefunction $\chi_I$ and total energy $E$.

Solving Eqs. [4] and [5] is the task of theoretical chemistry. Electronic structure methods capable of solving the electronic problem have progressed enormously during the past 40 years and standardized computational models have emerged. John Pople received the Nobel Prize for Chemistry in 1998¹⁹ for being one of the pioneers of this evolution. Solution of the electronic part of the Hamiltonian provides structures, reaction paths and transition states for the study of chemical reactions, electronic energies for obtaining spectra, and many other static properties. To understand the detailed dynamics of chemical systems, however, the nuclear equation also has to be solved. The solution of this part of the Schrödinger equation has not been standardized yet. Furthermore, the quantum solution of Eq. [5] is so cumbersome that only molecules with a few atoms can be solved quantum mechanically. In most other cases, a classical or semi-classical method has to be employed in order to study the dynamics of nuclei.
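As a minimal sketch of what solving Eq. [5] involves in the simplest possible case, the following assumes a single nuclear coordinate, atomic units, a reduced mass of 1, and a hypothetical harmonic potential standing in for the electronic energy. The function names, the grid, and the three-point finite-difference kinetic energy are illustrative choices, not a method taken from this chapter.

```python
import numpy as np

def nuclear_levels(potential, mu=1.0, x_min=-8.0, x_max=8.0,
                   n_points=600, n_levels=3):
    """Lowest eigenvalues of (T_nuc + E^e) chi = E chi on a 1D grid."""
    x = np.linspace(x_min, x_max, n_points)
    dx = x[1] - x[0]
    # Three-point finite-difference Laplacian; chi is taken to vanish
    # just outside the grid (an assumption that suits bound states).
    laplacian = (np.diag(np.full(n_points, -2.0))
                 + np.diag(np.ones(n_points - 1), 1)
                 + np.diag(np.ones(n_points - 1), -1)) / dx**2
    hamiltonian = -laplacian / (2.0 * mu) + np.diag(potential(x))
    return np.linalg.eigvalsh(hamiltonian)[:n_levels]

def harmonic(x, k=1.0):
    """Hypothetical model potential standing in for E^e_I(R)."""
    return 0.5 * k * x**2

levels = nuclear_levels(harmonic)
print(levels)  # close to the analytic values 0.5, 1.5, 2.5
```

The dense matrix grows with the number of grid points raised to the number of coordinates, which is one way to see why the full quantum treatment is restricted to molecules with a few atoms.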

The Born–Oppenheimer approximation assumes that the electronic and nuclear motion are well separated and they do not interact; but this assumption is not always true. In a more rigorous treatment, the total wavefunction is not a product of the electronic and nuclear wavefunctions but rather an expansion in terms of the electronic wavefunctions $\psi_I(\mathbf{r};\mathbf{R})$²⁰

$$ \Psi(\mathbf{r};\mathbf{R}) = \sum_I \psi_I(\mathbf{r};\mathbf{R})\,\chi_I(\mathbf{R}) \qquad [6] $$

where $\chi_I(\mathbf{R})$ are expansion coefficients. The electronic wavefunctions are obtained by solving the electronic equation (Eq. [4]), and, because they form a complete set, the above expansion is exact when not truncated.²¹ The expansion coefficients $\chi_I$ can be obtained by inserting Eq. [6] into Eq. [1], multiplying by $\psi_I^*$, and integrating over electronic coordinates. The Schrödinger equation then becomes

$$ \left[ T_{\mathrm{nuc}} - \frac{1}{2\mu} K_{II}(\mathbf{R}) + E^e_I(\mathbf{R}) \right] \chi_I(\mathbf{R}) - \sum_{J \neq I} \frac{1}{2\mu} \left[ K_{IJ}(\mathbf{R}) + 2\,\mathbf{f}_{IJ}(\mathbf{R}) \cdot \nabla \right] \chi_J(\mathbf{R}) = E\,\chi_I(\mathbf{R}) \qquad [7] $$

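where the nuclear kinetic energy is taken as $T_{\mathrm{nuc}} = -\frac{1}{2\mu}\nabla^2$, $\nabla$ refers to the gradient over the nuclear coordinates $\mathbf{R}$, and $\mu$ is a reduced mass. $K_{IJ}$ and $\mathbf{f}_{IJ}$ are coupling terms that were neglected in the Born–Oppenheimer approximation; they are responsible for nonadiabatic transitions between different states $I$ and $J$. They originate from the nuclear kinetic energy operator operating on the electronic wavefunctions $\psi_I(\mathbf{r};\mathbf{R})$ and are given by

$$ \mathbf{f}_{IJ}(\mathbf{R}) = \langle \psi_I(\mathbf{r};\mathbf{R}) \,|\, \nabla \psi_J(\mathbf{r};\mathbf{R}) \rangle \qquad [8] $$

and

$$ K_{IJ}(\mathbf{R}) = \langle \psi_I(\mathbf{r};\mathbf{R}) \,|\, \nabla^2 \psi_J(\mathbf{r};\mathbf{R}) \rangle \qquad [9] $$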

The brackets in Eqs. [8] and [9] denote integration over electronic coordinates $\mathbf{r}$. The diagonal term $K_{II}$ corresponds to nonadiabatic corrections to a single potential energy surface that can usually be neglected. $\mathbf{f}_{IJ}(\mathbf{R})$ is the derivative coupling, a vector of dimension $N^{\mathrm{int}} = 3N - 6$, where $N$ is the number of atoms in the molecule. The diagonal term $\mathbf{f}_{II}$ is zero for real wavefunctions and $K_{IJ}$ can be expressed in terms of $\mathbf{f}_{IJ}$.²¹ The derivative coupling $\mathbf{f}_{IJ}$ is a measure of the variation of the electronic wavefunction with nuclear coordinates and depends on the energy difference between states $I$ and $J$ (see the section on Derivative Coupling). When the couplings $K_{IJ}$ and $\mathbf{f}_{IJ}$ are neglected, Eq. [7] reduces to that derived from the Born–Oppenheimer approximation. When the states are well separated, the coupling is small and the Born–Oppenheimer approximation is valid. If, however, the electronic eigenvalues are close, a small change in the nuclear coordinates may cause a large change in the electronic wavefunctions, a situation where the coupling becomes important and the more general Eq. [7] has to be used. Usually, only a small number of electronic states are close in energy, and the expansion of the total wavefunction is truncated to a small number of interacting states, most often two.
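The origin of the coupling terms in Eq. [7] can be traced in a few lines. The sketch below assumes real, orthonormal electronic eigenfunctions; the last relation is the standard off-diagonal Hellmann–Feynman-type result for exact eigenfunctions, quoted here as background to show the dependence on the energy difference rather than taken from this section.

```latex
% Product rule for the nuclear kinetic energy acting on one term of Eq. [6]
T_{\mathrm{nuc}}\,(\psi_J \chi_J)
  = -\frac{1}{2\mu}\left[ \psi_J \nabla^2 \chi_J
      + 2\,(\nabla \psi_J)\cdot(\nabla \chi_J)
      + (\nabla^2 \psi_J)\,\chi_J \right]

% Project onto \psi_I, using \langle \psi_I | \psi_J \rangle = \delta_{IJ}
% and the definitions in Eqs. [8] and [9]
\langle \psi_I | T_{\mathrm{nuc}} | \psi_J \chi_J \rangle
  = \delta_{IJ}\, T_{\mathrm{nuc}} \chi_J
    - \frac{1}{2\mu}\left[ 2\,\mathbf{f}_{IJ}\cdot\nabla + K_{IJ} \right] \chi_J

% Real, normalized \psi_I: differentiate the normalization condition
\nabla \langle \psi_I | \psi_I \rangle
  = 2\,\langle \psi_I | \nabla \psi_I \rangle = 0
  \;\Rightarrow\; \mathbf{f}_{II} = 0

% Differentiate \langle \psi_I | H^e | \psi_J \rangle = 0 for I \neq J
\mathbf{f}_{IJ}
  = \frac{\langle \psi_I | \nabla H^e | \psi_J \rangle}{E^e_J - E^e_I}
```

Summing the second line over $J$ and adding the electronic energy reproduces the structure of Eq. [7]; the denominator in the last line shows why the coupling grows as two states approach each other.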

Adiabatic-Diabatic Representation

In Eq. [7], the electronic wavefunctions are taken as the eigenfunctions of the electronic Hamiltonian. In this case, all the coupling matrix elements $H_{IJ} = \langle \psi_I | H^e | \psi_J \rangle$ are zero, and the coupling between different electronic states occurs through the nuclear kinetic energy terms; this formulation is called the adiabatic representation.¹⁷

Alternatively, a diabatic representation can be used.²²⁻²⁶ In this representation, the electronic wavefunctions used to expand the total wavefunction are not the eigenfunctions of the electronic Hamiltonian, but they are chosen so as to eliminate the derivative coupling. Therefore, the coupling terms do not appear in the Schrödinger equation, but the matrix element $H_{IJ} = \langle \phi_I | H^e | \phi_J \rangle$ is nonzero, which is the term responsible for the coupling of states,

$$ [T_{\mathrm{nuc}} + H_{II}]\,\chi_I + \sum_{J (\neq I)} H_{IJ}\,\chi_J = E\,\chi_I \qquad [10] $$

Note that $\phi_I$ are electronic wavefunctions that are not eigenfunctions of $H^e$. In every realistic case (except in diatomic molecules) where the sum over states $J$ is truncated, the derivative coupling cannot vanish completely for every $\mathbf{R}$,²⁷ but it can become negligibly small. Making $\mathbf{f}_{IJ}$ very small corresponds to choosing the electronic wavefunctions so that they are always smooth functions of the nuclear coordinates. Physically, diabatic functions maintain the character of the states. For example, assume that a diabatic state $\phi_1$ corresponds to a covalent configuration and another diabatic state $\phi_2$ corresponds to an ionic configuration. Before a nonadiabatic transition occurs, the adiabatic states will be the same, $\psi_1 = \phi_1$ and $\psi_2 = \phi_2$. After a nonadiabatic transition, $\phi_1$ and $\phi_2$ remain as the covalent and ionic configurations, respectively, but the adiabatic states will have switched, i.e., $\psi_1 = \phi_2$ and $\psi_2 = \phi_1$.
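The covalent/ionic picture can be illustrated numerically. The sketch below assumes a hypothetical one-dimensional model in which the two diabatic energies cross linearly (H11 = r, H22 = -r) with a small constant coupling H12; the parameter values and function name are illustrative and do not come from this chapter.

```python
import numpy as np

def lower_adiabatic_weights(r, coupling=0.05):
    """Squared diabatic coefficients (phi_1, phi_2) of the lower adiabatic state."""
    # Hypothetical diabatic Hamiltonian: H11 = r and H22 = -r cross at r = 0.
    h_diabatic = np.array([[r, coupling],
                           [coupling, -r]])
    _, vectors = np.linalg.eigh(h_diabatic)  # eigenvalues in ascending order
    return vectors[:, 0] ** 2

for r in (-1.0, 0.0, 1.0):
    w1, w2 = lower_adiabatic_weights(r)
    print(f"r = {r:+.1f}: weight on phi_1 = {w1:.3f}, weight on phi_2 = {w2:.3f}")
```

Far to the left of the crossing the lower adiabatic state is almost purely the first diabatic state, far to the right it is almost purely the second, and at the crossing the two are mixed equally. The diabatic states keep their character throughout while the adiabatic state changes it.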

The problem with the diabatic representation is that, as already mentioned, it is not possible to have the nonadiabatic couplings zero for every $\mathbf{R}$, so a strictly diabatic representation does not exist. Furthermore, although the adiabatic representation is unique and well defined (by the diagonalization of the electronic Hamiltonian), the same is not true for the diabatic representation. The diabatic representation has advantages, however, that make it the method of choice for studying nuclear dynamics in many cases. In the adiabatic representation, the coupling term is a vector, whereas in the diabatic representation, it is only a scalar and hence much easier to use. The diabatic representation is smooth, but the nonadiabatic couplings in the adiabatic representation have singularities at the conical intersections. Many schemes for the construction of diabatic states have been developed, and detailed discussions about their construction can be found in several reviews.²⁵,²⁶

The Noncrossing Rule

Von Neumann and Wigner proved, in their seminal work in 1929,³ that, for a molecular system with $N^{\mathrm{int}}$ internal nuclear coordinates ($N^{\mathrm{int}} = 3N - 6$), two electronic surfaces become degenerate in a subspace of dimension $N^{\mathrm{int}} - 2$. To illustrate this dimensionality rule, consider two intersecting adiabatic electronic states, $\psi_1$ and $\psi_2$. These two states are expanded in terms of two diabatic states $\phi_1$ and $\phi_2$, which are orthogonal to all the remaining electronic states and to each other,²⁸

$$ \psi_1 = c_{11}\,\phi_1 + c_{21}\,\phi_2 \qquad [11] $$

$$ \psi_2 = c_{12}\,\phi_1 + c_{22}\,\phi_2 \qquad [12] $$

The electronic energies are the eigenvalues of the Hamiltonian matrix

$$ \mathbf{H}^e = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix} \qquad [13] $$

where $H_{ij} = \langle \phi_i | H^e | \phi_j \rangle$. The eigenvalues of $\mathbf{H}^e$ are given by

$$ E_{1,2} = \bar{H} \pm \sqrt{\Delta H^2 + H_{12}^2} \qquad [14] $$

where $\bar{H} = (H_{11} + H_{22})/2$ and $\Delta H = (H_{11} - H_{22})/2$. The eigenfunctions are

$$ \psi_1 = \cos(\alpha/2)\,\phi_1 + \sin(\alpha/2)\,\phi_2 \qquad [15] $$

$$ \psi_2 = -\sin(\alpha/2)\,\phi_1 + \cos(\alpha/2)\,\phi_2 \qquad [16] $$


where $\alpha$ satisfies

$$ \sin\alpha = \frac{H_{12}}{\sqrt{\Delta H^2 + H_{12}^2}} \qquad [17] $$

$$ \cos\alpha = \frac{H_{11} - H_{22}}{2\sqrt{\Delta H^2 + H_{12}^2}} \qquad [18] $$

For the eigenvalues of this matrix to be degenerate, two conditions must be satisfied

$$ H_{11} - H_{22} = 0 \qquad [19] $$

$$ H_{12} = 0 \qquad [20] $$

In an $N^{\mathrm{int}}$-dimensional space, the two conditions are satisfied in an $(N^{\mathrm{int}} - 2)$-dimensional subspace. This subspace, where the states are degenerate, is called the seam space. The two-dimensional space orthogonal to it, where the degeneracy is lifted, is called the branching or g–h space.¹¹,²⁸ So, conical intersections are not isolated points in space. Rather, they are an infinite number of connected points forming the seam. For a diatomic molecule that has only one degree of freedom, it is not possible for two electronic states of the same symmetry to become degenerate, and this restriction is often called the noncrossing rule. For polyatomic molecules, in contrast, there exist enough nuclear degrees of freedom and their electronic states can thus become degenerate,⁴ although the above rule does not guarantee this degeneracy will happen, i.e., that there exists a solution for Eqs. [19] and [20].
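The two-state formulas of Eqs. [14] to [18] are easy to check numerically. The following is a minimal sketch assuming real matrix elements with H12 = H21; the function name and the sample numbers are illustrative only.

```python
import numpy as np

def two_state(h11, h22, h12):
    """Energies and eigenvectors of the 2x2 model, following Eqs. [14]-[18]."""
    h_bar = 0.5 * (h11 + h22)
    delta_h = 0.5 * (h11 - h22)
    radius = np.hypot(delta_h, h12)       # sqrt(dH^2 + H12^2)
    # arctan2 fixes alpha so that sin(alpha) = H12/radius, cos(alpha) = dH/radius.
    alpha = np.arctan2(h12, delta_h)
    energies = (h_bar + radius, h_bar - radius)
    psi1 = np.array([np.cos(alpha / 2), np.sin(alpha / 2)])
    psi2 = np.array([-np.sin(alpha / 2), np.cos(alpha / 2)])
    return energies, (psi1, psi2)

# Sample (hypothetical) matrix elements in arbitrary energy units.
h = np.array([[1.0, 0.3],
              [0.3, 0.2]])
(e1, e2), (psi1, psi2) = two_state(h[0, 0], h[1, 1], h[0, 1])
print(np.allclose(h @ psi1, e1 * psi1), np.allclose(h @ psi2, e2 * psi2))
print(e1 - e2)  # the gap vanishes only if both H11 - H22 and H12 are zero
```

Because the gap is twice the square root of a sum of two squares, it can vanish only when both terms vanish, which is the content of Eqs. [19] and [20].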

The Geometric Phase Effect

It was first pointed out by Longuet-Higgins and Herzberg²⁹,³⁰ that a real electronic wavefunction changes sign when traversing around a conical intersection. Mead and Truhlar³¹ incorporated this geometric phase effect into the single electronic state problem and Berry generalized the theory.³² As a result of his work, this effect is often called the Berry phase.³² Equations [15] and [16] illustrate this effect. When $\alpha$ changes from $\alpha$ to $\alpha + 2\pi$, the wavefunctions change sign, i.e., $\psi_1(\alpha + 2\pi) = -\psi_1(\alpha)$ and $\psi_2(\alpha + 2\pi) = -\psi_2(\alpha)$.³³ As the total wavefunction must be single valued, the electronic wavefunction in the adiabatic representation should be multiplied by a phase factor ensuring that the total wavefunction remains single valued. As a consequence, the geometric phase can affect nuclear dynamics even when a single potential energy surface is considered.³⁴⁻³⁶ The geometric phase effect can be considered as a signature of conical intersections and its presence is a proof that a true conical intersection has been found.
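The sign change can be observed directly by following a real eigenvector around a closed loop. The sketch below assumes a hypothetical linear two-state model with matrix elements (g x, h y; h y, -g x), with the intersection at the origin; the loop radius, step count, and function name are illustrative choices.

```python
import numpy as np

def transported_overlap(radius=0.1, g=1.0, h=1.0, n_steps=400, encircle=True):
    """Follow the lower eigenvector around a closed loop; return <start|end>."""
    # A loop shifted away from the origin does not enclose the intersection.
    center_x = 0.0 if encircle else 3.0 * radius
    start = None
    previous = None
    for theta in np.linspace(0.0, 2.0 * np.pi, n_steps + 1):
        x = center_x + radius * np.cos(theta)
        y = radius * np.sin(theta)
        hamiltonian = np.array([[g * x, h * y],
                                [h * y, -g * x]])
        _, vectors = np.linalg.eigh(hamiltonian)
        current = vectors[:, 0]
        if previous is None:
            start = current
        elif np.dot(previous, current) < 0.0:
            # eigh returns an arbitrary overall sign; enforce continuity.
            current = -current
        previous = current
    return float(np.dot(start, previous))

print(transported_overlap(encircle=True))   # close to -1: sign change
print(transported_overlap(encircle=False))  # close to +1: no sign change
```

Comparing the two loops is the practical point: the sign flip appears only when the path encloses the point of degeneracy, which is why it can serve as a diagnostic for a true intersection.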


Conical Intersections and Symmetry

Symmetry-Required Conical Intersections: The Jahn–Teller Effect

Degenerate electronic states may exist when a molecular system has high symmetry. In this case, the requirements for degeneracy (Eqs. [19] and [20]) are satisfied by symmetry alone. The Jahn–Teller effect, which refers to these symmetry-induced degenerate states, has been known and studied for a long time. The Jahn–Teller theorem, published in 1937,³⁷,³⁸ states that a molecule in an orbitally degenerate electronic state is unstable and will distort geometrically to lift the degeneracy. Several books focusing on the Jahn–Teller effect have been written.³⁹⁻⁴¹ Bersuker contributed to many of these volumes, and he recently published a review on the many advances made in this area.⁴² Common examples of the Jahn–Teller problem include the doubly degenerate $E \otimes e$ problem and the triply degenerate $T \otimes (e + t)$ problem. In a classic Jahn–Teller problem, such as the “Mexican hat” in the $E \otimes e$ problem, analytic expressions are used to model the region around the degeneracy. In this manner, bound vibronic states can be derived, which provide a means for experimental verification and study of conical intersections.⁴³ A linear system does not exhibit the Jahn–Teller effect. Instead, these systems display a similar effect, the Renner–Teller effect, where the first-order coupling is zero, and the degeneracy is not lifted linearly but only in quadratic order.

Symmetry-Allowed Conical Intersections

When two states that cross are of different symmetry, the requirement that $H_{12} = 0$ is satisfied by symmetry. The second requirement (that $\Delta H = 0$) is satisfied in a subspace of dimension $N^{\mathrm{int}} - 1$, where $N^{\mathrm{int}}$ here counts only the internal coordinates that retain the symmetry. This crossing is not a conical intersection, but it becomes one when the symmetry-breaking coordinate that can couple the two states is included. For example, $\mathrm{BH_2}$ has a crossing between states $A_1$ and $B_2$ in $C_{2v}$ symmetry.⁴⁴ In this symmetry, $\mathrm{BH_2}$ has two internal coordinates, the symmetric stretch and bending. In this space, the crossing occurs in a space of dimension $2 - 1 = 1$, i.e., a line. If the asymmetric stretch is included, the molecule has, in general, $C_s$ symmetry, and both of the states that are crossing have $A'$ symmetry. The crossing between those two states is a conical intersection of dimension $3 - 2 = 1$ (termed accidental conical intersection, as discussed below).

Accidental Conical Intersections

Most molecular systems in nature have little or no symmetry, and it is in these systems that accidental conical intersections often exist. Locating accidental points of degeneracy is more difficult than the previous cases because there is no symmetry that can be used for guidance. This difficulty, along with the misinterpretation of the noncrossing rule, delayed the appreciation of accidental conical intersections. One of the early cases where accidental conical intersections were found is ozone.⁴⁵ The global minimum of the ground state of ozone has $C_{2v}$ symmetry, but there exists a second local minimum at higher energies with $D_{3h}$ symmetry. The two ozone minima are separated by a transition state that lies close to a conical intersection between the $1\,{}^1A_1$ and $2\,{}^1A_1$ electronic states. In contrast to the previous example of $\mathrm{BH_2}$, the intersecting states of ozone belong to the same irreducible representation in $C_{2v}$ symmetry. As there are two degrees of freedom in this symmetry (the symmetric stretch and bending), the dimension of the seam is $2 - 2 = 0$, so there is only a point of degeneracy. In the lower $C_s$ symmetry, ozone has three degrees of freedom and the seam has dimension $3 - 2 = 1$. Thus, the two states cross along a line in $C_s$ symmetry, and this line contains a single point at which the molecule has $C_{2v}$ symmetry.
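The dimension counting used in the BH2 and ozone examples follows one rule: subtract the number of independent degeneracy conditions from the number of coordinates considered. A small sketch of that bookkeeping, with a hypothetical helper name:

```python
def seam_dimension(n_coordinates, coupling_zero_by_symmetry=False):
    """Dimension of the crossing subspace within the coordinates considered."""
    # In general both Eqs. [19] and [20] must hold. If H12 vanishes by
    # symmetry, only the condition H11 - H22 = 0 remains.
    n_conditions = 1 if coupling_zero_by_symmetry else 2
    # A negative result means the conditions cannot generally be met,
    # as for one coordinate and two states of the same symmetry.
    return n_coordinates - n_conditions

print(seam_dimension(2, coupling_zero_by_symmetry=True))  # BH2 in C2v: 1
print(seam_dimension(3))                                  # BH2 in Cs: 1
print(seam_dimension(2))                                  # ozone in C2v: 0
print(seam_dimension(3))                                  # ozone in Cs: 1
```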

The Branching Plane

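The matrix elements of $\mathbf{H}^e$, when expanded in a Taylor series to first order around the point of conical intersection $\mathbf{R}_0$, become²⁸,⁴⁶

$$ \bar{H}(\mathbf{R}) = \bar{H}(\mathbf{R}_0) + \nabla \bar{H}(\mathbf{R}_0) \cdot \delta\mathbf{R} \qquad [21] $$

$$ \Delta H(\mathbf{R}) = 0 + \nabla(\Delta H)(\mathbf{R}_0) \cdot \delta\mathbf{R} \qquad [22] $$

$$ H_{12}(\mathbf{R}) = 0 + \nabla H_{12}(\mathbf{R}_0) \cdot \delta\mathbf{R} \qquad [23] $$

The requirements for a conical intersection at $\mathbf{R}$ (Eqs. [19] and [20]) then become

$$ \nabla(\Delta H) \cdot \delta\mathbf{R} = 0 \qquad [24] $$

$$ \nabla H_{12} \cdot \delta\mathbf{R} = 0 \qquad [25] $$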

so that $\delta\mathbf{R}$ must be orthogonal to the subspace spanned by the vectors $\nabla(\Delta H)$ and $\nabla H_{12}$ for the degeneracy to remain. The subspace defined by the two vectors, where the degeneracy is lifted linearly, is the branching or g–h space.⁹,²⁸ The intersection-adapted coordinates are defined by the unit vectors along the energy difference gradient and the coupling gradient, respectively,

$$ \hat{\mathbf{x}} = \mathbf{g}/g = \nabla(\Delta H)/g \qquad [26] $$

$$ \hat{\mathbf{y}} = \mathbf{h}/h = \nabla H_{12}/h \qquad [27] $$

where Yarkony’s notation g ¼ rð�HÞ and h ¼ rH12 has been used.11 InEqs. [26] and [27], g, h are the norms of the corresponding vectors. Althoughthese vectors are defined here for the two-state model under consideration, thedescription can be generalized for actual ab initio wavefunctions.9 Quaside-generate perturbation theory can be used to describe the region around a

General Theory 91

conical intersection, thus providing a way to formally describe this region.47,48
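The orthogonality conditions of Eqs. [24] and [25] can be checked numerically. The following is a minimal sketch under stated assumptions: the two vectors `g` and `h` are made-up five-dimensional examples, the first-order splitting is taken as $2\sqrt{(\mathbf{g}\cdot\delta\mathbf{R})^2 + (\mathbf{h}\cdot\delta\mathbf{R})^2}$ from the linear model of Eqs. [22] and [23], and the helper names are hypothetical.

```python
import numpy as np


def first_order_gap(g, h, dR):
    """Splitting of the two states to first order in the displacement dR."""
    return 2.0 * np.hypot(g @ dR, h @ dR)


def seam_basis(g, h, tol=1e-12):
    """Orthonormal basis of displacements orthogonal to both g and h."""
    _, s, vt = np.linalg.svd(np.vstack([g, h]))  # vt is n x n
    rank = int(np.sum(s > tol))
    return vt[rank:]  # rows span the complement of the branching space


# Hypothetical vectors in a space of five internal coordinates.
g = np.array([0.13, 0.00, 0.00, 0.02, 0.00])
h = np.array([0.00, 0.05, 0.01, 0.00, 0.00])

seam = seam_basis(g, h)
gap_in_plane = first_order_gap(g, h, g / np.linalg.norm(g))
gap_along_seam = first_order_gap(g, h, seam[0])
```

With five coordinates, `seam` contains three vectors, in line with a seam of dimension $N^{int} - 2$; the gap opens linearly only for displacements with a component in the branching space.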

The Hamiltonian matrix of Eq. [13] in the branching plane becomes

$$H^e = (s_x x + s_y y)\,I + \begin{pmatrix} g x & h y \\ h y & -g x \end{pmatrix} \qquad [28]$$

where $x$, $y$ are displacements along the $\mathbf{g}$, $\mathbf{h}$ directions, respectively, $s_x$ and $s_y$ are the projections of $\nabla\bar{H}$ onto the branching plane, and $I$ is a $2\times 2$ unit matrix. The energies after diagonalization are

$$E_{1,2}(x,y) = s_x x + s_y y \pm \sqrt{(g x)^2 + (h y)^2} \qquad [29]$$

If the energy of the two states is plotted around the conical intersection along the two special coordinates, the potential will have the form of a double cone. Figure 1 shows the three-dimensional plots of the energy of two intersecting states (a) along the branching plane and (b) along one of the branching coordinates and a seam coordinate. One can see that if the $x$, $y$ axes are the branching coordinates, the degeneracy is lifted and the double cone is formed. If the $x$ coordinate is a branching coordinate but the other coordinate is a seam coordinate, the degeneracy is lifted only along one coordinate and a double wedge is formed.
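As a quick consistency check, the closed-form energies of Eq. [29] can be compared with a numerical diagonalization of the matrix in Eq. [28]. This is only an illustrative sketch; the parameter values are the ones quoted later for Figure 2, the test point is arbitrary, and the function names are hypothetical.

```python
import numpy as np


def cone_hamiltonian(x, y, g, h, sx, sy):
    """2 x 2 branching-plane Hamiltonian of Eq. [28]."""
    return (sx * x + sy * y) * np.eye(2) + np.array([[g * x, h * y],
                                                     [h * y, -g * x]])


def cone_energies(x, y, g, h, sx, sy):
    """Lower and upper sheets of the double cone, Eq. [29]."""
    mean = sx * x + sy * y
    root = np.hypot(g * x, h * y)
    return mean - root, mean + root


g, h, sx, sy = 0.13, 0.02, 0.20, 0.00  # atomic units
x, y = 0.3, -0.2                       # arbitrary point in the branching plane

numeric = np.linalg.eigvalsh(cone_hamiltonian(x, y, g, h, sx, sy))  # ascending order
analytic = cone_energies(x, y, g, h, sx, sy)
```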

Using cylindrical polar coordinates $x = r\cos\theta$, $y = r\sin\theta$, the above Hamiltonian becomes

$$H^e = r\,(s_x\cos\theta + s_y\sin\theta)\,I + r\begin{pmatrix} g\cos\theta & h\sin\theta \\ h\sin\theta & -g\cos\theta \end{pmatrix} \qquad [30]$$

Figure 1 The energies $E_1$, $E_2$ of two intersecting states plotted (a) along the branching plane and (b) along a branching coordinate and a seam coordinate. [Plots not reproduced; axis labels in the original: $x$, $y$, $r$ (a.u.) and $E$.]


From Eq. [17],

$$\tan\alpha = \frac{h\sin\theta}{g\cos\theta} = \frac{h}{g}\tan\theta \qquad [31]$$

Thus, the angle $\alpha$, which relates a diabatic representation to the adiabatic representation, is related to the angle $\theta$ defined from the intersection-adapted coordinates.
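The relation in Eq. [31] can be illustrated numerically. The sketch below assumes that the adiabatic states are obtained from the diabatic ones by a rotation through $\alpha/2$, and it uses `arctan2` to pick the quadrant of $\alpha$, which is a choice made here rather than something stated in the text. All function names are hypothetical.

```python
import numpy as np


def mixing_angle(theta, g, h):
    """Angle alpha with tan(alpha) = (h/g) tan(theta), quadrant-resolved."""
    return np.arctan2(h * np.sin(theta), g * np.cos(theta))


def traceless_hamiltonian(r, theta, g, h):
    """Second term of Eq. [30]; the part that lifts the degeneracy."""
    return r * np.array([[g * np.cos(theta), h * np.sin(theta)],
                         [h * np.sin(theta), -g * np.cos(theta)]])


def adiabatic_rotation(alpha):
    """Rotation by alpha/2 relating diabatic and adiabatic states (assumed convention)."""
    c, s = np.cos(alpha / 2), np.sin(alpha / 2)
    return np.array([[c, -s], [s, c]])


g, h = 0.13, 0.02
r, theta = 0.1, 2.0
alpha = mixing_angle(theta, g, h)
U = adiabatic_rotation(alpha)
rotated = U.T @ traceless_hamiltonian(r, theta, g, h) @ U  # should be diagonal
```

For a symmetric cone ($g = h$) the function returns $\alpha = \theta$, whereas for an asymmetric cone $\alpha$ varies unevenly as $\theta$ sweeps around the intersection.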

Characterizing Conical Intersections: Topography

Conical intersections are characterized by their topography.11,28 The topography of the PESs in the vicinity of a conical intersection plays a significant role in the ability of the conical intersection to promote a nonadiabatic transition.11,28,49–53 This topography is described to first order by the expression of the energies in Eq. [29] in terms of displacements $x$, $y$.

The topography of the cone in the branching plane is given in terms of the set of parameters $g$, $h$, $s_x$, $s_y$ defined in the previous section. The parameters $g$, $h$ give the slope of the cone in the two directions $x$, $y$, and the parameters $s_x$, $s_y$ give the tilt of the cone. A vertical or peaked conical intersection is a conical intersection in which the $s_x$ and $s_y$ parameters are zero. If one or both of these parameters is nonzero, the conical intersection will be sloped. The cone is also characterized by the difference in the slopes, $g$ and $h$. A symmetric cone is one in which the slopes $g$ and $h$ are equal, whereas an asymmetric cone has different slopes. Figure 2 shows a cone that is asymmetric and tilted mostly in one direction. The parameters for this cone in atomic units are $g = 0.13$, $h = 0.02$, $s_x = 0.20$, and $s_y = 0.00$. The three-dimensional plot is shown in panel (a), whereas the $y = 0$ and $x = 0$ planes are shown in panels (b) and (c), respectively. The cone along the $x$ direction is steep and tilted, whereas along the $y$ direction it is very flat and vertical because $s_y$ is zero (see Figure 2c). Figure 2d shows the energies around the cone as a function of the polar coordinate $\theta$.
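The classification just described can be turned into a small helper. This is a sketch only: the tolerance `tol` and the function names are assumptions, and the slopes are read directly off Eq. [29] along the positive $x$ or $y$ axis.

```python
def classify_cone(g, h, sx, sy, tol=1e-8):
    """Return (tilt, shape) labels for a cone with parameters g, h, sx, sy."""
    tilt = "peaked" if abs(sx) < tol and abs(sy) < tol else "sloped"
    shape = "symmetric" if abs(g - h) < tol else "asymmetric"
    return tilt, shape


def axis_slopes(slope, tilt):
    """Slopes of the lower and upper sheets along one positive branching axis."""
    return tilt - slope, tilt + slope


# Parameters quoted for Figure 2 (atomic units).
labels = classify_cone(0.13, 0.02, 0.20, 0.00)
slopes_x = axis_slopes(0.13, 0.20)  # both sheets rise along +x: tilted cut
slopes_y = axis_slopes(0.02, 0.00)  # equal and opposite slopes: vertical cut
```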

The topography of the cone affects the system's dynamics. Simple classical arguments can rationalize the way topography affects a trajectory: Vertical cones facilitate transitions from the upper surface to the lower surface, whereas tilted cones are less efficient.28,51 Actual quantum mechanical calculations have confirmed these generalizations.51,54 The efficacy of a conical intersection in promoting a nonadiabatic transition reflects the topography in the vicinity of a conical intersection.11

The $\mathbf{g}$ and $\mathbf{h}$ vectors represent nuclear displacements similar to the normal modes of a molecule. As the wavefunctions of the degenerate states can mix arbitrarily, these vectors are not unique. A unitary transformation can be used to rotate them in a way that makes them orthogonal to each other without changing the form of the Hamiltonian.55 The two vectors then span the branching plane and correspond to the molecular motion the system has when exiting the funnel. Examples of these vectors for conical intersections in the OHOH system and in uracil are given in Figure 3.56 If the molecule has symmetry at the point of the conical intersection, the vectors transform as irreducible representations of this group. The $\mathbf{g}$ vector is always totally symmetric because it represents the energy difference gradient of the two states. The coupling vector transforms as the direct product of the symmetry of the two states. For example, OHOH has linear symmetry, and a conical intersection between a Σ and a Π state exists.57 The tuning vector $\mathbf{g}$ is symmetric (σ), whereas the coupling vector $\mathbf{h}$ has π symmetry and distorts the linearity (Figure 3a). Uracil is planar at the conical intersection described by these vectors. The two intersecting states are A′ and A″, so the two vectors transform as a′ and a″. They are shown in Figure 3b.
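The rotation that makes $\mathbf{g}$ and $\mathbf{h}$ orthogonal can be sketched in a few lines. The assumption made here, which is not spelled out in the text, is that mixing the two degenerate states through an angle $\beta$ mixes the vectors through $2\beta$, with $\beta$ fixed by requiring the rotated vectors to have zero overlap. The example vectors are arbitrary and the function name is hypothetical.

```python
import numpy as np


def orthogonalize_gh(g, h):
    """Rotate g and h within their plane so that the new vectors are orthogonal."""
    # two_beta solves tan(4*beta) = 2 g.h / (g.g - h.h)
    two_beta = 0.5 * np.arctan2(2.0 * (g @ h), (g @ g) - (h @ h))
    c, s = np.cos(two_beta), np.sin(two_beta)
    return c * g + s * h, c * h - s * g


g = np.array([1.0, 0.2, 0.0])
h = np.array([0.3, 0.5, 0.1])
g_rot, h_rot = orthogonalize_gh(g, h)
```

Because the transformation is an orthogonal rotation within the $g$–$h$ plane, the plane itself and the sum of the squared norms are unchanged; only the choice of the two spanning vectors differs.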

After a system on the higher surface encounters a conical intersection, it can emerge through the conical intersection to the lower surface. The conical intersection tends to orient the molecular motion in the directions defined by the branching plane. Accordingly, the outcome of a photochemical reaction

Figure 2 The energies $E_1$, $E_2$ of two intersecting states plotted (a) along the branching plane, (b) along the $x$ direction, (c) along the $y$ direction, and (d) along the polar coordinate $\theta$. [Plots not reproduced; axes in the original: $E$ (a.u.) versus $x$ (a.u.), $y$ (a.u.), and $\theta$ (degrees), with curves labeled $E_1$ and $E_2$.]


and its associated branching ratios, for example, will be affected by the branching vectors and the gradient of the surfaces.14,57,58 As an example, we consider the reaction of ground state OH(X) radicals with excited OH(A) radicals.57 Figure 3a shows the $\mathbf{g}$ and $\mathbf{h}$ vectors at the Σ–Π conical intersection described above. Inspection of these vectors provides a guess as to what the products will be. Displacement along the positive tuning vector brings the two hydrogens close to oxygen, suggesting the formation of water, whereas displacement along the negative tuning direction shows the tendency to form OH + OH. Displacement along the coupling vector will bend the HOH unit, leading to formation of water. Figure 4 shows a cartoon of the possible directions the system can take and the suggested products in each case. These speculations were confirmed by calculating ab initio, gradient-directed paths.57

Figure 3 The $\mathbf{g}$ and $\mathbf{h}$ vectors defining the branching plane (a) for OHOH, (b) for uracil (reproduced with permission from Ref. 56). [Drawings not reproduced.]

Figure 4 Cartoon of the possible outcomes for the reaction OH(X) + OH(A) after emerging from a conical intersection. [Diagram not reproduced; labels in the original: axes $g$, $h$, and $E$; entry channel OH(A) + OH(X); routing effect; reaction to H2O + O; quenching to OH(X) + OH(X).]


Derivative Coupling

The efficiency of a radiationless transition between two states depends not only on the energy difference between those states but also on the derivative coupling $\mathbf{f}_{IJ}$ of the states. The derivative coupling appears in the equations that describe nonadiabatic nuclear dynamics (Eq. [7]). In the adiabatic representation, the derivative coupling is needed to carry out nuclear dynamics. The diabatic representation is defined by setting the derivative coupling equal to zero, which means that efficient ways to transform between the diabatic and adiabatic representations use the derivative coupling.26,55,59 By using the gradient operator $\nabla$ on the electronic Schrödinger equation $H^e\psi_I = E_I^e\psi_I$, multiplying by $\psi_J^*$, and integrating over electronic coordinates, Eq. [32] is obtained for the derivative coupling

$$\mathbf{f}_{IJ}(\mathbf{R}) = \frac{\langle\psi_I|\nabla H^e|\psi_J\rangle}{E_J^e - E_I^e} \qquad [32]$$

This expression shows that the derivative coupling is inversely proportional to the energy difference between the two states, so when the two states approach each other, the derivative coupling becomes large. At the conical intersection, the energy difference is zero and the derivative coupling becomes infinite.
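Eq. [32] can be checked on the two-state linear model of Eq. [28]. The sketch below, with hypothetical function names and the Figure 2 slopes as example parameters, compares the formula with a finite-difference derivative of the eigenvectors. It assumes a fixed (geometry-independent) diabatic basis, so the only contribution comes from the change of the expansion coefficients, and it fixes the arbitrary sign of the displaced eigenvectors by overlap with the reference.

```python
import numpy as np

G_SLOPE, H_SLOPE = 0.13, 0.02  # example cone slopes (a.u.)

# dH/dx and dH/dy are constant matrices for the linear model
GRAD_H = np.array([[[G_SLOPE, 0.0], [0.0, -G_SLOPE]],
                   [[0.0, H_SLOPE], [H_SLOPE, 0.0]]])


def hamiltonian(R):
    x, y = R
    return np.array([[G_SLOPE * x, H_SLOPE * y], [H_SLOPE * y, -G_SLOPE * x]])


def coupling_from_formula(R):
    """f_12 = <1|grad H|2> / (E_2 - E_1), as in Eq. [32]."""
    E, C = np.linalg.eigh(hamiltonian(R))
    numerator = np.array([C[:, 0] @ dH @ C[:, 1] for dH in GRAD_H])
    return numerator / (E[1] - E[0])


def coupling_finite_difference(R, step=1e-5):
    """f_12 = <1|grad 2> by central differences of the upper eigenvector."""
    R = np.asarray(R, dtype=float)
    _, C0 = np.linalg.eigh(hamiltonian(R))
    f = np.zeros(2)
    for k in range(2):
        d = np.zeros(2)
        d[k] = step
        shifted = []
        for sign in (1.0, -1.0):
            _, C = np.linalg.eigh(hamiltonian(R + sign * d))
            v = C[:, 1]
            shifted.append(v if v @ C0[:, 1] > 0 else -v)  # align phase
        f[k] = C0[:, 0] @ (shifted[0] - shifted[1]) / (2 * step)
    return f


R = np.array([0.2, 0.3])
f_formula = coupling_from_formula(R)
f_numeric = coupling_finite_difference(R)
```

Moving the test point ten times closer to the origin along the same ray increases the norm of the coupling tenfold, which is the $1/r$ growth that leads to the singularity at the intersection.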

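By differentiating the orthonormality condition for the wavefunctions $\psi_I$, $\langle\psi_I|\psi_J\rangle = \delta_{IJ}$, one obtains the following properties

$$\mathbf{f}_{IJ}(\mathbf{R}) = -\mathbf{f}_{JI}(\mathbf{R}) \qquad [33]$$
$$\mathbf{f}_{II}(\mathbf{R}) = 0 \qquad [34]$$

so the derivative coupling is antihermitian.

Transforming the derivative coupling to intersection-adapted coordinates and polar coordinates, the singular part at the conical intersection is restricted to the $\theta$ component.11 Using these transformations, the derivative coupling is given by

$$\frac{1}{r}f_\theta = \frac{1}{r}\left\langle\psi_1\left|\frac{\partial}{\partial\theta}\right|\psi_2\right\rangle = \frac{1}{r}\left\langle\cos(\alpha/2)\,\phi_1 + \sin(\alpha/2)\,\phi_2\left|\frac{\partial}{\partial\theta}\right|\left(-\sin(\alpha/2)\,\phi_1 + \cos(\alpha/2)\,\phi_2\right)\right\rangle \qquad [35]$$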

which becomes

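$$\frac{1}{r}f_\theta \approx \frac{1}{2r}\frac{\partial\alpha}{\partial\theta} \qquad [36]$$

if $\phi_1$ and $\phi_2$ are quasidiabatic states $\left(\left\langle\phi_1\left|\frac{\partial}{\partial\theta}\right|\phi_2\right\rangle = 0\right)$. At the conical intersection, $r = 0$ and the coupling $(1/r)f_\theta$ becomes infinite. Alternatively, one can proceed in the opposite direction and define the diabatic states $\phi_{1,2}$ by setting $f_\theta$ equal to zero and finding the necessary transformation $\alpha$.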

ELECTRONIC STRUCTURE METHODS FOR EXCITED STATES

Although nonadiabatic effects imply the breakdown of the Born–Oppenheimer approximation, they are still studied within its framework, so the electronic Schrödinger equation must be solved first. The electronic structure method that will be used to provide the energies and gradients of the states involved is very important for an accurate description of conical intersections. Ab initio electronic structure methods have been used for many years. Treating closed-shell systems in their ground state is a problem that, in many cases, can now be solved routinely by chemists using standardized methods and computer packages such as GAUSSIAN.60

In an ab initio approach, the first step is to solve the Hartree–Fock problem using a suitable basis set. In the Hartree–Fock model, each electron experiences only the average potential created by the other electrons. In reality, however, the instantaneous position of each electron depends on the instantaneous positions of the other electrons, and the Hartree–Fock model cannot account for this electron correlation. In order to obtain quantitative results, electron correlation (also referred to as dynamical correlation) should be included in the model, and there are many methods available for accomplishing this task based on either variational or perturbation principles.61 The easiest method to understand conceptually is variational configuration interaction (CI).62 In this method, the electronic wavefunction is expanded in terms of configurations that are formed from excitations of electrons from the occupied orbitals in the Hartree–Fock wavefunction to the virtual orbitals. The expansion can be written as

$$\psi_I = \sum_{a=1}^{N^{\mathrm{CSF}}} c_a^I\,\psi_a \qquad [37]$$

The basis functions of the expansion, $\psi_a$, are configuration state functions (CSFs), which are linear combinations of Slater determinants that are eigenfunctions of the spin operator and have the correct spatial symmetry and total spin of the electronic state under investigation.62 The energies and wavefunctions are provided by solving the equation

$$[H^e(\mathbf{R}) - E_I^e(\mathbf{R})]\,\mathbf{c}^I(\mathbf{R}) = 0 \qquad [38]$$


where $H^e(\mathbf{R})$ is the electronic Hamiltonian in the CSF basis. Usually, the expansion includes only single and double excitations and is referred to as CISD. If excitations from all occupied to all virtual orbitals are included, the method is a full CI (FCI) and gives the exact answer for the basis set used. In almost all cases, however, FCI yields a very large expansion and has to be truncated. A different approach for including electron correlation is perturbation theory. Perturbation models at various levels exist, denoted as MPn, where n is the order at which the perturbation expansion is terminated. The most popular model, MP2, goes up to second order in perturbation theory and is used extensively for accurate geometrical optimizations and reaction energies. A third approach to include dynamic electron correlation is the coupled cluster method.63 In this approach, the wavefunction is written as

$$\psi = e^{(T_1 + T_2)}\,\psi_0 \qquad [39]$$

where $T_1$ and $T_2$ are operators specifying single and double excitations, respectively, and $\psi_0$ is the Hartree–Fock wavefunction. If single and double excitations are included, the method is denoted CCSD, and if an approximate way to include triple excitations is used, the method is denoted CCSD(T). Coupled cluster methods represent the most sophisticated methods to account for dynamical correlation when a single electronic configuration is a good first-order description of the chemical system. Apart from the above wavefunction-based methods, dynamic correlation can be included by the use of density-based methods, the density functional approaches.64 Density functional theory (DFT) has gained unprecedented popularity in recent years because of its success in predicting ground state structures with small computational effort. However, when treating excited states, the situation becomes more complicated. The simplest method that can be used to study excited states is CIS, which is equivalent to Hartree–Fock for the ground state. This method does not give quantitative results but can be used as a starting point. More sophisticated methods are discussed below.
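The CI eigenvalue problem of Eq. [38], and the effect of truncating the expansion, can be illustrated with a toy matrix. The Hamiltonian below is synthetic (a diagonal of made-up configuration energies plus small random couplings), not a molecular Hamiltonian, and the function name is hypothetical. The only point of the sketch is the variational behaviour: eigenvalues from a truncated configuration space lie at or above the corresponding eigenvalues of the full space.

```python
import numpy as np


def solve_ci(hamiltonian):
    """Return energies (ascending) and coefficient vectors (columns) of Eq. [38]."""
    return np.linalg.eigh(hamiltonian)


rng = np.random.default_rng(0)
n_csf = 6
coupling = rng.normal(scale=0.05, size=(n_csf, n_csf))
H_full = np.diag(np.arange(n_csf, dtype=float)) + 0.5 * (coupling + coupling.T)

E_full, C_full = solve_ci(H_full)           # stands in for the full expansion
E_trunc, C_trunc = solve_ci(H_full[:3, :3])  # keep only the first three configurations
```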

Multiconfiguration Self-Consistent Field (MCSCF)

Because studying excited states requires an equivalent treatment of all the electronic states involved, single reference methods are not well suited to describing them. The most straightforward way to treat excited states is the use of multireference methods. Multireference methods are extensions of the single reference Hartree–Fock or CI methods, where many configurations are used instead of a single configuration. Multiconfigurational methods are appropriate not only for excited states but also for ground states with multiconfigurational character, i.e., where one configuration is not sufficient to describe them. These problems require treatment of the nondynamical correlation, which occurs because of near degeneracies and rearrangement of electrons within partly filled shells.62 One of the most frequently used methods to study excited states is MCSCF. In the MCSCF method, the wavefunction is written as a linear combination of CSFs, and the molecular orbitals and coefficients of the expansion are simultaneously optimized using the variational principle. The choice of which configurations to include is critical and depends on the chemical nature of the problem. This selection process is the most difficult step in setting up an MCSCF calculation, and the flexibility in the choice of the active space and its dependence on the investigator's intuition has been a criticism of these methods. As a result of this flexibility, MCSCF methods cannot be used as part of a "black box" computational procedure. The configurations are usually generated by excitations of electrons within an active space. A very useful approach is the complete active space MCSCF, designated as CASSCF.65,66 In this case, the configurations are generated by a full CI within the active space, i.e., all possible excitations of the active electrons within the active orbitals are used. Nevertheless, one must still choose the orbitals to be included in the active space. For small systems, a full valence active space can be used, where all the valence orbitals of all atoms are included. This is not possible for larger systems, and compromises between rigor and computing time have to be made. In organic aromatic molecules, for example, one can specify as the active space the π orbitals in order to study π → π* states. If lone pairs that contribute to the excited states exist, they too should be part of the active space. Other types of excited states include Rydberg states, and, in order to be studied efficiently, the active space should include the Rydberg orbitals. If both valence and Rydberg states are being considered, or if the states have mixed character, the active space should include orbitals of both types. Rydberg states require additional considerations for a proper treatment. For example, because of their diffuse character, the basis set has to include diffuse functions of the appropriate type.
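The cost of enlarging the active space can be made concrete by counting the CSFs generated by a full CI within it. The sketch below uses Weyl's dimension formula, which is standard but is not given in the text; the function name `n_csf` is hypothetical, and spatial symmetry is ignored, so the numbers are upper bounds for a given irreducible representation.

```python
from math import comb


def n_csf(n_electrons: int, n_orbitals: int, spin: float = 0.0) -> int:
    """Number of spin-adapted CSFs for n_electrons in n_orbitals with total spin S."""
    two_s = round(2 * spin)
    if (n_electrons - two_s) % 2 != 0 or two_s > n_electrons:
        raise ValueError("spin is incompatible with the number of electrons")
    lower = (n_electrons - two_s) // 2      # N/2 - S
    upper = (n_electrons + two_s) // 2 + 1  # N/2 + S + 1
    return ((two_s + 1) * comb(n_orbitals + 1, lower)
            * comb(n_orbitals + 1, upper)) // (n_orbitals + 1)


# Singlet CSF counts for a few complete active spaces (electrons, orbitals).
counts = {size: n_csf(size, size) for size in (2, 6, 10, 12, 14)}
```

The count grows from 175 singlet CSFs for six electrons in six orbitals to more than two million for fourteen electrons in fourteen orbitals, which is why full valence active spaces are restricted to small systems.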

Multireference Configuration Interaction (MRCI)

To include dynamical correlation into a quantum calculation, one must go beyond the MCSCF approach. A multireference configuration interaction model (MRCI) is a CI expansion in which many electronic configurations are used as references instead of using a single Hartree–Fock reference. The final expansion is a linear combination of all the references and of the configurations generated from single and double excitations out of these references to the virtual orbitals. The first step in an MRCI calculation is to generate the orbitals, and an MCSCF calculation is usually used for this step. When studying excited states, an average-of-states MCSCF is needed to guarantee good orbitals for all the states under consideration. When one is interested simply in excitation energies, each state can be calculated separately by optimizing the orbitals for each state. When conical intersections are involved, however, it is important to use orbitals that are common to all intersecting states. The second step in an MRCI calculation is to choose the references, which are typically determined using the same logic as in MCSCF. Technically, it is not necessary to choose the same reference as in MCSCF, but it is the most common and safest procedure to follow. The third MRCI step is to generate all of the configurations that will be used. These configurations are generated by single and double excitations to the virtual orbitals, i.e., those unoccupied in the MCSCF calculation. One way to generate these configurations is to include all single excitations out of the active space orbitals (First Order CI, FOCI), or to include all single and double excitations out of the active space orbitals (Second Order CI, SOCI). With FOCI or SOCI, all occupied orbitals that are not in the active space are frozen, meaning that there are no excitations out of those occupied orbitals. In a more general procedure, part of the occupied orbitals are frozen and part of them are allowed to participate in excitations. If the molecule is small, there is no need to freeze any orbitals, but as the size of the system under study increases, one is forced to freeze some orbitals to maintain computational tractability. Typically, the core orbitals can be frozen without significant loss of accuracy, but freezing more orbitals introduces errors that can be substantial. For example, it has been seen in organic molecules with π orbitals that, by freezing the σ orbitals, one obtains poor results. For such molecules σ–π correlation is important.67

The MRCI method is very accurate provided all the important configurations are included in the expansion. This requirement can be satisfied for small systems, but as the size of the system increases, the expansion becomes prohibitively large and truncations are necessary. If one is interested in excitation energies at a single geometry (most often vertical excitations), different expansions for the different states can be used to reduce the size of the calculation. Alternatively, the configurations can be truncated based on some selection criterion. If, however, one is interested in the excited states over a range of geometries, the relative importance of configurations changes and the truncation cannot be used easily.

An important development in MRCI involves "direct methods," in which the Hamiltonian matrix to be diagonalized is not stored on disk. Requiring the Hamiltonian matrix to be stored on disk creates a limitation on the size of the expansions that can be used. Today, the diagonalization is done directly, thus enabling expansions of millions to billions of CSFs.68 Also, analytic gradients have been developed for MRCI wavefunctions,69–72 which is a huge advantage in computing cost when using these wavefunctions for studying conical intersections. The COLUMBUS suite of programs,73 for example, has algorithms for studying conical intersections and derivative couplings74,75 that rely on analytic gradient techniques.69–71 An efficient, internally contracted MRCI method has been developed by Werner and Knowles76 and is implemented in the MOLPRO suite of programs.77 This program is widely used, but it has the disadvantage of not including analytic gradients.


Complete Active Space Second-Order Perturbation Theory (CASPT2)

A very efficient method to calculate excited states based on perturbation theory has been developed by Roos et al.78,79 This method, called CASPT2, has been implemented in the ab initio package MOLCAS.80 Other implementations of perturbation theory for excited states have also been developed81–83 and exist in other computational packages such as GAMESS.84 The CASPT2 method78,79 perturbatively computes, through second order, the dynamical correlation using a single CASSCF reference state and a non-diagonal zeroth-order Hamiltonian $H_0$. This method has been used for the study of many systems of various sizes, and it reproduces experimental excitation energies with high accuracy.85 It is the method of choice for systems with more than 10 to 15 atoms. The first step in a CASPT2 study is again to obtain the orbitals through an MCSCF method. Then the active space is selected and the second-order perturbation theory calculation is performed using the MCSCF reference wavefunction.

The CASPT2 method cannot treat near degeneracies efficiently, because, in these cases, the CASSCF wavefunction is an insufficient reference state for the perturbation calculation. Multistate perturbative methods have been developed81,86 that avoid this problem and have been found to perform well at avoided crossings and when valence-Rydberg mixing occurs.86 Serano et al.87 recently investigated the possibility of using CASPT2 and multistate CASPT2 for locating actual conical intersections. These authors relied on numerical derivatives for locating conical intersections.87 They concluded that these methods can lead to nonphysical results when small active spaces are used. Another major disadvantage in geometry optimizations using CASPT2 is that no analytical derivatives are yet available for this method. Nevertheless, CASPT2 can be used to obtain refined energies at selected points (stationary points or conical intersections) optimized at the CASSCF level. This approach is used extensively at present, and it has been very useful in nonadiabatic problems in systems of moderate size.

Single Reference Methods

A different approach for calculating excited states is based on indirect methods that allow one to calculate excitation energies based on a single reference starting point. Starting from a coupled cluster representation for the ground state, equation-of-motion coupled cluster (EOM-CCSD)88,89 can be used to provide accurate excitation energies when the reference does not have a multiconfigurational character. Variations of the method can allow for more extended problems such as bond-breaking.90 Alternatively, when the system under consideration cannot be treated at this high level of theory, time-dependent density functional theory (TDDFT)91 provides excitation energies at a cost similar to that of DFT. These methods can predict vertical excitation energies efficiently, but their extension to the description of excited state properties and PESs is more complicated and is currently under development.

Choosing Electronic Structure Methods for Conical Intersections

In summary, the choice of electronic structure method one must make for studying conical intersections should be guided by the following considerations: (1) the intersecting states must be treated equivalently; (2) analytical derivatives should be available because any analysis of conical intersections involves evaluating the gradients of the surfaces; and (3) both dynamical and nondynamical correlation should be included. It is not possible, however, to always satisfy these criteria, especially when studying large systems. Analytic gradient techniques exist for CASSCF and MRCI wavefunctions, and efficient codes have been developed for locating conical intersections for both types of wavefunctions.6,7 By using the MRCI method, one can, in principle, satisfy all of the above criteria, but the scaling of CPU time with the size of the system being studied limits the applicability of this method to small- or medium-size systems. An alternative procedure, which is very common in current publications on excited state optimizations, is to choose a lower-level theory for geometry optimization but then to use a highly correlated method to obtain accurate energetics. A combination that has been used extensively for medium-size systems is to use CASSCF for the optimizations followed by CASPT2 for the energies. Although this procedure can be used to locate conical intersections, one must be careful because the location of degeneracies is more sensitive to the method used than is the location of minima. If the method chosen for optimization does not even give a correct qualitative description of the system, i.e., the correct ordering of states, then this approach will lead to wrong conical intersections.92,93 Another problem occurs when the dynamic electron correlation of the states that are crossing is substantially different. Under these conditions, the point of the conical intersection becomes an avoided crossing at the higher level of theory, with energy differences exceeding 0.5 eV.94 In most cases, the true point of conical intersection is not removed. Instead, it may be relocated at the higher level of theory at a geometry similar to that found by using the lower level of theory.94 This is, however, an empirical observation and it is not guaranteed that it will always be the case.

LOCATING CONICAL INTERSECTIONS

The development of analytic gradient techniques95 enables the efficient characterization of PESs by finding optimized structures for molecules, locating transition states, and establishing reaction pathways. Applications of analytical gradients can be carried out routinely and with great accuracy today for ground state surfaces, thus providing the means for understanding the structure of molecules and their mechanisms of reaction. Optimizing extrema of excited states, in contrast, is considerably less advanced because of the limited ab initio methodology available for studying excited states, as discussed in the previous section. To locate conical intersections efficiently, the nonadiabatic couplings are needed in addition to the gradients of the surfaces, although algorithms that do not require the coupling can be used. Analytic gradient techniques can be extended for the calculation of the nonadiabatic coupling.72,74,75

Methods for locating conical intersections have been developed based on Lagrange multiplier6,9,96–99 and projected gradient7,100 techniques. It was shown in the section on General Theory that conical intersections exist in hypersurfaces of dimension $N^{int} - 2$, so there is an infinite number of conical intersections. These algorithms seek the minimum energy point on the seam, although the seam can be mapped out along some coordinate by using geometrical constraints.

Methods that implement Lagrange multiplier techniques for constrained minimizations use the Lagrange multipliers to incorporate the constraints for conical intersections, or for geometrical constraints.6,9,96–99 In the simplest version of these algorithms, the energy of one of the states is minimized with the constraint that the energy difference between the two states is zero. In more advanced versions, the additional constraint that the coupling $H_{IJ}$ is zero is added. Starting from a point $\mathbf{R}$ not at the conical intersection, the requirements for obtaining a conical intersection are

$$\Delta E(\mathbf{R}) + \mathbf{g}(\mathbf{R})\cdot\delta\mathbf{R} = 0 \qquad [40]$$
$$\mathbf{h}(\mathbf{R})\cdot\delta\mathbf{R} = 0 \qquad [41]$$

where $\Delta E = E_I^e - E_J^e$. When both criteria are used, the following Lagrangian is formed and minimized:6,9

$$L(\mathbf{R}, \lambda_1, \lambda_2) = E_1(\mathbf{R}) + \lambda_1\,\Delta E + \lambda_2\,H_{IJ} \qquad [42]$$

where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. Additional geometrical constraints can be imposed by adding them to the Lagrangian. By searching for extrema of the Lagrangian, a Newton–Raphson equation can be set up,

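$$\begin{bmatrix} Q & \mathbf{g} & \mathbf{h} \\ \mathbf{g}^{T} & 0 & 0 \\ \mathbf{h}^{T} & 0 & 0 \end{bmatrix}\begin{bmatrix} \delta\mathbf{R} \\ \delta\lambda_1 \\ \delta\lambda_2 \end{bmatrix} = -\begin{bmatrix} \nabla L \\ \partial L/\partial\lambda_1 \\ \partial L/\partial\lambda_2 \end{bmatrix} \qquad [43]$$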

which, when solved, provides the solution $\delta\mathbf{R}$. The matrix elements of $Q$ are given by $Q_{ij} = \partial^2 L/\partial R_i\,\partial R_j$, and the relations $\partial^2 L/\partial R_i\,\partial\lambda_1 = g_i$, $\partial^2 L/\partial R_i\,\partial\lambda_2 = h_i$ have been used. This method has been implemented using analytic gradients from MRCI wavefunctions72,74,75,101 and has recently been added to the COLUMBUS suite of programs.73
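The Newton–Raphson iteration of Eq. [43] can be sketched on a toy problem. Everything in the model below is hypothetical: two displaced harmonic diabatic surfaces in three coordinates with a linear coupling of strength `C12`, treated in a fixed diabatic basis so that $E_1$, $\Delta E$, and $H_{IJ}$ are replaced by the smooth functions $H_{11}$, $H_{11} - H_{22}$, and $H_{12}$. It is not the algorithm as implemented in any program package, only an instance of the same linear system.

```python
import numpy as np

C12 = 0.2  # hypothetical coupling strength


def model(R):
    """Diabatic matrix elements and their gradients for the toy model."""
    x, y, z = R
    h11 = 0.5 * ((x - 1.0) ** 2 + y ** 2 + z ** 2)
    h22 = 0.5 * ((x + 1.0) ** 2 + y ** 2 + (z - 1.0) ** 2)
    h12 = C12 * y
    grad11 = np.array([x - 1.0, y, z])
    grad22 = np.array([x + 1.0, y, z - 1.0])
    grad12 = np.array([0.0, C12, 0.0])
    return h11, h22, h12, grad11, grad22, grad12


def newton_step(R, lam):
    """Solve the linear system of Eq. [43] once and update R and the multipliers."""
    h11, h22, h12, grad11, grad22, grad12 = model(R)
    g = grad11 - grad22       # gradient of the energy difference
    h = grad12                # gradient of the coupling
    Q = np.eye(3)             # Hessian of L; the identity for this particular model
    K = np.zeros((5, 5))
    K[:3, :3] = Q
    K[:3, 3], K[3, :3] = g, g
    K[:3, 4], K[4, :3] = h, h
    rhs = -np.concatenate([grad11 + lam[0] * g + lam[1] * h, [h11 - h22, h12]])
    step = np.linalg.solve(K, rhs)
    return R + step[:3], lam + step[3:]


def locate_minimum_on_seam(R0, max_iter=20, tol=1e-10):
    R, lam = np.array(R0, dtype=float), np.zeros(2)
    for _ in range(max_iter):
        R_new, lam_new = newton_step(R, lam)
        converged = np.linalg.norm(R_new - R) < tol
        R, lam = R_new, lam_new
        if converged:
            break
    return R, lam


R_mex, multipliers = locate_minimum_on_seam([0.3, 0.4, -0.2])
```

Because this toy Lagrangian is quadratic with linear constraints, the iteration converges after a single step; for real surfaces, $Q$, $\mathbf{g}$, and $\mathbf{h}$ change with geometry and several iterations are needed.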

Bearpark et al. developed a method that does not use Lagrange multipliers but uses projected gradient techniques instead.7 This approach minimizes the energy difference in the plane spanned by $\mathbf{g}$ and $\mathbf{h}$ and minimizes $E_2$ in the remaining $(N^{int} - 2)$-dimensional space orthogonal to the $g$–$h$ plane.7 This method has been discussed in a previous chapter of this series.14 It uses MCSCF analytic gradients and has been implemented in the GAUSSIAN computer package.60
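The two ingredients of a projected gradient step can be sketched as follows. The projection of the upper-state gradient onto the space orthogonal to the $g$–$h$ plane follows directly from the description above; the specific weighting of the energy-difference term, $2\,\Delta E\,\mathbf{g}/g$, is an assumption about one commonly quoted form and is not taken from the text. The vectors are arbitrary examples and the function name is hypothetical.

```python
import numpy as np


def composite_gradient(grad_e2, g, h, delta_e):
    """Return (total, projected) gradients for a projected-gradient step."""
    # Orthonormal basis of the g-h plane from a reduced QR factorization.
    basis, _ = np.linalg.qr(np.column_stack([g, h]))
    projected = grad_e2 - basis @ (basis.T @ grad_e2)  # lowers E2 along the seam
    gap_term = 2.0 * delta_e * g / np.linalg.norm(g)   # closes the gap (assumed form)
    return gap_term + projected, projected


g = np.array([1.0, 0.2, 0.0, 0.0])
h = np.array([0.3, 0.5, 0.1, 0.0])
grad_e2 = np.array([0.4, -0.1, 0.2, 0.3])

total, projected = composite_gradient(grad_e2, g, h, delta_e=0.05)
total_at_seam, projected_at_seam = composite_gradient(grad_e2, g, h, delta_e=0.0)
```

At a converged point both parts vanish: the gap term because the two energies are equal and the projected part because $E_2$ is stationary within the seam.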

The derivative coupling can be calculated for CI or for MCSCF wavefunctions using analytic gradients from the expression72

$$\mathbf{f}_{IJ} = \frac{\langle\mathbf{c}^I|[\nabla H^e]|\mathbf{c}^J\rangle}{E_J^e - E_I^e} + \sum_{a,b} c_a^I\,\langle\psi_a|\nabla\psi_b\rangle\,c_b^J \qquad [44]$$

where $\psi_a$ are CSFs and $\mathbf{c}^I$ and $\mathbf{c}^J$ are the CI coefficients for the adiabatic states $I$ and $J$, respectively (see Eq. [37]). The first component in Eq. [44] corresponds to the CI contribution and is caused by the change of the CI coefficients, whereas the second component corresponds to the CSF contribution.

DYNAMICS

The electronic structure description of conical intersections provides static information about the PESs and the mechanisms for nonadiabatic processes, comparable with the way that transition states provide mechanisms for ground state problems. In many cases, this static picture is not sufficient, so the kinetic energy of the nuclei must be considered. Once the electronic structure problem has been solved, and the PESs and nonadiabatic couplings are available, the nuclear part of the problem can be solved, thereby providing information about the dynamical evolution of the system. For systems with up to four atoms, a dynamical solution can be obtained by solving the quantum mechanical Schrödinger equation.49 Quantum dynamics requires information on the global PES, which is computationally intractable for systems containing many degrees of freedom. A reduced dimensionality model can be invoked for systems where all the degrees of freedom cannot be considered, but some aspects of the dynamics may then be lost. Approximate methods, like the multiconfiguration time-dependent Hartree (MCTDH) method, can extend wavepacket propagation to larger systems.102 Systems with conical intersections and up to 24 degrees of freedom have been studied using MCTDH.102 The simplest alternative to quantum mechanics is to use classical mechanics, where many trajectories simulate the wavepacket evolution. Nonadiabatic processes describe transitions between different electronic states and, because these nonadiabatic processes cannot be treated by purely classical methods, semiclassical methods have to be used. Trajectory-Surface-Hopping (TSH)103–107 and Ehrenfest dynamics108 are very popular methods currently in use for studying dynamics this way and have been reviewed and compared by Hack and Truhlar.107 In the surface-hopping models, classical trajectories are propagated on a single PES. When the transition probability to another surface becomes larger than some criterion, the trajectory hops to the other surface. There are many variants of the hopping criterion that give rise to many surface-hopping models.103–107 In Ehrenfest dynamics, the force is obtained from an average potential obtained from the electronic structure. Both the PESs and the nonadiabatic couplings determine the trajectories.108 Another approach, Full Multiple Spawning (FMS), has been developed by Martinez and coworkers.109–111 In their method, the total wavefunction is a sum of the products of nuclear and electronic wavefunctions. The PES is recomputed on the fly by ab initio methods that are guided by molecular dynamics. In Martinez's method, the nuclear wavefunction is expanded in terms of Gaussian basis functions with time-dependent coefficients $C(t)$. The time evolution of those coefficients is determined from the time-dependent Schrödinger equation. Each Gaussian basis function has a position and a momentum that are determined by Hamilton's equations of motion. In the nonadiabatic region, new wavefunctions are "spawned" onto the other electronic states. The focus of this chapter is on the electronic structure description of conical intersections, so only a very brief summary of dynamical methods is given. Extensive discussions can be found in other resources.17
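A hopping decision of the kind used in surface-hopping models can be sketched with a stochastic criterion. The Landau–Zener expression used below for the probability is one classic choice among the many variants mentioned above; it is brought in here purely for illustration and is not singled out in the text. Atomic units are assumed ($\hbar = 1$), and the function names and numerical values are hypothetical.

```python
import math
import random

HBAR = 1.0  # atomic units


def landau_zener_probability(h12, velocity, slope_difference):
    """Probability of switching adiabatic surfaces in a one-dimensional crossing.

    h12: diabatic coupling at the crossing
    velocity: nuclear velocity along the crossing coordinate
    slope_difference: difference of the slopes of the two diabatic curves
    """
    denominator = HBAR * abs(velocity * slope_difference)
    if denominator == 0.0:
        raise ValueError("velocity and slope difference must be nonzero")
    return math.exp(-2.0 * math.pi * h12 ** 2 / denominator)


def attempt_hop(probability, rng):
    """Hop when the transition probability exceeds a uniform random number."""
    return rng.random() < probability


rng = random.Random(0)
p_weak = landau_zener_probability(0.001, 0.005, 0.1)   # weak coupling: hop is likely
p_strong = landau_zener_probability(0.02, 0.005, 0.1)  # strong coupling: hop is unlikely
hops = sum(attempt_hop(p_weak, rng) for _ in range(1000))
```

Averaged over many trajectories, the fraction of hops approaches the underlying probability, which is how an ensemble of classical trajectories mimics the branching of a wavepacket.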

APPLICATIONS

The list of systems in which conical intersections have been studied is lengthy and one cannot account for all of them in a chapter like this. Here it is pointed out that most areas of chemistry are affected by conical intersections. Examples related to the author's own research will be described in greater detail later to illustrate how conical intersections can be used to understand mechanisms of photophysical or photochemical processes.

The first accidental conical intersections based on ab initio methods were found for the triatomic systems O3,45 and LiNaK,112 and for CH4+,113 even before the availability of automatic search algorithms. Later, the availability of algorithms allowed for the study of many small systems. Systems greatly affected by conical intersections are small radicals important in atmospheric and combustion chemistry, and these systems have been studied extensively.9,16,43,114 Experimental spectroscopic studies of conical intersections are possible for Jahn–Teller systems, and typical radicals like C5H5 and C6H6+ have been studied by Miller et al.16,43 A main advantage of small systems is that they are easy to analyze theoretically and have served as prototype systems to test and extend the theory.

In the area of organic photochemistry, extensive work has been done to examine the role of conical intersections in reaction mechanisms, and several reviews have been written highlighting the importance of conical intersections in photochemical reactions.12–14,115,116 A tutorial in this book series14 discusses how to study mechanisms of photochemical reactions using conical intersections, and other books that focus on photochemistry now include this topic in their discussions.117,118 Conical intersections have been found in most photochemical reactions, such as bond-breaking, bond-making, charge transfer, photoisomerization, and intramolecular electron transfer in organic radical cations.

Conical intersections usually appear in the Jahn–Teller form in inorganic transition metal complexes because the high symmetry of such complexes allows for this symmetry-required type of conical intersection. For example, studies of complexes of metals with carbonyls revealed that conical intersections facilitate the photodissociation of CO.119 It should be noted, however, that not enough work has been done yet in this area to reveal whether accidental conical intersections exist and what role, if any, they play in photodissociation. As a result of the larger spin-orbit coupling in transition metal systems, there exists a higher probability for spin-forbidden transitions (intersystem crossing) than in nontransition metal systems. Matsunaga and Koseki have recently reviewed spin-forbidden reactions in this book series.120

Conical Intersections in Biologically Relevant Systems

One of the emerging areas in which conical intersections are important is in biological systems. Nonadiabatic processes are common in photobiology, affecting essential processes in life like photosynthesis, light harvesting, vision, and charge transfer and in the photochemical damage and repair of DNA. Conical intersections are expected to participate actively in these processes, and efforts are currently underway by several groups to study these effects.52,53,121–126 The size of these systems makes accurate quantum mechanical studies prohibitive, so, in many cases, the chromophore responsible for the photochemical behavior (which is usually a smaller molecule) is used as a model. Mixed quantum mechanical/molecular mechanical (QM/MM) methods can be used to incorporate the effect of the biological environment at the classical level. These methods often work well for biological systems where the nonadiabatic process is localized on the chromophore; however, they are of limited use when the effect is delocalized. For example, a problem such as charge transfer through DNA cannot be studied in this way, because the excited states of many chromophores participate. A few examples of these studies follow.


Vision involves cis-trans photoisomerization of a chromophore122 and many studies have been done using different models.52,53,121,123 For example, a CASSCF/AMBER procedure has been used to study the nonadiabatic dynamics of retinal in rhodopsin proteins.53 In another study, a simple model of a photosynthetic center was examined by Worth and Cederbaum.126 They proposed that the presence of conical intersections facilitated the long-range intermolecular photo-initiated electron transfer between the protein's porphyrin and a nearby quinone. Semiempirical methods and QM/MM methods have been developed by Martinez and coworkers124,127,128 to study the cis-trans isomerization dynamics of the Green Fluorescent Protein chromophore in solution, which occurs through conical intersections.124 The chromophore in this protein consists of two rings connected with a double bond and has been studied in vacuo as well.125

DNA/RNA Bases

The effect of UV radiation on DNA is of great importance because it can lead to photochemical damage. A detailed understanding of the properties and dynamics of the excited states of the DNA and RNA bases is most relevant because they are the dominant chromophores in nucleic acids. It has been known for years that the excited states of the nucleobases are short-lived and the quantum yields for fluorescence are very low.129–131 Recent advances in experimental techniques have enabled the accurate measurement of their excited state lifetimes132 and found them to be on the order of femtoseconds, which suggests that nonradiative relaxation proceeds to the ground state on an ultra-fast time scale with the extra energy being transformed into heat.132 The photophysical behavior of nucleobases in the gas phase and solution is currently under investigation by many theoretical and experimental groups, with many questions still needing to be addressed.

The mechanism for nonradiative decay for cytosine, adenine, and uracil has been investigated with quantum mechanical methods.56,92–94,133–137 Excited states in the nucleobases originate from electron excitations from π or lone pair n orbitals to π* orbitals. Detailed calculations have been done for cytosine addressing the involvement of conical intersections in the relaxation mechanism.92,93 The two lowest excited states are ππ* and nOπ*, with the nOπ* being slightly lower in energy than the ππ* state at the CASSCF level.92 At that level, conical intersections92 were located between the ππ* and nOπ* states, followed by an nOπ*-S0 conical intersection. These conical intersections can lead the system to the ground state and, as such, provide an explanation for the ultra-short excited state lifetimes. At the conical intersection with the ground state, cytosine is very distorted with pyramidalization of a carbon atom and extreme CO stretching. In another study, which used perturbative methods (CASPT2) to calculate the energies,93 it was found that the ππ* state is lower in energy than the nOπ* state and that only one conical intersection, ππ*-S0, exists in the pathway. Thus, this case is one in which including dynamic correlation changes the details of the relaxation mechanism.

Two different relaxation mechanisms have been proposed for adenine.135–137 Adenine has more excited states close in energy than does cytosine, thus making the theoretical calculations more complicated. One mechanism that has been put forth is similar to the one described above for cytosine and involves ring deformations leading to conical intersections of excited ππ* states with the ground state.136,137 A very different relaxation mechanism had been proposed earlier involving conical intersections with a Rydberg πσ* state dissociative along an NH bond.135 It is possible that one or the other or both mechanisms can be effective, depending on experimental conditions.

The role of conical intersections in the electronic relaxation mechanism of the excited states of uracil has been studied using MRCI ab initio methods.56 The lowest excited states are S1(nOπ*), S2(ππ*), S3(nOπ*), and S4(ππ*), with S2 having the strongest oscillator strength. Absorption of ultra-violet (UV) radiation populates this state, and an efficient relaxation mechanism involves nonadiabatic transitions to the ground state. The vertical excitation energies of the first two excited states are given in Table 1 (Re(S0)). The energies in bold in each column correspond to the state that was minimized. MRCI1 is an MRCI expansion involving only single excitations from the reference space. MRCIσπ is an MRCI expansion that includes σ-π correlation as described in the original publication.56 MRCI1 results are shown and, in parentheses, the single point energies obtained using MRCIσπ are reported. Conical intersections have been located that connect S2 with S1 and S1 with the ground state. The energies at the conical intersections are given in Table 1, where the geometry of the conical intersection between states SI and SJ is designated as Rx(ciIJ). The conical intersections between S2 and S1 are easily accessible from the Franck–Condon region at energies 0.88 eV below the vertical excitation energy to S2 at the MRCIσπ level. The geometry changes involve mainly bonds stretching or contracting. The seam of conical intersections between S2 and S1 contains points with both planar geometry and nonplanar geometry. The geometry of the minimum energy point is given in Figure 5b. In this work, the effect of moving along different directions after emerging through a conical intersection (discussed in the section entitled Characterizing Conical Intersections: Topography) was explored. The vectors defining the branching plane of the S1-S2 conical intersection are shown in Figure 3b. A gradient minimized pathway starting along one of those vectors leads to the minimum of the S1 surface. Another pathway, however, leads to a conical intersection between S1 and S0, which is located ca. 4.12 eV above the minimum of the ground state at the MRCIσπ level of theory. The geometry is highly distorted with carbon pyramidalization as shown in Figure 5b. Figure 5 shows how the S0 minimum and the S0-S1 conical intersection can be connected along nonplanar distortions.

Table 1 Energies in eV for the Three Lowest States of Uracil at Optimized Geometries R Obtained at the MRCI1 (MRCIσπ) Level

        Re(S0)        Re(S1)        Rx(ci21)      Rx(ci10)
S0      0             1.18          2.15 (1.87)   4.47 (3.96)
S1      5.44 (4.80)   4.35 (4.12)   5.37 (4.83)   4.47 (4.29)
S2      6.24 (5.79)   5.86          5.37 (4.97)   7.62

Reproduced with permission from Ref. 56

Moving from a single base to a base pair of adjacent bases in the DNA strand becomes computationally demanding because of the increased size of the super system. Notwithstanding, some groups have started moving their research effort in this direction.138,139 An ab initio study of guanine-cytosine suggests that after photoexcitation, a hydrogen-atom transfer reaction involves amino groups as proton donors and ring nitrogen atoms as proton acceptors. A conical intersection facilitates internal conversion to the ground state. Recently, a combined theoretical/experimental study on a model Watson–Crick base pair was published.139 That model, a cluster of 2-aminopyridine molecules, displayed short decay dynamics only when a near-planar hydrogen-bonded structure is present. The fast relaxation in that system is facilitated first by a conical intersection of a locally excited ππ* state to a charge-transfer state with a biradical character, and then by a conical intersection of the charge-transfer state with the ground state.139

[Figure 5: (a) plot of energy (eV) versus the dihedral angle C6C5C4H5 (deg), with curves labeled S0, S1 (ππ*), S1 (nOπ*), S2 (ππ*), and S2 (nOπ*) and the points S0(min) and ci10 marked; (b) structures of uracil labeled S0 minimum, S1-S2 CI, S1 minimum, and S0-S1 CI, with atoms numbered N1, C2, N3, C4, C5, C6, O7, O8, H1, H3, H5, and H6.]

Figure 5 (a) Pathway from a displacement along the g direction of the conical intersection S1-S0. Following the gradient of the S0 surface leads to the S0 minimum. The energies of the S0, S1, and S2 states relative to the minimum of S0 are plotted as a function of a dihedral angle. Reproduced with permission from Ref. 56. (b) Geometries of uracil at the minima of S0, S1 and at the conical intersections S2-S1 and S1-S0.

BEYOND THE DOUBLE CONE

Three-State Conical Intersections

The discussion so far has focused on two-state conical intersections, which are the most common conical intersections, and which have been studied extensively. Three-state degeneracies imposed by symmetry have been studied in the context of the Jahn–Teller problem for many decades,39–41 but only minor attention had been given to accidental three-state degeneracies in molecules until recently.113 As most molecular systems in nature have low or no symmetry, these accidental intersections may have a great impact on the photophysics and photochemistry of those molecular systems, as has been found for accidental two-state intersections.11,14,17,117 Three-state degeneracies may provide a more efficient relaxation pathway when more than one interstate transition is needed. Moreover, they introduce more complicated geometric phase effects,140–142 and they can affect the system's dynamics and pathways available for radiationless transitions.143

Extending the noncrossing rule3 to three states being degenerate can best be understood by inspection of a 3 × 3 electronic Hamiltonian matrix instead of the 2 × 2 matrix as described earlier

\[
H^{e} = \begin{pmatrix}
H_{11} & H_{12} & H_{13} \\
H_{12} & H_{22} & H_{23} \\
H_{13} & H_{23} & H_{33}
\end{pmatrix} \qquad [45]
\]

To obtain degeneracy between all three states, the following five requirements must be satisfied: (1) all three off-diagonal matrix elements have to be zero, i.e., $H_{12} = H_{13} = H_{23} = 0$; and (2) the diagonal matrix elements have to be equal, i.e., $H_{11} = H_{22} = H_{33}$, which amounts to two further conditions. In general, for an $N \times N$ matrix, N-fold degeneracy is obtained by $N - 1$ diagonal conditions and $N(N - 1)/2$ off-diagonal conditions. The total number of conditions to be satisfied is $(N - 1) + N(N - 1)/2 = (N - 1)(N + 2)/2$. For molecules lacking any spatial symmetry and containing four or more atoms, conical intersections of three states are possible. The branching space28 for these conical intersections, the space in which the double cone topography is evinced, is five dimensional140 and connects each electronic state with two other states.

The first study on accidental three-state conical intersections was done for the CH4+ cation by Katriel and Davidson.113 In a tetrahedral geometry, the ground state of CH4+ is a T2 state. Therefore, it is triply degenerate as required by symmetry. Only one degree of freedom exists that will preserve Td symmetry, and the dimensionality of the seam is one because all the requirements for degeneracy are satisfied by symmetry. The authors found additional three-fold degeneracies in this system even when the tetrahedral symmetry was broken. If no symmetry is present, the cation has 9 degrees of freedom and the dimensionality of the seam becomes 9 − 5 = 4.
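The counting argument above can be written out as a short sketch. It assumes a nonlinear molecule with 3N − 6 internal degrees of freedom and no spatial symmetry, which is consistent with the 9 degrees of freedom quoted for CH4+; the function names are illustrative.

```python
def degeneracy_conditions(n_states):
    """Conditions for an n_states-fold degeneracy of a real symmetric
    electronic Hamiltonian: (N - 1) diagonal plus N(N - 1)/2 off-diagonal."""
    diagonal = n_states - 1
    off_diagonal = n_states * (n_states - 1) // 2
    return diagonal + off_diagonal


def internal_degrees_of_freedom(n_atoms):
    """Internal coordinates of a nonlinear molecule (assumption: 3N - 6)."""
    return 3 * n_atoms - 6


def seam_dimension(n_atoms, n_states):
    """Dimension of the seam of degeneracy when no symmetry is present.

    A negative value means the degeneracy cannot generally be reached.
    """
    return internal_degrees_of_freedom(n_atoms) - degeneracy_conditions(n_states)


if __name__ == "__main__":
    for n_atoms in (3, 4, 5):
        print(n_atoms, "atoms: three-state seam dimension =",
              seam_dimension(n_atoms, 3))
```

For five atoms and three states the function reproduces the 9 − 5 = 4 seam dimension given for CH4+, and for three atoms the negative result reflects the statement that four or more atoms are needed.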


Efficient algorithms have recently facilitated the location of three-state conical intersections, and have identified the existence of such intersections in many systems.144–146 Three-state conical intersections, like the two-state intersections described above, can affect excited state dynamics and ground state vibrational spectra, if the ground state is involved. Three-state accidental conical intersections were first found between Rydberg excited states in ethyl and allyl radicals.144,145 They were also found in five-membered ring heterocyclic radicals, such as pyrazolyl, where the ground state and two excited states cross at an energy only ca. 3000 cm−1 above the ground state minimum.146 In this case, the ground state is one of the degenerate states. As a result, complicated vibronic spectra were expected and observed experimentally.147 More recently, three-state conical intersections have been found in closed-shell systems as well.94,143,148

The involvement of three-state conical intersections in the photophysics and radiationless decay processes of nucleobases has been investigated using MRCI methods.94 Three-state conical intersections have been located for the pyrimidine base, uracil, and for the purine base, adenine. Figure 6 shows the energies of the three-state conical intersections compared with the vertical excitations in these molecules. In uracil, a three-state degeneracy between the S0, S1, and S2 states has been located 6.2 eV above the ground state minimum energy. This energy is 0.4 eV higher than a vertical excitation to S2 and at least 1.3 eV higher than the two-state conical intersections found previously. In adenine, two different three-state degeneracies between the S1, S2, and S3 states have been located at energies close to the vertical excitation energies. The energetics of these three-state conical intersections suggest that they can play a role in a radiationless decay pathway in adenine. In summary, these results show that three-state conical intersections are common and they can complicate the PESs of molecules. The most relevant question then becomes whether they are accessible during a photoinitiated event.

[Figure 6: energy level diagrams, energy (eV) at the equilibrium geometry Re(S0) and at the two-state and three-state conical intersection geometries, for panels (a) Uracil and (b) Adenine.]

Figure 6 Energy levels at the two- and three-state conical intersection points using MRCI: (a) the S0, S1, S2 states of uracil and (b) the S0, S1, S2, S3, S4 states of adenine. Rx(ciIJ) and Rx(ciIJK) denote conical intersections between states I, J or I, J, K, respectively. Reproduced with permission from Ref. 94.

In three-dimensional subspaces of this five-dimensional branching space, the three-state degeneracy can be lifted partially so that two of the three states remain degenerate.140,141,145 These two-state conical intersection seams originating from the three-state conical intersection have been studied in the allyl radical.145 In adenine, different seams of two-state conical intersections originate from each of the three-state conical intersections, leading to a great number of two-state conical intersections at energies lower than the three-state seams.

Two-state conical intersections have been well established in many types of molecular systems. In contrast, the study of three-state conical intersections is still in its infancy, and detailed studies are needed to understand the influence of these intersections on the dynamical behavior of molecules.

Spin-Orbit Coupling and Conical Intersections

So far, we have only considered the nonrelativistic electronic Hamiltonian when determining electronic PESs and conical intersections. When spin-orbit coupling is included, the total electronic Hamiltonian becomes $H^{e\prime} = H^{e} + H^{SO}$, where $H^{e}$ is the nonrelativistic Hamiltonian and $H^{SO}$ is the spin-orbit coupling operator. Depending on the magnitude of the spin-orbit coupling, different methods exist for its calculation and incorporation into the electronic structure solution. Several reviews have been published on the methodology for treating the spin-orbit coupling, including one that has appeared in this pedagogically driven review series.149–151 For light elements in which the spin-orbit coupling is small, perturbation theory can be used with $H^{SO}$ treated as a perturbation. For heavier elements, however, the spin-orbit coupling becomes too large to be treated as a perturbation and it has to be included directly into the electronic Hamiltonian before diagonalization. For heavy elements, scalar relativistic effects also become important, and they too must be included in the Hamiltonian. All-electron methods and relativistic effective core potentials have been developed and used for these cases.149,150,152,153

When spin-orbit coupling is included in the Hamiltonian, new, qualitatively different effects appear in the radiationless behavior of the system. Two effects are particularly important. First, the spin-orbit interaction can couple states of different spin multiplicity whose intersection otherwise would not be conical. In this case, intersystem crossing and spin-forbidden processes are observed. Spin-forbidden processes are not discussed here but they have been described extensively in a previous chapter in this book series.120 The second effect involves systems with an odd number of electrons, for which inclusion of the spin-orbit coupling changes qualitatively the characteristics of the conical intersections. The implications for the noncrossing rule were discussed by Mead in a seminal work in 1979,154 whereas the effect of the spin-orbit coupling on the geometric phase effect was discussed by Stone.155

The origin of this change comes from the behavior of the wavefunction under time reversal symmetry. The time reversal operator T is an antiunitary operator that commutes with the Hamiltonian but inverts the spin. For odd-electron systems, a wavefunction φ and its time reversal Tφ are orthogonal and degenerate. If φ is an eigenfunction of the Hamiltonian, Tφ is a degenerate eigenfunction, so all the eigenvalues are (at least) doubly degenerate. This degeneracy, present in odd-electron systems, is referred to as Kramers degeneracy.156 Therefore, a two-state conical intersection requires four eigenfunctions of the electronic Hamiltonian to become degenerate. Furthermore, the Hamiltonian matrix elements are complex in general because matrix elements of the spin-orbit coupling operator can be complex. Combining these ideas, the two-state Hamiltonian model used earlier to rationalize the noncrossing rule now becomes the four-by-four Hamiltonian matrix given in Eq. [46]

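\[
H^{e\prime} = \begin{pmatrix}
H_{11} & H_{12} & 0 & H_{1T2} \\
H_{12}^{*} & H_{22} & -H_{1T2} & 0 \\
0 & -H_{1T2}^{*} & H_{11} & H_{12}^{*} \\
H_{1T2}^{*} & 0 & H_{12} & H_{22}
\end{pmatrix} \qquad [46]
\]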

Mathematical properties of the time reversal operator relate the matrix elements appearing in the Hamiltonian. More specifically, only two unique off-diagonal matrix elements and two unique diagonal matrix elements exist.154

The conditions that must be satisfied for degeneracy are as follows:

\[ H_{11} = H_{22} \qquad [47] \]
\[ \mathrm{Re}(H_{12}) = \mathrm{Im}(H_{12}) = 0 \qquad [48] \]
\[ \mathrm{Re}(H_{1T2}) = \mathrm{Im}(H_{1T2}) = 0 \qquad [49] \]

The off-diagonal matrix elements are complex; Re() and Im() refer to their real and imaginary parts, respectively. If Cs or higher symmetry is present, it can be shown that the number of conditions needed for degeneracy is reduced from five to three.154 Although the theoretical basis needed for studying conical intersections that include spin-orbit coupling was introduced by Mead in 1979,154 algorithms for the computational study of these conical intersections were derived and implemented only much later.157–159 Using the conditions in Eqs. [47]–[49] and perturbation theory near a conical intersection, algorithms based on the Lagrange multiplier method6 were developed.157–159 These techniques can locate conical intersections when the spin-orbit coupling is included in the Hamiltonian with perturbative methods.151
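The structure of Eq. [46] can be checked numerically. The sketch below builds the four-by-four matrix for arbitrary illustrative values of the matrix elements and confirms that its eigenvalues come in degenerate (Kramers) pairs. The closed-form levels used for comparison are not quoted from the source; they follow from the observation that the matrix minus the mean of its diagonal squares to a multiple of the identity. The function names and numerical values are assumptions made for illustration.

```python
import numpy as np


def kramers_hamiltonian(h11, h22, h12, h1t2):
    """4x4 matrix of Eq. [46] in the basis (phi1, phi2, T phi1, T phi2).

    h11 and h22 are real; h12 and h1t2 may be complex.
    """
    c = np.conj
    return np.array([
        [h11,     h12,      0.0,   h1t2],
        [c(h12),  h22,     -h1t2,  0.0],
        [0.0,    -c(h1t2),  h11,   c(h12)],
        [c(h1t2), 0.0,      h12,   h22],
    ], dtype=complex)


def analytic_levels(h11, h22, h12, h1t2):
    """Two doubly degenerate levels of the matrix above."""
    mean = 0.5 * (h11 + h22)
    half_gap = np.sqrt((0.5 * (h11 - h22)) ** 2 + abs(h12) ** 2 + abs(h1t2) ** 2)
    return mean - half_gap, mean + half_gap


if __name__ == "__main__":
    params = dict(h11=1.0, h22=1.5, h12=0.2 + 0.1j, h1t2=0.05 - 0.3j)
    levels = np.linalg.eigvalsh(kramers_hamiltonian(**params))
    print("numerical levels:", levels)
    print("analytic levels: ", analytic_levels(**params))
```

The gap between the two pairs vanishes only when the difference of the diagonal elements and the real and imaginary parts of both off-diagonal elements are all zero, which is the set of five conditions in Eqs. [47]–[49].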

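A system in which this effect has been studied is the reaction of molecular hydrogen with the electronically excited hydroxyl radical,

\[ \mathrm{H_2 + OH}(A\,^2\Sigma^{+}) \rightarrow \mathrm{H_2 + OH}(X\,^2\Pi) \qquad [50] \]
\[ \mathrm{H_2 + OH}(A\,^2\Sigma^{+}) \rightarrow \mathrm{H_2O + H}(^2S) \qquad [51] \]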

In this reaction, either the OH radical quenches back to its ground state or a reaction occurs to form water. The nonadiabatic mechanism for these processes is facilitated by a conical intersection between the Σ and Π states in linear symmetry. When the system has Cs symmetry, the states involved are the two A′ states. The nonrelativistic seam has been studied earlier.160–162 When the system has Cs symmetry, there exist five degrees of freedom. Therefore, a nonrelativistic seam has dimension 5 − 2 = 3, whereas inclusion of spin-orbit coupling reduces the dimension of the seam to 5 − 3 = 2. In linear symmetry, the two crossing states are Σ and Π. When the spin-orbit coupling is included, the Π state splits into two components, a Π1/2 level and a Π3/2 level, as shown in Figure 7. A conical intersection can occur between either Σ and Π1/2 or between Σ and Π3/2. Figure 8 shows the energy of the reactants, the products, and the minimum energy point on the seam before and after spin-orbit coupling. Whereas the minimum energy point on the nonrelativistic seam is ca. 20,000 cm−1 below the energy of the reactants, the energy of the minimum energy point on the seam after incorporating the spin-orbit coupling is almost the same as that of the reactants, i.e., the seam is ca. 20,000 cm−1 higher than when spin-orbit coupling is neglected. Thus, even for a system void of heavy atoms, like H2 + OH, the qualitative difference is obvious.

[Figure 7: energy level diagram with panels (a), (b), and (c); level labels 2Σ+, 2Π, 2Σ1/2, 2Π1/2, and 2Π3/2.]

Figure 7 Energy level diagram of the intersecting states in H2 + OH: (a) at the nonrelativistic conical intersection point without spin-orbit coupling; (b) at the nonrelativistic conical intersection point with spin-orbit coupling; (c) at the new relativistic conical intersection.


CONCLUSIONS AND FUTURE DIRECTIONS

The study of nonadiabatic processes, and of conical intersections in particular, has gained popularity in recent years. Efficient computational strategies needed to locate conical intersections, along with modern experimental techniques that probe ultra-fast nonadiabatic processes, have contributed to this popularity. Although initial steps in nonadiabatic theory and conical intersections focused on theoretical analyses and involved the study of small prototype systems, significant progress has been made since then, and we are now beginning to address important questions in areas like photobiology and in condensed-phase systems.163 A current focus for some groups is to develop methods that incorporate solvent into the study of conical intersections, which is being done by using continuum model techniques164,165 and with QM/MM models.128 The list of problems that await study is long, and they are so important that it is certain many researchers will devote their research endeavors to improving the available methods needed to fully understand the role of conical intersections in chemistry, biology, and materials science. Progress in these areas demands ongoing, seminal developments in electronic structure theory so that accurate excited state energies and gradients can be obtained for larger systems, along with efficient methods developed for nuclear dynamics.

[Figure 8: energy (cm−1) versus reaction coordinate, with levels labeled OH(A 2Σ+) + H2, OH(X 2Π) + H2, H2O + H, nonrelativistic minimum energy crossing, and relativistic crossing; bond lengths shown next to the structures are 1.83, 2.20, 1.70, 2.70, 2.05, and 2.56.]

Figure 8 Energy of the minimum energy point on the seam for the Σ–Π conical intersection of H2 + OH with, and without, spin-orbit coupling. The numbers in the diagram adjacent to the molecules are computed bond lengths in A.

ACKNOWLEDGMENTS

The author thanks the National Science Foundation under Grant No. CHE-0449853 and Temple University for financial support. David Yarkony is thanked for introducing the author to the field of conical intersections; several results presented here were obtained in collaboration with him.

REFERENCES

1. M. Born and R. Oppenheimer, Ann. Phys., 84, 457 (1927). Zur Quantentheorie der Molekeln.

2. M. Dantus and A. Zewail, Chem. Rev., 104, 1717 (2004). Introduction: Femtochemistry.

3. J. von Neumann and E. P. Wigner, Physik. Z., 30, 467 (1929). On the Behaviour of Eigenvalues in Adiabatic Processes.

4. E. Teller, J. Phys. Chem., 41, 109 (1937). The Crossing of Potential Surfaces.

5. J. Michl, Top. Curr. Chem., 46, 1 (1974). Physical Basis of Qualitative MO Arguments in Organic Photochemistry.

6. M. R. Manaa and D. R. Yarkony, J. Chem. Phys., 99, 5251 (1993). On the Intersection of Two Potential Energy Surfaces of the Same Symmetry. Systematic Characterization Using a Lagrange Multiplier Constrained Procedure.

7. M. J. Bearpark, M. A. Robb, and H. B. Schlegel, Chem. Phys. Lett., 223, 269 (1994). A Direct Method for the Location of the Lowest Energy Point on a Potential Surface Crossing.

8. D. R. Yarkony, Acc. Chem. Res., 31, 511 (1998). Conical Intersections: Diabolical and Often Misunderstood.

9. D. R. Yarkony, Rev. Mod. Phys., 68, 985 (1996). Diabolical Conical Intersections.

10. D. R. Yarkony, J. Phys. Chem., 100, 18612 (1996). Current Issues in Nonadiabatic Chemistry.

11. D. R. Yarkony, J. Phys. Chem. A, 105, 6277 (2001). Conical Intersections: The New Conventional Wisdom.

12. F. Bernardi, M. Olivucci, and M. A. Robb, Acc. Chem. Res., 23, 405 (1990). Predicting Forbidden and Allowed Cycloaddition Reactions: Potential Surface Topology and its Rationalization.

13. F. Bernardi, M. Olivucci, and M. A. Robb, Chem. Soc. Rev., 25, 321 (1996). Potential Energy Surface Crossings in Organic Photochemistry.

14. M. A. Robb, M. Garavelli, M. Olivucci, and F. Bernardi, in Reviews in Computational Chemistry, Vol. 15, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, pp. 87–146. A Computational Strategy for Organic Photochemistry.

15. C. A. Mead and D. G. Truhlar, Phys. Rev. A, 68, 032501 (2003). Relative Likelihood of Encountering Conical Intersections and Avoided Intersections on the Potential Energy Surfaces of Polyatomic Molecules.

16. T. A. Barckholtz and T. A. Miller, Int. Rev. Phys. Chem., 17, 435 (1998). Quantitative Insights about Molecules Exhibiting Jahn-Teller and Related Effects.

17. W. Domcke, D. R. Yarkony, and H. Köppel, Conical Intersections, World Scientific, Singapore, 2004.

18. A. W. Jasper, C. Zhu, S. Nangia, and D. G. Truhlar, Faraday Discuss., 127, 1 (2004). Introductory Lecture: Nonadiabatic Effects in Chemical Dynamics.

19. J. A. Pople, in Nobel Lectures, Chemistry 1996-2000, I. Grenthe, Ed., World Scientific, Singapore, 2003.


20. M. Born and K. Huang, Dynamical Theory of Crystal Lattices, Oxford University Press, Oxford, UK, 1954.

21. L. S. Cederbaum, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 3–40. Born-Oppenheimer Approximation and Beyond.

22. W. Lichten, Phys. Rev., 131, 229 (1963). Resonant Charge Exchange in Atomic Collisions.

23. T. F. O'Malley, in Advances in Atomic and Molecular Physics, Vol. 7, D. Bates and I. Esterman, Eds., Academic Press, New York, 1971, pp. 223–249. Diabatic States of Molecules - Quasistationary Electronic States.

24. F. T. Smith, Phys. Rev., 179, 111 (1969). Diabatic and Adiabatic Representations for Atomic Collision Problems.

25. T. Pacher, L. S. Cederbaum, and H. Köppel, Adv. Chem. Phys., 84, 293 (1993). Adiabatic and Quasidiabatic States in a Gauge Theoretical Framework.

26. H. Köppel, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Köppel, Eds., World Scientific, Singapore, 2004, pp. 175–204. Diabatic Representation: Methods for the Construction of Diabatic Electronic States.

27. D. G. Truhlar and C. A. Mead, J. Chem. Phys., 77, 6090 (1982). Conditions for the Definition of a Strictly Diabatic Electronic Basis for Molecular Systems.

28. G. J. Atchity, S. S. Xantheas, and K. Ruedenberg, J. Chem. Phys., 95, 1862 (1991). Potential Energy Surfaces Near Intersections.

29. H. C. Longuet-Higgins, U. Öpik, M. H. L. Pryce, and R. A. Sack, Proc. R. Soc. London Ser. A, 244, 1 (1958). Studies of the Jahn-Teller Effect. II. The Dynamical Problem.

30. G. Herzberg and H. C. Longuet-Higgins, Discuss. Faraday Soc., 35, 77 (1963). Intersection of Potential Energy Surfaces in Polyatomic Molecules.

31. C. A. Mead and D. G. Truhlar, J. Chem. Phys., 70, 2284 (1979). On the Determination of Born-Oppenheimer Nuclear Motion Wave Functions Including Complications Due to Conical Intersection and Identical Nuclei.

32. M. V. Berry, Proc. R. Soc. London Ser. A, 392, 45 (1984). Quantal Phase Factors Accompanying Adiabatic Changes.

33. G. J. Atchity and K. Ruedenberg, J. Chem. Phys., 110, 4208 (1999). A Local Understanding of the Quantum Chemical Geometric Phase Theorem in Terms of Diabatic States.

34. C. A. Mead, J. Chem. Phys., 72, 3839 (1980). Superposition of Reactive and Nonreactive Scattering-Amplitudes in the Presence of a Conical Intersection.

35. A. Kuppermann, in Dynamics of Molecules and Chemical Reactions, R. E. Wyatt and J. Z. Zhang, Eds., Marcel Dekker, New York, 1996, pp. 411–472. The Geometric Phase in Reaction Dynamics.

36. B. K. Kendrick, J. Phys. Chem. A, 107, 6739 (2003). Geometric Phase Effects in Chemical Reaction Dynamics and Molecular Spectra.

37. H. A. Jahn and E. Teller, Proc. R. Soc. London Ser. A, 161, 220 (1937). Stability of Polyatomic Molecules in Degenerate Electronic States. I. Orbital Degeneracy.

38. H. A. Jahn, Proc. R. Soc. London Ser. A, 164, 117 (1938). Stability of Polyatomic Molecules in Degenerate Electronic States. II. Spin Degeneracy.

39. I. B. Bersuker, The Jahn–Teller Effect and Vibronic Interactions in Modern Chemistry, Plenum Press, New York, 1984.

40. R. Englman, The Jahn–Teller Effect in Molecules and Crystals, Wiley-Interscience, New York, 1972.

41. I. B. Bersuker and V. Z. Polinger, Vibronic Interactions in Molecules and Crystals, Vol. 49, Springer-Verlag, Berlin, 1989.

42. I. B. Bersuker, Chem. Rev., 101, 1067 (2001). Modern Aspects of the Jahn–Teller Effect. Theory and Applications to Molecular Problems.

References 117

43. B. E. Applegate, T. A. Barckholtz, and T. A. Miller, Chem. Soc. Rev., 32, 38 (2003).Exploration of Conical Intersections and Their Ramifications for Chemistry Through theJahn–Teller Effect.

44. V.-A. Glezakou, M. S. Gordon, and D. R. Yarkony, J. Chem. Phys, 108, 5657 (1998).Systematic Location of Intersecting Seams of Conical Intersection in Triatomic Molecules:The 12a’ - 22a’ Conical Intersections in BH2.

45. S. S. Xantheas, G. J. Atchity, S. T. Elbert, andK. Ruedenberg, J. Chem. Phys., 93, 7519 (1990).Potential Energy Surfaces Near Intersections.

46. D. R. Yarkony, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Koppel, Eds.,World Scientific, Singapore, 2004, pp. 41–128. Conical Intersections: Their Description andConsequences.

47. C. A. Mead, J. Chem. Phys., 78, 807 (1983). Electronic Hamiltonian, Wavefunctions andEnergies and Derivative Coupling between Born-Oppenheimer States in the Vicinity of aConical Intersection.

48. D. R. Yarkony, J. Phys. Chem. A, 101, 4263 (1997). Energies and Derivative Couplings in theVicinity of a Conical Intersection Using Degenerate Perturbation Theory and AnalyticGradient Techniques.

49. H. Koppel, W. Domcke, and L. S. Cederbaum, Adv. Chem. Phys., 57, 59 (1984). MultimodeMolecular Dynamics Beyond the Born-Oppenheimer Approximation.

50. W. Domcke and G. Stock, Adv. Chem. Phys., 100, 1–170 (1997). Theory ofUltrafast Nonadiabatic Excited-State Processes and their Spectroscopic Detection in RealTime.

51. D. R. Yarkony, J. Chem. Phys., 114, 2601 (2001). Nuclear Dynamics Near Conical Inter-sections in the Adiabatic Representation. I. The Effects of Local Topography on InterstateTransition.

52. M. Ben-Nun, F.Molnar, K. Schulten, and T. J.Martinez, Proc. Natl. Acad. Sci. USA, 97, 9379(2000). The Role of Intersection Topography in Bond Selectivity of Cis-trans Photoisome-rization.

53. A. Migani, A. Sinicropi, N. Ferr, A. Cembran, M. Garavelli, and M. Olivucci, FaradayDiscuss., 127, 179 (2004). Structure of the Intersection Space Associated with Z/E Photo-isomerization of Retinal in Rhodopsin Proteins.

54. A. W. Jasper and D. G. Truhlar, J. Chem. Phys., 122, 044101 (2005). Conical Intersectionsand Semiclassical Trajectories: Comparison to Accurate Quantum Dynamics and Analysesof the Trajectories.

55. D. R. Yarkony, J. Chem. Phys., 112, 2111 (2000). On the Adiabatic to Diabatic StatesTransformation Near Intersections of Conical Intersections.

56. S. Matsika, J. Phys. Chem. A, 108, 7584 (2004). Radiationless Decay of Excited States ofUracil Through Conical Intersections.

57. S. Matsika and D. R. Yarkony, J. Chem. Phys., 117, 3733 (2002). Conical Intersections andthe Nonadiabatic Reactions H2OþOð3PÞ $ OHðA2�þÞ þOHðX2�Þ.

58. F. Bernardi, M. Olivucci, I. N. Ragazos, and M. A. Robb, J. Am. Chem. Soc., 114, 8211(1992). A New Mechanistic Scenario for the Photochemical Transformation of Ergosterol:An MCSCF and MM-VB Study.

59. K. Ruedenberg and G. J. Atchity, J. Chem. Phys., 110, 3799 (1993). A QuantumMechanicalDetermination of Diabatic States.

60. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A.Montgomery, Jr., T. Vreven, K.N.Kudin, J. C. Burant, J.M.Millam, S. S. Iyengar, J. Tomasi,V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji,M.Hada,M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa,M. Ishida, T.Nakajima, Y.Honda,O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken,C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi,C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador,J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas,


D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui,A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz,I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al Laham, C. Y. Peng, A.Nanayakkara, M. Challacombe, P. M. W. Gill, B. G. Johnson, W. Chen, M. W. Wong,C. Gonzalez, and J. A. Pople, Gaussian 03, Revision C.02, 2004.

61. R. J. Bartlett and J. F. Stanton, in Reviews in Computational Chemistry, Vol. 5, K. B.Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1994, pp. 65–169. Applications ofPost-Hartree-Fock Methods: A Tutorial.

62. I. Shavitt, in Methods of Electronic Structure Theory, H. F. Schaefer III, Ed., Plenum Press,New York, 1977, Vol. 4 of Modern Theoretical Chemistry, pp. 189–275. The Method ofConfiguration Interaction.

63. T. D. Crawford and H. F. Schaefer III, inReviews in Computational Chemistry, Vol. 14, K. B.Lipkowitz andD. B. Boyd, Eds.,Wiley-VCH,NewYork, 1999, pp. 33–136. An Introductionto Coupled Cluster Theory for Computational Chemists.

64. F. M. Bickelhaupt and E. J. Baerends, in Reviews in Computational Chemistry, Vol. 15, K. B.Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, pp. 1–86. Kohn-ShamDensity Functional Theory: Predicting and Understanding Chemistry.

65. B. O. Roos and P. R. Taylor, Chem. Phys., 48, 157 (1980). A Complete Active Space SCFMethod (CASSCF) Using a Density-Matrix Formulated Super-CI Approach.

66. B. O. Roos, Adv. Chem. Phys., 69, 399 (1987). The Complete Active Space Self ConsistentField Method and its Applications in Electronic Structure Calculations.

67. W. T. Borden and E. R. Davidson, Acc. Chem. Res., 29, 67 (1996). The Importance ofIncluding Dynamic Electron Correlation in Ab Initio Calculations.

68. H. Dachsel, R. J. Harrison, and D. A. Dixon, J. Phys. Chem. A, 103, 152 (1999). Multi-reference Configuration Interaction Calculations on Cr2: Passing the One Billion Limit inMRCI/MRACPF Calculations.

69. R. Shepard, Int. J. QuantumChem., 31, 33 (1987). Geometrical EnergyDerivative Evaluationwith MRCI Wave Functions.

70. R. Shepard, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., WorldScientific, Singapore, 1995, pp. 345–458. The Analytic Gradient Method for ConfigurationInteraction Wave Functions.

71. H. Lischka, M. Dallos, and R. Shepard, Mol. Phys., 100, 1647 (2002). Analytic MRCIGradient for Excited States: Formalism and Application to the np� Valence- and n� ð3s; 3pÞRydberg States of Formaldehyde.

72. B. H. Lengsfield and D. R. Yarkony, in State-Selected and State-to-State Ion-MoleculeReaction Dynamics: Part 2 Theory, M. Baer and C. Y. Ng, Eds., John Wiley and Sons,New York, 1992, Vol. 82 of Advances in Chemical Physics, pp. 1–71. NonadiabaticInteractions between Potential Energy Surfaces: Theory and Applications.

73. H. Lischka, R. Shepard, R. M. Pitzer, I. Shavitt, M. Dallos, Th. Muller, P. G. Szalay, M. Seth,G. S. Kedziora, S. Yabushita, and Z. Zhang, Phys. Chem. Chem. Phys., 3, 664 (2001). High-Level Multireference Methods in the Quantum-Chemistry Program System COLUMBUS:Analytic MR-CISD and MR-AQCC Gradients and MR-AQCC-LRT for Excited States,GUGA Spin-Orbit CI and Parallel CI Density.

74. H. Lischka, M. Dallos, P. G. Szalay, D. R. Yarkony, and R. Shepard, J. Chem. Phys., 120,7322 (2004). Analytic Evaluation of Nonadiabatic Coupling Terms at the MR-CI Level. I.Formalism.

75. M. Dallos, H. Lischka, R. Shepard, D. R. Yarkony, and P. G. Szalay, J. Chem. Phys., 120,7330 (2004). Analytic Evaluation of Nonadiabatic Coupling Terms at the MR-CI Level. II.Minima on the Crossing Seam: Formaldehyde and the Photodimerization of Ethylene.

76. H.-J. Werner and P. J. Knowles, J. Chem. Phys., 89, 5803 (1988). An Efficient InternallyContracted Multiconfiguration Reference CI Method.

77. H.-J. Werner, P. J. Knowles, R. Lindh, M. Schutz, P. Celani, T. Korona, F. R. Manby, G.Rauhut, R. D. Amos, A. Bernhardsson, A. Berning, D. L. Cooper, M. J. O. Deegan, A. J.

References 119

Dobbyn, F. Eckert, C. Hampel, G. Hetzer, A. W. Lloyd, S. J. McNicholas, W. Meyer, M. E.Mura, A. Nicklass, P. Palmieri, R. Pitzer, U. Schumann, H. Stoll, A. J. Stone, R. Tarroni, andT. Thorsteinsson, Molpro, version 2002.6, A Package of Ab Initio Programs, 2003.

78. K. Andersson, P.-A. Malmqvist, B. O. Roos, A. J. Sadlej, and K. Wolinski, J. Phys. Chem.,94, 5483 (1990). Second-Order Perturbation-Theory with a CASSCF ReferenceFunction.

79. K. Andersson, P.-A. Malmqvist, and B. O. Roos, J. Chem. Phys., 96, 1218 (1992). Second-Order Perturbation-Theory with a Complete Active Space Self-Consistent Field ReferenceFunction.

80. G. Karlstrom, R. Lindh, P.-A. Malmqvist, B. O. Roos, U. Ryde, V. Veryazov, P.-O. Widmark,M. Cossi, B. Schimmelpfennig, P. Neogrady, and L. Seijo, Computat. Mater. Sci., 28, 222(2003). Molcas: a Program Package for Computational Chemistry.

81. H. Nakano, J. Chem. Phys., 99, 7983 (1993). Quasidegenerate Perturbation Theory withMulticonfigurational Self-consistent-field Reference Functions.

82. K. Hirao, Chem. Phys. Lett., 190, 374 (1992). Multireference Moller-Plesset Method.

83. K. R. Glaesemann,M. S. Gordon, andH.Nakano,Phys. Chem.Chem. Phys., 1, 967 (1999). AStudy of FeCOþ with Correlated Wavefunctions.

84. M.W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert,M. S. Gordon, J. H. Jensen, S. Koseki,N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery Jr.,J. Comput. Chem., 14, 1347 (1993). Computation of Conical Intersections by UsingPerturbation Techniques.

85. B. O. Roos, K. Andersson, M. P. Fulscher, P.-A. Malmqvist, L. Serrano-Andres, K. Pierloot,andM.Merchan, inNewMethods in Computational QuantumMechanics, I. Prigogine andS. A. Rice, Eds., Wiley, New York, 1996, Vol. 93 ofAdvances in Chemical Physics, pp. 219–331. Multiconfigurational Perturbation Theory: Applications in Electronic Spectroscopy.

86. J. Finley, P.-A. Malmqvist, B. O. Roos, and L. Serrano-Andres, Chem. Phys. Lett., 288, 299(1998). The Multi-state CASPT2 Method.

87. L. Serrano-Andres, M. Merchan, and R. Lindh, J. Chem. Phys., 122, 104107 (2005).Computation of Conical Intersections by Using Perturbation Techniques.

88. H. Koch, H. J. A. Jensen, P. Jorgensen, and T. Helgaker, J. Chem. Phys., 93, 3345 (1990).Excitation-Energies from the Coupled Cluster Singles and Doubles Linear Response Func-tion (CCSDLR) - Applications to Be, CHþ, CO, and H2O.

89. J. F. Stanton and R. J. Bartlett, J. Chem. Phys., 98, 7029 (1993). The Equation of MotionCoupled-Cluster Method - A Systematic Biorthogonal Approach to Molecular-ExcitationEnergies, Transition-Probabilities, and Excited-State Properties.

90. A. I. Krylov, Chem. Phys. Lett., 338, 375 (2001). Size-Consistent Wave Functions for Bond-Breaking: The Equation-of-Motion Spin-Flip Model.

91. E. Runge and E. K. U. Gross, Phys. Rev. Lett., 52, 997 (1984). Density-Functional Theory forTime-Dependent Systems.

92. N. Ismail, L. Blancafort, M. Olivucci, B. Kohler, and M. A. Robb, J. Am. Chem. Soc., 124,6818 (2002). Ultrafast Decay of Electronically Excited Singlet Cytosine via p;p� to n; p� StateSwitch.

93. M. Merchan and L. Serrano-Andres, J. Am. Chem. Soc., 125, 8108 (2003). Ultrafast InternalConversion of Excited Cytosine via the Lowest pp� Electronic Singlet State.

94. S. Matsika, J. Phys. Chem. A, 109, 7538 (2005). Three-state Conical Intersections in NucleicAcid Bases.

95. H. B. Schlegel, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., WorldScientifc, Singapore, 1995, pp. 459–500. Advance Series in Physical Chemistry, GeometryOptimization on Potential Energy Surfaces.

96. N. Koga andK.Morokuma,Chem. Phys. Lett., 119, 371 (1985). Determination of the LowestEnergy Point on the Crossing Seam between Two Potential Surfaces Using the EnergyGradient.


97. A. Farazdel and M. Dupuis, J. Comput. Chem., 12, 276 (1991). On the Determination of theMinimum on the Crossing Seam of Two Potential Energy Surfaces.

98. D. R. Yarkony, J. Chem. Phys., 92, 2457 (1990). On the Characterization of Regions ofAvoided Surface Crossings Using an Analytic Gradient Based Method.

99. J. M. Anglada and J. M. Bofill, J. Comput. Chem., 18, 992 (1997). A Reduced-Restricted-Quasi-Newton-Raphson Method for Locating and Optimizing Energy Crossing Pointsbetween Two Potential Energy Surfaces.

100. I. N. Ragazos,M.A. Robb, F. Bernardi, andM.Olivucci,Chem. Phys. Lett., 119, 217, (1992).Optimization and Characterization of the Lowest Energy Point on a Conical IntersectionUsing an MC-SCF Lagrangian.

101. D. R. Yarkony, in Conical Intersections, W. Domcke, D. R. Yarkony, and H. Koppel, Eds.,World Scientific, Singapore, 2004, pp. 129–174. Determination of Potential Energy SurfaceIntersections and Derivative Couplings in the Adiabatic Representation.

102. G. A.Worth, H.-D.Meyer, and L. S. Cederbaum, inConical Intersections, W. Domcke, D. R.Yarkony, and H. Koppel, Eds., World Scientific, Singapore, 2004, pp. 583–617. Multi-dimensional Dynamics Involving a Conical Intersection: Wavepacket Calculations Using theMCTDH Method.

103. J. C. Tully and R. K. Preston, J. Chem. Phys., 55, 562 (1971). Trajectory Surface HoppingApproach to Nonadiabatic Molecular Collisions: The Reaction of Hþ with D2.

104. J. C. Tully, J. Chem. Phys., 93, 1061 (1990). Molecular Dynamics with Electron Transitions.

105. S. Hammes Schiffer and J. C. Tully, J. Chem. Phys., 101, 4657 (1994). Proton-Transfer inSolution - Molecular-Dynamics with Quantum Transitions.

106. J. C. Tully, inDynamics ofMolecular Collisions, W. H.Miller, Ed., Plenum Press, New York,1975, pp. 217–267. Nonadiabatic Processes in Molecular Collisions.

107. M. D. Hack and D. G. Truhlar, J. Phys. Chem. A, 104, 7917 (2000). NonadiabaticTrajectories at an Exhibition.

108. S. Klein, M. J. Bearpark, B. R. Smith, M. A. Robb,M. Olivucci, and F. Bernardi,Chem. Phys.Lett., 293, 259 (1998). Mixed State ‘‘On the Fly’’ Nonadiabatic Dynamics: The Role of theConical Intersection Topology.

109. T. J. Martinez, M. Ben-Nun, and R. D. Levine, J. Phys. Chem., 100, 7884 (1996). Multi-Electronic State Molecular Dynamics - A Wave Function Approach with Applications.

110. M. Ben-Nun and T. J. Martinez, J. Chem. Phys., 108, 7244 (1998). NonadiabaticMolecular Dynamics: Validation of the Multiple Spawning Method for a MultidimensionalProblem.

111. M. Ben-Nun, J. Quenneville, and T. J. Martinez, J. Phys. Chem. A, 104, 5161 (2000).Ab Initio Multiple Spawning: Photochemistry from First Principles Quantum MolecularDynamics.

112. A. J. C. Varandas, J. Tennyson, and J. N. Murrell, Chem. Phys. Lett., 61, 431 (1979).Chercher le Croisement.

113. J. Katriel and E. R. Davidson, Chem. Phys. Lett., 76, 259 (1980). The Non-crossing Rule:Triply Degenerate Ground-State Geometries of CHþ4 .

114. D. R. Yarkony, in Modern Electronic Structure Theory Part I, D. R. Yarkony, Ed., WorldScientifc, Singapore, 1995, pp. 642–721. Advance Series in Physical Chemistry, ElectronicStructure Aspects of Nonadiabatic Processes.

115. A. Migani and M. Olivucci, in Conical Intersections, W. Domcke, D. R. Yarkony, and H.Koppel, Eds., World Scientific, Singapore, 2004, pp. 271–320. Conical Intersections andOrganic Reaction Mechanisms.

116. Y. Hass and S. Zilberg, J. Photochem. Photobiol. A: Chem., 144, 221 (2001). Photochemistryby Conical Intersections: A Practical Guide for Experimentalists.

117. M. Klessinger and J. Michl, Excited States and Photochemistry of Organic Molecules, VCHPublishers, Inc., New York, 1995.

References 121

118. J. Michl and V. Bonacic-Koutecky, Electronic Aspects of Organic Photochemistry, WileyInterscience, New York, 1990.

119. W. Fuss, S. A. Trushin, and W. E. Schmid, Res. Chem. Intermed., 27, 447 (2001). UltrafastPhotochemistry of Metal Carbonyls.

120. N. Matsunaga and S. Koseki, in Reviews in Computational Chemistry, Vol. 20, K. B.Lipkowitz, R. Larter, and T. R. Cundari, Eds., Wiley-VCH, New York, 2004, pp. 101–152. Modeling of Spin-Forbidden Reactions.

121. M. Garavelli, P. Gelani, F. Bernardi, M. A. Robb, and M. Olivucci, J. Am. Chem. Soc., 119,6891 (1997). The C5H6NHþ2 Protonated Shiff Base: An Ab InitioMinimalModel for RetinalPhotoisomerization.

122. T. Kobayashi, T. Saito, and H. Ohtani,Nature, 414, 531 (2001). Real-Time Spectroscopy ofTransition States in Bacteriorhodopsin During Retinal Isomerization.

123. A.Warshel and Z. T. Chu, J. Phys. Chem. B, 105, 9857 (2001). Nature of the Surface CrossingProcess in Bacteriorhodopsin: Computer Simulations of the Quantum Dynamics of thePrimary Photochemical Event.

124. A. Toniolo, S. Olsen, L. Manohar, and T. J. Martinez, Faraday Discuss., 127, 149(2004). Conical Intersection Dynamics in Solution: The Chromophore of Green FluorescentProtein.

125. M. E. Martin, F. Negri, and M. Olivucci, J. Am. Chem. Soc., 126, 5452 (2004). Origin,Nature, and Fate of the Fluorescent State of the Green Fluorescent Protein Chromophore atthe CASPT2//CASSCF Resolution.

126. G. A.Worth and L. S. Cederbaum,Chem. Phys. Lett., 338, 219 (2001).Mediation of UltrafastTransfer in Biological Systems by Conical Intersections.

127. A. Toniolo, M. Ben-Nun, and T. J. Martinez, J. Phys. Chem. A, 106, 4679 (2002).Optimization of Conical Intersections with Floating Occupation Semiempirical Configura-tion Interaction Wave Functions.

128. A. Toniolo, G. Granucci, and T. J. Martinez, J. Phys. Chem. A, 107, 3822 (2003). ConicalIntersections in Solution: A QM/MM Study Using Floating Occupation Semiempirical CIWave Functions.

129. M. Daniels and W. Hauswirth, Science, 171, 675 (1971). Fluorescence of the Purine andPyrimidine Bases of the Nucleic Acids in Neutral Aqueous Solution at 300 K.

130. M. Daniels, in Photochemistry and Photobiology of Nucleic Acids, Vol. 1, S. Y. Wang, Ed.,Academic Press, New York, 1976, pp. 23–108. Excited States of the Nucleic Acids: Bases,Mononucleosides, and Mononucleotides.

131. P. R. Callis, Ann. Rev. Phys. Chem., 34, 329 (1983). Electronic States and Luminescence ofNucleic Acid Systems.

132. C. E. Crespo-Hernandez, B. Cohen, P.M.Hare, and B. Kohler,Chem.Rev., 104, 1977 (2004).Ultrafast Excited-state Dynamics in Nucleic Acids.

133. B. Mennucci, A. Toniolo, and J. Tomasi, J. Phys. Chem. A, 105, 4749 (2001). TheoreticalStudy of the Photophysics of Adenine in Solution: Tautomerism, Deactivation Mechanisms,and Comparison with the 2-Aminopurine Fluorescent Isomer.

134. A. Broo, J. Phys. Chem. A, 102, 526 (1998). A Theoretical Investigation of the PhysicalReason for the very Diffferent Luminescence Properties of the Two Isomers Adenine and2-Aminopurine.

135. A. L. Sobolewski and W. Domcke, Eur. Phys. J. D, 20, 369 (2002). On the Mechanism ofNonradiative Decay of DNA Bases: Ab Initio and TDDFT Results for the Excited States of9H-Adenine.

136. S. Perun, A. L. Sobolewski, and W. Domcke, J. Am. Chem. Soc., 127, 6257 (2005). Ab InitioStudies on the Radiationless Decay Mechanisms of the Lowest Excited Singlet States of 9H-Adenine.

137. C. M. Marian, J. Chem. Phys., 122, 104314 (2005). A New Pathway for the Rapid Decay ofElectronically Excited Adenine.


138. A. L. Sobolewski and W. Domcke, Phys. Chem. Chem. Phys., 6, 2763 (2004). Ab InitioStudies on the Photophysics of the Guanine-Cytosine Base Pair.

139. T. Schultz, E. Samoylova, W. Radloff, I. V. Hertel, A. L. Sobolewski, and W. Domcke,Science, 306, 1765 (2004). Efficient Deactivation of a Model Base Pair via Excited-stateHydrogen Transfer.

140. S. P. Keating and C. A. Mead, J. Chem. Phys., 82, 5102 (1985). Conical Intersections in aSystem of Four Identical Nuclei.

141. S. Han and D. R. Yarkony, J. Chem. Phys., 119, 11562 (2003). Conical Intersections of ThreeStates. Energies, Derivative Couplings, and the Geometric Phase Effect in the Neighborhoodof Degeneracy Subspaces. Application to the Allyl Radical.

142. S. Han and D. R. Yarkony, J. Chem. Phys., 119, 5058 (2003). Nonadiabatic ProcessesInvolving Three Electronic States. I. Branch Cuts and Linked Pairs of Conical Intersections.

143. J. D. Coe and T. J. Martinez, J. Am. Chem. Soc., 127, 4560 (2005). Competitive Decayat Two- and Three-State Conical Intersections in Excited-State Intramolecular ProtonTransfer.

144. S. Matsika and D. R. Yarkony, J. Chem. Phys., 117, 6907 (2002). Accidental ConicalIntersections of Three States of the Same Symmetry. I. Location and Relevance.

145. S. Matsika and D. R. Yarkony, J. Chem. Soc., 125, 10672 (2003). Beyond Two-state ConicalIntersections. Three-state Conical Intersections in Low Symmetry Molecules: The AllylRadical.

146. S. Matsika and D. R. Yarkony, J. Am. Chem. Soc., 125, 12428 (2003). Conical Intersectionsof Three Electronic States Affect the Ground State of Radical Species with Little Or NoSymmetry: Pyrazolyl.

147. S. Kato, R. Hoenigman, A. Gianola, T. Ichino, V. Bierbaum, and W. C. Lineberger, inMolecular Dynamics and Theoretical Chemistry Contractors Review, M. Berman, Ed.,AFOSR, San Diego, CA, 2003, p. 49.

148. L. Blancafort andM. A. Robb, J. Phys. Chem. A, 108, 10609 (2004). Key Role of a ThreefoldState Crossing in the Ultrafast Decay of Electronically Excited Cytosine.

149. B. A. Heb, C. M. Marian, and S. D. Peyerimhoff, inModern Electronic Structure Theory Part1, D. R. Yarkony, Ed., World Scientific, Singapore, 1995, pp. 152–278. Advanced Series inPhysical Chemistry, Ab Initio Calculation of Spin-orbit Effects in Molecules IncludingElectron Correlation.

150. W. C. Ermler, R. B. Ross, and P. A. Christiansen,Adv. QuantumChem., 19, 139–182 (1988).Spin-Orbit Coupling and Other Relativistic Effects in Atoms and Molecules.

151. C. M. Marian, in Reviews in Computational Chemistry, K. B. Lipkowitz, R. Larter, andT. R. Cundari, Eds., Wiley-VCH, New York, 2001, pp. 99–204. Spin-Orbit Coupling inMolecules.

152. K. Balasubramanian,Relativistic Effects in Chemistry, Part A, Theory and Techniques,Wiley,New York, 1997.

153. G. L. Malli, Ed., Relativistic Effects in Atoms, Molecules, and Solids, Vol. 87 of NATOAdvanced Science Institutes, Plenum Press, New York, 1983.

154. C. A. Mead, J. Chem. Phys., 70, 2276 (1979). The Noncrossing Rule for Electronic PotentialEnergy Surfases: The Role of Time-reversal Invariance.

155. A. J. Stone, Proc. R. Soc. London Ser. A, 351, 141 (1976). Spin-Orbit Coupling and theInteraction of Potential Energy Surfaces in Polyatomic Molecules.

156. H. Kramers, Proc. Acad. Sci. Amsterdam, 33, 959 (1930).

157. S.Matsika andD. R. Yarkony, J. Chem. Phys., 115, 2038 (2001). On the Effects of Spin-OrbitCoupling on Conical Intersection Seams in Molecules with an Odd Number of Electrons. I.Locating the Seam.

158. S.Matsika andD. R. Yarkony, J. Chem. Phys., 115, 5066 (2001). On the Effects of Spin-OrbitCoupling on Conical Intersection Seams in Molecules with an Odd Number of Electrons. II.Characterizing the Local Topography of the Seam.

References 123

159. S. Matsika and D. R. Yarkony, J. Chem. Phys., 116, 2825 (2002). Spin-Orbit Coupling andConical Intersections in Molecules with an Odd Number of Electrons. III. A PerturbativeDetermination of the Electronic Energies, Derivative Couplings and a Rigorous DiabaticRepresentation Near a Conical Intersection.

160. M. I. Lester, R. A. Loomis, R. L. Schwartz, and S. P. Walch, J. Phys. Chem. A, 101, 9195(1997). Electronic Quenching of OH A 2�þðv0 ¼ 0;1Þ in Complexes with Hydrogen andNitrogen.

161. D. R. Yarkony, J. Chem. Phys., 111, 6661 (1999). Substituent Effects and the NoncrossingRule: The Importance of Reduced Symmetry Subspaces. I. The Quenching of OH(A 2�þ) byH2.

162. B. C. Hoffman and D. R. Yarkony, J. Chem. Phys., 113, 10091 (2000). The Role of ConicalIntersections in the Nonadiabatic Quenching of OH(A 2�þ) by Molecular Hydrogen.

163. J. C. Tully, FaradayDiscuss., 127, 463 (2004). Concluding Remarks: Non-adiabatic Effects inChemical Dynamics.

164. D. Laage, I. Burghardt, T. Sommerfeld, and J. T. Hynes, J. Phys. Chem. A, 107, 11271 (2003).On theDissociation of Aromatic Radical Anions in Solution. 1. Formulation andApplicationto p-cyanochlorobenzene Radical Anion.

165. I. Burghardt, L. S. Cederbaum, and J. T. Hynes, Faraday Discuss., 127, 395 (2004).Environmental Effects on a Conical Intersection: A Model Study.


CHAPTER 3

Variational Transition State Theory with Multidimensional Tunneling

Antonio Fernandez-Ramos,a Benjamin A. Ellingson,b Bruce C. Garrett,c and Donald G. Truhlarb

a Departamento de Quimica Fisica, Universidade de Santiago de Compostela, Facultade de Quimica, Santiago de Compostela, Spain
b Department of Chemistry and Supercomputing Institute, University of Minnesota, Minneapolis, MN
c Chemical and Materials Sciences Division, Pacific Northwest National Laboratory, Richland, WA

INTRODUCTION

“The rate of chemical reactions is a very complicated subject”

Harold S. Johnston, 1966

“The overall picture is that the validity of the transition state theory has not yet been really proved and its success seems to be mysterious.”

Raymond Daudel, Georges Leroy, Daniel Peeters, and Michael Sana, 1983

This review describes the application of variational transition state theory (VTST) to the calculation of chemical reaction rates. In 1985, two of us, together with Alan D. Isaacson, wrote a book chapter on this subject entitled “Generalized Transition State Theory” for the multi-volume series entitled Theory of Chemical Reaction Dynamics.1 Since that time, VTST has undergone important improvements due mainly to the ability of this theory to adapt to more challenging problems. For instance, the 1985 chapter mainly describes the application of VTST to bimolecular reactions involving 3–6 atoms, which were the state of the art at that time. The study of those reactions by VTST dynamics depended on the construction of an analytical potential energy surface (PES). Nowadays, thanks to the development of more efficient algorithms and more powerful computers, the situation is completely different, and most rate calculations are based on “on the fly” electronic structure calculations, which together with hybrid approaches, like combined quantum mechanical molecular mechanical methods (QM/MM), allow researchers to apply VTST to systems with hundreds or even tens of thousands of atoms. Three other major advances since 1985 are that transition state dividing surfaces can now be defined much more realistically, more accurate methods have been developed to include multidimensional quantum mechanical tunneling in VTST, and the theory has also been extended to reactions in condensed phases.

This review progresses from the simplest VTST treatments applicable to simple systems to more advanced ones applicable to complex systems. The next four sections describe the use of VTST for gas-phase unimolecular or bimolecular reactions for which we can afford to build a global analytical PES or to use a high-level electronic structure method to run the dynamics without invoking special methods or algorithms to reduce the computational cost. In the second part (the subsequent three sections on pages 190–212), we deal with VTST in complex systems; this often involves the use of interpolative or dual-level methods, implicit solvation models, or potentials of mean force to obtain the potential energy surface. Two sections also discuss the treatment of condensed-phase reactions by VTST.

A fundamental theoretical construct underlying this whole chapter is the Born–Oppenheimer approximation. According to this approximation, which is very accurate for most chemical reactions (the major exceptions being electron transfer and photochemistry), the Born–Oppenheimer energy, which is the energy of the electrons plus nuclear repulsion, provides a potential energy surface V for nuclear motion. At first we assume that this potential energy surface is known and is available as a potential energy function. Later we provide more details on interfacing electronic structure theory with nuclear dynamics to calculate V by electronic structure calculations “on the fly,” which is called direct dynamics. The geometries where the gradient ∇V is zero play a special role; these geometries are called stationary points, and they include the equilibrium geometries of the reactants, products, and saddle points, and geometries of precursor and successor complexes that are local minima (often due to van der Waals forces) between reactants and the saddle point and between products and the saddle point. In general V is required at a wide range of geometries, both stationary and nonstationary.
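As a rough illustration of what a stationary point is, the following minimal sketch tests whether the gradient of V vanishes at a geometry and then classifies the point by the number of negative Hessian eigenvalues. The two-dimensional double-well potential, the function names, and the tolerance are all hypothetical choices made for this example; they are not taken from the chapter or from any program discussed in it.

```python
import math

def potential(x, y):
    """Hypothetical model surface: double well along x, harmonic along y."""
    return (x**2 - 1.0)**2 + y**2

def gradient(x, y):
    """Analytic gradient of the model potential."""
    return (4.0 * x * (x**2 - 1.0), 2.0 * y)

def hessian(x, y):
    """Analytic Hessian (symmetric 2x2 matrix) of the model potential."""
    return ((12.0 * x**2 - 4.0, 0.0), (0.0, 2.0))

def symmetric_2x2_eigenvalues(h):
    """Closed-form eigenvalues of a symmetric 2x2 matrix."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return (mean - radius, mean + radius)

def classify(x, y, tol=1e-8):
    """Label a geometry by its gradient norm and Hessian signature."""
    gx, gy = gradient(x, y)
    if math.hypot(gx, gy) > tol:
        return "nonstationary"
    eigenvalues = symmetric_2x2_eigenvalues(hessian(x, y))
    n_negative = sum(1 for ev in eigenvalues if ev < 0.0)
    return {0: "minimum", 1: "saddle point", 2: "maximum"}[n_negative]

for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]:
    print(point, potential(*point), classify(*point))
```

In this toy model the two minima at x = ±1 play the role of reactant and product geometries, and the point at the origin, with exactly one negative Hessian eigenvalue, plays the role of the saddle point between them.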

A word on nomenclature is in order here. When we say transition state theory, we refer to the various versions of the theory, with or without including tunneling. When we want to be more specific, we may say conventional transition state theory, variational transition state theory, canonical variational transition state theory (also called canonical variational theory or CVT), and so forth. For each of the versions of VTST, we can further differentiate, for example, CVT without tunneling, CVT with one-dimensional tunneling, or CVT with multidimensional tunneling; and we can further specify the specific approximation used for tunneling. Sometimes we use the term generalized transition state theory, which refers to any version of transition state theory in which the transition state is not restricted to the saddle point with the reaction coordinate along the imaginary frequency normal mode.

In this chapter we explain the algorithms used to implement VTST, especially CVT, and multidimensional tunneling approximations in the POLYRATE2–6 computer program. We also include some discussion of the fundamental theory underlying VTST and these algorithms. Readers who want a more complete treatment of theoretical aspects are referred to another review.

The beginning of the next section includes the basic equations of VTST, paying special attention to canonical variational transition state theory (CVT), although other theories are discussed briefly in the third subsection. The reason for centering attention mainly on CVT is that it is very accurate but requires only a limited knowledge of the PES. The basic algorithms needed to run the dynamics calculations are then discussed in detail, including harmonic and anharmonic calculations of partition functions. Multidimensional tunneling corrections to VTST are discussed in the fourth section. Approaches to build the PES information needed in the VTST calculations are then discussed, including direct-dynamics methods with specific reaction parameters, interpolated VTST, and dual-level dynamics. The sixth section is dedicated to reactions in condensed media, including liquid solutions and solids. Then ensemble-averaged VTST is highlighted. The eighth and ninth sections describe some practical examples that show in some detail how VTST works, including a brief discussion of kinetic isotope effects. The last section provides a summary of the review.

VARIATIONAL TRANSITION STATE THEORY FOR GAS-PHASE REACTIONS

Conventional Transition State Theory

Transition state theory (TST), also known as conventional TST, goes back to the papers of Eyring8 and Evans and Polanyi9 in 1935. For a general gas-phase reaction of the type

$$\mathrm{A} + \mathrm{B} \rightarrow \mathrm{Products} \qquad [1]$$

where A and B may be either atoms or molecules, the theory assumes that there is an activated complex called the transition state that represents the bottleneck in the reaction process. The fundamental assumption of TST (also called the no-recrossing assumption) is only expressible in classical mechanics. It states that

(1) this transition state is identified with a dividing hypersurface (or surface, for brevity) that separates the reactant region from the product region in phase space, and

(2) all the trajectories that cross this dividing surface in the direction from reactants to products originated as reactants and never return to reactants;

that is, they cross the dividing surface only once. For this reason, the TST dividing surface is sometimes called the dynamical bottleneck. Rigorously, we can say that TST makes only four assumptions:

(1) that the Born–Oppenheimer approximation is valid, and so the reaction is electronically adiabatic;

(2) that the reactants are equilibrated in a fixed-temperature (canonical) ensemble or fixed-total-energy (microcanonical) ensemble;

(3) that there is no recrossing; and

(4) that quantum effects can be included by quantizing vibrations and by a multiplicative transmission coefficient to account for tunneling (nonclassical transmission) and nonclassical reflection.

In a world where nuclear motion is strictly classical, we need not consider (4), and the TST classical rate constant, $k_C^{\ddagger}$, for Eq. [1] is given by

$$k_C^{\ddagger} = \frac{1}{\beta h}\,\frac{Q_C^{\ddagger}(T)}{\Phi_C^{R}(T)}\,\exp\left(-\beta V^{\ddagger}\right) \qquad [2]$$

where $\beta = (k_B T)^{-1}$ ($k_B$ is the Boltzmann constant, and $T$ is the temperature), $h$ is the Planck constant, $V^{\ddagger}$ is the potential energy difference between reactants and the transition state (the barrier height, also called classical barrier height), $Q_C^{\ddagger}$ is the classical (C) partition function of the transition state, and $\Phi_C^{R}$ is the classical partition function of reactants per unit volume. (For a unimolecular reaction, we would replace $\Phi_C^{R}$ by the unitless classical reactant partition function $Q_C^{R}$.)
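As an illustration of Eq. [2], the following minimal Python sketch evaluates the conventional TST rate constant from quantities that are assumed to be already available. The function name `tst_rate_constant` and its argument names are hypothetical; the barrier height is assumed to be given in joules per molecule, and the partition functions are assumed to be referred to the zeros of energy described in the text.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s

def tst_rate_constant(temperature, q_ts, phi_r, barrier):
    """Evaluate Eq. [2]: k = (1/(beta h)) * (Q_ts / Phi_R) * exp(-beta V).

    temperature : T in kelvin
    q_ts        : transition state partition function (unitless)
    phi_r       : reactant partition function per unit volume
                  (or the unitless Q_R for a unimolecular reaction)
    barrier     : classical barrier height in J per molecule (assumed unit)
    """
    beta = 1.0 / (K_B * temperature)
    return (1.0 / (beta * H)) * (q_ts / phi_r) * math.exp(-beta * barrier)
```

With `q_ts / phi_r` equal to one and a zero barrier, the function returns the universal frequency factor $1/(\beta h) = k_B T/h$, which is about $6.25 \times 10^{12}\ \mathrm{s^{-1}}$ at 300 K.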

Note that the transition state has one less degree of freedom than does the full system; that particular degree of freedom is called the reaction coordinate, and it is missing in $Q_C^{\ddagger}$. Throughout this chapter, the symbol ‡ is used to denote the conventional transition state, which is a system confined to the vicinity of the saddle point by constraining the coordinate corresponding to the saddle point’s imaginary-frequency normal mode to have zero extension. This coordinate is the reaction coordinate in conventional transition state theory. The zero of energy for the potential is taken as the energy of the minimum energy configuration in the reactant region. The partition functions are proportional to configurational integrals of Boltzmann factors of the potential. For the reactant partition function, the zero of energy is the same as that for the potential, whereas for the partition function of the transition state, the zero of energy is taken as the local minimum in the bound vibrational modes at the saddle point, which is $V^{\ddagger}$.

We can establish a connection between Eq. [2] and thermodynamics by starting with the relation between the free energy of reaction, $\Delta G_T^{0}$ at temperature $T$, and the equilibrium constant $K$, which is given by

$$K = K^{0} \exp\left(-\Delta G_T^{0}/RT\right) \qquad [3]$$

where $K^{0}$ is the value of the reaction quotient at the standard state. (For a reaction where the number of moles decreases by one, this is the reciprocal of the standard-state concentration.) Then we rewrite Eq. [2] in quasithermodynamic terms8,10,11 as

$$k_C^{\ddagger} = \frac{1}{\beta h}\,K_C^{\ddagger}(T) \qquad [4]$$

where $K_C^{\ddagger}$ is the quasiequilibrium constant for forming the transition state. (The transition state is not a true thermodynamic species because it has one degree of freedom missing, and therefore we add the prefix "quasi".) The thermodynamic analog of Eq. [2] is now given by

$$k_C^{\ddagger} = \frac{1}{\beta h}\,K^{\ddagger,o} \exp\left[-\Delta G_{C,T}^{\ddagger,o}/RT\right] \qquad [5]$$

where $\Delta G_{C,T}^{\ddagger,o}$ represents the classical free energy of activation for the reaction under consideration.
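The quasithermodynamic form of Eq. [5] can be used in both directions: to obtain a rate constant from a free energy of activation, or to extract a phenomenological free energy of activation from a rate constant. The short Python sketch below illustrates both conversions. The function names and the default `k_std = 1.0` (the value of $K^{\ddagger,o}$ in the chosen standard state) are assumptions made for this example; the free energy is taken in J/mol.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J s
R_GAS = 8.314462618     # molar gas constant, J/(mol K)

def rate_from_free_energy(temperature, dg_act, k_std=1.0):
    """Eq. [5]: k = (k_B T / h) * K_std * exp(-dG_act / (R T))."""
    prefactor = K_B * temperature / H
    return prefactor * k_std * math.exp(-dg_act / (R_GAS * temperature))

def free_energy_from_rate(temperature, rate, k_std=1.0):
    """Invert Eq. [5] for the free energy of activation in J/mol."""
    prefactor = K_B * temperature / H
    return -R_GAS * temperature * math.log(rate / (prefactor * k_std))
```

For a bimolecular reaction, `k_std` carries the units of reciprocal concentration, so the units of the returned rate constant follow from the standard state that is chosen.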

The siren song of TST when it was first proposed was that "all the quantities may be calculated from the appropriate potential surface,"8 and in fact from very restricted regions of that surface. Specifically, one "only" needs to obtain the properties (energies, geometries, moments of inertia, vibrational frequencies, etc.) of the reactants and the transition state from the PES and to be sure that the transition state is unequivocally joined to reactants by a reaction path. One approach to ensuring this is to define the reaction path as the minimum energy path, which can be computed by steepest descent algorithms. (These techniques will be discussed in detail in the subsection entitled "The Reaction Path".) The fact that conventional transition state theory needs the potential energy surface only in small regions around the reactant minimum and saddle point is indeed enticing. We will see that when one adds variational effects, one needs a more extensive region of the potential energy surface that is, nonetheless, still localized in the valley connecting reactants to products. Then, when one adds tunneling, a longer section of the valley is needed, and sometimes the potential for geometries outside the valley, in the so-called tunneling swath, is required. Nevertheless, the method often requires only a manageably small portion of the potential energy surface, and the calculations can be quite efficient.


It is possible to improve the results of Eq. [2] by incorporating a factor $\gamma_C$, called the transmission coefficient, that accounts for some of the above approximations. The "exact" classical thermal rate constant will be given as

$$k_C = \gamma_C(T)\,k_C^{\ddagger}(T) \qquad [6]$$

We can factor the transmission coefficient into two approximately independent parts,

$$\gamma_C(T) = \Gamma_C(T)\,g(T) \qquad [7]$$

that account, respectively, for corrections to the fundamental assumption being made and to approximation (2) described earlier. When conventional TST is compared with classical trajectory calculations, one is testing the no-recrossing assumption; i.e., we are assessing how far $\Gamma_C$ is from unity, with TST being an upper bound to the classical rate constant ($\Gamma_C \le 1$). Both classical trajectory simulations (also called molecular dynamics simulations) and TST invoke the local-equilibrium approximation12 where the microstates of reactants are in local equilibrium with each other, but it has been shown that for gas-phase bimolecular reactions, the deviation of $g$ from unity is usually very small.13–18 In the case of gas-phase unimolecular reactions, the reacting molecules need to be activated, and so there is a competition between energy transfer and reaction. At low pressures, the rate constant is pressure dependent ("falloff region") and controlled by the activation and deactivation of the activated species. Only when the pressure is sufficiently high is energy redistribution much faster than the product-forming step such that TST can be applied. In this context, we can consider the TST rate constant as the high-pressure limit of a unimolecular rate constant.

The justification of variational transition state theory is rigorous only in a classical mechanical world because, when the local equilibrium assumption is valid, VTST provides an upper bound on the classical mechanical rate constant. One optimizes the definition of the transition state to minimize recrossing, and the calculated rate constant converges to the exact rate constant from above.

The derivation of TST involves calculating the flux, i.e., counting the species that pass through the dividing surface located at the transition state. This can be stated with certainty only in the realm of classical mechanics. In other words, to formulate classical TST requires that, at a given moment, we know exactly the location in coordinate space of our reactive system, which is passing through the dividing surface, and we know the sign of the momentum, which has to be positive, because the molecule is heading toward products. This violates the uncertainty principle. Nevertheless, the classical framework provides a starting point for real systems, which have quantum effects that are incorporated in two ways. First, quantum effects on motion in all degrees of freedom except the reaction coordinate near the dynamical bottleneck are included by replacing classical vibrational partition functions by quantum mechanical ones. Second, tunneling and nonclassical reflection are included through another temperature-dependent transmission coefficient, $\kappa$.

In this review we consider reactions for which auxiliary assumption (1), the Born–Oppenheimer approximation, is met or is assumed to be met. Furthermore, we assume that energy transfer processes are occurring fast enough to replenish the populations of depleted reactant states, so $g \cong 1$ for all gas-phase reactions considered here. Therefore, the true quantum mechanical rate constant is given by

$$k = \gamma(T)\,k^{\ddagger}(T) = \Gamma(T)\,\kappa(T)\,k^{\ddagger}(T) \qquad [8]$$

where $\kappa$ takes into account nonclassical effects on the reaction coordinate, and $k^{\ddagger}$ is a quantized version of $k_C^{\ddagger}$. Then we have to find a methodology to evaluate $\Gamma(T)$ and $\kappa(T)$, which are discussed in the following sections. In particular, VTST may be considered a way to calculate $\Gamma$ by finding a better transition state that has less recrossing, and semiclassical tunneling calculations may be used to estimate $\kappa$. In practical calculations on real systems, even when we optimize the transition state by VTST, we do not find a transition state that eliminates all recrossing. Thus there is still a non-unit value of $\Gamma(T)$. As we carry out better optimizations of the transition state, the exact $\Gamma$ should converge to unity. The essence of transition state theory is that one finally approximates $\Gamma$ as unity for one’s final choice of transition state.

Canonical Variational Transition State Theory

Conventional TST provides only an approximation to the "true" rate constants, in part because we are calculating the one-way flux through the dividing surface that is appropriate only for small, classical vibrations around the saddle point.19 We should be considering the net flux in a way that accounts for global dynamics, quantization of modes transverse to the reaction coordinate, and tunneling. It is important to note that "transverse" modes consist of all modes except the reaction coordinate. The first way in which the calculated rate constants can be improved is to change the location of the dividing surface, which in conventional TST8 is located at the saddle point. More generally we should also consider other dividing surfaces. The conventional transition state dividing surface is a hyperplane perpendicular to the imaginary-frequency normal mode (the reactive normal mode) of the saddle point; it is the hyperplane with displacement along the reaction normal mode set equal to zero (see Figure 1). Any other dividing surface is by definition a "generalized transition state."20 We search for generalized transition state dividing surfaces (even if they are not saddle points) that are located where the forward flux is a minimum.20–27 The practical problem involves locating this particular dividing surface $S$, which in principle is a function of all the coordinates $\mathbf{q}$ and momenta $\mathbf{p}$ of the system; that is, $S = S(\mathbf{p}, \mathbf{q})$. One way of doing this is to consider the surface as being a function of coordinates only and then simplify further this dependency by considering a few-parameter set of dividing surfaces of restricted shape and orientation (together specified by X) at a distance $s$ along a given reaction path (instead of allowing arbitrary definitions) such that $S(\mathbf{p}, \mathbf{q})$ is reduced to $S(s, \mathrm{X})$. We can go further and fix the shape of the dividing surface and use the unit vector $\hat{\mathbf{n}}$ perpendicular to the surface, instead of X, to define the dividing surface $S(s, \hat{\mathbf{n}})$. These two parameters (one scalar and one vector) are optimized variationally until the forward flux through the dividing surface is minimized.

In POLYRATE, the default for the reaction path is the minimum energy path (MEP) in isoinertial coordinates. The minimum-energy path is the union of the paths of steepest descent on the potential energy surface down from the saddle point toward reactants and products. The path of steepest descent depends on the coordinate system, and when we refer to the MEP, we always mean the one computed by steepest descents in isoinertial coordinates. Isoinertial coordinates are rectilinear coordinates in which the kinetic energy consists of diagonal square terms (that is, there are no cross terms between different components of momenta), and every coordinate has the same reduced mass. (Rectilinear coordinates are linear functions of Cartesian coordinates.) Some examples of isoinertial coordinates that one encounters are mass-weighted Cartesians, mass-weighted Cartesian displacements, mass-scaled Cartesians, and mass-scaled Jacobis. In mass-weighted coordinates,28 mass is unity and unitless, and the "coordinates" have units of length times square root of mass; in mass-scaled coordinates, the reduced mass for all coordinates is a constant $\mu$ (with units of mass), and the coordinates have units of length. We almost always use mass-scaled coordinates; the main exception is in the subsection on curvilinear internal coordinates, where much of the analysis involving internal coordinates is done in terms of unscaled coordinates.

Figure 1 Contour plot of the $\mathrm{H}_a + \mathrm{H}_b\text{-}\mathrm{H}_c \rightarrow \mathrm{H}_a\text{-}\mathrm{H}_b + \mathrm{H}_c$ collinear reaction showing the dividing surface at the transition state and minimum energy path (MEP). $X_1$ and $X_2$ indicate the $\mathrm{H}_a \cdots \mathrm{H}_b$ and $\mathrm{H}_b \cdots \mathrm{H}_c$ distances, respectively. The contour labels are in kcal/mol.

The original choice27 of dividing surface for polyatomic VTST was a hyperplane in rectilinear coordinates orthogonal to the MEP. With this choice of dividing surface, the direction of the gradient along the MEP coincides with the direction along $\hat{\mathbf{n}}$. Therefore, in this case, the dividing surface depends only on $s$, and the minimum rate is obtained by variationally optimizing the location of the surface along the MEP. The coordinate perpendicular to the dividing surface is the reaction coordinate, and the assumption that systems do not recross the dividing surface may be satisfied if this coordinate is separable from the other $3N - 1$ degrees of freedom, where $N$ is the number of atoms. The set of coordinates $\{u_1(s), \ldots, u_{3N-1}(s), s\}$ or $(\mathbf{u}, s)$ are called natural collision coordinates.29

It can be shown30 that all isoinertial coordinates can be obtained from one another by uniform scaling and an orthogonal transformation. Therefore, the MEP is the same in all such coordinate systems. This MEP is sometimes called the intrinsic reaction coordinate or IRC.31

It is not necessary to use the MEP as the reaction path; one could alternatively use a path generated by an arbitrarily complicated reaction coordinate,32 and for reactions in the condensed phase, some workers have allowed a collective bath coordinate33 to participate in the definition of the reaction path. The transition state dividing surface is defined by the MEP only on the reaction path itself. In the variational reaction path algorithm,34 the dividing surface is not necessarily perpendicular to the gradient along the MEP. Instead, it is the dividing surface that maximizes the free energy of activation,20 and so, in this case, we also optimize $\hat{\mathbf{n}}$ (discussed above and in the subsection entitled "The Reaction Path"), which allows us to make a better estimate of the net flux through the dividing surface.

It is possible to write an expression for the rate constant similar to Eq. [2] by using generalized transition state dividing surfaces. We start by describing the formulation of VTST for the original choice of dividing surface, a hyperplane in rectilinear coordinates orthogonal to the MEP and intersecting it at $s$. In this case, the generalized transition state rate constant is given by

$$k_C^{GT} = \frac{1}{\beta h}\,\frac{Q_C^{GT}(T, s)}{\Phi_C^{R}(T)}\,\exp\left[-\beta V_{MEP}(s)\right] \qquad [9]$$

where by convention $s = 0$ indicates the location of the saddle point and $s < 0$ and $s > 0$ indicate the reactant and product side of the reaction path, respectively, $V_{MEP}(s)$ is the potential evaluated on the MEP at $s$, and $Q_C^{GT}$ is the classical generalized transition state partition function. The zero of energy for the generalized transition state partition function is taken as the minimum of the local vibrational modes orthogonal to the reaction path at $s$, which is equal to $V_{MEP}(s)$. The value of the rate constant in Eq. [9], when minimized with respect to $s$, corresponds to canonical variational transition state theory, also simply called canonical variational theory (CVT)20,27,30,35,36

$$k_C^{CVT} = \min_{s} k_C^{GT}(T, s) = k_C^{GT}\left[T, s_{C,*}^{CVT}(T)\right] \qquad [10]$$

where $s_{C,*}^{CVT}$ indicates the optimum classical position of the dividing surface. (In general, an asterisk subscript on $s$ denotes the value of $s$ at a variational transition state.) The expression for the classical CVT rate constant is then

$$k_C^{CVT} = \frac{1}{\beta h}\,\frac{Q_C^{GT}\left[T, s_{C,*}^{CVT}(T)\right]}{\Phi_C^{R}(T)}\,\exp\left\{-\beta V_{MEP}\left[s_{C,*}^{CVT}(T)\right]\right\} \qquad [11]$$

The CVT rate constant can account for most of the recrossing (depending on the reaction) that takes place at the conventional transition state. It should be noted that to minimize the recrossing does not generally mean to eliminate it, and for a particular reaction, we may find that even the "best" dividing surface obtained by CVT yields a rate constant larger than the exact classical rate constant, although it can be shown that in a classical world, we can always eliminate all recrossing by optimizing the dividing surface in phase space with respect to all coordinates and momenta.37 On the other hand, assuming local equilibrium of reactant states, the CVT rate constant always improves the result obtained by conventional TST, and therefore, the following inequality holds:

$$k_C^{CVT} \le k_C^{\ddagger}(T) \qquad [12]$$

Thus, CVT takes into account the effect of the factor $\Gamma_C^{\ddagger}(T)$ on the thermal rate constant, where the superscript ‡ means recrossing of the conventional transition state, and the subscript C reminds us that we are still discussing the classical mechanical rate constant. CVT is considered to be an approximation to the exact classical rate constant

$$k_C \cong k_C^{CVT}(T) = \Gamma_C^{CVT}(T)\,k_C^{\ddagger}(T) \qquad [13]$$

where

$$\Gamma_C^{CVT} = \frac{k_C^{CVT}(T)}{k_C^{\ddagger}(T)} \qquad [14]$$
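The variational search of Eqs. [9]–[14] can be sketched as a one-dimensional minimization over a grid of $s$ values. The Python example below is only a schematic illustration: the model barrier `model_v_mep` and the model partition function `model_q_gt` are hypothetical functions invented for this example, and a simple grid search stands in for whatever optimizer a production code would use.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s

def generalized_rate(temperature, q_gt, phi_r, v_mep):
    """Eq. [9] for one dividing surface at a given s."""
    beta = 1.0 / (K_B * temperature)
    return (1.0 / (beta * H)) * (q_gt / phi_r) * math.exp(-beta * v_mep)

def cvt_rate(temperature, s_grid, q_gt_of_s, v_mep_of_s, phi_r=1.0):
    """Grid search for Eq. [10]; returns (s_star, k_cvt, gamma_cvt).

    gamma_cvt is the ratio of Eq. [14], with the conventional
    transition state located at s = 0.
    """
    rates = [generalized_rate(temperature, q_gt_of_s(s), phi_r, v_mep_of_s(s))
             for s in s_grid]
    k_cvt = min(rates)
    s_star = s_grid[rates.index(k_cvt)]
    k_tst = generalized_rate(temperature, q_gt_of_s(0.0), phi_r, v_mep_of_s(0.0))
    return s_star, k_cvt, k_cvt / k_tst

# Hypothetical model: a barrier of 10 k_B T and a transverse partition
# function that decreases toward products (a tightening transverse mode).
T = 300.0
V0 = 10.0 * K_B * T

def model_v_mep(s):
    return V0 / math.cosh(s) ** 2

def model_q_gt(s):
    return 1.0 / (1.0 + 0.5 * s)

s_grid = [i * 0.001 for i in range(-500, 501)]
s_star, k_cvt, gamma_cvt = cvt_rate(T, s_grid, model_q_gt, model_v_mep)
```

In this model the variational transition state is displaced slightly toward products, and `gamma_cvt` is slightly smaller than one, which is consistent with the inequality of Eq. [12].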

Now we consider how to incorporate quantum effects into the thermal rate constant. For the modes perpendicular to the reaction coordinate, this is done in what is often considered to be an ad hoc way by quantizing the partition functions.8 Actually, this is not totally ad hoc; it was derived, at least to order $h^2$ in Planck’s constant, by Wigner38 in 1932. Because the reaction coordinate is missing in the transition state partition functions of Eqs. [2] and [9], the rate constant is still not fully quantized at the transition state. At this point, to denote that we have incorporated quantum effects in all degrees of freedom of reactants and all but one degrees of freedom of the transition state by using quantum mechanical partition functions instead of classical mechanical partition functions, we drop the subscript (C) from all remaining expressions. The CVT rate constant is then given by

$$k^{CVT} = \frac{1}{\beta h}\,\frac{Q^{GT}\left[T, s_{*}^{CVT}(T)\right]}{\Phi^{R}(T)}\,\exp\left\{-\beta V_{MEP}\left[s_{*}^{CVT}(T)\right]\right\} \qquad [15]$$

where $\Phi^{R}$ is the quantized reactant partition function per unit volume and $Q^{GT}(T, s)$ is the quantized generalized transition state partition function at $s$. Note that the value $s_{*}^{CVT}$ that minimizes the quantized generalized transition state rate constant at temperature $T$ is not necessarily equal to the value $s_{C,*}^{CVT}(T)$ that minimizes the classical expression.

Another way to write Eq. [9] is to relate it to the free energy of activation profile $\Delta G_T^{GT,o}$ by analogy to Eq. [5]:

$$k^{GT} = \frac{1}{\beta h}\,K^{\ddagger,o} \exp\left\{-\left[G^{GT,o}(T, s) - G_T^{R,o}\right]/RT\right\} = \frac{1}{\beta h}\,K^{\ddagger,o} \exp\left[-\Delta G_T^{GT,o}(T, s)/RT\right] \qquad [16]$$

where $K^{\ddagger,o}$ is the reciprocal of concentration in the standard state for bimolecular reactions or unity for unimolecular reactions, $G_T^{GT,o}$ is the standard-state free energy of the system at the dividing surface perpendicular to the MEP, and $G_T^{R,o}$ is the classical standard-state free energy of reactants at temperature $T$. The free energy of activation profile is given as

$$\Delta G_T^{GT,o} = V_{MEP}(s) - RT \ln\left[\frac{Q^{GT}(T, s)}{K^{\ddagger,o}\,\Phi^{R}(T)}\right] \qquad [17]$$


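Therefore, the CVT rate constant can be rewritten as

$$k^{CVT} = \frac{1}{\beta h}\,K^{\ddagger,o} \exp\left\{-\Delta G_T^{CVT,o}\left[s_{*}^{CVT}(T)\right]/RT\right\} \qquad [18]$$

When comparing Eqs. [16] and [18], it can be seen that the minimum value of $k^{GT}$ as a function of $s$ is reached when the free energy of activation is maximum.20,27,39,40 This can be restated in terms of first and second derivatives; that is,

$$\left.\frac{\partial}{\partial s} k^{GT}(T, s)\right|_{s = s_{*}^{CVT}(T)} = \left.\frac{\partial}{\partial s} \Delta G_T^{GT,o}(s)\right|_{s = s_{*}^{CVT}(T)} = 0 \qquad [19a]$$

with

$$\left.\frac{\partial^2}{\partial s^2} k^{GT}(T, s)\right|_{s = s_{*}^{CVT}(T)} > 0 \qquad [19b]$$

and

$$\left.\frac{\partial^2}{\partial s^2} \Delta G_T^{GT,o}(s)\right|_{s = s_{*}^{CVT}(T)} < 0 \qquad [19c]$$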

Initially we have taken the dividing surface to be perpendicular to the MEP. In the reorientation of the dividing surface (RODS) algorithm, the dividing surface is oriented to yield the most physical free energy of activation, which is the dividing surface that maximizes $\Delta G_T^{GT,o}(S(s_i, \hat{\mathbf{n}}))$ at a given $T$ and $s_i$. In this case, the dividing surface is defined by the location $s_i$ where it intersects the MEP and a unit vector $\hat{\mathbf{n}}$ that is orthogonal to the dividing surface at the MEP. The value of the free energy with the optimum orientation at point $s_i$ is given by

$$\Delta G_T^{OGT,o} = \max_{\hat{\mathbf{n}}} \Delta G_T^{GT,o}(S(s_i, \hat{\mathbf{n}})) \qquad [20]$$

and the CVT free energy is the maximum of the orientation optimized free energies:

$$\Delta G_T^{CVT,o} = \max_{s} \Delta G_T^{OGT,o}(s) \qquad [21]$$

The algorithm used to evaluate $\Delta G_T^{OGT,o}$ will be discussed below.

Other Variational Transition State Theories

Canonical variational theory finds the best dividing surface for a canonical ensemble, characterized by temperature $T$, to minimize the calculated canonical rate constant. Alternative variational transition state theories can also be defined. This is done for other ensembles by finding the dividing surfaces that minimize the rate constants for those ensembles. For example, a microcanonical ensemble is characterized by a total energy $E$, and the generalized transition state theory rate constant for this ensemble is proportional to $N_{vr}^{GT}(E, s)$, which is the number of vibrational–rotational states with energy smaller than $E$ at a generalized transition state at $s$. Microcanonical variational transition state theory1 ($\mu$VT) is obtained by finding the dividing surface that minimizes $N_{vr}^{GT}$; i.e.,

$$N_{vr}^{\mu VT}(E) = \min_{s} N_{vr}^{GT}(E, s) \qquad [22]$$

The location of the dividing surface that minimizes Eq. [22] is defined as $s_{*}^{\mu VT}$, which specifies the microcanonical variational transition state; thus,

$$\left.\frac{\partial N_{vr}^{GT}(E, s)}{\partial s}\right|_{s = s_{*}^{\mu VT}(E)} = 0 \qquad [23]$$

Notice that the minimum-number-of-states criterion corresponds correctly to variational transition state theory, whereas an earlier minimum-density-of-states criterion does not.27 The microcanonical rate constant can be written as

$$k^{\mu VT} = \frac{Q_{el}(T) \int_0^{\infty} N_{vr}^{\mu VT}(E) \exp(-\beta E)\,dE}{h\,\Phi^{R}(T)} \qquad [24]$$

where the electronic partition function $Q_{el}(T)$ is defined below. Evaluating the microcanonical number of states can be very time consuming at high energies for big molecules. To avoid this problem, one can instead optimize the generalized transition states up to the microcanonical variational threshold energy and then use canonical theory for higher energy contributions. This approach is called improved canonical variational theory (ICVT).1,41 ICVT has the same energy threshold as $\mu$VT, but its calculation is much less time consuming. A microcanonical criterion is more flexible than is a canonical one, and therefore,

$$k^{\ddagger}(T) \ge k^{CVT}(T) \ge k^{ICVT}(T) \ge k^{\mu VT}(T) \qquad [25]$$

As we go to the right in the above sequence, the methods account more accurately for recrossing effects.
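The ordering in Eq. [25] can be illustrated numerically with a toy model. In the Python sketch below, the number of states `n_states` is a hypothetical model (a single classical harmonic transverse mode on a model barrier, in reduced units), and only the Boltzmann-weighted integral of Eq. [24] is evaluated; the common prefactor $Q_{el}(T)/[h\,\Phi^{R}(T)]$ is omitted because it does not affect the comparison. The microcanonical result minimizes the number of states at every energy before integrating, whereas the canonical analog minimizes the integral as a whole.

```python
import math

def n_states(energy, s):
    """Hypothetical N_vr^GT(E, s): one classical harmonic transverse mode."""
    v_mep = 10.0 / math.cosh(s) ** 2      # model barrier profile (reduced units)
    hbar_omega = 1.0 + 0.5 * s            # model transverse quantum (reduced units)
    return max(energy - v_mep, 0.0) / hbar_omega

def boltzmann_integral(n_of_e, beta, e_max=60.0, n_points=6001):
    """Trapezoid estimate of the integral of N(E) exp(-beta E) on [0, e_max]."""
    de = e_max / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        e = i * de
        weight = 0.5 if i in (0, n_points - 1) else 1.0
        total += weight * n_of_e(e) * math.exp(-beta * e)
    return total * de

s_grid = [i * 0.01 for i in range(-50, 51)]
beta = 1.0

# Microcanonical: minimize the number of states at each energy (Eq. [22]).
flux_muvt = boltzmann_integral(
    lambda e: min(n_states(e, s) for s in s_grid), beta)

# Canonical analog: one dividing surface for all energies.
flux_cvt = min(
    boltzmann_integral(lambda e, s=s: n_states(e, s), beta) for s in s_grid)

# Conventional TST: dividing surface fixed at the saddle point, s = 0.
flux_tst = boltzmann_integral(lambda e: n_states(e, 0.0), beta)
```

Because the same quadrature grid is used throughout, the three quantities necessarily satisfy `flux_muvt <= flux_cvt <= flux_tst`, mirroring Eq. [25] (ICVT is not modeled here).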

Sometimes it is found that even the best dividing surface gives too high rate constants because another reaction bottleneck exists. Those cases can be handled, at least approximately, by the unified statistical (US) model.42,43 In this method, the thermal rate constant can be written as

$$k^{US} = \frac{Q_{el}(T) \int_0^{\infty} N_{vr}^{US}(E) \exp(-\beta E)\,dE}{h\,\Phi^{R}(T)} \qquad [26]$$

where

$$N_{vr}^{US}(E) = N_{vr}^{\mu VT}(E)\,\Gamma^{US}(E) \qquad [27]$$

The US recrossing factor due to the second bottleneck is defined as

$$\Gamma^{US}(E) = \left\{1 + \frac{N_{vr}^{\mu VT}(E)}{N_{vr}^{min}(E)} - \frac{N_{vr}^{\mu VT}(E)}{N_{vr}^{max}(E)}\right\}^{-1} \qquad [28]$$

where $N_{vr}^{min}(E)$ is the second lowest minimum of the accessible number $N_{vr}^{GT}(E, s)$ of vibrational–rotational states, and $N_{vr}^{max}(E)$ is the maximum of $N_{vr}^{GT}(E, s)$ located between the two minima in the number of vibrational–rotational states. This approach is nonvariational but always satisfies the relation

$$k^{\mu VT} \ge k^{US} \qquad [29]$$

In the case that the same physical approximations are applied to fluxes in a canonical ensemble, we call this canonical unified statistical theory (CUS)44 and the recrossing factor $\Gamma^{CUS}$ is given by

$$\Gamma^{CUS}(T) = \left[1 + \frac{q_{vr}^{CVT}(T)}{q_{vr}^{max}(T)} - \frac{q_{vr}^{CVT}(T)}{q_{vr}^{min}(T)}\right]^{-1} \qquad [30]$$

where

$$q_{vr}^{CVT} = Q_{vr}^{GT}(T, s_{*}^{CVT}) \exp\left[-\beta V_{MEP}(s_{*}^{CVT})\right] \qquad [31]$$

is the partition function evaluated at the maximum of the free energy of activation profile, $q_{vr}^{max}$ is evaluated at the second highest maximum, and $q_{vr}^{min}(T)$ is evaluated at the lowest minimum between the two maxima. The CUS rate constant is given by

$$k^{CUS} = \Gamma^{CUS}(T)\,k^{CVT}(T) \qquad [32]$$

In the limit that there are two equivalent maxima in the free energy of activation profile with a deep minimum between them, the statistical result is obtained; i.e., $\Gamma^{CUS} = 0.5$. Note that signs appear different in Eqs. [28] and [30] because in the former, "max" and "min" are associated with local maxima and minima, respectively, of the flux, whereas in the latter, they are associated with maxima and minima, respectively, of the free energy of activation profile, not of the flux.
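The two recrossing factors of Eqs. [28] and [30] are simple enough to write down directly, which also makes the statistical limit easy to check. The following Python sketch uses hypothetical function and argument names; the inputs are assumed to have been computed elsewhere.

```python
def us_recrossing_factor(n_muvt, n_min, n_max):
    """Eq. [28]: n_min is the second lowest minimum of the number of
    states and n_max is the maximum located between the two minima."""
    return 1.0 / (1.0 + n_muvt / n_min - n_muvt / n_max)

def cus_recrossing_factor(q_cvt, q_max, q_min):
    """Eq. [30]: q_max is evaluated at the second highest free energy
    maximum and q_min at the lowest free energy minimum between the maxima."""
    return 1.0 / (1.0 + q_cvt / q_max - q_cvt / q_min)

# Two equivalent free energy maxima separated by a deep minimum:
# q_max equals q_cvt, and q_min is very large (a large flux in the well).
gamma_statistical = cus_recrossing_factor(1.0, 1.0, 1.0e12)

# A single effective bottleneck: the two correction terms cancel.
gamma_single = cus_recrossing_factor(1.0, 5.0, 5.0)
```

In the first case the factor approaches 0.5, the statistical result quoted in the text; in the second case it is essentially unity, so the CUS rate constant reduces to the CVT one.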

Quantum Effects on the Reaction Coordinate

Up to this point we have incorporated quantum mechanics in the $F - 1$ bound degrees of freedom (where $F$ is the total number of bound and unbound vibrations; $F = 3N - 6$, or $3N - 5$ for linear species, where $N$ is the number of atoms) through the partition functions, and therefore, both the TST and the CVT rate constants are quantized. The difference between the two theories is still given by the factor

$$\Gamma^{CVT}(T) = k^{CVT}(T)/k^{\ddagger}(T) \qquad [33]$$

which takes into account the recrossing. To quantize all degrees of freedom requires incorporation of quantum effects into the reaction coordinate, through a multiplicative transmission coefficient $\kappa(T)$. For example, for CVT, we write

$$k^{CVT/Y}(T) = \kappa^{CVT/Y}(T)\,k^{CVT}(T) \qquad [34]$$

where Y indicates the method to evaluate the quantum effects. The main quantum effect to account for is tunneling through the reaction barrier. We can classify tunneling calculations into three levels depending on level of approximation:45

(1) one-dimensional approximations,

(2) multidimensional zero-curvature approximations, and

(3) multidimensional corner-cutting approximations.

Early models that were developed correspond to the first level of approximation and are based on the probability of penetration of a mass point through a one-dimensional barrier,46,47 whose shape was usually given by an analytical function, for example, a parabola48–50 or an Eckart barrier,51 that is fitted to the shape of the potential along the reaction path. The method of Wigner38 actually corresponds to the leading term in an expansion in $\hbar$; as it depends only on the quadratic force constant along the reaction path at the saddle point, it may be considered an approximation to the one-dimensional parabolic result. These one-dimensional models, although historically important, are not very accurate because they do not take into account the full dimensionality of the system under study. Detailed discussion of multidimensional tunneling methods is provided below.
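For orientation, the Wigner correction mentioned above is usually written as $\kappa^{W}(T) = 1 + \frac{1}{24}\left(\hbar\,|\omega^{\ddagger}|\,\beta\right)^2$, where $\omega^{\ddagger}$ is the imaginary frequency at the saddle point. This closed form is not given in the text above and is quoted here as the standard expression. The Python sketch below evaluates it for an imaginary frequency supplied as a wavenumber magnitude in cm$^{-1}$; the function name is hypothetical.

```python
HC_OVER_KB = 1.438776877   # second radiation constant h c / k_B, in cm K

def wigner_kappa(imag_wavenumber, temperature):
    """Wigner transmission coefficient, 1 + u**2 / 24.

    imag_wavenumber : magnitude of the imaginary frequency, cm^-1
    temperature     : T in kelvin
    """
    # u = hbar |omega| beta, written with the frequency as a wavenumber
    u = HC_OVER_KB * abs(imag_wavenumber) / temperature
    return 1.0 + u * u / 24.0
```

For a 1000 cm$^{-1}$ imaginary frequency at 300 K this gives a factor of about 1.96. A correction this large lies outside the range where a leading-order expansion can be trusted, which is one reason the one-dimensional models are described here as not very accurate.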

PRACTICAL METHODS FOR QUANTIZED VTST CALCULATIONS

In this section, we provide details of methods used in computations of quantities needed in quantized VTST rate constant calculations. We start by discussing methods used to define dividing surfaces. As the reaction path plays an important role in parameterizing dividing surfaces, we first describe methods for its evaluation. We then discuss calculations of partition functions and numbers of states needed in the rate constant calculations.


The Reaction Path

This section describes some algorithms used to calculate the reaction path efficiently. The evaluation of the CVT rate constants requires the knowledge of at least part of a reaction path, which can be calculated by some of the steepest-descent methods briefly described in the first subsection. The second subsection explains a reaction-path algorithm that, at a given value of the reaction coordinate, finds the orientation of the hyperplanar dividing surface that maximizes the free energy. Later on, more general shapes for the dividing surface are discussed.

The Minimum Energy Path

The minimum energy path is the path of steepest descents in isoinertial coordinates from the saddle point into the reactant and product regions. For the general reaction of Eq. [1] in which the reactive system is composed of $N$ atoms ($N = N_A + N_B$) and $i = 1, 2, \ldots, N$ labels the atoms, we define the $3N$ Cartesian coordinates as $\mathbf{R}$. The origin of the coordinate system is arbitrary, although it is often convenient to define it as the center of mass of the system. The saddle point geometry in Cartesian coordinates, denoted $\mathbf{R}^{\ddagger}$, is a stationary point, and the first derivatives of the potential energy, $V$, with respect to the coordinates at $\mathbf{R}^{\ddagger}$ are zero:

$$\nabla V = \left.\frac{\partial V}{\partial \mathbf{R}}\right|_{\mathbf{R} = \mathbf{R}^{\ddagger}} = 0 \qquad [35]$$

It is useful to change from Cartesian coordinates to a mass-scaled coordinate system defined by

$$x_{i\alpha} = \left(\frac{m_i}{\mu}\right)^{1/2} R_{i\alpha} \qquad [36]$$

where $m_i$ is the mass of nucleus $i$, $\mu$ is an arbitrary mass, and $\alpha$ denotes the Cartesian component ($x$, $y$, or $z$). For bimolecular reactions like Eq. [1], it is common either to use the reduced mass of reactants

$$\mu_{rel} = \frac{m_A m_B}{m_A + m_B} \qquad [37]$$

or to use a value of 1 amu for $\mu$. For these isoinertial coordinates, the kinetic energy of the nuclear motion simplifies from

$$T = \frac{1}{2} \sum_{i=1}^{N} m_i \sum_{\alpha = x, y, z} \dot{R}_{i\alpha}^2 \qquad [38]$$

to a diagonal form

$$T = \frac{1}{2}\,\mu \sum_{i=1}^{N} \sum_{\alpha = x, y, z} \dot{x}_{i\alpha}^2 \qquad [39]$$

where $\dot{x}_{i\alpha}$ represents the derivative of $x_{i\alpha}$ with respect to time. With the choice of 1 amu for $\mu$, the numerical value of coordinates expressed in Å is identical to the numerical value of a mass-weighted28 Cartesian coordinate in amu$^{1/2}$ Å. The motion of the polyatomic system is reduced to the motion of a point mass $\mu$ on a potential surface $V$ with the classical equations of motion given by

$$\mu \frac{d}{dt}\dot{x}_{i\alpha} = -\frac{\partial V}{\partial x_{i\alpha}} \qquad [40]$$
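The transformation of Eq. [36] and the equivalence of the two kinetic energy expressions, Eqs. [38] and [39], can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the masses and velocities in the example are arbitrary numbers chosen for the check.

```python
import math

def mass_scale(vectors, masses, mu=1.0):
    """Eq. [36]: scale each atom's Cartesian components by sqrt(m_i / mu).

    vectors : list of (x, y, z) tuples, one per atom (positions or velocities)
    masses  : list of atomic masses in the same mass unit as mu
    """
    return [tuple(math.sqrt(m / mu) * component for component in atom)
            for atom, m in zip(vectors, masses)]

def kinetic_energy_cartesian(velocities, masses):
    """Eq. [38]: T = 1/2 sum_i m_i sum_alpha (dR_ia/dt)^2."""
    return 0.5 * sum(m * sum(v * v for v in atom)
                     for atom, m in zip(velocities, masses))

def kinetic_energy_scaled(scaled_velocities, mu=1.0):
    """Eq. [39]: T = 1/2 mu sum_i sum_alpha (dx_ia/dt)^2."""
    return 0.5 * mu * sum(v * v for atom in scaled_velocities for v in atom)

# Arbitrary example: three atoms with approximate masses in amu.
masses = [15.995, 1.008, 1.008]
velocities = [(0.1, -0.2, 0.3), (1.0, 0.5, -0.4), (-0.7, 0.2, 0.9)]
t_cartesian = kinetic_energy_cartesian(velocities, masses)
t_scaled = kinetic_energy_scaled(mass_scale(velocities, masses, mu=2.0), mu=2.0)
```

Because the scaling is linear and time independent, velocities transform in the same way as positions, and the two kinetic energies agree for any choice of `mu`.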

A generalized transition state is a tentative dynamical bottleneck, and a tentative reaction coordinate is a nearly separable coordinate in a direction from reactants to products. Thermal rate constants are dominated by near-threshold events, and near the reaction threshold, a nearly separable coordinate in a direction from reactants to products is given by following the equations of motion but damping out the velocity along the trajectory. With this damping, the equations of motion can be rewritten for an infinitesimal time interval $t$ as

$$\mu\,\dot{x}_{i\alpha} = -\frac{\partial V}{\partial x_{i\alpha}}\,t \qquad [41]$$

The integration constant is zero because of the assumption of infinitesimal velocity ($\dot{\mathbf{x}} = 0$ at $t = 0$). We can rewrite Eq. [41] in vector form as

$$\mu\,d\mathbf{x} = -\nabla V(\mathbf{x})\,d\tau = -\mathbf{G}(\mathbf{x})\,d\tau \qquad [42]$$

where $d\tau = t\,dt$. If we define an infinitesimal mass-scaled distance along the path as $ds$, then

$$ds = \left[\sum_{i=1}^{N} \sum_{\alpha = x, y, z} dx_{i\alpha}^2\right]^{1/2} = \frac{|\mathbf{G}(\mathbf{x})|}{\mu}\,d\tau \qquad [43]$$

with $|\mathbf{G}|$ being the modulus of the gradient. Substituting Eq. [43] in Eq. [42], we obtain

$$\frac{d\mathbf{x}}{ds} = -\hat{\mathbf{G}}(\mathbf{x}) = \mathbf{v}(\mathbf{x}) \qquad [44]$$

where $\hat{\mathbf{G}} = \mathbf{G}/|\mathbf{G}|$ is the normalized gradient, and $\mathbf{v}$ is a vector with opposite direction to the gradient. The MEP can be followed by solving the above differential equation. The displacement on the MEP is given by the steepest descent direction along $\mathbf{v}$, where $s$ indicates the progression along the path52–55 and $\mathbf{x}(s)$ the geometry.
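A minimal numerical illustration of Eq. [44] is to take fixed-length steps along the negative normalized gradient. The Python sketch below does this on a hypothetical two-dimensional model potential, assumed to be expressed already in isoinertial coordinates, with a saddle point at (0, 0) and a product minimum at (1, 0.5). The potential, the step size, and the stopping criteria are all choices made for this example, and the starting point is simply displaced by one step from the saddle point along its unbound mode, which is the $x$ direction in this model.

```python
import math

def potential(x, y):
    """Hypothetical model surface with a curved valley, y = x**2 / 2."""
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.5 * x * x) ** 2

def gradient(x, y):
    """Analytic gradient G of the model potential."""
    w = y - 0.5 * x * x
    return (4.0 * x * (x * x - 1.0) - 4.0 * x * w, 4.0 * w)

def follow_mep(start, step=0.01, max_steps=400, grad_tol=1.0e-3):
    """Integrate dx/ds = -G/|G| (Eq. [44]) with fixed-length steps."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        gx, gy = gradient(x, y)
        norm = math.hypot(gx, gy)
        if norm < grad_tol:          # essentially at a stationary point
            break
        x -= step * gx / norm        # step along v = -G/|G|
        y -= step * gy / norm
        path.append((x, y))
    return path

# Start one step away from the saddle point, on the product side.
path = follow_mep((0.01, 0.0))
x_end, y_end = path[-1]
```

With a fixed step length the path does not settle exactly on the minimum but ends within roughly one step length of it, which is a simple way to see why the step size controls the accuracy of such first-order schemes.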


For a practical evaluation of the MEP, the first stage involves the knowledge of the transition state (or first-order saddle point) geometry. By convention we locate the transition state at $s = 0$, and we denote its scaled-mass Cartesian-coordinates geometry by $\mathbf{x}^{\ddagger}$. Reactants and products sides are given by values of $s < 0$ and $s > 0$, respectively. There are very efficient algorithms to evaluate transition state geometries,56–58 which are available in many popular electronic structure packages. We cannot use Eq. [44] to take a step from the saddle point along the reaction path because the gradient is zero. At the saddle point, the direction of the MEP is given by the unbound vibrational mode, which requires evaluation of the normal mode frequencies and eigenvectors at the saddle point. At stationary points, the vibrational frequencies are calculated by diagonalization of the $3N \times 3N$ matrix of force constants $\mathbf{F}$, which are the second derivatives of the potential with respect to isoinertial Cartesian coordinates scaled to a mass $\mu$. $\mathbf{F}$ is also called the Hessian. For instance, for the conventional transition state geometry $\mathbf{x}^{\ddagger}$, this matrix can be diagonalized by performing the unitary transformation:

$$\mathbf{L}(\mathbf{x}^{\ddagger})^{\dagger}\,\mathbf{F}(\mathbf{x}^{\ddagger})\,\mathbf{L}(\mathbf{x}^{\ddagger}) = \boldsymbol{\Lambda}(\mathbf{x}^{\ddagger}) \qquad [45]$$

where † denotes transpose, $\boldsymbol{\Lambda}$ is the $3N \times 3N$ diagonal matrix with eigenvalues $\lambda_m$ on the diagonal (with $m = 1, 2, \ldots, 3N$), and the eigenvectors are arranged as a matrix $\mathbf{L}$ whose columns $\mathbf{L}_m$ correspond to the $3N$ normal-mode directions. The normal-mode frequencies at the saddle point can be obtained from the eigenvalues by the relation:

$$\omega_m(s = 0) = \left[\lambda_m(\mathbf{x}^{\ddagger})/\mu\right]^{1/2} \qquad [46]$$

The saddle point has 6 zero eigenvalues (5 if it is linear), which correspond tothe overall rotation and translation of the molecule. We define F as the numberof vibrational modes (F ¼ 3N � 6 for a nonlinear molecule or 3N � 5 for alinear molecule), where for a saddle point, the first F � 1 modes are boundwith positive eigenvalues and real frequencies. Mode F is unbound with animaginary frequency (oz) corresponding to motion parallel to the MEP atthe saddle point. The eigenvector associated with this frequency is denotedby LFðxzÞ. The first geometry along the MEP toward reactants (� sign) andtoward products (þ sign) is given by

xðs1 ¼ dsÞ ¼ xz dsLFðxzÞ ½47�where ds is the step length. The sign of LFðxzÞ is chosen so that the vectorpoints from reactants towards products.
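The diagonalization of Eq. [45], the frequencies of Eq. [46], and the first step of Eq. [47] can be sketched as follows. This is a minimal illustration, not the implementation of any particular program: the function name `first_mep_step`, the reference vector `toward_products` (used only to fix the arbitrary eigenvector phase), and the two-dimensional model Hessian are assumptions introduced here.

```python
import numpy as np

def first_mep_step(x_saddle, hessian, ds, toward_products, mu=1.0, sign=+1):
    """First step off a saddle point along the unbound mode (Eqs. [45]-[47]).

    hessian is the force-constant matrix in mass-scaled coordinates;
    toward_products is a hypothetical reference vector used only to fix
    the arbitrary phase of the eigenvector.
    """
    lam, L = np.linalg.eigh(hessian)      # Eq. [45]: eigenvalues in ascending order
    if lam[0] >= 0.0:
        raise ValueError("no negative eigenvalue: not a first-order saddle point")
    L_F = L[:, 0]                          # unbound mode (the negative eigenvalue)
    if L_F @ toward_products < 0.0:        # orient from reactants toward products
        L_F = -L_F
    omega = np.sqrt(lam.astype(complex) / mu)   # Eq. [46]; imaginary when lam < 0
    x1 = np.asarray(x_saddle, dtype=float) + sign * ds * L_F   # Eq. [47]
    return x1, omega

# Illustrative saddle point at the origin with model Hessian diag(-4, 2)
x1, omega = first_mep_step([0.0, 0.0], np.diag([-4.0, 2.0]), ds=0.01,
                           toward_products=np.array([1.0, 0.0]))
```

Passing `sign=-1` gives the corresponding first step on the reactant side.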

For the geometry $\mathbf{x}(s_1)$ ($\mathbf{x}_1$ hereafter), the gradient is different from zero, and so for the next geometry $\mathbf{x}_2$, or in general for a geometry $\mathbf{x}_n$ with $n > 1$, we can apply Eq. [44] and follow the opposite direction of the normalized gradient:

$$\mathbf{x}_n = \mathbf{x}_{n-1} - \delta s\,\hat{\mathbf{G}}_{n-1} = \mathbf{x}_{n-1} + \delta s\,\mathbf{v}_{n-1} \qquad [48]$$

where we use the shorthand notation $\hat{\mathbf{G}}_n = \hat{\mathbf{G}}(\mathbf{x}_n)$ and $\mathbf{v}_n = \mathbf{v}(\mathbf{x}_n)$. The above first-order equation gives the MEP geometries by the so-called Euler steepest-descent (ESD) method (ref. 59). For an accurate evaluation of the MEP, the step size has to be small because the error is proportional to $(\delta s)^2$. Some other Euler-type methods try to reduce the error, such as the predictor-corrector algorithm (refs. 60, 61), the optimized Euler stabilization method (ref. 59), and the backward Euler method (ref. 62). Of all the Euler-based steepest-descent methods, the optimized Euler stabilization method, version 1 (ES1*), is the one that produces the best-converged paths (ref. 59). The ESD method provides an initial geometry

$$\mathbf{x}_n^{(0)} = \mathbf{x}_{n-1} + \delta s\,\mathbf{v}_{n-1} \qquad [49]$$

Then a corrector step is specified as the point at the minimum of a parabolic fit along a line that goes through $\mathbf{x}_n^{(0)}$ and is parallel to a "bisector" vector $\mathbf{d}_n$, which is given by (ref. 60)

$$\mathbf{d}_n = \frac{\mathbf{v}(\mathbf{x}_{n-1}) - \mathbf{v}(\mathbf{x}_n^{(0)})}{\left|\mathbf{v}(\mathbf{x}_{n-1}) - \mathbf{v}(\mathbf{x}_n^{(0)})\right|} \qquad [50]$$

The new geometry is given by

$$\mathbf{x}_n = \mathbf{x}_n^{(0)} + \xi\,\mathbf{d}_n \qquad [51]$$

where $\xi$ is a step along $\mathbf{d}_n$, with a step size proportional to a user-provided parameter $\delta_2$. The correction is not carried out if $|\mathbf{v}(\mathbf{x}_{n-1}) - \mathbf{v}(\mathbf{x}_n^{(0)})| < \omega$, with $\omega$ being a small value characteristic of some small angle between gradients. The algorithm is sensitive to the values of $\delta_2$ and $\omega$, and in the ES1* method both are set according to recommendations (ref. 59) based on systematic studies of convergence: $\delta_2 = \delta s$ and $\omega = 0.01$.
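The plain Euler steepest-descent recursion of Eq. [48] can be sketched as below. The two-dimensional model surface $V = (x^2 - 1)^2 + y^2$, the function names, and the starting point are illustrative assumptions; the ES1* corrector of Eqs. [49]–[51] is not included.

```python
import numpy as np

def grad_model(x):
    """Gradient of the illustrative surface V = (x^2 - 1)^2 + y^2 (an assumption)."""
    return np.array([4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]])

def esd_path(x1, grad, ds, n_steps):
    """Euler steepest descent, Eq. [48]: x_n = x_{n-1} - ds * G_hat(x_{n-1})."""
    path = [np.asarray(x1, dtype=float)]
    for _ in range(n_steps):
        g = grad(path[-1])
        norm = np.linalg.norm(g)
        if norm < 1e-12:          # normalized gradient undefined at a stationary point
            break
        path.append(path[-1] - ds * g / norm)
    return np.array(path)

# Start slightly off the saddle point at the origin and descend toward the minimum at (1, 0)
path = esd_path([0.01, 0.002], grad_model, ds=0.01, n_steps=60)
```

Every step has length $\delta s$ by construction, so the path coordinate is simply the number of steps times $\delta s$; the small transverse displacement in the starting point decays as the path approaches the valley floor.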

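The above methods are based on a local linear approximation to the energy, with quadratic information being used only at the saddle point. Another possibility is to use algorithms that exploit higher-order information about the potential energy and that are, in general, more accurate. Page and McIver (ref. 63) have presented a successful method that does this. First, a cubic expansion of the potential energy surface around the saddle point was proposed to take the initial step along the MEP. In this case, the first point along the reaction path is given by

$$\mathbf{x}(s_1 = \pm\delta s) = \mathbf{x}^{\ddagger} \pm \delta s\,\mathbf{L}_F(\mathbf{x}^{\ddagger}) + \frac{1}{2}(\delta s)^2\,\mathbf{c}(\mathbf{x}^{\ddagger}) \qquad [52]$$

where the vector $\mathbf{c}(\mathbf{x}^{\ddagger})$ is defined by

$$\mathbf{A}\,\mathbf{c}(\mathbf{x}^{\ddagger}) = \mathbf{C}(\mathbf{x}^{\ddagger})\,\mathbf{L}_F(\mathbf{x}^{\ddagger}) - \left[\mathbf{L}_F^{\dagger}(\mathbf{x}^{\ddagger})\,\mathbf{C}(\mathbf{x}^{\ddagger})\,\mathbf{L}_F(\mathbf{x}^{\ddagger})\right]\mathbf{L}_F(\mathbf{x}^{\ddagger}) \qquad [53a]$$

where

$$\mathbf{A} = 2\,\mathbf{L}_F^{\dagger}(\mathbf{x}^{\ddagger})\,\mathbf{F}(\mathbf{x}^{\ddagger})\,\mathbf{L}_F(\mathbf{x}^{\ddagger})\,\mathbf{I} + \left[2\,\mathbf{L}_F(\mathbf{x}^{\ddagger})\,\mathbf{L}_F^{\dagger}(\mathbf{x}^{\ddagger}) - \mathbf{I}\right]\mathbf{F}(\mathbf{x}^{\ddagger}) \qquad [53b]$$

with $\mathbf{I}$ being the identity matrix, and $\mathbf{C}(\mathbf{x}^{\ddagger})$ is given by a finite difference expansion of the force constant matrix around the saddle point with a preselected step $\delta_3$:

$$\mathbf{C}(\mathbf{x}^{\ddagger}) = \frac{\mathbf{F}(\mathbf{x}^{\ddagger} + \delta_3\,\mathbf{L}_F(\mathbf{x}^{\ddagger})) - \mathbf{F}(\mathbf{x}^{\ddagger} - \delta_3\,\mathbf{L}_F(\mathbf{x}^{\ddagger}))}{2\delta_3} \qquad [54]$$

Although the algorithm is cubic, it requires calculations of Hessian matrices only near the saddle point.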

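One of the most popular second-order methods for following the steepest-descent path is the local quadratic approximation of Page and McIver (ref. 63), which we call the Page–McIver (PM) algorithm and describe next. At a given geometry $\mathbf{x}_n$ along the path, we evaluate the Hessian matrix $\mathbf{F}_n$ and diagonalize it using

$$\boldsymbol{\alpha}_n = \mathbf{U}_n^{\dagger}\,\mathbf{F}_n\,\mathbf{U}_n \qquad [55]$$

where $\mathbf{U}_n$ is an orthogonal matrix of column eigenvectors and $\boldsymbol{\alpha}_n$ is a diagonal matrix of eigenvalues. The geometry at the next step along the MEP is given by

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{D}_n(\xi)\,\mathbf{v}_n \qquad [56]$$

where

$$\mathbf{D}_n(\xi) = \mathbf{U}_n\,\mathbf{M}_n(\xi)\,\mathbf{U}_n^{\dagger} \qquad [57]$$

and $\mathbf{M}_n$ is a diagonal matrix with diagonal elements given by

$$M_{ii}(\xi) = \left[\exp(-\alpha_{n,ii}\,\xi) - 1\right]/\alpha_{n,ii} \qquad [58]$$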

The variable $\xi$ is a progress variable that is zero at $\mathbf{x}_n$ and is related to the reaction coordinate $s$ by

$$\frac{ds}{d\xi} = \left[\frac{d\mathbf{x}^{\dagger}}{d\xi}\,\frac{d\mathbf{x}}{d\xi}\right]^{1/2} \qquad [59]$$

which can be rewritten

$$\frac{d\xi}{ds} = \sum_{i=1}^{3N} h_i^2 \exp(-2\alpha_{n,ii}\,\xi) \qquad [60]$$

where

$$\mathbf{h}_n = \mathbf{U}_n^{\dagger}\,\mathbf{G}_n \qquad [61]$$

The next value of the reaction path coordinate $s_{n+1} = s_n + \delta s$ is given by choosing the value of $\xi$ to satisfy the following integral equation:

$$\delta s = \int_0^{\xi} d\xi' \left(\sum_{i=1}^{3N} h_i^2 \exp(-2\alpha_{n,ii}\,\xi')\right)^{-1} \qquad [62]$$

which is numerically integrated by the trapezoidal rule. An option is to evaluate a new Hessian after a given number of steps along the reaction path rather than after each step, in which case we call it the modified Page–McIver algorithm (ref. 59).

Variational Reaction Path Algorithm

The original approach for defining variational dividing surfaces, once the MEP is determined, is to choose them to be hyperplanes in rectilinear coordinates, which are constrained to be orthogonal to the MEP. In this case the dividing surfaces are characterized by a single parameter, the location $s$ along the MEP. In the reorientation of the dividing surface (RODS) method, the dividing surface is not constrained to be orthogonal to the MEP and its orientation is optimized to maximize the free energy for points along the MEP. The previously described algorithms allow calculation of a well-converged MEP by the steepest-descent path from the saddle point to reactants or to products. However, obtaining a well-converged path may be computationally very demanding, and so some alternative strategies have been suggested (refs. 34, 64) for defining optimum dividing surfaces even if the MEP is not well converged. One such approach is the variational reaction path (VRP) algorithm, which is a combination of the ESD and RODS algorithms. The first geometry along the path can be obtained from Eq. [47] or Eq. [52] as discussed above. The geometries along the path, for instance, a given geometry $\mathbf{x}_n$, are obtained by first applying the ESD method to obtain a zero-order approximation to the geometry on the MEP

$$\mathbf{x}_n^{(0)} = \mathbf{x}_{n-1} - \delta s\,\hat{\mathbf{G}}_{n-1} \qquad [63]$$

We define the dividing surface as a hyperplane in rectilinear coordinates, which is orthogonal to the unit vector $\hat{\mathbf{n}}_n$ and passes through the geometry $\mathbf{x}_n^{(0)}$. The potential in the hyperplane is approximated through quadratic terms and is most easily expressed in terms of the generalized normal modes for motion in the $(F - 1)$-dimensional space of the hyperplane (note that conventional normal modes are defined only at stationary points, so this concept must be generalized to use it at geometries where the gradient of the potential does not vanish):

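$$V(\mathbf{x}) = V(\mathbf{x}_n^{(0)}) + \sum_{m=1}^{F-1}\left[G_{n,m}^{E}(\hat{\mathbf{n}}_n)\,Q_m + \frac{1}{2}\lambda_{n,m}^{E}(\hat{\mathbf{n}}_n)\,Q_m^2\right] \qquad [64]$$

where $Q_m$ is the displacement from $\mathbf{x}_n^{(0)}$ in generalized normal mode $m$, and the gradient and force constant along mode $m$ are defined as follows. The gradient vector and Hessian matrix evaluated at $\mathbf{x}_n^{(0)}$ are denoted $\mathbf{G}_n^{(0)}$ and $\mathbf{F}_n^{(0)}$, respectively. Motion along the vector $\hat{\mathbf{n}}_n$, as well as rotations and translations, are projected out to give a projected gradient

$$\mathbf{G}_n^{P,(0)}(\hat{\mathbf{n}}_n) = \left(\mathbf{I} - \hat{\mathbf{n}}_n\hat{\mathbf{n}}_n^{\dagger}\right)\left(\mathbf{I} - \mathbf{P}^{RT}\right)\mathbf{G}_n^{(0)} \qquad [65]$$

and a projected Hessian matrix

$$\mathbf{F}_n^{P,(0)}(\hat{\mathbf{n}}_n) = \left(\mathbf{I} - \hat{\mathbf{n}}_n\hat{\mathbf{n}}_n^{\dagger}\right)\left(\mathbf{I} - \mathbf{P}^{RT}\right)\mathbf{F}_n^{(0)}\left(\mathbf{I} - \mathbf{P}^{RT}\right)\left(\mathbf{I} - \hat{\mathbf{n}}_n\hat{\mathbf{n}}_n^{\dagger}\right) \qquad [66]$$

where $\mathbf{P}^{RT}$ is the matrix that projects onto the translations and rotations (ref. 65). The gradient vector and force constant matrix in the eigenvalue representation are then given by

$$\mathbf{G}_n^{E}(\hat{\mathbf{n}}_n) = \left[\mathbf{L}_n^{P}(\hat{\mathbf{n}}_n)\right]^{\dagger}\mathbf{G}_n^{P,(0)}(\hat{\mathbf{n}}_n) \qquad [67]$$

and

$$\boldsymbol{\Lambda}_n^{E}(\hat{\mathbf{n}}_n) = \left[\mathbf{L}_n^{P}(\hat{\mathbf{n}}_n)\right]^{\dagger}\mathbf{F}_n^{P,(0)}(\hat{\mathbf{n}}_n)\,\mathbf{L}_n^{P}(\hat{\mathbf{n}}_n) \qquad [68]$$

where $\boldsymbol{\Lambda}_n^{E}$ is a diagonal matrix with elements $\lambda_{n,m}^{E}$ along the diagonal and $\mathbf{L}_n^{P}$ is the matrix of eigenvectors that diagonalizes the projected Hessian matrix. The eigenvalues and eigenvectors are ordered so that the first $F - 1$ correspond to the modes in the hyperplane and modes $F, F + 1, \ldots, 3N$ correspond to the modes along $\hat{\mathbf{n}}_n$ and translations and rotations, which have zero eigenvalues. The normal mode coordinates are defined by

$$\mathbf{Q} = \left[\mathbf{L}_n^{P}(\hat{\mathbf{n}}_n)\right]^{\dagger}\left(\mathbf{x} - \mathbf{x}_n^{(0)}\right) \qquad [69]$$

and the elements $F, F + 1, \ldots, 3N$ will be zero for motion constrained to the hyperplane.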


The coordinate along the variational reaction path is then defined as the location of the minimum of the local quadratic potential in the hyperplane as given by Eq. [64], which is given by

$$\mathbf{x}_n = \mathbf{x}_n^{(0)} + \mathbf{L}_n^{P}(\hat{\mathbf{n}}_n)\,\mathbf{Q}^{M}(\hat{\mathbf{n}}_n) \qquad [70]$$

where the minimum in the normal mode coordinates is given by

$$Q_m^{M}(\hat{\mathbf{n}}_n) = \begin{cases} -G_{n,m}^{E}(\hat{\mathbf{n}}_n)\big/\lambda_{n,m}^{E}(\hat{\mathbf{n}}_n), & m = 1, \ldots, F - 1 \\ 0, & m = F, \ldots, 3N \end{cases} \qquad [71]$$

In the ESD algorithm, for which $\mathbf{x}_n = \mathbf{x}_n^{(0)}$, the value of $s$ along the path is simply given by the arc length between adjacent points on the MEP

$$s_n = s_{n-1} \pm \delta s \qquad [72]$$

where the sign is negative on the reactant side and positive on the product side of the saddle point. Although $\mathbf{x}_n$ is not necessarily equal to $\mathbf{x}_n^{(0)}$ for the variational reaction path, it has been found that use of Eq. [72] provides a better estimate of computed rate constants than a method that uses the difference between $\mathbf{x}_n$ and $\mathbf{x}_{n-1}$ in evaluating $s$.

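A complete description of the variational reaction path approach still requires definition of the vector $\hat{\mathbf{n}}_n$. If $\hat{\mathbf{n}}_n$ is chosen to be along the gradient vector $\hat{\mathbf{G}}_n^{(0)}$, then $\mathbf{G}_n^{P,(0)}(\hat{\mathbf{n}}_n)$ and $\mathbf{G}_n^{E}(\hat{\mathbf{n}}_n)$ are zero [i.e., $(\mathbf{1} - \hat{\mathbf{n}}_n\hat{\mathbf{n}}_n^{\dagger})\hat{\mathbf{n}}_n = 0$, $\mathbf{Q}^{M}(\hat{\mathbf{n}}_n) = 0$, and $\mathbf{x}_n = \mathbf{x}_n^{(0)}$]. In the variational reaction path approach, the RODS algorithm is used to determine the direction of $\hat{\mathbf{n}}_n$. The free energy of activation of Eq. [17] is generalized to

$$\Delta G^{GT,o}(T, \mathbf{x}_n^{(0)}, \hat{\mathbf{n}}_n) = V_n^{M}(\hat{\mathbf{n}}_n) - RT\ln\left[\frac{Q_C^{GT}(T, \mathbf{x}_n^{(0)}, \hat{\mathbf{n}}_n)}{K^{\ddagger,o}\,\Phi_C^{R}(T)}\right] \qquad [73]$$

where $V_n^{M}(\hat{\mathbf{n}}_n)$ is the minimum value of the local quadratic potential in Eq. [64], which can be expressed

$$V_n^{M}(\hat{\mathbf{n}}_n) = V(\mathbf{x}_n^{(0)}) - \sum_{m=1}^{F-1}\left[G_{n,m}^{E}(\hat{\mathbf{n}}_n)\right]^2\Big/ 2\lambda_{n,m}^{E}(\hat{\mathbf{n}}_n) \qquad [74]$$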
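The location and value of the minimum of the local quadratic potential in the hyperplane (Eqs. [65]–[71] and [74]) can be sketched as follows. This is a simplified illustration under stated assumptions: the rotation and translation projector $\mathbf{P}^{RT}$ is omitted (appropriate only for a model system without those degrees of freedom), in-plane modes are identified by a nonzero eigenvalue rather than by ordering, and the function name and test data are hypothetical.

```python
import numpy as np

def hyperplane_minimum(x0, V0, G, F, n_hat, tol=1e-8):
    """Minimum of the local quadratic potential in the hyperplane normal to n_hat.

    Assumes P^RT = 0 (no rotations/translations to project out).
    """
    x0 = np.asarray(x0, dtype=float)
    n_hat = np.asarray(n_hat, dtype=float)
    n_hat = n_hat / np.linalg.norm(n_hat)
    P = np.eye(len(x0)) - np.outer(n_hat, n_hat)   # I - n n^T
    GP = P @ G                                     # Eq. [65] with P^RT omitted
    FP = P @ F @ P                                 # Eq. [66] with P^RT omitted
    lam, L = np.linalg.eigh(FP)                    # diagonalize projected Hessian
    GE = L.T @ GP                                  # Eq. [67]
    Q = np.zeros_like(lam)
    in_plane = np.abs(lam) > tol                   # modes inside the hyperplane
    Q[in_plane] = -GE[in_plane] / lam[in_plane]    # Eq. [71]
    x_min = x0 + L @ Q                             # Eq. [70]
    V_min = V0 - np.sum(GE[in_plane] ** 2 / (2.0 * lam[in_plane]))  # Eq. [74]
    return x_min, V_min

# Illustrative quadratic model: gradient (1, 2, 3), Hessian diag(1, 2, 3)
G_demo = np.array([1.0, 2.0, 3.0])
F_demo = np.diag([1.0, 2.0, 3.0])
x_min, V_min = hyperplane_minimum(np.zeros(3), 0.0, G_demo, F_demo, [1.0, 0.0, 0.0])
x_same, V_same = hyperplane_minimum(np.zeros(3), 0.0, G_demo, F_demo, G_demo)
```

The second call places $\hat{\mathbf{n}}_n$ along the gradient, in which case the projected gradient vanishes and the geometry is unchanged, as stated in the text.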

Calculation of the partition function needed in the evaluation of the free energy of activation is described in the next section. Once the partition function is evaluated, the optimum value of $\Delta G^{GT,o}(T, \mathbf{x}_n^{(0)}, \hat{\mathbf{n}}_n)$ with respect to $\hat{\mathbf{n}}_n$ is obtained by applying the conjugate gradient algorithm, for which the vector of derivatives $\partial \Delta G^{GT,o}(T, \mathbf{x}_n^{(0)}, \hat{\mathbf{n}}_n)/\partial \hat{\mathbf{n}}_n$ is needed. These derivatives are obtained by finite differences. We denote the optimum value of the unit vector for a point $s$ along the variational reaction path as $\hat{\mathbf{n}}(s)$. This algorithm eliminates some instabilities of the calculated reaction path and of the generalized normal mode frequencies. At the same time, it allows a larger step size than the normal steepest-descent algorithms (refs. 34, 64).

Evaluation of Partition Functions

Calculation of the rate constant involves the ratio of partition functions for the generalized transition state and for reactants. The three degrees of freedom corresponding to translation of the center of mass of the system are the same in the reactants and transition state, and they are therefore removed in both the numerator and the denominator of Eq. [15]. The reactant partition function per unit volume for bimolecular reactions is expressed as the product of partition functions for the two reactant species and their relative translational motion

$$\Phi^{R}(T) = \Phi_{rel}^{A,B}(T)\,Q^{A}(T)\,Q^{B}(T) \qquad [75]$$

where

$$\Phi_{rel}^{A,B}(T) = \left(\frac{2\pi\mu_{rel}}{\beta h^2}\right)^{3/2} \qquad [76]$$

and $Q^{A}$ and $Q^{B}$ include contributions from internal degrees of freedom (vibrational, rotational, and electronic) for each species. For unimolecular reactions, the reactant partition function involves contributions from just one reactant species. For an atomic reactant, $Q^{A}(T)$ and $Q^{B}(T)$ have contributions only from the electronic degrees of freedom, whereas for polyatomic species, they are approximated as shown for reactant A:

$$Q^{A} = Q_{el}^{A}(T)\,Q_{vib}^{A}(T)\,Q_{rot}^{A}(T) \qquad [77]$$

In this expression, couplings among the electronic, vibrational, and rotational degrees of freedom are neglected. The calculation of partition functions for bound species is standard in many textbooks and is repeated here for completeness. The electronic partition function is given by

$$Q_{el}^{A} = \sum_{\alpha=1} d_{\alpha}^{A}\exp\left[-\beta E_{el}^{A}(\alpha)\right] \qquad [78]$$

where $\alpha$ is the index over electronic states and $d_{\alpha}^{A}$ and $E_{el}^{A}(\alpha)$ are the degeneracy and energy of electronic state $\alpha$, respectively. Note that the energy of the ground state (i.e., $\alpha = 1$) is zero. For the rotational motion of a rigid molecule, there is little loss of accuracy (not more than about 1%) if the quantum partition function is replaced by the classical one. For a linear reactant, the classical rigid-rotor partition function is given by

$$Q_{rot}^{A} = \frac{2I^{A}}{\hbar^2\beta\,\sigma_{rot}^{A}} \qquad [79]$$

where $I^{A}$ is the moment of inertia, $\sigma_{rot}^{A}$ is the rotational symmetry number, and $\hbar = h/2\pi$. If the reactant is nonlinear, the rotational partition function is approximated by

$$Q_{rot}^{A} = \frac{1}{\sigma_{rot}^{A}}\left[\left(\frac{2}{\hbar^2\beta}\right)^{3}\pi\,I_1^{A} I_2^{A} I_3^{A}\right]^{1/2} \qquad [80]$$

where $I_1^{A}$, $I_2^{A}$, and $I_3^{A}$ are the principal moments of inertia of reactant A.
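The classical translational and rotational factors of Eqs. [76], [79], and [80] can be evaluated directly. The sketch below assumes SI units and uses hypothetical function names; the moment of inertia in the example is an illustrative value of roughly the size appropriate to a light diatomic molecule, not data from the text.

```python
import math

HBAR = 1.054571817e-34     # reduced Planck constant, J s
KB = 1.380649e-23          # Boltzmann constant, J/K
H = 2.0 * math.pi * HBAR   # Planck constant, J s

def phi_rel(mu_rel, T):
    """Relative translational partition function per unit volume, Eq. [76] (m^-3)."""
    beta = 1.0 / (KB * T)
    return (2.0 * math.pi * mu_rel / (beta * H ** 2)) ** 1.5

def q_rot_linear(I, T, sigma=1):
    """Classical rigid-rotor partition function of a linear species, Eq. [79]."""
    beta = 1.0 / (KB * T)
    return 2.0 * I / (HBAR ** 2 * beta * sigma)

def q_rot_nonlinear(I1, I2, I3, T, sigma=1):
    """Classical rigid-rotor partition function of a nonlinear species, Eq. [80]."""
    beta = 1.0 / (KB * T)
    return math.sqrt((2.0 / (HBAR ** 2 * beta)) ** 3 * math.pi * I1 * I2 * I3) / sigma

# Illustrative homonuclear diatomic: I = 1.4e-46 kg m^2, symmetry number 2, T = 300 K
q_lin = q_rot_linear(1.4e-46, 300.0, sigma=2)
```

For a spherical top with three equal moments of inertia, Eq. [80] reduces to $\sqrt{\pi}$ times the 3/2 power of the unsymmetrized linear-rotor result, which gives a quick consistency check on an implementation.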

The vibrational partition function is treated quantum mechanically, and as a first approximation, it is evaluated within the harmonic approximation as

$$Q_{vib}^{A}(T) = \prod_{m=1}^{F^{A}}\sum_{n_m}\exp\left[-\beta E_{vib,m}^{A}(n_m)\right] \qquad [81]$$

where $F^{A} = 3N_A - 5$ (linear) or $F^{A} = 3N_A - 6$ (nonlinear), $N_A$ is the number of atoms in reactant A, and $E_{vib,m}^{A}(n_m)$ is the energy of the harmonic vibrational level $n_m$ in mode $m$ and is given by

$$E_{vib,m}^{A}(n_m) = \left(n_m + \frac{1}{2}\right)\hbar\omega_m^{A} \qquad [82]$$

where $\omega_m^{A}$ is the frequency of normal mode $m$ in reactant A. Anharmonic corrections to the vibrational partition functions are discussed below.

Generalized Transition State Partition Functions in Rectilinear Coordinates

Evaluation of the generalized transition state partition function $Q^{GT}$ involves contributions from the $3N - 4$ internal degrees of freedom in the dividing surface. The three degrees of freedom for overall center-of-mass translation and the motion out of the dividing surface are removed. Calculations of generalized transition state partition functions require definition of the dividing surface, which in the most general case described above is specified by a location $\mathbf{x}(s)$ along the reaction coordinate and the orientation of the planar dividing surface given by the unit normal vector $\hat{\mathbf{n}}(s)$. In this section, we describe calculations for dividing surfaces that are hyperplanes in rectilinear coordinates. Calculations for curvilinear coordinates are described in the next section.

As for reactant partition functions, we assume that the coupling among rotation, vibration, and electronic motion may be neglected, so that the generalized partition function can be written as the product of three partition functions:

$$Q^{GT}(T, s) = Q_{rot}^{GT}(T, s)\,Q_{vib}^{GT}(T, s)\,Q_{el}^{GT}(T, s) \qquad [83]$$

The electronic partition function is given by

$$Q_{el}^{GT} = \sum_{\alpha=1} d_{\alpha}^{GT}(s)\exp\left[-\beta E_{el}^{GT}(\alpha, s)\right] \qquad [84]$$

where $\alpha = 1, \ldots$ indicates the electronic state, $\alpha = 1$ denotes the ground electronic state, and $d_{\alpha}^{GT}(s)$ and $E_{el}^{GT}(\alpha, s)$ are the degeneracy and energy of electronic state $\alpha$. The electronic energies are measured relative to the energy at the local minimum in the dividing surface, with the ground state energy $E_{el}^{GT}(\alpha = 1, s) = 0$. For many molecules, it is sufficient to consider only the electronic ground state, because it is the only one that contributes significantly to the sum. Furthermore, it is usually a very good approximation to make the electronic partition function independent of $s$ in the interaction region.

Rotational partition functions are calculated for rigid rotations of the transition state complex and only require knowledge of the geometry $\mathbf{x}(s)$. As noted, classical rotational partition functions accurately approximate the quantum mechanical ones. For a linear transition state complex, the classical rotational partition function is given by

$$Q_{rot}^{GT} = \frac{2I(s)}{\hbar^2\beta\,\sigma_{rot}} \qquad [85]$$

where $I(s)$ is the moment of inertia and $\sigma_{rot}$ is the rotational symmetry number. The rotational partition function for a nonlinear transition state complex is

$$Q_{rot}^{GT}(T, s) = \frac{1}{\sigma_{rot}}\left[\left(\frac{2}{\hbar^2\beta}\right)^{3}\pi\,I_1(s)\,I_2(s)\,I_3(s)\right]^{1/2} \qquad [86]$$

where $I_1(s)$, $I_2(s)$, and $I_3(s)$ are the principal moments of inertia.

Vibrational partition functions are evaluated within the harmonic approximation

$$Q_{vib}^{GT} = \prod_{m=1}^{F-1} Q_{vib,m}^{GT}(T, s) \qquad [87]$$

Each of the $F - 1$ mode partition functions is given by

$$Q_{vib,m}^{GT} = \sum_{n_m}\exp\left[-\beta E_{vib,m}^{GT}(n_m, s)\right] \qquad [88]$$

where $E_{vib,m}^{GT}(n_m, s)$ is the energy of the harmonic vibrational level $n_m$ in mode $m$, measured relative to $V_{MEP}(s)$, and is given, analogous to Eq. [82], by

$$E_{vib,m}^{GT}(n_m, s) = \left(n_m + \frac{1}{2}\right)\hbar\omega_m(s) \qquad [89]$$

where $\omega_m(s)$ is the frequency of normal mode $m$ for the dividing surface defined by $\mathbf{x}(s)$ and $\hat{\mathbf{n}}(s)$. The sum in Eq. [88] should terminate when the lowest dissociation energy of the system is reached (ref. 30), but because, in general, the contribution from high energy levels is negligible, the sum can include all harmonic levels, and so we get an analytical expression of the type:

$$Q_{vib,m}^{GT}(T, s) = \frac{\exp\left[-\frac{1}{2}\beta\hbar\omega_m(s)\right]}{1 - \exp\left[-\beta\hbar\omega_m(s)\right]} \qquad [90]$$

The harmonic frequencies $\{\omega_1(s), \ldots, \omega_{F-1}(s)\}$ needed for the vibrational partition functions correspond to those obtained by making a quadratic expansion of the potential in the vicinity of the reaction path for motion constrained to stay on the dividing surface. Calculation of harmonic frequencies for planar dividing surfaces in rectilinear coordinates is straightforward and is described here. At stationary points, the vibrational frequencies are calculated by diagonalization of the $3N \times 3N$ Hessian matrix, $\mathbf{F}$, whose elements are the second derivatives of the potential with respect to isoinertial Cartesian coordinates scaled to a mass $\mu$. For instance, for the transition state geometry, $\mathbf{x}^{\ddagger}$, this matrix is diagonalized as in Eq. [45] to yield the eigenvalues $\lambda_m(\mathbf{x}^{\ddagger})$. The normal-mode frequencies at the saddle point can be obtained from the eigenvalues using Eq. [46].
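Given a set of harmonic frequencies, the closed form of Eq. [90] can be checked against the explicit level sum of Eqs. [88] and [89]. The sketch below assumes SI units (angular frequencies in rad/s); the function names and the two sample frequencies are illustrative assumptions.

```python
import numpy as np

HBAR = 1.054571817e-34   # reduced Planck constant, J s
KB = 1.380649e-23        # Boltzmann constant, J/K

def q_vib_mode(omega, T):
    """Closed-form harmonic partition function of one mode, Eq. [90]."""
    x = HBAR * omega / (KB * T)          # beta * hbar * omega
    return np.exp(-0.5 * x) / (1.0 - np.exp(-x))

def q_vib_mode_sum(omega, T, n_max=2000):
    """Explicit sum over harmonic levels, Eqs. [88]-[89], truncated at n_max levels."""
    x = HBAR * omega / (KB * T)
    n = np.arange(n_max)
    return np.sum(np.exp(-x * (n + 0.5)))

def q_vib(omegas, T):
    """Product over the bound modes, Eq. [87]."""
    return float(np.prod([q_vib_mode(w, T) for w in omegas]))

# Two illustrative angular frequencies (rad/s) at 300 K
omegas_demo = [3.0e14, 1.0e13]
q_total = q_vib(omegas_demo, 300.0)
```

The zero of energy here is the bottom of the vibrational well, so each factor is less than one for stiff modes and grows as the frequency drops.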

For a location $s$ along the reaction path that is off the saddle point, we want the set of vibrational frequencies $\{\omega_1(s), \ldots, \omega_{F-1}(s)\}$ for motions that are orthogonal to the unit normal $\hat{\mathbf{n}}(s)$, that is, confined to the dividing surface at $s$. Diagonalization of $\mathbf{F}[\mathbf{x}(s)]$ for locations where the gradient is not zero will yield normal modes that mix motion in the dividing surface with motion orthogonal to it. In this case, motion parallel to $\hat{\mathbf{n}}(s)$ and the six degrees of freedom corresponding to translations and rotations of the molecule can be projected out of the Hessian. In the case where the dividing surface is a hyperplane and $\hat{\mathbf{n}}(s)$ is parallel to the gradient vector, the expression for the projection matrix, $\mathbf{P}$, can be found in the article of Miller, Handy, and Adams (ref. 65). The generalization to cases where $\hat{\mathbf{n}}(s)$ is not parallel to the gradient vector is given by an expression similar to Eq. [66]

$$\mathbf{F}^{P} = \left(\mathbf{I} - \hat{\mathbf{n}}(s)\hat{\mathbf{n}}(s)^{\dagger}\right)\left(\mathbf{I} - \mathbf{P}^{RT}\right)\mathbf{F}[\mathbf{x}(s)]\left(\mathbf{I} - \mathbf{P}^{RT}\right)\left(\mathbf{I} - \hat{\mathbf{n}}(s)\hat{\mathbf{n}}(s)^{\dagger}\right) \qquad [91]$$

Now $\mathbf{F}^{P}$ can be diagonalized using the relation:

$$\left[\mathbf{L}^{GT}(s)\right]^{\dagger}\mathbf{F}^{P}(s)\,\mathbf{L}^{GT}(s) = \boldsymbol{\Lambda}(s) \qquad [92]$$

The frequencies corresponding to the resulting $m = 1, \ldots, F - 1$ eigenvalues are given by

$$\omega_m(s) = \left[\lambda_m(s)/\mu\right]^{1/2} \qquad [93]$$

with directions given by the corresponding vectors $\mathbf{L}_m^{GT}(s)$, whose phases ("signs") are discussed below Eq. [167].
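The projection and diagonalization of Eqs. [91]–[93] can be sketched as follows. The function name, the optional `P_RT` argument (taken as zero when omitted, which is appropriate only for a model system with no overall rotations or translations), and the selection of in-surface modes by a nonzero-eigenvalue threshold are assumptions made for this illustration.

```python
import numpy as np

def generalized_frequencies(F, n_hat, mu=1.0, P_RT=None, tol=1e-8):
    """Generalized normal-mode frequencies in the hyperplane normal to n_hat.

    F is the Hessian in mass-scaled coordinates; P_RT is the symmetric
    projector onto rotations and translations (zero if not supplied).
    """
    dim = F.shape[0]
    n_hat = np.asarray(n_hat, dtype=float)
    n_hat = n_hat / np.linalg.norm(n_hat)
    I = np.eye(dim)
    if P_RT is None:
        P_RT = np.zeros((dim, dim))
    left = (I - np.outer(n_hat, n_hat)) @ (I - P_RT)
    FP = left @ F @ left.T                     # Eq. [91]
    lam, L = np.linalg.eigh(FP)                # Eq. [92]
    keep = np.abs(lam) > tol                   # drop the projected-out directions
    omega = np.sqrt(lam[keep].astype(complex) / mu)   # Eq. [93]
    return omega, L[:, keep]

# Illustrative Hessian with one unbound direction along the first coordinate
omega_gt, L_gt = generalized_frequencies(np.diag([-1.0, 2.0, 5.0]), [1.0, 0.0, 0.0])
```

Because the unbound direction is projected out together with $\hat{\mathbf{n}}(s)$ in this example, both remaining frequencies are real.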

Generalized Partition Functions in Curvilinear Internal Coordinates

In the previous subsections, the dividing surfaces were hyperplanes in rectilinear coordinates; they were orthogonal to the reaction path at the point where they intersect it, and they were labeled by the location $s$ at which they intersect the reaction path. In this section, we consider more general dividing surfaces defined in terms of curvilinear coordinates such as stretch, bend, and torsion coordinates (which are called valence coordinates or valence force coordinates and which are curvilinear because they are nonlinear functions of atomic Cartesians). In general, defining the reaction path provides the value of the reaction coordinate only for points on the reaction path. Defining the dividing surface assigns a value to the reaction coordinate even when the geometry is off the reaction path because one defines the generalized transition state dividing surface so that $s$ is constant in the dividing surface; this means that defining the reaction coordinate off the reaction path is equivalent to defining the dividing surface and vice versa. Making the dividing surface curvilinear means that the expression for the flux in phase space through the dividing surface no longer matches the expression for a classical partition function (ref. 32). Therefore one should introduce an additional term $C$, in addition to the free energy of activation, in the exponent of equations like Eq. [5]. However, as we only calculate the generalized transition state partition function approximately, we do not include this term (which is expected to be small for dividing surfaces defined in terms of stretch, bend, and torsion coordinates (ref. 32)). Changing the definition of the dividing surface changes the generalized transition state partition function even if one makes the harmonic approximation for transverse coordinates because generalized normal mode frequencies computed with the constraint that $s$ is constant will also change if the definition of $s$ off the reaction path changes (refs. 66–68).


An example showing why curvilinear coordinates are more physical than rectilinear coordinates is provided by an atom–diatom reaction (A + BC → AB + C) with a collinear reaction path, where it is clearly more physical to define the reaction coordinate in terms of the AB and BC distances and the ABC bond angle than to define it as a function of the Cartesian coordinates. Displacements from the linear geometry for fixed values of $s$ produce different effects on the geometry when the reaction coordinate is defined in curvilinear coordinates, in which the bond distances stay fixed, as shown in part (a) of Scheme 1, than when it is defined in rectilinear coordinates, in which atoms move along straight-line paths in Cartesian coordinates, as shown in part (b) of Scheme 1. This effect is illustrated in Figure 2. The difference is important because the evaluation of the second derivatives of the potential with different frozen variables produces different harmonic frequencies. The above example indicates that the choice between rectilinear and curvilinear coordinates for the harmonic treatment is equivalent to choosing between two different definitions of the reaction coordinate, $s$ and $s'$, for points that are off the reaction path. These two reaction coordinates are equal for geometries on the reaction path but differ for general geometries. The relation between them is given by the expression (ref. 67):

Figure 2 Contour plot that shows the projection over the reaction coordinate of a geometry close to the MEP when curvilinear ($s'$) or rectilinear ($s$) coordinates are used.

$$s' = s + \frac{1}{2}\sum_{i=1}^{F-1}\sum_{j=1}^{F-1} b_{ij}\,q_i q_j + O(q_i^3) \qquad [94]$$

where $q_i$ represents a curvilinear coordinate that is zero on the reaction path and measures the distortion away from it; $b_{ij}$ involves second-order partial derivatives of $s'$ with respect to $q_i$ with $s$ held fixed. The Hessian elements evaluated with the two definitions are related by (refs. 66, 67)

$$\left(\frac{\partial^2 V}{\partial q_i\,\partial q_j}\right)_{s'}\Bigg|_{\mathbf{q}'=(0,\ldots,0,s')} = \left[\left(\frac{\partial^2 V}{\partial q_i\,\partial q_j}\right)_{s} - b_{ij}\left(\frac{\partial V}{\partial s}\right)_{q}\right]\Bigg|_{\mathbf{q}=(0,\ldots,0,s)} \qquad [95]$$

where $\mathbf{q}' = \{q_1, q_2, \ldots, q_{F-1}, s'\}$ and $\mathbf{q} = \{q_1, q_2, \ldots, q_{F-1}, s\}$. It is clear from the above relation that the Hessian and (therefore) the harmonic frequencies depend on the definition of the reaction coordinate except at stationary points, where $\partial V/\partial s = 0$. As the calculated vibrational frequencies of the generalized normal modes depend on the coordinate system, it is important to make the most physically appropriate choice. It has been shown that the curvilinear coordinates produce more physical harmonic frequencies than do the rectilinear coordinates (refs. 67, 68). This results because the atoms move along straight lines in rectilinear generalized normal modes (ref. 69), whereas motions along paths dictated by valence coordinates (refs. 28, 68–72) are much less strongly coupled. (Valence coordinates, also called valence force coordinates, are stretches, bends, torsions, and improper torsions.) The frequencies in the more physical curvilinear coordinates can be obtained by following a generalization of the scheme described by Pulay and Fogarasi (ref. 71), as described next.

For the $N$-atom system, the energy $V$ at a geometry (denoted by $\mathbf{x}$ in Cartesian coordinates and by $\mathbf{q}$ in internal coordinates) close to a reference geometry (denoted by $\mathbf{x}^0$ in Cartesian coordinates and by $\mathbf{q}^0$ in internal coordinates) can be obtained by a second-order Taylor expansion. In unscaled Cartesian and curvilinear coordinates, the expansions are given by

$$V = V_0 + \sum_{i=1}^{3N} G_i^{R}\,(R_i - R_i^0) + \frac{1}{2}\sum_{i,j}^{3N} F_{ij}^{R}\,(R_i - R_i^0)(R_j - R_j^0) \qquad [96]$$

and

$$V = V_0 + \sum_{i=1}^{F_{curv}} g_i\,(q_i - q_i^0) + \frac{1}{2}\sum_{i,j}^{F_{curv}} f_{ij}\,(q_i - q_i^0)(q_j - q_j^0) \qquad [97]$$

respectively, where $F_{curv}$ is the number of curvilinear coordinates that are to be used, $g_i$ is a component of the gradient in internal coordinates, and $f_{ij}$ is an element of the Hessian in curvilinear coordinates. However, three problems are related to the use of curvilinear coordinates:

(1) They are not mutually orthogonal;
(2) for more than four atoms, there are more than $3N - 6$ valence coordinates; and
(3) the transformation to Cartesian coordinates is nonlinear.

Specifically, the curvilinear coordinates can be written as a power series of the displacements in Cartesian coordinates (ref. 28):

$$q_i = \sum_{j}^{3N} B_{ij}\,(R_j - R_j^0) + \frac{1}{2}\sum_{j,k}^{3N} C_{jk}^{i}\,(R_j - R_j^0)(R_k - R_k^0) + \ldots \qquad [98]$$

where a superscript zero indicates a reference geometry (a stationary point or a point on the reaction path), $B_{ij}$ is an element of the $F_{curv} \times 3N$ Wilson $\mathbf{B}$ matrix,

$$B_{ij} = \left(\frac{\partial q_i}{\partial R_j}\right)\Bigg|_{\{R_j\}=\{R_j^0\}}, \quad i = 1, \ldots, F_{curv}; \; j = 1, \ldots, 3N \qquad [99]$$

and $C_{jk}^{i}$ is an element of the $3N \times 3N$ tensor $\mathbf{C}^{i}$ that represents the quadratic term

$$C_{jk}^{i} = \left(\frac{\partial^2 q_i}{\partial R_j\,\partial R_k}\right)\Bigg|_{\{R_k\}=\{R_k^0\}}, \quad i = 1, \ldots, F_{curv}; \; j, k = 1, \ldots, 3N \qquad [100]$$

For reactions involving more than four atoms, it is often not obvious which set of $3N - 6$ internal coordinates best describes the whole reaction path, and in those cases, it is very useful to define the reactive system in terms of redundant internal coordinates (ref. 72). Using redundant internal coordinates circumvents

(1) destroying the symmetry of the system for highly symmetric reaction paths by omitting a subset of symmetry-related coordinates and
(2) using an incomplete set of $3N - 6$ internal coordinates that does not fully span the vibrational space.

Therefore, the recommendation is that, for more than four atoms, one should always use redundant internal coordinates to evaluate the generalized normal mode frequencies.

In practice, the following procedure (refs. 68, 72) is carried out to calculate the frequencies and generalized normal mode eigenvectors in redundant internal coordinates, where nonredundant internal coordinates are simply a special case and may be used in the same manner.

First, the Wilson $\mathbf{B}$ and $\mathbf{C}$ matrices (ref. 28) must be constructed. When using redundant internal coordinates, the formulas for the Wilson $\mathbf{B}$ and $\mathbf{C}$ matrices given above are used, except that the number of internal coordinates, $F_{curv}$, is not restricted to be $3N - 6$. The formulas given above for these matrices are deceptively simple, and in practice, this is the most difficult step (ref. 68), although once computer code is available (as in POLYRATE), the code is very general, and no new issues need to be considered for further applications. Once these matrices have been constructed, the Wilson $\mathbf{G}$ matrix, called $\mathbf{G}^{W}$, is constructed as

$$\mathbf{G}^{W} = \mathbf{B}\,\mathbf{u}\,\mathbf{B}^{\dagger} \qquad [101]$$

where $\mathbf{u}$ is a $3N \times 3N$ diagonal matrix with the reciprocals of the atomic masses on the diagonal. Next, the matrix $\mathbf{G}^{W-}$ is created using

$$\mathbf{G}^{W-} = \begin{pmatrix}\mathbf{K} & \mathbf{K}'\end{pmatrix}\begin{pmatrix}\boldsymbol{\Gamma}^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0}\end{pmatrix}\begin{pmatrix}\mathbf{K}^{\dagger} \\ (\mathbf{K}')^{\dagger}\end{pmatrix} \qquad [102]$$

where $\mathbf{K}$ is defined to consist of the eigenvectors of $\mathbf{G}^{W}$ corresponding to nonzero eigenvalues, $\mathbf{K}'$ is defined to consist of the remaining eigenvectors, and $\boldsymbol{\Gamma}$ is defined to contain the nonzero eigenvalues. The generalized inverse of the Wilson $\mathbf{B}$ matrix is (ref. 71)

$$\mathbf{A} = \mathbf{u}\,\mathbf{B}^{\dagger}\,\mathbf{G}^{W-} \qquad [103]$$

Now, the construction of the gradient and force constant matrices in internal coordinates is possible:

$$\mathbf{g} = \mathbf{A}^{\dagger}\,\mathbf{G}^{R} \qquad [104]$$

$$\mathbf{f} = \mathbf{A}^{\dagger}\,\mathbf{F}^{R}\,\mathbf{A} - \sum_{i}^{F_{curv}} g_i\,\mathbf{A}^{\dagger}\,\mathbf{C}^{i}\,\mathbf{A} \qquad [105]$$
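The sequence of Eqs. [101]–[105] can be sketched with dense linear algebra as below. This is an illustration under stated assumptions rather than the POLYRATE implementation: the function names are hypothetical, `masses` holds one mass per Cartesian coordinate, and the example is a one-dimensional chain of three atoms described by three redundant distances (so the $\mathbf{C}^{i}$ tensors vanish because the coordinates are linear in the Cartesians).

```python
import numpy as np

def generalized_inverse(G, tol=1e-10):
    """Generalized inverse of a symmetric matrix from its nonzero eigenpairs, Eq. [102]."""
    gamma, vecs = np.linalg.eigh(G)
    nonzero = np.abs(gamma) > tol
    K = vecs[:, nonzero]                       # eigenvectors with nonzero eigenvalues
    return K @ np.diag(1.0 / gamma[nonzero]) @ K.T

def internal_gradient_hessian(B, C, masses, G_R, F_R):
    """Gradient g and Hessian f in (possibly redundant) internal coordinates."""
    u = np.diag(1.0 / np.asarray(masses, dtype=float))
    GW = B @ u @ B.T                           # Eq. [101]
    GW_minus = generalized_inverse(GW)         # Eq. [102]
    A = u @ B.T @ GW_minus                     # Eq. [103]
    g = A.T @ G_R                              # Eq. [104]
    f = A.T @ F_R @ A - sum(g[i] * (A.T @ C[i] @ A) for i in range(len(g)))  # Eq. [105]
    return GW, GW_minus, A, g, f

# Three atoms on a line; internal coordinates r12, r23, and the redundant r13
B = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 1.0],
              [-1.0, 0.0, 1.0]])
C = np.zeros((3, 3, 3))                        # distances are linear in x here
masses = [1.0, 12.0, 16.0]                     # illustrative values
g0 = np.array([0.3, -0.2, 0.0])                # model internal gradient
f0 = np.diag([1.5, 0.8, 0.0])                  # model internal force constants
G_R = B.T @ g0                                 # corresponding Cartesian gradient
F_R = B.T @ f0 @ B                             # corresponding Cartesian Hessian
GW, GW_minus, A, g, f = internal_gradient_hessian(B, C, masses, G_R, F_R)
```

With redundant coordinates, $\mathbf{g}$ and $\mathbf{f}$ are not unique, but transforming them back with $\mathbf{B}$ must recover the Cartesian gradient and Hessian, which is a convenient check on the generalized inverse.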

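Then, the gradient and force constant matrices needed to project out the reaction coordinate are created:

$$\mathbf{P} = \mathbf{G}^{W}\,\mathbf{G}^{W-} \qquad [106]$$

$$\tilde{\mathbf{f}} = \mathbf{P}\,\mathbf{f}\,\mathbf{P} \qquad [107]$$

$$\tilde{\mathbf{g}} = \mathbf{P}\,\mathbf{g} \qquad [108]$$

The projected Hessian $\mathbf{f}^{P}$ is given by

$$\mathbf{f}^{P} = \left\{\mathbf{1} - \mathbf{p}(s)\left[\mathbf{B}\mathbf{u}\mathbf{B}^{\dagger}\right]\right\}\tilde{\mathbf{f}}(s)\left\{\mathbf{1} - \left[\mathbf{B}\mathbf{u}\mathbf{B}^{\dagger}\right]\mathbf{p}(s)\right\} \qquad [109]$$

where $\mathbf{p}$, the nonorthogonal coordinate projection operator, is given at $s$ by

$$\mathbf{p} = \frac{\tilde{\mathbf{g}}\,\tilde{\mathbf{g}}^{\dagger}}{\tilde{\mathbf{g}}^{\dagger}\left[\mathbf{B}\mathbf{u}\mathbf{B}^{\dagger}\right]\tilde{\mathbf{g}}} \qquad [110]$$

Now it is possible to evaluate the vibrational frequencies using the Wilson GF matrix method (refs. 28, 73–75),

$$\mathbf{G}^{W}\,\mathbf{F}^{W}\,\mathbf{L}^{W} = \mathbf{L}^{W}\,\boldsymbol{\Lambda} \qquad [111]$$

where $\mathbf{G}^{W}$ is defined above, the projected Hessian $\mathbf{f}^{P}$ is used for $\mathbf{F}^{W}$, $\mathbf{L}^{W}$ is the matrix of generalized normal mode eigenvectors, and $\boldsymbol{\Lambda}$ is the diagonal eigenvalue matrix. Vibrational frequencies are given in terms of the eigenvalues by

$$\omega_m = (\Lambda_{mm})^{1/2} \qquad [112]$$

Next, the vibrational eigenvectors must be normalized. The normalized eigenvector matrix is given by

$$\hat{\mathbf{L}}^{W} = \mathbf{L}^{W}\,\mathbf{W} \qquad [113]$$

where

$$W_{ij} = \sqrt{C_{ij}}\;\delta_{ij} \qquad [114]$$

and

$$\mathbf{C} = (\mathbf{L}^{W})^{-1}\,\mathbf{G}^{W}\left[(\mathbf{L}^{W})^{-1}\right]^{\dagger} \qquad [115]$$

The Cartesian displacement normal-mode eigenvectors are

$$\boldsymbol{\chi} = \mathbf{u}\,\mathbf{B}^{\dagger}\,(\mathbf{G}^{W})^{-1}\,\hat{\mathbf{L}}^{W} = \mathbf{A}\,\hat{\mathbf{L}}^{W} = \mathbf{A}\,\mathbf{L}^{W}\,\mathbf{W} \qquad [116]$$

Finally, the elements of the rectilinear eigenvector matrix, $\mathbf{L}^{GT}$, which are needed for multidimensional tunneling calculations (see Eqs. [164], [170], and [171]), are given by

$$L_{ij}^{GT} = \frac{(m_i/\mu)^{1/2}\,\chi_{ij}}{\left[\sum_k (m_k/\mu)\,\chi_{kj}^2\right]^{1/2}} = \frac{m_i^{1/2}\,\chi_{ij}}{\left[\sum_k m_k\,\chi_{kj}^2\right]^{1/2}} \qquad [117]$$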

Loose Transition States

Although the POLYRATE program is very general, the definitions it uses for the generalized transition state dividing surfaces are most appropriate for reactions with non-negligible barriers and tight transition states. For many association–dissociation reactions, the transition state is located at a position where two fragments have nearly free internal rotation; in such cases, one may wish to use even more general definitions of the dividing surfaces (refs. 76, 77); these are not covered in the current tutorial. We note though that the methods used above have been used successfully to treat the association of hydrogen atoms with ethylene to form the ethyl radical (refs. 78–80).

In recent years, there has been tremendous progress in the treatment of barrierless association reactions with strictly loose transition states (refs. 76, 77, 81–89). A strictly loose transition state is defined as one in which the conserved vibrational modes are uncoupled from the transition modes and have the same frequencies in the variational transition state as in the associating reagents (refs. 81, 82, 84). (Conserved vibrational modes are modes that occur in both the associating fragments and the association complex, whereas transition modes include overall rotation of the complex and vibrations of the complex that transform into fragment rotations and relative translation upon dissociation of the complex.) Progress has included successively refined treatments of the definition of the dividing surface and of the definition of the reaction coordinate that is missing in the transition state (refs. 76, 77, 81–88) and elegant derivations of rate expressions for these successive improvements (refs. 85–88). The recent variational implementation of the multifaceted-dividing-surface variational-reaction-coordinate version of VTST seems to have brought the theory to a flexible enough state that it is suitable for a wide variety of practical applications to complex combustion reactions of polyatomic molecules. Although some refinements (e.g., the flexibility of pivot point placement for cylindrical molecules like O2 (ref. 88)) would still be useful, the dynamical formalism is now very well developed. However, this formalism is not included in POLYRATE, and so it is not reviewed here.

Harmonic and Anharmonic Vibrational Energy Levels

The partition functions thus far have been assumed to be calculated using the harmonic approximation. However, real vibrations contain higher-order force constants and cross terms between the harmonic normal modes, and they are coupled to rotations. If the cross terms and couplings are neglected, each of the vibrational degrees of freedom is bound by an anharmonic potential given by

$$V_m = \tfrac{1}{2} k_{mm}(s)\,Q_m^2 + k_{mmm}(s)\,Q_m^3 + k_{mmmm}(s)\,Q_m^4 + \cdots \qquad [118]$$

where $k_{mm}$, $k_{mmm}$, and $k_{mmmm}$ are the quadratic, cubic, and quartic normal coordinate force constants and $\mathbf{Q}$ is the vector of normal mode coordinates. In rectilinear coordinates, the relationship between the normal mode coordinates and the Cartesian coordinates $\mathbf{x}$ is given by

$$\mathbf{Q} = [\mathbf{L}^{GT}(s)]^{\dagger}\,[\mathbf{x} - \mathbf{x}(s)] \qquad [119]$$

where the transformation matrix is defined by the diagonalization in Eq. [92]. In curvilinear coordinates, the normal modes are defined by the Wilson GF matrix method as described above. For the harmonic approximation, the series is truncated after the first term, and the frequency $\omega_m$ is given by

$$\omega_m = \sqrt{k_{mm}/\mu} \qquad [120]$$

The partition function for the harmonic approximation is

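$$Q_{vib}^{HO} = e^{-\beta E_0^{HO}}\,\tilde{Q}_{vib}^{HO} \qquad [121]$$

$$\phantom{Q_{vib}^{HO}} = e^{-\beta E_0^{HO}} \prod_{m=1}^{F} \tilde{Q}_m^{HO} \qquad [122]$$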


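$E_0^{HO}$ is the harmonic ground-state energy, which is calculated by

$$E_0^{HO} = \frac{\hbar}{2} \sum_{m=1}^{F} \omega_m \qquad [123]$$

and the partition function of mode $m$, with energies measured from its zero-point level, is

$$\tilde{Q}_m^{HO} = \frac{1}{1 - e^{-\beta\hbar\omega_m}} \qquad [124]$$

where $\omega_m$ is the harmonic vibrational frequency of mode $m$, given by Eq. [120]. For generalized transition states, $F$ is replaced by $F - 1$ in Eqs. [122] and [123] such that the imaginary frequency is not included.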
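As an illustration of Eqs. [121]–[124], the following Python sketch evaluates the harmonic zero-point energy and partition function for a list of vibrational wavenumbers. The function name, the choice of wavenumber input in cm$^{-1}$, and the SI constants are assumptions made for this example and are not POLYRATE conventions. Each mode's factor is taken relative to its own zero-point level, so that its high-temperature limit is $kT/\hbar\omega_m$.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
KB = 1.380649e-23           # Boltzmann constant, J/K
C_CM = 2.99792458e10        # speed of light, cm/s


def harmonic_vib_partition(wavenumbers_cm, temperature):
    """Return (E0, Q_tilde, Q) for a set of real harmonic modes.

    E0      : zero-point energy in J, Eq. [123]
    Q_tilde : product of per-mode factors measured from the zero-point level
    Q       : exp(-beta * E0) * Q_tilde, Eqs. [121]-[122]
    """
    beta = 1.0 / (KB * temperature)
    # Convert wavenumbers (cm^-1) to angular frequencies (rad/s).
    omegas = [2.0 * math.pi * C_CM * w for w in wavenumbers_cm]
    e0 = 0.5 * HBAR * sum(omegas)
    q_tilde = 1.0
    for omega in omegas:
        q_tilde *= 1.0 / (1.0 - math.exp(-beta * HBAR * omega))
    return e0, q_tilde, math.exp(-beta * e0) * q_tilde
```

For a generalized transition state, one would pass only the $F - 1$ real frequencies of the bound modes, in line with the remark above about omitting the imaginary frequency.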

Hindered Internal Rotations (Torsions)

One type of anharmonic motion is a hindered internal rotation, or torsion, which can differ substantially from a harmonic normal mode motion. Unlike many other anharmonic motions, torsions can be readily accounted for even in large systems. It has been shown92–95 that a vibrational partition function that includes a torsion can be written as

$$Q_{vib} = e^{-\beta E_0}\,\tilde{Q}_{tor}\,\tilde{Q}_{sb} \qquad [125]$$

where $\tilde{Q}_{tor}$ is a torsion partition function and $\tilde{Q}_{sb}$ is the stretch-bend partition function that ignores the torsional twist angle. A simple and effective equation90 for calculating $\tilde{Q}_{tor}$ is

$$\tilde{Q}_{tor} = \tilde{Q}_m^{HO} \tanh\!\left(\frac{Q^{FR}}{Q^{I}}\right) \qquad [126]$$

where $Q^{FR}$ is the free rotor partition function given by

$$Q^{FR} = \frac{(2\pi I kT)^{1/2}}{\hbar\sigma} \qquad [127]$$

where $I$ is the effective moment of inertia and $\sigma$ is the effective symmetry number. $Q^{I}$ is called the intermediate partition function, which is the high-temperature limit of the harmonic oscillator partition function, given by

$$Q^{I} = \frac{kT}{\hbar\omega} \qquad [128]$$

where $\omega$ is the normal mode frequency relating to the torsion. The method thus far has been defined for a single well or multiple wells that are symmetrically equivalent. For multiple wells that are not symmetrically equivalent, the extended method has been defined by Chuang and Truhlar.90
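The interpolation in Eqs. [126]–[128] can be sketched as follows. This is a minimal illustration in reduced units ($\hbar = k = 1$ by default); the function name and argument order are assumptions of the example. The harmonic factor is again measured from the zero-point level, so that the formula tends to the harmonic oscillator result at low temperature and to the free rotor result at high temperature.

```python
import math


def torsion_partition(omega, inertia, sigma, temperature, hbar=1.0, kb=1.0):
    """Torsional partition function from the tanh interpolation, Eq. [126]."""
    beta = 1.0 / (kb * temperature)
    # Harmonic oscillator factor relative to the zero-point level.
    q_ho = 1.0 / (1.0 - math.exp(-beta * hbar * omega))
    # Free rotor partition function, Eq. [127].
    q_fr = math.sqrt(2.0 * math.pi * inertia * kb * temperature) / (hbar * sigma)
    # Intermediate (high-temperature harmonic) partition function, Eq. [128].
    q_i = kb * temperature / (hbar * omega)
    return q_ho * math.tanh(q_fr / q_i)
```

At low temperature the ratio $Q^{FR}/Q^{I}$ is large and the hyperbolic tangent saturates at one; at high temperature it is small, the tangent is approximately equal to its argument, and the harmonic factor cancels against $Q^{I}$.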


The frequency, the effective moment of inertia, and the barrier height $W$ are related to one another by the expression90,91

$$\omega = \left(\frac{W}{2I}\right)^{1/2} M \qquad [129]$$

where $M$ is the number of wells as the torsion rotates 360 degrees. Therefore, under the assumptions that the effective potential for the torsion is a single cosine term and that the moment of inertia is a constant, only two of the three variables need to be specified to calculate the torsion partition function. The frequency can be determined from normal mode analysis, the barrier height can be determined from electronic structure methods, and the effective moment of inertia is described next.
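Because Eq. [129] ties the three torsional parameters together, any one of them follows from the other two. The helper functions below are a small sketch of that bookkeeping; their names are hypothetical, and consistent units are assumed throughout.

```python
import math


def torsion_frequency(barrier, inertia, n_wells):
    """omega from W and I, Eq. [129]."""
    return n_wells * math.sqrt(barrier / (2.0 * inertia))


def torsion_inertia(barrier, omega, n_wells):
    """I from W and omega, obtained by inverting Eq. [129]."""
    return barrier * n_wells ** 2 / (2.0 * omega ** 2)


def torsion_barrier(inertia, omega, n_wells):
    """W from I and omega, obtained by inverting Eq. [129]."""
    return 2.0 * inertia * omega ** 2 / n_wells ** 2
```

The relation is the harmonic frequency at the bottom of a single-cosine potential $\tfrac{1}{2}W(1 - \cos M\phi)$, whose second derivative at the minimum is $WM^2/2$.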

There are several schemes for calculating the effective moment of inertia for internal rotation: the curvilinear (C) scheme of Pitzer and Gwinn,92,93 which requires the choice of an axis (typically a bond) about which the tops are rotating; the rectilinear (R) scheme of Truhlar,94 which only requires that one identify the generalized normal mode that corresponds to the torsion and divide the molecule into parts that rotate against each other; the $\omega k$ scheme of Ellingson et al.,95 which requires that one identify a torsion coordinate as well as the generalized normal mode frequency corresponding to the torsion; and the $\omega W$ scheme of Chuang and Truhlar.90 When the torsion is mixed with stretching, bending, or other torsional motions in the generalized normal modes, the user must pick the generalized normal mode that is most dominated by the torsion under consideration. It is not always clear which scheme is most correct, in part because real torsions are usually coupled strongly to overall rotation and sometimes to other vibrational modes as well. As the tops become significantly asymmetric, the R scheme begins to fail, and one should use one of the other methods.

The method of calculating the moment of inertia in the C scheme is described here. Let $M$ be the mass of the entire molecule and $m_i$ be the mass of atom $i$, and let the principal moments of inertia be defined as $I_j$, where $j = 1$, 2, or 3. All atoms in the molecule are divided into two groups rotating with respect to one another; each group is called a top, and the lighter top is taken as the rotating top. Let the coordinate system be defined such that the $z$ axis is the chosen axis of rotation and the $x$ axis is perpendicular to the $z$ axis and passes through the center of mass of the rotating top, and let the $y$ axis be perpendicular to both $x$ and $z$. At this point, there are three sets of axes: the original Cartesian axes, the principal moment of inertia axes (labeled 1, 2, or 3), and the axes for the rotating top (labeled $x$, $y$, and $z$). It is important that these sets of axes are all either right handed or left handed. The direction cosines between the axes of the top and the principal moment of inertia axis $j$ are defined as $\alpha^{jx}$, $\alpha^{jy}$, and $\alpha^{jz}$. The vector from the molecule's center of gravity to the origin of coordinates for the rotating top is given by $\mathbf{r}$, with its components $r^{1}$, $r^{2}$, and $r^{3}$ on the principal moment of inertia axes.


The moment of inertia for the rotating top about the $z$ axis is given by

$$A = \sum_i m_i (x_i^2 + y_i^2) \qquad [130]$$

where the sum is over the atoms in the rotating top and $x_i$, $y_i$, and $z_i$ (used below) refer to the location of atom $i$ on the newly created $x$, $y$, and $z$ axes, respectively. The $xz$ product of inertia is given by

$$B = \sum_i m_i x_i z_i \qquad [131]$$

The $yz$ product of inertia is given by

$$C = \sum_i m_i y_i z_i \qquad [132]$$

The off-balance factor is given by

$$U = \sum_i m_i x_i \qquad [133]$$

The reduced moment of inertia for internal rotation is given by

$$I = A - \sum_j \left\{ \frac{(\alpha^{jy} U)^2}{M} + \frac{(\beta^j)^2}{I_j} \right\} \qquad [134]$$

where

$$\beta^j = \alpha^{jz} A - \alpha^{jx} B - \alpha^{jy} C + U\left(\alpha^{j-1,y}\, r^{j+1} - \alpha^{j+1,y}\, r^{j-1}\right) \qquad [135]$$

and the superscripts refer to cyclic shifts of axes, such that $j - 1 = 3$ if $j = 1$, and $j + 1 = 1$ if $j = 3$. POLYRATE uses the value of $I$ calculated for the lighter of the two tops as the C scheme moment of inertia.
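The C scheme bookkeeping of Eqs. [130]–[135] can be sketched in plain Python as shown below. This is an illustration only: the function names and data layout are assumptions of the example, the atom coordinates are taken to be already expressed in the top's own $x$, $y$, $z$ frame, and the second direction cosine in the last term of Eq. [135] is taken to carry the $y$ index, as in the first.

```python
def top_moments(masses, coords):
    """Return A, B, C, U of Eqs. [130]-[133] for the rotating top.

    coords holds (x, y, z) tuples in the top frame (z = rotation axis).
    """
    a = sum(m * (x * x + y * y) for m, (x, y, z) in zip(masses, coords))
    b = sum(m * x * z for m, (x, y, z) in zip(masses, coords))
    c = sum(m * y * z for m, (x, y, z) in zip(masses, coords))
    u = sum(m * x for m, (x, y, z) in zip(masses, coords))
    return a, b, c, u


def reduced_moment(a, b, c, u, total_mass, principal_moments, alpha, r):
    """Reduced moment of inertia for internal rotation, Eqs. [134]-[135].

    alpha[j] = (alpha_jx, alpha_jy, alpha_jz) are the direction cosines
    between principal axis j and the top axes; r[j] are the components of r
    on the principal axes. Indices run over 0, 1, 2 with cyclic shifts.
    """
    correction = 0.0
    for j in range(3):
        jm, jp = (j - 1) % 3, (j + 1) % 3
        ax, ay, az = alpha[j]
        beta_j = (az * a - ax * b - ay * c
                  + u * (alpha[jm][1] * r[jp] - alpha[jp][1] * r[jm]))
        correction += (ay * u) ** 2 / total_mass + beta_j ** 2 / principal_moments[j]
    return a - correction
```

For a balanced top ($B = C = U = 0$) whose rotation axis coincides with principal axis 3, the expression reduces to $I = A(1 - A/I_3)$, which gives a convenient sanity check.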

The R scheme does not require that the axis of rotation be chosen a priori, but it relies on the generalized normal mode eigenvector of the mode corresponding to the torsion to determine the axis. The equations for $I$ in this scheme are given elsewhere.90,91

The $\omega k$ scheme simply takes the moment of inertia as95

$$I = \frac{1}{\omega_{torsion}^2}\,\frac{\partial^2 V}{\partial \phi^2} \qquad [136]$$

where $\omega_{torsion}$ is the frequency of the normal mode that most corresponds to the torsion, $\phi$ is the torsion angle, and the partial derivative in Eq. [136] must be supplied by the user. The partial derivative may be evaluated with other internal coordinates fixed or along a torsion path where other degrees of freedom are optimized for each value of $\phi$. The $\omega W$ scheme uses the barrier height rather than the second derivative with respect to the torsion angle.90


Morse Approximation I and Other Corrections for Principal Anharmonicity

Many other anharmonic methods can be applied, especially for smaller systems. One way to approximate the accurate anharmonic potential along a stretching vibrational coordinate is to use a Morse function:96

$$V_{M,m} = D_e(s)\left\{\exp[-\beta_{M,m}(s)\,Q_m(s)] - 1\right\}^2 \qquad [137]$$

where $D_e$ is the dissociation energy and the range parameter $\beta_{M,m}$ is chosen such that the force constant is correct at the minimum of the Morse potential:

$$\beta_{M,m}(s) = \left[\frac{k_{mm}(s)}{2 D_e(s)}\right]^{1/2} \qquad [138]$$

The energy levels for the Morse approximation I are given by

$$E_{vib,m}^{GT} = \hbar\omega_m(s)\left(n + \tfrac{1}{2}\right)\left[1 - x_{M,m}(s)\left(n + \tfrac{1}{2}\right)\right] \qquad [139]$$

where $n$ is the level index, $\omega_m$ is the harmonic frequency, and $x_{M,m}$ is the Morse anharmonicity constant:

$$x_{M,m} = \frac{\hbar\omega_m(s)}{4 D_e(s)} \qquad [140]$$

The choice of $D_e$ as the lowest dissociation energy of the system relative to $V_{MEP}(s)$ is referred to as the Morse approximation I.20,30,97
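A short sketch of Eqs. [138]–[140] is given below. The function names are hypothetical, reduced units ($\hbar = 1$ by default) are assumed, and the list of levels is cut off where the Morse level spacing would become non-positive, which is a choice made for this example rather than something prescribed in the text.

```python
import math


def morse_range_parameter(force_constant, de):
    """Range parameter beta_M of Eq. [138]."""
    return math.sqrt(force_constant / (2.0 * de))


def morse_levels(omega, de, hbar=1.0):
    """Bound Morse levels of Eq. [139], measured from the potential minimum."""
    x = hbar * omega / (4.0 * de)          # anharmonicity constant, Eq. [140]
    levels = []
    n = 0
    # The energy increases with n only while (n + 1/2) < 1 / (2 x).
    while (n + 0.5) * 2.0 * x < 1.0:
        half = n + 0.5
        levels.append(hbar * omega * half * (1.0 - x * half))
        n += 1
    return levels
```

With $\omega = 1$ and $D_e = 5$ in reduced units, the anharmonicity constant is 0.05 and the sketch returns ten bound levels, all below $D_e$.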

The Morse approximation is not appropriate for modes that have $k_{mmm} = 0$. These types of modes include bending modes of linear systems, out-of-plane bends, and certain stretching motions. Often such modes are better treated by a quadratic-quartic model, given by

$$V_m = \tfrac{1}{2} k_{mm}(s)[Q_m(s)]^2 + k_{mmmm}(s)[Q_m(s)]^4 \qquad [141]$$

Accurate approximations for this model can be determined using a perturbation-variation method.98,99

Spectroscopists call the force constants that have all indices the same the principal force constants, and the anharmonicity associated with the principal force constants is called principal anharmonicity. The Morse and quadratic-quartic approximations treat only principal anharmonicity. However, as mentioned in connection with Eq. [95], neglecting the cross terms between modes is a much more serious approximation in rectilinear coordinates than in curvilinear coordinates.70 Explicitly including cross terms in rectilinear coordinates is expensive and cumbersome because of the large number of quartic cross terms. One practical step that can be taken to minimize the importance of cross terms is to use curvilinear internal coordinates.68,72,100,101 Not only are the harmonic frequencies more physical in curvilinear coordinates, but anharmonicity is much better approximated by retaining only principal terms in the potential and neglecting couplings.

Calculations of Generalized Transition State Number of States

The generalized transition state number of states needed for microcanonical variational theory calculations counts the number of states $N_{vr}^{GT}$ in the transition state dividing surface at $s$ that are energetically accessible below an energy $E$. Consistent with approximations used in calculations of the partition functions, we assume that rotations and vibrations are separable to give

$$N_{vr}^{GT} = \sum_{\mathbf{n}} H[E - V_{MEP}(s) - E_{vib}^{GT}(\mathbf{n}, s)]\; N_{rot}^{GT}[E - V_{MEP}(s) - E_{vib}^{GT}(\mathbf{n}, s),\, s] \qquad [142]$$

where $H(x)$ is the Heaviside step function [$H(x) = 0$ for $x < 0$ and $H(x) = 1$ for $x > 0$] and the rotational number of states is calculated classically.
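The sum in Eq. [142] can be sketched by direct enumeration of harmonic vibrational states, which is practical only for a handful of modes. In this illustration the function name is hypothetical, the vibrational levels are the harmonic ones, and the classical rotational number of states is passed in as a user-supplied callable `n_rot`, since its explicit form is not given in this section.

```python
from itertools import product


def count_states(energy, v_mep, freqs, n_rot, hbar=1.0):
    """Direct-count estimate of N_vr^GT at one value of s, Eq. [142].

    freqs : harmonic frequencies of the bound modes at s
    n_rot : callable giving the rotational number of states for the
            energy left over after vibration (assumed, user supplied)
    """
    available = energy - v_mep
    zpe = 0.5 * hbar * sum(freqs)
    if available <= zpe:
        return 0.0
    # Largest number of quanta each mode could hold on its own.
    max_quanta = [int((available - zpe) // (hbar * w)) for w in freqs]
    total = 0.0
    for n in product(*(range(q + 1) for q in max_quanta)):
        e_vib = zpe + hbar * sum(ni * w for ni, w in zip(n, freqs))
        e_left = available - e_vib
        if e_left > 0.0:            # Heaviside factor
            total += n_rot(e_left)
    return total
```

Passing `lambda e: 1.0` for `n_rot` reduces the result to a pure vibrational state count, which is a simple way to check the enumeration.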

QUANTUM EFFECTS ON REACTION COORDINATE MOTION

In the previous sections, we quantized the $F - 1$ degrees of freedom in the dividing surface, but we still treated the reaction coordinate classically. As discussed, quantum effects on reaction coordinate motion, which are usually dominated by tunneling but also include nonclassical reflection, are incorporated by a multiplicative transmission coefficient $\kappa(T)$. In this section, we provide details about methods used to incorporate quantum mechanical effects on reaction coordinate motion through this multiplicative factor.

In practice, we have developed two very useful approaches to the multidimensional tunneling problem. In both of these methods, we estimate the rate constant semiclassically, in which case it involves averaging the tunneling probabilities calculated for a set of tunneling energies and tunneling paths. In a complete semiclassical theory, one would optimize the tunneling paths;102 the optimum tunneling paths minimize semiclassical imaginary action integrals, which in turn maximizes the tunneling probabilities. We have found103 that sufficiently accurate results can be obtained by a simpler criterion91 in which, for each energy, we choose the maximum tunneling probability from two approximate results, one, called small-curvature tunneling3,104 (SCT), calculated by assuming that the curvature of the reaction path is small, and the other, called large-curvature tunneling (LCT),1,3,7,91,105–110 calculated by assuming that it is large. The result is called microcanonically optimized multidimensional tunneling ($\mu$OMT) or, for short, optimized multidimensional tunneling (OMT). The resulting VTST/OMT rate constants have been carefully tested against accurate quantum dynamics,103,111,112 and the accuracy has been found to be very good.
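The selection rule behind the microcanonically optimized result is simple enough to state in code. In the fragment below, the function name and the two input lists (SCT and LCT probabilities tabulated on a common energy grid) are assumptions of the example and do not correspond to POLYRATE routines.

```python
def mu_omt_probabilities(p_sct, p_lct):
    """At each tabulated energy, keep the larger of the two tunneling probabilities."""
    if len(p_sct) != len(p_lct):
        raise ValueError("both probability tables must share one energy grid")
    return [max(a, b) for a, b in zip(p_sct, p_lct)]
```

The transmission coefficient would then be obtained by thermally averaging the selected probabilities over the energy grid.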

The SCT, LCT, and OMT tunneling calculations differ from one-dimensional models of tunneling in two key respects:

(1) These approximations include the quantized energy requirements of all vibrational modes along the tunneling path. As the vibrational frequencies are functions of the reaction coordinate, this changes the shape of the effective potential for tunneling.

(2) These approximations include corner-cutting tunneling. Corner cutting means that the tunneling path is shorter than the minimum energy path.

The wave function decays most slowly if the system tunnels where the effective barrier is lowest; however, the distance over which the decay is operative depends on the tunneling path. Therefore, the optimum tunneling paths involve a compromise between path length and effective potential along the path. As a consequence, the optimum tunneling paths occur on the concave side of the minimum energy path; i.e., they "cut the corner."7,52,102,107,113–119 For the purpose of analyzing the results, it is sometimes of interest to also compute an intermediate result, called zero-curvature tunneling (ZCT), that includes effect (1) but not (2).

The rest of this section will provide the details of the ZCT, SCT, LCT, and OMT tunneling approximations.

Multidimensional Tunneling Corrections Based on the Adiabatic Approximation

The adiabatic separation between the reaction coordinate and all other $F - 1$ vibrational degrees of freedom means that quantum states in those modes are conserved along the reaction path. With this approximation, we can label the levels of the generalized transition states in terms of the "one-dimensional" vibrationally and rotationally adiabatic potentials

$$V_a = V_{MEP}(s) + E_{int}^{GT}(\alpha, s) \qquad [143]$$

where $\alpha$ is the collection of vibrational and rotational quantum numbers and $E_{int}^{GT}(\alpha, s)$ is the vibrational-rotational energy level for quantum state $\alpha$ and generalized transition state at $s$. Making the rigid-rotor-harmonic-oscillator approximation, $E_{int}^{GT}(\alpha, s)$ for the ground rotational state reduces to the energy level for vibrational state $\mathbf{n} = \{n_1, \ldots, n_{F-1}\}$ and is given by

$$E_{vib}^{GT}(\mathbf{n}, s) = \sum_m \hbar\omega_m(s)\left(n_m + \tfrac{1}{2}\right) \qquad [144]$$


The ground-state adiabatic potential is defined with $\alpha = 0$, and only the vibrations contribute to the internal energy through zero-point energies in each mode to give

$$V_a^G = V_{MEP}(s) + \sum_m \frac{\hbar\omega_m(s)}{2} \qquad [145]$$

The transmission coefficient is written in terms of the classical and quantum probabilities, $P_C^A$ and $P_Q^A$, respectively, for transmission through or above the adiabatic potential $V_a(\alpha, s)$:54

$$\kappa^A = \frac{\int_0^\infty dE\, \exp(-\beta E) \sum_\alpha P_Q^A(\alpha, E)}{\int_0^\infty dE\, \exp(-\beta E) \sum_\alpha P_C^A(\alpha, E)} \qquad [146]$$

The probabilities for classical motion along the reaction coordinate within the adiabatic approximation are simply zero when the energy $E$ is below the maximum $V_a^A$ of the vibrationally adiabatic potential for state $\alpha$, and one for energies above the barrier; i.e.,

$$P_C^A(\alpha, E) = H[E - V_a^A(\alpha)] \qquad [147]$$

where $H$ is the Heaviside unit-step function defined below Eq. [142]. Evaluation of the quantum probabilities $P_Q^A$ is more difficult, and two approximations are made to facilitate evaluation of the numerator of the transmission coefficient.

The first approximation is that excited-state probabilities are approximated by the probabilities for the ground state, $P_Q^{AG}$, but for a shifted energy,

$$P_Q^A = P_Q^{AG}[E - V_a^A(\alpha) + V_a^{AG}] \qquad [148]$$

where $V_a^{AG}$ is the barrier height of the ground-state vibrationally adiabatic potential,

$$V_a^{AG} = V_a^A(\alpha = 0) \qquad [149]$$

This approximation assumes that the vibrationally adiabatic potentials of all excited states have the same shape as the ground-state vibrationally adiabatic potential. Although this approximation is not strictly valid, it is adequate for two reasons. First, when tunneling is important, the temperature is usually low enough that the transmission coefficient is dominated by the ground state or excited states close to the ground state. Second, contributions of tunneling to the rate constant become unimportant (i.e., $\kappa \to 1$) as $T$ becomes high enough that excited states with significantly different vibrationally adiabatic potential curves contribute more to the rate constant.


The second approximation consists in the replacement of quantum probabilities $P_Q^{AG}$ by semiclassical ones

$$P^{SAG} = \{1 + \exp[2\theta(E)]\}^{-1} \qquad [150]$$

where $\theta$ is the imaginary action integral:

$$\theta(E) = \hbar^{-1} \int_{s_<(E)}^{s_>(E)} ds\, \left\{2\mu_{eff}(s)\,[V_a^G(s) - E]\right\}^{1/2} \qquad [151]$$

where $V_a^G$ is the ground-state adiabatic potential defined in Eq. [145], and $s_<$ and $s_>$ are the classical turning points, i.e., locations where $V_a^G$ equals $E$. The effective mass $\mu_{eff}(s)$ for motion along the reaction coordinate is discussed in the next section. After these approximations, the semiclassical adiabatic ground-state transmission coefficient takes the simplified form

$$\kappa^{SAG} = \frac{\beta \int_0^\infty dE\, \exp(-\beta E)\, P^{SAG}(E)}{\exp(-\beta V_a^{AG})} \qquad [152]$$

which requires evaluation of semiclassical reaction probabilities for the ground state only.

The integrals in Eq. [146] extend to infinity, but Eqs. [150] and [151] are only valid for energies below the top of the barrier (i.e., for $E \le V_a^{AG}$), which is the tunneling region. For energies above $V_a^{AG}$, the quantum effects (nonclassical reflection) are incorporated by assuming that close to the top of the barrier the shape of the potential is parabolic, and in that case,47

$$P^{SAG}(V_a^{AG} + \Delta E) \cong 1 - P^{SAG}(V_a^{AG} - \Delta E) \qquad [153]$$

where $\Delta E = E - V_a^{AG}$. This equation provides a natural extension to Eq. [150], and therefore, the semiclassical probability in the whole range of energies is given by

$$P^{SAG} = \begin{cases} 0, & E < E_0 \\ \{1 + \exp[2\theta(E)]\}^{-1}, & E_0 \le E \le V_a^{AG} \\ 1 - P^{SAG}(2V_a^{AG} - E), & V_a^{AG} \le E \le 2V_a^{AG} - E_0 \\ 1, & 2V_a^{AG} - E_0 < E \end{cases} \qquad [154]$$

where $E_0$ is the lowest energy at which it is possible to have tunneling (also called the quantum threshold energy). For instance, for a bimolecular reaction A + B → C + D

$$E_0 = \max[V_a^G(s = -\infty),\; V_a^G(s = +\infty)] \qquad [155]$$

and for a unimolecular reaction A → B

$$E_0 = \max\left[V_a^G(s = s_R) + \tfrac{1}{2}\hbar\omega_F^R,\; V_a^G(s = s_P) + \tfrac{1}{2}\hbar\omega_F^P\right] \qquad [156]$$

where $s_R$ and $s_P$ indicate the value of $s$ at the reactant and the product minima, respectively.
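To make Eqs. [150]–[155] concrete, the sketch below evaluates the imaginary action integral, the piecewise semiclassical probability, and the transmission coefficient for a one-dimensional model. Everything specific here is an assumption of the example: the ground-state adiabatic potential is replaced by a symmetric Eckart-type function $V_0/\cosh^2(s/a)$, the effective mass is held constant, reduced units are used, and the quadrature settings are arbitrary. The model potential was chosen because its turning points are known in closed form.

```python
import math

V0, A_WIDTH, MU, HBAR = 1.0, 1.0, 1.0, 1.0   # hypothetical model parameters
E0 = 0.0                                      # Eq. [155]: the model vanishes at s = +/- infinity


def v_adiabatic(s):
    """Model stand-in for the ground-state adiabatic potential."""
    return V0 / math.cosh(s / A_WIDTH) ** 2


def theta(energy, n=200):
    """Imaginary action integral, Eq. [151], for E0 < energy <= V0."""
    s_turn = A_WIDTH * math.acosh(math.sqrt(V0 / energy))   # turning points at -s_turn, +s_turn
    du = math.pi / n
    total = 0.0
    for i in range(n):
        # Substitution s = s_turn * sin(u) removes the square-root end-point behavior.
        u = -0.5 * math.pi + (i + 0.5) * du
        s = s_turn * math.sin(u)
        gap = max(v_adiabatic(s) - energy, 0.0)
        total += math.sqrt(2.0 * MU * gap) * s_turn * math.cos(u)
    return total * du / HBAR


def p_sag(energy):
    """Semiclassical probability over the whole energy range, Eq. [154]."""
    if energy <= E0:            # at E0 itself the turning points recede to infinity
        return 0.0
    if energy <= V0:
        return 1.0 / (1.0 + math.exp(2.0 * theta(energy)))
    if energy < 2.0 * V0 - E0:
        return 1.0 - p_sag(2.0 * V0 - energy)
    return 1.0


def kappa_sag(beta, n=1000):
    """Transmission coefficient of Eq. [152] by midpoint quadrature."""
    e_hi = 2.0 * V0 - E0
    de = (e_hi - E0) / n
    integral = 0.0
    for i in range(n):
        e = E0 + (i + 0.5) * de
        integral += math.exp(-beta * (e - V0)) * p_sag(e)
    integral *= de
    tail = math.exp(-beta * (e_hi - V0)) / beta      # P = 1 above e_hi, integrated analytically
    return beta * (integral + tail)
```

In this model the transmission coefficient exceeds one at every temperature, because the reflection symmetry of Eq. [153] weights the tunneling contribution below the barrier more heavily than the nonclassical reflection above it.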

Accurate Incorporation of Classical Threshold Energies

The transmission coefficient described above is appropriate41 for correcting the adiabatic theory or equivalently20,30 the microcanonical variational theory, which can be written

$$k^{\mu VT} = \frac{1}{h\Phi^R(T)} \int_0^\infty dE\, \exp(-\beta E) \sum_\alpha P_C^A(\alpha, E) = \frac{k_B T}{h\Phi^R(T)} \sum_\alpha \exp[-\beta V_a^A(\alpha)] \qquad [157]$$

Reaction coordinate motion is treated classically in this expression and the lowest energy for reaction [i.e., at which $P_C^A(\alpha, E)$ is not zero], or the classical threshold energy, is the barrier maximum for the ground-state adiabatic potential $V_a^{AG}$. CVT has a different classical threshold energy, which can be seen by writing the CVT rate constant as

$$k^{CVT} = \frac{1}{\beta h\Phi^R(T)} \sum_\alpha \exp\{-\beta V_a[\alpha, s_*^{CVT}(T)]\} \qquad [158]$$

where $s_*^{CVT}(T)$ is the value of $s$ that minimizes the quantized generalized transition state rate constant at temperature $T$ as defined after Eq. [15] above. The classical threshold energy inherent in this expression is $V_a^G[s_*^{CVT}(T)]$ instead of $V_a^{AG}$. Using the transmission coefficient $\kappa^{SAG}$ to correct CVT instead of $\mu$VT (or the adiabatic theory) requires correction for the different classical threshold. The CVT rate constant including multidimensional tunneling (MT) in the reaction coordinate is given by

$$k^{CVT/MT} = \kappa^{CVT/MT}(T)\, k^{CVT}(T) \qquad [159]$$

where

$$\kappa^{CVT/MT} = \frac{\beta \int_0^\infty dE\, \exp(-\beta E)\, P^{SAG}(E)}{\exp\{-\beta V_a^G[s_*^{CVT}(T)]\}} \qquad [160]$$

Similarly, corrections are needed for other theories that have inherent classical thresholds different from $V_a^{AG}$, such as conventional TST, in which $V_a^{AG}$ in Eq. [152] is replaced by

$$V_a^{\ddagger G} = V_a^G(s = 0) \qquad [161]$$

Some of the variables explained here are shown in Figure 3 for more clarity. POLYRATE actually calculates the transmission coefficient as

$$\kappa^{MT} = \frac{\beta \int_0^\infty dE\, \exp(-\beta E)\, P^{SAG}(E)}{\exp(-\beta V_a^{AG})} \qquad [162]$$

where $V_a^{AG}$ is $V_a^G(s = s^{AG})$. Then, instead of using Eq. [159], one uses Eq. [162] but first multiplies the CVT rate by

$$\kappa^{CVT/CAG}(T) = \exp\left\{\beta\left[V_a^G(s_*^{CVT}(T)) - V_a^{AG}\right]\right\} \qquad [163]$$

In early papers we were careful to distinguish $\kappa^{CVT/MT}$ from $\kappa^{MT}$, but in recent papers, we often call both of these quantities $\kappa^{MT}$ and let the reader figure out which one is involved from the context.
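As a consistency check on the two bookkeeping conventions, dividing Eq. [160] by Eq. [162] recovers the factor of Eq. [163]. The short derivation below uses only quantities defined above:

```latex
\frac{\kappa^{CVT/MT}(T)}{\kappa^{MT}(T)}
  = \frac{\exp(-\beta V_a^{AG})}{\exp\{-\beta V_a^{G}[s_*^{CVT}(T)]\}}
  = \exp\{\beta[V_a^{G}(s_*^{CVT}(T)) - V_a^{AG}]\}
  = \kappa^{CVT/CAG}(T),
\qquad\text{so that}\qquad
k^{CVT/MT}(T) = \kappa^{MT}(T)\,\kappa^{CVT/CAG}(T)\,k^{CVT}(T).
```

Because $V_a^{AG}$ is the maximum of $V_a^G(s)$, the exponent is never positive and $\kappa^{CVT/CAG}(T)$ cannot exceed one.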

Figure 3 Graphic illustration of some important quantities that often appear in variational transition state theory. The transition state is indicated by the ‡ symbol.


Zero-Curvature and Small-Curvature Multidimensional Tunneling

From the relation between Eq. [150] and Eq. [151], at equal barrier heights, tunneling effects are more important if the particle has a small mass or if the barrier is narrower. This is the reason why tunneling is important when a light particle (for instance, a proton) is being transferred between donor and acceptor. The width at the top of the barrier in $V_{MEP}$ is determined by the magnitude of the imaginary frequency at the transition state, and it is sometimes assumed that a large imaginary frequency indicates a narrower barrier and, as a consequence, more tunneling. However, $V_{MEP}(s)$ is not the effective barrier for tunneling; as described above, the adiabatic barrier should be used. Complete description of the adiabatic tunneling probabilities requires definition of the effective mass in Eq. [151], which we discuss next.

The adiabatic prescription presented above may appear to be a one-dimensional approach, because the adiabatic potential is a function of the reaction coordinate $s$ only. However, the reaction path is a curvilinear coordinate and the curvature of the path couples motion along the reaction coordinate to local vibrational modes that are perpendicular to it. The coupling enters into the Hamiltonian for the system through the kinetic energy term and leads to a negative internal centrifugal effect that moves the tunneling path to the concave side of the reaction path. In other words, as also concluded above from a different perspective, the coupling causes the system to "cut the corner" and tunnel through a shorter path than the reaction coordinate.7,52,102,107,113–119 The effect of the coupling is to shorten the tunneling path (relative to the reaction path), decreasing the tunneling integral in Eq. [151] and thereby increasing the tunneling probabilities. Neglecting the coupling in evaluating the tunneling is known as the zero-curvature tunneling (ZCT) approximation. In this case, the tunneling path is the reaction path and the effective mass simplifies to $\mu_{eff}(s) = \mu$. The ZCT method has the drawback that tunneling is usually seriously underestimated.54

Marcus and Coltrin115 showed that the effect of the reaction path curvature was to give an optimum tunneling path for the collinear H + H$_2$ reaction that is the path of concave-side turning points for the stretch vibration orthogonal to the reaction coordinate. If we define $d\xi$ as the arc length along this new tunneling path, then the effective mass in Eq. [151] is given in terms of the $s$-dependent Jacobian factor $d\xi/ds$ by $\mu_{eff} = \mu\,(d\xi/ds)^2$. The small-curvature tunneling (SCT) method was developed to extend this approach to three-dimensional polyatomic reactions and to eliminate problems with the Jacobian becoming unphysical.1,116 In this approach, an approximate expression for $d\xi/ds$ is written in terms of the curvature components coupling the reaction path to the vibrational modes and the vibrational turning points.3,116

The coupling between the reaction coordinate and a mode $m$ perpendicular to it is given by a curvature component defined by65

$$B_{mF} = -[\mathrm{sign}(s)] \sum_{i=1}^{3N} \frac{d\hat{n}_i(s)}{ds}\, L_{i,m}^{GT}(s) \qquad [164]$$

where $\hat{n}_i$ is component $i$ of the unit vector perpendicular to the generalized transition-state dividing surface at $s$ and $L_{i,m}^{GT}$ is component $i$ of the eigenvector for vibrational mode $m$ perpendicular to $\hat{\mathbf{n}}$ at $s$. If the reaction path is the MEP, then

$$\hat{\mathbf{n}} = \mathbf{v}(s) \qquad [165]$$

where $\mathbf{v}(s)$ is the unit vector tangent to the MEP at $s$ as defined in Eq. [44]. If the reaction path is the VRP, then $\hat{\mathbf{n}}(s)$ is defined by the procedure in the subsection "Variational Reaction Path Algorithm". Note that in either case, the sign of the unit vector is chosen to be opposite or approximately opposite the gradient vector. The modulus of these $F - 1$ couplings corresponds to the curvature along the reaction path:

$$\kappa = \left\{\sum_{m=1}^{F-1} [B_{mF}(s)]^2\right\}^{1/2} \qquad [166]$$

To evaluate the turning points we make the independent normal mode approximation, where the potential $V_m(s, Q_m)$ in mode $m$ at $s$ along the reaction coordinate is given by Eq. [118]. The turning point for vibrational state $n_m$ in this mode is obtained by solving the equation:

$$V_m[s, Q_m = t_m(n_m, s)] = E_{vib,m}^{GT}(n_m, s) \qquad [167]$$

The sign of $B_{mF}$ depends on the phase assigned to the vector $\mathbf{L}_m^{GT}$. This is not an issue for harmonic calculations because in such calculations it always enters quadratically. However, for calculations of anharmonic turning points, as in Eq. [167], we must make the physical choice. With the sign of $\hat{\mathbf{n}}$ chosen as stated after Eq. [165], we choose the turning point so that $B_{mF} Q_m < 0$, which ensures that the turning point is on the concave side.

In the harmonic approximation, the vibrational turning point of mode $m$ is given by the expression

$$t_m(n_m, s) = \left[\frac{(2n_m + 1)\hbar}{\mu\,\omega_m(s)}\right]^{1/2} \qquad [168]$$

The latest version of the SCT method is limited to treatment of tunneling for the ground vibrational state with harmonic treatment of vibrations. In this case, we use the shorthand notation $t_m(n_m = 0, s) = t_m(s)$ for the ground-state turning points.

In the original SCT method, we assumed that all modes were extended to their turning points along the tunneling path, and this led to unphysically large tunneling correction factors for reactions with many vibrational modes coupled to the reaction coordinate motion. The final version of SCT, called the centrifugal-dominant small-curvature approximation in the original publication,3 assumes that the corner cutting occurs in the direction along the vector of coupling components $\mathbf{B}_F(s)$ in the space of the local vibrational coordinates $\mathbf{Q}$. We make a local rotation of the vibrational axes so that $\mathbf{B}_F(s)$ lies along one of the axes, $u_1$, and by construction, the curvature couplings in all other vibrational coordinates, $u_i$, $i = 2$ to $F - 1$, are zero in this coordinate system. The effective harmonic potential for the $u_1$ vibrational mode is written as

$$V = V_{MEP}(s) + \tfrac{1}{2}\mu[\bar{\omega}(s)]^2 u_1^2 \qquad [169]$$

where the harmonic frequency for this motion is given by

$$\bar{\omega} = \left(\sum_{m=1}^{F-1} \left[\frac{B_{m,F}(s)}{\kappa(s)}\,\omega_m(s)\right]^2\right)^{1/2} \qquad [170]$$

The turning point $\bar{t}$ for zero-point motion in this harmonic potential takes the form

$$\bar{t} = \left[\frac{\hbar}{\mu\,\bar{\omega}(s)}\right]^{1/2} = \left(\sum_{m=1}^{F-1} \left[\frac{B_{m,F}(s)}{\kappa(s)}\right]^2 [t_m(s)]^{-4}\right)^{-1/4} \qquad [171]$$

The Jacobian factor $d\xi/ds$ for the path defined by these turning points is expressed in terms of the curvature and turning points by

$$d\xi/ds = \left\{[1 - \bar{a}(s)]^2 + (d\bar{t}/ds)^2\right\}^{1/2} \qquad [172]$$

where

$$\bar{a} = |\kappa(s)\,\bar{t}(s)| \qquad [173]$$

This expression has a singularity when the turning point is equal to the radius of curvature and is unphysical for values that are larger; i.e., $\bar{t} \ge 1/\kappa$. The problem can be solved by using an exponential form, in which case the effective mass for the SCT method is written as

$$\mu_{eff}^{SC}/\mu = \min\left\{\exp\left\{-2\bar{a}(s) - [\bar{a}(s)]^2 + (d\bar{t}/ds)^2\right\},\; 1\right\} \qquad [174]$$

From the above expression, it is clear that $\mu_{eff}^{SC} \le \mu$, and therefore, the transmission coefficients obtained by the small-curvature approximation are always equal to or larger than the zero-curvature transmission factors. As shown, if the curvature along the reaction path is small or intermediate, it is possible to treat tunneling, without explicit evaluation of the tunneling path, by using an effective mass, which is a function of the reaction path curvature.
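The chain from the curvature components to the SCT effective mass, Eqs. [166] and [170]–[174], can be sketched at a single point on the reaction path as follows. The function names are hypothetical, reduced units are the default, and the derivative $d\bar{t}/ds$ is taken as an input (in practice it would come from finite differences along the path). Returning a ratio of one when the curvature vanishes is an assumption of the example that corresponds to the ZCT limit.

```python
import math


def curvature_and_turning_point(b_mf, omegas, mu=1.0, hbar=1.0):
    """Return (kappa, t_bar) from the curvature components and mode frequencies."""
    kappa = math.sqrt(sum(b * b for b in b_mf))                     # Eq. [166]
    if kappa == 0.0:
        return 0.0, None
    omega_bar = math.sqrt(sum((b / kappa * w) ** 2
                              for b, w in zip(b_mf, omegas)))       # Eq. [170]
    t_bar = math.sqrt(hbar / (mu * omega_bar))                      # Eq. [171]
    return kappa, t_bar


def sct_effective_mass_ratio(b_mf, omegas, dtbar_ds, mu=1.0, hbar=1.0):
    """Ratio mu_eff^SC / mu of Eq. [174] at one value of s."""
    kappa, t_bar = curvature_and_turning_point(b_mf, omegas, mu, hbar)
    if kappa == 0.0:
        return 1.0                       # no curvature coupling: ZCT limit (assumed)
    a_bar = abs(kappa * t_bar)                                      # Eq. [173]
    exponent = -2.0 * a_bar - a_bar ** 2 + dtbar_ds ** 2
    if exponent >= 0.0:
        return 1.0                       # the min{..., 1} cap, also avoids overflow
    return math.exp(exponent)
```

The returned ratio multiplies $\mu$ inside the action integral of Eq. [151], so a value below one lowers the action and raises the tunneling probability relative to ZCT.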

Large Curvature Transmission Coefficient

The SCT method is appropriate for use in reactions with small reaction path curvature. For systems with intermediate to large curvature, the large-curvature tunneling methods1,3,7,91,105–110,120,121 have been developed; they build on the adiabatic approach, but they go beyond it to include important features affecting tunneling in large-curvature systems. The first important feature is that the tunneling paths are straight-line paths that connect the reactant and the product valleys of the reaction. A straight-line path is the shortest possible path between turning points in the reactant and product valleys, but the effective potential along this path is no longer the adiabatic potential and it can have a maximum that is larger than the adiabatic barrier maximum. Shortening the path decreases the tunneling integral, thus increasing the tunneling probability, while increasing the potential does the opposite. The optimal tunneling paths for large-curvature systems are often straight-line paths because the effect of shortening the tunneling path dominates for these systems. The second important feature is nonadiabatic tunneling, which is the possibility of tunneling into excited states for exoergic reactions or the possibility of tunneling from excited states for endoergic reactions. Finally, the straight-line tunneling paths go through regions of the PES that are far from the MEP. We call this region the reaction swath. In this section, we start by describing the large-curvature tunneling method for systems dominated by tunneling from/to the ground vibrational states of reactants/products. We then describe how vibrationally excited states are included in the calculations and the general procedure to evaluate the LCG4 tunneling probabilities.110 Finally, we describe how to carry out these calculations by sampling the reaction swath efficiently.120,121

At this point, it may be helpful to make some comments about how excited states enter the tunneling calculations. First consider the zero-curvature approximation. Here both the transverse vibrational and the rotational quantum numbers are conserved in the tunneling region, and the process is vibrationally adiabatic.54 Next consider the small-curvature approximation. This is not really adiabatic because the tunneling path is affected by reaction path curvature, which is a manifestation of coupling to transverse modes.116 Nevertheless, when we calculate a ground-state-to-ground-state process by the SCT approximation, we do not actually assume that the reactants and products are in the ground states.122 What we assume is that the system tunnels in the ground level of the quantized transition state.123 Outside the tunneling region, the transverse modes may be vibrationally adiabatic, but they are probably vibrationally nonadiabatic whenever there are low-frequency modes; in addition, the process is probably usually rotationally nonadiabatic.122,123 But in the dynamical bottleneck region where tunneling occurs, the transverse modes conserve their quantum numbers, or at least they are assumed to do so.

Next consider the large-curvature approximation. Here one cannot even assume that the transverse quantum numbers of high-frequency modes are conserved during the tunneling process itself.124 One cannot describe the wave function in the strong-interaction region, where tunneling occurs, in terms of asymptotic or adiabatic modes; instead one uses a diabatic representation in which all nonadiabaticity is associated with a single diabatic mode, which correlates with more than one asymptotic mode of the product. This yields a recipe for calculating a realistic tunneling probability. To explain the algorithm, we will first consider the case where all quantum numbers are conserved, even for this one diabatic mode; this case is treated in the subsection below. Then we consider the case where tunneling proceeds in part into vibrationally excited levels of the product.

Large-Curvature Tunneling Without Vibrational Excitations

As stated, the large-curvature tunneling (LCT) methods use the ground-state vibrationally adiabatic potential to define classical reaction-coordinate turning points for a total energy $E$ by inverting the equation

$$V_a^G(s_i) = E, \quad i = 0, 1 \qquad [175]$$

to obtain $s_0(E)$ and $s_1(E)$, which are the turning points in the reactant and product valleys, respectively. One major departure from the adiabatic theory is that tunneling at total energy $E$ is not initiated just from the reactant classical turning point at $s_0(E)$, but it occurs all along the entrance channel up to the turning point. Another departure is that tunneling occurs along straight-line tunneling paths connecting the reactant and product valleys, rather than the curvilinear path defined by the reaction path, vibrational turning points, and curvature couplings. Finally, tunneling is assumed to be initiated by vibrational motions perpendicular to the reaction coordinate rather than motion along the reaction coordinate.
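Inverting Eq. [175] amounts to a one-dimensional root search on each side of the barrier. The bisection routine below is a generic sketch: the function name, the bracketing intervals, and the model potential in the usage note are all assumptions of the example, and the adiabatic potential is assumed to cross the energy $E$ exactly once inside each bracket.

```python
import math


def find_turning_point(v, energy, s_a, s_b, tol=1e-10):
    """Solve v(s) = energy by bisection on [s_a, s_b], as in Eq. [175]."""
    f_a = v(s_a) - energy
    for _ in range(200):
        mid = 0.5 * (s_a + s_b)
        f_mid = v(mid) - energy
        if f_a * f_mid <= 0.0:
            s_b = mid                    # the root lies in the lower half
        else:
            s_a, f_a = mid, f_mid        # the root lies in the upper half
        if abs(s_b - s_a) < tol:
            break
    return 0.5 * (s_a + s_b)
```

For a model barrier `v = lambda s: 1.0 / math.cosh(s) ** 2` with its maximum at $s = 0$, calling the routine on a bracket to the left of the maximum gives $s_0(E)$ and on a bracket to the right gives $s_1(E)$.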

The end points of the tunneling paths in the reactant and product valleys are defined as $\tilde{s}_0$ and $\tilde{s}_1$, and they obey the resonance condition

$$V_a^G(\tilde{s}_0) = V_a^G(\tilde{s}_1) \qquad [176]$$

This expression provides a relationship between $\tilde{s}_0$ and $\tilde{s}_1$ so that either one or the other is an independent variable. Unless stated otherwise, we use $\tilde{s}_0$ as the independent variable, and when $\tilde{s}_1$ appears, its dependence on $\tilde{s}_0$ is implicit. The tunneling path is a straight-line path in mass-scaled Cartesian coordinates defined by

$$\mathbf{x}(\xi, \tilde{s}_0) = \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0) + \xi\, \hat{\boldsymbol{\eta}}(\tilde{s}_0) \qquad [177]$$


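where $\xi$ denotes the progress variable along the linear path. The unit vector along the tunneling path is defined by

$$\hat{\boldsymbol{\eta}}(\tilde{s}_0) = \frac{\mathbf{x}^{\mathrm{RP}}(\tilde{s}_1) - \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0)}{\xi_P} \qquad [178]$$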

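where $\mathbf{x}^{\mathrm{RP}}(\tilde{s}_0)$ and $\mathbf{x}^{\mathrm{RP}}(\tilde{s}_1)$ are mass-weighted Cartesian coordinates at the termini of the tunneling path, which lie on the reaction path at $\tilde{s}_0$ and $\tilde{s}_1$, respectively, and $\xi_P$ is the length of the path

$$\xi_P = \left| \mathbf{x}^{\mathrm{RP}}(\tilde{s}_1) - \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0) \right| \qquad [179]$$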

so that $\xi$ equals the distance from $\mathbf{x}^{\mathrm{RP}}(\tilde{s}_0)$ along the path. For simplicity of notation, we do not explicitly show the dependence of $\xi_P$ on $\tilde{s}_0$. To avoid confusion with coordinates along the straight-line tunneling paths, $\mathbf{x}(\xi, \tilde{s}_0)$, we use the notation $\mathbf{x}^{\mathrm{RP}}(s)$ to denote mass-weighted Cartesian coordinates along the reaction path. The reaction path can be either the MEP or the variational reaction path.
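The construction in Eqs. [177]–[179] can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `straight_line_path`, the list-based vectors, and the uniform spacing in $\xi$ are assumptions made here, and the two terminal geometries are taken as already known (in practice they follow from the resonance condition of Eq. [176]).

```python
import math

def straight_line_path(x_start, x_end, n_points):
    """Straight-line tunneling path between two mass-scaled geometries.

    x_start, x_end: geometries on the reaction path at the two termini
    (hypothetical inputs). Returns the path length xi_P (Eq. [179]),
    the unit vector eta_hat (Eq. [178]), and n_points geometries
    x(xi) = x_start + xi * eta_hat (Eq. [177]) on a uniform xi grid.
    """
    diff = [b - a for a, b in zip(x_start, x_end)]
    xi_p = math.sqrt(sum(d * d for d in diff))
    eta_hat = [d / xi_p for d in diff]
    points = []
    for k in range(n_points):
        xi = xi_p * k / (n_points - 1)
        points.append([a + xi * e for a, e in zip(x_start, eta_hat)])
    return xi_p, eta_hat, points

# Toy two-dimensional example with made-up coordinates.
xi_p, eta_hat, points = straight_line_path([0.0, 0.0], [3.0, 4.0], 6)
```

Because the path is parameterized by the distance $\xi$ from the reactant-side terminus, the last generated point coincides with the product-side terminus.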

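The total tunneling amplitude along the incoming trajectory at energy $E$ includes contributions from all tunneling paths initiated in the reactant valley

$$T_0(E) = \int_{-\infty}^{s_0(E)} d\tilde{s}_0\, v_R^{-1}(E, \tilde{s}_0)\, \tau^{-1}(\tilde{s}_0)\, T_{\mathrm{tun}}(\tilde{s}_0)\, \left| \sin\chi[\tilde{s}_0, \hat{\boldsymbol{\eta}}(\tilde{s}_0)] \right| \qquad [180]$$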

The tunneling amplitude $T_{\mathrm{tun}}(\tilde{s}_0)$ is weighted by the classical probability density $d\tilde{s}_0 / v_R(E, \tilde{s}_0)$, which is proportional to the time spent between $\tilde{s}_0$ and $\tilde{s}_0 + d\tilde{s}_0$, by the number of collisions per unit time with the vibrational turning point in the tunneling direction, $\tau^{-1}(\tilde{s}_0)$, and by the sine of the angle $\chi[\tilde{s}_0, \hat{\boldsymbol{\eta}}(\tilde{s}_0)]$ between the vector tangent to the reaction path at $\tilde{s}_0$ and $\hat{\boldsymbol{\eta}}(\tilde{s}_0)$, which is a measure of how effectively the perpendicular vibrations initiate motion along the tunneling path. Tunneling can occur during the incoming and outgoing trajectory, so the total tunneling amplitude should be $2T_0(E)$. However, to enforce microscopic reversibility, the total tunneling amplitude is given by

$$T(E) = T_0(E) + T_1(E) \qquad [181]$$

where $T_1(E)$ is the tunneling amplitude for the outgoing trajectory in the product channel. The expression for $T_1(E)$ is similar to Eq. [180] except that we use $\tilde{s}_1$ as the independent variable instead of $\tilde{s}_0$ and the quantities $v_R(E, \tilde{s}_1)$, $\tau^{-1}(\tilde{s}_1)$, and $\chi[\tilde{s}_1, \hat{\boldsymbol{\eta}}(\tilde{s}_1)]$ are evaluated at locations along the reaction path in the product channel. The integrals in Eq. [180] and the analogous equation for $T_1(E)$ extend out to $s = \pm\infty$, but quantities along the reaction coordinate needed to evaluate the integrand are available on a grid that extends to finite values of $s$. Calculations of the tunneling amplitudes need to be converged with respect to the limits of the grid.


The local velocity for a point $\tilde{s}_i$ in the reactant channel ($i = 0$) or product channel ($i = 1$) is given by

$$v_R(E, \tilde{s}_i) = \left\{ \frac{2}{\mu} \left[ E - V_a^G(\tilde{s}_i) \right] \right\}^{1/2}, \quad i = 0, 1 \qquad [182]$$

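The general expression for the angle $\chi[s, \hat{\boldsymbol{\eta}}(\tilde{s}_i)]$ between the unit vector $\hat{\boldsymbol{\eta}}(\tilde{s}_i)$ and the unit vector tangent to the reaction path at $s$ is

$$\cos\chi[s, \hat{\boldsymbol{\eta}}(\tilde{s}_i)] = \hat{\boldsymbol{\eta}}(\tilde{s}_i) \cdot \frac{d\mathbf{x}^{\mathrm{RP}}/ds}{\left| d\mathbf{x}^{\mathrm{RP}}/ds \right|}, \quad i = 0, 1 \qquad [183]$$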

where $\chi[\tilde{s}_i, \hat{\boldsymbol{\eta}}(\tilde{s}_i)]$, which is needed in the expressions for $T_0(E)$ and $T_1(E)$, is obtained by evaluating this expression at $s = \tilde{s}_i$. The vibrational period $\tau(\tilde{s}_i)$ is evaluated for the effective vibrational potential along the tunneling path. This effective potential is obtained by projecting the tunneling path onto the ($F - 1$) vibrational modes perpendicular to the reaction path at $\tilde{s}_0$ and computing the potential along this projected straight-line path. In the harmonic approximation, the vibrational period reduces to

$$\tau(\tilde{s}_i) = \frac{2\pi}{\omega_\perp(\tilde{s}_i)}, \quad i = 0, 1 \qquad [184]$$

where the harmonic frequency is expressed as

$$\omega_\perp(\tilde{s}_i) = \left\{ \sum_{m=1}^{F-1} \left[ \omega_m(\tilde{s}_i)\, q_m(\tilde{s}_i) \right]^2 \right\}^{1/2}, \quad i = 0, 1 \qquad [185]$$

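and the components of the unit vector along the projected path are given by

$$q_m(\tilde{s}_i) = \frac{\hat{\boldsymbol{\eta}}(\tilde{s}_i) \cdot \mathbf{L}_m^{\mathrm{GT}}(\tilde{s}_i)}{\left\{ \sum_{m'=1}^{F-1} \left[ \hat{\boldsymbol{\eta}}(\tilde{s}_i) \cdot \mathbf{L}_{m'}^{\mathrm{GT}}(\tilde{s}_i) \right]^2 \right\}^{1/2}}, \quad i = 0, 1 \qquad [186]$$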

where the eigenvectors are defined in Eq. [92]. Again, the sign of $q_m$ depends on the "sign" of the vector $\mathbf{L}_m^{\mathrm{GT}}$, but it is not an issue because we use the harmonic approximation.
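The projection in Eqs. [184]–[186] can be illustrated with a short Python sketch. The function name `projected_frequency`, the plain-list representation of the generalized normal-mode vectors, and the toy numbers are assumptions introduced here for illustration; they are not taken from any particular program.

```python
import math

def projected_frequency(eta_hat, modes, freqs):
    """Harmonic frequency and period along the projected tunneling direction.

    eta_hat: unit vector along the tunneling path.
    modes:   list of F-1 generalized normal-mode vectors L_m (hypothetical data).
    freqs:   list of the corresponding frequencies omega_m.
    """
    overlaps = [sum(e * l for e, l in zip(eta_hat, mode)) for mode in modes]
    norm = math.sqrt(sum(o * o for o in overlaps))
    q = [o / norm for o in overlaps]                                     # Eq. [186]
    omega_perp = math.sqrt(sum((w * qm) ** 2 for w, qm in zip(freqs, q)))  # Eq. [185]
    tau = 2.0 * math.pi / omega_perp                                     # Eq. [184]
    return q, omega_perp, tau

# Toy example: the reaction-path tangent is the z axis, the two bound modes are x and y.
q, omega_perp, tau = projected_frequency(
    [0.36, 0.48, 0.8], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 2.0]
)
```

Because each $q_m$ enters Eq. [185] squared, flipping the sign of any mode vector leaves `omega_perp` unchanged, which is the point made in the text about the sign of $\mathbf{L}_m^{\mathrm{GT}}$.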

The tunneling amplitude for each straight-line path is approximated using a primitive semiclassical expression

$$T_{\mathrm{tun}}(\tilde{s}_0) = T_{\mathrm{tun}}(\tilde{s}_1) = \exp[-\theta(\tilde{s}_0)], \quad i = 0, 1 \qquad [187]$$


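where the action integral along the linear path is

$$\begin{aligned} \theta(\tilde{s}_0) = \frac{(2\mu)^{1/2}}{\hbar} \Bigg( &\int_0^{\xi_{\mathrm{I}}} d\xi\, \left\{ V_a^G[s_{\mathrm{I}}(\xi, \tilde{s}_0)] - V_a^G(\tilde{s}_0) \right\}^{1/2} \cos\chi[s_{\mathrm{I}}(\xi, \tilde{s}_0), \hat{\boldsymbol{\eta}}(\tilde{s}_0)] \\ &+ \int_{\xi_{\mathrm{I}}}^{\xi_{\mathrm{III}}} d\xi\, \left[ V_{\mathrm{eff}}^{\mathrm{II}}(\xi, \tilde{s}_0) - V_a^G(\tilde{s}_0) \right]^{1/2} \\ &+ \int_{\xi_{\mathrm{III}}}^{\xi_P} d\xi\, \left\{ V_a^G[s_{\mathrm{III}}(\xi, \tilde{s}_0)] - V_a^G(\tilde{s}_0) \right\}^{1/2} \cos\chi[s_{\mathrm{III}}(\xi, \tilde{s}_0), \hat{\boldsymbol{\eta}}(\tilde{s}_0)] \Bigg) \end{aligned} \qquad [188]$$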

where for simplicity the dependence of the integration limits on $\tilde{s}_0$ is not explicitly shown. The intervals $[0, \xi_{\mathrm{I}}]$ and $[\xi_{\mathrm{III}}, \xi_P]$ along the tunneling path indicate the reactants region (labeled as I) and the products region (labeled as III), respectively. Regions I and III are called adiabatic because contributions to the action integral can be constructed from the information along the reaction path and the adiabatic potential. In these adiabatic regions, the system tunnels through the adiabatic barrier and the tunneling direction is along the reaction coordinate. Therefore, the contribution to the action integral in these regions is weighted by projections of the tunneling path along the reaction path, which are given by the $\cos\chi$ factors. In the nonadiabatic region $[\xi_{\mathrm{I}}, \xi_{\mathrm{III}}]$, the tunneling is along the straight-line tunneling path and uses an effective potential, which is described below, in calculation of the contribution from this region to the action integral.
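A minimal numerical sketch of the three-region action integral of Eq. [188] is given here, assuming that the potential differences and projection factors have already been tabulated on a grid in $\xi$. The function name `action_integral`, the trapezoid rule, and the convention that grid points on the boundaries are assigned to region II are all choices made for this illustration.

```python
import math

def action_integral(xi, v_minus_e, cos_chi, xi_i, xi_iii, mu, hbar=1.0):
    """Trapezoid-rule estimate of the action integral of Eq. [188].

    xi:        ascending grid from 0 to xi_P.
    v_minus_e: potential minus V_a^G(s0~) at each grid point (adiabatic
               potential in regions I and III, effective potential in II).
    cos_chi:   projection factors; only used in regions I and III.
    xi_i, xi_iii: boundaries of the nonadiabatic region II.
    """
    def integrand(k):
        root = math.sqrt(max(v_minus_e[k], 0.0))
        in_region_ii = xi_i <= xi[k] <= xi_iii
        return root if in_region_ii else root * cos_chi[k]

    total = 0.0
    for k in range(len(xi) - 1):
        total += 0.5 * (integrand(k) + integrand(k + 1)) * (xi[k + 1] - xi[k])
    return math.sqrt(2.0 * mu) / hbar * total

# Made-up data: constant barrier height 2 above the tunneling energy.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
theta = action_integral(grid, [2.0] * 5, [1.0] * 5, 0.25, 0.75, mu=1.0)
t_tun = math.exp(-theta)  # primitive amplitude, Eq. [187]
```

In this sketch a $\cos\chi$ factor smaller than one in the adiabatic regions lowers the action and therefore raises the primitive amplitude, as expected from the projection argument in the text.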

The vibrational adiabatic potential that enters Eq. [188] requires determination of $s$ for geometry $\mathbf{x}(\xi, \tilde{s}_0)$ along the tunneling path. The value of $s$ is defined such that the vector between the geometry along the reaction path $\mathbf{x}^{\mathrm{RP}}(s)$ and the geometry along the linear tunneling path $\mathbf{x}(\xi, \tilde{s}_0)$ is perpendicular to the gradient at that $s$ value:

$$\left[ \mathbf{x}(\xi, \tilde{s}_0) - \mathbf{x}^{\mathrm{RP}}(s) \right] \cdot \frac{d\mathbf{x}^{\mathrm{RP}}}{ds} = 0 \qquad [189]$$

However, this equation may have multiple solutions. We are interested in two sets of solutions that make $s$ a continuous function of $\xi$. The first solution $s_{\mathrm{I}}(\xi, \tilde{s}_0)$ is obtained by starting in reactants with $\xi = 0$, where $s_{\mathrm{I}}(\xi = 0, \tilde{s}_0) = \tilde{s}_0$, and then performing a root search for $s$ at $\Delta\xi$, with $s_{\mathrm{I}}(\xi = 0, \tilde{s}_0)$ as the initial guess for the root search. The procedure is iterated for $\xi + \Delta\xi$ using $s_{\mathrm{I}}(\xi, \tilde{s}_0)$ as the initial guess for the root search to construct a single-valued and continuous function $s_{\mathrm{I}}(\xi, \tilde{s}_0)$. A second solution $s_{\mathrm{III}}(\xi, \tilde{s}_0)$ is found by starting in products with $\xi = \xi_P$, where $s_{\mathrm{III}}(\xi = \xi_P, \tilde{s}_0) = \tilde{s}_1$, and iteratively decreasing $\xi$ to find a solution starting from the product channel. Once the value of $s$ is found, it is possible to define the generalized normal mode coordinates $Q_m[s_i(\xi, \tilde{s}_0)]$, $i$ = I or III, by the relation

$$Q_m[s_i(\xi, \tilde{s}_0)] = \left\{ \mathbf{x}(\xi, \tilde{s}_0) - \mathbf{x}^{\mathrm{RP}}[s_i(\xi, \tilde{s}_0)] \right\} \cdot \mathbf{L}_m^{\mathrm{GT}}[s_i(\xi, \tilde{s}_0)], \quad i = \mathrm{I\ or\ III} \qquad [190]$$

and therefore, at every point along the linear path located in regions I or III, it is possible to assign a unique set of local normal modes.
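The continuation root search for $s_{\mathrm{I}}(\xi, \tilde{s}_0)$ can be sketched as follows. Everything specific in this example is an assumption made for illustration: the reaction path is a circular arc of radius `R` in two dimensions (so that the answer is known analytically), the root finder is a Newton iteration with a finite-difference derivative, and the function names are hypothetical.

```python
import math

R = 2.0  # radius of the model circular-arc reaction path (an assumption)

def x_rp(s):
    """Model reaction path, parameterized by arc length s."""
    return (R * math.cos(s / R), R * math.sin(s / R))

def tangent(s):
    """Unit tangent d x_RP / ds of the model path."""
    return (-math.sin(s / R), math.cos(s / R))

def residual(s, x):
    """Left-hand side of Eq. [189] for geometry x."""
    p, t = x_rp(s), tangent(s)
    return (x[0] - p[0]) * t[0] + (x[1] - p[1]) * t[1]

def newton_root(x, s_guess, tol=1e-12, max_iter=50, h=1e-6):
    s = s_guess
    for _ in range(max_iter):
        f = residual(s, x)
        df = (residual(s + h, x) - residual(s - h, x)) / (2.0 * h)
        step = f / df
        s -= step
        if abs(step) < tol:
            return s
    raise RuntimeError("root search did not converge")

def trace_s_along_path(s0, s1, n_steps):
    """Follow s_I(xi) from xi = 0 to xi = xi_P, reusing the previous root as guess."""
    a, b = x_rp(s0), x_rp(s1)
    s_values = [s0]
    for k in range(1, n_steps + 1):
        frac = k / n_steps
        x = (a[0] + frac * (b[0] - a[0]), a[1] + frac * (b[1] - a[1]))
        s_values.append(newton_root(x, s_values[-1]))
    return s_values

s_values = trace_s_along_path(0.5, 2.5, 10)
```

Seeding each search with the previous root is what keeps the solution on one continuous branch; for the circular model path Eq. [189] has other roots displaced by multiples of $\pi R$, which a search started from an arbitrary guess could find instead.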

Next we discuss how the boundaries between the adiabatic and nonadiabatic regions are determined. We begin by defining a zeroth-order estimate of the boundaries on the reactant side, $\xi_{\mathrm{I}}^0$. A given geometry $\mathbf{x}(\xi, \tilde{s}_0)$ lies within this boundary (i.e., $\xi < \xi_{\mathrm{I}}^0$) if all three of the following conditions are met:

(1) The value of $s_{\mathrm{I}}(\xi, \tilde{s}_0)$ calculated by Eq. [189] has to be smaller than $\tilde{s}_1$:

$$s_{\mathrm{I}}(\xi, \tilde{s}_0) < \tilde{s}_1 \quad \text{for } \xi < \xi_{\mathrm{I}}^0 \qquad [191]$$

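(2) All generalized normal mode coordinates are within their vibrational turning points

$$\left| Q_m[s_{\mathrm{I}}(\xi, \tilde{s}_0)] \right| \le t_m[s_{\mathrm{I}}(\xi, \tilde{s}_0)] \quad \text{for } \xi < \xi_{\mathrm{I}}^0 \qquad [192]$$

where the turning points are defined in Eq. [167] but taking $n_m = 0$.

(3) The geometry $\mathbf{x}(\xi, \tilde{s}_0)$ lies within a single-valued region of the curvilinear coordinates; i.e.,

$$-\sum_{m=1}^{F-1} B_{mF}[s_{\mathrm{I}}(\xi, \tilde{s}_0)]\, Q_m[s_{\mathrm{I}}(\xi, \tilde{s}_0)] < 1 \quad \text{for } \xi < \xi_{\mathrm{I}}^0 \qquad [193]$$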

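where the curvature components are defined in Eq. [164]. Note that $\mathbf{L}_m^{\mathrm{GT}}$ occurs in the definition of both $B_{mF}$ and $Q_m$ so the sign cancels out and we don't have to worry about it here. Similarly, we define a zeroth-order estimate of boundaries on the product side, $\xi_{\mathrm{III}}^0$, by the conditions:

$$s_{\mathrm{III}}(\xi, \tilde{s}_0) > \tilde{s}_0 \quad \text{for } \xi > \xi_{\mathrm{III}}^0 \qquad [194]$$

$$\left| Q_m[s_{\mathrm{III}}(\xi, \tilde{s}_0)] \right| \le t_m[s_{\mathrm{III}}(\xi, \tilde{s}_0)] \quad \text{for } \xi > \xi_{\mathrm{III}}^0 \qquad [195]$$

$$-\sum_{m=1}^{F-1} B_{mF}[s_{\mathrm{III}}(\xi, \tilde{s}_0)]\, Q_m[s_{\mathrm{III}}(\xi, \tilde{s}_0)] < 1 \quad \text{for } \xi > \xi_{\mathrm{III}}^0 \qquad [196]$$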

The values of the zeroth-order boundaries are now used to determine the boundaries, $\xi_{\mathrm{I}}$ and $\xi_{\mathrm{III}}$, in Eq. [188]. Two cases can arise: $\xi_{\mathrm{I}}^0 < \xi_{\mathrm{III}}^0$, in which the effective potential in Eq. [188] needs to be specified for the nonadiabatic region, and $\xi_{\mathrm{I}}^0 \ge \xi_{\mathrm{III}}^0$, in which the adiabatic regions overlap and the nonadiabatic region does not exist. We discuss the latter case first. When the adiabatic regions overlap, the adiabatic potential in the interval $[\xi_{\mathrm{III}}, \xi_{\mathrm{I}}]$ is calculated as

$$\min\left\{ V_a^G[s_{\mathrm{I}}(\xi, \tilde{s}_0)],\; V_a^G[s_{\mathrm{III}}(\xi, \tilde{s}_0)] \right\} \qquad [197]$$

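For the case $\xi_{\mathrm{I}}^0 < \xi_{\mathrm{III}}^0$, we define a zeroth-order effective potential for region II

$$V_{\mathrm{eff}}^{\mathrm{II},0}(\xi, \tilde{s}_0) = V[\mathbf{x}(\xi, \tilde{s}_0)] + V_{\mathrm{corr}}^{\mathrm{I}}(\xi_{\mathrm{I}}^0, \tilde{s}_0) + \frac{\xi - \xi_{\mathrm{I}}^0}{\xi_{\mathrm{III}}^0 - \xi_{\mathrm{I}}^0} \left[ V_{\mathrm{corr}}^{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0) - V_{\mathrm{corr}}^{\mathrm{I}}(\xi_{\mathrm{I}}^0, \tilde{s}_0) \right] \qquad [198]$$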

where the first term is the actual potential along the straight-line tunneling path. The other terms correct for zero-point energy in modes that are within their turning points at the boundaries. Within the harmonic approximation, they are given by

$$V_{\mathrm{corr}}^{i}(\xi_i^0, \tilde{s}_0) = \frac{1}{2} \sum_{m=1}^{F-1} \left[ \hbar\omega_m(s) - \mu\,\omega_m^2(s)\, Q_m^2(s) \right]_{s = s_i(\xi_i^0, \tilde{s}_0)}, \quad i = \mathrm{I\ or\ III} \qquad [199]$$

This zeroth-order effective potential is not guaranteed to match up smoothly with the adiabatic potential at the boundaries. To correct for this deficiency, another requirement is added to the three conditions above, namely, (4) the adiabatic potential should be greater than or equal to the zeroth-order effective potential at the boundary. The boundaries $\xi_{\mathrm{I}}$ and $\xi_{\mathrm{III}}$ of the nonadiabatic region (labeled as II in Figure 4) are thus defined by

$$\xi_i = \xi_i^0 \quad \text{if } V_a^G[s_i(\xi_i^0, \tilde{s}_0)] \ge V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_i^0, \tilde{s}_0), \quad i = \mathrm{I\ or\ III} \qquad [200]$$

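otherwise the value of $\xi_i$ is defined implicitly by extending the nonadiabatic region until

$$V_a^G[s_i(\xi_i, \tilde{s}_0)] = V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_i, \tilde{s}_0) \quad \text{for } V_a^G[s_i(\xi_i^0, \tilde{s}_0)] < V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_i^0, \tilde{s}_0), \quad i = \mathrm{I\ or\ III} \qquad [201]$$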

For the case where the adiabatic potential is larger than the effective potential, another correction is made to the effective potential. The difference in energy between the boundaries is due to anharmonicity, and therefore, we introduce nonquadratic corrections of the type

$$V_{\mathrm{anh}}^{i}(\tilde{s}_0) = V_a^G[s_i(\xi_i, \tilde{s}_0)] - V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_i, \tilde{s}_0), \quad i = \mathrm{I\ or\ III} \qquad [202]$$


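for the reactant channel ($i$ = I) and for the product channel ($i$ = III). With this correction, the effective potential is given by

$$V_{\mathrm{eff}}^{\mathrm{II}}(\xi, \tilde{s}_0) = V[\mathbf{x}(\xi, \tilde{s}_0)] + V_{\mathrm{corr}}^{\mathrm{I}}(\xi_{\mathrm{I}}, \tilde{s}_0) + V_{\mathrm{anh}}^{\mathrm{I}}(\tilde{s}_0) + \frac{\xi - \xi_{\mathrm{I}}}{\xi_{\mathrm{III}} - \xi_{\mathrm{I}}} \left[ V_{\mathrm{corr}}^{\mathrm{III}}(\xi_{\mathrm{III}}, \tilde{s}_0) - V_{\mathrm{corr}}^{\mathrm{I}}(\xi_{\mathrm{I}}, \tilde{s}_0) + V_{\mathrm{anh}}^{\mathrm{III}}(\tilde{s}_0) - V_{\mathrm{anh}}^{\mathrm{I}}(\tilde{s}_0) \right] \qquad [203]$$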

Using the original boundaries $\xi_{\mathrm{I}}^0$ and $\xi_{\mathrm{III}}^0$ and zeroth-order effective potential $V_{\mathrm{eff}}^{\mathrm{II},0}(\xi, \tilde{s}_0)$ and not imposing the additional condition (4) results in the LCG3 method,3 whereas the use of the improved boundaries $\xi_{\mathrm{I}}$ and $\xi_{\mathrm{III}}$ and effective potential $V_{\mathrm{eff}}^{\mathrm{II}}(\xi, \tilde{s}_0)$ results in the LCG4 method.110
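The structure shared by Eqs. [198] and [203], an actual potential plus boundary corrections that are linearly interpolated across region II, can be sketched as below. The function name and argument names are hypothetical; the correction terms themselves (Eqs. [199] and [202]) are assumed to have been computed elsewhere and are passed in as numbers.

```python
def effective_potential(xi, v_actual, xi_i, xi_iii,
                        v_corr_i, v_corr_iii, v_anh_i=0.0, v_anh_iii=0.0):
    """Effective potential in the nonadiabatic region II at progress variable xi.

    With the zeroth-order boundaries and v_anh_* = 0 this has the form of
    Eq. [198] (LCG3); with the improved boundaries and the anharmonic
    corrections it has the form of Eq. [203] (LCG4).
    """
    frac = (xi - xi_i) / (xi_iii - xi_i)
    return (v_actual + v_corr_i + v_anh_i
            + frac * (v_corr_iii - v_corr_i + v_anh_iii - v_anh_i))

# Made-up numbers: at each boundary only that side's corrections survive.
left = effective_potential(0.2, 1.0, 0.2, 0.8, 0.10, 0.30, 0.01, 0.02)
right = effective_potential(0.8, 1.0, 0.2, 0.8, 0.10, 0.30, 0.01, 0.02)
```

At $\xi = \xi_{\mathrm{I}}$ the interpolation factor vanishes and at $\xi = \xi_{\mathrm{III}}$ it equals one, so the reactant-side and product-side corrections are each recovered exactly at their own boundary.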

Figure 4 Effective potential contour plot of a reaction that illustrates some features of the LCG4 method for the evaluation of a linear path at a given tunneling energy. The linear path has a length $\xi_P$ between the two classical turning points $\tilde{s}_0$ and $\tilde{s}_1$, and here we consider $n_p = 0$. The adiabatic region in the reactant side is labeled as I, the nonadiabatic LCG3 region is labeled as II, the nonadiabatic region that includes the condition given by Eq. [201] is labeled as II*, and the adiabatic region in the product side is labeled as III. The boundaries of the adiabatic region are indicated by a dotted line. The boundaries between the adiabatic and the nonadiabatic regions for the plotted linear path are zoomed in the squares labeled as (a) and (b). In the reactants side, we consider the case in which $V_a^G[s_{\mathrm{I}}(\xi_{\mathrm{I}}^0, \tilde{s}_0)] > V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{I}}, \tilde{s}_0)$, and in the products side, the opposite case is considered; i.e., $V_a^G[s_{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0)] > V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{III}}, \tilde{s}_0)$.


The tunneling amplitude $T(E)$ accounts for tunneling initiated by vibrational motion perpendicular to the reaction coordinate along the incoming and outgoing trajectories. There is also the probability that motion along the reaction coordinate can initiate tunneling at the classical turning point $s_0$ for the reaction coordinate motion. The amplitude for this tunneling contribution is $\exp\{-\theta[s_0(E)]\} \cos\chi\{s_0(E), \hat{\boldsymbol{\eta}}[s_0(E)]\}$ and for the reverse direction is $\exp\{-\theta[s_0(E)]\} \cos\chi\{s_1(E), \hat{\boldsymbol{\eta}}[s_0(E)]\}$. The total probability then becomes

$$P_{\mathrm{prim}}^{\mathrm{LCG4}}(E) = |T(E)|^2 + \left\{ \frac{\cos\chi\{s_0(E), \hat{\boldsymbol{\eta}}[s_0(E)]\} + \cos\chi\{s_1(E), \hat{\boldsymbol{\eta}}[s_0(E)]\}}{2} \right\}^2 \exp\{-2\theta[s_0(E)]\} \qquad [204]$$

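This primitive probability can be greater than one because of the integration of the amplitudes over the incoming and outgoing trajectories. Within the uniform semiclassical approximation, the probability should go to 1/2 at the barrier maximum and we enforce this by the uniform expression in Eq. [205] for $E \le V_a^{\mathrm{AG}}$.109

$$P^{\mathrm{LCG4}}(E) = \left\{ 1 + \frac{1}{2}\, \frac{\left[ P_{\mathrm{prim}}^{\mathrm{LCG4}}(V_a^{\mathrm{AG}}) \right]^{-1} - 1}{P_{\mathrm{prim}}^{\mathrm{LCG4}}(V_a^{\mathrm{AG}})}\, P_{\mathrm{prim}}^{\mathrm{LCG4}}(E) \right\} \times \frac{1}{1 + \left[ P_{\mathrm{prim}}^{\mathrm{LCG4}}(E) \right]^{-1}} \qquad [205]$$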

This expression reduces to the primitive probability $P_{\mathrm{prim}}^{\mathrm{LCG4}}$ when it is sufficiently small and goes to 1/2 at the barrier maximum, $V_a^{\mathrm{AG}}$. We use an expression analogous to Eq. [153] to extend the uniform probabilities to energies above the barrier.
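The two limiting properties of the uniform expression in Eq. [205] are easy to check numerically. The short Python sketch below is only an illustration; the function name `uniform_probability` and the sample numbers are assumptions, and the primitive probabilities are taken as given inputs.

```python
def uniform_probability(p_prim_e, p_prim_top):
    """Uniform semiclassical probability of Eq. [205].

    p_prim_e:   primitive probability at the tunneling energy E (must be > 0).
    p_prim_top: primitive probability at the barrier maximum V_a^AG.
    """
    correction = 1.0 + 0.5 * ((1.0 / p_prim_top - 1.0) / p_prim_top) * p_prim_e
    return correction / (1.0 + 1.0 / p_prim_e)

at_barrier_top = uniform_probability(0.8, 0.8)   # E equal to the barrier maximum
deep_tunneling = uniform_probability(1e-8, 0.8)  # very small primitive probability
```

The first call illustrates that the result is 1/2 at the barrier maximum whatever the primitive value there (even a primitive value above one), and the second that the uniform and primitive probabilities agree when the primitive probability is small.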

Large-Curvature Tunneling with Vibrational Excitations

As we mentioned, exoergic reactions can have tunneling into excited states and endoergic reactions can have tunneling from excited states. To simplify the description of the LCG4 tunneling method, we only consider calculations of the tunneling correction factor for the exoergic direction. However, we construct the tunneling correction factor to obey detailed balance, so the tunneling correction factor for the endoergic reaction is the same. Tunneling is assumed to populate excited states of a single receptor mode $p$ in the product channel. The $p$ mode is a linear combination of the generalized transition-state vibrational modes along the reaction coordinate. We provide a description of how this mode is defined below. The primitive probability is obtained by summing over final states with the vibrational quantum number $n_p$ of the LCG4 receptor vibrational mode

$$P_{n_{\max}}^{\mathrm{LCG4}}(E) = \sum_{n_p = 0}^{n_{\max}} P_{\mathrm{prim}}^{\mathrm{LCG4}}(E, n_p) \qquad [206]$$


where $n_{\max}$ is the maximum value of $n_p$ for which the primitive probabilities are included in the sum. $P_{n_{\max}}^{\mathrm{LCG4}}(E)$ is calculated for values of $n_{\max}$ from 0 to $N_p^{\max}(E)$, which is defined below, and used in an expression similar to Eq. [205] to obtain a uniform expression for each $n_{\max}$. Although $P_{n_{\max}}^{\mathrm{LCG4}}(E)$ increases monotonically with increasing $n_{\max}$, the uniform expression may not, and so we choose the value of $n_{\max}$ that gives the maximum value:

$$P^{\mathrm{LCG4}}(E) = \max_{n_{\max}} \left\{ 1 + \frac{1}{2}\, \frac{\left[ P_{n_{\max}}^{\mathrm{LCG4}}(V_a^{\mathrm{AG}}) \right]^{-1} - 1}{P_{n_{\max}}^{\mathrm{LCG4}}(V_a^{\mathrm{AG}})}\, P_{n_{\max}}^{\mathrm{LCG4}}(E) \right\} \times \frac{1}{1 + \left[ P_{n_{\max}}^{\mathrm{LCG4}}(E) \right]^{-1}} \qquad [207]$$
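The selection over $n_{\max}$ in Eqs. [206] and [207] amounts to forming cumulative sums of the state-resolved primitive probabilities and keeping the largest uniformized value. A minimal Python sketch follows; the function names and the sample probabilities are assumptions, and the state-resolved primitive probabilities at $E$ and at the barrier maximum are taken as precomputed lists.

```python
def uniform(p_e, p_top):
    """Uniform expression with the same form as Eq. [205]."""
    return (1.0 + 0.5 * ((1.0 / p_top - 1.0) / p_top) * p_e) / (1.0 + 1.0 / p_e)

def lcg4_probability(prim_e, prim_top):
    """Maximum over n_max of the uniformized cumulative probability (Eq. [207]).

    prim_e[n], prim_top[n]: primitive probabilities for n_p = n at energy E
    and at the barrier maximum V_a^AG (hypothetical inputs, all positive).
    """
    best = 0.0
    sum_e = sum_top = 0.0
    for p_e, p_top in zip(prim_e, prim_top):
        sum_e += p_e        # cumulative sum of Eq. [206] at E
        sum_top += p_top    # cumulative sum of Eq. [206] at V_a^AG
        best = max(best, uniform(sum_e, sum_top))
    return best

p_total = lcg4_probability([1e-6, 1e-7], [0.3, 0.2])
```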

Details of the methods for calculating $P_{\mathrm{prim}}^{\mathrm{LCG4}}(E, n_p)$ for $n_p = 0$ are described above. Calculation of $P_{\mathrm{prim}}^{\mathrm{LCG4}}(E, n_p)$ for excited states requires (1) definition of the $p$ mode, (2) definition of $N_p^{\max}(E)$, and (3) description of how calculation of the primitive tunneling probability is modified for $n_p \ne 0$.

Mode $p$, also called the quasiadiabatic or receptor mode, is given by the projection of the $F - 1$ normal modes on the straight-line tunneling path. Recall that for $n_p = 0$, a unique tunneling path is defined for each starting point for tunneling in the reactant channel, $\tilde{s}_0$. The tunneling vector is given by Eq. [178], and the ending point for tunneling in the product channel, $\tilde{s}_1(\tilde{s}_0)$, is determined by the resonance condition in Eq. [176], where we now explicitly show the dependence of $\tilde{s}_1$ on $\tilde{s}_0$. For excited states, $n_p \ne 0$, the ending point of the tunneling path in the product channel depends on $n_p$; that is, $\tilde{s}_1(\tilde{s}_0)$ is replaced by $\tilde{s}_1(\tilde{s}_0, n_p)$. The resonance condition defining $\tilde{s}_1(\tilde{s}_0, n_p)$ requires calculation of the adiabatic potential with excitation in the $p$ mode, which in turn requires definition of the $p$ mode. We start by defining the $p$ mode for an arbitrary straight-line path connecting a geometry along the reaction path at $\tilde{s}_0$ in the reactant region, $\mathbf{x}^{\mathrm{RP}}(\tilde{s}_0)$, with a geometry along the reaction path at an arbitrary location $s$ along the reaction path in the product channel, $\mathbf{x}^{\mathrm{RP}}(s)$. The vector connecting these two points is

$$\mathbf{E}(\tilde{s}_0, s) = \mathbf{x}^{\mathrm{RP}}(s) - \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0) \qquad [208]$$

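The normalized projection of this vector onto the normal modes at location $s$ along the reaction coordinate defines the $p$ mode for this straight-line path and is given by

$$q_{p,m}(s, \tilde{s}_0) = \frac{\mathbf{E}(\tilde{s}_0, s) \cdot \mathbf{L}_m^{\mathrm{GT}}(s)}{\left\{ \sum_{m'=1}^{F-1} \left[ \mathbf{E}(\tilde{s}_0, s) \cdot \mathbf{L}_{m'}^{\mathrm{GT}}(s) \right]^2 \right\}^{1/2}} \qquad [209]$$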


The harmonic frequency for the $p$ mode can be calculated as

$$\omega_p(s, \tilde{s}_0) = \left\{ \sum_{m=1}^{F-1} \left[ \omega_m(s)\, q_{p,m}(s, \tilde{s}_0) \right]^2 \right\}^{1/2} \qquad [210]$$

This procedure is equivalent to orthogonalizing $\mathbf{E}(\tilde{s}_0, s)$ to the tangent to the reaction path at $s$ and computing the harmonic frequency along the resulting direction decoupled from the other modes.

The ground-state adiabatic potential is used in the reactant channel for values of $s$ up to the location $s_*^{\mathrm{AG}}$ of the maximum in the ground-state adiabatic potential curve. The classical turning point in the reactant channel for energy $E$, $s_0(E)$, is still defined by Eq. [175]. In the product channel, we define the excited-state vibrationally adiabatic potential curve with quantum number $n_p$ for each initiation point $\tilde{s}_0$ by

$$V_a^g(n_p, s, \tilde{s}_0) = V_a^G(s) + n_p \hbar\, \omega_p(s, \tilde{s}_0) \qquad [211]$$

The product-side endpoint $\tilde{s}_1(\tilde{s}_0, n_p)$ of the tunneling path initiated at $\tilde{s}_0$ is defined by the resonance condition

$$V_a^g[n_p, \tilde{s}_1(\tilde{s}_0, n_p), \tilde{s}_0] = V_a^G(\tilde{s}_0) \qquad [212]$$

The classical turning point at energy $E$ on the product side is then given by

$$s_1(E, n_p) = \tilde{s}_1[s_0(E), n_p] \qquad [213]$$

There can be more than one solution to these two equations. The functions $\tilde{s}_1(\tilde{s}_0, n_p)$ and $s_1(E, n_p)$ are defined as the largest of these solutions.

The integer $N_p^{\max}$ is the largest value of $n_p$ that allows tunneling at energy $E$. As mentioned, the quantities needed in the LCG4 calculations are stored on a grid of $s$ values ranging from $s_-$ in reactants to $s_+$ in products. The smallest initiation point in the reactant channel for a tunneling path with excited state $n_p$ in products is defined by

$$\tilde{s}_{0,\min}(n_p) = \max\left[ s_-,\; s_{\min 1}(n_p) \right] \qquad [214]$$

where $s_{\min 1}(n_p)$ is the value of the initiation point in the reactant channel that connects to the last point on the grid:

$$s_+ = \tilde{s}_1[s_{\min 1}(n_p), n_p] \qquad [215]$$


With these definitions, $N_p^{\max}(E)$ is defined as the largest integer value of $n_p$ that satisfies

$$V_a^g\left\{ n_p,\; \tilde{s}_1[\tilde{s}_{0,\min}(n_p), n_p],\; \tilde{s}_{0,\min}(n_p) \right\} \le E \qquad [216]$$

This definition assumes that the excited-state adiabatic potential for values of $s$ greater than $\tilde{s}_1[\tilde{s}_{0,\min}(n_p), n_p]$ is smaller than the adiabatic potential at this $s$ value.

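Calculation of the primitive probabilities for each value of $n_p$ follows the same procedure as outlined in the previous section. A major difference arises because the straight-line tunneling paths are different for each excited state. Generalizations of Eqs. [177]–[179] are as follows:

$$\mathbf{x}(\xi, \tilde{s}_0, n_p) = \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0) + \xi\, \hat{\boldsymbol{\eta}}(\tilde{s}_0, n_p) \qquad [217]$$

$$\hat{\boldsymbol{\eta}}(\tilde{s}_0, n_p) = \frac{\mathbf{x}^{\mathrm{RP}}[\tilde{s}_1(\tilde{s}_0, n_p)] - \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0)}{\xi_P(\tilde{s}_0, n_p)} \qquad [218]$$

$$\xi_P(\tilde{s}_0, n_p) = \left| \mathbf{x}^{\mathrm{RP}}[\tilde{s}_1(\tilde{s}_0, n_p)] - \mathbf{x}^{\mathrm{RP}}(\tilde{s}_0) \right| \qquad [219]$$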

The expression for the tunneling amplitude $T_0(E, n_p)$ has the same form as Eq. [180], with the terms in the integrand modified appropriately to include the dependence on $n_p$. The expression for $v_R(E, \tilde{s}_0)$ remains unchanged, whereas those for $\tau^{-1}(\tilde{s}_0, n_p)$ and $\chi[\tilde{s}_0, \hat{\boldsymbol{\eta}}(\tilde{s}_0, n_p)]$ are modified only because of the change in the angle between the tunneling path and the reaction path. Changes to $T_{\mathrm{tun}}(\tilde{s}_0, n_p)$ are more substantial and are discussed in more detail below. The expression for the reverse amplitude $T_1(E, n_p)$ takes the form

$$T_1(E, n_p) = \int_{s_1(E, n_p)}^{\infty} d\tilde{s}_1\, v_R^{-1}(E, \tilde{s}_1, n_p)\, \tau^{-1}(\tilde{s}_1, n_p)\, T_{\mathrm{tun}}[\tilde{s}_1, n_p]\, \left| \sin\chi[\tilde{s}_1, \hat{\boldsymbol{\eta}}(\tilde{s}_1, n_p)] \right| \qquad [220]$$

where we use $\tilde{s}_1$ as the independent variable in this case and the reactant-side terminus of the tunneling path, $\tilde{s}_0(\tilde{s}_1, n_p)$, is defined by

$$V_a^g[n_p, \tilde{s}_1, \tilde{s}_0(\tilde{s}_1, n_p)] = V_a^G[\tilde{s}_0(\tilde{s}_1, n_p)] \qquad [221]$$

The product-channel velocity term has an explicit dependence on $n_p$ because the excited-state adiabatic potential is used in its evaluation:

$$v_R(E, \tilde{s}_1, n_p) = \left\{ \frac{2}{\mu} \left\{ E - V_a^g[n_p, \tilde{s}_1, \tilde{s}_0(\tilde{s}_1, n_p)] \right\} \right\}^{1/2} \qquad [222]$$

As in the expression for $T_0(E, n_p)$, the quantities $\tau^{-1}(\tilde{s}_1, n_p)$ and $\chi[\tilde{s}_1, \hat{\boldsymbol{\eta}}(\tilde{s}_1, n_p)]$ are modified only because of the change in the angle between the tunneling path and the reaction path. Finally, the tunneling amplitudes in the expressions for $T_0(E, n_p)$ and $T_1(E, n_p)$ are related by

$$T_{\mathrm{tun}}(\tilde{s}_1, n_p) = T_{\mathrm{tun}}[\tilde{s}_0(\tilde{s}_1, n_p), n_p] \qquad [223]$$

so all that remains is a description of the modifications needed in calculating $T_{\mathrm{tun}}(\tilde{s}_0, n_p)$.

The tunneling amplitude takes the form

$$T_{\mathrm{tun}}(\tilde{s}_0, n_p) = \exp[-\theta(\tilde{s}_0, n_p)] \qquad [224]$$

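where the action integral in Eq. [188] is modified to read as

$$\begin{aligned} \theta(\tilde{s}_0, n_p) = \frac{(2\mu)^{1/2}}{\hbar} \Bigg( &\int_0^{\xi_{\mathrm{I}}(\tilde{s}_0, n_p)} d\xi\, \left\{ V_a^G[s_{\mathrm{I}}(\xi, \tilde{s}_0, n_p)] - V_a^G(\tilde{s}_0) \right\}^{1/2} \cos\chi[s_{\mathrm{I}}(\xi, \tilde{s}_0, n_p), \hat{\boldsymbol{\eta}}(\tilde{s}_0, n_p)] \\ &+ \int_{\xi_{\mathrm{I}}(\tilde{s}_0, n_p)}^{\xi_{\mathrm{III}}(\tilde{s}_0, n_p)} d\xi\, \left[ V_{\mathrm{eff}}^{\mathrm{II}}(\xi, \tilde{s}_0, n_p) - V_a^G(\tilde{s}_0) \right]^{1/2} \\ &+ \int_{\xi_{\mathrm{III}}(\tilde{s}_0, n_p)}^{\xi_P(\tilde{s}_0, n_p)} d\xi\, \left\{ V_a^g[n_p, s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p), \tilde{s}_0] - V_a^G(\tilde{s}_0) \right\}^{1/2} \cos\chi[s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p), \hat{\boldsymbol{\eta}}(\tilde{s}_0, n_p)] \Bigg) \end{aligned} \qquad [225]$$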

where for clarity we explicitly show the dependence of the integration limits on $\tilde{s}_0$ and $n_p$. The major change for the action integral with an excited state in the product region is that the excited-state adiabatic potential is used in the product adiabatic region. Note that the resonance condition in Eq. [212] allows us to replace $V_a^G(\tilde{s}_0)$ by $V_a^g[n_p, \tilde{s}_1(\tilde{s}_0, n_p), \tilde{s}_0]$ if it is more convenient computationally. Evaluation of $s_i(\xi, \tilde{s}_0, n_p)$ proceeds by finding the solution to Eq. [189], as described in the previous section, except that the geometry along the straight-line tunneling path is replaced by $\mathbf{x}(\xi, \tilde{s}_0, n_p)$ as defined in Eq. [217].

Determination of the zeroth-order boundary $\xi_{\mathrm{I}}^0(\tilde{s}_0, n_p)$ between adiabatic region I and the nonadiabatic region uses the same procedure as outlined in the three conditions provided in Eqs. [191]–[193], with appropriate modifications to include the $n_p$-dependence of the straight-line path. Determination of the zeroth-order boundary $\xi_{\mathrm{III}}^0(\tilde{s}_0, n_p)$ between adiabatic region III and the nonadiabatic region uses Eqs. [194]–[195], with one further modification beyond the one to include the $n_p$-dependence of the straight-line path. The turning points for the vibrational modes in the product region should include the effect of the excitation in the $p$ mode. The excitation energy $n_p \hbar\, \omega_p(s, \tilde{s}_0)$ in Eq. [211] is partitioned into all $F - 1$ normal modes. Each of the $m$ modes gets an energy that is given by

$$\Delta E_m(s, \tilde{s}_0, n_p) = n_p \hbar\, \omega_p(s, \tilde{s}_0)\, \frac{\left[ \omega_m(s)\, q_{p,m}(s, \tilde{s}_0) \right]^2}{\left[ \omega_p(s, \tilde{s}_0) \right]^2} \qquad [226]$$

so the energy of the generalized normal mode $m$ is given by

$$E_m(s, \tilde{s}_0, n_p) = \frac{1}{2} \hbar\, \omega_m(s) + \Delta E_m(s, \tilde{s}_0, n_p) \qquad [227]$$

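The harmonic turning point needed in the modified version of Eq. [195] is evaluated at $s = s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p)$ and is given by

$$t_{p,m}[s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p), \tilde{s}_0, n_p] = \left( \frac{\hbar}{\mu\, \omega_m[s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p)]} + \frac{2\, \Delta E_m[s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p), \tilde{s}_0, n_p]}{\mu \left\{ \omega_m[s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p)] \right\}^2} \right)^{1/2} \qquad [228]$$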
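The bookkeeping in Eqs. [226]–[228] can be checked with a short Python sketch. The function names, the choice of units with $\hbar = 1$, and the sample frequencies and projections are assumptions made only for this illustration.

```python
import math

def mode_energies(n_p, omega_p, omegas, q_p, hbar=1.0):
    """Partition the p-mode excitation energy among the F-1 normal modes.

    Returns (delta_e, totals): the excitation share of each mode, Eq. [226],
    and the total energy of each mode including zero point, Eq. [227].
    """
    delta_e = [n_p * hbar * omega_p * (w * q) ** 2 / omega_p ** 2
               for w, q in zip(omegas, q_p)]
    totals = [0.5 * hbar * w + d for w, d in zip(omegas, delta_e)]
    return delta_e, totals

def turning_point(omega_m, delta_e_m, mu, hbar=1.0):
    """Harmonic turning point of mode m with excitation share delta_e_m, Eq. [228]."""
    return math.sqrt(hbar / (mu * omega_m) + 2.0 * delta_e_m / (mu * omega_m ** 2))

omegas, q_p = [1.0, 2.0], [0.6, 0.8]
omega_p = math.sqrt(sum((w * q) ** 2 for w, q in zip(omegas, q_p)))  # Eq. [210]
delta_e, totals = mode_energies(2, omega_p, omegas, q_p)
t_points = [turning_point(w, d, mu=1.0) for w, d in zip(omegas, delta_e)]
```

With $\omega_p$ defined by Eq. [210], the shares in Eq. [226] add up to the full excitation energy $n_p \hbar \omega_p$, and each turning point satisfies $\frac{1}{2}\mu\omega_m^2 t_{p,m}^2 = E_m$.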

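Once the values for $\xi_{\mathrm{I}}^0(\tilde{s}_0, n_p)$ and $\xi_{\mathrm{III}}^0(\tilde{s}_0, n_p)$ are determined for a given $\tilde{s}_0$ and $n_p$, they are used in the definition of $\xi_{\mathrm{I}}(\tilde{s}_0, n_p)$ and $\xi_{\mathrm{III}}(\tilde{s}_0, n_p)$ using the general approach described in Eqs. [197]–[201]. For the case that the adiabatic regions overlap, i.e., $\xi_{\mathrm{I}}^0(\tilde{s}_0, n_p) \ge \xi_{\mathrm{III}}^0(\tilde{s}_0, n_p)$, we set the adiabatic potential in the interval $[\xi_{\mathrm{III}}(\tilde{s}_0, n_p), \xi_{\mathrm{I}}(\tilde{s}_0, n_p)]$ as

$$\min\left\{ V_a^G[s_{\mathrm{I}}(\xi, \tilde{s}_0, n_p)],\; V_a^g[n_p, s_{\mathrm{III}}(\xi, \tilde{s}_0, n_p), \tilde{s}_0] \right\} \qquad [229]$$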

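For the case that $\xi_{\mathrm{I}}^0(\tilde{s}_0, n_p) < \xi_{\mathrm{III}}^0(\tilde{s}_0, n_p)$, we define the zeroth-order effective potential $V_{\mathrm{eff}}^{\mathrm{II},0}(\xi, \tilde{s}_0, n_p)$ by the same form as Eq. [198], with modification to include the $n_p$-dependence of the tunneling path. $V_{\mathrm{corr}}^{\mathrm{I}}(\xi_{\mathrm{I}}^0, \tilde{s}_0, n_p)$ is given by Eq. [199], noting that the right-hand side is evaluated at $s_{\mathrm{I}}(\xi, \tilde{s}_0, n_p)$, and the correction potential at the region III boundary is now given by

$$V_{\mathrm{corr}}^{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p) = \sum_{m=1}^{F-1} \left[ E_m(s, \tilde{s}_0, n_p) - \frac{1}{2} \mu\, \omega_m^2(s)\, Q_m^2(s) \right]_{s = s_{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p)} \qquad [230]$$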

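The boundary $\xi_{\mathrm{I}}(\tilde{s}_0, n_p)$ is defined by modified forms of Eqs. [200] and [201], whereas for $\xi_{\mathrm{III}}(\tilde{s}_0, n_p)$, these conditions are modified to read as follows:

$$\xi_{\mathrm{III}}(\tilde{s}_0, n_p) = \xi_{\mathrm{III}}^0(\tilde{s}_0, n_p) \quad \text{for } V_a^g[n_p, s_{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p), \tilde{s}_0] \ge V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p) \qquad [231]$$

and

$$V_a^g[n_p, s_{\mathrm{III}}(\xi_{\mathrm{III}}, \tilde{s}_0, n_p), \tilde{s}_0] = V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{III}}, \tilde{s}_0, n_p) \quad \text{for } V_a^g[n_p, s_{\mathrm{III}}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p), \tilde{s}_0] < V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{III}}^0, \tilde{s}_0, n_p) \qquad [232]$$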

Once the boundaries $\xi_{\mathrm{I}}(\tilde{s}_0, n_p)$ and $\xi_{\mathrm{III}}(\tilde{s}_0, n_p)$ are determined, the effective potential is defined by modification to Eq. [203], where the anharmonic potential on the reactant side $V_{\mathrm{anh}}^{\mathrm{I}}(\tilde{s}_0, n_p)$ is given by modifying Eq. [202] to include $n_p$-dependence of the straight-line path and the $n_p$-dependent zeroth-order effective potential, and on the product side, it is replaced by

$$V_{\mathrm{anh}}^{\mathrm{III}}(\tilde{s}_0, n_p) = V_a^g[n_p, s_{\mathrm{III}}(\xi_{\mathrm{III}}, \tilde{s}_0, n_p), \tilde{s}_0] - V_{\mathrm{eff}}^{\mathrm{II},0}(\xi_{\mathrm{III}}, \tilde{s}_0, n_p) \qquad [233]$$

One final modification is needed in the treatment of excited states in the LCG4 calculations. With $T_0(E, n_p)$ and $T_1(E, n_p)$ obtained with the procedures defined above, the primitive semiclassical probability for state $n_p$ is given by the following modification to Eq. [204]:

$$P_{\mathrm{prim}}^{\mathrm{LCG4}}(E, n_p) = \left| T_0(E, n_p) + T_1(E, n_p) \right|^2 + \delta_{n_p, 0} \left\{ \frac{\cos\chi\{s_0(E), \hat{\boldsymbol{\eta}}[s_0(E), n_p]\} + \cos\chi\{s_1(E, n_p), \hat{\boldsymbol{\eta}}[s_0(E), n_p]\}}{2} \right\}^2 \exp\{-2\theta[s_0(E), n_p]\} \qquad [234]$$

where the tunneling contribution from motion along the reaction coordinate initiated at the classical turning point, which is accounted for in the last term, is an adiabatic process and therefore only contributes to $n_p = 0$.

Practical Methods to Evaluate LCT Transmission Coefficients

The action integral in Eq. [225] can be evaluated by standard numerical integration procedures, and about 180 points along the linear path are needed to get full convergence in a typical case. Some of those points may correspond to the nonadiabatic region, for which information in the reaction swath is needed. In addition, calculation of the transmission coefficient with the above procedure has to be repeated at many tunneling energies, of the order of 80 energies, and it is not unusual to have to calculate several hundreds of energies along the linear path, which cannot be extrapolated from information about the MEP. These large numbers of single-point energy calculations can make evaluation of the LCT probabilities directly from ab initio data expensive.

One way of reducing the computational cost is to interpolate some of the energies along the linear path by a spline under tension rather than computing them all directly. If at a given tunneling energy $E_i$ we have to evaluate a set of energies along the linear path $\{\xi_1, \ldots, \xi_j, \ldots, \xi_{N_T}\}$, where $N_T$ can be for instance 180, we may have a subset $\{\xi_{\mathrm{I}}, \ldots, \xi_{\mathrm{III}}\}$ of points in the nonadiabatic region, which cannot be extrapolated from the MEP (in this section, we assume $n_p = 0$ for clarity). We can pick a given number $N_{na}$ of equally spaced points in this subset and calculate all others by spline under tension interpolation. This procedure can be repeated for each of the $\{E_1, \ldots, E_i, \ldots, E_{M_T}\}$ tunneling energies, with $M_T$ being the total number of tunneling energies needed to evaluate the transmission factor (a common value is $M_T = 80$). This algorithm is called one-dimensional spline interpolation large curvature tunneling [ILCT(1D)], and in general, with a value of $N_{na} = 9$, it is possible to get converged transmission coefficients with an error smaller than 4%.120 This algorithm reduces the computational cost of the LCT transmission coefficients by about a factor of 5.
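The idea behind ILCT(1D) can be illustrated with a small Python sketch. Note the substitution: the method described above uses a spline under tension, whereas this sketch uses an ordinary natural cubic spline as a stand-in, because it can be written compactly with the standard library. The function names, the callable `energy_along_path` (standing for an expensive single-point energy calculation), and the test function are all assumptions.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through (xs, ys); xs ascending."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; end values stay zero
    if n >= 2:
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        for k in range(1, n - 1):            # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]  # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return evaluate

def ilct_1d(energy_along_path, xi_i, xi_iii, xi_targets, n_na=9):
    """Compute n_na energies in [xi_i, xi_iii] directly and interpolate the rest."""
    knots = [xi_i + (xi_iii - xi_i) * k / (n_na - 1) for k in range(n_na)]
    spline = natural_cubic_spline(knots, [energy_along_path(x) for x in knots])
    return [spline(x) for x in xi_targets]

# Hypothetical smooth model potential in place of electronic structure calls.
values = ilct_1d(lambda xi: 1.0 - (xi - 1.0) ** 2, 0.5, 1.5, [0.6, 1.0, 1.4])
```

Only the `n_na` knot energies require explicit calculations in this sketch; every other point along the nonadiabatic segment is read off the interpolant.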

Another possibility is to consider the $M_T \times N_T$ grid and to interpolate not only the points along the linear path but also the tunneling energies. Of the whole set of energies $\{E_1, \ldots, E_i, \ldots, E_{M_T}\}$, we take a subset $\{E_1, \ldots, E_{i'}, \ldots, E_M\}$, where $E_1$ is the same in both sets, specifically the lowest energy at which tunneling is possible, and $E_{M_T} = E_M$ coincides with the top of the vibrationally adiabatic barrier $V_a^{\mathrm{AG}}$. The difference is that the second subset includes only $M$ equally spaced energies of the total $M_T$ energies. At this particular set of energies, we also build a subset $\{\xi_1, \ldots, \xi_{j'}, \ldots, \xi_N\}$ of the $\{\xi_1, \ldots, \xi_j, \ldots, \xi_{N_T}\}$ original set of progress variables, where $\xi_1$ is the same in both sets and $\xi_{N_T} = \xi_N = \xi_P$, but as before, the second subset includes only $N$ equally spaced points instead of the total $N_T$ points. The subsets are built in this way because when squared, the $M \times N$ and $M_T \times N_T$ grids have the same boundaries. In fact the $M \times N$ grid is transformed into a unit square by performing the following scaling:

$$\bar{E}_{i'} = \frac{E_{i'} - E_1}{E_M - E_1} \quad \text{and} \quad \bar{\xi}_{j'} = \frac{\xi_{j'}}{\xi_{P,j'}} \qquad [235]$$

where $i' = 1, \ldots, M$ and $j' = 1, \ldots, N$. The grid is interpolated using a two-dimensional spline under tension algorithm, and so this method is usually called ILCT(2D).121 Any given geometry specified by $(E_i, \xi_j)$ that belongs to the $M_T \times N_T$ grid can be retrieved from the $M \times N$ grid by interpolation, because any geometry that belongs to the first grid belongs also to the second one. It has been shown for the reaction of the CF3 radical with several hydrocarbons that the ILCT(2D) algorithm produces converged results with a relative error of less than 1% using a $9 \times 11$ grid with respect to the LCG4 full calculation ($80 \times 180$ grid). Due to the good performance of the ILCT(2D) algorithm, we highly recommend its use for the evaluation of the LCG4 transmission factors.
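The scaling of Eq. [235] that maps the coarse grid onto a unit square can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the path length is taken to differ from one tunneling energy to the next, so each energy carries its own grid of progress variables ending at its own $\xi_P$.

```python
def scale_to_unit_square(energies, xi_grids):
    """Scale tunneling energies and progress variables to [0, 1] (Eq. [235]).

    energies: ascending list E_1 ... E_M.
    xi_grids: one ascending list of progress variables per energy, each
              starting at 0 and ending at that path's length xi_P.
    """
    e_first, e_last = energies[0], energies[-1]
    scaled_e = [(e - e_first) / (e_last - e_first) for e in energies]
    scaled_xi = [[xi / grid[-1] for xi in grid] for grid in xi_grids]
    return scaled_e, scaled_xi

# Made-up grid: three energies, paths of different length.
scaled_e, scaled_xi = scale_to_unit_square(
    [0.10, 0.25, 0.40],
    [[0.0, 1.5, 3.0], [0.0, 1.0, 2.0], [0.0, 0.4, 0.8]],
)
```

After this scaling every path runs from 0 to 1 regardless of its physical length, which is what allows a single two-dimensional interpolant to cover all tunneling energies.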


The Microcanonically Optimized Transmission Coefficient

For a bimolecular reaction of the type A + BC → AB + C, where A, B, and C may be atoms or groups of atoms, the reaction path curvature is a function of the skew angle, which is the angle between the gradient of $V$ along the reaction path in the product channel and the gradient of $V$ along the reaction path in the reactant channel. If we consider isoinertial coordinates, the skew angle is defined by

$$\beta = \cos^{-1} \left[ \frac{m_A m_C}{(m_A + m_B)(m_B + m_C)} \right]^{1/2} \qquad [236]$$

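and it is related to the reaction path curvature by

$$\left[ \int_{-\infty}^{+\infty} \boldsymbol{\kappa}(s)\, ds \right] \cdot \frac{d\mathbf{x}^R}{ds} = \left( \frac{d\mathbf{x}^P}{ds} - \frac{d\mathbf{x}^R}{ds} \right) \cdot \frac{d\mathbf{x}^R}{ds} = -(1 + \cos\beta) \qquad [237]$$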

where $\mathbf{x}^P$ and $\mathbf{x}^R$ are the geometries in the product and reactant valleys, respectively. The skew angle is close to $\pi/2$ when B has a much larger mass than A and C, and it is close to zero when B has a much smaller mass than A and C. Small skew angles lead to large curvature. In Figure 5, two examples illustrate the curvature of the reaction path. For the reaction H + H2 → H2 + H, the skew angle is $\beta = 60$ degrees (Figure 5a), whereas for reaction Cl + HCl → ClH + Cl, the skew angle is only $\beta = 14$ degrees (Figure 5b). In general, for a bimolecular reaction, the curvature of the reaction path is large when a light particle (like a proton) is being transferred between two heavy atoms, although this need not be the case for unimolecular reactions.125 These systems are usually called heavy–light–heavy reactions, and we expect large tunneling effects in those cases. In fact, it is well known that the SC approximation may seriously underestimate tunneling for heavy–light–heavy systems, and therefore, we have to search for a better tunneling path.102

Figure 5 Contour plots in Jacobi coordinates for (a) H + H2 → H2 + H and (b) Cl + HCl → ClH + Cl reactions, respectively. The MEP is indicated in both figures to illustrate the larger curvature in the latter case. Figure (a) also shows other alternative tunneling paths (see text).
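The two skew angles quoted for the examples can be reproduced from Eq. [236] with a few lines of Python. The function name and the rounded atomic masses (1 for H, 35 for Cl) are assumptions made for this illustration.

```python
import math

def skew_angle_degrees(m_a, m_b, m_c):
    """Skew angle of Eq. [236] for A + BC -> AB + C, in degrees."""
    ratio = m_a * m_c / ((m_a + m_b) * (m_b + m_c))
    return math.degrees(math.acos(math.sqrt(ratio)))

beta_h3 = skew_angle_degrees(1.0, 1.0, 1.0)       # H + H2
beta_clhcl = skew_angle_degrees(35.0, 1.0, 35.0)  # Cl + HCl
```

With these rounded masses the H + H2 case gives 60 degrees and the heavy–light–heavy Cl + HCl case gives a value that rounds to 14 degrees, consistent with the numbers in the text.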

Figure 5a shows a plot with four possible paths for a given tunneling energy E. The points on the MEP (labeled as $s_<$ and $s_>$) that correspond to a particular tunneling energy, on both the reactant and the product sides, are called classical turning points of reaction-coordinate motion, and they correspond to the limits of integration of Eq. [151]. The longest but energetically most favorable path is the MEP [labeled as (a) in Figure 5a], whereas the shortest path, which also has the highest energy, is the straight-line path [labeled as (d) in Figure 5a]. In between there are an infinite number of paths that connect reactants to products at that particular tunneling energy [among them is the SC path, which is labeled as (b) in Figure 5a]. Among all the possible paths, we have to find the one that has the largest tunneling probability, which is equivalent to finding the path that, for the correct boundary conditions, minimizes the action [labeled as (c)], i.e., the so-called least-action path (LAP).102,124,126,127 Tunneling calculations based on the LAP are called least-action tunneling (LAT). Some approximate methods try to find the LAP without its explicit evaluation,128 because the search for the LAP is often unaffordable or not worth the cost for polyatomic systems. One way to circumvent this problem is to evaluate the probability along the straight-line path, which is the kind of path3,108,129–131 that dominates in the large-curvature limit and is usually called the large-curvature path (LCP). We can compute both the SCT and the LCT (the T in the acronym stands for tunneling) probabilities, the first being accurate for small-to-intermediate curvature, whereas the second is accurate for intermediate-to-large curvature (and also often reasonably accurate even for small curvature). As the objective is to find the tunneling mechanism with the largest tunneling probability, an alternative to searching for the LAP is to take the larger of the SCT and LCT probabilities at each tunneling energy. This new probability is called the microcanonically optimized multidimensional tunneling probability, $P^{\mu\mathrm{OMT}}$, and it is given by91

$$P^{\mu\mathrm{OMT}} = \max_{E} \begin{cases} P^{\mathrm{SCT}}(E) \\ P^{\mathrm{LCT}}(E) \end{cases} \qquad [238]$$

It has been shown that the μOMT transmission coefficients are comparable in accuracy with the LAT transmission coefficients for atom–diatom reactions.103 Often we just say OMT without including the microcanonical specification in the algorithm (OMT can also mean canonical OMT, in which we first thermally average the SCT and LCT probabilities and then choose the larger transmission coefficient). The resulting VTST/OMT rate constants have been tested carefully against accurate quantum dynamics,103,111,112 and the accuracy has been found to be very good.
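As a minimal sketch of Eq. [238], the microcanonical optimization can be written as an energy-by-energy maximum over two probability arrays. The function name and the numerical values below are hypothetical and serve only as an illustration; in practice the SCT and LCT probabilities would come from the tunneling calculations described above.

```python
def mu_omt_probability(p_sct, p_lct):
    """Return the energy-by-energy maximum of two tunneling probabilities (Eq. [238])."""
    if len(p_sct) != len(p_lct):
        raise ValueError("both probability lists must share one energy grid")
    return [max(a, b) for a, b in zip(p_sct, p_lct)]

# Hypothetical probabilities on a common grid of tunneling energies
p_sct = [1.0e-6, 4.0e-4, 3.0e-2, 0.40]
p_lct = [5.0e-6, 2.0e-4, 5.0e-2, 0.35]
p_omt = mu_omt_probability(p_sct, p_lct)
```

Canonical OMT would instead average each probability set thermally first and then keep the larger of the two transmission coefficients.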

Sometimes we just say VTST/MT. The MT acronym (‘‘multidimensional tunneling’’) can denote ZCT, SCT, LCT, or OMT, all of which are multidimensional, but we usually use SCT or OMT when we carry out MT calculations.

BUILDING THE PES FROM ELECTRONIC STRUCTURE CALCULATION

For the vast majority of chemically interesting systems, a potential energy surface (PES) is not available. When this is the case, there are two options: Create an analytic potential energy function (PEF), or use direct dynamics.62,118,132 The traditional route of creating an analytic high-level PEF requires considerable data (from electronic structure or experiment) and human development time. A new method called multiconfiguration molecular mechanics (MCMM),133–135 which allows more straightforward creation of a PES from limited data, has recently been developed and is described below. For small to moderately sized systems where electronic structure gradients and Hessians are not overly expensive, direct dynamics is typically the method of choice.

Direct dynamics has been defined as ‘‘the calculation of rates or other dynamical observables directly from electronic structure information, without the intermediacy of fitting the electronic energies in the form of a potential energy function.’’132 In this method, information about the PES is calculated by electronic structure methods as it is needed, i.e., ‘‘on the fly.’’ For example, consider the calculation of the MEP using the steepest descent method. A Hessian calculation is done by electronic structure theory at the saddle point, and a step is taken along the normal mode with the imaginary frequency. At this new geometry, a gradient is requested, which is then calculated using electronic structure theory. That information is passed back to the MEP calculation, a step is taken downhill along the gradient, and once again a gradient is requested at the new geometry. This iterative process continues until the MEP reaches the desired length, and then it is repeated for the other side of the MEP. In a CVT calculation, Hessians must also be calculated at several points along the path to determine the vibrationally adiabatic ground-state potential energy curve and free energy of activation profile for each value of s.
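The on-the-fly loop described above can be illustrated with a small sketch. Here an analytic two-dimensional model potential stands in for the electronic structure program, the saddle point and its imaginary-frequency mode are known by construction, and a fixed-length Euler steepest-descent step is used; all function names, the potential, and the step size are assumptions made for illustration, not the algorithm of any particular code.

```python
import math

def energy(x, y):
    # Model potential standing in for an electronic structure energy
    return (x * x - 1.0) ** 2 + 2.0 * (y - 0.5 * x * x) ** 2

def gradient(x, y):
    # Analytic gradient standing in for an electronic structure gradient call
    u = y - 0.5 * x * x
    return (4.0 * x * (x * x - 1.0) - 4.0 * x * u, 4.0 * u)

def follow_mep(direction, ds=0.01, max_steps=5000):
    """Euler steepest-descent path from the saddle point at (0, 0)."""
    # First step along the imaginary-frequency mode (the x axis for this model)
    x, y = direction * ds, 0.0
    path = [(0.0, 0.0), (x, y)]
    for _ in range(max_steps):
        gx, gy = gradient(x, y)          # "request a gradient" at the new geometry
        norm = math.hypot(gx, gy)
        if norm < 1.0e-8:
            break
        x_new, y_new = x - ds * gx / norm, y - ds * gy / norm
        if energy(x_new, y_new) >= energy(x, y):
            break                        # stepped past the minimum; stop
        x, y = x_new, y_new
        path.append((x, y))
    return path

product_side = follow_mep(+1)
reactant_side = follow_mep(-1)
```

Each call to `gradient` corresponds to one electronic structure calculation in a real application, which is why the number of steps largely determines the cost.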

Achieving chemical accuracy by electronic structure calculations is computationally expensive, and the time required to calculate a rate constant is governed almost entirely by the time spent calculating the gradients and Hessians. In addition, the accuracy of the rate constant depends on the accuracy of the electronic structure method. Therefore, the user must make judicious decisions about the length of the MEP, how often Hessians are calculated, whether to use options like LCT that require extra information about the PES, and which electronic structure method to use.


Direct dynamics calculations can be carried out by interfacing an electronic structure package with POLYRATE, and several such interfaces are available, including MORATE,4,109,136 GAUSSRATE,137 GAMESSPLUSRATE,138 MULTILEVELRATE,139 MC-TINKERATE,140 and CHARMMRATE.141

A key point to be emphasized here is that using so-called ‘‘straight direct dynamics’’ may not be the most efficient approach.142 In straight direct dynamics, whenever the dynamical algorithm requires a potential energy, a gradient, or a Hessian, it is calculated by a full electronic structure calculation. Such algorithmic purity provides one extreme on the spectrum that spans the range from straight direct dynamics to fitting a global potential energy function. However, there are several intermediate possibilities in this spectrum, corresponding to more economical ways of combining electronic structure theory and dynamics. As these algorithmic possibilities are fleshed out, it is not always possible to distinguish whether a calculation should be classified as fitting, as local interpolation (a form of direct dynamics), or as direct.143 In fact, such classification is less important than the ability of the algorithm to reduce the cost for a given level of accuracy and size of system, to allow for a given level of accuracy to be applied with affordable cost to larger systems, or to allow more complete dynamical treatments such as large-curvature tunneling, a more expensive treatment of anharmonicity, or a trajectory-based estimate of recrossing. This section will consider interpolation schemes as well as straight direct dynamics.

Direct Dynamics with Specific Reaction Parameters

Direct dynamics with specific reaction parameters (SRPs)132 involves the use of an electronic structure method that has been adjusted to reproduce important data for a specific reaction, followed by determining the reaction rate using direct dynamics. The adjusted method is typically parameterized to agree with the correct forward barrier height and possibly also with one experimental or high-level energy of reaction, but it may actually be parameterized for any property that is important for the specific reaction, for example, the potential energy profile along the reaction path.144

When using experimental data, the barrier height is sometimes approximated by the activation energy, although this is not recommended because they may differ by several kcal/mol. High-level frequency calculations may be carried out for reactants and products, yielding an approximation to their zero-point energy and heat capacity, and these data may be used in combination with the experimental enthalpy of reaction to obtain a good estimate of the Born–Oppenheimer energy of reaction. Alternatively, the barrier height and energy of reaction may be calculated from high-level electronic structure methods, such as a correlated wave function theory or density functional theory. Unfortunately such calculations, although often affordable for stationary points, may become prohibitively expensive for direct dynamics due to the large number of gradients and Hessians required.

In the original application,132 the SRP method was applied to the following reaction:

$$\mathrm{Cl^-(H_2O)}_n + \mathrm{CH_3Cl'} \rightarrow \mathrm{CH_3Cl} + \mathrm{Cl'^-(H_2O)}_n, \quad n = 0, 1, \text{ or } 2 \qquad [239]$$

In this particular example, a neglect of diatomic differential overlap (NDDO)145,146 method was created based on semiempirical molecular orbital theory, namely AM1.147,148 The resulting method was referred to as NDDO-SRP. The adjusted parameters were the one-center, one-electron energies, $U^{X}_{\mu\mu}$, which were adjusted to achieve the correct electron affinity for Cl and the correct barrier height for the n = 0 reaction. The NDDO-SRP rate constants were compared with those calculated using an accurate PES; the errors for the CVT/SCT rate constants ranged from 39% at 200 K to 30% at 1000 K for the unsolvated complex. When considering the enormous amount of time required to create an accurate PES compared with the relatively fast SRP direct dynamics calculation, these results are very encouraging. The method also gave good results for the solvated reactions of Eq. [239], where n = 1 and n = 2.

Interpolated VTST

MCMM

Multiconfiguration molecular mechanics (MCMM)133–135 is an algorithm that approximates a global PES by combining molecular mechanics (MM) with a limited number of energies, gradients, and Hessians based on quantum mechanics. (This is a special case of a dual-level strategy in which one combines a lower and a higher level.) MCMM is an extension of conventional MM (which is only applicable to nonreactive systems) to describe reactive potential energy surfaces. It extends the empirical valence bond method149 so that it becomes a systematically improvable fitting scheme. This is accomplished by combining the rectilinear Taylor series method of Chang, Minichino, and Miller150,151 for estimating $V_{12}$ in the local region around a given geometry with the use of redundant internal coordinates71,72 for the low-order expansion of the PES and the Shepard interpolation method.152,153 The key to MCMM is the limited number of high-level quantum mechanical data required, because, whether or not one uses interpolation or MCMM, the vast majority of time required to calculate a rate constant is consumed by the electronic structure calculations. It has been shown that potential energy surfaces created using 13 or fewer Hessians can yield accurate rate constants.134 Even greater efficiency can be achieved if one is certain that large-curvature tunneling paths need not be explored and/or if one uses partial high-level Hessians.135


In MCMM, the Born–Oppenheimer PES is estimated as being the lowest eigenvalue of the 2 × 2 potential matrix V:

$$\begin{vmatrix} V_{11} - V & V_{12} \\ V_{12} & V_{22} - V \end{vmatrix} = 0 \qquad [240]$$

where $V_{11}$ corresponds to the molecular mechanics potential function associated with the well on the reactant side, $V_{22}$ corresponds to the molecular mechanics potential function associated with the well on the product side, and $V_{12}$ corresponds to the resonance energy function or resonance integral.

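The lowest eigenvalue $V(\mathbf{q})$ of the matrix in Eq. [240] at a given geometry, $\mathbf{q}$, is given by

$$V(\mathbf{q}) = \frac{1}{2}\left\{ \left(V_{11}(\mathbf{q}) + V_{22}(\mathbf{q})\right) - \left[ \left(V_{11}(\mathbf{q}) - V_{22}(\mathbf{q})\right)^2 + 4 V_{12}(\mathbf{q})^2 \right]^{1/2} \right\} \qquad [241]$$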

where $V_{11}$ and $V_{22}$ are calculated by molecular mechanics using the connectivity of reactants and products, respectively, and where $\mathbf{q}$ denotes either the R or x coordinate set of Eq. [36] or a set of valence internal coordinates,28,68–72 such as stretch, bend, and torsion coordinates. Therefore

$$V_{12}(\mathbf{q})^2 = \left[V_{11}(\mathbf{q}) - V(\mathbf{q})\right]\left[V_{22}(\mathbf{q}) - V(\mathbf{q})\right] \qquad [242]$$
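The relation between the secular equation and Eq. [242] can be checked numerically. The sketch below evaluates the lowest root of the 2 × 2 problem for scalar inputs and then recovers the squared resonance integral from it; the function names and the energies are hypothetical.

```python
import math

def lowest_eigenvalue(v11, v22, v12):
    """Lowest root of the 2 x 2 secular equation, Eq. [240]."""
    return 0.5 * ((v11 + v22) - math.sqrt((v11 - v22) ** 2 + 4.0 * v12 ** 2))

def resonance_squared(v11, v22, v):
    """Invert the secular equation for V12 squared, Eq. [242]."""
    return (v11 - v) * (v22 - v)

v11, v22, v12 = 12.0, 7.0, 3.0   # hypothetical energies, arbitrary units
v = lowest_eigenvalue(v11, v22, v12)
```

Because the lowest root always lies below both diagonal elements, the product in Eq. [242] is non-negative whenever $V$ is the true lowest eigenvalue.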

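Using a suitable quantum mechanical electronic structure method, the energy, gradient, and Hessian can be calculated at an arbitrary geometry, $\mathbf{q}^{(k)}$, which is called an interpolation point or a Shepard point. Near $\mathbf{q}^{(k)}$, $V_{11}(\mathbf{q})$, $V(\mathbf{q})$, and $V_{22}(\mathbf{q})$ may be expanded as a Taylor series, yielding

$$V(\mathbf{q};k) \cong V^{(k)} + \mathbf{g}^{(k)\dagger} \cdot \Delta\mathbf{q}^{(k)} + \frac{1}{2}\,\Delta\mathbf{q}^{(k)\dagger} \cdot \mathbf{f}^{(k)} \cdot \Delta\mathbf{q}^{(k)} \qquad [243]$$

where

$$\Delta\mathbf{q}^{(k)} = \mathbf{q} - \mathbf{q}^{(k)} \qquad [244]$$

In Eq. [243], $V^{(k)}$, $\mathbf{g}^{(k)}$, and $\mathbf{f}^{(k)}$ are the energy, gradient, and Hessian, respectively, at reference point $\mathbf{q}^{(k)}$. The diagonal elements $V_{nn}$ can be expanded around reference point $\mathbf{q}^{(k)}$, yielding

$$V_{nn}(\mathbf{q};k) \cong V_n^{(k)} + \mathbf{g}_n^{(k)\dagger} \cdot \Delta\mathbf{q}^{(k)} + \frac{1}{2}\,\Delta\mathbf{q}^{(k)\dagger} \cdot \mathbf{f}_n^{(k)} \cdot \Delta\mathbf{q}^{(k)} \qquad [245]$$

where

$$V_n^{(k)} = V_{nn}(\mathbf{q}^{(k)}), \quad \mathbf{g}_n^{(k)} = \left(\frac{\partial V_{nn}}{\partial \mathbf{q}}\right)_{\mathbf{q}=\mathbf{q}^{(k)}}, \quad \mathbf{f}_n^{(k)} = \left(\frac{\partial^2 V_{nn}}{\partial \mathbf{q}\,\partial \mathbf{q}}\right)_{\mathbf{q}=\mathbf{q}^{(k)}} \qquad [246]$$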


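Substituting these expressions into Eq. [242] yields an analytic expression for $V_{12}(\mathbf{q})$ in the vicinity of reference point $\mathbf{q}^{(k)}$, given by

$$
\begin{aligned}
V_{12}(\mathbf{q};k)^2 \cong{}& \left(V_1^{(k)} - V^{(k)}\right)\left(V_2^{(k)} - V^{(k)}\right) + \left(V_2^{(k)} - V^{(k)}\right)\left(\mathbf{g}_1^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger} \Delta\mathbf{q}^{(k)} \\
&+ \left(V_1^{(k)} - V^{(k)}\right)\left(\mathbf{g}_2^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger} \Delta\mathbf{q}^{(k)} + \frac{1}{2}\left(V_2^{(k)} - V^{(k)}\right) \Delta\mathbf{q}^{(k)\dagger} \left(\mathbf{f}_1^{(k)} - \mathbf{f}^{(k)}\right) \Delta\mathbf{q}^{(k)} \\
&+ \frac{1}{2}\left(V_1^{(k)} - V^{(k)}\right) \Delta\mathbf{q}^{(k)\dagger} \left(\mathbf{f}_2^{(k)} - \mathbf{f}^{(k)}\right) \Delta\mathbf{q}^{(k)} + \left[\left(\mathbf{g}_1^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger} \Delta\mathbf{q}^{(k)}\right]\left[\left(\mathbf{g}_2^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger} \Delta\mathbf{q}^{(k)}\right]
\end{aligned}
\qquad [247]
$$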

Now that expressions for $V_{11}(\mathbf{q})$, $V_{12}(\mathbf{q})$, and $V_{22}(\mathbf{q})$ are available in the vicinity of $\mathbf{q}^{(k)}$, an expression must be derived for $V(\mathbf{q})$ that is globally smooth as $\mathbf{q}$ approaches different reference points on the PES. In MCMM, this is done using a Shepard interpolation.152,153 Suppose that a collection of M ‘‘Shepard points’’ is available, for which there are ab initio energies $V^{(k)}$, gradients $\mathbf{g}^{(k)}$, and Hessians $\mathbf{f}^{(k)}$. By using the Shepard interpolation method, the resonance energy function is given by

$$V_{12}^{S}(\mathbf{q}) = \sum_{k=1}^{M} W_k(\mathbf{q})\, V'_{12}(\mathbf{q};k) \qquad [248]$$

where the normalized weight is given by

$$W_k(\mathbf{q}) = \frac{w_k(\mathbf{q})}{w(\mathbf{q})} \qquad [249]$$

in terms of unnormalized weights $w_k$ (discussed below) and in terms of the normalization constant

$$w(\mathbf{q}) = \sum_{l=1}^{M+2} w_l(\mathbf{q}) \qquad [250]$$

where the upper limit of the sum is now greater than in Eq. [248] because this sum also includes van der Waals minima (for bimolecular reagents) or chemical minima (for unimolecular reagents) corresponding to the two molecular mechanics structures; the resonance integral is zero by definition at these two structures.

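In Eq. [248], $V'_{12}$ is a modified quadratic function given by

$$\left[V'_{12}(\mathbf{q};k)\right]^2 = \left[V_{12}(\mathbf{q};k)\right]^2 u(\mathbf{q};k) \qquad [251]$$

where u is a modifier given by

$$u(\mathbf{q};k) = \begin{cases} \exp\left(\dfrac{-\delta}{\left[V_{12}(\mathbf{q};k)\right]^2}\right), & \left[V_{12}(\mathbf{q};k)\right]^2 > 0 \\ 0, & \left[V_{12}(\mathbf{q};k)\right]^2 \le 0 \end{cases} \qquad [252]$$

where $\delta$ is $10^{-8}\,E_h^2$, and $E_h$ is one hartree. In practice, the expression for $V_{12}(\mathbf{q};k)$ is given by

$$\left[V_{12}(\mathbf{q};k)\right]^2 = D^{(k)}\left[1 + \mathbf{b}^{(k)\dagger}\left(\mathbf{q} - \mathbf{q}^{(k)}\right) + \frac{1}{2}\left(\mathbf{q} - \mathbf{q}^{(k)}\right)^{\dagger} \mathbf{C}^{(k)} \left(\mathbf{q} - \mathbf{q}^{(k)}\right)\right] \qquad [253]$$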

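Choosing constants in Eq. [253] such that Eq. [243] is reproduced when Eq. [251] is substituted into Eq. [241] yields

$$D^{(k)} = \Delta V_1^{(k)}\, \Delta V_2^{(k)} \qquad [254]$$

$$\mathbf{b}^{(k)} = \frac{\mathbf{g}_1^{(k)} - \mathbf{g}^{(k)}}{\Delta V_1^{(k)}} + \frac{\mathbf{g}_2^{(k)} - \mathbf{g}^{(k)}}{\Delta V_2^{(k)}} \qquad [255]$$

$$\mathbf{C}^{(k)} = \frac{1}{D^{(k)}}\left[\left(\mathbf{g}_1^{(k)} - \mathbf{g}^{(k)}\right)\left(\mathbf{g}_2^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger} + \left(\mathbf{g}_2^{(k)} - \mathbf{g}^{(k)}\right)\left(\mathbf{g}_1^{(k)} - \mathbf{g}^{(k)}\right)^{\dagger}\right] + \frac{\mathbf{f}_1^{(k)} - \mathbf{f}^{(k)}}{\Delta V_1^{(k)}} + \frac{\mathbf{f}_2^{(k)} - \mathbf{f}^{(k)}}{\Delta V_2^{(k)}} \qquad [256]$$

$$\Delta V_n^{(k)} = V_n^{(k)} - V^{(k)} \qquad [257]$$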
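A one-dimensional analog makes the role of these constants easy to verify. In the sketch below every quantity is a scalar, so the outer products in Eq. [256] collapse to a single product; the function names and all numerical data are hypothetical. The final lines check that inserting the quadratic expansion of the squared resonance integral into the lowest-eigenvalue formula reproduces the target Taylor series close to the Shepard point.

```python
import math

def resonance_coefficients(v, g, f, v1, g1, f1, v2, g2, f2):
    """One-dimensional analog of Eqs. [254]-[257]; all inputs are scalars."""
    dv1, dv2 = v1 - v, v2 - v
    d = dv1 * dv2
    b = (g1 - g) / dv1 + (g2 - g) / dv2
    c = 2.0 * (g1 - g) * (g2 - g) / d + (f1 - f) / dv1 + (f2 - f) / dv2
    return d, b, c

def v12_squared(dq, d, b, c):
    """Quadratic expansion of the squared resonance integral, Eq. [253]."""
    return d * (1.0 + b * dq + 0.5 * c * dq * dq)

def taylor(dq, v, g, f):
    """Second-order Taylor series about the Shepard point, Eq. [243]."""
    return v + g * dq + 0.5 * f * dq * dq

# Hypothetical data at one Shepard point (arbitrary units)
target = (0.0, 0.3, -1.5)      # V, g, f from the electronic structure level
reactant_mm = (2.0, 1.0, 4.0)  # V1, g1, f1 from the reactant MM function
product_mm = (3.0, -2.0, 5.0)  # V2, g2, f2 from the product MM function
d, b, c = resonance_coefficients(*target, *reactant_mm, *product_mm)

dq = 1.0e-3
v11 = taylor(dq, *reactant_mm)
v22 = taylor(dq, *product_mm)
v_low = 0.5 * (v11 + v22 - math.sqrt((v11 - v22) ** 2 + 4.0 * v12_squared(dq, d, b, c)))
```

The residual between `v_low` and the target Taylor series is of third order in the displacement, which is the sense in which the constants reproduce Eq. [243].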

We can recap this procedure as follows: Electronic structure calculations are used to generate the Taylor series $V(\mathbf{q};k)$ of Eqs. [243] and [244] in the vicinity of point $\mathbf{q}^{(k)}$. This $V(\mathbf{q};k)$ and the Taylor series (Eqs. [245] and [246]) of the reactant and product MM potential energy surfaces are substituted into Eq. [242] to yield a Taylor series of $V_{12}$ in the vicinity of $\mathbf{q}^{(k)}$.

Next we will interpolate $V_{12}$, which is much smoother and easier to interpolate than the original V. As discussed further below, the interpolation of $V_{12}$ is carried out in valence internal coordinates28,68–72 to avoid the necessity of achieving a consistent molecular orientation, which would be required for interpolation in atomic Cartesians.

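Finally, we must specify the weighting function $w_k(\mathbf{q})$ to be used for interpolation via Eqs. [248] through [252]. Several conditions should be met by the weight $w_k$ associated with a particular geometry $\mathbf{q}^{(k)}$. These conditions involve the behavior of $w_k$ near $\mathbf{q}^{(k)}$ and near the other interpolation points $\mathbf{q}^{(k')}$ with $k' \ne k$. The conditions assure that $w_k$ is smooth enough (zero first and second derivatives near all interpolation points) that the left-hand side of Eq. [248] has the same Taylor series, through quadratic terms, at $\mathbf{q}^{(k)}$ as that of $V'_{12}(\mathbf{q};k)$. The conditions are

$$w_k(\mathbf{q}^{(k)}) = 1, \quad \text{all } k \qquad [258]$$

$$w_k(\mathbf{q}^{(k')}) \ll 1, \quad k' \ne k \qquad [259]$$

$$\left.\frac{\partial w_k}{\partial \mathbf{q}}\right|_{\mathbf{q}=\mathbf{q}^{(k')}} \cong 0, \quad \text{all } k' \qquad [260]$$

$$\left.\frac{\partial^2 w_k}{\partial \mathbf{q}^2}\right|_{\mathbf{q}=\mathbf{q}^{(k')}} \cong 0, \quad \text{all } k' \qquad [261]$$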


A variety of functional forms could be chosen for the weight function. A good choice is critical to the success and efficiency of the method. The choice made in Refs. 133 and 134 is

$$w_k(\mathbf{q}) = \frac{\left[d_k(\mathbf{q})\right]^{-4}}{\displaystyle\sum_{i=1}^{M+2} \frac{1}{\left[d_i(\mathbf{q})\right]^{4}}} \qquad [262]$$

where

$$d_k(\mathbf{q}) = \sqrt{\sum_{j=1}^{j_{\max}} \left(q_j - q_j^{(k)}\right)^2} \qquad [263]$$

Note that Eq. [262] is the smoothest possible function that satisfies Eqs. [258] to [261]. The recommended $j_{\max}$ for atom-transfer reactions is 3, where $q_1$, $q_2$, and $q_3$ are the forming bond distance, the breaking bond distance, and the nontransferring bond distance, respectively.
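The weight function of Eqs. [262] and [263] is simple to evaluate. The following sketch computes normalized inverse-fourth-power weights for a trial geometry; the function names, the special handling of a geometry that coincides with a reference point, and the reference geometries themselves are assumptions introduced for illustration.

```python
import math

def distance(q, q_ref):
    """Generalized distance of Eq. [263] over the chosen internal coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, q_ref)))

def shepard_weights(q, ref_points):
    """Inverse-fourth-power weights in the spirit of Eq. [262], normalized to sum to 1."""
    dists = [distance(q, q_ref) for q_ref in ref_points]
    # At a reference point the weight is 1 there and 0 elsewhere (avoids 1/0)
    for k, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if i == k else 0.0 for i in range(len(dists))]
    raw = [d ** -4 for d in dists]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical reference geometries: three bond distances each, in angstroms
ref_points = [(2.5, 1.1, 1.1), (1.3, 1.3, 1.1), (1.1, 2.5, 1.1)]
w = shepard_weights((1.35, 1.28, 1.1), ref_points)
```

A geometry close to the second reference point receives nearly all of the weight from that point, which is the behavior required by Eqs. [258] and [259].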

Although the PES is uniquely expressed in Cartesian coordinates, the Shepard interpolation is done using internal coordinates to avoid ambiguities relating to the orientation of the system. Therefore, the data are first transformed from Cartesians to internal coordinates, the Shepard interpolation is then completed, and the potential and derivatives are finally transformed back to Cartesians. For a detailed description of how this is accomplished, see Kim et al.133

A decision must now be made on where to locate the Shepard points. In addition to the three required Shepard points corresponding to the reactant well, transition state, and product well, other points can be added to improve the accuracy of the surface. A calculation with no additional points is referred to as MCMM-0. A systematic method for choosing the additional points has been presented by Albu, Corchado, and Truhlar.134 The first supplementary point is placed on the dynamical bottleneck side of the MEP, where the energy is equal to one quarter of the barrier height. This calculation with four Shepard points is referred to as MCMM-1. Additional Shepard points may be added systematically to yield a sequence of approximations MCMM-2, ..., MCMM-n, where MCMM-n uses n nonstationary points. The user may specify as many additional points as needed to achieve the best accuracy for the PES. For a full discussion of the accuracy of rate calculations, see Ref. 134. Once six nonstationary points have been used in MCMM-6, CVT/SCT rates are typically within about 15% of the corresponding direct dynamics calculations. MCMM thus provides efficient calculations of reaction rates using only a small amount of high-level data.

IVTST-M

Interpolated variational transition state theory by mapping (IVTST-M)154 is simpler than MCMM in that it does not involve molecular mechanics.


IVTST-M has two goals:

(1) to minimize the length of the path that must be calculated, and
(2) to minimize the number of Hessians that must be calculated.

The notation for an IVTST-M calculation is IVTST-M-H/G, where H and G indicate, respectively, the number of additional Hessians and gradients used.

Interpolation of $V_{\mathrm{MEP}}$ uses the G + 3 energies that are available. These energies include those of the three stationary points (the reactant, the transition state, and the product) and those of the G nonstationary points along the reaction path at which gradients were calculated. Because the reactants and products for a bimolecular reaction are at $s = -\infty$ and $s = +\infty$, respectively, and because interpolation over an infinite interval is less desirable than interpolation over a finite one, it is advisable to map the potential $V_{\mathrm{MEP}}(s)$ onto a new function, $V_{\mathrm{MEP}}(z)$, based on a new variable z:

$$z = \frac{2}{\pi} \arctan\left(\frac{s - s_0}{L}\right) \qquad [264]$$

where $s_0$ and L are parameters discussed in the next paragraph. This mapping changes the interval for the potential function from $(-\infty, +\infty)$ for $V_{\mathrm{MEP}}(s)$ to $(-1, +1)$ for $V_{\mathrm{MEP}}(z)$.
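The mapping of Eq. [264] and its inverse take only a few lines. In this sketch the function names and the parameter values are hypothetical; the inverse is included because interpolated quantities must eventually be reported as functions of s.

```python
import math

def s_to_z(s, s0, length):
    """Map the reaction coordinate s onto z in (-1, +1), Eq. [264]."""
    return (2.0 / math.pi) * math.atan((s - s0) / length)

def z_to_s(z, s0, length):
    """Inverse of Eq. [264], valid for -1 < z < +1."""
    return s0 + length * math.tan(0.5 * math.pi * z)

s0, length = 0.1, 0.8   # hypothetical parameters (same length unit as s)
z_mid = s_to_z(s0, s0, length)
z_far = s_to_z(1.0e6, s0, length)
s_back = z_to_s(s_to_z(-1.7, s0, length), s0, length)
```

The point $s = s_0$ maps to $z = 0$, and the half-width L controls how much of the finite z interval is devoted to the region near $s_0$.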

The interpolation is more efficient if one calculates $s_0$ such that the new function is centered where the important changes are occurring as the reaction takes place. A simple approach is to set $s_A^0$ and $s_B^0$ to the values of s where the potential on the reaction path is equal to half the barrier height (measured from reactants and products, respectively) and then to set $s_0$ equal to the mean of $s_A^0$ and $s_B^0$, but this may cause unphysical values of $s_0$ for very exothermic or endothermic reactions. Therefore, it is recommended that $s_0$ should be calculated using $s_A$ and $s_B$ defined as

$$s_A = -\min\left\{\left|s_A^0\right|,\; 2 s_B^0\right\} \qquad [265]$$

$$s_B = \min\left\{\left|s_A^0\right|,\; s_B^0\right\} \qquad [266]$$

where

$$s_A^0 = -\sqrt{\frac{V_{\mathrm{MEP}}(s = 0) - V_{\mathrm{MEP}}(s_R)}{\left|\omega^{\ddagger}\right|^2 \mu}} \qquad [267]$$

$$s_B^0 = \sqrt{\frac{V_{\mathrm{MEP}}(s = 0) - V_{\mathrm{MEP}}(s_P)}{\left|\omega^{\ddagger}\right|^2 \mu}} \qquad [268]$$

in which $\mu$ is the reduced mass used to scale the coordinates of the system in Eq. [36], $\omega^{\ddagger}$ is the imaginary frequency of the transition state, and $s_R$ and $s_P$ correspond to the values of s at the reactants and products, respectively. Once $s_A$ and $s_B$ have been determined, $s_0$ is calculated using the arithmetic mean:

$$s_0 = \frac{s_A + s_B}{2} \qquad [269]$$

The range parameter L is estimated from the width of the reaction path:

$$L = \frac{\left|s_A\right| + s_B}{2} \qquad [270]$$

$V_{\mathrm{MEP}}(s)$ can now be successfully mapped to $V_{\mathrm{MEP}}(z)$. Ten extra points are then placed between the last gradient on the reactant side and $z = -1$, and 10 extra points are placed between the last gradient on the product side and $z = +1$. These points, whose energies are calculated in the following steps, are used along with the gradient calculations to create a spline-under-tension potential energy function along the reaction path.

The energy of these 20 additional points is estimated using the Eckart potential in Eq. [271] whose terms are defined in Eqs. [272]–[276].

$$V_{\mathrm{MEP}} = \frac{A Y}{1 + Y} + \frac{B Y}{(1 + Y)^2} + C \qquad [271]$$

$$Y = \exp\left[\frac{s - s_0^{\mathrm{Eck}}}{L^{\mathrm{Eck}}(s)}\right] \qquad [272]$$

$$A = V_{\mathrm{MEP}}(s_P) - V_{\mathrm{MEP}}(s_R) \qquad [273]$$

$$C = V_{\mathrm{MEP}}(s_R) \qquad [274]$$

$$B = \left[2 V_{\mathrm{MEP}}(s = 0) - A - 2C\right] \pm 2\left\{\left[V_{\mathrm{MEP}}(s = 0) - C\right]\left[V_{\mathrm{MEP}}(s = 0) - A - C\right]\right\}^{1/2} \qquad [275]$$

$$s_0^{\mathrm{Eck}} = -L^{\mathrm{Eck}}(s) \ln\left(\frac{A + B}{B - A}\right) \qquad [276]$$

where $L^{\mathrm{Eck}}$ is a new function. $L^{\mathrm{Eck}}$ is calculated for each nonstationary point s such that the Eckart potential goes through $V_{\mathrm{MEP}}(s)$ at that s as well as at reactants, products, and saddle point. At the saddle point only, $L^{\mathrm{Eck}}(s)$ is calculated using the imaginary frequency:

$$L^{\mathrm{Eck}}(s = 0) = \sqrt{\frac{2\left[V_{\mathrm{MEP}}(s = 0) - A\right] V_{\mathrm{MEP}}(s = 0)}{\mu \left|\omega^{\ddagger}\right|^2 B}} \qquad [277]$$
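To illustrate how the Eckart parameters fit together, the sketch below builds A, B, and C from three stationary-point energies and evaluates the potential of Eq. [271]. It assumes the + root for B, which is the root that yields a barrier with its maximum at s = 0, and it treats the range parameter as a constant rather than as the function $L^{\mathrm{Eck}}(s)$; the function names and all numbers are hypothetical.

```python
import math

def eckart_parameters(v_reactant, v_saddle, v_product):
    """A, B, C of Eqs. [273]-[275], taking the + root, which yields a barrier."""
    a = v_product - v_reactant
    c = v_reactant
    dv = v_saddle - c
    b = 2.0 * dv - a + 2.0 * math.sqrt(dv * (dv - a))
    return a, b, c

def eckart(s, a, b, c, length):
    """Eckart potential of Eq. [271] with s0 from Eq. [276]."""
    s0 = -length * math.log((a + b) / (b - a))
    y = math.exp((s - s0) / length)
    return a * y / (1.0 + y) + b * y / (1.0 + y) ** 2 + c

# Hypothetical energies (kcal/mol) and range parameter (bohr)
a, b, c = eckart_parameters(v_reactant=0.0, v_saddle=10.0, v_product=-5.0)
length = 0.5
```

With these definitions the potential equals the saddle-point energy at s = 0 and tends to the reactant and product energies at large negative and positive s, respectively.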

The G + 1 values of $L^{\mathrm{Eck}}(s)$ are then mapped onto the $[-1, +1]$ interval using Eq. [264]. The values of $L^{\mathrm{Eck}}(z = -1)$ and $L^{\mathrm{Eck}}(z = +1)$ are approximated using a quadratic polynomial that has been fitted to the last two G points on the reactant and product sides of the saddle point, respectively. The continuous function for $L^{\mathrm{Eck}}(z)$ (from $z = -1$ to $z = +1$) can now be calculated using a spline under tension based on the values of $L^{\mathrm{Eck}}(z)$ at the saddle point and at the G nonstationary points, together with the approximated values at $z = -1$ and $z = +1$. At this time, the energies for the 10 extra points on each side of the saddle point are calculated using an Eckart potential with the interpolated $L^{\mathrm{Eck}}(z)$.

Finally, using the G + 23 energies between $z = -1$ and $z = +1$, $V_{\mathrm{MEP}}(z)$ is calculated using a spline under tension. This provides an energy along the reaction path from $s = -\infty$ to $s = +\infty$. For more details about the minor adjustments to the theory needed to account for a unimolecular or association reaction rather than a bimolecular reaction with bimolecular products, see Ref. 154.

Once the MEP has been calculated, the moments of inertia I(s) and the frequencies required to calculate the partition function must be determined. The determinant of the moment of inertia tensor is calculated at the saddle point and the G nonstationary points. (The values of this determinant for bimolecular reactants and products are assumed to be infinity.) The G + 3 values of I(s) are mapped using Eq. [264] to yield I(z). Because the moment of inertia changes as the square of the geometry and to keep the interpolant in a convenient numerical range, one actually interpolates $[I(z^{\ddagger})/I(z)]^{1/2}$ rather than I(z) at the G + 3 points. Finally, a spline fit is created using this function to give I(s) for any s.

Interpolation of the frequencies is likewise done using a spline fit; for each of the F − 1 bound modes, the frequency $\omega(s)$ is calculated at the location of each Hessian. These frequencies are then mapped using Eq. [264] and put in canonical order, such that any avoided crossings or symmetry constraints are ignored. Imaginary frequencies are treated as negative numbers. The canonical order is defined so that the real frequencies come first in order of decreasing magnitude, followed by the imaginary frequencies in order of increasing magnitude, followed by the six (five for linear systems) frequencies of smallest magnitude regardless of whether they are real or imaginary.
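The canonical ordering rule can be stated compactly in code. The sketch below is one possible reading of the rule, with imaginary frequencies represented as negative numbers; the function name and the sample frequencies are hypothetical, and the internal order of the near-zero group (here, increasing magnitude) is a choice made for this example because the text does not specify it.

```python
def canonical_order(freqs, n_zero=6):
    """Order signed frequencies (imaginary ones given as negative numbers).

    The n_zero values of smallest magnitude go last; the rest are listed as
    real frequencies (decreasing) and then imaginary ones (increasing magnitude).
    """
    by_magnitude = sorted(freqs, key=abs)
    near_zero, rest = by_magnitude[:n_zero], by_magnitude[n_zero:]
    real = sorted((w for w in rest if w >= 0.0), reverse=True)
    imaginary = sorted((w for w in rest if w < 0.0), key=abs)
    return real + imaginary + near_zero

# Hypothetical frequencies in cm^-1 for a linear triatomic (n_zero=5); -900 is imaginary
freqs = [3.0, -900.0, 1500.0, -2.0, 0.5, 600.0, 0.0, 600.0, -1.0]
ordered = canonical_order(freqs, n_zero=5)
```

Matching modes by position in this list, rather than by symmetry, is what allows a spline to be fitted mode by mode along the path.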

The cost of using the IVTST-M algorithm is negligible compared with the cost of calculating high-level electronic structure gradients and Hessians. The method was developed to allow for a shorter sequence of reaction path data, but even if long reaction paths have already been calculated, it is advantageous to use IVTST-M to map out the remainder of the path rather than truncating it.

Dual-Level Dynamics

Dual-level dynamics142,143,155,156 refers to dynamics calculations that use two levels of electronic structure theory or two PEFs of different quality. In the VTST/MT context, such methods use a low-level method to calculate the MEP and gather some information along it, followed by using a smaller number of high-level calculations to improve accuracy. Computing all of the necessary data with a high-level method may be prohibitively CPU-expensive, yet the low-level method may not provide the required accuracy; dual-level methods attempt to use both levels to obtain the highest possible accuracy at the lowest possible cost.

Interpolated Single-Point Energies

Variational transition state theory with interpolated single-point energies (VTST–ISPE) is a dual-level method156 that uses high-level, single-point energies on a low-level MEP to correct the $V_{\mathrm{MEP}}$. POLYRATE has implemented this by using a mapped coordinate as in the IVTST-M algorithm. The low-level MEP is first mapped to the interval from $z = -1$ to $z = +1$ using Eq. [264] as before. Wherever a high-level, single-point energy has been evaluated, $\Delta V$ is calculated as $\Delta V = V_{\mathrm{HL}} - V_{\mathrm{LL}}$, where the subscripts denote high level (HL) and low level (LL). Finally, the dual-level reaction path is evaluated using

$$V_{\mathrm{DL}} = V_{\mathrm{LL}} + V_{\mathrm{spline}}(\Delta V, z) \qquad [278]$$

where $V_{\mathrm{spline}}(\Delta V, z)$ is a spline-under-tension fit.
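The structure of Eq. [278] can be sketched with any interpolant for the correction. The example below deliberately swaps the spline under tension for a piecewise-linear interpolation, which is only a stand-in chosen to keep the example self-contained; the low-level profile, the high-level energies, and all names are hypothetical.

```python
import bisect

def interpolate(z_pts, values, z):
    """Piecewise-linear stand-in for the spline-under-tension fit."""
    if z <= z_pts[0]:
        return values[0]
    if z >= z_pts[-1]:
        return values[-1]
    i = bisect.bisect_right(z_pts, z)
    t = (z - z_pts[i - 1]) / (z_pts[i] - z_pts[i - 1])
    return values[i - 1] + t * (values[i] - values[i - 1])

def dual_level_energy(z, v_ll, z_pts, delta_v):
    """V_DL = V_LL + interpolated (V_HL - V_LL), following Eq. [278]."""
    return v_ll(z) + interpolate(z_pts, delta_v, z)

# Hypothetical low-level profile in the mapped coordinate z and sparse HL energies
v_ll = lambda z: 12.0 * (1.0 - z * z)           # kcal/mol
z_pts = [-1.0, -0.4, 0.0, 0.4, 1.0]             # where HL single points exist
v_hl = [0.0, 8.5, 10.0, 8.0, -1.0]              # kcal/mol
delta_v = [hl - v_ll(z) for z, hl in zip(z_pts, v_hl)]
```

Interpolating the difference rather than the high-level energy itself is the design choice that matters: the correction is usually much smoother than either potential, so few high-level points are needed.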

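Interpolated Optimized Corrections

The interpolated optimized corrections (IOC) method143,155 uses HL energies, gradients, and Hessians at the high-level stationary points to improve the quality of a $V_{\mathrm{MEP}}$ and of frequency and moment of inertia profiles originally calculated at LL.

We have proposed two dual-level schemes for $V_{\mathrm{MEP}}$.143,155 The dual-Eckart (DE) scheme is given by143

$$V^{\mathrm{DE}}_{\mathrm{MEP}} = V^{\mathrm{LL}}_{\mathrm{MEP}}(s) + \left[V^{\mathrm{HL}}_{\mathrm{Eck}}(s) - V^{\mathrm{LL}}_{\mathrm{Eck}}(s)\right] \qquad [279]$$

where

$$V_{\mathrm{Eck}} = \frac{A Y}{1 + Y} + \frac{B Y}{(1 + Y)^2} + C \qquad [280]$$

$$Y = \exp\left(\frac{s - S_0}{L}\right) \qquad [281]$$

$$A = V_{\mathrm{MEP}}(s = +\infty) - C \qquad [282]$$

$$C = V_{\mathrm{MEP}}(s = -\infty) \qquad [283]$$

$$B = \left(2 V^{\ddagger} - A - 2C\right) + 2\left[\left(V^{\ddagger} - C\right)\left(V^{\ddagger} - A - C\right)\right]^{1/2} \qquad [284]$$

$$S_0 = -L \ln\left(\frac{A + B}{B - A}\right) \qquad [285]$$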


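where $V^{\ddagger}$ is $V_{\mathrm{MEP}}(s = 0)$, the range parameter for $V^{\mathrm{HL}}_{\mathrm{Eck}}(s)$ is given by

$$L^{\mathrm{HL}} = \left[-\frac{2 V^{\ddagger}\left(V^{\ddagger} - A\right)}{\mu \left(\omega^{\ddagger}\right)^2 B}\right]^{1/2} \qquad [286]$$

where all quantities on the right-hand side are calculated at the high level, and $L^{\mathrm{LL}}$ is determined by fitting the low-level $V_{\mathrm{MEP}}$ to an Eckart function at the three stationary points and at one additional point $s = s_L$, where

$$V^{\ddagger}_{\mathrm{LL}} - V^{\mathrm{LL}}_{\mathrm{MEP}}(s = s_L) = \frac{1}{2}\left[V^{\ddagger}_{\mathrm{LL}} - V^{\mathrm{LL}}_{\mathrm{MEP}}\left(s = \mathrm{sign}(s_L)\,\infty\right)\right] \qquad [287]$$

The sign of $s_L$ is positive if $A^{\mathrm{LL}}$ is positive and negative if $A^{\mathrm{LL}}$ is negative.

The single-Eckart (SE) scheme for correcting $V_{\mathrm{MEP}}$ is155

$$V^{\mathrm{SE}}_{\mathrm{MEP}} = V^{\mathrm{LL}}_{\mathrm{MEP}}(s) + V_{\mathrm{Eck}}(s) \qquad [288]$$

with Eq. [281] for Y, where the parameters in the single-Eckart potential are given by

$$A = \Delta V(s = +\infty) - C \qquad [289]$$

$$C = \Delta V(s = -\infty) \qquad [290]$$

$$B = \left(2 \Delta V^{\ddagger} - A - 2C\right) \pm 2\left[\left(\Delta V^{\ddagger} - C\right)\left(\Delta V^{\ddagger} - A - C\right)\right]^{1/2} \qquad [291]$$

along with Eq. [287] to determine L. In these equations, $\Delta V$ denotes the difference of $V_{\mathrm{HL}}$ from $V_{\mathrm{LL}}$. Furthermore, we use the upper sign in Eq. [291] when $\Delta V^{\ddagger} > \Delta V(s = +\infty)$ and the lower sign otherwise. We have found that sometimes the DE scheme is preferred,143 but on average, the SE scheme is better.156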

In addition to correcting $V_{\mathrm{MEP}}$, the frequencies and the determinant of the moment of inertia tensor are also corrected.

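The formula for the interpolation of frequencies in the original method155 allowed for the possibility of negative frequencies, which was problematic. Therefore, only the updated interpolation method143 will be discussed. The 3N − 7 real frequencies calculated using the lower level method are denoted as $\omega^{\mathrm{LL}}_m(s)$. The dual-level frequencies calculated using the high-level corrections are given by

$$\omega^{\mathrm{DL}}_m = \omega^{\mathrm{LL}}_m(s) \exp\left[f^{\mathrm{ICL}}(s)\right] \qquad [292]$$

where

$$f^{\mathrm{ICL}} = \frac{A_m Y}{1 + Y} + \frac{B_m Y}{(1 + Y)^2} + C_m \qquad [293]$$

$$Y = \exp\left(\frac{s - S_{0,m}}{L}\right) \qquad [294]$$

$$A_m = \ln\frac{\omega^{\mathrm{DL}}_m(s = +\infty)}{\omega^{\mathrm{LL}}_m(s = +\infty)} - C_m \qquad [295]$$

$$C_m = \ln\frac{\omega^{\mathrm{DL}}_m(s = -\infty)}{\omega^{\mathrm{LL}}_m(s = -\infty)} \qquad [296]$$

$$B_m = \left(2 \ln\frac{\omega^{\mathrm{DL}}_m(s = 0)}{\omega^{\mathrm{LL}}_m(s = 0)} - A_m - 2C_m\right) \pm 2\left[\left(\ln\frac{\omega^{\mathrm{DL}}_m(s = 0)}{\omega^{\mathrm{LL}}_m(s = 0)} - C_m\right)\left(\ln\frac{\omega^{\mathrm{DL}}_m(s = 0)}{\omega^{\mathrm{LL}}_m(s = 0)} - A_m - C_m\right)\right]^{1/2} \qquad [297]$$

$$S_{0,m} = -L \ln\left(\frac{A_m + B_m}{B_m - A_m}\right) \qquad [298]$$

in which L is defined using Eq. [286]. The frequencies are matched in order of decreasing magnitude, disregarding symmetry and avoided crossings, and setting the leftover modes to zero at $s = \pm\infty$.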

The determinant I(s) of the moment of inertia tensor is corrected by

$$I^{\mathrm{HL}}(s) = \alpha\, I^{\mathrm{LL}}(s) \qquad [299]$$

where

$$\alpha = \frac{I^{\mathrm{HL}}(s = 0)}{I^{\mathrm{LL}}(s = 0)} \qquad [300]$$

This simple formula was chosen so that no difficulties would arise as I(s) approaches infinity at reactant and product states.

IVTST–IOC can also be applied to reactions having reactant-side and/or product-side wells. Furthermore, the theory explained here is readily applied to VTST, ZCT, and SCT, but applying the theory to LCT requires additional steps, all of which are explained in Ref. 155.

Interpolated Optimized Energies

The interpolated optimized energies (IOE) scheme156 is like the IOC scheme except that the frequencies are not corrected at the higher level. Although the IOE method only uses high-level data at the stationary points, it is often more accurate than the ISPE method because it involves geometry optimization at the higher level.155

REACTIONS IN LIQUIDS

Variational transition state theory can also be applied to reactions in liquids when those reactions are not diffusion controlled. For instance, for a general reaction of the type:

$$\mathrm{A} + \mathrm{B} \underset{k_{-D}}{\overset{k_D}{\rightleftharpoons}} \mathrm{AB} \xrightarrow{\;k_r\;} \text{Products} \qquad [301]$$

where AB is a complex formed by the two molecules before reaction, application of the steady-state approximation for the concentration of the complex gives the following rate constant:157

$$k = \frac{k_D k_r}{k_{-D} + k_r} \qquad [302]$$
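The two limiting regimes of Eq. [302] can be checked with a few lines of arithmetic. In the sketch below the function name and the values chosen for $k_{-D}$ and $k_r$ are hypothetical; the value of $k_D$ is the typical diffusion-limited magnitude quoted in this section.

```python
def overall_rate_constant(k_d, k_minus_d, k_r):
    """Steady-state rate constant of Eq. [302]."""
    return k_d * k_r / (k_minus_d + k_r)

k_d = 4.0e9            # M^-1 s^-1, typical diffusion value quoted in this section
k_minus_d = 1.0e10     # s^-1, hypothetical
k_fast = overall_rate_constant(k_d, k_minus_d, k_r=1.0e14)   # k_r >> k_-D
k_slow = overall_rate_constant(k_d, k_minus_d, k_r=1.0e4)    # k_r << k_-D
```

In the first case the result approaches $k_D$; in the second it approaches $(k_D/k_{-D})\,k_r$, so the chemical step controls the rate.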

If $k_r \gg k_{-D}$, then $k \cong k_D$ and the reaction is controlled by diffusion. A typical value for $k_D$ is $4 \times 10^{9}\ \mathrm{M^{-1}\,s^{-1}}$. Conversely, when $k_r \ll k_{-D}$, the process is controlled by the chemical step, and we can use CVT to predict the thermal rate constants. Specifically, the CVT bimolecular rate constant can be written as

$$k^{\mathrm{CVT}}(T) = \frac{1}{\beta h C^0} \exp\left\{-\left[G^0_T(\mathrm{CVT}) - G^0_T(\mathrm{R})\right]/RT\right\} \qquad [303]$$

where $C^0$ is the concentration corresponding to the standard state, $G^0_T(\mathrm{R})$ is the solution-phase standard-state free energy of reactants at temperature T, and $G^0_T(\mathrm{CVT})$ is the solution-phase standard-state free energy of activation106,158–160 of the canonical variational transition state at temperature T. As for the gas-phase case, the variational free energy of activation is given by

$$G^0_T(\mathrm{CVT}) = \max_{s}\, G^0_T(\mathrm{GT}, s) \qquad [304]$$

where $G^0_T(\mathrm{GT}, s)$ is the standard-state free energy of activation for a generalized transition state at a location s along the reaction path. The rate constant expression is similar for a unimolecular reaction, the difference being that $C^0$ is missing in Eq. [303].

The quantity $G^0_T(\mathrm{R})$ is a standard-state free energy in liquid solution, denoted $G^0_T(l)$, and it is obtained by treating the solute as a system interacting with a thermal environment. The system is the solute (or the solute plus a few closely coupled solvent molecules), and the environment is the (rest of the) solvent. In the rest of this section, we will simply call the system the solute. The liquid-phase free energy of the solute is the solute’s gas-phase free energy plus the free energy of solvation, which is defined by

$$\Delta G^0_S = G^0_T(l) - G^0_T(g) \qquad [305]$$

In the Born–Oppenheimer approximation,

$$G^0_T(g) = E_E(g) - RT \ln d_E + G_{\mathrm{VRT}}(T) \qquad [306]$$

where $E_E$ is the ground-state electronic energy (including nuclear repulsion as usual) of the solute, $d_E$ is the degeneracy of the ground electronic state, and $G_{\mathrm{VRT}}(T)$ is the vibrational–rotational–translational free energy in the standard state. Many methods are available for approximating the standard-state free energy of solvation, but here we focus on those that use the SMx universal solvent models.161–171 In these models

$$\Delta G^0_S = \Delta G_{\mathrm{ENP}}(T) + G_{\mathrm{CDS}}(T) \qquad [307]$$

where $\Delta G_{\mathrm{ENP}}$ is the bulk electrostatic component of the solvation free energy obtained by treating the environment as a homogeneous dielectric medium with properties of the neat solvent. When the solute is inserted in the solvent, the latter is polarized, and as a result, it exerts a field, called the reaction field, on the solute. $\Delta G_{\mathrm{ENP}}$ is composed of (1) a polarization energy $G_P$, which represents the net free energy of inserting the solute in the solvent (it accounts for favorable solute-solvent interactions and the energy cost of the solvent rearrangement), and (2) the distortion energy $\Delta E_{\mathrm{EN}}$, which represents the cost of distorting the solute geometry and electronic charge distribution to be self-consistent with the solvent reaction field; i.e.,

$$\Delta G_{\mathrm{ENP}}(T) = G_P(T) + \Delta E_{\mathrm{EN}}(T) \qquad [308]$$

Note that the solute electronic energy in the liquid phase is

$$E_{\mathrm{EN}}(l; T) = E_{\mathrm{EN}}(g) + \Delta E_{\mathrm{EN}}(T) \qquad [309]$$

The term $G_{CDS}$ accounts for the first-solvation-shell effects and is given by163–166

$$G_{CDS} = \sum_k \left[ \sigma_k^A(T)\, A_k\!\left(\{r_k, r_s^{CD}\}\right) + \sigma^{CS} A_k\!\left(\{r_k, r_s^{CS}\}\right) \right] \qquad [310]$$

where $k$ labels atoms, $A_k$ is the exposed area of atom $k$, $\sigma_k^A$ is a partial atomic surface tension for atom $k$, and $A_k(\{r_k, r_s^{CX}\})$, with X = S or D, is the solvent-accessible surface area of atom $k$ and is a function of a given set of effective solute radii $\{r_k\}$ and of effective solvent radii, $r_s^{CS}$ and $r_s^{CD}$. Several solvation models, such as SM5.42, SM5.43, and SM6, have been created, and they have different values for these parameters. SM6 is recommended for most systems.
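As a minimal sketch of how Eqs. [307], [308], and [310] fit together, the following Python fragment sums the per-atom surface-tension terms and then adds the electrostatic pieces. Every number here (areas, surface tensions, energy components) is a hypothetical placeholder chosen for illustration, not an SMx parameter, and the function names are ours; in an actual SMx calculation the two sets of areas are evaluated with different effective solvent radii and the surface tensions depend on the model.

```python
def g_cds(sigma_atomic, areas_cd, sigma_cs, areas_cs):
    """Eq. [310]: sum over atoms k of sigma_k^A * A_k(CD) + sigma^CS * A_k(CS)."""
    if not (len(sigma_atomic) == len(areas_cd) == len(areas_cs)):
        raise ValueError("per-atom lists must have equal length")
    return sum(s * a_cd + sigma_cs * a_cs
               for s, a_cd, a_cs in zip(sigma_atomic, areas_cd, areas_cs))


def delta_g_solvation(g_p, delta_e_en, g_cds_value):
    """Eqs. [307]-[308]: dG_S = (G_P + dE_EN) + G_CDS."""
    return g_p + delta_e_en + g_cds_value


# Hypothetical three-atom solute.
# Surface tensions in kcal mol^-1 A^-2, areas in A^2, energies in kcal/mol.
sigma_atomic = [0.010, -0.020, 0.005]
areas_cd = [30.0, 12.0, 8.0]   # areas computed with the CD solvent radius
areas_cs = [34.0, 15.0, 10.0]  # areas computed with the CS solvent radius

cds = g_cds(sigma_atomic, areas_cd, sigma_cs=0.002, areas_cs=areas_cs)
total = delta_g_solvation(g_p=-6.0, delta_e_en=0.8, g_cds_value=cds)
```

The point of the decomposition is that the bulk-electrostatic and first-solvation-shell contributions are computed independently and simply added.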

Equations [305]–[310] are strictly valid only for thermodynamic species, which are ordinarily associated with stationary points on the potential energy surface $V(R)$, where $R$ denotes the full set of solute coordinates. However, we also use the SMx solvation models to calculate potentials of mean force,172 which are called $W(R,T)$. The negative gradient of $W(R,T)$ gives the force on the solute molecule averaged over a canonical ensemble of solvent molecules, and $W$ is a generalization of the one-dimensional radial potential of mean force that appears in Debye–Hückel theory. Thus, we write

$$W = V(R) + \Delta G_S^0(R,T) \qquad [311]$$

where $\Delta G_S^0(R,T)$ is like $\Delta G_S^0(T)$ except that the nuclear coordinates of the solute are fixed at $R$; thus, $\Delta E_{EN}(T)$ does not involve a change in the nuclei, and it may be written as $\Delta E_E(R,T)$.

The simplest way to implement Eq. [303] for the liquid-phase reaction rate is called separable equilibrium solvation or the SES approximation.159 In this approximation, one optimizes the stationary points and calculates the reaction path and vibrational frequencies in the gas phase. Then, at every stationary-point and reaction-path geometry, one replaces the potential energy $V$ by the potential of mean force $W$. If one also replaced $V$ by $W$ in calculating partition functions, this would provide an exact expression for the flux through the generalized-transition-state dividing surface in a classical mechanical world, although one no longer can obtain the exact rate constant by varying the dividing surface because the dividing surface depends only on the subset $R$ of the total set of solute and solvent coordinates. However, in the SES approximation, the partition functions are still calculated using $V$. Note that the location of the variational transition state along the reaction path may be different in the gas phase and in the SES approximation to the liquid-phase rate, even though the reaction path is unaltered.
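The SES bookkeeping can be illustrated with a short Python sketch: the geometries (and hence the grid in $s$) stay those of the gas-phase path, and only the energy at each point is shifted by the fixed-geometry solvation free energy of Eq. [311]. The profile values below are invented for illustration, and the helper names are ours. The sketch locates only the maximum of the potential along the path, not the variational transition state, which is determined by the free energy of activation profile.

```python
def ses_profile(v_mep, dg_solv):
    """W(s) = V_MEP(s) + dG_S(R(s), T) at each point of the gas-phase path."""
    if len(v_mep) != len(dg_solv):
        raise ValueError("profiles must be tabulated on the same grid")
    return [v + g for v, g in zip(v_mep, dg_solv)]


def barrier_location(s_grid, profile):
    """Grid point (s, value) at which the tabulated profile is largest."""
    i_max = max(range(len(profile)), key=profile.__getitem__)
    return s_grid[i_max], profile[i_max]


# Hypothetical data in kcal/mol; product-like geometries (s > 0) are
# assumed to be solvated much more strongly than reactant-like ones.
s_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
v_mep = [2.0, 8.0, 12.0, 9.0, 5.0]
dg_solv = [-1.0, -3.0, -9.0, -14.0, -18.0]

w = ses_profile(v_mep, dg_solv)
gas_max = barrier_location(s_grid, v_mep)
ses_max = barrier_location(s_grid, w)
```

With these made-up numbers the maximum moves from $s = 0$ toward reactants, which is the qualitative behavior expected when products are preferentially stabilized by the solvent.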

The SES approximation also replaces $V$ by $W$ for the tunneling calculations, which is called the zero-order canonical-mean-shape approximation173 (CMS-0). Note that the tunneling turning points and hence the tunneling paths may be different in the gas phase and in solution in the SES approximation, even though the reaction path is unaltered.

Algorithmically, because one only corrects the potential energy surface along the reaction path, the IVTST-M algorithm can be (and is) used for SES calculations where $W$ plays the role of the high-level surface and $V$ plays the role of the low-level surface.


Because the transition state geometry optimized in solution and the solution-phase reaction path may be very different from the gas-phase saddle point and the gas-phase reaction path, it is better to follow the reaction path given by the steepest-descents path computed from the potential of mean force. This approach is called the equilibrium solvation path (ESP) approximation. In the ESP method, one also substitutes $W$ for $V$ in computing the partition functions. In the ESP approximation, the solvent coordinates are not involved in the definition of the generalized-transition-state dividing surface, and hence, they are not involved in the definition of the reaction coordinate, which is normal to that surface. One says physically that the solvent does not participate in the reaction coordinate. That is the hallmark of equilibrium solvation.

A third approach is to incorporate nonequilibrium solvent (NES) effects. In POLYRATE, this is accomplished by replacing the many degrees of freedom of the solvent with a single collective solvent coordinate.174 Further discussion of equilibrium and nonequilibrium solvation effects on liquid-phase reactions is provided elsewhere.33,162,167,169

ENSEMBLE-AVERAGED VARIATIONAL TRANSITION STATE THEORY

The concept of reaction coordinate plays an important role in VTST. In fact, there is more than one reaction coordinate. Globally the reaction coordinate is defined as the distance $s$ along the reaction path, and this coordinate plays a critical role in tunneling calculations. Locally the reaction coordinate is the degree of freedom (sometimes called $z$, but often also called $s$) that is missing in the generalized transition state.

The treatment of VTST given so far is well suited for bimolecular reactions with tight transition states and simple barrier potentials. In such cases, it has been shown that the variational transition state can be found by optimization of a one-parameter ($s$) or few-parameter ($s$ and orientation of the dividing surface) sequence of dividing surfaces orthogonal to the reaction path, where the reaction path is defined as the minimum energy path in isoinertial coordinates. (See also Refs. 76, 77, and 81–89 for extensions to gas-phase systems with loose transition states, where more general reaction coordinates are considered.) In this section, we discuss the extension of VTST for condensed-phase reactions to allow the generalized-transition-state dividing surface to depend on more than just the solute coordinates; for example, it can depend on the solvent, or for an enzyme-catalyzed reaction, it can depend on protein coordinates. To include these different kinds of cases in a single formalism, we generalize the solute/solvent or system/environment separation, and we speak of a primary subsystem (or primary zone) instead of a solute or system and a secondary subsystem (or secondary zone) instead of a solvent or environment. As the reaction coordinate is the degree of freedom that is normal to the generalized transition state, allowing the generalized-transition-state definition to depend on secondary-subsystem coordinates is equivalent to allowing the definition of the reaction coordinate to depend on secondary-subsystem coordinates, that is, to allowing the secondary subsystem to participate in the reaction coordinate. Thus, this extension of VTST allows one, for example, to include protein motions in the reaction coordinate for enzyme-catalyzed reactions. This is accomplished by ensemble averaging,175–180 and the extension is called ensemble-averaged variational transition state theory (EA-VTST); although it is more general than just for enzyme-catalyzed reactions, EA-VTST will be explained here mainly in the enzyme context.

For simple reactions, all, or almost all, of the reaction flux (at least in the absence of large-curvature tunneling) passes through the TS in a quasi-harmonic valley centered on a single reaction path passing through a single saddle point. EA-VTST is designed for applications to complex reactions in the condensed phase where an appropriate reaction coordinate may be very complicated, and where reaction proceeds through a large number of reaction paths, each passing through a different saddle point. These saddle points might differ trivially (for example, by a torsion around a faraway hydrogen bond) or they might differ more substantially. But the essence of a liquid-phase reaction is that the saddle points are so numerous that they must be treated by statistical mechanical theories of liquids. This means, algorithmically, that we must sample rather than examine all contributing configurations. As for the single-reaction-coordinate version of VTST described in the previous sections, EA-VTST may be combined with multidimensional tunneling or optimized multidimensional tunneling, using the canonical mean-shape approximation, but now in an ensemble-averaged extension.

When applying EA-VTST to enzyme reactions, another kind of system/environment separation is made. Here the reactive system is considered to be the substrate and perhaps part of the enzyme or coenzyme (and perhaps including one or two closely coupled water molecules), and the environment is the rest of the substrate–coenzyme–enzyme complex plus the (rest of the) surrounding water. In what follows we will sometimes call the reactive system the “primary subsystem” and the environment the “secondary subsystem.” For the treatment of reactions in liquids that was presented earlier, the solvent was replaced by a homogeneous dielectric medium, which greatly simplifies the calculation. For enzyme-catalyzed reactions, we treat the environment explicitly at the atomic level of detail.

For enzyme-catalyzed reactions, we consider the unimolecular rate constant for the chemical step, which is the reaction of the Michaelis complex. The EA-VTST/OMT method involves a two-stage or three-stage procedure, where the third stage is optional. In stage 1, a user-defined, physically meaningful reaction coordinate is used to calculate a one-dimensional potential of mean force. This provides a classical mechanical free energy of activation along that coordinate that is used to identify a transition state ensemble. In stage 2, the transition state ensemble is used to sample a set of transition pathways (reaction paths) to determine the transmission coefficient and the quantum mechanical tunneling contributions. The reaction coordinate for stage 1 is called a “distinguished reaction coordinate” (DRC), which is the generally accepted name for a coordinate that has been “picked out” or assigned to serve as a reaction progress variable.64,181–183

In the first step of stage 1, all atoms (5000–25000 atoms for a typical application to an enzyme-catalyzed reaction) are treated on the same footing. In this step, one calculates a one-dimensional potential of mean force (PMF) as a function of the distinguished reaction coordinate $z$ by a classical molecular dynamics simulation. Any method for calculating classical mechanical PMFs could be used; for example, one can use the CHARMM program184 to carry out this step by employing molecular dynamics simulation with umbrella sampling.185–187 As discussed below, this provides an approximation to the free energy of activation profile for generalized transition states (i.e., transition state dividing surfaces) orthogonal to this reaction coordinate.32 The umbrella sampling method involves several “windows,” which are sampled separately, and then the results from all the windows are merged. During the umbrella sampling calculations, configurations are saved at regular intervals; these saved configurations are sorted into bins based on their value of $z$ and are later used at selected values of $z$ in the second step of stage 1 and the first step of stage 2 described below. (Windows and bins are both spaced out along the reaction coordinate. Windows overlap, but bins do not. Bins are spaced more closely than windows.)
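The sorting of saved configurations into non-overlapping bins can be sketched in a few lines of Python. This is only an illustration of the bookkeeping under simple assumptions (equal-width bins, configurations identified by a label); the function names, the labels, and the bin width are hypothetical and are not taken from CHARMM or any other program.

```python
import math


def bin_index(z, z_min, bin_width, n_bins):
    """Index of the non-overlapping bin containing z, or None if z is out of range."""
    i = math.floor((z - z_min) / bin_width)
    return i if 0 <= i < n_bins else None


def sort_into_bins(saved, z_min, bin_width, n_bins):
    """saved: iterable of (z, config_id) pairs pooled from all umbrella windows."""
    bins = [[] for _ in range(n_bins)]
    for z, config_id in saved:
        i = bin_index(z, z_min, bin_width, n_bins)
        if i is not None:
            bins[i].append(config_id)
    return bins


# Hypothetical saved configurations: (z value, label "window_frame").
# Overlapping windows w1 and w2 both contribute to the same bin.
saved = [(-0.95, "w0_f10"), (-0.42, "w1_f3"), (-0.38, "w2_f7"),
         (0.07, "w3_f1"), (1.30, "w5_f2")]
bins = sort_into_bins(saved, z_min=-1.0, bin_width=0.5, n_bins=4)
```

Because windows overlap but bins do not, configurations from neighboring windows can land in the same bin, as the second bin shows here.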

Stage 1 is the calculation of the PMF along the distinguished reaction coordinate, and various types of reaction coordinates can be used. For example, proton and hydride transfer reactions could be evaluated with a geometry-based distinguished reaction coordinate described by the difference between the breaking and forming bond distances as

$$z = r_{HD} - r_{HA} \qquad [312]$$

where $r_{HD}$ is the distance from the proton or hydride that is being transferred to the donor atom and $r_{HA}$ is its distance to the acceptor atom. Selecting a different reaction coordinate should not, in principle, change the final calculated rate constants significantly because stage 2 uses the transition state ensemble of stage 1 to sample a set of reaction paths and uses an ensemble of more optimal reaction coordinates to calculate the rate constants.
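For concreteness, Eq. [312] can be evaluated directly from Cartesian positions. The sketch below is a minimal Python illustration; the coordinates are invented (a collinear donor–H–acceptor arrangement in arbitrary length units) and the function name is ours.

```python
import math


def distinguished_coordinate(donor, hydrogen, acceptor):
    """Eq. [312]: z = r_HD - r_HA, with all positions in the same length unit."""
    r_hd = math.dist(hydrogen, donor)
    r_ha = math.dist(hydrogen, acceptor)
    return r_hd - r_ha


# Hypothetical collinear geometry: donor at the origin, acceptor 2.7 units away.
donor = (0.0, 0.0, 0.0)
acceptor = (2.7, 0.0, 0.0)

z_reactant_like = distinguished_coordinate(donor, (1.0, 0.0, 0.0), acceptor)
z_midpoint = distinguished_coordinate(donor, (1.35, 0.0, 0.0), acceptor)
```

With this sign convention, $z$ is negative while the transferring atom is closer to the donor, passes through zero at the midpoint, and becomes positive on the acceptor side.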

The first step of stage 1 treats the system and its environment together, without distinction, as a supersystem. In all subsequent steps (that is, in step two of stage 1 and in stage 2 as well as the optional stage 3), the $N$-atom system is divided into two subsystems, a primary subsystem with $N_1$ atoms and a secondary subsystem with $N_2$ atoms, such that

$$N = N_1 + N_2 \qquad [313]$$

Typically $N_1$ ranges from 25 to 43 atoms.


In the second step of stage 1, vibrational quantization effects are included in the vibrational free energy,188 where the number of vibrations treated quantum mechanically is

$$M_* = 3N_1 - 7 \qquad [314]$$

This is done for each $z$ bin. First, the frequencies are computed by using a rectilinear projection operator to remove the reaction-coordinate motion from instantaneous normal mode analyses at the sampled points. Second, the vibrational frequencies are averaged over an ensemble of sampled points in a given bin, and the vibrational free energy is calculated from the average frequencies by both the quantized and the classical formulas for the free energy of a collection of harmonic oscillators. Because the sampled points have been sampled at various distances from the bed of the reaction valley, this analysis implicitly includes anharmonicity. The difference of the quantized and classical calculations is added to the classical PMF, and the resulting adjusted PMF, called the quasi-classical PMF, corresponds to the $M_*$ nuclear motions being quantized, with the remaining motion being classical. At the end of the first stage, the rate constant is given by

$$k^{(1)} = \frac{1}{\beta h} \exp\left[ -\Delta G_T^{(1)} / RT \right] \qquad [315]$$

where

$$\Delta G_T^{(1)} = W^{CM}(T, z_*) + \Delta W_{vib}^{(M_*)}(T, z_*) + C(T, z_*) - \left[ W^{CM}(T, z_R) + W^{CM}_{R,T,F} + \Delta W_{vib}^{(3N_1-6)}(T, z_R) \right] \qquad [316]$$

in which $W^{CM}(T,z)$ is the classical mechanical PMF of stage 1–step 1, $z_*$ is the value of $z$ that maximizes the right-hand side of Eq. [316], $z_R$ is the value of $z$ where $W^{CM}(T,z)$ has a minimum corresponding to reactants, $\Delta W_{vib}^{(M_*)}(T,z')$ is the ensemble-averaged correction to the vibrational free energy for quantizing the $M_*$ highest frequencies at $z = z'$, $C(T,z)$ is the correction32 for a curvilinear $z$, and $W^{CM}_{R,T,F}$ is the nonseparable vibrational free energy of the reaction coordinate at $z = z_R$. Equation [315] is a quasi-classical rate constant because it includes quantization in transverse vibrational coordinates but not in the reaction coordinate (but it is not the final quasi-classical rate constant of the EA-VTST treatment). Equation [316] can also be written as

$$\Delta G_T^{(1)} = \Delta_* W^{CM}(T,z) + W_{corr}(T) \qquad [317]$$

where

$$\Delta_* W^{CM}(T,z) \equiv W^{CM}(T,z_*) - W^{CM}(T,z_R) \qquad [318]$$

and

$$W_{corr}(T) \equiv -W^{CM}_{R,T,M_R} + C(T,z_*) + \Delta W_{vib}^{(M_*)}(T,z_*) - \Delta W_{vib}^{(M_R)}(T,z_R) \qquad [319]$$

where

$$M_R = F = M_* + 1 \qquad [320]$$
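Two ingredients of stage 1 are easy to illustrate numerically: the quantized-minus-classical harmonic free energy that converts the classical PMF into the quasi-classical one, and the rate expression of Eq. [315]. The Python sketch below assumes that the input frequencies are the bin-averaged harmonic wavenumbers described above; the function names are ours, and the standard closed forms for a harmonic oscillator are used (quantum: $h\nu/2 + k_BT\ln(1 - e^{-h\nu/k_BT})$; classical: $k_BT\ln(h\nu/k_BT)$).

```python
import math

KB = 1.380649e-23         # Boltzmann constant, J/K
H = 6.62607015e-34        # Planck constant, J s
C_CM = 2.99792458e10      # speed of light, cm/s
R_KCAL = 1.987204259e-3   # gas constant, kcal mol^-1 K^-1


def delta_w_vib(freqs_cm, temperature):
    """Quantized minus classical harmonic free energy, summed over modes (kcal/mol).

    freqs_cm: (bin-averaged) harmonic wavenumbers in cm^-1.
    """
    total = 0.0
    for nu in freqs_cm:
        u = H * C_CM * nu / (KB * temperature)            # h*nu / (k_B T)
        g_quantum = 0.5 * u + math.log(1.0 - math.exp(-u))
        g_classical = math.log(u)
        total += g_quantum - g_classical
    return R_KCAL * temperature * total


def stage1_rate_constant(delta_g, temperature):
    """Eq. [315]: k = (k_B T / h) exp(-dG / RT); dG in kcal/mol, k in s^-1."""
    return KB * temperature / H * math.exp(-delta_g / (R_KCAL * temperature))


# The correction is large for a stiff stretch and negligible for a soft mode.
stiff = delta_w_vib([3000.0], 300.0)
soft = delta_w_vib([10.0], 300.0)
prefactor = stage1_rate_constant(0.0, 300.0)   # k_B T / h at 300 K
```

The correction per mode equals $k_BT\ln[2\sinh(u/2)/u]$ with $u = h\nu/k_BT$, so it is always positive and vanishes in the low-frequency (classical) limit.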

The vibrational frequencies $\omega_m^*$ at $z = z_*$ are calculated from a Hessian that has the reaction coordinate projected out, but the reactant frequencies $\omega_m^R$ are calculated without projection.

In the second stage, a transition state ensemble is selected. This ensemble is defined as the set $\{i = 1, 2, \ldots, I\}$ of $I$ saved configurations from the umbrella sampling that have $z$ nearest to $z_*$. The individual values of $z$ for these ensemble members are called $z_{*,i}$. For each of these geometries, the primary system is optimized to the nearest saddle point, with fixed coordinates for the secondary zone. An isoinertial MEP of the primary system is then computed, again with the secondary zone fixed. Note that each value of $i$ corresponds to a different secondary zone and, hence, a different saddle point. Each MEP $(i = 1, 2, \ldots, I)$ corresponds to a different valley through the supersystem consisting of the reactive system plus its environment. Furthermore, because each MEP has a different reaction coordinate corresponding to a different set of coordinates for the secondary zone, the reaction coordinate depends on the coordinates of the secondary zone. In this way the entire supersystem (including the enzyme and solvent) participates in the definition of the reaction coordinate.
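The selection of the transition state ensemble amounts to picking the $I$ saved configurations whose $z$ values lie closest to $z_*$. A minimal Python sketch of that selection follows; the labels and $z$ values are hypothetical, and the subsequent saddle-point optimizations and MEP calculations are not shown.

```python
def select_ts_ensemble(saved, z_star, n_members):
    """Return the n_members saved (z, config_id) pairs with z closest to z_star."""
    return sorted(saved, key=lambda pair: abs(pair[0] - z_star))[:n_members]


# Hypothetical saved configurations from the umbrella sampling.
saved = [(-0.30, "a"), (0.02, "b"), (0.11, "c"), (-0.05, "d"), (0.40, "e")]
ensemble = select_ts_ensemble(saved, z_star=0.05, n_members=3)
member_ids = [config_id for _, config_id in ensemble]
z_star_i = [z for z, _ in ensemble]   # the individual values z_{*,i}
```

Each selected member carries its own value $z_{*,i}$ and its own frozen secondary-zone configuration into the saddle-point search.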

For each MEP, VTST and VTST/OMT calculations are carried out using the progress variable $s_i$ along MEP $i$ as the optimized reaction coordinate. (Note that $s_i$ is the variable $s$ for ensemble member $i$.) The improved reaction coordinate for ensemble member $i$ yields a recrossing transmission coefficient $\Gamma_i^{(2)}$, given by

$$\Gamma_i^{(2)} = \exp\left\{ -\left[ \Delta G_T^{CVT,o}(s_{*,i}) - \Delta G_T^{GT,o}(s_{0,i}) \right] / RT \right\} \qquad [321]$$

where $s_{*,i}$ is the location of maximum free energy of activation for ensemble member $i$ along its own reaction coordinate $s_i$ and $s_{0,i}$ is the value of $s_i$ for which $z = z_{*,i}$. These recrossing transmission coefficients are averaged over the $I$ members of the TS ensemble. The actual calculation of $\Delta G_T^{CVT,o}(s_{*,i})$ and $\Delta G_T^{GT,o}(s_{0,i})$ for the embedded primary system of transition state ensemble member $i$ is carried out with the CHARMMRATE module of CHARMM. (Note that CHARMMRATE is based on POLYRATE.) The transmission coefficient calculated from this step of stage 2 is

$$\Gamma^{(2)} = \langle \Gamma_i^{(2)} \rangle \qquad [322]$$

where $\langle \ldots \rangle$ denotes an ensemble average $(i = 1, 2, \ldots, I)$. The resulting rate constant is

$$k^{EA\text{-}VTST} = \Gamma^{(2)}(T)\, k^{(1)}(T) \qquad [323]$$

This stage-2, step-1 rate expression $k^{EA\text{-}VTST}$ is the final quasi-classical rate constant of the two-stage process. Equation [323] has sometimes been called the static-secondary-zone rate constant without tunneling, but this term is deceptive because the secondary zone changes from one ensemble member to another and, hence, is not really static.

At this point one can include optimized multidimensional tunneling in each $(i = 1, 2, \ldots, I)$ of the VTST calculations. The tunneling transmission coefficient of stage 2 for ensemble member $i$ is called $\kappa_i^{(2)}$ and is evaluated by treating the primary zone in the “ground-state” approximation (see the section titled “Quantum Effects on Reaction Coordinate Motion”) and the secondary zone in the zero-order canonical mean shape approximation explained in the section titled “Reactions in Liquids”, to give an improved transmission coefficient that includes tunneling:

$$\gamma^{(2)} = \langle \kappa_i^{(2)} \Gamma_i^{(2)} \rangle \qquad [324]$$

with the final stage-2 rate constant being

$$k^{EA\text{-}VTST/OMT} = \gamma^{(2)}(T)\, k^{(1)}(T) \qquad [325]$$
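The stage-2 averaging can be summarized in a short Python sketch. It assumes that, for each ensemble member, the two free energies entering the recrossing factor and the tunneling coefficient are already available from the per-member VTST/OMT calculations; the numbers below and the stage-1 rate constant are hypothetical, and the function names are ours.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal mol^-1 K^-1


def recrossing_coefficient(dg_cvt_max, dg_gt_at_s0, temperature):
    """Recrossing factor for one member: exp(-[dG_CVT(s_*,i) - dG_GT(s_0,i)] / RT)."""
    return math.exp(-(dg_cvt_max - dg_gt_at_s0) / (R_KCAL * temperature))


def ensemble_coefficients(members, temperature):
    """members: list of (dG_CVT(s_*,i), dG_GT(s_0,i), kappa_i), energies in kcal/mol.

    Returns (Gamma, gamma): the ensemble average of the recrossing factors and
    the ensemble average of the products kappa_i * Gamma_i.
    """
    gam = [recrossing_coefficient(cvt, gt, temperature) for cvt, gt, _ in members]
    kap = [kappa for _, _, kappa in members]
    n = len(members)
    big_gamma = sum(gam) / n
    small_gamma = sum(k * g for k, g in zip(kap, gam)) / n
    return big_gamma, small_gamma


# Two hypothetical ensemble members at 300 K.
members = [(15.0, 15.0, 2.0), (15.2, 14.8, 3.0)]
big_gamma, small_gamma = ensemble_coefficients(members, 300.0)

k1 = 1.0e3                          # hypothetical stage-1 rate constant, s^-1
k_ea_vtst = big_gamma * k1          # quasi-classical rate constant
k_ea_vtst_omt = small_gamma * k1    # rate constant including tunneling
```

Note that the tunneling-inclusive coefficient is the average of the products, not the product of the averages, so members with both large tunneling and little recrossing are weighted accordingly.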

The procedure just discussed for stage 2 includes the thermal energy and entropy of secondary-zone atoms in $k^{(1)}(T)$ and in the determination of each $s_{0,i}$ that is used in stage 2, but the $s$ dependence of these contributions is not included in each MEP. Optionally these effects could be included in a third stage. However, when secondary-zone dynamics are slow on the time scale over which $s$ crosses the barrier189 (or on the time scale of a wave packet traversing the tunneling segment of the reaction path), one is in what Hynes has called the “nonadiabatic solvation limit.”190–192 In this limit, the transition state passage occurs with an ensemble average of essentially fixed secondary-zone configurations190–192 because the secondary zone cannot respond to the reaction coordinate motion to provide equilibrium solvation; in such a case, allowing the secondary zone to relax could provide less accurate results than stopping after stage 2. Conversely, if the adjustment of the secondary zone is rapid on the time scale of barrier passage, one can improve the result by adding a third stage,175,178 which we call the equilibrium secondary zone approximation. If invoked, this stage uses free energy perturbation theory all along each MEP to calculate the change in secondary-zone free energy as a function of each $s_i$. That change is added to the generalized transition state theory free energy of activation profile for the calculation of both the quasiclassical CVT rate constant and the quantum effects on the reaction coordinate.

GAS-PHASE EXAMPLE: H + CH4

In this section, CVT/μOMT theory is applied to the $\mathrm{H + CH_4 \rightarrow H_2 + CH_3}$ reaction by using the Jordan–Gilbert193 (JG) potential energy surface. We select this example because it is one of the few polyatomic systems for which accurate quantum dynamics calculations are available.194–196 (By accurate quantum dynamics, we mean that the nuclear quantum dynamics are converged for a given potential energy surface.) All the VTST calculations have been carried out with POLYRATE–version 9.3.1, and the calculations discussed here reproduce the CVT/μOMT rate constants obtained previously by Pu et al.111,112

First, the reactants, products, and saddle point are optimized. The imaginary frequency at the saddle point of this example has a value of 1093i cm$^{-1}$. The energies calculated at these points yield a classical barrier height of $V^\ddagger = 10.92$ kcal/mol and an energy of reaction, $\Delta E$, of 2.77 kcal/mol. From the normal mode analyses performed at the stationary points, the vibrationally adiabatic ground-state barrier at the saddle point is calculated to be $\Delta V_a^{\ddagger G} = 10.11$ kcal/mol, where

$$\Delta V_a^{\ddagger G} = V_a^{\ddagger G} - V_a^G(s = -\infty) \qquad [326]$$

and the reaction, for the assumed potential energy surface, is slightly exothermic, $\Delta H_0^o = -0.01$ kcal/mol, where $H$ is the enthalpy. Notice that $\Delta H_T^o = \Delta G_T^o$ at $T = 0$ K.

The MEP was followed over the interval $-2.50\,a_0 \le s \le 2.50\,a_0$ by using the Page–McIver algorithm with a step size of 0.01 $a_0$, and curvilinear Hessian calculations were performed at every step. The scaling mass that transforms mass-weighted coordinates to mass-scaled coordinates has been set equal to 1 amu. The vibrationally adiabatic ground-state barrier is located at $s_*^{AG} = 0.182\,a_0$, and the vibrationally adiabatic ground-state barrier height is found to be $\Delta V_a^{AG} = 10.44$ kcal/mol. The meaning of $\Delta$ is that this is $V_a^G$ at its maximum (denoted by A) relative to the value of $V_a^G$ at reactants, whereas $V_a^{AG}$ without $\Delta$ refers to $V_a^G$ relative to the energy at the classical equilibrium structure of reactants; this is about 38 kcal/mol as shown in Figure 6. In Figure 6, we plot $V_{MEP}$ and the vibrationally adiabatic potential with vibrations orthogonal to the reaction path treated in both curvilinear and rectilinear (Cartesian) coordinates. It should be noticed that both the MEP and the potential $V_{MEP}$ along the MEP are the same in both systems of coordinates; however, the vibrationally adiabatic potential energy curves are different at nonstationary points because the vibrational frequencies at nonstationary points depend on the coordinate system. The values of the vibrational frequencies along the reaction path are more physical in curvilinear coordinates, as discussed.

Figure 6 Plot of the MEP (dotted line) and the vibrationally adiabatic ground-state potential curve as calculated in curvilinear (solid line) and rectilinear (dashed line) coordinates for the H + CH4 reaction.

Once the MEP and the frequencies along it have been calculated, one can calculate the generalized-transition-state-theory free energy profiles, as shown in Figure 7 for $T = 200$, 300, and 500 K. As indicated in Figure 3, the maximum $V_a^{AG}$ of the adiabatic potential need not coincide with the maximum $\Delta G^{CVT,o}(T)$ of the free energy of activation profile at a given temperature. The values of $s_*^{CVT}(T)$ are 0.177, 0.171, and 0.152 $a_0$ at $T = 200$, 300, and 500 K, respectively, as shown in Figure 7. Thus, the CVT rate constant is lower than the conventional TST rate constant because the best dividing surface (the bottleneck) is located at $s \neq 0$. For instance, at $T = 300$ K, the value of the CVT rate constant is $2.2 \times 10^{-20}$ cm$^3$ molecule$^{-1}$ s$^{-1}$, whereas the conventional TST rate constant is $3.6 \times 10^{-18}$ cm$^3$ molecule$^{-1}$ s$^{-1}$. These rate constants include quantum effects in all the $F - 1$ degrees of freedom perpendicular to the reaction coordinate, but the reaction-coordinate motion is classical; thus, we sometimes call these rate constants hybrid (in older papers) or quasi-classical (in more recent papers). The quantum effects on the reaction coordinate are incorporated by a transmission coefficient as described earlier. Because the maximum of the vibrationally adiabatic potential curve and the maximum of the free energy of activation profile at a given temperature do not coincide, one must employ the classical adiabatic ground-state (CAG) correction of Eq. [163] in the calculation of the CVT rate constant.
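The variational step itself is a one-dimensional maximization: the CVT dividing surface sits where the generalized free energy of activation profile is largest, and the ratio of the CVT to the conventional TST rate constant follows from the free energy difference between that maximum and the saddle point ($s = 0$). The Python sketch below uses an invented profile on a coarse grid, not the H + CH4 data, and the function names are ours.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal mol^-1 K^-1


def cvt_bottleneck(s_grid, dg_profile):
    """Location and height of the maximum of the free energy of activation profile."""
    i = max(range(len(dg_profile)), key=dg_profile.__getitem__)
    return s_grid[i], dg_profile[i]


def cvt_over_tst(s_grid, dg_profile, temperature):
    """k_CVT / k_TST = exp(-[dG(s_*) - dG(s=0)] / RT); s = 0 must be on the grid."""
    _, dg_max = cvt_bottleneck(s_grid, dg_profile)
    dg_saddle = dg_profile[s_grid.index(0.0)]
    return math.exp(-(dg_max - dg_saddle) / (R_KCAL * temperature))


# Hypothetical profile (s in bohr, free energies in kcal/mol).
s_grid = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
dg_profile = [14.0, 14.6, 15.0, 15.2, 15.3, 15.1]

s_star, dg_star = cvt_bottleneck(s_grid, dg_profile)
ratio = cvt_over_tst(s_grid, dg_profile, 300.0)
```

Because the maximum can never lie below the value at $s = 0$, the ratio is at most 1, which is the variational bound expressed numerically.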

Figure 7 Generalized-transition-state free energy of activation along the MEP at three different temperatures for the H + CH4 reaction.

Tunneling effects are important at low temperatures for this reaction because a light particle is transferred. The curvature of the reaction path was calculated by Eq. [166], and it is plotted in Figure 8. The small-curvature approximation to the effective mass along the reaction path is calculated by Eq. [174], and its ratio to the scaling mass is also plotted in Figure 8, which shows how the effective mass is reduced along the reaction path. This reduction in the effective mass also reduces the imaginary action integral and therefore increases the tunneling probability. The ZCT transmission coefficients use an effective mass that is always equal to the scaling mass, because the curvature along the reaction path is neglected in ZCT, and therefore, ZCT transmission coefficients always predict less tunneling than SCT transmission coefficients. The LCT transmission factors are calculated using the procedure described in the section entitled “Large Curvature Transmission Coefficient.” The larger of the SCT and LCT tunneling probabilities at each tunneling energy is the μOMT transmission probability. Thermally averaging these gives the μOMT transmission coefficient, which is 18.7 at $T = 200$ K and 1.57 at $T = 500$ K.
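The μOMT rule stated above is an energy-by-energy maximum. As a minimal Python sketch, assuming the SCT and LCT probabilities have already been tabulated on a common grid of tunneling energies (the values below are made up for illustration, and the function name is ours):

```python
def mu_omt_probabilities(p_sct, p_lct):
    """Larger of the SCT and LCT tunneling probabilities at each tunneling energy."""
    if len(p_sct) != len(p_lct):
        raise ValueError("probabilities must be tabulated on the same energy grid")
    return [max(a, b) for a, b in zip(p_sct, p_lct)]


# Hypothetical probabilities on four increasing tunneling energies.
p_sct = [1e-6, 2e-4, 3e-2, 0.4]
p_lct = [5e-6, 1e-4, 1e-2, 0.3]
p_mu_omt = mu_omt_probabilities(p_sct, p_lct)
```

In this invented example LCT dominates only at the lowest energy; the thermal average of the resulting probabilities then gives the transmission coefficient.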

Figure 8 Plot of $\mu_{eff}/\mu$ and the reaction path curvature $\kappa$ along the MEP for the H + CH4 reaction.

The effect of tunneling on the reaction is further analyzed by finding the energy that contributes most to the ground-state transmission coefficient. Making a change of variable, i.e., letting $x = E - V_a^{AG}$, in Eq. [160] and using Eqs. [162] and [163], we obtain

$$k^{CVT/\mu OMT} = k^{CVT}(T)\, \beta\, \kappa^{CVT/CAG}(T) \left\{ \int_{E_0 - V_a^{AG}}^{0} P(x) \exp(-\beta x)\, dx + \int_{0}^{\infty} P(x) \exp(-\beta x)\, dx \right\} \qquad [327]$$

The first integral yields the tunneling contribution to the transmission coefficient, and the integrand is plotted in Figure 9. The curves are the product of the tunneling probability multiplied by the Boltzmann factor. The energy at which this product has a maximum is called119 the representative tunneling energy (RTE). At a given temperature, the RTE indicates the energy at which it is most probable for the particle to tunnel. For instance, at $T = 200$ K and $T = 500$ K, the RTE is located 2.02 and 0.31 kcal/mol below the barrier top, respectively. The μOMT transmission factor is larger at lower temperatures because the area under the curve is larger.
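The competition between a tunneling probability that rises with energy and a Boltzmann factor that falls with energy can be illustrated with a model calculation. The Python sketch below is not the μOMT calculation: as a stand-in for $P(x)$ it uses the analytic transmission probability of a symmetric Eckart barrier, parameterized (by assumption) with a barrier height of 10.44 kcal/mol and an imaginary-frequency quantum of 3.125 kcal/mol (about 1093 cm$^{-1}$), and it locates the maximum of $P(x)\exp(-\beta x)$ for $x \le 0$ by a simple grid search. The numerical RTE values it produces therefore differ from those quoted above; only the trend is meant to carry over.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal mol^-1 K^-1


def eckart_probability(energy, v0, hbar_omega):
    """Transmission probability of a symmetric Eckart barrier.

    v0 is the barrier height and hbar_omega the magnitude of the imaginary-
    frequency quantum at the top (same energy unit); requires v0 > hbar_omega/4.
    """
    if energy <= 0.0:
        return 0.0
    u = 2.0 * math.pi * math.sqrt(energy * v0) / hbar_omega
    d = 0.5 * math.pi * math.sqrt(16.0 * v0 * v0 / hbar_omega ** 2 - 1.0)
    s2 = math.sinh(u) ** 2
    return s2 / (s2 + math.cosh(d) ** 2)


def representative_tunneling_energy(v0, hbar_omega, temperature, n_grid=4000):
    """Grid search for the x = E - v0 <= 0 that maximizes P(x) * exp(-beta * x)."""
    beta = 1.0 / (R_KCAL * temperature)
    best_x, best_val = 0.0, -1.0
    for i in range(n_grid + 1):
        x = -v0 * i / n_grid
        val = eckart_probability(v0 + x, v0, hbar_omega) * math.exp(-beta * x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


# Model parameters (assumptions): barrier 10.44 kcal/mol, hbar*omega = 3.125 kcal/mol.
rte_200 = representative_tunneling_energy(10.44, 3.125, 200.0)
rte_400 = representative_tunneling_energy(10.44, 3.125, 400.0)
```

In this model, as in the calculation described above, lowering the temperature pushes the representative tunneling energy further below the barrier top.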

Figure 9 Plot of the first integrand of Eq. [327] versus $x = E - V_a^{AG}$ at three different energies for the H + CH4 reaction. The maximum of the curves indicates the representative tunneling energy. The top of the barrier is located at $x = 0$.

The CVT/μOMT rate constants are $7.1 \times 10^{-21}$ and $4.1 \times 10^{-15}$ cm$^3$ molecule$^{-1}$ s$^{-1}$ at $T = 200$ and 500 K, respectively, whereas the accurate quantum calculations194–196 are $9.0 \times 10^{-21}$ and $3.8 \times 10^{-15}$ cm$^3$ molecule$^{-1}$ s$^{-1}$ at those two temperatures. The average absolute deviation between the CVT/μOMT and the accurate rate constants is only 17% in the range 200–500 K. The performance of CVT/μOMT for this reaction is astonishing, considering that the quantum calculations for this system took several months, whereas the VTST/μOMT results require only a few seconds of computer time. In particular, the calculations were carried out in less than 30 seconds on an old computer, including full LCT calculations without even using the faster spline algorithm. The calculations are so fast that the slowest part is setting up the input file.

LIQUID-PHASE EXAMPLE: MENSHUTKIN REACTION

In this section, VTST is applied to the bimolecular Menshutkin reaction in aqueous solution:159

$$\mathrm{ClCH_3 + NH_3 \rightarrow Cl^- + H_3CNH_3^+} \qquad [328]$$

An important difference of this example from that given earlier is that in this case no analytical potential energy surface was provided to the program. Instead, the electronic structure data needed for the dynamics were calculated “on the fly” by the MN-GSM197 program; that is, direct dynamics was used. The gas-phase electronic structure calculations were carried out with the HF/6-31G(d) method, and the MEP was followed by using the Page–McIver algorithm with a step size of 0.01 $a_0$ with analytical Hessian calculations every nine steps. Generalized normal modes were calculated using redundant curvilinear coordinates. The calculations in solution were performed with the program MN-GSM–version 5.2, which incorporates the SM5.42, SM5.43, and SM6 solvation models into Gaussian 98.198 The dynamics calculations were carried out with GAUSSRATE–version 9.1, which in this case was modified to serve as an interface between the MN-GSM–v5.2 and POLYRATE–version 9.3.1 programs.

The SES calculations were carried out along the gas-phase MEP. In an SES calculation, the solvent is not considered when constructing the MEP, and solvent effects are added separately to create the potential of mean force using Eq. [311]. The solvation free energy was evaluated with the SM5.43 model, and therefore, the SES calculations are denoted as SM5.43/HF/6-31+G(d)//HF/6-31G(d) or simply as SM5.43/HF/6-31G(d)//g.

The ESP calculations, which include solvent effects when determining geometries of stationary points and points on the reaction path, are denoted as SM5.43/HF/6-31+G(d). The stationary points within the ESP approximation are optimized using the potential of mean force, where this potential has a minimum for reactants and products and a maximum for the transition state in solution. The reaction path was obtained by using the Page–McIver algorithm with a step size of 0.01 $a_0$. We evaluated numerical Hessians, including the effect of solvent, by central differences at every ninth step. Vibrational frequencies were calculated in redundant curvilinear coordinates. In the ESP approach, we consider the liquid-phase saddle point on the potential of mean force surface of the solute as the dividing surface for the conventional transition state theory calculations.

For Reaction [328], the Cl, C, and N atoms are collinear. The bond lengths between these three atoms in the gas phase and in solution are listed in Table 1, and the energetics of the stationary points are listed in Table 2. For this reaction, solvent effects are very large for products. The aqueous solution stabilizes the charged products, as shown in Table 3. The gas-phase $V_{MEP}$ and the SES canonical mean-shape potential $U(s \mid T)$ are plotted in Figure 10. Note that

$$U(s \mid T) = V_{MEP}(s) + \Delta G_S^0(R(s), T) \qquad [329]$$

In the gas phase, a transition state exists for reaction only because there is a slightly stable ion-pair structure, which disappears when the geometry is optimized in solution. The maximum of $U(s \mid T)$ in the SES approximation is located at $s = -1.60\,a_0$. The maximum of $U(s \mid T)$ along the reaction path at the SM5.43//HF/6-31G(d) level is much closer to reactants than in the gas phase, which was expected, because in solution, products are much more stabilized.

Table 1 Bond Lengths of the Stationary Points in Å

| | Gas phase $R_{NC}$ | Gas phase $R_{CCl}$ | ESP $R_{NC}$ | ESP $R_{CCl}$ |
|---|---|---|---|---|
| Reactant | ∞ | 1.785 | ∞ | 1.805 |
| van der Waals complex | 3.419 | 1.793 | – | – |
| Saddle point | 1.876 | 2.482 | 2.263 | 2.312 |
| Ion pair | 1.548 | 2.871 | – | – |
| Products | 1.507 | ∞ | 1.476 | ∞ |

Table 2 Zero-Order Mean Shape Potential of the Stationary Points Relative to Reactants (in kcal/mol)

| | Gas(a) | SES(a) | ESP(a) |
|---|---|---|---|
| van der Waals complex | −2.0 | −0.98 | – |
| Saddle point | 36.1 | 2.61 | 13.4 |
| Ion pair | 30.6 | −27.5 | – |
| Products | 111.7 | −38.6 | −35.6 |

(a) Reactants absolute energy (in hartrees): −555.277509 (gas); −555.285927 (SES); −555.286366 (ESP).


Figure 10 Zero-order canonical mean shape potential $U$ for reaction [328] calculated at the HF/6-31G(d) (gas phase) and SM5.43//HF/6-31G(d) (SES) levels as functions of the reaction coordinate $s$ for the Menshutkin reaction.

Table 3 Standard-State Free Energies of Solvation of the Stationary Points in kcal/mol

| Level | SES | ESP |
|---|---|---|
| NH3 | −4.6 | −5.1 |
| CH3Cl | −0.7 | 1.4 |
| ClCH3...NH3 | −4.2 | – |
| Transition state | −78.8 | −20.1 |
| Cl−...CH3NH3+ | −78.9 | – |
| Cl− | −72.0 | −72.0 |
| CH3NH3+ | −83.6 | −84.3 |

The potentials along the reaction paths in the SES and ESP approximations are plotted in Figure 11 using a common reaction coordinate consisting of the difference between the breaking and the forming bonds along the path involving the breaking and forming bond distance in the gas-phase transition state (this reaction coordinate is used only for plotting the two cases on a common scale; the actual reaction coordinates are distance along the gas-phase MEP for the SES cases and along the liquid-phase MEP for ESP). The SES and ESP potentials show similar profiles and therefore similar rate constants at room temperature (see Table 4). The exception is the conventional TST rate constant in the SES approach, which is about six orders of magnitude higher than the CVT rate constant. This is caused by the very different location of the maximum of the potential in liquid-phase solution as compared with the gas phase. As expected, tunneling is not very important for this reaction, and therefore, the SCT approach for tunneling suffices for this case.

Although the above reaction is quite simple, the similarity between the SES and the ESP profiles is stunning if we consider the great difference between the gas-phase and liquid-phase potentials. From this example, we can conclude that, although the ESP allows a more reliable description of the reaction in solution, the SES approach is an inexpensive approach that can sometimes provide a reasonably accurate alternative to the ESP method.

Figure 11 Zero-order canonical mean shape potential U for reaction [328] calculated at the HF/6-31G(d) (gas phase), SM5.43//HF/6-31G(d) (SES), and SM5.43/HF/6-31G(d) (ESP) levels as functions of the common reaction coordinate for the Menshutkin reaction.
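A bond-length-difference coordinate of the kind used for the common plotting scale can be evaluated at the two saddle points of Table 1. The sketch below is only an illustration: the text does not spell out the exact formula, so the plain difference (breaking C-Cl bond minus forming N-C bond) and the shift that places the gas-phase saddle point at zero are both assumptions, and all names are hypothetical.

```python
# Bond lengths in angstroms from Table 1: (R_NC, R_CCl) at each saddle point
SADDLE_POINTS = {"gas": (1.876, 2.482), "ESP": (2.263, 2.312)}


def bond_difference(r_nc, r_ccl):
    """Breaking-bond length (C-Cl) minus forming-bond length (N-C).
    Assumed form of the plotting coordinate; smaller values are more
    reactant-like (short C-Cl bond, long N-C distance)."""
    return r_ccl - r_nc


q = {phase: bond_difference(*r) for phase, r in SADDLE_POINTS.items()}

# Shift so that the gas-phase saddle point sits at zero (an assumption)
q_shifted = {phase: value - q["gas"] for phase, value in q.items()}

for phase in q:
    print(f"{phase}: q = {q[phase]:.3f} A, shifted = {q_shifted[phase]:.3f} A")
```

With this definition the liquid-phase saddle point has a smaller value than the gas-phase one, that is, it is more reactant-like, which agrees with the shift of the barrier toward reactants described in the text.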

Table 4 Rate Constants in cm³ molecule⁻¹ s⁻¹

k              SES              ESP
TST        3.7 × 10⁻¹⁸      3.7 × 10⁻²⁵
CVT        2.0 × 10⁻²⁵      1.9 × 10⁻²⁵
CVT/SCT    2.9 × 10⁻²⁵      2.6 × 10⁻²⁵
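The size of the tunneling contribution can be read off Table 4 as the transmission coefficient, the ratio of the CVT/SCT rate constant to the CVT rate constant. The short sketch below does this arithmetic for both solvation schemes; the dictionary layout and function name are illustrative only.

```python
# Rate constants from Table 4, in cm^3 molecule^-1 s^-1
RATES = {
    "SES": {"CVT": 2.0e-25, "CVT/SCT": 2.9e-25},
    "ESP": {"CVT": 1.9e-25, "CVT/SCT": 2.6e-25},
}


def transmission_coefficient(rates):
    """Ratio of the rate constant with tunneling (CVT/SCT) to the one
    without tunneling (CVT)."""
    return rates["CVT/SCT"] / rates["CVT"]


kappa = {scheme: transmission_coefficient(r) for scheme, r in RATES.items()}

# Agreement between the two solvation schemes for the final rate constant
ses_over_esp = RATES["SES"]["CVT/SCT"] / RATES["ESP"]["CVT/SCT"]

for scheme, value in kappa.items():
    print(f"{scheme}: kappa = {value:.2f}")
print(f"k(SES) / k(ESP) with tunneling = {ses_over_esp:.2f}")
```

Both transmission coefficients are below 1.5, and the two final rate constants differ by roughly 10%, in line with the statements that tunneling is minor here and that the SES and ESP treatments give similar rate constants.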


CONCLUDING REMARKS

Transition state theory is based on the assumption of a dynamical bottleneck. The dynamical bottleneck assumption would be perfect, at least in classical mechanics, if the reaction coordinate were separable. Then one could find a dividing surface separating reactants from products that is not recrossed by any trajectories in phase space. Conventional transition state theory assumes that the unbound normal mode of the saddle point provides such a separable reaction coordinate, but dividing surfaces defined with this assumption often have significant recrossing corrections. Variational transition state theory corrects this problem, eliminating most of the recrossing.

Variational transition state theory has proved itself to be a flexible and practical tool for finding better transition state dividing surfaces in both simple and complex systems. Such dividing surfaces are called generalized transition states, and the optimum or optimized generalized transition states are called variational transition states. Real chemical reactions involve reactants with quantized vibrations, and this feature must be included in realistic rate constant calculations. Much more accurate rate constants are obtained if vibrations are treated as quantized both in the generalized transition state dividing surface and in the reactants. The reaction-coordinate motion, which is unbound for bimolecular reactions and therefore does not have quantized vibrations, also exhibits quantum effects, especially tunneling and nonclassical reflection. For thermal reactions that involve significant tunneling contributions, it is necessary to treat the overbarrier and tunneling processes in a consistent framework because the fraction of reaction that occurs by a tunneling mechanism tends to decrease gradually as the temperature is increased. This consistency can only be achieved in general if a variational criterion is used to optimize the overbarrier contribution. After such optimization is carried out, the ground-state transmission coefficient approximation and the canonical-mean-shape approximation provide ways of consistently incorporating tunneling effects into variational transition state theory for gas-phase and liquid-phase reactions, respectively.
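The idea of a transmission coefficient as a Boltzmann average of quantum transmission probabilities, normalized by the classical overbarrier result, can be illustrated with a one-dimensional parabolic barrier. The sketch below is not one of the multidimensional methods discussed in this chapter; it assumes a parabolic barrier (for which the transmission probability has a simple closed form), works in reduced units, and uses hypothetical function names and parameter values.

```python
import math


def parabolic_transmission(x, u):
    """Transmission probability through a parabolic barrier.
    x = (E - V0)/(kB*T) is the reduced energy relative to the barrier top;
    u = hbar*omega/(kB*T) is the reduced magnitude of the barrier frequency."""
    exponent = -2.0 * math.pi * x / u
    if exponent > 700.0:  # avoid overflow far below the barrier top
        return 0.0
    return 1.0 / (1.0 + math.exp(exponent))


def transmission_coefficient(u, barrier=20.0, x_max=40.0, n=60000):
    """kappa = integral of P(x) * exp(-x) dx by the trapezoidal rule,
    from the reactant threshold (x = -barrier) up to x_max.
    A classical step-function P(x) would give kappa = 1."""
    lo = -barrier
    h = (x_max - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * parabolic_transmission(x, u) * math.exp(-x)
    return total * h


for u in (1.0, 2.0):
    # Closed form for an untruncated parabolic barrier, valid for u < 2*pi
    analytic = (u / 2.0) / math.sin(u / 2.0)
    print(f"u = {u}: kappa = {transmission_coefficient(u):.4f}, analytic = {analytic:.4f}")
```

Because u is inversely proportional to temperature, the larger value of u corresponds to the lower temperature, and the sketch shows the tunneling correction growing as the temperature drops.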

For simple reactions, one needs to consider only a single reaction coordinate, and the isoinertial minimum energy path provides a good choice that is often sufficient. Early work took the transition state dividing surfaces to be hyperplanes perpendicular to the isoinertial minimum energy path and optimized the location of such hyperplanes along this path. The next generation of algorithms either optimized the orientation of hyperplanes or used curvilinear coordinates to define more physical dividing surfaces. The most complete algorithms consider an ensemble of reaction paths. In this way one can account, at least in part, for recrossing the dividing surface defined by a single reaction coordinate.
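Optimizing the location of a dividing surface along a reaction path amounts to scanning the path for the surface with the smallest generalized transition state rate constant, which is the one with the largest free energy of activation. The following is a toy sketch of that search: the free energy profile is entirely hypothetical, the rate expression is written in unimolecular form for simplicity, and none of the names correspond to POLYRATE routines.

```python
import math

KB = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J s
R = 1.987204e-3      # gas constant, kcal/(mol K)


def toy_free_energy(s, temperature):
    """Hypothetical generalized free energy of activation (kcal/mol) at
    position s along the path: a barrier term plus a temperature-dependent
    term that shifts the maximum away from the saddle point at s = 0."""
    return 20.0 / math.cosh(s) ** 2 + 0.01 * temperature * math.tanh(s)


def generalized_rate(delta_g, temperature):
    """Generalized transition state rate constant, written here in
    unimolecular form (s^-1) for simplicity."""
    return (KB * temperature / H) * math.exp(-delta_g / (R * temperature))


def canonical_variational_rate(temperature, s_min=-2.0, s_max=2.0, n=400):
    """Scan dividing surfaces along s and keep the one with the smallest
    rate constant, i.e., the largest free energy of activation."""
    grid = [s_min + i * (s_max - s_min) / n for i in range(n + 1)]
    s_star = max(grid, key=lambda s: toy_free_energy(s, temperature))
    return s_star, generalized_rate(toy_free_energy(s_star, temperature), temperature)


T = 300.0
k_tst = generalized_rate(toy_free_energy(0.0, T), T)  # surface at the saddle point
s_star, k_cvt = canonical_variational_rate(T)
print(f"s* = {s_star:.2f}, k_CVT / k_TST = {k_cvt / k_tst:.3f}")
```

In this toy model the variational dividing surface sits slightly to the product side of the saddle point, and the variational rate constant is lower than the conventional one, as it must be for any choice of profile.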

It is not sufficient to merely treat tunneling consistently with overbarrier processes; it must be treated accurately. For overbarrier processes, the nonseparability of the reaction coordinate shows up as recrossing, and the nonseparability of the reaction coordinate is even more important for tunneling than for overbarrier processes. Two kinds of nonseparability are recognized. First, the effective barrier along the tunneling coordinate depends on all other degrees of freedom. Second, the tunneling paths themselves tend to be shorter than the minimum energy path, and this path shortening, called corner cutting, depends on the multidimensional shape of the potential energy surface. For small curvature of the minimum energy path in isoinertial coordinates, the effective potential may be calculated vibrationally adiabatically, and tunneling-path shortening may be calculated to a good approximation from the reaction-path curvature. For large curvature of the minimum energy path in isoinertial coordinates, the effective potential is vibrationally nonadiabatic, and one must average over a set of nearly straight tunneling paths that usually cannot be represented in coordinate systems based on the minimum-energy path; special procedures called large-curvature tunneling approximations have been worked out to treat such tunneling consistently with variational transition state theory.

This chapter has included a discussion of algorithms for treating all these issues, especially as they are incorporated in the POLYRATE computer program. The POLYRATE program requires information about the potential energy surface, and this can be included in a variety of ways. These include global analytical potential energy surfaces and direct dynamics. In direct dynamics, the energies, gradients, and Hessians required by the algorithms are computed "on the fly" by electronic structure calculations whenever the algorithms call for them. POLYRATE also includes several interpolation schemes in which the needed energies, gradients, and Hessians are locally interpolated from a small dataset of electronic structure calculations; this is a particularly efficient form of direct dynamics.
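The "on the fly" pattern can be sketched as a path-following routine that asks a callable for an energy and gradient only when it reaches a new geometry. The example below is a minimal illustration, not the POLYRATE interface: `toy_surface` is a hypothetical analytical stand-in for an electronic structure call, the coordinates are assumed to be already mass scaled, and a simple Euler steepest-descent step replaces the more refined integrators used in practice.

```python
import math


def toy_surface(x, y):
    """Hypothetical stand-in for an electronic structure call.
    Returns the energy and gradient of a double-well model surface with
    a saddle point at (0, 0) and minima at (-1, 0) and (1, 0)."""
    energy = (x * x - 1.0) ** 2 + 2.0 * y * y
    gradient = (4.0 * x * (x * x - 1.0), 4.0 * y)
    return energy, gradient


def follow_steepest_descent(potential, start, step=0.01, max_steps=1000, gtol=1e-3):
    """Euler steepest-descent path: each step of fixed length moves along
    the negative normalized gradient, requesting energy and gradient from
    `potential` only when a new point is reached."""
    x, y = start
    energy, (gx, gy) = potential(x, y)
    path = [(x, y, energy)]
    for _ in range(max_steps):
        norm = math.hypot(gx, gy)
        if norm < gtol:        # gradient vanished: a minimum was reached
            break
        x_new = x - step * gx / norm
        y_new = y - step * gy / norm
        e_new, g_new = potential(x_new, y_new)
        if e_new > energy:     # stepped past the minimum
            break
        x, y, energy, (gx, gy) = x_new, y_new, e_new, g_new
        path.append((x, y, energy))
    return path


# Start slightly displaced from the saddle point toward the product well
path = follow_steepest_descent(toy_surface, (0.01, 0.0))
x_end, y_end, e_end = path[-1]
print(f"{len(path)} points, end at x = {x_end:.2f}, energy = {e_end:.4f}")
```

Swapping `toy_surface` for a function that launches a real electronic structure calculation, or for an interpolant built from a small set of such calculations, would leave the path-following code unchanged, which is the design point of direct dynamics.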

ACKNOWLEDGMENTS

This work was supported in part by the U.S. Department of Energy (DOE), Office of Basic Energy Sciences (BES), under Grant DE-FG02-86ER13579 and by the Air Force Office of Scientific Research by a Small Business Technology Transfer grant to Scientific Applications and Research Assoc., Inc. A.F.R. thanks the Ministerio de Educación y Ciencia of Spain for a Ramón y Cajal research contract and for Project #BQU2003-01639. B.C.G. acknowledges BES support at Pacific Northwest National Laboratory (PNNL). Battelle operates PNNL for DOE.

REFERENCES

1. D. G. Truhlar, A. D. Isaacson, and B. C. Garrett, in Theory of Chemical Reaction Dynamics,Vol. 3, M. Baer, ed., CRC Press, Boca Raton, FL, 1985, pp. 65–137. Generalized TransitionState Theory.


2. A. D. Isaacson, D. G. Truhlar, S. N. Rai, R. Steckler, G. C. Hancock, B. C. Garrett, and M. J.Redmon, Comput. Phys. Commun., 47, 91 (1987). POLYRATE: A General ComputerProgram for Variational Transition State Theory and Semiclassical Tunneling Calculationsof Chemical Reaction Rates.

3. D.-h. Lu, T. N. Truong, V. S.Melissas, G. C. Lynch, Y.-P. Liu, B. C. Garrett, R. Steckler, A. D.Isaacson, S. N. Rai, G. C. Hancock, J. G. Lauderdale, T. Joseph, andD. G. Truhlar,Comput.Phys. Commun., 71, 235 (1992). POLYRATE 4: ANewVersion of a Computer Program forthe Calculation of Chemical Reaction Rates for Polyatomics.

4. W.-P. Hu, R. Steckler, G. C. Lynch, Y.-P. Liu, B. C. Garrett, A. D. Isaacson, D.-h. Lu, V. S.Melissas, I. Rossi, J. J. P. Stewart, and D. G. Truhlar, QCPE Bull., 15,32 (1995). POLY-RATE -version 6.5 and MORATE -version 6.5/P6.5-M5.05. Two Computer Programs forthe Calculation of Chemical Reaction Rates.

5. R. Steckler, W.-P. Hu, Y.-P. Liu, G. C. Lynch, B. C. Garrett, A. D. Isaacson, V. S. Melissas,D.-h. Lu, T. N. Truong, S. N. Rai, G. C. Hancock, J. G. Lauderdale, T. Joseph, and D. G.Truhlar, Comput. Phys. Commun., 88, 341 (1995). POLYRATE 6.5: A New Version of aComputer Program for the Calculation of Reaction Rates for Polyatomics.

6. J. C. Corchado, Y.-Y. Chuang, P. L. Fast, W.-P. Hu, Y.-P. Liu, G. C. Lynch, K. A. Nguyen, C.F. Jackels, A. Fernandez-Ramos, B. A. Ellingson, B. J. Lynch, V. S. Melissas, J. Villa, I. Rossi,E. L. Coitino, J. Pu, T. V. Albu, R. Steckler, B. C. Garrett, A. D. Isaacson, and D. G. Truhlar,POLYRATE - version 9.4.3. University of Minnesota, Minneapolis, Minnesota, 2006.Available: http://comp.chem.umn.edu/polyrate.

7. S. C. Tucker and D. G. Truhlar, in New Theoretical Concepts for Understanding OrganicReactions, J. Bertran and I. G. Csizmadia, Eds., Kluwer, Dordrecht, The Netherlands, 1989,pp. 291–346. [NATO ASI Ser. C 267, 291–346 (1989)]. Dynamical Formulation ofTransition State Theory: Variational Transition States and Semiclassical Tunneling.

8. H. Eyring, J. Chem. Phys., 3, 107 (1935). The Activated Complex in Chemical Reactions.

9. M. G. Evans and M. Polanyi, Trans. Faraday Soc., 31, 875 (1935). Some Applications ofthe Transition State Method to the Calculation of Reaction Velocities, Especially inSolution.

10. W. F. K. Wynne-Jones and H. Eyring, J. Chem. Phys., 3, 492 (1935). The Absolute Rate ofReactions in Condensed Phases.

11. R. H. Fowler, Trans. Faraday Soc., 34, 124 (1938). General Discussion.

12. R. K. Boyd, Chem. Rev., 77, 93 (1977). Macroscopic and Microscopic Restrictions onChemical Kinetics.

13. C. Lim andD. G. Truhlar, J. Phys. Chem., 89, 5 (1985). Internal-State Nonequilibrium Effectsfor a Fast, Second-Order Reaction.

14. C. Lim and D. G. Truhlar, J. Phys. Chem., 90, 2616 (1986). The Effect of Vibrational-Rotational Disequilibrium on the Rate Constant for an Atom-Transfer Reaction.

15. H. Teitelbaum, J. Phys. Chem., 94, 3328 (1990). Nonequilibrium Kinetics of BimolecularExchange Reactions. 3. Application to Some Combustion Reactions.

16. C. Bowes, N. Mina, and H. Teitelbaum, J. Chem. Soc., Faraday Trans., 87, 229 (1991). Non-Equilibrium Kinetics of Bimolecular Exchange Reactions. 2. Improved Formalism andApplications to Hydrogen AtomþHydrogenMolecule! HydrogenMoleculeþHydrogendrogen Atom and its Isotopic Variants.

17. H. Teitelbaum, Chem. Phys., 173, 91 (1993). Non-Equlibrium Kinetics of BimolecularReactions. IV. Experimental Prediction of the Breakdown of the Kinetic Mass-Action Law.

18. H. Teitelbaum,Chem. Phys. Lett., 202, 242 (1993). Non-EquilibriumKinetics of BimolecularReactions. Effect of Anharmonicity on the Rate Law.

19. P. Pechukas, Annu. Rev. Phys. Chem., 32, 159 (1981). Transition State Theory.

20. B. C. Garrett and D. G. Truhlar, J. Phys. Chem., 83, 1052 (1979); Erratum: 87, 4553 (1983).Generalized Transition State Theory. Classical Mechanical Theory and Applications toCollinear Reactions of Hydrogen Molecules.

References 223

21. E. Wigner, J. Chem. Phys., 5, 720 (1937). Calculation of the Rate of Elementary AssociationReactions.

22. J. Horiuti, Bull. Chem. Soc. Jpn., 13, 210 (1938). On the Statistical Mechanical Treatment ofthe Absolute Rates of Chemical Reactions.

23. J. C. Keck, J. Chem. Phys., 32, 1035 (1960). Variational Theory of Chemical Reaction RatesApplied to Three-Body Recombinations.

24. J. C. Keck, Adv. Chem. Phys., 13, 85 (1967). Variational Theory of Reaction Rates.

25. R. L. Jaffe, J. M. Henry, and J. B. Anderson, J. Chem. Phys., 59, 1128 (1973). VariationalTheory of Reaction Rates: Application to FþH2 $ HFþH.

26. W.H.Miller, J. Chem. Phys., 61, 1823 (1974). QuantumMechanical Transition State Theoryand a New Semiclassical Model for Reaction Rate Constants.

27. B. C. Garrett and D. G. Truhlar, J. Chem. Phys., 70, 1593 (1979). Criterion ofMinimum StateDensity in the Transition State Theory of Bimolecular Reactions.

28. E. B.Wilson, Jr., J. C. Decius, and P. C. Cross,Molecular Vibrations, Dover Publications, Inc.,New York, 1955.

29. R. A. Marcus, Discuss. Faraday Soc., 44, 7 (1967). Analytical Mechanics and AlmostVibrationally Adiabatic Chemical Reactions.

30. B. C. Garrett and D. G. Truhlar, J. Phys. Chem., 83, 1079 (1979); Erratum: 87, 4553 (1983).Generalized Transition State Theory. Quantum Effects for Collinear Reactions of HydrogenMolecules and Isotopically Substituted Hydrogen Molecules.

31. K. Fukui, in The World of Quantum Chemistry, R. Daudel and B. Pullman, Eds., D. Reidel,Dordrecht, The Netherlands, 1974, pp. 113. The Charge and Spin Transfers in ChemicalReaction Paths.

32. G. K. Schenter, B. C. Garrett, and D. G. Truhlar, J. Chem. Phys., 119, 5828 (2003).Generalized Transition State Theory in Terms of the Potential of Mean Force.

33. G. K. Schenter, B. C. Garrett, and D. G. Truhlar, J. Phys. Chem. B, 105, 9672 (2001). TheRole of Collective Solvent Coordinates and Nonequilibrium Solvation in Charge-TransferReactions.

34. P. L. Fast and D. G. Truhlar, J. Chem. Phys., 109, 3721 (1998). Variational Reaction PathAlgorithm.

35. B. C. Garrett and D. G. Truhlar, J. Am. Chem. 50c, 101, 4534 (1979). Generalized TransitionState Theory. Bond Energy–Bond Order Method for Canonical Variational Calculationswith Application to Hydrogen Atom Transfer Reactions.

36. A. Tweedale and K. J. Laidler, J. Chem. Phys., 53, 2045 (1970). Vibrationally AdiabaticModel for the Dyamics of HþH2 Systems.

37. J. C. Keck, Adv. Chem. Phys., 13, 85 (1967). Variational Theory of Reaction Rates.

38. E. Wigner, Z. Physik Chem. B, B19, 203 (1932). On the Penetration of Potential EnergyBarriers in Chemical Reactions.

39. M. A. Eliason and J. O. Hirschfelder, J. Chem. Phys., 30, 1426 (1956). General CollisionTheory Treatment for the Rate of Bimolecular, Gas Phase Reactions.

40. C. Steel and K. J. Laidler, J. Chem. Phys., 34, 1827 (1961). High Frequency Factors inUnimolecular Reactions.

41. B. C. Garrett, D. G. Truhlar, R. S. Grev, and A. W. Magnuson, J. Phys. Chem., 84, 1730(1980). Improved Treatment of Threshold Contributions in Variational Transition-StateTheory.

42. J. O. Hirschfelder and E. Wigner, J. Chem. Phys., 7, 616 (1939). Some Quantum-MechanicalConsiderations in the Theory of Reactions Involving an Activation Energy.

43. W. H. Miller, J. Chem. Phys., 65, 2216 (1976). Unified Statistical Model for ’’Complex’’ and’’Direct’’ Reaction Mechanisms.

44. B. C. Garrett andD.G. Truhlar, J. Chem. Phys., 76, 1853 (1982). Canonical Unified StatisticalModel. Classical Mechanical Theory and Applications to Collinear Reactions.


45. D. G. Truhlar and B. C. Garrett, J. Phys. Chem. A, 107, 4006 (2003). Reduced Mass in theOne-Dimensional Treatment of Tunneling.

46. G. Gamow, Z. Phys., 51, 204 (1928). Quantum Theory of the Atomic Nucleus.

47. E. C. Kemble, The Fundamental Principles of Quantum Mechanics With Elementary Appli-cations, Dover Publications, New York, 1937.

48. R. P. Bell, Proc. Royal Soc. A, 139, 466 (1933). The Application of Quantum Mechanics toChemical Kinetics.

49. R. P. Bell, Trans. Faraday Soc., 55, 1 (1959). The Tunnel Effect Correction for ParabolicPotential Barriers.

50. R. T. Skodje and D. G. Truhlar, J. Phys. Chem., 85, 624 (1981). Parabolic TunnelingCalculations.

51. C. Eckart, Phys. Rev., 35, 1303 (1930). The Penetration of a Potential Barrier by Electrons.

52. R. A. Marcus, J. Chem. Phys., 49, 2617 (1968). Analytical Mechanics of Chemical Reactions.IV. Classical Mechanics of Reactions in Two Dimensions.

53. I. Shavitt, J. Chem. Phys., 49, 4048 (1968). Correlation of Experimental Rate Constants of theHydrogen Exchange Reactions with a Theoretical H3 Potential Surface, Using Transition-State Theory.

54. D. G. Truhlar and A. Kuppermann, J. Am. Chem. Soc., 93, 1840 (1971). Exact TunnelingCalculations.

55. K. Fukui, S. Kato, and H. Fujimoto, J. Am. Chem. Soc., 97, 1 (1975). Constituent Analysis ofthe Potential Gradient Along a Reaction Coordinate. Method and an Application toMethane þ Tritium Reaction.

56. M. C. Flanigan, A. Komornicki, and J. W. McIver, Jr. In Semiempirical Methods ofElectronic Structure Calculation, Part B: Applications, G. A. Segal, Ed., Plenum, NewYork, 1977, pp. 1–47.

57. C. Peng, P. Y. Ayala, H. B. Schlegel, and M. J. Frisch, J. Comput. Chem., 17, 49 (1996).Using Redundant Internal Coordinates to Optimize Equilibrium Geometries and TransitionStates.

58. P. Y. Ayala and H. B. Schlegel, J. Chem. Phys., 107, 375 (1997). A Combined Method forDetermining Reaction Paths, Minima, and Transition State Geometries.

59. V. S. Melissas, D. G. Truhlar, and B. C. Garrett, J. Chem. Phys., 96, 5758 (1992). OptimizedCalculations of Reaction Paths and Reaction-Path Functions for Chemical Reactions.

60. M. W. Schmidt, M. S. Gordon, and M. Dupuis, J. Am. Chem. Soc., 107, 2585 (1985). TheIntrinsic Reaction Coordinate and the Rotational Barrier in Silaethylene.

61. B. C. Garrett, M. J. Redmon, R. Steckler, D. G. Truhlar, K. K. Baldridge, D. Bartol, M. W.Schmidt, and M. S. Gordon, J. Phys. Chem., 92, 1476 (1988). Algorithms and AccuracyRequirements for Computing Reaction Paths by the Method of Steepest Descent.

62. K. K. Baldridge, M. S. Gordon, R. Steckler, and D. G. Truhlar, J. Phys. Chem., 93, 5107(1989). Ab Initio Reaction Paths and Direct Dynamics Calculations.

63. M. Page and J. W. McIver, Jr., J. Chem. Phys., 88, 922 (1988). On Evaluating the ReactionPath Hamiltonian.

64. J. Villa and D. G. Truhlar, Theor. Chem. Acc., 97, 317 (1997). Variational Transition StateTheory Without the Minimum-Energy Path.

65. W. H. Miller, N. C. Handy, and J. E. Adams, J. Chem. Phys., 72, 99 (1980). Reaction PathHamiltonian for Polyatomic Molecules.

66. G. A. Natanson,Mol. Phys., 46, 481 (1982). Internal Motion of a Nonrigid Molecule and itsRelation to the Reaction Path.

67. G. A.Natanson, B. C. Garrett, T. N. Truong, T. Joseph, andD.G. Truhlar, J. Chem. Phys., 94,7875 (1991). The Definition of Reaction Coordinates for Reaction-Path Dynamics.

68. C. F. Jackels, Z. Gu, and D. G. Truhlar, J. Chem. Phys., 102, 3188 (1995). Reaction-PathPotential and Vibrational Frequencies in Terms of Curvilinear Internal Coordinates.

References 225

69. G. Herzberg, Molecular Spectra and Molecular Structure. II. Infrared and Raman Spectra ofPolyatomic Molecules, D. Van Nostrand, Princeton, New Jersey, 1945.

70. A. D. Isaacson, D. G. Truhlar, K. Scanlon, and J. Overend, J. Chem. Phys., 75, 3017 (1981).Tests of Approximation Schemes for Vibrational Energy Levels and Partition Functions forTriatomics: H2O and SO2.

71. P. Pulay and G. Fogarasi, J. Chem. Phys., 96, 2856 (1992). Geometry Optimization inRedundant Internal Coordinates.

72. Y.-Y. Chuang andD. G. Truhlar, J. Phys. Chem. A, 102, 242 (1998). Reaction-PathDynamicsin Redundant Internal Coordinates.

73. D. F. McIntosh and K. H. Michelian, Can. J. Spectrosc., 24, 1 (1979). The Wilson GFMatrixMethod of Vibrational Analysis. Part I: General Theory.

74. D. F.McIntosh and K. H.Michelian,Can. J. Spectrosc., 24, 35 (1979). TheWilsonGFMatrixMethod of Vibrational Analysis. Part II. Theory and Worked Examples of the Constructionof the B Matrix.

75. D. F.McIntosh and K. H.Michelian,Can. J. Spectrosc., 24, 65 (1979). TheWilsonGFMatrixMethod of Vibrational Analysis. Part III: Worked Examples of The Vibrational Analysis ofCarbon Dioxide and Water.

76. S. J. Klippenstein, J. Chem. Phys., 94, 6469 (1991). A Bond Length Reaction Coordinate forUnimolecular Reactions. II. Microcanonical and Canonical Implementations with Applica-tion to the Dissociation of NCNO.

77. S. J. Klippenstein, J. Chem. Phys., 96, 367 (1992); Erratum: 96, 5558 (1992). VariationalOptimizations in the Rice–Ramsperger–Kassel–Marcus Theory Calculations For Unimo-lecular Dissociations With No Reverse Barrier.

78. J. Villa, A. Gonzalez-Lafont, J. M. Lluch, and D. G. Truhlar, J. Am. Chem. Soc., 120, 5559(1998). Entropic Effects on the Dynamical Bottleneck Location and Tunneling Contribu-tions for C2H4 þ H ! C2H5. Variable Scaling of External Correlation Energy forAssociation Reactions.

79. J. Villa, J. C. Corchado, A. Gonzalez-Lafont, J. M. Lluch, and D. G. Truhlar, J. Am. Chem.Soc., 120, 12141 (1998). Explanation of Deuterium and Muonium Kinetic Isotope Effectsfor Hydrogen Atom Addition to an Olefin.

80. J. Villa, J. C. Corchado, A. Gonzalez-Lafont, J.M. Lluch, andD.G. Truhlar, J. Phys. Chem.A,103, 5061 (1999). Variational Transition State Theory with Optimized Orientation of theDividing Surface and Semiclassical Tunneling Calculations for Deuterium and MuoniumKinetic Isotope Effects in the Free Radical Association Reaction H þ C2H4 ! C2H5.

81. D. A. Wardlaw and R. A. Marcus, J. Chem. Phys. 83, 3462 (1985). Unimolecular ReactionRate Theory for Transition States of Partial Looseness. II. Implementation and Analysis withApplications to NO2 and C2H6 Dissociations.

82. D. M. Wardlaw and R. A. Marcus, Adv. Chem. Phys. 107, 9776 (1988). On the StatisticalTheory of Unimolecular Processes.

83. S. J. Klippenstein, J. Phys. Chem. 98, 11459 (1994). An Efficient Procedure for Evaluating theNumber of Available States within a Variably Defined Reaction Coordinate Framework.

84. M. Pesa, M. J. Pilling, S. H. Robertson, and D. M. Wardlaw, J. Phys. Chem. A, 102, 8526(1998). Application of the Canonical Flexible Transition State Theory to CH3, CF3, andCCl3 Recombination Reactions.

85. S. C. Smith, J. Chem. Phys., 111, 1830 (1999). Classical Flux Integrals in Transition StateTheory: Generalized Reaction Coordinates.

86. S. Robertson, A. F. Wagner, and D. M. Wardlaw, J. Phys. Chem. A, 106, 2598 (2002).Flexible Transition State Theory for a Variable Reaction Coordinate: Analytical Expressionsand an Application.

87. Y. Georgievskii and S. J. Klippenstein, J. Chem. Phys., 118, 5442 (2003). Variable ReactionCoordinate Transition State Theory: Analytic Results and Application to the C2H3 þ H !C2H4 Reaction.


88. Y. Georgievskii and S. J. Klippenstein, J. Phys. Chem. A, 107, 9776 (2003). Transition StateTheory for Multichannel Addition Reactions: Multifaceted Dividing Surfaces.

89. Y. Georgievskii and S. J. Klippenstein, J. Chem. Phys., 122, 194103 (2005). Long-RangeTransition State Theory.

90. Y.-Y. Chuang and D. G. Truhlar, J. Chem. Phys., 112, 1221 (2000); Erratum: 124, 179903(2006). Statistical Thermodynamics of Bond Torsional Modes.

91. Y.-P. Liu, D.-h. Lu, A. Gonzalez-Lafont, D. G. Truhlar, and B. C. Garrett, J. Am.Chem. Soc., 115, 7806 (1993). Direct Dynamics Calculation of the Kinetic IsotopeEffect for an Organic Hydrogen-Transfer Reaction, Including Corner-Cutting Tunnelingin 21 Dimensions.

92. K. S. Pitzer and W. D. Gwinn, J. Chem. Phys., 10, 428 (1942). Energy Levels and Thermo-dynamic Function for Molecules with Internal Rotation.

93. K. S. Pitzer, J. Chem. Phys., 14, 239 (1946). Energy Levels and Thermodynamic Functionsfor Molecules with Internal Rotation: II. Unsymmetrical Tops Attached to a RigidFrame.

94. D. G. Truhlar, J. Comput. Chem., 12, 266 (1991). A Simple Approximation for theVibrational Partition Function of a Hindered Internal Rotation.

95. B. A. Ellingson, V. A. Lynch, S. L. Mielke, and D. G. Truhlar, J. Chem. Phys., 125, 84305(2006). Statistical Thermodynamics of Bond Torsional Modes. Tests of Separable, Almost-Separable, and Improved Pitzer–Gwinn Approximations.

96. G. Herzberg, Molecular Spectra and Molecular Structure. I. Spectra of Diatomic Molecules,Van Nostrand Reinhold, Princeton, New Jersey, 1950.

97. A. D. Isaacson and D. G. Truhlar, J. Chem. Phys., 76, 1380 (1982). Polyatomic CanonicalVariational Theory for Chemical Reaction Rates. Separable-mode FormalismWith Applica-tion to Hydroxyl Radical þ Diatomic Hydrogen ! Water þ Atomic Hydrogen.

98. D. G. Truhlar, J. Mol. Spect., 38, 415 (1971). Oscillators with Quartic Anharmonicity:Approximate Energy Levels.

99. B. C. Garrett and D. G. Truhlar, J. Phys. Chem., 83, 1915 (1979). Importance of QuarticAnharmonicity for Bending Partition Functions in Transition-State Theory.

100. K. A. Nguyen, C. F. Jackels, and D. G. Truhlar, J. Chem. Phys., 104, 6491 (1996). Reaction-Path Dynamics in Curvilinear Internal Coordinates Including Torsions.

101. Y.-Y. Chuang and D. G. Truhlar, J. Chem. Phys., 107, 83 (1997). Reaction-Path DynamicswithHarmonic Vibration Frequencies in Curvilinear Internal Coordinates: Hþ trans-N2H2

! NH2 þ H2.

102. B. C. Garrett and D. G. Truhlar, J. Chem. Phys., 79, 4931 (1983). A Least-ActionVariational Method for CalculatingMultidimensional Tunneling Probabilities for ChemicalReactions.

103. T. C. Allison and D. G. Truhlar, in Modern Methods for Multidimensional DynamicsComputations in Chemistry, D. L. Thompson, Ed., World Scientific, Singapore, 1998,pp. 618–712. Testing the Accuracy of Practical Semiclassical Methods: Variational Transi-tion State Theory With Optimized Multidimensional Tunneling.

104. Y.-P. Liu, G. C. Lynch, T.N. Truong,D.-h. Lu, D. G. Truhlar, and B. C. Garrett, J. Am. Chem.Soc., 115, 2408 (1993). Molecular Modeling of the Kinetic Isotope Effect for the [1,5]-Sigmatropic Rearrangement of cis-1,3-Pentadiene.

105. D. G. Truhlar and B. C. Garrett, in Annual Review of Physical Chemistry, Vol. 35, B. S.Rabinovitch, J. M. Schurr, and H. L. Strauss, Eds., Annual Reviews, Inc., Palo Alto,California, 1984, pp. 159–189. Variational Transition State Theory.

106. M. M. Kreevoy and D. G. Truhlar, in Investigation of Rates and Mechanisms of Reactions,Fourth edition, Part 1, C. F. Bernasconi, Ed., Wiley, New York, 1986, pp. 13-95. TransitionState Theory.

107. D. G. Truhlar and B. C. Garrett, Journal de Chimie Physique, 84, 365 (1987). DynamicalBottlenecks and Semiclassical Tunneling Paths for Chemical Reactions.

References 227

108. B. C. Garrett, T. Joseph, T. N. Truong, and D. G. Truhlar, Chem. Phys., 136, 271 (1989).Application of the Large-Curvature Tunneling Approximation to Polyatomic Molecules:Abstraction of H or D by Methyl Radical.

109. T. N. Truong, D.-h. Lu, G. C. Lynch, Y.-P. Liu, V. S. Melissas, J. J. P. Stewart, R. Steckler,B. C. Garrett, A. D. Isaacson, A. Gonzalez-Lafont, S. N. Rai, G. C. Hancock, T. Joseph, andD. G. Truhlar, Comput. Phys. Commun., 75, 143 (1993). MORATE: A Program for DirectDynamics Calculations of Chemical Reaction Rates by Semiempirical Molecular OrbitalTheory.

110. A. Fernandez-Ramos and D. G. Truhlar, J. Chem. Phys., 114, 1491 (2001). ImprovedAlgorithm for Corner-Cutting Tunneling Calculations.

111. J. Pu, J. C. Corchado, and D. G. Truhlar, J. Chem. Phys., 115, 6266 (2001). Test ofVariational Transition State Theory With Multidimensional Tunneling ContributionsAgainst an Accurate Full-Dimensional Rate Constant Calculation for a Six-Atom System.

112. J. Pu andD.G.Truhlar, J. Chem.Phys.,117, 1479 (2002).Validation ofVariational TransitionState Theory with Multidimensional Tunneling Contributions Against Accurate QuantumMechanical Dynamics for H þ CH4 ! H2 þ CH3 in an Extended Temperature Interval.

113. R. A. Marcus, J. Chem. Phys., 45, 4493 (1966). On the Analytical Mechanics of ChemicalReactions. Quantum Mechanics of Linear Collisions.

114. A. Kuppermann, J. T. Adams, and D. G. Truhlar, in Abstractions of Papers, VIII ICPEAC,Beograd, 1973, B. C. Cubic and M. V. Kurepa, Eds., Institute of Physics, Belgrade, Serbia.1973 pp. 149–150.

115. R. A. Marcus and M. E. Coltrin, J. Chem. Phys., 67, 2609 (1977). A New Tunneling Path forReactions Such as HþH2 ! H2þH.

116. R. T. Skodje, D. G. Truhlar, and B. C. Garrett, J. Chem. Phys., 77, 5955 (1982). VibrationallyAdiabatic Models for Reactive Tunneling.

117. M. M. Kreevoy, D. Ostovic, D. G. Truhlar, and B. C. Garrett, J. Phys. Chem., 90, 3766(1986). Phenomenological Manifestations of Large-Curvature Tunneling in Hydride Trans-fer Reactions.

118. D. G. Truhlar and M. S. Gordon, Science, 249, 491 (1990). From Force Fields to Dynamics:Classical and Quantal Paths.

119. Y. Kim, D. G. Truhlar, and M. M. Kreevoy, J. Am. Chem. Soc., 113, 7837 (1991). AnExperimentally Based Family of Potential Energy Surfaces for Hydride Transfer BetweenNADþ Analogues.

120. A. Fernandez-Ramos, D. G. Truhlar, J. C. Corchado, and J. Espinosa-Garcia, J. Phys. Chem.A, 106, 4957 (2002). Interpolated Algorithm for Large-Curvature Tunneling Calculations ofTransmission Coefficients for Variational Transition State Theory Calculations of ReactionRates.

121. A. Fernandez-Ramos and D. G. Truhlar, J. Chem. Theory Comput., 1, 1063 (2005). A NewAlgorithm for Efficient Direct Dynamics Calculations of Large-Curvature Tunneling and itsApplication to Radical Reactions with 9–15 Atoms.

122. G. C. Lynch, P. Halvick, D. G. Truhlar, B. C. Garrett, D. W. Schwenke, and D. J. Kouri,Z. Naturforsch., 44a, 427 (1989). Semiclassical and Quantum Mechanical Calculations ofIsotopic Kinetic Branching Ratios for the Reaction of O(3P) with HD.

123. D. C. Chatfield, R. S. Friedman, D. G. Truhlar, and D.W. Schwenke, FaradayDiscuss. Chem.Soc., 91, 289 (1991). Quantum-Dynamical Characterization of Reactive Transition States.

124. B. C. Garrett, N. Abusalbi, D. J. Kouri, and D. G. Truhlar, J. Chem. Phys., 83, 2252 (1985).Test of Variational Transition State Theory and the Least-Action Approximation forMultidimensional Tunneling Probabilities Against Accurate Quantal Rate Constants fora Collinear Reaction Involving Tunneling into an Excited State.

125. D. G. Truhlar, J. Chem. Soc. Faraday Trans., 90, 1740 (1994). General Discussion.

126. B. C. Garrett and D. G. Truhlar, J. Phys. Chem., 89, 2204 (1985). Generalized TransitionState Theory and Least-Action Tunneling Calculations for the Reaction Rates of Atomic


Hydrogen(Deuterium) þ Molecular Hydrogen (n ¼ 1)! Molecular Hydrogen(HydrogenDeuteride) þ Atomic Hydrogen.

127. S. C. Tucker, D. G. Truhlar, B. C. Garrett, and A. D. Isaacson, J. Chem. Phys., 82, 4102(1985). Variational Transition State Theory With Least-Action Tunneling Calculations forthe Kinetic Isotope Effects in the Atomic ChlorineþMolecular Hydrogen Reaction: Tests ofExtended-LEPS, Information-Theoretic, and Diatomics-in-Molecules Potential Energy Sur-faces.

128. A. Fernandez-Ramos, Z. Smedarchina, M. Zgierski, W. Siebrand, and M. A. Rios, J. Am.Chem. Soc., 121, 6280 (1999). Direct-Dynamics Approaches to Proton Tunneling RateConstants. A Comparative Test for Molecular Inversions and Application to 7-Azain-dole.

129. M. Y. Ovchinnikova, Chem. Phys., 36, 85 (1979). The Tunneling Dynamics of the Low-Temperature Hydrogen Atom Exchange Reactions.

130. V. K. Babamov and R. A. Marcus, J. Chem. Phys., 74, 1790 (1981). Dynamics of HydrogenAtom and Proton Transfer Reactions. Symmetric Case.

131. D. K. Bondi, J. N. L. Connor, B. C. Garrett, and D. G. Truhlar, J. Chem. Phys., 78, 5981(1983). Test of Variational Transition State Theory with a Large-Curvature TunnelingApproximation Against Accurate Quantal Reaction Probabilities and Rate Coefficients forThree Collinear Reactions with Large Reaction-Path Curvature: Atomic ChlorineþHydro-gen Chloride, Atomic Chlorine þ Deuterium Chloride, and Atomic Chlorine þ MuCl.

132. A. Gonzalez-Lafont, T. N. Truong, and D. G. Truhlar, J. Phys. Chem., 95, 4618 (1991).Direct Dynamics Calculations with NDDO (Neglect of Diatomic Differential Overlap)Molecular Orbital Theory with Specific Reaction Parameters.

133. Y. Kim, J. C. Corchado, J. Villa, J. Xing, andD. G. Truhlar, J. Chem. Phys., 112, 2718 (2000).Multiconfiguration Molecular Mechanics Algorithm for Potential Energy Surfaces of Che-mical Reactions.

134. T. V. Albu, J. C. Corchado, andD. G. Truhlar, J. Phys. Chem. A, 105, 8465 (2001).MolecularMechanics for Chemical Reactions: A Standard Strategy for Using MulticonfigurationMolecular Mechanics for Variational Transition State Theory with Optimized Multidimen-sional Tunneling.

135. H. Lin, J. Pu, T. V. Albu, and D. G. Truhlar, J. Phys. Chem. A, 108, 4112 (2004).Efficient Molecular Mechanics for Chemical Reactions Using Partial Electronic StructureHessians.

136. Y.-Y. Chuang, P. L. Fast, W.-P. Hu, G. C. Lynch, Y.-P. Liu, and D. G. Truhlar, MORATE—version 8.5. Available: http://comp.chem.umn.edu/morate.

137. J. C. Corchado, Y.-Y. Chuang, E. L. Coitino, and D. G. Truhlar, GAUSSRATE—version 9.4.Available: http://comp.chem.umn.edu/gaussrate.

138. Y.-Y. Chuang, J. C. Corchado, J. Pu, and D. G. Truhlar, GAMESSPLUSRATE—version 9.3.Available: http://comp.chem.umn.edu/gamessplusrate.

139. J. Pu, J. C. Corchado, B. J. Lynch, P. L. Fast and D. G. Truhlar, MULTILEVELRATE—version 9.3. Available: http://comp.chem.umn.edu/multilevelrate.

140. T. V. Albu, J. C. Corchado, Y. Kim, J. Villa, J. Xing, H. Lin, and D. G. Truhlar, MC-TINKERATE—version 9.1. Available: http://comp.chem.umn.edu/mc-tinkerate.

141. M. Garcia-Viloca, C. Alhambra, J. Corchado, M. Luz Sanchez, J. Villa, J. Gao, and D. G.Truhlar, CRATE—version 9.0. Available: http://comp.chem.umn.edu/crate.

142. D. G. Truhlar, in The Reaction Path in Chemistry: Current Approaches and Perspectives, D.Heidrich, Ed., Kluwer, Dordrecht, The Netherlands, 1995, pp. 229-255. Direct DynamicsMethod for Calculations of Reaction Rates.

143. Y.-Y. Chuang, and D. G. Truhlar, J. Phys. Chem. A, 101, 3808, 8741(E) (1997). ImprovedDual-Level Direct Dynamics Method for Reaction Rate Calculations with Inclusion ofMultidimensional Tunneling Effects and Validation for the Reaction of H with trans-N2H2.

References 229

144. I. Rossi and D. G. Truhlar, Chem. Phys. Lett. 223, 231 (1995). Parameterization of NDDOWavefunctions using Genetic Algorithms: An Evolutionary Approach to ParameterizingPotential Energy Surfaces and Direct Dynamics Calculations for Organic Reactions.

145. J. A. Pople, D. P. Santry, and G. A. Segal, J. Chem. Phys., 43, S129 (1965). Approximate Self-Consistent Molecular Orbital Theory. I. Invariant Procedures.

146. J. A. Pople and D. J. Beveridge, Approximate Molecular Orbital Theory, McGraw-Hill, NewYork, 1970.

147. M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, and J. J. P. Stewart, J. Am. Chem. Soc., 107, 3902(1985). Development and Use of QuantumMechanicalMolecularModels. 76. AM1: ANewGeneral Purpose Quantum Mechanical Molecular Model.

148. M. J. S. Dewar and E. G. Zoebisch, J. Mol. Struct. (THEOCHEM), 180, 1 (1988). Extensionof AM1 to the Halogens.

149. A. Warshel and R. M. Weiss, J. Am. Chem. Soc., 102, 6218 (1980). An Empirical ValenceBond Approach for Comparing Reactions in Solutions and in Enzymes.

150. Y. T. Chang and W. H. Miller, J. Phys. Chem., 94, 5884 (1990). An Empirical Valence BondModel for Constructing Global Potential Energy Surfaces for Chemical Reactions ofPolyatomic Molecular Systems.

151. Y. T. Chang, C. Minichino, and W. H. Miller, J. Chem. Phys., 96, 4341 (1992). ClassicalTrajectory Studies of the Molecular Dissociation Dynamics of Formaldehyde: H2CO!H2

þ CO.

152. J. Ischtwan andM. A. Collins, J. Chem. Phys., 100, 8080 (1994). Molecular Potential EnergySurfaces by Interpolation.

153. K. A. Nguyen, I. Rossi, and D. G. Truhlar, J. Chem. Phys., 103, 5222 (1995). A Dual-LevelShepard Interpolation Method for Generating Potential Energy Surfaces for DynamicsCalculations.

154. J. C. Corchado, E. L. Coitino, Y.-Y. Chuang, P. L. Fast, and D. G. Truhlar, J. Phys. Chem. A,102, 2424 (1998). Interpolated Variational Transition-State Theory by Mapping.

155. W.-P. Hu, Y.-P. Liu, and D. G. Truhlar, J. Chem. Soc., Faraday Trans., 90, 1715 (1994).Variational Transition-State Theory and Semiclassical Tunneling Calculations With Inter-polated Corrections: A New Approach to Interfacing Electronic Structure Theory andDynamics for Organic Reactions.

156. Y.-Y. Chuang, J. C. Corchado, and D. G. Truhlar, J. Phys. Chem. A, 103, 1140 (1999).Mapped Interpolation Scheme for Single-Point Energy Corrections in Reaction Rate Calcu-lations and a Critical Evaluation of Dual-Level Reaction Path Dynamics Methods.

157. D. G. Truhlar, J. Chem. Educ., 62, 104 (1985). Nearly Encounter-Controlled Reactions: TheEquivalence of the Steady-State and Diffusional Viewpoints.

158. K. A. Connors, Chemical Kinetics: The Study of Reaction Rates in Solution; VCH Publishers,New York, 1990, pp. 207–208.

159. Y.-Y. Chuang, C. J. Cramer, and D. G. Truhlar, Int. J. Quantum Chem., 70, 887 (1998).Interface of Electronic Structure and Dynamics for Reactions in Solution.

160. M. J. Pilling and P. W. Seakins, Reaction Kinetics, Oxford University Press, Oxford, UnitedKingdom, 1995, pp. 155–156.

161. C. J. Cramer and D. G. Truhlar, in Reviews in Computational Chemistry, Vol. 6, K. B.Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1995, pp. 1–72. ContinuumSolvation Models: Classical and Quantum Mechanical Implementations.

162. C. J. Cramer and D. G. Truhlar, in Solvent Effects and Chemical Reactivity, O. Tapia and J.Bertran, Eds., Kluwer, Dordrecht, The Netherlands, 1996, pp. 1–80. [Understanding Chem.React. 17, 1–80 (1996).] Continuum Solvation Models.

163. D. J. Giesen, C. C. Chambers, G. D. Hawkins, C. J. Cramer, and D. G. Truhlar, inComputational Thermochemistry, K. Irikura and D. J. Frurip, Eds., American ChemicalSociety Symposium Series Volume 677, Washington, D.C., 1998, pp. 285–300. ModelingFree Energies of Solvation and Transfer.


164. G. D. Hawkins, T. Zhu, J. Li, C. C. Chambers, D. J. Giesen, D. A. Liotard, C. J. Cramer, andD. G. Truhlar, in Combined Quantum Mechanical and Molecular Mechanical Methods, J.Gao andM. A. Thompson, Eds., American Chemical Society Symposium Series Volume 712,Washington, D.C., 1998, pp. 201–219. Universal Solvation Models.

165. J. Li, G. D. Hawkins, C. J. Cramer, and D. G. Truhlar, Chem. Phys. Lett., 288, 293 (1998).Universal Reaction Field Model Based on Ab Initio Hartree-Fock Theory.

166. T. Zhu, J. Li, G. D. Hawkins, C. J. Cramer, and D. G. Truhlar, J. Chem. Phys., 109, 9117(1998). Density Functional Solvation Model Based on CM2 Atomic Charges.

167. C. J. Cramer and D. G. Truhlar, Chem. Rev., 99, 2161 (1999). Implicit Solvation Models:Equilibria, Structure, Spectra, and Dynamics.

168. J. Li, T. Zhu, G. D. Hawkins, P. Winget, D. A. Liotard, C. J. Cramer, and D. G. Truhlar,Theor. Chem. Acc., 103, 9 (1999). Extension of the Platform of Applicability of theSM5.42R Universal Solvation Model.

169. C. J. Cramer and D. G. Truhlar, in Free Energy Calculations in Rational Drug Design, M. R.Reddy and M. D. Erion, Eds., Kluwer Academic/Plenum, New York, 2001, pp. 63–95.Solvation Thermodynamics and the Treatment of Equilibrium and Nonequilibrium Solva-tion Effects by Models Based on Collective Solvent Coordinates.

170. J. D. Thompson, C. J. Cramer, and D. G. Truhlar, J. Phys. Chem. A, 108, 6532 (2004). NewUniversal Solvation Model and Comparison of the Accuracy of Three Continuum SolvationModels, SM5.42R, SM5.43R, and C-PCM, in Aqueous Solution and Organic Solvents andfor Vapor Pressures.

171. C. P. Kelly, C. J. Cramer, and D. G. Truhlar, J. Chem. Theory Comput., 1, 1133 (2005). SM6:A Density Functional Theory Continuum Solvation Model for Calculating Aqueous Solva-tion Free Energies of Neutrals, Ions, and Solute-Water Clusters.

172. D. A. McQuarrie, Statistical Mechanics, Harper & Row, New York, 1976, pp. 266.

173. D. G. Truhlar, Y.-P. Liu, G. K. Schenter, and B. C. Garrett, J. Phys. Chem., 98, 8396 (1994).Tunneling in the Presence of a Bath: A Generalized Transition State Theory Approach.

174. Y.-Y. Chuang, and D. G. Truhlar, J. Am. Chem. Soc., 121, 10157 (1999). NonequilibriumSolvation Effects for a Polyatomic Reaction in Solution.

175. C. Alhambra, J. Corchado, M. L. Sanchez, M. Garcia-Viloca, J. Gao, and D. G. Truhlar,J. Phys. Chem. B, 105, 11326 (2001). Canonical Variational Theory for Enzyme Kineticswith the Protein Mean Force and Multidimensional Quantum Mechanical TunnelingDynamics. Theory and Application to Liver Alcohol Dehydrogenase.

176. D. G. Truhlar, J. Gao, C. Alhambra, M. Garcia-Viloca, J. Corchado, M. L. Sanchez, and J.Villa, Acc. Chem. Res., 35, 341 (2002). The Incorporation of Quantum Effects in EnzymeKinetics Modeling.

177. M. Garcia-Viloca, C. Alhambra, D. G. Truhlar, and J. Gao, J. Comput. Chem., 24, 177(2003). Hydride Transfer Catalyzed byXylose Isomerase:Mechanism andQuantumEffects.

178. T. D. Poulsen, M. Garcia-Viloca, J. Gao, and D. G. Truhlar, J. Phys. Chem. B,107, 9567(2003). Free Energy Surface, Reaction Paths, and Kinetic Isotope Effect of Short-Chain Acyl-CoA Dehydrogenase.

179. D. G. Truhlar, J. Gao,M. Garcia-Viloca, C. Alhambra, J. Corchado,M. L. Sanchez, and T. D.Poulsen, Int. J. Quantum Chem., 100, 1136 (2004). Ensemble-Averaged VariationalTransition State Theory with Optimized Multidimensional Tunneling for Enzyme Kineticsand Other Condensed-Phase Reactions.

180. D. G. Truhlar, in Isotope Effects in Chemistry and Biology, A. Kohen and H.-H. Limbach,Eds., Marcel Dekker, Inc., New York, 2006, pp. 579–620. Variational Transition StateTheory and Multidimensional Tunneling for Simple and Complex Reactions in the GasPhase, Solids, Liquids, and Enzymes.

181. M. J. Rothman, L. L. Lohr, Jr., C. S. Ewig, and J. R. VanWazer, in Potential Energy Surfacesand Dynamics Calculations, D. G. Truhlar, Ed., Plenum, New York, 1981, pp. 653–660.Application of the Energy Minimization Method to a Search for the Transition State for theH2 þ D2 Exchange Reaction.

References 231

182. R. Steckler and D. G. Truhlar, J. Chem. Phys., 93, 6570 (1990). Reaction-Path Power SeriesAnalysis of NH3 Inversion.

183. D. Heidrich, in The Reaction Path in Chemistry, D. Heidrich, Ed., Kluwer, Dordrecht, TheNetherlands, 1995, pp. 1–10. An Introduction to the Nomenclature and Usage of theReaction Path Concept.

184. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus,J. Comput. Chem., 4, 187 (1983). CHARMM: A Program for Macromolecular Energy,Minimisation and Dynamics Calculations.

185. G. N. Patey and J. P. Valleau, Chem. Phys. Lett., 21, 297 (1973). The Free Energy of Sphereswith Dipoles: Monte Carlo with Multistage Sampling.

186. G. N. Patey and J. P. Valleau, J. Chem. Phys., 63, 2334 (1975). A Monte Carlo Method forObtaining the Interionic Potential of Mean Force in Ionic Solution.

187. G. M. Torrie and J. P. Valleau, J. Comput. Phys., 23, 187 (1977). Nonphysical SamplingDistributions in Monte Carlo Free Energy Estimation: Umbrella Sampling.

188. M. Garcia-Viloca, C. Alhambra, D. G. Truhlar, and J. Gao, J. Chem. Phys., 114, 9953 (2001).Inclusion of QuantumMechanical Vibrational Energy in Reactive Potentials of Mean Force.

189. D.C. Chatfield, R.S. Friedman, D.W. Schwenke, and D.G. Truhlar, J. Phys. Chem, 96, 2414(1992). Control of Chemical Reactivity by Quantized Transition States.

190. B. J. Gertner, J.P. Bergsma, K. R. Wilson, S. Lee, and J. T. Hynes, J. Chem. Phys., 86, 1377(1987). Nonadiabatic Solvation Model for SN2 Reactions in Polar Solvents.

191. W. P. Kierstad, K. R. Wilson, and J. T. Hynes, J. Chem. Phys., 95, 5256 (1991). MolecularDynamics of a Model SN1 Reaction in Water.

192. J. T. Hynes, in Solvent Effects and Chemical Reactivity, O. Tapia and J. Bertran, Eds., Kluwer,Dordrecht, The Netherlands, 1996, pp. 231–258. Crossing the Transition State in Solution.

193. M. J. T. Jordan and R. G. Gilbert, J. Chem. Phys., 102, 5669 (1995). Classical TrajectoryStudies of the Reaction CH4 þ H ! CH3 þ H2.

194. J. M. Bowman, D. Wang, X. Huang, F. Huarte-Larranaga, and U. Manthe, J. Chem. Phys.,114, 9683 (1991). The Importance of an Accurate CH4Vibrational Partition Function in FullDimensionality Calculations of the CH4 þ H ! CH3 þ H2 Reaction.

195. F. Huarte-Larranaga and U. Manthe, J. Chem. Phys., 113, 5115 (2000). Full DimensionalQuantum Calculations of the CH4 þ H ! CH3 þ H2 Reaction Rate.

196. F. Huarte-Larranaga and U. Manthe, J. Phys. Chem. A, 105, 2522 (2001). QuantumDynamics of the CH4 þ H ! CH3 þ H2 Reaction. Full Dimensional and ReducedDimensionality Rate Constants Calculations.

197. C. P. Kelly, J. D. Xidos, J. Li, J. D. Thompson, G. D. Hawkins, P. D. Winget, T. Zhu,D. Rinaldi, D. A. Liotard, C. J. Cramer, D. G. Truhlar, and M. J. Frisch, MN-GSM, version5.2, Univeristy of Minnesota, Minneapolis, Minnesota, 55455-0431, 2005.

198. M. J. Frisch, G.W. Trucks, H. B. Schlegel, G. E. Scuseria,M. A. Robb, J. R. Cheeseman, V. G.Zakrzewski, J. A.Montgomery, R. E. Stratmann, J. C. Burant, S. Dapprich, J. M.Millam, A.D. Daniels, K. N. Kudin,M. C. Strain, O. Farkas, J. Tomasi, V. Barone,M. Cossi, R. Cammi,B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G. A. Petersson, P. Y. Ayala,Q. Cui, K. Morokuma, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J.Cioslowski, J. V. Ortiz, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R.Gomperts, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara,C. Gonzalez, M. Challacombe, P. M. W. Gill, B. G. Johnson, W. Chen, M. W. Wong, J. L.Andres, M. Head-Gordon, E. S. Replogle, and J. A. Pople, Gaussian 98, Revision A.3,Gaussian, Inc., Pittsburgh, Pennsylvania, 1998.


CHAPTER 4

Coarse-Grain Modeling of Polymers

Roland Faller

Department of Chemical Engineering & Materials Science, University of California, Davis, Davis, California

INTRODUCTION

Polymers are omnipresent in modern life, so it comes as no surprise that a large number of computational studies have been devoted to them. Still, even with modern supercomputers, no single molecular modeling study can derive large-scale polymer properties ab initio, in part because of the size scales and time scales involved in the simulation. What is needed are modeling techniques adapted to each relevant length scale, and these techniques have to be combined in a useful way. This review is devoted to polymer coarse-graining and explains how to do it. Not all methods that have been developed can be presented here, so a ‘‘How-To’’ approach covering a limited number of coarse-graining techniques is given. An extremely wide variety of approaches exists in the literature,1–18 and the reader is encouraged to consult the other recent reviews on polymer coarse-graining in Refs. 19–22.

Many reasons exist for applying coarse-graining schemes to polymer simulations (cf. Figure 1), the most important being that one can carry out simulations on meaningful time and size scales. With coarse-graining, the overall structure of a polymer in melts or in solution can be reproduced faithfully except for the atomistic detail. Coarse-graining reduces the computing time and memory requirements of simulations and allows for longer simulation times, larger system sizes, and longer chains. Simulations of long chains with coarse-graining are necessary because the experimentally relevant chain lengths cannot be accommodated by atomistically detailed simulations. The relevant length scales in polymer studies range from the distance between bonded atoms, which is on the order of Angstroms, to (at a minimum) the contour length of the chain, which is on the order of micrometers. Even if computing speeds keep increasing as they did over the last decades, we are still decades away from atomistic simulations of hundreds of chains, each consisting of a few thousand monomers, in a melt. Also, the relevant relaxation times increase as $N^{3.4}$ for large chains of length $N$,23 so very long simulation times are required for such polymer systems, and these will remain out of reach at atomic-level detail for the near future.

Many questions about large scales have already been answered with simple bead-spring models. These models can reproduce scaling behaviors, thereby contributing to our basic understanding of how complex systems behave. However, to obtain numerical results that can be compared directly with experiments, one needs a meso-scale model that does not represent a simple generic polymer but instead represents the identity of the specific polymer being studied. A combination of atomistic and meso-scale models is needed, and the models have to be mapped onto each other as uniquely as possible.

It is now accepted that for a simulation to be termed ‘‘multi-scale,’’ a meaningful and well-defined connection between the various length and time scales is necessary. Techniques have been devised in which simulations on more than one scale are combined to achieve a better understanding of the system as a whole.13,19,20 Molecular dynamics as well as Monte Carlo simulations have been applied in polymer coarse-graining, including techniques that combine atomistic single-chain Monte Carlo results with results from molecular dynamics simulations on the meso-scale,1 the automatic simplex mapping technique,2,24 the inverted Boltzmann method,3,25 and others.4,13,26

In addition to the techniques in which a clearly identified mapping between length scales has been established, there exists a wide variety of models that can be chosen for computational efficiency on larger scales, but where the connection to the local atomistic scale is not completely defined. Such models are still valuable because of their ability to reproduce intermediate-scale generic features of polymers at moderate simulation cost. The large group of lattice models falls into this category, for example,5,27–30 as do a number of meso-scale models of the bead-spring type.31–33

Figure 1 The challenge of coarse-graining: Atomistic simulations are easily achieved, but many important questions are focused mainly on the macroscopic length scales. (Schematic labels: 1 nm, coarse graining, 0.5 m.)

DEFINING THE SYSTEM

In any simulation, as in any experiment, one must first define the system being studied. In the case of coarse-graining or multi-scale modeling, we must consider the different models that can be used, their connections, and their relationship to experiments, among other issues.

Choice of Model

If we want to combine simulations on a variety of length scales, logic suggests that the first step is to devise a mapping of the different models being used. Mapping is used here in the mathematical sense that a unique ‘‘identification function’’ is devised. As it turns out, however, this mapping constitutes the third step of the coarse-grain modeling process. The first step is to choose what kinds of models are to be used, whereas the second step is to define at which anchoring points the mapping between those models should take place. Only when these prerequisites are fulfilled can we begin the mapping.

The first issue, the choice of models, is determined by the nature of the problem at hand. There is no ‘‘one-size-fits-all’’ solution to many of the problems associated with large-scale, coarse-grained modeling, and this fact is one of the major conclusions of this review. To decide which models to use, we need to ask two fundamental questions. First, what properties do we want to calculate or reproduce? The second question is directly connected to the first: What length scales do these properties represent and what effects from other length scales are expected to be relevant? Answers to these questions immediately provide the upper and lower limit for the degree of detail. Sometimes, it is not possible to immediately answer the question of important influences; in that case, we must begin by considering all length scales that are smaller than the largest relevant length scale.

It is obvious that computer simulations are not ‘‘reality’’ in the same sense as experiments, but a technically correct simulation will always represent truthfully the model on which it is based. So, we have to devise models that are in agreement with nature. We will focus here on molecular models, i.e., those models that incorporate individual molecules to some degree, the most common of which are the so-called atomistic models. In this kind of modeling, (almost) every atom is represented by a classical interaction site. Classical mechanics is used here to mean that the particles obey Newton’s laws. Although nature is based on quantum mechanics, in most problems in soft-condensed matter, including polymers, the direct quantum effects are negligible and can usually be omitted from a simulation. Only for motions of hydrogen atoms do quantum effects play a key role, but in most ‘‘atomistic’’ models, such hydrogens are not treated explicitly. Instead, those light hydrogen atoms are usually combined with the heavier atom to which they are attached. These united atom (UA) models thus contain only ‘‘heavy’’ atoms in which the hydrogens are subsumed. Creating such fictitious atoms for purposes of modeling is the second step in a hierarchy of coarse-graining, in which the first step was to neglect the influence of quantum mechanics.

The prediction of most local properties requires molecular-level modeling because those properties depend on the existence of individual molecules. In contrast, a density-based field model is often sufficient on larger scales. In the case of lattice models, it is obvious that using any length scale smaller than the mesh size is meaningless. In continuous space models, the length scales smaller than the particle size or bond lengths are meaningless as well. A general rule of thumb is that one should not analyze a simulation on a given length scale or a time scale unless it is at least an order of magnitude greater than the smallest intrinsic time or length scale of the system being modeled. A good example is the time step in a molecular dynamics simulation; only if a motion is described by at least 10 time steps can we refer to it as being reasonably well described.
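The rule of thumb for the time step can be written as a one-line check. The following is a minimal sketch, not a prescription: the function names are hypothetical, and the example period of roughly 11 fs is an illustrative value for a fast bond vibration at about 3000 cm⁻¹ and is not taken from the text.

```python
def steps_per_period(period_fs: float, time_step_fs: float) -> float:
    """Number of integration steps that sample one period of a motion."""
    if time_step_fs <= 0.0:
        raise ValueError("time step must be positive")
    return period_fs / time_step_fs


def is_resolved(period_fs: float, time_step_fs: float, min_steps: int = 10) -> bool:
    """Apply the rule of thumb: a motion needs at least min_steps steps per period."""
    return steps_per_period(period_fs, time_step_fs) >= min_steps


# Illustrative (assumed) numbers: a vibration with a period of about 11.1 fs.
fast_vibration_fs = 11.1
resolved_at_1fs = is_resolved(fast_vibration_fs, 1.0)  # 11.1 steps per period
resolved_at_2fs = is_resolved(fast_vibration_fs, 2.0)  # 5.55 steps per period
```

Under these assumed numbers the motion is adequately sampled with a 1 fs step but not with a 2 fs step, which is one reason why the fastest degrees of freedom are often removed or constrained before the time step is increased.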

It may be necessary to use two or more models to cover the range of relevant interactions depending on the problem at hand. To have a meaningful mapping between scales, there must be a significant overlap between the scales described by the models to be mapped onto each other. Typical models used for simulations can cover three orders of magnitude in time or length. For example, atomistic models can treat the length scales of a few hundred picometers to tens of nanometers and they can cover time periods from picoseconds to tens of nanoseconds, whereas meso-scale models are useful from a few nanometers to a few micrometers in size and from a few hundred picoseconds up to microseconds in time. In this case, the overlap between atomistic and meso-scale models is sufficient. However, if we need to enter the realm of micrometers or even millimeters and beyond in size, a second or third mapping will be necessary.

A large part of the computational literature is devoted to the mathematical identification of the mapping that we now use and also to the applicability of effective pair interactions. Technically, coarse-graining is a model reduction. Let $f_{\mathrm{small}}(\{r\},\{p\})$ be a function describing an observable in the atomistic scale model. It depends on the positions $r$ and momenta $p$ of the particles (in the case of a particle-based model). Similarly, $f_{\mathrm{large}}(\{R\},\{P\})$ is the same function in the nonatomistic model with positions $R$ and momenta $P$. It is clear that $f_{\mathrm{small}} = f_{\mathrm{large}}$ should be valid, but for which $(\{R\},\{P\})$? The equality should also hold if one of the models is field-based and the other is particle-based, i.e., $f_{\mathrm{small}}(\{r\},\{p\}) = f_{\mathrm{large}}(\rho)$ where $\rho$ represents the field. The field is often a density field and, without loss of generality, we can use the field description for the larger scale.

Interaction Sites on the Coarse-Grained Scale

In a meso-scale model, a group of atoms is often replaced by a single interaction center. This center is usually the size of a monomer in a polymer and it is often called a super-atom. Because the super-atoms are the only interaction centers in a meso-scale simulation, they are required to carry the information of the interactions between the real atoms in their local geometrical arrangements that are imposed by the chemistry of the polymer.
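To make the replacement of atom groups by super-atoms concrete, the following minimal sketch maps atomistic coordinates onto super-atom positions. It assumes, purely for illustration, that each super-atom is placed at the center of mass of its group; other anchor sites (for example, a specific atom) are equally possible, and the function names, coordinates, and masses are hypothetical.

```python
def center_of_mass(positions, masses):
    """Mass-weighted mean position of a set of atoms (3D tuples)."""
    total_mass = sum(masses)
    return tuple(
        sum(m * p[k] for p, m in zip(positions, masses)) / total_mass
        for k in range(3)
    )


def map_to_superatoms(positions, masses, groups):
    """Map atoms to super-atoms.

    groups is a list of lists of atom indices, one list per super-atom.
    Placing the super-atom at the center of mass is an assumption of this sketch.
    """
    return [
        center_of_mass([positions[i] for i in group], [masses[i] for i in group])
        for group in groups
    ]


# Toy example with four atoms grouped into two super-atoms.
atom_positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 4.0, 0.0)]
atom_masses = [1.0, 1.0, 1.0, 3.0]
superatoms = map_to_superatoms(atom_positions, atom_masses, [[0, 1], [2, 3]])
```

Applying such a function to every frame of an atomistic trajectory yields the super-atom coordinates from which distributions of meso-scale bond lengths and angles can be collected.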

The choice of which super-atoms to use is arbitrary in principle, but there exist a number of criteria to consider when making this selection. It is beneficial if the distance between super-atoms along the polymer chain is strictly defined. Figure 2 shows three possibilities for placement of super-atoms along polystyrene and the corresponding distributions of meso-scale bond lengths. This distribution is obtained by performing an atomistic simulation and, afterwards, measuring the distances between chosen super-atom centers. The choice indicated by (a) is clearly advantageous because it represents a single peak in the normalized frequency of distances. With this placement, we see that only a few atomistic torsional degrees of freedom exist between the super-atom centers; thus, there is little torsional freedom to influence the rigid structure. Single peak distributions can be modeled by a single Gaussian curve. This Gaussian is the distribution of a harmonic bond potential in which the height-to-width ratio defines the bond strength. In contrast to this situation, a multiplicity of peaks, as in (b) or (c), would lead to an interdependence of super-bonds and super-angles, making it difficult to isolate useful potential energy models.

Figure 2 Various anchor sites for the super-atoms of polystyrene. (Panels: the polystyrene repeat unit with super-atom centers (a), (b), and (c); the distributions $h(r)/r^2$ [arb. units] versus $r$ [nm] for (a), (b), and (c).) Reprinted from Computers and Chemical Engineering, Volume 29, Q. Sun and Roland Faller, Systematic Coarse-Graining of Atomistic Models for Simulation of Polymeric Systems, pp. 2380–2385, Copyright (2005) with permission from Elsevier.
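The link between a single-peak Gaussian distribution of meso-scale bond lengths and a harmonic bond can be sketched in a few lines. If the distribution is $p(r) \propto \exp[-(r-r_0)^2/(2\sigma^2)]$ and the bond potential is $V(r) = \tfrac{1}{2}k(r-r_0)^2$, then Boltzmann weighting gives $k = k_B T/\sigma^2$. The sketch below estimates $r_0$ and $\sigma^2$ directly from sampled distances; it neglects the $r^2$ weighting of the distribution, which is assumed to be a small correction for a narrow peak, and all names and numbers are hypothetical.

```python
import statistics

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def harmonic_bond_from_samples(distances_nm, temperature_k):
    """Return (r0, k) of the harmonic bond whose Boltzmann distribution
    matches the mean and variance of the sampled distances.

    r0 is in nm and k in kJ/(mol nm^2). Assumes a single narrow peak.
    """
    r0 = statistics.mean(distances_nm)
    variance = statistics.pvariance(distances_nm, mu=r0)
    if variance <= 0.0:
        raise ValueError("need a distribution with nonzero width")
    force_constant = KB * temperature_k / variance
    return r0, force_constant


# Hypothetical super-atom distances (nm) measured in an atomistic run.
sampled = [0.24, 0.25, 0.26]
r0, k = harmonic_bond_from_samples(sampled, 300.0)
```

A narrow peak therefore corresponds to a stiff bond and a broad peak to a soft one; for a multi-peak distribution such as (b) or (c) no single harmonic term reproduces the data.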

The nonbonded interaction potential of super-atoms is related to the shape of the group of atoms being represented in one super-atom. A favorable modeling scenario exists when this interaction is spherically symmetric, thus avoiding the use of anisotropic potentials. Coarse-graining approaches using anisotropic potentials have shown that the much higher complexity of the simulation yields only slight gains in accuracy.14,15 If a single spherical potential is not satisfactory to represent a monomer as, for example, in polycarbonate polymers,1 it is more economical in terms of computing speed to use more than one spherical super-atom than it is to use a complex, anisotropic potential energy function.

Coarse-graining techniques are based on the idea of effective interaction potentials between super-atoms. The larger-scale model is calibrated against the smaller-scale model, which is considered to be more reliable and is used as a ‘‘gold standard.’’ Thus, any deficiencies of the small-scale model are carried over to the large-scale model.

Another issue that needs to be taken into account when considering the selection of super-atoms and their mapping is whether the polymer under study contains a degree of randomness, as in atactic chains. One can select anchor sites that are the same in both tacticities and ignore the local tacticity information, or one can map meso and racemo dyads separately, leading to a more complex description that, in essence, is a finer-grained coarse-graining step. For the case of polystyrene, both approaches have been taken and it is not yet clear which is more successful.34,35 In general, the choice of super-atoms and anchor sites needed for the mapping for any system must obey the boundary conditions of the problem at hand.

STATIC MAPPING

Single-Chain Distribution Potentials

One of the earliest approaches to systematic mapping was developed by Tschöp et al.1 Their technique begins with a detailed quantum chemical calculation of a few monomers of the system to obtain energetically favorable local conformations and their relative energies. Those quantum chemically determined distributions are then used to perform single-chain Monte Carlo simulations in a vacuum. The corresponding distributions of super-atom bond lengths, bond angles, and dihedral angles are recorded. To obtain a correct potential, those distributions have to be weighted by the corresponding Jacobians between local and global coordinate systems, which, for bond lengths, is $r^2$, coming from the transformation from spherical to Cartesian coordinates. These distributions are then Boltzmann-inverted to obtain intramolecular potentials, i.e., a potential is derived from the temperature-weighted logarithm of the distribution. In a vacuum, the free energy difference between conformations equals the difference in potential energy, as expressed in Eq. [1].

$V(\zeta) = -k_B T \ln p(\zeta)$   [1]

In this equation, $\zeta$ can stand for bond lengths, bond angles, or torsional angles. The distribution of these structural features $p(\zeta)$ is taken after the Jacobian correction. It is noteworthy that this potential is completely numerical. To calculate its derivative so as to obtain the corresponding forces, smoothing techniques like local splines or running averages are used. This static mapping technique can only be used for single chains in a vacuum; otherwise, the identity between the potential and the free energy cannot be justified.
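The Boltzmann inversion of Eq. [1], including the Jacobian correction and a simple smoothing step, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of Ref. 1: the histogram is assumed to be given on a grid of bin centers, the running average stands in for whichever smoothing technique is preferred, forces are obtained by finite differences, and all function names are hypothetical.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def boltzmann_invert(bin_centers, counts, temperature_k, jacobian=lambda x: 1.0):
    """Tabulated potential V = -kB*T*ln(p), shifted so that its minimum is zero.

    counts is the raw histogram; each bin is divided by jacobian(center)
    before inversion (e.g. r**2 for bond lengths). Empty bins give +inf.
    """
    weighted = [c / jacobian(x) for x, c in zip(bin_centers, counts)]
    norm = sum(weighted)
    potential = [
        -KB * temperature_k * math.log(w / norm) if w > 0.0 else math.inf
        for w in weighted
    ]
    v_min = min(potential)
    return [v - v_min for v in potential]


def running_average(values, half_width=1):
    """Smooth a tabulated function with a centered moving window."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(i - half_width, 0): i + half_width + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def forces(bin_centers, potential):
    """Force -dV/dx from central differences (one-sided at the ends)."""
    n = len(potential)
    result = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (potential[hi] - potential[lo]) / (bin_centers[hi] - bin_centers[lo])
        result.append(-slope)
    return result


# Toy histogram whose shape is entirely due to the r^2 Jacobian.
flat = boltzmann_invert([1.0, 2.0], [1.0, 4.0], 300.0, jacobian=lambda r: r * r)
raw = boltzmann_invert([1.0, 2.0], [1.0, 4.0], 300.0)
```

In the toy example the Jacobian-corrected potential is flat, whereas omitting the correction would produce a spurious energy difference of $k_B T \ln 4$ between the two bins.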

Simplex

A number of direct ways for linking atomistic and meso-scale melt simulations have been proposed more recently. The idea behind these direct methods is to reproduce structure or thermodynamics of the atomistic simulation on the meso-scale self-consistently. As this approach is an optimization problem, mathematical optimization techniques are applicable. One of the most robust (but not very efficient) multidimensional optimizers is the simplex optimizer, which has the advantage of not needing derivatives, which are difficult to obtain in the simulation. The simplex method was first applied to optimizing atomistic simulation models to experimental data.36,37 We can formally write any observable, like, for example, the density $\rho$, as a function of the parameters of the simulation model $B_i$. In Eq. [2], the density is a function of the Lennard–Jones parameters,

$\rho = f(\{B_i\})$   [2]

This mathematical identification means that we interpret our simulation along with the subsequent analysis of the observable as evaluating a complex function. This function, in multidimensional space, can be optimized as can any mathematical function. For an optimizer to be applicable, one must define a single-valued function with a minimum (or maximum) at the desired target as, for example, the sum of square deviations from target values in Eq. [3].

$f = \sum_i \left[ A(\{\epsilon_i\},\{\sigma_i\}) - A_{\mathrm{target}} \right]^2$   [3]

Every function evaluation requires a complete equilibration sequence (either molecular dynamics (MD) or Monte Carlo (MC)) for the given parameters, followed by a production run and the subsequent analysis. To ensure equilibration, one must be certain that no drift in the observables exists, for which an automatic detection of equilibration was developed.36 It has been shown that derivatives of observables with respect to simulation parameters can be calculated in some cases, paving the way for more efficient optimizers.38
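The following sketch illustrates how a derivative-free simplex search minimizes a sum of squared deviations such as Eq. [3]. It is a basic Nelder–Mead variant written for illustration, not the optimizer used in the cited work. The function `mock_observables` is a hypothetical stand-in for a complete equilibration, production run, and analysis; in a real application every call would be a full simulation, and the two observables and their targets are invented.

```python
def nelder_mead(func, start, step=0.1, xtol=1e-8, max_iter=500):
    """Minimize func(list_of_floats) with a basic simplex search (no derivatives)."""
    n = len(start)
    simplex = [list(start)]
    for k in range(n):  # initial simplex: start point plus one step along each axis
        vertex = list(start)
        vertex[k] += step
        simplex.append(vertex)
    values = [func(v) for v in simplex]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        size = max(abs(x - b) for v in simplex[1:] for x, b in zip(v, simplex[0]))
        if size < xtol:  # simplex has collapsed onto a point
            break
        worst = simplex[-1]
        centroid = [sum(v[k] for v in simplex[:-1]) / n for k in range(n)]

        def along(t):
            # Point on the line through the worst vertex and the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_reflected = func(reflected)
        if f_reflected < values[0]:
            expanded = along(2.0)
            f_expanded = func(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            contracted = along(-0.5)
            f_contracted = func(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:  # shrink all vertices toward the best one
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for x, b in zip(v, best)] for v in simplex[1:]
                ]
                values = [values[0]] + [func(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


TARGETS = (2.0, 0.5)  # invented target values of two observables


def mock_observables(epsilon, sigma):
    """Hypothetical replacement for 'run the simulation and analyze it'."""
    return (epsilon + 2.0 * sigma, epsilon - sigma)


def target_function(params):
    """Sum of squared deviations from the targets, in the spirit of Eq. [3]."""
    observed = mock_observables(*params)
    return sum((a - t) ** 2 for a, t in zip(observed, TARGETS))


best_params, best_value = nelder_mead(target_function, [0.8, 0.4])
```

Because the optimizer only compares function values, noisy or expensive evaluations can be used directly, but the number of evaluations grows quickly with the number of parameters.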

In the context of polymer mapping, we point out that typical target functions are not experimental observables but, instead, are related to the structure of the system that is characterized by structure factors or by radial distribution functions, although optimization against experimental data has been performed as well.16,17 The single-valued function to be minimized is then the integral over the squared difference of, e.g., radial distribution functions.2 If necessary, the radial distribution function can be multiplied by a weighting function $w(r)$ as in Eq. [4].2,3,35 Because the local structure is most difficult to reproduce, an exponential decay $w(r) \propto \exp(-\alpha r)$ with some decay length $\alpha^{-1}$ is often a good choice.

$$f = \int dr\, w(r) \left[ g(r) - g_{\text{target}}(r) \right]^2 \qquad [4]$$
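As a minimal sketch of how Eq. [4] might be evaluated on tabulated data, the function below applies the trapezoidal rule with the exponential weight $w(r) = \exp(-\alpha r)$. The grid, the value of `alpha`, and the two constant test curves are assumptions made for illustration only.

```python
import math

def weighted_rdf_deviation(r, g, g_target, alpha=0.0):
    """Trapezoidal estimate of Eq. [4] with w(r) = exp(-alpha * r).
    r, g and g_target are equally long sequences on the same grid."""
    integrand = [
        math.exp(-alpha * ri) * (gi - gti) ** 2
        for ri, gi, gti in zip(r, g, g_target)
    ]
    total = 0.0
    for k in range(1, len(r)):
        total += 0.5 * (integrand[k] + integrand[k - 1]) * (r[k] - r[k - 1])
    return total

# Synthetic check: a constant offset of 0.1 between the two curves
r_grid = [0.002 * k for k in range(1001)]   # 0 ... 2 (e.g. nm)
g_ref = [1.0] * len(r_grid)
g_off = [1.1] * len(r_grid)
value = weighted_rdf_deviation(r_grid, g_off, g_ref, alpha=2.0)
```

With `alpha=0.0` the weight is constant and the function reduces to the plain integrated squared difference.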

A drawback of the simplex and other analytical optimizers is that numerical potentials cannot be used. What is needed is a relatively small set of parameters, $B_i$, defining the entire parameter space. The limit is typically 4–6 independent parameters, because any additional dimension increases the need for computing resources tremendously. A typical choice for such parameters is a Lennard–Jones-like expansion2,24 in Eq. [5]

$$V(r) = \sum_i \frac{B_i}{r^i} \qquad [5]$$

where $i$ typically spans the even numbers from 6 to 12. The simplex technique has been moderately successful in reproducing monomers of polyisoprene.2

Iterative Structural Coarse-Graining

The Iterative Boltzmann Method (IBM) was developed to circumvent the problems encountered with the simplex technique.3,25,35,39,40 It is designed to optimize coarse-graining parameters against the structure of an atomistic simulation, and it lifts the limitation of needing analytical potentials.

In the limit of infinite dilution, one could use the potential of mean force (PMF) by Boltzmann inverting the pair distribution function from a simulation or an experiment to get an interaction potential between monomers, which is essentially the nonbonded generalization of the single-chain approach described earlier. Ideas like these have also been used to calculate the PMF of large particles like colloids embedded in matrices of small particles where the small particles play only the role of a homogeneous background.41,42

However, in concentrated solutions or in melts, the structure is defined by an interplay of the potential and packing effects. Thus, a direct calculation of the PMF is not a suitable way to obtain the potential energy. Nonetheless, free energies can be used iteratively to approach the correct potential. To accomplish this result, a melt or a solution of polymers is simulated in atomistic detail to derive a pair distribution function along with all the internal structural functions described earlier. Note that an experimentally obtained structure factor could be used as the target alternatively. In that case, however, it is not clear how to define the super-atoms except if one uses partially deuterated melts (because the experimental super-atoms are not uniquely defined). Moreover, one would have to transform the structure factor to a radial distribution function, because the IBM is local in interaction distance and not local in the wave vector, which means that for every iteration, a one-to-one correspondence is assumed between the effects at a distance $r_0$ and the potential $V(r_0)$ (or force $-\mathrm{d}V(r)/\mathrm{d}r|_{r=r_0}$) at that distance. Using the radial distribution function, this locality has been proven to be stable for the iterative procedure; however, we cannot expect this result to hold in wave vector space.

The resulting potential of the IBM is completely numerical, because the potential energy value at every distance is optimized independently. It is possible (and advantageous) to enforce continuity of the potential by using weighted local averages, which is important if the function against which the potential is to be optimized is relatively noisy. Regrettably, the correct way to remove noise is a longer atomistic simulation, which can be prohibitive in terms of computing times. However, widely used atomistic programs like Gromacs43 often use a numerically tabulated potential for reasons of speed, even if an analytical form exists. Then, to calculate a derivative to obtain the forces, local splines or similar techniques can be used to smooth the function. Cross dependencies of weakly dependent potentials (e.g., bond and angle) can be neglected for computational reasons, but they are normally eliminated by the proper choice of mapping points anyway.

In Figure 3, we show the resulting potential for the bond angle on the meso-scale using polystyrene chains.35,44 It turns out that no angle states under 75 degrees are populated and the corresponding Boltzmann inversion would lead to an infinite potential value (as the logarithm of zero diverges) that, in turn, would lead to numerical problems. A practical way to avoid this problem is to set the value of the corresponding potential energy to $V = 50 k_B T$, thereby preventing such states from being reached. A larger value of $V$ would lead to numerical problems because the potential change would be excessively steep, which would lead to huge forces, thus limiting the size of the time step one could use in the coarse-grained simulation.
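The capping of unpopulated states can be illustrated with a short sketch. The function name, the shift of the minimum to zero, and the histogram layout are assumptions of this example; it also assumes that any Jacobian factor of the coordinate (for instance $\sin\theta$ for angles) has already been divided out of the input distribution.

```python
import math

def boltzmann_invert(probabilities, kT=1.0, cap_in_kT=50.0):
    """Return V = -kT ln P for each bin, shifted so that min(V) = 0.
    Empty bins, where ln(0) diverges, get the finite value cap_in_kT * kT."""
    populated = [p for p in probabilities if p > 0.0]
    if not populated:
        raise ValueError("distribution has no populated bins")
    p_max = max(populated)
    cap = cap_in_kT * kT
    potential = []
    for p in probabilities:
        if p > 0.0:
            # Sparsely populated bins are also limited to the cap
            potential.append(min(-kT * math.log(p / p_max), cap))
        else:
            potential.append(cap)
    return potential

# Three bins: unpopulated, half as likely as the maximum, most likely
v = boltzmann_invert([0.0, 0.5, 1.0])
```

Because the table is later differentiated to obtain forces, the jump from the last populated bin to the capped value is where the steepness concern raised above shows up.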


When working with iterative structural coarse-graining techniques, we limit ourselves to potentials and distribution functions that depend only on a single coordinate like radial distribution functions (RDFs), bond distance, bond angle, or dihedral angle distributions. These distribution functions are convenient for describing the structure of polymers, and they enable the use of the IBM3,35,39 to reproduce the structure. As this procedure is iterative, one has to start with a reasonable initial guess as to what the potential function looks like. We invert RDFs for one-component liquid systems by taking a simple Boltzmann inverse of $g_0(r)$, the "target" RDF (from the atomistic simulation), resulting in the PMF, $F(r) = -k_B T \ln g_0(r)$. Note that $F(r)$ is a free energy and not a potential energy. Simulating our system with the initial guess of the potential (the PMF) will yield a new RDF $g_1(r)$, which differs from the atomistic target result $g_0(r)$ because it combines both packing and potential effects. The potential, therefore, needs to be improved, which can be done by adding a correction term, $\Delta F = -k_B T \ln(g_0(r)/g_1(r))$. This procedure is iterated until the desired distributions of the coarse-grained model and the atomistic model coincide within a preset tolerance. The whole procedure is schematically shown in Figure 4.
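A single correction step of this iteration can be sketched as follows. The function names, the optional damping factor `weight`, and the choice to leave bins with a vanishing RDF untouched are assumptions of this sketch; the simulation that produces the current RDF is not shown.

```python
import math

def ibi_update(potential, g_current, g_target, kT=1.0, weight=1.0):
    """One iteration step on a tabulated potential:
    V_new(r) = V(r) + weight * kT * ln(g_current(r) / g_target(r)),
    which equals V(r) - weight * kT * ln(g_target / g_current).
    Bins where either RDF is zero are left unchanged (an assumption of this
    sketch; such bins usually lie inside the repulsive core)."""
    updated = []
    for v, gc, gt in zip(potential, g_current, g_target):
        if gc > 0.0 and gt > 0.0:
            v = v + weight * kT * math.log(gc / gt)
        updated.append(v)
    return updated

def max_deviation(g_current, g_target):
    """Largest absolute difference between the two RDFs (stop criterion)."""
    return max(abs(gc - gt) for gc, gt in zip(g_current, g_target))

# Where the coarse-grained RDF overshoots the target, the potential is raised
v_new = ibi_update([0.0, 0.0, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 1.0])
```

A driver loop would alternate between a coarse-grained simulation and `ibi_update` until `max_deviation` falls below the preset tolerance.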

Two points are worth stressing here. First, concerning nearest neighbors, the local packing of interaction centers has the greatest influence on the radial distribution function, so the optimization of the potential should focus initially on local interactions. The optimization process should begin with this short-distance region and then, only after the meso-scale RDF of this region resembles the atomistic RDF reasonably well, should the tuning process of the other regions begin. It is a good idea to perform a few (typically two to three) independent optimizations, in series, that focus on increasingly larger distances. Second, we need to apply different weighting functions $w_i$ for the correction terms during the iteration. The magnitude of the weighting function to be used depends on how far the resulting RDF deviates from the target atomistic RDF; the weighting function is normally set to 1 when the deviation is about 30–40% from the atomistic value. When the deviation is below 30%, a series of parallel runs can be performed with values of $w_i = 1/8$, $1/4$, and $1/2$ to find an optimum starting point for the next step. Running the optimization process in parallel minimizes the time to the next step but requires more computer power.

[Figure 3: plot of the angle potential (kJ/mol, axis from 2 to 10) versus angle (degree, axis from 80 to 175).]

Figure 3 The bond angle potential on the meso-scale for polystyrene obtained by the Iterative Boltzmann Method.35 The angle is defined by three consecutive super-atoms along the polymer chain.

In a binary polymer melt (A–B), the interactions can be sorted into self-interactions (A–A, B–B) and non-self-interactions (A–B). Because the self-interaction in the mixed melt may not be the same as in the pure polymer, there are actually three target RDFs to be optimized in addition to any bonded interaction that must be optimized. Although we have correspondingly more target functions, it is a good idea to optimize the pure systems first. Following that step, we start optimizing the mixture where the A–A and B–B interactions are held fixed at their pure polymer values, and only the A–B interaction is tuned. Only when the non-self-interaction has been dealt with can we come back to the optimization of all three target functions at once.

[Figure 4: flowchart. Initial Potential; Simulation and Calculation of Radial Distribution Function; comparison of Calculated and Target RDF; decision "Difference below Tolerance?". YES: Done. NO: compute the Free Energy Difference $\Delta F$, Add $\Delta F$ to Potential, and return to the simulation step.]

Figure 4 Scheme of the iterative procedure used in structural coarse-graining based on the inverse Boltzmann method.


It is worth noting that, in any system, optimization of the bonded and the nonbonded parameters can be performed either together in one combined procedure or separately, because the mutual effects between the two types of interactions are negligible. Because the intrachain optimization can be achieved much more quickly than can the interchain optimization, most modelers choose to optimize the two separately.

It has been shown recently, with a comparative study of melts and solutions of polyisoprene,3 that the environment has a strong effect on the coarse-grained model. Because polymers in the melt have a different scaling behavior than in solution,23 we cannot use the same model when we remove the solvent. For polyisoprene, it was possible to calibrate the meso-scale model at chains of length 10 and then to perform simulations for chain lengths up to 120.3 The scaling for the melt and the solution cases was in good agreement with experiments and with theoretical expectations.

In Figure 5, we show how one can approach the target RDF by this mapping technique for a polystyrene melt. We clearly see an increase in accuracy over the course of the optimization process. The RDF indicated "1st iteration" used only the PMF. Severe deviations from the target make clear the difference between potential energy and free energy. The entropy component of the free energy corresponds to the multitude of local conformations subsumed into one meso-scale position of a super-atom. The effective size of the monomer, indicated by the point at which the RDF starts to deviate from zero, is largely overestimated. Additionally, the local structure is much too pronounced. Note, too, that the target atomistic RDF rises continuously from zero to one. The initial slope of that curve corresponds to the "hardness" of the potential. Clearly, the potential used in the first iteration is too hard.

[Figure 5: plot of g(r) (axis from 0 to 1.5) versus r (nm, axis from 0 to 2). Curves: target, 1st iteration, 2nd iteration, middle stage, close to convergence.]

Figure 5 The approach of the RDF by the Iterative Boltzmann Method in the case of atactic polystyrene.35 Running averages were applied to the data for clarity. Not all iterations are shown.

After only one iteration, the size of the monomer has been reduced and the potential is much softer. However, the strong overshooting of the first neighbor peak persists. With a few more iterations, we see that the structure of the atomistic system is reasonably well approximated.

Because these techniques focus only on the structure of the polymeric system, we are not guaranteed that the thermodynamic state is correctly described, as has been pointed out by Reith et al.25 To avoid such problems, thermodynamic properties should be included in the optimization scheme. To treat pressure, for example, the following is done:25 After optimizing the potential energy against the structure, an additional pressure correction potential $\Delta V_{pc}$ of the form

$$\Delta V_{pc}(r) = A_{pc} \left( 1 - \frac{r}{r_{cut}} \right) \qquad [6]$$

is added, in which $A_{pc}$ is negative if the pressure is too high and positive if it is too low. In Eq. [6], $r$ is the distance between atoms and $r_{cut}$ the cutoff up to which the correction potential is applied. The cutoff is normally chosen to be the same cutoff distance for the nonbonded terms used in the simulation. This additional potential provides another constant force, in addition to the force from the structural potential, leading to a constant shift in pressure. Such a potential has only a weak influence on the RDF that can be eradicated by a short reoptimization. Reith et al. showed that this correction can solve the problem of having an unphysically high pressure.25
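The linear ramp of Eq. [6] and the constant force it produces can be written down directly. The function names and the numerical values in the example call are illustrative assumptions.

```python
def pressure_correction(r, a_pc, r_cut):
    """Linear ramp of Eq. [6]; zero at and beyond the cutoff."""
    if r >= r_cut:
        return 0.0
    return a_pc * (1.0 - r / r_cut)

def pressure_correction_force(r, a_pc, r_cut):
    """Radial force -dV/dr: the constant a_pc / r_cut inside the cutoff."""
    return a_pc / r_cut if r < r_cut else 0.0

# A negative amplitude adds a constant attractive force, lowering the pressure
dv_mid = pressure_correction(0.5, -0.1, 1.0)
force_mid = pressure_correction_force(0.5, -0.1, 1.0)
```

In a tabulated-potential workflow, `pressure_correction` would simply be added bin by bin to the structural potential before a short reoptimization.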

Another coarse-graining technique exists where the atomistic and coarse-grained simulations are not separated.6 On the contrary, both fully detailed and mesoscopically modeled particles are allowed to coexist in the very same simulation. The detailed particles carry two potentials because they interact with the nondetailed particles as if they were nondetailed particles. The aim of this method is to provide a homogeneous structure where detailed and nondetailed particles are fully mixed and the local structure is indistinguishable. It represents an alternative way to obtain the target functions but does not require a different optimization technique.

Mapping Onto Simple Models

In addition to the self-consistent modeling techniques described above, one can use an ad hoc mapping between two independently developed models. In this case, only a few characteristics of the models can be mapped. A good example is the stiffness of the chain. The chain stiffness can be characterized by the persistence length $l_p$, which is derived from an assumed exponential decay of the directional autocorrelation function along the chain backbone (or the integral over this function)

$$e^{-s/l_p} = \langle \vec{u}(r+s) \cdot \vec{u}(r) \rangle \qquad [7]$$

where $r$ is the curvilinear coordinate along the chain contour, $s$ is the distance along this curvilinear coordinate, and $\vec{u}$ is the unit vector denoting the local chain direction. We can measure the persistence length in two independently developed models and equate the two values to obtain the mapping of chain lengths. Of course, any other characteristic length scale such as the gyration radius, the end-to-end distance, the monomer size, and so on can be used as well. All these length scales will generally yield different mappings. If the two models are reasonably similar, the differences will be small and the mapping is meaningful; otherwise, the mapping per se is a bad idea. Accordingly, one should not use a model developed for polydimethylsiloxane, which is probably the most flexible of all polymers, to describe the significantly more rigid actin filaments with persistence lengths that are several orders of magnitude larger.
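A persistence length in the sense of Eq. [7] might be extracted from a chain conformation as sketched below. The helper names are hypothetical, the separation $s$ is counted in bonds (multiply by the bond length for a contour length), and the fit through the origin of $\ln C(s)$ is one simple choice among several.

```python
import math

def tangent_correlation(unit_vectors, max_sep):
    """Average of u(r+s).u(r) over all pairs separated by s = 0..max_sep bonds.
    Requires max_sep < len(unit_vectors)."""
    corr = []
    n = len(unit_vectors)
    for s in range(max_sep + 1):
        dots = [
            sum(a * b for a, b in zip(unit_vectors[i], unit_vectors[i + s]))
            for i in range(n - s)
        ]
        corr.append(sum(dots) / len(dots))
    return corr

def fit_persistence_length(separations, correlations):
    """Least-squares fit of ln C(s) = -s / l_p through the origin.
    Only points with C(s) > 0 are used; a perfectly rigid chain (C = 1
    everywhere) has no finite l_p and is not handled here."""
    pairs = [(s, math.log(c)) for s, c in zip(separations, correlations) if c > 0.0]
    num = sum(s * y for s, y in pairs)
    den = sum(s * s for s, _ in pairs)
    return -den / num

# Synthetic correlation data decaying with l_p = 5 bonds
seps = [0, 1, 2, 3, 4]
lp = fit_persistence_length(seps, [math.exp(-s / 5.0) for s in seps])
```

Applying the same two functions to both models gives the two persistence lengths whose ratio fixes the length mapping.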

In principle, the mapping onto simple models assigns a computationally cheap interaction potential to a set of super-atoms. With respect to optimization, this mapping is just the initial step before any further refinement is carried out. If the polymer models are similar to one another, we can obtain a good mapping; if they are vastly different, the mapping quality is poor and the models have essentially nothing to do with each other.

DYNAMIC MAPPING

Molecular dynamics simulations in atomistic detail regularly use a 1-femtosecond time step. This time step is required to be about an order of magnitude smaller than the fastest characteristic time, which, for many molecules of interest, involves bond vibrations. As the bond lengths are customarily fixed, using techniques like Shake,45,46 Rattle,47,48 or Lincs,49 the fastest time scales in atomistic molecular dynamics are bond angle vibrations that are on the order of tens of femtoseconds. With a reasonable use of computer resources, one can reach into the nanosecond time range for a simulation. This time period is sufficient for making comparisons with segmental dynamics in NMR experiments50–52 but not long enough to compare with large time scale experiments. Techniques used to map the statics of polymers described earlier lead inherently to larger time scales because the fastest degrees of freedom are now motions of super-atoms of the size of monomers. If dynamic investigations are desired, one must find a correct mapping of the time scales involved in the different models.


Mapping by Chain Diffusion

One method for calibrating the time scale is to use the chain diffusion coefficient. At long enough times, any polymer chain in a melt will end up in diffusive motion as soon as all internal degrees of freedom are relaxed. As described earlier, static mapping can be used to determine the length scale; one typically uses the size of the monomer or the distance between super-atoms along the chain to obtain a suitable length scale for the coarse-grained simulation. If both the atomistic and the coarse-grained simulations can be fully equilibrated in the sense that free diffusion of the whole chain is observed, the two diffusion coefficients can be equated and the time scale is then fixed. Diffusion coefficients in simulations are normally determined by the mean-square displacement through the Einstein relation described in Eq. [8]

$$D = \frac{1}{2d} \lim_{t \to \infty} \frac{\langle (x(t + \Delta t) - x(t))^2 \rangle}{\Delta t} \qquad [8]$$

where $d$ is the dimensionality of the system and $t$ is the time. In most cases, a complete free diffusion of an atomistic chain in the melt or in the solution cannot be reached in reasonable computer time. This is precisely the situation in which a coarse-grained simulation should be used as a means to efficiently equilibrate the structure from which atomistic simulations will be started.
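A minimal sketch of this calibration, under stated assumptions: the diffusion coefficient is taken from a least-squares slope of the late part of a mean-square displacement curve, and the units (nm, ps, reduced coarse-grained units) as well as the function names and the `fit_from` cut are choices made for this example.

```python
def diffusion_coefficient(times, msd, dim=3, fit_from=0.5):
    """Einstein relation: D = slope / (2 * dim), with the slope taken from a
    least-squares line through the late part of the MSD curve.
    fit_from is the fraction of the data skipped at the start (an assumption
    of this sketch, meant to exclude the subdiffusive regime)."""
    start = int(len(times) * fit_from)
    t, y = times[start:], msd[start:]
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    slope = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y)) / sum(
        (a - t_mean) ** 2 for a in t
    )
    return slope / (2.0 * dim)

def time_unit_from_diffusion(d_atomistic, d_coarse, length_unit):
    """Physical duration (ps) of one coarse-grained time unit.
    d_atomistic: nm^2/ps; d_coarse: (reduced length)^2 / (reduced time);
    length_unit: nm per reduced length, taken from the static mapping."""
    return d_coarse * length_unit ** 2 / d_atomistic

# Synthetic, perfectly diffusive data with D = 0.25 in three dimensions
t_axis = [float(k) for k in range(101)]
d_est = diffusion_coefficient(t_axis, [6.0 * 0.25 * t for t in t_axis])
tau_ps = time_unit_from_diffusion(0.001, 0.02, 0.5)
```

Equating the two diffusion coefficients in common length units is all that `time_unit_from_diffusion` does; its reliability rests on both simulations having reached free diffusion.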

One example of mapping by chain diffusion involved the case of 10-mers of polyisoprene at 413 K. A dynamic mapping between a fully atomistic and a very simple coarse-grained model was demonstrated.7,50 Only chain stiffness was used to perform the mapping in that study. The local chain reorientation in both simulations was the same after the time scales had been determined by the diffusion coefficient. The decay times of the Rouse modes, however, were not equal, indicating that mapping by stiffness alone is too simplistic.

This mapping, as any dynamic mapping, can become problematic for mixtures, because the degree of coarse-graining between the different constituents is not necessarily the same, leading to the problem where the ratio of diffusion coefficients in the atomistic and meso-scale simulation can be different. It has been found in simulations of coarse-grained phospholipids that one prerequisite for a good dynamic mapping is that the masses of the super-atoms should be similar.53 For example, in lipid simulations, four water molecules were mapped into one super-atom so as to create a mass similar to four CH2 groups that were used as a super-atom in the lipids.

Mapping through Local Correlation Times

Instead of relying on chain diffusion from lengthy simulations, it is often more convenient to use shorter local time scales to map between atomistic and coarse-grained length scales, which allows one to carry out a mapping if the atomistic simulation does not reach free diffusion. Even if free diffusion could be reached, the statistical uncertainty of such long time scales is often so great that a shorter time scale is warranted. Candidates for shorter time scales are decay times of higher Rouse modes and, even if the Rouse model is an imperfect description of the system under study, such a mapping is meaningful because it is well-defined. The Rouse modes are the eigenmodes of the Rouse model (see below). The Rouse mode of index $p$, $X_p$, is defined as

$$X_p = \frac{1}{N} \int_0^N ds\, \cos\left( \frac{p \pi s}{N} \right) R_s \qquad [9]$$

where $N$ is the degree of polymerization, $s$ is the coordinate along the chain contour, and $R$ is the position. Rouse modes of index $p$ effectively describe a subchain of length $N/p$, so the first mode describes the chain as a whole, the second mode is the structure and dynamics on the length scale of half a chain, and so on. Every Rouse mode has its distinct time scale $\tau_p$. These time scales are, in the case of a polymer that follows the Rouse model perfectly, correlated by $\tau_1 = p^2 \tau_p$.
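For a chain of discrete beads, the integral in Eq. [9] has to be replaced by a sum. The sketch below uses the midpoint form $X_p = \frac{1}{N}\sum_n \cos\left(p\pi(n+\tfrac12)/N\right) R_n$, which is one common discretization and is an assumption here, as are the function names; the second helper encodes the ideal Rouse scaling quoted above.

```python
import math

def rouse_mode(positions, p):
    """Discrete analogue of Eq. [9] for bead positions R_0 ... R_{N-1}:
    X_p = (1/N) * sum_n cos(p * pi * (n + 1/2) / N) * R_n."""
    n_beads = len(positions)
    dim = len(positions[0])
    mode = [0.0] * dim
    for n, r in enumerate(positions):
        c = math.cos(p * math.pi * (n + 0.5) / n_beads)
        for k in range(dim):
            mode[k] += c * r[k] / n_beads
    return mode

def rouse_time(tau_1, p):
    """Ideal Rouse scaling tau_p = tau_1 / p**2."""
    return tau_1 / p ** 2

# A straight four-bead chain along x: mode 0 is the center of mass
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
x0 = rouse_mode(chain, 0)
x2 = rouse_mode(chain, 2)
```

Comparing measured decay times of the mode autocorrelation with `rouse_time` is one way to check how closely a given system follows the Rouse model before using it for mapping.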

In the extreme case where this subchain for high Rouse modes becomes only a single monomer, we end up with the segmental relaxation time, i.e., the reorientation dynamics on the monomer scale. This time scale can almost always be used for mapping, and it can also be used for comparison with and calibration to NMR experiments. If we use the Rouse model for mapping time scales, we should make sure that the Rouse model is a reasonable description for the system under study.

Both the Rouse model and the reptation model are successful models for describing the dynamics of polymers. The Rouse model54 treats the polymer as a set of noninteracting beads connected by springs. The dynamics of polymer chains in a melt is governed in this model by a viscous force and the stretching forces along the chain. The reptation model confines the motion of the polymer to take place in a hypothetical tube but can nonetheless describe global dynamic problems. It becomes applicable if a chain becomes longer than a polymer's specific entanglement length $L_e$. Mean-squared displacements of the monomers $g_1(t)$ are important in characterizing polymer dynamics. There exist four time regimes: (1) For very short times $t < \tau_e$, the polymer segment does not feel the constraints of the tube, so the actual dynamics of the reptation model corresponds to the Rouse model. (2) For $\tau_e < t \le \tau_R$, the motion perpendicular to the primitive path, which largely follows the chain, is restricted. However, the motion along the primitive path is free because it is easier for a polymer to displace itself than its neighbors. (3) For $\tau_R < t \le \tau_d$, the internal degrees of freedom are relaxed, but the chains still are confined in the tube. (4) For $t > \tau_d$, the dynamics is governed by free diffusion. All these processes can be summarized by Eq. [10]:23,55

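$$g_1(t) = \begin{cases}
N b^2 (t/\tau_R)^{1/2} & t < \tau_e \\
N b^2 (t/Z^2 \tau_R)^{1/4} & \tau_e < t \le \tau_R \\
N b^2 (t/\tau_d)^{1/2} & \tau_R < t \le \tau_d \\
N b^2 (t/\tau_d) & t > \tau_d
\end{cases} \qquad [10]$$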

where $\tau_e$, $\tau_R$, and $\tau_d$ are entanglement time, Rouse time, and disengagement time, respectively; and $Z = L/L_e$, $N$, and $b$ are the ratio of contour length to the entanglement length, the degree of polymerization, and the size of a monomer, respectively. In the Rouse model, only mean-squared displacement behaviors with exponents of $1/2$ and $1$ exist because the only subdiffusive process is the motion of a monomer against the center of mass of the chain at short times. Using coarse graining it was possible to simulate the full spectrum of the Rouse and reptation dynamics for atactic polystyrene.44 But Figure 6 illustrates that a global dynamic mapping of trans-1,4-polyisoprene to a simple bead-spring model that includes stiffness cannot map the local dynamics completely.7,50
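The four regimes of Eq. [10] translate into a simple piecewise function. The sketch evaluates each branch as written, reading $Z^2 \tau_R$ as the denominator in the second branch, and enforces no continuity between branches; the function name and the numbers in the example calls are illustrative assumptions.

```python
def monomer_msd(t, n, b, z, tau_e, tau_r, tau_d):
    """Piecewise power laws of Eq. [10] for the monomer mean-squared
    displacement g_1(t). Branches are evaluated as written, without
    matching them at the crossover times."""
    nb2 = n * b ** 2
    if t < tau_e:
        return nb2 * (t / tau_r) ** 0.5            # Rouse-like, exponent 1/2
    if t <= tau_r:
        return nb2 * (t / (z ** 2 * tau_r)) ** 0.25  # tube constraint, 1/4
    if t <= tau_d:
        return nb2 * (t / tau_d) ** 0.5            # confined reptation, 1/2
    return nb2 * (t / tau_d)                       # free diffusion, exponent 1

# One sample time from three of the four regimes
early = monomer_msd(0.25, 10, 1.0, 2.0, 1.0, 16.0, 100.0)
tube = monomer_msd(4.0, 10, 1.0, 2.0, 1.0, 16.0, 100.0)
late = monomer_msd(200.0, 10, 1.0, 2.0, 1.0, 16.0, 100.0)
```

On a log-log plot the successive exponents 1/2, 1/4, 1/2, and 1 appear as changes of slope, which is how the regimes are usually identified in simulation data.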

[Figure 6: log-log plot of $g_{1,3}/\langle R_{e-e}^2 \rangle$ (vertical axis from $10^{-4}$ to $10^{0}$) with a horizontal axis running from 1 to 10000.]

Figure 6 Dynamic mapping of polyisoprene at 413 K to a coarse-grained model. Thick lines: atomistic simulations. Thin lines: coarse-grained simulation. The broken lines represent mean-squared displacements of the central monomers, i.e., a local quantity. The solid line shows the mean-squared displacement of the center of mass, i.e., a global property. The mean-squared displacement of the center of mass is used for the mapping. We see that the local quantity does not perfectly follow the mapping.


Direct Mapping of the Lennard–Jones Time

A different idea that is independent of the atomistic simulation involves mapping of the so-called Lennard–Jones time to real time. If one uses the standard Lennard–Jones units, where we measure lengths in $\sigma$ (the particle diameter), energies in $\epsilon$ (the depth of the Lennard–Jones potential), and masses in $m$ (the monomer mass), a natural time scale appears that is conventionally called the Lennard–Jones time,48,56

$$\tau = \sigma \sqrt{\frac{m}{\epsilon}} \qquad [11]$$

This time scale can be used to perform the mapping to the real time scale.3,57
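As a worked example of Eq. [11], the sketch below converts a set of Lennard–Jones parameters to a time in seconds. The argon-like values in the example call ($\sigma = 0.34$ nm, $\epsilon/k_B = 120$ K, $m \approx 39.95$ u) are commonly quoted textbook numbers used here purely for illustration; they are not taken from the polymer models discussed in this chapter.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

def lennard_jones_time(sigma_nm, epsilon_over_kb_kelvin, mass_amu):
    """tau = sigma * sqrt(m / epsilon) in seconds, from Eq. [11]."""
    sigma = sigma_nm * 1.0e-9                 # nm -> m
    epsilon = epsilon_over_kb_kelvin * K_B    # K -> J
    mass = mass_amu * AMU                     # u -> kg
    return sigma * math.sqrt(mass / epsilon)

tau = lennard_jones_time(0.34, 120.0, 39.95)  # on the order of picoseconds
```

For a coarse-grained polymer, $\sigma$, $\epsilon$, and $m$ would come from the static mapping of the super-atoms, so the estimate inherits whatever ambiguity exists in those choices.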

This dynamic mapping runs into a problem when one uses purely numerical potentials because, in that case, $\sigma$ and $\epsilon$ are not uniquely defined. One could select a characteristic length or an energy scale, but the definition of this kind of time mapping also becomes ambiguous. Mapping of diffusion or internal time scales are more closely connected to the true system. Nonetheless, this "Lennard–Jones" mapping can be done a priori without performing the simulation and thereby can provide an initial estimate of the time scale. Also, if no atomistic simulation is available, this mapping can provide a guideline for estimating experimental times.

COARSE-GRAINED MONTE CARLO SIMULATIONS

The main difference between Monte Carlo (MC) and Molecular Dynamics (MD) simulations is that we do not need to follow the physical trajectory of the system with MC, which, in turn, enables us to use "unphysical" moves to cover the relevant area of phase space more quickly. Such moves include chain breaking and reattachment,58,59 configurational bias,60 and reptation moves.60

Because we do not have to follow a physical trajectory in an MC simulation, we can also use models that are further removed from the true physical or chemical reality. Such models include lattice models (see, e.g., Refs. 28,61,62). With lattice models, the space of our system is (typically) evenly divided into cells, each of which is represented by one lattice site. Lattices can be very simple cubes or they can be specially adapted, highly connected grids.63–65 Here again, we need super-atoms, which, however, can occupy only lattice sites. In most lattice models every site is either singly occupied or empty, meaning that the interaction sites have an impenetrable hard core, which contrasts to Lattice–Boltzmann models66 used in studies of hydrodynamics in which every lattice site is occupied by a density, in which case, one deals with a density-based field theory. In lattice models, there exist only a fixed number of distances that can be realized. It makes no sense to distinguish between, say, a Lennard–Jones or a finite-size well potential; the two are essentially identical. The limited number of distances makes optimizations relatively straightforward as illustrated in Figure 7.

In most lattice models, a super-atom can represent a monomer or a Kuhn segment of the chain.67 Usually, only interactions of very close neighbors (first or second neighbors) are included such that the calculation of the energy, which is the computationally most expensive part of a Monte Carlo calculation, is a sum whose calculation scales linearly with the number of lattice sites. The actual mapping process, if done systematically, is easier on a lattice than off it, because we have fewer points in the RDF that need to be reproduced. Otherwise, there is no fundamental difference between lattice and off-lattice models.
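The linear scaling of a nearest-neighbor energy sum can be illustrated with a small sketch for a simple cubic lattice. The data layout (a set of occupied integer sites), the single contact energy `epsilon`, the absence of periodic boundaries, and the way chain bonds are excluded are all assumptions of this example rather than features of any specific published model.

```python
# Only positive directions are scanned so that each pair is counted once
NEIGHBOR_OFFSETS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def lattice_energy(occupied, epsilon=-1.0, bonded=frozenset()):
    """Nearest-neighbour contact energy on a simple cubic lattice.
    occupied: set of (x, y, z) integer sites, each holding one super-atom.
    bonded: set of frozenset({site_a, site_b}) pairs excluded from the sum
    because they are chain neighbours (hypothetical convention)."""
    energy = 0.0
    for (x, y, z) in occupied:
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            other = (x + dx, y + dy, z + dz)
            if other in occupied and frozenset(((x, y, z), other)) not in bonded:
                energy += epsilon
    return energy

# A four-site chain folded into a square: three bonds, one nonbonded contact
sites = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)}
bonds = {
    frozenset(((0, 0, 0), (1, 0, 0))),
    frozenset(((1, 0, 0), (1, 1, 0))),
    frozenset(((1, 1, 0), (0, 1, 0))),
}
e_all = lattice_energy(sites)
e_nonbonded = lattice_energy(sites, bonded=bonds)
```

Each occupied site inspects a fixed number of neighbors and set lookups take constant time on average, so the cost grows linearly with the number of occupied sites.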

An often-used coarse-grained Monte Carlo model is the bond-fluctuation model.28 In contrast to most other coarse-grained models, it lacks a fixed or quasi-fixed bond length. Instead, connected monomers can occupy all side and corner sites of an fcc lattice if the monomer to which they are connected is in the center of the face-centered cube as shown in Figure 8. In this model, as in others, the solvent is typically ignored such that monomers are either occupying a site or the site is deemed to be empty.

Figure 7 Left: Representation of a polymer on a simple cubic lattice; Right: Interaction potentials on such a lattice.

Monte Carlo simulations can also be performed with an off-lattice model. In this case, the mapping is the same as described earlier for MD but no dynamic mapping is involved. Kreer et al.68 showed that the number of Monte Carlo moves can be mapped onto a "pseudo-time." This mapping procedure can be used only if no nonphysical Monte Carlo moves are applied, i.e., only local physical moves are allowed. To accomplish this feat, we need to show that the simulation moves represent the true local dynamics of the model; one includes only the moves that are possible and the relative abundance of different moves represents truthfully the relative probabilities of the possible local dynamical changes.

In most cases, we are not interested in a dynamic mapping. In such circumstances, we can use all the advances of modern Monte Carlo technology, i.e., we can apply all conceivable physical and nonphysical moves to derive a correct representation of structure and thermodynamics with less computational effort than would be possible with either MD or MC using only local moves.

REVERSE MAPPING

Figure 8 The bond fluctuation model. The possible neighbors in two dimensions of the black monomer are the neighboring gray monomers. In three dimensions, this model leads to relative bond lengths of 1, $\sqrt{2}$, and $\sqrt{3}$, where 1 corresponds to the lattice spacing.

In most applications of coarse graining, one can be satisfied if a relaxed, large-scale description of the system has been derived. However, in some cases, we want to reintroduce atomistic detail in the end, for which we use the anchoring points between different models and reverse the mapping.69 This process is not unique because any constellation of coarse-grained interaction sites represents a variety of constellations of atomistic sites. In reverse mapping, one could use the (precalculated) energetically most favorable states and carry out a short atomistic MD or MC simulation to represent the full system,8 which can technically be done as follows. For short oligomers (two to four monomers) of the respective polymer, a wide variety of local conformations are produced and their respective energies are calculated using the atomistic model. For simplicity, this calculation is done in a vacuum and only the torsional degrees of freedom are used. Additionally, the relative positions of the super-atoms in these fragments are stored. From the meso-scale simulation, we have obtained a melt conformation consisting of super-atoms. We then move along all the chains and, fragment by fragment, select the atomistic configurations with the super-atom constellation fitting the coarse-grained chain. If there is more than one fragment that fits at a particular position along the chain, we take the one with the lowest energy. Rather than using only the most favorable and most populated states, we can also use higher-energy states according to their Boltzmann weighting. In reverse mapping, we need to consider only the torsional degrees of freedom primarily because the atomistic bonds and angles are more rigid and less deformable than are torsions. Moreover, bond and angle distributions equilibrate very quickly such that any subsequent short MD simulation will be able to provide a realistic distribution. After the reintroduction of atomistic detail, the melt configuration will have some overlap in the atomic positions leading to a high energy. Therefore, an energy minimization should be performed before any MD run. Also, one may need to start the simulation with a small time step and increase it after a few steps as the system equilibrates.
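The fragment selection step can be sketched as follows. This is a strongly simplified illustration: each stored fragment is reduced to a single super-atom distance and an energy, whereas a real library would store the full relative positions of several super-atoms; the tuple layout, the tolerance, and the function name are hypothetical.

```python
import math
import random

def pick_fragment(candidates, target_distance, tol=0.05, kT=None, rng=random):
    """Choose an atomistic fragment whose super-atom distance matches the
    coarse-grained chain within tol.
    candidates: list of (name, super_atom_distance, energy) tuples.
    kT=None returns the lowest-energy match; otherwise one match is drawn
    with probability proportional to exp(-energy / kT)."""
    fitting = [c for c in candidates if abs(c[1] - target_distance) <= tol]
    if not fitting:
        raise ValueError("no stored fragment fits this position")
    if kT is None:
        return min(fitting, key=lambda c: c[2])
    # Subtract the minimum energy to keep the exponentials well behaved
    e_min = min(c[2] for c in fitting)
    weights = [math.exp(-(c[2] - e_min) / kT) for c in fitting]
    return rng.choices(fitting, weights=weights, k=1)[0]

# Invented library: two fragments fit a distance of 0.50, one does not
library = [("a", 0.50, 3.0), ("b", 0.52, 1.0), ("c", 0.80, 0.0)]
lowest = pick_fragment(library, 0.50)
sampled = pick_fragment(library, 0.50, kT=2.5, rng=random.Random(0))
```

The structure assembled this way still needs the energy minimization and the cautious initial time step described above, because neighboring fragments are chosen without checking for atomic overlap.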

Another way to do reverse mapping is to rigidly fix the super-atom centers in space and perform a local MC simulation of only the small-scale interactions.9 Here, one selects a distribution of coarse-grained structures and does the local calculation on all structures in that distribution. Figure 9 shows the idea for more than one coarse graining step. The interaction sites marked in black are the highest degree of coarse-graining, the ones in gray are intermediate, and the white interaction sites are the smallest (atomistic). After performing a simulation of only the black super-atoms, with a coarse-grained potential obtained in any of the ways described earlier, we have an ensemble of system configurations of these super-atoms. A Monte Carlo simulation with finer resolution is then performed for each member of the ensemble without disturbing the position of the super-atoms, i.e., in Figure 9 we would move only the gray centers while constraining the black centers to obtain an ensemble of configurations containing the black and the gray centers. If more than one level of coarse graining exists, an even finer-grained simulation follows by fixing the black and the gray centers and moving only the now added white centers. The difference in degree of detail that one uses here to account for different grain models in this technique typically involves every third interaction site. Thus, we have about three times more gray centers than black centers and three times more white centers than gray centers.9 If more than one level of coarse graining exists, a structural optimization is used for each level to obtain a potential. In some instances, the rigid constraint can also be relaxed.

Figure 9 The method of Brandt9 where each length scale is treated independently and each interaction site can only move if the respective length scale is treated. Black, gray, and white circles correspond to coarse-, medium-, and fine-grained monomers, respectively. See text for details.

A LOOK BEYOND POLYMERS

Polymers have traditionally been the focus of multi-scale modeling. Other areas of soft-condensed matter, notably biological membranes, have become extremely important more recently and myriad coarse-graining techniques have been applied to them.53,70–82 As the techniques used in membrane simulations are similar to those used for polymers, we point out here only a few of the main differences. Phospholipids, the main ingredient of biological membranes, can be viewed as essentially consisting of two hydrophobic oligomers connected by a hydrophilic head group. The mapping of an atomistic representation of a lipid bilayer to a coarse-grained representation is illustrated in Figure 10. These systems self-assemble into bilayers where the hydrophobic core is shielded from the surrounding water. Compared with polymers, we now must deal with three main new effects. First, the systems are inherently heterogeneous because the biomembrane is in water. Second, lipids are essentially very short heteropolymers because they contain hydrophilic and hydrophobic parts. Third, the electrostatic interactions of lipid molecules are much more important than those in typical polymer systems. A

Figure 10 The mapping of an atomistic representation of a lipid bilayer to a coarse-grained model.


successful meso-scale simulation model for lipid bilayers was proposed by Marrink et al.53 The model was originally parameterized to reproduce the structural, dynamic, and elastic properties of lamellar and nonlamellar states of various phospholipids. In that study, groups of 4–6 heavy atoms (carbons, nitrogens, phosphorus, and oxygen) were united to form super-atoms. The lipid head group consisted of four sites: two hydrophilic sites (one representing the choline and one representing the phosphate group) and two intermediately hydrophilic sites (representing the glycerol moiety). Each of the two tails of dipalmitoyl-phosphatidylcholine (DPPC), an abundant phospholipid, was modeled by four sites. Water was modeled by individual hydrophilic sites, each representing four real water molecules in order to achieve a similar mass as the lipid super-atoms so as to make the dynamic mapping easier as described earlier. The sites were constrained to interact in a pair-wise manner via Lennard–Jones (LJ) potentials. Five different LJ potentials were used, ranging from weak, mimicking hydrophobic interactions, to strong, for hydrophilic interactions (with three levels in between for other types of interactions).

In addition to the LJ interactions, a screened Coulomb interaction was used to model the electrostatic interaction between the zwitterionic head groups. The choline group bears a charge of +1, and the phosphate group bears a charge of −1. Soft springs between bonded pairs held the coarse-grained molecules together and angle potentials provided the appropriate stiffness. For efficiency reasons, all super-atoms were assigned the exact same mass of 72 atomic units.
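The site bookkeeping described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and variable names are hypothetical, and the molar mass of water (18.015 g/mol) is a standard value that does not come from the text. It counts the coarse-grained sites of DPPC and shows why a bead of four water molecules matches the uniform super-atom mass of 72 atomic units.

```python
WATER_MASS = 18.015      # g/mol, standard molar mass of H2O (not from the text)
WATERS_PER_BEAD = 4      # one coarse-grained water site = four real waters
SUPER_ATOM_MASS = 72.0   # uniform mass assigned to every super-atom

# Coarse-grained sites of DPPC as described in the text: (count, charge per site).
dppc_sites = {
    "choline (hydrophilic)": (1, +1),
    "phosphate (hydrophilic)": (1, -1),
    "glycerol (intermediately hydrophilic)": (2, 0),
    "tail 1 (hydrophobic)": (4, 0),
    "tail 2 (hydrophobic)": (4, 0),
}

n_sites = sum(count for count, _ in dppc_sites.values())
net_charge = sum(count * charge for count, charge in dppc_sites.values())
water_bead_mass = WATERS_PER_BEAD * WATER_MASS

print(n_sites)                    # total sites per lipid
print(net_charge)                 # zwitterionic head group: net charge zero
print(round(water_bead_mass, 2))  # close to the uniform super-atom mass
```

With these numbers one lipid has 12 sites, carries no net charge, and a four-water bead weighs about 72 atomic units.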

The interaction of lipids with small molecules was treated in a similar manner. Alcohols (butanols) were modeled simply as a dimer of a polar and an apolar site;83 the polar site has the same interaction potential as does water, whereas the apolar site is the same as the alkanes in the lipids. This model makes the alcohol a symmetric amphiphile (which is not fully realistic). The alcohol concentrations had to be renormalized by Dickey et al. because one coarse-grained water represents four actual water molecules, whereas one coarse-grained butanol represents one real butanol. Accordingly, a concentration of 1:100 (butanol:water) in the coarse-grained model actually is 1:400 in the real system.
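The renormalization amounts to a simple ratio conversion, sketched below. The function name `real_ratio` and its parameters are hypothetical and chosen only for this illustration; the mapping factors (four waters per water site, one butanol per butanol dimer) are the ones stated in the text.

```python
from fractions import Fraction

def real_ratio(n_solute_cg, n_water_cg, waters_per_site=4, solutes_per_molecule=1):
    """Convert a coarse-grained solute:water number ratio into the ratio
    of real molecules it represents."""
    return Fraction(n_solute_cg * solutes_per_molecule,
                    n_water_cg * waters_per_site)

ratio = real_ratio(1, 100)   # 1:100 butanol:water in the coarse-grained model
print(f"{ratio.numerator}:{ratio.denominator}")  # prints 1:400
```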

Rougher coarse graining has also been used in lipid bilayer modeling. Only generic effects of the chemistry are taken into account for such rough models, including hydrophilic–hydrophobic interactions and the anisotropy of the overall molecule. Notwithstanding, important generic properties of membranes have been elucidated. An example is the general pathway of lipid bilayer self-assembly, which is not specific to the individual lipid molecules.70,71,84 Also, large-scale properties like the bending modulus and the influence of concentrations on the bending modulus have been elucidated.72 In that study, it was noted that the layer thickness is the most crucial factor needed for the prediction of the bending modulus. The phase behavior of lipids has also been studied using dissipative particle dynamics.73,74 A number of solvent-free models have been proposed that are able to reproduce the liquid phase behavior and domain formation85–88 as well as the general elastic behavior of the membrane.88,89

An even more drastic approach to coarse graining is to model the lipid completely in two dimensions and to use only one interaction site per lipid. A simple example of such a model is a nonadditive hard-disk model90 depicted in Figure 11 and described later.

For simulating lipid bilayers on very large scales, Monte Carlo techniques are the method of choice. To model the interactions of mixed phospholipids, a simplified model was developed by Faller et al.90–92 That model contains the essential interactions between ganglioside lipids having large head groups, other smaller lipids in the membrane, and attacking pathogens. Ganglioside lipids are unusual: they pack well into lipid membranes, but they have a large oligosaccharide head group that extends away from the membrane surface.91,93 Thus, they are dispersed in a layer of other lipids. Such mixtures cannot be modeled readily in a traditional two-dimensional way. To derive a model for the lipid interactions between dipalmitoyl phosphatidylethanolamine (DPPE) and the ganglioside lipid GM1, two coupled layers of hard disks were used (see Figure 11).90 It is well known that hard-sphere fluids have a single phase transition when going from a gas phase to a crystalline phase.94,95 Without attractive interactions, no liquid phase can emerge, so one can expect that such a generalized hard-disk fluid will also have two phases. At very high pressures, phase separation may also occur. The minimum packing area for lipids with two linear hydrocarbon chains, like DPPE and GM1, is 38 to 40 Å² per molecule. Head group size, hydration, steric, and entropic interactions may increase this area substantially. In the work of Ref. 90, 45 Å² was used initially for DPPE and 65 Å² for GM1. These values are based on experimental pressure–area isotherms for each lipid.93 However, GM1 molecules at low to intermediate densities when mixed with DPPE do not change the overall area per molecule very much. Hence, a minimum packing area of 40 Å² per molecule was used for GM1 in the hydrocarbon plane. The DPPE molecules are therefore modeled as simple disks, whereas the model for the GM1 molecules consists of two concentric disks that act in two layers, which technically leads to the peculiar situation where we have a binary hard-disk fluid with a cross interaction radius that is not the average of the self-interaction radii.
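The nonadditivity can be made concrete with a short sketch of the overlap test that a hard-disk Monte Carlo move would use. The assignment of disks to pair types is our reading of the description above and should be treated as an assumption: DPPE–DPPE contacts use the 45 Å² disk, GM1–GM1 contacts the 65 Å² head-group disk, and DPPE–GM1 contacts the 45 Å² disk against the 40 Å² hydrocarbon-plane disk of GM1. The names `radius_from_area`, `CONTACT`, and `overlap` are hypothetical.

```python
import math

def radius_from_area(area):
    """Radius (in Å) of a disk with the given area (in Å^2)."""
    return math.sqrt(area / math.pi)

R_DPPE = radius_from_area(45.0)      # single DPPE disk
R_GM1_HEAD = radius_from_area(65.0)  # GM1 disk in the head-group layer
R_GM1_TAIL = radius_from_area(40.0)  # GM1 disk in the hydrocarbon plane

# Contact distances per pair type (assumed assignment, see lead-in).
CONTACT = {
    ("DPPE", "DPPE"): 2 * R_DPPE,
    ("GM1", "GM1"): 2 * R_GM1_HEAD,
    ("DPPE", "GM1"): R_DPPE + R_GM1_TAIL,   # not the mean of the two above
}

def overlap(kind_a, pos_a, kind_b, pos_b):
    """True if two disks of the given kinds overlap (move must be rejected)."""
    key = tuple(sorted((kind_a, kind_b)))
    return math.dist(pos_a, pos_b) < CONTACT[key]

additive_cross = R_DPPE + R_GM1_HEAD   # what an additive mixture would use
print(round(CONTACT[("DPPE", "GM1")], 2), round(additive_cross, 2))
```

Under this assignment the cross contact distance is about 7.35 Å, noticeably smaller than the additive value of about 8.33 Å, which is what makes the mixture nonadditive.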

Figure 11 Illustration of the nonadditive hard-disk model for lipid mixtures. (Figure labels: DPPE, GM1; 40 Å, 65 Å, 45 Å.)


It is known96,97 that when cholera toxin attacks a membrane it binds to five pentagonally arranged GM1 molecules. Therefore, for some of the simulations, a number of GM1 particles were fixed, which was done in two different ways.90 The first and easiest way is to fix a number of particles randomly in space. The second way was to fix them in a group of pentagonal shapes so as to model simplistically the binding of cholera toxin to a mixed DPPE/GM1 bilayer. With either arrangement, the increase in area per head group at the binding of cholera toxin could be reproduced semi-quantitatively, thereby explaining that the increase in area per head group comes from the disruption of local packing by the fixation of molecules.

CONCLUSIONS

This chapter focused on describing structural properties of polymers and related soft-matter systems using coarse-grained models. We need to point out two major caveats that are important and should be considered especially by novice modelers.

First, every mapping carried out between two systems is done at a specific state point. Caution is advised if we want to transfer a coarse-grained model between different state points. Changing just the temperature from one state point to another can lead to a severe change in the meso-scale model. It has been shown recently that a coarse-grained model for atactic polystyrene optimized in the melt crystallizes under cooling instead of forming a glass.98 An extreme example of this case involves crossing through the θ-temperature in polymer solutions where the system undergoes a significant structural transition between a globular nonsolvated polymer conformation and a well-solvated stretched conformation. Note, however, that for polymers like polyisoprene and polystyrene, there exists a stability of the modeling results with chain length; it was not necessary to reoptimize the meso-scale model with increasing chain length, which is one of the major strengths of the self-consistent optimization technique described earlier.

Second, changing concentrations in polymer mixtures requires reevaluating the mapping. One must optimize all interaction potentials together at least in the final steps.

Using coarse-grained modeling techniques today is inevitable because large-scale atomistic, especially quantum chemical, calculations are impractical and may not be helpful for answering questions involving large size scales or long time scales. Many techniques exist but, regrettably, there is no single answer to the "How To?" question. Coarse graining is still far from being a technique that can be used in a broad sense as can atomistic simulations because one must always think about the underlying scientific problem. It may never become as easy to use as atomistic MD or MC methods where a manifold of well-evolved and relatively easy-to-use software packages exist. Coarse graining, however, offers much in the way of addressing scientific problems that are intractable at the atomistic level and, from that perspective, should be considered as a valuable method for molecular simulations.

ACKNOWLEDGMENTS

The author thanks Alison Dickey and Qi Sun for assistance with the figures. Some of the work described here was financially supported by the U.S. Department of Energy, Office of Advanced Scientific Computing through an Early Career Grant (DE-FG02-03ER25568).

REFERENCES

1. W. Tschop, K. Kremer, J. Batoulis, T. Burger, and O. Hahn, Acta Polymerica, 49, 61 (1998). Simulation of Polymer Melts. I. Coarse-Graining Procedure for Polycarbonates.

2. H. Meyer, O. Biermann, R. Faller, D. Reith, and F. Muller-Plathe, J. Chem. Phys., 113, 6264 (2000). Coarse Graining of Nonbonded Interparticle Potentials Using Automatic Simplex Optimization to Fit Structural Properties.

3. R. Faller and D. Reith, Macromolecules, 36, 5406 (2003). Properties of Polyisoprene – Model Building in the Melt and in Solution.

4. R. L. C. Akkermans and W. J. Briels, J. Chem. Phys., 114, 1020 (2001). A Structure-Based Coarse-Grained Model for Polymer Melts.

5. K. R. Haire, T. J. Carver, and A. H. Windle, Comput. Theor. Polym. Sci., 11, 17 (2001). A Monte Carlo Lattice Model for Chain Diffusion in Dense Polymer Systems and its Interlocking with Molecular Dynamics Simulations.

6. J. D. McCoy and J. G. Curro, Macromolecules, 31, 9362 (1998). Mapping of Explicit Atom onto United Atom Potentials.

7. R. Faller and F. Muller-Plathe, Polymer, 43, 621 (2002). Multi-Scale Modelling of Poly(isoprene) Melts.

8. J. Eilhard, A. Zirkel, W. Tschop, O. Hahn, K. Kremer, O. Scharpf, D. Richter, and U. Buchenau, J. Chem. Phys., 110, 1819 (1999). Spatial Correlations in Polycarbonates: Neutron Scattering and Simulation.

9. D. Bai and A. Brandt, in Multiscale Computational Methods in Chemistry and Physics, Vol. 177 of NATO Science Series: Computer and System Sciences, A. Brandt, J. Bernholc, and K. Binder, Eds., IOS Press, Amsterdam, 2001, pp. 250–266. Multiscale Computation of Polymer Models.

10. M. Murat and K. Kremer, J. Chem. Phys., 108, 4340 (1998). From Many Monomers to Many Polymers: Soft Ellipsoid Model for Polymer Melts and Mixtures.

11. C. F. Abrams and K. Kremer, J. Chem. Phys., 116, 3162 (2002). Effects of Excluded Volume and Bond Length on the Dynamics of Dense Bead-Spring Polymer Melts.

12. M. Tsige, J. G. Curro, G. S. Grest, and J. D. McCoy, Macromolecules, 36, 2158 (2003). Molecular Dynamics Simulations and Integral Equation Theory of Alkane Chains: Comparison of Explicit and United Atom Models.

13. H. Fukunaga, J. Takimoto, and M. Doi, J. Chem. Phys., 116, 8183 (2002). A Coarse-Graining Procedure for Flexible Polymer Chains with Bonded and Nonbonded Interactions.

14. O. Hahn, L. Delle Site, and K. Kremer, Macromolec. Theory Simul., 10, 288 (2001). Simulation of Polymer Melts: From Spherical to Ellipsoidal Beads.


15. C. F. Abrams and K. Kremer, Macromolecules, 36, 260 (2003). Combined Coarse-Grained and Atomistic Simulation of Liquid Bisphenol A-Polycarbonate: Liquid and Intramolecular Structure.

16. G. C. Rutledge, Phys. Rev. E, 63, 021111 (2001). Modeling Experimental Data in a Monte Carlo Simulation.

17. F. L. Colhoun, R. C. Armstrong, and G. C. Rutledge, Macromolecules, 35, 6032 (2002). Analysis of Experimental Data for Polystyrene Orientation during Stress Relaxation Using Semigrand Canonical Monte Carlo Simulation.

18. P. Doruker and W. L. Mattice, Macromolec. Theory Simul., 8, 463 (1999). A Second Generation of Mapping/Reverse Mapping of Coarse-Grained and Fully Atomistic Models of Polymer Melts.

19. J. Baschnagel, K. Binder, P. Doruker, A. A. Gusev, O. Hahn, K. Kremer, W. L. Mattice, F. Muller-Plathe, M. Murat, W. Paul, S. Santos, U. W. Suter, and V. Tries, in Advances in Polymer Science, Vol. 152, Springer-Verlag, New York, 2000, pp. 41–156. Bridging the Gap Between Atomistic and Coarse-Grained Models of Polymers: Status and Perspectives.

20. F. Muller-Plathe, ChemPhysChem, 3, 754 (2002). Coarse-Graining in Polymer Simulation: From the Atomistic to the Mesoscopic Scale and Back.

21. F. Muller-Plathe, Soft Mater., 1, 1 (2003). Scale-Hopping in Computer Simulations of Polymers.

22. R. Faller, Polymer, 45, 3869 (2004). Automatic Coarse Graining of Polymers.

23. M. Doi and S. F. Edwards, The Theory of Polymer Dynamics, Vol. 73 of International Series of Monographs on Physics, Clarendon Press, Oxford, 1986.

24. D. Reith, H. Meyer, and F. Muller-Plathe, Macromolecules, 34, 2335 (2001). Mapping Atomistic to Coarse-Grained Polymer Models using Automatic Simplex Optimization to Fit Structural Properties.

25. D. Reith, M. Putz, and F. Muller-Plathe, J. Comput. Chem., 24, 1624 (2003). Deriving Effective Meso-Scale Coarse Graining Potentials from Atomistic Simulations.

26. R. L. C. Akkermans, A Structure-based Coarse-grained Model for Polymer Melts, Ph.D. thesis, University of Twente, 2000.

27. A. Kolinski, J. Skolnick, and R. Yaris, Macromolecules, 19, 2550 (1986). Monte Carlo Study of Local Orientational Order in a Semiflexible Polymer Melt Model.

28. I. Carmesin and K. Kremer, Macromolecules, 21, 2819 (1988). The Bond Fluctuation Method - A New Effective Algorithm for the Dynamics of Polymers in All Spatial Dimensions.

29. K. Binder, Ed., Monte Carlo and Molecular Dynamics Simulation in Polymer Science, Vol. 49, Oxford University Press, Oxford, 1995.

30. K. Binder and G. Ciccotti, Eds., Monte Carlo and Molecular Dynamics of Condensed Matter Systems, Como Conference Proceedings, Societa Italiana di Fisica, Bologna, 1996.

31. G. S. Grest and K. Kremer, Phys. Rev. A, 33, R3628 (1986). Molecular Dynamics Simulation for Polymers in the Presence of a Heat Bath.

32. K. Kremer and G. S. Grest, J. Chem. Phys., 92, 5057 (1990). Dynamics of Entangled Linear Polymer Melts: A Molecular-Dynamics Simulation.

33. R. Faller, F. Muller-Plathe, and A. Heuer, Macromolecules, 33, 6602 (2000). Local Reorientation Dynamics of Semiflexible Polymers in the Melt.

34. G. Milano and F. Muller-Plathe, J. Polym. Sci. B, 43, 871 (2005). Gaussian Multicentred Potentials for Coarse-Grained Polymer Simulations: Linking Atomistic and Mesoscopic Scales.

35. Q. Sun and R. Faller, Comp. Chem. Eng., 29, 2380 (2005). Systematic Coarse-Graining of Atomistic Models for Simulation of Polymeric Systems.

36. R. Faller, H. Schmitz, O. Biermann, and F. Muller-Plathe, J. Comput. Chem., 20, 1009 (1999). Automatic Parameterization of Forcefields for Liquids by Simplex Optimization.


37. R. G. Della Valle and D. Gazzillo, Phys. Rev. B, 59, 13699 (1999). Towards an Effective Potential for the Monomer, Dimer, Hexamer, Solid and Liquid Forms of Hydrogen Fluoride.

38. E. Bourasseau, M. Haboudou, A. Boutin, A. H. Fuchs, and P. Ungerer, J. Chem. Phys., 118, 3020 (2003). New Optimization Method for Intermolecular Potentials: Optimization of a New Anisotropic United Atom Potential for Olefins: Prediction of Equilibrium Properties.

39. D. Reith, H. Meyer, and F. Muller-Plathe, Comput. Phys. Commun., 148, 299 (2002). CG–OPT: A Software Package for Automatic Force Field Design.

40. D. Reith, B. Muller, F. Muller-Plathe, and S. Wiegand, J. Chem. Phys., 116, 9100 (2002). How does the Chain Extension of Poly(acrylic acid) Scale in Aqueous Solution? A Combined Study with Light Scattering and Computer Simulation.

41. O. Engkvist and G. Karlstrom, Chem. Phys., 213, 63 (1996). A Method to Calculate the Probability Distribution for Systems with Large Energy Barriers.

42. E. B. Kim, R. Faller, Q. Yan, N. L. Abbott, and J. J. de Pablo, J. Chem. Phys., 117, 7781 (2002). Potential of Mean Force between a Spherical Particle Suspended in a Nematic Liquid Crystal and a Substrate.

43. E. Lindahl, B. Hess, and D. van der Spoel, J. Mol. Model., 7, 306 (2001). GROMACS 3.0: A Package for Molecular Simulation and Trajectory Analysis.

44. Q. Sun and R. Faller, Macromolecules, 39, 812 (2006). Crossover from Unentangled to Entangled Dynamics in a Systematically Coarse-Grained Polystyrene Melt.

45. J.-P. Ryckaert, G. Ciccotti, and H. J. C. Berendsen, J. Comput. Phys., 23, 327 (1977). Numerical Integration of the Cartesian Equations of Motion of a System with Constraints: Molecular Dynamics of n-Alkanes.

46. F. Muller-Plathe and D. Brown, Comput. Phys. Commun., 64, 7 (1991). Multicolour Algorithms in Molecular Simulation: Vectorisation and Parallelisation of Internal Forces and Constraints.

47. H. C. Andersen, J. Comput. Phys., 72, 2384 (1983). Rattle: A 'Velocity' Version of the Shake Algorithm for Molecular Dynamics Simulations.

48. M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.

49. B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije, J. Comput. Chem., 18, 1463 (1997). LINCS: A Linear Constraint Solver for Molecular Simulations.

50. R. Faller, F. Muller-Plathe, M. Doxastakis, and D. Theodorou, Macromolecules, 34, 1436 (2001). Local Structure and Dynamics in trans Polyisoprene.

51. J. Budzien, C. Raphael, M. D. Ediger, and J. J. de Pablo, J. Chem. Phys., 116, 8209 (2002). Segmental Dynamics in a Blend of Alkanes: Nuclear Magnetic Resonance Experiments and Molecular Dynamics Simulation.

52. M. Doxastakis, D. N. Theodorou, G. Fytas, F. Kremer, R. Faller, F. Muller-Plathe, and N. Hadjichristidis, J. Chem. Phys., 119, 6883 (2003). Chain and Local Dynamics of Polyisoprene as Probed by Experiments and Computer Simulations.

53. S. J. Marrink, A. H. de Vries, and A. Mark, J. Phys. Chem. B, 108, 750 (2004). Coarse Grained Model for Semi-Quantitative Lipid Simulation.

54. P. E. Rouse, J. Chem. Phys., 21, 1272 (1953). A Theory of Linear Viscoelastic Properties of Dilute Solutions of Coiling Polymers.

55. G. Strobl, The Physics of Polymers, second ed., Springer Verlag, Berlin, 1997.

56. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Basic Algorithms to Applications, Academic Press, San Diego, CA, 1996.

57. D. Reith, Neue Methoden zur Computersimulation von Polymersystemen auf verschiedenen Langenskalen und ihre Anwendung, PhD thesis, MPI fur Polymerforschung and Universitat Mainz, 2001. Available: http://archimed.uni-mainz.de/pub/2001/0074.


58. P. V. K. Pant and D. N. Theodorou, Macromolecules, 28, 7224 (1995). Variable Connectivity Method for the Atomistic Monte Carlo Simulation of Polydisperse Polymer Melts.

59. Z. Chen and F. A. Escobedo, J. Chem. Phys., 113, 11382 (2000). A Configurational-Bias Approach for the Simulation of Inner Sections of Linear and Cyclic Molecules.

60. J. J. de Pablo, M. Laso, and U. W. Suter, J. Chem. Phys., 96, 2395 (1992). Simulation of Polyethylene above and below the Melting-point.

61. J. Wittmer, W. Paul, and K. Binder, Macromolecules, 25, 7211 (1992). Rouse and Reptation Dynamics at Finite Temperatures: A Monte Carlo Simulation.

62. M. Muller, Macromolec. Theory Simul., 8, 343 (1999). Miscibility Behavior and Single Chain Properties in Polymer Blends: A Bond Fluctuation Model Study.

63. T. Haliloglu and W. L. Mattice, Rev. Chem. Eng., 15, 293 (1999). Simulation of Rotational Isomeric State Models for Polypropylene Melts on a High Coordination Lattice.

64. T. C. Clancy and W. L. Mattice, J. Chem. Phys., 112, 10049 (2000). Rotational Isomeric State Chains on a High Coordination Lattice: Dynamic Monte Carlo Algorithm Details.

65. R. Ozisik, E. D. von Meerwall, and W. L. Mattice, Polymer, 43, 629 (2001). Comparison of the Diffusion Coefficients of Linear and Cyclic Alkanes.

66. P. Ahlrichs and B. Dunweg, J. Chem. Phys., 111, 8225 (1999). Simulation of a Single Polymer Chain in Solution by Combining Lattice Boltzmann and Molecular Dynamics.

67. W. Kuhn, Kolloid Z., 68, 2 (1934). Uber die Gestalt Fadenformiger Molekule in Losungen.

68. T. Kreer, J. Baschnagel, M. Muller, and K. Binder, Macromolecules, 34, 1105 (2001). Monte Carlo Simulation of Long Chain Polymer Melts: Crossover from Rouse to Reptation Dynamics.

69. W. Tschop, K. Kremer, O. Hahn, J. Batoulis, and T. Burger, Acta Polymerica, 49, 75 (1998). Simulation of Polymer Melts. II. From Coarse-Grained Models back to Atomistic Description.

70. T. Soddemann, B. Dunweg, and K. Kremer, Eur. Phys. J. E, 6, 409 (2001). A Generic Computer Model for Amphiphilic Systems.

71. J. C. Shelley, M. Y. Shelley, R. C. Reeder, S. Bandyopadhyay, and M. L. Klein, J. Phys. Chem. B, 105, 4464 (2001). A Coarse Grain Model for Phospholipid Simulations.

72. L. Rekvig, B. Hafskjold, and B. Smit, J. Chem. Phys., 120, 4897 (2004). Simulating the Effect of Surfactant Structure on Bending Moduli of Monolayers.

73. M. Kranenburg, M. Venturoli, and B. Smit, J. Phys. Chem. B, 107, 11491 (2003). Phase Behavior and Induced Interdigitation in Bilayers Studied with Dissipative Particle Dynamics.

74. M. Kranenburg and B. Smit, J. Phys. Chem. B, 109, 6553 (2005). Phase Behavior of Model Lipid Bilayers.

75. B. Smit, P. A. J. Hilbers, K. Esselink, L. A. M. Rupert, N. M. van Os, and A. G. Schlijper, Nature, 348, 624 (1990). Computer Simulations of a Water/Oil Interface in the Presence of Micelles.

76. R. Goetz, G. Gompper, and R. Lipowsky, Phys. Rev. Lett., 81, 221 (1999). Mobility and Elasticity of Self-Assembled Membranes.

77. G. Ayton and G. A. Voth, Biophys. J., 83, 3357 (2002). Bridging Microscopic and Mesoscopic Simulations of Lipid Bilayers.

78. H. Guo and K. Kremer, J. Chem. Phys., 118, 7714 (2003). Amphiphilic Lamellar Model Systems under Dilation and Compression: Molecular Dynamics Study.

79. M. Muller, K. Katsov, and M. Schick, J. Polym. Sci. B, 41, 1441 (2003). Coarse Grained Models and Collective Phenomena in Membranes: Computer Simulation of Membrane Fusion.

80. T. Murtola, E. Falck, M. Patra, M. Karttunen, and I. Vattulainen, J. Chem. Phys., 121, 9156 (2004). Coarse-Grained Model for Phospholipid/Cholesterol Bilayer.


81. S. O. Nielsen, C. F. Lopes, I. Ivanov, P. B. Moore, J. C. Shelley, and M. L. Klein, Biophys. J., 87, 2107 (2004). Transmembrane Peptide-Induced Lipid Sorting and Mechanism of Lα-to-Inverted Phase Transition Using Coarse-Grain Molecular Dynamics.

82. O. Lenz and F. Schmid, J. Mol. Liq., 117, 147 (2005). A Simple Computer Model for Liquid Lipid Bilayers.

83. A. N. Dickey and R. Faller, J. Polym. Sci. B, 43, 1025 (2005). Investigating Interactions of Biomembranes and Alcohols: A Multiscale Approach.

84. O. G. Mouritsen, in Advances in the Computer Simulation of Liquid Crystals, P. Pasini and C. Zannoni, Eds., Vol. C 545 of NATO ASI, NATO, Kluwer, Dordrecht, the Netherlands, 2000, pp. 139–188. Computer Simulation of Lyotropic Liquid Crystals as Models of Biological Membranes.

85. O. Farago, J. Chem. Phys., 119, 596 (2003). "Water-Free" Computer Model for Fluid Bilayer Membranes.

86. G. Brannigan and F. L. H. Brown, J. Chem. Phys., 120, 1059 (2004). Solvent-Free Simulations of Fluid Membrane Bilayers.

87. G. Brannigan and F. L. H. Brown, J. Chem. Phys., 122, 074905 (2005). Composition Dependence of Bilayer Elasticity.

88. I. R. Cooke, K. Kremer, and M. Deserno, Phys. Rev. E, 72, 011506 (2005). Tunable Generic Model for Fluid Bilayer Membranes.

89. G. Brannigan, A. C. Tamboli, and F. L. H. Brown, J. Chem. Phys., 121, 3259 (2004). The Role of Molecular Shape in Bilayer Elasticity and Phase Behavior.

90. R. Faller and T. L. Kuhl, Soft Mater., 1, 343 (2003). Modeling the Binding of Cholera-Toxin to a Lipid Membrane by a Non-Additive Two-Dimensional Hard Disk Model.

91. C. E. Miller, J. Majewski, R. Faller, S. Satija, and T. L. Kuhl, Biophys. J., 86, 3700 (2004). Cholera Toxin Assault on Lipid Monolayers Containing Ganglioside GM1.

92. C. E. Miller, J. Majewski, K. Kjaer, M. Weygand, R. Faller, S. Satija, and T. L. Kuhl, Coll. Surf. B: Biointerfaces, 40, 159 (2005). Neutron and X-Ray Scattering Studies of Cholera Toxin Interactions with Lipid Monolayers at the Air–Liquid Interface.

93. J. Majewski, T. L. Kuhl, K. Kjaer, and G. S. Smith, Biophys. J., 81, 2707 (2001). Packing of Ganglioside–Phospholipid Monolayers: An X-Ray Diffraction and Reflectivity Study.

94. B. J. Alder and T. E. Wainwright, Phys. Rev., 127, 359 (1962). Phase Transition in Elastic Disks.

95. W. W. Wood, J. Chem. Phys., 52, 729 (1970). NpT-Ensemble Monte Carlo Calculations for the Hard-Disk Fluid.

96. R. A. Reed, J. Mattai, and G. G. Shipley, Biochemistry, 26, 824 (1987). Interaction of Cholera Toxin with Ganglioside GM1 Receptors in Supported Lipid Monolayers.

97. H. O. Ribi, D. S. Ludwig, K. L. Mercer, G. K. Schoolnik, and R. D. Kornberg, Science, 239, 1272 (1988). Three-Dimensional Structure of Cholera Toxin Penetrating a Lipid Membrane.

98. J. Ghosh, B. Y. Wong, Q. Sun, F. R. Pon, and R. Faller, Molecular Simulation, 32, 175 (2006). Simulation of Glasses: Multiscale Modeling and Density of States Monte Carlo Simulations.


CHAPTER 5

Analysis of Chemical Information Content Using Shannon Entropy

Jeffrey W. Godden and Jurgen Bajorath*

Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universitat, Bonn, Germany

INTRODUCTION

The goals of this tutorial are to introduce to the novice molecular modeler the application of information content analysis in chemistry, to present an information theoretic examination of chemical descriptors and provide insights into their relative significance, and to show that an entropy-based information metric provides an undistorted assessment of the diversity of a chemical database. Along the way the Shannon entropy (SE) concept, a formalism originally developed for the telecommunications industry,1,2 will be introduced and applied. A differential form of the SE metric will be used to compare chemical libraries and to suggest which descriptors are most responsive to chemical characteristics of different compound collections. Although this chapter focuses on the analysis and comparison of the information content of molecular descriptors in large databases, we need to point out that other applications of information theory in chemistry exist.

Entropy is well known as a quantitative measure of the disorder of a closed system in thermodynamics and statistical mechanics. The equilibrium of a thermodynamic system is associated with the distribution of objects or molecules having the greatest probability of occurring, and this most probable state is the one with the greatest degree of disorder. In statistical mechanics, the increase in entropy to its maximum at equilibrium is rationalized as the intrinsic tendency of any system to proceed to increasingly probable states. In this context, entropy is interpreted as a function of the number of possible microscopic states that a system can occupy, as determined by external factors such as temperature or pressure. In a similar manner, entropy is also used in information theory as a measure of information contained in a dataset or transmitted message.

Claude E. Shannon is generally recognized as the founding father of information theory as we understand it today: a mathematical theory or framework to quantitatively describe the communication of data. Irrespective of their nature or type, data need to be transmitted over "channels," and a focal point of Shannon's pioneering work has been that channels available for communicating data are generally noisy. Shannon demonstrated that data can be communicated over noisy channels with a small probability of error if it is possible to encode (and subsequently decode) the data in a way that communicates data at a rate below but close to channel capacity.

The most basic means of conceptualizing entropy in the context of information theory is to associate the information content of a signal with a probability distribution. The amount of apparent randomness, or distribution spread, is then treated as an entropy metric and thereby associated with the information content of the system or message giving rise to the distribution.

A fundamentally important interpretation of Shannon's formalism for the study of molecules was that any structural representation could be understood as a communication carrying a specific amount of information. Consequently, in 1953, the concept of molecular information content was introduced.3 In 1977, graph theory was combined with information theoretic analysis in the design and study of topological indices (graph-based descriptors of molecular topology).4 A year later, the principle was formulated that entropy is transformed into molecular information by formation of structures from elements (through bonds).5 In 1981, the combination of graph and information theory led to the first quantitative description of molecular complexity.6 Shannon entropy analysis was also applied in quantum mechanics. In 1985, entropy calculations were reported to analyze quantum mechanical basis sets.7 In 1998, the concept of local Shannon entropy was introduced based on partitioning of charge densities over atoms or groups.8

More recently, several investigators have focused on adapting the Shannon entropy concept for various uses in theoretical organic chemistry and chemoinformatics. For example, almost simultaneously with our initial studies on Shannon entropy-based descriptor and database profiling,9 the adaptation of this concept for the design of diverse chemical libraries was reported.10 Building directly on our work to adapt9 and to extend11,12 the Shannon entropy formalism, Zell et al. further extended the approach for feature and descriptor selection by introducing a Shannon entropy clique algorithm,13,14 and Graham has studied the molecular information content of organic compounds using Shannon entropy calculations.15–17


These publications illustrate very well that the Shannon entropy concept has established itself in computational chemistry and chemoinformatics, regardless of whether applied in the context of molecular graph theory, diversity analysis, descriptor selection, or large-scale database profiling. As we will see, the Shannon entropy formalism is not difficult to grasp even though the underlying concept is more complex than it appears at first glance. For example, although Shannon entropy was related to molecular information content as early as 1953, it took 15 more years until it was rigorously shown that the Shannon formalism is truly a measure of information content when applied to molecular structure via graph representations.18

We will describe below the Shannon entropy (SE) formalism in detail and explain how it can be used to estimate chemical information content based on histogram representations of feature value distributions. Examples from our work and studies by others will be used to illustrate key aspects of chemical information content analysis. Although we focus on the Shannon entropy concept, other measures of information content will also be discussed, albeit briefly. We will also explain why it has been useful to extend the Shannon entropy concept by introducing differential Shannon entropy (DSE)11 to facilitate large-scale analysis and comparison of chemical features. The DSE formalism has ultimately led to the introduction of the SE–DSE metric.12

SHANNON ENTROPY CONCEPT

Claude E. Shannon, in his seminal 1948 paper,1 considered the frequency of symbols sent along transmission channels and formulated a metric of the expectation of aggregations of symbols, which he connected to formulations for entropy found previously in statistical mechanics. Shannon was concerned with the channel capacity needed to transmit a specific amount of information. For Shannon, a channel was a real or theoretical conduit of a signal. For our purposes here, the analog of a channel is a single bin in a histogram, and instead of calculating channel capacity, we will hold our “channels” constant and monitor the degree to which their capacity is filled. The Shannon entropy (or SE value)1,2 is defined as

$$\mathrm{SE} = -\sum_i p_i \log_2 p_i \qquad [1]$$

Here p_i is the estimated probability, or frequency, of the occurrence of a specific channel of data. Each p_i corresponds to a particular histogram bin count normalized by the sum of all bin counts, c_i being the bin count for a particular bin (Eq. [2]):

$$p_i = \frac{c_i}{\sum_j c_j} = \frac{c_i}{\text{total count}} \qquad [2]$$


Note that the logarithm in Eq. [1] is taken to base 2. Although this amounts to a simple scaling factor, it is a convention adopted in information theory so that entropy can be considered equivalent to the number of bifurcating (binary) choices made in the distribution of the data. In other words, using base 2 allows us to address this question: How many yes/no decisions do we need to make for data counts to fall into specific channels or bins in order to reproduce the observed data distribution? The higher the information content, the more numerous are the decisions required to place each data point.

In Figure 1, the number of “decisions” necessary to place the 100 compounds (SE = 2.316) falls between the value that would have resulted if the data distribution had produced four equally populated bins (log2 4 = 2.0) and that of eight equally populated bins (log2 8 = 3.0). This intermediate value arises because the example probabilities are not evenly distributed over the eight bins in our histogram, and therefore, our ability to guess which bin a future compound will fall into is better than if they were equally distributed. Another way to look at this is that the information content of the distribution, which is the opposite of our predictive ability (there is no information for us if we already know the outcome with certainty), is less than if every bin were equally probable. The connection this example has to Shannon’s original work on the transmission of information is to view the molecular weight frequencies in Figure 1 as the frequencies of the units of information to be transmitted, e.g., letters in an alphabet. Given this view, from Figure 1, we would conclude that we would need on average 2.316 bits to encode the molecular weight bin “message” for this hypothetical distribution.

[Figure 1 content: Start with a histogram of the molecular weights of 100 compounds (horizontal axis from 200 to 900; vertical axis, bin counts from 0 to 45). The bin counts are 43 + 23 + 11 + 8 + 7 + 5 + 2 + 1 = 100 total bin counts. Divide each bin count by the total bin count to get the sample probabilities: 0.43, 0.23, 0.11, 0.08, 0.07, 0.05, 0.02, 0.01. Apply Shannon’s equation to these probabilities: 0.43 log2 0.43 + 0.23 log2 0.23 + 0.11 log2 0.11 + 0.08 log2 0.08 + 0.07 log2 0.07 + 0.05 log2 0.05 + 0.02 log2 0.02 + 0.01 log2 0.01 = −2.316. Invert the sign, and the SE is 2.316.]

Figure 1 Example of Shannon entropy calculation for a hypothetical distribution of molecular weights.
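The Figure 1 calculation can be retraced in a few lines of code. The following is a minimal sketch, not the authors' implementation; the function name `shannon_entropy` and the variable names are illustrative assumptions, and the bin counts are those read from Figure 1.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (base 2) of a histogram given as a list of bin counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one data point")
    entropy = 0.0
    for c in counts:
        if c > 0:  # empty bins are skipped; they contribute nothing to the sum
            p = c / total                # Eq. [2]: bin count / total count
            entropy -= p * math.log2(p)  # Eq. [1]: accumulate -p * log2(p)
    return entropy


# Bin counts of the hypothetical molecular weight histogram in Figure 1
figure1_counts = [43, 23, 11, 8, 7, 5, 2, 1]
se_figure1 = shannon_entropy(figure1_counts)
print(f"SE = {se_figure1:.2f} bits")
```

Evaluated in floating point, the result agrees with the value reported in Figure 1 to within rounding of the last digit, and it lies between the four-bin (2.0) and eight-bin (3.0) uniform limits discussed above.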

The extremes of data distributions are depicted in Figure 2 along with an arbitrary midpoint in a calculation of descriptor entropy.

When the data are uniformly distributed over all of the histogram bins, the SE value is equal to the base 2 logarithm of the number of histogram bins. Therefore, the SE value is dependent on the number of histogram bins used for a particular study. This dependence can be, for the most part, removed by dividing the SE value by the logarithm to the base 2 of the number of histogram bins chosen (“N” in Eq. [3]), which gives rise to a scaled SE or SSE value:

$$\mathrm{SSE} = \frac{\mathrm{SE}}{\log_2 N} \qquad [3]$$

The SSE has an absolute minimum of 0, associated with a data distribution where all the values fall into a single bin, and a maximum of 1.0, where each bin is occupied by an equal number of data counts. As we shall see, SSE is not independent of boundary effects (described later) underlying the data, and there is an asymptotic relationship associated with the number of bins, which can be ignored for most practical comparisons.
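The two limiting cases of Eq. [3] are easy to verify numerically. The sketch below is illustrative only; the function name `scaled_shannon_entropy` is an assumption, and the toy histograms are not taken from the chapter.

```python
import math


def scaled_shannon_entropy(counts):
    """SSE of Eq. [3]: Shannon entropy divided by log2 of the number of bins."""
    n_bins = len(counts)
    if n_bins < 2:
        raise ValueError("at least two bins are needed to scale the entropy")
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one data point")
    # Eq. [1], skipping empty bins
    se = sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)
    return se / math.log2(n_bins)


sse_single = scaled_shannon_entropy([100, 0, 0, 0])     # all data in one bin
sse_uniform = scaled_shannon_entropy([25, 25, 25, 25])  # equally populated bins
print(sse_single, sse_uniform)
```

The first histogram gives the minimum SSE of 0 and the second the maximum of 1.0; any other distribution over the same four bins falls in between.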

Figure 2 Data distribution extremes and corresponding SE values. Depicted are three hypothetical data distributions that correspond to no information content, intermediate information content, and maximal information content (from left to right).

In addition to this asymptotic relationship associated with the number of bins used in a given study, the treatment of data outliers affects SSE calculations, just as it would influence the analysis of any histogram. A large body of literature associated with both of these topics exists (see Refs. 19–21). Surprisingly, there is no known optimum value for the number of bins chosen for a histogram,22 but commonly accepted rules exist. For example, one postulate is that the bin width should be proportional to the standard deviation of the data and inversely proportional to the cube root of the number of available data points.23 An important point when calculating SSE is that the number of histogram bins, however chosen, should remain constant throughout any comparison made, even though the SSE values are normalized with respect to bin numbers.

Outliers are a significant problem for the distribution of chemical descriptor values. Many descriptors were designed with a relatively narrow range of chemical compounds in mind, and using them indiscriminately on a large, diverse chemical database will produce descriptor value outliers (and occasionally even undefined numbers such as infinity). An outlier can distort a standard histogram by forcing other values to be concentrated into fewer bins, as is shown in Figure 3. Although one of the common uses of histograms is the discovery of outliers, this should not lead to a kind of circular reasoning in which outliers are removed until a histogram “looks good.” A more unbiased approach to removing outliers is to ask the question: How many vacant internal bins does a histogram have? If this number exceeds some preestablished threshold (e.g., greater than 10% of the total number of bins), a percent trimming of the extreme values should be employed to tag values as outliers, which are then omitted from subsequent SE or SSE calculations. Although there are more formalized tests for outliers,24,25 many of them depend on the presence of (approximately) normal data distributions,26 and therefore must be discounted for reasons already mentioned. Once a descriptor value of a compound is declared to be an outlier, all other values associated with that compound must also be removed for any consistent comparison. It is statistically biased to remove only those descriptor values that are outliers and then carry out SE-based comparisons between descriptors; outlier removal must be made consistent at the level of the compound set. Simply put, a compound flagged as an outlier must be excluded from all subsequent calculations.
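The vacant-interior-bin criterion can be sketched as follows. This is a hedged illustration rather than the procedure used in the studies cited: the function names, the symmetric trimming of both tails, the 1% trimming level, and the toy data are all assumptions made for the example. In a real multi-descriptor study, the compounds (not just the values) flagged here would have to be removed for every descriptor.

```python
def histogram_counts(values, n_bins):
    """Equal-width histogram between the minimum and maximum of the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values identical; histogram range is empty")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # the maximum value is assigned to the last bin
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def vacant_interior_bins(counts):
    """Number of empty bins between the first and last occupied bins."""
    occupied = [i for i, c in enumerate(counts) if c > 0]
    if not occupied:
        return 0
    return sum(1 for c in counts[occupied[0]:occupied[-1] + 1] if c == 0)


def trim_extremes(values, percent):
    """Drop `percent` % of the points from each tail (assumed symmetric trimming)."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100.0)
    return ordered[k:len(ordered) - k] if k > 0 else ordered


# Toy data: 100 values between 400 and 499 plus one outlier at 2000
values = list(range(400, 500)) + [2000]
n_bins = 10
counts_before = histogram_counts(values, n_bins)

if vacant_interior_bins(counts_before) > 0.10 * n_bins:  # the 10% threshold
    trimmed = trim_extremes(values, percent=1.0)
else:
    trimmed = values
counts_after = histogram_counts(trimmed, n_bins)
print(counts_before, counts_after)
```

With the outlier present, eight of the ten bins are vacant interior bins, which triggers the trimming; afterward every bin is populated.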

[Figure 3: two histograms with bin counts on the vertical axes; the left horizontal axis spans about 400 to 500, and the right horizontal axis extends to 2000.]

Figure 3 Effect of an outlier on histograms with constant binning schemes. On the left is a histogram of 1000 normally distributed data points, and on the right is a histogram of the same data with a single outlier of value 2000 added. Because the binning scheme maintains a particular number of equal width bins, this one outlier forces all previous data counts into the single lowest valued bin. The SE for the left histogram is 2.301 (SSE: 0.726), and the right histogram SE is 0.011 (SSE: 0.004).

Although the Shannon entropy formalism appears to be “easy” and is straightforward to implement, entropy calculations are intimately connected with and critically influenced by the data representation over “channels” or histogram bins. Those channels can be severely affected by statistical problems associated with outliers in datasets. For all practical purposes, consistency of data representation and rigorous outlier treatment are key considerations when evaluating Shannon entropy.

There are other metrics of information content, and several of them are based on the Shannon entropy.27 About 10 years after introduction of the Shannon entropy concept, Jaynes formulated the “maximum entropy” approach,28 which is often referred to as Jaynes entropy and is closely related to Shannon’s work. Jaynes’ introduction of the notion of maximum entropy has become an important approach to any study of statistical inference where all or part of a model system’s probability distribution remains unknown. Jaynes entropy, or “relations,” which guide the parameterization to achieve a model of minimum bias, are built on the Kullback–Leibler (KL) function,29 sometimes referred to as the cross-entropy or “relative entropy” function, which is often written as follows (p and q represent two probability distributions indexed by k):

$$\mathrm{KL} = \sum_k p_k \log_2\left(\frac{p_k}{q_k}\right) \qquad [4]$$

The Kullback–Leibler formulation evaluates the relative entropy between two data distributions. However, it is not symmetrical with respect to the two distributions under comparison; that is, one must declare one distribution as the base set or reference from which the other is assumed to depart. Concerning the connection between Jaynes entropy and the Kullback–Leibler function, maximum entropy is achieved when q_k is replaced with a distribution about which there is “prior knowledge” and p_k is adjusted so as to minimize KL as written in Eq. [4] (equivalently, to maximize −KL). Prior knowledge could, for example, be the mean or expectation value of a data distribution. Importantly, because of the quotient involved (Eq. [4]), the Kullback–Leibler function becomes undefined if any bin is unpopulated. This renders the function inappropriate for the purposes of estimating information content in chemical descriptor sets, which is discussed below.
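Both properties noted above, the asymmetry and the failure on unpopulated bins, can be seen in a small numerical sketch. The function name `kullback_leibler` and the example distributions are illustrative assumptions. The function deliberately raises an error for any empty bin, mirroring the statement in the text; some treatments instead adopt the convention that a term with p_k = 0 contributes zero.

```python
import math


def kullback_leibler(p, q):
    """KL function of Eq. [4] in bits; p and q are lists of bin probabilities."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same bins")
    total = 0.0
    for pk, qk in zip(p, q):
        if pk <= 0 or qk <= 0:
            # the quotient and logarithm in Eq. [4] cannot be evaluated
            raise ValueError("KL is undefined when a bin is unpopulated")
        total += pk * math.log2(pk / qk)
    return total


p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kullback_leibler(p, q)  # q taken as the reference distribution
kl_qp = kullback_leibler(q, p)  # p taken as the reference distribution
print(kl_pq, kl_qp)  # the two values differ: KL is not symmetric

try:
    kullback_leibler([0.5, 0.5], [1.0, 0.0])
    empty_bin_fails = False
except ValueError:
    empty_bin_fails = True
```

Because descriptor histograms of real compound sets routinely contain empty bins, the error branch would be hit often in practice, which is the limitation the text points out.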

DESCRIPTOR COMPARISON

Chemical descriptors are used widely in chemoinformatics research to map the chemical features of compounds into the domain of numerical and statistical analysis.30 Once molecular features are expressed numerically, or as enumerated factor sets (e.g., structural keys), the tools for numerical and statistical analysis can then be applied to analyze and compare molecular similarity or diversity of compound collections.


Many chemical descriptors exist and are readily available in the literature or can be easily calculated,31 but discerning which ones are most useful to a particular study can be a daunting task.30 When selecting chemical descriptors, the researcher should consider which set best encodes the features that are important to the study in question. Even without focusing on a particular problem, however, one can estimate the information a descriptor may contain by considering the details of its numerical construction. For example, a descriptor can produce an integer value for a molecule, such as an enumeration of a chemical feature (e.g., the number of triple bonds), or it may possess a span of real values (e.g., logP(o/w), the logarithm of the octanol/water partition coefficient), or it may fall somewhere between the two, like the descriptor “molecular weight,” which is quantized in that not all real values are attainable. One can easily understand the fundamentally quantized nature of the molecular weight descriptor by considering that a hydrogen atom, whose atomic mass is approximately 1.00794, is the smallest possible unit one can add to a molecule. How quantized a descriptor is (its “granularity”) can be estimated quickly by the number of unique values it attains in a large database. Another question to be considered is as follows: What is the theoretical range of the descriptor value? When one uses more than one descriptor, as is typically the case, the relative ranges of the individual descriptors must be considered.

Once the general numeric behavior of a descriptor has been considered, some understanding of the descriptor’s statistical distribution in a dataset must be obtained. Specifically, the modeler must ask: Can we sensibly apply analysis tools that depend on approximately normal (or “Gaussian”) distributions? Even the most common estimator of the central tendency, the average (or mean) value, is more sensitive to departures from a normal distribution than is often realized. A proper estimator of central tendency (e.g., mean, median, or mode) is a single value chosen to be an accurate representative of the behavior of the whole population. Values of descriptors often display long tails in their distributions, and chemical libraries frequently contain compounds that are likely outliers, as discussed earlier. In such cases, the mean value may not be the best representation of the central tendency. When possible, the descriptor’s distribution should be viewed as a histogram, and a quick glance at that histogram, of even a relatively small random sample of the data population, can readily suggest the proper treatment of the data or explain why some analysis has generated unexpected results.

Figure 4 shows a few descriptor histograms of a chemical library of currently typical size, i.e., containing more than a million compounds.32 It is immediately apparent that any statistical technique that depends on a normal distribution cannot be generally applied; it would be inappropriate, for example, to use the average value of the number of aromatic bonds to characterize a representative for a set of compounds. Any metric of descriptor variability based on a normal distribution, such as a standard deviation, is also generally unreliable. Clearly, a nonparametric estimator of descriptor information content is needed, and the histograms suggest a method. The intimate connection of histogram analysis and information content estimation based on the Shannon entropy makes this type of analysis very attractive for the systematic study and comparison of descriptor value distributions.

[Figure 4: three histograms; the horizontal axes span 0 to 50, 0 to 30000, and 0.0 to 1.0, respectively, and the vertical axes show bin counts.]

Figure 4 Examples of histograms of molecular descriptors. Shown are database distributions for three descriptors: “b_ar” stands for the “number of aromatic bonds,” and “weinerPath” and “petitjean” are both molecular distance matrix descriptors. These distributions are representative of those seen for many different types of molecular descriptors. It should be noted that in the right graph the bins on either side of the peak are not empty but graphically insignificant relative to the central bin.

To provide some specific examples, let us consider the calculation of SSE for four molecular descriptors in two well-known databases: the Available Chemical Directory (ACD)33 and the Molecular Drug Data Report (MDDR).34 These two databases contain different types of molecules. The ACD contains many organic compounds and reagents, whereas MDDR consists exclusively of biologically active molecules, many of which have originated from drug discovery programs. Thus, MDDR compounds are much more “lead-like” or “drug-like” than are the synthetic organic ACD molecules. The descriptors displayed in Figure 5 are “molecular weight,” the “number of rotatable bonds” in a molecule (a measure of molecular flexibility), “logP(o/w),” the logarithm of the octanol/water partition coefficient (a measure of hydrophobic character), and the “number of hydrogen bond donors” in a molecule. It should be noted that these descriptors almost constitute the “rule-of-five” set of descriptors used to estimate the oral availability of pharmaceutically relevant molecules.35 The database values of these four descriptors were calculated using the software platform Molecular Operating Environment (MOE).36 Histograms for 231,187 ACD compounds and 155,814 MDDR compounds were constructed by keeping the number of histogram bins constant, removing any compounds judged to be outliers for any of the descriptors under study, and establishing overall minimum and maximum descriptor values. The number of bins was fixed at 19 according to the Sturges rule, which sets the number of bins to one plus the base 2 logarithm of the number of data points.37 In terms of outliers, a single ACD compound had a sufficiently high molecular weight that it left three interior histogram bins empty. Consequently, this compound was removed from further consideration for the remainder of the study. None of the MDDR compounds was considered an outlier. Global minimum and maximum descriptor values were identified, and the histogram bin counts were accumulated. These counts were then converted to frequencies according to Eq. [2], and entropy values were calculated via Eqs. [1] and [3]. Figure 5 shows the resulting histograms and reports the associated descriptor entropy values.
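The binning setup described above can be sketched in code. This is an illustration under stated assumptions, not the original workflow: the function names are hypothetical, rounding the Sturges value up to the next integer is an assumption (the text reports only the result, 19), and the toy value lists stand in for real descriptor values.

```python
import math


def sturges_bins(n_points):
    """Sturges rule: one plus the base 2 logarithm of the number of data points.

    Rounding up to an integer is an assumption made for this sketch.
    """
    return math.ceil(1 + math.log2(n_points))


def shared_range_histograms(values_a, values_b, n_bins):
    """Histogram two datasets on one grid spanning their global min and max."""
    lo = min(min(values_a), min(values_b))
    hi = max(max(values_a), max(values_b))
    if hi == lo:
        raise ValueError("all values identical; histogram range is empty")
    width = (hi - lo) / n_bins

    def count(values):
        counts = [0] * n_bins
        for v in values:
            # the global maximum is assigned to the last bin
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        return counts

    return count(values_a), count(values_b)


bins_acd = sturges_bins(231187)   # number of ACD compounds quoted in the text
bins_mddr = sturges_bins(155814)  # number of MDDR compounds quoted in the text
print(bins_acd, bins_mddr)

# Toy descriptor values binned on a common four-bin grid from 1 to 9
counts_a, counts_b = shared_range_histograms([1, 2, 3, 4], [3, 4, 5, 9], 4)
print(counts_a, counts_b)
```

Using a single grid for both databases is what makes the resulting bin frequencies, and hence the SE and SSE values, directly comparable.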

The entropy values reflect the data distributions captured in the histograms. For every descriptor, the ACD compound database produces histograms with more populated bins toward the extremes of the chart, which is reflected by the calculated entropy values. These findings are consistent with the fact that the ACD database contains a variety of synthetic compounds whose descriptor and property values are not restricted to those typically seen in more drug-like compounds. Comparing the entropy values of the same descriptor between two compound sets can therefore provide insights into the relative information content of the databases, at least in light of the monitored feature(s). This use of entropy analysis will be covered in detail below when the DSE formalism is introduced.

Figure 5 Descriptor histograms and SE and SSE values from two different compound databases. Compared are distributions of descriptor values in a chemical (ACD) and pharmaceutical (MDDR) database. The top number in the upper part of each chart reports the SE value, and the number beneath is the SSE value. Descriptor abbreviations: “MW,” molecular weight; “a_don,” number of hydrogen bond donor atoms; “b_rotN,” number of rotatable bonds in a molecule; “logP(o/w),” logarithm of the octanol/water partition coefficient.

Because SE is a nonparametric distribution metric, one of the essential features of an entropic approach to descriptor information content analysis is that descriptors with different units, numerical ranges, and variability can be compared directly, a task that would otherwise not be possible. This allows us to ask questions such as the following: Which descriptors carry high levels of information for a specific compound set and which carry very little? To answer this question, we have systematically studied “1-D descriptors” and “2-D descriptors” contained in various databases.9,12 These descriptor designations mean that their values are calculated from molecular composition (1-D) formulas and two-dimensional (graph) representations of molecular structure (2-D), respectively. Among others, they include categories such as bulk property descriptors, physicochemical parameters, atom and bond counts, and topological or shape indices. What we have generally found is that descriptors belonging to all of these categories can carry significant information, even those consisting of atom and bond counts. Thus, there is no strict correlation between the complexity of a descriptor and its information content. The information content depends strongly on the compound database under investigation. However, as one would expect, there is a tendency for complex descriptors, whose definition can be understood to consist of several simple descriptors, to carry more information. Examples of such complexity are descriptors that depend on divided atomic surface area distributions of other property descriptors and, accordingly, represent higher order statistical combinations of other descriptors.38 On the other hand, simple counts of atomic or bond properties typically have discrete values and thus often occur at the lower end of an information content spectrum. Perhaps unexpectedly, we have made similar observations for Kier and Hall connectivity and shape indices.39 These indices are calculated in a hierarchical order where the higher orders consider an increasing span of neighbor atoms. Consequently, even though those connectivity indices are found at the lower end of the descriptor information content spectrum, it is consistently the higher order connectivity descriptors whose values tend to go toward zero. Thus, upon closer inspection, our unexpected findings can be rationalized.

The utility of comparing entropic information content between descriptors is particularly evident when one is attempting to construct an efficient “aggregate of chemical information,” an example of which is a “fingerprint,” which refers to a bit string representation of molecular structure and properties composed of various descriptor contributions. For such an endeavor, one would like to have a set of information-rich descriptors, which are, however, not overdetermined with regard to a specific compound feature (otherwise, this exercise often becomes a deceptive form of a single property or substructure search). Another situation in which one would be interested in descriptors that are particularly information-rich is the study of compound class sensitivity, where one would like to know which descriptors are most sensitive to chemical features encoded in a class of compounds that display, for example, specific activity against a target protein. An entropy-based metric designed to answer questions related to such issues will be discussed below.

INFLUENCE OF BOUNDARY EFFECTS

As stated, the reason for formulating the SSE is to remove the dependence of the entropy metric on the number of histogram bins. However, common to all histogram-based analyses are boundary effects, especially when the number of bins is small. For example, if a descriptor has an intrinsic “preference” for certain numerical values (multiples of six for the number of aromatic bonds, for example), and if adding just one more bin leads to dividing the data close to the center of this preferred value, a change will then appear in the resulting SSE value that will not relate directly to the descriptor’s intrinsic information content. As mentioned, this phenomenon is generally not an issue if the number of bins is held constant over the entire analysis, and even when it does arise, it remains a relatively small effect, provided that the number of bins is initially chosen to be sufficiently large for the number of data points involved. Nevertheless, altering the number of bins can affect the assessment of the underlying information content. In Figure 6, we illustrate how the value of SSE changes with the number of bins selected for analysis. As Figure 6 demonstrates, boundary oscillations can also be seen, particularly with discrete valued descriptors, which are followed by a slow fall-off reflecting a finite amount of data. A peak (occurring near 225 bins for “b_rotN” in Figure 6, for example) occurs at the point where the data cannot be sampled on a finer grid without exposing the underlying granularity of the data. That is, the data have become spread out as much as possible.

[Figure 6: plot of SSE (vertical axis, 0 to 0.7) against the number of histogram bins (horizontal axis, 0 to 400).]

Figure 6 Changes in values of SSE for histograms with increasing numbers of bins. Descriptors are abbreviated as in Figures 4 and 5.

It would be tempting to define another (and even more bin number independent) metric as the peak SSE value. However, that peak occurs at different values depending on both the dataset and the design of the chemical descriptor. For example, an information-rich but sufficiently narrow valued descriptor might require the number of bins to be on the order of half of the number of data points before that peak is reached. Therefore, the factors of dependence would actually increase. Although SSE is not truly a bin number-independent metric, its values can always be compared for a constant number of bins, and its value can be approximated for studies using different numbers of bins.
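The dependence of SSE on the number of bins can be explored with a small scan. The sketch below uses a synthetic integer-valued "descriptor" (value k occurring 13 − k times for k = 0 to 12), which is purely an assumption for illustration and does not reproduce the data behind Figure 6; the function names are likewise hypothetical.

```python
import math


def equal_width_counts(values, n_bins):
    """Equal-width histogram between the minimum and maximum of the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values identical; histogram range is empty")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def scaled_entropy(counts):
    """SSE of Eq. [3], computed from bin counts."""
    total = sum(counts)
    se = sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)
    return se / math.log2(len(counts))


# Synthetic discrete descriptor: value k (0..12) occurs 13 - k times
values = [k for k in range(13) for _ in range(13 - k)]

# Scan SSE over a range of bin numbers
sse_by_bins = {n: scaled_entropy(equal_width_counts(values, n))
               for n in range(2, 41)}
for n in (4, 8, 13, 26, 40):
    print(n, round(sse_by_bins[n], 3))
```

Once every distinct value occupies its own bin (13 bins here), adding bins only adds empty ones: SE stays fixed while log2 N grows, so SSE declines, which is a toy version of the fall-off beyond the peak described above.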

EXTENSION OF SE ANALYSIS FOR PROFILING OF CHEMICAL LIBRARIES

One concern of those informaticians who assemble compound libraries is whether one database represents a more diverse set than another. For example, one might ask questions such as the following: How much additional chemical diversity could be expected if the size of the current database were doubled by adding compounds from other sources? The SE metric, as a nonparametric indicator of variability or value spread for a particular compound set, is designed to be an independent metric and is not suitable to address such questions. So, although it is reasonable to note that one compound set has a higher SE or SSE for a specific descriptor than another, no statement can be made about the overlap between the two sets. Indeed, the two compound sets considered together might not produce a greater spread of values, and perhaps surprisingly, it is even possible to lower the aggregate SE, or global information content, of a chemical library by adding compounds. This is because the probability histograms providing the basis for SE calculations are implicitly renormalized as more compounds are included in the set. It therefore becomes necessary to introduce a new metric for making new assemblages of compounds or for comparing two preexisting collections. This new metric is referred to as the “differential Shannon entropy” (DSE), defined in Eq. [5]. The DSE metric has a form that often occurs in statistics, which asks the question: Is the aggregate more than the combination of its parts?

$$\mathrm{DSE} = \mathrm{SE}_{AB} - \left(\frac{\mathrm{SE}_A + \mathrm{SE}_B}{2}\right) \qquad [5]$$

In Eq. [5], SE_AB is the Shannon entropy calculated from the aggregate of compound sets A and B, whereas SE_A and SE_B are the SE values for each of the two databases considered individually (of course, SSE values are typically used instead of SE). Therefore, DSE can be viewed as the increase or decrease in the overall descriptor variability due to complementary or synergistic information content of the individual databases involved.

Figure 7 depicts hypothetical distributions to underscore the situations where DSE values become significant. The case of negative DSE will occur whenever the spread of one distribution is enveloped by the other. In general, DSE reflects the growth of the resulting renormalized distribution envelope of descriptor values. Importantly, DSE analysis permits the identification of descriptors that are sensitive to systematic differences in the properties of various compound databases or classes.

Consider a sample calculation. The SE value for the calculated logP(o/w) descriptor from the ACD collection is 1.329 (SSE: 0.308), that from the MDDR is 1.345 (SSE: 0.311), and when the two compound sets are combined into one histogram, the value becomes 1.343 (SSE: 0.311). Therefore, the resulting DSE value between the ACD and MDDR databases is 0.006 [scaled DSE (SDSE): 0.002]. From this we can conclude that there is a small information gain with respect to logP(o/w) when combining the two compound collections. Thus, there is a detectable difference between the two compound sets in the overall logP distribution at the given bin resolution. The same calculation using the ZINC compound database (containing over 2 million lead- and drug-like compounds at this writing)40 gives an SE of 1.180 (SSE: 0.273), whereas ZINC and MDDR together give 1.243 (SSE: 0.288). This provides a ZINC and MDDR DSE of −0.238 (SDSE: −0.055). These results are again not unexpected. Because the ZINC database is assembled from the catalogs of pharmaceutical compound vendors, we would expect to find in it the majority of compounds from the smaller MDDR dataset. Consequently, the MDDR logP(o/w) distribution is duplicated by the set of ZINC compounds, leading to an overall reduction in the per compound information content.
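The ACD and MDDR arithmetic above follows directly from Eq. [5], as the sketch below shows. The function names are illustrative assumptions, and the two toy histogram pairs at the end are hypothetical; they are included only to show one case with a positive and one with a negative DSE, and both sets in each pair are assumed to share the same binning.

```python
import math


def shannon_entropy(counts):
    """Eq. [1] evaluated from bin counts (empty bins skipped)."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def differential_shannon_entropy(se_ab, se_a, se_b):
    """Eq. [5]: aggregate entropy minus the mean of the individual entropies."""
    return se_ab - (se_a + se_b) / 2.0


def dse_from_counts(counts_a, counts_b):
    """DSE for two histograms that were built on identical bins."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both histograms must use the same bins")
    combined = [a + b for a, b in zip(counts_a, counts_b)]
    return differential_shannon_entropy(shannon_entropy(combined),
                                        shannon_entropy(counts_a),
                                        shannon_entropy(counts_b))


# SE values for logP(o/w) quoted in the text: ACD, MDDR, and the combined set
dse_acd_mddr = differential_shannon_entropy(1.343, 1.329, 1.345)
print(round(dse_acd_mddr, 3))  # 0.006, as in the text

# Hypothetical toy cases
dse_disjoint = dse_from_counts([50, 50, 0, 0], [0, 0, 50, 50])   # no overlap
dse_swamped = dse_from_counts([0, 1000, 0, 0], [25, 25, 25, 25])  # small broad set in a large narrow one
print(dse_disjoint, dse_swamped)
```

In the first toy case the combined histogram is broader than either part and DSE is positive; in the second, the large narrow set dominates the renormalized histogram and DSE is negative.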

What sort of DSE values would be associated with a significant difference between two compound sets? This question was answered by systematically comparing 143 descriptors among four databases representing pharmaceutical, general synthetic organic, and natural product compounds. It was concluded that SDSE values in excess of 0.026 represent a large difference between the distributions; i.e., they are “high-DSE.”12

A combination of SE and DSE analysis can be used to separate descriptorshaving little information content in one or both databases from those that arevariable but have different value ranges in the compared databases. Thus, DSEcalculations extend SE analysis by accounting for the range-dependence of

0

0.1

0.08

0.06

0.04

0.02cou

nt

freq

uen

cy SSE = 0.810.1

0.08

0.06

0.04

0.02

0

SSE = 0.66

0.02

0.1

0.08

0.06

0.04

0

SSE = 0.88 SDSE = 0.15

+

0

0.02

0.04

0.06

0.08

0.1

cou

nt

freq

uen

cy SSE = 0.81

0

0.1

0.08

0.06

0.04

0.02

SSE = 0.72 SDSE = –0.02

0

0.1

0.08

0.06

0.04

0.02

SSE = 0.66

+

Figure 7 Model DSE calculations. SSE and scaled DSE (SDSE) values are reported.SDSE values are analogous to the relation between SE and SSE, and they are producedby dividing the DSE by the number of bins taken to logarithm of base 2.


descriptor values. Combining SE and DSE calculations has led to the SE–DSEmetric12 that can classify descriptors based on database comparisons into fourSE–DSE categories: ‘‘high–high,’’ ‘‘high–low,’’ ‘‘low–high,’’ and ‘‘low–low,’’where, e.g., ‘‘high–high’’ means high SE and high DSE. Systematic analysishas revealed that descriptors belonging to the high–high SE–DSE categoryare relatively rare. Mainly complex descriptors, such as the previously men-tioned descriptors developed by Labute,38 were found to belong to this cate-gory. The relative scarcity of the high–high category is intuitive, because itrequires information-rich descriptors (which by definition occupy a broad dis-tribution in the histogram) to also have their values be more or less ‘‘self-avoid-ing’’ between the two databases being compared. Nonetheless, such descriptorscan be found when profiling and comparing different databases, and these aresought after by scientists for many applications, because they have consistentlyhigh information content and because they respond to systematic property dif-ferences between databases. Descriptors belonging to the low–high categoryare more intuitive to comprehend, as they are characterized by narrow rangesof descriptor values in each database combined with a significant difference inthe ranges they adopt between the databases. The low–high situation could alsoindicate general differences between compound collections, although these dif-ferences are in principle more difficult to relate to statistical significance usingdescriptors having low information content. For example, when comparing dif-ferences between the ACD and the Chapman and Hall (CH)41 natural productsdatabase, several simple atomic count descriptors (most notably for halogensand sulphur) were found to have generally low information content, but highSDSE values � 0:03.12 This observation can be rationalized because elementslike halogens and sulphur rarely occur in natural molecules. 
Furthermore, although descriptors of the low–low category are clearly not useful for comparing databases, descriptors belonging to the high–low category could be of value because they have high information content but do not measurably respond to compound class-specific features. High–low descriptors are also found frequently when comparing various databases.12 Descriptors in this category include different levels of complexity, ranging from very simple constructions such as the “number of hydrogen atoms” or “number of rotatable bonds” to complex formulations, as mentioned above. Descriptors of the high–low SE–DSE category are preferred for applications like similarity searching across different databases. In summary, the classification of property descriptors based on information content, taking into account value range differences, helps greatly to prioritize descriptors for specific applications. For example, such descriptor selection schemes have proved to be useful when systematically comparing compounds from synthetic versus natural sources and when modeling physical properties, as described below.
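The four-way SE–DSE classification described above can be sketched in a few lines. This is only an illustration: the function name, the two cut-off values, and the example (SSE, SDSE) pairs are hypothetical and are not taken from the studies cited here.

```python
def se_dse_category(sse, sdse, sse_cut=0.5, sdse_cut=0.03):
    """Label a descriptor by its scaled SE and scaled DSE values.

    Both cut-offs are hypothetical and would need tuning for real data.
    """
    se_label = "high" if sse >= sse_cut else "low"
    dse_label = "high" if sdse >= sdse_cut else "low"
    return f"{se_label}-{dse_label}"


# Made-up (SSE, SDSE) pairs for four placeholder descriptors.
examples = {
    "descriptor_1": (0.80, 0.10),
    "descriptor_2": (0.75, 0.01),
    "descriptor_3": (0.20, 0.08),
    "descriptor_4": (0.10, 0.00),
}
categories = {name: se_dse_category(*pair) for name, pair in examples.items()}
```

In such a scheme, descriptors labeled "high-low" would be candidates for cross-database similarity searching, whereas "high-high" and "low-high" descriptors would be candidates for distinguishing compound sets.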

Establishing a metric that provides an intuitive measure of the graphical separation between value distributions of two databases being compared is also useful. For this purpose, the “Entropic Separation” (ES)11 was defined (Eq. [6]). The ES is the bin distance between the most populated bins or statistical modes (“M” in Eq. [6]) of the comparison histogram divided by half of the average of the individual SE values of the two distributions. For example, if for one database the molecular weight histogram had its most populated bin at bin number 13, and the database to which the first was being compared had its most populated bin at bin number 27 (with all histogram parameters held constant), the intermode bin distance, or |M_A − M_B|, would be 14.

$$\mathrm{ES} = \frac{|M_A - M_B|}{\frac{1}{2}\left(\frac{\mathrm{SE}_A + \mathrm{SE}_B}{2}\right)} \qquad [6]$$

As the ES is scaled to the information content of the descriptor in the compared databases, a descriptor with a broader distribution (higher average SE) must have a greater peak separation in order to achieve the same level of ES as another descriptor. The ES is therefore an entropic (and nonparametric) analog of the classical statistical phrase “to be separated by so many sigma.” This measure is related to, yet distinct from, DSE. Figure 8 illustrates the application of the ES metric to a pair of hypothetical data distributions.
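As a minimal sketch of Eq. [6], the following Python code computes ES from two histograms that share the same binning. The function names and the toy bin counts are illustrative assumptions; SE is taken in bits (base-2 logarithm), consistent with the scaling described for SSE and SDSE.

```python
import numpy as np


def shannon_entropy(counts):
    """Shannon entropy (bits) of a histogram given as bin counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def entropic_separation(counts_a, counts_b):
    """ES per Eq. [6]: mode distance over half the mean SE of the two histograms."""
    mode_distance = abs(int(np.argmax(counts_a)) - int(np.argmax(counts_b)))
    mean_se = 0.5 * (shannon_entropy(counts_a) + shannon_entropy(counts_b))
    if mean_se == 0.0:
        raise ValueError("ES is undefined when both histograms have zero entropy")
    return mode_distance / (0.5 * mean_se)


# Toy histograms on a shared 40-bin grid (values are illustrative only).
n_bins = 40
counts_a = np.zeros(n_bins)
counts_b = np.zeros(n_bins)
counts_a[[12, 13, 14]] = [1, 2, 1]   # mode at bin 13
counts_b[[26, 27, 28]] = [1, 2, 1]   # mode at bin 27

es = entropic_separation(counts_a, counts_b)
```

Each toy histogram has SE = 1.5 bits, so the intermode distance of 14 bins gives ES = 14 / 0.75, or about 18.7. Broadening either distribution while keeping the modes fixed raises the mean SE and therefore lowers ES.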

INFORMATION CONTENT OF ORGANIC MOLECULES

Among the Shannon entropy-related investigations referred to in the introductory sections, studies by Graham et al. on the information content of organic molecules are interesting to consider relative to our own work. This is because, although we have focused on the analysis of descriptors to estimate chemical information content, Graham et al. have chosen to study information content associated with organic molecules more or less “directly.” In qualitative terms, alkanes and aromatic molecules, which consist of only carbon and hydrogen atoms, have less information content than do, for example, halogen-substituted forms of these molecules.15 A key feature of the approach of Graham et al. is that conventional molecular graphs are directly examined for information content, not through descriptors as in our more chemoinformatics-oriented approach. Possible applications of Graham’s approach include, for example, the study of interactions between organic compounds and solvent molecules, comparison of different tautomeric or ionized forms of organic molecules, or the correlation between information content within a compound series and relative potencies.

[Figure 8 plot: two hypothetical histograms, A and B; vertical axis, frequency counts (0 to 0.06); horizontal axis ticks at 5, 10, 15, 20, and 25; the mode separation |M_A − M_B| is marked; DSE = 2.73, ES = 13.13.]

Figure 8 Model ES calculation and comparison with DSE. Two hypothetical data distributions are used to calculate ES and DSE values. The ES value can be thought of as a peak separation given in SE units.

Graph-centric Shannon entropy-based information content analysis has been elegantly facilitated by Graham et al. through implementation of a Brownian processing model that corresponds to a random (yet systematic) walk through a molecular graph representation.17 This Brownian processing approach has recently been further extended to incorporate molecule aggregation and solvation effects,42 thereby linking molecular information to communication between molecules. Brownian processing, as applied to information content analysis, is based on extracting three-component atom-bond-atom units from conventional molecular graphs. For organic molecules, typical examples would be (C–C), (C–H), (C–O), (C=O), etc. In serial Brownian processing, a molecular graph is accessed, a “code unit” is selected, one of its nearest neighbors is randomly chosen, then a neighbor of the latter unit is selected, and so forth. In parallel processing, several nearest neighbors are selected randomly for each unit. Selected units are then used to generate strings, an example of which for benzene is (C–C) (C–H) (C=C) (C–H) (C–C) (C–H) (C=C) (C–H) ... These strings contain multiple copies of code units and create a serial message (or “tape recording”) for a given molecule. To process molecular aggregates, such tape recordings can be combined for interacting groups of molecules, either sequentially or in parallel.42

Recording the code units and their relative frequency of occurrence (in a serial message) in a histogram or table format permits application of the Shannon entropy formula, where relative frequencies of code units become their probabilities. The calculated entropy is then equivalent to the average number of bits per code unit required to encode the observed distribution of units; the larger the number of bits, the higher the information content. For tape recordings of similar size, the information content of molecules or aggregate states can then be compared.
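The serial processing scheme can be illustrated with a short sketch. It is one simplified reading of the description above, not the implementation of Graham et al.: every bond of Kekulé benzene is treated as one atom-bond-atom code unit, two units are taken to be neighbors when they share an atom, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def benzene_code_units():
    """Return (labels, neighbors) for Kekule benzene; each bond is one code unit."""
    bonds = []  # (atom_i, atom_j, label)
    for i in range(6):
        ring_label = "(C=C)" if i % 2 == 0 else "(C-C)"
        bonds.append((f"C{i}", f"C{(i + 1) % 6}", ring_label))
        bonds.append((f"C{i}", f"H{i}", "(C-H)"))
    labels = [label for _, _, label in bonds]
    # Two code units are neighbors when their bonds share an atom.
    neighbors = {
        u: [v for v in range(len(bonds))
            if v != u and set(bonds[u][:2]) & set(bonds[v][:2])]
        for u in range(len(bonds))
    }
    return labels, neighbors


def serial_brownian_tape(labels, neighbors, n_steps, seed=0):
    """Random walk over code units; returns the recorded string of unit labels."""
    rng = random.Random(seed)
    unit = rng.randrange(len(labels))
    tape = [labels[unit]]
    for _ in range(n_steps - 1):
        unit = rng.choice(neighbors[unit])
        tape.append(labels[unit])
    return tape


def tape_entropy(tape):
    """Shannon entropy (bits per code unit) of the label frequencies in a tape."""
    n = len(tape)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tape).values())


labels, neighbors = benzene_code_units()
tape = serial_brownian_tape(labels, neighbors, n_steps=1000)
entropy_bits = tape_entropy(tape)
```

With only three distinct code units, the entropy of a benzene tape cannot exceed log2(3), or about 1.58 bits per unit; introducing additional unit types, for example through halogen substitution, raises this ceiling.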

SHANNON ENTROPY IN QUANTUM MECHANICS, MOLECULAR DYNAMICS, AND MODELING

Calculated electron density distributions can be conveniently studied using SE analysis, which has led to applications in quantum mechanics.7,8


For example, when electron densities are recorded along a reaction coordinate, regions where densities peak correspond to low entropy areas, whereas intermediate regions are characterized by high entropy. For the study of charge densities, electron distributions around functional groups, or for the interpretation of ab initio wave functions, SE analysis and the concept of “local” Shannon entropy are relevant.8 Local SE values, based on the partitioning of charge densities of functional groups, have been used as a measure of group similarity,8 and SE values calculated for various groups from Hartree–Fock wave functions have been correlated with changes in molecular geometry.43

Moreover, orbital models representing probabilities of electron distributions over restricted spaces are well suited for SE analysis. The formulation of orbital Shannon entropy has been achieved where electron density is normalized with respect to orbital occupation numbers.44 In this context, the Jaynes and Shannon entropy formalisms were compared, and the Jaynes entropy was rationalized as representing the difference between the mean orbital SE per electron and the mean orbital SE of a particular electron.44 As with calculated electron densities, entropy calculations have also been applied to experimental densities in order to aid in the refinement of crystallographic phases.45

Using the maximum entropy concept, the entropy of the electron density in a binned unit cell was calculated relative to the average electron density.45
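The idea of an entropy taken relative to the average density can be illustrated with a toy calculation. The exact functional used in Ref. 45 is not reproduced in this chapter; the sketch below assumes the common relative-entropy form, S = −Σ p_i ln(p_i / m), with m the mean bin probability, and uses a one-dimensional grid as a stand-in for a binned unit cell. All names and values are hypothetical.

```python
import numpy as np


def entropy_relative_to_mean(density):
    """Return -sum(p_i * ln(p_i / m)), where m = 1/N is the mean bin probability."""
    rho = np.asarray(density, dtype=float)
    p = rho / rho.sum()          # normalize the binned density
    m = 1.0 / p.size             # average (flat) density on the same grid
    nz = p > 0                   # empty bins contribute nothing
    return float(-(p[nz] * np.log(p[nz] / m)).sum())


grid = np.linspace(-3.0, 3.0, 61)
flat = np.ones_like(grid)                     # featureless density
peaked = np.exp(-grid**2 / (2 * 0.3**2))      # density concentrated near the origin

s_flat = entropy_relative_to_mean(flat)
s_peaked = entropy_relative_to_mean(peaked)
```

Under this assumed form, a flat density scores zero (the maximum), and any concentration of density into peaks lowers the value, which matches the qualitative statement that regions of peaked density are low-entropy regions.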

In addition to its use in quantum mechanics for the past 20 years, SE analysis has more recently been applied to molecular dynamics simulations and conformational analysis. An algorithm has been developed to calculate SE values from dynamics trajectories, and it was shown that entropies of conformational energies of test molecules correlated linearly with their experimental thermodynamic entropies.46 Using 2-D lattices and simplified (two-state; i.e., hydrophilic–hydrophobic) protein chain representations, SE values for energy distributions produced by different pair-wise interactions were calculated. Potentials leading to their discrimination on the basis of differences in information content were developed.47

EXAMPLES OF SE AND DSE ANALYSIS

A key question is as follows: Can SE and DSE, as an information theoretic approach to descriptor comparison and selection, be applied to accurately classify compounds or to model physicochemical properties? To answer this question, two conceptually different applications of SE and DSE analysis will be discussed here and related to other studies. The first application explores systematic differences between compound sets from synthetic and natural sources.48 The second addresses the problem of rational descriptor selection to predict the aqueous solubility of synthetic compounds.49 For these purposes, SE or DSE analysis was carried out, and in both cases, selected descriptors were used to build binary QSAR-like classification models.50


A common assertion among medicinal chemists and library designers is that compounds from natural sources are difficult to work with. When asked what exactly complicates working with natural products, a typical response involves the complexity of the molecules or features that make synthesis difficult. Studies systematically comparing natural and synthetic compounds are rare. Only fairly recently has a direct statistical analysis been carried out on structural and property differences between natural and synthetic molecules.51 Examples include differences in the distributions of nitrogen- or oxygen-containing groups and of halogen atoms. Halogen atoms and amide groups occur more frequently in synthetic molecules, whereas natural compounds typically have higher oxygen abundance (e.g., in ester or alcohol groups). SE was used to compare two sizable collections of compounds, one of synthetic origin and the other of natural origin. The following question was asked: Which chemical descriptors are most likely to contain the information needed to systematically distinguish between natural and synthetic molecules? It should be emphasized that making use of a variety of chemical descriptors for such an investigation allows any level of abstraction of chemical information to enter the analysis and is therefore more general than a statistical fragment or substructure comparison.

To answer this question, the ACD33 database for synthetic compounds and the CH41 database for natural products were chosen. The MOE software platform36 was used to calculate 98 chemical 2-D descriptors for 199,240 ACD compounds and for 116,364 CH compounds. Also included were several implicit 3-D descriptors that map properties on molecular surface areas approximated from 2-D representations of structures.38 SE values were calculated for all 98 descriptors and both databases. The ACD SE for each descriptor was plotted along one axis and the CH SE for each descriptor along the other, as shown in Figure 9. SE points were seen to fall into three broad classes: (1) those with low SE values in both databases, (2) those with high SE in both, and (3) “off-diagonal” points with intermediate SE values in either or both databases. The entropic separations (ES) of all 98 descriptors were calculated, and it was found that ten descriptors (Table 1) in the off-diagonal and high SE regions produced the highest ES values.48

The highest ES descriptors reflect some known differences between synthetic and natural molecules, including, for example, the degree of saturation or aromatic character. It is also interesting to note that the descriptor with the highest ES value, “a_ICM,” is itself calculated using entropic principles. It accounts for the entropy of the distribution of the elemental composition of the compound.

Based on this SE and ES analysis, four sets of descriptors were tested in binary QSAR models. The four sets of descriptors consisted of (1) 7 descriptors with intermediate SE values in both databases, (2) 11 descriptors with low SE values in both databases, (3) 8 descriptors with high SE values in both databases, and (4) 8 descriptors with the highest ES values in Table 1.
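Selecting a descriptor set by ES rank is a simple sort. The sketch below uses the ES values reported in Table 1 and picks the eight highest; that this reproduces the exact membership of descriptor set (4) is an assumption, since the text only states that the set holds the 8 descriptors with the highest ES values in the table.

```python
# ES values as reported in Table 1 (descriptor name -> entropic separation).
table1_es = {
    "a_ICM": 8.14, "bpol": 5.08, "chi0v_C": 4.68, "b_double": 4.52,
    "chi1v": 4.42, "a_nH": 4.08, "b_single": 3.93, "b_ar": 3.86,
    "vsa_hyd": 3.84, "apol": 3.83,
}


def top_by_es(es_values, k):
    """Return the k descriptor names with the largest ES, highest first."""
    return sorted(es_values, key=es_values.get, reverse=True)[:k]


group4 = top_by_es(table1_es, k=8)
```

The same helper applies unchanged to any other per-descriptor score, such as DSE, when building ranked descriptor sets of different sizes.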

[Figure 9 plot: scatter plot of descriptor SE values; vertical axis, SE (ACD); horizontal axis, SE (CH); both axes run from 0 to 6; regions A and B are marked; labeled points: PEOE_VSA+4, a_ICM, chi1v, PEOE_VSA+1, KierA3, vsa_pol.]

Figure 9 Shannon entropy comparison. SE values of descriptors calculated for ACD compounds are plotted against corresponding values for the CH database. Region (A) includes descriptors with the highest SE, and region (B) those with the lowest SE. “Off-diagonal” descriptors have the greatest difference in variability between the two databases.

Table 1 Entropic Separation of Descriptors in Two Databases

Descriptor     Entropic Separation (ES)     SE (CH/ACD)
a_ICM(a)       8.14                         5.4/5.3
bpol(b)        5.08                         4.8/3.1
chi0v_C(c)     4.68                         4.9/3.6
b_double(d)    4.52                         2.9/2.5
chi1v(e)       4.42                         4.8/2.4
a_nH(f)        4.08                         4.7/3.2
b_single(g)    3.93                         4.9/3.2
b_ar(h)        3.86                         2.2/3.0
vsa_hyd(i)     3.84                         4.8/3.5
apol(j)        3.83                         4.8/3.6

(a) “a_ICM,” compositional entropy descriptor;
(b) “bpol,” normalized atomic polarizability;
(c) “chi0v_C,” carbon valence connectivity index (order 0);
(d) “b_double,” number of double bonds in a molecule;
(e) “chi1v,” atomic valence connectivity index (order 1);
(f) “a_nH,” number of hydrogen atoms;
(g) “b_single,” number of single bonds;
(h) “b_ar,” number of aromatic bonds;
(i) “vsa_hyd,” approximate hydrophobic van der Waals surface area;
(j) “apol,” atomic polarizability.


Binary QSAR employs Bayesian statistics to correlate a selected set of properties with a probability for each molecule to belong to one of two states. As it was originally conceived,50 these states are assigned as “active” = 1 and “inactive” = 0. For our purposes, they instead acquired the meaning of “natural product” and “synthetic compound.” The binary QSAR method is designed to make use of a particular set of descriptors as input. Their calculated values are subjected to principal component analysis and processed to produce a probability density function to which a cut-off value is assigned in order to place each result into one of the two result states. A random set of 500 compounds (composed of equal numbers from ACD and CH) was used as a training set, and the results were tested against a same-sized test set. The prediction accuracy was calculated as the number of correctly identified natural products plus the number of correctly identified synthetic compounds, divided by the total number of compounds and expressed as a percentage. When this protocol was applied to six different random training and test sets, the results were unequivocal: tests done with the low SE descriptor set (group 2) performed the worst, returning nearly random results in the range of 53% (random performance being 50%). Tests using the highest ES descriptors (group 4) performed the best, at 91% prediction accuracy. Group 3, consisting of the high SE descriptors (without considering ES), returned a favorable, but not the best, prediction accuracy of 85%. The descriptor set composed of intermediate-valued SE descriptors (group 1) had an intermediate prediction accuracy of 68%.
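The workflow just described (principal component analysis, a class probability, a cut-off, and the accuracy formula) can be mimicked with a small stand-in. This is not the binary QSAR method of Ref. 50 or its MOE implementation: the sketch swaps in one Gaussian per class and principal component as a simplified density estimate, and it runs on synthetic data. All function names and data are hypothetical.

```python
import numpy as np


def prediction_accuracy(y_true, y_pred):
    """Percent of molecules assigned to the correct one of two states."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    correct_1 = np.sum((y_true == 1) & (y_pred == 1))
    correct_0 = np.sum((y_true == 0) & (y_pred == 0))
    return 100.0 * (correct_1 + correct_0) / y_true.size


def fit_toy_model(X, y, n_components=2):
    """PCA on standardized descriptors, then one Gaussian per class and component."""
    mean, scale = X.mean(axis=0), X.std(axis=0) + 1e-12
    _, _, vt = np.linalg.svd((X - mean) / scale, full_matrices=False)
    axes = vt[:n_components]
    scores = ((X - mean) / scale) @ axes.T
    stats = {c: (scores[y == c].mean(axis=0),
                 scores[y == c].var(axis=0) + 1e-9,
                 np.mean(y == c)) for c in (0, 1)}
    return mean, scale, axes, stats


def predict(model, X, cutoff=0.5):
    """Assign state 1 when its posterior probability reaches the cut-off."""
    mean, scale, axes, stats = model
    scores = ((X - mean) / scale) @ axes.T
    log_post = []
    for c in (0, 1):
        mu, var, prior = stats[c]
        log_like = -0.5 * (np.log(2 * np.pi * var) + (scores - mu) ** 2 / var).sum(axis=1)
        log_post.append(log_like + np.log(prior))
    log_post = np.array(log_post)
    log_post -= log_post.max(axis=0)      # stabilize before exponentiating
    post = np.exp(log_post)
    return (post[1] / post.sum(axis=0) >= cutoff).astype(int)


def make_set(rng, n_per_class=250, n_descriptors=4):
    """Synthetic stand-in data: two Gaussian clouds with equal class sizes."""
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, n_descriptors)),
                   rng.normal(2.0, 1.0, (n_per_class, n_descriptors))])
    y = np.repeat([0, 1], n_per_class)
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_set(rng)      # 500 training "compounds", 250 per state
X_test, y_test = make_set(rng)        # same-sized test set
model = fit_toy_model(X_train, y_train)
accuracy = prediction_accuracy(y_test, predict(model, X_test))
```

Repeating the last five lines with different seeds corresponds to the repeated random training and test sets used in the study; a descriptor set that carries no class information would leave `accuracy` near 50%.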

Two conclusions can be derived from these results. First, it is feasible to use entropy-based information theory to select fewer than 10 chemical descriptors that can systematically distinguish between compounds from different sources. Second, when selecting descriptors to distinguish between compounds, it is important that these descriptors have high information content that can support separability or differentiate compounds between the datasets. The power of the entropic separation revealed in this analysis gave rise to the development of the DSE and, ultimately, the SE–DSE metric, as described earlier.

Another example focuses on the use of DSE analysis to model chemical properties, such as the aqueous solubility of compounds.49 Aqueous solubility provides an example of a physicochemical property that can be addressed at the level of structurally derived chemical descriptors. Because the aqueous solubility of many compounds is known, an accurate and sufficiently large dataset can be accumulated for constructing and evaluating predictive models. In addition, problems surrounding solubility remain a significant issue for lead identification and optimization in pharmaceutical research.52,53

An important goal of studying chemical descriptors for their ability to predict aqueous solubility was to provide a rational alternative to the intuitive bias that tended to dominate the descriptor selection in this area of research.53–55 Many scientists had included in their studies descriptors that are based on chemical intuition, such as logP(o/w) and related descriptors that address, e.g., hydrogen bonding, and hydrophobic or solvent-accessible surface areas. However, further studies have shown that the addition of descriptor-based topological and electronic molecular information is as important as these intuitive sets.56,57

We now ask whether an entropy-based approach can be used to identify descriptors that accurately predict aqueous solubility (as an example of a relevant physicochemical property). To address different solubility threshold values from an experimental dataset, compounds were divided into “soluble” and “insoluble” subsets. The descriptors chosen as the information source input for a binary QSAR model were selected exclusively by DSE analysis that was performed with the number of histogram bins consistently held at 25. An experimental database of 650 molecules with known solubility (expressed as logS values, where S is the aqueous solubility in mol/L) was gleaned from the literature54–57,59 and confirmed in the PHYSPROP database;60 all values selected were for a constant temperature (25 ± 1 °C). These 650 compounds were divided into a training set of 550 molecules and a test set of the remaining 100 molecules to cover equivalent solubility ranges. Five solubility threshold levels were established: 1 mM, 5 mM, 10 mM, 50 mM, and 100 mM. These levels were chosen based on the ranges seen for many drugs60 and because the middle threshold (10 mM) was a minimal solubility acceptable in most screening assays.52 DSE values were calculated independently for each of five paired datasets corresponding to the five threshold values for a total of 148 2-D descriptors.36 For each of the five threshold dataset pairs, six binary QSAR models were generated using the top 5, 10, 15, 20, 25, and 30 descriptors ranked by DSE (highest to lowest, Table 2). Prediction accuracy was calculated as the number of correctly identified soluble molecules plus the number of correctly identified insoluble molecules, divided by the total number of molecules and expressed as a percentage.
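The selection protocol can be sketched as follows. Two points are assumptions rather than statements from this section: the DSE is taken to be the SE of the pooled sets minus the mean SE of the two individual sets, and a molecule is counted as “soluble” when its logS is at or above the threshold. The SDSE scaling follows the Figure 7 caption. Descriptor names and data are synthetic.

```python
import math
import numpy as np


def shannon_entropy(values, edges):
    """SE (bits) of values histogrammed on fixed bin edges."""
    counts, _ = np.histogram(values, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def dse(values_a, values_b, n_bins=25):
    """ASSUMED form: SE of the pooled sets minus the mean SE of the two sets."""
    pooled = np.concatenate([values_a, values_b])
    edges = np.linspace(pooled.min(), pooled.max(), n_bins + 1)
    mean_se = 0.5 * (shannon_entropy(values_a, edges) + shannon_entropy(values_b, edges))
    return shannon_entropy(pooled, edges) - mean_se


def sdse(values_a, values_b, n_bins=25):
    """Scaled DSE: DSE divided by log2 of the number of bins."""
    return dse(values_a, values_b, n_bins) / math.log2(n_bins)


def log_s_cutoff(threshold_mM):
    """Convert a solubility threshold in mM to logS with S in mol/L."""
    return math.log10(threshold_mM * 1e-3)


def rank_by_dse(descriptors, log_s, threshold_mM, n_bins=25):
    """Split at the threshold (both subsets must be non-empty) and rank by DSE."""
    soluble = log_s >= log_s_cutoff(threshold_mM)
    scores = {name: dse(v[soluble], v[~soluble], n_bins) for name, v in descriptors.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Synthetic stand-in for a 650-molecule solubility set.
rng = np.random.default_rng(1)
log_s = rng.uniform(-6.0, 0.0, 650)
descriptors = {
    "tracks_solubility": -log_s + rng.normal(0.0, 0.3, 650),  # hypothetical descriptor
    "noise": rng.normal(0.0, 1.0, 650),                       # hypothetical descriptor
}
ranking, scores = rank_by_dse(descriptors, log_s, threshold_mM=10)
```

The five thresholds of 1, 5, 10, 50, and 100 mM correspond to logS cut-offs of −3.0, about −2.3, −2.0, about −1.3, and −1.0. Calling `rank_by_dse` once per threshold and truncating each ranking to the top 5 to 30 names would give descriptor sets of the kind used to build the models.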

With the exception of the 100-mM threshold set, the best prediction accuracy (88% on average) was achieved when using only the five descriptors with the highest DSE values. The 100-mM set did better (92%) with the 20 descriptors having the highest DSE values. The descriptors producing the highest accuracy are logP, hydrophobic van der Waals surfaces, hydrophobic atom counts, and three complex descriptors approximating polar, charged, and hydrophobic surface areas. Note that descriptors providing information about hydrogen bonding or partial charges were not needed to produce the best results. One of the most significant findings of this study is that only very few descriptors are required to predict aqueous solubility with high accuracy. This is consistent with the findings of Jorgensen and Duffy61 whose Monte Carlo simulations identified 11 descriptors and, with only five terms in a subsequent QSPR, achieved high prediction accuracy.

This DSE analysis of aqueous solubility confirms that information theoretic analysis can be used to successfully select features for modeling of physicochemical properties. A genetic algorithm implementation of the SE and DSE formalisms by Wegner and Zell has also been applied to select descriptors for neural network prediction of aqueous solubility and logP(o/w) values.13 Significant correlation coefficients of 0.9 were obtained. In these neural network studies, too, only a small number of information-rich descriptors were necessary for successful modeling. Shannon entropy-based analysis of Brownian processing of molecular graphs, as discussed above, has also been applied successfully to relate information content parameters of nicotinic receptor antagonists and beta-lactamase inhibitors with their potencies.42

Taken together, all these studies have confirmed that the Shannon entropy approach derived from digital communication theory can be adapted and extended for solving problems that have traditionally been treated using QSAR-type or machine learning methods. When applied to descriptor selection, information content analysis is complementary to both QSAR modeling and molecular similarity analysis. Finally, in addition to descriptor selection, the Shannon entropy concept has also been employed by Clark in descriptor design.62,63 In these studies, complex molecular shape descriptors have been generated that capture four local properties: electrostatic potential, ionization energy, electron affinity, and polarizability.62 Clark calculated local SE to quantify the distributions of these properties in different regions of the molecular surface, leading to the conclusion that low SE regions are preferred for mediating specific interactions.63

Table 2 Molecular Descriptors with Highest DSE Values in Solubility Predictions

Av DSE     Descriptor
0.558      SlogP(a)
0.554      a_hyd(b)
0.542      logP(o/w)(c)
0.542      PEOE_VSA_NEG(d)
0.526      PEOE_VSA-1(e)
0.494      SMR(f)
0.492      chi1v(g)
0.492      vsa_hyd(h)
0.482      mr(i)
0.472      chi0v(j)

NOTE: Reported are average DSE values (“Av DSE”) for the top 10 descriptors that were found to be most responsive to differences between “soluble” and “insoluble” compounds. DSE values were averaged over all five solubility threshold ranges. Data were taken from Ref. 49.
(a) “SlogP,” atomic contribution model of the logarithm of octanol/water partition coefficient;
(b) “a_hyd,” number of hydrophobic atoms based on pharmacophore atom typing;
(c) “logP(o/w),” logarithm of octanol/water partition coefficient based on a linear atom model;
(d) “PEOE_VSA_NEG,” approximate electronegative van der Waals surface area;
(e) “PEOE_VSA-1,” sum of van der Waals surface area for a partial charge range;
(f) “SMR,” molecular refractivity parameterized model;
(g) “chi1v,” atomic valence connectivity index (order 1);
(h) “vsa_hyd,” approximate hydrophobic van der Waals surface area;
(i) “mr,” molecular refractivity linear model;
(j) “chi0v,” atomic valence connectivity index (order 0).

CONCLUSIONS

The Shannon entropy concept has been adapted and extended for different types of applications in chemoinformatics and computational chemistry. This information-theoretic concept evaluates the information content of data distributions and thereby, within the chemoinformatics framework, provides a basis for estimating the information carrying capacity of chemical descriptors or the relative diversity of a compound library. It can also be applied to extract information from molecular graph representations. Extending the SE formalism to follow changes in distributions of values and introducing a value-range dependence gave rise to a differential form, called DSE, which identifies those chemical descriptors whose shifts most distinguish one compound set from another. The ensuing SE–DSE metric makes it possible to identify descriptors that have consistently high information content in databases and that are responsive to database- or compound class-specific features. Importantly, the application of such metrics permits large-scale property profiling of compound databases. Using these techniques, it is often possible to discern property differences that are too subtle, or too deeply buried in the mass of data associated with large compound sets, to be detected by other means. For descriptor selection, SE–DSE analysis provides a rational alternative to more intuitive selection schemes that have long dominated many applications in the QSAR arena. Results available thus far indicate that if descriptor selection can be rationalized, relatively few descriptors having high information content (SE) and suitable sensitivity (DSE) are usually sufficient for developing a successful application, for example, as a parameter set for QSAR.
As a future perspective, SE–DSE analysis can also be expected to aid in the discovery and generation of new chemical descriptors by identifying efficacious combinations of commonly used descriptors or by elucidating gaps where new types of chemical descriptors need to be advanced. A general feature of information content analysis, as described herein, is that it has low computational complexity and memory requirements. Thus, the approach can easily handle very large databases that nowadays often contain millions of compounds and are expected to grow further. For applications in chemistry, information content analysis should have significant scientific growth potential in a variety of areas, including theoretical organic and medicinal chemistry, chemoinformatics, quantum mechanics, and molecular dynamics simulations.
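The low memory requirement follows from the fact that only bin counts need to be kept, not the descriptor values themselves. A minimal sketch, assuming the value range (`lo`, `hi`) is fixed in advance and that out-of-range values are clipped into the end bins (both choices are illustrative, as are the function name and the synthetic data):

```python
import numpy as np


def streaming_shannon_entropy(chunks, lo, hi, n_bins=25):
    """SE (bits) of a descriptor read chunk by chunk; memory use is O(n_bins)."""
    edges = np.linspace(lo, hi, n_bins + 1)
    counts = np.zeros(n_bins, dtype=np.int64)
    for chunk in chunks:
        clipped = np.clip(np.asarray(chunk, dtype=float), lo, hi)
        counts += np.histogram(clipped, bins=edges)[0]
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Synthetic descriptor values standing in for a large database column.
rng = np.random.default_rng(0)
values = rng.normal(300.0, 80.0, 100_000)

se_stream = streaming_shannon_entropy(np.array_split(values, 50), 0.0, 1000.0)
se_single = streaming_shannon_entropy([values], 0.0, 1000.0)
```

Both calls accumulate identical bin counts, so the chunked and single-pass results agree; in practice the chunks could come from a file reader or database cursor, giving a single pass over the data per descriptor.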


REFERENCES

1. C. E. Shannon, Bell Syst. Tech. J., 27, 379 (1948). A Mathematical Theory of Communication.

2. C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, Illinois, 1963.

3. S. M. Dancoff and H. Quastler, in Essays on the Use of Information Theory in Biology, H. Quastler, Ed., University of Illinois Press, Urbana, Illinois, 1953, pp 263–273, The Information Content and Error Rate of Living Things.

4. D. Bonchev and N. Trinajstic, J. Chem. Phys., 67, 4517 (1977). Information Theory, Distance Matrix, and Molecular Branching.

5. D. Bonchev, Commun. Math. Chem., 7, 65 (1979). Information Indices for Atoms and Molecules.

6. S. H. Bertz, J. Am. Chem. Soc., 103, 3599 (1981). First General Index of Molecular Complexity.

7. S. R. Gadre, S. B. Sears, S. J. Chakravorty, and R. D. Bendale, Phys. Rev., A32, 2602 (1985). Some Novel Characteristics of Atomic Information Entropies.

8. M. Ho, V. H. Smith, Jr., D. F. Weaver, C. Gatti, R. P. Sagar, and R. O. Esquivel, J. Chem. Phys., 108, 5469 (1998). Molecular Similarity Based on Information Entropies and Distances.

9. J. W. Godden, F. L. Stahura, and J. Bajorath, J. Chem. Inf. Comput. Sci., 40, 796 (2000). Variability of Molecular Descriptors in Compound Databases Revealed by Shannon Entropy Calculations.

10. G. M. Maggiora and V. Shanmugasundaram, 219th American Chemical Society National Meeting. Division of Computers in Chemistry. Abstract No. 119 (2000). Similarity-based Shannon-like Diversity Measure.

11. J. W. Godden and J. Bajorath, J. Chem. Inf. Comput. Sci., 41, 1060 (2001). Differential Shannon Entropy as a Sensitive Measure of Differences in Database Variability of Molecular Descriptors.

12. J. W. Godden and J. Bajorath, J. Chem. Inf. Comput. Sci., 42, 87 (2002). Chemical Descriptors with Distinct Levels of Information Content and Varying Sensitivity to Differences Between Selected Compound Databases Identified by SE-DSE Analysis.

13. J. K. Wegner and A. Zell, J. Chem. Inf. Comput. Sci., 43, 1077 (2003). Prediction of Aqueous Solubility and Partition Coefficient Optimized by a Genetic Algorithm Based Descriptor Selection Method.

14. J. K. Wegner, H. Frohlich, and A. Zell, J. Chem. Inf. Comput. Sci., 44, 921 (2004). Feature Selection for Descriptor Based Classification Models.

15. D. J. Graham and D. Schacht, J. Chem. Inf. Comput. Sci., 40, 942 (2000). Base Information Content in Organic Molecular Formulae.

16. D. J. Graham, J. Chem. Inf. Comput. Sci., 42, 215 (2002). Information Content in Organic Molecules: Structure Considerations Based on Integer Statistics.

17. D. J. Graham, C. Malarkey, and M. V. Schulmerich, J. Chem. Inf. Comput. Sci., 44, 1601 (2004). Information Content in Organic Molecules: Quantification and Statistical Structure via Brownian Processing.

18. A. Mowshowitz, Bull. Math. Biophys., 30, 175 (1968). Entropy and the Complexity of Graphs: I. An Index of the Relative Complexity of a Graph.

19. J. Daly, Commun. Stat., Theory Methods, 17, 2921 (1988). The Construction of Optimal Histograms.

20. K. He and G. Meeden, J. Stat. Planning and Inference, 61, 49 (1997). Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach.

21. M. P. Wand, J. Am. Stat. Assoc., 85, 59 (1997). Data-based Choice of Histogram Bin Width.


22. L. Birge and Y. Rozenholc, (2002). How Many Bins Should Be Put in a Regular Histogram?Available: http://www.proba.jussieu.fr/mathdoc/textes/PMA-721.pdf.

23. D. W. Scott, Biometrika, 66, 605 (1979). On Optimal and Data-based Histograms.

24. Analytical Methods Committee, Basic Concepts, Analyst, 114, (Part 1), 1693–1697, (1989).Robust Statistics - How not to Reject Outliers.

25. D. B. Rorabacher,Anal. Chem., 83, 139 (1991). Statistical Treatment for Rejection of DeviantValues: Critical Values of DixonQ Parameter and Related Subrange Ratios at the 95 PercentConfidence Level.

26. F. Grubbs, in Technometrics, U.S. Army Aberdeen Research and Development Center,Aberdeen Proving Ground, Maryland, 1969, 11, 1, pp. 1–21, Procedures for DetectingOutlying Observations in Samples.

27. T. M. Cover and A. T. Joy, Elements of Information Theory, Wiley, New York, 1991.

28. E. T. Jaynes, Phys. Rev., 106, 620 (1957). Information Theory and Statistical Mechanics.

29. S.Kullback, InformationTheoryandStatistics,DoverPublications,Mineola,NewYork,1997.

30. L. Xue and J. Bajorath, Combin. Chem. High Throughput Screen., 3, 363 (2000). MolecularDescriptors in Chemoinformatics, Computational Combinatorial Chemistry, and VirtualScreening.

31. R. Todeschini and V. Consonni, inMethods and Principles in Medicinal Chemistry - Volume11 - Handbook of Molecular Descriptors, R. Mannhold, H. Kubinyi, and H. Timmerman,Eds., Wiley, New York, 2000.

32. J. W. Godden, L. Xue, D. B. Kitchen, F. L. Stahura, E. J. Schermerhorn, and J. Bajorath,J. Chem. Inf. Comput. Sci., 42, 885 (2002). Median Partitioning: A Novel Method for theSelection of Representative Subsets from Large Compound Pools.

33. Available Chemicals Directory (ACD), 2005, MDL Information Systems Inc., San Leandro,California. Available: www.mdl.com.

34. Molecular Drug Data Report (MDDR), 2005, MDL Information Systems Inc., San Leandro,California. Available: www.mdl.com.

35. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Deliver. Rev., 23, 3(1997). Experimental and Computational Approaches to Estimate Solubility and Perme-ability in Drug Discovery and Development Settings.

36. Molecular Operating Environment (MOE), 2005, Chemical Computing Group Inc., Mon-treal, Quebec, Canada, Available: www.chemcomp.com.

37. H. A. Sturges, J. Am. Stat. Assoc., 21, 65 (1926). The Choice of a Class Interval.

38. P. A. Labute, J. Mol. Graph. Model., 18, 464 (2000). A Widely Applicable Set of Descriptors.

39. L. H. Hall and L. B. Kier, in Reviews of Computational Chemistry Vol. 2, K. B. Lipkowitz, and D. B. Boyd, Eds., VCH Publishers, New York, 1991, pp. 367–422. The Molecular Connectivity Chi Indices and Kappa Shape Indices in Structure-Property Modeling.

40. J. J. Irwin and B. J. Shoichet, J. Chem. Inf. Model., 45, 177 (2005). Zinc – A Free Database of Commercially Available Compounds for Virtual Screening.

41. Chapman & Hall Dictionary of Natural Products (2005), CRC Press LLC, Boca Raton, Florida. Available: www.crcpress.com.

42. D. J. Graham, J. Chem. Inf. Model., 45, 1223 (2005). Information Content in Organic Molecules: Aggregation States and Solvent Effects.

43. M. Ho, R. P. Sagar, D. F. Weaver, and V. H. Smith, Jr., Int. J. Quantum Chem., 56, 109 (1995). An Investigation of the Dependence of Shannon Information Entropies and Distance Measures on Molecular Geometry.

44. R. P. Sagar, J. C. Ramirez, R. O. Esquivel, M. Ho, and V. H. Smith, Jr., J. Chem. Phys., 116, 9213 (2002). Relationships Between Jaynes Entropy of the One-Particle Density Matrix and Shannon Entropy of the Electron Densities.

45. T. Sato, Acta Cryst. A, 48, 842 (1992). Maximum Entropy Method: Phase Refinement.

46. L. Lorenzo and R. A. Mosquera, J. Comput. Chem., 24, 707 (2003). A Box-Counting-Based Algorithm for Computing Shannon Entropy in Molecular Dynamics Simulations.

47. T. Aynechi and I. D. Kuntz, Biophys. J., 89, 3008 (2005). An Information Theoretic Approach to Macromolecular Modeling II. Force Fields.

48. F. L. Stahura, J. W. Godden, and J. Bajorath, J. Chem. Inf. Comput. Sci., 40, 1245 (2000). Distinguishing Between Natural Products and Synthetic Molecules by Descriptor Shannon Entropy Analysis and Binary QSAR Calculations.

49. F. L. Stahura, J. W. Godden, and J. Bajorath, J. Chem. Inf. Comput. Sci., 42, 550 (2002). Differential Shannon Entropy Analysis Identifies Molecular Property Descriptors that Predict Aqueous Solubility of Synthetic Compounds with High Accuracy in Binary QSAR Calculations.

50. P. Labute, Pac. Symp. Biocomput., 7, 444 (1999). Binary QSAR: A New Method for the Determination of Quantitative Structure Activity Relationships.

51. T. Henkel, R. M. Brunne, H. Müller, and R. Reichel, Angew. Chemie. Int. Ed., 38, 643 (1999). Statistical Investigation into the Structural Complementarity of Natural Products and Synthetic Compounds.

52. C. A. Lipinski, Current Drug Discovery, 1, 17 (2001). Avoiding Investments in Doomed Drugs.

53. J. Taskinen, Curr. Opin. Drug Discov. Dev., 3, 102 (2000). Prediction of Aqueous Solubility in Drug Design.

54. J. M. Sutter and P. C. Jurs, J. Chem. Inf. Comput. Sci., 36, 100 (1996). Prediction of Aqueous Solubility for a Diverse Set of Heteroatom-containing Organic Compounds Using a Quantitative Structure Property Relationship.

55. B. E. Mitchell and P. C. Jurs, J. Chem. Inf. Comput. Sci., 38, 489 (1998). Prediction of Aqueous Solubility of Organic Compounds from Molecular Structure.

56. N. R. McElroy and P. C. Jurs, J. Chem. Inf. Comput. Sci., 41, 1237 (2001). Prediction of Aqueous Solubility of Heteroatom-containing Organic Compounds from Molecular Structure.

57. I. V. Tetko, V. Y. Tanchuk, T. N. Kasheva, and A. E. P. Villa, J. Chem. Inf. Comput. Sci., 41, 1488 (2001). Estimation of Aqueous Solubility of Chemical Compounds Using E-state Indices.

58. J. Huuskonen, J. Chem. Inf. Comput. Sci., 40, 773 (2000). Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology.

59. J. Huuskonen, M. Salo, and J. Taskinen, J. Chem. Inf. Comput. Sci., 38, 450 (1998). Aqueous Solubility Prediction of Drugs Based on Molecular Topology and Neural Network Modelling.

60. Physical/Chemical Property database (PHYSPROP), 1994, Syracuse Research Corporation, SRC Environmental Science Center, Syracuse, New York. Available: www.syrres.com.

61. W. L. Jorgensen and E. M. Duffy, Bioorg. Med. Chem. Lett., 10, 1155 (2000). Prediction of Drug Solubility from Monte Carlo Simulations.

62. J.-H. Lin and T. Clark, J. Chem. Inf. Model., 45, 1010 (2005). An Analytical, Variable Resolution, Complete Description of Static Molecules and Their Intermolecular Binding Properties.

63. T. Clark, 229th American Chemical Society National Meeting. Division of Computers in Chemistry, Abstract No. 267 (2005). Shannon Entropy as a Local Surface Property.


CHAPTER 6

Applications of Support Vector Machines in Chemistry

Ovidiu Ivanciuc

Sealy Center for Structural Biology, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas

INTRODUCTION

Kernel-based techniques (such as support vector machines, Bayes point machines, kernel principal component analysis, and Gaussian processes) represent a major development in machine learning algorithms. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVM found numerous applications in chemistry, such as in drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure-activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (optimization of chromatographic separation or compound concentration prediction from spectral data as examples), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), and text mining (automatic recognition of scientific information).

Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner.1 The SVM algorithm is based on the statistical learning theory and the Vapnik–Chervonenkis (VC) dimension.2 The statistical learning theory, which describes the properties of learning machines that allow them to give reliable predictions, was reviewed by Vapnik in three books: Estimation of Dependencies Based on Empirical Data,3 The Nature of Statistical Learning Theory,4 and Statistical Learning Theory.5 In the current formulation, the SVM algorithm was developed at AT&T Bell Laboratories by Vapnik et al.6–12

SVM developed into a very active research area, and numerous books are available for an in-depth overview of the theoretical basis of these algorithms, including Advances in Kernel Methods: Support Vector Learning by Schölkopf et al.,13 An Introduction to Support Vector Machines by Cristianini and Shawe-Taylor,14 Advances in Large Margin Classifiers by Smola et al.,15 Learning and Soft Computing by Kecman,16 Learning with Kernels by Schölkopf and Smola,17 Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms by Joachims,18 Learning Kernel Classifiers by Herbrich,19 Least Squares Support Vector Machines by Suykens et al.,20 and Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini.21 Several authoritative reviews and tutorials are highly recommended, namely those authored by Schölkopf et al.,7 Smola and Schölkopf,22 Burges,23 Schölkopf et al.,24 Suykens,25 Schölkopf et al.,26 Campbell,27 Schölkopf and Smola,28 and Sanchez.29

In this chapter, we present an overview of SVM applications in chemistry. We start with a nonmathematical introduction to SVM, which will give a flavor of the basic principles of the method and its possible applications in chemistry. Next we introduce the field of pattern recognition, followed by a brief overview of the statistical learning theory and of the Vapnik–Chervonenkis dimension. A presentation of linear SVM followed by its extension to nonlinear SVM and SVM regression is then provided to give the basic mathematical details of the theory, accompanied by numerous examples. Several detailed examples of SVM classification (SVMC) and SVM regression (SVMR) are then presented, for various structure-activity relationships (SAR) and quantitative structure-activity relationships (QSAR) problems. Chemical applications of SVM are reviewed, with examples from drug design, QSAR, chemometrics, chemical engineering, and automatic recognition of scientific information in text. Finally, SVM resources on the Web and free SVM software are reviewed.

A NONMATHEMATICAL INTRODUCTION TO SVM

The principal characteristics of the SVM models are presented here in a nonmathematical way, and examples of SVM applications to classification and regression problems are given in this section. The mathematical basis of SVM will be presented in subsequent sections of this tutorial/review chapter.

SVM models were originally defined for the classification of linearly separable classes of objects. Such an example is presented in Figure 1. For these two-dimensional objects that belong to two classes (class +1 and class −1), it is easy to find a line that separates them perfectly.

For any particular set of two-class objects, an SVM finds the unique hyperplane having the maximum margin (denoted with δ in Figure 1). The hyperplane H1 defines the border with class +1 objects, whereas the hyperplane H2 defines the border with class −1 objects. Two objects from class +1 define the hyperplane H1, and three objects from class −1 define the hyperplane H2. These objects, represented inside circles in Figure 1, are called support vectors. A special characteristic of SVM is that the solution to a classification problem is represented by the support vectors that determine the maximum margin hyperplane.

SVM can also be used to separate classes that cannot be separated with a linear classifier (Figure 2, left). In such cases, the coordinates of the objects are mapped into a feature space using nonlinear functions called feature functions φ. The feature space is a high-dimensional space in which the two classes can be separated with a linear classifier (Figure 2, right).

Figure 1 Maximum separation hyperplane.

Figure 2 Linear separation in feature space (left: input space; right: feature space).

As presented in Figures 2 and 3, the nonlinear feature function φ maps the input space (the original coordinates of the objects) into the feature space, which can even have an infinite dimension. Because the feature space is high dimensional, it is not practical to use the feature functions φ directly in computing the classification hyperplane. Instead, the nonlinear mapping induced by the feature functions is computed with special nonlinear functions called kernels. Kernels have the advantage of operating in the input space, where the solution of the classification problem is a weighted sum of kernel functions evaluated at the support vectors.
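The equivalence between a dot product in feature space and a kernel evaluated in input space can be checked numerically. The sketch below is only an illustration: it assumes a homogeneous degree 2 polynomial kernel, K(x, z) = (x · z)^2, which is not necessarily the form of Eq. [65], and the names `phi`, `dot`, and `poly2_kernel` are hypothetical helpers introduced here.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for a 2-D pattern (assumed form for this sketch)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous degree 2 polynomial kernel, computed in the input space."""
    return dot(x, z) ** 2


x = (2.0, 4.5)  # pattern 1 from Table 1
z = (0.6, 1.0)  # pattern 8 from Table 1

feature_space_value = dot(phi(x), phi(z))  # dot product after explicit mapping
input_space_value = poly2_kernel(x, z)     # same quantity without any mapping

print(feature_space_value, input_space_value)  # both equal 32.49 up to rounding
```

For this small kernel the feature space has only three dimensions, so the explicit mapping is still cheap; the saving grows with the kernel degree and becomes essential when the feature space is infinite dimensional.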

To illustrate the SVM capability of training nonlinear classifiers, consider the patterns from Table 1. This is a synthetic dataset of two-dimensional patterns, designed to investigate the properties of the SVM classification algorithm. All figures from this chapter presenting SVM models for various datasets were prepared with a slightly modified version of Gunn’s MATLAB toolbox, http://www.isis.ecs.soton.ac.uk/resources/svminfo/. In all figures, class +1 patterns are represented by +, whereas class −1 patterns are represented by black dots. The SVM hyperplane is drawn with a continuous line, whereas the margins of the SVM hyperplane are represented by dotted lines. Support vectors from the class +1 are represented as + inside a circle, whereas support vectors from the class −1 are represented as a black dot inside a circle.

Figure 3 Support vector machines map the input space into a high-dimensional feature space (panels: input space, feature space, output space).

Table 1 Linearly Nonseparable Patterns Used for the SVM Classification Models in Figures 4–6

Pattern   x1     x2     Class
1         2      4.5    +1
2         2.5    2.9    +1
3         3      1.5    +1
4         3.6    0.5    +1
5         4.2    2      +1
6         3.9    4      +1
7         5      1      +1
8         0.6    1      −1
9         1      4.2    −1
10        1.5    2.5    −1
11        1.75   0.6    −1
12        3      5.6    −1
13        4.5    5      −1
14        5      4      −1
15        5.5    2      −1


Partitioning of the dataset from Table 1 with a linear kernel is shown in Figure 4a. It is obvious that a linear function is not adequate for this dataset, because the classifier is not able to discriminate the two types of patterns; all patterns are support vectors. A perfect separation of the two classes can be achieved with a degree 2 polynomial kernel (Figure 4b). This SVM model has six support vectors, namely three from class +1 and three from class −1. These six patterns define the SVM model and can be used to predict the class membership for new patterns. The four patterns from class +1 situated in the space region bordered by the +1 margin and the five patterns from class −1 situated in the space region delimited by the −1 margin are not important in defining the SVM model, and they can be eliminated from the training set without changing the SVM solution.

The use of nonlinear kernels provides the SVM with the ability to model complicated separation hyperplanes in this example. However, because there is no theoretical tool to predict which kernel will give the best results for a given dataset, experimenting with different kernels is the only way to identify the best function. An alternative solution to discriminate the patterns from Table 1 is offered by a degree 3 polynomial kernel (Figure 5a) that has seven support vectors, namely three from class +1 and four from class −1. The separation hyperplane becomes even more convoluted when a degree 10 polynomial kernel is used (Figure 5b). It is clear that this SVM model, with 10 support vectors (4 from class +1 and 6 from class −1), is not an optimal model for the dataset from Table 1.

Figure 4 SVM classification models for the dataset from Table 1: (a) dot kernel (linear), Eq. [64]; (b) polynomial kernel, degree 2, Eq. [65].

The next two experiments were performed with the B spline kernel (Figure 6a) and the exponential radial basis function (RBF) kernel (Figure 6b). Both SVM models define elaborate hyperplanes, with a large number of support vectors (11 for spline, 14 for RBF). The SVM model obtained with the exponential RBF kernel acts almost like a look-up table, with all but one pattern used as support vectors. By comparing the SVM models from Figures 4–6, it is clear that the best one is obtained with the degree 2 polynomial kernel, the simplest function that separates the two classes with the lowest number of support vectors. This principle of minimum complexity of the kernel function should serve as a guide for the comparative evaluation and selection of the best kernel. Like all other multivariate algorithms, SVM can overfit the data used in training, a problem that is more likely to happen when complex kernels are used to generate the SVM model.

Figure 5 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 1: (a) polynomial of degree 3; (b) polynomial of degree 10.

Figure 6 SVM classification models for the dataset from Table 1: (a) B spline kernel, degree 1, Eq. [72]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].

Support vector machines were extended by Vapnik for regression4 by using an ε-insensitive loss function (Figure 7). The learning set of patterns is used to obtain a regression model that can be represented as a tube with radius ε fitted to the data. In the ideal case, SVM regression finds a function that maps all input data with a maximum deviation ε from the target (experimental) values. In this case, all training points are located inside the regression tube. However, for datasets affected by errors, it is not possible to fit all the patterns inside the tube and still have a meaningful model. For the general case, SVM regression considers that the error for patterns inside the tube is zero, whereas patterns situated outside the regression tube have an error that increases when the distance to the tube margin increases (Figure 7).30
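The behavior of the ε-insensitive loss can be sketched in a few lines. The example assumes the usual linear growth of the error outside the tube (the text above only states that the error increases with the distance to the tube margin), and the target values and model outputs are hypothetical numbers chosen for illustration, not data from Table 2.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube of radius eps; grows linearly outside it (assumed)."""
    return max(0.0, abs(y_true - y_pred) - eps)


def count_outside_tube(targets, predictions, eps):
    """Number of patterns that receive a nonzero penalty for a given eps."""
    return sum(
        1
        for y, f in zip(targets, predictions)
        if eps_insensitive_loss(y, f, eps) > 0.0
    )


# Hypothetical experimental values and model outputs.
targets = [7.4, 7.7, 8.3, 8.1, 6.8]
predictions = [7.5, 7.9, 8.0, 8.1, 7.4]

# A wider tube leaves fewer patterns outside it.
outside = [count_outside_tube(targets, predictions, eps) for eps in (0.05, 0.25, 0.5)]
print(outside)  # [4, 2, 1]
```

The shrinking count mirrors the trend discussed for the tube radius: a small ε penalizes nearly every pattern, whereas a large ε tolerates most deviations.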

The SVM regression approach is illustrated with a QSAR for angiotensin II antagonists (Table 2) from a review by Hansch et al.31 This QSAR, modeling the IC50 for angiotensin II determined in rabbit aorta rings, is a nonlinear equation based on the hydrophobicity parameter ClogP:

\[ \log(1/\mathrm{IC}_{50}) = 5.27(\pm 1.0) + 0.50(\pm 0.19)\,\mathrm{ClogP} - 3.0(\pm 0.83)\,\log(\beta \cdot 10^{\mathrm{ClogP}} + 1) \]

\[ n = 16 \qquad r^2_{\mathrm{cal}} = 0.849 \qquad s_{\mathrm{cal}} = 0.178 \qquad q^2_{\mathrm{LOO}} = 0.793 \qquad \mathrm{opt.\ ClogP} = 6.42 \]

We will use this dataset later to demonstrate the kernel influence on the SVM regression, as well as the effect of modifying the tube radius ε. However, we will not present QSAR statistics for the SVM model. Comparative QSAR models are shown in the section on SVM applications in chemistry.

Figure 7 Support vector machines regression determines a tube with radius ε fitted to the data.

A linear function is clearly inadequate for the dataset from Table 2, so we will not present the SVMR model for the linear kernel. All SVM regression figures were prepared with Gunn’s MATLAB toolbox. Patterns are represented by +, and support vectors are represented as + inside a circle. The SVM hyperplane is drawn with a continuous line, whereas the margins of the SVM regression tube are represented by dotted lines. Several experiments with different kernels showed that the degree 2 polynomial kernel offers a good model for this dataset, and we decided to demonstrate the influence of the tube radius ε for this kernel (Figures 8 and 9). When the ε parameter is too small, the diameter of the tube is also small, forcing all patterns to be situated outside the SVMR tube. In this case, all patterns are penalized with a value that increases when the distance from the tube’s margin increases. This situation is demonstrated in Figure 8a, generated with ε = 0.05, when all patterns are support vectors. As ε increases to 0.1, the diameter of the tube increases and the number of support vectors decreases to 12 (Figure 8b), whereas the remaining patterns are situated inside the tube and have zero error.

Table 2 Data for the Angiotensin II Antagonists QSAR31 and for the SVM Regression Models from Figures 8–11

[Structural formula of the common scaffold, bearing a C4H9 group and the variable substituent X]

No   Substituent X        ClogP   log 1/IC50
1    H                    4.50    7.38
2    C2H5                 4.69    7.66
3    (CH2)2CH3            5.22    7.82
4    (CH2)3CH3            5.74    8.29
5    (CH2)4CH3            6.27    8.25
6    (CH2)5CH3            6.80    8.06
7    (CH2)7CH3            7.86    6.77
8    CHMe2                5.00    7.70
9    CHMeCH2CH3           5.52    8.00
10   CH2CHMeCH2CMe3       7.47    7.46
11   CH2-cy-C3H5          5.13    7.82
12   CH2CH2-cy-C6H11      7.34    7.75
13   CH2COOCH2CH3         4.90    8.05
14   CH2CO2CMe3           5.83    7.80
15   (CH2)5COOCH2CH3      5.76    8.01
16   CH2CH2C6H5           6.25    8.51

Figure 8 SVM regression models with a degree 2 polynomial kernel (Eq. [65]) for the dataset from Table 2: (a) ε = 0.05; (b) ε = 0.1.

A further increase of ε to 0.3 results in a dramatic change in the number of support vectors, which decreases to 4 (Figure 9a), whereas an ε of 0.5, with two support vectors, gives an SVMR model with a decreased curvature (Figure 9b). These experiments illustrate the importance of the ε parameter on the SVMR model. The optimum value for ε should be selected by comparing the prediction statistics in cross-validation. The optimum value of ε depends on the experimental errors of the modeled property. A low ε should be used for low levels of noise, whereas higher values for ε are appropriate for large experimental errors. Note that a low ε results in SVMR models with a large number of support vectors, whereas sparse models are obtained with higher values for ε.

We will explore the possibility of overfitting in SVM regression when complex kernels are used to model the data, but first we must consider the limitations of the dataset in Table 2. This is important because those data might prevent us from obtaining a high-quality QSAR. First, the biological data are affected by experimental errors and we want to avoid modeling those errors (overfitting the model). Second, the influence of the substituent X is characterized with only its hydrophobicity parameter ClogP. Although hydrophobicity is important, as demonstrated in the QSAR model, it might be that other structural descriptors (electronic or steric) actually control the biological activity of this series of compounds. However, the small number of compounds and the limited diversity of the substituents in this dataset might not reveal the importance of those structural descriptors. Nonetheless, it follows that a predictive model should capture the nonlinear dependence between ClogP and log 1/IC50, and it should have a low degree of complexity to avoid modeling of the errors. The next two experiments were performed with the degree 10 polynomial kernel (Figure 10a; 12 support vectors) and the exponential RBF kernel with σ = 1 (Figure 10b; 11 support vectors). Both SVMR models, obtained with ε = 0.1, follow the data too closely and fail to recognize the general relationship between ClogP and log 1/IC50. The overfitting is more pronounced for the exponential RBF kernel, which therefore is not a good choice for this QSAR dataset.

Figure 9 SVM regression models with a degree 2 polynomial kernel (Eq. [65]) for the dataset from Table 2: (a) ε = 0.3; (b) ε = 0.5.

Interesting results are also obtained with the spline kernel (Figure 11a) and the degree 1 B spline kernel (Figure 11b). The spline kernel offers an interesting alternative to the SVMR model obtained with the degree 2 polynomial kernel. The tube is smooth, with a noticeable asymmetry, which might be supported by the experimental data, as one can deduce after a visual inspection. Together with the degree 2 polynomial kernel model, this spline kernel represents a viable QSAR model for this dataset. Of course, only detailed cross-validation and parameter tuning can decide which kernel is best. In contrast with the spline kernel, the degree 1 B spline kernel displays clear signs of overfitting, indicated by the complex regression tube. The hyperplane closely follows every pattern and is not able to extract a broad and simple relationship between ClogP and log 1/IC50.

The SVMR experiments that we have just carried out using the QSAR dataset from Table 2 offer convincing proof of the ability of SVM to model nonlinear relationships, but also of its tendency to overfit. This dataset was presented only for demonstrative purposes, and we do not recommend the use of SVM for QSAR models with such a low number of compounds and descriptors.

Figure 10 SVM regression models with ε = 0.1 for the dataset of Table 2: (a) polynomial kernel, degree 10, Eq. [65]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].

Figure 11 SVM regression models with ε = 0.1 for the dataset of Table 2: (a) spline kernel, Eq. [71]; (b) B spline kernel, degree 1, Eq. [72].


PATTERN CLASSIFICATION

Research in pattern recognition involves development and application of algorithms that can recognize patterns in data.32 These techniques have important applications in character recognition, speech analysis, image analysis, clinical diagnostics, person identification, machine diagnostics, and industrial process supervision as examples. Many chemistry problems can also be solved with pattern recognition techniques, such as recognizing the provenance of agricultural products (olive oil, wine, potatoes, honey, etc.) based on composition or spectra, structural elucidation from spectra, identifying mutagens or carcinogens from molecular structure, classification of aqueous pollutants based on their mechanism of action, discriminating chemical compounds based on their odor, and classification of chemicals into inhibitors and noninhibitors for a certain drug target.

We now introduce some basic notions of pattern recognition. A pattern (object) is any item (chemical compound, material, spectrum, physical object, chemical reaction, industrial process) whose important characteristics form a set of descriptors. A descriptor is a variable (usually numerical) that characterizes an object. Note that in pattern recognition, descriptors are usually called ‘‘features’’, but in SVM, ‘‘features’’ have another meaning, so we must make a clear distinction here between ‘‘descriptors’’ and ‘‘features’’. A descriptor can be any experimentally measured or theoretically computed quantity that describes the structure of a pattern, including, for example, spectra and composition for chemicals, agricultural products, materials, biological samples; graph descriptors33 and topological indices;34 indices derived from the molecular geometry and quantum calculations;35,36 industrial process parameters; chemical reaction variables; microarray gene expression data; and mass spectrometry data for proteomics.

Each pattern (object) has associated with it a property value. A property is an attribute of a pattern that is difficult, expensive, or time-consuming to measure, or not even directly measurable. Examples of such properties include concentration of a compound in a biological sample, material, or agricultural product; various physical, chemical, or biological properties of chemical compounds; biological toxicity, mutagenicity, or carcinogenicity; ligand/nonligand for different biological receptors; and fault identification in industrial processes.

The major hypothesis used in pattern recognition is that the descriptors capture some important characteristics of the pattern, and then a mathematical function (e.g., machine learning algorithm) can generate a mapping (relationship) between the descriptor space and the property. Another hypothesis is that similar objects (objects that are close in the descriptor space) have similar properties. A wide range of pattern recognition algorithms are currently being used to solve chemical problems. These methods include linear discriminant analysis, principal component analysis, partial least squares (PLS),37 artificial neural networks,38 multiple linear regression (MLR), principal component regression, k-nearest neighbors (k-NN), evolutionary algorithms embedded into machine learning procedures,39 and large margin classifiers including, of course, support vector machines.

A simple example of a classification problem is presented in Figure 12. The learning set consists of 24 patterns, 10 in class +1 and 14 in class −1. In the learning (training) phase, the algorithm extracts classification rules using the information available in the learning set. In the prediction phase, the classification rules are applied to new patterns, with unknown class membership, and each new pattern is assigned to a class, either +1 or −1. In Figure 12, the prediction pattern is indicated with ‘‘?’’.

We consider first a k-NN classifier, with k = 1. This algorithm computes the distance between the new pattern and all patterns in the training set, and then it identifies the k patterns closest to the new pattern. The new pattern is assigned to the majority class of the k nearest neighbors. Obviously, k should be odd to avoid undecided situations. The k-NN classifier assigns the new pattern to class +1 (Figure 13) because its closest pattern belongs to this class. The predicted class of a new pattern can change by changing the parameter k. The optimal value for k is usually determined by cross-validation.
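The k-NN procedure described above can be sketched as follows. Because the coordinates of the patterns in Figure 12 are not tabulated, this illustration uses the patterns from Table 1 as the training set; the query points, the Euclidean distance, and the function name `knn_predict` are assumptions made for this sketch.

```python
from collections import Counter

# Patterns from Table 1: ((x1, x2), class)
TABLE_1 = [
    ((2.0, 4.5), 1), ((2.5, 2.9), 1), ((3.0, 1.5), 1), ((3.6, 0.5), 1),
    ((4.2, 2.0), 1), ((3.9, 4.0), 1), ((5.0, 1.0), 1),
    ((0.6, 1.0), -1), ((1.0, 4.2), -1), ((1.5, 2.5), -1), ((1.75, 0.6), -1),
    ((3.0, 5.6), -1), ((4.5, 5.0), -1), ((5.0, 4.0), -1), ((5.5, 2.0), -1),
]


def knn_predict(training_set, query, k):
    """Assign `query` to the majority class of its k nearest training patterns."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Sort all training patterns by distance to the query and keep the k closest.
    neighbors = sorted(training_set, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


query = (1.6, 3.0)  # hypothetical new pattern
print(knn_predict(TABLE_1, query, k=1))  # -1: the closest pattern is pattern 10
print(knn_predict(TABLE_1, query, k=5))  # +1: three of the five neighbors are +1
```

The same query point receives different classes for k = 1 and k = 5, which illustrates why k is usually tuned by cross-validation.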

The second classifier considered here is a hyperplane H that defines two regions, one for patterns +1 and the other for patterns −1. New patterns are assigned to class +1 if they are situated in the space region corresponding to the class +1, but to class −1 if they are situated in the region corresponding to class −1. For example, the hyperplane H in Figure 14 assigns the new pattern to class −1. The approach of these two algorithms is very different: whereas the k-NN classifier memorizes all patterns, the hyperplane classifier is defined by the equation of a plane in the pattern space. The hyperplane can be used only for linearly separable classes, whereas k-NN is a nonlinear classifier and can be used for classes that cannot be separated with a linear hypersurface.

Figure 12 Example of a classification problem.


An n-dimensional pattern (object) x has n coordinates, x = (x_1, x_2, ..., x_n), where each x_i is a real number, x_i ∈ R for i = 1, 2, ..., n. Each pattern x_j belongs to a class y_j ∈ {−1, +1}. Consider a training set T of m patterns together with their classes, T = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}. Consider a dot product space S, in which the patterns x are embedded, x_1, x_2, ..., x_m ∈ S. Any hyperplane in the space S can be written as

\[ \{ \mathbf{x} \in S \mid \mathbf{w} \cdot \mathbf{x} + b = 0 \}, \quad \mathbf{w} \in S,\ b \in \mathbb{R} \qquad [1] \]

The dot product w · x is defined by

\[ \mathbf{w} \cdot \mathbf{x} = \sum_{i=1}^{n} w_i x_i \qquad [2] \]

Figure 13 Using the k-NN classifier (k = 1), the new pattern is predicted to belong to the class +1.

Figure 14 Using the linear classifier defined by the hyperplane H, the new pattern is predicted to belong to the class −1.


A hyperplane w · x + b = 0 can be denoted as a pair (w, b). A training set of patterns is linearly separable if at least one linear classifier exists defined by the pair (w, b), which correctly classifies all training patterns (see Figure 15). All patterns from class +1 are located in the space region defined by w · x + b > 0, and all patterns from class −1 are located in the space region defined by w · x + b < 0. Using the linear classifier defined by the pair (w, b), the class of a pattern x_k is determined with

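\[ \mathrm{class}(\mathbf{x}_k) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_k + b > 0 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_k + b < 0 \end{cases} \qquad [3] \]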
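The decision rule of Eq. [3] translates directly into code. In this minimal sketch the pair (w, b) and the test patterns are hypothetical values chosen for illustration; returning 0 for a pattern lying exactly on the hyperplane is a choice made here, because Eq. [3] does not cover that case.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (Eq. [2])."""
    return sum(a * b for a, b in zip(u, v))


def classify(w, b, x):
    """Eq. [3]: +1 if w.x + b > 0, -1 if w.x + b < 0.

    Returns 0 when the pattern lies exactly on the hyperplane, a case that
    Eq. [3] leaves undefined (choice made for this sketch).
    """
    value = dot(w, x) + b
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


w, b = (1.0, -1.0), 0.5  # hypothetical hyperplane

print(classify(w, b, (2.0, 1.0)))  # w.x + b =  1.5 -> +1
print(classify(w, b, (0.0, 2.0)))  # w.x + b = -1.5 -> -1
```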

The distance from a point x to the hyperplane defined by (w, b) is

\[ d(\mathbf{x}; \mathbf{w}, b) = \frac{\lvert \mathbf{w} \cdot \mathbf{x} + b \rvert}{\lVert \mathbf{w} \rVert} \qquad [4] \]

where ||w|| is the norm of the vector w.

Of all the points on the hyperplane, one has the minimum distance d_min to the origin (Figure 16):

\[ d_{\min} = \frac{\lvert b \rvert}{\lVert \mathbf{w} \rVert} \qquad [5] \]

In Figure 16, we show a linear classifier (hyperplane H defined by w · x + b = 0), the space region for class +1 patterns (defined by w · x + b > 0), the space region for class −1 patterns (defined by w · x + b < 0), and the distance between origin and the hyperplane H (|b|/||w||).
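Equations [4] and [5] can be verified with a short numerical example. The hyperplane (w, b) below is a hypothetical choice with ||w|| = 5, so that the arithmetic is easy to follow; the function name `distance_to_hyperplane` is introduced only for this sketch.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (Eq. [2])."""
    return sum(a * b for a, b in zip(u, v))


def distance_to_hyperplane(w, b, x):
    """Eq. [4]: distance from point x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


w, b = (3.0, 4.0), -10.0  # hypothetical hyperplane, ||w|| = 5

d_origin = distance_to_hyperplane(w, b, (0.0, 0.0))    # |b| / ||w||, Eq. [5]
d_on_plane = distance_to_hyperplane(w, b, (2.0, 1.0))  # point on the hyperplane
d_other = distance_to_hyperplane(w, b, (4.0, 2.0))     # point on the +1 side

print(d_origin, d_on_plane, d_other)  # 2.0 0.0 2.0
```

Setting x to the origin in Eq. [4] leaves |b|/||w||, which is exactly Eq. [5].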

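Consider a group of linear classifiers (hyperplanes) defined by a set of pairs (w, b) that satisfy the following inequalities for any pattern x_i in the training set:

\[ \begin{cases} \mathbf{w} \cdot \mathbf{x}_i + b > 0 & \text{if } y_i = +1 \\ \mathbf{w} \cdot \mathbf{x}_i + b < 0 & \text{if } y_i = -1 \end{cases} \qquad [6] \]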

Figure 15 The classification hyperplane defines a region for class +1 and another region for class −1.


This group of (w, b) pairs defines a set of classifiers that are able to make a complete separation between two classes of patterns. This situation is illustrated in Figure 17.
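Whether a given pair (w, b) belongs to this group can be tested by checking the inequalities of Eq. [6] for every training pattern. The sketch below uses the compact equivalent form y_i (w · x_i + b) > 0; the four training patterns and the candidate hyperplanes are hypothetical values chosen for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (Eq. [2])."""
    return sum(a * b for a, b in zip(u, v))


def separates(w, b, training_set):
    """True if (w, b) satisfies Eq. [6] for every pattern.

    For y in {-1, +1} the two inequalities of Eq. [6] are equivalent to
    y * (w.x + b) > 0.
    """
    return all(y * (dot(w, x) + b) > 0 for x, y in training_set)


# Hypothetical linearly separable training set: ((x1, x2), class)
training_set = [((3.0, 3.0), 1), ((4.0, 2.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

candidates = [
    ((1.0, 1.0), -3.0),  # separates the two classes
    ((1.0, 2.0), -4.0),  # a different hyperplane that also separates them
    ((1.0, 0.0), -3.5),  # puts (3, 3) on the wrong side
]
results = [separates(w, b, training_set) for w, b in candidates]
print(results)  # [True, True, False]
```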

Figure 16 The distance from the hyperplane to the origin.

Figure 17 Several hyperplanes that correctly classify the two classes of patterns.

In general, for each linearly separable training set, one can find an infinite number of hyperplanes that discriminate the two classes of patterns. Although all these linear classifiers can perfectly separate the learning patterns, they are not all identical. Indeed, their prediction capabilities are different. A hyperplane situated in the proximity of the border +1 patterns will predict as −1 all new +1 patterns that are situated close to the separation hyperplane but in the −1 region (w · x + b < 0). Conversely, a hyperplane situated in the proximity of the border −1 patterns will predict as +1 all new −1 patterns situated close to the separation hyperplane but in the +1 region (w · x + b > 0). It is clear that such classifiers have little prediction success, which led to the idea of wide margin classifiers, i.e., a hyperplane with a buffer toward the +1 and −1 space regions (Figure 18).

For some linearly separable classification problems having a finite number of patterns, it is generally possible to define a large number of wide margin classifiers (Figure 18). Chemometrics and pattern recognition applications suggest that an optimum prediction could be obtained with a linear classifier that has a maximum margin (separation between the two classes), and with the separation hyperplane being equidistant from the two classes. In the next section, we introduce elements of statistical learning theory that form the basis of support vector machines, followed by a section on linear support vector machines in which the mathematical basis for computing a maximum margin classifier with SVM is presented.
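The idea of comparing separating hyperplanes by their margin can be illustrated with Eq. [4]. In this sketch the margin of a hyperplane is taken as the smallest signed distance y_i (w · x_i + b)/||w|| over the training patterns, which is a working definition assumed here; the training set and the three parallel hyperplanes are hypothetical, and no claim is made that any of them is the SVM solution.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (Eq. [2])."""
    return sum(a * b for a, b in zip(u, v))


def margin(w, b, training_set):
    """Smallest signed distance of a training pattern to the hyperplane.

    Positive only if (w, b) classifies every pattern correctly.
    """
    norm_w = math.sqrt(dot(w, w))
    return min(y * (dot(w, x) + b) / norm_w for x, y in training_set)


# Hypothetical training set: ((x1, x2), class)
training_set = [((3.0, 3.0), 1), ((4.0, 2.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

w = (1.0, 1.0)
near_minus = margin(w, -3.0, training_set)   # hyperplane close to class -1
near_plus = margin(w, -4.0, training_set)    # hyperplane close to class +1
equidistant = margin(w, -3.5, training_set)  # halfway between the two borders

print(near_minus, near_plus, equidistant)
```

Among these three parallel hyperplanes, the equidistant one has the widest margin (about 1.77 versus about 1.41), in line with the observation that the separation hyperplane should be equidistant from the two classes.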

Figure 18 Examples of margin hyperplane classifiers.

THE VAPNIK–CHERVONENKIS DIMENSION

Support vector machines are based on structural risk minimization (SRM), derived from statistical learning theory.4,5,10 This theory is the basis for finding bounds for the classification performance of machine learning algorithms. Another important result from statistical learning theory is the performance estimation of finite set classifiers and the convergence of their classification performance toward that of a classifier with an infinite number of learning samples. Consider a learning set of m patterns. Each pattern consists of a vector of characteristics $\mathbf{x}_i \in R^n$ and an associated class membership $y_i$. The task of the machine learning algorithm is to find the rules of the mapping $\mathbf{x}_i \to y_i$. The machine model is a possible mapping $\mathbf{x}_i \to f(\mathbf{x}_i, p)$, where each model is defined by a set of parameters p. Training a machine learning algorithm results in finding an optimum set of parameters p. The machine algorithm is considered to be deterministic; i.e., for a given input vector $\mathbf{x}_i$ and a set of parameters p, the output will always be $f(\mathbf{x}_i, p)$. The expectation for the test error of a machine trained with an infinite number of samples is denoted by $e(p)$ (called expected risk or expected error). The empirical risk $e_{\mathrm{emp}}(p)$ is the measured error for a finite number of patterns in the training set:

$$e_{\mathrm{emp}}(p) = \frac{1}{2m} \sum_{i=1}^{m} |y_i - f(\mathbf{x}_i, p)| \qquad [7]$$

The quantity $\frac{1}{2}|y_i - f(\mathbf{x}_i, p)|$ is called the loss, and for a two-class classification, it can take only the values 0 and 1. Choose a value $\eta$ such that $0 \le \eta \le 1$. For losses taking these values, with probability $1 - \eta$, the following bound exists for the expected risk:

$$e(p) \le e_{\mathrm{emp}}(p) + \sqrt{\frac{d_{VC}\left(\log(2m/d_{VC}) + 1\right) - \log(\eta/4)}{m}} \qquad [8]$$

where $d_{VC}$ is a non-negative integer, called the Vapnik–Chervonenkis (VC) dimension of a classifier, that measures the capacity of a classifier. The right-hand side of this equation defines the risk bound. The second term in the right-hand side of the equation is called VC confidence.
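The behavior of the risk bound in Eq. [8] is easier to see numerically. The following is a minimal sketch, not part of the original text: the function names and the sample values of the empirical risk, m, and η are hypothetical, the logarithm is assumed to be the natural logarithm, and $d_{VC} > 0$ is required so that the logarithm is defined.

```python
import math


def vc_confidence(d_vc: int, m: int, eta: float) -> float:
    """Second term on the right-hand side of Eq. [8] (natural log assumed)."""
    if d_vc <= 0 or m <= 0 or not 0.0 < eta <= 1.0:
        raise ValueError("need d_vc > 0, m > 0 and 0 < eta <= 1")
    numerator = d_vc * (math.log(2.0 * m / d_vc) + 1.0) - math.log(eta / 4.0)
    return math.sqrt(numerator / m)


def risk_bound(empirical_risk: float, d_vc: int, m: int, eta: float) -> float:
    """Right-hand side of Eq. [8]: empirical risk plus VC confidence."""
    return empirical_risk + vc_confidence(d_vc, m, eta)


# Hypothetical values, chosen only to illustrate the trend:
# for fixed m and eta, the bound grows with the VC dimension.
for d_vc in (3, 30, 300):
    print(d_vc, risk_bound(0.05, d_vc, m=10_000, eta=0.05))
```

For a fixed number of patterns, a larger VC dimension loosens the bound, which is the motivation for preferring the classifier family with the smallest capacity that still fits the training data.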

We consider the case of two-class pattern recognition, when the function $f(\mathbf{x}_i, p)$ can take only two values, e.g., +1 and −1. Consider a set of m points and all their two-class labelings. If for each of the $2^m$ labelings one can find a classifier f(p) that correctly separates class +1 points from class −1 points, then that set of points is separated by that set of functions. The VC dimension for a set of functions {f(p)} is defined as the maximum number of points that can be separated by {f(p)}. In two dimensions, three samples can be separated with a line for each of the six possible combinations (Figure 19, top panels). In the case of four training points in a plane, there are two cases that cannot be separated with a line (Figure 19, bottom panels). These two cases require a classifier of higher complexity, with a higher VC dimension. The example from Figure 19 shows that the VC dimension of a set of lines in $R^2$ is three. A family of classifiers has an infinite VC dimension if it can separate m points, with m being arbitrarily large.

Figure 19 In a plane, all combinations of three points from two classes can be separated with a line. Four points cannot be separated with a linear classifier.
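The counting argument for lines in the plane can be checked by brute force. The sketch below is an illustration under stated assumptions, not a method from the text: it uses a plain perceptron as the separability test, the point coordinates are hypothetical, and failure to converge within `max_epochs` is treated as "not separable", which is only a heuristic in general (it is safe for these small integer examples). It enumerates all $2^m$ labelings, so three points give 8 labelings, 6 of which contain both classes.

```python
from itertools import product


def perceptron_separable(points, labels, max_epochs=1000):
    """Return True if a perceptron finds a line that separates the labeling."""
    w = [0, 0]
    b = 0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            # A pattern on the wrong side (or on the line) triggers an update.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def count_separable_labelings(points):
    """Count the labelings (out of 2^m) that a line can separate."""
    return sum(
        perceptron_separable(points, labels)
        for labels in product((-1, +1), repeat=len(points))
    )


triangle = [(0, 0), (1, 0), (0, 1)]        # three non-collinear points
square = [(0, 0), (1, 0), (0, 1), (1, 1)]  # four points in a plane

print(count_separable_labelings(triangle), "of", 2 ** len(triangle))
print(count_separable_labelings(square), "of", 2 ** len(square))
```

All labelings of the three points are separable, whereas two labelings of the four points (the two diagonal, XOR-type arrangements) are not, in agreement with Figure 19.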

The VC confidence term in Eq. [8] depends on the chosen class of functions, whereas the empirical risk and the actual risk depend on the particular function obtained from the training algorithm.23 It is important to find a subset of the selected set of functions such that the risk bound for that subset is minimized. A structure is introduced by classifying the whole class of functions into nested subsets (Figure 20), with the property $d_{VC,1} < d_{VC,2} < d_{VC,3}$. For each subset of functions, it is either possible to compute $d_{VC}$ or to get a bound on the VC dimension. Structural risk minimization consists of finding the subset of functions that minimizes the bound on the actual risk. This is done by training a machine model for each subset. For each model the goal is to minimize the empirical risk. Subsequently, one selects the machine model whose sum of empirical risk and VC confidence is minimal.

PATTERN CLASSIFICATION WITH LINEAR SUPPORT VECTOR MACHINES

To apply the results from the statistical learning theory to pattern classification one has to (1) choose a classifier with the smallest empirical risk and (2) choose a classifier from a family that has the smallest VC dimension. For a linearly separable case, condition (1) is satisfied by selecting any classifier that completely separates both classes (for example, any classifier from Figure 17), whereas condition (2) is satisfied for the classifier with the largest margin.

SVM Classification for Linearly Separable Data

The optimum separation hyperplane (OSH) is the hyperplane with the maximum margin for a given finite set of learning patterns. The OSH computation with a linear support vector machine is presented in this section.

Figure 20 Nested subsets of functions, ordered by VC dimension ($d_{VC,1}$, $d_{VC,2}$, $d_{VC,3}$).

The Optimization Problem

Based on the notations from Figure 21, we will now establish the conditions necessary to determine the maximum separation hyperplane. Consider a linear classifier characterized by the set of pairs (w, b) that satisfy the following inequalities for any pattern $\mathbf{x}_i$ in the training set:

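$$\begin{cases} \mathbf{w} \cdot \mathbf{x}_i + b \ge +1 & \text{if } y_i = +1 \\ \mathbf{w} \cdot \mathbf{x}_i + b \le -1 & \text{if } y_i = -1 \end{cases} \qquad [9]$$

These inequalities can be expressed in compact form as

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge +1 \qquad [10]$$

or

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0 \qquad [11]$$

Because we have considered the case of linearly separable classes, each such hyperplane (w, b) is a classifier that correctly separates all patterns from the training set:

$$\mathrm{class}(\mathbf{x}_i) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b > 0 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b < 0 \end{cases} \qquad [12]$$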

For the hyperplane H that defines the linear classifier (i.e., where $\mathbf{w} \cdot \mathbf{x} + b = 0$), the distance between the origin and the hyperplane H is $|b|/\|\mathbf{w}\|$. We consider the patterns from the class −1 that satisfy the equality $\mathbf{w} \cdot \mathbf{x} + b = -1$ and that determine the hyperplane $H_1$; the distance between the origin and the hyperplane $H_1$ is equal to $|-1 - b|/\|\mathbf{w}\|$. Similarly, the patterns from the class +1 that satisfy the equality $\mathbf{w} \cdot \mathbf{x} + b = +1$ determine the hyperplane $H_2$; the distance between the origin and the hyperplane $H_2$ is equal to $|+1 - b|/\|\mathbf{w}\|$. Of course, hyperplanes $H$, $H_1$, and $H_2$ are parallel and no training patterns are located between hyperplanes $H_1$ and $H_2$. Based on the above considerations, the margin of the linear classifier H (the distance between hyperplanes $H_1$ and $H_2$) is $2/\|\mathbf{w}\|$.

Figure 21 The separating hyperplane. (Figure labels: H is $\mathbf{w} \cdot \mathbf{x}_i + b = 0$, $H_1$ is $\mathbf{w} \cdot \mathbf{x}_i + b = -1$, $H_2$ is $\mathbf{w} \cdot \mathbf{x}_i + b = +1$; the +1 patterns lie in the region $\mathbf{w} \cdot \mathbf{x}_i + b > +1$, the −1 patterns in the region $\mathbf{w} \cdot \mathbf{x}_i + b < -1$, and the margin is $2/\|\mathbf{w}\|$.)

We now present an alternative method to determine the distance between hyperplanes $H_1$ and $H_2$. Consider a point $\mathbf{x}_0$ located on the hyperplane H and a point $\mathbf{x}_1$ located on the hyperplane $H_1$, selected in such a way that $(\mathbf{x}_0 - \mathbf{x}_1)$ is orthogonal to the two hyperplanes. These points satisfy the following two equalities:

$$\begin{cases} \mathbf{w} \cdot \mathbf{x}_0 + b = 0 \\ \mathbf{w} \cdot \mathbf{x}_1 + b = -1 \end{cases} \qquad [13]$$

By subtracting the second equality from the first equality, we obtain

$$\mathbf{w} \cdot (\mathbf{x}_0 - \mathbf{x}_1) = 1 \qquad [14]$$

Because $(\mathbf{x}_0 - \mathbf{x}_1)$ is orthogonal to the hyperplane H, and w is also orthogonal to H, then $(\mathbf{x}_0 - \mathbf{x}_1)$ and w are parallel, and the dot product satisfies

$$|\mathbf{w} \cdot (\mathbf{x}_0 - \mathbf{x}_1)| = \|\mathbf{w}\| \cdot \|\mathbf{x}_0 - \mathbf{x}_1\| \qquad [15]$$

From Eqs. [14] and [15], we obtain the distance between hyperplanes H and $H_1$:

$$\|\mathbf{x}_0 - \mathbf{x}_1\| = \frac{1}{\|\mathbf{w}\|} \qquad [16]$$

Similarly, a point $\mathbf{x}_0$ located on the hyperplane H and a point $\mathbf{x}_2$ located on the hyperplane $H_2$, selected in such a way that $(\mathbf{x}_0 - \mathbf{x}_2)$ is orthogonal to the two hyperplanes, will satisfy the equalities:

$$\begin{cases} \mathbf{w} \cdot \mathbf{x}_0 + b = 0 \\ \mathbf{w} \cdot \mathbf{x}_2 + b = +1 \end{cases} \qquad [17]$$

Consequently, the distance between hyperplanes H and $H_2$ is

$$\|\mathbf{x}_0 - \mathbf{x}_2\| = \frac{1}{\|\mathbf{w}\|} \qquad [18]$$

Therefore, the margin of the linear classifier defined by (w, b) is $2/\|\mathbf{w}\|$. The wider the margin, the smaller is $d_{VC}$, the VC dimension of the classifier. From these considerations, it follows that the optimum separation hyperplane is obtained by maximizing $2/\|\mathbf{w}\|$, which is equivalent to minimizing $\|\mathbf{w}\|^2/2$.

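The problem of finding the optimum separation hyperplane is represented by the identification of the linear classifier (w, b), which satisfies

$$\begin{cases} \mathbf{w} \cdot \mathbf{x}_i + b \ge +1 & \text{if } y_i = +1 \\ \mathbf{w} \cdot \mathbf{x}_i + b \le -1 & \text{if } y_i = -1 \end{cases} \qquad [19]$$

for which $\|\mathbf{w}\|$ has the minimum value.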
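The two ingredients of this problem, the constraints of Eq. [19] and the margin $2/\|\mathbf{w}\|$, can be evaluated directly for any candidate classifier. The sketch below is illustrative only: the four toy patterns and the three candidate (w, b) pairs are hypothetical and were picked by hand, and the helper names are not from the text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_feasible(w, b, X, y):
    """True if every pattern satisfies y_i (w . x_i + b) >= 1 (Eqs. [10], [19])."""
    return all(yi * (dot(w, xi) + b) >= 1 for xi, yi in zip(X, y))


def margin_width(w):
    """Distance between the border hyperplanes H1 and H2, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical toy patterns: two from class +1, two from class -1.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

candidates = {
    "narrow": ((1.0, 1.0), -3.0),
    "wide": ((0.5, 0.5), -1.0),
    "too wide": ((0.25, 0.25), -0.5),
}
for name, (w, b) in candidates.items():
    print(name, is_feasible(w, b, X, y), margin_width(w))
```

Among the feasible candidates, the one with the smaller $\|\mathbf{w}\|$ has the wider margin; shrinking w further violates the constraints, which is exactly the trade-off the optimization resolves.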

Computing the Optimum Separation Hyperplane

Based on the considerations presented above, the OSH conditions from Eq. [19] can be formulated into the following expression that represents a linear SVM:

$$\begin{aligned} &\text{minimize } f(x) = \frac{\|\mathbf{w}\|^2}{2} \\ &\text{with the constraints } g_i(x) = y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0, \quad i = 1, \ldots, m \end{aligned} \qquad [20]$$

The optimization problem from Eq. [20] represents the minimization of a quadratic function under linear constraints (quadratic programming), a problem studied extensively in optimization theory. Details on quadratic programming can be found in almost any textbook on numerical optimization, and efficient implementations exist in many software libraries. However, Eq. [20] does not represent the actual optimization problem that is solved to determine the OSH. Based on the use of a Lagrange function, Eq. [20] is transformed into its dual formulation. All SVM models (linear and nonlinear, classification and regression) are solved for the dual formulation, which has important advantages over the primal formulation (Eq. [20]). The dual problem can be easily generalized to linearly nonseparable learning data and to nonlinear support vector machines.

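A convenient way to solve constrained minimization problems is by using a Lagrangian function of the problem defined in Eq. [20]:

$$\begin{aligned} L_P(\mathbf{w}, b, \Lambda) &= f(x) - \sum_{i=1}^{m} \lambda_i g_i(x) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{m} \lambda_i \left( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right) \\ &= \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{m} \lambda_i y_i (\mathbf{w} \cdot \mathbf{x}_i + b) + \sum_{i=1}^{m} \lambda_i \\ &= \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{m} \lambda_i y_i \mathbf{w} \cdot \mathbf{x}_i - \sum_{i=1}^{m} \lambda_i y_i b + \sum_{i=1}^{m} \lambda_i \end{aligned} \qquad [21]$$

Here $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ is the set of Lagrange multipliers of the training (calibration) patterns with $\lambda_i \ge 0$, and P in $L_P$ indicates the primal formulation of the problem. The Lagrangian function $L_P$ must be minimized with respect to w and b, and maximized with respect to $\lambda_i$, subject to the constraints $\lambda_i \ge 0$. This is equivalent to solving the Wolfe dual problem,40 namely to maximize $L_P$ subject to the constraints that the gradient of $L_P$ with respect to w and b is zero, and subject to the constraints $\lambda_i \ge 0$.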

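The Karush–Kuhn–Tucker (KKT)40 conditions for the primal problem are as follows:

Gradient Conditions

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda)}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i = 0, \quad \text{where } \frac{\partial L_P(\mathbf{w}, b, \Lambda)}{\partial \mathbf{w}} = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \ldots, \frac{\partial L}{\partial w_n} \right) \qquad [22]$$

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda)}{\partial b} = -\sum_{i=1}^{m} \lambda_i y_i = 0 \qquad [23]$$

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda)}{\partial \lambda_i} = -g_i(x) = 0 \qquad [24]$$

Orthogonality Condition

$$\lambda_i g_i(x) = \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right] = 0, \quad i = 1, \ldots, m \qquad [25]$$

Feasibility Condition

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0, \quad i = 1, \ldots, m \qquad [26]$$

Non-negativity Condition

$$\lambda_i \ge 0, \quad i = 1, \ldots, m \qquad [27]$$

Solving the SVM problem is equivalent to finding a solution to the KKT conditions. We are now ready to formulate the dual problem $L_D$:

$$\begin{aligned} &\text{maximize } L_D(\mathbf{w}, b, \Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \mathbf{x}_i \cdot \mathbf{x}_j \\ &\text{subject to } \lambda_i \ge 0, \quad i = 1, \ldots, m \\ &\text{and } \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [28]$$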
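The step from the primal Lagrangian of Eq. [21] to the dual objective of Eq. [28] is a direct substitution. The following worked derivation is a sketch of that step; it assumes only the gradient conditions of Eqs. [22] and [23], i.e., $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$ and $\sum_i \lambda_i y_i = 0$.

```latex
\begin{aligned}
\tfrac{1}{2}\|\mathbf{w}\|^2
  &= \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}
     \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j
  && \text{(Eq. [22] inserted for both factors of } \mathbf{w}\text{)} \\
\sum_{i=1}^{m} \lambda_i y_i\, \mathbf{w} \cdot \mathbf{x}_i
  &= \sum_{i=1}^{m}\sum_{j=1}^{m}
     \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j
  && \text{(Eq. [22] inserted for } \mathbf{w}\text{)} \\
\sum_{i=1}^{m} \lambda_i y_i\, b
  &= b \sum_{i=1}^{m} \lambda_i y_i = 0
  && \text{(Eq. [23])} \\
L_P
  &= \tfrac{1}{2}\|\mathbf{w}\|^2
     - \sum_{i=1}^{m} \lambda_i y_i\, \mathbf{w} \cdot \mathbf{x}_i
     - \sum_{i=1}^{m} \lambda_i y_i\, b
     + \sum_{i=1}^{m} \lambda_i \\
  &= \sum_{i=1}^{m} \lambda_i
     - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}
       \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j
   = L_D
\end{aligned}
```

After the substitution, w and b no longer appear: the dual objective depends only on the multipliers and on dot products between training patterns.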

Both the primal $L_P$ and the dual $L_D$ Lagrangian functions are derived from the same objective functions but with different constraints, and the solution is found by minimizing $L_P$ or by maximizing $L_D$. The most popular algorithm for solving the optimization problem is the sequential minimal optimization (SMO) proposed by Platt.41
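To make the dual problem concrete, the sketch below evaluates the objective of Eq. [28] for a hypothetical two-pattern training set. It is not SMO and not a general solver: with two patterns of opposite class, the constraint $\sum_i \lambda_i y_i = 0$ forces $\lambda_1 = \lambda_2$, so a one-dimensional grid scan (an assumption made purely for illustration) is enough to locate the maximum.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(lam, X, y):
    """L_D from Eq. [28] for multipliers lam, patterns X, and classes y."""
    m = len(X)
    quadratic = sum(
        lam[i] * lam[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(m)
        for j in range(m)
    )
    return sum(lam) - 0.5 * quadratic


# Hypothetical training set: one pattern per class.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]

# The equality constraint gives lam_1 = lam_2 = t; scan t >= 0 on a grid.
grid = [k / 100 for k in range(0, 101)]
best_t = max(grid, key=lambda t: dual_objective([t, t], X, y))
best_value = dual_objective([best_t, best_t], X, y)

# Primal objective ||w||^2 / 2 with w from Eq. [22].
w = [sum(best_t * yi * xi[k] for xi, yi in zip(X, y)) for k in range(2)]
primal_value = 0.5 * dot(w, w)
print(best_t, best_value, primal_value)
```

For this toy problem the maximum of the dual coincides with the minimum of the primal objective, and the resulting margin $2/\|\mathbf{w}\|$ equals the distance between the two patterns.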

When we introduced the Lagrange function we assigned a Lagrange multiplier $\lambda_i$ to each training pattern via the constraints $g_i(x)$ (see Eq. [20]). The training patterns from the SVM solution that have $\lambda_i > 0$ represent the support vectors. The training patterns that have $\lambda_i = 0$ are not important in obtaining the SVM model, and they can be removed from training without any effect on the SVM solution. As we will see below, any SVM model is completely defined by the set of support vectors and the corresponding Lagrange multipliers.

The vector w that defines the OSH (Eq. [29]) is obtained by using Eq. [22]:

$$\mathbf{w} = \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \qquad [29]$$

To compute the threshold b of the OSH, we consider the KKT condition of Eq. [25] coupled with the expression for w from Eq. [29] and the condition $\lambda_j > 0$, which leads to

$$\sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \cdot \mathbf{x}_j + b = y_j \qquad [30]$$

Therefore, the threshold b can be obtained by averaging the b values obtained for all support vector patterns, i.e., the patterns with $\lambda_j > 0$:

$$b = y_j - \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \cdot \mathbf{x}_j \qquad [31]$$
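Eqs. [29] and [31] translate into a few lines of code once the multipliers are known. The sketch below assumes a hypothetical three-pattern training set whose multipliers were worked out by hand (they satisfy the KKT conditions for this toy set); the function names and the tolerance `tol` used to decide which multipliers count as positive are illustrative choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weight_vector(lam, X, y):
    """Eq. [29]: w = sum_i lambda_i * y_i * x_i."""
    n = len(X[0])
    return [sum(l * yi * xi[k] for l, xi, yi in zip(lam, X, y)) for k in range(n)]


def threshold(lam, X, y, tol=1e-12):
    """Eq. [31], averaged over the support vectors (lambda_j > 0)."""
    w = weight_vector(lam, X, y)
    b_values = [yj - dot(w, xj) for l, xj, yj in zip(lam, X, y) if l > tol]
    return sum(b_values) / len(b_values)


# Hypothetical training set; the third pattern is not a support vector.
X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
y = [+1, -1, +1]
lam = [0.5, 0.5, 0.0]  # hand-derived multipliers for this toy set

w = weight_vector(lam, X, y)
b = threshold(lam, X, y)
print(w, b)
```

The pattern with a zero multiplier contributes nothing to w or b, which is why it can be dropped from the training set without changing the classifier.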

Prediction for New Patterns

In the previous section, we presented the SVM algorithm for training a linear classifier. The result of this training is an optimum separation hyperplane defined by (w, b) (Eqs. [29] and [31]). After training, the classifier is ready to predict the class membership for new patterns, different from those used in training. The class of a pattern $\mathbf{x}_k$ is determined with

$$\mathrm{class}(\mathbf{x}_k) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x}_k + b > 0 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_k + b < 0 \end{cases} \qquad [32]$$

Therefore, the classification of new patterns depends only on the sign of the expression $\mathbf{w} \cdot \mathbf{x} + b$. However, Eq. [29] offers the possibility to predict new patterns without computing the vector w explicitly. In this case, we will use for classification the support vectors from the training set and the corresponding values of the Lagrange multipliers $\lambda_i$:

$$\mathrm{class}(\mathbf{x}_k) = \mathrm{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \cdot \mathbf{x}_k + b \right) \qquad [33]$$

Patterns that are not support vectors ($\lambda_i = 0$) do not influence the classification of new patterns. The use of Eq. [33] has an important advantage over using Eq. [32]: to classify a new pattern $\mathbf{x}_k$, it is only necessary to compute the dot product between $\mathbf{x}_k$ and every support vector. This results in a significant saving of computational time whenever the number of support vectors is small compared with the total number of patterns in the training set. Also, Eq. [33] can be easily adapted for nonlinear classifiers that use kernels, as we will show later.
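The equivalence of Eqs. [32] and [33] can be checked on a small case. This sketch uses a hypothetical model with two support vectors (multipliers, classes, and coordinates chosen by hand) and hypothetical test patterns; returning 0 for a pattern lying exactly on the hyperplane is a convention of this example, since Eq. [32] leaves that case undefined.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_dual(x_new, support_vectors, b):
    """Eq. [33]: sign of sum_i lambda_i * y_i * (x_i . x_new) + b."""
    score = sum(lam * y * dot(x, x_new) for lam, y, x in support_vectors) + b
    return (score > 0) - (score < 0)  # +1, -1, or 0 on the hyperplane


def predict_primal(x_new, w, b):
    """Eq. [32]: sign of w . x_new + b, with w computed explicitly."""
    score = dot(w, x_new) + b
    return (score > 0) - (score < 0)


# Hypothetical model: (lambda_i, y_i, x_i) for each support vector.
support_vectors = [(0.5, +1, (1.0, 0.0)), (0.5, -1, (-1.0, 0.0))]
b = 0.0
w = [sum(lam * y * x[k] for lam, y, x in support_vectors) for k in range(2)]

new_patterns = [(2.0, 5.0), (-0.5, 3.0), (0.1, -4.0)]
for x_new in new_patterns:
    print(x_new, predict_dual(x_new, support_vectors, b), predict_primal(x_new, w, b))
```

Only dot products between the new pattern and the support vectors are needed in the dual form, which is the property that later allows the dot product to be replaced by a kernel.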

For a particular SVM problem (training set, kernel, kernel parameters), the optimum separation hyperplane is determined only by the support vectors (Figure 22a). By eliminating from training those patterns that are not support vectors ($\lambda_i = 0$), the SVM solution does not change (Figure 22b). This property suggests a possible approach for accelerating the SVM learning phase, in which patterns that cannot be support vectors are eliminated from learning.

Figure 22 The optimal hyperplane classifier obtained with all training patterns (a) is identical with the one computed with only the support vector patterns (b).

Example of SVM Classification for Linearly Separable Data

We now present several SVM classification experiments for a dataset that is linearly separable (Table 3). This exercise is meant to compare the linear kernel with nonlinear kernels and to compare different topologies for the separating hyperplanes. All models used an infinite value for the capacity parameter C (no tolerance for misclassified patterns; see Eq. [39]).


As expected, a linear kernel offers a complete separation of the two classes (Figure 23a), with only three support vectors, namely one from class +1 and two from class −1. The hyperplane has the maximum width and provides both a sparse solution and a good prediction model for new patterns. Note that, according to the constraints imposed in generating this SVMC model, no patterns are allowed inside the margins of the classifier (margins defined by the two bordering hyperplanes represented with dotted lines). To predict the class attribution for new patterns, one uses Eq. [33] applied to the three support vectors. The next experiment uses a degree 2 polynomial kernel (Figure 23b), which gives a solution with five support vectors, namely two from class +1 and three from class −1. The model is not optimal for this dataset, but it still provides an acceptable hyperplane topology. Note that the margin width varies, decreasing from left to right.

Figure 23 SVM classification models for the dataset from Table 3: (a) dot kernel (linear), Eq. [64]; (b) polynomial kernel, degree 2, Eq. [65].

Table 3 Linearly Separable Patterns Used for the SVM Classification Models in Figures 23–25

Pattern   x1     x2     Class
1         1      5.5     1
2         2.25   5       1
3         3.25   4.25    1
4         4      5.2     1
5         5.25   2.25    1
6         5.5    4       1
7         0.5    3.5    −1
8         1      2      −1
9         1.5    1      −1
10        2.25   2.7    −1
11        3      0.8    −1
12        3.75   1.25   −1
13        5      0.6    −1

By increasing the polynomial degree to 10, we obtain an SVM model that has a wide margin in the center of the separating hyperplane and a very small margin toward the two ends (Figure 24a). Four patterns are selected as support vectors, two from each class. This is not a suitable classifier for the dataset from Table 3, mainly because the topology of the separating hypersurface is too complicated. An even more complex discriminating hyperplane is produced by the exponential RBF kernel (Figure 24b).

The last two experiments for the linearly separable dataset are performed with the Gaussian RBF kernel (σ = 1; Figure 25a) and the B spline kernel (degree 1; Figure 25b). Although not optimal, the classification hyperplane for the Gaussian RBF kernel is much better than those obtained with the exponential RBF kernel and degree 10 polynomial kernel. On the other hand, SVM with the B spline kernel is clearly overfitted, with a total of nine support vectors (four from class +1 and five from class −1). The margins of the SVM classifier define two "islands" that surround each cluster of patterns. Noticeable are the support vectors situated far away from the central hyperplane.

Figure 24 SVM classification models for the dataset from Table 3: (a) polynomial kernel, degree 10, Eq. [65]; (b) exponential radial basis function kernel, σ = 1, Eq. [67].

Figure 25 SVM classification models for the dataset from Table 3: (a) Gaussian radial basis function kernel, σ = 1, Eq. [66]; (b) B spline kernel, degree 1, Eq. [72].

The SVM classification models depicted in Figures 23–25 convey an important message for scientists who want to use SVM applications in cheminformatics: SVM models obtained with complex, nonlinear kernels must always be compared with those obtained with a linear kernel. Chances are that the separation hypersurface is almost linear, in which case the linear kernel avoids overfitting the data.

Linear SVM for the Classification of Linearly Non-Separable Data

In the previous section, we presented the SVMC model for the case when the training set is linearly separable, and an optimum separation hyperplane correctly classifies all patterns from that training set. The linear separability of two classes of patterns might not be a valid assumption for real-life applications, however, and in these cases, the algorithm presented earlier will not find a solution. There are many reasons why a training set is linearly nonseparable. The identification of input variables ($x_1, x_2, \ldots, x_n$) that can separate the two classes linearly is not a trivial task. When descriptors are used for SAR models, the selection of those descriptors can be made from thousands of descriptors from the extant literature or they can be computed with available software. Although several procedures have been developed to select the optimum set of structural descriptors, these methods are often time-consuming and may require special algorithms that are not implemented in, e.g., currently available SVM packages. In chemometrics applications, when measured quantities (e.g., spectra, physico-chemical properties, chemical reaction variables, or industrial process variables) are used to separate two classes of patterns, difficulties arise not only in identifying the relevant properties, but also from cost and instrument availability, which may limit the number of possible measurements. Also, all experimental input data are affected by measurement errors and noise, which can make the patterns linearly nonseparable. Finally, the classes might not be separable with a linear classifier, due to the nonlinear mapping between the input space and the two classes.

In Figure 26, we present a classification problem that, for the majority of patterns, can be solved with a linear classifier. However, the region corresponding to the +1 patterns contains two −1 patterns (shown in square boxes), whereas two +1 patterns are embedded in the region corresponding to the −1 patterns. Of course, no linear classifier can be computed for this learning set, but several hyperplanes can be calculated in such a way as to minimize the number of classification errors, e.g., hyperplane H in Figure 26.

In this section, we consider a training set T of m patterns together with their classes, $T = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m)\}$, that can be separated linearly, except for a small number of objects. Obviously, computing the optimum separation hyperplane according to Eqs. [21] and [28] will fail to produce any viable solution. We will show below how the SVMC for linearly separable patterns can be adapted to accommodate classification errors in the training set. The resulting SVMC will still be linear, but it will compute an optimum separation hyperplane even for cases like that in Figure 26, which cannot be completely separated with a linear classifier.

In the previous section, we found that the OSH defined by a pair (w, b) is a buffer between class +1 and class −1 of patterns, with the property that it has the largest margin. The border toward the class +1 is defined by the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = +1$, whereas the border toward the class −1 is defined by the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = -1$. For the OSH, all class +1 patterns satisfy $\mathbf{w} \cdot \mathbf{x} + b \ge +1$, whereas all class −1 patterns satisfy $\mathbf{w} \cdot \mathbf{x} + b \le -1$, and the learning set is classified without errors.

To obtain an optimum linear classifier for nonseparable data (Figure 27), a penalty is introduced for misclassified data, denoted with ξ and called a slack variable. This penalty associated with any pattern in the training set is zero for patterns classified correctly, and has a positive value that increases with the distance from the corresponding hyperplane for patterns that are not situated on the correct side of the classifier.

Figure 26 Linearly nonseparable data. The patterns that cannot be linearly separated with a hyperplane are represented inside a square.

Figure 27 Linear separable hyperplanes for nonseparable data. The patterns that cannot be linearly separated with a hyperplane are represented inside a square. (Figure labels: H is $\mathbf{w} \cdot \mathbf{x}_i + b = 0$, $H_1$ is $\mathbf{w} \cdot \mathbf{x}_i + b = -1$, $H_2$ is $\mathbf{w} \cdot \mathbf{x}_i + b = +1$; ξ marks the slack of each boxed pattern.)

For a pattern ($\mathbf{x}_i$, $y_i$) from the class +1, the slack variable is defined as

$$\xi_i(\mathbf{w}, b) = \begin{cases} 0 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge +1 \\ 1 - (\mathbf{w} \cdot \mathbf{x}_i + b) & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b < +1 \end{cases} \qquad [34]$$

Similarly, for a pattern ($\mathbf{x}_i$, $y_i$) from the class −1, the slack variable is defined as

$$\xi_i(\mathbf{w}, b) = \begin{cases} 0 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \\ 1 + (\mathbf{w} \cdot \mathbf{x}_i + b) & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b > -1 \end{cases} \qquad [35]$$

From Eqs. [34] and [35] and Figure 27, one can see that the slack variable $\xi_i(\mathbf{w}, b)$ is zero for +1 patterns that are classified correctly by hyperplane $H_2$ ($\mathbf{w} \cdot \mathbf{x} + b \ge +1$) and for −1 patterns that are classified correctly by hyperplane $H_1$ ($\mathbf{w} \cdot \mathbf{x} + b \le -1$). Otherwise, the slack variable has a positive value that measures the distance between a pattern $\mathbf{x}_i$ and the corresponding hyperplane $\mathbf{w} \cdot \mathbf{x} + b = y_i$. For +1 patterns situated in the buffer zone between H and $H_2$, and for −1 patterns situated in the buffer zone between H and $H_1$, the slack variable takes values between 0 and 1. Such patterns are not considered to be misclassified, but they have a penalty added to the objective function. If a pattern $\mathbf{x}_i$ is located in the "forbidden" region of the classifier, then $\xi_i(\mathbf{w}, b) > 1$ (see the patterns in square boxes from Figure 27) and the pattern is considered to be misclassified. We can combine Eqs. [34] and [35] for slack variables of +1 and −1 patterns into Eq. [36]:

$$\xi_i(\mathbf{w}, b) = \begin{cases} 0 & \text{if } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge +1 \\ 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b) & \text{if } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) < +1 \end{cases} \qquad [36]$$
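The two branches of Eq. [36] collapse into a single expression, $\xi_i = \max(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$. The sketch below evaluates it for a hypothetical classifier (w = (1, 0), b = 0) and hand-picked patterns that cover the three cases described in the text; the function name is illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, y):
    """Eq. [36]: zero outside the margin, 1 - y (w . x + b) otherwise."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


# Hypothetical classifier: border hyperplanes at x1 = +1 and x1 = -1.
w, b = (1.0, 0.0), 0.0

examples = [
    ((2.0, 0.0), +1),   # beyond H2: no penalty
    ((0.5, 0.0), +1),   # buffer zone between H and H2: 0 < slack < 1
    ((-0.5, 0.0), +1),  # wrong side of H: slack > 1, misclassified
    ((-0.5, 0.0), -1),  # buffer zone between H and H1: 0 < slack < 1
]
for x, y in examples:
    print(x, y, slack(w, b, x, y))
```

A slack value above 1 is therefore a direct indicator of a misclassified training pattern, whereas a value between 0 and 1 flags a correctly classified pattern that sits inside the margin.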

When slack variables are introduced to penalize misclassified patterns or patterns situated in the buffer region between H and the corresponding border hyperplanes ($H_1$ or $H_2$), the constraints imposed on the objective function are as follows:

$$\begin{cases} \mathbf{w} \cdot \mathbf{x}_i + b \ge +1 - \xi_i & \text{if } y_i = +1 \\ \mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i & \text{if } y_i = -1 \\ \xi_i \ge 0, & \forall i \end{cases} \qquad [37]$$

The identification of an OSH is much more difficult when slack variables are used, because the optimum classifier is a compromise between two opposing conditions. On the one hand, a good SVMC corresponds to a hyperplane (w, b) with a margin as large as possible in order to guarantee good prediction performances, which translates into minimizing $\|\mathbf{w}\|^2/2$. On the other hand, the optimum hyperplane should minimize the number of classification errors and it should also minimize the error of misclassified patterns, which translates into minimizing the number of positive slack variables and simultaneously minimizing the value of each slack variable. The latter condition has the tendency of decreasing the width of the SVMC hyperplane, which is in contradiction with the former condition. A simple way to combine these two conditions and to assign a penalty for classification errors is to change the objective function to be minimized from $\|\mathbf{w}\|^2/2$ to

$$\frac{\|\mathbf{w}\|^2}{2} + C \left( \sum_{i=1}^{m} \xi_i \right)^k \qquad [38]$$

where C is a parameter that can be adjusted by the user, and can either increase or decrease the penalty for classification errors. A large C assigns a higher penalty to classification errors, thus minimizing the number of misclassified patterns. A small C maximizes the margin so that the OSH is less sensitive to the errors from the learning set. Equation [38] is a convex programming problem for any positive integer k, which for k = 1 and k = 2 is also a quadratic programming problem. The formula with k = 1 has the advantage that neither $\xi_i$ nor their Lagrange multipliers appear in the Wolfe dual problem.40

Based on the above considerations, we are now ready to state the form of the optimization problem for SVMC with a linear classifier and classification errors:

$$\begin{aligned} &\text{minimize } \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{m} \xi_i \\ &\text{with the constraints } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge +1 - \xi_i, \quad i = 1, \ldots, m \\ &\phantom{\text{with the constraints }} \xi_i \ge 0, \quad i = 1, \ldots, m \end{aligned} \qquad [39]$$
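The role of C in Eq. [39] can be illustrated by comparing two candidate hyperplanes on a small nonseparable-looking dataset. Everything in this sketch is hypothetical: the five patterns (the last one is a +1 pattern lying close to the −1 cluster), the two hand-picked candidates, and the two values of C. The slack of each pattern is taken as the smallest value allowed by the constraints, i.e., Eq. [36].

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_margin_objective(w, b, X, y, C):
    """Objective of Eq. [39] with each slack set to its minimal feasible value."""
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks)


# Hypothetical patterns; the last +1 pattern sits next to the -1 cluster.
X = [(2.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (-2.0, 0.0), (-0.5, 0.0)]
y = [+1, +1, -1, -1, +1]

wide = ((1.0, 0.0), 0.0)    # wide margin, misclassifies the last pattern
narrow = ((4.0, 0.0), 3.0)  # narrow margin, classifies every pattern

for C in (1.0, 100.0):
    print(C, soft_margin_objective(*wide, X, y, C), soft_margin_objective(*narrow, X, y, C))
```

With the small C the wide-margin hyperplane has the lower objective despite its training error, whereas with the large C the narrow, error-free hyperplane is preferred.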

To solve the above constrained quadratic optimization problem, we follow the approach based on Lagrange multipliers (Eq. [21]). We define the Lagrange multipliers $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ for each constraint $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge +1 - \xi_i$ and the Lagrange multipliers $M = (\mu_1, \mu_2, \ldots, \mu_m)$ for each constraint $\xi_i \ge 0$, $\forall\, i = 1, \ldots, m$. With these notations, the primal Lagrangian function of this problem is

$$L_P(\mathbf{w}, b, \Lambda, M) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{m} \mu_i \xi_i \qquad [40]$$

where $\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)$ is the set of Lagrange multipliers of the training (calibration) patterns.

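The Karush–Kuhn–Tucker conditions40 for the primal problem are as follows:

Gradient Conditions

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda, M)}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i = 0, \quad \text{where } \frac{\partial L_P(\mathbf{w}, b, \Lambda, M)}{\partial \mathbf{w}} = \left( \frac{\partial L}{\partial w_1}, \frac{\partial L}{\partial w_2}, \ldots, \frac{\partial L}{\partial w_n} \right) \qquad [41]$$

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda, M)}{\partial b} = -\sum_{i=1}^{m} \lambda_i y_i = 0 \qquad [42]$$

$$\frac{\partial L_P(\mathbf{w}, b, \Lambda, M)}{\partial \xi_i} = C - \lambda_i - \mu_i = 0 \qquad [43]$$

Orthogonality Condition

$$\lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right] = 0, \quad i = 1, \ldots, m \qquad [44]$$

Feasibility Condition

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \ge 0, \quad i = 1, \ldots, m \qquad [45]$$

Non-negativity Condition

$$\begin{aligned} \xi_i &\ge 0, \quad i = 1, \ldots, m \\ \lambda_i &\ge 0, \quad i = 1, \ldots, m \\ \mu_i &\ge 0, \quad i = 1, \ldots, m \\ \mu_i \xi_i &= 0, \quad i = 1, \ldots, m \end{aligned} \qquad [46]$$

We now substitute Eqs. [41] and [42] into the right side of the Lagrangian function, obtaining the dual problem

$$\begin{aligned} &\text{maximize } L_D(\mathbf{w}, b, \Lambda, M) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \mathbf{x}_i \cdot \mathbf{x}_j \\ &\text{subject to } 0 \le \lambda_i \le C, \quad i = 1, \ldots, m \\ &\text{and } \sum_{i=1}^{m} \lambda_i y_i = 0 \end{aligned} \qquad [47]$$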


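The solution for the vector w is obtained from Eq. [41], which represents one of the KKT conditions:

$$\mathbf{w} = \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \qquad [48]$$

The value of b can be computed as an average for the b values obtained from all training patterns with the following KKT conditions:

$$\begin{aligned} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right] &= 0 \\ (C - \lambda_i)\, \xi_i &= 0 \end{aligned} \qquad [49]$$

From the above equations, we have also that $\xi_i = 0$ if $\lambda_i < C$. Therefore, b can be averaged only for those patterns that have $0 < \lambda_i < C$, for which the first condition reduces to $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$.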

We will now examine the relationships between the position of a pattern $\mathbf{x}_i$ and the corresponding values for $\lambda_i$, $\xi_i$, and C. The following situations can be distinguished:

1. ($\lambda_i = 0$; $\xi_i = 0$): The pattern is inside the +1 region ($\mathbf{w} \cdot \mathbf{x}_i + b > +1$) if $y_i = +1$ or inside the −1 region ($\mathbf{w} \cdot \mathbf{x}_i + b < -1$) if $y_i = -1$, i.e., it is correctly classified, and its distance from the separating hyperplane is larger than $1/\|\mathbf{w}\|$. Such patterns are not important in defining the SVMC model, and they do not influence the solution. Hence, they can be deleted from the learning set without affecting the model.

2. ($0 < \lambda_i < C$; $\xi_i = 0$): This situation corresponds to correctly classified patterns situated on the hyperplanes that border the SVMC OSH, i.e., patterns +1 are situated on the hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b = +1$), whereas patterns −1 are situated on the hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b = -1$). The distance between these patterns and the separating hyperplane is $1/\|\mathbf{w}\|$. Such a pattern is called a margin support vector.

3. ($\lambda_i = C$; $0 < \xi_i \le 1$): These patterns, correctly classified, are called bound support vectors, and their distance to the separating hyperplane is smaller than $1/\|\mathbf{w}\|$. Patterns from the class +1 are situated in the buffer zone between the separating hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b = 0$) and the border hyperplane toward the +1 region ($\mathbf{w} \cdot \mathbf{x}_i + b = +1$). Patterns from the class −1 are situated in the buffer zone between the separating hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b = 0$) and the border hyperplane toward the −1 region ($\mathbf{w} \cdot \mathbf{x}_i + b = -1$).

4. ($\lambda_i = C$; $\xi_i > 1$): These patterns are incorrectly classified. Patterns from the class +1 are situated in the −1 region defined by the separating hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b < 0$), whereas patterns from the class −1 are situated in the +1 region of the separating hyperplane ($\mathbf{w} \cdot \mathbf{x}_i + b > 0$).
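The four situations can be expressed as a small decision function. This is a sketch only: the function name, the returned labels, and the numerical tolerance `tol` (needed because multipliers from a numerical solver are rarely exactly 0 or C) are assumptions of this example. The degenerate case $\lambda_i = C$ with $\xi_i = 0$, which the list above does not cover, is grouped here with situation 3.

```python
def pattern_role(lam, xi, C, tol=1e-9):
    """Classify a training pattern from its multiplier lam and slack xi."""
    if lam <= tol:
        return "situation 1: not a support vector"
    if lam < C - tol:
        return "situation 2: margin support vector"
    if xi <= 1.0:
        return "situation 3: bound support vector, correctly classified"
    return "situation 4: bound support vector, misclassified"


# Hypothetical (lambda_i, xi_i) pairs for a model trained with C = 10.
C = 10.0
for lam, xi in [(0.0, 0.0), (3.2, 0.0), (10.0, 0.4), (10.0, 1.7)]:
    print(lam, xi, pattern_role(lam, xi, C))
```

Only the patterns in situations 2 to 4 enter the sums that define w and the prediction function; those in situation 2 are the ones used to compute b.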


The classification of new patterns uses the optimum values for w (Eq. [48]) and b (Eq. [49]):

$$\mathrm{class}(\mathbf{x}_k) = \mathrm{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \mathbf{x}_i \cdot \mathbf{x}_k + b \right) \qquad [50]$$

Equation [50] depends only on the support vectors and their Lagrange multipliers, and the optimum value for b, showing that one does not need to compute w explicitly in order to predict the classification of new patterns.

NONLINEAR SUPPORT VECTOR MACHINES

In previous sections, we introduced the linear SVM classification algorithm, which uses the training patterns to generate an optimum separation hyperplane. Such classifiers are not adequate for cases when complex relationships exist between input parameters and the class of a pattern. To discriminate linearly nonseparable classes of patterns, the SVM model can be fitted with nonlinear functions to provide efficient classifiers for hard-to-separate classes of patterns.

Mapping Patterns to a Feature Space

The separation surface may be nonlinear in many classification problems, but support vector machines can be extended to handle nonlinear separation surfaces by using feature functions $\phi(\mathbf{x})$. The SVM extension to nonlinear datasets is based on mapping the input variables into a feature space of a higher dimension (a Hilbert space of finite or infinite dimension) and then performing a linear classification in that higher dimensional space. For example, consider the set of nonlinearly separable patterns in Figure 28, left. It is clear that a linear classifier, even with slack variables, is not appropriate for this type of separation surface, which is obviously nonlinear. The nonlinear feature functions $\phi$ transform and combine the original coordinates of the patterns and perform their mapping into a high-dimensional space (Figure 28, right) where the two classes can be separated with a linear classifier. This property is of value because linear classifiers are easy to compute, and we can use the results obtained for linear SVM classification from the previous sections. The only difficulty is to identify, for a particular dataset, the correct set of nonlinear functions that can perform such mapping.

Figure 28 Linear separation of patterns in feature space. (Left panel: input space; right panel: feature space, reached through the mapping $\phi$.)

Consider a training set $T$ of $m$ patterns together with their classes, $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $x$ is an $n$-dimensional pattern, $x = (x_1, x_2, \ldots, x_n)$. Define the set of feature functions as $\phi_1, \phi_2, \ldots, \phi_h$. Any pattern $x$ is mapped to a real vector $\phi(x)$:

$$x = (x_1, x_2, \ldots, x_n) \rightarrow \phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_h(x)) \qquad [51]$$

After mapping all patterns from the learning set into the feature set, we obtain a set of points in the feature space $R^h$:

$$\phi(T) = \{(\phi(x_1), y_1), (\phi(x_2), y_2), \ldots, (\phi(x_m), y_m)\} \qquad [52]$$

The important property of the feature space is that the learning set $\phi(T)$ might be linearly separable in the feature space if the appropriate feature functions are used, even when the learning set is not linearly separable in the original space.

We consider a soft margin SVM in which the variables $x$ are substituted with the feature vector $\phi(x)$, which represents an optimization problem similar to that from Eq. [39]. Using this nonlinear SVM, the class of a pattern $x_k$ is determined with Eq. [53].

$$\mathrm{class}(x_k) = \mathrm{sign}[w \cdot \phi(x_k) + b] = \mathrm{sign}\left( \sum_{i=1}^{m} \lambda_i y_i \, \phi(x_i) \cdot \phi(x_k) + b \right) \qquad [53]$$

The nonlinear classifier defined by Eq. [53] shows that to predict the class of a pattern $x_k$, it is necessary to compute the dot product $\phi(x_i) \cdot \phi(x_k)$ for all support vectors $x_i$. This property of the nonlinear classifier is very important, because it shows that we do not need to know the actual expression of the feature function $\phi$. Moreover, a special class of functions, called kernels, allows the computation of the dot product $\phi(x_i) \cdot \phi(x_k)$ in the original space defined by the training patterns.

We present now a simple example of linearly nonseparable classes that can become linearly separable in feature space. Consider the dataset from Table 4 and Figure 29. This two-dimensional dataset, with dimensions $x_1$ and $x_2$, consists of three patterns in class +1 and six patterns in class −1. From Figure 29, it is easy to deduce that there is no straight line that can separate these two classes.

On the other hand, one can imagine a higher dimensional feature space in which these classes become linearly separable. The features are combinations of the input data, and for this example, we add $x_1^2$ as a new dimension (Table 4, column 4). After this transformation, the dataset is represented in a three-dimensional feature space.

The surface $f(x_1, x_2) = x_1^2$ is represented in Figure 30. By adding this simple feature, we have mapped the patterns onto a nonlinear surface. This is easily seen when we plot (Figure 31) the feature space points $(x_1, x_2, x_1^2)$ that are located on the surface from Figure 30.

The feature $x_1^2$ has an interesting property, as one can see by inspecting Table 4: all patterns from class +1 have $x_1^2 = 0$, whereas all patterns from class −1 have $x_1^2 = +1$. By mapping the patterns in the feature space, we are now able to separate the two classes with a linear classifier, i.e., a plane (Figure 32). Of course, this plane is not unique, and in fact, there is an infinite number of planes that can now discriminate the two classes.

Figure 29 Linearly nonseparable two-dimensional patterns.

Table 4 Linearly Nonseparable Patterns that Can be Separated in a Feature Space

Pattern   $x_1$   $x_2$   $x_1^2$   Class
1         −1      −1      +1        −1
2         −1       0      +1        −1
3         −1      +1      +1        −1
4          0      −1       0        +1
5          0       0       0        +1
6          0      +1       0        +1
7         +1      −1      +1        −1
8         +1       0      +1        −1
9         +1      +1      +1        −1


The intersection between the feature space and the classifier defines the decision boundaries, which, when projected back onto the original space, look like Figure 33. Thus, transforming the input data into a nonlinear feature space makes the patterns linearly separable. Unfortunately, for a given dataset, one cannot predict which feature functions will make the patterns linearly separable; finding good feature functions is thus a trial-and-error process.
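The Table 4 example can be checked with a few lines of Python. The sketch below maps each pattern to the feature space $(x_1, x_2, x_1^2)$ and classifies it with one particular separating plane, $x_1^2 = 0.5$. The choice of this plane (the values of `w` and `b`) and the function names are assumptions made for illustration; as noted above, infinitely many planes would work equally well.

```python
# Table 4 patterns as (x1, x2, class)
patterns = [
    (-1, -1, -1), (-1, 0, -1), (-1, 1, -1),
    (0, -1, 1), (0, 0, 1), (0, 1, 1),
    (1, -1, -1), (1, 0, -1), (1, 1, -1),
]


def phi(x1, x2):
    """Map an input pattern to the feature space (x1, x2, x1^2)."""
    return (x1, x2, x1 * x1)


# Assumed separating plane in feature space: -x1^2 + 0.5 = 0.
# Patterns of class +1 (x1^2 = 0) lie below it, class -1 (x1^2 = 1) above it.
w = (0.0, 0.0, -1.0)
b = 0.5


def classify(x1, x2):
    """Linear classifier sign(w . phi(x) + b) applied in feature space."""
    z = phi(x1, x2)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1


predictions = [classify(x1, x2) for x1, x2, _ in patterns]
labels = [y for _, _, y in patterns]
all_correct = predictions == labels
```

Projected back onto the input space, this plane corresponds to the two vertical lines $x_1 = \pm\sqrt{0.5}$, which is a nonlinear decision boundary in the original coordinates.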

Feature Functions and Kernels

The idea of transforming the input space into a feature space of a higher dimension by using feature functions $\phi(x)$ and then performing a linear classification in that higher dimensional space is central to support vector machines. However, the feature space may have a very high dimensionality, even infinite. An obvious consequence is that we want to avoid computing explicitly the inner product of feature functions $\phi(x)$ that appears in Eq. [53]. Fortunately, a method was developed to generate a mapping into a high-dimensional feature space with kernels. The rationale that prompted the use of kernel functions is to enable computations to be performed in the original input space rather than the high-dimensional (even infinite) feature space. Using this approach, the SVM algorithm avoids the evaluation of the inner product of the feature functions.

Figure 30 Surface $f(x, y) = x^2$.

Figure 31 Feature space points $(x, y, x^2)$.

Figure 32 A separation plane for +1 patterns (below the plane) and −1 patterns (above the plane).

Figure 33 Projection of the separation plane.

Under certain conditions, an inner product in feature space has an equivalent kernel in input space:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \qquad [54]$$

If the kernel $K$ is a symmetric positive definite function, which satisfies Mercer's conditions:4,42

$$K(x_i, x_j) = \sum_{k}^{\infty} a_k \phi_k(x_i) \phi_k(x_j), \quad a_k \ge 0 \qquad [55]$$

and

$$\iint K(x_i, x_j) \, g(x_i) \, g(x_j) \, dx_i \, dx_j > 0 \qquad [56]$$

then the kernel represents an inner product in feature space.

Consider the two-dimensional pattern $x = (x_1, x_2)$ and the feature function defined for a two-dimensional pattern $x$:

$$\phi(x) = \left( 1, \sqrt{2}\, x_1, \sqrt{2}\, x_2, x_1^2, x_2^2, \sqrt{2}\, x_1 x_2 \right) \qquad [57]$$

From the expression of this feature function, it is easy to obtain the corresponding kernel function

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = (1 + x_i \cdot x_j)^2 \qquad [58]$$

This example can be easily extended to a three-dimensional pattern $x = (x_1, x_2, x_3)$, when the feature function has the expression

$$\phi(x) = \left( 1, \sqrt{2}\, x_1, \sqrt{2}\, x_2, \sqrt{2}\, x_3, x_1^2, x_2^2, x_3^2, \sqrt{2}\, x_1 x_2, \sqrt{2}\, x_1 x_3, \sqrt{2}\, x_2 x_3 \right) \qquad [59]$$

which corresponds to the polynomial of degree two kernel from Eq. [58]. In a similar way, a two-dimensional pattern $x = (x_1, x_2)$ and a feature function

$$\phi(x) = \left( 1, \sqrt{3}\, x_1, \sqrt{3}\, x_2, \sqrt{3}\, x_1^2, \sqrt{3}\, x_2^2, \sqrt{6}\, x_1 x_2, \sqrt{3}\, x_1^2 x_2, \sqrt{3}\, x_1 x_2^2, x_1^3, x_2^3 \right) \qquad [60]$$

is equivalent to a polynomial of degree three kernel

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = (1 + x_i \cdot x_j)^3 \qquad [61]$$
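The equivalence between the explicit feature functions and the polynomial kernels can be verified numerically. The following sketch evaluates both sides of Eqs. [58] and [61] for a pair of two-dimensional patterns. The two test patterns and all function names are arbitrary choices made for illustration.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi2(x):
    """Degree-2 feature function of Eq. [57] for a two-dimensional pattern."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2)


def phi3(x):
    """Degree-3 feature function of Eq. [60] for a two-dimensional pattern."""
    x1, x2 = x
    r3, r6 = math.sqrt(3.0), math.sqrt(6.0)
    return (1.0, r3 * x1, r3 * x2, r3 * x1 ** 2, r3 * x2 ** 2, r6 * x1 * x2,
            r3 * x1 ** 2 * x2, r3 * x1 * x2 ** 2, x1 ** 3, x2 ** 3)


def poly_kernel(xi, xj, d):
    """Polynomial kernel (1 + xi . xj)^d, computed in input space only."""
    return (1.0 + dot(xi, xj)) ** d


xi, xj = (0.5, -1.2), (2.0, 0.3)   # arbitrary test patterns

explicit2 = dot(phi2(xi), phi2(xj))   # 6-dimensional feature space
implicit2 = poly_kernel(xi, xj, 2)    # same value from the 2-dimensional input space
explicit3 = dot(phi3(xi), phi3(xj))   # 10-dimensional feature space
implicit3 = poly_kernel(xi, xj, 3)
```

The kernel evaluation needs one dot product in the input space, whereas the explicit route requires building feature vectors whose length grows quickly with the polynomial degree and the input dimension.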


We will now present an example of an infinite dimension feature function with the expression

$$\phi(x) = \left( \sin(x), \frac{1}{\sqrt{2}} \sin(2x), \frac{1}{\sqrt{3}} \sin(3x), \frac{1}{\sqrt{4}} \sin(4x), \ldots, \frac{1}{\sqrt{n}} \sin(nx), \ldots \right) \qquad [62]$$

where $x \in [1, \pi]$. The kernel corresponding to this infinite series has a very simple expression, which can be easily calculated as follows:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = \sum_{n=1}^{\infty} \frac{1}{n} \sin(n x_i) \sin(n x_j) = \frac{1}{2} \log \left| \frac{\sin\left( \frac{x_i + x_j}{2} \right)}{\sin\left( \frac{x_i - x_j}{2} \right)} \right| \qquad [63]$$
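As a numerical sanity check of Eq. [63], the sketch below compares a long partial sum of the series with the closed-form expression. The two pattern values and the number of terms are arbitrary assumptions; the series converges slowly, so the agreement is only approximate for any finite number of terms.

```python
import math


def series_kernel(xi, xj, n_terms):
    """Partial sum of sum_{n>=1} sin(n*xi) * sin(n*xj) / n."""
    return sum(math.sin(n * xi) * math.sin(n * xj) / n
               for n in range(1, n_terms + 1))


def closed_form_kernel(xi, xj):
    """Right-hand side of Eq. [63]; undefined for xi == xj."""
    return 0.5 * math.log(abs(math.sin((xi + xj) / 2.0) /
                              math.sin((xi - xj) / 2.0)))


xi, xj = 1.0, 2.5                       # two distinct patterns in the stated interval
approx = series_kernel(xi, xj, 200_000)  # truncated infinite-dimensional dot product
exact = closed_form_kernel(xi, xj)       # kernel evaluated in input space
```

Note that the closed form diverges when $x_i = x_j$, so this kernel is only evaluated here for distinct patterns.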

Kernel Functions for SVM

In this section, we present the most used SVM kernels. As these functions are usually computed in a high-dimensional space and have a nonlinear character, it is not easy to form an impression of the shape of the classification hyperplane generated by these kernels. Therefore, we will present several plots for SVM models obtained for the dataset shown in Table 5. This dataset is not separable with a linear classifier, but the two clusters can be clearly distinguished.

Linear (Dot) Kernel

The inner product of $x_i$ and $x_j$ defines the linear (dot) kernel:

$$K(x_i, x_j) = x_i \cdot x_j \qquad [64]$$

This is a linear classifier, and it should be used as a test of the nonlinearity in the training set, as well as a reference for the eventual classification improvement obtained with nonlinear kernels.

Table 5 Linearly Nonseparable Patterns Used for the SVM Classification Models in Figures 34–38

Pattern   $x_1$   $x_2$   Class
1         2       4       +1
2         2.5     2.75    +1
3         3       5       +1
4         3.5     2       +1
5         4.5     4.75    +1
6         5       3.75    +1
7         3.25    4       +1
8         4       3.25    +1
9         0.6     4.5     −1
10        1       3       −1
11        1.5     1       −1
12        2       5.7     −1
13        3.5     5.5     −1
14        4       0.6     −1
15        5       1.5     −1
16        5.3     5.4     −1
17        5.75    3       −1


Polynomial Kernel

The polynomial kernel is a simple and efficient method for modeling nonlinear relationships:

$$K(x_i, x_j) = (1 + x_i \cdot x_j)^d \qquad [65]$$

The dataset from Table 5 can be separated easily with a polynomial kernel (Figure 34a, polynomial of degree 2). The downside of using polynomial kernels is the overfitting that might appear when the degree increases (Figure 34b, degree 3; Figure 35a, degree 5; Figure 35b, degree 10). As the degree of the polynomial increases, the classification surface becomes more complex. For the degree 10 polynomial, one can see that the border hypersurface defines two regions for the cluster of +1 patterns.

Figure 34 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 5: (a) polynomial of degree 2; (b) polynomial of degree 3.

Figure 35 SVM classification models obtained with the polynomial kernel (Eq. [65]) for the dataset from Table 5: (a) polynomial of degree 5; (b) polynomial of degree 10.


Gaussian Radial Basis Function Kernel

Radial basis functions (RBF) are widely used kernels, usually in the Gaussian form:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right) \qquad [66]$$

The parameter $\sigma$ controls the shape of the separating hyperplane, as one can see from the two SVM models in Figure 36, both obtained with a Gaussian RBF kernel (a, $\sigma = 1$; b, $\sigma = 10$). The number of support vectors increases from 6 to 17, showing that the second setting does not generalize well. In practical applications, the parameter $\sigma$ should be optimized with a suitable cross-validation procedure.

Exponential Radial Basis Function Kernel

If discontinuities in the hyperplane are acceptable, an exponential RBF kernel is worth trying:

$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|}{2\sigma^2} \right) \qquad [67]$$

The form of the OSH obtained for this kernel is apparent in Figure 37, where two values for the parameter $\sigma$ are exemplified (a, $\sigma = 0.5$; b, $\sigma = 2$). For the particular dataset used here, this kernel is not a good choice, because it requires too many support vectors.
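The kernels of Eqs. [64]–[67] are straightforward to compute for a whole dataset at once. The sketch below, which assumes NumPy is available, builds the kernel (Gram) matrices for the 17 patterns of Table 5. The function names and the parameter values passed at the end are illustrative assumptions, not settings taken from the figures.

```python
import numpy as np

# Table 5 patterns (x1, x2): rows 0-7 are class +1, rows 8-16 are class -1.
X = np.array([
    [2, 4], [2.5, 2.75], [3, 5], [3.5, 2], [4.5, 4.75], [5, 3.75],
    [3.25, 4], [4, 3.25],
    [0.6, 4.5], [1, 3], [1.5, 1], [2, 5.7], [3.5, 5.5], [4, 0.6],
    [5, 1.5], [5.3, 5.4], [5.75, 3],
])


def linear_kernel(A, B):
    """Eq. [64]: matrix of dot products between rows of A and rows of B."""
    return A @ B.T


def polynomial_kernel(A, B, d=2):
    """Eq. [65]: (1 + xi . xj)^d."""
    return (1.0 + A @ B.T) ** d


def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_rbf_kernel(A, B, sigma=1.0):
    """Eq. [66]: exp(-||xi - xj||^2 / (2 sigma^2))."""
    return np.exp(-pairwise_dist(A, B) ** 2 / (2.0 * sigma ** 2))


def exponential_rbf_kernel(A, B, sigma=1.0):
    """Eq. [67]: exp(-||xi - xj|| / (2 sigma^2))."""
    return np.exp(-pairwise_dist(A, B) / (2.0 * sigma ** 2))


def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix; non-negative for a valid kernel."""
    return np.linalg.eigvalsh(K).min()


K_gauss = gaussian_rbf_kernel(X, X, sigma=1.0)
K_exp = exponential_rbf_kernel(X, X, sigma=0.5)
```

A kernel matrix computed this way is what an SVM solver consumes in place of explicit feature vectors; checking that its smallest eigenvalue is not negative (up to rounding) is a quick test that a candidate function behaves as a valid kernel on the data at hand.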

Figure 36 SVM classification models obtained with the Gaussian radial basis function kernel (Eq. [66]) for the dataset from Table 5: (a) $\sigma = 1$; (b) $\sigma = 10$.


Neural (Sigmoid, Tanh) Kernel

The hyperbolic tangent (tanh) function, with a sigmoid shape, is the most used transfer function for artificial neural networks. The corresponding kernel has the formula:

$$K(x_i, x_j) = \tanh(a \, x_i \cdot x_j + b) \qquad [68]$$

Anova Kernel

A useful function is the anova kernel, whose shape is controlled by the parameters $\gamma$ and $d$:

$$K(x_i, x_j) = \left( \sum_{i} \exp(-\gamma (x_i - x_j)) \right)^d \qquad [69]$$

Fourier Series Kernel

A Fourier series kernel, on the interval $[-\pi/2, +\pi/2]$, is defined by

$$K(x_i, x_j) = \frac{\sin\left( (N + \tfrac{1}{2})(x_i - x_j) \right)}{\sin\left( \tfrac{1}{2}(x_i - x_j) \right)} \qquad [70]$$

Spline Kernel

The spline kernel of order $k$ having $N$ knots located at $t_s$ is defined by

$$K(x_i, x_j) = \sum_{r=0}^{k} x_i^r x_j^r + \sum_{s=1}^{N} (x_i - t_s)_+^k (x_j - t_s)_+^k \qquad [71]$$

Figure 37 SVM classification models obtained with the exponential radial basis function kernel (Eq. [67]) for the dataset from Table 5: (a) $\sigma = 0.5$; (b) $\sigma = 2$.


B Spline Kernel

The B spline kernel is defined on the interval [−1, 1] by the formula:

$$K(x_i, x_j) = B_{2N+1}(x_i - x_j) \qquad [72]$$

Both spline kernels have a remarkable flexibility in modeling difficult data. This characteristic is not always useful, especially when the classes can be separated with simple nonlinear functions. The SVM models from Figure 38 (a, spline; b, B spline, degree 1) show that the B spline kernel overfits the data and generates a border hyperplane that has three disjoint regions.

Additive Kernel

An interesting property of kernels is that one can combine several kernels by summing them. The result of this summation is a valid kernel function:

$$K(x_i, x_j) = \sum_{i} K_i(x_i, x_j) \qquad [73]$$

Tensor Product Kernel

The tensor product of two or more kernels is also a kernel function:

$$K(x_i, x_j) = \prod_{i} K_i(x_i, x_j) \qquad [74]$$

In many SVM packages, these properties presented in Eqs. [73] and [74] allow the user to combine different kernels in order to generate custom kernels more suitable for particular applications.
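The closure properties in Eqs. [73] and [74] can be illustrated numerically: the sum and the elementwise product of two kernel matrices remain symmetric and positive semidefinite. The sketch below assumes NumPy and uses synthetic random patterns; the helper names, the random seed, and the tolerance are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(12, 2))   # synthetic patterns, for illustration only


def polynomial_gram(X, d=2):
    """Gram matrix of the polynomial kernel (1 + xi . xj)^d."""
    return (1.0 + X @ X.T) ** d


def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian RBF kernel."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def is_psd(K, tol=1e-8):
    """True if all eigenvalues are non-negative up to a relative tolerance."""
    eig = np.linalg.eigvalsh(K)
    return eig.min() >= -tol * max(1.0, eig.max())


K1, K2 = polynomial_gram(X), gaussian_gram(X)
K_sum = K1 + K2     # additive kernel, Eq. [73]
K_prod = K1 * K2    # product kernel, Eq. [74]: K1(xi, xj) * K2(xi, xj) entry by entry
```

A numerical check on one dataset does not prove that a combination is a valid kernel in general, but it is a cheap way to catch a custom kernel that is not.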

Figure 38 SVM classification models for the dataset from Table 5: (a) spline kernel, Eq. [71]; (b) B spline kernel, degree 1, Eq. [72].


Hard Margin Nonlinear SVM Classification

In Figure 39, we present the network structure of a support vector machine classifier. The input layer is represented by the support vectors $x_1, \ldots, x_n$ and the test (prediction) pattern $x_t$, which are transformed by the feature function $\phi$ and mapped into the feature space. The next layer performs the dot product between the test pattern $\phi(x_t)$ and each support vector $\phi(x_i)$. The dot product of feature functions is then multiplied with the Lagrange multipliers, and the output is the nonlinear classifier from Eq. [53] in which the dot product of feature functions was substituted with a kernel function.

The mathematical formulation of the hard margin nonlinear SVM classification is similar to that presented for the SVM classification for linearly separable datasets, only now input patterns $x$ are replaced with feature functions, $x \rightarrow \phi(x)$, and the dot product for two feature functions $\phi(x_i) \cdot \phi(x_j)$ is replaced with a kernel function $K(x_i, x_j)$, Eq. [54]. Analogously with Eq. [28], the dual problem is

$$
\begin{aligned}
\text{maximize} \quad & L_D(\Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \\
\text{subject to} \quad & \lambda_i \ge 0, \quad i = 1, \ldots, m \\
\text{and} \quad & \sum_{i=1}^{m} \lambda_i y_i = 0
\end{aligned}
\qquad [75]
$$

The vector $w$ that determines the optimum separation hyperplane is

$$w = \sum_{i=1}^{m} \lambda_i y_i \phi(x_i) \qquad [76]$$

Figure 39 Structure of support vector machines. The test pattern $x_t$ and the support vectors $x_1, \ldots, x_n$ are mapped into a feature space with the nonlinear function $\phi$, and the dot products are computed.


As with the derivation of $b$ in Eq. [30], we have

$$\sum_{i=1}^{m} \lambda_i y_i K(x_i, x_j) + b = y_j \qquad [77]$$

Therefore, the threshold $b$ can be obtained by averaging the $b$ values obtained for all support vector patterns, i.e., the patterns with $\lambda_j > 0$:

$$b = y_j - \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_j) \qquad [78]$$

The SVM classifier obtained with a kernel $K$ is defined by the support vectors from the training set ($\lambda_i > 0$) and the corresponding values of the Lagrange multipliers $\lambda_i$:

$$\mathrm{class}(x_k) = \mathrm{sign}\left( \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_k) + b \right) \qquad [79]$$
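Equations [77]–[79] can be followed on a toy problem that is small enough to solve by hand. The sketch below assumes a one-dimensional training set with a single pattern per class and a Gaussian kernel; for this case the dual problem of Eq. [75] has the closed-form solution $\lambda_1 = \lambda_2 = 1/(1 - K(x_1, x_2))$. The dataset, the kernel width, and the function names are assumptions made for illustration, not an example from the text.

```python
import math


def gaussian_kernel(xi, xj, sigma=1.0):
    """Gaussian RBF kernel for one-dimensional patterns."""
    return math.exp(-(xi - xj) ** 2 / (2.0 * sigma ** 2))


# Toy training set (assumed): one pattern per class.
x_train = [1.0, -1.0]
y_train = [+1, -1]

# The constraint sum_i lambda_i y_i = 0 gives lambda_1 = lambda_2 = lam, and
# L_D = 2*lam - lam^2 * (1 - k12) is maximal at lam = 1 / (1 - k12).
k12 = gaussian_kernel(x_train[0], x_train[1])
lam = [1.0 / (1.0 - k12)] * 2


def threshold(lam, x_train, y_train, kernel):
    """Average of Eq. [78] over all support vectors (lambda_j > 0)."""
    values = []
    for j, (xj, yj) in enumerate(zip(x_train, y_train)):
        if lam[j] > 0:
            s = sum(l * y * kernel(x, xj)
                    for l, y, x in zip(lam, y_train, x_train))
            values.append(yj - s)
    return sum(values) / len(values)


def decision_value(xk, lam, x_train, y_train, b, kernel):
    """Kernel expansion sum_i lambda_i y_i K(x_i, x_k) + b."""
    return sum(l * y * kernel(x, xk)
               for l, y, x in zip(lam, y_train, x_train)) + b


b = threshold(lam, x_train, y_train, gaussian_kernel)


def classify(xk):
    """Eq. [79]: sign of the decision value."""
    value = decision_value(xk, lam, x_train, y_train, b, gaussian_kernel)
    return 1 if value >= 0 else -1
```

In this symmetric toy case the threshold is zero and both training patterns sit exactly on their border hyperplanes, so the decision value at each of them equals its class label, as Eq. [77] requires.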

Soft Margin Nonlinear SVM Classification

A soft margin nonlinear SVM classifier is obtained by introducing slack variables $\xi$ and the capacity $C$. As with Eq. [47], the dual problem is

$$
\begin{aligned}
\text{maximize} \quad & L_D(\Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \\
\text{subject to} \quad & 0 \le \lambda_i \le C, \quad i = 1, \ldots, m \\
\text{and} \quad & \sum_{i=1}^{m} \lambda_i y_i = 0
\end{aligned}
\qquad [80]
$$

which defines a classifier identical to the one from Eq. [79].

The capacity parameter $C$ is very important in balancing the penalty for classification errors. It is usually adjusted by the user, or it can be optimized automatically by some SVM packages. The penalty for classification errors increases when the capacity $C$ increases, with the consequence that the number of erroneously classified patterns decreases when $C$ increases. On the other hand, the margin decreases when $C$ increases, making the classifier more sensitive to noise or errors in the training set. Between these divergent requirements (small $C$ for a large margin classifier; large $C$ for a small number of classification errors), an optimum value should be determined, usually by trying to maximize the cross-validation prediction.


To illustrate the influence of the capacity parameter $C$ on the separation hyperplane with the dataset from Table 5 and a polynomial kernel of degree 2, consider Figures 40 (a, $C = 100$; b, $C = 10$) and 41 (a, $C = 1$; b, $C = 0.1$). This example shows that a bad choice for the capacity $C$ can ruin the performance of an otherwise very good classifier. Empirical observations suggest that $C = 100$ is a good value for a wide range of SVM classification problems, but the optimum value should be determined for each particular case.

A similar trend is presented for the SVM models obtained with the spline kernel, presented in Figure 38a ($C$ infinite) and Figure 42 (a, $C = 100$; b, $C = 10$). The classifier from Figure 38a does not allow classification errors, whereas by decreasing the capacity $C$ to 100 (Figure 42a), one −1 pattern is misclassified (indicated with an arrow). A further decrease of $C$ to 10 increases the number of classification errors: one for class +1 and three for class −1.

Figure 40 Influence of the $C$ parameter on the class separation. SVM classification models obtained with the polynomial kernel of degree 2 for the dataset from Table 5: (a) $C = 100$; (b) $C = 10$.

Figure 41 Influence of the $C$ parameter on the class separation. SVM classification models obtained with the polynomial kernel of degree 2 for the dataset from Table 5: (a) $C = 1$; (b) $C = 0.1$.

ν-SVM Classification

Another formulation of support vector machines is the ν-SVM, in which the parameter $C$ is replaced by a parameter $\nu \in [0, 1]$ that is a lower bound on the fraction of training patterns that are support vectors and an upper bound on the fraction of training patterns that are situated on the wrong side of the margin. ν-SVM can be used for both classification and regression, as presented in detail in several reviews, by Schölkopf et al.,43 Chang and Lin,44,45 Steinwart,46 and Chen, Lin, and Schölkopf.47

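The optimization problem for the ν-SVM classification is

$$
\begin{aligned}
\text{minimize} \quad & \frac{\|w\|^2}{2} - \nu\rho + \frac{1}{m} \sum_{i=1}^{m} \xi_i \\
\text{with the constraints} \quad & y_i (w \cdot x_i + b) \ge \rho - \xi_i, \quad i = 1, \ldots, m \\
& \xi_i \ge 0, \quad i = 1, \ldots, m \\
& \rho \ge 0
\end{aligned}
\qquad [81]
$$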

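With these notations, the primal Lagrangian function of this problem is

$$
\begin{aligned}
L_P(w, b, \Lambda, \xi, \beta, \rho, \delta) ={}& \frac{1}{2} \|w\|^2 - \nu\rho + \frac{1}{m} \sum_{i=1}^{m} \xi_i \\
& - \sum_{i=1}^{m} \left\{ \lambda_i \left[ y_i (w \cdot x_i + b) - \rho + \xi_i \right] + \beta_i \xi_i \right\} - \delta\rho
\end{aligned}
\qquad [82]
$$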

with the Lagrange multipliers $\lambda_i, \beta_i, \delta \ge 0$. This function must be minimized with respect to $w$, $b$, $\xi$, $\rho$, and maximized with respect to $\Lambda$, $\beta$, $\delta$. Following the same derivation as in the case of C-SVM, we compute the corresponding partial derivatives and set them equal to 0, which leads to the following conditions:

$$w = \sum_{i=1}^{m} \lambda_i y_i x_i \qquad [83]$$

$$\lambda_i + \beta_i = 1/m \qquad [84]$$

$$\sum_{i=1}^{m} \lambda_i y_i = 0 \qquad [85]$$

$$\sum_{i=1}^{m} \lambda_i - \delta = \nu \qquad [86]$$

Figure 42 Influence of the $C$ parameter on the class separation. SVM classification models obtained with the spline kernel for the dataset from Table 5: (a) $C = 100$; (b) $C = 10$.

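We substitute Eqs. [83] and [84] into Eq. [82], using $\lambda_i, \beta_i, \delta \ge 0$, and then we substitute the dot products with kernels, to obtain the following quadratic optimization problem:

$$
\begin{aligned}
\text{maximize} \quad & L_D(\Lambda) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j K(x_i, x_j) \\
\text{subject to} \quad & 0 \le \lambda_i \le 1/m, \quad i = 1, \ldots, m \\
& \sum_{i=1}^{m} \lambda_i y_i = 0 \\
& \sum_{i=1}^{m} \lambda_i \ge \nu
\end{aligned}
\qquad [87]
$$

From these equations, it follows that the ν-SVM classifier is

$$\mathrm{class}(x_k) = \mathrm{sign}\left( \sum_{i=1}^{m} \lambda_i y_i K(x_i, x_k) + b \right) \qquad [88]$$

Schölkopf et al. showed that if a ν-SVM classifier leads to $\rho > 0$, then the C-SVM classifier with $C = 1/(m\rho)$ has the same decision function.43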

Weighted SVM for Imbalanced Classification

In many practical applications, the ratio between the number of +1 and −1 patterns is very different from 1; i.e., one class is in excess and can dominate the SVM classifier. In other cases the classification error for one class may be more unfavorable or expensive than an error for the other class (e.g., a clinical diagnostic error). In both cases, it is advantageous to use a variant of the SVM classifier, the weighted SVM, that uses different penalties ($C^+$ and $C^-$) for the two classes. The most unfavorable type of error has a higher penalty, which translates into an SVM classifier that minimizes that type of error. By analogy with Eq. [39], the primal problem is

$$
\begin{aligned}
\text{minimize} \quad & \frac{\|w\|^2}{2} + C^+ \sum_{\substack{i=1 \\ y_i = +1}}^{m} \xi_i + C^- \sum_{\substack{i=1 \\ y_i = -1}}^{m} \xi_i \\
\text{with the constraints} \quad & y_i (w \cdot x_i + b) \ge +1 - \xi_i, \quad i = 1, \ldots, m \\
& \xi_i \ge 0, \quad i = 1, \ldots, m
\end{aligned}
\qquad [89]
$$

which is equivalent to the dual problem

$$
\begin{aligned}
\text{maximize} \quad & L_D(\Lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y_i y_j \, x_i \cdot x_j \\
\text{subject to} \quad & 0 \le \lambda_i \le C^+, \quad i = 1, \ldots, m \ \text{ for } y_i = +1 \\
& 0 \le \lambda_i \le C^-, \quad i = 1, \ldots, m \ \text{ for } y_i = -1 \\
& \sum_{i=1}^{m} \lambda_i y_i = 0
\end{aligned}
\qquad [90]
$$

The final solution is obtained by introducing the feature functions, $x \rightarrow \phi(x)$, and substituting the dot product $\phi(x_i) \cdot \phi(x_j)$ with a kernel function $K(x_i, x_j)$.
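To see how the two penalties in Eq. [89] shift the balance between the classes, the sketch below evaluates the weighted primal objective for a fixed linear classifier. It relies on the fact that, for given $w$ and $b$, the smallest slack satisfying the constraints is $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$. The small one-dimensional dataset, the classifier, and the penalty values are assumptions made for illustration.

```python
def weighted_svm_objective(w, b, X, y, c_plus, c_minus):
    """Objective of Eq. [89] with the minimal feasible slack for each pattern."""
    norm_term = 0.5 * sum(wi * wi for wi in w)
    penalty = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)          # xi_i = 0 outside the buffer zone
        penalty += (c_plus if yi == 1 else c_minus) * slack
    return norm_term + penalty


# Assumed toy data: two +1 patterns and two -1 patterns on a line.
X = [(2.0,), (0.5,), (-2.0,), (0.0,)]
y = [1, 1, -1, -1]
w, b = (1.0,), 0.0

# Same classifier, two different weightings of the class errors.
favor_plus = weighted_svm_objective(w, b, X, y, c_plus=10.0, c_minus=1.0)
favor_minus = weighted_svm_objective(w, b, X, y, c_plus=1.0, c_minus=10.0)
```

With a large $C^-$, the slack of the −1 pattern at the origin dominates the objective, so an optimizer would move the hyperplane to protect that class, at the cost of the +1 pattern lying inside the margin.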

Multi-class SVM Classification

Support vector machine classification was originally defined for two-class problems. This is a limitation in some cases when three or more classes of patterns are present in the training set as, for example, when classifying chemical compounds as inhibitors for several targets.

Many multiclass SVM classification approaches decompose the training set into several two-class problems. The one-versus-one approach trains a two-class SVM model for any two classes from the training set, which for a $k$-class problem results in $k(k - 1)/2$ SVM models. In the prediction phase, a voting procedure assigns the class of the prediction pattern to be the class with the maximum number of votes. A variant of the one-versus-one approach is DAGSVM (directed acyclic graph SVM), which has an identical training procedure, but uses for prediction a rooted binary directed acyclic graph in which each vertex is a two-class SVM model. Debnath, Takahide and Takahashi proposed an optimized one-versus-one multiclass SVM in which only a minimum number of SVM classifiers are trained for each class.48
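The one-versus-one bookkeeping described above can be sketched in a few lines. In the example below the pairwise classifiers are hypothetical stand-ins (nearest class center on a line) that take the place of trained two-class SVM models; only the pair enumeration and the voting step are the point of the illustration, and all names are assumptions.

```python
from collections import Counter
from itertools import combinations


def one_versus_one_pairs(classes):
    """All k(k-1)/2 unordered pairs of classes, one two-class model per pair."""
    return list(combinations(classes, 2))


def predict_by_voting(x, pairs, binary_classifiers):
    """Each pairwise model votes for one of its two classes; the most voted class wins.

    Ties are resolved by the order in which classes first received a vote.
    """
    votes = Counter(binary_classifiers[pair](x) for pair in pairs)
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for trained two-class SVM models: nearest class center.
centers = {"A": 0.0, "B": 5.0, "C": 10.0}


def make_classifier(a, b):
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b


pairs = one_versus_one_pairs(sorted(centers))
binary_classifiers = {pair: make_classifier(*pair) for pair in pairs}

predicted = predict_by_voting(6.0, pairs, binary_classifiers)
```

For three classes this trains three pairwise models; for ten classes the same scheme needs 45, which is why the number of models is often cited as the main cost of the one-versus-one approach.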

The one-versus-all procedure requires a much smaller number of models, namely for a $k$-class problem, only $k$ SVM classifiers are needed. The $i$th SVM classifier is trained with all patterns from the $i$th class labeled +1, and all other patterns labeled −1. Although it is easier to implement than the one-versus-one approach, the training sets may be imbalanced due to the large number of −1 patterns. In a comparative evaluation of one-versus-one, one-versus-all, and DAGSVM methods for 10 classification problems, Hsu and Lin found that one-versus-all is less suitable than the other methods.49 However, not all literature reports agree with this finding. Based on a critical review of the existing literature on multiclass SVM and experiments with many datasets, Rifkin and Klautau concluded that the one-versus-all SVM classification is as accurate as any other multiclass approach.50

Angulo, Parra and Catala proposed the K-SVCR (K-class support vector classification-regression) for $k$-class classification.51 This algorithm has ternary outputs, $\{-1, 0, +1\}$, and in the learning phase evaluates all patterns in a one-versus-one-versus-rest procedure by using a mixed classification and regression SVM. The prediction phase implements a voting scheme that makes the algorithm fault-tolerant. Guermeur applied a new multiclass SVM, called M-SVM, to the prediction of protein secondary structure.52,53

Multiclass SVM classification is particularly relevant for the classification of microarray gene expression data, with particular importance for disease recognition and classification.54–58

SVM REGRESSION

Initially developed for pattern classification, the SVM algorithm was extended by Vapnik4 for regression by using an ε-insensitive loss function (Figure 7). The goal of SVM regression (SVMR) is to identify a function $f(x)$ that for all training patterns $x$ has a maximum deviation $\varepsilon$ from the target (experimental) values $y$ and has a maximum margin. Using the training patterns, SVMR generates a model representing a tube with radius $\varepsilon$ fitted to the data. For the hard margin SVMR, the error for patterns inside the tube is zero, whereas no patterns are allowed outside the tube. For real-case datasets, this condition cannot account for outliers, an incomplete set of input variables $x$, or experimental errors in measuring $y$. Analogously with SVM classification, a soft margin SVMR was introduced by using slack variables. Several reviews on SVM regression should be consulted for more mathematical details, especially those by Mangasarian and Musicant,59,60 Gao, Gunn, and Harris,61,62 and Smola and Schölkopf.30

Consider a training set $T$ of $m$ patterns together with their target (experimental) values, $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, with $x \in R^n$ and $y \in R$. The linear regression case with a hard margin is represented by the function $f(x) = w \cdot x + b$, with $w \in R^n$ and $b \in R$. For this simple case, the SVMR is represented by

$$
\begin{aligned}
\text{minimize} \quad & \frac{\|w\|^2}{2} \\
\text{with the constraints} \quad & w \cdot x_i + b - y_i \le \varepsilon, \quad i = 1, \ldots, m \\
& y_i - w \cdot x_i - b \le \varepsilon, \quad i = 1, \ldots, m
\end{aligned}
\qquad [91]
$$

The above conditions can be easily extended for the soft margin SVM regression:

$$
\begin{aligned}
\text{minimize} \quad & \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \left( \xi_i^+ + \xi_i^- \right) \\
\text{with the constraints} \quad & w \cdot x_i + b - y_i \le \varepsilon + \xi_i^+, \quad i = 1, \ldots, m \\
& y_i - w \cdot x_i - b \le \varepsilon + \xi_i^-, \quad i = 1, \ldots, m \\
& \xi_i^+ \ge 0, \quad i = 1, \ldots, m \\
& \xi_i^- \ge 0, \quad i = 1, \ldots, m
\end{aligned}
\qquad [92]
$$

where $\xi_i^+$ is the slack variable associated with an overestimate of the calculated response for the input vector $x_i$, $\xi_i^-$ is the slack variable associated with an underestimate of the calculated response for the input vector $x_i$, $\varepsilon$ determines the limits of the approximation tube, and $C > 0$ controls the penalty associated with deviations larger than $\varepsilon$. In the case of the ε-insensitive loss function, the deviations are penalized with a linear function:

$$
|\xi|_\varepsilon =
\begin{cases}
0 & \text{if } |\xi| \le \varepsilon \\
|\xi| - \varepsilon & \text{otherwise}
\end{cases}
\qquad [93]
$$

The SVM regression is depicted in Figure 43. The regression tube is bordered by the hyperplanes $y = wx + b + \varepsilon$ and $y = wx + b - \varepsilon$. Patterns situated between these hyperplanes have the residual (absolute value for the difference between calculated and experimental $y$) less than $\varepsilon$, and in SVM regression, the error of these patterns is considered zero; thus, they do not contribute to the penalty. Only patterns situated outside the regression tube have a residual larger than $\varepsilon$ and thus a nonzero penalty that, for the ε-insensitive loss function, is proportional to their distance from the SVM regression border (Figure 43, right).
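The ε-insensitive loss of Eq. [93] and the resulting penalty term of Eq. [92] translate directly into code. The sketch below is a minimal illustration; the function names and the sample values in the comment are assumptions, not data from the text.

```python
def epsilon_insensitive_loss(residual, eps):
    """Eq. [93]: zero inside the tube, linear in the excess |residual| - eps outside."""
    return max(0.0, abs(residual) - eps)


def svr_penalty(y_true, y_pred, eps, C):
    """Penalty term C * sum_i (xi_i^+ + xi_i^-) for a given set of predictions.

    For each pattern at most one of the two slack variables is nonzero, and it
    equals the epsilon-insensitive loss of the residual.
    """
    return C * sum(epsilon_insensitive_loss(yt - yp, eps)
                   for yt, yp in zip(y_true, y_pred))


# Assumed example: three patterns, one inside the tube and two outside.
penalty = svr_penalty([1.0, 2.0, 3.0], [1.05, 2.5, 2.0], eps=0.1, C=100.0)
```

Patterns with a residual smaller than ε contribute nothing, which is why widening the tube reduces the number of patterns that influence the regression model.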


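The primal objective function is represented by the Lagrange function

$$
\begin{aligned}
L_P(w, b, \Lambda, \mathrm{M}) ={}& \frac{\|w\|^2}{2} + C \sum_{i=1}^{m} \left( \xi_i^+ + \xi_i^- \right) - \sum_{i=1}^{m} \left( \mu_i^+ \xi_i^+ + \mu_i^- \xi_i^- \right) \\
& - \sum_{i=1}^{m} \lambda_i^+ \left( \varepsilon + \xi_i^+ + y_i - w \cdot x_i - b \right) - \sum_{i=1}^{m} \lambda_i^- \left( \varepsilon + \xi_i^- - y_i + w \cdot x_i + b \right)
\end{aligned}
\qquad [94]
$$

where $\lambda_i^+$, $\lambda_i^-$, $\mu_i^+$, and $\mu_i^-$ are the Lagrange multipliers. The KKT conditions for the primal problem are as follows:

Gradient Conditions

$$\frac{\partial L_P(w, b, \Lambda, \mathrm{M})}{\partial b} = \sum_{i=1}^{m} \left( \lambda_i^+ - \lambda_i^- \right) = 0 \qquad [95]$$

$$\frac{\partial L_P(w, b, \Lambda, \mathrm{M})}{\partial w} = w - \sum_{i=1}^{m} \left( \lambda_i^- - \lambda_i^+ \right) x_i = 0 \qquad [96]$$

$$\frac{\partial L_P(w, b, \Lambda, \mathrm{M})}{\partial \xi_i^+} = C - \mu_i^+ - \lambda_i^+ = 0 \qquad [97]$$

$$\frac{\partial L_P(w, b, \Lambda, \mathrm{M})}{\partial \xi_i^-} = C - \mu_i^- - \lambda_i^- = 0 \qquad [98]$$

Non-negativity Conditions

$$
\begin{aligned}
\xi_i^+, \xi_i^- &\ge 0, \quad i = 1, \ldots, m \\
\lambda_i^+, \lambda_i^- &\ge 0, \quad i = 1, \ldots, m \\
\mu_i^+, \mu_i^- &\ge 0, \quad i = 1, \ldots, m
\end{aligned}
\qquad [99]
$$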

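Figure 43 Linear SVM regression case with soft margin and ε-insensitive loss function. (Left: regression tube bordered by +ε and −ε, with a slack ξ for a pattern outside the tube; right: loss as a function of $y - f(x)$.)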


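The dual optimization problem is obtained by substituting Eqs. [95]–[98] into Eq. [94]:

$$
\begin{aligned}
\text{maximize} \quad & L_D(w, b, \Lambda, \mathrm{M}) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} (\lambda_i^- - \lambda_i^+)(\lambda_j^- - \lambda_j^+) \, x_i \cdot x_j \\
& \qquad - \varepsilon \sum_{i=1}^{m} (\lambda_i^- + \lambda_i^+) + \sum_{i=1}^{m} y_i (\lambda_i^- - \lambda_i^+) \\
\text{subject to} \quad & \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) = 0 \\
\text{and} \quad & \lambda_i^-, \lambda_i^+ \in [0, C]
\end{aligned}
\qquad [100]
$$

The vector $w$ is obtained from Eq. [96]:

$$w = \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) \, x_i \qquad [101]$$

which leads to the final expression for $f(x_k)$, the computed value for a pattern $x_k$:

$$f(x_k) = \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) \, x_i \cdot x_k + b \qquad [102]$$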

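Nonlinear SVM regression is obtained by introducing feature functions $\phi$ that map the input patterns into a higher dimensional space, $x \rightarrow \phi(x)$. By replacing the dot product $\phi(x_i) \cdot \phi(x_j)$ with a kernel function $K(x_i, x_j)$, we obtain from Eq. [100] the following optimization problem:

$$
\begin{aligned}
\text{maximize} \quad & L_D(w, b, \Lambda, \mathrm{M}) = -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} (\lambda_i^- - \lambda_i^+)(\lambda_j^- - \lambda_j^+) \, K(x_i, x_j) \\
& \qquad - \varepsilon \sum_{i=1}^{m} (\lambda_i^- + \lambda_i^+) + \sum_{i=1}^{m} y_i (\lambda_i^- - \lambda_i^+) \\
\text{subject to} \quad & \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) = 0 \\
\text{and} \quad & \lambda_i^-, \lambda_i^+ \in [0, C]
\end{aligned}
\qquad [103]
$$

Similarly with Eq. [101], the kernel SVM regression model has $w$ given by

$$w = \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) \, \phi(x_i) \qquad [104]$$

The modeled property for a pattern $x_k$ is obtained with the formula:

$$f(x_k) = \sum_{i=1}^{m} (\lambda_i^- - \lambda_i^+) \, K(x_i, x_k) + b \qquad [105]$$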


The ε-insensitive loss function used in the SVM regression adds a new parameter $\varepsilon$ that significantly influences the model and its prediction capacity. Besides the ε-insensitive loss function, other loss functions can be used with SVM regression, such as quadratic, Laplace, or Huber loss functions (Figure 44).

We now present an illustrative example of a one-dimensional nonlinear SVM regression using the dataset in Table 6. This dataset has two spikes, which makes it difficult to model with the common kernels.

Figure 44 Loss functions for support vector machines regression: (a) quadratic; (b) Laplace; (c) Huber; (d) ε-insensitive.

Table 6 Patterns Used for the SVM Regression Models in Figures 45–48

Pattern   x      y        Pattern   x      y
1         0.2    1.2      18        1.5    1.18
2         0.3    1.22     19        1.6    1.17
3         0.4    1.23     20        1.7    1.16
4         0.5    1.24     21        1.8    1.12
5         0.6    1.25     22        1.85   0.85
6         0.7    1.28     23        1.9    0.65
7         0.8    1.38     24        1.95   0.32
8         0.85   1.6      25        2.0    0.4
9         0.9    1.92     26        2.05   0.5
10        0.95   2.1      27        2.1    0.6
11        1.0    2.3      28        2.15   0.8
12        1.05   2.2      29        2.2    0.95
13        1.1    1.85     30        2.3    1.18
14        1.15   1.6      31        2.4    1.2
15        1.2    1.4      32        2.5    1.21
16        1.3    1.19     33        2.6    1.22
17        1.4    1.18


In Figure 45, we present two SVM regression models, the first one obtained with a degree 10 polynomial kernel and the second one computed with a spline kernel. The polynomial kernel has some oscillations on both ends of the curve, whereas the spline kernel is observed to be inadequate for modeling the two spikes. The RBF kernel was also unable to offer an acceptable solution for this regression dataset (data not shown).

The degree 1 B spline kernel (Figure 46a) with the parameters $C = 100$ and $\varepsilon = 0.1$ gives a surprisingly good SVM regression model, with a regression tube that closely follows the details of the input data. We will use this kernel to explore the influence of the ε-insensitivity and capacity $C$ on the regression tube. By keeping $C$ at 100 and increasing $\varepsilon$ to 0.3, we obtain a less sensitive solution (Figure 46b) that does not model well the three regions having almost constant $y$ values. This is because the diameter of the tube is significantly larger and the patterns inside the tube do not influence the SVMR model (they have zero error).

By further increasing $\varepsilon$ to 0.5 (Figure 47a), the shape of the SVM regression model becomes even less similar to the dataset. The regression tube is now defined by a small number of support vectors, but they are not representative of the overall shape of the curve. It is now apparent that the ε-insensitivity parameter should be tailored for each specific problem because small variations in that parameter have significant effects on the regression model. We now consider the influence of the capacity $C$ when $\varepsilon$ is held constant at 0.1. The reference here is the SVMR model from Figure 46a obtained for $C = 100$. When $C$ decreases to 10 (Figure 47b), the penalty for errors decreases and the solution is incapable of modeling the points with extreme $y$ values in the two spikes accurately.

Figure 45 SVM regression models for the dataset from Table 6, with $\varepsilon = 0.1$: (a) degree 10 polynomial kernel; (b) spline kernel.

Figure 46 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6, with $C = 100$: (a) $\varepsilon = 0.1$; (b) $\varepsilon = 0.3$.

By further decreasing the capacity parameter $C$ to 1 (Figure 48a) and then to 0.1 (Figure 48b), the SVMR model further loses the capacity to model the two spikes. The values of $C$ shown here are not representative of normal experimental values, and they are presented only to illustrate the influence of $C$ on the shape of the regression hyperplane.

Figure 47 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6: (a) $\varepsilon = 0.5$, $C = 100$; (b) $\varepsilon = 0.1$, $C = 10$.

Figure 48 SVM regression models with a B spline kernel, degree 1, for the dataset from Table 6: (a) $\varepsilon = 0.1$, $C = 1$; (b) $\varepsilon = 0.1$, $C = 0.1$.


OPTIMIZING THE SVM MODEL

Finding an SVM model with good prediction statistics is a trial-and-error task. The objective is to maximize the prediction statistics while keeping the model simple in terms of number of input descriptors, number of support vectors, patterns used for training, and kernel complexity. In this section, we present an overview of the techniques used in SVM model optimization.

Descriptor Selection

Selecting relevant input parameters is both important and difficult for any machine learning method. For example, in QSAR, one can compute thousands of structural descriptors with software like CODESSA or Dragon, or with various molecular field methods. Many procedures have been developed in QSAR to identify a set of structural descriptors that retain the important characteristics of the chemical compounds.63,64 These methods can be extended to SVM models. Another source of inspiration is represented by the algorithms proposed in the machine learning literature, which can be readily applied to cheminformatics problems. We present here several literature pointers for algorithms on descriptor selection.

A variable selection method via sparse SVM was proposed by Bi et al.65 In a first step, this method uses a linear SVM for descriptor selection, followed by a second step in which nonlinear kernels are introduced. The recursive saliency analysis for descriptor selection was investigated by Cao et al.66 Fung and Mangasarian proposed a feature selection Newton method for SVM.67 Kumar et al. introduced a new method for descriptor selection, the locally linear embedding, which can be used for reducing the nonlinear dimensions in QSPR and QSAR.68 Xue et al. investigated the application of recursive feature elimination for three classification tests, namely P-glycoprotein substrates, human intestinal absorption, and compounds that cause torsade de pointes.69 Frohlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression, and they compared it with recursive feature elimination and with the mutual information procedure.70 Five methods of feature selection (information gain, mutual information, χ²-test, odds ratio, and GSS coefficient) were compared by Liu for their ability to discriminate between thrombin inhibitors and noninhibitors.71 Byvatov and Schneider compared the SVM-based and the Kolmogorov–Smirnov feature selection methods to characterize ligand-receptor interactions in focused compound libraries.72 A genetic algorithm for descriptor selection was combined with SVM regression by Nandi et al. to model and optimize the benzene isopropylation on Hbeta catalyst.73 Finally, gene selection from microarray data is a necessary step for disease classification with support vector machines.55,74–81


Support Vectors Selection

The time needed to predict a pattern with an SVM model is proportional to the number of support vectors. This makes prediction slow when the SVM has a large number of support vectors. Downs, Gates, and Masters showed that the SMO algorithm,41 usually used for SVM training, can produce solutions with more support vectors than are needed for an optimum model.82 They found that some support vectors are linearly dependent on other support vectors, and that these linearly dependent support vectors can be identified and then removed from the SVM model with an efficient algorithm. Besides reducing the number of support vectors, the new solution gives predictions identical to those of the full SVM model. Their model reduction algorithm was tested for several classification and regression problems, and in most cases it led to a reduction in the number of support vectors, which was as high as 90% in one example. In some cases, the SVM solution did not contain any linearly dependent support vectors, so it was not possible to simplify the model.
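Why removing a linearly dependent support vector leaves the predictions unchanged can be seen from a small sketch. It is restricted to a linear kernel and uses hypothetical support vectors and coefficients; it illustrates only the coefficient-folding idea and is not the detection algorithm of Downs, Gates, and Masters, which works for general kernels. If support vector k equals a weighted sum of other support vectors, its coefficient can be distributed over those vectors.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def decision(x, support_vectors, coeffs, bias):
    """f(x) = sum_i coeff_i * K(sv_i, x) + bias, with a linear kernel."""
    return sum(c * dot(sv, x) for sv, c in zip(support_vectors, coeffs)) + bias


def fold_dependent_vector(coeffs, k, weights):
    """Drop coefficient k, assuming sv_k = sum_i weights[i] * sv_i.

    `weights` maps the index of each remaining support vector to its weight
    in the linear combination (missing indices count as zero).
    """
    return [
        c + coeffs[k] * weights.get(i, 0.0)
        for i, c in enumerate(coeffs)
        if i != k
    ]


# Hypothetical model: the third vector is 2 * first + 3 * second.
svs = [(1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
coeffs = [0.5, -1.0, 0.25]      # alpha_i * y_i for each support vector
bias = 0.1

reduced_coeffs = fold_dependent_vector(coeffs, k=2, weights={0: 2.0, 1: 3.0})
reduced_svs = svs[:2]

x = (4.0, -2.0)
print(decision(x, svs, coeffs, bias), decision(x, reduced_svs, reduced_coeffs, bias))
```

The two printed values agree up to rounding, although the reduced model evaluates one kernel fewer per prediction.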

Zhan and Shen proposed a four-step algorithm to simplify the SVM solution by removing unnecessary support vectors.83 In the first step, the learning set is used to train the SVM and identify the support vectors. In the second step, the support vectors that make the surface convoluted (i.e., those whose projections on the hypersurface have the largest curvatures) are excluded from the learning set. In the third step, the SVM is retrained with the reduced learning set. In the fourth step, the complexity of the SVM model is further reduced by approximating the separation hypersurface with a subset of the support vectors. The algorithm was tested for tissue classification for 3-D prostate ultrasound images, demonstrating that the number of support vectors can be reduced without degrading the prediction of the SVM model.

Jury SVM

Starting from current machine learning algorithms (e.g., PCA, PLS, ANN, SVM, and k-NN), one can derive new classification or regression systems by combining the predictions of two or more models. Such a prediction meta-algorithm (called jury, committee, or ensemble) can use a wide variety of mathematical procedures to combine the individual predictions into a final prediction. Empirical studies showed that jury methods can increase the prediction performances of the individual models that are aggregated in the ensemble. Their disadvantages include the increased complexity of the model and longer computing time. In practical applications, the use of jury methods is justified if a statistically significant increase in prediction power is obtained. Several examples of using jury SVM follow.

Drug-like compound identification with a jury of k-NN, SVM, and ridge regression was investigated by Merkwirth et al.84 Jury predictions with several machine learning methods were compared by Briem and Gunther for the discrimination of kinase inhibitors from noninhibitors.85 Yap and Chen compared two jury SVM procedures for classifying inhibitors and substrates of cytochromes P450 3A4, 2D6, and 2C9.86 Jerebko et al. used jury SVM classifiers based on bagging (bootstrap aggregation) for polyp detection in CT colonography.78 Valentini, Muselli, and Ruffino used bagged jury SVM on DNA microarray gene expression data to classify normal and malignant tissues.87 As a final example, we point out the work of Guermeur et al., who used a multiclass SVM to aggregate the best protein secondary structure prediction methods, thus improving their performances.52,53
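Two of the simplest ways of combining individual predictions into a jury prediction are majority voting for classification and averaging for regression. The sketch below assumes class labels of +1 and -1 and uses function names of our own choosing; the studies cited above use a variety of more elaborate aggregation schemes.

```python
def jury_classify(votes):
    """Majority vote over class labels +1 / -1 from several models."""
    total = sum(votes)
    if total == 0:
        # A tie cannot be resolved without an additional rule.
        raise ValueError("tie: use an odd number of models or a tie-break rule")
    return 1 if total > 0 else -1


def jury_regress(values):
    """Average of the numerical predictions from several models."""
    return sum(values) / len(values)


# Hypothetical predictions of three models for the same pattern.
print(jury_classify([1, -1, 1]))
print(jury_regress([1.0, 2.0, 3.0]))
```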

Kernels for Biosequences

The kernel function measures the similarity between pairs of patterns, typically as a dot product between numerical vectors. The usual numerical encoding for protein sequences is based on a 20-digit binary vector that encodes the presence or absence of each amino acid in a given position. To explore new ways of encoding the structural information from biosequences, various kernels have been proposed for the prediction of biochemical properties directly from a given sequence.
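The usual binary encoding can be sketched as follows. The alphabet order and the function names are arbitrary choices made for this illustration. With this encoding, the dot product between two sequences of equal length simply counts the positions that carry the same amino acid.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, arbitrary order


def encode_sequence(sequence):
    """Concatenate one 20-digit binary vector per sequence position."""
    vector = []
    for residue in sequence:
        position = [0] * len(AMINO_ACIDS)
        position[AMINO_ACIDS.index(residue)] = 1
        vector.extend(position)
    return vector


def dot_kernel(seq_a, seq_b):
    """Dot product of the binary encodings of two equal-length sequences."""
    a, b = encode_sequence(seq_a), encode_sequence(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x * y for x, y in zip(a, b))


print(len(encode_sequence("ACD")), dot_kernel("ACDE", "ACDF"))
```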

Saigo et al. defined alignment kernels that compute the similarity between two sequences by summing up scores obtained from local alignments with gaps.88 The new kernels could recognize SCOP superfamilies and outperform standard methods for remote homology detection. The mismatch kernel introduced by Leslie et al. measures sequence similarity based on shared occurrences of fixed-length patterns in the data, thus allowing for mutations between patterns.89 This type of partial string matching kernel successfully predicts protein classification in families and superfamilies. Vert used a tree kernel to measure the similarity between phylogenetic profiles so as to predict the functional class of a gene from its phylogenetic profile.90 The tree kernel can predict functional characteristics from evolutionary information. Yang and Chou defined a class of kernels that compute protein sequence similarity based on amino acid similarity matrices such as the Dayhoff matrix.91 String kernels computed from subsite coupling models of protein sequences were used by Wang, Yang, and Chou to predict the signal peptide cleavage site.92 Teramoto et al. showed that the design of small interfering RNA (siRNA) is greatly improved by using string kernels.93 The siRNA sequence was decomposed into 1-, 2-, and 3-mer subsequences that were fed into the string kernel to compute the similarity between two sequences. Leslie and Kuang defined three new classes of k-mer string kernels, namely restricted gappy kernels, substitution kernels, and wildcard kernels, based on feature spaces defined by k-length subsequences from the protein sequence.94 The new kernels were used for homology detection and protein classification. Tsuda and Noble used the diffusion kernel to predict protein functional classification from metabolic and protein–protein interaction networks.95 The diffusion kernel is a method of computing pair-wise distances between all nodes in a graph based on the sum of weighted paths between each pair of vertices.
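The general idea behind k-mer string kernels can be sketched with a plain spectrum kernel, in which each sequence is represented by the counts of its 1-, 2-, and 3-mer subsequences and the kernel is the dot product of the two count vectors. This is a simplified illustration with hypothetical function names and sequences; it is not the exact formulation of any of the kernels cited above, and it does not handle mismatches, gaps, or wildcards.

```python
from collections import Counter


def kmer_counts(sequence, ks=(1, 2, 3)):
    """Count every contiguous subsequence of each length in `ks`."""
    counts = Counter()
    for k in ks:
        for start in range(len(sequence) - k + 1):
            counts[sequence[start:start + k]] += 1
    return counts


def spectrum_kernel(seq_a, seq_b, ks=(1, 2, 3)):
    """Dot product of the k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, ks)
    counts_b = kmer_counts(seq_b, ks)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items() if kmer in counts_b)


# Hypothetical short RNA fragments.
print(spectrum_kernel("AUGGCU", "AUGCCU"))
```

Unlike the positional binary encoding, this kernel does not require the two sequences to have the same length.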

Kernels for Molecular Structures

The common approach in SVM applications for property prediction based on molecular structure involves the computation of various classes of structural descriptors. These descriptors are used with various kernels to compute the structural similarity between two chemical compounds. Obviously, this approach reflects chemical bonding only in an indirect way, through descriptors. The molecular structure can be used directly in computing the pair-wise similarity of chemicals with tree and graph kernels as reviewed below.

Micheli, Portera, and Sperduti used acyclic molecular subgraphs and tree kernels to predict the ligand affinity for the benzodiazepine receptor.96 Mahe et al. defined a series of graph kernels that can predict various properties from only the molecular graph and various atomic descriptors.97 Jain et al. defined a new graph kernel based on the Schur–Hadamard inner product for a pair of molecular graphs, and they tested it by predicting the mutagenicity of aromatic and hetero-aromatic nitro compounds.98 Finally, Lind and Maltseva used molecular fingerprints to compute the Tanimoto similarity kernel, which was incorporated into an SVM regression to predict the aqueous solubility of organic compounds.99
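For binary fingerprints, the Tanimoto similarity is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The sketch below represents each fingerprint as the set of its "on" bit positions; the bit positions are hypothetical, and returning 1.0 for two empty fingerprints is a convention assumed here, not one taken from the cited work.

```python
def tanimoto_kernel(fingerprint_a, fingerprint_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints of two compounds.
print(tanimoto_kernel({1, 2, 3}, {2, 3, 4}))
```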

PRACTICAL ASPECTS OF SVM CLASSIFICATION

Up to this point we have given mostly a theoretical presentation of SVM classification and regression; it is now appropriate to show some practical applications of support vector machines, together with practical guidelines for their application in cheminformatics and QSAR. In this section, we will present several case studies in SVM classification; the next section is dedicated to applications of SVM regression.

Studies investigating the universal approximation capabilities of support vector machines have demonstrated that SVM with usual kernels (such as polynomial, Gaussian RBF, or dot product kernels) can approximate any measurable or continuous function up to any desired accuracy.100,101 Any set of patterns can therefore be modeled perfectly if the appropriate kernel and parameters are used. The ability to approximate any measurable function is indeed required for a good nonlinear multivariate pattern recognition algorithm (artificial neural networks are also universal approximators), but from a practical point of view, more is required. Indeed, good QSAR or cheminformatics models must have an optimum predictivity (limited by the number of data, data distribution, noise, errors, selection of structural descriptors, etc.), not only a good mapping capability. For SVM classification problems, highly nonlinear kernels can eventually separate perfectly the classes of patterns with intricate hyperplanes. This is what the universal approximation capabilities of SVM guarantee. However, these capabilities cannot promise that the resulting SVM will be optimally predictive. In fact, only empirical comparison with other classification algorithms (kNN, linear discriminant analysis, PLS, artificial neural networks, etc.) can demonstrate, for a particular problem, that SVM is better or worse than other classification methods. Indeed, the literature is replete with comparative studies showing that SVM can often, but not always, predict better than other methods. In many cases, the statistical difference between methods is not significant, and due to the limited number of samples used in those studies, one cannot prefer one method over the others.

An instructive example to consider is the HIV-1 protease cleavage site prediction. This problem was investigated with neural networks,102 self-organizing maps,103 and support vector machines.91,104 After an in-depth examination of this problem, Rognvaldsson and You concluded that linear classifiers are at least as good predictors as are nonlinear algorithms.105 The poor choice of complex, nonlinear classifiers could not deliver any new insight for the HIV-1 protease cleavage site prediction. The message of this story is simple and valuable: always compare nonlinear SVM models with linear models and, if possible, with other pattern recognition algorithms.

A common belief is that because SVM is based on structural risk minimization, its predictions are better than those of other algorithms that are based on empirical risk minimization. Many published examples show, however, that for real applications, such beliefs do not carry much weight and that sometimes other multivariate algorithms can deliver better predictions.

An important question to ask is as follows: Do SVMs overfit? Some reports claim that, due to their derivation from structural risk minimization, SVMs do not overfit. However, in this chapter, we have already presented numerous examples where the SVM solution is overfitted for simple datasets. More examples will follow. In real applications, one must carefully select the nonlinear kernel function needed to generate a classification hyperplane that is topologically appropriate and has optimum predictive power.

It is sometimes claimed that SVMs are better than artificial neural networks. This assertion is made because SVMs have a unique solution, whereas artificial neural networks can become stuck in local minima, and because finding the optimum number of hidden neurons of an ANN requires time-consuming calculations. Indeed, it is true that multilayer feed-forward neural networks can offer models that represent local minima, but they also give consistently good solutions (although suboptimal), which is not the case with SVM (see examples in this section). Undeniably, for a given kernel and set of parameters, the SVM solution is unique. But an infinite number of combinations of kernels and SVM parameters exists, resulting in an infinite set of unique SVM models. The unique SVM solution therefore brings little comfort to the researcher because the theory cannot foresee which kernel and set of parameters are optimal for a particular problem. And yes, artificial neural networks easily overfit the training data, but so do support vector machines.

Frequently the exclusive use of the RBF kernel is rationalized by mentioning that it is the best possible kernel for SVM models. The simple tests presented in this chapter (datasets from Tables 1–6) suggest that other kernels might be more useful for particular problems. For a comparative evaluation, we review below several SVM classification models obtained with five important kernels (linear, polynomial, Gaussian radial basis function, neural, and anova) and show that the SVM prediction capability varies significantly with the kernel type and parameter values used and that, in many cases, a simple linear model is more predictive than nonlinear kernels.

For all SVM classification models described later in this chapter, we have used the following kernels: dot (linear); polynomial (degree d = 2, 3, 4, 5); radial basis function, $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$ (γ = 0.5, 1.0, 2.0); neural (tanh), Eq. [68] (a = 0.5, 1.0, 2.0 and b = 0, 1, 2); anova, Eq. [69] (γ = 0.5, 1.0, 2.0 and d = 1, 2, 3). All SVM models were computed with mySVM, by Ruping (http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
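The five kernels can be written compactly as plain functions, as in the sketch below. Only the radial basis function is given explicitly in the paragraph above. The polynomial form (x·y + 1)^d, the neural form tanh(a x·y + b), and the anova form (Σ_k exp(-γ(x_k - y_k)²))^d are commonly used definitions that are assumed here; they should be checked against Eqs. [68] and [69] and against the software documentation before use.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree):
    # Assumed form: (x . y + 1)^d
    return (dot(x, y) + 1.0) ** degree


def rbf_kernel(x, y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2), as given in the text
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def neural_kernel(x, y, a, b):
    # Assumed form: tanh(a * x . y + b)
    return math.tanh(a * dot(x, y) + b)


def anova_kernel(x, y, gamma, degree):
    # Assumed form: (sum_k exp(-gamma * (x_k - y_k)^2))^d
    return sum(math.exp(-gamma * (p - q) ** 2) for p, q in zip(x, y)) ** degree


# Hypothetical descriptor vectors for two compounds.
u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v), polynomial_kernel(u, v, 2), rbf_kernel(u, v, 0.5))
```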

Predicting the Mechanism of Action for Polar and Nonpolar Narcotic Compounds

Because numerous organic chemicals can be environmental pollutants, considerable efforts were directed toward the study of the relationships between the structure of a chemical compound and its toxicity. Significant progress has been made in classifying chemical compounds according to their mechanism of toxicity and in screening them for their environmental risk. Predicting the mechanism of action (MOA) using structural descriptors has major applications in selecting an appropriate quantitative structure–activity relationship (QSAR) model, in identifying chemicals with similar toxicity mechanisms, and in extrapolating toxic effects between different species and exposure regimes.106–109

Organic compounds that act as narcotic pollutants are considered to disrupt the functioning of cell membranes. Narcotic pollutants are represented by two classes of compounds, namely nonpolar (MOA 1) and polar (MOA 2) compounds. The toxicity of both polar and nonpolar narcotic pollutants depends on the octanol–water partition coefficient, but the toxicity of polar compounds depends also on the propensity of forming hydrogen bonds. Ren used five structural descriptors to discriminate between 76 polar and 114 nonpolar pollutants.107 These were the octanol–water partition coefficient log Kow, the energy of the highest occupied molecular orbital EHOMO, the energy of the lowest unoccupied molecular orbital ELUMO, the most negative partial charge on any non-hydrogen atom in the molecule Q-, and the most positive partial charge on a hydrogen atom Q+. All quantum descriptors were computed with the AM1 method.


Using a descriptor selection procedure, we found that only three descriptors (EHOMO, ELUMO, and Q-) are essential for the SVM model. To exemplify the shape of the classification hyperplane for polar and nonpolar narcotic pollutants, we selected 20 compounds (Table 7) as a test set (nonpolar compounds, class +1; polar compounds, class -1).

The first two experiments were performed with a linear kernel for C = 100 (Figure 49a) and C = 1 (Figure 49b). The first plot shows that this dataset can be separated with a linear classifier if some errors are accepted. Note that several +1 compounds cannot be classified correctly. Decreasing the capacity C gives a larger margin, with a border close to the bulk of class +1 compounds.

Table 7 Chemical Compounds, Theoretical Descriptors (EHOMO, ELUMO, and Q-), and Mechanism of Toxic Action (nonpolar, class +1; polar, class -1)

No  Compound                     EHOMO    ELUMO     Q-       MOA  Class
 1  tetrachloroethene            -9.902   -0.4367   -0.0372  1    +1
 2  1,2-dichloroethane          -11.417    0.6838   -0.1151  1    +1
 3  1,3-dichloropropane         -11.372    1.0193   -0.1625  1    +1
 4  dichloromethane             -11.390    0.5946   -0.1854  1    +1
 5  1,2,4-trimethylbenzene       -8.972    0.5030   -0.2105  1    +1
 6  1,1,2,2-tetrachloroethane   -11.655   -0.0738   -0.2785  1    +1
 7  2,4-dichloroacetophenone     -9.890   -0.5146   -0.4423  1    +1
 8  4-methyl-2-pentanone        -10.493    0.8962   -0.4713  1    +1
 9  ethyl acetate               -11.006    1.1370   -0.5045  1    +1
10  cyclohexanone               -10.616    3.3960   -0.5584  1    +1
11  2,4,6-trimethylphenol        -8.691    0.4322   -0.4750  2    -1
12  3-chloronitrobenzene        -10.367   -1.2855   -0.4842  2    -1
13  4-ethylphenol                -8.912    0.4334   -0.4931  2    -1
14  2,4-dimethylphenol           -8.784    0.3979   -0.4980  2    -1
15  4-nitrotoluene              -10.305   -1.0449   -0.5017  2    -1
16  2-chloro-4-nitroaniline      -9.256   -0.9066   -0.6434  2    -1
17  2-chloroaniline              -8.376    0.3928   -0.6743  2    -1
18  pentafluoroaniline           -9.272   -1.0127   -0.8360  2    -1
19  4-methylaniline              -8.356    0.6156   -0.9429  2    -1
20  4-ethylaniline               -8.379    0.6219   -0.9589  2    -1

Figure 49 SVM classification models with a dot (linear) kernel for the dataset from Table 7: (a) C = 100; (b) C = 1.

A similar analysis was performed for the degree 2 polynomial kernel with C = 100 (Figure 50a) and C = 1 (Figure 50b). The classification hyperplane is significantly different from that of the linear classifier, but with little success because three +1 compounds cannot be classified correctly. By decreasing the penalty for classification errors (Figure 50b), the margin increases and major changes appear in the shape of the classification hyperplane.

We will now show two SVMC models that are clearly overfitted. The first one is obtained with a degree 10 polynomial kernel (Figure 51a), whereas for the second, we used a B spline kernel (Figure 51b). The two classification hyperplanes are very complex, with a topology that clearly does not resemble that of the real data.

Figure 50 SVM classification models with a degree 2 polynomial kernel for the dataset from Table 7: (a) C = 100; (b) C = 1.

Figure 51 SVM classification models for the dataset from Table 7, with C = 100: (a) polynomial kernel, degree 10; (b) B spline kernel, degree 1.

The statistics for all SVM models that were considered for this example are presented in Table 8. The calibration of the SVM models was performed with the whole set of 190 compounds, whereas the prediction was tested with a leave-20%-out cross-validation method. All notations are explained in the footnote of Table 8.
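A leave-20%-out procedure divides the 190 compounds into five groups of 38; each group is predicted once by a model trained on the remaining 152 compounds. The sketch below shows one possible way to generate such splits. The interleaved assignment of patterns to groups is an assumption made for illustration, because the way the groups were actually formed is not described here, and the function names are ours.

```python
def leave_percent_out_folds(n_patterns, percent):
    """Assign pattern indices to round(100 / percent) interleaved groups."""
    n_folds = round(100 / percent)
    folds = [[] for _ in range(n_folds)]
    for index in range(n_patterns):
        folds[index % n_folds].append(index)
    return folds


def train_test_splits(n_patterns, percent):
    """Yield (training indices, held-out indices) for each group in turn."""
    for held_out in leave_percent_out_folds(n_patterns, percent):
        held = set(held_out)
        training = [i for i in range(n_patterns) if i not in held]
        yield training, held_out


for training, held_out in train_test_splits(190, 20):
    print(len(training), len(held_out))
```

Each compound is predicted exactly once, so the prediction counts (TPp, FNp, TNp, FPp) summed over the groups cover all 190 compounds.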

Table 8 shows that the SVMs with a linear kernel give very good results. The prediction accuracy from experiment 3 (ACp = 0.97) is used as the reference against which the other kernels are compared. The polynomial kernel (experiments 4–15) has ACp between 0.93 and 0.96, results that do not equal those of the linear kernel. The overfitting of SVM models is clearly detected in several cases. For example, as the degree of the polynomial kernel increases from 2 to 5, ACc increases from 0.97 to 1, whereas ACp decreases from 0.96 to 0.93. The SVM models with perfect classification in training have the lowest prediction statistics.

The RBF kernel (experiments 16–24), with ACp between 0.96 and 0.97, has better calibration statistics than the linear kernel, but its performance in prediction only equals that of the linear SVM. Although many tests were performed for the neural kernel (experiments 25–51), the prediction statistics are low, with ACp between 0.64 and 0.88. This result is surprising, because the tanh function gives very good results in neural networks. Even the training statistics are low for the neural kernel, with ACc between 0.68 and 0.89.

The last set of SVM models was obtained with the anova kernel (experiments 52–78), with ACp between 0.94 and 0.98. In fact, only experiment 58 has a better prediction accuracy (ACp = 0.98) than the linear SVM model from experiment 3. The linear SVM has six errors in prediction (all nonpolar compounds predicted to be polar), whereas the anova SVM has four prediction errors, also for nonpolar compounds.
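The accuracies quoted above can be recomputed from the prediction counts in Table 8, assuming that accuracy is the fraction of correctly classified compounds, (TP + TN)/(TP + FN + TN + FP). This definition is our reading of the table footnote; it reproduces the tabulated values for the two experiments compared here.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of correctly classified patterns."""
    return (tp + tn) / (tp + fn + tn + fp)


# Prediction counts from Table 8 (114 nonpolar and 76 polar compounds).
linear_exp3 = accuracy(tp=108, fn=6, tn=76, fp=0)    # six nonpolar compounds missed
anova_exp58 = accuracy(tp=110, fn=4, tn=76, fp=0)    # four nonpolar compounds missed

print(round(linear_exp3, 2), round(anova_exp58, 2))
```

The difference between the two models amounts to two compounds out of 190.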

Our experiments with various kernels show that the performance of the SVM classifier is strongly dependent on the kernel shape. Considering the results of the linear SVM as a reference, many nonlinear SVM models have lower prediction statistics. It is also true that the linear classifier does a good job and there is not much room for improvement. Out of the 75 nonlinear SVM models, only one, with the anova kernel, has slightly higher prediction statistics than the linear SVM.

Predicting the Mechanism of Action for Narcotic and Reactive Compounds

The second experiment we present in this tutorial for classifying compounds according to their mechanism of action involves the classification of 88 chemicals. The chemicals are either narcotics (nonpolar and polar narcotics) or reactive compounds (respiratory uncouplers, soft electrophiles, and proelectrophiles).110 The dataset, consisting of 48 narcotic compounds


Table 8 Results for SVM Classification of Polar and Nonpolar Pollutants Using EHOMO, ELUMO, and Q- (see footnote a)

Exp     C  K  Parameters     TPc  FNc  TNc  FPc  SVc   ACc  TPp  FNp  TNp  FPp   SVp   ACp
  1    10  L  -              105    9   76    0   27  0.95  104   10   76    0  22.2  0.95
  2   100  L  -              106    8   76    0   25  0.96  104   10   76    0  20.2  0.95
  3  1000  L  -              106    8   76    0   25  0.96  108    6   76    0  19.6  0.97
  4    10  P  d=2            109    5   75    1   21  0.97  108    6   75    1  18.0  0.96
  5   100  P  d=2            109    5   76    0   20  0.97  108    6   74    2  15.2  0.96
  6  1000  P  d=2            109    5   76    0   19  0.97  108    6   72    4  14.8  0.95
  7    10  P  d=3            112    2   76    0   21  0.99  108    6   73    3  15.2  0.95
  8   100  P  d=3            113    1   76    0   19  0.99  107    7   73    3  15.2  0.95
  9  1000  P  d=3            114    0   76    0   18  1.00  106    8   73    3  14.4  0.94
 10    10  P  d=4            112    2   76    0   22  0.99  106    8   73    3  17.0  0.94
 11   100  P  d=4            114    0   76    0   20  1.00  106    8   72    4  15.8  0.94
 12  1000  P  d=4            114    0   76    0   20  1.00  106    8   72    4  15.8  0.94
 13    10  P  d=5            114    0   76    0   19  1.00  107    7   70    6  15.0  0.93
 14   100  P  d=5            114    0   76    0   20  1.00  107    7   70    6  15.0  0.93
 15  1000  P  d=5            114    0   76    0   20  1.00  107    7   70    6  15.0  0.93
 16    10  R  γ=0.5          109    5   76    0   26  0.97  107    7   75    1  23.6  0.96
 17   100  R  γ=0.5          112    2   76    0   20  0.99  108    6   74    2  17.0  0.96
 18  1000  R  γ=0.5          113    1   76    0   19  0.99  108    6   74    2  15.8  0.96
 19    10  R  γ=1.0          112    2   76    0   35  0.99  109    5   75    1  34.0  0.97
 20   100  R  γ=1.0          113    1   76    0   28  0.99  109    5   75    1  26.4  0.97
 21  1000  R  γ=1.0          114    0   76    0   21  1.00  109    5   75    1  21.8  0.97
 22    10  R  γ=2.0          113    1   76    0   45  0.99  109    5   74    2  44.8  0.96
 23   100  R  γ=2.0          114    0   76    0   43  1.00  109    5   75    1  40.8  0.97
 24  1000  R  γ=2.0          114    0   76    0   43  1.00  109    5   75    1  40.8  0.97
 25    10  N  a=0.5,b=0.0    102   12   68    8   26  0.89  102   12   66   10  24.2  0.88
 26   100  N  a=0.5,b=0.0    102   12   64   12   28  0.87  104   10   63   13  23.4  0.88
 27  1000  N  a=0.5,b=0.0    102   12   64   12   28  0.87  103   11   62   14  22.0  0.87
 28    10  N  a=1.0,b=0.0     98   16   60   16   34  0.83   95   19   61   15  30.6  0.82
 29   100  N  a=1.0,b=0.0     98   16   60   16   34  0.83  100   14   56   20  31.4  0.82
 30  1000  N  a=1.0,b=0.0     98   16   60   16   34  0.83   95   19   60   16  29.6  0.82
 31    10  N  a=2.0,b=0.0     85   29   48   28   60  0.70   80   34   55   21  45.2  0.71
 32   100  N  a=2.0,b=0.0     87   27   48   28   58  0.71   80   34   55   21  45.2  0.71
 33  1000  N  a=2.0,b=0.0     85   29   47   29   60  0.69   86   28   48   28  47.6  0.71
 34    10  N  a=0.5,b=1.0     95   19   53   23   53  0.78   92   22   52   24  41.4  0.76
 35   100  N  a=0.5,b=1.0     92   22   53   23   49  0.76   89   25   51   25  39.4  0.74
 36  1000  N  a=0.5,b=1.0     92   22   53   23   49  0.76   89   25   50   26  39.2  0.73
 37    10  N  a=1.0,b=1.0     85   29   47   29   61  0.69   87   27   50   26  44.6  0.72
 38   100  N  a=1.0,b=1.0     98   16   59   17   35  0.83   83   31   52   24  43.8  0.71
 39  1000  N  a=1.0,b=1.0     98   16   59   17   35  0.83   84   30   46   30  48.0  0.68
 40    10  N  a=2.0,b=1.0     86   28   43   33   64  0.68   86   28   50   26  35.6  0.72
 41   100  N  a=2.0,b=1.0     86   28   43   33   64  0.68   94   20   55   21  26.6  0.78
 42  1000  N  a=2.0,b=1.0     86   28   43   33   64  0.68   97   17   46   30  34.0  0.75
 43    10  N  a=0.5,b=2.0     87   27   46   30   67  0.70   90   24   44   32  54.2  0.71
 44   100  N  a=0.5,b=2.0     84   30   46   30   63  0.68   85   29   44   32  51.0  0.68
 45  1000  N  a=0.5,b=2.0     84   30   46   30   62  0.68   84   30   44   32  50.2  0.67
 46    10  N  a=1.0,b=2.0     83   31   45   31   64  0.67   71   43   50   26  52.0  0.64
 47   100  N  a=1.0,b=2.0     83   31   45   31   64  0.67   82   32   45   31  51.6  0.67
 48  1000  N  a=1.0,b=2.0     83   31   45   31   64  0.67   82   32   45   31  51.6  0.67
 49    10  N  a=2.0,b=2.0     85   29   46   30   63  0.69   75   39   65   11  46.0  0.74
 50   100  N  a=2.0,b=2.0     97   17   58   18   37  0.82   79   35   68    8  42.0  0.77
 51  1000  N  a=2.0,b=2.0     97   17   58   18   37  0.82   82   32   65   11  38.2  0.77
 52    10  A  γ=0.5,d=1      110    4   76    0   26  0.98  106    8   75    1  22.0  0.95
 53   100  A  γ=0.5,d=1      111    3   76    0   17  0.98  108    6   74    2  15.4  0.96
 54  1000  A  γ=0.5,d=1      112    2   76    0   14  0.99  109    5   73    3  13.2  0.96
 55    10  A  γ=1.0,d=1      111    3   76    0   26  0.98  109    5   75    1  20.4  0.97
 56   100  A  γ=1.0,d=1      111    3   76    0   18  0.98  110    4   74    2  16.0  0.97
 57  1000  A  γ=1.0,d=1      113    1   76    0   17  0.99  110    4   72    4  14.6  0.96
 58    10  A  γ=2.0,d=1      111    3   76    0   24  0.98  110    4   76    0  20.6  0.98
 59   100  A  γ=2.0,d=1      113    1   76    0   18  0.99  109    5   73    3  17.8  0.96
 60  1000  A  γ=2.0,d=1      114    0   76    0   14  1.00  109    5   70    6  15.2  0.94
 61    10  A  γ=0.5,d=2      112    2   76    0   24  0.99  107    7   75    1  18.4  0.96
 62   100  A  γ=0.5,d=2      112    2   76    0   20  0.99  108    6   74    2  16.8  0.96
 63  1000  A  γ=0.5,d=2      114    0   76    0   15  1.00  107    7   74    2  14.2  0.95
 64    10  A  γ=1.0,d=2      112    2   76    0   21  0.99  108    6   75    1  18.8  0.96
 65   100  A  γ=1.0,d=2      114    0   76    0   20  1.00  107    7   73    3  16.6  0.95
 66  1000  A  γ=1.0,d=2      114    0   76    0   20  1.00  107    7   73    3  16.6  0.95
 67    10  A  γ=2.0,d=2      114    0   76    0   24  1.00  108    6   73    3  24.6  0.95
 68   100  A  γ=2.0,d=2      114    0   76    0   22  1.00  108    6   73    3  23.0  0.95
 69  1000  A  γ=2.0,d=2      114    0   76    0   22  1.00  108    6   73    3  23.0  0.95
 70    10  A  γ=0.5,d=3      112    2   76    0   21  0.99  108    6   74    2  17.0  0.96
 71   100  A  γ=0.5,d=3      114    0   76    0   17  1.00  107    7   73    3  15.4  0.95
 72  1000  A  γ=0.5,d=3      114    0   76    0   17  1.00  107    7   73    3  15.4  0.95
 73    10  A  γ=1.0,d=3      114    0   76    0   20  1.00  107    7   74    2  20.4  0.95
 74   100  A  γ=1.0,d=3      114    0   76    0   20  1.00  107    7   74    2  20.4  0.95
 75  1000  A  γ=1.0,d=3      114    0   76    0   20  1.00  107    7   74    2  20.4  0.95
 76    10  A  γ=2.0,d=3      114    0   76    0   38  1.00  108    6   74    2  37.2  0.96
 77   100  A  γ=2.0,d=3      114    0   76    0   38  1.00  108    6   74    2  37.2  0.96
 78  1000  A  γ=2.0,d=3      114    0   76    0   38  1.00  108    6   74    2  37.2  0.96

a The table reports the experiment number Exp, capacity parameter C, kernel type K (linear L; polynomial P; radial basis function R; neural N; anova A), and corresponding parameters, calibration results (TPc, true positive in calibration; FNc, false negative in calibration; TNc, true negative in calibration; FPc, false positive in calibration; SVc, number of support vectors in calibration; ACc, calibration accuracy), and L20%O prediction results (TPp, true positive in prediction; FNp, false negative in prediction; TNp, true negative in prediction; FPp, false positive in prediction; SVp, average number of support vectors in prediction; ACp, prediction accuracy).

(class +1) and 40 reactive compounds (class -1), was taken from two recent studies.108,111 Four theoretical descriptors are used to discriminate between their mechanisms of action, namely the octanol–water partition coefficient log Kow, the energy of the highest occupied molecular orbital EHOMO, the energy of the lowest unoccupied molecular orbital ELUMO, and the average acceptor superdelocalizability SNav. The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure.

The best prediction statistics for each kernel type are presented here: linear, C = 1000, ACp = 0.86; polynomial, degree 2, C = 10, ACp = 0.92; RBF, C = 100, γ = 0.5, ACp = 0.83; neural, C = 10, a = 0.5, b = 0, ACp = 0.78; and anova, C = 10, γ = 0.5, d = 1, ACp = 0.87. These results indicate that a degree 2 polynomial kernel gives a good separation hyperplane between narcotic and reactive compounds. The neural and RBF kernels have worse predictions than does the linear SVM model, whereas the anova kernel has an ACp similar to that of the linear model.

Predicting the Mechanism of Action from Hydrophobicity and Experimental Toxicity

This exercise for classifying compounds according to their mechanism of action uses as input data the molecule's hydrophobicity and experimental toxicity against Pimephales promelas and Tetrahymena pyriformis.112 SVM classification was applied for a set of 337 organic compounds from eight MOA classes (126 nonpolar narcotics, 79 polar narcotics, 23 ester narcotics, 13 amine narcotics, 13 weak acid respiratory uncouplers, 69 electrophiles, 8 proelectrophiles, and 6 nucleophiles).113 The MOA classification was based on three indices taken from a QSAR study by Ren, Frymier, and Schultz,113 namely: log Kow, the octanol–water partition coefficient; log 1/IGC50, the 50% inhibitory growth concentration against Tetrahymena pyriformis; and log 1/LC50, the 50% lethal concentration against Pimephales promelas. The prediction power of each SVM model was evaluated with a leave-5%-out (L5%O) cross-validation procedure.

In the first test we used SVM models to discriminate between nonpolar narcotic compounds (chemicals that have baseline toxicity) and other compounds having excess toxicity (representing the following MOAs: polar narcotics, ester narcotics, amine narcotics, weak acid respiratory uncouplers, electrophiles, proelectrophiles, and nucleophiles). From the total set of 337 compounds, 126 represent the SVM class +1 (nonpolar narcotic) and 211 represent the SVM class -1 (all other MOA classes).

The best cross-validation results for each kernel type are presented in Table 9. The linear, polynomial, RBF, and anova kernels have similar results that are of reasonable quality, whereas the neural kernel has very bad statistics; the slight classification improvement obtained for the RBF and anova kernels is not statistically significant.


The chemicals exhibiting excess toxicity belong to seven MOA classes, and their toxicity varies over a wide range. It is therefore useful to separate them further into less-reactive and more-reactive compounds. In the second test, we developed SVM models that discriminate between less-reactive compounds (SVM class +1, formed by polar narcotics, ester narcotics, and amine narcotics) and more-reactive compounds (SVM class −1, formed by weak acid respiratory uncouplers, electrophiles, proelectrophiles, and nucleophiles). From the total of 211 compounds with excess toxicity, 115 are less-reactive and 96 are more-reactive compounds.

In Table 10, we show the best cross-validation results for each kernel type. The RBF kernel has the best predictions, followed by the linear SVM model. The remaining kernels give worse predictions than the linear model.

Classifying the Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons

Structure-activity relationships are valuable statistical models that can be used for predicting the carcinogenic potential of new chemicals and for interpreting short-term tests of genotoxicity, long-term tests of carcinogenicity in rodents, and epidemiological data. We show here an SVM application for identifying the carcinogenic activity of a group of methylated and nonmethylated polycyclic aromatic hydrocarbons (PAHs).114 The dataset consists of 32 PAHs and 46 methylated PAHs taken from the literature.115–118 From this set of 78 aromatic hydrocarbons, 34 are carcinogenic and 44 are noncarcinogenic. The carcinogenic activity was predicted by using the following four theoretical descriptors computed with the PM3 semiempirical method: energy of the highest occupied molecular orbital, EHOMO; energy of the lowest unoccupied molecular orbital, ELUMO; hardness HD, where HD = (ELUMO − EHOMO)/2; and the difference between EHOMO and EHOMO−1, denoted ΔH.117 The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure.

Table 9 SVM Classification of Nonpolar Narcotic Compounds (SVM class +1) From Other Compounds (SVM class −1) Using as Descriptors log Kow, log 1/IGC50, and log 1/LC50

Kernel                 TPc  FNc  TNc  FPc  SVc  ACc   TPp  FNp  TNp  FPp  SVp    ACp
L                       78   48  186   25  195  0.78   79   47  186   25  185.8  0.79
P, d = 2                81   45  185   26  176  0.79   82   44  184   27  165.8  0.79
R, γ = 1.0              97   29  190   21  172  0.85   89   37  180   31  165.1  0.80
N, a = 0.5, b = 0       75   51   98  113  158  0.51   49   77  130   81  152.1  0.53
A, γ = 0.5, d = 2       95   31  190   21  169  0.85   87   39  182   29  119.2  0.80

Table 10 SVM Classification of Less-Reactive Compounds and More-Reactive Compounds

Kernel                 TPc  FNc  TNc  FPc  SVc  ACc   TPp  FNp  TNp  FPp  SVp    ACp
L                       97   18   50   46  151  0.70   97   18   46   50  144.2  0.68
P, d = 2               101   14   38   58  154  0.66   97   18   38   58  144.0  0.64
R, γ = 2.0             107    8   77   19  141  0.87   94   21   55   41  133.2  0.71
N, a = 2, b = 1         71   44   51   45   91  0.58   64   51   59   37   90.5  0.58
A, γ = 0.5, d = 2      109    6   77   19  112  0.88   85   30   57   39  105.8  0.67

The best prediction statistics for each kernel type are presented here: linear, ACp = 0.76; polynomial, degree 2, ACp = 0.82; RBF, γ = 0.5, ACp = 0.86; neural, a = 2, b = 0, ACp = 0.66; and anova, γ = 0.5, d = 1, ACp = 0.84 (C = 10 for these SVM models). The relationship between the quantum indices and PAH carcinogenicity is nonlinear, as evidenced by the increase in prediction power when going from a linear to an RBF kernel.

Structure-Odor Relationships for Pyrazines

Various techniques of molecular design can significantly help fragrance researchers to find relationships between the chemical structure and the odor of organic compounds.119–124 A wide variety of structural descriptors (molecular fragments, topological indices, geometric descriptors, or quantum indices) and a broad selection of qualitative or quantitative statistical equations have been used to model and predict the aroma (and its intensity) for various classes of organic compounds. Besides providing an important guide for the synthesis of new fragrances, structure-odor relationships (SOR) offer a better understanding of the mechanism of odor perception.

We illustrate the application of support vector machines for aroma classification using as our example 98 tetra-substituted pyrazines (Figure 52) representing three odor classes, namely 32 green, 23 nutty, and 43 bell-pepper.125 The prediction power of each SVM model was evaluated with a leave-10%-out cross-validation procedure.126 This multiclass dataset was modeled with a one-versus-all approach.

In the first classification test, class +1 contained green aroma compounds and class −1 contained compounds with nutty or bell-pepper aroma. The best prediction statistics for each kernel type are linear, C = 10, ACp = 0.80; polynomial, degree 2, C = 1000, ACp = 0.86; RBF, C = 10, γ = 0.5, ACp = 0.79; neural, C = 10, a = 0.5, b = 0, ACp = 0.73; and anova, C = 100, γ = 0.5, d = 1, ACp = 0.84. A degree 2 polynomial kernel has the best prediction, followed by the anova kernel and the linear model.

Figure 52 General structure of pyrazines (pyrazine ring bearing the substituents R1–R4).

In the second test, class +1 contained compounds with nutty aroma, whereas the remaining pyrazines formed the class −1. The prediction statistics show a slight advantage for the anova kernel, whereas the linear, polynomial, and RBF kernels have identical results: linear, C = 10, ACp = 0.89; polynomial, degree 2, C = 10, ACp = 0.89; RBF, C = 10, γ = 0.5, ACp = 0.89; neural, C = 100, a = 0.5, b = 0, ACp = 0.79; and anova, C = 100, γ = 0.5, d = 1, ACp = 0.92.

Finally, compounds with bell-pepper aroma were considered to be in class +1, whereas green and nutty pyrazines formed the class −1. Three kernels (RBF, polynomial, and anova) give much better predictions than does the linear SVM classifier: linear, C = 10, ACp = 0.74; polynomial, degree 2, C = 10, ACp = 0.88; RBF, C = 10, γ = 0.5, ACp = 0.89; neural, C = 100, a = 2, b = 1, ACp = 0.68; and anova, C = 10, γ = 0.5, d = 1, ACp = 0.87. Note that the number of support vectors depends on the kernel type (linear, SV = 27; RBF, SV = 43; anova, SV = 31; all for training with all compounds), so for this structure-odor model, one might prefer the SVM model with a polynomial kernel, which is more compact, i.e., contains fewer support vectors.

In this section, we compared the prediction capabilities of five kernels, namely linear, polynomial, Gaussian radial basis function, neural, and anova. Several guidelines that might help the modeler obtain a predictive SVM model can be extracted from these results: (1) It is important to compare the predictions of a large number of kernels and combinations of parameters; (2) the linear kernel should be used as a reference to compare the results from nonlinear kernels; (3) some datasets can be separated with a linear hyperplane; in such instances, the use of a nonlinear kernel should be avoided; and (4) when the relationships between input data and class attribution are nonlinear, RBF kernels do not necessarily give the optimum SVM classifier.

PRACTICAL ASPECTS OF SVM REGRESSION

Support vector machines were initially developed for class discrimination, and most of their applications have been for pattern classification. SVM classification is especially relevant for important cheminformatics problems, such as recognizing drug-like compounds or discriminating between toxic and nontoxic compounds, and many such applications have been published. QSAR applications of SVM regression (SVMR), however, are rare, which is unfortunate because SVM regression represents a viable alternative to multiple linear regression (MLR), PLS, or neural networks. In this section, we present several SVMR applications to QSAR datasets, and we compare the performance of several kernels.

The SVM regression models we present below implement the following kernels: linear; polynomial (degree d = 2, 3, 4, 5); radial basis function, $K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ (γ = 0.25, 0.5, 1.0, 1.5, 2.0); neural (tanh), Eq. [68] (a = 0.5, 1.0, 2.0 and b = 0, 1, 2); and anova, Eq. [69] (γ = 0.25, 0.5, 1.0, 1.5, 2.0 and d = 1, 2, 3). All SVM models were computed with mySVM, by Ruping (http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/).
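The five kernels and the parameter grid listed above can be sketched in plain Python. Only the RBF formula is given in this section; the polynomial form (x·y + 1)^d, the neural form tanh(a x·y + b), and the anova form (Σ_k exp(−γ (x_k − y_k)^2))^d are commonly used definitions and are assumptions here, because Eqs. [68] and [69] are not reproduced in this part of the text. The function and variable names are illustrative and are not the mySVM interface.

```python
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    # Assumed form (x.y + 1)^d; the additive constant is an assumption
    return (dot(x, y) + 1.0) ** d


def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2), as given in the text
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, y)))


def neural_kernel(x, y, a=0.5, b=0.0):
    # Assumed form tanh(a * x.y + b)
    return math.tanh(a * dot(x, y) + b)


def anova_kernel(x, y, gamma=0.5, d=1):
    # Assumed form: sum over descriptors of exp(-gamma * (x_k - y_k)^2), raised to d
    return sum(math.exp(-gamma * (p - q) ** 2) for p, q in zip(x, y)) ** d


GAMMAS = (0.25, 0.5, 1.0, 1.5, 2.0)

# One entry per SVMR experiment, in the order used for the experiment numbers
grid = (
    [("linear", {})]
    + [("polynomial", {"d": d}) for d in (2, 3, 4, 5)]
    + [("rbf", {"gamma": g}) for g in GAMMAS]
    + [("neural", {"a": a, "b": b}) for b in (0, 1, 2) for a in (0.5, 1.0, 2.0)]
    + [("anova", {"gamma": g, "d": d}) for d in (1, 2, 3) for g in GAMMAS]
)

for number, (kernel, params) in enumerate(grid, start=1):
    print(number, kernel, params)
```

The grid contains 1 + 4 + 5 + 9 + 15 = 34 combinations, which corresponds to the 34 experiments reported for each regression dataset.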

SVM Regression QSAR for the Phenol Toxicity to Tetrahymena pyriformis

Aptula et al. used multiple linear regression to investigate the toxicity of 200 phenols to the ciliated protozoan Tetrahymena pyriformis.127 Using their MLR model, they then predicted the toxicity of another 50 phenols. Here we present a comparative study for the entire set of 250 phenols, using multiple linear regression, artificial neural networks, and SVM regression methods.128 Before computing the SVM model, the input vectors were scaled to zero mean and unit variance. The prediction power of the QSAR models was tested with complete cross-validation: leave-5%-out (L5%O), leave-10%-out (L10%O), leave-20%-out (L20%O), and leave-25%-out (L25%O). The capacity parameter C was optimized for each SVM model.
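Scaling to zero mean and unit variance can be sketched as follows. The function name and the descriptor values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def standardize(rows):
    """Scale each descriptor column of `rows` to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]  # population SD: an assumption
    scaled = [
        [(value - m) / s for value, m, s in zip(row, means, stdevs)]
        for row in rows
    ]
    return scaled, means, stdevs


# Hypothetical descriptor values: three compounds, two descriptors
X = [[1.0, 200.0], [2.0, 220.0], [3.0, 260.0]]
X_scaled, means, stdevs = standardize(X)
```

Without such scaling, descriptors with large numerical ranges (for example, the molecular weight) would dominate the distances and dot products that enter the kernel.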

Seven structural descriptors were used to model the 50% growth inhibition concentration, IGC50. These descriptors are log D, where D is the distribution coefficient (i.e., the octanol–water partition coefficient corrected for ionization); ELUMO, the energy of the lowest unoccupied molecular orbital; MW, the molecular weight; PNEG, the negatively charged molecular surface area; ABSQon, the sum of absolute charges on nitrogen and oxygen atoms; MaxHp, the largest positive charge on a hydrogen atom; and SsOH, the electrotopological state index for the hydroxy group. The MLR model has a calibration correlation coefficient of 0.806 and is stable to cross-validation experiments:

$$
\begin{aligned}
\mathrm{pIGC_{50}} ={}& -0.154\,(\pm 0.080) + 0.296\,(\pm 0.154)\,\log D - 0.352\,(\pm 0.183)\,E_{\mathrm{LUMO}} \\
& + 0.00361\,(\pm 0.00188)\,\mathrm{MW} - 0.0218\,(\pm 0.0113)\,P_{\mathrm{NEG}} - 0.446\,(\pm 0.232)\,\mathrm{ABSQ_{on}} \\
& + 1.993\,(\pm 1.037)\,\mathrm{MaxH_{p}} + 0.0265\,(\pm 0.0138)\,\mathrm{SsOH} \qquad [106]
\end{aligned}
$$

n = 250    rcal = 0.806    RMSEcal = 0.49    scal = 0.50    Fcal = 64

rLOO = 0.789      q2LOO = 0.622      RMSELOO = 0.51
rL5%O = 0.785     q2L5%O = 0.615     RMSEL5%O = 0.51
rL10%O = 0.786    q2L10%O = 0.617    RMSEL10%O = 0.51
rL20%O = 0.775    q2L20%O = 0.596    RMSEL20%O = 0.53
rL25%O = 0.788    q2L25%O = 0.620    RMSEL25%O = 0.51
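As a worked illustration of how Eq. [106] is applied, the sketch below evaluates the MLR model for one set of descriptor values. The coefficients are those of Eq. [106]; the descriptor values and all names in the code are hypothetical and do not correspond to any compound in the dataset.

```python
# Intercept and coefficients of Eq. [106]
EQ106_INTERCEPT = -0.154
EQ106_COEFFS = {
    "logD": 0.296,
    "ELUMO": -0.352,
    "MW": 0.00361,
    "PNEG": -0.0218,
    "ABSQon": -0.446,
    "MaxHp": 1.993,
    "SsOH": 0.0265,
}


def predict_pigc50(descriptors):
    """Evaluate Eq. [106] for a dict holding the seven descriptor values."""
    return EQ106_INTERCEPT + sum(
        coeff * descriptors[name] for name, coeff in EQ106_COEFFS.items()
    )


# Hypothetical descriptor values, for illustration only
hypothetical_phenol = {
    "logD": 2.0, "ELUMO": 0.3, "MW": 150.0, "PNEG": 20.0,
    "ABSQon": 0.5, "MaxHp": 0.2, "SsOH": 9.0,
}
print(round(predict_pigc50(hypothetical_phenol), 3))
```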

Based on the cross-validation statistics, the best ANN model has tanh functions for both hidden and output neurons, and it has only one hidden neuron. The statistics for this ANN are rcal = 0.824, RMSEcal = 0.47; rL5%O = 0.804, q2L5%O = 0.645, RMSEL5%O = 0.49; rL10%O = 0.805, q2L10%O = 0.647, RMSEL10%O = 0.49; rL20%O = 0.802, q2L20%O = 0.642, RMSEL20%O = 0.50; and rL25%O = 0.811, q2L25%O = 0.657, RMSEL25%O = 0.48. On the one hand, the ANN statistics are better than those obtained with MLR, indicating that there is some nonlinearity between pIGC50 and the seven structural descriptors. On the other hand, the prediction statistics for ANN models with two or more hidden neurons deteriorate, indicating that the dataset has a high level of noise or error.

The SVM regression results for the prediction of phenol toxicity to Tetrahymena pyriformis are presented in Tables 11 and 12. In calibration (fitting), the results are much better than those obtained with either MLR or ANN. In fact, several SVMR models have a perfect correlation, with rcal = 1. These include experiments 8–10 with the RBF kernel and experiments 27–29 and 31–34 with the anova kernel. As expected, the SVMR with a linear kernel has prediction correlation coefficients slightly higher than those of the MLR QSAR. However, the SVM regression models with nonlinear kernels have worse predictions. We compare here only rL25%O for all QSAR models (for each kernel type, only the best prediction statistics are given): MLR, 0.788; ANN, 0.811; SVMR linear, 0.790; SVMR polynomial, 0.762; SVMR RBF, 0.647; SVMR neural, 0.743; and SVMR anova, 0.671. The results presented here seem to indicate that SVM regression can fit the data, but the prediction is not reliable.

Table 11 Kernel Type and Corresponding Parameters for Each SVM Model(a)

Exp  Kernel       Copt    SVcal  rcal   RMSEcal  rL5%O  q2L5%O  RMSEL5%O
1    L            64.593  250    0.803  0.51     0.789   0.593  0.53
2    P 2          88.198  250    0.853  0.44     0.787   0.591  0.53
3    P 3          64.593  243    0.853  0.45     0.326  -3.921  1.83
4    P 4          73.609  248    0.993  0.09     0.047  <-100   >10
5    P 5          88.198  250    0.999  0.04     0.137  <-100   >10
6    R 0.25       88.198  250    0.983  0.15     0.694   0.330  0.68
7    R 0.5        88.198  250    0.996  0.08     0.660   0.303  0.69
8    R 1.0        88.198  250    1.000  0.01     0.668   0.428  0.63
9    R 1.5        88.198  250    1.000  0.00     0.659   0.433  0.62
10   R 2.0        64.593  250    1.000  0.00     0.636   0.400  0.64
11   N 0.5 0.0     0.024  250    0.748  0.56     0.743   0.536  0.56
12   N 1.0 0.0     0.016  250    0.714  0.59     0.722   0.506  0.58
13   N 2.0 0.0     0.016  250    0.673  0.61     0.696   0.474  0.60
14   N 0.5 1.0     0.020  250    0.691  0.60     0.709   0.483  0.59
15   N 1.0 1.0     0.012  250    0.723  0.61     0.706   0.468  0.60
16   N 2.0 1.0     0.015  248    0.688  0.61     0.678   0.440  0.62
17   N 0.5 2.0     0.020  250    0.642  0.64     0.614   0.374  0.65
18   N 1.0 2.0     0.015  250    0.703  0.62     0.670   0.429  0.62
19   N 2.0 2.0     0.012  250    0.695  0.62     0.586   0.343  0.67
20   A 0.25 1     88.198  250    0.842  0.46     0.718   0.433  0.62
21   A 0.5 1      88.198  250    0.857  0.43     0.708   0.414  0.63
22   A 1.0 1      88.198  250    0.868  0.42     0.680   0.348  0.67
23   A 1.5 1      88.198  250    0.880  0.40     0.674   0.323  0.68
24   A 2.0 1      88.198  250    0.884  0.40     0.688   0.360  0.66
25   A 0.25 2     88.198  250    0.977  0.18     0.531  -0.760  1.10
26   A 0.5 2      88.198  250    0.994  0.09     0.406  -1.595  1.33
27   A 1.0 2      88.198  250    1.000  0.01     0.436  -1.182  1.22
28   A 1.5 2      88.198  250    1.000  0.00     0.492  -0.512  1.02
29   A 2.0 2      64.593  250    1.000  0.00     0.542  -0.141  0.88
30   A 0.25 3     88.198  250    0.999  0.04     0.312  -4.199  1.89
31   A 0.5 3      64.593  250    1.000  0.00     0.506  -0.781  1.10
32   A 1.0 3      64.593  250    1.000  0.00     0.625   0.134  0.77
33   A 1.5 3      64.593  250    1.000  0.00     0.682   0.377  0.65
34   A 2.0 3      64.593  250    1.000  0.00     0.708   0.461  0.61

(a) Notations: Exp, experiment number; rcal, calibration correlation coefficient; RMSEcal, calibration root-mean-square error; rL5%O, leave-5%-out correlation coefficient; q2L5%O, leave-5%-out q2; RMSEL5%O, leave-5%-out root-mean-square error; L, linear kernel; P, polynomial kernel (parameter: degree d); R, radial basis function kernel (parameter: γ); N, neural kernel (parameters: a and b); and A, anova kernel (parameters: γ and d).

Table 12 Support Vector Regression Statistics for Leave-10%-out, Leave-20%-out, and Leave-25%-out Cross-validation Tests(a)

Exp  Kernel  rL10%O  q2L10%O  RMSEL10%O  rL20%O  q2L20%O  RMSEL20%O  rL25%O  q2L25%O  RMSEL25%O
1    L        0.789    0.593  0.53        0.786    0.588  0.53        0.790    0.589  0.53
2    P        0.784    0.586  0.53        0.746    0.495  0.59        0.762    0.501  0.58
3    P        0.316   -3.915  1.83        0.142  -14.254  3.23        0.116  -11.734  2.95
4    P       -0.008  <-100    >10        -0.055  <-100    >10         0.059  <-100    >10
5    P        0.035  <-100    >10         0.196  <-100    >10         0.069  <-100    >10
6    R        0.676    0.307  0.69        0.684    0.291  0.70        0.647    0.238  0.72
7    R        0.663    0.339  0.67        0.650    0.288  0.70        0.629    0.301  0.69
8    R        0.662    0.424  0.63        0.673    0.440  0.62        0.626    0.381  0.65
9    R        0.650    0.422  0.63        0.662    0.438  0.62        0.595    0.353  0.67
10   R        0.628    0.390  0.65        0.640    0.405  0.64        0.561    0.312  0.69
11   N        0.737    0.530  0.57        0.744    0.542  0.56        0.743    0.534  0.56
12   N        0.719    0.503  0.58        0.715    0.497  0.59        0.716    0.497  0.59
13   N        0.685    0.460  0.61        0.689    0.464  0.61        0.700    0.478  0.60
14   N        0.714    0.491  0.59        0.704    0.470  0.60        0.701    0.474  0.60
15   N        0.689    0.451  0.61        0.705    0.452  0.61        0.709    0.470  0.60
16   N        0.684    0.443  0.62        0.661    0.430  0.62        0.624    0.381  0.65
17   N        0.610    0.369  0.66        0.629    0.393  0.64        0.630    0.394  0.64
18   N        0.678    0.436  0.62        0.683    0.443  0.62        0.678    0.436  0.62
19   N        0.682    0.430  0.62        0.683    0.430  0.62        0.528    0.255  0.71
20   A        0.725    0.457  0.61        0.724    0.465  0.60        0.634    0.241  0.72
21   A        0.723    0.458  0.61        0.730    0.480  0.60        0.601    0.148  0.76
22   A        0.684    0.367  0.66        0.655    0.300  0.69        0.624    0.230  0.73
23   A        0.694    0.373  0.65        0.670    0.333  0.68        0.613    0.152  0.76
24   A        0.703    0.397  0.64        0.675    0.351  0.67        0.621    0.158  0.76
25   A        0.493   -0.877  1.13        0.423   -1.871  1.40        0.378   -1.626  1.34
26   A        0.351   -1.850  1.40        0.335   -2.174  1.47        0.366   -1.465  1.30
27   A        0.349   -1.390  1.28        0.404   -1.103  1.20        0.454   -0.798  1.11
28   A        0.471   -0.439  0.99        0.516   -0.285  0.94        0.523   -0.294  0.94
29   A        0.549   -0.057  0.85        0.577    0.023  0.82        0.569   -0.043  0.84
30   A        0.282   -4.289  1.90        0.354   -3.835  1.82        0.360   -3.149  1.68
31   A        0.449   -1.050  1.18        0.462   -1.040  1.18        0.528   -0.601  1.05
32   A        0.597    0.087  0.79        0.609    0.136  0.77        0.633    0.171  0.75
33   A        0.671    0.365  0.66        0.678    0.384  0.65        0.671    0.347  0.67
34   A        0.703    0.457  0.61        0.707    0.468  0.60        0.686    0.412  0.63

(a) Notations for the cross-validation statistical indices: rL10%O, q2L10%O, and RMSEL10%O for leave-10%-out; rL20%O, q2L20%O, and RMSEL20%O for leave-20%-out; and rL25%O, q2L25%O, and RMSEL25%O for leave-25%-out.
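The negative q2 values reported in Tables 11 and 12 can look surprising next to positive correlation coefficients. The sketch below shows how r, q2, and RMSE might be computed from pooled cross-validation predictions, assuming the common definition q2 = 1 − PRESS/SS, where SS is taken about the mean of the observed values; the exact definition used for the tables is not restated in this section. The function name and the numerical example are hypothetical.

```python
import math


def cv_statistics(y_obs, y_pred):
    """Return (r, q2, RMSE) for observed values and cross-validated predictions."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mean_pred = sum(y_pred) / n
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # predictive residual SS
    ss_obs = sum((o - mean_obs) ** 2 for o in y_obs)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred))
    r = cov / math.sqrt(ss_obs * ss_pred)
    q2 = 1.0 - press / ss_obs
    rmse = math.sqrt(press / n)
    return r, q2, rmse


# Hypothetical predictions that follow the trend perfectly but are shifted by 10 units
y_obs = [0.0, 1.0, 2.0, 3.0]
y_pred = [10.0, 11.0, 12.0, 13.0]
r, q2, rmse = cv_statistics(y_obs, y_pred)
print(r, q2, rmse)
```

Because r is insensitive to systematic offsets and scaling whereas q2 and RMSE penalize every residual, a model can keep a positive r while q2 becomes strongly negative, which is the pattern seen for the higher-degree polynomial and anova kernels.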

SVM Regression QSAR for Benzodiazepine Receptor Ligands

Benzodiazepine receptor (BzR) ligands (either benzodiazepines or structurally unrelated chemical compounds) act as modulators of γ-aminobutyric acid (GABA) binding to its receptor, by altering the transmembrane chloride ion conductance.129–131 Interest in developing new BzR ligands is stimulated by their ability to induce a wide spectrum of central nervous system effects, from full agonism through antagonism to inverse agonism.

In this exercise, we compare MLR and SVMR QSAR models for the benzodiazepine receptor affinity of 52 2-aryl(heteroaryl)-2,5-dihydropyrazolo[4,3-c]quinolin-3-(3H)-ones (Figure 53).132 Both models were developed with five structural descriptors, namely the Hammett electronic parameter σR′, the molar refractivity MRR8, the Sterimol parameter LR′4′, an indicator variable I (1 or 0) for 7-substituted compounds, and the Sterimol parameter B5R.130 The MLR model has a calibration correlation coefficient of 0.798 and fairly good prediction ability:

$$
\begin{aligned}
\log 1/\mathrm{IC_{50}} ={}& 11.538\,(\pm 2.869) - 2.320\,(\pm 0.577)\,\sigma_{R'} - 0.294\,(\pm 0.073)\,\mathrm{MR}_{R8} \\
& - 0.326\,(\pm 0.081)\,L_{R'4'} - 0.560\,(\pm 0.139)\,I - 1.795\,(\pm 0.446)\,\mathrm{B5}_{R} \qquad [107]
\end{aligned}
$$

n = 52    rcal = 0.798    RMSEcal = 0.69    scal = 0.73    Fcal = 16.18

rL5%O = 0.716     q2L5%O = 0.458     RMSEL5%O = 0.84
rL10%O = 0.711    q2L10%O = 0.448    RMSEL10%O = 0.85
rL20%O = 0.733    q2L20%O = 0.502    RMSEL20%O = 0.81
rL25%O = 0.712    q2L25%O = 0.470    RMSEL25%O = 0.83

Figure 53 General formula for the pyrazolo[4,3-c]quinolin-3-ones (ring positions 6–9 and 2′–4′ labeled; substituents R and R′).


In Table 13, we present the best regression predictions for each kernel. Despite the large number of SVMR experiments we carried out for this QSAR (34 in total), the cross-validation statistics of the SVM models are well below those obtained with MLR.

Table 13 Best SVMR Predictions for the Benzodiazepine Receptor Ligands QSAR(a)

Exp  Kernel  rL5%O   q2L5%O  RMSEL5%O  rL10%O  q2L10%O  RMSEL10%O
1    L        0.667   0.261  0.98       0.672   0.273   0.97
2    P       -0.270  <-100   >10       -0.265  <-100    >10
6    R        0.665   0.368  0.91       0.676   0.370   0.91
12   N        0.665   0.416  0.87       0.659   0.411   0.87
24   A        0.641   0.293  0.96       0.653   0.339   0.93

Exp  Kernel  rL20%O  q2L20%O  RMSEL20%O  rL25%O  q2L25%O  RMSEL25%O
1    L        0.674   0.228   1.00        0.667   0.189   1.03
2    P       -0.277  <-100    >10         0.332  <-100    >10
6    R        0.633   0.317   0.94        0.680   0.344   0.92
12   N        0.670   0.418   0.87        0.691   0.432   0.86
24   A        0.632   0.155   1.05        0.675   0.376   0.90

(a) See Table 11 for the parameters of each kernel.

SVM Regression QSAR for the Toxicity of Aromatic Compounds to Chlorella vulgaris

Toxicity testing on model systems, short-term assays, and predictions from quantitative structure-toxicity models are inexpensive and fast methods to screen and prioritize chemicals for more elaborate, more expensive, and time-consuming toxicity evaluations. A novel short-term toxicity assay using the unicellular green alga Chlorella vulgaris was proposed recently.133–136 That assay used fluorescein diacetate, which is metabolized to fluorescein. The appearance of fluorescein, after 15 minutes of exposure to the toxicant, was measured fluorimetrically. The concentration causing a 50% decrease in fluorescence, EC50 (mM), as compared with a control, was determined for 65 aromatic compounds.133 These experimental data were used to develop a quantitative structure-toxicity model with four structural descriptors: log Kow, octanol–water partition coefficient; ELUMO, AM1 energy of the lowest unoccupied molecular orbital; Amax, maximum acceptor superdelocalizability; and 0χv, Kier–Hall connectivity index. QSAR models for predicting the toxicity of aromatic compounds to Chlorella vulgaris, using MLR, ANN, and SVMR, have been evaluated.137 The MLR results, presented below, show that these four descriptors are useful in predicting log 1/EC50:

$$
\begin{aligned}
\log 1/\mathrm{EC_{50}} ={}& -4.483\,(\pm 0.461) + 0.5196\,(\pm 0.0534)\,\log K_{\mathrm{ow}} \\
& - 0.3425\,(\pm 0.0352)\,E_{\mathrm{LUMO}} + 7.260\,(\pm 0.746)\,A_{\max} \\
& + 0.1375\,(\pm 0.0141)\,{}^{0}\chi^{v} \qquad [108]
\end{aligned}
$$

n = 65    rcal = 0.929    RMSEcal = 0.39    scal = 0.40    Fcal = 94.76

rL5%O = 0.910     q2L5%O = 0.826     RMSEL5%O = 0.44
rL10%O = 0.909    q2L10%O = 0.826    RMSEL10%O = 0.44
rL20%O = 0.910    q2L20%O = 0.828    RMSEL20%O = 0.43
rL25%O = 0.919    q2L25%O = 0.844    RMSEL25%O = 0.41

The ANN cross-validation results show that using more than one hidden neuron results in decreased predictive power. Because it is very time consuming, the LOO procedure was not tested with ANN. Therefore, as the best model for the neural network QSAR, we selected the one with a single hidden neuron and with tanh functions for both hidden and output neurons: rcal = 0.934, RMSEcal = 0.38; rL5%O = 0.906, q2L5%O = 0.820, RMSEL5%O = 0.44; rL10%O = 0.910, q2L10%O = 0.828, RMSEL10%O = 0.43; rL20%O = 0.909, q2L20%O = 0.824, RMSEL20%O = 0.44; and rL25%O = 0.917, q2L25%O = 0.840, RMSEL25%O = 0.42. The neural network does not improve the prediction of log 1/EC50 compared with the MLR model.

The best SVM regression results for each kernel are given in Table 14. By comparing the results from MLR, ANN, and SVMR, we find that no clear trend exists when the prediction difficulty increases. For each prediction test, the top three models are, in decreasing order, (1) LOO: SVM anova, SVM linear, MLR; (2) L5%O: SVM linear, MLR, SVM neural; (3) L10%O: SVM linear, ANN, MLR, and SVM anova; (4) L20%O: SVM anova, SVM linear, MLR; and (5) L25%O: MLR, ANN, SVM linear. The trends in the prediction statistics are sometimes counterintuitive. In the L20%O test, SVM anova has r = 0.921 and MLR has r = 0.910, whereas in the L25%O test, SVM anova decreases to r = 0.873 and MLR increases to r = 0.919. The prediction behavior identified here might explain the success of ensemble methods, which usually surpass the performances of individual models. Note that by using a single prediction (cross-validation) test, one might end up with a misleading ranking of QSAR models; using multiple cross-validation tests is thus advised.

Table 14 Best SVMR Predictions for the Toxicity QSAR of Aromatic Compounds to Chlorella vulgaris

Exp  Kernel       Copt    SVcal  rcal   RMSEcal  rLOO   q2LOO  RMSELOO
1    L            39.925  65     0.927  0.40     0.920  0.841  0.42
2    P 2          88.198  65     0.929  0.39     0.873  0.749  0.52
6    R 0.25       88.198  65     0.990  0.15     0.766  0.552  0.70
11   N 0.5 0.0     0.057  65     0.915  0.46     0.912  0.800  0.47
20   A 0.25 1     88.198  65     0.953  0.32     0.921  0.846  0.41

Exp  Kernel  rL5%O  q2L5%O  RMSEL5%O  rL10%O  q2L10%O  RMSEL10%O
1    L       0.920  0.843   0.41      0.915   0.833    0.43
2    P       0.832  0.641   0.63      0.835   0.635    0.63
6    R       0.818  0.645   0.62      0.788   0.567    0.69
11   N       0.909  0.796   0.47      0.907   0.793    0.48
20   A       0.894  0.797   0.47      0.909   0.826    0.44

Exp  Kernel  rL20%O  q2L20%O  RMSEL20%O  rL25%O  q2L25%O  RMSEL25%O
1    L       0.916   0.837    0.42       0.917   0.837    0.42
2    P       0.850   0.674    0.60       0.781   0.442    0.78
6    R       0.775   0.530    0.72       0.807   0.587    0.67
11   N       0.883   0.748    0.52       0.899   0.783    0.49
20   A       0.921   0.840    0.42       0.873   0.745    0.53
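The dependence of the model ranking on the cross-validation test can be reproduced directly from the correlation coefficients reported for the Chlorella vulgaris QSAR (MLR and ANN from the text, SVMR from Table 14). The sketch below is illustrative; the dictionary layout and function name are assumptions, and the polynomial and RBF kernels are omitted because they never reach the top three.

```python
# Cross-validation correlation coefficients r for the Chlorella vulgaris QSAR
r_values = {
    "MLR":        {"L5%O": 0.910, "L10%O": 0.909, "L20%O": 0.910, "L25%O": 0.919},
    "ANN":        {"L5%O": 0.906, "L10%O": 0.910, "L20%O": 0.909, "L25%O": 0.917},
    "SVM linear": {"L5%O": 0.920, "L10%O": 0.915, "L20%O": 0.916, "L25%O": 0.917},
    "SVM neural": {"L5%O": 0.909, "L10%O": 0.907, "L20%O": 0.883, "L25%O": 0.899},
    "SVM anova":  {"L5%O": 0.894, "L10%O": 0.909, "L20%O": 0.921, "L25%O": 0.873},
}


def rank_models(test):
    """Model names sorted by decreasing r for one cross-validation test."""
    return sorted(r_values, key=lambda model: r_values[model][test], reverse=True)


for test in ("L5%O", "L10%O", "L20%O", "L25%O"):
    print(test, rank_models(test)[:3])
```

SVM anova is ranked first in the L20%O test and last among these five models in the L25%O test, which illustrates why a single cross-validation test can give a misleading ranking.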

SVM Regression QSAR for Bioconcentration Factors

During the bioconcentration process, chemical compounds accumulate in, for example, fish, by absorption through the skin or the respiratory surface. The steady-state ratio between the concentration of a compound in an aquatic organism and its concentration in the aquatic environment defines the bioconcentration factor (BCF). To determine the environmental fate of chemicals released from industrial, agricultural, or residential sources, it is essential to determine or predict their BCF. Because the experimental determination of BCF is time consuming and expensive, various QSAR models have been developed for BCF prediction using structural descriptors.138–143 Gramatica and Papa proposed a BCF QSAR model for 238 diverse chemicals, based on five theoretical descriptors, namely: VIMD,deg, mean information content of the distance degree magnitude; MATS2m, Moran autocorrelation of a topological structure; GATS2e, Geary autocorrelation; H6p, H autocorrelation; and nHAcc, number of H-bond acceptors.144 We used this dataset of 238 compounds and those five descriptors to compare MLR and SVMR models.145 The cross-validation statistics of the MLR model are close to the calibration statistics, indicating that the model is stable and gives good predictions:

$$
\begin{aligned}
\mathrm{BCF} ={}& -18.291\,(\pm 4.727) + 1.867\,(\pm 0.483)\,\mathrm{VIM_{D,deg}} \\
& + 15.813\,(\pm 4.087)\,\mathrm{MATS2m} - 0.356\,(\pm 0.092)\,\mathrm{GATS2e} \\
& - 2.204\,(\pm 0.570)\,\mathrm{H6p} - 0.488\,(\pm 0.126)\,\mathrm{nHAcc} \qquad [109]
\end{aligned}
$$

n = 238    rcal = 0.910    RMSEcal = 0.57    scal = 0.57    Fcal = 223.04

rL5%O = 0.906     q2L5%O = 0.820     RMSEL5%O = 0.58
rL10%O = 0.905    q2L10%O = 0.820    RMSEL10%O = 0.58
rL20%O = 0.904    q2L20%O = 0.816    RMSEL20%O = 0.59
rL25%O = 0.906    q2L25%O = 0.821    RMSEL25%O = 0.58


Table 15 contains the best SVM regression results for each kernel. The cross-validation results show that the correlation coefficient decreases in the following order of kernels: linear > degree 2 polynomial > neural > RBF > anova. The MLR and SVMR linear models are very similar, and both are significantly better than the SVM models obtained with nonlinear kernels. The inability of nonlinear models to outperform the linear ones can be attributed to the large experimental errors in determining BCF.

SVM regression is a relatively novel addition to the field of QSAR, and its potential has not yet been sufficiently explored. In this pedagogically driven chapter, we have presented four QSAR applications in which we compared the performances of five kernels with models obtained with MLR and ANN. In general, SVM regression did not surpass the predictive ability of either MLR or ANN, and the predictions of nonlinear kernels were worse than those obtained with the linear kernel. Several levels of cross-validation are necessary to confirm the prediction stability; in particular, the QSAR for Chlorella vulgaris toxicity shows a different ranking for these methods, depending on the cross-validation test. The statistics of the QSAR models depend on the kernel type and parameters, and SVM regression gives in some cases unexpectedly low prediction statistics. Another problem with nonlinear kernels is overfitting, which was found in all four QSAR experiments. For this tutorial we also experimented with ANNs having different output transfer functions (linear, symlog, and tanh; data not shown). When the number of hidden neurons was kept low, all ANN results were consistently good, unlike SVM regression, which shows a wide and unpredictable variation with the kernel type and parameters.

Table 15 Best SVMR Predictions for the Bioconcentration Factors of 238 Diverse Organic Compounds

Exp  Kernel       Copt    SVcal  rcal   RMSEcal  rL5%O  q2L5%O  RMSEL5%O
1    L            88.198  238    0.909  0.57     0.907  0.822   0.58
2    P 2          73.609  238    0.921  0.54     0.891  0.782   0.64
6    R 0.25       88.198  238    0.976  0.30     0.866  0.714   0.73
11   N 0.5 0.0     0.026  238    0.886  0.69     0.883  0.750   0.68
20   A 0.25 1     88.198  238    0.923  0.53     0.842  0.664   0.79

Exp  Kernel  rL10%O  q2L10%O  RMSEL10%O  rL20%O  q2L20%O  RMSEL20%O  rL25%O  q2L25%O  RMSEL25%O
1    L       0.907   0.822    0.58       0.907   0.821    0.58       0.906   0.819    0.58
2    P       0.896   0.791    0.62       0.887   0.775    0.65       0.881   0.758    0.67
6    R       0.868   0.718    0.73       0.856   0.674    0.78       0.860   0.704    0.74
11   N       0.886   0.750    0.68       0.891   0.763    0.67       0.876   0.751    0.68
20   A       0.857   0.700    0.75       0.851   0.682    0.77       0.827   0.626    0.84



The fact that the linear kernel gives better results than nonlinear kernels for certain QSAR problems is documented in the literature. Yang et al. compared linear, polynomial, and RBF kernels for the following properties of alkyl benzenes: boiling point, enthalpy of vaporization at the boiling point, critical temperature, critical pressure, and critical volume.146 A LOO test showed that the first four properties were predicted best with a linear kernel, whereas critical volume was predicted best with a polynomial kernel.

REVIEW OF SVM APPLICATIONS IN CHEMISTRY

A rich literature exists on the topic of chemical applications of support vector machines. These publications usually deal with classification, but some interesting results have also been obtained with SVM regression. An important issue is the evaluation of the SVM capabilities. Accordingly, many papers contain comparisons with other pattern recognition algorithms. Equally important is the assessment of the various parameters and kernels that can give rise to the best SVM model for a particular problem. In this section, we present a selection of published SVM applications in chemistry that focus on drug design and classification of chemical compounds, SAR and QSAR, genotoxicity of chemical compounds, chemometrics, sensors, chemical engineering, and text mining for scientific information.

Recognition of Chemical Classes and Drug Design

A test in which kinase inhibitors were discriminated from noninhibitors was used by Briem and Gunther to compare the prediction performances of several machine learning methods.85 The learning set consisted of 565 kinase inhibitors and 7194 inactive compounds, and the validation was performed with a test set consisting of 204 kinase inhibitors and 300 inactive compounds. The structure of the chemical compounds was encoded into a numerical form by using Ghose–Crippen atom type counts. Four classification methods were used: SVM with a Gaussian RBF kernel, artificial neural networks, k-nearest neighbors with genetic algorithm parameter optimization, and recursive partitioning (RP). All four methods were able to discriminate kinase inhibitors from noninhibitors, but with slight differences in the predictive power of the models. The average test accuracy for 13 experiments indicates that SVMs give the best predictions for the test set: SVM 0.88, k-NN 0.84, ANN 0.80, RP 0.79. The results for a majority vote of a jury of 13 experiments show that SVM and ANN had identical test accuracy: SVM 0.88, k-NN 0.87, ANN 0.88, and RP 0.83.

Muller et al. investigated several machine learning algorithms for their ability to identify drug-like compounds based on a set of atom type counts.147 Five machine learning procedures were investigated: SVM with polynomial and Gaussian RBF kernels, linear programming machines, linear discriminant analysis, bagged k-nearest neighbors, and bagged decision trees C4.5. Drug-like compounds were selected from the World Drug Index (WDI), whereas non-drug compounds were selected from the Available Chemicals Directory (ACD), giving a total of 207,001 compounds. The chemical structure was represented with the counts of Ghose–Crippen atom types. The test error for discriminating drug-like from non-drug compounds shows that the two SVM models give the best results: SVM RBF 6.8% error, SVM linear 7.0% error, C4.5 8.2% error, and k-NN 8.2% error.

Jorissen and Gilson applied SVM to in silico screening of molecular databases for compounds possessing a desired activity.148 Structural descriptors were computed with Dragon, and the parameters of the SVM (with a Gaussian RBF kernel) were optimized through a cross-validation procedure. Five sets of 50 diverse inhibitors were collected from the literature. The active compounds are antagonists of the α1A adrenoceptor and reversible inhibitors of cyclin-dependent kinase, cyclooxygenase-2, factor Xa, and phosphodiesterase-5. The nonactive group of compounds was selected from the National Cancer Institute diversity set of chemical compounds. Based on the enrichment factors computed for the five sets of active compounds, it was found that SVM can successfully identify active compounds and discriminate them from nonactive chemicals.

Yap and Chen developed a jury SVM method for the classification of inhibitors and substrates of cytochromes P450 3A4 (CYP3A4, 241 inhibitors and 368 substrates), 2D6 (CYP2D6, 180 inhibitors and 198 substrates), and 2C9 (CYP2C9, 167 inhibitors and 144 substrates).86 Structural descriptors computed with Dragon were selected with a genetic algorithm procedure and a L10%O or L20%O SVM cross-validation. Two jury SVM algorithms were applied. The first is the positive majority consensus SVM (PM-CSVM), and the second is the positive probability consensus SVM (PP-CSVM). PM-CSVM classifies a compound based on the vote of the majority of its SVM models, whereas PP-CSVM explicitly computes the probability for a compound being in a certain class. Several tests performed by Yap and Chen showed that at least 81 SVM models are necessary in each ensemble. Both PM-CSVM and PP-CSVM were shown to be superior to a single SVM model (Matthews correlation coefficient for CYP2D6, MCC = 0.742 for single SVM, MCC = 0.802 for PM-CSVM, and MCC = 0.821 for PP-CSVM). Because PP-CSVM appears to outperform PM-CSVM, the final classification results were generated with PP-CSVM: MCC = 0.899 for CYP3A4, MCC = 0.884 for CYP2D6, and MCC = 0.872 for CYP2C9.
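The idea behind a majority-vote jury, and the Matthews correlation coefficient used to evaluate it, can be sketched as follows. This is a generic illustration, not the PM-CSVM implementation of Yap and Chen: the voting rule shown (a compound is assigned to class +1 when more than half of the models predict +1) is an assumption, and the votes and true classes are hypothetical. The MCC formula is the standard one.

```python
import math


def majority_vote(votes):
    """Consensus class (+1 or -1) from the +1/-1 predictions of individual models."""
    positives = sum(1 for v in votes if v == +1)
    return +1 if positives > len(votes) / 2 else -1


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical votes of three models (columns) for four compounds (rows)
votes_per_compound = [[+1, +1, -1], [-1, -1, +1], [+1, +1, +1], [-1, +1, -1]]
true_classes = [+1, -1, +1, +1]

consensus = [majority_vote(v) for v in votes_per_compound]
tp = sum(1 for c, t in zip(consensus, true_classes) if c == +1 and t == +1)
fn = sum(1 for c, t in zip(consensus, true_classes) if c == -1 and t == +1)
tn = sum(1 for c, t in zip(consensus, true_classes) if c == -1 and t == -1)
fp = sum(1 for c, t in zip(consensus, true_classes) if c == +1 and t == -1)
print(consensus, round(mcc(tp, fn, tn, fp), 3))
```

Using an odd number of models, such as the 81 mentioned above, avoids tied votes.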

Arimoto, Prasad, and Gifford compared five machine learning methods (recursive partitioning, naïve Bayesian classifier, logistic regression, k-nearest neighbors, and SVM) for their ability to discriminate between inhibitors (IC50 < 3 μM) and noninhibitors (IC50 > 3 μM) of cytochrome P450 3A4.149 The dataset of 4470 compounds was characterized with four sets of descriptors: MAKEBITS BCI fingerprints (4096 descriptors), MACCS fingerprints (166 descriptors), MOE TGT (typed graph triangle) fingerprints (13608 descriptors), and MolconnZ topological indices (156 descriptors). The models were evaluated with L10%O cross-validation and with a test set of 470 compounds (179 inhibitors and 291 noninhibitors). The most predictive models are the BCI fingerprints/SVM, which correctly classified 135 inhibitors and 249 noninhibitors; the MACCS fingerprints/SVM, which correctly classified 137 inhibitors and 248 noninhibitors; and topological indices/recursive partitioning, which correctly classified 147 inhibitors and 236 noninhibitors. A consensus of these three models slightly increased the accuracy to 83% compared to individual classification models.

Svetnik et al. performed a large-scale evaluation of the stochastic gradient boosting method (SGB), which implements a jury of classification and regression trees.150 SGB was compared with a single decision tree, a random forest, partial least squares, k-nearest neighbors, naïve Bayes, and SVM with linear and RBF kernels. For the 10 QSAR datasets that were used for these tests we indicate here the best two methods for each QSAR, as determined by the prediction statistics (mean for 10 cross-validation experiments): blood-brain barrier (180 active compounds, 145 non-active compounds), random forest AC = 0.806 and SGB AC = 0.789; estrogen receptor binding activity (131 binding compounds, 101 non-binding compounds), random forest AC = 0.827 and SGB AC = 0.824; P-glycoprotein (P-gp) transport activity (108 P-gp substrates, 78 P-gp non-substrates), random forest AC = 0.804 and PLS AC = 0.798; multidrug resistance reversal agents (298 active compounds, 230 non-active compounds), random forest AC = 0.831 and SGB AC = 0.826; cyclin-dependent kinase 2 antagonists (361 active compounds, 10579 inactive compounds), random forest AC = 0.747 and SVM RBF AC = 0.723; binding affinity for the dopamine D2 receptor (116 disubstituted piperidines), random forest $q^2 = 0.477$ and PLS $q^2 = 0.454$; log D (11260 compounds), SGB $q^2 = 0.860$ and SVM linear $q^2 = 0.841$; binding to unspecified channel protein (23102 compounds), SVM linear $q^2 = 0.843$ and SVM RBF $q^2 = 0.525$; cyclooxygenase-2 inhibitors (314 compounds for regression; and 153 active compounds and 161 non-active compounds for classification), regression random forest $q^2 = 0.434$ and SGB $q^2 = 0.390$, and classification SGB AC = 0.789 and SVM linear AC = 0.774. The study shows that jury methods are generally superior to single models.

Induction of torsade de pointes (TdP) is an important adverse drug reaction. TdP accounts for almost one third of all drug failures during drug development and has resulted in several drugs being withdrawn from the market. Yap et al. developed an SVM classification model to predict the TdP potential of drug candidates.151 The drugs that induce TdP were collected from the literature (204 for training and 39 for prediction), whereas drugs with no reported cases of TdP in humans were selected as the non-active compounds (204 for training and 39 for prediction). The molecular structure for each molecule was characterized with the linear solvation energy relationship (LSER) descriptors. The prediction accuracy is 91.0% for SVM with Gaussian RBF kernel, 88.5% for k-NN, 78.2% for probabilistic neural networks, and 65.4% for the C4.5 decision tree, thus illustrating the good results of support vector machines classification.

HERG (human ether-a-go-go) potassium channel inhibitors can lead to a prolongation of the QT interval that can trigger TdP, an atypical ventricular tachycardia. Tobita, Nishikawa, and Nagashima developed an SVM classifier that can discriminate between high and low HERG potassium channel inhibitors.152 The IC50 values for 73 drugs were collected from the literature, and two thresholds were used by those authors to separate high and low inhibitors, namely pIC50 = 4.4 (58 active and 15 non-active compounds) and pIC50 = 6.0 (28 active and 45 non-active compounds). The chemical structure of each molecule was represented by 57 2D MOE descriptors and 51 molecular fragments representing a subset of the public 166-bit MACCS key set. The classification accuracy for L10%O cross-validation was 95% for pIC50 = 4.4 and 90% for pIC50 = 6.0, again showing the utility of SVM for classification.
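
To relate the two thresholds to concentrations, recall that pIC50 is the negative decimal logarithm of IC50 expressed in mol/L. The sketch below converts between the two scales and labels a compound against a chosen threshold. The function names are hypothetical, and treating a value exactly at the threshold as "high" is an assumption that the text does not settle.

```python
import math

def pic50_from_ic50_molar(ic50_molar):
    """pIC50 = -log10(IC50), with IC50 in mol/L."""
    return -math.log10(ic50_molar)

def ic50_micromolar_from_pic50(pic50):
    """Inverse conversion, returning IC50 in micromol/L."""
    return 10 ** (-pic50) * 1e6

def inhibition_class(pic50, threshold):
    """Label a compound relative to a pIC50 threshold (boundary handling assumed)."""
    return "high" if pic50 >= threshold else "low"

for threshold in (4.4, 6.0):
    print(threshold, round(ic50_micromolar_from_pic50(threshold), 2), "micromolar")
```

A threshold of pIC50 = 6.0 corresponds to IC50 = 1 μM and pIC50 = 4.4 to roughly 40 μM, so the same 73 drugs split very differently under the two definitions.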

Xue et al. investigated the application of recursive feature elimination for the three following classification tests: P-glycoprotein substrates (116 substrates and 85 non-substrates), human intestinal absorption (131 absorbable compounds and 65 non-absorbable compounds), and compounds that cause torsade de pointes (85 TdP inducing compounds and 276 non-TdP inducing compounds).69 With the exception of TdP compounds, the recursive feature elimination significantly increases the prediction power of SVM classifiers with a Gaussian RBF kernel. The accuracy (AC) and Matthews correlation coefficient (MCC) for SVM alone and for SVM plus recursive feature elimination (SVM+RFE) using a L20%O cross-validation test demonstrate the importance of eliminating ineffective descriptors: P-glycoprotein substrates, SVM AC = 68.3% and MCC = 0.37, SVM+RFE AC = 79.4% and MCC = 0.59; human intestinal absorption, SVM AC = 77.0% and MCC = 0.48, SVM+RFE AC = 86.7% and MCC = 0.70; torsade de pointes inducing compounds, SVM AC = 82.0% and MCC = 0.48, and SVM+RFE AC = 83.9% and MCC = 0.56.
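
Both statistics quoted above follow from the four entries of a two-class confusion matrix. The sketch below uses the standard definitions of accuracy and of the Matthews correlation coefficient; the counts in the example are hypothetical and are not taken from the study of Xue et al.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a marginal sum is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))
print(round(matthews_cc(tp=40, tn=45, fp=5, fn=10), 3))
```

Unlike accuracy, MCC ranges from -1 to +1 and stays informative for unbalanced classes such as the TdP set (85 versus 276 compounds), where a high accuracy can coexist with a modest MCC.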

Selecting an optimum group of descriptors is both an important and time-consuming phase in developing a predictive QSAR model. Fröhlich, Wegner, and Zell introduced the incremental regularized risk minimization procedure for SVM classification and regression models, and they compared it with recursive feature elimination and with the mutual information procedure.70 Their first experiment considered 164 compounds that had been tested for their human intestinal absorption, whereas the second experiment modeled the aqueous solubility prediction for 1297 compounds. Structural descriptors were computed by those authors with JOELib and MOE, and full cross-validation was performed to compare the descriptor selection methods. The incremental regularized risk minimization procedure gave slightly better results than did the recursive feature elimination.

Sorich et al. proposed in silico models to predict chemical glucuronidation based on three global descriptors (equalized electronegativity, molecular hardness, and molecular softness) and three local descriptors (atomic charge, Fukui function, and atomic softness), all based on the electronegativity equalization method (EEM).153 The metabolism of chemical compounds by 12 human UDP-glucuronosyltransferase (UGT) isoforms was modeled with a combined approach referred to as cluster–GA–PLSDA (cluster analysis–genetic algorithm–partial least-squares discriminant analysis) and with ν-SVM with an RBF kernel. Groups containing between 50 and 250 substrates and nonsubstrates for each of the 12 UGT isoforms were used to generate the classification models. The average percentage of correctly predicted chemicals for all isoforms is 78% for SVM and 73% for cluster–GA–PLSDA. By combining the EEM descriptors with 2-D descriptors, the SVM average percentage of correctly predicted chemicals increases to 84%.

Jury methods can increase the prediction performances of the individual models that are aggregated in the ensemble. Merkwirth et al. investigated the use of k-NN, SVM, and ridge regression for drug-like compound identification.84 The first test of their jury approach involved 902 compounds from high-throughput screening experiments that were classified as "frequent hitters" (479 compounds) and "non-frequent hitters" (423 compounds), each of which was characterized by 1814 structural descriptors. The second test consisted of inhibitors of the cytochrome P450 3A4 (CYP3A4), which were divided into a group of low inhibitors (186 compounds with IC50 < 1 μM) and another group of high inhibitors (224 compounds with IC50 > 1 μM). Their cross-validation statistics show that SVM models (single and in a jury of 15 models) are the best classifiers, as can be seen from the values of the Matthews correlation coefficient: for frequent hitters, SVM 0.91 and jury-SVM 0.92; for CYP3A4, SVM and jury-SVM 0.88. Both SVM and jury-SVM classifiers were obtained by using all structural descriptors, which gave better results than models obtained when using only selected input descriptors. Overall, this approach to jury prediction does not provide any significant advantage over single SVM classifiers.

Five methods of feature selection (information gain, mutual information, $\chi^2$-test, odds ratio, and GSS coefficient) were compared by Liu for their ability to discriminate between thrombin inhibitors and noninhibitors.71 The chemical compounds were provided by DuPont Pharmaceutical Research Laboratories as a learning set of 1909 compounds containing 42 inhibitors and 1867 noninhibitors, and a test set of 634 compounds containing 150 inhibitors and 484 noninhibitors. Each compound was characterized by 139,351 binary features describing its 3-D structure. In this comparison of naïve Bayesian and SVM classifiers, all compounds were considered together, and a L10%O cross-validation procedure was applied. Based on information gain descriptor selection, SVM was robust to a 99% reduction of the descriptor space, with a small decrease in sensitivity (from 58.7% to 52.5%) and specificity (from 98.4% to 97.2%).

Byvatov and Schneider compared the SVM-based and the Kolmogorov–Smirnov feature selection methods to characterize ligand–receptor interactions in focused compound libraries.72 Three datasets were used to compare the feature selection algorithms: 226 kinase inhibitors and 4479 noninhibitors; 227 factor Xa inhibitors and 4478 noninhibitors; and 227 factor Xa inhibitors and 195 thrombin inhibitors. SVM classifiers with a degree 5 polynomial kernel were used for all computations, and the molecular structure was encoded into 182 MOE descriptors and 225 topological pharmacophores. In one test, both feature selection algorithms produced comparable results, whereas in all other cases, SVM-based feature selection had better predictions.

Finally, we highlight the work of Zernov et al., who tested the ability of SVM to discriminate between active and inactive compounds from three libraries.154 The first test evaluated the discrimination between drug-like and non-drug compounds. The learning set contained 15,000 compounds (7465 drugs and 7535 non-drugs), and the test set had 7500 compounds (3751 drugs and 3749 non-drugs). The test set accuracy for SVM with an RBF kernel (75.15%) was slightly lower than that of ANN (75.52%). The second experiment evaluated the discrimination between agrochemical and non-agrochemical compounds, and the third evaluated the discrimination between low and high carbonic anhydrase II inhibitors. In both of these tests, SVM classifiers had the lowest number of errors.

QSAR

SVM classification and regression were used to model the potency of diverse drug-like compounds to inhibit the human cytochromes P450 3A4 (CYP3A4) and 2D6 (CYP2D6).155 The dataset consisted of 1345 CYP3A4 and 1187 CYP2D6 compounds tested for the 50% inhibition (IC50) of the corresponding enzyme. The SVM models were trained with the Gaussian RBF kernel, and the one-versus-one technique was used for multiclass classification. For SVM classification, the datasets were partitioned into three groups: strong inhibitors, consisting of compounds with IC50 < 2 μM (243 CYP3A4 inhibitors and 182 CYP2D6 inhibitors); medium inhibitors, consisting of those compounds with IC50 between 2 and 20 μM (559 CYP3A4 inhibitors and 397 CYP2D6 inhibitors); and weak inhibitors, consisting of compounds with IC50 > 20 μM (543 CYP3A4 inhibitors and 608 CYP2D6 inhibitors). Four sets of structural descriptors were used to train the SVM models: in-house 2-D descriptors, such as atom and ring counts; MOE 2-D descriptors, such as topological indices and pharmacophores; VolSurf descriptors, based on molecular interaction fields; and a set of 68 AM1 quantum indices. Leave–10%–out was used to cross-validate the SVM models. The best SVM classification predictions were obtained with the MOE 2-D descriptors. For CYP3A4, the test set accuracy is 72% (76% for strong inhibitors, 67% for medium inhibitors, and 77% for weak inhibitors), whereas for CYP2D6, the test set accuracy is 69% (84% for strong inhibitors, 53% for medium inhibitors, and 74% for weak inhibitors). The same group of descriptors gave the best SVM regression predictions: CYP3A4, $q^2 = 0.51$ vs. $q^2 = 0.39$ for PLS, and CYP2D6, $q^2 = 0.52$ vs. $q^2 = 0.30$ for PLS. In these QSAR models, SVM regression gave much better predictions than did PLS.

Aires-de-Sousa and Gasteiger used four regression techniques [multiple linear regression, perceptron (an MLF ANN with no hidden layer), MLF ANN, and ν-SVM regression] to obtain a quantitative structure-enantioselectivity relationship (QSER).156 The QSER models the enantiomeric excess in the addition of diethyl zinc to benzaldehyde in the presence of a racemic catalyst and an enantiopure chiral additive. A total of 65 reactions constituted the dataset. Using 11 chiral codes as model input and a three-fold cross-validation procedure, a neural network with two hidden neurons gave the best predictions: ANN with 2 hidden neurons, $R^2_{\mathrm{pred}} = 0.923$; ANN with 1 hidden neuron, $R^2_{\mathrm{pred}} = 0.906$; perceptron, $R^2_{\mathrm{pred}} = 0.845$; MLR, $R^2_{\mathrm{pred}} = 0.776$; and ν-SVM regression with RBF kernel, $R^2_{\mathrm{pred}} = 0.748$.

A molecular similarity kernel, the Tanimoto similarity kernel, was used by Lind and Maltseva in SVM regression to predict the aqueous solubility of three sets of organic compounds.99 The Tanimoto similarity kernel was computed from molecular fingerprints. The RMSE and $q^2$ cross-validation statistics for the three sets show a good performance of SVMR with the Tanimoto kernel: set 1 (883 compounds), RMSE = 0.62 and $q^2 = 0.88$; set 2 (412 compounds), RMSE = 0.77 and $q^2 = 0.86$; and set 3 (411 compounds), RMSE = 0.57 and $q^2 = 0.88$. An SVMR model was trained on set 1 and then tested on set 2 with good results, i.e., RMSE = 0.68 and $q^2 = 0.89$.
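
For binary fingerprints, the Tanimoto similarity of two molecules is the number of bits set in both fingerprints divided by the number of bits set in at least one of them. The sketch below computes this similarity and the resulting kernel (Gram) matrix. Fingerprints are represented as sets of "on" bit positions, the example fingerprints are invented, and the value returned for two empty fingerprints is a convention chosen here; the fingerprint type used by Lind and Maltseva is not specified in the text.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as collections of set bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return len(a & b) / union

def gram_matrix(fingerprints):
    """Pairwise Tanimoto similarities, usable as a precomputed SVM kernel matrix."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]

# Hypothetical fingerprints (positions of the bits that are set).
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
for row in gram_matrix(fingerprints):
    print(row)
```

The matrix is symmetric with ones on the diagonal, and molecules that share no bits have a similarity of zero.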

Yang et al. developed quantitative structure-property relationships (QSPR) for several properties of 47 alkyl benzenes. These properties included boiling point, enthalpy of vaporization at the boiling point, critical temperature, critical pressure, and critical volume.146 The molecular structure of alkyl benzenes was encoded with Randic–Kier–Hall connectivity indices, electrotopological state indices, and Kappa indices. For each property, the optimum set of topological indices, kernel (linear, polynomial, or Gaussian RBF), C, and ε were determined with successive LOO cross-validations. The LOO RMSE statistics for SVM regression, PLS, and ANN (three hidden neurons) show that the SVM model is the best: boiling point, SVMR 2.108, PLS 2.436, ANN 5.063; enthalpy of vaporization at the boiling point, SVMR 0.758, PLS 0.817, ANN 1.046; critical temperature, SVMR 5.523, PLS 7.163, ANN 9.704; critical pressure, SVMR 0.075, PLS 0.075, ANN 0.114; and critical volume, SVMR 4.692, PLS 5.914, ANN 9.452. The first four properties were best predicted with a linear kernel, whereas a polynomial kernel was used to model the critical volume.


Kumar et al. introduced a new method for descriptor selection, the locally linear embedding, which can be used for nonlinear dimensionality reduction in QSPR and QSAR.68 SVM regression (Gaussian RBF kernel) was used to test the new descriptor selection algorithm, using three datasets: boiling points of 150 alkanes characterized by 12 topological indices; the Selwood dataset with 31 chemical compounds characterized by 53 descriptors; and the Steroids dataset consisting of 31 steroids with 1248 descriptors (autocorrelation of molecular surface indices). The statistics obtained with locally linear embedding were better than those obtained with all descriptors or by PCA descriptor reduction.

Genotoxicity of Chemical Compounds

During the process of drug discovery, the genotoxicity of drug candidates must be monitored closely. Genotoxicity mechanisms include DNA methylation, DNA intercalation, unscheduled DNA synthesis, DNA adduct formation, and strand breaking. Li et al. compared the ability of several machine learning algorithms to classify a set of 860 compounds that were tested for genotoxicity (229 genotoxic and 631 non-genotoxic).157 Four methods were compared: SVM, probabilistic neural networks, k-nearest neighbors, and the C4.5 decision tree. An initial set of 199 structural descriptors (143 topological indices, 31 quantum indices, and 25 geometrical descriptors) was reduced to 39 descriptors using an SVM descriptor selection procedure. A L20%O cross-validation test showed that SVM has the highest prediction accuracy: 89.4% SVM with RBF kernel, 82.9% k-NN, 78.9% probabilistic neural networks, and 70.7% C4.5.

Typically, an SVM application that predicts properties from the molecular structure uses structural descriptors as input to the SVM model. These descriptors are used in nonlinear functions, such as the polynomial or RBF kernels, to compute the SVM solution. Mahé et al. defined a series of graph kernels that can predict various properties from only the molecular graph.97 Atoms (graph vertices) are characterized by their chemical nature or by their connectivity through the Morgan index. Their first test of the molecular graph kernels considered the classification of 230 aromatic and hetero-aromatic nitro compounds that were tested for mutagenicity on Salmonella typhimurium. This dataset was further divided into a regression-easy set of 188 compounds (125 mutagens and 63 nonmutagens) and a regression-difficult set of 42 compounds. In a comparative test of leave–10%–out cross-validation accuracy, the molecular graph kernel ranked third: feature construction 95.7%, stochastic matching 93.3%, graph kernel 91.2%, neural network 89.4%, linear regression 89.3%, and decision tree 88.3%. For the group of 42 compounds, the literature has fewer comparative tests. In a LOO test, the accuracy of the new kernel was higher than that of other methods: graph kernel 88.1%, inductive logic programming 85.7%, and decision tree 83.3%. Their second test of the graph kernel used a dataset of 684 non-congeneric compounds classified as mutagens (341 compounds) or nonmutagens (343 compounds) in a Salmonella/microsome assay. Previous models for this dataset, based on molecular fragments, have a L10%O accuracy of 78.5%, whereas the graph kernel has an accuracy between 76% and 79%.
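
The Morgan index mentioned above labels each atom by its extended connectivity. In the usual formulation, which is assumed here, every atom starts with the value 1 and at each iteration receives the sum of the values of its bonded neighbors. The sketch below is not the graph kernel of Mahé et al.; it only shows how such vertex labels can be generated from an adjacency list, using the carbon skeleton of 2-methylbutane as a hypothetical example.

```python
def morgan_indices(adjacency, iterations):
    """Iterated neighbor sums: all atoms start at 1, then take the sum over neighbors."""
    index = {atom: 1 for atom in adjacency}
    for _ in range(iterations):
        index = {
            atom: sum(index[neighbor] for neighbor in neighbors)
            for atom, neighbors in adjacency.items()
        }
    return index

# Carbon skeleton of 2-methylbutane: C0-C1, C1-C2, C2-C3, and a methyl C4 on C1.
skeleton = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}

print(morgan_indices(skeleton, 1))  # after one iteration: the vertex degrees
print(morgan_indices(skeleton, 2))
```

After the first iteration the index equals the vertex degree; further iterations separate atoms that have the same degree but different environments, such as C0 and C3 in this example.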

The mutagenicity dataset of 230 aromatic and hetero-aromatic nitro compounds was also used as a test case for a molecular graph kernel by Jain, Geibel, and Wysotzki.98 Their kernel is based on the Schur–Hadamard inner product for a pair of molecular graphs. The leave–10%–out cross-validation accuracy is 92% for the set of 188 compounds and 90% for the set of 42 compounds. The problem of computing the Schur–Hadamard inner product for a pair of graphs is NP complete, and in this paper, it was approximated with a recurrent neural network. However, these approximations are not, in general, a kernel. Moreover, for some values of the parameters that control the kernel, the calculation of an SVM solution was not possible.

Helma et al. used the MOLFEA program for generating molecular substructures to discriminate between mutagenic and nonmutagenic compounds.158 A group of 684 compounds (341 mutagenic and 343 nonmutagenic) evaluated with the Ames test (Salmonella/microsome assay) was used to compare the C4.5 decision tree algorithm, the PART rule learning algorithm, and SVM with linear and degree 2 polynomial kernels. The L10%O accuracy of 78.5% for the SVM classifier is higher than that of the C4.5 (75.0% accuracy) and PART (74.7% accuracy) algorithms.

Chemometrics

Several forms of transmissible spongiform encephalopathies are known today, such as scrapie, fatal familial insomnia, kuru, chronic wasting disease, feline spongiform encephalopathy, Creutzfeldt-Jakob disease in humans, or bovine spongiform encephalopathy (BSE). The main pathological characteristic of these diseases is a sponge-like modification of brain tissue. Martin et al. developed a serum-based diagnostic pattern recognition method for BSE diagnosis.159 Mid-infrared spectroscopy of 641 serum samples was performed, and four classification algorithms (linear discriminant analysis, LDA; robust linear discriminant analysis, RLDA; ANN; SVM) were used to characterize the samples as BSE-positive or BSE-negative. The four classifiers were tested for a supplementary set of 160 samples (84 BSE-positive and 76 BSE-negative). For the test set, ANN had the highest sensitivity (ANN 93%, SVM 88%, LDA 82%, RLDA 80%), whereas SVM had the highest specificity (SVM 99%, LDA 93%, ANN 93%, RLDA 88%).

After the emergence of the mad cow disease, the European Union regulatory agencies banned processed animal proteins (meat and bone meal, MBM) in feedstuffs destined to farm animals that are kept, fattened, or bred for the production of food. A Near IR–SVM (NIR–SVM) system based on plane array near-infrared imaging spectroscopy was proposed by Pierna et al. to detect MBM.160 Learning was based on NIR spectra from 26 pure animal meals and 59 pure vegetal meals, with a total of 5521 spectra collected (2233 animal and 3288 vegetal). An LOO comparative evaluation of PLS, ANN, and SVM shows that support vector machines have the lowest RMSE: SVM RBF kernel 0.102, ANN 0.139, and PLS 0.397.

Wet granulation and direct compression are two methods used to manufacture tablets in the pharmaceutical industry. Zomer et al. used pyrolysis-gas chromatography-mass-spectrometry coupled with SVM classification to discriminate between the two tablet production methods.161 Mass spectra data were submitted to a PCA analysis, and the first principal components were used as input for SVM models having linear, polynomial, and Gaussian RBF kernels. SVM classifiers with polynomial and RBF kernels performed better in prediction than discriminant analysis.

The pathological condition induced by exposing an organism to a toxic substance depends on the mode of admission, the quantity, and the type of dosage (acute or chronic). Urine profiling by β-cyclodextrin-modified micellar electrokinetic capillary electrophoresis was used by Zomer et al. to identify the type of cadmium intoxication (acute or chronic).162 Their dataset of 96 samples was split into a learning set of 60 samples and a test set of 36 samples. Discriminant analysis applied to the first six principal components had better results on the test set (96.97% correctly classified) than did SVM trained on the original measured data (75.76% correctly classified).

NIR spectroscopy is often used for nondestructive measurement of chemicals in various materials. The application of least-squares SVM regression (LS–SVMR) in predicting mixture composition from NIR spectra was investigated by Thissen et al.163 NIR spectra for ternary mixtures of ethanol, water, and 2-propanol were measured at 30°C, 40°C, 50°C, 60°C, and 70°C. The learning set consisted of 13 mixtures per temperature, whereas the test set consisted of 6 mixtures per temperature. For the test set, the least-squares SVM approach had an RMSE 2.6 times lower than that from a PLS analysis.

Chauchard et al. investigated the ability of least-squares SVM regression to predict the acidity of different grape varieties from NIR spectra.164 NIR scans between 680 and 1100 nm for 371 grape samples were collected for three varieties: carignan (188 samples), mourvèdre (84 samples), and ugni-blanc (99 samples). The total acidity (malic and tartaric acid concentrations) was measured with an HPLC assay. The PLS model selected 68 wavelengths from the NIR spectra, and with eight principal factors gave a prediction $q^2 = 0.76$ and a test correlation coefficient $R^2 = 0.77$. Using 10 principal factors, LS–SVMR models were more predictive than was PLS, with $q^2 = 0.83$ and $R^2 = 0.86$. A comparison between an MLR with eight wavelengths ($q^2 = 0.69$ and $R^2 = 0.68$) and an LS–SVMR obtained for the same wavelengths ($q^2 = 0.77$ and $R^2 = 0.78$) showed a significant improvement for the support vector machines model.
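
The regression statistics used throughout this section, RMSE and the cross-validated $q^2$, can be computed from observed values and cross-validation predictions as sketched below. The form $q^2 = 1 - \mathrm{PRESS}/\mathrm{SS}$, with SS taken about the mean of all observed values, is one common convention and is assumed here; the individual studies may have used slightly different definitions. The numerical values in the example are invented.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def q2(observed, cv_predicted):
    """Cross-validated q2 = 1 - PRESS / SS, with SS about the mean of the observations."""
    mean_observed = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - press / ss_total

# Hypothetical observed values and cross-validation predictions.
y_observed = [1.0, 2.0, 3.0, 4.0]
y_cv = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_observed, y_cv), 4), round(q2(y_observed, y_cv), 4))
```

Because the predictions come from models that did not see the corresponding sample, $q^2$ can be negative for a model that predicts worse than the mean of the observations.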


PLS and SVM regression were compared in their ability to predict, from Raman spectra, the monomer masses for the copolymerization of methyl methacrylate and butyl acrylate in toluene.165 The high- and low-resolution Raman spectra of 37 training samples were used to compute the regression models, which were subsequently tested for 41 test samples. For the high-resolution spectra, the mean relative errors were 3.9% for SVMR and 10.1% for PLS. For the low-resolution spectra, these errors were 22.8% for SVMR and 68.0% for PLS. In general, SVMR with a degree 1 polynomial kernel gave the best predictions, which shows that a linear SVM model predicts better than the linear PLS model for this type of analysis.

Active learning support vector machines (AL–SVMs) were used by Zomer et al. to identify beach sand polluted with either gasoline or crude oil.166 A total of 220 samples were split into 106 learning samples and 114 test samples. Each sample was analyzed using HS–MS (head-space sampler coupled to a mass-spectrometer), with the mass spectra recorded in the range m/z 49–160. The results obtained by Zomer et al. show that the active learning procedure is effective in selecting a small subset of training samples, thus greatly reducing the number of experiments necessary to obtain a predictive model.

Chemometrics techniques are usually applied in capillary electrophoresis to obtain an optimum resolution of the peaks, lower detection limits, shorter migration times, good peak shapes, higher precision, and better signal-to-noise ratio. Optimum separation conditions in capillary electrophoresis were determined by Zhai et al. by combining a genetic algorithm with least-squares support vector machines.167 The optimization target of the genetic algorithm was to increase the peak resolution, symmetry, and height, and to decrease the migration time. The study involved the identification of four compounds with anti-tumor activity. The optimizable parameters are the voltage and electrophoresis buffer composition, whereas the output measured parameters were the migration time, height, and width for each of the four peaks. The correlation coefficient for LS–SVM LOO cross-validation was 0.978. By combining the simulation results of LS–SVM and a fitness function, the genetic algorithm finds an optimum combination of experimental conditions for capillary electrophoresis separation.

Sensors

Heat treatment of milk ensures the microbial safety of milk and increases its shelf life. Different heat treatments (UHT, pasteurized, sterilized) can be distinguished by analyzing the volatile compounds with an electronic nose. A hybrid system that uses an electronic nose combined with an SVM classification method was tested by Brudzewski, Osowski, and Markiewicz for milk recognition and classification.168 The electronic nose was composed of seven tin oxide-based gas sensors, and the SVM model was tested with linear and RBF kernels. In the first experiment, four brands (classes) of milk were discriminated, with each class containing 180 samples. In the second experiment, the UHT milk from one producer was classified according to the fat content, again with 180 samples for each of the four brands. For each brand, 90 samples were used for learning and 90 samples for testing the SVM classifier. The prediction was perfect for both experiments and all brands of milk.

Measurements collected from an electronic nose were used by Sadik et al. in an SVM classification system to identify several organophosphates.169 The following organophosphates were tested: parathion, malathion, dichlorvos, trichlorfon, paraoxon, and diazinon. The electronic nose contained 32 conducting polymer sensors whose output signal was processed and fed into the SVM classifier for one-versus-one and one-versus-all classification. A total of 250 measurements were recorded for each of the six organophosphates, and a L20%O cross-validation procedure was implemented. Four kernels were tested, namely linear, Gaussian RBF, polynomial, and the S2000 kernel, $K(x_1, x_2) = \|x_1 - x_2\|^2$. In all experiments, the SVM performed better than a neural network.

An electronic nose and an SVM classifier were evaluated by Distante, Ancona, and Siciliano for the recognition of pentanone, hexanal, water, acetone, and three mixtures of pentanone and hexanal in different concentrations.170 In a LOO test, the SVM classifier with a degree 2 polynomial kernel gave the best predictions: SVM 4.5% error, RBF neural network 15% error, and multilayer feed-forward ANN 40% error.

Seven types of espresso coffee were classified by Pardo and Sberveglieri with a system composed of an electronic nose and an SVM with polynomial and Gaussian RBF kernels.171 For each coffee type, 36 measurements were performed with an electronic nose equipped with five thin-film semiconductor sensors based on SnO2 and Ti-Fe. The output signal from sensors was submitted to a PCA analysis whose principal components (between 2 and 5) represented the input data for the SVM classifier. The error surface corresponding to various kernel parameters and number of input principal components was investigated.

Gasoline supplemented with alcohol or ethers has an enhanced octane number. Adding 10 vol% ethanol, for example, increases the octane number by 2.5 or more units. The most popular ether additive is methyl tertiary butyl ether (MTBE), followed by ethyl tertiary butyl ether (ETBE) and tertiary amyl methyl ether. MTBE adds 2.5–3.0 octane numbers to gasoline and is used in 25% of all U.S. gasoline. Brudzewski et al. used an electronic nose and support vector machines to identify gasoline supplemented with ethanol, MTBE, ETBE, and benzene.172 The electronic nose was composed of seven tin oxide-based gas sensors. Twelve gasoline blend types were prepared, and a total of 432 measurements were performed with the electronic nose. In a six-fold cross-validation experiment, it was found that SVM with linear, degree 2 polynomial, and Gaussian RBF kernels achieved a perfect classification.


Bicego used the similarity-based representation of electronic nose measurements for odor classification with the SVM method.173 In the similarity-based representation, the raw data from sensors are transformed into pairwise (dis)similarities, i.e., distances between objects in the dataset. The electronic nose is an array of eight carbon black-polymer detectors. The system was tested for the recognition of 2-propanol, acetone, and ethanol, with 34 experiments for each compound. Two series of 102 experiments were performed, the first one with data recorded after 10 minutes of exposure, whereas in the second group of experiments, the data were recorded after 1 second of exposure. The one-versus-one cross-validation accuracy of the first group of experiments was 99% for similarity computed using the Euclidean metric. For the second group of experiments, the accuracy was 79% for the Euclidean metric and 80% for the Manhattan metric.
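
A minimal sketch of a similarity-based representation is given below: each measurement is replaced by its vector of distances to a set of reference measurements, computed with either the Euclidean or the Manhattan metric. The function names and the sensor readings are hypothetical, and taking the whole dataset as the reference set is an assumption rather than a detail taken from the study.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two sensor-response vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance between two sensor-response vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dissimilarity_representation(samples, references, metric):
    """Replace every sample by its distances to all reference samples."""
    return [[metric(s, r) for r in references] for s in samples]

# Hypothetical two-sensor readings for three measurements.
readings = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(dissimilarity_representation(readings, readings, euclidean))
print(dissimilarity_representation(readings, readings, manhattan))
```

The rows of the resulting matrix, rather than the raw sensor signals, then serve as input vectors for the SVM classifier.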

Chemical Engineering

Hybrid systems (ANN-GA and SVMR-GA) were compared by Nandi et al. for their ability to model and optimize the isopropylation of benzene on Hbeta catalyst.73 The input parameters used to model the reaction were temperature, pressure, benzene-to-isopropyl alcohol ratio, and weight hourly space velocity. The output parameters were the yield of isopropylbenzene and the selectivity S, where S = 100 × (weight of isopropylbenzene formed per unit time)/(weight of total aromatics formed per unit time). Based on 42 experiments, the genetic algorithm component was used to select the optimum set of input parameters that maximize both yield and selectivity. The GA-optimized solutions were then verified experimentally, showing that the two hybrid methods can be used to optimize industrial processes.

The SVM classification of vertical and horizontal two-phase flow regimes in pipes was investigated by Trafalis, Oladunni, and Papavassiliou.174 The vertical flow dataset, with 424 cases, had three classes, whereas the horizontal flow dataset, with 2272 cases, had five classes. One-versus-one multiclass SVM models were developed with polynomial kernels (degrees 1 to 4). The transition region is determined with respect to pipe diameter, superficial gas velocity, and superficial liquid velocity. Compared with experimental observations, the predictions of the SVM model were, in most cases, superior to those obtained from other types of theoretical models.

The locally weighted regression was extended by Lee et al. to support vector machines and tested on the synthesis of polyvinyl butyrate (PVB).175 Weighted SVM regression has a variable capacity C, which depends on a weight computed for each data point. The weighted SVM regression was computed with ε = 0.001. Three kernels were tested: polynomial, Gaussian RBF, and neural (tanh). A dataset of 120 patterns was divided into 80 training patterns, 20 validation patterns, and 20 test patterns. Each pattern consisted of 12 measurements of controlled variables (such as viscosity and concentration of PVB, quantities of the first and second catalyst, reaction time, temperature) and one product property, PVB viscosity. A comparative test showed that the weighted SVM regression has the lowest error with RMSE = 23.9, compared with SVM regression RMSE = 34.9 and neural network RMSE = 109.4.
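The models above are ranked by their root-mean-square error (RMSE) on the test patterns. As a reminder of how this statistic is obtained, here is a minimal Python sketch; the function name and the input checks are illustrative assumptions and are not taken from the cited work.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large deviations dominate the RMSE, so it penalizes occasional large mispredictions more strongly than a mean absolute error would.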

Chu, Qin, and Han applied an SVM classification model for fault detection and identification of the operation mode in processes with multimode operations.176 They studied rapid thermal annealing, which is a critical semiconductor process used to stabilize the structure of silicon wafers and to make the physical properties of the whole wafer uniform after ion implantation. A dataset of 1848 batch data was divided into 1000 learning data and 848 test data. Input data for the SVM model were selected with an entropy-based algorithm, and 62 input parameters were used to train three SVM classification models. The system based on SVM is superior to the conventional PCA fault detection method.

The melt index of thermoplastic polymers like polypropylene (PP) and polystyrene is defined as the mass rate of extrusion flow through a specified capillary under prescribed conditions of temperature and pressure. The melt indices for the polypropylene and styrene-acrylonitrile (SAN) polymerizations were modeled by Han, Han, and Chung with PLS, ANN, and SVM regression having a Gaussian RBF kernel.177 For the SAN polymerization, 33 process variables were measured for 1024 training data and 100 testing data. The test set RMSE shows that the best predictions were obtained with the SVM regression: SVMR 0.97, ANN 1.09, and PLS 3.15. For the PP synthesis, 78 process variables were measured for 467 training data and 50 testing data. The melt index of PP is best predicted by SVMR, as shown by the corresponding RMSE values: SVMR 1.51, PLS 2.08, and ANN 3.07.

Text Mining for Scientific Information

Automatic text datamining is an important source of knowledge, with many applications in generating databases from scientific literature, such as protein–disease associations, gene expression patterns, subcellular localization, and protein–protein interactions.

The NLProt system developed by Mika and Rost combines four support vector machines, trained individually for distinct tasks.178,179 The first SVM is trained to recognize protein names, whereas the second learns the environment in which a protein name appears. The third SVM is trained on both protein names and their environments. The output of these three SVMs and a score from a protein dictionary are fed into the fourth SVM, which provides as output the protein whose name was identified in the text. A dictionary of protein names was generated from SwissProt and TrEMBL, whereas the Merriam-Webster Dictionary was used as a source of common words. Other terms were added to the dictionary, such as medical terms, species names, and tissue types. The system has a 75% accuracy, and in a test on recent abstracts from Cell and EMBO Journal, NLProt reached 70% accuracy.

An SVM approach to name recognition in text was used by Shi and Campagne to develop a protein dictionary.180 A database of 80,528 full-text articles from Journal of Biological Chemistry, EMBO Journal, and Proceedings of the National Academy of Sciences was used as input to the SVM system. A dictionary of 59,990 protein names was produced. Three support vector machines were trained to discriminate among protein names and cell names, process names, and interaction keywords, respectively. The processing time is half a second for a new full-text paper. The method can recognize name variants not found in SwissProt.

Using PubMed abstracts, the PreBIND system can identify protein–protein interactions with an SVM system.181 The protein–protein interactions identified by the automated PreBIND system are then combined and scrutinized manually to produce the BIND database (http://bind.ca). Based on a L10%O cross-validation of a dataset of 1094 abstracts, the SVM approach had a precision and recall of 92%, whereas a naïve Bayes classifier had a precision and recall of 87%.
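In a leave-10%-out (L10%O) cross-validation, the dataset is split into ten parts, and each part is held out once for testing while the model is trained on the remaining 90%. The following Python sketch shows one simple way to generate such splits; the function name is hypothetical, and the interleaved assignment of items to folds is an assumption (in practice, the data would usually be shuffled or stratified by class first).

```python
def leave_fraction_out_folds(n_items, n_folds=10):
    """Yield (train, test) index lists; each item is held out exactly once."""
    indices = list(range(n_items))
    for k in range(n_folds):
        test = indices[k::n_folds]       # every n_folds-th item, offset k
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

# Splits for a dataset of 1094 items, the size quoted above.
folds = list(leave_fraction_out_folds(1094))
```

The performance statistics are then computed from the pooled predictions on the ten held-out parts.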

Bio-medical terms can be recognized and annotated with SVM-based automated systems, as shown by Takeuchi and Collier.182 The training was performed with 100 Medline abstracts where bio-medical terms were marked up manually in XML by an expert. The SVM system recognized approximately 3400 terms and showed good prediction capability for each class of terms (proteins, DNA, RNA, source, etc.).

Bunescu et al. compared the ability of several machine learning systems to extract information regarding protein names and their interactions from Medline abstracts.183 The text recognition systems compared are dictionary based, the rule learning system Rapier, boosted wrapper induction, SVM, maximum entropy, k-nearest neighbors, and two systems for protein name identification, KEX and Abgene. Based on the F-measure (harmonic mean of precision and recall) in L10%O cross-validation, the best systems for protein name recognition are the maximum entropy with dictionary (F = 57.86%) followed by SVM with dictionary (F = 54.42%).
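The precision, recall, and F-measure used to compare these text mining systems can be computed from the counts of true positives, false positives, and false negatives. The short Python sketch below is given only for illustration; the function name and the convention of returning 0.0 for undefined ratios are assumptions.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from raw counts.

    tp: correctly recognized items, fp: spurious items, fn: missed items.
    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # The F-measure is the harmonic mean of precision and recall.
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

Because the harmonic mean is dominated by the smaller of its two arguments, a system cannot reach a high F-measure by maximizing precision at the expense of recall, or the reverse.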

SVM RESOURCES ON THE WEB

The Internet is a vast source of information on support vector machines. The interested reader can find tutorials, reviews, theoretical and application papers, as well as a wide range of SVM software. In this section, we present several starting points for retrieving relevant SVM information from the Web.

http://support-vector-machines.org/. This Web portal is dedicated to support vector machines and their applications. It provides exhaustive lists of books, tutorials, publications (with special sections for applications in cheminformatics, bioinformatics, and computational biology), software for various platforms, and links to datasets that can be used for SVM classification and regression. The links to open-access SVM papers are particularly useful. The site also offers a list of SVM-related conferences.

http://www.kernel-machines.org/. This portal contains links to websites related to kernel methods. Included are tutorials, publications, books, software, datasets used to compare algorithms, and conference announcements. A list of major scientists in kernel methods is also available from this site.

http://www.support-vector.net/. This website is a companion to the book An Introduction to Support Vector Machines by Cristianini and Shawe-Taylor,14 and it has a useful list of SVM software.

http://www.kernel-methods.net/. This website is a companion to the book Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini.21 The MATLAB scripts from the book can be downloaded from this site. A tutorial on kernel methods is also available.

http://www.learning-with-kernels.org/. Several chapters on SVM from the book Learning with Kernels by Schölkopf and Smola17 are available from this site.

http://www.boosting.org/. This is a portal for boosting and related ensemble learning methods, such as arcing and bagging, with application to model selection and connections to mathematical programming and large margin classifiers. The site provides links to software, papers, datasets, and upcoming events.

Journal of Machine Learning Research, http://jmlr.csail.mit.edu/. The Journal of Machine Learning Research is an open-access journal that contains many papers on SVM, including new algorithms and SVM model optimization. All papers can be downloaded and printed for free. In the current context of widespread progress toward open access to scientific publications, this journal has a remarkable story and is an undisputed success.

http://citeseer.ist.psu.edu/burges98tutorial.html. This is an online reprint of Burges's SVM tutorial "A Tutorial on Support Vector Machines for Pattern Recognition."23 The citeseer repository has many useful SVM manuscripts.

PubMed, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed. This is a comprehensive database of abstracts for chemistry, biochemistry, biology, and medicine-related literature. PubMed is a free service of the National Library of Medicine and is a great place to start your search for SVM-related papers. PubMed has direct links for many online journals, which are particularly useful for open-access journals, such as Bioinformatics or Nucleic Acids Research. All SVM applications in cheminformatics from major journals are indexed here, but unfortunately, the relevant chemistry journals are not open access. On the other hand, PubMed is the main hub for open access to important SVM applications, such as gene arrays, proteomics, or toxicogenomics.


PubMed Central, http://www.pubmedcentral.nih.gov/. PubMed Central (PMC) is the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature. It represents the main public repository for journals that publish open-access papers. The site contains information regarding the NIH initiative for open-access publication of NIH-funded research. Numerous papers can be found on SVM applications in bioinformatics and computational biology.

SVM SOFTWARE

Fortunately, scientists interested in SVM applications in cheminformatics and computational chemistry can choose from a wide variety of free software, available for download from the Internet. The selection criteria for a useful package are problem type (classification or regression); platform (Windows, Linux/UNIX, Java, MATLAB, R); available kernels (the more the better); flexibility in adding new kernels; and the possibility to perform cross-validation or descriptor selection. Collected here is relevant information for the most popular SVM packages. All are free for nonprofit use, but they come with little or no support. On the other hand, they are straightforward to use, are accompanied by extensive documentation, and almost all are available as source code. For users wanting to avoid compilation-related problems, many packages are available as Windows binaries. A popular option is the use of SVM scripts in computing environments such as MATLAB, R, Scilab, Torch, YaLE, or Weka (the last five are free). For small problems, the Gist server is a viable option. The list of SVM software presented below is ordered by approximately decreasing frequency of use.

SVMlight, http://svmlight.joachims.org/. SVMlight, by Joachims,184 is one of the most widely used SVM classification and regression packages. It has a fast optimization algorithm, can be applied to very large datasets, and has a very efficient implementation of the leave-one-out cross-validation. It is distributed as C++ source and binaries for Linux, Windows, Cygwin, and Solaris. Kernels available include polynomial, radial basis function, and neural (tanh).

SVMstruct, http://svmlight.joachims.org/svm_struct.html. SVMstruct, by Joachims, is an SVM implementation that can model complex (multivariate) output data y, such as trees, sequences, or sets. These complex output SVM models can be applied to natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. Several implementations exist: SVMmulticlass, for multiclass classification; SVMcfg, which learns a weighted context free grammar from examples; SVMalign, which learns to align protein sequences from training alignments; and SVMhmm, which learns a Markov model from examples. These modules have straightforward applications in bioinformatics, but one can imagine significant implementations for cheminformatics, especially when the chemical structure is represented as trees or sequences.

mySVM, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVM/index.html. mySVM, by Rüping, is a C++ implementation of SVM classification and regression. It is available as C++ source code and Windows binaries. Kernels available include linear, polynomial, radial basis function, neural (tanh), and anova. All SVM models presented in this chapter were computed with mySVM.

JmySVM. A Java version of mySVM is part of the YaLE (Yet Another Learning Environment, http://www-ai.cs.uni-dortmund.de/SOFTWARE/YALE/index.html) learning environment under the name JmySVM.

mySVM/db, http://www-ai.cs.uni-dortmund.de/SOFTWARE/MYSVMDB/index.html. mySVM/db is an efficient extension of mySVM, which is designed to run directly inside a relational database using an internal Java engine. It was tested with an Oracle database, but with small modifications, it should also run on any database offering a JDBC interface. It is especially useful for large datasets available as relational databases.

LIBSVM, http://www.csie.ntu.edu.tw/cjlin/libsvm/. LIBSVM (Library for Support Vector Machines) was developed by Chang and Lin and contains C-classification, ν-classification, ε-regression, and ν-regression. Developed in C++ and Java, it also supports multiclass classification, weighted SVMs for unbalanced data, cross-validation, and automatic model selection. It has interfaces for Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels available include linear, polynomial, radial basis function, and neural (tanh).

looms, http://www.csie.ntu.edu.tw/cjlin/looms/. looms, by Lee and Lin, is a very efficient leave-one-out model selection for SVM two-class classification. Although LOO cross-validation is usually too time consuming to be performed for large datasets, looms implements numerical procedures that make LOO accessible. Given a range of parameters, looms automatically returns the parameter and model with the best LOO statistics. It is available as C source code and Windows binaries.
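The idea of selecting a model parameter by its leave-one-out statistics can be sketched in a few lines of Python. This is a brute-force illustration only and does not reproduce the efficient numerical procedures of looms; to keep the example self-contained, a one-dimensional k-nearest-neighbor classifier stands in for the SVM, and all names and data are assumptions.

```python
def loo_accuracy(X, y, k):
    """Brute-force leave-one-out accuracy of a 1-D k-nearest-neighbor classifier."""
    correct = 0
    for i in range(len(X)):
        # Hold out pattern i and rank all other patterns by distance to it.
        others = [(abs(X[i] - X[j]), y[j]) for j in range(len(X)) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = [label for _, label in others[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote (use odd k)
        correct += int(predicted == y[i])
    return correct / len(X)

def select_by_loo(X, y, candidates):
    """Return the candidate parameter with the best LOO accuracy, plus all scores."""
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])  # first best wins on ties
    return best, scores

# Hypothetical toy data: two well-separated classes on a line.
X = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
y = [0, 0, 0, 1, 1, 1]
best_k, loo_scores = select_by_loo(X, y, [1, 3, 5])
```

The brute-force loop retrains (here, re-ranks) once per pattern and per candidate parameter, which is exactly the cost that makes LOO impractical for large datasets without specialized procedures.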

BSVM, http://www.csie.ntu.edu.tw/cjlin/bsvm/. BSVM, authored by Hsu and Lin, provides two implementations of multiclass classification, together with SVM regression. It is available as source code for UNIX/Linux and as binaries for Windows.

OSU SVM Classifier Matlab Toolbox, http://www.ece.osu.edu/maj/osu_svm/. This MATLAB toolbox is based on LIBSVM.

SVMTorch, http://www.idiap.ch/learning/SVMTorch.html. SVMTorch, by Collobert and Bengio,185 is part of the Torch machine learning library (http://www.torch.ch/) and implements SVM classification and regression. It is distributed as C++ source code or binaries for Linux and Solaris.

Weka, http://www.cs.waikato.ac.nz/ml/weka/. Weka is a collection of machine learning algorithms for datamining tasks. The algorithms can either be applied directly to a dataset or called from Java code. It contains an SVM implementation.

SVM in R, http://cran.r-project.org/src/contrib/Descriptions/e1071.html. This SVM implementation in R (http://www.r-project.org/) contains C-classification, ν-classification, ε-regression, and ν-regression. Kernels available include linear, polynomial, radial basis, and neural (tanh).

M-SVM, http://www.loria.fr/guermeur/. This is a multiclass SVM implementation in C by Guermeur.52,53

Gist, http://microarray.cpmc.columbia.edu/gist/. Gist is a C implementation of support vector machine classification and kernel principal components analysis. The SVM part of Gist is available as an interactive Web server at http://svm.sdsc.edu. It is a very convenient server for users who want to experiment with small datasets (hundreds of patterns). Kernels available include linear, polynomial, and radial.

MATLAB SVM Toolbox, http://www.isis.ecs.soton.ac.uk/resources/svminfo/. This SVM toolbox, by Gunn, implements SVM classification and regression with various kernels, including linear, polynomial, Gaussian radial basis function, exponential radial basis function, neural (tanh), Fourier series, spline, and B spline. All figures from this chapter presenting SVM models for various datasets were prepared with a slightly modified version of this MATLAB toolbox.

TinySVM, http://chasen.org/taku/software/TinySVM/. TinySVM is a C++ implementation of C-classification and C-regression that uses sparse vector representation. It can handle several thousand training examples and feature dimensions. TinySVM is distributed as binary/source for Linux and binary for Windows.

SmartLab, http://www.smartlab.dibe.unige.it/. SmartLab provides several support vector machine implementations, including cSVM, a Windows and Linux implementation of two-class classification; mcSVM, a Windows and Linux implementation of multiclass classification; rSVM, a Windows and Linux implementation of regression; and javaSVM1 and javaSVM2, which are Java applets for SVM classification.

Gini-SVM, http://bach.ece.jhu.edu/svm/ginisvm/. Gini-SVM, by Chakrabartty and Cauwenberghs, is a multiclass probability regression engine that generates conditional probability distributions as a solution. It is available as source code.

GPDT, http://dm.unife.it/gpdt/. GPDT, by Serafini et al., is a C++ implementation for large-scale SVM classification in both scalar and distributed memory parallel environments. It is available as C++ source code and Windows binaries.

HeroSvm, http://www.cenparmi.concordia.ca/people/jdong/HeroSvm.html. HeroSvm, by Dong, is developed in C++, implements SVM classification, and is distributed as a dynamic link library for Windows. Kernels available include linear, polynomial, and radial basis function.


Spider, http://www.kyb.tuebingen.mpg.de/bs/people/spider/. Spider is an object-oriented environment for machine learning in MATLAB. It handles unsupervised, supervised, and semi-supervised machine learning problems and includes training, testing, model selection, cross-validation, and statistical tests. Spider implements SVM multiclass classification and regression.

Java applets, http://svm.dcs.rhbnc.ac.uk/. These SVM classification and regression Java applets were developed by members of Royal Holloway, University of London, and the AT&T Speech and Image Processing Services Research Laboratory. SVM classification is available from http://svm.dcs.rhbnc.ac.uk/pagesnew/GPat.shtml. SVM regression is available at http://svm.dcs.rhbnc.ac.uk/pagesnew/1D-Reg.shtml.

LEARNSC, http://www.support-vector.ws/html/downloads.html. This site contains MATLAB scripts for the book Learning and Soft Computing by Kecman.16 LEARNSC implements SVM classification and regression.

Tree Kernels, http://ai-nlp.info.uniroma2.it/moschitti/Tree-Kernel.htm. Tree Kernels, by Moschitti, is an extension of SVMlight, and was obtained by encoding tree kernels. It is available as binaries for Windows, Linux, Mac OS X, and Solaris. Tree kernels are suitable for encoding chemical structures, and thus this package brings significant capabilities for cheminformatics applications.

LS-SVMlab, http://www.esat.kuleuven.ac.be/sista/lssvmlab/. LS-SVMlab, by Suykens, is a MATLAB implementation of least-squares support vector machines (LS-SVMs), a reformulation of the standard SVM that leads to solving linear KKT systems. LS-SVM primal–dual formulations have been given for kernel PCA, kernel CCA, and kernel PLS, thereby extending the class of primal–dual kernel machines. Links between kernel versions of classic pattern recognition algorithms such as kernel Fisher discriminant analysis and extensions to unsupervised learning, recurrent networks, and control are available.

MATLAB SVM Toolbox, http://www.igi.tugraz.at/aschwaig/software.html. This is a MATLAB SVM classification implementation that can handle 1-norm and 2-norm SVM (linear or quadratic loss function) problems.

SVM/LOO, http://bach.ece.jhu.edu/pub/gert/svm/incremental/. SVM/LOO, by Cauwenberghs, has a very efficient MATLAB implementation of the leave-one-out cross-validation.

SVMsequel, http://www.isi.edu/hdaume/SVMsequel/. SVMsequel, by Daumé III, is an SVM multiclass classification package, distributed as C source or as binaries for Linux or Solaris. Kernels available include linear, polynomial, radial basis function, sigmoid, string, tree, and information diffusion on discrete manifolds.

LSVM, http://www.cs.wisc.edu/dmi/lsvm/. LSVM (Lagrangian Support Vector Machine) is a very fast SVM implementation in MATLAB by Mangasarian and Musicant. It can classify datasets containing several million patterns.

ASVM, http://www.cs.wisc.edu/dmi/asvm/. ASVM (Active Support Vector Machine) is a very fast linear SVM script for MATLAB, by Musicant and Mangasarian, developed for large datasets.


PSVM, http://www.cs.wisc.edu/dmi/svm/psvm/. PSVM (Proximal Support Vector Machine) is a MATLAB script by Fung and Mangasarian that classifies patterns by assigning them to the closest of two parallel planes.

SimpleSVM Toolbox, http://asi.insa-rouen.fr/gloosli/simpleSVM.html. SimpleSVM Toolbox is a MATLAB implementation of the SimpleSVM algorithm.

SVM Toolbox, http://asi.insa-rouen.fr/%7Earakotom/toolbox/index. This fairly complex MATLAB toolbox contains many algorithms, including classification using linear and quadratic penalization, multiclass classification, ε-regression, ν-regression, wavelet kernel, and SVM feature selection.

MATLAB SVM Toolbox, http://theoval.sys.uea.ac.uk/gcc/svm/toolbox/. Developed by Cawley, this software has standard SVM features,together with multiclass classification and leave–one–out cross-validation.

R-SVM, http://www.biostat.harvard.edu/xzhang/R-SVM/R-SVM.html. R-SVM, by Zhang and Wong, is based on SVMTorch and is designed especially for the classification of microarray gene expression data. R-SVM uses SVM for classification and for selecting a subset of relevant genes according to their relative contribution in the classification. The selection is applied recursively, so that a series of gene subsets and classification models is obtained at different levels of gene selection. The performance of the classification can be evaluated either on an independent test dataset or by cross-validation on the same dataset. R-SVM is distributed as a Linux binary.

JSVM, http://www-cad.eecs.berkeley.edu/hwawen/research/projects/jsvm/doc/manual/index.html. JSVM is a Java wrapper for SVMlight.

SvmFu, http://five-percent-nation.mit.edu/SvmFu/. SvmFu, by Rifkin, is a C++ package for SVM classification. Kernels available include linear, polynomial, and Gaussian radial basis function.

CONCLUSIONS

Kernel learning algorithms have received considerable attention in data modeling and prediction because kernels can straightforwardly perform a nonlinear mapping of the data into a high-dimensional feature space. As a consequence, linear models can be transformed easily into nonlinear algorithms that in turn can explore complex relationships between input data and predicted property. Kernel algorithms have applications in classification, clustering, and regression. From the diversity of kernel methods (support vector machines, Gaussian processes, kernel recursive least squares, kernel principal component analysis, kernel perceptron learning, relevance vector machines, kernel Fisher discriminants, Bayes point machines, and kernel Gram-Schmidt), only SVM was readily adopted for QSAR and cheminformatics applications.

Support vector machines represent the most important development in chemometrics after (chronologically) partial least-squares and artificial neural networks. We have presented numerous SAR and QSAR examples in this chapter that demonstrate the SVM capabilities for both classification and regression. These examples showed that the nonlinear features of SVM should be used with caution, because this added flexibility in modeling the data brings with it the danger of overfitting. The literature results reviewed here show that support vector machines already have numerous applications in computational chemistry and cheminformatics. Future developments are expected to improve the performance of SVM regression and to explore the use of SVM in jury ensembles as an effective way to increase their prediction power.

REFERENCES

1. V. Vapnik and A. Lerner, Automat. Remote Contr., 24, 774–780 (1963). Pattern Recognition Using Generalized Portrait Method.

2. V. Vapnik and A. Chervonenkis, Theory of Pattern Recognition, Nauka, Moscow, Russia, 1974.

3. V. Vapnik, Estimation of Dependencies Based on Empirical Data, Nauka, Moscow, Russia, 1979.

4. V. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.

5. V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, 1998.

6. C. Cortes and V. Vapnik, Mach. Learn., 20, 273–297 (1995). Support-Vector Networks.

7. B. Schölkopf, K. K. Sung, C. J. C. Burges, F. Girosi, P. Niyogi, T. Poggio, and V. Vapnik, IEEE Trans. Signal Process., 45, 2758–2765 (1997). Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers.

8. O. Chapelle, P. Haffner, and V. N. Vapnik, IEEE Trans. Neural Netw., 10, 1055–1064 (1999). Support Vector Machines for Histogram-based Image Classification.

9. H. Drucker, D. H. Wu, and V. N. Vapnik, IEEE Trans. Neural Netw., 10, 1048–1054 (1999). Support Vector Machines for Spam Categorization.

10. V. N. Vapnik, IEEE Trans. Neural Netw., 10, 988–999 (1999). An Overview of Statistical Learning Theory.

11. V. Vapnik and O. Chapelle, Neural Comput., 12, 2013–2036 (2000). Bounds on Error Expectation for Support Vector Machines.

12. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Mach. Learn., 46, 389–422 (2002). Gene Selection for Cancer Classification Using Support Vector Machines.

13. B. Schölkopf, C. J. C. Burges, and A. J. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, Massachusetts, 1999.

14. N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, United Kingdom, 2000.

15. A. J. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, Advances in Large Margin Classifiers, MIT Press, Cambridge, Massachusetts, 2000.

16. V. Kecman, Learning and Soft Computing, MIT Press, Cambridge, Massachusetts, 2001.

17. B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts, 2002.

18. T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms, Kluwer, Norwell, Massachusetts, 2002.

19. R. Herbrich, Learning Kernel Classifiers, MIT Press, Cambridge, Massachusetts, 2002.

20. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines, World Scientific, Singapore, 2002.

21. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, United Kingdom, 2004.

22. A. J. Smola and B. Schölkopf, Algorithmica, 22, 211–231 (1998). On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion.


23. C. J. C. Burges, Data Min. Knowl. Discov., 2, 121–167 (1998). A Tutorial on Support Vector Machines for Pattern Recognition.

24. B. Schölkopf, S. Mika, C. J. C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. J. Smola, IEEE Trans. Neural Netw., 10, 1000–1017 (1999). Input Space Versus Feature Space in Kernel-based Methods.

25. J. A. K. Suykens, Eur. J. Control, 7, 311–327 (2001). Support Vector Machines: A Nonlinear Modelling and Control Perspective.

26. K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, IEEE Trans. Neural Netw., 12, 181–201 (2001). An Introduction to Kernel-based Learning Algorithms.

27. C. Campbell, Neurocomputing, 48, 63–84 (2002). Kernel Methods: A Survey of Current Techniques.

28. B. Schölkopf and A. J. Smola, in Advanced Lectures on Machine Learning, Vol. 2600, Springer, New York, 2002, pp. 41–64. A Short Introduction to Learning with Kernels.

29. V. D. Sanchez, Neurocomputing, 55, 5–20 (2003). Advanced Support Vector Machines and Kernel Methods.

30. A. J. Smola and B. Schölkopf, Stat. Comput., 14, 199–222 (2004). A Tutorial on Support Vector Regression.

31. A. Kurup, R. Garg, D. J. Carini, and C. Hansch, Chem. Rev., 101, 2727–2750 (2001). Comparative QSAR: Angiotensin II Antagonists.

32. K. Varmuza, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1098–1133. Multivariate Data Analysis in Chemistry.

33. O. Ivanciuc, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 1, Wiley-VCH, Weinheim, Germany, 2003, pp. 103–138. Graph Theory in Chemistry.

34. O. Ivanciuc, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 981–1003. Topological Indices.

35. R. Todeschini and V. Consonni, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1004–1033. Descriptors from Molecular Geometry.

36. P. Jurs, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1314–1335. Quantitative Structure-Property Relationships.

37. L. Eriksson, H. Antti, E. Holmes, E. Johansson, T. Lundstedt, J. Shockcor, and S. Wold, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1134–1166. Partial Least Squares (PLS) in Cheminformatics.

38. J. Zupan, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1167–1215. Neural Networks.

39. A. von Homeyer, in Handbook of Chemoinformatics, J. Gasteiger, Ed., Vol. 3, Wiley-VCH, Weinheim, Germany, 2003, pp. 1239–1280. Evolutionary Algorithms and Their Applications in Chemistry.

40. R. Fletcher, Practical Methods of Optimization, 2nd ed., John Wiley and Sons, New York, 1987.

41. J. Platt, in Advances in Kernel Methods - Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds., MIT Press, Cambridge, Massachusetts, 1999, pp. 185–208. Fast Training of Support Vector Machines Using Sequential Minimal Optimization.

42. J. Mercer, Phil. Trans. Roy. Soc. London A, 209, 415–446 (1909). Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations.

43. B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett, Neural Comput., 12, 1207–1245 (2000). New Support Vector Algorithms.

44. C. C. Chang and C. J. Lin, Neural Comput., 13, 2119–2147 (2001). Training ν-Support Vector Classifiers: Theory and Algorithms.

45. C. C. Chang and C. J. Lin, Neural Comput., 14, 1959–1977 (2002). Training ν-Support Vector Regression: Theory and Algorithms.


46. I. Steinwart, IEEE Trans. Pattern Anal. Mach. Intell., 25, 1274–1284 (2003). On the Optimal Parameter Choice for ν-Support Vector Machines.

47. P. H. Chen, C. J. Lin, and B. Schölkopf, Appl. Stoch. Models. Bus. Ind., 21, 111–136 (2005). A Tutorial on ν-Support Vector Machines.

48. R. Debnath, N. Takahide, and H. Takahashi, Pattern Anal. Appl., 7, 164–175 (2004). A Decision-based One-against-one Method for Multi-class Support Vector Machine.

49. C. W. Hsu and C. J. Lin, IEEE Trans. Neural Netw., 13, 415–425 (2002). A Comparison of Methods for Multiclass Support Vector Machines.

50. R. Rifkin and A. Klautau, J. Mach. Learn. Res., 5, 101–141 (2004). In Defense of One-vs-all Classification.

51. C. Angulo, X. Parra, and A. Català, Neurocomputing, 55, 57–77 (2003). K-SVCR. A Support Vector Machine for Multi-class Classification.

52. Y. Guermeur, Pattern Anal. Appl., 5, 168–179 (2002). Combining Discriminant Models with New Multi-class SVMs.

53. Y. Guermeur, G. Pollastri, A. Elisseeff, D. Zelus, H. Paugam-Moisy, and P. Baldi, Neurocomputing, 56, 305–327 (2004). Combining Protein Secondary Structure Prediction Models with Ensemble Methods of Optimal Complexity.

54. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, Bioinformatics, 21, 631–643 (2005). A Comprehensive Evaluation of Multicategory Classification Methods for Microarray Gene Expression Cancer Diagnosis.

55. T. Li, C. L. Zhang, and M. Ogihara, Bioinformatics, 20, 2429–2437 (2004). A Comparative Study of Feature Selection and Multiclass Classification Methods for Tissue Classification Based on Gene Expression.

56. Y. Lee and C. K. Lee, Bioinformatics, 19, 1132–1139 (2003). Classification of Multiple Cancer Types by Tip Multicategory Support Vector Machines Using Gene Expression Data.

57. S. H. Peng, Q. H. Xu, X. B. Ling, X. N. Peng, W. Du, and L. B. Chen, FEBS Lett., 555, 358–362 (2003). Molecular Classification of Cancer Types from Microarray Data Using the Combination of Genetic Algorithms and Support Vector Machines.

58. S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C. H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J. P. Mesirov, T. Poggio, W. Gerald, M. Loda, E. S. Lander, and T. R. Golub, Proc. Natl. Acad. Sci. U. S. A., 98, 15149–15154 (2001). Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures.

59. O. L. Mangasarian and D. R. Musicant, IEEE Trans. Pattern Analysis Mach. Intell., 22, 950–955 (2000). Robust Linear and Support Vector Regression.

60. O. L. Mangasarian and D. R. Musicant, Mach. Learn., 46, 255–269 (2002). Large Scale Kernel Regression via Linear Programming.

61. J. B. Gao, S. R. Gunn, and C. J. Harris, Neurocomputing, 55, 151–167 (2003). SVM Regression Through Variational Methods and its Sequential Implementation.

62. J. B. Gao, S. R. Gunn, and C. J. Harris, Neurocomputing, 50, 391–405 (2003). Mean Field Method for the Support Vector Machine Regression.

63. W. P.Walters and B. B. Goldman,Curr.Opin.DrugDiscov.Dev., 8, 329–333 (2005). FeatureSelection in Quantitative Structure-Activity Relationships.

64. D. J. Livingstone and D. W. Salt, in Reviews in Computational Chemistry, K. B. Lipkowitz,R. Larter, and T. R. Cundari, Eds., Vol. 21, Wiley-VCH, New York, 2005, pp. 287–348.Variable Selection - Spoilt for Choice?

65. J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman, and M. Song, J. Mach. Learn. Res., 3,1229–1243 (2003). Dimensionality Reduction via Sparse Support Vector Machines.

66. L. Cao, C. K. Seng, Q. Gu, and H. P. Lee, Neural Comput. Appl., 11, 244–249 (2003).Saliency Analysis of Support Vector Machines for Gene Selection in Tissue Classifica-tion.

67. G. M. Fung and O. L. Mangasarian, Comput. Optim. Appl., 28, 185–202 (2004). A FeatureSelection Newton Method for Support Vector Machine Classification.


68. R. Kumar, A. Kulkarni, V. K. Jayaraman, and B. D. Kulkarni, Internet Electron. J. Mol.Des., 3, 118–133 (2004). Structure–Activity Relationships Using Locally Linear EmbeddingAssisted by Support Vector and Lazy Learning Regressors.

69. Y. Xue, Z. R. Li, C.W. Yap, L. Z. Sun, X. Chen, andY. Z. Chen, J. Chem. Inf. Comput. Sci., 44,1630–1638 (2004).Effect ofMolecularDescriptor Feature Selection in SupportVectorMachineClassification of Pharmacokinetic and Toxicological Properties of Chemical Agents.

70. H. Frohlich, J. K.Wegner, andA.Zell,QSARComb. Sci., 23, 311–318 (2004).TowardsOptimalDescriptor Subset Selection with Support Vector Machines in Classification and Regression.

71. Y. Liu, J. Chem. Inf. Comput. Sci., 44, 1823–1828 (2004). A Comparative Study on FeatureSelection Methods for Drug Discovery.

72. E. Byvatov and G. Schneider, J. Chem. Inf. Comput. Sci., 44, 993–999 (2004). SVM-basedFeature Selection for Characterization of Focused Compound Collections.

73. S. Nandi, Y. Badhe, J. Lonari, U. Sridevi, B. S. Rao, S. S. Tambe, and B. D. Kulkarni, Chem.Eng. J., 97, 115–129 (2004). Hybrid Process Modeling and Optimization StrategiesIntegrating Neural Networks/Support Vector Regression and Genetic Algorithms: Studyof Benzene Isopropylation on Hbeta Catalyst.

74. Y. Wang, I. V. Tetko, M. A. Hall, E. Frank, A. Facius, K. F. X. Mayer, and H. W. Mewes,Comput. Biol. Chem., 29, 37–46 (2005). Gene Selection from Microarray Data for CancerClassification - A Machine Learning Approach.

75. N. Pochet, F. De Smet, J. A. K. Suykens, and B. L. R. DeMoor,Bioinformatics, 20, 3185–3195(2004). Systematic Benchmarking of Microarray Data Classification: Assessing the Role ofNon-linearity and Dimensionality Reduction.

76. G. Natsoulis, L. El Ghaoui, G. R. G. Lanckriet, A.M. Tolley, F. Leroy, S. Dunlea, B. P. Eynon,C. I. Pearson, S. Tugendreich, and K. Jarnagin, Genome Res., 15, 724–736 (2005).Classification of a Large Microarray Data Set: Algorithm Comparison and Analysis ofDrug Signatures.

77. X. Zhou and K. Z. Mao, Bioinformatics, 21, 1559–1564 (2005). LS Bound Based GeneSelection for DNA Microarray Data.

78. A. K. Jerebko, J. D. Malley, M. Franaszek, and R. M. Summers, Acad. Radiol., 12, 479–486(2005). Support Vector Machines Committee Classification Method for Computer-aidedPolyp Detection in CT Colonography.

79. K. Faceli, A. de Carvalho, andW. A. Silva,Genet. Mol. Biol., 27, 651–657 (2004). Evaluationof Gene Selection Metrics for Tumor Cell Classification.

80. L. B. Li, W. Jiang, X. Li, K. L. Moser, Z. Guo, L. Du, Q. J. Wang, E. J. Topol, Q. Wang, andS. Rao, Genomics, 85, 16–23 (2005). A Robust Hybrid between Genetic Algorithm andSupport Vector Machine for Extracting an Optimal Feature Gene Subset.

81. C. A. Tsai, C. H. Chen, T. C. Lee, I. C. Ho, U. C. Yang, and J. J. Chen, DNA Cell Biol., 23,607–614 (2004). Gene Selection for Sample Classifications in Microarray Experiments.

82. T. Downs, K. E. Gates, and A. Masters, J. Mach. Learn. Res., 2, 293–297 (2001). ExactSimplification of Support Vector Solutions.

83. Y. Q. Zhan and D. G. Shen, Pattern Recognit., 38, 157–161 (2005). Design Efficient SupportVector Machine for Fast Classification.

84. C.Merkwirth, H. A.Mauser, T. Schulz-Gasch, O. Roche,M. Stahl, and T. Lengauer, J. Chem.Inf. Comput. Sci., 44, 1971–1978 (2004). Ensemble Methods for Classification in Che-minformatics.

85. H. Briem and J. Gunther, ChemBioChem, 6, 558–566 (2005). Classifying "Kinase Inhibitor-likeness" by Using Machine-learning Methods.

86. C.W.Yap andY. Z. Chen, J. Chem Inf.Model., 45, 982–992 (2005). Prediction of CytochromeP450 3A4, 2D6, and 2C9 Inhibitors and Substrates by Using Support Vector Machines.

87. G. Valentini, M. Muselli, and F. Ruffino, Neurocomputing, 56, 461–466 (2004). CancerRecognition with Bagged Ensembles of Support Vector Machines.

88. H. Saigo, J.-P. Vert, N. Ueda, and T. Akutsu, Bioinformatics, 20, 1682–1689 (2004). ProteinHomology Detection Using String Alignment Kernels.

References 395

89. C. S. Leslie, E. Eskin, A. Cohen, J. Weston, and W. S. Noble, Bioinformatics, 20, 467–476(2004). Mismatch String Kernels for Discriminative Protein Classification.

90. J.-P. Vert, Bioinformatics, 18, S276–S284 (2002). A Tree Kernel to Analyse PhylogeneticProfiles.

91. Z. R. Yang and K. C. Chou, Bioinformatics, 20, 735–741 (2004). Bio-support VectorMachines for Computational Proteomics.

92. M.Wang, J. Yang, and K. C. Chou,Amino Acids, 28, 395–402 (2005). Using String Kernel toPredict Peptide Cleavage Site Based on Subsite Coupling Model.

93. R. Teramoto, M. Aoki, T. Kimura, and M. Kanaoka, FEBS Lett., 579, 2878–2882 (2005).Prediction of siRNA Functionality Using Generalized String Kernel and Support VectorMachine.

94. C. Leslie and R. Kuang, J. Mach. Learn. Res., 5, 1435–1455 (2004). Fast String Kernels UsingInexact Matching for Protein Sequences.

95. K. Tsuda and W. S. Noble, Bioinformatics, 20, i326–i333 (2004). Learning Kernels fromBiological Networks by Maximizing Entropy.

96. A. Micheli, F. Portera, and A. Sperduti, Neurocomputing, 64, 73–92 (2005). A PreliminaryEmpirical Comparison of Recursive Neural Networks and Tree Kernel Methods on Regres-sion Tasks for Tree Structured Domains.

97. P. Mahe, N. Ueda, T. Akutsu, J.-L. Perret, and J.-P. Vert, J. Chem Inf. Model., 45, 939–951(2005). Graph Kernels for Molecular Structure-Activity Relationship Analysis with SupportVector Machines.

98. B. J. Jain, P. Geibel, and F. Wysotzki, Neurocomputing, 64, 93–105 (2005). SVM Learningwith the Schur-Hadamard Inner Product for Graphs.

99. P. Lind and T. Maltseva, J. Chem. Inf. Comput. Sci., 43, 1855–1859 (2003). Support VectorMachines for the Estimation of Aqueous Solubility.

100. B. Hammer and K. Gersmann, Neural Process. Lett., 17, 43–53 (2003). A Note on theUniversal Approximation Capability of Support Vector Machines.

101. J. P. Wang, Q. S. Chen, and Y. Chen, in Advances in Neural Networks, F. Yin, J. Wang, andC. Guo, Eds., Vol. 3173, Springer, New York, 2004, pp. 512–517. RBF Kernel BasedSupport Vector Machine with Universal Approximation and its Application.

102. T. B. Thompson, K. C. Chou, and C. Zheng, J. Theor. Biol., 177, 369–379 (1995). NeuralNetwork Prediction of the HIV-1 Protease Cleavage Sites.

103. Z. R. Yang and K. C. Chou, J. Chem. Inf. Comput. Sci., 43, 1748–1753 (2003). MiningBiological Data Using Self-organizing Map.

104. Y. D. Cai, X. J. Liu, X. B. Xu, and K. C. Chou, J. Comput. Chem., 23, 267–274 (2002).Support Vector Machines for Predicting HIV Protease Cleavage Sites in Protein.

105. T. Rognvaldsson and L. W. You, Bioinformatics, 20, 1702–1709 (2004). Why NeuralNetworks Should not be Used for HIV-1 Protease Cleavage Site Prediction.

106. E. Urrestarazu Ramos, W. H. J. Vaes, H. J. M. Verhaar, and J. L. M. Hermens, J. Chem. Inf.Comput. Sci., 38, 845–852 (1998). Quantitative Structure-Activity Relationships for theAquatic Toxicity of Polar and Nonpolar Narcotic Pollutants.

107. S. Ren,Environ. Toxicol., 17, 415–423 (2002). Classifying Class I and Class II Compounds byHydrophobicity and Hydrogen Bonding Descriptors.

108. S. Ren and T. W. Schultz, Toxicol. Lett., 129, 151–160 (2002). Identifying the Mechanism ofAquatic Toxicity of Selected Compounds by Hydrophobicity and Electrophilicity Descriptors.

109. O. Ivanciuc, Internet Electron. J. Mol. Des., 2, 195–208 (2003). Aquatic Toxicity Predictionfor Polar and Nonpolar Narcotic Pollutants with Support Vector Machines.

110. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 157–172 (2002). Support Vector MachineIdentification of the Aquatic Toxicity Mechanism of Organic Compounds.

111. A. P. Bearden and T. W. Schultz, Environ. Toxicol. Chem., 16, 1311–1317 (1997). Structure-Activity Relationships for Pimephales and Tetrahymena: AMechanism of Action Approach.


112. O. Ivanciuc, Internet Electron. J. Mol. Des., 3, 802–821 (2004). Support Vector MachinesPrediction of the Mechanism of Toxic Action from Hydrophobicity and ExperimentalToxicity Against Pimephales promelas and Tetrahymena pyriformis.

113. S. Ren, P. D. Frymier, and T. W. Schultz, Ecotox. Environ. Safety, 55, 86–97 (2003). AnExploratory Study of the use ofMultivariate Techniques to DetermineMechanisms of ToxicAction.

114. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 203–218 (2002). Support VectorMachine Classification of the Carcinogenic Activity of Polycyclic Aromatic Hydrocar-bons.

115. R. S. Braga, P. M. V. B. Barone, and D. S. Galvao, J. Mol. Struct. (THEOCHEM), 464, 257–266 (1999). Identifying Carcinogenic Activity of Methylated Polycyclic Aromatic Hydro-carbons (PAHs).

116. P. M. V. B. Barone, R. S. Braga, A. Camilo Jr., and D. S. Galvao, J. Mol. Struct.(THEOCHEM), 505, 55–66 (2000). Electronic Indices from Semi-empirical Calculationsto Identify Carcinogenic Activity of Polycyclic Aromatic Hydrocarbons.

117. R. Vendrame, R. S. Braga, Y. Takahata, and D. S. Galvao, J. Mol. Struct. (THEOCHEM),539, 253–265 (2001). Structure-Carcinogenic Activity Relationship Studies of PolycyclicAromatic Hydrocarbons (PAHs) with Pattern-Recognition Methods.

118. D. J. G. Marino, P. J. Peruzzo, E. A. Castro, and A. A. Toropov, Internet Electron. J. Mol.Des., 1, 115–133 (2002). QSAR Carcinogenic Study of Methylated Polycyclic AromaticHydrocarbons Based on Topological Descriptors Derived from Distance Matrices andCorrelation Weights of Local Graph Invariants.

119. M. Chastrette and J. Y. D. Laumer, Eur. J. Med. Chem., 26, 829–833 (1991). Structure OdorRelationships Using Neural Networks.

120. M. Chastrette, C. El Aıdi, and J. F. Peyraud, Eur. J. Med. Chem., 30, 679–686 (1995).Tetralin, Indan and Nitrobenzene Compound Structure-musk Odor Relationship UsingNeural Networks.

121. K. J. Rossiter, Chem. Rev., 96, 3201–3240 (1996). Structure-Odor Relationships.

122. D. Zakarya, M. Chastrette, M. Tollabi, and S. Fkih-Tetouani, Chemometrics Intell. Lab.Syst., 48, 35–46 (1999). Structure-Camphor Odour Relationships using the Generation andSelection of Pertinent Descriptors Approach.

123. R. D. M. C. Amboni, B. S. Junkes, R. A. Yunes, and V. E. F. Heinzen, J. Agric. Food Chem.,48, 3517–3521 (2000). Quantitative Structure-Odor Relationships of Aliphatic Esters UsingTopological Indices.

124. G. Buchbauer, C. T. Klein, B. Wailzer, and P. Wolschann, J. Agric. Food Chem., 48,4273–4278 (2000). Threshold-Based Structure-Activity Relationships of Pyrazines withBell-Pepper Flavor.

125. B. Wailzer, J. Klocker, G. Buchbauer, G. Ecker, and P. Wolschann, J. Med. Chem., 44,2805–2813 (2001). Prediction of the Aroma Quality and the Threshold Values of SomePyrazines Using Artificial Neural Networks.

126. O. Ivanciuc, Internet Electron. J. Mol. Des., 1, 269–284 (2002). Structure–Odor Relation-ships for Pyrazines with Support Vector Machines.

127. A. O. Aptula, N. G. Jeliazkova, T. W. Schultz, and M. T. D. Cronin, QSAR Comb. Sci., 24,385–396 (2005). The Better Predictive Model: High q2 for the Training Set or Low RootMean Square Error of Prediction for the Test Set?

128. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 928–947 (2005). QSAR for Phenols Toxicityto Tetrahymena pyriformis with Support Vector Regression and Artificial Neural Net-works.

129. A. Carotti, C. Altornare, L. Savini, L. Chlasserini, C. Pellerano, M. P. Mascia, E. Maciocco,F. Busonero, M. Mameli, G. Biggio, and E. Sanna, Bioorg. Med. Chem., 11, 5259–5272(2003). High Affinity Central Benzodiazepine Receptor Ligands. Part 3: Insights into thePharmacophore and Pattern Recognition Study of Intrinsic Activities of Pyrazolo[4,3-c]quinolin-3-ones.

References 397

130. D. Hadjipavlou-Litina, R. Garg, and C. Hansch, Chem. Rev., 104, 3751–3793 (2004).Comparative Quantitative Structure-Activity Relationship Studies (QSAR) on Non-benzodiazepine Compounds Binding to Benzodiazepine Receptor (BzR).

131. L. Savini, P. Massarelli, C. Nencini, C. Pellerano, G. Biggio, A. Maciocco, G. Tuligi,A. Carrieri, N. Cinone, and A. Carotti, Bioorg. Med. Chem., 6, 389–399 (1998). HighAffinity Central Benzodiazepine Receptor Ligands: Synthesis and Structure-Activity Rela-tionship Studies of a New Series of Pyrazolo[4,3-c]quinolin-3-ones.

132. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 181–193 (2005). Support Vector RegressionQuantitative Structure-Activity Relationships (QSAR) for Benzodiazepine ReceptorLigands.

133. T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. P.Worgan, andM. T. D. Cronin, J. Chem. Inf.Comput. Sci., 44, 258–265 (2004). QSAR Analysis of the Toxicity of Aromatic Compoundsto Chlorella vulgaris in a Novel Short-term Assay.

134. T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. P. Worgan, and M. T. D. Cronin, Bull.Environ. Contam. Toxicol., 73, 385–391 (2004). Toxicological Evaluation and QSARModelling of Aromatic Amines to Chlorella vulgaris.

135. M. T. D. Cronin, T. I. Netzeva, J. C. Dearden, R. Edwards, and A. D. P. Worgan,Chem. Res.Toxicol., 17, 545–554 (2004). Assessment and Modeling of the Toxicity of OrganicChemicals to Chlorella vulgaris: Development of a Novel Database.

136. A. D. P. Worgan, J. C. Dearden, R. Edwards, T. I. Netzeva, and M. T. D. Cronin, QSARComb. Sci., 22, 204–209 (2003). Evaluation of a Novel Short-term Algal Toxicity Assay bythe Development of QSARs and Inter-species Relationships for Narcotic Chemicals.

137. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 911–927 (2005). Artificial Neural Networksand Support Vector Regression Quantitative Structure-Activity Relationships (QSAR) forthe Toxicity of Aromatic Compounds to Chlorella vulgaris.

138. O. Ivanciuc, Rev. Roum. Chim., 43, 347–354 (1998). Artificial Neural Networks Applica-tions. Part 7 - Estimation of Bioconcentration Factors in Fish Using SolvatochromicParameters.

139. X. X. Lu, S. Tao, J. Cao, and R.W.Dawson,Chemosphere, 39, 987–999 (1999). Prediction ofFish Bioconcentration Factors of Nonpolar Organic Pollutants Based on Molecular Con-nectivity Indices.

140. S. Tao, H. Y. Hu, X. X. Lu, R. W. Dawson, and F. L. Xu, Chemosphere, 41, 1563–1568(2000). Fragment Constant Method for Prediction of Fish Bioconcentration Factors of Non-polar Chemicals.

141. S. D. Dimitrov, N. C. Dimitrova, J. D. Walker, G. D. Veith, and O. G. Mekenyan, Pure Appl.Chem., 74, 1823–1830 (2002). Predicting Bioconcentration Factors of Highly HydrophobicChemicals. Effects of Molecular Size.

142. S. D. Dimitrov, N. C. Dimitrova, J. D. Walker, G. D. Veith, and O. G. Mekenyan, QSARComb. Sci., 22, 58–68 (2003). Bioconcentration Potential Predictions Based on MolecularAttributes - An Early Warning Approach for Chemicals Found in Humans, Birds, Fish andWildlife.

143. M. H. Fatemi, M. Jalali-Heravi, and E. Konuze, Anal. Chim. Acta, 486, 101–108 (2003).PredictionofBioconcentrationFactorUsingGenetic AlgorithmandArtificialNeuralNetwork.

144. P. Gramatica and E. Papa, QSAR Comb. Sci., 22, 374–385 (2003). QSAR Modeling ofBioconcentration Factor by Theoretical Molecular Descriptors.

145. O. Ivanciuc, Internet Electron. J. Mol. Des., 4, 813–834 (2005). Bioconcentration FactorQSAR with Support Vector Regression and Artificial Neural Networks.

146. S. S. Yang, W. C. Lu, N. Y. Chen, and Q. N. Hu, J. Mol. Struct. (THEOCHEM), 719, 119–127 (2005). Support Vector Regression Based QSPR for the Prediction of Some Physico-chemical Properties of Alkyl Benzenes.

147. K.-R. Muller, G. Ratsch, S. Sonnenburg, S. Mika, M. Grimm, and N. Heinrich, J. ChemInf. Model., 45, 249–253 (2005). Classifying ‘Drug-likeness’ with Kernel-based LearningMethods.


148. R. N. Jorissen and M. K. Gilson, J. Chem Inf. Model., 45, 549–561 (2005). Virtual Screeningof Molecular Databases Using a Support Vector Machine.

149. R. Arimoto, M. A. Prasad, and E. M. Gifford, J. Biomol. Screen, 10, 197–205 (2005).Development of CYP3A4 InhibitionModels: Comparisons of Machine-learning Techniquesand Molecular Descriptors.

150. V. Svetnik, T. Wang, C. Tong, A. Liaw, R. P. Sheridan, and Q. Song, J. Chem Inf. Model., 45,786–799 (2005). Boosting: An Ensemble Learning Tool for Compound Classification andQSAR Modeling.

151. C.W.Yap, C. Z. Cai, Y. Xue, andY. Z. Chen,Toxicol. Sci., 79, 170–177 (2004). Prediction ofTorsade-causing Potential of Drugs by Support Vector Machine Approach.

152. M. Tobita, T. Nishikawa, and R. Nagashima, Bioorg. Med. Chem. Lett., 15, 2886–2890(2005). A Discriminant Model Constructed by the Support Vector Machine Method forHERG Potassium Channel Inhibitors.

153. M. J. Sorich, R. A. McKinnon, J. O. Miners, D. A. Winkler, and P. A. Smith, J. Med. Chem.,47, 5311–5317 (2004). Rapid Prediction of Chemical Metabolism by Human UDP-glucur-onosyltransferase Isoforms Using Quantum Chemical Descriptors Derived with the Electro-negativity Equalization Method.

154. V. V. Zernov, K. V. Balakin, A. A. Ivaschenko, N. P. Savchuk, and I. V. Pletnev, J. Chem.Inf. Comput. Sci., 43, 2048–2056 (2003). Drug Discovery Using Support Vector Machines.The Case Studies of Drug-likeness, Agrochemical-likeness, and Enzyme Inhibition Predic-tions.

155. J. M. Kriegl, T. Arnhold, B. Beck, and T. Fox, QSAR Comb. Sci., 24, 491–502 (2005).Prediction of Human Cytochrome P450 Inhibition Using Support Vector Machines.

156. J. Aires-de-Sousa and J. Gasteiger, J. Comb. Chem., 7, 298–301 (2005). Prediction ofEnantiomeric Excess in a Combinatorial Library of Catalytic Enantioselective Reactions.

157. H. Li, C. Y. Ung, C. W. Yap, Y. Xue, Z. R. Li, Z. W. Cao, and Y. Z. Chen, Chem. Res.Toxicol., 18, 1071–1080 (2005). Prediction of Genotoxicity of Chemical Compounds byStatistical Learning Methods.

158. C. Helma, T. Cramer, S. Kramer, and L. De Raedt, J. Chem. Inf. Comput. Sci., 44, 1402–1411(2004). Data Mining and Machine Learning Techniques for the Identification of Mutageni-city Inducing Substructures and Structure Activity Relationships of Noncongeneric Com-pounds.

159. T. C.Martin, J. Moecks, A. Belooussov, S. Cawthraw, B. Dolenko,M. Eiden, J. von Frese, W.Kohler, J. Schmitt, R. Somorjai, T. Udelhoven, S. Verzakov, and W. Petrich, Analyst, 129,897–901 (2004). Classification of Signatures of Bovine Spongiform Encephalopathy inSerum Using Infrared Spectroscopy.

160. J. A. F. Pierna, V. Baeten, A. M. Renier, R. P. Cogdill, and P. Dardenne, J. Chemometr., 18,341–349 (2004). Combination of Support VectorMachines (SVM) andNear-infrared (NIR)Imaging Spectroscopy for theDetection ofMeat and BoneMeal (MBM) inCompound Feeds.

161. S. Zomer, R. G. Brereton, J. F. Carter, and C. Eckers, Analyst, 129, 175–181 (2004). SupportVector Machines for the Discrimination of Analytical Chemical Data: Application to theDetermination of Tablet Production by Pyrolysis-gas Chromatography-mass Spectrometry.

162. S. Zomer, C. Guillo, R. G. Brereton, and M. Hanna-Brown, Anal. Bioanal. Chem., 378,2008–2020 (2004). Toxicological Classification of Urine Samples Using Pattern RecognitionTechniques and Capillary Electrophoresis.

163. U. Thissen, B. Ustun, W. J. Melssen, and L. M. C. Buydens, Anal. Chem., 76, 3099–3105(2004). Multivariate Calibration with Least-Squares Support Vector Machines.

164. F. Chauchard, R.Cogdill, S. Roussel, J.M.Roger, andV. Bellon-Maurel,Chemometrics Intell.Lab. Syst., 71, 141–150 (2004). Application of LS-SVM to Non-linear Phenomena in NIRSpectroscopy:Development of aRobust andPortable Sensor forAcidity Prediction inGrapes.

165. U. Thissen, M. Pepers, B. Ustun, W. J. Melssen, and L. M. C. Buydens, Chemometrics Intell.Lab. Syst., 73, 169–179 (2004). Comparing Support Vector Machines to PLS for SpectralRegression Applications.

References 399

166. S. Zomer, M. D. N. Sanchez, R. G. Brereton, and J. L. P. Pavon, J. Chemometr., 18, 294–305(2004). Active Learning Support Vector Machines for Optimal Sample Selection in Classi-fication.

167. H. L. Zhai, H. Gao, X. G. Chen, and Z. D. Hu, Anal. Chim. Acta, 546, 112–118 (2005). AnAssisted Approach of the Global Optimization for the Experimental Conditions in CapillaryElectrophoresis.

168. K. Brudzewski, S. Osowski, and T. Markiewicz, Sens. Actuators B, 98, 291–298 (2004).Classification of Milk by Means of an Electronic Nose and SVM Neural Network.

169. O. Sadik, W. H. Land, A. K. Wanekaya, M. Uematsu, M. J. Embrechts, L. Wong,D. Leibensperger, and A. Volykin, J. Chem. Inf. Comput. Sci., 44, 499–507 (2004).Detection and Classification of Organophosphate Nerve Agent Simulants Using SupportVector Machines with Multiarray Sensors.

170. C. Distante, N. Ancona, and P. Siciliano, Sens. Actuators B, 88, 30–39 (2003). Support VectorMachines for Olfactory Signals Recognition.

171. M. Pardo and G. Sberveglieri, Sens. Actuators B, 107, 730–737 (2005). Classification ofElectronic Nose Data with Support Vector Machines.

172. K. Brudzewski, S. Osowski, T.Markiewicz, and J. Ulaczyk, Sens. Actuators B, 113, 135–141(2006). Classification of Gasoline with Supplement of Bio-products by Means of anElectronic Nose and SVM Neural Network.

173. M. Bicego, Sens. Actuators B, 110, 225–230 (2005). Odor Classification Using Similarity-based Representation.

174. T. B. Trafalis, O. Oladunni, and D. V. Papavassiliou, Ind. Eng. Chem. Res., 44, 4414–4426(2005). Two-phase Flow Regime Identification with a Multiclassification Support VectorMachine (SVM) Model.

175. D. E. Lee, J.H. Song, S.O. Song, and E. S. Yoon, Ind. Eng. Chem.Res., 44, 2101–2105 (2005).Weighted Support Vector Machine for Quality Estimation in the Polymerization Process.

176. Y. H. Chu, S. J. Qin, and C. H. Han, Ind. Eng. Chem. Res., 43, 1701–1710 (2004). FaultDetection and Operation Mode Identification Based on Pattern Classification with VariableSelection.

177. I. S. Han, C. H. Han, and C. B. Chung, J. Appl. Polym. Sci., 95, 967–974 (2005). Melt IndexModeling with Support Vector Machines, Partial Least Squares, and Artificial NeuralNetworks.

178. S.Mika and B. Rost,Nucleic Acids Res., 32, W634–W637 (2004). NLProt: Extracting ProteinNames and Sequences from Papers.

179. S.Mika and B. Rost,Bioinformatics, 20, i241–i247 (2004). ProteinNames Precisely Peeled offFree Text.

180. L. Shi and F. Campagne, BMC Bioinformatics, 6, 88 (2005). Building a Protein NameDictionary from Full Text: A Machine Learning Term Extraction Approach.

181. I. Donaldson, J. Martin, B. de Bruijn, C. Wolting, V. Lay, B. Tuekam, S. D. Zhang, B. Baskin,G. D. Bader, K. Michalickova, T. Pawson, and C. W. V. Hogue, BMC Bioinformatics, 4,(2003). PreBIND and Textomy - Mining the Biomedical Literature for Protein-proteinInteractions Using a Support Vector Machine.

182. K. Takeuchi and N. Collier, Artif. Intell. Med., 33, 125–137 (2005). Bio-medical EntityExtraction Using Support Vector Machines.

183. R. Bunescu, R. F. Ge, R. J. Kate, E. M. Marcotte, R. J. Mooney, A. K. Ramani, and Y. W.Wong, Artif. Intell. Med., 33, 139–155 (2005). Comparative Experiments on LearningInformation Extractors for Proteins and Their Interactions.

184. T. Joachims, in Advances in Kernel Methods — Support Vector Learning, B. Scholkopf, C. J.C. Burges, and A. J. Smola, Eds., MIT Press, Cambridge, Massachusetts, 1999. MakingLarge-scale SVM Learning Practical.

185. R. Collobert and S. Bengio, J. Mach. Learn. Res., 1, 143–160 (2001). SVMTorch: SupportVector Machines for Large-scale Regression Problems.


CHAPTER 7

How Computational Chemistry Became Important in the Pharmaceutical Industry

Donald B. Boyd

Department of Chemistry and Chemical Biology, Indiana University-Purdue University at Indianapolis, 402 North Blackford Street, Indianapolis, Indiana 46202-3274

INTRODUCTION

The aim of this chapter is to give a brief account of the historical development of computational chemistry in the pharmaceutical industry. Starting in the 1960s, scientists entering the field had to cope with and overcome a number of significant obstacles. Better methods had to be conceived and developed. Easier ways for users to perform computational experiments had to be engineered into the software. Computers had to become faster, and memory capacity had to be increased significantly. The minds of scientists who were used to doing research one way had to be convinced that there were other productive ways that could, in certain circumstances, help them reach their research goals. By overcoming the hurdles, successes were achieved. Some of these successes were scientific advances, and some were new pharmaceutical products reaching the marketplace. The accumulating number of successes helped propel the field forward, helping to create opportunities for additional computational chemists to establish careers in the pharmaceutical industry. In the spirit of objectivity, however, the chapter also mentions some research projects that did not work out so well. Successes and failures are part of any research or technical undertaking; scientific breakthroughs rarely come easily.

As a result of the personal interest and experience (25 years at Eli Lilly and Company) of the author,1 the emphasis in this retrospective is on using computers for drug discovery. But the use of computers in laboratory instruments and for analysis of experimental and clinical data is no less important. The history reviewed here was written with young scientists in mind. One of the main goals of this book series is to educate. We feel it is important that the new investigator have an appreciation of how the field evolved to its current circumstance, if for no other reason than to help steer toward a better future for those scientists using or planning to use computational chemistry in the pharmaceutical industry. In addition, this chapter may bring back some memories – fond and otherwise – for elder participants in the field.

Discovering a molecule with a useful therapeutic effect had long been exclusively an experimental art and science. Several scientific and technical advances made a computational approach to pharmaceutical progress possible. One early, fundamental advance was the development of the concept that chemical structure is related to molecular properties including biological activity. This concept, depicted in Figure 1, underlies all of medicinal chemistry and is so fundamental that it is often taken for granted and not even mentioned in many books and review articles. Given this relationship, it is easy to conceive that if one could predict properties by calculations, one might be able to predict which structures should be investigated in the laboratory. Another advance was recognizing that a drug typically exerts its biological activity by binding to and/or inhibiting some biomolecule in the body. This concept stems from Fischer’s famous lock-and-key hypothesis (Schlüssel-Schloss-Prinzip).2,3

Another advance was the development in the 1920s of the theory of quantum mechanics,4 which connected the distribution of electrons in molecules with observable molecular properties. Pioneering research in the 1950s then forged links between the electronic structure of molecules and their biological activity. A part of such work was collected in the 1963 book by Bernard and Alberte Pullman (Paris, France), sparking the imagination of many young scientists about what might be possible with calculations on biomolecules.5

The earliest papers that attempted to mathematically relate chemical structure and biological activity were published in Scotland in the middle of the 19th century.6,7 These and a couple of other papers8,9 were forerunners to modern quantitative structure-activity relationships (QSAR) even though they were not widely known publications. In 1964, the role of molecular descriptors in describing biological activity was reduced to a simplified mathematical form, and the field of QSAR was propelled toward its modern visage.10,11 (A descriptor is any calculated or experimental numerical property related to a compound’s chemical structure.)

Of course, the engineering development of computers was requisite for their use at pharmaceutical companies. The early computers were designed for military and accounting applications, but gradually it became apparent that computing machinery would have a vast number of uses. Computers were first deployed at pharmaceutical companies as early as the 1940s. These early computers were used for payroll and accounting, not for science. The power and number of computers gradually increased, so that around 1960 a few pioneering industrial scientists started to think about how computers might aid them with their drug discovery efforts. In addition, access to computers was gained through contractual agreements with nearby educational institutions or companies in other industries. Although payroll and accounting were still the main uses of computers in the 1960s, a few courageous innovators were allowed to use spare time on the mainframes or were allowed to acquire smaller machines specifically for science.

This chapter reviews events, trends, hurdles, progress, people, hardware, and software. Whereas the chapter attempts to paint a picture of happenings as historically correct as possible, it is inevitably colored by the author’s experiences and memories. The timeline used in this chapter is divided by decade beginning with the 1960s and running through the 1990s. The conclusion gives an overview of how the field has grown; keys to success are identified.

[Figure 1 flowchart: Molecular Structure → Chemical and Physical Properties → Transport to Target Receptor → Drug-Receptor Interaction → Biochemical Events → Biological Response]

Figure 1 From the chemical structure of the molecule arise its other properties, such as size, shape, lipophilicity, polarity, and so forth. These properties in turn determine how a molecule will be transported in the body and how it will interact with its intended receptor. These interactions result in biochemical events, which in turn evoke a biological response.

For some topics mentioned in this chapter, hundreds of books12 and thousands of articles demonstrating the growing importance of computational chemistry in the pharmaceutical industry could be cited, but it is impractical to include them all. We hope that the reader will tolerate our citing only a few examples; the work of all the many brilliant scientists who made landmark contributions cannot be covered in a single chapter. The author is less familiar with events at European and Japanese companies than with events in the United States. For an excellent history of the general development of computational chemistry in the United States, not just in industry, the reader is referred to an earlier chapter in this book series.13 Also, histories have been written on the development of computational chemistry in the United Kingdom,14 Canada,15 France,16 and Germany,17 but they touch only lightly on the subject of industrial research.

GERMINATION: THE 1960s

We can state confidently that in 1960 essentially 100% of the computational chemists were in academic or government laboratories, not industry. Of course, back then they were not called computational chemists because that term had not yet entered the language. The scientists who worked with computers to learn about molecules were called theoretical chemists or quantum chemists. The students coming from those academic laboratories constituted the main pool of candidates that industry could hire for their initial ventures into using computers for drug discovery. Another pool of chemists educated using computers was the X-ray crystallographers. Some of these young theoreticians and crystallographers were interested in helping solve human health challenges and steered their careers toward pharmaceutical work.

Although a marvel at the time, the workplace of the 1960s looks archaic in hindsight. Online computer files and graphical user interfaces were still futuristic concepts. Computers generally resided in computer centers, where a small army of administrators, engineers, programming consultants, and support people would tend the mainframe computers then in use. The computers were kept in locked, air-conditioned rooms inaccessible to the ordinary users. One of the largest computers then in use by theoretical chemists and crystallographers was the IBM 7094. Support staff operated the tape readers, card readers, and printers. The users’ room at the computer centers echoed with the clunk-clunk-clunk of card punches that encoded data as little rectangular holes in the so-called IBM cards.12 The cards were manufactured in different colors so that users could conveniently differentiate their many card decks. As a by-product, the card punches produced piles of colorful rectangular confetti. There were no Delete or Backspace keys; if any mistake was made in keying in data, the user would need to begin again with a fresh blank card.

The programs used by chemists in the 1960s were usually written in FORTRAN II. Programs used by the chemists typically ranged from half a box to several boxes long (each box contained 2000 cards; each line of code corresponded to one card). Input decks containing the data needed by the programs were generally smaller – consisting of tens of cards – and were sandwiched between JCL (job control language for IBM machines) cards and bound by rubber bands. Carrying several boxes of cards to the computer center was good for physical fitness. If a box was dropped or if a card reader mangled some of the cards, the tedious task of restoring the deck and replacing the torn cards ensued.

Computer output usually came in the form of ubiquitous pale-green-and-white striped paper (measuring 11 by 14 7/8 inches per page). (A replica of that computer paper was used as the cover design for the Journal of Computational Chemistry during the early years of its publication.) Special cardboard covers and long nylon needles were used to hold and organize stacks of printouts. The user rooms resounded with the jagged squeal of stacks of fresh computer printouts being ripped apart into individual jobs. These were put in the pigeonholes of each user or group. The abundance of cards and printouts in the users’ room scented the air with a characteristic paper smell.

Mathematical algorithms for common operations such as matrix diagonalization had been written and could be inserted as a subroutine in a larger molecular orbital program, for instance. Specialized programs for chemistry were generally developed by academic groups with the graduate students doing most or all of the programming. This was standard practice in part because the professors at different universities (or maybe at the same university) were in competition with each other and wanted better programs than their competitors had access to. (Better meant running faster, handling larger matrices, and doing more.) It was also standard practice so that the graduate students would learn by doing. Obviously, this situation led to much duplication of effort: the proverbial reinventing the wheel. To improve this situation, Prof. Harrison Shull and colleagues at Indiana University, Bloomington, conceived and sold the concept of having an international repository of software that could be shared. Thus was born in 1962 the Quantum Chemistry Program Exchange (QCPE). Competitive scientists were initially slow to give away programs they worked so hard to write, but gradually the deposits to QCPE increased. We do not have room here to give a full recounting of the history of QCPE,18 but suffice it to say that QCPE proved instrumental in advancing the field of computational chemistry, including that at pharmaceutical companies. Back in the 1960s and 1970s, there were no software companies catering to the computational chemistry market, so QCPE was the main resource for the entire community. As the name implies, QCPE was initially used for exchanging subroutines and programs for ab initio and approximate electronic structure calculations. But QCPE evolved to encompass programs for molecular mechanics, kinetics, spectroscopy, and a wide range of other calculations on molecules. The quarterly QCPE Newsletter (later renamed the QCPE Bulletin), which was edited by Mr. Richard W. Counts, was for a long time the main vehicle for computational chemists to announce programs and other news of interest. Industrial computational chemists were among the members of QCPE and, with permission from their corporate management, even contributed programs for use by others.

In regard to software, we note one program that came from the realm of crystallography. ORTEP (Oak Ridge Thermal Ellipsoid Program) was the first widely used program for (noninteractive) molecular graphics.19 Output from the program was inked onto long scrolls of paper run through expensive, flatbed printers. The ball-and-stick ORTEP drawings were fine for publication, but for routine laboratory work graph paper, ruler, protractor, and pencil were the tools for plotting Cartesian coordinates of a molecule the chemist wanted to study. Such handmade drawings quantified and visualized molecular geometry. Experimental bond lengths and bond angles needed for such structure generation were found in a heavily used British compilation.20

To help the chemist think about molecular shape, handheld molecular models were also widely used by experimentalists and theoreticians alike. There were two main types of mechanical models. One was analogous to modern stick representations, with metal or plastic rods representing bonds between atoms, the latter represented by balls or joints that held the rods at specific angles. Dreiding models, which were made of solid and hollow metal wires, were among the most accurate and expensive at that time. (Less-expensive stick models made of plastic are still used in the teaching of organic chemistry.) The other type of molecular model was the space-filling variety. The expensive, well-known CPK (Corey-Pauling-Koltun) models21,22 consisted of three-dimensional spherical segments made of plastic that was color-coded by element (white for hydrogen, blue for nitrogen, red for oxygen, etc.). From this convention came the color molecular graphics we are familiar with today.

Before proceeding further, it is worthwhile to briefly describe the milieu of pharmaceutical research 40 years ago. In the 1960s (and before), drug discovery was done by trial and error. Progress depended on the intuition and knowledge of medicinal chemists and biologists, as well as serendipitous discoveries, not on computational predictions. Interesting compounds flowed from two main sources in that period. The smaller pipeline consisted of natural products, such as soil microbes that produce biologically active components or plants with medicinal properties. The larger pipeline involved classical medicinal chemistry. A lead compound would be discovered by biological screening or by reading the patent and scientific literature published by competitors at other pharmaceutical companies. From the lead, the medicinal chemists would use their ingenuity, creativity, and synthetic expertise to construct new compounds that would be tested by the appropriate in-house pharmacologists, microbiologists, and so forth. Those compounds would often be submitted to a battery of other bioactivity screens being run at the company so that leads for other drug targets could be discovered besides the intended biological target. The most potent compounds found would then become the basis for another round of analog design and synthesis. Thus would evolve from many iterations a structure-activity relationship (SAR), which when summarized would consist of a table of compounds and their activities. In fortuitous circumstances, one of the medicinal chemists would make a compound with sufficient potency that a project team consisting of scientists from drug discovery and drug development would be assembled to oversee further experiments on the compound to learn if it had the appropriate characteristics to become a pharmaceutical product. The formula for career success for a medicinal chemist was simple: invent or claim authorship of a project team compound. Management would then bestow kudos on the chemist (as well as the biologists) involved in the project.

What happened when a theoretical chemist was thrown into this milieu? Well, initially not much because the only theoretical methods of the 1960s that could treat drug-sized (200–500 Da) molecules were limited in what they could predict, and often those predictions were inaccurate. The molecular orbital methods used were extended Hückel theory23,24 and soon thereafter CNDO/2 (complete-neglect-of-differential-overlap/second parameterization).25,26 These approximate methods involved determining molecular orbitals from a highly approximated Fock matrix. Although crude by today’s standards and incapable of giving accurate, energy-minimized (“optimized”), three-dimensional molecular geometries (bond lengths, bond angles, and torsional angles), the methods were more practical than other methods available at the time. One of these other methods was Hartree-Fock27,28,29,30 (also called self-consistent field or nonempirical in the early literature, or ab initio in recent decades). Although Hartree-Fock calculations did fairly well at predicting molecular geometries, the computers of the era limited treatment to molecules not much larger than ethane. Simpler methods such as Hückel theory31,32,33 and Pariser-Parr-Pople (PPP) theory34 could treat large molecules but only pi electrons. Hence, they were formally limited to planar molecules, but not many pharmaceuticals are planar.
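
To make the pi-electron-only character of simple Hückel theory concrete, the following is a minimal sketch rather than a reconstruction of any program of the era. It assumes the usual textbook convention in which each orbital energy is written E = α + xβ, with x an eigenvalue of the connectivity matrix of the conjugated atoms. The function names, the choice of butadiene as the test molecule, and the pure-Python Jacobi diagonalizer are all assumptions introduced for illustration.

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal elements measures remaining coupling.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (mix columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (mix rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def huckel_levels(n_atoms, bonds):
    """Return x values (E = alpha + x*beta) for a pi system given its bonds."""
    h = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        h[i][j] = h[j][i] = 1.0  # one beta for each bonded pair of pi centers
    return jacobi_eigenvalues(h)

# Hypothetical example: the four pi centers of butadiene, bonded in a chain.
butadiene = huckel_levels(4, [(0, 1), (1, 2), (2, 3)])
# Four pi electrons doubly occupy the two most bonding levels
# (largest x, because beta is negative).
pi_energy_beta = 2.0 * sum(sorted(butadiene, reverse=True)[:2])
print([round(x, 3) for x in butadiene], round(pi_energy_beta, 3))
```

Only the connectivity of the pi centers enters the calculation; sigma bonds, hydrogens, and three-dimensional shape are absent, which is why such methods said little about nonplanar drug molecules. The computed levels can be checked against the known analytic values for butadiene, x = ±0.618 and ±1.618.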

In addition to the quantum chemistry programs in use in the 1960s, an alternative and independent approach was to use QSAR, where the activity of a compound is assumed to be a linear (or quadratic or higher) function of certain molecular descriptors. One commonly used descriptor was the contribution of an atom or a functional group to the lipophilicity of a molecule; this descriptor was designated pi (π). Other famous descriptors included the Hammett sigma (σ) values for aromatic systems and the Taft sigma (σ*) values for aliphatic systems. Both parameters came from the realm of physical organic chemistry35,36,37 and are measures of the tendency of a substituent to withdraw or donate electron density relative to a hydrogen atom.
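
The linear form of a QSAR can be illustrated with a short least-squares fit. The sketch below is only an illustration under stated assumptions: the descriptor values are made-up numbers standing in for π and σ, the activities are generated from a hypothetical equation (activity = 3.0 + 0.8π − 1.2σ) so that the fit can be checked, and the function name fit_linear_qsar is invented for this example. No real compound data are implied.

```python
def fit_linear_qsar(descriptors, activities):
    """Ordinary least squares for activity = c0 + c1*d1 + c2*d2 + ..."""
    rows = [[1.0] + list(d) for d in descriptors]  # leading 1.0 gives the intercept
    n = len(rows[0])
    # Normal equations (X^T X) c = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, activities))]
           for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - s) / aug[i][i]
    return coeffs

# Made-up (pi, sigma) pairs for six hypothetical substituents.
substituents = [(0.0, 0.0), (0.5, -0.2), (0.7, 0.2),
                (-0.6, -0.4), (0.0, -0.3), (0.9, 0.5)]
# Synthetic activities from an assumed relationship, so the fit is verifiable.
activities = [3.0 + 0.8 * pi - 1.2 * sigma for pi, sigma in substituents]

c0, c_pi, c_sigma = fit_linear_qsar(substituents, activities)
print(round(c0, 3), round(c_pi, 3), round(c_sigma, 3))
```

Because the activities here are noise-free, the fit simply recovers the assumed coefficients. With measured activities, the same fit would also need statistics such as the correlation coefficient and standard error to judge whether the relationship is meaningful.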

Let us close this section with some final comments about the situation in the 1960s. Abbott, Schering-Plough, and Upjohn were among the first companies, besides Lilly, to venture into the area of using computers for attempts at drug discovery. Dow Chemical, which had pharmaceutical interests, also initiated a very early effort. At these companies, a person with theoretical and computer expertise was hired or one of the company’s existing research scientists was allowed to turn attention to learning about this new methodology. Because the science was so new, much effort was expended by those early pioneers in learning about the scope of applicability of the available methods. Attempts to actually design a drug were neither numerous nor particularly successful. This generalization does not imply that there were no scientific successes. There were a few successes in finding correlations and in gaining a better understanding of what was responsible for biological activity at the molecular and atomic level. For example, early work at Lilly revealed the glimmer of a relationship between the calculated electronic structure of the beta-lactam ring of cephalosporins and antibacterial activity. The work was performed in the 1960s but was not published38 until 1973 because of delays by cautious research management and patent attorneys at the company. (The relationship was elaborated in subsequent years,39,40 but no new pharmaceutical product resulted.41)

GAINING A FOOTHOLD: THE 1970s

Some of the tiny number of companies that first got into this game dropped out after a few years (but returned later), either for lack of management support or because the technology was not intellectually satisfying to the scientist involved. Other companies, like Lilly, persisted. Lilly’s pioneering effort paid off in establishing a base of expertise. Also, quite a few papers were published, almost as if by scientists in an academic setting. In hindsight, however, Lilly may have entered the field too early because the initial efforts were so limited by the then existing science, hardware, and software. First impressions can be lasting, and Lilly management rejected further permanent growth for more than 20 years. A series of managers at Lilly at least sustained the computational chemistry effort until near the end of the 1980s, when the computational chemistry group was enlarged to catch up in size with those at the other large pharmaceutical companies. Companies such as Merck and Smith Kline and French (using the old name) entered the field a few years after Lilly. Unlike Lilly, they hired chemists trained in both organic chemistry and computers and with a pedigree traceable back to Prof. E. J. Corey at Harvard and his attempts at computer-aided synthesis planning.42,43,44

Regarding hardware of the 1970s, pharmaceutical companies invested money from the sale of their products to buy better and better mainframes. Widely used models included members of the IBM 360 and 370 series. Placing these more powerful machines in-house made it easier and more secure to submit jobs and to retrieve output. But output was still in the form of long printouts. Input had advanced to the point where punch cards were no longer needed. So-called dumb terminals, i.e., terminals with no local processing capability, could be used to set up input jobs for batch running. For instance, at Lilly an IBM 3278 and a Decwriter II (connected to a DEC-10 computer) were used by the computational chemistry group. The statistics program MINITAB was one of the programs that ran on the interactive Digital Equipment Corporation (DEC) machine. Card punches were not yet totally obsolete, but they received less and less usage. The appearance of a typical computational chemistry laboratory is shown in Figure 2.

The spread of technology at pharmaceutical companies also meant that secretaries were given word processors (such as the Wang machines) to use in addition to typewriters, which were still needed for filling out forms. Keyboarding was the domain of secretaries, data entry technicians, and computational chemists. Only a few managers and scientists would type their own memos and articles in those days.

Software was still written primarily in FORTRAN, but now mainly FORTRAN IV. The holdings of QCPE expanded. Among the important acquisitions was Gaussian 70, an ab initio quantum chemistry program written by Prof. John A. Pople’s group at Carnegie-Mellon University. Pople made the program available in 1973.45,46 (He later submitted Gaussian 76 and Gaussian 80 to QCPE, but they were withdrawn when the Gaussian program was commercialized by Pople and some of his students in 1987.) Nevertheless, ab initio calculations, despite all the élan associated with them, were still not very practical or helpful for pharmaceutically interesting molecules. Semiempirical molecular orbital methods such as EHT, CNDO/2, and MINDO/3 were the mainstays of quantum chemical applications. MINDO/3 was Prof. Michael J. S. Dewar’s third refinement of a modified intermediate neglect of differential overlap method.47

Figure 2 Laboratories used by computational chemists in the 1970s and early 1980s were characterized by computer card files, key punches, and stacks of computer printouts. The terminal in the foreground, being used by a promising assistant, is a Decwriter II connected to a DEC 10 computer at the corporate computer center. The terminal in the background is an IBM 3278 that was hardwired to an IBM mainframe in the corporate computer center. Neither terminal had graphical capability. Computer printouts were saved because the computational chemistry calculations were usually lengthy (some running for weeks on a CPU) and were therefore expensive to reproduce. This photograph was taken on the day before Christmas 1982, but the appearance of the environs had not changed much since the mid-1970s.

The prominent position of quantum mechanics led a coterie of academic theoreticians to think their approach could solve research problems facing the pharmaceutical industry. These theoreticians, who met annually in Europe and on Sanibel Island in Florida, coined the terms quantum biology48 and quantum pharmacology,49 names that may seem curious to the uninitiated. They were not meant to imply that some observable aspect of biology or pharmacology stems from the wave-particle duality observed in the physics of electrons. Rather, the names conveyed to cognoscenti that they were applying their trusty old quantum mechanical methods to compounds discussed by biologists and pharmacologists.50 However, doing a calculation on a system of pharmacological interest is not the same as designing a drug. Calculating the molecular orbitals of serotonin, for instance, is a far cry from designing a new serotonin reuptake inhibitor that could become a pharmaceutical product.

Nonetheless, something even more useful came on the software scene in the 1970s. That was Prof. N. L. Allinger’s MMI/MMPI program51,52 for molecular mechanics. Classical methods for calculating conformational energies date to the 1940s and early 1960s.53,54 Copies of Allinger’s program could be purchased at a nominal fee from QCPE. Molecular mechanics has the advantage of being much faster than quantum mechanics and capable of generating common organic chemical structures approaching “chemical accuracy” (bond lengths correctly predicted to within about 0.01 Å). Because of the empirical manner in which force fields were derived, molecular mechanics was anathema to the quantum purists, never mind that Allinger himself also used quantum chemistry. Molecular mechanics became an important technique in the armamentarium of industrial researchers. Meanwhile, a surprising number of academic theoreticians were slow to notice that the science was transitioning55,56 from quantum chemistry to multifaceted computational chemistry.
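
Part of the reason molecular mechanics is so much faster is that the energy is a sum of simple classical terms evaluated directly from the atomic coordinates. The sketch below is a deliberately simplified illustration and not Allinger’s force field: it includes only harmonic bond-stretch and angle-bend terms, whereas practical force fields also contain torsional and nonbonded terms. The force constants, reference values, and function names are hypothetical.

```python
import math

def bond_angle(a, b, c):
    """Angle a-b-c in radians, with b as the central atom."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding error

def strain_energy(coords, bonds, angles):
    """Sum of harmonic stretch and bend terms, 0.5 * k * (value - reference)**2."""
    energy = 0.0
    for i, j, k_r, r0 in bonds:
        energy += 0.5 * k_r * (math.dist(coords[i], coords[j]) - r0) ** 2
    for i, j, k, k_theta, theta0 in angles:
        energy += 0.5 * k_theta * (bond_angle(coords[i], coords[j], coords[k]) - theta0) ** 2
    return energy

# Hypothetical three-atom fragment: atom 1 is central, bonded to atoms 0 and 2.
R0 = 1.0                       # assumed reference bond length (angstroms)
THETA0 = math.radians(109.5)   # assumed reference angle
bonds = [(0, 1, 600.0, R0), (1, 2, 600.0, R0)]   # (i, j, force constant, r0)
angles = [(0, 1, 2, 80.0, THETA0)]               # (i, j, k, force constant, theta0)

relaxed = [(R0, 0.0, 0.0), (0.0, 0.0, 0.0),
           (R0 * math.cos(THETA0), R0 * math.sin(THETA0), 0.0)]
stretched = [(1.1, 0.0, 0.0)] + relaxed[1:]  # lengthen one bond by 0.1 angstrom

e_relaxed = strain_energy(relaxed, bonds, angles)
e_stretched = strain_energy(stretched, bonds, angles)
print(e_relaxed, e_stretched)
```

Each term costs only a few arithmetic operations, so the energy of a drug-sized molecule can be evaluated and minimized quickly. The accuracy of the result rests entirely on the empirically fitted parameters, which is the feature the quantum purists objected to.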

Computational chemists in the pharmaceutical industry also branched out from their academic upbringing by acquiring an interest in force field methods, QSAR, and statistics. Computational chemists working to discover pharmaceuticals came to appreciate the fact that it was too limiting to confine one’s work to just one approach to a problem. To solve research problems in industry, one had to use the best available technique in the limited time available, and this did not necessarily mean going to a larger basis set or doing a higher level (and therefore longer running) quantum chemistry calculation. It meant using molecular mechanics or QSAR or whatever. It meant not being hemmed in by a purely quantum theoretical perspective.

Unfortunately, the tension between the computational chemists and the medicinal chemists at pharmaceutical companies did not ease in the 1970s. Medicinal chemists were at the top of the pecking order in the corporate research laboratories. On the basis of conversations at scientific meetings where computational chemists from industry (all of them together could fit in a small room in this period) could informally exchange their experiences and challenges, this was an industry-wide situation. (Readers should not get the impression that the tension between theoreticians and experimentalists existed solely in the business world – it also existed at academic chemistry departments.)

The situation was that as medicinal chemists pursued an SAR, the computational chemists might suggest a structure worthy of synthesis because calculations indicated that it had the potential of being more active. But the computational chemist was totally dependent on the medicinal chemist to test the hypothesis. Suddenly, the medicinal chemist saw himself going from being the wellspring of design ideas to being a technician who was implementing someone else’s idea. Although never intended as a threat to the prestige and hegemony of the organic chemistry hierarchy, design proposals from outside that hierarchy were often perceived as such.

Another problem was that it was easy to change a carbon to a nitrogen or any other element on a computer. Likewise, it was easy to attach a substituent at any position in whatever stereochemistry seemed best for enhancing activity. On a computer it was easy to change a six-member ring to a five-member ring or vice versa. Such computer designs were frequently beyond the possibilities of the synthetic organic chemists, or at least beyond the fast-paced chemistry practiced in industry. This situation contributed to the disconnect between computational chemists and medicinal chemists. What good is a computer design if the molecule is impossible to make?

If the computational chemist needed a less active compound to be synthesized so as to help establish a computational hypothesis, such as for a pharmacophore, that synthesis was totally out of the question. No self-respecting medicinal chemist would want to admit to his management that he purposely spent valuable time making a less active compound. Thus, the 1970s remained a period in which the relationship between computational chemists and medicinal chemists was still being worked out.

People in management, who generally rose from the ranks of medicinal chemists, were often unable to perceive a system for effective use of data and ideas from computational approaches. The managers had to constantly think about their own careers and did not want to get caught on the wrong side of a risky issue. Many managers of that time were far from convinced that computational input was worth anything.


The computational chemists at Lilly tackled this problem of a collaboration gap in several ways. One was to keep the communication channels open and constantly explain what was being done, what might be doable, and what was beyond the capabilities of the then-current state of the art. For organic chemists who had never used a computer, it was necessary to gently dispel the notion that one could push a button on a large box with blinking lights and the chemical structure of the next $200 million drug would tumble into the output tray of the machine. (Annual sales of $200 million was equivalent to a blockbuster drug in those days.) The limited capability to predict molecular properties accurately was stressed by the computational chemists to the synthetic chemists. Relative numbers might be predictable, but not absolute values. Moreover, it was up to the human, not the machine, to use chemical intuition to capitalize on relationships found between calculated physical properties and sought-after biological activities. Also, it was important for the computational chemist to avoid technical jargon and theory when talking with medicinal chemists. The computational chemists, to the best of their ability, had to speak the language of the organic chemists, not vice versa.

In an outreach to the medicinal chemists at Lilly, a one-week workshop was created and taught in the research building where the organic chemists were located. (The computational chemists were initially assigned office space with the analytical chemists and later with the biologists.) The workshop covered the basic and practical aspects of performing calculations on molecules. The input requirements (which included the format of the data fields on the punch cards) were taught for several programs. One program was used to generate Cartesian atomic coordinates. Output from that program was then used as input for the molecular orbital and molecular mechanics programs. Several of the adventurous young PhD organic chemists took the course. The outreach was successful in that it empowered a few medicinal chemists to do their own calculations for testing molecular design ideas – it was a foot in the door. These young medicinal chemists could set an example for the older ones. An analogous strategy was used at some other pharmaceutical companies. For instance, Merck conducted a workshop on synthesis planning for their chemists.57

Despite these efforts, medicinal chemists were slow to accept what computers could provide. Medicinal chemists would bring a research problem to the computational chemists, sometimes out of curiosity about what computing could provide, sometimes as a last resort after a question was unsolvable by other approaches. The question might be to explain why adding a certain substituent unexpectedly decreased activity in a series of compounds. Or the problem might involve finding a QSAR for a small set of compounds. If the subsequent calculations were unable to provide a satisfactory answer, there was a tendency on the part of some medicinal chemists to generalize from that one try and to dismiss the whole field of computational chemistry. This facet of human nature, especially of scientifically educated people, was difficult to fathom.

A perspective that we tried to instill in our colleagues was that a computer should be viewed as just another piece of research apparatus. Experiments could be done on a computer just as experiments could be run on a spectrometer or in an autoclave. Sometimes the instrument would give the results the scientist was looking for; other times, the computational experiment would fail. Not every experiment – at the bench or in the computer – works every time. If a reaction failed, a medicinal chemist would not dismiss all of synthetic chemistry; instead, another synthetic route would be attempted. However, the same patience did not seem to extend to computational experiments.

Finally, in regard to the collaboration gap, the importance of a knowledgeable (and wise) mentor – an advocate – cannot be overstated. For a nascent effort to take root in a business setting, younger scientists working in exploratory areas had to be shielded from excessive critiquing by powerful medicinal chemists and management.

The computational chemists were able to engage in collaborations with their fellow physical chemists. Some research questions dealt with molecular conformation and spectroscopy. The 1970s were full of small successes such as finding relationships between calculated and experimental properties. Some of these correlations were published. Even something so grand as the de novo design of a pharmaceutical was attempted but was not within reach.

Two new computer-based resources were launched in the 1970s. One was the Cambridge Structural Database58 (CSD; based in Cambridge, England), and the other was the Protein Data Bank59 (PDB; then based at Brookhaven National Laboratory in New York). Computational chemists recognized that these compilations of 3-D molecular structures would prove very useful, especially as more pharmaceutically relevant compounds were deposited. The CSD was supported by subscribers including pharmaceutical companies. On the other hand, the PDB was supported by American taxpayers.

We have not discussed QSAR very much, but two influential books of the 1970s can be mentioned. Dr. Yvonne Martin began her scientific career as an experimentalist in a pharmaceutical laboratory, but after becoming interested in the potential of QSAR, she spent time learning the techniques at the side of Prof. Corwin Hansch and also Prof. Al Leo of Pomona College in California. As mentioned in her book,60 she encountered initial resistance to a QSAR approach at Abbott Laboratories. Another significant book that was published in the late 1970s was a compilation of substituent constants.61 These parameters were heavily relied on in QSAR investigations.

As the field of computer-aided drug design began to catch on in the 1970s, leaders in the field recognized the need to set standards for publications. For instance, a group of American chemists involved in QSAR research published a paper recommending minimal requirements for reporting the results of QSAR studies.62 Also, a number of scientists recognized that the following problem could develop when CADD is juxtaposed to experimental drug discovery. Some CADD studies would eventually lead to hypotheses for more active chemical structures. The natural impulse on the part of a scientist is to publish the prediction. In fact, some computational chemists have been rather boastful about correctly predicting properties prior to experiment. A prediction and any subsequent experiments to synthesize a designed compound and test its biological activity are at the heart of the scientific method: hypothesis testing. But suppose a compound is made and it is active. What are the chances that the compound would be developed by a pharmaceutical company and would eventually reach the patients who could benefit from it? Unfortunately, the chances were not very good. Once an idea is publicly disclosed, there is a limit on obtaining patent rights to the compound. Because of the high expense of developing a pharmaceutical product, pharmaceutical companies would be reluctant to become involved with the designed compound. A committee formed by the International Union of Pure and Applied Chemistry (IUPAC) attempted to preclude this problem from becoming widespread. They proposed that any designed compound should be made and tested prior to its structure being disclosed.63

GROWTH: THE 1980s

If the 1960s were the Dark Ages and the 1970s were the Middle Ages of computational chemistry, the 1980s were the Renaissance, the Baroque Period, and the Enlightenment all rolled into one. The decade of the 1980s was when the various approaches of quantum chemistry, molecular mechanics, molecular simulations, QSAR, and molecular graphics coalesced into modern computational chemistry.

In the world of scientific publishing, a seminal event occurred in 1980. Professor Allinger launched his Journal of Computational Chemistry. This helped stamp a name on the field. Before the journal began publishing, the field was variously called theoretical chemistry, calculational chemistry, modeling, and so on. Interestingly, Allinger first took his journal proposal to the business managers in charge of publications of the American Chemical Society (ACS). When they rejected the concept, Allinger turned to publisher John Wiley & Sons, which went on to become the premier producer of journals and books in the field. (Sadly, it was not until 2005 that the ACS finally recognized the need to improve its journal offerings in the field of computational chemistry/molecular modeling. It also took the ACS bureaucracy a long time to recognize computational chemistry as an independent subdiscipline of chemistry.)

Several exciting technical advances fostered the improved environment for computer use at pharmaceutical companies in the 1980s. The first was the development of the VAX 11/780 computer by Digital Equipment Corporation (DEC) in 1979. The machine was departmental size, i.e., the price, dimensions, and easy care of the machine allowed each department or group to have its own superminicomputer. This was a start toward non-centralized control over computing resources. At Lilly, the small molecule X-ray crystallographers were the first to gain approval for the purchase of a VAX around 1980. Fortunately, the computational chemists and a few other scientists were allowed to use it too. The machine was a delight to use and far better than the batch-job-oriented mainframes of International Business Machines (IBM) and other hardware manufacturers. The VAX could be run interactively. Users communicated with the VAX through interactive graphical terminals, the first of which were monochrome. The first VAX at Lilly was fine for one or two users, but would get bogged down and response times would slow to a crawl if more than five users were logged on simultaneously. Lilly soon started building an ever more powerful cluster of VAXes (also called VAXen in deference to the plural of ox). Several other hardware companies that manufactured superminicomputers in the same class as the VAX sprang up. But DEC proved to be a good, relatively long-lasting vendor to deal with, and many pharmaceutical companies acquired VAXes for research. (Today, DEC and those other hardware companies no longer exist.)

The development of personal computers (PCs) in the 1980s started to change the landscape of computing. The pharmaceutical companies certainly noticed the development of the IBM PC, but its DOS (disk operating system) made learning to use it difficult. Some scientists nonetheless bought these machines. The Apple Macintosh appeared on the scene in 1984. With its cute, little, lightweight, all-in-one box including monochrome screen, the Mac brought interactive computing to a new standard of user friendliness. Soon after becoming aware of these machines, nearly every medicinal chemist wanted one at work. The machines were great at word processing, graphing, and managing small (laboratory-sized) databases. The early floppy disks formatted for the Macs had a storage capacity of only 400 KB, but by 1988 double-sided, double-density disks could hold 1400 KB, which seemed plenty in those days. In contrast to today’s huge applications requiring a compact disk (> 500 MB) for storage, a typical program of the 1980s could be “stuffed” (compressed) on one or maybe two floppy disks.

On the software front, three advances turned the medicinal chemists from diehard skeptics into almost enthusiastic users. One advance was the development of electronic mail. As the Macs and terminals to the VAX spread to all the chemists in drug discovery and development, the desirability of being connected became obvious. The chemists could communicate with each other and with management and could tap into databases and other computer resources. As electronic traffic increased, research buildings had to be periodically retrofitted with each new generation of cabling to the computers. A side effect of the spread of computer terminals to the desktop of every scientist was that management could cut back on secretarial help for scientists, who then had to do their own keyboarding to write reports and papers.

The second important software advance was ChemDraw,64,65,66,67 which was released first for the Mac in 1986. This program gave chemists the ability to quickly create two-dimensional chemical diagrams. Every medicinal chemist could appreciate the aesthetics of a neat ChemDraw diagram. The diagrams could be cut and pasted into reports, articles, and patents. The old plastic ring templates for drawing chemical diagrams by hand were suddenly unnecessary.

The third software advance also had an aesthetic element. This was the technology of computer graphics or, as it is called when 3-D structures are displayed on the computer screens, molecular graphics. Whereas medicinal chemists might have trouble understanding the significance of the highest occupied molecular orbital or the octanol-water partition coefficient of a structure, they could readily appreciate the stick, ball-and-stick, tube, and space-filling representations of 3-D molecular structures.68,69,70 The graphics could be shown in color and, on more sophisticated terminals, in stereo. These images were so stunning that one director of drug discovery at Lilly decreed that terms like theoretical chemistry, molecular modeling, and computational chemistry were out. The whole field was henceforth to be called molecular graphics as far as he was concerned. A picture was something that could be understood! Independently, the Journal of Molecular Graphics sprang up in 1983.

Naturally, with the flood of new computer technology came the need to train the research scientists in its use. Whereas the Mac was so easy that medicinal chemists could master it and ChemDraw in less than a day of training, the VAX was a little more formidable. The author designed, organized, and taught VAX classes offered to the medicinal chemists and process chemists at Lilly.

Computer programs that the computational chemists had been running on the arcane IBM mainframes were ported to the VAXes. This step made the programs more accessible because all the chemists were given VAX accounts. So, having been enticed by the other programs (e.g., email and ChemDraw) to sit down in front of the computer screen, medicinal chemists were now more likely to experiment with molecular modeling calculations. (As discussed elsewhere,71 the terms computational chemistry and molecular modeling were used more or less interchangeably at pharmaceutical companies, whereas other scientists in the field tried to distinguish the terms.) Besides the classes and workshops, one-on-one training was offered to help the medicinal chemists run the computational chemistry programs. This was generally fruitful but occasionally led to amusing results, such as when one medicinal chemist burst out of his lab to happily announce his discovery that he could obtain a correct-looking 3-D structure from MM2 optimization even if he did not bother to attach hydrogens to the carbons. However, he had not bothered to check the bond lengths and bond angles for his molecule.

On a broader front, large and small pharmaceutical companies became aware of the potential for computer-aided drug design. Although pharmaceutical companies were understandably reticent about discussing what compounds they were pursuing, they were quite free in disclosing their computational chemistry infrastructure. For instance, Merck, which had grown its modeling group to be one of the largest in the world, published a description of its system72 in 1980. Lilly’s infrastructure73 was described at a national meeting of the American Chemical Society in 1982.

Whereas corporate managers generally are selected for displaying leadership skills, they often just follow what they see managers doing at other companies. Hence, a strategy used by scientists to obtain new equipment or other resources was to make their managers aware of what other companies were able to do. New R&D investments are often spurred by the desire of management to keep up with competing companies.

In the mid-1980s, the author initiated a survey of 48 pharmaceutical and chemical companies that were using computer-aided molecular design methods and were operating in the United States.74 The aim of the survey was to collect data that would convince management that Lilly’s computational chemistry effort needed to grow. We summarize here some highlights of the data because they give a window on the situation existing in the mid-1980s.

Between 1975 and 1985, the number of computational chemists employed at the 48 companies increased from fewer than 30 to about 150, more than doubling every five years. Thus, more companies were jumping on the bandwagon, and many companies that were already in this area were expanding their efforts. Hiring of computational chemists accelerated through the decade.75 Aware of the polarization that could exist between theoretical and medicinal chemists, some companies tried to circumvent this problem by hiring organic chemistry PhDs who had spent a year or two doing postdoctoral research in molecular modeling. This trend was so pervasive that by 1985, only about a fifth of the computational chemists working at pharmaceutical companies came from a quantum mechanical background. Students too became aware that if their PhD experience was in quantum chemistry, it would enhance their job prospects if they spent a year or two in some other area such as performing molecular dynamics simulations of proteins.
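The "more than doubling every five years" figure can be checked with a line of arithmetic. The sketch below assumes steady exponential growth over the decade and takes 30 as the 1975 head count; the symbols r (growth factor per five-year period) and N (head count) are introduced here for illustration and do not appear in the survey.

```latex
\[
r^{2} \;=\; \frac{N_{1985}}{N_{1975}} \;\approx\; \frac{150}{30} \;=\; 5
\qquad\Longrightarrow\qquad
r \;=\; \sqrt{5} \;\approx\; 2.24
\]
```

Because the 1975 count was fewer than 30, the actual factor under this assumption would be at least this large, which is consistent with the statement in the text.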

The computational chemistry techniques used most frequently at that time were molecular graphics and molecular mechanics. Ab initio quantum programs were in use at 21 of the 48 companies. Over 80% of the companies were using commercially produced software. Two-thirds of the companies were using software sold by Molecular Design Ltd. (MDL). A quarter were using SYBYL from Tripos Associates, and 15% were using the molecular modeling program CHEMGRAF by Chemical Design Ltd.


The following companies had five or more scientists working full-time as computational chemists in 1985: Abbott, DuPont, Lederle (part of American Cyanamid), Merck, Rohm and Haas, Searle, SmithKline Beecham, and Upjohn. Some of these companies had as many as 12 scientists working on computer-aided molecular design applications and software development. For the 48 companies, the mean ratio of the number of synthetic chemists to computational chemists was 29:1. This ratio reflects not only what percentage of a company’s research effort was computer-based, but also the number of synthetic chemists that each computational chemist might be expected to serve. Hence, a small ratio indicates more emphasis on computing or a small staff of synthetic chemists. Pharmaceutical companies with low ratios (less than 15:1) included Abbott, Alcon, Allergan, Norwich Eaton (part of Procter & Gamble), and Searle. The most common organizational arrangement (at 40% of the 48 companies) was for the computational chemists to be integrated in the same department or division as the synthetic chemists. The other companies tried placing their computational chemists in a physical/analytical group, in a computer science group, or in their own unit.

About three-quarters of the 48 companies were using a VAX 11/780, 785, or 730 as their primary computing platform for research. The IBM 3033, 3083, 4341, and so on were being used for molecular modeling at about a third of the companies. (The percentages add up to more than 100% because larger companies had several types of machines.) The most commonly used graphics terminal was the Evans and Sutherland PS300 (E&S PS300) (40%), followed by Tektronix, Envision, and Retrographics VT640 at about one-third of the companies each, and IMLAC (25%). The most used brands of plotter in 1985 were Hewlett-Packard and Versatec.

As mentioned above, the most widely used graphics terminal in 1985 was the E&S PS300. This machine was popular because of its very high resolution, color, speed, and stereo capabilities. (It is stunning to think that a company so in fashion and dominant during one decade could totally disappear from the market a decade later. Such are the foibles of computer technology.) At Lilly, the E&S PS300 was set up in a large lighted room with black curtains enshrouding the cubicle with the machine. All Lilly scientists were free to use the software running on the machine. In addition, the terminal served as a showcase of Lilly’s research prowess that was displayed to visiting Lilly sales representatives and visiting dignitaries. No doubt a similar situation occurred at other companies.

The ability to see molecular models or other three-dimensional data on a computer screen was a novelty that further widened interest in computer graphics. Most users required special stereo glasses to see the images in stereo, but some chemists delighted themselves by mastering the relaxed-eye or crossed-eye technique of looking at the pairs of images.

The 1980s saw an important change in the way software was handled. In the 1970s, most of the programs used by computational chemists were distributed essentially freely through QCPE, exchanged person to person, or developed in-house. But in the 1980s, many of the most popular programs (and some less popular ones) were commercialized. The number of software vendors mushroomed. For example, Pople’s programs for ab initio calculations were withdrawn from QCPE; marketing rights were turned over to a company he helped found, Gaussian Inc. (Pittsburgh, Pennsylvania). This company also took responsibility for continued development of the software. In the molecular modeling arena, Tripos Associates (St. Louis, Missouri) was dominant by the mid-1980s. Their program SYBYL originally came from academic laboratories at Washington University (St. Louis).76

In the arena of chemical structure management, MDL (then in Hayward, California) was dominant. This company, which was founded in 1978 by Prof. Todd Wipke and others, marketed a program called MACCS for management of databases of compounds synthesized at or acquired by pharmaceutical companies. The software stored chemical structures (in two-dimensional representation) and allowed substructure searching and later similarity searching.77,78 The software was vastly better than the manual systems that pharmaceutical companies had been using for recording compounds on file cards that were stored in filing cabinets. Except for some companies, such as Upjohn, which had their own home-grown software for management of their corporate compounds, many companies bought MACCS and became dependent on it. As happens in a free market where there is little competition, MACCS was very expensive. Few if any academic groups could afford it. A serious competing software product for compound management did not reach the market until 1987, when Daylight Chemical Information Systems was founded. By then, pharmaceutical companies were so wed to MACCS that there was great inertia against switching their databases to another platform, even if it was cheaper and better suited for some tasks.

In 1982, MDL started selling REACCS, a database management system for chemical reactions. Medicinal chemists liked both MACCS and REACCS. The former could be used to check whether a compound had been synthesized in-house and, if so, how much material was left in inventory. The latter program could be used to retrieve information about synthetic transformations and reaction conditions that had been published in the literature.

Some other momentous advances occurred on the software front. One was the writing of MOPAC, a semiempirical molecular orbital program, by Dr. James J. P. Stewart, a postdoctoral associate in Prof. Michael Dewar’s group at the University of Texas at Austin.79,80,81 MOPAC was written in FORTRAN 77, a language that became popular among computational chemists in the 1980s. MOPAC was the first widely used program capable of automatically optimizing the geometry of molecules. This was a huge improvement over prior programs that could only perform calculations on fixed geometries. Formerly, a user would have to vary a bond length or a bond angle in increments, doing a separate calculation for each; then fit a parabola to the data points and try to guess where the minimum was. Hence, MOPAC made the determination of 3-D structures much simpler and more efficient. The program could handle molecules large enough to be of pharmaceutical interest. With VAXes, a geometry optimization calculation could run as long as two or three weeks of wall clock time. An interruption of a run caused by a machine shutdown meant rerunning the calculation from the start. For the most part, however, the VAXes were quite stable.
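The manual procedure that automatic geometry optimization replaced (scan one coordinate, fit a parabola, read off the minimum) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the energies are made-up numbers rather than output from any quantum chemistry program, and the fit is an ordinary least-squares parabola, which may differ from what any given chemist actually did by hand.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    # Power sums that appear in the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]                   # sum of x^k
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]   # sum of y*x^k
    normal = [[s[4], s[3], s[2]],
              [s[3], s[2], s[1]],
              [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(normal)
    if d == 0:
        raise ValueError("points do not determine a parabola")
    # Cramer's rule: replace one column at a time with the right-hand side.
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in normal]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def estimate_minimum(xs, ys):
    """Return (x_min, y_min) of the parabola fitted to the scan points."""
    shift = sum(xs) / len(xs)          # center the scan for numerical stability
    a, b, c = fit_parabola([x - shift for x in xs], ys)
    if a <= 0:
        raise ValueError("fitted parabola has no minimum")
    return shift - b / (2 * a), c - b * b / (4 * a)


# Hypothetical scan: made-up relative energies at five bond lengths (angstroms).
bond_lengths = [1.00, 1.05, 1.10, 1.15, 1.20]
energies = [0.045, 0.012, 0.001, 0.014, 0.049]
r_min, e_min = estimate_minimum(bond_lengths, energies)
print(f"estimated minimum near r = {r_min:.3f}")
```

Each point in such a scan required its own fixed-geometry calculation, and the scan had to be repeated for every bond length and angle of interest, which is why an optimizer that varied all coordinates automatically was such an improvement.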

MOPAC was initially applicable to any molecule parameterized for Dewar’s MINDO/3 or MNDO molecular orbital methods (i.e., common elements of the first and second rows of the periodic table). The optimized geometries were not in perfect agreement with experimental numbers but were better than what could have been obtained by prior molecular orbital programs for large molecules (those beyond the capability of ab initio calculations). Stewart made his program available through QCPE in 1984, and it quickly became (and long remained) the most requested program from QCPE’s library of several hundred.82 Unlike commercialized software, programs from QCPE were attractive because they were distributed as source code and cost very little.

In the arena of molecular mechanics, Prof. Allinger’s ongoing, meticulous refinement of an experimentally-based force field for organic compounds was welcomed by chemists interested in molecular modeling at pharmaceutical companies. The MM2 force field83,84 gave better results than MMI. To fund his research, Allinger sold distribution rights for the program initially to Molecular Design Ltd. (At the time, MDL also marketed several simple molecular graphics and modeling programs. Later, distribution rights for Allinger’s programs were transferred to Tripos.)

A program of special interest to the pharmaceutical industry was CLOGP. This program was developed by Prof. Al Leo (Pomona College) in the 1980s.85,86,87 It was initially marketed through Daylight Chemical Information Systems (then of New Orleans and California). CLOGP could predict the lipophilicity of organic molecules. The algorithm was based on summing the contribution from each fragment (set of atoms) within a structure. The fragment contributions were parameterized to reproduce experimental octanol-water partition coefficients, log Po/w. There was some discussion among scientists about whether octanol was the best organic solvent to mimic biological tissues, but this solvent proved to be satisfactory for most purposes and eventually became the standard. To varying degrees, lipophilicity is related to many molecular properties, including molecular volume, molecular surface area, transport through membranes, binding to receptor surfaces, and hence to many different bioactivities. The calculated log Po/w values were widely used as a descriptor in QSAR studies in both industry and academia.
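The fragment-summation idea described above can be illustrated with a toy calculation. Everything in this sketch is an assumption made for illustration: the fragment labels, the numerical contributions, and the function name are invented placeholders, not CLOGP’s actual fragment set or parameters, and the sketch shows only the additive step the text describes.

```python
from collections import Counter

# Placeholder fragment contributions to log P (octanol-water).
# These numbers are invented for illustration and are NOT CLOGP parameters.
FRAGMENT_VALUES = {
    "CH3": 0.9,
    "CH2": 0.5,
    "C6H5": 1.9,
    "OH": -1.6,
    "NH2": -1.5,
}


def estimate_log_p(fragments):
    """Sum per-fragment contributions for a molecule given as a list of fragment labels."""
    counts = Counter(fragments)
    unknown = sorted(f for f in counts if f not in FRAGMENT_VALUES)
    if unknown:
        # A fragment method can only score structures whose pieces were parameterized.
        raise KeyError(f"no contribution available for fragments: {unknown}")
    return sum(FRAGMENT_VALUES[f] * n for f, n in counts.items())


# Hypothetical decomposition of 1-propanol into placeholder fragments.
propanol = ["CH3", "CH2", "CH2", "OH"]
print(f"estimated log P = {estimate_log_p(propanol):.2f}")
```

The design point is that the estimate requires only a decomposition of the structure and a lookup table fitted to experimental partition coefficients, so it is fast enough to serve as a routine descriptor in QSAR work.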

Yet another program was Dr. Kurt Enslein’s TOPKAT.88,89 It was sold through his company, Health Designs (Rochester, New York). The software was based on statistics and was trained to predict the toxicity of a molecule from its structural fragments. Hence, compounds with fragments such as nitro or hydrazine would score poorly, basically confirming what an experienced medicinal chemist already knew. The toxicological endpoints included carcinogenicity, mutagenicity, teratogenicity, skin and eye irritation, and so forth. Today, pharmaceutical companies routinely try to predict toxicity, metabolism, bioavailability, and other factors that determine whether a highly potent ligand has what it takes to become a medicine. But back in the 1980s, the science was just beginning to be tackled computationally. The main market for the program was probably government laboratories and regulators. Pharmaceutical laboratories were aware of the existence of the program but were leery of using it much. Companies trying to develop drugs were afraid that if the program, which was of unknown reliability for any specific compound, erroneously predicted danger for a structure, it could kill a project even though a multitude of laboratory experiments might give the compound a clean bill of health. There was also the worry about litigious lawyers. A compound could pass all the difficult hurdles of becoming a pharmaceutical, yet some undesirable, unexpected side effect might show up in some small percentage of patients taking it. If lawyers and lay juries (who frequently had, and have, trouble comprehending complex topics such as science, the relative merits of different experiments, and the benefit-risk ratio associated with any pharmaceutical product) learned that a computer program had once put up a red flag for the compound, the pharmaceutical company could be alleged to be at fault.

We briefly mention one other commercially produced program, SAS. This comprehensive data management and statistics program was used mainly for handling clinical data, which was analyzed by the statisticians at each company. Computational chemists also used SAS and other programs when statistical analyses were needed. SAS also had then-unique capabilities for producing graphical representations of multidimensional numerical data.90 (This was in the days prior to Spotfire.)

With the widespread commercialization of molecular modeling software in the 1980s came both a boon and a bane to the computational chemist and pharmaceutical companies. The boon was that the software vendors sent marketing people to individual companies as well as to scientific meetings. The marketeers would extol the virtues of the programs they were pushing. Great advances in drug discovery were promised if only the vendor’s software systems were put in the hands of the scientists. Impressive demonstrations of molecular graphics, overlaying molecules, and so forth convinced company managers and medicinal chemists that here was the key to increasing research productivity. As a result of this marketing, most pharmaceutical companies purchased the software packages. The bane was that computer-aided drug design (CADD) was oversold, thereby setting up unrealistic expectations of what could be achieved by the software. Unrealistic expectations were also set for what bench chemists could accomplish with the software. Bench chemists tend to be intolerant of problematic molecular modeling software. Whereas experienced computational chemists are used to tolerating complex, limited, jury-rigged, or tedious software solutions, bench chemists generally do not have the time or patience to work with software that is difficult to use. Unless the experimentalists devoted a good deal of time to learning the methods and limitations, the software was best left in the hands of computational chemistry experts.

Also in the 1980s, structure-based drug design (SBDD) underwent a similar cycle. Early proponents oversold what could be achieved through SBDD, thereby causing pharmaceutical companies to reconsider their investments when they discovered that SBDD too was no panacea for filling the drug discovery cornucopia with choice molecules for development. Nevertheless, SBDD was an important advance.

All through the 1970s, computational chemists were often rhetorically quizzed by critics about what, if any, pharmaceutical product had ever been designed by computer. Industrial computational chemists had a solid number of scientific accomplishments but were basically on the defensive when challenged with this question. Evidence to rebut the critics strengthened in the 1980s. The fact is that only a few computer-designed structures had ever been synthesized. (See our earlier discussion on who gets credit for a design idea.) It is also a fact that only a very tiny percentage of molecules, from any source, ever makes it as far as being a clinical candidate. The stringent criteria set for pharmaceutical products to be used in humans winnow out almost all molecules. The odds were not good for any computational chemist achieving the ultimate success: discovering a drug solely with the aid of the computer. In fact, many medicinal chemists would toil diligently at their benches and fume hoods for a whole career and never have one of their compounds selected as a candidate for clinical development.

Another factor impeding computational chemistry from reaching its full usefulness was that there were only a few drug targets that had had their 3-D structures solved prior to the advancing methods for protein crystallography of the 1980s. One such early protein target was dihydrofolate reductase (DHFR), the 3-D structures of which became known in the late 1970s.91,92 This protein became a favorite target of molecular modeling/drug design efforts in industry and elsewhere in the 1980s. Many resources were expended trying to find better inhibitors than the marketed pharmaceuticals, the antineoplastic methotrexate and the antibacterial trimethoprim. Innumerable papers and lectures sprang from those efforts. Scientists do not like to report negative results, but one brave author of a 1988 review article quietly alluded to the fact that none of the computer-based efforts at his company or disclosed by others in the literature had yielded better drugs.93

Although this first major, widespread effort at SBDD was a disappointment, the situation looked better on the QSAR front. In Japan, Koga94,95,96 employed classical (Hansch-type) QSAR while discovering the antibacterial agent norfloxacin around 1982. Norfloxacin was the first of the third-generation analogs of nalidixic acid to reach the marketplace. This early success may not have received the notice it deserved, perhaps because the field of computer-aided drug design continued to focus heavily on computer graphics, molecular dynamics, X-ray crystallography, and nuclear magnetic resonance spectroscopy.97 Another factor obscuring this success may have been that medicinal chemists and microbiologists at other pharmaceutical companies capitalized upon the discovery of norfloxacin to elaborate even better quinolone antibacterials that eventually dominated the market.

As computers and software improved, SBDD became a more popular approach to drug discovery. One company, Agouron in San Diego, California, set a new paradigm for discovery based on iterations between crystallography and medicinal chemistry. As new compounds were made, some of them could be co-crystallized with the target protein. The 3-D structures of the complexes were solved by rapid computer techniques. Molecular modeling observations of how the compounds fit into the receptor suggested ways to improve affinity, leading to another round of synthesis and crystallography.

Although considered by its practitioners and most others as an experimental science, protein crystallography (now popularly called structural biology) often employed a step whereby the diffraction data was refined in conjunction with constrained molecular dynamics (MD) simulations. Dr. Axel Brunger’s program X-PLOR98 met this important need. The force field in the program had its origin in CHARMM developed by Prof. Martin Karplus’s group at Harvard.99 Pharmaceutical companies that set up protein crystallography groups acquired X-PLOR to run on their computers.

The SBDD approach affected computational chemists positively. The increased number of 3-D structures of therapeutically relevant targets opened new opportunities for molecular modeling of the receptor sites. Computational chemists assisted the medicinal chemists in interpreting the fruits of crystallography for design of new ligands.

Molecular dynamics simulations can consume prodigious amounts of computer time. Not only are proteins very large structures, but also MD results are regarded as better the longer the simulations are run because more of conformational space is assumed to be sampled by the jiggling molecules. Even more demand for computer power seemed necessary when free energy perturbation (FEP) theory appeared on the scene. Some of the brightest luminaries in academic computational chemistry proclaimed that here was a powerful new method for designing drugs.100,101 Pharmaceutical companies were influenced by these claims.102 On the other hand, computational chemists closer to the frontline of working with medicinal chemists generally recognized that whereas FEP was a powerful method for accurately calculating the binding energy between ligands and macromolecular targets, it was too slow for extensive use in actual drug discovery. The molecular modifications that could be simulated with FEP treatment, such as changing one substituent to another, were relatively minor. Because the FEP simulations had to be run so long to obtain good results, it was often possible for a medicinal chemist to synthesize the new modification in less time than it took to do the calculations! And, in those cases where a synthesis would take longer than the calculations, not many industrial medicinal chemists would rate the modification predicted from theory to be worth investing that much of their time. Researchers in industry are under a great deal of pressure to tackle problems quickly and not spend too much time on them.

As we near the end of our coverage of the 1980s, we mention one unusual organizational structure. Whereas it was common practice in pharmaceutical companies for a medicinal chemist or other organic chemist to manage the computational chemistry group, one small company, Searle in Chicago, experimented in the mid-1980s with the arrangement of having the medicinal chemistry group report to a computational chemist. A potential advantage of this arrangement was that molecular structures designed on the computer would more likely be synthesized. Also, collaboration between the computational and the medicinal chemists could be mandated by a manager who wanted CADD to have a chance to succeed. However, the experiment lasted only two years. A publication in 1991 revealed that Searle experienced some of the same frictions in trying to maximize the contributions of computational chemistry that plagued other companies.103 (Searle was eventually subsumed by Pharmacia, which was swallowed by Pfizer.)

The insatiable need for more computing resources in the 1980s prompted the pharmaceutical companies to investigate supercomputing.104 Some pharmaceutical companies opted to acquire specialized machines such as array processors. By the mid-1980s, for example, several pharmaceutical companies had acquired the Floating Point Systems (FPS) 164. Other pharmaceutical companies sought to meet their needs by buying time and/or forming partnerships with one of the state or national supercomputing centers that had been set up in the United States, Europe, and Japan. For instance, in 1988, Lilly partnered with the National Center for Supercomputing Applications (NCSA) in Urbana-Champaign, Illinois. Meanwhile, supercomputer manufacturers such as Cray Research and ETA Systems, both in Minnesota, courted scientists and managers at the pharmaceutical companies.

A phrase occasionally heard in this period was that computations were the “third way” of science. The other two traditional ways to advance science were experiment and theory. The concept behind the new phrase was that computing could be used to develop and test theories and to stimulate ideas for new experiments.

GEMS DISCOVERED: THE 1990s

The 1990s was a decade of fruition because the computer-based drug discovery work of the 1980s yielded an impressive number of new chemical entities reaching the pharmaceutical marketplace. We elaborate on this statement later in this section, but first we complete the story about supercomputers in the pharmaceutical industry.

Pharmaceutical companies were accustomed to supporting their own research and making large investments in it. In fact, the pharmaceutical industry has long maintained the largest self-supporting research enterprise in the world. However, the price tag on a supercomputer was daunting. To help open the pharmaceutical industry as customers for supercomputers, the chief executive officer (CEO) of Cray Research took the bold step of paying a visit to the CEO of Lilly in Indianapolis. Apparently, Cray’s strategy was to entice a major pharmaceutical company to purchase a supercomputer, and then additional pharmaceutical companies might follow suit in the usual attempt to keep their research competitive. Lilly was offered a Cray-2 at an irresistible price. Not only did Lilly buy a machine, but other pharmaceutical companies either bought or leased a Cray. Merck, Bristol-Myers Squibb, Marion Merrell Dow (then a large company in Cincinnati, Ohio), Johnson & Johnson, and Bayer were among the companies that chose a Cray. Some of these machines were the older X-MP or the smaller J90 machine, the latter being less expensive to maintain.

After Lilly’s purchase of the Cray 2S-2/128, line managers were given the responsibility to make sure the purchase decision had a favorable outcome. This was a welcome opportunity because line management was fully confident that supercomputing would revolutionize research and development (R&D).105 The Lilly managers believed that a supercomputer would enable their scientists to test more ideas than would be practical with older computers. Management was optimistic that a supercomputer would foster collaborations and information sharing among employees in different disciplines at the company. The managers hoped that both scientific and business uses of the machine would materialize. Ultimately then, supercomputing would speed the identification of promising new drug candidates. Scientists closer to the task of using the supercomputer saw the machine primarily as a tool for performing longer molecular dynamics simulations and quantum mechanical calculations on large molecules. However, if some other computational technique such as QSAR or data mining was more effective at discovering and optimizing new lead compounds, then the supercomputer might not fulfill the dreams envisioned for it. A VAX cluster remained an essential part of the technological infrastructure best suited for management of the corporate library of compounds (see more about this later).

Lilly management organized special workshops to train potential users of the Cray. This pool of potential users included as many willing medicinal chemists and other personnel as could be rounded up. In-house computational chemists and other experts were assigned the responsibility of conducting the off-site, week-long workshops. The workshops covered not only how to submit and retrieve jobs, but also the general methods of molecular modeling, molecular dynamics, quantum chemistry, and QSAR. The latter, as mentioned, did not require supercomputing resources, except perhaps occasionally to generate quantum mechanical descriptors. Mainly, however, the training had the concomitant benefit of exposing more medicinal chemists, including younger ones, to what could be achieved with the current state of the art of computational chemistry applied to molecular design.

As the role of the computational chemists became more important, attitudes toward them became more accepting. At some large, old pharmaceutical houses, and at many smaller, newer companies, it was normal practice to allow computational chemists to be co-inventors on patents if the computational chemists contributed to a discovery. Other companies, including Lilly, had long maintained a company-wide policy that computational chemists could not be on drug patents. The policy was changed at Lilly as the 1990s dawned. Computational chemists were becoming nearly equal partners in the effort to discover drugs. This was good both for the computational chemists and for the company because modern pharmaceutical R&D requires a team effort.

Lilly’s Cray also served as an impressive public relations showcase. The machine was housed in a special, climate-controlled room. One side of the darkened room had a wall of large glass windows treated with a layer of polymer-dispersed liquid crystals. The thousands of visitors who came to Lilly each year were escorted into a uniquely designed observation room where an excellent video was shown that described the supercomputer and how it would be used for drug discovery. The observation room was automatically darkened at the start of the video. At the dramatic finish of the video, the translucent glass wall was turned clear and bright lights were turned on inside the computer room revealing the Cray-2 and its cooling tower for the heat transfer liquid. The visitors enjoyed the spectacle.

To the disappointment of Lilly’s guest relations department, Lilly’s Cray-2 was later replaced with a Cray J90, a mundane-looking machine. But the J90 was more economical, especially because it was leased. The supercomputers were almost always busy with molecular dynamics and quantum mechanical calculations.106 Of the personnel at the company, the computational chemists were the main beneficiaries of supercomputing.

At the same time supercomputers were creating excitement at a small number of pharmaceutical companies, another hardware development was attracting attention at just about every company interested in designing drugs. Workstations from Silicon Graphics Inc. (SGI) were becoming increasingly popular for molecular research. These high-performance, UNIX-based machines were attractive because of their ability to handle large calculations quickly and because of their high-resolution, interactive computer graphics. Although a supercomputer was fine for CPU-intensive jobs, the workstations were better suited for interactive molecular modeling software being used for drug research. The workstations became so popular that some medicinal chemists wanted them for their offices, not so much for extensive use, but rather as a status symbol.

Another pivotal event affecting the hardware situation of the early 1990s merits mention. As already stated, the Apple Macintoshes were well liked by scientists. However, in 1994, Apple lost its lawsuit against Microsoft regarding the similarities of the Windows graphical user interface (GUI) to Apple’s desktop design. Adding to Apple Corporation’s problems, the price of Windows-based PCs dropped significantly below that of Macs. The tables tilted in favor of PCs. More scientists began to use PCs. At Lilly, and maybe other companies, the chief information officer (a position that did not even exist until the 1990s when computer technology became so critical to corporate success) decreed that the company scientists would have to switch to PCs whether they wanted to or not. The reasons for this switch were several-fold. The PCs were more economical. With PCs being so cheap, it was likely more people would use them, and hence, there was a worry that software for Macs would become less plentiful. Also, the problem of incompatible files would be eliminated if all employees used the same type of computer and software.

On the software front, the early 1990s witnessed a continued trend toward commercially produced programs being used in pharmaceutical companies. Programs such as SYBYL (marketed by Tripos), Insight/Discover (BIOSYM), and Quanta/CHARMm (Polygen, and later Molecular Simulations Inc., and now called Accelrys) were popular around the world for molecular modeling and simulations. Some pharmaceutical companies bought licenses to all three of these well-known packages. Use of commercial software freed the in-house computational chemists from the laborious task of code development, documentation, and maintenance, so that they would have more time to work on actual drug design. Another attraction of using commercial software was that the larger vendors would have a help desk that users could telephone for assistance when software problems arose, as they often did. The availability of the help desk meant that the in-house computational chemists would have fewer interruptions from medicinal chemists who were having difficulty getting the software to work. On the other hand, some companies, particularly Merck and Upjohn, preferred to develop software in-house because it was thought to be better than what the vendors could provide.

Increasing use of commercial software for computational chemistry meant a declining role for software from QCPE. QCPE had passed its zenith by ca. 1992, when it had almost 1900 members and over 600 programs in its catalog. This catalog included about 15 molecular modeling programs written at pharmaceutical companies and contributed for the good of the community of computational chemists. Among the companies contributing software were Merck, DuPont, Lilly, Abbott, and Novartis. When distribution rights for MOPAC were acquired by Fujitsu in 1992, it was a severe blow to QCPE. After a period of decline, the operations of QCPE changed in 1998. Today only a Web-based operation continues at Indiana University Bloomington.


The 1990s witnessed changes for the software vendors also. The California company that started out as BioDesign became Molecular Simulations Inc. (MSI). Management at MSI went on a buying spree starting in 1991. The company acquired other small software companies competing in the same drug design market, including Polygen, BIOSYM, BioCAD, and Oxford Molecular (which had already acquired several small companies including Chemical Design Ltd. in 1998).107 Pharmaceutical companies worried about this accretion because it could mean less competition and it could mean that their favorite molecular dynamics (MD) program might no longer be supported in the future. This latter possibility has not come to pass because there was sufficient loyalty and demand for each MD package to remain on the market.

Researchers from pharmaceutical companies participated in user groups set up by the software vendors. Pharmaceutical companies also bought into consortia created by the software vendors. These consortia, some of which dated back to the 1980s, aimed at developing new software tools or improving existing software. The pharmaceutical companies hoped to get something for their investments. Sometimes the net effect of these investments was that it enabled the software vendors to hire several postdoctoral research associates who worked on things that were of common interest to the investors. Although the pharmaceutical companies received some benefit from the consortia, other needs, such as more and better force field parameters, remained underserved. Inspired by the slow progress in one force field development consortium, Merck single-handedly undertook the de novo development of a force field they call the Merck Molecular Force Field (MMFF94). This force field, which targeted the modeling of pharmaceutically interesting molecules well, was published,108–114 and several software vendors subsequently incorporated it in their molecular modeling programs. The accolades of fellow computational chemists led to the developer being elected in 1992 to become chairman of one of the Gordon Research Conferences on Computational Chemistry. (The latter well-respected conference series originated in 1986.115)

On the subject of molecular modeling and force fields, a general molecular modeling package was developed in an organic chemistry laboratory at Columbia University in New York City.116 Perhaps because MacroModel was written with organic chemists in mind, it proved popular with industrial medicinal chemists, among others. The program was designed so that versions of widely used, good force fields including those developed by Allinger and by Kollman could easily be invoked for any energy minimization or molecular simulation.

The 1990s witnessed other exciting technological developments. In 1991, Dr. Jan K. Labanowski, then an employee of the Ohio Supercomputer Center (Columbus, Ohio), launched an electronic bulletin board called the Computational Chemistry List (CCL). Computational chemists rapidly joined because it was free and an effective forum for informal exchange of information. Computational chemists at pharmaceutical companies were among the 2000 or so members who joined in the 1990s. Often these employees would take the time to answer questions from beginners, helping them learn about the field of computer-aided drug design. The CCL was a place where the relative merits of different methodologies and computers, and the pros and cons of various programming languages could be debated, sometimes passionately.

In 1991, MDL came out with a new embodiment of their compound management software called ISIS (Integrated Scientific Information System). Pharmaceutical companies upgraded to the new system, having become so dependent on MDL. In general, managers of information technology at pharmaceutical companies preferred one-stop solutions. On the other hand, computational chemists found Daylight Chemical Information Systems software more useful for developing new research applications.

MACCS and then ISIS gave researchers exceptional new tools for drug discovery when similarity searching came along. Chemical structures were stored in the database as connectivity tables (describing the atoms and which ones are connected by bonds). In addition, chemical structures could be stored as a series of on-off flags (“keys”) indicating the presence or absence of specific atoms or combinations of atoms and/or bonds. The similarity of compounds could be quantitated by the computer in terms of the percentage of keys that the compounds shared in common. Thus, if a researcher was aware of a lead structure from in-house work or the literature, it was possible to find compounds in the corporate database that were similar and then get these compounds assayed for biological activities. Thus the technique of data mining became important. It was fairly easy to find compounds with low levels of activity by this method depending on how large the database was. Some of these active compounds might have a skeleton different from the lead structure. The new skeleton could form the basis for subsequent lead optimization. As Dr. Yvonne C. Martin (Abbott) has wryly commented in her lectures at scientific meetings, one approach to drug discovery is to find a compound that the target receptor sees as the same as an established ligand but that a patent examiner sees as a different compound (and therefore satisfying the novelty requirement for patentability).
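
The key-based similarity search described above can be sketched in a few lines of Python. This is only an illustration, not the MACCS or ISIS implementation: the text says similarity was measured as the percentage of keys shared, and the sketch assumes the commonly used Tanimoto coefficient (shared keys divided by all keys set in either compound) as one way to express that. The key names, compound identifiers, and the 0.5 threshold are hypothetical.

```python
def tanimoto(keys_a, keys_b):
    """Shared keys divided by all keys set in either compound (0.0 to 1.0)."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    if not union:
        return 0.0  # two empty key sets: treat as no measurable similarity
    return len(a & b) / len(union)


def similar_compounds(lead_keys, database, threshold=0.5):
    """Return (compound_id, score) pairs at or above threshold, best first."""
    scored = ((cid, tanimoto(lead_keys, keys)) for cid, keys in database.items())
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical keys and compound IDs, for illustration only.
lead = {"aromatic_ring", "carboxylic_acid", "amide", "halogen"}
corporate_db = {
    "CMPD-001": {"aromatic_ring", "carboxylic_acid", "amide"},
    "CMPD-002": {"aromatic_ring", "halogen", "sulfonamide", "ether"},
    "CMPD-003": {"ester", "alkene"},
}

for compound_id, score in similar_compounds(lead, corporate_db):
    print(f"{compound_id}: {score:.2f}")
```

Lowering the threshold returns more distant compounds, which is where a hit with a different skeleton from the lead is more likely to turn up.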

Many or most of the results from data mining in industry went unpublished because the leads generated were potentially useful knowledge and because of the never-ending rush of high priority work. When a few academic researchers gained access to commercial data mining software and a compound database, the weakly active compounds that they found were excitedly published. This difference between industry and academia in handling similar kinds of results is a matter of priorities. In industry, the first priority is to find marketable products and get them out the door. In academia, the priority is to publish (especially in high-impact journals). Contrary to a common misconception, however, scientists in industry do publish, a point we return to later.

Software use for drug discovery and development can be classified in various ways. One way is technique based. Examples would be programs based on force fields or on statistical fitting (the latter including log P prediction and toxicity prediction). Another way to classify software is based on whether the algorithm can be applied to cases where the 3-D structure of the target receptor is known or not. An example of software useful when the receptor structure is not known is Catalyst.117 This program, which became available in the early 1990s, tried to generate a 3-D model of a pharmacophore based on a small set of compounds with a range of activities against a given target. The pharmacophore model, if determinable, could be used as a query to search databases of 3-D structures in an effort to find new potential ligands.

In fortuitous circumstances where the 3-D structure of the target receptor was known, three computational chemistry methodologies came into increased usage. One was docking, i.e., letting an algorithm try to fit a ligand structure into a receptor. Docking methodology dates back to the 1980s, but the 1990s saw more crystal structures of pharmaceutically relevant proteins being solved and used for ligand design.118 A second technique of the 1990s involved designing a computer algorithm to construct a ligand de novo inside a receptor structure. The program would assemble small molecular fragments or “grow” a chemical structure such that the electrostatic and steric attributes of the ligand would complement those of the receptor.119,120,121 The third technique of the 1990s was virtual screening.122,123 The computer would screen hypothetical ligand structures, not necessarily compounds actually in bottles, against the 3-D structure of a receptor in order to find those most likely to fit and therefore worthy of synthesis and experimentation.

Technologies for protein crystallography continued to improve. The use of computational chemistry software to refine “experimental” protein structures advanced. A paper by Brunger et al. went on to become one of the most highly cited papers in the 10-year period starting in 1995.124

A new approach to drug discovery came to prominence around 1993. The arrival of this approach was heralded with optimism reminiscent of earlier waves of new technologies. The proponents of this innovation – combinatorial chemistry – were organic chemists. Although rarely explicitly stated, the thinking behind combinatorial chemistry seemed to be as follows. The chance of finding a molecule with therapeutic value was extremely low (one in 5000 or one in 10,000 were rough estimates that were often bandied about). Attempts at rational drug design had not significantly improved the odds of finding those rare molecules that could become a pharmaceutical product. Because the low odds could not be beat, make tens of thousands, . . . no, hundreds of thousands, . . . no, millions of compounds! Then, figuratively fire a massive number of these molecular bullets at biological targets and hope that some might stick. New computer-controlled robotic machinery would permit synthesis of all these compounds much more economically than the traditional one-compound-at-a-time process of medicinal chemistry. Likewise, computer-controlled robotic machinery would automate the biological testing and reduce the cost per assay. Thus was introduced high-throughput screening (HTS) and ultra-HTS.


Proponents promised that use of combinatorial chemistry (combi-chem) and HTS was the way to fill the drug discovery pipeline with future pharmaceutical products. Pharmaceutical companies, encouraged by the advice of highly paid consultants from academia, made massive investments in people and infrastructure to set up the necessary equipment in the 1990s. The computers needed to run the equipment had to be programmed, and this work was done by instrument engineers, although chemists helped set up the systems that controlled the synthesis.

Combinatorial chemistry increased the rate of output of new compounds by three orders of magnitude. Before combi-chem came on the scene, a typical SAR at a pharmaceutical company might have consisted of fewer than a couple hundred compounds, and a massive effort involving 10–20 medicinal chemistry laboratories might have produced two or three thousand compounds over a number of years. In 1993, with traditional one-compound-at-a-time chemistry, it took one organic chemist on average one week to make one compound for biological testing. Some years later, with combi-chem a chemist could easily produce 2000 compounds per week.

With the arrival of combi-chem, computational chemists had a new task in addition to what they had been doing. Computational chemistry was needed so that the combinatorial chemistry was not mindlessly driven by whatever reagents were available in chemical catalogs or from other sources. Several needs were involved in library designs.125 At the beginning of a research project, the need would be to cover as much of “compound space” as possible, i.e., to produce a variety of structures to increase the likelihood that at least one of the compounds might stick to the target. (Although the terms chemical or compound space have been in use for a couple of years, formal definitions in the literature are hard to find. We regard compound space as the universe of chemically reasonable (energetically stable) combinations of atoms and bonds. A reaction involves the crossing of paths going from one set of points in chemical space to another set. A combinatorial library of compounds would be a subset of chemical space.) After the drug discovery researchers had gained a general idea of what structure(s) would bind to the target receptor, a second need arose: to design compounds similar to the lead(s). In other words, to pepper compound space around the lead in an effort to find a structure that would optimize biological activity. A third need was to assess the value of libraries being offered for sale by various outside intermediaries. Computational chemists could help determine whether these commercial libraries complemented or duplicated a company’s existing libraries of compounds and determine the degree of variety in the compounds being offered. Computational chemists could determine whether a proposed library would overlap a library previously synthesized. How does one describe chemical space and molecular similarity? Computational chemists had already developed the methodologies of the molecular descriptors and substructure keys, which we mentioned earlier.
With these tools, the computational chemist could discern where structures were situated in multidimensional compound or property space and provide advice to the medicinal chemists. (Each dimension of multidimensional space can be thought of as corresponding to a different descriptor or property.)
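
Two of the library-design questions above, whether a proposed library duplicates an existing one and how much variety it contains, can be illustrated with a minimal Python sketch. It is not drawn from any particular vendor's software. It assumes each compound has a unique identifier and a tuple of numeric descriptors (one per dimension of property space), and it uses the mean pairwise Euclidean distance as a simple stand-in for "degree of variety". The identifiers and descriptor values are hypothetical.

```python
from itertools import combinations
from math import dist


def overlap_fraction(proposed_ids, existing_ids):
    """Fraction of the proposed library already present in the existing one."""
    proposed = set(proposed_ids)
    if not proposed:
        return 0.0
    return len(proposed & set(existing_ids)) / len(proposed)


def mean_pairwise_distance(points):
    """Average Euclidean distance between all pairs of descriptor vectors."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # fewer than two compounds: no spread to measure
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical identifiers and two-descriptor vectors, for illustration only.
vendor_ids = {"A", "B", "C", "D"}
in_house_ids = {"C", "D", "E"}
print(overlap_fraction(vendor_ids, in_house_ids))

clustered = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
spread_out = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(mean_pairwise_distance(clustered) < mean_pairwise_distance(spread_out))
```

In practice the descriptors would need to be scaled to comparable ranges before distances are meaningful; the sketch leaves them unscaled for brevity.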

Along with all the data generated by combi-chem and HTS came the need to manage and analyze the data. Hence, computers and the science of informatics became increasingly vital. The need to visualize and learn from the massive quantities of data arising from both experimental and computational approaches to drug discovery led to development of specialized graphical analysis tools.

The computational chemist was now becoming more important to drug discovery research than ever before. Hence, by 1993–1994, these technological changes helped save the jobs of many computational chemists at a time when pharmaceutical companies in the United States were downsizing, as we now explain. The industry has been a favorite whipping boy of politicians for at least 40 years. In 1992–1993 an especially negative political force threatened the pharmaceutical industry in the United States. That force was the healthcare reform plan proposed by Hillary and Bill Clinton. Their vision of America was one that required more lawyers and regulators. Readers who are well versed in history of the 1930s will be aware of the economic system handed down from the fascist governments of pre-World War II Europe. Under that system, the means of production (industry) remains in private ownership but the prices that the companies can ask for their products are regulated by government. That was the scheme underlying the Clintons’ healthcare reform proposal. Pharmaceutical companies in the United States generally favored any proposal that would increase access to their products but feared this specific proposal because of the great uncertainty it cast over the status quo and future growth prospects. As a result, thousands of pharmaceutical workers – including research scientists – were laid off or encouraged to retire. Rumors swirled around inside each pharmaceutical company about who would be let go and who would retain their jobs. When word came down about the corporate decisions, the computational chemists were generally retained, but the ranks of the older medicinal chemists were thinned. A new generation of managers at pharmaceutical companies now realized that computer-assisted molecular design and library design were critical components of their company’s success.
One is reminded of the observation of the Nobel laureate physicist, Max Planck, “An important scientific innovation rarely makes its way by gradually winning over and converting its opponents. . .. What does happen is that its opponents gradually die out and the growing generation is familiarized with the idea from the beginning.”

Nevertheless, the Clintons’ healthcare reform scheme had a deleterious effect on the hiring of new computational chemists. The job market for computational chemists in the United States fell126 from a then record high in 1990 to a depression in 1992–1994. This happened because pharmaceutical companies were afraid to expand until they were sure that the business climate was once again hospitable for growth. The healthcare reform proposal was defeated in the United States Congress, but it took a year or two before pharmaceutical companies regained their confidence and started rebuilding their workforces.

Whereas, in the past, some companies hired employees with the intention of keeping them for an entire career, the employment situation became more fluid after the downsizing episode. Replacements were sometimes hired as contract workers rather than as full employees. This was especially true for information technology (IT) support people. Pharmaceutical companies hired these temporary computer scientists to maintain networks, PCs, and workstations used by the permanent employees. (Even more troubling to scientists at pharmaceutical companies was the outsourcing of jobs as a way to control R&D costs.)

Toward the mid-1990s, a new mode of delivering online content came to the fore: the Web browser. Information technology engineers and computational chemists helped set up intranets at pharmaceutical companies. This allowed easy distribution of management memos and other information to the employees. In addition, biological screening data could be posted on the intranet so that medicinal chemists could quickly access it electronically. Computational chemists made their applications (programs) Web-enabled so that medicinal chemists and others could perform calculations from their desktops.

The hardware situation continued to evolve. Personal computers became ever more powerful in terms of processor speed and hard drive capacity. The price of PCs continued to fall. Clusters of PCs were built. Use of the open-source Linux operating system spread in the 1990s. Distributed processing was developed so a long calculation could be farmed out to separate machines. Massively parallel processing was tried. All these changes meant that the days of the supercomputers were numbered.

Whereas the trend in the 1980s was toward dispersal of computing power to the departments and the individual user, the IT administrators started bringing the PCs under their centralized control in the 1990s. Software to monitor each machine was installed so that what each user did could be tracked. Gradually, computational chemists and other workers lost control over what could and could not be installed on their office machines. The same was true for another kind of hardware: the SGI workstations. These UNIX machines became more powerful and remained popular for molecular modeling through the 1990s. Silicon Graphics Inc. acquired the expiring Cray technology, but it did not seem to have much effect on their workstation business.

Traditionally, in pursuit of their structure-activity relationships, medicinal chemists had focused almost exclusively on finding compounds with greater and greater potency. However, these SARs often ended up with compounds that were unsuitable for development as pharmaceutical products. These compounds were not soluble enough in water, were not orally bioavailable, or were eliminated too quickly or too slowly from mammalian bodies. Pharmacologists and pharmaceutical development scientists for years had tried to preach the need for the medicinal chemists to think about these other factors that determined whether a compound could be a medicine. As has been enumerated elsewhere, there are many factors that determine whether a potent compound has what it takes to become a drug.127 Experimentally, it took a great deal of time to determine these other factors. Often, the necessary research resources would not be allocated to a compound until it had already been selected for project team status.

At the beginning of the 1990s, the factors beyond potency that are essential for a compound to become a pharmaceutical product were generally beyond the capability of computational chemistry methods to predict reliably. These factors include properties such as absorption, distribution, metabolism, elimination, and toxicity (ADME/Tox). However, as the decade unfolded, computational chemists and other scientists created new and better methodologies for helping the medicinal chemists and biologists to select compounds considering more of the characteristics necessary to become a drug. In 1997, Lipinski’s now-famous “Rule of Five” was published.128 These simple rules were easily encoded in database mining operations at every company, so that compounds with low prospects of becoming an orally active, small-molecule drug (e.g., having a molecular weight greater than 500 Da) could be weeded out by computer. Software vendors also incorporated these and other rules into their programs for sale to the pharmaceutical companies.
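
As an illustration of how easily such rules can be encoded, the following Python sketch applies the four commonly quoted Rule of Five criteria (molecular weight above 500 Da, calculated log P above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) to precomputed properties. It is a sketch under stated assumptions rather than any company's actual filter: the `Compound` record, the property values, and the convention of tolerating at most one violation are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Precomputed properties; the values used below are illustrative only."""
    name: str
    mol_weight: float   # Da
    log_p: float        # calculated octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(c):
    """Count how many of the four Rule of Five limits the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_rule_of_five(c, max_violations=1):
    """Keep compounds with no more than max_violations (assumed default: 1)."""
    return rule_of_five_violations(c) <= max_violations


# Hypothetical database entries.
library = [
    Compound("small_polar", mol_weight=180.2, log_p=1.2, h_donors=1, h_acceptors=4),
    Compound("large_greasy", mol_weight=720.9, log_p=6.3, h_donors=2, h_acceptors=9),
]

kept = [c.name for c in library if passes_rule_of_five(c)]
print(kept)
```

A stricter screen would set `max_violations=0`; which cutoff a given company used is not stated in the text.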

The computational methods used in the 1980s focused, like medicinal chemistry, on finding compounds with ever-higher affinity between the ligand and its target receptor. That is why in the past we have advocated use of the term computer-aided ligand design (CALD) rather than CADD.126,129 However, with increased attention to factors other than potency, the field was finally becoming more literally correct in calling itself CADD.

Another important change started in the mid-1990s. Traditionally, a QSAR determined at a pharmaceutical company might have involved only 5–30 compounds. The number depended on how many compounds the medicinal chemist had synthesized and submitted to testing by the biologists. Sometimes this size data set sufficed to reveal useful trends. In other cases, the QSARs were not very robust in terms of predictability. As large libraries of compounds were produced, data sets available for QSAR analysis became larger. With all that consistently produced (although not necessarily very accurately) biological data and a plethora of molecular descriptors, it was possible to find correlations with better predictability. In fact, QSAR proved to be one of the best approaches to providing assistance to the medicinal chemist in the 1990s. Computational chemists were inventive in creating new molecular descriptors; thousands have been described in the literature.130,131,132,133


As stated in the opening of this section, the 1990s witnessed the fruition of a number of drug design efforts. Making a new pharmaceutical product available to patients is a long, arduous, and costly enterprise. It takes 10–15 years from the time a compound is discovered in the laboratory until it is approved for physicians to prescribe. Hence, a molecule that reached the pharmacies in the 1990s was probably first synthesized at a pharmaceutical company well back in the 1980s. (Most of today’s widely prescribed medicines come from the pharmaceutical industry rather than from government or academic laboratories.) The improved methodologies of computational chemistry that became available in the 1980s should therefore have started to show their full impact in the 1990s. (Likewise, the improved experimental and computational methodologies of the 1990s, if they were as good as claimed, should be bearing fruit now in the early part of the 21st century.) Table 1 lists medicines whose discovery was aided in some way by computer-based methods.

Those compounds marked “CADD” were publicized in a series of earlier publications.134–140 The CADD successes were compiled in 1997 when we undertook a survey of the corresponding authors of papers published after 1993 in the prestigious Journal of Medicinal Chemistry. Corresponding authors were asked whether calculations were crucial to the discovery of any compounds from their laboratory. Of the hundreds of replies, we culled out all cases where calculations had not led to drug discovery or had been done post hoc on a clinical candidate or pharmaceutical product. We have always felt strongly that the term “computer-aided drug design” should be more than just doing a calculation; it should be providing information or ideas that directly help with the conception of a useful new structure. For the survey, we retained only those cases where the senior author of a paper (usually a medicinal chemist) vouched that computational chemistry had actually been critically important in the research that led to the discovery of a compound that had reached the market. As seen in Table 1, there were seven compounds meeting this criterion in the period 1994–1997. The computational techniques used to find these seven compounds included QSAR, ab initio molecular orbital calculations, molecular modeling, molecular shape analysis,141 docking, active analog approach,142 molecular mechanics, and SBDD.

More recently, a group in England led by a structural biologist compiled a list of marketed medicines that came from SBDD.143 These are labeled “SBDD” in Table 1. It can be seen that only a little overlap exists between the two compilations (CADD and SBDD). It can also be seen that the number of pharmaceuticals from SBDD is very impressive. Computer-based technologies are clearly making a difference in helping bring new medicines to patients. Often computational chemists had a role to play in fully exploiting the X-ray data.

Table 1 Marketed Pharmaceuticals Whose Discovery Was Aided by Computers

Generic name   Brand name   Marketed by       Year approved      Discovery     Activity
                                              in United States   assisted by
Norfloxacin    Noroxin      Merck             1983               QSAR          Antibacterial
Losartan       Cozaar       Merck             1994               CADD          Antihypertensive
Dorzolamide    Trusopt      Merck             1995               CADD/SBDD     Antiglaucoma
Ritonavir      Norvir       Abbott            1996               CADD          Antiviral
Indinavir      Crixivan     Merck             1996               CADD          Antiviral
Donepezil      Aricept      Eisai             1997               QSAR          Anti-Alzheimer’s
Zolmitriptan   Zomig        AstraZeneca       1997               CADD          Antimigraine
Nelfinavir     Viracept     Pfizer            1997               SBDD          Antiviral
Amprenavir     Agenerase    GlaxoSmithKline   1999               SBDD          Antiviral
Zanamivir      Relenza      GlaxoSmithKline   1999               SBDD          Antiviral
Oseltamivir    Tamiflu      Roche             1999               SBDD          Antiviral
Lopinavir      Aluviran     Abbott            2000               SBDD          Antiviral
Imatinib       Gleevec      Novartis          2001               SBDD          Antineoplastic
Erlotinib      Tarceva      OSI               2004               SBDD          Antineoplastic

Looking at the success stories, we see that it has often been a team of researchers working closely together that led to the success. It took quite a while for other members of the drug discovery research community to appreciate what computational chemistry could provide. Even today there remains room for further improvement in this regard. Computational chemistry is probably most effective when researchers work in an environment where credit is shared.144

Research management must try to balance, usually by trial and error, the opposing styles of encouraging competition among co-workers or encouraging cooperation in order to find what produces the best results from a given set of team members working on a given project. If management adopts a system whereby company scientists are competing with each other, some scientists may strive harder to succeed but collaborations become tempered and information flows less freely. On the other hand, if all members of an interdisciplinary team of scientists will benefit when the team succeeds, then collaboration increases, synergies can occur, and the team is more likely to succeed. Sometimes it helps to put the computational chemistry techniques in the hands of the medicinal chemists, but it seems that only some of these chemists have the time and inclination to use the techniques to best advantage. Therefore, computational chemistry experts play an important role in maximizing drug discovery efforts.

FINAL OBSERVATIONS

Computers are so ubiquitous in pharmaceutical research and development today that it may be hard to imagine a time when they were not available to assist the medicinal chemist or biologist. The notion of a computer on the desk of every scientist and company manager was rarely contemplated a quarter century ago. Now, computers are essential for generating, managing, and transmitting information.

Over the last four decades, we have witnessed waves of new technologies sweep over the pharmaceutical industry. Sometimes these technologies tended to be oversold at the beginning and turned out to not be a panacea to meet the quota of the number of new chemical entities that each company would like to launch each year. Computer hardware has been constantly improving. Experience has shown that computer technology so pervasive at one point in time can almost disappear 10 years later. In terms of software, the early crude methods for studying molecular properties have given way to much more powerful, better suited methods for discovering drugs.

The data in Figure 3 attempt to summarize and illustrate what we have tried to describe about the history of computing at pharmaceutical companies over the last four decades. We plot the annual number of papers published [and abstracted by Chemical Abstracts Service (CAS)] for each year from 1964 through 2005. These are papers that were indexed by CAS as pertaining to “computer or calculation” and that came from pharmaceutical companies. Initially, we had wanted to structure our SciFinder Scholar145 search for all papers using terms pertaining to computational chemistry, molecular modeling, computer-aided drug design, quantitative structure-activity relationships, and so forth. However, CAS classifies these terms as overlapping concepts, and so SciFinder Scholar was unable to do the searches as desired. Searching on “computer or calculation” yields many relevant hits but also a nontrivial number of papers that are of questionable relevance. This contamination stems from the subjective way abstractors at CAS have indexed articles over the years. The irrelevant papers introduce noise in the data, so we want to focus on the qualitative trend of the top curve, which represents the sum of papers by all 64 companies covered in the survey.

Figure 3 Annual number of papers published by researchers at pharmaceutical companies during a 42-year period. The data were obtained by searching the CAPLUS and MEDLINE databases for papers related to computer or calculation. Then these hits were refined with SciFinder Scholar (v. 2004) by searching separately for 64 different company names. Well-known companies from around the world were included. Many of these companies are members of the Pharmaceutical Research and Manufacturers of America (PhRMA). The companies are headquartered in the United States, Switzerland, Germany, and Japan. The names of companies with more than 150 total hits in the period 1964–2004 are shown in the box. The indexing by CAS is such that a search on SmithKline Beecham gave the same number of hits as for GlaxoSmithKline (GSK) but much more than for Smith Kline and French. Searches on Parke-Davis and Warner Lambert give the same total number of hits. The top 10 companies for producing research publications relevant to computers and calculations are, in rank order, GlaxoSmithKline, Bayer, Merck, BASF, Lilly, Upjohn, Pfizer, Hoffmann-La Roche, Hoechst, and Ciba-Geigy. Some companies in the plot ceased having papers to publish simply because they were acquired by other pharmaceutical companies, and hence the affiliation of the authors changed. In the category marked “Other” are 40 other pharmaceutical and biopharmaceutical companies. These mostly smaller companies had fewer than 150 papers in the 42-year period. The CAPLUS database covers 1500 journal titles. This plot is easier to see in color, but for this grayscale reproduction we note that the order of companies in the legend is the same as the order of layers in the chart.

Figure 3 shows that industrial scientists do publish quite a bit. The total number of publications started off very low and increased slowly and erratically from 1964 until 1982. From 1982 to 1993, the annual number of papers grew dramatically and monotonically. This period is when the superminicomputers, supercomputers, and workstations appeared on the scene. After 1993, the total number of papers published each year by all companies in our analysis continued growing but more slowly. The number peaked at more than 600 papers per year in 2001. Curiously, the last few years show a slight decline in the number of papers published, although the number is still very high. Perhaps in recent years more proprietary research has been occupying the attention of computational chemists in the pharmaceutical industry.

Although we have no way of knowing the total number of computational chemists employed in the pharmaceutical industry during each year for the last 40 years, it is possible that this number follows a curve similar to that for the total number of papers plotted in Figure 3.

As the twentieth century came to a close, the job market for computational chemists had recovered from the 1992–1994 debacle. In fact, demand for computational chemists leaped to new highs each year in the second half of the 1990s.146 Most of the new jobs were in industry, and most of these industrial jobs were at pharmaceutical or biopharmaceutical companies. As we noted at the beginning of this chapter, in 1960 there were essentially no computational chemists in industry. But 40 years later, perhaps well over half of all computational chemists are working in pharmaceutical laboratories. The outlook for computational chemistry is therefore very much linked to the health of the pharmaceutical industry. Forces that adversely affect pharmaceutical companies will have a negative effect on the scientists who work there as well as at auxiliary companies such as software vendors that develop programs and databases for use in drug discovery and development.

Discovering new medicines is a serious, extremely difficult, and expensive undertaking. Tens of thousands of scientists are employed in this activity. Back in 1980, pharmaceutical and biotechnology companies operating in the United States invested in aggregate about US$2 × 10⁹ in R&D. The sum has steadily increased (although there was the slight pause in 1994 that we mentioned earlier). By 2003, investment in pharmaceutical R&D had grown to $34.5 × 10⁹, and it increased further to $38 × 10⁹ in 2004 and $39.4 × 10⁹ in 2005.147 Drug discovery is risky in the sense that there is no guarantee that millions of dollars invested in a project will pay off. Currently, it may cost as much as $1.2 × 10⁹ on average to discover a new pharmaceutical product and bring it to the market.
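As a rough worked example, the growth implied by the R&D figures quoted above can be expressed as a compound annual rate. The sketch uses only the 1980 and 2005 values from the text and treats them as nominal dollars with no inflation adjustment, which is an assumption; the helper name is illustrative.

```python
def compound_annual_growth(start_value, end_value, years):
    """Constant yearly growth rate that carries start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Aggregate U.S. pharmaceutical R&D investment quoted in the text (US$).
rd_1980 = 2.0e9
rd_2005 = 39.4e9

growth = compound_annual_growth(rd_1980, rd_2005, 2005 - 1980)
print(f"Implied average growth: {growth:.1%} per year")
```

On these numbers the implied growth is on the order of 12 to 13 percent per year, sustained over a quarter century.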

Prior to the 1990s, the majority of new chemical entities (NCEs) were coming from pharmaceutical companies in Europe. However, as the European governmental units over-regulated the industry and discouraged innovation, pharmaceutical research activity slowed in Europe and moved to the United States.148 Many of the outstanding computational chemists in Europe immigrated to the United States where opportunities for pharmaceutical discovery were more exciting. Today the United States pharmaceutical industry invests far more in discovering new and better therapies than the pharmaceutical industry in any other country or any government in the world. Because of its capitalistic business climate, the United States has the most productive pharmaceutical industry in the world.

Despite the ever increasing investment in R&D each year, the annual number of NCEs approved for marketing in the United States (or elsewhere) has not shown any overall increase in the last 25 years. In the last two decades, the number of NCEs has fluctuated between 20 and 60 per year. The annual number of NCEs peaked in the late 1990s and was only 20 in 2005. Before the late 1990s, this very uncomfortable fact134 was not widely discussed by either research scientists or corporate executives, but it is now mentioned frequently. Scientists did not want to bring attention to their low success rate; executives did not want to alarm their stakeholders.

The dearth of NCEs is depicted in Figures 4 and 5, which serve to illustrate that combinatorial chemistry generates many more compounds going into the drug discovery pipeline, but the number of new drugs coming out the funnel has not improved. This comparison demonstrates how difficult it has become to discover useful new pharmaceutical products. Also, it demonstrates that combi-chem, like other new technologies, was probably oversold by the organic chemists.
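The arithmetic behind the two funnels can be made explicit with a short sketch. The stage counts are those shown in Figures 4 and 5; the variable and function names are illustrative only.

```python
# Stage counts as read from Figure 4 (before combinatorial chemistry).
FUNNEL = [
    ("synthesized and screened", 10_000),
    ("extended biological testing", 1_000),
    ("toxicological testing", 100),
    ("clinic", 10),
    ("market", 1),
]


def pass_rates(stages):
    """Fraction of compounds surviving each transition between stages."""
    counts = [count for _, count in stages]
    return [after / before for before, after in zip(counts, counts[1:])]


def overall_yield(stages):
    """Marketed products per compound entering the top of the funnel."""
    return stages[-1][1] / stages[0][1]


rates = pass_rates(FUNNEL)
classic_yield = overall_yield(FUNNEL)

# Figure 5 changes only the top of the funnel: 100 000 to 1 000 000 compounds,
# still ending in a single marketed product.
combi_yields = [1 / n for n in (100_000, 1_000_000)]

print("stage pass rates:", rates)
print("overall yield before combi-chem:", classic_yield)
print("overall yield with combi-chem:", combi_yields)
```

On these figures the yield per compound synthesized falls by a factor of 10 to 100 while the output remains one product, which is the comparison the two figures are meant to convey.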

A recent study of companies in various businesses found that there is no direct relationship between R&D spending and common measures of corporate success such as growth or profitability.149 The number of patents also does not correlate with sales growth. Superior research productivity was ascribed to the quality of an organization’s innovative capabilities. In other words, the ability and luck to identify worthwhile areas of research is more important than the number of dollars spent on research.

An analysis of NCE data attempted to reach the conclusion that innovation is bringing to market drugs with substantial advantage over existing treatments.150 However, deciding whether R&D is becoming more productive depends on how the NCE data is handled. Generally, as recognized by most people in the field, the NCEs are not as numerous as one would like.151,152


In an attempt to boost NCE output, executives at pharmaceutical companies have put their researchers under extreme pressure to focus and produce. Since the early 1990s, this pressure has moved in only one direction: up.

Those pharmaceutical companies with scientists who are best at creating and using tools will be able to innovate their way to the future. With combinatorial chemistry, high-throughput screening, genomics, structural biology, and informatics firmly embedded in modern drug discovery efforts, computational chemistry is indispensable. Modern drug discovery involves inter- and intra-disciplinary teamwork. To succeed, highly specialized chemists and biologists must collaborate synergistically.

For a computational chemistry group to succeed, they need to be led by knowledgeable proponents with an understanding of the need to align the group’s expertise with corporate goals. Groups that have been led by people too academically oriented and who rate success mainly in academic terms have not helped their companies to remain viable.

All musical composers work with the same set of notes, but the geniuses put the notes together in an extraordinarily beautiful way. Synthetic chemists all have available to them the same elements. The successful medicinal chemist will combine atoms such that amazing therapeutic effect is achieved with the resulting molecule. Computational chemistry has become important in the pharmaceutical industry because it can provide a harmonious accompaniment to medicinal chemistry. The computational chemist’s goal should be to help the medicinal chemist by providing information about structural and electronic requirements to enhance activity, for example, information about which regions of compound space are most propitious for exploration.

Figure 4 Before the arrival of combinatorial chemistry and high-throughput screening, pharmaceutical scientists had to investigate on average 10,000 compounds to find one compound that was good enough to become a pharmaceutical product.
[Funnel diagram, top to bottom: 10 000 compounds synthesized and screened; 1 000 compounds advance to extended biological testing; 100 compounds advance to toxicological testing; 10 compounds advance to clinic; 1 compound advances to market.]

Fortunately, the effort that goes into pharmaceutical R&D does benefit society. In nations where modern medicines are available, life expectancy has increased and disability rates among the elderly have declined. Considering all of the things that can go wrong with the human body, many challenges remain for the pharmaceutical researcher. Hopefully, this review will inspire some young readers to take up the challenge and join the noble quest to apply science to help find cures to improve people’s lives.

Figure 5 With combinatorial chemistry and high-throughput screening deployed, pharmaceutical scientists investigate many more compounds but still find on average only one compound that is good enough to become a pharmaceutical product.
[Funnel diagram, top to bottom: 100 000 – 1 000 000 compounds synthesized and screened; 1 000 compounds advance to extended biological testing; 100 compounds advance to toxicological testing; 10 compounds advance to clinic; 1 compound advances to market.]

ACKNOWLEDGMENTS

We are grateful for the privilege to contribute this historical perspective. We thank Dr. Kenneth B. Lipkowitz for his usual vigorous editorial improvements to this chapter. We thank our many colleagues over the years for what they taught us or tried to teach us. In particular, we acknowledge Dr. Max M. Marsh who was one of the very first people in the pharmaceutical industry to recognize the future potential of computer-aided drug design. Dr. Marsh, who started out as an analytical chemist, served 42 years at Eli Lilly and Company before retiring in 1986. It was generally recognized that Lilly was a family-oriented company committed to doing what was right in all phases of its business (the company’s early motto was “If it bears a red Lilly, it’s right”), and there was great mutual loyalty between the company and the employees. Dr. Marsh epitomized these traditions of company culture. A better mentor and more gentlemanly person is hard to imagine. The research effort that he initiated in the early 1960s is still in operation today and of course is now much larger. During part of a 25-year career at the Lilly Research Laboratories of Eli Lilly and Company, the author had the privilege to work with Dr. Marsh. The author would also like to mention Dr. Roger G. Harrison, who started out as an organic chemist working for Eli Lilly and Company in England. He was one of the new generation of managers who appreciated what computational chemistry could contribute to drug discovery research and set in place a climate to maximize its potential. The author also thanks Prof. Norman L. Allinger, Mr. Douglas M. Boyd, Mrs. Joanne H. Boyd, Dr. David K. Clawson, Dr. Richard D. Cramer III, Dr. David A. Demeter, Mr. Gregory L. Durst, Mrs. Susanne B. Esquivel, Dr. Richard W. Harper, Dr. Robert B. Hermann, Dr. Anton J. Hopfinger, Dr. Stephen W. Kaldor, Mrs. Cynthia B. Leber, Dr. Yvonne C. Martin, Dr. Samuel A. F. Milosevich, Dr. James J. P. Stewart, and Dr. Terry R. Stouch for aid as this review was being written. Creation of this review was also assisted by the computer resources of SciFinder Scholar, Google, and Wikipedia.

REFERENCES

1. The organic, inorganic, and physical chemistry courses that the author took in graduate school at Harvard University were so permeated with quantum mechanics that he chose a research project in this field. The project involved molecular orbital calculations on some well-known biomolecules. This interest was further developed in a postdoctoral position at Cornell University, so it was natural that his career path led to the pharmaceutical industry. The author joined the drug discovery efforts at the Lilly Research Laboratories of Eli Lilly and Company in Indianapolis in 1968. After a satisfying career of 25 years at the company, he became a research professor at his present affiliation.

2. E. Fischer, Ber. Dtsch. Chem. Ges., 27, 2985–2993 (1894). Einfluß der Konfiguration auf die Wirkung der Enzymen.

3. R. B. Silverman, The Organic Chemistry of Drug Design and Drug Action, Academic Press, San Diego, CA, 1992.

4. A. Messiah, Quantum Mechanics, Vol. I, (Translated from the French by G. M. Temmer), Wiley, New York, 1966.

5. B. Pullman and A. Pullman, Quantum Biochemistry, Interscience Publishers, Wiley, New York, 1963.

6. A. Crum Brown and T. R. Fraser, Trans. Roy. Soc. Edinburgh, 25, 151–203 (1869). On the Connection between Chemical Constitution and Physiological Action. Part I. On the Physiological Action of the Salts of the Ammonium Bases, Derived from Strychnia, Brucia, Thebaia, Codeia, Morphia, and Nicoti.

7. A. Crum Brown and T. R. Fraser, Trans. Roy. Soc. Edinburgh, 25, 693–739 (1869). On the Connection between Chemical Constitution and Physiological Action. Part II. On the Physiological Action of the Ammonium Bases Derived from Atropia and Conia.

8. T. C. Bruice, N. Kharasch, and R. J. Winzler, Arch. Biochem. Biophys., 62, 305–317 (1956). A Correlation of Thyroxine-like Activity and Chemical Structure.

9. R. Zahradnik, Experientia, 18, 534–536 (1962). Correlation of the Biological Activity of Organic Compounds by Means of the Linear Free Energy Relations.

10. C. Hansch and T. Fujita, J. Am. Chem. Soc., 86, 1616–1626 (1964). ρ-σ-π Analysis; Method for the Correlation of Biological Activity and Chemical Structure.

11. S. M. Free Jr. and J. W. Wilson, J. Med. Chem., 7, 395–399 (1964). A Mathematical Contribution to Structure-Activity Studies.

12. K. B. Lipkowitz and D. B. Boyd, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2001, Vol. 17, pp. 255–357. Books Published on the Topics of Computational Chemistry.

13. J. D. Bolcer and R. B. Hermann, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1994, Vol. 5, pp. 1–63. The Development of Computational Chemistry in the United States.

14. S. J. Smith and B. T. Sutcliffe, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH Publishers, New York, 1997, Vol. 10, pp. 271–316. The Development of Computational Chemistry in the United Kingdom.

15. R. J. Boyd, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 15, pp. 213–299. The Development of Computational Chemistry in Canada.

16. J.-L. Rivail and B. Maigret, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1998, Vol. 12, pp. 367–380. Computational Chemistry in France: A Historical Survey.

17. S. D. Peyerimhoff, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, Hoboken, NJ, 2002, Vol. 18, pp. 257–291. The Development of Computational Chemistry in Germany.

18. K. B. Lipkowitz and D. B. Boyd, Eds., in Reviews in Computational Chemistry, Wiley-VCH, New York, 2000, Vol. 15, pp. v–xi. A Tribute to the Halcyon Days of QCPE.

19. C. K. Johnson, Crystallographic Computing, Proceedings of the International Summer School on Crystallographic Computing, Carleton University, Ottawa, Canada, Aug. 1969, Crystallographic Computing, F. R. Ahmed, Ed., Munksgaard, Copenhagen, Denmark, 1970, pp. 227–230. Drawing Crystal Structures by Computer.

20. L. E. Sutton, D. G. Jenkin, A. D. Mitchell, and L. C. Cross, Eds., Tables of Interatomic Distances and Configuration in Molecules and Ions, Special Publ. No. 11, The Chemical Society, London, 1958.

21. W. L. Koltun, Biopolymers, 3, 665–679 (1965). Precision Space-Filling Atomic Models.

22. D. B. Boyd, J. Chem. Educ., 53, 483–488 (1976). Space-Filling Molecular Models of Four-Membered Rings. Three-Dimensional Aspects in the Design of Penicillin and Cephalosporin Antibiotics.

23. R. Hoffmann and W. N. Lipscomb, J. Chem. Phys., 36, 2179–2189 (1962). Theory of Polyhedral Molecules. I. Physical Factorizations of the Secular Equation.

24. R. Hoffmann, J. Chem. Phys., 39, 1397–1412 (1963). An Extended Hückel Theory. I. Hydrocarbons.

25. J. A. Pople and G. A. Segal, J. Chem. Phys., 43, S136–S149 (1965). Approximate Self-Consistent Molecular Orbital Theory. II. Calculations with Complete Neglect of Differential Overlap.

26. J. A. Pople and D. L. Beveridge, Approximate Molecular Orbital Theory, McGraw-Hill, New York, 1970.


27. D. R. Hartree, Proc. Cambridge Phil. Soc., 24, 89–110 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. I. Theory and Methods.

28. D. R. Hartree, Proc. Cambridge Phil. Soc., 24, 111–132 (1928). The Wave Mechanics of an Atom with a Non-Coulomb Central Field. II. Some Results and Discussion.

29. D. R. Hartree, Proc. Cambridge Phil. Soc., 24 (Pt. 3), 426–437 (1928). Wave Mechanics of an Atom with a Non-Coulomb Central Field. III. Term Values and Intensities in Series in Optical Spectra.

30. V. Fock, Zeitschrift für Physik, 62, 795–805 (1930). “Self-Consistent Field” with Interchange for Sodium.

31. J. D. Roberts, Notes on Molecular Orbital Calculations, Benjamin, New York, 1962.

32. W. B. Neely, Mol. Pharmacol., 1, 137–144 (1965). The Use of Molecular Orbital Calculations as an Aid to Correlate the Structure and Activity of Cholinesterase Inhibitors.

33. R. S. Schnaare and A. N. Martin, J. Pharmaceut. Sci., 54, 1707–1713 (1965). Quantum Chemistry in Drug Design.

34. R. G. Parr, Quantum Theory of Molecular Electronic Structure, Benjamin, New York, 1963.

35. L. P. Hammett, Physical Organic Chemistry; Reaction Rates, Equilibria, and Mechanisms, 2nd ed., McGraw-Hill, New York, 1970.

36. R. W. Taft Jr., J. Am. Chem. Soc., 74, 2729–2732 (1952). Linear Free-Energy Relationships from Rates of Esterification and Hydrolysis of Aliphatic and Ortho-Substituted Benzoate Esters.

37. E. S. Gould, Mechanism and Structure in Organic Chemistry, Holt Reinhart Winston, New York, 1959.

38. R. B. Hermann, J. Antibiot., 26, 223–227 (1973). Structure-Activity Correlations in the Cephalosporin C Series Using Extended Hückel Theory and CNDO/2.

39. D. B. Boyd, R. B. Hermann, D. E. Presti, and M. M. Marsh, J. Med. Chem., 18, 408–417 (1975). Electronic Structures of Cephalosporins and Penicillins. 4. Modeling Acylation by the Beta-Lactam Ring.

40. D. B. Boyd, D. K. Herron, W. H. W. Lunn, and W. A. Spitzer, J. Am. Chem. Soc., 102, 1812–1814 (1980). Parabolic Relationships between Antibacterial Activity of Cephalosporins and Beta-Lactam Reactivity Predicted from Molecular Orbital Calculations.

41. D. B. Boyd, in The Amide Linkage: Structural Significance in Chemistry, Biochemistry, and Materials Science, A. Greenberg, C. M. Breneman, and J. F. Liebman, Eds., Wiley, New York, 2000, pp. 337–375. Beta-Lactam Antibacterial Agents: Computational Chemistry Investigations.

42. E. J. Corey, W. T. Wipke, R. D. Cramer III, and W. J. Howe, J. Am. Chem. Soc., 94, 421–430 (1972). Computer-Assisted Synthetic Analysis. Facile Man-Machine Communication of Chemical Structure by Interactive Computer Graphics.

43. E. J. Corey, W. T. Wipke, R. D. Cramer III, and W. J. Howe, J. Am. Chem. Soc., 94, 431–439 (1972). Techniques for Perception by a Computer of Synthetically Significant Structural Features in Complex Molecules.

44. W. T. Wipke and P. Gund, J. Am. Chem. Soc., 96, 299–301 (1974). Congestion. Conformation-Dependent Measure of Steric Environment. Derivation and Application in Stereoselective Addition to Unsaturated Carbon.

45. W. J. Hehre, W. A. Lathan, R. Ditchfield, M. D. Newton, and J. A. Pople, QCPE, 11, 236 (1973). GAUSSIAN 70: Ab Initio SCF-MO Calculations on Organic Molecules.

46. W. J. Hehre, L. Radom, P. v. R. Schleyer, and J. A. Pople, Ab Initio Molecular Orbital Theory, Wiley-Interscience, New York, 1986, p. 44.

47. R. C. Bingham, M. J. S. Dewar, and D. H. Lo, J. Am. Chem. Soc., 97, 1285–1293 (1975). Ground States of Molecules. XXV. MINDO/3. Improved Version of the MINDO Semiempirical SCF-MO Method.

48. P. O. Löwdin, Ed., Int. J. Quantum Chem., Quantum Biol. Symp. No. 1, Proceedings of the International Symposium on Quantum Biology and Quantum Pharmacology, Held at Sanibel Island, Florida, January 17–19, 1974, Wiley, New York, 1974.

49. W. G. Richards, Quantum Pharmacology, Butterworths, London, UK, 1977.

50. E. C. Olson and R. E. Christoffersen, Eds., Computer-Assisted Drug Design, Based on a Symposium Sponsored by the Divisions of Computers in Chemistry and Medicinal Chemistry at the ACS/CSJ Chemical Congress, Honolulu, Hawaii, April 2–6, 1979, ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979.

51. N. L. Allinger, M. A. Miller, F. A. Van Catledge, and J. A. Hirsch, J. Am. Chem. Soc., 89, 4345–4357 (1967). Conformational Analysis. LVII. The Calculation of the Conformational Structures of Hydrocarbons by the Westheimer-Hendrickson-Wiberg Method.

52. R. Gygax, J. Wirz, J. T. Sprague, and N. L. Allinger, Helv. Chim. Acta., 60, 2522–2529 (1977). Electronic Structure and Photophysical Properties of Planar Conjugated Hydrocarbons with a 4n-Membered Ring. Part III. Conjugative Stabilization in an “Antiaromatic” System: The Conformational Mobility of 1,5-Bisdehydro[12]annulene.

53. F. H. Westheimer and J. E. Mayer, J. Chem. Phys., 14, 733–738 (1946). The Theory of the Racemization of Optically Active Derivatives of Biphenyl.

54. J. B. Hendrickson, J. Am. Chem. Soc., 83, 4537–4547 (1961). Molecular Geometry. I. Machine Computation of the Common Rings.

55. D. B. Boyd and K. B. Lipkowitz, J. Chem. Educ., 59, 269–274 (1982). Molecular Mechanics. The Method and Its Underlying Philosophy.

56. D. B. Boyd and K. B. Lipkowitz, Eds., in Reviews in Computational Chemistry, VCH, New York, 1990, Vol. 1, pp. vii–xii. Preface on the Meaning and Scope of Computational Chemistry.

57. P. Gund, E. J. J. Grabowski, G. M. Smith, J. D. Andose, J. B. Rhodes, and W. T. Wipke, in Computer-Assisted Drug Design. E. C. Olson and R. E. Christoffersen, Eds., ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979, pp. 526–551.

58. F. H. Allen, S. Bellard, M. D. Brice, B. A. Cartwright, A. Doubleday, H. Higgs, T. Hummelink, B. G. Hummelink-Peter, O. Kennard, W. D. S. Motherwell, J. R. Rodgers, and D. G. Watson, Acta Crystallogr., Sect. B, B35, 2331–2339 (1979). The Cambridge Crystallographic Data Centre: Computer-based Search, Retrieval, Analysis and Display of Information.

59. F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer Jr., M. D. Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol., 112, 535–542 (1977). The Protein Data Bank: A Computer-Based Archival File for Macromolecular Structures.

60. Y. C. Martin, Quantitative Drug Design. A Critical Introduction, Dekker, New York, 1978.

61. C. Hansch and A. Leo, Substituent Constants for Correlation Analysis in Chemistry and Biology, Wiley, New York, 1979.

62. P. N. Craig, C. H. Hansch, J. W. McFarland, Y. C. Martin, W. P. Purcell, and R. Zahradnik, J. Med. Chem., 14, 447 (1971). Minimal Statistical Data for Structure-Function Correlations.

63. L. G. Humber, A. Albert, E. Campaigne, J. F. Cavalla, N. Anand, M. Provita, A. I. Rachlin, and P. Sensi, Information Bulletin Number 49, International Union of Pure and Applied Chemistry, Oxford, UK, March 1975. “Predicted” Compounds with “Alleged” Biological Activities from Analyses of Structure-Activity Relationships: Implications for Medicinal Chemists.

64. E. J. Corey, A. K. Long, and S. D. Rubenstein, Science, 228, 408–418 (1985). Computer-Assisted Analysis in Organic Synthesis.

65. S. D. Rubenstein, Abstracts of Papers, 228th ACS National Meeting, Philadelphia, PA, August 22–26, 2004, CINF-054. Electronic Documents in Chemistry, from ChemDraw 1.0 to Present.

66. H. E. Helson, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1999, Vol. 13, pp. 313–398. Structure Diagram Generation.


67. T. Monmaney, Smithsonian, 36 (8), 48 (2005). Robert Langridge: His Quest to Peer into theEssence of Life No Longer Seems So Strange.

68. R. Langridge, Chemistry & Industry (London), 12, 475–477 (1980). Computer Graphics in Studies of Molecular Interactions.

69. R. Langridge, T. E. Ferrin, I. D. Kuntz, and M. L. Connolly, Science, 211, 661–666 (1981). Real-Time Color Graphics in Studies of Molecular Interactions.

70. J. G. Vinter, Chemistry in Britain, 21 (1), 32, 33–35, 37–38 (1985). Molecular Graphics for the Medicinal Chemist.

71. D. B. Boyd, in Ullmann’s Encyclopedia of Industrial Chemistry, 7th edition, Wiley-VCH, Weinheim, Germany, 2006. Molecular Modeling - Industrial Relevance and Applications.

72. P. Gund, J. D. Andose, J. B. Rhodes, and G. M. Smith, Science, 208, 1425–1431 (1980). Three-Dimensional Molecular Modeling and Drug Design.

73. D. B. Boyd and M. M. Marsh, Abstracts of Papers, 183rd National Meeting of the American Chemical Society, Las Vegas, Nevada, March 28 - April 2, 1982. Computational Chemistry in the Design of Biologically Active Molecules at Lilly.

74. D. B. Boyd, Quantum Chemistry Program Exchange (QCPE) Bulletin, 5, 85–91 (1985). Profile of Computer-Assisted Molecular Design in Industry.

75. K. B. Lipkowitz and D. B. Boyd, Eds., in Reviews in Computational Chemistry, Wiley-VCH, New York, 1998, Vol. 12, pp. v–xiii. Improved Job Market for Computational Chemists.

76. G. R. Marshall, C. D. Barry, H. E. Bosshard, R. A. Dammkoehler, and D. A. Dunn, in Computer-Assisted Drug Design. E. C. Olson and R. E. Christoffersen, Eds., ACS Symposium Series 112, American Chemical Society, Washington, DC, 1979, pp. 205–226. The Conformational Parameter in Drug Design: The Active Analog Approach. Computer-Assisted Drug Design.

77. Y. C. Martin, M. G. Bures, and P. Willett, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York, 1990, Vol. 1, pp. 213–263. Searching Databases of Three-Dimensional Structures.

78. G. Grethe and T. E. Moock, J. Chem. Inf. Comput. Sci., 30, 511–520 (1990). Similarity Searching in REACCS. A New Tool for the Synthetic Chemist.

79. M. J. S. Dewar, E. F. Healy, and J. J. P. Stewart, J. Chem. Soc., Faraday Trans. 2: Mol. Chem. Phys., 80, 227–233 (1984). Location of Transition States in Reaction Mechanisms.

80. J. J. P. Stewart, J. Computer-Aided Mol. Des., 4, 1–105 (1990). MOPAC: A Semiempirical Molecular Orbital Program.

81. J. J. P. Stewart, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., VCH, New York, 1990, Vol. 1, pp. 45–81. Semiempirical Molecular Orbital Methods.

82. J. J. P. Stewart, QCPE, 11, 455 (1983). MOPAC: A Semiempirical Molecular Orbital Program.

83. N. L. Allinger, J. Am. Chem. Soc., 99, 8127–8134 (1977). Conformational Analysis. 130. MM2. A Hydrocarbon Force Field Utilizing V1 and V2 Torsional Terms.

84. U. Burkert and N. L. Allinger, Molecular Mechanics, ACS Monograph 177, American Chemical Society, Washington, DC, 1982.

85. A. J. Leo, J. Pharmaceut. Sci., 76, 166–168 (1987). Some Advantages of Calculating Octanol-Water Partition Coefficients.

86. A. J. Leo, Methods Enzymol., 202, 544–591 (1991). Hydrophobic Parameter: Measurement and Calculation.

87. A. J. Leo, Chem. Rev., 93, 1281–1306 (1993). Calculating log Poct from Structures.

88. K. Enslein, Pharmacol. Rev., 36 (2, Suppl.), 131–135 (1984). Estimation of Toxicological Endpoints by Structure-Activity Relationships.

89. K. Enslein, Toxicol. Industrial Health, 4, 479–498 (1988). An Overview of Structure-Activity Relationships as an Alternative to Testing in Animals for Carcinogenicity, Mutagenicity, Dermal and Eye Irritation, and Acute Oral Toxicity.

90. D. B. Boyd, J. Med. Chem., 36, 1443–1449 (1993). Application of the Hypersurface Iterative Projection Method to Bicyclic Pyrazolidinone Antibacterial Agent.

91. D. A. Matthews, R. A. Alden, J. T. Bolin, D. J. Filman, S. T. Freer, R. Hamlin, W. G. Hol, R. L. Kisliuk, E. J. Pastore, L. T. Plante, N. Xuong, and J. Kraut, J. Biol. Chem., 253, 6946–6954 (1978). Dihydrofolate Reductase from Lactobacillus casei. X-Ray Structure of the Enzyme Methotrexate-NADPH Complex.

92. D. A. Matthews, R. A. Alden, S. T. Freer, H. X. Nguyen, and J. Kraut, J. Biol. Chem., 254, 4144–4151 (1979). Dihydrofolate Reductase from Lactobacillus casei. Stereochemistry of NADPH Binding.

93. A. J. Everett, in Topics in Medicinal Chemistry, P. R. Leeming, Ed., Proceedings of the 4th SCI-RSC Medicinal Chemistry Symposium, Cambridge, UK, Sept. 6–9, 1987, Special Publication 65, Royal Society of Chemistry, London, 1988, pp. 314–331. Computing and Trial and Error in Chemotherapeutic Research.

94. A. Ito, K. Hirai, M. Inoue, H. Koga, S. Suzue, T. Irikura, and S. Mitsuhashi, Antimicrob. Agents Chemother., 17, 103–108 (1980). In vitro Antibacterial Activity of AM-715, A New Nalidixic Acid Analog.

95. H. Koga, A. Itoh, S. Murayama, S. Suzue, and T. Irikura, J. Med. Chem., 23, 1358–1363 (1980). Structure-Activity Relationships of Antibacterial 6,7- and 7,8-Disubstituted 1-Alkyl-1,4-dihydro-4-oxoquinoline-3-carboxylic Acids.

96. H. Koga, Kagaku no Ryoiki, Zokan, 136, 177–202 (1982). Structure-Activity Relationships and Drug Design of Pyridonecarboxylic Acid Type (Nalidixic Acid Type) Synthetic Antibacterial Agents.

97. T. J. Perun and C. L. Propst, Eds., Computer-Aided Drug Design: Methods and Applications, Dekker, New York, 1989.

98. A. Brunger, M. Karplus, and G. A. Petsko, Acta Crystallogr., Sect. A, A45, 50–61 (1989). Crystallographic Refinement by Simulated Annealing: Application to Crambin.

99. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem., 4, 187–217 (1983). CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations.

100. P. Kollman, Annu. Rev. Phys. Chem., 38, 303–316 (1987). Molecular Modeling.

101. J. A. McCammon, Science, 238, 486–91 (1987). Computer-Aided Molecular Design.

102. M. R. Reddy, M. D. Erion, and A. Agarwal, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 16, pp. 217–304. Free Energy Calculations: Use and Limitations in Predicting Ligand Binding Affinities.

103. J. P. Snyder, Med. Res. Rev., 11, 641–662 (1991). Computer-Assisted Drug Design. Part I. Conditions in the 1980s.

104. S. Karin and N. P. Smith, The Supercomputer Era, Harcourt Brace Jovanovich, Boston, 1987.

105. J. S. Wold, testimony before the U.S. Senate; Commerce, Science and Transportation Committee; Science, Technology and Space Subcommittee. Available at http://www.funet.fi/pub/sci/molbio/historical/biodocs/wold.txt. Supercomputing Network: A Key to U.S. Competitiveness in Industries Based on Life-Sciences Excellence.

106. S. A. F. Milosevich and D. B. Boyd, Perspectives in Drug Discovery and Design, 1, 345–358 (1993). Supercomputing and Drug Discovery Research.

107. A. B. Richon, Network Science, 1996. Available at http://www.netsci.org/Science/Compchem/feature17a.html. A History of Computational Chemistry.

108. T. A. Halgren, J. Am. Chem. Soc., 114, 7827–7843 (1992). The Representation of van der Waals (vdW) Interactions in Molecular Mechanics Force Fields: Potential Form, Combination Rules, and vdW Parameters.

109. T. A. Halgren, J. Comput. Chem., 17, 490–519 (1996). Merck Molecular Force Field. I. Basis, Form, Scope, Parameterization and Performance of MMFF94.


110. T. A. Halgren, J. Comput. Chem., 17, 520–552 (1996). Merck Molecular Force Field. II. MMFF94 van der Waals and Electrostatic Parameters for Intermolecular Interactions.

111. T. A. Halgren, J. Comput. Chem., 17, 553–586 (1996). Merck Molecular Force Field. III. Molecular Geometries and Vibrational Frequencies for MMFF94.

112. T. A. Halgren and R. B. Nachbar, J. Comput. Chem., 17, 587–615 (1996). Merck Molecular Force Field. IV. Conformational Energies and Geometries.

113. T. A. Halgren, J. Comput. Chem., 17, 616–641 (1996). Merck Molecular Force Field. V. Extension of MMFF94 Using Experimental Data, Additional Computational Data and Empirical Rules.

114. T. A. Halgren, J. Comput. Chem., 20, 720–729 (1999). MMFF. VI. MMFF94s Option for Energy Minimization Studies.

115. D. B. Boyd and K. B. Lipkowitz, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 14, pp. 399–439. History of the Gordon Research Conferences on Computational Chemistry.

116. F. Mohamadi, N. G. J. Richards, W. C. Guida, R. Liskamp, M. Lipton, C. Caufield, G. Chang, T. Hendrickson, and W. C. Still, J. Comput. Chem., 11, 440–467 (1990). MacroModel - An Integrated Software System for Modeling Organic and Bioorganic Molecules Using Molecular Mechanics.

117. P. W. Sprague, Recent Advances in Chemical Information, Special Publication 100, Royal Society of Chemistry, 1992, pp. 107–111. Catalyst: A Computer Aided Drug Design System Specifically Designed for Medicinal Chemists.

118. J. M. Blaney and J. S. Dixon, Perspectives in Drug Discovery and Design, 1, 301–319 (1993). A Good Ligand is Hard to Find: Automated Docking Methods.

119. H.-J. Boehm, Proceedings of the Alfred Benzon Symposium, No. 39, Munksgaard, Copenhagen, 1996, pp. 402–413. Fragment-Based de novo Ligand Design.

120. M. A. Murcko, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 1–66. Recent Advances in Ligand Design Methods.

121. D. E. Clark, C. W. Murray, and J. Li, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1997, Vol. 11, pp. 67–125. Current Issues in de novo Molecular Design.

122. G. Lauri and P. A. Bartlett, J. Comput.-Aided Mol. Des., 8, 51–66 (1994). CAVEAT: A Program to Facilitate the Design of Organic Molecules.

123. W. P. Walters, M. T. Stahl, and M. A. Murcko, Drug Discovery Today, 3, 160–178 (1998). Virtual Screening - An Overview.

124. A. T. Brunger, P. D. Adams, G. M. Clore, W. L. DeLano, P. Gros, R. W. Grosse-Kunstleve, J.-S. Jiang, J. Kuszewski, M. Nilges, N. S. Pannu, R. J. Read, L. M. Rice, T. Simonson, and G. L. Warren, Acta Crystallogr. Sect. D: Biol. Crystallogr., D54, 905–921 (1998). Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination.

125. R. A. Lewis, S. D. Pickett, and D. E. Clark, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2000, Vol. 16, pp. 1–51. Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design.

126. K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, VCH, New York, 1996, Vol. 7, pp. v–xi. Trends in the Job Market for Computational Chemists.

127. K. B. Lipkowitz and D. B. Boyd, Eds., Reviews in Computational Chemistry, Wiley-VCH, New York, 1997, Vol. 11, pp. v–x. Preface on Computer Aided Ligand Design.

128. C. A. Lipinski, F. Lombardo, B. W. Dominy, and P. J. Feeney, Adv. Drug Deliv. Rev., 23, 3–25 (1997). Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings.

129. D. B. Boyd, Abstracts of Papers, Symposium on Connecting Molecular Level Calculational Tools with Experiment, 206th National Meeting of the American Chemical Society, Chicago, Illinois, August 22–26, 1993, PHYS 256. Computer-Aided Molecular Design Applications.

130. C. Hansch and A. Leo, Exploring QSAR: Fundamentals and Applications in Chemistry and Biology, American Chemical Society, Washington, DC, 1995.

131. C. Hansch, A. Leo, and D. Hoekman, Exploring QSAR: Hydrophobic, Electronic, and Steric Constants, American Chemical Society, Washington, DC, 1995.

132. R. Todeschini and V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Berlin, 2000.

133. M. Karelson, Molecular Descriptors in QSAR/QSPR, Wiley, New York, 2000.

134. D. B. Boyd, in Rational Molecular Design in Drug Research, T. Liljefors, F. S. Jørgensen, and P. Krogsgaard-Larsen, Eds., Proceedings of the Alfred Benzon Symposium No. 42. Munksgaard, Copenhagen, 1998, pp. 15–23. Progress in Rational Design of Therapeutically Interesting Compounds.

135. D. B. Boyd, CHEMTECH, 28 (5), 19–23 (1998). Innovation and the Rational Design of Drugs.

136. D. B. Boyd, Modern Drug Discovery, November/December, 1 (2), pp. 41–48 (1998). Rational Drug Design: Controlling the Size of the Haystack.

137. D. B. Boyd, in Encyclopedia of Computational Chemistry, P. v. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. Kollman, and H. F. Schaefer III, Eds., Wiley, Chichester, 1998, Vol. 1, pp. 795–804. Drug Design.

138. D. B. Boyd, in Rational Drug Design: Novel Methodology and Practical Applications, A. L. Parrill and M. R. Reddy, Eds., ACS Symp. Series 719, American Chemical Society, Washington, DC, 1999, pp. 346–356. Is Rational Design Good for Anything?

139. P. Zurer, Chem. Eng. News, June 20, 2005, p. 54. Crixivan.

140. J. C. Dyason, J. C. Wilson, and M. Von Itzstein, in Computational Medicinal Chemistry for Drug Discovery, P. Bultinck, H. De Winter, W. Langenaeker, and J. P. Tollenaere, Eds., Dekker, New York, 2004. Sialidases: Targets for Rational Drug Design.

141. D. E. Walters and A. J. Hopfinger, THEOCHEM, 27, 317–323 (1986). Case Studies of the Application of Molecular Shape Analysis to Elucidate Drug Action.

142. S. A. DePriest, D. Mayer, C. B. Naylor, and G. R. Marshall, J. Am. Chem. Soc., 115, 5372–5384 (1993). 3-D-QSAR of Angiotensin-Converting Enzyme and Thermolysin Inhibitors: A Comparison of CoMFA Models Based on Deduced and Experimentally Determined Active Site Geometries.

143. M. Congreve, C. W. Murray, and T. L. Blundell, Drug Discovery Today, 10, 895–907 (2005). Structural Biology and Drug Discovery.

144. D. B. Boyd, A. D. Palkowitz, K. J. Thrasher, K. L. Hauser, C. A. Whitesitt, J. K. Reel, R. L. Simon, W. Pfeifer, S. L. Lifer, K. Takeuchi, V. Vasudevan, A. D. Kossoy, J. B. Deeter, M. I. Steinberg, K. M. Zimmerman, S. A. Wiest, and W. S. Marshall, in Computer-Aided Molecular Design: Applications in Agrochemicals, Materials, and Pharmaceuticals, C. H. Reynolds, M. K. Holloway, and H. K. Cox, Eds., ACS Symp. Series 589, American Chemical Society, Washington, DC, 1995, pp. 14–35. Molecular Modeling and Quantitative Structure-Activity Relationship Studies in Pursuit of Highly Potent Substituted Octanoamide Angiotensin II Receptor Antagonists.

145. See, for example, A. B. Wagner, J. Chem. Inf. Model., 46, 767–774 (2006). SciFinder Scholar 2006: An Empirical Analysis of Research Topic Query Processing. And references therein.

146. D. B. Boyd and K. B. Lipkowitz, in Reviews in Computational Chemistry, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 2002, Vol. 18, pp. 293–319. Examination of the Employment Environment for Computational Chemistry.

147. Pharmaceutical Research and Manufacturing Association, Washington, DC. www.phrma.org.


148. European Federation of Pharmaceutical Industries and Associations, Brussels, Belgium. www.efpia.org/6_publ/infigure2004d.pdf. The Pharmaceutical Industry in Figures, 2000 Edition.

149. M. McCoy, Chem. Eng. News, Oct. 17, 2005, p. 9. Study Finds R&D Money Doesn’t Buy Results.

150. E. F. Schmid and D. A. Smith, Drug Discovery Today, 10, 1031–1039 (2005). Is Declining Innovation in the Pharmaceutical Industry a Myth?

151. S. Class, Chem. Eng. News, Dec. 5, 2005, pp. 15–32. Pharma Reformulates.

152. R. Mullin, Chem. Eng. News, Jan. 23, 2006, p. 9. Tufts Report Anticipates Upturn.

Author Index

Abbott, N. L., 260

Abrams, C. F., 258

Abusalbi, N., 228

Adamo, C., 118, 232

Adams, J. E., 225

Adams, J. T., 228

Adams, P. D., 449

Agarwal, A., 448

Ahlrichs, P., 261

Ahlrichs, R., 76, 78, 81

Ahmed, F. R., 444

Aires-de-Sousa, J., 399

Akkermans, R. L. C., 258, 259

Akutsu, T., 395, 396

Albert, A., 446

Albu, T. V., 223, 229

Alden, R. A., 448

Alder, B. J., 262

Alhambra, C., 229, 231, 232

Aliferis, C. F., 394

Al-Laham, M. A., 119, 232

Allen, F. H., 446

Allen, M. P., 260

Allinger, N. L., 81, 446, 447, 450

Allison, T. C., 227

Almlof, J. E., 76, 77, 78, 82

Altornare, C., 397

Amboni, R. D. M. C., 397

Amos, R. D., 81, 119

Anand, N., 446

Ancona, N., 400

Andersen, H. C., 260

Andersson, K., 120

Andose, J. D., 446, 447

Andres, J. L., 232

Angelo, M., 394

Anglada, J. M., 121

Angulo, C., 394

Antti, H., 393

Aoki, M., 396

Applegate, B. E., 118

Aptula, A. O., 397

Arfken, G. B., 77

Arimoto, R., 399

Armstrong, R. C., 259

Arnhold, T., 399

Atchity, G. J., 117, 118

Austin, A. J., 118

Ayala, P. Y., 82, 118, 225, 232

Aynechi, T., 289

Ayton, G., 261

Babamov, V. K., 229

Baboul, A. G., 119

Bader, G. D., 400

Badhe, Y., 395

Baer, M., 119, 223

Baerends, E. J., 119

Baeten, V., 399

Bai, D., 258

Bajorath, J., 287, 288, 289

Bakken, V., 118

Balakin, K. V., 399

Balasubramanian, K., 123

Baldi, P., 394

Baldridge, K. K., 120, 225

Bandyopadhyay, S., 261

Barckholtz, T. A., 116, 117

Barnhill, S., 392

Barns, J., 77


Baron, H. P., 81

Barone, P. M. V. B., 397

Barone, V., 118, 232

Barry, C. D., 447

Bartlett, P., 392

Bartlett, P. A., 449

Bartlett, P. L., 393

Bartlett, R. J., 75, 119, 120

Bartol, D., 225

Baschnagel, J., 259

Baskin, B., 400

Bates, D., 117

Batoulis, J., 258, 261

Beachy, M. D., 82

Bearden, A. P., 396

Bearpark, M. J., 116, 121

Beatson, R., 77

Beck, B., 399

Becke, A. D., 79

Bekker, H., 260

Bell, R. P., 225

Bellard, S., 446

Bellon-Maurel, V., 399

Belooussov, A., 399

Ben-Nun, M., 118, 121, 122

Bendale, R. D., 287

Bengio, S., 400

Bennett, K. P., 394

Berendsen, H. J. C., 260

Bergsma, J. P., 232

Berman, M., 123

Bernardi, F., 116, 118, 121, 122

Bernasconi, C. F., 228

Bernhardsson, A., 119

Bernholc, J., 258

Berning, A., 119

Bernstein, F. C., 446

Berry, M. V., 123

Bersuker, I. B., 117

Bertran, J., 223, 230, 232

Bertz, S. H., 287

Beveridge, D. J., 230, 444

Bi, J., 394

Bicego, M., 400

Bickelhaupt, F. M., 119

Bierbaum, V., 123

Biermann, O., 258, 259

Biggio, G., 397, 398

Binder, K., 258, 259, 261

Bingham, R. C., 445

Binkley, J. S., 81

Birge, L., 288

Blancafort, L., 120, 123

Blaney, J. M., 449

Blundell, T. L., 450

Boatz, J. A., 120

Bohm, H.-J., 449

Bofill, J. M., 121

Bolcer, J. D., 444

Bolin, J. T., 448

Bonacic Koutecky, V., 122

Bonchev, D., 287

Bondi, D. K., 229

Borden, W. T., 119

Born, M., 76, 116, 117

Bosshard, H. E., 447

Bouman, T. D., 81

Bourasseau, E., 260

Boutin, A., 260

Bowes, C., 223

Bowler, D. R., 80

Bowman, J. M., 232

Boyd, D. B., 75, 77, 116, 119, 230, 288, 444, 445, 446, 447, 448, 449, 450

Boyd, R. J., 444

Boyd, R. K., 223

Braga, R. S., 397

Brandt, A., 258

Brannigan, G., 262

Breneman, C. M., 394, 445

Brereton, R. G., 399, 400

Brice, M. D., 446

Briels, W. J., 258

Briem, H., 395

Broo, A., 122

Brooks, B. R., 232, 448

Brown, D., 260

Brown, F. L. H., 262

Brown, S. P., 81

Bruccoleri, R. E., 232, 448

Brudzewski, K., 400

Bruice, T. C., 444

Brunger, A., 448, 449

Brunne, R. M., 289

Buchbauer, G., 397

Buchenau, U., 258

Buckingham, A. D., 76

Budzien, J., 260

Bultinck, P., 450

Bunescu, R., 400

Burant, J. C., 78, 118, 232

Bures, M. G., 447

Burger, T., 258, 261

Burges, C. J. C., 392, 393, 400

Burghardt, I., 124

Burkert, U., 447

Bush, I. J., 80

Busonero, F., 397

Buydens, L. M. C., 399

Byron, R. B., 76

Byvatov, E., 395

Cai, Y. D., 396, 399

Callis, P. R., 122

Camilo Jr., A., 397

Cammi, R., 118, 232

Campagne, F., 400

Campaigne, E., 446

Campbell, C., 393

Cao, J., 398

Cao, L., 394

Cao, Y., 82

Cao, Z. W., 399

Car, R., 79

Carini, D. J., 393

Carmesin, I., 259

Carotti, A., 397, 398

Carrieri, A., 398

Carter, E. A., 82

Carter, J. F., 399

Cartwright, B. A., 446

Carver, T. J., 258

Castro, E. A., 397

Catala, A., 394

Caufield, C., 449

Cavalla, J. F., 446

Cawthraw, S., 399

Cederbaum, L. S., 117, 118, 121, 122, 124

Celani, P., 119

Cembran, A., 118

Chakravorty, S. J., 287

Challacombe, M., 77, 78, 79, 82, 119, 232

Chambers, C. C., 231

Chang, C. C., 393

Chang, G., 449

Chang, Y. T., 230

Chapelle, O., 392

Chastrette, M., 397

Chatfield, D. C., 228, 232

Chauchard, F., 399

Cheeseman, J. R., 81, 118, 232

Chen, B., 79

Chen, C. H., 395

Chen, J. J., 395

Chen, L. B., 394

Chen, N. Y., 398

Chen, P. H., 394

Chen, Q. S., 396

Chen, W., 119, 232

Chen, X., 395

Chen, X. G., 400

Chen, Y., 396

Chen, Y. Z., 395, 399

Chen, Z., 260

Chervonenkis, A., 392

Chiasserini, L., 397

Chou, K. C., 396

Christiansen, P. A., 123

Christoffersen, R. E., 446, 447

Chu, Y. H., 400

Chu, Z. T., 122

Chuang, Y.-Y., 223, 226, 227, 229, 230, 231

Chung, C. B., 400

Ciccotti, G., 259, 260

Cinone, N., 398

Cioslowski, J., 119, 232

Cipriani, J., 77

Clancy, T. C., 261

Clark, D. E., 449

Clark, T., 81, 289, 450

Class, S., 451

Clifford, S., 119, 232

Clore, G. M., 449

Coe, J. D., 123

Cogdill, R. P., 399

Cohen, A., 396

Cohen, B., 122

Coitino, E. L., 223, 229, 230

Colhoun, F. L., 259

Collier, N., 400

Collins, M. A., 230

Collobert, R., 400

Colombo, L., 79

Coltrin, M. E., 228

Congreve, M., 450

Connolly, M. L., 447

Connor, J. N. L., 229

Connors, K. A., 230

Consonni, V., 288, 393, 450

Cooke, I. R., 262

Cooper, D. L., 119

Corchado, J. C., 223, 226, 228, 229, 230, 231

Corey, E. J., 445, 446

Cortes, C., 392

Cossi, M., 118, 120, 232

Cover, T. M., 288

Cox, H. K., 450

Craig, P. N., 446

Cramer, C. J., 230, 231, 232

Cramer III, R. D., 445

Cramer, T., 399

Crawford, T. D., 119

Cremer, D., 76

Crespo Hernandez, C. E., 122

Cristianini, N., 392

Cronin, M. T. D., 397, 398

Cross, J. B., 118

Cross, L. C., 444

Cross, P. C., 224

Crum Brown, A., 443, 444

Csizmadia, I. G., 223

Csonka, G. I., 79

Cubic, B. C., 228

Cui, Q., 119, 232

Cundari, T. R., 122, 123, 394

Curro, J. G., 258

Curtiss, C. F., 76

Dachsel, H., 119

Dallos, M., 119

Daly, J., 287

Dammkoehler, R. A., 447

Dancoff, S. M., 287

Daniels, A. D., 80, 118, 232

Daniels, M., 122

Dannenberg, J. J., 118

Dantus, M., 116

Dapprich, S., 118, 232

Dardenne, P., 399

Daudel, R., 224

Davidson, E. R., 77, 119, 121

Daw, M. S., 80

Dawson, R. W., 398

De Brabanter, J., 392

de Bruijn, B., 400

de Carvalho, A., 395

De Moor, B. L. R., 392, 395

de Pablo, J. J., 260, 261

De Raedt, L., 399

De Smet, F., 395

de Vries, A. H., 260

De Winter, H., 450

Dearden, J. C., 398

Debnath, R., 394

Decius, J. C., 224

Deegan, M. J. O., 119

Deeter, J. B., 450

DeLano, W. L., 449

Della Valle, R. G., 259

Delle Site, L., 258

DePriest, S. A., 450

Deserno, M., 262

Dewar, M. J. S., 230, 445, 447

Dickey, A. N., 262

Diercksen, G. H. F., 76, 77

Dimitrov, S. D., 398

Dimitrova, N. C., 398

Distante, C., 400

Ditchfield, R., 81, 445

Dixon, D. A., 119

Dixon, J. S., 449

Dobbyn, A. J., 120

Doi, M., 258, 259

Dolenko, B., 399

Domcke, W., 116, 117, 118, 121, 122, 123

Dominy, B. W., 288, 449

Donaldson, I., 400

Doruker, P., 259

Doser, B., 76

Doubleday, A., 446

Downs, T., 395

Doxastakis, M., 260

Drucker, H., 392

Du, L., 395

Du, W., 394

Duffy, E. M., 289

Dunietz, D. B., 82

Dunlea, S., 395

Dunn, D. A., 447

Dunweg, B., 261

Dupuis, M., 120, 121, 225

Dyason, J. C., 450

Dyczmons, V., 76

Eckart, C., 225

Ecker, G., 397

Eckers, C., 399

Eckert, F., 120

Ediger, M. D., 260

Edwards, R., 398

Edwards, S. F., 259

Ehara, M., 118

Eiden, M., 399

Eilhard, J., 258

El Aıdi, C., 397

El Ghaoui, L., 395

Elbert, S. T., 118, 120

Eliason, M. A., 224

Elisseeff, A., 394

Ellingson, B. A., 223, 227

Embrechts, M., 394, 400

Engkvist, O., 260

Englman, R., 117

Enslein, K., 447

Eriksson, L., 393

Erion, M. D., 231, 448

Ermler, W. C., 123

Escobedo, F. A., 260

Eskin, E., 396

Espinosa-Garcia, J., 228

Esquivel, R. O., 287, 288

Esselink, K., 261

Esterman, I., 117

Evans, M. G., 223

Everett, A. J., 448

Ewig, C. S., 232

Eynon, B. P., 395

Eyring, H., 76, 223

Faceli, K., 395

Facius, A., 395

Faegri, K., 76

Falck, E., 261

Faller, R., 258, 259, 260, 261, 262

Farago, O., 262

Farazdel, A., 121

Farkas, O., 118, 232

Fast, P. L., 223, 224, 229, 230

Fatemi, M. H., 398

Feeney, P. J., 288, 449

Fernandez-Ramos, A., 223, 228, 229

Ferr, N., 118

Ferrin, T. E., 447

Feshbach, H., 77

Feyereisen, M. W., 78

Filman, D. J., 448

Finley, J., 120

Fischer, E., 443

Fkih-Tetouani, S., 397

Flanigan, M. C., 225

Flannery, B. P., 80

Fleischer, U., 81

Fletcher, R., 393

Fock, V., 75, 445

Fogarasi, G., 226

Foresman, J. B., 119, 232

Fowler, R. H., 223

Fox, D. J., 119, 232

Fox, T., 399

Fraaije, J. G. E. M., 260

Franaszek, M., 395

Frank, E., 395

Frazer, T. R., 443, 444

Free, Jr., S. M., 444

Freer, S. T., 448

Frenkel, D., 260

Friedman, R. S., 228, 232

Friesner, R., 82

Frisch, M. J., 78, 80, 81, 118, 225, 232

Frohlich, H., 287, 395

Fruchtl, H. A., 78

Frurip, D. J., 231

Frymier, P. D., 397

Fuchs, A. H., 260

Fujimoto, H., 225

Fujita, T., 444

Fukuda, R., 118

Fukui, K., 224, 225

Fukunaga, H., 258

Fung, G. M., 394

Fuss, W., 122

Fusti-Molnar, L., 78

Fytas, G., 260

Gadre, S. R., 287

Galli, G., 79

Galvao, D. S., 397

Gamow, G., 225

Gao, H., 400

Gao, J., 229, 231, 232

Gao, J. B., 394

Garavelli, M., 116, 118, 122

Garcia-Viloca, M., 229, 231, 232

Garg, R., 393, 398

Garrett, B. C., 223, 224, 225, 226, 227, 228, 229, 231, 232

Gasteiger, J., 81, 393, 399, 450

Gates, K. E., 395

Gatti, C., 287

Gauss, J., 76, 81

Gazzillo, D., 259

Ge, R. F., 400

Geibel, P., 396

Gelani, P., 122

Georgievskii, Y., 227

Gerald, W., 394

Gersmann, K., 396

Gertner, B. J., 232

Ghosh, J., 262

Gianola, A., 123

Giesen, D. J., 231

Gifford, E. M., 399

Gilbert, R. G., 232

Gill, P. M. W., 76, 77, 78, 79, 119, 232

Gillan, M. J., 80

Gilson, M. K., 399

Girosi, F., 392

Glaesemann, K. R., 120

Glezakou, V.-A., 118

Godden, J. W., 287, 288, 289

Goedecker, S., 79, 80

Goetz, R., 261

Goldman, B. B., 394

Golub, T. R., 394

Gomperts, R., 118, 232

Gompper, G., 261

Gonzalez, C., 119, 232

Gonzalez-Lafont, A., 226, 227, 228, 229

Gordon, M. S., 118, 120, 225, 228

Gould, E. S., 445

Grabowski, E. J. J., 446

Graham, D. J., 287, 288

Graham, R. L., 79

Gramatica, P., 398

Granucci, G., 122

Greenberg, A., 445

Greengard, L., 77

Grenthe, I., 116

Grest, G. S., 258, 259

Grethe, G., 447

Grev, R. S., 224

Grimm, M., 398

Gros, P., 449

Gross, E. K. U., 120

Grosse-Kunstleve, R. W., 449

Grotendorst, J., 80, 81, 82

Grubbs, F., 288

Gu, Q., 394

Gu, Z., 226

Guermeur, Y., 394

Guida, W. C., 449

Guillo, C., 399

Gund, P., 445, 446, 447

Gunn, S. R., 394

Gunther, J., 395

Guo, C., 396

Guo, H., 261

Guo, Z., 395

Gusev, A. A., 259

Guyon, I., 392

Gwaltney, S., 80

Gwinn, W. D., 227

Gygax, R., 446

Haboudou, M., 260

Hack, M. D., 121

Hada, M., 118

Hadjichristidis, N., 260

Hadjipavlou-Litina, D., 398

Haffner, P., 392

Hafskjold, B., 261

Hahn, O., 258, 259, 261

Haire, K. R., 258

Halgren, T. A., 448, 449

Haliloglu, T., 261

Hall, G. G., 76

Hall, L. H., 288

Hall, M. A., 395

Halvick, P., 228

Hamlin, R., 448

Hammer, B., 396

Hammes Schiffer, S., 121

Hammett, L. P., 445

Hampel, C., 82, 120

Han, C. H., 400

Han, I. S., 400

Han, S., 123

Hancock, G. C., 223, 228

Handy, N. C., 81, 225

Hanna-Brown, M., 399

Hansch, C., 393, 398, 444, 446, 450

Hansen, A. E., 81

Hardin, D., 394

Hare, P. M., 122

Harris, C. J., 394

Harrison, R. J., 119

Hartree, D. R., 75, 445

Hasegawa, J., 118

Haser, M., 76, 81, 82

Hass, Y., 121

Hauser, K. L., 450

Hauswirth, W., 122

Hawkins, G. D., 231, 232

He, K., 287

Head-Gordon, M., 76, 77, 78, 79, 80, 81, 82, 232

Healy, E. F., 230, 447

Hehre, W. J., 445

Heidrick, D., 229, 232

Heinrich, N., 398

Heinzen, V. E. F., 397

Helgaker, T., 76, 77, 79, 81, 82, 120

Helma, C., 399

Helson, H. E., 446

Hendrickson, J. B., 446

Hendrickson, T., 449

Henkel, T., 289

Herbrich, R., 392

Hermann, R. B., 444, 445

Hermens, J. L. M., 396

Hernandez, E., 80

Herron, D. K., 445

Hertel, I. V., 123

Herzberg, G., 117, 226, 227

Hess, B., 260

Heß, B. A., 123

Hetzer, G., 82, 120

Heuer, A., 259

Hierse, W., 80

Higgs, H., 446

Hilbers, P. A. J., 261

Hinton, J. F., 81

Hirai, K., 448

Hirao, K., 120

Hirsch, J. A., 446

Hirschfelder, J. O., 76, 224, 225

Ho, I. C., 395

Ho, M., 287, 288

Hoekman, D., 450

Hoenigman, R., 123

Hoffman, B. C., 124

Hoffmann, R., 444

Hoffmann-Ostenhof, M., 78

Hoffmann-Ostenhof, T., 78

Hogekamp, A., 78

Hogue, C. W. V., 400

Hohenberg, P., 75

Hol, W. G., 448

Holloway, M. K., 450

Holmes, E., 393

Honda, Y., 118

Hopfinger, A. J., 450

Horiuti, J., 224

Horn, H., 81

Howe, W. J., 445

Hratchian, H. P., 118

Hsu, C. W., 394

Hu, H. Y., 398

Hu, Q. N., 398

Hu, W.-P., 223, 229, 230

Hu, Z. D., 400

Huang, K., 117

Huang, X., 232

Huarte-Larranaga, F., 232

Humber, L. G., 446

Hummelink, T., 446

Hummelink-Peter, B. G., 446

Hut, P., 77

Huuskonen, J., 289

Hynes, J. T., 124, 232

Ichino, T., 123

Inoue, M., 448

Irikura, K., 231

Irikura, T., 448

Irwin, J. J., 288

Isaacson, A. D., 223, 226, 227, 228, 229

Ischtwan, J., 230

Ishida, M., 118

Ismail, N., 120

Itoh, A., 448

Ivanciuc, O., 393, 396, 397, 398

Ivanov, I., 261

Ivaschenko, A. A., 399

Iyengar, S. S., 118

Jackels, C. F., 223, 226, 227

Jahn, J. A., 117

Jain, B. J., 396

Jalali-Heravi, M., 398

Jaramillo, J., 118

Jarnagin, K., 395

Jasper, A. W., 116, 118

Jaszunski, M., 81

Jayaraman, V. K., 395

Jaynes, E. T., 288

Jeliazkova, N. G., 397

Jenkin, D. G., 444

Jensen, H. J. A., 120

Jensen, J. H., 120

Jerebko, A. K., 395

Jiang, J.-S., 449

Jiang, W., 395

Joachims, T., 392, 400

Johansson, E., 393

Johnson, B. G., 76, 77, 78, 79, 119, 232

Johnson, C. K., 444

Jordan, M. J. T., 232

Jørgensen, F. S., 450

Jørgensen, P., 79, 82, 120

Jorgensen, W. L., 289

Jørgenson, P., 76

Jorissen, R. N., 399

Joseph, T., 223, 225, 228

Joy, A. T., 288

Junkes, B. S., 397

Jurs, P. C., 289, 399

Kanaoka, M., 396

Karelson, M., 450

Karin, S., 448

Karlstrom, G., 120, 260

Karplus, M., 232, 448

Karttunen, M., 261

Kasheva, T. N., 289

Kate, R. J., 400

Kato, S., 123, 225

Katriel, J., 121

Katsov, K., 261

Keating, S. P., 123

Keck, J. C., 224

Kecman, V., 392

Kedziora, G. S., 119

Keith, T., 119, 232

Keith, T. A., 81

Kelly, C. P., 231, 232

Kemble, E. C., 225

Kendall, R. A., 78

Kendrick, B. K., 117

Kennard, O., 446

Kharasch, N., 444

Kier, L. B., 288

Kierstad, W. P., 232

Kim, E. B., 260

Kim, J., 79

Kim, Y., 228, 229

Kimball, G. E., 76

Kimura, T., 396

Kisliuk, R. L., 448

Kitao, O., 118

Kitchen, D. B., 288

Kjaer, K., 262

Klarner, F.-G., 81

Klautau, A., 394

Klein, C. T., 397

Klein, M. L., 261

Klein, S., 121

Klene, M., 118

Klessinger, M., 121

Klippenstein, S. J., 226, 227

Klocker, J., 397

Knirsch, P., 393

Knowles, P., 82

Knowles, P. J., 119

Knox, J. E., 118

Kobayashi, T., 122

Koch, H., 120

Koetzle, T. F., 446

Koga, H., 448

Koga, N., 120

Kohen, A., 231

Kohler, B., 120, 122

Kohler, W., 399

Kohn, W., 75, 76, 78

Kolinski, A., 259

Kollman, P. A., 81, 448, 450

Koltun, W. L., 444

Komaromi, I., 119, 232

Komornicki, A., 225

Kong, J., 78

Konuze, E., 398

Koppel, H., 116, 117, 118, 121

Kornberg, R. D., 262

Korona, T., 119

Korsell, K., 76

Koseki, S., 120, 122

Kossoy, A. D., 450

Kouri, D. J., 228

Koziol, F., 80, 81

Kramer, S., 399

Kramers, H., 123

Kranenburg, M., 261

Kraut, J., 448

Kreer, T., 261

Kreevoy, M. M., 228

Kremer, K., 258, 259, 260, 261, 262

Kriegl, J. M., 399

Krishnan, R., 81

Krogsgaard-Larsen, P., 450

Krylov, A. I., 120

Kuang, R., 396

Kubinyi, H., 288

Kudin, K. N., 118, 232

Kuhl, T. L., 262

Kuhn, W., 261

Kulkarni, A., 395

Kulkarni, B. D., 395

Kullback, S., 288

Kumar, R., 395

Kuntz, I. D., 289, 447

Kuppermann, A., 117, 225, 228

Kurepa, M. V., 228

Kurup, A., 393

Kussmann, J., 80

Kuszewski, J., 449

Kutzelnigg, W., 76, 81

Laage, D., 124

Labute, P. A., 288, 289

Ladd, C., 394

Laidler, K. J., 224

Lambrecht, D. S., 76

Lanckriet, G. R. G., 395

Land, W. H., 400

Lander, E. S., 394

Langenaeker, W., 450

Langridge, R., 447

Larsen, H., 79, 82

Larter, R., 122, 123, 394

Laso, M., 261

Lathan, W. A., 445

Latulippe, E., 394

Lauderdale, J. G., 223

Laumer, J. Y. D., 397

Lauri, G., 449

Lawley, K. P., 80

Lay, V., 400

Lee, C. K., 394

Lee, D. E., 400

Lee, H. P., 394

Lee, M., 80

Lee, M. S., 78, 80, 82

Lee, S., 232

Lee, T. C., 395

Lee, T.-S., 79

Lee, Y., 394

Leeming, P. R., 448

Leibensperger, D., 400

Lengauer, T., 395

Lengsfield, B. H., 119

Lenz, O., 261

Leo, A. J., 446, 447, 450

Lerner, A., 392

Leroy, F., 395

Leszczynski, J., 77

Leslie, C., 396

Leslie, C. S., 396

Lester, M. I., 124

Levine, I. N., 76

Levine, R. D., 121

Levy, S., 394

Lewis, R. A., 449

Li, H., 399

Li, J., 231, 232, 449

Li, L. B., 395

Li, T., 394

Li, X., 118, 395

Li, X.-P., 79

Li, Z. R., 395, 399

Liashenko, A., 119, 232

Liaw, A., 399

Lichten, W., 117

Liebman, J. F., 445

Lifer, S. L., 450

Liljefors, T., 450

Lloyd, A. W., 120

Lim, C., 223

Limbach, H.-H., 231

Lin, C. J., 393, 394

Lin, H., 229

Lin, J.-H., 289

Lind, P., 396

Lindahl, E., 260

Lindh, R., 119, 120

Lineberger, W. C., 123

Ling, X. B., 394

Liotard, D. A., 231, 232

Lipinski, C. A., 288, 289, 449

Lipkowitz, K. B., 75, 77, 116, 119, 122, 123, 230, 288, 394, 444, 446, 447, 448, 449, 450

Lipowsky, R., 261

Lipscomb, W. N., 444

Lipton, M., 449

Lischka, H., 119

Liskamp, R., 449

Liu, G., 119, 232

Liu, X. J., 396

Liu, Y., 395

Liu, Y.-P., 223, 227, 228, 229, 230, 231

Livingstone, D. J., 394

Lluch, J. M., 226

Lo, D. H., 445

Loda, M., 394

Lohr, Jr., L. L., 232

Lombardo, F., 288, 449

Lonari, J., 395

London, F., 81

Long, A. K., 446

Longuet-Higgins, H. C., 117

Loomis, R. A., 124

Lopes, C. F., 261

Lorenzo, L., 288

Lowdin, P. O., 446

Lu, D.-h., 223, 227, 227, 228

Lu, W. C., 398

Lu, X. X., 398

Ludwig, D. S., 262

Lundstedt, T., 393

Lunn, W. H. W., 445

Luz Sanchez, M., 229

Lynch, B. J., 223

Lynch, G. C., 223, 227, 228, 229

Lynch, V. A., 227

Maciocco, E., 397, 398

Maggiora, G. M., 287

Magnuson, A. W., 224

Mahe, P., 396

Maigret, B., 444

Majewski, J., 262

Malarkey, C., 287

Malick, D. K., 118, 232

Malley, J. D., 395

Malli, G. L., 123

Malmqvist, P. A., 120

Maltseva, T., 396

Mameli, M., 397

Manaa, M. R., 116

Manby, F. R., 119

Mangasarian, O. L., 394

Mannhold, R., 288

Manohar, L., 122

Manthe, U., 232

Mao, K. Z., 395

Marcotte, E. M., 400

Marcus, R. A., 224, 225, 226, 228, 229

Marian, C. M., 122, 123

Marino, D. J. G., 397

Mark, A., 260

Markiewicz, T., 400

Marrink, S. J., 260

Marsh, M. M., 445, 447

Marshall, G. R., 447, 450

Marshall, W. S., 450

Martin, A. N., 445

Martin, J., 400

Martin, M. E., 122

Martin, R. L., 119, 232

Martin, T. C., 399

Martin, Y. C., 446, 447

Martinez, T. J., 118, 121, 122, 123

Mascia, M. P., 397

Maslen, P. E., 78, 80, 82

Massarelli, P., 398

Masters, A., 395

Matsika, S., 118, 120, 123, 124

Matsunaga, N., 120, 122

Mattai, J., 262

Matthews, D. A., 447

Mattice, W. L., 259, 261

Mauri, F., 79

Mauser, H. A., 395

Mayer, D., 450

Mayer, J. E., 446

Mayer, K. F. X., 395

McCammon, J. A., 448

McCoy, J. D., 258

McCoy, M., 451

McElroy, N. R., 289

McFarland, J. W., 446

McIntosh, D. F., 226

McIver Jr., J. W., 225

McKinnon, R. A., 399

McMurchie, L. E., 77

McNicholas, S. J., 120

McQuarrie, D. A., 231

McWeeny, R., 80

Mead, C. A., 116, 117, 118, 123

Meeden, G., 287

Mekenyan, O. G., 398

Melissas, V. S., 223, 225, 228

Melssen, W. J., 399

Mennucci, B., 118, 122, 232

Mercer, J., 393

Mercer, K. L., 262

Merchan, M., 120

Merkwirth, C., 395

Mesirov, J. P., 394

Messiah, A., 80, 443

Mewes, H. W., 395

Meyer, H., 258, 259, 260

Meyer, H.-D., 121

Meyer, W., 120

Meyer, Jr., E. F., 446

Michalickova, K., 400

Micheli, A., 396

Michelian, K. H., 226

Michl, J., 116, 121, 122

Mielke, S. L., 227

Migani, A., 118, 121

Mika, S., 393, 398, 400

Milano, G., 259

Millam, J. M., 79, 232

Miller, C. E., 262

Miller, M. A., 446

Miller, T. A., 116, 117

Miller, W. H., 121, 224, 225, 230

Milliam, J. M., 118

Milosevich, S. A. F., 448

Mina, N., 223

Miners, J. O., 399

Minichino, C., 230

Mitchell, A. D., 444

Mitchell, B. E., 289

Mitsuhashi, S., 448

Miyazaki, T., 80

Moecks, J., 399

Mohamadi, F., 449

Møller, C., 75

Molnar, F., 118

Monmaney, T., 447

Montgomery Jr, J. A., 118, 120, 232

Moock, T. E., 447

Mooney, R. J., 400

Moore, G. E., 76

Moore, P. B., 261

Morgan III, J. D., 78

Morokuma, K., 118, 120, 232

Morse, P. M., 77

Moser, K. L., 395

Mosquera, R. A., 288

Motherwell, W. D. S., 446

Mouritsen, O. G., 262

Mowshowitz, A., 287

Mukherjee, S., 394

Muller, B., 260

Muller, H., 289

Muller, K.-R., 393, 398

Muller, M., 261

Muller, Th., 119

Muller-Plathe, F., 258, 259, 260

Mullin, R., 451

Mura, M. E., 120

Murat, M., 258, 259

Murayama, S., 448

Murcko, M. A., 449

Murphy, R. B., 82

Murray, C. W., 449, 450

Murrell, J. N., 121

Murtola, T., 261

Muselli, M., 395

Musicant, D. R., 394

Nachbar, R. B., 449

Nagashima, R., 399

Nakai, H., 118

Nakajima, T., 118

Nakano, H., 120

Nakatsuji, H., 118

Nanayakkara, A., 119, 232

Nandi, S., 395

Nangia, S., 116

Natanson, G. A., 226

Natsoulis, G., 395

Naylor, C. B., 450

Neely, W. B., 445

Negri, F., 122

Nencini, C., 398

Neogrady, P., 120

Netzeva, T. I., 398

Newton, M. D., 445

Ng, C.-Y., 76, 119

Nguyen, H. X., 448

Nguyen, K. A., 120, 223, 227, 230

Nicklass, A., 120

Niedfeldt, K., 82

Nielsen, S. O., 261

Nilges, M., 449

Nishikawa, T., 399

Niyogi, P., 392

Noble, W. S., 396

Nunes, R. W., 79

Obara, S., 77

Ochsenfeld, C., 76, 78, 80, 81

Ochterski, J. W., 118, 232

Ogihara, M., 394

Ohtani, H., 122

Oladunni, O., 400

Olafson, B. D., 232, 448

Olivucci, M., 116, 118, 120, 121, 122

Olsen, J., 76, 79, 82

Olsen, S., 122

Olson, E. C., 446, 447

O’Malley, T. F., 117

Opik, U., 117

Oppenheimer, R. A., 76, 116

Ordejon, P., 80

Ortiz, J. V., 119, 232

Osowski, S., 400

Ostlund, N. S., 75

Ostovic, D., 228

Ovchinnikova, M. Y., 229

Overend, J., 226

Ozisik, R., 261

Pacher, T., 117

Page, M., 225

Palkowitz, A. D., 450

Palmieri, P., 120

Pannu, N. S., 449

Pant, P. V. K., 260

Papa, E., 398

Papavassiliou, D. V., 400

Pardo, M., 400

Parr, R. G., 76, 445

Parra, X., 394

Parrill, A. L., 450

Pasini, P., 262

Pastore, E. J., 448

Patey, G. N., 232

Patra, M., 261

Paugam-Moisy, H., 394

Paul, W., 259, 261

Pavon, J. L. P., 400

Pawson, T., 400

Pearson, C. I., 395

Pechukas, P., 224

Pellerano, C., 397, 398

Peng, C., 225

Peng, C. Y., 119, 232

Peng, S. H., 394

Peng, X. N., 394

Pepers, M., 399

Perdew, J. P., 78, 79

Perram, J. W., 78

Perret, J.-L., 396

Perun, S., 122

Perun, T. J., 448

Peruzzo, P. J., 397

Pesa, M., 226

Petersen, H. G., 78

Petersson, G. A., 118, 232

Petrich, W., 399

Petsko, G. A., 448

Peyerimhoff, S. D., 123, 444

Peyraud, J. F., 397

Pfeifer, W., 450

Pickett, S. D., 449

Pierloot, K., 120

Pierna, J. A. F., 399

Pilling, M. J., 226, 230

Piskorz, P., 119, 232

Pitzer, K. S., 227

Pitzer, R. M., 119, 120

Plante, L. T., 448

Platt, J., 393

Plesset, M. S., 75

Pletnev, I. V., 399

Pochet, N., 395

Poggio, T., 392, 394

Polanyi, M., 223

Polinger, V. Z., 117

Pollard, W. T., 82

Pollastri, G., 394

Pomelli, C., 118, 232

Pon, F. R., 262

Pople, J. A., 76, 77, 78, 79, 80, 81, 116, 119, 230, 232, 444, 445

Portera, F., 396

Poulsen, T. D., 231

Prasad, M. A., 399

Press, W. H., 80

Presti, D. E., 445

Preston, R. K., 121

Prigogine, I., 120

Propst, C. L., 448

Provita, M., 446

Pryce, H. L., 117

Pu, J., 223, 228, 229

Pulay, P., 78, 80, 81, 82, 226

Pullman, A., 443

Pullman, B., 76, 224, 443

Pupyshev, V. I., 77

Purcell, W. P., 446

Putz, M., 259

Qin, S. J., 400

Quastler, H., 287

Quenneville, J., 121

Rabinovitch, B. S., 227

Rabuck, A. D., 119, 232

Rachlin, A. I., 446

Radloff, W., 123

Radom, L., 445

Ragazos, I. N., 118, 121

Raghavachari, K., 119, 232

Rai, S. N., 223, 228

Ramani, A. K., 400

Ramaswamy, S., 394

Ramirez, J. C., 288

Rao, B. S., 395

Rao, S., 395

Raphael, C., 260

Ratsch, G., 393, 398

Rauhut, G., 119

Read, R. J., 449

Reddy, M. R., 231, 448, 450

Redmon, M. J., 223, 225

Reed, A. E., 225

Reed, R. A., 262

Reeder, R. C., 261

Reel, J. K., 450

Rega, N., 118

Reich, M., 394

Reichel, R., 289

Reith, D., 258, 259, 260

Rekvig, L., 261

Ren, S., 396, 397

Renier, A. M., 399

Replogle, E. S., 232

Reynolds, C. H., 450

Rhodes, J. B., 446, 447

Ribi, H. O., 262

Rice, R. A., 449

Rice, S. A., 120

Richards, N. G. J., 449

Richards, W. G., 446

Richon, A. B., 448

Richter, D., 258

Rifkin, R., 394

Rinaldi, D., 232

Ringnalda, M. N., 82

Rios, M. A., 229

Rivail, J.-L., 444

Robb, M. A., 116, 118, 120, 121, 122, 123, 232

Roberts, J. D., 445

Robertson, S., 227

Robertson, S. H., 226

Roche, O., 395

Rodgers, J. R., 446

Roger, J. M., 399

Rognvaldsson, T., 396

Rokhlin, V., 77

Roos, B. O., 119, 120

Roothaan, C. C. J., 76

Rorabacher, D. B., 288

Ross, R. B., 123

Rossi, I., 223, 230

Rossiter, K. J., 397

Rost, B., 400

Rothman, M. J., 232

Rouse, P. E., 260

Roussel, S., 399

Rozenholc, Y., 288

Rubenstein, S. D., 446

Ruedenberg, K., 117, 118

Ruffino, F., 395

Runge, E., 120

Rupert, L. A. M., 261

Rutledge, G. C., 259

Ruud, K., 81

Ruzsinszky, A., 79

Ryckaert, J.-P., 260

Ryde, U., 120

Sack, R. A., 117

Sadik, O., 400

Sadlej, A. J., 120

Saebø, S., 82

Sagar, R. P., 287, 288

Saigo, H., 395

Saika, A., 77

Saito, T., 122

Sakurai, J. J., 80

Salo, M., 289

Salt, D. W., 394

Salvador, P., 118

Samoylova, E., 123

Sanchez, M. D. N., 400

Sanchez, M. L., 231

Sanchez, V. D., 393

Sanna, E., 397

Santos, S., 259

Santry, D. P., 230

Saravanan, C., 79

Satija, S., 262

Sato, T., 288

Saunders, V. R., 77

Savchuk, N. P., 399

Savini, L., 397, 398

Sberveglieri, G., 400

Scalmani, G., 118

Scanlon, K., 226

Schacht, D., 287

Schaefer III, H. F., 81, 119, 450

Schaller, T., 81

Scharpf, O., 258

Schenter, G. K., 224, 231, 232

Scherbinin, A. V., 77

Schermerhorn, E. J., 288

Schick, M., 261

Schimmelpfennig, B., 120

Schlegel, H. B., 81, 116, 118, 120, 225, 232

Schleyer, P. v. R., 81, 445, 450

Schlijper, A. G., 261

Schmid, E. F., 451

Schmid, F., 261

Schmid, W. E., 122

Schmidt, M. W., 120, 225

Schmitt, J., 399

Schmitz, H., 259

Schnaare, R. S., 445

Schneider, G., 395

Schnell, I., 81

Scholkopf, B., 392, 393, 394, 400

Schoolnik, G. K., 262

Schouten, J. A., 80

Schreckenbach, G., 81

Schreiner, P. R., 81

Schrodinger, E., 75,

Schulmerich, M. V., 287

Schulten, K., 118

Schultz, T., 123

Schultz, T. W., 396, 397

Schulz-Gasch, T., 395

Schumann, U., 120

Schurr, J. M., 227

Schutz, M., 82, 119

Schuurmans, D., 392

Schwartz, R. L., 124

Schwegler, E., 77, 78

Schwenke, D. W., 228, 232

Scott, D. W., 288

Scuseria, G. E., 78, 79, 80, 82, 118, 232

Seakins, P. W., 230

Sears, S. B., 287

Seelbach, U. P., 81

Segal, G. A., 225, 230, 444

Seijo, L., 120

Seminario, J. M., 79

Seng, C. K., 394

Sensi, P., 446

Serrano-Andres, L., 120

Seth, M., 119

Sham, L. J., 76

Shanmugasundaram, V., 287

Shannon, C. E., 287

Shao, Y., 79, 81

Shavitt, I., 119, 225

Shawe-Taylor, J., 392

Shelley, J. C., 261

Shelley, M. Y., 261

Shen, D. G., 395

Shepard, R., 119

Sheridan, R. P., 399

Shi, L., 400

Shimanouchi, T., 446

Shipley, G. G., 262

Shockcor, J., 393

Shoichet, B. J., 288

Siciliano, P., 400

Siebrand, W., 229

Siegbahn, P. E. M.

Sierka, M., 78

Silva, W. A., 395

Silverman, R. B., 443

Silvi, B., 77

Simon, R. L., 450

Simonson, T., 449

Sinicropi, A., 118

Skodje, R. T., 225, 228

Skolnick, J., 259

Slater, J. C., 76

Smedarchina, Z., 229

Smit, B., 260, 261

Smith, B. R., 121

Smith, D. A., 451

Smith, E. R., 78

Smith, F. T., 117

Smith, G. M., 446, 447

Smith, G. S., 262

Smith Jr., V. H., 287, 288

Smith, N. P., 448

Smith, P. A., 399

Smith, S. C., 226

Smith, S. J., 444

Smola, A. J., 392, 393, 400

Snyder, J. P., 448

Sobolewski, A. L., 122, 123

Soddemann, T., 261

Soelvason, D., 78

Sommerfeld, T., 124

Somorjai, R., 399

Song, J. H., 400

Song, M., 394

Song, Q., 399

Song, S. O., 400

Sonnenburg, S., 398

Sorich, M. J., 399

Sperduti, A., 396

Spiess, H. W., 81

Spitzer, W. A., 445

Sprague, J. T., 446

Sprague, P. W., 449

Sridevi, U., 395

Stahl, M., 395

Stahl, M. T., 449

Stahura, F. L., 287, 288, 289

Stanton, J. F., 75, 119, 120

Staroverov, V. N., 78, 79

States, D. J., 232, 448

Statnikov, A., 394

Stechel, E., 80

Steckler, R., 223, 225, 228, 232

Steel, C., 224

Stefanov, B. B., 119, 232

Steinberg, M. I., 450

Steinwart, I., 394

Stepanov, N. F., 77

Stewart, J. J. P., 223, 228, 230, 447

Still, W. C., 449

Stock, G., 118

Stoll, H., 83, 120

Stone, A. J., 117, 120

Strain, J., 77, 78

Strain, M. C., 118, 232

Stratmann, R. E., 118, 232

Strauss, H. L., 227

Strobl, G., 260

Sturges, H. A., 288

Su, S., 120

Summers, R. M., 395

Sun, L. Z., 395

Sun, Q., 259, 260, 262

Sung, K. K., 392

Sutcliffe, B. T., 76, 444

Suter, U. W., 259, 261

Sutter, J. M., 289

Sutton, L. E., 444

Suykens, J. A. K., 392, 393, 395

Suzue, S., 448

Svetnik, V., 399

Swaminathan, S., 232, 448

Szabo, A., 75

Szalay, P. G., 119

Szilva, A. B., 82

Taft Jr., R. W., 445

Takahashi, H., 394

Takahata, Y., 397

Takahide, N., 394

Takeuchi, K., 400, 450

Takimoto, J., 258

Tamayo, P., 394

Tambe, S. S., 395

Tamboli, A. C., 262

Tanchuk, V. Y., 289

Tao, J., 78, 79

Tao, S., 398

Tapia, O., 230, 232

Tarroni, R., 120

Taskinen, J., 289

Tasumi, M., 446

Taylor, P. R., 76, 77, 119

Teitelbaum, H., 223, 224

Teller, E., 116, 117

Tennyson, J., 121

Teramoto, R., 396

Teter, M., 79

Tetko, I. V., 289, 395

Teukolsky, S. A., 80

Theodorou, D. N., 260

Thissen, U., 399

Thompson, D. L., 227

Thompson, J. D., 231, 232

Thompson, M. A., 231

Thompson, T. B., 396

Thorsteinsson, T., 120

Thrasher, K. J., 450

Tildesley, D. J., 260

Timmerman, H., 288

Tobita, M., 399

Todeschini, R., 288, 393, 450

Tollabi, M., 397

Tollenaere, J. P., 450

Tolley, A. M., 395

Tomasi, J., 118

Tomasi, J., 122, 232

Tong, C., 399

Toniolo, A., 122

Topol, E. J., 395

Toropov, A. A., 397

Torrie, G. M., 232

Toyota, K., 118

Trafalis, T. B., 400

Tries, V., 259

Trinajstic, N., 287

Trucks, G. W., 81, 232

Truhlar, D. G., 116, 117, 118, 121, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232

Truong, T. N., 223, 224, 227, 228, 229

Trushin, S. A., 122

Tsai, C. A., 395

Tsamardinos, I., 394

Tschop, W., 258, 261

Tsuda, K., 393, 396

Tucker, S. C., 223, 229

Tuekam, B., 400

Tugendreich, S., 395

Tuligi, G., 398

Tully, J. C., 121, 124

Tweedale, A., 224

Udelhoven, T., 399

Ueda, N., 395, 396

Uematsu, M., 400

Ulaczyk, J., 400

Ung, C. Y., 399

Ungerer, P., 260

Urrestarazu Ramos, E., 396

Ustun, B., 399

Vaes, W. H. J., 396

Vahtras, O., 78

Valentini, G., 395

Valleau, J. P., 232

Van Catledge, F. A., 446

van der Spoel, D., 260

Van Gestel, T., 392

van Os, N. M., 261

van Voorhis, T., 80

Van Wazer, J. R., 232

van Wullen, C., 81

Vanderbilt, D., 79

Vandewalle, J., 392

Vapnik, V. N., 392

Varandas, A. J. C., 121

Varmuza, K., 393

Vasudevan, V., 450

Vattulaine, I., 261

Veith, G. D., 398

Vendrame, R., 397

Venturoli, M., 261

Verhaar, H. J. M., 396

Veillard, A., 76

Vert, J.-P., 395, 396

Veryazov, V., 120

Verzakov, S., 399

Vetterling, W. T., 80

Villa, A. E. P., 289

Villa, J., 223, 225, 226, 229, 231

Vinter, J. G., 447

Volykin, A., 400

von Frese, J., 399

von Homeyer, A., 393

Von Itzstein, M., 450

von Meerwall, E. D., 261

von Neumann, J., 116

Voth, G. A., 118, 261

Vreven, T., 118

Wagner, A. B., 450

Wagner, A. F., 227

Wailzer, B., 397

Wainwright, T. E., 262

Walch, S. P., 124

Walker, J. D., 398

Walter, D., 82

Walter, J., 76

Walters, D. E., 450

Walters, W. P., 394, 449

Wand, M. P., 287

Wanekaya, A. K., 400

Wang, D., 232

Wang, J., 396

Wang, J. P., 396

Wang, M., 396

Wang, Q., 395

Wang, Q. J., 395

Wang, S. Y., 122

Wang, T., 399

Wang, Y., 395

Wardlaw, D. A., 226, 227

Warren, G. L., 449

Warshel, A., 122, 230

Watson, D. G., 446

Weaver, D. F., 287, 288

Weaver, W., 287

Weber, H. J., 77

Weber, V., 82

Wegner, J. K., 287, 395

Weinhold, F., 225

Weis, P., 81

Weiss, R. M., 230

Werner, H.-J., 82, 119

Westheimer, F. H., 446

Weston, J., 392, 396

Weygand, M., 262

White, C. A., 76, 77, 78, 79, 80, 81

Whitesitt, C. A., 450

Whitten, J. L., 82

Widmark, P.-O., 120

Wiegand, S., 260

Wiest, S. A., 450

Wigner, E., 224, 225

Wigner, E. P., 116

Willett, P., 447

Williams, D. E., 77

Williams, G. J. B., 446

Williamson, R. C., 393

Wilson, J. C., 450

Wilson, J. W., 444

Wilson Jr., E. B., 224

Wilson, K. R., 232

Wilson, P. J., 81

Wilson, S., 77

Windle, A. H., 258

Windus, T. L., 120

Winget, P., 231, 232

Winkler, D. A., 399

Winzler, R. J., 444

Wipke, W. T., 445, 446

Wirz, J., 446

Wittmer, J., 261

Wold, J. S., 448

Wold, S., 393

Wolinski, K., 81, 120

Wolschann, P., 397

Wolting, C., 400

Wong, B. Y., 262

Wong, L., 400

Wong, M. W., 119, 232

Wong, Y. W., 400

Wood, W. W., 262

Worgan, A. D. P., 398

Worth, G. A., 121, 122

Wu, D. H., 392

Wyatt, R. E., 117

Wynne-Jones, W. F. K., 223

Wysotzki, P., 396

Xantheas, S. S., 117, 118

Xidos, J. D., 232

Xing, J., 229

Xu, F. L., 398

Xu, Q. H., 394

Xu, X. B., 396

Xue, L., 288

Xue, Y., 395, 399

Xuong, N., 448

Yabushita, S., 119

Yan, Q., 260

Yang, J., 396

Yang, S. S., 398

Yang, U. C., 395

Yang, W., 76, 79

Yang, Z. R., 396

Yap, C. W., 395, 399

Yaris, R., 259

Yarkony, D. R., 76, 77, 80, 116, 117, 118, 119, 120, 121, 123, 124

Yazyev, O., 118

Yeang, C. H., 394

Yin, F., 396

Yoon, E. S., 400

You, L. W., 396

Yunes, R. A., 397

Zahradnik, R., 444, 446

Zakarya, D., 397

Zakrzewski, V. G., 118, 232

Zannoni, C., 262

Zell, A., 287, 395

Zelus, D., 394

Zernov, V. V., 399

Zewail, 116

Zgierski, M., 229

Zhai, H. L., 400

Zhan, Y. Q., 395

Zhang, C. L., 394

Zhang, J. Z., 117

Zhang, Q., 79

Zhang, S. D., 400

Zhang, Z., 119

Zhao, Q., 79

Zheng, C., 396

Zhou, X., 395

Zhu, C., 116

Zhu, T., 231, 232

Ziegler, T., 81

Zilberg, S., 121

Zimmerman, K. M., 450

Zirkel, A., 258

Zoebisch, E. G., 230

Zomer, S., 399, 400

Zupan, J., 393

Zurer, P., 450

Subject Index

Computer programs are denoted in boldface; databases and journals are in italics.

Abbott Laboratories, 407, 413, 418, 427

Abgene, 385

Accidental conical intersections, 90, 105

Actin filaments, 246

Activated complex, 128

Active learning support vector machines (AL-SVMs), 381

Active space, 99, 101

Adenine, 108, 111

Adiabatic approximation, 164

Adiabatic representation, 87

Adiabatic-diabatic representation, 87

ADME/Tox, 434

Agouron, 422

Alcon, 418

Allergan, 418

Allyl radical, 112

AM1, 192, 352, 367

American Chemical Society (ACS), 414

Ames test, 379

Analytic gradients, 100

Anchoring points, 235

Angiotensin II, 297

Anharmonic motions, 159

Anharmonic vibrational energy levels, 158

Anisotropic potential energy function, 238

AO-MP2 method, 67

Apparent randomness, 264

Aqueous solubility, 283

Aroma classification, 361

Array processors, 424

Artificial neural networks (ANNs), 302, 348, 362, 371, 379

ASVM, 390

Asymptotic scaling, 2

Atactic chains, 238

Atactic polystyrene, 249

Atom-centered basis set, 3

Atomic basis functions, 47

Atomic orbital, 3

Atomistic detail, 252

Atomistic models, 235, 242

Autocorrelation function, 246

Automatic text datamining, 384

Available Chemical Directory (ACD), 271, 276, 281, 372

Average-of-states MCSCF, 99

Avoided crossings, 84, 101

Backward Euler method, 143

Barnes-Hut (BH) tree methods, 35

Barrier height, 128

Barrierless association reactions, 157

BASF, 438

Basis functions, 3, 43, 97

Bayer, 425, 438

Bayes point machines, 291

Bayesian statistics, 283

Bead-spring models, 234

Benzodiazepine receptor ligands, 366

Berry phase, 89

Bimolecular rate constant, 203

Bimolecular reaction, 130, 140, 166, 188, 206

Binary hard-disk fluid, 256

Binary polymer melt, 243

Binary QSAR, 283, 284


BIND database, 385

Bioavailability, 421

BioCAD, 428

Bioconcentration factor (BCF), 369

BioDesign, 427

Bioinformatics, 385

Bioinformatics, 386

Biological activity, 299, 402

Biological membranes, 254

Biological systems, 106

Bio-medical terms, 385

Biorthogonality condition, 44

Biorthonormality condition, 44

BIOSYM, 428

Boltzmann inversion, 240, 241

Bond-breaking, 101, 106

Bond-fluctuation model, 251

Bond-making, 106

Boosted wrapper induction, 385

Born-Oppenheimer approximation, 5, 83, 85, 97, 126, 128, 131, 204

Born-Oppenheimer PES, 193

Bound support vectors, 322

Boundary effects, 273

Bovine spongiform encephalopathy (BSE), 379

Branching coordinate, 92

Branching plane, 91, 92

Branching space, 89, 91, 110

Bristol-Myers Squibb, 425

BSVM, 388

Calculation chemistry, 414

Calculations of biomolecules, 402

Cambridge Structural Database (CSD), 413

Canonical ensemble, 128, 136

Canonical MO coefficient matrix, 36, 42, 65

Canonical unified statistical (CUS) model, 138

Canonical variational theory (CVT), 127, 134

Canonical variational transition state theory, 127, 128, 131

Capillary electrophoresis, 380, 381

Carcinogenic activity, 360

Carcinogenicity, 421

Cartesian coordinates, 196, 239, 412

CASSCF, 99, 101

CASSCF/AMBER, 107

Catalyst, 430

CCSD, 98

CCSD(T), 98

Cell, 385

Central nervous system, 366

Centrifugal-dominant small-curvature approximation, 171

Cephalosporins, 408

Chain contour, 246

Chain diffusion coefficient, 247

Chain stiffness, 245

Chapman and Hall (CH) natural products database, 277

Charge distribution, 18

Charge-transfer reactions, 83, 106

CHARMM, 208, 211, 423, 427

CHARMMRATE, 191, 211

ChemDraw, 416

CHEMGRAF, 417

Chemical accuracy, 190

Chemical descriptors, 269, 283

Chemical Design Ltd., 417, 428

Chemical diversity, 275

Chemical engineering, 383

Chemical information, 263, 279

Chemical information content, 278

Chemical intuition, 284, 412

Chemical libraries, 263, 270, 275

Chemical reaction rates, 125

Chemical shifts, 60

Chemoinformatics, 264, 265, 269, 286, 317, 362, 385, 387

Chemometrics, 379

Chromophore, 106

Ciba-Geigy, 438

CIS, 98

CISD, 98

Class discrimination, 362

Class membership, 295, 302

Classical barrier height, 128

Classical CVT rate constant, 134

Classical partition function, 128

Classical threshold energies, 167

Classical trajectory calculations, 130

Classical turning points, 166, 182

Classification, 291, 293

Classification errors, 318

Classification hyperplane, 294

Classification rules, 302

ClogP, 297

CLOGP, 420

Closed-shell systems, 97

CNDO/2, 407, 410

Coarse-grained models, 242

Coarse-grained Monte Carlo simulations, 250

CODESSA, 347

Coffee, 382

Collaboration gap, 412, 413

Collective bath coordinate, 133

COLUMBUS, 100, 104

Combinatorial chemistry, 430

Commercial software, 427

Complete active space second-order perturbation theory (CASPT2), 101, 107

Complete neglect of differential overlap (CNDO), 407

Complex descriptors, 273, 277

Complexity of descriptor, 273

Composite charge distribution, 18

Compound diversity, 269

Computational biology, 385

Computational chemistry, 265, 286, 387, 401

Computational Chemistry List (CCL), 428

Computational chemists, 404

Computer centers, 404

Computer graphics, 416

Computer use at pharmaceutical companies, 414

Computer-aided drug design (CADD), 413, 414, 417, 434

Computer-aided ligand design (CALD), 434

Computer-aided synthesis planning, 408, 412

Computers, 402

Apple Macintosh, 415, 426

Cray-2, 425

DEC-10, 409

Floating Point System (FPS)-164, 424

IBM 3033, 418

IBM 3083, 418

IBM 3278, 409

IBM 360, 408

IBM 4341, 418

IBM 7094, 404

IBM PC, 415

VAX 11/780, 415, 418

VAX 11/783, 418

VAX 11/785, 418

Condensed-phase reactions, 206

Configuration state functions (CSFs), 97

Configurational bias, 250

Conical intersections (CIs), 83, 84, 90, 93

Conjugate gradient algorithm, 147

Connectivity indices, 273, 377

Constrained minimization problems, 311

Continuous charge distribution, 23, 32

Continuous fast multipole method (CFMM), 16, 34, 5

Continuous space models, 236

Contour length, 234, 249

Contracted Gaussian basis functions, 5

Contracted Gaussian distributions, 26

Contracted multipole integrals, 26

Contravariant basis vectors, 44

Conventional transition state theory, 126, 128

Core orbitals, 100

Corner cutting, 222

Corner-cutting tunneling, 164

Coulomb integrals, 15

Coulomb interactions, 255

Coulomb matrix, 15

Coulomb’s Law, 11

Coulomb-type contraction, 69

Coupled-cluster (CC) methods, 2, 98

Coupling matrix elements, 87

Coupling terms, 86

Covariant basis vectors, 44

Covariant integral representation, 47

CPK (Corey-Pauling-Koltun) models, 406

Cray Research, 424, 425

Creutzfeldt-Jakob disease, 379

Cross-entropy, 269

Cross-validation, 299, 302, 355, 363

Curved directions, 55

Curvilinear coordinates, 133, 150, 152, 154, 221, 246

Curvilinear internal coordinates, 152, 163

Curvy steps method, 55

Cytochromes P450, 372, 375

Cytosine, 107

Data mining, 429

Databases from scientific literature, 384

Daylight Chemical Information Systems, 419, 429

De novo design, 413

Debye-Huckel theory, 205

DEC, 415

Decision tree, 378

Decwriter II, 409

Degeneracy, 90

Degree of freedom, 128

Degree of polymerization, 249

Density functional theory (DFT), 2, 98

Density matrix, 6, 37

Density matrix-based coupled perturbed SCF (D-CPSCF), 62

Density matrix-based energy functional, 49

Density matrix-based quadratically convergent SCF (D-QCSCF), 55

Density matrix-based SCF, 42

Density operator, 48

Derivative coupling, 86, 96

Descriptors, 263, 283, 301

1-D, 272

2-D, 272, 281, 284

3-D, 281

Descriptor comparison, 269

Descriptor design, 285

Descriptor selection, 264, 283, 285, 347, 378

Descriptor space, 301

Descriptor variability, 275

Diabatic representation, 87

Diagonalization, 42

Differential Shannon entropy (DSE), 265, 275

Diffuse functions, 99

Diffusion, 203

Dihydrofolate reductase (DHFR), 422

Dipalmitoyl-phosphatidylcholine (DPPC), 255

Dipalmitoyl-phosphatidylethanolamine (DPPE), 256

Direct Born-Oppenheimer molecular dynamics, 57

Direct dynamics, 126, 190, 191, 217, 222

Direct mapping of the Lennard-Jones time, 250

Direct methods, 100

Direct SCF methods, 8

Directed acyclic graph SVM (DAGSVM), 339

Disconnect between computational chemists and medicinal chemists, 411

Dissipative particle dynamics, 255

Distinguished reaction coordinate (DRC), 208

Distortion energy, 204

Diversity of kernel methods, 391

Divide-and-conquer methods, 42

Dividing hypersurface, 128, 131, 152, 158, 205

DNA, 28, 378

DNA/RNA bases, 107

Docking, 430

Dow Chemical, 408

Dragon, 347, 372

Dreiding models, 406

Drug design, 291, 371

Drug discovery, 403

Drug-like compound, 271, 362, 371, 375

Drug-like compound identification, 348

Drugs, 376

Dual-level dynamics, 199

DuPont, 418, 427

Dynamic correlation, 73, 97, 99, 108

Dynamic mapping, 246

Dynamical bottleneck, 128, 130, 173, 221

Dynamics, 104

Dynamics of polymers, 248

Dynamics trajectories, 280

Eckart barrier, 139

Eckart potential, 198

Ehrenfest dynamics, 105

Electron correlation, 1, 12, 64, 97

Electron density distributions, 279

Electron repulsion integral, 20

Electron transfer, 126

Electronegativity equalization method (EEM), 375

Electronic coordinates, 85

Electronic mail, 415

Electronic nose, 381

Electronic partition function, 148, 150

Electronic structure calculations, 126, 190

Electronic structure theory, 1

Electronic wavefunction, 85, 87

Electrostatic interaction, 255

Electrostatic potential, 17

Electrotopological indices, 377

Eli Lilly and Company, 402, 407, 427, 438

EMBO Journal, 385

Empirical valence bond method, 192

End-to-end distance, 246

Energy gradients, 57

Energy minimization, 53, 253

Energy transfer, 130

Ensemble averaging, 207

Ensemble of reaction paths, 221

Ensemble-averaged variational transition state theory (EA-VTST), 206, 207

Entanglement length, 248

Entanglement time, 249

Entropic separation (ES), 277, 281

Entropy, 263

Entropy metric, 264

Entropy-based information theory, 283

Envison, 418

Enzyme-catalyzed reactions, 206, 207

Equations of motion, 141

Equilibrium solvation, 206

Equilibrium solvation path (ESP) approximation, 206

Errors, 317

Espresso coffee, 382

ETA Systems, 424

Ethyl radical, 157

Ethyl tertiary butyl ether (ETBE), 382

Euler steepest-descent (ESD) method, 143

Evans and Sutherland PS300, 418

Evolutionary algorithms, 302

Exchange-correlation functional, 6, 40

Exchange-type contractions, 35, 71

Excitation energies, 101

Excited state dynamics, 111

Excited state properties, 102

Excited states, 84, 98, 99, 103, 172

Experimental errors, 299

Extended Huckel theory (EHT), 407, 410

Far-field (FF) interactions, 29, 30

Fast multipole method (FMM), 16, 27, 34

Features, 301

Feature construction, 378

Feature functions, 293, 326

Feature selection, 264, 375

Feature space, 293, 323

Feed-forward neural networks, 351, 382

Fermi operator expansions (FOE), 42

Fermi operator projections (FOP), 42

Fermions, 47

Fingerprints, 273, 373

First order CI (FOCI), 100

Fixed basis functions, 5

Flux, 130, 138, 205

Fock matrix, 6, 37, 47

Fock operator, 5

Focused compound libraries, 376

FORTRAN 77, 419

FORTRAN II, 404

FORTRAN IV, 409

Fourier transform Coulomb (FTC) method, 35

Fragrances, 361

Free diffusion, 248

Free energies, 241, 244

Free energy of activation, 129, 147

Free energy of reaction, 129

Free energy perturbation (FEP) theory, 423

Free software, 387

Full CI (FCI), 98

Full Multiple Spawning (FMS), 105

GAMESS, 101

GAMESSPLUSRATE, 191

Ganglioside lipid (GM1), 256

Gasoline, 382

Gas-phase reactions, 127

Gauge-including atomic orbitals (GIAO), 61

GAUSSIAN, 97, 104

Gaussian 70, 409

Gaussian 76, 409

Gaussian 80, 409

GAUSSIAN 98, 217

Gaussian basis functions, 6

Gaussian distributions, 20, 270

Gaussian Inc., 419

Gaussian processes, 291

Gaussian very fast multipole methods (GvFMM), 35

GAUSSRATE, 191, 217

Generalized transition state, 131, 221

Generalized transition state dividing surface, 205

Generalized transition state partition function, 134, 149, 152

Generalized transition state theory, 127

Genetic algorithm, 284, 381

Genotoxicity of chemical compounds, 378

Geometric phase effect, 89, 113

Ghose-Crippen atom types, 372

Gini-SVM, 389

Gist, 389

GlaxoSmithKline, 438

GPDT, 389

Graining, 29

Graph descriptors, 301

Graph theory, 264

Graphical user interface (GUI), 427

Green fluorescent protein, 107

Gromacs, 241

Ground state, 84, 148

Group similarity, 280

Gyration radius, 246

Hamiltonian matrix, 88, 92

Hamilton’s equations of motion, 105

Hard margin nonlinear SVM classification, 334

Hard-disk model, 256

Hard-sphere fluids, 256

Harmonic vibrational energy levels, 158

Hartree-Fock (HF) method, 1, 97

Hartree-Fock reference, 99

Hartree-Fock wavefunction, 97

Health Designs, 420

Heaviside step function, 163

Heavy elements, 112

HERG (human ether-a-go-go) potassium channel inhibitors, 374

HeroSvm, 389

Hessian, 142, 151, 190

Heteropolymers, 254

High information content, 283

Highly symmetric reaction paths, 155

High-throughput screening, 430

Hilbert space, 323

Hindered internal rotations, 159

Histogram bins, 267

Historical development of computational chemistry, 401

Hoechst, 438

Hoffmann-LaRoche, 438

HOMO-LUMO gap, 36, 42

Hybrid functionals, 40

Hydrodynamics, 250

Hydrogen-atom transfer reaction, 109

Hydrophobicity, 297, 359

Hyperplane classifier, 302

IBM, 415

IBM mainframes, 416

Idempotency, 46

Imaginary frequency, 127, 128, 190

Imbalanced classification, 338

IMLAC, 418

Implicit solvation models, 126

Improved canonical variational theory (ICVT), 137

Inductive logic programming, 378

Inertial centrifugal effect, 169

Informatics, 431

Information content, 264, 269

Information content analysis, 263

Information content of a signal, 264

Information content of organic molecules, 278

Information theoretic analysis, 284

Information theory, 264

Information-rich descriptors, 273, 277, 285

Integral screening, 10

Integrated Scientific Information System (ISIS), 429

Interaction domains, 67

Interaction sites, 237

Interactive computing, 415

Interactive graphical terminals, 415

Intermediate partition function, 159

Internal contracted MRCI, 100

Internal coordinates, 192, 196

Internal degrees of freedom, 148

International Union of Pure and Applied Chemistry (IUPAC), 414

Internet, 385, 387

Interpolated optimized corrections (IOC) method, 200

Interpolated optimized energies, 202

Interpolated single-point energies, 200

Interpolated variational transition state theory by mapping (IVTST-M), 196

Interpolated VTST, 192

Intersystem crossings, 84, 106, 113

Intramolecular electron transfer, 106

Intrinsic reaction coordinate (IRC), 133

IR spectroscopy, 60

Isoinertial coordinates, 132, 140, 188

Iterative Boltzmann method (IBM), 240

Iterative structural coarse-graining, 242

Jahn-Teller effect, 90, 110

Jaynes entropy (JE), 269, 280

JmySVM, 388

JOELib, 374

Johnson & Johnson, 425

Journal of Biological Chemistry, 385

Journal of Computational Chemistry, 405, 414

Journal of Machine Learning Research, 386

Journal of Medicinal Chemistry, 435

Journal of Molecular Graphics, 416

JSVM, 391

Jury methods, 373Jury SVM, 348, 372

Kappa indices, 377

Karuch-Kuhn-Tucker (KKT) conditions, 312,321, 342

K-class support vector classification-regression

(K-SVCR), 340Kernel principal component analysis, 291

Kernel-based techniques, 291

Kernels, 294, 326

Additive, 333Anova, 332

B spline, 295, 299, 316, 333, 354

Dot, 329, 353

Exponential RBF, 316, 331Fourier series, 332

Gaussian RBF, 316, 331, 371

Graph, 378

Linear, 295, 329, 353Neural, 332

Nonlinear, 295

Polynomial, 295, 330, 354Radial basis function (RBF), 295, 375

Sigmoid, 332

Spline, 299, 332

SVM, 329Tanh, 332

Tensor product, 333

Kernels for biosequences, 349

Kernels for molecular structures, 350

KEX, 385

Keys, 429

Kier-Hall indices, 273, 367

Kinase inhibitors, 371

Kinetic isotope effects, 127

k-nearest neighbors (k-NNs), 302, 348, 371, 372, 373, 375, 378, 385

Kohn-Sham DFT, 6, 40

Kramers degeneracy, 113

Kuhn segment, 251

Kullback-Leibler (KL) function, 269

Lagrange function, 311

Lagrange multipliers, 103, 113, 311, 320

Laplace transform, 65

Large curvature transmission coefficient, 172

Large margin classifiers, 302

Large systems, 64

Large-curvature path (LCP), 189

Large-curvature tunneling (LCT), 164, 172, 173, 180, 222

Large-curvature tunneling paths, 192

Lattice models, 236, 250

Lattice site, 250

Lattice-Boltzmann models, 250

Lead compound, 406

Lead identification, 283

Lead-like compound, 271

Learning set, 302

LEARNSC, 390

Least-action path (LAP), 189

Least-action path tunneling (LAT), 189

Least-squares SVM regression (LS-SVMR), 380

Leave-one-out-model-selection, 388

Lederle, 418

Legendre polynomials, 21

Lennard-Jones (LJ) potentials, 255

Lennard-Jones parameters, 239

Lennard-Jones time, 250

Library design, 431

Library designers, 281

LIBSVM, 388

Light harvesting, 106

Lincs, 246

Linear classifiers, 324, 351, 363

Linear discriminant analysis (LDA), 301, 379

Linear regression, 378

Linear scaling, 1, 15, 37

Linear scaling calculation of SCF energies, 56

Linear scaling exchange, 38

Linear separable classes, 302

Linear support vector machines, 308

Linear transition state complex, 150

Linearly non-separable data, 317

Linearly separable classes of objects, 292

Linearly separable classification problems, 306

Linearly separable data, 308, 314

LinK method, 39, 40, 57

Lipid bilayers, 255

Lipid bilayer self-assembly, 255

Lipid simulations, 247

Lipophilicity, 407, 420

Local chain reorientation, 247

Local gauge-origin methods, 61

Local interactions, 242

Local minima, 126

Local packing of interaction centers, 242

Local quadratic approximation, 144

Local Shannon entropy, 264, 280

Local tacticity, 238

Local-equilibrium approximation, 130

Locating conical intersections, 102

Lock-and-key hypothesis, 402

Logistic regression, 372

logP(o/w), 270, 271, 276, 284

Long-range behavior of correlation effects, 67

LOO cross-validation, 388

Looms, 388

Loose transition states, 157

Low information content, 277

LS-SVMlab, 390

LSVM, 390

l-temperature, 257

MACCS, 373, 419

Machine learning, 291, 301, 306

MacroModel, 428

Mad cow disease, 379

Mainframes, 403, 416

MAKEBITS, 373

Management, 411

Mapping, 235, 301

Mapping between scales, 236

Mapping by chain diffusion, 247

Mapping through local correlation times, 247

Margin support vectors, 322

Marion Merrell Dow, 425

Mass spectra, 380

Mass-scaled coordinates, 133, 140, 212

Mass-weighted coordinates, 133

MATLAB, 294, 386

MATLAB SVM toolbox, 389, 390, 391

Maximum entropy, 269, 385

Maximum tunneling probability, 163

MC-TINKERATE, 191

McWeeny’s purification, 50

Mean-field approach, 1

Mean-square displacement, 247

Mechanical models, 406

Mechanism of action (MOA), 352, 355

Mechanism of odor perception, 361

Mechanism of toxicity, 352

Medicinal chemists, 281, 411

Medline, 385

Menshutkin reaction, 217

Mercer’s conditions, 328

Merck Molecular Force Field (MMFF94), 428

Merck, 408, 417, 418, 425, 427, 438

Meso-scale model, 234, 244

Methotrexate, 422

Methyl tertiary butyl ether (MTBE), 382

Metrics of information content, 269

Mexican hat, 90

Michaelis complex, 207

Microcanonical variational transition state theory (μVT), 137, 163

Microcanonical ensemble, 128, 137

Microcanonical rate constant, 137

Microcanonically optimized multidimensional tunneling (μOMT), 164

Microcanonically optimized transmission coefficient, 188

Microcanonically optimized tunneling probability, 189

Microscopic reversibility, 174

Microstates, 130

Milk, 382

MINDO/3, 410, 420

Minimum energy path (MEP), 129, 132, 140, 142, 210

MINITAB, 409

Mixed phospholipids, 256

MLF ANN, 377

MM2, 420

MMI, 420

MMI/MMPI, 410

MNDO, 420

MN-GSM, 217

Modeling, 279, 414

Modes transverse to the reaction coordinate, 131

MOLCAS, 101

MolconnZ, 373

Molecular descriptors, 402, 431

Molecular Design Ltd. (MDL), 417, 419, 429

Molecular Drug Data Report (MDDR), 271, 276, 281

Molecular dynamics (MD), 130, 208, 234, 246, 250, 279, 423

Molecular graph, 279, 378

Molecular graphics, 406, 416, 417

Molecular information content, 264

Molecular mechanics, 192, 410, 417, 420

Molecular Operating Environment (MOE), 271, 281, 373, 374

Molecular orbitals (MOs), 6

Molecular response properties, 59

Molecular similarity, 269

Molecular Simulations Inc., 427

MOLFEA, 379

Møller-Plesset perturbation theory, 2, 65, 98

MOLPRO, 100

Moment of inertia, 149, 150, 160, 199

Momentum, 130

Monte Carlo simulations, 234, 250, 256

Moore’s Law, 3

MOPAC, 419, 427

MORATE, 191

Morgan index, 378

Morse function, 162

MP2, 2, 65, 98

M-SVM, 389

Multiclass dataset, 361

Multi-class SVM classification, 339

Multiconfiguration molecular mechanics (MCMM), 190, 192

Multiconfiguration SCF (MCSCF), 98

Multiconfiguration time-dependent Hartree (MCTDH) method, 104

Multidimensional tunneling, 125, 167

Multidimensional tunneling corrections, 164

MULTILEVELRATE, 191

Multiple linear regression (MLR), 302, 362

Multipole accelerated resolution of the identity (MARI-J), 35

Multipole expansion, 15, 20

Multipole integrals, 13

Multipole series, 13

Multipole translation operator, 26

Multipole-based integral estimates (MBIEs), 11, 72

Multireference configuration interaction (MRCI), 99

Multireference methods, 98

Multi-scale modeling, 235

Multistate perturbative methods, 101

Mutagenicity, 421

mySVM, 352, 388

mySVM/db, 388

Naïve Bayesian classifier, 372, 375

Narcotic pollutants, 352

National Institutes of Health (NIH), 386

National Library of Medicine, 386

Natural collision coordinates, 134

Natural products, 281

Natural representation, 46

Near degeneracies, 98, 101

Near-field (NF) interactions, 29

Near-field integral calculation, 35

Neglect of diatomic differential overlap (NDDO), 192

Neural network, 285, 351, 378, 382

New chemical entities (NCEs), 440

Newton-Raphson equation, 55, 103

NIR spectroscopy, 380

NLProt, 384

NMR, 60, 61, 246, 248

NMR chemical shielding, 61

Nobel Prize in Chemistry, 85

Noise, 299, 317

Nonadiabatic coupling, 103

Nonadiabatic nuclear dynamics, 96

Nonadiabatic processes, 83, 85

Nonadiabatic transitions, 86, 87, 108

Nonadiabatic tunneling, 172

Nonclassical reflection, 128, 131, 163

Noncrossing rule, 88, 110, 113

Non-drugs, 376

Nonequilibrium solvation (NES) effects, 206

Nonlinear classifier, 302, 324, 351

Nonlinear mapping, 294, 317

Nonlinear models, 291

Nonlinear separation surfaces, 323

Nonlinear support vector machines, 323

Nonlinear transition state complex, 150

Nonphysical results, 102

Nonredundant internal coordinates, 155

Nonrelativistic Hamiltonian, 85

Non-self-interactions, 243

No-recrossing assumption, 128, 130

Norfloxacin, 422, 436

Normal data distributions, 268

Normal distributions, 270

Normal mode frequencies, 151

Normal modes, 60, 93, 127, 128, 142, 151

Norwich Eaton, 418

Novartis, 427

ν-SVM classification, 337, 375, 376

Nuclear coordinates, 85

Nuclear displacements, 93

Nuclear wavefunction, 85

Nucleic acids, 107

Nucleic Acids Research, 386

Odd-electron systems, 113

Odor classification, 383

Off-lattice model, 251

One-dimensional spline interpolation large-curvature tunneling (ILCT(1D)), 187

One-particle density matrix, 36, 47, 62

One-way flux, 131

ONX (order N exchange) algorithm, 38

Optimal basis density-matrix minimization (OBDMM), 42

Optimization techniques, 239

Optimized Euler stabilization method, 143

Optimized multidimensional tunneling (OMT), 164, 211

Optimizing the SVM model, 347

Optimum separation hyperplane (OSH), 308, 311, 318, 334

Optimum tunneling paths, 164

Orbital minimization (OM), 42

Orbital Shannon entropy, 280

Organic photochemistry, 106

Organophosphates, 382

ORTEP (Oak Ridge Thermal Ellipsoid Program), 406

Oscillator strength, 108

Outliers, 268, 271, 340

Overfitting, 330, 370

Oxford Molecular, 428

Ozone, 91, 105

Page-McIver (PM) algorithm, 144, 212, 217

Pair distribution function, 241

Pariser-Parr-Pople (PPP) theory, 407

Partial least squares (PLS), 301, 348, 362, 373, 376

Partition functions, 127, 147, 148, 199

Pattern classification, 301, 308

Pattern recognition, 292, 301

Peaked conical intersections, 93

Permutational symmetry, 39

Persistence length, 245

Perturbation theory, 98, 101, 112

Pfizer, 424

Pharmaceutical industry, 401

Pharmaceutical R&D, 440

Pharmacia, 424

Pharmacophore, 411

Phase separation, 256

Phase space, 128, 250

Phospholipids, 247, 254

Photochemical damage, 106

Photochemical reactions, 106

Photochemistry, 83, 106, 126

Photodissociation, 106

Photo-initiated electron transfer, 107

Photoisomerization, 106

Photophysics, 83

Photosynthesis, 83, 106

Physicochemical property, 283

PHYSPROP database, 284

PM3, 361

Polarizabilities, 60

Polarization energy, 204

Polycarbonate, 238

Polycyclic aromatic hydrocarbons (PAHs), 360

Polydimethylsiloxane, 246

Polygen, 427, 428

Polyisoprene, 240, 244, 247

Polymer coarse-graining, 234

Polymers, 254

Polypropylene (PP), 384

POLYRATE, 127, 132, 155, 157, 161, 168, 191, 200, 217, 222

Polystyrene, 237, 241

Polystyrene melt, 244

Positive majority consensus SVM (PM-CSVM), 372

Positive probability consensus SVM (PP-CSVM), 372

Post-HF methods, 2, 64

Potential energy, 244

Potential energy function (PEF), 190

Potential energy surface (PES), 83, 125, 190

Potentials of mean force (PMF), 126, 205, 208, 240

Practical aspects of SVM classification, 350

Practical aspects of SVM regression, 362

PreBIND, 385

Predictive model, 299

Predictor-corrector algorithm, 143

Pressure, 245

Pressure correction potential, 245

Primitive Gaussian distributions, 26

Principal anharmonicity, 162

Principal component analysis (PCA), 283, 301, 348

Principal component regression, 302

Principal force constants, 162

Principal moments of inertia, 150

Probability density function, 283, 284

Proceedings of the National Academy of Sciences, 385

Product region, 128

Profiling of chemical libraries, 275

Projected gradient techniques, 104

Projected Hessian, 156

Projection operators, 48

Propagation, 104

Property descriptors, 277

Protein classification, 349

Protein Data Bank (PDB), 413

Protein homology detection, 387

Protein names, 385

Protein sequence similarity, 349

Pseudo-eigenvalue problem, 6

Pseudo-time, 251

PSVM, 391

PubMed, 385, 386

PubMed Central, 387

Pure polymer, 243

Purity transformation, 51

Pyrazines, 361

QCPE Newsletter, 405

QSPR, 284, 347, 377

Quantitative structure-activity relationships (QSARs), 283, 291, 292, 296, 347, 352, 363, 366, 369, 376, 413

Quantitative structure-enantioselectivity relationships (QSERs), 377

Quantitative structure-toxicity model, 367

Quantized VTST calculation, 139

Quantum biology, 410

Quantum calculations, 301

Quantum chemical tree code (QCTC), 35

Quantum chemistry, 1

Quantum Chemistry Program Exchange (QCPE), 405, 409, 418, 420, 427

Quantum descriptors, 352

Quantum dynamics, 104

Quantum effects, 128, 130, 135, 138, 163, 236

Quantum effects on reaction coordinate motion, 163

Quantum mechanical/molecular mechanical (QM/MM) methods, 106, 126

Quantum mechanics, 83, 192, 264, 279, 402, 410

Quantum pharmacology, 410

Quantum threshold energy, 166

Quasiadiabatic mode, 181

Quasiadiabatic states, 97

Quasidegenerate perturbation theory, 91

R, 387

Racah’s normalization, 24

Radial distribution functions (RDFs), 240, 242

Radiationless transition, 96, 110

Raman spectra, 381

Raman spectroscopy, 60

Random forest, 373

Rapid nonadiabatic transitions, 84

Rapier, 385

Rational drug design, 430

Rattle, 246

REACCS, 419

Reactant region, 128

Reaction coordinate, 128, 206

Reaction field, 204

Reaction mechanisms, 106

Reaction path curvature, 172, 188

Reaction paths, 85, 129, 140, 152, 169, 176, 183, 206

Reaction potential energy surfaces, 192

Reaction swath, 172, 186

Reactions in liquids, 203

Reactive normal mode, 131

Reattachment, 250

Receptor mode, 181

Recognition of chemical classes, 371

Recrossing, 221

Recrossing transmission coefficient, 210

Rectilinear coordinates, 132, 150

Recursive partitioning (RP), 371, 372

Reduced mass, 133

Redundant curvilinear coordinates, 217

Redundant internal coordinates, 155, 192

Regression, 291, 296

Relative entropy function, 269

Relativistic effective core potentials, 112

Relaxation times, 234

Remote homology detection, 349

Reorientation of the dividing surface (RODS), 136, 145

Representative tunneling effects (RTE), 216

Reptation model, 248, 250

Retinal, 107

Retrographics VT640, 418

Reverse mapping, 252

Rhodopsin, 107

Ridge regression, 348, 375

Robust linear discriminant analysis (RLDA), 379

Rohm and Haas, 418

Roothaan-Hall equations, 5, 42, 57

Rotational partition function, 148

Rotational symmetry number, 149, 150

Rouse modes, 247, 248

Rouse time, 249

R-SVM, 391

Rule of Five, 271, 434

Rydberg states, 99

σ-π correlation, 100, 108

Saddle points, 126, 131, 147, 151, 190, 199

SAS, 421

Scaled Shannon entropy (SSE), 267

Scaling behaviors, 234

SCF convergence, 58

SCF energy gradients, 57

Schering-Plough, 407

Schrödinger equation, 1, 85, 96, 104

Schwarz integral screening, 9, 38

Scientific information, 291, 384

SciFinder Scholar, 439

SciLab, 387

SCOP superfamilies, 349

Screening assays, 284

Seam, 103

Seam coordinate, 92

Seam space, 89

Searle, 418, 424

Second order CI (SOCI), 100

Segmental dynamics, 246

Segmental relaxation time, 248

Self-consistent field, 4

Self-consistent modeling techniques, 245

Self-organizing maps, 351

Selwood dataset, 378

Semiempirical molecular orbital theory, 192, 410

Sensors, 381

Separable equilibrium solvation (SES) model, 205

Sequential minimal optimization (SMO), 313

Shake, 246

Shannon entropy (SE), 263, 279

Shape indices, 273

Shepard interpolation method, 192, 194

Shepard point, 193

Silicon Graphics Inc. (SGI), 426, 433

Similar objects, 301

Similarity searching, 277, 419, 429

SimpleSVM toolbox, 391

Simplex, 239

Single decision tree, 373

Single reference methods, 98, 101

Single-chain distribution potentials, 238

Single-chain Monte Carlo simulations, 238

Slack variable, 318, 335, 340

Slater determinants, 5, 47, 97

Slater-Condon rules, 5

Small-curvature tunneling (SCT), 163, 169

SmartLab, 389

Smith, Kline and French, 408

SmithKline Beecham, 418

SMx universal solvent models, 204

Soft margin nonlinear SVM classification, 335

Soft margin SVMR, 340

Software vendors, 419, 421

Solubility, 283

Solute geometry, 204

Solute-solvent interactions, 204

Solvation effects, 279

Solvent, 251

Solvent effects, 218

Solvent molecules, 204, 279

Solvent reaction field, 204

Solvent rearrangement, 204

Solvent-accessible surface area (SASA), 205, 284

Solvent-free models, 256

Sparse SVM, 347

Specific reaction parameters (SRPs), 191

Spherical coordinates, 239

Spherical harmonic functions, 22

Spherical multipole expansion, 24

Spider, 390

Spin orbitals, 5

Spin-forbidden processes, 110

Spin-forbidden transitions, 106

Spin-orbit coupling, 106, 112

Spin-orbit coupling operator, 112

Spongiform encephalopathy, 379

Static mapping, 238

Stationary points, 126, 142

Statistical learning theory, 291, 292, 306

Statistical mechanics, 263

Steady-state approximation, 203

Steepest descent, 132

Stereo glasses, 418

Steroids dataset, 378

Stochastic gradient boosting (SGB) method, 373

Stochastic matching, 378

Stretch-bend partition function, 159

Structural descriptors, 299, 347, 352, 361, 369

Structural keys, 269

Structural risk minimization (SRM), 306

Structure factors, 240

Structure of polymers, 242

Structure-activity relationships (SARs), 292, 317, 407

Structure-based drug design (SBDD), 422

Structure-odor relationships, 361, 362

Structures, 85

Sturges rule, 271

Sub-linear scaling, 39, 40

Substructure keys, 431

Substructure searching, 419

Super-atoms, 237, 244, 250

Supercomputers, 424

Superminicomputer, 415

Supervised learning, 291

Support vector machines (SVMs), 291, 302, 348, 351, 372, 375, 378, 379

Support vectors, 293

Support vectors selection, 348

Surface-hopping models, 105

SVM classification, 292

SVM hyperplane, 294

SVM regression, 292

SVM regression (SVMR), 340, 362, 367, 369

SVM regression models, 362

SVM resources on the web, 385

SVM/LOO, 390

SvmFu, 391

SVMsequel, 390

SVMstruct, 387

SVMTorch, 388

SwissProt, 384

SYBYL, 417, 419, 427

Symmetry, 84, 90, 110

Symmetry-allowed conical intersections, 90

Symmetry-required conical intersections, 90

System/environment separation, 207

Tablet production methods, 380

Tanimoto similarity, 377

Taylor expansion, 54, 55, 91, 154

Taylor series, 193, 195

Tektronix, 418

Temperature-dependent transmission coefficients, 131

Tensor, 43

Tensor notation, 43

Teratogenicity, 421

Test set, 283, 284

Tests for outliers, 268

Text mining, 291, 384

Text recognition systems, 385

Theoretical chemistry, 414

Theoretical chemists, 404, 407

Thermal annealing, 384

Thermodynamics, 129

Thermoplastic polymers, 384

Three-state conical intersections, 110, 111

Thrombin inhibitors, 375

Tight transition state, 157, 206

Tight-binding (TB) calculations, 50

Tilted cones, 93

Time reversal operator, 113

Time reversal symmetry, 113

Time-dependent DFT (TDDFT), 101

Time-dependent Schrödinger equation, 105

Time-independent Schrödinger equation, 4, 85

TinySVM, 389

TOPKAT, 420

Topography, 93

Topological indices, 264, 301

Torch, 387

Torsade de pointes (TdP), 373

Torsion, 159

Torsion partition function, 159

Toxicity, 352, 359, 363, 421

Toxicity evaluations, 366

Toxicity of aromatic compounds, 366

Toxicological endpoints, 421

Training set, 283, 284, 295, 302, 317

Trajectory-Surface-Hopping (TSH), 105

Trans-1,4-polyisoprene, 249

Transition state, 85, 128

Transition state dividing surfaces, 126

Transition state ensemble, 208

Transition state partition function, 134

Transition state theory (TST), 126, 128

Transmission coefficient, 130, 131, 139, 167, 168, 186

Tree Kernels, 390

TrEMBL, 384

Trimethoprim, 422

Tripos Associates, 417, 419, 427

Tunneling, 127, 128, 131, 139, 163

Tunneling amplitude, 180, 183

Tunneling effects, 214, 221

Tunneling energies, 163, 187

Tunneling paths, 163, 169, 172, 176, 183

Tunneling probabilities, 163, 169, 172

Tunneling swath, 129

Tunneling transmission coefficient, 211

Turning point, 173, 182

Two-electron integral screening, 8

Two-electron integrals, 6, 8, 24, 37

Ultra-fast experimental techniques, 84

Ultra-short excited state lifetimes, 107

Umbrella sampling, 208

Uncertainty principle, 130

Unified statistical (US) model, 137

Unimolecular reaction, 130, 148, 167, 189

United atom (UA) models, 236

Unphysical moves, 250

Unphysically high pressure, 245

Unscaled coordinates, 133

Upjohn, 407, 438

Uracil, 94, 111

Urine profiling, 380

Valence coordinates, 152

Valence force coordinates, 152

Valence-Rydberg methods, 101

Vapnik-Chervonenkis dimension, 292, 306

Variational configuration interaction, 97

Variational dividing surfaces, 145

Variational principle, 99

Variational reaction path (VRP) algorithm, 145

Variational transition state theory (VTST), 125

Vertical cones, 93

Vertical conical intersections, 93

Vertical excitation energies, 102, 108

Very fast multipole methods (vFMM), 35

Vibrational excited states, 172

Vibrational frequencies, 60, 142, 157

Vibrational modes, 134

Vibrational partition functions, 131, 149, 150, 159

Vibrational spectra, 111

Virtual orbitals, 97

Virtual screening, 430

Vision, 83, 106

Water, 95, 254

Water molecules, 247

Wave vector space, 241

Wavepackets, 104

Weighted SVM, 338, 383

Weka, 387, 388

Well-separatedness (WS) criterion, 29, 33

Wide margin classifiers, 306

Wilson B matrix, 155

Wilson C matrix, 155

Wilson G matrix, 156

Wilson GF matrix method, 156

Word processing, 415

Word processors, 409

Workstations, 426

World Drug Index (WDI), 372

X-PLOR, 423

YaLE, 387, 388

Zero-curvature tunneling (ZCT), 164, 169

ZINC compound database, 276

Zwitterionic head groups, 255
