
Combinatorial Chemistry & High Throughput Screening, 2006, 9, 213-228

1386-2073/06 $50.00+.00 © 2006 Bentham Science Publishers Ltd.

Computational Methods in Developing Quantitative Structure-Activity Relationships (QSAR): A Review

Arkadiusz Z. Dudek*,a, Tomasz Arodz,b and Jorge Gálvez,c

    aUniversity of Minnesota Medical School, Minneapolis, MN 55455, USA

bInstitute of Computer Science, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland

    cUnit of Drug Design and Molecular Connectivity Research, University of Valencia, 46100 Burjassot, Valencia, Spain

Abstract: Virtual filtering and screening of combinatorial libraries have recently gained attention as methods complementing high-throughput screening and combinatorial chemistry. These chemoinformatic techniques rely heavily on quantitative structure-activity relationship (QSAR) analysis, a field with established methodology and a successful history. In this review, we discuss the computational methods for building QSAR models. We start by outlining their usefulness in high-throughput screening and identifying the general scheme of a QSAR model. We then focus on the methodologies for constructing the three main components of a QSAR model, namely the methods for describing the molecular structure of compounds, for selecting informative descriptors, and for predicting activity. We present both well-established methods and techniques recently introduced into the QSAR domain.

    Keywords: QSAR, molecular descriptors, feature selection, machine learning.

    1. INTRODUCTION

High throughput screening (HTS) has been a major recent technological improvement in the drug discovery pipeline. In conjunction with combinatorial chemistry, it allows for the synthesis and rapid activity assessment of vast numbers of small-molecule compounds [1, 2]. As experience with these technologies matured, the focus has shifted from sifting through large, diverse molecule collections to more rationally designed libraries [3].

With this need for knowledge-guided screening of compounds, virtual filtering and screening have been recognized as techniques complementary to high-throughput screening [4, 5]. To a large extent, these techniques rely on quantitative structure-activity relationship (QSAR) analysis, which has been in constant advancement since the works of Hansch [6] in the early 1960s. The QSAR methodology focuses on finding a model that correlates activity with structure within a family of compounds. Such models can be used to increase the effectiveness of HTS in several ways [7, 8].

QSAR studies can reduce the costly failures of drug candidates in clinical trials by filtering combinatorial libraries. Virtual filtering can eliminate compounds with predicted toxic or poor pharmacokinetic properties [9, 10] early in the pipeline. It also allows for narrowing the library to drug-like or lead-like compounds [11] and eliminating the frequent hitters, i.e., compounds that show unspecific activity in several assays and rarely result in leads [12]. Including such considerations at an early stage results in multidimensional optimization, with high activity as an essential but not the only goal [8].

*Address correspondence to this author at the University of Minnesota, Division of Hematology, Oncology and Transplantation, 420 Delaware St. SE, MMC 480, Minneapolis, MN 55455, USA; Tel: +1 612 624-0123; Fax: +1 612 625-6919; E-mail: [email protected]

Considering activity optimization, building target-specific structure-activity models based on identified hits can guide HTS by rapidly screening the library for the most promising candidates. Such focused screening can reduce the number of experiments and allow for the use of more complex low-throughput assays [7]. Interpretation of the created models gives insight into the chemical space in the proximity of the hit compound. Feedback loops of high-throughput and virtual screening, resulting in a sequential screening approach [13], therefore allow for more rational progress towards high-quality lead compounds. Later in the drug discovery pipeline, accurate QSAR models constructed on the basis of the lead series can assist in optimizing the lead [14].

The importance and difficulty of the above-described tasks facing QSAR models have inspired many chemoinformatics researchers to borrow from recent developments in various fields, including pattern recognition, molecular modeling, machine learning and artificial intelligence. This has resulted in a large family of conceptually different methods being used for creating QSARs. The purpose of this review is to guide the reader through the diversity of the techniques and algorithms for developing successful QSAR models.

    1.1. General Scheme of a QSAR Study

The chemoinformatic methods used in building QSAR models can be divided into three groups, i.e., extracting descriptors from the molecular structure, choosing those informative in the context of the analyzed activity, and finally, using the values of the descriptors as independent variables to define a mapping that correlates them with the activity in question. The typical QSAR system realizes these phases, as depicted in Fig. 1.
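The three phases above can be sketched as a small data-flow in code. This is only an illustrative toy, not any specific QSAR package: the representation of a molecule, the descriptor choices, and the weights are all hypothetical.

```python
# Minimal sketch of the three QSAR phases, on a hypothetical toy
# representation (atom list plus bond list); all names are illustrative.

def compute_descriptors(molecule):
    """Phase 1: encode a structure as a fixed-length numeric vector."""
    atoms, bonds = molecule["atoms"], molecule["bonds"]
    return [
        len(atoms),                            # total atom count
        sum(1 for a in atoms if a == "C"),     # carbon count
        len(bonds),                            # bond count
    ]

def select_descriptors(rows, keep):
    """Phase 2: prune the descriptor set to the informative columns."""
    return [[row[j] for j in keep] for row in rows]

def predict_activity(row, weights, bias):
    """Phase 3: a fitted mapping from descriptor values to activity."""
    return bias + sum(w * x for w, x in zip(weights, row))

ethanol = {"atoms": ["C", "C", "O"], "bonds": [(0, 1), (1, 2)]}
X = select_descriptors([compute_descriptors(ethanol)], keep=[0, 2])
activity = predict_activity(X[0], weights=[0.5, -0.2], bias=1.0)
```

In a real system each phase is far richer, but the interfaces — structure in, vector out; vector set in, pruned set out; pruned vector in, predicted activity out — match the scheme of Fig. 1.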

    Generation of Molecular Descriptors from Structure

The small-molecule compounds are defined by their structure, encoded as a set of atoms and the covalent bonds between them. However, the structure cannot be used directly for creating structure-activity mappings, for reasons stemming from chemistry and computer science.

First, the chemical structures do not usually contain in an explicit form the information that relates to activity. This information has to be extracted from the structure. Various rationally designed molecular descriptors accentuate different chemical properties implicit in the structure of the molecule, so that these properties may correlate more directly with the activity. Such properties range from physicochemical and quantum-chemical to geometrical and topological features.

The second, more technical reason, which guides the use and development of molecular descriptors, stems from the paradigm of feature space prevailing in statistical data analysis. Most methods employed to predict activity require as input numerical vectors of features of uniform length for all molecules. Chemical structures of compounds are diverse in size and nature and as such do not fit into this model directly. To circumvent this obstacle, molecular descriptors convert the structure into well-defined sets of numerical values.

    Selection of Relevant Molecular Descriptors

Many applications are capable of generating hundreds or thousands of different molecular descriptors. Typically, only some of them are significantly correlated with the activity.

Furthermore, many of the descriptors are intercorrelated. This has negative effects on several aspects of QSAR analysis. Some statistical methods require that the number of compounds be significantly greater than the number of descriptors; using large descriptor sets would therefore require large datasets. Other methods, while capable of handling datasets with large descriptor-to-compound ratios, nonetheless suffer from loss of accuracy, especially for compounds unseen during the preparation of the model. A large number of descriptors also affects the interpretability of the final model. To tackle these problems, a wide range of methods for automated narrowing of the set of descriptors to the most informative ones is used in QSAR analysis.

    Mapping the Descriptors to Activity

Once the relevant molecular descriptors are computed and selected, the final task of creating a function between their values and the analyzed activity can be carried out. The value quantifying the activity is expressed as a function of the values of the descriptors. The most accurate mapping function from some wide family of functions is usually fitted based on the information available in the training set, i.e., compounds for which the activity is known. A wide range of mapping function families can be used, including linear and non-linear ones, and many methods for carrying out the training to obtain the optimal function can be employed.
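The simplest instance of such a fitted mapping is a linear function of a single descriptor, trained by ordinary least squares. The sketch below uses made-up training data; it only illustrates the fitting step, not any particular QSAR model.

```python
# Sketch of fitting a linear mapping on a training set: one descriptor x,
# known activity y, ordinary least squares. All data values are made up.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error on the training set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Training set: descriptor values and known activities (illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]        # roughly y = 2x
b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 5.0        # activity estimate for an unseen compound
```

Non-linear families (section 4) replace `fit_line` with more expressive models, but the training-set logic is the same.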

    1.2. Organization of the Review

In the following sections of this review, we discuss each of the three groups of methods in more detail. We start with various types of molecular descriptors in section 2. Next, in section 3 we focus on automatic methods for choosing the most predictive descriptors from a large initial set. Then, in section 4 we describe the linear and non-linear mapping techniques used for expressing the activity or property as a function of the values of the selected descriptors. For each technique, we provide an overview of the method followed by examples of its applications in QSAR analysis.

    2. MOLECULAR DESCRIPTORS

Molecular descriptors map the structure of the compound into a set of numerical or binary values representing various molecular properties that are deemed to be important for explaining activity. Two broad families of descriptors can be distinguished, based on whether they depend on information about the 3D orientation and conformation of the molecule.

    2.1. 2D QSAR Descriptors

The broad family of descriptors used in the 2D-QSAR approach share the common property of being independent of the 3D orientation of the compound. These descriptors range from simple measures of the entities constituting the molecule, through its topological and geometrical properties, to computed electrostatic and quantum-chemical descriptors and advanced fragment-counting methods.

Fig. (1). Main stages of a QSAR study. The molecular structure is encoded using numerical descriptors. The set of descriptors is pruned to select the most informative ones. The activity is derived as a function of the selected descriptors.

    Constitutional Descriptors

Constitutional descriptors capture properties of the molecule that are related to the elements constituting its structure. These descriptors are fast and easy to compute. Examples include molecular weight, the total number of atoms in the molecule and the numbers of atoms of different identity. A number of bond-related properties are also used, including the total numbers of single, double, triple or aromatic bonds, as well as the number of aromatic rings.

    Electrostatic and Quantum-Chemical Descriptors

Electrostatic descriptors capture information on the electronic nature of the molecule. These include descriptors containing information on atomic net and partial charges [15]. Descriptors for the highest negative and positive charges are also informative, as is molecular polarizability [16]. Partially negatively or positively charged solvent-accessible atomic surface areas have also been used as informative electrostatic descriptors for modeling intermolecular hydrogen bonding [17]. Energies of the highest occupied and lowest unoccupied molecular orbitals form useful quantum-chemical descriptors [18], as do derivative quantities such as absolute hardness [19].

    Topological Descriptors

Topological descriptors treat the structure of the compound as a graph, with atoms as vertices and covalent bonds as edges. Based on this approach, many indices quantifying molecular connectivity were defined, starting with the Wiener index [20], which counts the total number of bonds in the shortest paths between all pairs of non-hydrogen atoms. Other topological descriptors include the Randić indices χ [21], defined as sums of geometric averages of the edge degrees of atoms within paths of given lengths, Balaban's J index [22] and the Schultz index [23].
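The Wiener index definition above translates directly into code: breadth-first search gives the bond distance between every pair of heavy atoms, and the index is the sum over all pairs. The adjacency-list encoding of the molecule is ours, chosen for simplicity.

```python
# Sketch of the Wiener index: the sum of topological distances (bond counts
# on shortest paths) over all pairs of non-hydrogen atoms. The molecule is
# given as an adjacency list of heavy atoms.

from collections import deque

def distances_from(adj, start):
    """Breadth-first search giving bond distances from one atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def wiener_index(adj):
    total = 0
    for u in adj:
        total += sum(distances_from(adj, u).values())
    return total // 2          # each unordered pair was counted twice

# n-butane, a chain C0-C1-C2-C3:
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

For n-butane the pairwise distances are 1, 2, 3, 1, 2, 1, so the index is 10.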

Information about valence electrons can be included in topological descriptors, e.g. the Kier and Hall valence indices χv [24] or the Gálvez topological charge indices [25]. The first use geometric averages of valence connectivities along paths. The latter measure topological valences of atoms and the net charge transfer between pairs of atoms separated by a given number of bonds. To exemplify the derivation of topological indices, in Fig. 2 we show the calculation of the Gálvez indices for an atom distance of a single bond.
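The Fig. 2 calculation can be reproduced numerically. The sketch below follows the definitions given in the figure caption (M as the product of the adjacency matrix and the inverse-square distance matrix, CTij = Mij − Mji, G1 summed over bonded pairs, J1 = G1 divided by the bond count); the atom numbering of isopentane and the pure-Python matrix code are ours, and the resulting values are our own computation from those formulas rather than figures quoted from the paper.

```python
# Sketch of the Gálvez first-order topological charge indices G1 and J1 for
# isopentane (2-methylbutane), following the definitions in the Fig. 2 caption.

from collections import deque

# Isopentane heavy-atom skeleton: C0-C1(-C2)-C3-C4
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
n = len(adj)

# Topological distance matrix via breadth-first search.
d = [[0] * n for _ in range(n)]
for s in adj:
    seen, q = {s: 0}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                q.append(v)
    for t, dist in seen.items():
        d[s][t] = dist

# Q: inverse squares of inter-atom distances (zero on the diagonal).
Q = [[0.0 if i == j else 1.0 / d[i][j] ** 2 for j in range(n)] for i in range(n)]
A = [[1.0 if j in adj[i] else 0.0 for j in range(n)] for i in range(n)]

# M = A · Q; charge terms CT_ij = M_ij - M_ji.
M = [[sum(A[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]

# G1: sum of |CT_ij| over pairs separated by one bond; J1 = G1 / bonds.
bonds = sum(len(v) for v in adj.values()) // 2
G1 = sum(abs(M[i][j] - M[j][i])
         for i in range(n) for j in range(i + 1, n) if d[i][j] == 1)
J1 = G1 / bonds
```

With this labelling the four bonded pairs contribute 0.5, 0.5, 0.25 and 0.25, giving G1 = 1.5 and, with four bonds, J1 = 0.375.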

Descriptors combining connectivity information with other properties are also available, e.g. the BCUT descriptors [26-28], which take the form of eigenvalues of the atom connectivity matrix with atom charge, polarizability or H-bond potential values on the diagonal and additional terms off-diagonal. Similarly, the Topological Sub-Structural Molecular Design (TOSS-MODE/TOPS-MODE) descriptors [29, 30] rely on spectral moments of the bond adjacency matrix amended with information on, e.g., bond polarizability. The atom-type electrotopological state (E-state) indices [31, 32] use electronic and topological organization to define the intrinsic atom state and the perturbations of this state induced by other atoms. This information is gathered individually for a wide range of atom types to form a set of indices.

Fig. (2). Example of the Gálvez first-order topological charge indices G1 and J1 for isopentane. The matrix product M of the atom adjacency matrix and the topological distance matrix, defined as inverse squares of inter-atom distances, is used to define the charge terms CTij = Mij - Mji. The Gk indices are defined as the algebraic sum of the absolute values of the charge terms for pairs of atoms separated by k bonds. The Jk indices result from normalizing Gk by the number of bonds in the molecule.

    Geometrical Descriptors

Geometrical descriptors rely on the spatial arrangement of the atoms constituting the molecule. These descriptors include information on the molecular surface obtained from atomic van der Waals areas and their overlap [33]. Molecular volume may be obtained from atomic van der Waals volumes [34]. Principal moments of inertia and gravitational indices [35] also capture information on the spatial arrangement of atoms in the molecule. Shadow areas, obtained by projecting the molecule onto its two principal axes, are also used [36]. Another geometrical descriptor is the total solvent-accessible surface area [37, 38].

    Fragment-Based Descriptors and Molecular Fingerprints

The family of descriptors relying on substructural motifs is often used, especially for rapid screening of very large databases. The BCI fingerprints [39] are derived as bits describing the presence or absence in the molecule of certain fragments, including atoms with their nearest neighborhoods, atom pairs and sequences, or ring-based fragments. A similar approach is present in the basic set of 166 MDL Keys [40]. However, other variants of the MDL Keys are also available, including extended or compact sets of keys. The latter result from dedicated pruning strategies [41] or elimination methods, e.g. the Fast Random Elimination of Descriptors/Substructure Keys (FRED/SKEYS) [42]. The recently introduced Hologram QSAR (HQSAR) [43] approach is based on counting the numbers of occurrences of certain substructural paths of functional groups. For each group, a cyclic redundancy code is calculated, which serves as a hashing function [44] for partitioning the substructural motifs into the bins of a hash table. The numbers of elements in the bins form a hologram.

The Daylight fingerprints [45] are a natural extension of the fragment-based descriptors, eliminating the reliance on a pre-defined list of substructure motifs. The fingerprint for each molecule is a string of bits. However, a structural motif in the molecule does not correspond to a single bit but leads, through a hashing function, to a pattern of bits that is added to the fingerprint with a logical "or" operation. The bits in different patterns may overlap, due to the large number of possible patterns and the finite length of the bit string. Thus, the fact that a bit or several bits are set in the fingerprint cannot be interpreted as proof of a pattern's presence. However, if one of the bits corresponding to a given pattern is not set, this guarantees that the pattern is not present in the molecule. This allows for rapid filtering of molecules that do not possess certain structural motifs. The patterns are generated individually for each molecule, and describe atoms with their neighborhoods and paths of up to 7 bonds. Approaches other than hashed fingerprints have also been proposed to circumvent the problem of a pre-defined substructure library, e.g. an algorithm for optimal discovery of frequent structural fragments relevant to a given activity [46].
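The hashed-fingerprint idea — each pattern mapped to several bits, OR-ed into a fixed-length string, with unset bits proving absence — can be sketched in a few lines. This is a generic illustration, not the Daylight algorithm itself: the bit width, the number of bits per pattern and the use of SHA-256 as the hash are our simplifications.

```python
# Generic sketch of a hashed structural fingerprint: each pattern hashes to
# a few bit positions that are OR-ed into a fixed-length bit set. An unset
# bit proves absence of a pattern; set bits may collide across patterns.

import hashlib

N_BITS = 64              # illustrative; real fingerprints are much longer
BITS_PER_PATTERN = 3

def pattern_bits(pattern):
    """Derive a reproducible set of bit positions from a pattern string."""
    digest = hashlib.sha256(pattern.encode()).digest()
    return {digest[i] % N_BITS for i in range(BITS_PER_PATTERN)}

def fingerprint(patterns):
    fp = set()
    for p in patterns:
        fp |= pattern_bits(p)        # logical "or" of the pattern's bits
    return fp

def may_contain(fp, pattern):
    """True unless some bit of the pattern is unset (definite absence)."""
    return pattern_bits(pattern) <= fp

mol = fingerprint(["C-C", "C-O", "C-C-O"])
```

Note the asymmetry described in the text: `may_contain` returning False is a guarantee of absence, while True only means the molecule is not ruled out.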

    2.2. 3D QSAR Descriptors

The 3D-QSAR methodology is much more computationally complex than the 2D-QSAR approach. In general, it involves several steps to obtain numerical descriptors of the compound structure. First, the conformation of the compound has to be determined, either from experimental data or by molecular mechanics, and then refined by minimizing the energy [47, 48]. Next, the conformers in the dataset have to be uniformly aligned in space. Finally, the space with the immersed conformer is probed computationally for various descriptors. Some methods independent of compound alignment have also been developed.

    2.2.1. Alignment-Dependent 3D QSAR Descriptors

The group of methods that require molecule alignment prior to the calculation of descriptors is strongly dependent on information about the receptor for the modeled ligand. In cases where such data are available, the alignment can be guided by studying the receptor-ligand complexes. Otherwise, purely computational methods for superimposing the structures in space have to be used [49, 50]. These methods rely, e.g., on atom-atom or substructure-substructure mapping.

    Comparative Molecular Field Analysis

The Comparative Molecular Field Analysis (CoMFA) [51] uses electrostatic (Coulombic) and steric (van der Waals) energy fields defined by the inspected compound. The aligned molecule is placed in a 3D grid. At each point of the grid lattice, a probe atom with unit charge is placed and the potentials (Coulomb and Lennard-Jones) of the energy fields are computed. These then serve as descriptors in further analysis, typically using partial least squares regression. This analysis allows for identifying structure regions positively and negatively related to the activity in question.

    Comparative Molecular Similarity Indices Analysis

The Comparative Molecular Similarity Indices Analysis (CoMSIA) [52] is similar to CoMFA in probing with an atom throughout the regular grid lattice in which the molecules are immersed. The similarity between the probe atom and the analyzed molecule is calculated. Compared to CoMFA, CoMSIA uses a different potential function, namely a Gaussian-type function. Steric, electrostatic and hydrophobic properties are calculated, hence the probe atom has unit hydrophobicity as an additional property. The use of a Gaussian-type potential function instead of the Lennard-Jones and Coulombic functions allows for accurate information at grid points located within the molecule. In CoMFA, unacceptably large values are obtained at these points due to the nature of the potential functions and the arbitrary cut-offs that have to be applied.


    2.2.2. Alignment-Independent 3D QSAR Descriptors

Another group of 3D descriptors are those invariant to molecule rotation and translation in space. Thus, no superposition of compounds is required.

    Comparative Molecular Moment Analysis

The Comparative Molecular Moment Analysis (CoMMA) [53] uses second-order moments of the mass and charge distributions. The moments relate to the center of mass and the center of dipole. The CoMMA descriptors include the principal moments of inertia, and the magnitudes of the dipole moment and of the principal quadrupole moment. Furthermore, descriptors relating the charge distribution to the mass distribution are defined, i.e., the magnitudes of the projections of the dipole upon the principal moments of inertia, and the displacement between the center of mass and the center of dipole.

    Weighted Holistic Invariant Molecular Descriptors

The Weighted Holistic Invariant Molecular (WHIM) [54, 55] and Molecular Surface WHIM [56] descriptors provide invariant information by employing principal component analysis (PCA) on the centered co-ordinates of the atoms constituting the molecule. This transforms the molecule into the space that captures the most variance. In this space, several statistics are calculated and serve as directional descriptors, including variance, proportions, symmetry and kurtosis. By combining the directional descriptors, non-directional descriptors are also defined. The contribution of each atom can be weighted by a chemical property, leading to different principal components capturing variance within the given property. The atoms can be weighted by mass, van der Waals volume, atomic electronegativity, atomic polarizability, the electrotopological index of Kier and Hall, and the molecular electrostatic potential.

    VolSurf

The VolSurf [57, 58] approach is based on probing the grid around the molecule with specific probes, e.g. for hydrophobic interactions or hydrogen bond acceptor or donor groups. The resulting lattice boxes are used to compute descriptors relying on the volumes or surfaces of 3D contours defined by the same value of the probe-molecule interaction energy. By using various probes and cut-off values for the energy, different molecular properties can be quantified. These include, e.g., molecular volume and surface, and hydrophobic and hydrophilic regions. Derivative quantities can also be computed, e.g. molecular globularity, or factors relating the surface of the hydrophobic or hydrophilic regions to the surface of the whole molecule. In addition, various geometry-based descriptors are also available, including energy minima distances and amphiphilic moments.

    Grid-Independent Descriptors

The Grid-Independent Descriptors (GRIND) [59] have been devised to overcome the problems with interpretability common in alignment-independent descriptors. Similarly to VolSurf, the method probes the grid with specific probes. The regions showing the most favorable interaction energies are selected, provided that the distances between the regions are large. Next, the probe-based energies are encoded in a way independent of the molecule's arrangement. To this end, the distances between the nodes in the grid are discretized into a set of bins. For each distance bin, the nodes with the highest product of energies are stored and the value of the product serves as the numerical descriptor. In addition, the stored information on the positions of the nodes can be used to track down the exact regions of the molecule relating to the given property. To extend the molecular information captured by the descriptors, the product of node energies may include not only energies relating to the same probe, but also energies from two different probe types.

    2.3. The 2D- Versus 3D-QSAR Approach

It is generally assumed that 3D approaches are superior to 2D ones in drug design. Yet, studies show that such an assumption may not always hold. For example, the results of conventional CoMFA may often be non-reproducible due to the dependence of the quality of the outputs on the orientation of the rigidly aligned molecules on the user's terminal [60, 61]. Such alignment problems are typical in 3D approaches and, even though some solutions have been proposed, the unambiguous 3D alignment of structurally diverse molecules still remains a difficult task.

Moreover, the distinction between the 2D- and 3D-QSAR approaches is not a crisp one, especially when alignment-independent descriptors are considered. This can be observed when comparing the BCUT with the WHIM descriptors. Both employ a similar algebraic method, i.e., solving an eigenproblem for a matrix describing the compound: the connectivity matrix in the case of BCUT descriptors, and the covariance matrix of the 3D co-ordinates in the case of WHIM.

There is also a deeper connection between 3D-QSAR and one of the 2D methods, the topological approach. It stems from the fact that the geometry of a compound in many cases depends on its topology. An elegant example was provided by Estrada et al., who demonstrated that the dihedral angle of biphenyl as a function of the substituents attached to it can be predicted by topological indices [62]. Along the same line, a supposedly typically 3D property, chirality, has been predicted using chiral topological indices [63], constructed by introducing an adequate weight into the topological matrix for the chiral carbons.

3. AUTOMATIC SELECTION OF RELEVANT MOLECULAR DESCRIPTORS

Automatic methods for selecting the best descriptors, or features, to be used in the construction of the QSAR model fall into two categories [64]. In the wrapper approach, the quality of descriptor subsets is obtained by constructing and evaluating a series of QSAR models. In filtering, no model is built, and features are evaluated using some other criteria.

    3.1. Filtering Methods

These techniques are applied independently of the mapping method used. They are executed prior to the mapping, to reduce the number of descriptors following some objective criteria, e.g. inter-descriptor correlation.

    Correlation-Based Methods

Pearson's correlation coefficients may serve as a preliminary filter for discarding intercorrelated descriptors. This can be done, e.g., by creating clusters of descriptors having correlation coefficients higher than a certain threshold and retaining only one, randomly chosen member of each cluster [65]. Another procedure involves estimating the correlation between a pair of descriptors and, if it exceeds a threshold, randomly discarding one of the two [66]. The choice of the order in which pairs are evaluated may lead to significantly different results. One popular method is to first rank the descriptors using some criterion, and then iteratively browse the set starting from pairs containing the highest-ranking features.

One such ranking may be the correlation ranking, based on the correlation coefficient between activity and descriptors. However, correlation ranking is usually used in conjunction with principal component analysis [67, 68]. Methods using measures of correlation between activity and descriptors other than Pearson's have also been used, notably the pair-correlation method [69-71].
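The pairwise-discarding procedure above can be sketched directly. The greedy keep-or-drop rule and the threshold value are illustrative choices; the cited studies differ in how they order and break ties.

```python
# Sketch of correlation-based filtering: compute Pearson correlations
# between descriptor columns and, when a pair exceeds a threshold,
# discard one member of the pair.

from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def filter_intercorrelated(columns, threshold=0.95):
    """Greedily keep a column unless it correlates too highly with a kept one."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

cols = [
    [1.0, 2.0, 3.0, 4.0],      # descriptor 0
    [2.0, 4.1, 6.0, 8.2],      # nearly 2x descriptor 0: highly correlated
    [5.0, 1.0, 4.0, 2.0],      # unrelated descriptor
]
kept = filter_intercorrelated(cols)
```

Here descriptor 1 is discarded because it duplicates descriptor 0, while the unrelated descriptor 2 survives.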

    Methods Based on Information Theory

The information content of a descriptor is defined in terms of the entropy of the descriptor treated as a random variable. Based on this notion, various measures relating the information shared between two descriptors, or between a descriptor and the activity, can be defined. An example of such a measure, used in descriptor selection for QSAR, is mutual information.

Mutual information, sometimes referred to as information gain, quantifies the reduction of uncertainty, or information content, of the activity variable gained by knowing the descriptor values. It is used in QSAR to rank the descriptors [72, 73].
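For categorical variables the definition I(D; A) = H(A) − H(A | D) can be computed by simple counting, as in this sketch (the descriptor and activity data are made up):

```python
# Sketch of mutual-information (information-gain) ranking for categorical
# descriptors versus a categorical activity label; data values are made up.

from math import log2
from collections import Counter

def entropy(values):
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_information(descriptor, activity):
    """I(D; A) = H(A) - H(A | D): uncertainty about activity removed by D."""
    h_a = entropy(activity)
    n = len(activity)
    h_a_given_d = 0.0
    for value in set(descriptor):
        subset = [a for d, a in zip(descriptor, activity) if d == value]
        h_a_given_d += len(subset) / n * entropy(subset)
    return h_a - h_a_given_d

activity = ["active", "active", "inactive", "inactive"]
d_good = [1, 1, 0, 0]    # perfectly predicts the activity
d_bad  = [1, 0, 1, 0]    # carries no information about it
```

A perfectly predictive binary descriptor attains the full entropy of the activity (1 bit here), while an uninformative one scores zero — exactly the ordering a ranking filter needs.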

The application of information-theoretic criteria is straightforward when both the descriptors and the activity values are categorical. In the case of continuous numerical variables, some discretization scheme has to be applied to approximate the variables. Thus, such criteria are usually used with binary descriptors, such as the ones describing 3D properties of molecules used in the thrombin dataset of the Knowledge Discovery in Databases 2001 Cup.

    Statistical Criteria

Fisher's ratio, i.e., the ratio of the between-class variance to the within-class variance, can be used to rank the descriptors [74]. Next, the correlation between pairs of features is used, as described before, to reduce the set of descriptors.
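A minimal sketch of the Fisher-ratio ranking criterion, assuming two activity classes and made-up descriptor values:

```python
# Sketch of ranking a descriptor by Fisher's ratio: between-class variance
# of the class means over the pooled within-class variance.

def fisher_ratio(values, labels):
    groups = {}
    for v, l in zip(values, labels):
        groups.setdefault(l, []).append(v)
    means = {l: sum(vs) / len(vs) for l, vs in groups.items()}
    grand = sum(values) / len(values)
    between = sum(len(vs) * (means[l] - grand) ** 2
                  for l, vs in groups.items())
    within = sum((v - means[l]) ** 2
                 for l, vs in groups.items() for v in vs)
    return between / within

labels = [0, 0, 1, 1]
well_separated = [1.0, 1.1, 5.0, 5.1]   # classes far apart: high ratio
overlapping    = [1.0, 5.0, 1.1, 5.1]   # classes intermixed: low ratio
```

A descriptor whose values cleanly separate the classes gets a ratio orders of magnitude higher than one whose class distributions overlap.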

Another method for assessing the quality of a descriptor is based on the Kolmogorov-Smirnov statistic [75]. As applied to descriptor selection in QSAR [76], it is a fast method that does not rely on knowledge of the underlying distribution and does not require the conversion of descriptor variables into categorical values. For two classes of activity to be predicted, the method measures the maximal absolute distance between the cumulative distribution functions of the descriptor for the individual activity classes.
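The two-sample statistic can be computed directly from the empirical CDFs, as in this sketch with invented descriptor values for two activity classes:

```python
# Sketch of the Kolmogorov-Smirnov criterion: the maximal absolute distance
# between the empirical CDFs of a descriptor in the two activity classes.

def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

active   = [0.1, 0.2, 0.3, 0.4]   # descriptor values in class "active"
inactive = [0.8, 0.9, 1.0, 1.1]   # fully separated classes: KS = 1.0
mixed    = [0.1, 0.9, 0.3, 1.1]   # overlapping values: smaller KS
```

Fully separated class distributions give the maximal statistic of 1.0; the more the distributions overlap, the lower the score, which is what makes it usable as a descriptor ranking.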

    3.2. Wrapper Methods

These techniques operate in conjunction with a mapping algorithm [77]. The choice of the best subset of descriptors is guided by the error of the mapping algorithm for a given subset, measured e.g. with cross-validation. A schematic illustration of wrapper methods is given in Fig. 3.

Fig. (3). Generic scheme for wrapper descriptor selection methods. Iteratively, various configurations of selected and discarded descriptors are evaluated by creating a descriptors-to-activity mapping and assessing its prediction accuracy. The final descriptors are those yielding the highest accuracy for a given family of mapping functions.

    Genetic Algorithm

Genetic Algorithms (GA) [78] are efficient methods for function minimization. In the descriptor selection context, the prediction error of the model built upon a set of features is optimized [79, 80]. The genetic algorithm mimics natural evolution by modeling a dynamic population of solutions. The members of the population, referred to as chromosomes, encode the selected features. The encoding usually takes the form of bit strings, with bits corresponding to selected features set and the others cleared. Each chromosome leads to a model built using the encoded features. Using the training data, the error of the model is quantified and serves as a fitness function. During the course of evolution, the chromosomes are subjected to crossover and mutation. By allowing survival and reproduction of the fittest chromosomes, the algorithm effectively minimizes the error function in subsequent generations.

The success of GA depends on several factors. The parameters steering the crossover, mutation and survival of chromosomes should be carefully chosen to allow the population to explore the solution space and to prevent early convergence to a homogeneous population occupying a local minimum. The choice of the initial population is also important in genetic feature selection. To address this issue, e.g. a method based on Shannon's entropy combined with graph analysis can be used [81].
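The bit-string chromosomes, crossover, mutation and survival-of-the-fittest loop described above can be sketched as follows. The "model error" here is a deliberate stand-in (Hamming distance to a hypothetical best subset); in a real wrapper it would be the cross-validated error of the fitted QSAR model, and all parameter values are illustrative.

```python
# Sketch of genetic-algorithm descriptor selection with bit-string
# chromosomes, one-point crossover, bit-flip mutation and elitist survival.

import random

def evolve(n_features, error, pop_size=20, generations=40, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=error)
        survivors = pop[: pop_size // 2]          # fittest half reproduces
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)    # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_features):           # mutation: flip bits
                if rng.random() < p_mut:
                    child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return min(pop, key=error)

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical informative subset

def toy_error(chromosome):
    """Stand-in for model error: mismatches with the best subset."""
    return sum(c != t for c, t in zip(chromosome, TARGET))

best = evolve(len(TARGET), toy_error)
```

Because the fittest half always survives, the best error never increases between generations, which is the elitism that drives convergence.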

  • 7/27/2019 SE 302 Compmethods Qsar

    7/18

    Computational Methods in Developing QSAR Combinatorial Chemistry & High Throughput Screening, 2006, Vol. 9, No. 3 219

Genetic Algorithms have been used in feature selection for QSAR with a range of mapping methods, e.g. Artificial Neural Networks [66, 82, 83], the k-Nearest Neighbor method [84] and Random Forest [66].

    Simulated Annealing

Simulated Annealing (SA) [85] is another stochastic method for function optimization employed in QSAR [66, 86, 87]. As in the evolutionary approach, the function minimized represents the error of the model built using the subset of descriptors. The SA algorithm operates iteratively, finding a new subset of descriptors by altering the current-best one, e.g. by exchanging some percentage of the features. Next, SA evaluates the prediction error of the new subset and chooses whether to adopt the new solution as the current optimal solution. This decision depends on whether the new solution leads to a lower error than the current one. If so, the new solution is used. In the other case, however, the solution is not automatically discarded. With a given probability, based on the Boltzmann distribution, the worse solution can replace the current, better one.

Replacing the solution with a worse one allows the SA method to escape from local minima of the error function, i.e., solutions that cannot be made better without traversing through less-fitted feature subsets. The power of the SA method stems from altering the temperature term in the Boltzmann distribution. At an early stage, when the solution is not yet highly optimized and most prone to encountering local minima, the temperature is high. During the course of the algorithm, the temperature is lowered and acceptance of worse solutions becomes less likely. Thus, even if the obtained minimum is not global, it is nonetheless usually of high quality.
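A minimal sketch of this procedure follows, with a hypothetical error function standing in for the prediction error of a model trained on the candidate subset:

```python
import math
import random

random.seed(1)

N = 12
GOOD = {0, 3, 5}   # hypothetical informative descriptors

def error(subset):
    """Toy model error for a set of selected descriptor indices."""
    return len(GOOD - subset) + 0.05 * len(subset - GOOD)

def neighbour(subset):
    """Alter the current subset by toggling one random descriptor."""
    i = random.randrange(N)
    return subset ^ {i}        # symmetric difference flips membership

def anneal(t0=2.0, cooling=0.95, steps=400):
    current = frozenset(random.sample(range(N), 4))
    t = t0
    for _ in range(steps):
        cand = neighbour(current)
        delta = error(cand) - error(current)
        # accept improvements always; worse moves with Boltzmann probability
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = cand
        t *= cooling           # lower the temperature over time
    return current

best = anneal()
```

Early on, the high temperature lets the search accept worse subsets and escape local minima; as the temperature decays, the walk settles into a good solution.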

    Sequential Feature Forward Selection

While genetic algorithms and simulated annealing rely on a guided random process of exploring the space of feature subsets, Forward Feature Selection [88] operates in a deterministic manner. It implements a greedy search through the feature subsets. As a first step, the single feature that leads to the best prediction is selected. Next, sequentially, each remaining feature is individually added to the current subset and the errors of the resulting models are quantified. The feature that best reduces the error is incorporated into the subset. Thus, in each step a single best feature is added, resulting in a sequence of nested subsets of features. The procedure stops when a specified number of features is selected. More elaborate stopping conditions have also been proposed, e.g. based on incorporating an artificial random feature [89]: when this feature is about to be selected as the one that best improves the quality of the model, the procedure is stopped. The drawback of forward selection is that if several features are collectively good predictors but each alone is a poor predictor, none of these features may be chosen. Sequential feature forward selection has been used in several QSAR studies [65, 81, 90, 91].
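The greedy loop can be illustrated as follows; `toy_error` is a hypothetical stand-in for the cross-validated error of a model built on the candidate subset:

```python
def forward_select(features, score, k):
    """Greedy forward selection: repeatedly add the feature whose
    inclusion yields the lowest model error, until k are chosen."""
    selected = []
    while len(selected) < k:
        best_f = min((f for f in features if f not in selected),
                     key=lambda f: score(selected + [f]))
        selected.append(best_f)
    return selected

# Toy error: pretend descriptors 'a' and 'c' jointly explain the activity.
def toy_error(subset):
    target = {'a', 'c'}
    return len(target - set(subset)) + 0.1 * len(set(subset) - target)

chosen = forward_select(['a', 'b', 'c', 'd'], toy_error, k=2)
```

Each iteration trains one model per remaining feature, so the cost grows roughly quadratically with the number of descriptors considered.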

    Sequential Backward Feature Elimination

The Backward Feature Elimination [88] is another example of a greedy, sequential method that yields nested subsets of features. In contrast to forward selection, the full set of features is used as a starting point. Next, in each step, all subsets of features resulting from the removal of a single feature are analyzed for the prediction error. The feature that leads to a model with the highest error is removed from the current subset. The procedure stops when the given number of features has been dropped.

Backward elimination is slower than forward selection, yet often leads to better results. Recently, a significantly faster variant of backward elimination, the Recursive Feature Elimination (RFE) [92] method, has been proposed for Support Vector Machines (SVM). In this method, the feature to be removed is chosen based on a single execution of the learning method using all features remaining in the given iteration. The SVM allows for ranking the features according to their contribution to the result. Thus, the least contributing feature can be dropped to form a new, narrowed subset of features. There is no need to train SVMs for each candidate subset, as in the original feature elimination method. Variants of the backward feature elimination method have been used in numerous QSAR studies [93-97].
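The recursive variant can be sketched as follows; `toy_fit` is a hypothetical stand-in for a single SVM training run that returns a weight magnitude per descriptor:

```python
def rfe(features, fit, n_keep):
    """Recursive Feature Elimination: repeatedly drop the descriptor with
    the smallest weight in a model fitted on the remaining ones."""
    remaining = list(features)
    while len(remaining) > n_keep:
        weights = fit(remaining)          # one model per iteration, not per subset
        worst = min(remaining, key=lambda f: abs(weights[f]))
        remaining.remove(worst)
    return remaining

# Hypothetical fit returning fixed importances, for illustration only.
def toy_fit(subset):
    importance = {'w': 0.9, 'x': 0.1, 'y': 0.6, 'z': 0.02}
    return {f: importance[f] for f in subset}

kept = rfe(['w', 'x', 'y', 'z'], toy_fit, n_keep=2)
```

Because only one model is trained per elimination round, RFE needs on the order of as many trainings as there are features, rather than one per candidate subset.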

    3.3. Hybrid Methods

In addition to the purely filter- or wrapper-based descriptor selection procedures, QSAR studies utilize fusions of the two approaches. A rapid objective method is used as a preliminary filter to narrow the feature set. Next, one of the more accurate but slower subjective methods is employed. As an example of such a combination of techniques, a correlation-based test that significantly reduces the number of features, followed by a genetic algorithm or simulated annealing, can be used [66]. A similar procedure, which uses greedy sequential feature forward selection, is also in use [65].

The feature selection can also be implicit in some mapping methods. For example, the Decision Tree (see section 4.2.4) utilizes only a subset of features in the decision process, if a single or only a few descriptors are tested at each node and the overall number of features exceeds the number of those used in the nodes. Similarly, ensembles of decision stumps (see section 4.2.6) also operate on a reduced number of descriptors if the number of members in the ensemble is smaller than the number of features.

4. MAPPING THE MOLECULAR STRUCTURE TO ACTIVITY

Given the selected descriptors, the final step in building the QSAR model is to derive the mapping between the activity and the values of the features. Simple, yet useful, methods model the activity as a linear function of the descriptors. Other, non-linear, methods extend this approach to more complex relations.

Another important division of the mapping methods is based on the nature of the activity variable. In the case of predicting a continuous value, a regression problem is encountered. When only some categories, or classes, of the activity need to be predicted, e.g. partitioning compounds into active and inactive, a classification problem occurs. In regression, the dependent variable is modeled as a function of the descriptors, as noted above. In the classification framework, the resulting model is defined by a decision boundary.


permeability prediction [114], LDA exhibited lower accuracy than a PLS-based method. In predicting antibacterial activity [118, 119], it performed worse than neural networks. LDA was also used to predict drug-likeness [120], showing results slightly better than a linear programming machine, a method similar to linear SVM. However, it yielded results worse than non-linear SVM and bagging ensembles. In ecotoxicity prediction [121], LDA performed better than other linear methods and k-NN, but inferior to decision trees.

4.2. Non-Linear Models

Non-linear models extend the structure-activity relationships to non-linear functions of the input descriptors. Such models may be more accurate, especially for large and diverse datasets. However, they are usually harder to interpret. Complex, non-linear models may also fall prey to overfitting [122], i.e., low generalization to compounds unseen during training.

    4.2.1. Bayes Classifier

The Bayes Classifier stems from the Bayes rule, relating the posterior probability of a class to its overall probability, the probability of the observations and the likelihood of the class with respect to the observed variables. Under the Bayes rule, the class maximizing the posterior probability is chosen as the prediction result. However, in real problems, the likelihoods are not known and have to be estimated. Given a finite number of training examples, such estimation is not trivial. One way to approach this problem is to assume independence of the likelihoods of a class with respect to the different descriptors. This leads to the Naive Bayes Classifier (NBC). For typical datasets, the estimation of likelihoods with respect to single variables is feasible. The drawback of this method is that the independence assumption usually does not hold.
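A compact sketch of an NBC over binary descriptors follows, with Laplace smoothing to keep the estimated likelihoods non-zero; the dataset and descriptor meanings are invented for illustration:

```python
from collections import Counter, defaultdict

def train_nbc(data):
    """Estimate class priors and per-descriptor likelihoods, assuming
    descriptor independence (the 'naive' assumption)."""
    priors = Counter(cls for _, cls in data)
    likelihoods = defaultdict(Counter)   # (descriptor index, class) -> value counts
    for x, cls in data:
        for i, v in enumerate(x):
            likelihoods[(i, cls)][v] += 1
    n = len(data)

    def predict(x):
        def score(cls):
            p = priors[cls] / n
            for i, v in enumerate(x):
                # Laplace smoothing avoids zero likelihoods
                p *= (likelihoods[(i, cls)][v] + 1) / (priors[cls] + 2)
            return p
        return max(priors, key=score)    # class with highest posterior score
    return predict

# Toy binary descriptors: e.g. aromatic ring present, logP above a threshold.
data = [((1, 1), 'active'), ((1, 0), 'active'), ((1, 1), 'active'),
        ((0, 0), 'inactive'), ((0, 1), 'inactive'), ((0, 0), 'inactive')]
predict = train_nbc(data)
```

The per-descriptor counts are all the model stores; this is what makes NBC estimation feasible even when joint likelihoods over all descriptors could not be estimated.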

An extensive study comparing the Naive Bayes Classifier with other methods was conducted [110] using numerous endpoints, including COX-2, CDK-2, BBB, dopamine, logD, P-glycoprotein, toxicity and multidrug resistance reversal. In most cases NBC was inferior to the other methods; however, it outperformed PLS for BBB and CDK-2, k-NN for P-glycoprotein and COX-2, and decision trees for BBB and P-glycoprotein. In thrombin binding [72], NBC yielded worse results than SVM. However, NBC has been shown useful in modeling the inhibition of the HIV-1 protease [123].

    4.2.2. The k-Nearest Neighbor Method

The k-Nearest Neighbor (k-NN) [124] is a simple decision scheme that requires practically no training and is asymptotically optimal, i.e., with increasing training data it converges to the optimal prediction error. For a given compound in the descriptor space, the method analyzes its k nearest neighboring compounds from the training set and predicts the activity class that is most highly represented among these neighbors. The k-NN scheme is sensitive to the choice of metric and to the number of training compounds available. Also, the number of neighbors analyzed can be optimized to yield the best results.
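The scheme is simple enough to state in full; the two-descriptor dataset below is invented for illustration, and the Euclidean metric is used:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of its k nearest training compounds.
    `train` is a list of (descriptor_vector, activity_class) pairs."""
    dist = lambda a, b: math.dist(a, b)   # the metric is a design choice
    neighbours = sorted(train, key=lambda t: dist(t[0], query))[:k]
    votes = Counter(cls for _, cls in neighbours)
    return votes.most_common(1)[0][0]

# Toy dataset: actives cluster near (1, 1), inactives near (0, 0).
train = [((0.9, 1.1), 'active'), ((1.2, 0.8), 'active'), ((1.0, 1.0), 'active'),
         ((0.1, 0.0), 'inactive'), ((0.0, 0.2), 'inactive'), ((0.2, 0.1), 'inactive')]
```

All the work happens at prediction time, which is why k-NN needs essentially no training but can be slow on large compound libraries.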

The k-nearest neighbors scheme has been used e.g. for predicting COX-2 inhibition [84], where it showed accuracy higher than PLS and similar to a neural network. Anticonvulsant activity, dopamine D1 antagonists and aquatic toxicity have also been modeled using this method [86]. In a study on P-glycoprotein transport activity [94], k-NN performed comparably to decision trees, but worse than neural networks and SVM. In ecotoxicity QSAR [121], k-NN was better than some linear methods, but inferior to discriminant analysis and decision trees.

    4.2.3. Artificial Neural Networks

Artificial Neural Networks (ANN) [125] are biologically inspired prediction methods based on the architecture of a network of neurons. A wide range of specific models based on this paradigm have been analyzed in the literature, with perceptron-based and radial-basis function-based ones prevailing. These two methods both fall into the category of feed-forward networks, in which, during the prediction, the information flows only in the direction from the input descriptors, through a set of layers, to the output of the network.

    Multi-Layer Perceptron

The Multi-Layer Perceptron (MLP) model consists of a layered network of interconnected perceptrons, i.e., simple models of a neuron [126]. Each perceptron is capable of making a linear combination of its input values and, by means of a certain transfer function, producing a binary or continuous output. A noteworthy fact is that each input of the perceptron has an adaptive weight specifying the importance of that input. In training a single perceptron, the inputs of the perceptron are formed by the molecular descriptors, while the output should predict the activity of the compound. To achieve this goal, the perceptron is trained by adjusting the weights to produce a linear combination of the descriptors that optimally predicts the activity. The adjusting process relies on the feedback from comparing the predicted with the expected output. That is, the error in the prediction is propagated to the weights of the descriptors, altering them in the direction that counters the error.
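This error-driven weight update (the delta rule, discussed further below) can be sketched for a single perceptron, here learning the linearly separable AND function of two binary inputs as a toy stand-in for descriptors:

```python
def step(x):
    """Binary step transfer function."""
    return 1 if x > 0 else 0

def train_perceptron(samples, lr=0.1, epochs=20):
    """Delta-rule training of a single perceptron (two weights plus bias):
    each weight moves proportionally to the error and to its input."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = step(w[0] * x[0] + w[1] * x[1] + b)
            err = target - pred                  # feedback from comparison
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Toy linearly separable task: logical AND of two binary "descriptors".
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
```

For linearly separable data such as this, the perceptron convergence theorem guarantees the loop reaches a perfect separation.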

While a single perceptron is a linear model, a network consisting of layers of perceptrons, with the outputs of one layer connected to the inputs of neurons in the consecutive layer, allows for non-linear prediction [127]. Multi-layer networks contain a single input layer, which consists simply of the values of the molecular descriptors, one or more hidden layers, which process the descriptors into internal representations, and an output layer utilizing the internal representation to produce the final prediction. This architecture is depicted in Fig. 5.

In multi-layer networks, training, i.e., the adjustment of the weights, becomes non-trivial. Apart from the output layer, the feedback information is no longer directly available to adjust the weights of neuron inputs in the hidden layers. One popular method to overcome this problem is the backward propagation of error. The weights of the inputs to the output layer neurons are adjusted based on the error, as in a single perceptron. Then, the information about the error propagates from the output layer neurons to the neurons in the preceding layer, proportionally to the weight of the link between a given hidden neuron's output and the input of the output layer neuron. It is then used to adjust the weights of the inputs of the neurons in the hidden layer. The contribution to the overall error propagates backward


through all hidden layers until the weights of the first hidden layer are adjusted.

Fig. (5). Multi-Layer Perceptron neural network with two hidden layers, a single output neuron and four descriptors as input. Each neuron realizes an inner product of weights w_ij with its input, extended with a bias term. A binary step transfer function is used in each neuron following the calculation of the inner product.

In general, any given function can be approximated using a neural network that is sufficiently large, both in terms of the number of layers and the number of neurons in a layer [128]. However, a big network can overfit given a finite training set. Thus, the choice of the number of layers and neurons is essential in constructing the network. Usually, networks consisting of only one hidden layer are used. Another important aspect in the construction of the neural network is the choice of the exact form of the training rule, i.e., the function relating the update of the weights to the error of prediction. The most popular is the delta rule, in which the weight change is proportional to the difference between predicted and expected output and to the input value. The proportionality constant determines the learning rate, i.e., the magnitude of the steps used in adjusting the weights. Too large a learning rate may prevent the convergence of the training to the minimal error, while too small a rate increases the computational overhead. A variable learning rate, decreasing with time, may also be used. The next choice in constructing the MLP is the selection of the neuron transfer function, i.e., the function relating the product of the input and weight vectors to the output. Typically, a sigmoid function is used. Finally, the initial values of the weights of the links between neurons have to be set. The general consensus is that small-magnitude random numbers should be used.
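A forward pass through such a network can be sketched as follows. The weights here are hand-set for illustration rather than trained; they make a small two-layer net with sigmoid transfer functions reproduce an XOR-like non-linear response that no single perceptron could:

```python
import math

def sigmoid(z):
    """Typical smooth transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: inner product plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def mlp_forward(descriptors, layers):
    """Feed-forward pass through a list of (weights, biases) layers."""
    out = descriptors
    for weights, biases in layers:
        out = layer(out, weights, biases)
    return out

# Hand-set weights implementing XOR-like behaviour on two inputs.
hidden = ([[6.0, 6.0], [-6.0, -6.0]], [-3.0, 9.0])
output = ([[8.0, 8.0]], [-12.0])
net = [hidden, output]
```

Training would adjust these weights by backward propagation of the prediction error, as described above.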

The MLP neural networks have shown their usefulness in a wide range of QSAR applications, where linear models often fail. In a human intestinal absorption study [83], MLP outperformed the MLR model. However, single MLP networks have been shown inferior to ensembles of such networks in the prediction of antifilarial activity, GABAA receptor binding and inhibition of dihydrofolate reductase [129]. In an antibacterial activity study [118], MLP performed better than LDA. This type of network has also been applied to the prediction of logP [104], faring better than linear regression and comparably to decision trees. Models of HIV reverse transcriptase inhibitors and E. coli dihydrofolate reductase inhibitors [130] constructed with MLP were better than those relying on MLR or PLS. MLP has also been employed to predict carcinogenic activity [82], aquatic toxicity [131], antimalarial activity and binding affinities to the platelet-derived growth factor receptor [132].

    Radial-Basis Function Neural Networks

The Radial-Basis Function (RBF) [133] neural networks consist of an input layer, a single hidden layer and an output layer. Contrary to MLP-ANNs, the neurons in the hidden layer do not compute their output based on the product of the weights and the input values. Each hidden layer neuron is defined by its center, i.e., a point in the feature space of descriptors. The output of the neuron is calculated as a function of the distance between the input compound in the descriptor space and the point constituting the neuron. Typically, the Gaussian function is used, although in principle some other function of the distance may be applied. The output neuron is of perceptron type, computing a transfer function of the product of the output values of the RBF neurons and their weights.

Several parameters are adjusted during the training of the RBF network. A number of RBF neurons has to be created, and their centers and distance scalings, i.e., widths, defined. In the case of the Gaussian function, these parameters correspond to the mean and standard deviation, respectively. The simplest approach is to create as many neurons as there are compounds in the training set and set the centers to the coordinates of the given examples. Alternatively, the training set can be clustered into a number of groups, e.g. by the k-means method, and the group centers and widths used. The orthogonal least squares method [134] can also be employed. This procedure selects a subset of training examples to be used as centers, by sequentially adding the examples that best contribute to explaining the variance of the activity to be predicted. Once all RBF neurons have been trained, the weights of the connections between them and the output neuron have to be established. This can be done by an analytical procedure.
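The output computation of an RBF network with Gaussian hidden units can be sketched as below; the centers, widths and weights are hand-set for illustration rather than obtained by the training procedures just described:

```python
import math

def rbf_output(x, centers, widths, weights, bias=0.0):
    """RBF network output: a weighted sum of Gaussian hidden units,
    each responding to the distance between x and its center."""
    hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                       / (2 * s ** 2))
              for c, s in zip(centers, widths)]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Two hypothetical hidden neurons, centred on an "active" and an
# "inactive" region of a two-descriptor space.
centers = [(1.0, 1.0), (0.0, 0.0)]
widths = [0.5, 0.5]
weights = [1.0, -1.0]

score = rbf_output((0.9, 1.0), centers, widths, weights)
```

A compound near the first center receives a positive score, one near the second a negative score; the width controls how quickly each unit's influence decays with distance.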

The RBF-ANNs have been used in the prediction of COX-2 inhibition [103], yielding error lower than MLR but higher than SVM. In the prediction of physicochemical properties, such as O-H bond dissociation energy in substituted phenols [91] and capillary electrophoresis absolute mobilities [96], RBF-ANNs showed accuracy worse than non-linear SVM, but sometimes better than linear SVM.

    4.2.4. Decision Trees

Decision Trees (DT) [135, 136] differ from most statistics-based classification and regression algorithms by their connection to logic-based and expert systems. In fact, each classification tree can be translated into a set of predictive rules in Boolean logic.

The DT classification model consists of a tree-like structure of nodes and links. Nodes are linked hierarchically, with several child nodes branching from a common parent node. A node with no child nodes is called a leaf. Typically, in each node a test using a single descriptor is made. Based on the result of the test, the algorithm is directed to one of the child nodes branching from the parent. In the child node, another test is performed, and further traversal of the tree towards the leafs is carried out. The final decision is based on the activity class associated with the leaf. Thus, the whole decision process is


based on a series of simple tests, with the results guiding the path from the root of the tree to a leaf, as depicted in Fig. 6.

Fig. (6). Classification using a decision tree based on three molecular descriptors. The traversal of the tree on the path from the root to a leaf node defining the final decision leads through a set of simple tests using the values of molecular descriptors.

The training of the model consists of the incremental addition of nodes. The process starts with choosing the test for the root node. A test that optimally separates the compounds into the appropriate activity classes is chosen. If such a test perfectly discriminates the classes, no further nodes are necessary. However, in most cases a single test is not sufficient: a group of compounds corresponding to one outcome of the test contains examples from different activity classes. Thus, an iterative procedure is utilized, starting with the root node as the current node.

At each step, the current node is examined for meeting the criteria of being a leaf. A node may become a leaf if the compounds directed to it by the traversal from the root fall into a single activity category, or if at least one category forms a clear majority. Otherwise, if the compounds are distributed between several classes, the optimal discriminating criterion is selected. The results of the discrimination form child nodes linked to the current node. Since the decision in the current node introduces new discriminatory information, the subsets of compounds at the child nodes correspond more homogeneously to single classes. Thus, after several node splitting operations, the leafs can be created. Upon creation of child nodes, they are added to the list of nodes waiting to be assessed, and the first node from this list is chosen as the next current node for evaluation. One should note that the tests carried out in nodes at the same level of the tree may be different in different branches of the tree, following the different class distributions at the nodes.

There are several considerations in the development of decision trees. First, a method for choosing the test to be performed at a node is necessary. As the test usually involves the values of a single descriptor, the descriptor ranking criteria outlined in section 3.1 may be employed. Once the descriptor is chosen, the decision rule must be introduced, e.g. a threshold that separates the compounds from two activity classes.

The DT method may lead to overfitting on the training set if the tree is allowed to grow until the nodes consist purely of one class. Thus, early stopping is used once the nodes are sufficiently pure. Moreover, pruning of the constructed tree [137], i.e., removal of some overspecialized leafs, may be introduced to increase the generalization capabilities of the tree [136].
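A hand-built tree makes the traversal explicit; the descriptors, thresholds and class labels below are invented for illustration, not taken from a trained model:

```python
# Each internal node is (descriptor, threshold, left_child, right_child);
# a bare string is a leaf holding the predicted activity class.
tree = ('logP', 3.0,
        ('mol_weight', 400.0, 'active', 'inactive'),
        ('n_donors', 2.0, 'inactive', 'active'))

def classify(tree, compound):
    """Traverse from the root to a leaf, testing one descriptor per node."""
    if isinstance(tree, str):
        return tree                       # reached a leaf
    descriptor, threshold, left, right = tree
    branch = left if compound[descriptor] <= threshold else right
    return classify(branch, compound)

label = classify(tree, {'logP': 2.1, 'mol_weight': 350.0, 'n_donors': 1.0})
```

Reading the paths from root to leafs directly yields the Boolean rules mentioned above, e.g. "if logP <= 3.0 and mol_weight <= 400.0 then active".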

In general, DT methods usually offer suboptimal error rates compared to other non-linear methods, in particular due to the reliance on a single feature in each node. Nonetheless, they are popular in the QSAR domain for their ease of interpretability. The tree effectively combines the training process with descriptor selection, limiting the complexity of the model to be analyzed. Furthermore, since several leafs in the tree may correspond to a single activity class, they allow for inspection of different paths leading to the same activity.

The decision trees also handle regression problems [138]. This is done by associating each leaf with a numerical value instead of a categorical class. Furthermore, the criteria for splitting a node to form child nodes are based on the variance of the compounds in that node.

Decision Trees have been tested in a study [110] on a wide range of targets, including COX-2 inhibition, blood-brain barrier permeability, toxicity, multidrug resistance reversal, CDK-2 antagonist activity, dopamine binding affinity, logD and binding to an unspecified channel protein. They performed worse than support vector machines and ensembles of decision trees, but often better than PLS or the naive Bayes classifier. In a logP study [104], DT showed results comparable to MLP-ANNs and better than MLR. In a study concerning P-glycoprotein transport activity [94], DT was worse than both SVM and neural networks, but better than the k-NN method. In various datasets related to ecotoxicity [121], decision trees usually achieved lower error than the LDA or k-NN methods. Other studies employing decision trees, including anti-HIV activity [139], toxicity [140] and oral absorption [141], have been conducted.

    4.2.5. Support Vector Machines

The Support Vector Machines (SVM) [142-144] form a group of methods stemming from the structural risk minimization principle, with the linear support vector classifier (SVC) as its most basic member. The SVC aims at creating a decision hyperplane that maximizes the margin, i.e., the distance from the hyperplane to the nearest examples from each of the classes. This allows for formulating the classifier training as a constrained optimization problem. Importantly, the objective function is unimodal, contrary to e.g. neural networks, and thus can be optimized effectively to the global optimum.

In the simplest case, compounds from different classes can be separated by a linear hyperplane; such a hyperplane is defined solely by its nearest compounds from the training set. These compounds are referred to as support vectors, giving the name to the whole method. The core definitions behind SVM classification are illustrated in Fig. 7.

In most cases, however, no linear separation is possible. To take account of this problem, slack variables are introduced. These variables are associated with the misclassified compounds and, in conjunction with the


margin, are subject to optimization. Thus, even though erroneous classification cannot be avoided, it is penalized. Since the misclassification of compounds strongly influences the decision hyperplane, the misclassified compounds also become support vectors.

Fig. (7). Support vectors and margins in linearly separable (a) and non-separable (b) problems. In the non-separable case, negative margins are encountered and their magnitude is subject to optimization along with the magnitude of the positive margins.

The SVC can be easily transformed into a non-linear classifier by employing a kernel function [144, 145]. The kernel function introduces an implicit mapping from the original descriptor space to a high- or infinite-dimensional space. A linear hyperplane in the kernel space may be highly non-linear in the original space. Thus, by training a linear classifier in the kernel space, a classifier that is non-linear with respect to the descriptor space is obtained. The strength of the kernel function stems from the fact that, while the positions of compounds in the kernel space may not be explicitly computed, their dot product can be obtained easily. As the algorithm for SVC uses the compound descriptors only within their dot products, this allows for computation of the decision boundary in the kernel space. One of two kernel functions is typically used: the polynomial kernel or the radial-basis function kernel.
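The two common kernels and the resulting decision rule can be sketched as follows; the support vectors, coefficients and bias below are hand-set for illustration rather than obtained from the constrained optimization:

```python
import math

def polynomial_kernel(x, y, degree=2, c=1.0):
    """K(x, y) = (x.y + c)^degree: a dot product in an implicit feature space."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (dot + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_decision(x, support_vectors, alphas, labels, bias, kernel):
    """SVM decision rule: sign of a kernel-weighted sum over support vectors.
    Only dot products (via the kernel) are needed, never explicit coordinates
    in the kernel space."""
    s = sum(a * yl * kernel(sv, x)
            for sv, a, yl in zip(support_vectors, alphas, labels))
    return 1 if s + bias >= 0 else -1
```

Note that the decision depends on the training compounds only through the support vectors, which is why the rest of the training set can be discarded after training.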

The SVC method has been shown to exhibit low overtraining and thus allows for good generalization to previously unseen compounds. It is also relatively robust when only a small number of examples of each class is available. The SVM methods have been extended into Support Vector Regression (SVR) [146] to handle regression problems. By using methods similar to SVC, e.g. the slack variables and the kernel functions, an accurate non-linear mapping between the activity and the descriptors can be found. However, contrary to typical regression methods, the predicted values are penalized only if their absolute error exceeds a certain user-specified threshold, and thus the regression model is not optimal in terms of the least-square error.

The SVM method has been shown to exhibit low prediction error in QSAR [147]. Studies of P-glycoprotein substrates used SVMs [93, 94], with results more accurate than neural networks, decision trees and k-NN. A study focused on the prediction of drug-likeness [120] has shown lower prediction error for SVM than for bagging ensembles and for linear methods. In a study involving COX-2 inhibition and aquatic toxicity [103], SVM outperformed MLR and RBF neural networks. An extensive study using support vector machines among other machine learning methods was conducted [110] using a wide range of endpoints. In this study, SVM was usually better than k-NN, decision trees and linear methods, but slightly inferior to boosting and random forest ensembles.

SVMs have also been tested with ADME properties, including modeling of human intestinal absorption [81, 93], binding affinities to human serum albumin [95] and cytochrome P450 inhibition [65]. Studies focused on hemostatic factors have employed support vector machines, e.g. modeling thrombin binding [72] and factor Xa binding [76]. Adverse drug effects, such as carcinogenic potency [148] and Torsade de Pointes [93], were analyzed using the SVM method.

Properties such as O-H bond dissociation energy in substituted phenols [91] and capillary electrophoresis absolute mobilities [96] have also been studied using Support Vector Machines, which exhibited higher accuracy than linear regression and RBF neural networks. Other properties predicted with SVM include heat capacity [97] and the capacity factor (logk) of peptides in high-performance liquid chromatography [149].

4.2.6. Ensemble Methods

The traditional approach to QSAR analysis focused on constructing a single predictive model. Recently, methods utilizing a combination, or ensemble [150], of models for improving the prediction have been proposed. A small ensemble of three classifiers is depicted in Fig. 8.

    Bagging

The Bagging method [151] focuses on improving the quality of prediction by creating a set of base models constructed using the same algorithm, yet with varying training sets. Before the training of each base model, the original training set is subjected to sampling with replacement. This leads to a group of bootstrap replicas of the original training set. The decisions of the models trained on each replica are averaged to create the final result. The strength of the bagging method stems from its ability to stabilize the classifier by learning on different samples of the original distribution.
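The bootstrap-and-vote scheme can be sketched with a deliberately simple base learner; the 1-nearest-neighbour model on a single descriptor and the dataset below are hypothetical, chosen only to keep the example short:

```python
import random
from collections import Counter

random.seed(2)

def bootstrap(data):
    """Sample with replacement to form a replica of the training set."""
    return [random.choice(data) for _ in data]

def bagged_predict(train, query, base_fit, n_models=15):
    """Train each base model on its own bootstrap replica, then take a
    majority vote over the individual predictions."""
    votes = Counter(base_fit(bootstrap(train))(query) for _ in range(n_models))
    return votes.most_common(1)[0][0]

# Hypothetical base learner: 1-nearest neighbour on one descriptor.
def fit_1nn(data):
    return lambda q: min(data, key=lambda t: abs(t[0] - q))[1]

train = [(0.1, 'inactive'), (0.2, 'inactive'), (0.3, 'inactive'),
         (0.8, 'active'), (0.9, 'active'), (1.0, 'active')]
label = bagged_predict(train, 0.85, fit_1nn)
```

Because each replica omits some compounds and duplicates others, the vote averages out the instability of the individual base models.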

In QSAR, Bagging and other similar ensembles were used with multi-layer perceptrons for the prediction of antifilarial activity, binding affinities to GABAA receptors and inhibitors of dihydrofolate reductase [129], yielding lower error than a single MLP neural network. The k-NN and decision trees were used as base methods in bagging for the prediction of drug-likeness [120], showing results slightly worse than SVM, but better than LDA.

    Random Subspace Method

The Random Subspace Method [152] is another example of an ensemble scheme aiming at stabilizing the base model. In training the models, the whole training set is used. However, to enforce diversity, only randomly generated subsets of the descriptors are used. The most notable example of the random subspaces approach is the Random Forest method [153], which uses decision trees as base models. Such a method was recently proposed for use in QSAR in an extensive study including inhibitors of COX-2, blood-brain


barrier permeability, estrogen receptor binding activity, multidrug resistance reversal, dopamine binding affinity and P-glycoprotein transport activity [154], yielding better results than a single decision tree and than PLS. In an extended study by the same team [110], it fared better than SVM and k-NN.

Boosting

A more elaborate ensemble scheme is introduced in the Boosting algorithm [155, 156]. In training the base models, it uses all the descriptors and all the training compounds. However, for each compound, a weight is defined. Initially, the weights are uniform. After training a base model, its prediction error is evaluated and the weights of incorrectly predicted compounds are increased. This focuses the next base model on previously misclassified examples, even at the cost of making errors for those correctly classified previously. Thus, the compounds with activity hardest to predict obtain more attention from the ensemble model. The advantage of boosting compared to other ensemble methods is that it allows for the use of relatively simple and erroneous base models. Similar to the SVM classifier, the power of boosting stems from its ability to create decision boundaries maximizing the margin [157].

In QSAR, the boosting method employing decision trees as base models has recently been shown useful in modeling COX-2 inhibition, estrogen and dopamine receptor binding, multidrug resistance reversal, CDK-2 antagonist activity, BBB permeability, logD and P-glycoprotein transport activity [110], showing lower prediction error than k-NN, SVM, PLS, decision trees and the naive Bayes classifier in all cases but one, the P-glycoprotein dataset. In comparison with the Random Forest method, it was better for several datasets but worse for others. In another study, a simple decision stump method was used in boosting for human intestinal absorption prediction [90].
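One reweighting round of an AdaBoost-style boosting scheme can be sketched as follows; the base-model predictions and targets are invented for illustration:

```python
import math

def adaboost_round(weights, predictions, targets):
    """One boosting round: compute the weighted error of the base model,
    derive its voting coefficient alpha, and re-weight the compounds so
    that misclassified ones gain influence in the next round."""
    err = sum(w for w, p, t in zip(weights, predictions, targets) if p != t)
    alpha = 0.5 * math.log((1 - err) / err)
    new_w = [w * math.exp(-alpha if p == t else alpha)
             for w, p, t in zip(weights, predictions, targets)]
    total = sum(new_w)                     # renormalize to a distribution
    return alpha, [w / total for w in new_w]

weights = [0.25] * 4                       # uniform initial weights
preds   = [1, 1, -1, -1]                   # base model output
targets = [1, 1, 1, -1]                    # compound 3 is misclassified
alpha, weights = adaboost_round(weights, preds, targets)
```

After the round, the single misclassified compound carries half of the total weight, so the next base model is pushed to get it right; the final ensemble combines the base models' votes weighted by their alphas.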

Fig. (8). Individual decisions of three binary classifiers and the resulting classifier ensemble with a more accurate decision boundary.


    5. CONCLUDING REMARKS

The chemoinformatic methods underlying QSAR analysis are in constant advancement. Well-established techniques continue to be used, providing successful results, especially on small, homogeneous datasets consisting of compounds relating to a single mode of action. These techniques include linear methods such as partial least squares. Classical non-linear methods, e.g. artificial neural networks, also remain popular. However, the need for rapid, accurate assessment of large sets of compounds is shifting attention to novel techniques from the pattern recognition and machine learning fields.

Two such methods, both relying on the concept of margin maximization, have recently gained attention from the QSAR community. These are the support vector machines and ensemble techniques. Recent studies have shown that both yield small prediction errors in numerous QSAR applications. Given the complexity of these methods, one may be tempted to treat them as black boxes. Yet, as recently shown, only careful model selection and tuning allows for optimal prediction accuracy [120]. Thus, the adoption of novel methods should be preceded by extensive studies. Moreover, within the machine learning and pattern recognition fields, the interpretability of the model is usually not of utmost importance. Thus, the process of adopting emerging techniques from these fields may require substantial effort to develop methods for interpreting the created models.

A similar situation is encountered in preparing the molecular descriptors to be used. The number of different descriptors reaches thousands in some leading commercial tools. Having at hand powerful methods for automatically selecting the informative features, one may be tempted to leave the descriptor selection process entirely to algorithmic techniques. While this may lead to high accuracy of the model, the chosen descriptors often do not give clear insight into the structure-activity relationship.

Throughout the review, we have focused on predicting the activity or property of a compound given the values of its descriptors. The inverse problem of finding compounds with desired activity and properties has also attracted attention. Such an inverse-QSAR formulation directly addresses the goal of drug design, i.e., discovery of active compounds with good pharmacokinetic and other properties. Authors such as Kvasnicka [158] and Lewis [14] have published algorithms in this direction. Significant insight has also been given by Kier and Hall [159] and Zefirov [160]. Galvez and co-workers [161] have shown that topological indices are particularly suited to this aim. The reason is that whereas the conventional physical and geometrical descriptors are structure-related, topological indices are just an algebraic description of the structure itself. Thus, one can go backward and forward between structure and property, predicting properties for given molecules and, conversely, proposing structures with a desired property.
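To make concrete what "an algebraic description of the structure itself" means, consider the Wiener index [20]: the sum of shortest-path bond distances over all pairs of atoms in the hydrogen-suppressed molecular graph. The sketch below is our own illustration (the adjacency-matrix input and function name are assumptions for the example), computed here via the Floyd-Warshall all-pairs shortest-path algorithm.

```python
def wiener_index(adj):
    """Wiener index: sum of shortest-path distances (in bonds) over all
    unordered pairs of atoms in the hydrogen-suppressed graph."""
    n = len(adj)
    INF = float("inf")
    # initialize pairwise distances from the adjacency matrix
    d = [[0 if i == j else (1 if adj[i][j] else INF)
          for j in range(n)] for i in range(n)]
    # Floyd-Warshall relaxation over intermediate atoms k
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return sum(d[i][j] for i in range(n) for j in range(i + 1, n))

# n-butane as a path of four carbon atoms: W = (1+2+3) + (1+2) + 1 = 10
butane = [[0, 1, 0, 0],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [0, 0, 1, 0]]
```

Because such an index is a pure function of the graph, one can enumerate candidate graphs and filter them by a target index value, which is what makes the backward, structure-generating direction tractable.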

Since the methods for solving the inverse problem are not yet widely adopted, the creation of QSAR models remains the main task in computer-aided drug discovery. In general, the adoption of novel, more accurate QSAR modeling techniques does not reduce the responsibility of the investigator. On the contrary, the more complex and optimized the model, the more caution it requires during its application. Combined with the increased complexity of the inspected datasets, this makes QSAR analysis a challenging endeavor.

    REFERENCES

[1] Bleicher, K.H.; Boehm, H.-J.; Mueller, K.; Alanine, A.I. Nat. Rev. Drug Discov., 2003, 2, 369-378.
[2] Gershell, L.J.; Atkins, J.H. Nat. Rev. Drug Discov., 2003, 2, 321-327.
[3] Goodnow, R.; Guba, W.; Haap, W. Comb. Chem. High Throughput Screen., 2003, 6, 649-660.
[4] Shoichet, B.K. Nature, 2004, 432, 862-865.
[5] Stahura, F.L.; Bajorath, J. Comb. Chem. High Throughput Screen., 2004, 7, 259-269.
[6] Hansch, C.; Fujita, T. J. Am. Chem. Soc., 1964, 86, 1616-1626.
[7] Bajorath, J. Nat. Rev. Drug Discov., 2002, 1, 882-894.
[8] Pirard, B. Comb. Chem. High Throughput Screen., 2004, 7, 271-280.
[9] Hodgson, J. Nat. Biotechnol., 2001, 19, 722-726.
[10] van de Waterbeemd, H.; Gifford, E. Nat. Rev. Drug Discov., 2003, 2, 192-204.
[11] Proudfoot, J.R. Bioorg. Med. Chem. Lett., 2002, 12, 1647-1650.
[12] Roche, O.; Schneider, P.; Zuegge, J.; Guba, W.; Kansy, M.; Alanine, A.; Bleicher, K.; Danel, F.; Gutknecht, E.M.; Rogers-Evans, M.; Neidhart, W.; Stalder, H.; Dillon, M.; Sjogren, E.; Fotouhi, N.; Gillepsie, P.; Goodnow, R.; Harris, W.; Jones, P.; Taniguchi, M.; Tsujii, S.; von der Saal, W.; Zimmermann, G.; Schneider, G. J. Med. Chem., 2002, 45, 137-142.
[13] Rusinko, A.; Young, S.S.; Drewry, D.H.; Gerritz, S.W. Comb. Chem. High Throughput Screen., 2002, 5, 125-133.
[14] Lewis, R.A. J. Med. Chem., 2005, 48, 1638-1648.
[15] Mulliken, R.S. J. Phys. Chem., 1955, 23, 1833-1840.
[16] Cammarata, A. J. Med. Chem., 1967, 10, 525-552.
[17] Stanton, D.T.; Egolf, L.M.; Jurs, P.C.; Hicks, M.G. J. Chem. Inf. Comput. Sci., 1992, 32, 306-316.
[18] Klopman, G. J. Am. Chem. Soc., 1968, 90, 223-234.
[19] Zhou, Z.; Parr, R.G. J. Am. Chem. Soc., 1990, 112, 5720-5724.
[20] Wiener, H. J. Am. Chem. Soc., 1947, 69, 17-20.
[21] Randic, M. J. Am. Chem. Soc., 1975, 97, 6609-6615.
[22] Balaban, A.T. Chem. Phys. Lett., 1982, 89, 399-404.
[23] Schultz, H.P. J. Chem. Inf. Comput. Sci., 1989, 29, 227-228.
[24] Kier, L.B.; Hall, L.H. J. Pharm. Sci., 1981, 70, 583-589.
[25] Galvez, J.; Garcia, R.; Salabert, M.T.; Soler, R. J. Chem. Inf. Comput. Sci., 1994, 34, 520-552.
[26] Pearlman, R.S.; Smith, K. Perspect. Drug Discov. Des., 1998, 9-11, 339-353.
[27] Stanton, D.T. J. Chem. Inf. Comput. Sci., 1999, 39, 11-20.
[28] Burden, F. J. Chem. Inf. Comput. Sci., 1989, 29, 225-227.
[29] Estrada, E. J. Chem. Inf. Comput. Sci., 1996, 36, 844-849.
[30] Estrada, E.; Uriarte, E. SAR QSAR Environ. Res., 2001, 12, 309-324.
[31] Hall, L.H.; Kier, L.B. Quant. Struct.-Act. Relat., 1991, 10, 43-48.
[32] Hall, L.H.; Kier, L.B. J. Chem. Inf. Comput. Sci., 2000, 30, 784-791.
[33] Labute, P. J. Mol. Graph. Model., 2000, 18, 464-477.
[34] Higo, J.; Go, N. J. Comput. Chem., 1989, 10, 376-379.
[35] Katritzky, A.R.; Mu, L.; Lobanov, V.S.; Karelson, M. J. Phys. Chem., 1996, 100, 10400-10407.
[36] Rohrbaugh, R.H.; Jurs, P.C. Anal. Chim. Acta, 1987, 199, 99-109.
[37] Pearlman, R.S. In: Physical Chemical Properties of Drugs; Yalkowsky, S.H.; Sinkula, A.A.; Valvani, S.C., Eds.; Marcel Dekker: New York, 1988.
[38] Weiser, J.; Weiser, A.A.; Shenkin, P.S.; Still, W.C. J. Comput. Chem., 1998, 19, 797-808.
[39] Barnard, J.M.; Downs, G.M. J. Chem. Inf. Comput. Sci., 1997, 37, 141-142.
[40] http://www.mdl.com/.
[41] Durant, J.L.; Leland, B.A.; Henry, D.R.; Nourse, J.G. J. Chem. Inf. Comput. Sci., 2002, 42, 1273-1280.
[42] Waller, C.L.; Bradley, M.P. J. Chem. Inf. Comput. Sci., 1999, 39, 345-355.
[43] Winkler, D.; Burden, F.R. Quant. Struct.-Act. Relat., 1998, 17, 224-231.


[44] Maurer, W.D.; Lewis, T.G. ACM Comput. Surv., 1975, 7, 5-19.
[45] http://www.daylight.com/.
[46] Deshpande, M.; Kuramochi, M.; Wale, N.; Karypis, G. IEEE Trans. on Knowledge and Data Eng., 2005, 17, 1036-1050.
[47] Guner, O.F. Curr. Top. Med. Chem., 2002, 2, 1321-1332.
[48] Akamatsu, M. Curr. Top. Med. Chem., 2002, 2, 1381-1394.
[49] Lemmen, C.; Lengauer, T. J. Comput.-Aided Mol. Des., 2000, 14, 215-232.
[50] Dove, S.; Buschauer, A. Quant. Struct.-Act. Relat., 1999, 18, 329-341.
[51] Cramer, R.D.; Patterson, D.E.; Bunce, J.D. J. Am. Chem. Soc., 1988, 110, 5959-5967.
[52] Klebe, G.; Abraham, U.; Mietzner, T. J. Med. Chem., 1994, 37, 4130-4146.
[53] Silverman, B.D.; Platt, D.E. J. Med. Chem., 1996, 39, 2129-2140.
[54] Todeschini, R.; Lasagni, M.; Marengo, E. J. Chemom., 1994, 8, 263-272.
[55] Todeschini, R.; Gramatica, P. Perspect. Drug Discov. Des., 1998, 9-11, 355-380.
[56] Bravi, G.; Gancia, E.; Mascagni, P.; Pegna, M.; Todeschini, R.; Zaliani, A. J. Comput.-Aided Mol. Des., 1997, 11, 79-92.
[57] Cruciani, G.; Crivori, P.; Carrupt, P.-A.; Testa, B. J. Mol. Struct.: THEOCHEM, 2000, 503, 17-30.
[58] Crivori, P.; Cruciani, G.; Carrupt, P.-A.; Testa, B. J. Med. Chem., 2000, 43, 2204-2216.
[59] Pastor, M.; Cruciani, G.; McLay, I.; Pickett, S.; Clementi, S. J. Med. Chem., 2000, 43, 3233-3243.
[60] Cho, S.J.; Tropsha, A. J. Med. Chem., 1995, 38, 1060-1066.
[61] Cho, S.J.; Tropsha, A.; Suffness, M.; Cheng, Y.C.; Lee, K.H. J. Med. Chem., 1996, 39, 1383-1395.
[62] Estrada, E.; Molina, E.; Perdomo-Lopez, J. J. Chem. Inf. Comput. Sci., 2001, 41, 1015-1021.
[63] de Julian-Ortiz, J.V.; de Gregorio Alapont, C.; Rios-Santamarina, I.; Garcia-Domenech, R.; Galvez, J. J. Mol. Graphics Modell., 1998, 16, 14-18.
[64] Guyon, I.; Elisseeff, A. J. Mach. Learn. Res., 2003, 3, 1157-1182.
[65] Merkwirth, C.; Mauser, H.; Schulz-Gasch, T.; Roche, O.; Stahl, M.; Lengauer, T. J. Chem. Inf. Comput. Sci., 2004, 44, 1971-1978.
[66] Guha, R.; Jurs, P.C. J. Chem. Inf. Comput. Sci., 2004, 44, 2179-2189.
[67] Gallegos, A.; Girone, X. J. Chem. Inf. Comput. Sci., 2004, 44, 321-326.
[68] Verdu-Andres, J.; Massart, D.L. Appl. Spectrosc., 1998, 52, 1425-1434.
[69] Farkas, O.; Heberger, K. J. Chem. Inf. Model., 2005, 45, 339-346.
[70] Heberger, K.; Rajko, R. J. Chemom., 2002, 16, 436-443.
[71] Rajko, R.; Heberger, K. Chemom. Intell. Lab. Syst., 2001, 57, 1-14.
[72] Liu, Y. J. Chem. Inf. Comput. Sci., 2004, 44, 1823-1828.
[73] Venkatraman, V.; Dalby, A.R.; Yang, Z.R. J. Chem. Inf. Comput. Sci., 2004, 44, 1686-1692.
[74] Lin, T.-H.; Li, H.-T.; Tsai, K.-C. J. Chem. Inf. Comput. Sci., 2004, 44, 76-87.
[75] Massey, F.J. J. Amer. Statistical Assoc., 1951, 46, 68-78.
[76] Byvatov, E.; Schneider, G. J. Chem. Inf. Comput. Sci., 2004, 44, 993-999.
[77] Kohavi, R.; John, G. Artif. Intell., 1997, 97, 273-324.
[78] Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.); Springer-Verlag: London, UK, 1996.
[79] Siedlecki, W.; Sklansky, J. Int. J. Pattern Recog. Artif. Intell., 1988, 2, 197-220.
[80] Siedlecki, W.; Sklansky, J. Pat. Rec. Lett., 1989, 10, 335-347.
[81] Wegner, J.K.; Frohlich, H.; Zell, A. J. Chem. Inf. Comput. Sci., 2004, 44, 921-930.
[82] Hemmateenejad, B.; Safarpour, M.A.; Miri, R.; Nesari, N. J. Chem. Inf. Model., 2005, 45, 190-199.
[83] Wessel, M.D.; Jurs, P.C.; Tolan, J.W.; Muskal, S.M. J. Chem. Inf. Comput. Sci., 1998, 38, 726-735.
[84] Baurin, N.; Mozziconacci, J.-C.; Arnoult, E.; Chavatte, P.; Marot, C.; Morin-Allory, L. J. Chem. Inf. Comput. Sci., 2004, 44, 276-285.
[85] Kirkpatrick, S.; Gelatt, C.D., Jr.; Vecchi, M.P. In Readings in Computer Vision: Issues, Problems, Principles, and Paradigms; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1987.
[86] Itskowitz, P.; Tropsha, A. J. Chem. Inf. Model., 2005, 45, 777-785.
[87] Sutter, J.M.; Dixon, S.L.; Jurs, P.C. J. Chem. Inf. Comput. Sci., 1995, 35, 77-84.
[88] Kittler, J. Pattern Recognition and Signal Processing, 1978, E29, 41-60.
[89] Bi, J.; Bennet, K.; Embrechts, M.; Breneman, C.; Song, M. J. Mach. Learn. Res., 2003, 3, 1229-1243.
[90] Wegner, J.K.; Frohlich, H.; Zell, A. J. Chem. Inf. Comput. Sci., 2004, 44, 931-939.
[91] Xue, C.X.; Zhang, R.S.; Liu, H.X.; Yao, X.J.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 669-677.
[92] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Mach. Learn., 2002, 46, 389-422.
[93] Xue, Y.; Li, Z.R.; Yap, C.W.; Sun, L.Z.; Chen, X.; Chen, Y.Z. J. Chem. Inf. Comput. Sci., 2004, 44, 1630-1638.
[94] Xue, Y.; Yap, C.W.; Sun, L.Z.; Cao, Z.W.; Wang, J.F.; Chen, Y.Z. J. Chem. Inf. Comput. Sci., 2004, 44, 1497-1505.
[95] Xue, C.X.; Zhang, R.S.; Liu, H.X.; Yao, X.J.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 1693-1700.
[96] Xue, C.X.; Zhang, R.S.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 950-957.
[97] Xue, C.X.; Zhang, R.S.; Liu, H.X.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 1267-1274.
[98] Senese, C.L.; Duca, J.; Pan, D.; Hopfinger, A.J.; Tseng, Y.J. J. Chem. Inf. Comput. Sci., 2004, 44, 1526-1539.
[99] Trohalaki, S.; Pachter, R.; Geiss, K.; Frazier, J. J. Chem. Inf. Comput. Sci., 2004, 44, 1186-1192.
[100] Roy, K.; Ghosh, G. J. Chem. Inf. Comput. Sci., 2004, 44, 559-567.
[101] Hou, T.J.; Zhang, W.; Xia, K.; Qiao, X.B.; Xu, X.J. J. Chem. Inf. Comput. Sci., 2004, 44, 1585-1600.
[102] Hou, T.J.; Xia, K.; Zhang, W.; Xu, X.J. J. Chem. Inf. Comput. Sci., 2004, 44, 266-275.
[103] Yao, X.J.; Panaye, A.; Doucet, J.P.; Zhang, R.S.; Chen, H.F.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 1257-1266.
[104] Tino, P.; Nabney, I.T.; Williams, B.S.; Losel, J.; Sun, Y. J. Chem. Inf. Comput. Sci., 2004, 44, 1647-1653.
[105] Wold, S.; Ruhe, A.; Wold, H.; Dunn, W. SIAM J. Sci. Stat. Comput., 1984, 5, 735-743.
[106] Wold, S.; Sjostrom, M.; Eriksson, L. Chemom. Intell. Lab. Syst., 2001, 58, 109-130.
[107] Phatak, A.; de Jong, S. J. Chemom., 1997, 11, 311-338.
[108] Zhang, H.; Li, H.; Liu, C. J. Chem. Inf. Model., 2005, 45, 440-448.
[109] Waller, C.L. J. Chem. Inf. Comput. Sci., 2004, 44, 758-765.
[110] Svetnik, V.; Wang, T.; Tong, C.; Liaw, A.; Sheridan, R.P.; Song, Q. J. Chem. Inf. Model., 2005, 45, 786-799.
[111] Clark, M. J. Chem. Inf. Model., 2005, 45, 30-38.
[112] Catana, C.; Gao, H.; Orrenius, C.; Stouten, P.F.W. J. Chem. Inf. Model., 2005, 45, 170-176.
[113] Sun, H. J. Chem. Inf. Comput. Sci., 2004, 44, 748-757.
[114] Adenot, M.; Lahana, R. J. Chem. Inf. Comput. Sci., 2004, 44, 239-248.
[115] Feng, J.; Lurati, L.; Ouyang, H.; Robinson, T.; Wang, Y.; Yuan, S.; Young, S.S. J. Chem. Inf. Comput. Sci., 2003, 43, 1463-1470.
[116] Fisher, R. Ann. Eugen., 1936, 7, 179-188.
[117] Guha, R.; Jurs, P.C. J. Chem. Inf. Model., 2005, 45, 65-73.
[118] Murcia-Soler, M.; Perez-Gimenez, F.; Garcia-March, F.J.; Salabert-Salvador, T.; Diaz-Villanueva, W.; Castro-Bleda, M.J.; Villanueva-Pareja, A. J. Chem. Inf. Comput. Sci., 2004, 44, 1031-1041.
[119] Molina, E.; Diaz, H.G.; Gonzalez, M.P.; Rodriguez, E.; Uriarte, E. J. Chem. Inf. Comput. Sci., 2004, 44, 515-521.
[120] Mueller, K.-R.; Raetsch, G.; Sonnenburg, S.; Mika, S.; Grimm, M.; Heinrich, N. J. Chem. Inf. Model., 2005, 45, 249-253.
[121] Mazzatorta, P.; Benfenati, E.; Lorenzini, P.; Vighi, M. J. Chem. Inf. Comput. Sci., 2004, 44, 105-112.
[122] Hawkins, D.M. J. Chem. Inf. Comput. Sci., 2004, 44, 1-12.
[123] Klon, A.E.; Glick, M.; Davies, J.W. J. Chem. Inf. Comput. Sci., 2004, 44, 2216-2224.
[124] Cover, T.; Hart, P. IEEE Trans. Inform. Theory, 1967, 13, 21-27.
[125] Jain, A.; Mao, J.; Mohiuddin, K. Computer, 1996, 29, 31-44.
[126] Rosenblatt, F. Psychol. Rev., 1958, 65, 386-408.
[127] Gallant, S. IEEE Trans. Neural Networks, 1990, 1, 179-191.
[128] Hornik, K.; Stinchcombe, M.; White, H. Neural Networks, 1989, 2, 359-366.
[129] Agrafiotis, D.K.; Cedeno, W.; Lobanov, V.S. J. Chem. Inf. Comput. Sci., 2002, 42, 903-911.
[130] Chiu, T.-L.; So, S.-S. J. Chem. Inf. Comput. Sci., 2004, 44, 154-160.


[131] Gini, G.; Craciun, M.V.; Konig, C. J. Chem. Inf. Comput. Sci., 2004, 44, 1897-1902.
[132] Guha, R.; Jurs, P.C. J. Chem. Inf. Model., 2005, 45, 800-806.
[133] Mulgrew, B. IEEE Sig. Proc. Mag., 1996, 13, 50-65.
[134] Chen, S.; Cowan, C.F.N.; Grant, P.M. IEEE Trans. Neural Networks, 1991, 2, 302-309.
[135] Quinlan, J.R. Mach. Learn., 1986, 1, 81-106.
[136] Gelfand, S.; Ravishankar, C.; Delp, E. IEEE Trans. Pattern Anal. Mach. Intell., 1991, 13, 163-174.
[137] Mingers, J. Mach. Learn., 1989, 4, 227-243.
[138] Breiman, L.; Friedman, J.; Olshen, R.; Stone, C. Classification and Regression Trees; Wadsworth: California, 1984.
[139] Daszykowski, M.; Walczak, B.; Xu, Q.-S.; Daeyaert, F.; de Jonge, M.R.; Heeres, J.; Koymans, L.M.H.; Lewi, P.J.; Vinkers, H.M.; Janssen, P.A.; Massart, D.L. J. Chem. Inf. Comput. Sci., 2004, 44, 716-726.
[140] DeLisle, R.K.; Dixon, S.L. J. Chem. Inf. Comput. Sci., 2004, 44, 862-870.
[141] Bai, J.P.F.; Utis, A.; Crippen, G.; He, H.-D.; Fischer, V.; Tullman, R.; Yin, H.-Q.; Hsu, C.-P.; Jiang, L.; Hwang, K.-K. J. Chem. Inf. Comput. Sci., 2004, 44, 2061-2069.
[142] Cortes, C.; Vapnik, V. Mach. Learn., 1995, 20, 273-297.
[143] Vapnik, V. The Nature of Statistical Learning Theory; Springer-Verlag: New York, 1995.
[144] Burges, C. Data Min. Knowl. Discov., 1998, 2, 121-167.
[145] Boser, B.E.; Guyon, I.M.; Vapnik, V.N. In COLT '92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory; ACM Press: New York, NY, USA, 1992.
[146] Smola, A.; Schoelkopf, B. Stat. Comput., 2004, 14, 199-222.
[147] Burbidge, R.; Trotter, M.; Buxton, B.F.; Holden, S.B. Comput. Chem., 2001, 26, 5-14.
[148] Helma, C.; Cramer, T.; Kramer, S.; Raedt, L.D. J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411.
[149] Liu, H.X.; Xue, C.X.; Zhang, R.S.; Yao, X.J.; Liu, M.C.; Hu, Z.D.; Fan, B.T. J. Chem. Inf. Comput. Sci., 2004, 44, 1979-1986.
[150] Meir, R.; Raetsch, G. Lecture Notes in Computer Sci., 2003, 2600, 118-183.
[151] Breiman, L. Mach. Learn., 1996, 24, 123-140.
[152] Ho, T.K. IEEE Trans. Pat. Anal. Mach. Intell., 1998, 20, 832-844.
[153] Breiman, L. Mach. Learn., 2001, 45, 5-32.
[154] Svetnik, V.; Liaw, A.; Tong, C.; Culberson, C.; Sheridan, R.P.; Feuston, B.P. J. Chem. Inf. Comput. Sci., 2003, 43, 1947-1958.
[155] Freund, Y.; Schapire, R. J. Comp. Sys. Sci., 1997, 55, 119-139.
[156] Freund, Y.; Schapire, R. J. Japanese Soc. for Artificial Intelligence, 1999, 14, 771-780.
[157] Schapire, R.E.; Freund, Y.; Bartlett, P.; Lee, W.S. The Annals of Statistics, 1998, 26, 1651-1686.
[158] Kvasnicka, V.; Pospichal, J. J. Chem. Inf. Comput. Sci., 1996, 36, 516-526.
[159] Kier, L.B.; Hall, L.H. Quant. Struct.-Act. Relat., 1993, 12, 383-388.
[160] Skvortsova, M.; Baskin, I.; Slovokhotova, O.; Palyulin, P.; Zefirov, N. J. Chem. Inf. Comput. Sci., 1993, 33, 630-634.
[161] Galvez, J.; Garcia-Domenech, R.; Bernal, J.M.; Garcia-March, F.J. An. Real Acad. Farm., 1991, 57, 533-546.

    Received: November 1, 2005 Revised: November 15, 2005 Accepted: December 14, 2005
