
    270 Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, 2007

    Copyright 2007 Inderscience Enterprises Ltd.

Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons

    Bo Jin* and Yan-Qing Zhang

    Department of Computer Science,

    Georgia State University, Atlanta, GA 30302, USA

    E-mail: [email protected] E-mail: [email protected]

    *Corresponding author

    Binghe Wang

    Department of Chemistry and

    Center for Biotechnology and Drug Design,

    Georgia State University, Atlanta, GA 30302-4098, USA

    E-mail: [email protected]

Abstract: With growing interest in biological and chemical data prediction, more powerful and flexible kernels need to be designed so that the prior knowledge and relationships within data can be expressed effectively in kernel functions. In this paper, Granular Kernel Trees (GKTs) are proposed, and parallel Genetic Algorithms (GAs) are used to optimise the parameters of GKTs. In applications, SVMs with the new kernel trees are employed for drug activity comparisons. The experimental results show that GKTs and evolutionary GKTs can achieve better performance than traditional RBF kernels in terms of prediction accuracy.

Keywords: kernel design; Support Vector Machines; SVMs; Granular Kernel Trees; GKTs; Genetic Algorithms; GAs; drug activity comparisons; data mining; bioinformatics.

Reference to this paper should be made as follows: Jin, B., Zhang, Y-Q. and Wang, B. (2007) 'Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons', Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, pp.270–285.

Biographical notes: Bo Jin is a PhD student in the Computer Science Department at Georgia State University. He received his BE Degree from the University of Electronic Science and Technology of China. His research interests are in the areas of machine learning, data mining, chemical informatics and biomedical informatics.

Yan-Qing Zhang is currently an Associate Professor in the Computer Science Department at Georgia State University. He received a PhD Degree in Computer Science and Engineering from the University of South Florida in 1997. His research interests include hybrid intelligent systems, computational intelligence, granular computing, kernel machines, bioinformatics, data mining and computational web intelligence. He has published 3 books, 12 book chapters, 49 journal papers and over 100 conference papers. He has served as a reviewer for 37 international journals, and a committee member in over 70 international conferences. He is a program co-chair of IEEE-GrC 2006.


Binghe Wang is Professor of Chemistry at Georgia State University, Georgia Research Alliance Eminent Scholar in Drug Discovery, and Georgia Cancer Coalition Distinguished Cancer Scientist. He obtained his BS Degree from Beijing Medical College in 1982 and his PhD Degree in Medicinal Chemistry from the University of Kansas, School of Pharmacy in 1991. He is Editor-in-Chief of Medicinal Research Reviews published by John Wiley and Sons and the Series Editor of A Wiley Series in Drug Discovery and Development. His research expertise includes drug delivery, drug design and synthesis, bioorganic chemistry, fluorescent sensors, and combinatorial chemistry.

    1 Introduction

Kernel methods, specifically Support Vector Machines (SVMs) (Boser et al., 1992; Cortes and Vapnik, 1995; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998), have been widely used in many fields, such as bioinformatics (Schölkopf et al., 2004) and chemical informatics (Burbidge et al., 2001; Weston et al., 2003), for data classification and pattern recognition. With the help of a kernel's nonlinear mapping, input data are transformed into a high-dimensional feature space where it is easy for SVMs to find a hyperplane to separate the data. SVMs' performance is mainly determined by the kernel function, yet traditional kernels, such as RBF kernels and polynomial kernels, do not take into consideration the relationships and structure within each data item but simply treat each data vector as one unit in operations. With growing interest in biological and chemical data prediction, such as structure-property based molecule comparison, protein structure prediction and long DNA sequence comparison, more complicated kernels have been designed to integrate data structures, such as string kernels (Cristianini and Shawe-Taylor, 1999; Lodhi et al., 2001), tree kernels (Collins and Duffy, 2002; Kashima and Koyanagi, 2002) and graph kernels (Gärtner et al., 2003; Kashima and Inokuchi, 2002), all based on the kernel decomposition concept. For a detailed review, see Gärtner (2003). One common characteristic of these kernels is that feature transformations are implemented according to objects' structures without a separate input feature generation step. Many of them directly implement inner product operations with some kind of iterative calculation. These transformations are very efficient when objects contain a large amount of structured information. However, for many challenging problems, objects are not structured, or some relationships within objects are not easy to describe directly. Furthermore, essential optimisations are needed once kernel functions are defined. It should be mentioned that Haussler (1999) first introduced decomposition-based kernel design in detail and proposed convolution kernels.

In this paper, we use granular computing concepts to redescribe decomposition-based kernel design and propose an evolutionary hierarchical approach to integrate prior knowledge, such as data structures and feature relationships, into kernel design. Features within an input vector are grouped into feature granules according to the composition and structure of each data item. Each feature granule captures a particular aspect of a data item. For two input vectors, the similarity between a pair of feature granules is measured by a kernel function called a granular kernel. Granular kernels for different kinds of feature granules are fused together by hierarchical trees, called GKTs. Parallel GAs are used to optimise GKTs and select an effective SVMs model.


In applications, SVMs with the new kernel trees are employed for the comparison of drug activities, a problem in Quantitative Structure-Activity Relationship (QSAR) analysis. QSAR is an important technique used in drug design, which describes the relationships between compound structures and their activities. In QSAR analysis, compounds with different activities are discriminated, and then predictive rules are constructed. In this study, inhibitors of E. coli dihydrofolate reductase (DHFR) are analysed. These inhibitors are potential therapeutic agents for the treatment of malaria, bacterial infection, toxoplasmosis, and cancer. Experimental results show that SVMs with both GKTs and EGKTs can achieve much better performance than SVMs with traditional RBF kernels in terms of prediction accuracy.

The rest of the paper is organised as follows. Granular kernels, kernel tree design and evolutionary optimisation are proposed in Section 2. Section 3 describes the experiments on drug activity comparisons. Finally, Section 4 gives conclusions and directions for future work.

2 Granular kernel and kernel tree design

    2.1 Definitions

Definition 1 (Cristianini and Shawe-Taylor, 1999): A kernel is a function $K$ that for all $x, z \in X$ satisfies

$$K(x, z) = \langle \phi(x), \phi(z) \rangle \qquad (1)$$

where $\phi$ is a mapping from the input space $X = R^n$ to an inner product feature space $F = R^N$:

$$\phi: x \mapsto \phi(x) \in F. \qquad (2)$$

Definition 2: A feature granule space $G$ of the input space $X = R^n$ is a subspace of $X$, where $G = R^m$ and $1 \le m \le n$. From the input space we may generate many feature granule spaces, and some of them may overlap on some feature dimensions.

Definition 3: A feature granule $g \in G$ is a vector defined in the feature granule space $G$.

Definition 4: A granular kernel $gK$ is a kernel that for all $g, g' \in G$ satisfies

$$gK(g, g') = \langle \phi(g), \phi(g') \rangle \qquad (3)$$

where $\phi$ is a mapping from the feature granule space $G = R^m$ to an inner product feature space $R^E$:

$$\phi: g \mapsto \phi(g) \in R^E. \qquad (4)$$
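As a concrete illustration (ours, not from the paper), a granular kernel can be any valid kernel restricted to a subset of the input dimensions. The sketch below applies an RBF function to a hypothetical granule made of two feature indices:

```python
import numpy as np

def granular_rbf(x, z, idx, gamma):
    """RBF granular kernel: compares only the feature granule selected by idx."""
    g, gp = x[idx], z[idx]
    return float(np.exp(-gamma * np.sum((g - gp) ** 2)))

x = np.array([1.0, 2.0, 3.0, 4.0])
z = np.array([1.0, 2.5, 3.0, 4.5])
# granule = features {1, 3}; the remaining dimensions are ignored
k = granular_rbf(x, z, idx=[1, 3], gamma=0.5)
```

Because only the selected dimensions enter the distance, two vectors that agree on a granule get similarity 1 there regardless of their other features.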


2.2 Granular kernel properties

Property 1: Granular kernels inherit the properties of traditional kernels, such as closure under sum, product, and multiplication by a positive constant over the granular feature spaces. Let $G$ be a feature granule space and $g, g' \in G$. Let $gK_1$ and $gK_2$ be two granular kernels operating over the same space $G \times G$. The following $gK(g, g')$ are also granular kernels:

$$gK(g, g') = c \, gK_1(g, g'), \quad c \in R^+ \qquad (5)$$

$$gK(g, g') = gK_1(g, g') + c, \quad c \in R^+ \qquad (6)$$

$$gK(g, g') = gK_1(g, g') + gK_2(g, g') \qquad (7)$$

$$gK(g, g') = gK_1(g, g') \, gK_2(g, g') \qquad (8)$$

$$gK(g, g') = f(g) f(g'), \quad f: G \to R \qquad (9)$$

$$gK(g, g') = \frac{gK_1(g, g')}{\sqrt{gK_1(g, g) \, gK_1(g', g')}}. \qquad (10)$$

These properties follow directly from the traditional kernel properties.
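The closure properties can be checked numerically. The sketch below (our own illustration, not from the paper) builds Gram matrices of two RBF granular kernels on random data, forms the combinations in equations (5)-(8), and confirms each stays positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))

def rbf(a, b, gamma):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def gram(kfun):
    """Gram matrix of a kernel function over the sample X."""
    return np.array([[kfun(a, b) for b in X] for a in X])

gK1 = gram(lambda a, b: rbf(a[:2], b[:2], 0.3))  # granular kernel on features 0-1
gK2 = gram(lambda a, b: rbf(a[2:], b[2:], 0.7))  # granular kernel on features 2-3

# positive scaling, constant shift, sum and product of Gram matrices
combos = [2.0 * gK1, gK1 + 1.0, gK1 + gK2, gK1 * gK2]
min_eigs = [np.linalg.eigvalsh(K).min() for K in combos]
```

A kernel is valid exactly when every Gram matrix it produces is positive semi-definite, so the smallest eigenvalue of each combination should be non-negative up to rounding.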

Property 2 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the sum operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels

$$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$$

$$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$$

$gK$ and $gK'$ operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$$gK_1(g_1, g_1') + gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) + gK'((g_1, g_2), (g_1', g_2')).$$

According to the sum closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') + gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.

Property 3 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the product operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels

$$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$$

$$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$$

So $gK$ and $gK'$ operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$$gK_1(g_1, g_1') \, gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) \, gK'((g_1, g_2), (g_1', g_2')).$$

According to the product closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') \, gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.

2.3 GKTs and EGKTs

An easy and effective way to construct new kernel functions is to combine a group of granular kernels via simple operations such as sum and product. The new kernel functions can be naturally expressed as tree structures. The following are the main steps of GKTs design.

Step 1: Features are bundled into feature granules according to some prior knowledge, such as object structures and feature relationships, or with an automatic learning algorithm.

Step 2: A tree structure is constructed with a suitable number of layers, nodes and connections. As in the first step, we can construct trees according to some prior knowledge or with an automatic learning algorithm. Figure 1 shows a kind of GKTs with $m$ basic granular kernels $gK_t$ and $m$ pairs of feature granules $g_t$ and $g_t'$, where $1 \le t \le m$.

Step 3: Granular kernels are selected from the candidate kernel set. Popular traditional kernels such as RBF kernels and polynomial kernels can be chosen as granular kernels, since these kernels have proved successful in many real problems. Special kernels designed for particular problems could also be selected as granular kernels if they are good at measuring the similarities of the corresponding feature granules.

Step 4: Parameters of granular kernels and operations of connection nodes are selected. Each connection operation in GKTs can be a sum or a product. A positive connection weight may be associated with each edge in the tree, and a granular kernel may belong to one or more subtrees.
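The steps above can be sketched in code. The fragment below is our illustration (granule boundaries, weights and gamma values are hypothetical) of a two-layer kernel tree: a weighted sum of RBF granular kernels, one per feature granule:

```python
import numpy as np

def rbf(a, b, gamma):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def gkt(x, z, granules, gammas, weights):
    """Two-layer GKT: weighted sum of one RBF granular kernel per granule."""
    return float(sum(w * rbf(x[s], z[s], g)
                     for s, g, w in zip(granules, gammas, weights)))

# two hypothetical granules of nine features each
granules = [slice(0, 9), slice(9, 18)]
x, z = np.ones(18), np.zeros(18)
k = gkt(x, z, granules, gammas=[0.1, 0.2], weights=[0.6, 0.4])
```

Because sums and products of kernels with positive weights are again kernels (Properties 1-3), any such tree of granular kernels is itself a valid kernel.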

In this paper, GAs are used to find the optimum parameter settings of GKTs. We use EGKTs to denote such evolutionary GKTs. The following are the basic definitions and operations used in optimising EGKTs.

Chromosome: Let $P_i$ denote the population in generation $G_i$, where $i = 1, \ldots, m$ and $m$ is the total number of generations. Each population $P_i$ has $p$ chromosomes $c_{ij}$, $j = 1, \ldots, p$. Each chromosome $c_{ij}$ has $q$ genes $g_t(c_{ij})$, where $t = 1, \ldots, q$. Here each gene is a parameter of GKTs, and we use GKTs$(c_{ij})$ to denote GKTs configured with the genes $g_t(c_{ij})$, $t = 1, \ldots, q$.


Fitness: There are several methods to evaluate SVMs' performance. One is $k$-fold cross-validation, a popular technique for performance evaluation. Others evaluate theoretical bounds on the generalisation error, such as the Xi-Alpha bound (Joachims, 2000), the VC bound (Vapnik, 1998), the radius-margin bound and the VC span bound (Vapnik and Chapelle, 2000). A detailed review can be found in Duan et al. (2003). In this paper we use $k$-fold cross-validation to evaluate SVMs' performance in the training phase.

Figure 1 An example of GKTs

In $k$-fold cross-validation, the training data set $S$ is separated into $k$ mutually exclusive subsets $S_v$:

$$S = \bigcup_{v=1}^{k} S_v, \quad v = 1, \ldots, k. \qquad (11)$$

For $v = 1, \ldots, k$, the data set $S \setminus S_v$ is used to train SVMs with GKTs$(c_{ij})$, and $S_v$ is used to evaluate the SVMs model. After $k$ rounds of training and testing on the different subsets, we get $k$ prediction accuracies. The fitness $f_{ij}$ of chromosome $c_{ij}$ is calculated by

$$f_{ij} = \frac{1}{k} \sum_{v=1}^{k} Acc_v \qquad (12)$$

where $Acc_v$ is the prediction accuracy of GKTs$(c_{ij})$ on $S_v$.
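Equation (12) is straightforward to implement. The sketch below is ours, not the paper's code; a trivial majority-vote evaluator stands in for training an SVM with GKTs$(c_{ij})$ on each fold:

```python
import numpy as np

def kfold_fitness(X, y, k, train_eval):
    """Fitness f_ij = (1/k) * sum of the k fold accuracies (equation (12))."""
    folds = np.array_split(np.arange(len(X)), k)
    accs = []
    for v in range(k):
        test = folds[v]
        train = np.concatenate([folds[u] for u in range(k) if u != v])
        accs.append(train_eval(X[train], y[train], X[test], y[test]))
    return float(np.mean(accs))

def majority(Xtr, ytr, Xte, yte):
    """Stand-in for SVM training: predict the majority training label."""
    pred = 1.0 if ytr.sum() >= 0 else -1.0
    return float(np.mean(yte == pred))

X = np.arange(8.0).reshape(4, 2)
y = np.array([1.0, 1.0, 1.0, -1.0])
f = kfold_fitness(X, y, k=2, train_eval=majority)
```

Passing the trainer-evaluator as a function keeps the fitness routine independent of the underlying model, which is convenient when each chromosome configures a different kernel.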


Selection: The roulette wheel method described in Michalewicz (1996) is used to select individuals for the new population.

Crossover: Two chromosomes are first selected randomly from the current generation as parents, and then a crossover point is randomly selected to split the chromosomes. The parts of the chromosomes are exchanged between the parents to generate two children.

Mutation: Some chromosomes are randomly selected, and some genes are randomly chosen from each selected chromosome for mutation. The values of the mutated genes are replaced by random values.
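A minimal sketch of these two operators on real-valued chromosomes (our own illustration; the gene values and ranges are hypothetical):

```python
import random

def crossover(p1, p2, rng):
    """One-point crossover: exchange the tails of two parent chromosomes."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom, rate, low, high, rng):
    """Replace each gene by a fresh uniform random value with probability rate."""
    return [rng.uniform(low, high) if rng.random() < rate else g
            for g in chrom]

rng = random.Random(0)
c1, c2 = crossover([1, 2, 3, 4], [5, 6, 7, 8], rng)
child = mutate(c1, rate=0.5, low=0.0001, high=1.0, rng=rng)
```

One-point crossover preserves chromosome length and merely redistributes the parents' genes, which the assertions below rely on.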

2.4 Parallel GAs

We use parallel GAs to speed up SVMs model selection and parameter optimisation. In the literature, several parallel algorithms have been designed for SVMs. In Dong et al. (2003), a parallelisation approach is proposed where the SVM kernel matrix is approximated by block diagonal matrices, so that the original optimisation problem can be rewritten as hundreds of sub-problems. In Zanghirati et al. (2003) and Serafini et al. (2004), a Gradient Projection Method (GPM) is presented and implemented for parallel computation in SVMs. The decomposition technique is used to split the SVM Quadratic Programming (QP) problem into smaller QP sub-problems, each of which is solved by GPM. The related SVMs software can be used in both scalar and distributed-memory parallel environments. Graf et al. (2005) develop a kind of parallel SVMs called a Cascade of SVMs in a distributed environment, where smaller optimisations are solved independently. The partial results are combined and filtered again in the Cascade of SVMs until the global optimum is reached. Convergence to the global optimum is guaranteed with multiple passes through the Cascade.

Besides the works mentioned above, Runarsson and Sigurdsson (2004) use a parallel method to speed up evolutionary model selection for SVMs. The algorithm is implemented on a multi-processor computer in C++ using standard POSIX threads.

In GKTs optimisation, all parameters and operations to be optimised are independent in each generation, so the problem is well suited to a parallel GAs based system. Parallel GAs (Cantú-Paz, 1998; Adamidis, 1994; Lin et al., 1997) have been well studied in recent years. There are three common types of parallel GAs models:

    single population master-slave models

    single population fine-grained models

    multiple population coarse-grained models.

In this paper, the parallel GAs system is designed based on the first type of model. In the system, one processor is chosen as the master, which stores the population, performs selection, crossover and mutation, and then distributes individuals to slave processors on the cluster. Each single SVMs model is trained and evaluated on one of the slave processors with the received individual (parameters). After fitness evaluation, each slave sends the fitness value back to the master. The architecture of the parallel GAs is shown in Figure 2. The parallel GAs-SVMs system has several characteristics. First, it is a global GAs-SVMs system, since all evaluations and operations are performed on the entire population. Second, the implementation is easy, clear, practical, and especially suitable for SVMs model selection. Third, the system can easily be moved to a large distributed computing environment, such as a grid-computing system.
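The master-slave loop can be sketched as follows (our illustration; a thread pool stands in for the cluster's slave processors, and a dummy fitness function stands in for SVM training and cross-validation):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(chromosome):
    """Stand-in fitness: in the paper, a slave trains and cross-validates
    an SVMs model configured with this chromosome's parameters."""
    return sum(chromosome)

# the master holds the population and farms individuals out to the "slaves"
population = [[0.1 * t for t in range(5)] for _ in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    fitnesses = list(pool.map(evaluate, population))
```

Only the chromosomes travel to the workers and only scalar fitness values travel back, mirroring the low-communication property the paper attributes to the master-slave design.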

    Figure 2 Parallel GAs model

QP decomposition based parallel computing can also speed up SVMs model selection in a distributed system, but if the training data set is large, the communication costs for transferring the sub-QP intermediate results will be very high. On the other hand, in SVMs model selection, each SVMs model spends most of its time on the QP calculation, which generally has a higher magnitude of running time than the operations in GAs. In the master-slave based parallel GAs-SVMs system, only parameters and fitness values need to be transferred between the master and the slaves, so the communication costs are low. Figure 3 shows an example of running time and speedup with parallel GAs on a cluster system. The cluster is a shared-disk, distributed-memory platform. In the example, the size of the dataset is 314, RBF is chosen as the kernel function, the population size is set to 300 and the number of generations is set to 50. For this example, the speedup reaches 10 with 14 nodes, where each node is a processor. The system architecture of SVMs with EGKTs is shown in Figure 4. In practice, the regularisation parameter $C$ of SVMs is also optimised by the parallel GAs.

Figure 3 An example of running time and speedup with parallel GAs: (a) running time and (b) speedup


Figure 4 System architecture of SVMs with EGKTs

3 Experiments

Since RBF kernels (equation (13)) usually perform best among traditional kernels, we compare GKTs and EGKTs with RBF kernels. To make a fair comparison with EGKTs, the traditional RBF kernels are also optimised using GAs. We use E-RBF to denote the GAs based RBF kernels.

$$K(x, z) = \exp(-\gamma \|x - z\|^2). \qquad (13)$$

3.1 Drug sets

The drug datasets used in the experiments are pyrimidines and triazines, which are described in Hirst et al. (1994a, 1994b) and available at the UCI Repository of machine learning databases (Newman, 1998). The pyrimidines dataset contains 55 drugs, and each drug has three possible substitution positions (R3, R4 and R5; see Figure 5(a)). Each substituent is characterised by nine chemical property features: polarity, size, flexibility, hydrogen-bond donor, hydrogen-bond acceptor, π donor, π acceptor, polarisability, and σ effect. Drug activities are determined by the substituents. If no substituent occupies a possible position, the features are indicated by nine 1s. Each input vector includes the features of two drugs in a fixed feature order. In one vector, if the activity of the first drug is higher than that of the second one, the vector is labelled positive; otherwise it is labelled negative. So the feature number of one vector is 54.
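The pairwise labelling can be sketched as follows (the drug names and activity values are hypothetical):

```python
from itertools import permutations

activities = {"d1": 0.9, "d2": 0.5, "d3": 0.7}  # hypothetical activities

# the ordered pair (a, b) is labelled +1 if drug a is more active than drug b;
# pairs with equal activities would be dropped, as in the paper
pairs = [(a, b, 1 if activities[a] > activities[b] else -1)
         for a, b in permutations(activities, 2)]
```

Each unordered pair thus appears twice with opposite labels, which is why the dataset sizes below count ordered pairs.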

Figure 5 Drug structures: (a) pyrimidines and (b) triazines


The pyrimidines dataset is randomly shuffled and split into two parts in the proportion 4:1. One part is used as the training set, which contains the pairs formed from 44 compounds. The other part is the unseen testing set, which contains the pairs among the remaining 11 compounds and the pairs between those compounds and the training compounds. So the size of the training set should be 44 × 43 = 1892 and the size of the testing set should be 44 × 11 × 2 + 11 × 10 = 1078. Because pairs with the same activities are deleted, the actual data sets are slightly smaller than these counts.

The structure of triazines is described in Figure 5(b). In the triazines dataset, each compound has six possible substitution positions: the positions R3 and R4; if the substituent at R3 contains a ring itself, the R3 and R4 positions of that ring; and similarly, if the substituent at R4 contains a ring itself, the R3 and R4 positions of that ring. Ten features are used to characterise each position: a structure branching feature and the same nine features used for each substituent of pyrimidines. If no substituent occupies a possible position, the features are indicated by ten 1s. So each vector has 120 features. We randomly select 60 drugs from the triazines dataset and then randomly shuffle and split them into two parts in the proportion 4:1 based on pairs of drugs, as above.

3.2 Feature granules and GKTs design

In the experiments, the input vectors are decomposed according to the possible substituent locations. Each feature granule includes all features of one substituent (see Figure 6). For pyrimidines, each drug pair has six feature granules, and each feature granule has nine features. For triazines, each drug pair has twelve feature granules, each of size 10.
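This decomposition amounts to reshaping each pair vector; a sketch for the pyrimidines case (our illustration; the feature values are placeholders):

```python
import numpy as np

N_PROPS = 9      # nine chemical property features per substituent
POSITIONS = 3    # substitution positions R3, R4, R5 of a pyrimidine

# a 54-feature drug-pair vector (placeholder values)
pair = np.arange(2 * POSITIONS * N_PROPS, dtype=float)

# six feature granules, one per substituent position of each drug in the pair
granules = pair.reshape(2 * POSITIONS, N_PROPS)
```

Each row is then fed to its own granular kernel; the triazines case is identical with twelve rows of ten features.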

Figure 6 Feature granules: (a) pyrimidines and (b) triazines


We design two kinds of GKTs for each dataset, shown in Figure 7. GKTs-1 and GKTs-2 are used for pyrimidines; GKTs-3 and GKTs-4 are used for triazines. GKTs-1 and GKTs-3 are two-layer kernel trees in which each granular kernel's importance is controlled by its outgoing connection weight. GKTs-2 and GKTs-4 are three-layer kernel trees in which each drug of the pair is represented by a two-layer subtree; the two subtrees are combined by a product operation at the top of the tree.

3.3 Experimental setup

RBF kernel functions are also chosen as the granular kernel functions in each GKTs, so each granular kernel $gK_i$ has an RBF parameter $\gamma_i$. The initial ranges of all RBF parameters $\gamma$ and $\gamma_i$ are set to [0.0001, 1]. The initial range of the regularisation parameter $C$ is [1, 256]. The crossover probability is 0.7 and the mutation ratio is 0.5. The range of the connection weights is [0.001, 1]. 5-fold cross-validation is used on the pyrimidines training dataset and 8-fold cross-validation on the triazines training dataset. In cross-validation, the training data are split in the same way as described in Subsection 3.1. The population size is set to 500 and the number of generations to 30 for both datasets. The SVMs software package used in the experiments is LibSVM (Chang and Lin, 2001).

Figure 7 Granular Kernel Trees: (a) GKTs-1 and (b) GKTs-2 for pyrimidines; (c) GKTs-3 and (d) GKTs-4 for triazines



3.4 Experimental results and comparisons

Table 1 shows the performance of the three GAs based kernels on the pyrimidines dataset. EGKTs-1 is the evolutionary GKTs-1 and EGKTs-2 is the evolutionary GKTs-2. From Table 1 we can see that SVMs with the two kinds of EGKTs outperform SVMs with E-RBF by 3.3% and 3.0% respectively in terms of prediction accuracy on the unseen testing dataset. The fitness values and training accuracies of SVMs with EGKTs are also higher than those of SVMs with E-RBF kernels. It is also shown that the testing accuracy of SVMs with EGKTs-1 is slightly higher than that of SVMs with EGKTs-2 on pyrimidines.

Table 1 Prediction accuracies on pyrimidines dataset

                      E-RBF (%)   EGKTs-1 (%)   EGKTs-2 (%)
Fitness               84.5        86.6          88.5
Training accuracy     96.8        96.8          98.8
Testing accuracy      88.4        91.7          91.4

The performance of the three GAs based kernels on the triazines dataset is shown in Table 2. On testing accuracy, SVMs with EGKTs-3 (evolutionary GKTs-3) and EGKTs-4 (evolutionary GKTs-4) are better than SVMs with E-RBF by 3.7% and 4.9% respectively. We find that the training accuracies are much higher than both the testing accuracies and the fitness values for all three kernels on both datasets, especially on the triazines dataset. The likely reason is that the data are complicated and SVMs easily overfit the training dataset.


Table 2 Prediction accuracies on triazines dataset

                      E-RBF (%)   EGKTs-3 (%)   EGKTs-4 (%)
Fitness               73.8        74.6          75.8
Training accuracy     93.4        97.2          98.7
Testing accuracy      79.6        83.3          84.5

The comparisons between RBF kernels and GKTs are made using a large number of kernel parameter samples. We randomly generate 2000 values of C from [1, 256] for SVMs and 2000 groups of kernel parameters for each kernel. SVMs are trained and tested with these random parameters. For each dataset, the prediction accuracy curves of the three kernels are drawn in one figure (Figures 8 and 9), each ordered by the C values. From Figures 8 and 9, it is easy to see that the performance of GKTs is better than that of the RBF kernels. Quartiles and the mean are also used to summarise each kernel's performance in terms of testing accuracy. The results are listed in Tables 3 and 4. Based on the differences in the Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and mean values, we can conclude that the performance of the two GKTs is better than that of the RBF kernels by about 2.3–3.4% on pyrimidines and 3.6–4.5% on triazines.

Table 3 Testing accuracies on pyrimidines dataset with 2000 groups of random parameters

                      RBF (%)   GKTs-1 (%)   GKTs-2 (%)
Maximum               91.0      93.2         93.0
75th percentile       88.4      91.7         91.0
Median                88.0      91.3         90.6
25th percentile       87.5      90.9         90.1
Minimum               83.5      87.0         87.2
Mean                  88.2      91.2         90.5

Table 4 Testing accuracies on triazines dataset with 2000 groups of random parameters

                      RBF (%)   GKTs-3 (%)   GKTs-4 (%)
Maximum               83.9      88.2         88.2
75th percentile       79.9      83.7         84.1
Median                78.5      82.6         83.0
25th percentile       77.9      81.5         82.0
Minimum               72.2      77.8         76.2
Mean                  78.9      82.6         83.0

We can see that almost all testing accuracies of EGKTs in Tables 1 and 2 are better than the maximum testing accuracies of the RBF kernels in Tables 3 and 4. We can also see that the testing accuracies of the GAs based kernel methods stabilise at around the Q3 level of the random-parameter results.


    Figure 8 Testing accuracy comparisons on pyrimidines

    Figure 9 Testing accuracy comparisons on triazines

4 Conclusions and future work

This paper has proposed an approach to construct GKTs according to the granular kernel concept and its properties. The experimental results have shown that GKTs and EGKTs perform better than traditional RBF kernels in drug activity comparisons. It is promising to construct more powerful and suitable kernels using this kind of evolutionary hierarchical kernel design. In the future, we will continue our research on evolutionary granular kernel tree design for other problems. How to generate feature granules could be one issue in cases where the relationships among features are complex.


    Acknowledgements

    This work is supported in part by NIH under P20 GM065762. Bo Jin is supported by

    Molecular Basis for Disease (MBD) Doctoral Fellowship Program.

    References

    Adamidis, P. (1994) Review of parallel genetic algorithms bibliography, Internal T.R., AristotleUniversity of Thessaloniki, Greece.

    Berg, C., Christensen, J.P.R. and Ressel, P. (1984) Harmonic Analysis on Semigroups-Theoryof Positive Definite and Ralated Functions, Springer-Verlag, New York, USA.

    Boser, B., Guyon, I. and Vapnik, V.N. (1992) A training algorithm for optimal margin classifiers, Proc. Fifth Annual Workshop on Computational Learning Theory, ACM Press, USA,

    pp.144152.Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) Drug design by machine learning:

    support vector machines for pharmaceutical data analysis, Computers and Chemistry,Vol. 26, No. 1, pp.415.

    Cant-Paz, E. (1998) A survey of parallel genetic algorithms, Calculateurs Paralleles, Hermes,Paris, Vol. 10, No. 2, pp.141171.

    Chang, C-C. and Lin, C-J. (2001) LIBSVM: A Library for Support Vector Machines, Softwareavailable at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

    Collins, M. and Duffy, N. (2002) Convolution kernels for natural language, in Dietterich, T.G.,Becker, S. and Ghahramani, Z. (Eds.): Advances in Neural Information Processing Systems,MIT Press, Cambridge, MA, Vol. 14, pp.625632.

    Cortes, C. and Vapnik, V.N. (1995) Support-vector networks, Machine Learning Vol. 20,pp.273297.

    Cristianini, N. and Shawe-Taylor, J. (1999) An Introduction to Support Vector Machines:And other Kernel-based Learning Methods, Cambridge University Press, NY.

    Dong, J.X., Krzyzak, A. and Suen, C.Y. (2003) A fast parallel optimization for training support vector machine, in Perner, P. and Rosenfeld, A. (Eds.): Proceedings of the 3rd International Conference on Machine Learning and Data Mining, Springer Lecture Notes in Artificial Intelligence (LNAI 2734), Leipzig, Germany, pp.96–105.

    Duan, K., Keerthi, S.S. and Poo, A.N. (2003) Evaluation of simple performance measures for tuning SVM hyperparameters, Neurocomputing, Vol. 51, pp.41–59.

    Gärtner, T. (2003) A survey of kernels for structured data, ACM SIGKDD Explorations Newsletter, Vol. 5, pp.49–58.

    Gärtner, T., Flach, P.A. and Wrobel, S. (2003) On graph kernels: hardness results and efficient alternatives, Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop.

    Graf, H-P., Cosatto, E., Bottou, L., Durdanovic, I. and Vapnik, V.N. (2005) Parallel support vector machines: the cascade SVM, in Saul, L., Weiss, Y. and Bottou, L. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 17, pp.513–520.

    Haussler, D. (1999) Convolution kernels on discrete structures, Technical Report UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.

    Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994a) Quantitative structure-activity relationships by neural networks and inductive logic programming. I. The inhibition of dihydrofolate reductase by pyrimidines, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.405–420.



    Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994b) Quantitative structure-activity relationships by neural networks and inductive logic programming. II. The inhibition of dihydrofolate reductase by triazines, Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.421–432.

    Joachims, T. (2000) Estimating the generalization performance of a SVM efficiently, Proceedings of the International Conference on Machine Learning, Morgan Kaufmann.

    Kashima, H. and Inokuchi, A. (2002) Kernels for graph classification, Proc. 1st ICDM Workshop on Active Mining (AM-2002), Maebashi, Japan.

    Kashima, H. and Koyanagi, T. (2002) Kernels for semi-structured data, Proceedings of the Nineteenth International Conference on Machine Learning, pp.291–298.

    Lin, S-H., Goodman, E.D. and Punch III, W.F. (1997) Investigating parallel genetic algorithms on job shop scheduling problem, Proceedings of the 6th International Conference on Evolutionary Programming VI.

    Lodhi, H., Shawe-Taylor, J., Cristianini, N. and Watkins, C. (2001) Text classification using string kernels, in Leen, T., Dietterich, T. and Tresp, V. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 13, pp.563–569.

    Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin.

    Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J. (1998) UCI Repository of Machine Learning Databases, [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California, Department of Information and Computer Science, Irvine, CA.

    Runarsson, T.P. and Sigurdsson, S. (2004) Asynchronous parallel evolutionary model selection for support vector machines, Neural Information Processing Letters and Reviews, Vol. 3, No. 3, pp.59–67.

    Schölkopf, B., Tsuda, K. and Vert, J-P. (2004) Kernel Methods in Computational Biology, MIT Press, Cambridge, MA.

    Serafini, T., Zanni, L. and Zanghirati, G. (2004) Parallel GPDT: A Parallel Gradient Projection-based Decomposition Technique for Support Vector Machines, http://www.dm.unife.it/gpdt.

    Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK.

    Vapnik, V.N. (1998) Statistical Learning Theory, John Wiley and Sons, New York.

    Vapnik, V.N. and Chapelle, O. (2000) Bounds on error expectation for support vector machine, in Smola, A., Bartlett, P., Schölkopf, B. and Schuurmans, D. (Eds.): Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, pp.261–280.

    Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A. and Schölkopf, B. (2003) Feature selection and transduction for prediction of molecular bioactivity for drug design, Bioinformatics, Vol. 19, No. 6, pp.764–771.

    Zanghirati, G. and Zanni, L. (2003) Parallel solver for large quadratic programs in training support vector machines, Parallel Computing, Vol. 29, pp.535–551.
