
A graph theoretical approach to SPR

  • 8/20/2019 A graph theoretical approach to SPR

    Symposium on Graph Theory in Chemistry

    A Graph-Theoretical Approach

    to Structure-Property Relationships

    Zlatko Mihalić

    Faculty of Science and Mathematics, The University of Zagreb, Strossmayerov trg 14, 41000 Zagreb, The Republic of Croatia

    Nenad Trinajstić

    The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, The Republic of Croatia

    A fundamental concept of chemistry is that the structural characteristics of a molecule are responsible for its properties (1). This was pointed out in the middle of the last century by Crum Brown and Fraser (2), who had also devised one of the first structure-property models. However, the earliest work in which this relationship was observed (the toxicity of methyl and amyl alcohols) was a thesis by Cros in 1863 (3).

    A Topological Model of Matter

    The origin of the structure-property concept can be traced (4) to the work of the Croatian Jesuit priest, scientist, and philosopher Rugjer Josip Bošković (5), who introduced the idea of representing atoms as points in space (6). (His major work was the theory of a single law of forces.) By allowing the point atoms to assume a variety of different arrangements, Bošković was able to account for the existence of different substances.

    In this way the Bošković model may be considered as the forerunner of a topological model for the structure of matter. Bošković's fundamental idea, which is of the greatest importance in chemistry, was that substances have different properties because they have different structures. This idea was used, for example, by Davy to rationalize the difference between diamond and graphite (4, 7).

    Table 1. List of Selected Topological Indices

    Topological index   Standard symbol   Structural interpretation                                                       Author (Year)
    Wiener number       W                 Sum of distances in a molecular graph                                           Wiener (1947)
    Hosoya index        Z                 Sum of counts of nonadjacent edges in a molecular graph                         Hosoya (1971)
    Randić index        χ                 Sum of weighted edges in a molecular graph                                      Randić (1975)
    Balaban index       J                 Sum of weighted distances in a molecular graph                                  Balaban (1982)
    Schultz index       MTI               Sum of elements of the structural row matrix v[A + D] of a molecular graph      Schultz (1989)
    Harary number       H                 Sum of squares of reciprocal distances in a molecular graph                     Plavšić, Nikolić, Trinajstić (1991)

    Table 2. List of Properties That Are Desirable for Topological Indices as Proposed by Randić (18)

    1. Direct structural interpretation
    2. Good correlation with at least one molecular property
    3. Good discrimination of isomers
    4. Locally defined
    5. Generalizable
    6. Linearly independent
    7. Simplicity
    8. Not based on physical or chemical properties
    9. Not trivially related to other indices
    10. Efficiency of construction
    11. Based on familiar structural concepts
    12. Correct size dependence
    13. Gradual change with gradual change in structures

    QSPR

    The structure-property relationships quantify the connection between the structure and properties of


    ... proliferation will stop in the near future. Here we will review only several selected topological indices. Table 1 lists six topological indices that will be considered in this report. Table 2 gives a list of useful properties that are desirable for topological indices (18).

    The desirable properties proposed by Randić (18) represent the very high level of sophistication that a topological index should achieve. All six indices listed in Table 1 approach this ideal. Their weakest point is the discrimination of isomers. This particular property is rather low for all topological indices considered here except the Balaban index (19-22). However, this is the weak point of most topological indices, except for molecular identification numbers (23). Nonetheless, the low discriminatory power of many indices does not prevent them from being useful descriptors in structure-property-activity modelling.

    In the next section we will give a brief survey of elementary (chemical) graph-theoretical concepts. This section will be followed by a section containing definitions of the six selected topological indices. In the fourth section a design of the structure-property relationships will be delineated. Then a didactic example will be presented.

    Elementary Graph Theoretical Concepts

    We will cover only those graph-theoretical concepts that will be used in this report. In doing so, we will follow the book Graph Theory by Frank Harary (24) and both editions of our book Chemical Graph Theory (8, 16, 25).

    Graph theory is a branch of discrete mathematics, related to topology and combinatorics. It deals with the way objects are connected and with all the consequences of the connectivity. The connectivity in a system is, thus, a fundamental quality of graph theory.

    Chemical graph theory is a branch of mathematical chemistry, and consequently of theoretical chemistry. It is concerned with handling chemical graphs, that is, graphs that represent chemical systems. Hence, chemical graph theory deals with analyses of all consequences of connectivity in a chemical system. In other words, chemical graph theory is concerned with all aspects of the application of graph theory to chemistry.

    The Concept of a Graph in Graph Theory

    The central concept in graph theory is that of a graph. For a graph theorist, a graph is the application of a set on itself, that is, a collection of elements of the set and of binary relations between these elements. Graphs are one-dimensional objects, but they can be embedded or realized in spaces of higher dimensions.

    For a chemist, the two-dimensional realization of a graph is more appealing, that is, a set of vertices (points) and of edges (lines) joining these vertices. A graph G can be visualized by a diagram when the vertices are drawn as small circles or dots, and the edges as lines or curves connecting the appropriate circles. Because a diagram of a graph completely describes the graph, it is customary and convenient to refer to the diagram of the graph as the graph itself.

    Mainly due to their diagrammatic representation, graphs have appeal as structural models in science, in general, and in chemistry, in particular (26, 27). As an example, Figure 1 shows a diagram of a labeled graph. A graph is called labeled when a specific numbering of its vertices is introduced.

    Figure 1. A diagram of a labeled (numbered) graph, showing vertices as circles and edges as lines. The graph is actually a one-dimensional entity, but it can be realized in two dimensions, as shown here.

    The Concept of a Graph in Chemistry

    In chemistry, graphs can be used to represent a variety of chemical objects such as molecules, reactions, crystals, polymers, and clusters. The common feature of chemical systems is the presence of sites and connections between them. Sites can be atoms, electrons, molecules, molecular fragments, intermediates, etc., while the connections between sites can represent bonds, reaction steps, van der Waals forces, etc. Chemical systems can be represented by chemical graphs using a simple conversion rule: Sites are replaced by vertices and connections by edges.

    Molecular Graphs

    A special class of chemical graphs are molecular graphs. Molecular graphs (or constitutional graphs) are chemical graphs that represent the constitution of molecules. In these graphs vertices correspond to individual atoms, and edges correspond to the bonds between them. An interesting historical detail is related to the concept of the molecular graph: The term graph was introduced by the English mathematician Sylvester (28) in 1878 on the basis of the constitutional formulas used by the chemists of his day.

    To simplify the manipulation of molecular graphs, hydrogen-depleted graphs are often used. Such graphs represent only the molecular skeletons, omitting hydrogen atoms and their bonds. As an example, Figure 2 gives a labeled molecular hydrogen-depleted graph that depicts the carbon skeleton of 2,3,4-trimethylhexane.

    Figure 2. A labeled, hydrogen-depleted, molecular graph corresponding to the carbon skeleton of 2,3,4-trimethylhexane. The vertices correspond to atoms, and the edges correspond to chemical bonds.

    Analyzing and Comparing Graphs

    Two graphs G1 and G2 are isomorphic if there exists a one-to-one correspondence between their vertex sets V(G1) and V(G2), which induces a one-to-one correspondence between their edge sets E(G1) and E(G2). An invariant of a graph G is a quantity associated with G that has the same value for any graph that is isomorphic with G. It should be noted that topological indices are graph invariants.

    Two vertices i and j of a graph G are adjacent if there is an edge joining them; the vertices i and j are then incident to such an edge. Similarly, two edges of G are adjacent if they have a vertex in common. The valency of a vertex i of G is the number of edges incident to i. This is denoted by d(i).

    702 Journal of Chemical Education


    A walk of a graph G is an alternating sequence of vertices and edges, beginning and ending with vertices, in which each edge is incident with the two vertices immediately preceding and following it. A path is a walk in which no vertex occurs more than once. The distance between two vertices is the number of edges in the shortest path that joins the two vertices. A graph G is connected if every pair of its vertices is joined by a path. Otherwise, a graph is considered disconnected.

    A graph whose vertices all have the same valence is called a regular graph. If all vertices in a regular graph have a valence of 2, then the graph is called a cycle. A graph is acyclic if it has no cycles. A tree is a connected acyclic graph. The molecular graph in Figure 2 is an example of a tree.

    Associating Graphs with Matrices

    A labeled (chemical) graph may be associated with several matrices. Two very important graph-theoretical matrices are the vertex-adjacency matrix and the distance matrix.

    The vertex-adjacency matrix, A = A(G), of a labeled connected graph G with N vertices, is a square symmetric matrix of order N. It is commonly called the adjacency matrix. It is defined below.

    $$(\mathbf{A})_{ij} = \begin{cases} 1 & \text{if vertices } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

    The distance matrix, D = D(G), of a labeled connected graph G with N vertices is a square symmetric matrix of order N. It is defined below.

    $$(\mathbf{D})_{ij} = \begin{cases} l_{ij} & \text{if } i \neq j \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

    where l_ij is the length of the shortest path (i.e., the distance) between the vertices i and j in G.

    Very often the distance matrix of a graph G can be generated using powers of the corresponding adjacency matrix of G (29). Table 3 gives the adjacency matrix and the distance matrix that correspond to the molecular graph in Figure 2.

    Table 3. The Adjacency Matrix and the Distance Matrix of the Molecular Graph in Figure 2
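The two matrix definitions can be tried out on the carbon skeleton of 2,3,4-trimethylhexane. The following is only a sketch: the vertex numbering (main chain 1 to 6, methyl carbons 7, 8, and 9 attached to atoms 2, 3, and 4) is an assumption and need not match the labels of Figure 2, and the function names are illustrative. The distance matrix is obtained from powers of the adjacency matrix: the distance between i and j is taken as the smallest k for which the (i, j) element of A^k is nonzero.

```python
def adjacency_matrix(n, edges):
    """Build the N x N vertex-adjacency matrix from 1-based vertex labels."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1
        a[j - 1][i - 1] = 1
    return a


def mat_mul(x, y):
    """Plain matrix product of two square integer matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def distance_matrix(a):
    """D_ij = smallest k with (A^k)_ij > 0; valid for a connected graph."""
    n = len(a)
    d = [[0] * n for _ in range(n)]
    power = [row[:] for row in a]  # holds A^k, starting with k = 1
    for k in range(1, n):
        for i in range(n):
            for j in range(n):
                if i != j and d[i][j] == 0 and power[i][j] > 0:
                    d[i][j] = k
        power = mat_mul(power, a)
    return d


# Assumed labeling: chain 1-6, methyl carbons 7, 8, 9 on atoms 2, 3, 4.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8), (4, 9)]
A = adjacency_matrix(9, EDGES)
D = distance_matrix(A)
valencies = [sum(row) for row in A]  # row sums of A give d(i)
```

The row sums of A reproduce the valencies d(i) defined earlier, which is a convenient consistency check on any hand-built adjacency matrix.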

    Definitions of the Selected Topological Indices

    Wiener Number

    The Wiener number, W = W(G) of G, was introduced by Wiener in 1947 as the path number (30). This topological index is defined as the half-sum of the elements of the distance matrix (15).

    $$W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\mathbf{D})_{ij} \qquad (3)$$

    Table 4 gives an example for computing the Wiener number.

    Table 4. The Computation of the Wiener Number for a Tree T Depicting the Carbon Skeleton of 2-Methylbutane

    (a) A labeled tree T

    (b) The distance matrix of T

    (c) The Wiener number of T

    $$W(T) = \frac{1}{2}(8 \cdot 1 + 8 \cdot 2 + 4 \cdot 3) = 18$$
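The half-sum in Table 4 can be checked with a few lines of code. This is a minimal sketch: the labeling of 2-methylbutane (chain 1 to 4, methyl carbon 5 on atom 2) is an assumption, and distances are found by breadth-first search rather than by matrix powers.

```python
from collections import deque


def distance_matrix(n, edges):
    """Distance matrix of a connected graph with vertices labeled 1..n."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    d = [[0] * n for _ in range(n)]
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source vertex
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, length in dist.items():
            d[source - 1][target - 1] = length
    return d


def wiener_number(n, edges):
    """Half-sum of all elements of the distance matrix."""
    return sum(sum(row) for row in distance_matrix(n, edges)) // 2


# Assumed labeling of 2-methylbutane: chain 1-4, methyl carbon 5 on atom 2.
EDGES_2_METHYLBUTANE = [(1, 2), (2, 3), (3, 4), (2, 5)]
W = wiener_number(5, EDGES_2_METHYLBUTANE)  # 18, as in Table 4
```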

    Table 5. The Computation of the Hosoya Index for a Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

    (a) A tree T

    (b) The count of the p(T; i) quantities in T

    (i) p(T; 0) = 1
    (ii) p(T; 1) = 6
    (iii) p(T; 2) = 8
    (iv) p(T; 3) = 2

    (c) The Hosoya index of T

    $$Z(T) = p(T;0) + p(T;1) + p(T;2) + p(T;3) = 17$$
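The counts in Table 5 can be reproduced by brute force: p(T; i) is the number of ways to pick i edges of which no two share a vertex. The sketch below assumes a labeling of 2,3-dimethylpentane (chain 1 to 5, methyl carbons 6 and 7 on atoms 2 and 3) and simply enumerates edge subsets, which is adequate for small skeletons but not an efficient general algorithm.

```python
from itertools import combinations


def matching_counts(edges):
    """Return [p(G;0), p(G;1), ...] by enumerating edge subsets."""
    counts = [1]  # p(G;0) = 1 by definition
    k = 1
    while True:
        found = 0
        for subset in combinations(edges, k):
            vertices = [v for edge in subset for v in edge]
            # The edges are mutually nonadjacent when no vertex repeats.
            if len(vertices) == len(set(vertices)):
                found += 1
        if found == 0:
            break
        counts.append(found)
        k += 1
    return counts


def hosoya_index(edges):
    """Sum of all p(G;i)."""
    return sum(matching_counts(edges))


# Assumed labeling of 2,3-dimethylpentane: chain 1-5, methyls 6 and 7.
EDGES_23_DIMETHYLPENTANE = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
P = matching_counts(EDGES_23_DIMETHYLPENTANE)  # [1, 6, 8, 2]
Z = hosoya_index(EDGES_23_DIMETHYLPENTANE)     # 17
```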

    Volume 69 Number 9 September 1992 703


    Table 6. The Edge Weights of 10 Edge Types Which Appear in Graphs Corresponding to the Carbon Skeletons of Hydrocarbons

    Edge type   Weight
    1,1         1
    1,2         0.7071
    1,3         0.5774
    1,4         0.5
    2,2         0.5
    2,3         0.4082
    2,4         0.3536
    3,3         0.3333
    3,4         0.2887
    4,4         0.25

    Table 7. The Computation of the Randić Index for a Tree T Depicting the Carbon Skeleton of 4-Ethyl-2-methylheptane

    (a) A tree T

    (b) Count of the edge types (the numbers at the vertices represent their valencies)

    b_12 = 2
    b_13 = 2
    b_22 = 1
    b_23 = 4

    (c) The Randić index of T

    $$\chi(T) = 2 \cdot 0.7071 + 2 \cdot 0.5774 + 0.5 + 4 \cdot 0.4082 = 4.7018$$

    Table 8. The Computation of the Balaban Index for a Labeled Tree T Representing the Carbon Skeleton of 2,3-Dimethylpentane

    (a) A labeled tree T

    (b) The distance sums

    0 1 2 3 4 2 3
    1 0 1 2 3 1 2
    2 1 0 1 2 2 1
    3 2 1 0 1 3 2
    4 3 2 1 0 4 3
    2 1 2 3 4 0 3
    3 2 1 2 3 3 0

    (c) The Balaban index of T

    Hosoya Index

    The Hosoya index, Z = Z(G), was introduced by Hosoya in 1971 as the Z index (15). This index is defined below.

    $$Z = \sum_{i=0}^{[N/2]} p(G; i) \qquad (4)$$

    where p(G; i) is the number of selections of i mutually nonadjacent edges in G. By definition, p(G; 0) = 1, and p(G; 1) is the number of edges in G. Table 5 gives an example of computing the Hosoya index.

    Randić Index

    The Randić index, χ = χ(G) of G, was introduced by Randić in 1975 as the connectivity index (31). This is one of the most widely used topological indices in QSPR (32-34) (and also in quantitative structure-activity relationships (QSAR) (35)).

    The Randić index is defined as

    $$\chi = \sum_{\text{edges } ij} [d(i)\, d(j)]^{-1/2} \qquad (5)$$

    where d(i) and d(j) are the valencies of the vertices i and j that define the edge ij.

    For saturated hydrocarbons, eq 5 may be given in closed form. In molecular graphs that depict the carbon skeletons of hydrocarbons, only four types of vertices with respect to their valencies appear, that is, vertices with d = 1, 2, 3, 4. These give rise to 10 types of edges whose weights are given in Table 6.

    If the number of each edge type is denoted by b_ij, where i = 1, ..., 4 and j = i, ..., 4, and if the edge weights from Table 6 are used, then eq 5 becomes the following.

    $$\chi = \sum_{i=1}^{4} \sum_{j=i}^{4} b_{ij}\, (ij)^{-1/2} \qquad (6)$$

    This expression reveals that the Randić indices of hydrocarbons are fully determined by the counts of the edge types in the corresponding hydrogen-depleted graphs. Table 7 gives an example of computing the Randić index by means of eq 6.
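The edge-type bookkeeping of Table 7 can be sketched as follows. The labeling of 4-ethyl-2-methylheptane (chain 1 to 7, methyl carbon 8 on atom 2, ethyl carbons 9 and 10 on atom 4) is an assumption, and the helper names are illustrative. The code evaluates eq 5 directly and also tallies the b_ij counts used in eq 6.

```python
from math import sqrt


def valencies(edges):
    """Map each vertex to the number of edges incident to it."""
    d = {}
    for i, j in edges:
        d[i] = d.get(i, 0) + 1
        d[j] = d.get(j, 0) + 1
    return d


def randic_index(edges):
    """Eq 5: sum over edges of [d(i) d(j)]^(-1/2)."""
    d = valencies(edges)
    return sum(1.0 / sqrt(d[i] * d[j]) for i, j in edges)


def edge_type_counts(edges):
    """Counts b_ij of edges joining vertices of valencies i <= j."""
    d = valencies(edges)
    counts = {}
    for i, j in edges:
        key = tuple(sorted((d[i], d[j])))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Assumed labeling: chain 1-7, methyl 8 on atom 2, ethyl 9-10 on atom 4.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),
         (2, 8), (4, 9), (9, 10)]
B = edge_type_counts(EDGES)   # {(1, 2): 2, (1, 3): 2, (2, 2): 1, (2, 3): 4}
CHI = randic_index(EDGES)     # about 4.702
```

With unrounded weights the sum comes out near 4.7019; the 4.7018 of Table 7 follows from the four-decimal weights of Table 6.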

    Table 9. The Computation of the Schultz Index for a Tree T Depicting the Carbon Skeleton

    (a) A labeled tree T

    (b) The adjacency matrix of T

    (c) The distance matrix of T

    (d) The adjacency-plus-distance matrix of T

    (e) The valence row matrix of T

    v(T) = [1 3 2 2 1 1]

    (f) The v[A + D] row matrix

    v[A + D](T) = [22 15 16 18 25 22]

    (g) The Schultz index of T

    $$MTI(T) = 2 \cdot 22 + 15 + 16 + 18 + 25 = 118$$

    Table 10. The Computation of the Harary Number for a Tree T Depicting the Carbon Skeleton of 2,3-Dimethylhexane

    (a) A labeled tree T

    (b) The distance matrix of T

    (c) The D^{-1} matrix of T

    (d) The D^{-2} matrix of T

    (e) The Harary number of T

    $$H(T) = \frac{1}{2}(14 \cdot 1 + 16 \cdot 0.25 + 14 \cdot 0.11 + 8 \cdot 0.06 + 4 \cdot 0.04) = 10.10$$

    Balaban Index

    The Balaban index, J = J(G) of G, was introduced by Balaban in 1982 as the average-distance sum connectivity (36). It is defined as

    $$J = \frac{M}{\mu + 1} \sum_{\text{edges } ij} [(\mathbf{D})_i (\mathbf{D})_j]^{-1/2} \qquad (7)$$

    where M is the number of edges in G; μ is the cyclomatic number of G; and (D)_i is the distance sum, where i = 1, 2, ..., N.

    The cyclomatic number μ = μ(G) of a polycyclic graph G is equal to the minimum number of edges that must be removed from G to transform it to the related acyclic graph. For trees, μ = 0; for monocycles, μ = 1.

    The distance sum (D)_i for a vertex i of G represents a sum of all entries in the corresponding row of the distance matrix.

    $$(\mathbf{D})_i = \sum_{j=1}^{N} (\mathbf{D})_{ij} \qquad (8)$$

    Clearly the Wiener number can also be expressed in terms of the distance sums.

    $$W = \frac{1}{2} \sum_{i=1}^{N} (\mathbf{D})_i \qquad (9)$$

    Table 8 gives an example of computing the Balaban index.
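Eqs 7 to 9 can be evaluated directly from a distance matrix. The sketch below starts from the 7 x 7 distance matrix of the 2,3-dimethylpentane tree as it survives in this transcript; the edges are recovered as the vertex pairs at distance 1, and the cyclomatic number is taken as M - N + 1, which holds for any connected graph. The resulting value of J is computed here and is not quoted from Table 8.

```python
from math import sqrt

# Distance matrix of the 2,3-dimethylpentane tree (rows as in Table 8).
D = [
    [0, 1, 2, 3, 4, 2, 3],
    [1, 0, 1, 2, 3, 1, 2],
    [2, 1, 0, 1, 2, 2, 1],
    [3, 2, 1, 0, 1, 3, 2],
    [4, 3, 2, 1, 0, 4, 3],
    [2, 1, 2, 3, 4, 0, 3],
    [3, 2, 1, 2, 3, 3, 0],
]


def distance_sums(d):
    """Eq 8: row sums of the distance matrix."""
    return [sum(row) for row in d]


def balaban_index(d):
    """Eq 7, with edges read off as the pairs at distance 1."""
    n = len(d)
    sums = distance_sums(d)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if d[i][j] == 1]
    m = len(edges)
    mu = m - n + 1  # cyclomatic number of a connected graph
    return m / (mu + 1) * sum(1.0 / sqrt(sums[i] * sums[j])
                              for i, j in edges)


SUMS = distance_sums(D)   # [15, 10, 9, 12, 17, 15, 14]
W = sum(SUMS) // 2        # eq 9 gives the Wiener number, 46
J = balaban_index(D)      # about 3.144
```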

    Schultz Index

    The Schultz index, MTI = MTI(G) of G, was introduced by Schultz in 1989 as the molecular topological index (37). This index is defined below (21, 37).

    $$MTI = \sum_{i=1}^{N} e_i \qquad (10)$$

    where the e_i (i = 1, 2, ..., N) represent the elements of the following row matrix of order N

    $$\mathbf{v}[\mathbf{A} + \mathbf{D}] = [e_1 \; e_2 \; \ldots \; e_N] \qquad (11)$$

    where v is the valency row matrix, A is the adjacency matrix, and D is the distance matrix. Table 9 gives an example of computing the Schultz index.
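Eqs 10 and 11 amount to one vector-matrix product followed by a sum. In the sketch below the tree is assumed to be the carbon skeleton of 2-methylpentane (chain 1 to 5, methyl carbon 6 on atom 2); that choice is an inference from the valence row [1 3 2 2 1 1] of Table 9, since the compound name is missing from the table caption in this transcript.

```python
from collections import deque


def graph_matrices(n, edges):
    """Return (valencies, adjacency matrix, distance matrix)."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    a = [[1 if j + 1 in adj[i + 1] else 0 for j in range(n)]
         for i in range(n)]
    d = [[0] * n for _ in range(n)]
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source vertex
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, length in dist.items():
            d[source - 1][target - 1] = length
    v = [len(adj[i]) for i in range(1, n + 1)]
    return v, a, d


def schultz_row(n, edges):
    """Eq 11: the row matrix v[A + D]."""
    v, a, d = graph_matrices(n, edges)
    return [sum(v[i] * (a[i][j] + d[i][j]) for i in range(n))
            for j in range(n)]


# Assumed tree: 2-methylpentane, chain 1-5 with methyl carbon 6 on atom 2.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
E_ROW = schultz_row(6, EDGES)  # [22, 15, 16, 18, 25, 22]
MTI = sum(E_ROW)               # eq 10 gives 118
```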

    Harary Number

    The Harary number, H = H(G) of G, was introduced by Plavšić et al. (38) in 1991 in honor of Professor Frank Harary on his 70th birthday. He greatly influenced the development of graph theory and chemical graph theory. This index is defined below.

    $$H = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\mathbf{D}^{-2})_{ij} \qquad (12)$$


    Table 11. The Wiener Numbers (W), Hosoya Indices (Z), Randić Indices (χ), Balaban Indices (J), Schultz Indices (MTI), Harary Numbers (H), and Boiling Points (bp in °C) of Alkanes with Up to 10 Carbon Atoms

    Alkane                        W
    methane                       0
    ethane                        1
    propane                       4
    2-methylpropane               9
    butane                        10
    2,2-dimethylpropane           16
    2-methylbutane                18
    pentane                       20
    2,2-dimethylbutane            28
    2,3-dimethylbutane            29
    2-methylpentane               32
    3-methylpentane               31
    hexane                        35
    2,2,3-trimethylbutane         42
    2,2-dimethylpentane           46
    3,3-dimethylpentane           44
    2,3-dimethylpentane           46
    2,4-dimethylpentane           48
    2-methylhexane                52
    3-methylhexane                50
    3-ethylpentane                48
    heptane                       56
    2,2,3,3-tetramethylbutane     58
    2,2,3-trimethylpentane        63
    2,3,3-trimethylpentane        62
    2,2,4-trimethylpentane        66
    2,2-dimethylhexane            71
    3,3-dimethylhexane            67
    3-ethyl-3-methylpentane       64
    2,3,4-trimethylpentane        65
    2,3-dimethylhexane            70
    3-ethyl-2-methylpentane       67
    3,4-dimethylhexane            66
    2,4-dimethylhexane            71
    2,5-dimethylhexane            74
    2-methylheptane               79
    3-methylheptane               76
    4-methylheptane               75
    3-ethylhexane                 72
    octane                        84
    2,2,3,3-tetramethylpentane    82
    2,2,3,4-tetramethylpentane    86
    2,2,3-trimethylhexane         92
    2,2-dimethyl-3-ethylpentane   88
    3,3,4-trimethylhexane         88
    2,3,3,4-tetramethylpentane    84
    2,3,3-trimethylhexane         90
    2,3-dimethyl-3-ethylpentane   86
    2,2,4,4-tetramethylpentane    88

    where D^{-2} is the matrix whose elements are the squares of the reciprocal distances in G. The D^{-2} matrix may be considered as the distance matrix of a class of specially weighted graphs in which weights between vertices in G mimic the Coulomb law between the sites in the corresponding structure. Table 10 gives an example of computing the Harary number.

    Table 11 gives the Wiener and Harary numbers, and the Hosoya, Randić, Balaban, and Schultz indices for alkanes with up to 10 carbon atoms.
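The Harary number of Table 10 can be recomputed as the half-sum of the squared reciprocal distances. This is a sketch with an assumed labeling of 2,3-dimethylhexane (chain 1 to 6, methyl carbons 7 and 8 on atoms 2 and 3). It also tallies how often each distance occurs in the distance matrix, which gives the multipliers 14, 16, 14, 8, and 4 of Table 10.

```python
from collections import Counter, deque


def distance_matrix(n, edges):
    """Distance matrix of a connected graph with vertices labeled 1..n."""
    adj = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    d = [[0] * n for _ in range(n)]
    for source in range(1, n + 1):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source vertex
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, length in dist.items():
            d[source - 1][target - 1] = length
    return d


def harary_number(n, edges):
    """Half-sum of the squares of the reciprocal distances."""
    d = distance_matrix(n, edges)
    return 0.5 * sum(1.0 / (x * x) for row in d for x in row if x > 0)


# Assumed labeling of 2,3-dimethylhexane: chain 1-6, methyls 7 and 8.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (2, 7), (3, 8)]
MULTIPLICITY = Counter(x for row in distance_matrix(8, EDGES)
                       for x in row if x > 0)
H = harary_number(8, EDGES)  # about 10.11
```

With exact reciprocals the result is close to 10.11, whereas Table 10 reports 10.10 because its matrix entries are rounded to two decimals before summing.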

    Designing QSPR Models

    There are several ways to design QSPR models (39-44). Here we outline one possible strategy. Figure 3 contains a flow diagram of the steps involved in the design of a QSPR model. This is an iterative approach.

    Step 1. Get a reliable source of experimental data for a given set of molecules. This initial set of molecules is sometimes called the training set (45). The data in this set must be reliable and accurate. The quality of the selected data is important because it will affect all the following steps.

    Step 2. The topological index is selected and computed. This is also an important step because selecting the appropriate topological index (or indices) can facilitate finding the most accurate model.

    Step 3. The two sets of numbers are then statistically analyzed using a suitable algebraic expression. The QSPR model is thus a regression model, and one must be careful about its statistical stability. Chance factors could yield spuriously accurate correlations (46-48). The quality of the QSPR models can be conveniently measured by the correlation coefficient r and the standard deviation s. A good QSPR model must have r > 0.99, while s depends on the property. For example, for boiling points, s < 5 °C. Therefore, Step 3 is a central step in the design of the structure-property models.

    Step 4. Predictions are made for the values of the molecular property for species that are not part of the training set via the obtained initial QSPR model. The unknown molecules are structurally related to the initial set of compounds.

    Step 5. The predictions are tested with unknown molecules by experimental determination of the predicted properties. This step is rather involved because it requires acquiring or preparing the test molecules.

    Step 6. If the tests support the predictions, one presents the QSPR model in its final form with all necessary statistical characteristics.

    If the tests do not support the initial QSPR model, it must be revised and


    Table 11. Continued

    Alkane
    2,2,4-trimethylhexane
    2,4,4-trimethylhexane
    2,2,5-trimethylhexane
    2,2-dimethylheptane
    3,3-dimethylheptane
    4,4-dimethylheptane
    3-ethyl-3-methylhexane
    3,3-diethylpentane
    2,3,4-trimethylhexane
    2,4-dimethyl-3-ethylpentane
    2,3,5-trimethylhexane
    2,3-dimethylheptane
    3-ethyl-2-methylhexane
    3,4-dimethylheptane
    3-ethyl-4-methylhexane
    2,4-dimethylheptane
    4-ethyl-2-methylhexane
    3,5-dimethylheptane
    2,5-dimethylheptane
    2,6-dimethylheptane
    2-methyloctane
    3-methyloctane
    4-methyloctane
    3-ethylheptane
    4-ethylheptane
    nonane
    2,2,3,3,4-pentamethylpentane
    2,2,3,3-tetramethylhexane
    3-ethyl-2,2,3-trimethylpentane
    3,3,4,4-tetramethylhexane
    2,2,3,4,4-pentamethylpentane
    2,2,3,4-tetramethylhexane
    3-ethyl-2,2,4-trimethylpentane
    2,3,4,4-tetramethylhexane
    2,2,3,5-tetramethylhexane
    2,2,3-trimethylheptane
    2,2-dimethyl-3-ethylhexane
    3,3,4-trimethylheptane
    3,3-dimethyl-4-ethylhexane
    2,3,3,4-tetramethylhexane
    3,4,4-trimethylheptane
    3,4-dimethyl-3-ethylhexane
    3-ethyl-2,3,4-trimethylpentane
    2,3,3,5-tetramethylhexane
    2,3,3-trimethylheptane
    2,3-dimethyl-3-ethylhexane
    3,3-diethyl-2-methylpentane
    2,2,4,4-tetramethylhexane
    2,2,5-trimethylheptane
    2,5,5-trimethylheptane
    2,2,6-trimethylheptane
    2,2-dimethyloctane
    3,3-dimethyloctane
    4,4-dimethyloctane
    3-ethyl-3-methylheptane
    4-ethyl-4-methylheptane
    3,3-diethylhexane
    2,3,4,5-tetramethylhexane (W = 121, Z = 58, χ = 4.4641, J = 3.8140, MTI = 436, H = 13.9933, bp = 161)


    Table 11. Continued

    Alkane                            J
    2,3,4-trimethylheptane            3.5833
    2,3-dimethyl-4-ethylhexane        3.7561
    2,3-dimethyl-4-ethylhexane        3.7561
    2,4-dimethyl-3-ethylhexane        3.7979
    3,4,5-trimethylheptane            3.6854
    2,4-dimethyl-3-isopropylpentane   3.9835
    3-isopropyl-2-methylhexane        3.7280
    2,3,5-trimethylheptane            3.4617
    2,5-dimethyl-3-ethylhexane        3.6033
    2,4,5-trimethylheptane            3.5027
    2,3,6-trimethylheptane            3.3014
    2,3-dimethyloctane                3.1296
    3-ethyl-2-methylheptane           3.3978
    3,4-dimethyloctane                3.3088
    4-isopropylheptane                3.4999
    4-ethyl-3-methylheptane           3.5637
    4,5-dimethyloctane                3.3759
    3-ethyl-4-methylheptane           3.5299
    3,4-diethylhexane                 3.6982
    2,4,6-trimethylheptane            3.3374
    2,4-dimethyloctane                3.1600
    4-ethyl-2-methylheptane           3.3908
    3,5-dimethyloctane                3.2686
    3-ethyl-5-methylheptane           3.4123
    2,5-dimethyloctane                3.1244
    5-ethyl-2-methylheptane           3.2555
    3,6-dimethyloctane                3.1682
    2,6-dimethyloctane                3.0333
    2,7-dimethyloctane                2.9095
    2-methylnonane                    2.7732
    3-methylnonane                    2.8862
    4-methylnonane                    2.9680
    3-ethyloctane                     3.0869
    5-methylnonane                    2.9984
    4-ethyloctane                     3.2055
    4-propylheptane                   3.2951
    decane                            2.6476 (W = 165, Z = 89, χ = 4.9142)

    the procedure repeated. The QSPR model thus established, even for a narrow class of compounds, is a very useful tool for predicting the properties of hypothetical compounds and for the search for new compounds with programmed properties (12).

    An Instructive Example

    We will apply the procedure from the preceding section to give an instructive example of the design of the QSPR model for predicting the boiling points of alkanes. As the initial set we will consider alkanes with up to 8 carbon atoms (40 molecules).

    Step 1

    The boiling points (°C) of the alkanes are taken from the CRC Handbook of Chemistry and Physics (49) and Beilstein (50).

    Step 2

    We will consider at this stage all six topological indices discussed in this report.

    Step 3

    The following structure-property models are the most successful for each index considered:

    p 77.93 (M.97) ~30899 0 0137 - (3.35 .02)10~

    -164.24 (i4.99) (13)


    Figure 3. A flow diagram of the steps involved in the design of a QSPR model. 1: Source of experimental data. 2: Selection of the topological index. 3: Statistical work and setting up the QSPR model. 4: Predictions. 5: Testing the predictions. 6: The final form of the QSPR model. S: Tests confirmed the initial model. The model appears to be satisfactory for further work. NS: Tests rejected the initial model as not satisfactory. The model must be revised and the procedure repeated until the satisfactory model is obtained.

    Table 12. The Predicted Values of Boiling Points (°C) of Nonanes

    Nonane                        Predicted boiling point
                                  Eq 14     Eq 15
    2,2,3,3-tetramethylpentane    119.26    119.40
    2,2,3,4-tetramethylpentane
    2,2,3-trimethylhexane
    2,2-dimethyl-3-ethylpentane
    3,3,4-trimethylhexane
    2,3,3,4-tetramethylpentane
    2,3,3-trimethylhexane
    2,3-dimethyl-3-ethylpentane
    2,2,4,4-tetramethylpentane
    2,2,4-trimethylhexane
    2,4,4-trimethylhexane
    2,2,5-trimethylhexane
    2,2-dimethylheptane
    3,3-dimethylheptane
    4,4-dimethylheptane
    3-ethyl-3-methylhexane
    3,3-diethylpentane
    2,3,4-trimethylhexane
    2,4-dimethyl-3-ethylpentane
    2,3,5-trimethylhexane
    2,3-dimethylheptane
    3-ethyl-2-methylhexane
    3,4-dimethylheptane
    3-ethyl-4-methylhexane
    2,4-dimethylheptane
    4-ethyl-2-methylhexane
    3,5-dimethylheptane
    2,5-dimethylheptane
    2,6-dimethylheptane
    2-methyloctane
    3-methyloctane
    4-methyloctane
    3-ethylheptane
    4-ethylheptane
    nonane

    The most accurate models are those based on ln Z (eq 14) and χ (eq 15). They will be used in the next step.

    Step 4

    We use eqs 14 and 15 to predict the boiling points of nonanes (35 molecules) (see Table 12).

    Step 5

    We compare the predicted and experimental values of the nonane boiling points (see Table 13).

    Both models have problems with some members of the nonane series. However, when Step 3 is repeated using the boiling points of all alkanes with up to 9 carbon atoms, the QSPR models based on ln Z and χ did not improve. The slight improvement happened only when a biparametric model with χ and N (N is the number of carbon atoms in alkane) was used. This model is given by eq 19.

    The procedure may be repeated, and we will eventually arrive at the best possible QSPR model for predicting the boiling points of alkanes.

    Step 6

    All three models, expressed as eqs 14, 15, and 19, may serve as reliable models for predicting the alkane boiling points. Plots for the three models (bp vs ln Z, bp vs χ, and the biparametric model) and the accompanying statistical data are given, respectively, in Figures 4-6.
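The statistical side of Step 3 can be sketched with an ordinary least-squares fit. The example below is only illustrative: it fits a straight line of boiling point against the Randić index for the unbranched alkanes methane to octane, which is a simpler functional form than the article's eqs 13 to 19 (not reproduced in this transcript). The boiling points are rounded handbook values supplied here as an assumption, not values read from Table 11, and all function names are hypothetical.

```python
from math import sqrt


def randic_index_of_path(n_carbons):
    """Randić index of the unbranched alkane with n_carbons atoms."""
    edges = [(i, i + 1) for i in range(1, n_carbons)]
    d = {}
    for i, j in edges:
        d[i] = d.get(i, 0) + 1
        d[j] = d.get(j, 0) + 1
    return sum(1.0 / sqrt(d[i] * d[j]) for i, j in edges)


def linear_fit(x, y):
    """Least-squares line y = a + b*x with r and the standard deviation s."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / sqrt(sxx * syy)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))  # two fitted parameters
    return a, b, r, s


# Assumed data: approximate boiling points (deg C) of methane to octane.
BP = [-161.5, -88.6, -42.1, -0.5, 36.1, 68.7, 98.4, 125.7]
CHI = [randic_index_of_path(n) for n in range(1, 9)]
INTERCEPT, SLOPE, R, S = linear_fit(CHI, BP)
```

Comparing R and S with the thresholds quoted in the design section (r > 0.99 and s < 5 °C for boiling points) shows whether a candidate form passes; when s is too large, a different algebraic expression or an additional descriptor is tried, which is the iteration the flow diagram describes.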

    The boiling points of alkanes have been predicted many times (8, 13, 15, 30-37, 40, 51). Although most of the QSPR models produced are very accurate (r > 0.998, s < 2 °C), they suffer from several shortcomings.

    i. Methane was not considered. In some cases other lighter alkanes, such as ethane and propane, were also eliminated from the study.
    ii. Models were built for a limited set of alkanes, usually for C4-C7 families.
    iii. The complexity of some of the accurate QSPR models in the literature is forbidding.

    For example, one of the most accurate QSPR models for predicting boiling points of alkanes is the following (40). (All alkanes with up to 9 carbon atoms have been considered but methane.)


    Table 13. Comparison between Predicted (Two Models) and Experimental Values of Boiling Points (°C) of Nonanes

    Nonane                          (bp)exp    Model (14)    Model (15)
    2,2,3,3-tetramethylpentane                 119.26        119.40
    2,2,3,4-tetramethylpentane
    2,2,3-trimethylhexane
    2,2-dimethyl-3-ethylpentane
    3,3,4-trimethylhexane
    2,3,3,4-tetramethylpentane
    2,3,3-trimethylhexane
    2,3-dimethyl-3-ethylpentane
    2,2,4,4-tetramethylpentane
    2,2,4-trimethylhexane
    2,4,4-trimethylhexane
    2,2,5-trimethylhexane
    2,2-dimethylheptane
    3,3-dimethylheptane
    4,4-dimethylheptane
    3-ethyl-3-methylhexane
    3,3-diethylpentane
    2,3,4-trimethylhexane
    2,4-dimethyl-3-ethylpentane
    2,3,5-trimethylhexane
    2,3-dimethylheptane
    3-ethyl-2-methylhexane
    3,4-dimethylheptane
    3-ethyl-4-methylhexane
    2,4-dimethylheptane
    4-ethyl-2-methylhexane
    3,5-dimethylheptane
    2,5-dimethylheptane
    2,6-dimethylheptane
    2-methyloctane
    3-methyloctane
    4-methyloctane
    3-ethylheptane
    4-ethylheptane
    nonane

    Figure 4. A plot of bp vs ln Z for the first 40 alkanes.


    Figure 5. A plot of bp vs χ for the first 40 alkanes.

    Figure 6. A plot of bp vs χ for the first 75 alkanes.


    Figure 7. Examples of a path (3rd order), a cluster (3rd order), and a path-cluster (4th order) for a tree T corresponding to 3-methylpentane.

    The terms in eq 20 are defined as follows.

    The extended connectivity index

    $$ {}^{m}\chi = \sum [d(i)\, d(j) \cdots d(m+1)]^{-0.5} \qquad (21) $$

    where m represents the order of possible fragments. When m = 1, fragments are edges, which leads to the first-order connectivity index χ.
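    For m = 1 the sum runs over the edges of the hydrogen-suppressed graph, each edge contributing the inverse square root of the product of the valencies of its two end vertices. A minimal sketch of that calculation follows; the function name `first_order_connectivity`, the edge-list representation, and the vertex numbering of 3-methylpentane are assumptions made for illustration.

```python
from collections import Counter


def first_order_connectivity(edges):
    """First-order connectivity index of a hydrogen-suppressed graph.

    `edges` is a list of (i, j) vertex pairs; the valency d(i) is the
    number of edges meeting at vertex i.
    """
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    # Each edge contributes [d(i) * d(j)] ** -0.5.
    return sum((degree[i] * degree[j]) ** -0.5 for i, j in edges)


# 3-Methylpentane: main chain 1-2-3-4-5 with a methyl group (6) on carbon 3.
edges_3mp = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
print(round(first_order_connectivity(edges_3mp), 4))  # 2.8081
```

    The value is built from two terminal edges (1/√2 each), two edges joining the chain to the branching carbon (1/√6 each), and the methyl edge (1/√3).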

    The zero-connectivity index ${}^{0}\chi$ is expressed in terms of $n_1$, $n_2$, $n_3$, and $n_4$, which are the numbers of vertices with valencies 1, 2, 3, and 4, respectively.
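    As a worked illustration, and assuming that the zero-order index is simply the m = 0 case of eq 21 (each fragment is a single vertex), the sum over vertices can be grouped by valency. The grouping below and the 3-methylpentane numbers are a sketch under that assumption, not a transcription of the article's own equation.

```latex
{}^{0}\chi = \sum_{i} d(i)^{-0.5}
           = n_1 + \frac{n_2}{\sqrt{2}} + \frac{n_3}{\sqrt{3}} + \frac{n_4}{2}

% 3-methylpentane: n_1 = 3, n_2 = 2, n_3 = 1, n_4 = 0
{}^{0}\chi = 3 + \frac{2}{\sqrt{2}} + \frac{1}{\sqrt{3}} \approx 4.992
```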

    Connectivity indices ${}^{m}\chi_{t}$ of order m and type t can be obtained by summing analogous terms over subgraphs involving paths (t = p), clusters (t = c), or path-cluster (t = pc) combinations of m edges. Examples of a path, a cluster, and a path-cluster are given in Figure 7.
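    The path-type indices (t = p) are the easiest of the three to compute, because they only require enumerating every simple path with m edges. The sketch below does that by depth-first extension and applies the term from eq 21 to each path; cluster and path-cluster subgraphs are not covered. The function name `path_connectivity` and the edge-list input are assumptions made for illustration, and vertex labels are assumed to be mutually comparable (for example, integers).

```python
import math
from collections import defaultdict


def path_connectivity(edges, m):
    """Path-type connectivity index of order m (m >= 0).

    Sums [d(i) d(j) ... ] ** -0.5 over all simple paths with m edges
    (m + 1 vertices) of the hydrogen-suppressed graph.
    """
    if m < 0:
        raise ValueError("order m must be non-negative")
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == m + 1:
            # Every path is found from both ends; keep one orientation.
            if path[0] <= path[-1]:
                total += math.prod(len(adj[v]) for v in path) ** -0.5
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in list(adj):
        extend([start])
    return total


# 3-Methylpentane, numbered as chain 1-2-3-4-5 with methyl carbon 6 on C3.
edges_3mp = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
for order in range(4):
    print(order, round(path_connectivity(edges_3mp, order), 4))
```

    For 3-methylpentane there are four third-order paths (1-2-3-4, 1-2-3-6, 2-3-4-5, and 6-3-4-5), which is consistent with the kind of path fragment shown in Figure 7.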

    To conclude this section we stress that there is no simple QSPR model for predicting boiling points over a wide range of alkanes. However, if we limit ourselves to a simple family of alkanes (especially with fewer than 10 carbon atoms), then simple accurate models are possible (34).

    Conclusions

    In this report we presented a strategy for designing quantitative structure-property relationships based on topological indices. The instructive example was directed to the design of a structure-property model for predicting the boiling points of alkanes. Six selected topological indices were tested. The most accurate QSPR models for alkane boiling points are based on ln Z and on N and χ. The accuracy of the model was judged according to the correlation coefficient and the standard error. The upper limits for the accurate models were set at r > 0.995 and s < 5 °C.

    We conclude that there is no simple single-parameter QSPR model for predicting the boiling points over a wide range of alkanes due to the great diversity among experimental values. Multivariate regression models appear to be very accurate due to the variety of parameters involved in the correlation. Each of these parameters takes care of a certain structural detail of a large alkane. When all diverse structural features of alkanes are considered, the model usually gives extremely good agreement between the experimental and calculated boiling points.

    Acknowledgment

    We are thankful to the Ministry of Science, Technology, and Informatics of the Republic of Croatia for support.

    3. Lipnick, R. L. Environ. Toxicol. Chem. 1989, 8, 1.
    4. hon ey, R. J. Chem. Educ. 1985, 62, 846.
    5. Dadić, Ž. Ruđer Bošković; Školska knjiga: Zagreb, 1987. This is a bilingual edition: Croatian and English.
    6. Boscovich, R. J. Theoria philosophiae naturalis redacta ad unicam legem virium in natura existentium; Remondini: Venetia, 1763. The English translation is also available: The Theory of Natural Philosophy; MIT: Cambridge, MA, 1966.
    7. Davy, H. Elements of Chemical Philosophy; London, 1812.
    8. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. II, Chapter
    11. Rouvray, D. H. In Chemical Applications of Topology and Graph Theory; King, R. B., Ed.; Elsevier: Amsterdam, 1983; p 159.

    12. Stankevich, M. I.; Stankevich, I. V.; Zefirov, N. S. Russ. Chem. Rev. 1988, 57, 337.
    13. Hansen, P. J.; Jurs, P. C. J. Chem. Educ. 1988, 65, 575.
    14. Randić, M. J. Math. Chem. 1990, 4, 337.
    15. Hosoya, H. Bull. Chem. Soc. Jpn. 1971, 44, 2332.
    16. Trinajstić, N. Chemical Graph Theory, 2nd revised ed.; CRC: Boca Raton, FL, 1992; Chapter 3.
    17. Rouvray, D. H. J. Mol. Struct. (Theochem) 1989, 185, 187.
    18. Randić, M. J. Math. Chem. 1991, 7, 155.
    19. Bonchev, D.; Trinajstić, N. J. Chem. Phys. 1977, 67, 4517.
    20. Balaban, A. T.; Quintas, L. V. Math. Chem. (Mülheim/Ruhr) 1983, 14, 213.
    21. Müller, W. R.; Szymanski, K.; Knop, J. V.; Trinajstić, N. J. Chem. Inf. Comput. Sci. 1990, 30, 160.
    22. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., in press.
    23. Szymanski, K.; Müller, W. R.; Knop, J. V.; Trinajstić, N. Int. J. Quantum Chem.: Quantum Chem. Symp. 1986, 20, 173.
    24. Harary, F. Graph Theory; Addison-Wesley: Reading, MA, 1971; 2nd printing.
    25. Trinajstić, N. Chemical Graph Theory; CRC: Boca Raton, FL, 1983; Vol. I.

    26. Chartrand, G. Graphs as Mathematical Models; Prindle, Weber, and Schmidt: Boston, MA, 1977.
    27. Trinajstić, N. In MATH/CHEM/COMP 1987; Lacher, R. C., Ed.; Elsevier: Amsterdam, 1988; p 83.
    28. Sylvester, J. J. Nature 1878, 17, 284.
    29. Roberts, F. S. Discrete Mathematical Models; Prentice-Hall: Englewood Cliffs, NJ, 1976; p 56.
    30. Wiener, H. J. Am. Chem. Soc. 1947, 69, 17.
    31. Randić, M. J. Am. Chem. Soc. 1975, 97, 6609.
    32. Razinger, M.; Chrétien, J. R.; Dubois, J. E. J. Chem. Inf. Comput. Sci. 1985, 25, 23.
    33. Rouvray, D. H. Sci. Am. 1986, 254, 40.
    34. Seybold, P. G.; May, M.; Bagal, U. A. J. Chem. Educ. 1987, 64, 575.
    35. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Wiley: New York, 1986.
    36. Balaban, A. T. Chem. Phys. Lett. 1982, 89, 399.
    37. Schultz, H. P. J. Chem. Inf. Comput. Sci. 1989, 29, 227.
    38. Plavšić, D.; Nikolić, S.; Trinajstić, N. J. Math. Chem., submitted for publication.
    39. Randić, M.; Jerman-Blažič, B.; Grossman, S. C.; Rouvray, D. H. Math. Comput. Modelling 1986, 8, 571.
    40. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186.

    41. Nizhnii, S. V.; Epshtein, N. A. Russ. Chem. Rev. 1978, 47, 363.
    42. Hol, W. G. J. Angew. Chem., Int. Ed. Engl. 1986, 25, 767.
    43. Basak, S. C.; Niemi, G. J.; Veith, G. D. In Computational Chemical Graph Theory; Rouvray, D. H., Ed.; Nova: New York, 1990; p 235.
    44. Testa, B.; Mayer, J. M. Acta Pharm. Jugosl. 1990, 40, 315.
    45. Karcher, W.; Devillers, J. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W.; Devillers, J., Eds.; Kluwer: Dordrecht, 1990; p 1.
    46. Topliss, J. G.; Costello, R. J. J. Med. Chem. 1972, 15, 1066.
    47. Topliss, J. G.; Edwards, R. P. J. Med. Chem. 1979, 22, 1238.
    48. Bonchev, D.; Mekenyan, O. J. Math. Chem., in press.
    49. Weast, R. C. CRC Handbook of Chemistry and Physics, 67th ed., 3rd printing; CRC: Boca Raton, FL, 1987.
    50. Beilsteins Handbuch der Organischen Chemie.
    51. Filip, P. A.; Balaban, T.-S.; Balaban, A. T. J. Math. Chem. 1987, 1, 61.
