
Computers and Electronics in Agriculture

24 (1999) 165–175

Statistical tree classification of aphids based on morphological characteristics

E. Zintzaras *, J.T. Margaritopoulos, J.A. Tsitsipis

Laboratory of Entomology, Faculty of Crop and Animal Production, University of Thessaly, Pedion Areos 38334, Volos, Greece

Received 13 May 1999; received in revised form 4 August 1999; accepted 17 August 1999

Abstract

Individual aphids were classified to their clones and separated into their host groups using a novel non-parametric classification tree method. The classification of the individuals was based on morphometric variables measured on each individual. The classification tree method recursively splits the initial set of individuals into subsets, using one of the variables at each split. The method has the form of a tree branching off into intermediate and terminal nodes. The splitting criterion is the increase in purity when a node is split into two subnodes. The size of the tree is controlled by a threshold level for the improvement of the apparent misclassification rate of the tree after each splitting step. The results obtained by applying the classification tree method were in good agreement with those obtained by conventional discriminant methods such as Fisher’s linear discriminant analysis and canonical variate analysis. The classification tree method has an advantage over these two discriminant methods in that it gives a graphical presentation of the structure of the data at any growing stage of the tree. Therefore, it can classify the aphids into their clones and, at the same time, separate the host groups using a single tree. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Classification trees; Discriminant analysis; Aphid morphology

www.elsevier.com/locate/compag

* Corresponding author. Tel.: +30-421-74078; fax: +30-421-61957. E-mail address: [email protected] (E. Zintzaras)

0168-1699/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.

PII: S0168-1699(99)00048-4

E. Zintzaras et al. / Computers and Electronics in Agriculture 24 (1999) 165–175166

1. Introduction

The classification of individual aphids to their clones and the separation of their host groups (plant species), based on the individuals’ morphological characters (variables), are currently investigated by applying conventional multivariate methods such as Fisher’s linear discriminant analysis and canonical variate analysis, respectively (Blackman, 1987; Lazzari and Voegtlin, 1993; Blackman and Spence, 1994). Blackman and Spence (1992) showed that classifying aphids with biochemical techniques might be more effective, but these techniques require specialised equipment and costly chemicals, and no data are available for the biochemical classification of clones. Additionally, morphological criteria are predominantly used in insect classification work.

Fisher’s linear discriminant analysis (LDF) allocates an individual aphid to a clone on the basis of the variables measured on that individual: the aphid is assigned to the clone whose density function takes the largest value at the observed measurements (Krzanowski, 1990).

Canonical variate analysis (CVA) provides two-dimensional ordinations of aphid clones on the basis of their morphological variables. CVA examines the separations among a set of groups (clones) of individuals (Digby and Kempton, 1994).

The application of a novel classification method (Zintzaras et al., 1994) is proposed for the classification of aphids based on measured morphological variables. The method has been implemented in C++, and the program runs under Windows 95/NT.

The classification tree is a non-parametric classification method. It has the form of a tree branching off into intermediate and terminal nodes. Using the variables measured on each individual, each split generates subnodes that are purer than the parent node. The size of the tree is based on the improvement of the apparent misclassification rate (AMR) after each split. This approach makes the construction of a tree simple, and therefore faster than minimizing a cost-complexity function (Breiman et al., 1984). The method also allows the structure of the data to be investigated at any growing stage of the tree.

Blackman (1987) carried out morphometric studies on numerous samples of the Myzus persicae group from different host plants on four continents. He showed that the aphids feeding on tobacco (Nicotiana tabacum L.) could be distinguished by multivariate analysis.

The aim of this study was to classify individual aphids to their clones and to separate their host groups using the classification tree method, and then to compare the results with those of conventional multivariate methods.

With the tree method, it was possible to examine the classification of individual aphids to their clones and, at the same time, the separation of individuals originating from different host plants, especially of those from tobacco from those feeding on other hosts.


2. Materials and methods

2.1. The data

The data consisted of nine morphological variables measured on 18 clones of Myzus persicae (Sulzer): five clones derived from peach (50 individuals), five from pepper (59 individuals) and eight from tobacco (80 individuals). The codes S1–S3, S17 and S18 denote the five peach clones, codes S4–S8 the five pepper clones and codes S9–S16 the eight tobacco clones. A clone is a line of individuals that stem from a single mother reproducing parthenogenetically; each clone consisted of about ten individuals. The clones were collected from different regions of central and northern Greece during the period 1995–1997. Peach is the primary host of the species, on which holocyclic genotypes overwinter as diapausing eggs. In spring, the aphids multiply parthenogenetically on peach for a few generations, after which winged aphids migrate to several secondary hosts, such as pepper and tobacco. The clones established from aphids collected on peach and pepper came from neighboring areas located further away from the tobacco-growing regions.

The nine morphological variables, measured according to Ilharco and van Harten (1987), were (see Fig. 1): v1 = the length of the 3rd antennal segment, v2 = the length of the base of the 6th antennal segment, v3 = the length of the terminal process of the 6th antennal segment, v4 = the length of the last rostral segment, v5 = the length of the hind femur, v6 = the length of the 2nd segment of the hind tarsus, v7 = the length of the siphunculus, v8 = the maximal width of the distal swollen part of the siphunculus and v9 = the length of the cauda. Table 1 gives the means and the corresponding standard deviations of the nine variables.

Fig. 1. Side view of an apterous aphid (modified from Miyaki, 1987) and presentation of the nine measured morphological variables.

Table 1
The means and the corresponding standard deviations of the nine morphological variables measured on individual aphids

Variable    Mean ± SD
v1          0.4331 ± 0.0478
v2          0.1328 ± 0.0100
v3          0.4749 ± 0.0524
v4          0.1201 ± 0.0064
v5          0.6347 ± 0.0734
v6          0.1139 ± 0.0074
v7          0.5116 ± 0.0587
v8          0.0513 ± 0.0041
v9          0.2255 ± 0.0282

2.2. Classification trees

The classification tree is a novel non-parametric discriminant (classification) method. If $x_1, \ldots, x_9$ are the nine variables measured on each individual aphid, the method first sorts the individual aphids according to each variable. It then chooses the splitting variable, and the splitting point on this variable, that best discriminate between the outcome classes (clones). The two resulting subsets are then partitioned independently using the same splitting criterion, and this process is repeated recursively.

The splitting process is represented as a tree. The root node of the tree contains all the individuals in the data set. The root node is then split into two subnodes, which contain the individuals of each of the two subsets described above.

Each node is split into two subnodes in such a way that the subnodes are purer than the parent node. A node is maximally impure when all the classes are equally mixed in it, and it is pure when it contains only one class.

A node $t$ is split into two subnodes $t_L$ and $t_R$ by a split $s$, so that a proportion $p_L$ of the individuals in $t$ go to $t_L$ and a proportion $p_R$ go to $t_R$. If the splitting variable is $x_k$ ($k = 1, \ldots, 9$) and the splitting point is $s$, then $t_L$ contains all the individuals whose values of $x_k$ are less than $s$, and $t_R$ contains the remainder.

The impurity of node $t$ is measured by the following non-negative function:

$$I(t) = 1 - \sum_{j=1}^{18} p^2(j \mid t),$$

where $p(j \mid t)$ is the proportion of individuals of class $j$ in node $t$. The goodness of a split is defined by the increase in purity $\Delta I(s)$ resulting from split $s$ (Breiman et al., 1984; Zintzaras et al., 1994):

$$\Delta I(s) = I(t) - p_L\, I(t_L) - p_R\, I(t_R).$$
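The splitting step can be sketched in Python. This is a minimal illustration, not the authors’ C++ program: it computes the impurity $I(t)$ and searches every variable $x_k$ and candidate cut point $s$ for the split with the largest $\Delta I(s)$. Taking the candidate cut points as midpoints between consecutive sorted values is an assumption, as are the function names `gini` and `best_split`.

```python
from collections import Counter


def gini(labels):
    """Impurity I(t) = 1 - sum_j p(j|t)^2 of the class labels in node t."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search all variables x_k and cut points s for the largest Delta I(s).

    rows: list of measurement lists (one per individual); labels: class of each.
    Returns (delta_I, k, s), where t_L holds the individuals with x_k < s,
    or None if no variable takes at least two distinct values.
    """
    n = len(rows)
    parent = gini(labels)
    best = None
    for k in range(len(rows[0])):
        values = sorted({row[k] for row in rows})
        # Assumed candidate cut points: midpoints of consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[k] < s]
            right = [y for row, y in zip(rows, labels) if row[k] >= s]
            delta = (parent
                     - len(left) / n * gini(left)
                     - len(right) / n * gini(right))
            if best is None or delta > best[0]:
                best = (delta, k, s)
    return best
```

For instance, a node holding all 18 clones in equal numbers has $I(t) = 1 - 18 \cdot (1/18)^2 = 17/18$, the maximally impure case, while a node containing a single clone has $I(t) = 0$.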

The performance of a classification is estimated by the apparent misclassification rate (AMR), i.e. the proportion of misclassified individuals using the resubstitution method (Efron and Tibshirani, 1991). In the resubstitution method, the training set consists of the data from all the individual aphids, and the same data set is used to estimate the performance of the classification; because the data used to grow the tree are also used to evaluate it, this estimate tends to be optimistic. Other methods, such as 10-fold cross-validation, may produce unbiased estimates of the misclassification rate (Efron and Tibshirani, 1991), but they are not examined in this paper. Tree growth is based on the improvement in the AMR. After each split, classes are assigned to the new nodes using the majority rule and the AMR of the tree is calculated. If the next split improves the AMR by at least a threshold percentage, the split is performed; otherwise, the node becomes a terminal node (Zintzaras et al., 1994).
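The AMR-based stopping rule can be sketched as follows (Python, hypothetical helper names). Each terminal node is assigned its majority class, the AMR is the fraction of individuals that disagree with their node’s class, and a proposed split is kept only if it lowers the tree’s AMR by at least the threshold. Reading the paper’s “1%” as an absolute improvement of 0.01 in the AMR is an assumption; the original program may define the threshold differently.

```python
from collections import Counter


def node_errors(labels):
    """Individuals in a node that differ from the node's majority class."""
    if not labels:
        return 0
    return len(labels) - Counter(labels).most_common(1)[0][1]


def amr(leaves):
    """Apparent (resubstitution) misclassification rate of a tree.

    leaves: one list of class labels per terminal node.
    """
    total = sum(len(leaf) for leaf in leaves)
    return sum(node_errors(leaf) for leaf in leaves) / total


def accept_split(leaves, index, left, right, threshold=0.01):
    """Return True if splitting leaves[index] into (left, right) is performed.

    Assumption: threshold is an absolute AMR improvement (0.01 = 1 point).
    """
    before = amr(leaves)
    after = amr(leaves[:index] + [left, right] + leaves[index + 1:])
    return before - after >= threshold
```

Because splitting a node under the majority rule never increases the resubstitution error, `before - after` is never negative, so it is the threshold alone that stops the tree from growing.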

2.2.1. Computer implementation

The method has been implemented in C++, and the program runs under Windows 95/NT. The input data file contains the number of variables, the number of different classes, the number of individuals, the names of the classes and, for each individual, a line with its variable values. Two types of graphical output can be specified: (i) a short output, which summarizes for each terminal node the number of individuals belonging to the different classes; and (ii) a long output, which lists all the individuals of each terminal node. The tabular output lists, for each node, the number of individuals, the splitting variable, the individuals, the number of individuals belonging to the different classes and, for the terminal nodes, the purity measure.
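The text lists what the input file contains but not its exact layout. Purely for illustration, the sketch below assumes a whitespace-separated file with the three counts on separate lines, the class names on one line, and then one line per individual that starts with its class name; this layout and the `parse_input` function are hypothetical, not the format of the original program.

```python
def parse_input(text):
    """Parse a hypothetical input file (layout assumed, see lead-in).

    Assumed layout:
      line 1: number of variables
      line 2: number of classes
      line 3: number of individuals
      line 4: class names separated by whitespace
      then one line per individual: class name followed by its values
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n_vars, n_classes, n_ind = (int(lines[i][0]) for i in range(3))
    class_names = lines[3]
    if len(class_names) != n_classes:
        raise ValueError("class-name count does not match header")
    rows, labels = [], []
    for fields in lines[4:4 + n_ind]:
        label, values = fields[0], [float(v) for v in fields[1:]]
        if label not in class_names or len(values) != n_vars:
            raise ValueError(f"malformed individual line: {fields}")
        labels.append(label)
        rows.append(values)
    if len(rows) != n_ind:
        raise ValueError("fewer individual lines than declared")
    return class_names, rows, labels


# Usage with a tiny made-up file: two variables, two classes, three aphids.
sample = """2
2
3
S1 S4
S1 0.43 0.13
S1 0.44 0.12
S4 0.50 0.14
"""
names, rows, labels = parse_input(sample)
```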

2.3. Conventional discriminant methods

Fisher’s linear discriminant analysis (LDF) allocates an individual to a class on the basis of the variables measured on the individual. In LDF, the density function $f_i(\mathbf{u})$ of each class $i$ is calculated:

$$f_i(\mathbf{u}) = (2\pi)^{-k/2}\, |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(\mathbf{u} - \bar{\mathbf{u}}_i)^{T}\, \Sigma^{-1} (\mathbf{u} - \bar{\mathbf{u}}_i)\right\},$$

where $k$ is the number of variables, $\mathbf{u}$ is the vector of variables measured on the individual, $\bar{\mathbf{u}}_i$ is the mean vector of class $i$ and $\Sigma$ is the common (within-class) dispersion matrix of the variables. The individual is then assigned to the class for which the log-density $\log f_i(\mathbf{u})$ is greatest (Krzanowski, 1990).
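The allocation rule can be illustrated with NumPy. This sketch assumes, as is standard for Fisher’s LDF, that $\Sigma$ is the pooled within-class covariance estimated from the training data; the function names are illustrative. Because $\Sigma$ is common to all classes, choosing the class with the greatest log-density reduces to choosing the class mean with the smallest Mahalanobis distance to $\mathbf{u}$.

```python
import numpy as np


def fit_ldf(X, y):
    """Class mean vectors and the pooled within-class dispersion matrix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)                      # sorted class labels
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Subtract each individual's own class mean, then pool the SSP over classes.
    resid = X - means[np.searchsorted(classes, y)]
    sigma = resid.T @ resid / (len(X) - len(classes))
    return classes, means, sigma


def predict_ldf(u, classes, means, sigma):
    """Assign u to the class with the greatest log-density log f_i(u).

    With a common sigma, -k/2 log(2 pi) and -1/2 log|sigma| are the same for
    every class, so this is the class whose mean has the smallest
    Mahalanobis distance (u - mean_i)^T sigma^{-1} (u - mean_i).
    """
    diffs = np.asarray(u, dtype=float) - means  # one row per class
    d2 = np.einsum("ij,ij->i", diffs, np.linalg.solve(sigma, diffs.T).T)
    return classes[np.argmin(d2)]
```

Applied to the present data, `X` would be the 189 × 9 matrix of measurements and `y` the clone codes S1–S18.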

CVA provides ordinations of aphid clones on the basis of their morphological variables. CVA examines the separations among a set of groups (clones) of units. For this purpose, CVA seeks linear combinations of the $k$ variables, called canonical variates, that have the greatest between-group variation relative to their within-group variation. The coefficient vectors of the first and second canonical variates (denoted CV1 and CV2, respectively) are obtained from the eigenvectors of $W^{-1/2} B W^{-1/2}$ associated with its two largest eigenvalues, where $B$ is the between-group sums of squares and products (SSP) matrix, $W = \sum_{i=1}^{18} W_i$, and $W_i$ is the within-group SSP matrix of group $i$. A two-dimensional ordination from CVA (CV1 versus CV2) usually accounts for most of the variation in the data (Digby and Kempton, 1994).
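The eigen-decomposition described above can be sketched with NumPy (an illustration under stated assumptions, not the software used in the paper). The code accumulates $B$ and $W$ as SSP matrices, forms the symmetric inverse square root of $W$ (so $W$ must be positive definite), and maps the leading eigenvectors back to coefficient vectors $\mathbf{a}$ satisfying $B\mathbf{a} = \lambda W \mathbf{a}$. Centring the scores on the grand mean is an illustrative choice.

```python
import numpy as np


def cva(X, y, n_axes=2):
    """Canonical variate analysis from the eigenvectors of W^{-1/2} B W^{-1/2}.

    Returns all eigenvalues (descending), the first n_axes coefficient
    vectors as columns of A, and the SSP matrices B and W.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand = X.mean(axis=0)
    k = X.shape[1]
    B = np.zeros((k, k))
    W = np.zeros((k, k))
    for g in np.unique(y):
        Xg = X[y == g]
        d = Xg.mean(axis=0) - grand
        B += len(Xg) * np.outer(d, d)       # between-group SSP
        R = Xg - Xg.mean(axis=0)
        W += R.T @ R                        # add the within-group SSP W_i
    # Symmetric inverse square root of W (W must be positive definite).
    w_vals, w_vecs = np.linalg.eigh(W)
    W_inv_sqrt = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    lam, E = np.linalg.eigh(W_inv_sqrt @ B @ W_inv_sqrt)
    order = np.argsort(lam)[::-1]           # largest eigenvalue first
    # a = W^{-1/2} e satisfies B a = lambda W a; scores are (X - grand) @ A.
    A = W_inv_sqrt @ E[:, order[:n_axes]]
    return lam[order], A, B, W
```

Scores for plotting CV1 against CV2 are then `(X - X.mean(axis=0)) @ A`, and one common way to express how much variation the first two axes display is `lam[:2].sum() / lam.sum()`.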

3. Analysis of results

The 189 individual aphids were classified on the basis of their measured morphological variables using the classification tree method. The resulting tree is shown in Fig. 2. Tree growth used a threshold of 1% improvement in the AMR. The overall AMR of the tree was 41%, and the aphids were segregated into 21 terminal nodes.

The tree consisted of three main branches: the first (node 3) corresponds to aphids from tobacco and is distinct from the other two, the second (node 4) corresponds to aphids from peach and the third (node 5) corresponds to aphids from pepper.

The percentage of peach aphids correctly classified to the peach branch (node 4) was 86% (8% went to the pepper branch and 6% to the tobacco branch); that of pepper aphids classified to the pepper branch (node 5) was 86% (7% to the peach branch and 7% to the tobacco branch); and that of tobacco aphids classified to the tobacco branch (node 3) was 89% (3% to the peach branch and 8% to the pepper branch).

The aphids originating from peach were discriminated into their clones (S1, S2, S3, S17 and S18), segregating into four terminal nodes, with the exception of the individuals of clone S3, which were mixed with aphids from pepper. In terminal node 17, the aphids of clone S1 were mixed with aphids of different clones; however, the majority of individuals in this node originated from peach. The only disturbance might be terminal node 25, which includes mainly aphids from pepper (three out of seven individuals).

The aphids from pepper were segregated into seven terminal nodes. The individuals were discriminated into the five clones (S4, S5, S6, S7 and S8), corresponding to five terminal nodes (nodes 10, 19, 32, 33 and 40). However, these nodes included a small proportion of individuals from other hosts (six individuals from tobacco and five from peach). Terminal node 34 contains a mixture of two peach, two pepper and one tobacco aphid. Terminal node 41 contains only aphids from pepper.

Although almost all the aphids from tobacco were segregated into one branch (node 3), the discrimination of individuals into their clones was not completely successful. The individuals of clones S9, S11, S15 and S16 were segregated into six terminal nodes, whereas the aphids of clones S12, S13 and S14 were spread out and mixed with the other tobacco clones. The tobacco branch also contains a small number of aphids originating from other hosts (two from peach and two from pepper).

The classification of aphids to their clones obtained using LDF is presented in Table 2. The overall misclassification rate was 37%. The five peach clones were discriminated; however, 50% of the individuals of clone 3 were classified to other peach clones and to one pepper clone, and 60% of those of clone 17 were classified to other peach clones and to one pepper clone. Most pepper aphids were classified to their clones, and any misclassified individuals remained within this host. The tobacco clones showed a mixture of individuals, mainly within this host; only a small proportion of individuals was classified to peach (two out of 80) and to pepper (two out of 80). Good discrimination was achieved for clones 14, 15 and 16.

Fig. 2. The tree obtained for the 189 individual aphids. The AMR was 41%. The threshold for splitting a node was based on a 1% improvement of the AMR. At each terminal node the number of individuals from each clone is shown, e.g. in terminal node 24 there are eight individuals of clone S2. Clones S1–S3, S17 and S18 are from peach, clones S4–S8 are from pepper, and clones S9–S16 are from tobacco.

Fig. 3 shows the two-dimensional ordination from CVA, which accounts for 67% of the total variation in the data. The first and second canonical functions (CV1 and CV2, respectively) are:

$$\mathrm{CV1} = -1.13\,v_1 + 0.41\,v_2 + 0.61\,v_3 - 0.24\,v_4 + 0.19\,v_5 + 0.082\,v_6 - 0.23\,v_7 + 0.19\,v_8 + 0.93\,v_9$$

and

$$\mathrm{CV2} = 0.69\,v_1 - 0.060\,v_2 + 0.24\,v_3 - 0.29\,v_4 + 0.51\,v_5 + 0.002\,v_6 - 0.15\,v_7 - 0.045\,v_8 - 0.20\,v_9,$$

the corresponding eigenvalues of $W^{-1/2} B W^{-1/2}$ are $\lambda_1 = 3.20$ and $\lambda_2 = 1.31$. The first axis, CV1, clearly separated the tobacco-feeding clones from those originating from the other two hosts, while the second axis, CV2, separated the peach aphids from the pepper ones. The only ‘odd’ case was clone 5 from pepper, which appears closer to the peach clones than to the pepper ones.
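An individual’s position in the ordination is obtained by substituting its measurements into the two equations. The sketch below does exactly that with the published coefficients. Whether those coefficients apply to raw, centred or standardized variables is not stated in the text, so the scale of the input is an assumption, and `cv_scores` is a hypothetical helper.

```python
# Coefficients transcribed from the CV1 and CV2 equations (order v1..v9).
CV1 = [-1.13, 0.41, 0.61, -0.24, 0.19, 0.082, -0.23, 0.19, 0.93]
CV2 = [0.69, -0.060, 0.24, -0.29, 0.51, 0.002, -0.15, -0.045, -0.20]


def cv_scores(v):
    """Return (CV1, CV2) for one aphid's nine measurements v1..v9.

    Assumption: v is on the same scale (raw, centred or standardized)
    as the variables entered into the original CVA, which is not stated.
    """
    if len(v) != 9:
        raise ValueError("expected nine measurements v1..v9")
    return (sum(a * x for a, x in zip(CV1, v)),
            sum(b * x for b, x in zip(CV2, v)))
```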

Table 2
The results of Fisher’s linear discriminant analysis: numbers of individuals of each observed clone (rows, clones 1–18) by predicted clone membership (columns, clones 1–18)


Fig. 3. Two-dimensional ordination of aphids from 18 clones using canonical variate analysis. The two-dimensional ordination accounts for 67% of the total variation among clones.

From the equation for CV1 it can be seen that the coefficients of v1, v9 and v3 are relatively high. For CV2, the coefficient of v1 is particularly high relative to the others. In the classification tree method, the importance of a variable is indicated by the number of times it was used for splitting; these frequencies are v1 = 5, v2 = 2, v3 = 3, v4 = 1, v5 = 1, v6 = 1, v7 = 1, v8 = 2 and v9 = 3. In both methods, the variables v1, v9 and v3 contribute most to the discrimination.
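The agreement between the two importance measures can be checked directly from the reported numbers. The sketch ranks the variables by the absolute size of their CV1 coefficients and by their split frequencies. Comparing raw coefficient magnitudes in this way is only meaningful if the variables were on comparable scales, which is assumed here.

```python
# CV1 coefficients and split-variable frequencies as reported in the text.
cv1 = {"v1": -1.13, "v2": 0.41, "v3": 0.61, "v4": -0.24, "v5": 0.19,
       "v6": 0.082, "v7": -0.23, "v8": 0.19, "v9": 0.93}
splits = {"v1": 5, "v2": 2, "v3": 3, "v4": 1, "v5": 1,
          "v6": 1, "v7": 1, "v8": 2, "v9": 3}


def top(scores, n=3):
    """Names of the n variables with the largest scores (ties keep v-order)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


by_cv1 = top({name: abs(c) for name, c in cv1.items()})
by_splits = top(splits)
print(by_cv1, by_splits)  # prints: ['v1', 'v9', 'v3'] ['v1', 'v3', 'v9']
```

Both rankings select the same three variables, v1, v3 and v9, differing only in the order of v3 and v9, which are tied in split frequency.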

4. Discussion

The classification tree method has the advantage of allowing the structure of the data to be investigated visually at any growth stage of the tree. The tree method classified the aphids to their clones and separated those originating from different hosts. The peach aphids were segregated in one branch of the tree; only one clone from peach was not discriminated, and it was mixed with pepper aphids. The pepper aphids were also separated in a distinct branch, which is closer to the peach branch than to the tobacco one. Tobacco aphids showed a distinct morphology and were separated from those originating from other hosts because they represent a different host group adapted to a particular host plant (Blackman, 1987; Blackman and Spence, 1992; Field et al., 1994; Margaritopoulos et al., 1998). However, tobacco aphids showed large variability within this group, probably because the clones were collected from different regions, whereas the peach and pepper aphids were collected from a single locality.


The quality of the classification tree was assessed using conventional discriminant analysis methods, namely Fisher’s LDF and CVA.

The overall misclassification rates of individuals into their clones were 41% for the tree and 37% for LDF. In both methods, the misclassified individuals were mainly within the host group to which they belonged.

CVA separated the three host groups, in agreement with the three main branches of the tree. In CVA, clone 5 from pepper was mixed with the peach clones, whereas in the tree method this clone was grouped with pepper, although its node separated earlier and is distinct from the other pepper nodes.

The classification tree method provides a graphical presentation of the structure of the data. It discriminated the individual aphids to their clones, distinguished the tobacco form, and separated aphids of the same group (the non-tobacco form) originating from different hosts (peach and pepper). The tree results were in good agreement, in terms of misclassification rates, with those obtained from the conventional discrimination methods. Finally, the work presented confirmed the utility of the tree method as a supplementary tool in aphid systematic and morphometric studies, with the advantage of providing in a single analysis results that would otherwise require more than one multivariate method.

Acknowledgements

We thank Prof. J. Podani for useful comments and discussion. Part of this work was funded by the EPETII 453 project of the Greek Secretariat for Research and Technology and by Tobacco Fund 18/T/96 of the EE.

References

Blackman, R.L., 1987. Morphological discrimination of a tobacco-feeding form from Myzus persicae (Sulzer) (Hemiptera: Aphididae), and a key to New World Myzus (Nectarosiphon) species. Bull. Entomol. Res. 77, 713–730.

Blackman, R.L., Spence, J.M., 1992. Electrophoretic distinction between the peach-potato aphid, Myzus persicae, and the tobacco aphid, Myzus nicotianae (Homoptera: Aphididae). Bull. Entomol. Res. 82, 161–165.

Blackman, R.L., Spence, J.M., 1994. The effects of temperature on aphid morphology, using a multivariate approach. Eur. J. Entomol. 91, 7–22.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.

Digby, P.G.N., Kempton, R.A., 1994. Multivariate Analysis of Ecological Communities. Chapman and Hall, London.

Efron, B., Tibshirani, R., 1991. Statistical data analysis in the computer age. Science 253, 390–395.

Field, L.M., Javed, N., Stribley, M.F., Devonshire, A.L., 1994. The peach-potato aphid Myzus persicae and the tobacco aphid Myzus nicotianae have the same esterase-based mechanisms of insecticide resistance. Insect Mol. Biol. 3, 143–148.

Ilharco, F.A., van Harten, A., 1987. Systematics, pp. 51–77. In: Minks, A.K., Harrewijn, P. (Eds.), Aphids. Their Biology, Natural Enemies and Control. Volume A, Elsevier, Amsterdam, 450 pp.

Krzanowski, W.J., 1990. Principles of Multivariate Analysis. Clarendon Press, Oxford.

Lazzari, S.M.N., Voegtlin, D.J., 1993. Morphological variation in Rhopalosiphum padi and R. insertum (Homoptera: Aphididae) related to host plant and temperature. Ann. Entomol. Soc. Am. 86, 26–36.

Margaritopoulos, J.T., Mamuris, Z., Tsitsipis, J.A., 1998. Attempted discrimination of Myzus persicae and Myzus nicotianae (Homoptera: Aphididae) by random amplified polymorphic DNA polymerase chain reaction technique. Ann. Entomol. Soc. Am. 91, 602–607.

Miyaki, M., 1987. Morphology and systemics. In: Minks, A.K., Harrewijn, P. (Eds.), Aphids, Their Biology, Natural Enemies and Control, vol. 2A. Elsevier, Amsterdam, pp. 1–25.

Zintzaras, E., Brown, N.P., Kowald, A., 1994. Growing a classification tree using the apparent misclassification rate. CABIOS 10, 263–271.
