
Research Article

Modeling and Testing Landslide Hazard Using Decision Tree

Mutasem Sh. Alkhasawneh,1 Umi Kalthum Ngah,1 Lea Tien Tay,1 Nor Ashidi Mat Isa,1 and Mohammad Subhi Al-Batah2

1 Imaging and Computational Intelligence (ICI) Group, School of Electrical & Electronic Engineering, Universiti Sains Malaysia, Engineering Campus, Nibong Tebal, 14300 Penang, Malaysia

2 Department of Computer Science and Software Engineering, Faculty of Science and Information Technology, Jadara University, P.O. Box 733, Irbid 21110, Jordan

Correspondence should be addressed to Mutasem Sh. Alkhasawneh; m_sh_ka1@yahoo.com

Received 9 September 2013; Accepted 9 December 2013; Published 4 February 2014

Academic Editor: Chong Lin

Copyright © 2014 Mutasem Sh. Alkhasawneh et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper proposes a decision tree model for specifying the importance of 21 factors causing landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain precipitation, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and ranking factor importance, and are usually represented by an easy-to-interpret tree-like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137572 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%), compared to CHAID (81.9%), CRT (75.6%), and QUEST (74.0%). Across the four models, the five most important factors were slope angle, distance from drainage, surface area, slope aspect, and cross curvature.

1. Introduction

Landslides are among the most destructive natural disasters, causing loss of life and billions of dollars in damage worldwide every year. They threaten human lives, the environment, resources, and property [1]. Landslide susceptibility is defined as the propensity of an area to generate landslides [2]. Assuming that landslides will occur in the future because of the same conditions that produced them in the past, susceptibility assessments can be used to predict the geographical location of future landslides [3–5]. Because landslides occur frequently and over an extensive range, landslide research has attracted the attention of many scientists, some of whom have focused on landslide susceptibility mapping [6, 7]. Through scientific analysis of landslide susceptibility mapping, we can assess and locate areas at risk of landslides. Furthermore, it allows one to take the proper precautions to reduce the negative impacts of landslides [8].

Many studies have been conducted to detect landslides and to analyze the landslide hazard using Geographic Information Systems (GIS) and remote sensing [9–13]. Recently, with the development of GIS data processing techniques, quantitative studies have been applied to landslide susceptibility analysis using various techniques. Such studies can be grouped by the techniques used, such as probabilistic methods [14–18], logistic regression [19–21], and artificial neural networks [22–25]. Most of these studies aimed to increase the accuracy of landslide prediction by finding suitable techniques for the respective study area. The objective of this study was to propose the best decision tree model for determining the most important factors that contribute to landslide susceptibility. The decision tree is a popular classification technique and represents a good compromise between comprehensibility, accuracy, and efficiency [26].

Hindawi Publishing Corporation, Journal of Applied Mathematics, Volume 2014, Article ID 929768, 9 pages, http://dx.doi.org/10.1155/2014/929768

Statistical decision tree models have been successfully used to classify and to estimate land use, land cover, and other geographical attributes from remote sensing data [27, 28]. The decision tree, having its origin in machine learning theory, is an efficient tool for classification and estimation. Unlike other statistical methods, the decision tree makes no statistical assumptions, can handle data that are represented on different measurement scales, and is computationally fast [22]. The decision tree also has the advantage that the estimation processes and the order of important explanatory variables are explicitly represented by tree structures [29]. In addition, recent developments in computer technologies, pattern recognition algorithms, and automatic methods of decision-tree design have enabled the use of decision tree models.

Pal and Mather [30] demonstrated the advantages of the decision tree for land cover classification in comparison with other classifiers such as the maximum likelihood method and artificial neural networks. Saito et al. [2] used decision tree models to analyze a distribution of landslides that were almost suspended or dormant. They also indicated that decision tree models are useful for estimating landslide distributions. Bui et al. [31] compared the decision tree with other models for landslide prediction in Vietnam. The decision tree showed a decent performance compared with Support Vector Machines (SVM) and Naive Bayes models. Meanwhile, the decision tree showed a good ability to determine the important factors causing landslides compared with the other models used. Pang et al. [32] produced a landslide hazard map of Penang Island using a decision tree based on Quinlan's C4.5 algorithm. Twelve landslide causative factors were used in their study. Pradhan [33] used three models, decision tree, SVM, and adaptive neurofuzzy inference system (ANFIS), for producing the landslide hazard map for the Penang Hill area. The decision tree showed a better performance compared with the SVM and ANFIS classifiers.

In this paper, four decision tree methods were used to build the optimum decision models: Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were selected as the input variables of the decision trees. A data set of 137572 samples from Penang Island in Malaysia was used as examples for building the decision trees. The experiment contained ten rounds according to different partitions of training sets and test sets.

2. Decision Trees

A decision tree is a technique for finding and describing structural patterns in data as tree structures; a decision tree does not require the relationship between all the input variables and an objective variable to be specified in advance. This technique helps to explain data and to make predictions using the data [34]. A decision tree can also handle data measured on different scales without any assumptions concerning the frequency distributions of the data, based on its nonlinear relationship [35]. Therefore, all variables were put into the decision tree model.

The main purpose of using the decision tree is to achieve a more concise and perspicuous representation of the relationship between an objective variable and explanatory variables. Namely, the decision tree can be visualized more easily; unlike neural networks, it is not a "black box."

The decision tree is based on a multistage or hierarchical decision scheme (tree structure). The tree is composed of a root node, a set of internal nodes, and a set of terminal nodes (leaves). Each node of the decision tree structure makes a binary decision that separates either one class or some of the classes from the remaining classes. The processing is carried out by moving down the tree until the terminal node is reached. In a decision tree, features that carry maximum information are selected for classification, while remaining features are rejected, thereby increasing computational efficiency [36]. The top-down induction of the decision tree indicates that variables in the higher order of the tree structure are more important.

There are three types of decision tree: CRT, CHAID and Exhaustive CHAID, and QUEST. The algorithms of the three types follow these steps: start tree building by assigning nodes to classes, stop tree building, select the optimal tree, and perform cross-validation [37]. CRT performs tree "pruning" before producing the optimal tree selection, while the CHAID method performs statistical tests at each step of splitting.

2.1. Classification and Regression Tree (CRT). CRT is a recursive partitioning method used for both regression and classification. CRT is constructed by splitting subsets of the data set using all predictor variables to create two child nodes repeatedly, beginning with the entire data set. The best predictor is chosen using a variety of impurity or diversity measures (Gini, twoing, ordered twoing, and least-squared deviation). The goal is to produce subsets of the data which are as homogeneous as possible with respect to the target variable [38]. In this study, we used the Gini impurity measure, which is used for categorical target variables.

Gini Impurity Measure. The Gini index at node $t$, $g(t)$, is defined as

$$g(t) = \sum_{j \neq i} p(j \mid t)\, p(i \mid t), \tag{1}$$

where $i$ and $j$ are categories of the target variable. The equation for the Gini index can also be written as

$$g(t) = 1 - \sum_{j} p^{2}(j \mid t). \tag{2}$$

Thus, when the cases in a node are evenly distributed across the categories, the Gini index takes its maximum value of $1 - (1/k)$, where $k$ is the number of categories for the target variable. When all cases in the node belong to the same category, the Gini index equals 0.


If costs of misclassification are specified, the Gini index is computed as

$$g(t) = \sum_{j \neq i} C(i \mid j)\, p(j \mid t)\, p(i \mid t), \tag{3}$$

where $C(i \mid j)$ is the cost of misclassifying a category $j$ case as category $i$.

The Gini criterion function for split $s$ at node $t$ is defined as

$$\Phi(s, t) = g(t) - p_{L}\, g(t_{L}) - p_{R}\, g(t_{R}), \tag{4}$$

where $p_{L}$ is the proportion of cases in $t$ sent to the left child node $t_{L}$ and $p_{R}$ is the proportion sent to the right child node $t_{R}$. The split $s$ is chosen to maximize the value of $\Phi(s, t)$. This value is reported as the improvement in the tree [39].
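The Gini index (2) and the split criterion (4) can be checked numerically. The sketch below is an illustration only, not the software used in the study: the function names are hypothetical, equal misclassification costs are assumed, and the class counts are those reported for nodes 0, 1, and 2 of the CRT tree in Table 2.

```python
def gini(counts):
    """Gini index g(t) = 1 - sum_j p(j|t)^2 for a node with the given class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_improvement(parent, left, right):
    """Gini criterion Phi(s, t) = g(t) - p_L g(t_L) - p_R g(t_R) for one binary split."""
    n = sum(parent)
    p_left = sum(left) / n
    p_right = sum(right) / n
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


# Class counts (no landslide, landslide) taken from Table 2.
root = (68786, 68786)    # node 0
node1 = (31018, 1347)    # node 1: V15 <= 0.03725
node2 = (37768, 67439)   # node 2: V15 > 0.03725

improvement = gini_improvement(root, node1, node2)
print(round(gini(root), 3), round(improvement, 3))  # prints 0.5 0.129
```

With two evenly balanced classes the root attains the maximum value $1 - 1/k = 0.5$, and the computed improvement of the first split agrees with the value 0.129 listed for nodes 1 and 2 in Table 2.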

2.2. Chi-Square Automatic Interaction Detector (CHAID) and Exhaustive CHAID. The CHAID method is based on the $\chi^{2}$-test of association. A CHAID tree is a decision tree that is constructed by repeatedly splitting subsets of the space into two or more child nodes, beginning with the entire data set [40]. To determine the best split at any node, any allowable pair of categories of the predictor variables is merged until there is no statistically significant difference within the pair with respect to the target variable. The CHAID method naturally deals with interactions between the independent variables that are directly available from an examination of the tree. The final nodes identify subgroups defined by different sets of independent variables [41].

The CHAID algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before using the following algorithm. For each predictor variable $X$, merge nonsignificant categories. Each final category of $X$ will result in one child node if $X$ is used to split the node. The merging step also calculates the adjusted $p$ value that is to be used in the splitting step.

(1) If $X$ has 1 category only, stop and set the adjusted $p$ value to be 1.

(2) If $X$ has 2 categories, go to step 8.

(3) Else, find the allowable pair of categories of $X$ (an allowable pair of categories for an ordinal predictor is two adjacent categories, and for a nominal predictor is any two categories) that is least significantly different. The most similar pair is the pair whose test statistic gives the largest $p$ value with respect to the dependent variable $Y$.

(4) For the pair having the largest $p$ value, check if its $p$ value is larger than a specified alpha-level $\alpha_{\text{merge}}$. If it is, this pair is merged into a single compound category. Then a new set of categories of $X$ is formed. If it is not, then go to step 7.

(5) (Optional) If the newly formed compound category consists of three or more original categories, then find the best binary split within the compound category for which the $p$ value is the smallest. Perform this binary split if its $p$ value is not larger than an alpha-level $\alpha_{\text{split-merge}}$.

(6) Go to step 2.

(7) (Optional) Any category having too few observations (as compared with a user-specified minimum segment size) is merged with the most similar other category, as measured by the largest of the $p$ values.

(8) The adjusted $p$ value is computed for the merged categories by applying Bonferroni adjustments [42, 43].
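The core of the merging loop (steps 2, 3, 4, and 6) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: it handles only an ordinal predictor and a binary target, it omits the optional steps 5 and 7 and the Bonferroni adjustment of step 8, and the function names and example counts are hypothetical. Because each pairwise table is 2 × 2, the Pearson test has one degree of freedom and its $p$ value can be written with the complementary error function.

```python
import math


def pair_p_value(a, b):
    """Pearson chi-square p value for the 2x2 table formed by two predictor
    categories; a and b are (class 0 count, class 1 count). With 1 degree of
    freedom the upper tail probability equals erfc(sqrt(X2 / 2))."""
    n = sum(a) + sum(b)
    col_totals = (a[0] + b[0], a[1] + b[1])
    stat = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j in (0, 1):
            expected = row_total * col_totals[j] / n
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2.0))


def merge_ordinal_categories(counts, alpha_merge=0.05):
    """Repeatedly merge the adjacent pair of categories with the largest
    p value while that p value exceeds alpha_merge (steps 2, 3, 4, and 6).
    Returns the groups of original category indices and their pooled counts."""
    groups = [[i] for i in range(len(counts))]
    pooled = [tuple(c) for c in counts]
    while len(groups) > 2:
        p_values = [pair_p_value(pooled[i], pooled[i + 1])
                    for i in range(len(pooled) - 1)]
        best = max(range(len(p_values)), key=p_values.__getitem__)
        if p_values[best] <= alpha_merge:
            break  # every adjacent pair differs significantly
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
        pooled[best:best + 2] = [(pooled[best][0] + pooled[best + 1][0],
                                  pooled[best][1] + pooled[best + 1][1])]
    return groups, pooled


# Hypothetical counts (no landslide, landslide) for three ordered categories.
groups, pooled = merge_ordinal_categories([(90, 10), (88, 12), (30, 70)])
print(groups, pooled)  # prints [[0, 1], [2]] [(178, 22), (30, 70)]
```

In this example the first two categories have nearly identical class proportions, so they are merged; the loop then stops because only two categories remain.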

The CHAID algorithm reduces the number of predictor categories by merging categories when there is no significant difference between them with respect to the class. When no more categories can be merged, the predictor can be considered as a candidate for a split at the node. The original CHAID algorithm is not guaranteed to find the best (most significant) split of all of those examined because it uses the last split tested. The Exhaustive CHAID algorithm attempts to overcome this problem by continuing to merge categories, irrespective of significance level, until only two categories remain for each predictor. It then uses the most significant split found rather than the last one tried. Exhaustive CHAID requires more computer time [44].

Calculations of (unadjusted) $p$ values in the above algorithms depend on the type of dependent variable. The merging step of both CHAID and Exhaustive CHAID sometimes needs the $p$ value for a pair of $X$ categories and sometimes needs the $p$ value for all the categories of $X$. When the $p$ value for a pair of $X$ categories is needed, only part of the data in the current node is relevant. Let $D$ denote the relevant data. Suppose that in $D$ there are $I$ categories of $X$ and $J$ categories of $Y$ (if $Y$ is categorical). The $p$ value calculation using data in $D$ is given below. The null hypothesis of independence of $X$ and the dependent variable $Y$ is tested. To do the test, a contingency (or count) table is formed using classes of $Y$ as columns and categories of the predictor $X$ as rows. The expected cell frequencies under the null hypothesis are estimated. The observed cell frequencies and the expected cell frequencies are used to calculate the Pearson chi-squared statistic or the likelihood ratio statistic. Here the $p$ value is computed from Pearson's chi-square statistic:

$$X^{2} = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{\left(n_{ij} - \hat{m}_{ij}\right)^{2}}{\hat{m}_{ij}}, \tag{5}$$

where $n_{ij} = \sum_{n \in D} f_{n}\, I(x_{n} = i \wedge y_{n} = j)$ is the observed cell frequency, with $f_{n}$ the frequency weight of case $n$ and $I(\cdot)$ the indicator function, and $\hat{m}_{ij}$ is the estimated expected cell frequency for cell $(x_{n} = i, y_{n} = j)$ under the independence model:

$$\hat{m}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}, \qquad n_{i\cdot} = \sum_{j=1}^{J} n_{ij}, \qquad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}, \qquad n_{\cdot\cdot} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}.$$

The corresponding $p$ value is given by $p = \Pr(\chi^{2}_{d} > X^{2})$ for Pearson's chi-square test, where $\chi^{2}_{d}$ follows a chi-squared distribution with $d = (J - 1)(I - 1)$ degrees of freedom.

In step 8, the adjusted $p$ value is calculated as the $p$ value times a Bonferroni multiplier. The Bonferroni multiplier adjusts for multiple tests. Suppose that a predictor variable originally has $I$ categories and it is reduced to $r$ categories after the merging step. The Bonferroni multiplier $B$ is the number of possible ways that $I$ categories can be merged into $r$ categories. For $r = I$, $B = 1$. For $2 \le r < I$, use the following equation:

$$B = \sum_{v=0}^{r-1} (-1)^{v}\, \frac{(r - v)^{I}}{v!\,(r - v)!}. \tag{6}$$
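Equation (6) is easy to evaluate exactly with rational arithmetic. The following is a minimal sketch with hypothetical function names; capping the adjusted $p$ value at 1 is an assumption added here so that the result remains a valid probability, and it is not stated in the text.

```python
from fractions import Fraction
from math import factorial


def bonferroni_multiplier(I, r):
    """Number of ways to merge I categories into r categories, following
    equation (6); B = 1 when no merging took place (r == I)."""
    if r == I:
        return 1
    if not 2 <= r < I:
        raise ValueError("expected 2 <= r <= I")
    total = sum(
        Fraction((-1) ** v * (r - v) ** I, factorial(v) * factorial(r - v))
        for v in range(r)
    )
    return int(total)  # the alternating sum is always a whole number


def adjusted_p_value(p, I, r):
    """Step 8: p value times the Bonferroni multiplier.
    The cap at 1.0 is an assumption, not part of the source text."""
    return min(1.0, p * bonferroni_multiplier(I, r))


print(bonferroni_multiplier(3, 2))  # prints 3
print(bonferroni_multiplier(4, 2))  # prints 7
```

For example, three categories {a, b, c} can be reduced to two in three ways ({a, b} with {c}, {a, c} with {b}, and {b, c} with {a}), which matches $B = 3$.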

2.3. Quick-Unbiased-Efficient Statistical Tree (QUEST). QUEST is a binary-split decision tree algorithm for classification and data mining. QUEST can be used with univariate or linear combination splits. A unique feature is that its attribute selection method has negligible bias. If all the attributes are uninformative with respect to the class attribute, then each has approximately the same chance of being selected to split a node [45].

The QUEST tree growing process consists of the selection of a split predictor, the selection of a split point for the selected predictor, and stopping. In this algorithm, only univariate splits are considered. For selection of the split predictor, it uses the following algorithm.

(1) For each continuous predictor $X$, perform an ANOVA $F$-test that tests if all the different classes of the dependent variable $Y$ have the same mean of $X$, and calculate the $p$ value according to the $F$ statistic. For each categorical predictor, perform Pearson's $\chi^{2}$-test of the independence of $Y$ and $X$, and calculate the $p$ value according to the $X^{2}$ statistic.

(2) Find the predictor with the smallest $p$ value and denote it $X^{*}$.

(3) If this smallest $p$ value is less than $\alpha / M$, where $\alpha \in (0, 1)$ is a user-specified level of significance and $M$ is the total number of predictor variables, predictor $X^{*}$ is selected as the split predictor for the node. If not, go to step 4.

(4) For each continuous predictor $X$, compute Levene's $F$ statistic based on the absolute deviation of $X$ from its class mean to test if the variances of $X$ for different classes of $Y$ are the same, and calculate the $p$ value for the test.

(5) Find the predictor with the smallest $p$ value and denote it as $X^{**}$.

(6) If this smallest $p$ value is less than $\alpha / (M + M_{1})$, where $M_{1}$ is the number of continuous predictors, $X^{**}$ is selected as the split predictor for the node. Otherwise, this node is not split [45].
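The two-stage decision in steps 2, 3, 5, and 6 can be sketched once the $p$ values from steps 1 and 4 are available. The function below is a hypothetical illustration: it takes the $p$ values as precomputed dictionaries rather than running the ANOVA, chi-square, or Levene tests itself, and the predictor names and numbers are invented for the example.

```python
def select_split_predictor(first_stage_p, levene_p, alpha=0.05):
    """Pick the QUEST split predictor from precomputed p values.

    first_stage_p: predictor -> p value from step 1 (ANOVA F or chi-square),
                   one entry for each of the M predictors.
    levene_p:      continuous predictor -> p value from step 4 (Levene's F),
                   one entry for each of the M1 continuous predictors.
    Returns the selected predictor, or None if the node is not split.
    """
    m = len(first_stage_p)
    m1 = len(levene_p)

    # Steps 2 and 3: smallest first-stage p value against alpha / M.
    best = min(first_stage_p, key=first_stage_p.get)
    if first_stage_p[best] < alpha / m:
        return best

    # Steps 5 and 6: smallest Levene p value against alpha / (M + M1).
    if levene_p:
        best_var = min(levene_p, key=levene_p.get)
        if levene_p[best_var] < alpha / (m + m1):
            return best_var
    return None


# Hypothetical p values for three predictors, two of them continuous.
step1 = {"slope_angle": 0.03, "slope_aspect": 0.20, "geology": 0.50}
step4 = {"slope_angle": 0.004, "slope_aspect": 0.30}
print(select_split_predictor(step1, step4))  # prints slope_angle
```

Here no predictor passes the first threshold (0.03 is not below 0.05/3), so the choice falls to the variance test, where 0.004 is below 0.05/5.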

3. Study Area

As shown in Figure 1, this study is focused on Penang Island, which lies between 5°15′ and 5°30′ N latitude and 100°10′ and 100°20′ E longitude. The North Channel separates the study area from the mainland. It occupies an area of 285 km² and is one of the 13 states of Malaysia. The island is bounded to the north and east by the state of Kedah, to the south by the state of Perak, and to the west by the Strait of Malacca and Sumatra (Indonesia).

Figure 1: Study area map and landslide location map with hill-shaded map. (Legend: river, landslide locations, and altitude classes in meters from 0 to 820; scale bar in km.)

Penang consists of both the island of Penang and a coastal strip on the mainland known as Province Wellesley. This paper focuses only on the island, where frequent landslides occur, threatening lives and damaging property [46, 47]. Heavy rain plays a major role in triggering the landslides in the study area. Data from the Malaysian Meteorological Department show that the annual rainfall varies approximately between 2254 mm and 2903 mm in the study area. Penang Island has a tropical climate with high temperatures of 29°C to 32°C, and humidity ranges from 65% to 96%. Topographic elevations vary between 0 m and 820 m above sea level. The slope angle ranges from 0° to 87°, while 43.28% of the island is flat. Geological data from the Minerals and Geosciences Department, Malaysia, show that Ferringhi granite, Batu Maung granite, clay, and sand granite represent more than 72% of the study area's geology. Vegetation cover consists mainly of forests and fruit plantations.

Table 1: Number of nodes, terminal nodes, and order of variable importance.

| Decision tree model | No. of nodes | No. of terminal nodes | Independent variables included (in order of importance) |
|---|---|---|---|
| CHAID | 317 | 254 | V3, V16, V15, V5, V4, V21, V17, V13, V1, V12, V19, V18, V10 |
| Exhaustive CHAID | 377 | 302 | V3, V16, V15, V5, V4, V21, V9, V13, V1, V17, V12, V19, V18, V10 |
| CRT | 43 | 22 | V3, V16, V15, V5, V4, V2, V21, V6, V8, V9, V13, V1, V17, V12, V19, V20, V18, V11, V14, V7, V10 |
| QUEST | 55 | 28 | V3, V16, V15, V5, V4, V2, V21, V6, V8, V13, V1, V17, V12, V19, V18, V11, V7, V10 |

4. Data Collection

An effective intelligent system requires a comprehensive data set. Therefore, 137572 samples of data were selected in this analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. Then a digital elevation model (DEM) was used to extract 21 topographic factors. The DEM of Penang Island, with five-meter resolution, was obtained from the Department of Survey and Mapping Malaysia. The extracted factors are abbreviated as V1 (vegetation cover), V2 (distance from the fault line), V3 (slope angle), V4 (cross curvature), V5 (slope aspect), V6 (distance from road), V7 (geology), V8 (diagonal length), V9 (longitude curvature), V10 (rugosity), V11 (plan curvature), V12 (elevation), V13 (rain precipitation), V14 (soil texture), V15 (surface area), V16 (distance from drainage), V17 (roughness), V18 (land cover), V19 (general curvature), V20 (tangent curvature), and V21 (profile curvature). In the previous studies of Penang Island, only 14 factors (V1 to V14) were investigated for landslides [48], whereas the factors V15 to V21 are applied and investigated for the first time in the study area. Furthermore, the 21 factors represent the available data for all factors that can cause landslides in the study area. The intelligent system target (landslide history) is represented by 0 for no landslide and 1 for landslide. The data were normalized to range between 0 and 1 for each of the factors individually. A 10-fold cross-validation analysis was performed as an initial evaluation of the test error of the algorithms. Briefly, this process involves splitting the data set into 10 random segments and using 9 of them for training and the 10th as a test set for the algorithm. Classification accuracy of each model was calculated as follows.

The accuracy of correctly classified landslide (1) is given by

$$\text{accuracy}(1) = \sum_{i=1}^{10} \frac{\text{number of correctly classified } (1)}{\text{number of } (1)}. \tag{7}$$

The accuracy of correctly classified no landslide (0) is given by

$$\text{accuracy}(0) = \sum_{i=1}^{10} \frac{\text{number of correctly classified } (0)}{\text{number of } (0)}. \tag{8}$$

The overall accuracy for the decision tree model is given by

$$\text{overall accuracy} = \frac{\text{accuracy}(1) + \text{accuracy}(0)}{2}. \tag{9}$$
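A short sketch of equations (7) to (9) follows. It assumes one particular reading of (7) and (8), namely that the correctly classified cases of a class are summed over the 10 test folds and divided by the total number of cases of that class; the function names and the per-fold counts are hypothetical.

```python
def class_accuracy(correct_per_fold, class_total):
    """Equations (7) and (8): correctly classified cases of one class,
    summed over the test folds, divided by the number of cases of that class."""
    return sum(correct_per_fold) / class_total


def overall_accuracy(accuracy_1, accuracy_0):
    """Equation (9): mean of the two per-class accuracies."""
    return (accuracy_1 + accuracy_0) / 2


# Hypothetical example: 10 folds, 100 cases per class in total.
acc_1 = class_accuracy([9, 8, 10, 9, 9, 10, 8, 9, 9, 9], 100)   # 0.90
acc_0 = class_accuracy([7, 8, 7, 6, 8, 7, 7, 8, 7, 7], 100)     # 0.72
print(round(overall_accuracy(acc_1, acc_0), 2))  # prints 0.81
```

Applied to the per-class percentages in Table 3, equation (9) reproduces the reported overall values, for example (72.3 + 91.7) / 2 = 82.0 for Exhaustive CHAID and (73.5 + 90.3) / 2 = 81.9 for CHAID.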

5. Discussion

Four tree algorithms, CHAID, Exhaustive CHAID, CRT, and QUEST, were applied to map landslide susceptibility hazard. The four trees were built on the entire sample of 137572 cases with 10-fold cross-validation, a 0.05 adjustment of the probabilities, a minimum of 100 cases in a parent node, a minimum of 50 cases in a child node, and equal misclassification costs. The maximum number of levels is 3 for CHAID and Exhaustive CHAID and 5 for CRT and QUEST.

The number of nodes, the number of terminal nodes, and the importance of the independent variables produced by each model are presented in Table 1. The classification trees obtained have 317 nodes with 254 terminal nodes using CHAID, 377 nodes with 302 terminal nodes using Exhaustive CHAID, 43 nodes with 22 terminal nodes using CRT, and 55 nodes with 28 terminal nodes using QUEST. An example of a decision tree built with the CRT method is given in Table 2. The tree has 43 nodes: the root node, 20 internal nodes, and 22 leaves (terminal nodes). Percentages in each category and in each joint category are presented in Table 2.
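The node counts quoted for the CRT tree can be recovered from the parent column of Table 2 alone. The sketch below is only a consistency check with hypothetical helper names; the `parents` mapping is transcribed from Table 2 (each node from 1 to 42 mapped to its parent node).

```python
# Parent of every non-root node of the CRT tree, transcribed from Table 2.
parents = {
    1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2, 7: 4, 8: 4, 9: 5, 10: 5,
    11: 6, 12: 6, 13: 7, 14: 7, 15: 8, 16: 8, 17: 9, 18: 9, 19: 10, 20: 10,
    21: 11, 22: 11, 23: 12, 24: 12, 25: 14, 26: 14, 27: 16, 28: 16,
    29: 17, 30: 17, 31: 18, 32: 18, 33: 19, 34: 19, 35: 20, 36: 20,
    37: 22, 38: 22, 39: 23, 40: 23, 41: 24, 42: 24,
}


def depth_of(node, parents):
    """Number of splits between the root and the given node."""
    depth = 0
    while node in parents:
        node = parents[node]
        depth += 1
    return depth


def summarize_tree(parents):
    """Count nodes, leaves, and internal nodes, and find the deepest level."""
    nodes = set(parents) | set(parents.values())
    with_children = set(parents.values())      # root and internal nodes
    leaves = nodes - with_children
    return {
        "nodes": len(nodes),
        "leaves": len(leaves),
        "internal": len(with_children) - 1,    # exclude the root
        "max_depth": max(depth_of(n, parents) for n in leaves),
    }


print(summarize_tree(parents))
# prints {'nodes': 43, 'leaves': 22, 'internal': 20, 'max_depth': 5}
```

The result agrees with the 43 nodes, 20 internal nodes, and 22 leaves stated above, and the deepest leaves sit five levels below the root, which matches the maximum of 5 levels set for CRT.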

Also, the decision tree methods are used to analyze the relationships between landslide susceptibility and related factors. The normalized importance of factors in classification using CRT is shown in Figure 2. The top-down induction of the decision tree indicates that variables in the higher order of the tree structure are more important for analyzing landslide susceptibility. The tree structure demonstrates that important variables related to high landslide susceptibility catchments are ordered as follows: V3 (slope angle), V16 (distance from drainage), V15 (surface area), V5 (slope aspect), and V4 (cross curvature).

The prediction accuracy produced by each model is presented in Table 3. The results show high classification accuracy for the Exhaustive CHAID algorithm as compared to the other algorithms. The prediction accuracy for Exhaustive CHAID is 82.0%, with sensitivity 72.3% and specificity 91.7%.

6 Conclusion

This study has analyzed landslide susceptibility in PenangIsland Malaysia using ensemble learning with a decision-tree model We can conclude that the decision tree clearly

6 Journal of Applied Mathematics

Table 2 Tree table using CRT method

Node 0 1 Total Predictedcategory Parent node Primary independent variable

Node | Category 0 N | Category 0 Percent | Category 1 N | Category 1 Percent | Total N | Total Percent | Predicted category | Parent node | Variable | Improvement | Split values
0 | 68786 | 50.0 | 68786 | 50.0 | 137572 | 100.0 | 1 | | | |
1 | 31018 | 95.8 | 1347 | 4.2 | 32365 | 23.5 | 0 | 0 | V15 | 0.129 | ≤0.03725
2 | 37768 | 35.9 | 67439 | 64.1 | 105207 | 76.5 | 1 | 0 | V15 | 0.129 | >0.03725
3 | 29797 | 99.0 | 314 | 1.0 | 30111 | 21.9 | 0 | 1 | V16 | 0.006 | ≤0.04580
4 | 1221 | 54.2 | 1033 | 45.8 | 2254 | 1.6 | 0 | 1 | V16 | 0.006 | >0.04580
5 | 15591 | 28.3 | 39565 | 71.7 | 55156 | 40.1 | 1 | 2 | V19 | 0.010 | ≤0.02120
6 | 22177 | 44.3 | 27874 | 55.7 | 50051 | 36.4 | 1 | 2 | V19 | 0.010 | >0.02120
7 | 379 | 36.0 | 674 | 64.0 | 1053 | 0.8 | 1 | 4 | V17 | 0.001 | ≤0.00958
8 | 842 | 70.1 | 359 | 29.9 | 1201 | 0.9 | 0 | 4 | V17 | 0.001 | >0.00958
9 | 6228 | 37.4 | 10421 | 62.6 | 16649 | 12.1 | 1 | 5 | V3 | 0.003 | ≤0.01292
10 | 9363 | 24.3 | 29144 | 75.7 | 38507 | 28.0 | 1 | 5 | V3 | 0.003 | >0.01292
11 | 3727 | 75.5 | 1207 | 24.5 | 4934 | 3.6 | 0 | 6 | V16 | 0.008 | ≤0.05104
12 | 18450 | 40.9 | 26667 | 59.1 | 45117 | 32.8 | 1 | 6 | V16 | 0.008 | >0.05104
13 | 237 | 28.0 | 608 | 72.0 | 845 | 0.6 | 1 | 7 | V12 | 0.000 | ≤0.15625
14 | 142 | 68.3 | 66 | 31.7 | 208 | 0.2 | 0 | 7 | V12 | 0.000 | >0.15625
15 | 216 | 99.5 | 1 | 0.5 | 217 | 0.2 | 0 | 8 | V15 | 0.000 | ≤0.02376
16 | 626 | 63.6 | 358 | 36.4 | 984 | 0.7 | 0 | 8 | V15 | 0.000 | >0.02376
17 | 1305 | 61.2 | 829 | 38.8 | 2134 | 1.6 | 0 | 9 | V17 | 0.002 | ≤0.02659
18 | 4923 | 33.9 | 9592 | 66.1 | 14515 | 10.6 | 1 | 9 | V17 | 0.002 | >0.02659
19 | 3413 | 30.0 | 7980 | 70.0 | 11393 | 8.3 | 1 | 10 | V18 | 0.001 | ≤0.05873
20 | 5950 | 21.9 | 21164 | 78.1 | 27114 | 19.7 | 1 | 10 | V18 | 0.001 | >0.05873
21 | 48 | 28.4 | 121 | 71.6 | 169 | 0.1 | 1 | 11 | V13 | 0.001 | ≤0.10
22 | 3679 | 77.2 | 1086 | 22.8 | 4765 | 3.5 | 0 | 11 | V13 | 0.001 | >0.10
23 | 10966 | 47.6 | 12083 | 52.4 | 23049 | 16.8 | 1 | 12 | V3 | 0.003 | ≤0.02510
24 | 7484 | 33.9 | 14584 | 66.1 | 22068 | 16.0 | 1 | 12 | V3 | 0.003 | >0.02510
25 | 68 | 98.6 | 1 | 1.4 | 69 | 0.1 | 0 | 14 | V12 | 0.000 | ≤0.21875
26 | 74 | 53.2 | 65 | 46.8 | 139 | 0.1 | 0 | 14 | V12 | 0.000 | >0.21875
27 | 232 | 47.5 | 256 | 52.5 | 488 | 0.4 | 1 | 16 | V21 | 0.000 | ≤0.4375
28 | 394 | 79.4 | 102 | 20.6 | 496 | 0.4 | 0 | 16 | V21 | 0.000 | >0.4375
29 | 758 | 79.2 | 199 | 20.8 | 957 | 0.7 | 0 | 17 | V15 | 0.001 | ≤0.07812
30 | 547 | 46.5 | 630 | 53.5 | 1177 | 0.9 | 1 | 17 | V15 | 0.001 | >0.07812
31 | 3200 | 29.5 | 7633 | 70.5 | 10833 | 7.9 | 1 | 18 | V13 | 0.001 | ≤0.50
32 | 1723 | 46.8 | 1959 | 53.2 | 3682 | 2.7 | 1 | 18 | V13 | 0.001 | >0.50
33 | 1828 | 22.1 | 6450 | 77.9 | 8278 | 6.0 | 1 | 19 | V17 | 0.003 | ≤0.17214
34 | 1585 | 50.9 | 1530 | 49.1 | 3115 | 2.3 | 0 | 19 | V17 | 0.003 | >0.17214
35 | 5873 | 21.7 | 21164 | 78.3 | 27037 | 19.7 | 1 | 20 | V18 | 0.001 | ≤0.46488
36 | 77 | 100.0 | 0 | 0.0 | 77 | 0.1 | 0 | 20 | V18 | 0.001 | >0.46488
37 | 2179 | 71.4 | 873 | 28.6 | 3052 | 2.2 | 0 | 22 | V1 | 0.000 | ≤0.15385
38 | 1500 | 87.6 | 213 | 12.4 | 1713 | 1.2 | 0 | 22 | V1 | 0.000 | >0.15385
39 | 5193 | 59.2 | 3577 | 40.8 | 8770 | 6.4 | 0 | 23 | V18 | 0.003 | ≤0.08809
40 | 5773 | 40.4 | 8506 | 59.6 | 14279 | 10.4 | 1 | 23 | V18 | 0.003 | >0.08809
41 | 7078 | 32.9 | 14405 | 67.1 | 21483 | 15.6 | 1 | 24 | V17 | 0.001 | ≤0.52185
42 | 406 | 69.4 | 179 | 30.6 | 585 | 0.4 | 0 | 24 | V17 | 0.001 | >0.52185


Table 3: Classification accuracy (%) produced by each model.

Decision tree model | Predicted (0) | Predicted (1) | Overall
CHAID | 73.5 | 90.3 | 81.9
Exhaustive CHAID | 72.3 | 91.7 | 82.0
CRT | 61.4 | 89.7 | 75.6
QUEST | 54.4 | 93.5 | 74.0

Figure 2: Normalized importance of factors using the CRT method. (Bar chart of the independent variables, listed as V10, V7, V14, V11, V18, V20, V19, V12, V17, V1, V13, V9, V8, V6, V21, V2, V4, V5, V15, V16, V3; axes: normalized importance from 0 to 100 and importance from 0.00 to 0.15.)

indicates the order of important variables and quantitatively describes the relationships among the occurrence of landslides, topography, and geology. The decision tree model using the Exhaustive CHAID algorithm showed greater accuracy than the other models, demonstrating the usefulness of the decision tree model for landslide hazard mapping. Accuracies were 82.0% for Exhaustive CHAID, 81.9% for CHAID, 75.6% for CRT, and 74.0% for QUEST. In this study, we determined factors that may be involved in landslide susceptibility, and the results can be used for landslide hazard mapping in other regions. Moreover, the landslide hazard map can be used to help mitigate hazards to people and facilities and as basic data for developing plans to prevent landslide hazards, such as in locating monitoring and facility sites. Further case studies and modeling are needed to better generalize the factors involved in landslide susceptibility.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] P. Aleotti and R. Chowdhury, “Landslide hazard assessment: summary review and new perspectives,” Bulletin of Engineering Geology and the Environment, vol. 58, no. 1, pp. 21–44, 1999.

[2] H. Saito, D. Nakayama, and H. Matsuyama, “Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan,” Geomorphology, vol. 109, no. 3-4, pp. 108–121, 2009.

[3] F. Guzzetti, A. Carrara, M. Cardinali, and P. Reichenbach, “Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy,” Geomorphology, vol. 31, no. 1–4, pp. 181–216, 1999.

[4] F. Guzzetti, P. Reichenbach, F. Ardizzone, M. Cardinali, and M. Galli, “Estimating the quality of landslide susceptibility models,” Geomorphology, vol. 81, no. 1-2, pp. 166–184, 2006.

[5] F. Guzzetti, P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone, “Probabilistic landslide hazard assessment at the basin scale,” Geomorphology, vol. 72, no. 1–4, pp. 272–299, 2005.

[6] E. Yesilnacar and T. Topal, “Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey),” Engineering Geology, vol. 79, no. 3-4, pp. 251–266, 2005.

[7] D. P. Kanungo, M. K. Arora, S. Sarkar, and R. P. Gupta, “A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas,” Engineering Geology, vol. 85, no. 3-4, pp. 347–366, 2006.

[8] S. He, P. Pan, L. Dai, H. Wang, and J. Liu, “Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China,” Geomorphology, vol. 171-172, pp. 30–41, 2012.

[9] A. Carrara, “Multivariate models for landslide hazard evaluation,” Journal of the International Association for Mathematical Geology, vol. 15, no. 3, pp. 403–426, 1983.

[10] L. Ayalew and H. Yamagishi, “Slope failures in the Blue Nile basin, as seen from landscape evolution perspective,” Geomorphology, vol. 57, no. 1-2, pp. 95–116, 2004.

[11] G. Metternicht, L. Hurni, and R. Gogu, “Remote sensing of landslides: an analysis of the potential contribution to geo-spatial systems for hazard assessment in mountainous environments,” Remote Sensing of Environment, vol. 98, no. 2-3, pp. 284–303, 2005.

[12] D. E. Alexander, “A brief survey of GIS in mass-movement studies, with reflections on theory and methods,” Geomorphology, vol. 94, no. 3-4, pp. 261–267, 2008.

[13] J. Remondo, J. Bonachea, and A. Cendrero, “Quantitative landslide risk assessment and mapping on the basis of recent occurrences,” Geomorphology, vol. 94, no. 3-4, pp. 496–507, 2008.


[14] L. Luzi, F. Pergalani, and M. T. J. Terlien, “Slope vulnerability to earthquakes at subregional scale, using probabilistic techniques and geographic information systems,” Engineering Geology, vol. 58, no. 3-4, pp. 313–336, 2000.

[15] S. Lee and K. Min, “Statistical analysis of landslide susceptibility at Yongin, Korea,” Environmental Geology, vol. 40, no. 9, pp. 1095–1113, 2001.

[16] L. Donati and M. C. Turrini, “An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy),” Engineering Geology, vol. 63, no. 3-4, pp. 277–289, 2002.

[17] S. Lee and U. Choi, “Development of GIS-based geological hazard information system and its application for landslide analysis in Korea,” Geosciences Journal, vol. 7, no. 3, pp. 243–252, 2003.

[18] B. Neuhauser and B. Terhorst, “Landslide susceptibility assessment using “weights-of-evidence” applied to a study area at the Jurassic escarpment (SW-Germany),” Geomorphology, vol. 86, no. 1-2, pp. 12–24, 2007.

[19] P. M. Atkinson and R. Massari, “Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy,” Computers and Geosciences, vol. 24, no. 4, pp. 373–385, 1998.

[20] F. C. Dai, C. F. Lee, J. Li, and Z. W. Xu, “Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong,” Environmental Geology, vol. 40, no. 3, pp. 381–391, 2001.

[21] H. A. Nefeslioglu, T. Y. Duman, and S. Durmaz, “Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey),” Geomorphology, vol. 94, no. 3-4, pp. 401–418, 2008.

[22] L. Ermini, F. Catani, and N. Casagli, “Artificial Neural Networks applied to landslide susceptibility assessment,” Geomorphology, vol. 66, no. 1–4, pp. 327–343, 2005.

[23] S. Lee and I. Park, “Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines,” Journal of Environmental Management, vol. 127, pp. 166–176, 2013.

[24] H. Gomez and T. Kavzoglu, “Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela,” Engineering Geology, vol. 78, no. 1-2, pp. 11–27, 2005.

[25] C. Melchiorre, M. Matteucci, A. Azzoni, and A. Zanchi, “Artificial neural networks and cluster analysis in landslide susceptibility zonation,” Geomorphology, vol. 94, no. 3-4, pp. 379–400, 2008.

[26] Y.-K. Yeon, J.-G. Han, and K. H. Ryu, “Landslide susceptibility mapping in Injae, Korea, using a decision tree,” Engineering Geology, vol. 116, no. 3-4, pp. 274–283, 2010.

[27] R. Bou Kheir, J. Chorowicz, C. Abdallah, and D. Dhont, “Soil and bedrock distribution estimated from gully form and frequency: A GIS-based decision-tree model for Lebanon,” Geomorphology, vol. 93, no. 3-4, pp. 482–492, 2008.

[28] N. J. Schneevoigt, S. van der Linden, H.-P. Thamm, and L. Schrott, “Detecting Alpine landforms from remotely sensed imagery. A pilot study in the Bavarian Alps,” Geomorphology, vol. 93, no. 1-2, pp. 104–119, 2008.

[29] C.-S. Huang, Y.-J. Lin, and C.-C. Lin, “Implementation of classifiers for choosing insurance policy using decision trees: A case study,” WSEAS Transactions on Computers, vol. 7, no. 10, pp. 1679–1689, 2008.

[30] M. Pal and P. M. Mather, “An assessment of the effectiveness of decision tree methods for land cover classification,” Remote Sensing of Environment, vol. 86, no. 4, pp. 554–565, 2003.

[31] D. T. Bui, B. Pradhan, O. Lofman, and I. Revhaug, “Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models,” Mathematical Problems in Engineering, vol. 2012, Article ID 974638, 26 pages, 2012.

[32] P. K. Pang, L. T. Tien, and H. Lateh, “Landslide hazard mapping of Penang Island using decision tree model,” in Proceedings of the International Conference on Systems and Electronic Engineering (ICSEE ’12), Phuket, Thailand, December 2012.

[33] B. Pradhan, “A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS,” Computers & Geosciences, vol. 51, pp. 350–365, 2013.

[34] M. Ture, F. Tokatli, and I. Kurt, “Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients,” Expert Systems with Applications, vol. 36, no. 2, pp. 2017–2026, 2009.

[35] C. E. Brodley and M. A. Friedl, “Decision tree classification of land cover from remotely sensed data,” Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.

[36] M. Xu, P. Watanachaturaporn, P. K. Varshney, and M. K. Arora, “Decision tree regression for soft classification of remote sensing data,” Remote Sensing of Environment, vol. 97, no. 3, pp. 322–336, 2005.

[37] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, Amsterdam, The Netherlands, 2nd edition, 2005.

[38] R. J. Lewis, “An introduction to Classification and Regression Tree (CART) analysis,” in Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, Calif, USA, 2000.

[39] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, Calif, USA, 1984.

[40] J. A. Michael and S. L. Gordon, Data Mining Technique for Marketing, Sales and Customer Support, Wiley, New York, NY, USA, 1997.

[41] D. B. V. Biggs and E. Suen, “A method of choosing multiway partitions for classification and decision trees,” Journal of Applied Statistics, vol. 18, pp. 49–62, 1991.

[42] L. A. Goodman, “Simple models for the analysis of association in cross-classifications having ordered categories,” Journal of the American Statistical Association, vol. 74, no. 367, pp. 537–552, 1979.

[43] G. Kass, “An exploratory technique for investigating large quantities of categorical data,” Applied Statistics, vol. 29, no. 2, pp. 119–127, 1980.

[44] T. Hill and P. Lewicki, Statistics: Methods and Applications. A Comprehensive Reference for Science, Industry, and Data Mining, StatSoft, USA, 2006.

[45] W.-Y. Loh and Y.-S. Shih, “Split selection methods for classification trees,” Statistica Sinica, vol. 7, no. 4, pp. 815–840, 1997.

[46] H.-J. Oh and B. Pradhan, “Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area,” Computers and Geosciences, vol. 37, no. 9, pp. 1264–1276, 2011.


[47] K. Lim Khai-Wern, T. Lea Tien, and H. Lateh, “Landslide hazard mapping of Penang island using probabilistic methods and logistic regression,” in Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST ’11), pp. 273–278, May 2011.

[48] M. S. Alkhasawneh and U. K. Ngah, “Landslide susceptibility hazard mapping techniques review,” Journal of Applied Sciences, vol. 12, pp. 802–808, 2012.


classification technique and represents a good compromise between comprehensibility, accuracy, and efficiency [26].

Statistical decision tree models have been successfully used to classify and to estimate land use, land cover, and other geographical attributes from remote sensing data [27, 28]. The decision tree, having its origin in machine learning theory, is an efficient tool for classification and estimation. Unlike other statistical methods, a decision tree makes no statistical assumptions, can handle data that are represented on different measurement scales, and is computationally fast [22]. A decision tree also has the advantage that the estimation processes and the order of important explanatory variables are explicitly represented by the tree structure [29]. In addition, recent developments in computer technologies, pattern recognition algorithms, and automatic methods of decision tree design have enabled the use of decision tree models.

Pal and Mather [30] demonstrated the advantages of the decision tree for land cover classification in comparison with other classifiers such as the maximum likelihood method and artificial neural networks. Saito et al. [2] used decision tree models to analyze a distribution of landslides that were almost suspended or dormant. They also indicated that decision tree models are useful for estimating landslide distributions. Bui et al. [31] evaluated the decision tree for landslide prediction in Vietnam. The decision tree showed a decent performance compared with Support Vector Machines (SVM) and Naive Bayes models. Moreover, the decision tree showed a good ability to determine the important factors causing landslides compared with the other models used. Pang et al. [32] produced a landslide hazard map of Penang Island using Quinlan's C4.5 decision tree algorithm. Twelve landslide causative factors were used in their study. Pradhan [33] used three models, decision tree, SVM, and adaptive neurofuzzy inference system (ANFIS), to produce the landslide hazard map for the Penang Hill area. The decision tree showed a better performance compared with the SVM and ANFIS classifiers.

In this paper, four decision tree methods were used to build the optimum decision models: Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were selected as the input variables of the decision trees. A data set of 137,572 samples from Penang Island in Malaysia was used to build the decision trees. The experiment contained ten rounds according to different partitions of training sets and test sets.

2. Decision Trees

A decision tree is a technique for finding and describing structural patterns in data as tree structures. A decision tree does not require the relationship between the input variables and an objective variable to be specified in advance. This technique helps to explain data and to make predictions using the data [34]. A decision tree can also handle data measured on different scales without any assumptions concerning the frequency distributions of the data, based on its nonlinear relationship [35]. Therefore, all variables were put into the decision tree model.

The main purpose of using the decision tree is to achieve a more concise and perspicuous representation of the relationship between an objective variable and explanatory variables. Namely, the decision tree can be visualized more easily; unlike neural networks, it is not a “black box.”

The decision tree is based on a multistage or hierarchical decision scheme (tree structure). The tree is composed of a root node, a set of internal nodes, and a set of terminal nodes (leaves). Each node of the decision tree structure makes a binary decision that separates either one class or some of the classes from the remaining classes. The processing is carried out by moving down the tree until a terminal node is reached. In a decision tree, features that carry maximum information are selected for classification, while the remaining features are rejected, thereby increasing computational efficiency [36]. The top-down induction of the decision tree indicates that variables higher in the tree structure are more important.
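Moving down a binary tree to a terminal node can be sketched as follows. This is a minimal illustration, not the software used in the study: the `Node` class and `classify` function are hypothetical names, and the tree is a truncated fragment that keeps only the first two levels of splits as read from Table 2, treating the deeper nodes as leaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    predicted: int                      # predicted category at this node
    variable: Optional[str] = None      # split variable (None for a leaf)
    threshold: Optional[float] = None   # cases with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def classify(node, sample):
    """Move down the tree until a terminal node is reached."""
    while node.variable is not None:
        if sample[node.variable] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.predicted


# Truncated illustration: thresholds as read from Table 2, deeper nodes
# replaced by leaves carrying the predicted category of that node.
tree = Node(
    predicted=1, variable="V15", threshold=0.03725,
    left=Node(
        predicted=0, variable="V16", threshold=0.04580,
        left=Node(predicted=0),
        right=Node(predicted=0),
    ),
    right=Node(predicted=1),
)

print(classify(tree, {"V15": 0.02, "V16": 0.01}))
print(classify(tree, {"V15": 0.50, "V16": 0.01}))
```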

There are three types of decision tree: CRT, CHAID (including Exhaustive CHAID), and QUEST. The algorithms of the three types follow the same steps: start tree building, assign nodes to classes, stop tree building, select the optimal tree, and perform cross-validation [37]. CRT performs tree pruning before selecting the optimal tree, while the CHAID method performs statistical tests at each splitting step.

2.1. Classification and Regression Tree (CRT). CRT is a recursive partitioning method used for both regression and classification. A CRT tree is constructed by splitting subsets of the data set using all predictor variables to create two child nodes repeatedly, beginning with the entire data set. The best predictor is chosen using a variety of impurity or diversity measures (Gini, twoing, ordered twoing, and least-squared deviation). The goal is to produce subsets of the data which are as homogeneous as possible with respect to the target variable [38]. In this study, we used the Gini impurity measure, which is used for categorical target variables.

Gini Impurity Measure. The Gini index at node $t$, $g(t)$, is defined as

$$g(t) = \sum_{j \neq i} p(j \mid t)\, p(i \mid t), \tag{1}$$

where $i$ and $j$ are categories of the target variable. The equation for the Gini index can also be written as

$$g(t) = 1 - \sum_{j} p^{2}(j \mid t). \tag{2}$$

Thus, when the cases in a node are evenly distributed across the categories, the Gini index takes its maximum value of $1 - (1/k)$, where $k$ is the number of categories for the target variable. When all cases in the node belong to the same category, the Gini index equals 0.
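As a minimal sketch of (2), the Gini index of a node can be computed from its class counts. The function name `gini_index` and the example counts are illustrative assumptions and are not taken from the study.

```python
def gini_index(class_counts):
    """Gini index g(t) = 1 - sum_j p(j|t)^2 for one node.

    class_counts: number of cases of each target category in the node
    (a hypothetical input format chosen for this sketch).
    """
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node contains no cases")
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Evenly split binary node: the maximum value 1 - 1/k with k = 2.
print(gini_index([50, 50]))
# Pure node: all cases belong to one category.
print(gini_index([80, 0]))
```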


If costs of misclassification are specified, the Gini index is computed as

$$g(t) = \sum_{j \neq i} C(i \mid j)\, p(j \mid t)\, p(i \mid t), \tag{3}$$

where $C(i \mid j)$ is the cost of misclassifying a category $j$ case as category $i$. The Gini criterion function for split $s$ at node $t$ is defined as

$$\Phi(s, t) = g(t) - p_{L}\, g(t_{L}) - p_{R}\, g(t_{R}), \tag{4}$$

where $p_{L}$ is the proportion of cases in $t$ sent to the left child node and $p_{R}$ is the proportion sent to the right child node. The split $s$ is chosen to maximize the value of $\Phi(s, t)$. This value is reported as the improvement in the tree [39].
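The improvement in (4) can be checked against the root split of the CRT tree in Table 2, where node 0 (68786 cases of each class) is split into node 1 (31018 and 1347 cases) and node 2 (37768 and 67439 cases). The sketch below assumes equal misclassification costs, and the helper names are illustrative only.

```python
def gini_index(class_counts):
    """Gini index of a node, 1 - sum_j p(j|t)^2."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def gini_improvement(parent, left, right):
    """Criterion g(t) - p_L * g(t_L) - p_R * g(t_R) for one binary split."""
    n_parent = sum(parent)
    p_left = sum(left) / n_parent
    p_right = sum(right) / n_parent
    return (gini_index(parent)
            - p_left * gini_index(left)
            - p_right * gini_index(right))


# Class counts (no landslide, landslide) as read from Table 2.
root = [68786, 68786]
node_1 = [31018, 1347]
node_2 = [37768, 67439]

improvement = gini_improvement(root, node_1, node_2)
print(round(improvement, 3))
```

The result rounds to 0.129, which agrees with the improvement reported for this split in Table 2.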

2.2. Chi-Square Automatic Interaction Detector (CHAID) and Exhaustive CHAID. The CHAID method is based on the $\chi^{2}$-test of association. A CHAID tree is a decision tree that is constructed by repeatedly splitting subsets of the space into two or more child nodes, beginning with the entire data set [40]. To determine the best split at any node, any allowable pair of categories of the predictor variables is merged if there is no statistically significant difference within the pair with respect to the target variable. The CHAID method naturally deals with interactions between the independent variables that are directly available from an examination of the tree. The final nodes identify subgroups defined by different sets of independent variables [41].

The CHAID algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before using the following algorithm. For each predictor variable $X$, merge nonsignificant categories. Each final category of $X$ will result in one child node if $X$ is used to split the node. The merging step also calculates the adjusted $p$ value that is to be used in the splitting step.

(1) If $X$ has 1 category only, stop and set the adjusted $p$ value to be 1.

(2) If $X$ has 2 categories, go to step 8.

(3) Else, find the allowable pair of categories of $X$ (an allowable pair of categories for an ordinal predictor is two adjacent categories, and for a nominal predictor it is any two categories) that is least significantly different. The most similar pair is the pair whose test statistic gives the largest $p$ value with respect to the dependent variable $Y$.

(4) For the pair having the largest $p$ value, check if its $p$ value is larger than a specified alpha-level $\alpha_{\text{merge}}$. If it is, this pair is merged into a single compound category. Then a new set of categories of $X$ is formed. If it is not, then go to step 7.

(5) (Optional) If the newly formed compound category consists of three or more original categories, then find the best binary split within the compound category, that is, the one for which the $p$ value is the smallest. Perform this binary split if its $p$ value is not larger than an alpha-level $\alpha_{\text{split-merge}}$.

(6) Go to step 2.

(7) (Optional) Any category having too few observations (as compared with a user-specified minimum segment size) is merged with the most similar other category, as measured by the largest of the $p$ values.

(8) The adjusted $p$ value is computed for the merged categories by applying Bonferroni adjustments [42, 43].
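The merging loop in steps 1 to 6 can be sketched as follows for a nominal predictor and a binary target. This is a simplified illustration under stated assumptions: the optional steps 5 and 7 and the Bonferroni adjustment of step 8 are omitted, the function names and example counts are hypothetical, and the $p$ value uses the closed form for a chi-squared distribution with one degree of freedom, which applies only to a 2 × 2 table.

```python
import math


def pair_p_value(row_a, row_b):
    """Pearson chi-square p value for a 2 x 2 table (binary target, df = 1)."""
    n = sum(row_a) + sum(row_b)
    col_totals = [row_a[j] + row_b[j] for j in range(2)]
    x2 = 0.0
    for row in (row_a, row_b):
        for j in range(2):
            expected = sum(row) * col_totals[j] / n
            if expected > 0:
                x2 += (row[j] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with one degree of freedom.
    return math.erfc(math.sqrt(x2 / 2.0))


def merge_categories(counts, alpha_merge=0.05):
    """Merge the most similar pair of categories while its p value exceeds alpha_merge.

    counts: dict mapping category -> [n_class_0, n_class_1].
    Returns a dict mapping tuples of merged categories to pooled counts.
    """
    groups = {(category,): list(values) for category, values in counts.items()}
    while len(groups) > 2:                        # step 2: stop at two categories
        keys = list(groups)
        p_value, a, b = max(                      # step 3: most similar pair
            ((pair_p_value(groups[a], groups[b]), a, b)
             for i, a in enumerate(keys) for b in keys[i + 1:]),
            key=lambda item: item[0],
        )
        if p_value <= alpha_merge:                # step 4: pair differs, stop merging
            break
        pooled = [x + y for x, y in zip(groups.pop(a), groups.pop(b))]
        groups[a + b] = pooled                    # new compound category
    return groups


# Hypothetical counts: categories "a" and "b" behave alike, "c" does not.
merged = merge_categories({"a": [50, 50], "b": [48, 52], "c": [10, 90]})
print(merged)
```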

The CHAID algorithm reduces the number of predictor categories by merging categories when there is no significant difference between them with respect to the class. When no more categories can be merged, the predictor can be considered as a candidate for a split at the node. The original CHAID algorithm is not guaranteed to find the best (most significant) split of all of those examined because it uses the last split tested. The Exhaustive CHAID algorithm attempts to overcome this problem by continuing to merge categories, irrespective of significance level, until only two categories remain for each predictor. It then uses the split with the largest significance value rather than the last one tried. The Exhaustive CHAID requires more computer time [44].

Calculations of (unadjusted) $p$ values in the above algorithms depend on the type of dependent variable. The merging step of both CHAID and Exhaustive CHAID sometimes needs the $p$ value for a pair of $X$ categories and sometimes needs the $p$ value for all the categories of $X$. When the $p$ value for a pair of $X$ categories is needed, only part of the data in the current node is relevant. Let $D$ denote the relevant data. Suppose that in $D$ there are $I$ categories of $X$ and $J$ categories of $Y$ (if $Y$ is categorical). The $p$ value calculation using data in $D$ is given below. The null hypothesis of independence of $X$ and the dependent variable $Y$ is tested. To do the test, a contingency (or count) table is formed using classes of $Y$ as columns and categories of the predictor $X$ as rows. The expected cell frequencies under the null hypothesis are estimated. The observed cell frequencies and the expected cell frequencies are used to calculate the Pearson chi-squared statistic or the likelihood ratio statistic. Here the $p$ value is computed based on Pearson's chi-square statistic. Consider

$$X^{2} = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - \hat{m}_{ij})^{2}}{\hat{m}_{ij}}, \tag{5}$$

where $n_{ij} = \sum_{n \in D} f_{n}\, I(x_{n} = i \wedge y_{n} = j)$ is the observed cell frequency and $\hat{m}_{ij}$ is the estimated expected cell frequency for cell $(x_{n} = i, y_{n} = j)$ from the independence model, as follows:

$$\hat{m}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}, \quad n_{i\cdot} = \sum_{j=1}^{J} n_{ij}, \quad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}, \quad n_{\cdot\cdot} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}.$$

The corresponding $p$ value is given by $p = \Pr(\chi^{2}_{d} > X^{2})$ for Pearson's chi-square test, where $\chi^{2}_{d}$ follows a chi-squared distribution with $d = (J - 1)(I - 1)$ degrees of freedom.
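The statistic in (5) and its degrees of freedom can be computed directly from a contingency table. The sketch below is illustrative: the function name and the example table are hypothetical, all frequency weights are taken to be 1, and every row and column total is assumed to be positive so that no expected frequency is zero.

```python
def pearson_chi_square(table):
    """Return (X2, d) for an I x J contingency table.

    table[i][j] is the observed frequency n_ij, with predictor categories
    as rows and classes of the dependent variable as columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under the independence model.
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    d = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, d


# Hypothetical 2 x 2 table: two predictor categories, two classes.
x2, d = pearson_chi_square([[10, 20], [30, 40]])
print(x2, d)
```

The $p$ value then follows from the upper tail of the chi-squared distribution with $d$ degrees of freedom, for example with `scipy.stats.chi2.sf(x2, d)` if SciPy is available.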

In step 8, the adjusted $p$ value is calculated as the $p$ value times a Bonferroni multiplier. The Bonferroni multiplier adjusts for multiple tests. Suppose that a predictor variable originally has $I$ categories and it is reduced to $r$ categories after the merging step. The Bonferroni multiplier $B$ is the number of possible ways that $I$ categories can be merged into $r$ categories. For $r = I$, $B = 1$. For $2 \le r < I$, use the following equation:

$$B = \sum_{v=0}^{r-1} (-1)^{v} \frac{(r - v)^{I}}{v!\,(r - v)!}. \tag{6}$$
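Equation (6) counts the ways of partitioning $I$ categories into $r$ nonempty groups (a Stirling number of the second kind). A small sketch, with a hypothetical function name, evaluates it exactly in integer arithmetic:

```python
from math import comb, factorial


def bonferroni_multiplier(n_original, n_merged):
    """Number of ways to merge I categories into r groups, as in (6).

    Uses the identity 1 / (v! (r - v)!) = C(r, v) / r! so that the sum
    can be evaluated without floating-point rounding.
    """
    I, r = n_original, n_merged
    if not 1 <= r <= I:
        raise ValueError("need 1 <= r <= I")
    total = sum((-1) ** v * comb(r, v) * (r - v) ** I for v in range(r))
    return total // factorial(r)


# Merging 4 original categories into 2 or 3 groups.
print(bonferroni_multiplier(4, 2), bonferroni_multiplier(4, 3))
```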

2.3. Quick-Unbiased-Efficient Statistical Tree (QUEST). QUEST is a binary split decision tree algorithm for classification and data mining. QUEST can be used with univariate or linear combination splits. A unique feature is that its attribute selection method has negligible bias. If all the attributes are uninformative with respect to the class attribute, then each has approximately the same chance of being selected to split a node [45].

The QUEST tree growing process consists of the selection of a split predictor, selection of a split point for the selected predictor, and stopping. In this algorithm, only univariate splits are considered. For selection of the split predictor, it uses the following algorithm.

(1) For each continuous predictor $X$, perform an ANOVA $F$-test that tests if all the different classes of the dependent variable $Y$ have the same mean of $X$, and calculate the $p$ value according to the $F$ statistic. For each categorical predictor, perform Pearson's $\chi^{2}$-test of $Y$ and $X$'s independence, and calculate the $p$ value according to the $X^{2}$ statistic.

(2) Find the predictor with the smallest $p$ value and denote it $X^{*}$.

(3) If this smallest $p$ value is less than $\alpha/M$, where $\alpha \in (0, 1)$ is a user-specified level of significance and $M$ is the total number of predictor variables, predictor $X^{*}$ is selected as the split predictor for the node. If not, go to step 4.

(4) For each continuous predictor $X$, compute Levene's $F$ statistic based on the absolute deviation of $X$ from its class mean to test if the variances of $X$ for different classes of $Y$ are the same, and calculate the $p$ value for the test.

(5) Find the predictor with the smallest $p$ value and denote it as $X^{**}$.

(6) If this smallest $p$ value is less than $\alpha/(M + M_{1})$, where $M_{1}$ is the number of continuous predictors, $X^{**}$ is selected as the split predictor for the node. Otherwise, this node is not split [45].
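The decision logic of steps 2 to 6 can be sketched once the $p$ values of steps 1 and 4 are available. The sketch below assumes those $p$ values have been computed elsewhere and passed in as dictionaries; the function name, argument names, and example values are hypothetical.

```python
def select_split_predictor(location_p, levene_p, alpha=0.05):
    """Choose the QUEST split predictor from precomputed p values.

    location_p: dict mapping every predictor to its step 1 p value
        (ANOVA F-test if continuous, Pearson chi-square if categorical).
    levene_p: dict mapping every continuous predictor to its Levene p value.
    Returns the selected predictor name, or None if the node is not split.
    """
    m = len(location_p)    # M: total number of predictors
    m1 = len(levene_p)     # M1: number of continuous predictors

    best = min(location_p, key=location_p.get)        # step 2
    if location_p[best] < alpha / m:                   # step 3
        return best

    if levene_p:
        best_var = min(levene_p, key=levene_p.get)     # step 5
        if levene_p[best_var] < alpha / (m + m1):      # step 6
            return best_var
    return None


# Hypothetical p values: no predictor passes step 3, V12 passes step 6.
chosen = select_split_predictor(
    {"V3": 0.04, "V12": 0.30, "V7": 0.20},
    {"V3": 0.50, "V12": 0.001},
)
print(chosen)
```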

3. Study Area

As shown in Figure 1, this study is focused on Penang Island, which lies between 5°15′ and 5°30′ N latitude and 100°10′ and 100°20′ E longitude. The North Channel separates the study area from the mainland. It occupies an area of 285 km² and is one of the 13 states of Malaysia. The island is bounded to the north and east by the state of Kedah, to the south by the state of Perak, and to the west by the Strait of Malacca and Sumatra (Indonesia).

Figure 1: Study area map and landslide location map with hill-shaded map. (Legend: river, landslide, and altitude classes in m from 0 to 820; map grid from 582000 to 606000 and from 243000 to 264000; scale bar in km.)

The state of Penang consists of both the island of Penang and a coastal strip on the mainland known as Province Wellesley. This paper focuses only on the island, where frequent landslides occur, threatening lives and damaging property [46, 47]. Heavy rain plays a major role in triggering the landslides in the study area. Data from the Malaysian Meteorological Department show that the annual rainfall varies approximately between 2254 mm and 2903 mm in the study area. Penang Island has a tropical climate, with high temperatures of 29°C to 32°C and humidity ranging from 65% to 96%. Topographic elevations vary between 0 m and 820 m above sea level. The slope angle ranges from 0° to 87°, while 43.28% of the island is flat. Geological data from the Minerals and Geosciences Department Malaysia show that Ferringhi granite, Batu Maung granite, clay, and sand granite represent more than 72% of the study area's geology. Vegetation cover consists mainly of forests and fruit plantations.

4 Data Collection

Table 1: Number of nodes, number of terminal nodes, and independent variables included in order of importance.

| Decision tree model | No. of nodes | No. of terminal nodes | Independent variables included (in order of importance) |
|---|---|---|---|
| CHAID | 317 | 254 | $V_3$, $V_{16}$, $V_{15}$, $V_5$, $V_4$, $V_{21}$, $V_{17}$, $V_{13}$, $V_1$, $V_{12}$, $V_{19}$, $V_{18}$, $V_{10}$ |
| Exhaustive CHAID | 377 | 302 | $V_3$, $V_{16}$, $V_{15}$, $V_5$, $V_4$, $V_{21}$, $V_9$, $V_{13}$, $V_1$, $V_{17}$, $V_{12}$, $V_{19}$, $V_{18}$, $V_{10}$ |
| CRT | 43 | 22 | $V_3$, $V_{16}$, $V_{15}$, $V_5$, $V_4$, $V_2$, $V_{21}$, $V_6$, $V_8$, $V_9$, $V_{13}$, $V_1$, $V_{17}$, $V_{12}$, $V_{19}$, $V_{20}$, $V_{18}$, $V_{11}$, $V_{14}$, $V_7$, $V_{10}$ |
| QUEST | 55 | 28 | $V_3$, $V_{16}$, $V_{15}$, $V_5$, $V_4$, $V_2$, $V_{21}$, $V_6$, $V_8$, $V_{13}$, $V_1$, $V_{17}$, $V_{12}$, $V_{19}$, $V_{18}$, $V_{11}$, $V_7$, $V_{10}$ |

An effective intelligent system requires a comprehensive data set. Therefore, 137572 samples were selected for this analysis, of which 68786 samples represent landslides and 68786 samples represent no landslides. A digital elevation model (DEM) was then used to extract 21 topographic factors. The DEM of Penang Island, with a five-meter resolution, was obtained from the Department of Survey and Mapping, Malaysia. The extracted factors are abbreviated as $V_1$ (vegetation cover), $V_2$ (distance from the fault line), $V_3$ (slope angle), $V_4$ (cross curvature), $V_5$ (slope aspect), $V_6$ (distance from road), $V_7$ (geology), $V_8$ (diagonal length), $V_9$ (longitude curvature), $V_{10}$ (rugosity), $V_{11}$ (plan curvature), $V_{12}$ (elevation), $V_{13}$ (rain perception), $V_{14}$ (soil texture), $V_{15}$ (surface area), $V_{16}$ (distance from drainage), $V_{17}$ (roughness), $V_{18}$ (land cover), $V_{19}$ (general curvature), $V_{20}$ (tangent curvature), and $V_{21}$ (profile curvature). Previous studies on Penang Island investigated only 14 factors ($V_1$ to $V_{14}$) for landslides [48], whereas factors $V_{15}$ to $V_{21}$ are applied and investigated here for the first time in the study area. Furthermore, the 21 factors represent the available data on all factors that can cause landslides in the study area. The intelligent system target (landslide history) is represented by 0 for no landslide and 1 for landslide. The data were normalized to the range between 0 and 1 for each factor individually. A 10-fold cross-validation analysis was performed as an initial evaluation of the test error of the algorithms. Briefly, this process involves splitting the data set into 10 random segments, using 9 of them for training and the 10th as a test set for the algorithm, with each segment serving as the test set in turn. Classification accuracy of each model was calculated as follows.
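The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `min_max_normalize` and `k_fold_indices`, the random seed, and the toy values are all hypothetical.

```python
import random

def min_max_normalize(values):
    """Rescale one factor to the range [0, 1]; each factor is scaled on its own."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant factor: map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal random segments."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

# Toy slope-angle values in degrees (hypothetical, for illustration only).
slope = [0.0, 12.5, 43.5, 87.0]
slope_scaled = min_max_normalize(slope)

folds = k_fold_indices(n_samples=20, k=10)
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for fold in folds for i in fold if i not in held_out]
    # Train on `train` (9 segments) and evaluate on `test_fold` (the 10th segment).
```

Scaling each factor separately keeps factors with large raw ranges, such as elevation in meters, from dominating factors with small ranges, such as curvature.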

The accuracy of correctly classified landslide (1) is given by

$$\text{accuracy}(1) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (1)}}{\text{number of (1)}}. \tag{7}$$

The accuracy of correctly classified no landslide (0) is given by

$$\text{accuracy}(0) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (0)}}{\text{number of (0)}}. \tag{8}$$

The overall accuracy for the decision tree model is given by

$$\text{overall accuracy} = \frac{\text{accuracy}(1) + \text{accuracy}(0)}{2}. \tag{9}$$
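Equations (7) to (9) can be illustrated with a short sketch. It assumes one particular reading of the sums over the 10 folds: correctly classified cases are accumulated over all test folds and divided by the total number of cases of that class. The function names and the toy fold data are hypothetical; only the Table 3 figures come from the paper.

```python
def class_accuracy(fold_results, label):
    """Pool all test folds: correctly classified cases of `label` over all cases of `label`.

    fold_results is a list of folds; each fold is a list of (true, predicted) pairs.
    """
    correct = sum(1 for fold in fold_results for y, y_hat in fold
                  if y == label and y_hat == label)
    total = sum(1 for fold in fold_results for y, _ in fold if y == label)
    return correct / total

def overall_accuracy(acc_1, acc_0):
    """Equation (9): the mean of the two class accuracies."""
    return (acc_1 + acc_0) / 2

# Two toy folds of (true, predicted) labels, for illustration only.
toy_folds = [[(1, 1), (1, 0), (0, 0)], [(1, 1), (0, 1), (0, 0)]]
acc_1 = class_accuracy(toy_folds, 1)
acc_0 = class_accuracy(toy_folds, 0)

# Consistency check against Table 3 (Exhaustive CHAID, values in percent).
exhaustive_chaid_overall = overall_accuracy(91.7, 72.3)
```

Because the two classes are balanced (68786 cases each), the mean of the class accuracies in equation (9) coincides with the fraction of all cases classified correctly.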

5. Discussion

Four tree algorithms, CHAID, Exhaustive CHAID, CRT, and QUEST, were applied to map landslide susceptibility hazard. The four trees were constructed from the entire sample of 137572 cases, using cross-validation with 10 folds, a 0.05 adjustment of the probabilities, a minimum of 100 cases in a parent node, a minimum of 50 cases in a child node, and equal misclassification costs. The maximum number of levels is 3 for CHAID and Exhaustive CHAID and 5 for CRT and QUEST.

The number of nodes, the number of terminal nodes, and the importance of the independent variables produced by each model are presented in Table 1. The classification trees obtained have 317 nodes with 254 terminal nodes using CHAID, 377 nodes with 302 terminal nodes using Exhaustive CHAID, 43 nodes with 22 terminal nodes using CRT, and 55 nodes with 28 terminal nodes using QUEST. An example of a decision tree built with the CRT method is given in Table 2. The tree has 43 nodes: the root node, 20 internal nodes, and 22 leaves (terminal nodes). Percentages in each category and in each joint category are presented in Table 2.

The decision tree methods are also used to analyze the relationships between landslide susceptibility and related factors. The normalized importance of the factors in classification using CRT is shown in Figure 2. The top-down induction of the decision tree indicates that variables higher in the tree structure are more important for analyzing landslide susceptibility. The tree structure demonstrates that the important variables related to high landslide susceptibility catchments are ordered as follows: $V_3$ (slope angle), $V_{16}$ (distance from drainage), $V_{15}$ (surface area), $V_5$ (slope aspect), and $V_4$ (cross curvature).

The prediction accuracy produced by each model is presented in Table 3. The results show higher classification accuracy for the Exhaustive CHAID algorithm than for the other algorithms. The prediction accuracy for Exhaustive CHAID is 82.0%, with sensitivity 72.3% and specificity 91.7%.

Table 2: Tree table using CRT method. Variable, improvement, and split values refer to the primary independent variable; percentages are in %.

| Node | N (0) | Percent (0) | N (1) | Percent (1) | N (total) | Percent (total) | Predicted category | Parent node | Variable | Improvement | Split values |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68786 | 50.0 | 68786 | 50.0 | 137572 | 100.0 | 1 | | | | |
| 1 | 31018 | 95.8 | 1347 | 4.2 | 32365 | 23.5 | 0 | 0 | $V_{15}$ | 0.129 | ≤0.03725 |
| 2 | 37768 | 35.9 | 67439 | 64.1 | 105207 | 76.5 | 1 | 0 | $V_{15}$ | 0.129 | >0.03725 |
| 3 | 29797 | 99.0 | 314 | 1.0 | 30111 | 21.9 | 0 | 1 | $V_{16}$ | 0.006 | ≤0.04580 |
| 4 | 1221 | 54.2 | 1033 | 45.8 | 2254 | 1.6 | 0 | 1 | $V_{16}$ | 0.006 | >0.04580 |
| 5 | 15591 | 28.3 | 39565 | 71.7 | 55156 | 40.1 | 1 | 2 | $V_{19}$ | 0.010 | ≤0.02120 |
| 6 | 22177 | 44.3 | 27874 | 55.7 | 50051 | 36.4 | 1 | 2 | $V_{19}$ | 0.010 | >0.02120 |
| 7 | 379 | 36.0 | 674 | 64.0 | 1053 | 0.8 | 1 | 4 | $V_{17}$ | 0.001 | ≤0.00958 |
| 8 | 842 | 70.1 | 359 | 29.9 | 1201 | 0.9 | 0 | 4 | $V_{17}$ | 0.001 | >0.00958 |
| 9 | 6228 | 37.4 | 10421 | 62.6 | 16649 | 12.1 | 1 | 5 | $V_3$ | 0.003 | ≤0.01292 |
| 10 | 9363 | 24.3 | 29144 | 75.7 | 38507 | 28.0 | 1 | 5 | $V_3$ | 0.003 | >0.01292 |
| 11 | 3727 | 75.5 | 1207 | 24.5 | 4934 | 3.6 | 0 | 6 | $V_{16}$ | 0.008 | ≤0.05104 |
| 12 | 18450 | 40.9 | 26667 | 59.1 | 45117 | 32.8 | 1 | 6 | $V_{16}$ | 0.008 | >0.05104 |
| 13 | 237 | 28.0 | 608 | 72.0 | 845 | 0.6 | 1 | 7 | $V_{12}$ | 0.000 | ≤0.15625 |
| 14 | 142 | 68.3 | 66 | 31.7 | 208 | 0.2 | 0 | 7 | $V_{12}$ | 0.000 | >0.15625 |
| 15 | 216 | 99.5 | 1 | 0.5 | 217 | 0.2 | 0 | 8 | $V_{15}$ | 0.000 | ≤0.02376 |
| 16 | 626 | 63.6 | 358 | 36.4 | 984 | 0.7 | 0 | 8 | $V_{15}$ | 0.000 | >0.02376 |
| 17 | 1305 | 61.2 | 829 | 38.8 | 2134 | 1.6 | 0 | 9 | $V_{17}$ | 0.002 | ≤0.02659 |
| 18 | 4923 | 33.9 | 9592 | 66.1 | 14515 | 10.6 | 1 | 9 | $V_{17}$ | 0.002 | >0.02659 |
| 19 | 3413 | 30.0 | 7980 | 70.0 | 11393 | 8.3 | 1 | 10 | $V_{18}$ | 0.001 | ≤0.05873 |
| 20 | 5950 | 21.9 | 21164 | 78.1 | 27114 | 19.7 | 1 | 10 | $V_{18}$ | 0.001 | >0.05873 |
| 21 | 48 | 28.4 | 121 | 71.6 | 169 | 0.1 | 1 | 11 | $V_{13}$ | 0.001 | ≤0.10 |
| 22 | 3679 | 77.2 | 1086 | 22.8 | 4765 | 3.5 | 0 | 11 | $V_{13}$ | 0.001 | >0.10 |
| 23 | 10966 | 47.6 | 12083 | 52.4 | 23049 | 16.8 | 1 | 12 | $V_3$ | 0.003 | ≤0.02510 |
| 24 | 7484 | 33.9 | 14584 | 66.1 | 22068 | 16.0 | 1 | 12 | $V_3$ | 0.003 | >0.02510 |
| 25 | 68 | 98.6 | 1 | 1.4 | 69 | 0.1 | 0 | 14 | $V_{12}$ | 0.000 | ≤0.21875 |
| 26 | 74 | 53.2 | 65 | 46.8 | 139 | 0.1 | 0 | 14 | $V_{12}$ | 0.000 | >0.21875 |
| 27 | 232 | 47.5 | 256 | 52.5 | 488 | 0.4 | 1 | 16 | $V_{21}$ | 0.000 | ≤0.4375 |
| 28 | 394 | 79.4 | 102 | 20.6 | 496 | 0.4 | 0 | 16 | $V_{21}$ | 0.000 | >0.4375 |
| 29 | 758 | 79.2 | 199 | 20.8 | 957 | 0.7 | 0 | 17 | $V_{15}$ | 0.001 | ≤0.07812 |
| 30 | 547 | 46.5 | 630 | 53.5 | 1177 | 0.9 | 1 | 17 | $V_{15}$ | 0.001 | >0.07812 |
| 31 | 3200 | 29.5 | 7633 | 70.5 | 10833 | 7.9 | 1 | 18 | $V_{13}$ | 0.001 | ≤0.50 |
| 32 | 1723 | 46.8 | 1959 | 53.2 | 3682 | 2.7 | 1 | 18 | $V_{13}$ | 0.001 | >0.50 |
| 33 | 1828 | 22.1 | 6450 | 77.9 | 8278 | 6.0 | 1 | 19 | $V_{17}$ | 0.003 | ≤0.17214 |
| 34 | 1585 | 50.9 | 1530 | 49.1 | 3115 | 2.3 | 0 | 19 | $V_{17}$ | 0.003 | >0.17214 |
| 35 | 5873 | 21.7 | 21164 | 78.3 | 27037 | 19.7 | 1 | 20 | $V_{18}$ | 0.001 | ≤0.46488 |
| 36 | 77 | 100.0 | 0 | 0.0 | 77 | 0.1 | 0 | 20 | $V_{18}$ | 0.001 | >0.46488 |
| 37 | 2179 | 71.4 | 873 | 28.6 | 3052 | 2.2 | 0 | 22 | $V_1$ | 0.000 | ≤0.15385 |
| 38 | 1500 | 87.6 | 213 | 12.4 | 1713 | 1.2 | 0 | 22 | $V_1$ | 0.000 | >0.15385 |
| 39 | 5193 | 59.2 | 3577 | 40.8 | 8770 | 6.4 | 0 | 23 | $V_{18}$ | 0.003 | ≤0.08809 |
| 40 | 5773 | 40.4 | 8506 | 59.6 | 14279 | 10.4 | 1 | 23 | $V_{18}$ | 0.003 | >0.08809 |
| 41 | 7078 | 32.9 | 14405 | 67.1 | 21483 | 15.6 | 1 | 24 | $V_{17}$ | 0.001 | ≤0.52185 |
| 42 | 406 | 69.4 | 179 | 30.6 | 585 | 0.4 | 0 | 24 | $V_{17}$ | 0.001 | >0.52185 |

Table 3: Classification accuracy (%) produced by each model.

| Decision tree model | Predicted (0) | Predicted (1) | Overall |
|---|---|---|---|
| CHAID | 73.5 | 90.3 | 81.9 |
| Exhaustive CHAID | 72.3 | 91.7 | 82.0 |
| CRT | 61.4 | 89.7 | 75.6 |
| QUEST | 54.4 | 93.5 | 74.0 |

Figure 2: Normalized importance of factors using CRT method. (Bar chart of the independent variables against importance, 0.00 to 0.15, and normalized importance, 0 to 100; from most to least important: $V_3$, $V_{16}$, $V_{15}$, $V_5$, $V_4$, $V_2$, $V_{21}$, $V_6$, $V_8$, $V_9$, $V_{13}$, $V_1$, $V_{17}$, $V_{12}$, $V_{19}$, $V_{20}$, $V_{18}$, $V_{11}$, $V_{14}$, $V_7$, $V_{10}$.)

6. Conclusion

This study has analyzed landslide susceptibility in Penang Island, Malaysia, using ensemble learning with a decision-tree model. We can conclude that the decision tree clearly indicates the order of important variables and quantitatively describes the relationships among the occurrence of landslides, topography, and geology. The decision-tree model using the Exhaustive CHAID algorithm showed greater accuracy than the other models, demonstrating the usefulness of the decision tree model for landslide hazard mapping. Accuracies were 82.0% for Exhaustive CHAID, 81.9% for CHAID, 75.6% for CRT, and 74.0% for the QUEST algorithm. In this study, we determined factors that may be involved in landslide susceptibility, and the results can be used for landslide hazard mapping in other regions. Moreover, landslide hazard maps can be used to help mitigate hazards to people and facilities and as basic data for developing plans to prevent landslide hazards, such as in locating monitoring and facility sites. Further case studies and modeling are needed to better generalize the factors involved in landslide susceptibility.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] P. Aleotti and R. Chowdhury, "Landslide hazard assessment: summary review and new perspectives," Bulletin of Engineering Geology and the Environment, vol. 58, no. 1, pp. 21–44, 1999.

[2] H. Saito, D. Nakayama, and H. Matsuyama, "Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan," Geomorphology, vol. 109, no. 3-4, pp. 108–121, 2009.

[3] F. Guzzetti, A. Carrara, M. Cardinali, and P. Reichenbach, "Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy," Geomorphology, vol. 31, no. 1–4, pp. 181–216, 1999.

[4] F. Guzzetti, P. Reichenbach, F. Ardizzone, M. Cardinali, and M. Galli, "Estimating the quality of landslide susceptibility models," Geomorphology, vol. 81, no. 1-2, pp. 166–184, 2006.

[5] F. Guzzetti, P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone, "Probabilistic landslide hazard assessment at the basin scale," Geomorphology, vol. 72, no. 1–4, pp. 272–299, 2005.

[6] E. Yesilnacar and T. Topal, "Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey)," Engineering Geology, vol. 79, no. 3-4, pp. 251–266, 2005.

[7] D. P. Kanungo, M. K. Arora, S. Sarkar, and R. P. Gupta, "A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas," Engineering Geology, vol. 85, no. 3-4, pp. 347–366, 2006.

[8] S. He, P. Pan, L. Dai, H. Wang, and J. Liu, "Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China," Geomorphology, vol. 171-172, pp. 30–41, 2012.

[9] A. Carrara, "Multivariate models for landslide hazard evaluation," Journal of the International Association for Mathematical Geology, vol. 15, no. 3, pp. 403–426, 1983.

[10] L. Ayalew and H. Yamagishi, "Slope failures in the Blue Nile basin, as seen from landscape evolution perspective," Geomorphology, vol. 57, no. 1-2, pp. 95–116, 2004.

[11] G. Metternicht, L. Hurni, and R. Gogu, "Remote sensing of landslides: an analysis of the potential contribution to geo-spatial systems for hazard assessment in mountainous environments," Remote Sensing of Environment, vol. 98, no. 2-3, pp. 284–303, 2005.

[12] D. E. Alexander, "A brief survey of GIS in mass-movement studies, with reflections on theory and methods," Geomorphology, vol. 94, no. 3-4, pp. 261–267, 2008.

[13] J. Remondo, J. Bonachea, and A. Cendrero, "Quantitative landslide risk assessment and mapping on the basis of recent occurrences," Geomorphology, vol. 94, no. 3-4, pp. 496–507, 2008.


[14] L. Luzi, F. Pergalani, and M. T. J. Terlien, "Slope vulnerability to earthquakes at subregional scale, using probabilistic techniques and geographic information systems," Engineering Geology, vol. 58, no. 3-4, pp. 313–336, 2000.

[15] S. Lee and K. Min, "Statistical analysis of landslide susceptibility at Yongin, Korea," Environmental Geology, vol. 40, no. 9, pp. 1095–1113, 2001.

[16] L. Donati and M. C. Turrini, "An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy)," Engineering Geology, vol. 63, no. 3-4, pp. 277–289, 2002.

[17] S. Lee and U. Choi, "Development of GIS-based geological hazard information system and its application for landslide analysis in Korea," Geosciences Journal, vol. 7, no. 3, pp. 243–252, 2003.

[18] B. Neuhauser and B. Terhorst, "Landslide susceptibility assessment using 'weights-of-evidence' applied to a study area at the Jurassic escarpment (SW-Germany)," Geomorphology, vol. 86, no. 1-2, pp. 12–24, 2007.

[19] P. M. Atkinson and R. Massari, "Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy," Computers and Geosciences, vol. 24, no. 4, pp. 373–385, 1998.

[20] F. C. Dai, C. F. Lee, J. Li, and Z. W. Xu, "Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong," Environmental Geology, vol. 40, no. 3, pp. 381–391, 2001.

[21] H. A. Nefeslioglu, T. Y. Duman, and S. Durmaz, "Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey)," Geomorphology, vol. 94, no. 3-4, pp. 401–418, 2008.

[22] L. Ermini, F. Catani, and N. Casagli, "Artificial Neural Networks applied to landslide susceptibility assessment," Geomorphology, vol. 66, no. 1–4, pp. 327–343, 2005.

[23] S. Lee and I. Park, "Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines," Journal of Environmental Management, vol. 127, pp. 166–176, 2013.

[24] H. Gomez and T. Kavzoglu, "Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela," Engineering Geology, vol. 78, no. 1-2, pp. 11–27, 2005.

[25] C. Melchiorre, M. Matteucci, A. Azzoni, and A. Zanchi, "Artificial neural networks and cluster analysis in landslide susceptibility zonation," Geomorphology, vol. 94, no. 3-4, pp. 379–400, 2008.

[26] Y.-K. Yeon, J.-G. Han, and K. H. Ryu, "Landslide susceptibility mapping in Injae, Korea, using a decision tree," Engineering Geology, vol. 116, no. 3-4, pp. 274–283, 2010.

[27] R. Bou Kheir, J. Chorowicz, C. Abdallah, and D. Dhont, "Soil and bedrock distribution estimated from gully form and frequency: A GIS-based decision-tree model for Lebanon," Geomorphology, vol. 93, no. 3-4, pp. 482–492, 2008.

[28] N. J. Schneevoigt, S. van der Linden, H.-P. Thamm, and L. Schrott, "Detecting Alpine landforms from remotely sensed imagery: A pilot study in the Bavarian Alps," Geomorphology, vol. 93, no. 1-2, pp. 104–119, 2008.

[29] C.-S. Huang, Y.-J. Lin, and C.-C. Lin, "Implementation of classifiers for choosing insurance policy using decision trees: A case study," WSEAS Transactions on Computers, vol. 7, no. 10, pp. 1679–1689, 2008.

[30] M. Pal and P. M. Mather, "An assessment of the effectiveness of decision tree methods for land cover classification," Remote Sensing of Environment, vol. 86, no. 4, pp. 554–565, 2003.

[31] D. T. Bui, B. Pradhan, O. Lofman, and I. Revhaug, "Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models," Mathematical Problems in Engineering, vol. 2012, Article ID 974638, 26 pages, 2012.

[32] P. K. Pang, L. T. Tien, and H. Lateh, "Landslide hazard mapping of Penang island using decision tree model," in Proceedings of the International Conference on Systems and Electronic Engineering (ICSEE '12), Phuket, Thailand, December 2012.

[33] B. Pradhan, "A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS," Computers & Geosciences, vol. 51, pp. 350–365, 2013.

[34] M. Ture, F. Tokatli, and I. Kurt, "Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients," Expert Systems with Applications, vol. 36, no. 2, pp. 2017–2026, 2009.

[35] C. E. Brodley and M. A. Friedl, "Decision tree classification of land cover from remotely sensed data," Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.

[36] M. Xu, P. Watanachaturaporn, P. K. Varshney, and M. K. Arora, "Decision tree regression for soft classification of remote sensing data," Remote Sensing of Environment, vol. 97, no. 3, pp. 322–336, 2005.

[37] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, Amsterdam, The Netherlands, 2nd edition, 2005.

[38] R. J. Lewis, "An introduction to Classification and Regression Tree (CART) analysis," in Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, Calif, USA, 2000.

[39] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, Calif, USA, 1984.

[40] J. A. Michael and S. L. Gordon, Data Mining Technique: For Marketing, Sales and Customer Support, Wiley, New York, NY, USA, 1997.

[41] D. B. V. Biggs and E. Suen, "A method of choosing multiway partitions for classification and decision trees," Journal of Applied Statistics, vol. 18, pp. 49–62, 1991.

[42] L. A. Goodman, "Simple models for the analysis of association in cross-classifications having ordered categories," Journal of the American Statistical Association, vol. 74, no. 367, pp. 537–552, 1979.

[43] G. Kass, "An exploratory technique for investigating large quantities of categorical data," Applied Statistics, vol. 29, no. 2, pp. 119–127, 1980.

[44] T. Hill and P. Lewicki, Statistics: Methods and Applications: A Comprehensive Reference for Science, Industry, and Data Mining, StatSoft, USA, 2006.

[45] W.-Y. Loh and Y.-S. Shih, "Split selection methods for classification trees," Statistica Sinica, vol. 7, no. 4, pp. 815–840, 1997.

[46] H.-J. Oh and B. Pradhan, "Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area," Computers and Geosciences, vol. 37, no. 9, pp. 1264–1276, 2011.


[47] K. Lim Khai-Wern, T. Lea Tien, and H. Lateh, "Landslide hazard mapping of Penang island using probabilistic methods and logistic regression," in Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST '11), pp. 273–278, May 2011.

[48] M. S. Alkhasawneh and U. K. Ngah, "Landslide susceptibility hazard mapping techniques review," Journal of Applied Sciences, vol. 12, pp. 802–808, 2012.


If costs of misclassification are specified, the Gini index is computed as

$$g(t) = \sum_{j \neq i} C(i \mid j)\, p(j \mid t)\, p(i \mid t), \tag{3}$$

where $C(i \mid j)$ is the cost of misclassifying a category $j$ case as category $i$.

The Gini criterion function for split $s$ at node $t$ is defined as

$$\Phi(s, t) = g(t) - p_L\, g(t_L) - p_R\, g(t_R), \tag{4}$$

where $p_L$ is the proportion of cases in $t$ sent to the left child node and $p_R$ is the proportion sent to the right child node. The split $s$ is chosen to maximize the value of $\Phi(s, t)$. This value is reported as the improvement in the tree [39].
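As a worked check of equation (4), the sketch below recomputes the improvement of the root split of the CRT tree from the class counts of nodes 0, 1, and 2 in Table 2. It assumes equal misclassification costs, as stated in the Discussion, so that the Gini index reduces to one minus the sum of squared class proportions. The function names are hypothetical.

```python
def gini(counts):
    """Gini index with equal costs: sum over i != j of p(i|t) p(j|t) = 1 - sum_i p(i|t)^2."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_improvement(parent, left, right):
    """Equation (4): g(t) - p_L g(t_L) - p_R g(t_R)."""
    n = sum(parent)
    p_left, p_right = sum(left) / n, sum(right) / n
    return gini(parent) - p_left * gini(left) - p_right * gini(right)

# Class counts (no landslide, landslide) for nodes 0, 1, and 2 in Table 2.
root, node1, node2 = (68786, 68786), (31018, 1347), (37768, 67439)
improvement = gini_improvement(root, node1, node2)
print(round(improvement, 3))
```

The result should agree, after rounding to three decimals, with the improvement of 0.129 reported for the split on surface area in Table 2.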

2.2. Chi-Square Automatic Interaction Detector (CHAID) and Exhaustive CHAID. The CHAID method is based on the $\chi^2$-test of association. A CHAID tree is a decision tree that is constructed by repeatedly splitting subsets of the space into two or more child nodes, beginning with the entire data set [40]. To determine the best split at any node, any allowable pair of categories of the predictor variables is merged until there is no statistically significant difference within the pair with respect to the target variable. The CHAID method naturally deals with interactions between the independent variables, which are directly available from an examination of the tree. The final nodes identify subgroups defined by different sets of independent variables [41].

The CHAID algorithm only accepts nominal or ordinal categorical predictors. When predictors are continuous, they are transformed into ordinal predictors before the following algorithm is used. For each predictor variable $X$, nonsignificant categories are merged. Each final category of $X$ will result in one child node if $X$ is used to split the node. The merging step also calculates the adjusted $p$ value that is to be used in the splitting step.

(1) If $X$ has 1 category only, stop and set the adjusted $p$ value to be 1.

(2) If $X$ has 2 categories, go to step 8.

(3) Else, find the allowable pair of categories of $X$ (an allowable pair of categories for an ordinal predictor is two adjacent categories, and for a nominal predictor is any two categories) that is least significantly different. The most similar pair is the pair whose test statistic gives the largest $p$ value with respect to the dependent variable $Y$.

(4) For the pair having the largest $p$ value, check if its $p$ value is larger than a specified alpha-level $\alpha_{\text{merge}}$. If it is, this pair is merged into a single compound category. Then a new set of categories of $X$ is formed. If it is not, then go to step 7.

(5) (Optional) If the newly formed compound category consists of three or more original categories, then find the best binary split within the compound category, that is, the one for which the $p$ value is the smallest. Perform this binary split if its $p$ value is not larger than an alpha-level $\alpha_{\text{split-merge}}$.

(6) Go to step 2.

(7) (Optional) Any category having too few observations (as compared with a user-specified minimum segment size) is merged with the most similar other category, as measured by the largest of the $p$ values.

(8) The adjusted $p$ value is computed for the merged categories by applying Bonferroni adjustments [42, 43].
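The core of the merging loop, steps (2) to (4) and (6), can be sketched for the simplest case: an ordinal predictor and a binary target, so that each pair of adjacent categories forms a 2 x 2 table with one degree of freedom. The optional steps (5) and (7) and the Bonferroni adjustment of step (8) are omitted. The function names, the default `alpha_merge`, and the counts are assumptions made for illustration.

```python
import math

def pair_p_value(a, b):
    """Pearson chi-square p value for the 2 x 2 table formed by two categories
    (rows a and b) and a binary target (columns); one degree of freedom."""
    n = sum(a) + sum(b)
    col = [a[0] + b[0], a[1] + b[1]]
    x2 = 0.0
    for row in (a, b):
        for j in range(2):
            expected = sum(row) * col[j] / n
            if expected > 0:
                x2 += (row[j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(x2 / 2.0))  # chi-square survival function for df = 1

def merge_ordinal_categories(counts, alpha_merge=0.05):
    """Merge the most similar adjacent pair while its p value exceeds alpha_merge."""
    groups = [([i], list(c)) for i, c in enumerate(counts)]
    while len(groups) > 2:                                   # step (2)
        p_values = [pair_p_value(groups[k][1], groups[k + 1][1])
                    for k in range(len(groups) - 1)]
        k = max(range(len(p_values)), key=p_values.__getitem__)  # step (3)
        if p_values[k] <= alpha_merge:                       # step (4): no merge
            break
        (ids_a, a), (ids_b, b) = groups[k], groups[k + 1]
        groups[k:k + 2] = [(ids_a + ids_b, [a[0] + b[0], a[1] + b[1]])]
    return [ids for ids, _ in groups]

# Hypothetical counts (no landslide, landslide) for four ordered slope classes.
counts = [[90, 10], [88, 12], [40, 60], [35, 65]]
merged = merge_ordinal_categories(counts)
print(merged)  # [[0, 1], [2, 3]]
```

In this toy example the two gentle slope classes have nearly identical landslide rates, as do the two steep classes, so four categories collapse into two.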

The CHAID algorithm reduces the number of predictor categories by merging categories when there is no significant difference between them with respect to the class. When no more categories can be merged, the predictor can be considered as a candidate for a split at the node. The original CHAID algorithm is not guaranteed to find the best (most significant) split of all of those examined because it uses the last split tested. The Exhaustive CHAID algorithm attempts to overcome this problem by continuing to merge categories, irrespective of significance level, until only two categories remain for each predictor. It then uses the split with the largest significance value rather than the last one tried. Exhaustive CHAID requires more computer time [44].

Calculations of (unadjusted) $p$ values in the above algorithms depend on the type of dependent variable. The merging step of both CHAID and Exhaustive CHAID sometimes needs the $p$ value for a pair of $X$ categories and sometimes needs the $p$ value for all the categories of $X$. When the $p$ value for a pair of $X$ categories is needed, only part of the data in the current node is relevant. Let $D$ denote the relevant data. Suppose that in $D$ there are $I$ categories of $X$ and $J$ categories of $Y$ (if $Y$ is categorical). The $p$ value calculation using data in $D$ is given below. The null hypothesis of independence of $X$ and the dependent variable $Y$ is tested. To do the test, a contingency (or count) table is formed using classes of $Y$ as columns and categories of the predictor $X$ as rows. The expected cell frequencies under the null hypothesis are estimated. The observed cell frequencies and the expected cell frequencies are used to calculate the Pearson chi-squared statistic or the likelihood ratio statistic. The $p$ value is computed based on Pearson's chi-square statistic. Consider

$$X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}, \tag{5}$$

where $n_{ij} = \sum_{n \in D} f_n I(x_n = i \wedge y_n = j)$ is the observed cell frequency (with $f_n$ the frequency weight of case $n$ and $I(\cdot)$ the indicator function) and $\hat{m}_{ij}$ is the estimated expected cell frequency for cell $(x_n = i, y_n = j)$ from the independence model, as follows:

$$\hat{m}_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n_{\cdot\cdot}}, \quad n_{i\cdot} = \sum_{j=1}^{J} n_{ij}, \quad n_{\cdot j} = \sum_{i=1}^{I} n_{ij}, \quad n_{\cdot\cdot} = \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij}.$$

The corresponding $p$ value is given by $p = \Pr(\chi^2_d > X^2)$ for Pearson's chi-square test, where $\chi^2_d$ follows a chi-squared distribution with $d = (J - 1)(I - 1)$ degrees of freedom.
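Equation (5) and the associated $p$ value can be illustrated with the following sketch, which uses only the Python standard library. The closed-form survival function for integer degrees of freedom is a standard identity rather than something taken from the paper. The function names and the 3 x 2 table of counts are hypothetical, and all row and column totals are assumed to be positive.

```python
import math

def pearson_chi_square(table):
    """Equation (5) for an I x J table of observed counts n_ij
    (rows: categories of X, columns: classes of Y). Returns (X2, d)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n  # m_ij under independence
            x2 += (observed - expected) ** 2 / expected
    d = (len(table) - 1) * (len(col_sums) - 1)
    return x2, d

def chi2_sf(x, d):
    """Pr(chi2_d > x) for a positive integer d, using closed-form series."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if d % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, d // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (d - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

# Hypothetical counts: three categories of X (rows) by two classes of Y (columns).
table = [[30, 10], [20, 20], [10, 30]]
x2, d = pearson_chi_square(table)
p_value = chi2_sf(x2, d)
```

For this table every expected frequency is 20, so the statistic is 20 on 2 degrees of freedom and the hypothesis of independence is rejected at any usual significance level.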

In step 8, the adjusted $p$ value is calculated as the $p$ value times a Bonferroni multiplier. The Bonferroni multiplier adjusts for multiple tests. Suppose that a predictor variable originally has $I$ categories and that it is reduced to $r$ categories after the merging step. The Bonferroni multiplier $B$ is the number of possible ways that $I$ categories can be merged into $r$ categories. For $r = I$, $B = 1$. For $2 \le r < I$, use the following equation:

$$B = \sum_{v=0}^{r-1} (-1)^v \frac{(r - v)^I}{v!\,(r - v)!}. \tag{6}$$
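Equation (6) can be evaluated exactly with rational arithmetic, as in the sketch below. The sum equals the Stirling number of the second kind, that is, the number of ways to partition $I$ unordered categories into $r$ nonempty groups, which is the count that applies when any two categories may be merged. The function name and the example $p$ value are hypothetical.

```python
from fractions import Fraction
from math import factorial

def bonferroni_multiplier(n_original, n_merged):
    """Equation (6): number of ways to merge n_original (I) categories into n_merged (r)."""
    if n_merged == n_original:
        return 1
    total = sum(
        Fraction((-1) ** v * (n_merged - v) ** n_original,
                 factorial(v) * factorial(n_merged - v))
        for v in range(n_merged)
    )
    return int(total)

# Four original categories merged into two: B = 7.
multiplier = bonferroni_multiplier(4, 2)

# Adjusted p value for a hypothetical unadjusted p value of 0.004.
adjusted_p = 0.004 * multiplier
```

Using `Fraction` avoids floating-point rounding when $I$ is large and the alternating terms nearly cancel.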

2.3. Quick-Unbiased-Efficient Statistical Tree (QUEST). QUEST is a binary-split decision tree algorithm for classification and data mining. QUEST can be used with univariate or linear combination splits. A unique feature is that its attribute selection method has negligible bias: if all the attributes are uninformative with respect to the class attribute, then each has approximately the same chance of being selected to split a node [45].

The QUEST tree growing process consists of the selection of a split predictor, selection of a split point for the selected predictor, and stopping. In this algorithm, only univariate splits are considered. For selection of the split predictor, it uses the following algorithm.

(1) For each continuous predictor $X$, perform an ANOVA $F$-test that tests if all the different classes of the dependent variable $Y$ have the same mean of $X$, and calculate the $p$ value according to the $F$ statistic. For each categorical predictor, perform Pearson's $\chi^2$-test of the independence of $Y$ and $X$, and calculate the $p$ value according to the $X^2$ statistic.

(2) Find the predictor with the smallest $p$ value and denote it $X^*$.

(3) If this smallest $p$ value is less than $\alpha / M$, where $\alpha \in (0, 1)$ is a user-specified level of significance and $M$ is the total number of predictor variables, predictor $X^*$ is selected as the split predictor for the node. If not, go to step 4.

(4) For each continuous predictor $X$, compute Levene's $F$ statistic based on the absolute deviation of $X$ from its class mean to test if the variances of $X$ for different classes of $Y$ are the same, and calculate the $p$ value for the test.

(5) Find the predictor with the smallest $p$ value and denote it as $X^{**}$.

(6) If this smallest $p$ value is less than $\alpha / (M + M_1)$, where $M_1$ is the number of continuous predictors, $X^{**}$ is selected as the split predictor for the node. Otherwise, this node is not split [45].
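The decision logic of steps (2), (3), (5), and (6) can be sketched as follows, taking the $p$ values of steps (1) and (4) as given inputs rather than computing the $F$ and chi-square tests. The function name, the dictionary layout, and all $p$ values are hypothetical.

```python
def select_split_predictor(mean_test_p, levene_p, alpha=0.05):
    """Return the split predictor for a node, or None if the node is not split.

    mean_test_p: p values from step (1), one per predictor (M entries).
    levene_p:    p values from step (4), one per continuous predictor (M1 entries).
    """
    m, m1 = len(mean_test_p), len(levene_p)
    best = min(mean_test_p, key=mean_test_p.get)       # step (2): X*
    if mean_test_p[best] < alpha / m:                  # step (3)
        return best
    if levene_p:
        best_var = min(levene_p, key=levene_p.get)     # step (5): X**
        if levene_p[best_var] < alpha / (m + m1):      # step (6)
            return best_var
    return None

# Hypothetical p values for three predictors, two of them continuous.
first = select_split_predictor(
    {"V3": 0.0004, "V16": 0.02, "V7": 0.30}, {"V3": 0.5, "V16": 0.01})
second = select_split_predictor(
    {"V3": 0.03, "V16": 0.04, "V7": 0.30}, {"V3": 0.2, "V16": 0.005})
third = select_split_predictor(
    {"V3": 0.03, "V16": 0.04, "V7": 0.30}, {"V3": 0.2, "V16": 0.02})
```

In the first call the mean test already passes the threshold $\alpha / M$; in the second, only the variance test of step (6) selects a predictor; in the third, neither test is significant and the node is left unsplit.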

3. Study Area

As shown in Figure 1, this study focuses on Penang Island, which lies between 5°15′ and 5°30′ N latitude and 100°10′ and 100°20′ E longitude. The North Channel separates the study area from the mainland. It occupies an area of 285 km² and is one of the 13 states of Malaysia. The island is bounded to the north and east by the state of Kedah, to the south by the state of Perak, and to the west by the Strait of Malacca and Sumatra (Indonesia).

Figure 1: Study area map and landslide location map with hill-shaded map. (Legend: river, landslide, and altitude classes in m from 0 to 820; scale bar in km.)

Penang Island consists of both the island of Penangand a coastal strip on the mainland known as the ProvinceWellesley This paper focuses only on the island wherefrequent landslides occurred and threaten lives and damageproperties [46 47] The heavy rain plays a major role intriggering the landslides in the study area Data from theMalaysian Meteorological Department recorded that therainfall amount varies approximately between 2254mm and2903mm annually in the study area Penang Island has atropical climate with high temperature of 29∘C to 32∘C andhumidity ranges from 65 to 96 Topographic elevationsvary between 0m and 820m above sea level The slope angleranges from 0∘ to 87∘ while 4328 of island is flat Geolog-ical data from the Minerals and Geosciences DepartmentMalaysia show that Ferringhi granite Batu Maung graniteclay and sand granite represent more than 72 of the studyarearsquos geology Vegetation cover consists mainly of forests andfruit plantations

4 Data Collection

An effective intelligent system requires a comprehensive dataset Therefore 137570 samples of data were selected in this

Journal of Applied Mathematics 5

Table 1 Number of nodes terminal nodes and order of importance variable

Decision treemodel No of nodes No of terminal

nodesIndependent variable included

ldquoorder of importancerdquoCHAID 317 254 1198813 11988116 11988115 1198815 1198814 11988121 11988117 11988113 1198811 11988112 11988119 11988118 11988110

ExhaustiveCHAID 377 302

1198813 11988116 11988115 1198815 1198814 11988121 1198819 11988113 1198811 11988117 11988112 11988119 1198811811988110

CRT 43 221198813 11988116 11988115 1198815 1198814 1198812 11988121 1198816 1198818 1198819 11988113 1198811 11988117

11988112 11988119 11988120 11988118 11988111 11988114 1198817 11988110

QUEST 55 28 1198813 11988116 11988115 1198815 1198814 1198812 11988121 1198816 1198818 11988113 1198811 11988117 1198811211988119 11988118 11988111 1198817 11988110

analysis where 68786 samples represent landslides and 68786samples represent no landslidesThen Digital ElevationMap(DEM) is used to extract 21 topographic factors The DEMwith five-meter resolutions of Penang Island was obtainedfrom the Department of Survey and Mapping Malaysia Theextracted factors are acronyms as 1198811 (vegetation cover) 1198812

(distance from the fault line) 1198813 (slope angle) 1198814 (crosscurvature) 1198815 (slope aspect) 1198816 (distance from road) 1198817

(geology) 1198818 (diagonal length) 1198819 (longitude curvature)11988110 (rugosity) 11988111 (plan curvature) 11988112 (elevation) 11988113

(rain perception) 11988114 (soil texture) 11988115 (surface area) 11988116

(distance from drainage) 11988117 (roughness) 11988118 (land cover)11988119 (general curvature) 11988120 (tangent curvature) and 11988121

(profile curvature) In the previous studies which have beendone on Penang Island only 14 factors (1198811 to 11988114) were on thesubject of investigation for landslide [48] While the factors11988115 to 11988121 will be applied and investigated for the first timeon the study area Furthermore the 21 factors represent theavailable data of all factors that can cause the landslide in thestudy area The intelligent system target (landslides history)is represented by 0 for no landslide and 1 for landslide Thedata were normalized to range between 0 and 1 for each ofthe factors individually A 10-fold cross-validation analysiswas performed as an initial evaluation of the test error of thealgorithms Briefly this process involves splitting up the dataset into 10 random segments and using 9 of them for trainingand the 10th as a test set for the algorithm Classificationaccuracy of each model was calculated as follows

The accuracy of correctly classified landslide (1) is given by

$$\text{accuracy}(1) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (1)}}{\text{number of (1)}}. \tag{7}$$

The accuracy of correctly classified no landslide (0) is given by

$$\text{accuracy}(0) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (0)}}{\text{number of (0)}}. \tag{8}$$

The overall accuracy for the decision tree model is given by

$$\text{overall accuracy} = \frac{\text{accuracy}(1) + \text{accuracy}(0)}{2}. \tag{9}$$
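The preprocessing and evaluation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the random seed, and all sample numbers are made up. It also assumes one particular reading of (7) and (8), namely that the sum runs over the ten cross-validation folds and that the denominator is the total number of samples of that class across all folds.

```python
import random


def min_max_normalize(column):
    """Rescale one factor's values to [0, 1]; each factor is normalized on its own."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant factor has no spread; mapping it to 0.0 is a choice made here.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Return n_folds (train_idx, test_idx) pairs; every sample is a test case once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        splits.append((train_idx, test_idx))
    return splits


def class_accuracy(correct_per_fold, n_class_total):
    """Eq. (7) or (8): correct test predictions summed over folds, divided by class size."""
    return sum(correct_per_fold) / n_class_total


def overall_accuracy(acc_1, acc_0):
    """Eq. (9): unweighted mean of the two per-class accuracies."""
    return (acc_1 + acc_0) / 2


# Made-up numbers, for illustration only.
print(min_max_normalize([0.0, 12.5, 43.5, 87.0]))
print([len(test) for _, test in k_fold_indices(25)])

correct_1 = [9, 8, 10, 9, 9, 8, 9, 10, 9, 9]  # landslide class, 100 samples in total
correct_0 = [7, 8, 7, 6, 8, 7, 7, 8, 7, 5]    # no-landslide class, 100 samples in total
acc_1 = class_accuracy(correct_1, 100)
acc_0 = class_accuracy(correct_0, 100)
print(acc_1, acc_0, overall_accuracy(acc_1, acc_0))
```

Because the two classes are equally sized in this study, the unweighted mean in (9) coincides with the fraction of all samples classified correctly.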

5 Discussion

Four tree algorithms, CHAID, exhaustive CHAID, CRT, and QUEST, were applied to map landslide susceptibility hazard. The four trees were constructed from the entire sample of 137572 cases, using cross-validation with 10 folds, a 0.05 adjustment of the probabilities, a minimum of 100 cases in a parent node, a minimum of 50 cases in a child node, and equal misclassification costs. The maximum number of levels is 3 for CHAID and exhaustive CHAID and 5 for CRT and QUEST.
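To make the growth limits concrete, the sketch below encodes them as a small check that decides whether a node may be split. It is only an illustration: the names `GrowthLimits` and `split_allowed` are hypothetical, and the way the three limits are combined is an assumption about how such settings are usually applied, not a description of the software the authors used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthLimits:
    min_parent_cases: int = 100
    min_child_cases: int = 50
    max_depth: int = 5  # 3 for CHAID and exhaustive CHAID, 5 for CRT and QUEST


def split_allowed(limits, depth, n_parent, child_sizes):
    """Return True if a node at `depth` (root = 0) may be split into the given children."""
    if depth >= limits.max_depth:
        return False  # children would exceed the maximum number of levels
    if n_parent < limits.min_parent_cases:
        return False  # too few cases to act as a parent node
    return all(n >= limits.min_child_cases for n in child_sizes)


crt = GrowthLimits(max_depth=5)
chaid = GrowthLimits(max_depth=3)

# Node 14 of the CRT tree (208 cases, four levels below the root) splits into 69 and 139 cases.
print(split_allowed(crt, depth=4, n_parent=208, child_sizes=[69, 139]))
# A node already at the maximum depth cannot be split further.
print(split_allowed(crt, depth=5, n_parent=77, child_sizes=[40, 37]))
print(split_allowed(chaid, depth=3, n_parent=5000, child_sizes=[2500, 2500]))
```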

The number of nodes, the number of terminal nodes, and the importance of the independent variables produced by each model are presented in Table 1. The CHAID tree has a total of 317 nodes, of which 254 are terminal nodes; the exhaustive CHAID tree has 377 nodes with 302 terminal nodes; the CRT tree has 43 nodes with 22 terminal nodes; and the QUEST tree has 55 nodes with 28 terminal nodes. An example of a decision tree built with the CRT method is given in Table 2. The tree has 43 nodes: the root node, 20 internal nodes, and 22 leaves (terminal nodes). Percentages in each category and in each joint category are presented in Table 2.
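The Improvement column of Table 2 can be reproduced from the node counts. The sketch below assumes that the improvement is the decrease in Gini impurity, with each node weighted by its share of all 137572 cases. The text does not name the impurity measure, so this is an assumption, although the computed values agree with the tabulated ones for the two splits checked here.

```python
def gini(n0, n1):
    """Gini impurity of a node holding n0 class-0 and n1 class-1 cases."""
    n = n0 + n1
    p0, p1 = n0 / n, n1 / n
    return 1.0 - p0 * p0 - p1 * p1


def improvement(parent, left, right, n_total):
    """Impurity decrease of a binary split; each node is weighted by its share of all cases."""
    def weighted(node):
        n0, n1 = node
        return (n0 + n1) / n_total * gini(n0, n1)

    return weighted(parent) - weighted(left) - weighted(right)


N_TOTAL = 137572
# (no-landslide count, landslide count) for nodes 0 to 4 of Table 2
root, node1, node2 = (68786, 68786), (31018, 1347), (37768, 67439)
node3, node4 = (29797, 314), (1221, 1033)

print(round(improvement(root, node1, node2, N_TOTAL), 3))   # Table 2 lists 0.129
print(round(improvement(node1, node3, node4, N_TOTAL), 3))  # Table 2 lists 0.006
```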

The decision tree methods are also used to analyze the relationships between landslide susceptibility and related factors. The normalized importance of the factors in classification using CRT is shown in Figure 2. The top-down induction of the decision tree indicates that variables higher in the tree structure are more important for analyzing landslide susceptibility. The tree structure demonstrates that the important variables related to high landslide susceptibility catchments are ordered as follows: V3 (slope angle), V16 (distance from drainage), V15 (surface area), V5 (slope aspect), and V4 (cross curvature).

The prediction accuracy produced by each model is presented in Table 3. The exhaustive CHAID algorithm shows the highest classification accuracy among the four algorithms. Its prediction accuracy is 82.0%, with a sensitivity of 72.3% and a specificity of 91.7%.
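As a consistency check, the CRT row of Table 3 can be recomputed from the terminal nodes of Table 2. The sketch below treats a case as correctly classified when its true class equals the predicted category of the leaf it falls into, then applies (9). The counts are copied from Table 2; the code itself is only an illustration.

```python
# (node, no-landslide count, landslide count, predicted category) for the 22 leaves of Table 2
LEAVES = [
    (3, 29797, 314, 0), (13, 237, 608, 1), (15, 216, 1, 0), (21, 48, 121, 1),
    (25, 68, 1, 0), (26, 74, 65, 0), (27, 232, 256, 1), (28, 394, 102, 0),
    (29, 758, 199, 0), (30, 547, 630, 1), (31, 3200, 7633, 1), (32, 1723, 1959, 1),
    (33, 1828, 6450, 1), (34, 1585, 1530, 0), (35, 5873, 21164, 1), (36, 77, 0, 0),
    (37, 2179, 873, 0), (38, 1500, 213, 0), (39, 5193, 3577, 0), (40, 5773, 8506, 1),
    (41, 7078, 14405, 1), (42, 406, 179, 0),
]


def accuracies(leaves):
    """Return (accuracy(0), accuracy(1), overall accuracy) as fractions."""
    total0 = sum(n0 for _, n0, _, _ in leaves)
    total1 = sum(n1 for _, _, n1, _ in leaves)
    correct0 = sum(n0 for _, n0, _, pred in leaves if pred == 0)
    correct1 = sum(n1 for _, _, n1, pred in leaves if pred == 1)
    acc0, acc1 = correct0 / total0, correct1 / total1
    return acc0, acc1, (acc0 + acc1) / 2


acc0, acc1, overall = accuracies(LEAVES)
# Compare with the CRT row of Table 3 (61.4, 89.7, 75.6).
print(f"{100 * acc0:.1f} {100 * acc1:.1f} {100 * overall:.1f}")
```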

6 Conclusion

This study has analyzed landslide susceptibility in Penang Island, Malaysia, using ensemble learning with a decision-tree model. We can conclude that the decision tree clearly indicates the order of important variables and quantitatively describes the relationships among the occurrence of landslides, topography, and geology. The decision-tree model using the exhaustive CHAID algorithm showed greater accuracy than the other models, demonstrating the usefulness of the decision tree model for landslide hazard mapping. Accuracies were 82.0% for exhaustive CHAID, 81.9% for CHAID, 75.6% for CRT, and 74.0% for QUEST. In this study, we determined factors that may be involved in landslide susceptibility, and the results can be used for landslide hazard mapping in other regions. Moreover, landslide hazard maps can be used to help mitigate hazards to people and facilities and as basic data for developing plans to prevent landslide hazards, such as in locating monitoring and facility sites. Further case studies and modeling are needed to better generalize the factors involved in landslide susceptibility.

Table 2: Tree table using CRT method.

| Node | N (0) | Percent (0) | N (1) | Percent (1) | N (total) | Percent (total) | Predicted category | Parent node | Variable | Improvement | Split values |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68786 | 50.0 | 68786 | 50.0 | 137572 | 100.0 | 1 | | | | |
| 1 | 31018 | 95.8 | 1347 | 4.2 | 32365 | 23.5 | 0 | 0 | V15 | 0.129 | ≤0.03725 |
| 2 | 37768 | 35.9 | 67439 | 64.1 | 105207 | 76.5 | 1 | 0 | V15 | 0.129 | >0.03725 |
| 3 | 29797 | 99.0 | 314 | 1.0 | 30111 | 21.9 | 0 | 1 | V16 | 0.006 | ≤0.04580 |
| 4 | 1221 | 54.2 | 1033 | 45.8 | 2254 | 1.6 | 0 | 1 | V16 | 0.006 | >0.04580 |
| 5 | 15591 | 28.3 | 39565 | 71.7 | 55156 | 40.1 | 1 | 2 | V19 | 0.010 | ≤0.02120 |
| 6 | 22177 | 44.3 | 27874 | 55.7 | 50051 | 36.4 | 1 | 2 | V19 | 0.010 | >0.02120 |
| 7 | 379 | 36.0 | 674 | 64.0 | 1053 | 0.8 | 1 | 4 | V17 | 0.001 | ≤0.00958 |
| 8 | 842 | 70.1 | 359 | 29.9 | 1201 | 0.9 | 0 | 4 | V17 | 0.001 | >0.00958 |
| 9 | 6228 | 37.4 | 10421 | 62.6 | 16649 | 12.1 | 1 | 5 | V3 | 0.003 | ≤0.01292 |
| 10 | 9363 | 24.3 | 29144 | 75.7 | 38507 | 28.0 | 1 | 5 | V3 | 0.003 | >0.01292 |
| 11 | 3727 | 75.5 | 1207 | 24.5 | 4934 | 3.6 | 0 | 6 | V16 | 0.008 | ≤0.05104 |
| 12 | 18450 | 40.9 | 26667 | 59.1 | 45117 | 32.8 | 1 | 6 | V16 | 0.008 | >0.05104 |
| 13 | 237 | 28.0 | 608 | 72.0 | 845 | 0.6 | 1 | 7 | V12 | 0.000 | ≤0.15625 |
| 14 | 142 | 68.3 | 66 | 31.7 | 208 | 0.2 | 0 | 7 | V12 | 0.000 | >0.15625 |
| 15 | 216 | 99.5 | 1 | 0.5 | 217 | 0.2 | 0 | 8 | V15 | 0.000 | ≤0.02376 |
| 16 | 626 | 63.6 | 358 | 36.4 | 984 | 0.7 | 0 | 8 | V15 | 0.000 | >0.02376 |
| 17 | 1305 | 61.2 | 829 | 38.8 | 2134 | 1.6 | 0 | 9 | V17 | 0.002 | ≤0.02659 |
| 18 | 4923 | 33.9 | 9592 | 66.1 | 14515 | 10.6 | 1 | 9 | V17 | 0.002 | >0.02659 |
| 19 | 3413 | 30.0 | 7980 | 70.0 | 11393 | 8.3 | 1 | 10 | V18 | 0.001 | ≤0.05873 |
| 20 | 5950 | 21.9 | 21164 | 78.1 | 27114 | 19.7 | 1 | 10 | V18 | 0.001 | >0.05873 |
| 21 | 48 | 28.4 | 121 | 71.6 | 169 | 0.1 | 1 | 11 | V13 | 0.001 | ≤0.10 |
| 22 | 3679 | 77.2 | 1086 | 22.8 | 4765 | 3.5 | 0 | 11 | V13 | 0.001 | >0.10 |
| 23 | 10966 | 47.6 | 12083 | 52.4 | 23049 | 16.8 | 1 | 12 | V3 | 0.003 | ≤0.02510 |
| 24 | 7484 | 33.9 | 14584 | 66.1 | 22068 | 16.0 | 1 | 12 | V3 | 0.003 | >0.02510 |
| 25 | 68 | 98.6 | 1 | 1.4 | 69 | 0.1 | 0 | 14 | V12 | 0.000 | ≤0.21875 |
| 26 | 74 | 53.2 | 65 | 46.8 | 139 | 0.1 | 0 | 14 | V12 | 0.000 | >0.21875 |
| 27 | 232 | 47.5 | 256 | 52.5 | 488 | 0.4 | 1 | 16 | V21 | 0.000 | ≤0.4375 |
| 28 | 394 | 79.4 | 102 | 20.6 | 496 | 0.4 | 0 | 16 | V21 | 0.000 | >0.4375 |
| 29 | 758 | 79.2 | 199 | 20.8 | 957 | 0.7 | 0 | 17 | V15 | 0.001 | ≤0.07812 |
| 30 | 547 | 46.5 | 630 | 53.5 | 1177 | 0.9 | 1 | 17 | V15 | 0.001 | >0.07812 |
| 31 | 3200 | 29.5 | 7633 | 70.5 | 10833 | 7.9 | 1 | 18 | V13 | 0.001 | ≤0.50 |
| 32 | 1723 | 46.8 | 1959 | 53.2 | 3682 | 2.7 | 1 | 18 | V13 | 0.001 | >0.50 |
| 33 | 1828 | 22.1 | 6450 | 77.9 | 8278 | 6.0 | 1 | 19 | V17 | 0.003 | ≤0.17214 |
| 34 | 1585 | 50.9 | 1530 | 49.1 | 3115 | 2.3 | 0 | 19 | V17 | 0.003 | >0.17214 |
| 35 | 5873 | 21.7 | 21164 | 78.3 | 27037 | 19.7 | 1 | 20 | V18 | 0.001 | ≤0.46488 |
| 36 | 77 | 100.0 | 0 | 0.0 | 77 | 0.1 | 0 | 20 | V18 | 0.001 | >0.46488 |
| 37 | 2179 | 71.4 | 873 | 28.6 | 3052 | 2.2 | 0 | 22 | V1 | 0.000 | ≤0.15385 |
| 38 | 1500 | 87.6 | 213 | 12.4 | 1713 | 1.2 | 0 | 22 | V1 | 0.000 | >0.15385 |
| 39 | 5193 | 59.2 | 3577 | 40.8 | 8770 | 6.4 | 0 | 23 | V18 | 0.003 | ≤0.08809 |
| 40 | 5773 | 40.4 | 8506 | 59.6 | 14279 | 10.4 | 1 | 23 | V18 | 0.003 | >0.08809 |
| 41 | 7078 | 32.9 | 14405 | 67.1 | 21483 | 15.6 | 1 | 24 | V17 | 0.001 | ≤0.52185 |
| 42 | 406 | 69.4 | 179 | 30.6 | 585 | 0.4 | 0 | 24 | V17 | 0.001 | >0.52185 |

Table 3: Classification accuracy (%) produced by each model.

| Decision tree model | Predicted (0) | Predicted (1) | Overall |
|---|---|---|---|
| CHAID | 73.5 | 90.3 | 81.9 |
| Exhaustive CHAID | 72.3 | 91.7 | 82.0 |
| CRT | 61.4 | 89.7 | 75.6 |
| QUEST | 54.4 | 93.5 | 74.0 |

Figure 2: Normalized importance of factors using CRT method. The independent variables are plotted against importance (axis from 0.00 to 0.15) and normalized importance (axis from 0 to 100), in decreasing order of importance: V3, V16, V15, V5, V4, V2, V21, V6, V8, V9, V13, V1, V17, V12, V19, V20, V18, V11, V14, V7, V10.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] P. Aleotti and R. Chowdhury, "Landslide hazard assessment: summary review and new perspectives," Bulletin of Engineering Geology and the Environment, vol. 58, no. 1, pp. 21–44, 1999.

[2] H. Saito, D. Nakayama, and H. Matsuyama, "Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan," Geomorphology, vol. 109, no. 3-4, pp. 108–121, 2009.

[3] F. Guzzetti, A. Carrara, M. Cardinali, and P. Reichenbach, "Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy," Geomorphology, vol. 31, no. 1–4, pp. 181–216, 1999.

[4] F. Guzzetti, P. Reichenbach, F. Ardizzone, M. Cardinali, and M. Galli, "Estimating the quality of landslide susceptibility models," Geomorphology, vol. 81, no. 1-2, pp. 166–184, 2006.

[5] F. Guzzetti, P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone, "Probabilistic landslide hazard assessment at the basin scale," Geomorphology, vol. 72, no. 1–4, pp. 272–299, 2005.

[6] E. Yesilnacar and T. Topal, "Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey)," Engineering Geology, vol. 79, no. 3-4, pp. 251–266, 2005.

[7] D. P. Kanungo, M. K. Arora, S. Sarkar, and R. P. Gupta, "A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas," Engineering Geology, vol. 85, no. 3-4, pp. 347–366, 2006.

[8] S. He, P. Pan, L. Dai, H. Wang, and J. Liu, "Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China," Geomorphology, vol. 171-172, pp. 30–41, 2012.

[9] A. Carrara, "Multivariate models for landslide hazard evaluation," Journal of the International Association for Mathematical Geology, vol. 15, no. 3, pp. 403–426, 1983.

[10] L. Ayalew and H. Yamagishi, "Slope failures in the Blue Nile basin, as seen from landscape evolution perspective," Geomorphology, vol. 57, no. 1-2, pp. 95–116, 2004.

[11] G. Metternicht, L. Hurni, and R. Gogu, "Remote sensing of landslides: an analysis of the potential contribution to geo-spatial systems for hazard assessment in mountainous environments," Remote Sensing of Environment, vol. 98, no. 2-3, pp. 284–303, 2005.

[12] D. E. Alexander, "A brief survey of GIS in mass-movement studies, with reflections on theory and methods," Geomorphology, vol. 94, no. 3-4, pp. 261–267, 2008.

[13] J. Remondo, J. Bonachea, and A. Cendrero, "Quantitative landslide risk assessment and mapping on the basis of recent occurrences," Geomorphology, vol. 94, no. 3-4, pp. 496–507, 2008.


[14] L. Luzi, F. Pergalani, and M. T. J. Terlien, "Slope vulnerability to earthquakes at subregional scale, using probabilistic techniques and geographic information systems," Engineering Geology, vol. 58, no. 3-4, pp. 313–336, 2000.

[15] S. Lee and K. Min, "Statistical analysis of landslide susceptibility at Yongin, Korea," Environmental Geology, vol. 40, no. 9, pp. 1095–1113, 2001.

[16] L. Donati and M. C. Turrini, "An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy)," Engineering Geology, vol. 63, no. 3-4, pp. 277–289, 2002.

[17] S. Lee and U. Choi, "Development of GIS-based geological hazard information system and its application for landslide analysis in Korea," Geosciences Journal, vol. 7, no. 3, pp. 243–252, 2003.

[18] B. Neuhauser and B. Terhorst, "Landslide susceptibility assessment using 'weights-of-evidence' applied to a study area at the Jurassic escarpment (SW-Germany)," Geomorphology, vol. 86, no. 1-2, pp. 12–24, 2007.

[19] P. M. Atkinson and R. Massari, "Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy," Computers and Geosciences, vol. 24, no. 4, pp. 373–385, 1998.

[20] F. C. Dai, C. F. Lee, J. Li, and Z. W. Xu, "Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong," Environmental Geology, vol. 40, no. 3, pp. 381–391, 2001.

[21] H. A. Nefeslioglu, T. Y. Duman, and S. Durmaz, "Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey)," Geomorphology, vol. 94, no. 3-4, pp. 401–418, 2008.

[22] L. Ermini, F. Catani, and N. Casagli, "Artificial Neural Networks applied to landslide susceptibility assessment," Geomorphology, vol. 66, no. 1–4, pp. 327–343, 2005.

[23] S. Lee and I. Park, "Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines," Journal of Environmental Management, vol. 127, pp. 166–176, 2013.

[24] H. Gomez and T. Kavzoglu, "Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela," Engineering Geology, vol. 78, no. 1-2, pp. 11–27, 2005.

[25] C. Melchiorre, M. Matteucci, A. Azzoni, and A. Zanchi, "Artificial neural networks and cluster analysis in landslide susceptibility zonation," Geomorphology, vol. 94, no. 3-4, pp. 379–400, 2008.

[26] Y.-K. Yeon, J.-G. Han, and K. H. Ryu, "Landslide susceptibility mapping in Injae, Korea, using a decision tree," Engineering Geology, vol. 116, no. 3-4, pp. 274–283, 2010.

[27] R. Bou Kheir, J. Chorowicz, C. Abdallah, and D. Dhont, "Soil and bedrock distribution estimated from gully form and frequency: A GIS-based decision-tree model for Lebanon," Geomorphology, vol. 93, no. 3-4, pp. 482–492, 2008.

[28] N. J. Schneevoigt, S. van der Linden, H.-P. Thamm, and L. Schrott, "Detecting Alpine landforms from remotely sensed imagery. A pilot study in the Bavarian Alps," Geomorphology, vol. 93, no. 1-2, pp. 104–119, 2008.

[29] C.-S. Huang, Y.-J. Lin, and C.-C. Lin, "Implementation of classifiers for choosing insurance policy using decision trees: A case study," WSEAS Transactions on Computers, vol. 7, no. 10, pp. 1679–1689, 2008.

[30] M. Pal and P. M. Mather, "An assessment of the effectiveness of decision tree methods for land cover classification," Remote Sensing of Environment, vol. 86, no. 4, pp. 554–565, 2003.

[31] D. T. Bui, B. Pradhan, O. Lofman, and I. Revhaug, "Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models," Mathematical Problems in Engineering, vol. 2012, Article ID 974638, 26 pages, 2012.

[32] P. K. Pang, L. T. Tien, and H. Lateh, "Landslide hazard mapping of Penang island using decision tree model," in Proceedings of the International Conference on Systems and Electronic Engineering (ICSEE '12), Phuket, Thailand, December 2012.

[33] B. Pradhan, "A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS," Computers & Geosciences, vol. 51, pp. 350–365, 2013.

[34] M. Ture, F. Tokatli, and I. Kurt, "Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients," Expert Systems with Applications, vol. 36, no. 2, pp. 2017–2026, 2009.

[35] C. E. Brodley and M. A. Friedl, "Decision tree classification of land cover from remotely sensed data," Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.

[36] M. Xu, P. Watanachaturaporn, P. K. Varshney, and M. K. Arora, "Decision tree regression for soft classification of remote sensing data," Remote Sensing of Environment, vol. 97, no. 3, pp. 322–336, 2005.

[37] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, Amsterdam, The Netherlands, 2nd edition, 2005.

[38] R. J. Lewis, "An introduction to Classification and Regression Tree (CART) analysis," in Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, Calif, USA, 2000.

[39] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, Calif, USA, 1984.

[40] J. A. Michael and S. L. Gordon, Data Mining Technique for Marketing, Sales and Customer Support, Wiley, New York, NY, USA, 1997.

[41] D. B. V. Biggs and E. Suen, "A method of choosing multiway partitions for classification and decision trees," Journal of Applied Statistics, vol. 18, pp. 49–62, 1991.

[42] L. A. Goodman, "Simple models for the analysis of association in cross-classifications having ordered categories," Journal of the American Statistical Association, vol. 74, no. 367, pp. 537–552, 1979.

[43] G. Kass, "An exploratory technique for investigating large quantities of categorical data," Applied Statistics, vol. 29, no. 2, pp. 119–127, 1980.

[44] T. Hill and P. Lewicki, Statistics: Methods and Applications: A Comprehensive Reference for Science, Industry, and Data Mining, StatSoft, USA, 2006.

[45] W.-Y. Loh and Y.-S. Shih, "Split selection methods for classification trees," Statistica Sinica, vol. 7, no. 4, pp. 815–840, 1997.

[46] H.-J. Oh and B. Pradhan, "Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area," Computers and Geosciences, vol. 37, no. 9, pp. 1264–1276, 2011.


[47] K. Lim Khai-Wern, T. Lea Tien, and H. Lateh, "Landslide hazard mapping of Penang island using probabilistic methods and logistic regression," in Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST '11), pp. 273–278, May 2011.

[48] M. S. Alkhasawneh and U. K. Ngah, "Landslide susceptibility hazard mapping techniques review," Journal of Applied Sciences, vol. 12, pp. 802–808, 2012.


after the merging step. The Bonferroni multiplier $B$ is the number of possible ways that $I$ categories can be merged into $r$ categories. For $r = I$, $B = 1$. For $2 \le r < I$, use the following equation:

$$B = \sum_{v=0}^{r-1} (-1)^{v} \frac{(r - v)^{I}}{v!\,(r - v)!}. \tag{6}$$
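Equation (6) is straightforward to evaluate exactly. The following sketch uses rational arithmetic so that the alternating sum does not suffer from rounding; the function name is hypothetical. The value it returns is the number of ways to partition $I$ items into $r$ non-empty groups, which is the Stirling number of the second kind.

```python
from fractions import Fraction
from math import factorial


def bonferroni_multiplier(I, r):
    """Number of ways to merge I categories into r groups, following Eq. (6)."""
    if not 1 <= r <= I:
        raise ValueError("require 1 <= r <= I")
    total = Fraction(0)
    for v in range(r):
        total += Fraction((-1) ** v * (r - v) ** I, factorial(v) * factorial(r - v))
    return int(total)  # the sum is always a whole number


print(bonferroni_multiplier(4, 2))  # 4 categories merged into 2 groups
print(bonferroni_multiplier(5, 3))  # 5 categories merged into 3 groups
print(bonferroni_multiplier(4, 4))  # r = I, where the text states B = 1
```

For example, with $I = 4$ and $r = 2$ the sum is $16/2 - 1/1 = 7$, matching the seven ways of dividing four categories into two non-empty groups.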

2.3 Quick-Unbiased-Efficient Statistical Tree (QUEST)

QUEST is a binary-split decision tree algorithm for classification and data mining. QUEST can be used with univariate or linear combination splits. A unique feature is that its attribute selection method has negligible bias. If all the attributes are uninformative with respect to the class attribute, then each has approximately the same chance of being selected to split a node [45].

The QUEST tree growing process consists of the selection of a split predictor, the selection of a split point for the selected predictor, and stopping. In this algorithm, only univariate splits are considered. For the selection of the split predictor, it uses the following algorithm.

(1) For each continuous predictor $X$, perform an ANOVA $F$-test that tests whether all the different classes of the dependent variable $Y$ have the same mean of $X$, and calculate the $p$ value according to the $F$ statistic. For each categorical predictor, perform Pearson's $\chi^2$-test of the independence of $Y$ and $X$, and calculate the $p$ value according to the $X^2$ statistic.

(2) Find the predictor with the smallest $p$ value and denote it $X^{*}$.

(3) If this smallest $p$ value is less than $\alpha/M$, where $\alpha \in (0, 1)$ is a user-specified level of significance and $M$ is the total number of predictor variables, predictor $X^{*}$ is selected as the split predictor for the node. If not, go to step (4).

(4) For each continuous predictor $X$, compute Levene's $F$ statistic based on the absolute deviation of $X$ from its class mean to test whether the variances of $X$ for different classes of $Y$ are the same, and calculate the $p$ value for the test.

(5) Find the predictor with the smallest $p$ value and denote it as $X^{**}$.

(6) If this smallest $p$ value is less than $\alpha/(M + M_1)$, where $M_1$ is the number of continuous predictors, $X^{**}$ is selected as the split predictor for the node. Otherwise, this node is not split [45].
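The decision logic of steps (2), (3), (5), and (6) can be sketched as follows, assuming the $p$ values of steps (1) and (4) have already been computed by a statistics package. The function name, the predictor names, and the default $\alpha = 0.05$ are illustrative assumptions; the source only states that $\alpha$ is user-specified.

```python
def select_split_predictor(mean_test_p, levene_p, alpha=0.05):
    """Choose a split predictor from precomputed p values; None means the node is not split.

    mean_test_p: p values from step (1), one per predictor (ANOVA F or chi-square test).
    levene_p:    p values from step (4), one per continuous predictor (Levene's F test).
    """
    if not mean_test_p:
        return None
    m = len(mean_test_p)   # M: total number of predictors
    m1 = len(levene_p)     # M1: number of continuous predictors

    best = min(mean_test_p, key=mean_test_p.get)       # X* in step (2)
    if mean_test_p[best] < alpha / m:                  # step (3)
        return best

    if levene_p:
        best_var = min(levene_p, key=levene_p.get)     # X** in step (5)
        if levene_p[best_var] < alpha / (m + m1):      # step (6)
            return best_var
    return None


# Hypothetical p values for two continuous predictors and one categorical predictor.
print(select_split_predictor(
    {"slope": 0.001, "aspect": 0.20, "geology": 0.40},
    {"slope": 0.50, "aspect": 0.30},
))
print(select_split_predictor(
    {"slope": 0.03, "aspect": 0.20, "geology": 0.40},
    {"slope": 0.50, "aspect": 0.004},
))
```

In the first call the smallest $p$ value passes the threshold $\alpha/M = 0.05/3$, so the mean test decides. In the second call it does not, and the variance test with threshold $\alpha/(M + M_1) = 0.05/5$ selects the predictor instead.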

3 Study Area

As shown in Figure 1, this study is focused on Penang Island, which lies between 5°15′ and 5°30′ N latitude and between 100°10′ and 100°20′ E longitude. The North Channel separates the study area from the mainland. It occupies an area of 285 km² and is one of the 13 states of Malaysia. The island is bounded to the north and east by the state of Kedah, to the south by the state of Perak, and to the west by the Strait of Malacca and Sumatra (Indonesia).

Figure 1: Study area map and landslide location map with hill-shaded map. The legend marks rivers, landslide locations, and altitude classes (m): 0–6, 7–15, 16–19, 20, 21–62, 63–123, 124–194, 195–286, 287–430, and 431–820.

The state of Penang consists of both the island of Penang and a coastal strip on the mainland known as Province Wellesley. This paper focuses only on the island, where frequent landslides occur, threatening lives and damaging property [46, 47]. Heavy rain plays a major role in triggering landslides in the study area. Data from the Malaysian Meteorological Department show that annual rainfall in the study area varies between approximately 2254 mm and 2903 mm. Penang Island has a tropical climate, with high temperatures of 29°C to 32°C and humidity ranging from 65% to 96%. Topographic elevations vary between 0 m and 820 m above sea level. The slope angle ranges from 0° to 87°, while 43.28% of the island is flat. Geological data from the Minerals and Geosciences Department Malaysia show that Ferringhi granite, Batu Maung granite, clay, and sand granite represent more than 72% of the study area's geology. Vegetation cover consists mainly of forests and fruit plantations.

4 Data Collection

An effective intelligent system requires a comprehensive data set. Therefore, 137572 samples of data were selected for this analysis.

Journal of Applied Mathematics 5

Table 1 Number of nodes terminal nodes and order of importance variable

Decision treemodel No of nodes No of terminal

nodesIndependent variable included

ldquoorder of importancerdquoCHAID 317 254 1198813 11988116 11988115 1198815 1198814 11988121 11988117 11988113 1198811 11988112 11988119 11988118 11988110

ExhaustiveCHAID 377 302

1198813 11988116 11988115 1198815 1198814 11988121 1198819 11988113 1198811 11988117 11988112 11988119 1198811811988110

CRT 43 221198813 11988116 11988115 1198815 1198814 1198812 11988121 1198816 1198818 1198819 11988113 1198811 11988117

11988112 11988119 11988120 11988118 11988111 11988114 1198817 11988110

QUEST 55 28 1198813 11988116 11988115 1198815 1198814 1198812 11988121 1198816 1198818 11988113 1198811 11988117 1198811211988119 11988118 11988111 1198817 11988110

analysis where 68786 samples represent landslides and 68786samples represent no landslidesThen Digital ElevationMap(DEM) is used to extract 21 topographic factors The DEMwith five-meter resolutions of Penang Island was obtainedfrom the Department of Survey and Mapping Malaysia Theextracted factors are acronyms as 1198811 (vegetation cover) 1198812

(distance from the fault line) 1198813 (slope angle) 1198814 (crosscurvature) 1198815 (slope aspect) 1198816 (distance from road) 1198817

(geology) 1198818 (diagonal length) 1198819 (longitude curvature)11988110 (rugosity) 11988111 (plan curvature) 11988112 (elevation) 11988113

(rain perception) 11988114 (soil texture) 11988115 (surface area) 11988116

(distance from drainage) 11988117 (roughness) 11988118 (land cover)11988119 (general curvature) 11988120 (tangent curvature) and 11988121

(profile curvature) In the previous studies which have beendone on Penang Island only 14 factors (1198811 to 11988114) were on thesubject of investigation for landslide [48] While the factors11988115 to 11988121 will be applied and investigated for the first timeon the study area Furthermore the 21 factors represent theavailable data of all factors that can cause the landslide in thestudy area The intelligent system target (landslides history)is represented by 0 for no landslide and 1 for landslide Thedata were normalized to range between 0 and 1 for each ofthe factors individually A 10-fold cross-validation analysiswas performed as an initial evaluation of the test error of thealgorithms Briefly this process involves splitting up the dataset into 10 random segments and using 9 of them for trainingand the 10th as a test set for the algorithm Classificationaccuracy of each model was calculated as follows

The accuracy of correctly classified landslide (1) is givenby

accuracy (1) =

10

sum

119894=1

number of correctly classified (1)

number of (1) (7)

The accuracy of correctly classified no landslide (0) isgiven by

accuracy (0) =

10

sum

119894=1

number of correctly classified (0)

number of (0) (8)

The overall accuracy for decision tree model is given by

overall accuracy =accuracy (1) + accuracy (0)

2 (9)

5 Discussion

Four tree algorithms CHAID Exhaustive CHAID CRT andQUEST were applied to map landslide susceptibility hazardThe 4 trees construction is based on the entire sample of137572 cases a cross-validation with 10 folds 005 adjustmentof the probabilities a minimum cases in parent node of 100a minimum cases in child node of 50 and equal misclassifi-cation costs The maximum number of levels is 3 for CHAIDand exhaustive CHAID and 5 for CRT and QUEST

The results for number of nodes number of terminalnodes and importance of independent variable produced byeach model are presented in Table 1 The classification treesobtained show a tree with a total of 317 nodes that consistof 254 terminal nodes using CHAID 377 nodes with 302terminal nodes using exhaustive CHAID 43 nodes with 22terminal nodes using CRT and 55 nodes with 28 terminalnodes using QUEST An example of decision tree usingCRT method is explained in Table 2 The tree has 43 nodesincluding the root node 20 internal nodes and 22 leaves(terminal nodes) Percentages in each category and in eachjoint category are presented in Table 2

Also the decision tree methods are used to analyzethe relationships between landslide susceptibility and relatedfactorsThenormalized importance of factors in classificationusing CRT is shown in Figure 2 The top-down induction ofthe decision tree indicates that variables in the higher order ofthe tree structure are more important for analyzing landslidesusceptibilityThe tree structure demonstrates that importantvariables related to high landslide susceptibility catchmentsare ordered as follows 1198813 (slope angle) 11988116 (distance fromdrainage) 11988115 (surface area) 1198815 (slope aspect) and 1198814 (crosscurvature)

The results for prediction accuracy produced by eachmodel are presented in Table 3 The results show highclassification accuracy for exhaustive CHAID algorithm ascompared to other algorithms It is found that the predictionaccuracy for exhaustive CHAID is 820 with sensitivity723 and specificity 917

6 Conclusion

This study has analyzed landslide susceptibility in PenangIsland Malaysia using ensemble learning with a decision-tree model We can conclude that the decision tree clearly

6 Journal of Applied Mathematics

Table 2 Tree table using CRT method

Node 0 1 Total Predictedcategory Parent node Primary independent variable

119873 Percent 119873 Percent 119873 Percent Variable Improvement Split values0 68786 500 68786 500 137572 1000 11 31018 958 1347 42 32365 235 0 0 11988115 0129 le0037252 37768 359 67439 641 105207 765 1 0 11988115 0129 gt0037253 29797 990 314 10 30111 219 0 1 11988116 0006 le0045804 1221 542 1033 458 2254 16 0 1 11988116 0006 gt0045805 15591 283 39565 717 55156 401 1 2 11988119 0010 le0021206 22177 443 27874 557 50051 364 1 2 11988119 0010 gt0021207 379 360 674 640 1053 08 1 4 11988117 0001 le0009588 842 701 359 299 1201 09 0 4 11988117 0001 gt0009589 6228 374 10421 626 16649 121 1 5 1198813 0003 le00129210 9363 243 29144 757 38507 280 1 5 1198813 0003 gt00129211 3727 755 1207 245 4934 36 0 6 11988116 0008 le00510412 18450 409 26667 591 45117 328 1 6 11988116 0008 gt00510413 237 280 608 720 845 06 1 7 11988112 0000 le01562514 142 683 66 317 208 02 0 7 11988112 0000 gt01562515 216 995 1 05 217 02 0 8 11988115 0000 le00237616 626 636 358 364 984 07 0 8 11988115 0000 gt00237617 1305 612 829 388 2134 16 0 9 11988117 0002 le00265918 4923 339 9592 661 14515 106 1 9 11988117 0002 gt00265919 3413 300 7980 700 11393 83 1 10 11988118 0001 le00587320 5950 219 21164 781 27114 197 1 10 11988118 0001 gt00587321 48 284 121 716 169 01 1 11 11988113 0001 le01022 3679 772 1086 228 4765 35 0 11 11988113 0001 gt01023 10966 476 12083 524 23049 168 1 12 1198813 0003 le00251024 7484 339 14584 661 22068 160 1 12 1198813 0003 gt00251025 68 986 1 14 69 01 0 14 11988112 0000 le02187526 74 532 65 468 139 01 0 14 11988112 0000 gt02187527 232 475 256 525 488 04 1 16 11988121 0000 le0437528 394 794 102 206 496 04 0 16 11988121 0000 gt0437529 758 792 199 208 957 07 0 17 11988115 0001 le00781230 547 465 630 535 1177 09 1 17 11988115 0001 gt00781231 3200 295 7633 705 10833 79 1 18 11988113 0001 le05032 1723 468 1959 532 3682 27 1 18 11988113 0001 gt05033 1828 221 6450 779 8278 60 1 19 11988117 0003 le01721434 1585 509 1530 491 3115 23 0 19 11988117 0003 gt01721435 5873 217 21164 783 27037 197 1 20 
11988118 0001 le04648836 77 1000 0 00 77 01 0 20 11988118 0001 gt04648837 2179 714 873 286 3052 22 0 22 1198811 0000 le01538538 1500 876 213 124 1713 12 0 22 1198811 0000 gt01538539 5193 592 3577 408 8770 64 0 23 11988118 0003 le00880940 5773 404 8506 596 14279 104 1 23 11988118 0003 gt00880941 7078 329 14405 671 21483 156 1 24 11988117 0001 le05218542 406 694 179 306 585 04 0 24 11988117 0001 gt052185




Journal of Applied Mathematics 5

Table 1: Number of nodes, terminal nodes, and order of importance of the variables.

| Decision tree model | No. of nodes | No. of terminal nodes | Independent variables included (order of importance) |
|---|---|---|---|
| CHAID | 317 | 254 | V3, V16, V15, V5, V4, V21, V17, V13, V1, V12, V19, V18, V10 |
| Exhaustive CHAID | 377 | 302 | V3, V16, V15, V5, V4, V21, V9, V13, V1, V17, V12, V19, V18, V10 |
| CRT | 43 | 22 | V3, V16, V15, V5, V4, V2, V21, V6, V8, V9, V13, V1, V17, V12, V19, V20, V18, V11, V14, V7, V10 |
| QUEST | 55 | 28 | V3, V16, V15, V5, V4, V2, V21, V6, V8, V13, V1, V17, V12, V19, V18, V11, V7, V10 |

analysis, where 68,786 samples represent landslides and 68,786 samples represent no landslides. Then a digital elevation model (DEM) is used to extract 21 topographic factors. The DEM of Penang Island, with a resolution of five meters, was obtained from the Department of Survey and Mapping, Malaysia. The extracted factors are abbreviated as V1 (vegetation cover), V2 (distance from the fault line), V3 (slope angle), V4 (cross curvature), V5 (slope aspect), V6 (distance from road), V7 (geology), V8 (diagonal length), V9 (longitude curvature), V10 (rugosity), V11 (plan curvature), V12 (elevation), V13 (rain perception), V14 (soil texture), V15 (surface area), V16 (distance from drainage), V17 (roughness), V18 (land cover), V19 (general curvature), V20 (tangent curvature), and V21 (profile curvature). Previous studies on Penang Island investigated only 14 factors (V1 to V14) for landslides [48], whereas factors V15 to V21 are applied and investigated for the first time in this study area. Furthermore, the 21 factors represent the available data on all factors that can cause landslides in the study area. The intelligent system target (landslide history) is represented by 0 for no landslide and 1 for landslide. The data were normalized to range between 0 and 1 for each factor individually. A 10-fold cross-validation analysis was performed as an initial evaluation of the test error of the algorithms. Briefly, this process involves splitting the data set into 10 random segments, using 9 of them for training and the 10th as a test set for the algorithm, and repeating this so that each segment serves once as the test set. The classification accuracy of each model was calculated as follows.

The accuracy of correctly classified landslide (1) is given by

$$\text{accuracy}(1) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (1)}}{\text{number of (1)}} \tag{7}$$

The accuracy of correctly classified no landslide (0) is given by

$$\text{accuracy}(0) = \sum_{i=1}^{10} \frac{\text{number of correctly classified (0)}}{\text{number of (0)}} \tag{8}$$

The overall accuracy for the decision tree model is given by

$$\text{overall accuracy} = \frac{\text{accuracy}(1) + \text{accuracy}(0)}{2} \tag{9}$$
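The accuracy measures in (7) to (9) can be sketched in a few lines of Python. This is only an illustration under one assumption that the text does not spell out: the sum over the 10 folds is read as pooling the correctly classified cases of all test folds and dividing by the total number of cases of that class. The function name `class_accuracies` and the toy labels are hypothetical, and only two small folds are used to keep the example short.

```python
from typing import Dict, List, Tuple

# One test fold: (actual labels, predicted labels), with 1 = landslide, 0 = no landslide.
Fold = Tuple[List[int], List[int]]


def class_accuracies(folds: List[Fold]) -> Tuple[float, float, float]:
    """Return accuracy(1), accuracy(0), and the overall accuracy of equation (9).

    Assumption: counts are pooled over all test folds before dividing.
    """
    correct: Dict[int, int] = {0: 0, 1: 0}
    total: Dict[int, int] = {0: 0, 1: 0}
    for actual, predicted in folds:
        for a, p in zip(actual, predicted):
            total[a] += 1
            if a == p:
                correct[a] += 1
    acc1 = correct[1] / total[1]
    acc0 = correct[0] / total[0]
    return acc1, acc0, (acc1 + acc0) / 2


# Made-up labels for two folds (the study used 10 folds and 137,572 cases).
toy_folds: List[Fold] = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 1, 1]),
]
acc1, acc0, overall = class_accuracies(toy_folds)
print(acc1, acc0, overall)  # 0.75 0.5 0.625
```

Because the two classes are balanced in the study (68,786 cases each), the mean of the two class accuracies in (9) coincides with the ordinary fraction of correctly classified cases.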

5. Discussion

Four tree algorithms, CHAID, Exhaustive CHAID, CRT, and QUEST, were applied to map landslide susceptibility hazard. The construction of the four trees is based on the entire sample of 137,572 cases, cross-validation with 10 folds, a 0.05 adjustment of the probabilities, a minimum of 100 cases in a parent node, a minimum of 50 cases in a child node, and equal misclassification costs. The maximum number of levels is 3 for CHAID and Exhaustive CHAID and 5 for CRT and QUEST.
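The growth limits listed above can be read as a simple stopping rule. The sketch below is a hypothetical illustration, not the software used in the study: the names `GrowthLimits` and `may_split` are invented here, and the rule assumes that a node is split only if it lies above the maximum level, holds at least the minimum number of parent cases, and every resulting child holds at least the minimum number of child cases.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GrowthLimits:
    """Tree growth limits reported in the text (names are illustrative)."""
    max_depth: int          # 3 for CHAID / Exhaustive CHAID, 5 for CRT / QUEST
    min_parent: int = 100   # minimum cases in a node that is to be split
    min_child: int = 50     # minimum cases in every child produced by a split


def may_split(limits: GrowthLimits, n_parent: int,
              child_sizes: Sequence[int], depth: int) -> bool:
    """Return True if a node at `depth` (root = 0) may be split into the given children."""
    if depth >= limits.max_depth:
        return False
    if n_parent < limits.min_parent:
        return False
    return all(n >= limits.min_child for n in child_sizes)


crt = GrowthLimits(max_depth=5)
chaid = GrowthLimits(max_depth=3)

# Root split of the CRT tree (sizes taken from Table 2).
print(may_split(crt, 137572, [32365, 105207], depth=0))  # True
# Node 14 of the CRT tree (208 cases, level 4) splits into 69 and 139 cases.
print(may_split(crt, 208, [69, 139], depth=4))            # True
# The same split would be too deep under the CHAID limit of 3 levels.
print(may_split(chaid, 208, [69, 139], depth=4))          # False
# A hypothetical node with fewer than 100 cases cannot be split at any level.
print(may_split(crt, 69, [35, 34], depth=2))              # False
```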

The number of nodes, the number of terminal nodes, and the importance of the independent variables produced by each model are presented in Table 1. The classification trees have 317 nodes with 254 terminal nodes using CHAID, 377 nodes with 302 terminal nodes using Exhaustive CHAID, 43 nodes with 22 terminal nodes using CRT, and 55 nodes with 28 terminal nodes using QUEST. An example of a decision tree built with the CRT method is given in Table 2. The tree has 43 nodes: the root node, 20 internal nodes, and 22 leaves (terminal nodes). Percentages in each category and in each joint category are presented in Table 2.
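The node counts quoted for the CRT tree can be checked from the "Parent node" column of Table 2. The following sketch is only a consistency check written for this purpose: the list `SPLIT_NODES` is transcribed from that column, and the helper name `summarize` is hypothetical.

```python
from typing import List, Tuple

# Nodes of the CRT tree that appear as a parent in Table 2; each has two children,
# and the children are numbered consecutively (1-2 under node 0, 3-4 under node 1, ...).
SPLIT_NODES: List[int] = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12,
                          14, 16, 17, 18, 19, 20, 22, 23, 24]

# PARENTS[k] is the parent of node k + 1.
PARENTS: List[int] = [p for p in SPLIT_NODES for _ in range(2)]


def summarize(parents: List[int]) -> Tuple[int, int, int, int]:
    """Return (total nodes, internal nodes excluding the root, leaves, deepest level)."""
    n_nodes = len(parents) + 1              # children plus the root
    with_children = set(parents)
    internal = len(with_children) - 1       # the root is counted separately
    leaves = n_nodes - len(with_children)
    depth = {0: 0}
    for node, parent in enumerate(parents, start=1):
        depth[node] = depth[parent] + 1     # parents always precede their children
    return n_nodes, internal, leaves, max(depth.values())


print(summarize(PARENTS))  # (43, 20, 22, 5)
```

The deepest level of 5 agrees with the maximum number of levels set for CRT.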

The decision tree methods are also used to analyze the relationships between landslide susceptibility and related factors. The normalized importance of the factors in classification using CRT is shown in Figure 2. The top-down induction of the decision tree indicates that variables higher in the tree structure are more important for analyzing landslide susceptibility. The tree structure demonstrates that the important variables related to high landslide susceptibility catchments are ordered as follows: V3 (slope angle), V16 (distance from drainage), V15 (surface area), V5 (slope aspect), and V4 (cross curvature).

The prediction accuracy produced by each model is presented in Table 3. The Exhaustive CHAID algorithm gives the highest overall classification accuracy of the four algorithms, although its margin over CHAID is small (82.0% against 81.9%). For Exhaustive CHAID, the prediction accuracy is 82.0%, with a sensitivity of 72.3% (class 0 in Table 3) and a specificity of 91.7% (class 1 in Table 3).

6. Conclusion

This study has analyzed landslide susceptibility on Penang Island, Malaysia, using ensemble learning with a decision-tree model. We can conclude that the decision tree clearly


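Table 2: Tree table using the CRT method. The columns Variable, Improvement, and Split values describe the primary independent variable.

| Node | N (0) | Percent (0) | N (1) | Percent (1) | N (total) | Percent (total) | Predicted category | Parent node | Variable | Improvement | Split values |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68786 | 50.0 | 68786 | 50.0 | 137572 | 100.0 | 1 | | | | |
| 1 | 31018 | 95.8 | 1347 | 4.2 | 32365 | 23.5 | 0 | 0 | V15 | 0.129 | ≤0.03725 |
| 2 | 37768 | 35.9 | 67439 | 64.1 | 105207 | 76.5 | 1 | 0 | V15 | 0.129 | >0.03725 |
| 3 | 29797 | 99.0 | 314 | 1.0 | 30111 | 21.9 | 0 | 1 | V16 | 0.006 | ≤0.04580 |
| 4 | 1221 | 54.2 | 1033 | 45.8 | 2254 | 1.6 | 0 | 1 | V16 | 0.006 | >0.04580 |
| 5 | 15591 | 28.3 | 39565 | 71.7 | 55156 | 40.1 | 1 | 2 | V19 | 0.010 | ≤0.02120 |
| 6 | 22177 | 44.3 | 27874 | 55.7 | 50051 | 36.4 | 1 | 2 | V19 | 0.010 | >0.02120 |
| 7 | 379 | 36.0 | 674 | 64.0 | 1053 | 0.8 | 1 | 4 | V17 | 0.001 | ≤0.00958 |
| 8 | 842 | 70.1 | 359 | 29.9 | 1201 | 0.9 | 0 | 4 | V17 | 0.001 | >0.00958 |
| 9 | 6228 | 37.4 | 10421 | 62.6 | 16649 | 12.1 | 1 | 5 | V3 | 0.003 | ≤0.01292 |
| 10 | 9363 | 24.3 | 29144 | 75.7 | 38507 | 28.0 | 1 | 5 | V3 | 0.003 | >0.01292 |
| 11 | 3727 | 75.5 | 1207 | 24.5 | 4934 | 3.6 | 0 | 6 | V16 | 0.008 | ≤0.05104 |
| 12 | 18450 | 40.9 | 26667 | 59.1 | 45117 | 32.8 | 1 | 6 | V16 | 0.008 | >0.05104 |
| 13 | 237 | 28.0 | 608 | 72.0 | 845 | 0.6 | 1 | 7 | V12 | 0.000 | ≤0.15625 |
| 14 | 142 | 68.3 | 66 | 31.7 | 208 | 0.2 | 0 | 7 | V12 | 0.000 | >0.15625 |
| 15 | 216 | 99.5 | 1 | 0.5 | 217 | 0.2 | 0 | 8 | V15 | 0.000 | ≤0.02376 |
| 16 | 626 | 63.6 | 358 | 36.4 | 984 | 0.7 | 0 | 8 | V15 | 0.000 | >0.02376 |
| 17 | 1305 | 61.2 | 829 | 38.8 | 2134 | 1.6 | 0 | 9 | V17 | 0.002 | ≤0.02659 |
| 18 | 4923 | 33.9 | 9592 | 66.1 | 14515 | 10.6 | 1 | 9 | V17 | 0.002 | >0.02659 |
| 19 | 3413 | 30.0 | 7980 | 70.0 | 11393 | 8.3 | 1 | 10 | V18 | 0.001 | ≤0.05873 |
| 20 | 5950 | 21.9 | 21164 | 78.1 | 27114 | 19.7 | 1 | 10 | V18 | 0.001 | >0.05873 |
| 21 | 48 | 28.4 | 121 | 71.6 | 169 | 0.1 | 1 | 11 | V13 | 0.001 | ≤0.10 |
| 22 | 3679 | 77.2 | 1086 | 22.8 | 4765 | 3.5 | 0 | 11 | V13 | 0.001 | >0.10 |
| 23 | 10966 | 47.6 | 12083 | 52.4 | 23049 | 16.8 | 1 | 12 | V3 | 0.003 | ≤0.02510 |
| 24 | 7484 | 33.9 | 14584 | 66.1 | 22068 | 16.0 | 1 | 12 | V3 | 0.003 | >0.02510 |
| 25 | 68 | 98.6 | 1 | 1.4 | 69 | 0.1 | 0 | 14 | V12 | 0.000 | ≤0.21875 |
| 26 | 74 | 53.2 | 65 | 46.8 | 139 | 0.1 | 0 | 14 | V12 | 0.000 | >0.21875 |
| 27 | 232 | 47.5 | 256 | 52.5 | 488 | 0.4 | 1 | 16 | V21 | 0.000 | ≤0.4375 |
| 28 | 394 | 79.4 | 102 | 20.6 | 496 | 0.4 | 0 | 16 | V21 | 0.000 | >0.4375 |
| 29 | 758 | 79.2 | 199 | 20.8 | 957 | 0.7 | 0 | 17 | V15 | 0.001 | ≤0.07812 |
| 30 | 547 | 46.5 | 630 | 53.5 | 1177 | 0.9 | 1 | 17 | V15 | 0.001 | >0.07812 |
| 31 | 3200 | 29.5 | 7633 | 70.5 | 10833 | 7.9 | 1 | 18 | V13 | 0.001 | ≤0.50 |
| 32 | 1723 | 46.8 | 1959 | 53.2 | 3682 | 2.7 | 1 | 18 | V13 | 0.001 | >0.50 |
| 33 | 1828 | 22.1 | 6450 | 77.9 | 8278 | 6.0 | 1 | 19 | V17 | 0.003 | ≤0.17214 |
| 34 | 1585 | 50.9 | 1530 | 49.1 | 3115 | 2.3 | 0 | 19 | V17 | 0.003 | >0.17214 |
| 35 | 5873 | 21.7 | 21164 | 78.3 | 27037 | 19.7 | 1 | 20 | V18 | 0.001 | ≤0.46488 |
| 36 | 77 | 100.0 | 0 | 0.0 | 77 | 0.1 | 0 | 20 | V18 | 0.001 | >0.46488 |
| 37 | 2179 | 71.4 | 873 | 28.6 | 3052 | 2.2 | 0 | 22 | V1 | 0.000 | ≤0.15385 |
| 38 | 1500 | 87.6 | 213 | 12.4 | 1713 | 1.2 | 0 | 22 | V1 | 0.000 | >0.15385 |
| 39 | 5193 | 59.2 | 3577 | 40.8 | 8770 | 6.4 | 0 | 23 | V18 | 0.003 | ≤0.08809 |
| 40 | 5773 | 40.4 | 8506 | 59.6 | 14279 | 10.4 | 1 | 23 | V18 | 0.003 | >0.08809 |
| 41 | 7078 | 32.9 | 14405 | 67.1 | 21483 | 15.6 | 1 | 24 | V17 | 0.001 | ≤0.52185 |
| 42 | 406 | 69.4 | 179 | 30.6 | 585 | 0.4 | 0 | 24 | V17 | 0.001 | >0.52185 |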
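The Improvement column of Table 2 can be reproduced from the class counts under an assumption that the text does not state: that CRT measures a split by the decrease in Gini impurity, with each node weighted by its share of all 137,572 cases. The sketch below uses that assumption; the function names `gini` and `improvement` are hypothetical, and the counts are taken from Table 2.

```python
from typing import Sequence, Tuple

Counts = Tuple[int, int]  # (cases of class 0, cases of class 1) in a node


def gini(n0: int, n1: int) -> float:
    """Gini impurity of a two-class node: 1 - p0^2 - p1^2 = 2 * p0 * p1."""
    p0 = n0 / (n0 + n1)
    return 2.0 * p0 * (1.0 - p0)


def improvement(parent: Counts, children: Sequence[Counts], n_total: int) -> float:
    """Decrease in impurity of a split, each node weighted by its share of all cases."""
    def weighted(counts: Counts) -> float:
        return sum(counts) / n_total * gini(*counts)

    return weighted(parent) - sum(weighted(c) for c in children)


N_TOTAL = 137572

# Root split on V15 (nodes 1 and 2) and the split of node 2 on V19 (nodes 5 and 6).
root_gain = improvement((68786, 68786), [(31018, 1347), (37768, 67439)], N_TOTAL)
node2_gain = improvement((37768, 67439), [(15591, 39565), (22177, 27874)], N_TOTAL)

print(f"{root_gain:.3f} {node2_gain:.3f}")  # 0.129 0.010
```

Both values agree with the tabulated improvements of 0.129 and 0.010, which supports the Gini reading without confirming it.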


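Table 3: Classification accuracy produced by each model.

| Decision tree model | Classification: Predicted (0) (%) | Classification: Predicted (1) (%) | Overall (%) |
|---|---|---|---|
| CHAID | 73.5 | 90.3 | 81.9 |
| Exhaustive CHAID | 72.3 | 91.7 | 82.0 |
| CRT | 61.4 | 89.7 | 75.6 |
| QUEST | 54.4 | 93.5 | 74.0 |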
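As a quick check of Table 3 against equation (9), the overall column can be recomputed as the mean of the two class accuracies. The snippet below is a hedged illustration with hypothetical names (`TABLE_3`, `overall_accuracy`); the values are those of Table 3, read as percentages. It also reports the gap between the two class accuracies, which shows how strongly each model favors the landslide class.

```python
from typing import Dict, Tuple

# model: (accuracy on class 0, accuracy on class 1, reported overall accuracy), in percent
TABLE_3: Dict[str, Tuple[float, float, float]] = {
    "CHAID": (73.5, 90.3, 81.9),
    "Exhaustive CHAID": (72.3, 91.7, 82.0),
    "CRT": (61.4, 89.7, 75.6),
    "QUEST": (54.4, 93.5, 74.0),
}


def overall_accuracy(acc0: float, acc1: float) -> float:
    """Equation (9): mean of the two class accuracies."""
    return (acc0 + acc1) / 2


recomputed = {m: overall_accuracy(a0, a1) for m, (a0, a1, _) in TABLE_3.items()}
class_gap = {m: a1 - a0 for m, (a0, a1, _) in TABLE_3.items()}
best_model = max(recomputed, key=recomputed.get)

for model, value in recomputed.items():
    print(f"{model}: overall {value:.2f}, class gap {class_gap[model]:.1f}")
print("best:", best_model)
```

The recomputed means match the reported overall values to within rounding. QUEST has the highest accuracy on class 1 but the lowest on class 0, which is why its overall accuracy is the lowest of the four.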

Figure 2: Normalized importance of factors using the CRT method. [Bar chart; vertical axis: independent variable, listed as V10, V7, V14, V11, V18, V20, V19, V12, V17, V1, V13, V9, V8, V6, V21, V2, V4, V5, V15, V16, V3; horizontal axes: normalized importance (0 to 100) and importance (0.00 to 0.15).]

indicates the order of important variables and quantitatively describes the relationships among the occurrence of landslides, topography, and geology. The decision-tree model using the Exhaustive CHAID algorithm showed greater accuracy than the other models, demonstrating the usefulness of the decision tree model for landslide hazard mapping. Accuracies were 82.0% for Exhaustive CHAID, 81.9% for CHAID, 75.6% for CRT, and 74.0% for QUEST. In this study, we determined factors that may be involved in landslide susceptibility, and the results can be used for landslide hazard mapping in other regions. Moreover, the landslide hazard map can be used to help mitigate hazards to people and facilities and as basic data for developing plans to prevent landslide hazards, such as in locating monitoring and facility sites. Further case studies and modeling are needed to better generalize the factors involved in landslide susceptibility.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] P. Aleotti and R. Chowdhury, "Landslide hazard assessment: summary review and new perspectives," Bulletin of Engineering Geology and the Environment, vol. 58, no. 1, pp. 21–44, 1999.

[2] H. Saito, D. Nakayama, and H. Matsuyama, "Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan," Geomorphology, vol. 109, no. 3-4, pp. 108–121, 2009.

[3] F. Guzzetti, A. Carrara, M. Cardinali, and P. Reichenbach, "Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy," Geomorphology, vol. 31, no. 1–4, pp. 181–216, 1999.

[4] F. Guzzetti, P. Reichenbach, F. Ardizzone, M. Cardinali, and M. Galli, "Estimating the quality of landslide susceptibility models," Geomorphology, vol. 81, no. 1-2, pp. 166–184, 2006.

[5] F. Guzzetti, P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone, "Probabilistic landslide hazard assessment at the basin scale," Geomorphology, vol. 72, no. 1–4, pp. 272–299, 2005.

[6] E. Yesilnacar and T. Topal, "Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey)," Engineering Geology, vol. 79, no. 3-4, pp. 251–266, 2005.

[7] D. P. Kanungo, M. K. Arora, S. Sarkar, and R. P. Gupta, "A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas," Engineering Geology, vol. 85, no. 3-4, pp. 347–366, 2006.

[8] S. He, P. Pan, L. Dai, H. Wang, and J. Liu, "Application of kernel-based Fisher discriminant analysis to map landslide susceptibility in the Qinggan River delta, Three Gorges, China," Geomorphology, vol. 171-172, pp. 30–41, 2012.

[9] A. Carrara, "Multivariate models for landslide hazard evaluation," Journal of the International Association for Mathematical Geology, vol. 15, no. 3, pp. 403–426, 1983.

[10] L. Ayalew and H. Yamagishi, "Slope failures in the Blue Nile basin, as seen from landscape evolution perspective," Geomorphology, vol. 57, no. 1-2, pp. 95–116, 2004.

[11] G. Metternicht, L. Hurni, and R. Gogu, "Remote sensing of landslides: an analysis of the potential contribution to geo-spatial systems for hazard assessment in mountainous environments," Remote Sensing of Environment, vol. 98, no. 2-3, pp. 284–303, 2005.

[12] D. E. Alexander, "A brief survey of GIS in mass-movement studies, with reflections on theory and methods," Geomorphology, vol. 94, no. 3-4, pp. 261–267, 2008.

[13] J. Remondo, J. Bonachea, and A. Cendrero, "Quantitative landslide risk assessment and mapping on the basis of recent occurrences," Geomorphology, vol. 94, no. 3-4, pp. 496–507, 2008.


[14] L. Luzi, F. Pergalani, and M. T. J. Terlien, "Slope vulnerability to earthquakes at subregional scale, using probabilistic techniques and geographic information systems," Engineering Geology, vol. 58, no. 3-4, pp. 313–336, 2000.

[15] S. Lee and K. Min, "Statistical analysis of landslide susceptibility at Yongin, Korea," Environmental Geology, vol. 40, no. 9, pp. 1095–1113, 2001.

[16] L. Donati and M. C. Turrini, "An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy)," Engineering Geology, vol. 63, no. 3-4, pp. 277–289, 2002.

[17] S. Lee and U. Choi, "Development of GIS-based geological hazard information system and its application for landslide analysis in Korea," Geosciences Journal, vol. 7, no. 3, pp. 243–252, 2003.

[18] B. Neuhauser and B. Terhorst, "Landslide susceptibility assessment using 'weights-of-evidence' applied to a study area at the Jurassic escarpment (SW-Germany)," Geomorphology, vol. 86, no. 1-2, pp. 12–24, 2007.

[19] P. M. Atkinson and R. Massari, "Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy," Computers and Geosciences, vol. 24, no. 4, pp. 373–385, 1998.

[20] F. C. Dai, C. F. Lee, J. Li, and Z. W. Xu, "Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong," Environmental Geology, vol. 40, no. 3, pp. 381–391, 2001.

[21] H. A. Nefeslioglu, T. Y. Duman, and S. Durmaz, "Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey)," Geomorphology, vol. 94, no. 3-4, pp. 401–418, 2008.

[22] L. Ermini, F. Catani, and N. Casagli, "Artificial Neural Networks applied to landslide susceptibility assessment," Geomorphology, vol. 66, no. 1–4, pp. 327–343, 2005.

[23] S. Lee and I. Park, "Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines," Journal of Environmental Management, vol. 127, pp. 166–176, 2013.

[24] H. Gomez and T. Kavzoglu, "Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela," Engineering Geology, vol. 78, no. 1-2, pp. 11–27, 2005.

[25] C. Melchiorre, M. Matteucci, A. Azzoni, and A. Zanchi, "Artificial neural networks and cluster analysis in landslide susceptibility zonation," Geomorphology, vol. 94, no. 3-4, pp. 379–400, 2008.

[26] Y.-K. Yeon, J.-G. Han, and K. H. Ryu, "Landslide susceptibility mapping in Injae, Korea, using a decision tree," Engineering Geology, vol. 116, no. 3-4, pp. 274–283, 2010.

[27] R. Bou Kheir, J. Chorowicz, C. Abdallah, and D. Dhont, "Soil and bedrock distribution estimated from gully form and frequency: A GIS-based decision-tree model for Lebanon," Geomorphology, vol. 93, no. 3-4, pp. 482–492, 2008.

[28] N. J. Schneevoigt, S. van der Linden, H.-P. Thamm, and L. Schrott, "Detecting Alpine landforms from remotely sensed imagery. A pilot study in the Bavarian Alps," Geomorphology, vol. 93, no. 1-2, pp. 104–119, 2008.

[29] C.-S. Huang, Y.-J. Lin, and C.-C. Lin, "Implementation of classifiers for choosing insurance policy using decision trees: A case study," WSEAS Transactions on Computers, vol. 7, no. 10, pp. 1679–1689, 2008.

[30] M. Pal and P. M. Mather, "An assessment of the effectiveness of decision tree methods for land cover classification," Remote Sensing of Environment, vol. 86, no. 4, pp. 554–565, 2003.

[31] D. T. Bui, B. Pradhan, O. Lofman, and I. Revhaug, "Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naïve Bayes models," Mathematical Problems in Engineering, vol. 2012, Article ID 974638, 26 pages, 2012.

[32] P. K. Pang, L. T. Tien, and H. Lateh, "Landslide hazard mapping of Penang island using decision tree model," in Proceedings of the International Conference on Systems and Electronic Engineering (ICSEE '12), Phuket, Thailand, December 2012.

[33] B. Pradhan, "A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS," Computers & Geosciences, vol. 51, pp. 350–365, 2013.

[34] M. Ture, F. Tokatli, and I. Kurt, "Using Kaplan-Meier analysis together with decision tree methods (C&RT, CHAID, QUEST, C4.5 and ID3) in determining recurrence-free survival of breast cancer patients," Expert Systems with Applications, vol. 36, no. 2, pp. 2017–2026, 2009.

[35] C. E. Brodley and M. A. Friedl, "Decision tree classification of land cover from remotely sensed data," Remote Sensing of Environment, vol. 61, no. 3, pp. 399–409, 1997.

[36] M. Xu, P. Watanachaturaporn, P. K. Varshney, and M. K. Arora, "Decision tree regression for soft classification of remote sensing data," Remote Sensing of Environment, vol. 97, no. 3, pp. 322–336, 2005.

[37] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Elsevier, Amsterdam, The Netherlands, 2nd edition, 2005.

[38] R. J. Lewis, "An introduction to Classification and Regression Tree (CART) analysis," in Proceedings of the Annual Meeting of the Society for Academic Emergency Medicine, San Francisco, Calif, USA, 2000.

[39] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, Calif, USA, 1984.

[40] J. A. Michael and S. L. Gordon, Data Mining Technique: For Marketing, Sales and Customer Support, Wiley, New York, NY, USA, 1997.

[41] D. B. V. Biggs and E. Suen, "A method of choosing multiway partitions for classification and decision trees," Journal of Applied Statistics, vol. 18, pp. 49–62, 1991.

[42] L. A. Goodman, "Simple models for the analysis of association in cross-classifications having ordered categories," Journal of the American Statistical Association, vol. 74, no. 367, pp. 537–552, 1979.

[43] G. Kass, "An exploratory technique for investigating large quantities of categorical data," Applied Statistics, vol. 29, no. 2, pp. 119–127, 1980.

[44] T. Hill and P. Lewicki, Statistics: Methods and Applications: A Comprehensive Reference for Science, Industry, and Data Mining, StatSoft, USA, 2006.

[45] W.-Y. Loh and Y.-S. Shih, "Split selection methods for classification trees," Statistica Sinica, vol. 7, no. 4, pp. 815–840, 1997.

[46] H.-J. Oh and B. Pradhan, "Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area," Computers and Geosciences, vol. 37, no. 9, pp. 1264–1276, 2011.


[47] K. Lim Khai-Wern, T. Lea Tien, and H. Lateh, "Landslide hazard mapping of Penang island using probabilistic methods and logistic regression," in Proceedings of the IEEE International Conference on Imaging Systems and Techniques (IST '11), pp. 273–278, May 2011.

[48] M. S. Alkhasawneh and U. K. Ngah, "Landslide susceptibility hazard mapping techniques review," Journal of Applied Sciences, vol. 12, pp. 802–808, 2012.


Page 6: Research Article Modeling and Testing Landslide …downloads.hindawi.com/journals/jam/2014/929768.pdfe re are three tree types of decision tree: CRT, CHAID and Exhaustive CHAID, and

6 Journal of Applied Mathematics

Table 2: Tree table using CRT method.

| Node | 0: N | 0: Percent | 1: N | 1: Percent | Total: N | Total: Percent | Predicted category | Parent node | Primary independent variable | Improvement | Split values |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68786 | 50.0 | 68786 | 50.0 | 137572 | 100.0 | 1 | | | | |
| 1 | 31018 | 95.8 | 1347 | 4.2 | 32365 | 23.5 | 0 | 0 | V15 | 0.129 | ≤0.03725 |
| 2 | 37768 | 35.9 | 67439 | 64.1 | 105207 | 76.5 | 1 | 0 | V15 | 0.129 | >0.03725 |
| 3 | 29797 | 99.0 | 314 | 1.0 | 30111 | 21.9 | 0 | 1 | V16 | 0.006 | ≤0.04580 |
| 4 | 1221 | 54.2 | 1033 | 45.8 | 2254 | 1.6 | 0 | 1 | V16 | 0.006 | >0.04580 |
| 5 | 15591 | 28.3 | 39565 | 71.7 | 55156 | 40.1 | 1 | 2 | V19 | 0.010 | ≤0.02120 |
| 6 | 22177 | 44.3 | 27874 | 55.7 | 50051 | 36.4 | 1 | 2 | V19 | 0.010 | >0.02120 |
| 7 | 379 | 36.0 | 674 | 64.0 | 1053 | 0.8 | 1 | 4 | V17 | 0.001 | ≤0.00958 |
| 8 | 842 | 70.1 | 359 | 29.9 | 1201 | 0.9 | 0 | 4 | V17 | 0.001 | >0.00958 |
| 9 | 6228 | 37.4 | 10421 | 62.6 | 16649 | 12.1 | 1 | 5 | V3 | 0.003 | ≤0.01292 |
| 10 | 9363 | 24.3 | 29144 | 75.7 | 38507 | 28.0 | 1 | 5 | V3 | 0.003 | >0.01292 |
| 11 | 3727 | 75.5 | 1207 | 24.5 | 4934 | 3.6 | 0 | 6 | V16 | 0.008 | ≤0.05104 |
| 12 | 18450 | 40.9 | 26667 | 59.1 | 45117 | 32.8 | 1 | 6 | V16 | 0.008 | >0.05104 |
| 13 | 237 | 28.0 | 608 | 72.0 | 845 | 0.6 | 1 | 7 | V12 | 0.000 | ≤0.15625 |
| 14 | 142 | 68.3 | 66 | 31.7 | 208 | 0.2 | 0 | 7 | V12 | 0.000 | >0.15625 |
| 15 | 216 | 99.5 | 1 | 0.5 | 217 | 0.2 | 0 | 8 | V15 | 0.000 | ≤0.02376 |
| 16 | 626 | 63.6 | 358 | 36.4 | 984 | 0.7 | 0 | 8 | V15 | 0.000 | >0.02376 |
| 17 | 1305 | 61.2 | 829 | 38.8 | 2134 | 1.6 | 0 | 9 | V17 | 0.002 | ≤0.02659 |
| 18 | 4923 | 33.9 | 9592 | 66.1 | 14515 | 10.6 | 1 | 9 | V17 | 0.002 | >0.02659 |
| 19 | 3413 | 30.0 | 7980 | 70.0 | 11393 | 8.3 | 1 | 10 | V18 | 0.001 | ≤0.05873 |
| 20 | 5950 | 21.9 | 21164 | 78.1 | 27114 | 19.7 | 1 | 10 | V18 | 0.001 | >0.05873 |
| 21 | 48 | 28.4 | 121 | 71.6 | 169 | 0.1 | 1 | 11 | V13 | 0.001 | ≤0.10 |
| 22 | 3679 | 77.2 | 1086 | 22.8 | 4765 | 3.5 | 0 | 11 | V13 | 0.001 | >0.10 |
| 23 | 10966 | 47.6 | 12083 | 52.4 | 23049 | 16.8 | 1 | 12 | V3 | 0.003 | ≤0.02510 |
| 24 | 7484 | 33.9 | 14584 | 66.1 | 22068 | 16.0 | 1 | 12 | V3 | 0.003 | >0.02510 |
| 25 | 68 | 98.6 | 1 | 1.4 | 69 | 0.1 | 0 | 14 | V12 | 0.000 | ≤0.21875 |
| 26 | 74 | 53.2 | 65 | 46.8 | 139 | 0.1 | 0 | 14 | V12 | 0.000 | >0.21875 |
| 27 | 232 | 47.5 | 256 | 52.5 | 488 | 0.4 | 1 | 16 | V21 | 0.000 | ≤0.4375 |
| 28 | 394 | 79.4 | 102 | 20.6 | 496 | 0.4 | 0 | 16 | V21 | 0.000 | >0.4375 |
| 29 | 758 | 79.2 | 199 | 20.8 | 957 | 0.7 | 0 | 17 | V15 | 0.001 | ≤0.07812 |
| 30 | 547 | 46.5 | 630 | 53.5 | 1177 | 0.9 | 1 | 17 | V15 | 0.001 | >0.07812 |
| 31 | 3200 | 29.5 | 7633 | 70.5 | 10833 | 7.9 | 1 | 18 | V13 | 0.001 | ≤0.50 |
| 32 | 1723 | 46.8 | 1959 | 53.2 | 3682 | 2.7 | 1 | 18 | V13 | 0.001 | >0.50 |
| 33 | 1828 | 22.1 | 6450 | 77.9 | 8278 | 6.0 | 1 | 19 | V17 | 0.003 | ≤0.17214 |
| 34 | 1585 | 50.9 | 1530 | 49.1 | 3115 | 2.3 | 0 | 19 | V17 | 0.003 | >0.17214 |
| 35 | 5873 | 21.7 | 21164 | 78.3 | 27037 | 19.7 | 1 | 20 | V18 | 0.001 | ≤0.46488 |
| 36 | 77 | 100.0 | 0 | 0.0 | 77 | 0.1 | 0 | 20 | V18 | 0.001 | >0.46488 |
| 37 | 2179 | 71.4 | 873 | 28.6 | 3052 | 2.2 | 0 | 22 | V1 | 0.000 | ≤0.15385 |
| 38 | 1500 | 87.6 | 213 | 12.4 | 1713 | 1.2 | 0 | 22 | V1 | 0.000 | >0.15385 |
| 39 | 5193 | 59.2 | 3577 | 40.8 | 8770 | 6.4 | 0 | 23 | V18 | 0.003 | ≤0.08809 |
| 40 | 5773 | 40.4 | 8506 | 59.6 | 14279 | 10.4 | 1 | 23 | V18 | 0.003 | >0.08809 |
| 41 | 7078 | 32.9 | 14405 | 67.1 | 21483 | 15.6 | 1 | 24 | V17 | 0.001 | ≤0.52185 |
| 42 | 406 | 69.4 | 179 | 30.6 | 585 | 0.4 | 0 | 24 | V17 | 0.001 | >0.52185 |
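The Improvement column of Table 2 can be checked against the node counts. The sketch below assumes, since the table itself does not say so, that CRT scores a split by the decrease in Gini impurity, with every node weighted by its share of the 137572 samples in the root node. The helper names `gini` and `improvement` are illustrative and do not come from the paper. Under that assumption, the counts for the root split (nodes 1 and 2) and for the split of node 1 (nodes 3 and 4) give values that round to the tabulated improvements, read here as 0.129 and 0.006.

```python
def gini(n0, n1):
    """Gini impurity of a two-class node with n0 and n1 samples."""
    p0 = n0 / (n0 + n1)
    return 2.0 * p0 * (1.0 - p0)


def improvement(parent, left, right, n_root):
    """Decrease in impurity for one split, each node weighted by its share of n_root.

    parent, left, right are (class 0 count, class 1 count) tuples.
    """
    def weighted(counts):
        return (sum(counts) / n_root) * gini(*counts)

    return weighted(parent) - weighted(left) - weighted(right)


N_ROOT = 137572  # total samples in node 0 of Table 2

# Root split on V15: node 0 -> nodes 1 and 2.
root_split = improvement((68786, 68786), (31018, 1347), (37768, 67439), N_ROOT)

# Split of node 1 on V16: node 1 -> nodes 3 and 4.
node1_split = improvement((31018, 1347), (29797, 314), (1221, 1033), N_ROOT)

print(round(root_split, 3), round(node1_split, 3))
```

The same check can be repeated for any parent node and its two children in the table.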


Table 3: Classification accuracy (%) produced by each model.

| Decision tree model | Predicted (0) | Predicted (1) | Overall |
|---|---|---|---|
| CHAID | 73.5 | 90.3 | 81.9 |
| Exhaustive CHAID | 72.3 | 91.7 | 82.0 |
| CRT | 61.4 | 89.7 | 75.6 |
| QUEST | 54.4 | 93.5 | 74.0 |
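The Overall column of Table 3 is consistent with a class-weighted mean of the two per-class accuracies. The following sketch assumes that the entries are percentages and that both classes contain 68786 samples, as in the root node of Table 2. Whether Table 3 was computed on exactly that sample is an assumption, and `overall_accuracy` is a hypothetical helper, not something defined in the paper.

```python
# Per-class accuracies in percent from Table 3:
# (class 0, class 1, reported overall).
TABLE3 = {
    "CHAID": (73.5, 90.3, 81.9),
    "Exhaustive CHAID": (72.3, 91.7, 82.0),
    "CRT": (61.4, 89.7, 75.6),
    "QUEST": (54.4, 93.5, 74.0),
}

# Assumed class sizes, taken from the root node of Table 2.
N_CLASS0 = 68786
N_CLASS1 = 68786


def overall_accuracy(acc0, acc1, n0=N_CLASS0, n1=N_CLASS1):
    """Overall accuracy as the sample-weighted mean of per-class accuracies."""
    return (acc0 * n0 + acc1 * n1) / (n0 + n1)


for model, (acc0, acc1, reported) in TABLE3.items():
    computed = overall_accuracy(acc0, acc1)
    print(f"{model}: computed {computed:.2f}, reported {reported:.1f}")
```

With equal class sizes the weighted mean reduces to the plain average of the two columns, so each computed value should differ from the reported one only by rounding.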

Figure 2: Normalized importance of factors using CRT method. [Bar chart. Vertical axis: independent variable, listed in the order V10, V7, V14, V11, V18, V20, V19, V12, V17, V1, V13, V9, V8, V6, V21, V2, V4, V5, V15, V16, V3. Horizontal axes: normalized importance (0 to 100) and importance (0.00 to 0.15).]

indicates the order of important variables and quantitatively describes the relationships among the occurrence of landslides, topography, and geology. The decision tree model using the Exhaustive CHAID algorithm showed the highest accuracy of the four models, demonstrating the usefulness of the decision tree model for landslide hazard mapping. Accuracies were 82.0% for Exhaustive CHAID, 81.9% for CHAID, 75.6% for CRT, and 74.0% for the QUEST algorithm. In this study, we determined factors that may be involved in landslide susceptibility, and the results can be used for landslide hazard mapping in other regions. Moreover, landslide hazard maps can be used to help mitigate hazards to people and facilities and as basic data for developing plans to prevent landslide hazards, such as in locating monitoring and facility sites. Further case studies and modeling are needed to better generalize the factors involved in landslide susceptibility.
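As an illustration of how a tree table such as Table 2 turns into mapping rules, the sketch below walks only the first two levels of the CRT tree (nodes 0 to 6). It is a simplified stand-in, not the authors' implementation: the split thresholds are read from Table 2 on the assumption that the leading zero marks the decimal point (for example 0.03725), nodes 3 to 6 are treated as terminal even though several of them split further, and the names `SPLITS`, `PREDICTED`, and `classify` are hypothetical.

```python
# node -> (variable, threshold, child if value <= threshold, child otherwise)
SPLITS = {
    0: ("V15", 0.03725, 1, 2),
    1: ("V16", 0.04580, 3, 4),
    2: ("V19", 0.02120, 5, 6),
}

# Predicted category at the nodes where this truncated tree stops (Table 2).
PREDICTED = {3: 0, 4: 0, 5: 1, 6: 1}


def classify(cell):
    """Return (node id, predicted category) for a dict of factor values."""
    node = 0
    while node in SPLITS:
        variable, threshold, low_child, high_child = SPLITS[node]
        node = low_child if cell[variable] <= threshold else high_child
    return node, PREDICTED[node]


# Two made-up cells, used only to exercise both branches of the root split.
print(classify({"V15": 0.01, "V16": 0.10, "V19": 0.00}))
print(classify({"V15": 0.05, "V16": 0.00, "V19": 0.01}))
```

Applying such a function to every grid cell of a study area would yield a predicted-category map; extending `SPLITS` with the remaining rows of Table 2 would reproduce the full tree.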

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.



[36] M Xu P Watanachaturaporn P K Varshney and M KArora ldquoDecision tree regression for soft classification of remotesensing datardquo Remote Sensing of Environment vol 97 no 3 pp322ndash336 2005

[37] I H Witten and E Frank Data MiningmdashPractical MachineLearning Tools and Techniques Elsevier Amsterdam TheNetherlands 2nd edition 2005

[38] R J Lewis ldquoAn introduction to Classification and RegressionTree (CART) analysisrdquo in Proceedings of the Annual Meetingof the Society for Academic Emergenct Medicine San FranciscoCalif USA 2000

[39] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees Wadsworth and BrooksMontery Calif USA 1984

[40] J A Michael and S L Gordon Data Mining Technique ForMarketing Sales and Customer Support Wiley New York NYUSA 1997

[41] D B V Biggs and E Suen ldquoA method of choosing multiwaypartitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 pp 49ndash62 1991

[42] L A Goodman ldquoSimple models for the analysis of associationin cross-classifications having ordered categoriesrdquo Journal of theAmerican Statistical Association vol 74 no 367 pp 537ndash5521979

[43] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[44] T Hill and P Lewicki Statistics Methods and Applications AComprehensive Reference for Science Industry andDataMiningStata Soft USA 2006

[45] W-Y Loh and Y-S Shih ldquoSplit selectionmethods for classifica-tion treesrdquo Statistica Sinica vol 7 no 4 pp 815ndash840 1997

[46] H-J Oh and B Pradhan ldquoApplication of a neuro-fuzzy modelto landslide-susceptibility mapping for shallow landslides in atropical hilly areardquoComputers and Geosciences vol 37 no 9 pp1264ndash1276 2011

Journal of Applied Mathematics 9

[47] K LimKhai-Wern T Lea Tien andH Lateh ldquoLandslide hazardmapping of Penang island using probabilistic methods andlogistic regressionrdquo in Proceedings of the IEEE InternationalConference on Imaging Systems andTechniques (IST rsquo11) pp 273ndash278 May 2011

[48] M S Alklhasawneh and U K Ngah ldquoLandslide susceptibilityhazard mapping techniques reviewrdquo Journal of Applied Sciencesvol 12 pp 802ndash808 2012



