Knowledge discovery from soil samples using the partial dependence of random forest under fuzzy logic. Canying Zeng, A-Xing Zhu, Lin Yang. Nanjing Normal University
Transcript
Page 1: Knowledge discovery from soil samples using the partial ...digitalsoilmapping.org/fileadmin/digitalsoil... · Knowledge discovery from soil samples using the partial dependence of

Knowledge discovery from soil samples using the partial dependence of random forest under fuzzy logic

Canying Zeng, A-Xing Zhu, Lin Yang

Nanjing Normal University

Page 2

Outline

1 Introduction
2 Methodology
3 Case study
4 Conclusions

Page 3

Introduction ◆ The soil-landscape relationship

Quantitative knowledge of the soil-landscape relationship, expressed as rules or membership functions, is important for understanding soils, digital soil mapping, and land resource management.

[Figure: two forms of soil-landscape knowledge. Rules: a decision tree with a DEM node (>100, <100), two slope nodes (<5, >5 and >10, <10), and leaves A, B, C, D. Membership functions: membership (0, 0.5, 1.0) plotted against an environmental variable.]

Page 4

Introduction ◆ Fuzzy membership function

Three basic forms of fuzzy membership functions (Zhu, 1997)

[Figure, panels (a)-(c): membership (0, 0.5, 1.0) plotted against an environmental variable for each form. Bell-shaped: lower and upper crossover at 0.5, parameters xminl, xmaxl, xmaxr, xminr. S-shaped: lower crossover at 0.5, parameters xmin, xmax. Z-shaped: upper crossover at 0.5, parameters xmax, xmin.]

Membership functions are an effective tool for expressing such knowledge of soil-environment relationships (Zhu, 1997, 2001).
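The three forms can be sketched in code. The slides name the parameters but do not give the equations, so the piecewise-linear ramps below are an assumption chosen for simplicity and are not necessarily the curves of Zhu (1997). The function names are hypothetical, and the sketch assumes that the xmax parameters mark where membership reaches 1.0 and the xmin parameters where it reaches 0.

```python
import numpy as np

def s_shaped(x, xmin, xmax):
    """Membership rises from 0 at xmin to 1 at xmax (assumed linear ramp)."""
    x = np.asarray(x, dtype=float)
    return np.clip((x - xmin) / (xmax - xmin), 0.0, 1.0)

def z_shaped(x, xmax, xmin):
    """Membership falls from 1 at xmax to 0 at xmin, with xmax < xmin."""
    return 1.0 - s_shaped(x, xmax, xmin)

def bell_shaped(x, xminl, xmaxl, xmaxr, xminr):
    """Full membership between xmaxl and xmaxr, zero outside [xminl, xminr]."""
    return np.minimum(s_shaped(x, xminl, xmaxl), z_shaped(x, xmaxr, xminr))

# Illustrative values only; with linear ramps the crossover (0.5) falls
# halfway between each xmin/xmax pair.
values = np.array([0.0, 2.5, 5.0, 7.0, 9.0, 12.0])
print(bell_shaped(values, xminl=0, xmaxl=5, xmaxr=7, xminr=11))
```

A smoother curve (for example a Gaussian-like falloff) could replace the ramp without changing the role of the parameters.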

Page 5

Introduction ◆ Fuzzy membership function

Soil samples imply knowledge of the relationships between soils and their underlying environmental conditions.

Random forest: a black-box method, widely used in digital soil mapping (DSM). It does not produce explicit knowledge of soil-environment relationships, but it can produce a partial dependence (Pd) plot, which implies the relationships between soil and environmental variables.

Partial dependence (Pd) gives a quantitative depiction of how the class probability depends on an environmental variable (Friedman, 2001).

Page 6

Introduction ◆ Random forest and partial dependence

Some studies have used Pd plots to explain the relationships between species or land use and environmental variables (Wang et al., 2016; Cao et al., 2015; Cutler et al., 2007), but Pd plots cannot be used directly for mapping.

[Figure: example Pd plots for species and land use, showing partial dependence against environmental variables such as elevation.]

Page 7

Introduction ◆ Question

How can partial dependence plots be translated into explicit membership functions and used for soil mapping?

Page 8

Methodology

Environmental variables selection
Knowledge extraction based on Pd of RF
Soil type inference

Page 9

Methodology ◆ Environmental variables selection

The mean decrease accuracy (MDA) was used to choose the relevant environmental variables.

The overall MDA of an environmental variable in RF should be larger than 0.

Its NMDA% should be smaller than 10%.

Variable     MDA (%) by soil type       Overall MDA (%)   NMDA% (%)
             A        B        C
Variable 1   23.71    20.26    31.95    28.90             0
Variable 2   20.58    -4.21    -0.86    19.30             20.27
Variable 3   5.23     -3.02    -0.89    -0.03             ---
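The two selection criteria can be written as a small filter. This is a minimal sketch: the function name and the row layout are hypothetical, the numbers are the overall MDA and NMDA% values from the table on this slide, and a missing NMDA% ("---") is assumed to mean the variable is not retained.

```python
def select_variables(rows, max_nmda_pct=10.0):
    """Keep variables whose overall MDA is > 0 and whose NMDA% is < max_nmda_pct.

    rows: iterable of (name, overall_mda_pct, nmda_pct); nmda_pct may be None
    when the slide shows '---'.
    """
    selected = []
    for name, overall_mda, nmda_pct in rows:
        if overall_mda <= 0:
            continue  # criterion 1: overall MDA must be larger than 0
        if nmda_pct is None or nmda_pct >= max_nmda_pct:
            continue  # criterion 2: NMDA% must be smaller than 10%
        selected.append(name)
    return selected

rows = [
    ("Variable 1", 28.90, 0.0),
    ("Variable 2", 19.30, 20.27),
    ("Variable 3", -0.03, None),
]
print(select_variables(rows))
```

With the table values, only Variable 1 satisfies both criteria.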

Page 10

Methodology ◆ Knowledge extraction based on Pd generated by RF

For each value of an environmental variable, its Pd is defined as the proportion of votes for a certain class minus the average proportion of votes for the other classes, based on the random forest classifier (Friedman, 2001).

The stronger the partial dependence at a given value of a variable, the higher the probability that the soil type occurs at that value of the variable.
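The Pd definition on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `partial_dependence_curve` is a hypothetical name, `predict_proba` stands for any function that returns per-class vote proportions (or probabilities) for each row, and the votes are averaged over all samples after fixing the variable at each grid value.

```python
import numpy as np

def partial_dependence_curve(predict_proba, X, feature, grid, target_class):
    """Pd of one variable for one class, following the slide's definition:
    proportion of votes for the class minus the mean proportion for the others."""
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value                   # fix the variable at this value
        proba = predict_proba(X_mod).mean(axis=0)   # average over all samples
        others = np.delete(proba, target_class)     # proportions of the other classes
        curve.append(proba[target_class] - others.mean())
    return np.array(curve)
```

With scikit-learn, the `predict_proba` method of a fitted `RandomForestClassifier` could be passed in; note that it averages per-tree class probabilities rather than counting hard votes, so it only approximates a vote proportion.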

Workflow: partial dependence → determine the key parameters for the membership function → membership function

[Figure: bell-shaped membership function; membership (0, 0.5, 1.0) against an environmental variable, with lower and upper crossover at 0.5 and parameters xminl, xmaxl, xmaxr, xminr.]
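The slides do not state how the key parameters are read off a Pd curve. One plausible way, shown here purely as an assumption, is to rescale the curve to the range 0 to 1 and locate the points where it crosses 0.5 (the lower and upper crossovers) by linear interpolation. The function name is hypothetical.

```python
import numpy as np

def rescale_and_find_crossovers(grid, pd_curve, level=0.5):
    """Min-max rescale a Pd curve to [0, 1] and return (membership, crossovers)."""
    grid = np.asarray(grid, dtype=float)
    pd_curve = np.asarray(pd_curve, dtype=float)
    # Assumes the curve is not constant, otherwise the denominator is zero.
    membership = (pd_curve - pd_curve.min()) / (pd_curve.max() - pd_curve.min())
    diff = membership - level
    crossovers = []
    for i in range(len(grid) - 1):
        if diff[i] == 0:
            crossovers.append(grid[i])              # exactly on the level
        elif diff[i] * diff[i + 1] < 0:
            # Sign change: interpolate linearly between the two grid points.
            t = diff[i] / (diff[i] - diff[i + 1])
            crossovers.append(grid[i] + t * (grid[i + 1] - grid[i]))
    if diff[-1] == 0:
        crossovers.append(grid[-1])
    return membership, crossovers

membership, crossovers = rescale_and_find_crossovers([0, 2, 4], [0, 4, 0])
print(membership, crossovers)
```

For a single-peaked curve the two crossovers returned would correspond to the lower and upper crossover of a bell-shaped function.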

After the fuzzy membership functions between each soil type and the environmental variables were constructed, SoLIM was used to predict soil types.
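The slides do not describe how SoLIM combines the memberships. The sketch below assumes a limiting-factor rule often described for SoLIM-style inference: for each soil type, take the minimum membership across variables, then assign the soil type with the highest overall membership. The function name and data layout are hypothetical.

```python
import numpy as np

def infer_soil_type(memberships):
    """memberships: dict mapping soil type -> array of shape
    (n_variables, n_locations) holding per-variable membership values."""
    names = list(memberships)
    # Limiting factor (assumption): overall membership is the minimum over variables.
    overall = np.stack([np.min(memberships[name], axis=0) for name in names])
    # Hardening: pick the soil type with the largest overall membership.
    labels = [names[i] for i in overall.argmax(axis=0)]
    return labels, overall

memberships = {
    "A": [[0.9, 0.2], [0.8, 0.4]],   # two variables, two locations
    "B": [[0.3, 0.7], [0.5, 0.6]],
}
labels, overall = infer_soil_type(memberships)
print(labels)
```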

Page 11

Case study ◆ Study area

The study area is located in Heshan Farm, Nenjiang County, Heilongjiang Province (60 km²). Its elevation ranges from 276 to 363 m. Land use and soil management are generally uniform across the study area.

Soil types:
A: Pachic Stagni-Udic Isohumosols
B: Mollic Bori-Udic Cambosols
C: Typic Hapli-Udic Isohumosols
D: Typic Bori-Udic Cambosols
E: Lithic Udi-Orthic Primosols
F: Fibric Histic-Typic Haplic Stagnic Gleyosols

Page 12

Case study ◆ Environmental variables

We generated twelve environmental variables commonly used in this study area.

Description of environmental variables

Variable    Description                                                     Software
Elevation   Elevation                                                       ArcGIS 10.1
Slope       Slope in ArcInfo                                                ArcGIS 10.1
Cosaspect   Cos(Aspect)                                                     ArcGIS 10.1
Planc       Plan curvature (Shary et al., 2002)                             ArcGIS 10.1
Profic      Profile curvature (Shary et al., 2002)                          ArcGIS 10.1
TWI         Topographic Wetness Index (Qin et al., 2011)                    SimDTA
Hand        Height Above the Nearest Drainage (Gharari et al., 2011)        Python
TCI         Terrain Characterization Index (Park and van De Giesen, 2004)   SimDTA
TRI         Terrain Ruggedness Index (S.J. et al., 1999)                    SimDTA
TPI         Topographic Position Index (Jenness, 2006; Weiss, 2001)         SimDTA
Relief      Topographic relief (Skidmore, 1990)                             SimDTA
Slopepos    Fuzzy slope position including Ridge, Shoulder, Back slope,     SimDTA
            Foot, Channel with value of 1-5 (Qin, Zhu, et al., 2009)

Page 13

Case study ◆ Evaluation

Two scenarios were used to test the effectiveness of the proposed method with different training samples. Random forest was also run on the training samples of each scenario.

Scenario 1
Training: 33 representative samples
Validation: 50 samples
The results were compared with the knowledge extraction method of Yang et al. (2013), based on the same training samples and validation data.

Scenario 2
Randomly split
Training: 2/3 of the samples
Validation: 1/3 of the samples
The split was conducted nine times, and thus nine sample sets were generated.
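The repeated random split of Scenario 2 can be sketched as below. The function name, the seed, and the use of index arrays are assumptions for illustration; the slides only state the 2/3 to 1/3 ratio and the nine repetitions.

```python
import numpy as np

def repeated_splits(n_samples, n_repeats=9, train_fraction=2 / 3, seed=0):
    """Return a list of (train_indices, validation_indices) pairs."""
    rng = np.random.default_rng(seed)
    n_train = int(round(n_samples * train_fraction))
    splits = []
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)            # shuffle sample indices
        splits.append((order[:n_train], order[n_train:]))
    return splits

# 90 is an arbitrary illustrative sample count, not a figure from the slides.
splits = repeated_splits(n_samples=90)
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each pair of index arrays would then be used to train on one part and validate on the other, once per sample set.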

Page 14

Case study ◆ Results of Scenario 1

[Figure: mapping results using (a) membership function and (b) random forest. Legend (soil types): Pachic Stagni-Udic Isohumosols; Mollic Bori-Udic Cambosols; Typic Hapli-Udic Isohumosols; Typic Bori-Udic Cambosols; Lithic Udi-Orthic Primosols; Fibric Histic-Typic Haplic Stagnic Gleyosols.]

Prediction accuracy
Using membership function: 0.78
Using random forest: 0.60
Yang et al. (2013): 0.76

Page 15

Case study ◆ Results of Scenario 2

[Figure: mapping results for sample sets 1, 2 and 3 using (a)-(c) membership function and (d)-(f) random forest. Legend (soil types): Pachic Stagni-Udic Isohumosols; Mollic Bori-Udic Cambosols; Typic Hapli-Udic Isohumosols; Typic Bori-Udic Cambosols; Lithic Udi-Orthic Primosols; Fibric Histic-Typic Haplic Stagnic Gleyosols.]

Page 16

Case study ◆ Results of Scenario 2

[Figure: the prediction accuracy for each sample set of scenario 2. Accuracy (axis from 0.4 to 0.9) against sample sets 1 to 9, with one series based on membership functions and one based on random forest.]

Method                          Average accuracy   STD
Based on membership functions   65.87%             9.59%
Based on random forest          63.89%             6.92%

The prediction accuracies of scenario 2 based on the membership functions are lower than those of scenario 1, because adding atypical samples to the training samples made the membership curves wider and increased their overlap. As a result, transitional areas received high membership in certain soil types.

The accuracy of random forest was sometimes high and sometimes low. A possible reason is that random forest benefits from training samples that cover a wider range of environmental conditions for each soil type.
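The average accuracy and STD reported for scenario 2 summarize the nine per-set accuracies. A minimal sketch of that aggregation follows; the function names are hypothetical, accuracy is assumed to be the fraction of validation samples whose predicted soil type matches the observed one, and the sample standard deviation (ddof=1) is an assumption because the slides do not say which form was used.

```python
import numpy as np

def accuracy(predicted, observed):
    """Fraction of validation samples with the correct soil type."""
    predicted = np.asarray(predicted)
    observed = np.asarray(observed)
    return float(np.mean(predicted == observed))

def summarize(accuracies):
    """Mean and sample standard deviation of per-set accuracies."""
    values = np.asarray(accuracies, dtype=float)
    return float(values.mean()), float(values.std(ddof=1))

# Toy labels for illustration only; these are not results from the study.
print(accuracy(["A", "B", "C", "A"], ["A", "B", "A", "A"]))
```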

Page 17

Conclusions

Knowledge of the relationships between soil and environmental variables can be extracted from the partial dependence of random forest. The extracted knowledge is effective for predicting soil types in the study area.

Training samples greatly affect mapping results and accuracies. Using representative samples as training samples is recommended when applying the proposed method to extract soil-environment knowledge.

Training samples that fully cover the environmental conditions under which each soil type occurs would help random forest produce more accurate soil maps.

Page 18

Thank You!

