15/09/2009 1

Non-linearity and spatial correlation in landslide susceptibility mapping

C. Ballabio, J. Blahut, S. Sterlacchini

University of Milano-Bicocca

GIT 2009

UNIVERSITÀ DEGLI STUDI DI MILANO - BICOCCA

15/09/2009 2

Summary

Landslide susceptibility modeling

Non-linearity issues

A few examples

Application to a case study

Modeling the residual spatial correlation


15/09/2009 3

Introduction

Landslide susceptibility modeling

Usually framed as a classification problem: if y = 1 is an observed occurrence, y = 0 is a point with no occurrence, and x is a vector of predictor variables, then we want to estimate:

P(y=1|x)=f(x, θ)
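One common choice for f is the logistic function applied to a linear combination of the predictors, as in logistic regression (LR). The sketch below is only illustrative: the two covariates and the values in `theta` are made-up assumptions, not parameters fitted in this study.

```python
import math

def susceptibility(x, theta):
    """Return P(y=1 | x) for a logistic model; theta[0] is the intercept."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters and covariates (e.g. scaled slope and curvature).
theta = [-1.0, 2.0, 0.5]
p_flat = susceptibility([0.0, 0.0], theta)
p_steep = susceptibility([1.5, 0.2], theta)
print(round(p_flat, 3), round(p_steep, 3))
```

Any other classifier that returns a probability can play the role of f; only the form of the function and the meaning of θ change.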

15/09/2009 4

Introduction

Linearly separable classes

Just find a separating line/plane/hyperplane

[Figure: scatter plot of two linearly separable classes; axes X1 and X2, both from -2 to 6]

15/09/2009 5

This is exactly what LDA, QDA and LR do
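As a rough illustration of how LDA arrives at a separating line, here is a minimal two-class sketch in two dimensions. It assumes a shared covariance and equal priors, and the toy points are invented for the example; the function names are hypothetical helpers, not a library API.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 within-class scatter matrix of 2-D points around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]

def fit_lda(class0, class1):
    """Return (w, c): predict class 1 when w . x > c."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter Sw = [[a, b], [b, d]]
    a, b = s0[0][0] + s1[0][0], s0[0][1] + s1[0][1]
    d = s0[1][1] + s1[1][1]
    det = a * d - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = Sw^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix
    w = [(d * dx - b * dy) / det, (-b * dx + a * dy) / det]
    # Threshold halfway between the two class means (equal priors assumed)
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    c = w[0] * mid[0] + w[1] * mid[1]
    return w, c

def predict(x, w, c):
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0

class0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
class1 = [(4, 4), (5, 4), (4, 5), (5, 5)]
w, c = fit_lda(class0, class1)
predictions = [predict(p, w, c) for p in class0 + class1]
```

The decision boundary is the line w . x = c, which is why this family of models only works well when a flat boundary is adequate.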

15/09/2009 6

Even for linearly separable classes, the best decision function may not be linear

15/09/2009 7

What if the separation cannot be performed by linear functions?

How can we separate the two classes by using only X1 and X2?

[Figure: scatter plot of two classes that are not linearly separable; axes X1 and X2, both from -1.0 to 1.0]

15/09/2009 8

LDA does not work…


15/09/2009 10

Neither does QDA…


15/09/2009 11

Even far more flexible models fail to separate the classes…


15/09/2009 12

ANNs come close to doing the job, but require a lot of tuning…

15/09/2009 13

Support Vector Machines

Based on the Statistical Learning Theory (Vapnik, 1995)

Very good performance in classification tasks

Intrinsic “Occam’s razor” logic: the simplest model is preferred

Easy to avoid overfitting

Not so “Black-box”

15/09/2009 14

Support Vector Machines

Widely used in machine learning

Bioinformatics / genetic classification

Spatial mapping

Robotics

Digital soil mapping

15/09/2009 15

Support vector classification

Use the best hyperplane

Use the “kernel trick”

15/09/2009 16

Which is the best hyperplane?

We need a way to define what “optimal separation” is…

[Figure: scatter plot of two classes with several candidate separating lines; axes X1 and X2, both from -2 to 6]

15/09/2009 17

Find the widest gap between classes

Fit a plane in the middle of the gap

[Figure: scatter plot of two classes with the maximum-margin separating line; axes X1 and X2, both from -2 to 6]
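The "widest gap" can be made concrete: for a line w . x + b = 0, the margin is the smallest distance from any training point to the line, and the preferred line is the one that maximizes it. The sketch below only compares two hand-picked candidate lines on invented toy points; it is not an SVM solver, and the candidate names are assumptions made for the example.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance of the points to the line w.x + b = 0.

    Labels are +1 / -1; a negative result means some point is misclassified.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm
               for x, y in zip(points, labels))

points = [(0, 0), (1, 1), (0, 1), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

candidates = {
    "diagonal": ([1.0, 1.0], -5.0),   # the line x1 + x2 = 5
    "vertical": ([1.0, 0.0], -2.0),   # the line x1 = 2
}
margins = {name: margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

Both lines separate the classes without error, but the diagonal one leaves a wider gap, so it would be preferred under the maximum-margin criterion.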

15/09/2009 18

Kernel Function

The kernel linearizes the data in a high-dimensional space

This makes it possible to find a flat separating hyperplane
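A small numerical check may help here. For a degree-2 polynomial kernel in two dimensions, the kernel value equals a dot product in an explicit three-dimensional feature space, and a circle in the input space becomes a flat hyperplane in that space. The feature map `phi` and the test points below are illustrative assumptions.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D point."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: <x, z>^2."""
    return dot(x, z) ** 2

x, z = (0.3, -0.8), (1.0, 0.5)
explicit = dot(phi(x), phi(z))   # dot product computed in the mapped space
implicit = poly2_kernel(x, z)    # same value, computed in the input space

# In the mapped space, x1^2 + x2^2 is a linear function of phi(x),
# so the circle x1^2 + x2^2 = 0.5 becomes the hyperplane w . phi(x) = 0.5.
inner, outer = (0.2, 0.1), (0.9, -0.7)
w = (1.0, 0.0, 1.0)
inner_score = dot(w, phi(inner))
outer_score = dot(w, phi(outer))
```

The kernel never builds `phi(x)` explicitly, which is what keeps the computation cheap even when the mapped space is very high-dimensional.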


15/09/2009 20

Kernel Function

Based on the dot product:

$$\langle x, x' \rangle = \sum_{i=1}^{l} [x]_i \, [x']_i$$

Simple to compute

But very powerful: it can project data into high-dimensional spaces, the Reproducing Kernel Hilbert Spaces (RKHS)

But… it is not known beforehand which kernel is appropriate…

15/09/2009 21

Kernels

Linear: $K(x, x_i) = \langle x, x_i \rangle$

Polynomial: $K(x, x_i) = \langle x, x_i \rangle^{d}$

Radial basis function: $K(x, x_i) = \exp\left(-\dfrac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}}\right)$

Exponential RBF: $K(x, x_i) = \exp\left(-\dfrac{\lVert x - x_i \rVert}{2\sigma^{2}}\right)$
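The four kernels on this slide translate almost directly into code. The following is a plain-Python sketch; the default values for `d` and `sigma` are arbitrary assumptions chosen for the example.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, d=2):
    return dot(x, z) ** d

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF: depends on the squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def exponential_rbf_kernel(x, z, sigma=1.0):
    """Exponential RBF: depends on the (unsquared) Euclidean distance."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    return math.exp(-dist / (2 * sigma ** 2))

u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v))
```

Both RBF variants return 1 when the two points coincide and decay towards 0 as the points move apart; `sigma` controls how quickly.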

15/09/2009 22

SVM with a single Gaussian kernel

Separates the classes almost perfectly

Reproduces the general trend of the data

15/09/2009 23

The Staffora basin study area

Triggering patterns for flows

DEM-derived covariates + geology and land use

15/09/2009 24

Naïve Bayes (≈ WoE) prediction

Not bad, but we get a lot of high-probability areas

[Figure: map of NB probability (Low: 0 to High: 1); easting 1510000 to 1520000, northing 4950000 to 4970000; scale bar 0 to 5 km]

15/09/2009 25

LDA (a.k.a. Maximum Likelihood) prediction

Better than NB, but we still get a lot of high probabilities

[Figure: map of LDA probability (Low: 0 to High: 1); easting 1510000 to 1520000, northing 4950000 to 4970000; scale bar 0 to 5 km]

15/09/2009 26

SVM

Just better…

[Figure: map of SVM probability (Low: 0 to High: 1); easting 1510000 to 1520000, northing 4950000 to 4970000; scale bar 0 to 5 km]

15/09/2009 27

Use cross-validation and ROC curves to compare the models

Far fewer false positives predicted in the cross-validation sample
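For reference, a ROC curve and its area can be computed from held-out predictions in a few lines. This is a generic sketch; the labels and scores are invented and stand in for the predictions of one cross-validation fold.

```python
def roc_curve(scores, labels):
    """Return ROC points (FPR, TPR), one per distinct score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical held-out scores from one cross-validation fold.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
curve = roc_curve(scores, labels)
area = auc(curve)
```

Computing the curve on the cross-validation sample rather than on the training data is what makes the comparison between models fair.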

15/09/2009 28

Success curves (Fabbri and Chung, 2003)

It’s a ROC with only the true positive rate…
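A success curve is commonly built by ranking map cells from most to least susceptible and plotting the share of known landslide cells captured against the share of the study area considered. The sketch below follows that common construction under the assumption of equal cell areas; the data are invented and the function name is hypothetical.

```python
def success_rate_curve(susceptibility, landslide):
    """Return (fraction of area, fraction of landslide cells captured) pairs.

    Cells are ranked from most to least susceptible; equal cell area is assumed.
    """
    order = sorted(range(len(susceptibility)), key=lambda i: -susceptibility[i])
    total = sum(landslide)
    captured = 0
    curve = [(0.0, 0.0)]
    for rank, i in enumerate(order, start=1):
        captured += landslide[i]
        curve.append((rank / len(order), captured / total))
    return curve

# Hypothetical susceptibility values and landslide flags for ten cells.
susceptibility = [0.95, 0.10, 0.80, 0.30, 0.60, 0.05, 0.70, 0.20, 0.40, 0.50]
landslide = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]
curve = success_rate_curve(susceptibility, landslide)
```

Unlike a ROC curve, the horizontal axis here is the fraction of the map flagged as susceptible, so false positives are not measured explicitly.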

15/09/2009 29

Why use Machine Learning?

Increasing availability of low-cost / high-information topographic surveys

e.g. LiDAR, hyperspectral data

A lot of raw derived information

An ML system can automatically interpret the data without the need for refinements (automatic mapping systems).

15/09/2009 30

What happens if we use only DEM-derived data?

We still get a decent prediction from SVM, but not from LR/LDA/NB

15/09/2009 31

Residual spatial correlation

Once we predict with SVM, can we derive useful information from the data?

There is still a lot of autocorrelation at short distances

[Figure: semivariogram; x axis: distance (200 to 800), y axis: semivariance (0.05 to 0.15)]
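The residual autocorrelation is read from an empirical semivariogram: for each distance bin, half the mean squared difference between pairs of values separated by that distance. A minimal sketch follows; the coordinates and residuals are invented, and the bin settings are assumptions.

```python
import math

def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return a list of (mean distance, semivariance, pair count) per lag bin."""
    sums = [0.0] * n_bins
    dists = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                dists[k] += h
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_bins) if counts[k] > 0]

# Hypothetical residuals (observed occurrence minus predicted probability)
# on a short transect with 100 m spacing.
coords = [(0, 0), (100, 0), (200, 0), (300, 0), (400, 0)]
residuals = [0.30, 0.25, 0.20, -0.20, -0.25]
variogram = empirical_semivariogram(coords, residuals, bin_width=150, n_bins=3)
```

A semivariance that is low at short lags and rises with distance, as in this toy transect, is the signature of spatial structure that the model has not yet explained.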

15/09/2009 32

We can implement a Kriging system to model the residual information

Or, we can use MK-SVM to model spatial variation

[Figure: direct and cross semivariogram panels (x axis: distance, y axis: semivariance) for occurrence, svm.pred, detrended, svm.pred.occurrence, detrended.occurrence and detrended.svm.pred; annotations: Original trend, Residual correlation, Model correlation]

15/09/2009 33

A one-dimensional example

Combination of cosine functions with different λ

Plus some additive Gaussian noise

Sample 30% of the data points

[Figure: three line plots of y, y2 and y3 against x (0 to 30)]
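A test signal of this kind can be generated as follows. The slide does not give the wavelengths, amplitudes, noise level or grid, so every numeric value in this sketch is an assumption chosen only to produce a similar-looking series.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def signal(x, wavelengths=(10.0, 2.0), amplitudes=(1.0, 0.3)):
    """Sum of cosines; each term has its own wavelength (lambda)."""
    return sum(a * math.cos(2 * math.pi * x / lam)
               for a, lam in zip(amplitudes, wavelengths))

xs = [i * 0.1 for i in range(301)]                    # x from 0 to 30
clean = [signal(x) for x in xs]
noisy = [y + random.gauss(0.0, 0.1) for y in clean]   # additive Gaussian noise

# Keep a random 30% of the points as the training sample.
n_sample = round(0.3 * len(xs))
sample_idx = sorted(random.sample(range(len(xs)), n_sample))
x_train = [xs[i] for i in sample_idx]
y_train = [noisy[i] for i in sample_idx]
```

The long-wavelength term plays the role of the trend and the short-wavelength term the role of local variation.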

15/09/2009 34

Multi-kernel analysis

Two Gaussian RBF kernels

MK-SVR is able to separate the two signals, even in the presence of noise.

[Figure: four line plots of y2, pred3, y and pred4 against x (0 to 30)]
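The idea of splitting a prediction by kernel can be sketched with a simpler stand-in. The code below uses kernel ridge regression with the sum of two Gaussian kernels, not the MK-SVR of the slides, and it fixes equal kernel weights instead of learning them. The signal, kernel widths and regularization value are all assumptions.

```python
import math

def rbf(x, z, sigma):
    return math.exp(-(x - z) ** 2 / (2 * sigma ** 2))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Toy signal: slow cosine plus fast cosine (wavelengths are assumptions).
xs = [0.5 * i for i in range(41)]
ys = [math.cos(2 * math.pi * x / 20) + 0.3 * math.cos(2 * math.pi * x / 3)
      for x in xs]

sigmas = (5.0, 0.7)   # one wide and one narrow Gaussian kernel
ridge = 1e-3          # regularization strength (assumed)

# Gram matrix of the summed kernel, with the ridge term on the diagonal.
gram = [[sum(rbf(xi, xj, s) for s in sigmas) + (ridge if i == j else 0.0)
         for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
alpha = solve(gram, ys)

def component(x, sigma):
    """Part of the prediction carried by the kernel with this width."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))

def predict(x):
    return sum(component(x, s) for s in sigmas)
```

Because the summed kernel is additive, the prediction splits exactly into one component per kernel. How closely those components match the slow and fast signals depends on the kernel widths and the regularization, so this should be read as an illustration of the mechanism rather than a reproduction of the result on the slide.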

15/09/2009 35

Spatial SVM performance

Within-slope predicted probability

Average probability close to max probability

15/09/2009 36

Spatial SVM performance

Cross-validation performance

Average probability still close to max probability

15/09/2009 37

Spatial SVM rule extraction

15/09/2009 38

Conclusions

SVMs clearly outperform most of the statistical techniques commonly applied for landslide susceptibility prediction

It IS a “black box” technique, but not so much… several algorithms for feature selection and ranking are available

Very good for automatic and real-time mapping

Can easily update the model if new data is provided

Good for automatic mapping systems