
Predictive Soil Mapping/Modeling (PSM) –

From Conventional to Machine Learning (ML) approach

Ranendu Ghosh

Professor and Dean, DAIICT

Ms Megha Pandya

JRF, DAIICT

Presentation at AU

December 17 2019

Soil as Resource

…any material within 2 m from the Earth’s surface that is in

contact with the atmosphere, with the exclusion of living

organisms, areas with continuous ice not covered by other

material, and water bodies deeper than 2 m.

FAO/ISRIC/ISSS 2006

20/12/2019 2 Presentation at BDA 2019, Ahm Univ.


Soil as Resource

Heterogeneous

Disperse

Three dimensional

Three phase system



Soil Profile



Conventional Soil Mapping



• A soil map is a graphic representation for transmitting

information about the spatial distribution of soil attributes

(Yaalon, 1989).

• The earliest soil maps were produced in the mid 18th century

to delineate homogeneous areas with intrinsic soil attributes

useful in determining suitable land use, and not for soil

classification.

• In the 19th century the Russian school stressed the

importance of genetic soil type, while in the USA the stress is

on the soil's intrinsic properties.

• In conventional soil survey, soil is mapped based on a soil

surveyor's conceptual or mental model (Hudson, 1992).




Conventional Soil Mapping - Process



• Aerial photographs, satellite images, and digital elevation models (DEMs) are used to identify environmental features relating to geology, landform or vegetation.

• This process is then verified with field observations.

• The final product is a map with a legend of soil types, which

can be difficult to interpret and use.




The main drawbacks of polygon maps are as follows:

• They are static. The maps do not provide direct information on the

dynamics of soil condition (e.g., rates of nutrient depletion)

• They are inflexible for quantitative studies. Such studies

(e.g., food production, land degradation, carbon balance, greenhouse

gas emission) generally require information on the soil’s functional

properties rather than a soil name.

• They imply that soil variation is abrupt and only occurs at the

boundary of the mapping units.

• Some information is lost on polygon maps.


Predictive (Digital) Soil Modeling

Predictive Soil Mapping (PSM) is based on applying statistical

and/or machine learning techniques to fit models for the purpose

of producing spatial and/or spatiotemporal predictions of soil

variables, i.e. maps of soil properties and classes at different

resolutions.

It is a multidisciplinary field combining statistics, data science,

soil science, physical geography, remote sensing, geoinformation

science and a number of other sciences (Scull et al. 2003; McBratney, Mendonça Santos, and Minasny 2003; Henderson et al. 2004; Boettinger et al. 2010; Zhu et al. 2015).



https://soilmapper.org/preface.html







Predictive (Digital) Soil Modeling

The three main goals of PSM are to:

• Understand the relationship between environmental variables and soil properties in order to collect soil data more efficiently,

• Produce and present data that better represent soil landscape continuity, and

• Clearly incorporate expert knowledge in modeling.

SCORPAN model



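S = f (cl, o, r, p, t) (Jenny, 1941), where cl is climate, o organisms, r relief, p parent material and t time.

Sc = f (s, c, o, r, p, a, n) + e (soil classes; McBratney et al., 2003)

Sa = f (s, c, o, r, p, a, n) + e (soil attributes; McBratney et al., 2003)

where s is soil (other properties of the soil at a point), c climate, o organisms, r relief (topography), p parent material, a age (time), n spatial position and e the error term.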

Harmonized World Soil Database 2012

Source databases

Soil Map of the World (DSMW)

SOTER regional studies (SOTWIS)

The European Soil Database (ESDB)

The Soil Map of China 1:1 Million scale (CHINA)

Soil parameter estimates based on the World Inventory of Soil

Emission Potential (WISE) database, 14K profile data






1. Database contents

A resolution of about 1 km (30 arc seconds by 30 arc seconds) was selected. The resulting raster database consists of 21600 rows and 43200 columns, of which 221 million grid cells cover the globe’s land territory.

Over 16000 different soil mapping units (SMUs) are recognized in the Harmonized World Soil Database (HWSD).

An SMU can have up to 9 soil unit/topsoil texture combinations.

2. Harmonization of the database

• Attribute

• Spatial
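The grid dimensions quoted above follow directly from the 30 arc-second cell size. The short sketch below reproduces that arithmetic; the function and variable names are illustrative only, and the 221 million land-cell count is taken from the slide rather than derived.

```python
ARC_SECONDS_PER_DEGREE = 3600


def grid_shape(cell_arc_seconds):
    """Rows and columns of a global latitude/longitude grid."""
    rows = 180 * ARC_SECONDS_PER_DEGREE // cell_arc_seconds  # latitude span
    cols = 360 * ARC_SECONDS_PER_DEGREE // cell_arc_seconds  # longitude span
    return rows, cols


rows, cols = grid_shape(30)
total_cells = rows * cols
land_cells = 221_000_000  # figure quoted on the slide, not computed here
land_share = land_cells / total_cells

print(rows, cols)                       # 21600 43200
print(total_cells)                      # 933120000
print(f"{100 * land_share:.1f}% land")  # 23.7% land
```

So roughly a quarter of the cells in the global raster carry soil information.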




SoilGrids250m - Global gridded soil

information based on machine learning (Launched in 2014)

• Linear models replaced with tree-based, non-linear machine learning models to account for non-linear relationships, especially for modeling soil property-depth relationships,

• Single prediction models replaced with an ensemble framework, i.e. at least two methods are used for each soil variable to reduce overshooting effects,

• List of covariates extended to include a wider diversity of MODIS land products and to better represent factors of soil formation.




SoilGrids250m - Global gridded soil information based on machine learning

Target variables

SoilGrids provides predictions for the following list of standard

soil properties as a function of soil depth

Soil organic carbon content in %,

Soil pH,

Sand, silt and clay (weight %),

Bulk density (kg m−3) of the fine earth fraction (< 2 mm),

Cation-exchange capacity (cmol+/kg) of the fine earth fraction,

Coarse fragments (volumetric %),

Depth to bedrock (cm) and occurrence of R horizon.





Target variables

SoilGrids provides predictions for the following list of standard

soil classes

• World Reference Base (WRB) classes: at present, 118 unique soil classes,

• United States Department of Agriculture (USDA) Soil Taxonomy suborders, i.e. 67 soil classes.





Predictions are generated at seven standard depths for all numeric soil properties: 0 cm, 5 cm, 15 cm, 30 cm, 60 cm, 100 cm and 200 cm, following the vertical discretisation

Averages over (standard) depth intervals, e.g. 0-5 cm or 0-30 cm, can be derived by taking a weighted average of the predictions within the depth interval using numerical integration, such as the trapezoidal rule:

$$\frac{1}{b-a}\int_a^b f(x)\,dx \approx \frac{1}{b-a}\cdot\frac{1}{2}\sum_{k=1}^{N-1}(x_{k+1}-x_k)\,\bigl(f(x_k)+f(x_{k+1})\bigr)$$

where a and b are the upper and lower limits of the depth interval, N is the number of depths, x_k is the k-th depth and f(x_k) is the value of the target variable (i.e., soil property) at depth x_k.



Example of numerical integration following the trapezoidal rule.





For example, for the 0-30 cm depth interval, with soil pH values at the first four standard depths (0, 5, 15 and 30 cm) equal to 4.5, 5.0, 5.3 and 5.0, the pH is estimated as

[(5 – 0) * (4.5 + 5.0) + (15 – 5) * (5.0 + 5.3) + (30 – 15) * (5.3 + 5.0)] / 30 * 0.5 = 5.083
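The depth averaging described above can be checked with a few lines of code. This is a minimal sketch of the trapezoidal rule applied to the pH example; the function name `depth_interval_mean` is illustrative and not part of SoilGrids.

```python
def depth_interval_mean(depths, values):
    """Average a soil property over a depth interval with the trapezoidal rule.

    depths: increasing depths in cm (for example 0, 5, 15, 30)
    values: predicted property value at each depth
    """
    if len(depths) != len(values) or len(depths) < 2:
        raise ValueError("need at least two depths with matching values")
    area = 0.0
    for k in range(len(depths) - 1):
        width = depths[k + 1] - depths[k]
        # Area of one trapezoid: width times the mean of its two sides.
        area += 0.5 * width * (values[k] + values[k + 1])
    return area / (depths[-1] - depths[0])


ph_0_30 = depth_interval_mean([0, 5, 15, 30], [4.5, 5.0, 5.3, 5.0])
print(round(ph_0_30, 3))  # 5.083
```

Each term sums the values at the two ends of a layer, so thicker layers contribute more to the average.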

Input profile data

World distribution of soil profiles used for model fitting (about 150,000 points shown on the map)

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0169748



http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0169748

Soil covariates

TWI is the Topographic Wetness Index (values multiplied by 100), EVI is the MODIS Enhanced Vegetation Index (values multiplied by 10,000), s.d. LST is the long-term standard deviation of MODIS Land Surface Temperatures (values in degrees Celsius).



Spatial Prediction Framework

Spatial prediction, i.e. fitting of models and generation of maps, consists of four main steps:

1. overlay points and covariates and prepare the regression matrix,

2. fit spatial prediction models,

3. apply spatial prediction models using tiled raster stacks (covariates),

4. assess accuracy using cross-validation.
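The four steps can be sketched end to end on a toy example. Everything below is an assumption made for illustration: the 6 x 6 covariate grid and the soil observations are synthetic, and a one-covariate least-squares fit stands in for the tree-based and ensemble models that SoilGrids actually uses.

```python
import random

random.seed(0)

# Synthetic covariate raster (for example an elevation-like layer), 6 x 6 cells.
covariate = [[10.0 * r + c for c in range(6)] for r in range(6)]

# Synthetic soil observations at every other cell: (row, col, measured value).
points = [(r, c, 0.05 * covariate[r][c] + 2.0 + random.gauss(0, 0.1))
          for r in range(6) for c in range(6) if (r + c) % 2 == 0]

# Step 1: overlay points and covariates to build the regression matrix.
matrix = [(covariate[r][c], y) for r, c, y in points]


# Step 2: fit a spatial prediction model (ordinary least squares here).
def fit(rows):
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


model = fit(matrix)

# Step 3: apply the model tile by tile (one raster row per tile).
soil_map = [[predict(model, x) for x in tile] for tile in covariate]


# Step 4: assess accuracy with k-fold cross-validation (RMSE).
def cross_validate(rows, k=5):
    sse = 0.0
    for fold in range(k):
        test = rows[fold::k]
        train = [row for i, row in enumerate(rows) if i % k != fold]
        fold_model = fit(train)
        sse += sum((y - predict(fold_model, x)) ** 2 for x, y in test)
    return (sse / len(rows)) ** 0.5


rmse = cross_validate(matrix)
print(f"slope={model[0]:.3f}, cross-validated RMSE={rmse:.3f}")
```

In a real workflow the regression matrix would hold many covariates per point, and each tile would be a block of a raster stack read from disk.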



Spatial Prediction Framework



Fitted variable importance plots for target variables, based on R2 values, where R2 = (1 – SSE/SST) * 100.

DEPTH.f is depth from soil surface, TMOD3 and NMOD3 are mean monthly temperatures daytime and nighttime (red color),

TWI, DEM, VBF and VDP are DEM-parameters (bisque color), MMOD4 are mean monthly MODIS NIR band reflectance (cyan

color), PMRG3 are mean monthly precipitation (blue color), EMOD5 are mean monthly EVI derivatives (dark green color),

VWMOD1 are monthly MODIS Precipitable Water Vapor images (orange color), CGLC5 are land cover classes (light green color)

and ASSDAC3 is the average soil and sedimentary deposit thickness (brown color).
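The R2 measure named in the caption, (1 – SSE/SST) * 100, can be computed as follows. This is a small sketch; the observed and predicted values are made-up numbers used only to exercise the formula.

```python
def r_squared_percent(observed, predicted):
    """Percentage of variation explained: (1 - SSE / SST) * 100."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    return (1 - sse / sst) * 100


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared_percent(observed, predicted)
print(f"{r2:.1f}")  # 98.0
```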



Examples of relationships for target variables and the most important covariates (RF model)

• DEPTH.f is the observed depth from soil surface,

• T09MOD3 is mean monthly temperature for September, TMDMOD3 is mean annual temperature,

• PRSMRG3 is total annual precipitation,

• M04MOD4 is mean monthly MODIS NIR band reflectance for April,

• P07MRG3 is mean monthly precipitation for July,

• T01MOD3 is mean monthly temperature for January, and T02MOD3 is mean monthly temperature for February.



Predicting SEC, pH and SOC using ML approach

Objectives

• Different approaches have been applied to develop the PSM model.

• Regression methods and a neural-network-based modeling approach have been explored.

• SHC data were used as the point data source for training and validation, while satellite-based environmental parameters were used as covariates.

• The model was trained on 2011-2012 data and tested for the year 2018.



Methodology

S = f (S,C,O,R,P,A,N)

f – Regression methods: decision trees, neural networks, etc.



SHC Covariates

Datasets Used



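Artificial Neural Network Architecture

[Figure: the inputs (rainfall data, NDVI, digital elevation model, slope) enter the input layer and pass through weighted connections (*W) to the hidden layer and the output layer. The predicted value goes to error estimation and a threshold check: if the error is outside the threshold, the network goes back to the next iteration; if it is within the threshold, the output value is returned.]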



NN Architecture

• The input layer holds the covariates.

• The PSM model has been trained with 2011-2014 environmental data and used to predict soil property values for the year 2018.

• Covariates were extracted at the sample points and used to train the model.

• The hidden layer contains activation functions through which the model can learn non-linearity for prediction.

• The output layer contains the predicted values.



Working Of Neural Networks

• The concept of a neural network is based on three main steps:

1. For each neuron in a layer, multiply each input by its weight.

2. Then sum all the (input) x (weight) products together and add the bias.

3. Finally, apply the activation function to this sum to compute the new output.

Y = Activation (Ʃ(weight*input) + bias)
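As a minimal sketch of Y = Activation(Ʃ(weight*input) + bias), the function below evaluates one fully connected layer. The input values, weights and biases are arbitrary illustrative numbers, not parameters of the trained model, and the sigmoid is just one possible activation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """Compute activation(sum(weight * input) + bias) for every neuron in a layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


# Hypothetical scaled covariates: rainfall, NDVI, elevation, slope.
covariates = [0.5, -1.0, 2.0, 0.0]
# Hypothetical weights (one row per hidden neuron) and biases.
hidden_weights = [[0.2, 0.4, -0.1, 0.3],
                  [-0.5, 0.1, 0.2, 0.0]]
hidden_biases = [0.1, 0.0]

hidden = dense_layer(covariates, hidden_weights, hidden_biases, sigmoid)
print([round(h, 4) for h in hidden])  # [0.4013, 0.5125]
```

The output list of one layer becomes the input list of the next, which is how the hidden layer feeds the output layer.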



• Specifically, in a NN we take the sum of products of the inputs (X) and their corresponding weights (W), apply an activation function f(x) to it to get the output of that layer, and feed it as input to the next layer.

• Here f(x) is the activation function, which can be any of several mathematical functions.

• Some of the popular activation functions for regression-based neural networks are:

– Sigmoid function / logistic function

– Tanh function

– ReLU function

• The sigmoid function gave the best result compared with the other activation functions.
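For reference, the three activation functions listed above can be written in a few lines. This is a plain-Python sketch using their standard definitions.

```python
import math


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Hyperbolic tangent: squashes any real number into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

All three are non-linear, which is what lets a stack of layers represent non-linear relationships between covariates and soil properties.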



Activation Functions in Neural Networks

Loss functions in Neural Networks

• Machine learning algorithms generally rely on minimizing a ‘loss’ function. A loss function indicates how well the model is able to predict the expected outcome.

• It measures the discrepancy between the predicted and actual values, and it helps the model train better by controlling the update of its parameters.

• We have done a comprehensive study of various loss functions and chose the best-performing one for our network.

• Different loss functions for regression-based neural networks are:

– MSE (mean squared error)

– MAE (mean absolute error)

– Huber loss

• Of these three loss functions, MSE gave the best accuracy.

• Mean squared error (MSE) loss is calculated as the mean of the squared differences between actual and predicted values across the training examples.

• The loss functions were also tested with different optimizers.

• Two different optimizers were tested with the MSE and Huber loss functions.
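The three regression losses named above differ mainly in how they treat large errors. The sketch below implements their standard definitions; the pH-like sample values and the Huber threshold `delta=1.0` are assumptions chosen for illustration.

```python
def mse(actual, predicted):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: penalizes all errors linearly."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def huber(actual, predicted, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for a, p in zip(actual, predicted):
        err = abs(a - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(actual)


# Hypothetical values; the last prediction is a deliberate outlier.
actual = [7.5, 8.0, 6.9, 8.4]
predicted = [7.4, 8.3, 6.9, 6.4]
print(round(mse(actual, predicted), 4))    # 1.025
print(round(mae(actual, predicted), 4))    # 0.6
print(round(huber(actual, predicted), 4))  # 0.3875
```

The single outlier dominates MSE, while MAE and Huber loss are less sensitive to it.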



Fig. MSE with different combinations of loss function and optimizer

• From the above figure and table it was concluded that the Adam optimizer with the MSE loss function gives the minimum error for the model.

• The model has been equipped with batch normalization and dropout.



Results

• The above model shows a training accuracy of 93% and a testing accuracy of 87%.

• To develop this model, the SHC data were split in an 80:20 ratio.
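An 80:20 split can be produced as in the sketch below. It assumes the 80% share is used for training and the 20% share for testing, and it uses integers as placeholders for SHC sample records; the function name and seed are illustrative.

```python
import random


def split_80_20(records, train_fraction=0.8, seed=42):
    """Shuffle the records reproducibly and split them into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(100))  # placeholders for sample records
train, test = split_80_20(samples)
print(len(train), len(test))  # 80 20
```

Shuffling before the cut keeps both parts representative when the records are stored in a meaningful order, for example by village or sampling date.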


Model training on Actual and Trained EC

[Figures: model training accuracy for the year 2011-12; model testing accuracy for the year 2011-12]

Model validation on Actual and Trained EC

[Figures: line graphs between actual and predicted pH, OC and SEC]

Property   Mean (actual)   Mean (predicted)   SD (actual)   SD (predicted)
EC         0.38            0.37               0.27          0.25
pH         7.71            7.87               0.61          0.53
OC         0.37            0.39               0.14          0.10


Actual and predicted values of soil properties for 2011 and 2018 respectively

[Figure: EC map. Legend: No Change, Increasing Trend, Decreasing Trend]

Change of soil properties during 2011-18 using training model of 2011-12

[Figure: OC and pH maps]

Thanks for your patience



References

• Miller, B.A., 2017. In: Soil Mapping and Process Modeling for Sustainable Land Use Management. https://www.sciencedirect.com/book/9780128052006

• Minasny, B., ... McBratney, A.B., 2014. In: Reference Module in Earth Systems and Environmental Sciences. https://www.sciencedirect.com/referencework/9780124095489

• McBratney, A.B., Santos, M.M. and Minasny, B., 2003. On digital soil mapping. Geoderma, 117(1-2), pp.3-52.



• Inputs are fed into neuron 1, neuron 2 and neuron 3, as they belong to the input layer.

• Each neuron has a weight associated with it. When an input enters a neuron, the input is multiplied by the weight on the neuron.

• For instance, weight 1 will be applied to the input of neuron 1. If weight 1 is 0.8 and the input is 1, then 0.8 will be computed from neuron 1: 1 * 0.8 = 0.8

• The sum of weight * input over the neurons in a layer is calculated. As an example, the calculated value on the hidden layer in the image will be: (Weight 4 x Input To Neuron 4) + (Weight 5 x Input To Neuron 5)

• Finally, an activation function is applied. The output calculated by the neurons becomes the input to the activation function, which then computes a new output.

• Assume the activation function is: If (input > 1) Then 0 Else 1

• The output from the activation function is then fed to the subsequent layers.
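The worked numbers above can be traced in code. Weight 1 = 0.8, input = 1 and the step rule "If (input > 1) Then 0 Else 1" come from the slide; the weights and inputs for neurons 4 and 5 are not given there, so the values used for them below are hypothetical.

```python
def step_activation(value):
    """Activation rule from the slide: 0 if the input exceeds 1, otherwise 1."""
    return 0 if value > 1 else 1


# Values stated on the slide.
weight_1, input_1 = 0.8, 1
neuron_1_output = weight_1 * input_1
print(neuron_1_output)  # 0.8

# Hypothetical values for neurons 4 and 5 (not given on the slide).
weight_4, input_4 = 0.5, 0.8
weight_5, input_5 = 0.9, 1.0
hidden_sum = weight_4 * input_4 + weight_5 * input_5  # about 1.3
output = step_activation(hidden_sum)
print(output)  # 0, because the weighted sum exceeds 1
```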



Working Of Neural Networks

• An activation function is a mathematical function applied to a neuron's computed value. When a neuron computes the weighted sum of its inputs, the sum is passed to the activation function, which checks whether the computed value is above the required threshold.

• If the computed value is above the required threshold, the activation function is activated and an output is computed.

• This output is then passed on to the next or previous layers (depending on the complexity of the network), which can help the neural network alter the weights on its neurons.

• Activation functions are important for learning complicated, non-linear functional mappings between the inputs and the output variable. They introduce non-linear properties to the network.

Activation Functions in Neural Networks



Methodology

