Hands-on Soil Infrared Spectroscopy Training Course

Getting the best out of light

12 – 17th May 2014

R package randomForest Erick Towett

Welcome

Outline

• Introduction

• Features of Random Forest (RF)

• How does RF work?

• Usage

• MIRS Random Forest prediction models for soil properties.

• Demo application of RF to MIRS calibration.

Introduction I

• A great variety of statistical procedures have been developed and tested for building calibrations from IR spectroscopic data.

• This area includes:

  • spectral pre-treatments such as derivatives or scatter correction;

  • the procedure used to derive a calibration from the resulting spectral data, such as stepwise, PLS, or principal component regression, and other methods such as neural networks (e.g. references 1 to 4).

References:

1. Williams, P., Norris, K. (Eds.), 1987. Near-Infrared Technology in the Agricultural and Food Industries. Amer. Assoc. of Cereal Chemists, Inc., St. Paul, MN.

2. Williams, P., Norris, K. (Eds.), 2001. Near-Infrared Technology in the Agricultural and Food Industries, 2nd Edition. Amer. Assoc. of Cereal Chemists, Inc., St. Paul, MN.

3. Naes, T., Isaksson, T., Fearn, T., Davies, T., 2002. A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications, Chichester, West Sussex, UK.

4. Westerhaus, M., Workman Jr., J., Reeves III, J.B., Mark, H., 2004. Quantitative analysis. In: Roberts, C.A., Workman Jr., J., Reeves, J.B. (Eds.), Near-Infrared Spectroscopy in Agriculture. American Society of Agronomy, Madison, WI, pp. 133–174 (Chapter 7).

Introduction I

• Minasny & McBratney (2008) examined three methods: PLS; regression rules, which produce regression trees based on linear regression; and Treenet, which creates boosted regression trees.

• They concluded that, in comparison with PLS with spectra pretreatment and Boosted Trees, the regression-rules model provides greater accuracy, is simpler and produces comprehensible equations, provides an optimal variable selection, and respects the upper and lower limits of the data.

Minasny, B., McBratney, A.B., 2008. Regression rules as a tool for predicting soil properties from infrared reflectance spectroscopy. Chemometrics and Intelligent Laboratory Systems 94, 72–79.

Introduction: RF

• randomForest (RF) implements Breiman's random forest algorithm for classification and regression, based on a forest of trees using random inputs.

• Version 5.1

• Depends: R (>= 2.5.0)

• Description: Classification and regression based on a forest of trees using random inputs.

• URL: http://stat-www.berkeley.edu/users/breiman/RandomForests

Reference: Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
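
The idea of "a forest of trees using random inputs" can be sketched in a few lines. The Python example below is only an illustration, not the randomForest R package and not Breiman's reference code: each "tree" is a single-split regression stump (real forests grow much deeper trees), and every name in it (`fit_stump`, `fit_forest`, `mtry`, the synthetic data) is an assumption made for the sketch. Each stump is fitted on a bootstrap sample and may only split on a random subset of the inputs; the forest prediction is the average over stumps.

```python
import random

def avg(values):
    return sum(values) / len(values)

def fit_stump(X, y, features):
    """Best single split (a depth-1 regression tree) over the allowed features."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for j in features:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = avg(left), avg(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no usable split: fall back to a constant prediction
        m = avg(y)
        return (features[0], float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_forest(X, y, n_trees=50, mtry=2, seed=1):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of the cases
        feats = rng.sample(range(p), mtry)          # random subset of the inputs
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    return avg([predict_stump(s, row) for s in forest])

# Synthetic data (hypothetical): only the first input carries signal.
data_rng = random.Random(0)
X = [[data_rng.random() for _ in range(3)] for _ in range(120)]
y = [(5.0 if row[0] > 0.5 else 0.0) + data_rng.gauss(0, 0.2) for row in X]

forest = fit_forest(X, y)
high = predict_forest(forest, [0.9, 0.5, 0.5])
low = predict_forest(forest, [0.1, 0.5, 0.5])
print(round(low, 2), round(high, 2))
```

Stumps that were not offered the informative input predict roughly the overall mean, so the averaged prediction is pulled toward the centre; growing full trees, as a real random forest does, reduces this effect.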

Features of Random Forests

• RF is fast and easy to implement, and produces highly accurate predictions.

• It runs efficiently on large databases.

• It can handle thousands of input variables without variable deletion and without overfitting.

• It gives estimates of variable importance in the classification.

• RF handles complex data types well.

• It obviates the need to transform predictors to approximate normal distributions.

• Generated forests can be saved for future use on other data.

Features of RF

What are the challenges of RF?

• There are many possible alternative nodes.

• Reseeding the random number generator will give different models.

How does RF work?

• The out-of-bag (OOB) error estimate

  • In RF, each tree is constructed using a different bootstrap sample from the original data.

  • About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree.

  • These left-out (OOB) cases are used to get a running unbiased estimate of the classification error as trees are added to the forest.

  • They are also used to get estimates of variable importance.
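
The "about one-third left out" figure follows from sampling with replacement: a given case is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e (roughly 0.37) as n grows. The short Python simulation below checks this numerically; it is a sketch with an illustrative sample size and seed, not part of the randomForest package.

```python
import math
import random

def oob_fraction(n_cases, rng):
    """Fraction of cases never drawn in one bootstrap sample of size n_cases."""
    drawn = {rng.randrange(n_cases) for _ in range(n_cases)}
    return 1 - len(drawn) / n_cases

rng = random.Random(42)
n_cases = 1907  # illustrative sample size
fractions = [oob_fraction(n_cases, rng) for _ in range(200)]

avg_oob = sum(fractions) / len(fractions)   # simulated out-of-bag share
expected = (1 - 1 / n_cases) ** n_cases     # exact probability a case is left out
print(round(avg_oob, 3), round(expected, 3), round(math.exp(-1), 3))
```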

How does RF work?

• RF can output a list of the predictor variables that are important in predicting the outcome.

• The randomForest package in R has two measures of importance. One is the "total decrease in node impurities from splitting on the variable, averaged over all trees". The other is based on a permutation test: the increase in out-of-bag prediction error when the values of the variable are randomly permuted.
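
The permutation idea can be illustrated independently of any forest: shuffle one predictor, re-predict, and see how much the error grows. The Python sketch below is a simplified version under stated assumptions. It uses a fixed toy model and a single data set, whereas randomForest performs the permutation tree by tree on the out-of-bag cases; `toy_model`, `permutation_importance`, and the data are hypothetical.

```python
import random

def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, rng, n_repeats=10):
    """Mean increase in MSE when each predictor column is shuffled in turn."""
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

def toy_model(row):
    # Stand-in for a fitted model: uses the first predictor, ignores the second.
    return 3.0 * row[0]

imp = permutation_importance(toy_model, X, y, rng)
print([round(v, 3) for v in imp])
```

Shuffling the predictor the model relies on raises the error sharply, while shuffling the ignored predictor leaves it unchanged, which is the signal the permutation measure reports.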

Usage

Ongoing:

• Analysis of MIRS randomForest prediction models for soil properties.

• This work attempts to offer an in-depth analysis of random forest models for the prediction of a number of soil properties using MIR spectroscopy.

Materials and Methods

• 1907 soil samples were scanned with an MIR spectrometer at a resolution of 4 cm-1.

• The 1st derivative of the spectra over the range 601.7-4001.6 cm-1 was calculated with a smoothing interval of 21 data points using the soil.spec package in R.

• An RF model with out-of-bag (OOB) validation was built to predict the reference properties from the MIRS 1st derivative spectra using the entire data set.
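
To make the pre-treatment step concrete, here is a minimal Python sketch of a smoothed first derivative. It assumes a plain 21-point moving average followed by a central difference; this is one simple way to do it and is not claimed to be what the soil.spec package does internally. The wavenumber spacing and the synthetic spectrum are also assumptions.

```python
def moving_average(values, window):
    """Centred moving average; drops window // 2 points at each end."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]

def first_derivative(wavenumbers, absorbance, window=21):
    """Smooth with an odd-length window, then take a central difference."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    smooth = moving_average(absorbance, window)
    x = wavenumbers[half:len(wavenumbers) - half]
    deriv = [(smooth[i + 1] - smooth[i - 1]) / (x[i + 1] - x[i - 1])
             for i in range(1, len(smooth) - 1)]
    return x[1:-1], deriv

# Synthetic spectrum (hypothetical): a straight line with slope 0.001,
# sampled every 2 cm-1 starting at 601.7 cm-1.
wavenumbers = [601.7 + 2.0 * i for i in range(1700)]
absorbance = [0.001 * w + 0.5 for w in wavenumbers]

x_d, deriv = first_derivative(wavenumbers, absorbance, window=21)
print(len(x_d), round(deriv[0], 6))
```

On a straight-line spectrum the derivative should recover the slope everywhere, which is a convenient sanity check before applying the function to real spectra.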

Preliminary Results

[Figure panels: band plot of raw spectra; band plot of first-derivative spectra (axes: response, wavenumber); importance plot]

Figure caption: Soil organic carbon predicted from MIR versus reference values for the AfSIS baseline, fitted using Random Forests: (a) calibration model results and (b) out-of-bag validation results. The validation samples lying far from the 1:1 line indicate soil types for which more samples need to be added to the calibration library.
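
A predicted-versus-reference comparison like the one in the caption is usually summarised with RMSE and R-squared, and samples far from the 1:1 line can be listed by their residuals. The Python sketch below shows one way to do this; the values and the residual threshold are made up for illustration and are not results from the course data.

```python
import math

def rmse(reference, predicted):
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference))

def r_squared(reference, predicted):
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1 - ss_res / ss_tot

def far_from_one_to_one(reference, predicted, threshold):
    """Indices of samples whose prediction is more than `threshold` from the 1:1 line."""
    return [i for i, (r, p) in enumerate(zip(reference, predicted)) if abs(p - r) > threshold]

# Hypothetical values, for illustration only.
reference = [0.5, 1.2, 2.0, 3.1, 4.0]
predicted = [0.6, 1.1, 2.2, 1.5, 3.9]

flagged = far_from_one_to_one(reference, predicted, threshold=1.0)
print(round(rmse(reference, predicted), 3), round(r_squared(reference, predicted), 3), flagged)
```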

Demo:

R package randomForest

R package randomForest

Thank you for your attention

