Date post: 17-Jan-2016
Page 1:

Chimiometrie 2009

Proposed model for Challenge2009

Patrícia Valderrama

pativalderrama@gmail.com

patricia.valderrama@agroparistech.fr

Page 2:

1st step) Model development

Model                                              RMSEC    R²
Mean center, 20 LV                                 2.7019   0.8325
Mean center, 20 LV + 1st derivative                1.2991   0.9613
Mean center, 20 LV + baseline                      2.8022   0.8198
Mean center, 20 LV + smoothing                     2.7779   0.8229
Mean center, 20 LV + smoothing + 1st derivative    1.7005   0.9336

(LV = latent variables)

Variable selection: Genetic Algorithm (GA) and iPLS

Note: The RMSEP cannot be estimated, because it requires reference values for X_TST.

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{I} \left( y_{\mathrm{ref},i} - y_{\mathrm{est},i} \right)^2}{I}}$$

y_ref is the reference value, y_est is the value estimated by the model, and I is the number of samples.
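The RMSEP definition can be sketched in a few lines of NumPy. This is only an illustration: the function name `rmsep` and its arguments are assumptions, not part of the original slides, and it can only be evaluated once reference values for the test set are available.

```python
import numpy as np

def rmsep(y_ref, y_est):
    """Root mean square error of prediction over I samples.

    y_ref: reference values; y_est: values estimated by the model.
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    # Mean of the squared differences, i.e. the sum divided by I
    return float(np.sqrt(np.mean((y_ref - y_est) ** 2)))
```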

Page 3:

2nd step) Models with variable selection

Model                       RMSEC    R²
GA, mean center, 20 LV      1.2940   0.9616
iPLS, 10 intervals, 15 LV   3.7139   0.6674
iPLS, 5 intervals, 14 LV    2.0182   0.9008
iPLS, 3 intervals, 19 LV    1.6438   0.9374
iPLS, 2 intervals, 16 LV    1.2526   0.9625

Best model: iPLS, 2 intervals, 16 LV

Page 4:

3rd step) Outlier detection for the best model

Model                                                RMSEC    R²
iPLS, 2 intervals, 16 LV                             1.2526   0.9625
iPLS, 2 intervals, 16 LV, after outlier detection    0.8320   0.9834

Best model, optimized: iPLS, 2 intervals, 16 LV, after outlier detection

Outlier detection in the calibration matrix was based on:
Extreme leverages (zero outliers)
Unmodeled residuals in spectra (zero outliers)
Unmodeled residuals in dependent variables (7 outliers)
Total outliers in calibration = 7

Outlier detection in the validation matrix was based on:
Extreme leverages (129 outliers)
Unmodeled residuals in spectra (106 outliers)
Total outliers in validation = 153

Page 5:

3rd step) Outlier detection in the calibration and validation matrices

Based on:

Extreme leverages: Leverage represents how distant a sample is from the center of the data.

$$h_i = \hat{\mathbf{t}}_{A,i}^{T} \left( \hat{\mathbf{T}}_{A}^{T} \hat{\mathbf{T}}_{A} \right)^{-1} \hat{\mathbf{t}}_{A,i}$$

$$h_i > \frac{3(A+1)}{n}$$

where T represents the scores of all calibration samples, t_i is the score vector of a particular sample, A is the number of latent variables, and n is the number of samples.

According to ASTM E1655-00, samples with a leverage h_i higher than the limit value should be removed from the calibration set.
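A minimal sketch of the leverage test, assuming the score matrix from an already fitted PLS model is available as a NumPy array and that the limit is 3(A+1)/n. The function name `leverage_outliers` and the optional `T_new` argument (scores of validation samples projected onto the calibration model) are illustrative assumptions.

```python
import numpy as np

def leverage_outliers(T_cal, T_new=None):
    """Leverages h_i = t_i^T (T^T T)^-1 t_i and the 3(A+1)/n limit.

    T_cal: (n, A) calibration scores.
    T_new: optional (m, A) scores of other samples (e.g. validation).
    Returns (h, limit, flags), where flags marks h > limit.
    """
    T_cal = np.asarray(T_cal, dtype=float)
    n, A = T_cal.shape
    gram_inv = np.linalg.inv(T_cal.T @ T_cal)
    T = T_cal if T_new is None else np.asarray(T_new, dtype=float)
    # Row-wise quadratic form t_i^T (T^T T)^-1 t_i
    h = np.einsum("ij,jk,ik->i", T, gram_inv, T)
    limit = 3.0 * (A + 1) / n
    return h, limit, h > limit
```

For the calibration set the function is called with `T_cal` only; for the validation set the validation scores are passed as `T_new`, so the limit still depends on the number of calibration samples.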

Page 6:

3rd step) Outlier detection in the calibration and validation matrices

Based on:

Unmodeled residuals in spectra: Outliers are identified by comparing the total standard deviation of the spectral residuals, s(ê), with the standard deviation of the residuals of a particular sample, s(ê_i):

$$s(\hat{e}_i)^2 = \frac{n}{nJ - J - A\max(n,J)} \sum_{j=1}^{J} \left( x_{i,j} - \hat{x}_{i,j} \right)^2$$

$$s(\hat{e})^2 = \frac{1}{nJ - J - A\max(n,J)} \sum_{i=1}^{n} \sum_{j=1}^{J} \left( x_{i,j} - \hat{x}_{i,j} \right)^2$$

n = number of samples
J = number of variables
A = number of latent variables
x_{i,j} = absorbance value of sample i at wavelength j
x̂_{i,j} = value estimated with A latent variables

If a sample presents s(ê_i) > 2 s(ê), it should be removed from the calibration set.
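The spectral residual test can be sketched as follows, assuming the spectra reconstructed by the model with A latent variables are already available. The function name `spectral_residual_outliers` and its arguments are illustrative assumptions.

```python
import numpy as np

def spectral_residual_outliers(X, X_hat, A):
    """Flag samples whose spectral residual s(e_i) exceeds 2 * s(e).

    X: (n, J) measured spectra; X_hat: (n, J) spectra estimated with
    A latent variables. Returns (s_i, s_total, flags).
    """
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    n, J = X.shape
    dof = n * J - J - A * max(n, J)  # shared denominator of both formulas
    sq = (X - X_hat) ** 2
    s_total = np.sqrt(sq.sum() / dof)
    s_i = np.sqrt(n * sq.sum(axis=1) / dof)
    return s_i, s_total, s_i > 2.0 * s_total
```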

Page 7:

3rd step) Outlier detection in the calibration matrix

Based on:

Unmodeled residuals in dependent variables: Outliers are identified by comparing the root mean square error of calibration (RMSEC) with the absolute error of each sample.

$$\mathrm{RMSEC} = \sqrt{\frac{1}{n - A - 1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

$$\left| y_i - \hat{y}_i \right| > 2\,\mathrm{RMSEC}$$

n = number of samples
A = number of latent variables
y_i = reference value for sample i
ŷ_i = estimated value for sample i

If the difference between the reference value (y_i) of a sample and its estimate (ŷ_i) is larger than 2 times the RMSEC, the sample is identified as an outlier.
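A short sketch of this test, assuming the RMSEC uses n − A − 1 degrees of freedom and that the calibration estimates are already available. The function name `y_residual_outliers` is an illustrative assumption.

```python
import numpy as np

def y_residual_outliers(y_ref, y_hat, A):
    """Flag samples with |y_i - y_hat_i| > 2 * RMSEC.

    RMSEC is computed with n - A - 1 degrees of freedom.
    Returns (rmsec, flags).
    """
    y_ref = np.asarray(y_ref, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y_ref.size
    err = y_ref - y_hat
    rmsec = float(np.sqrt(np.sum(err ** 2) / (n - A - 1)))
    return rmsec, np.abs(err) > 2.0 * rmsec
```

In practice the model is usually refitted after the flagged samples are removed, which is consistent with the lower RMSEC reported for the optimized model.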

Page 8:

4th step) Figures of merit for the optimized best model

• Accuracy
• Fit
• Precision: impossible to estimate, because it requires replicates of the validation samples
• Sensitivity
• Analytical sensitivity
• Selectivity
• Linearity
• Limit of detection (LOD)
• Limit of quantification (LOQ)
• Signal-to-noise ratio

Page 9:

4th step) Figures of merit for the optimized best model

Accuracy: This parameter reports the closeness of agreement between the reference value and the value found by the calibration model. In chemometrics, it is generally expressed as the root mean square error of calibration (RMSEC) or prediction (RMSEP). However, RMSEP is a global parameter that incorporates both systematic and random errors. Hence, an F-test on the RMSEC/RMSEP of two methods is not appropriate for comparing accuracy. A better indicator is the regression of found versus nominal concentration values, with estimation of the slope and intercept of the linear regression and consideration of the elliptical joint confidence region.

The ellipses contain the ideal point (1, 0), for slope and intercept respectively, showing that the reference calibration values and the PLS results do not differ significantly at the 99% confidence level.
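The slides do not give the equation of the joint confidence region. The sketch below assumes the usual ordinary-least-squares form, in which a point (β0, β1) lies inside the ellipse when (β − b)ᵀ XᵀX (β − b) ≤ 2 s² F, with b the fitted intercept and slope, s² the residual variance, and F the critical value of the F distribution with 2 and n − 2 degrees of freedom. The function name and the `f_crit` parameter (to be taken from an F table or a statistics library) are assumptions.

```python
import numpy as np

def ejcr_contains_ideal_point(nominal, found, f_crit):
    """Check whether slope = 1, intercept = 0 lies inside the
    elliptical joint confidence region of found-vs-nominal regression.

    f_crit: critical F value for (2, n - 2) degrees of freedom at the
    chosen confidence level, supplied by the caller.
    Returns (inside, (intercept, slope)).
    """
    x = np.asarray(nominal, dtype=float)
    y = np.asarray(found, dtype=float)
    n = x.size
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # [intercept, slope]
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)                  # residual variance
    d = np.array([0.0, 1.0]) - coef               # ideal point minus estimate
    stat = d @ (X.T @ X) @ d
    return bool(stat <= 2.0 * s2 * f_crit), (float(coef[0]), float(coef[1]))
```

Passing `f_crit` explicitly keeps the sketch free of extra dependencies; for ten samples at the 99% level, for example, the tabulated value is about 8.65.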

Page 10:

4th step) Figures of merit for the optimized best model

Fit:

Net analyte signal versus reference values: pseudo-univariate presentation of the multivariate calibration model

Page 11:

4th step) Figures of merit for the optimized best model

Sensitivity: This parameter is the fraction of the analytical signal due to a unit increase in the concentration of a particular analyte.

$$\hat{\mathbf{s}}_{k}^{nas} = \frac{\hat{\mathbf{x}}_{A,k,i}^{nas}}{y_i}$$

$$\widehat{\mathrm{SEN}} = \lVert \hat{\mathbf{s}}_{k}^{nas} \rVert = \frac{1}{\lVert \mathbf{b} \rVert} = 2.3932 \times 10^{-5}$$

Analytical sensitivity: The inverse of this parameter reports the minimum concentration difference between two samples that can be determined by the model, considering that the spectral noise is the largest source of error.

$$\gamma = \frac{\widehat{\mathrm{SEN}}}{\delta x} = 0.5737$$

The minimum concentration difference between two samples that can be determined by the model is γ⁻¹ = 1.7431.

Here b is the vector of regression coefficients and δx is the instrumental noise.
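Both quantities follow directly from the regression vector and a noise estimate. The sketch below assumes sensitivity is taken as 1/‖b‖ and analytical sensitivity as the sensitivity divided by the instrumental noise; the function names, and the way the noise estimate is obtained, are assumptions not covered by the slides.

```python
import numpy as np

def sensitivity(b):
    """Sensitivity as the inverse norm of the regression vector b."""
    return 1.0 / float(np.linalg.norm(np.asarray(b, dtype=float)))

def analytical_sensitivity(sen, noise):
    """Analytical sensitivity: sensitivity divided by instrumental noise.

    Its inverse is the smallest concentration difference the model
    can distinguish.
    """
    return sen / noise
```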

Page 12:

4th step) Figures of merit for the optimized best model

Selectivity: Fraction of the signal that is used in the quantification.

$$\mathrm{SEL}_i = \frac{\widehat{nas}_i}{\lVert \mathbf{x}_i \rVert}$$

SEL = 0.21

Linearity: In multivariate calibration, a linear model should present errors with random behavior.

Page 13:

4th step) Figures of merit for the optimized best model

Limit of detection (LOD): Following IUPAC recommendations, the LOD can be defined as the minimum detectable value of the net signal (or concentration).

$$\mathrm{LOD} = 3.3\,\delta x\,\lVert \mathbf{b} \rVert = 3.3\,\delta x\,\frac{1}{\widehat{\mathrm{SEN}}} = 5.7518$$

Limit of quantification (LOQ): The ability to quantify is generally expressed in terms of the signal or analyte concentration value that will produce estimates with a specified relative standard deviation, usually 10%.

$$\mathrm{LOQ} = 10\,\delta x\,\lVert \mathbf{b} \rVert = 10\,\delta x\,\frac{1}{\widehat{\mathrm{SEN}}} = 17.4296$$
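A minimal sketch of both limits, assuming factors of 3.3 and 10 applied to the ratio of instrumental noise to sensitivity. The factor 3.3 is an assumption inferred from the reported values, whose ratio 5.7518 / 17.4296 is 0.33. The function name is illustrative.

```python
def detection_limits(sen, noise, k_lod=3.3, k_loq=10.0):
    """Return (LOD, LOQ) as k * noise / sensitivity.

    sen: sensitivity (1 / norm of the regression vector).
    noise: instrumental noise estimate.
    k_lod, k_loq: multiplying factors, assumed to be 3.3 and 10.
    """
    ratio = noise / sen  # equals the inverse analytical sensitivity
    return k_lod * ratio, k_loq * ratio
```

With the inverse analytical sensitivity of 1.7431 reported earlier, these factors give roughly 5.75 and 17.43, in line with the LOD and LOQ values on this slide.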

Page 14:

4th step) Figures of merit for the optimized best model

Signal-to-noise ratio: How much larger the net analyte signal is than the instrumental noise.

$$(S/N)_i = \frac{\widehat{nas}_i}{\delta x}$$

Max = 26.1264

Min = 9.5815
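Selectivity and signal-to-noise ratio are both per-sample ratios built from the scalar net analyte signal (NAS). The sketch below assumes the NAS values have already been computed by some NAS routine, which the slides do not describe; the function names and arguments are illustrative assumptions.

```python
import numpy as np

def selectivity(nas, X):
    """Per-sample selectivity: scalar NAS divided by the spectrum norm.

    nas: (n,) scalar net analyte signal of each sample.
    X: (n, J) spectra of the same samples.
    """
    nas = np.asarray(nas, dtype=float)
    X = np.asarray(X, dtype=float)
    return nas / np.linalg.norm(X, axis=1)

def signal_to_noise(nas, noise):
    """Per-sample signal-to-noise ratio: scalar NAS over instrumental noise."""
    return np.asarray(nas, dtype=float) / noise
```

Taking `.max()` and `.min()` of the signal-to-noise array yields the kind of range reported on this slide.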

