Lidwine Grosmaire, Pedro Maldonado-Alvarado, Christelle Reynès, Robert Sabatier, Dominique Dufour, Thierry Tran and Jean-Louis Delarbre
Joint Selection of Wavenumber Regions for Mid-IR and Raman Spectra and Variables in PLS Regression Using Genetic Algorithms
The Context
Empirical and small-scale processing
Good breadmaking ability
Interest
Increased production and consumption of cassava
From the crop to the starch
2012 EFFoST Annual Meeting, 20-23 November 2012, Montpellier, France
Varietal and process impacts on breadmaking ability
Standardize and scale up the process
Physicochemical parameters
Improve product quality
Industrial development of new gluten-free bread products
The Aim
The Data
Physicochemical parameters
Spectral data
Predicting the expansion ability
[Diagram: PLS1 model predicting Y from the IR (X2) and Raman (X3, 4562 variables) blocks]
[Figure: Raman spectra; x-axis: Raman shift (cm⁻¹), 500 to 3500; y-axis: intensity (counts)]
[Diagram: RVA measurements; selGAmPLS selects the relevant parameters from the blocks X1 (13 variables) and X2 (IR, 3351 variables)]
Breadmaking ability
Amylose content
RVA parameters (12 variables)
Mid-infrared spectra (3351 variables)
Raman spectra (4562 variables)
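As a quick consistency check, the three explanatory blocks add up to the 7926 variables used by full PLS (13 + 3351 + 4562). The sketch below computes where each block would sit in a column-wise concatenation; the block names, the helper `block_offsets`, and the order physicochemical, mid-IR, Raman are our own assumptions for illustration.

```python
# Hypothetical block layout (names are ours): the explanatory blocks are
# assumed to be concatenated column-wise in this order.
BLOCKS = [
    ("physicochemical", 1 + 12),  # amylose content + 12 RVA parameters
    ("mid_ir", 3351),             # mid-infrared spectrum
    ("raman", 4562),              # Raman spectrum
]

def block_offsets(blocks):
    """Map each block name to its [start, stop) column range in the concatenated matrix."""
    offsets, start = {}, 0
    for name, width in blocks:
        offsets[name] = (start, start + width)
        start += width
    return offsets

offsets = block_offsets(BLOCKS)
total = sum(width for _, width in BLOCKS)
print(offsets)  # {'physicochemical': (0, 13), 'mid_ir': (13, 3364), 'raman': (3364, 7926)}
print(total)    # 7926
```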
How can breadmaking ability be explained from our data with a statistical regression method, while selecting variables of different types: individual variables and intervals?
The explanatory variables are organized in a multitable in which intervals and individual variables are selected in order to predict one variable of interest: the breadmaking ability.
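One way to picture such a mixed selection is a chromosome holding one bit per individual variable plus a list of (start, width) windows per spectral block. This encoding is an assumption made for illustration, not the documented selGAmPLS representation; `Chromosome`, `selected_columns` and `OFFSETS` are hypothetical names, and the offsets assume the block order physicochemical, mid-IR, Raman.

```python
from dataclasses import dataclass, field

# Column ranges [start, stop) assumed for the concatenated multitable
# (order physicochemical, mid-IR, Raman; names are hypothetical).
OFFSETS = {"physicochemical": (0, 13), "mid_ir": (13, 3364), "raman": (3364, 7926)}

@dataclass
class Chromosome:
    """Hypothetical mixed encoding: bits for individual variables, windows for spectra."""
    bits: list                                     # one 0/1 gene per physicochemical variable
    intervals: dict = field(default_factory=dict)  # block -> [(start, width), ...]

def selected_columns(chrom, offsets, individual_block="physicochemical"):
    """Translate a chromosome into sorted column indices of the concatenated matrix."""
    cols = set()
    first, _ = offsets[individual_block]
    cols.update(first + j for j, bit in enumerate(chrom.bits) if bit)
    for block, windows in chrom.intervals.items():
        lo, hi = offsets[block]
        for start, width in windows:
            # clip each window to its own block so it never spills into a neighbour
            cols.update(range(lo + max(0, start), min(hi, lo + start + width)))
    return sorted(cols)

chrom = Chromosome(
    bits=[1, 1, 0, 0, 1, 1] + [0] * 7,  # 4 of the 13 physicochemical variables
    intervals={"mid_ir": [(100, 20)], "raman": [(4550, 50)]},
)
cols = selected_columns(chrom, OFFSETS)
print(len(cols))  # 36 (4 individual variables + 20 IR points + 12 Raman points after clipping)
```

Clipping each window to its own block keeps intervals meaningful: a spectral region never silently continues into another instrument's variables.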
A Genetic Algorithm (GA), originally developed in a discrimination context, was combined with the PLS1 method; the resulting method is called selGAmPLS.
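The GA side can be sketched as a generic loop over 0/1 chromosomes with a pluggable fitness. In selGAmPLS the fitness would presumably be a PLS1 quality criterion, such as the cross-validated R² of the model built on the selected variables; here a toy fitness stands in so the sketch stays short. Tournament selection, uniform crossover, bit-flip mutation, elitism and all parameter values (`pop_size`, `n_gen`, `p_mut`) are our assumptions, not the poster's settings.

```python
import random

def tournament(pop, fitness, rng):
    """Binary tournament: draw two individuals, keep the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_genes, pop_size=30, n_gen=50, p_mut=0.02, seed=0):
    """Maximize `fitness` over 0/1 chromosomes; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(n_gen):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            # uniform crossover, then independent bit-flip mutation
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy stand-in for a PLS1 criterion (hypothetical): genes 2, 5 and 7 are "informative",
# every other selected gene costs a small penalty.
INFORMATIVE = {2, 5, 7}

def toy_fitness(chrom):
    hits = sum(chrom[i] for i in INFORMATIVE)
    return hits - 0.1 * (sum(chrom) - hits)

best, history = run_ga(toy_fitness, n_genes=12)
print(best, history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next.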
The Results
Fig 2: Characteristics of the final GA populations; selected variables are indicated by black points.
Table 1: Comparison with the results of other methods (number of selected variables, number of retained PLS components, R² and cross-validation R²).
The 10 final populations are quite close, indicating global convergence of the GA.
Each individual population appears to have converged.
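Convergence of this kind can be quantified simply: within a population, by how strongly individuals agree with the per-gene majority, and between populations, by how much their majority selections overlap (Jaccard index). The functions below are a hypothetical sketch of such diagnostics, not the criteria used to produce Fig 2; the 0.5 majority threshold and the tiny example populations are assumptions.

```python
from itertools import combinations

def consensus(population, threshold=0.5):
    """Indices of genes selected by more than `threshold` of the individuals."""
    n = len(population)
    return {j for j, col in enumerate(zip(*population)) if sum(col) / n > threshold}

def within_agreement(population):
    """Mean per-gene agreement with the majority value (1.0 = all individuals identical)."""
    n = len(population)
    scores = [max(sum(col), n - sum(col)) / n for col in zip(*population)]
    return sum(scores) / len(scores)

def jaccard(a, b):
    """Overlap of two index sets; two empty selections count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def between_similarity(populations):
    """Mean pairwise Jaccard index between the consensus selections of populations."""
    cons = [consensus(p) for p in populations]
    pairs = list(combinations(cons, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Two tiny hypothetical final populations (3 individuals x 4 genes each)
pops = [
    [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]],
    [[1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 0, 0]],
]
print(between_similarity(pops))  # 1.0
print(within_agreement(pops[0]))
```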
Method       # var   # comp   R²       R²CV
PLS          7926    7        0.7836   0.6605
PLS + VIP    4       3        0.7210   0.6650
selGAmPLS    311     12       0.9936   0.8273
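The R² and R²CV columns rest on a PLS1 model. The pure-Python sketch below fits PLS1 by NIPALS on centered data, predicts by sequential deflation, and computes R² on the training set and a cross-validated R² as 1 - PRESS/SS_tot. The poster does not state the cross-validation scheme; leave-one-out is our assumption, and all function names are hypothetical.

```python
def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS on mean-centered data; returns means, weights, loadings, y-loadings."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [yi - y_mean for yi in y]
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]  # w ~ E'f
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        W.append(w); P.append(p); Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}

def pls1_predict(model, x):
    """Predict one sample by applying the same sequential deflation as in fitting."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    yhat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * pj for ej, pj in zip(e, p)]
        yhat += q * t
    return yhat

def r2(y, yhat):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_tot = sum((v - ybar) ** 2 for v in y)
    return 1 - sum((v - h) ** 2 for v, h in zip(y, yhat)) / ss_tot

def r2_cv_loo(X, y, n_comp):
    """Cross-validated R² (1 - PRESS / SS_tot) with leave-one-out (assumed scheme)."""
    preds = []
    for i in range(len(y)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp)
        preds.append(pls1_predict(model, X[i]))
    return r2(y, preds)

X = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 0.2],
     [4.0, 3.0, 2.0], [5.0, 6.0, 1.1], [6.0, 5.0, 0.7]]
y = [2 * a - b + 0.5 * c for a, b, c in X]  # exactly linear toy response
model = pls1_fit(X, y, n_comp=3)
fit_r2 = r2(y, [pls1_predict(model, x) for x in X])
cv_r2 = r2_cv_loo(X, y, n_comp=3)
print(round(fit_r2, 6), round(cv_r2, 6))
```

On exactly linear toy data, using as many components as independent columns should give both values close to 1. On real spectra, the gap between R² and R²CV (as in Table 1) is the usual sign of how much a model overfits.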
Physicochemical parameters
4 RVA parameters
Peak viscosity, Holding strength
Breakdown, Relative breakdown
Spectroscopic variables
4 spectral regions
Crystalline order of starch
Selected variables
Conclusion
Genetic algorithms provide a highly adaptable and efficient solution when dealing both with several kinds of variable selection (individual variables vs intervals) and with multitable data.
The results are promising for predictive use: selGAmPLS reaches a cross-validated R² of 0.83, against 0.66 for PLS on all 7926 variables. In terms of interpretation, the method highlighted the importance of some physicochemical variables and selected a small number of short intervals in the spectroscopic data.
The selected variables are related to the water absorption capacity and the crystalline state of starch, which play a key role in breadmaking ability.
7926 variables
The Problem
Fig 1: Selection of relevant intervals and variables in multitables with a GA, for modelling Y with PLS.