Model-based Classification in Food Authenticity Studies
D. Toher1,2, G. Downey1 and T.B. Murphy2
Presented by: Deirdre Toher
1 Ashtown Food Research Centre, Teagasc (formerly The National Food Centre), Dublin 15
2 Dept of Statistics, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2
Outline
• Food authenticity
• Spectroscopic data
• Current mathematical methods
• Proposed alternative
  – Dimension reduction
  – Model-based clustering
  – Updating
• Example near-infrared data with results
Food Authenticity – what and why?
• Detecting when foods are not what they are claimed to be
• Tampering/adulteration, mislabelling
• Economic fraud worth millions of US dollars globally
• Promote quality products
• Build consumer trust
Food Authenticity – how?
• Near infrared spectroscopy
  – Non-invasive
  – Relatively inexpensive
• Multivariate Mathematics
  – Partial Least Squares Regression
  – Factorial Discriminant Analysis
  – Model-based Clustering
• Other methods available (sp..)
Spectroscopic Data
• Near infrared transflectance spectroscopy
  – High dimensional data
  – Range 1100-2498 nm, reading every 2 nm
  – 700 values for each sample
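The figure of 700 values per sample follows directly from the stated range and step. A minimal check, assuming both end points (1100 nm and 2498 nm) are included in the readings:

```python
# Wavelength grid implied by the slide: 1100 nm to 2498 nm, one reading every 2 nm.
# Assumption: both end points are recorded.
START_NM = 1100
STOP_NM = 2498
STEP_NM = 2

wavelengths = list(range(START_NM, STOP_NM + STEP_NM, STEP_NM))
n_values = len(wavelengths)
print(n_values)  # 700
```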
Current Mathematical Methods
• Discriminant Partial Least Squares Regression
• Factorial Discriminant Analysis
Problem?
  – Limited to “two-group” classification problems
  – No quantification of certainty
Proposed Alternative
Model-based clustering
  – Expansion of discriminant analysis
  – Allows clusters to vary in shape and size
  – Gives probability of a sample being in each cluster/group
  – Can classify situations with more than two groupings
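To illustrate how a mixture model yields a probability of membership for each group, here is a minimal one-dimensional sketch. The three groups, their weights, means and standard deviations are hypothetical numbers chosen for illustration; they are not estimates from the honey data, and the real models work on multivariate data with full covariance structures.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def membership_probabilities(x, groups):
    """groups: list of (weight, mean, sd); returns one posterior probability per group."""
    weighted = [w * normal_pdf(x, m, s) for (w, m, s) in groups]
    total = sum(weighted)
    return [v / total for v in weighted]

# Hypothetical three-group model; each group has its own spread ("size").
groups = [(0.5, 0.0, 1.0), (0.3, 3.0, 0.5), (0.2, 6.0, 2.0)]

probs = membership_probabilities(2.0, groups)
label = max(range(len(probs)), key=probs.__getitem__)  # most probable group
print(probs, label)
```

The sample is assigned to the most probable group, but the full vector of probabilities is kept, which is what allows certainty to be quantified.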
Possible Cluster Shapes
The Dimensionality Problem
• Model-based clustering requires dimension reduction
  – for efficient computation
  – to prevent singular covariance matrices
• Use wavelet analysis with thresholding
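The slides do not say which wavelet family or thresholding rule was used. As a rough sketch of the idea, the example below assumes an orthonormal Haar transform with hard thresholding, and a toy signal whose length is a power of two (a 700-point spectrum would first need padding or truncation). Coefficients below the threshold are set to zero, and only the surviving coefficients are kept as the reduced representation.

```python
import math

def haar_transform(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two."""
    coeffs = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2.0) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
        coeffs = detail + coeffs  # coarser details go in front of finer ones
    return approx + coeffs

def hard_threshold(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]

# Toy "spectrum": two flat regions with small wiggles (illustrative values only).
signal = [4.0, 4.0, 4.1, 3.9, 8.0, 8.0, 7.9, 8.1]

coeffs = haar_transform(signal)
kept = hard_threshold(coeffs, threshold=0.5)  # threshold value is an assumption
kept_indices = [i for i, c in enumerate(kept) if c != 0.0]
print(len(signal), "values reduced to", len(kept_indices), "coefficients")
```

Here the eight input values are summarised by the overall level and the single large step between the two regions; the small wiggles fall below the threshold and are discarded.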
EM Algorithm & Updating
• EM algorithm
  – E-step: computes the expected value of the log-likelihood function
  – M-step: maximises that expected value
  – commonly used in statistics for estimating missing values
• Updating
  – uses previous estimates of labels as a starting point for iteration
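One possible reading of the two bullets above, sketched for a one-dimensional, two-group Gaussian model: the labels of the test samples are treated as missing values, the starting parameters come from the labelled training data alone, and each iteration re-estimates the test labels (E-step) and then the parameters from labelled and unlabelled data together (M-step). The function names, the toy data and the fixed iteration count are all assumptions made for illustration; this is not the authors' implementation.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def m_step(xs, resp, n_groups):
    """Weighted maximum-likelihood estimates of (weight, mean, sd) for each group."""
    params = []
    for g in range(n_groups):
        total = sum(r[g] for r in resp)
        mean = sum(r[g] * x for r, x in zip(resp, xs)) / total
        var = sum(r[g] * (x - mean) ** 2 for r, x in zip(resp, xs)) / total
        params.append((total / len(xs), mean, math.sqrt(var)))
    return params

def e_step(xs, params):
    """Posterior probability of each group for every sample."""
    resp = []
    for x in xs:
        weighted = [w * normal_pdf(x, m, s) for w, m, s in params]
        total = sum(weighted)
        resp.append([v / total for v in weighted])
    return resp

def em_with_updating(train_x, train_y, test_x, n_groups=2, n_iter=50):
    # Known labels are encoded as 0/1 membership and never change.
    known = [[1.0 if y == g else 0.0 for g in range(n_groups)] for y in train_y]
    params = m_step(train_x, known, n_groups)  # start from labelled data only
    for _ in range(n_iter):
        guess = e_step(test_x, params)         # current estimates of missing labels
        params = m_step(train_x + test_x, known + guess, n_groups)
    return params, e_step(test_x, params)

# Toy data (illustrative only): group 0 near 0, group 1 near 5.
train_x = [0.1, -0.2, 0.3, 4.8, 5.1, 5.3]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [0.0, 0.4, -0.5, 5.0, 4.6, 5.5]

params, posteriors = em_with_updating(train_x, train_y, test_x)
predicted = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(predicted)
```

Because the unlabelled samples contribute to the parameter estimates, this kind of scheme would be expected to matter most when the labelled training set is small.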
Example: Honey Adulteration
• Irish honey extended with
  – fructose:glucose mixtures
  – fully inverted beet syrup
  – high fructose corn syrup
• Total of 478 spectra: 157 pure and 321 adulterated
  – 225 with fructose:glucose mixtures
  – 56 with fully inverted beet syrup
  – 40 with high fructose corn syrup
Classification Achieved
Classification rates on test set data achieved with correct proportions of each type of adulterant in the training set for “pure or adulterated” question.
Training / Test    EM               EM & Updating
50% / 50%          94.72% (1.12)    94.43% (1.10)
25% / 75%          93.22% (1.08)    93.05% (1.03)
10% / 90%          90.82% (1.76)    92.22% (1.11)
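A training set with "correct proportions of each type of adulterant" can be drawn by splitting each group separately. The sketch below uses the group counts from the honey example; the splitting routine, the seed and the rounding rule are assumptions, since the slides do not describe how the random splits were generated.

```python
import random

# Group sizes taken from the honey example (478 spectra in total).
COUNTS = {"pure": 157, "fructose:glucose": 225, "beet syrup": 56, "corn syrup": 40}

def stratified_split(labels, train_fraction, rng):
    """Split indices so each group contributes about train_fraction to training."""
    by_group = {}
    for index, label in enumerate(labels):
        by_group.setdefault(label, []).append(index)
    train, test = [], []
    for members in by_group.values():
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)

labels = [name for name, n in COUNTS.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels, 0.5, random.Random(0))
print(len(train_idx), len(test_idx))
```

Repeating such a split with different seeds and recording the test-set classification rate each time gives a distribution of rates that can then be summarised.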
Classification Achieved
Classification rates on test set data achieved with correct proportions of pure / adulterated in the training set for “pure or adulterated” question.
Training / Test    EM               EM & Updating
50% / 50%          94.38% (1.16)    94.11% (0.89)
25% / 75%          93.50% (1.08)    93.03% (1.02)
10% / 90%          90.54% (1.80)    92.05% (1.09)
Classification Achieved
Classification rates on test set data achieved using 50% training, 50% test data with correct proportion of pure / adulterated in the training data set for “type of adulteration” question.
Question                EM               EM & Updating
Pure or adulterated?    91.09% (1.40)    90.64% (1.36)
Type of adulteration    86.23% (1.20)    84.12% (1.67)
Classification Achieved
Classification rates on test set data achieved using 50% training, 50% test data with correct proportions of each type of adulterant in the training set for “type of adulteration” question.
Question                EM               EM & Updating
Pure or adulterated?    89.41% (1.76)    88.61% (1.82)
Type of adulteration    85.70% (1.96)    83.57% (2.23)
Probability v Accurate Classification
Probability of group membership - by colour (black being pure, red being adulterated)
Conclusions
• EM algorithm gives a method of predicting group membership
• Updating procedures effective with small training sets
• Quantifying certainty
• Allows cost of misclassification to be easily incorporated into modelling
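Because the model returns a probability for each group, misclassification costs can be folded in by choosing the label with the lowest expected cost instead of the highest probability. A minimal sketch follows; the cost matrix and the probabilities are hypothetical values, not figures from the study.

```python
def min_expected_cost_label(probs, cost):
    """cost[i][j] is the cost of assigning label j when the true group is i."""
    n_labels = len(cost[0])
    expected = [
        sum(probs[i] * cost[i][j] for i in range(len(probs)))
        for j in range(n_labels)
    ]
    best = min(range(n_labels), key=expected.__getitem__)
    return best, expected

# Hypothetical costs: group 0 = pure, group 1 = adulterated.
# Passing an adulterated sample as pure is assumed to cost five times
# as much as flagging a pure sample as adulterated.
cost = [[0.0, 1.0],
        [5.0, 0.0]]

# A sample the model considers 70% likely to be pure.
label, expected = min_expected_cost_label([0.7, 0.3], cost)
print(label, expected)
```

With these assumed costs the sample is flagged as adulterated even though "pure" is the more probable group, because the expected cost of a missed adulteration outweighs that of a false alarm.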
Questions?
Funded by:
• Teagasc under the Walsh Fellowship Scheme
• Irish Department of Agriculture & Food (FIRM programme)
• Science Foundation of Ireland Basic Research Grant scheme (Grant 04/BR/M0057)