Predicting Aflatoxin levels: A Spatial Autoregressive approach
Gissele Gajate-Garrido, IFPRI
International Food Policy Research Institute
International Center for the Improvement of Maize and Wheat
International Crops Research Institute for the Semi-Arid Tropics
University of Pittsburgh
Uniformed Services University of the Health Sciences
ACDI/VOCA/Kenya Maize Development Program
Kenya Agricultural Research Institute
Institut d’Economie Rurale
The Eastern Africa Grain Council
Collecting aflatoxin information is time-consuming and expensive.
Sometimes we can have aflatoxin information from a smaller sample of households.
This information could be useful for predicting the level of aflatoxins in other households with similar characteristics.
A Spatial Autoregressive (SAR) model uses each household's own characteristics and the aflatoxin levels of the households around it to predict that household's aflatoxin level.
This model gives more weight to the information from my closest “neighbors” and less to those that are farther away.
My neighbors' information could help predict my own aflatoxin level, since it may contain information that surveys usually do not capture.
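The weighting idea can be illustrated with a small sketch. The slides do not specify the weighting scheme, so the row-normalized inverse-distance weights below are an assumption, and the coordinates and aflatoxin levels are made-up values. In a SAR model this neighbor average would enter the prediction alongside the household's own characteristics.

```python
import math

def inverse_distance_weights(coords):
    """Row-normalized inverse-distance weights (an assumed scheme).

    A household is never its own neighbor. Assumes at least two
    households and no two households at exactly the same location.
    """
    n = len(coords)
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)
            else:
                row.append(1.0 / math.dist(coords[i], coords[j]))
        total = sum(row)
        weights.append([w / total for w in row])
    return weights

def spatial_lag(weights, levels):
    """Weighted average of the neighbors' aflatoxin levels, per household."""
    return [sum(w * y for w, y in zip(row, levels)) for row in weights]

# Hypothetical households on a line, with hypothetical aflatoxin levels.
coords = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
levels = [10.0, 20.0, 50.0]

W = inverse_distance_weights(coords)
lag = spatial_lag(W, levels)
```

For the first household, the neighbor one unit away receives four times the weight of the neighbor four units away.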
When we estimate models there is always an error term present that represents the variation that we are unable to capture.
[Diagram: aflatoxin level, with observable characteristics and unobservable characteristics (attitudes, risk aversion, motivation).]
There are variables such as a person’s determination or innate ability that could help predict how much time and effort they will invest in preventing aflatoxins in their crops.
These variables cannot be observed or recorded in a survey.
However, information about my peers could provide additional information about how I behave and how high my aflatoxin level is.
To assess who is “closest” to me, I use location variables:
Longitude
Latitude
Elevation
Slope
▪ (Only for the pre-harvest sample)
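One way to turn location variables into a notion of "closeness" is sketched below. The slides do not say how the variables are combined, so standardizing each variable and taking the Euclidean distance is an assumption, and all values are hypothetical. Standardizing matters because degrees of longitude and metres of elevation are on very different scales.

```python
import math
import statistics

def standardize(columns):
    """Z-score each location variable so no single unit dominates.

    Assumes every variable actually varies across households
    (a constant column would give a zero standard deviation).
    """
    scaled = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        scaled.append([(v - mean) / sd for v in col])
    # Transpose: one tuple of scaled values per household.
    return list(zip(*scaled))

def pairwise_distances(points):
    """Euclidean distance between every pair of households."""
    return [[math.dist(p, q) for q in points] for p in points]

# Hypothetical values for three households.
longitude = [-7.9, -7.8, -6.2]
latitude = [12.6, 12.7, 13.4]
elevation = [330.0, 340.0, 290.0]  # metres

points = standardize([longitude, latitude, elevation])
dist = pairwise_distances(points)

# Index of the household closest to the first one.
closest_to_first = min(range(1, len(dist)), key=lambda j: dist[0][j])
```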
[Figure: bar chart of percentages (axis 0% to 90%), grouped under Production and Storage. Bars: treated soil (lime, manure, etc.), improved seed, pesticide, fertilizer, insect damage, rodent damage, plastic bags for storage, storage in a special room inside the house, frequent use of pesticide in storage, hand sorting before storage. Bar values: 6%, 2%, 29%, 6%, 38%, 9%, 9%, 27%, 74%, 63%.]
[Figure: aflatoxin variation (axis 0% to 100%) split into my neighbors', my characteristics, and unobservable; 36% is labeled.]
The in-sample prediction captures 36% of the variation in prevalence values.
Yet my neighbors' information is not useful for predicting my prevalence levels; only my own characteristics are relevant.
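The 36% figure is a share of variation explained. The slides do not say which goodness-of-fit measure was used for the SAR model; the sketch below assumes the familiar definition, one minus the residual sum of squares over the total sum of squares, and uses made-up numbers.

```python
def share_of_variation_explained(measured, predicted):
    """1 - SS_resid / SS_total (an assumed goodness-of-fit measure)."""
    mean = sum(measured) / len(measured)
    ss_total = sum((y - mean) ** 2 for y in measured)
    ss_resid = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return 1.0 - ss_resid / ss_total

# Hypothetical measured and predicted values.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

explained = share_of_variation_explained(measured, predicted)
unexplained = 1.0 - explained
```

Under this definition the explained and unexplained shares always sum to one.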
The remaining 64% of the aflatoxin variation is unexplained.
We use data from Mali to test the model.
We start with pre-harvest data.
Variable               Obs   Mean   Std. Dev.   Min    Max
Measured prevalence    247   27.2   64.0        0.05   492.0
Predicted prevalence   247   29.6   26.9        0.00   130.7
[Figure: predicted prevalence against measured prevalence (parts per billion), with a 45-degree line. Slope: 1.04***]
The relationship between predicted and real values is almost 1 to 1. It is significant at the 1% level.
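A slope close to 1 is what a simple regression of one series on the other would report when predictions track the measurements. The slides do not state which variable was regressed on which, so the direction below (measured on predicted) is an assumption, and the data are made up.

```python
def ols_slope_intercept(x, y):
    """Slope and intercept of a simple least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical predicted and measured values.
predicted = [1.0, 2.0, 3.0, 4.0]
measured = [1.2, 1.9, 3.2, 4.1]

slope, intercept = ols_slope_intercept(predicted, measured)
```

A slope near 1 with an intercept near 0 means the fitted line lies close to the 45-degree line.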
[Figure: Kernel density estimate for pre-harvest aflatoxin levels. Series: kernel density of measured prevalence and kernel density of predicted prevalence. Horizontal axis: prevalence (parts per billion). Vertical axis: density. Kernel = Epanechnikov, bandwidth = 3.8288.]
The model is not able to capture extremely high values of prevalence and in general overestimates lower values.
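For readers unfamiliar with the density curves being compared, a kernel density estimate can be sketched as follows. This uses the textbook Epanechnikov kernel, K(u) = 0.75(1 - u²) for |u| ≤ 1; statistical packages may scale the kernel differently, so this is not necessarily the exact computation behind the slides. The sample values are hypothetical, and only the bandwidth is taken from the figure.

```python
def epanechnikov_kde(sample, x, bandwidth):
    """Kernel density estimate at x using the textbook Epanechnikov kernel."""
    total = 0.0
    for xi in sample:
        u = (x - xi) / bandwidth
        if abs(u) <= 1.0:
            total += 0.75 * (1.0 - u * u)
    return total / (len(sample) * bandwidth)

# Hypothetical prevalence values (parts per billion).
sample = [5.0, 8.0, 12.0, 30.0]

density_at_8 = epanechnikov_kde(sample, 8.0, bandwidth=3.8288)
density_at_100 = epanechnikov_kde(sample, 100.0, bandwidth=3.8288)
```

Because this kernel has bounded support, the estimate is exactly zero at any point farther than one bandwidth from every observation.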
76%
43%
[Figure: Kernel density estimate for main HH pre-harvest aflatoxin levels. Series: kernel density of predicted prevalence for main HH. Vertical axis: density. Kernel = Epanechnikov, bandwidth = 12.9933.]
Variable                                   Obs    Mean   Std. Dev.   Min   Max
Predicted prevalence for main HH survey    1169   58.4   59.3        0.0   223.1
37%
63%
Post-harvest data after 1 month in storage
During storage, not only your characteristics but also your neighbors' information helps explain your aflatoxin level.
[Figure: total variation in aflatoxin levels, split into variation explained by personal characteristics, variation explained by neighbors' aflatoxin levels, and unexplained variation (62%).]
The in-sample prediction captures 38% of the variation in prevalence values.
Variable               Obs   Mean    Std. Dev.   Min   Max
Measured prevalence    243   121.9   256.9       0.0   1911.2
Predicted prevalence   243   129.0   130.5       0.0   778.0
The relationship between predicted and real values is almost 1 to 1. It is significant at the 1% level.
[Figure: predicted prevalence against measured prevalence (parts per billion), with a 45-degree line. Slope: 0.95***]
The same methodology applied to the data in Mali will be applied to the data in Kenya.
Hence we will be able to predict prevalence levels for the main household survey and use them for further analysis.
Should we expect similar results?
Different crops
▪ Mali: groundnuts vs. Kenya: maize
It also depends on production and storage practices in Kenya.
We have two models that can be used to predict aflatoxin levels:
▪ Maxent
▪ SAR model
We need to compare the strengths and weaknesses of both models.
We can also consider introducing other variables to improve the predictions.
Current Partners:
Donor: Bill and Melinda Gates Foundation
Center/Universities
IFPRI: C. Narrod (Project lead), P. Trench (Project manager), M. Tiongco, D. Roy, A. Saak, R. Scott, W. Collier, M. Elias
CIMMYT: J. Hellin, H. DeGroote, G. Mahuku, S. Kimenju, B. Munyua
ICRISAT: F. Waliyar, J. Ndjeunga, A. Diallo, M. Diallo, V. Reddy
University of Pittsburgh: F. Wu, Y. Liu
US Uniformed Health Services: J. Chamberlin, P. Masuoka, J. Grieco
Country Partners
ACDI/VOCA: S. Collins, S. Guantai, S. Walker
Kenya Agricultural Research Institute: S. Nzioki, C. Bett
Institut d’Economie Rurale: B. Diarra, O. Kodio, L. Diakite