Modelling and Multivariate Data
Analysis of Agricultural Systems
A thesis submitted to The University of Manchester for the
degree of Doctor of Philosophy in the Faculty of
Engineering and Physical Sciences
2015
Najib U Lawal
School of Electrical and Electronic Engineering
Table of Contents
Table of Contents ................................................................................................................. 2
List of Figures....................................................................................................................... 5
List of Tables ........................................................................................................................ 7
Abstract ............................................................................................................................... 8
Declaration ........................................................................................................................... 9
Copyright Statement ............................................................................................................. 9
Acknowledgements ............................................................................................................. 10
Abbreviations ..................................................................................................................... 11
Chapter 1 Introduction ........................................................................................................ 13
1.1 Research Motivation .................................................................................................. 13
1.2 The SYIELD Project ................................................................................................... 15
1.2.1 The Biosensor .................................................................................................... 15
1.3 Main Objectives ........................................................................................................ 18
1.4 Contributions of the thesis ......................................................................................... 19
1.5 Thesis Structure ........................................................................................................ 20
Chapter 2 Literature Review ................................................................................................ 22
2.1 Sclerotinia sclerotiorum ................................................................................................ 22
2.1.1 Sclerotinia Ascospore Release ............................................................................. 22
2.1.2 Sclerotinia Ascospore Dispersal ........................................................................... 23
2.1.3 Sclerotinia sclerotiorum Epidemiology ..................................................................... 25
2.1.4 Sclerotinia Disease Models .................................................................................. 26
2.2 Dispersion Modelling ................................................................................................. 29
2.2.1 Gaussian Dispersion Model ................................................................................. 30
2.2.2 Trajectory Models .............................................................................................. 32
2.2.3 CALPUFF ........................................................................................................... 33
2.3 Multivariate Statistical Analysis ................................................................................... 34
2.3.1 Multivariate Analysis in Agriculture ...................................................................... 35
2.3.2 Multivariate Statistical Process Control ................................................................. 36
2.4 Sensors, Biosensors and Sensor Networks .................................................................. 40
2.4.1 Peculiar Challenges of Biosensor Networks .......................................................... 41
2.5 Conclusion ................................................................................................................ 44
Chapter 3 Dispersion of Sclerotinia sclerotiorum Spores in an Oilseed Rape Canopy ................. 45
3.1 Introduction ............................................................................................................. 45
3.2 Motivation for Experimental Field Trial ........................................................................ 46
3.3 Methodology ............................................................................................................. 47
3.3.1 Field Trial Experiment ......................................................................................... 47
3.3.2 Identification and Quantification of Spores ........................................................... 56
3.4 Results ..................................................................................................................... 61
3.4.1 Biosensor Test and Calibration Results ................................................................ 61
3.4.2 Colourimetric Analysis Results and Discussion ...................................................... 64
3.4.3 Spore DNA (qPCR) Results .................................................................................. 69
3.5 Discussion ................................................................................................................ 76
3.5.1 Reliability of the Prototype biosensor in measuring oxalic acid .............................. 76
3.5.2 Sclerotinia sclerotiorum spore dispersion ............................................................... 80
3.5.3 Experimental Value of Spore Data ....................................................................... 84
3.5.4 Limitations ......................................................................................................... 85
3.6 Conclusion ................................................................................................................ 86
Chapter 4 A backward Lagrangian Stochastic (bLS) model for the dispersion of Sclerotinia sclerotiorum spores ............................................................................................................... 88
4.1 Introduction ............................................................................................................. 88
4.2 Motivation for Trajectory Modelling Approach ............................................................. 89
4.3 Background Theory ................................................................................................... 89
4.3.1 Lagrangian Stochastic Models ............................................................................. 89
4.3.2 The Backward Lagrangian Stochastic Model ......................................................... 91
4.3.3 Monin-Obukhov Similarity Theory (MOST) ........................................................... 93
4.4 Methodology ............................................................................................................. 94
4.4.1 Parametrising the bLS Model for Sclerotinia Dispersion ......................................... 95
4.4.2 Implementing the bLS Model ............................................................................ 100
4.4.3 Comparing model estimates to experimental data .............................................. 102
4.4.5 Assessing Model Performance ........................................................................... 105
4.5 Results ................................................................................................................... 106
4.6 Discussion .............................................................................................................. 112
4.6.1 bLS Model Performance .................................................................................... 112
4.6.2 Limitations of Experiment ................................................................................. 116
4.7 Conclusions ............................................................................................................ 117
Chapter 5 An Integrated Fault Detection, Identification and Reconstruction Scheme for Agricultural Systems ......................................................................................................... 119
5.1 Motivation .............................................................................................................. 120
5.2 Background Theory ................................................................................................. 121
5.2.1 Principal Components Analysis (PCA) ................................................................. 121
5.2.2 Multivariate Statistical Process Control (MSPC) ................................................... 122
5.2.3 Kernel Density Estimation ................................................................................. 125
5.3 Methodology ........................................................................................................... 126
5.3.1 Data ................................................................................................................ 126
5.3.2 Principal Component Analysis of PM10 ............................................................... 126
5.3.3 Multivariate Statistical Process Control (MSPC) ................................................... 128
5.3.4 Online Fault Detection of a PM10 Network with Missing Data .............................. 130
5.3.5 Online Fault Identification in a PM10 Network .................................................... 132
5.3.6 Augmented MSPC............................................................................................. 133
5.4 Results ................................................................................................................... 136
5.4.1 PCA Analysis of PM10 ....................................................................................... 136
5.4.2 Data pre-processing and preliminary model of PM10 .......................................... 139
5.4.3 Final Monitoring Model and Control limits ........................................................... 146
5.4.4 Online Fault Detection of PM10 network ............................................................ 148
5.4.5 Online Fault Identification ................................................................................. 154
5.4.6 Online Fault Detection in a PM10 Network ......................................................... 158
5.5 Discussion .............................................................................................................. 160
5.5.1 Integrated Fault Detection, Identification and Reconstruction in a PM10 Network 160
5.5.2 Limitations of K-MSPC ...................................................................................... 164
5.6 Conclusion .............................................................................................................. 165
Chapter 6 Conclusion, Recommendations and Future Work ................................................. 167
6.1 Overview of Research Motivation ............................................................................. 167
6.2 Summary of Principal Findings ................................................................................. 168
6.2.1 Field trial experiment and generation of novel data ............................................ 168
6.2.2 Evaluating a 3D bLS model with experimental data ............................................ 168
6.2.3 Multivariate data analysis of potential sensor network ........................................ 169
6.3 Real world applications of research .......................................................................... 170
6.4 Further areas of research ........................................................................................ 170
References ....................................................................................................................... 172
Appendix 1: Original Plan and Modification Made ................................................................ 188
WORD COUNT: 59,685
List of Figures
Figure 1.1: Biosensor components with sources of failure identified. ...................................... 17
Figure 2.1: Lifecycle of Sclerotinia sclerotiorum [68] ................................................................ 25
Figure 2.2: Spore dispersal downwind of an above-ground plume source [80] ........................ 30
Figure 2.3: Kernel estimates showing individual kernels and the effect of bandwidth h_KDE: (a) h_KDE = 0.2; (b) h_KDE = 0.8 [112] ............................................................................. 40
Figure 3.1: Location of Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084), the experimental site, among other field trial sites at Rothamsted Research UK (source of image: Rothamsted Research). ............................................................................................... 49
Figure 3.2: Layout of sampling area (43m by 28m) within field trial site from 31st May 2013 to 3rd June 2013 showing positions of Rotorod samplers. Data was collected at two heights of 0.8m and 1.6m (O), and additional heights of 2.4m and 3.2m (⊕). ........................... 52
An arrangement of biosensor unit, weather station and a 3D sonic anemometer were situated at the centre of the 7m-diameter ring of ascospores. Scale of sampling area excluding
upwind sampling point: 35m by 28m. All sampling positions are 7 meters apart except I, which is 14m from D. B is 1m away from the edge of the source ring. ........................... 52
Figure 3.3: Experimental trial field showing Rotorod samplers (with rain shields) above OSR canopy. (Image taken by the author). .......................................................................... 53
Figure 3.4: Rotorod samplers at position B deployed at 0.8m (obscured), 1.6m, 2.4m and 3.2m, pictured without rain covers. Position B (as well as D) sampled at two additional heights. (Image taken by the author). ...................................................................................... 54
Figure 3.5: A typical assembly of Rotorod sampler (1), battery (2) and Burkard timer (3), seen here powering only one sampler with its other output unused. (Image taken by author) . 55
Figure 3.6: Biosensor attached to Uniscan potentiostat using a bespoke connector (1). The prototype biosensor (2) sensing surface is an enzyme-coated carbon electrode (black circular area in right frame). (Image taken by author) .................................................. 59
Figure 3.7: Biosensor calibration curve for five repeated measurements at 60°C after allowing 120 seconds of mixing (n = 25, error bars = ±1 S.D.). .................................................. 62
Figure 3.8: Oxalic acid concentrations for all days for samples collected below the OSR canopy ................................................................................................................................. 66
Figure 3.9: Oxalic acid concentrations for all days for samples collected above the OSR canopy ................................................................................................................................. 67
Figure 3.10: Side-by-side comparison of daily oxalic acid concentrations for all positions. The spore collection positions correspond to Rotorod samplers that were deployed below the canopy. ...................................................................................................................... 68
Figure 3.11: Concentrations grouped by position for all sampling days. Spores tested for oxalic acid were collected below the canopy. ......................................................................... 68
Figure 3.12: Along wind concentration (spore DNA) gradient below OSR canopy for first three sampling days. The key refers to field positions (letters) and height of deployment above
ground (numbers). Spore DNA axis is scaled for clarity, maximum values for the first 2 days are shown at the top and have the same units as the vertical axis. ........................ 69
Figure 3.13: Along wind concentration (spore DNA) gradient above OSR canopy for first three sampling days. Lateral (crosswind) sampling positions are not shown. ........................... 70
Figure 3.14: Wind rose showing forecasted (a) and actual (b) wind speed and directions on day
4. The forecasted wind readings were used to set the sampling axis, resulting in a misalignment of sampling grid and spore plume ........................................................... 70
Figure 3.15: The spore gradient at position B (1m downwind of spore ring) with height for first three sampling days.................................................................................................... 71
Figure 3.16: The spore gradient at position D (14m downwind of spore ring) with height for first three sampling days. ....................................................................................... 72
Figure 3.17: Spore dispersal gradient for all positions including crosswind (lateral) sampling positions. The spore DNA concentration axis is in nanograms (ng) and is scaled between 0 and 1ng, for clarity. The key refers to field positions (letters) and height of deployment above ground (numbers). ........................................................................................... 73
Figure 3.18: Spore DNA below the canopy plotted with distance from centre of spore ring for first three days of sampling. Data is fitted to an inverse power law with coefficients, exponents and R² as shown. ....................................................................................... 75
Figure 3.20: Kernel Density Estimation of spore DNA distribution below (left) and above (right) the canopy. ................................................................................................................ 81
Figure 3.21: Dispersion contours of spore concentration below (left) and above (right) the canopy. ...................................................................................................................... 83
Figure 4.1: The assumed source configuration used for concentration footprint calculation showing approximate locations of 6 groups of Sclerotinia. Each group is assumed to cover a 1 square meter area based on approximate measurements of the area covered by fruiting bodies. The vertices of each square for the left bottom corner starting with 1 are: (-2.25, 2.5), (1.25, 2.5), (3, -0.5), (1.25, 4.0), (-2.25, -4.0), and (-4, 0.5). (Drawing not to scale). ............................................................................................................................... 101
Figure 4.2: Normalised observations (blue asterisks) versus normalised model predictions (red circles) above (left panels) and below (right panels) the canopy for the downwind sampling positions for all sampling days. .................................................................... 107
Figure 4.3: Normalised observations (blue asterisks) versus normalised model predictions (red circles) above (left panels) and below (right panels) the canopy for the crosswind
sampling positions for all sampling days. .................................................................... 108
Figure 4.4: Normalised observations versus normalised model predictions for all observed concentrations above the canopy. The blue line is the 1:1 line. .................................... 109
Figure 4.5: Normalised observations versus normalised model predictions for all observed concentrations below the canopy. The blue line is the 1:1 line. .................................... 109
Figure 5.1: Score plot showing first PC against second (numbers represent sample number – hour of year) ............................................................................................................ 137
Figure 5.2: Loading plot of first vs. second PC showing all monitoring stations (numbers represent station numbers) ....................................................................................... 137
Figure 5.3: Percentage of variance explained by first 20 PCs .............................................. 139
Figure 5.4: Calibration and cross-validation errors for first 20 PCs ....................................... 139
Figure 5.4: Missing data distribution before pre-processing ................................................ 140
Figure 5.5: Missing data distribution after processing ......................................................... 140
Figure 5.6: Monitor locations showing deleted monitors (red) with excessive missing data ... 141
Figure 5.7: Score plots showing the 4 largest PCs against each other ................................. 142
Figure 5.8: Cross-validation and calibration errors .............................................................. 143
Figure 5.9: Hotelling T² chart for preliminary PCA model .................................................... 143
Figure 5.10: SPE chart for preliminary PCA model ............................................................... 144
Figure 5.11: Outliers from preliminary model’s SPE showing daily time of emission .............. 145
Figure 5.12: Hotelling T² chart for final PCA model ............................................................. 146
Figure 5.13: SPE chart for final PCA model ........................................................................ 146
Figure 5.14: Kernel density estimated distributions of Hotelling T² and SPE ......................... 147
Figure 5.15: KDE ICDF showing 95th percentile for Hotelling T² and SPE ............................ 148
Figure 5.16: Hotelling T² control chart for new in-control samples ...................................... 149
Figure 5.17: SPE chart for new in-control samples .............................................................. 149
Figure 5.18: Hotelling T² control chart for in-control samples with missing data .................. 150
Figure 5.19: SPE control chart for in-control samples with missing data .............................. 151
Figure 5.20: Hotelling T² chart with severe case of missing data (25%) .............................. 152
Figure 5.21: SPE chart with severe missing data (25%) ..................................................... 153
Figure 5.22: Variable loadings on PC1 ................................................................................ 154
Figure 5.23: Hotelling T² chart for simulated out-of-control samples ................................... 155
Figure 5.24: SPE chart for simulated out-of-control samples ............................................... 155
Figure 5.25: SPE-T² chart for simulated out-of-control samples .......................................... 156
Figure 5.26: Hotelling T² contribution plot for 4 corrupted samples (Table 5.2) .................. 157
Figure 5.27: SPE contribution plot for 4 corrupted samples (Table 5.2) ............................... 157
List of Tables
Table 3.1: Volumes of oxalic acid required to prepare 50 mL of 0, 50, 100, 500, 1000 and 1500 µmol L⁻¹ standards from 10 mmol L⁻¹ stock ....................................................... 57
Table 3.2: Currents recorded during biosensor measurements using the procedure described in Section 3.3.1. Values highlighted in yellow are above the baseline noise level determined in the last section and are considered positive for oxalic acid. Heights of 0.8m correspond to Rotorod samplers deployed below the canopy (canopy height = 1m). ........................ 64
Table 3.3: Concentrations of oxalic acid measured by colourimetric analysis. Values in purple are positively and quantitatively representative of oxalic acid. Heights of 0.8m correspond
to Rotorod samplers below the canopy and all others are above the canopy (canopy height = 1m). ............................................................................................................. 65
Table 3.4: Spore DNA converted to spore numbers using 0.35pg per single spore determined by Rogers et al. [153]. ................................................................................................ 78
Table 4.1: Table of model parameters. ............................................................................. 105
Table 4.2: Calculated model performance measures for different observation groups (above or below canopy height). Number of observations is shown in square brackets ................ 111
Table 5.2: Index of variables and sample number of corrupted observations ....................... 133
Table 5.2: Control charts with increasing missing data ....................................................... 151
Table 5.3: Augmented MSPC results showing deviation of corrupted variables from their kriged estimates and the kriging estimator’s variance ............................................................ 159
Abstract
Najib Lawal
Modelling and multivariate data analysis of agricultural systems
The University of Manchester (2015)

The broader research area investigated during this programme was conceived from a goal to contribute towards solving the challenge of food security in the 21st century through the reduction of crop loss and the minimisation of fungicide use. This is to be achieved by introducing an empirical approach to agricultural disease monitoring. In line with this, the SYIELD project, initiated by a consortium involving the University of Manchester and Syngenta, among others, proposed a novel biosensor design that can electrochemically detect viable airborne pathogens by exploiting the biology of plant-pathogen interaction. This approach offers an improvement on the inefficient and largely experimental methods currently used. Within this context, this PhD focused on the adoption of multidisciplinary methods to address three key objectives that are central to the success of the SYIELD project: the investigation of local spore ingress near canopies, the evaluation of a suitable model that can describe spore transport, and the multivariate analysis of the potential monitoring network built from these biosensors.

The local transport of spores was first investigated by carrying out a field trial experiment at Rothamsted Research UK in order to investigate spore ingress in OSR canopies, generate reliable data for testing the prototype biosensor, and evaluate a trajectory model. During the experiment, spores were air-sampled and quantified using established manual detection methods. Results showed that the manual methods, such as colourimetric detection, are more sensitive than the proposed biosensor, suggesting that the proxy measurement mechanism used by the biosensor may not be reliable in live deployments, where spores are likely to be contaminated by impurities and other inhibitors of oxalic acid production. Spores quantified using the more reliable quantitative Polymerase Chain Reaction proved informative and provided novel data of high experimental value. The dispersal gradient in this data was found to fit an inverse power law, a finding that is consistent with experiments in other crops.
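The inverse power-law fit mentioned here (cf. Figure 3.18) can be sketched in a few lines: taking logarithms of C(x) = a·x⁻ᵇ linearises the model, so ordinary least squares recovers the coefficient and exponent. The concentrations and distances below are illustrative placeholders, not the thesis data.

```python
import numpy as np

# Inverse power law C(x) = a * x**(-b). Taking logs gives
# log C = log a - b * log x, a straight line fitted by least squares.

# Hypothetical spore-DNA concentrations (ng) at distances (m) from the
# source ring -- placeholder values, not the field-trial measurements.
distance = np.array([1.0, 7.0, 14.0, 21.0, 28.0])
conc = np.array([0.90, 0.21, 0.08, 0.05, 0.03])

slope, intercept = np.polyfit(np.log(distance), np.log(conc), 1)
a, b = np.exp(intercept), -slope  # back-transform to C = a * x**(-b)

# Goodness of fit (R^2) in log space
pred = intercept + slope * np.log(distance)
ss_res = np.sum((np.log(conc) - pred) ** 2)
ss_tot = np.sum((np.log(conc) - np.log(conc).mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r2:.2f}")
```

The fitted exponent b summarises how steeply concentration decays with distance from the source, which makes the log-log fit a convenient first diagnostic for dispersal gradients.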
In the second area investigated, a 3D backward Lagrangian Stochastic (bLS) model was parameterised and evaluated with the field trial data. The bLS model, parameterised with Monin-Obukhov Similarity Theory (MOST) variables, showed good agreement with experimental data and compared favourably, in terms of performance statistics, with a recent application of an LS model in a maize canopy. Estimates obtained from the model were more accurate above the canopy than below it, which was attributed to a higher error in the initialisation of release velocities below the canopy. Overall, the bLS model performed well and demonstrated its suitability for estimating above-canopy spore concentration profiles, which can in turn be used to design efficient deployment strategies.

The final area of focus was the monitoring of a potential biosensor network. A novel framework based on Multivariate Statistical Process Control (MSPC) concepts was proposed and applied to data from a pollution-monitoring network. The main limitation of traditional MSPC in spatial data applications was identified as the PCA model's lack of spatial awareness when considering correlation breakdowns caused by an incoming erroneous observation, which resulted in the misclassification of healthy measurements as erroneous. The proposed kriging-augmented MSPC approach incorporated this spatial awareness and significantly reduced the number of false alarms.
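As background to the framework summarised above, the standard MSPC machinery it builds on (a PCA model of in-control data monitored with Hotelling's T² and the squared prediction error, SPE) can be sketched as follows. The data here are synthetic, and the kriging augmentation, which is the thesis's contribution, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": 5 correlated sensors, 200 in-control training samples
# driven by 2 latent factors (a synthetic stand-in for PM10 readings).
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

# PCA via SVD on mean-centred data; retain k = 2 principal components.
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
P = Vt[:k].T                       # loadings (5 x k)
lam = (S[:k] ** 2) / (len(X) - 1)  # variance of each retained PC

def t2_spe(x):
    """Hotelling T^2 and SPE statistics for one new observation x."""
    xc = x - mu
    t = xc @ P                     # scores in the PC subspace
    t2 = np.sum(t ** 2 / lam)      # distance within the model plane
    resid = xc - t @ P.T           # part not explained by the model
    return t2, resid @ resid       # (T^2, SPE)

normal = X[0]
faulty = X[0].copy()
faulty[3] += 5.0                   # corrupt one sensor: breaks correlation
_, spe_ok = t2_spe(normal)
_, spe_bad = t2_spe(faulty)
print(spe_ok < spe_bad)            # the fault inflates SPE
```

In practice, control limits for T² and SPE are set from the training data (the thesis uses kernel density estimates of their distributions), and a new sample exceeding either limit is flagged as a potential fault.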
Declaration
No portion of the work referred to in the thesis has been submitted in support of an application
for another degree or qualification of this or any other university or other institute of learning.
Copyright Statement
The author of this thesis (including any appendices and/or schedules to this thesis) owns certain
copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester
certain rights to use such Copyright, including for administrative purposes. Copies of this thesis,
either in full or in extracts and whether in hard or electronic copy, may be made only in
accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations
issued under it or, where appropriate, in accordance with licensing agreements which the
University has from time to time. This page must form part of any such copies made.
The ownership of certain Copyright, patents, designs, trademarks and other intellectual property
(the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example
graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned
by the author and may be owned by third parties. Such Intellectual Property and Reproductions
cannot and must not be made available for use without the prior written permission of the
owner(s) of the relevant Intellectual Property and/or Reproductions.
Further information on the conditions under which disclosure, publication and commercialisation
of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it
may take place is available in the University IP Policy (see
http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis
restriction declarations deposited in the University Library, The University Library’s regulations
(see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on
Presentation of Theses.
Acknowledgements
I would like to express my profound gratitude to Prof Barry Lennox, who was instrumental in
providing me with the opportunity of a lifetime to embark on a PhD. I am particularly grateful for
his patience and understanding when I faced health challenges throughout the final year of this
study.
I am especially thankful to my co-supervisor Dr Bruce Grieve who was always prompt with
assistance and advice. Bruce was also very understanding throughout this last year.
My appreciation also goes to Dr Jon West and Dr Steph Heard for being very hospitable and
tremendously helpful during my field trial experiment at Rothamsted Research.
I am grateful to my UK parents, Dr & Mrs Shehu, for all their support and encouragement, which
I cannot even begin to describe.
To my parents and dear sisters, whose constant love and support has been a life source, I love
you with all my heart.
Finally, to all my friends and colleagues in Manchester, which I have called home for the last five
years, and others all over the world, thank you for making the experience worthwhile.
Abbreviations
ANOVA Analysis of Variance
bLS Backward Lagrangian Stochastic Model
CFD Computational Fluid Dynamics
CTG Contiguous Themed Grid
EAD Eulerian Advection Model
ECN Environmental Change Network
EGA Error Grid Analysis
EKF Element-wise K-Fold Cross Validation
EM Expectation Maximisation
EPA Environmental Protection Agency
EWMA Exponentially Weighted Moving Average
FA Factor Analysis
FAC2 Predictions within a factor of 2
FAC5 Predictions within a factor of 5
FB Fractional Bias
fLS Forward Lagrangian Stochastic Model
GPM Gaussian Plume Model
IDW Inverse Distance Weighted interpolation
KDE Kernel Density Estimation
LAI Leaf Area Index
LAQN London Air Quality Network
LS Lagrangian Stochastic Model
MG Geometric Mean
MISE Mean Integrated Square Error
MOST Monin-Obukhov Similarity Theory
MSPC Multivariate Statistical Process Control
MTA Mean Tilt Angle
NIPALS Non-linear Iterative Partial Least Squares
NMSE Normalised Mean Squared Error
OA Oxalic Acid
OK Ordinary Kriging
OSR Oilseed Rape
PCA Principal Component Analysis
PCR Principal Component Regression
PLS Partial Least Squares
PLSR Partial Least Squares Regression
PM10 Particulate Matter of diameter less than 10 µm
PMP Projection to Model Plane
PRESS Predicted Residual Sums of Squares
qPCR Quantitative Polymerase Chain Reaction
R Pearson’s Correlation Coefficient
RMSEP Root Mean Squared Error of Prediction
RMSECV Root Mean Squared Error of Cross Validation
SCP Single Component Projection
SDE Stochastic Differential Equation
SDS Sodium Dodecyl Sulphate
SPE Squared Prediction Error
SSR Sclerotinia Stem Rot
SVD Singular Value Decomposition
SVI Sensor Validity Index
VG Geometric Variance
Chapter 1 Introduction
This chapter sets the context of the thesis, introduces the parent project that spawned the PhD
and lists the main contributions of the research.
1.1 Research Motivation
With the global population recently exceeding 7 billion and expected to reach 9.6 billion by 2050
[1], competition for depleting earth resources – land, water, food – will become more intense.
Maintaining food security is already a challenge, with yield, production and harvested land all on
a declining trend [2]. This calls for innovative agricultural practices that can help achieve and
maintain food security. One of the ways to achieve food security is by minimising pre-harvest
crop loss, where the central challenge is to eliminate or at least reduce the destructive effect of
crop pathogens [3]. Among pathogens, aerially transmitted fungal spores are the most prevalent,
far-reaching, rugged and, under conducive environmental conditions, the most destructive [4].
Fungal spores are difficult to control because they are not straightforward to detect. Effective
detection of spores requires their measurement, which in turn involves both the collection and
the quantification of the pathogens. Manual detection, in which spores are physically collected
and counted, is only feasible on small scales and, unfortunately, reliable automated methods
have been lacking because no engineered biosensors existed that could detect spores by
exploiting the biological interaction between plants and pathogens. Farmers
currently control fungal spores by the preventative application of fungicides to entire fields when
an infection is suspected. However, as crop protection chemicals have only a limited life, the
efficacy of fungicides has often decayed before the onset of the pathogenic event, leaving crops
with limited protection. Additionally, fungicide overuse often results when farmers panic after
realising earlier applications were ineffective. This excessive application of fungicides may instead
have lethal consequences for beneficial arthropods and microbes that promote plant growth [5].
To minimise the inefficiency of fungicides and the resulting crop loss from infection by pathogens,
the agricultural community relies on two approaches to forecast crop disease: spore release
forecasts and disease/infection forecasts. Spore release forecasts utilise mechanistic models [6,
7], based on environmental conditions such as soil relative humidity and temperature, to forecast
release events from soil-borne fungi. Disease forecasts [8] base their forecasts not on the release
of spores but on the occurrence of ideal environmental conditions for plant infection. These latter
disease models assume a constant airborne spore concentration and alert farmers when
favourable infection conditions manifest. Both of these approaches may provide precise timing
regarding the onset of release events and infection but they lack location precision and the ability
to estimate spore ingress. The second approach is especially wasteful since it requires farmers to
apply crop protection when an infection threshold is reached whether or not there are spores
present in the air. Therefore, the issue of fungicide inefficiency in crop protection is still an
unsolved challenge and a forecasting system that will provide time and location precision would
offer considerable benefits in terms of economic savings on fungicide cost and improved crop
yield.
The main advantage of these two approaches, despite their shortcomings, is that they allow the
risk of crop disease to be predicted without requiring spore data to be collected. A more reliable
method for forecasting crop disease is inoculum-based, where aerial spore concentrations are
measured and then used as input to empirical models in order to estimate the spread of the
fungal spores over a window of time in to the future. Although mechanisms of measuring aerial
spore concentration have existed for some time in agriculture, there do not exist empirical
methods able to reliably forecast largescale agricultural disease risks based on these
measurements [9]. Current pathogen detection methods in the agricultural industry rely on the
manual collection of spores on local scales, which can be extremely time consuming and
unreliable [10]. The more rudimentary techniques are based on collection by sedimentation,
where petri dishes are kept at various distances from a source, for spores to ‘settle’ on to. The
more efficient collection methods use air-sampling equipment to capture airborne spores [10,
11]. While existing air-sampling techniques simplify the collection aspect of spore detection, the
collected samples still need to be processed and, in most cases, biologically quantified. Pathogen
detection in this manner tends to be avoided because of the time-consuming nature of currently
available identification processes [12].
Another disadvantage of the currently available detection methods is that they are not practically
scalable, so their value lies mainly in experimental trials. The restriction of these methods to
experimental studies is due to the inherent assumption that the locations of the fungal sources
are known, and the rudimentary nature of the measurement systems compared to, say,
meteorological sensors which have relatively fast (hourly) and automated sampling. As a result,
pathogen data is collected under specific conditions and on small scales (on the order of tens of
meters) to reduce manual biological quantification difficulties.
To realistically monitor and forecast the risk that airborne pathogens pose to crops, a source-
independent, large-scale collection of spores that will offer advance warning and enable holistic
determination of spore ingress is required. An empirical approach based on the multivariable
analysis of data collected from a network of sensors has the ability to offer such a forecast,
provided the challenge of measuring spore concentration (automatic collection and quantification)
can been addressed. Recent developments in the area of automatic detection of airborne
pathogens [13] holds hope for empirical approaches to agricultural problems. It was in this spirit
that the SYIELD project was set up in 2010 with the aim of providing farmers with advance
warning and precision spray advice.
1.2 The SYIELD Project
The SYIELD project, involving the University of Manchester, Syngenta and Gwent Technology,
among others, was set up with the aim of developing an online risk-forecasting model of fungi-induced
crop diseases that was based on a nationwide biosensor network able to detect viable airborne
fungal spores. The information from these biosensors (see section 1.2.1) was then to form a
decision support system for farmers, enabling them to make efficient and systematic decisions
regarding fungicide applications in a way that ensures cost savings and minimal environmental
impact.
Early adoption of the project was proposed for oilseed rape (OSR), which is particularly susceptible
to Sclerotinia Stem Rot (SSR) in the UK. SSR is caused by Sclerotinia sclerotiorum, a pathogenic
plant fungus that affects over 400 plant species and may cause yield losses of up to 50% [8]
(see section 2.1).
1.2.1 The Biosensor
The biosensor that was proposed and developed for measuring fungal spore concentration
mimicked the biology of the plant-pathogen interaction by providing a nutrient source on which
viable spores could feed. Once designed, this biosensor provided, for the first time, a real-time,
unsupervised means of spore detection. The biosensor was made up of a sensing surface and a
host of mechanical components that draw in airborne spores and incubate them at the optimal
temperature for a biochemical reaction to take place. This reaction,
called an oxalate oxidase catalysed reaction [14], produces a pathogenicity factor of Sclerotinia
– oxalic acid (OA). Oxalic acid is then electrochemically measured as a current to infer the
concentration of the spores. The sensing surface was made up of an active (enzyme-coated)
biological surface designed to bind and react with Sclerotinia spores by providing them with a
nutrient base.
Figure 1.1: Biosensor components with sources of failure identified.
The entire detection process takes a total of three days from sample collection, through
incubation, to the oxalate oxidase reaction and subsequent electrochemical measurement of OA.
The biosensor is therefore a complex combination of multiple components. Figure 1.1 shows an
illustration of the sensor model, its main components and major sources of error.
1.3 Main Objectives
The original objectives of this work had to be modified due to a delay in the production of the
prototype biosensors. This is discussed in detail in Appendix 1. From figure 1.1, it may be
observed that the reliability of a sensed measurement can be affected by the production of oxalic
acid by masquerading organisms, the suppression of oxalic acid production by competing fungi,
and mechanical faults in the form of potentiostat or pump failure. These are in addition to the
noise that is ever present in measurements. The identified sources of error fall into four
categories: false positive errors, false negative errors, unusually high or low values (outliers) and
missing data. Throughout this thesis, ‘faults’ and ‘errors’ refer to measurement abnormalities
resulting from these four categories.
When the biosensor is deployed in a network, the task of ensuring data integrity becomes
significantly more challenging due to the multiplicity of these errors and the difficulty of
automating the monitoring process. Hence, error detection, identification and reconstruction
methods have to be designed for the network to ensure data integrity. A first step to achieving
this is to understand the dispersion mechanisms on both local (in OSR canopies) and large scales.
On local scales, experiments can be carried out to generate data. On a large scale, however,
other sources of data have to be identified and relied upon for analysis, as the tedious, manual
nature of detection and quantification already described makes large-scale experiments almost
impossible. Particulate matter made up of fine particles less than 10 𝜇𝑚 in diameter (PM10),
measured by monitoring networks, is promising in this regard and is discussed further in section
5.1.
The main objectives of this research are as follows:
1. Investigating Sclerotinia sclerotiorum dispersion in OSR fields through an experimental trial
that studied the natural release, transport and dispersion of spores.
2. Identifying a model able to estimate approximate travel distances of Sclerotinia spores at
short distances from the source.
3. Validating the identified spore dispersion model using experimental data.
4. Evaluating fault detection, fault identification and subsequent re-estimation techniques of
measured data over the potential biosensor network by extending and modifying
multivariate data analysis techniques.
The research thus seeks to offer a multidisciplinary solution to the identified problems and draws
on three areas, which are reviewed in Chapter 2: agricultural sciences, micrometeorology and
multivariate statistical process control.
1.4 Contributions of the thesis
The research study carried out during this PhD focuses on experimental design of pathogen
dispersal, modelling and multivariate data analysis in agricultural systems. The major
contributions of the work are as follows:
1 Conception, design and implementation of an experimental field trial for the release and
dispersion of Sclerotinia spores in an OSR field that yielded novel experimental data. A field
trial experiment has been designed and implemented at Rothamsted Research to investigate
the dispersion of spores in an OSR field. The objective of the experiment was to generate
data that described spore transport in and above an OSR canopy. The resulting concentration
gradients enable evaluation of dispersion models as well as estimates of safe distances from
fitted decay models. The three-dimensional nature of the data, as opposed to the vertical-profile
experiments widely available for other types of fungal spores, gives it high experimental value.
For example, numerous multidimensional dispersion models, such as
Large Eddy Simulations (LES) and forward Lagrangian Stochastic (fLS) models can be
evaluated using the data.
2 The novel application of a backward Lagrangian Stochastic (bLS) model to describe spore
transport in and above an OSR canopy. During this study, the data generated from the field
trial experiment was used to evaluate a bLS model. While forward Lagrangian Stochastic
models have been applied to spore transport in crop canopies, a bLS model has not, to the
author’s knowledge, been applied to fungal spores in comparable canopies. The purpose of
this application was to evaluate the performance of a trajectory model. The back trajectories
generated with bLS can enable the future determination of minimum distances of separation
between biosensors in the near field (distances from the source characterised by canopy
disruption of surface layer turbulence), where prediction with other types of models can be
unreliable. The bLS model was parameterised using both on-field measurements and empirical
data from experimental findings in the literature. Atmospheric turbulence was characterised
using a Monin-Obukhov Similarity Theory (MOST) approach. The model was shown to agree with
the experimental data and compared favourably, in terms of model performance statistics, with
a recent application of a Eulerian-Lagrangian Stochastic model in a maize canopy.
3 A novel procedure for extending multivariate statistical process control (MSPC) to spatial data,
such as biosensed Sclerotinia spore data, is presented. MSPC is a set of statistical tools for
monitoring and controlling multivariate industrial processes in which process variables are
correlated. A similar correlation, resulting from the spatial correlation of airborne
spore concentrations is expected between biosensor measurements. Due to mechanical
failures, theft, vandalism and errors in the biological sensing process, biosensors will
inevitably have faulty, missing or erroneous measurements. Consequently, a monitoring
framework was presented that will ensure data integrity when measurements are missing,
and ensure detection of false positives, false negatives and outlying measurements. The novel
procedure, named K-MSPC, is an augmented MSPC approach that incorporates Kriging
interpolation into the monitoring scheme so that K-MSPC is aware of spatial dependence. For
example, in a typical MSPC application, a high measurement at a biosensor surrounded by
high neighbouring measurements would be designated as a fault. However, K-MSPC could
determine that high neighbouring values imply a spatial correlation, possibly due to the local
release of spores. K-MSPC was demonstrated with PM10 data sourced from the London Air
Quality Network (LAQN) of Kings College London. PM10 was chosen because of its
aerodynamic similarities (size and settling velocity) and, therefore, dispersion similarities to
Sclerotinia spores. The application of K-MSPC was shown to be successful in detecting and
identifying faults while minimising false alarms and handling missing data. K-MSPC could be
extended to biosensor and general environmental monitoring networks measuring particles
with similar aerodynamic characteristics to Sclerotinia where the spatial scale calls for a
modification of traditional MSPC. It has an advantage over current methods, in which anomaly
detection is performed at the individual sensor level, an approach that poses significant scaling
challenges for large sensor networks.
1.5 Thesis Structure
The thesis is organised into six chapters beginning with an introduction of the research motivation
and general research overview in Chapter 1. Chapter 2 reviews literature from the three key
disciplines drawn from in this work. It begins with a review of the epidemiology of Sclerotinia
sclerotiorum, followed by a review of pathogen dispersal and an introduction to multivariate data
analysis and its applications in agriculture. Limitations of the current approaches and areas of
improvement are identified in this chapter. The subsequent three chapters (Chapters 3-5)
constitute the main research work carried out during this PhD programme.
Chapter 3 presents the details of an experimental trial carried out at Rothamsted Research’s
facility in Harpenden, which was designed to sample naturally released Sclerotinia sclerotiorum
spores in an OSR field. Details of the collection, quantification and identification of spores by
various means, as well as the analysis of the data, are presented. An evaluation of the accuracy
and specificity of the different spore quantification methods used is presented in this chapter.
Chapter 4 presents the application and parameterisation of a 3D backward Lagrangian Stochastic
(bLS) model to the data generated in Chapter 3. The chapter begins by explaining the reasons
for the choice of a trajectory model, followed by an introduction to Monin-Obukhov
Similarity Theory (MOST), which enables the parameterisations used in bLS in this work. This is
then followed by an introduction to the Lagrangian Stochastic (LS) model and subsequently bLS.
Data pre-processing and the implementation of bLS are then discussed. An evaluation of the
model’s performance based on its agreement with the experimental data from Chapter 3
concludes the chapter.
Chapter 5 proposes a novel procedure for fault detection in the envisaged biosensor network that
is based on multivariate data analysis methods. PM10 monitoring data was used in this chapter,
and the reasons for this choice are provided in the introduction to this chapter. The chapter introduces
Principal Component Analysis (PCA), which has been chosen as a suitable model to describe the
biosensor network. Multivariate statistical process control (MSPC) concepts are then introduced
and their adaptation to the spatial PM10 data is discussed as the chapter unfolds. Limitations
of the traditional MSPC approach are demonstrated and the proposed augmented MSPC
procedure (K-MSPC) is presented and applied to the PM10 data.
Chapter 6 concludes the thesis by summarising principal findings and drawing real world
conclusions from them. Future areas of research work have also been identified and presented
in this chapter.
Chapter 2 Literature Review
This chapter reviews the current literature in the areas of research identified in Chapter 1. In
the previous chapter, the motivation for this research was introduced as being based upon the
inefficiencies of current agricultural methods. In this chapter, these current practices are
reviewed and specific shortcomings are identified. This leads on to the consideration of other
research areas that could offer improvements on the current methods and provide transferable
techniques to enable the implementation of the proposals in this study.
2.1 Sclerotinia sclerotiorum
2.1.1 Sclerotinia Ascospore Release
Sclerotinia sclerotiorum is a pathogenic plant fungus that affects approximately 400 plant species
worldwide [15]. The fungus can germinate both carpogenically and myceliogenically [16]. In the
latter case, no ascospores are produced and potential for infection is mainly through stems and
roots of neighbouring plants. For carpogenic germination, which is influenced by factors such as
soil moisture and temperature [14], small fruiting bodies known as apothecia are produced on
the sclerotia [17]. Apothecia can attain sizes of 1 cm in diameter and are capable of producing up
to 5 × 10⁶ spores [18] over a lifetime of about 20 days under ideal conditions [19]. There is no
consensus on the exact conditions that are ideal for ascospore release. However, it is widely
believed that illumination after dark, a decrease in relative humidity and an increase in
temperature are the determinants of spore release. This was first investigated and reported by
Ingold [20]. McCartney and Lacey [21] assert that low relative humidity preceded by high
overnight relative humidity is important for spore release. Most experiments [18, 22] reported
ascospore release in saturated air, although Clarkson et al. [19] have reported continuous
discharge of spores at 65-75% relative humidity. The optimal conditions for release are believed
to be 20-25°C and 90-95% relative humidity. Weak release of ascospores has been reported at
temperatures as low as 5-10°C [22], but it was observed that these suboptimal temperatures
reduced the apothecial lifetime [19].
The ascospore discharge mechanism is not fully understood and is increasingly revealed to be
complex. Sclerotinia spores are actively released after complex interactions between apothecia
and environmental conditions [23-25]. Early investigations by Ingold [20] showed that spores
were released intermittently as puffs. Other investigations have reported continuous discharge in
both light and dark conditions [19]. More recently, the release mechanism has been reported to
be sophisticated, with ascospores acting in a cooperative manner to surf their own wind and
maximise opportunities for longer travel distances [26]. The wide range of ascospore behaviour
suggests that spore discharge, and consequently dispersal, will vary widely depending on local
environmental conditions. The rate of ascospore release has also been investigated: numerous
studies have found that ascospore discharge follows a diurnal pattern, with most experiments
reporting a peak at midday [18, 27, 28]. This peak has been attributed to a peak in temperature.
The reported size range of ascospores is 8-12 µm [17, 26]. Spores are launched at speeds of
8.4 m/s, but this speed decreases to between 0.4 and 0.8 m/s over the
first few millimetres of travel. Spores that can make it into the upper turbulent air and escape
the canopy are capable of attaining heights of 150m [17]. Once a spore escapes, it is expected
that its potential for dispersal is the same as other particles/bioaerosols of similar aerodynamic
characteristics [3, 29].
2.1.2 Sclerotinia Ascospore Dispersal
The release of Sclerotinia spores is closely related to their dispersal potential. Multiple studies have
reported a large deposition of spores near the source and that spores are usually locally sourced
[21, 30]. Investigations by Roper et al. [26] show that this high deposition rate is a result of a
cooperative action by spores, sacrificing some numbers near the source so that opportunities for
long-distance travel are maximised. As Sclerotinia spores are released at ground level inside the
canopy [17, 18], the effect of the canopy on turbulence also plays an important role in
spore/scalar dispersion [31-37]. This is primarily due to the distortion of the turbulent field by the
canopy [38]. Canopy flow is characterised by rapid dissipation of turbulent kinetic energy with
depth into the canopy [31], resulting in low average wind speeds accompanied by intermittent
gusts [39]. The heavy filtering effects of canopies have also been reported as factors influencing
heavy near-source deposition of fungal spores [40, 41]. As a result, spores generally travel
distances of the order of hundreds of meters; Suzui and Kobayashi [42] and Boland and Hall [43]
report distances of 100 meters from the source [44]. Long-distance travel of Sclerotinia spores
has not been experimentally documented, possibly due to large-scale detection and data
collection limitations, but their potential for long-distance travel has been demonstrated as they
have been detected by rooftop spore traps (e.g. at Rothamsted Research) [3, 29].
Different types of models have been used to describe Sclerotinia spore dispersal. Early field
experiments successfully fitted spore dispersal to one-dimensional models, with concentration
monotonically decreasing with distance from the source [44]. Most studies [41, 45] have used
two functions to describe spore gradient: a negative exponential function and an inverse power
law function. The inverse power law is more appropriate in describing long distance dispersal
[44] while the exponential function is more suited for canopy transport [41]. These functions are
limited in that parameters for the decay coefficients need to be determined for every case [44].
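As an illustration of how these two gradient functions are fitted, both linearise under a logarithmic transform, so ordinary least squares on transformed data recovers the decay coefficients. The concentration gradient below is invented purely for the sketch:

```python
import math

def linfit(xs, ys):
    """Ordinary least squares for y = m*x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return m, my - m * mx

# Hypothetical concentration gradient: distance (m) vs spore concentration.
distance = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
conc = [120.0, 60.0, 18.0, 6.0, 2.0, 0.4]
log_c = [math.log(c) for c in conc]

# Negative exponential, C(x) = a * exp(-b * x): ln C is linear in x.
slope, icpt = linfit(distance, log_c)
a_exp, b_exp = math.exp(icpt), -slope

# Inverse power law, C(x) = a * x**(-b): ln C is linear in ln x.
slope, icpt = linfit([math.log(x) for x in distance], log_c)
a_pow, b_pow = math.exp(icpt), -slope
```

The fitted decay coefficients b_exp and b_pow are exactly the case-specific parameters that, as noted above, must be re-determined for every source and canopy.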
As source information became available (e.g. release speeds and modes (puffing)), emission
models were adopted to describe spore dispersal. One such emission model is the Gaussian Plume
Model (GPM), which assumes that spore or particle concentration distributions are Gaussian in
the crosswind and vertical directions. Some good examples of GPM applications to spore dispersion
are [46] and [47]. Gaussian models were found to do well outside the canopy but poorly inside
it since the assumptions of Gaussian velocity distributions in the near-field are not valid [48, 49].
As a result, GPMs are normally adopted for long distance travel of fungal spores.
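A minimal sketch of the GPM concentration equation, with the standard ground-reflection term, may clarify the assumption; the dispersion parameters sigma_y and sigma_z would normally be supplied by a stability scheme at the downwind distance of interest and are simply passed in here:

```python
import math

def gaussian_plume(y, z, q, u, sigma_y, sigma_z, h=0.0):
    """Time-averaged concentration at crosswind offset y and height z for a
    continuous point source of strength q released at effective height h into
    a mean wind u. sigma_y and sigma_z are the crosswind and vertical
    dispersion parameters; the second vertical term reflects the plume at
    the ground."""
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q * crosswind * vertical / (2 * math.pi * u * sigma_y * sigma_z)
```

Concentration is highest on the plume centreline (y = 0) and decays into the Gaussian tails, which is precisely the velocity-statistics assumption that fails inside a canopy.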
Short-term spore dispersal is important because it provides information about the escape fraction
[50], the amount of spores leaving a canopy. This fraction, which cannot be measured directly
[51, 52], is important in assessing long-distance dispersal patterns. The realisation of
the GPM’s inadequacy in the near-field led to the adoption of trajectory models, which follow
particles in a more natural manner and are amenable to parameterisation in canopy media [53].
Lagrangian Stochastic (LS) [54] models are the most commonly used models in this area although
some Eulerian Advection Models (EADs) have also been used [44]. Notable applications of LS
models to spore dispersal include: estimation of source in wheat and grass canopies for
Lycopodium and V.Inaequalis spores [55], dispersal of pollen in a maize canopy [56, 57] and
dispersal of fungal spores close to the source [58].
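At the core of an LS model is a Langevin-type stochastic differential equation for the particle velocity. A minimal one-dimensional forward sketch for homogeneous turbulence, with parameter values invented for illustration, is:

```python
import math
import random

def ls_step(w, dt, sigma_w, tau):
    """One update of the vertical velocity w under the Langevin equation
    dw = -(w / tau) dt + sqrt(2 sigma_w^2 / tau) dW, where tau is the
    Lagrangian time scale and sigma_w the velocity standard deviation."""
    return (w - (w / tau) * dt
            + math.sqrt(2.0 * sigma_w ** 2 * dt / tau) * random.gauss(0.0, 1.0))

# Track one particle released at 1 m for 60 s of model time.
random.seed(42)
z, w, dt = 1.0, 0.0, 0.05
heights = []
for _ in range(1200):
    w = ls_step(w, dt, sigma_w=0.5, tau=5.0)
    z += w * dt
    if z < 0.0:          # crude perfect reflection at the ground
        z, w = -z, -w
    heights.append(z)
```

A bLS model integrates the same equations backwards in time from the detector, so that each trajectory identifies candidate source locations rather than destinations.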
Recent developments in methodologies that provide a practical framework for the accurate
determination of turbulent parameters based on second-order closure assumptions [59], such as
the k-ε theory [60, 61] and Large Eddy Simulation (LES) [31, 62], have enabled these powerful
parametrisation methods to be coupled with LS models for greater accuracy. The application of
these coupled models to spores has been limited, however, due to the unavailability of large-scale
experimental data for evaluation and the reliance of the turbulence parametrisation on good
canopy descriptions. No specific application of LS models to Sclerotinia spores has been found in
the literature. With short-distance dispersal dependent not only on aerodynamic characteristics
but also on release mechanism and canopy structure, it is expected that model application will
differ for every combination of spore and canopy.
2.1.3 Sclerotinia sclerotiorum Epidemiology
Sclerotinia is a pathogenic plant fungus that causes Sclerotinia Stem Rot (SSR) in the majority of
oil seeds and legumes [43]. The lifecycle of Sclerotinia is shown in figure 2.1. As may be seen in
figure 2.1, this lifecycle depends on coming into contact with and infecting a host plant. The
infection process is complex, with Sclerotinia sclerotiorum first attacking senescent tissue to gain
a nutrient source before releasing cell-wall-degrading enzymes that kill adjacent healthy tissue
[16, 63]. Once this is achieved, Sclerotinia can cause perennial damage after initial infection
because of its ability to survive in the soil as sclerotia and germinate when conditions are optimal
[64]. The sclerotia then give rise to the production of apothecia [65], which release ascospores
into the air that can be transported by various dispersion mechanisms [66]. This ability to survive
in the soil for long periods means that agricultural practices such as crop rotation, whose cycles
have been shortening in recent years, influence disease incidence: shorter crop rotations increase
disease risk while longer rotations decrease it [67].
Figure 2.1: Lifecycle of Sclerotinia sclerotiorum [68]
As indicated in figure 2.1, initial infection of crops begins with the attachment of spores to
senescent plant tissues such as petals of fruiting bodies of crops followed by petal fall [16, 63].
The petals then attach to plant leaves, when there is enough adhesion in the form of leaf wetness,
and subsequently infect the stem, at which stage the disease becomes most advanced and
virtually irreversible. As a result, all disease control decisions have to be made before the stem is
infected in order to save yield.
Although spores are a necessary condition for disease occurrence, their presence alone does not
guarantee infection, as they cannot directly attack healthy tissues [63, 64]. Environmental
conditions, such as temperature in the canopy, have to be suitable for infection. Environmental
factors play an essential role in two stages: the production of apothecia and subsequent release
of spores [65], and the infection of crops by spores [6]. As a consequence of this significant role,
most disease-forecasting schemes have been based largely, though not exclusively, on weather
conditions, making the forecast methods indirect. The scheme employed in this research is the
first to utilise online sensors that measure spore concentration in real time in order to develop
direct prediction models.
2.1.4 Sclerotinia Disease Models
Given the economic cost associated with Sclerotinia and the high cost of the resulting suboptimal
fungicide use [69], there has been considerable interest in developing disease prediction
schemes. Most of the initial attempts focused on indirect modelling. These attempts
utilised land history and weather data to identify suitable environmental conditions, either simply
to forecast the presence of inoculum through the prediction of apothecial germination [6], to
forecast actual infection of petals or stem tissue when certain other conditions are simultaneously met [7],
or go a step further to provide decision support for spraying [67, 70]. Direct inoculum-based
detection, measuring actual spore concentrations, is more accurate and has the potential to
improve prediction accuracy when incorporated into forecasting models, although the detection
and measurement process in current methods is slow and manual [63]. The approach adopted in this research is expected to improve on traditional inoculum-based detection methods by being the first to use directly captured viable spores, together with relevant weather variables and historical data records, to develop a real-time risk-forecasting model.
Two models that offer considerable improvement have been identified and are reviewed in more detail in the following subsections, given their importance to the research. These models are exemplars of the current trends in agricultural disease forecasting and therefore provide insights into areas for improvement.
2.1.4.1 SkleroPro
Koch et al. [67] developed a forecasting system (SkleroPro) capable of assessing Sclerotinia risk in winter Oil Seed Rape as well as providing decision support on fungicide spraying. The model
utilises air temperature, relative humidity, rainfall and sunshine hours to estimate canopy
temperature and relative humidity. Data from a climate chamber study was used to determine
critical values of temperature that coincided with highest disease incidence while critical relative
humidity values were extrapolated from a previous experiment. The sum of hours where both
the relative humidity and temperature are ideal for infection (InhSum) is then compared to a
field-specific disease incidence threshold (Inhi) to decide whether or not to spray. The calculation of Inhi is guided by economic considerations; it also takes the effect of crop rotation into account, with longer rotations increasing it and shorter ones decreasing it (increasing risk). Whenever InhSum reaches Inhi, disease risk is considered significant and the crops are sprayed.
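The decision rule just described can be sketched in a few lines. The critical temperature range, humidity limit and threshold value below are illustrative placeholders, not SkleroPro's published calibration:

```python
def sklero_pro_decision(canopy_temp, canopy_rh, inh_threshold,
                        temp_range=(7.0, 11.0), rh_min=80.0):
    """Sketch of the SkleroPro decision rule described above.

    `canopy_temp` and `canopy_rh` are parallel hourly series of
    estimated canopy temperature (degC) and relative humidity (%).
    The critical ranges and the threshold are assumptions for
    illustration only.
    """
    # InhSum: hours in which both temperature and humidity are
    # simultaneously favourable for infection.
    inh_sum = sum(
        1 for t, rh in zip(canopy_temp, canopy_rh)
        if temp_range[0] <= t <= temp_range[1] and rh >= rh_min
    )
    # Spray when the accumulated infection hours reach the
    # field-specific, economics-driven threshold Inhi.
    return inh_sum, inh_sum >= inh_threshold
```

For example, three hours of canopy data with two favourable hours and a threshold of two would trigger a spray recommendation.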
The model generally gave satisfactory performance when tested against historical data (>70% accuracy). In addition, a 39% reduction in fungicide cost was achieved when compared to routine, unsystematic sprays [9].
The major limitation of this model is also its simplicity: it does not use direct measurements of spores. Disease risk is forecast indirectly by predicting infection hours (Inh), primarily from environmental weather conditions. This means that the integrity of the forecasts relies solely on the widely varying relationship between forecast weather values and Sclerotinia spores.
Another limitation is that the model focuses primarily on stem infections. It therefore lacks the proactive element gained when the first stages of infection are monitored and modelled. Focusing on petal infection and fall provides more time for spraying decisions to be made, although it may not provide more cost savings, since not all petal infections translate into SSR. This is precisely what the proposed approach provides.
A third limitation of this model is that it does not take apothecia development and ascospore dispersal into account. Even though these can be strongly related to, and often inferred from, environmental conditions, other factors influence them, so they may not always be inferable from environmental conditions [6]. This means that the model always assumes spores are present in a field, as it has no way of tracking spore presence. In the approach employed in this research, the detection of spores by online sensors will confirm apothecia formation and ascospore presence, even if the spores are not from a local source, so no understanding of these other, insufficiently understood factors is necessary.
One more shortcoming is that the model is only time-point specific, not location-specific. This is because the prediction is based solely on canopy microclimate, which varies little over most fields and therefore will not indicate where disease pressure is more intense. The model proposed in this research will provide time-point as well as location-specific decision support, thereby improving on the fungicide cost savings offered by SkleroPro.
2.1.4.2 RAISO - Sclero
RAISO-Sclero is a Syngenta Ltd trademarked OSR petal-infection forecasting model, developed by Varraillon et al. [7], that comprises three sub-models simulating soil climate conditions, the apothecia life cycle and crop flowering development. Because it predicts petal infection, it offers more proactive decision support than SkleroPro, but it may not give accurate final attack numbers, as not all petal fall is caused by SSR. The model, which has gained some practical use in France and the UK, uses environmental variables such as air temperature, relative humidity and rainfall to predict petal infection. When ascospore presence, as given by the apothecia life-cycle sub-model, coincides with petal fall, as shown by the Flowering Area Index determined by the crop flowering development sub-model, a disease impact is assumed; greater petal fall typically increases the assumed disease impact.
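The way the three sub-model outputs are combined can be illustrated with a minimal sketch. The function name, the 0-1 impact scale and the flowering-index cut-off are assumptions for illustration, not part of the published model:

```python
def raiso_sclero_impact(ascospores_present, flowering_area_index,
                        petal_fall_fraction, fai_flowering=0.5):
    """Illustrative combination of the three sub-model outputs:
    a disease impact is assumed when ascospore presence (from the
    apothecia life-cycle sub-model) coincides with petal fall
    (from the flowering development sub-model), and the severity
    of petal fall scales the impact. All thresholds and the 0-1
    scaling here are assumptions, not the published calibration."""
    flowering = flowering_area_index >= fai_flowering
    if ascospores_present and flowering:
        return petal_fall_fraction  # impact grows with petal fall
    return 0.0
```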
The model is validated by a diagnostic test in which petals are collected and examined for signs of the fungus. According to Varraillon et al. [7], the model gave satisfactory performance when compared to results obtained from these petal kit (validation) tests, with 80% disease prediction accuracy.
Due to its local nature and the resulting need for repeated recalculation, this model has major limitations: validation is semi-manual, time-consuming and hazardous owing to the toxicity of the chemical reagents used.
Another limitation of this model lies in validating the flowering area index sub-model that determines petal biomass. Validation is achieved with the CAN-EYE imaging software [71], which extracts canopy characteristics, such as Leaf Area Index (LAI) and Vegetation Cover Fraction, through the analysis and classification of images [71]. The photographic method can be sensitive to light, and leaf colours may become saturated during full flowering, introducing errors into the validation [72]. Since petal fall is essential to the functioning of this model, this affects the overall model quality.
Another concern is that other factors affecting flowering dynamics or petal fall, such as sowing density and insect attacks, are not considered by the flowering development sub-model. This can result in reduced petal biomass being wrongly attributed to disease impact, and may be why RAISO-Sclero generally predicts higher risks than those indicated by petal kit tests.
2.2 Dispersion Modelling
Different models have been used to estimate spore dispersal both within and outside crop
canopies. Most of these models are Gaussian or Lagrangian. As the name implies, Gaussian
models are based on the assumption that the spore plume spreads and expands like a Gaussian
distribution, i.e. around a fixed mean (the plume centre) and with a random variance [73]. Inside
the canopy, at short distances from the surface and where there may be low wind conditions,
Gaussian models may give poor performance [73-75]. As a result, Gaussian models have not
been extensively used for pathogen spore dispersal inside canopies. While Lagrangian particle
dispersion models, such as FLEXPART, are very powerful over long and mesoscale dispersion
ranges, Lagrangian Stochastic (LS) models [54] are suitable for estimating spore dispersal within
the crop canopy at distances close to the source [58]. In contrast, Eulerian Advection-Diffusion (EAD) models [76] based on Fick's Law provide better estimates at longer distances from the source [66] and can be extended to a regional scale [77].
One popular model in the Lagrangian category that has not been used in agriculture is the Numerical Atmospheric-dispersion Modelling Environment (NAME) model [78], developed by the Met Office to predict atmospheric dispersion and deposition of gases and particulates up to global scales. While this model has potential use in estimating the dispersion of some pathogens, the relatively short dispersal scale of Sclerotinia spores makes it unsuitable here.
Computational Fluid Dynamics (CFD) models based on the Navier-Stokes equations are also potentially useful in estimating dispersal within plant communities, due to their ability to model air flows in complex terrains [66] such as agricultural fields.
Given the physical and aerodynamic similarities between Sclerotinia spores and other types of particulate matter [44, 79], such as PM10, a review of dispersion models simulating, or capable of simulating, pollutant dispersion and other general particle movement is appropriate. Atmospheric dispersion models used for pollution can be divided into Gaussian and trajectory models.
2.2.1 Gaussian Dispersion Model
Gaussian models, either puff or plume, assume a Gaussian distribution for a cloud of particles. That is, the concentration of particles spreads with distance from the source, in both the downwind and crosswind directions, according to a normal distribution, as shown in Figure 2.2 [80].
Figure 2.2: Spore dispersal downwind of an above ground plume source [80]
As may be seen in Figure 2.2, a released plume spreads in a Gaussian manner, with a mean centred at the effective release height He and standard deviations in all three coordinate axes derived from the respective random variances of the particles within the plume. Concentrations at all points downwind from the source and at various heights above the ground can be calculated by:
C(x, y, z) = \frac{Q}{2\pi \sigma_y \sigma_z u} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \left\{ \exp\left(-\frac{(z-H)^2}{2\sigma_z^2}\right) + \exp\left(-\frac{(z+H)^2}{2\sigma_z^2}\right) \right\}   [2.1]

where Q is the rate of release; H is the height from ground level to the plume centreline (the effective stack height He = Hs + Δh, i.e. the actual stack height plus plume rise, as illustrated in Figure 2.2); u is the horizontal (downwind) wind speed measured at H; x, y and z are the three-dimensional coordinates of the receptor point; and σx, σy and σz are the dispersion coefficients (standard deviations of the concentration profiles) of the plume in the downwind, crosswind and vertical directions for a particular stability class [80].
In most cases, the ground-level concentration at height z = 0 is of more interest than that at an elevated receptor height. Setting z = 0 in Eq. 2.1 gives:

C(x, y, 0) = \frac{Q}{\pi \sigma_y \sigma_z u} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \exp\left(-\frac{H^2}{2\sigma_z^2}\right)   [2.2]
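As a sketch, the ground-level formula (the standard Gaussian plume result with ground reflection) can be evaluated directly once the dispersion coefficients for the relevant stability class are known; here they are passed in as plain arguments rather than looked up from a stability table:

```python
import math

def ground_level_concentration(Q, u, H, y, sigma_y, sigma_z):
    """Ground-level concentration of a Gaussian plume: the full
    plume equation evaluated at z = 0, where the two reflection
    terms coincide. Q is the release rate, u the wind speed at
    the plume centreline height H, y the crosswind offset, and
    sigma_y, sigma_z the crosswind and vertical dispersion
    coefficients (supplied directly here for simplicity)."""
    return (Q / (math.pi * sigma_y * sigma_z * u)
            * math.exp(-y**2 / (2 * sigma_y**2))
            * math.exp(-H**2 / (2 * sigma_z**2)))
```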
The dispersion coefficients σx, σy and σz are determined by atmospheric conditions (turbulence in the atmosphere) and are tedious to calculate or measure on a case-by-case basis [80]. As a result, they have been parameterised, most notably by Pasquill and Smith [81], for different atmospheric conditions and presented as Pasquill-Gifford stability classes [80, 81]. The classes are defined from A to F, ranging from unstable through neutral to stable, with class A being the most unstable and F the most stable. Essentially, the higher the atmospheric instability, the higher the turbulence; this implies more mixing, keeping particles buoyant for longer and allowing more deposition far from the source [44]. While considerably simplifying parameter selection, the stability classification restricts Gaussian models to moderate distances and times, as the parameters of a particular class are typically valid only within a range of a few hundred metres [82] where the conditions for that class hold; longer distances may cut across multiple stability conditions. At the same time, the source-receptor separation should exceed about 100 metres, as concentrations for shorter separations can be unrealistically high [82].
The difference between the puff and plume models is that plume sources are continuous, while puff sources are intermittent emitters; a number of puffs are required before a cloud is formed. Differentiating between puffs and plumes becomes difficult when very rapid puffs are involved, and it is often more advantageous to model such emission sources as plumes [79]. For a point source emitting particles every τs seconds that remain airborne for τl seconds, puff and plume sources can be defined as follows [83]:
Puff source: 𝜏𝑠 ≫ 𝜏𝑙
Plume source: 𝜏𝑠 ≪ 𝜏𝑙
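This timescale comparison can be expressed as a trivial classifier; the factor of ten used to operationalise "much greater" is an assumption:

```python
def source_type(emission_interval_s, travel_time_s):
    """Classify an emission source by comparing tau_s (time
    between emissions) with tau_l (airborne travel time), as
    defined above: tau_s >> tau_l -> puff, tau_s << tau_l ->
    plume. The factor of 10 standing in for 'much greater' is
    an illustrative assumption."""
    if emission_interval_s >= 10 * travel_time_s:
        return "puff"
    if emission_interval_s <= travel_time_s / 10:
        return "plume"
    return "intermediate"
```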
2.2.1.1 Application of Gaussian Model to Spores
With regard to fungal spores, a Gaussian Plume Model (GPM) is more appropriate than a puff model, since the time between emissions is very close to zero [79]. Clarkson et al. [19] classify ascospore release from apothecia as a continuous phenomenon, having observed continuous release of spores in an experiment. This makes application of a GPM plausible, since apothecia are likely to be distributed within any single source and their combined release will form a cloud very similar in flow characteristics to a plume source. The ability of GPMs to incorporate terms that account for real effects, such as deposition on leaves and the escape fraction from the canopy [46], also contributes to their attractiveness. Spijkerboer et al. [46] have confirmed that spores from point sources actually form Gaussian plumes and that GPMs are as suitable for modelling spores as they are for modelling gases. They noted, however, that prediction accuracy suffers because most stability tables, including Pasquill-Gifford, under-predict stability classes, resulting in smaller plume sizes [84]. In summary, Gaussian models can adequately model the key components of dispersal: release, transportation and deposition. What they cannot do is simulate chemical mixing reactions between particles, and this is not a requirement for spore dispersion [79].
2.2.2 Trajectory Models
Trajectory models can be divided into Lagrangian [38, 48, 54, 85] and Eulerian [66, 80] models. In contrast to Gaussian models, which describe an entire cloud of particles, Lagrangian and Eulerian models follow individual particles as they move through the atmosphere, modelling their motion as a random walk process [83]. For Lagrangian models, individual particles travel at a changing speed known as the Lagrangian speed, u_L, whose rate of change is given by the Langevin equation [79, 86]:

\frac{du_L}{dt} = -au + b\xi(t)   [2.3]

where u is the previous speed of the particle, referred to as the 'memory term'; ξ is a random forcing function accounting for turbulence; and a and b are functions of particle location and time derived from the Fokker-Planck equation [87]. The displacement of a particle travelling from position x0 to x1 over time t0 to t1 is then

dx = u_L \, dt   [2.4]

The conditional joint probability distributions of these trajectories are then computed in order to evaluate the concentration.
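A minimal one-dimensional sketch of such a Lagrangian stochastic simulation is given below, assuming homogeneous turbulence so that a and b reduce to constants (a = 1/τ_L and b = σ_u√(2/τ_L)); in general they come from the Fokker-Planck equation, as noted above:

```python
import math
import random

def simulate_particle(u_mean, sigma_u, tau_L, dt, n_steps, seed=0):
    """One-dimensional Lagrangian stochastic sketch of Eq. 2.3:
    du_L = -(u_L - u_mean)/tau_L * dt + b * dW, with
    b = sigma_u * sqrt(2/tau_L) so the velocity variance relaxes
    to sigma_u^2. This homogeneous-turbulence special case is an
    illustration only; it returns the final particle position."""
    rng = random.Random(seed)
    u, x = u_mean, 0.0
    b = sigma_u * math.sqrt(2.0 / tau_L)
    for _ in range(n_steps):
        # Langevin step: deterministic relaxation plus random forcing.
        du = -(u - u_mean) / tau_L * dt + b * math.sqrt(dt) * rng.gauss(0, 1)
        u += du
        x += u * dt  # displacement: dx = u_L dt (Eq. 2.4)
    return x
```

Many such trajectories are then binned by position to estimate the concentration field.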
Eulerian models are very similar to Lagrangian models except that Lagrangian models use a frame of reference that moves with the particle, whereas Eulerian models use a fixed frame of reference [83]. As a result, the Lagrangian and Eulerian speeds are not the same, so the particle displacements calculated by the two approaches differ.
33
2.2.2.1 Application of Trajectory Models to Spores
Following the trajectories of individual spores means that these models will handle the effect of
wind gusts and rapid change in wind directions better than a GPM [88], which is restricted within
a specific range of atmospheric conditions describing a particular stability class. Easy and accurate parameterisation of airflows enhances the accuracy of trajectory models, although inaccuracies creep in for complex airflows that are difficult to parameterise [57]. This is particularly important in simulating spore travel within a canopy, where wind speed and direction are randomised by plant cover and other obstructions [79]. For estimating spore dispersal close to the source or near ground level, Aylor [58] and Aylor et al. [89] have found Lagrangian models to be accurate. Aylor and Flesch [55] found that, because Lagrangian models are best suited to lighter passive tracers, accounting for the effects of inertia and gravity when applying them to spores yields better results. They used a Lagrangian model to accurately estimate the release rate of spores into the atmosphere from a canopy, a problem that was hitherto intractable [55].
2.2.3 CALPUFF
Due to the successful implementation of Gaussian models in particle dispersal, numerous simulation packages employing Gaussian particle distributions have been produced, among them AERMOD [90] and CALPUFF [91]. CALPUFF is a non-steady-state meteorological modelling system specifically designed for air quality modelling. It has three main components: CALMET, a meteorological modelling package capable of generating diagnostic and prognostic wind fields; the core Gaussian dispersion model, CALPUFF, with wet and dry deposition and chemical removal; and CALPOST, a suite of post-processing programs that output concentration and meteorological data fields. In addition, a host of pre-processing programs are available to interface with various data formats and sources, including other models. The system uses geophysical and meteorological data inputs to construct a meteorological terrain that determines dispersion gradients [91].
At this stage it may be evident that there are further attractive reasons, beyond the one already outlined, to use CALPUFF rather than a basic GPM for spore dispersion simulation: generating meteorological fields, specifying the model domain and dispersion parameters, and visualising outputs, all of which otherwise require specifying complex parameters, are significantly less tedious with GUI support; and much more complexity in the averaging times of wind variables is easily handled.
The major drawback of CALPUFF is its reliance on data formats that are not widely used: relevant datasets are often unavailable in those formats, and conversion is tedious. For example, while global and local (USA) land-use data files are available in the supported CTG (Contiguous Themed Grid) and other formats, the global data does not provide enough granularity for non-global models, necessitating the sourcing of local UK data for good terrain simulation; this data is not available in the supported formats, and conversion is a problem.
2.3 Multivariate Statistical Analysis
Based on the review of Sclerotinia disease models, it may be observed that current models are first-principles models requiring information about the source, such as the number and location of sources and source strength. This information is mostly unavailable, unreliable or difficult to obtain. It is therefore possible that empirical approaches, which can ascertain and exploit the structure of the dispersion process from collected data, would provide better results. With these methods, most of the inadequacies of SkleroPro and RAISO-Sclero, such as the lack of assessment of prediction uncertainties, scientific diagnostics, statistical inference and evaluation, can be addressed.
One category of suitable empirical approaches is multivariate data analysis [92]. Multivariate
analysis allows the statistical investigation of inter-variable relationships and contributions in data
sets containing multiple variables. Application of its various forms allows information in any data
to be extracted, interpreted or predicted. A demonstration of the powerful qualities of multivariate
analysis is seen in its application in the chemometrics industry where a large number of predictors
that may not necessarily be significant to the predicted variable make up most of the dataset,
and there is a need to reduce data dimensionality such that only essential variables are included
in the model [92, 93]. It is these qualities that this study hopes to exploit. The analysis is not restricted to the useful information (signal) part of the data but extends to the error (random noise) in the data too, for statistical inference and diagnostic purposes. This scientific analysis of error provides a diagnostic tool for assessing results, something lacking in the experimental approach used for Sclerotinia models [92].
Multivariate statistical analysis is used in a wide range of applications comprising 1) data description, 2) regression and prediction, 3) interpolation, and 4) discrimination and classification. Generally, these applications can be categorised into linear or nonlinear, and projection or non-projection, methods [92, 93].
Linear and Nonlinear
Linear methods are those that can be adequately described by the linear model:
𝑦 = 𝑚𝑥 + 𝑐
where y is the dependent variable, x the independent variable, and m and c are constants. By contrast, nonlinear models are not linear in their parameters. Linear methods are generally easier to use and more grounded in theory, since they are better understood [93]. On the other hand, a deep understanding of the data is required before nonlinear methods can be used with confidence [92]. While it is possible, and desirable, to incorporate some nonlinear relationships into a linear model, it may be necessary to use nonlinear models for some data. The choice of method depends on the structure and complexity of the data. Most
of the popular regression techniques such as Multiple Linear Regression (MLR), Principal
Component Analysis (PCA), Principal Component Regression (PCR), and Partial Least Squares
(PLS) fall under the linear regression category although they can accommodate some nonlinear
relationships while still maintaining the general linear-in-parameter nature of the model [92].
Projection and non-projection methods
This classification relates to regression. Projection based methods require the original data to be
transformed into a new variable space for easy visualisation and most importantly for dimension
reduction. Methods like PCA, PCR and PLS fall under this category. MLR, however, is a non-projection method, where all modelling and analysis is done on the actual variables themselves. Projection-based methods are generally more popular and effective in applications with a large number of correlated variables. While non-projection methods may result in underdetermined systems for a dataset X with m < n (where m and n are the numbers of samples and variables respectively), projection methods always result in an overdetermined system and therefore always give a unique least-squares solution to the regression problem [94, 95].
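The point about projection methods can be demonstrated with a minimal PCR sketch: even with fewer samples than variables, regressing on a handful of principal-component scores yields a unique least-squares solution. This is a generic illustration, not code from the cited works:

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal Component Regression sketch: project X onto its
    leading principal components, regress y on the scores, and
    map the coefficients back to the original variables. For
    m < n (fewer samples than variables) ordinary MLR is
    underdetermined, but the reduced problem is over-determined
    with a unique least-squares solution."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Principal components via SVD of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings (n x k)
    T = Xc @ P                         # scores   (m x k)
    b_scores, *_ = np.linalg.lstsq(T, yc, rcond=None)
    beta = P @ b_scores                # back to original variables
    return beta, x_mean, y_mean

def pcr_predict(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean
```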
2.3.1 Multivariate Analysis in Agriculture
Due to the novelty of the approach proposed in this work (see Chapter 1), similar applications of multivariate analysis have not been found in the literature. However, agricultural disease prediction
is a wide and diverse area and there have been some applications of multivariate analysis within
the sector. Principal Component Analysis (PCA), Factor Analysis (FA) and other forms of
Discrimination Analysis have been extensively used to analyse and classify various kinds of
agricultural data. Kallithraka et al. [96] used PCA to classify Greek wines based on geographical origin: 33 varieties were successfully grouped into two geographical regions, although more samples need to be analysed before the result can be generalised. Similarly, Whittaker [97]
used PCA to identify food-borne bacteria by infrared spectroscopy and the results, after
verification, proved to be accurate.
For modelling and prediction purposes, despite its inferior performance, Principal Component Regression (PCR) has in the past been favoured over Partial Least Squares Regression (PLSR) due to the former's perceived stronger statistical background, even though neither is fully understood statistically [98]. However, following the efforts of Höskuldsson [99], Helland [100] and subsequent works, the statistical properties of PLSR are better understood, and PLSR is increasingly being used outside chemometrics. PCR and PLSR have been used by Liu et al. [101] to estimate and characterise the severity of rice brown spot disease. The data were collected by hyperspectral reflectance and are therefore spectral. The results showed that PLSR gave predictions with a lower Root Mean Squared Error of Prediction (RMSEP) than PCR. Other applications of PLSR include the prediction of beef palatability from the colour, marbling fat and surface texture features of longissimus dorsi [102], and the detection of Sclerotinia rot disease on celery using hyperspectral data [103]. Applications of PLSR to Sclerotinia and other plant disease prediction have predominantly utilised spectral data because, until now, there was no automatic spore detection mechanism that could be used with an online forecasting system.
However, traditional regression methods have been used in developing indirect Sclerotinia
prediction models. Linear regression and logistic regression have been used to associate field
infection levels with disease incidence [67, 70, 104], and multiple regression techniques have
been used to relate disease incidence with several independent variables using SAS software as
part of the forecasting process [105]. Analysis of variance (ANOVA) has also been used in the
analysis of historical disease data [104].
Branches of multivariate analysis such as regression analysis and multivariate interpolation are potentially useful for predicting and forecasting spore concentrations. While regression methods have to some extent been used within agriculture, the data used are mostly image or spectral data, which makes the applications little different from typical chemometrics applications [106].
2.3.2 Multivariate Statistical Process Control
Due to the novelty of the design of the sensing surface, there is no prior indication of how the biosensor will perform. There is therefore a need to establish the measurement accuracy to expect from the device. In addition, the sensitivity of the surface, and its specificity to the particular strain of Sclerotinia encountered [107], can affect measurement accuracy. Moreover, unforeseen problems frequently cause complications in practice: real deployments have shown that environmental sensors can have a yield as low as 50%, with the rest of the data either corrupted, erroneous or lost in transmission [108, 109].
Various statistical methods are used to assess sensor accuracy. Kollman et al. [110] used
correlation analysis, Error Grid Analysis (EGA) and Receiver Operating Characteristics (ROC) to
assess the accuracy of continuous glucose sensors. They found that general statistical methods tend to overestimate sensor accuracy/inaccuracy because they do not distinguish between clinically significant and insignificant errors. This is relevant to this research, as the detection of spores by the biosensor system does not necessarily indicate a disease outbreak, since the amount of spores may not be enough to cause an epidemic [16]. A threshold of spore concentration exists above which an outbreak is more likely; a sensor with an error interval wider than this threshold has low accuracy and provides little information. Kollman et al. [110] suggested that specialised methods based on event assessments, which take more history into account than point-by-point assessments, would provide a better sense of accuracy for these sensors.
Poisson distributions, which can be used to compute probabilities of event occurrence [111], seem attractive. However, a Poisson process requires successive samples to be independent, and the standard formulation assumes a known parametric form. There are no assurances that the data (error vector) will follow such a form, so an adaptation of the Poisson distribution to nonparametric distributions, or other KDE-based methods [112], may need to be considered.
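For reference, the event-probability computation under a Poisson model is straightforward; the rate λ here is a placeholder that would have to be estimated from sensor data:

```python
import math

def prob_at_least(k, lam):
    """Probability of at least k detection events in an interval
    under a Poisson(lam) model:
        P(N >= k) = 1 - sum_{i<k} exp(-lam) * lam^i / i!
    A sketch of the event-probability computation mentioned
    above; lam (the expected event count) is an assumed input."""
    p_less = sum(math.exp(-lam) * lam**i / math.factorial(i)
                 for i in range(k))
    return 1.0 - p_less
```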
2.3.2.1 Fault Detection and Identification
Sensed (measured) data is inevitably susceptible to errors. Therefore, a mechanism to detect and
identify erroneous sensors is necessary. Because the sensors are required to record real-time
measurements, online error detection methods are desired. Fault detection methods are an ideal
remedy for this since they can be implemented online and the general definition of a fault can be
extended to cover both false positive and negative errors as well as real faults that occur as a
result of sensor failure. As with faults, subtle errors due to drift or correlation breakdown are
more difficult to detect than glaring errors that breach specified limits [113]. An effective
detection method will be one that detects both.
Extensive investigations into online fault detection have been conducted in process monitoring and control over the past years [113-118]. PCA and PLS models [92, 119-122] have long been used for process monitoring and control, exploiting the ever-present high correlation of process variables. The idea is to compare a model built with healthy historical data against new incoming samples; abnormalities are detected by monitoring the residual error or normal variance of
the model, both of which are indicators of correlation breakdown. The faulty sensor is identified
by identifying the variable with the highest contribution to the square prediction error (SPE) and
the Hotelling T2 statistic [116, 123-125]. Dunia et al. [126] detected, identified and reconstructed
faults using a PCA model and a novel metric, the sensor validity index (SVI), which provided the
status of each sensor online. They applied this method on a boiler process and the results showed
very good reconstructions are possible for a highly correlated process, and the SVI was able to
identify faults reasonably well.
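The SPE-based monitoring scheme described above can be sketched as follows. This is a generic illustration of the approach, not the cited authors' implementation, and the contribution analysis is reduced to picking the variable with the largest squared residual:

```python
import numpy as np

def fit_pca_monitor(X_train, n_components):
    """Build a PCA monitoring model from healthy historical data:
    residuals of new samples in the discarded subspace give the
    squared prediction error (SPE)."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    P = Vt[:n_components].T  # retained loadings
    return mu, P

def spe(x, mu, P):
    """SPE (Q statistic) of one sample, plus the index of the
    variable with the largest squared residual, flagged as the
    likely faulty sensor."""
    d = x - mu
    r = d - P @ (P.T @ d)  # residual outside the model subspace
    return float(r @ r), int(np.argmax(r**2))
```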
However, their approach only considered one sensor failing or becoming faulty at a time, an assumption this study cannot afford, since multiple simultaneous sensor failures are expected, for example when oxalic-acid-producing pathogens that cause false positives traverse the space covering more than one sensor. Later work by Doymaz et al. [127] and Bose et al. [128] addresses the case of multiple sensor failures. While all of these methods measure multiple variables, the biosensors considered in this work will measure only a single variable (spore concentration); however, its spatial variation makes its behaviour similar to that of distinct, correlated process variables, making the process monitoring and control methods discussed appropriate.
Sharma et al. [129] investigated the prevalence of faults in real-world sensor deployments using
a number of fault detection algorithms: estimation-based methods, learning methods and rule-
based methods. Similar to the PCA case, the estimation method uses a least-squares model to estimate a sensor's readings from its neighbours, but uses the estimation error, rather than the SPE, to detect faults. No particular method was found to be perfect; each of the methods performed well depending on whether the fault was a short fault or a noise fault [129]. They also found that sensor faults did not occur very frequently, except when mechanical failure was involved, but when they did, the erroneous values were orders of magnitude higher than the actual measurements. This is significant, since a biosensor, with all its added complexity, is more susceptible to mechanical as well as other problems. This approach also utilised spatial correlation to estimate reliable models of sensors from other sensors. Ramanathan et al. [109] also looked into rapidly deployed sensor networks, where the short deployment period makes detecting and correcting faults more exigent. Their algorithm was based on a set of rules that identified and classified faults by their duration. They found faults to be more frequent than Sharma et al. [129] did, confirming that fault prevalence is data-dependent.
Both Sharma et al. [129] and Ramanathan et al. [109] used rule-based fault detection and
identification with success. The fact that Ramanathan et al. [109] used it for rapidly deployed
39
sensor shows that the approach can detect faults in a timely manner. One more attraction of the
rule-based method is that it can easily incorporate another set of rules that define what to do in
the event of a fault, thus providing fault detection and remediation functionalities. While other
methods such as learning methods, neural network models or hidden Markov models [130], can
provide similar confidence in results, fault remediation is not easily implementable. The main
challenge of the rule-based method is in using it online for slowly sampled systems. Several
samples must be accumulated before some types of fault (subtle ones) can be detected; for the
biosensors considered in this work, which have a sampling frequency of one measurement per
day, this corresponds to a detection delay of several days. Another challenge is the massive
amount of knowledge and experience required to formulate effective rules [109].
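As a minimal illustration of the rule-based approach, a duration-based classifier might look like the following sketch. The rules and thresholds here are hypothetical and are not those of Ramanathan et al. [109]; they merely show how fault classes can be distinguished by how long an anomaly persists.

```python
def classify_fault(readings, spike_threshold=100.0, stuck_run=3):
    """Classify a window of sensor readings with simple duration-based rules.

    Hypothetical rules: a single reading far above both its neighbours is a
    'short' fault; a run of identical values is a 'stuck-at' fault; otherwise
    the window is presumed fault-free.
    """
    # Rule 1: short fault -- one isolated extreme value.
    for i in range(1, len(readings) - 1):
        if (abs(readings[i] - readings[i - 1]) > spike_threshold and
                abs(readings[i] - readings[i + 1]) > spike_threshold):
            return "short"
    # Rule 2: stuck-at fault -- the same value repeated for several samples.
    for i in range(len(readings) - stuck_run + 1):
        if len(set(readings[i:i + stuck_run])) == 1:
            return "stuck"
    return "ok"
```

A remediation rule set could then be keyed on the returned class, which is the extra functionality noted above that rule-based methods make easy to add.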
All fault detection methods use a threshold limit that is designed around the confidence limit of a
fault-free error model of the process [109, 113, 116, 128, 131]. Selecting this threshold is
crucial, as an unsuitable value will result in too many false alarms, rendering the fault detection
system highly unreliable. A possible way to deal with naturally occurring false alarms is
through the use of a low pass filter. Qin and Li [131] used exponentially weighted moving average
(EWMA) filtering to drastically reduce the amount of false alarms due to noise.
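An EWMA filter of the kind referred to above can be sketched in a few lines. This is a generic implementation; the smoothing constant α = 0.2 is an illustrative default, not a value taken from Qin and Li [131].

```python
def ewma(x, alpha=0.2):
    """Exponentially weighted moving average: y[t] = alpha*x[t] + (1-alpha)*y[t-1].

    Smoothing the monitored statistic (or residual) before comparing it with
    the detection threshold suppresses isolated noise spikes and hence
    reduces false alarms.
    """
    y = []
    prev = x[0]  # initialise the filter at the first observation
    for value in x:
        prev = alpha * value + (1 - alpha) * prev
        y.append(prev)
    return y
```

Smaller values of α give heavier smoothing (fewer false alarms) at the cost of slower response to genuine faults, so α is itself a tuning trade-off like the threshold.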
2.3.2.1 Kernel Density Estimation
Kernel Density Estimation (KDE) is a reliable nonparametric method of estimating data distributions which
makes no assumptions of normality [112, 132]. The kernel density estimator is given by [112]:
f(x) = (1 / n h_KDE) ∑ᵢ₌₁ⁿ K((x − xᵢ) / h_KDE) [2.4]
where n is the number of samples, h_KDE is the smoothing parameter or bandwidth, analogous to
the bin width in histograms (the KDE is in effect a more sophisticated histogram in which the width
and origin are ideally chosen to reveal the true properties of the data), x is the position
of interest, xᵢ are the sample values and K is the kernel function [112]. K can be a symmetric
probability density function or a piecewise function. The choice of K is very important as it
determines the differentiability and continuity of f. Each of the summation components in Eq.
2.4 represents a kernel whose shape is determined by K and whose width is set by h_KDE [112]. The
individual kernels are added together to yield f, as illustrated in Figure 2.3.
The difference between a successful and failed KDE implementation usually comes down to the
choice of bandwidth, which is very much data dependent [132-134]. A large bandwidth obscures
data detail, including multimodal features, while a value close to zero accentuates data spikes
[134]. Other forms of Equation 2.4 are available because there are different methods of
specifying h_KDE that enable the estimator to capture more detail for data drawn from long-tailed
distributions, which would otherwise be masked by the more prominent part of the distribution
[112].
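Equation 2.4 translates directly into code. The sketch below uses a Gaussian kernel as an illustrative choice of K (not necessarily the kernel used in [112]); evaluating it over a grid of x values with different bandwidths reproduces the effect shown in Figure 2.3.

```python
import math

def kde(x, samples, h):
    """Kernel density estimate at position x (Eq. 2.4), Gaussian kernel K."""
    def K(u):
        # Standard normal density as the kernel function
        return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    n = len(samples)
    return sum(K((x - xi) / h) for xi in samples) / (n * h)
```

With a large h the individual kernels merge and multimodal detail is lost; with h near zero each sample produces its own spike, as discussed above.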
Figure 2.3: Kernel estimates showing individual kernels and the effect of bandwidth, ℎ𝐾𝐷𝐸 (a)
ℎ𝐾𝐷𝐸 = 0.2; (b) ℎ𝐾𝐷𝐸 = 0.8 [112]
2.4 Sensors, Biosensors and Sensor Networks
Biosensor and chemo-sensor networks have enormous promise in combating bioterrorism,
contamination, and improving healthcare [135]. The main goal of the 'SYIELD' project, which
aimed to deploy a network of biosensors across the UK for crop pathogen detection, fits squarely
within this vision. As the name implies, these biosensors are not primarily transducer-based and
are, therefore, not as advanced, reliable or well understood [136]. Large-scale deployment of
transducer-based sensor networks, for example those measuring physical signals such as
temperature and pressure, has been researched and refined to efficient levels of performance
over the years. The electrical and electronic industries have been the primary contributors,
through their quest to create and improve the telecommunications sector, along with the
computer industry through its interest in optimising data acquisition.
Unfortunately, biosensor networks, which are potentially a source of data of unprecedented
importance and application, have not been so developed due to the complex nature of the sensing
surface and other reliability issues of the individual sensors that make up the network [136]. It is
still possible, however, to adapt most of the sophistication, and experience acquired over the
years, of ‘conventional’ Wireless Sensor Networks (WSNs) to biosensor networks. For example, it
is now known that a distributed network structure provides networks with more redundancy and
reliability regardless of the makeup of a sensor's sensing surface. Chemo/biosensors can be
deployed in a semi-distributed fashion, allowing them to cope with node failure through
decentralisation while not requiring expensive node-to-node communication [136]. Concepts such
as scheduling, which enables the sensor to be turned off in low risk periods of the day, and data
aggregation, which allows minimisation of data transmission energy, can be employed [136].
2.4.1 Peculiar Challenges of Biosensor Networks
Despite this transferability between conventional sensor and biosensor networks, problems still
persist regarding the optimisation of biosensor networks. Peculiar complications and unreliability
due to the unpredictability of the sensing surface and a multitude of other factors, ranging from
mechanical unreliability to the stochastic nature of particle transport, make the network more
complex and less reliable [136]. The randomness of the measured variable (spore concentration
is a function of a random dispersion process) makes the sensing range of the spore biosensor
lower compared to that of a conventional transducer-based sensor. This makes the problem of
biosensor location and area coverage even more crucial. The high cost of these biosensors
also mandates the use of a robust network that can function well with relatively few nodes. In
addition, low biosensor sensitivity and unusual drift can seriously affect data quality.
Consequently, constraints resulting from these challenges can affect four key areas: biosensor
location, biosensor coverage, biosensor deployment and data quality.
2.4.1.1 Biosensor Location
Striking an optimal trade-off between network performance, sensor location, coverage, cost and
redundancy is challenging. Sensor location varies with the application being considered [135].
For example, the Environmental Protection Agency’s (EPA) pollution monitoring network, whose
primary aim is ensuring people’s safety, uses community-based location: statistically determined
information about where the majority of people live is used to site sensors. Within any
population of interest, sensors are located where they are most likely to guarantee early warning.
For spores, as they originate from farms and infect crops, it makes sense to place biosensors in
farms in an ad-hoc manner. However, there are many large farms, some located adjacent to each
other. A better approach, one that deploys biosensors systematically by exploiting knowledge of
their properties and limitations, would improve data quality and reduce the cost of deployment.
2.4.1.2 Biosensor Coverage
Non-contact-based detection methods, where the sensed target is a continuous signal and
contact with sensor is not required, aim for complete coverage [137]. Complete coverage means
that all objects (variables of interest) traversing the space surrounding a network of sensors will
be detected. This is possible for signal detection because the sensing radii required for information
coverage for these sensors are quite large [137]. Unlike physical variables that are continuous in
time and space, spores are in a state of motion, and measured values depend on location of
measuring device [13]. Moreover, biosensors typically require actual physical contact with the
measured variable for detection to be possible [136]. This reduces the coverage radius of
individual sensors to effectively less than a couple of metres depending on wind speed, direction,
and whether or not there is a source in their vicinity. There is, therefore, no unique definition of
coverage for any individual biosensor. So many sensors are required to ensure complete network
coverage for spores that it is not practical. Note that the coverage of the entire network is an
extension of the coverage of a single sensor. As a result, chemo/biosensors used in biological,
biohazard and pollution measurements do not aim for complete coverage [138]. They instead
aim to detect ‘significant threats’ and this is why there are no deployment standards for
biosensors.
2.4.1.3 Biosensor Deployment
There is no standard for deploying sensors, as most deployments are based on ad-hoc measures
[138]. But lately, deployment is becoming increasingly reliant on optimal placement of sensors to
guarantee redundancy [135]. Redundancy is very useful in a network because it allows for failure
of nodes and improves measurement precision. Most of these approaches are based on some
kind of optimisation where an objective function is minimised subject to certain constraints. The
choice of the objective function to minimise or maximise varies depending on the type of sensor
[135]. Heuristics, dynamic programming and genetic algorithms are among the optimisation
approaches recently used [139-141]. Kanaroglou et al. [138] formulated the optimisation problem
of placing pollution monitors as a function of pollution surface variability, or semivariance.
Portions of this surface that have high entropy (variance) are allocated a larger number of
monitors by this method. Krause et al. [142] have criticised this approach, asserting that the
assumption that high variance locations need more monitors is weak since entropy is an indirect
criterion that does not consider the uncertainty in the original prediction that produced the
surface. Since sensors placed far apart from each other – in the network that produced the data
– are likely to have higher entropy, the result of this weak assumption is that sensors tend to be
located at the borders of the area of interest [142, 143].
Park et al. [144] approached placement as an optimisation problem that is a function of geographical
features such as roads, water bodies, elevation, etc., and redundancy. They found that the
algorithm provided qualitative sensor placement that is robust enough to handle node failure,
meaning the algorithm sufficiently allowed for redundancy. The method, however, does not
account for location errors, which are certain to occur even with manual deployment. In addition, there
is no quantitative measure of how beneficial this method is over others.
Lee and Kulesz [135] noted that most of these methods do not consider the dispersion of
hazardous materials, their toxicity and population distribution when placing sensors. These factors
are important when there is an interest in determining the effect of threats on populations. They
proposed a general risk-based placement algorithm that locates sensors iteratively based on the
solution of a local optimisation problem. For a gridded area, the placement for each cell represents
a local optimisation problem. The next placement disregards the risk used for placement in the
previous cell. This approach provides the advantage of quantifying the gain of adding each sensor
to the network, thereby providing an extra condition for stopping the iterative process – when a
certain threshold is reached – in addition to number of sensors available for placement [135].
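The iterative, risk-based idea described above can be sketched as a greedy loop over grid cells. This is a schematic reconstruction under stated assumptions, not the published algorithm of Lee and Kulesz [135]; the risk values and gain threshold are hypothetical.

```python
def place_sensors(risk, max_sensors, gain_threshold):
    """Greedily place sensors on a gridded risk surface.

    Each iteration places a sensor in the cell with the highest remaining
    risk, records the gain (risk covered) of that placement, zeroes the cell
    so the next placement disregards it, and stops when either the sensor
    budget is exhausted or the marginal gain falls below a threshold.
    """
    risk = dict(risk)  # cell -> risk value; copy so the input is untouched
    placements = []
    for _ in range(max_sensors):
        cell = max(risk, key=risk.get)
        gain = risk[cell]
        if gain < gain_threshold:
            break  # extra stopping condition from quantifying each sensor's gain
        placements.append((cell, gain))
        risk[cell] = 0.0  # next placement disregards this cell's risk
    return placements
```

Recording the gain of each placement is what provides the second stopping criterion noted in the text, in addition to the number of sensors available.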
2.4.1.4 Data Quality
A high-quality data set is one that guarantees a reasonable amount of confidence in the statistical
analysis and conclusions drawn from it. Spore data behaves in a similar way to particulate
datasets such as pollution data [44, 79]. Most approaches for optimising data quality in sensor
networks deal with selecting an ideal sampling frequency and ensuring enough sampling
locations. Exploratory tools for investigating data quality include entropy-based methods [145],
principal component analysis [92], and a host of other time-series analysis methods, such as
autocorrelation and trend analysis (Shumway and Stoffer [146]). Averaging and filtering methods
for removing the excessive noise that is almost certainly going to be present in the system are
particularly useful. All of these methods can be used to analyse data collected from a pilot phase
of the experiment, and the conclusions drawn from such an analysis could be used to improve
subsequent data collection. Data samples are spatially and temporally correlated and the amount
of this correlation determines data usefulness.
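Of the exploratory tools mentioned above, the sample autocorrelation is among the simplest to apply to pilot-phase data. A minimal implementation of the standard lag-k estimate (following the usual textbook definition, as in Shumway and Stoffer [146]) is:

```python
def autocorrelation(x, lag):
    """Sample autocorrelation at a given lag:
    r_k = sum_t (x_t - xbar)(x_{t+k} - xbar) / sum_t (x_t - xbar)^2.
    """
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    den = sum((x[t] - mean) ** 2 for t in range(n))
    return num / den
```

A slowly decaying autocorrelation indicates strong temporal correlation, and hence that a lower sampling frequency may suffice; a rapid decay suggests samples carry more independent information.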
2.5 Conclusion
The review has provided an overview of atmospheric models, fault detection in sensors and
sensor networks and data integrity methods as they relate to biosensors and spores. The
main limitation of current agricultural disease prediction methods has been identified as their inability
to incorporate spore dispersion information into disease risk forecasts. Atmospheric models, such
as GPM and trajectory (Lagrangian and Eulerian) models, which have been successfully extended
from meteorology and climatology to spore dispersion, have also been reviewed. The GPM is
attractive for spore dispersion prediction because of its ease of implementation and availability in
many computer packages such as CALPUFF. Trajectory models have been shown in the literature to
be more accurate near a source due to their ability to better simulate the increased randomness
brought about by canopy disruption of wind flow within that range. Generally, trajectory models require
detailed information about the source of spores, such as location, ground cover, initial release
velocity and source strength and dimension. This information is not always available in non-
experimental cases.
Data integrity methods in biosensor networks are not yet fully developed because biosensing is only
just gaining ground. Creating reliable chemo- or biosensing surfaces for deployment on a network
scale is still in its infancy. Nevertheless, transferable methods from conventional sensor networks
for optimising network operation and ascertaining and improving data quality have been
identified. These methods were found to be widely used in the communications, electrical,
environmental and chemical industries.
The review also found that sensor deployment is critical as it affects the quality of data collected
and should consider sensor properties such as coverage and measurement precision. Almost all
deployment methods are based on optimising a constrained cost function. Inputs to this objective
function, in addition to standard ones such as the maximum number of sensors and the coverage
area, can vary from approach to approach. The superior methods include sensor redundancy as an
input to the objective function.
Chapter 3 Dispersion of Sclerotinia sclerotiorum Spores in an
Oil Seed Rape Canopy
3.1 Introduction
As discussed in Chapters 1 and 2, the manual effort and logistical expense of collecting and
quantifying Sclerotinia spores, a consequence of rudimentary data collection methods, limit the
availability of agricultural data. Aerially dispersed PM10 data, which is widely available as air
quality monitoring data, is useful for above-canopy analysis at longer distances from the source,
but is unsuitable for analysing and modelling source-receptor dispersion on a local scale because
near-field transport differs from far-field dispersion [147].
The aims of this chapter are to generate data that can: reliably explain the dispersion pattern of
Sclerotinia sclerotiorum spores in OSR fields in a controlled experiment; enable evaluation of the
sensitivity and effectiveness of a prototype oxalic acid-measuring biosensor; and provide data
that can be used to identify a suitable model for the dispersion of Sclerotinia sclerotiorum spores
in OSR fields.
The chapter details the design and implementation of an experimental field trial for the emission,
dispersion and collection of Sclerotinia sclerotiorum spores. It also describes the measurement of
spore concentration from field samples by direct DNA quantification and by proxy, through the
measurement of oxalic acid. Oxalic acid concentration was measured using the electrochemical
process employed by the biosensor (see Chapter 1 and section 3.3.1 in this chapter) and with a
direct concentration measurement using colourimetric analysis. Data resulting from both methods
of oxalic acid concentration evaluation has been compared to test the biosensor’s efficacy. The
overall dispersion of Sclerotinia sclerotiorum spores within, through, and above an Oil Seed Rape
(OSR) canopy is further discussed. Experimental methods consistent with standard agricultural
procedure and practices were employed, and the application, set-up and utilisation of air sampling
and weather measuring equipment were demonstrated. Methods from the chemical sciences,
through the preparation of various samples/reagents, have been adopted and electrochemical
measurement and instrumentation techniques have been utilised along with biochemical
(colourimetric analysis) and biological (quantitative Polymerase Chain Reaction) techniques for
concentration/DNA quantification from collected spores.
The experiment was conceived, designed, set up and implemented by the author with
assistance from Dr Jon West of Rothamsted Research Ltd, UK. All laboratory work was carried
out at Rothamsted Research’s Plant Biology and Crop Science (PBCS) laboratory. The author
solely carried out the biosensor tests and reagent preparation; the colourimetric analysis was
assisted by Dr Stephanie Heard, then of Rothamsted Research; and Mrs Gail Canning, also of
Rothamsted Research, carried out the qPCR analysis. Matlab and data analysis tools in MS Excel were used for data
analysis and presentation.
3.2 Motivation for Experimental Field Trial
This experimental field trial was motivated by a change in the original plan of the SYIELD project
(see Appendix 1 for details). The initial plan intended that prototype biosensors would be ready
for deployment in field trials by mid-2011, thereby providing the author with at least two years of
Sclerotinia spore dispersion trial data by the end of the PhD. Based on the original plan,
biosensors housed in an integrated unit comprising a virtual impactor, a collection and incubation
mechanism and electrochemical transducers (http://www.syield.net/) would be deployed in
quantity across OSR fields to automatically measure and output an oxalic acid concentration daily,
thereby supplying data on the spatial variation of airborne ascospores. For technological and
logistical reasons the biosensor units were not available for this purpose throughout the PhD.
This meant that, by 2013, the author had no data.
As a result, the author, with the original research goals in mind, conceived and designed the
experimental plan in order to collect data for testing the biosensing chipset (which was in
production at that stage) and evaluating and modelling Sclerotinia dispersal in and above an oil
seed rape canopy. The main aspects of the design were determined by considering the potential
environment in which the completed biosensors would be deployed and the type of data they
would be likely to collect (naturally released, turbulent in the near field, diffusive in the far field,
susceptible to contamination, etc.).
3.3 Methodology
This section presents the methodology used in designing the field trial, spore sampling,
identification and quantification. In this main section, a general overview is presented. Description
of the theory and application of these methods along with modifications and justifications are
given in the relevant subsections starting from section 3.3.1.
Broadly, three techniques borrowed from agriculture and horticulture, analytical chemistry and
biology, and meteorology were used. These are: field trial and spore sampling, weather
measurement and instrumentation, and identification and quantification of spores.
A majority of the methods used in this chapter regarding the standard agricultural experimental
setup for spore sampling, deployment and setup of sampling equipment, and identification and
quantification of spores are based on the procedures proposed in Lacey and West [148] and
review of spore traps by Jackson and Bayliss [10]. Standard procedures specific to sampling
agricultural data for model evaluation were also used based on experiments of Aylor, McCartney
and Gleicher. Methods regarding weather instrumentation measurement draw on [149] and
[150].
3.3.1 Field Trial Experiment
An experiment was designed to collect airborne Sclerotinia sclerotiorum spores in a winter Oil Seed
Rape (OSR) field in Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084) between 31st May and
3rd June 2013. Little Hoos is one of the classical experimental fields1 at Rothamsted Research
Limited, UK. Figure 3.1 shows the location of Little Hoos in relation to other fields. Siting of the
experiment was motivated by a need to improve sampling reliability, which can be significantly
affected by unmeasured background levels of spores [10], and improve reliability of turbulent
measurements, which are more stable and representative over flat terrain [152]. Crop
rotation was practiced in Little Hoos in the previous two years, a practice that is known to inhibit
growth of sclerotia [153] and therefore reduce the likelihood of background concentration levels
of Sclerotinia. There were no detectable natural or artificial inoculum sources in any of the
surrounding fields as the background spore levels from an upwind air sampler confirmed (see
section 3.3.1.3).
1 Classical experimental fields refer to the long-term trial sites where landmark field
experiments were carried out by John Lawes and Henry Gilbert in the 19th century. See
Silvertown, J., et al., ‘The Park Grass Experiment 1856–2006: its contribution to ecology’, Journal of Ecology, 2006, 94(4), pp. 801–814 [151].
The scale of the experiment (43 x 38m) was chosen for practical reasons, limited by the source
strength, the number of available samplers and considerations of the relatively short distance travelled
by spores released from a small source inside canopies with a leaf area index of at least 2.5 [43].
The aims of the field trial were threefold: to generate Sclerotinia sclerotiorum spore spatial data
that would enable the analysis of dispersal of naturally-released spores from in-canopy ground
level sources; to generate suitable data for the identification and evaluation of a physical transport
model to describe dispersion of naturally released spores in and above an OSR canopy; and to
provide viable real-life sampled spores for calibrating and testing a prototype Sclerotinia
biosensor.
3.3.1.1 Source and Site Characteristics
Figure 3.1 shows the sampling area within the experimental site. The area is a 43 x 38m
rectangular site within a 100 x 100m OSR field. The OSR was at the flowering stage. Six groups
of sclerotia were sown at a depth of approximately 2cm in late autumn of 2012, distributed around
the circumference of a 7m-diameter circle. A ring configuration was adopted for the source in
order to address an additional objective of the experiment, which was to test the performance of
an automated prototype biosensor.
The sclerotia were monitored throughout the winter; they matured and produced sporulating
fruiting bodies (apothecia) during the flowering period of the OSR in late April 2013. At the start
of the experiment, the approximate canopy height and Leaf Area Index (LAI) were measured as
1m and 3.5 respectively. LAI was measured with a leaf area index meter (LAI-2200, LiCor
Environmental, NE, USA), which also calculated mean leaf angle.
Figure 3. 1: Location of Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084), the experimental
site, among other field trial sites at Rothamsted Research UK (source of image: Rothamsted
Research).
3.3.1.2 Air Sampling
In this work, given that ascospores are chiefly dispersed aerially [19, 26, 66, 154, 155], air
sampling methods were the primary area of focus. Numerous air sampling methods and
equipment, of varying reliability and confidence, have been developed over the past 50 years
[10, 11, 156]. Generally, depending on the physical characteristics of the spore being sampled,
passive and active sampling techniques [148] can be used. Passive methods rely on the use of
spore traps to capture spores mainly by sedimentation. This method is less suitable for smaller
particles (< 2 μm) as a result of Stokes’ law, by which heavier particles are preferentially
deposited due to their relatively higher settling velocity [148, 157]. Active samplers capture
spores by inertial impaction of an actively drawn air stream; the higher the sampling volume, the
higher their reliability. Advantages of active methods over passive ones, in addition to a higher
sampling volume, include higher impaction and deposition retention [148].
Popular types of active sampling devices are the Burkard Hirst type spore trap and Rotorod
samplers. In this work, the Rotorod active sampling device was preferred over Burkard traps due
to its combination of ease of use and setup, low cost and high sampling volume [21] at
similar or superior impaction efficiencies [55, 158] for particle sizes
above 7𝜇𝑚 [10].
The Rotorod sampler comprises two detachable vertical arms (I-rods) attached to a DC motor,
forming a ‘U’ formation, such that the I-rods stand upright. The motor rotates the arms to sample
an air volume given by:
V = πΔΨΓΖ cm³ min⁻¹ [3.1]
where Δ, Ψ, Γ and Ζ in Equation 3.1 are the outer diameter, the width of the collecting surface,
the length of the collecting surface (all in cm) and the rotational speed (rpm), respectively.
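Equation 3.1 is straightforward to evaluate. In the sketch below the rod dimensions are hypothetical illustrative values, not manufacturer specifications, chosen so that the result is close to the 38 litres per minute quoted for the samplers used in this work.

```python
import math

def rotorod_volume(outer_diameter_cm, width_cm, length_cm, rpm):
    """Sampled air volume rate (cm^3/min) from Equation 3.1: V = pi*Delta*Psi*Gamma*Z."""
    return math.pi * outer_diameter_cm * width_cm * length_cm * rpm

# Hypothetical dimensions (Delta=9.0cm, Psi=0.28cm, Gamma=4.0cm) at 1200 rpm:
v = rotorod_volume(9.0, 0.28, 4.0, 1200)  # cm^3/min
litres_per_min = v / 1000.0               # approx. 38 L/min
```

The linear dependence on each dimension and on rpm means calibration can trade rotation speed against rod geometry to reach a target sampling volume.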
To begin sampling, a pre-calibrated Rotorod sampler, with I-rod surfaces coated with glycerine to
maximise adhesion and impaction efficiency [148], rotating at 1200 rpm and sampling an air
volume of 38 litres per minute, was deployed at all sampling heights (see section
3.3.1.3) to collect spores. A 6V battery, which was replaced every 2 days, powered each Rotorod
sampler pair and sampling was automated by a Burkard timer set to activate the Rotorod samplers
for 5 hours (11am to 4pm) daily throughout the experimental period. The sampling days were
chosen such that they were dry and preceded wet days, conditions that have been found to be
optimal for the release of Sclerotinia sclerotiorum spores [19, 20, 159]. The daily sampling periods from
11am to 4pm coincided with weather conditions that have been found to be ideal for spore
emission characterised by increased solar radiation, temperature and sunlight [19, 20]. The
sampling duration of 5 hours also alleviates some of the concern that air-sampled data is less
reliable with shorter sampling durations [160].
3.3.1.3 Deployment of Samplers
Nine locations, corresponding to 22 sampling points, were chosen in this experiment
to sample travelling spores. Twenty-two Rotorod model 40 samplers (Sampling Technologies Inc., 1989)
were deployed at these locations at various heights. One of the positions (A) was located upwind
to measure background spore levels and the others (B to I) were distributed downwind and
crosswind within Little Hoos as shown in Figure 3.1. The inclusion of an upwind position, A, to
determine the background level of Sclerotinia sclerotiorum spores from non-local sources complied
with standard practice of spore data collection [10, 148], and was necessary because potential
Sclerotinia sclerotiorum spore sources could not be entirely ruled out given Rothamsted’s status as
an experimental facility. As Figure 3.2 shows, the samplers were deployed to sample along
downwind, crosswind and vertical directions to provide a spatial measure of dispersion. The
crosswind measurements were made to assess lateral dispersion. The corresponding picture
shown in Figure 3.3 shows a network of Rotorod samplers covered with rain shields spanning the
sampling area. Figure 3.4 provides a closer look at each sampling position, with rain shields
removed, revealing active pairs of I-rods as well as timers that turn sampling on/off. A typical
assembly comprising Rotorods set at 0.8m, 6V battery and Burkard timer is shown in Figure 3.5
before deployment below the canopy.
The previous day’s wind direction forecasts, obtained from the Meteorological Office website, were
used to align the sampling axis with the anticipated wind direction, and this was confirmed and/or
corrected by readings from onsite measurements on each morning of the experiment. This
approximately positioned the sampling axis at the centre of the spore plume, making the
crosswind axis perpendicular to the wind direction. Corrections to realign the sampling axis with
the average wind direction are beneficial because they eliminate covariances between the
streamwise and crosswind components of wind speed [149], thereby simplifying model dimensions [150] (see Chapter 4).
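This realignment corresponds to the standard first coordinate rotation used in micrometeorology, which can be sketched as follows (a generic implementation, not the processing code used in this work):

```python
import math

def rotate_to_mean_wind(u, v):
    """Rotate horizontal wind components so the x-axis points along the mean
    wind direction, forcing the mean crosswind component to zero and removing
    the covariance introduced by axis misalignment."""
    theta = math.atan2(sum(v) / len(v), sum(u) / len(u))
    u_r = [ui * math.cos(theta) + vi * math.sin(theta) for ui, vi in zip(u, v)]
    v_r = [-ui * math.sin(theta) + vi * math.cos(theta) for ui, vi in zip(u, v)]
    return u_r, v_r
```

After the rotation the streamwise axis carries the mean transport and the crosswind axis only turbulent fluctuations, which is what simplifies the model dimensions.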
All positions were sampled at two heights of 0.8m and 1.6m; positions B and D made
measurements at additional heights of 2.4m and 3.2m in order to determine the vertical profiles
of spore dispersion. The additional sampling heights at position B provided vertical spore
gradients close to the source, where spore numbers are highest [26, 29]. The sampling height
of 0.8m below the canopy has been found to be ideal for optimal detection of locally sourced
spores [148] and was chosen as such in this study. The 1.6m sampling height was chosen
because this height is outside the roughness sublayer for 1m-high canopies [149], thus reducing
the effect of canopy-induced wakes on the sampling point [161]. Comparing collections at this
height with those made at 1.6m above the canopy provided two different spatial gradients that
would enable evaluation of net transport between the two media, i.e. the significance of canopy
filtering or the escape fraction [50]. This in turn gives an indication of the spores available for
long-distance travel, which can pose a more insidious threat to crops in other fields [154].
Figure 3.2: Layout of sampling area (43m by 28m) within field trial site from 31st May 2013 to
3rd June 2013 showing positions of Rotorod samplers. Data was collected at two heights of
0.8m and 1.6m (O), and additional heights of 2.4m and 3.2m (⊕).
An arrangement comprising a biosensor unit, a weather station and a 3D sonic anemometer was
situated at the centre of the 7m-diameter ring of ascospores. Scale of sampling area excluding upwind
sampling point: 35m by 28m. All sampling positions are 7 meters apart except I, which is 14m
from D. B is 1m away from the edge of the source ring.
Figure 3.3: Experimental trial field showing Rotorod samplers (with rain shields) above OSR
canopy. (Image taken by the author).
Figure 3. 4: Rotorod samplers at position B deployed at 0.8m (obscured), 1.6m, 2.4m and 3.2m
pictured without rain covers. Position B (as well as D) sampled at two additional heights.
(Image taken by the author).
Figure 3. 5: A typical assembly of Rotorod sampler (1), battery (2) and Burkard timer (3), seen
here only powering one sampler with its other output unused. (Image taken by author)
3.3.1.4 Weather Measurements and Instrumentation
Weather data was captured by two means: a Vantage Pro2 (Davis Instruments, Hayward, CA,
USA) met station placed at the centre of the source ring, inside a circular canopy clearing of a
radius of approximately 6m, recorded temperature, relative humidity, wind speed and wind
direction every 20 seconds and logged 30-minute averages; and a 3D sonic anemometer
(Campbell Scientific, Inc., UT, USA), which was deployed at a height of 2 metres above the
ground, made measurements of orthogonal wind speeds and their hourly averages along with
friction velocity at a frequency of 16Hz. By deploying the sonic anemometer at 2m, turbulence
measurements were made at a height above the roughness sublayer [37] (height of roughness
sublayer above the ground in this experiment is approx. 1.3m), whose effect can cause rough-wall
boundary layer eddies that may degrade measurement accuracy [161]. The sonic probes
were also aligned with the mean streamwise wind direction to minimise errors [38] based on
forecasts. The reliability of sonic anemometers in canopies of varying heights has been
extensively demonstrated in spore dispersion experiments [55, 56, 162] and for other lighter
tracer particles [150, 163].
At the end of each sampling period, deviations of the sampling grid from the mean wind direction
were noted and recorded as θ. The approximate locations of the met station and sonic
anemometer are indicated in figure 3.2. Solar radiation and rainfall data were also sourced from
an offsite meteorological station (part of the Environment Change Network (ECN)) located
approximately 1 kilometre away from the experimental field.
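The grid deviation θ can be computed from the anemometer's orthogonal wind components by vector averaging. A minimal sketch of that calculation (the function names and sample values are illustrative, not from the SYIELD software):

```python
import math

def mean_wind_direction(u, v):
    """Bearing (degrees) of the vector-mean of orthogonal wind components."""
    ubar = sum(u) / len(u)
    vbar = sum(v) / len(v)
    return math.degrees(math.atan2(vbar, ubar)) % 360.0

def grid_deviation(sampling_axis_deg, u, v):
    """Smallest signed angle between the sampling axis and the mean wind."""
    d = mean_wind_direction(u, v) - sampling_axis_deg
    return (d + 180.0) % 360.0 - 180.0

# Illustrative: wind blowing exactly along a 90-degree sampling axis
theta = grid_deviation(90.0, u=[0.0, 0.0], v=[2.0, 2.1])
```

Vector averaging is used rather than averaging the angles directly, since direction readings wrap around 360 degrees.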
3.3.2 Identification and Quantification of Spores
At the end of each sampling day, for each sampling location, one I-rod per Rotorod containing
the collected spores was stored for direct quantification via qPCR analysis [63], while the other
was immersed in a capped 1ml Eppendorf tube containing 900µl of Sabouraud growth medium and
stored at 20°C for 4 days to allow oxalic acid formation [64]; this rod was used for indirect
quantification via measurements of oxalic acid, a pathogenicity factor of viable Sclerotinia
sclerotiorum spores [14]. Two types of indirect quantification were used: the biosensor chip and
colourimetric detection. The objective of testing the prototype biosensors was to determine their
potential for real-time deployment. These tests are described in section 3.3.2.1.
The remainder of each sample was used to measure oxalic acid by the alternative method of
colourimetric analysis [148, 164]. Because of its reliability, colourimetric detection can
establish whether detectable quantities of oxalic acid were produced in the collected samples,
since real-life air sampling is susceptible to contamination and corruption of samples. The
method was subsequently used to assess biosensor performance and to explore the possibility of
identifying biosensor false negatives.
3.3.2.1 Prototype Biosensor Testing
The biosensing surface is a circular carbon electrode of approximately 1 square millimetre,
capable of holding approximately 80µL of liquid solution, coated with the enzyme oxalate
oxidase [165]. This enzyme acts as the bioreceptor, the component of the biosensor that
determines selectivity and specificity [166]. Oxalate oxidase catalyses the conversion of
oxalate (oxalic acid [167]) to hydrogen peroxide (H2O2) and carbon dioxide [168]. H2O2 in turn
oxidises the ferrocyanide ions in the biosensor to ferricyanide ions, generating a faradaic
current proportional to the concentration of the analyte (oxalic acid) [169, 170]. The
concentration of oxalic acid is therefore given by the current measured during
chronoamperometry, due to the electrons released in this oxidation process [171].
Preparing calibration standards
Oxalic acid concentrations of 0, 50, 100, 500, 1000 and 1500µmol L⁻¹, spanning the detection
range of the biosensor (Gwent Technology Ltd., Pontypool, UK) as stated in the product datasheet
(Gwent Protocol Document TR-A 2574), were used as calibration samples. This range is based on
the amount of oxalic acid needed to consume the enzymes, catalysts and ferrocyanide ions on the
biosensing surface [172].
To begin, a base stock solution of 10mmol L⁻¹ oxalic acid was prepared by weighing 63mg of
oxalic acid dihydrate (molar mass = 126.07g mol⁻¹) on an electronic balance (precision 1mg). The
weighed crystals were then carefully transferred into a 50mL volumetric flask, which was filled
with Sabouraud medium (which allows the growth of fungi while inhibiting bacteria) [173]. The
six calibration samples (0, 50, 100, 500, 1000 and 1500µmol L⁻¹) were then prepared by diluting
pipetted volumes of this stock solution with Sabouraud medium in separate 50mL volumetric
flasks, as shown in Table 3.1.
Table 3.1: Volumes of oxalic acid stock required to prepare 50mL of 0, 50, 100, 500, 1000 and
1500µmol L⁻¹ standards from the 10mmol L⁻¹ stock

Concentration (µmol L⁻¹)   Volume of Stock (µL)   Volume of Solvent (µL)   Dilution Factor
1500                       7500                   42500                    6.67
1000                       5000                   45000                    10
500                        2500                   47500                    20
100                        500                    49500                    100
50                         250                    49750                    200
0                          0                      50000                    -
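The volumes in Table 3.1 follow directly from the dilution relation C1·V1 = C2·V2. A short sketch that reproduces the table (the function name is illustrative):

```python
def dilution_volumes(stock_umol_per_l, target_umol_per_l, final_ul):
    """Volumes of stock and solvent (uL) needed to dilute a stock solution
    to a target concentration at a given final volume (C1*V1 = C2*V2)."""
    v_stock = final_ul * target_umol_per_l / stock_umol_per_l
    return v_stock, final_ul - v_stock

# 10 mmol/L stock = 10000 umol/L; 50 mL final volume = 50000 uL
volumes = {target: dilution_volumes(10000, target, 50000)
           for target in (1500, 1000, 500, 100, 50)}
```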
Measurement of oxalic acid with the biosensor
The electrochemical measurement of oxalic acid was achieved with a Uniscan potentiostat
(Uniscan Instruments, 2009) attached to the biosensor as shown in Figure 3.6 (left frame).
The following steps were used to measure oxalic acid concentration, as laid out by the biosensor
manufacturer in the biosensor electrochemistry manual [172]:
- Hot plate pre-heated to 60°C.
- Biosensors attached to the Uniscan potentiostat using a bespoke connector addressing the
central carbon paste electrode as working electrode and the outer Ag/AgCl electrode as
pseudo-reference / auxiliary electrode (Figure 3.6, right frame).
- 80µL of calibration solution (in Sabouraud medium) pipetted onto the pre-prepared
biosensor surface.
- Pre-incubation for 120 seconds at 60°C to allow sufficient mixing and completion of the
oxalate oxidase reaction.
- Application of -0.2V vs the Ag/AgCl pseudo-reference to the working electrode surface.
- Measurement of current (0.1 Hz sampling) for 60 seconds.
- Recording of the current at the 60-second time-point.
This measurement procedure was used to test both calibration standards and samples. Each
calibration standard was tested 5 times to generate a calibration curve. A further 40 tests each
were carried out on standards with blank (0µmol L⁻¹) and low (0-5µmol L⁻¹) concentrations of
oxalic acid. These were used to calculate the biosensor's Limit of Blanks (LOB), Limit of
Detection (LOD) and Limit of Quantitation (LOQ) [174]. The LOB represents the current measured
by the biosensor at 0µmol L⁻¹, the LOD represents the lowest analyte concentration the biosensor
can measure, and the LOQ indicates the lowest analyte concentration that can be reliably
measured [175]. The LOB, LOD and LOQ are widely used "figures of merit" for biosensor
performance [174] [175]. They were calculated as follows [174]:
LOB = μ_blanks + 1.645 σ_blanks   [3.2]
LOD = LOB + 1.645 σ_low   [3.3]
LOQ = LOD   [3.4]

where μ_blanks is the mean of the blank measurements, and σ_blanks and σ_low are the standard
deviations of the blank and low-concentration measurements respectively. The calculation of the
LOQ is usually ad hoc [175], but it can be set to the value of the LOD when expected analyte
concentrations are not known [174].
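Equations 3.2-3.4 can be computed directly from the repeated blank and low-concentration readings; the 1.645 factor corresponds to a one-sided 95% bound under a normality assumption. A sketch (the current arrays are illustrative, not the measured data):

```python
import statistics

def lob_lod_loq(blank_currents, low_currents):
    """Limit of blank, detection and quantitation per equations 3.2-3.4."""
    lob = statistics.mean(blank_currents) + 1.645 * statistics.pstdev(blank_currents)
    lod = lob + 1.645 * statistics.pstdev(low_currents)
    loq = lod  # LOQ set equal to LOD when expected analyte levels are unknown
    return lob, lod, loq

# Illustrative repeated readings (uA), not the thesis data
blanks = [4.0, 4.1, 3.9, 4.2, 4.05]
lows = [4.3, 4.5, 4.2, 4.6, 4.4]
lob, lod, loq = lob_lod_loq(blanks, lows)
```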
Figure 3.6: Biosensor attached to the Uniscan potentiostat using a bespoke connector (1). The
prototype biosensor (2) sensing surface is an enzyme-coated carbon electrode (black circular
area in right frame). (Image taken by author)
3.3.2.2 Colourimetric Detection of Oxalic Acid
Colourimetric detection is a chemical technique for determining the concentration of coloured
compounds in a solution [176]. The determination relies on the Beer-Lambert law, which relates
the concentration of a dissolved compound, c_dis, to the absorbance of a specific wavelength of
light, Abs, and the path length (the distance travelled by light through a spectrophotometer
cell of known dimensions), l, as follows:

c_dis = Abs / (ε_abs · l)   [3.5]

where ε_abs is the absorption coefficient of the chemical compound. The Beer-Lambert law holds
true for most compounds in dilute solutions [176].
Colourimetric tests were performed by Dr Steph Heard of Rothamsted Research with an assay
optimised by Prof. Nicola Tirelli of the University of Manchester. The procedure used was as
follows [177]:
- Solution A was made up at a pH of 3.8 and a temperature of 37°C by dissolving 50mM
succinate buffer, 0.79mM N,N-dimethylaniline and 0.11mM 3-methyl-2-benzothiazolinone
hydrazone (MBTH) in 100ml of deionised water.
- A master mix solution was then prepared from 12.6ml of Solution A, 0.5ml of 100mM
ethylenediaminetetraacetic acid (EDTA) solution, 1mg/ml of freshly prepared horseradish
peroxidase and 0.35 units/ml of freshly prepared oxalate oxidase, mixed in 1.5ml of water.
All solutions were prepared in cold deionised water.
- For each sample to be tested for oxalic acid, 10µl of sample was pipetted into one well
of a 96-well flat-bottom tissue culture plate (TPP®). Two replicates (the same amount of
the same sample) were also pipetted into two other wells so that an average concentration
could be calculated. A blank well containing 10µl of un-inoculated sample was also prepared.
- A 140µl aliquot of the master mix was pipetted into each well containing test sample
and replicates, bringing each well to 150µl.
- 150µl of each of 11 standards (0, 25, 50, 100, 200, 400, 1000, 1500, 2000, 2500 and
3000µM) was also set up in 11 wells of the culture plate to enable the generation of a
standard curve.
- Absorbances were then read from the plate with a Varioskan Flash spectral scanning
multimode reader (Thermo Scientific™) at 590nm. The plate was then incubated for 5hrs
at 37°C and read again.
- The concentrations were calculated from the Varioskan readings as the average of three
absorbances with the equation:
Abs = S_int + S_grad · S_const · c_dis

where S_int, S_grad and S_const are parameters from the standard curve.
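Given the standard-curve parameters, the concentration is recovered by inverting the equation above and averaging over the replicate wells. A sketch (the parameter values are illustrative, not the fitted ones):

```python
def concentration_from_absorbance(abs_reading, s_int, s_grad, s_const):
    """Invert Abs = S_int + S_grad * S_const * c_dis for c_dis."""
    return (abs_reading - s_int) / (s_grad * s_const)

def mean_concentration(absorbances, s_int, s_grad, s_const):
    """Average concentration over replicate wells (three per sample here)."""
    return sum(concentration_from_absorbance(a, s_int, s_grad, s_const)
               for a in absorbances) / len(absorbances)

# Illustrative standard-curve parameters: intercept 0.05, combined slope 0.002 per uM
c = mean_concentration([0.25, 0.27, 0.26], s_int=0.05, s_grad=0.002, s_const=1.0)
```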
3.3.2.3 qPCR Measurement of Spore DNA
Quantitative Polymerase Chain Reaction (qPCR) is a DNA amplification technique that relies on
a thermo-stable polymerase enzyme to synthesise copies of a DNA sequence. The process is
initiated by priming [153] the gene of interest, making it ready for polymerase binding and
synthesis of new DNA. The reaction proceeds in cycles characterised by temperature changes,
with each cycle resulting in an approximate doubling of the target DNA.
Quantification of spore DNA was achieved using qPCR. This method is particularly attractive
because it is highly sensitive to low spore counts (down to 0.5pg of DNA) and discriminates
against spores, such as those of Botrytis cinerea and S. minor, that are known to masquerade as
Sclerotinia sclerotiorum spores [153]. The method is therefore ideal for quantifying real-life
samples, which are known to be difficult to quantify accurately with less sensitive techniques.
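The per-cycle doubling underlies quantification: a sample's starting DNA quantity can be read off relative to a standard from the difference in the cycle at which each crosses the detection threshold. A minimal sketch of that idea (assuming ideal doubling; the cycle numbers and standard quantity are illustrative):

```python
def starting_quantity(cq_sample, cq_standard, quantity_standard, efficiency=1.0):
    """Estimate starting DNA (pg) relative to one standard.
    Each cycle multiplies DNA by (1 + efficiency); ideal doubling -> 2x."""
    fold = (1.0 + efficiency) ** (cq_standard - cq_sample)
    return quantity_standard * fold

# Illustrative: a sample crossing threshold 3 cycles later than a 4 pg standard
q = starting_quantity(cq_sample=28, cq_standard=25, quantity_standard=4.0)
```

In practice a standard curve over several known quantities is used rather than a single standard, and the amplification efficiency is estimated from the curve's slope.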
The extraction was done by Mrs Gail Canning at Rothamsted's PBCS laboratory using the primer
design technique developed by Rogers et al. [153] for quantification. Rotating-arm rods
containing the spores collected between 31/05/2013 and 03/06/2013 were removed from frozen
storage at -20°C and processed using the following procedure [177]:
- Each I-rod was put into a 2ml screw-cap tube, with one scoop of ballotini beads (0.5𝑚𝑔
and 0.4𝜇𝑔 diameter) added.
- 440µl of extraction buffer, consisting of 400mM Tris-HCl; 50mM EDTA, pH 8; 500mM sodium
chloride; 0.95% sodium dodecyl sulphate (SDS); 2% polyvinylpyrrolidone; and 5mM
1,10-phenanthroline monohydrate, was added to each tube.
- 0.1% β-mercaptoethanol was added to the tubes before each sample was shaken in a
FastPrep 24 automated lysis machine (MP Biomedicals, 2012). Samples were shaken 3 times at
6.0m/s for 40 sec, with a 2-minute cooling period on ice between cycles.
- 400µl of 2% SDS buffer was added to each tube. The tubes were inverted several times
and incubated at 65°C in a water bath for 30 mins. 800µl of the bottom phase of
phenol:chloroform (1:1) was added to each tube, which was then vortexed and centrifuged at
4°C for 10 mins at 13,000 revolutions per minute (13krpm).
- The top layer of the supernatant from each tube was pipetted into a 1.5ml flip-top
Eppendorf tube already containing 30µl of 7.5M ammonium acetate and 480µl of isopropanol.
The tubes were then inverted several times and centrifuged at -20°C at 13krpm for 30 mins.
- The resulting supernatant was removed from the Eppendorf tubes, leaving a DNA pellet.
The pellet was washed with 200µl of 70% ethanol and the tube centrifuged again at 13krpm
for 15 mins. The ethanol was removed and the pellet dried. The pellet was re-suspended in
30µl of sterile deionised water and mixed.
- The TaqMan method, which offers greater isolation and specificity [178], was used to
quantify the DNA.
3.4 Results
This section presents the results of the three identification and quantification methods
described above. For colourimetric detection and qPCR, the results are also presented as
spatial data to show the dispersion of spores from release to sampling.
3.4.1 Biosensor Test and Calibration Results
The calibration curve generated from testing the calibration samples is presented in Figure 3.7.
The error bars denote the standard deviation of the current measured by the prototype biosensor
for each sample over 5 repetitions of the test, and are noticeably larger at very low
(< 100µmol L⁻¹) and very high (> 1000µmol L⁻¹) concentrations of oxalic acid. The baseline
current corresponding to 0µmol L⁻¹ (4.075µA) is also high, especially when compared to the
current corresponding to 1500µmol L⁻¹ (10.58µA, the maximum current in the biosensor
measurement range): it indicates a background noise of at least 39%. It is always desirable to
linearise biosensor calibration curves to improve the reliability of measurements [175] [174].
With this approach, the background noise is usually higher than measured and is given by the
intercept of the linearised calibration curve (5.83µA). The LOB was calculated as 4.4µA, and
the LOD and LOQ were determined to be 6.05µA. The LOD and LOQ correspond to approximately
63µmol L⁻¹ of oxalic acid, representing 4.2% of the biosensor measurement range.
Figure 3.7: Biosensor calibration curve for five repeated measurements at 60°C after allowing
120 seconds of mixing (n = 25, error bars = ±1 S.D.).
[Chart: measured current (µA) vs oxalic acid concentration (µM); linear fit
y = 0.0036x + 5.8228, R² = 0.7558.]

Based on these figures of merit, measurements of the field samples were made. The results are
presented in Table 3.2. As may be observed, only 2 samples, taken on the first day, were
positive (highlighted in yellow). Both of these samples are from the position closest to the
source (B, 1m). A test was considered positive if the current reading was higher than the LOD
(6.05µA). There were no detections by the biosensor on the remaining sampling days, even though
conditions equally favourable for Sclerotinia sclerotiorum spore release were present:
experiment days were preceded by a brief period of rain followed by dryness, and relative
humidity ranged from 65 to 90% during the sampling period. These conditions are favourable for
Sclerotinia sclerotiorum spore release [19, 21].
Table 3.2: Currents recorded using the biosensor measurement procedure described in section
3.3.2.1. Values highlighted in yellow are above the baseline noise level determined in the last
section and are considered positive for oxalic acid. Heights of 0.8m correspond to Rotorod
samplers deployed below the canopy (canopy height = 1m).
Position   Height (m)   Day 1 (µA)   Day 2 (µA)   Day 3 (µA)   Day 4 (µA)
A 0.8 4.65 4.32 4.10 4.30
A 1.6 4.23 4.20 4.50 4.70
B 0.8 6.75 4.31 3.95 4.40
B 1.6 6.24 4.25 4.60 4.19
B 2.4 3.99 4.12 4.45 4.20
B 3.2 4.40 4.16 4.34 4.13
C 0.8 4.32 4.40 4.10 4.29
C 1.6 5.32 4.10 3.70 4.54
D 0.8 4.12 4.02 4.00 4.31
D 1.6 4.51 4.22 4.20 4.11
D 2.4 3.99 4.12 3.95 3.95
D 3.2 4.21 4.40 4.00 4.20
E 0.8 4.33 4.25 3.99 4.05
E 1.6 4.60 3.94 4.10 4.18
F 0.8 4.27 3.92 4.08 3.95
F 1.6 4.38 3.85 4.02 3.98
G 0.8 4.50 3.99 4.11 4.00
G 1.6 4.09 4.10 3.89 3.87
H 0.8 3.95 4.15 3.75 4.30
H 1.6 4.10 3.93 3.91 4.50
I 0.8 4.41 3.80 3.84 4.23
I 1.6 4.15 3.98 4.01 3.91
3.4.2 Colourimetric Analysis Results and Discussion
The concentrations of oxalic acid corresponding to spores collected over the 4 sampling days are
shown in Table 3.3. The colourimetric assay used had a lower sensitivity limit of 10µM;
concentrations below 10µM were therefore considered null values, while values above that
threshold were considered valid and representative. Further, the values highlighted in purple
are from samples that turned purple after reaction, as is characteristic of oxalate samples for
the assay used. A couple of samples recorded oxalic acid concentrations above 10µM but did not
turn purple; it is suspected that this was caused by other, non-oxalate biomass affecting the
spectrophotometer reading and driving up the absorbance.
Table 3.3: Concentrations of oxalic acid measured by colourimetric analysis. Values in purple are
positively and quantitatively representative of oxalic acid. Heights of 0.8m correspond to Rotorod
samplers below the canopy and all others are above the canopy (canopy height = 1m).
Position   Height (m)   Day 1 OA (µM)   Day 2 OA (µM)   Day 3 OA (µM)   Day 4 OA (µM)
A 0.8 10.33 16.20 6.23 8.05
A 1.6 9.60 4.53 11.33 4.62
B 0.8 57.11 15.86 20.04 21.99
B 1.6 133.82 7.12 31.52 11.03
B 2.4 16.27 15.33 10.91 68.05
B 3.2 8.92 16.50 14.03 38.27
C 0.8 4.05 17.16 6.81 6.67
C 1.6 26.52 4.81 26.34 13.93
D 0.8 7.32 6.67 11.10 8.49
D 1.6 8.54 5.21 1.91 3.38
D 2.4 8.46 4.85 4.37 4.97
D 3.2 4.64 2.42 17.88 3.32
E 0.8 67.78 8.96 6.30 7.38
E 1.6 5.67 6.86 11.13 3.40
F 0.8 7.09 2.43 6.40 6.92
F 1.6 3.53 3.26 13.20 9.50
G 0.8 5.48 4.36 7.47 6.94
G 1.6 6.44 5.30 6.45 3.57
H 0.8 3.20 3.47 6.17 6.87
H 1.6 2.18 8.38 2.48 3.63
I 0.8 8.06 2.94 3.05 5.28
I 1.6 6.99 6.98 21.54 2.51
Table 3.3 generally confirms the consistent presence of spores near the source, as seen from
positive detections at positions B and C on most of the sampling days. The concentration
magnitudes, however, are not consistent with the decay model long associated with spore
dispersion, in which concentrations fall with distance from the source [147]. This is readily
seen in Figures 3.8 and 3.9, where concentrations of oxalic acid from samples collected below
and above the canopy are plotted separately. Only oxalic acid concentrations corresponding to
downwind Rotorod positions (A, B, C, D, I; see figure 3.2) are shown in these figures, because
concentration decay with distance from the source is of interest here. These positions
correspond to -7m, 1m, 7m, 14m and 28m downwind relative to the centre of the spore ring
(origin, 0m), as represented in both figures. The higher concentration at position B (1m) in
Figure 3.9 is unusual when compared to the value at the same position in Figure 3.8, since
below-canopy spore concentrations are normally higher than above-canopy concentrations as a
result of canopy filtering [89] and the sacrifice of spores near the source due to the
cooperative action of spores during release [26].
Figure 3.8: Oxalic acid concentrations for all days for samples collected below the OSR canopy
[Chart: concentration of OA (µM) vs distance from centre of spore ring (m), Days 1-4, below
canopy.]
Figure 3.9: Oxalic acid concentrations for all days for samples collected above the OSR canopy
As may be seen from Figures 3.8 and 3.9, the concentrations above the canopy are slightly
higher than those below the canopy throughout. Above-canopy concentrations exceeding
below-canopy concentrations is not usually the case for actual spore concentrations in fields
where only local sources contribute to the spores, because the canopy heavily filters escaping
spores.
Two more outlying measurements are evident in Figures 3.10 and 3.11. Here the complete data,
including concentrations from samples at the crosswind positions (E, F, G, and H), are shown as
side-by-side comparisons of oxalic acid concentrations by day (Figure 3.10) and by position
(Figure 3.11). On Day 2 (Figure 3.10), a high concentration of oxalic acid was measured at the
upwind position, A. However, because the measurement is close to the colourimetric test's
detection threshold of 10µM, and given the overall low concentrations for the entire day (see
Day 2 in Figure 3.10), this may not be significant.
[Chart: concentration of OA (µM) vs distance from centre of spore ring (m), Days 1-4, above
canopy.]
Figure 3.10: Side-by-side comparison of daily oxalic acid concentrations for all positions.
Spores were collected by Rotorod samplers deployed below the canopy.
Figure 3.11: Concentrations grouped by position for all sampling days. Spores tested for oxalic
acid were collected below the canopy.
[Charts: concentration of OA (µM) grouped by sampling day (Figure 3.10) and by sampling
position (Figure 3.11), Days 1-4, positions A-I.]
3.4.3 Spore DNA (qPCR) Results
The primer design used during the measurement has been found capable of detecting as few as
1.4 spores, corresponding to 0.5pg of DNA [153]. The improved sensitivity of this quantification
method over the proxy measurements of oxalic acid is therefore expected. Figures 3.12 and 3.13
show the downwind gradient of spores for all days, below and above the canopy respectively. The
key refers to field positions (letters) and heights of deployment (numbers) shown in figure 3.2.
These figures show the dispersion of spores in the along-wind direction, leaving out the
sampling positions in the lateral direction for now.
Figure 3.12: Along-wind concentration (spore DNA) gradient below the OSR canopy for the first
three sampling days. The key refers to field positions (letters) and heights of deployment
above ground (numbers). The spore DNA axis is scaled for clarity; maximum values for the first
2 days are shown at the top and have the same units as the vertical axis.
[Chart: Sclerotinia DNA (pg) vs sampling day for below-canopy positions A0.8, B0.8, C0.8, D0.8
and I0.8; off-scale maxima: 26775pg (Day 1) and 15245.25pg (Day 2).]
Figure 3.13: Along wind concentration (spore DNA) gradient above OSR canopy for first three
sampling days. Lateral (crosswind) sampling positions are not shown.
It can be seen that spore production/release declined over the duration of sampling. A trend of
daily spore depletion with distance from the source, more discernible than in the colourimetric
results, is also evident for all days except the last, on which virtually no spores were
detected. This low spore count on the final day of sampling was traced to a marked deviation of
the sampling axis from the general wind direction, caused by a forecasting error in wind
direction for that day, which resulted in most of the plume bypassing the sampling grid. Wind
roses of forecast and actual wind direction for the field, shown in Figure 3.14, reveal this.
As a result, samples from the fourth day were excluded from further analysis.
Figure 3.14: Wind roses showing forecast (a) and actual (b) wind speeds and directions on day
4. The forecast wind readings were used to set the sampling axis, resulting in a misalignment
of the sampling grid and the spore plume.
[Charts: Sclerotinia DNA (pg) vs sampling day for above-canopy positions A1.6, B1.6, C1.6, D1.6
and I1.6 (Figure 3.13); wind roses for 1000-1600 hrs on Day 3 (02/06/13) and Day 4 (03/06/13)
with wind speed bins in m/s (Figure 3.14).]
When Figures 3.12 and 3.13 are compared, there appears to have been a comparatively low escape
of spores from the canopy on Day 2: of the 15245pg of spore DNA recorded below the canopy, only
581pg (3.8%) made it outside the canopy. By comparison, 1541pg of 26775pg (5.7%) and 1566pg of
5613pg (27.9%) escaped the canopy on Days 1 and 3 respectively. These percentages reveal an
unusually high escape rate of spores on Day 3. It is unclear why such a high escape rate was
recorded, as all sampling days had near-identical horizontal wind speeds. It is worth noting,
however, that wind flow in the canopy is non-Gaussian and characterised by low wind speeds
[38, 41, 54], and factors such as wind gusts, which could affect deposition [41] and therefore
escape from the canopy, are not accounted for by averaged wind statistics. This is why
trajectory models that use instantaneous weather variables through turbulence have been found
to describe canopy transport better [55, 58, 88, 89].
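The escape percentages quoted above are simply the above-canopy day totals divided by the below-canopy day totals. Using the totals from the text:

```python
def escape_fraction(above_pg, below_pg):
    """Fraction of spore DNA recorded above the canopy relative to below it."""
    return above_pg / below_pg

totals = {  # (above-canopy, below-canopy) spore DNA in pg, from the text
    "Day 1": (1541, 26775),
    "Day 2": (581, 15245),
    "Day 3": (1566, 5613),
}
pct = {day: 100 * escape_fraction(a, b) for day, (a, b) in totals.items()}
```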
Figure 3.15: The spore gradient at position B (1m downwind of spore ring) with height for first
three sampling days.
[Chart: Sclerotinia DNA (pg) vs height above ground (0.8-3.2m) at position B, Days 1-3.]
Figure 3.16: The spore gradient at position D (14m downwind of spore ring) with height for
first three sampling days.
Figures 3.15 and 3.16 show the concentration gradients with height at positions B and D
respectively, for all days. One of the Rotorod samplers at 3.2m failed on Day 2 (Figure 3.16),
so the low value recorded for that position is not representative. It can be seen that, closer
to the source (position B, Figure 3.15), there is a steep gradient of spores with height
(-8ng/m on Day 1, -4.5ng/m on Day 2, and -1.8ng/m on Day 3). The steepness of this gradient is
a direct result of the heavy filtration effect of the canopy between heights of 0.8m and 1.6m.
Between 1.6m and 2.4m, outside the canopy, depletion slows as the spores become more mixed
(Figure 3.15). The near-linear vertical profile of spores is similar to results for Sclerotinia
sclerotiorum spore escape from a pasture [50] and the release of Lycopodium spores from a wheat
canopy [55].
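The gradients with height quoted above can be estimated by a simple finite difference between adjacent sampling heights. A sketch (the DNA profile below is illustrative, not the measured data):

```python
def vertical_gradients(heights_m, dna_pg):
    """Finite-difference gradient of spore DNA (pg/m) between adjacent heights."""
    return [(dna_pg[i + 1] - dna_pg[i]) / (heights_m[i + 1] - heights_m[i])
            for i in range(len(heights_m) - 1)]

# Illustrative profile at the four sampling heights used at positions B and D
g = vertical_gradients([0.8, 1.6, 2.4, 3.2], [8000.0, 1600.0, 800.0, 700.0])
```

The first element of `g` covers the 0.8-1.6m canopy layer, where filtration makes the gradient steepest.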
[Chart: Sclerotinia DNA (pg) vs height above ground (0.8-3.2m) at position D, Days 1-3.]
Figure 3.17: Spore dispersal gradient for all positions, including crosswind (lateral) sampling
positions. The spore DNA concentration axis is in nanograms (ng) and is scaled from 0 to 1ng
for clarity. The key refers to field positions (letters) and heights of deployment above ground
(numbers).
[Chart: Sclerotinia DNA (ng, scaled 0-1) for all sampling positions and heights, 31 May-02
June 2013.]
At position D (Figure 3.16), 14m downwind of the source, the gradient is flatter than at
position B (about -53pg/m for all three days), and comparable numbers of spores are seen below
and above the canopy as a result of spore dilution by air and the reduced effect of the source.
This is true for all sampling positions beyond C, which is 7m downwind of the source, as shown
in Figure 3.17, which plots the concentrations measured at all positions below and above the
canopy in one figure. Spore numbers are almost identical below and above the canopy beyond 7m
from the source, representative of the domination of eddy diffusion over turbulent diffusion
[80]. This suggests that sampling at a height of 1.6m (above the canopy) may provide more
representative aerial spore concentrations, since it neutralises the effect of very close
sources, which can tend to disrupt models.
Figures 3.18 and 3.19 show the spore DNA plotted against downwind distance from the source
(position letters are replaced by actual downwind distances in metres) and fitted to a
power-law model. In the literature, along-wind spore dispersion within and above the canopy has
on some occasions [147, 179, 180] been found to follow a power decay law of the form a·x⁻ᵈ,
where x is the distance from the source and a and d are constants. Figures 3.18 and 3.19 show
that the dispersion of Sclerotinia sclerotiorum spores in OSR follows such a model. As
expected, spore concentration below the canopy decays much faster (d = 1.65, 1.47 and 1.30 for
the first three days) owing to the loss of spores by sedimentation and deposition on leaves.
Roper et al. [26] found that, due to the cooperative action of fungal spores to maximise
dispersion, many Sclerotinia sclerotiorum spores are sacrificed very close to the source for
the greater objective of travelling farther. This explains the heavy deposition of Sclerotinia
sclerotiorum spores near the source, which has been reported to be as high as 90% [181].
Once spores escape the canopy, the rate of decay slows (see Figure 3.19) because of low
deposition and exposure to a more uniform, Gaussian-like wind profile resulting in greater
turbulence and mixing (dilution). The much lower decay rates (d = 0.65, 0.44 and 0.76 for the
first three days respectively) give an indication of this.
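Fits of the form a·x⁻ᵈ can be reproduced by linear regression in log-log space, since log y = log a − d·log x. A sketch of that approach (a least-squares fit in log space, as spreadsheet power trendlines use; the data here are synthetic):

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**(-d) by least squares on (log x, log y); returns (a, d).
    Assumes strictly positive x and y."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(lx) / n, sum(ly) / n
    slope = (sum((p - mx) * (q - my) for p, q in zip(lx, ly))
             / sum((p - mx) ** 2 for p in lx))
    return math.exp(my - slope * mx), -slope

# Synthetic data following y = 1000 * x**-1.5 exactly, at the downwind distances used
xs = [1.0, 7.0, 14.0, 28.0]
a, d = fit_power_law(xs, [1000.0 * v ** -1.5 for v in xs])
```

Note that fitting in log space weights relative rather than absolute errors, which suits data spanning several orders of magnitude, as the below-canopy counts do.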
Figure 3.18: Spore DNA below the canopy plotted with distance from centre of spore ring for
first three days of sampling. Data is fitted to an inverse power law with coefficients, exponents
and 𝑅2 as shown.
[Chart: spore DNA (pg) vs distance from centre of spore source (m), below canopy, Days 1-3,
with power-law fits: Day 1, y = 22927x^-1.647 (R² = 0.9878); Day 2, y = 13831x^-1.47
(R² = 0.9937); Day 3, y = 5412.1x^-1.297 (R² = 0.9652).]
Figure 3.19: Spore DNA above the canopy plotted with distance from centre of spore ring for
first three days of sampling. Data is fitted to an inverse power law: Day 1, y = 1278x^-0.647
(R² = 0.8941); Day 2, y = 786.58x^-0.44 (R² = 0.6008); Day 3, y = 1428.3x^-0.762 (R² = 0.9117).
3.5 Discussion
The most significant findings of this chapter are discussed in this section. First, the
reliability of the prototype biosensor is analysed against the performance of proven detection
and quantification methods (colourimetric detection and qPCR). Then an analysis of the
dispersion of naturally released Sclerotinia spores is given, based on the qPCR-quantified
data. This is followed by a discussion of the experimental value of the data generated in this
field trial and the limitations of the experiment. Section 3.5.1 is considered new knowledge,
while sections 3.5.2 and 3.5.3 are considered new contributions to existing knowledge.
3.5.1 Reliability of the Prototype biosensor in measuring oxalic acid
In this work, the performance of a Sclerotinia spore biosensor for real-time deployment has
been evaluated for the first time. The biosensor utilizes an enzymatic bioreceptor [13] to
target Sclerotinia spores and the quantitative output is measured by electrochemical
transduction [171] [166].
The results show that the biosensor was unable to make positive detections on 2 of the 3
days it was tested, even though optimal conditions for the release of ascospores were present
throughout [16] [19]. To accommodate the reality that the relationship between oxalic acid
concentration and spore numbers is not well understood [177], a colourimetric detection test
of the same samples, testing the same analyte (oxalic acid), was used to validate the
biosensors. The colourimetric tests showed that oxalic acid was indeed produced by the
spores on all days, although the concentrations were low (all but one measurement was < 100 μmol L⁻¹
and only four were > 50 μmol L⁻¹). The calibration curve and the biosensor's detection limit
(63 μmol L⁻¹) show that the biosensor cannot reliably measure concentrations
below this threshold. Even at concentrations above the sensor's LOD, the biosensor
recorded false negatives. This suggests that the LOQ may have been set too low at 1 ×
LOD, although this is difficult to confirm without considerably more data.
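The relationship between LOB, LOD and LOQ discussed above can be sketched numerically. The snippet below follows the common clinical-chemistry convention (LOB estimated from blank replicates, LOD from low-concentration replicates, LOQ as a multiple of the LOD); the current readings and the function name are illustrative and are not taken from the thesis's calibration data.

```python
import statistics

def detection_limits(blank_readings, low_conc_readings, k=1.645):
    """Estimate LOB, LOD and a conventional LOQ from replicate readings.

    Common convention: LOB = mean(blank) + k*sd(blank) and
    LOD = LOB + k*sd(low-concentration sample). The LOQ is then often
    placed at a multiple of the LOD (here 1x, as the text suggests
    was used for the prototype biosensor).
    """
    lob = statistics.mean(blank_readings) + k * statistics.stdev(blank_readings)
    lod = lob + k * statistics.stdev(low_conc_readings)
    loq = 1.0 * lod  # 1 x LOD; many assays use 3-10 x LOD instead
    return lob, lod, loq

# Hypothetical current readings (uA), for illustration only
blanks = [6.01, 6.08, 6.03, 6.09, 6.04]
low_sample = [6.20, 6.35, 6.28, 6.41, 6.25]
lob, lod, loq = detection_limits(blanks, low_sample)
```

Setting the LOQ at only 1 × LOD, as in the last line of the function, leaves no margin between "detectable" and "quantifiable", which is consistent with the false negatives observed near the detection limit.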
The calibration curve also showed a high degree of variation in the biosensor measurements:
a change of only 0.51 μA in current (6.24 to 6.75 μA) corresponded to a more than two-fold increase
in oxalic acid concentration (115.89 to 257.56 μmol L⁻¹). This variability
affects the ability of the biosensor to meet a key requirement of reliable sensors
[171, 182].
The high LOB of 6.05 μA (compared with 11.22 μA, the maximum current corresponding to the upper limit
of the detection range) represents the background noise of the biosensor, which
is a combination of biosensor error and electronic noise [183]. For many biosensors, the
contribution of biosensor error to total error is the main obstacle to reducing background
noise and variability [184]. In this case, the error results from the oxidation of ferrocyanide
ions by the acid buffer applied to the sensors during manufacture [172]. Because of this high
LOB, the resulting LOD, approximately 4% of the biosensor's linear range, is also high and
significantly inferior to reported values for enzymatic electrochemical biosensors, the
majority of which routinely detect well below 1% of their ranges [185]. It is therefore recommended
that, as a minimum, the biosensor chip be improved to reduce the ambient noise in
the device by inhibiting the oxidation of ferrocyanide ions by the acid buffer already present in the
sensors. Interestingly, the biosensor electrochemistry manual [172] reports a lower
background noise, although this was based on fewer tests. The difference between the values
reported in the SYIELD report and this work is possibly due to batch-to-batch variation in
biosensor performance.
Deployment of this sensor raises potential areas of concern. So far, only the biosensor's ability
to directly measure oxalic acid has been discussed. During real-world deployment, the
biosensor will face challenges introduced in the stages before oxalic acid
production, specifically the period spanning collection, incubation, reaction and the
commencement of oxalic acid synthesis. One such challenge is the
mischaracterisation of Sclerotinia spores during sampling. This can happen when sampled
material is contaminated by other fungi or chemicals which can mimic
Sclerotinia [63] or suppress its ability to produce oxalic acid [186]. While enzymatic biosensors
are not as selective as DNA-based biosensors, they are acceptably selective for most species
[182] [13]. The constant presence of contaminants highlights one of the challenges of real-life
sampled data and explains why most biosensors show deteriorated performance
when deployed in an uncontrolled, non-laboratory setting [185] [182]. These
mischaracterisations can result in either under- or overestimation of Sclerotinia risk.
It was not possible to test the impact of this source of error because both methods of testing
targeted the same analyte (oxalic acid). Nevertheless, the existence of OA-producing fungi in
real-life spore samples makes misestimation of Sclerotinia risk a potential
concern for the biosensor, particularly because the biosensor discriminates at the analyte level
(oxalic acid), not at the spore level.
Table 3.4: Spore DNA converted to spore numbers using 0.35pg per single spore determined by
Rogers et al. [153].
Position   Height (m)   Spore No. Day 1   Spore No. Day 2   Spore No. Day 3
A 0.8 208 184 45
A 1.6 11 25 48
B 0.8 76500 43558 16039
B 1.6 4405 1662 4476
B 2.4 1688 1662 792
B 3.2 432 643 60
C 0.8 1893 1776 893
C 1.6 663 1785 609
D 0.8 794 829 815
D 1.6 657 857 813
D 2.4 372 668 388
D 3.2 371 18 452
E 0.8 585 613 362
E 1.6 562 1157 500
F 0.8 296 1030 224
F 1.6 282 1145 624
G 0.8 266 368 95
G 1.6 874 444 124
H 0.8 260 298 172
H 1.6 458 366 118
I 0.8 349 335 170
I 1.6 552 307 301
Another area of concern for field deployment is the ability of spores to produce detectable
quantities of oxalic acid. A comparison of oxalic acid measured by colorimetric detection and
spore numbers (Table 3.4) shows that the oxalic acid concentrations measured were low
relative to the spore numbers at the same positions. The average value for the closest sampling
position below the canopy downwind of the source, which should have a relatively high
number of spores due to near-source deposition [26, 154], is 28.75 μM. This value is low
considering that as few as 50 spores have been reported to produce as much as 500 μM of
oxalic acid (at a pH of 5.9) under similar incubation conditions [177]. This discrepancy is possibly because
the highest spore numbers are uncorrelated with high oxalic acid production and are, rather,
correlated with biomass growth in fungal cultures [177, 187]. Results obtained by Heard
[177] from investigations into the detection of fungal pathogens showed that low (50),
medium (291) and high (2300) numbers of spores produced comparable levels (between 300
and 780 μM) of oxalic acid at pH values ranging from 5 to 5.9 after incubation for 4 days in Sabouraud
growth media. Heard [177] observed no noticeable variation in biomass growth
between the three doses of spores and, on this basis, suggested that spore numbers
only influence the onset of acid formation [177]. These findings indicate that laboratory
determinations of expected oxalic acid production may not translate to field deployments.
Based on all of the above, it is very likely that the biosensor will produce false negative results,
which can be more devastating to farmers/growers than false positives, since the effects of
the former are often irreversible. It is therefore recommended that the following be done before
field deployment:
• Improve the biosensor chip's sensitivity to oxalic acid by reducing background noise.
• Investigate further the realistic concentrations of oxalic acid to be expected from
airborne spores rather than laboratory isolates. Sampling of spores should include Sclerotinia
isolates from a larger area of the UK, as opposed to local or single-field sampling.
• Carry out at least one more field trial with the full biosensor unit deployed (comprising
the sampling mechanism, incubation chamber, biosensor and instrumentation devices)
to gather more data for assessing the unit's selectivity.
In particular, the effects of masquerading fungi and oxalic acid-inhibiting microbes on
measurement should be investigated.
Much of the growth of biosensing as a field is owed to the success of biosensors in the medical
health sector [185] [188] [189] [190]. Now aided by new developments in identification and
quantification of airborne inoculum [13] these devices show increasing suitability for
agricultural and environmental applications [190-192] with the potential to achieve higher
accuracies and selectivity [193] [171].
With the growth of mechanised farming, agricultural applications require large-scale data
collection, which in turn requires numerous sampling points over large networks. Key
requirements for biosensors used in this field are operational reliability, speed of
measurement, ease of use and setup, and low cost per test [182, 194]. At this stage
in the technological development of biosensors, engineering the highest-accuracy
bioreception methods (such as DNA-based qPCR) into rapid, automatic, easy-to-use
and continuous biosensors for large-scale deployment is intractable
[13] and economically prohibitive for most applications [182]. A trade-off is therefore
necessary. Enzymatic biosensors, such as the glucose sensor [195] (after numerous
iterations), have achieved this trade-off well [184] [185] [182]. However,
as this study indicates, a solid understanding of the biology of the analyte, its synthesis and
likely sources of contamination is necessary in order to produce acceptably reliable enzymatic
biosensors for environmental applications.
3.5.2 Sclerotinia sclerotium spore dispersion
Due to the superior accuracy and reliability of the qPCR measurement, all spore dispersion
analysis was based on the spore DNA data. Spore numbers could have been converted to actual
concentrations (spores m⁻³) by dividing the equivalent spore numbers by the volume of air
sampled, i.e. the sampling rate (38 L/min) multiplied by the sampling duration (5 hrs). This
was deemed unnecessary since the sampling rate (through the use of a single type of sampler)
and the duration were maintained throughout the experiment. For the same reason, spore DNA
was not scaled by the efficiency of the Rotorod sampler. As such, the relative differences
between samples were preserved.
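For completeness, the conversion described above can be sketched as follows, taking the 0.35 pg-per-spore factor from Table 3.4 and interpreting the quoted sampling rate of 38 L/m as 38 litres per minute; the function name is illustrative.

```python
def spores_per_m3(dna_pg, pg_per_spore=0.35, flow_l_per_min=38.0, hours=5.0):
    """Convert a qPCR spore DNA mass (pg) to an airborne concentration.

    spores = DNA mass / DNA per spore; concentration = spores / air volume,
    where air volume = sampling rate x sampling duration.
    """
    spores = dna_pg / pg_per_spore
    air_volume_m3 = flow_l_per_min * 60.0 * hours / 1000.0  # L -> m^3
    return spores / air_volume_m3

# Example: position B at 0.8 m on Day 1 (76,500 spores = 26,775 pg of DNA)
# sampled over 11.4 m^3 of air
conc = spores_per_m3(26775.0)
```

Because every sample shares the same 11.4 m³ denominator, this scaling changes only the units, not the relative differences between positions, which is why it was omitted from the analysis.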
The results show that the effect of canopy filtration decreases with distance from the source.
At 7 m downwind of the source, the average gradient over the sampling period was 4.8 pg/m,
compared to 0.53 pg/m at 14 m downwind. Near the source, due to the vertical ejection of spores
and the sacrificial deposition of spores to maximise travel distance [26], much fewer spores
escape the canopy. At longer distances (14 m), due to the dominant effect of eddy diffusion
over turbulent transport [154], the spores have sufficiently mixed into a plume and
comparable concentrations were measured at all heights, with below-canopy (0.8 m) and just-above-canopy
(1.6 m) measurements differing by only 4.5% over the 3 days. The significance of this result relates
to the effect of turbulence on the representativeness of air-sampled data. Deploying sampling
equipment where the spore plume is sufficiently mixed gives it the highest chance of
success [29]. The results obtained suggest that for an OSR canopy of 1 m height under
similar conditions, sampling at a distance of approximately 14 m from the source is optimal.
This warrants more investigation to determine the effect of different source sizes and
strengths on the extent (from the source) of turbulent motion in OSR fields.
The data obtained, as shown in section 3.4.3, is consistent with the one-dimensional monotonic depletion
of spores that has been associated with fungal spores in the literature [66]. Spore DNA was
shown to follow an inverse power law (𝑎𝑥^−𝑑) both in and above the OSR canopy. This agrees
with findings in the literature, where models based on source-depletion equations have
been proposed to describe concentration gradients in and through crop canopies [3, 147,
162]. These models take the form of an exponential decay equation [33], a power-law decay
equation [196], or an additive combination of the two [197]. As Fitt et al. [147] note, the key
difference between the inverse power law fitted to the data in this study and the exponential
decay law is that the former does not assume a constant length scale (a proportional decrease
in concentration over equal distances). A varying length scale, as assumed by the power law
and, by extension, exhibited by the data collected in this work, is a characteristic feature of turbulence
that is consistent with canopy near-field flow [161].
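A power-law fit of this kind can be reproduced by least-squares regression in log-log space, since ln y = ln a − d ln x is linear in ln x. The sketch below uses synthetic data generated from the Day 1 above-canopy fit (y = 1278x^−0.647) rather than the raw field measurements, which are not reproduced here.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = a * x**(-d) by least squares in log-log space.

    A straight-line fit to (ln x, ln y) yields slope -d and
    intercept ln a.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(intercept), -slope  # (prefactor a, decay exponent d)

# Synthetic gradient mimicking the Day 1 above-canopy fit: y = 1278 x^-0.647
x = np.array([7.0, 14.0, 21.0, 28.0])  # distance from source centre (m)
y = 1278.0 * x ** -0.647               # spore DNA (pg)
a, d = fit_power_law(x, y)
```

On noisy field data the same routine would return the R²-limited estimates quoted in the figure; on this exact synthetic input it recovers a ≈ 1278 and d ≈ 0.647.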
Figure 3.20: Kernel Density Estimation of spore DNA distribution below (left) and above
(right) the canopy.
Analysing the spatial distribution of spore DNA below and above the OSR canopy reveals a
key characteristic of in-canopy dispersion, as shown in Figure 3.20. In this figure, the Kernel
Density Estimate [112] of the spore distribution below the canopy is shown alongside that of
spores above the canopy for all days. Below the canopy (Figure 3.20a, b, c), a sharp peak in
the distribution, implying excess (positive) kurtosis, is visible. Kurtosis is a fourth-moment
descriptive statistic that measures the weight of a distribution's tails relative to a Gaussian
[198, 199]. Excess kurtosis is a characteristic feature of in-canopy dispersion due to heavy
deposition of spores near the source, a result of low average wind speeds and canopy
filtering [49]. This distribution holds for all sampling days as shown.
Expectedly, above the canopy (Figure 3.20d, e, f), the shapes of the curves tend toward
uniform distributions due to increased mixing of spores by stronger winds and greater
turbulence, as supported in the literature. The bimodal behaviour exhibited on Days
1 and 3 (Figures 3.20d and 3.20f) represents the dichotomy between two processes occurring
at different turbulent length scales: near the source, where spore depletion occurs at a high
rate, and further from the source, where spores are mixed and concentration decreases less
rapidly with distance.
On the second day, the distribution is closer to uniform and unimodal. This distribution
reflects an increased role of eddy diffusion in the dispersion process, which results in
greater mixing of spores and more uniform concentration levels with downwind
distance at positions B, C and D, as shown earlier (see section 3.4.3, Figure 3.13, Day 2).
The mixing of spores is so central to Gaussian dispersion that a necessary assumption for
applying Gaussian Plume Models (GPMs) to spore transport is the "well-mixed condition" [54,
87].
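The peaked (leptokurtic) versus near-uniform distinction drawn above can be illustrated directly with the excess-kurtosis statistic. The sketch below uses synthetic samples, a heavy-tailed Laplace sample standing in for below-canopy concentrations and a uniform sample for the well-mixed above-canopy case; it does not use the field data.

```python
import numpy as np

def excess_kurtosis(samples):
    """Fourth standardised moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(samples, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

rng = np.random.default_rng(0)
# Sharply peaked, heavy-tailed sample (analogue of below-canopy data):
# the Laplace distribution has excess kurtosis +3
peaked = rng.laplace(size=100_000)
# Well-mixed analogue: the uniform distribution has excess kurtosis -1.2
mixed = rng.uniform(-1.0, 1.0, size=100_000)
```

A positive value signals the sharp near-source peak seen in Figure 3.20a–c; a negative value signals the flatter, well-mixed shape of Figure 3.20d–f.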
Figure 3.21: Dispersion contours of spore concentration below (left) and above (right) the
canopy.
In Figure 3.21, the dispersion of spores below (a, b, c) and above (d, e, f) the OSR canopy
is shown. Below the canopy, there is almost no dispersion beyond 5 m in any direction. These
omnidirectional contours are consistent with very low wind speeds, where wind and turbulence
contribute very little to spore movement [38, 147]. The reason for this is the vertical
inhomogeneity of canopy turbulence. Kaimal and Finnigan [38] and Reynolds [49]
demonstrated this by plotting normalised shear stress (⟨𝑢′𝑤′⟩/𝑢∗², where 𝑢′ and 𝑤′ are the
horizontal and vertical turbulent velocity components, and 𝑢∗ is the friction velocity) for
canopy data of varying Leaf Area Indices (LAI). The profiles showed that momentum entering
dense canopies is absorbed from the top at a rate that ensures minimal transmission of shear
stress (turbulence) to the ground. This negligible ground-level turbulence explains why
an overwhelming proportion of the spores released from ground-level sources, such as Sclerotinia
sclerotium spores, are deposited near the source. In such a case, the shape of the contour is
almost entirely determined by sedimentation and vertical deposition on leaves and other foliar
elements, as supported by this work and depicted in Figure 3.21a, b, c.
The effect of wind is more pronounced above the canopy (Figure 3.21d, e, f), where some spores
are transported at least 20 m downwind on all days. Little lateral dispersion is observed
on Days 1 and 3 (Figure 3.21d and f). On Day 2 (see Figure 3.21e), more spores are
transported further in both the downwind and crosswind directions by diffusion, as indicated
by the depletion of the peak concentration compared to the other days (1.4 for Days 1 and 3, 0.6 for
Day 2). This depletion is due to more severe dilution of spores by higher turbulence and the
commencement of diffusive transport [154].
These results agree with experimental findings that in-canopy transport exhibits near-field,
or stochastic, flow characteristics [48, 62, 161]. They suggest that the dispersion
of naturally released Sclerotinia exhibits the same general characteristics inside and in the
neighbourhood of a canopy as that of other fungal spores [62]. This is expected, as the main effect
of a canopy on flow is the dissipation of turbulent kinetic energy (TKE) [161] [36, 200-202].
However, the rate of this dissipation is a function of canopy type, spore release mechanism
and spore size [149], and there may be variations from case to case. Each
canopy/spore/release combination therefore warrants its own investigation in order to fully understand its
dispersal mechanisms. Studies like these will provide an opportunity to investigate the more
nuanced characteristics that differentiate spore dispersion from one canopy to another.
3.5.3 Experimental Value of Spore Data
The data generated in this work has experimental utility. Although Sclerotinia
falls within the class of actively released fungal spores [63, 181], there is no general
consensus on how it is released: varying release rates, release conditions and release speeds
have been reported in the literature [19] [26] [203] [204]. One of the complexities of the Sclerotinia
release mechanism is highlighted by Roper et al. [26], who report that Sclerotinia can
deliberately sacrifice spores near the source in order to maximise travel distances.
Further, the dispersion medium (canopy type and density) affects dispersion in a way that
is unique to each canopy, with microscale flow heterogeneity varying from canopy to canopy
[205]. As a result, generic conclusions drawn from specific experiments, while similar in terms
of large-scale diffusive dispersion [154] [53], may not be representative of every unique case
in the near field [161] [48], since specific canopy features dissipate turbulent
kinetic energy at different rates [62, 161]. In addition to affecting turbulence, the canopy
also influences the rate of deposition, which has to be accounted for in every dispersion
analysis or model [57, 206]. Wilson [48] notes that deposition may affect modelling results
more than turbulent characterisation does. The canopy profiles needed to improve the accuracy of
turbulence calculations are not readily available for crops other than maize (corn) [207] and
wheat [208]. Consequently, the continued understanding of canopy effects on dispersion, and
the accuracy of models, rely on the continued availability of good experimental data.
The dearth of naturally released spore data also makes the data in this study important. Most
of the reliable data on the dispersal of Sclerotinia are dated and motivated by disease gradients
rather than spore gradients [17, 66, 104]. While disease and airborne spore concentration
may appear correlated, spore dispersal cannot be adequately determined without
distinguishing the contributions to disease of each stage of the aerobiological pathway [148].
In the instances where experiments are carried out to investigate spore dispersion [162] [55]
[56] [57] [21] [196] [209], artificial release of inoculum above the ground is used. In the
few instances where spores are naturally released [55], sampling is made along vertical profiles
or at downwind distances without sufficient spatial variation.
To the author’s knowledge, this is the first time an attempt has been made to experimentally
describe the three-dimensional dispersion of naturally released Sclerotinia sclerotium spores
in an OSR field. Along with the turbulence data recorded, this data can be used to test and
evaluate, in OSR, the numerous turbulence modelling approaches that have already been tested
in other canopies [210, 211] [60] [212] [62, 161].
Based on this, the data generated in this work is considered a contribution to knowledge,
specifically to the understanding of Sclerotinia spore dispersion in OSR fields.
3.5.4 Limitations
There are a number of limitations to this experiment which, if addressed, could improve data
quality. The first is the scale of the experiment: a larger sampling area would
provide more data, and higher confidence in the results could be achieved. The main consideration
that offsets this limitation is that various findings have shown that Sclerotinia spores from local
sources do not generally travel more than 100 metres in significant, detectable
quantities [29, 44, 154] [30, 42]. Therefore, the significant increase in manual labour and
expenditure necessary to address this limitation may not always be rewarded with richer data.
The second limitation concerns the state of the art in spore trap technology. The performance
of spore traps can be highly variable depending on location, type, particle size, location of the
source and length of the sampling period [10] [11, 13]. With respect to testing oxalic acid, the
effect of this is minimal, as both biosensor testing and colourimetric detection were done with
samples collected and processed under identical conditions, eliminating the effect of any
bias introduced by sampling. With respect to actual spore quantities and dispersion, however,
this limitation cannot be fully mitigated until standards for deployment and spore trap technology
have been improved [148] [10]. As a mitigation measure, care was taken in this study
to optimally locate and choose the right sampling equipment, as detailed in section 3.3.1.2.
Another identified limitation of this analysis relates to the DNA quantification technique on which
the spore number estimates were based. qPCR, while very sensitive and selective to Sclerotinia,
cannot determine spore viability [63]. Only viable spores can contribute to oxalic acid
production [107], and as such spore numbers may be unrepresentative of oxalic acid quantity.
However, the effect of this is considered negligible given the large disparity between the spore
counts measured in the field and the oxalic acid concentrations measured.
3.6 Conclusion
In this chapter, details of the design and implementation of a field trial experiment to collect
spatial data for the dispersion of Sclerotinia sclerotium spores were described. Three methods
of quantification, namely electrochemical measurement of oxalic acid with a prototype
biosensor, direct measurement of oxalic acid by colourimetric analysis and quantification of
spore DNA by qPCR analysis were used to infer the concentration of collected spores.
Calibration of the biosensor with pure and known concentrations of oxalic acid showed a low
signal-to-noise ratio that was associated with a high ambient noise due to the inherent activity
of ferrocyanide ions in the biosensor. Consequently, only two positions on the field (on Day 1)
recorded concentrations above the noise threshold (LOD) of the biosensor. The
direct measurement of oxalic acid using colourimetric analysis offered improvements in
sensitivity and detected and quantified oxalic acid at many more positions on the field for all
sampling days.
The qPCR measurements offered the highest sensitivity to spore concentration. The number
of spores determined from the qPCR data showed that the oxalic acid concentrations
measured by colourimetric analysis were lower than expected in comparison to amounts
measured by Heard [177] in a laboratory test with low (50), medium (291) and high (2300)
numbers of spores. The difference between the oxalic acid concentrations measured for pure
Sclerotinia sclerotium spores in Heard’s laboratory tests and those measured in this work may
be due to the presence of other types of spores/pathogens in real air-sampled data that could
act as contaminants and suppressants of oxalic acid production, the arbitrary effect of spore
numbers on oxalic acid quantity or a combination of both.
The biosensor was shown to have a detection limit representing 4% of its measurement range
which is much higher than reported for enzymatic biosensors in literature. As stated in the
preceding paragraph, it is suspected that field-sampled spores may not consistently produce
oxalic acid above this detection threshold. This suggests that, as currently built, the biosensor
is unlikely to reliably detect Sclerotinia sclerotium spores when deployed.
The more reliable qPCR data showed dispersion to be consistent with dispersion patterns
investigated for other types of spores and pollen in crop canopies. Above and below canopy
dispersions followed an inverse power law with different rates of source depletion. Dispersion
below the canopy was lower due to low average horizontal wind speeds inside the canopy
(resulting in higher sedimentation) and the filtering effect of the OSR canopy. Above the
canopy, the distribution of spores was closer to Gaussian as a result of greater mixing of
spores and longer, more stable turbulent length scales.
To the author’s knowledge, this is the first time an attempt has been made to experimentally
describe the three-dimensional dispersion of naturally released Sclerotinia sclerotium spores
in an OSR field. Others [196, 213] have investigated vertical escape and gradients of other
types of spores in other canopies. Since spore dispersion in crops is canopy and spore-type
dependent, the data generated from this study has experimental value. The next chapter
describes the novel application of a Lagrangian Stochastic (LS) model to this data in order to
describe spore dispersion in OSR canopies.
Chapter 4 A backward Lagrangian Stochastic (bLS) model
for the dispersion of Sclerotinia sclerotium spores
4.1 Introduction
In Chapter 3, details of a field trial experiment for the release, collection and quantification
of Sclerotinia spores were provided. This chapter describes the backward Lagrangian
Stochastic (bLS) model and its adaptation and application to the dispersion of the relatively
heavy Sclerotinia sclerotium spores in the presence of an oil seed rape canopy, which disrupts
both deposition and turbulence. The algorithm was originally developed to describe tracer
transport.
The aim of this chapter is to demonstrate that the bLS model can describe the transport of
spores released from a ground-level, in-canopy source sufficiently well to justify its further
use in calculating spore trajectories and spatiotemporal concentrations. This sufficiency is
defined as the ability of the model to estimate spore concentrations at sensor locations above
canopy height when the spores leave a ground-level, in-canopy source.
In applying the model, meteorological observations from the field trial experiment discussed
in Chapter 3 were used to characterise the turbulence during spore collection and the
measured spore concentrations were used to evaluate the model performance.
The chapter begins by explaining the reasons influencing the choice of the trajectory model.
This is followed by a description of Monin-Obukhov Similarity Theory (MOST). MOST is the
fundamental theory supporting the validity of turbulence parameterisations used in bLS. The
forward Lagrangian Stochastic (LS) model is then introduced, after which bLS and its
adaptation to the dispersion of Sclerotinia spores in OSR are described.
4.2 Motivation for Trajectory Modelling Approach
The data described in Chapter 3 provided an insight into the random and non-Gaussian nature
of in-canopy dispersion. Within a field, before spores escape the canopy and constitute a long-distance
threat to crops, spore transport cannot be adequately described by Gaussian Plume
Models (GPMs). In addition to the immediate source vicinity being non-diffusive [53], wind
speeds and turbulent forces are very low. Conditions are further complicated by the
disruptiveness of the canopy [36], which makes treating spore dispersion as a plume, rather
than as a stochastic process, unrealistic [49]. Trajectory models, which describe stochastic
processes, are therefore more suitable in the near field [53, 62]. The accuracy and superiority
of these models over GPMs in describing dispersion within crop canopies, for ground-level
sources and measurement distances close to the source (less than 100 m), has been demonstrated
and reported extensively in the literature [53] [51, 54, 147, 162]. Trajectory models offer an
advantage over GPMs by enabling the tracking of individual particles from release to
deposition and should intuitively provide more accuracy in disruptive wind flows. Within the
family of trajectory models, Lagrangian Stochastic models are attractive because they are
free of theoretical constraints [48] and describe particle movement in the most natural
manner, with particles being described by their actual speeds as opposed to Eulerian fixed-
point velocities. This fundamental approach makes modifications to account for particle
physical features and canopy disruptions relatively easier [48]. Further, using the concept of
backward-time Lagrangian models, spore trajectories and therefore concentration can be
estimated without detailed source information [214]. This is beneficial for local spore
concentration estimation applications, where source locations are usually not known [11].
Another influencing factor for this choice of model is the logistical expense and technological
limitations of measuring and quantifying spore data. As mentioned in chapter 1, the
unavailability of real-time, fast-acting sensors limits the number and scale over which samples
can be collected. This eliminates the possibility of using empirical models. Trajectory models,
especially bLS, only require turbulent statistics, physical/aerodynamic properties of spores
and a description of the terrain over which spore concentration is to be determined. Validating
these models therefore requires far less data than validating empirical models.
4.3 Background Theory
4.3.1 Lagrangian Stochastic Models
Lagrangian Stochastic (LS) models [87] describe the movement of particles in turbulent flow
by generating possible trajectories of particles from a given reference point at any time.
These trajectories are computed as the particles move with the 'true'
Lagrangian velocities they experience, as opposed to the fixed-point velocities used in Eulerian
models. Each moving particle is subjected to turbulent forces as it travels through a medium;
the resulting velocity fluctuations and displacement are described
by the Langevin equation [86]:
𝑑𝑢 = 𝑎𝑑𝑡 + 𝑏𝜉 [4.1]
𝑑𝑥 = 𝑢𝑑𝑡. [4.2]
where 𝑢 and 𝑥 are the horizontal velocity and position of the particle, 𝑎 and 𝑏 are the
coefficients representing the deterministic and random processes of the stochastic process,
and 𝜉 denotes random numbers drawn from a Gaussian distribution of zero mean and
variance equal to the process time-step, 𝑑𝑡.
Equations 4.1 & 4.2 are based on the stochastic differential equation (SDE) arising from the
assumption that the state of each particle, its position and speed (𝑋, 𝑈), jointly evolves as a
Markov process [54]. The first term of Eq. 4.1 describes large-scale (drift) properties that
determine the nature of flow, thus representing the deterministic value of the speed
fluctuations. The second term describes small-scale (diffusive) properties of turbulent flow,
thus representing random fluctuations in average speed. For three-dimensional flow along 𝑥
(downwind), 𝑦 (cross-wind) and 𝑧 (vertical) directions corresponding to velocity
components 𝑢, 𝑣, and 𝑤, the generalised 3D Langevin equation is given by:
𝑑𝒖 = 𝒂𝑑𝑡 + 𝒃 𝝃
𝑑𝒙 = 𝒖𝑑𝑡 [4.3]
Where:
𝒖 = 𝑢1, 𝑢2, 𝑢3 = 𝑢, 𝑣, 𝑤;
𝒙 = 𝑥1, 𝑥2, 𝑥3 = 𝑥, 𝑦, 𝑧;
𝒂 = 𝑎𝑢1, 𝑎𝑢2, 𝑎𝑢3 = 𝑎𝑢 , 𝑎𝑣 , 𝑎𝑤;
𝒃 = 𝑏𝑢1, 𝑏𝑢2, 𝑏𝑢3 = 𝑏𝑢 , 𝑏𝑣 , 𝑏𝑤 𝑎𝑛𝑑
𝝃 = 𝜉𝑢 , 𝜉𝑣 , 𝜉𝑤 .
The drift term, 𝒂(𝒙, 𝒖, 𝑡), and the diffusion term, 𝒃(𝒙, 𝒖, 𝑡), can be determined by enforcing the
well-mixed condition [87], which requires that once the distribution of particles in a bounded
region becomes well mixed in space, it remains well mixed for all future times
[215]. Thomson's formulation accommodates only vertical inhomogeneity, and the solution
is not unique for multidimensional models where Gaussian turbulence cannot be assumed for
horizontal flow [150, 210, 211]. Other schemes that satisfy the well-mixed
condition have been proposed to address this, but none has been shown to be preferable to Thomson's
[216]. For this study, Thomson's model is suitable because of the assumption of a stationary
boundary layer.
Given 𝑈, 𝑉 and 𝑊 as the average Eulerian (measurable) velocities in the along-wind,
crosswind and vertical directions respectively, the velocity fluctuations 𝑢 − 𝑈, 𝑣 − 𝑉 and 𝑤 − 𝑊
(henceforth denoted 𝑢′, 𝑣′ and 𝑤′), the corresponding velocity variances
$\sigma_u^2$, $\sigma_v^2$ and $\sigma_w^2$, and the covariances of the velocity fluctuations ⟨𝑢′𝑣′⟩, ⟨𝑢′𝑤′⟩ and ⟨𝑣′𝑤′⟩, the drift
and diffusion terms (𝒂 and 𝒃) of the three-dimensional Langevin equation can be derived.
Assuming 𝑊 = 0 for a stationary atmosphere and 𝑉 = ⟨𝑢′𝑣′⟩ = ⟨𝑣′𝑤′⟩ = 0 for a sampling grid
aligned with the wind direction (see section 3.3.1.2), Thomson's solution for the 3D coefficients
of the Langevin equation reduces to its simplest form [150, 217]:
$$a_u = \frac{b_u^2}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2 u' + u_*^2 w'\right] + \frac{1}{2}\frac{\partial\langle u'w'\rangle}{\partial z} + w'\frac{\partial U}{\partial z} + \frac{1}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2\frac{\partial\sigma_u^2}{\partial z}u'w' + u_*^2\frac{\partial\sigma_u^2}{\partial z}w'^2 + u_*^2\frac{\partial\langle u'w'\rangle}{\partial z}u'w' + \sigma_u^2\frac{\partial\langle u'w'\rangle}{\partial z}w'^2\right] \qquad [4.4]$$
$$a_v = \frac{1}{2}b_v^2\frac{v'}{\sigma_v^2} \qquad [4.5]$$
$$a_w = \frac{b_w^2}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[u_*^2 u' + \sigma_u^2 w'\right] + \frac{1}{2}\frac{\partial\sigma_w^2}{\partial z} + \frac{1}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2\frac{\partial\langle u'w'\rangle}{\partial z}u'w' + u_*^2\frac{\partial\langle u'w'\rangle}{\partial z}w'^2 - u_*^2\frac{\partial\sigma_w^2}{\partial z}u'w' + \sigma_u^2\frac{\partial\sigma_w^2}{\partial z}w'^2\right] \qquad [4.6]$$
$$b_u = b_v = b_w = \sqrt{C_0\varepsilon} = \sqrt{\frac{2\sigma_w^2}{T_L}} \qquad [4.7]$$
where the friction velocity, $u_*$, is related to the velocity fluctuation covariance as
$\langle u'w'\rangle = -u_*^2$; $C_0$ is the Kolmogorov constant; $\varepsilon$ is the turbulent kinetic energy dissipation
rate; and $T_L$ is the Lagrangian (or decorrelation) timescale, which describes the
persistence, or memory, of turbulent motion.
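The discrete behaviour of Eqs. 4.1–4.3 can be illustrated with a minimal Python sketch of a single one-dimensional Euler step, assuming homogeneous Gaussian turbulence so that the drift reduces to $-w/T_L$ and the diffusion term follows Eq. 4.7. The function name and arguments are illustrative only, not part of the model described in this chapter:

```python
import math
import random

def langevin_step_1d(w, z, sigma_w, T_L, dt, rng=random):
    """One Euler step of a 1-D Langevin equation (cf. Eqs 4.1-4.3).

    For homogeneous Gaussian turbulence the drift reduces to -w/T_L and
    the diffusion term is b = sqrt(C0*eps) = sqrt(2*sigma_w**2 / T_L)
    (Eq. 4.7).  xi is a Gaussian random increment with variance dt.
    """
    a = -w / T_L                            # drift (large-scale) term
    b = math.sqrt(2.0 * sigma_w**2 / T_L)   # diffusion (small-scale) term
    xi = rng.gauss(0.0, math.sqrt(dt))
    w_new = w + a * dt + b * xi
    z_new = z + w_new * dt                  # position update, dx = u dt (Eq. 4.3)
    return w_new, z_new
```

Repeating such steps produces one realisation of a particle trajectory; ensembles of trajectories are what the LS models in this chapter average over.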
4.3.2 The Backward Lagrangian Stochastic Model
The backward Lagrangian stochastic (bLS) model [217] is based on the conventional
Lagrangian stochastic model [53]. A forward LS model tracks a particle from source (release)
to receptor (deposition), assigning positive values to the velocity vectors. In contrast, bLS
traces a particle's trajectory back from a receptor to a source. This approach offers flexibility
and ease of use, especially for area sources, by focusing only on trajectories of interest that
originate from specific receptor locations. This simplifies computations by reducing the
overall number of trajectories that must be computed. It also dispenses with the need to pre-
specify source configurations, since any shape of source can be accommodated by simply
evaluating particles that land within it. Importantly, these advantages of the bLS model do
not come at a loss of accuracy compared to forward LS models [214, 218, 219].
Consider a particle originating from point A with time-space coordinates (𝑥, 𝑡), reaching a
receptor B with coordinates $(x_b, t_b)$. Tracking the 𝑖th particle backward in time from B to A
implies that $t_b = -t$, and the position and velocity of this moving particle are
defined as:
$$du_i^b = a^b\,dt + b^b\xi \qquad [4.8]$$
$$u_i^b = \frac{dx_i}{dt'} = -\frac{dx_i}{dt} = -u_i \qquad [4.9]$$
So the particle has a velocity of equal magnitude but opposite sign relative to the forward LS
implementation. Flesch et al. [217] investigated the effect this has on the drift and diffusion
terms and found that the bLS drift term, $\mathbf{a}^b$, differs from the forward term, $\mathbf{a}$, only by a
sign change on its first term, while the diffusion terms $\mathbf{b}^b$ and $\mathbf{b}$ are equivalent. Hence, the
drift and diffusion terms of the bLS model are given by:
$$a_u^b = -\frac{b_u^2}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2 u' + u_*^2 w'\right] + \frac{1}{2}\frac{\partial\langle u'w'\rangle}{\partial z} + w'\frac{\partial U}{\partial z} + \frac{1}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2\frac{\partial\sigma_u^2}{\partial z}u'w' + u_*^2\frac{\partial\sigma_u^2}{\partial z}w'^2 + u_*^2\frac{\partial\langle u'w'\rangle}{\partial z}u'w' + \sigma_u^2\frac{\partial\langle u'w'\rangle}{\partial z}w'^2\right] \qquad [4.10]$$
$$a_v^b = -\frac{1}{2}b_v^2\frac{v'}{\sigma_v^2} \qquad [4.11]$$
$$a_w^b = -\frac{b_w^2}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[u_*^2 u' + \sigma_u^2 w'\right] + \frac{1}{2}\frac{\partial\sigma_w^2}{\partial z} + \frac{1}{2(\sigma_u^2\sigma_w^2 - u_*^4)}\left[\sigma_w^2\frac{\partial\langle u'w'\rangle}{\partial z}u'w' + u_*^2\frac{\partial\langle u'w'\rangle}{\partial z}w'^2 - u_*^2\frac{\partial\sigma_w^2}{\partial z}u'w' + \sigma_u^2\frac{\partial\sigma_w^2}{\partial z}w'^2\right] \qquad [4.12]$$
$$b_u^b = b_v^b = b_w^b = \sqrt{C_0\varepsilon} = \sqrt{\frac{2\sigma_w^2}{T_L}} \qquad [4.13]$$
Henceforth, $a_u$, $a_v$ and $a_w$ and $b_u$, $b_v$ and $b_w$ will be used to denote the bLS model coefficients.
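As an illustration of Eqs. 4.10–4.13, the following Python sketch evaluates the bLS drift terms for the special case of vertically homogeneous turbulence, in which all $\partial/\partial z$ terms vanish and only the leading velocity terms survive. The function name is illustrative and this is a simplification, not the full inhomogeneous form used in this chapter:

```python
def bls_drift(up, vp, wp, sig_u, sig_v, sig_w, u_star, T_L):
    """Backward-LS drift terms (cf. Eqs 4.10-4.12) for the special case
    of vertically homogeneous turbulence (all d/dz terms set to zero).

    up, vp, wp are the velocity fluctuations u', v', w'.
    """
    b2 = 2.0 * sig_w**2 / T_L                # b^2 = C0*eps (Eq. 4.13)
    det = sig_u**2 * sig_w**2 - u_star**4    # determinant of the (u', w') covariance
    a_u = -(b2 / (2.0 * det)) * (sig_w**2 * up + u_star**2 * wp)
    a_v = -(b2 / 2.0) * vp / sig_v**2
    a_w = -(b2 / (2.0 * det)) * (u_star**2 * up + sig_u**2 * wp)
    return a_u, a_v, a_w
```

With $u_* = 0$ the cross-terms vanish and each component relaxes independently, recovering the decoupled one-dimensional form.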
4.3.2.1 Calculating Concentration with bLS Models
Calculating concentration with LS models is a separate step from computing trajectories.
Concentrations at a receptor are determined as the ensemble average of particle residence
times within the receptor volume [216], i.e. the time spent by particles in a volume. For bLS
models, where the concentration footprint at the source, $x_0$, is required, this corresponds to
the backward residence time, $T^b$, of tracers passing through an infinitesimal depth, $dz$, above
a source with mass density, $S$. This is given by [217]:
$$C(x_0) = \frac{S}{N}\sum_{n=1}^{N} T_n^b \qquad [4.14]$$
where $N$ is the number of particles traced back from the receptor, $x$. For an area source of
strength $Q$ (# m⁻² s⁻¹) at an infinitesimal height above ground, so that $S = Q/dz$, Flesch et
al. [217] showed that $T^b = 2dz/|w_0|$, and Equation 4.14 can be written as:
$$C(x) = \frac{Q}{N}\sum \frac{2}{|w_0|} \qquad [4.15]$$
where $w_0$ is the vertical velocity at "touchdown" of the particles that land within the source.
Equation 4.15 enables the estimation of concentration from a catalogue of landing positions
and velocities $(x_0, y_0, w_0)$ without specifying the type or configuration of the source a priori.
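Equation 4.15 amounts to a simple sum over the touchdown catalogue. A Python sketch, with illustrative function and argument names, in which the source geometry is supplied as an arbitrary predicate over landing positions:

```python
def concentration_footprint(touchdowns, in_source, N, Q=1.0):
    """Receptor concentration from a bLS touchdown catalogue (Eq. 4.15).

    touchdowns : iterable of (x0, y0, w0) landing positions and velocities
    in_source  : predicate returning True if (x0, y0) lies within the source
    N          : total number of particles released from the receptor
    Q          : area source strength (# m^-2 s^-1)
    """
    # Sum 2/|w0| over touchdowns inside the source (zero velocities skipped).
    total = sum(2.0 / abs(w0) for x0, y0, w0 in touchdowns
                if in_source(x0, y0) and w0 != 0.0)
    return Q * total / N
```

Because the source enters only through the predicate, any source shape can be evaluated from the same catalogue, which is the flexibility of the bLS approach noted above.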
4.3.3 Monin-Obukhov Similarity Theory (MOST)
MOST [47, 118, 191] is a generalised theory of turbulence for the surface layer based on a
nondimensional universal function of 𝑧/𝐿, $\varphi(z/L)$, where 𝐿 is the Obukhov length [38, 220,
221]. The theory assumes that the surface layer is locally isotropic up to heights of about
100–200 m [222]. Based on this assumption, MOST stipulates that only three key parameters
– the friction velocity, $u_*$, the Obukhov length, 𝐿, and the surface roughness, $z_0$ – are
required to adequately describe the vertical variation of horizontal wind speed and turbulence
characteristics in surface-layer flow [152]. These are calculated as follows. The friction
velocity, $u_*$, is a function of the turbulent fluctuations of the velocity components and is
given by:
$$u_* = \sqrt{-\langle u'w'\rangle} \qquad [4.16]$$
The Obukhov length, 𝐿, a measure of stability, is given by:
$$L = -\frac{u_*^3 T}{k_v g \langle w'T'\rangle} \qquad [4.17]$$
where 𝑇 is the absolute mean temperature, ⟨𝑤′𝑇′⟩ is the temperature flux, and the von
Karman constant, $k_v$, was chosen as 0.40 – the average of the most reliable values reported
for MOST [152].
The roughness length, $z_0$, was calculated by substituting $u_*$ and 𝐿 into the modified log-wind
profile for canopies as follows:
$$z_0 = \frac{z - d}{\exp\left[\dfrac{U k_v}{u_*} - \varphi(z, z_0, L)\right]} \qquad [4.18]$$
where 𝑑 is the displacement (zero-plane) height at which the average wind speed reduces
to zero and $\varphi(z, z_0, L)$ is the Monin-Obukhov universal function (stability correction term),
given for unstable stratification by [223]:
$$\varphi = -2\ln\left(\frac{1+\Omega}{2}\right) - \ln\left(\frac{1+\Omega^2}{2}\right) + 2\tan^{-1}(\Omega) - \frac{\pi}{2} \qquad [4.19]$$
where
$$\Omega = \left[1 - 15\frac{(z-d)}{L}\right]^{0.25}$$
Turbulence statistics can then be expressed in terms of these parameters. MOST-based
velocity standard deviations for a stable/neutral atmosphere are given by [38]:
$$\sigma_u = 2.5u_* \qquad [4.20]$$
$$\sigma_v = 2u_* \qquad [4.21]$$
$$\sigma_w = 1.25u_* \qquad [4.22]$$
and for unstable stratification [150, 224]:
$$\sigma_u = u_*\left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \qquad [4.23]$$
$$\sigma_v = 0.8u_*\left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \qquad [4.24]$$
$$\sigma_w = 1.25u_*\left(1 - 3\frac{(z-d)}{L}\right)^{1/3} \qquad [4.25]$$
The Lagrangian timescale for unstable stratification is calculated as [225, 226]:
$$T_L = \frac{0.5z}{\sigma_w}\left(1 - \frac{6z}{L}\right)^{1/4} \qquad [4.26]$$
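The MOST parameters above can be computed from sonic-anemometer statistics along the lines of the following sketch (Eqs. 4.16, 4.17 and 4.19). The function names are illustrative, and the default gravitational acceleration follows the value used in this work:

```python
import math

KV = 0.40  # von Karman constant used in this work

def friction_velocity(uw_cov):
    """u* from the momentum flux <u'w'> = -u*^2 (Eq. 4.16)."""
    return math.sqrt(max(-uw_cov, 0.0))

def obukhov_length(u_star, T, wT_cov, g=9.82):
    """Obukhov stability length (Eq. 4.17); negative in unstable air."""
    return -u_star**3 * T / (KV * g * wT_cov)

def phi_unstable(z, d, L):
    """Stability correction for unstable stratification (Eq. 4.19)."""
    omega = (1.0 - 15.0 * (z - d) / L) ** 0.25
    return (-2.0 * math.log((1.0 + omega) / 2.0)
            - math.log((1.0 + omega**2) / 2.0)
            + 2.0 * math.atan(omega) - math.pi / 2.0)
```

In the neutral limit ($|L| \to \infty$) $\Omega \to 1$ and the correction vanishes, recovering the plain logarithmic profile.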
4.4 Methodology
This section presents the methodology used in this chapter. The subsections discuss the
parametrisation of the bLS model for an OSR canopy, detail the evaluation of the model
against experimental data and then explain the implementation of the discrete bLS model.
These sections represent the author's work in optimising bLS for the problem in question,
based on an integration and adaptation of methodologies and relevant experimental findings
from across the literature.
4.4.1 Parametrising the bLS Model for Sclerotinia Dispersion
The bLS model presented above is suited for the dispersion of neutrally buoyant, passive
tracer particles under the well-mixed constraint. In addition to input stability parameters, the
model must be optimised for the heavier (than tracers) Sclerotinia spores and for canopy
transport.
4.4.1.1 Calculating Model Statistics
All stability statistics used in this work were based on MOST calculations using the turbulence
measurements obtained from the field trial experiment discussed in section 3.2: 𝑈, 𝑢, 𝑣, 𝑤,
$\sigma_u$, $\sigma_v$, $\sigma_w$, $\sigma_u^2$, $\sigma_v^2$, $\sigma_w^2$ and $\langle u'w'\rangle$. Although MOST-calculated statistics can be erroneous in
extreme stability periods [227, 228], the theory is widely considered satisfactory for heights
of at least 29 m, where, usually, |𝑧/𝐿| < 1 [229-231], and for uniform and flat terrains [38,
220, 232], as long as periods of extreme atmospheric (in)stability are avoided [150, 233].
These conditions are similar to those under which the experimental field trial described in
chapter 3 was conducted.
Flesch et al. [150] recommend against averaging times longer than 60 minutes for MOST-
based bLS models, so a 60-minute averaging time was adopted. This gave the best chance of
success in evaluating the model against the air-sampled data in section 3.2, because longer
sampling periods improved the reliability of the spore traps and the chances of data
collection [10].
As turbulence statistics depend on the flow medium [31, 37], the MOST equations presented
(Eqs. 4.16-4.26) are not valid inside the canopy. Below the canopy top, flow is non-Gaussian
and highly vertically inhomogeneous [234, 235] [212, 216]. Above a canopy, the flow is
Gaussian and horizontally and vertically homogeneous under isotropic surface-layer
assumptions, so MOST-derived statistics are directly applicable there [236]. At the canopy-air
interface, a rough-wall boundary, the effect of wind shear on the canopy surface introduces
instability in the region around the canopy, characterised by constant Reynolds stress [149]
and known as the roughness sublayer [31, 62, 38], which can extend to varying heights under
different stability conditions [152]. Therefore, three separate classes of flow need to be
considered: above the canopy, inside the roughness sublayer and inside the canopy.
Inside the Roughness Sublayer (RSL)
For dense canopies with Leaf Area Index (LAI) greater than 1, the roughness sublayer extends
to a height of approximately 2(ℎ − 𝑑) above the displacement height [237]. The displacement
height, 𝑑, itself has been found to be fairly consistent at ~0.75ℎ over a wide range of natural
canopies [38]. In line with these findings, the roughness sublayer was estimated to extend
from 0.75ℎ to 1.25ℎ, making its effective depth 0.5 m for the 1 m canopy. At this depth,
concerns about a deep roughness sublayer (>3ℎ) degrading MOST performance [152] do not
arise. At the top of the layer, at height $z_{rl}$ = 1.25ℎ above ground, velocity statistics were taken
to be equal to those above the canopy (Eqs. 4.16-4.26). Following Aylor et al. [55], a 15%
gradient was applied to the values at $z_{rl}$ to account for the linear decrease in velocity statistics
through the layer down to the canopy height, ℎ. Velocity statistics at the top of the canopy for
stable/neutral stratification are then given by:
$$\sigma_u(h) = 2.13u_* \qquad [4.27]$$
$$\sigma_v(h) = 1.7u_* \qquad [4.28]$$
$$\sigma_w(h) = 1.1u_* \qquad [4.29]$$
and for unstable stratification (𝐿 < 0):
$$\sigma_u(h) = 0.85u_*\left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \qquad [4.30]$$
$$\sigma_v(h) = 0.72u_*\left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \qquad [4.31]$$
$$\sigma_w(h) = 1.1u_*\left(1 - 3\frac{(z-d)}{L}\right)^{1/3} \qquad [4.32]$$
The values of $\sigma_u$, $\sigma_v$ and $\sigma_w$ were assumed to decrease linearly through the sublayer [38],
which corresponds to gradients of $0.15\sigma_{u,v,w}(h)/0.25\,\mathrm{m}$ for this study. The Lagrangian
timescale, $T_L$, remains unchanged in the upper half of the roughness sublayer (i.e. $T_L =
T_L(h)$ for $z > h$).
Above The Canopy
Above the roughness sublayer (𝑧 > 1.25ℎ) the surface layer assumption that average wind
speed varies as a diabatically corrected logarithmic wind profile is valid and Equations 4.16 –
4.26 are applicable.
Inside the Canopy
Only one sonic anemometer was available, and it was deployed above the canopy. This is
primarily because a single measurement of turbulence in the surface layer, where turbulence
is homogeneous, is representative [236]. This is not the case for below-canopy
measurements, due to nonhomogeneous flow and the resulting effect of high
turbulence intensities on flow angles [59]. Consequently, the varying canopy turbulence was
calculated from experimental profiles and the assumption of exponential decay of turbulence
kinetic energy with distance from the canopy top towards the ground [36].
Turbulence profiles were calculated as follows [55]:
$$U = U(h)\exp\left[-\gamma_1\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z \le h \qquad [4.33]$$
$$\sigma_u = \sigma_u(h)\exp\left[-\gamma_2\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \qquad [4.34]$$
$$\sigma_v = \sigma_v(h)\exp\left[-\gamma_3\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \qquad [4.35]$$
$$\sigma_w = \sigma_w(h)\exp\left[-\gamma_4\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \qquad [4.36]$$
$$\langle u'w'\rangle = -u_*^2\exp\left[-\gamma_5\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \qquad [4.37]$$
$$T_L = T_L(h) \quad \text{for } 0.25h < z \le h \qquad [4.38]$$
$$T_L = T_L(h)\left[0.1 + 3.6\left(z/h\right)\right] \quad \text{for } 0 < z \le 0.25h \qquad [4.39]$$
where $U(h)$, $\sigma_u(h)$, $\sigma_w(h)$ and $T_L(h)$ are top-of-canopy values. The extinction coefficients, $\gamma$,
are properties of canopy density representing the rate of absorption of momentum, and should
ideally be measured for individual canopies. The coefficients were chosen following the values
used by Aylor and Flesch [55]. $\gamma_1$ was assigned a value of 2.4, based on the value estimated
by Shaw et al. [238] for a corn canopy with LAI = 3 (OSR LAI = 3.5). $\gamma_2$ and $\gamma_4$ were chosen
based on generalised non-forest canopy length scales for $\sigma_u(h)$ and $\sigma_w(h/3)$, making $\gamma_2 = 1$
and $\gamma_4 = 3$. $\gamma_3$ was taken equal to $\gamma_2$, based on the covariation of $\sigma_v$ with $\sigma_u$. No
experimental values were found for $\gamma_5$ in an OSR canopy, so a non-forest canopy value of 3.5
was used [38].
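The three-layer treatment can be illustrated for $\sigma_w$ with a sketch that stitches together the surface-layer value of Eq. 4.22, the linear decrease to the canopy-top value of Eq. 4.29, and the in-canopy extinction of Eq. 4.36, assuming neutral/stable conditions and the sublayer top at $z_{rl} = 1.25h$. The function name is illustrative:

```python
import math

def sigma_w_profile(z, h, u_star, gamma4=3.0):
    """sigma_w versus height, stitched across the three flow regions of
    section 4.4.1.1 (heights in metres; neutral/stable conditions).

    Assumes sigma_w = 1.25 u* above the roughness sublayer (Eq. 4.22),
    a linear decrease through the upper sublayer to sigma_w(h) = 1.1 u*
    (Eq. 4.29), and exponential extinction inside the canopy (Eq. 4.36),
    with the sublayer top at z_rl = 1.25 h.
    """
    z_rl = 1.25 * h
    sw_above = 1.25 * u_star        # surface-layer value (Eq. 4.22)
    sw_h = 1.1 * u_star             # canopy-top value (Eq. 4.29)
    if z >= z_rl:                   # surface layer: MOST directly valid
        return sw_above
    if z > h:                       # upper roughness sublayer: linear decrease
        return sw_h + (z - h) / (z_rl - h) * (sw_above - sw_h)
    # inside the canopy: exponential extinction (Eq. 4.36)
    return sw_h * math.exp(-gamma4 * (1.0 - z / h))
```

The profile is continuous at both interfaces, which is what allows the model to evaluate turbulence statistics at whatever height a particle currently occupies.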
4.4.1.2 Adjusting for inertia of spores
Tracers and other light particles only require the equations of the fluid (air) to describe their
transport. For particles smaller than approximately 300 μm, bLS can be adopted directly, as
their size does not significantly decorrelate fluid and particle trajectories [48]. With diameters
in the range 12-14 μm [20, 26, 50], Sclerotinia ascospores fall well below this threshold.
For these spore sizes, Wilson has shown that inertial adjustments need to be made in two
ways. The first is to account for the effect of the spore's settling velocity, $v_s$, on the vertical
component of particle motion. This is achieved by modifying the vertical position update of
the Langevin equation (Eq. 4.3) as shown in Eq. 4.40 [55]:
$$dz = (w - v_s)\,dt \qquad [4.40]$$
The second adjustment is to correct the decorrelation timescale to account for the difference
between the turbulence of fluid following a heavier particle and that following a passive tracer.
To achieve this, Sawford and Guest [239] proposed a weighting factor $f$ ($0 \le f \le 1$) to
correct the Lagrangian timescale in air, $T_L$, giving the decorrelation timescale of Sclerotinia
spores in air, $\tau$, as:
$$\tau = fT_L, \qquad f = \frac{1}{\sqrt{1 + \left(\dfrac{\beta v_s}{\sigma_w}\right)^2}} \qquad [4.41]$$
Following this, Eq. 4.13 can be rewritten as:
$$b_u = b_v = b_w = \sqrt{\frac{2\sigma_w^2}{fT_L}} \qquad [4.42]$$
where $\beta$ is an empirical constant relating Lagrangian to Eulerian timescales, determined to
be 1.5 by Sawford and Guest [239]. Since $f$ decreases with settling velocity, light particles
leave $\tau$ almost unchanged ($f \approx 1$). Also, because $\tau$ varies with $\sigma_w$, which varies with
height inside the canopy, the model time step, $dt$, and the fluctuation term of the Langevin
equation, $b$, vary when the model resolves particles below canopy height.
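Equations 4.41–4.42 reduce to a few lines of code. The sketch below uses the values adopted in this work ($v_s$ = 0.002 m s⁻¹, $\beta$ = 1.5) as defaults; the function name is illustrative:

```python
import math

def spore_timescale(T_L, sigma_w, v_s=0.002, beta=1.5):
    """Decorrelation timescale of a settling spore (Eqs 4.41-4.42).

    f -> 1 for light particles (v_s << sigma_w), so the passive tracer
    is recovered as a limiting case; b then follows Eq. 4.42.
    """
    f = 1.0 / math.sqrt(1.0 + (beta * v_s / sigma_w) ** 2)
    tau = f * T_L                            # Eq. 4.41
    b = math.sqrt(2.0 * sigma_w**2 / tau)    # Eq. 4.42
    return f, tau, b
```

Because $\tau$ inherits the height dependence of $\sigma_w$, this function would be re-evaluated at every model time step for particles inside the canopy.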
4.4.1.3 Deposition on Vegetation and Ground
Spore deposition is usually determined by dimensional analysis or by experiments, as these
parameters have not been fully quantified [56]. The spore deposition algorithm of Aylor and
Flesch [55] was extended to account for deposition onto the lateral dimension of vegetation,
based on Bouvet et al.'s [207] algorithm. The resulting 3D form is shown in Eq. 4.43: at each
time step, the probability of a single travelling spore being deposited on any element of
vegetation in a 3D wind field can be expressed as:
$$G_v(z) = v_s f_x\,LAD\,E_x\,dt + u f_z\,LAD\,E_z\,dt + v f_y\,LAD\,E_y\,dt \qquad [4.43]$$
where $E_x$, $E_y$ and $E_z$ are the horizontal, lateral and vertical impaction efficiencies; $f_x$, $f_y$
and $f_z$ are the projections of plant area onto the horizontal, lateral and vertical planes; and
$LAD$ (m⁻¹), the leaf area density, can be thought of as the vertical variation of LAI with height
inside a canopy [211]. Consistent with the findings of Aylor [208], $E_x$ was set equal to 1.0,
signifying perfect impaction efficiency, and $E_y$ and $E_z$ were calculated as:
$$E_y = E_z = \frac{0.86}{1 + 0.442\left(\dfrac{|u|\tau_R}{L_v}\right)^{-1.967}} \qquad [4.44]$$
where $\tau_R$ ($= v_s/g$) is the particle relaxation time and $L_v$ is the characteristic size of the
vegetation, or leaf width [57].
Unlike for wheat and maize canopies [207], no LAD profile measurements of OSR canopies
were found in the literature after an exhaustive search. In parameterising Eq. 4.43, the beta
probability density function LAD profile for canopies with LAI = 3, originally proposed by
Markkanen et al. [240] and adapted for banana canopies by Duman et al. [211] and Siqueira
et al. [241], was used. The density function is given by [211]:
$$LAD(z) \sim (z/h)^{\,ℊ-1}\,(1 - z/h)^{\,𝜘-1}, \qquad z/h \in [0, 1] \qquad [4.45]$$
where ℊ and 𝜘 are shape parameters. 𝜘 was kept constant at 3 [240], and ℊ, which affects the
vertical distribution of foliage within the canopy, was adjusted to 4 to reflect the measured
intermediate OSR crown height of 0.34 m. $LAD$ was normalised such that
$\int_0^h LAD\,dz = \mathrm{LAI}$ (= 3.5). The projections $f_{x,y,z}$ are normally constant with height [55, 57, 207]
and were calculated from the mean tilt angle (MTA) measured during the field trial experiment
as 0.3, 0.3 and 0.52 respectively ($f_x = f_y = \sin\phi$, $f_z = \cos\phi$, where $\phi$ is the mean tilt
angle). The leaf width, $L_v$ [57, 207], was estimated as 0.035 m for the OSR field. Estimates
were made from 10 randomly selected leaves along each of 8 transects (east to west); each
leaf was measured with a ruler horizontally across the midvein, and the mean of these 80
measurements was taken as $L_v$.
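Equation 4.45 with the shape values quoted in the text (ℊ = 4, 𝜘 = 3) and the LAI normalisation can be sketched as follows. The midpoint-rule normalisation is an illustrative numerical choice, and the function name is hypothetical:

```python
def lad_profile(z, h=1.0, g_shape=4.0, k_shape=3.0, lai=3.5, n=200):
    """Beta-function leaf area density (Eq. 4.45), normalised so that
    the integral of LAD from 0 to h equals LAI (midpoint rule, n slabs).
    """
    def beta(zh):
        # Unnormalised beta-density shape of Eq. 4.45.
        return zh ** (g_shape - 1.0) * (1.0 - zh) ** (k_shape - 1.0)
    dz = h / n
    norm = sum(beta((i + 0.5) / n) for i in range(n)) * dz
    return lai * beta(z / h) / norm
```

Integrating the returned profile over the canopy depth recovers the LAI of 3.5, which is the normalisation condition stated above.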
The probability of deposition was then evaluated by comparing $G_v$ to a random number
$\eta$ (drawn from a uniform distribution between 0 and 1): the spore was deposited if $\eta < G_v$,
or allowed through to the next time step otherwise [55]. Where a spore was deposited, it was
abandoned and the bLS model released the next one, provided $N$ had not been exceeded.
The probability of deposition to the ground was given by [162]:
$$G_g = \begin{cases} \dfrac{2v_s}{v_s - w}, & w < -v_s \\[4pt] 1, & |w| < v_s \end{cases} \qquad [4.46]$$
On impact, if $\eta < G_g$ the spore was deposited; otherwise it was reflected back into the air.
Just after reflection, the spore position and velocity were updated according to the following
equations [55]:
$$z_{new} = z_{old} - 2v_s\,dt$$
$$u_{new} = -\left[u_{old} - U(z_{old})\right] + U(z_{new})$$
$$w_{new} = -w_{old} \qquad [4.47]$$
where “old” and “new” denote positions before and after reflection.
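The ground-interaction rules (Eqs. 4.46–4.47) and the Bernoulli deposition trial can be sketched as follows; `U_old` and `U_new` stand for $U(z_{old})$ and $U(z_{new})$, and all names are illustrative:

```python
import random

def ground_probability(w, v_s):
    """Probability of ground deposition on impact (Eq. 4.46)."""
    if w < -v_s:
        return 2.0 * v_s / (v_s - w)
    return 1.0  # |w| < v_s: the particle cannot escape the surface

def reflect(z, u, w, U_old, U_new, v_s, dt):
    """Update position and velocity of a reflected spore (Eq. 4.47)."""
    z_new = z - 2.0 * v_s * dt
    u_new = -(u - U_old) + U_new   # reverse the fluctuation, keep the mean
    w_new = -w
    return z_new, u_new, w_new

def try_deposit(G, rng=random):
    """Bernoulli deposition trial: deposit if eta < G (eta ~ U[0, 1])."""
    return rng.random() < G
```

The same trial is applied to both vegetation ($G_v$) and ground ($G_g$) probabilities at each time step.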
4.4.2 Implementing the bLS Model
The model was implemented by running the numerical form of Eq. 4.3 to generate particle
back-trajectories:
$$\Delta\mathbf{u} = \mathbf{a}\Delta t + \mathbf{b}\,\boldsymbol{\xi}$$
$$\Delta\mathbf{x} = \mathbf{u}\Delta t \qquad [4.48]$$
where $\Delta t$ is the model's time step and all other terms retain their earlier descriptions. The
choice of $\Delta t$ is critical: it must be sufficiently smaller than the decorrelation timescale,
$\tau$, so that turbulent activity is not missed and the well-mixed condition is not violated [242].
Following Flesch et al. [150], Flesch et al. [217] and Aylor and Flesch [55], $\Delta t$ was chosen as
$0.025\tau$, fulfilling the condition that the model time step be less than the Lagrangian
timescale. For each run, N = 150,000 particles were released from locations corresponding to
the 22 Rotorod sampling positions in the experimental field trial. Initial velocities were
assigned according to the Eulerian velocity statistics for release positions above the canopy,
and according to the in-canopy statistics (Eqs. 4.34-4.36) for samplers below canopy height.
N was determined by releasing 50,000, 75,000, 100,000, 125,000, 150,000, 175,000 and
200,000 particles; N = 150,000 was the smallest value that achieved convergence of the
concentration at position D, 1.6 m (see Figure 3.2). To limit computation, each particle was
traced back in time towards the source, experiencing position and velocity fluctuations, up to
a maximum of 40 m before being abandoned.
During each time step, $\Delta t$, the model resolved the particle's current position from the most
recent update of Eq. 4.48 and evaluated the turbulence statistics for that height. Above the
canopy, these values are constant; from the top of the roughness sublayer to the top of the
canopy, they decrease linearly as $0.15\sigma_{u,v,w}(h)/0.25\,\mathrm{m}$; and inside the canopy, they decay
exponentially with the characteristic canopy length scale (see Eqs. 4.33-4.39). Within the
same time step and for the same position, ground and vegetation deposition probabilities
were computed and, depending on the outcome, the particle was either deposited or allowed
to proceed to the next time step. At the end of the time step, the particle velocity and position
were updated according to Eq. 4.49:
$$\mathbf{u} = \mathbf{u} + \Delta\mathbf{u}$$
$$\mathbf{x} = \mathbf{x} + \Delta\mathbf{x} \qquad [4.49]$$
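The stepping procedure can be caricatured by the following single-particle sketch, which deliberately simplifies the full model to homogeneous Gaussian turbulence over a flat surface (no canopy layers, no deposition trials, tracer timescale $f = 1$) and adopts one plausible sign convention for backward time. It is illustrative only, not the implementation used in this work:

```python
import math
import random

def back_trajectory(x0, z0, U, sigma_w, T_L, v_s=0.002,
                    x_max=40.0, dt_frac=0.025, rng=None):
    """Trace one particle upwind from a receptor at (x0, z0) until it
    touches down or has travelled x_max metres (cf. section 4.4.2).

    Simplifications (illustrative only): homogeneous Gaussian turbulence,
    flat ground, tracer timescale (f = 1), drift -w/T_L, and the mean
    advection applied upwind to mimic backward time.
    Returns (x, w) at touchdown, or None if abandoned.
    """
    rng = rng or random.Random()
    dt = dt_frac * T_L                        # dt = 0.025 * tau
    b = math.sqrt(2.0 * sigma_w**2 / T_L)     # diffusion term (Eq. 4.13)
    x, z = x0, z0
    w = rng.gauss(0.0, sigma_w)               # initial w from Eulerian statistics
    while abs(x - x0) < x_max:
        w += -(w / T_L) * dt + b * rng.gauss(0.0, math.sqrt(dt))
        x -= U * dt                           # travel upwind in backward time
        z += (w - v_s) * dt                   # settling correction (Eq. 4.40)
        if z <= 0.0:
            return x, w                       # touchdown: catalogue (x0, w0)
    return None                               # abandoned beyond x_max
```

Running many such trajectories and cataloguing the touchdowns is what feeds the concentration calculation of Eq. 4.50.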
The landing positions $(x_0, y_0)$ of all N released particles and the corresponding touchdown
velocities $w_0^*$ ($= w_0 - v_s$) were recorded in a catalogue. Concentrations were then calculated
by slightly modifying Eq. 4.15 to account for the velocity difference between spore and tracer:
$$(C/Q) = \frac{1}{N}\sum_n \frac{2}{|w_0^*|} \qquad [4.50]$$
where $n$ indexes the touchdowns within the source area. To account for the ring
configuration of the source, spores were assumed to emanate from six 1 m² squares centred
on the circumference of the 7 m diameter source ring, as shown in Figure 4.1. All touchdowns
within these squares were considered to contribute to the concentration footprint at the
receptor of interest. All six groups of sources were assumed to be 'active' during the sampling
runs and to have the same $Q$, and could therefore be treated as a single, homogeneous
source [243]. These are reasonable assumptions given the similar environmental and physical
soil conditions of the sources due to their proximity to one another, the consistency of the
Sclerotinia isolates used, the sowing practice employed in burying each group of sclerotia
[148], and the common rate of maturation [19].
Figure 4.1: The assumed source configuration used for concentration footprint calculation,
showing the approximate locations of the 6 groups of Sclerotinia. Each group is assumed to
cover a 1 square metre area based on approximate measurements of the area covered by
fruiting bodies. The bottom-left vertices of the squares, starting with square 1, are: (-2.25,
2.5), (1.25, 2.5), (3, -0.5), (1.25, 4.0), (-2.25, -4.0), and (-4, 0.5). (Drawing not to scale.)
4.4.3 Comparing model estimates to experimental data
A number of adjustments had to be made to the data in order to compare it to model
estimates. The first adjustment was to convert spore numbers to a standard measurement
unit of concentration (#/𝑚3).
The second adjustment concerned the sampling time of the field experiment described in
chapter 3. In the field experiment, spores were collected for 5 hours each day and analysed
as one sample for each sampling point to mitigate sampling error [10]. From a modelling
point of view, a 5-hour sampling period is too long to evaluate as a single sample period
because wind turbulent activity has a considerably shorter timescale. Averaging times beyond
an hour generally make turbulent statistics unrepresentative and the models they are based
on inaccurate [150]. This is especially true for MOST-based models because, at longer
averaging times, the assumptions of local stationarity are invalid [150, 228].
The third adjustment was to standardise the unit of measure between observed
concentrations and the bLS model estimates. $Q_{est}$ could not be measured during the
experiment due to the unavailability of direct measuring methods [55]. To compare model
estimates $(C/Q)$ with experimental data, the actual release rate of spores, $Q_{est}$, had to be
estimated in order to express the observed data as $C_m/Q_{est}$, where $C_m$ is the measured
concentration. These adjustments are detailed in sections 4.4.3.1-4.4.3.3.
4.4.3.1 Converting spore DNA to standardised spore concentration (#/𝒎𝟑)
First, spore DNA in picograms (described in chapter 3) was converted to spore numbers using
the measure 1 spore = 0.35 pg of DNA, as specified by the primer design used for qPCR
analysis [153]. These were then converted to #/m³ by dividing by the total air volume
sampled (a flow rate of 38 L min⁻¹ over the 5-hour sampling time). A more realistic measure
of the actual airborne concentration requires scaling air-sampled concentrations by the
efficiency of the collection device. The efficiency of Rotorods varies with the type and physical
characteristics of the spores being collected, and there are no recorded estimates of the
collection efficiency of Rotorod samplers for Sclerotinia spores. However, Aylor [244]
calculated an efficiency of 21% for Rotorods collecting V. inaequalis spores, and de Jong et
al. [50] reported that Sclerotinia ascospores and V. inaequalis spores have approximately the
same dimensions and would therefore have a similar sedimentation velocity, $v_s$, of
0.002 m s⁻¹. McCartney and Fitt [179] and Aylor [244] independently reported the
sedimentation velocities of Sclerotinia and V. inaequalis spores as 0.002 m s⁻¹, corroborating
de Jong et al. On this basis, concentrations were multiplied by a factor of 5 (≈ 1/0.21) to
account for sampling inefficiency.
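The unit conversion of this subsection is summarised in the sketch below (0.35 pg per spore, 38 L min⁻¹ for 5 h, then division by the 21% collection efficiency, which the text rounds to a factor of 5). The function name and keyword defaults are illustrative:

```python
def spores_per_m3(dna_pg, flow_l_per_min=38.0, hours=5.0,
                  pg_per_spore=0.35, efficiency=0.21):
    """Convert qPCR spore DNA (pg) to an airborne concentration (# m^-3),
    following section 4.4.3.1: spore count = DNA / 0.35 pg, divided by
    the total air volume sampled, then scaled by Rotorod collection
    efficiency (~21%).
    """
    spores = dna_pg / pg_per_spore
    volume_m3 = flow_l_per_min / 1000.0 * 60.0 * hours  # L/min -> m^3 total
    return spores / volume_m3 / efficiency
```

With the defaults, the sampled volume is 11.4 m³ per 5-hour run, so one picogram of DNA corresponds to roughly 1.2 spores m⁻³ after the efficiency correction.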
4.4.3.2 Adjusting spore concentration
Spore concentration was adjusted to a shorter averaging time based on the findings of
Clarkson et al. [19], Hartill [245], McCartney and Lacey [21], Bourdot et al. [157], Abawi and
Hunter [246], and Qandah [27], which associated specific environmental factors with diurnal
variations and peak distributions of Sclerotinia spore release. Most of these studies [157, 204,
27] found that spore emission peaked between 9am and 1pm, with Qandah [27] further
positing that approximately 85% of the total spores released in a day are emitted during this
period. During the experimental field trial, sporulation was observed on all days at the start
of sampling (11am), and temperature and relative humidity had similar diurnal variations on
all days. Based on these results, and the fact that glycerine-coated I-rods lose adhesion over
long sampling periods as spore and dust accumulation/overloading degrades spore retention
[247], the majority of field samples were assumed to have been collected within the first hour
of sampling (11am to 12pm). Consequently, the spore concentration was multiplied by a
factor of 0.85 to obtain the concentration after 2 hours. Further assuming a 60/40 split in
spore retention, the 2-hour concentration was multiplied by 0.6 to obtain the spore
concentration at 12:00pm on each day.
4.4.3.3 Estimating actual spore concentration (𝑸𝒆𝒔𝒕)
Due to the unavailability of direct and reliable methods to measure the rate of release of
spores (source strength), $Q_{est}$ was not measured during the field trial experiment. This is an
important variable when comparing concentration estimates, because comparisons are made
on the basis of source-scaled concentrations, $C/Q$.
Methods proposed by de Jong et al. [50] for calculating $Q_{est}$ from sclerotial density, $s$
(#/m²), area of sporulating apothecial disc, $A$ (mm²/sclerotium), and the rate of release of
ascospores per apothecium, $r$ (#/mm² of disc surface), were found unreliable because of the
large disparities in reported values of ascospore release [19, 159, 245]. This variability is due
to the sensitivity of ascospore release to the wide range of environmental conditions that
trigger it.
An inverse dispersion modelling approach was instead adopted to calculate the estimate of
$Q$, $Q_{est}$ [150, 163]:
$$Q_{est} = \frac{C_m - C_b}{(C/Q)_{model}} \qquad [4.51]$$
where $C_m$ and $C_b$ are the observed and background concentrations respectively, and
$(C/Q)_{model}$ is the normalised model estimate for the same sampling position as $C_m$. From
Eq. 4.51, only a few $C_m$ at selected positions and their corresponding model estimates are
required to estimate $Q_{est}$. To ensure the independence of $Q_{est}$ from the bLS model outputs,
so that using $Q_{est}$ to scale observations does not amount to minimising the mean squared
error between model predictions and observations [56], a forward LS model was used to
generate profiles of $(C/Q)_{model}$ at heights of 1.6, 2.4 and 3.2 m, corresponding to the
sampling heights at position D, where there was good mixing on all days of the field trial (see
Figure 3.2). This model had previously been used successfully to estimate source strength
from concentration profiles in grass and wheat canopies [55]. For more accurate estimates,
only sampling points above the canopy were used. Position B (see Figure 3.2), just 1 m from
the edge of the circular ring of spore sources, was also excluded from the $Q_{est}$ estimate to
ensure that the constituent plumes from the six separate sources had sufficiently mixed into
one. The forward LS model is given by [211, 216]:
$$(C/Q)_{model} = \sum_{m=1}^{M}\left[\frac{1}{N\,\Delta x\,\Delta y\,\Delta z}\sum_{k}^{K}\left[\frac{1}{u_k\!\left(x_{sens},\, z_{sens} \pm \tfrac{\Delta z_{sens}}{2}\right)}\right]\right] \qquad [4.52]$$
where $u_k$ is the horizontal velocity of an individual spore passing through a sensing volume
of height $\Delta z_{sens}$ located at $(x_{sens}, z_{sens})$; $\Delta x$, $\Delta y$ and $\Delta z$ (0.1, 0.1, 0.1 m) are the
dimensions of the sensor volume; and $M$ is the number of sources (M = 6). The sensing
volume was set to the volume of air sampled by the Rotorod samplers in one second. To
generate profiles of $(C/Q)_{model}$ at the heights corresponding to those of position D (see
Figure 3.2), Eq. 4.52 was evaluated with N = 150,000. These profiles were then used to scale
the observed concentrations at the respective heights using Eq. 4.51. The best estimate was
obtained by regressing the three $Q_{est}$ values on the observed concentrations, yielding
$Q_{est}$ = 218, 120 and 98 spores m⁻² s⁻¹ for the three days. These values are representative
of the high apothecial density in the inoculated sources, and the reduction in $Q_{est}$ over the
experimental period is consistent with a decline in apothecial release over its lifetime [181].
Note that inverse modelling (Eq. 4.51) simply scales observed concentration profiles by
turbulent effects to obtain the estimated actual source strength, $Q_{est}$.
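Equation 4.51 and the combination over the three sampling heights can be sketched as follows; a simple mean is used here in place of the regression described in the text, and the numerator is taken as the background-corrected measured concentration. Names are illustrative:

```python
def q_est(c_measured, c_background, c_over_q_model):
    """Single-height inverse-dispersion estimate (cf. Eq. 4.51), with the
    numerator taken as the background-corrected measured concentration.
    """
    return (c_measured - c_background) / c_over_q_model

def q_est_profile(cm_list, cb, cq_list):
    """Combine estimates from several sampling heights (position D used
    three); a plain mean stands in for the regression in the text.
    """
    vals = [q_est(cm, cb, cq) for cm, cq in zip(cm_list, cq_list)]
    return sum(vals) / len(vals)
```

The estimate scales linearly with the background-corrected concentration, which is why only a few well-mixed sampling positions are needed.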
Table 4.1: Table of model parameters.

Parameter    | Description                                      | Value
LAI          | Leaf Area Index                                  | 3.5
$L_v$        | Leaf width                                       | 0.035 m
$f_x, f_y, f_z$ | Projection of leaf area in 𝑥, 𝑦 & 𝑧 directions | 0.3, 0.3, 0.52
ℎ            | Canopy height                                    | 1 m
𝑑            | Displacement height                              | 0.75 m
$z_{rl}$     | Roughness sublayer height                        | 1.25 m
$v_s$        | Settling velocity                                | 0.002 m s⁻¹
𝑁            | Total number of simulation particles             | 150,000
𝜘            | LAD shape parameter 1                            | 3
ℊ            | LAD shape parameter 2                            | 4
$k_v$        | Von Karman constant                              | 0.4
𝛽            | Eulerian-Lagrangian coefficient                  | 1.5
𝑔            | Gravitational acceleration                       | 9.82 m s⁻²
∆𝑡           | Model time step                                  | 0.025𝜏
4.4.5 Assessing Model Performance
To assess the performance of the bLS model, the predictions were evaluated using
established dispersion model performance measures [248-250]: the geometric mean bias
(MG), geometric variance (VG), fractional bias (FB), normalised mean square error (NMSE),
and the fractions of predictions within factors of 2 and 5 of the observations (FAC2 and FAC5).
The correlation coefficient was not used because it can be misleading for short-range
dispersion predictions [251]. Assuming appropriate data inputs are used in the model, these
statistics taken as a whole are reliable indicators of an "acceptable" model, even where there
are comparatively few data samples. According to Chang and Hanna [250], an acceptable
model – defined as one good enough for "research-grade field experiments" – should satisfy:
FAC2 > 50%, |FB| < 0.3, 0.7 < MG < 1.3, NMSE < 1.5, and VG < 4. The performance
measures were calculated as follows [250]:
$$FB = \frac{\overline{C_o} - \overline{C_p}}{0.5\left(\overline{C_o} + \overline{C_p}\right)} \qquad [4.53]$$
$$MG = \exp\left(\overline{\ln C_o} - \overline{\ln C_p}\right) \qquad [4.54]$$
$$VG = \exp\left[\overline{\left(\ln C_o - \ln C_p\right)^2}\right] \qquad [4.55]$$
$$NMSE = \frac{\overline{\left(C_o - C_p\right)^2}}{\overline{C_o}\,\overline{C_p}} \qquad [4.56]$$
$$FAC2 = \text{fraction of data satisfying } 0.5 \le \frac{C_p}{C_o} \le 2 \qquad [4.57]$$
$$FAC5 = \text{fraction of data satisfying } 0.2 \le \frac{C_p}{C_o} \le 5 \qquad [4.58]$$
where 𝐶𝑜 and 𝐶𝑝 are observations and model predictions respectively and the overbars denote means. FB and MG are measures of mean bias and indicate systematic errors; VG and NMSE are measures of scatter and indicate both random and systematic errors; and FAC2 and FAC5 are robust measures of how close predictions are to observations [250]. Values of MG below 1 imply overprediction and values above 1 underprediction. Similarly, negative and positive values of FB (which is bounded between -2 and 2 for positive concentrations) indicate overprediction and underprediction respectively.
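The performance measures and the acceptability screen above can be sketched in a few lines of numpy; this is a minimal illustration of Eqs. 4.53 to 4.58, and the function names are purely illustrative.

```python
import numpy as np

def performance_measures(co, cp):
    """Dispersion-model performance statistics after Chang and Hanna.

    co, cp: observed and predicted concentrations (same length, strictly
    positive, since MG and VG take logarithms). FAC2/FAC5 are in percent.
    """
    co, cp = np.asarray(co, float), np.asarray(cp, float)
    fb = (co.mean() - cp.mean()) / (0.5 * (co.mean() + cp.mean()))
    mg = np.exp(np.mean(np.log(co)) - np.mean(np.log(cp)))
    vg = np.exp(np.mean((np.log(co) - np.log(cp)) ** 2))
    nmse = np.mean((co - cp) ** 2) / (co.mean() * cp.mean())
    ratio = cp / co
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0)) * 100
    fac5 = np.mean((ratio >= 0.2) & (ratio <= 5.0)) * 100
    return dict(FB=fb, MG=mg, VG=vg, NMSE=nmse, FAC2=fac2, FAC5=fac5)

def acceptable(stats):
    """The Chang and Hanna acceptability thresholds quoted in the text."""
    return (stats["FAC2"] > 50 and abs(stats["FB"]) < 0.3
            and 0.7 < stats["MG"] < 1.3 and stats["NMSE"] < 1.5
            and stats["VG"] < 4)
```

A quick check of the sign conventions: for predictions exactly double the observations, MG comes out as 0.5 and FB is negative, matching the overprediction interpretation above.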
4.5 Results
The model was evaluated with the parameter values shown in Table 4.1. Figures 4.2 and 4.3 show the model predictions and observations for the streamwise and crosswind sampling positions above and below the canopy respectively. Figures 4.4 and 4.5 show the normalised observations against model predictions for all sampling positions above the canopy and below canopy height respectively. In both cases, it is evident that the model overpredicts the observations considerably.
In Figure 4.2, the model predictions above the canopy appear to agree more with the
observations than those below the canopy. Above the canopy, the power law decay of spore
concentration with distance from the source appears to be preserved. Predictions are worse
near the source and seem to get better with downwind distance. The relatively poor
performance near the source is due to the treatment of six groups of sources as one. At close
distances from the source, errors resulting from the assumption that multiple identical sources
can be modelled as one can be amplified [150]. This close distance is of the order of 10z from the upwind edge of a source, equivalent to 16 m (z = 1.6 m) in this work, and 9 m from the downwind edge. Therefore, sampling position B
at 1m downwind is well within this distance. Further away, at approximately 14m, as the
plume mixes the predictions get slightly better.
Figure 4.2: Normalised observations (blue asterisks) versus normalised model predictions
(red circles) above (left panels) and below (right panels) the canopy for the downwind
sampling positions for all sampling days.
[Figure 4.2 panels: Days 1 to 3; left column above canopy (z = 1.6 m), right column below canopy height (z = 0.8 m). x-axis: downwind distance from centre of source (m), 0 to 30 m; y-axis: normalised concentration of spores (C/Q), log scale, 10⁻³ to 10⁰.]
Figure 4.3: Normalised observations (blue asterisks) versus normalised model predictions
(red circles) above (left panels) and below (right panels) the canopy for the crosswind
sampling positions for all sampling days.
[Figure 4.3 panels: Days 1 to 3; left column above canopy (z = 1.6 m), right column below canopy height (z = 0.8 m). x-axis: distance from plume centre (m), -15 to 15 m; y-axis: normalised concentration of spores (C/Q), log scale.]
Figure 4.4: Normalised observations versus normalised model predictions for all observed
concentrations above the canopy. The blue line is the 1:1 line
Figure 4.5: Normalised observations versus normalised model predictions for all observed
concentrations below the canopy. The blue line is the 1:1 line.
[Figure 4.4 axes: C/Q modelled (x) against C/Q observed (y), log scale, 10⁻³ to 10⁰; title: bLS predictions versus all observations above canopy.]
[Figure 4.5 axes: C/Q modelled (x) against C/Q observed (y), log scale, 10⁻³ to 10⁰; title: bLS predictions versus all observations below the canopy.]
The results for the sampling points below canopy height (Figure 4.2, right panels) indicate that this trend may be absent, owing to the chaotic nature of canopy transport. The poor near-source predictions seen above the canopy are no longer visible, although the higher concentration values in the canopy (approximately a 17-fold increase in the highest values) may have masked them. Notwithstanding, the general overprediction is more pronounced inside the canopy (approximately 2 and 16 times that above the canopy on the first two days). The fact that the model overpredicts more inside the canopy (see Table 4.2) could suggest it underestimates deposition. A further source of error below the canopy is discrepancies in the initialised release velocities. To estimate concentrations at the sampling positions with bLS, particles/spores were initially released from each position with velocities corresponding to the turbulent conditions at that location. Below the canopy, because the flow is more inhomogeneous, the turbulent statistics used for the release (𝜎𝑢,𝑣,𝑤(𝑧) and 𝑈(𝑧) for 𝑧 < ℎ) were more erroneous than those above it, further degrading the estimates below the canopy.
In Figure 4.3, which shows the observations and predictions at the crosswind sampling points, the above-canopy predictions are again better than the below-canopy predictions (see Table 4.2). Predictions made at the centre of the axis are better than those made on either side in both regions. On day 2, when there was a 12.8° misalignment between the central axis of the sampling grid and the mean wind direction, the predictions were comparatively worse, suggesting either the model’s sensitivity to misalignment or a mis-estimation of plume spread. Significant misalignments with the mean wind direction can increase the friction velocity, 𝑢∗, since ⟨𝑣′𝑤′⟩ will no longer be zero (𝑢∗² = (⟨𝑢′𝑤′⟩² + ⟨𝑣′𝑤′⟩²)¹ᐟ²), and the model could under- or overestimate plume spread by under- or overestimating 𝜎𝑣. Independent measurements of 𝜎𝑣 at z = 1.6 m showed that the model overestimated this quantity (𝜎𝑣,𝑀𝑂 ≈ 1.7𝑢∗ and 𝜎𝑣,𝑚𝑒𝑎𝑠 = 0.521 m s⁻¹ ≈ 1.41𝑢∗, where 𝜎𝑣,𝑀𝑂 and 𝜎𝑣,𝑚𝑒𝑎𝑠 are the MOST-estimated and measured lateral velocity standard deviations respectively). This means that the actual spread of the plume is less than the model assumes. Under such circumstances, predictions away from the centre will be erroneous, as indicated in Figure 4.3. Unfortunately, the unavailability of independent turbulence measurements below the canopy made it impossible to confirm whether this was also the case below canopy height.
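The effect described here can be put in numbers with a short sketch. The stress values below are illustrative only; the measured 𝜎𝑣 and the 1.41𝑢∗ and 1.7𝑢∗ scalings are the figures quoted above.

```python
# Sketch (illustrative stress values): a misalignment-induced <v'w'>
# inflates the friction velocity, and with it the MOST estimate
# sigma_v ~ 1.7 u*.

def friction_velocity(uw_cov, vw_cov):
    """u* from both kinematic stress components: u*^2 = (uw^2 + vw^2)^(1/2)."""
    return (uw_cov ** 2 + vw_cov ** 2) ** 0.25

# Aligned flow: only <u'w'> contributes.
u_star_aligned = friction_velocity(-0.12, 0.0)
# Misaligned flow: a non-zero <v'w'> raises u*.
u_star_misaligned = friction_velocity(-0.12, -0.05)

# Quantities quoted in the text at z = 1.6 m.
sigma_v_meas = 0.521              # measured lateral std dev (m/s), ~1.41 u*
u_star = sigma_v_meas / 1.41      # implied u*, ~0.37 m/s
sigma_v_most = 1.7 * u_star       # MOST estimate, ~0.63 m/s

print(u_star_misaligned > u_star_aligned)  # True
print(sigma_v_most > sigma_v_meas)         # True: spread is overestimated
```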
Figures 4.4 and 4.5 show the pervasiveness of overprediction by the model both above and
below the canopy as depicted by the majority of the points lying above the 1:1 line. However,
this overestimation cannot be confidently attributed to the model, since the observed data itself had to undergo a series of adjustments, as explained in section 4.4.3.2. These
adjustments may have affected the absolute concentration values. Also, Rotorod samplers
have a tendency to underestimate aerial spore concentrations when they decelerate from
their calibrated rpm values during sampling [158]. These factors, in addition to corrections
for efficiency and the scaling of observed values with model-estimated source strength (𝑄𝑒𝑠𝑡)
make it unrealistic to assess model performance solely on agreement with absolute observed concentration values [158]. The degree of agreement between model and observations is therefore difficult to judge without statistical performance measures.
Table 4.2: Calculated model performance measures for different observation groups (above or below canopy height). Number of observations is shown in square brackets.

Observation Group                  FB      MG      VG     NMSE    FAC2 (%)   FAC5 (%)
Above canopy (all) [27]           -0.8    0.51    1.59    2.69      46         95.85
Below canopy (all) [27]           -0.55   0.395   2.37    1.56      37.5       79.2
Above canopy (downwind) [15]      -0.69   0.63    1.73    1.75      67        100
Above canopy (crosswind) [15]     -0.88   0.39    2.27    4.1       40         93
Below canopy (crosswind) [15]     -0.85   0.4     6.81    1.74      26         73
Table 4.2 shows the performance statistics computed for all predictions divided into five groups, with the number of observations for each calculation shown in square brackets. The statistics confirm some of the initial observations: the model generally overpredicts more inside the canopy than outside, by factors of approximately 2.5 (MG = 0.395, VG = 2.37) and 2 (MG = 0.51, VG = 1.59) respectively, and it is more accurate above the canopy than below (FAC2 is higher above the canopy). FAC2 is the more robust statistic because it is comparatively resistant to outliers. By contrast, NMSE and FB can be strongly influenced by high outliers, as is evident in Table 4.2 from their low values for below-canopy predictions, where concentrations are highest.
Even though the model has not met the acceptance threshold laid out by Chang and Hanna, these statistics give clear evidence of which groups it predicts better. Model performance above the canopy is better overall, and the statistics get very close to the acceptability threshold when the crosswind observations are excluded (see the above canopy (downwind) statistics in Table 4.2). This is significant because the intended final application of this model is in the back-trajectory tracking of sources from an above-canopy sensor.
Above the canopy, predictions are worse for the crosswind observations, as seen when crosswind and downwind predictions above the canopy are compared. This is due to the mis-estimation of the lateral velocity fluctuation, 𝜎𝑣, as explained earlier. For the crosswind observations, the model overpredicts by approximately the same amount inside and outside the canopy, as the MG and FB values are almost identical. However, all the other metrics show that the above-canopy crosswind predictions are better than those below the canopy. The only exception is NMSE, which, as stated earlier, tends to be affected by the high outlying values inside the canopy and thus underestimates the mean square error there. This is evident when it is considered that VG, which like NMSE is a measure of scatter but is less affected by outliers and therefore more representative, shows there is more randomness inside the canopy (VG = 6.81).
Attempts were made to improve performance by tuning the Lagrangian timescale. The Lagrangian timescale is a major source of error in the implementation of LS models because of its dependence on the turbulent kinetic energy dissipation rate and its influence on the delicate model time step. Wilson and Flesch [242] found that most errors resulting from violation of the well-mixed condition in discrete LS models were attributable to the model time step, which is in turn dependent on the Lagrangian timescale (∆𝑡 ≈ 0.025𝑇𝐿). In the presence of canopies, the turbulent kinetic energy is even more dissipative and erratic, making 𝑇𝐿 more difficult to specify. Aylor and Flesch [55] reported better results using a premultiplier of 0.4 (in Eq. 4.26) instead of 0.5 for 𝑇𝐿. In this work, however, that premultiplier produced poorer bLS predictions, as did a premultiplier of 0.6. Any change to Eq. 4.26 produced considerably worse predictions, possibly due to a violation of the well-mixed condition as a result of the altered time step [242]. The results shown are therefore based on the 𝑇𝐿 formulation in Eq. 4.26, which has found good success over a wide range of Project Prairie Grass observations [225, 226].
4.6 Discussion
4.6.1 bLS Model Performance
This study has parametrised and implemented a bLS model that can estimate the
concentration footprint of naturally-released ground-level Sclerotinia spores at receptor
positions above an OSR canopy. The model used minimal turbulent instrumentation, utilising
MOST and empirical parametrisation of canopy turbulence to describe surface layer and
canopy turbulence.
The model gave better estimates above the canopy due to the more homogeneous surface-layer flow, and this indicates that the samplers, which were deployed at 1.6 m, were beyond the influence of the roughness sublayer under these conditions (z𝑟𝑙 = 1.25ℎ = 1.25 m). Below the canopy, the complexities of deposition, low wind speeds, high turbulence intensities, and a combination of Gaussian (at heights below 210 mm) and non-Gaussian velocity PDFs through the rest of the canopy [252] affect the quality of the estimates [53]. With regard to model estimates in the lateral direction, this implementation of bLS performed less satisfactorily. This is attributable to the challenges of modelling crosswind effects, which can be very sensitive to wind direction, as shown by the higher error in estimating 𝜎𝑣 on Day 2, when the misalignment of the streamwise wind with the sampling axis was greatest. In this work, these effects tend to be magnified below the canopy because all turbulent characterisation is based on the friction velocity, 𝑢∗, as a result of the MOST parameterisation. Below the canopy, this error is compounded by errors introduced by the experimental parametrisation of the turbulence field (Eqs. 4.37-4.43). Markkanen et al. [253] have shown that MOST-LS and MOST-bLS models tend to suffer greater accuracy deterioration than Large Eddy Simulation (LES) coupled LS models when estimating the crosswind concentration footprint.
Generally, the model overestimated concentrations in both regions. This is partly attributable to a smaller-than-assumed spore source area. The concentration at each receptor was calculated from a catalogue of touchdown velocities of particles landing in any one of six 1 m² areas. The size of these squares was based on approximate measurements of the ground area covered by apothecia. Due to the non-compactness of sclerotia and allowances for the irregular shape of the source area at the vertices of the squares, the actual source area is smaller than assumed. Consequently, trajectories touching down outside the actual source might have been included in the concentration estimation. Another likely source of error is the adjustment of spore concentration to synchronise the sampling time with the averaging time of the turbulence statistics. It was estimated, based on diurnal spore release variation and the deteriorating retention of spores by the Rotorods, that 51% of the total daily spores collected were collected in the first hour. This could easily have resulted in an overestimation or underestimation of the actual measured spore concentration, depending on whether the assumed diurnal release variation is higher or lower than the actual release pattern; the effect of this adjustment on model results is therefore unclear. Further, the characterisation of in-canopy turbulence was only an estimate, as approximate values based on past experiments in similar canopies were selected to represent the varying turbulence through the canopy. Turbulence in canopies is so complicated that even attempts to directly calculate the turbulent statistics of the flow field (e.g. [56]) may not accurately reproduce a turbulent flow that is a product of dominant length scales which change with the dissipation of turbulent kinetic energy [161]. Most errors in LS model implementations result from inadequate characterisations of canopy turbulence. These errors are related to the conventional LS model’s inadequate simulation of the turbulence kinetic energy (TKE) dissipation rate [210]. This limitation of conventional LS models is responsible for the recent rise of coupled approaches [56, 60, 212] that attempt to solve directly for TKE using higher-order closure schemes [59].
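The "catalogue of touchdowns" calculation, and the sensitivity of the result to the assumed source area, can be sketched as follows. This assumes the standard bLS touchdown estimator of Flesch et al. (C/Q as the mean of |2/w₀| over trajectories landing inside the source); the touchdown data below are entirely synthetic and the layout of the six squares is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic touchdown catalogue for N backward trajectories released
# from a receptor: landing coordinates (m) and vertical touchdown
# velocities (m/s, negative = downward).
N = 150_000                                   # particles, as in Table 4.1
x = rng.uniform(-20.0, 20.0, N)
y = rng.uniform(-20.0, 20.0, N)
w0 = -rng.uniform(0.1, 0.5, N)

# Six 1 m x 1 m source squares (lower-left corners), echoing the layout.
SQUARES = [(2.0 * i, 0.0) for i in range(6)]

def cq_estimate(x, y, w0, squares, side=1.0):
    """C/Q at the receptor: mean of |2/w0| over touchdowns inside the
    assumed source squares (standard bLS estimator)."""
    cq = 0.0
    for x0, y0 in squares:
        inside = (x >= x0) & (x < x0 + side) & (y >= y0) & (y < y0 + side)
        cq += np.sum(np.abs(2.0 / w0[inside]))
    return cq / len(w0)

# Shrinking the assumed squares lowers C/Q: treating the source as larger
# than it really is inflates the estimate, as discussed above.
print(cq_estimate(x, y, w0, SQUARES) > cq_estimate(x, y, w0, SQUARES, side=0.5))
```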
To assess the performance of this model, some similar applications of LS models have been identified. The two most relevant are Gleicher et al. [56] and Aylor and Flesch [55], because both evaluate the performance of LS models for spore concentration estimation in crop canopies against experimental data. The application in this work is still unique because it attempts to model naturally released ground-level spores in an OSR canopy. The type of canopy is significant because Gleicher et al. and Aylor and Flesch, like most of the research in this area [34] [57, 207] [88], carried out their implementations on data from wheat and corn canopies, where detailed canopy features and turbulence attributes have been amassed over the years due to a higher interest in these cash crops [254]. Another difference is that bLS, not forward LS, is used in this work. Backward and forward LS models calculate concentration footprints differently: using the vertical velocity components of spores that land in an originating source area (bLS), versus the horizontal velocity components of spores passing through a sensor volume (fLS), can produce completely different outcomes [255]. Notwithstanding these differences in the applications, the works mentioned are a basis for comparison.
Gleicher et al.’s [56] work applied a 3D Eulerian-coupled LS model to investigate Lycopodium spore dispersal in a maize canopy. In their approach, they used Wilson and Shaw’s [256] 2nd order closure model to iteratively calculate canopy turbulence parameters rather than rely on an empirical parameterisation based on generalised canopy turbulence. Their model’s performance metrics were generally better than those of the bLS model implemented in this work (based on the FAC2 statistic). It is worth noting, however, that their experiment was on a smaller scale, with the farthest group of receptors (Rotorods) only 8 m from the source. Due to the larger scale of this work, the computed performance measures are degraded by the error associated with estimating concentrations at more distant in-canopy locations. Another consideration is that the Gleicher et al. study used artificially released spores with a uniform release rate from sources above the ground, bypassing the complexities of spore release, particularly its varying rates and velocities [26]. Roper et al. [26] have demonstrated that naturally released fungal spores interact with the surrounding air in complex ways and maximise opportunities to be released in groups rather than individually. This is not accounted for in current implementations of LS and will affect model performance. Nevertheless, the results in this work show good agreement when above-canopy downwind performance is compared (FAC2 = 67% and 71% for this work and Gleicher et al. respectively). This is encouraging considering that, based on performance measures, Gleicher et al.’s model performed better than most air dispersion model applications [47] [56]. Wilson et al. [257] define high performance as a FAC2 of 56%.
Aylor and Flesch’s [55] implementation is one of the more successful applications of LS models in crop canopies; they estimated concentration profiles (vertical concentration with height) of Lycopodium and V. inaequalis spores from wheat and grass canopies respectively and achieved good agreement with observed data. Their work, like this one, was based on MOST-LS, and canopy turbulence statistics were similarly parameterised from experimental data. However, where they relied on comprehensive measured canopy attributes, this work had to rely on estimates (e.g. the LAD profile) and random sampling (e.g. the estimation of 𝐿𝑣). Further, Aylor and Flesch’s work was carried out on an even smaller scale than Gleicher et al.’s, as their primary aim was to estimate release rates. The results of this work agree with Aylor and Flesch’s in that both confirm the increased accuracy of LS estimates above the canopy. Aylor and Flesch expressed lower confidence in their in-canopy predictions, attributing this to the low flight of spores relative to the sampling heights inside the canopy. This also appears to be a contributory source of error in this work, as it is a direct manifestation of canopy turbulence and deposition.
The incorporation of empirical techniques into current wind dispersal strategies, to address limitations of scale and the ad-hoc nature of current phenomenological methods, has been identified as an important research goal [258]. One way of achieving this is through large-scale data collection from an optimally deployed network of sensors. Methods for optimal deployment of sensors are already in use for environmental and health monitoring, based on various underlying statistical models and concentration profiles [142] [259] [260]. These should be extendible to an LS-model-generated concentration profile, where spatiotemporal fluctuations are used to optimise sampling and monitoring strategies [258]. The first step is to validate canopy-capable models in terms of their ability to generate spatial gradients. These spatial profiles can then be used to implement better sampling strategies that can mitigate the current limitations of spore traps and samplers [10]. The evaluation of bLS in estimating above-canopy concentrations presented here is from this point of view: assessing its potential to estimate spatial profiles of Sclerotinia spores, and it therefore addresses this identified research need. With a FAC2 of 46% above the canopy (within 4% of the acceptable standard and with potential for improvement - see section 4.6.2), the bLS model appears suited to this task based on the limited dataset evaluated.
Further research in this specific area should focus on bridging the gap between the “cash crops of interest” and crops like OSR in terms of the ready availability of accurate canopy attributes. Despite advances in measurement techniques in the past decade, such as LIDAR and differential spectroscopy, which allow accurate measurement of canopy variables (e.g. LAD) at very fine resolutions [258], their use is restricted to a select number of crops, specifically wheat and maize (corn), due to a higher interest in these crops by pathologists and growers. When these measurement advances are leveraged for OSR, recent methodologies in canopy flow parametrisation, such as 𝑘 − 𝜖 theory [56, 60], increasingly powerful Large Eddy Simulations (LES) [62], and the log-normal velocity-dissipation approach [210], which characterise canopies more reliably based on solutions to the 2nd order closure model [256], can be utilised to achieve higher accuracy. In turn, these gains can further the goal of incorporating and optimising empirical techniques in spore dispersal modelling and eventually regional or even global disease prediction.
4.6.2 Limitations of Experiment
There are a number of limitations to the field trial experiment that have an effect on model
evaluation and they are discussed below.
1. Canopy Attributes and In-Canopy Turbulence Measurements: The main limitations in this work pertain to the unavailability of detailed canopy attributes and the lack of turbulence statistics within the canopy. More accurate methods of canopy turbulence parametrisation, such as those based on 2nd order closure estimation of TKE, could not be used because they require reliable information on foliage density to compute the drag coefficient [37], a key input into the closure scheme. Because the solution to 2nd order closure models is often derived iteratively, errors in the LAD profile are likely to propagate and magnify. Further, independent measurements below the canopy were not available to assess the likely error resulting from the use of an approximate LAD. For these reasons, experimental parametrisation was preferred in this work. Considering that turbulence mischaracterisation is the major source of error in LS models, addressing this limitation would significantly improve the model’s performance.
This limitation also tends to restrict points of comparison between this work and others to concentration estimation alone. Intermediate assessment measures, in the form of varied canopy parameterisation methodologies [61, 62, 161, 212, 234, 235], could not be compared to this work because this implementation, through the use of experimental parametrisation, essentially made a black box of that component of the model.
This limitation could have been mitigated or eliminated by mobilising more sonic anemometers and utilising sophisticated canopy measurement techniques (e.g. using LIDAR for foliage density measurement). However, given the circumstances that gave rise to the field experiment and the subsequent rapid modification of plans (see Appendix 1), this was not possible. The experiment had to rely on equipment that was available to Rothamsted Research at the time.
2. Insufficient Data Samples: Another limitation concerns the unavailability of sufficient data, in terms of both scale and number of sampling points. Insufficient data can bias results to fit unrepresentative samples, resulting in greater variance and decreased confidence in conclusions. However, the model performance metrics used in this work are considered robust to small sample sizes [249] and remain the standard for validating air dispersion models (e.g. [47] [261]), where typically few validation samples are available [262]. Nevertheless, further evaluation of the bLS model at a larger scale or with a higher number of sampling points is advised. With respect to increasing the scale of the experiment, the very limitations of the current semi-manual methods this work is trying to address make significantly increasing the scale very difficult. As seen in chapter 3 and supported by Heard [13], the most reliable identification and quantification techniques are currently manual and time-consuming in a way that makes large-scale data collection impractical.
4.7 Conclusions
In this chapter, a bLS model describing the transport of spores in an OSR canopy has been successfully implemented. The rationale for choosing an LS model was its ability to naturally mimic particle dispersion in a turbulent atmosphere and its amenability to modifications for coping with complex environments. The backward-time Lagrangian implementation of LS was found attractive because it requires minimal source information and can thus estimate footprints on a conceptual basis. The capability of bLS to compute only the trajectories of interest can be used to obtain probabilistic estimates of the likely travel distances of spores under the influence of canopy effects, which will help inform deployment decisions for monitoring equipment or sensors.
The bLS model presented is simple, requiring only a few surface measurements to characterise turbulence, a luxury afforded by MOST. The results suggest bLS models are capable of estimating the concentration of Sclerotinia spores leaving an OSR canopy at sampling points deployed above the canopy and downwind of a source. Numerous likely sources of error that may have decreased estimation accuracy have been identified and discussed. Limitations in the data also prevented the use of more accurate canopy parameterisation schemes that would have improved model performance. It is concluded that correcting for these errors and mitigating these limitations, particularly through the availability of high-quality attributes of OSR canopies, will improve the bLS model to acceptable standards for its intended use: reliable estimation of above-canopy spores travelling from a ground-level, below-canopy source. The model was assessed to have a FAC2 of 46%. Considering that mischaracterisation of turbulence can result in large modelling errors, it is very likely that the bLS model presented can meet the acceptable standard of FAC2 > 50%.
Chapter 5 An Integrated Fault Detection, Identification
and Reconstruction Scheme for Agricultural Systems
The previous chapters have discussed the dispersion of Sclerotinia spores on a local scale;
the near field (𝑡 < 𝑇𝐿) where spore transport is best described by an LS model [48, 54]. But
when spores escape their local canopy in sufficient numbers, they can constitute a long
distance threat to crops several kilometres away [154]. This dispersion mode is best described
by a Gaussian Plume Model [47] [147].
This chapter proposes a novel approach that is derived from several disciplines to enable the
efficient exploitation of large-scale agricultural data collected by deploying biosensors in a
network. The proposed method is an augmented monitoring procedure based on multivariate
statistical process control (MSPC) techniques that is expected to address data integrity issues
that may exist in a network, by detecting, identifying and reconstructing faulty or missing
data.
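As a preview of the approach developed in section 5.2, the detect step of such a scheme can be sketched with a PCA model of normal network data and a squared-prediction-error (SPE) alarm. Everything below is illustrative: the data are synthetic, the network size, retained components, and alarm threshold are assumptions, not the SYIELD configuration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: 500 snapshots from 8 correlated "sensors" under normal
# operation (one shared latent factor plus measurement noise).
latent = rng.normal(size=(500, 1))
X = latent @ rng.normal(size=(1, 8)) + 0.1 * rng.normal(size=(500, 8))
mu = X.mean(axis=0)

# PCA via SVD of the mean-centred data; retain A components.
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
A = 2
P = Vt[:A].T                                 # loadings, k x A

def spe(sample):
    """Squared prediction error (Q statistic) of one observation."""
    xc = sample - mu
    resid = xc - P @ (P.T @ xc)
    return float(resid @ resid)

# Empirical alarm limit from training data (99th percentile, illustrative).
limit = np.percentile([spe(row) for row in X], 99)

faulty = X[0].copy()
faulty[3] += 10.0                            # inject a bias fault on sensor 3
print(spe(faulty) > limit)                   # True: the fault is flagged
```

Identification and reconstruction build on the same projection: the residual pattern points to the offending sensor, and the PCA model's prediction of that sensor from the others supplies a replacement value.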
Due to the similarities between the dispersion of other particulates, such as particulate matter (PM10), and spore dispersion with respect to aerodynamic characteristics and plume distribution [147], and the unavailability of spore dispersion data, a pollution monitoring dataset was used to demonstrate the efficacy of the proposed method. Pollution monitoring networks are very similar to the potential biosensor network and are expected to be vulnerable to the same challenges, such as mechanical failure, adverse environments, and theft and vandalism of sampling equipment. There are, however, some important differences which have to be accounted for. The main one is the reliability of the measuring instruments. At this stage of their technological development, as demonstrated in chapter 3, biosensors are less reliable than PM10 sensors and other meteorological sensors. Even the best biosensors, such as those used in healthcare (e.g. glucose sensors), are more susceptible to errors than conventional sensors due to imperfections in the synergy between biological reactions and electrochemistry. In addition, meteorological and PM10 networks have their own data validation strategies, which are usually robust. It is therefore expected that the potential biosensor networks will be more prone to errors and in greater need of data validation.
5.1 Motivation
The aim of the SYIELD project was to revolutionise agricultural disease prediction by deploying a novel biosensor network able to sample large-scale Sclerotinia spore data efficiently. This network would comprise numerous spore-measuring biosensors spanning a large area, with each observation of the collected data being spatio-temporal and large. Maintaining a sensor network that is exposed to the external environment and made up of several components with finite reliabilities is a complex challenge. As a result of this complexity, data integrity concerns arise regarding the reliability of observations, severe missing data due to mechanical failure, theft and vandalism, and the robustness of the entire system to false positives due to the suboptimal specificity of the biosensing process². These challenges cannot be addressed with the state-of-the-art agricultural data collection methods currently available.
As a consequence of Tobler’s first law of geography, which states that “everything is related to everything else, but near things are more related than distant things”, efficiently sampled spore data are spatially correlated and can be treated as a collection of multivariate observations. Consequently, this study proposes a novel application of MSPC to detect, identify and reconstruct potential errors in the data. MSPC is a model-based set of multivariate statistical analysis tools that has been successful in monitoring industrial processes, which are typically more reliable than the comparatively rudimentary biosensors considered in this work. The proposed incorporation of MSPC into agricultural data collection has the potential to automate the monitoring of crops and marks a migration from the manual, time-consuming methods currently used [8, 263]. It is hoped that the proposed approach will extend MSPC techniques such that the success achieved in industrial processes can be realised in the agricultural industry.
The method developed in this chapter is tested on pollution data sourced from the London Air Quality Network (LAQN). As mentioned earlier, at distances far from the source, airborne spores can be described by a Gaussian distribution [47] [147] and are as such dispersed in a similar manner to pollution data. The
data is spatially and temporally correlated and can be modelled and analysed as highly
correlated variables³.
² As explained in chapter 3, the biosensing process is based on the proxy detection of oxalic
acid, which is a pathogenicity factor of Sclerotinia spores as well as other fungi. This
unfortunately means that false positives can arise from the detection of any of these
masquerades.
Additionally, particulate matter of size less than 10 μm (PM10) is
aerodynamically and physically similar to Sclerotinia spores, which have diameters ranging
from 12 𝜇𝑚 to 14 𝜇𝑚. Moreover, biosensors are expected to suffer from the same deficiencies
as pollution monitors, such as mechanical failure that will result in missing data. The decision
to use pollution data as a surrogate for agricultural data in demonstrating the potential
effectiveness of PCA on spore data was based on these reasons.
The next section introduces the background theory, detailing the components of MSPC
employed in this work.
5.2 Background Theory
This section presents the theoretical foundation of main MSPC components as typically
applied in process control.
5.2.1 Principal Components Analysis (PCA)
PCA is a statistical transformation method that allows extraction of information from
correlated and high dimensional variables into new, orthogonal (uncorrelated) variables called
Principal Components or PCs [120]. These PCs are formed in such a way that the dominant
information, as represented by the largest direction of data variance, is contained in the first
PC followed by the second and so on. For a mean-centred dataset X (n × k) with row vectors
𝒙ᵢᵀ, a maximum of A (A ≤ min{n, k}) PCs can be formed as the product of two matrices, T (n × A)
and Pᵀ (A × k). When a PCA model is formed, the objective is usually to reduce data
dimensionality by having all the essential information in 𝐴 < 𝑘 PCs and discarding the
remainder. These retained PCs define the structured part of X and the 𝐴 + 1: 𝑘 discarded PCs
constitute the unexplained part of the original data and are defined as the model residuals,
E. E is the residual matrix that represents the deviations between variables and their
projections (predictions) in the PC space. The corresponding principal component model is
given by:
X = TPᵀ + E    [5.1]
and for any sample/observation, n,
𝒙ₙ = Σ_{i=1}^{A} tᵢ 𝒑ᵢᵀ + 𝒆ₙ    [5.2]
³ It is worth noting that the actual biosensor data (spore data) may be more correlated than
this pollution data, given that air quality monitors were deployed near population centres and
multiple line sources. The multiple sources introduce a local influence on nearby monitors,
reducing the overall data correlation compared to single-sourced data.
where the scores contained in 𝒕ᵢ (columns of T) are the projections of the samples of X in the PC
subspace, and the loadings in 𝒑ᵢᵀ (columns of P) represent the contribution of each variable in X
to the PCs retained in the model. PCA therefore transforms original data into an orthogonal
data subspace (PC subspace) and a residual subspace.
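As an aside, the decomposition in Eqs. 5.1 and 5.2 can be computed via the SVD (the approach adopted later in section 5.3.2.1). The following is an illustrative numpy sketch only; the function name `pca` and the interface are mine, not the implementation used in this work:

```python
import numpy as np

def pca(X, A):
    """Decompose a data matrix X (n x k) into A principal components.

    Returns scores T (n x A), loadings P (k x A) and residuals E so that
    the mean-centred data satisfies X = T P' + E (Eq. 5.1).
    """
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:A].T                                 # loadings: top A right singular vectors
    T = Xc @ P                                   # scores: projections onto the PCs
    E = Xc - T @ P.T                             # residuals: unexplained part of X
    return T, P, E
```

With A = k the residual matrix E vanishes; with A < k, E carries the discarded components.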
5.2.1.1 Cross-validation
Cross-validation is the method used to select model order in PC models. Methods for selecting
PCs range from the ad-hoc, which relies on the percentage of explained variance [264], to
selecting an optimal number of PCs using cross-validation methods [265, 266]. When using
the percentage of explained variance approach, the following formula is used:
ExVar = Σ_{i=1}^{A} λᵢ / Σ_{i=1}^{k} λᵢ    [5.3]
where 𝜆𝑖 is the variance of the 𝑖th component. The explained variance is normally expressed
as a percentage and PCs are added to the model until their addition does not result in a
meaningful increase in 𝐸𝑥𝑉𝑎𝑟.
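For illustration, Eq. 5.3 amounts to a ratio of eigenvalue sums, where the eigenvalues of the covariance matrix are obtained from the singular values of the mean-centred data. A minimal sketch (the function name is mine):

```python
import numpy as np

def explained_variance(X, A):
    """Fraction of total variance captured by the first A PCs (Eq. 5.3).

    The eigenvalues of cov(X) are the squared singular values of the
    mean-centred data divided by n - 1.
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    lam = s ** 2 / (len(X) - 1)      # eigenvalues in descending order
    return float(lam[:A].sum() / lam.sum())
```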
More reliable cross-validation methods are based on the Predicted Residual Sum of Squares
(PRESS), computed as:

PRESS(A) = Σ_{i=1}^{cv} Σ_{j=1}^{n} Σ_{l=1}^{m} (x_{ij,l} − x̂_{j,l})²    [5.4]
where PRESS(A) is the prediction error for A (A = 1, 2, …, A_max) components in the model, x_{ij,l}
and x̂_{j,l} are the observed and predicted jl-th elements of the i-th subgroup Xᵢ and its estimate
X̂ᵢ respectively, and cv is the number of subgroups. The number of components A that
minimises PRESS(A) is then chosen as the model order.
5.2.2 Multivariate Statistical Process Control (MSPC)
This section introduces the process control and chemometrics suite of tools known as MSPC
[264, 267-271]. MSPC is a set of statistical tools based on PCA and Partial Least Squares
(PLS) [121, 122] that has found success in industrial applications. The methods have seen
wide application in online control and batch process monitoring [272-275]. MSPC is attractive
because it enables online monitoring of multivariate processes and is flexible enough to
incorporate methods for handling missing data. MSPC techniques typically use the Hotelling
T2 and Squared Prediction Error (SPE) control limits [264, 269, 276, 277] to monitor deviation
of process variables from optimal operation. The limits are usually computed under the
assumption that the data are drawn from an independent and identically distributed (i.i.d.)
Gaussian distribution. This optimal process performance is assumed to be contained in the
scores of an underlying PCA model built from healthy process data.
The monitoring aspect of MSPC is of particular interest in this work. This study intends to
apply these methods to monitor the data integrity of a biosensor network by detecting and
identifying false measurements even in the presence of missing data.
5.2.2.1 Process monitoring
Consider a multivariate process represented by the data matrix 𝑿 whose PCA model has been
described as:
𝑿 = 𝑻𝑷𝑇 + 𝑬
Assuming an effective cross-validation procedure, the modelled part 𝑻𝑷ᵀ will ideally contain all
the relevant information in the data and will therefore be fully representative, i.e. an abnormality
in 𝑿 will manifest as an abnormality in 𝑻𝑷ᵀ. If 𝑿 is made up of the normal operation data
(minus outliers or systematic errors), 𝑻𝑷𝑇 will represent the ideal state of the system in the
PC space.
Any new observation 𝒙_new, scaled using the previously calculated mean and standard
deviation, can be interrogated against this optimal behaviour using the smaller set of
pseudo-variables, 𝒕_new = 𝑷ᵀ𝒙_new, to identify potential differences. The thresholds used to
determine whether this deviation is significant are the Hotelling T² and the Squared Prediction Error (SPE).
Hotelling 𝑻𝟐 chart
The Hotelling 𝑇2 statistic [123] has been a useful feature in multivariate analysis for a long
time. It is based on the generalised distance of observations from their mean or the
Mahalanobis distance [278]. It can thus detect outliers, mean shifts and distributional
deviations from an optimal distribution in multivariate processes [125]. In the PC space, the
Hotelling 𝑇2 statistic for each sample interrogated against a PCA model of order 𝐴 is given by
[264]:
T² = Σ_{i=1}^{A} tᵢ²/λᵢ    [5.5]
where tᵢ is the i-th element of the score vector 𝒕 and λᵢ is the corresponding eigenvalue. The
PCA control limit under the multivariate normality assumption becomes:
CL_{T²} = [A(n + 1)(n − 1) / (n² − nA)] F_{(α, A, n−A)}    [5.6]
where α is the significance level and F_{(α, A, n−A)} is the upper α quantile of an F-distribution
with A and n − A degrees of freedom. It is worth noting that T² now only represents mean
shifts and deviations inside the PC subspace (the PCA model with A components) [117, 279].
New observations 𝒙_new can then be used to calculate T²_new by projecting them onto the PC
subspace using a matrix form of Eq. 5.5:

T²_new = 𝒙_newᵀ 𝑷 𝚲⁻¹ 𝑷ᵀ 𝒙_new    [5.7]
where 𝚲 = diag(λᵢ) is the A × A diagonal matrix of eigenvalues. Whenever T²_new > CL_{T²}, the
sample is assumed to be out of control.
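The two forms of the statistic (Eqs. 5.5 and 5.7) are equivalent and require only the loadings and retained eigenvalues. An illustrative sketch (the function name is mine):

```python
import numpy as np

def hotelling_t2(x_new, P, lam):
    """Hotelling T2 of a scaled observation against a PCA model (Eqs. 5.5, 5.7).

    P   : k x A loading matrix of the retained components
    lam : length-A eigenvalues of the retained components
    """
    t = P.T @ x_new                     # t_new: score vector of the observation
    return float(np.sum(t ** 2 / lam))  # equals x' P diag(lam)^-1 P' x
```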
Squared Prediction Error (SPE) chart
The SPE chart detects faults and errors that do not lie in the PC subspace. These errors are
undetectable by the 𝑇2 statistic [117]. SPE therefore assesses the errors that lie in the
residual subspace not represented by the first A components of the PCA model, i.e. 𝑬. This
residual subspace can be thought of as orthogonal to the hyper-plane containing the principal
components [117, 280]. Geometrically, the SPE is the squared norm of the difference between the
sample observation vector, 𝒙, and its projection in the PC subspace, 𝒙̂ [117]:

SPE = ‖𝒙 − 𝒙̂‖²    [5.8]
    = ‖𝒙 − 𝑷𝑷ᵀ𝒙‖²
    = ‖(𝑰 − 𝑷𝑷ᵀ)𝒙‖²
    = ‖𝑪̃𝒙‖²    [5.9]

where 𝑪̃ = 𝑰 − 𝑷𝑷ᵀ is a projection matrix that represents the transformation of 𝒙 onto the orthogonal
residual subspace. By this definition, SPE is a measure of the PCA model fit to the original
data. A good model will have a high projection of 𝒙 in the PC subspace and a smaller one in
the residual subspace. An out of control state occurs when the SPE is higher than a threshold
value [113, 118]:
𝑆𝑃𝐸 > 𝛿𝛼 [5.10]
As with T², a control limit for SPE is also usually calculated under assumptions of multivariate
normality. Jackson and Mudholkar [281] provide the following formula:

δ_α = θ₁ [ c_α √(2θ₂h₀²)/θ₁ + θ₂h₀(h₀ − 1)/θ₁² + 1 ]^{1/h₀}    [5.11]

where c_α is the (1 − α)th quantile of the standard normal distribution, and

h₀ = 1 − 2θ₁θ₃/(3θ₂²)
θⱼ = Σ_{i=A+1}^{k} λᵢʲ,   j = 1, 2, 3.
Alternatively, a weighted chi-square distribution approach can be used [279]. SPE is
alternatively called the Q-statistic when computed under these normality assumptions.
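The SPE statistic and the Jackson–Mudholkar limit can be sketched as below. This is an illustrative sketch under the stated normality assumptions, with the θⱼ formed from the eigenvalues of the discarded components; the function names are mine:

```python
import numpy as np
from statistics import NormalDist

def spe(x, P):
    """Squared prediction error of an observation x against loadings P (Eqs. 5.8-5.9)."""
    x_hat = P @ (P.T @ x)               # projection of x onto the PC subspace
    return float(np.sum((x - x_hat) ** 2))

def spe_limit(lam_residual, alpha=0.05):
    """Jackson-Mudholkar control limit delta_alpha (Eq. 5.11).

    lam_residual : eigenvalues of the discarded components, i = A+1 .. k
    """
    th1, th2, th3 = (float(np.sum(lam_residual ** j)) for j in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)   # (1 - alpha) normal quantile
    return th1 * (c_alpha * np.sqrt(2.0 * th2 * h0 ** 2) / th1
                  + th2 * h0 * (h0 - 1.0) / th1 ** 2 + 1.0) ** (1.0 / h0)
```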
Contribution Plots
Contribution plots [277, 282-284] show the contribution of all variables to all score vectors,
thus identifying variables that breach limits. When a change or fault causes a breach of 𝑇2
and/or SPE control limits, the responsible score may be identifiable from the monitoring
charts. Variable contributions to an out of limit SPE observation can be directly inferred from
SPE charts [285]. For the 𝑇2 chart, the contribution of the 𝑘th variable to faulty observations
can be identified from the normalised scores of that observation [285, 286]:
Cont_k = p_{i,k} x_{k,new}    [5.12]

and when more than one score is out of control, an overall average contribution of variables
to all (normalised) out of control scores is computed as [285]:

TCont_k = Σ_{i=1}^{c_lb} (tᵢ/λᵢ) p_{i,k} x_{k,new}    [5.13]
where 𝑝𝑖,𝑘 is the 𝑖𝑘th element of the loading matrix 𝑷𝑇, 𝑥𝑘,𝑛𝑒𝑤 is the 𝑘th monitor/variable of
the new observation and 𝑐𝑙𝑏 is the number of scores that breach the control limit. Individual
variable contributions with negative values are set to zero as only contributions with the same
sign as the score increase the overall contribution [285].
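Eq. 5.13, with negative individual contributions zeroed as described, can be sketched as follows (illustrative code; the interface is mine):

```python
import numpy as np

def t2_contributions(x_new, P, t, lam, breached):
    """Overall variable contributions to out-of-limit scores (Eqs. 5.12-5.13).

    breached : indices of score elements that breach the control limit
    Negative individual contributions are set to zero, as only contributions
    with the same sign as the score increase the overall contribution.
    """
    cont = np.zeros(len(x_new))
    for i in breached:
        ci = (t[i] / lam[i]) * P[:, i] * x_new   # each variable's contribution to score i
        ci[ci < 0] = 0.0
        cont += ci
    return cont
```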
Detectability, Identifiability and Reconstructability
Not all errors or faults are detectable, identifiable or reconstructable. Non-detectability is
based on the fact that the residual subspace is orthogonal to the PC-subspace. Like fault
reconstruction, fault identification also depends on minimising SPE after the occurrence of a
fault. A fault identification index has been defined by Dunia and Qin [126]:
η² = SPE_r / SPE    [5.14]

where η ∈ [0, 1] and SPE_r is the reconstructed SPE after the occurrence of a fault. A significant
reduction of SPE_r signifies high identifiability and results in a value of η close to 0.
Assuming faults are detectable, all faults can be identified when there are sufficient degrees of
freedom. Dunia and Qin [126] have determined that k − A ≥ 2 is a necessary condition for
identifiability. Note that 𝑘 − 𝐴 is the dimension of the residual subspace and represents the
redundancy in the system.
5.2.3 Kernel Density Estimation
The KDE estimator (reintroduced from section 2.3.2.1) uses a weight function or kernel, W,
which acts as a moving window on a univariate data sample of n observations
(q₁, q₂, …, qₙ) to estimate the distribution as shown:

f̂(q) = (1/(n h_KDE)) Σ_{i=1}^{n} W((q − qᵢ)/h_KDE)    [5.15]

where h_KDE is the bandwidth or smoothing parameter and W is chosen so that it is
differentiable, W ≥ 0, and ∫_{−∞}^{∞} W = 1. A number of kernels satisfy these conditions but this
work uses a Gaussian kernel given by [112]:

W(q) = (1/√(2π)) exp(−½ qᵀq)    [5.16]
Gaussian kernels are attractive because their weights decay smoothly towards zero with
distance, so that distant points are not overly influential.
The choice of h_KDE is crucial and the parameter is difficult to determine [132, 134]. A low value
will result in a noisy estimate that amplifies possibly insignificant data trends, while a high
value can lead to insensitivity, where important trends and distribution properties
such as multimodal behaviour are missed. A good choice of W, such as the Gaussian (standard
normal distribution), can limit the extent of oversmoothing and undersmoothing, thus
complementing the choice of h_KDE.
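Eqs. 5.15 and 5.16, together with the percentile-based control limits used later in this chapter, can be sketched as follows. This is an illustrative sketch; the grid width and bandwidth choices are assumptions of mine:

```python
import numpy as np

def kde_gaussian(q, data, h):
    """Univariate Gaussian KDE f_hat evaluated at points q (Eqs. 5.15-5.16)."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    u = (q[:, None] - np.asarray(data)[None, :]) / h
    W = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel weights
    return W.sum(axis=1) / (len(data) * h)

def kde_control_limit(stat, h, alpha=0.05, n_grid=2000):
    """100(1 - alpha)th percentile of the KDE-estimated distribution of a
    monitoring statistic, in the manner used for the T2 and SPE limits."""
    q = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, n_grid)
    f = kde_gaussian(q, stat, h)
    cdf = np.cumsum(f)
    cdf /= cdf[-1]                                     # normalised discrete CDF
    return q[np.searchsorted(cdf, 1.0 - alpha)]
```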
5.3 Methodology
This section presents the novel integrated methodology proposed in this work and its
application to the London Air Quality Network (LAQN). The main components of the approach
are described under the subheadings that follow.
5.3.1 Data
The data used in this work was sourced from the London Air Quality Network (LAQN). LAQN
maintains a network of sensors over London to measure hourly values of particulate matter
(PM10), air particles less than 10μm in size. The data used in this study comprised 8760
hourly samples (for the period between January and December 2010) of PM10 concentration
observations from 93 monitoring locations. The sampling area spanned latitudes 50°
to 52° and longitudes −0.45° to 0.5°. The data is spatiotemporally correlated
due to correlations in causal attributes, such as driving habits, and the influence of diffusive
and dispersive processes, which correlate with location. Treating the data as a multivariate
dataset with 93 variables and 8760 observations can then account for this correlation between
variables and locations. Being a real dataset, there were inevitably missing observations. The
missing measurements also follow patterns that are common to monitoring equipment, where
consecutive observations are missing due to failure or vandalism and repair is not usually
immediate.
5.3.2 Principal Component Analysis of PM10
This section describes the application of PCA to the analysis of spatial patterns in the above
described PM10 data. The employment of PCA for network monitoring is quite different from
this and is described later (see section 5.3.3.1). In this preliminary demonstration of PCA,
missing observations were estimated using the relation [287]:
x̂_{i,j} = (x̄ᵢ + x̄ⱼ)/2

where x̄ᵢ and x̄ⱼ are the means of the i-th row and j-th column of 𝑿. More involved missing
data techniques are employed and discussed in section 5.3.4.
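This row/column-mean in-fill can be sketched as follows (illustrative code of mine, with missing values coded as NaN):

```python
import numpy as np

def mean_fill(X):
    """Fill each missing element (NaN) with the average of its row mean and
    column mean, as in the relation above."""
    Xf = X.copy()
    rows, cols = np.where(np.isnan(Xf))
    rmean = np.nanmean(Xf, axis=1)       # row means ignoring missing entries
    cmean = np.nanmean(Xf, axis=0)       # column means ignoring missing entries
    Xf[rows, cols] = (rmean[rows] + cmean[cols]) / 2.0
    return Xf
```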
5.3.2.1 Model Building
The data matrix 𝑿 (8760 × 93) was first autoscaled [92] by subtracting the mean and dividing
by the standard deviation of each variable. This is important when some variables have high
numerical variations that can dominate data characteristics, e.g. a monitor close to a pollution source.
Autoscaling prevents this by giving each monitor a comparable influence (a unit variance) on
the PCA model. The mean-centring aspect of autoscaling makes data analysis more
informative since variable contributions to each PC are assessed relative to the origin. This
way, negative and positive contributions can be differentiated.
The scores and loadings (Eq. 5.1) were calculated using the SVD approach. SVD theory states
that for every rectangular matrix 𝑿, there exist orthonormal matrices 𝑼 and 𝑽, and a diagonal
matrix 𝑺 such that:
𝑿 = 𝑼𝑺𝑽𝑻 [5.17]
where diag(𝑺) contains the singular values of 𝑿, i.e. the square roots of the eigenvalues of
cov(𝑿) (= 𝑿ᵀ𝑿/(n − 1)), arranged in descending order of magnitude; 𝑽ᵀ represents the
corresponding loading vectors, 𝑷ᵀ; and 𝑼𝑺 is equivalent to the score matrix, 𝑻, in Eq. 5.1 if
all possible k components were retained in the model. Note that the decomposition implies
that:
cov(𝑿) 𝒗ᵢ = λᵢ 𝒗ᵢ

where λᵢ (= diag(𝑺)ᵢ²) is the i-th eigenvalue of cov(𝑿).
After generating the score and loading matrices, the model was validated using the EKF cross-
validation method. EKF was chosen because of its superior accuracy and low computational
cost [266]. The EKF technique involves the following procedure [288]:
- For each candidate number of components, A, divide the data into cv subgroups by
  excluding different groups of data from the training set.
- Denote the training set and the validation (excluded) set as 𝑿* and 𝑿# respectively for
  each subgroup.
- Form a PCA model with 𝑿*.
- Predict 𝑿# by projecting it onto the PC space, i.e. 𝑿̂# = 𝑿#𝑷𝑷ᵀ.
- Estimate the subgroup error as 𝑿# − 𝑿̂#.
- Form an error matrix for all subgroups, 𝑬_cv.
- Compute the element-wise prediction error: PRESS(A) = Σ_{j=1}^{n} Σ_{l=1}^{m} (e_{j,l})²,
  where e_{j,l} is the jl-th element of 𝑬_cv.
- Calculate the root mean square error of cross-validation: RMSECV(A) = √(PRESS(A)/n).
- Repeat for all k possible components.

The number of components, A, that resulted in the lowest PRESS(A), or equivalently the lowest
RMSECV(A), was selected as the model order.
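A simplified, row-wise version of this cross-validation loop can be sketched as below. Note this is an approximation for illustration: whole rows are excluded per fold and predicted by projection, whereas the EKF variant excludes groups of elements; the function name is mine:

```python
import numpy as np

def ekf_press(X, A, cv=5):
    """Cross-validated PRESS(A): leave out groups of samples, fit PCA on the
    rest, and accumulate the squared prediction error of the excluded rows."""
    n = len(X)
    folds = np.array_split(np.arange(n), cv)
    press = 0.0
    for idx in folds:
        train = np.delete(X, idx, axis=0)
        mu = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
        P = Vt[:A].T                   # loadings of the training-fold model
        Xv = X[idx] - mu               # excluded (validation) rows
        E = Xv - (Xv @ P) @ P.T        # prediction error via projection
        press += float(np.sum(E ** 2))
    return press
```

On data with a clear low-rank structure, PRESS should drop sharply once A reaches the true rank.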
5.3.3 Multivariate Statistical Process Control (MSPC)
The performance of conventional MSPC in monitoring, detecting and identifying faulty sensors
was first evaluated against PM10 data. Two aspects were specifically tested: the robustness
of the method to the type of missing data expected of a real-time monitoring network
(multiple contiguous samples missing), and the susceptibility of the method to mischaracterise
good measurements as bad in spatial data. The component methodologies employed are
detailed in the subsections below.
5.3.3.1 Data Pre-processing and preliminary monitoring PCA model
The data pre-processing required for building a monitoring PCA model is more rigorous than
the one used in the demonstration of PCA in section 5.3.2. In MSPC monitoring applications,
the underlying monitoring model should represent the ideal behaviour of the process, so that
deviations from normal behaviour can be identified [289]. One way to ensure good
monitoring PCA models is to in-fill missing data by using various estimation methods [290,
291] [292]. However, where the missing data is too pervasive to estimate reliably, typically
at values higher than 20% [293], deletion methods are preferable. An exploratory analysis of
missing data revealed that at least 26 sensors had over 20% missing data (see section 5.4.2).
Before pre-processing, 500 samples with the least amount of missing data and with score
values within the interquartile range of the total dataset (as determined from the PCA analysis in
section 5.3.2) were set aside as “in-control samples” for use in subsequent sections (see
section 5.3.5). A threshold of 25% (to test beyond the limits of most missing data methods)
was applied on the remainder of the data and monitors with more missing measurements
were excluded from model building. After pre-processing, the remaining missing
measurements were in-filled using nearest neighbour interpolation.
After processing, the now fully observed dataset was autoscaled and a preliminary PCA
model was built and validated using the same procedure described in section 5.3.2.
5.3.3.2 Control Limits
Monitoring statistics (𝑇2 and 𝑆𝑃𝐸) for each sample included in the PCA model were computed
using Eqs. 5.5 and 5.9. To calculate limits for these statistics, their distribution must be known
[294]. The distribution of the monitoring statistics was tested using Royston's [295, 296]
multivariate normality test. Royston's test is a significance test that extends the
reliable Shapiro and Wilk [297] univariate normality test. The test was run in Matlab with a
significance of 0.05 for 2000 randomly selected samples at a time. Following the multivariate
normality test, a non-parametric approach to calculating the control limits, which is better
suited to non-normal data, was considered appropriate [279]. Consequently, Kernel Density
Estimation (Eqs. 5.15 and 5.16), which is widely used to estimate distribution thresholds, was used [298].
Numerous methods of selecting the bandwidth have been proposed [112, 132, 133].
Phaladiganon et al. [279] have used Silverman’s rule of thumb for Gaussian approximation to
estimate the density of 𝑇2 even though this requires an assumption of Gaussian distribution
on the data. In this work, ℎ𝐾𝐷𝐸 was estimated using the common method of 𝐿2 minimisation
of the mean integrated square error (MISE) with no underlying distribution assumptions as
shown:
MISE(f̂) = E ∫ (f̂(q) − f(q))² dq    [5.18]
After selecting the KDE parameters, the KDE estimator was applied to 𝑇2 and SPE residuals
to estimate their true distributions. The control limits 𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 were then
determined by taking the 100(1 − α)th percentile of each estimated density.
5.3.3.3 Building the final PCA monitoring model
Detecting faults in this application of MSPC differs from traditional industrial applications. In
the latter, faults mostly arise from process failures that will cause a major shift from a clearly
defined optimal performance [299]. In PM10 or spore data, the only indicator of faults is an
abnormally high or low concentration reading at a particular location. This may not cause a
glaring shift in the correlation structure because the underlying “optimal process” is not as
clearly defined. It is therefore argued that the efficacy of applying PCA monitoring to
dispersion data depends on a high sensitivity of the underlying model that can detect subtle
correlation changes arising from uncharacteristic concentration changes. A highly sensitive
model clearly defines what an optimal performance is. This sensitivity can be maximised by
excluding samples with high T2 values, which is indicative of high PM10 concentrations. To
achieve this, a threshold equal to the 95th percentile (16.31) of the current T² distribution was
applied and all samples above it were excluded from the final model.
For further outlier detection, the calculated control limits (𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 ) were applied
to the monitoring statistics (𝑇2 and 𝑆𝑃𝐸) and all samples breaching the control limits were
identified and excluded from the data for main model building. An intermediate model was
then rebuilt and validated as described in section 5.3.2. The justification for this is to get the
best possible monitoring model, since the best monitoring models are those that best
discriminate the behaviour they are intended to monitor [300]. The intended final application
of this integrated monitoring approach is in the fault detection of spore biosensor networks,
where ‘faults’ can be subtle measurement errors. To ensure that the monitoring model is
sensitive enough to capture these deviations, further outlier removal was carried out. A new
set of 𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 was calculated from the 𝑇2 and 𝑆𝑃𝐸 statistics of the
intermediate model, and samples breaching these limits were further excluded from modelling
[276]. Finally, the remaining data was used to build a final model (as described in section
5.3.2) and a set of final 𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 were calculated based on a final set of 𝑇2 and
𝑆𝑃𝐸 (as described in section 5.3.3.2).
5.3.4 Online Fault Detection of a PM10 Network with Missing Data
The methodologies described so far are referred to as Phase I implementation of MSPC, where
knowledge of a process is gained and control limits are determined based on a desired
behaviour [280]. In the subsequent sections, Phase II, the implementation of online
monitoring, where new observations are evaluated based on Phase I (on-spec) standards is
described. The main areas of challenge in online monitoring are missing data and fault
identification. These are discussed below.
5.3.4.1 Missing data handling during system monitoring
Missing data is ubiquitous in many environmental applications [301, 302]. As demonstrated
in section 5.3.3.1, dealing with missing data in offline applications where data samples are
abundant is straightforward – they can simply be deleted. Where estimation is required, there
is usually enough correlation in the data to reliably estimate reasonable amounts (<20%) of
missing data [293] and numerous reliable methods are available [291]. However, in online
applications, missing data is challenging because only one observation is available at a time.
In this particular case, an observation is a vector of 77 values representing the measurement
from each PM10 monitor, so deletion and conventional estimation are not possible. Online
missing data methods typically do not estimate missing measurements in the variable space.
They instead estimate the score of the new observation using the underlying PCA model.
Nelson et al. [303] have shown that prediction uncertainties due to unreliable missing data
handling are at times greater than the uncertainties arising from model prediction errors.
These errors are caused by the loss of orthogonality of principal components [293]. A
successfully applied [304] online missing data approach, the single component projection
(SCP), was found to be unsuitable in this case. This is because this method estimates scores
sequentially element-by-element, which can cause large errors for long observation vectors
due to propagation of errors [305]. Any single observation vector of pollution or spore data
spanning a large spatial area will be large and particularly vulnerable to this. For example, a
PM10 observation, 𝒙_new (77 × 1), could have up to 18 stations (25%) missing, which must be
sequentially estimated.
As a result of this deficiency of SCP, Projection to Model Plane (PMP) [114, 120, 293, 303],
which projects the new observation to the PC subspace to calculate the entire score vector
at once, was chosen in this work. PMP estimates the score vector as [303]:
𝒕̂_new = (𝑷*ᵀ𝑷*)⁻¹ 𝑷*ᵀ 𝒙*_new    [5.19]

where 𝒙*_new is the complete (observed) part of the new observation, 𝑷* the corresponding
rows of the loading matrix, and 𝒕̂_new the estimated score.
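Eq. 5.19 is a least-squares projection using only the observed elements, and can be sketched as follows (the mask-based interface is an assumption of mine):

```python
import numpy as np

def pmp_score(x_new, observed, P):
    """Projection to Model Plane (Eq. 5.19): estimate the full score vector
    of a partially observed, scaled observation.

    observed : boolean mask marking the non-missing elements of x_new
    P        : k x A loading matrix
    """
    Ps = P[observed, :]                # P*: loading rows of observed variables
    xs = x_new[observed]               # x*_new: observed part of the vector
    return np.linalg.solve(Ps.T @ Ps, Ps.T @ xs)
```

If the observation lies exactly in the PC subspace, the score is recovered exactly from the observed part alone.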
5.3.4.2 Implementation of fault detection
The set-aside in-control observations (section 5.3.3.1) were used for testing MSPC on new data.
The missing data in the testing samples had been in-filled using nearest neighbour
interpolation as described in section 5.3.3.1. Three cases were tested: new in-control samples,
randomly missing data, and contiguously missing data.
In-Control Samples: 100 of the observations were scaled by factors ranging from 5 to 10, so
that inter-variable correlation was preserved but variable magnitudes were higher. This was
to test MSPC for new, in-control observations.
Randomly missing data (Case 1): 30 observations were randomly selected from the 100 in-
control samples used. The samples were divided into 3 groups of 10 each. The first group
was corrupted with missing data in the range of 6.5-10% (low), the second in the range of
10-20% (medium) and the third in the range of 20-25% (high). Each observation index that
was allocated a missing value was randomly selected from 77 possible variables.
Contiguously missing data (Case 2): It is common to have neighbouring monitors missing
data, usually because instruments are not repaired promptly and failures accumulate. To
address this, 15 out of 100 samples (from the 500 set aside) were infused with missing data.
These were divided into 3 groups of 5 (21-25, 51-55, and 81-85). For the first group (21-25),
the first 19 variables (corresponding to 25% missing data) were designated missing. The
same was done for the second and third groups, but with variables 30-49 and 50-69
respectively designated as missing. The three groups thus represent samples with 25%
missing data in approximately the first 19, mid 19 and last 19 PM10 monitors.
These were evaluated with MSPC as follows: a score (or an estimate for Cases 1 and 2) is
calculated for each new observation, 𝒙_new (or its observed part 𝒙*_new). These scores are
evaluated against the control limits and violating scores are flagged.
5.3.5 Online Fault Identification in a PM10 Network
When an observation breaches the control limits, a fault is detected. But identification of
faults requires the identification of the erroneous variable(s). This section describes the
methodology for fault identification in a PM10 monitoring network.
5.3.5.1 Identifying Faults
In a PCA context, fault identification can be implemented using different indices, such as
sensor validity index (SVI) [118], reconstruction-based contribution [306, 307] or T2
contributions [299]. The SVI and RBC approach requires the fault to be isolatable, i.e.
uniquely identified [113, 118, 126]. When there are multiple faults, this is not always possible
[113, 117]. Based on these considerations, the contribution plot [116, 307] approach, which has
been found suitable for correlated data [118], was preferred. No contributions are calculated for
SPE, as these can be directly inferred from SPE charts [285]. For the T² chart, the contribution
of the k-th variable to faulty observations was calculated using Eqs. 5.12 and
5.13.
5.3.5.2 Implementation of fault identification
The performance of MSPC in identifying out of control samples was evaluated next. Here too,
100 samples from the 500 set aside (see section 5.3.3.1) were used. Four samples were corrupted
at selected positions of the observation vector, as shown in Table 5.1. Samples 5 and 20 were
corrupted with values of approximately 50-150% of the mean value, drawn from an
inverse distance-weighted function, where the central station was assigned the highest value
and the farthest the lowest. Samples 5 and 20 were therefore simulated such that variables
were spatially correlated in the manner that would occur during a local emission of
spores or a pollutant. Samples 50 and 80 were corrupted with values randomly chosen from
a range of values spanning 50% to 400%.
Table 5.1: Index of variables and sample number of corrupted observations

Sample no.    Corrupted variable index
5             5-10
20            20-25
50            40-45
80            60-65
Each observation was evaluated by MSPC as described in section 5.3.4. When a sample
breached the control limits, the erring variables, in this case monitoring stations, were
identified from the contribution plot and SPE chart as detailed in section 5.3.5.1.
5.3.6 Augmented MSPC
PM10 concentration relationships are largely linear, but also contain nonlinear components
due to the complex processes that influence their production and accumulation [308, 309]. A
linear model, such
as the PCA model implemented in this work, will by definition explain the variance associated
with the linear correlation between PM10 monitors. Any nonlinear correlation will be assigned
to the residuals. As a result, MSPC based on this model may identify nonlinearly correlated
observations as deviating from ideal behaviour (faulty), thus leading to false positives.
Moreover, in typical monitoring applications, reconstruction is done in the PCA domain [299, 306, 310].
[299, 306]. These reconstructions are not always possible, as reconstructability depends on
fault attributes [311]. In other words, this type of reconstruction assumes the fault has been
correctly detected. This is not ideal for a system that may trigger false alarms, or where
validation of the detection process itself is required. To address this issue, an augmented MSPC
approach is proposed. The novel approach integrates the best aspects of MSPC (reliable
detection), robust missing data handling, and a reliable spatial interpolation method to
validate fault detection and reconstruct data in a PM10 monitoring network. The proposed
approach reconstructs data regardless of type of fault and is independent of the fault
detection procedure.
5.3.6.1 Kriging
Kriging [312, 313] is an unbiased method of spatial interpolation that is based on spatial
correlation. Kriging originated from geostatistics but has been successfully applied to particle
dispersion [314-317]. While inverse distance weighting (IDW) [312] and other types of
interpolation methods also account for spatial correlation, they do so by assigning linearly
decreasing weights to points with increasing distance of separation. By contrast, kriging uses
a data-driven function to assign weights. Therefore, kriging offers improved results with
clustered and highly correlated data [317]. Another attraction of kriging is that it gives an
estimate of the error of estimation; hence it can be reliably evaluated. Numerous kriging
methods exist [313, 318, 319]. Ordinary Kriging (OK) was chosen in this work because it has
been successfully applied in PM10 interpolation under assumptions of local mean constancy
[317]. The ordinary kriging estimator of a spatially continuous random variable, 𝑍(𝑑0), is the
weighted sum of its values at neighbouring locations, 𝑑𝑖:
$$\hat{Z}(d_0) = \sum_{i=1}^{N(d)} \zeta_i Z(d_i) \qquad [5.20]$$
under the condition that,
$$\sum_{i=1}^{N(d)} \zeta_i = 1$$
where $\hat{Z}(d_0)$ is the estimated variable, $Z(d_i)$ is the observed value at the $i$-th of $N(d)$
neighbours whose mean is assumed to be constant, and $\zeta_i$ is the weight representing
the influence of each observed location on the estimate. The choice of $N(d)$ is unique to OK
because other forms of kriging either assume a universally constant mean or assume the
process is completely non-stationary [313, 317, 319]. In this work, owing to the relatively high
density of the LAQN network, specifying $N(d)$ was straightforward and the five nearest
neighbours were chosen. $\zeta_i$ is calculated as the minimiser of the ordinary kriging estimator
variance [320]:
$$\sigma_{OK}^2 = C_{Krig}(0) - \sum_{i=1}^{N(d)} \zeta_i \left[ C_{Krig}(d_i - d_0) - \varrho(d_i) \right] \qquad [5.21]$$
where $C_{Krig}(0)$ is the true variance of the spatial variable and $\varrho$ is the Lagrange multiplier
[320, 321]. The weights that minimise this variance can then be expressed as [320]:
$$\sum_{j=1}^{N(d)} \zeta_j C_{Krig}(d_i - d_j) + \varrho = C_{Krig}(d_i - d_0)$$
or, in matrix form,
$$\mathbf{K}\boldsymbol{\zeta} = \mathbf{k}_{Krig}$$
and since $\mathbf{K}$ is positive (semi-)definite,
$$\boldsymbol{\zeta} = \mathbf{K}^{-1}\mathbf{k}_{Krig} \qquad [5.22]$$
where 𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑𝑗) is the covariance between all location pairs in 𝑁(𝑑), 𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑0) is the
covariance between observed locations and estimated point, and
$$\mathbf{K} = \begin{bmatrix} C_{Krig,11} & C_{Krig,12} & \cdots & C_{Krig,1j} & 1 \\ C_{Krig,21} & C_{Krig,22} & \cdots & C_{Krig,2j} & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ C_{Krig,i1} & C_{Krig,i2} & \cdots & C_{Krig,ij} & 1 \\ 1 & 1 & \cdots & 1 & 0 \end{bmatrix}; \quad \boldsymbol{\zeta} = \begin{bmatrix} \zeta_1 \\ \zeta_2 \\ \vdots \\ \zeta_i \\ \varrho \end{bmatrix}; \quad \mathbf{k}_{Krig} = \begin{bmatrix} C_{Krig,1} \\ C_{Krig,2} \\ \vdots \\ C_{Krig,i} \\ 1 \end{bmatrix}$$
These covariances are calculated from the variogram.
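As a sketch of how Eq. 5.22 is solved in practice, the snippet below assembles the augmented system and solves for the weights. The covariance values shown are illustrative placeholders, not values taken from the LAQN data.

```python
import numpy as np

def ordinary_kriging_weights(C, c0):
    """Solve the ordinary kriging system K*zeta = k_Krig of Eq. 5.22.

    C  : (n, n) covariances between the N(d) observed locations
    c0 : (n,)  covariances between each observed location and the target
    Returns the n weights (which sum to 1) and the Lagrange multiplier.
    """
    n = len(c0)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = C            # covariance block of the matrix K above
    K[n, n] = 0.0            # zero in the corner, as in Eq. 5.22
    k = np.append(c0, 1.0)   # last entry enforces sum of weights = 1
    sol = np.linalg.solve(K, k)
    return sol[:n], sol[n]   # weights zeta_i and multiplier rho

# Illustrative covariances for three neighbours (not LAQN values)
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
c0 = np.array([0.8, 0.7, 0.4])
weights, rho = ordinary_kriging_weights(C, c0)
```

By construction the returned weights satisfy the unbiasedness constraint, so the kriged value is a convex-style combination of the neighbouring observations.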
5.3.6.1.1 Variogram
The variogram [318] is the component of the kriging model that accounts for spatial correlation
between pairs of points in space, and can be empirically estimated as [317]:
$$\hat{\vartheta}(h_{Krig}) = \frac{1}{2N(h_{Krig})} \sum_{i=1}^{N(h_{Krig})} \left\{ Z(d_i) - Z(d_i + h_{Krig}) \right\}^2 \qquad [5.23]$$
where $\hat{\vartheta}(h_{Krig})$ is the estimated semivariance and $N(h_{Krig})$ is the number of observed pairs $Z(d_i)$
and $Z(d_i + h_{Krig})$ separated by the lag $h_{Krig}$ [312]. The lag is analogous to the bin in
histograms and the bandwidth in kernel density estimation. A poor choice of this
parameter may result in too few or too many data points in any single bin, which can severely
affect the estimated semivariance. In this work, $h_{Krig}$ was specified after evaluating the
statistics of the location data, specifically the quantiles of their distribution.
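For concreteness, the estimator of Eq. 5.23 can be sketched as below. The coordinates, values and lag edges in any call are hypothetical stand-ins for one time step of monitor readings.

```python
import numpy as np

def empirical_variogram(coords, values, lag_edges):
    """Estimate the semivariance of Eq. 5.23 in distance bins (lags).

    coords    : (n, 2) monitor coordinates
    values    : (n,)   observed values at one time step
    lag_edges : increasing bin edges for the separation distance h
    """
    # Pairwise separation distances and squared value differences
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)     # count each pair once
    d, sq = d[iu], sq[iu]
    centres, semivar = [], []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        if in_bin.any():                       # skip empty lags
            centres.append(0.5 * (lo + hi))
            # mean of squared differences over the N(h) pairs, halved
            semivar.append(sq[in_bin].mean() / 2.0)
    return np.array(centres), np.array(semivar)
```

A poorly chosen `lag_edges` vector shows up directly here as empty or overfull bins, which is the sensitivity discussed in the text.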
To ensure non-singularity in Eq. 5.22, kriging uses positive (semi) definite theoretical
variograms whose parameters are estimated by fitting the semivariances generated from Eq.
5.23. A number of theoretical models are available [318, 322] but selection is heuristic. The
models differ in how fast the semivariance attains the true variance of the spatial variable. In
this work, the spherical variogram was favoured because it has been widely used in particle
dispersion applications [323]. The spherical model expressed in terms of the true data
semivariance, 𝜗(ℎ𝐾𝑟𝑖𝑔), is defined as [322]:
$$\vartheta(h_{Krig}) = \begin{cases} C_{Krig,0} + C_{Krig,1}\left(1.5\,\dfrac{h_{Krig}}{r} - 0.5\left(\dfrac{h_{Krig}}{r}\right)^{3}\right) & \text{for } h_{Krig} \le r \\[2mm] C_{Krig,0} + C_{Krig,1} & \text{for } h_{Krig} \ge r \end{cases} \qquad [5.24]$$
where $C_{Krig,0} + C_{Krig,1}$ is the variance of the estimated variable at $d_i$ (alternatively called the
'sill' [321]) and $r$ is the range, the minimum distance from $d_i$ at which $\hat{\vartheta}(h_{Krig}) = C_{Krig,0} + C_{Krig,1}$.
The variogram and kriging are extensively discussed in Clark and Harper [318],
Cressie [312] and Goovaerts [320].
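The spherical model of Eq. 5.24 is simple to express directly. In the sketch below the nugget, partial sill and range are illustrative values, not parameters fitted to the LAQN data.

```python
import numpy as np

def spherical_variogram(h, c0, c1, r):
    """Spherical model of Eq. 5.24: nugget c0, partial sill c1, range r.

    The semivariance rises towards the sill c0 + c1 and stays flat
    once the separation distance h exceeds the range r.
    """
    h = np.asarray(h, dtype=float)
    inside = c0 + c1 * (1.5 * h / r - 0.5 * (h / r) ** 3)
    return np.where(h <= r, inside, c0 + c1)
```

Fitting would amount to choosing `c0`, `c1` and `r` so this curve matches the empirical semivariances from Eq. 5.23, e.g. by least squares.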
5.3.6.2 Implementing augmented MSPC
The same observations used in section 5.3.5.2 were used in this section to demonstrate
augmented MSPC. To get the best variogram, and therefore higher confidence in $\sigma_{OK}^2$
[317, 324], an empirical variogram was generated for each observation evaluated. Observations
5, 20, 50 and 80 correspond to samples measured at the 7th, 13th, 11th and 16th hours of the day.
The entire dataset was averaged on these hours, generating 7th, 13th, 11th and 16th hour
averages. This was done to reduce local effects, which could result in spatial
heterogeneity that can affect kriging performance [317, 319]. The four variograms were fitted (in
ArcGIS v10.1 on a Windows 7 platform with a 2.4 GHz Intel Core processor and 4 GB RAM) and then
used to krige their corresponding observations. To assess confidence in the reconstructed
values, the kriging estimator variance was used. This metric was considered suitable because
it is independent of the values being estimated and, under the assumption of a good variogram
fit, is a reliable assessor of kriged estimates [320]. For an interpolated (kriged) value to be
accepted, $|x_k - \hat{x}_k| < 3\sigma_{OK}$. This is intuitive, as the error between measured and estimated
values will be larger for an erroneous observation. If this error is larger than the uncertainty
in the estimated value, then the measured value is confirmed as bad and the reconstructed
value is accepted as a replacement. It is assumed that the kriging error is normally distributed
and a 97.5% confidence level is applied.
Augmented MSPC integrates the robust MSPC developed in the previous sections with the
Kriging interpolation described above. The augmented MSPC procedure for each new
autoscaled observation, 𝒙𝑛𝑒𝑤 (77x1), is as follows:
1. Check the new observation for missing data.
2. Use PMP to estimate the scores, then compute T2 and SPE for the sample.
3. Compare the current sample's T2 and SPE against the control limits.
4. If there is a violation, identify the number (scores) of violations.
5. Compute Cont_k and TCont_k as appropriate.
6. Identify the erroneous variable(s), x_k.
7. Reconstruct the variable(s), x̂_k, from the nearest N(d) error-free neighbours using kriging interpolation.
8. Compare x̂_k to x_k.
9. If |x_k − x̂_k| < 3σ_OK, x_k is not faulty. Otherwise, x_k is faulty; replace it with x̂_k.
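The steps above can be sketched as a single monitoring pass. This is a minimal sketch, not the exact implementation: the PMP projection is simplified to a least-squares fit on the observed rows of the loading matrix, only the largest SPE contributor is checked, and the `krige` callable and `sigma_ok` value are placeholders for the interpolation described above.

```python
import numpy as np

def augmented_mspc_step(x, P, lam, t2_lim, spe_lim, krige, sigma_ok):
    """One pass of the augmented MSPC procedure for an autoscaled sample.

    x        : (m,) observation; NaNs mark missing monitors
    P        : (m, a) loading matrix; lam : (a,) retained score variances
    krige    : callable(x, k) -> kriged estimate of monitor k (placeholder)
    sigma_ok : kriging estimator standard deviation (placeholder)
    """
    obs = ~np.isnan(x)
    # PMP (simplified): fit the scores to the observed part of the sample
    t = np.linalg.lstsq(P[obs], x[obs], rcond=None)[0]
    t2 = float(np.sum(t ** 2 / lam))
    e = np.where(obs, x - P @ t, 0.0)          # residual on observed monitors
    spe = float(np.sum(e ** 2))
    if t2 <= t2_lim and spe <= spe_lim:
        return x, t2, spe                      # in control: accept as-is
    k = int(np.argmax(e ** 2))                 # largest SPE contributor
    xk_hat = krige(x, k)                       # reconstruct from neighbours
    if abs(x[k] - xk_hat) >= 3.0 * sigma_ok:   # validated as faulty
        x = x.copy()
        x[k] = xk_hat                          # accept the reconstruction
    return x, t2, spe
```

Note the defining property of the augmented approach survives even in this sketch: the kriged reconstruction is computed regardless of the fault type and is compared against the measurement independently of the detection statistics.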
5.4 Results
This section presents the main results. Results of the PCA analysis of PM10 data are presented
first, then the MSPC results are outlined subsequently.
5.4.1 PCA Analysis of PM10
This section discusses the results of the PCA analysis of PM10 data; the MSPC results are
presented from the next section onwards. The scores and loadings plots of the PCA model built
in section 5.3.2 are shown in figures 5.1 and 5.2. Only the largest two principal components
are shown in both cases. Score plots show sample relationships while loading plots show
inter-variable relationships and variable contributions to PCs.
Figure 5. 1: Score plot showing first PC against second (numbers represent sample number
– hour of year)
Figure 5. 2: Loading plot of first vs. second PC showing all monitoring stations (numbers
represent station numbers)
The clustering of observations and variables is evident in both figures. In figure 5.1, the
dataset is more compact than in figure 5.2, suggesting that there is more temporal correlation
in the data than spatial. This is to be expected considering there is a cyclical (daily) pattern
to the emission of particulates and that the monitored area spans several kilometres. Outliers
are also visible in figure 5.1. These samples correspond to measurements made within 8
hours of Guy Fawkes Night. It is therefore possible for outlying or unusual measurements to
be detected by simply exploring this type of data using multivariate analytical tools. These
outliers were excluded before building the preliminary monitoring model.
The loadings plot is a good indicator of redundancy - clustering of some monitors in the
loading plots (figure 5.2) suggests they have similar influences on the PC explaining a
substantial amount (62%) of the variance. This suggests that some of these monitors can be
decommissioned without loss of information (as depicted by the model’s residuals). This is in
agreement with studies conducted across world cities, which found a high degree of
redundancy in air quality monitoring networks [325-330].
The loadings plot can also help identify potentially troublesome monitors. Monitors with high
loadings (the PC axis values) will affect a PCA model’s performance when they become
corrupted or when measurements are missing. From a PCA loading plot, these can be
identified and paid special attention.
Overall, the PCA analysis suggests there is potential for substantial dimension reduction of air
dispersion data. This is indicated by the first PC, which explains approximately 62% of the total
data variance. Figure 5.3 shows the explained variance for the first 20 PCs. It is evident that
subsequent PCs explain a very small part of the variance (approx. 4% and 3% for the 2nd and
3rd respectively). The flat nature of the variance curve after about 5 PCs suggests that the
variance explained by these PCs is indistinguishable from random noise. The cross-validation
plot shown in figure 5.4, where the PRESS is seen to increase after 4 PCs, confirms this.
Consequently, the model order was selected as 4.
It should be noted from figure 5.3 that the PCA model only explains approximately 69% of
the variation in the pollution data. The 31% variance that is not explained might contain
important information that the model has ignored, which could be significant when the model
is employed in monitoring applications.
Figure 5. 3: Percentage of variance explained by first 20 PCs
Figure 5. 4: Calibration and cross-validation errors for first 20 PCs
5.4.2 Data pre-processing and preliminary model of PM10
The pre-processing results illustrate the difference in complexity between monitoring
pollution/dispersion and monitoring industrial processes. Figures 5.5 and 5.6 show the
distribution of missing data before and after pre-processing. In figure 5.5, almost 3500
samples had approximately 15 (~20%) missing observations in a vector of 77 observations.
In fact, none of the 8760 samples throughout the year was completely observed. Even though
only approximately 15% of the total data is missing, their impact is severe because they are
missing in blocks for relatively prolonged periods of time. The minimum number of missing
data points in any single observation vector was 6 (8% of the total) and the maximum was
35 (45% of the total). Some of the excluded stations had as much as 80% of their data missing,
with an average of 45% missing among them.
Figure 5. 5: Missing data distribution before pre-processing
Figure 5. 6: Missing data distribution after processing
After pre-processing, the data quality improved. As indicated in figure 5.6, approximately 4000
samples out of 8760 had approximately 6 (~8%) missing data points. The minimum and maximum
number of missing data in any single observation also improved to 1 and 26 (~34%)
respectively.
Figure 5.7 shows the location of PM10 monitors across London with deleted stations
annotated. Because they are not all concentrated in one location, the distribution of these
stations suggests their exclusion from modelling may not result in a loss of information. In
fact, an evaluation of the loadings corresponding to these stations in figure 5.2 shows most
of them belong to the cluster of points to the right, which as explained earlier
suggests redundancy in the monitors.
Figure 5. 7: Monitor locations showing deleted monitors (red) with excessive missing data
Figures 5.8 - 5.11 show the scores, cross-validation errors, scaled Hotelling T2 and SPE of the
preliminary model built from all the observations in the LAQN PM10 data after excluding monitors
with high proportions of missing values. The SPE plot was scaled by the T2 axis for comparison.
Subsequent plots of the parameters are presented as absolute values. The cross-validation plot,
which compares the cross-validation error with the calibration error, indicates that 6 PCs are
optimal for the model. The score plots show the first PC plotted against all the other scores
retained in the model, the T2 plot shows the deviation of each sample from the model centre,
and the SPE plot shows the error between each sample and its projection onto the model space.
From figure 5.8, the same outliers are visible on all PC combinations regardless of the
percentage variance explained.
Figure 5. 8: Score plots showing the 4 largest PCs against each other
Figure 5.9: Cross-validation and calibration errors
Figure 5.10: Hotelling T2 chart preliminary PCA model
Figure 5.11: SPE chart for preliminary PCA model
The first score plot (figure 5.8) is similar to the one shown in figure 5.1, where apparent
outliers were observed. The PCA model, having been validated with 6 PCs, explains
approximately 75% of the total data variance. The remaining 25% of the data variance is
explained by the SPE. An SPE of 25% is high for most processes, but PM10 dispersion is a
larger-scale process and is nonlinear due to the dominant influence of wind variables and
complex causal attributes [308]. As such, there are bound to be numerous directions (PCs)
of identical data variance after the dominant direction (first PC) has been identified. The
justification for using such a linear model on a nonlinear process is based on the concepts of
predominant wind direction and wind speed averaging. These approximate the dispersion
process as linear but do not completely eliminate its nonlinearity. This highlights how this
application of PCA differs from conventional industrial ones and underlines the significance
of validating detected faults.
From the score and T2 plots (figures 5.8 and 5.10), obvious outliers can be easily identified,
as indicated by the numbered points. The identified outliers correspond to high observed
values with means at least 5 times that of the next highest samples. The model can thus
detect correlation breakdowns resulting from abnormally high values of PM10 measurements.
But it is also desirable to detect correlation breakdowns resulting from abnormally low values
(false negatives). False negatives produce more subtle shifts because measurements are
bounded below by zero, whereas false positives can take any positive value higher than the
actual one. For example, a false negative for a measurement with a true value of 5 units can only
be detected from the correlation breakdown that results from a maximum error of 5 units.
On the other hand, a false positive has no upper bound on the error; the higher the value,
the more apparent the correlation breakdown and the easier the detection.
The SPE plot, shown in figure 5.11, is more sensitive. While SPE is known to be more
sensitive in industrial MSPC [274], some of the sensitivity seen here is attributable to the
nonlinear nature of dispersion mentioned earlier. This nonlinearity causes high concentrations
to be observed at locations in a manner that is not consistent with the model. This results in
an estimation error by the model that carries over into the residual, the SPE. SPE is therefore
proportional to high PM10 concentrations, and this sensitivity will remain for all relatively high
measurements of PM10. The sensitivity of SPE is demonstrated when the outlying values on
the SPE plot are analysed. Figure 5.12 shows the samples with high SPE values in figure 5.11
plotted by day of the year and hour of the day at which the measurements were made. The
data is evenly distributed throughout the year, and there appears to be clustering between
approximately 9 am and 4 pm. These are busy hours that are associated with a high release of
particulate matter and are part of the true pattern of PM10 dispersion in London. Some of the
samples are outliers, for example where the SPE is relatively high for just one sample. But most
of them reflect the multiple sources of PM10 and consequently the nonlinear behaviour of the
process, which the PCA model cannot completely account for. This makes the SPE susceptible
to false alarms. Qin et al. (1997) proposed applying an exponentially weighted moving average
(EWMA) filter to noisy SPEs such as the one in figure 5.11. But because this application aims for
higher sensitivity than traditional industrial MSPC, the filter was not employed in this work. The
same 95th percentile threshold (47.71) was applied to the SPE chart to exclude breaching
samples.
Figure 5. 12: Outliers from preliminary model’s SPE showing daily time of emission
5.4.3 Final Monitoring Model and Control limits
Figures 5.13 and 5.14 show the T2 and SPE charts of the final monitoring PCA model after
pre-processing.
Figure 5. 13: Hotelling T2 for final PCA model
Figure 5. 14: SPE chart for final PCA model
Figure 5.15 shows the corresponding kernel density estimated distributions of the monitoring
charts; the 95th percentile confidence limits (13.42 and 28.96 for T2 and SPE respectively) are
shown in figure 5.16. The density may be seen to be positively skewed for both T2 and SPE
(skewness of 0.727 and 0.514 respectively). This means that values below the respective T2 and
SPE means (3.84 and 15.45) dominate the distributions, and confidence limits set above these
values may not be critically sensitive. In addition, both distributions are unimodal, suggesting
that a confidence limit based on them will reflect a single type of behaviour. This is reasonable
considering that the statistics monitor only one characteristic of the data - correlation
breakdowns due to abnormal values.
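A rough sketch of setting a limit from the KDE's inverse CDF is shown below. A Gaussian kernel with Silverman's bandwidth rule is assumed here, which may differ from the exact estimator used in this work.

```python
import numpy as np

def kde_percentile_limit(stat, q=0.95, grid_size=512):
    """Control limit at the q-th percentile of a Gaussian KDE fitted to a
    monitoring statistic (the idea behind figures 5.15 and 5.16)."""
    stat = np.asarray(stat, dtype=float)
    n = stat.size
    bw = 1.06 * stat.std(ddof=1) * n ** (-0.2)      # Silverman's rule
    grid = np.linspace(stat.min() - 3 * bw, stat.max() + 3 * bw, grid_size)
    # Evaluate the KDE on the grid, then integrate to an approximate CDF
    z = (grid[:, None] - stat[None, :]) / bw
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    cdf = np.cumsum(dens) * (grid[1] - grid[0])
    cdf /= cdf[-1]                                  # normalise away truncation
    return float(np.interp(q, cdf, grid))           # inverse CDF at q
```

Because the KDE follows the (skewed, unimodal) shape of the observed statistic rather than assuming a parametric form, the resulting limit reflects the actual monitoring behaviour described above.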
Figure 5. 15: Kernel density estimated distributions of Hotelling T2 and SPE
Figure 5. 16: KDE ICDF showing 95th percentile for Hotelling T2 and SPE
It is expected that this high sensitivity may result in false alarms, but this is believed to be a
necessary trade-off, because false negatives arising from an insufficiently sensitive monitoring
scheme bear a high cost. In the case of the biosensors, a false negative measurement would
mean not advising farmers to spray or, worse, not warning them of impending threats.
Additionally, a false negative would take longer to identify and rectify, since the biological
measurement process is slow and only provides daily samples. Given these considerations, the
high sensitivity is justified, and it is one of the motivations behind an augmented MSPC
approach that can independently assess expected false positives.
5.4.4 Online Fault Detection of PM10 network
5.4.4.1 No Missing Data
Figures 5.17 and 5.18 show the monitoring charts for the first case. The samples may be seen
to be in control for both limits. As expected, scaling the values and interpolating missing
observations did not notably break down correlations as perceived by the model. The higher
sensitivity of SPE relative to T2, indicated by SPE values lying closer to their control limit than
the T2 values, is still apparent.
Figure 5. 17: Hotelling T2 control chart for new in-control samples
Figure 5. 18: SPE chart for new in-control samples
5.4.4.2 Missing Data (Case 1)
Figures 5.19 and 5.20 show the monitoring charts when there are missing observations in
the samples. In this case, 30 randomly chosen samples were selected with varying amounts
of missing data (from 6.5% to 25%). The samples were divided into 3 groups of 10 each.
The first group was corrupted with missing data in the range of 6.5-10% (low), the second
in the range of 10-20% (medium) and the third in the range of 20-25% (high). Variable
locations to corrupt in each sample were randomly selected from the 77 possible variables.
From figures 5.19 and 5.20, some deviations from the actual values can be seen, indicating
the model is affected by the missing measurements. The sample with the highest difference
between the true T2 and the estimated T2 (sample 20) is one of the samples in the high
percentage range. This is seen more clearly in Table 5.1, which shows the average deviation
from the actual T2 (and SPE) with increasing missing data.
Figure 5. 19: Hotelling T2 control chart for in-control samples with missing data
Figure 5. 20: SPE control chart for in-control samples with missing data
Table 5. 1: Control charts with increasing missing data

Control statistic    Mean absolute deviation for samples with missing data of:
                     6.5-10%      10-20%      20-25%
Hotelling T2         1.17         1.41        2.63
SPE                  3.11         4.96        5.19
From table 5.1, it can be seen that higher missing data percentages cause a higher deviation
from the actual T2 value. The deterioration in the performance of the missing data technique
with increasing missing data is expected, since PMP calculates the score vector using only the
observed part of the new sample. As missing data increases in the new observation vector, the
correlation information between the variables decreases. This performance deterioration does
not result in out-of-control behaviour, however, because the original samples are in control.
PMP, like most online missing data techniques, estimates scores by projecting the
observed part of the sample onto the PC plane, i.e. onto the loading vectors. Since the
samples are in control, the information contained in the model (the loadings) describes
them adequately enough that score estimation errors will be minimal. However, for critically
in-control samples, even a small score estimation error will result in a breach of the control
limit. This is particularly true for SPE because of the already high values of the residual due
to the noise and nonlinearity in the system, as discussed earlier.
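The projection step described here can be sketched as follows. The toy loadings and sample are synthetic, purely to illustrate that for a noiseless in-control sample the observed part alone can recover the scores exactly.

```python
import numpy as np

def pmp_scores(x, P):
    """Projection to the Model Plane: estimate the scores from the
    observed part of x (NaNs = missing) via the matching loading rows."""
    obs = ~np.isnan(x)
    return np.linalg.lstsq(P[obs], x[obs], rcond=None)[0]

# Synthetic 6-monitor, 2-PC model (illustrative only)
rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.standard_normal((6, 2)))   # orthonormal loadings
t_true = np.array([2.0, -1.0])
x = P @ t_true
x[1] = np.nan                                      # one monitor goes missing
t_est = pmp_scores(x, P)
```

With measurement noise and growing numbers of missing monitors the least-squares fit degrades, which is the source of the deviations reported in table 5.1.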
5.4.4.3 Missing Data (Case 2)
A common occurrence in monitoring and sensing networks was tested next. In most cases
when a pollution monitor or environmental sensor fails, it takes a while to realise and fix the
problem. In such cases, multiple samples may be missing for days as demonstrated during
the processing of this dataset. Figures 5.20 and 5.21 show the monitoring charts when an
extreme form of this scenario occurs, i.e. maximum number of missing values become missing
in consecutive samples as well as variables.
The samples used in the preceding missing data test were used for this exercise. 15 of the
100 samples (21-25, 51-55, 81-85) were infused with missing data. For the first group (21-
25), the first 19 (corresponding to 25% missing data) observations (monitors) were
designated missing. The same was done for the second and third groups but with
observations 30-49 and 50-69 respectively designated as missing. The three groups thus
represent samples with 19% missing data in approximately the first 19, mid 19 and last 19
PM10 monitors. It may be seen that there is a marked deviation from the T2 and SPE values
of the actual samples and those of samples with missing data. T2 of the samples with missing
data has increased from an average value of 1.2647 to 3.71 (~193%) and the corresponding
SPE from 10.98 to 23.31 (112%). But only the samples in 51-55 are out of control. The other
groups remain in control despite having 25% missing data.
Figure 5. 21: Hotelling T2 chart with severe case of missing data (25%)
Figure 5. 22: SPE chart with severe missing data (25%)
An investigation into the cause of this led to the loading plot shown in figure 5.23. The plot
shows the loading weights (i.e. the influence) of each monitoring station on the most dominant
direction of variance, PC1. The numbers represent monitor positions when arranged by
distance (see figure 5.7). It can be seen that variables in the range 30-49 (samples 51-55)
include the highest proportion of influential variables among the groups of samples, as
represented by a more marked clustering near the maximum PC1 value. The 19 missing
observations in samples 51-55 thus constitute a critical combination of missing variables. This
increases the score estimation error of PMP [303, 305] and consequently decreases or
increases T2 and SPE. The monitoring process does not detect all the samples as out of control
because some score estimation errors are positive. A positive estimation error indicates an
under-prediction, which will reduce T2 values and may increase SPE.
The case just demonstrated is a severe one, as mentioned at the beginning of this subsection.
Consecutive samples and variables can routinely go missing in real deployments, but not likely
at such high percentages. PMP works well in this application, as it was able to cope with 25%
missing data except when high proportions of influential variables were missing. This is in
part due to the efficient deployment of the PM10 network. As seen in figure 5.23, monitors
are comparably influential, giving the network redundancy.
Figure 5.23: Variable loadings on PC1
5.4.5 Online Fault Identification
The control charts of the simulation are shown in figures 5.24 and 5.25. It may be observed
that MSPC detects all corrupted samples as faulty. While T2 detects only the corrupted samples,
SPE has detected additional ones that are known to be false positives. The reason for these
detections is the aforementioned nonlinear and stochastic nature of dispersion, which
was consigned to the residuals when the model order was selected. In such cases it is
advantageous to use the combined chart shown in figure 5.26. In this chart, the samples (5,
20, 50 and 80) are clearly at fault.
Figure 5. 24: Hotelling T2 chart for simulated out-of-control samples
Figure 5. 25: SPE chart for simulated out-of-control samples
Figure 5. 26: SPE-T2 chart for simulated out-of-control samples
It should be noted that, in the context of this application, samples 5 and 20 are not erroneous,
as they may represent realistic events due to local spore/pollen/particulate pollutant sources.
MSPC nevertheless perceives these samples as having broken the correlation structure. To the
PCA model, the high random values of samples 50 and 80 are as outlying as those of 5 and 20.
This is because the PCA model does not evaluate correlation within a group of variables (5-10
and 15-25 in samples 5 and 20) but rather evaluates correlation among all variables in
consideration of past behaviour learned during model training. In other words, the PCA model
is not aware of local spatial correlations. As long as this spatial correlation is between a few
variables, as is often the case during local release events, the PCA model may always consider
that correlation a deviation from ideal behaviour.
Normally, when a reliable detection is made, the contribution plots are inspected to identify
the erroneous variable. Figures 5.27 and 5.28 show the contribution plots for the T2 and SPE
errors. It should be noted that variables with negative contributions should be ignored.
In this case, because more than one variable is at fault, all SPE and T2 plots show multiple
contributory variables. It can be seen that the SPE chart isolates the faults better than the T2
chart. This is because SPE contributions are directly defined from the SPE, while T2
contributions are derived through approximations [307].
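A minimal sketch of the two contribution types is given below, assuming a complete (no missing data) autoscaled sample and orthonormal loadings. The simple T2 decomposition shown is one common approximation, not necessarily the exact one used in this work.

```python
import numpy as np

def contributions(x, P, lam):
    """Per-variable contributions to SPE and (approximately) to T2.

    x : (m,) complete autoscaled sample; P : (m, a) loadings; lam : (a,)
    SPE contributions are the signed residuals (their squares sum to SPE);
    T2 contributions use the decomposition x_k * (P diag(lam)^-1 t)_k,
    whose terms sum to T2 exactly but can be negative.
    """
    t = P.T @ x                      # scores
    e = x - P @ t                    # residual vector; SPE = sum(e**2)
    spe_cont = e                     # signed per-variable SPE contribution
    t2_cont = x * (P @ (t / lam))    # one common T2 decomposition
    return t2_cont, spe_cont
```

The possibility of negative T2 terms in this decomposition is one reason such contributions are harder to interpret than the SPE residuals, consistent with the lower discriminatory power noted above.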
Figure 5.27: Hotelling T2 contribution plots for 4 corrupted samples (Table 5.2)
Figure 5.28: SPE contribution plots for 4 corrupted samples (Table 5.2)
The T2 chart cannot uniquely identify the erroneous variables. From the SPE chart, however,
all (perceived) erroneous variables have been uniquely identified. SPE’s higher sensitivity to
faults compared to T2 has been reported for online monitoring processes [274]. Alcala and
Qin attribute the low discriminatory power of T2 contribution plots to the approximations
during estimation of T2 contribution compared to direct calculations for SPE [307]. For
samples 50 and 80, which are genuinely bad measurements, MSPC works as intended. The
approach, however, mischaracterises potentially good measurements because it lacks spatial
awareness.
Therefore, from the investigations carried out so far, false alarms can arise from three sources
when MSPC is extended to spatial data:
Missing data affecting combinations of influential variables
High sensitivity of the PCA model to concentration changes
Lack of spatial awareness by traditional MSPC
5.4.6 Online Fault Detection in a PM10 Network
The proposed augmented MSPC procedure (section 5.3.6) can independently confirm
measurements that traditional MSPC suspects to be erroneous. Under this procedure, the
difference between the measured and kriged values is considered significant if |x − x̂_k| >
3σ_OK. Applying the procedure to the relevant variables in samples 5, 20, 50 and 80 gives the
results shown in Table 5.3. Kriging identifies all but one (highlighted in red in Table 5.3) of
the mischaracterised variables from samples 5 and 20 as “good” based on their neighbours.
The variables in the actual faulty samples (50 and 80) have also been confirmed faulty by this
procedure. The single variable that kriging could not certify highlights one of the limitations
of the approach: the variable in question is the 6th variable in sample 5 and, as such, has the
fewest neighbours of the entire group shown.
Kriging efficiency decreases if a variable is at the edge, i.e. when all its neighbours are at
successive distances away from it [321]. This is because the Kriging weights are functions of
distance between pairs alone. They are independent of the value of the variable being
estimated. For such a monitor, the weight given to the nearest neighbour becomes more
influential and may result in a significant (larger than the estimation variance) over-prediction
or under-prediction, depending on the value of that neighbour.
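The verification step can be sketched as follows. This is an illustrative ordinary-kriging implementation in NumPy; the function names, and the linear variogram used in the usage comment, are assumptions rather than the variogram models fitted in this chapter. Solving the standard ordinary-kriging system yields both the estimate x̂_k and the variance σ²_OK, and a suspected fault is confirmed only when |x − x̂_k| > 3σ_OK:

```python
import numpy as np

def ordinary_kriging(coords, values, target, gamma):
    """Ordinary kriging estimate and variance at `target` from neighbouring
    monitors. `gamma` is a fitted semivariogram function of separation."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)          # pairwise semivariances between neighbours
    A[n, n] = 0.0                 # Lagrange-multiplier row/column
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    w, mu_lagrange = sol[:n], sol[n]
    estimate = w @ values
    variance = w @ b[:n] + mu_lagrange   # ordinary kriging variance
    return estimate, variance

def flag_is_confirmed(x_measured, coords, values, target, gamma):
    """Augmented-MSPC check: confirm a suspected fault only if the measurement
    deviates from its kriged estimate by more than 3 * sigma_OK."""
    est, var = ordinary_kriging(coords, values, target, gamma)
    return abs(x_measured - est) > 3.0 * np.sqrt(max(var, 0.0))

# Example with a hypothetical linear variogram gamma(h) = h:
# est, var = ordinary_kriging(coords, values, target, lambda h: h)
```

The edge effect discussed above appears here as a larger σ_OK (and less stable weights) when all neighbours lie to one side of the target.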
Table 5.3: Augmented MSPC results showing deviation of corrupted variables from their kriged
estimates and the kriging estimator's variance

Sample no.       x        x̂_k     |x − x̂_k|    3σ_OK
    5         163.10    178.36      15.26      11.69
              201.37    207.39       6.02       8.70
              240.54    233.67       6.87      12.64
              172.32    179.37       7.05       9.04
              190.38    187.84       2.54       9.08
              161.24    164.98       3.74      13.32
   20         116.32    104.76      11.56      15.53
              174.09    171.44       2.65       7.65
              131.35    139.64       8.29       9.41
              127.93    126.24       1.69      10.81
              118.40    124.91       6.51       9.71
              116.70     98.89      17.81       7.96
   50         326.42    240.84      85.58      14.53
              186.80    220.37      33.57      16.33
              414.27    346.65      67.62      10.11
              418.03    302.17     115.86       7.61
              282.35    314.51      32.16      13.77
              376.38    266.41     109.97       8.42
   80         108.25     87.29      20.96       9.10
              325.89    191.30     134.59      12.27
              141.06    178.51      37.45       9.16
              432.72    280.23     152.49       8.29
              438.95    302.65     136.30      16.44
              298.07    274.55      23.52       9.47
The variance of the kriging estimator is a reliable assessor of the significance of a deviation
because it is independent of the estimated values: it remains relatively stable across samples
even as the values being measured change considerably. For monitors with sufficient
neighbours, kriging estimation therefore works well in verifying or rejecting an SPE detection
of a faulty monitor.
5.5 Discussion
5.5.1 Integrated Fault Detection, Identification and Reconstruction
in a PM10 Network
In this work, a model-based process monitoring scheme capable of fault detection and
identification used in chemometrics and industrial processes was adapted and successfully
applied to PM10 dispersion over London. The conventional MSPC methodology was first
fortified with an online missing data handling technique that is less sensitive to multiple
monitor failures. To account for the non-normal distribution of the data, a non-parametric
approach to control limit calculation using Kernel Density Estimation was adopted. Then, a
data-driven unbiased spatial interpolation method, Kriging, was integrated into the robust
fault detection to enable validation of the detection process as well as reconstruction of
validated ‘faults’. The intended application of this method is in monitoring, fault detection,
identification and reconstruction of spore concentrations coming from a biosensor network.
This has the potential to be applied to any automatically detected fungal spore of comparable
size (10-14 μm).
In adapting MSPC to this data, consideration was given to the likely types of fault that will
occur. In the process industry, an “optimal behaviour” of a process is clearly defined [331].
This may be maintaining a product (output) at a desirable quality or minimising energy or
input consumption below a certain threshold. The monitoring model is typically built offline
for this optimal process behaviour and faults, when they occur, will result in a deviation from
this desired specification that will cause a correlation breakdown [332]. For monitoring
networks, an ideal behaviour is not as clearly defined. To define the optimal process for this
application, the significance of each fault (false positive or false negative) was considered.
False negatives from the point of view of health, environmental and agricultural monitoring
networks are more costly than false positives [333]. This is because their effects are often
irreversible. Specifically, for the potential spore network, a false negative will entail not
warning growers/farmers to apply fungicides, subjecting crops to irreversible risk and/or
damage. In light of this, an optimal behaviour for the monitoring model was defined as a
sufficiently sensitive model that will detect bad measurements due to biosensor drifts or even
contamination. To achieve this specification in the monitoring model, the model was
iteratively built with samples having high T2 excluded from modelling at each step. The
implication of this is that the control limits were lower and more sensitive to small changes in
correlation structure, but also the monitoring scheme was more sensitive to false positives.
This was considered a necessary trade-off considering the higher cost of false negatives as
explained earlier. False positives were accounted for through the augmented MSPC scheme
where every detection was validated based on the spatial correlation that may have been
missed by the PCA model.
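The iterative model-building step described above can be sketched as follows. This is an illustrative NumPy implementation; the 95% retention quantile and the three iterations are assumed values for the example:

```python
import numpy as np

def _fit(X, n_components):
    """Helper: fit a PCA model (mean, loadings, PC variances)."""
    mu = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    P = Vt[:n_components].T
    lam = s[:n_components] ** 2 / (len(X) - 1)
    return mu, P, lam

def iterative_train(X, n_components, quantile=0.95, n_iter=3):
    """Iteratively refit the PCA model, discarding samples whose Hotelling T2
    exceeds the chosen quantile at each step. The retained 'normal' subset
    yields tighter control limits: a monitor more sensitive to small changes
    in correlation structure (more false positives, fewer false negatives)."""
    for _ in range(n_iter):
        mu, P, lam = _fit(X, n_components)
        t2 = np.sum(((X - mu) @ P) ** 2 / lam, axis=1)
        X = X[t2 <= np.quantile(t2, quantile)]
    return X, *_fit(X, n_components)
```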
The results show that MSPC can be extended to spatial systems such as the PM10 monitoring
network, potential biosensor and a host of other environmental monitoring applications. PCA
analysis showed that there is a high redundancy in the PM10 network, suggesting some
monitors can be decommissioned without loss of data. This is in agreement with multiple
studies carried out across the world’s cities that found high redundancies in air quality
monitoring networks [325-329, 334]. The capability of a PCA-based data validation
technique to detect unusual samples was also demonstrated. The integrated missing data
method, PMP, was shown to be robust to up to 25% missing data, a level higher than most
online missing data methods typically support [293, 303, 335].
However, when missing data was in influential variables, the monitoring scheme was found
to be susceptible to false alarms, due to the higher values of the score estimation errors
under such circumstances [305]. Dealing with missing data in online applications is
challenging. Recent applications of online monitoring to real-world systems either ignore it
[332] or propose solutions without stating how much missing data the solution can handle
[301]. In this work, it was shown that the proposed monitoring scheme can handle up to
25% missing data while maintaining its ability to detect erroneous measurements. MSPC also
successfully identified faults (monitors that reported values outside the ideal system
behaviour defined by the model), although there were some false positives associated with
samples that were simulated to reflect source events. These were attributed to an assumption
of Gaussian distribution by PCA [336] and its subsequent inability to identify local spatial
correlation events in PM10 distribution, which can result in spatial heterogeneity [337]. The
developed augmented scheme was able to validate these false detections and independently
reconstruct faults using Kriging. The kriging estimator variance, σ²_OK, used as a confidence
threshold for data reconstruction, successfully validated all but one wrong MSPC detection. It
was noted that the missed variable (monitor) was at the edge of the Kriging domain and, as
such, had the fewest neighbours available for estimation.
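The score-estimation idea behind the integrated missing data method can be illustrated with a least-squares projection onto the model plane using only the observed rows of the loading matrix; whether this matches the exact PMP variant used in this work is an assumption of the sketch:

```python
import numpy as np

def pmp_scores(x, mu, P):
    """Estimate PCA scores for a sample with NaN-marked missing entries by
    projecting the observed variables onto the model plane: a least-squares
    fit using only the observed rows of the loading matrix P. The model
    reconstruction mu + P @ t then fills the missing entries."""
    obs = ~np.isnan(x)
    Po = P[obs]                                       # loadings of observed variables
    t, *_ = np.linalg.lstsq(Po, x[obs] - mu[obs], rcond=None)
    x_hat = mu + P @ t                                # reconstruction, gaps filled
    return t, x_hat
```

When the missing variables are influential (their loading rows carry most of the weight), the least-squares system becomes poorly conditioned and the score estimation error grows, which is the false-alarm mechanism noted above.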
Some of the component methodologies used in this work have been applied to PM10 albeit
in a different manner. PCA has been used mainly for dimension reduction to identify
redundancy [325, 327-329, 334]. It has also been used in source apportionment of PM10 and
PM2.5 concentrations, where the pattern recognition (clustering) power of the technique is
exploited [338, 339]. In these applications, the PCA models were found to explain a high
percentage variance (often > 90%) of the PM10 concentration. In this work, the explained
variance by the validated PCA model was 69%. The difference in the explained variances is
due to two reasons. First, the studies found in literature were motivated by maximum
explained variance and so were not restricted by the PRESS. The PRESS can indicate
overfitting, a situation where PCs that describe residuals and not useful information in the
data are added to the model. The effect of overfitting is more pronounced when a model is
used to make predictions (estimating scores in this case), but it is usually less severe for
pattern recognition, although it can bias findings [340]. Second, the studies used multiyear
data (at least five years) as opposed to the single-year data used in this study. Using multiyear data can
strengthen the temporal correlations in the data, especially with respect to seasonal
variations. This increases the explained variance by the model. Multiyear data was not used
in this work because using a single year of data (specifically 2009-2010) maximised the number
of stations available.
Different variations of Kriging have also been used to predict gaseous pollutants [341],
reconstruct spatiotemporal fields [302] and estimate PM10 concentrations [317, 324, 342].
Most notably, Wong et al. [317] applied kriging to estimate PM10 concentrations in the United
States of America. They found good agreement nationally between measured data and kriged
estimates although kriging performed badly in regions with poor distribution of monitors
where deployment was influenced by expected exceedances. In this study, a better
agreement with measured data (an aggregated R² = 0.89 across all samples with all four
variograms) was found, owing to a higher density of monitors and a variogram-fitting
methodology that maximised spatial continuity. The hour-of-day averaging employed reduced
the effect of local sources, thus improving variogram fit.
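For reference, the empirical variogram that such a fitting methodology starts from can be computed as in the NumPy sketch below; the binning scheme and function name are assumptions for the illustration:

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Empirical semivariogram: half the mean squared difference between
    station pairs, binned by pairwise separation distance."""
    n = len(values)
    i, j = np.triu_indices(n, k=1)                      # all unordered pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)   # pair separations
    sq = 0.5 * (values[i] - values[j]) ** 2             # semivariance per pair
    which = np.digitize(h, bin_edges) - 1
    gamma = np.array([sq[which == b].mean() if np.any(which == b) else np.nan
                      for b in range(len(bin_edges) - 1)])
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, gamma
```

Averaging by hour of day before this computation, as described above, reduces the nugget-like scatter that local sources introduce at short lags.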
No application was found in literature of PCA being used as a monitoring model or of a similar
(to this work) integrated scheme being applied to PM10 networks. This is partly because data
validation (Quality Assurance) for environmental networks (PM10 and Weather) is done in-
house, usually governed by agency standards. For example, the US Environmental Protection
Agency (USEPA) uses a 4-level data validation process to check the health of air quality
measurements at each individual sensor location [343]. These networks also use sampling
equipment that is more reliable than the spore biosensor that is the intended beneficiary of
this work. For example, the Tapered Element Oscillating Microbalance (TEOM) analysers
predominantly used in the LAQN have a precision of ±0.5 μg m−3 [344] over a measurement range
of 0 to 1 g m−3, although a correction factor is added to standardise measurements [345]. This
is high precision compared to what is achievable for biosensors at this stage of their
technological development [346].
A few relevant methodologies were however identified. Hill and Minsker [347] propose a
method of anomaly detection for environmental sensor networks based on a time series
approach that compares new observations with previous ones over a moving window. If this
difference is higher than a threshold, the measurement is classified as anomalous and its record
is removed from future updates of the underlying time-series model. The main limitation of
this approach is that extending it to multiple sensors over a large network will be
computationally expensive, since these steps operate at the individual sensor level. The main
attraction of the method proposed here is that the dimension reduction capabilities of PCA
ensure that only a few variables (sensors) are checked to determine if a fault has occurred,
since only one score is calculated and evaluated for fault at the detection stage. The method
proposed by Hill and Minsker [347] will require a check at every sensor node for each sample
to detect anomalies. In fact, the approach proposed in this work will retain an advantage over
most change-detection methods that are implemented in the variable space. This is because
the dimension reduction of PCA allows a system of hundreds of sensors to be adequately
described by significantly fewer pseudo-variables in the PC space, and only a single score will
need to be computed to detect a fault over the entire network at any time. Most of the data
guidelines used in meteorological data validation suffer from this one-by-one approach to
sensor error detection [348]. They additionally do not have missing data handling capabilities
[347, 348]. The application of this integrated fault detection, identification and reconstruction
scheme is therefore considered advantageous over existing methods and unique.
The choice of kriging was based on its reliability in spatiotemporal and environmental data
applications [319, 349]. Methods like Artificial Neural Networks (ANNs) [350, 351], which
are self-learning black-box models, are promising but can suffer from over- or under-learning
[349], leading to errors. There are no universally good interpolation methods; suitability
depends on data attributes [319]. In this work, the density of the sampling network and the
spatial continuity of concentration for the period in question made Kriging an ideal choice.
Additionally, Kriging provides a measure of estimation precision, σ²_OK, which has proven
useful in assessing the reliability of the reconstructed value. Land Use Regression (LUR), a
highly successful method of predicting PM10 in recent years, was also ruled out due to its
input requirements. LUR requires covariates as inputs, usually source information such as intensity
and emission details [352], in addition to monitoring station values. In the intended
application of this technique (Sclerotinia spore dispersion), source information, by far the
most important covariate, is normally not available [29]. Unlike PM10, for which land use
attributes, such as roads, industrial buildings, etc., co-vary with emission patterns, spore
dispersion is mostly dependent on source attributes (size and strength) which are typically
not available. Kriging only requires readily available inputs - existing monitoring stations and
station location information - to make estimates, and this is why it was preferred.
Over the last two decades, fungal diseases have caused unprecedented damage to both
plants and animals, and they constitute a major threat to food security [9]. A majority of these
pathogens are dispersed by air over large distances [3] and are the most dominant species among
bioaerosols within the 2-10 μm size range [353]. Detecting these spores efficiently has been
the biggest impediment to addressing the challenge they pose, as a result of limitations in
current measurement and sampling equipment [10]. Measurement of these pathogens has
relied on a two-step process, where spore collection and quantification are done at different
stages. The reliable quantification methods are time-consuming and tedious (e.g. qPCR [63]),
thus discouraging the large-scale data collection that would improve our
understanding of governing dispersal processes. In a recent Nature article, Fisher et al. [9]
argue that current modelling approaches and small-scale experiments cannot fully predict
disease spread and severity. They instead call for intensive monitoring and surveillance of
fungal pathogens. Empirical approaches to data collection will enable building more robust
models of both spore dispersal and disease prediction than the local, situation-specific
phenomenological models limited data collection currently allows [258]. The advent of rapid
measurement techniques in airborne pathogen detection, specifically biosensors, provides
hope that automatic detection will enable the realisation of this empirical approach [13, 29].
However, the detection techniques, while promising, are still at the infant stages of their
technological development [171]. Biosensors are less precise, less accurate and generally
less reliable than conventional sensors due to imperfections in the synergy between
biological reactions and electrochemistry [182]. These limitations at the individual biosensor
level will most likely be compounded when they are deployed in a network for large-scale
sampling, where they will have to cope with challenging environmental factors and vandalism
[354]. Data integrity and validation are therefore needed to address the enormous challenges
facing these networks, specifically with respect to missing data and measurement errors
[354]. The methodology proposed in this work is in line with this need and can be extended
to any fungal spore detection network.
5.5.2 Limitations of K-MSPC
The proposed integrated scheme has a number of limitations. First, the PCA model used in
this work is a linear model. It assumes a multivariate Gaussian distribution of the data [336] and
will, as has been observed, be susceptible to false positives when these assumptions are
violated, even for faultless observations. The nonlinearities in PM10 dispersion, although mild,
were suppressed in this case by the effect of the predominant wind direction on dispersal and
by the abundance of training data. The large disparity between the number of samples used to
train the model and the number of new observations used (500) meant that the assumptions of stationarity
were not violated. In real applications, the process (PM10 dispersion) may be nonstationary
and there will be a need for the model to adapt so that incessant false alarms are avoided.
To cope with such process shifts, an adaptive PCA scheme [355, 356] should instead be used.
Second, this methodology is intended to be applied online. Currently, variogram fitting is
manual and implemented in a separate package from the MSPC scheme. Incorporating recent
methods of automatic variogram fitting [357] would enable the full integration of data
reconstruction (Kriging) with fault detection and identification, and would fully automate the scheme.
Another limitation of this work relates to the aforementioned data non-stationarity. Due to
the amount of historical data available for this work, individual variograms were fitted for
each sample kriged. This ensured that the assumption of mean constancy was valid. During
real data monitoring, the spatial structure of the surface could substantially change and
ordinary kriging may give poor estimates [319]. For this reason, it is recommended that
Universal Kriging [319, 341], which accounts for nonstationarity in the data, should be used.
5.6 Conclusion
In this chapter, the potential of multivariate analysis tools for use in the proposed biosensor
network has been demonstrated. The utility of dimension reduction methods that allow easy
analysis of high-dimensional data was demonstrated first. The initial analysis was able to
identify redundancies in the pollution monitoring network. MSPC was introduced, equipped
with missing data handling capabilities and applied to a PM10 monitoring network with
a view to optimising it for the potential biosensor network. Confidence limits based on
non-parametric density estimation techniques successfully and consistently detected simulated
faults. It was observed that the effectiveness of missing data techniques depends on efficient
deployment of the network, because this minimises the probability that critical combinations
of variables become missing. MSPC was able to handle up to 25% missing
data even when variables were contiguously missing, demonstrating its potential for real
world extension to spatial networks.
Areas of concern in the extension of traditional MSPC to spatial data were identified. The mild
nonlinearity in dispersion processes, arising from the short-term nonlinearity of wind variables,
was one of the main concerns. This nonlinearity is believed to be
responsible for the relatively high proportion of the process variance left to the residuals
(31%). Another area of concern arises from the subtlety of correlation breakdowns due to
change in concentration values compared to industrial processes where a deviation from a
clearly defined specification is of interest. This was addressed by making the underlying PCA
model highly sensitive by excluding all values larger than the 95th percentile of the data
distribution.
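The non-parametric control-limit idea referred to above can be sketched as follows: estimate the density of a monitoring statistic with a Gaussian kernel density estimate and read the limit off the estimated CDF. The bandwidth rule (Silverman's rule of thumb), grid size and function name are assumptions for the illustration:

```python
import numpy as np

def kde_control_limit(stat, alpha=0.99, grid_size=512):
    """Control limit from a Gaussian KDE of a monitoring statistic, rather
    than from an assumed F or chi-square distribution. The limit is the
    point where the estimated CDF first reaches `alpha`."""
    stat = np.asarray(stat, dtype=float)
    h = 1.06 * stat.std() * len(stat) ** (-1 / 5)   # Silverman's rule of thumb
    grid = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, grid_size)
    pdf = np.exp(-0.5 * ((grid[:, None] - stat[None, :]) / h) ** 2).sum(axis=1)
    pdf /= pdf.sum()                                # discrete density on the grid
    cdf = np.cumsum(pdf)
    return grid[np.searchsorted(cdf, alpha)]
```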
After adapting and applying traditional MSPC to the process, limitations of MSPC were
identified as false alarms resulting from the underlying model’s high sensitivity and PCA’s
inability to exploit spatial correlation. This was demonstrated through a simulated scenario
where MSPC mischaracterised a potentially healthy measurement as an erroneous one.
A novel augmented MSPC procedure (K-MSPC), which uses Kriging to independently verify or
reject detections and to reconstruct faulty measurements, was proposed to address these
limitations. K-MSPC successfully certified the mischaracterised samples as healthy.
Chapter 6 Conclusion, Recommendations and Future Work
This chapter summarises the principal findings of this study and identifies opportunities for
further research. The summary of research undertaken is divided into four parts:
Overview of Project Conception and Motivation
Summary of Principal Findings
Real world Applications of Research
Future Work
6.1 Overview of Research Motivation
The PhD programme presented in this thesis is the result of a four-year multidisciplinary effort
to improve agricultural innovation. With the global population exploding and competition for
increasingly scarce resources rising, achieving and maintaining food security is the foremost
challenge of this century. The broader research area investigated during this programme was
conceived from a goal to contribute towards solving this challenge through the reduction of
crop loss and minimisation of fungicide use. This was to be achieved through the introduction
of an empirical approach to agricultural disease monitoring.
The SYIELD project, initiated by a consortium involving the University of Manchester, Syngenta
and Gwent, among others, sought to address this by proposing a network of biosensors that can
electrochemically detect airborne pathogens by exploiting the biology of plant-pathogen
interaction. This approach offers significant improvements on the current inefficient,
imprecise and largely theoretical or experimental methods used. The proposed biosensor
network approach will make actionable data available and enable the adoption of advanced
data analysis tools from other disciplines that will make disease risk forecasting robust,
simplify quarantine measures and make crop protection and fungicide use more efficient.
Within this context, this PhD focused on the adoption of multidisciplinary methods to address
three key objectives that are central to the success of the SYIELD project: local spore ingress
near canopies, the evaluation of a suitable model that can describe and estimate spore travel
distances, and multivariate analysis of a potential pathogen-detecting biosensor network.
6.2 Summary of Principal Findings
A brief summary of the research work done and the main findings thereof are given here.
The main areas are addressed as follows.
6.2.1 Field trial experiment and generation of novel data
The local transport of spores in an OSR canopy was investigated by carrying out a field trial
experiment at Rothamsted Research UK. The aim of the research was to investigate spore
ingress in OSR canopies, generate reliable data for testing the prototype biosensor and
evaluate a trajectory model. During the experiment, spores were air-sampled and quantified
using various quantification methods. Colourimetric detection and the prototype biosensor
were used to test for oxalic acid, an established pathogenicity factor of Sclerotinia spores,
and quantitative Polymerase Chain Reaction (qPCR), a DNA amplification technique, was used
to measure actual spore concentration. As expected, qPCR results outperformed the proxy
measurements. The results provided an insight into the filtration effect of OSR canopies and
heavy ground deposition of spores near the source. The research also enabled the evaluation
of various sampling heights (potential deployment heights of biosensors) from which an
optimal height was identified. Results from test of oxalic acid with the prototype biosensors
and colourimetric test also revealed a low sensitivity for the former, suggesting proxy
measurements may not be reliable in live deployments where spores are likely to be
contaminated by impurities and inhibitors of acid production. The actual spore results
measured using qPCR proved informative and provide a novel source of data that will be
useful for a wide array of applications. This data was found to fit a power decay law, a finding
that is consistent with experiments involving fungal spores in other crops.
6.2.2 Evaluating a 3D bLS model with experimental data
In the second area investigated, a 3D backward Lagrangian Stochastic (bLS) model was
parameterised and evaluated with the field trial data. The bLS model was chosen because
spore ingress, rather than spore concentration, was of primary concern. A model’s ability to
estimate concentrations reliably is a good indicator of its ability to compute trajectories, since
concentrations are computed from the residence times of ensemble trajectories. For this reason,
the evaluation of a bLS model on experimental data was carried out. The final aim of this
aspect of the work is to employ this model to estimate minimum distances of separation of
biosensors and this is a subject of ongoing research. The bLS model, parameterised with
Monin-Obukhov Similarity Theory (MOST) variables, showed good agreement with
experimental data and compared favourably in terms of performance statistics with a recent
application of an LS model in a maize canopy. Results obtained from the model were found
to be more accurate above the canopy than below it. This was attributed to a higher error
during initialisation of release velocities below the canopy. Overall, the bLS model performed
well partly because the experiments that generated the data were carried out in ideal
conditions for MOST validity.
6.2.3 Multivariate data analysis of potential sensor network
The final area of focus was the monitoring of a potential citywide biosensor network. The
purpose of this section of the research was to investigate data integrity concerns that would
arise from a citywide and potentially nationwide unsupervised network of biosensors with
multiple components of finite reliabilities. A novel framework based on Multivariate Statistical
Process Control (MSPC) concepts was proposed and applied to data from a pollution-
monitoring network. The monitoring data was of PM10 particles, which have aerodynamic and
dispersal characteristics similar to those of Sclerotinia spores. The monitoring scheme
was based on a PCA model that was trained with PM10 data covering a period of one year.
This data was first analysed to demonstrate the potential utility of PCA's dimension reduction
and data analysis for the biosensor network.
The initial analysis identified redundancies in the PM10 network based on the visual
advantage PCA offers in reduced dimensional space. The monitoring scheme was then
implemented on a refined PCA model, which incorporated missing data handling capabilities.
Missing measurements are a significant challenge in real-world applications of network
monitoring due to a number of reasons ranging from mechanical failures to theft and
vandalism.
To deal with the reality that most natural processes, and, therefore, practical data, do not
conform to normality assumptions, a non-parametric approach was employed to specify the
control limits of the monitoring process.
The adaptation of the MSPC framework to PM10 data identified areas of interest in the
application of monitoring schemes to spatial networks. The analysis suggested that missing
data methods work better when the network is efficiently deployed in such a way that all
biosensors have comparable influences. The analysis also indicated that in these cases of
efficient deployment, the system could handle high amounts of missing data (up
to 25%). Missing data issues become more challenging when measurements are missing in
contiguous blocks, e.g. when multiple neighbouring sensors fail. This can be avoided by
prompt deployment to replace or repair faulty or vandalised biosensors. Further, the main
limitation of traditional MSPC in spatial data applications was identified as a lack of spatial
awareness by the PCA model when considering correlation breakdowns due to an incoming
erroneous observation. This resulted in misidentification of healthy measurements as
erroneous in this study. The proposed augmented MSPC approach was able to incorporate
this capability. The proposed approach also introduced an assessment metric to test deviation
significance in the form of the kriging estimator variance. This is believed to be a robust
metric because it is independent of the values being estimated.
6.3 Real world applications of research
In addition to the real world applications addressed throughout the duration of the study, the
findings from this work are extendable to a wide variety of areas. The monitoring scheme
developed can be extended to any type of measurable spore or air-dispersed particulate
of similar size. In addition, the deployment of a biosensor network will provide actionable
disease prediction data on a scale that has not been seen before. This opens up
agriculture to Big Data tools that can go a long way towards reducing crop loss and,
potentially, hunger and famine.
6.4 Further areas of research
A number of research areas were identified during the course of this work. Some are extensions of the work done here, while others offer fresh perspectives. A few of the identified areas are briefly discussed below:
In this work, a static PCA model was used to implement MSPC. Static models cannot
adapt to a shift in process behaviour as they only act according to the information
contained in the data during model building. Particulate matter dispersal as well as
that of spores and pollen may follow a seasonal pattern, in which case a monitoring
system based on static PCA cannot adapt. This will result in false alarms or no alarms
at all, as the control limits of the control charts will no longer be valid. A static model was used in this work because historical data (a year of hourly measurements) were available. Such data are not always available in practice, particularly in the pilot phase of network monitoring, before any prior data have been collected. In these cases, monitoring schemes based on dynamic models such as recursive PCA, which recalculates the model parameters as new information arrives, may be more beneficial. This way, the control limits adapt to the current state of the process.
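One way to realise such an adaptive scheme is to update the sample mean and covariance with a forgetting factor and re-derive the loadings from the updated covariance. The sketch below is an illustrative formulation only; the exponentially weighted update and the forgetting-factor value are assumptions, not the recursion of any specific published algorithm:

```python
import numpy as np

class RecursivePCA:
    """Recursive PCA sketch: the mean and covariance are updated with a
    forgetting factor so the model (and the control limits derived from
    it) can track slow seasonal drift instead of staying fixed."""

    def __init__(self, n_vars, n_components, forgetting=0.99):
        self.mu = np.zeros(n_vars)
        self.cov = np.eye(n_vars)
        self.k = n_components
        self.lam = forgetting

    def update(self, x):
        x = np.asarray(x, dtype=float)
        # Exponentially weighted updates of mean and covariance
        self.mu = self.lam * self.mu + (1 - self.lam) * x
        d = x - self.mu
        self.cov = self.lam * self.cov + (1 - self.lam) * np.outer(d, d)
        # Re-derive the loadings from the updated covariance
        vals, vecs = np.linalg.eigh(self.cov)
        self.loadings = vecs[:, np.argsort(vals)[::-1][: self.k]]
        return self.loadings

# Feed the monitor a stream of correlated 4-variable observations
rng = np.random.default_rng(1)
monitor = RecursivePCA(n_vars=4, n_components=2)
for _ in range(300):
    z = rng.normal(size=2)
    x = np.array([z[0], z[0] + 0.1 * rng.normal(),
                  z[1], z[1] + 0.1 * rng.normal()])
    loadings = monitor.update(x)
```

Values of the forgetting factor close to 1 give slow adaptation (long memory); smaller values track faster drift at the cost of noisier limits.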
Another interesting area is the efficient deployment of biosensors. The dimension-reduction capabilities of PCA have already been demonstrated and could be combined with near-optimal placement strategies. In sensor-coverage problems, optimal-location algorithms select sensor positions (near-)optimally from a candidate set of locations; when this initial search space is large, the search problem becomes NP-hard. PCA models could be used to identify a reduced set of candidate locations to serve as the initial search space.
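The role of the candidate set can be illustrated with a simple greedy subset-selection heuristic, the usual workaround for the NP-hard exact search. This is an illustrative proxy only, assuming residual energy after deflation as the coverage criterion; it is not an algorithm from this work:

```python
import numpy as np

def greedy_sensor_subset(X, k):
    """Greedy (near-optimal) selection of k sensor columns from a data
    matrix X (rows = time, columns = candidate locations): repeatedly
    pick the candidate with the largest residual energy and deflate."""
    resid = X - X.mean(axis=0)
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        energies = (resid[:, remaining] ** 2).sum(axis=0)
        j = remaining[int(np.argmax(energies))]
        chosen.append(j)
        remaining.remove(j)
        # Deflate: remove the component spanned by the chosen column
        q = resid[:, j] / np.linalg.norm(resid[:, j])
        resid = resid - np.outer(q, q @ resid)
    return chosen

# Candidate location 0 carries far more variance than the others,
# so it should be selected first
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
X[:, 0] *= 10.0
picked = greedy_sensor_subset(X, 2)
```

Each greedy step scans the remaining candidates, so shrinking the candidate set with a PCA-based screen directly reduces the per-step cost of the search.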
Another promising area is pathogen biosensor production. The exploitation of pathogen/host interaction pioneered by the SYIELD project shows promise in many areas. Most importantly, current biosensors need improvement in sensitivity and specificity: the proposed SYIELD biosensor takes three days between collection and detection of a sample. A faster response would significantly improve data quality and informativeness.
References
1. United Nations Department of Economic and Social Affairs, P.D., World Population Prospects: The 2012 Revision, Highlights and Advance Tables. 2013.
2. Alexandratos, N. and J. Bruinsma, World agriculture towards 2030/2050: the 2012 revision. 2012, ESA Working paper Rome, FAO.
3. Brown, J.K. and M.S. Hovmøller, Aerial dispersal of pathogens on the global and continental scales and its impact on plant disease. Science, 2002. 297(5581): p. 537-
541. 4. Oerke, E.-C. and H.-W. Dehne, Safeguarding production—losses in major crops and
the role of crop protection. Crop Protection, 2004. 23(4): p. 275-285.
5. Ahemad, M. and M.S. Khan, Biotoxic impact of fungicides on plant growth promoting activities of phosphate-solubilizing Klebsiella sp. isolated from mustard (Brassica campestris) rhizosphere. Journal of Pest Science, 2012. 85(1): p. 29-36.
6. Clarkson, J.P., et al., Forecasting Sclerotinia Disease on Lettuce: A Predictive Model for Carpogenic Germination of Sclerotinia sclerotiorum Sclerotia. Phytopathology, 2007. 97(5): p. 621-631.
7. Varraillon, T., et al. RAISO-Scléro: a decision support system to follow up petal contamination of sclerotinia in oilseed rape. in 13th International Rapeseed Congress. 2011. Prague, Czech. Republic.
8. Koch, S., et al., A crop loss-related forecasting model for Sclerotinia stem rot in winter oilseed rape. Phytopathology, 2007. 97(9): p. 1186-1194.
9. Fisher, M.C., et al., Emerging fungal threats to animal, plant and ecosystem health. Nature, 2012. 484(7393): p. 186-194.
10. Jackson, S. and K. Bayliss, Spore traps need improvement to fulfil plant biosecurity requirements. Plant Pathology, 2011. 60(5): p. 801-810.
11. West, J.S. and R.B.E. Kimber, Innovations in air sampling to detect plant pathogens. Annals of Applied Biology, 2015. 166(1): p. 4-17.
12. Sankaran, S., et al., A review of advanced techniques for detecting plant diseases. Computers and Electronics in Agriculture, 2010. 72(1): p. 1-13.
13. Heard, S. and J.S. West, New developments in identification and quantification of airborne inoculum, in Detection and Diagnostics of Plant Pathogens. 2014, Springer.
p. 3-19. 14. Bolton, M.D., B.P.H.J. Thomma, and B.D. Nelson, Sclerotinia sclerotiorum (Lib.) de
Bary: biology and molecular traits of a cosmopolitan pathogen. Molecular Plant
Pathology, 2006. 7(1): p. 1-16. 15. Boland, G. and R. Hall, Index of plant hosts of Sclerotinia sclerotiorum. Canadian
Journal of Plant Pathology, 1994. 16(2): p. 93-108. 16. Hegedus, D.D. and S.R. Rimmer, Sclerotinia sclerotiorum: When “to be or not to be”
a pathogen? FEMS microbiology letters, 2005. 251(2): p. 177-184. 17. McCartney, H.A. and M.E. Lacey, The relationship between the release of ascospores
of Sclerotinia sclerotiorum, infection and disease in sunflower plots in the United Kingdom. Grana, 1991. 30(2): p. 486-492.
18. Raynal, G., Kinetics of the ascospore production of Sclerotinia trifoliorum Eriks in growth chamber and under natural climatic conditions. Practical and epidemiological incidence. Agronomie, 1990. 10(7): p. 561-572.
19. Clarkson, J.P., et al., Ascospore release and survival in Sclerotinia sclerotiorum. Mycological Research, 2003. 107(2): p. 213-222.
20. Ingold, C.T., Fungal spores. Their liberation and dispersal. Fungal spores. Their
liberation and dispersal., 1971. 21. McCartney, H. and M.E. Lacey, Wind dispersal of pollen from crops of oilseed rape
(< i> Brassica napus</i> L.). Journal of Aerosol Science, 1991. 22(4): p. 467-477.
22. Newton, H. and L. Sequeira, Ascospores as the primary infective propagule of Sclerotinia sclerotiorum in Wisconsin. Plant Disease Reporter, 1972. 56(9): p. 798-
802. 23. Lacey, J., Spore dispersal—its role in ecology and disease: the British contribution to
fungal aerobiology. Mycological research, 1996. 100(6): p. 641-660.
173
24. Lacey, J., reproduction: patterns of spore production, liberation and dispersal. Water,
Fungi, and Plants, 1986(11): p. 65. 25. Ingold, C.T., Active liberation of reproductive units in terrestrial fungi. Mycologist,
1999. 13(3): p. 113-116. 26. Roper, M., et al., Dispersal of fungal spores on a cooperatively generated wind.
Proceedings of the National Academy of Sciences, 2010. 107(41): p. 17474-17479.
27. Qandah, I.S. and L. del Río Mendoza, Temporal dispersal patterns of Sclerotinia sclerotiorum ascospores during canola flowering. Canadian Journal of Plant
Pathology, 2011. 33(2): p. 159-167. 28. Hartill, W.F.T., Aerobiology of Sclerotinia sclerotiorum and Botrytis cinerea spores in
New Zealand tobacco crops. New Z. J. Agric. Res, 1980. 23: p. 259–262. 29. West, J.S., S.D. Atkins, and B.D. Fitt, Detection of airborne plant pathogens; halting
epidemics before they start. Outlooks on Pest Management, 2009. 20(1): p. 11-14.
30. Suzui, T. and T. Kobayashi, Dispersal of ascospores of Sclerotinia sclerotiorum (Lib.) de Bary on kidney bean plants. Part 1. Dispersal of ascospores from a point source of apothecia. Hokkaido Nat. Agric. Exp. Stn. Bull., 1972a. 101: p. 137-151.
31. Dupont, S. and Y. Brunet, Influence of foliar density profile on canopy flow: A large-eddy simulation study. Agricultural and Forest Meteorology, 2008. 148(6–7): p. 976-
990. 32. Aylor, D.E., Y. Wang, and D.R. Miller, Intermittent wind close to the ground within a
grass canopy. Boundary-Layer Meteorology, 1993. 66(4): p. 427-448. 33. McCartney, H. and D. Aylor, Relative contributions of sedimentation and impaction to
deposition of particles in a crop canopy. Agricultural and forest meteorology, 1987. 40(4): p. 343-358.
34. Wilson, J., et al., Statistics of atmospheric turbulence within and above a corn canopy. Boundary-Layer Meteorology, 1982. 24(4): p. 495-519.
35. Seginer, I., et al., Turbulent flow in a model plant canopy. Boundary-Layer
Meteorology, 1976. 10(4): p. 423-453. 36. Raupach, M., J. Finnigan, and Y. Brunei, Coherent eddies and turbulence in
vegetation canopies: the mixing-layer analogy. Boundary-Layer Meteorology, 1996.
78(3-4): p. 351-382. 37. Poggi, D., et al., The effect of vegetation density on canopy sub-layer turbulence.
Boundary-Layer Meteorology, 2004. 111(3): p. 565-587. 38. Kaimal, J.C. and J.J. Finnigan, Atmospheric boundary layer flows: their structure and
measurement. 1994.
39. Wilson, J., Turbulent transport within the plant canopy. Estimation of Areal Evapotranspiration, 1989. 177: p. 43-80.
40. Andrade, D., et al., Modeling soybean rust spore escape from infected canopies: model description and preliminary results. Journal of Applied Meteorology and
Climatology, 2009. 48(4): p. 789-803. 41. McCartney, H. and B. Fitt, Dispersal of foliar fungal plant pathogens: mechanisms,
gradients and spatial patterns, in The epidemiology of plant diseases. 1998, Springer.
p. 138-160. 42. Suzui, T. and T.H. Kobayashi, Dispersal of ascospores of Sclerotinia sclerotiorum
(Lib.) de Bary on kidney bean plants. Part 2. Dispersal of ascospores in the Tokachi District Hokkaido. Nat. Agric. Exp. Stn. Bull., 1972b. 102(61-68).
43. Boland, G.J. and R. Hall, Relationships between the spatial pattern and number of apothecia of Sclerotinia sclerotiorum and stem rot of soybean. Plant Pathology, 1988. 37(329-336).
44. McCartney, A. and J. West, Dispersal of fungal spores through the air, in Mycology Series. 2007. p. 65.
45. Fitt, B.D., et al., Spore dispersal and plant disease gradients; a comparison between two empirical models. Journal of Phytopathology, 1987. 118(3): p. 227-242.
46. Spijkerboer, H., et al., Ability of the Gaussian plume model to predict and describe spore dispersal over a potato crop. Ecological modelling, 2002. 155(1): p. 1-18.
174
47. Skelsey, P., A.A.M. Holtslag, and W. van der Werf, Development and validation of a quasi-Gaussian plume model for the transport of botanical spores. Agricultural and Forest Meteorology, 2008. 148(8–9): p. 1383-1394.
48. Wilson, J.D., Trajectory Models for Heavy Particles in Atmospheric Turbulence: Comparison with Observations. Journal of Applied Meteorology, 2000. 39(11): p.
1894-1912.
49. Reynolds, A., Development and Validation of a Lagrangian Probability Density Function Model of Horizontally-Homogeneous Turbulence Within and Above Plant Canopies. Boundary-layer meteorology, 2012. 142(2): p. 193-205.
50. de Jong, M.D., et al., A model of the escape of< i> Sclerotinia sclerotiorum</i> ascospores from pasture. Ecological Modelling, 2002. 150(1): p. 83-105.
51. Aylor, D. and G. Taylor, Escape of Peronospora tabacina spores from a field of diseased tobacco plants. Phytopathology, 1983. 73(4): p. 525-529.
52. Aylor, D.E. and F.J. Ferrandino, Rebound of pollen and spores during deposition on cylinders by inertial impaction. Atmospheric Environment (1967), 1985. 19(5): p.
803-806. 53. Thomson, D. and J. Wilson, History of Lagrangian stochastic models for turbulent
dispersion. Lagrangian Modeling of the Atmosphere, 2013: p. 19-36.
54. Wilson, J.D. and B.L. Sawford, Review of Lagrangian stochastic models for trajectories in the turbulent atmosphere. Boundary-Layer Meteorology, 1996. 78(1):
p. 191-210. 55. Aylor, D.E. and T.K. Flesch, Estimating spore release rates using a Lagrangian
stochastic simulation model. Journal of Applied Meteorology, 2001. 40(7): p. 1196-1208.
56. Gleicher, S.C., et al., Interpreting three-dimensional spore concentration measurements and escape fraction in a crop canopy using a coupled Eulerian–Lagrangian stochastic model. Agricultural and Forest Meteorology, 2014. 194: p.
118-131. 57. Jarosz, N., B. Loubet, and L. Huber, Modelling airborne concentration and deposition
rate of maize pollen. Atmospheric Environment, 2004. 38(33): p. 5555-5566.
58. Aylor, D.E., Biophysical scaling and the passive dispersal of fungus spores: relationship to integrated pest management strategies. Agricultural and Forest
Meteorology, 1999. 97(4): p. 275-292. 59. Wilson, J.D., A second-order closure model for flow through vegetation. Boundary-
Layer Meteorology, 1988. 42(4): p. 371-392.
60. Katul, G.G., et al., One- and two-equation models for canopy turbulence. Boundary-Layer Meteorology, 2004. 113(1): p. 81-109.
61. Gleicher, S.C., et al., Interpreting three-dimensional spore concentration measurements and escape fraction in a crop canopy using a coupled Eulerian–Lagrangian stochastic model. Agricultural and Forest Meteorology, 2014. 194(0): p. 118-131.
62. Pan, Y., M. Chamecki, and S.A. Isard, Large-eddy simulation of turbulence and particle dispersion inside the canopy roughness sublayer. Journal of Fluid Mechanics, 2014. 753: p. 499-534.
63. Rogers, S.L., S.D. Atkins, and J.S. West, Detection and quantification of airborne inoculum of Sclerotinia sclerotiorum using quantitative PCR. Plant Pathology, 2009.
58(2): p. 324-331.
64. Saharan, G.S. and N. Mehta, Sclerotinia diseases of crop plants: Biology, ecology and disease management. 2008: Springer.
65. Sun, P. and X.B. Yang, Light, Temperature, and Moisture Effects on Apothecium Production of Sclerotinia sclerotiorum. Plant Disease, 2000. 84(12): p. 1287-1293.
66. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens: mechanisms, gradients and spatial patterns. The Epidemiology of Plant Diseases,
2006: p. 159-192.
67. Koch, S., et al., A Crop Loss-Related Forecasting Model for Sclerotinia Stem Rot in Winter Oilseed Rape. Phytopathology, 2007. 97(9): p. 1186-1194.
175
68. Protectedherbs.org.uk. Sclerotinia life cycle. 2014 [cited 2014 14 August 2014];
Available from: http://www.protectedherbs.org.uk/pages/sclerotiniaLifeCycle.htm. 69. Fitt, B.D.L., et al., Prospects for developing a forecasting scheme to optimise use of
fungicides for disease control on winter oilseed rape in the UK. Aspects of Applied Biology (United Kingdom), 1997.
70. Twengström, E., et al., Forecasting Sclerotinia stem rot in spring sown oilseed rape. Crop Protection, 1998. 17(5): p. 405-411.
71. Weiss, M. and F. Baret, CAN-EYE V6.1 USER MANUAL. 2010.
72. Jonckheere, I., et al., Review of methods for in situ leaf area index determination: Part I. Theories, sensors and hemispherical photography. Agricultural and forest
meteorology, 2004. 121(1): p. 19-35. 73. Holmes, N.S. and L. Morawska, A review of dispersion modelling and its application
to the dispersion of particles: An overview of different dispersion models available. Atmospheric Environment, 2006. 40(30): p. 5902-5928.
74. Benson, P.E., CALINE 4—A Dispersion Model for Predicting Air Pollutant Concentrations near Roadways, in FHWA User Guide. 1984, U. Trinity Consultants Inc.
75. Sokhi, R., B. Fisher, and e. al. Modelling of air quality around roads. in Proceedings of the 5th International Conference on Harmonisation with Atmospheric Dispersion Modelling for Regulatory Purposes. 1998. Greece.
76. Fitt, B.D.L. and H.A. McCartney, (eds ), , Population Dynamics and Management. Spore dispersal in relation to epidemic models, in Plant Disease Epidemiology
ed. K.J. Leonard and W.E. Fry. Vol. 1. 1986, New York: Macmillan. 77. Aloyan, A.E., Numerical modelling of minor gas constituents and aerosols in the
atmosphere. Ecological Modelling, 2004. 179: p. 163-175.
78. Jones, A., et al., The UK Met Office's next-generation atmospheric dispersion model, NAME III. Air Pollution Modeling and its Application XVII, 2007: p. 580-589.
79. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens: mechanisms, gradients and spatial patterns
The Epidemiology of Plant Diseases, B.M. Cooke, D.G. Jones, and B. Kaye, Editors. 2006,
Springer Netherlands. p. 159-192. 80. Barratt, R., Atmospheric dispersion modelling: an introduction to practical
applications. 2001: Earthscan. 81. Pasquill, F. and F. Smith, Atmospheric diffusion.: Study of the dispersion of windborne
material from industrial and other sources. JOHN WILEY & SONS, 605 THIRD AVE.,
NEW YORK, NY 10016, USA. 1983., 1983. 82. Abdel-Rahman, A.A. On the Atmospheric Dispersion and Gaussian Plume Model.
2008. 83. Hanna, S., G. Briggs, and R. Hosker Jr, Handbook on atmospheric dispersion.
Prepared for the US Department of Energy, 1982. 84. Erbrink, J. and J. van Jaarsveld, The National Model compared with other models and
measurements. in Spijkerboer et al. (2002), Ability of the Gaussian Plume model to
predict describe spore dispersal over potato a crop, 1992. 155: p. 1-18. 85. Stohl, A., et al., Technical note: The Lagrangian particle dispersion model FLEXPART
version 6.2. Atmos. Chem. Phys., 2005. 5(9): p. 2461-2474. 86. Rodean, H.C., Stochastic Lagrangian models of turbulent diffusion. Vol. 45. 1996:
American Meteorological Society Boston, MA.
87. Thomson, D., Criteria for the selection of stochastic models of particle trajectories in turbulent flows. J. Fluid Mech, 1987. 180(529-556): p. 109.
88. Aylor, D.E., N.P. Schultes, and E.J. Shields, An aerobiological framework for assessing cross-pollination in maize. Agricultural and Forest Meteorology, 2003. 119(3-4): p.
111-129. 89. Aylor, D.E., et al., Quantifying the rate of release and escape of Phytophthora
infestans sporangia from a potato canopy. Phytopathology, 2001. 91(12): p. 1189-
1196. 90. USEPA, Revised draft user's guide for the AEROMOD meteorological processor
(aermet). EPA, 1999: p. 273.
176
91. EarthTechInc, A user’s guide for the CALPUFF dispersion model. Earth Tech, Inc,
2000. 521. 92. Esbensen, K.H., et al., Multivariate data analysis: in practice: an introduction to
multivariate data analysis and experimental design. 2002: Multivariate Data Analysis. 93. Martens, H. and T. Naes, Multivariate calibration. 1992: John Wiley & Sons Inc.
94. Andersson, M., A comparison of nine PLS1 algorithms. Journal of chemometrics,
2009. 23(10): p. 518-529. 95. Geladi, P. and B.R. Kowalski, Partial least-squares regression: a tutorial. Analytica
Chimica Acta, 1986. 185: p. 1-17. 96. Kallithraka, S., et al., Instrumental and sensory analysis of Greek wines;
implementation of principal component analysis (PCA) for classification according to geographical origin. Food Chemistry, 2001. 73(4): p. 501-514.
97. Whittaker, P., et al., Identification of foodborne bacteria by infrared spectroscopy using cellular fatty acid methyl esters. Journal of Microbiological Methods, 2003. 55(3): p. 709-716.
98. Frank, I.E. and J.H. Friedman, A Statistical View of Some Chemometrics Regression Tools. Technometrics, 1993. 35(2): p. 109-135.
99. Höskuldsson, A., PLS regression methods. Journal of chemometrics, 1988. 2(3): p.
211-228. 100. Helland, I.S., Partial Least Squares Regression and Statistical Models. Scandinavian
journal of statistics, 1990. 17(2): p. 97-114. 101. Liu, Z.-y., et al., Characterizing and estimating rice brown spot disease severity using
stepwise regression, principal component regression and partial least-square regression. Journal of Zhejiang University - Science B, 2007. 8(10): p. 738-744.
102. Jackman, P., D.-W. Sun, and P. Allen, Prediction of beef palatability from colour, marbling and surface texture features of longissimus dorsi. Journal of Food Engineering, 2010. 96(1): p. 151-165.
103. Huang, J.F. and A. Apan, Detection of sclerotinia rot disease on celery using hyperspectral data and partial least squares regression. Journal of spatial science,
2006. 51(2): p. 129-142.
104. Foster, A.J., et al., Development and validation of a disease forecast model for Sclerotinia rot of carrot. Canadian Journal of Plant Pathology, 2011. 33(2): p. 187-
201. 105. Turkington, T.K., R.A.A. Morrall, and R.K. Gugel, Use of petal infestation to forecast
Sclerotinia stem rot of canola: Evaluation of early bloom sampling, 1985-90. Can. J.
Plant Pathol, 1991. 13: p. 50-59. 106. Lelong, C.C.D., et al., Evaluation of Oil-Palm Fungal Disease Infestation with Canopy
Hyperspectral Reflectance Data. Sensors, 2010. 10(1): p. 734-747. 107. Guimarães, R.L. and H.U. Stotz, Oxalate production by Sclerotinia sclerotiorum
deregulates guard cells during infection. Plant physiology, 2004. 136(3): p. 3703-3711.
108. Tolle, G., et al. A macroscope in the redwoods. 2005. ACM.
109. Ramanathan, N., et al., Rapid deployment with confidence: Calibration and fault detection in environmental sensor networks. 2006.
110. Kollman, C., et al., Limitations of statistical measures of error in assessing the accuracy of continuous glucose sensors. Diabetes technology & therapeutics, 2005.
7(5): p. 665-672.
111. Montgomery, D.C., G.C. Runger, and N.F. Hubele, Engineering statistics. 2009: Wiley. 112. Silverman, B.W., Density estimation for statistics and data analysis. Vol. 26. 1986:
Chapman & Hall/CRC. 113. Dunia, R., et al., Identification of faulty sensors using principal component analysis.
AIChE Journal, 1996. 42(10): p. 2797-2812. 114. Wise, B. and N. Ricker. Recent advances in multivariate statistical process control:
Improving robustness and sensitivity. 1991. Citeseer.
115. Tong, H. and C.M. Crowe, Detection of gross erros in data reconciliation by principal component analysis. AIChE Journal, 1995. 41(7): p. 1712-1722.
177
116. MacGregor, J.F., et al., Process monitoring and diagnosis by multiblock PLS methods. AIChE Journal, 1994. 40(5): p. 826-838.
117. Dunia, R. and S. Joe Qin, A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case. Computers & chemical engineering, 1998. 22(7): p. 927-943.
118. Qin, S.J., H. Yue, and R. Dunia, Self-validating inferential sensors with application to air emission monitoring. Industrial & engineering chemistry research, 1997. 36(5): p. 1675-1685.
119. Wold, H., Soft modeling by latent variables: the nonlinear iterative partial least squares approach. Perspectives in probability and statistics, papers in honour of MS
Bartlett, 1975: p. 520-540. 120. Wold, S., K. Esbensen, and P. Geladi, Principal component analysis. Chemometrics
and Intelligent Laboratory Systems, 1987. 2(1-3): p. 37-52.
121. Wold, S., et al., Some recent developments in PLS modeling. Chemometrics and intelligent laboratory systems, 2001. 58(2): p. 131-150.
122. Esbensen, K., An introduction to multivariate data analysis and experimental design. Camo Inc, 2004.
123. Hotelling, H., ed. Multivariate quality control. Techniques of statistical analysis, ed.
C. Eisenhart, M. Hastay, and W. Wallis. 1947, McGraw-Hill: New York. 124. Sparks, R., Monitoring highly correlated multivariate processes using Hotelling's T2
statistic: problems and possible solutions. Quality and Reliability Engineering International, 2014: p. n/a-n/a.
125. Williams, J.D., et al., On the distribution of Hotelling's T2 statistic based on the successive differences covariance matrix estimator. Journal of Quality Technology,
2006. 38: p. 217-229.
126. Dunia, R. and S. Joe Qin, Joint diagnosis of process and sensor faults using principal component analysis. Control Engineering Practice, 1998. 6(4): p. 457-469.
127. Doymaz, F., J.A. Romagnoli, and A. Palazoglu, A strategy for detection and isolation of sensor failures and process upsets. Chemometrics and Intelligent Laboratory
Systems, 2001. 55(1): p. 109-123.
128. Bose, M., G. SathyendraKumar, and C. Venkateswarlu, Detection, isolation and reconstruction of faulty sensors using principal component analysis. Indian journal of
chemical technology, 2005. 12. 129. Sharma, A., L. Golubchik, and R. Govindan. On the prevalence of sensor faults in
real-world deployments. 2007. IEEE.
130. Rabiner, L. and B. Juang, An introduction to hidden Markov models. ASSP Magazine, IEEE, 1986. 3(1): p. 4-16.
131. Qin, S.J. and W. Li, Detection and identification of faulty sensors in dynamic processes. AIChE Journal, 2001. 47(7): p. 1581-1593.
132. Sheather, S.J. and M.C. Jones, A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B
(Methodological), 1991: p. 683-690.
133. Jones, M.C., J.S. Marron, and S.J. Sheather, A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 1996. 91(433): p.
401-407. 134. Sheather, S.J., Density estimation. Statistical Science, 2004. 19(4): p. 588-597.
135. Lee, R.W. and J.J. Kulesz, A risk-based sensor placement methodology. Journal of
hazardous materials, 2008. 158(2): p. 417-429. 136. Byrne, R. and D. Diamond, Chemo/bio-sensor networks. Nature materials, 2006.
5(6): p. 421-424. 137. Wang, B., et al., Sensor density for complete information coverage in wireless sensor
networks. Wireless Sensor Networks, 2006: p. 69-82. 138. Kanaroglou, P.S., et al., Establishing an air pollution monitoring network for intra-
urban population exposure assessment: a location-allocation approach. Atmospheric
Environment, 2005. 39(13): p. 2399-2409. 139. Moses, A., K. Obenschain, and J. Boris. Using CT-Analyst as an integrated tool for
CBR analysis. 2006.
178
140. Chen, Y.Q., K.L. Moore, and Z. Song. Diffusion boundary determination and zone control via mobile actuator-sensor networks (MAS-net): Challenges and opportunities. 2004.
141. Ishida, H., et al., Plume-tracking robots: A new application of chemical sensors. The Biological Bulletin, 2001. 200(2): p. 222-226.
142. Krause, A., A. Singh, and C. Guestrin, Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 2008. 9: p. 235-284.
143. Ramakrishnan, N., et al. Gaussian processes for active data mining of spatial aggregates. 2005.
144. Park, J.H., G. Friedman, and M. Jones, Geographical feature sensitive sensor placement. Journal of Parallel and Distributed Computing, 2004. 64(7): p. 815-825.
145. Ozkul, S., N.B. Harmancioglu, and V.P. Singh, Entropy-based assessment of water quality monitoring networks. Journal of hydrologic engineering, 2000. 5: p. 90.
146. Shumway, R.H. and D.S. Stoffer, Time series analysis and its applications. 2000:
Springer Verlag. 147. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens:
mechanisms, gradients and spatial patterns, in The Epidemiology of Plant Diseases, B.M. Cooke, D.G. Jones, and B. Kaye, Editors. 2006, Springer Netherlands. p. 159-192.
148. Lacey, M.E. and J.S. West, The air spora: a manual for catching and identifying airborne biological particles. 2007: Springer.
149. Kaimal, J. and J. Finnigan, Atmospheric Boundary Layer Flows. 1994, Oxford Univ. Press, New York.
150. Flesch, T., et al., Deducing ground-to-air emissions from observed trace gas concentrations: A field trial. Journal of Applied Meteorology, 2004. 43(3): p. 487-502.
151. Silvertown, J., et al., The Park Grass Experiment 1856–2006: its contribution to ecology. Journal of Ecology, 2006. 94(4): p. 801-814.
152. Foken, T., 50 years of the Monin–Obukhov similarity theory. Boundary-Layer
Meteorology, 2006. 119(3): p. 431-447. 153. Rogers, S., S.D. Atkins, and J. West, Detection and quantification of airborne
inoculum of Sclerotinia sclerotiorum using quantitative PCR. Plant Pathology, 2009. 58(2): p. 324-331.
154. West, J.S., Plant Pathogen Dispersal, in eLS. 2001, John Wiley & Sons, Ltd.
155. Saharan, G. and D.N. Mehta, Sclerotinia diseases of crop plants: biology, ecology and disease management. 2008: Springer.
156. West, J., et al., Development of the miniature virtual impactor–MVI–for long-term and automated air sampling to detect plant pathogen spores. Proceedings of “Future
IPM in Europe, 2013: p. 19-21. 157. Bourdôt, G., et al., Risk analysis of Sclerotinia sclerotiorum for biological control of
Cirsium arvense in pasture: ascospore dispersal. Biocontrol Science and Technology,
2001. 11(1): p. 119-139. 158. Di‐Giovanni, F., A review of the sampling efficiency of rotating‐arm impactors used in
aerobiological studies. Grana, 1998. 37(3): p. 164-171.
159. Abawi, G. and R. Grogan, Epidemiology of diseases caused by Sclerotinia species. Phytopathology, 1979.
160. Saldanha, R., et al., The influence of sampling duration on recovery of culturable fungi using the Andersen N6 and RCS bioaerosol samplers. Indoor air, 2008. 18(6):
p. 464-472.
161. Pan, Y., et al., Dispersion of particles released at the leading edge of a crop canopy. Agricultural and Forest Meteorology, 2015. 211: p. 37-47.
162. Aylor, D.E. and F.J. Ferrandino, Dispersion of spores released from an elevated line source within a wheat canopy. Boundary-Layer Meteorology, 1989. 46(3): p. 251-
273.
163. Flesch, T.K., et al., Estimating gas emissions from a farm with an inverse-dispersion technique. Atmospheric Environment, 2005. 39(27): p. 4863-4874.
179
164. Sood, R., Textbook of medical laboratory technology. 2006: Jaypee Brothers Medical
Publishers. 165. Datta, P.K. and B. Meeuse, Moss oxalic acid oxidase—a flavoprotein. Biochimica et
biophysica acta, 1955. 17: p. 602-603. 166. Vo-Dinh, T., Biomedical Photonics Handbook: Biomedical Diagnostics. Vol. 2. 2014:
CRC press.
167. Kim, K.S., J.-Y. Min, and M.B. Dickman, Oxalic acid is an elicitor of plant programmed cell death during Sclerotinia sclerotiorum disease development. Molecular Plant-
Microbe Interactions, 2008. 21(5): p. 605-612. 168. Hu, Y., et al., Characteristics and heterologous expressions of oxalate degrading
enzymes “oxalate oxidases” and their applications on immobilization, oxalate detection, and medical usage potential. Journal of Biotech Research [ISSN: 1944-
3285], 2015. 6: p. 63-75.
169. Nierman, W.C., et al., Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature, 2005. 438(7071): p. 1151-1156.
170. Šljukic, B., C.E. Banks, and R.G. Compton, Iron oxide particles are the active sites for hydrogen peroxide sensing at multiwalled carbon nanotube modified electrodes. Nano letters, 2006. 6(7): p. 1556-1558.
171. Grieshaber, D., et al., Electrochemical Biosensors - Sensor Principles and Architectures. Sensors (Basel, Switzerland), 2008. 8(3): p. 1400-1458.
172. Coldrick, Z., SYIELD: Electrochemistry of oxalate biosensor. 2013, University of Manchester.
173. Hare, J.M., Sabouraud Agar for Fungal Growth, in Laboratory Protocols in Fungal Biology. 2013, Springer. p. 211-216.
174. Armbruster, D.A. and T. Pry, Limit of blank, limit of detection and limit of quantitation. Clin Biochem Rev, 2008. 29(Suppl 1): p. S49-52.
175. Desimoni, E. and B. Brunetti, Data Treatment of Electrochemical Sensors and Biosensors, in Environmental Analysis by Electrochemical Sensors and Biosensors. 2015, Springer. p. 1137-1151.
176. Housecroft, C., E. and E.C. Constable, Chemistry: An Introduction to Organic, Inorganic and Physical Chemistry. 3rd edition ed. 2006, Edinburg Gate (England): Pearson Education Limited.
177. Heard, S., Plant Pathogen Sensing for Early Disease Control. 2013, University of Manchester.
178. Holland, P.M., et al., Detection of specific polymerase chain reaction product by utilizing the 5'----3' exonuclease activity of Thermus aquaticus DNA polymerase. Proceedings of the National Academy of Sciences, 1991. 88(16): p. 7276-7280.
179. McCartney, H. and B. Fitt, Construction of dispersal models. Mathematical modelling of crop disease, 1985.
180. McCartney, H., M. Lacey, and C. Rawlinson, Dispersal of Pyrenopeziza brassicae spores from an oil-seed rape crop. The Journal of Agricultural Science, 1986.
107(02): p. 299-305.
181. Schwartz, H. and J. Steadman, Factors affecting sclerotium populations of, and apothecium production by, Sclerotinia sclerotiorum. Phytopathology, 1978. 68(383-
388): p. 11. 182. Luong, J.H., K.B. Male, and J.D. Glennon, Biosensor technology: technology push
versus market pull. Biotechnology advances, 2008. 26(5): p. 492-500.
183. Banica, F.-G., Chemical sensors and biosensors: fundamentals and applications. 2012: John Wiley & Sons.
184. Ginsberg, B.H., Factors affecting blood glucose monitoring: sources of errors in measurement. Journal of diabetes science and technology, 2009. 3(4): p. 903-913.
185. Justino, C.I., T.A. Rocha-Santos, and A.C. Duarte, Review of analytical figures of merit of sensors and biosensors in clinical applications. TrAC Trends in Analytical
Chemistry, 2010. 29(10): p. 1172-1183.
186. Lu, G., Engineering Sclerotinia sclerotiorum resistance in oilseed crops. African Journal of Biotechnology, 2004. 2(12): p. 509-516.
187. Culbertson, B.J., N.C. Furumo, and S.L. Daniel, Impact of nutritional supplements and monosaccharides on growth, oxalate accumulation, and culture pH by Sclerotinia sclerotiorum. FEMS microbiology letters, 2007. 270(1): p. 132-138.
188. Andreescu, S. and O.A. Sadik, Trends and challenges in biochemical sensors for clinical and environmental monitoring. Pure and applied chemistry, 2004. 76(4): p.
861-878.
189. Turner, A.P., Biosensors: Past, present and future. Cranfield University, Institute of BioScience and Technology. Available online: www.cranfield.ac.uk/biotech/chinap.htm, 1996.
190. Nayak, M., et al., Detection of microorganisms using biosensors—A smarter way towards detection techniques. Biosensors and Bioelectronics, 2009. 25(4): p. 661-667.
191. Rejeb, I.B., et al., Development of a bio-electrochemical assay for AFB1 detection in olive oil. Biosensors and Bioelectronics, 2009. 24(7): p. 1962-1968.
192. Mendes, R., et al., Development of an electrochemical immunosensor for Phakopsora pachyrhizi detection in the early diagnosis of soybean rust. Journal of the Brazilian Chemical Society, 2009. 20(4): p. 795-801.
193. D'Orazio, P., Biosensors in clinical chemistry—2011 update. Clinica Chimica Acta, 2011. 412(19): p. 1749-1761.
194. Richman, S.A., D.M. Kranz, and J.D. Stone, Biosensor detection systems: Engineering stable, high-affinity bioreceptors by yeast surface display, in Biosensors and Biodetection. 2009, Springer. p. 323-350.
195. Oliver, N., et al., Glucose sensors: a review of current and emerging technology. Diabetic Medicine, 2009. 26(3): p. 197-210.
196. Aylor, D., Deposition gradients of urediniospores of Puccinia recondita near a source. Phytopathology, 1987. 77(10): p. 1442-1448.
197. Bullock, J.M. and R.T. Clarke, Long distance seed dispersal by wind: measuring and modelling the tail of the curve. Oecologia, 2000. 124(4): p. 506-521.
198. Dodge, Y., et al., The Oxford dictionary of statistical terms. 2003: Oxford University
Press.
199. Joanes, D. and C. Gill, Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 1998. 47(1): p. 183-189.
200. Raupach, M., P. Coppin, and B. Legg, Experiments on scalar dispersion within a model plant canopy part I: The turbulence structure. Boundary-Layer Meteorology, 1986.
35(1-2): p. 21-52.
201. Raupach, M., R. Antonia, and S. Rajagopalan, Rough-wall turbulent boundary layers. Applied Mechanics Reviews, 1991. 44(1): p. 1-25.
202. Raupach, M., Applying Lagrangian fluid mechanics to infer scalar source distributions from concentration profiles in plant canopies. Agricultural and Forest Meteorology, 1989. 47(2): p. 85-108.
203. Clarkson, J.P., et al., Forecasting Sclerotinia disease on lettuce: toward developing a prediction model for carpogenic germination of sclerotia. Phytopathology, 2004. 94(3): p. 268-279.
204. Wu, B., et al., Incubation of excised apothecia enhances ascus maturation of Sclerotinia sclerotiorum. Mycologia, 2007. 99(1): p. 33-41.
205. Bohrer, G., et al., Exploring the effects of microscale structural heterogeneity of forest canopies using large-eddy simulations. Boundary-Layer Meteorology, 2009. 132(3): p. 351-382.
206. Stockmarr, A., V. Andreasen, and H. Østergård, Dispersal distances for airborne spores based on deposition rates and stochastic modeling. Phytopathology, 2007. 97(10): p. 1325-1330.
207. Bouvet, T., et al., Filtering of windborne particles by a natural windbreak. Boundary-layer meteorology, 2007. 123(3): p. 481-509.
208. Aylor, D.E., Modeling spore dispersal in a barley crop. Agricultural Meteorology, 1982.
26(3): p. 215-219.
209. Wegulo, S.N., et al., Spread of Sclerotinia stem rot of soybean from area and point sources of apothecial inoculum. Canadian Journal of Plant Science, 2000. 80(2): p. 389-402.
210. Duman, T., et al., A Velocity–Dissipation Lagrangian Stochastic Model for Turbulent Dispersion in Atmospheric Boundary-Layer and Canopy Flows. Boundary-layer
meteorology, 2014. 152(1): p. 1-18.
211. Duman, T., et al., Footprint Estimation for Multi-Layered Sources and Sinks Inside Canopies in Open and Protected Environments. Boundary-Layer Meteorology, 2015. 155(2): p. 229-248.
212. Katul, G.G., et al., The effects of the canopy medium on dry deposition velocities of aerosol particles in the canopy sub-layer above forested ecosystems. Atmospheric Environment, 2011. 45(5): p. 1203-1212.
213. Wegulo, S., et al., Spread of Sclerotinia stem rot of soybean from area and point sources of apothecial inoculum. Canadian Journal of Plant Science, 2000. 80(2): p. 389-402.
214. Wilson, J.D., T.K. Flesch, and P. Bourdin, Ground-to-Air Gas Emission Rate Inferred from Measured Concentration Rise within a Disturbed Atmospheric Surface Layer. Journal of Applied Meteorology and Climatology, 2010. 49(9): p. 1818-1830.
215. Luhar, A.K., Turbulent Dispersion: Theory and Parameterization—Overview. Lagrangian Modeling of the Atmosphere, 2013: p. 14-18.
216. Hsieh, C.-I. and G. Katul, The Lagrangian stochastic model for estimating footprint and water vapor fluxes over inhomogeneous surfaces. International Journal of Biometeorology, 2009. 53(1): p. 87-100.
217. Flesch, T.K., J.D. Wilson, and E. Yee, Backward-time Lagrangian stochastic dispersion models and their application to estimate gaseous emissions. Journal of Applied Meteorology, 1995. 34(6): p. 1320-1332.
218. Ro, K.S., et al., Measuring gas emissions from animal waste lagoons with an inverse-dispersion technique. Atmospheric Environment, 2013. 66(0): p. 101-106.
219. McBain, M.C. and R.L. Desjardins, The evaluation of a backward Lagrangian stochastic (bLS) model to estimate greenhouse gas emissions from agricultural sources using a synthetic tracer source. Agricultural and Forest Meteorology, 2005. 135(1–4): p. 61-72.
220. Garratt, J., The atmospheric boundary layer. Cambridge atmospheric and space science series. Cambridge University Press, Cambridge, 1992. 416: p. 444.
221. Obukhov, A., Turbulence in an atmosphere with a non-uniform temperature. Boundary-layer meteorology, 1971. 2(1): p. 7-29.
222. Optis, M., A. Monahan, and F. Bosveld, Moving Beyond Monin–Obukhov Similarity Theory in Modelling Wind-Speed Profiles in the Lower Atmospheric Boundary Layer under Stable Stratification. Boundary-Layer Meteorology, 2014. 153(3): p. 497-514.
223. Businger, J.A., et al., Flux-profile relationships in the atmospheric surface layer. Journal of the Atmospheric Sciences, 1971. 28(2): p. 181-189.
224. Panofsky, H.A. and J.A. Dutton, Atmospheric turbulence: Models and methods for engineering applications. 1984, New York: Wiley.
225. Wilson, J., G. Thurtell, and G. Kidd, Numerical simulation of particle trajectories in inhomogeneous turbulence, I: Systems with constant turbulent velocity scale. Boundary-Layer Meteorology, 1981a. 21(3): p. 295-313.
226. Wilson, J., G. Thurtell, and G. Kidd, Numerical simulation of particle trajectories in inhomogeneous turbulence, II: Systems with variable turbulent velocity scale. Boundary-Layer Meteorology, 1981b. 21(4): p. 423-441.
227. Pelliccioni, A., et al., Some characteristics of the urban boundary layer above Rome, Italy, and applicability of Monin–Obukhov similarity. Environmental Fluid Mechanics, 2012. 12(5): p. 405-428.
228. Wilson, J., Monin-Obukhov functions for standard deviations of velocity. Boundary-Layer Meteorology, 2008. 129(3): p. 353-369.
229. Högström, U., A.-S. Smedman, and H. Bergström, Calculation of wind speed variation with height over the sea. Wind Engineering, 2006. 30(4): p. 269-286.
230. Peña, A., S.-E. Gryning, and C.B. Hasager, Measurements and modelling of the wind speed profile in the marine atmospheric boundary layer. Boundary-layer meteorology, 2008. 129(3): p. 479-495.
231. Lange, B., et al., Importance of thermal effects and sea surface roughness for offshore wind resource assessment. Journal of wind engineering and industrial
aerodynamics, 2004. 92(11): p. 959-988.
232. Haugen, D., J. Kaimal, and E. Bradley, An experimental study of Reynolds stress and heat flux in the atmospheric surface layer. Quarterly Journal of the Royal Meteorological Society, 1971. 97(412): p. 168-180.
233. Kaimal, J.C., et al., Spectral characteristics of surface-layer turbulence. Quarterly Journal of the Royal Meteorological Society, 1972. 98(417): p. 563-589.
234. Schlegel, F., et al., Large-eddy simulation of inhomogeneous canopy flows using high resolution terrestrial laser scanning data. Boundary-Layer Meteorology, 2012. 142(2): p. 223-243.
235. Cava, D. and G. Katul, The effects of thermal stratification on clustering properties of canopy turbulence. Boundary-Layer Meteorology, 2009. 130(3): p. 307-325.
236. Braam, M., F. Bosveld, and A. Moene, On Monin–Obukhov Scaling in and Above the Atmospheric Surface Layer: The Complexities of Elevated Scintillometer Measurements. Boundary-Layer Meteorology, 2012. 144(2): p. 157-177.
237. De Ridder, K., Bulk Transfer Relations for the Roughness Sublayer. Boundary-Layer Meteorology, 2010. 134(2): p. 257-267.
238. Shaw, R., et al., Measurements of mean wind flow and three-dimensional turbulence intensity within a mature corn canopy. Agricultural Meteorology, 1974. 13(3): p. 419-425.
239. Sawford, B. and F. Guest, Lagrangian statistical simulation of the turbulent motion of heavy particles. Boundary-Layer Meteorology, 1991. 54(1-2): p. 147-166.
240. Markkanen, T., et al., Footprints and fetches for fluxes over forest canopies with varying structure and density. Boundary-layer meteorology, 2003. 106(3): p. 437-459.
241. Siqueira, M., G. Katul, and J. Tanny, The Effect of the Screen on the Mass, Momentum, and Energy Exchange Rates of a Uniform Crop Situated in an Extensive Screenhouse. Boundary-Layer Meteorology, 2012. 142(3): p. 339-363.
242. Wilson, J.D. and T.K. Flesch, Flow boundaries in random-flight dispersion models: enforcing the well-mixed condition. Journal of Applied Meteorology, 1993. 32(11): p.
1695-1707.
243. Gao, Z., et al., Estimating gas emissions from multiple sources using a backward Lagrangian stochastic model. Journal of the Air & Waste Management Association, 2008. 58(11): p. 1415-1421.
244. Aylor, D.E., Relative collection efficiency of Rotorod and Burkard spore samplers for airborne Venturia inaequalis ascospores. Phytopathology, 1993. 83(10): p. 1116-1119.
245. Hartill, W., Aerobiology of Sclerotinia sclerotiorum and Botrytis cinerea spores in New Zealand tobacco crops. New Zealand Journal of Agricultural Research, 1980. 23(2): p. 259-262.
246. Abawi, G. and J. Hunter, White mold of beans in New York. 1979.
247. Bock, C. and P. Cotty, Methods to sample airborne propagules of Aspergillus flavus. European Journal of Plant Pathology, 2006. 114(4): p. 357-362.
248. Hanna, S., D. Strimaitis, and J. Chang, Hazard Response Modeling Uncertainty (A Quantitative Method). Volume 2. Evaluation of Commonly Used Hazardous Gas Dispersion Models. 1993, DTIC Document.
249. Chang, J. and S. Hanna, Air quality model performance evaluation. Meteorology and Atmospheric Physics, 2004. 87(1-3): p. 167-196.
250. Chang, J.C. and S.R. Hanna, Technical descriptions and user's guide for the BOOT statistical model evaluation software package. 2005, Version.
251. Willmott, C.J., Some comments on the evaluation of model performance. Bulletin of the American Meteorological Society, 1982. 63(11): p. 1309-1313.
252. Aylor, D., Y. Wang, and D. Miller, Intermittent wind close to the ground within a grass canopy. Boundary-Layer Meteorology, 1993. 66(4): p. 427-448.
253. Markkanen, T., et al., Comparison of conventional Lagrangian stochastic footprint models against LES driven footprint estimates. Atmospheric Chemistry and Physics, 2009. 9(15): p. 5575-5586.
254. Pan, Z., et al., Prediction of plant diseases through modelling and monitoring airborne pathogen dispersal. Plant Sciences Reviews 2010, 2011: p. 191.
255. Cai, X., et al., Evaluation of backward and forward Lagrangian footprint models in the surface layer. Theoretical and Applied Climatology, 2008. 93(3-4): p. 207-223.
256. Wilson, N.R. and R.H. Shaw, A higher order closure model for canopy flow. Journal of Applied Meteorology, 1977. 16(11): p. 1197-1205.
257. Wilson, J.D., et al., Lagrangian simulation of wind transport in the urban environment. Quarterly Journal of the Royal Meteorological Society, 2009. 135(643): p. 1586-1602.
258. Nathan, R., et al., Mechanistic models of seed dispersal by wind. Theoretical Ecology, 2011. 4(2): p. 113-132.
259. Yi, T.H., H.N. Li, and M. Gu, Optimal sensor placement for structural health monitoring based on multiple optimization strategies. The Structural Design of Tall and Special Buildings, 2011. 20(7): p. 881-900.
260. Flynn, E.B. and M.D. Todd, A Bayesian approach to optimal sensor placement for structural health monitoring with application to active sensing. Mechanical Systems and Signal Processing, 2010. 24(4): p. 891-903.
261. Rood, A.S., Performance evaluation of AERMOD, CALPUFF, and legacy air dispersion models using the Winter Validation Tracer Study dataset. Atmospheric Environment,
2014. 89: p. 707-720.
262. Rinne, J., et al., Effect of chemical degradation on fluxes of reactive compounds – a study with a stochastic Lagrangian transport model. Atmos. Chem. Phys., 2012. 12(11): p. 4843-4854.
263. Gladders, P., et al., Sclerotinia in Oilseed Rape: A Review of the 2007 Epidemic in England. 2008, Home-Grown Cereals Authority.
264. MacGregor, J. and T. Kourti, Statistical process control of multivariate processes. Control Engineering Practice, 1995. 3(3): p. 403-414.
265. Martens, H., Multivariate calibration. 1989: John Wiley & Sons.
266. Bro, R., et al., Cross-validation of component models: a critical look at current methods. Analytical and Bioanalytical Chemistry, 2008. 390(5): p. 1241-1251.
267. Kresta, J.V., J.F. MacGregor, and T.E. Marlin, Multivariate statistical monitoring of process operating performance. The Canadian Journal of Chemical Engineering, 1991. 69(1): p. 35-47.
268. Montgomery, D.C., Introduction to statistical quality control. 1991.
269. Montgomery, D.C., Introduction to Statistical Quality Control. 2004: Wiley.
270. Montgomery, D.C., et al., Integrating statistical process control and engineering process control. Journal of Quality Technology, 1994. 26(2): p. 79-87.
271. Montgomery, D.C. and W. Woodall, Research Issues and Ideas in Statistical Process Control. Journal of Quality Technology, 1999. 31(4): p. 376-387.
272. Marjanovic, O., et al., Real-time monitoring of an industrial batch process. Computers & chemical engineering, 2006. 30(10): p. 1476-1481.
273. Goulding, P.R., et al., Fault detection in continuous processes using multivariate statistical methods. International Journal of Systems Science, 2000. 31(11): p. 1459-1471.
274. Lennox, B., et al., Application of multivariate statistical process control to batch operations. Computers & Chemical Engineering, 2000. 24(2): p. 291-296.
275. Choi, S.W., et al., Adaptive multivariate statistical process control for monitoring time-varying processes. Industrial & Engineering Chemistry Research, 2006. 45(9): p.
3108-3118.
276. Bersimis, S., S. Psarakis, and J. Panaretos, Multivariate statistical process control charts: an overview. Quality and Reliability Engineering International, 2007. 23(5):
p. 517-543.
277. Mason, R.L. and J.C. Young, Improving the sensitivity of the T2 statistic in multivariate process control. Journal of Quality Technology, 1999. 31(2): p. 155-165.
278. Varmuza, K. and P. Filzmoser, Introduction to multivariate statistical analysis in chemometrics. 2008: CRC press.
279. Phaladiganon, P., et al., Principal component analysis-based control charts for multivariate nonnormal distributions. Expert Systems with Applications, 2013. 40(8):
p. 3044-3054. 280. Ferrer, A., Multivariate statistical process control based on principal component
analysis (MSPC-PCA): some reflections and a case study in an autobody assembly process. Quality Engineering, 2007. 19(4): p. 311-325.
281. Jackson, J.E. and G.S. Mudholkar, Control procedures for residuals associated with principal component analysis. Technometrics, 1979. 21(3): p. 341-349.
282. Chou, Y.-M., R.L. Mason, and J.C. Young, The control chart for individual observations from a multivariate non-normal distribution. Communications in Statistics-Theory and Methods, 2001. 30(8-9): p. 1937-1949.
283. Tracy, N., J. Young, and R. Mason, Multivariate control charts for individual observations. Journal of Quality Technology, 1992. 24(2).
284. Westerhuis, J.A., S.P. Gurden, and A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 2000. 51(1): p. 95-114.
285. Kourti, T., The process analytical technology initiative and multivariate process analysis, monitoring and control. Analytical and Bioanalytical Chemistry, 2006. 384(5): p. 1043-1048.
286. Kourti, T. and J.F. MacGregor, Process analysis, monitoring and diagnosis, using multivariate projection methods. Chemometrics and Intelligent Laboratory Systems, 1995. 28(1): p. 3-21.
287. Walczak, B. and D. Massart, Dealing with missing data: Part I. Chemometrics and Intelligent Laboratory Systems, 2001. 58(1): p. 15-27.
288. Camacho, J. and A. Ferrer, Cross-validation in PCA models with the element-wise k-fold (ekf) algorithm: theoretical aspects. Journal of Chemometrics, 2012. 26(7): p. 361-373.
289. Kjeldahl, K. and R. Bro, Some common misunderstandings in chemometrics. Journal of Chemometrics, 2010. 24(7-8): p. 558-564.
290. Van Ginkel, J.R., P.M. Kroonenberg, and H.A. Kiers, Missing data in principal component analysis of questionnaire data: a comparison of methods. Journal of Statistical Computation and Simulation, 2014. 84(11): p. 2298-2315.
291. Ilin, A. and T. Raiko, Practical approaches to principal component analysis in the presence of missing values. The Journal of Machine Learning Research, 2010. 11: p. 1957-2000.
292. Little, R.J. and D.B. Rubin, Statistical analysis with missing data. 2014: John Wiley & Sons.
293. Nelson, P.R.C., J.F. MacGregor, and P.A. Taylor, The impact of missing measurements on PCA and PLS prediction and monitoring applications. Chemometrics and Intelligent Laboratory Systems, 2006. 80(1): p. 1-12.
294. Joe Qin, S., Statistical process monitoring: basics and beyond. Journal of
chemometrics, 2003. 17(8‐9): p. 480-502.
295. Doornik, J.A. and H. Hansen, An omnibus test for univariate and multivariate normality*. Oxford Bulletin of Economics and Statistics, 2008. 70(s1): p. 927-939.
296. Royston, P., Approximating the Shapiro-Wilk W-Test for non-normality. Statistics and Computing, 1992. 2(3): p. 117-119.
297. Shapiro, S.S. and M.B. Wilk, An analysis of variance test for normality (complete samples). Biometrika, 1965: p. 591-611.
298. Yin, S., et al., A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. Journal of Process Control, 2012. 22(9): p. 1567-1581.
299. Qin, S.J., Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 2012. 36(2): p. 220-234.
300. Barceló, S., S. Vidal-Puig, and A. Ferrer, Comparison of multivariate statistical methods for dynamic systems modeling. Quality and Reliability Engineering International, 2011. 27(1): p. 107-124.
301. Quevedo, J., et al., Validation and reconstruction of flow meter data in the Barcelona water distribution network. Control Engineering Practice, 2010. 18(6): p. 640-651.
302. Pollice, A. and G. Jona Lasinio, Spatiotemporal analysis of the PM10 concentration over the Taranto area. Environmental Monitoring and Assessment, 2010. 162(1-4): p. 177-190.
303. Nelson, P.R., P.A. Taylor, and J.F. MacGregor, Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and Intelligent Laboratory Systems, 1996. 35(1): p. 45-65.
304. Lennox, B., et al., Process monitoring of an industrial fed-batch fermentation. Biotechnology and Bioengineering, 2001. 74(2): p. 125-135.
305. Arteaga, F. and A. Ferrer, Dealing with missing data in MSPC: several methods, different interpretations, some examples. Journal of chemometrics, 2002. 16(8‐10):
p. 408-418.
306. Alcala, C.F. and S. Joe Qin, Analysis and generalization of fault diagnosis methods for process monitoring. Journal of Process Control, 2011. 21(3): p. 322-330.
307. Alcala, C.F. and S.J. Qin, Reconstruction-based contribution for process monitoring. Automatica, 2009. 45(7): p. 1593-1600.
308. Pisoni, E., C. Carnevale, and M. Volta, Multi-criteria analysis for PM10 planning. Atmospheric Environment, 2009. 43(31): p. 4833-4842.
309. Carnevale, C., et al., Neuro-fuzzy and neural network systems for air quality control. Atmospheric Environment, 2009. 43(31): p. 4811-4821.
310. Li, G., et al., Reconstruction based fault prognosis for continuous processes. Control Engineering Practice, 2010. 18(10): p. 1211-1219.
311. Dunia, R. and S. Joe Qin, Subspace approach to multidimensional fault identification and reconstruction. AIChE Journal, 1998. 44(8): p. 1813-1831.
312. Cressie, N., Statistics for Spatial Data. 1991: John Wiley & Sons.
313. Li, J. and A.D. Heap, A review of spatial interpolation methods for environmental scientists. 2008, Geoscience Australia Canberra. p. 137.
314. Denby, B., et al., Interpolation and assimilation methods for European scale air quality assessment and mapping. Part I: Review and Recommendations. European
Topic Centre on Air and Climate Change Technical Paper, 2005. 7.
315. Horálek, J., et al., Interpolation and assimilation methods for European scale air quality assessment and mapping, Part II: Development and testing new methodologies. ETC/ACC Technical Paper, 2005. 7.
316. Zidek, J.V., W. Sun, and N.D. Le, Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2000. 49(1): p. 63-79.
317. Wong, D.W., L. Yuan, and S.A. Perlin, Comparison of spatial interpolation methods for the estimation of air quality data. J Expo Anal Environ Epidemiol, 2004. 14(5): p. 404-415.
318. Clark, I. and W. Harper, Practical geostatistics. 2000, Columbus, OH: Ecosse North American LLC.
319. Li, J. and A.D. Heap, A review of comparative studies of spatial interpolation methods in environmental sciences: performance and impact factors. Ecological Informatics, 2011. 6(3): p. 228-241.
320. Goovaerts, P., Geostatistics for natural resources evaluation. 1997: Oxford university press.
321. Isaaks, E.H. and R.M. Srivastava, An introduction to applied geostatistics. 1989.
322. Burrough, P.A. and R. McDonnell, Principles of geographical information systems. Vol. 333. 1998, Oxford: Oxford University Press.
323. Gräler, B., L. Gerharz, and E. Pebesma, Spatio-temporal analysis and interpolation of PM10 measurements in Europe. ETC/ACM Technical Paper, 2011. 10.
324. Kim, S.-Y., et al., Ordinary kriging approach to predicting long-term particulate matter concentrations in seven major Korean cities. Environmental Health and Toxicology, 2014. 29: p. e2014012.
325. Pires, J. and F. Martins, Evaluation of spatial variability of PM10 concentrations in London. Water, Air, & Soil Pollution, 2012. 223(5): p. 2287-2296.
326. Pires, J.C., et al., Evaluation of redundant measurements on the air quality monitoring network of Lisbon and Tagus Valley. Chemical Product and Process Modeling, 2009. 4(4): p. 14.
327. Lu, W.-Z., H.-D. He, and L.-y. Dong, Performance assessment of air quality monitoring networks using principal component analysis and cluster analysis. Building and Environment, 2011. 46(3): p. 577-583.
328. Lau, J., W. Hung, and C. Cheung, Interpretation of air quality in relation to monitoring station's surroundings. Atmospheric Environment, 2009. 43(4): p. 769-777.
329. Ibarra-Berastegi, G., et al., Assessing spatial variability of SO2 field as detected by an air quality network using self-organizing maps, cluster, and principal component analysis. Atmospheric Environment, 2009. 43(25): p. 3829-3836.
330. Afif, C., et al., Statistical approach for the characterization of NO2 concentrations in Beirut. Air Quality, Atmosphere & Health, 2009. 2(2): p. 57-67.
331. Wise, B.M. and N.B. Gallagher, The process chemometrics approach to process monitoring and fault detection. Journal of Process Control, 1996. 6(6): p. 329-348.
332. Tao, E., et al., Fault diagnosis based on PCA for sensors of laboratorial wastewater treatment process. Chemometrics and Intelligent Laboratory Systems, 2013. 128: p. 49-55.
333. Qin, J., et al., Detection of citrus canker using hyperspectral reflectance imaging with spectral information divergence. Journal of Food Engineering, 2009. 93(2): p. 183-191.
334. Pires, J., et al., Identification of redundant air quality measurements through the use of principal component analysis. Atmospheric Environment, 2009. 43(25): p. 3837-3842.
335. Chen, T., E. Martin, and G. Montague, Robust probabilistic PCA with missing data and contribution analysis for outlier detection. Computational Statistics & Data Analysis, 2009. 53(10): p. 3706-3716.
336. Wang, H., et al., Data Driven Fault Diagnosis and Fault Tolerant Control: Some Advances and Possible New Directions. Acta Automatica Sinica, 2009. 35(6): p. 739-
747.
337. EPA, D., Integrated science assessment for particulate matter. US Environmental Protection Agency Washington, DC, 2009.
338. Callén, M.S., et al., Comparison of receptor models for source apportionment of the PM10 in Zaragoza (Spain). Chemosphere, 2009. 76(8): p. 1120-1129.
339. Contini, D., et al., Characterisation and source apportionment of PM10 in an urban background site in Lecce. Atmospheric Research, 2010. 95(1): p. 40-54.
340. Cawley, G.C. and N.L. Talbot, On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research, 2010. 11: p. 2079-2107.
341. Mercer, L.D., et al., Comparing universal kriging and land-use regression for predicting concentrations of gaseous oxides of nitrogen (NOx) for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Atmospheric Environment, 2011. 45(26): p. 4412-4420.
342. Son, J.-Y., M.L. Bell, and J.-T. Lee, Individual exposure to air pollution and lung function in Korea: spatial analysis using multiple exposure approaches. Environmental Research, 2010. 110(8): p. 739-749.
343. USEPA, Quality Assurance Handbook for Air Pollution Measurement Systems. 2008.
344. AQEG, Particulate Matter in the UK. 2005, Defra: London.
345. Green, D.C., G.W. Fuller, and T. Baker, Development and validation of the volatile correction model for PM10 – An empirical method for adjusting TEOM measurements for their loss of volatile particulate matter. Atmospheric Environment, 2009. 43(13): p. 2132-2141.
346. Velusamy, V., et al., An overview of foodborne pathogen detection: In the perspective of biosensors. Biotechnology Advances, 2010. 28(2): p. 232-254.
347. Hill, D.J. and B.S. Minsker, Anomaly detection in streaming environmental sensor data: A data-driven modeling approach. Environmental Modelling & Software, 2010. 25(9): p. 1014-1022.
348. Estévez, J., P. Gavilán, and J.V. Giráldez, Guidelines on validation procedures for meteorological data from automatic weather stations. Journal of Hydrology, 2011. 402(1): p. 144-154.
349. Akkala, A., V. Devabhaktuni, and A. Kumar, Interpolation techniques and associated software for environmental data. Environmental Progress & Sustainable Energy, 2010. 29(2): p. 134-141.
350. Moustris, K.P., et al., Development and Application of Artificial Neural Network Modeling in Forecasting PM10 Levels in a Mediterranean City. Water, Air, & Soil Pollution, 2013. 224(8): p. 1-11.
351. Zhang, H., et al., Evaluation of PM10 forecasting based on the artificial neural network model and intake fraction in an urban area: A case study in Taiyuan City, China. Journal of the Air & Waste Management Association, 2013. 63(7): p. 755-763.
352. Hoek, G., et al., A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmospheric environment, 2008. 42(33): p. 7561-7578.
353. Liang, L., et al., Rapid detection and quantification of fungal spores in the urban atmosphere by flow cytometry. Journal of Aerosol Science, 2013. 66: p. 179-186.
354. Rundel, P.W., et al., Environmental sensor networks in ecological research. New Phytologist, 2009. 182(3): p. 589-607.
355. Elshenawy, L.M., et al., Efficient recursive principal component analysis algorithms for process monitoring. Industrial & Engineering Chemistry Research, 2009. 49(1): p. 252-259.
356. Liu, X., et al., Moving window kernel PCA for adaptive monitoring of nonlinear processes. Chemometrics and Intelligent Laboratory Systems, 2009. 96(2): p. 132-143.
357. Pesquer, L., A. Cortés, and X. Pons, Parallel ordinary kriging interpolation incorporating automatic variogram fitting. Computers & Geosciences, 2011. 37(4): p. 464-473.
Appendix 1: Original Plan and Modifications Made
The Original Plan: The original plan of the SYIELD project was to run three field trials by 2012. The first was to test a working prototype of the biosensor in the field; the second was to deploy multiple units; and the third was to deploy the sensors on a regional scale. This schedule was based on a biosensor development timeline that would deliver a working prototype by 2010. The field trials were to take place at research facilities, such as Rothamsted and Velcourt Farms, as well as in areas recognised as hotspots for Sclerotinia spores. The research goals were to use the data from the field trials to determine spore ingress into canopy environments and to use data mining methodologies to design interpolation methods for deployment.
The Challenges/Limitations: Due to logistical and technological issues, biosensor development was delayed by nearly two years, so the sensors were not in a field-deployable state by early 2013. As a consequence, the author had no field data to work with.
The Modifications: To overcome these challenges, the author conceived and planned a field trial experiment to generate data (see the field trial experiment details in Section 3.3.1). Rothamsted had earlier (winter 2012) sown sclerotia in an OSR field for a local field trial, and the biosensor chips were available. It was possible to make electrochemical measurements with the biosensor chips using a handheld potentiostat and a bespoke connector. However, it was realised that the field trial could not collect data on a scale that would be meaningful for empirical methods. The original goals were therefore modified and a physical modelling approach was adopted instead. The idea was to evaluate a physical model capable of estimating spore concentrations in a canopy environment; the model output could subsequently be scaled and used to address some aspects of the original goals. The field trial experiment was then designed with the intention of collecting spore data, measuring the concentration with the biosensor chips, and using that data to evaluate the physical model. However, a preliminary calibration test indicated that the biosensors might not be sensitive enough to provide meaningful data, at which point a more reliable quantification technique was incorporated into the experimental plan.
The unreliability of the biosensor also raised the priority of data integrity issues within the research and motivated the methodology presented in Chapter 5.