
Modelling and Multivariate Data

Analysis of Agricultural Systems

A thesis submitted to The University of Manchester for the

degree of Doctor of Philosophy in the Faculty of

Engineering and Physical Sciences

2015

Najib U Lawal

School of Electrical and Electronic Engineering

Table of Contents

Table of Contents
List of Figures
List of Tables
Abstract
Declaration
Copyright Statement
Acknowledgements
Abbreviations
Chapter 1 Introduction
1.1 Research Motivation
1.2 The SYIELD Project
1.2.1 The Biosensor
1.3 Main Objectives
1.4 Contributions of the Thesis
1.5 Thesis Structure
Chapter 2 Literature Review
2.1 Sclerotinia sclerotiorum
2.1.1 Sclerotinia Ascospore Release
2.1.2 Sclerotinia Ascospore Dispersal
2.1.3 Sclerotinia sclerotiorum Epidemiology
2.1.4 Sclerotinia Disease Models
2.2 Dispersion Modelling
2.2.1 Gaussian Dispersion Model
2.2.2 Trajectory Models
2.2.3 CALPUFF
2.3 Multivariate Statistical Analysis
2.3.1 Multivariate Analysis in Agriculture
2.3.2 Multivariate Statistical Process Control
2.4 Sensors, Biosensors and Sensor Networks
2.4.1 Peculiar Challenges of Biosensor Networks
2.5 Conclusion
Chapter 3 Dispersion of Sclerotinia sclerotiorum Spores in an Oilseed Rape Canopy
3.1 Introduction
3.2 Motivation for Experimental Field Trial
3.3 Methodology
3.3.1 Field Trial Experiment
3.3.2 Identification and Quantification of Spores
3.4 Results
3.4.1 Biosensor Test and Calibration Results
3.4.2 Colourimetric Analysis Results and Discussion
3.4.3 Spore DNA (qPCR) Results
3.5 Discussion
3.5.1 Reliability of the Prototype Biosensor in Measuring Oxalic Acid
3.5.2 Sclerotinia sclerotiorum Spore Dispersion
3.5.3 Experimental Value of Spore Data
3.5.4 Limitations
3.6 Conclusion
Chapter 4 A Backward Lagrangian Stochastic (bLS) Model for the Dispersion of Sclerotinia sclerotiorum Spores
4.1 Introduction
4.2 Motivation for Trajectory Modelling Approach
4.3 Background Theory
4.3.1 Lagrangian Stochastic Models
4.3.2 The Backward Lagrangian Stochastic Model
4.3.3 Monin-Obukhov Similarity Theory (MOST)
4.4 Methodology
4.4.1 Parametrising the bLS Model for Sclerotinia Dispersion
4.4.2 Implementing the bLS Model
4.4.3 Comparing Model Estimates to Experimental Data
4.4.5 Assessing Model Performance
4.5 Results
4.6 Discussion
4.6.1 bLS Model Performance
4.6.2 Limitations of Experiment
4.7 Conclusions
Chapter 5 An Integrated Fault Detection, Identification and Reconstruction Scheme for Agricultural Systems
5.1 Motivation
5.2 Background Theory
5.2.1 Principal Components Analysis (PCA)
5.2.2 Multivariate Statistical Process Control (MSPC)
5.2.3 Kernel Density Estimation
5.3 Methodology
5.3.1 Data
5.3.2 Principal Component Analysis of PM10
5.3.3 Multivariate Statistical Process Control (MSPC)
5.3.4 Online Fault Detection of a PM10 Network with Missing Data
5.3.5 Online Fault Identification in a PM10 Network
5.3.6 Augmented MSPC
5.4 Results
5.4.1 PCA Analysis of PM10
5.4.2 Data Pre-processing and Preliminary Model of PM10
5.4.3 Final Monitoring Model and Control Limits
5.4.4 Online Fault Detection of PM10 Network
5.4.5 Online Fault Identification
5.4.6 Online Fault Detection in a PM10 Network
5.5 Discussion
5.5.1 Integrated Fault Detection, Identification and Reconstruction in a PM10 Network
5.5.2 Limitations of K-MSPC
5.6 Conclusion
Chapter 6 Conclusion, Recommendations and Future Work
6.1 Overview of Research Motivation
6.2 Summary of Principal Findings
6.2.1 Field Trial Experiment and Generation of Novel Data
6.2.2 Evaluating a 3D bLS Model with Experimental Data
6.2.3 Multivariate Data Analysis of Potential Sensor Network
6.3 Real-world Applications of Research
6.4 Further Areas of Research
References
Appendix 1: Original Plan and Modifications Made

WORD COUNT: 59,685

List of Figures

Figure 1.1: Biosensor components with sources of failure identified.
Figure 2.1: Lifecycle of Sclerotinia sclerotiorum [68]
Figure 2.2: Spore dispersal downwind of an above-ground plume source [80]
Figure 2.3: Kernel estimates showing individual kernels and the effect of bandwidth h_KDE: (a) h_KDE = 0.2; (b) h_KDE = 0.8 [112]
Figure 3.1: Location of Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084), the experimental site, among other field trial sites at Rothamsted Research UK (source of image: Rothamsted Research).
Figure 3.2: Layout of sampling area (43m by 28m) within field trial site from 31st May 2013 to 3rd June 2013 showing positions of Rotorod samplers. Data was collected at two heights of 0.8m and 1.6m (O), and at additional heights of 2.4m and 3.2m (⊕). An arrangement of biosensor unit, weather station and a 3D sonic anemometer was situated at the centre of the 7m-diameter ring of ascospores. Scale of sampling area excluding upwind sampling point: 35m by 28m. All sampling positions are 7m apart except I, which is 14m from D. B is 1m away from the edge of the source ring.
Figure 3.3: Experimental trial field showing Rotorod samplers (with rain shields) above OSR canopy. (Image taken by the author.)
Figure 3.4: Rotorod samplers at position B deployed at 0.8m (obscured), 1.6m, 2.4m and 3.2m, pictured without rain covers. Position B (as well as D) sampled at two additional heights. (Image taken by the author.)
Figure 3.5: A typical assembly of Rotorod sampler (1), battery (2) and Burkard timer (3), seen here powering only one sampler with its other output unused. (Image taken by the author.)
Figure 3.6: Biosensor attached to Uniscan potentiostat using a bespoke connector (1). The prototype biosensor's (2) sensing surface is an enzyme-coated carbon electrode (black circular area in right frame). (Image taken by the author.)
Figure 3.7: Biosensor calibration curve for five repeated measurements at 60°C after allowing 120 seconds of mixing (n = 25, error bars = ±1 S.D.).
Figure 3.8: Oxalic acid concentrations for all days for samples collected below the OSR canopy
Figure 3.9: Oxalic acid concentrations for all days for samples collected below the OSR canopy
Figure 3.10: Side-by-side comparison of daily oxalic acid concentrations for all positions. The positions of collection of spores represent Rotorod samplers that were deployed below the canopy.
Figure 3.11: Concentrations grouped by position for all sampling days. Spores tested for oxalic acid were collected below the canopy.
Figure 3.12: Along-wind concentration (spore DNA) gradient below OSR canopy for first three sampling days. The key refers to field positions (letters) and height of deployment above ground (numbers). The spore DNA axis is scaled for clarity; maximum values for the first two days are shown at the top and have the same units as the vertical axis.
Figure 3.13: Along-wind concentration (spore DNA) gradient above OSR canopy for first three sampling days. Lateral (crosswind) sampling positions are not shown.
Figure 3.14: Wind rose showing forecast (a) and actual (b) wind speeds and directions on day 4. The forecast wind readings were used to set the sampling axis, resulting in a misalignment of sampling grid and spore plume.
Figure 3.15: The spore gradient at position B (1m downwind of spore ring) with height for first three sampling days.
Figure 3.16: The spore gradient at position D (14m downwind of spore ring) with height for first three sampling days.
Figure 3.17: Spore dispersal gradient for all positions including crosswind (lateral) sampling positions. The spore DNA concentration axis is in nanograms (ng) and is scaled between 0 and 1ng for clarity. The key refers to field positions (letters) and height of deployment above ground (numbers).
Figure 3.18: Spore DNA below the canopy plotted against distance from centre of spore ring for first three days of sampling. Data is fitted to an inverse power law with coefficients, exponents and R² as shown.
Figure 3.20: Kernel Density Estimation of spore DNA distribution below (left) and above (right) the canopy.
Figure 3.21: Dispersion contours of spore concentration below (left) and above (right) the canopy.
Figure 4.1: The assumed source configuration used for concentration footprint calculation, showing approximate locations of 6 groups of Sclerotinia. Each group is assumed to cover a 1 square metre area based on approximate measurements of the area covered by fruiting bodies. The bottom-left vertices of each square, starting with 1, are: (-2.25, 2.5), (1.25, 2.5), (3, -0.5), (1.25, 4.0), (-2.25, -4.0), and (-4, 0.5). (Drawing not to scale.)
Figure 4.2: Normalised observations (blue asterisks) versus normalised model predictions (red circles) above (left panels) and below (right panels) the canopy for the downwind sampling positions for all sampling days.
Figure 4.3: Normalised observations (blue asterisks) versus normalised model predictions (red circles) above (left panels) and below (right panels) the canopy for the crosswind sampling positions for all sampling days.
Figure 4.4: Normalised observations versus normalised model predictions for all observed concentrations above the canopy. The blue line is the 1:1 line.
Figure 4.5: Normalised observations versus normalised model predictions for all observed concentrations below the canopy. The blue line is the 1:1 line.
Figure 5.1: Score plot showing first PC against second (numbers represent sample number, i.e. hour of year)
Figure 5.2: Loading plot of first vs. second PC showing all monitoring stations (numbers represent station numbers)
Figure 5.3: Percentage of variance explained by first 20 PCs
Figure 5.4: Calibration and cross-validation errors for first 20 PCs
Figure 5.4: Missing data distribution before pre-processing
Figure 5.5: Missing data distribution after pre-processing
Figure 5.6: Monitor locations showing deleted monitors (red) with excessive missing data
Figure 5.7: Score plots showing the 4 largest PCs against each other
Figure 5.8: Cross-validation and calibration errors
Figure 5.9: Hotelling T² chart for preliminary PCA model
Figure 5.10: SPE chart for preliminary PCA model
Figure 5.11: Outliers from preliminary model's SPE showing daily time of emission
Figure 5.12: Hotelling T² for final PCA model
Figure 5.13: SPE chart for final PCA model
Figure 5.14: Kernel density estimated distributions of Hotelling T² and SPE
Figure 5.15: KDE ICDF showing 95th percentile for Hotelling T² and SPE
Figure 5.16: Hotelling T² control chart for new in-control samples
Figure 5.17: SPE chart for new in-control samples
Figure 5.18: Hotelling T² control chart for in-control samples with missing data
Figure 5.19: SPE control chart for in-control samples with missing data
Figure 5.20: Hotelling T² chart with severe case of missing data (25%)
Figure 5.21: SPE chart with severe missing data (25%)
Figure 5.22: Variable loadings on PC1
Figure 5.23: Hotelling T² chart for simulated out-of-control samples
Figure 5.24: SPE chart for simulated out-of-control samples
Figure 5.25: SPE-T² chart for simulated out-of-control samples
Figure 5.26: Hotelling T² contribution plot for 4 corrupted samples (Table 5.2)
Figure 5.27: SPE contribution plot for 4 corrupted samples (Table 5.2)

List of Tables

Table 3.1: Volumes of oxalic acid required to prepare 50 mL of 0, 50, 100, 500, 1000 and 1500 µmol L⁻¹ standards from 10 mmol L⁻¹ stock
Table 3.2: Current recorded by the biosensor following the measurement procedure described in section 3.3.1. Values highlighted in yellow are above the baseline noise level determined in the last section and are considered positive for oxalic acid. Heights of 0.8m correspond to Rotorod samplers deployed below the canopy (canopy height = 1m).
Table 3.3: Concentrations of oxalic acid measured by colourimetric analysis. Values in purple are positively and quantitatively representative of oxalic acid. Heights of 0.8m correspond to Rotorod samplers below the canopy; all others are above the canopy (canopy height = 1m).
Table 3.4: Spore DNA converted to spore numbers using 0.35pg per single spore, as determined by Rogers et al. [153].
Table 4.1: Table of model parameters.
Table 4.2: Calculated model performance measures for different observation groups (above or below canopy height). Number of observations is shown in square brackets.
Table 5.2: Index of variables and sample numbers of corrupted observations
Table 5.2: Control charts with increasing missing data
Table 5.3: Augmented MSPC results showing deviation of corrupted variables from their kriged estimates and the kriging estimator's variance


Abstract

Najib Lawal

Modelling and multivariate data analysis of agricultural systems

The University of Manchester (2015)

The broader research area investigated during this programme was conceived from the goal of contributing towards food security in the 21st century through the reduction of crop loss and the minimisation of fungicide use. This is to be achieved through the introduction of an empirical approach to agricultural disease monitoring. In line with this, the SYIELD project, initiated by a consortium involving the University of Manchester and Syngenta, among others, proposed a novel biosensor design that can electrochemically detect viable airborne pathogens by exploiting the biology of plant-pathogen interaction. This approach offers an improvement on the inefficient and largely experimental methods currently used. Within this context, this PhD focused on the adoption of multidisciplinary methods to address three key objectives that are central to the success of the SYIELD project: local spore ingress near canopies, the evaluation of a suitable model that can describe spore transport, and multivariate analysis of the potential monitoring network built from these biosensors.

The local transport of spores was first investigated by carrying out a field trial experiment at Rothamsted Research UK in order to investigate spore ingress in OSR canopies, generate reliable data for testing the prototype biosensor, and evaluate a trajectory model. During the experiment, spores were air-sampled and quantified using established manual detection methods. Results showed that manual methods, such as colourimetric detection, are more sensitive than the proposed biosensor, suggesting that the proxy measurement mechanism used by the biosensor may not be reliable in live deployments, where spores are likely to be contaminated by impurities and other inhibitors of oxalic acid production. Spores quantified using the more reliable quantitative Polymerase Chain Reaction proved informative and provided novel data of high experimental value. The dispersal gradient in this data was found to fit a power-law decay, a finding that is consistent with experiments in other crops.

In the second area investigated, a 3D backward Lagrangian Stochastic (bLS) model was parameterised and evaluated with the field trial data. The bLS model, parameterised with Monin-Obukhov Similarity Theory (MOST) variables, showed good agreement with experimental data and compared favourably in terms of performance statistics with a recent application of an LS model in a maize canopy. Results obtained from the model were found to be more accurate above the canopy than below it. This was attributed to a higher error during initialisation of release velocities below the canopy. Overall, the bLS model performed well and demonstrated suitability for adoption in estimating above-canopy spore concentration profiles, which can further be used for designing efficient deployment strategies.

The final area of focus was the monitoring of a potential biosensor network. A novel framework based on Multivariate Statistical Process Control (MSPC) concepts was proposed and applied to data from a pollution-monitoring network. The main limitation of traditional MSPC in spatial data applications was identified as a lack of spatial awareness by the PCA model when considering correlation breakdowns caused by an incoming erroneous observation. This resulted in misclassification of healthy measurements as erroneous. The proposed Kriging-augmented MSPC approach was able to incorporate this capability and significantly reduced the number of false alarms.


Declaration

No portion of the work referred to in the thesis has been submitted in support of an application

for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

The author of this thesis (including any appendices and/or schedules to this thesis) owns certain

copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester

certain rights to use such Copyright, including for administrative purposes. Copies of this thesis,

either in full or in extracts and whether in hard or electronic copy, may be made only in

accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations

issued under it or, where appropriate, in accordance with licensing agreements which the

University has from time to time. This page must form part of any such copies made.

The ownership of certain Copyright, patents, designs, trademarks and other intellectual property

(the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example

graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned

by the author and may be owned by third parties. Such Intellectual Property and Reproductions

cannot and must not be made available for use without the prior written permission of the

owner(s) of the relevant Intellectual Property and/or Reproductions.

Further information on the conditions under which disclosure, publication and commercialisation

of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it

may take place is available in the University IP Policy (see

http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis

restriction declarations deposited in the University Library, The University Library’s regulations

(see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on

Presentation of Theses.


Acknowledgements

I would like to express my profound gratitude to Prof Barry Lennox, who was instrumental in

providing me with the opportunity of a lifetime to embark on a PhD. I am particularly grateful for

his patience and understanding when I faced health challenges throughout the final year of this

study.

I am especially thankful to my co-supervisor Dr Bruce Grieve who was always prompt with

assistance and advice. Bruce was also very understanding throughout this last year.

My appreciation also goes to Dr Jon West and Dr Steph Heard for being very hospitable and

tremendously helpful during my field trial experiment at Rothamsted Research.

I am grateful to my UK parents, Dr & Mrs Shehu, for all their support and encouragement, which

I cannot even begin to describe.

To my parents and dear sisters, whose constant love and support has been a life source, I love

you with all my heart.

Finally, to all my friends and colleagues in Manchester, which I have called home for the last five

years, and others all over the world, thank you for making the experience worthwhile.


Abbreviations

ANOVA Analysis of Variance

bLS Backward Lagrangian Stochastic Model

CFD Computational Fluid Dynamics

CTG Contiguous Themed Grid

EAD Eulerian Advection Model

ECN Environmental Change Network

EGA Error Grid Analysis

EKF Element-wise K-Fold Cross Validation

EM Expectation Maximisation

EPA Environmental Protection Agency

EWMA Exponentially Weighted Moving Average

FA Factor Analysis

FAC2 Predictions within a factor of 2

FAC5 Predictions within a factor of 5

FB Fractional Bias

fLS Forward Lagrangian Stochastic Model

GPM Gaussian Plume Model

IDW Inverse Distance Weighted interpolation

KDE Kernel Density Estimation

LAI Leaf Area Index

LAQN London Air Quality Network

LS Lagrangian Stochastic Model

MG Geometric Mean

MISE Mean Integrated Square Error

MOST Monin-Obukhov Similarity Theory

MSPC Multivariate Statistical Process Control

MTA Mean Tilt Angle

NIPALS Non-linear Iterative Partial Least Squares

NMSE Normalised Mean Squared Error

OA Oxalic Acid

OK Ordinary Kriging

OSR Oilseed Rape

PCA Principal Component Analysis

PCR Principal Component Regression


PLS Partial Least Squares

PLSR Partial Least Squares Regression

PM10 Particulate Matter with diameter less than 10 µm

PMP Projection to Model Plane

PRESS Predicted Residual Sums of Squares

qPCR Quantitative Polymerase Chain Reaction

R Pearson’s Correlation Coefficient

RMSEP Root Mean Squared Error of Prediction

RMSECV Root Mean Squared Error of Cross Validation

SCP Single Component Projection

SDE Stochastic Differential Equation

SDS Sodium Dodecyl Sulphate

SPE Squared Prediction Error

SSR Sclerotinia Stem Rot

SVD Singular Value Decomposition

SVI Sensor Validity Index

VG Geometric Variance


Chapter 1 Introduction

This chapter sets the context of the thesis, introduces the parent project that spawned the PhD

and lists the main contributions of the research.

1.1 Research Motivation

With the global population recently exceeding 7 billion and expected to reach 9.6 billion by 2050

[1], competition for depleting earth resources – land, water, food – will become more intense.

Maintaining food security is already a challenge, with yield, production and harvested land all on

a declining trend [2]. This calls for innovative agricultural practices that can help achieve and

maintain food security. One of the ways to achieve food security is by minimising pre-harvest

crop loss, where the central challenge is to eliminate or at least reduce the destructive effect of

crop pathogens [3]. Among pathogens, aerially transmitted fungal spores are the most prevalent,

far-reaching, rugged and, under conducive environmental conditions, the most destructive [4].

Fungal spores are difficult to control because it is not straightforward to detect them. Effective

detection of spores requires their measurement, which, in turn, involves both the collection and

the quantification of the pathogens. Manual detection, which can be used to physically collect

and count spores, is only feasible on small scales and, unfortunately, reliable automated detection

methods have been lacking, as there are no engineered biosensors available that can

detect spores by exploiting the biological interaction between plants and pathogens. Farmers

currently control fungal spores by the preventative application of fungicides to entire fields when

an infection is suspected. However, as crop protection chemicals have only a limited life, the

efficacy of fungicides has often decayed before the onset of the pathogenic event, leaving crops

with limited protection. Additionally, fungicide overuse often results when farmers panic after

realising earlier applications were ineffective. This excessive application of fungicides may instead

have lethal consequences on beneficial arthropods and microbes that promote plant growth [5].


To minimise the inefficiency of fungicides and the resulting crop loss from infection by pathogens,

the agricultural community relies on two approaches to forecast crop disease: spore release

forecasts and disease/infection forecasts. Spore release forecasts utilise mechanistic models [6,

7], based on environmental conditions such as soil relative humidity and temperature, to forecast

release events from soil-borne fungi. Disease forecasts [8] are based not on the release

of spores but on the occurrence of ideal environmental conditions for plant infection. These latter

disease models assume a constant airborne spore concentration and alert farmers when

favourable infection conditions manifest. Both of these approaches may provide precise timing

regarding the onset of release events and infection but they lack location precision and the ability

to estimate spore ingress. The second approach is especially wasteful since it requires farmers to

apply crop protection when an infection threshold is reached whether or not there are spores

present in the air. Therefore, the issue of fungicide inefficiency in crop protection is still an

unsolved challenge and a forecasting system that will provide time and location precision would

offer considerable benefits in terms of economic savings on fungicide cost and improved crop

yield.

The main advantage of these two approaches despite their shortcomings is that they allow the

risk of crop disease to be predicted, without requiring spore data to be collected. A more reliable

method for forecasting crop disease is inoculum-based, where aerial spore concentrations are

measured and then used as input to empirical models in order to estimate the spread of the

fungal spores over a window of time into the future. Although mechanisms for measuring aerial

spore concentration have existed for some time in agriculture, there do not exist empirical

methods able to reliably forecast large-scale agricultural disease risks based on these

measurements [9]. Current pathogen detection methods in the agricultural industry rely on the

manual collection of spores on local scales, which can be extremely time consuming and

unreliable [10]. The more rudimentary techniques are based on collection by sedimentation,

where petri dishes are kept at various distances from a source, for spores to ‘settle’ on to. The

more efficient collection methods use air-sampling equipment to capture airborne spores [10,

11]. While existing air-sampling techniques simplify the collection aspect of spore detection, the

collected samples still need to be processed and in most cases biologically quantified. Detecting

pathogens in this manner tends to be avoided because of the time-consuming nature

of currently available identification processes [12].

Another disadvantage of the currently available detection methods is that they are not practically

scalable and their value lies mainly in experimental trials. The restriction of these methods to

experimental studies is due to the inherent assumption that the locations of the fungal sources


are known, and the rudimentary nature of the measurement systems compared to, say,

meteorological sensors which have relatively fast (hourly) and automated sampling. As a result,

pathogen data is collected under specific conditions and on small scales (on the order of tens of

meters) to reduce manual biological quantification difficulties.

To realistically monitor and forecast the risk that airborne pathogens pose to crops, a source-

independent, large-scale collection of spores that will offer advance warning and enable holistic

determination of spore ingress is required. An empirical approach based on the multivariate

analysis of data collected from a network of sensors has the ability to offer such a forecast,

provided the challenge of measuring spore concentration (automatic collection and quantification)

can be addressed. Recent developments in the area of automatic detection of airborne

pathogens [13] hold hope for empirical approaches to agricultural problems. It was in this spirit

that the SYIELD project was set up in 2010 with the aim of providing farmers with advance

warning and precision spray advice.

1.2 The SYIELD Project

The SYIELD project, involving the University of Manchester, Syngenta, Gwent Technology, among

others, was set up with the aim of developing an online risk-forecasting model of fungi-induced

crop diseases that was based on a nationwide biosensor network able to detect viable airborne

fungal spores. The information from these biosensors (see section 1.2.1) was then to form a

decision support system for farmers, enabling them to make efficient and systematic decisions

regarding fungicide applications in a way that ensures cost savings and minimal environmental

impact.

Early adoption of the project was proposed for oilseed rape (OSR), which is particularly susceptible

to Sclerotinia Stem Rot (SSR) in the UK. SSR is caused by Sclerotinia sclerotiorum, a pathogenic

plant fungus that affects over 400 plant species and may cause yield loss of up to 50% [8]. (See

section 2.1).

1.2.1 The Biosensor

The biosensor that was proposed and developed for measuring fungal spore concentration

mimicked the biology of plant-pathogen interaction through the provision of a nutrient source

that acted as food for viable spores. Once designed, this biosensor provided, for the first

time, a real-time, unsupervised means of spore detection. The biosensor was made up of a

sensing surface and a host of mechanical components that suck in airborne spores, incubate and

heat them to the optimal temperature for a biochemical reaction to take place. This reaction,


called an oxalate oxidase catalysed reaction [14], produces a pathogenicity factor of Sclerotinia

– oxalic acid (OA). Oxalic acid is then electrochemically measured as a current to infer the

concentration of the spores. The sensing surface was made up of an active (enzyme-coated)

biological surface designed to bind and react with Sclerotinia spores by providing them with a

nutrient base.


Figure 1.1: Biosensor components with sources of failure identified.

The entire detection process takes a total of three days from sample collection, through

incubation, to the oxalate oxidase reaction and subsequent electrochemical measurement of OA.


The biosensor is therefore a complex combination of multiple components. Figure 1.1 shows an

illustration of the sensor model, its main components and major sources of error.
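
At the data level, the measurement chain just described reduces to inverting a calibration curve that maps oxalic acid concentration to current (of the kind shown later in Figure 3.7). The sketch below illustrates this inversion for a hypothetical linear calibration; the slope and intercept values are invented for the example and are not taken from the actual biosensor calibration.

```python
def oa_concentration(current_uA, slope=0.012, intercept=0.05):
    """Invert a hypothetical linear calibration I = slope * C + intercept.

    current_uA : measured current (microamperes, illustrative units)
    returns    : inferred oxalic acid concentration (micromol/L)
    """
    return (current_uA - intercept) / slope

# A measured current of 1.25 uA maps to (1.25 - 0.05) / 0.012 = 100 umol/L.
print(oa_concentration(1.25))  # 100.0
```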

1.3 Main Objectives

The original objectives of this work had to be modified due to a delay in the production of the

prototype biosensors. This is discussed in detail in Appendix 1. From figure 1.1, it may be

observed that the reliability of a sensed measurement can be affected by the production of oxalic

acid by masquerades, suppression of oxalic acid production by competing fungi, and mechanical

faults either in the form of potentiostat or pump failure. These are in addition to the noise that is

ever present in measurements. The identified sources of error fall in to four categories - false

positive errors, false negative errors, unusually high or low values (outliers) and missing data.

Throughout this thesis, ‘faults’ and ‘errors’ refer to measurement abnormalities resulting from

these four categories.
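
The sketch below makes these categories concrete by injecting each error type into a synthetic measurement series; the baseline level, noise and fault magnitudes are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical in-control biosensor signal: baseline current plus noise.
clean = 5.0 + rng.normal(0.0, 0.3, size=100)

faulty = clean.copy()
faulty[10] = 9.0            # false positive: OA signal with no viable spores present
faulty[25] = 0.1            # false negative: spores present but signal suppressed
faulty[40] = 6 * clean[40]  # outlier: unusually high value (e.g. potentiostat glitch)
faulty[60:65] = np.nan      # missing data: pump failure or lost transmission
```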

When the biosensor is deployed in a network, the task of ensuring data integrity becomes

significantly more challenging due to the multiplication of these errors across the network and the challenge of automating

the monitoring process. Hence, error detection, identification and reconstruction methods have

to be designed for the network to ensure data integrity. A first step to achieving this is to

understand the dispersion mechanisms, both on local (in OSR canopies) and large scales. On

local scales, experiments can be carried out to generate data. On a large scale, however, other

sources of data have to be identified and relied upon for analysis, as the tedious, manual

nature of detection and quantification already described makes large-scale experiments

almost impossible. Particulate matter, made up of fine particles less than 10 µm in diameter

(PM10), from monitoring networks is promising in this regard and is discussed further in section

5.1.

The main objectives of this research are as follows:

1. Investigating Sclerotinia sclerotiorum dispersion in OSR fields through an experimental trial

that studied the natural release, transport and dispersion of spores.

2. Identifying a model able to estimate approximate travel distances of Sclerotinia spores at

short distances from the source.

3. Validating the identified spore dispersion model using experimental data.

4. Evaluating fault detection, fault identification and subsequent re-estimation techniques for

measured data over the potential biosensor network by extending and modifying

multivariate data analysis techniques.


The research thus seeks to offer a multidisciplinary solution to the identified problems and draws

on three areas, which are reviewed in Chapter 2: agricultural sciences, micrometeorology and

multivariate statistical process control.

1.4 Contributions of the thesis

The research study carried out during this PhD focuses on experimental design of pathogen

dispersal, modelling and multivariate data analysis in agricultural systems. The major

contributions of the work are as follows:

1. Conception, design and implementation of an experimental field trial for the release and

dispersion of Sclerotinia spores in an OSR field that yielded novel experimental data. A field

trial experiment has been designed and implemented at Rothamsted Research to investigate

the dispersion of spores in an OSR field. The objective of the experiment was to generate

data that described spore transport in and above an OSR canopy. The resulting concentration

gradients enable evaluation of dispersion models as well as estimates of safe distances from

fitted decay models. The 3 dimensional nature of the data as opposed to vertical profile

experiments, which are widely available for other types of fungal spores is of high

experimental value. For example, numerous multidimensional dispersion models, such as

Large Eddy Simulations (LES) and forward Lagrangian Stochastic (fLS) models can be

evaluated using the data.

2. The novel application of a backward Lagrangian Stochastic (bLS) model to describe spore

transport in and above an OSR canopy. During this study, the data generated from the field

trial experiment was used to evaluate a bLS model. While forward Lagrangian Stochastic

models have been applied to spore transport in crop canopies, a bLS model has not, to the

author’s knowledge, been applied to fungal spores in comparable canopies. The purpose of

this application was to evaluate the performance of a trajectory model. The back trajectories

generated with bLS can enable the future determination of minimum distances of separation

between biosensors in the near field (distances from the source characterised by canopy

disruption of surface layer turbulence), where prediction with other types of models can be

unreliable. The bLS model was parameterised using both on-field measurements and

empirical data from experimental findings in literature. Atmospheric turbulence was

characterised using a Monin-Obukhov Similarity Theory (MOST) approach. It was shown that

the model agrees with the experimental data, and compared favourably in terms of model


performance statistics with a recent application of a Eulerian-Lagrangian Stochastic model in

a maize canopy.
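
For orientation, the sketch below shows the simplest forward form of such a trajectory model: a one-dimensional Langevin update of a particle's vertical velocity in homogeneous Gaussian turbulence, with perfect reflection at the ground. It is a textbook LS step, not the 3D, MOST-parameterised bLS implementation of Chapter 4, and the turbulence parameters sigma_w and T_L are illustrative placeholders.

```python
import numpy as np

def ls_trajectory(z0, sigma_w=0.5, T_L=5.0, dt=0.05, n_steps=2000, seed=1):
    """1D Lagrangian Stochastic trajectory in homogeneous turbulence.

    Langevin update:  dw = -(w / T_L) dt + sqrt(2 sigma_w^2 / T_L) dW,
                      dz = w dt
    sigma_w : std dev of vertical velocity fluctuations (m/s, illustrative)
    T_L     : Lagrangian integral time scale (s, illustrative)
    """
    rng = np.random.default_rng(seed)
    z = np.empty(n_steps)
    z[0], w = z0, rng.normal(0.0, sigma_w)
    for k in range(1, n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        w += -(w / T_L) * dt + np.sqrt(2.0 * sigma_w**2 / T_L) * dW
        z[k] = z[k - 1] + w * dt
        if z[k] < 0.0:              # perfect reflection at the ground
            z[k], w = -z[k], -w
    return z

# Release many particles at 0.5 m and histogram the final heights for a crude
# concentration profile; a bLS run instead traces particles backward in time
# from the sensor location towards candidate sources.
heights = np.array([ls_trajectory(0.5, seed=s)[-1] for s in range(500)])
```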

3. A novel procedure for extending multivariate statistical process control (MSPC) to spatial data,

potentially biosensed Sclerotinia spore data, is presented. MSPC is a set of statistical

tools for monitoring and controlling multivariate industrial processes where process

variables are correlated. A similar correlation, resulting from the spatial dependence of airborne

spore concentrations is expected between biosensor measurements. Due to mechanical

failures, theft, vandalism and errors in the biological sensing process, biosensors will

inevitably have faulty, missing or erroneous measurements. Consequently, a monitoring

framework was presented that will ensure data integrity when measurements are missing,

and ensure detection of false positives, false negatives and outlying measurements. The novel

procedure, named K-MSPC, is an augmented MSPC approach that incorporates Kriging

interpolation into the monitoring scheme so that K-MSPC is aware of spatial dependence. For

example, in a typical MSPC application, a high measurement at a biosensor surrounded by

high neighbouring measurements would be designated as a fault. However, K-MSPC could

determine that high neighbouring values imply a spatial correlation, possibly due to the local

release of spores. K-MSPC was demonstrated with PM10 data sourced from the London Air

Quality Network (LAQN) of Kings College London. PM10 was chosen because of its

aerodynamic similarities (size and settling velocity) and, therefore, dispersion similarities to

Sclerotinia spores. The application of K-MSPC was shown to be successful in detecting and

identifying faults while minimising false alarms and handling missing data. K-MSPC could be

extended to biosensor and general environmental monitoring networks measuring particles

with similar aerodynamic characteristics to Sclerotinia where the spatial scale calls for a

modification of traditional MSPC. It has an advantage over current methods where anomaly

detection is done at the individual sensor level, an approach that poses significant scaling challenges for

large sensor networks.
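
A minimal sketch of the statistics underlying this scheme is given below, assuming a PCA model fitted to in-control data: Hotelling's T² and the SPE are computed for each incoming sample, and a spatial-consistency check against neighbouring sensors indicates whether an alarm is more plausibly a sensor fault or a genuine local event. For brevity the kriged estimate is replaced by an inverse-distance-weighted surrogate; the actual K-MSPC procedure of Chapter 5 uses ordinary kriging and KDE-derived control limits.

```python
import numpy as np

def fit_pca(X, n_pc):
    """Fit a PCA monitoring model to in-control data X (samples x sensors)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_pc].T                        # loading matrix (sensors x PCs)
    lam = S[:n_pc] ** 2 / (len(X) - 1)     # variances of retained PCs
    return mu, sd, P, lam

def t2_spe(x, mu, sd, P, lam):
    """Hotelling's T2 and SPE (Q statistic) for one new sample x."""
    xs = (x - mu) / sd
    t = xs @ P                             # scores in the PC subspace
    return np.sum(t ** 2 / lam), np.sum((xs - t @ P.T) ** 2)

def neighbour_deviation(x, coords, i, power=2.0):
    """Inverse-distance-weighted surrogate for a kriged estimate of sensor i."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.zeros_like(d)
    w[d > 0] = d[d > 0] ** -power          # zero weight on sensor i itself
    return x[i] - np.dot(w, x) / w.sum()

# A sample that trips the SPE limit while deviating little from its kriged
# (here IDW) neighbourhood estimate is more plausibly a genuine local event,
# such as a spore release, than a faulty sensor.
```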

1.5 Thesis Structure

The thesis is organised into six chapters beginning with an introduction of the research motivation

and general research overview in Chapter 1. Chapter 2 reviews literature from the three key

disciplines drawn from in this work. It begins with a review of the epidemiology of Sclerotinia

sclerotiorum, followed by a review of pathogen dispersal and an introduction to multivariate data

analysis and its applications in agriculture. Limitations of the current approaches and areas of


improvement are identified in this chapter. The subsequent three chapters (Chapter 3-5)

constitute the main research work carried out during this PhD programme.

Chapter 3 presents the details of an experimental trial carried out at Rothamsted Research’s

facility in Harpenden, which was designed to sample naturally released Sclerotinia sclerotiorum

spores in an OSR field. Details of the collection, quantification and identification of spores by

various means, as well as the analysis of the data, are presented. An evaluation of the accuracy

and specificity of the different spore quantification methods used is presented in this chapter.

Chapter 4 presents the application and parameterization of a 3D backward Lagrangian Stochastic

(bLS) model to the data generated in Chapter 3. The chapter begins by explaining the reasoning

behind the choice of a trajectory model, followed by an introduction to Monin-Obukhov

Similarity Theory (MOST), which enables the parameterisations used in bLS in this work. This is

then followed by an introduction to the Lagrangian Stochastic (LS) model and subsequently bLS.

Data pre-processing and the implementation of bLS are then discussed. An evaluation of the

model’s performance based on its agreement with the experimental data from Chapter 3

concludes the chapter.

Chapter 5 proposes a novel procedure for fault detection in a potential biosensor network that is

based on multivariate data analysis methods. PM10 monitoring data was used in this chapter and

the reasons for this choice are provided in the introduction to the chapter. The chapter introduces

Principal Component Analysis (PCA), which has been chosen as a suitable model to describe the

biosensor network. Multivariate statistical process control (MSPC) concepts are then introduced

and their adaptation to the spatial PM10 data are discussed as the chapter unfolds. Limitations

of the traditional MSPC approach are demonstrated and the proposed augmented MSPC

procedure (K-MSPC) is presented and applied to the PM10 data.

Chapter 6 concludes the thesis by summarising principal findings and drawing real-world

conclusions from them. Future areas of research are also identified and presented

in this chapter.


Chapter 2 Literature Review

This chapter gives a review of the current literature in the areas of research identified in chapter

1. In the previous chapter, the motivation for this research was introduced as being based upon

the inefficiencies of current agricultural methods. In this chapter, these current practices are

reviewed and specific shortcomings are identified. This leads on to the consideration of other

research areas that could offer improvements on the current methods and provide transferable

techniques that can enable the implementation of the proposals in this study.

2.1 Sclerotinia sclerotiorum

2.1.1 Sclerotinia Ascospore Release

Sclerotinia sclerotiorum is a pathogenic plant fungus that affects approximately 400 plant species

worldwide [15]. The fungus can germinate both carpogenically and myceliogenically [16]. In the

latter case, no ascospores are produced and potential for infection is mainly through stems and

roots of neighbouring plants. For carpogenic germination, which is influenced by factors such as

soil moisture and temperature [14], small fruiting bodies known as apothecia are produced on

the sclerotia [17]. Apothecia can attain sizes of 1cm in diameter and are capable of producing up

to 5 × 10⁶ spores [18] over a lifetime of about 20 days under ideal conditions [19]. There is no

consensus with regard to the exact ideal conditions for ascospore release. However, it is widely believed

that illumination after dark, decrease in relative humidity and increase in temperature are the

determinants of spore release. This was first investigated and reported by Ingold [20]. McCartney

and Lacey [21] assert that low relative humidity preceded by high overnight relative humidity is

important for spore release. Most experiments [18, 22] reported ascospore release in saturated

air, although Clarkson et al. [19] have reported continuous discharge of spores at 65-75% relative

humidity. It is believed that the optimal conditions for release are 20-25°C and 90-95% relative


humidity. Weak release rates of ascospores have been reported at lower temperatures of as low

as 5-10°C [22], but it was observed that these suboptimal temperatures reduced the apothecial

lifetime [19].

The ascospore discharge mechanism is not fully understood and is increasingly revealed to be

complex. Sclerotinia spores are actively released after complex interactions between apothecia

and environmental conditions [23-25]. Early investigations by Ingold [20] showed that spores

were released intermittently as puffs. Other investigations have reported continuous discharge in

both light and dark conditions [19]. More recently, the release mechanism has been reported to

be sophisticated, with ascospores acting in a cooperative manner to surf their own wind and

maximise opportunities for longer travel distances [26]. The wide range of ascospore behaviour

suggests that spore discharge, and consequently dispersal, will vary widely

depending on local environmental conditions. Rate of ascospore release has also been

investigated. Numerous studies have found that ascospore discharge follows a diurnal pattern

with most experiments reporting a peak at midday [18, 27, 28]. This peak has been attributed to

a peak in temperature. The reported size range of ascospores is 8-12 µm [17, 26]. Spores

are launched at speeds of 8.4m/s but this speed decreases to between 0.4 and 0.8m/s over the

first few millimetres of travel. Spores that can make it into the upper turbulent air and escape

the canopy are capable of attaining heights of 150m [17]. Once a spore escapes, it is expected

that its potential for dispersal is the same as other particles/bioaerosols of similar aerodynamic

characteristics [3, 29].

2.1.2 Sclerotinia Ascospore Dispersal

The release of Sclerotinia spores is closely related to its dispersal potential. Multiple studies have

reported a large deposition of spores near the source and that spores are usually locally sourced

[21, 30]. Investigations by Roper et al. [26] show that this high deposition rate is as a result of

a cooperative action by spores to sacrifice some numbers near the source so that opportunities

of long distance travel are maximised. As Sclerotinia spores are released at ground level inside

the canopy [17, 18], the effect of canopy on turbulence also plays an important role in

spore/scalar dispersion [31-37]. This is primarily due to the distortion of the turbulent field by the

canopy [38]. Canopy flow is characterised by rapid dissipation of turbulent kinetic energy with depth

into the canopy [31], resulting in low average wind speeds accompanied by intermittent gusts

[39]. The heavy filtering effects of canopies have also been reported as factors influencing heavy

near-source deposition of fungal spores [40, 41]. As a result of this, spores generally travel

distances of the order of hundreds of metres. Suzui and Kobayashi [42] and Boland and Hall [43] report distances of 100 metres from the source [44]. However, long distance travel of Sclerotinia

spores has not been experimentally documented, possibly due to large-scale detection and data

collection limitations. But their potential for long distance travel has been demonstrated as they

have been detected from rooftop spore traps (e.g. at Rothamsted research) [3, 29].

Different types of models have been used to describe Sclerotinia spore dispersal. Earlier on, field

experiments successfully fit spore dispersal to 1-dimensional models with concentration

monotonically decreasing with distance from the source [44]. Most studies [41, 45] have used

two functions to describe spore gradient: a negative exponential function and an inverse power

law function. The inverse power law is more appropriate in describing long distance dispersal

[44] while the exponential function is more suited for canopy transport [41]. These functions are

limited in that parameters for the decay coefficients need to be determined for every case [44].
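
To make this concrete, the sketch below fits both gradient functions to an invented spore-count gradient in Python; the distances, counts and starting values are purely illustrative, and the fitted a and b are exactly the case-specific decay coefficients referred to above.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical gradient: distance from source (m) vs relative spore concentration
d = np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0])
c = np.array([120.0, 80.0, 35.0, 14.0, 6.0, 1.5, 0.4])

def neg_exponential(d, a, b):
    # C(d) = a * exp(-b*d): better suited to within-canopy transport [41]
    return a * np.exp(-b * d)

def inverse_power(d, a, b):
    # C(d) = a * d**(-b): heavier tail, better suited to long-distance dispersal [44]
    return a * d ** (-b)

p_exp, _ = curve_fit(neg_exponential, d, c, p0=(100.0, 0.1))
p_pow, _ = curve_fit(inverse_power, d, c, p0=(100.0, 1.0))
print("exponential (a, b):", p_exp)   # b is the case-specific decay coefficient
print("power law   (a, b):", p_pow)
```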

As source information became available (e.g. release speeds and modes (puffing)), emission

models were adopted to describe spore dispersal. One such emission model is the Gaussian Plume

Model (GPM), which assumes spore or particle concentration distributions are Gaussian in the

lateral and crosswind directions. Some good examples of GPM applications to spore dispersion

are [46] and [47]. Gaussian models were found to do well outside the canopy but poorly inside

it since the assumptions of Gaussian velocity distributions in the near-field are not valid [48, 49].

As a result, GPMs are normally adopted for long distance travel of fungal spores.

Short-term spore dispersal is important because information about escape fraction [50], the

amount of spores leaving a canopy, can be ascertained. This fraction, which cannot be measured

directly [51, 52], is important in assessing long distance dispersal patterns. The realisation of

the GPM’s inadequacy in the near-field led to the adoption of trajectory models, which follow

particles in a more natural manner and are amenable to parameterisation in canopy media [53].

Lagrangian Stochastic (LS) [54] models are the most commonly used models in this area although

some Eulerian Advection–Diffusion (EAD) models have also been used [44]. Notable applications of LS

models to spore dispersal include: estimation of source in wheat and grass canopies for

Lycopodium and V.Inaequalis spores [55], dispersal of pollen in a maize canopy [56, 57] and

dispersal of fungal spores close to the source [58].

Recent developments in methodologies that provide a practical framework for accurate

determination of turbulent parameters based on 2nd order closure assumptions [59], such as the

k–ε theory [60, 61] and Large Eddy Simulation (LES) [31, 62] have enabled the coupling of

these powerful parametrisation methods with LS models to provide more accuracy. The

application of these coupled models to spores has only been on a limited scale, however, due to


the unavailability of large scale experimental data for evaluation and the reliance of the turbulence

parametrization on good canopy descriptions. Specific application of LS models to Sclerotinia

spores has not been found in the literature. With short distance dispersal dependent not only on aerodynamic characteristics but also on release mechanism and canopy structure, it is expected that model application will differ for each combination of spore and canopy.

2.1.3 Sclerotinia sclerotium Epidemiology

Sclerotinia is a pathogenic plant fungus that causes Sclerotinia Stem Rot (SSR) in the majority of

oil seeds and legumes [43]. The lifecycle of Sclerotinia is shown in figure 2.1. As may be seen in

figure 2.1, this lifecycle depends on coming into contact with and infecting a host plant. The

infection process is complex, with Sclerotinia sclerotium first attacking senescent tissue to get a

nutrient source before releasing cell-wall-degrading enzymes that kill adjacent healthy tissue [63]

[16]. Once this is achieved, Sclerotinia can cause perennial damage after initial infection because

of its ability to survive in the soil and germinate as Sclerotia when conditions are optimal [64].

These then give rise to the production of apothecia [65] that release ascospores into the air,

which can be transported by various dispersion mechanisms [66]. This ability to survive in the

soil for long periods means that agricultural practices like crop rotation, which has been shortening in recent years, influence disease incidence. Shorter crop rotations increase disease risk while

longer rotations decrease it [67].

Figure 2.1: Lifecycle of Sclerotinia sclerotium [68]


As indicated in figure 2.1, initial infection of crops begins with the attachment of spores to

senescent plant tissues such as petals of fruiting bodies of crops followed by petal fall [16, 63].

The petals then attach to plant leaves, when there is enough adhesion in the form of leaf wetness,

and subsequently infect the stem, at which stage the disease becomes most advanced and

virtually irreversible. As a result, all disease control decisions have to be made before the stem is

infected in order to save yield.

Although spores are a necessary condition for disease occurrence, their presence alone does not

guarantee infection, as they cannot directly attack healthy tissues [63, 64]. Environmental

conditions, such as temperature in the canopy have to be suitable for infection. Environmental

factors play an essential role in two stages: production of apothecia, and subsequent release of

spores [65], and infection of crops by spores [6]. As a consequence of this significant role, most

disease-forecasting schemes have been based largely, though not exclusively, on weather conditions, thereby making the forecast methods indirect. The scheme employed in this research is the first to utilise online sensors that measure spore concentration in real time in order to

develop direct prediction models.

2.1.4 Sclerotinia Disease Models

Given the economic cost associated with Sclerotinia and the high cost of the resulting suboptimal

fungicide use [69], there has been considerable interest in developing disease prediction schemes. Most of the initial attempts focused on indirect modelling. These attempts

utilised land history and weather data to identify suitable environmental conditions either to just

forecast the presence of inoculum through the prediction of apothecia germination [6], forecast

actual infection of petals or stem tissue when some other conditions are simultaneously met [7],

or go a step further to provide decision support for spraying [67, 70]. Direct inoculum-based

detection, measuring actual spore concentrations, is more accurate and has the potential to

improve prediction accuracy when incorporated into forecasting models, although the detection

and measurement process in current methods is slow and manual [63]. The approach adopted

in this research is expected to be better than traditional inoculum-based detection methods by

being the first to utilise directly captured viable spores, along with relevant weather variables and

historical data records to develop a real-time risk-forecasting model.

Two models that offer considerable improvement have been identified and are reviewed in more

detail in sections 2.1.4.1 and 2.1.4.2, given their importance to the research. These models are exemplars of


the current trends in agricultural disease forecasts and therefore provide insights into areas of

improvement.

2.1.4.1 SkleroPro

Koch et al. [67] developed a forecasting system (SkleroPro) capable of assessing Sclerotinia risk

in winter Oil Seed Rape as well as providing a decision support on fungicide spraying. The model

utilises air temperature, relative humidity, rainfall and sunshine hours to estimate canopy

temperature and relative humidity. Data from a climate chamber study was used to determine

critical values of temperature that coincided with highest disease incidence while critical relative

humidity values were extrapolated from a previous experiment. The sum of hours where both

the relative humidity and temperature are ideal for infection (InhSum) is then compared to a

field-specific disease incidence threshold (Inhi) to decide whether or not to spray. The calculation

of Inhi is guided by economic reasons; it also takes the effect of crop rotation into account with

longer rotations increasing it and shorter ones decreasing it (increasing risk). Whenever InhSum ≥ Inhi, disease risk is considered significant and the crops are sprayed.
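
The decision logic can be sketched in a few lines of Python; the hourly records, critical ranges and threshold below are placeholders for illustration, not the calibrated SkleroPro values.

```python
# Hypothetical hourly canopy records: (temperature in degC, relative humidity in %)
hours = [(21.0, 88.0), (23.5, 92.0), (18.0, 95.0), (25.0, 78.0), (22.0, 90.5)]

# Placeholder critical values -- NOT the calibrated SkleroPro parameters
T_LOW, T_HIGH = 7.0, 25.0
RH_MIN = 80.0
inh_i = 3  # field-specific threshold; in SkleroPro it reflects economics and rotation

# InhSum: number of hours in which both temperature and humidity favour infection
inh_sum = sum(1 for t, rh in hours if T_LOW <= t <= T_HIGH and rh >= RH_MIN)

spray = inh_sum >= inh_i  # spray once cumulative infection hours reach the threshold
print(f"InhSum = {inh_sum}, threshold Inhi = {inh_i}, spray: {spray}")
```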

The model generally gave satisfactory performance when tested against historical data (>70%). In addition, a 39% reduction in fungicide cost was achieved compared with routine, unsystematic sprays [9].

The major limitation of this model stems from its simplicity: it does not use direct measurements of

spores. Disease risk is indirectly forecasted by predicting infection hours (Inh), primarily based

on environmental weather conditions. This means that the integrity of forecasts solely relies on

the forecasted weather values’ widely varying relationship with Sclerotinia spores.

Another limitation is that the model focuses primarily on stem infections. It therefore lacks the proactive element gained when the first stages of infection are monitored and modelled. Focusing on petal infection and fall will provide more time for spraying

decisions to be made although it may not provide more cost savings, since not all petal infections

translate into SSR. This is precisely what the proposed approach provides.

A third limitation of this model is that it does not take apothecia development and ascospore

dispersal into account. Even though these can often be strongly related to, and inferred from, environmental conditions, other factors influence them, so such inference is not always reliable [6]. This means that the model always assumes spores are present in

a field, as it has no way of tracking spore presence. In the approach employed in this research,


the detection of spores by online sensors will confirm apothecia formation and ascospore presence, even if the source is not local, so no understanding of these other, insufficiently understood factors is necessary.

One more shortcoming is that the model is only time-point specific, not location-specific. This is because the prediction is based solely on canopy microclimate, which varies little over most fields and therefore will not indicate where disease pressure is more intense. The model

proposed in this research will provide time-point as well as location-specific decision support

thereby improving on the fungicide cost savings offered by 'SkleroPro'.

2.1.4.2 RAISO - Sclero

RAISO-Sclero is a Syngenta Ltd trademarked OSR petal-infection forecasting model developed by

Varraillon et al. [7] that is made up of three sub-models that simulate soil climate conditions,

apothecia life cycle and crop flowering development. Because it predicts petal infection, it offers

a more proactive decision support than SkleroPro but it may not give accurate final attack

numbers, as not all petal fall is caused by SSR. The model, which has gained some practical use

in France and UK, uses environmental variables like air temperature, relative humidity and rainfall

to predict petal infection. When ascospore presence, as given by the apothecia life cycle model,

coincides with petal fall, as shown by the flowering Area Index determined by the crop flowering

development model, a disease impact is assumed. The severity of petal fall will typically increase

disease impact.

The model is validated by carrying out a diagnostic test of collecting and examining petals for

signs of the fungus. According to Varraillon et al. [7], the model gave satisfactory performance

when compared to results obtained from the petal kit (validation) tests with 80% disease

prediction accuracy.

Due to its local nature and the resulting need for repeated recalculation, this model has major limitations: validation is semi-manual, time-consuming and hazardous owing to the toxicity of the chemical reagents used.

Another limitation of this model is in validating the flowering area index model that determines

petal biomass. Validation of the model is achieved with the CAN-EYE [71] imaging software, which

extracts canopy characteristics, such as Leaf Area Index (LAI), Vegetation Cover Fraction, etc.,

through the analysis and classification of images [71]. The photographic method can be sensitive


to light, and leaf colours may be saturated during full flowering, introducing errors into the

validation [72]. Since petal fall is essential to the functioning of this model, this affects the overall

model quality.

Another concern is that other factors that affect flowering dynamics or petal fall, such as sowing

density and insect attacks, etc., are not considered by the flowering development model. This

results in the wrong attribution of reduced petal biomass to disease impact. This may be the reason why RAISO-Sclero generally predicts higher risks than those indicated by petal kit tests.

2.2 Dispersion Modelling

Different models have been used to estimate spore dispersal both within and outside crop

canopies. Most of these models are Gaussian or Lagrangian. As the name implies, Gaussian

models are based on the assumption that the spore plume spreads and expands like a Gaussian

distribution, i.e. around a fixed mean (the plume centre) and with a random variance [73]. Inside

the canopy, at short distances from the surface and where there may be low wind conditions,

Gaussian models may give poor performance [73-75]. As a result, Gaussian models have not

been extensively used for pathogen spore dispersal inside canopies. While Lagrangian particle

dispersion models, such as FLEXPART, are very powerful over long and mesoscale dispersion

ranges, Lagrangian Stochastic (LS) models [54] are suitable for estimating spore dispersal within

the crop canopy at distances close to the source [58]. In contrast, Eulerian Advection – Diffusion

(EAD) models [76] based on Fick’s Law provide better estimates at longer distances from the

source [66] and can be extended to a regional scale [77].

One popular model that falls under the Lagrangian category that has not been used in agriculture

is the Numerical Atmospheric-dispersion Modelling Environment (NAME) model [78] developed by

the Met Office to predict atmospheric dispersion and deposition of gaseous particulates up to

global scales. While this model has a potential use in estimating dispersion of some pathogens,

the relatively short dispersal scale of Sclerotinia spores makes its use unsuitable.

Computational Fluid Dynamics (CFD) models based on Navier–Stokes equation are also potentially

useful in estimating dispersal within plant communities due to their ability to model air flows in

complex terrains [66], such as agricultural fields.

Given the physical and aerodynamic similarities between Sclerotinia spores and other types of

particulate matter [44, 79], say PM10, a review of dispersion models simulating or capable of


simulating pollutant dispersion, and other general particle movement, is appropriate. Atmospheric

dispersion models used for pollution can be divided into Gaussian and Trajectory models.

2.2.1 Gaussian Dispersion Model

Gaussian models, either puff or plume, assume a Gaussian distribution for a cloud of particles.

That is, the concentration of particles spreads with distance from the source in both the downwind

and crosswind directions according to a normal distribution, as shown in Figure 2.2 [80].

Figure 2.2: Spore dispersal downwind of an above ground plume source [80]

As may be seen in Figure 2.2, a released plume spreads in a Gaussian manner with a mean value

centred at height 𝐻𝑒 and standard deviations in all three coordinate axes derived from the

respective random variances of the particles within the plume. Concentrations at all points

downwind from the source and at various heights from the ground can be calculated by:

C(x, y, z) = \frac{Q}{2\pi \sigma_y \sigma_z u} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \left\{ \exp\left(-\frac{(z-H)^2}{2\sigma_z^2}\right) + \exp\left(-\frac{(z+H)^2}{2\sigma_z^2}\right) \right\} \qquad [2.1]

where Q is the rate of release; H is the height from ground level to the plume centreline; u is the horizontal (downwind) wind speed measured at H; x, y and z are the three-dimensional coordinates of the receptor point; and σ_x, σ_y and σ_z are the dispersion coefficients (concentration profiles) of the plume in the downwind, crosswind and vertical directions for a particular stability class [80]. (In Figure 2.2, the effective release height, H_e = H_s + Δh, is the actual stack height plus the plume rise.)

In most cases, ground level concentration at height 𝑧 = 0 is of more interest than that at a

‘receptor’ height. The concentration is then given by:

C(x, y, 0) = \frac{Q}{\pi \sigma_y \sigma_z u} \exp\left(-\frac{y^2}{2\sigma_y^2}\right) \exp\left(-\frac{H^2}{2\sigma_z^2}\right) \qquad [2.2]

which follows from Eq. 2.1 evaluated at z = 0.

The dispersion coefficients 𝜎𝑥, 𝜎𝑦 and 𝜎𝑧 are determined by atmospheric conditions (turbulence in

the atmosphere) and are tedious to calculate or measure on a case-by-case basis [80]. As a

result, they have been parameterized most notably by Pasquill and Smith [81] for different

atmospheric conditions and presented as Pasquill-Gifford stability classes [80, 81]. The classes

are defined from class A to F ranging from unstable through neutral to stable, with class A being

the most unstable and F the most stable. Basically, the higher the atmospheric instability, the

higher the turbulence, which implies more mixing, making the particles more buoyant - allowing

more deposition far from the source [44]. While considerably simplifying parameter selection, the

stability classification restricts Gaussian models to moderate distances and times, as the

parameters of a particular class are only useful within a range of typically hundreds of meters

[82] where the conditions for that class are valid – long distances may cut across multiple stability

conditions. However, these short ranges must be longer than 100 meters, as concentrations for

shorter source-receptor separation can be unrealistically high [82].
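
As an illustration of Eq. 2.1, a minimal Python sketch is given below. The release rate, wind speed and dispersion coefficients are invented example values; in practice σ_y and σ_z would be taken from a Pasquill-Gifford parameterisation evaluated at the downwind distance of interest.

```python
import numpy as np

def gaussian_plume(y, z, Q, u, H, sigma_y, sigma_z):
    """Concentration from Eq. 2.1 at crosswind offset y and height z.

    sigma_y and sigma_z are supplied directly here; in practice they are
    functions of downwind distance for a given stability class.
    """
    lateral = np.exp(-y**2 / (2 * sigma_y**2))
    vertical = (np.exp(-(z - H)**2 / (2 * sigma_z**2))
                + np.exp(-(z + H)**2 / (2 * sigma_z**2)))  # ground reflection term
    return Q / (2 * np.pi * sigma_y * sigma_z * u) * lateral * vertical

# Illustrative values only: 1e6 spores/s released at 0.5 m in a 2 m/s wind,
# with sigmas loosely representative of ~100 m downwind in unstable air
print(gaussian_plume(y=0.0, z=0.0, Q=1e6, u=2.0, H=0.5, sigma_y=10.0, sigma_z=6.0))
```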

The difference between the puff and plume models is that plume sources are continuous while

puff sources are intermittent emitters; a number of puffs are required before a cloud is formed.

Differentiating between puffs and plumes becomes difficult when very fast puffs are involved and

it is often more advantageous to model such emission sources as plumes [79]. For a point source emitting particles every 𝜏𝑠 seconds, with each particle remaining airborne for 𝜏𝑙 seconds, puff and plume sources can

be defined as follows [83]:

Puff source: 𝜏𝑠 ≫ 𝜏𝑙

Plume source: 𝜏𝑠 ≪ 𝜏𝑙

2.2.1.1 Application of Gaussian Model to Spores

With regards to fungal spores, a Gaussian Plume Model (GPM) is more appropriate than a puff

model since the time between emissions is very close to zero [79]. Clarkson et al. [19] actually

classify apothecia release as a continuous phenomenon, having observed continuous release of


spores in an experiment. This makes application of GPM plausible since apothecia are likely to be

distributed within any single source and their combined release will form a cloud that is very

similar in flow characteristics to a plume source. The GPM's ability to incorporate terms that account

for real effects such as deposition on leaves and escape fraction from canopy [46] also contributes

to their attractiveness. Spijkerboer et al. [46] have confirmed that spores from point sources

actually form Gaussian plumes and that GPMs are as suitable for modelling spores as they are for

modelling gases. They noted, however, that most stability tables, including Pasquill-Gifford, under-predict stability classes, resulting in smaller plume sizes and, in turn, lower prediction accuracy [84]. In summary, Gaussian models can

adequately model the key components of dispersal: release, transportation and deposition. What

they cannot do is to simulate chemical mixing reactions between particles and this is not a

requirement for spore dispersion [79].

2.2.2 Trajectory Models

Trajectory models can be divided into Lagrangian [38, 48, 54, 85] and Eulerian [66, 80] models.

In contrast to Gaussian models which describe an entire cloud of particles, Lagrangian and

Eulerian models follow individual particles as they move through the atmosphere, modelling them as a random walk process [83]. For Lagrangian models, individual particles travel at a changing

speed known as the Lagrangian speed, 𝑢𝐿 , whose rate of change is given by the Langevin

equation [79, 86]:

\frac{du_L}{dt} = -au + b\xi(t) \qquad [2.3]

Where 𝑢 is the previous speed of the particle referred to as the ‘memory term’, 𝜉 is a random

forcing function accounting for turbulence, 𝑎 and 𝑏 are functions of particle location and time

derived from the Fokker-Planck equation [87]. The displacement of a particle travelling from

position 𝑥0 to 𝑥1 in time 𝑡0 to 𝑡1 is then

dx = u_L \cdot dt \qquad [2.4]

The conditional joint probability distributions of these trajectories are then computed in order to

evaluate the concentration.

Eulerian models are very similar to Lagrangian models except that, for the latter, a moving frame

of reference that moves with the particle is used while the former uses a fixed frame of reference

[83]. As a result, the Lagrangian and Eulerian speeds are not the same, making particle

displacement calculations by these two models unequal.
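
A minimal one-dimensional Lagrangian stochastic sketch in Python is given below; it assumes homogeneous turbulence with a constant velocity standard deviation and Lagrangian timescale, which a real canopy flow would not satisfy, so it only illustrates the structure of Eqs. 2.3 and 2.4.

```python
import numpy as np

rng = np.random.default_rng(0)

def ls_trajectory(n_steps, dt, u_mean, sigma_u, T_L):
    """One-dimensional Lagrangian stochastic trajectory (homogeneous turbulence).

    The drift -u'/T_L plays the role of the memory term (a*u in Eq. 2.3) and
    the random increment plays the role of b*xi(t); sigma_u and T_L are the
    velocity standard deviation and Lagrangian timescale, held constant here.
    """
    x, u_prime = 0.0, 0.0
    for _ in range(n_steps):
        du = -(u_prime / T_L) * dt + np.sqrt(2 * sigma_u**2 * dt / T_L) * rng.standard_normal()
        u_prime += du
        x += (u_mean + u_prime) * dt   # Eq. 2.4: dx = u_L * dt
    return x

# Ensemble of particles: the spread of final positions approximates the
# downwind concentration profile generated by the model
finals = np.array([ls_trajectory(600, 0.1, u_mean=2.0, sigma_u=0.5, T_L=5.0)
                   for _ in range(1000)])
print(finals.mean(), finals.std())
```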


2.2.2.1 Application of Trajectory Models to Spores

Following the trajectories of individual spores means that these models will handle the effect of

wind gusts and rapid change in wind directions better than a GPM [88], which is restricted within

a specific range of atmospheric conditions describing a particular stability class. Easy and accurate

parameterisation of airflows for these models enhances their accuracy, although inaccuracies creep in for complex airflows that are difficult to parameterise [57]. This is particularly important in

simulating spore travel within a canopy where wind speed and direction are randomized by plant

cover and other obstructions [79]. For estimating spore dispersal close to the source or near

ground level, Aylor [58], and Aylor et al. [89] have found that Lagrangian models are accurate.

Aylor and Flesch [55] found that, as Lagrangian models are best suited for the lighter passive

tracers, accounting for the effect of inertia and gravity when implementing them on spores yields

better results. They used the Lagrangian model to accurately estimate the release rate of spores

into the atmosphere from a canopy, a problem that was hitherto intractable [55].

2.2.3 CALPUFF

Due to the successful implementation of Gaussian models in particle dispersal, numerous

simulation packages employing Gaussian distribution of spores have been produced, among them

AERMOD [90] and CALPUFF [91]. CALPUFF is a non-steady-state meteorological modelling

system specifically designed for air quality modelling. It has three main components: CALMET, a

meteorological modelling package capable of generating diagnostic and prognostic wind fields; a

core Gaussian dispersion model with wet and dry deposition and chemical removal known as

CALPUFF; and CALPOST, a suite of post processing programs that output concentrations and

meteorological data fields. In addition, a host of pre-processing programs are available to

interface with various data formats and sources, including other models. The model uses

geophysical and meteorological data inputs to construct a meteorological terrain that determines

dispersion gradients [91].

Beyond the reason already outlined, there are further attractions to using CALPUFF rather than a basic GPM for spore dispersion simulation: generating meteorological fields, specifying the model domain and dispersion parameters, and visualising outputs, tasks which would otherwise require specifying complex parameters, are made significantly less tedious through GUI support; and greater complexity in the averaging times of wind variables is easily handled.


The major drawback of CALPUFF is that it supports data formats that are not widely used, in which the relevant datasets are unavailable, and conversion is tedious. For example, while global and local (USA) land-use data files are available in the supported CTG (Composite Theme Grid) and other formats, the global data does not provide enough granularity for non-global models, necessitating locally sourced UK data for good terrain simulation; this data is not available in the supported formats and conversion is a problem.

2.3 Multivariate Statistical Analysis

Based on the review of Sclerotinia disease models, it may be observed that the current models are first-principles models that require information about the source, such as the number and

location of sources and source strength. This information is mostly unavailable, unreliable or

difficult to obtain. It is therefore possible that empirical approaches that can ascertain and exploit

the structure in the dispersion process from collected data would provide better results. With

these methods most of the inadequacies of SkleroPro and RAISO-Sclero such as lack of

assessment of prediction uncertainties, scientific diagnostics, statistical inference and evaluation

can be addressed.

One category of suitable empirical approaches is multivariate data analysis [92]. Multivariate

analysis allows the statistical investigation of inter-variable relationships and contributions in data

sets containing multiple variables. Application of its various forms allows information in any data

to be extracted, interpreted or predicted. A demonstration of the powerful qualities of multivariate

analysis is seen in its application in the chemometrics industry where a large number of predictors

that may not necessarily be significant to the predicted variable make up most of the dataset,

and there is a need to reduce data dimensionality such that only essential variables are included

in the model [92, 93]. It is these qualities that this study hopes to exploit in our application of

these methods. The analysis is not just restricted to the useful information signal part of the data

but extended to the error (random noise) in the data too for statistical inference and diagnostic

purposes. This scientific analysis of error provides a diagnostic tool for assessing results,

something lacking in the experimental approach used for Sclerotinia models [92].

The field of multivariate statistical analysis is used in a wide range of applications comprising 1)

data description, 2) regression and prediction, 3) interpolation, and 4) discrimination and

classification. Generally, these applications can be categorised into linear or nonlinear; projection

or non-projection methods [92, 93].


Linear and Nonlinear

Linear methods are those that can be adequately described by the linear model:

𝑦 = 𝑚𝑥 + 𝑐

where y is the dependent variable, x the independent variable, and m and c are constants. By

contrast, nonlinear models have parameters that vary. Linear methods are generally easier to use and more grounded in theory since they are better understood [93]. On the other hand, a deep understanding of the data is required before nonlinear methods can be used with confidence [92]. While it is possible, and desirable, to incorporate some

nonlinear relationships into a linear model, it may be necessary to use nonlinear models for some

data. The choice of the method to use depends on the structure and complexity of the data. Most

of the popular regression techniques such as Multiple Linear Regression (MLR), Principal

Component Analysis (PCA), Principal Component Regression (PCR), and Partial Least Squares

(PLS) fall under the linear regression category although they can accommodate some nonlinear

relationships while still maintaining the general linear-in-parameter nature of the model [92].

Projection and non-projection methods

This classification relates to regression. Projection based methods require the original data to be

transformed into a new variable space for easy visualisation and most importantly for dimension

reduction. Methods like PCA, PCR and PLS fall under this category. MLR, however, is a non-

projection method where all modelling and analysis is done on the actual variables themselves.

The projection-based methods are generally more popular and effective in applications where

there is a large number of correlated variables. While non-projection methods may result in

underdetermined systems for a dataset 𝑋 with 𝑚 < 𝑛 (where 𝑚 and 𝑛 are the number of

samples and variables respectively), projection methods always result in an overdetermined

system. Therefore, projection methods always give a unique or least squares solution to the

regression problem [94, 95].
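
The contrast between the two projection approaches can be illustrated with a short Python sketch using scikit-learn; the synthetic data below (a few latent factors driving many correlated predictors) is invented purely to mimic the situation described above.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic data: 50 samples, 20 correlated predictors driven by 3 latent factors
latent = rng.normal(size=(50, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(50, 20))
y = latent @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# PCR: project X onto its principal components, then regress y on the scores
pca = PCA(n_components=3)
scores = pca.fit_transform(X)
pcr = LinearRegression().fit(scores, y)

# PLS: latent components are chosen to maximise covariance with y directly
pls = PLSRegression(n_components=3).fit(X, y)

print("PCR R^2:", pcr.score(scores, y))
print("PLS R^2:", pls.score(X, y))
```

Because PLS components are extracted with the response in view, PLS typically needs no more (and often fewer) components than PCR to reach a given fit.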

2.3.1 Multivariate Analysis in Agriculture

Due to the novelty of the proposed approach in this work (see chapter 1) similar applications of

multivariate analysis have not been found in the literature. However, agricultural disease prediction

is a wide and diverse area and there have been some applications of multivariate analysis within

the sector. Principal Component Analysis (PCA), Factor Analysis (FA) and other forms of

Discrimination Analysis have been extensively used to analyse and classify various kinds of

agricultural data. Kallithraka et al. [96] used PCA to classify Greek wines based on geographical


origin. 33 different varieties were successfully grouped into two geographical regions although

more samples need to be analysed before the result can be generalised. Similarly, Whittaker [97]

used PCA to identify food-borne bacteria by infrared spectroscopy and the results, after

verification, proved to be accurate.

For modelling and prediction purposes, despite its inferior performance, Principal Component

Regression (PCR) has been favoured in the past over Partial Least Squares Regression (PLSR)

due to the former’s perceived stronger statistical background even though both are not fully

understood statistically [98]. However, after efforts made by Höskuldsson [99], Helland [100]

and subsequent works, the statistical properties of PLSR are more understood and PLSR is

increasingly being used outside chemometrics. PCR and PLSR have been used by Liu et al. [101]

to estimate and characterise the severity of rice brown spot disease. The data was collected by hyperspectral reflectance and is therefore spectral. The results showed that PLSR gave

predictions with a lower Root Mean Squared Error of Prediction (RMSEP) than PCR. Other

applications of PLSR include the prediction of beef palatability from colour, marbling fat and surface texture features of longissimus dorsi [102] and the detection of Sclerotinia rot disease on celery using hyperspectral data [103]. The application of PLSR to Sclerotinia or other plant disease

prediction has been found to predominantly utilise spectral data because, until now, there was

no automatic spore detection mechanism that could be used with an online forecasting system.

However, traditional regression methods have been used in developing indirect Sclerotinia

prediction models. Linear regression and logistic regression have been used to associate field

infection levels with disease incidence [67, 70, 104], and multiple regression techniques have

been used to relate disease incidence with several independent variables using SAS software as

part of the forecasting process [105]. Analysis of variance (ANOVA) has also been used in the

analysis of historical disease data [104].

Branches of multivariate analysis such as regression analysis and multivariate interpolation are

potentially useful for predicting and forecasting spores. While regression methods have to some

extent been used within agriculture, the data used is mostly image or spectral data, which makes

the application no different from typical chemometrics applications [106].

2.3.2 Multivariate Statistical Process Control

Due to the novelty of the design of the sensing surface, there is no prior indication of how the biosensor

will perform. Therefore, there is a need to have a measure of accuracy to expect from the device.

In addition to the reason mentioned above, sensitivity of the surface and specificity of the


particular strain of Sclerotinia encountered [107] can affect measurement accuracy. Moreover,

unforeseen problems always cause complications in real deployments. Some real deployments

have shown that environmental sensors can have a yield of as low as 50%, with the rest of the

data either corrupted, erroneous or lost in transmission [108, 109].

Various statistical methods are used to assess sensor accuracy. Kollman et al. [110] used

correlation analysis, Error Grid Analysis (EGA) and Receiver Operating Characteristics (ROC) to

assess the accuracy of continuous glucose sensors. They found that general statistical methods

tend to overestimate sensor accuracy/inaccuracy because they do not distinguish between

clinically significant and insignificant errors. This is significant to this research study, as detection

of spores by the biosensor system does not necessarily indicate a disease outbreak since the

amount of spores may not be enough to cause an epidemic [16]. A threshold of spore

concentration exists above which an outbreak is more likely. A sensor with an error interval higher

than this threshold has low accuracy and provides little information. Kollman et al. [110]

suggested that the use of specialised methods that use event assessments, which take more

history into account during analysis, as against point-by-point assessments, would provide better

sense of accuracy for these sensors.

Poisson distributions, which can be used to compute probabilities of event occurrence [111], seem attractive. However, a Poisson process requires that successive samples be independent, which

presupposes normal (Gaussian) distribution [111]. There are no assurances that the data (error

vector) will be normal, so an adaptation of Poisson distribution to nonparametric distributions or

other KDE-based methods [112] may need to be considered.

2.3.2.1 Fault Detection and Identification

Sensed (measured) data is inevitably susceptible to errors. Therefore, a mechanism to detect and

identify erroneous sensors is necessary. Because the sensors are required to record real-time

measurements, online error detection methods are desired. Fault detection methods are an ideal

remedy for this since they can be implemented online and the general definition of a fault can be

extended to cover both false positive and negative errors as well as real faults that occur as a

result of sensor failure. As with faults, subtle errors due to drift or correlation breakdown are

more difficult to detect than glaring errors that breach specified limits [113]. An effective

detection method will be one that detects both.

Extensive investigations into online fault detection have been conducted in process monitoring

and control over the past years [113-118]. PCA and PLS models [92, 119-122] have long been

used for process monitoring and control, utilising the ever-present, high correlation of process


variables. The idea is to compare a model built with healthy historical data to new incoming

samples, and abnormalities are detected by monitoring the residual error or normal variance of

the model, both of which are indicators of correlation breakdown. The faulty sensor is identified

by identifying the variable with the highest contribution to the square prediction error (SPE) and

the Hotelling T2 statistic [116, 123-125]. Dunia et al. [126] detected, identified and reconstructed

faults using a PCA model and a novel metric, the sensor validity index (SVI), which provided the

status of each sensor online. They applied this method on a boiler process and the results showed

very good reconstructions are possible for a highly correlated process, and the SVI was able to

identify faults reasonably well.

However, their approach only considered one sensor failing or becoming faulty at a time, an

assumption this study cannot afford since multiple simultaneous sensor failures are expected, for

example when oxalic acid producing pathogens that cause false positives traverse the space

covering more than one sensor. More work by Doymaz et al. [127] and Bose et al. [128] address

the case of multiple sensor failures. While all of these methods measure multiple variables, the

biosensors considered in this work will only measure a single variable (spore concentration), but

its spatial variation makes its behaviour similar to that of unique, correlated process variables,

making the discussed process monitoring and control methods appropriate.

Sharma et al. [129] investigated the prevalence of faults in real-world sensor deployments using

a number of fault detection algorithms: estimation-based methods, learning methods and rule-

based methods. Similar to the PCA case, the estimation method uses a least squares model to

estimate a sensor based on its neighbours, but uses the estimation error rather than the SPE to detect

faults. No particular method was found to be perfect; they found that each of the methods

performed well depending on whether the fault is a short fault or a noise fault [129]. They also found that sensor faults did not occur very frequently, except when mechanical failure was involved, but when they did, the erroneous values were orders of magnitude higher than the

actual measurements. This is significant since a biosensor, with all its added complexity, is more

susceptible to mechanical as well as other problems. This approach also utilised spatial correlation

to estimate reliable models of sensors using other sensors. Ramanathan et al. [109] also looked

into rapidly deployed sensor networks, where the short deployment period makes detecting and

correcting faults more urgent. Their algorithm was based on a set of rules that identified and

classified faults based on their duration. Their classes of faults were found to be more frequent

than Sharma et al. [129] found, confirmation that fault prevalence is data dependent.

Both Sharma et al. [129] and Ramanathan et al. [109] used rule-based fault detection and

identification with success. The fact that Ramanathan et al. [109] used it for rapidly deployed sensors shows that the approach can detect faults in a timely manner. One more attraction of the

rule-based method is that it can easily incorporate another set of rules that define what to do in

the event of a fault, thus providing fault detection and remediation functionalities. While other

methods such as learning methods, neural network models or hidden Markov models [130], can

provide similar confidence in results, fault remediation is not easily implementable. The main

challenge of the rule-based method is in using it online for slowly sampled systems. A number of

samples will have to be used for some types of faults (subtle ones) to be detected, which, in the case of the biosensors considered in this work, with a sampling frequency of one measurement per day, covers a period of days. Another challenge is the massive amount of

knowledge and experience required to formulate effective rules [109].
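
The flavour of such rules can be conveyed with a toy Python sketch; the two rules and all thresholds below are invented for illustration and are not the rule sets of [109] or [129].

```python
import numpy as np

def classify_faults(x, spike_k=5.0, noise_k=3.0, window=7):
    """Toy rule set in the spirit of rule-based detection.

    SHORT fault: a single sample more than spike_k robust sigmas from the median.
    NOISE fault: a rolling window whose variance exceeds noise_k times the
    overall robust variance.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1e-9       # robust scale estimate
    sigma = 1.4826 * mad                           # MAD-to-sigma conversion
    faults = [(i, "SHORT") for i, v in enumerate(x) if abs(v - med) > spike_k * sigma]
    for i in range(len(x) - window + 1):
        if np.var(x[i:i + window]) > noise_k * sigma**2:
            faults.append((i, "NOISE"))
    return faults

daily_counts = [3, 4, 2, 5, 250, 4, 3, 2, 40, -30, 60, 4, 3]   # hypothetical readings
print(classify_faults(daily_counts))
```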

All fault detection methods use a threshold limit that is designed around the confidence limit of a

fault-free error model of the process [109, 113, 116, 128, 131]. Selecting this threshold is very

crucial, as an unsuitable value will result in too many false alarms, causing the fault detection

system to be highly unreliable. A possible way to deal with naturally occurring false alarms is

through the use of a low pass filter. Qin and Li [131] used exponentially weighted moving average

(EWMA) filtering to drastically reduce the amount of false alarms due to noise.
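
A minimal sketch of EWMA filtering is given below; the smoothing constant, synthetic statistic and alarm limit are illustrative choices rather than values from [131].

```python
import numpy as np

def ewma(x, lam=0.2):
    """Exponentially weighted moving average; lam trades responsiveness
    against smoothing (0.2 is a common textbook default)."""
    z = np.empty(len(x))
    z[0] = x[0]
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]
    return z

# Noisy fault-detection statistic; alarming only when the *filtered* value
# breaches the limit suppresses isolated noise spikes
raw = np.abs(np.random.default_rng(3).normal(1.0, 0.5, 50))
print(np.sum(raw > 2.0), "raw exceedances vs", np.sum(ewma(raw) > 2.0), "filtered")
```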

2.3.2.2 Kernel Density Estimation

Kernel Density Estimation (KDE) is a reliable nonparametric method of estimating data distributions that makes no assumption of normality [112, 132]. The kernel density estimator is given by [112]:

\hat{f}(x) = \frac{1}{nh_{KDE}} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h_{KDE}}\right) \qquad [2.5]

where n is the number of samples, h_KDE is the smoothing parameter or bandwidth, analogous to bin width in histograms (the KDE is in effect a more sophisticated histogram in which the width and origin are ideally chosen to show the true properties of the data), x is the position of interest, x_i are the sample values and K is the kernel function [112]. K can be a symmetric probability density function or a piecewise function. The choice of K is very important as it determines the differentiability and continuity of \hat{f}. Each of the summation components in Eq. 2.5 represents a kernel whose shape is determined by K and whose width is set by h_KDE [112]. The individual kernels are added together to yield \hat{f}, as illustrated in figure 2.3.


The difference between a successful and failed KDE implementation usually comes down to the

choice of bandwidth, which is very much data dependent [132-134]. A large bandwidth obscures

data detail, including multimodal features, while a value close to zero accentuates data spikes

[134]. Other forms are available for equation 2.5 because there are different methods of

specifying ℎ𝐾𝐷𝐸 that enable the estimator to capture more detail for data drawn from long-tailed

distributions, which would otherwise be masked by the more prominent part of the distribution

[112].

Figure 2.3: Kernel estimates showing individual kernels and the effect of bandwidth h_KDE: (a) h_KDE = 0.2; (b) h_KDE = 0.8 [112]
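
A direct Python implementation of Eq. 2.5 with a Gaussian kernel, using the same two bandwidths as figure 2.3 on an invented sample set, is sketched below.

```python
import numpy as np

def kde(x_grid, samples, h):
    """Direct implementation of Eq. 2.5 with a Gaussian kernel K."""
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    samples = np.asarray(samples)
    # One kernel per sample, summed and scaled by 1/(n*h)
    return K((x_grid[:, None] - samples[None, :]) / h).sum(axis=1) / (len(samples) * h)

samples = [0.1, 0.3, 0.35, 1.2, 1.3, 2.8]          # invented observations
x = np.linspace(-1.0, 4.0, 200)
f_narrow = kde(x, samples, h=0.2)   # spiky: accentuates individual samples
f_wide = kde(x, samples, h=0.8)     # smooth: may obscure multimodal features
print(np.trapz(f_narrow, x), np.trapz(f_wide, x))  # both integrate to ~1
```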

2.4 Sensors, Biosensors and Sensor Networks

Biosensor and chemo-sensor networks have enormous promise in combating bioterrorism,

contamination, and improving healthcare [135]. The main goal behind the 'SYIELD' project, which

was aimed at deploying a network of biosensors across the UK for the purpose of crop pathogen

detection, fits in perfectly with this. As the name implies, these biosensors are not primarily

transducer-based and are, therefore, not as advanced, reliable or well understood [136]. Large-scale deployment of transducer-based sensor networks, for example those measuring physical signals such as temperature and pressure, has been researched and refined to efficient levels of performance over the years, with the electrical and electronic industries being the primary contributors through their development of the telecommunications sector, and the computer industry through its interest in optimising data acquisition.

Unfortunately, biosensor networks, which are potentially a source of data of unprecedented

importance and application, have not been as well developed, due to the complex nature of the sensing

surface and other reliability issues of the individual sensors that make up the network [136]. It is

still possible, however, to adapt most of the sophistication, and experience acquired over the

years, of ‘conventional’ Wireless Sensor Networks (WSNs) to biosensor networks. For example, it

is now known that a distributed network structure provides networks with more redundancy and

reliability regardless of the makeup of a sensor's sensing surface. Chemo/biosensors can be deployed in a semi-distributed fashion, allowing them to cope with node failure through

decentralisation but not requiring the expensive node-node communication [136]. Concepts such

as scheduling, which enables the sensor to be turned off in low risk periods of the day, and data

aggregation, which allows minimisation of data transmission energy, can be employed [136].

2.4.1 Peculiar Challenges of Biosensor Networks

Despite this transferability between conventional sensor and biosensor networks, problems still

persist regarding the optimisation of biosensor networks. Peculiar complications and unreliability

due to the unpredictability of the sensing surface and a multitude of other factors, ranging from

mechanical unreliability to the stochastic nature of particle transport, make the network more

complex and less reliable [136]. The randomness of the measured variable (spore concentration

is a function of a random dispersion process) makes the sensing range of the spore biosensor

lower compared to that of a conventional transducer-based sensor. This makes the problem of

biosensor location and area coverage even more crucial. The high cost of these biosensors

also mandates the use of a robust network that can function well with relatively fewer nodes. In

addition, low biosensor sensitivity and unusual drift can seriously affect data quality.

Consequently, constraints resulting from these challenges can affect four key areas: biosensor

location, biosensor coverage, biosensor deployment and data quality.

2.4.1.1 Biosensor Location

Striking an optimal trade-off between network performance, sensor location, coverage, cost and

redundancy is challenging. Sensor location varies with the application being considered [135].

For example, the Environmental Protection Agency’s (EPA) pollution monitoring network, whose

primary aim is ensuring people’s safety, uses community-based location - they use statistically

determined information about where the majority of the people live to locate sensors. Within any


population of interest, sensors are located where they are most likely to guarantee early warning.

For spores, as they originate from farms and infect crops, it makes sense to place biosensors in

farms in an ad-hoc manner. However, there are many large farms, some located adjacent to each other. A more systematic approach that deploys biosensors by exploiting knowledge of their properties and limitations would improve data quality and reduce the cost of deployment.

2.4.1.2 Biosensor Coverage

Non-contact-based detection methods, where the sensed target is a continuous signal and

contact with sensor is not required, aim for complete coverage [137]. Complete coverage means

that all objects (variables of interest) traversing the space surrounding a network of sensors will

be detected. This is possible for signal detection because the sensing radii required for information

coverage for these sensors are quite large [137]. Unlike physical variables that are continuous in

time and space, spores are in a state of motion, and measured values depend on location of

measuring device [13]. Moreover, biosensors typically require actual physical contact with the

measured variable for detection to be possible [136]. This reduces the coverage radius of

individual sensors to effectively less than a couple of metres depending on wind speed, direction,

and whether or not there is a source in their vicinity. There is, therefore, no unique definition of

coverage for any individual biosensor. So many sensors are required to ensure complete network

coverage for spores that it is not practical. Note that the coverage of the entire network is an

extension of the coverage of a single sensor. As a result, chemo/biosensors used in biological,

biohazard and pollution measurements do not aim for complete coverage [138]. They instead

aim to detect ‘significant threats’ and this is why there are no deployment standards for

biosensors.

2.4.1.3 Biosensor Deployment

There is no standard for deploying sensors, as most deployments are based on ad-hoc measures

[138]. Lately, however, deployment has become increasingly reliant on optimal placement of sensors to

guarantee redundancy [135]. Redundancy is very useful in a network because it allows for failure

of nodes and improves measurement precision. Most of these approaches are based on some

kind of optimisation where an objective function is minimised subject to certain constraints. The

choice of the objective function to minimise or maximize varies depending on the types of sensors

[135]. Heuristics, dynamic programming, and genetic algorithm are among the optimisation

approaches recently used [139-141]. Kanaroglou et al. [138] formulated the optimisation problem

of placement of pollution monitors as a function of pollution surface variability or semivariance


[138]. Portions of this surface that have high entropy (variance) are allocated a larger number of

monitors by this method. Krause et al. [142] have criticised this approach asserting that the

assumption that high variance locations need more monitors is weak since entropy is an indirect

criterion that does not consider the uncertainty in the original prediction that produced the

surface. Since sensors placed far apart from each other – in the network that produced the data

- are likely to have higher entropy, the result of this weak assumption is that sensors tend to be

located at the borders of the area of interest [142, 143].

Park et al. [144] approached it as an optimisation problem that is a function of geographical

features such as roads, water bodies, elevation, etc., and redundancy. They found that the

algorithm provided high-quality sensor placement that is robust enough to handle node failure, meaning the algorithm sufficiently allowed for redundancy. The method, however, does not account for location errors, which are certain to occur even with manual deployment. In addition, there

is no quantitative measure of how beneficial this method is over others.

Lee and Kulesz [135] noted that most of these methods do not consider the dispersion of

hazardous materials, their toxicity and population distribution when placing sensors. These factors

are important when there is an interest in determining the effect of threats on populations. They

proposed a general risk-based placement algorithm that locates sensors iteratively based on the

solution of a local optimisation problem. For a gridded area, the placement for each cell represents

a local optimisation problem. The next placement disregards the risk used for placement in the

previous cell. This approach provides the advantage of quantifying the gain of adding each sensor

to the network, thereby providing an extra condition for stopping the iterative process – when a

certain threshold is reached – in addition to number of sensors available for placement [135].
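
The iterative structure of such risk-based placement can be sketched with a simple greedy analogue in Python; the gridded risk surface, coverage radius and stopping logic below are invented for illustration and are not the algorithm of Lee and Kulesz [135].

```python
import numpy as np

rng = np.random.default_rng(4)
risk = rng.random((20, 20))   # hypothetical gridded risk surface

def greedy_placement(risk, n_sensors, cover_radius=2):
    """Greedy analogue of iterative risk-based placement: take the cell with
    the highest residual risk, then zero the risk inside the new sensor's
    coverage so the next placement disregards it. The recorded marginal
    gains give a natural stopping criterion when they fall below a threshold."""
    residual = risk.copy()
    placements, gains = [], []
    for _ in range(n_sensors):
        i, j = np.unravel_index(np.argmax(residual), residual.shape)
        lo_i, hi_i = max(0, i - cover_radius), i + cover_radius + 1
        lo_j, hi_j = max(0, j - cover_radius), j + cover_radius + 1
        gains.append(residual[lo_i:hi_i, lo_j:hi_j].sum())
        residual[lo_i:hi_i, lo_j:hi_j] = 0.0
        placements.append((int(i), int(j)))
    return placements, gains

sensors, gains = greedy_placement(risk, n_sensors=5)
print(sensors, [round(g, 2) for g in gains])
```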

2.4.1.4 Data Quality

A high-quality data set is one that guarantees a reasonable amount of confidence in the statistical

analysis and conclusions drawn from it. Spore data behaves in a similar way to particulate

datasets such as pollution data [44, 79]. Most approaches for optimising data quality in sensor

networks deal with selecting an ideal sampling frequency and ensuring enough sampling

locations. Exploratory tools of investigating data quality include entropy-based methods [145],

principal component analysis [92], and a host of other time-series analysis methods – such as

autocorrelation and trend analysis (Shumway and Stoffer [146]). Averaging and filtering methods

for filtering excessive noise that is almost certainly going to be present in the system are

particularly useful. All of these methods can be used to analyse data collected from a pilot phase

of the experiment, and the conclusions drawn from such an analysis could be used to improve


subsequent data collection. Data samples are spatially and temporally correlated and the amount

of this correlation determines data usefulness.
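
As a simple example of such an exploratory check, the Python sketch below computes the sample autocorrelation of a hypothetical daily spore-count series; strong autocorrelation at short lags would indicate temporal redundancy that could inform the choice of sampling frequency.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation function: a quick exploratory check of how
    strongly successive measurements are temporally correlated."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return [1.0 if k == 0 else np.dot(x[:-k], x[k:]) / denom
            for k in range(max_lag + 1)]

# Hypothetical daily spore counts with a persistent trend
counts = np.cumsum(np.random.default_rng(5).normal(size=60)) + 20
print([round(r, 2) for r in autocorrelation(counts, 5)])
```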

2.5 Conclusion

The review has provided an overview of atmospheric models, fault detection in sensors and

sensor networks and data integrity methods as they relate to biosensors and spores. The

limitations of current agricultural disease prediction methods have been identified as their inability

to incorporate spore dispersion information into disease risk forecasts. Atmospheric models, such

as GPM and trajectory (Lagrangian and Eulerian) models, which have been successfully extended,

from meteorology and climatology, to spore dispersion have also been reviewed. The GPM is

attractive for spore dispersion prediction because of its ease of implementation and availability in

many computer packages such as CALPUFF. Trajectory models have been shown in literature to

be more accurate near a source due to their ability to better simulate the increased randomness

brought by canopy disruption of wind flow within that range. Generally, trajectory models require

detailed information about the source of spores, such as location, ground cover, initial release

velocity and source strength and dimension. This information is not always available in non-

experimental cases.

Data integrity methods in biosensor networks are not yet fully developed because biosensing is only

just gaining ground. Creating reliable chemo or biosensing surfaces for deployment on a network

scale is still in its infancy. Nevertheless, transferable methods from conventional sensor networks

for optimising network operation and ascertaining and improving data quality have been

identified. These methods were found to be widely used in the communications, electrical,

environmental and chemical industries.

The review also found that sensor deployment is critical as it affects the quality of data collected

and should consider sensor properties such as coverage and measurement precision. Almost all

deployment methods are based on optimising a constrained cost function. Inputs to this objective

function in addition to standard inputs such as maximum number of sensors and coverage area

can vary from approach to approach. The superior methods include sensor redundancy as an

input to the objective function.


Chapter 3 Dispersion of Sclerotinia sclerotium Spores in an

Oil Seed Rape Canopy

3.1 Introduction

As discussed in Chapters 1 and 2, the manual and logistical expense of collecting and quantifying Sclerotinia spores with rudimentary data collection methods limits the availability of agricultural data. Aerially dispersed PM10 data, widely available as air quality monitoring data, is useful for above-canopy analysis at longer distances from the source, but is unsuitable

for analysing and modelling source-receptor dispersion on a local scale because near field

transport differs from far field dispersion [147].

The aims of this chapter are to generate data that can: reliably explain the dispersion pattern of

Sclerotinia sclerotium spores in OSR fields in a controlled experiment; enable evaluation of the

sensitivity and effectiveness of a prototype oxalic acid-measuring biosensor; and provide data

that can be used to identify a suitable model for the dispersion of Sclerotinia sclerotium spores

in OSR fields.

The chapter details the design and implementation of an experimental field trial for the emission,

dispersion and collection of Sclerotinia sclerotium spores. It also describes the measurement of

spore concentration from field samples by direct DNA quantification and by proxy, through the

measurement of oxalic acid. Oxalic acid concentration was measured using the electrochemical

process employed by the biosensor (see Chapter 1 and section 3.3.1 in this chapter) and with a

direct concentration measurement using colourimetric analysis. Data resulting from both methods

of oxalic acid concentration evaluation has been compared to test the biosensor’s efficacy. The

overall dispersion of Sclerotinia sclerotium spores within, through, and above an Oil Seed Rape

(OSR) canopy is further discussed. Experimental methods consistent with standard agricultural

procedure and practices were employed, and the application, set-up and utilisation of air sampling


and weather measuring equipment were demonstrated. Methods from the chemical sciences,

through the preparation of various samples/reagents, have been adopted and electrochemical

measurement and instrumentation techniques have been utilised along with biochemical

(colourimetric analysis) and biological (quantitative Polymerase Chain Reaction) techniques for

concentration/DNA quantification from collected spores.

The experiment was designed, planned, conceived, set up and implemented by the author with

assistance from Dr Jon West of Rothamsted Research Ltd, UK. All laboratory work was carried

out at Rothamsted Research’s Plant Biology and Crop Science (PBCS) laboratory. The author

solely carried out biosensor tests and reagents preparation; colourimetric analysis was assisted

by Dr Stephanie Heard, then of Rothamsted Research; and Mrs Gail Canning also of Rothamsted

Research did the qPCR analysis. Matlab and data analysis tools in MS Excel were used for data

analysis and presentation.

3.2 Motivation for Experimental Field Trial

This experimental field trial was motivated by a change in the original plan of the SYIELD project

(see Appendix 1 for details). The initial plan intended that prototype biosensors would be ready for deployment in field trials by mid-2011, thus providing the author with at least two years of Sclerotinia spore dispersion trial data by the end of the PhD. Under that plan, biosensors housed in an integrated unit comprising a virtual impactor, a collection and incubation mechanism and electrochemical transducers (http://www.syield.net/) would be deployed in quantity across OSR fields to automatically measure and output a concentration of oxalic acid daily, thereby supplying data on the spatial variation of airborne ascospores. For technological and logistical reasons the biosensor units were not available for this purpose throughout the PhD, with the implication that by 2013 the author had no data.

As a result, the author, with the original research goals in mind, conceived and designed the

experimental plan in order to collect data for testing the biosensing chipset (which was in

production at that stage) and evaluating and modelling Sclerotinia dispersal in and above an oil

seed rape canopy. The main aspects of the design were determined by considering the potential

environment the completed biosensors would be deployed in and the type of data they were likely to

collect (naturally released, turbulent in the near-field, diffusive in the far-field, susceptible to

contamination, etc.).


3.3 Methodology

This section presents the methodology used in designing the field trial, spore sampling,

identification and quantification. In this main section, a general overview is presented. Description

of the theory and application of these methods along with modifications and justifications are

given in the relevant subsections starting from section 3.3.1.

Broadly, three techniques borrowed from agriculture and horticulture, analytical chemistry and

biology, and meteorology were used. These are: field trial and spore sampling, weather

measurement and instrumentation, and identification and quantification of spores.

A majority of the methods used in this chapter regarding the standard agricultural experimental

setup for spore sampling, deployment and setup of sampling equipment, and identification and

quantification of spores are based on the procedures proposed in Lacey and West [148] and

review of spore traps by Jackson and Bayliss [10]. Standard procedures specific to sampling

agricultural data for model evaluation were also used based on experiments of Aylor, McCartney

and Gleicher. Methods regarding weather instrumentation measurement draw on [149] and

[150].

3.3.1 Field Trial Experiment

An experiment was designed to collect airborne Sclerotinia sclerotium spores in a winter Oil Seed

Rape (OSR) field in Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084) between 31st May and

3rd of June 2013. Little Hoos is one of the classical experimental fields1 at Rothamsted Research

Limited, UK. Figure 3.1 shows the location of Little Hoos in relation to other fields. Siting of the

experiment was motivated by a need to improve sampling reliability, which can be significantly

affected by unmeasured background levels of spores [10], and improve reliability of turbulent

measurements, whose stability and representativeness are amenable to flat terrains [152]. Crop

rotation was practiced in Little Hoos in the previous two years, a practice that is known to inhibit

growth of sclerotia [153] and therefore reduce the likelihood of background concentration levels

of Sclerotinia. There were no detectable natural or artificial inoculum sources in any of the

surrounding fields as the background spore levels from an upwind air sampler confirmed (see

section 3.3.1.3).

1 Classical experimental fields refer to the long-term trial sites where landmark field experiments were carried out by John Lawes and Henry Gilbert in the 19th century. See Silvertown et al. [151] (Silvertown, J., et al., The Park Grass Experiment 1856–2006: its contribution to ecology. Journal of Ecology, 2006. 94(4): pp. 801–814).


The scale of the experiment (43 x 38m) was chosen for practical reasons, limited by the source

strength, the number of available samplers and considerations of the relatively short distance travelled

by spores released from a small source inside canopies with a leaf area index of at least 2.5 [43].

The aims of the field trial were threefold: to generate Sclerotinia sclerotium spore spatial data

that would enable the analysis of dispersal of naturally-released spores from in-canopy ground

level sources; to generate suitable data for the identification and evaluation of a physical transport

model to describe dispersion of naturally released spores in and above an OSR canopy; and to

provide viable real life sampled spores for calibrating and testing a prototype Sclerotinia

biosensor.

3.3.1.1 Source and Site Characteristics

Figure 3.1 shows the sampling area within the experimental site. The area is a 43 × 38m rectangular site within a 100 by 100m OSR field. The OSR was at the flowering stage. Six groups of Sclerotia were sown at a depth of approximately 2cm in late autumn of 2012, distributed around

the circumference of a 7m-diameter circle. A ring configuration was adopted for the source in

order to address an additional objective of the experiment, which was to test the performance of

an automated prototype biosensor.

The Sclerotia were monitored throughout the winter and matured and produced sporulating

fruiting bodies (apothecia) during the flowering period of the OSR in late April 2013. At the start

of the experiment, the approximate canopy height and Leaf Area Index (LAI) were measured as

1m and 3.5 respectively. LAI was measured with a leaf area index meter (LAI-2200, LiCor

Environmental, NE, USA), which also calculated mean leaf angle.


Figure 3.1: Location of Little Hoos (WGS84 Lat/Long: 51.811374/-0.373084), the experimental

site, among other field trial sites at Rothamsted Research UK (source of image: Rothamsted

Research).

Field Trial Site


3.3.1.2 Air Sampling

In this work, given that ascospores are chiefly dispersed aerially [19, 26, 66, 154, 155], air

sampling methods were the primary area of focus. Numerous air sampling methods and items of equipment, of varying reliability, have been developed over the past 50 years [10, 11, 156]. Generally, depending on the physical characteristics of the spore being sampled,

passive and active sampling techniques [148] can be used. Passive methods rely on the use of

spore traps to capture spores mainly by sedimentation. This method is less suitable for smaller

particles (< 2 μm) as a result of Stokes’ law, which implies that heavier particles are preferentially deposited due to their relatively higher settling velocity [148, 157]. Active samplers capture spores by inertial impaction of an actively sampled air volume. The higher the sampling

volume, the higher their reliability. Some advantages of active methods over passive ones, in

addition to a higher sampling volume, are higher impaction and deposition retention [148].

Popular types of active sampling devices are the Burkard Hirst type spore trap and Rotorod

samplers. In this work, the Rotorod active sampling device was preferred over Burkard traps due

to its combination of superior ease of use and setup, substantially lower cost and high

sampling volume [21] at similar or superior impaction efficiencies [55, 158] for particle sizes

above 7𝜇𝑚 [10].

The Rotorod sampler comprises two detachable vertical arms (I-rods) attached to a DC motor,

forming a ‘U’ formation, such that the I-rods stand upright. The motor rotates the arms to sample

an air volume given by:

$V = \pi \Delta \Psi \Gamma Z \;\; \mathrm{cm^{3}\,min^{-1}}$ [3.1]

Where $\Delta$, $\Psi$, $\Gamma$ and $Z$ in Equation 3.1 are the outer diameter, the width of the collecting surface, the length of the collecting surface (all in cm) and the rotation speed in rpm, respectively.
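As an illustration, Equation 3.1 can be evaluated directly. The minimal MATLAB sketch below (MATLAB being the analysis tool used in this chapter) uses assumed, illustrative I-rod dimensions, not the trial's calibration values; the trial's calibrated samplers corresponded to 38 litres per minute.

    % Sketch: air volume sampled per minute by a Rotorod (Equation 3.1).
    % The rod dimensions below are illustrative placeholders only.
    D   = 8.5;    % outer diameter swept by the I-rods (cm), assumed
    W   = 0.16;   % width of the collecting surface (cm), assumed
    L   = 3.2;    % length of the collecting surface (cm), assumed
    rpm = 1200;   % rotation speed quoted in section 3.3.1.2
    V_cm3_per_min = pi * D * W * L * rpm;   % Equation 3.1
    V_L_per_min   = V_cm3_per_min / 1000;   % convert cm^3 to litres
    fprintf('Sampled volume: %.1f L/min\n', V_L_per_min);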

To begin sampling, a pre-calibrated Rotorod sampler with I-rod surfaces coated with glycerine to

maximise adhesion and impaction efficiency [148], rotating at 1200 rpm and sampling an air volume of 38 litres per minute, was deployed at all sampling heights (see section

3.3.1.3) to collect spores. A 6V battery, which was replaced every 2 days, powered each Rotorod

sampler pair and sampling was automated by a Burkard timer set to activate the Rotorod samplers

for 5 hours (11am to 4pm) daily throughout the experimental period. The sampling days were

chosen such that they were dry and preceded wet days, conditions that have been found to be

optimal for release of Sclerotinia sclerotium spores [19, 20, 159]. The daily sampling periods from

11am to 4pm coincided with weather conditions that have been found to be ideal for spore

emission, characterised by increased solar radiation and temperature [19, 20]. The


sampling duration of 5hrs also alleviates some of the concern that air sampled data is less reliable

with shorter sampling durations [160].

3.3.1.3 Deployment of Samplers

A set of nine (9) locations, corresponding to 22 sampling points, was chosen in this experiment

to sample travelling spores. 22 Rotorod model 40 samplers (Sampling Technologies Inc., 1989)

were deployed at these locations at various heights. One of the positions (A) was located upwind

to measure background spore levels and the others (B to I) were distributed downwind and

crosswind within Little Hoos as shown in Figure 3.1. The inclusion of an upwind position, A, to

determine the background level of Sclerotinia sclerotium spores from non-local sources complied

with standard practice of spore data collection [10, 148], and was necessary because potential

Sclerotinia sclerotium spore sources could not be entirely ruled out given Rothamsted’s status as

an experimental facility. As Figure 3.2 shows, the samplers were deployed to sample along

downwind, crosswind and vertical directions to provide a spatial measure of dispersion. The

crosswind measurements were made to assess lateral dispersion. The corresponding picture

shown in Figure 3.3 shows a network of Rotorod samplers covered with rain shields spanning the

sampling area. Figure 3.4 provides a closer look at each sampling position, with rain shields

removed, revealing active pairs of I-rods as well as timers that turn sampling on/off. A typical

assembly comprising Rotorods set at 0.8m, 6V battery and Burkard timer is shown in Figure 3.5

before deployment below the canopy.

Previous day’s wind direction forecasts obtained from the Meteorological Office website were

used to align sampling axis with the anticipated wind direction, and this was confirmed and/or

corrected by readings from onsite measurements on each morning of the experiment. This

approximately positioned the sampling axis at the centre of the spore plume, making the

crosswind axis perpendicular to wind direction. Correcting to realign the sampling axis with the average wind direction is beneficial because it eliminates covariances between the horizontal and crosswind

components of wind speed [149], thereby simplifying model dimensions [150] (see Chapter 4).
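The effect of this realignment can be sketched numerically: rotating the measured horizontal wind components into the mean-wind frame drives the mean crosswind component to zero, which is what removes the cross-covariance terms. The MATLAB sketch below uses synthetic wind data, not the trial's sonic anemometer records.

    % Sketch: rotating horizontal wind components into the mean-wind frame.
    u = 2 + 0.5*randn(1000,1);           % streamwise component (m/s), synthetic
    v = 1 + 0.5*randn(1000,1);           % crosswind component (m/s), synthetic
    theta = atan2(mean(v), mean(u));     % mean wind direction relative to x-axis
    u_al =  u*cos(theta) + v*sin(theta); % along-wind component after rotation
    v_cr = -u*sin(theta) + v*cos(theta); % crosswind component, approx. zero mean
    fprintf('Mean crosswind after rotation: %.3g m/s\n', mean(v_cr));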

All positions were sampled at two heights of 0.8m and 1.6m; positions B and D made

measurements at additional heights of 2.4m and 3.2m in order to determine the vertical profiles

of spore dispersion. The additional sampling heights at position B provided vertical spore

gradients close to the source where spore numbers are highest [26] [29]. The sampling height

of 0.8m below the canopy has been found to be ideal for optimal detection of locally sourced

spores [148] and was chosen as such in this study. The 1.6m sampling height was chosen

because this height is outside the roughness sublayer for 1m-high canopies [149], thus reducing


the effect of canopy induced wakes on the sampling point [161]. Comparing collections made below the canopy at 0.8m with those made at 1.6m, above the canopy, provided two different spatial gradients that

would enable evaluation of net transport between the two media, i.e. the significance of canopy

filtering or escape fraction [50]. This will in turn give an indication of spores available for long

distance travel which can pose a more insidious threat to crops in other fields [154].

Figure 3.2: Layout of sampling area (43m by 28m) within field trial site from 31st May 2013 to

3rd June 2013 showing positions of Rotorod samplers. Data was collected at two heights of

0.8m and 1.6m (O), and additional heights of 2.4m and 3.2m (⊕).

An arrangement comprising a biosensor unit, a weather station and a 3D sonic anemometer was situated

at the centre of the 7m-diameter ring of ascospores. Scale of sampling area excluding upwind

sampling point: 35m by 28m. All sampling positions are 7 meters apart except I, which is 14m

from D. B is 1m away from the edge of the source ring.


Figure 3.3: Experimental trial field showing Rotorod samplers (with rain shields) above OSR

canopy. (Image taken by the author).


Figure 3.4: Rotorod samplers at position B deployed at 0.8m (obscured), 1.6m, 2.4m and 3.2m

pictured without rain covers. Position B (as well as D) sampled at two additional heights.

(Image taken by the author).


Figure 3.5: A typical assembly of Rotorod sampler (1), battery (2) and Burkard timer (3), seen here powering only one sampler, with its other output unused. (Image taken by author)

3.3.1.4 Weather Measurements and Instrumentation

Weather data was captured by two means: a Vantage Pro2 (Davis Instruments, Hayward, CA,

USA) met station placed at the centre of the source ring, inside a circular canopy clearing of a

radius of approximately 6m, recorded temperature, relative humidity, wind speed and wind

direction every 20 seconds and logged 30-minute averages; and a 3D sonic anemometer

(Campbell Scientific, Inc., UT, USA), which was deployed at a height of 2 metres above the

ground, made measurements of orthogonal wind speeds and their hourly averages along with

friction velocity at a frequency of 16Hz. By deploying the sonic anemometer at 2m, turbulence

measurements were made at a height above the roughness sublayer [37] (height of roughness

sublayer above the ground in this experiment is approx. 1.3m), whose effect can cause rough-wall boundary layer eddies that may deteriorate measurement accuracy [161]. The sonic probes

were also aligned with the mean streamwise wind direction to minimise errors [38] based on

forecasts. The reliability of sonic anemometers in canopies of varying heights has been

extensively demonstrated in spore dispersion experiments [55, 56, 162] and for other lighter

tracer particles [150, 163].

At the end of each sampling period, deviations of the sampling grid from the mean wind direction

were noted and recorded as θ. The approximate locations of the met station and sonic

anemometer are indicated in figure 3.2. Solar radiation and rainfall data were also sourced from

an offsite meteorological station (part of the Environment Change Network (ECN)) located

approximately 1 kilometre away from the experimental field.

3.3.2 Identification and Quantification of Spores

At the end of each sampling day, for each sampling location, one I-rod per Rotorod containing

the collected spores was stored for direct quantification via qPCR analysis [63] while the other

was immersed into capped 1𝑚𝑙 Eppendorf tubes containing 900𝜇𝑙 of Sabouraud growth media,

stored at 20°C for 4 days to allow for oxalic acid formation [64], and used for indirect quantification

via measurements of oxalic acid, a pathogenicity factor for viable Sclerotinia sclerotium spores

[14]. Two types of indirect quantification were used: the biosensor chip and colourimetric

detection. The objective of testing the prototype biosensors was to determine their potential for

real-time deployment. These tests are described in section 3.3.2.1.

The remainder of the sample was used to measure oxalic acid using the alternative method of

colourimetric analysis [148, 164]. Due to its reliability, colourimetric detection can ascertain whether detectable quantities of oxalic acid were produced from the collected samples, as real-life air sampling

can be susceptible to contamination and corruption of samples. Subsequently, the method was

used to assess biosensor performance and to explore the possibility of identifying biosensor false-

negatives.

3.3.2.1 Prototype Biosensor Testing

The biosensing surface is made up of a circular carbon electrode of approximately 1 square

millimetre, capable of holding approximately 80𝜇𝐿 of liquid solution, coated with an enzyme

known as oxalate oxidase [165]. This enzyme acts as the bioreceptor, the component of the

biosensor that determines selectivity and specificity [166]. Oxalate oxidase catalyses oxalate

(Oxalic acid [167]) to hydrogen peroxide (H2O2) and carbon dioxide [168]. H2O2 in turn oxidises

the ferrocyanide ions in the biosensor to ferricyanide ions which results in the generation of


faradaic current proportional to the concentration of the analyte (oxalic acid) [169, 170]. The

concentration of oxalic acid is therefore given by the measured current due to the released

electron from this oxidation process during chronoamperometry [171].

Preparing calibration standards

Oxalic acid concentrations of 0, 50, 100, 500, 1000 and 1500𝜇𝑚𝑜𝑙𝐿−1 spanning the detection

range of the biosensor (Gwent Technology Ltd., Pontypool, UK) as stated in the product datasheet

(Gwent Protocol Document TR-A 2574) were used as calibration samples. This range is based on

the amount of oxalic acid needed to use up enzymes, catalysts and ferrocyanide ions on the

biosensing surface in a chemical reaction [172].

To begin, a base stock solution of 10𝑚𝑚𝑜𝑙𝐿−1 oxalic acid was prepared by weighing 63mg of

oxalic acid dihydrate (molar mass =126.07𝑔𝑚𝑜𝑙−1) with an electronic weight balance (precision

1mg). The weighed crystals were then carefully transferred into a 50mL volumetric flask, filled

with Sabouraud media, which allows the growth of fungi while inhibiting that of bacteria [173].

The six (6) calibration samples (0, 50, 100, 500, 1000 and 1500𝜇𝑚𝑜𝑙𝐿−1) were then prepared by

diluting pipetted volumes of this stock solution with Sabouraud media in separate 50 𝑚𝐿

volumetric flasks as shown in Table 3.1.

Table 3.1: Volumes of oxalic acid stock required to prepare 50mL of 0, 50, 100, 500, 1000 and 1500µmolL⁻¹ standards from 10mmolL⁻¹ stock

Concentration (µmolL⁻¹)   Volume of Stock (µL)   Volume of Solvent (µL)   Dilution Factor
1500                      7500                   42500                    6.67
1000                      5000                   45000                    10
500                       2500                   47500                    20
100                       500                    49500                    100
50                        250                    49750                    200
0                         0                      50000                    -
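The volumes in Table 3.1 follow from the standard dilution relation C₁V₁ = C₂V₂; a minimal MATLAB check reproducing the table's stock and solvent volumes is sketched below.

    % Sketch: dilution volumes for 50 mL standards from a 10 mmol/L stock.
    stock_uM   = 10000;                          % 10 mmol/L stock, in umol/L
    targets    = [1500 1000 500 100 50 0];       % standard concentrations (umol/L)
    V_total_uL = 50000;                          % 50 mL flask, in uL
    V_stock = targets * V_total_uL / stock_uM;   % uL of stock per standard
    V_media = V_total_uL - V_stock;              % uL of Sabouraud media
    disp([targets' V_stock' V_media']);          % matches Table 3.1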

Measurement of oxalic acid with the biosensor

The electrochemical measurement of oxalic acid was achieved with a Uniscan potentiostat

(Uniscan Instruments, 2009) that attached to the biosensor as shown in Figure 3.6 (left frame).


The following steps, as laid out by the biosensor manufacturer in the biosensor electrochemistry manual [172], were used to measure oxalic acid concentration:

- Hot plate pre-heated to 60°C.
- Biosensor attached to the Uniscan potentiostat using a bespoke connector, addressing the central carbon paste electrode as working electrode and the outer Ag/AgCl electrode as pseudo-reference/auxiliary electrode (Figure 3.6, right frame).
- An 80µL volume of calibration solution (in Sabouraud media) pipetted onto the pre-prepared biosensor surface.
- Pre-incubation for 120 seconds at 60°C to allow for sufficient mixing and completion of the oxalate oxidase reaction.
- Application of −0.2V vs the Ag/AgCl pseudo-reference to the working electrode surface.
- Measurement of current (0.1Hz sampling) for 60 seconds.
- Recording of the current at the 60-second time-point.

This measurement procedure was used to test calibration standards and samples. Each calibration

standard was tested 5 times to generate a calibration curve. A further 40 tests (each) were carried

out on standards with blank (0𝜇𝑚𝑜𝑙𝐿−1) and low (0 − 5𝜇𝑚𝑜𝑙𝐿−1) concentrations of oxalic acid.

These were used to calculate the biosensor Limit of Blanks (LOB) and Limit of Detection (LOD)

and Limit of Quantitation (LOQ) [174]. LOB represents the current measured by the biosensor at

0𝜇𝑚𝑜𝑙𝐿−1, LOD represents the lowest analyte concentration the biosensor can measure and LOQ

indicates the lowest analyte concentration that can be reliably measured [175]. The LOB, LOD

and LOQ are considered “figures of merit” for biosensor performance and are widely used [174]

[175]. These were calculated as follows [174]:

$LOB = \mu_{blanks} + 1.645\,\sigma_{blanks}$ [3.2]

$LOD = LOB + 1.645\,\sigma_{low}$ [3.3]

$LOQ = LOD$ [3.4]

Where $\mu_{blanks}$ is the mean of the blank measurements, and $\sigma_{blanks}$ and $\sigma_{low}$ are the standard deviations of the blank and low-concentration measurements. The calculation of LOQ is usually ad hoc [175], but it can be set to the value of the LOD when expected analyte concentrations are not known [174].
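A minimal MATLAB sketch of Equations 3.2–3.4 is given below; the replicate currents are synthetic placeholders standing in for the 40 blank and 40 low-concentration tests described above.

    % Sketch: LOB, LOD and LOQ from repeated blank and low-concentration tests.
    i_blank = 4.1 + 0.2*randn(40,1);   % currents at 0 umol/L (uA), synthetic
    i_low   = 4.5 + 0.9*randn(40,1);   % currents at low concentration (uA), synthetic
    LOB = mean(i_blank) + 1.645*std(i_blank);   % Equation 3.2
    LOD = LOB + 1.645*std(i_low);               % Equation 3.3
    LOQ = LOD;                                  % Equation 3.4 (LOQ set to LOD)
    fprintf('LOB = %.2f uA, LOD = LOQ = %.2f uA\n', LOB, LOD);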


Figure 3.6: Biosensor attached to Uniscan potentiostat using a bespoke connector (1).

Prototype biosensor (2) sensing surface is an enzyme-coated carbon electrode (black circular

area in right frame). (Image taken by author)

3.3.2.2 Colourimetric Detection of Oxalic Acid

Colourimetric detection is a chemical technique for determining the concentration of coloured

compounds in a solution [176]. This determination is enabled by the Beer-Lambert law, which

relates concentration of a dissolved compound, 𝑐𝑑𝑖𝑠, to the absorbance of a specific wavelength

of light, 𝐴𝑏𝑠, and the path length or distance travelled by light through a spectrophotometer cell

of known dimensions, 𝑙, as follows:

$c_{dis} = \dfrac{Abs}{\varepsilon_{abs}\, l}$ [3.5]

Where $\varepsilon_{abs}$ is the absorption coefficient of the chemical compound. The Beer-Lambert law holds true

for most compounds in diluted solutions [176].

Colourimetric tests were performed by Dr Stephanie Heard of Rothamsted Research with an assay

optimised by Prof. Nicola Tirelli of University of Manchester. The procedure used is as follows

[177]:

- Solution A was made up at a pH of 3.8 and a temperature of 37°C by dissolving 50mM succinate buffer, 0.79mM N,N-dimethylaniline and 0.11mM 3-methyl-2-benzothiazolinone hydrazone (MBTH) in 100ml of deionised water.

- A master mix solution was then prepared with 12.6ml Solution A, 0.5ml of 100mM ethylenediaminetetraacetic acid (EDTA) solution, 1mg/ml of freshly prepared horseradish peroxidase and 0.35units/ml of freshly prepared oxalate oxidase mixed in 1.5ml water. All solutions were prepared in cold deionised water.

- For each sample to be tested for oxalic acid, 10µl of sample was pipetted into one well of a 96-well flat-bottom tissue culture plate (TPP®). Two replicates (the same amount of the same sample) were also pipetted into 2 other wells so that an average concentration could be calculated. A blank well containing 10µl of un-inoculated sample was also prepared.

- A 140µl aliquot of the master mix was pipetted into each well containing a testing sample and its replicates, bringing the wells to 150µl.

- 150µl of each of 11 standards (0, 25, 50, 100, 200, 400, 1000, 1500, 2000, 2500, 3000µM) was also set up in 11 wells of the culture plate to enable the generation of a standard curve.

- Absorbances were then read from the plate with a Varioskan Flash spectral scanning multimode reader (Thermo Scientific™) at 590nm. The plate was then incubated for 5hrs at 37°C and then read again.

- The concentrations were calculated from the Varioskan readings as the average of three absorbances with the equation:

$Abs = S_{int} + S_{grad} S_{const} c_{dis}$

Where $S_{int}$, $S_{grad}$ and $S_{const}$ are parameters from the standard curve.
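As an illustration of this standard-curve calculation, the MATLAB sketch below fits a simplified linear curve to the standards (absorbing the product $S_{grad} S_{const}$ into a single slope) and inverts it for sample concentrations; the absorbance values are synthetic placeholders, not measurements from this assay.

    % Sketch: simplified linear standard curve and inversion for samples.
    std_conc = [0 25 50 100 200 400 1000 1500 2000 2500 3000];    % uM
    std_abs  = 0.05 + 4e-4*std_conc + 0.005*randn(size(std_conc));% synthetic reads
    p = polyfit(std_conc, std_abs, 1);       % Abs = p(1)*c + p(2)
    sample_abs  = [0.12 0.45 0.08];          % triplicate-averaged sample reads
    sample_conc = (sample_abs - p(2)) / p(1);% invert the standard curve (uM)
    disp(sample_conc);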

3.3.2.3 qPCR Measurement of Spore DNA

Quantitative Polymerase Chain Reaction (qPCR) is a DNA amplification technique that relies on

the use of a thermo-stable polymerase enzyme to synthesise copies of a DNA sequence. The process is initiated by priming [153] the gene of interest, making it ready for polymerase binding and synthesis of new DNA. The reaction is run in cycles characterised by temperature changes, with each cycle resulting in an approximate doubling of the target DNA.
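This doubling model underlies quantification: the cycle at which the amplified DNA crosses a detection threshold encodes the starting quantity. The MATLAB sketch below illustrates the idealised relation $N_n = N_0 \cdot 2^n$ with assumed values; it is not the TaqMan chemistry or instrument processing itself.

    % Sketch: idealised qPCR doubling and recovery of the starting quantity.
    N0 = 0.5;                    % starting DNA (pg), approx. 1.4 spores [153]
    thresh = 1e6;                % detection threshold (pg), illustrative
    n  = 0:40;                   % thermal cycles
    Nn = N0 * 2.^n;              % approximate doubling per cycle
    Cq = find(Nn >= thresh, 1) - 1;   % quantification (threshold) cycle
    N0_est = thresh / 2^Cq;           % inverting the doubling model
    fprintf('Cq = %d, estimated N0 = %.2f pg\n', Cq, N0_est);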

Extraction of spore DNA was achieved using the quantitative Polymerase Chain Reaction (qPCR).

This method is particularly attractive because it is highly sensitive to low spore counts (down to 0.5pg of DNA) and discriminates against spores, such as those of Botrytis cinerea and S. minor, that are known to masquerade as Sclerotinia sclerotium spores [153]. This method is therefore ideal for quantifying real-life samples, which are known to be difficult to quantify accurately with less sensitive techniques.


The extraction was done by Mrs Gail Canning at Rothamsted’s PBCS laboratory using the primer

design technique developed by Rogers et al. [153] for quantification. Rotating arm rods

containing the spores collected between 31/05/2013 and 03/06/2013 were removed from frozen

storage at −20°C and processed using the following procedure [177]:

- Each I-rod was put into a 2ml screw-cap tube with one scoop of ballotini beads (0.5 and 0.4mm diameter) added.

- 440µl of Extraction Buffer was added to each tube, consisting of 400mM Tris-HCl; 50mM EDTA pH 8; 500mM sodium chloride; 0.95% sodium dodecyl sulphate (SDS); 2% polyvinylpyrrolidone and 5mM 1,10-phenanthroline monohydrate.

- 0.1% β-mercaptoethanol was added to the tubes before each sample was shaken in a FastPrep 24 automated lysis machine (MP Biomedicals, 2012). Samples were shaken 3 times at 6.0m/s for 40 sec, with a 2-minute cooling period on ice between each cycle.

- 400µl of 2% SDS buffer was added to each tube. The tubes were inverted several times and incubated at 65°C in a water bath for 30 mins. 800µl of the bottom phase of phenol:chloroform (1:1) was added to each tube, which was then vortexed and centrifuged at 4°C for 10 mins at 13K revolutions per minute (krpm).

- The top layer of the supernatant from each tube was pipetted up and placed into a 1.5ml flip-flop Eppendorf tube already containing 30µl of 7.5M ammonium acetate and 480µl of isopropanol. The tubes were then inverted several times and centrifuged at −20°C at 13krpm for 30 mins.

- The resulting supernatant from these (Eppendorf) tubes was removed, leaving a DNA pellet. The pellet was washed with 200µl of 70% ethanol and the tube centrifuged again at 13krpm for 15 mins. The ethanol was removed and the pellet dried. The pellet was re-suspended in 30µl of sterile deionised water and mixed.

- The TaqMan method, which offers better isolation and specificity [178], was used to quantify the DNA.

3.4 Results

This section presents the results of the three identification and quantification methods described

above. For colorimetric detection and qPCR, the results are also presented as spatial data to show

dispersion of spores from release to sampling.

3.4.1 Biosensor Test and Calibration Results

The calibration curve generated from testing of calibration samples is presented in Figure 3.7.

The error bars denote the standard deviation of the current measured by the prototype biosensor


for each sample after 5 repetitions of the test, and can be seen to be higher at very low

(< 100 μmolL−1) and very high (> 1000 μmolL−1) concentrations of oxalic acid. The baseline

current corresponding to 0 μmolL−1 (4.075μA) is also high, especially when compared to the current value corresponding to 1500μmolL−1 (10.58μA – the maximum current in the biosensor measurement range), as this indicates a background noise of at least 39%. It is always desirable to linearise biosensor calibration curves to improve the reliability of measurements [174, 175]. With this approach, the background noise is usually higher than measured and is given by the intercept of the linearised calibration curve (=5.83μA). The LOB was calculated as 4.4μA, and the LOD and LOQ were determined to be 6.05μA. The LOD and LOQ correspond to approximately 63 μmolL−1 of oxalic acid, representing 4.2% of the biosensor measurement range.
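Converting a measured current back to a concentration with the linearised curve is then a matter of inverting the fit from Figure 3.7, as the MATLAB sketch below illustrates for one of the positive readings reported in section 3.4.1.

    % Sketch: inverting the linearised calibration curve from Figure 3.7
    % (I = 0.0036*c + 5.8228, current in uA, concentration in umol/L).
    grad = 0.0036;  intercept = 5.8228;   % fit parameters from Figure 3.7
    I_meas = 6.75;                        % e.g. positive reading at B, Day 1
    c_est  = (I_meas - intercept) / grad; % approx. 258 umol/L
    fprintf('Estimated concentration: %.0f umol/L\n', c_est);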

Figure 3.7: Biosensor calibration curve for five repeated measurements at 60°C after allowing 120 seconds of mixing (n = 25, error bars = ±1 S.D.).

[Figure 3.7 plot: measured current (µA) against oxalic acid concentration (µM); linear fit y = 0.0036x + 5.8228, R² = 0.7558; chart title 'Biosensor Calibration at 60C after 120s of mixing'.]

Based on these figures of merit, measurements of sampled data were made. The results are presented in Table 3.2. As may be observed, only 2 samples, taken on the first day, were positive

(highlighted in yellow). Both of these samples represent the position that is closest to the source

(B - 1m). A test was considered positive if the current reading was higher than the LOD (6.05𝜇𝐴).

There were no detections by the biosensor on the remainder of the sampling days even though

conditions equally favourable for Sclerotinia sclerotium spore release were present – experiment

days were preceded by a brief period of rain followed by dry conditions, and relative

humidity ranged from 65 to 90% during the sampling period. These are favourable for Sclerotinia

sclerotium spore release [19, 21].


Table 3.2: Currents recorded by the biosensor using the measurement procedure described in section 3.3.2.1. Values highlighted in yellow are above the baseline noise level determined in the last section and are considered positive for oxalic acid. Heights of 0.8m correspond to Rotorod samplers deployed below the canopy (canopy height = 1m).

Position   Height (m)   Current (µA) Day 1   Day 2   Day 3   Day 4

A 0.8 4.65 4.32 4.10 4.30

A 1.6 4.23 4.20 4.50 4.70

B 0.8 6.75 4.31 3.95 4.40

B 1.6 6.24 4.25 4.60 4.19

B 2.4 3.99 4.12 4.45 4.20

B 3.2 4.40 4.16 4.34 4.13

C 0.8 4.32 4.40 4.10 4.29

C 1.6 5.32 4.10 3.70 4.54

D 0.8 4.12 4.02 4.00 4.31

D 1.6 4.51 4.22 4.20 4.11

D 2.4 3.99 4.12 3.95 3.95

D 3.2 4.21 4.40 4.00 4.20

E 0.8 4.33 4.25 3.99 4.05

E 1.6 4.60 3.94 4.10 4.18

F 0.8 4.27 3.92 4.08 3.95

F 1.6 4.38 3.85 4.02 3.98

G 0.8 4.50 3.99 4.11 4.00

G 1.6 4.09 4.10 3.89 3.87

H 0.8 3.95 4.15 3.75 4.30

H 1.6 4.10 3.93 3.91 4.50

I 0.8 4.41 3.80 3.84 4.23

I 1.6 4.15 3.98 4.01 3.91

3.4.2 Colourimetric Analysis Results and Discussion

The concentrations of oxalic acid corresponding to collected spores over the 4 sampling days are

shown in Table 3.3. The colourimetric assay used had a lower limit sensitivity of 10𝜇𝑀. As such,

concentrations below 10𝜇𝑀 were considered null values, while values above that threshold were

considered valid and representative. Further, the values highlighted in purple are from samples


that turned purple after reaction, as is characteristic for oxalate samples for the assay used. A

couple of samples recorded oxalic acid concentrations (above 10𝜇𝑀) but did not turn purple. It

is suspected that this was caused by other non-oxalate biomass, which affected the

spectrophotometer reading, driving up the absorbance.

Table 3.3: Concentrations of oxalic acid measured by colourimetric analysis. Values in purple are

positively and quantitatively representative of oxalic acid. Heights of 0.8m correspond to Rotorod

samplers below the canopy and all others are above the canopy (canopy height = 1m).

Position   Height (m)   OA conc. (µM) Day 1   Day 2   Day 3   Day 4

A 0.8 10.33 16.20 6.23 8.05

A 1.6 9.60 4.53 11.33 4.62

B 0.8 57.11 15.86 20.04 21.99

B 1.6 133.82 7.12 31.52 11.03

B 2.4 16.27 15.33 10.91 68.05

B 3.2 8.92 16.50 14.03 38.27

C 0.8 4.05 17.16 6.81 6.67

C 1.6 26.52 4.81 26.34 13.93

D 0.8 7.32 6.67 11.10 8.49

D 1.6 8.54 5.21 1.91 3.38

D 2.4 8.46 4.85 4.37 4.97

D 3.2 4.64 2.42 17.88 3.32

E 0.8 67.78 8.96 6.30 7.38

E 1.6 5.67 6.86 11.13 3.40

F 0.8 7.09 2.43 6.40 6.92

F 1.6 3.53 3.26 13.20 9.50

G 0.8 5.48 4.36 7.47 6.94

G 1.6 6.44 5.30 6.45 3.57

H 0.8 3.20 3.47 6.17 6.87

H 1.6 2.18 8.38 2.48 3.63

I 0.8 8.06 2.94 3.05 5.28

I 1.6 6.99 6.98 21.54 2.51


Table 3.3 generally confirms the consistent presence of spores near the source as seen by positive

detections at positions B and C on most of the sampling days. The concentration magnitudes

however are not consistent with the decay model that has long been associated with spore

dispersion, for which reduced concentrations are recorded further away from the source [147].

This is easily perceived in Figures 3.8 and 3.9, where concentrations of oxalic acid from samples

collected below and above the canopy are plotted separately. Only oxalic acid concentrations of

samples corresponding to downwind Rotorod positions (A, B, C, D, I – see figure 3.2) are shown

in these Figures, because concentration decay with distance from source is of interest here. These

positions correspond to -7m, 1m, 7m, 14m and 28m downwind relative to the centre of the spore

ring (origin – 0m) as represented in both figures. The higher concentration at position B (1m) in

Figure 3.9 is unusual when compared to the value at the same position on Figure 3.8, since below

canopy spore concentrations are normally higher than above canopy concentration as a result of

canopy filtering [89] and sacrifice of spores near the source due to cooperative action of spores

during release [26].

Figure 3.8: Oxalic acid concentrations for all days for samples collected below the OSR canopy

[Figure 3.8 plot: oxalic acid concentration (µM) against distance from centre of spore ring (m), Days 1–4.]

Figure 3.9: Oxalic acid concentrations for all days for samples collected above the OSR canopy

[Figure 3.9 plot: oxalic acid concentration (µM) against distance from centre of spore ring (m), Days 1–4; chart title 'Oxalic Acid Concentration Above Canopy'.]

As may be seen from Figures 3.8 and 3.9, the concentrations above the canopy are slightly higher

than those below the canopy throughout. Higher above canopy concentrations than below canopy

concentrations is not usually the case for actual spore concentrations in fields where only local

sources contribute to the spores. This is because the canopy heavily filters escaping spores.

Two more outlying measurements are evident in Figures 3.10 and 3.11. Here the complete data,

including concentrations from samples in the crosswind positions (E, F, G, and H) is shown as

side-by-side comparisons of concentrations of oxalic acid by day (Figure 3.10) and by position

(Figure 3.11). On Day 2 (Figure 3.10), a high concentration of oxalic acid was measured for the upwind position, A. However, because the measurements are too close to the colourimetric test’s detection threshold of 10µM and the overall concentrations for the entire day were low (see Day 2 on

Figure 3.10), this may not be significant.


Figure 3.10: Side-by-side comparison of daily oxalic acid concentrations for all positions. Spores were collected by Rotorod samplers deployed below the canopy.

Figure 3.11: Concentrations grouped by position for all sampling days. Spores tested for oxalic

acid were collected below the canopy.

[Figure 3.10 plot: oxalic acid concentration (µM) by sampling day, grouped by positions A–I.]

[Figure 3.11 plot: oxalic acid concentration (µM) by sampling position, grouped by Days 1–4.]

3.4.3 Spore DNA (qPCR) Results

The primer design used during the measurement has been found capable of detecting as low as

1.4 spores, corresponding to 0.5pg of DNA [153]. Therefore, the improved sensitivity of this

quantification method over the proxy measurements of oxalic acid is expected. Figures 3.12 and

3.13 show the downwind gradient of spores for all days below and above the canopy respectively.

The key refers to field positions (letters) and height of deployment (numbers) shown in figure

3.2. In these figures, dispersion of spores in the along-wind direction, leaving out the sampling

positions in the lateral direction for now, is shown below and above the canopy respectively.

Figure 3.12: Along wind concentration (spore DNA) gradient below OSR canopy for first three

sampling days. The key refers to field positions (letters) and height of deployment above

ground (numbers). Spore DNA axis is scaled for clarity, maximum values for the first 2 days are

shown at the top and have the same units as the vertical axis.

[Figure 3.12 plot: Sclerotinia DNA (pg, axis scaled to 0–8000) by sampling day for below-canopy positions A, B, C, D and I at 0.8m; off-scale maxima on Days 1 and 2 of 26775 and 15245.25 pg shown at the top.]

Figure 3.13: Along wind concentration (spore DNA) gradient above OSR canopy for first three

sampling days. Lateral (crosswind) sampling positions are not shown.

It can be seen that spore production/release declined over the duration of sampling. Also, a more

discernible (compared to colourimetric results) trend of daily spore depletion with distance from

the source is evident for all days except the last, where virtually no spores were detected. This

low spore count on the final day of sampling was traced to a marked deviation of the sampling

axis from the general wind direction due to a forecasting error in wind direction for that day,

which resulted in most of the plume bypassing the sampling grid. A wind rose of forecasted and

actual wind direction for the field reveals this, as shown in Figure 3.14. As a result of this, samples

from the fourth day were excluded from further analysis.

[Figure 3.13 plot: Sclerotinia DNA (pg, axis 0–1800) by sampling day for above-canopy positions A, B, C, D and I at 1.6m; chart title 'Spore gradient in Little Hoos above canopy disregarding lateral dispersion'.]

Figure 3.14: Wind roses showing forecasted (a) and actual (b) wind speed and directions on day 4. The forecasted wind readings were used to set the sampling axis, resulting in a misalignment of the sampling grid and spore plume.

[Figure 3.14 panels: (a) 'Wind Rose Day 3 (02/06/13) 1000-1600 hrs', wind speeds binned from 1.8 to 2.7; (b) 'Wind Rose Day 4 (03/06/13) 1000-1600 hrs', wind speeds binned from 2.85 to 3.45.]

When Figures 3.12 and 3.13 were compared, it appeared there was a comparatively low escape

of spores from the canopy on Day 2. Of the 15245pg of spore DNA recorded below the canopy,

only 581pg (3.8%) made it outside the canopy. By comparison, 1541pg of 26775pg (5.7%) and

1566pg of 5613pg (27.9%) escaped the canopy on Days 1 and 3. These percentages actually

reveal that there is an unusually high escape rate of spores on Day 3. It is unclear why such a

high escape rate was recorded for Day 3, as all sampling days had similar horizontal wind speeds.

It is worth noting, however, that wind flow in the canopy is non-Gaussian and characterised by low wind speeds [38, 41, 54], and that factors such as wind gusts, which could affect deposition [41] and therefore escape from the canopy, are not accounted for by averaged wind statistics. This is why

trajectory models that utilize instantaneous weather variables through turbulence have been

known to describe canopy transport better [55, 58, 88, 89].
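For reference, the escape fractions quoted above follow directly from the daily below- and above-canopy DNA totals, as the short MATLAB check below shows (values in pg, from section 3.4.3; the printed percentages agree with those quoted, to rounding).

    % Sketch: canopy escape fractions from daily spore DNA totals.
    below = [26775 15245 5613];          % Days 1-3, below canopy (pg)
    above = [ 1541   581 1566];          % Days 1-3, above canopy (pg)
    escape_pct = 100 * above ./ below;   % percentage escaping the canopy
    fprintf('Escape fractions: %.1f%%, %.1f%%, %.1f%%\n', escape_pct);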

Figure 3.15: The spore gradient at position B (1m downwind of spore ring) with height for first

three sampling days.

[Figure 3.15 plot: Sclerotinia DNA (pg, axis 0–30000) against height above ground (0.8–3.2m) at location B, Days 1–3.]

Figure 3.16: The spore gradient at position D (14m downwind of spore ring) with height

for first three sampling days.

In Figures 3.15 and 3.16, the concentration gradients (with height) at positions B and D are

shown for all days respectively. One of the Rotorod samplers at 3.2m failed on day 2 (Figure

3.16), so the low value recorded for that position is not representative. It can be seen that,

closer to the source (position B – Figure 3.15) there is a steep gradient of spores with height (−8ng/m on Day 1, −4.5ng/m on Day 2, and −1.8ng/m on Day 3). The steepness of this gradient

is the direct result of the heavy filtration effect of the canopy between heights 0.8m and 1.6m.

Between 1.6m and 2.4m, outside the canopy, there is a decrease in depletion of spores as the

spores are more mixed (Figure 3.15). The linear vertical profile of spores is similar to results of

Sclerotinia sclerotium spore escape from a pasture [50] and release of Lycopodium spores from

a wheat canopy [55].

[Figure 3.16 plot: Sclerotinia DNA (pg, axis 0–350) against height above ground (0.8–3.2m) at location D, Days 1–3.]

Figure 3.17: Spore dispersal gradient for all positions including crosswind (lateral) sampling positions. The spore DNA concentration axis

is in nanograms (ng) and is scaled between 0 to 1ng, for clarity. The key refers to field positions (letters) and height of deployment

above ground (numbers).

[Figure 3.17 plot: Sclerotinia DNA (ng, axis scaled 0–1ng) for all sampling positions and heights (A0.8 to I1.6) on 31 May, 01 June and 02 June 2013.]

At position D (Figure 3.16), 14m downwind of the source, the gradient is flatter than at position

B (to about 53𝑝𝑔/𝑚 for all three days) and comparable numbers of spores are seen below and

above the canopy as a result of spore dilution by air and the reduced effect of the source. This is

true for all sampling positions after C, which is 7m downwind of the source, as shown in Figure

3.17. The figure shows the concentration measured at all positions below and above the canopy

in one figure. Here, spore numbers are almost identical below and above the canopy after 7m

from the source representative of the domination of eddy diffusion over turbulent diffusion [80].

This suggests that sampling at a height of 1.6m (above the canopy) may provide more

representative aerial spore concentration since it can neutralise the effect of very close sources,

which can have the tendency to disrupt models.

Figures 3.18 and 3.19 show the spore DNA plotted against downwind distance from the source

(position letters are replaced by actual downwind distance from source in meters) fitted to a

power law model. In literature, along-wind spore dispersion within and above the canopy has on

some occasions [147, 179, 180] been found to follow a power decay law of the form $ax^{-d}$, where $x$ is the distance from the source and $a$ and $d$ are constants. Figures 3.18 and 3.19 show that the

dispersion of Sclerotinia sclerotium spores in OSR follows such a model. Expectedly, spore

concentration below the canopy decays much faster (𝑑 = 1.65, 1.47 and 1.3 for the first three

days) owing to the loss of spores by sedimentation and deposition on leaves. Roper et al. [26]

have found that, due to the cooperative action of fungal spores to maximise dispersion, many

Sclerotinia sclerotium spores are sacrificed very close to the source for the greater objective of

travelling farther. This explains the reason behind the heavy deposition of Sclerotinia sclerotium

spores near the source, which have been reported to be as high as 90% [181].

Once spores escape the canopy, the rate of decay slows (see Figure 3.19) because of low

deposition and exposure to a uniform Gaussian-like wind profile resulting in greater turbulence

and mixing (dilution). The much lower decay rates (𝑑 = 0.65, 0.44 and 0.76 for the first three days respectively) give an indication of this.
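Such a power law is conveniently identified by linear regression in log-log space. The MATLAB sketch below applies this to the Day 1 below-canopy values from section 3.4.3 and recovers the exponent shown in Figure 3.18 (d ≈ 1.65); it is a sketch of the fitting procedure, not the exact software used to produce the figures.

    % Sketch: fitting C(x) = a*x^(-d) by regression in log-log space,
    % using Day 1 below-canopy spore DNA (positions B, C, D and I).
    x = [1 7 14 28];                 % downwind distance from source centre (m)
    C = [26775 663 278 122];         % spore DNA (pg), Day 1, 0.8 m height
    p = polyfit(log(x), log(C), 1);  % log C = log(a) - d*log(x)
    d = -p(1);                       % decay exponent, approx. 1.65 (Figure 3.18)
    a = exp(p(2));                   % coefficient, approx. 22900
    fprintf('a = %.0f, d = %.2f\n', a, d);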


Figure 3.18: Spore DNA below the canopy plotted with distance from centre of spore ring for

first three days of sampling. Data is fitted to an inverse power law with coefficients, exponents

and 𝑅2 as shown.

[Figure 3.18 plot: spore DNA (pg) against distance from centre of spore source (m); fitted power laws: Day 1, y = 22927x^-1.647 (R² = 0.9878); Day 2, y = 13831x^-1.47 (R² = 0.9937); Day 3, y = 5412.1x^-1.297 (R² = 0.9652).]

Figure 3.19: Spore DNA above the canopy plotted with distance from centre of spore ring for first three days of sampling. Data is fitted to an inverse power law with coefficients, exponents and R² as shown. [Plot fits: Day 1, y = 1278x^-0.647 (R² = 0.8941); Day 2, y = 786.58x^-0.44 (R² = 0.6008); Day 3, y = 1428.3x^-0.762 (R² = 0.9117).]

3.5 Discussion

The most significant findings of this chapter are discussed in this section. First,

the reliability of the prototype biosensor is analysed based on the performance of proven

detection and quantification methods (colorimetric detection and qPCR). Then an analysis of

the dispersion of naturally released Sclerotinia spores is given based on the qPCR quantified

data. This is followed by a discussion on the experimental value of the data generated in this

field trial and the limitation of the experiment. Section 3.5.1 is considered as new knowledge

while 3.5.2 and 3.5.3 are considered as new contributions to existing knowledge.

3.5.1 Reliability of the Prototype biosensor in measuring oxalic acid

In this work, the performance of a Sclerotinia spore biosensor for real-time deployment has

been evaluated for the first time. The biosensor utilizes an enzymatic bioreceptor [13] to

target Sclerotinia spores and the quantitative output is measured by electrochemical

transduction [171] [166].

The results show that the biosensor was unable to make positive detections on 2 of the 3

days it was tested, even though optimal conditions for release of ascospores were present

throughout [16] [19]. To accommodate the reality that the relationship between oxalic acid

concentration and spore numbers is not well understood [177], a colorimetric detection test

of the same sample, testing the same analyte (oxalic acid) was used to validate the

biosensors. The colorimetric tests showed that oxalic acid was indeed produced by the


spores on all days, although they were low (all but one measurement were < 100 𝜇𝑚𝑜𝑙𝐿−1

and only 4 were > 50 μmolL−1). The calibration curve and the biosensor’s detection limit (63 μmolL−1) have shown the biosensor to be unable to reliably measure concentrations below this threshold. Even among concentrations higher than the sensor’s LOD, the biosensor recorded false negative values. This suggests that the LOQ may have been set too low at 1 × LOD, although this is difficult to confirm without considerably more data.

The calibration curve also showed a high degree of variation in the biosensor measurement,

as a change of 0.51 μA in current (6.24 to 6.75 μA) resulted in a two-fold increase in the estimated concentration of oxalic acid (115.89 to 257.56 μmolL−1). This suggests high variability and

affects the ability of the biosensor to meet a key standard requirement of reliable sensors

[171, 182].

The high LOB of 6.05𝜇𝐴 (compared to the maximum current corresponding to the upper limit

of the detection range – 11.22𝜇A) represents the background noise of the biosensor, which

is a combination of biosensor error and electronic noise [183]. For a lot of biosensors, the

contribution of biosensor error to total error is the main challenge in reducing background

errors and variability [184]. In this case, this error is the result of the oxidation of ferrocyanide

ions by acid buffer applied to the sensors during manufacture [172]. As a result of this high

LOB, the resultant LOD – approximately 4% of the biosensor’s linear range – is also high and

significantly inferior to reported values for enzymatic electrochemical biosensors, with a

majority routinely detecting well below 1% of their ranges [185]. It is therefore recommended

that as a minimum, the biosensor chip should be improved to reduce the ambient noise in

the device by inhibiting the oxidation of ferrocyanide ions by the acid buffer already present in the

sensors. Interestingly, the biosensor electrochemistry manual [172] reports a lower

background noise, although this was based on fewer tests. The difference between the values

reported in the SYield report and this work is possibly due to batch-to-batch variation of

biosensor performance.

Deployment of this sensor raises potential areas of concern. So far, only the biosensor’s ability

to directly measure oxalic acid has been discussed. During real-world deployment the

biosensor will have to confront the challenges introduced in the stage before oxalic acid

production, specifically the period between collection, incubation, reaction and

commencement of oxalic acid synthesis. One such challenge is related to the

mischaracterisation of Sclerotinia spores during sampling. This can happen when sampled

data is contaminated by other fungi or chemicals which have the ability to impersonate

Sclerotinia [63] or suppress its ability to produce oxalic acid [186]. While enzymatic biosensors

are not as selective as DNA-based biosensors, they are acceptably selective for most species


[182] [13]. The constant presence of contaminants highlights one of the challenges of real

life sampled data and is responsible for why most biosensors show a performance

deterioration when deployed in an uncontrolled non-laboratory setting [185] [182]. These

mischaracterisations can result in either underestimation or overestimation of Sclerotinia risk.

It was not possible to test the impact of this source of error because both methods of testing

targeted the same analyte (oxalic acid). Nevertheless, the existence of OA-producing fungi in

real life spore samples makes underestimation or overestimation of Sclerotinia risk a potential

concern for the biosensor, particularly because the biosensor isolates at the analyte level

(oxalic acid) not at the spore level.

Table 3.4: Spore DNA converted to spore numbers using 0.35pg per single spore determined by

Rogers et al. [153].

Position   Height (m)   Spore No. Day 1   Spore No. Day 2   Spore No. Day 3

A 0.8 208 184 45

A 1.6 11 25 48

B 0.8 76500 43558 16039

B 1.6 4405 1662 4476

B 2.4 1688 1662 792

B 3.2 432 643 60

C 0.8 1893 1776 893

C 1.6 663 1785 609

D 0.8 794 829 815

D 1.6 657 857 813

D 2.4 372 668 388

D 3.2 371 18 452

E 0.8 585 613 362

E 1.6 562 1157 500

F 0.8 296 1030 224

F 1.6 282 1145 624

G 0.8 266 368 95

G 1.6 874 444 124

H 0.8 260 298 172

H 1.6 458 366 118

I 0.8 349 335 170

I 1.6 552 307 301


Another area of concern for field deployment is the ability of spores to produce detectable

quantities of oxalic acid. A comparison of oxalic acid measured by colorimetric detection and

spore numbers (Table 3.4) shows that the oxalic acid concentrations measured were low

compared to spore numbers at the same position. The average value for the closest sampling

position below the canopy downwind of the source, which should have a relatively high

number of spores due to near-source deposition [26, 154], is 28.75𝜇𝑀. This value is low

considering 50 spores have been reported to produce as much as 500µM (at a pH of 5.9) of

oxalic acid under similar incubation conditions [177]. This discrepancy is possibly because

high spore numbers are uncorrelated with high oxalic acid production, which is instead correlated with biomass growth in fungal cultures [177, 187]. Results obtained by Heard

[177] from investigations into the detection of fungal pathogens showed that low (50),

medium (291) and high (2300) numbers of spores produced comparable levels (between 300

and 780𝜇𝑀) of oxalic acid at pH ranging from 5 to 5.9 after incubation for 4 days in Sabouraud

growth media. Heard [177] observed that there was no noticeable variation in biomass growth

between the three doses of spores. Based on this, it has been suggested that spore numbers

only influence the onset of acid formation [177]. These indicate that laboratory

determinations of expected oxalic acid production may not translate to field deployments.

Based on all of the above, it is very likely that the biosensor will produce false negative results

which can be more devastating than false positives to farmers/growers since the effects of

the former are often irreversible. It is therefore recommended that the following be done before

field deployment:

- An improvement of the biosensor chip to increase sensitivity to oxalic acid by reducing background noise.

- More investigation into the realistic concentrations of oxalic acid to be expected from non-laboratory-isolated airborne spores. Sampling of spores should include Sclerotinia isolates from a larger area of the UK, as opposed to local or single-field sampling.

- At least one more field trial in which the full biosensor unit (comprising sampling mechanism, incubation chamber, biosensor and instrumentation devices) is deployed, to get more data for the assessment of the unit's selectivity. In particular, the effect of masquerading fungi and oxalic acid-inhibiting microbes on measurement should be investigated.

Much of the growth of biosensing as a field is owed to the success of biosensors in the medical

health sector [185] [188] [189] [190]. Now aided by new developments in identification and

quantification of airborne inoculum [13], these devices show increasing suitability for

agricultural and environmental applications [190-192] with the potential to achieve higher

accuracies and selectivity [193] [171].


With the growth of mechanised farming, agricultural applications require large-scale data collection, which in turn requires numerous sampling points over large networks. Key

requirements for biosensors used in this field are operational reliability, speed of

measurement, ease of use and setup, and inexpensiveness per test [182, 194]. At this stage

in the technological development of biosensors, engineering the highest accuracy

bioreception methods (such as the DNA-based qPCR) into rapid-testing, automatic, easy-to-use

and continuous biosensors for the purposes of large scale deployment is currently intractable

[13] and economically prohibitive for most applications [182]. A trade-off is therefore

necessary. Enzymatic biosensors, such as the glucose sensor [195] (after numerous

iterations) have been able to achieve this trade-off perfectly [184] [185] [182]. Unfortunately,

as this study indicates, a solid understanding of the biology of the analyte, its synthesis and

likely sources of contamination is necessary in order to produce acceptably reliable enzymatic

biosensors for environmental applications.

3.5.2 Sclerotinia sclerotium spore dispersion

Due to the superior accuracy and reliability of the qPCR measurement, all spore dispersion

analysis was based on spore DNA data. Spore numbers could have been converted to actual
concentration by dividing equivalent spore numbers by the volume of air sampled (38 L/min) and
the duration of sampling (5 hrs) to obtain spores/m³. This was deemed unnecessary since the

sampling rate, through the use of one type of sampler, and duration were maintained

throughout the experiment. For the same reason, spore DNA was also not scaled by the

efficiency of the Rotorod sampler. As such, the relative difference between samples was

preserved.

The results show that the effect of canopy filtration decreases with distance from the source.

At 7m downwind of the source, the average gradient over the sampling period was 4.8 𝑝𝑔/𝑚

compared to 0.53𝑝𝑔/𝑚 at 14m downwind. Near the source, due to vertical ejection of spores

and sacrificial deposition of spores to maximise travel distance [26], much fewer spores

escape the canopy. At longer distances (14m), due to the dominant effect of eddy diffusion

over turbulent transport [154], the spores have sufficiently mixed into a plume and

comparable concentrations were measured at all heights, with below canopy (0.8m) and just

above canopy (1.6m) only differing by 4.5% over 3 days. The significance of this result relates
to the effect of turbulence on the representativeness of air-sampled data. Deploying sampling

equipment where the spore plume is sufficiently mixed gives them the highest chance of

success [29]. The results obtained then suggest that for an OSR canopy of 1m height under

similar conditions, sampling at a distance of approximately 14m from the source is optimal.

This warrants more investigation to determine the effect of different source sizes and


strengths on the extent (from the source) of turbulent motion in OSR fields.

The data obtained, as shown in section 3.4.3, is consistent with the 1-D monotonic depletion

of spores that has been associated with fungal spores in literature [66]. Spore DNA was

shown to follow an inverse power law ($ax^{-d}$) in and above the OSR canopy. This is in

agreement with findings in literature where models based on source-depletion equations have

been proposed to describe concentration gradients in and through crop canopies [3, 147,

162]. These models take the form of an exponential decay equation [33], a power decay
equation [196], or an additive combination of the two [197]. As Fitt et al. [147] note, the key
difference between the inverse power law fitted to the data in this study and the exponential
decay law is that the former does not assume a constant length scale (a proportional decrease

in concentration over equal distances). A varying length scale, as assumed by the power law

and, by extension, the data collected in this work, is a characteristic feature of turbulence

that is consistent with canopy near-field flow [161].
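As an illustration of fitting this law, a minimal sketch using non-linear least squares is given below; the distances and DNA values are hypothetical placeholders, not the measured data:

```python
# Minimal sketch: fitting the inverse power law C = a*x**(-d) to a spore
# DNA gradient. The distance/DNA values below are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(x, a, d):
    """Source-depletion gradient of the form a * x^(-d)."""
    return a * x ** (-d)

distance_m = np.array([1.0, 7.0, 14.0, 21.0])      # downwind distance (m)
spore_dna_pg = np.array([120.0, 30.0, 11.0, 6.0])  # sampled DNA (pg)

(a_hat, d_hat), _ = curve_fit(inverse_power_law, distance_m, spore_dna_pg,
                              p0=(100.0, 1.0))
print(f"fitted a = {a_hat:.1f}, d = {d_hat:.2f}")
```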

Figure 3.20: Kernel Density Estimation of spore DNA distribution below (left) and above

(right) the canopy.

Analysing the spatial distribution of spore DNA below and above the OSR canopy reveals a

key characteristic of in-canopy dispersion as shown in Figure 3.20. In this figure, the Kernel

Density Estimate [112] of spore distribution below the canopy is shown alongside that of


spores above the canopy for all days. Below the canopy (Figure 3.20a, b, c), a sharp peak in
the distribution, implying excess/positive kurtosis, is visible. Kurtosis is a fourth-moment
descriptive statistic that characterises the peakedness of a distribution and the weight of its
tails relative to a Gaussian [198, 199]. Excess kurtosis is a characteristic feature of in-
canopy dispersion due to heavy deposition of spores near the source as a result of low average
wind speeds and canopy filtering [49]. This distribution holds for all sampling days as shown.

Expectedly, above the canopy (Figure 3.20d, e, f), the shapes of the curves tend toward

uniform distributions due to increased mixing of spores as a result of stronger winds and

greater turbulence. This is supported in the literature. The bimodal behaviour exhibited on Days

1 and 3 (Figure 3.20d and 3.20f) represents the dichotomy between two processes occurring

at different turbulent length scales – near the source where spore depletion occurs at a high

rate, and further from the source where spores are mixed and concentration does not decrease

as rapidly with distance.

On the second day, the distribution is closer to uniform and unimodal. This distribution
represents an increased role of eddy diffusion in the dispersion process, which results in
greater mixing of spores and more uniform concentration levels with downwind
distance - at positions B, C and D as shown earlier (see section 3.4.3, Figure 3.13, Day 2).

The mixing of spores is so central to Gaussian distribution that a necessary assumption for

applying Gaussian Plume Models (GPMs) to spore transport is the “well-mixed condition” [54,

87].
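As a minimal sketch of this analysis, assuming SciPy's Gaussian KDE as the estimator (the thesis cites [112] for the method) and illustrative sample values:

```python
# Minimal sketch: Kernel Density Estimation and kurtosis of spore DNA
# samples, in the spirit of Figure 3.20. Values are illustrative.
import numpy as np
from scipy.stats import gaussian_kde, kurtosis

below_canopy_pg = np.array([120.0, 85.0, 40.0, 15.0, 9.0, 6.0, 4.0, 3.0])

kde = gaussian_kde(below_canopy_pg)              # Gaussian kernel
grid = np.linspace(0.0, below_canopy_pg.max(), 200)
density = kde(grid)                              # estimated density curve

# Positive excess kurtosis corresponds to the sharp below-canopy peak.
print(f"excess kurtosis = {kurtosis(below_canopy_pg):.2f}")
```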


Figure 3.21: Dispersion contours of spore concentration below (left) and above (right) the

canopy.

In Figure 3.21, the dispersion of spores below (a, b, c) and above (d, e, f) the OSR canopy

is shown. Below the canopy, there is almost no dispersion beyond 5m in all directions. These

omnidirectional contours are consistent with very low wind speeds where wind and turbulence

contribute very little to spore movement [38, 147]. The reason for this is the vertical
inhomogeneity of canopy turbulence. Kaimal and Finnigan [38] and Reynolds [49]

have earlier demonstrated this by plotting shear stress (𝑢′𝑤′/𝑢∗2, where 𝑢′ and 𝑤′ are

horizontal and vertical turbulent velocity components, and 𝑢∗ is the friction velocity) for

canopy data of varying Leaf Area Indices (LAI). The profile showed that momentum entering

dense canopies is absorbed from the top at a rate that ensures minimal transmission of shear

stress (turbulence) to the ground. This negligible ground-level turbulence contributes to why

the overwhelming majority of spores released from ground-level sources, such as Sclerotinia
sclerotium, are deposited near the source. In such a case, the shape of the contour is

almost entirely determined by sedimentation and vertical deposition on leaves and other foliar

elements as supported by this work and depicted in figure 3.21a, b, c.

The effect of wind is more pronounced above the canopy (Figure 3.21d, e, f) as some spores

are transported up to at least 20m downwind on all days. Little lateral dispersion is observed


on Days 1 and 3 (Figure 3.21d and f). On Day 2 (see Figure 3.21e), more spores are

transported further in both the downwind and crosswind directions by diffusion as indicated

by the depletion of peak concentration compared to other days (1.4 for Days 1 and 3, 0.6 for
Day 2). This depletion is due to more severe dilution of spores by higher turbulence and the

commencement of diffusive transport [154].

These are in agreement with experimental findings that in-canopy transport exhibits near-
field or stochastic flow characteristics [48, 62, 161]. These findings suggest that the dispersion
of naturally released Sclerotinia exhibits similar general characteristics inside and in the

neighbourhood of a canopy as other fungal spores [62]. This is expected as the main effect

of canopy on flow is the dissipation of turbulent kinetic energy (TKE) [161] [36, 200-202].

However, the rate of this dissipation is a function of canopy type, spore release mechanism

and spore size [149], and there may be variations from case to case. Therefore, each

canopy/spore/release combination warrants an investigation in order to fully understand its

dispersal mechanisms. Studies like these will provide an opportunity to investigate the more

nuanced characteristics that differentiate spore dispersion from one canopy to another.

3.5.3 Experimental Value of Spore Data

The data generated in this work has experimental utility. Although Sclerotinia

falls within the class of actively released fungal spores [63, 181], there is no general

consensus on how it is released. Varying release rates, release conditions and release speeds

have been reported in literature [19] [26] [203] [204]. One of the complexities of Sclerotinia

release mechanism is highlighted by Roper et al. [26], who have reported that Sclerotinia can

be deliberately sacrificial of spores near the source in order to maximise travel distances.

Further, the dispersion medium (type of canopy and density) affects dispersion in a way that

is unique for each canopy, with microscale flow heterogeneity varying from canopy to canopy

[205]. As a result, generic conclusions drawn from specific experiments, while similar in terms

of large scale diffusive dispersion [154] [53], may not be representative of every unique case

in the near-field [161] [48], since specific canopy features affect the dissipation of turbulent

kinetic energy at different rates [62, 161]. In addition to affecting turbulence, the canopy

also influences the rate of deposition, which has to be accounted for in every dispersion

analysis/modelling [57, 206]. Wilson [48] notes that deposition may affect modelling results

more than turbulent characterisation. The canopy profiles needed to improve the accuracy of

turbulent calculations are not readily available for crops other than maize (corn) [207] and

wheat [208]. Consequently, the continued understanding of canopy effects on dispersion and

accuracy of models relies on continued availability of good experimental data.


The dearth of naturally released spore data also makes the data in this study important. Most

of the reliable data into dispersal of Sclerotinia are dated and motivated by disease gradients

rather than spore gradients [17, 66, 104]. While disease and airborne spore concentration

may appear correlated, spore dispersal cannot be adequately determined without

distinguishing the contributions to disease of each stage of the aerobiological pathway [148].

In the instances where experiments are carried out to investigate spore dispersion [162] [55]

[56] [57] [21] [196] [209], artificial releases of inoculum above the ground are used. In the

few instances where they are naturally released [55], sampling is made along vertical profiles

or at downwind distances without sufficient spatial variation.

To the author’s knowledge, this is the first time an attempt has been made to experimentally

describe the three-dimensional dispersion of naturally released Sclerotinia sclerotium spores

in an OSR field. Along with the turbulence data recorded, this data can be used to test and

evaluate in OSR numerous turbulence modelling approaches that have already been tested

in other canopies [210, 211] [60] [212] [62, 161].

Based on this, the data generated in this work is considered a contribution to knowledge,
specifically in the understanding of Sclerotinia spore dispersion in OSR fields.

3.5.4 Limitations

There are a number of limitations to this experiment that, if addressed, could improve data
quality. The first is the scale of the experiment. A larger sampling area would
provide more data and higher confidence in the results. The main consideration

in this study that mitigated this limitation is that various findings have shown that spores from
local sources of Sclerotinia do not generally travel more than 100 metres in significant detectable
quantities [29, 44, 154] [30, 42]. Therefore, the significant increase in manual labour and

expenditure necessary to address this limitation may not always be rewarded by richer data.

The second limitation concerns the state of the art in spore trap technology. The performance

of spore traps can be highly variable based on location, type, size of particle, location of

source and length of sampling period [10] [11, 13]. With respect to testing oxalic acid, the

effect of this is minimal, as both biosensor testing and colourimetric detection were done with

data sampled and processed under identical conditions, hence eliminating the effect of any

bias introduced by sampling. With respect to actual spore quantities and dispersion, however,
this limitation cannot be eliminated until standards for deployment and spore trap technology
have improved [148] [10]; it can only be mitigated. As a mitigation measure, care was taken in this study

to optimally locate and choose the right sampling equipment as detailed in section 3.3.1.2.


Another identified limitation of this analysis relates to the DNA quantification technique on which
spore number estimates were based. qPCR, while very sensitive and selective to Sclerotinia,
cannot determine spore viability [63]. Only viable spores can contribute to oxalic acid
production [107] and, as such, spore numbers may be unrepresentative of oxalic acid quantity.

However, the effect of this is considered negligible given the large disparity between spore

count measured in the field and oxalic acid concentration measured.

3.6 Conclusion

In this chapter, details of the design and implementation of a field trial experiment to collect

spatial data for the dispersion of Sclerotinia sclerotium spores were described. Three methods

of quantification, namely electrochemical measurement of oxalic acid with a prototype

biosensor, direct measurement of oxalic acid by colourimetric analysis and quantification of

spore DNA by qPCR analysis were used to infer the concentration of collected spores.

Calibration of the biosensor with pure and known concentrations of oxalic acid showed a low

signal-to-noise ratio that was associated with a high ambient noise due to the inherent activity

of ferrocyanide ions in the biosensor. Consequently, only two positions on the field recorded

concentrations (Day 1) that were above the threshold noise (LOD) of the biosensor. The

direct measurement of oxalic acid using colourimetric analysis offered improvements in

sensitivity and detected and quantified oxalic acid at many more positions on the field for all

sampling days.

The qPCR measurements offered the highest sensitivity to spore concentration. The number

of spores determined from the qPCR data showed that the oxalic acid concentrations

measured by colourimetric analysis were lower than expected in comparison to amounts

measured by Heard [177] in a laboratory test with low (50), medium (291) and high (2300)

number of spores. The difference between oxalic acid concentrations measured for pure

Sclerotinia sclerotium spores in Heard’s laboratory tests and those measured in this work may

be due to the presence of other types of spores/pathogens in real air-sampled data that could

act as contaminants and suppressants of oxalic acid production, the arbitrary effect of spore

numbers on oxalic acid quantity or a combination of both.

The biosensor was shown to have a detection limit representing 4% of its measurement range

which is much higher than reported for enzymatic biosensors in literature. As stated in the

preceding paragraph, it is suspected that field-sampled spores may not consistently produce

oxalic acid above this detection threshold. This suggests that, as currently built, the biosensor

is unlikely to reliably detect Sclerotinia sclerotium spores when deployed.


The more reliable qPCR data showed dispersion to be consistent with dispersion patterns

investigated for other types of spores and pollen in crop canopies. Above and below canopy

dispersions followed an inverse power law with different rates of source depletion. Dispersion

below the canopy was lower due to low average horizontal wind speeds inside the canopy

(resulting in higher sedimentation) and the filtering effect of the OSR canopy. Above the

canopy, more mixing was observed and the distribution of spores was similar to a Gaussian

process as a result of higher mixing of spores and longer, more stable turbulent length scales.

To the author’s knowledge, this is the first time an attempt has been made to experimentally

describe the three-dimensional dispersion of naturally released Sclerotinia sclerotium spores

in an OSR field. Others [196, 213] have investigated vertical escape and gradients of other

types of spores in other canopies. Since spore dispersion in crops is canopy and spore-type

dependent, the data generated from this study has experimental value. The next chapter

describes the novel application of a Lagrangian Stochastic (LS) model to this data in order to

describe spore dispersion in OSR canopies.


Chapter 4 A backward Lagrangian Stochastic (bLS) model

for the dispersion of Sclerotinia sclerotium spores

4.1 Introduction

In Chapter 3, details of a field trial experiment for the release, collection and quantification

of Sclerotinia spores were provided. This chapter describes the backward Lagrangian

Stochastic (bLS) model and its adaptation and application to dispersion of the relatively

heavier Sclerotinia sclerotium spores in the presence of a disruptive (on deposition and

turbulence) oil seed rape canopy. This algorithm was originally developed to describe tracer

transport.

The aim of this chapter is to demonstrate that the bLS model can describe the transport of
spores released from a ground-level source in an OSR canopy sufficiently well to justify its
further employment in calculating spore spatiotemporal concentration. This sufficiency is
defined as the ability of the model to estimate concentrations of spores at sensor locations
above canopy height when the spores leave a ground-level, in-canopy source.

In applying the model, meteorological observations from the field trial experiment discussed

in Chapter 3 were used to characterise the turbulence during spore collection and the

measured spore concentrations were used to evaluate the model performance.

The chapter begins by explaining the reasons influencing the choice of the trajectory model.

This is followed by a description of Monin-Obukhov Similarity Theory (MOST). MOST is the

fundamental theory supporting the validity of turbulence parameterisations used in bLS. The

forward Lagrangian Stochastic (LS) model is then introduced, after which bLS and its

adaptation to the dispersion of Sclerotinia spores in OSR are described.


4.2 Motivation for Trajectory Modelling Approach

The data described in chapter 3 provided an insight into the random and non-Gaussian nature

of in-canopy dispersion. Within a field, before spores escape the canopy and constitute a long-
distance threat to crops, spore transport cannot be adequately described by Gaussian Plume

Models (GPMs). In addition to the immediate source vicinity being non-diffusive [53], wind

speeds and turbulent forces are very low. Conditions are further complicated by the

disruptiveness of the canopy [36], which makes treating spore dispersion as a plume rather

than as a stochastic process unrealistic [49]. Trajectory models, which describe stochastic

processes, are therefore more suitable in the near field [53, 62]. The accuracy and superiority

of these models over GPMs in describing dispersion within crop canopies for ground-level

sources and measurement distances close to the source (less than 100m) have been demonstrated

and reported extensively in literature [53] [51, 54, 147, 162]. Trajectory models offer an

advantage over GPMs by enabling the tracking of individual particles from release to

deposition and should intuitively provide more accuracy in disruptive wind flows. Within the

family of trajectory models, Lagrangian Stochastic models are attractive because they are

free of theoretical constraints [48] and describe particle movement in the most natural

manner, with particles being described by their actual speeds as opposed to Eulerian fixed-

point velocities. This fundamental approach makes modifications to account for particle

physical features and canopy disruptions relatively easier [48]. Further, using the concept of

backward-time Lagrangian models, spore trajectories and therefore concentration can be

estimated without detailed source information [214]. This is beneficial for local spore

concentration estimation applications, where source locations are usually not known [11].

Another influencing factor for this choice of model is the logistical expense and technological

limitations of measuring and quantifying spore data. As mentioned in chapter 1, the

unavailability of real-time, fast-acting sensors limits the number and scale over which samples

can be collected. This eliminates the possibility of using empirical models. Trajectory models,

especially bLS, only require turbulent statistics, physical/aerodynamic properties of spores

and a description of the terrain over which spore concentration is to be determined. Validating

the models then requires comparably (to empirical models) much less data.

4.3 Background Theory

4.3.1 Lagrangian Stochastic Models

Lagrangian Stochastic (LS) models [87] describe the movement of particles in turbulent flow

by generating all possible trajectories of all particles from a given reference at any time.

These trajectories are computed instantaneously as the particles move with the ‘true’


Lagrangian velocities they experience as opposed to fixed-point velocities used in Eulerian

models. Each moving particle is subjected to turbulent forces as it travels through a medium,
and the corresponding velocity fluctuations and displacements experienced are described

by the Langevin equation [86]:

𝑑𝑢 = 𝑎𝑑𝑡 + 𝑏𝜉 [4.1]

𝑑𝑥 = 𝑢𝑑𝑡. [4.2]

where 𝑢 and 𝑥 are the horizontal velocity and position of the particle, 𝑎 and 𝑏 are the

coefficients representing the deterministic and random processes of the stochastic process,

and 𝜉 denotes random numbers drawn from a Gaussian distribution of zero mean and
variance equal to the process time-step, 𝑑𝑡.

Equations 4.1 & 4.2 are based on the stochastic differential equation (SDE) arising from the

assumption that the state of each particle, its position and speed (𝑋, 𝑈), jointly evolves as a

Markov process [54]. The first term of Eq. 4.1 describes large-scale (drift) properties that

determine the nature of flow, thus representing the deterministic value of the speed

fluctuations. The second term describes small-scale (diffusive) properties of turbulent flow,

thus representing random fluctuations in average speed. For three-dimensional flow along 𝑥

(downwind), 𝑦 (cross-wind) and 𝑧 (vertical) directions corresponding to velocity

components 𝑢, 𝑣, and 𝑤, the generalised 3D Langevin equation is given by:

𝑑𝒖 = 𝒂𝑑𝑡 + 𝒃 𝝃

𝑑𝒙 = 𝒖𝑑𝑡 [4.3]

where:
𝒖 = (𝑢1, 𝑢2, 𝑢3) = (𝑢, 𝑣, 𝑤);
𝒙 = (𝑥1, 𝑥2, 𝑥3) = (𝑥, 𝑦, 𝑧);
𝒂 = (𝑎𝑢1, 𝑎𝑢2, 𝑎𝑢3) = (𝑎𝑢, 𝑎𝑣, 𝑎𝑤);
𝒃 = (𝑏𝑢1, 𝑏𝑢2, 𝑏𝑢3) = (𝑏𝑢, 𝑏𝑣, 𝑏𝑤); and
𝝃 = (𝜉𝑢, 𝜉𝑣, 𝜉𝑤).

The drift term, 𝒂 (𝒙, 𝒖, 𝒕), and diffusion term, 𝒃 (𝒙, 𝒖, 𝒕), can be determined by enforcing the

well-mixed condition [87], which means that in a bounded region, once the distribution of

particles becomes well-mixed in space, it is expected to remain well-mixed for all future times

[215]. The well-mixed condition only accommodates vertical inhomogeneity, the formulation

is not unique for multidimensional models where Gaussian turbulence cannot be assumed for

horizontal flow [150, 210, 211]. To address this, other schemes that satisfy the well-mixed

condition have been proposed but these have not been proven to be preferable to Thomson’s

[216]. For this study, Thomson’s model is suitable because of the assumptions of a stationary

boundary layer.


Given 𝑈, 𝑉 and 𝑊 as the average Eulerian (measurable) velocities in the along-wind,
crosswind and vertical directions respectively, velocity fluctuations 𝑢 − 𝑈, 𝑣 − 𝑉 and 𝑤 − 𝑊,
henceforth denoted by 𝑢′, 𝑣′ and 𝑤′, the corresponding velocity variances 𝜎𝑢², 𝜎𝑣² and 𝜎𝑤²,
and covariances of velocity fluctuations ⟨𝑢′𝑣′⟩, ⟨𝑢′𝑤′⟩ and ⟨𝑣′𝑤′⟩, the drift
and diffusion terms (𝒂 and 𝒃) for a three-dimensional Langevin equation can be derived.
Assuming 𝑊 = 0 for a stationary atmosphere and 𝑉 = ⟨𝑢′𝑣′⟩ = ⟨𝑣′𝑤′⟩ = 0 for a sampling grid
aligned to the wind direction (see section 3.3.1.2), Thomson's solution for the 3D coefficients
of the Langevin equation reduces to its simplest form [150, 217]:

$$a_u = \frac{b_u^2}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 u' + u_*^2 w'\right] + \frac{1}{2}\frac{\partial \langle u'w' \rangle}{\partial z} + w'\frac{\partial U}{\partial z} + \frac{1}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 \frac{\partial \sigma_u^2}{\partial z} u'w' + u_*^2 \frac{\partial \sigma_u^2}{\partial z} w'^2 + u_*^2 \frac{\partial \langle u'w' \rangle}{\partial z} u'w' + \sigma_u^2 \frac{\partial \langle u'w' \rangle}{\partial z} w'^2\right] \quad [4.4]$$

$$a_v = \frac{1}{2} b_v^2 \frac{v}{\sigma_v^2} \quad [4.5]$$

$$a_w = \frac{b_w^2}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[u_*^2 u' + \sigma_u^2 w'\right] + \frac{1}{2}\frac{\partial \sigma_w^2}{\partial z} + \frac{1}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 \frac{\partial \langle u'w' \rangle}{\partial z} u'w' + u_*^2 \frac{\partial \langle u'w' \rangle}{\partial z} w'^2 - u_*^2 \frac{\partial \sigma_w^2}{\partial z} u'w' + \sigma_u^2 \frac{\partial \sigma_w^2}{\partial z} w'^2\right] \quad [4.6]$$

$$b_u = b_v = b_w = \sqrt{C_0 \varepsilon} = \sqrt{\frac{2\sigma_w^2}{T_L}} \quad [4.7]$$

where the friction velocity, 𝑢∗, is related to the velocity fluctuation covariance as
⟨𝑢′𝑤′⟩ = −𝑢∗². 𝐶₀ is the Kolmogorov constant, 𝜀 is the turbulent kinetic energy dissipation
rate, and 𝑇𝐿 is the Lagrangian (or, alternatively, decorrelation) timescale, which describes the
persistence or memory of turbulent motion.
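For illustration, a minimal sketch of a single forward Langevin step is given below for the simplest case of stationary, homogeneous turbulence, where the drift reduces to $-w/T_L$ and the diffusion term follows Eq. 4.7; all parameter values are placeholders:

```python
# Minimal sketch: one step of the 1-D vertical Langevin equation
# (Eqs. 4.1-4.2) for homogeneous turbulence, where a = -w/T_L and
# b = sqrt(2*sigma_w**2 / T_L). Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def langevin_step(w, z, sigma_w, T_L, dt):
    """Advance vertical velocity w and height z by one time step dt."""
    a = -w / T_L                              # deterministic (drift) term
    b = np.sqrt(2.0 * sigma_w ** 2 / T_L)     # random (diffusion) term
    xi = rng.normal(0.0, np.sqrt(dt))         # Gaussian increment, var = dt
    w_new = w + a * dt + b * xi
    z_new = z + w_new * dt
    return w_new, z_new

w, z = 0.0, 1.6
for _ in range(100):
    w, z = langevin_step(w, z, sigma_w=0.5, T_L=2.0, dt=0.05)
print(w, z)
```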

4.3.2 The Backward Lagrangian Stochastic Model

The backward Lagrangian Stochastic (bLS) model [217] is based on the conventional
Lagrangian Stochastic model [53]. The forward LS model tracks a particle from source (release)
to receptor (deposition), assigning positive values to the velocity vectors. In contrast, bLS

traces a particle’s trajectory back from a receptor to a source. This approach offers flexibility

and ease of use, especially for area sources, by focusing only on trajectories of interest that

originate from specific receptor locations. This simplifies computations by reducing the

overall number of trajectories that must be computed. It also dispenses with the need to pre-

specify source configurations since any shape of source can be accommodated by simply

evaluating particles that land within it. Importantly, these advantages of the bLS model do

not result in a loss of accuracy compared to LS models [214, 218, 219].


Consider a particle originating from point A with time-space coordinates (𝑥, 𝑡), reaching a

receptor B with coordinates (𝑥𝑏, 𝑡𝑏). Tracking the 𝑖-th particle in backward time from B to A
implies that 𝑡𝑏 = −𝑡, and the corresponding position and velocity of this moving particle are
defined as:

$$du_i^b = a^b dt + b^b \xi \quad [4.8]$$

$$u_i^b = \frac{dx_i}{dt'} = -\frac{dx_i}{dt} = -u_i \quad [4.9]$$

The particle thus has a velocity of opposite sign, but the same magnitude, as in the
forward LS implementation. Flesch et al. [217] investigated the effect this would have on the
drift and diffusion terms and discovered that the bLS equivalent of the drift term, 𝒂ᵇ, only
differs from the forward term, 𝒂, by a sign change on its first term, while the diffusion terms 𝒃ᵇ

and 𝒃 are equivalent. Hence, the drift and diffusion terms of the bLS model are given by:

$$a_u^b = -\frac{b_u^2}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 u' + u_*^2 w'\right] + \frac{1}{2}\frac{\partial \langle u'w' \rangle}{\partial z} + w'\frac{\partial U}{\partial z} + \frac{1}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 \frac{\partial \sigma_u^2}{\partial z} u'w' + u_*^2 \frac{\partial \sigma_u^2}{\partial z} w'^2 + u_*^2 \frac{\partial \langle u'w' \rangle}{\partial z} u'w' + \sigma_u^2 \frac{\partial \langle u'w' \rangle}{\partial z} w'^2\right] \quad [4.10]$$

$$a_v^b = -\frac{1}{2} b_v^2 \frac{v}{\sigma_v^2} \quad [4.11]$$

$$a_w^b = -\frac{b_w^2}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[u_*^2 u' + \sigma_u^2 w'\right] + \frac{1}{2}\frac{\partial \sigma_w^2}{\partial z} + \frac{1}{2(\sigma_u^2 \sigma_w^2 - u_*^4)}\left[\sigma_w^2 \frac{\partial \langle u'w' \rangle}{\partial z} u'w' + u_*^2 \frac{\partial \langle u'w' \rangle}{\partial z} w'^2 - u_*^2 \frac{\partial \sigma_w^2}{\partial z} u'w' + \sigma_u^2 \frac{\partial \sigma_w^2}{\partial z} w'^2\right] \quad [4.12]$$

$$b_u^b = b_v^b = b_w^b = \sqrt{C_0 \varepsilon} = \sqrt{\frac{2\sigma_w^2}{T_L}} \quad [4.13]$$

Henceforth, 𝑎𝑢, 𝑎𝑣 and 𝑎𝑤 and 𝑏𝑢, 𝑏𝑣 and 𝑏𝑤 will be used to denote the bLS model coefficients.

4.3.2.1 Calculating Concentration with bLS Models

Determining concentration with LS models is an additional process to computing trajectories.

Concentrations at a receptor are determined as the ensemble-average of particle residence

times within the receptor volume [216], i.e. the time spent by particles in a volume. For bLS

models, where the concentration footprint at the source, 𝑥₀, is required, this represents the

backward residence time, 𝑇𝑏, of tracers passing through an infinitesimal distance, 𝑑𝑧, above

a source with mass density, 𝑆. This is given by [217]:


$$C(x_0) = \frac{S}{N}\sum_{n=1}^{N} T_n^b \quad [4.14]$$

where 𝑁 is the number of particles traced back from the receptor, 𝑥. For an area source
𝑄 (# m⁻² s⁻¹) at an infinitesimal height above ground such that 𝑆 = 𝑄/𝑑𝑧, Flesch et al. [217]
showed that 𝑇ᵇ = 2𝑑𝑧/|𝑤₀|, and Equation 4.14 can be written as:

$$C(x) = \frac{Q}{N}\sum \frac{2}{|w_0|} \quad [4.15]$$

where 𝑤0 is the velocity at “touchdown” of the particles that land within the source. Equation

4.15 enables the estimation of concentration from a catalogue of landing positions and

velocities (𝑥₀, 𝑦₀, 𝑤₀) without specifying the type or configuration of the source a priori.
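A minimal sketch of this calculation is given below: given a catalogue of touchdown positions and velocities, Eq. 4.15 is evaluated over the touchdowns that land inside any chosen source shape. The catalogue values and the square test source are illustrative:

```python
# Minimal sketch of Eq. 4.15: concentration from a bLS touchdown catalogue.
# The catalogue arrays and the square source are illustrative.
import numpy as np

def concentration(x0, y0, w0, in_source, N, Q=1.0):
    """C = (Q/N) * sum(2/|w0|) over touchdowns landing inside the source."""
    mask = in_source(x0, y0)
    return (Q / N) * np.sum(2.0 / np.abs(w0[mask]))

def in_square(x, y, half=0.5):
    """Example source: a 1m x 1m square centred at the origin."""
    return (np.abs(x) <= half) & (np.abs(y) <= half)

N = 150_000
rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 2.0, N)                  # touchdown positions (m)
y0 = rng.normal(0.0, 2.0, N)
w0 = -np.abs(rng.normal(0.3, 0.1, N))         # touchdown velocities (m/s)
print(concentration(x0, y0, w0, in_square, N))
```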

4.3.3 Monin-Obukhov Similarity Theory (MOST)

MOST [38, 152, 220] is a generalised theory of turbulence for the surface layer based on a

nondimensional universal function of 𝑧/𝐿 (𝜑(𝑧 𝐿⁄ )), where 𝐿 is the Obukhov length [38, 220,

221]. The theory assumes that the surface layer is locally isotropic up to heights of about

100-200m [222]. Based on this assumption, MOST stipulates that only three key parameters –
the friction velocity, 𝑢∗, the Obukhov length, 𝐿, and the surface roughness, 𝑧₀ – are required
to adequately describe the vertical variation of horizontal wind speed and turbulence
characteristics in surface layer flow [152]. These are calculated as follows. The friction velocity, 𝑢∗, is

a function of the turbulent fluctuations of velocity components and is given by:

$$u_* = \sqrt{-\langle u'w' \rangle} \quad [4.16]$$

And the Obukhov length, L, a measure of stability, is given by:

$$L = -\frac{u_*^3 T}{k_v g \langle w'T' \rangle} \quad [4.17]$$

where T is the absolute mean temperature, ⟨𝑤′𝑇′⟩ is the temperature flux and the von

Karman constant, 𝑘𝑣, was chosen as 0.40 – the average of the most reliable values reported

for MOST [152].


The roughness length, 𝑧0, was calculated by substituting 𝑢∗ and L into the modified log-wind

profile for canopies as follows:

$$z_0 = \frac{z - d}{\exp\left[\dfrac{U k_v}{u_*} - \varphi(z, z_0, L)\right]} \quad [4.18]$$

where 𝑑 is the displacement or zero-plane height at which the average wind speed reduces

to zero and 𝜑 (𝑧, 𝑧0, 𝐿) is the Monin-Obukhov universal function (stability correction term) for

unstable stratification given by [223]:

$$\varphi = -2\ln\left(\frac{1+\Omega}{2}\right) - \ln\left(\frac{1+\Omega^2}{2}\right) + 2\tan^{-1}(\Omega) - \frac{\pi}{2} \quad [4.19]$$

where

$$\Omega = \left[1 - 15\frac{(z-d)}{L}\right]^{0.25}$$

Turbulence statistics can then be expressed in terms of these parameters. MOST-based

velocity variances for stable/neutral atmosphere are given by [38]:

𝜎𝑢 = 2.5𝑢∗ [4.20]

𝜎𝑣 = 2𝑢∗ [4.21]

𝜎𝑤 = 1.25𝑢∗ [4.22]

And for unstable stratification [150, 224]:

$$\sigma_u = u_* \left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \quad [4.23]$$

$$\sigma_v = 0.8 u_* \left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \quad [4.24]$$

$$\sigma_w = 1.25 u_* \left(1 - 3\frac{z-d}{L}\right)^{1/3} \quad [4.25]$$

And the Lagrangian timescale for unstable stratification is calculated as [225, 226]:

$$T_L = \frac{0.5 z}{\sigma_w}\left(1 - 6\frac{z}{L}\right)^{1/4} \quad [4.26]$$
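A minimal sketch of these calculations is shown below; the covariances, temperature and heights are illustrative inputs, not field values:

```python
# Minimal sketch of Eqs. 4.16-4.25: MOST surface-layer statistics from
# sonic anemometer covariances. Input values are illustrative.
import numpy as np

def most_stats(uw_cov, wT_cov, T, z, d, kv=0.4, g=9.82):
    """Return u*, Obukhov length L and velocity standard deviations."""
    u_star = np.sqrt(-uw_cov)                  # Eq. 4.16, <u'w'> < 0
    L = -u_star ** 3 * T / (kv * g * wT_cov)   # Eq. 4.17
    if L > 0:                                  # stable/neutral, Eqs. 4.20-4.22
        su, sv, sw = 2.5 * u_star, 2.0 * u_star, 1.25 * u_star
    else:                                      # unstable, Eqs. 4.23-4.25
        su = u_star * (4 + 0.6 * (z / -L) ** (2 / 3)) ** 0.5
        sv = 0.8 * u_star * (4 + 0.6 * (z / -L) ** (2 / 3)) ** 0.5
        sw = 1.25 * u_star * (1 - 3 * (z - d) / L) ** (1 / 3)
    return u_star, L, su, sv, sw

print(most_stats(uw_cov=-0.09, wT_cov=0.05, T=293.0, z=1.6, d=0.75))
```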

4.4 Methodology

This section presents the methodology used in this chapter. The subsections discuss the

parametrisation of the bLS model for the OSR canopy, detail the evaluation of the model with

experimental data and then explain the implementation of the discrete bLS model. These

sections represent the author’s work in optimising bLS for the problem in question based on

an integration and adaptation of methodologies and relevant experimental findings across

literature.


4.4.1 Parametrising the bLS Model for Sclerotinia Dispersion

The bLS model presented above is suited for the dispersion of neutrally buoyant, passive

tracer particles under the well-mixed constraint. In addition to input stability parameters, the

model must be optimised for the heavier (than tracers) Sclerotinia spores and for canopy

transport.

4.4.1.1 Calculating Model Statistics

All stability statistics used in this work were based on MOST calculations using the turbulence

measurements obtained from the field trial experiment discussed in section 3.2: U, 𝑢, 𝑣, 𝑤,

𝜎𝑢, 𝜎𝑣, 𝜎𝑤, 𝜎𝑢², 𝜎𝑣², 𝜎𝑤² and ⟨𝑢′𝑤′⟩. Although MOST-calculated statistics can be erroneous in

extreme stability periods [227] [228], the theory is widely considered satisfactory for heights

of at least 29m, where, usually, |𝑧 𝐿⁄ | < 1 [229-231], and for uniform and flat terrains [38,

220, 232] as long as periods of extreme atmospheric (in)stability are avoided [150, 233].

These conditions are similar to those under which the experimental field trial described in

chapter 3 was conducted.

Following Flesch et al.'s [150] recommendation that averaging times over 60 minutes are non-
ideal for MOST-based bLS models, a 60-minute averaging time was adopted, as this gave
the best chance of success in evaluating the model against the air-sampled data of section 3.2.
This is because longer sampling periods improve the reliability of spore traps and the chances of
data collection [10].

As turbulent statistics are flow media dependent [31] [37], the MOST equations presented

(Eq.4.16-4.26) are not valid inside the canopy. Below the canopy, flow is non-Gaussian and

highly vertically inhomogeneous [234, 235] [212, 216]. Above a canopy, the flow is Gaussian,

horizontally and vertically homogenous under isotropic surface layer assumptions, so MOST

derived statistics are directly applicable there [236]. At the canopy-air interface, a rough-
wall boundary, the effect of wind shear on the canopy surface introduces instability in the region
around the canopy. This region, characterised by constant Reynolds stress [149], is known as the
roughness sublayer [31] [62] [38] and can extend to varying heights under different stability
conditions [152]. Therefore, three separate classes of flow need to be considered: above
the canopy, inside the roughness sublayer and inside the canopy.

Inside the Roughness Sublayer (RSL)

For dense canopies with Leaf Area Index (LAI) greater than 1, the roughness sublayer extends

to a height of approximately 2(ℎ − 𝑑) above the displacement height [237]. The displacement

height, 𝑑, itself has been found to be fairly consistent at ~0.75ℎ over a wide range of natural


canopies [38]. In line with these findings, the roughness sublayer was estimated to extend

from 0.75ℎ to 1.25ℎ, making its effective depth 0.5m. At this modest RSL height, concerns
about a high RSL (>3ℎ) degrading MOST performance [152] do not apply.

At the top of the layer of height 𝑧𝑟𝑙 = 1.25h above ground, velocity statistics were taken to

be equal to those above the canopy (Eqs. 4.16 – 4.26). Following Aylor et al. [55], a gradient
of 15% was applied to values at 𝑧𝑟𝑙 to account for the linear decrease in velocity statistics through
the layer down to the canopy height ℎ. Velocity statistics at the top of the canopy for

stable/neutral stratification are then given by:

𝜎𝑢(ℎ) = 2.13𝑢∗ [4.27]

𝜎𝑣(ℎ) = 1.7𝑢∗ [4.28]

𝜎𝑤(ℎ) = 1.1𝑢∗ [4.29]

and for unstable stratification (𝐿 < 0),

$$\sigma_u(h) = 0.85 u_* \left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \quad [4.30]$$

$$\sigma_v(h) = 0.72 u_* \left[4 + 0.6\left(\frac{z}{-L}\right)^{2/3}\right]^{1/2} \quad [4.31]$$

$$\sigma_w(h) = 1.1 u_* \left(1 - 3\frac{z-d}{L}\right)^{1/3} \quad [4.32]$$

The values of 𝜎𝑢, 𝜎𝑣 and 𝜎𝑤 were assumed to decrease linearly through the sublayer [38],

which corresponds to gradients of 0.15𝜎𝑢,𝑣,𝑤(ℎ)/0.25𝑚 for this study. The Lagrangian

timescale, 𝑇𝐿, remains unchanged in the upper half of the roughness sublayer (i.e. 𝑇𝐿 =
𝑇𝐿(ℎ) for 𝑧 > ℎ).

Above The Canopy

Above the roughness sublayer (𝑧 > 1.25ℎ) the surface layer assumption that average wind

speed varies as a diabatically corrected logarithmic wind profile is valid and Equations 4.16 –

4.26 are applicable.

Inside the Canopy

Only one sonic anemometer was available and it was decided that it be deployed above the

canopy. This is primarily because single measurements of turbulence in the surface layer,

where turbulence is homogenous, are representative [236]. This is not the case for below-

canopy measurements due to nonhomogeneous flow and the resulting effect of high


turbulence intensities on flow angles [59]. Consequently, varying canopy turbulence was

calculated based on experimental profiles and the assumption of exponential decay of

turbulence kinetic energy with distance from the canopy top towards the ground [36].

Turbulence profiles were calculated as follows [55]:

$$U = U(h)\exp\left[-\gamma_1\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z \le h \quad [4.33]$$

$$\sigma_u = \sigma_u(h)\exp\left[-\gamma_2\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \quad [4.34]$$

$$\sigma_v = \sigma_v(h)\exp\left[-\gamma_3\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \quad [4.35]$$

$$\sigma_w = \sigma_w(h)\exp\left[-\gamma_4\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \quad [4.36]$$

$$\langle u'w' \rangle = -u_*^2\exp\left[-\gamma_5\left(1 - \frac{z}{h}\right)\right] \quad \text{for } 0 < z < h \quad [4.37]$$

$$T_L = T_L(h) \quad \text{for } 0.25h \le z < h, \text{ and} \quad [4.38]$$

$$T_L = T_L(h)\left[0.1 + 3.6\left(\frac{z}{h}\right)\right] \quad \text{for } 0 < z < 0.25h \quad [4.39]$$

where 𝑈(ℎ), 𝜎𝑢(ℎ), 𝜎𝑤(ℎ) and 𝑇𝐿(ℎ) are top-of-canopy values. The extinction coefficients, 𝛾,

are properties of canopy density representing the rate of absorption of momentum that should

ideally be measured for individual canopies. Choosing these coefficients follows the values

used by Aylor and Flesch [55]. 𝛾1 was assigned a value of 2.4, based on the value estimated

by Shaw et al. [238] for a corn canopy with LAI = 3 (OSR LAI = 3.5). 𝛾2 and 𝛾4 were chosen

based on generalised non-forest canopy length scales for 𝜎𝑢(ℎ) and 𝜎𝑤(ℎ/3), making 𝛾2 = 1

and 𝛾4 = 3. 𝛾3 was also taken equal to 𝛾2, based on the covariation of 𝜎𝑣 with 𝜎𝑢. No
experimental values were found for 𝛾5 in OSR canopies, so a non-forest canopy value of 3.5
was used [38].
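A minimal sketch of these profiles, using the extinction coefficients adopted above (γ₁ = 2.4, γ₂ = γ₃ = 1, γ₄ = 3, γ₅ = 3.5) and the two-regime Lagrangian timescale, is given below; the top-of-canopy inputs are illustrative:

```python
# Minimal sketch of Eqs. 4.33-4.39: exponential attenuation of turbulence
# statistics inside the canopy. Top-of-canopy inputs are illustrative.
import numpy as np

H = 1.0                                        # canopy height (m)
GAMMA = {"U": 2.4, "su": 1.0, "sv": 1.0, "sw": 3.0, "uw": 3.5}

def in_canopy(value_at_h, gamma, z, h=H):
    """Attenuate a top-of-canopy statistic for 0 < z <= h (Eqs. 4.33-4.37)."""
    return value_at_h * np.exp(-gamma * (1.0 - z / h))

def T_L_canopy(TL_h, z, h=H):
    """Eqs. 4.38-4.39: constant in the upper canopy, reduced near the ground."""
    return TL_h if z >= 0.25 * h else TL_h * (0.1 + 3.6 * z / h)

z = 0.4
print(in_canopy(0.5, GAMMA["sw"], z))          # sigma_w(z) given sigma_w(h)=0.5
print(T_L_canopy(2.0, z))
```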

4.4.1.2 Adjusting for inertia of spores

Tracers and other light particles only need the equations of the fluid (air) to describe their

transport. For particles less than approximately 300𝜇𝑚, bLS can be directly adopted as their

size does not significantly decorrelate fluid and particle trajectories [48]. With diameters in

the range of 12𝜇𝑚 − 14𝜇𝑚 [20, 26, 50], Sclerotinia ascospores fall well below this threshold.

For these spore sizes, Wilson [48] has shown that inertial adjustments need to be made in two

ways. The first is to account for the effect of the spore's settling velocity, 𝑣𝑠, on the vertical
component of particle velocity, 𝑤. This can be achieved by modifying the position equation of
the Langevin model (Eq. 4.3) as shown in Eq. 4.40 [55]:

𝑑𝑧 = (𝑤 − 𝑣𝑠)𝑑𝑡 [4.40]

The second adjustment is to correct the decorrelation timescale to account for the difference

between the turbulence of fluid following a heavier particle and that following a passive tracer.

To achieve this, Sawford and Guest [239] proposed a weighting factor 𝑓 (0 ≤ 𝑓 ≤ 1) to


correct the Lagrangian timescale in air, 𝑇𝐿 , to give the decorrelation timescale of Sclerotinia

spores in air, 𝜏, as shown:

𝜏 = 𝑓𝑇𝐿 [4.41]

𝑓 = 1

√1 + (𝛽𝑣𝑠

𝜎𝑤)

2

Following this, Eq. 4.13, can be rewritten as follows:

$$b_u = b_v = b_w = \sqrt{\frac{2\sigma_w^2}{f T_L}} \quad [4.42]$$

where 𝛽 is an empirical constant relating Lagrangian to Eulerian timescales determined to be

1.5 by Sawford and Guest [239]. Since 𝑓 decreases with settling velocity, lighter
particles have only a very small effect on 𝜏. Also, because 𝜏 varies with 𝜎𝑤, which varies with

height inside the canopy, the model time-step, 𝑑𝑡, and the fluctuation term of the Langevin

equation, 𝑏, will vary when the model resolves particles below canopy height.
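A minimal sketch of the two corrections, using the settling velocity and β adopted in this work:

```python
# Minimal sketch of Eqs. 4.40-4.42: inertia corrections for heavy spores.
import numpy as np

V_S = 0.002        # settling velocity (m/s), as used in this work
BETA = 1.5         # Eulerian-Lagrangian coefficient (Sawford and Guest)

def spore_timescale(T_L, sigma_w, v_s=V_S, beta=BETA):
    """Eq. 4.41: decorrelation timescale tau = f * T_L of a heavy particle."""
    f = 1.0 / np.sqrt(1.0 + (beta * v_s / sigma_w) ** 2)
    return f * T_L

def settle_step(z, w, dt, v_s=V_S):
    """Eq. 4.40: vertical displacement including settling, dz = (w - v_s)dt."""
    return z + (w - v_s) * dt

print(spore_timescale(T_L=2.0, sigma_w=0.5))   # f ~ 1 for such light spores
```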

4.4.1.3 Deposition on Vegetation and Ground

Spore deposition is usually determined by dimensional analysis or by experiments, as these

parameters have not been fully quantified [56]. The spore deposition algorithm of Aylor and

Flesch [55] was updated to account for deposition onto vegetation in the lateral dimension,
based on Bouvet et al.'s [207] algorithm. The resulting 3D form is shown in

Eq. 4.43. At each time step, the probability of a single travelling spore being deposited on

any element of vegetation in a 3D wind field can be expressed as:

$$G_v(z) = v_s f_x\, LAD\, E_x\, dt + u f_z\, LAD\, E_z\, dt + v f_y\, LAD\, E_y\, dt \quad [4.43]$$

where 𝐸𝑥, 𝐸𝑦 and 𝐸𝑧 are the horizontal, lateral and vertical impaction efficiencies, 𝑓𝑥, 𝑓𝑦 and

𝑓𝑧 are the projection of plant area to horizontal, lateral and vertical planes and 𝐿𝐴𝐷(𝑚−1)

can also be thought of as the vertical variation of LAI with height inside a canopy [211].

Consistent with the findings of Aylor [208], 𝐸𝑥 was set to be equal to 1.0 signifying perfect

impaction efficiency and 𝐸𝑦 and 𝐸𝑧 were calculated as:

$$E_y = E_z = \frac{0.86}{1 + 0.442\left(\dfrac{|u|\tau_R}{L_v}\right)^{-1.967}} \quad [4.44]$$

where 𝜏𝑅 (= 𝑣𝑠/𝑔) is the particle relaxation time and 𝐿𝑣 the characteristic size of vegetation

or leaf width [57].


Unlike wheat [208] and maize [207] canopies, LAD profile measurements of OSR canopies were not
found in the literature after an exhaustive search. In parameterising Eq. 4.43, the beta probability

density function LAD profile for canopies with LAI=3 originally proposed by Markkanen et al.

[240] and adapted for banana canopies by Duman et al. [211] and Siqueira et al. [241] was

used. The density function is given by [211]:

$$LAD(z) \sim \left(\frac{z}{h}\right)^{ℊ-1}\left(1 - \frac{z}{h}\right)^{𝜘-1}, \quad \frac{z}{h} \in [0,1] \quad [4.45]$$

where ℊ and 𝜘 are shape parameters. 𝜘 was kept constant at 3 [240] and ℊ, which affects the
vertical distribution of foliage within the canopy, was adjusted to 4 to reflect the OSR intermediate
measured crown height of 0.34m. 𝐿𝐴𝐷 was normalised such that $\int_0^h LAD\, dz = LAI$ (= 3.5).
𝑓𝑥, 𝑓𝑦 and 𝑓𝑧 are normally constant with height [55, 57, 207] and were calculated

from the mean tilt angle (MTA) measured during the field trial experiment as 0.3, 0.3 and 0.52,
respectively (𝑓𝑥 = 𝑓𝑦 = sin∅, 𝑓𝑧 = cos∅, where ∅ is the mean tilt angle). The leaf width, 𝐿𝑣

[57] [207] was estimated as 0.035m for the OSR field. Estimates were made from 10

randomly selected leaves along each of 8 transects (east to west). Each leaf was measured

with a ruler horizontally across the midvein. The mean (out of 80) of these was taken as 𝐿𝑣.
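A minimal sketch of the normalised profile is given below; the exact normalising integral of Eq. 4.45 is the Beta function B(ℊ, 𝜘), evaluated here with SciPy:

```python
# Minimal sketch of Eq. 4.45: beta-function LAD profile, normalised so the
# profile integrates to LAI over the canopy depth (h = 1m here).
import numpy as np
from scipy.special import beta as beta_fn

def lad_profile(z_over_h, g_shape=4.0, k_shape=3.0, lai=3.5):
    """Leaf Area Density at relative heights z/h in [0, 1]."""
    raw = z_over_h ** (g_shape - 1) * (1.0 - z_over_h) ** (k_shape - 1)
    norm = beta_fn(g_shape, k_shape)   # integral of the unnormalised profile
    return lai * raw / norm

print(lad_profile(np.array([0.25, 0.5, 0.75])))
```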

The probability of deposition was then evaluated by comparing 𝐺𝑣 to a random number

𝜂 (chosen from a uniform distribution between 0 and 1) and the spore was either deposited,
if 𝜂 < 𝐺𝑣, or allowed through to the next time step otherwise [55]. In the cases where spores

were deposited, that particular spore was abandoned and the bLS model released the next

one provided N had not been exceeded.

Probability of deposition to the ground was given by [162]:

$$G_g = \begin{cases} \dfrac{2v_s}{v_s - w}, & w < -v_s \\[4pt] 1, & |w| < v_s \end{cases} \quad [4.46]$$

On impact, if 𝜂 < 𝐺𝑔, the spore was deposited; otherwise it was reflected back into the air.

Just after reflection, the spore position and velocity were updated according to the following

equation [55]:

𝑧𝑛𝑒𝑤 = 𝑧𝑜𝑙𝑑 − 2𝑣𝑠𝑑𝑡

𝑢𝑛𝑒𝑤 = −[𝑢𝑜𝑙𝑑 − 𝑈(𝑧𝑜𝑙𝑑)] + 𝑈(𝑧𝑛𝑒𝑤)

𝑤𝑛𝑒𝑤 = −𝑤𝑜𝑙𝑑 [4.47]

where “old” and “new” denote positions before and after reflection.
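A minimal sketch of the deposition tests is shown below; the LAD, efficiencies and velocities are illustrative, and the horizontal speeds enter Eq. 4.43 as absolute values here (an assumption):

```python
# Minimal sketch of Eqs. 4.43-4.46: per-step deposition test. A spore is
# deposited when a uniform random number eta falls below the probability.
import numpy as np

rng = np.random.default_rng(2)
V_S = 0.002   # settling velocity (m/s)

def veg_deposition_prob(u, v, lad, dt, fx=0.3, fy=0.3, fz=0.52,
                        Ex=1.0, Ey=0.5, Ez=0.5):
    """Eq. 4.43: probability of capture by foliage during one time step."""
    return (V_S * fx * lad * Ex + abs(u) * fz * lad * Ez
            + abs(v) * fy * lad * Ey) * dt

def ground_deposition_prob(w):
    """Eq. 4.46: probability of deposition on ground impact."""
    return 2.0 * V_S / (V_S - w) if w < -V_S else 1.0

G_v = veg_deposition_prob(u=0.8, v=0.1, lad=2.0, dt=0.05)
deposited = rng.uniform() < G_v                # eta < G_v -> deposit
print(G_v, deposited)
```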


4.4.2 Implementing the bLS Model

The model was implemented by running the numerical form of Eq. 4.3 to generate particle

back trajectories:

∆𝒖 = 𝒂∆𝑡 + 𝒃 𝝃

∆𝒙 = 𝒖∆𝑡 [4.48]

where ∆𝑡 is the model’s time step and all other terms retain their earlier descriptions. The

choice of ∆𝑡 is critical as it needs to be sufficiently smaller than the decorrelation time scale,

𝜏, such that turbulent activity is not missed and the well-mixed condition is not violated [242].

Following Flesch et al. [150], Flesch et al. [217] and Aylor and Flesch [55], ∆𝑡 was chosen as
0.025𝜏 to fulfil the condition that the model time-step is much smaller than the decorrelation timescale. For

each run, N =150000 particles were released from locations corresponding to the 22 Rotorod

sampling positions in the experimental field trial with initial velocities assigned according to

the Eulerian velocity statistics (above canopy) and roughness sublayer velocity statistics

(below the canopy) for the individual release positions; in-canopy statistics (Eqs. 4.34 – 4.36)
were assigned to those leaving samplers below canopy height. N was determined by releasing
50,000, 75,000, 100,000, 125,000, 150,000, 175,000 and 200,000 particles; N = 150,000 was
chosen as the least value of N that achieved convergence of concentration at position D, 1.6m (see Figure

3.2). To ease computation, each particle was traced back in time towards the source as it

experienced position and velocity fluctuations up to a maximum of 40m before being

abandoned.

During each time step, ∆𝑡, the model resolved the particle’s current position based on the

most recent update of Eq. 4.48 and evaluated the turbulent statistics for that height. Above

the canopy, these values will be same throughout; from the top of the roughness sublayer to

the top of the canopy, they will decrease linearly as 0.15𝜎(𝑢,𝑣,𝑤)(ℎ)/0.25𝑚; and inside the

canopy, they will decrease exponentially with the characteristic canopy length scale – see Eq.

4.33–4.39. Within this same time-step and for the same position, ground and vegetation

deposition probabilities were computed and, depending on the outcome, the particle was

either deposited or allowed to proceed through to the next time step. At the end of the time

step, the particle velocity and position were updated according to Eq. 4.49:

𝒖 = 𝒖 + ∆𝒖

𝒙 = 𝒙 + ∆𝒙 [4.49]
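A minimal sketch of this per-step update loop (Eqs. 4.48–4.49) with the adaptive time step and the 40m abandonment criterion is given below; the drift is a homogeneous-turbulence placeholder rather than the full Eqs. 4.10–4.12:

```python
# Minimal sketch of the per-step bLS update (Eqs. 4.48-4.49), using the
# adaptive time step dt = 0.025*tau. The drift is a simplified placeholder.
import numpy as np

rng = np.random.default_rng(3)

def step_particle(x, u, tau, sigma_w):
    """Advance one trajectory step; returns the updated (x, u)."""
    dt = 0.025 * tau                           # time step << tau
    a = -u / tau                               # placeholder homogeneous drift
    b = np.sqrt(2.0 * sigma_w ** 2 / tau)      # diffusion term
    du = a * dt + b * rng.normal(0.0, np.sqrt(dt), size=3)
    u = u + du                                 # Eq. 4.49
    x = x + u * dt
    return x, u

x, u = np.zeros(3), np.array([1.0, 0.0, 0.1])
for _ in range(2000):
    x, u = step_particle(x, u, tau=1.8, sigma_w=0.5)
    if np.hypot(x[0], x[1]) > 40.0:            # abandon beyond 40m
        break
print(x)
```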

The landing positions (𝒙𝟎 , 𝒚𝟎) of all N released particles and corresponding touchdown

velocities 𝑤0∗(= 𝑤0 − 𝑣𝑠) were recorded in a catalogue. Concentrations were then calculated

by slightly modifying Eq. 4.15 to account for the velocity difference between spore and tracer,
as shown:

$$(C/Q) = \frac{1}{N}\sum_n \frac{2}{|w_0^*|} \quad [4.50]$$


where 𝑛 is the number of touchdowns within the source area. To account for the ring

configuration of the source, spores were assumed to emanate from six 1 m² squares centred
on the circumference of the 7m diameter source ring, as shown in Figure 4.1. All touchdowns
within these squares were considered to contribute to the concentration footprint of the receptor
of interest. All six groups of sources were assumed to be 'active' during the sampling runs

and assumed to have the same 𝑄 and, therefore, could be treated like a single, homogenous

source [243]. These are reasonable assumptions given the similar environmental and physical

soil conditions of the sources due to their proximity to one another, the consistency of

Sclerotinia isolates used, the sowing practice employed in burying each group of Sclerotia

[148], and the same rate of maturation [19].

Figure 4.1: The assumed source configuration used for concentration footprint calculation

showing approximate locations of 6 groups of Sclerotinia. Each group is assumed to cover a

1 square meter area based on approximate measurements of area covered by fruiting

bodies. The bottom-left corner vertices of the squares, starting with square 1, are: (-2.25,
2.5), (1.25, 2.5), (3, -0.5), (1.25, 4.0), (-2.25, -4.0), and (-4, 0.5). (Drawing not to scale).


4.4.3 Comparing model estimates to experimental data

A number of adjustments had to be made to the data in order to compare it to model

estimates. The first adjustment was to convert spore numbers to a standard measurement

unit of concentration (#/𝑚3).

The second adjustment concerned the sampling time of the field experiment described in

chapter 3. In the field experiment, spores were collected for 5 hours each day and analysed

as one sample for each sampling point to mitigate sampling error [10]. From a modelling

point of view, a 5-hour sampling period is too long to evaluate as a single sample period

because wind turbulent activity has a considerably shorter timescale. Averaging times beyond

an hour generally make turbulent statistics unrepresentative and the models they are based

on inaccurate [150]. This is especially true for MOST-based models because, at longer

averaging times, the assumptions of local stationarity are invalid [150, 228].

The third adjustment was to standardise the unit of measure between observed
concentrations and the bLS model estimates. The source strength 𝑄 could not be measured during the
experiment due to the unavailability of direct measuring methods [55]. To compare model
estimates (𝐶/𝑄) with experimental data, there was a need to estimate the actual release rate
of spores, 𝑄𝑒𝑠𝑡, in order to express observed data as 𝐶𝑚/𝑄𝑒𝑠𝑡, where 𝐶𝑚 is the measured

concentration. These adjustments are detailed in sections 4.4.3.1-4.4.3.3.

4.4.3.1 Converting spore DNA to standardised spore concentration (#/𝒎𝟑)

First, spore DNA in picograms (described in chapter 3) was converted to spore numbers using
a measure of 1 spore = 0.35 pg of DNA, as specified by the primer design used for qPCR
analysis [153]. These were then converted to #/m³ by dividing by the total air volume sampled
(38 L min⁻¹ over 5 hrs). A more realistic measure of actual airborne

concentration requires scaling air-sampled concentrations with the efficiency of the collection

device. Efficiency of Rotorods varies with the type and physical characteristics of spores being

collected and there are no recorded estimates of collection efficiencies of Rotorod samplers

collecting Sclerotinia spores. However, Aylor [244] had calculated an efficiency value for

Rotorods collecting V. Inaequalis spores as 21% and de Jong et al. [50] reported that

Sclerotinia ascospores and V. Inaequalis spores have approximately the same dimensions and

would therefore have similar sedimentation velocity, 𝑣𝑠, of 0.002𝑚𝑠−1. McCartney and Fitt

[179] and Aylor [244] independently reported the sedimentation velocities of Sclerotinia

spores and V. Inaequalis as 0.002𝑚𝑠−1, thus corroborating de Jong et al. Based on this,

concentrations were multiplied by a factor of 5 to account for sampling inefficiency.
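The conversion chain described above amounts to the following arithmetic, sketched with the stated constants (0.35 pg per spore, 38 L min⁻¹, 5 hrs, and an efficiency factor of 5):

```python
# Minimal sketch of the Section 4.4.3.1 conversion from sampled spore DNA
# (pg) to airborne concentration (# per m^3), using the stated constants.
PG_PER_SPORE = 0.35          # qPCR primer calibration
SAMPLE_RATE_L_MIN = 38.0     # Rotorod sampling rate
SAMPLE_HOURS = 5.0           # daily sampling duration
EFFICIENCY_FACTOR = 5.0      # ~1/0.21 Rotorod collection efficiency

def dna_to_concentration(dna_pg):
    spores = dna_pg / PG_PER_SPORE
    air_m3 = SAMPLE_RATE_L_MIN * 60.0 * SAMPLE_HOURS / 1000.0  # L -> m^3
    return EFFICIENCY_FACTOR * spores / air_m3

print(dna_to_concentration(12.0))   # e.g. 12 pg of sampled DNA
```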


4.4.3.2 Adjusting spore concentration

Spore concentration was adjusted to a shorter averaging time based on the findings of

Clarkson et al. [19], Hartill [245], McCartney and Lacey [21], Bourdot et al. [157], Abawi and

Hunter [246], and Qandah [27] that associated specific environmental factors with diurnal

variations and peak distributions of Sclerotinia spore release. Most of these studies

[157], [204], [27] found that spore emission peaked between 9am and 1pm, with Qandah

[27] further positing that approximately 85% of total spores released in a day are emitted

during this period. During the experimental field trial, sporulation was observed on all days

at the start of sampling (11am) and temperature and relative humidity had similar diurnal

variation for all days. Based on these results and the fact that glycerine-coated I-rods will

lose their adhesion during long sampling periods due to degradation in spore retention caused

by spore and dust accumulation/overloading [247], the majority of field samples were

assumed to have been collected within the first hour of sampling (11am to 12pm).

Consequently, the spore concentration was multiplied by a factor of 0.85 to get the

concentration after 2hrs. Further assuming a 60/40 split in spore retention, the 2hr

concentration was multiplied by 0.6 to obtain the spore concentration at 12:00pm on all days.

4.4.3.3 Estimating actual spore concentration (𝑸𝒆𝒔𝒕)

Due to the unavailability of direct and reliable methods to measure the rate of release of

spores (source strength), 𝑄 was not measured during the field trial experiment. This is an
important variable in comparing concentration estimates because it scales the observed
concentration values; comparisons are made based on source-scaled concentrations, 𝐶/𝑄.

Proposed methods of calculating 𝑄𝑒𝑠𝑡 by de Jong et al. [50] based on sclerotial density, 𝑠

(#/𝑚2), area of sporulating apothecial disc, A (𝑚𝑚2/𝑠𝑐𝑙𝑒𝑟𝑜𝑡𝑖𝑎), and the rate of release of

ascospores per apothecia, 𝑟 (#/𝑚𝑚2 of disc surface) were found unreliable because of the

large disparities in reported values of ascospore release [19, 159, 245]. This variability is due

to the sensitivity of ascospores to a wide range of environmental conditions that trigger

release.

An inverse dispersion modelling approach was instead adopted to calculate the estimate of

𝑄, 𝑄𝑒𝑠𝑡 [150, 163]:

$$Q_{est} = \frac{C_m - C_b}{(C/Q)_{model}} \quad [4.51]$$

where 𝐶𝑚 and 𝐶𝑏 are the observed downwind and background concentrations respectively, and
$(C/Q)_{model}$ is the normalised bLS model estimate for the same sampling position
as 𝐶𝑚. From Eq. 4.51, only a few 𝐶𝑚 at select positions and their corresponding model


estimates are required to estimate 𝑄𝑒𝑠𝑡. To ensure the independence of 𝑄𝑒𝑠𝑡 from the bLS

model outputs such that using 𝑄𝑒𝑠𝑡 to scale observations does not result in minimizing the

mean squared error between model predictions and observations [56], a forward LS model
was chosen to generate profiles of $(C/Q)_{model}$ at heights of 1.6, 2.4 and 3.2m corresponding

to sampling heights at position D, at which point there was good mixing on all days (see

figure 3.2) during the field trial. This model had been successfully used to estimate source

strength from concentration profiles in grass and wheat canopies [55]. To get more accurate

estimates, only sampling points above the canopy were used here. Position B (see figure 3.2),

which is just 1m away from the edge of the circular source ring, was also not chosen to

estimate 𝑄𝑒𝑠𝑡 to ensure that the constituent plumes from the 6 separate sources had

sufficiently mixed into one. The forward LS model is given by [211] [216]:

$$(C/Q)_{model} = \sum_{m=1}^{M}\left[\frac{1}{N\,\Delta x\,\Delta y\,\Delta z}\sum_{k}^{K}\frac{1}{u_k\left(x_{sens},\, z_{sens} \pm \frac{\Delta z_{sens}}{2}\right)}\right] \quad [4.52]$$

where 𝑢𝑘 is the horizontal velocity of the individual spore passing through a sensing volume

of height ∆𝑧𝑠𝑒𝑛𝑠 located at (𝑥𝑠𝑒𝑛𝑠, 𝑧𝑠𝑒𝑛𝑠); ∆𝑥, ∆𝑦 and ∆𝑧 (0.1, 0.1, 0.1m) are the dimensions of the
sensor volume; 𝐾 is the number of spores passing through the sensing volume; and 𝑀 is the number of sources (M = 6). The volume of the sensing surface

was set to the volume of air sampled by the Rotorod samplers in one second. To generate

profiles of (𝐶 𝑄⁄ )𝑚𝑜𝑑𝑒𝑙

at heights corresponding to those of position D (see figure 3.2), Eq.

4.52 was evaluated with N =150000. These profiles were then used to scale the observed

concentrations at the respective heights using Eq. 4.51. The best estimate was obtained by

regressing the three 𝑄𝑒𝑠𝑡 values on observed concentrations, yielding 𝑄𝑒𝑠𝑡 = 218, 120 and 98
spores m⁻² s⁻¹ for the three days. These values are representative of the high apothecial
density in the inoculated sources. The reduction in 𝑄𝑒𝑠𝑡 over the experimental period is

consistent with a decline in apothecial release over its lifetime [181]. It is noteworthy that

inverse modelling (Eq. 4.51) simply scales observed concentration profiles by turbulent effects

to obtain estimated actual source strength, 𝑄𝑒𝑠𝑡.
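Eq. 4.51 reduces to a one-line calculation, sketched below with illustrative numbers:

```python
# Minimal sketch of Eq. 4.51: inverse-dispersion estimate of source
# strength from one measured concentration. Inputs are illustrative.
def estimate_Q(c_measured, c_background, c_over_q_model):
    """Q_est = (C_m - C_b) / (C/Q)_model."""
    return (c_measured - c_background) / c_over_q_model

print(estimate_Q(c_measured=3.2, c_background=0.1, c_over_q_model=0.015))
```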


Table 4.1: Table of model parameters.

| Parameter | Description | Value |
| --- | --- | --- |
| LAI | Leaf Area Index | 3.5 |
| 𝐿𝑣 | Leaf width | 0.035 m |
| 𝑓𝑥, 𝑓𝑦, 𝑓𝑧 | Projection of leaf area in 𝑥, 𝑦 & 𝑧 directions | 0.3, 0.3, 0.52 |
| ℎ | Canopy height | 1 m |
| 𝑑 | Displacement height | 0.75 m |
| 𝑧𝑟𝑙 | Roughness sublayer height | 1.25 m |
| 𝑣𝑠 | Settling velocity | 0.002 m s⁻¹ |
| 𝑁 | Total number of simulation particles | 150,000 |
| ℊ | LAD shape parameter 1 | 4 |
| 𝜘 | LAD shape parameter 2 | 3 |
| 𝑘𝑣 | Von Karman constant | 0.4 |
| 𝛽 | Eulerian-Lagrangian coefficient | 1.5 |
| 𝑔 | Gravitational acceleration | 9.82 m s⁻² |
| ∆𝑡 | Model time step | 0.025𝜏 |

4.4.5 Assessing Model Performance

To better assess the performance of the bLS model, the predictions were evaluated using

established dispersion model performance measures [248-250]. These performance statistics

include the geometric mean bias (MG), geometric variance (VG), fractional bias (FB),

normalised root mean square error (NMSE), and fraction of predictions within a factor of 2

and 5 (FAC2 and FAC5). The correlation coefficient was not used because it can be misleading

for short-range dispersion predictions [251]. Assuming appropriate data inputs are used in

the model, the mentioned statistics as a whole are reliable indicators of an “acceptable” model

even in situations where there are comparatively fewer data samples. According to Chang

and Hanna [250], an acceptable model should have the following values for the statistics:

FAC2 >50%, |FB| < 0.3, 0.7<MG<1.3, NMSE < 1.5, and VG < 4. An acceptable model is


defined as one that is good enough for “research-grade field experiments” [250]. The

performance measures were calculated as follows [250]:

$$FB = \frac{\overline{C_o} - \overline{C_p}}{0.5\left(\overline{C_o} + \overline{C_p}\right)} \quad [4.53]$$

$$MG = \exp\left(\overline{\ln C_o} - \overline{\ln C_p}\right) \quad [4.54]$$

$$VG = \exp\left[\overline{\left(\ln C_o - \ln C_p\right)^2}\right] \quad [4.55]$$

$$NMSE = \frac{\overline{\left(C_o - C_p\right)^2}}{\overline{C_o}\;\overline{C_p}} \quad [4.56]$$

$$FAC2: \quad 0.5 \le \frac{C_p}{C_o} \le 2 \quad [4.57]$$

$$FAC5: \quad 0.2 \le \frac{C_p}{C_o} \le 5 \quad [4.58]$$

where 𝐶𝑜 and 𝐶𝑝 are observations and model predictions respectively and the overbars

denote means of quantities. FB and MG are measures of mean bias and indicate systematic

errors, VG and NMSE are measures of scatter and indicate both random and systematic errors,

and FAC2 and FAC5 are robust measures of how close predictions are to observations [250].

Values of MG less than 1 imply overprediction and values greater than 1 imply underprediction. Similarly, FB < 0 indicates overprediction and FB > 0 indicates underprediction.
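As a minimal sketch of how these measures can be computed (assuming paired arrays of strictly positive observed and predicted concentrations; the function below is an illustration, not the exact code used in this work):

```python
import numpy as np

def performance_stats(co, cp):
    """Dispersion-model performance measures of Eqs. 4.53-4.58.

    co, cp : paired arrays of observed and predicted concentrations,
    assumed strictly positive so the logarithms in MG and VG are defined.
    """
    co, cp = np.asarray(co, float), np.asarray(cp, float)
    fb = (co.mean() - cp.mean()) / (0.5 * (co.mean() + cp.mean()))
    mg = np.exp(np.log(co).mean() - np.log(cp).mean())
    vg = np.exp(((np.log(co) - np.log(cp)) ** 2).mean())
    nmse = ((co - cp) ** 2).mean() / (co.mean() * cp.mean())
    ratio = cp / co
    fac2 = 100.0 * np.mean((ratio >= 0.5) & (ratio <= 2.0))  # % within factor 2
    fac5 = 100.0 * np.mean((ratio >= 0.2) & (ratio <= 5.0))  # % within factor 5
    return {"FB": fb, "MG": mg, "VG": vg, "NMSE": nmse,
            "FAC2": fac2, "FAC5": fac5}
```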

4.5 Results

The model was evaluated with the parameter values shown in Table 4.1. Figures 4.2 and 4.3

show the model predictions and observations for the streamwise and crosswind sampling

positions above and below the canopy respectively. Figures 4.4 and 4.5 show the normalised

observations against modelled observations for all sampling positions above the canopy and

below canopy height respectively. In both cases, it is evident that the model overpredicts the observations considerably.

In Figure 4.2, the model predictions above the canopy appear to agree more with the

observations than those below the canopy. Above the canopy, the power law decay of spore

concentration with distance from the source appears to be preserved. Predictions are worse

near the source and seem to get better with downwind distance. The relatively poor

performance near the source is due to the treatment of six groups of sources as one. At close

distances from the source, errors resulting from the assumption that multiple identical sources

can be modelled as one can be amplified [150]. This close distance is of the order of 10z from the upwind edge of a source, equivalent to 16 m (z = 1.6 m) from the upwind edge of the source in this work and 9 m from the downwind edge. Therefore, sampling position B at 1 m downwind is well within this distance. Further away, at approximately 14 m, as the plume mixes, the predictions get slightly better.

Figure 4.2: Normalised observations (blue asterisks) versus normalised model predictions

(red circles) above (left panels) and below (right panels) the canopy for the downwind

sampling positions for all sampling days.

[Figure panels: Days 1-3, above canopy (z = 1.6 m, left) and below canopy height (z = 0.8 m, right); x-axis: downwind distance from centre of source (m); y-axis: normalised concentration of spores (C/Q), log scale.]


Figure 4.3: Normalised observations (blue asterisks) versus normalised model predictions

(red circles) above (left panels) and below (right panels) the canopy for the crosswind

sampling positions for all sampling days.

[Figure panels: Days 1-3, above canopy (z = 1.6 m, left) and below canopy height (z = 0.8 m, right); x-axis: distance from plume centre (m); y-axis: normalised concentration of spores (C/Q), log scale.]


Figure 4.4: Normalised observations versus normalised model predictions for all observed

concentrations above the canopy. The blue line is the 1:1 line.

Figure 4.5: Normalised observations versus normalised model predictions for all observed

concentrations below the canopy. The blue line is the 1:1 line.

[Scatter panels for Figures 4.4 and 4.5: C/Q modelled (x-axis) versus C/Q observed (y-axis) on log-log axes, above and below the canopy respectively.]


The results for the sampling points below canopy height (Figure 4.2, right panels) indicate that this trend may be non-existent due to the chaotic nature of canopy transport. The poor near-source predictions witnessed above the canopy are no longer visible, although the higher concentration values in the canopy (approximately a 17-fold increase in the highest values) may have masked them. Notwithstanding, the general overprediction is more pronounced inside the canopy (approximately 2 times and 16 times that above the canopy

on the first two days). The fact that the model overpredicts more (see Table 4.2) inside the

canopy could suggest it underestimates deposition. A further source of error below the canopy

is discrepancies in the initialised release velocities. To estimate concentrations at the sampling

positions with bLS, particles/spores were initially released from that position with velocities

corresponding to the turbulent conditions for that location. Below the canopy, due to a more

inhomogeneous flow, the estimated turbulent statistics used for the release (𝜎𝑢,𝑣,𝑤(𝑧), 𝑈(𝑧)

for 𝑧 < ℎ) were more erroneous than those above it. This increased erroneous estimates

below the canopy.

In figure 4.3, which shows the observations and predictions at the crosswind sampling points,

the above-canopy predictions are again better than the below-canopy predictions (see Table 4.2). Predictions made at the centre of the axis are better than those made on either side in both regions. On day 2, when there was a 12.8° misalignment between the central axis of the sampling grid and the mean wind direction, the predictions were comparatively worse, suggesting either the model's sensitivity to misalignment or a mis-estimation of plume spread.

Significant misalignments with the mean wind direction can result in an increase in friction velocity, u_*, since ⟨v'w'⟩ will no longer be zero (u_*^2 = (⟨u'w'⟩^2 + ⟨v'w'⟩^2)^{1/2}), and the model could under- or overestimate plume spread by under- or overestimating σ_v. Independent measurements of σ_v at z = 1.6 m showed that the model overestimated this quantity (σ_{v,MO} ≈ 1.7u_* and σ_{v,meas} = 0.521 m s⁻¹ ≈ 1.41u_*, where σ_{v,MO} and σ_{v,meas} are the MOST-estimated and measured lateral velocity standard deviations respectively). This means that the actual spread of the plume is less than the model assumes. Under such circumstances, predictions away

from the centre will be erroneous as indicated in figure 4.3. Unfortunately, the unavailability

of independent turbulence measurements below the canopy made it impossible to confirm

whether this was also the case below canopy height.

Figures 4.4 and 4.5 show the pervasiveness of overprediction by the model both above and

below the canopy as depicted by the majority of the points lying above the 1:1 line. However,

this overestimation cannot be confidently attributed to the model since the observed data

itself had to undergo a series of adjustments, as explained in section 4.4.3.2. These

adjustments may have affected the absolute concentration values. Also, Rotorod samplers

have a tendency to underestimate aerial spore concentrations when they decelerate from


their calibrated rpm values during sampling [158]. These factors, in addition to corrections

for efficiency and the scaling of observed values with model-estimated source strength (𝑄𝑒𝑠𝑡)

make assessing model performance based on its agreement with absolute observed

concentration values unrealistic [158]. The degree of agreement of the model with

observations is difficult to see without statistical performance measures.

Table 4.2: Calculated model performance measures for different observation groups (above or below canopy height). The number of observations is shown in square brackets.

| Observation Group | FB | MG | VG | NMSE | FAC2 (%) | FAC5 (%) |
|---|---|---|---|---|---|---|
| Above canopy (all) [27] | -0.8 | 0.51 | 1.59 | 2.69 | 46 | 95.85 |
| Below canopy (all) [27] | -0.55 | 0.395 | 2.37 | 1.56 | 37.5 | 79.2 |
| Above canopy (downwind) [15] | -0.69 | 0.63 | 1.73 | 1.75 | 67 | 100 |
| Above canopy (crosswind) [15] | -0.88 | 0.39 | 2.27 | 4.1 | 40 | 93 |
| Below canopy (crosswind) [15] | -0.85 | 0.4 | 6.81 | 1.74 | 26 | 73 |

Table 4.2 shows the performance statistics computed for all predictions divided into five

groups with number of observations for each calculation shown in square brackets. The

statistics confirm some of the initial observations: the model overpredicts by a factor of approximately 2 above the canopy (MG = 0.51, VG = 1.59) and approximately 2.5 inside it (MG = 0.395, VG = 2.64), and it is more accurate above the canopy than below (FAC2 is higher above the canopy). FAC2 is the more robust statistic because it is comparatively resistant to outliers. By contrast, NMSE and FB can be strongly influenced by high outliers, as is evident in Table 4.2 from their low values for below-canopy predictions, where concentrations are highest.

Even though the model has not met the acceptance threshold laid out by Chang and Hanna, these statistics give clear evidence of which groups it predicts better. The model performance above the canopy is better overall, and the statistics come very close to the acceptability threshold when the crosswind observations are excluded (see the Above canopy (downwind) statistics in Table 4.2). This is significant because the intended final application of this model is in the back-trajectory tracking of sources from an above-canopy sensor.

Above the canopy, predictions are worse for the crosswind observations, as seen when crosswind and downwind predictions above the canopy are compared. This is due to the overestimation of the lateral velocity component, σ_v, explained earlier. For the crosswind

observations, there is approximately the same amount of overprediction by the model inside

and outside the canopy, as the MG and FB values are almost identical. However, all the other

metrics show that the above-canopy crosswind predictions are better than those below the canopy. The only exception is NMSE which, as stated earlier, tends to be affected by the high outlying values inside the canopy and thus understates the mean square error there. This is evident from VG, which, like NMSE, is a measure of scatter but is unaffected by outliers and therefore more representative: it shows there is more randomness inside the canopy (VG = 6.81).

Attempts were made to improve performance by tuning the Lagrangian timescale. The

Lagrangian timescale is a major source of error in the implementation of LS models because

of its dependence on the turbulent kinetic energy dissipation rate and its influence on the

delicate model time-step. Wilson and Flesch [242] found that most errors resulting from the violation of the well-mixed condition in the implementation of discrete LS models were attributable to the model's time-step, which is in turn dependent on the Lagrangian time scale (Δt ~ 0.025T_L). In the presence of canopies, the turbulent kinetic energy is even more dissipative and erratic, making T_L more difficult to specify. Aylor and Flesch [55] reported a better result by using a premultiplier of 0.4 (in Eq. 4.26) instead of 0.5 for T_L; in this work, however, that premultiplier resulted in poorer bLS predictions, as did a premultiplier of 0.6. Any such change to Eq. 4.26 produced considerably worse predictions, possibly due to a violation of the well-mixed constraint as a result of the altered time-step [242]. The results shown, therefore, are based on the T_L formulation in Eq. 4.26, which found good success over a wide range of Project Prairie Grass observations [225, 226].

4.6 Discussion

4.6.1 bLS Model Performance

This study has parametrised and implemented a bLS model that can estimate the

concentration footprint of naturally-released ground-level Sclerotinia spores at receptor

positions above an OSR canopy. The model used minimal turbulent instrumentation, utilising

MOST and empirical parametrisation of canopy turbulence to describe surface layer and

canopy turbulence.


The model gave better estimates above the canopy due to the more homogeneous surface layer flow, indicating that the samplers, which were deployed at 1.6 m, are beyond the influence of the roughness sublayer under these conditions (z_rl = 1.25h = 1.25 m). Below the canopy, the complexities of deposition, low wind speeds, high turbulent intensities, and a combination of Gaussian (at heights below 210 mm) and non-Gaussian velocity PDFs through the rest of the canopy [252] affect the quality of the estimates [53]. With regard to model

estimates in the lateral direction, this implementation of bLS performed less satisfactorily. This is attributable to the challenges of modelling crosswind effects, which can be very sensitive to wind direction, as shown by the higher error in estimating σ_v on Day 2, when the misalignment of the streamwise wind with the sampling axis was greatest. In this work, there is a

tendency for these effects to be magnified below the canopy because all turbulent

characterisation is based on the friction velocity, 𝑢∗, as a result of MOST parameterisation.

Below the canopy, this error is magnified by errors further introduced by the experimental

parametrisation of the turbulence field (Eq 4.37-4.43). Markannen et al. [253] have shown

that MOST-LS and MOST-bLS models tend to suffer more accuracy deterioration than Large

Eddy Simulation (LES) coupled LS models when estimating crosswind concentration footprint.

Generally, the model overestimated concentrations in both regions. This is partly attributable to a smaller than assumed spore source area. The concentration at each receptor was calculated from a catalogue of touchdown velocities of particles landing in any one of six 1 m² areas. The size of these squares was based on approximate measurements

of ground area covered by apothecia. Due to non-compactness of sclerotia and allowances

for the irregular shape of source area at the vertices of the square, the actual source area is

smaller than assumed. Consequently, trajectories outside the actual source might have been

included in concentration estimation. Another likely source of error is the adjustment of spore

concentration in order to synchronise sampling time with the averaging time of turbulence

statistics. It was estimated that 51% of total daily spores collected were collected in the first

hour based on diurnal spore release variation and deteriorating Rotorod retention of spores.

This could have easily resulted in an overestimation or underestimation of actual measured

spore concentration depending on whether the assumed diurnal spore release variation is

higher or lower than the actual spore release pattern. Therefore, the effect of this adjustment

on model results is unclear. Further, the characterisation of in-canopy turbulence was only

an estimate, as approximate values based on past experiments in similar canopies were

selected to represent varying turbulence through the canopy. Turbulence in canopies is so

complicated that even attempts to directly calculate the turbulent statistics of the flow field

(e.g. [56]), may not accurately reproduce a turbulent flow that is a product of dominant

length scales which change with the dissipation of turbulent kinetic energy [161]. Most errors in LS model implementation are a result of inadequate characterisation of canopy

turbulence. These errors are related to the conventional LS model’s inadequate simulation of

the turbulence kinetic energy (TKE) dissipation rate [210]. This limitation of the conventional

LS models is responsible for the recent rise of coupled approaches [60, 212] [56] that attempt

to directly solve for TKE using higher order closure schemes [59].

To assess the performance of this model, some similar applications of LS models have been

identified. The two most relevant are Gleicher et al. [56] and Aylor and Flesch [55]. These

are very relevant because they both evaluate the performance of LS models on spore

concentration estimation in crop canopies against experimental data. The application in this

work is still unique because it attempts to model naturally-released ground level spores in an

OSR canopy. The type of canopy is significant because Gleicher et al. and Aylor and Flesch -

like most of the research in this area [34] [57, 207] [88] - carried out their implementation

on data in wheat and corn canopies, where detailed canopy features and turbulence attributes

have been amassed over the years, due to a higher interest in these cash crops [254]. Another

difference is the fact that bLS not LS is used in this work. Backward LS and forward LS models

calculate concentration footprints differently. Using vertical velocity components of spores

that land in an originating source area (bLS) to calculate concentration footprint and using

horizontal components of velocity passing through a sensor volume (fLS) could result in

completely different outcomes [255]. Notwithstanding these differences in the applications,

the works mentioned are a basis for comparison.

Gleicher et al.'s [56] work applied a 3D Eulerian-coupled LS model to investigate Lycopodium

spore dispersal in a maize canopy. In their approach, they used Wilson and Shaw’s [256] 2nd

order closure model to iteratively calculate canopy turbulence parameters rather than rely on

an empirical parameterisation based on generalised canopy turbulence. Their model’s

performance metrics were generally better than those of the bLS model implemented in this work (based on the FAC2 statistic). It is worth noting, however, that their experiment was on a smaller

scale, with the farthest group of receptors (Rotorods) only 8m away from the source. Due to

the increased scale in this work, the performance measures computed are degraded by the error associated with estimating concentration at more distant in-canopy locations. Another

thing to consider is that the Gleicher et al. study used artificially released spores with a

uniform release rate from sources above the ground. This meant that the complexities of

spore release, particularly varying rates and velocities [26], were bypassed. Roper et al. [26]

have demonstrated that naturally released fungal spores have a complex interaction with the

surrounding air and maximise opportunities to be released in groups as opposed to

individually. This is not accounted for in current implementations of LS and will affect a model's

performance. Nevertheless, the results in this work have good agreement when above canopy


downwind performance is compared (FAC2 = 67% and 71% for this work and Gleicher et al.

respectively). This is encouraging considering that, based on performance measures, Gleicher et al.'s model performed better than most air dispersion model applications [47] [56]; Wilson et al. [257] define high performance as a FAC2 of 56%.

Aylor and Flesch's [55] implementation is one of the more successful applications of LS models in crop canopies; they estimated concentration profiles (vertical variation of concentration with height) of Lycopodium and V. inaequalis spores from wheat and grass canopies respectively and achieved good agreement with observed data. Their implementation, like this work, was based on MOST-LS, and canopy turbulence statistics were similarly parameterised from experimental data. However, where they relied on comprehensively measured canopy attributes, this work had to rely on estimates (e.g. the LAD profile) and random sampling (e.g. the estimation of L_v). Further, Aylor and Flesch's work was carried out on an even smaller scale than Gleicher et al.'s, as their primary aim was to estimate release rates. The results of this work agree with Aylor and Flesch's, as both confirm the increased accuracy of LS estimates above the canopy. Aylor and Flesch expressed lower confidence in their in-canopy predictions, attributing it to the low flight of spores with respect to sampling heights inside the canopy. This also appears to be a contributory source of error in this work, as it is a direct manifestation of canopy turbulence and deposition.

The incorporation of empirical techniques into current wind dispersal strategies, to address the limitations of scale and the ad-hoc nature of current phenomenological methods, has been identified as an important research goal [258]. One way of achieving this is through large-

scale data collection from an optimally deployed network of sensors. Methods of optimal

deployment of sensors are already in use for environmental and health monitoring based on

various underlying statistical models and concentration profiles [142] [259] [260]. These

should be extendible to an LS model-generated concentration profile, where spatiotemporal

fluctuations are used to optimise sampling and monitoring strategies [258]. The first step is

to validate canopy-capable models from the point of view of their ability to generate spatial

gradients. These spatial profiles can then be used to implement better sampling strategies

that can mitigate current limitations of spore traps and samplers [10]. The evaluation of bLS in estimating concentrations above the canopy presented here assesses its potential to estimate spatial profiles of Sclerotinia spores, and therefore addresses this identified research need. With a FAC2 of 46% above the canopy (within 4 percentage points of the acceptance threshold, and with potential for improvement – see section 4.6.2), the bLS model appears suited to this task based on the limited dataset evaluated.


Further research into this specific area should focus on bridging the gap between “cash crops

of interest” and crops like OSR in terms of easy availability of accurate canopy attributes.

Despite advances in measurement techniques in the past decade, such as LIDAR and differential spectroscopy, which allow accurate measurements of canopy variables (e.g. LAD) at very fine resolutions [258], their use is restricted to a select number of crops, specifically wheat and maize (corn), due to a higher interest in these crops by pathologists and growers.

When these measurement advances are extended to OSR, recent methodologies in canopy flow parametrisation, such as k-ε theory [56, 60], increasingly powerful Large Eddy Simulations (LES) [62] and the log-normal velocity-dissipation approach [210], which characterise canopies more reliably based on solutions to the 2nd order closure model [256], can be utilised to achieve higher accuracy. In turn, these gains can be used to further the goal of incorporating and optimising empirical techniques in spore dispersal modelling and, eventually, regional or even global disease prediction.

4.6.2 Limitations of Experiment

There are a number of limitations to the field trial experiment that affect model evaluation; they are discussed below.

1. Canopy Attributes and In-Canopy Turbulence Measurements: The main limitations

in this work pertain to the unavailability of detailed canopy attributes and a lack of turbulence

statistics within the canopy. More accurate methods of canopy turbulence parametrisation

such as the ones based on 2nd order closure estimation of TKE could not be used as they

require reliable information on foliage density to compute the drag coefficient [37], which is

a key input into the closure scheme. Because the solution to 2nd order closure models is often derived iteratively, errors in the LAD profile are likely to propagate and magnify. Further, independent measurements below the canopy were not available to assess the likely error resulting from the use of an approximate LAD. For these reasons, experimental parametrisation was preferred in this work. Considering that turbulence mischaracterisation is the major source of error in LS models, addressing this limitation would significantly improve the model's performance.

This limitation also tends to restrict points of comparison between this work and others to concentration estimation alone. Intermediate assessment measures in the form of varied canopy parameterisation methodologies [61, 62, 161, 212, 234, 235] could therefore not be compared with this work because this implementation, through its use of experimental parametrisation, essentially made a black box of that component of the model.


This limitation could have been mitigated or eliminated by mobilising more sonic

anemometers and utilising sophisticated canopy measurement techniques (e.g. using LIDAR

for foliage density measurement). However, given the circumstances that gave birth to the

field experiment and the subsequent quick modification of plans (see Appendix 1), this was

not possible. The experiment had to rely on equipment that was available at/to Rothamsted

Research at the time.

2. Insufficient Data Samples: Another limitation concerns the unavailability of sufficient data, in terms of both scale and number of sampling points. Insufficient data has a tendency to bias results towards unrepresentative samples, resulting in greater variance and decreased confidence in conclusions. However, the model performance metrics used to assess the model in this work are considered robust to small sample sizes [249] and remain the standard for validating air dispersion models (e.g. [47] [261]), where typically few validation samples are available [262]. Nevertheless, further evaluation of the bLS model on a larger scale or with a higher number of sampling points is advised. With respect to increasing the scale of the experiment, the very limitations of the current semi-manual methods this work is trying to address make a significant increase in scale very difficult. As seen in chapter 3 and supported by Heard [13], the most reliable identification and quantification techniques are currently manual and time-consuming in a way that makes large-scale data collection impractical.

4.7 Conclusions

In this chapter, a bLS model describing the transport of spores in an OSR canopy has

successfully been implemented. The rationale for choosing an LS model was its ability to naturally mimic particle dispersion in a turbulent atmosphere and its amenability to modifications that enable it to cope with complex environments. The

backward-time Lagrangian implementation of LS was found attractive because it requires

minimal source information and may thus estimate footprint on a conceptual basis. The

capability of bLS to only compute the trajectories of interest can be used to get probabilistic

estimates of likely travel distances of spores that are under the influence of canopy effects.

This will be helpful in informing deployment decisions of monitoring equipment or sensors.

The bLS model presented is simple, requiring only a few surface measurements to

characterize turbulence, a luxury afforded by MOST. The results suggest bLS models can be

capable of estimating concentration of Sclerotinia spores leaving an OSR canopy at sampling

points deployed above the canopy and downwind of a source. Numerous likely sources of

error that might have decreased estimation accuracy have been identified and discussed.

Limitations in the data also prevented the use of more accurate canopy parameterisation


schemes that would have improved model performance. It is concluded that correcting for

these errors and mitigating these limitations, particularly by making high-quality attributes of OSR canopies available, will improve the bLS model to acceptable standards for its intended use – reliable estimation of above-canopy spore concentrations travelling from a ground-level, below-canopy source. The model was assessed to have a FAC2 of 46%. Considering that mischaracterisation of turbulence can result in large modelling errors, it is very likely that the bLS model presented will meet the acceptable standard of FAC2 > 50%.


Chapter 5 An Integrated Fault Detection, Identification

and Reconstruction Scheme for Agricultural Systems

The previous chapters have discussed the dispersion of Sclerotinia spores on a local scale;

the near field (𝑡 < 𝑇𝐿) where spore transport is best described by an LS model [48, 54]. But

when spores escape their local canopy in sufficient numbers, they can constitute a long-distance threat to crops several kilometres away [154]. This dispersion mode is best described

by a Gaussian Plume Model [47] [147].

This chapter proposes a novel approach that is derived from several disciplines to enable the

efficient exploitation of large-scale agricultural data collected by deploying biosensors in a

network. The proposed method is an augmented monitoring procedure based on multivariate

statistical process control (MSPC) techniques that is expected to address data integrity issues

that may exist in a network, by detecting, identifying and reconstructing faulty or missing

data.

Due to the similarities between the dispersion of other particulates, such as particulate matter (PM10), and spore dispersal with respect to aerodynamic characteristics and plume distribution [147], and due to the unavailability of spore dispersion data, a pollution monitoring dataset was used to demonstrate the efficacy of the proposed method. Pollution monitoring

networks are very similar to the potential biosensor network and are expected to be

vulnerable to the same challenges, such as mechanical failure, adverse environments, theft

and vandalism of sampling equipment. There are, however, some important differences which have to be accounted for. The main one is the reliability of measuring instruments. At this stage of their technological development, as demonstrated in chapter 3, biosensors are unreliable compared to PM10 sensors and other meteorological sensors. Even the best biosensors, such as those used in healthcare (e.g. glucose sensors), are more susceptible to errors than conventional sensors due to imperfections in the synergy between biological reactions and electrochemistry. In addition, meteorological and PM10 networks have their


own data validation strategies, which are usually robust. It is therefore expected that potential biosensor networks will be more prone to errors and in greater need of data validation.

5.1 Motivation

The aim of the SYIELD project was to revolutionize agricultural disease prediction by

deploying a novel biosensor network that would be able to sample large-scale Sclerotinia

spore data efficiently. This network would comprise of numerous spore-measuring biosensors

spanning a large area, with each observation of the collected data being spatio-temporal and

large. Maintaining a sensor network that is exposed to the external environment and that is

made up of several components with finite reliabilities is a complex challenge. As a result of

this complexity, data integrity concerns arise regarding the reliability of observations, severe

missing data due to mechanical failure, theft and vandalism, and robustness of the entire

system to false positives due to suboptimal specificity of the biosensing process2. These

challenges cannot be addressed with the state of the art agricultural data collection methods

that are currently available.

As a consequence of Tobler's first law of geography, which states that “Everything is related

to everything else, but near things are more related than distant things”, efficiently sampled

spore data is spatially correlated and can be treated as a collection of multivariate

observations. Consequently, this study proposes a novel application of MSPC to detect,

identify and reconstruct potential errors in the data. MSPC is a model-based multivariate

statistical analysis set of tools that has been successful in monitoring industrial processes,

which are typically more reliable than the comparatively rudimentary biosensors considered

in this work. The proposed incorporation of MSPC into agricultural data collection has the

potential to make the monitoring of crops automated, and marks a migration from the manual

and time-consuming methods currently used [8, 263]. It is hoped that the proposed approach

will extend MSPC techniques such that the success it has achieved in industrial processes can

be realised in the agricultural industry.

The method developed in this chapter is tested on pollution data sourced from the London Air Quality Network (LAQN). As mentioned earlier, at distances far away from the source, airborne spores can be described by a Gaussian

distribution [47] [147] and are as such dispersed in a similar manner to pollution data. The

data is spatially and temporally correlated and can be modelled and analysed as highly

² As explained in chapter 3, the biosensing process is based on the proxy detection of oxalic acid, which is a pathogenicity factor of Sclerotinia spores as well as other fungi. This unfortunately means that false positives can arise from the detection of any of these masquerades.


correlated variables 3. Additionally, particulate matter of size less than 10 𝜇𝑚 (PM10) is

aerodynamically and physically similar to Sclerotinia spores, which have diameters ranging

from 12 𝜇𝑚 to 14 𝜇𝑚. Moreover, biosensors are expected to suffer from the same deficiencies

as pollution monitors, such as mechanical failure that will result in missing data. The decision

to use pollution data as a surrogate for agricultural data in demonstrating the potential

effectiveness of PCA on spore data was based on these reasons.

The next section introduces the background theory, detailing the components of MSPC

employed in this work.

5.2 Background Theory

This section presents the theoretical foundation of the main MSPC components as typically

applied in process control.

5.2.1 Principal Components Analysis (PCA)

PCA is a statistical transformation method that allows extraction of information from

correlated and high dimensional variables into new, orthogonal (uncorrelated) variables called

Principal Components or PCs [120]. These PCs are formed in such a way that the dominant

information, as represented by the largest direction of data variance, is contained in the first

PC followed by the second and so on. For a mean centred dataset X (𝑛 𝑥 𝑘) with row vectors,

𝒙𝒊𝑻, a maximum of A (𝐴 ≤ 𝑚𝑖𝑛{𝑛, 𝑘}) PCs can be formed as products of two matrices T (𝑛 𝑥 𝐴)

and 𝐏T (𝐴 𝑥 𝑘 ). When a PCA model is formed, the objective is usually to reduce data

dimensionality by having all the essential information in 𝐴 < 𝑘 PCs and discarding the

remainder. These retained PCs define the structured part of X and the 𝐴 + 1: 𝑘 discarded PCs

constitute the unexplained part of the original data and are defined as the model residuals,

E. E is the residual matrix that represents the deviations between variables and their

projections (predictions) in the PC space. The corresponding principal component model is

given by:

𝑿 = 𝑻𝑷𝑻 + 𝑬 [5.1]

and for any sample/observation, n,

\mathbf{x}_n = \sum_{i=1}^{A} t_i\,\mathbf{p}_i^{T} + \mathbf{e}_n    [5.2]

³ It is worth noting that the actual biosensor (spore) data may be more correlated than this pollution data, given that the air quality monitors were deployed near population centres and multiple line sources. The multiple sources introduce a local influence on nearby monitors, reducing the overall data correlation compared to single-source data.


where the scores contained in 𝒕𝒊 (columns of T) are the projections of samples of X in the PC

subspace, the loadings in 𝒑𝒊𝑻 (columns of P) represent the contribution of each variable in X

to the PCs retained in the model. PCA therefore transforms original data into an orthogonal

data subspace (PC subspace) and a residual subspace.

5.2.1.1 Cross-validation

Cross-validation is the method used to select model order in PC models. Methods for selecting

PCs range from the ad-hoc, which relies on the percentage of explained variance [264], to

selecting an optimal number of PCs using cross-validation methods [265, 266]. When using

the percentage of explained variance approach, the following formula is used:

ExVar = \frac{\sum_{i=1}^{A}\lambda_i}{\sum_{i=1}^{k}\lambda_i}    [5.3]

where 𝜆𝑖 is the variance of the 𝑖th component. The explained variance is normally expressed

as a percentage and PCs are added to the model until their addition does not result in a

meaningful increase in 𝐸𝑥𝑉𝑎𝑟.
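For illustration, a minimal sketch of this selection rule (the PC variances λ_i are assumed to be available in descending order; the 90% threshold is an arbitrary example, not a value prescribed here):

```python
import numpy as np

def n_components_by_variance(eigvals, threshold=0.90):
    """Smallest A whose cumulative explained variance (Eq. 5.3) reaches
    `threshold`; `eigvals` holds the PC variances in descending order."""
    frac = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(frac, threshold) + 1)
```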

More reliable cross-validation methods are based on the Predicted Residual Sum of Squares (PRESS), computed as:

PRESS(A) = \sum_{i=1}^{cv}\sum_{j=1}^{n}\sum_{l=1}^{m}\left(x_{ij,l} - \hat{x}_{ij,l}\right)^2    [5.4]

where PRESS(A) is the prediction error for A (A = 1, 2, …, A_max) components in the model, x_{ij,l} and x̂_{ij,l} are the observed and predicted jl-th elements of the i-th subgroup X_i and its estimate X̂_i respectively, and cv is the number of subgroups. The number of components retained in

the model, A, that minimises PRESS (A) is then chosen as the desirable number of

components in the model.

5.2.2 Multivariate Statistical Process Control (MSPC)

This section introduces the process control and chemometrics suite of tools known as MSPC

[264, 267-271]. MSPC is a set of statistical tools based on PCA and Partial Least Squares

(PLS) [121, 122] that have found success in industrial applications. The methods have seen

wide application in online control and batch process monitoring [272-275]. MSPC is attractive

because it enables online monitoring of multivariate processes and is flexible enough to

incorporate methods for handling missing data. MSPC techniques typically use the Hotelling

T2 and Squared Prediction Error (SPE) control limits [264, 269, 276, 277] to monitor deviation

of process variables from optimal operation. The limits are usually computed under the

assumption that the data is drawn from an independent and identically distributed set, i.e.


a Gaussian distribution. This optimal process performance is assumed to be contained in the

scores of an underlying PCA model built from healthy process data.

The monitoring aspect of MSPC is of particular interest in this work. This study intends to

apply these methods to monitor the data integrity of a biosensor network by detecting and

identifying false measurements even in the presence of missing data.

5.2.2.1 Process monitoring

Consider a multivariate process represented by the data matrix 𝑿 whose PCA model has been

described as:

𝑿 = 𝑻𝑷𝑇 + 𝑬

Assuming an effective cross-validation procedure, the model part 𝑻𝑷𝑻 will ideally contain all the

relevant information in the data and will therefore be fully representative, i.e. an abnormality

in 𝑿 will manifest as an abnormality in 𝑻𝑷𝑇. If 𝑿 is made up of the normal operation data

(minus outliers or systematic errors), 𝑻𝑷𝑇 will represent the ideal state of the system in the

PC space.

Any new observation 𝒙𝑛𝑒𝑤, scaled using the previously calculated mean and standard deviation, can be interrogated against this optimal behaviour using fewer pseudo-variables, the scores 𝒕𝑛𝑒𝑤 = 𝑷𝑇𝒙𝑛𝑒𝑤, to identify potential differences. The thresholds used to determine whether

this deviation is significant or not are the Hotelling 𝑇2 and Square Prediction Error.

Hotelling 𝑻𝟐 chart

The Hotelling 𝑇2 statistic [123] has been a useful feature in multivariate analysis for a long

time. It is based on the generalised distance of observations from their mean or the

Mahalanobis distance [278]. It can thus detect outliers, mean shifts and distributional

deviations from an optimal distribution in multivariate processes [125]. In the PC space, the

Hotelling 𝑇2 statistic for each sample interrogated against a PCA model of order 𝐴 is given by

[264]:

T^2 = \sum_{i=1}^{A}\frac{t_i^2}{\lambda_i}    [5.5]

where t_i is the i-th element of the score vector t and λ_i is its corresponding eigenvalue. The PCA control limit under the multivariate normality assumption

becomes:

CL_{T^2} = \frac{A(n+1)(n-1)}{n^2 - nA}\,F_{(\alpha,\,A,\,n-A)}    [5.6]


where 𝛼 is the confidence limit and 𝐹(𝛼,𝐴,𝑛−𝐴) is the 𝛼th upper quantile of an F-distribution

with 𝐴 and 𝑛 − 𝐴 degrees of freedom. It is worth noting that 𝑇𝑖2 now only represents mean

shifts and deviations inside the PC subspace (PCA model with 𝐴 components) [117, 279].

New observations x_new can then be used to calculate T²_new by projecting them onto the PC subspace using a variant of Eq. 5.5:

T_{new}^2 = \mathbf{x}_{new}^{T}\,\mathbf{P}\,\mathbf{\Lambda}^{-1}\,\mathbf{P}^{T}\,\mathbf{x}_{new}    [5.7]

where Λ = diag(λ_1, …, λ_A) is an A × A diagonal matrix of eigenvalues. Whenever T²_new > CL_{T²}, the sample is assumed to be out of control.

Square Prediction Error (SPE) chart

The SPE chart detects faults and errors that do not lie in the PC subspace. These errors are

undetectable by the 𝑇2 statistic [117]. SPE therefore assesses the errors that lie in the

residual subspace not represented by the first A components of the PCA model, i.e. 𝑬. This

residual subspace can be thought of as orthogonal to the hyper-plane containing the principal

components [117, 280]. Geometrically, the SPE is the squared difference between the sample observation vector, x, and its projection in the PC subspace, x̂ [117]:

SPE = \|\mathbf{x} - \hat{\mathbf{x}}\|^2    [5.8]
    = \|\mathbf{x} - \mathbf{P}\mathbf{P}^{T}\mathbf{x}\|^2 = \|(\mathbf{I} - \mathbf{B})\mathbf{x}\|^2 = \|\tilde{\mathbf{B}}\mathbf{x}\|^2    [5.9]

where B = PPᵀ and B̃ = I − B is a projection matrix that represents the transformation of x onto the orthogonal residual subspace. By this definition, SPE is a measure of the PCA model fit to the original

data. A good model will have a high projection of 𝒙 in the PC subspace and a smaller one in

the residual subspace. An out of control state occurs when the SPE is higher than a threshold

value [113, 118]:

𝑆𝑃𝐸 > 𝛿𝛼 [5.10]

As with 𝑇2, a control limit for SPE is also usually calculated under assumptions of multivariate

normality. Jackson and Mudholkar [281] provides the following formula:

\delta_\alpha = \theta_1\left(\frac{c_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + \frac{\theta_2 h_0(h_0 - 1)}{\theta_1^2} + 1\right)^{1/h_0}    [5.11]

where c_α is the (1 − α)th quantile of the standard normal distribution, and

h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}, \qquad \theta_j = \sum_{i=A+1}^{k}\lambda_i^{j}, \quad j = 1, 2, 3.

Alternatively, a weighted chi-square distribution approach can be used [279]. SPE is

alternatively called the Q-statistic when computed under these normality assumptions.
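To make the two statistics concrete, a minimal sketch (assuming an observation already autoscaled with the training mean and standard deviation, a loading matrix P and PC eigenvalues λ, in the spirit of Eqs. 5.7 and 5.9):

```python
import numpy as np

def monitoring_stats(x_new, P, eigvals):
    """Hotelling T^2 (Eq. 5.7) and SPE (Eq. 5.9) for one scaled observation.

    x_new : (k,) autoscaled observation; P : (k, A) loadings;
    eigvals : (A,) eigenvalues of the retained PCs.
    """
    t = P.T @ x_new                        # scores of the new sample
    T2 = float(np.sum(t ** 2 / eigvals))   # Mahalanobis distance in PC space
    residual = x_new - P @ t               # part of x_new outside the PC plane
    SPE = float(residual @ residual)
    return T2, SPE
```

A sample would then be flagged whenever T² or SPE exceeds its respective control limit.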


Contribution Plots

Contribution plots [277, 282-284] show the contribution of all variables to all score vectors,

thus identifying variables that breach limits. When a change or fault causes a breach of 𝑇2

and/or SPE control limits, the responsible score may be identifiable from the monitoring

charts. Variable contributions to an out of limit SPE observation can be directly inferred from

SPE charts [285]. For the 𝑇2 chart, the contribution of the 𝑘th variable to faulty observations

can be identified from the normalised scores of that observation [285, 286]:

𝐶𝑜𝑛𝑡𝑘 = 𝑝𝑖,𝑘𝑥𝑘,𝑛𝑒𝑤 [5.12]

and when more than one score is out of control, an overall average contribution of variables

to all (normalised) out of control scores is computed as [285]:

TCont_k = \sum_{i=1}^{clb}\frac{t_i}{\lambda_i}\,p_{i,k}\,x_{k,new}    [5.13]

where 𝑝𝑖,𝑘 is the 𝑖𝑘th element of the loading matrix 𝑷𝑇, 𝑥𝑘,𝑛𝑒𝑤 is the 𝑘th monitor/variable of

the new observation and 𝑐𝑙𝑏 is the number of scores that breach the control limit. Individual

variable contributions with negative values are set to zero as only contributions with the same

sign as the score increase the overall contribution [285].
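A minimal sketch of the overall contribution of Eq. 5.13 follows; the criterion used here to select out-of-control scores (a threshold cl on the normalised score magnitude) is an illustrative assumption:

```python
import numpy as np

def t2_contributions(x_new, P, t, eigvals, cl):
    """Average variable contributions to out-of-limit scores (Eq. 5.13).

    x_new : (k,) scaled observation; P : (k, A) loadings; t : (A,) scores;
    eigvals : (A,) eigenvalues; cl : normalised-score threshold (assumed).
    """
    out = np.where(np.abs(t) / np.sqrt(eigvals) > cl)[0]  # out-of-control PCs
    cont = np.zeros(P.shape[0])
    for i in out:
        ci = (t[i] / eigvals[i]) * P[:, i] * x_new
        cont += np.clip(ci, 0.0, None)  # negative contributions set to zero
    return cont
```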

Detectability, Identifiability and Reconstructability

Not all errors or faults are detectable, identifiable or reconstructable. Non-detectability is

based on the fact that the residual subspace is orthogonal to the PC-subspace. Like fault

reconstruction, fault identification also depends on minimising SPE after the occurrence of a

fault. A fault identification index has been defined by Dunia and Qin [126]:

\eta^2 = \frac{SPE_r}{SPE}    [5.14]

where η ∈ [0, 1] and SPE_r is the reconstructed SPE after the occurrence of a fault. A significant minimisation of SPE_r signifies high identifiability and will result in a value of η close to 0.

Assuming faults are detectable, when there is sufficient degree of freedom, all faults can be

identified. Dunia and Qin [126] have determined that 𝑘 − 𝐴 ≥ 2 is a necessary condition for

identifiability. Note that 𝑘 − 𝐴 is the dimension of the residual subspace and represents the

redundancy in the system.

5.2.3 Kernel Density Estimation

The KDE estimator (reintroduced from section 2.3.2.1) uses a weight function or kernel, W, which acts as a moving window on a univariate data sample of n observations (q_1, q_2, …, q_n) to estimate the distribution as shown:

\hat{f}(q) = \frac{1}{n\,h_{KDE}}\sum_{i=1}^{n} W\left(\frac{q - q_i}{h_{KDE}}\right)    [5.15]


where h_KDE is the bandwidth or smoothing parameter and W is chosen so that it is differentiable, W ≥ 0, and \int_{-\infty}^{\infty} W(q)\,dq = 1. A number of kernels satisfy these conditions, but this work uses a Gaussian kernel given by [112]:

W(q) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}q^{T}q\right)    [5.16]

Gaussian kernels are attractive because they have effectively finite support, i.e. they decay gradually to zero so that distant points are not overly influential.

The choice of h_KDE is crucial and the parameter is difficult to determine [132, 134]. A low value will result in a noisy estimate and will amplify possibly insignificant data trends, while a high value can lead to insensitivity, where important trends and distribution properties such as multimodal behaviour are missed. A good choice of W, such as the Gaussian (standard normal) kernel, can limit the extent of oversmoothing and undersmoothing, thus complementing the choice of h_KDE.
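The following sketch illustrates how a KDE-based control limit can be obtained from a sample of a monitoring statistic; Silverman's rule of thumb is used purely as a placeholder for the MISE-minimising bandwidth described in section 5.3.3.2:

```python
import numpy as np

def kde_control_limit(stat, alpha=0.01, grid_pts=2048):
    """100(1 - alpha)th percentile of a Gaussian-kernel density estimate of a
    monitoring statistic (e.g. T^2 or SPE), used as a non-parametric limit."""
    stat = np.asarray(stat, float)
    n = stat.size
    # Placeholder bandwidth (Silverman's rule of thumb).
    h = 1.06 * stat.std(ddof=1) * n ** (-1.0 / 5.0)
    q = np.linspace(stat.min() - 3 * h, stat.max() + 3 * h, grid_pts)
    # Gaussian KDE (Eqs. 5.15 and 5.16) evaluated on the grid.
    dens = np.exp(-0.5 * ((q[:, None] - stat[None, :]) / h) ** 2).sum(axis=1)
    dens /= n * h * np.sqrt(2.0 * np.pi)
    cdf = np.cumsum(dens) * (q[1] - q[0])   # numerical CDF
    idx = min(int(np.searchsorted(cdf, 1.0 - alpha)), grid_pts - 1)
    return q[idx]
```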

5.3 Methodology

This section presents the novel integrated methodology proposed in this work and its application to the London Air Quality Network (LAQN). The main components of the approach are described under the subheadings that follow.

5.3.1 Data

The data used in this work was sourced from the London Air Quality Network (LAQN). LAQN maintains a network of sensors over London to measure hourly values of particulate matter (PM10), air particles less than 10 μm in size. The data used in this study comprised 8760 hourly samples (for the period between January and December 2010) of PM10 concentration observations from 93 monitoring locations. The sampling area spanned latitudes 50° to 52° and longitudes −0.45° to 0.5°. The data is spatiotemporally correlated

due to correlations in causal attributes, such as driving habits, and to diffusive and dispersive effects, which correlate with location.

dataset with 93 variables and 8760 observations can then account for this correlation between

variables and locations. Being a real dataset, there were inevitably missing observations. The

missing measurements also follow patterns that are common of monitoring equipment, where

consecutive observations are missing due to failure or vandalism and repair is not usually

immediate.

5.3.2 Principal Component Analysis of PM10

This section describes the application of PCA to the analysis of spatial patterns in the above

described PM10 data. The employment of PCA for network monitoring is quite different from


this and is described later (see section 5.3.3.1). In this preliminary demonstration of PCA,

missing observations were estimated using the relation [287]:

\hat{x}_{i,j} = \frac{\bar{x}_{i\cdot} + \bar{x}_{\cdot j}}{2}

where x̄_{i·} and x̄_{·j} are the means of the i-th row and j-th column of X. More involved missing

data techniques are employed and discussed in section 5.3.4.
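A minimal sketch of this infill rule, assuming missing entries are coded as NaN:

```python
import numpy as np

def mean_infill(X):
    """Replace each missing x_ij with the average of the i-th row mean and
    the j-th column mean (NaNs are ignored when computing the means)."""
    X = np.array(X, dtype=float)
    row_m = np.nanmean(X, axis=1, keepdims=True)  # (n, 1) row means
    col_m = np.nanmean(X, axis=0, keepdims=True)  # (1, k) column means
    fill = (row_m + col_m) / 2.0                  # broadcasts to (n, k)
    mask = np.isnan(X)
    X[mask] = fill[mask]
    return X
```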

5.3.2.1 Model Building

The data matrix X (8760 × 93) was first autoscaled [92] by subtracting the mean and dividing by the standard deviation for each variable. This is important when some variables have high numerical

variations that can dominate data characteristics, e.g. a monitor close to a pollution source.

Autoscaling prevents this by giving each monitor a comparable influence (a unit variance) on

the PCA model. The mean-centring aspect of autoscaling makes data analysis more

informative since variable contributions to each PC are assessed relative to the origin. This

way, negative and positive contributions can be differentiated.

The scores and loadings (Eq. 5.1) were calculated using the SVD approach. SVD theory states

that for every rectangular matrix 𝑿, there exist orthonormal matrices 𝑼 and 𝑽, and a diagonal

matrix 𝑺 such that:

𝑿 = 𝑼𝑺𝑽𝑻 [5.17]

where diag(S) contains the singular values of X, i.e. the square roots of the eigenvalues of cov(X) (= XᵀX/(n − 1)), arranged in descending order of magnitude, Vᵀ represents the

corresponding loading vectors, 𝑷𝑻, and 𝑼𝑺 is equivalent to the score matrix, 𝑻, in Eq. 5.1 if

all possible 𝑘 components were retained in the model. Note that the decomposition implies

that:

cov(\mathbf{X})\,\mathbf{v}_i = \lambda_i\,\mathbf{v}_i

where λ_i (= diag(S)_i²) is the corresponding eigenvalue of cov(X).
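A minimal NumPy sketch of this model-building step (autoscaling followed by SVD; the function is an illustration under the definitions above, not the exact code used):

```python
import numpy as np

def pca_svd(X, A):
    """Autoscale X (n x k) and decompose it as X = T P^T + E via SVD."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaling
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)  # X = U S V^T (Eq. 5.17)
    T = U[:, :A] * s[:A]            # scores: first A columns of U S
    P = Vt[:A].T                    # loadings: first A rows of V^T
    eigvals = s[:A] ** 2            # eigenvalues, following the convention above
    E = Xs - T @ P.T                # residual matrix
    return T, P, eigvals, E
```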

After generating the score and loading matrices, the model was validated using the EKF cross-

validation method. EKF was chosen because of its superior accuracy and low computational

cost [266]. The EKF technique involves the following procedure [288]:

- For each component A, divide the data into cv subgroups by excluding different groups of data from the training set.
- Denote the training set and validation (excluded) set as X* and X# respectively for each subgroup.
- Form a PCA model with X*.
- Predict X# by projecting it onto the PC space and back: X̂# = X# P Pᵀ.
- Estimate the subgroup error as X# − X̂#.
- Form an error matrix for all subgroups, E_cv.
- Compute the element-wise prediction error: PRESS(A) = \sum_{j=1}^{n}\sum_{l=1}^{m}(e_{j,l})^2, where e_{j,l} is the jl-th element of E_cv.
- Calculate the root mean square error of cross-validation: RMSECV(A) = \sqrt{PRESS(A)/n}.
- Repeat for all k possible components.

The number of components, 𝐴 , that resulted in the lowest PRESS(A), or alternatively

𝑅𝑀𝑆𝐸𝐶𝑉, was selected as the model order.
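As an illustration of the PRESS/RMSECV computation, the sketch below implements a simplified row-wise k-fold variant (not the element-wise EKF itself), assuming held-out rows are predicted as X̂# = X#PPᵀ:

```python
import numpy as np

def press_curve(X, A_max, cv=7, seed=0):
    """Row-wise k-fold PRESS(A) and RMSECV(A) for PCA model-order selection."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, cv)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscale once
    press = np.zeros(A_max)
    for val in folds:
        train = np.setdiff1d(idx, val)
        _, _, Vt = np.linalg.svd(Xs[train], full_matrices=False)
        for A in range(1, A_max + 1):
            P = Vt[:A].T                      # loadings from the training set
            Xhat = Xs[val] @ P @ P.T          # project held-out rows and back
            press[A - 1] += ((Xs[val] - Xhat) ** 2).sum()
    rmsecv = np.sqrt(press / len(X))
    return press, rmsecv  # choose the A that minimises either curve
```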

5.3.3 Multivariate Statistical Process Control (MSPC)

The performance of conventional MSPC in monitoring, detecting and identifying faulty sensors

was first evaluated against PM10 data. Two aspects were specifically tested: the robustness

of the method to the type of missing data expected of a real-time monitoring network (multiple contiguous samples missing) and the susceptibility of the method to mischaracterising good measurements as bad for spatial data. The component methodologies employed are detailed in the subsections below.

5.3.3.1 Data Pre-processing and preliminary monitoring PCA model

The data pre-processing required for building a monitoring PCA model is more rigorous than

the one used in the demonstration of PCA in section 5.3.1. In MSPC monitoring applications,

the underlying monitoring model should represent an ideal behaviour of the process, so that

deviations from what is normal behaviour can be identified [289]. One way to ensure good

monitoring PCA models is to in-fill missing data by using various estimation methods [290,

291] [292]. However, where the missing data is too pervasive to estimate reliably, typically

at values higher than 20% [293], deletion methods are preferable. An exploratory analysis of

missing data revealed that at least 26 sensors had over 20% missing data (see section 5.4.2).

Before pre-processing, 500 samples with the least amount of missing data and with score

values within the interquartile range of the total dataset (as determined from PCA analysis –

section 5.3.1) were set aside as “in-control samples” for use in subsequent sections (see

section 5.3.5). A threshold of 25% (to test beyond the limits of most missing data methods)

was applied on the remainder of the data and monitors with more missing measurements

were excluded from model building. After pre-processing, the remaining missing

measurements were in-filled using nearest neighbour interpolation.

129

After processing, the now fully observed dataset was autoscaled and a preliminary PCA model was built and validated using the same procedure described in section 5.3.2.

5.3.3.2 Control Limits

Monitoring statistics (𝑇2 and 𝑆𝑃𝐸) for each sample included in the PCA model were computed

using Eqs. 5.5 and 5.9. To calculate limits for these statistics, their distribution must be known

[294]. The distribution of the monitoring statistics was tested using a Royston’s [295, 296]

multivariate normality test. Royston’s test is a significance test and is an extension of the

reliable Shapiro and Wilk [297] univariate normality test. The test was run in Matlab with a

significance of 0.05 for 2000 randomly selected samples at a time. Following the multivariate normality test, a non-parametric approach to calculating the control limits, which is better suited to non-normal data, was considered appropriate [279]. Consequently, Kernel Density Estimation (Eqs. 5.15 and 5.16), which is widely used to estimate distribution thresholds, was used [298].

Numerous methods of selecting the bandwidth have been proposed [112, 132, 133].

Phaladiganon et al. [279] have used Silverman’s rule of thumb for Gaussian approximation to

estimate the density of 𝑇2 even though this requires an assumption of Gaussian distribution

on the data. In this work, ℎ𝐾𝐷𝐸 was estimated using the common method of 𝐿2 minimisation

of the mean integrated square error (MISE) with no underlying distribution assumptions as

shown:

MISE(\hat{f}) = E\int\left(\hat{f}(q) - f(q)\right)^2 dq    [5.18]

After selecting the KDE parameters, the KDE estimator was applied to 𝑇2 and SPE residuals

to estimate their true distributions. The control limits CL_{α,T²}^{KDE} and δ_α^{KDE} were then determined by taking the 100(1 − α)th percentile of each estimated density.

5.3.3.3 Building the final PCA monitoring model

Detecting faults in this application of MSPC differs from traditional industrial applications. In

the latter, faults mostly arise from process failures that will cause a major shift from a clearly

defined optimal performance [299]. In PM10 or spore data, the only indicator of faults is an

abnormally high or low concentration reading at a particular location. This may not cause a

glaring shift in the correlation structure because the underlying “optimal process” is not as

clearly defined. It is therefore argued that the efficacy of applying PCA monitoring to

dispersion data depends on a high sensitivity of the underlying model that can detect subtle

correlation changes arising from uncharacteristic concentration changes. A highly sensitive

model clearly defines what optimal performance is. This sensitivity can be maximised by excluding samples with high T² values, which are indicative of high PM10 concentrations. To achieve this, a threshold equal to the 95th percentile (16.31) of the current T² distribution was applied and all samples above it were excluded from the final model.

For further outlier detection, the calculated control limits (𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 ) were applied

to the monitoring statistics (𝑇2 and 𝑆𝑃𝐸) and all samples breaching the control limits were

identified and excluded from the data for main model building. An intermediate model was

then rebuilt and validated as described in section 5.3.2. The justification for this is to get the

best possible monitoring model, since the best monitoring models are those that best

discriminate the behaviour they are intended to monitor [300]. The intended final application

of this integrated monitoring approach is in the fault detection of spore biosensor networks,

where ‘faults’ can be subtle measurement errors. To ensure that the monitoring model is

sensitive enough to capture these deviations, further outlier removal was carried out. A new set of CL_{α,T²}^{KDE} and δ_α^{KDE} limits was calculated for the T² and SPE of the intermediate model, and samples breaching these limits were further excluded from modelling

[276]. Finally, the remaining data was used to build a final model (as described in section

5.3.2) and a set of final 𝐶𝐿𝛼𝑇2𝐾𝐷𝐸 and 𝛿𝛼𝐾𝐷𝐸 were calculated based on a final set of 𝑇2 and

𝑆𝑃𝐸 (as described in section 5.3.3.1).
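A condensed sketch of this iterative refinement is shown below; for brevity it uses a simple empirical percentile on T² in place of the KDE-based limits on both statistics:

```python
import numpy as np

def refine_monitoring_model(X, A, alpha=0.05, rounds=2):
    """Fit PCA, drop samples whose T^2 breaches the empirical limit, refit."""
    X = np.asarray(X, dtype=float)
    for _ in range(rounds):
        Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        U, s, _ = np.linalg.svd(Xs, full_matrices=False)
        lam = s[:A] ** 2                          # eigenvalues of retained PCs
        T = U[:, :A] * s[:A]                      # scores
        t2 = np.sum(T ** 2 / lam, axis=1)         # per-sample Hotelling T^2
        X = X[t2 <= np.percentile(t2, 100 * (1 - alpha))]  # exclude breaches
    return X
```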

5.3.4 Online Fault Detection of a PM10 Network with Missing Data

The methodologies described so far are referred to as Phase I implementation of MSPC, where

knowledge of a process is gained and control limits are determined based on a desired

behaviour [280]. In the subsequent sections, Phase II is described: the implementation of online monitoring, where new observations are evaluated against Phase I (on-spec) standards. The main areas of challenge in online monitoring are missing data and fault

identification. These are discussed below.

5.3.4.1 Missing data handling during system monitoring

Missing data is ubiquitous in many environmental applications [301, 302]. As demonstrated

in section 5.3.4.1, dealing with missing data in offline applications where data samples are

abundant is straightforward – they can simply be deleted. Where estimation is required, there

is usually enough correlation in the data to reliably estimate reasonable amounts (<20%) of

missing data [293] and numerous reliable methods are available [291]. However, in online

applications, missing data is challenging because only one observation is available at a time.

In this particular case, an observation is a vector of 77 values representing the measurement

from each PM10 monitor, so deletion and conventional estimation are not possible. Online

missing data methods typically do not estimate missing measurements in the variable space.

They instead estimate the score of the new observation using the underlying PCA model.

131

Nelson et al. [303] have shown that prediction uncertainties due to unreliable missing data

handling are at times greater than the uncertainties arising from model prediction errors.

These errors are caused by the loss of orthogonality of principal components [293]. A

successfully applied [304] online missing data approach, the single component projection

(SCP), was found to be unsuitable in this case. This is because this method estimates scores

sequentially element-by-element, which can cause large errors for long observation vectors

due to propagation of errors [305]. Any single observation vector of pollution or spore data

spanning a large spatial area will be large and particularly vulnerable to this. For example, a

PM10 observation, 𝒙𝑛𝑒𝑤(77X1), could have up to 18 stations (25%) missing, which must be

sequentially estimated.

As a result of this deficiency of SCP, Projection to Model Plane (PMP) [114, 120, 293, 303],

which projects the new observation to the PC subspace to calculate the entire score vector

at once, was chosen in this work. PMP estimates the score vector as [303]:

��𝑛𝑒𝑤 = (𝑷∗𝑇𝑷∗)−1𝑷∗𝑇 𝒙𝑛𝑒𝑤∗ [5.19]

where 𝒙𝑛𝑒𝑤∗ is the complete part of the new observation, 𝑷∗ the loading matrix, and ��𝑛𝑒𝑤

the estimated score.

5.3.4.2 Implementation of fault detection

The set aside in-control observations (section 5.3) were used for testing MSPC on new data.

The missing data in the testing samples had been in-filled using nearest neighbour

interpolation as described in section 5.3.1. Three cases were tested: new in-control samples,

randomly missing data, and contiguously missing data.

In-Control Samples: 100 of the observations were scaled by factors ranging from 5 to 10, so

that inter-variable correlation was preserved but variable magnitudes were higher. This was

to test MSPC for new, in-control observations.

Randomly missing data (Case 1): 30 observations were randomly selected from the 100 in-

control samples used. The samples were divided into 3 groups of 10 each. The first group

was corrupted with missing data in the range of 6.5-10% (low), the second in the range of

10-20% (medium) and the third in the range of 20-25% (high). Each observation index that

was allocated a missing value was randomly selected from 77 possible variables.

Contiguously missing data (Case 2): It is common to have neighbouring monitors missing

data, usually because instruments are not fixed in time and numbers of failures mount. To

address this, 15 out 100 samples (from the 500 set aside) were infused with missing data.

These were divided into 3 groups of 5 (21-25, 51-55, and 81-85). For the first group (21-25),

132

the first 19 observations (corresponding to 25% missing data) were designated missing. The

same was done for the second and third groups but with observations 30-49 and 50-69

respectively designated as missing. The three groups thus represent samples with 25%

missing data in approximately the first 19, mid 19 and last 19 PM10 monitors.

These were evaluated with MSPC as follows: A score (or an estimate for Cases 1 and 2) is

calculated for each new observation, 𝒙𝑛𝑒𝑤 ( 𝒙𝑛𝑒𝑤∗). These scores are evaluated against the

control limits and violating scores are flagged.

5.3.5 Online Fault Identification in a PM10 Network

When an observation breaches the control limits, a fault is detected. But identification of

faults requires the identification of the erroneous variable(s). This section describes the

methodology for fault identification in a PM10 monitoring network.

5.3.5.1 Identifying Faults

In a PCA context, fault identification can be implemented using different indices, such as

sensor validity index (SVI) [118], reconstruction-based contribution [306, 307] or T2

contributions [299]. The SVI and RBC approach requires the fault to be isolatable, i.e.

uniquely identified [113, 118, 126]. When there are multiple faults, this is not always possible

[113, 117]. Based on these, the contribution plot [116, 307] approach, which is has been

found suitable for correlated data [118], was preferred. No contributions are calculated for

SPE, as these can be directly inferred from SPE charts [285]. For the 𝑇2 chart, the contribution

of the 𝑘th variable’s contribution to faulty observations was calculated using Eq. 5.12 and

5.13.

5.3.5.2 Implementation of fault identification

The performance of MSPC in detecting out of control samples was evaluated next. Here too,

100 samples from the 500 set aside (see section 5.3.1) were used. 4 samples were corrupted

at select positions of the observation vector as shown in Table 5.2. Samples 5 and 20 were

corrupted with values of approximately 50%-150% times the mean value drawn from an

inverse distance-weighted function, where the central station was assigned the highest value

and the farthest the lowest. Samples 5 and 20 were therefore simulated such that variables

were spatially correlated in a manner that would happen when there is a local emission of

spores or a pollutant. Samples 50 and 80 were corrupted with values randomly chosen from

a range of values spanning 50% and 400%.

133

Table 5. 1: Index of variables and sample number of corrupted observations

Sample no. Corrupted variable index

5 5-10

20 20-25

50 40-45

80 60-65

Each observation was evaluated by MSPC as described in section 5.3.4. When a sample

breached the control limits, the erring variables, in this case monitoring stations, were

identified from the contribution plot and SPE chart as detailed in section 5.3.5.1.

5.3.6 Augmented MSPC

PM10 concentrations are largely linear but also nonlinear due to the complex nonlinear

processes that influence their production and accumulation [308, 309]. A linear model, such

as the PCA model implemented in this work, will by definition explain the variance associated

with the linear correlation between PM10 monitors. Any nonlinear correlation will be assigned

to the residuals. As a result, MSPC based on this model may identify nonlinearly correlated

observations as deviating from ideal behaviour (faulty), thus leading to false positives.

Moreover, in typical monitoring applications, reconstruction is done in the PCA domain [310]

[299, 306]. These reconstructions are not always possible, as reconstructability depends on

fault attributes [311]. In other words, this type of reconstruction assumes the fault has been

correctly detected. This is not ideal for a system may trigger false alarms or where a validation

of the detection process itself is required. To address this issue, an augmented MSPC

approach is proposed. The novel approach integrates the best aspects of MSPC (reliable

detection), robust missing data handling, and a reliable spatial interpolation method to

validate fault detection and reconstruct data in a PM10 monitoring network. The proposed

approach reconstructs data regardless of type of fault and is independent of the fault

detection procedure.

5.3.6.1 Kriging

Kriging [312, 313] is an unbiased method of spatial interpolation that is based on spatial

correlation. Kriging originated from geostatistics but has been successfully applied to particle

dispersion [314-317]. While inverse distance weighting (IDW) [312] and other types of

interpolation methods also account for spatial correlation, they do so by assigning linearly

decreasing weights to points with increasing distance of separation. By contrast, kriging uses

a data-driven function to assign weights. Therefore, kriging offers improved results with

clustered and highly correlated data [317]. Another attraction of kriging is that it gives an

estimate of the error of estimation; hence it can be reliably evaluated. Numerous kriging

134

methods exist [313, 318, 319]. Ordinary Kriging (OK) was chosen in this work because it has

been successfully applied in PM10 interpolation under assumptions of local mean constancy

[317]. The ordinary kriging estimator of a spatially continuous random variable, 𝑍(𝑑0), is the

weighted sum of its values at neighbouring locations, 𝑑𝑖:

��(𝑑0) = ∑ 휁𝑖𝑍(

𝑁(𝑑)

𝑖=1

𝑑𝑖) [5.20]

under the condition that,

∑ 휁𝑖 = 1

𝑁(𝑑)

𝑖=1

where ��(𝑑0) is the estimated variable, 𝑍(𝑑𝑖) is the observed value at the 𝑖 th of 𝑁(𝑑)

neighbours whose mean is assumed to be constant, and 휁𝑖 is the weight function representing

the influence of each observed location on the estimate. The choice of 𝑁(𝑑) is unique to OK

because other forms of kriging either assume a universally constant mean or assume the

process is completely non-stationary [313, 317, 319]. In this work, due to the relative high

density of the LAQN network, specifying 𝑁(𝑑) was straightforward and the five nearest

neighbours were chosen. 휁𝑖 is calculated as the minimiser of the ordinary kriging estimator

variance [320]:

𝜎𝑂𝐾2 = 𝐶𝐾𝑟𝑖𝑔(0) − ∑ 휁𝑖[𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑0) − 𝜚(

𝑁(𝑑)

𝑖=1

𝑑𝑖)] [5.21]

where 𝐶𝐾𝑟𝑖𝑔 (0) is the true variance of the spatial variable and 𝜚 is the Lagrange operator

[320, 321]. The weights that minimise this variance can then be expressed as [320]:

∑ 휁𝑖𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑𝑗) +

𝑁(𝑑)

𝑗=1

𝜚 = 𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑0)

𝑲𝜻 = 𝒌𝐾𝑟𝑖𝑔

and since 𝑲 is positive (semi) definite,

𝜻 = 𝑲−1𝒌𝐾𝑟𝑖𝑔 [5.22]

where 𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑𝑗) is the covariance between all location pairs in 𝑁(𝑑), 𝐶𝐾𝑟𝑖𝑔(𝑑𝑖 − 𝑑0) is the

covariance between observed locations and estimated point, and

𝑲 =

[ 𝐶𝐾𝑟𝑖𝑔11

𝐶𝐾𝑟𝑖𝑔12⋯⋯𝐶𝐾𝑟𝑖𝑔1𝑗

1

𝐶𝐾𝑟𝑖𝑔21 𝐶𝐾𝑟𝑖𝑔21

⋯⋯ 𝐶𝐾𝑟𝑖𝑔2𝑗1

⋮𝐶𝐾𝑟𝑖𝑔𝑖1

𝐶𝐾𝑟𝑖𝑔𝑖2⋯⋯ 𝐶𝐾𝑟𝑖𝑔𝑗𝑗

1

1 1⋯ ⋯ ⋯⋯ 1 0 ]

; 𝜻 =

[ 휁1휁2

⋮휁𝑖𝜚 ]

; and 𝒌𝐾𝑟𝑖𝑔 =

[ 𝐶𝐾𝑟𝑖𝑔1

𝐶𝐾𝑟𝑖𝑔2

⋮𝐶𝐾𝑟𝑖𝑔𝑖

1 ]

These covariances are calculated from the variogram.

135

5.3.6.1.1 Variogram The variogram [318] is the basis of Kriging interpolation that accounts for spatial correlation

between pairs of points in space in the Kriging model, and can be empirically estimated as

[317]:

��(ℎ𝐾𝑟𝑖𝑔) = 1

2𝑁(ℎ𝐾𝑟𝑖𝑔)∑{𝑍(𝑑𝑖) − 𝑍(𝑑𝑖 + ℎ𝐾𝑟𝑖𝑔)}

2

𝑁(ℎ)

𝑖=1

[5.23]

where 𝜗(ℎ) is the estimated semivariance, 𝑁(ℎ𝐾𝑟𝑖𝑔) is the number of observed pairs 𝑍(𝑑𝑖)

and 𝑍(𝑑𝑖 + ℎ𝐾𝑟𝑖𝑔) separated by the lag, ℎ𝐾𝑟𝑖𝑔 [312]. The lag is analogous to the bin and

bandwidth in histograms and kernel density estimation respectively. A poor choice of the

parameter may result in too few or too many data points in any single bin. This can severely

affect the estimated semivariance. In this work, ℎ𝐾𝑟𝑖𝑔 was specified after evaluating the

statistics of the location data, specifically the quantiles of their distribution.

To ensure non-singularity in Eq. 5.22, kriging uses positive (semi) definite theoretical

variograms whose parameters are estimated by fitting the semivariances generated from Eq.

5.23. A number of theoretical models are available [318, 322] but selection is heuristic. The

models differ in how fast the semivariance attains the true variance of the spatial variable. In

this work, the spherical variogram was favoured because it has been widely used in particle

dispersion applications [323]. The spherical model expressed in terms of the true data

semivariance, 𝜗(ℎ𝐾𝑟𝑖𝑔), is defined as [322]:

𝜗(ℎ𝐾𝑟𝑖𝑔) = {𝐶𝐾𝑟𝑖𝑔0

+ 𝐶𝐾𝑟𝑖𝑔1(1.5

ℎ𝐾𝑟𝑖𝑔

𝑟− 0.5 (

ℎ𝐾𝑟𝑖𝑔

𝑟)

3

) 𝑓𝑜𝑟 ℎ𝐾𝑟𝑖𝑔 ≤ 𝑟

𝐶𝐾𝑟𝑖𝑔0+ 𝐶𝐾𝑟𝑖𝑔1

𝑓𝑜𝑟 ℎ𝐾𝑟𝑖𝑔 ≥ 𝑟 [5.24]

where 𝐶𝐾𝑟𝑖𝑔0+ 𝐶𝐾𝑟𝑖𝑔1

is the variance of the estimated variable at 𝑑𝑖 (alternatively called the

‘sill’ [321]) and 𝑟 is the range or the minimum distance from 𝑑𝑖 at which ��(ℎ𝐾𝑟𝑖𝑔) = 𝐶𝐾𝑟𝑖𝑔0+

𝐶𝐾𝑟𝑖𝑔1 . The variogram and kriging are extensively discussed in Clark and Harper [318],

Cressie [312] and Goovaerts [320].

5.3.6.2 Implementing augmented MSPC

The same observations used in section 5.3.5.2 were used in this section to demonstrate

Augment MSPC. To get the best variogram and therefore a higher confidence in 𝜎𝑂𝐾2 [317]

[324], an empirical variogram was generated for each observation evaluated. Observations

5, 20, 50 and 80 correspond to samples measured at 7th, 13th, 11th and 16th hour of the day.

The entire dataset was averaged on these hours generating 7th, 13th, 11th and 16th hour

averages. This was done to reduce the effect of local effects, which could result in spatial

136

heterogeneity that can affect Kriging performance [317, 319]. The four fitted variograms (in

ArcGIS v10.1 on a Windows 7, 2.4GHz Intel Core processor, 4GB RAM platform) were then

used to krige their corresponding observations. To assess confidence in the reconstructed

values, the kriging estimator variance was used. This metric was considered suitable because

it is independent of the values being estimated and is, under assumptions of good variogram

fit, a reliable assessor of kriged estimates [320]. For an interpolated (kriged) value to be

accepted, |𝑥𝑘 − ��𝑘| < 3𝜎𝑂𝐾. This is intuitive as the error between measured and estimated

values will be larger for an erroneous observation. If this error is higher than the uncertainty

in the estimated value then the measured value is validated as bad and the reconstructed

value is accepted as replacement. It is assumed that the kriging error is normally distributed

and a 97.5% confidence is applied.

Augmented MSPC integrates the robust MSPC developed in the previous sections with the

Kriging interpolation described above. The augmented MSPC procedure for each new

autoscaled observation, 𝒙𝑛𝑒𝑤 (77x1), is as follows:

Check new observation for missing data

Use PMP to estimate scores and then compute 𝑇2 and SPE for the sample

Compare current sample’s 𝑇2 and SPE against control limits

If there is a violation, identify number (scores) of violations

Compute 𝐶𝑜𝑛𝑡𝑘 and 𝑇𝐶𝑜𝑛𝑡𝑘 as appropriate

Identify erroneous variable(s), 𝑥𝑘

Reconstruct variable(s), ��𝑘, from nearest 𝑁(𝑑) error-free neighbours using kriging

interpolation

Compare ��𝑘 to 𝑥𝑘

If |𝑥𝑘 − ��𝑘| < 3𝜎𝑂𝐾, 𝑥𝑘 is not faulty. Otherwise, 𝑥𝑘 is faulty replace it with ��𝑘.

5.4 Results

This section presents the main results. Results of the PCA analysis of PM10 data are presented

first, then the MSPC results are outlined subsequently.

5.4.1 PCA Analysis of PM10

This section discusses the results of a PCA analysis of PM10 data. The MSPC result are

presented starting from the next section. The scores and loading plots of the PCA model built

in section 5.3.2 are shown in figures 5.1 and 5.2. Only the largest two principal components

are shown in both cases. Score plots show sample relationships while loading plots show

inter-variable relationships and variable contributions to PCs.

137

Figure 5. 1: Score plot showing first PC against second (numbers represent sample number

– hour of year)

Figure 5. 2: Loading plot of first vs. second PC showing all monitoring stations (numbers

represent station numbers)

The clustering of observations and variables is evident in both figures. In figure 5.1, the

dataset is more compact than in figure 5.2, suggesting that there is more temporal correlation

in the data than spatial. This is to be expected considering there is a cyclical (daily) pattern

to the emission of particulates and that the monitored area spans several kilometres. Outliers

-20 0 20 40 60 80 100 120-15

-10

-5

0

5

10

15

Scores on PC 1 (61.80%)

Score

s o

n P

C 2

(4.3

7%

)

82

109 110 112

493

618

634 635

641

1095

1151 1159

1164

1812

2147

2149

2170 2171

2435 2493

2740 2743

2744 2746 3287

3742 3743

3745

3948 3951

3963 3989

3991

3992 3995 3996 3999 4022

4071

4135 4164

4303 4327 4838

4978

5127

5528

6341

6681 6682

6727

6732 6755

6770 6787

6802

6803 6806

6828

6852

7311 7312

7359

7365

7401 7402

7436

7437 7438 7439

7440 7441

7442 7443

7444

7445

7446

7460

7462 7463

7590 7624

7644

7653 7654

7655 7656

7670 7672

7673

7741 7744 7745 7747

7828

7906

7920

7929

8080

8215 8258

8319 8364 8407 8408 8409 8410 8411

8435

8439 8442

8447 8448

8486

8505

8530 8531

8602

8628

8629

8631 8632 8633

8635

8636 8637

8645 8648 8649

8656 8658 8660

8679

8694 8696 8703

8725 8730

8733 8738

8743 8748 8753

8757 8759

8760

Samples/Scores Plot of Pollution Collated.xls

0 0.02 0.04 0.06 0.08 0.1 0.12 0.14-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

PC 1 (61.80%)

PC

2 (

4.37

%)

6 7

8

9 10

11

15

16

17

18

19

20

21

24

25 27

28

29

30 33 34

35

36

39

41 42

49

51 52

53

54

55

56

57 58

60

61

62

63 64 65

66

67

68

69 72

73

74

77

78

79

80

82

83

84

85

86 87

88

89

91

92

Variables/Loadings Plot for Pollution Collated.xls

138

are also visible in figure 5.1. These samples correspond to measurements made within 8

hours of Guy Fawkes Night! It is therefore possible for outlying or unusual measurements to

be detected by simply exploring this type of data using multivariate analytical tools. These

outliers were excluded before building the preliminary monitoring model.

The loadings plot is a good indicator of redundancy - clustering of some monitors in the

loading plots (figure 5.2) suggests they have similar influences on the PC explaining a

substantial amount (62%) of the variance. This suggests that some of these monitors can be

decommissioned without loss of information (as depicted by the model’s residuals). This is in

agreement with studies conducted across world cities, where it was found that there is a high

redundancy in air quality monitoring networks [325-329] [330].

The loadings plot can also help identify potentially troublesome monitors. Monitors with high

loadings (the PC axis values) will affect a PCA model’s performance when they become

corrupted or when measurements are missing. From a PCA loading plot, these can be

identified and paid special attention.

Overall, PCA analysis suggests there is a potential for substantial dimension reduction for air

dispersed data. This is indicated in the first PC, which explains approximately 62% of the total

data variance. Figure 5.3 shows the explained variance for the first 20 PCs. It is evident that

subsequent PCs explain a very small part of the variance (approx. 4% and 3% for the 2nd and

3rd respectively). The flat nature of the variance curve after about 5 PCs suggests that the

explained variance by these PCs is indistinguishable from random noise. The cross-validation

plot shown in figure 5.3, where the PRESS is seen to increase after 4 PCs, confirms this.

Consequently, the model order was selected as 4.

It should be noted from figure 5.3 that the PCA model only explains approximately 69% of

the variation in the pollution data. The 31% variance that is not explained might contain

important information that the model has ignored, which could be significant when the model

is employed in monitoring applications.

139

Figure 5. 3: Percentage of variance explained by first 20 PCs

Figure 5. 4: Calibration and cross-validation errors for first 20 PCs

5.4.2 Data pre-processing and preliminary model of PM10

The pre-processing results illustrate the difference in the complexity of monitoring

pollution/dispersion compared to industrial processes. Figure 5.5 and 5.6 show the

distribution of missing data in the pre- and post-processing. In figure 5.5, almost 3500

samples had approximately 15 (~20%) missing observations in a vector of 77 observations.

In fact none of the 8760 samples throughout the year was completely observed. Even though

only approximately 15% of the total data is missing, their impact is severe because they are

2 4 6 8 10 12 14 16 18 200

10

20

30

40

50

60

70

Principal Component Number

Variance C

aptu

red (

%)

Eigenvalues and Cross-validation Results for Pollution Collated.xls

2 4 6 8 10 12 14 16 18 200

1

2

3

4

5

6

7

Principal Component Number

RM

SE

CV

, R

MS

EC

Eigenvalues and Cross-validation Results for Pollution Collated.xls

RMSECV

RMSEC

140

missing in blocks for relatively prolonged periods of time. The minimum number of missing

data points in any single observation vector was 6 (8% of the total) and the maximum was

35 (45% of the total). Some of the excluded stations had as high as 80% missing, with an

average of 45% of data missing among them.

Figure 5. 5: Missing data distribution before pre-processing

Figure 5. 6: Missing data distribution after processing

After pre-processing, the data has improved. As indicated in figure 5.6, approximately 4000

samples out of 8760 had approximately 6 (~8%) missing data. The minimum and maximum

5 10 15 20 25 30 350

500

1000

1500

2000

2500

3000

3500

Percentage of missing data

No

. o

f o

bse

rva

tio

ns w

ith

mis

sin

g d

ata

0 5 10 15 20 25 300

500

1000

1500

2000

2500

3000

3500

4000

Percentage of missing data

No

. o

f o

bse

rva

tio

ns w

ith

mis

sin

g d

ata

141

number of missing data in any single observation also improved to 1 and 26 (~34%)

respectively.

Figure 5.7 shows the location of PM10 monitors across London with deleted stations

annotated. Because they are not all concentrated in one location, the distribution of these

stations suggests their exclusion from modelling may not result in a loss of information. In

fact, an evaluation of the loadings corresponding to these stations in figure 5.1 shows most

of them belong to the cluster of points to the right (figure 5.1), which as explained earlier

suggests redundancy in the monitors.

Figure 5. 7: Monitor locations showing deleted monitors (red) with excessive missing data

Figures 5.7 - 5.10 show the scores, cross-validation errors, scaled Hotelling T2 and SPE of the

preliminary model built from all the observations in LAQN PM10 data after excluding monitors

with high missing values. The SPE plot was scaled by T2 axis for comparison. Subsequent

plots of the parameters are presented as absolute values. The cross-validation plot, which

compares the cross-validation error with the calibration error, indicates that 6 PCs are optimal

for the model. The score plots show the plots first PC against all the scores retained in the

model, the T2 plot shows the deviation of each samples from the model centre, and the SPE

plot shows the error between each sample and its projection onto the model space. From

figure 5.7, the same outliers are visible on all PC combinations regardless of percentage

variance explained.

-1.2 -1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6

51.2

51.25

51.3

51.35

51.4

51.45

51.5

51.55

51.6

51.65

Longitude (deg)

La

titu

de

(d

eg

)

142

Figure 5. 8: Score plots showing the 4 largest PCs against each other

-50 0 50 100 150-60

-40

-20

0

20

Scores on PC1 (62.5%)

Score

s o

n P

C2 (

4.1

5%

)

7436

74377438

74397440

7441

744274437444

7445

7462

-50 0 50 100 150-60

-40

-20

0

20

40

Scores on PC1 (62.5%)

Score

s o

n P

C3 (

3.3

%)

7436

7437

7438

74397440

74417442

7443744474457462

-50 0 50 100 150-15

-10

-5

0

5

10

15

Scores on PC1 (62.5%)

Score

s o

n P

C4 (

2.9

%)

74367437

74387439

74407441

744274437444

7445

7462

-60 -40 -20 0 20-60

-40

-20

0

20

40

Scores on PC2 (4.15%)

Score

s o

n P

C3 (

3.3

%)

7436

7437

7438

74397440

74417442

7443744474457462

-60 -40 -20 0 20-15

-10

-5

0

5

10

15

Scores on PC2 (4.15%)

Score

s o

n P

C4 (

2.9

%)

74367437

74387439

74407441

744274437444

7445

7462

-60 -40 -20 0 20 40-15

-10

-5

0

5

10

15

Scores on PC3 (3.3%

Score

s o

n P

C4 (

2.9

%)

74367437

74387439

74407441

744274437444

7445

7462

143

Figure 5.9: Cross-validation and calibration errors

Figure 5.10: Hotelling T2 chart preliminary PCA model

2 4 6 8 10 12 140.35

0.4

0.45

0.5

0.55

0.6

0.65

0.7

0.75

0.8

Number of PCs

RM

SE

C (

gre

en),

RM

SE

CV

(blu

e)

0 1000 2000 3000 4000 5000 6000 7000 80000

20

40

60

80

100

120

Sample (hrs)

Hote

lling T

2

7436

7437

743874397440

7441

7442

7443

7444

7445

7462

144

Figure 5.11: SPE chart for preliminary PCA model

The first score plot (figure 5.9) is similar to the one shown in figure 5.1 where apparent

outliers were observed. The PCA model, having been validated with 6 PCs, explains

approximately 75% of the total data variance. The remaining 25% of the data variance is

explained by the SPE. An SPE of 25% is high for most processes but PM10 dispersion is a

larger-scale process and is nonlinear due to the dominant influence of wind variables and and

complex causal attributes [308]. As such, there are bound to be numerous directions (PCs)

of identical data variance after the dominant direction (first PC) has been identified. The

justification of using such a linear model on a nonlinear process is based on the concepts of

predominant wind direction and wind speed averaging. These approximate the dispersion

process as linear but do not completely eliminate its nonlinearity. This highlights the

difference of this application of PCA and the significance of validating detected faults.

From the score and T2 figures (figures 5.9 and 5.10), obvious outliers can be easily identified

as indicated by the numbered points. The identified outliers correspond to high observed

values with means at least 5 times that of the next highest samples. The model can thus

detect correlation breakdowns resulting from abnormally high values of PM10 measurements.

But it is also desired that correlation breakdowns resulting from relatively low values (false

negatives) be detected. False negatives result in more subtle shifts because they have a lower

limit of zero as opposed to false positives that can take any positive value higher than the

actual. For example, a false negative for a measurement with a true value of 5 units can only

be detected from the correlation breakdown that results from a maximum error of 5 units.

0 1000 2000 3000 4000 5000 6000 7000 80000

20

40

60

80

100

120

Sample (hrs)

SP

E

7436

7437

74387439744074417442744374447445

7462

145

On the other hand, a false positive has no upper bound on the error and the higher the value,

the more apparent the correlation breakdown and the easier the detectability.

The SPE plot is more sensitive as indicated in figure 5.11. While SPE is known to be more

sensitive in industrial MSPC [274], some of the sensitivity seen here is attributable to the

nonlinear nature of dispersion mentioned earlier. This nonlinearity causes high concentrations

to be observed at locations in a manner that is not consistent with the model. This results in

an estimation error by the model that carries over into the residual, the SPE. SPE is therefore

proportional to high PM10 concentrations and this sensitivity will remain for all relatively high

measurements of PM10. The sensitivity of SPE is demonstrated when the outlying values on

the SPE plot are analysed. Figure 5.11 shows the samples with high SPE values in figure 5.10

plotted based on day of the year and time hour of the day the measurements were made. It

may be seen that the data is evenly distributed throughout the year and there appears to be

clustering between approximately 9am and 4pm. These are busy hours that are associated

with high release of particulate matter and are part of the true pattern of PM10 dispersion in

London. Some of the samples are outliers, for example where the SPE is relatively high for

just one sample. But most of them represent the multiple sourcing of PM10 dispersion and

consequently the nonlinear behaviour of the process the PCA model cannot completely

account for. This makes the SPE susceptible to false alarms. Qin et al. 1997 have proposed

applying an exponentially weighted moving average (EWMA) filter to noisy SPEs such as the

one in figure 5.10. But because this application aims for higher sensitivity than the traditional

industrial MSPC, this was not employed in this work. The same threshold of 95th percentile

(47.71) was applied to the SPE chart to exclude breaching samples.

Figure 5. 12: Outliers from preliminary model’s SPE showing daily time of emission

0 50 100 150 200 250 300 3500

5

10

15

20

Day of the year

Hour

of

the d

ay

146

5.4.3 Final Monitoring Model and Control limits

Figures 5.12 and 5.13 show the T2 and SPE charts of the final monitoring PCA model after

pre-processing.

Figure 5. 13: Hotelling T2 for final PCA model

Figure 5. 14: SPE chart for final PCA model

Figure 5.14 shows the corresponding kernel density estimated distribution of the monitoring

charts and 95th percentile confidence limits (13.42 and 28.96 for T2 and SPE respectively) are

shown in figure 5.15. The density may be seen to be positively skewed for both T2 and SPE

0 1000 2000 3000 4000 5000 6000 7000 80000

10

20

30

40

50

60

70

80

90

100

Sample (hrs)

Hote

lling T

2

0 1000 2000 3000 4000 5000 6000 7000 80000

20

40

60

80

100

120

140

160

180

200

Sample (hrs)

SP

E

147

(0.727 and 0.514 respectively). This has the significance that the values less than the

respective T2 and SPE means (3.84 and 15.45) dominate the distribution, and confidence

limits set above these values may not be critically sensitive. In addition, both distributions are

unimodal, suggesting a confidence limit set based on the distributions will reflect a single type

of behaviour. This is reasonable when it is considered that the statistics only monitor one

characteristic of the data - correlation breakdowns due to abnormal values.

Figure 5. 15: Kernel density estimated distributions of Hotelling T2 and SPE

0 2 4 6 8 10 12 14 16 180

0.1

0.2

0.3

Hotelling T2

Density

0 5 10 15 20 25 30 35 40 45 50 55

0.01

0.02

0.03

0.04

0.05

SPE

Density

148

Figure 5. 16: KDE ICDF showing 95th percentile for Hotelling T2 and SPE

It is expected that this high sensitivity may result in false alarms but this is believed to be a

necessary trade-off. This is because false negatives in the observations resulting from an

insufficiently sensitive monitoring scheme bear a high cost. This is because, in the case of the

biosensors, a false negative measurement would mean not advising farmers to spray or worse

not warning them of impending threats. Additionally, a false negative would take longer to

identify and rectify since the biological measurement process is slow and only provides daily

samples. Given these considerations, the high sensitivity is justified and that is one of the

motivations behind an augmented MSPC approach that can independently assess expected

false positives.

5.4.4 Online Fault Detection of PM10 network

5.4.4.1 No Missing Data

Figure 5.16 and 5.17 show the monitoring charts for the first case. The samples may be seen

to be in control for both limits. As expected, scaling the values and interpolating missing

observations did not notably breakdown correlations as perceived by the model. The higher

sensitivity of SPE in relation to T2, as indicated by SPE values closer to their control limit than

T2 values, is still apparent.

0 10 20 30 40 50 60 70 80 90 1000

5

10

15

20

X: 95

Y: 13.42

Hot

ellin

g T2

Percentiles of the Hotelling T2 KDE

0 10 20 30 40 50 60 70 80 90 1000

10

20

30

40

X: 95

Y: 28.96

Percentiles of the SPE KDE

SP

E

149

Figure 5. 17: Hotelling T2 control chart for new in-control samples

Figure 5. 18: SPE chart for new in-control sample

5.4.4.2 Missing Data (Case 1)

Figures 5.18 and 5.19 show the monitoring charts when there are missing observations in

the samples. In this case, 30 randomly chosen samples were selected with varying amounts

of missing data (from 6.5% to 25%). The samples were divided into 3 groups of 10 each.

The first group was corrupted with missing data in the range of 6.5-10% (low), the second

0 10 20 30 40 50 60 70 80 90 1000

2

4

6

8

10

12

14

16

18

20

Samples (hrs)

Hote

lling T

2

New samples

Control limit

0 10 20 30 40 50 60 70 80 90 1000

10

20

30

40

50

60

70

80

90

100

Samples (hrs)

SP

E

New samples

Control limit

150

in the range of 10-20% (medium) and the third in the range of 20-25% (high). Variable

locations to corrupt for each sample were randomly selected from 77 possible variables.

From figure 5.18 and 5.19, some deviations from the actual values can be seen, indicating

the model is affected by the missing measurements. The sample with the highest difference

between the true T2 and the estimated T2 (at sample 20) is one of the samples in the high

percentage range. This is more clearly seen in Table 5.1, which shows the average deviation

from actual T2 (and SPE) with increasing missing data.

Figure 5. 19: Hotelling T2 control chart for in-control samples with missing data

0 10 20 30 40 50 60 70 80 90 1000

2

4

6

8

10

12

14

16

18

20

Samples (hrs)

Hote

lling T

2

Without missing data

With missing data

Control limit

151

Figure 5. 20: SPE control chart for in-control samples with missing data

Table 5. 2: Control charts with increasing missing data

Control

statistic

Mean absolute deviation for samples with missing data of:

6.5-10% 10-20% 20-25%

Hotelling T2 1.17 1.41 2.63

SPE 3.11 4.96 5.19

From table 5.1, it can be seen that higher missing data percentages cause a higher deviation

from the actual T2 value. The deterioration of the performance of missing data technique with

increasing missing data is expected since PMP calculates the score vector using the observed

part of the new sample. As missing data increases in the new observation vector, the

correlation information between the variables decreases. This performance deterioration does

not result in out-of-control performance, however, because the original samples are in-

control. PMP, like most online missing data techniques, estimates scores by projecting the

observed part of the sample onto the PC plane, i.e. onto the loading vectors. Since the

samples are in control, then the information contained in the model (the loadings) describes

them adequately enough that score estimation errors will be minimal. However, for critically

in-control samples, even a small score estimation error will result in a breach of the control

limit. This is particularly true for SPE because of the already high values of the residual due

to the noise and nonlinearity in the system as discussed earlier.

0 10 20 30 40 50 60 70 80 90 1000

10

20

30

40

50

60

70

80

90

100

Samples (hrs)

SP

E

Without missing data

With missing data

Control limit

152

5.4.4.3 Missing Data (Case 2)

A common occurrence in monitoring and sensing networks was tested next. In most cases

when a pollution monitor or environmental sensor fails, it takes a while to realise and fix the

problem. In such cases, multiple samples may be missing for days as demonstrated during

the processing of this dataset. Figures 5.20 and 5.21 show the monitoring charts when an

extreme form of this scenario occurs, i.e. maximum number of missing values become missing

in consecutive samples as well as variables.

The samples used in the preceding missing data test were used for this exercise. 15 of the

100 samples (21-25, 51-55, 81-85) were infused with missing data. For the first group (21-

25), the first 19 (corresponding to 25% missing data) observations (monitors) were

designated missing. The same was done for the second and third groups but with

observations 30-49 and 50-69 respectively designated as missing. The three groups thus

represent samples with 19% missing data in approximately the first 19, mid 19 and last 19

PM10 monitors. It may be seen that there is a marked deviation from the T2 and SPE values

of the actual samples and those of samples with missing data. T2 of the samples with missing

data has increased from an average value of 1.2647 to 3.71 (~193%) and the corresponding

SPE from 10.98 to 23.31 (112%). But only the samples in 51-55 are out of control. The other

groups remain in control despite having 25% missing data.

Figure 5. 21: Hotelling T2 chart with severe case of missing data (25%)

0 10 20 30 40 50 60 70 80 90 1000

5

10

15

20

25

Samples (hrs)

Hote

lling T

2

With missing data

Without missing data

Control limit

153

Figure 5. 22: SPE chart with severe missing data (25%)

An investigation into the cause of this led to the loading plot shown in figure 5.22. The plot

shows the loading weights (or the influence) of each monitoring station on the most dominant

direction of variance, PC1. The numbers represent monitor positions when arranged by

distance (see figure 5.7). It can be seen that variables in the range of 30-49 (samples 51-

55) have the highest proportion of influential variables among the groups of samples as

represented by a more marked clustering near the maximum PC1 value. The 19 missing

observations in samples 51-55 constitute a critical combination of missing variables. This

increases the score estimation error of PMP [303, 305] and consequently a decrease or

increase in SPE. The monitoring process does not detect all the samples as out of control

because some score estimation errors are positive. A positive estimation error indicates an

under prediction and will reduce T2 values and may increase SPE.

The case just demonstrated is a severe one as mentioned in the beginning of this subsection.

Consecutive samples and variables can routinely get missing in real deployments but not likely

at such high percentages. PMP works well in this application, as it was able to cope with 25%

missing data unless when high proportions of influential variables were missing. This is in

part due to the efficient deployment of the PM10 network. As seen in the figure 5.22, monitors

are comparably influential, giving the network redundancy.

0 10 20 30 40 50 60 70 80 90 1000

10

20

30

40

50

60

70

80

90

100

Samples (hrs)

SP

E

With missing data

Without missing data

Control limit

154

Figure 5.23: Variable loadings on PC1

5.4.5 Online Fault Identification

The control charts of the simulation are shown in figures 5.23 and 5.24. It may be observed

that MSPC detects all samples as faulty. While T2 only detects corrupted samples, SPE has

detected additional ones that are known to be false positives. The reason for these

measurements is the aforementioned nonlinearity and stochastic nature of dispersion, which

was consigned to the residuals when the model order was selected. In these cases it is

advantageous to use the combined chart shown in figure 5.25. In this chart, the samples (5,

20, 50 and 80) are clearly at fault.

-0.04 -0.02 0 0.02 0.04 0.06 0.08 0.1 0.12 0.140

10

20

30

40

50

60

70

80

1 2

3 4 5

6 7

8 9

1011

1213

1415

1617

1819

2021

2223

2425

2627

2829

3031

3233

3435

3637

3839

4041

4243

4445

4647

4849

5051

5253

5455

5657

5859

6061

6263

6465

6667

6869

7071

7273

7475

7677

Loadings on PC1 (62.5%)

Variable

/Monitor

num

ber

155

Figure 5. 24: Hotelling T2 chart for simulated out-of-control samples

Figure 5. 25: SPE chart for simulated out-of-control samples

0 10 20 30 40 50 60 70 80 90 1000

50

100

150

200

250

300

Hote

lling T

2

Variable

Out of control

Control limit

0 10 20 30 40 50 60 70 80 90 1000

200

400

600

800

1000

1200

Variable

SP

E

Out of control

Control limit

156

Figure 5. 26: SPE-T2 chart for simulated out-of-control samples

It should be noted that samples 5 and 20 are in the context of this application not erroneous

as they may represent realistic events due to local spore/pollen/particulate pollutant sources.

MSPC perceives these samples as having broken the correlation structure. To the PCA model,

the high random values of samples 50 and 80 are as outlying as 5 and 20. This is because

the PCA model does not evaluate correlation within a group of variables (5-10 and 15-25 in

samples 5 and 20) but rather evaluates correlation among all variables in consideration of

past behaviour learned during model training. In other words, the PCA model is not aware of

the local spatial correlations. As long as this spatial correlation is between a few variables, as

is often the case during local release events, the PCA model may always consider that

correlation a deviation from optimal behaviour.

Normally when a reliable detection is made, the contribution plots are inspected to identify

the erroneous variable. Figures 5.26 and 5.27 show the contribution plots to the T2 and SPE

errors. It should be noted that variables with negative variables should be ignored.

In this case, because more than one variable is at fault, all SPE and T2 have multiple

contributory variables as shown. It can be seen that the SPE chart can isolate the faults more

than the T2 chart. This is because SPE contributions are directly defined in the SPE charts

while T2 are derived through approximations [307].

0 50 100 150 200 250 3000

200

400

600

800

1000

1200

Hotelling T2

SP

E

Samples

SPE control limit

Hotelling T2 Control limit

157

Figure 5. 27: Hotelling T2 Contribution plot for 4 corrupted samples (Table 5.2)

Figure 5. 28: SPE Contribution plot for 4 corrupted samples (Table 5.2)

The T2 chart cannot uniquely identify the erroneous variables. From the SPE chart, however,

all (perceived) erroneous variables have been uniquely identified. SPE’s higher sensitivity to

0 20 40 60 80-3

-2

-1

0

1

2

3

Variable

Hot

ellin

g T2 C

ontr

ibut

ion

0 20 40 60 80-1.5

-1

-0.5

0

0.5

1

1.5

Variable

Hot

ellin

g T2 C

ontr

ibut

ion

0 20 40 60 80-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

2.5

Variable

Hot

ellin

g T2 C

ontri

butio

n

0 10 20 30 40 50 60 70 80-4

-2

0

2

4

6

8

Variable

Hot

ellin

g T2 C

ontr

ibut

ion

0 10 20 30 40 50 60 70 80-5

0

5

10

15

20

Variable

SP

E C

ontr

ibutio

n

0 10 20 30 40 50 60 70 80-4

-2

0

2

4

6

8

10

12

14

Variable

SP

E C

ontr

ibutio

n

0 10 20 30 40 50 60 70 80-5

0

5

10

15

20

Variable

SP

E C

ontr

ibutio

n

0 10 20 30 40 50 60 70 80-10

-5

0

5

10

15

20

25

Variable

SP

E C

ontr

ibutio

n

158

faults compared to T2 has been reported for online monitoring processes [274]. Alcala and

Qin attribute the low discriminatory power of T2 contribution plots to the approximations

during estimation of T2 contribution compared to direct calculations for SPE [307]. For

samples 50 and 80 that are actually bad measurements MSPC works. But the approach

mischaracterises potentially good measurements because it lacks spatial awareness.

Therefore, from the investigations carried out so far, false alarms can arise from three sources

when MSPC is extended to spatial data:

When combinations of influential variables become missing

High sensitivity of the PCA model to concentration changes

Lack of spatial awareness by traditional MSPC

5.4.6 Online Fault Detection in a PM10 Network

The proposed augmented MSPC procedure (section 5.3.6) can independently confirm

measurements suspected as erroneous by traditional MSPC. As proposed in section 5.3.6, the

difference between the measured and Kriged values is considered significant if |𝑥 − ��𝑘| >

3𝜎𝑂𝐾. Applying the procedure to the relevant variables in samples 5, 20, 50 and 80, the results

are shown in table 5.3. It may be seen that kriging identifies all but one (highlighted in red

in Table 5.3) of the mischaracterised variables from sample 5 and 20 as “good” based on

their neighbours. The variables in the actual faulty samples (50 and 80) have also been

confirmed faulty by this procedure. The single variable kriging could not certify highlights one

of the limitations of the approach. The variable in question is the 6th variable in sample 5 as

such it has the fewest neighbours in the entire group shown.

Kriging efficiency decreases if a variable is at the edge, i.e. when all its neighbours are at

successive distances away from it [321]. This is because the Kriging weights are functions of

distance between pairs alone. They are independent of the value of the variable being

estimated. For such a monitor/variable, the weight given to the nearest neighbour becomes

more influential and may result in a significant (higher than the estimation variance) over

prediction or under prediction depending on the value of the neighbour.

159

Table 5.2: Augmented MSPC results showing deviation of corrupted variables from their kriged

estimates and the kriging estimator’s variance

Sample no. 𝒙 ��𝒌 |𝒙 − ��𝒌| 𝟑𝝈𝑶𝑲

5

163.10 178.36 15.26

11.69

201.37 207.39 6.02

8.70

240.54 233.67 6.87

12.64

172.32 179.37 7.05

9.04

190.38 187.84 2.54

9.08

161.24 164.98 3.74

13.32

20

116.32 104.76 11.56

15.53

174.09 171.44 2.65

7.65

131.35 139.64 8.29

9.41

127.93 126.24 1.69

10.81

118.40 124.91 6.51

9.71

116.70 98.89 17.81

7.96

50

326.42 240.84 85.58

14.53

186.80 220.37 33.57

16.33

414.27 346.65 67.62

10.11

418.03 302.17 115.86

7.61

282.35 314.51 32.16

13.77

376.38 266.41 109.97

8.42

80

108.25 87.29 20.96

9.10

325.89 191.30 134.59

12.27

141.06 178.51 37.45

9.16

432.72 280.23 152.49

8.29

438.95 302.65 136.3

16.44

298.07 274.55 23.52

9.47

The variance of the kriging estimator is a reliable assessor of significance of deviation because

it is independent of estimated values. For monitors with sufficient neighbours, kriging

estimation will work well in verifying or rejecting an SPE detection of a faulty monitor. The

most important feature of kriging can be observed in the relatively unchanging values of its

variance compared with the changing values of the variables being measured. This

independence makes the kriging estimator variance a reliable assessor of significant

deviation.

160

5.5 Discussion

5.5.1 Integrated Fault Detection, Identification and Reconstruction

in a PM10 Network

In this work, a model-based process monitoring scheme capable of fault detection and

identification used in chemometrics and industrial processes was adapted and successfully

applied to PM10 dispersion over London. The conventional MSPC methodology was first

fortified with an online missing data handling technique that is less sensitive to multiple

monitor failures. To account for the non-normal distribution of the data, a parametric

approach to control limits calculation using Kernel Density Estimation was adopted. Then, a

data-driven unbiased spatial interpolation method, Kriging, was integrated into the robust

fault detection to enable validation of the detection process as well as reconstruction of

validated ‘faults’. The intended application of this method is in monitoring, fault detection,

identification and reconstruction of spore concentrations coming from a biosensor network.

This has the potential to be applied to any automatically detected fungal spore of comparable

size (10-14𝜇𝑚).

In adapting MSPC to this data, considerations were given to the likely type of faults that will

occur. In the process industry an “optimal behaviour’” of a process is clearly defined [331].

This may be maintaining a product (output) at a desirable quality or minimising energy or

input consumption below a certain threshold. The monitoring model is typically built offline

for this optimal process behaviour and faults, when they occur, will result in a deviation from

this desired specification that will cause a correlation breakdown [332]. For monitoring

networks, an ideal behaviour is not as clearly defined. To define the optimal process for this

application, the significance of each fault (false positive or false negative) was considered.

False negatives from the point of view of health, environmental and agricultural monitoring

networks are more costly than false positives [333]. This is because their effects are often

irreversible. Specifically, for the potential spore network, a false negative will entail not

warning growers/farmers to apply fungicides, subjecting crops to irreversible risk and/or

damage. In light of this, an optimal behaviour for the monitoring model was defined as a

sufficiently sensitive model that will detect bad measurements due to biosensor drifts or even

contamination. To achieve this specification in the monitoring model, the model was

iteratively built with samples having high T2 excluded from modelling at each step. The

implication of this is that the control limits were lower and more sensitive to small changes in

correlation structure, but also the monitoring scheme was more sensitive to false positives.

This was considered a necessary trade-off considering the higher cost of false negatives as

explained earlier. False positives were accounted for through the augmented MSPC scheme

161

where every detection was validated based on the spatial correlation that may have been

missed by the PCA model.

The results show that MSPC can be extended to spatial systems such as the PM10 monitoring

network, potential biosensor and a host of other environmental monitoring applications. PCA

analysis showed that there is a high redundancy in the PM10 network, suggesting some

monitors can be decommissioned without loss of data. This is in agreement with multiple

studies carried out across the world’s cities that found high redundancies in air quality

monitoring networks [325, 326, 334] [327-329]. The capability of a PCA-based data validation

technique to detect unusual samples was also demonstrated. The integrated missing data

method, PMP, was shown to be robust to missing data of up to 25% missing data, a value

higher than is typically supported by most online missing data methods [335] [293] [303].

However, when missing data was in influential variables, the monitoring scheme was found

to be susceptible to false alarms, due to the higher values of the score estimation errors

under such circumstances [305]. Dealing with missing data in online applications is

challenging. Recent applications of online monitoring to real-world systems either ignore it

[332] or propose solutions without stating how much missing data the solution can handle

[301]. In this work, it was shown that the proposed monitoring scheme can handle up to

25% missing data while maintaining its ability to detect erroneous measurements. MSPC also

successfully identified faults (monitors that reported values outside the ideal system

behaviour defined by the model), although there were some false positives associated with

samples that were simulated to reflect source events. These were attributed to an assumption

of Gaussian distribution by PCA [336] and its subsequent inability to identify local spatial

correlation events in PM10 distribution, which can result in spatial heterogeneity [337]. The

developed augmented scheme was able to validate these false detections and independently

reconstruct faults using Kriging. The kriging estimator variance, 𝜎2𝑂𝐾, used as a confidence

threshold for data reconstruction was successfully used to validate all but 1 wrong MSPC

detection. It was noted that the missed variable (monitor) was at the edge of the Kriging

domain and as such had the fewest number of neighbours available for estimates.

Some of the component methodologies used in this work have been applied to PM10 albeit

in a different manner. PCA has been used mainly for dimension reduction to identify

redundancy [325, 327-329, 334]. It has also been used in source apportionment of PM10 and

PM2.5 concentrations, where the pattern recognition (clustering) power of the technique is

exploited [338, 339]. In these applications, the PCA models were found to explain a high

percentage variance (often > 90%) of the PM10 concentration. In this work, the explained

variance by the validated PCA model was 69%. The difference in the explained variances is

due to two reasons. First, the studies found in literature were motivated by maximum

162

explained variance and so were not restricted by the PRESS. The PRESS can indicate

overfitting, a situation where PCs that describe residuals and not useful information in the

data are added to the model. The effect of overfitting is more impactful when a model is used

to make predictions (estimate scores in this case), but not usually severe for pattern

recognition although it can bias findings [340]. Second, the studies used multiyear data (at

least 5 yrs.) as opposed to single year data used in this study. Using multiyear data can

strengthen the temporal correlations in the data, especially with respect to seasonal

variations. This increases the explained variance by the model. Multiyear data was not used

in this work because using a single year data (specifically 2009-2010) maximised the number

of stations available.

Different variations of Kriging have also been used to predict gaseous pollutants [341],

reconstruct spatiotemporal fields [302] and estimate PM10 concentrations [317, 324, 342].

Most notably, Wong et al. [317] applied kriging to estimate PM10 concentrations in the United

States of America. They found good agreement nationally between measured data and kriged

estimates although kriging performed badly in regions with poor distribution of monitors

where deployment was influenced by expected exceedances. In this study, a better

agreement with measured data (an aggregated 𝑅2 = 0.89 for all samples with all four

variograms) was found due to a higher density in monitors and a variogram fitting

methodology that maximised spatial continuity. The hour-of-day averaging employed reduced

the effect of local sources, thus improving variogram fit.

No application was found in literature of PCA being used as a monitoring model or of a similar

(to this work) integrated scheme being applied to PM10 networks. This is partly because data

validation (Quality Assurance) for environmental networks (PM10 and Weather) is done in-

house, usually governed by agency standards. For example, the US Environmental Protection

Agency (USEPA) uses a 4-level data validation process to check the health of air quality

measurements at each individual sensor location [343]. These networks also use sampling

equipment that is more reliable than the spore biosensor that is the intended beneficiary of

this work. For example, the Tapered Element Oscillating Microbalance (TEOM) analysers

predominantly used in LAQN have a precision of ±0.5𝜇𝑔𝑚−3 [344] over a measurement range

of 0 to 1𝑔𝑚−3, although a correction factor is added to standardise measurement [345]. This

is high precision compared to what is achievable for biosensors at this stage of their

technological development [346].

A few relevant methodologies were however identified. Hill and Minsker [347] propose a

method of anomaly detection for environmental sensor networks based on a time series

approach that compares new observations with previous ones over a moving window. If this

163

difference is higher than a threshold the measurement is classified as anomalous and it record

is removed from future updates of the underlying time-series model. The main limitation to

this approach is that extending it to multiple sensors over a large network will be

computationally expensive since these steps are on the individual sensor level. The main

attraction of the method proposed here is that the dimension reduction capabilities of PCA

ensure that only a few variables (sensors) are checked to determine if a fault has occurred,

since only one score is calculated and evaluated for fault at the detection stage. The method

proposed by Hill and Minsker [347] will require a check at every sensor node for each sample

to detect anomalies. In fact, the approach proposed in this work will retain an advantage over

most change-detection methods that are implemented in the variable space. This is because

the dimension reduction of PCA allows a system of hundreds of sensors to be adequately

described by significantly fewer pseudo-variables in the PC space, and only a single score will

need to be computed to detect a fault over the entire network at any time. Most of the guidelines used in meteorological data validation suffer from this one-by-one approach to sensor error detection [348]; they additionally lack missing data handling capabilities [347, 348]. The integrated fault detection, identification and reconstruction scheme applied in this work is therefore considered both unique and advantageous over existing methods.
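To make this concrete, the following minimal sketch (in Python with NumPy; the synthetic data, the three retained components and the empirical 99% limit are illustrative assumptions, not the exact model built in this work) shows how a PCA model reduces a multi-sensor network to a single residual statistic per sample:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 40))        # training data: 500 samples x 40 sensors
    X[:, 1:] += X[:, :1]                  # induce correlation between sensors

    mu, sd = X.mean(0), X.std(0)
    Xs = (X - mu) / sd                    # mean-centre and scale the training data

    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:3].T                          # loadings of the 3 retained components

    def spe(x):
        """Squared prediction error (Q statistic) of one sample against the model."""
        xs = (x - mu) / sd
        resid = xs - P @ (P.T @ xs)
        return float(resid @ resid)

    limit = np.percentile([spe(x) for x in X], 99)   # empirical 99% control limit

    x_new = rng.normal(size=40)
    x_new[1:] += x_new[0]
    x_new[7] += 8.0                       # simulate a gross error on one sensor
    print(spe(x_new) > limit)             # one comparison flags the whole network

A single comparison against the control limit covers all forty sensors; identification of which sensor caused the excursion only begins once this network-level flag has been raised.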

The choice of kriging was based on its reliability in spatiotemporal and environmental data applications [349, 319]. Methods like Artificial Neural Networks (ANNs) [350, 351], which are self-learning black-box models, are promising but can suffer from over- or under-learning [349], leading to errors. There is no universally good interpolation method; suitability depends on the attributes of the data [319]. In this work, the density of the sampling network and the spatial continuity of concentration for the period in question made kriging an ideal choice.

Additionally, kriging provides a measure of estimation precision, $\sigma^2_{OK}$, which proved useful in assessing the reliability of the reconstructed value. Land Use Regression (LUR), a highly successful method of predicting PM10 in recent years, was also ruled out due to its input requirements. LUR needs covariates as inputs, usually source information such as emission intensity and details [352], in addition to monitoring station values. In the intended application of this technique (Sclerotinia spore dispersion), source information, by far the most important covariate, is normally not available [29]. Unlike PM10, for which land use attributes, such as roads and industrial buildings, co-vary with emission patterns, spore dispersion depends mostly on source attributes (size and strength), which are typically not available. Kriging requires only readily available inputs (existing monitoring stations and station location information) to make estimates, and this is why it was preferred.
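For illustration, a minimal sketch of ordinary kriging is given below (Python/NumPy; the exponential variogram parameters and station values are invented for the example and are not those fitted in this work). It returns both the estimate and the estimation variance $\sigma^2_{OK}$ referred to above:

    import numpy as np

    def exp_variogram(h, nugget=0.0, sill=1.0, rang=5.0):
        """Exponential variogram model gamma(h)."""
        return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rang))

    def ordinary_krige(coords, values, target):
        """Ordinary kriging estimate and variance sigma^2_OK at one target point."""
        n = len(values)
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = exp_variogram(d)       # station-to-station semivariances
        A[-1, -1] = 0.0                    # Lagrange multiplier row/column
        b = np.ones(n + 1)
        b[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1))
        sol = np.linalg.solve(A, b)        # weights plus Lagrange multiplier
        w, lam = sol[:n], sol[-1]
        return w @ values, w @ b[:n] + lam # estimate and kriging variance

    coords = np.array([[0., 0.], [1., 0.], [0., 1.], [2., 2.]])
    vals = np.array([10., 12., 11., 15.])
    est, var_ok = ordinary_krige(coords, vals, np.array([0.5, 0.5]))
    print(round(est, 2), round(var_ok, 3))

Notably, the returned variance depends only on the station geometry and the variogram, not on the measured values themselves, which is what makes it useful as an independent measure of estimate reliability.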

Over the last two decades, fungal diseases have caused unprecedented damage to both plants and animals, and constitute a major threat to food security [9]. A majority of these pathogens are


dispersed by air over large distances [3] and are the most dominant species among bioaerosols within the 2-10 $\mu m$ size range [353]. Detecting these spores in an efficient

manner has been the biggest impediment to addressing the challenge they pose, as a result

of limitations in current measurement and sampling equipment [10]. Measurement of these pathogens has relied on a two-step process, in which spore collection and quantification are done at separate stages. The reliable quantification methods are time-consuming and tedious (e.g. qPCR [63]), thus discouraging the large-scale data collection that would improve our

understanding of governing dispersal processes. In a recent Nature article, Fisher et al. [9]

argue that current modelling approaches and small-scale experiments cannot fully predict

disease spread and severity. They instead call for intensive monitoring and surveillance of

fungal pathogens. Empirical approaches to data collection would enable the building of more robust models of both spore dispersal and disease prediction than the local, situation-specific phenomenological models that limited data collection currently allows [258]. The advent of rapid

measurement techniques in airborne pathogen detection, specifically biosensors, provides

hope that automatic detection will enable the realisation of this empirical approach [13, 29].

However, these detection techniques, while promising, are still at an early stage of their technological development [171]. Biosensors are less precise, less accurate and generally less reliable than conventional sensors, due to imperfections in the synergy between biological reactions and electrochemistry [182]. These limitations at the individual biosensor level will most likely be compounded when biosensors are deployed in a network for large-scale

sampling, where they will have to cope with challenging environmental factors and vandalism

[354]. Data integrity and validation are therefore needed to address the enormous challenges

facing these networks, specifically with respect to missing data and measurement errors

[354]. The methodology proposed in this work is in line with this need and can be extended

to any fungal spore detection network.

5.5.2 Limitations of K-MSPC

The proposed integrated scheme has a number of limitations. First, the PCA model used in

this work is a linear model. It assumes a multivariate Gaussian distribution of data [336] and

will, as was observed, be susceptible to false positives when these assumptions are violated, even for faultless observations. The nonlinearities in PM10 dispersion, although mild, were suppressed by the effect of the predominant wind direction on dispersal and by the abundance of data available to train the model in this case. The large disparity between the number of samples used to train the model and the size of the sets of new observations (500) meant that the assumption of stationarity was not violated. In real applications, the process (PM10 dispersion) may be nonstationary

and there will be a need for the model to adapt so that incessant false alarms are avoided.

To cope with such process shifts, an adaptive PCA scheme [355, 356] should instead be used.


Second, this methodology is intended to be applied online. Currently, variogram fitting is

manual and implemented in a separate package from the MSPC scheme. Incorporating recent methods

of automatic variogram fitting [357] will enable the full integration of data reconstruction

(Kriging) with fault detection and identification, and fully automate the scheme.
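A common route to such automation, sketched below (Python, NumPy/SciPy), is to bin the empirical semivariances and fit the model parameters with a bounded least-squares optimiser; this is one standard approach and not necessarily the method of [357], and the synthetic stations and bin edges are illustrative assumptions:

    import numpy as np
    from scipy.optimize import curve_fit

    def empirical_variogram(coords, values, bins):
        """Bin-averaged semivariance 0.5*(z_i - z_j)^2 against pair separation."""
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        g = 0.5 * (values[:, None] - values[None, :]) ** 2
        iu = np.triu_indices_from(d, k=1)      # each station pair counted once
        h, gamma = d[iu], g[iu]
        idx = np.digitize(h, bins)
        centres = 0.5 * (bins[:-1] + bins[1:])
        sv = np.array([gamma[idx == i + 1].mean() if np.any(idx == i + 1)
                       else np.nan for i in range(len(centres))])
        return centres, sv

    def exp_model(h, nugget, sill, rang):
        """Exponential variogram model."""
        return nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / rang))

    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 10, size=(80, 2))  # synthetic station locations
    values = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=80)
    centres, sv = empirical_variogram(coords, values, np.linspace(0, 8, 9))
    ok = ~np.isnan(sv)                          # drop any empty distance bins
    params, _ = curve_fit(exp_model, centres[ok], sv[ok],
                          p0=[0.01, float(np.nanmax(sv)), 3.0],
                          bounds=(0, np.inf))
    print(dict(zip(["nugget", "sill", "range"], np.round(params, 3))))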

Another limitation of this work relates to the aforementioned data non-stationarity. Due to

the amount of historical data available for this work, individual variograms were fitted for

each sample kriged. This ensured that the assumption of mean constancy was valid. During

real data monitoring, the spatial structure of the surface could substantially change and

ordinary kriging may give poor estimates [319]. For this reason, it is recommended that universal kriging [341, 319], which accounts for nonstationarity by modelling a trend in the mean, should be used instead.

5.6 Conclusion

In this chapter, the potential of multivariate analysis tools for use in the proposed biosensor network has been demonstrated. The utility of dimension reduction methods, which allow straightforward analysis of high-dimensional data, was shown first: this initial analysis was able to identify redundancies in the pollution monitoring network. MSPC was then introduced, equipped with missing data handling capabilities, and applied to a PM10 monitoring network with a view to optimising it for the potential biosensor network. Confidence limits based on non-parametric density estimation techniques successfully and consistently detected simulated faults. It was observed that the effectiveness of missing data techniques depends on the efficiency of deployment in the network, because efficient deployment minimises the probability of critical combinations of variables becoming missing. MSPC was able to handle up to 25% missing data even when variables were contiguously missing, demonstrating its potential for real-world extension to spatial networks.
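The missing data handling summarised above can be made concrete with a small sketch. The version below assumes the projection-to-the-model-plane approach (one common method, not necessarily the exact variant used in this work); `mu`, `sd` and `P` denote the model mean, scaling and loadings as in the earlier PCA sketch, and the scores are obtained by regressing the observed entries on the corresponding rows of the loading matrix:

    import numpy as np

    def scores_with_missing(x, mu, sd, P):
        """Estimate PCA scores from a sample containing NaNs by least squares
        on the observed rows of the loading matrix ("projection to the model
        plane"), then reconstruct the missing entries from the model."""
        obs = ~np.isnan(x)
        xs = (x[obs] - mu[obs]) / sd[obs]
        t, *_ = np.linalg.lstsq(P[obs], xs, rcond=None)  # scores from observed part
        x_hat = mu + sd * (P @ t)                        # full back-transformed sample
        return t, np.where(obs, x, x_hat)                # fill only the gaps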

Areas of concern in the extension of traditional MSPC to spatial data were identified. The mild nonlinearity in dispersion processes, arising from the short-term nonlinearity of wind variables, was one of the main concerns; this nonlinearity is believed to be responsible for the relatively high proportion of the process variance explained by the residuals (31%). Another concern arises from the subtlety of correlation breakdowns caused by changes in concentration values, in contrast to industrial processes where a deviation from a clearly defined specification is of interest. This was addressed by making the underlying PCA model highly sensitive, excluding all values larger than the 95th percentile of the data distribution.

After adapting and applying traditional MSPC to the process, its limitations were identified as false alarms resulting from the underlying model's high sensitivity and PCA's


inability to exploit spatial correlation. This was demonstrated through a simulated scenario

where MSPC mischaracterised a potentially healthy measurement as an erroneous one.

To address these limitations, a novel augmented MSPC procedure (K-MSPC) was proposed, which uses kriging to independently verify or reject detections and to reconstruct faulty measurements. K-MSPC successfully certified the mischaracterised samples as healthy.
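The decision logic of this verification step can be sketched as follows (Python; the cut-off of three kriging standard errors and the function names are illustrative assumptions, and `krige` can be any ordinary kriging routine, such as the sketch given earlier in this chapter):

    import numpy as np

    def verify_and_reconstruct(coords, z, j, krige, k=3.0):
        """Cross-check an MSPC flag on sensor j against a leave-one-out kriged
        estimate; krige(coords, values, target) must return (estimate, variance)."""
        mask = np.arange(len(z)) != j
        est, var_ok = krige(coords[mask], z[mask], coords[j])
        if abs(z[j] - est) <= k * np.sqrt(var_ok):
            return "healthy", z[j]        # deviation explained by the spatial field
        return "faulty", est              # confirm the fault; reconstruct with kriged value

A flagged measurement whose deviation from the leave-one-out kriged surface lies within the cut-off is certified healthy (a false MSPC alarm); otherwise the flag is confirmed and the kriged estimate is used to reconstruct the faulty value.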


Chapter 6 Conclusion, Recommendations and Future Work

This chapter summarises the principal findings of this study and identifies opportunities for

further research. The summary of research undertaken is divided into four parts:

- Overview of Project Conception and Motivation

- Summary of Principal Findings

- Real-world Applications of Research

- Future Work

6.1 Overview of Research Motivation

The PhD programme presented in this thesis is the result of a four-year multidisciplinary effort on improving agricultural innovation. With the global population exploding and competition for

increasingly scarce resources rising, achieving and maintaining food security is the foremost

challenge of this century. The broader research area investigated during this programme was

conceived from a goal to contribute towards solving this challenge through the reduction of

crop loss and minimisation of fungicide use. This was to be achieved through the introduction

of an empirical approach to agricultural disease monitoring.

The SYIELD project, initiated by a consortium involving the University of Manchester, Syngenta and Gwent, among others, sought to address this challenge by proposing a network of biosensors that can

electrochemically detect airborne pathogens by exploiting the biology of plant-pathogen

interaction. This approach offers significant improvements over the current inefficient,

imprecise and largely theoretical or experimental methods used. The proposed biosensor

network approach will make actionable data available and enable the adoption of advanced

data analysis tools from other disciplines that will make disease risk forecasting robust,

simplify quarantine measures and make crop protection and fungicide use more efficient.


Within this context, this PhD focused on the adoption of multidisciplinary methods to address

three key objectives that are central to the success of the SYIELD project: local spore ingress

near canopies, the evaluation of a suitable model that can describe and estimate spore travel

distances, and multivariate analysis of a potential pathogen-detecting biosensor network.

6.2 Summary of Principal Findings

A brief summary of the research work done and the main findings thereof is given here.

The main areas are addressed as follows.

6.2.1 Field trial experiment and generation of novel data

The local transport of spores in an OSR canopy was investigated by carrying out a field trial

experiment at Rothamsted Research, UK. The aim of the research was to investigate spore

ingress in OSR canopies, generate reliable data for testing the prototype biosensor and

evaluate a trajectory model. During the experiment, spores were air-sampled and quantified

using several methods. Colourimetric detection and the prototype biosensor

were used to test for oxalic acid, an established pathogenicity factor of Sclerotinia spores,

and quantitative Polymerase Chain Reaction (qPCR), a DNA amplification technique, was used

to measure actual spore concentration. As expected, qPCR results outperformed the proxy

measurements. The results provided an insight into the filtration effect of OSR canopies and

heavy ground deposition of spores near the source. The research also enabled the evaluation

of various sampling heights (potential deployment heights of biosensors) from which an

optimal height was identified. Results from test of oxalic acid with the prototype biosensors

and colourimetric test also revealed a low sensitivity for the former, suggesting proxy

measurements may not be reliable in live deployments where spores are likely to be

contaminated by impurities and inhibitors of acid production. The actual spore results

measured using qPCR proved informative and provide a novel source of data that will be

useful for a wide array of applications. This data was found to fit a power decay law, a finding

that is consistent with experiments involving fungal spores in other crops.
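Such a power decay law, $C(x) = a x^{-b}$, can be fitted by simple linear regression in log-log space; the sketch below (Python/NumPy) uses invented concentration values purely to illustrate the procedure, not the experimental data:

    import numpy as np

    # Hypothetical concentration gradient: spores m^-3 at distances (m) from source.
    x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    c = np.array([950.0, 410.0, 180.0, 75.0, 33.0])

    # C(x) = a * x**(-b) is linear in log-log space: log C = log a - b log x.
    slope, log_a = np.polyfit(np.log(x), np.log(c), 1)
    a, b = np.exp(log_a), -slope
    print(f"C(x) ~ {a:.0f} * x^(-{b:.2f})")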

6.2.2 Evaluating a 3D bLS model with experimental data

In the second area investigated, a 3D backward Lagrangian Stochastic (bLS) model was

parameterised and evaluated with the field trial data. The bLS model was chosen because

spore ingress, rather than spore concentration, was of primary concern. A model’s ability to

estimate concentrations reliably is a good indicator of its ability to compute trajectories, since

concentrations are computed from residence times of ensemble trajectories. For this reason,

the evaluation of a bLS model on experimental data was carried out. The final aim of this

aspect of the work is to employ the model to estimate minimum separation distances between biosensors; this is a subject of ongoing research. The bLS model, parameterised with

Monin-Obukhov Similarity Theory (MOST) variables, showed good agreement with


experimental data and compared favourably in terms of performance statistics with a recent

application of an LS model in a maize canopy. Results obtained from the model were found

to be more accurate above the canopy than below it. This was attributed to a higher error

during initialisation of release velocities below the canopy. Overall, the bLS model performed

well partly because the experiments that generated the data were carried out in ideal

conditions for MOST validity.
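The core of an LS model of this kind is a Langevin update of particle velocity. The minimal forward sketch below (Python/NumPy; homogeneous Gaussian turbulence with perfect ground reflection, and all parameter values illustrative rather than taken from the field trial) shows how a concentration-like quantity arises from the residence times of ensemble trajectories, as described above:

    import numpy as np

    rng = np.random.default_rng(2)
    sigma_w, T_L, dt = 0.5, 5.0, 0.05          # turbulence scales (illustrative)
    n_steps, n_particles = 4000, 2000
    z = np.full(n_particles, 1.0)              # release height (m)
    w = rng.normal(0.0, sigma_w, n_particles)  # initial vertical velocities

    time_in_layer = 0.0
    for _ in range(n_steps):
        # Langevin update satisfying the well-mixed condition for the
        # simplest (homogeneous Gaussian) case of Thomson's criteria.
        dW = rng.normal(0.0, np.sqrt(dt), n_particles)
        w += -(w / T_L) * dt + np.sqrt(2.0 * sigma_w**2 / T_L) * dW
        z += w * dt
        refl = z < 0.0                         # perfect reflection at the ground
        z[refl] = -z[refl]
        w[refl] = -w[refl]
        time_in_layer += dt * np.count_nonzero((z > 1.9) & (z < 2.1))

    # Concentration at a sampler is proportional to residence time per particle
    # per unit sampling volume (source strength omitted in this sketch).
    print(time_in_layer / n_particles)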

6.2.3 Multivariate data analysis of potential sensor network

The final area of focus was the monitoring of a potential citywide biosensor network. The

purpose of this section of the research was to investigate data integrity concerns that would

arise from a citywide and potentially nationwide unsupervised network of biosensors with

multiple components of finite reliabilities. A novel framework based on Multivariate Statistical

Process Control (MSPC) concepts was proposed and applied to data from a pollution-

monitoring network. The monitoring data was of PM10 particles, which have aerodynamic and dispersal characteristics similar to those of Sclerotinia spores. The monitoring scheme

was based on a PCA model that was trained with PM10 data covering a period of one year.

This data was first analysed to demonstrate the potential utility of PCA's dimension reduction

and data analysis capabilities for the biosensor network.

The initial analysis identified redundancies in the PM10 network based on the visual

advantage PCA offers in reduced dimensional space. The monitoring scheme was then

implemented on a refined PCA model, which incorporated missing data handling capabilities.

Missing measurements are a significant challenge in real-world applications of network

monitoring for a number of reasons, ranging from mechanical failure to theft and

vandalism.

To deal with the reality that most natural processes, and, therefore, practical data, do not

conform to normality assumptions, a non-parametric approach was employed to specify the

control limits of the monitoring process.
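As a minimal sketch of this idea (Python with NumPy/SciPy; the chi-square scores standing in for historical monitoring statistics are an illustrative assumption), a control limit can be read off a kernel density estimate of the statistic instead of being computed from a Gaussian formula:

    import numpy as np
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(3)
    t2 = rng.chisquare(df=3, size=2000)        # stand-in for historical T^2 scores

    kde = gaussian_kde(t2)                     # non-parametric density estimate
    grid = np.linspace(0.0, t2.max() * 1.5, 4000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                             # numerical CDF of smoothed density
    limit = grid[np.searchsorted(cdf, 0.99)]   # 99% control limit
    print(round(float(limit), 2))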

The adaptation of the MSPC framework to PM10 data identified areas of interest in the

application of monitoring schemes to spatial networks. The analysis suggested that missing

data methods work better when the network is efficiently deployed in such a way that all

biosensors have comparable influences. The analysis also indicated that in these cases of

efficient deployment, the system could handle a high proportion of missing data in the dataset (up

to 25%). Missing data issues become more challenging when measurements are missing in

contiguous blocks, e.g. when multiple neighbouring sensors fail. This can be avoided by

prompt deployment to replace or repair faulty or vandalised biosensors. Further, the main

limitation of traditional MSPC in spatial data applications was identified as a lack of spatial


awareness by the PCA model when considering correlation breakdowns due to an incoming

erroneous observation. This resulted in the misidentification of healthy measurements as erroneous in this study. The proposed augmented MSPC approach was able to incorporate this spatial awareness. The approach also introduced an assessment metric to test deviation

significance in the form of the kriging estimator variance. This is believed to be a robust

metric because it is independent of the values being estimated.

6.3 Real-world applications of research

In addition to the real-world applications addressed throughout the duration of the study, the

findings from this work are extendable to a wide variety of areas. The monitoring scheme

developed can be extended to any type of measurable spore or air-dispersed particulate

of similar size. In addition, the deployment of a biosensor network will provide actionable

disease prediction data on a scale that has not been seen before. This opens up

agriculture to Big Data tools that will go a long way in winning the fight against crop loss and

potentially hunger and famine.

6.4 Further areas of research

A number of research areas were identified during the course of this work. Some of these

areas are extensions of work done while others offer fresh perspectives. A few of the identified

areas are briefly discussed below:

In this work, a static PCA model was used to implement MSPC. Static models cannot

adapt to a shift in process behaviour as they only act according to the information

contained in the data during model building. Particulate matter dispersal as well as

that of spores and pollen may follow a seasonal pattern, in which case a monitoring

system based on static PCA cannot adapt. This will result in false alarms or no alarms

at all, as the control limits of the control charts will no longer be valid. The rationale for using a static model in this work was the availability of historical data (annual hourly data). This is not always possible in reality, or in the pilot phase of network monitoring when no prior data has been collected. In these cases, monitoring schemes based on dynamic models like recursive PCA, which recursively recalculates the PCA model's parameters as new information arrives, may be more beneficial. This way, the control limits adapt to the current state of the process (a minimal sketch of such a recursive update is given at the end of this section).

Another interesting area is the efficient deployment of biosensors. PCA's dimension reduction capabilities have already been demonstrated, and these can be combined with near-optimal placement strategies to achieve good results. In the area of sensor coverage, optimal location algorithms use a candidate set of locations to (near-)optimally place sensors. If this initial search space (candidate set) is large, the search problem becomes NP-hard. PCA models can be used to identify a reduced dimension, which can then be used as an initial search space.


Another promising area is in the field of pathogen biosensor production. The

exploitation of pathogen/host interaction pioneered by the SYIELD project offers

promise in many areas. More importantly, current biosensors need improvements in

the areas of sensitivity and specificity. The proposed SYIELD biosensor takes three days between the collection and detection of a sample. A faster reaction would significantly

improve data quality and informativeness.
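As referenced in the first item above, a minimal sketch of such a recursive PCA update is given here (Python/NumPy; the exponential forgetting factor and the number of retained components are illustrative assumptions, and this is one simple variant of recursive PCA rather than a definitive implementation):

    import numpy as np

    class RecursivePCA:
        """Exponentially weighted mean/covariance with the eigendecomposition
        refreshed on each new sample; lam is an illustrative forgetting factor."""
        def __init__(self, x0, lam=0.99, k=3):
            self.lam, self.k = lam, k
            self.mu = x0.copy()
            self.S = np.eye(len(x0))
        def update(self, x):
            self.mu = self.lam * self.mu + (1 - self.lam) * x
            d = (x - self.mu)[:, None]
            self.S = self.lam * self.S + (1 - self.lam) * (d @ d.T)
            vals, vecs = np.linalg.eigh(self.S)
            self.P = vecs[:, ::-1][:, :self.k]   # leading loadings, updated online
            return self.P

Because the mean and covariance are re-estimated with every sample, the loadings, and hence the control limits derived from them, track seasonal or other slow shifts in the process.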


References

1. United Nations Department of Economic and Social Affairs, P.D., World Population Prospects: The 2012 Revision, Highlights and Advance Tables. 2013.

2. Alexandratos, N. and J. Bruinsma, World agriculture towards 2030/2050: the 2012 revision. 2012, ESA Working paper Rome, FAO.

3. Brown, J.K. and M.S. Hovmøller, Aerial dispersal of pathogens on the global and continental scales and its impact on plant disease. Science, 2002. 297(5581): p. 537-541.

4. Oerke, E.-C. and H.-W. Dehne, Safeguarding production—losses in major crops and the role of crop protection. Crop Protection, 2004. 23(4): p. 275-285.

5. Ahemad, M. and M.S. Khan, Biotoxic impact of fungicides on plant growth promoting activities of phosphate-solubilizing Klebsiella sp. isolated from mustard (Brassica campestris) rhizosphere. Journal of Pest Science, 2012. 85(1): p. 29-36.

6. Clarkson, J.P., et al., Forecasting Sclerotinia Disease on Lettuce: A Predictive Model for Carpogenic Germination of Sclerotinia sclerotiorum Sclerotia. Phytopathology, 2007. 97(5): p. 621-631.

7. Varraillon, T., et al. RAISO-Scléro: a decision support system to follow up petal contamination of sclerotinia in oilseed rape. in 13th International Rapeseed Congress. 2011. Prague, Czech. Republic.

8. Koch, S., et al., A crop loss-related forecasting model for Sclerotinia stem rot in winter oilseed rape. Phytopathology, 2007. 97(9): p. 1186-1194.

9. Fisher, M.C., et al., Emerging fungal threats to animal, plant and ecosystem health. Nature, 2012. 484(7393): p. 186-194.

10. Jackson, S. and K. Bayliss, Spore traps need improvement to fulfil plant biosecurity requirements. Plant Pathology, 2011. 60(5): p. 801-810.

11. West, J.S. and R.B.E. Kimber, Innovations in air sampling to detect plant pathogens. Annals of Applied Biology, 2015. 166(1): p. 4-17.

12. Sankaran, S., et al., A review of advanced techniques for detecting plant diseases. Computers and Electronics in Agriculture, 2010. 72(1): p. 1-13.

13. Heard, S. and J.S. West, New developments in identification and quantification of airborne inoculum, in Detection and Diagnostics of Plant Pathogens. 2014, Springer. p. 3-19.

14. Bolton, M.D., B.P.H.J. Thomma, and B.D. Nelson, Sclerotinia sclerotiorum (Lib.) de Bary: biology and molecular traits of a cosmopolitan pathogen. Molecular Plant Pathology, 2006. 7(1): p. 1-16.

15. Boland, G. and R. Hall, Index of plant hosts of Sclerotinia sclerotiorum. Canadian Journal of Plant Pathology, 1994. 16(2): p. 93-108.

16. Hegedus, D.D. and S.R. Rimmer, Sclerotinia sclerotiorum: When “to be or not to be” a pathogen? FEMS Microbiology Letters, 2005. 251(2): p. 177-184.

17. McCartney, H.A. and M.E. Lacey, The relationship between the release of ascospores of Sclerotinia sclerotiorum, infection and disease in sunflower plots in the United Kingdom. Grana, 1991. 30(2): p. 486-492.

18. Raynal, G., Kinetics of the ascospore production of Sclerotinia trifoliorum Eriks in growth chamber and under natural climatic conditions. Practical and epidemiological incidence. Agronomie, 1990. 10(7): p. 561-572.

19. Clarkson, J.P., et al., Ascospore release and survival in Sclerotinia sclerotiorum. Mycological Research, 2003. 107(2): p. 213-222.

20. Ingold, C.T., Fungal spores. Their liberation and dispersal. 1971.

21. McCartney, H. and M.E. Lacey, Wind dispersal of pollen from crops of oilseed rape (Brassica napus L.). Journal of Aerosol Science, 1991. 22(4): p. 467-477.

22. Newton, H. and L. Sequeira, Ascospores as the primary infective propagule of Sclerotinia sclerotiorum in Wisconsin. Plant Disease Reporter, 1972. 56(9): p. 798-802.

23. Lacey, J., Spore dispersal—its role in ecology and disease: the British contribution to fungal aerobiology. Mycological Research, 1996. 100(6): p. 641-660.


24. Lacey, J., Reproduction: patterns of spore production, liberation and dispersal. Water, Fungi, and Plants, 1986(11): p. 65.

25. Ingold, C.T., Active liberation of reproductive units in terrestrial fungi. Mycologist, 1999. 13(3): p. 113-116.

26. Roper, M., et al., Dispersal of fungal spores on a cooperatively generated wind. Proceedings of the National Academy of Sciences, 2010. 107(41): p. 17474-17479.

27. Qandah, I.S. and L. del Río Mendoza, Temporal dispersal patterns of Sclerotinia sclerotiorum ascospores during canola flowering. Canadian Journal of Plant Pathology, 2011. 33(2): p. 159-167.

28. Hartill, W.F.T., Aerobiology of Sclerotinia sclerotiorum and Botrytis cinerea spores in New Zealand tobacco crops. New Z. J. Agric. Res, 1980. 23: p. 259–262.

29. West, J.S., S.D. Atkins, and B.D. Fitt, Detection of airborne plant pathogens; halting epidemics before they start. Outlooks on Pest Management, 2009. 20(1): p. 11-14.

30. Suzui, T. and T. Kobayashi, Dispersal of ascospores of Sclerotinia sclerotiorum (Lib.) de Bary on kidney bean plants. Part 1. Dispersal of ascospores from a point source of apothecia. Hokkaido Nat. Agric. Exp. Stn. Bull., 1972a. 101: p. 137-151.

31. Dupont, S. and Y. Brunet, Influence of foliar density profile on canopy flow: A large-eddy simulation study. Agricultural and Forest Meteorology, 2008. 148(6–7): p. 976-990.

32. Aylor, D.E., Y. Wang, and D.R. Miller, Intermittent wind close to the ground within a grass canopy. Boundary-Layer Meteorology, 1993. 66(4): p. 427-448.

33. McCartney, H. and D. Aylor, Relative contributions of sedimentation and impaction to deposition of particles in a crop canopy. Agricultural and Forest Meteorology, 1987. 40(4): p. 343-358.

34. Wilson, J., et al., Statistics of atmospheric turbulence within and above a corn canopy. Boundary-Layer Meteorology, 1982. 24(4): p. 495-519.

35. Seginer, I., et al., Turbulent flow in a model plant canopy. Boundary-Layer Meteorology, 1976. 10(4): p. 423-453.

36. Raupach, M., J. Finnigan, and Y. Brunet, Coherent eddies and turbulence in vegetation canopies: the mixing-layer analogy. Boundary-Layer Meteorology, 1996. 78(3-4): p. 351-382.

37. Poggi, D., et al., The effect of vegetation density on canopy sub-layer turbulence. Boundary-Layer Meteorology, 2004. 111(3): p. 565-587.

38. Kaimal, J.C. and J.J. Finnigan, Atmospheric boundary layer flows: their structure and measurement. 1994.

39. Wilson, J., Turbulent transport within the plant canopy. Estimation of Areal Evapotranspiration, 1989. 177: p. 43-80.

40. Andrade, D., et al., Modeling soybean rust spore escape from infected canopies: model description and preliminary results. Journal of Applied Meteorology and Climatology, 2009. 48(4): p. 789-803.

41. McCartney, H. and B. Fitt, Dispersal of foliar fungal plant pathogens: mechanisms, gradients and spatial patterns, in The Epidemiology of Plant Diseases. 1998, Springer. p. 138-160.

42. Suzui, T. and T.H. Kobayashi, Dispersal of ascospores of Sclerotinia sclerotiorum (Lib.) de Bary on kidney bean plants. Part 2. Dispersal of ascospores in the Tokachi District Hokkaido. Nat. Agric. Exp. Stn. Bull., 1972b. 102(61-68).

43. Boland, G.J. and R. Hall, Relationships between the spatial pattern and number of apothecia of Sclerotinia sclerotiorum and stem rot of soybean. Plant Pathology, 1988. 37(329-336).

44. McCartney, A. and J. West, Dispersal of fungal spores through the air, in Mycology Series. 2007. p. 65.

45. Fitt, B.D., et al., Spore dispersal and plant disease gradients; a comparison between two empirical models. Journal of Phytopathology, 1987. 118(3): p. 227-242.

46. Spijkerboer, H., et al., Ability of the Gaussian plume model to predict and describe spore dispersal over a potato crop. Ecological modelling, 2002. 155(1): p. 1-18.


47. Skelsey, P., A.A.M. Holtslag, and W. van der Werf, Development and validation of a quasi-Gaussian plume model for the transport of botanical spores. Agricultural and Forest Meteorology, 2008. 148(8–9): p. 1383-1394.

48. Wilson, J.D., Trajectory Models for Heavy Particles in Atmospheric Turbulence: Comparison with Observations. Journal of Applied Meteorology, 2000. 39(11): p. 1894-1912.

49. Reynolds, A., Development and Validation of a Lagrangian Probability Density Function Model of Horizontally-Homogeneous Turbulence Within and Above Plant Canopies. Boundary-layer meteorology, 2012. 142(2): p. 193-205.

50. de Jong, M.D., et al., A model of the escape of Sclerotinia sclerotiorum ascospores from pasture. Ecological Modelling, 2002. 150(1): p. 83-105.

51. Aylor, D. and G. Taylor, Escape of Peronospora tabacina spores from a field of diseased tobacco plants. Phytopathology, 1983. 73(4): p. 525-529.

52. Aylor, D.E. and F.J. Ferrandino, Rebound of pollen and spores during deposition on cylinders by inertial impaction. Atmospheric Environment (1967), 1985. 19(5): p. 803-806.

53. Thomson, D. and J. Wilson, History of Lagrangian stochastic models for turbulent dispersion. Lagrangian Modeling of the Atmosphere, 2013: p. 19-36.

54. Wilson, J.D. and B.L. Sawford, Review of Lagrangian stochastic models for trajectories in the turbulent atmosphere. Boundary-Layer Meteorology, 1996. 78(1): p. 191-210.

55. Aylor, D.E. and T.K. Flesch, Estimating spore release rates using a Lagrangian stochastic simulation model. Journal of Applied Meteorology, 2001. 40(7): p. 1196-1208.

56. Gleicher, S.C., et al., Interpreting three-dimensional spore concentration measurements and escape fraction in a crop canopy using a coupled Eulerian–Lagrangian stochastic model. Agricultural and Forest Meteorology, 2014. 194: p. 118-131.

57. Jarosz, N., B. Loubet, and L. Huber, Modelling airborne concentration and deposition rate of maize pollen. Atmospheric Environment, 2004. 38(33): p. 5555-5566.

58. Aylor, D.E., Biophysical scaling and the passive dispersal of fungus spores: relationship to integrated pest management strategies. Agricultural and Forest Meteorology, 1999. 97(4): p. 275-292.

59. Wilson, J.D., A second-order closure model for flow through vegetation. Boundary-Layer Meteorology, 1988. 42(4): p. 371-392.

60. Katul, G.G., et al., One- and two-equation models for canopy turbulence. Boundary-Layer Meteorology, 2004. 113(1): p. 81-109.

61. Gleicher, S.C., et al., Interpreting three-dimensional spore concentration measurements and escape fraction in a crop canopy using a coupled Eulerian–Lagrangian stochastic model. Agricultural and Forest Meteorology, 2014. 194(0): p. 118-131.

62. Pan, Y., M. Chamecki, and S.A. Isard, Large-eddy simulation of turbulence and particle dispersion inside the canopy roughness sublayer. Journal of Fluid Mechanics, 2014. 753: p. 499-534.

63. Rogers, S.L., S.D. Atkins, and J.S. West, Detection and quantification of airborne inoculum of Sclerotinia sclerotiorum using quantitative PCR. Plant Pathology, 2009. 58(2): p. 324-331.

64. Saharan, G.S. and N. Mehta, Sclerotinia diseases of crop plants: Biology, ecology and disease management. 2008: Springer.

65. Sun, P. and X.B. Yang, Light, Temperature, and Moisture Effects on Apothecium Production of Sclerotinia sclerotiorum. Plant Disease, 2000. 84(12): p. 1287-1293.

66. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens: mechanisms, gradients and spatial patterns. The Epidemiology of Plant Diseases, 2006: p. 159-192.

67. Koch, S., et al., A Crop Loss-Related Forecasting Model for Sclerotinia Stem Rot in Winter Oilseed Rape. Phytopathology, 2007. 97(9): p. 1186-1194.


68. Protectedherbs.org.uk. Sclerotinia life cycle. 2014 [cited 14 August 2014]; Available from: http://www.protectedherbs.org.uk/pages/sclerotiniaLifeCycle.htm.

69. Fitt, B.D.L., et al., Prospects for developing a forecasting scheme to optimise use of fungicides for disease control on winter oilseed rape in the UK. Aspects of Applied Biology (United Kingdom), 1997.

70. Twengström, E., et al., Forecasting Sclerotinia stem rot in spring sown oilseed rape. Crop Protection, 1998. 17(5): p. 405-411.

71. Weiss, M. and F. Baret, CAN-EYE V6.1 USER MANUAL. 2010.

72. Jonckheere, I., et al., Review of methods for in situ leaf area index determination: Part I. Theories, sensors and hemispherical photography. Agricultural and Forest Meteorology, 2004. 121(1): p. 19-35.

73. Holmes, N.S. and L. Morawska, A review of dispersion modelling and its application to the dispersion of particles: An overview of different dispersion models available. Atmospheric Environment, 2006. 40(30): p. 5902-5928.

74. Benson, P.E., CALINE 4—A Dispersion Model for Predicting Air Pollutant Concentrations near Roadways, in FHWA User Guide. 1984, U. Trinity Consultants Inc.

75. Sokhi, R., B. Fisher, and e. al. Modelling of air quality around roads. in Proceedings of the 5th International Conference on Harmonisation with Atmospheric Dispersion Modelling for Regulatory Purposes. 1998. Greece.

76. Fitt, B.D.L. and H.A. McCartney, Spore dispersal in relation to epidemic models, in Plant Disease Epidemiology, Vol. 1: Population Dynamics and Management, K.J. Leonard and W.E. Fry, Editors. 1986, New York: Macmillan.

77. Aloyan, A.E., Numerical modelling of minor gas constituents and aerosols in the atmosphere. Ecological Modelling, 2004. 179: p. 163-175.

78. Jones, A., et al., The UK Met Office's next-generation atmospheric dispersion model, NAME III. Air Pollution Modeling and its Application XVII, 2007: p. 580-589.

79. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens: mechanisms, gradients and spatial patterns, in The Epidemiology of Plant Diseases, B.M. Cooke, D.G. Jones, and B. Kaye, Editors. 2006, Springer Netherlands. p. 159-192.

80. Barratt, R., Atmospheric dispersion modelling: an introduction to practical applications. 2001: Earthscan.

81. Pasquill, F. and F. Smith, Atmospheric diffusion: Study of the dispersion of windborne material from industrial and other sources. 1983, New York: John Wiley & Sons.

82. Abdel-Rahman, A.A. On the Atmospheric Dispersion and Gaussian Plume Model. 2008.

83. Hanna, S., G. Briggs, and R. Hosker Jr, Handbook on atmospheric dispersion. Prepared for the US Department of Energy, 1982.

84. Erbrink, J. and J. van Jaarsveld, The National Model compared with other models and measurements. 1992; cited in Spijkerboer et al. (2002), Ability of the Gaussian plume model to predict and describe spore dispersal over a potato crop. Ecological Modelling, 155: p. 1-18.

85. Stohl, A., et al., Technical note: The Lagrangian particle dispersion model FLEXPART version 6.2. Atmos. Chem. Phys., 2005. 5(9): p. 2461-2474.

86. Rodean, H.C., Stochastic Lagrangian models of turbulent diffusion. Vol. 45. 1996: American Meteorological Society, Boston, MA.

87. Thomson, D., Criteria for the selection of stochastic models of particle trajectories in turbulent flows. J. Fluid Mech, 1987. 180(529-556): p. 109.

88. Aylor, D.E., N.P. Schultes, and E.J. Shields, An aerobiological framework for assessing cross-pollination in maize. Agricultural and Forest Meteorology, 2003. 119(3-4): p. 111-129.

89. Aylor, D.E., et al., Quantifying the rate of release and escape of Phytophthora infestans sporangia from a potato canopy. Phytopathology, 2001. 91(12): p. 1189-1196.

90. USEPA, Revised draft user's guide for the AERMOD meteorological processor (AERMET). EPA, 1999: p. 273.


91. Earth Tech Inc., A user's guide for the CALPUFF dispersion model. Earth Tech, Inc., 2000. 521.

92. Esbensen, K.H., et al., Multivariate data analysis: in practice: an introduction to multivariate data analysis and experimental design. 2002: Multivariate Data Analysis.

93. Martens, H. and T. Naes, Multivariate calibration. 1992: John Wiley & Sons Inc.

94. Andersson, M., A comparison of nine PLS1 algorithms. Journal of Chemometrics, 2009. 23(10): p. 518-529.

95. Geladi, P. and B.R. Kowalski, Partial least-squares regression: a tutorial. Analytica Chimica Acta, 1986. 185: p. 1-17.

96. Kallithraka, S., et al., Instrumental and sensory analysis of Greek wines; implementation of principal component analysis (PCA) for classification according to geographical origin. Food Chemistry, 2001. 73(4): p. 501-514.

97. Whittaker, P., et al., Identification of foodborne bacteria by infrared spectroscopy using cellular fatty acid methyl esters. Journal of Microbiological Methods, 2003. 55(3): p. 709-716.

98. Frank, I.E. and J.H. Friedman, A Statistical View of Some Chemometrics Regression Tools. Technometrics, 1993. 35(2): p. 109-135.

99. Höskuldsson, A., PLS regression methods. Journal of Chemometrics, 1988. 2(3): p. 211-228.

100. Helland, I.S., Partial Least Squares Regression and Statistical Models. Scandinavian Journal of Statistics, 1990. 17(2): p. 97-114.

101. Liu, Z.-y., et al., Characterizing and estimating rice brown spot disease severity using stepwise regression, principal component regression and partial least-square regression. Journal of Zhejiang University - Science B, 2007. 8(10): p. 738-744.

102. Jackman, P., D.-W. Sun, and P. Allen, Prediction of beef palatability from colour, marbling and surface texture features of longissimus dorsi. Journal of Food Engineering, 2010. 96(1): p. 151-165.

103. Huang, J.F. and A. Apan, Detection of sclerotinia rot disease on celery using hyperspectral data and partial least squares regression. Journal of Spatial Science, 2006. 51(2): p. 129-142.

104. Foster, A.J., et al., Development and validation of a disease forecast model for Sclerotinia rot of carrot. Canadian Journal of Plant Pathology, 2011. 33(2): p. 187-201.

105. Turkington, T.K., R.A.A. Morrall, and R.K. Gugel, Use of petal infestation to forecast Sclerotinia stem rot of canola: Evaluation of early bloom sampling, 1985-90. Can. J. Plant Pathol, 1991. 13: p. 50-59.

106. Lelong, C.C.D., et al., Evaluation of Oil-Palm Fungal Disease Infestation with Canopy Hyperspectral Reflectance Data. Sensors, 2010. 10(1): p. 734-747.

107. Guimarães, R.L. and H.U. Stotz, Oxalate production by Sclerotinia sclerotiorum deregulates guard cells during infection. Plant Physiology, 2004. 136(3): p. 3703-3711.

108. Tolle, G., et al. A macroscope in the redwoods. 2005. ACM.

109. Ramanathan, N., et al., Rapid deployment with confidence: Calibration and fault detection in environmental sensor networks. 2006.

110. Kollman, C., et al., Limitations of statistical measures of error in assessing the accuracy of continuous glucose sensors. Diabetes technology & therapeutics, 2005.

7(5): p. 665-672.

111. Montgomery, D.C., G.C. Runger, and N.F. Hubele, Engineering statistics. 2009: Wiley.

112. Silverman, B.W., Density estimation for statistics and data analysis. Vol. 26. 1986: Chapman & Hall/CRC.

113. Dunia, R., et al., Identification of faulty sensors using principal component analysis. AIChE Journal, 1996. 42(10): p. 2797-2812.

114. Wise, B. and N. Ricker. Recent advances in multivariate statistical process control: Improving robustness and sensitivity. 1991. Citeseer.

115. Tong, H. and C.M. Crowe, Detection of gross errors in data reconciliation by principal component analysis. AIChE Journal, 1995. 41(7): p. 1712-1722.


116. MacGregor, J.F., et al., Process monitoring and diagnosis by multiblock PLS methods. AIChE Journal, 1994. 40(5): p. 826-838.

117. Dunia, R. and S. Joe Qin, A unified geometric approach to process and sensor fault identification and reconstruction: the unidimensional fault case. Computers & chemical engineering, 1998. 22(7): p. 927-943.

118. Qin, S.J., H. Yue, and R. Dunia, Self-validating inferential sensors with application to air emission monitoring. Industrial & engineering chemistry research, 1997. 36(5): p. 1675-1685.

119. Wold, H., Soft modeling by latent variables: the nonlinear iterative partial least squares approach. Perspectives in Probability and Statistics, papers in honour of M.S. Bartlett, 1975: p. 520-540.

120. Wold, S., K. Esbensen, and P. Geladi, Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 1987. 2(1-3): p. 37-52.

121. Wold, S., et al., Some recent developments in PLS modeling. Chemometrics and intelligent laboratory systems, 2001. 58(2): p. 131-150.

122. Esbensen, K., An introduction to multivariate data analysis and experimental design. Camo Inc, 2004.

123. Hotelling, H., Multivariate quality control, in Techniques of Statistical Analysis, C. Eisenhart, M. Hastay, and W. Wallis, Editors. 1947, McGraw-Hill: New York.

124. Sparks, R., Monitoring highly correlated multivariate processes using Hotelling's T2 statistic: problems and possible solutions. Quality and Reliability Engineering International, 2014.

125. Williams, J.D., et al., On the distribution of Hotelling's T2 statistic based on the successive differences covariance matrix estimator. Journal of Quality Technology, 2006. 38: p. 217-229.

126. Dunia, R. and S. Joe Qin, Joint diagnosis of process and sensor faults using principal component analysis. Control Engineering Practice, 1998. 6(4): p. 457-469.

127. Doymaz, F., J.A. Romagnoli, and A. Palazoglu, A strategy for detection and isolation of sensor failures and process upsets. Chemometrics and Intelligent Laboratory Systems, 2001. 55(1): p. 109-123.

128. Bose, M., G. SathyendraKumar, and C. Venkateswarlu, Detection, isolation and reconstruction of faulty sensors using principal component analysis. Indian Journal of Chemical Technology, 2005. 12.

129. Sharma, A., L. Golubchik, and R. Govindan. On the prevalence of sensor faults in real-world deployments. 2007. IEEE.

130. Rabiner, L. and B. Juang, An introduction to hidden Markov models. ASSP Magazine, IEEE, 1986. 3(1): p. 4-16.

131. Qin, S.J. and W. Li, Detection and identification of faulty sensors in dynamic processes. AIChE Journal, 2001. 47(7): p. 1581-1593.

132. Sheather, S.J. and M.C. Jones, A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological), 1991: p. 683-690.

133. Jones, M.C., J.S. Marron, and S.J. Sheather, A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 1996. 91(433): p. 401-407.

134. Sheather, S.J., Density estimation. Statistical Science, 2004. 19(4): p. 588-597.

135. Lee, R.W. and J.J. Kulesz, A risk-based sensor placement methodology. Journal of Hazardous Materials, 2008. 158(2): p. 417-429.

136. Byrne, R. and D. Diamond, Chemo/bio-sensor networks. Nature Materials, 2006. 5(6): p. 421-424.

137. Wang, B., et al., Sensor density for complete information coverage in wireless sensor networks. Wireless Sensor Networks, 2006: p. 69-82.

138. Kanaroglou, P.S., et al., Establishing an air pollution monitoring network for intra-urban population exposure assessment: a location-allocation approach. Atmospheric Environment, 2005. 39(13): p. 2399-2409.

139. Moses, A., K. Obenschain, and J. Boris. Using CT-Analyst as an integrated tool for CBR analysis. 2006.


140. Chen, Y.Q., K.L. Moore, and Z. Song. Diffusion boundary determination and zone control via mobile actuator-sensor networks (MAS-net): Challenges and opportunities. 2004.

141. Ishida, H., et al., Plume-tracking robots: A new application of chemical sensors. The Biological Bulletin, 2001. 200(2): p. 222-226.

142. Krause, A., A. Singh, and C. Guestrin, Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. The Journal of Machine Learning Research, 2008. 9: p. 235-284.

143. Ramakrishnan, N., et al. Gaussian processes for active data mining of spatial aggregates. 2005.

144. Park, J.H., G. Friedman, and M. Jones, Geographical feature sensitive sensor placement. Journal of Parallel and Distributed Computing, 2004. 64(7): p. 815-825.

145. Ozkul, S., N.B. Harmancioglu, and V.P. Singh, Entropy-based assessment of water quality monitoring networks. Journal of hydrologic engineering, 2000. 5: p. 90.

146. Shumway, R.H. and D.S. Stoffer, Time series analysis and its applications. 2000: Springer Verlag.

147. Fitt, B.D.L., H.A. McCartney, and J.S. West, Dispersal of foliar plant pathogens: mechanisms, gradients and spatial patterns, in The Epidemiology of Plant Diseases, B.M. Cooke, D.G. Jones, and B. Kaye, Editors. 2006, Springer Netherlands. p. 159-192.

148. Lacey, M.E. and J.S. West, The air spora: a manual for catching and identifying airborne biological particles. 2007: Springer.

149. Kaimal, J. and J. Finnigan, Atmospheric Boundary Layer Flows. 1994, Oxford Univ. Press, New York.

150. Flesch, T., et al., Deducing ground-to-air emissions from observed trace gas concentrations: A field trial. Journal of Applied Meteorology, 2004. 43(3): p. 487-502.

151. Silvertown, J., et al., The Park Grass Experiment 1856–2006: its contribution to ecology. Journal of Ecology, 2006. 94(4): p. 801-814.

152. Foken, T., 50 years of the Monin–Obukhov similarity theory. Boundary-Layer Meteorology, 2006. 119(3): p. 431-447.

153. Rogers, S., S.D. Atkins, and J. West, Detection and quantification of airborne inoculum of Sclerotinia sclerotiorum using quantitative PCR. Plant Pathology, 2009. 58(2): p. 324-331.

154. West, J.S., Plant Pathogen Dispersal, in eLS. 2001, John Wiley & Sons, Ltd.

155. Saharan, G. and D.N. Mehta, Sclerotinia diseases of crop plants: biology, ecology and disease management. 2008: Springer.

156. West, J., et al., Development of the miniature virtual impactor–MVI–for long-term and automated air sampling to detect plant pathogen spores. Proceedings of “Future IPM in Europe”, 2013: p. 19-21.

157. Bourdôt, G., et al., Risk analysis of Sclerotinia sclerotiorum for biological control of Cirsium arvense in pasture: ascospore dispersal. Biocontrol Science and Technology, 2001. 11(1): p. 119-139.

158. Di-Giovanni, F., A review of the sampling efficiency of rotating-arm impactors used in aerobiological studies. Grana, 1998. 37(3): p. 164-171.

159. Abawi, G. and R. Grogan, Epidemiology of diseases caused by Sclerotinia species. Phytopathology, 1979.

160. Saldanha, R., et al., The influence of sampling duration on recovery of culturable fungi using the Andersen N6 and RCS bioaerosol samplers. Indoor Air, 2008. 18(6): p. 464-472.

161. Pan, Y., et al., Dispersion of particles released at the leading edge of a crop canopy. Agricultural and Forest Meteorology, 2015. 211: p. 37-47.

162. Aylor, D.E. and F.J. Ferrandino, Dispersion of spores released from an elevated line source within a wheat canopy. Boundary-Layer Meteorology, 1989. 46(3): p. 251-273.

163. Flesch, T.K., et al., Estimating gas emissions from a farm with an inverse-dispersion technique. Atmospheric Environment, 2005. 39(27): p. 4863-4874.


164. Sood, R., Textbook of medical laboratory technology. 2006: Jaypee Brothers Medical Publishers.

165. Datta, P.K. and B. Meeuse, Moss oxalic acid oxidase—a flavoprotein. Biochimica et Biophysica Acta, 1955. 17: p. 602-603.

166. Vo-Dinh, T., Biomedical Photonics Handbook: Biomedical Diagnostics. Vol. 2. 2014: CRC Press.

167. Kim, K.S., J.-Y. Min, and M.B. Dickman, Oxalic acid is an elicitor of plant programmed cell death during Sclerotinia sclerotiorum disease development. Molecular Plant-Microbe Interactions, 2008. 21(5): p. 605-612.

168. Hu, Y., et al., Characteristics and heterologous expressions of oxalate degrading enzymes “oxalate oxidases” and their applications on immobilization, oxalate detection, and medical usage potential. Journal of Biotech Research [ISSN: 1944-3285], 2015. 6: p. 63-75.

169. Nierman, W.C., et al., Genomic sequence of the pathogenic and allergenic filamentous fungus Aspergillus fumigatus. Nature, 2005. 438(7071): p. 1151-1156.

170. Šljukic, B., C.E. Banks, and R.G. Compton, Iron oxide particles are the active sites for hydrogen peroxide sensing at multiwalled carbon nanotube modified electrodes. Nano letters, 2006. 6(7): p. 1556-1558.

171. Grieshaber, D., et al., Electrochemical Biosensors - Sensor Principles and Architectures. Sensors (Basel, Switzerland), 2008. 8(3): p. 1400-1458.

172. Coldrick, Z., SYIELD: Electrochemistry of oxalate biosensor. 2013, University of Manchester.

173. Hare, J.M., Sabouraud Agar for Fungal Growth, in Laboratory Protocols in Fungal Biology. 2013, Springer. p. 211-216.

174. Armbruster, D.A. and T. Pry, Limit of blank, limit of detection and limit of quantitation. Clin Biochem Rev, 2008. 29(Suppl 1): p. S49-52.

175. Desimoni, E. and B. Brunetti, Data Treatment of Electrochemical Sensors and Biosensors, in Environmental Analysis by Electrochemical Sensors and Biosensors. 2015, Springer. p. 1137-1151.

176. Housecroft, C., E. and E.C. Constable, Chemistry: An Introduction to Organic, Inorganic and Physical Chemistry. 3rd edition ed. 2006, Edinburg Gate (England): Pearson Education Limited.

177. Heard, S., Plant Pathogen Sensing for Early Disease Control. 2013, University of Manchester.

178. Holland, P.M., et al., Detection of specific polymerase chain reaction product by utilizing the 5'----3' exonuclease activity of Thermus aquaticus DNA polymerase. Proceedings of the National Academy of Sciences, 1991. 88(16): p. 7276-7280.

179. McCartney, H. and B. Fitt, Construction of dispersal models. Mathematical modelling of crop disease, 1985.

180. McCartney, H., M. Lacey, and C. Rawlinson, Dispersal of Pyrenopeziza brassicae spores from an oil-seed rape crop. The Journal of Agricultural Science, 1986. 107(02): p. 299-305.

181. Schwartz, H. and J. Steadman, Factors affecting sclerotium populations of, and apothecium production by, Sclerotinia sclerotiorum. Phytopathology, 1978. 68(383-388): p. 11.

182. Luong, J.H., K.B. Male, and J.D. Glennon, Biosensor technology: technology push versus market pull. Biotechnology Advances, 2008. 26(5): p. 492-500.

183. Banica, F.-G., Chemical sensors and biosensors: fundamentals and applications. 2012: John Wiley & Sons.

184. Ginsberg, B.H., Factors affecting blood glucose monitoring: sources of errors in measurement. Journal of diabetes science and technology, 2009. 3(4): p. 903-913.

185. Justino, C.I., T.A. Rocha-Santos, and A.C. Duarte, Review of analytical figures of merit of sensors and biosensors in clinical applications. TrAC Trends in Analytical Chemistry, 2010. 29(10): p. 1172-1183.

186. Lu, G., Engineering Sclerotinia sclerotiorum resistance in oilseed crops. African Journal of Biotechnology, 2004. 2(12): p. 509-516.


187. Culbertson, B.J., N.C. Furumo, and S.L. Daniel, Impact of nutritional supplements and monosaccharides on growth, oxalate accumulation, and culture pH by Sclerotinia sclerotiorum. FEMS microbiology letters, 2007. 270(1): p. 132-138.

188. Andreescu, S. and O.A. Sadik, Trends and challenges in biochemical sensors for clinical and environmental monitoring. Pure and Applied Chemistry, 2004. 76(4): p. 861-878.

189. Turner, A.P., Biosensors: Past, present and future. Cranfield University, Institute of BioScience and Technology. Available online: www.cranfield.ac.uk/biotech/chinap.htm, 1996.

190. Nayak, M., et al., Detection of microorganisms using biosensors—A smarter way towards detection techniques. Biosensors and Bioelectronics, 2009. 25(4): p. 661-667.

191. Rejeb, I.B., et al., Development of a bio-electrochemical assay for AFB 1 detection in olive oil. Biosensors and Bioelectronics, 2009. 24(7): p. 1962-1968.

192. Mendes, R., et al., Development of an electrochemical immunosensor for Phakopsora pachyrhizi detection in the early diagnosis of soybean rust. Journal of the Brazilian Chemical Society, 2009. 20(4): p. 795-801.

193. D'Orazio, P., Biosensors in clinical chemistry—2011 update. Clinica Chimica Acta, 2011. 412(19): p. 1749-1761.

194. Richman, S.A., D.M. Kranz, and J.D. Stone, Biosensor detection systems: Engineering stable, high-affinity bioreceptors by yeast surface display, in Biosensors and Biodetection. 2009, Springer. p. 323-350.

195. Oliver, N., et al., Glucose sensors: a review of current and emerging technology. Diabetic Medicine, 2009. 26(3): p. 197-210.

196. Aylor, D., Deposition gradients of urediniospores of Puccinia recondita near a source. Phytopathology, 1987. 77(10): p. 1442-1448.

197. Bullock, J.M. and R.T. Clarke, Long distance seed dispersal by wind: measuring and modelling the tail of the curve. Oecologia, 2000. 124(4): p. 506-521.

198. Dodge, Y., et al., The Oxford dictionary of statistical terms. 2003: Oxford University Press.

199. Joanes, D. and C. Gill, Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician), 1998. 47(1): p. 183-189.

200. Raupach, M., P. Coppin, and B. Legg, Experiments on scalar dispersion within a model plant canopy part I: The turbulence structure. Boundary-Layer Meteorology, 1986. 35(1-2): p. 21-52.

201. Raupach, M., R. Antonia, and S. Rajagopalan, Rough-wall turbulent boundary layers. Applied Mechanics Reviews, 1991. 44(1): p. 1-25.

202. Raupach, M., Applying Lagrangian fluid mechanics to infer scalar source distributions from concentration profiles in plant canopies. Agricultural and Forest Meteorology, 1989. 47(2): p. 85-108.

203. Clarkson, J.P., et al., Forecasting Sclerotinia disease on lettuce: toward developing a prediction model for carpogenic germination of sclerotia. Phytopathology, 2004. 94(3): p. 268-279.

204. Wu, B., et al., Incubation of excised apothecia enhances ascus maturation of Sclerotinia sclerotiorum. Mycologia, 2007. 99(1): p. 33-41.

205. Bohrer, G., et al., Exploring the effects of microscale structural heterogeneity of forest canopies using large-eddy simulations. Boundary-Layer Meteorology, 2009. 132(3): p. 351-382.

206. Stockmarr, A., V. Andreasen, and H. Østergård, Dispersal distances for airborne spores based on deposition rates and stochastic modeling. Phytopathology, 2007. 97(10): p. 1325-1330.

207. Bouvet, T., et al., Filtering of windborne particles by a natural windbreak. Boundary-layer meteorology, 2007. 123(3): p. 481-509.

208. Aylor, D.E., Modeling spore dispersal in a barley crop. Agricultural Meteorology, 1982. 26(3): p. 215-219.


209. Wegulo, S.N., et al., Spread of Sclerotinia stem rot of soybean from area and point sources of apothecial inoculum. Canadian Journal of Plant Science, 2000. 80(2): p. 389-402.

210. Duman, T., et al., A Velocity–Dissipation Lagrangian Stochastic Model for Turbulent Dispersion in Atmospheric Boundary-Layer and Canopy Flows. Boundary-Layer Meteorology, 2014. 152(1): p. 1-18.

211. Duman, T., et al., Footprint Estimation for Multi-Layered Sources and Sinks Inside Canopies in Open and Protected Environments. Boundary-Layer Meteorology, 2015. 155(2): p. 229-248.

212. Katul, G.G., et al., The effects of the canopy medium on dry deposition velocities of aerosol particles in the canopy sub-layer above forested ecosystems. Atmospheric Environment, 2011. 45(5): p. 1203-1212.

213. Wegulo, S., et al., Spread of Sclerotinia stem rot of soybean from area and point sources of apothecial inoculum. Canadian Journal of Plant Science, 2000. 80(2): p. 389-402.

214. Wilson, J.D., T.K. Flesch, and P. Bourdin, Ground-to-Air Gas Emission Rate Inferred from Measured Concentration Rise within a Disturbed Atmospheric Surface Layer. Journal of Applied Meteorology and Climatology, 2010. 49(9): p. 1818-1830.

215. Luhar, A.K., Turbulent Dispersion: Theory and Parameterization—Overview. Lagrangian Modeling of the Atmosphere, 2013: p. 14-18.

216. Hsieh, C.-I. and G. Katul, The Lagrangian stochastic model for estimating footprint and water vapor fluxes over inhomogeneous surfaces. International Journal of Biometeorology, 2009. 53(1): p. 87-100.

217. Flesch, T.K., J.D. Wilson, and E. Yee, Backward-time Lagrangian stochastic dispersion models and their application to estimate gaseous emissions. Journal of Applied Meteorology, 1995. 34(6): p. 1320-1332.

218. Ro, K.S., et al., Measuring gas emissions from animal waste lagoons with an inverse-dispersion technique. Atmospheric Environment, 2013. 66(0): p. 101-106.

219. McBain, M.C. and R.L. Desjardins, The evaluation of a backward Lagrangian stochastic (bLS) model to estimate greenhouse gas emissions from agricultural sources using a synthetic tracer source. Agricultural and Forest Meteorology, 2005. 135(1–4): p. 61-72.

220. Garratt, J., The atmospheric boundary layer. Cambridge atmospheric and space science series. Cambridge University Press, Cambridge, 1992. 416: p. 444.

221. Obukhov, A., Turbulence in an atmosphere with a non-uniform temperature. Boundary-layer meteorology, 1971. 2(1): p. 7-29.

222. Optis, M., A. Monahan, and F. Bosveld, Moving Beyond Monin–Obukhov Similarity Theory in Modelling Wind-Speed Profiles in the Lower Atmospheric Boundary Layer under Stable Stratification. Boundary-Layer Meteorology, 2014. 153(3): p. 497-514.

223. Businger, J.A., et al., Flux-profile relationships in the atmospheric surface layer. Journal of the Atmospheric Sciences, 1971. 28(2): p. 181-189.

224. Panofsky, H.A. and J.A. Dutton, Atmospheric turbulence. Models and methods for engineering applications. New York: Wiley, 1984, 1984. 1.

225. Wilson, J., G. Thurtell, and G. Kidd, Numerical simulation of particle trajectories in inhomogeneous turbulence, I: Systems with constant turbulent velocity scale. Boundary-Layer Meteorology, 1981a. 21(3): p. 295-313.

226. Wilson, J., G. Thurtell, and G. Kidd, Numerical simulation of particle trajectories in inhomogeneous turbulence, II: Systems with variable turbulent velocity scale. Boundary-Layer Meteorology, 1981b. 21(4): p. 423-441.

227. Pelliccioni, A., et al., Some characteristics of the urban boundary layer above Rome, Italy, and applicability of Monin–Obukhov similarity. Environmental fluid mechanics,

2012. 12(5): p. 405-428. 228. Wilson, J., Monin-Obukhov functions for standard deviations of velocity. Boundary-

layer meteorology, 2008. 129(3): p. 353-369.

229. Högström, U., A.-S. Smedman, and H. Bergström, Calculation of wind speed variation with height over the sea. Wind Engineering, 2006. 30(4): p. 269-286.

182

230. Peña, A., S.-E. Gryning, and C.B. Hasager, Measurements and modelling of the wind speed profile in the marine atmospheric boundary layer. Boundary-layer meteorology, 2008. 129(3): p. 479-495.

231. Lange, B., et al., Importance of thermal effects and sea surface roughness for offshore wind resource assessment. Journal of wind engineering and industrial

aerodynamics, 2004. 92(11): p. 959-988.

232. Haugen, D., J. Kaimal, and E. Bradley, An experimental study of Reynolds stress and heat flux in the atmospheric surface layer. Quarterly Journal of the Royal

Meteorological Society, 1971. 97(412): p. 168-180. 233. Kaimal, J.C., et al., Spectral characteristics of surface-layer turbulence. Quarterly

Journal of the Royal Meteorological Society, 1972. 98(417): p. 563-589. 234. Schlegel, F., et al., Large-eddy simulation of inhomogeneous canopy flows using high

resolution terrestrial laser scanning data. Boundary-layer meteorology, 2012.

142(2): p. 223-243. 235. Cava, D. and G. Katul, The effects of thermal stratification on clustering properties of

canopy turbulence. Boundary-layer meteorology, 2009. 130(3): p. 307-325. 236. Braam, M., F. Bosveld, and A. Moene, On Monin–Obukhov Scaling in and Above the

Atmospheric Surface Layer: The Complexities of Elevated Scintillometer Measurements. Boundary-Layer Meteorology, 2012. 144(2): p. 157-177.

237. De Ridder, K., Bulk Transfer Relations for the Roughness Sublayer. Boundary-Layer

Meteorology, 2010. 134(2): p. 257-267. 238. Shaw, R., et al., Measurements of mean wind flow and three-dimensional turbulence

intensity within a mature corn canopy. Agricultural Meteorology, 1974. 13(3): p. 419-425.

239. Sawford, B. and F. Guest, Lagrangian statistical simulation of the turbulent motion of heavy particles. Boundary-Layer Meteorology, 1991. 54(1-2): p. 147-166.

240. Markkanen, T., et al., Footprints and fetches for fluxes over forest canopies with varying structure and density. Boundary-layer meteorology, 2003. 106(3): p. 437-459.

241. Siqueira, M., G. Katul, and J. Tanny, The Effect of the Screen on the Mass, Momentum, and Energy Exchange Rates of a Uniform Crop Situated in an Extensive Screenhouse. Boundary-Layer Meteorology, 2012. 142(3): p. 339-363.

242. Wilson, J.D. and T.K. Flesch, Flow boundaries in random-flight dispersion models: enforcing the well-mixed condition. Journal of Applied Meteorology, 1993. 32(11): p.

1695-1707.

243. Gao, Z., et al., Estimating gas emissions from multiple sources using a backward Lagrangian stochastic model. Journal of the Air & Waste Management Association,

2008. 58(11): p. 1415-1421. 244. Aylor, D.E., Relative collection efficiency of Rotorod and Burkard spore samplers for

airborne Venturia inaequalis ascospores. Phytopathology, 1993. 83(10): p. 1116-1119.

245. Hartill, W., Aerobiology of Sclerotinia sclerotiorum and Botrytis cinerea spores in New Zealand tobacco crops. New Zealand Journal of Agricultural Research, 1980. 23(2): p. 259-262.

246. Abawi, G. and J. Hunter, White mold of beans in New York. 1979. 247. Bock, C. and P. Cotty, Methods to sample air borne propagules of Aspergillus flavus.

European journal of plant pathology, 2006. 114(4): p. 357-362.

248. Hanna, S., D. Strimaitis, and J. Chang, Hazard Response Modeling Uncertainty (A Quantitative Method). Volume 2. Evaluation of Commonly Used Hazardous Gas Dispersion Models. 1993, DTIC Document.

249. Chang, J. and S. Hanna, Air quality model performance evaluation. Meteorology and

Atmospheric Physics, 2004. 87(1-3): p. 167-196. 250. Chang, J.C. and S.R. Hanna, Technical descriptions and user’s guide for the BOOT

statistical model evaluation software package. 2005, Version.

251. Willmott, C.J., Some comments on the evaluation of model performance. Bulletin of the American Meteorological Society, 1982. 63(11): p. 1309-1313.

183

252. Aylor, D., Y. Wang, and D. Miller, Intermittent wind close to the ground within a grass canopy. Boundary-Layer Meteorology, 1993. 66(4): p. 427-448.

253. Markkanen, T., et al., Comparison of conventional Lagrangian stochastic footprint models against LES driven footprint estimates. Atmospheric Chemistry and Physics, 2009. 9(15): p. 5575-5586.

254. Pan, Z., et al., Prediction of plant diseases through modelling and monitoring airborne pathogen dispersal. Plant Sciences Reviews 2010, 2011: p. 191.

255. Cai, X., et al., Evaluation of backward and forward Lagrangian footprint models in the surface layer. Theoretical and Applied Climatology, 2008. 93(3-4): p. 207-223.

256. Wilson, N.R. and R.H. Shaw, A higher order closure model for canopy flow. Journal

of Applied Meteorology, 1977. 16(11): p. 1197-1205. 257. Wilson, J.D., et al., Lagrangian simulation of wind transport in the urban environment.

Quarterly Journal of the Royal Meteorological Society, 2009. 135(643): p. 1586-

1602. 258. Nathan, R., et al., Mechanistic models of seed dispersal by wind. Theoretical Ecology,

2011. 4(2): p. 113-132. 259. Yi, T.H., H.N. Li, and M. Gu, Optimal sensor placement for structural health

monitoring based on multiple optimization strategies. The Structural Design of Tall

and Special Buildings, 2011. 20(7): p. 881-900. 260. Flynn, E.B. and M.D. Todd, A Bayesian approach to optimal sensor placement for

structural health monitoring with application to active sensing. Mechanical Systems and Signal Processing, 2010. 24(4): p. 891-903.

261. Rood, A.S., Performance evaluation of AERMOD, CALPUFF, and legacy air dispersion models using the Winter Validation Tracer Study dataset. Atmospheric Environment,

2014. 89: p. 707-720.

262. Rinne, J., et al., Effect of chemical degradation on fluxes of reactive compounds – a study with a stochastic Lagrangian transport model. Atmos. Chem. Phys., 2012.

12(11): p. 4843-4854. 263. Gladders, P., et al., Sclerotinia in Oilseed Rape: A Review of the 2007 Epidemic in

England. 2008, Home-Grown Cereals Authority.

264. MacGregor, J. and T. Kourti, Statistical process control of multivariate processes. Control Engineering Practice, 1995. 3(3): p. 403-414.

265. Martens, H., Multivariate calibration. 1989: John Wiley & Sons. 266. Bro, R., et al., Cross-validation of component models: a critical look at current

methods. Analytical and bioanalytical chemistry, 2008. 390(5): p. 1241-1251.

267. Kresta, J.V., J.F. MacGregor, and T.E. Marlin, Multivariate statistical monitoring of process operating performance. The Canadian Journal of Chemical Engineering,

1991. 69(1): p. 35-47. 268. Montgomery, D.C., Introduction to statistical quality control. 1991.

269. Montgomery, D.C., Introduction to Statistical Quality Control. 2004: Wiley. 270. Montgomery, D.C., et al., Integrating statistical process control and engineering

process control. Journal of quality Technology, 1994. 26(2): p. 79-87.

271. Montgomery, D.C. and W. Woodall, Research Issues and and Ideas in Statistical Process Control. Journal of Quality Technology, 1999. 31(4): p. 376-387.

272. Marjanovic, O., et al., Real-time monitoring of an industrial batch process. Computers & chemical engineering, 2006. 30(10): p. 1476-1481.

273. Goulding, P.R., et al., Fault detection in continuous processes using multivariate statistical methods. International Journal of Systems Science, 2000. 31(11): p. 1459-1471.

274. Lennox, B., et al., Application of multivariate statistical process control to batch operations. Computers & Chemical Engineering, 2000. 24(2): p. 291-296.

275. Choi, S.W., et al., Adaptive multivariate statistical process control for monitoring time-varying processes. Industrial & Engineering Chemistry Research, 2006. 45(9): p.

3108-3118.

276. Bersimis, S., S. Psarakis, and J. Panaretos, Multivariate statistical process control charts: an overview. Quality and Reliability Engineering International, 2007. 23(5):

p. 517-543.

184

277. Mason, R.L. and J.C. Young, Improving the sensitivity of the T2 statistic in multivariate process control. Journal of Quality Technology, 1999. 31(2): p. 155-165.

278. Varmuza, K. and P. Filzmoser, Introduction to multivariate statistical analysis in chemometrics. 2008: CRC press.

279. Phaladiganon, P., et al., Principal component analysis-based control charts for multivariate nonnormal distributions. Expert Systems with Applications, 2013. 40(8):

p. 3044-3054. 280. Ferrer, A., Multivariate statistical process control based on principal component

analysis (MSPC-PCA): some reflections and a case study in an autobody assembly process. Quality Engineering, 2007. 19(4): p. 311-325.

281. Jackson, J.E. and G.S. Mudholkar, Control procedures for residuals associated with principal component analysis. Technometrics, 1979. 21(3): p. 341-349.

282. Chou, Y.-M., R.L. Mason, and J.C. Young, The control chart for individual observations from a multivariate non-normal distribution. Communications in Statistics-Theory and Methods, 2001. 30(8-9): p. 1937-1949.

283. Tracy, N., J. Young, and R. Mason, Multivariate control charts for individual observations. Journal of Quality Technology, 1992. 24(2).

284. Westerhuis, J.A., S.P. Gurden, and A.K. Smilde, Generalized contribution plots in multivariate statistical process monitoring. Chemometrics and Intelligent Laboratory Systems, 2000. 51(1): p. 95-114.

285. Kourti, T., The process analytical technology initiative and multivariate process analysis, monitoring and control. Analytical and bioanalytical chemistry, 2006.

384(5): p. 1043-1048. 286. Kourti, T. and J.F. MacGregor, Process analysis, monitoring and diagnosis, using

multivariate projection methods. Chemometrics and intelligent laboratory systems,

1995. 28(1): p. 3-21. 287. Walczak, B. and D. Massart, Dealing with missing data: Part I. Chemometrics and

Intelligent Laboratory Systems, 2001. 58(1): p. 15-27. 288. Camacho, J. and A. Ferrer, Cross‐validation in PCA models with the element‐wise k‐

fold (ekf) algorithm: theoretical aspects. Journal of Chemometrics, 2012. 26(7): p.

361-373. 289. Kjeldahl, K. and R. Bro, Some common misunderstandings in chemometrics. Journal

of Chemometrics, 2010. 24(7-8): p. 558-564.

290. Van Ginkel, J.R., P.M. Kroonenberg, and H.A. Kiers, Missing data in principal component analysis of questionnaire data: a comparison of methods. Journal of

Statistical Computation and Simulation, 2014. 84(11): p. 2298-2315. 291. Ilin, A. and T. Raiko, Practical approaches to principal component analysis in the

presence of missing values. The Journal of Machine Learning Research, 2010. 11: p.

1957-2000. 292. Little, R.J. and D.B. Rubin, Statistical analysis with missing data. 2014: John Wiley &

Sons. 293. Nelson, P.R.C., J.F. MacGregor, and P.A. Taylor, The impact of missing measurements

on PCA and PLS prediction and monitoring applications. Chemometrics and Intelligent Laboratory Systems, 2006. 80(1): p. 1-12.

294. Joe Qin, S., Statistical process monitoring: basics and beyond. Journal of

chemometrics, 2003. 17(8‐9): p. 480-502.

295. Doornik, J.A. and H. Hansen, An omnibus test for univariate and multivariate normality*. Oxford Bulletin of Economics and Statistics, 2008. 70(s1): p. 927-939.

296. Royston, P., Approximating the Shapiro-Wilk W-Test for non-normality. Statistics and Computing, 1992. 2(3): p. 117-119.

297. Shapiro, S.S. and M.B. Wilk, An analysis of variance test for normality (complete samples). Biometrika, 1965: p. 591-611.

298. Yin, S., et al., A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. Journal of Process Control, 2012. 22(9): p. 1567-1581.

299. Qin, S.J., Survey on data-driven industrial process monitoring and diagnosis. Annual Reviews in Control, 2012. 36(2): p. 220-234.

185

300. Barceló, S., S. Vidal-Puig, and A. Ferrer, Comparison of multivariate statistical methods for dynamic systems modeling. Quality and Reliability Engineering International, 2011. 27(1): p. 107-124.

301. Quevedo, J., et al., Validation and reconstruction of flow meter data in the Barcelona water distribution network. Control Engineering Practice, 2010. 18(6): p. 640-651.

302. Pollice, A. and G. Jona Lasinio, Spatiotemporal analysis of the PM10 concentration over the Taranto area. Environmental Monitoring and Assessment, 2010. 162(1-4): p. 177-190.

303. Nelson, P.R., P.A. Taylor, and J.F. MacGregor, Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemometrics and intelligent

laboratory systems, 1996. 35(1): p. 45-65. 304. Lennox, B., et al., Process monitoring of an industrial fed‐batch fermentation.

Biotechnology and Bioengineering, 2001. 74(2): p. 125-135.

305. Arteaga, F. and A. Ferrer, Dealing with missing data in MSPC: several methods, different interpretations, some examples. Journal of chemometrics, 2002. 16(8‐10):

p. 408-418.

306. Alcala, C.F. and S. Joe Qin, Analysis and generalization of fault diagnosis methods for process monitoring. Journal of Process Control, 2011. 21(3): p. 322-330.

307. Alcala, C.F. and S.J. Qin, Reconstruction-based contribution for process monitoring. Automatica, 2009. 45(7): p. 1593-1600.

308. Pisoni, E., C. Carnevale, and M. Volta, Multi-criteria analysis for PM10 planning. Atmospheric Environment, 2009. 43(31): p. 4833-4842.

309. Carnevale, C., et al., Neuro-fuzzy and neural network systems for air quality control. Atmospheric Environment, 2009. 43(31): p. 4811-4821.

310. Li, G., et al., Reconstruction based fault prognosis for continuous processes. Control Engineering Practice, 2010. 18(10): p. 1211-1219.

311. Dunia, R. and S. Joe Qin, Subspace approach to multidimensional fault identification and reconstruction. AIChE Journal, 1998. 44(8): p. 1813-1831.

312. Cressie, N., Statistics for Spatial Data. 1991: John Wiley & Sons.

313. Li, J. and A.D. Heap, A review of spatial interpolation methods for environmental scientists. 2008, Geoscience Australia Canberra. p. 137.

314. Denby, B., et al., Interpolation and assimilation methods for European scale air quality assessment and mapping. Part I: Review and Recommendations. European

Topic Centre on Air and Climate Change Technical Paper, 2005. 7.

315. Horálek, J., et al., Interpolation and assimilation methods for European scale air quality assessment and mapping, Part II: Development and testing new methodologies. ETC/ACC Technical Paper, 2005. 7.

316. Zidek, J.V., W. Sun, and N.D. Le, Designing and integrating composite networks for monitoring multivariate Gaussian pollution fields. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2000. 49(1): p. 63-79.

317. Wong, D.W., L. Yuan, and S.A. Perlin, Comparison of spatial interpolation methods for the estimation of air quality data. J Expo Anal Environ Epidemiol, 2004. 14(5): p. 404-415.

318. Clark, I. and W. Harper, Practical geostatistics. 2000, Columbus, OH: Ecosse North American LLC.

319. Li, J. and A.D. Heap, A review of comparative studies of spatial interpolation methods in environmental sciences: performance and impact factors. Ecological Informatics, 2011. 6(3): p. 228-241.

320. Goovaerts, P., Geostatistics for natural resources evaluation. 1997: Oxford university press.

321. Isaaks, E.H. and R.M. Srivastava, An introduction to applied geostatistics. 1989. 322. Burrough, P.A. and R. McDonnell, Principles of geographical information systems. Vol.

333. 1998, Oxford: Oxford university press

323. Gräler, B., L. Gerharz, and E. Pebesma, Spatio-temporal analysis and interpolation of PM10 measurements in Europe. ETC/ACM Technical Paper, 2011. 10.

186

324. Kim, S.-Y., et al., Ordinary kriging approach to predicting long-term particulate matter concentrations in seven major Korean cities. Environmental Health and Toxicology, 2014. 29: p. e2014012.

325. Pires, J. and F. Martins, Evaluation of spatial variability of PM10 concentrations in London. Water, Air, & Soil Pollution, 2012. 223(5): p. 2287-2296.

326. Pires, J.C., et al., Evaluation of redundant measurements on the air quality monitoring network of Lisbon and Tagus Valley. Chemical Product and Process Modeling, 2009. 4(4): p. 14.

327. Lu, W.-Z., H.-D. He, and L.-y. Dong, Performance assessment of air quality monitoring networks using principal component analysis and cluster analysis. Building

and Environment, 2011. 46(3): p. 577-583. 328. Lau, J., W. Hung, and C. Cheung, Interpretation of air quality in relation to monitoring

station's surroundings. Atmospheric Environment, 2009. 43(4): p. 769-777.

329. Ibarra-Berastegi, G., et al., Assessing spatial variability of SO 2 field as detected by an air quality network using self-organizing maps, cluster, and principal component analysis. Atmospheric Environment, 2009. 43(25): p. 3829-3836.

330. Afif, C., et al., Statistical approach for the characterization of NO2 concentrations in Beirut. Air Quality, Atmosphere & Health, 2009. 2(2): p. 57-67.

331. Wise, B.M. and N.B. Gallagher, The process chemometrics approach to process monitoring and fault detection. Journal of Process Control, 1996. 6(6): p. 329-348.

332. Tao, E., et al., Fault diagnosis based on PCA for sensors of laboratorial wastewater treatment process. Chemometrics and Intelligent Laboratory Systems, 2013. 128: p.

49-55. 333. Qin, J., et al., Detection of citrus canker using hyperspectral reflectance imaging with

spectral information divergence. Journal of Food Engineering, 2009. 93(2): p. 183-

191. 334. Pires, J., et al., Identification of redundant air quality measurements through the use

of principal component analysis. Atmospheric Environment, 2009. 43(25): p. 3837-3842.

335. Chen, T., E. Martin, and G. Montague, Robust probabilistic PCA with missing data and contribution analysis for outlier detection. Computational Statistics & Data Analysis, 2009. 53(10): p. 3706-3716.

336. Wang, H., et al., Data Driven Fault Diagnosis and Fault Tolerant Control: Some Advances and Possible New Directions. Acta Automatica Sinica, 2009. 35(6): p. 739-

747.

337. EPA, D., Integrated science assessment for particulate matter. US Environmental Protection Agency Washington, DC, 2009.

338. Callén, M.S., et al., Comparison of receptor models for source apportionment of the PM10 in Zaragoza (Spain). Chemosphere, 2009. 76(8): p. 1120-1129.

339. Contini, D., et al., Characterisation and source apportionment of PM10 in an urban background site in Lecce. Atmospheric Research, 2010. 95(1): p. 40-54.

340. Cawley, G.C. and N.L. Talbot, On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research, 2010. 11: p. 2079-2107.

341. Mercer, L.D., et al., Comparing universal kriging and land-use regression for predicting concentrations of gaseous oxides of nitrogen (NOx) for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Atmospheric Environment,

2011. 45(26): p. 4412-4420. 342. Son, J.-Y., M.L. Bell, and J.-T. Lee, Individual exposure to air pollution and lung

function in Korea: spatial analysis using multiple exposure approaches. Environmental research, 2010. 110(8): p. 739-749.

343. USEPA, Quality Assurance Handbook for Air Pollution Measurement Systems. 2008.

344. AQEG, Particulate Matter in the UK. 2005, Defra: London.

345. Green, D.C., G.W. Fuller, and T. Baker, Development and validation of the volatile correction model for PM 10–An empirical method for adjusting TEOM measurements

187

for their loss of volatile particulate matter. Atmospheric Environment, 2009. 43(13):

p. 2132-2141. 346. Velusamy, V., et al., An overview of foodborne pathogen detection: In the perspective

of biosensors. Biotechnology Advances, 2010. 28(2): p. 232-254. 347. Hill, D.J. and B.S. Minsker, Anomaly detection in streaming environmental sensor

data: A data-driven modeling approach. Environmental Modelling & Software, 2010.

25(9): p. 1014-1022. 348. Estévez, J., P. Gavilán, and J.V. Giráldez, Guidelines on validation procedures for

meteorological data from automatic weather stations. Journal of Hydrology, 2011. 402(1): p. 144-154.

349. Akkala, A., V. Devabhaktuni, and A. Kumar, Interpolation techniques and associated software for environmental data. Environmental Progress & Sustainable Energy,

2010. 29(2): p. 134-141.

350. Moustris, K.P., et al., Development and Application of Artificial Neural Network Modeling in Forecasting PM10 Levels in a Mediterranean City. Water, Air, & Soil

Pollution, 2013. 224(8): p. 1-11. 351. Zhang, H., et al., Evaluation of PM10 forecasting based on the artificial neural network

model and intake fraction in an urban area: A case study in Taiyuan City, China. Journal of the Air & Waste Management Association, 2013. 63(7): p. 755-763.

352. Hoek, G., et al., A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmospheric environment, 2008. 42(33): p. 7561-7578.

353. Liang, L., et al., Rapid detection and quantification of fungal spores in the urban atmosphere by flow cytometry. Journal of Aerosol Science, 2013. 66: p. 179-186.

354. Rundel, P.W., et al., Environmental sensor networks in ecological research. New

Phytologist, 2009. 182(3): p. 589-607.

355. Elshenawy, L.M., et al., Efficient recursive principal component analysis algorithms for process monitoring. Industrial & Engineering Chemistry Research, 2009. 49(1):

p. 252-259. 356. Liu, X., et al., Moving window kernel PCA for adaptive monitoring of nonlinear

processes. Chemometrics and Intelligent Laboratory Systems, 2009. 96(2): p. 132-

143. 357. Pesquer, L., A. Cortés, and X. Pons, Parallel ordinary kriging interpolation

incorporating automatic variogram fitting. Computers & Geosciences, 2011. 37(4): p. 464-473.

188

Appendix 1: Original Plan and Modifications Made

The Original Plan: The original plan of the SYIELD project was to conduct three field trials by 2012. The first was to test a working prototype of the biosensor in the field, the second to deploy multiple units, and the third to deploy the sensors on a regional scale. This plan was based on a biosensor development timeline that would deliver a working prototype by 2010. The field trials were to take place at research facilities, such as Rothamsted and Velcourt Farms, as well as in areas recognised as hotspots for Sclerotinia spores. The research goals were to use the data from the field trials to determine spore ingress into canopy environments and to use data mining methodologies to design the interpolation methods that would be used during deployment.

The Challenges/Limitations: Owing to logistical and technological issues, biosensor development was delayed by nearly two years, so the sensors were not in a field-deployable state by early 2013. As a consequence, the author initially had no field data to work with.

The Modifications: To overcome these challenges, the author conceived and planned a field trial experiment to generate data (see the field trial experiment details in section 3.3.1). Rothamsted had earlier (winter 2012) sown sclerotia in an OSR field for a local field trial, and the biosensor chips were available. It was possible to make electrochemical measurements with the biosensor chips using a handheld potentiostat and a bespoke connector. However, it was realised that the field trial could not collect data on a scale that would be meaningful for empirical methods. The original goals were therefore modified and a physical modelling approach was adopted instead. The idea was to evaluate a physical model capable of estimating spore concentrations in a canopy environment; the resulting data could then be scaled and used to address some aspects of the original goals. The field trial experiment was accordingly designed with the intention of collecting spore data, measuring the concentration with biosensor chips and using the results to evaluate the physical model. However, a preliminary calibration test indicated that the biosensors might not be sensitive enough to provide meaningful data, at which point a more reliable quantification technique was incorporated into the experimental plan.
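For illustration only, the short Python sketch below shows the kind of one-dimensional Lagrangian stochastic (random-flight) calculation that underlies physical dispersal models of this type: particle vertical velocities evolve by a Langevin equation and spore concentrations are recovered by counting trajectories. It assumes homogeneous Gaussian turbulence, and every parameter value (release height, velocity variance, Lagrangian time scale, settling velocity) is an illustrative assumption rather than the calibrated model used in this thesis.

import numpy as np

# Minimal 1-D Lagrangian stochastic sketch for vertical spore dispersal.
# All parameter values are illustrative assumptions, not field-calibrated.
rng = np.random.default_rng(42)

n_particles = 5000   # number of released spores (assumption)
dt = 0.05            # time step [s]
n_steps = 2000       # simulation length (100 s of dispersal)
z_source = 0.1       # release height [m], e.g. apothecia near the ground
sigma_w = 0.3        # std dev of vertical velocity fluctuations [m/s] (assumption)
T_L = 2.0            # Lagrangian time scale [s] (assumption)
v_s = 0.005          # spore settling velocity [m/s] (assumption)

z = np.full(n_particles, z_source)          # particle heights
w = rng.normal(0.0, sigma_w, n_particles)   # initial vertical velocities

for _ in range(n_steps):
    # Langevin equation for homogeneous turbulence:
    # dw = -(w / T_L) dt + sqrt(2 sigma_w^2 / T_L) dW
    dW = rng.normal(0.0, np.sqrt(dt), n_particles)
    w += -(w / T_L) * dt + np.sqrt(2.0 * sigma_w**2 / T_L) * dW
    z += (w - v_s) * dt
    # Perfect reflection at the ground preserves the well-mixed
    # condition for this homogeneous case.
    below = z < 0.0
    z[below] = -z[below]
    w[below] = -w[below]

# Fraction of spores remaining within a 1 m canopy layer at the end
print("fraction in canopy layer:", np.mean(z < 1.0))

In a full model the turbulence statistics would vary with height through the canopy and the concentration field would be estimated from particle residence times, but the trajectory step above is the core of the approach.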

The unreliability of the biosensor also raised the priority of data integrity issues within the research and motivated the methodology presented in Chapter 5.

