Postharvest Biology and Technology 39 (2006) 1–9

A methodological approach for the identification and quantification of sources of biological variance in postharvest research

Bart De Ketelaere a,*, Jo Stulens a, Jeroen Lammertyn a, N.V. Cuong b, Josse De Baerdemaeker a

a K.U. Leuven, BIOSYST-MeBioS (Mechatronics, Biostatistics, Sensors), Kasteelpark Arenberg 30, 3001 Leuven, Belgium
b Department of Agricultural Machinery and Postharvest Technology, Campus II, Can Tho University, 3/2 Street, Can Tho City, Vietnam

* Corresponding author. Tel.: +32 16 32 85 93; fax: +32 16 32 85 90. E-mail address: [email protected] (B. De Ketelaere).

Received 29 March 2005; accepted 11 September 2005

0925-5214/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.postharvbio.2005.09.004

Abstract

A correct identification and quantification of the different sources of variance in an experimental dataset is of utmost importance, for instance, when comparing treatment groups, or in the case where there is a need for describing the (future) behaviour of a batch of biological products. The total data variance can be split up into two different parts, one describing the biological variance due to the natural heterogeneity of the batch and the other describing the uncertainty due to the imperfect measurement of the attribute considered. The classical approach to include biological variance in postharvest research is to use a two stage approach in which in a first stage a (non-linear) model is built for each product individually, whereafter inferences are based on the parameters obtained from the first stage. In this contribution, we propose a methodological approach to identify and quantify the different sources of biological variance, using the concept of (non-linear) mixed effects models. Such models are a useful tool to handle repeated measures data containing a high biological variance. The concept is demonstrated on a practical dataset of postharvest firmness changes in mangoes. It is shown that aside from the differences in biological age of the mangoes, the decay rate also varies among mangoes. Furthermore, it is shown that the biological variance is the dominating source of variance during the experiment.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Non-linear mixed effects models; Postharvest quality; Modelling; Biological variance

1. Introduction

In postharvest research, one of the main goals is to provide the consumer with fruit of high quality. Often, high quality means that the batch provided to the consumer is very uniform. However, the natural heterogeneity of fruit makes it difficult to fulfil the consumers’ expectation and models capable of quantifying the natural heterogeneity are needed, and they could even help making predictions about the batch behavior. In the literature, several approaches to include the presence of biological variance into a model can be found. In general, the approaches assume that the change in behavior is deterministic (fixed effects), and any biological variation is then included as a stochastic deviation of a single individual around the deterministic part (random effects). Whether deterministic or stochastic, it is important that the methodology used is based on insight into the underlying biological processes (Tijskens et al., 2003). For the deterministic part, several models are widely available that allow for the interpretation of its parameters in terms of these underlying processes. Some excellent examples in the field of postharvest biology and technology can be found in Schouten et al. (1997), Hertog et al. (1999), Tijskens et al. (2003), among others. For the stochastic part, the most widely used methodology is the two stage approach. In a first stage, an individual (non-linear) model is fitted for each product resulting in estimates for the individual model parameters, whereas in the second stage inferences (for instance, testing cultivar effects) are performed using the model parameter estimates obtained in the first stage. Some examples of the use of a two stage approach as a tool to add stochastics in the modelling strategy can be found in Schouten et al. (2004) and Hertog et al. (2004a). With experimental variance, a combination of biological variance and uncertainty is meant. As Nauta (2000) explains, the separation of biological variance and whatever


Nomenclature

$A_i$    design matrix linked to the fixed effects
$B_i$    design matrix linked to the random effects
$b_i$    q-dimensional random effects vector
$C$    a location parameter in the sigmoidal model
$dt_i$    random term denoting the different biological ages of the products in the batch (days)
$dk_i$    random term denoting the different decay rates of the products in the batch (day$^{-1}$)
$F_i(t)$    firmness of mango i at time t (Hz$^2$ m$^{2/3}$)
$F_{\max}$    (theoretical) maximal value of the firmness (Hz$^2$ m$^{2/3}$)
$F_{\min}$    (theoretical) minimal value of the firmness (Hz$^2$ m$^{2/3}$)
$f_i$    general, real-valued, differentiable function
$G^2$    likelihood ratio test statistic
$I$    identity matrix
$k$    firmness decay rate (day$^{-1}$)
$\ell$    log likelihood
$N(\mu, \Sigma)$    normal distribution with mean vector $\mu$ and variance–covariance matrix $\Sigma$
$n_i$    number of repeated measures for mango i
$n_{MC}$    number of Monte Carlo simulations
$P$    P-value
$T$    superscript T denotes transpose
$t$    time (days)
$v_i$    covariate vector
$y_i$    vector of repeated measurements on subject i

Greek letters

$\beta$    p-dimensional vector of fixed effects
$\chi^2_{df}$    Chi$^2$ distribution with d.f. degrees of freedom
$\varepsilon_i$    uncertainty of the model
$\phi_i$    mango specific parameter vector
$\gamma$    parameter in the heteroscedastic variance function
$\sigma^2_{dt}$    variance on $dt_i$ (days$^2$)
$\sigma^2_{dk}$    variance on $dk_i$ (day$^{-2}$)
$\psi$    variance covariance matrix of the parameters
$\wedge$    hat denotes estimate

source of uncertainty (due to measurement error or model inadequacy) could be very informative. Taking that reasoning one step further, one can postulate also that identification and quantification of the different sources of biological variance is of prime interest if one wants to hold onto a modelling strategy that intends to: (1) describe the true underlying biological processes and (2) make accurate predictions of the future behaviour of the subjects under study. For the two stage approach, the identification and quantification of the different sources of biological variance is often of a heuristic nature and is performed by evaluating differences in the values of the coefficient of determination or root mean squared errors (RMSE) when adding or removing a variance source from the model (Schouten et al., 1997; Hertog et al., 2004a).

Another approach to include experimental variation into an analysis is provided by the concept of mixed effects models, where the term ‘mixed’ points to the presence of both fixed and random effects (Verbeke and Molenberghs, 2000; Diggle et al., 2002). This type of model has been successfully used by several authors to model postharvest data (De Ketelaere et al., 2003, 2004; Lammertyn et al., 2003). The model allows for separation of biological variance and uncertainty, which makes it very appealing. Furthermore, it offers a framework to test which of the sources of variance are significantly present. This is needed in order to make accurate predictive models, or to test treatment or cultivar effects. In this text, emphasis will be put on this type of statistical models as a powerful alternative to the two stage approach.

The objective of this study is to apply the concepts of non-linear mixed effects models as a framework to identify and quantify different sources of biological variance encountered in postharvest research. The advantages of mixed effect models over two stage models will be discussed and illustrated on a practical dataset on quality change of mangoes.

2. Materials and methods

2.1. Mango fruit

The mango fruit (Mangifera indica L., ‘Cat Chu’ cultivar) were bought directly from a farmer in the Dong Thap province, Vietnam. Mangoes were harvested under commercial conditions in September 2004. Directly after harvest, the mangoes were transported to the laboratory within a 2 h time span. When delivered in the laboratory, each mango was individually tagged. A total of 90 mangoes was stored in small plastic boxes and stacked under controlled atmosphere conditions (ambient gas atmosphere, 12 °C, 70% RH). Fruit were measured on a regular basis (2–3 days) until complete softening. The total measurement span of the experiment was 16 days.

2.2. Firmness assessment

For the acoustic firmness measurements, a commercial desktop unit was used (AFS, AWETA, Nootdorp, The Netherlands). Firmness measurements were performed along the equator of the fruit at two equidistant places. These places were chosen such that they are perpendicular to the symmetry plane of the stone of the fruit. The positioning of the stone could be deduced non-destructively from the symmetry plane of the fruit.

2.3. Firmness decay: parametric model formulation

To describe the firmness change of the mangoes during storage, a simple sigmoidal function was assumed, in analogy with assumptions made by Hertog et al. (2004b) and was


defined as follows:

$$F_i(t) = F_{\max} - \frac{F_{\max} - F_{\min}}{1 + \exp(C - kt)}, \qquad (1)$$

where $F_i(t)$ is the firmness of the i-th mango at time t, $F_{\max}$ the upper asymptote of the sigmoidal function, $F_{\min}$ the lower firmness asymptote, C a shift parameter and k the decay rate. Straightforward calculus reveals that the inflection point of the curve is given by $t = C/k$.
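As a minimal sketch of Eq. (1), the sigmoidal firmness curve and its inflection point can be written in a few lines of Python. The function names and the numerical values below are illustrative assumptions and are not estimates taken from the paper.

```python
import math


def firmness(t, f_max, f_min, c, k):
    """Eq. (1): sigmoidal decay from f_max (early) to f_min (late)."""
    return f_max - (f_max - f_min) / (1.0 + math.exp(c - k * t))


def inflection_time(c, k):
    """Time at which the exponent C - k*t is zero, i.e. t = C/k."""
    return c / k


# Hypothetical parameter values, chosen only to exercise the functions.
F_MAX, F_MIN, C, K = 100.0, 2.5, 1.0, 0.35

t_star = inflection_time(C, K)
midpoint = firmness(t_star, F_MAX, F_MIN, C, K)
print(f"inflection at t = {t_star:.2f} days, firmness there = {midpoint:.2f}")
```

At the inflection point the curve sits exactly halfway between the two asymptotes, which is a quick sanity check on any implementation of Eq. (1).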

2.4. A two stage approach

In the first stage, the non-linear function (Eq. (1)) is fitted to the data of each product separately, resulting in product specific estimates for each of the parameters in Eq. (1). Let us denote the product specific parameter vector by $\phi_i = [F_{\max,i}\ F_{\min,i}\ C_i\ k_i]^T$. In the second stage of the approach, the joint distribution of the parameter vector $\phi_i$ is specified. In this way, the random (product specific, stochastic) sources are analysed. Often, a multivariate normal distribution is assumed, $\phi_i \sim N(\phi, \psi)$, but in theory any other appropriate probability density function can be chosen. Here $\psi$ denotes the variance covariance matrix that allows for the possible correlation among the parameter estimates of the parameter vector $\phi$. Any testing for treatment or cultivar effects is then performed by using a multivariate analysis of variance (MANOVA), for example, using Hotelling’s $T^2$ or Wilks’ Lambda, in order to account for the covariances among parameters. In the case where $\psi$ is a diagonal matrix, a one-way ANOVA can be used (Johnson and Wichern, 1998). In the two stage approach, the uncertainty is related to the model fit of the first stage, while the biological variance is related to the variance in model parameters, as summarized in the second stage.

2.5. Introducing biological variance

Eq. (1) provides the deterministic function describing the average firmness decay. We now introduce biological variance into Eq. (1) on different levels. In order to keep notation and calculations orderly, we restrict ourselves to three different sources of data variance. First, we make the distinction between biological variance and uncertainty, in view of the arguments by Nauta (2000). By uncertainty, we address the fact that: (1) the instrument used (here the AFS) has an inevitable measurement error and (2) the eventual model inadequacy. With biological variance, the different sources of the true heterogeneity of the considered batch are meant. This biological variance is estimated using extra variance parameters in the model. In the presented example, we will restrict ourselves to only two components of biological variance, namely the biological age of the fruit ($dt_i$) and its decay rate k ($dk_i$). Here, the biological age of an individual fruit is defined as the age relative to the batch average. It thus represents a time shift in the profiles around this average. As a natural consequence, $dt_i$ has days as units and $dt_i = 1$ denotes a mango fruit being 1 day ‘older’ than the batch average. Introducing these concepts into Eq. (1) gives

$$F_i(t) = F_{\max} - \frac{F_{\max} - F_{\min}}{1 + \exp\left(C - (k + dk_i)(t + dt_i)\right)} + \varepsilon_i, \qquad (2)$$

with the parameters defined as above. The sources of biological variance $dk_i$ and $dt_i$ are assumed to be normally distributed with a zero mean and a variance $\sigma^2_{dk}$ and $\sigma^2_{dt}$, respectively. The term $\varepsilon_i \sim N(0, \Sigma_i)$ denotes the uncertainty having a zero mean vector and a variance covariance matrix $\Sigma_i$. The subscript i clearly indicates that the value of these random components depends on the fruit (mango). The inflection point of this function (Eq. (2)) is now given by $(C - dt_i(k + dk_i))/(k + dk_i)$. For the case $dk_i$ equals 0, the inflection point is given by $C/k - dt_i$, which differs by $dt_i$ from the inflection point of Eq. (1). In case $C = 0$, the inflection point is given by $-dt_i$. In the next section, it will be elucidated how the fixed (deterministic) and random (stochastic) parts of Eq. (2) can be estimated in one estimation procedure, rather than through a two stage approach.

2.6. The concept of non-linear mixed effects models

Non-linear mixed effects models are a natural extension to the linear mixed effects models (Verbeke and Molenberghs, 2000) in order to allow the fixed and random effects to enter in a non-linear way in the model function. Although the estimates of the model parameters are obtained from one single optimization routine, it can best be thought of as a hierarchical model, in which the first level models the i-th fruit as

$$y_i = f_i(\phi_i, v_i) + \varepsilon_i, \qquad (3)$$

with $y_i$ representing the vector of $n_i$ repeated measurements on fruit $i = 1, \ldots, M$, $f$ a general, real-valued, differentiable function of the subject specific parameter vector $\phi_i$ and a covariate vector $v_i$, and $\varepsilon_i$ the normally distributed error vector with some variance covariance matrix $\Sigma_i$. As such it can be thought of as being similar to the first stage of the two stage approach. In our practical example, $v_i$ could be the storage time and $\phi_i$ the four parameters of the sigmoidal function. The function $f$ is non-linear in at least one component of the subject specific parameter vector $\phi_i$, which is modelled as

$$\phi_i = A_i \beta + B_i b_i, \qquad (4)$$

where $\beta$ is a p-dimensional vector of fixed effects and $b_i$ is a q-dimensional random effects vector which is assumed to be normally distributed with a zero vector as mean and variance covariance matrix $\psi$. The matrices $A_i$ and $B_i$ are of appropriate dimensions and depend on the subject and possibly on the values of the covariates used in the model (Pinheiro and Bates, 2000). The assumptions imposed on the subject specific parameter vector are comparable to the second stage of the two stage approach where an overall distribution for the parameter estimates that are obtained for each specimen


individually is assumed. In our practical case, we assume the decay rate and the biological age to be bivariate normally distributed.

Through the random effects and associated distribution, this type of model provides a very flexible way to handle two important characteristics of the repeated measures data on biological products such as fruit, being the inherent biological variance due to the natural heterogeneity of the fruit, and the heteroscedasticity of the data. Heteroscedasticity is encountered in many storage experiments because initially, at harvest, there is a considerable amount of variance in quality of the products, the magnitude of which is mainly determined by the harvest criteria, and the preciseness of applying those criteria. The variance typically decreases as the storage experiment comes to an end because all products will have a ‘low’ quality at that time. A correct handling of heteroscedasticity is crucial in the estimation of the standard errors of the parameters and, hence, significance. Coping with the natural batch heterogeneity is important from mainly a practical point of view. While in a classical regression procedure all variance above the population average is attributed to ‘unexplained’, the mixed effects model is capable of dividing variance in unexplained variance (being a sum of measurement error and model inaccuracy), and batch heterogeneity. An estimate of the natural batch heterogeneity is important from the consumers’ perspective, wishing to have a very uniform product. Moreover, the ratio of both quantities mainly determines the sorting ability of the fruit with a given device.

Different methods have been proposed in the literature to estimate the parameters in the non-linear mixed effects model, most of them based on the likelihood function. A detailed discussion on this topic, which falls beyond the scope of this paper, is given in Pinheiro and Bates (2000). Nevertheless, it is of utmost importance to have the appropriate test statistics to investigate which fixed and random (the different components of biological variance) components are significantly present, and to accurately quantify them. The most widely used test statistic to compare two models is the so-called likelihood ratio test (LRT), denoted by the symbol $G^2$, which compares the likelihood of both models being the ‘true’ model. In the results section, the log likelihood of a given model is denoted by the symbol $\ell$. The statistic $G^2$ follows a Chi-squared ($\chi^2$) distribution with degrees of freedom equal to the difference between the number of parameters of the two compared models. A related test statistic is the Akaike Information Criterion (AIC). Preferred models are those with a small AIC (Verbeke and Molenberghs, 2000).

For the comparison of confidence limits of the two stage model and the non-linear mixed effects models, Monte Carlo simulations ($n_{MC} = 1000$) (Rubinstein, 1981) based on the appropriate parameter distributions were performed, resulting in 1000 mango firmness decay curves. From these curves, the confidence limits were generated. The Monte Carlo simulations were also used to quantify the biological variance at each time point since an analytic expression of the biological variance as a function of time using Eq. (2) was not feasible. The quantification of the uncertainty was done in a straightforward way using the analytic expression for $\Sigma_i$ (see further, Eq. (5)).

The SAS software Version 8.2 (The SAS Institute Inc., NC, USA) was used throughout all statistical analyses. The Matlab software version 6.5 (The Mathworks Inc., USA) was used for data visualisations and Monte Carlo simulations.

The methodology for the identification and quantification of the different sources of biological variance can thus be summarized as follows:

1. choose an appropriate deterministic population model (fixed effects);

2. add sources of biological variance (random effects) to the population model;

3. fit the stochastic model from step 2 to the data, using the concepts of (non-)linear mixed effects models;

4. perform the appropriate tests (likelihood ratio test or AIC) to identify significant population parameters (deterministic part) and sources of biological variance (stochastic part). Re-fit the model with the significant deterministic and stochastic parameters;

5. use the estimates of the random effects to quantify the sources of biological variance.

3. Results and discussion

In Fig. 1, an overview of the firmness decay of the 90 mangoes is given using box plots. It is clear that at the start of the measurements (day 2), the mangoes are already at or around the inflection point of the logistic curve and only the exponential decay is observed. The figure also clearly shows the highly heteroscedastic nature of the data, with experimental variance being large at harvest and decreasing during storage. Although heteroscedasticity does not (asymptotically) affect the point estimates of the model parameters when applying a classical fixed effects non-linear regression model, it has an important repercussion on the estimation of their variance, and, thus, on inferences drawn from these estimates.

Fig. 1. Box plots of the mango data showing the firmness decay as a function of time. Heteroscedasticity is evident.

Fig. 2. Box plot of the residuals from model 1 showing the presence of a time dependency of the residual variance.

The non-linear mixed effects model based on Eq. (2) was built, with two sources of biological variance ($dk_i$ and $dt_i$) that possibly covary, and four fixed effects (deterministic or batch) parameters ($F_{\max}$, $F_{\min}$, C and k). In this first step, the error variance covariance matrix $\Sigma_i$ was assumed to have the form $\sigma^2 I$, denoting that the error variance is constant over time and error terms are independent (model 0). The $-2\ell$ of the full model 0 is 2408. The model without parameter C (model 1) has a $-2\ell$ of 2409.7, resulting in a $G^2$ of 1.7 on 1 degree of freedom favouring the model without C (P = 0.19). Removing the biological variance, $dk_i$, on the decay parameter k from model 1 results in a model 2 with a $-2\ell$ of 2883.8, or a $G^2$ of 474.1 referring to a highly significant (P < 0.0001) effect of the biological variance. Also removing the random effect on biological age $dt_i$ from model 1 results in a far less accurate model (model 3, $-2\ell$ = 3033.4, $G^2$ = 623.7, d.f. = 1:2, P < 0.0001). For this model 3, it was observed that the estimated values were very different from all other models. A detailed investigation of the parameter estimation procedure for this model indicated that the optimum found was not stable and that results for this model should be taken with some reservation. For comparison, an ordinary non-linear least squares regression model (no random effects used, model 4) has a $-2\ell$ of 3219.9, having only three parameters less than model 1 ($G^2$ = 810.2, d.f. = 3, P < 0.0001).
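The likelihood ratio comparisons quoted above can be reproduced by hand. The sketch below is an illustration only: the helper names are hypothetical, the chi-squared tail probability is computed with a standard recurrence for integer degrees of freedom rather than with the SAS routines the authors used, and the mixture case d.f. = 1:2 is not handled.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-squared variable, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # exact result for df = 1
    else:
        q, a = math.exp(-half), 1.0              # exact result for df = 2
    # Raise the degrees of freedom in steps of 2 up to the requested df.
    for _ in range((df - 1) // 2):
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(m2ll_reduced, m2ll_full, df):
    """G^2 is the difference in -2*log-likelihood between nested models."""
    g2 = m2ll_reduced - m2ll_full
    return g2, chi2_sf(g2, df)


def aic(m2ll, n_params):
    """Akaike Information Criterion: -2*log-likelihood plus twice the parameter count."""
    return m2ll + 2 * n_params


# Model 1 (without C) against model 0 (with C), values quoted in the text.
g2, p_value = likelihood_ratio_test(2409.7, 2408.0, 1)
print(f"G2 = {g2:.1f}, P = {p_value:.2f}")
print("AIC model 0:", aic(2408.0, 8))
```

Counting eight parameters for model 0 (four fixed effects, the error variance and three variance covariance terms of the random effects) is an assumption made here; it reproduces the AIC of 2424 reported for that model.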

Fig. 2 shows a box plot of the residuals from model 1. The parametric model formulation (Eq. (2)) seems to describe the data reasonably (there is no important systematic pattern present in the residuals), but the assumption of a constant error variance is highly questionable. Indeed, the error variance is small around harvest and increases drastically up to 7 days of storage. Afterwards it decreases systematically towards the end of the experiment. It is believed that the small error variance at the start of the experiment is a modelling artefact. Indeed, the average firmness values at the start of the experiment are very large when compared to the other firmness values and, hence, dominate the parameter estimation procedure (which can be regarded as being similar to a least squares procedure because of the assumption of normality of the stochastic part) to a large extent, so that the estimated curves are much focused at fitting these initial firmness values. In view of this argument, we will now use a monotone heteroscedastic error variance structure and evaluate its aptness. The error variance structure was assumed to follow the following model:

$$\Sigma_i = \sigma^2 \exp(-\gamma t)\, I, \qquad (5)$$

Fig. 3. Box plot of the residuals (left) and standardized residuals (right) from model 5.


Table 1
Summary of the model building strategy

Parameters (S.E.)   Non-linear mixed effects models                                                                                  Two stage
                    Model 0           Model 1           Model 2           Model 3            Model 4           Model 5
Fmax                108.55 (9.4908)   99.691 (2.7003)   126.97 (4.2456)   304.510 (18.648)   104.36 (2.8431)   102.780 (5.2428)   102.780 (–) a
Fmin                2.545 (0.1464)    2.610 (0.1322)    4.285 (0.2445)    5.899 (0.2116)     2.660 (0.5403)    2.643 (0.0667)     2.643 (–) a
C                   −0.179 (0.1531)   –                 –                 –                  –                 –                  –
k                   0.352 (0.0191)    0.373 (0.0124)    0.432 (0.0153)    1.087 (0.0336)     0.3225 (0.0176)   0.342 (0.0107)     0.369 (0.0269)
σ²                  2.584 (0.2091)    2.549 (0.2035)    12.289 (0.8739)   17.812 (1.2369)    37.630 (2.3847)   405.280 (84.230)   1.657 (–)
σ²_dt               2.233 (0.3852)    2.429 (0.3937)    1.258 (0.2308)    –                  –                 1.402 (0.5071)     2.869 (0.1785)
σ²_dk               0.0184 (0.0043)   0.024 (0.0040)    –                 0.0426 (0.0062)    –                 0.005 (0.0011)     0.027 (0.0172)
σ_dt,dk             −0.166 (0.0250)   −0.194 (0.0334)   –                 –                  –                 −0.054 (0.0189)    −0.216 (–)
γ                   –                 –                 –                 –                  –                 0.494 (0.0185)
−2ℓ                 2408              2409.7            2883.8            3033.4             3219.9            2083.1             –
AIC                 2424              2423.7            2893.8            3043.4             3227.9            2099.1             –

Values between brackets denote standard errors.
a Fmax and Fmin were held fixed at the values of the final non-linear mixed effects model (model 5).

where $\gamma$ is a parameter to be estimated from the optimization procedure, $I$ the identity matrix ($n_i \times n_i$) and t denotes time. By applying Eq. (5) to the error structure, heteroscedasticity of the residuals is allowed, but still no covariances are taken into account since it is assumed that error terms are independent of each other. This model (model 5) has a $-2\ell$ of 2083.1 using one parameter ($\gamma$) more than the previous best model, model 1. The LRT reveals that this model 5 is strongly preferred above model 1 ($G^2$ = 326.6, d.f. = 1, P < 0.0001). In order to check whether the error heteroscedasticity was successfully accounted for, a box plot of the residuals was again made (Fig. 3). The left panel shows the residuals without normalization. The variance of the residuals seems to decrease exponentially with storage time, complying with Eq. (5). The normalized residual plot (residuals divided by the standard deviation, which is a function of time as given in Eq. (5)) is also informative, suggesting that there is no apparent time dependency of the normalized error variance, so that the final model 5 complies with all assumptions of the non-linear mixed effects model. All further results are derived from model 5. As a consequence, the contribution of the model inadequacy to the total uncertainty used further in the results is minimised.
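To make the variance function of Eq. (5) concrete, the following sketch evaluates the error variance over storage time and standardizes a residual by the time-dependent standard deviation. The function names are hypothetical, and the values σ² = 405.28 and γ = 0.494 are assumed here to be the model 5 estimates as read from Table 1.

```python
import math


def error_variance(t, sigma2, gamma):
    """Diagonal element of Sigma_i in Eq. (5) at storage time t (days)."""
    return sigma2 * math.exp(-gamma * t)


def standardized_residual(residual, t, sigma2, gamma):
    """Residual divided by the time-dependent standard deviation."""
    return residual / math.sqrt(error_variance(t, sigma2, gamma))


# Assumed model 5 estimates (sigma^2 at time 0, gamma in day^-1).
SIGMA2, GAMMA = 405.28, 0.494

for day in (2, 7, 16):
    print(day, round(error_variance(day, SIGMA2, GAMMA), 3))
```

With these values the error variance halves roughly every ln(2)/γ ≈ 1.4 days, which is why the value of σ² at time 0 is much larger than the error variance over most of the measured range.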

Table 1 lists the parameters of the different models and their standard errors. When inspecting d_ti, it is seen that the 95% confidence interval of the biological age is ∼5.5 days wide, being roughly a third as wide as the whole mango postharvest shelf-life at 12 °C. For the decay rate k, it appears that there is a very strong effect of the individual mangoes (σ²_dk = 0.005, P < 0.0001). This last result contradicts the assumption made by several authors (Schouten et al., 1997; Hertog et al., 2004a) who postulate that the decay rate k is a common factor for a given cultivar. As of this moment, the reason why the decay rates differ between the fruit is unclear and deserves further attention. One possible reason could be the influence of the different mango trees. The value of σ² for model 5 represents the uncertainty that would have been observed at time 0 as can be deduced from Eq. (5) and is an extrapolation of the measured data range.

Fig. 4. The average (full line) and 90% confidence limits (dotted lines) for the two stage model (left), and the non-linear mixed effects model 5 (right).

For comparison, also a classical two stage procedure was applied to the data, and results are also given in Table 1. For the two stage model, Fmax and Fmin were kept fixed at the estimates of the final non-linear mixed effects model 5. When comparing the two stage model to model 5, a remarkable difference in the estimation of the variance of the biological age, σ²_dt, and of the decay rate, σ²_dk, can be observed. The two stage model systematically overestimates the biological variance components when compared to the model 5, but its values seem to be similar to those of models 1 and 2. As elucidated above, mis-specification of the model assumptions (such as the repeated measures nature of the data or the homoscedasticity of the error terms) has asymptotically no effect on the fixed effects parameter estimates, but influences the random components and parameter precision systematically. A closer look at the decay rate k and its variance σ²_dk for the two stage model leads to the conclusion that a 95% confidence interval for an individual decay rate is much wider than expected, namely [k − 1.96σ_dk, k + 1.96σ_dk] = [0.074, 0.718]. The solution provided by model 5 is much more realistic, yielding a 95% confidence interval for an individual decay rate of [0.203, 0.481]. Fig. 4 shows the 90% confidence limits for the two stage model (left) and model 5 (right). The limits produced by model 5 seem to be much more realistic when comparing Figs. 1 and 4. The main reason why the limits of both types of models are so different is the presence of the large data variance at harvest carrying a substantial amount of measurement error and being highly influential in an ordinary least squares setting, such as the two stage model, as explained above. Its influence on parameter estimates is alleviated in the non-linear mixed effects model 5 by including the heteroscedastic measurement error variance in Eq. (5).
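The interval arithmetic behind the two quoted ranges for an individual decay rate can be checked with a short sketch. The mean decay rates are not quoted in this passage, so they are back-calculated here as interval midpoints; that step, and the function names, are assumptions made for illustration only. Table 1 remains the authoritative source for the estimates.

```python
import math

Z_95 = 1.96  # normal quantile used in the text for the 95% interval


def individual_interval(mean, variance, z=Z_95):
    """Interval [mean - z*sd, mean + z*sd] for an individual parameter value."""
    sd = math.sqrt(variance)
    return mean - z * sd, mean + z * sd


def mean_and_sd_from_interval(lower, upper, z=Z_95):
    """Back-calculate the mean and standard deviation implied by an interval."""
    return (lower + upper) / 2.0, (upper - lower) / (2.0 * z)


# Intervals quoted in the text.
k_m5, sd_m5 = mean_and_sd_from_interval(0.203, 0.481)  # model 5
k_ts, sd_ts = mean_and_sd_from_interval(0.074, 0.718)  # two stage model

print(f"model 5:   k = {k_m5:.3f}, implied variance = {sd_m5 ** 2:.4f}")
print(f"two stage: k = {k_ts:.3f}, implied variance = {sd_ts ** 2:.4f}")

# Forward check with the reported variance of 0.005 for model 5.
low, high = individual_interval(k_m5, 0.005)
print(f"model 5 interval from variance 0.005: [{low:.3f}, {high:.3f}]")
```

Under these assumptions the variance implied for model 5 rounds to the reported 0.005, while the two stage interval implies a decay rate variance several times larger.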

Fig. 5. Visual representation of the covariance matrix of the random effects in the final non-linear mixed effects model. Histograms represent variances, the scatterplot the covariance.

Fig. 6. Total experimental variance as a function of time, observed (○) and modelled (line). Total experimental variance was split up in uncertainty and biological variance.

Fig. 5 shows a visual representation of the covariance matrix of the random effects. There is a moderate negative correlation between the decay rate and the biological age (r = −0.65), indicating that mango fruit that are riper than average at harvest tend to lose their firmness faster than mango fruit that are less ripe at harvest.
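To make the link between the reported correlation and the covariance matrix of the random effects concrete, the sketch below assembles a 2 × 2 matrix from quantities quoted in the text. The variance of the biological age is not given in this passage; it is approximated here from the statement that its 95% interval is about 5.5 days wide, read as 2 × 1.96 standard deviations. That reading, and the helper name `random_effects_covariance`, are assumptions, so the numbers are indicative only.

```python
import math


def random_effects_covariance(var_dt, var_dk, corr):
    """2x2 covariance matrix of (biological age, decay rate) random effects."""
    cov = corr * math.sqrt(var_dt) * math.sqrt(var_dk)
    return [[var_dt, cov], [cov, var_dk]]


var_dk = 0.005  # variance of the decay rate, as reported for model 5
corr = -0.65    # correlation between decay rate and biological age, as reported

# ASSUMPTION: sd of the biological age back-calculated from an interval width
# of about 5.5 days, taken as 2 * 1.96 * sd; Table 1 may give a different value.
sd_dt = 5.5 / (2 * 1.96)
var_dt = sd_dt ** 2

D = random_effects_covariance(var_dt, var_dk, corr)
for row in D:
    print([round(value, 4) for value in row])
```

The off-diagonal element is negative, which is the covariance counterpart of the negative correlation between biological age and decay rate.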

Fig. 6 shows the experimental data variance as observed from the data (○), and as modelled using the non-linear mixed model 5. The inclusion of the appropriate random components in the model allows for a reasonable description of the heteroscedasticity. The total experimental variance is partly due to: (1) uncertainty about the measurements themselves and the model inaccuracy (as quantified by Σ_i, Eq. (5)) and (2) the true batch heterogeneity (as quantified by the combined effects of σ²_dk and σ²_dt). The biological variance is highest near the start of the experiment, and decreases near the end of the study. The decreasing biological variance is induced by assuming a common value for Fmin in all the models. The uncertainty decreases during storage, which seems to contradict the results reported by De Ketelaere et al. (2004), who showed that for tomatoes the firmness measurement error increases as the tomatoes soften. However, besides the fact that a different product is investigated in this study, the range in firmness values is also very different. While the firmness of the tomatoes in the study by De Ketelaere et al. (2004) ranged from about 8 to about 3 Hz² g^(2/3), in the present study the firmness ranged from about 35 to 3 Hz² g^(2/3).

4. Conclusions

A concept for identifying and quantifying different sources of data variance was presented. The concept, which is based on non-linear mixed effects models, is able to accurately divide the data variance into uncertainty, due to the combined effect of measurement error and model accuracy, and biological variance, due to the true heterogeneity of the fruit within the batch. A study on the postharvest firmness decay of mango fruit was taken to illustrate the versatility of the concept. It was shown that the uncertainty of the data was strongly dependent on time, and an exponential type of uncertainty was incorporated in the model. On the biological variance side, it was shown that there were two different sources of biological variance. The first source represents the different biological ages of the individual mangoes in the batch. The second source represents the natural heterogeneity in decay rate parameters of the individual fruit.

Acknowledgements

Bart De Ketelaere and Jeroen Lammertyn are Postdoctoral Researchers of the Fund for Scientific Research Flanders (F.W.O.-Vlaanderen). The authors also kindly acknowledge Nguyen Minh Tri of the Can Tho University in Vietnam and collaborators of the Department of Agricultural Machinery and Postharvest Technology. Part of the funding came from VLIR (Flemish Interuniversity Council). The authors also acknowledge the reviewers for their insightful suggestions.

References

De Ketelaere, B., Lammertyn, J., Molenberghs, G., Nicolaï, B., De Baerdemaeker, J., 2003. Statistical models for analysing repeated quality measures of horticultural products. Model evaluations and practical example. Math. Biosci. 185, 169–189.

De Ketelaere, B., Lammertyn, J., Molenberghs, G., Desmet, M., Nicolaï, B., De Baerdemaeker, J., 2004. Tomato cultivar grouping based on firmness change, shelf-life and variability during postharvest storage. Postharvest Biol. Technol. 34, 187–201.

Diggle, P.J., Heagerty, P., Liang, K.Y., Zeger, S.L., 2002. The Analysis of Longitudinal Data. Oxford University Press, Oxford, England.

Hertog, M.L.A.T.M., Boerrigter, H.A.M., van den Boogaard, G.J.P.M., Tijskens, L.M.M., van Schaik, A.C.R., 1999. Predicting keeping quality of strawberries (cv. ‘Elsanta’) packed under modified atmospheres: an integrated model approach. Postharvest Biol. Technol. 15, 1–12.

Hertog, M.L.A.T.M., Lammertyn, J., Desmet, M., Scheerlinck, N., Nicolaï, B.M., 2004a. The impact of biological variance on postharvest behaviour of tomato fruit. Postharvest Biol. Technol. 34, 271–284.

Hertog, M.L.A.T.M., Ben-Arie, R., Roth, E., Nicolaï, B.M., 2004b. Humidity and temperature effects on invasive and non-invasive firmness measures. Postharvest Biol. Technol. 33, 79–91.

Johnson, R.A., Wichern, D.W., 1998. Applied Multivariate Statistical Analysis. Prentice Hall, Upper Saddle River, NJ.

Lammertyn, J., De Ketelaere, B., Marquenie, D., Molenberghs, G., Nicolaï, B.M., 2003. Mixed models for multicategorical response: modelling the time effect of physical treatments on strawberry sepal quality. Postharvest Biol. Technol. 30, 195–207.

Nauta, M.J., 2000. Separation of uncertainty and variability in quantitative microbial risk assessment models. Int. J. Food Microbiol. 124, 365–373.

Pinheiro, J., Bates, D., 2000. Mixed-Effects Models in S and S-PLUS. Springer, New York.

Rubinstein, R.Y., 1981. Simulation and the Monte Carlo method. John Wiley and Sons, New York.


Schouten, R.E., Otma, E.C., Van Kooten, O., Tijskens, L.M.M., 1997. Keeping quality of cucumber fruit predicted by biological age. Postharvest Biol. Technol. 12, 175–181.

Schouten, R.E., Jongbloed, G., Tijskens, L.M.M., van Kooten, O., 2004. Batch variability and cultivar keeping quality of cucumber. Postharvest Biol. Technol. 32, 299–310.

Tijskens, L.M.M., Konopacki, P., Simcic, M., 2003. Biological variance: burden or benefit. Postharvest Biol. Technol. 27, 15–25.

Verbeke, G., Molenberghs, G., 2000. Linear Mixed Models for Longitudinal Data. Springer, New York.

