2003, GLS

7/28/2019 2003, GLS

1/13

Pre-whitening of data by covariance-weighted pre-processing

Harald Martens¹*, Martin Høy², Barry M. Wise³, Rasmus Bro¹ and Per B. Brockhoff⁴

¹Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark

²Institute of Chemistry, Norwegian University of Science and Technology, N-7491 Trondheim, Norway

³Eigenvector Research Inc., Manson, WA, USA

⁴Department of Mathematics and Physics, Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark

Received 7 May 2001; Revised 9 September 2002; Accepted 22 November 2002

A data pre-processing method is presented for multichannel 'spectra' from process spectrophotometers and other multichannel instruments. It may be seen as a 'pre-whitening' of the spectra, and serves to make the instrument 'blind' to certain interferants while retaining its analyte sensitivity. Thereby the instrument selectivity may be improved even before multivariate calibration. The result is a reduced need for process perturbation or sample spiking just to generate calibration samples that span the unwanted interferants. The method consists of shrinking the multidimensional data space of the spectra in the off-axis dimensions corresponding to the spectra of these interferants. A 'nuisance' covariance matrix $S$ is first constructed, based on prior knowledge or estimates of the major interferants' spectra, and the scaling matrix $G = S^{-1/2}$ is defined. The pre-processing then consists of multiplying each input spectrum by $G$. When these scaled spectra are analysed in conventional chemometrics software by PCA, PCR, PLSR, curve resolution, etc., the modelling becomes simpler, because it does not have to account for variations in the unwanted interferants. The obtained model parameters may finally be descaled by $G^{-1}$ for graphical interpretation. The pre-processing method is illustrated by the use of prior spectroscopic knowledge to simplify the multivariate calibration of a fibre optical vis/NIR process analyser. The 48-dimensional spectral space, corresponding to the 48 instrument wavelength channels used, is shrunk in two of its dimensions, defined by the known spectra of two major interferants. Successful multivariate calibration could then be obtained from a very small calibration sample set. The paper then shows how pre-whitening can reduce the number of bilinear PLSR components in multivariate calibration models, with the nuisance covariance $S$ based either on prior knowledge of the interferants' spectra or on an estimate of the interferants' spectral subspace from the calibration data at hand. The relationship of the pre-processing to weighted and generalized least squares from classical statistics is outlined. Copyright © 2003 John Wiley & Sons, Ltd.

KEYWORDS: pre-whitening; covariance; weighted; preprocessing; GLS; prior knowledge; process; multivariate calibration

1. INTRODUCTION

1.1. Reducing unwanted effects

Classical chemical modelling, where prior knowledge is used to formulate mathematical models based on causal/mechanistic/first-principles theory, has problems when the a priori knowledge is erroneous or incomplete. On the other hand, data-driven explorative modelling, such as multivariate regression of one set of variables Y on another set of variables X, has problems if the available data are inadequate. Sometimes, purely data-driven modelling requires large amounts of input data for estimation of parameters that one already knows.

The goal of the present covariance-weighted pre-processing technique is to maintain the flexibility of the data-driven 'soft modelling', but to reduce the requirements for empirical calibration data by including quantitative prior knowledge in the modelling. If successful, this should reduce the existing prerequisite for spanning all relevant types of variation by the calibration samples, a requirement that has made multivariate calibration of process analysers expensive and cumbersome. It should also decrease the total number of calibration samples needed, as fewer statistical

*Correspondence to: H. Martens, Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark.

Copyright © 2003 John Wiley & Sons, Ltd.

JOURNAL OF CHEMOMETRICS

J. Chemometrics 2003; 17: 153–165

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cem.780


eigenanalyses, these software systems include a weighting-based pre-processing step, to balance the relevance and noise levels of the different variables. This weighting may be written as

$$X = X_{\mathrm{Input}}\,G \qquad (2)$$

where $G$ ($K \times K$) is a scaling matrix. In this conventional weighting, $G$ is diagonal, with scaling elements that are the inverse of a predefined standard deviation $s$ ($K \times 1$). In the commonly used standardization, vector $s$ is defined as the total initial standard deviation $s_0$ of the $K$ variables in the set of available objects. However, it is also possible, and statistically preferable, to define $s$ as the standard uncertainty of the different variables, i.e. the expected standard deviation of their errors.
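The conventional diagonal weighting of Equation (2) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' software: the helper name `diagonal_weighting`, the random data and the uncertainty values in `s_unc` are all hypothetical. It contrasts standardization by $s_0$ with weighting by an assumed standard uncertainty.

```python
import numpy as np

def diagonal_weighting(X_input, s):
    """Conventional diagonal pre-processing X = X_input @ G, with G = diag(1/s)."""
    s = np.asarray(s, dtype=float)
    G = np.diag(1.0 / s)                      # K x K diagonal scaling matrix
    return X_input @ G, G

rng = np.random.default_rng(0)
# Synthetic data: N = 10 objects, K = 4 variables with very different spreads
X_raw = rng.normal(size=(10, 4)) * np.array([1.0, 2.0, 5.0, 0.5])
Xc = X_raw - X_raw.mean(axis=0)               # mean-centre before weighting

# (a) Standardization: s = total initial standard deviation s0 of each variable
s0 = Xc.std(axis=0, ddof=1)
X_std, G_std = diagonal_weighting(Xc, s0)

# (b) Uncertainty weighting: s = assumed standard uncertainty of each variable
#     (hypothetical values, e.g. as they might come from replicate measurements)
s_unc = np.array([0.1, 0.1, 0.5, 0.05])
X_unc, G_unc = diagonal_weighting(Xc, s_unc)
```

With standardization every variable ends up with unit variance regardless of how informative it is; with uncertainty weighting, a variable is downweighted only in proportion to its assumed noise level.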

More formally, the scaling matrix $G$ may be seen as the inverse square root of the diagonal variance elements in matrix $S$:

$$G = S^{-1/2} \qquad (3)$$

Defining $S = \mathrm{diag}(s^2)$ and replacing $X$ by $X_{\mathrm{Input}}G = X_{\mathrm{Input}}S^{-1/2}$ in the PCA and PLSR definitions shows that the pre-processing of the X-variables is equivalent (see Appendix I) to defining the score vectors as eigenvectors of $X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}$ in PCA/PCR and of $X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}YY'$ in PLSR (after deflation). In the NIPALS estimation algorithm, the same result may equivalently be attained by using weighted least squares (WLS) in the repeated regression over X-variables that defines each score vector.
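A quick numerical check of this equivalence, assuming synthetic random data and a randomly chosen diagonal $S$: the PCA score directions of the pre-processed $X$ coincide (up to sign) with the eigenvectors of $X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}$, because $XX' = X_{\mathrm{Input}}GG'X'_{\mathrm{Input}} = X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 5
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)               # mean-centred input data
s = rng.uniform(0.5, 2.0, size=K)             # assumed standard uncertainties
S = np.diag(s ** 2)                           # diagonal uncertainty matrix, S = diag(s^2)
G = np.diag(1.0 / s)                          # G = S^(-1/2) for diagonal S

X = X_input @ G                               # pre-processed data, Equation (2)

# PCA of X via SVD: the columns of U are the normalized score vectors
U, sv, Vt = np.linalg.svd(X, full_matrices=False)

# The same score directions from the weighted association matrix X_input S^-1 X_input'
M = X_input @ np.linalg.inv(S) @ X_input.T
evals, evecs = np.linalg.eigh(M)              # eigenvalues in ascending order
t1_direct = evecs[:, -1]                      # eigenvector of the largest eigenvalue
```

The same argument carries over unchanged to a full (non-diagonal) $S$, as long as $GG' = S^{-1}$.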

If the errors in different X-variables are correlated, $S$ becomes a covariance matrix with non-zero off-diagonal elements. Given more or less approximate prior knowledge about this uncertainty covariance, Equation (3) may still be used for defining the pre-processing. Equation (2) then yields a covariance-weighted pre-processing of the input data. The equivalent NIPALS algorithm then requires generalized least squares (GLS) regression [1,2] over the X-variables to estimate the score vectors. Further details of the relationship between classical GLS and the present use of covariance-weighted pre-processing for 'pre-whitening' of spectral data are given in Appendix II, which also describes the converse object weighting that removes correlated errors between objects.

2.2.3. Definition of the pre-processing weights G

A practical implementation of Equation (3) is based on eigenanalysis of the uncertainty variance-covariance matrix $S$ in terms of its eigenvectors $V$ and eigenvalues $\lambda$:

$$SV = V\,\mathrm{diag}(\lambda) \qquad (4a)$$

The covariance weighting matrix is here defined as

$$G = V\,\mathrm{diag}(\lambda^{-1/2})\,V' \qquad (4b)$$

The chosen symmetrical definition of $G$ is not mandatory, as long as $GG' = S^{-1}$, but it simplifies the visual interpretation of the weighted model parameters and residuals.
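Equations (4a) and (4b) can be sketched with `numpy.linalg.eigh`, which returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix. The function name `whitening_matrix` and the example covariance are hypothetical; the sketch assumes $S$ is positive definite, so that $\lambda^{-1/2}$ exists.

```python
import numpy as np

def whitening_matrix(S):
    """Symmetric G = V diag(lambda^-1/2) V' from the eigenanalysis S V = V diag(lambda)."""
    lam, V = np.linalg.eigh(S)                # Equation (4a); S must be symmetric
    if np.any(lam <= 0):
        raise ValueError("S must be positive definite to form S^(-1/2)")
    return V @ np.diag(lam ** -0.5) @ V.T     # Equation (4b)

# Example: a correlated (non-diagonal) uncertainty covariance for K = 3 variables
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.4],
              [0.3, 0.4, 1.0]])
G = whitening_matrix(S)
```

If $S$ were only positive semidefinite (for instance $S = d^2LL'$ without any noise term), some $\lambda$ would be zero and $S^{-1/2}$ would not exist; a noise term in $S$ prevents this.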

2.2.4. Deweighting the model parameters

The loadings $P$ and residuals $E$ of the X-variables, obtained from the bilinear model of the mean-centred, weighted X-data,

$$X = TP' + E \qquad (5a)$$

may be descaled to fit the model of the mean-centred, unweighted data, i.e.

$$X_{\mathrm{Input}} = TP'_{\mathrm{Input}} + E_{\mathrm{Input}} \qquad (5b)$$

If $G$ is symmetrical and has full rank (see below), the inversion of Equation (2) gives

$$X_{\mathrm{Descaled}} = X_{\mathrm{Input}} = XG^{-1} \qquad (5c)$$

Likewise,

$$E_{\mathrm{Descaled}} = EG^{-1} \qquad (5d)$$

and

$$P_{\mathrm{Descaled}} = G^{-1}P \qquad (5e)$$

This simplifies the graphical interpretation of the X-loadings.
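The descaling in Equations (5a)–(5e) can be verified numerically. The sketch below assumes synthetic data, an arbitrary positive definite $S$ and an $A$-component PCA as the bilinear model (the same algebra applies to PLSR loadings); it checks that the descaled loadings and residuals reproduce the unweighted input data exactly, as Equation (5b) states.

```python
import numpy as np

def whitening_matrix(S):
    """Symmetric G = S^(-1/2), Equations (4a)-(4b)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

rng = np.random.default_rng(2)
N, K, A = 12, 6, 2
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)               # mean-centred, unweighted data

R = rng.normal(size=(K, K))
S = R @ R.T + K * np.eye(K)                   # an arbitrary positive definite covariance
G = whitening_matrix(S)
G_inv = np.linalg.inv(G)

X = X_input @ G                               # weighted data, Equation (2)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * sv[:A]                         # scores of an A-component PCA
P = Vt[:A].T                                  # loadings (K x A)
E = X - T @ P.T                               # residuals, Equation (5a)

E_descaled = E @ G_inv                        # Equation (5d)
P_descaled = G_inv @ P                        # Equation (5e)
```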

In regression methods such as PCR and PLSR, the mean-centred, reduced-rank linear regression model summary, based on the scaled X-variables, may be written as

$$Y = XB_A + F_A \qquad (5f)$$

where the regression coefficient parameter matrix $B_A$ ($K \times J$) uses $A$ latent variables and $F_A$ ($N \times J$) represents residuals. $B_A$ may be seen as linear combinations of orthogonal X-loadings (PCR) or orthogonal loading-like loading weights (PLSR). For graphical interpretation, $B_A$ may therefore be descaled in analogy to Equation (5e) as

$$B_{A,\mathrm{Descaled}} = G^{-1}B_A \qquad (5g)$$

On the other hand, the regression coefficients suitable for prediction of the Y-variables directly from the unweighted X-variables,

$$\hat{Y}_A = X_{\mathrm{Input}}B_{A,\mathrm{ForInput}} \qquad (5h)$$

may be obtained by inserting Equation (2) into Equation (5f), yielding

$$B_{A,\mathrm{ForInput}} = GB_A \qquad (5i)$$
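For the regression case, a sketch using principal component regression (PCR, chosen here only because it is short to write; PLSR coefficients are handled identically) shows that $B_{A,\mathrm{ForInput}} = GB_A$ gives the same predictions from the raw spectra as $B_A$ gives from the pre-processed ones. The data, $S$ and the helper `pcr_coefficients` are hypothetical.

```python
import numpy as np

def whitening_matrix(S):
    """Symmetric G = S^(-1/2), Equations (4a)-(4b)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

def pcr_coefficients(X, y, A):
    """PCR coefficients B_A = V_A diag(1/sv_A) U_A' y on mean-centred X and y."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:A].T @ ((U[:, :A].T @ y) / sv[:A])

rng = np.random.default_rng(3)
N, K, A = 15, 6, 3
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)
y = X_input @ rng.normal(size=K) + 0.1 * rng.normal(size=N)
y -= y.mean()

R = rng.normal(size=(K, K))
G = whitening_matrix(R @ R.T + np.eye(K))     # G = S^(-1/2) for an arbitrary S

X = X_input @ G                               # Equation (2)
B_A = pcr_coefficients(X, y, A)               # model of Equation (5f)
B_A_descaled = np.linalg.solve(G, B_A)        # Equation (5g): G^-1 B_A, for plotting
B_A_for_input = G @ B_A                       # Equation (5i): applies to X_input
y_hat = X_input @ B_A_for_input               # Equation (5h)
```

Only $B_{A,\mathrm{ForInput}}$ should be used for prediction from raw spectra; $B_{A,\mathrm{Descaled}}$ is meant for graphical interpretation.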

2.2.5. Definition of the uncertainty covariance S from prior knowledge

In the situation with undesired interferants outlined in Equation (1), it is natural to define $S$ from the interference term $\Delta = DL' + E$. The spectra $L$ of the interferants (the undesired variation patterns) may sometimes be assumed known, while their concentrations $D$ are unknown. The formally correct definition could then be

$$S = L\,\mathrm{cov}(D)\,L' + \mathrm{cov}(E) \qquad (6a)$$

where $\mathrm{cov}(D)$ represents the expected variance-covariance of the interferant concentrations and $\mathrm{cov}(E)$ represents the covariance of other, unidentified error patterns plus the variance of random i.i.d. noise. In practice, the variation in interferant concentrations may be difficult to specify and may, for example, be replaced by the approximation

$$\mathrm{cov}(D) = d^2 I \qquad (6b)$$


where $d^2$ is the expected average variance of the interferants' concentrations; intercorrelations between the interferants' concentrations are assumed to be negligible. The scalar $d$ is given in the unit of interferant concentrations. Moreover, it may often be adequate to assume that the errors in $E$ are uncorrelated, i.e.

$$\mathrm{cov}(E) = \mathrm{diag}(s^2) \qquad (6c)$$

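Thereby Equation (6a) simplifies to

$$S = d^2LL' + \mathrm{diag}(s^2) \qquad (6d)$$

If all the X-variables have about the same uncertainty variance $s^2$, i.e.

$$\mathrm{cov}(E) = s^2 I \qquad (6e)$$

this leads to a further simplification. A common scale factor in $S$ only rescales all X-variables equally, which does not change the subsequent bilinear modelling, so $s^2$ may be set to 1 and $d^2$ interpreted as the interferant concentration variance relative to the noise variance. With $d^2$ thus acting as a general scaling factor that determines the contribution of the interferant spectra, the definition of $S$ simplifies to

$$S = d^2LL' + I \qquad (6f)$$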

By choosing the scaling factor $d$ sufficiently large, the pre-processing $X = X_{\mathrm{Input}}S^{-1/2}$ (Equations (2) and (3)) can in effect make the subsequent least-squares-based modelling of $X$ insensitive ('blind') to signal variations caused by the unknown interferant concentrations. Only the net analyte signal, obtained as the residual after projecting $K$ (Equation (1)) on $L$, will remain in $X$, together with unmodelled variations and measurement noise.
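The effect of Equation (6f) can be illustrated with two made-up interferant spectra and a made-up analyte spectrum on six channels (all hypothetical, not the litmus data). Because the spectra enter $X = X_{\mathrm{Input}}G$ as rows and $G$ is symmetric, applying `G @ l` to a column spectrum gives the same numbers. As $d^2$ grows, what remains of the interferants shrinks towards zero, while the net analyte signal, being orthogonal to $L$, passes through $G$ unchanged.

```python
import numpy as np

def shrinkage_covariance(L, d2, s2=None):
    """Nuisance covariance S = d^2 L L' + diag(s^2), Equation (6d); s2=None gives (6f)."""
    K = L.shape[0]
    s2 = np.ones(K) if s2 is None else np.asarray(s2, dtype=float)
    return d2 * (L @ L.T) + np.diag(s2)

def whitening_matrix(S):
    """Symmetric G = S^(-1/2), Equations (4a)-(4b)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

# Hypothetical spectra on K = 6 channels: two interferants (columns of L) and one analyte
L = np.array([[1.0, 0.0],
              [2.0, 0.5],
              [1.0, 1.0],
              [0.0, 2.0],
              [0.0, 1.0],
              [0.0, 0.0]])
k_analyte = np.array([2.0, 1.0, 0.0, 0.0, 0.0, 1.0])

# Net analyte signal: the part of k_analyte orthogonal to the interferant subspace
nas = k_analyte - L @ np.linalg.pinv(L) @ k_analyte

interferant_norms = []
for d2 in [0.0, 0.1, 1.0, 100.0]:             # the shrinkage levels used in Figure 2
    G = whitening_matrix(shrinkage_covariance(L, d2))
    interferant_norms.append(np.linalg.norm(G @ L))  # what is left of the interferants
    print(f"d2={d2:>6}: |G L| = {interferant_norms[-1]:.3f}, "
          f"|G nas| = {np.linalg.norm(G @ nas):.3f}")
```

For a single interferant spectrum $l$ the shrinkage has a closed form, $Gl = l/\sqrt{1 + d^2\|l\|^2}$, which makes the role of $d$ explicit: once $d^2\|l\|^2 \gg 1$, the interferant direction is divided by roughly $d\|l\|$.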

2.2.6. Definition of the uncertainty covariance S from previous residuals

When explicit prior knowledge about the spectrum of the individual interferants in $L$ is lacking, the required information may instead be obtained from spectral modelling residuals in previous calibration data. If X and Y data from a previous relevant set of $M$ objects are available, the spectral residuals $\Delta$ in these data may be obtained after projection of $X$ on the $J$ known constituent concentrations $Y$:

$$\Delta = \left(I - Y(Y'Y)^{-1}Y'\right)X \qquad (7a)$$

These residuals $\Delta$ may then be used for estimating the future error covariance matrix $S$, by defining $L$ in Equation (6f) as, for example, the first few ($A$) principal components of $\Delta$, obtained by singular value decomposition of $\Delta$:

$$U\Sigma V' = \Delta \qquad (7b)$$

In the notation of e.g. Matlab, the subspace of the interferants may be defined as

$$L = V(:,1{:}A)\,\Sigma(1{:}A,1{:}A) \qquad (7c)$$
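Equations (7a)–(7c) translate almost directly into NumPy. In this sketch the 'previous' data set is synthetic, with made-up Gaussian band shapes for one known constituent and two unknown interferants, and the function name `interferant_subspace` is hypothetical. Note that (7a) multiplies $X$ from the left by the $M \times M$ projector, and that (7c) scales the first $A$ right singular vectors by their singular values, as the Matlab expression does.

```python
import numpy as np

def interferant_subspace(X, Y, A):
    """Estimate L from residuals of X after projection on known Y, Equations (7a)-(7c).

    X : M x K mean-centred spectra, Y : M x J mean-centred known concentrations.
    """
    M = X.shape[0]
    H = Y @ np.linalg.solve(Y.T @ Y, Y.T)     # M x M projector onto the columns of Y
    Delta = (np.eye(M) - H) @ X               # Equation (7a): spectral residuals
    U, sv, Vt = np.linalg.svd(Delta, full_matrices=False)   # Equation (7b)
    return Vt[:A].T * sv[:A]                  # Equation (7c): V(:,1:A) * Sigma(1:A,1:A)

# Synthetic "previous" data: one known constituent, two unknown interferants
rng = np.random.default_rng(4)
M, K = 30, 8
channels = np.arange(K)
k_true = np.exp(-0.5 * ((channels - 1) / 1.5) ** 2)           # analyte band
L_true = np.column_stack([np.exp(-0.5 * ((channels - 4) / 1.5) ** 2),
                          np.exp(-0.5 * ((channels - 6) / 1.5) ** 2)])
Y = rng.uniform(0, 1, size=(M, 1))
D = rng.normal(size=(M, 2))                   # unknown interferant levels
X = Y @ k_true[None, :] + D @ L_true.T + 1e-3 * rng.normal(size=(M, K))

Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
L_hat = interferant_subspace(Xc, Yc, A=2)     # K x 2 estimate spanning ~span(L_true)
```

The estimate spans approximately the same subspace as the true interferant spectra, but its columns are principal components rather than the individual pure spectra; with a large $d^2$ that distinction matters little, because every direction in the subspace is then strongly shrunk.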

2.2.7. Definition of the uncertainty covariance S from the data at hand

Equations (7a)–(7c) may alternatively be based on the X and Y data at hand in the actual set of $N$ calibration samples, instead of on previous data. However, care must then be taken to avoid overfitting. For instance, if cross-validation and jackknifing are to be used for statistical assessment of a calibration model, $S$ may have to be re-estimated within each cross-validation segment.
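When $S$ is estimated from the calibration data themselves, the estimation has to be repeated inside each cross-validation segment; otherwise the held-out sample leaks into the pre-processing. The sketch below shows one way to do this for leave-one-out cross-validation with a single $y$. It assumes noise-free synthetic spectra with made-up band shapes, a minimal NIPALS PLS1 written out for self-containment (not the first author's software), two interferant components and an arbitrary large $d^2$; all helper names are hypothetical.

```python
import numpy as np

def whitening_matrix(S):
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

def interferant_subspace(X, y, A):
    """Equations (7a)-(7c) for a single mean-centred y."""
    h = np.outer(y, y) / (y @ y)                    # projector onto y
    Delta = (np.eye(len(y)) - h) @ X
    _, sv, Vt = np.linalg.svd(Delta, full_matrices=False)
    return Vt[:A].T * sv[:A]

def pls1(X, y, A):
    """Minimal NIPALS PLS1 on mean-centred X and y; returns coefficients B_A (K,)."""
    X, y = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(A):
        w = X.T @ y
        w /= np.linalg.norm(w)                      # loading weight
        t = X @ w                                   # score vector
        tt = t @ t
        p = X.T @ t / tt                            # X-loading
        qa = (y @ t) / tt                           # y-loading
        X -= np.outer(t, p)                         # deflate X
        y -= qa * t                                 # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def loo_rmsep(X_input, y, A, d2, A_int=2):
    """Leave-one-out RMSEP with S = d2 L L' + I re-estimated inside every segment."""
    N, K = X_input.shape
    press = 0.0
    for i in range(N):
        keep = np.arange(N) != i
        x_mean, y_mean = X_input[keep].mean(axis=0), y[keep].mean()
        Xc, yc = X_input[keep] - x_mean, y[keep] - y_mean
        L = interferant_subspace(Xc, yc, A_int)     # training segment only
        G = whitening_matrix(d2 * (L @ L.T) + np.eye(K))
        b_for_input = G @ pls1(Xc @ G, yc, A)       # Equation (5i)
        y_hat = (X_input[i] - x_mean) @ b_for_input + y_mean
        press += (y[i] - y_hat) ** 2
    return np.sqrt(press / N)

# Noise-free synthetic data: one analyte band and two interferant bands on K = 10 channels
rng = np.random.default_rng(5)
N, K = 15, 10
ch = np.arange(K)
k_an = np.exp(-0.5 * ((ch - 2) / 2.0) ** 2)
L_true = np.column_stack([np.exp(-0.5 * ((ch - 5) / 2.0) ** 2),
                          np.exp(-0.5 * ((ch - 8) / 2.0) ** 2)])
y = rng.uniform(0, 1, size=N)
X_input = np.outer(y, k_an) + rng.normal(size=(N, 2)) @ L_true.T

rmsep_ols = loo_rmsep(X_input, y, A=1, d2=0.0)      # no shrinkage: S = I
rmsep_gls = loo_rmsep(X_input, y, A=1, d2=1e8)      # data-based pre-whitening
```

Moving the `interferant_subspace` call outside the loop would be exactly the overfitting trap mentioned above: the held-out spectrum would then help define $S$.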

3. MATERIALS AND METHODS

3.1. Input data

The data set used for illustrating the pre-processing has been chosen for its simplicity, in order to make the method clear. The data [11] concern the determination of the protonated state of a chemical dye, litmus.

3.2. Methods

Transmitted light spectra $T$ were measured remotely by fibre optics in an industrial process spectrophotometer (Guided Wave Model 200). The transmittance spectra were converted into absorbance (here referred to as 'optical density' (OD)) spectra and collected in $K = 48$ wavelength channels between about 400 and 700 nm. These OD spectra were termed $X_{\mathrm{Input}}$, available for a total of 23 samples.

The samples contain different known concentrations [11] of protonated (red-coloured) litmus, which is the analyte to be calibrated for here, Y = [protonated litmus]. In addition, the samples contain varying, unknown concentrations of two interferants: unprotonated (blue-coloured) litmus (due to varying pH) and white zinc oxide powder. The data were analysed in Matlab® Version 5.3 (The MathWorks, Inc.) using the first author's software.

4. RESULTS

4.1. Previous results for the same data

Without any interferants, the OD data are expected to increase proportionally with the concentration of the red-coloured analyte, Y = [protonated litmus], at each wavelength $k$ where the analyte absorbs light, $x_{\mathrm{Input},k}$, $k = 1, 2, \ldots, K$. However, the two interferants (blue litmus, white powder) generate selectivity problems: strongly varying but unknown levels of one or both of the interferants make it impossible to determine the analyte by conventional univariate calibration based on a single wavelength channel.

Such selectivity problems may be removed by multivariate calibration [2], without knowing anything about the spectral characteristics of the pure analyte and the interferants, and without even knowing the concentrations of the interferants in the calibration samples, as demonstrated for these data in References [2,11]. However, this requires that the calibration sample set spans not only the analyte's concentration but also each of the interferants' concentrations. The present paper shows how additional spectral information about the interferants may be used to filter out their effects by shrinking the X-space, to the extent that they do not have to be modelled and therefore not even spanned by the calibration set.

4.2. Input data

The two full curves in Figure 1 show the known interference structures in the present application example: the instrument responses $L = [l_1, l_2]$ (crosses) of the two interferants, represented by their OD spectra at $K = 48$ wavelength channels in the visible wavelength range. These



4.3. Increasing degree of shrinkage of input data

The rest of Figure 2 illustrates how the spectra $X$ look after increased downscaling of the two known interferants' impact in the pre-processing $X = X_{\mathrm{Input}}G = X_{\mathrm{Input}}S^{-1/2}$ (Equations (2) and (3)). The error covariance matrix $S$ was here defined by the simplified expression in Equation (6f) as an increasingly weighted sum of the covariance $d^2LL'$ (where $L = [l_1, l_2]$ from Figure 1) plus a constant noise variance, $\mathrm{diag}(s^2) = I$.

The scalar $d^2$ determines the degree of shrinkage. The four rows in Figure 2 represent four increasing degrees of shrinkage, $d^2 = 0$, 0.1, 1 and 100. These may be thought of as four different subjective judgements of the relevance of the two interferants. The left side of the figure shows a gradual simplification of the X-data, until with $d^2 = 100$ (Figure 2(g)) only one systematic pattern of variation is clearly discernible from the random measurement noise.

The right side of the figure confirms this: as the contributions from the two interferants are diminished, the ability of the remaining absorbance variation in $X$ to describe the analyte $Y$ increases. Without any shrinkage of the interferants' absorbance contributions ($d^2 = 0$), three PCs were required to describe both $X$ and $Y$. Already at $d^2 = 1$, most of the variation in $Y$ is described by only one PC. With $d^2 = 100$, the first PC gives a nearly complete description of $X$ as well (Figure 2(h)). Equivalently (see Appendix I), $X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}$ then has only one large eigenvalue.

4.4. A priori information for OLS, WLS and GLS pre-processing

Figure 3 compares the pre-processing parameters in conventional unweighted linear regression (here termed 'OLS'), in the pre-processing with diagonal $S$, as used in e.g. most chemometric software (here termed 'WLS'), and in the new covariance-weighted pre-processing (here termed 'GLS'; see Appendix II). The left subplots show the uncertainty information assumed available a priori in each of the three cases. The right subplots illustrate the effect of the pre-processing for three arbitrary X-variables (out of 48), namely #10, 20 and 30, for all the samples.

In the top row ('OLS') no prior information is used (in Equation (6d), $\mathrm{diag}(s^2) = I$ and $d^2 = 0$). The variation in all three directions #10, 20 and 30 is as expected from Figure 2(a).

Figure 2. Effect of increasing degree of GLS shrinkage of input data. Left: GLS pre-processed input data $X = X_{\mathrm{Input}}G$, where $X_{\mathrm{Input}}$ is the input spectra (a). Right: cumulative fit (fraction of explained variance, $R^2$) of $X$ (crosses, full line) and $Y$ (circles, broken line) as a function of PCA component $a = 1$–4. Rows 1–4: covariance scaling factors $d^2 = 0$, 0.1, 1 and 100 respectively (Equation (6d)).




The X-variables from wavelength channel #30 onwards represent mostly baseline information. In order to visualize the effect of the WLS pre-processing available in most chemometrics software packages today, we make the subjective assumption that the baseline channels from #30 onwards contain mainly irrelevant noise (ignoring that the X-data in this region may carry useful baseline information). Therefore we a priori ascribe relative standard uncertainty $s_k = 1$ to X-variables $k = 1$–29, but increase this to $s_k = 4$ for $k = 30$–48, and use these expected noise levels as $s$ in Equation (6d). For this WLS pre-processing, the covariance shrinkage factor is still $d = 0$. The vertical variation in X-variable #30 has been reduced in Figure 3(d) compared with Figure 3(b), but otherwise the sample configuration is unchanged and the cloud of sample points still spans three dimensions.

In the third row ('GLS') we additionally employ the spectral background knowledge about the two interferants from Figure 1, $l_1$ and $l_2$, with shrinkage factor $d^2 = 100$. We retain the value of $s$ from the WLS case to illustrate how the variance $\mathrm{diag}(s^2)$ and the covariance $d^2LL'$ in Equation (6d) can be used at the same time. The cloud of sample points in Figure 3(f) now spans mainly a single dimension: variations in net analyte signal. Many of the interferant effects have been removed already during pre-processing.
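The three parameter sets of Figure 3 can be written down as follows. The uncertainty vector $s$ follows the text ($s_k = 1$ for channels 1–29, $s_k = 4$ for 30–48, $d^2 = 100$ for GLS), but the measured interferant spectra of Figure 1 are not reproduced here, so `l1` and `l2` are hypothetical placeholder shapes.

```python
import numpy as np

def whitening_matrix(S):
    """Symmetric G = S^(-1/2), Equations (4a)-(4b)."""
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

K = 48
channels = np.arange(1, K + 1)                 # channel numbers #1 ... #48

# Placeholder interferant spectra (hypothetical shapes; the measured
# spectra l1, l2 from Figure 1 would be used in practice)
l1 = np.exp(-0.5 * ((channels - 25) / 6.0) ** 2)
l2 = 0.5 + 0.01 * channels                     # smooth, broad baseline-like shape
L = np.column_stack([l1, l2])

s = np.where(channels <= 29, 1.0, 4.0)         # s_k = 1 for k = 1-29, s_k = 4 for k = 30-48

S_ols = np.eye(K)                              # d^2 = 0, diag(s^2) = I
S_wls = np.diag(s ** 2)                        # d^2 = 0, variance weighting only
S_gls = 100.0 * (L @ L.T) + np.diag(s ** 2)    # Equation (6d) with d^2 = 100

G_ols, G_wls, G_gls = (whitening_matrix(S) for S in (S_ols, S_wls, S_gls))
# Each case is then applied as X = X_input @ G, Equation (2)
```

The WLS case reduces to dividing channels 30–48 by 4, whereas the GLS case additionally mixes channels through the off-diagonal elements of $G$.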

4.5. Calibration based on very few samples

In this subsection we illustrate one possible use of pre-whitening: the removal of interference effects not seen in the calibration sample set. Conventional cross-validated PLSR is used as the calibration method.

In regression-based multivariate calibration, all the interference phenomena that may occur in future samples have to be represented in the calibration sample set, with sufficient clarity and sufficiently independent of the other types of variations. Sometimes that is difficult to attain, for economic or practical reasons, for instance when calibrating an industrial process spectrophotometer. The covariance-weighted pre-processing method allows interference phenomena with known spectra $L$ to be corrected for at the pre-processing stage, so that they do not have to be spanned in the calibration set.

The first column of subplots in Figure 4 shows the original absorbance spectra $X_{\mathrm{Input}}$. The second column of subplots in Figure 4 shows the spectra after pre-processing by the three methods illustrated in Figure 3 for three of the X-variables.

Figure 3. Comparison of OLS, WLS and GLS pre-processing. Top (a,b), OLS; middle (c,d), WLS; bottom (e,f), GLS. Left: information available a priori. Right: data plotted in 3D for X-variables #10, 20 and 30. Each point represents one sample's spectrum.




Calibration set. The three densely dotted curves in Figure 4(a) represent $N = 3$ objects that are here treated as if they were the only samples available with both X- and Y-data. This tiny calibration sample set has relative analyte concentrations $Y = [0.009, 0.365, 0.679]'$. Test set. For the sake of illustration, the thin curves in Figure 4 represent the remaining 20 objects, which will now be treated as a new, future set, for which $Y$ is to be predicted from their spectra $X$. These input data are the same for the OLS, WLS and GLS cases (rows 1, 2 and 3 in Figure 4).

The three densely dotted curves were used as $X$ in calibration against $Y$, with the model parameters estimated by PLSR. In all three cases (OLS, WLS and GLS), the PLSR model with one PC appeared to perform best in the small calibration set, because the calibration samples only spanned the analyte variation and no interferants. The one-PC regression coefficient vector $B_{A=1}$ gave a more or less equally 'perfect' fit to the $N = 3$ calibration samples for all three pre-processing methods, as evidenced by the three dots along the 'ideal' diagonal (middle column of subplots in Figure 4).

The analyte concentration in the remaining 20 'unknown' samples, $\hat{Y}_A$, was then predicted from their spectra, using the 'optimal' calibration model $B_{A=1}$. The circles in the middle column of subplots in Figure 4 show that the OLS and WLS calibration models gave poor Y-predictions in the new, independent samples, while the GLS calibration model gave good predictions. The reason is that variations in the input spectra due to varying, uncontrolled levels of the two interferants were not seen in the calibration set and hence were left unchecked in the conventional unweighted and variance-weighted cases (OLS and WLS). In contrast, the damaging effects of the interferants on the predictive ability of the calibration model were more or less eliminated by the covariance-weighted pre-processing (GLS).
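The mechanism behind Figure 4 can be reproduced in miniature with synthetic spectra. This sketch does not use the litmus data: the band shapes, the constant interferant level in the calibration samples and the range of interferant levels in the test samples are made up; only the three calibration concentrations are taken from the text. A one-component PLS model is fitted to three calibration samples without interferant variation, once without pre-processing (OLS) and once after GLS pre-whitening with the known $L$, and both are applied to 20 test samples in which the interferants vary.

```python
import numpy as np

def whitening_matrix(S):
    lam, V = np.linalg.eigh(S)
    return V @ np.diag(lam ** -0.5) @ V.T

def fit_one_component(X_cal, y_cal, G):
    """One-component PLS on (X_cal - mean) @ G; returns (x_mean, y_mean, B_ForInput)."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    X = (X_cal - x_mean) @ G                   # Equation (2) on centred data
    yc = y_cal - y_mean
    w = X.T @ yc
    w /= np.linalg.norm(w)                     # PLS loading weight
    t = X @ w
    b = w * (yc @ t) / (t @ t)                 # B_A for A = 1 (p'w = 1 for the first component)
    return x_mean, y_mean, G @ b               # Equation (5i)

K = 20
ch = np.arange(K)
k_an = np.exp(-0.5 * ((ch - 5) / 3.0) ** 2)    # hypothetical analyte spectrum
L = np.column_stack([np.exp(-0.5 * ((ch - 10) / 3.0) ** 2),   # hypothetical interferants
                     np.exp(-0.5 * ((ch - 15) / 3.0) ** 2)])

# Calibration: the three analyte levels quoted above, constant interferant levels
y_cal = np.array([0.009, 0.365, 0.679])
X_cal = np.outer(y_cal, k_an) + np.array([0.2, 0.1]) @ L.T

# Test: 20 samples in which the interferant levels vary
rng = np.random.default_rng(6)
y_test = rng.uniform(0, 1, size=20)
X_test = np.outer(y_test, k_an) + rng.uniform(0, 0.5, size=(20, 2)) @ L.T

rmse = {}
for name, G in [("OLS", np.eye(K)),
                ("GLS", whitening_matrix(1e6 * (L @ L.T) + np.eye(K)))]:
    x_mean, y_mean, b = fit_one_component(X_cal, y_cal, G)
    y_hat = (X_test - x_mean) @ b + y_mean
    rmse[name] = np.sqrt(np.mean((y_test - y_hat) ** 2))
```

Without pre-processing, the prediction error is the test samples' interferant deviation projected onto the analyte-like coefficient vector; after pre-whitening that projection is nearly zero, because $G$ has almost removed the interferant directions.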

The two rightmost columns of subplots in Figure 4 show the X-residuals after the one-dimensional PLSR model, in terms of the scaled residuals $E$ (obtained after projection of $X$ on the first PC $t_1$) and their descaled version $E_{\mathrm{Descaled}}$ (Equation (5d)), respectively. This shows that the unmodelled interference information was clearly visible for the new unknown samples, in both the OLS/WLS and the GLS cases. In the GLS case, $E$ was very low (Figure 4(c-4)) compared with the scaled X-data (Figure 4(c-2)), even for the 20 'new' samples. However, after descaling, the characteristic signals of the two unmodelled interferants became clearly visible in the residual spectra $E_{\mathrm{Descaled}}$ (Figure 4(c-5)). These residuals may be submitted to a second bilinear modelling, yielding a second set of score vectors and residual variances for outlier analysis, etc.

Figure 4. Calibration with very few samples. Top (a-1 to a-5), OLS; middle (b-1 to b-5), WLS; bottom (c-1 to c-5), GLS. Column 1: input data $X_{\mathrm{Input}}$ of three calibration samples (densely dotted) and 20 unknown test samples. Column 2: scaled spectra for regression modelling, $X = X_{\mathrm{OLS}}$, $X_{\mathrm{WLS}}$ or $X_{\mathrm{GLS}}$. Column 3: Y-values predicted from optimal models, $\hat{y}_{i,A=1}$ (ordinate), vs measured values $y_i$ (abscissa); target line $\hat{y}_{i,A=1} = y_i$. Column 4: spectral residuals from one-PC PLSR model of scaled X-data, $E$. Column 5: spectral residuals $E$ after descaling by Equation (5d), $E_{\mathrm{Descaled}}$.

In summary, the pre-processing in this case allowed us to

make a valid calibration model with a small and otherwise

inadequate calibration set, in spite of a glaring lack of

interferant variability between the calibration objects. This

illustrates that shrinking away interference effects in the

X-space by pre-whitening makes it possible to use fewer

calibration samples, and in particular fewer Y-data, and

hence to get cheaper and simpler calibration models.

4.6. Calibration based on many samples

The next two figures illustrate another advantage of pre-whitening: the ability to reduce the required dimensionality of the calibration model for a given set of calibration samples. The main purpose of this reduction is to simplify model interpretation, with a possible enhancement of the predictive performance. In this case all the available objects from Figure 2(a) are used as calibration samples ($N = 23$). The same parameter sets (termed OLS, WLS and GLS) were used as in the last example, and PLSR was again used for developing the calibration models.

Full leave-one-out cross-validation was used for assessing the models in terms of their optimal rank $A$ and their root mean square error of prediction in Y, RMSEP(Y)$_A$. The input spectra of the calibration samples now comprise all $N = 3 + 20 = 23$ curves displayed in the first column of subplots in Figure 4. The three full curves in Figure 5 show the predictive ability of the OLS, WLS and GLS cases, in terms of the cross-validated RMSEP(Y)$_A$ vs $A = 0, 1, 2, \ldots, 6$. (The dotted curve will be discussed later.)

The figure shows, first of all, that while the OLS and WLS models required at least $A = 3$ PCs to reach an acceptably low predictive error, the GLS model did so with only $A = 1$ PC. Moreover, a slight improvement in predictive ability was attained: using two PCs, the GLS case gave a lower predictive error than the OLS/WLS cases gave with three or more PCs.

Finally, Figure 6 illustrates the effect of rescaling and descaling of the model parameters, in this case of the estimated regression coefficient vector at the lowest acceptable rank, for OLS, WLS and GLS. The OLS solution is superimposed on the WLS and GLS solutions as a dotted line, for comparison.

The left column of subplots shows $B_A$, as obtained from bilinear PLSR at the optimal number of PCs ($A$), based on the scaled X-variables in the OLS, WLS and GLS cases. The three ways of pre-processing may be seen to yield somewhat different scaled regression coefficients. Moreover, while the OLS and WLS solutions required $A = 3$ PCs, the GLS solution required only $A = 1$ PC.

The middle column shows the rescaled coefficient spectrum $B_{A,\mathrm{ForInput}}$ (Equation (5i)), suitable for application directly to the input X-variables. Again the OLS solution is superimposed on the WLS and GLS solutions (dotted line).

Figure 5. Calibration based on all samples: predictive performance after OLS, WLS and GLS pre-processing. Prediction error of $y$, estimated by full leave-one-out cross-validation, from PLSR modelling of $X = X_{\mathrm{Input}}G$ with $G = S^{-1/2}$. Squares: OLS; $S = I$ (no pre-processing). Circles: WLS; $S$ diagonal (variance weighting). Triangles: knowledge-based GLS; $S$ defined from two known interferant spectra $l_1$ and $l_2$. Dotted curve: data-based GLS; $S$ defined from spectral residuals after projection of $X_{\mathrm{Input}}$ on $y$.

The scaling of the individual X-variables in vector $B_{A,\mathrm{ForInput}}$ is independent of the pre-processing of the X-variables, so the only difference between the solutions is due to the impact of the pre-processing on the estimation process itself. Figures 6(e) and 6(h) show that the downweighting of the X-variables from channel #30 onwards has rendered the other channels more important for separating the baseline variations due to the turbidity from the blue-coloured interferant and the red-coloured analyte. The wavelength channels just below #30, with low absorbance at the end of interferant spectrum $l_1$ (Figure 1), are given higher relative importance in the modelling. This indicates that in a rank-reduced calibration model such as the present low-rank PLSR model, there are several almost equivalent ways to combine the 48 input variables in order to attain the desired selectivity enhancement.

The right column of subplots in Figure 6 shows the descaled coefficient spectrum $B_{A,\mathrm{Descaled}}$ (Equation (5g)), suitable for graphical interpretation, with the OLS solution again superimposed (dotted line). Now the obvious effect of, for example, the sharp downweighting of the X-variables from channel #30 onwards has been removed.

The three solutions are qualitatively similar: they have positive values below about channel #15, as expected from the spectral characteristics of the analyte red litmus, and negative values at higher wavelength channels in order to compensate for the possible presence of the interferants blue litmus and white ZnO. However, quantitatively, the three solutions are somewhat different. This shows that with different pre-processing methods the PLSR models needed to describe different Y-relevant patterns of variation in the data in order to attain the desired selectivity.

4.6.1. Definition of the uncertainty covariance S from the calibration data at hand

The dotted curve in Figure 5 represents the results when the interferant spectra $L$ (Figure 1) were considered unknown, and instead estimated from the X- and Y-data of the 23 samples in the actual calibration data set at hand. As before, leave-one-out cross-validation was employed, with re-estimation of the spectral interferant covariance $S$ in each cross-validation segment. The figure shows that the pre-whitening based on the estimated spectral residual matrix $\Delta$ (Equation (7a)) with its dominant subspace $L$ (Equations (7b) and (7c), using $A = 2$ PCs) gives almost as simple modelling as the one based on prior knowledge of the two interferants' individual spectra $L = [l_1, l_2]$: in both cases the number of PLSR components required is reduced, because the model does not have to span these major interferants. However, the prediction error is now slightly higher. A possible reason for this is that the knowledge-based pre-processing used the known spectra $L$ as additional independent information in estimating $S$, while the data-driven pre-processing had no such extra information available.

Figure 6. Calibration based on all samples: regression coefficients estimated, rescaled and descaled. Top, OLS ($A = 3$ PCs); middle, WLS ($A = 3$ PCs); bottom, GLS ($A = 1$ PC). Left: coefficients $\hat{B}_A$ obtained from scaled spectra $X$. Middle: rescaled coefficients $\hat{B}_{A,\mathrm{ForInput}}$ (Equation (5i)), applicable directly to unscaled input spectra $X_{\mathrm{Input}}$. Right: descaled coefficients $\hat{B}_{A,\mathrm{Descaled}}$ (Equation (5g)); weighting effects removed. Dotted curves: OLS estimate $\hat{B}_{A=3}$ from (a), for comparison.




5. DISCUSSION

Figure 4 demonstrated the ability of the covariance-weighted pre-processing to give good predictions even for new samples with interferants not present in the calibration set. This may become important in e.g. calibrating industrial process analysers, when it is difficult to perturb the actual process enough to get a sufficiently informative calibration sample set. By introducing prior knowledge about known interferants' spectral signatures, the interferants can be compensated for already in a pre-processing filtering step, and thus do not have to vary in the calibration set.

Figure 5 demonstrated that the covariance-weighted 'GLS' pre-processing yielded calibration models with lower rank than those from the conventional 'OLS' and 'WLS' methods. High-dimensional models are generally cumbersome to interpret graphically, so this is an advantage. Moreover, as long as the uncertainty covariance $S$ represents prior knowledge, a slight improvement in prediction ability may be expected, because the subsequent calibration then requires fewer statistical parameters to be estimated from the available $N$ calibration data.

5.1. Comparison with other methods

The covariance-weighted pre-processing based on prior known spectra $L$ has the advantage of reducing interference without consuming degrees of freedom from the available, often expensive Y-data. In that respect it resembles spectral interference subtraction (SIS) [12]. If, instead, $S$ is estimated from the available data [X, Y] at hand, the pre-processing has some similarity to so-called orthogonal signal correction (OSC) [13] and direct orthogonalization (DO) [14]. Extended multiplicative signal correction (EMSC) [12,15] has similar properties to SIS and covariance-weighted pre-processing, but allows for removal of both additive and multiplicative effects.

There is one major difference in how the covariance-weighted pre-processing and the set of OSC, DO, SIS and EMSC methods attempt to reduce the interference effects in $X_{\mathrm{Input}}$. The latter methods subtract the effects in one way or another. In contrast, the new covariance-weighted pre-processing is based on shrinking by division (i.e. multiplication by $S^{-1/2}$, the inverse square root of S; see Equations (2) and (3)). The full consequences of this distinction are not yet clear.

However, it may be noted that DO [14] is particularly similar to the data-driven estimation of the interferant subspace L (Equations (7a)–(7c); Figure 5, dotted line), even though it employs subtraction instead of inverted scaling to eliminate the effect of the interferants.
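The relation between subtraction and shrinking by division can be made concrete for the assumed form S = σ²I + cLL'. Under that assumption, σS^{-1/2} acts as the identity on the orthogonal complement of L and multiplies directions within span(L) by σ/√(σ² + cλᵢ), where λᵢ are the nonzero eigenvalues of LL'. The projection I − L(L'L)^{-1}L', which removes span(L) outright as subtraction-based methods do, is therefore the limit c → ∞. This sketch uses random hypothetical interferant spectra to check that limit numerically; it does not reproduce OSC, DO, SIS or EMSC themselves.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


rng = np.random.default_rng(1)
K = 40
L = rng.standard_normal((K, 2))            # hypothetical interferant spectra (columns)
sigma2 = 1e-2

# Subtraction: orthogonal projector that removes span(L) outright
P_L = L @ np.linalg.solve(L.T @ L, L.T)
subtract = np.eye(K) - P_L

# Shrinking by division: sigma * S^(-1/2) for increasing interferant weight c
diffs = []
for c in (1e0, 1e2, 1e4, 1e6):
    S = sigma2 * np.eye(K) + c * L @ L.T
    shrink = np.sqrt(sigma2) * inv_sqrtm(S)
    diffs.append(np.linalg.norm(shrink - subtract))   # Frobenius distance to the projector
print(diffs)
```

Since σ/√(σ² + cλᵢ) ≈ σ/√(cλᵢ) for large c, the distance should fall roughly tenfold for every hundredfold increase in c.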

5.2. Pre-colouring the spectra

Instead of just shrinking the X-space in particularly undesired or irrelevant directions, one may also reformulate the covariance-weighted pre-processing to expand the X-space in directions known to be particularly desired or relevant. For instance, after the X-space has been contracted to filter out irrelevant or detrimental interferants, it could then be expanded in the dimension of the analyte's spectrum (curve 3, Figure 1), to enhance this desired type of variation relative to, e.g., random measurement noise in the subsequent multivariate subspace analysis. Preliminary Monte Carlo simulations (not shown here) indicate that this has some statistical advantage.
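One hypothetical way to realise such pre-colouring (the construction is not specified here) is to follow the shrinking step G = S^{-1/2} by a stretch H = I + (γ − 1)uu', where u is the unit-length analyte spectrum after shrinking and γ > 1 is a chosen expansion factor. The sketch below, with synthetic Gaussian bands, an assumed S = σ²I + cLL' and the hypothetical helpers `inv_sqrtm` and `band`, checks that the analyte signal is multiplied by γ while whitened directions orthogonal to u are left unchanged.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def band(centre, width, channels):
    """Hypothetical Gaussian absorption band used as a toy spectrum."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


K = 50
ch = np.arange(K, dtype=float)
k_analyte = band(20, 4, ch)
L = np.column_stack([band(30, 6, ch), band(10, 5, ch)])

# Step 1 (shrink): assumed nuisance covariance and G = S^(-1/2)
G = inv_sqrtm(1e-4 * np.eye(K) + 1e4 * L @ L.T)

# Step 2 (expand, hypothetical): stretch by gamma along the shrunk analyte direction u
u = k_analyte @ G
u /= np.linalg.norm(u)
gamma = 10.0
H = np.eye(K) + (gamma - 1.0) * np.outer(u, u)
G_total = G @ H                            # pre-colouring operator: shrink, then expand

# The analyte signal is stretched by gamma ...
stretch = np.linalg.norm(k_analyte @ G_total) / np.linalg.norm(k_analyte @ G)
# ... while a whitened direction orthogonal to u is left unchanged
z = np.linspace(-1.0, 1.0, K)
z_perp = z - (z @ u) * u
print(stretch, np.allclose(z_perp @ H, z_perp))
```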

The pre-processing has been used for pre-whitening spectral X-variables in this paper. However, it may equally well be applied to the set of Y-variables. Appendix I outlines various equivalent alternatives for integrating the interferant covariance matrix S into the actual estimators in PCA/PCR and PLSR, instead of using $S^{-1/2}$ for pre-processing. When prior knowledge is available about the error covariance between the objects, the pre-processing may also be applied over the objects, in a bilinear analogy to the conventional GLS estimator (Appendix II).

It should be noted that after covariance-weighted pre-processing to remove all major interferants, the remaining spectra mainly show the net signal of the analyte plus random noise (see Figure 2(g)). Of course, if the spectrum of the analyte, K (Equation (1)), is a linear combination of the spectra L of the interferants, the covariance-weighted pre-processing will filter out the analyte effect too; the remaining net analyte signal is then zero. Thus the usual requirement in quantitative analysis, that the analyte spectrum be linearly independent of the major interferant spectra, remains valid.
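The independence requirement can be checked numerically. In this sketch (synthetic Gaussian bands, an assumed S = σ²I + cLL', hypothetical helpers `inv_sqrtm` and `band`), the net analyte signal is computed as the part of an analyte spectrum orthogonal to span(L), and the gain after multiplication by S^{-1/2} is also reported. This is done for an independent analyte spectrum and for one constructed as a hypothetical mixture 0.6·l₁ + 0.4·l₂ of the two interferant spectra.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


def band(centre, width, channels):
    """Hypothetical Gaussian absorption band used as a toy spectrum."""
    return np.exp(-0.5 * ((channels - centre) / width) ** 2)


K = 50
ch = np.arange(K, dtype=float)
k_analyte = band(20, 4, ch)                       # independent of the interferants
L = np.column_stack([band(30, 6, ch), band(10, 5, ch)])
G = inv_sqrtm(1e-4 * np.eye(K) + 1e4 * L @ L.T)   # assumed S = sigma^2 I + c L L'

# Projector onto the orthogonal complement of span(L)
P_L = L @ np.linalg.solve(L.T @ L, L.T)
Q = np.eye(K) - P_L

k_dep = L @ np.array([0.6, 0.4])   # hypothetical analyte that IS a mix of the interferants

# Net analyte signal: the part of each analyte spectrum orthogonal to the interferants
nas_indep = np.linalg.norm(k_analyte @ Q)
nas_dep = np.linalg.norm(k_dep @ Q)

# Relative gain after covariance-weighted scaling
gain_indep = np.linalg.norm(k_analyte @ G) / np.linalg.norm(k_analyte)
gain_dep = np.linalg.norm(k_dep @ G) / np.linalg.norm(k_dep)
print(nas_indep, nas_dep, gain_indep, gain_dep)
```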

6. CONCLUSIONS

A method has been presented for covariance-weighted pre-processing of multivariate input data. It facilitates the use of prior knowledge about undesired (and desired) structures that are expected to vary in the input data. Its purpose is to reduce the complexity of the ensuing model and to improve its predictive ability. The method was illustrated by reducing the effect of spectral variations caused by interferants with known spectra.

In general, multivariate calibration by low-rank regression, using e.g. PCR or PLSR, has proven highly effective for solving selectivity problems in complex systems. Many unidentified interference problems can even be dealt with, as long as they are spanned well in the calibration sample set and picked up clearly by the multichannel instrument.

However, the present combination of prior knowledge and empirical calibration data may simplify calibration, because already known parameters do not have to be estimated statistically from the calibration data. The final statistical regression stage in the calibration process could then primarily be used for finding and correcting unknown or unexpected phenomena in the data. Thereby calibration of multichannel instruments may become less expensive, less time-consuming and easier to understand.

APPENDIX I. EIGENVECTOR EXPRESSIONS FOR COVARIANCE-WEIGHTED PRE-PROCESSING

In PCA, each latent variable (PC) is an eigenvector of XX' (after suitable mean centring). If the score vector for an individual PC, t, is scaled to t't = 1, this may be written as $t\lambda = (XX')t$. Inserting $X = X_{\mathrm{Input}}S^{-1/2}$ (Equations (2) and (3)) into this eigenvalue expression yields the covariance-weighted expression $t\lambda = (X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}})t$. Equivalently, t is then a left-hand singular vector of $X_{\mathrm{Input}}S^{-1/2}$.

Conversely, if the PCA loading vector p is scaled to p'p = 1, then $p\lambda = (X'X)p$. Inserting $X = X_{\mathrm{Input}}S^{-1/2}$ gives $p\lambda = (S^{-1/2}X'_{\mathrm{Input}}X_{\mathrm{Input}}S^{-1/2})p$; p is then a right-hand singular vector of $X_{\mathrm{Input}}S^{-1/2}$.
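These equivalences can be checked numerically. The sketch uses random, mean-centred data and a random positive definite S (both hypothetical) and compares the first score and loading vectors from an SVD of the pre-processed matrix $X_{\mathrm{Input}}S^{-1/2}$ with the leading eigenvectors of the weighted cross-product matrices $X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}$ and $S^{-1/2}X'_{\mathrm{Input}}X_{\mathrm{Input}}S^{-1/2}$. Because singular vectors are defined only up to sign, absolute inner products are compared; `inv_sqrtm` is a hypothetical helper.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


rng = np.random.default_rng(2)
N, K = 12, 8
X_input = rng.standard_normal((N, K))
X_input -= X_input.mean(axis=0)            # mean centring
A = rng.standard_normal((K, K))
S = A @ A.T + K * np.eye(K)                # hypothetical positive definite S
S_ih = inv_sqrtm(S)

# Route 1: pre-process, then an ordinary SVD (PCA)
X = X_input @ S_ih
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
t, p = U[:, 0], Vt[0]                      # first score (left) and loading (right) vectors

# Route 2: eigenvectors of the weighted cross-products, without the pre-processed X
lam_t, vec_t = np.linalg.eigh(X_input @ np.linalg.solve(S, X_input.T))
lam_p, vec_p = np.linalg.eigh(S_ih @ X_input.T @ X_input @ S_ih)
t_eig, p_eig = vec_t[:, -1], vec_p[:, -1]  # eigh sorts ascending: last column = largest

# Ideally both are 1 (same vectors up to sign)
print(abs(t @ t_eig), abs(p @ p_eig))
```

The leading eigenvalue λ from either route equals the squared first singular value of $X_{\mathrm{Input}}S^{-1/2}$.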

In PLSR, each component is an eigenvector of the X–Y covariance structure [10]. For instance, with orthonormal scores, t is defined by $t\lambda = (XX'YY')t$ (after suitable deflation for previous components). With $X = X_{\mathrm{Input}}S^{-1/2}$ this gives the expression $t\lambda = (X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}YY')t$. Conversely, the orthonormal loading weight w for each component, used for defining $t = Xw$ (after suitable deflation of X for previous components), is defined by $w\lambda = (X'YY'X)w$. Covariance-weighted pre-processing is equivalent to defining $w\lambda = (S^{-1/2}X'_{\mathrm{Input}}YY'X_{\mathrm{Input}}S^{-1/2})w$, or w as the first left-hand singular vector of $S^{-1/2}X'_{\mathrm{Input}}Y$.
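A similar check for the first PLSR component, again with random hypothetical data, a random positive definite S and the hypothetical helper `inv_sqrtm`: the loading weight w is taken as the first left singular vector of X'Y after pre-processing. The sketch then verifies that w satisfies the weighted eigen-expression $w\lambda = (S^{-1/2}X'_{\mathrm{Input}}YY'X_{\mathrm{Input}}S^{-1/2})w$ and that $t = Xw$ satisfies $t\lambda = (X_{\mathrm{Input}}S^{-1}X'_{\mathrm{Input}}YY')t$. Deflation for later components is omitted.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


rng = np.random.default_rng(3)
N, K, J = 15, 6, 2
X_input = rng.standard_normal((N, K))
Y = rng.standard_normal((N, J))
X_input -= X_input.mean(axis=0)            # mean centring
Y -= Y.mean(axis=0)
A = rng.standard_normal((K, K))
S = A @ A.T + K * np.eye(K)                # hypothetical positive definite S
S_ih = inv_sqrtm(S)

# Route 1: pre-process; the first PLS weight is the first left singular vector of X'Y
X = X_input @ S_ih
U, sv, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
w, lam = U[:, 0], sv[0] ** 2
t = X @ w                                  # first score vector, t = Xw

# Route 2: the weighted eigen-expressions written with X_input and S directly
B = S_ih @ X_input.T @ Y                   # S^(-1/2) X_input' Y
M = X_input @ np.linalg.solve(S, X_input.T) @ Y @ Y.T   # X_input S^-1 X_input' Y Y'

print(np.allclose(B @ B.T @ w, lam * w), np.allclose(M @ t, lam * t))
```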

Hence the PCA/PCR and PLSR solutions may be obtained either by covariance-weighted pre-processing $X = X_{\mathrm{Input}}S^{-1/2}$ followed by standard OLS-based software for PCA/PCR or PLSR, or by eigenvector decomposition of cross-product matrices weighted by $S^{-1}$. The latter is analogous to generalised least squares (GLS) regression.

APPENDIX II. GLS AND COVARIANCE-WEIGHTED PRE-PROCESSING

The relationship between generalized least squares (GLS) regression and covariance-weighted pre-processing will be demonstrated here. In weighted least squares (WLS) the regressor–regressor and regressor–regressand cross-product matrices are modified by the inverse error covariance matrix $S^{-1}$. When S has off-diagonal elements, this approach is called `GLS' in some statistical literature [2]. The terms `WLS' and `GLS' are therefore employed here to distinguish purely variance-based weighting from covariance-based weighting. In some other statistical literature the WLS and GLS terms are used more interchangeably. More details are given in Reference [1].

II.1. Regression over objects

In the conventional OLS case the input data for one or more regressands, $Y_{\mathrm{Input}}$ (N×J) $= [y_{\mathrm{Input},j},\ j = 1,2,\ldots,J]$, are modelled by projection on one or more regressors, $X_{\mathrm{Input}}$ (N×K) $= [x_{\mathrm{Input},k},\ k = 1,2,\ldots,K]$, over a set of N objects, according to the linear model $Y_{\mathrm{Input}} = X_{\mathrm{Input}}B + F_{\mathrm{Input}}$ (ignoring the mean centring). To estimate the regression coefficients B (K×J), the conventional estimator fits each regressand $y_{\mathrm{Input}}$ (N×1) individually to $X_{\mathrm{Input}}$ by minimizing $f'_{\mathrm{Input}}f_{\mathrm{Input}}$. This yields the conventional full-rank OLS estimator $\hat{B} = (X'_{\mathrm{Input}}X_{\mathrm{Input}})^{-1}X'_{\mathrm{Input}}Y_{\mathrm{Input}}$.

If the correlation pattern between the response errors in the N objects, $S_N$ (N×N), is known, the GLS estimator $\hat{B} = (X'_{\mathrm{Input}}S_N^{-1}X_{\mathrm{Input}})^{-1}X'_{\mathrm{Input}}S_N^{-1}Y_{\mathrm{Input}}$ yields better estimates, because it minimizes $f'_{\mathrm{Input}}S_N^{-1}f_{\mathrm{Input}}$ for each regressand, i.e. the correlated error pattern is down-weighted. Equivalently, the pre-whitening operators $X = S_N^{-1/2}X_{\mathrm{Input}}$ and $Y = S_N^{-1/2}Y_{\mathrm{Input}}$ allow the model to be rewritten as $Y = XB + F$. The same GLS estimator may now be rewritten as $\hat{B} = (X'X)^{-1}X'Y$, which shows that covariance-weighted pre-processing allows the GLS estimation of B to be performed by conventional OLS tools. This was shown here for full-rank OLS/GLS regression, but is equally applicable to regression methods that handle collinear X-variables, such as ridge regression and the bilinear methods PCR and PLSR.
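A numerical sketch of this equivalence for regression over objects. It assumes, purely for illustration, an AR(1)-type error correlation $S_N$ with ρ = 0.8 between successive objects and random hypothetical regressors; `inv_sqrtm` is a hypothetical helper. The direct GLS estimator and ordinary least squares on the pre-whitened data $S_N^{-1/2}X_{\mathrm{Input}}$, $S_N^{-1/2}Y_{\mathrm{Input}}$ should give the same $\hat{B}$ up to rounding.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


rng = np.random.default_rng(4)
N, K, J = 40, 3, 2
X_input = rng.standard_normal((N, K))
B_true = rng.standard_normal((K, J))

# Assumed AR(1)-type correlation between the residuals of successive objects
rho = 0.8
idx = np.arange(N)
S_N = rho ** np.abs(idx[:, None] - idx[None, :])
F_input = np.linalg.cholesky(S_N) @ rng.standard_normal((N, J))
Y_input = X_input @ B_true + F_input

# Direct GLS: B = (X' S_N^-1 X)^-1 X' S_N^-1 Y
SiX = np.linalg.solve(S_N, X_input)
SiY = np.linalg.solve(S_N, Y_input)
B_gls = np.linalg.solve(X_input.T @ SiX, X_input.T @ SiY)

# Pre-whitening over objects, then ordinary least squares
W = inv_sqrtm(S_N)
X, Y = W @ X_input, W @ Y_input
B_white, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(B_gls, B_white))
```

Because the whitening is applied before `lstsq`, any OLS-type routine (including ridge, PCR or PLSR implementations) can be reused unchanged on X and Y.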

II.2. Regression over X-variables

The converse case is traditional direct multivariate calibration or multicomponent curve resolution according to Beer's law. Here each spectrum $x_{\mathrm{Input}}$ (1×K), i.e. each row of $X_{\mathrm{Input}}$ (N×K), is modelled by a set of J known analyte spectra $K_{\mathrm{Input}}$ (K×J) in the linear regression model $X_{\mathrm{Input}} = CK'_{\mathrm{Input}} + E_{\mathrm{Input}}$, where C (N×J) is the matrix of unknown analyte concentrations and $E_{\mathrm{Input}}$ (N×K) is the matrix of spectral residuals (ignoring baseline offsets). When the constituent spectrum matrix $K_{\mathrm{Input}}$ has full column rank, the OLS estimator minimizes $e_{\mathrm{Input}}e'_{\mathrm{Input}}$ for each row of $X_{\mathrm{Input}}$, yielding $\hat{C} = X_{\mathrm{Input}}K_{\mathrm{Input}}(K'_{\mathrm{Input}}K_{\mathrm{Input}})^{-1}$.

If the correlation pattern between the response errors in the K X-variables, S (K×K), is known, then the GLS estimator minimizes $e_{\mathrm{Input}}S^{-1}e'_{\mathrm{Input}}$ and yields $\hat{C} = X_{\mathrm{Input}}S^{-1}K_{\mathrm{Input}}(K'_{\mathrm{Input}}S^{-1}K_{\mathrm{Input}})^{-1}$.

The equivalent covariance-weighted pre-processing solution for curve resolution pre-whitens the spectra, $[X; K'] = [X_{\mathrm{Input}}; K'_{\mathrm{Input}}]S^{-1/2}$, thereby shrinking away the noise correlations between the X-variables. The model may then be written as $X = CK' + E$, and the GLS concentration estimate may be obtained by $\hat{C} = XK(K'K)^{-1}$, i.e. by an OLS expression.

In summary, prior knowledge about the uncertainty covariances S may be used to improve linear regression. In Appendix I the same was shown for bilinear regressions. In both cases, one may either analyse the input data directly by GLS or GLS-like expressions, involving $S^{-1}$, or perform covariance-weighted pre-processing of the input data by $S^{-1/2}$, followed by OLS or OLS-like expressions, as illustrated in this paper.
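The curve-resolution case of Appendix II.2 can be checked in the same way. The sketch assumes two hypothetical Gaussian analyte spectra and a channel-noise covariance S whose correlation decays as 0.9^{|i−j|} between channels i and j. It compares the direct GLS estimate $\hat{C} = X_{\mathrm{Input}}S^{-1}K_{\mathrm{Input}}(K'_{\mathrm{Input}}S^{-1}K_{\mathrm{Input}})^{-1}$ with the OLS expression applied to the pre-whitened $[X; K']$. The variable `n_chan` plays the role of K, the number of X-variables, to avoid a clash with the spectrum matrix; `inv_sqrtm` is a hypothetical helper.

```python
import numpy as np


def inv_sqrtm(S):
    """Symmetric inverse square root S^(-1/2) of a positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T


rng = np.random.default_rng(5)
N, n_chan, J = 10, 30, 2
ch = np.arange(n_chan, dtype=float)
# Known analyte spectra K_input (n_chan x J): hypothetical Gaussian bands
K_input = np.column_stack([np.exp(-0.5 * ((ch - 10) / 3) ** 2),
                           np.exp(-0.5 * ((ch - 18) / 4) ** 2)])
C_true = rng.uniform(0, 1, (N, J))

# Assumed noise covariance over channels: neighbouring channels strongly correlated
S = 1e-4 * 0.9 ** np.abs(ch[:, None] - ch[None, :])
E_input = rng.standard_normal((N, n_chan)) @ np.linalg.cholesky(S).T
X_input = C_true @ K_input.T + E_input

# Direct GLS: C = X S^-1 K (K' S^-1 K)^-1
SiK = np.linalg.solve(S, K_input)
C_gls = X_input @ SiK @ np.linalg.inv(K_input.T @ SiK)

# Pre-whiten [X; K'] by S^-1/2, then use the OLS expression C = X K (K'K)^-1
G = inv_sqrtm(S)
X = X_input @ G
K_white = G @ K_input          # transpose of K' S^-1/2, since G is symmetric
C_white = X @ K_white @ np.linalg.inv(K_white.T @ K_white)

print(np.allclose(C_gls, C_white))
```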

REFERENCES

1. Read BC. Weighted least squares. In Encyclopedia of Statistical Sciences, vol. 9, Kotz S, Johnson NL (eds). Wiley-Interscience, J. Wiley & Sons Inc: New York, 1988; 576–578.

2. Martens H, Naes T. Multivariate Calibration. Wiley: Chichester, 1989.

3. Gower JC. Generalised canonical analysis. In Multiway Data Analysis, Coppi R, Bolasco S (eds). Elsevier: Amsterdam, 1989; 221–232.

4. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter A, Brammer M. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Human Brain Mapp. 2001; 12: 61–78.

5. De Lathauwer L, de Moor B, Vandewalle J. An introduction to independent component analysis. J. Chemometrics 2000; 14: 123–149.

6. Kuldvee R, Kaljurand M, Smit HC. Improvement of signal-to-noise ratio of electropherograms and analysis reproducibility with digital signal processing and multiple injections. J. High Resol. Chromatogr. 1998; 21: 169–174.

7. Wentzell PD, Andrews DT, Kowalski BR. Maximum likelihood multivariate calibration. Anal. Chem. 1997; 69: 2299–2311.

8. Wentzell PD, Lohnes MT. Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations. Chemometrics Intell. Lab. Syst. 1999; 45: 65–85.

9. Paatero P, Tapper U. Positive matrix factorisation: a non-negative factor model with optimal utilisation of error estimates of data values. Environmetrics 1994; 5: 111–126.

10. Höskuldsson A. PLS regression methods. J. Chemometrics 1988; 2: 211–228.

11. Martens H, Martens M. Multivariate Analysis of Quality. An Introduction. Wiley: Chichester, 2001.

12. Martens H, Stark E. Extended multiplicative signal correction and spectral interference subtraction: new pre-processing methods for near infrared spectroscopy. J. Pharmaceut. Biomed. Anal. 1991; 9: 625–635.

13. Wold S, Antti H, Lindgren F, Öhman J. Orthogonal signal correction of near-infrared spectra. Chemometrics Intell. Lab. Syst. 1998; 44: 175–185.

14. Andersson CA. Direct orthogonalization. Chemometrics Intell. Lab. Syst. 1999; 47: 51–63.

15. Martens H, Pram Nielsen J, Balling Engelsen S. Light scattering and light absorbance separated by extended multiplicative signal correction (EMSC). Application to NIT analysis of powder mixtures. Anal. Chem. 2003; 75: 394–404.


