
    Pre-whitening of data by covariance-weighted pre-processing

    Harald Martens1*, Martin Høy2, Barry M. Wise3, Rasmus Bro1 and Per B. Brockhoff4

    1 Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark
    2 Institute of Chemistry, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
    3 Eigenvector Research Inc., Manson, WA, USA
    4 Department of Mathematics and Physics, Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark

    Received 7 May 2001; Revised 9 September 2002; Accepted 22 November 2002

    A data pre-processing method is presented for multichannel 'spectra' from process spectrophotometers and other multichannel instruments. It may be seen as a 'pre-whitening' of the spectra, and serves to make the instrument 'blind' to certain interferants while retaining its analyte sensitivity. The instrument selectivity may thereby be improved even before multivariate calibration. The result is a reduced need for process perturbation or sample spiking just to generate calibration samples that span the unwanted interferants. The method consists of shrinking the multidimensional data space of the spectra in the off-axis dimensions corresponding to the spectra of these interferants. A 'nuisance' covariance matrix $\Sigma$ is first constructed, based on prior knowledge or estimates of the major interferants' spectra, and the scaling matrix $G = \Sigma^{-1/2}$ is defined. The pre-processing then consists of multiplying each input spectrum by $G$. When these scaled spectra are analysed in conventional chemometrics software by PCA, PCR, PLSR, curve resolution, etc., the modelling becomes simpler, because it does not have to account for variations in the unwanted interferants. The obtained model parameters may finally be descaled by $G^{-1}$ for graphical interpretation. The pre-processing method is illustrated by the use of prior spectroscopic knowledge to simplify the multivariate calibration of a fibre optical vis/NIR process analyser. The 48-dimensional spectral space, corresponding to the 48 instrument wavelength channels used, is shrunk in two of its dimensions, defined by the known spectra of two major interferants. Successful multivariate calibration could then be obtained, based on a very small calibration sample set. The paper then shows the pre-whitening used for reducing the number of bilinear PLSR components in multivariate calibration models. The nuisance covariance $\Sigma$ is based either on prior knowledge of the interferants' spectra or on an estimate of the interferants' spectral subspace from the calibration data at hand. The relationship of the pre-processing to weighted and generalized least squares from classical statistics is outlined. Copyright © 2003 John Wiley & Sons, Ltd.

    KEYWORDS: pre-whitening; covariance; weighted; preprocessing; GLS; prior knowledge; process; multivariate calibration

    1. INTRODUCTION

    1.1. Reducing unwanted effects

    Classical chemical modelling, where prior knowledge is used to formulate mathematical models based on causal/mechanistic/first-principles theory, has problems when the a priori knowledge is erroneous or incomplete. On the other hand, data-driven explorative modelling, such as multivariate regression of one set of variables Y on another set of variables X, has problems if the available data are inadequate. Sometimes, purely data-driven modelling requires large amounts of input data for estimation of parameters that one already knows.

    The goal of the present covariance-weighted pre-processing technique is to maintain the flexibility of the data-driven 'soft modelling', but to reduce the requirements for empirical calibration data, by including quantitative prior knowledge in the modelling. If successful, this should reduce the existing prerequisite for spanning all relevant types of variation by the calibration samples, a requirement that has made multivariate calibration of process analysers expensive and cumbersome. It should also decrease the total number of calibration samples needed, as fewer statistical

    *Correspondence to: H. Martens, Department of Food and Dairy Science, Royal Veterinary and Agricultural University, DK-1958 Frederiksberg C, Denmark. E-mail: [email protected]

    Copyright © 2003 John Wiley & Sons, Ltd.

    JOURNAL OF CHEMOMETRICS

    J. Chemometrics 2003; 17: 153–165. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cem.780


    eigenanalyses, these software systems include a weighting-based pre-processing step, to balance the relevance and noise levels of the different variables. This weighting may be written as

    $$X = X_{\mathrm{Input}}\,G \qquad (2)$$

    where $G$ ($K \times K$) is a scaling matrix. In this conventional weighting, $G$ is diagonal, with scaling elements that are the inverse of a predefined standard deviation $\sigma$ ($K \times 1$). In the commonly used standardization, vector $\sigma$ is defined as the total initial standard deviation $\sigma_0$ of the $K$ variables in the set of available objects. However, it is also possible, and statistically closer to optimal, to define $\sigma$ as the standard uncertainty of the different variables, i.e. the expected standard deviation of their errors.
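    The conventional diagonal weighting can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' software: the array `X_input`, its values and the helper name `diagonal_weights` are invented for the example, and the standardization uses the sample standard deviation of the mean-centred data as the total initial standard deviation.

```python
import numpy as np

def diagonal_weights(sigma):
    """Diagonal scaling matrix G with elements 1/sigma_k (conventional weighting)."""
    sigma = np.asarray(sigma, dtype=float)
    if np.any(sigma <= 0):
        raise ValueError("all standard deviations must be positive")
    return np.diag(1.0 / sigma)

# Hypothetical input: N = 4 objects, K = 3 variables on very different scales.
X_input = np.array([[1.0, 10.0, 100.0],
                    [2.0, 30.0, 200.0],
                    [3.0, 20.0, 400.0],
                    [4.0, 40.0, 300.0]])

X_centred = X_input - X_input.mean(axis=0)      # mean-centre each variable
sigma0 = X_centred.std(axis=0, ddof=1)          # total initial standard deviation
G = diagonal_weights(sigma0)                    # K x K diagonal scaling matrix
X = X_centred @ G                               # Equation (2): X = X_Input G

print(np.round(X.std(axis=0, ddof=1), 6))       # spread of each scaled variable
```

    Replacing `sigma0` by a vector of expected error standard deviations gives the uncertainty-based variant mentioned in the text; the code is otherwise unchanged.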

    More formally, the scaling matrix $G$ may be seen as the inverse square root of the diagonal variance elements in matrix $\Sigma$:

    $$G = \Sigma^{-1/2} \qquad (3)$$

    Defining $\Sigma = \mathrm{diag}(\sigma^2)$ and replacing $X$ by $X_{\mathrm{Input}}G = X_{\mathrm{Input}}\Sigma^{-1/2}$ in the PCA and PLSR definitions shows that the pre-processing of the X-variables is equivalent (see Appendix I) to defining the score vectors as eigenvectors of $X_{\mathrm{Input}}\Sigma^{-1}X_{\mathrm{Input}}'$ in PCA/PCR and of $X_{\mathrm{Input}}\Sigma^{-1}X_{\mathrm{Input}}'YY'$ in PLSR (after deflation). In the NIPALS estimation algorithm it may equivalently be attained by using weighted least squares (WLS) in the repeated regression over X-variables that defines each score vector.

    If the errors in different X-variables are correlated, $\Sigma$ becomes a covariance matrix with non-zero off-diagonal elements. From more or less approximate prior knowledge about this uncertainty covariance, Equation (3) may still be used for defining the pre-processing. Equation (2) then yields a covariance-weighted pre-processing of the input data. The equivalent NIPALS algorithm then requires generalized least squares (GLS) regression [1,2] over the X-variables to estimate the score vectors. Further details of the relationship between classical GLS and the present use of covariance-weighted pre-processing for 'pre-whitening' of spectral data are given in Appendix II. This also shows the converse object weighting to remove correlated errors between objects.
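    The equivalence between scaling the data and weighting the regression can be written out in two lines. The derivation below is a sketch, not a reproduction of the appendices: it assumes a symmetric, full-rank $G$ with $GG' = \Sigma^{-1}$, where $\Sigma$ is the uncertainty covariance matrix of Equation (3), and the row vector $x_{\mathrm{Input},i}$ and loading vector $p_{\mathrm{Input}}$ are notation introduced here for illustration.

$$
XX' = X_{\mathrm{Input}}\,GG'\,X_{\mathrm{Input}}' = X_{\mathrm{Input}}\,\Sigma^{-1}X_{\mathrm{Input}}'
$$

$$
\hat{t}_i = \frac{x_i\,p}{p'p}
          = \frac{x_{\mathrm{Input},i}\,\Sigma^{-1}p_{\mathrm{Input}}}{p_{\mathrm{Input}}'\,\Sigma^{-1}p_{\mathrm{Input}}},
\qquad x_i = x_{\mathrm{Input},i}\,G,\quad p = G\,p_{\mathrm{Input}}
$$

    The right-hand side of the second line is the GLS estimate of the score $t_i$ in the model $x_{\mathrm{Input},i} = t_i\,p_{\mathrm{Input}}' + e_i$ with $\mathrm{cov}(e_i) = \Sigma$. When $\Sigma$ is diagonal it reduces to the WLS estimate, so ordinary least squares on the scaled data reproduces the weighted regression on the input data.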

    2.2.3. Definition of the pre-processing weights G

    A practical implementation of Equation (3) is based on eigenanalysis of the uncertainty variance–covariance matrix $\Sigma$ in terms of its eigenvectors $V$ and eigenvalues $\lambda$:

    $$\Sigma V = V\,\mathrm{diag}(\lambda) \qquad (4a)$$

    The covariance weighting matrix is here defined as

    $$G = V\,\mathrm{diag}(\lambda)^{-1/2}\,V' \qquad (4b)$$

    The chosen symmetrical definition of $G$ is not mandatory as long as $GG' = \Sigma^{-1}$, but it simplifies the visual interpretation of the weighted model parameters and residuals.
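    Equations (4a) and (4b) translate almost directly into NumPy. The following is a minimal sketch, assuming a symmetric positive definite covariance matrix; the function name `whitening_matrix` and the 3 x 3 example matrix are hypothetical and serve only to check the two stated properties of $G$.

```python
import numpy as np

def whitening_matrix(Sigma):
    """Symmetric G = V diag(lambda)^(-1/2) V' from the eigenanalysis of Sigma."""
    Sigma = np.asarray(Sigma, dtype=float)
    lam, V = np.linalg.eigh(Sigma)          # Sigma V = V diag(lam), Sigma symmetric
    if np.any(lam <= 0):
        raise ValueError("Sigma must be positive definite (full rank)")
    return (V / np.sqrt(lam)) @ V.T         # scale column j of V by lam_j^(-1/2)

# Hypothetical 3 x 3 uncertainty covariance with correlated errors.
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])
G = whitening_matrix(Sigma)

print(np.allclose(G @ G.T, np.linalg.inv(Sigma)))   # checks G G' against Sigma^-1
print(np.allclose(G, G.T))                          # checks that G is symmetric
```

    A triangular Cholesky factor of the inverse covariance would satisfy the first check as well, but not the second, which is the property the text singles out as helpful for visual interpretation.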

    2.2.4. Deweighting the model parameters

    The loadings $P$ and residuals $E$ of the X-variables, obtained from the bilinear model of the mean-centred, weighted X-data,

    $$X = TP' + E \qquad (5a)$$

    may be descaled to fit the model of the mean-centred, unweighted data, i.e.

    $$X_{\mathrm{Input}} = TP_{\mathrm{Input}}' + E_{\mathrm{Input}} \qquad (5b)$$

    If $G$ is symmetrical and has full rank (see below), the inversion of Equation (2) gives

    $$X_{\mathrm{Descaled}} = X_{\mathrm{Input}} = XG^{-1} \qquad (5c)$$

    Likewise,

    $$E_{\mathrm{Descaled}} = EG^{-1} \qquad (5d)$$

    and

    $$P_{\mathrm{Descaled}} = G^{-1}P \qquad (5e)$$

    This simplifies the graphical interpretation of the X-loadings.

    In regression methods such as PCR and PLSR the mean-centred, reduced-rank linear regression model summary, based on the scaled X-variables, may be written as

    $$Y = XB_A + F_A \qquad (5f)$$

    where the regression coefficient parameter matrix $B_A$ ($K \times J$) uses $A$ latent variables and $F_A$ ($N \times J$) represents residuals. $B_A$ may be seen as linear combinations of orthogonal X-loadings (PCR) or orthogonal loading-like loading weights (PLSR). For graphical interpretation, $B_A$ may therefore be descaled in analogy to Equation (5e) as

    $$B_{A,\mathrm{Descaled}} = G^{-1}B_A \qquad (5g)$$

    On the other hand, the regression coefficients suitable for prediction of the Y-variables directly from the unweighted X-variables,

    $$\hat{Y}_A = X_{\mathrm{Input}}B_{A,\mathrm{ForInput}} \qquad (5h)$$

    may be obtained by inserting Equation (2) into Equation (5f), yielding

    $$B_{A,\mathrm{ForInput}} = GB_A \qquad (5i)$$
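    The difference between rescaling for prediction and descaling for interpretation is easy to verify numerically. In the sketch below all data are random and hypothetical, and an ordinary least squares fit on the scaled data stands in for the reduced-rank coefficients from PCR or PLSR; the identities being checked do not depend on how the coefficients were estimated.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 20, 5

# Hypothetical mean-centred input data and response.
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)
y = rng.normal(size=(N, 1))
y -= y.mean(axis=0)

# Any symmetric, full-rank G will do; here it comes from Sigma = 10 l l' + I.
l = rng.normal(size=(K, 1))
lam, V = np.linalg.eigh(10.0 * (l @ l.T) + np.eye(K))
G = (V / np.sqrt(lam)) @ V.T

X = X_input @ G                                   # Equation (2)
B, *_ = np.linalg.lstsq(X, y, rcond=None)         # stand-in for B_A from PCR/PLSR

B_for_input = G @ B                               # Equation (5i): for prediction
B_descaled = np.linalg.solve(G, B)                # Equation (5g): G^-1 B_A, for plotting
X_descaled = np.linalg.solve(G, X.T).T            # Equation (5c): X G^-1 (G symmetric)

print(np.allclose(X @ B, X_input @ B_for_input))  # predictions agree on both scales
print(np.allclose(X_descaled, X_input))           # descaling recovers the input data
```

    Note that the two transformed coefficient vectors differ by a factor of $G^2$, so only the rescaled one should be applied to unweighted spectra.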

    2.2.5. Definition of the uncertainty covariance Σ from prior knowledge

    In the situation with undesired interferants outlined in Equation (1), it is natural to define $\Sigma$ from $\Delta = DL' + E$. The spectra $L$ of the interferants (the undesired variation patterns) may sometimes be assumed known, while their concentrations $D$ are unknown. The formally correct definition could then be

    $$\Sigma = L\,\mathrm{cov}(D)\,L' + \mathrm{cov}(E) \qquad (6a)$$

    where $\mathrm{cov}(D)$ represents the expected variance–covariance of the interferant concentrations and $\mathrm{cov}(E)$ represents the covariance of other, unidentified error patterns plus the variance of random i.i.d. noise. In practice, the variation in interferant concentrations may be difficult to specify and may e.g. be replaced by the approximation

    $$\mathrm{cov}(D) = \delta^2 I \qquad (6b)$$

    where $\delta^2$ is the expected average variance of the interferants' concentrations; intercorrelations between the interferants' concentrations are assumed to be negligible. The scalar $\delta$ is given in the unit of interferant concentrations. Moreover, it may often be adequate to assume that the errors in $E$ are uncorrelated, i.e.

    $$\mathrm{cov}(E) = \mathrm{diag}(\sigma^2) \qquad (6c)$$

    Thereby Equation (6a) simplifies to

    $$\Sigma = \delta^2 LL' + \mathrm{diag}(\sigma^2) \qquad (6d)$$

    If all the X-variables have about the same uncertainty variance $\sigma^2$, i.e.

    $$\mathrm{cov}(E) = \sigma^2 I \qquad (6e)$$

    this leads to a further simplification. With the expected average interferant concentration variance $\delta^2$ being a general scaling factor determining the contribution of the interferant spectra, this further simplifies the definition of $\Sigma$ to

    $$\Sigma = \delta^2 LL' + I \qquad (6f)$$

    By defining the scaling factor $\delta$ sufficiently large, the pre-processing $X = X_{\mathrm{Input}}\Sigma^{-1/2}$ (Equations (2) and (3)) in effect can make the subsequent least squares-based modelling of $X$ completely insensitive ('blind') to signal variations caused by the unknown interferant concentrations. Only the net analyte signal obtained as the residual after projecting $K$ (Equation (1)) on $L$ will remain in $X$, together with unmodelled variations and measurement noise.
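    The blinding effect described above can be checked numerically. The sketch below uses small hypothetical spectra on six channels (a flat baseline and a sloping band as interferants, plus an analyte spectrum `k`); none of these values come from the paper. It builds the covariance as a scaled outer product of the interferant spectra plus the identity, and reports how much interferant signal survives the scaling and how close the scaled analyte spectrum is to the net analyte signal.

```python
import numpy as np

def whitening_matrix(Sigma):
    """Symmetric inverse square root of a positive definite matrix."""
    lam, V = np.linalg.eigh(Sigma)
    return (V / np.sqrt(lam)) @ V.T

K = 6
channels = np.arange(K, dtype=float)
# Hypothetical spectra: a flat baseline and a sloping interferant, plus an analyte.
L = np.column_stack([np.ones(K), channels / (K - 1)])   # K x 2 interferant spectra
k = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0])            # analyte spectrum

# Net analyte signal: residual of k after projection on the columns of L.
coef, *_ = np.linalg.lstsq(L, k, rcond=None)
net = k - L @ coef

for delta2 in (0.0, 1.0, 100.0, 1e8):
    Sigma = delta2 * (L @ L.T) + np.eye(K)               # covariance: delta^2 L L' + I
    G = whitening_matrix(Sigma)
    leak = np.linalg.norm(G @ L)                         # what is left of the interferants
    gap = np.linalg.norm(G @ k - net)                    # distance to the net analyte signal
    print(f"delta^2 = {delta2:g}: |G L| = {leak:.4f}, |G k - net| = {gap:.4f}")
```

    Both printed quantities shrink as the scaling factor grows, which is the sense in which a sufficiently large value makes later least squares modelling insensitive to the interferants while the net analyte signal is retained.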

    2.2.6. Definition of the uncertainty covariance Σ from previous residuals

    When explicit prior knowledge about the spectrum of the individual interferants in $L$ is lacking, the required information may instead be defined from spectral modelling residuals in previous calibration data. If X and Y data from a previous relevant set of $M$ objects are available, $\Delta$, the spectral residuals in these data, may be obtained after projection of $X$ ($M \times K$) on the $J$ known constituent concentrations $Y$ ($M \times J$):

    $$\Delta = \left(I - Y(Y'Y)^{-1}Y'\right)X \qquad (7a)$$

    These residuals $\Delta$ may then be used for estimating the future error covariance matrix $\Sigma$, by defining $L$ in Equation (6f) as e.g. the first few ($A$) principal components of $\Delta$, obtained by singular value decomposition of $\Delta$:

    $$USV' = \Delta \qquad (7b)$$

    In the notation of e.g. Matlab the subspace of the interferants may be defined as

    L = V(:, 1:A) S(1:A, 1:A)    (7c)
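    The three steps (projection on the known concentrations, singular value decomposition of the residuals, and selection of the first few components) can be sketched as follows. This is an illustration on simulated, noise-free data with hypothetical spectra; the function name `interferant_subspace` is not from the paper, and mean-centring is applied inside the function as an assumption consistent with the mean-centred models used elsewhere in the text.

```python
import numpy as np

def interferant_subspace(X, Y, A):
    """Estimate L = V(:,1:A) S(1:A,1:A) from residuals of X after projection on Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Yc, Xc, rcond=None)          # (Y'Y)^-1 Y'X
    Delta = Xc - Yc @ coef                                  # residual spectra
    U, s, Vt = np.linalg.svd(Delta, full_matrices=False)    # U S V' = Delta
    return Vt[:A].T * s[:A], s                              # scaled loadings, all singular values

# Hypothetical noise-free data: one analyte and two interferants, K = 8 channels.
rng = np.random.default_rng(1)
M, K = 30, 8
k_true = rng.normal(size=K)
L_true = rng.normal(size=(K, 2))
Y = rng.uniform(size=(M, 1))
D = rng.uniform(size=(M, 2))
X = Y @ k_true[None, :] + D @ L_true.T

L_est, s = interferant_subspace(X, Y, A=2)

# Check that the estimated columns span the same subspace as the true interferant spectra.
Q, _ = np.linalg.qr(L_est)
print(np.allclose(Q @ (Q.T @ L_true), L_true))
print(s[:4])        # singular values of the residual matrix
```

    With real, noisy data the singular values do not drop to zero after the interferant components, so the number of components retained becomes a modelling choice.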

    2.2.7. Definition of the uncertainty covariance Σ from the data at hand

    Equations (7a)–(7c) may alternatively be based on the X and Y data at hand in the actual set of $N$ calibration samples, instead of on previous data. However, care must then be taken to avoid overfitting. For instance, if cross-validation and jackknifing are to be used for statistical assessment of a calibration model, $\Sigma$ may have to be re-estimated within each cross-validation segment.
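    Re-estimating the covariance inside each cross-validation segment means that the left-out object never contributes to its own scaling matrix. The sketch below shows one way to organise this for leave-one-out validation. It is an illustration under several assumptions: the data are simulated and noise-free with hypothetical Gaussian-shaped spectra, a one-component PLS1 model with a single y-variable replaces the full PLSR of the paper, and all function names are invented here.

```python
import numpy as np

def whitening_matrix(Sigma):
    lam, V = np.linalg.eigh(Sigma)
    return (V / np.sqrt(lam)) @ V.T

def estimate_G(Xc, yc, A, delta2):
    """Data-based G: residuals after projection on y, their first A PCs, then Sigma^(-1/2)."""
    Delta = Xc - np.outer(yc, yc @ Xc) / (yc @ yc)
    _, s, Vt = np.linalg.svd(Delta, full_matrices=False)
    L = Vt[:A].T * s[:A]
    return whitening_matrix(delta2 * (L @ L.T) + np.eye(Xc.shape[1]))

def pls1_one_component(Xc, yc):
    """Regression vector of a one-component PLS1 model for centred data."""
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    return w * (t @ yc) / (t @ t)

def loo_rmsep(X_input, y, A=2, delta2=None):
    """Leave-one-out RMSEP; G is re-estimated inside every segment when delta2 is given."""
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        x_mean, y_mean = X_input[train].mean(axis=0), y[train].mean()
        Xc, yc = X_input[train] - x_mean, y[train] - y_mean
        G = np.eye(X_input.shape[1]) if delta2 is None else estimate_G(Xc, yc, A, delta2)
        b_for_input = G @ pls1_one_component(Xc @ G, yc)     # coefficients for unscaled input
        errors.append(y_mean + (X_input[i] - x_mean) @ b_for_input - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical noise-free data: Gaussian analyte band; Gaussian band and flat baseline as interferants.
rng = np.random.default_rng(2)
c = np.arange(48.0)
k = np.exp(-0.5 * ((c - 8.0) / 5.0) ** 2)
L_true = np.column_stack([np.exp(-0.5 * ((c - 22.0) / 6.0) ** 2), np.ones(48)])
y = rng.uniform(size=23)
X_input = np.outer(y, k) + rng.uniform(size=(23, 2)) @ L_true.T

print("no pre-whitening :", loo_rmsep(X_input, y))
print("data-based GLS   :", loo_rmsep(X_input, y, A=2, delta2=100.0))
```

    Estimating the scaling matrix once from all objects and then cross-validating would let every left-out object influence its own pre-processing, which is the overfitting risk the text warns about.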

    3. MATERIALS AND METHODS

    3.1. Input data

    The data set used for illustrating the pre-processing has been chosen for its simplicity, in order to make the method clear. The data [11] concern the determination of the protonated state of a chemical dye, litmus.

    3.2. Methods

    Transmitted light spectra $T$ were measured remotely by fibre optics in an industrial process spectrophotometer (Guided Wave Model 200). The transmittance spectra were converted into absorbance (here referred to as 'optical density' (OD)) spectra and collected in $K = 48$ wavelength channels between about 400 and 700 nm. These OD spectra were termed $X_{\mathrm{Input}}$, available for a total of 23 samples.

    The samples contain different known concentrations [11] of protonated (red-coloured) litmus, which is the analyte to be calibrated for here, Y = [protonated litmus]. In addition, the samples have various unknown concentration variations of two interferants, unprotonated (blue-coloured) litmus (due to varying pH) and white zinc oxide powder. The data were analysed in Matlab® Version 5.3 (The MathWorks, Inc.) using the first author's software.

    4. RESULTS

    4.1. Previous results for the same data

    Without any interferants the OD data are expected to increase proportionally with the concentration of the red-coloured analyte, Y = [protonated litmus], at each wavelength $k$ where the analyte absorbs light, $x_{\mathrm{Input},k}$, $k = 1, 2, \ldots, K$. However, the two interferants (blue litmus, white powder) generate selectivity problems: strongly varying but unknown levels of one or both of the interferants make it impossible to determine the analyte by conventional univariate calibration based on a single wavelength channel.

    Such selectivity problems may be removed by multivariate calibration [2], without knowing anything about the spectral characteristics of the pure analyte and the interferants, and without even knowing the concentrations of the interferants in the calibration samples, as demonstrated for these data in References [2,11]. However, this requires that the calibration sample set spans not only the analyte's concentration but also each of the interferants' concentrations. The present paper shows how additional spectral information about the interferants may be used to filter out their effects by shrinking the X-space, to the extent that they do not have to be modelled and therefore not even spanned by the calibration set.

    4.2. Input data

    The two full curves in Figure 1 show the known interference structures in the present application example: the instrument responses $L = [l_1, l_2]$ (crosses) of the two interferants, represented by their OD spectra at $K = 48$ wavelength channels in the visible wavelength range. These


    4.3. Increasing degree of shrinkage of input data

    The rest of Figure 2 illustrates how the spectra $X$ look after increased downscaling of the two known interferants' impact in the pre-processing $X = X_{\mathrm{Input}}G = X_{\mathrm{Input}}\Sigma^{-1/2}$ (Equations (2) and (3)). The error covariance matrix $\Sigma$ was here defined by the simplified expression in Equation (6f) as an increasingly weighted sum of the covariance $\delta^2 LL'$ (where $L = [l_1, l_2]$ from Figure 1) plus a constant noise variance, $\mathrm{diag}(\sigma^2) = I$.

    The scalar $\delta^2$ determines the degree of shrinkage. The four rows in Figure 2 represent four increasing degrees of shrinkage, $\delta^2 = 0$, 0.1, 1 and 100. This may be thought of as four different subjective judgements of the relevance of the two interferants. The left side of the figure shows a gradual simplification of the X-data, until with $\delta^2 = 100$ (Figure 2(g)) only one systematic pattern of variation is clearly discernible from the random measurement noise.

    The right side of the figure confirms this: as the contributions from the two interferants are diminished, the ability of the remaining absorbance variation in $X$ to describe the analyte $Y$ increases. Without any shrinkage of the interferants' absorbance contributions ($\delta^2 = 0$), three PCs were required to describe both X and Y. Already at $\delta^2 = 1$ most of the variation in Y is described after only one PC. With $\delta^2 = 100$ the first PC gives more or less a complete description of X as well (Figure 2(h)). Equivalently (see Appendix I), this means that $X_{\mathrm{Input}}\Sigma^{-1}X_{\mathrm{Input}}'$ has only one large eigenvalue.
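    How strongly a given value of the scaling factor shrinks the interferant directions can be made explicit with a short derivation. This is a sketch under assumptions that are not taken from the paper: the interferant spectra are written through their thin singular value decomposition $L = UCW'$, where $U$, $C$, $W$ and the singular values $c_i$ are symbols introduced here for illustration, and the noise term is the identity.

$$
\Sigma = \delta^2 LL' + I = U\left(\delta^2 C^2 + I\right)U' + \left(I - UU'\right)
$$

$$
G = \Sigma^{-1/2} = U\left(\delta^2 C^2 + I\right)^{-1/2}U' + \left(I - UU'\right)
$$

    Directions orthogonal to the interferant spectra therefore pass through unchanged, while the $i$-th interferant direction is multiplied by $1/\sqrt{1 + \delta^2 c_i^2}$. For a hypothetical interferant direction with $c_i = 1$, the four settings $\delta^2 = 0$, 0.1, 1 and 100 give factors of 1, 0.953, 0.707 and 0.0995 respectively. The actual factors for the litmus data depend on the norms of the interferant spectra, which are not stated in the text.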

    4.4. A priori information for OLS, WLS and GLS pre-processing

    Figure 3 compares the pre-processing parameters in conventional unweighted linear regression (here termed 'OLS'), in the pre-processing with diagonal $\Sigma$, as used in e.g. most chemometric software (here termed 'WLS'), and in the new covariance-weighted pre-processing (here termed 'GLS'; see Appendix II). The left subplots show the uncertainty information assumed available a priori in each of the three cases. The right subplots illustrate the effect of the pre-processing for three arbitrary X-variables (out of 48), namely #10, 20 and 30, for all the samples.

    In the top row ('OLS') there is no prior information used (in Equation (6d), $\mathrm{diag}(\sigma) = I$ and $\delta^2 = 0$). The variation in all three directions #10, 20 and 30 is seen to be the information that we expect from Figure 2(a).

    Figure 2. Effect of increasing degree of GLS shrinkage of input data. Left: GLS pre-processed input data $X = X_{\mathrm{Input}}G$, where $X_{\mathrm{Input}}$ is the input spectra (a). Right: cumulative fit (fraction of explained variance, $R^2$) of X (crosses, full line) and Y (circles, broken line) as a function of PCA component $a = 1$–4. Rows 1–4: covariance scaling factors $\delta^2 = 0$, 0.1, 1 and 100 respectively (Equation (6d)).


    The X-variables from wavelength channel #30 onwards represent mostly baseline information. In order to visualize the effect of the WLS pre-processing available in most chemometrics software packages today, we make the subjective assumption that the baseline channels from #30 onwards contain mainly irrelevant noise (we ignore that the X-data in this region may carry useful baseline information). Therefore we a priori ascribe relative standard uncertainty $\sigma_k = 1$ for X-variables $k = 1$–29, but increase this to $\sigma_k = 4$ for $k = 30$–48, and use these expected noise levels as $\sigma$ in Equation (6d). For this WLS pre-processing, the covariance shrinkage factor is still defined as $\delta = 0$. The vertical variation in X-variable #30 is seen to have been reduced in Figure 3(d) compared with Figure 3(b), but otherwise the sample configuration is unchanged and the cloud of sample points still spans three dimensions.

    In the third row ('GLS') we additionally employ the spectral background knowledge about the two interferants from Figure 1, $l_1$ and $l_2$, with shrinkage factor $\delta^2 = 100$. We retain the value of $\sigma$ from the WLS case to illustrate how variance $\mathrm{diag}(\sigma^2)$ and covariance $\delta^2 LL'$ in Equation (6d) can be used at the same time. The cloud of sample points in Figure 3(f) now spans mainly a single dimension: variations in net analyte signal. Many of the interferant effects have been removed already during pre-processing.

    4.5. Calibration based on very few samples

    In this subsection we illustrate one possible use of pre-whitening: the removal of interference effects not seen in the calibration sample set. Conventional cross-validated PLSR is used as the calibration method.

    In regression-based multivariate calibration, all the interference phenomena that may occur in future samples have to be represented in the calibration sample set, with sufficient clarity and sufficiently independent of the other types of variations. Sometimes that is difficult to attain, for economic or practical reasons, for instance when calibrating an industrial process spectrophotometer. The covariance-weighted pre-processing method allows interference phenomena with known spectra $L$ to be corrected for at the pre-processing stage, so that they do not have to be spanned in the calibration set.

    The first column of subplots in Figure 4 shows the original absorbance spectra $X_{\mathrm{Input}}$. The second column of subplots in Figure 4 shows the spectra after pre-processing by the three methods illustrated in Figure 3 for three of the X-variables.

    Figure 3. Comparison of OLS, WLS and GLS pre-processing. Top (a,b), OLS; middle (c,d), WLS; bottom (e,f), GLS. Left: information available a priori. Right: data plotted in 3D for X-variables #10, 20 and 30. Each point represents one sample's spectrum.


    Calibration set. The three densely dotted curves in Figure 4(a) represent $N = 3$ objects that together are here regarded as if they were the only samples available with both X- and Y-data. This tiny calibration sample set has relative analyte concentrations $Y = [0.009, 0.365, 0.679]'$.

    Test set. For the sake of illustration, the thin curves in Figure 4 represent the remaining 20 objects, which will now be treated as a new, future set, for which Y is to be predicted from their spectra X. These input data are the same for the OLS, WLS and GLS cases (rows 1, 2, and 3 in Figure 4).

    The three densely dotted curves were used as X in calibration against Y, with the model parameters estimated by PLSR. In all three cases, OLS, WLS and GLS, the PLSR model with one PC appeared to perform best in the small calibration set, because the calibration samples only spanned the analyte variation and no interferants. The linear regression coefficient vector $B_{A=1}$ gave more or less equally 'perfect' fit in the $N = 3$ calibration samples by all three pre-processing methods, as evidenced by the three dots along the 'ideal' diagonal (middle column of subplots in Figure 4).

    The analyte concentration in the remaining 20 'unknown' samples, $\hat{Y}_A$, was now predicted from their spectra, using the 'optimal' calibration model $B_{A=1}$. The circles in the middle column of subplots in Figure 4 show that the OLS and WLS calibration models gave poor Y-predictions in the new, independent samples, while the GLS calibration model gave good prediction. The reason is that variations in the input spectra due to varying, uncontrolled levels of the two interferants were not seen in the calibration set and hence were left unchecked by the conventional unweighted and variance-weighted cases (OLS and WLS). In contrast, the damaging effects of the interferants on the predictive ability of the calibration model were more or less eliminated by the covariance-weighted pre-processing (GLS).
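    The mechanism behind this result can be reproduced on simulated data. The sketch below is not a re-analysis of the litmus data: the three spectra are hypothetical Gaussian and flat shapes on 48 channels, the data are noise-free, the test concentrations are random, and a fixed one-component PLS1 model replaces the cross-validated PLSR of the paper. Only the three calibration concentrations are taken from the text.

```python
import numpy as np

def whitening_matrix(Sigma):
    lam, V = np.linalg.eigh(Sigma)
    return (V / np.sqrt(lam)) @ V.T

def pls1_one_component(Xc, yc):
    """Regression vector of a one-component PLS1 model for centred data."""
    w = Xc.T @ yc
    w /= np.linalg.norm(w)
    t = Xc @ w
    return w * (t @ yc) / (t @ t)

# Hypothetical spectra on 48 channels (not the litmus data of the paper).
c = np.arange(48.0)
k = np.exp(-0.5 * ((c - 8.0) / 5.0) ** 2)                      # analyte
L = np.column_stack([np.exp(-0.5 * ((c - 22.0) / 6.0) ** 2),   # coloured interferant
                     np.ones(48)])                             # flat, turbidity-like interferant

# Calibration set: three samples, analyte varies, interferants absent.
y_cal = np.array([0.009, 0.365, 0.679])
X_cal = np.outer(y_cal, k)

# Test set: 20 samples with uncontrolled interferant levels.
rng = np.random.default_rng(3)
y_test = rng.uniform(0.0, 0.7, size=20)
X_test = np.outer(y_test, k) + rng.uniform(0.0, 1.0, size=(20, 2)) @ L.T

def rmsep(G):
    """Calibrate on the scaled three-sample set and predict the test set."""
    x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
    b = G @ pls1_one_component((X_cal - x_mean) @ G, y_cal - y_mean)   # coefficients for unscaled input
    y_hat = y_mean + (X_test - x_mean) @ b
    return float(np.sqrt(np.mean((y_hat - y_test) ** 2)))

rmsep_ols = rmsep(np.eye(48))
rmsep_gls = rmsep(whitening_matrix(100.0 * (L @ L.T) + np.eye(48)))    # Sigma = 100 L L' + I
print(f"RMSEP without pre-whitening: {rmsep_ols:.4f}")
print(f"RMSEP with GLS pre-whitening: {rmsep_gls:.4f}")
```

    Both models fit the three calibration samples exactly, so the difference in test error comes entirely from how each regression vector responds to interferant directions that the calibration set never contained.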

    The two rightmost columns of subplots in Figure 4 show the X-residuals after the one-dimensional PLSR model, in terms of the scaled residuals $E$ (obtained after projection of $X$ on the first PC $t_1$) and their descaled version $E_{\mathrm{Descaled}}$ (Equation (5d)) respectively. This shows that the unmodelled interference information was clearly visible for the new unknown samples, both for the OLS/WLS and GLS cases. In the GLS case, $E$ was very low (Figure 4(c-4)) compared with the scaled X-data (Figure 4(c-2)), even for the 20 'new' samples. However, after descaling, the characteristic signals of the two unmodelled interferants became clearly visible in the residual spectra $E_{\mathrm{Descaled}}$ (Figure 4(c-5)). These residuals may be submitted to a second bilinear modelling, yielding a second set of score vectors and residual variances, for outlier analysis, etc.

    Figure 4. Calibration with very few samples. Top (a-1 to a-5), OLS; middle (b-1 to b-5), WLS; bottom (c-1 to c-5), GLS. Column 1: input data $X_{\mathrm{Input}}$ of three calibration samples (densely dotted) and 20 unknown test samples. Column 2: scaled spectra for regression modelling, $X = X_{\mathrm{OLS}}$, $X_{\mathrm{WLS}}$ or $X_{\mathrm{GLS}}$. Column 3: Y-values predicted from optimal models, $\hat{y}_{i,A=1}$ (ordinate), vs measured values $y_i$ (abscissa); target line $\hat{y}_{i,A=1} = y_i$. Column 4: spectral residuals from one-PC PLSR model of scaled X-data, $E$. Column 5: spectral residuals $E$ after descaling by Equation (5d), $E_{\mathrm{Descaled}}$.

    In summary, the pre-processing in this case allowed us to

    make a valid calibration model with a small and otherwise

    inadequate calibration set, in spite of a glaring lack of

    interferant variability between the calibration objects. This

    illustrates that shrinking away interference effects in the

    X-space by pre-whitening makes it possible to use fewer

    calibration samples, and in particular fewer Y-data, and

    hence to get cheaper and simpler calibration models.

    4.6. Calibration based on many samples

    The next two figures illustrate another advantage of pre-whitening: the ability to reduce the required dimensionality of the calibration model for a given set of calibration samples. The main purpose of this reduction is to simplify model interpretation, with a possible enhancement of the predictive performance. In this case all the available objects from Figure 2(a) are used as calibration samples ($N = 23$). The same parameter sets (termed OLS, WLS and GLS) were used as in the last example, and PLSR was again used for developing the calibration models.

    Full leave-one-out cross-validation was used for assessing the models in terms of their optimal rank $A$ and their root mean square error of prediction in Y, $\mathrm{RMSEP}(Y)_A$. The input spectra of the calibration samples now represent all $N$ ($3 + 20 = 23$) curves displayed in the first column of subplots in Figure 4. The three full curves in Figure 5 show the predictive ability of the OLS, WLS and GLS cases, in terms of the cross-validated $\mathrm{RMSEP}(Y)_A$ vs $A = 0, 1, 2, \ldots, 6$. (The dotted curve will be discussed later.)

    The figure first of all shows that while the OLS and WLS

    models require at least A = 3 PCs to reach acceptably low

    predictive error, the GLS model did so with only A = 1 PC.

    Moreover, a slight improvement in predictive ability was

    attained: using two PCs, the GLS case gives a lower

    predictive error than the OLS/WLS cases gave with three

    or more PCs.

    Finally, Figure 6 illustrates the effect of rescaling and

    descaling of the model parameters, in this case of the

    estimated regression coefficient vector at the lowest accep-

    table rank, for OLS, WLS and GLS. The OLS solution is

    superimposed on the WLS and GLS solutions as a dotted

    line, for comparison.

    The left column of subplots shows $B_A$, as obtained from bilinear PLSR at the optimal number of PCs ($A$), based on the scaled X-variables in the OLS, WLS and GLS cases. The three ways of pre-processing may be seen to yield somewhat different scaled regression coefficients. Moreover, while the OLS and WLS solutions required $A = 3$ PCs, the GLS solution required only $A = 1$ PC.

    The middle column shows the rescaled coefficient spectrum $B_{A,\mathrm{ForInput}}$ (Equation (5i)), suitable for application directly to the input X-variables. Again the OLS solution is superimposed on the WLS and GLS solutions (dotted line). The scaling of the individual X-variables in vector $B_{A,\mathrm{ForInput}}$ is independent of the pre-processing of the X-variables, so the only difference between the solutions is due to the impact of the pre-processing on the estimation process itself. Figures 6(e) and 6(h) show that the downweighting of the X-variables ≥ channel #30 has rendered the other channels more important for separating the baseline variations due to the turbidity from the blue-coloured interferant and the red-coloured analyte. The wavelength channels just below #30, with low absorbance at the end of interferant spectrum $l_1$ (Figure 1), are given higher relative importance in the modelling. This confirms that in a rank-reduced calibration model such as the present low-rank PLSR modelling, there are several almost equivalent ways to combine the 48 input variables in order to attain the desired selectivity enhancement.

    Figure 5. Calibration based on all samples: predictive performance after OLS, WLS and GLS pre-processing. Prediction error of y, estimated by full leave-one-out cross-validation, from PLSR modelling from $X = X_{\mathrm{Input}}G$ with $G = \Sigma^{-1/2}$. Squares: OLS; $\Sigma = I$ (no pre-processing). Circles: WLS; $\Sigma$ diagonal (variance weighting). Triangles: knowledge-based GLS; $\Sigma$ defined from two known interferant spectra $l_1$ and $l_2$. Dotted curve: data-based GLS; $\Sigma$ defined from spectral residuals after projection of $X_{\mathrm{Input}}$ on y.

    The right column of subplots in Figure 6 shows the descaled coefficient spectrum $B_{A,\mathrm{Descaled}}$ (Equation (5g)), suitable for graphical interpretation, with the OLS solution again superimposed (dotted line). Now the obvious effect of e.g. the sharp downweighting of X-variables ≥ channel #30 has been removed.

    The three solutions are qualitatively similar: they have positive values below about channel #15, as expected from the spectral characteristic of the analyte red litmus, and negative values at higher wavelength channels in order to compensate for the possible presence of the interferants blue litmus and white ZnO. However, quantitatively, the three solutions are somewhat different. This shows that with different pre-processing methods the PLSR models needed to describe different Y-relevant patterns of variation in the data in order to attain the desired selectivity.

    4.6.1. Denition of the uncertainty covarianceSfromthe calibration data at handThe dotted curve in Figure 5 represented the results when

    interferant spectra L (Figure 1) were considered unknown,

    and instead estimated from the X- and Y-data of the 23samples in the actual calibration data set at hand. As before,

    leave-one-out cross-validation was employed, with re-

    estimation of the spectral interferant covariance S for each

    cross-validation segment. The figure shows that the pre-

    whitening based on the estimated spectral residual matrix D

    (Equation (7a)) with its dominant subspace L (Equations (7b)

    and (7c), using A = 2 PCs) gives almost as simple modelling

    as the one based on prior knowledge of the two interferants'

    individual spectra L=[l1, l2]: in both cases the number of

    PLSR components required is reduced, because the model

    does not have to span these major interferants. However, the

    prediction error is now slightly higher. A possible reason for

    this is that the former, knowledge-based pre-processing used the known spectra L as additional independent information

    in estimating S, while the latter, data-driven pre-processing

    had no such extra information available.
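    The data-driven route described above can be sketched in NumPy. Equations (7a)–(7c) are not reproduced in this section, so the steps below are an assumed reading only: the residual matrix D is taken as what remains of the centred spectra after regression on the centred y, the subspace L as the first A = 2 scaled principal components of D, and S as L'L plus a small white-noise floor. All data, names and the size of the noise floor are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(4)
N, K, A = 23, 48, 2                          # samples, channels, PCs (sizes as in the paper)
y = rng.uniform(size=(N, 1))                 # hypothetical analyte concentrations
k = np.abs(rng.normal(size=(1, K)))          # hypothetical analyte spectrum
L_true = rng.normal(size=(2, K))             # hidden interferant spectra (simulation only)
X = y @ k + rng.uniform(size=(N, 2)) @ L_true + 0.01 * rng.normal(size=(N, K))

Xc = X - X.mean(axis=0)
yc = y - y.mean(axis=0)

# Assumed form of the residual matrix D: part of X not linearly explained by y
D = Xc - yc @ np.linalg.lstsq(yc, Xc, rcond=None)[0]

# Assumed form of the subspace L: first A scaled principal components of D
U, s, Vt = np.linalg.svd(D, full_matrices=False)
L_hat = (s[:A, None] / np.sqrt(N - 1)) * Vt[:A]

# Nuisance covariance with an assumed white-noise floor, then pre-whitening
S_hat = L_hat.T @ L_hat + 1e-2 * np.eye(K)
G = inv_sqrt(S_hat)
X_white = Xc @ G

# Relative length change of interferant and analyte spectra under the scaling
r_interf = np.linalg.norm(L_true @ G) / np.linalg.norm(L_true)
r_analyte = np.linalg.norm(k @ G) / np.linalg.norm(k)
```

    In a cross-validated setting such as the one described above, the estimation of `S_hat` would be repeated inside each cross-validation segment, so that the left-out sample does not influence its own pre-processing.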

    Figure 6. Calibration based on all samples: regression coefficients estimated, rescaled and descaled. Top, OLS (A = 3 PCs); middle, WLS (A = 3 PCs); bottom, GLS (A = 1 PC). Left: coefficients $\hat{\mathbf{B}}_{A}$ obtained from scaled spectra X. Middle: rescaled coefficients $\hat{\mathbf{B}}_{A,\mathrm{ForInput}}$ (Equation (5i)), applicable directly to unscaled input spectra XInput. Right: descaled coefficients $\hat{\mathbf{B}}_{A,\mathrm{Descaled}}$ (Equation (5g)); weighting effects removed. Dotted curves: OLS estimate $\hat{\mathbf{B}}_{A=3}$ from (a), for comparison.

    Copyright © 2003 John Wiley & Sons, Ltd. J. Chemometrics 2003; 17: 153–165


    5. DISCUSSION

    Figure 4 demonstrated an ability of the covariance-weighted

    pre-processing to give good predictive ability even for new

    samples with interferants not present in the calibration set. This

    may become important in e.g. calibrating industrial process

    analysers, when it is difficult to perturb the actual process

    enough to get a sufficiently informative calibration sample

    set. By introducing prior knowledge about known inter-

    ferants' spectral signatures, the interferants can be compen-

    sated for already in a pre-processing filtering step, and thus

    do not have to vary in the calibration set.

    Figure 5 demonstrated that the covariance-weighted `GLS'

    pre-processing yielded calibration models with lower rank

    than those from the conventional `OLS' and `WLS' methods.

    High-dimensional models are generally cumbersome to

    interpret graphically, so that is an advantage. Moreover, as

    long as the uncertainty covariance S represents prior

    knowledge, a slight improvement in prediction ability may

    be expected, because the subsequent calibration then requires fewer statistical parameters to be estimated from

    the available N calibration data.

    5.1. Comparison with other methods

    The covariance-weighted pre-processing based on prior

    known spectra L has the advantage of reducing interference

    without consuming degrees of freedom from the available,

    often expensive Y-data. In that respect it resembles spectral

    interference subtraction (SIS) [12]. If, instead, S is estimated

    from the available data [X, Y] at hand, the pre-processing has

    some similarity to so-called orthogonal signal correction

    (OSC) [13] and direct orthogonalization (DO) [14]. Extended multiplicative signal correction (EMSC) [12,15] has similar

    properties to SIS and covariance-weighted pre-processing,

    but allows for removal of both additive and multiplicative

    effects.

    There is one major difference in how the covariance-

    weighted pre-processing and the set of OSC, DO, SIS and

    EMSC methods attempt to reduce the interference effects in

    XInput. The latter methods subtract the effects in one way or

    another. In contrast, the new covariance-weighted pre-

    processing is based on shrinking by division (i.e. multi-

    plication by the inverse of S; see Equations (2) and (3)). The

    full consequences of this distinction are not yet clear.

    However, it may be noted that DO [14] is particularly

    similar to the data-driven estimation of interferant subspace

    L (Equations (7a)–(7c); Figure 5, dotted line), even though it

    employs subtraction instead of inverted scaling to eliminate

    the effect of the interferants.
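    The distinction between subtraction and shrinking can be illustrated numerically. The sketch below assumes a single hypothetical interferant spectrum `l` of unit length and a nuisance covariance of the form S = I + c·ll', where the constant `c` (an assumption introduced here, not a quantity from the paper) sets how strongly the interferant direction is shrunk. Subtraction is represented by an orthogonal projection away from `l`.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(0)
K = 48                                    # number of channels
l = rng.normal(size=K)
l /= np.linalg.norm(l)                    # hypothetical unit-length interferant spectrum
x = rng.normal(size=K)                    # hypothetical input spectrum (row vector)

# Subtraction-type filtering: project the spectrum orthogonally to l
P = np.eye(K) - np.outer(l, l)
x_sub = x @ P

def shrink(x, c):
    """Covariance-weighted scaling with S = I + c * l l'."""
    S = np.eye(K) + c * np.outer(l, l)
    return x @ inv_sqrt(S)

x_mild = shrink(x, 10.0)                  # moderate shrinking
x_hard = shrink(x, 1e8)                   # very strong shrinking

comp_in = x @ l                           # interferant component before scaling
comp_mild = x_mild @ l                    # reduced by the factor 1/sqrt(1 + c)
```

    With this choice of S, the component along `l` is divided by sqrt(1 + c) while all orthogonal directions are left unchanged, so the shrinking result approaches the subtraction result as `c` grows. For moderate `c` some interferant information is retained, which is one way to view the difference between the two families of methods.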

    5.2. Pre-colouring the spectra

    Instead of just shrinking the X-space in particularly

    undesired or irrelevant directions, one may also reformulate

    the covariance-weighted pre-processing to expand the

    X-space in directions known to be particularly desired or

    relevant. For instance, after having contracted the X-space to

    filter out irrelevant or detrimental interferants, the X-space could then be expanded in the dimension of the analyte's

    spectrum (curve 3, Figure 1), to enhance this desired type of

    variation over e.g. random measurement noise in the

    subsequent multivariate subspace analysis. Preliminary

    Monte Carlo simulations (not shown here) indicate this to

    have some statistical advantage.

    The pre-processing has been used for pre-whitening

    spectral X-variables in this paper. However, it may equally

    well be applied to the set of Y-variables. Appendix I outlines

    various equivalent alternatives for integrating the interferant covariance matrix S into the actual estimators in PCA/PCR

    and PLSR, instead of using $\mathbf{S}^{-1/2}$ for pre-processing. When

    prior knowledge is available about the available objects, the

    pre-processing may also then be used, in a bilinear analogy

    to the conventional GLS estimator (Appendix II).

    It should be noted that after covariance-weighted pre-

    processing to remove all major interferants, the remaining

    spectra mainly show the net signal of the analyte plus

    random noise (see Figure 2(g)). Of course, if the spectrum of

    the analyte, K (Equation (1)), is a linear combination of the

    spectra L of the interferants, the covariance-weighted pre-

    processing will filter out the analyte effect too; the remaining

    net analyte signal is zero. Thus the usual requirement in

    quantitative analysis, that the analyte spectrum has to be

    linearly independent of the major interferant spectra,

    remains valid.
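    The linear-independence requirement can be checked with a small numerical sketch. It assumes two hypothetical interferant spectra (rows of `L`), a nuisance covariance S = I + c·L'L with a large assumed constant `c`, and compares an analyte spectrum that is linearly independent of `L` with one that is an exact linear combination of its rows.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(2)
K = 48
L = rng.normal(size=(2, K))               # two hypothetical interferant spectra (rows)
k_indep = rng.normal(size=K)              # analyte spectrum independent of L
k_dep = 0.7 * L[0] - 0.2 * L[1]           # analyte spectrum inside span(L)

c = 1e6                                   # assumed (large) interferant variance
S = np.eye(K) + c * L.T @ L
G = inv_sqrt(S)

norm_indep = np.linalg.norm(k_indep @ G)  # signal left after pre-whitening
norm_dep = np.linalg.norm(k_dep @ G)      # nearly nothing left
```

    The dependent spectrum is shrunk to almost zero length, whereas the independent one keeps the part of its signal that lies outside the interferant subspace.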

    6. CONCLUSIONS

    A method has been presented for covariance-weighted pre-

    processing of multivariate input data. It facilitates the use of

    prior knowledge about undesired (and desired) structures

    that are expected to vary in the input data. Its purpose is to

    reduce the complexity of the ensuing model and to improve

    its predictive ability. The method was illustrated for reducing the effect of spectral variations due to

    interferants with known spectra.

    In general, multivariate calibration by low-rank regres-

    sion, using e.g. PCR or PLSR, has proven highly effective for

    solving selectivity problems in complex systems. Many

    unidentified interference problems can even be dealt with, as

    long as they are spanned well in the calibration sample set

    and picked up clearly by the multichannel instrument.

    However, the present combination of prior knowledge

    and empirical calibration data may simplify calibration,

    because already known parameters do not have to be

    estimated statistically from the calibration data. The final statistical regression stage in the calibration process could

    then primarily be used for finding and correcting unknown

    or unexpected phenomena in the data. Thereby calibration of

    multichannel instruments may become less expensive and

    time-consuming, and easier to understand.

    APPENDIX I. EIGENVECTOR EXPRESSIONS FOR COVARIANCE-WEIGHTED PRE-PROCESSING

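    In PCA, each latent variable (PC) is an eigenvector of $\mathbf{X}\mathbf{X}'$ (after suitable mean centring). If the score vector for an individual PC, $\mathbf{t}$, is scaled to $\mathbf{t}'\mathbf{t} = 1$, this may be written as $\mathbf{t}\lambda = (\mathbf{X}\mathbf{X}')\mathbf{t}$. Inserting $\mathbf{X} = \mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$ (Equations (2) and (3)) into this eigenvalue expression yields the covariance-weighted expression $\mathbf{t}\lambda = (\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1}\mathbf{X}'_{\mathrm{Input}})\mathbf{t}$. Equivalently, $\mathbf{t}$ is then a left-hand singular vector of $\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$.

    Conversely, if the PCA loading vector $\mathbf{p}$ is scaled to $\mathbf{p}'\mathbf{p} = 1$, then $\mathbf{p}\lambda = (\mathbf{X}'\mathbf{X})\mathbf{p}$. Inserting $\mathbf{X} = \mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$ gives $\mathbf{p}\lambda = (\mathbf{S}^{-1/2}\mathbf{X}'_{\mathrm{Input}}\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2})\mathbf{p}$; $\mathbf{p}$ is then a right-hand singular vector of $\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$.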
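    The PCA equivalence can be verified numerically. The following is a minimal sketch on simulated data, assuming a hypothetical nuisance covariance S built from two random interferant spectra; it compares the singular vectors of the pre-whitened data with the eigenvectors of the weighted cross-product matrices.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(1)
N, K = 23, 48
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)               # mean centring
L = rng.normal(size=(2, K))                   # hypothetical interferant spectra
S = np.eye(K) + 5.0 * L.T @ L                 # hypothetical nuisance covariance
G = inv_sqrt(S)

# Route 1: pre-whiten, then ordinary SVD/PCA
X = X_input @ G
U, s, Vt = np.linalg.svd(X, full_matrices=False)
t_svd, p_svd = U[:, 0], Vt[0]

# Route 2: eigenvectors of the covariance-weighted cross-product matrices
M_t = X_input @ np.linalg.solve(S, X_input.T)  # X_Input S^-1 X_Input'
evals_t, evecs_t = np.linalg.eigh(M_t)
t_eig, lam = evecs_t[:, -1], evals_t[-1]

M_p = G @ X_input.T @ X_input @ G              # S^-1/2 X_Input' X_Input S^-1/2
evals_p, evecs_p = np.linalg.eigh(M_p)
p_eig = evecs_p[:, -1]
```

    The two routes agree up to the arbitrary sign of each vector, and the leading eigenvalue equals the squared leading singular value.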

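    In PLSR, each component is an eigenvector of the X–Y covariance structure [10]. For instance, with orthonormal scores, $\mathbf{t}$ is defined by $\mathbf{t}\lambda = (\mathbf{X}\mathbf{X}'\mathbf{Y}\mathbf{Y}')\mathbf{t}$ (after suitable deflation for previous components). With $\mathbf{X} = \mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$ this gives the expression $\mathbf{t}\lambda = (\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1}\mathbf{X}'_{\mathrm{Input}}\mathbf{Y}\mathbf{Y}')\mathbf{t}$. Conversely, the orthonormal loading weight $\mathbf{w}$ for each component, used for defining $\mathbf{t} = \mathbf{X}\mathbf{w}$ (after suitable deflation of $\mathbf{X}$ for previous components), is defined by $\mathbf{w}\lambda = (\mathbf{X}'\mathbf{Y}\mathbf{Y}'\mathbf{X})\mathbf{w}$. Covariance-weighted pre-processing is equivalent to defining $\mathbf{w}\lambda = (\mathbf{S}^{-1/2}\mathbf{X}'_{\mathrm{Input}}\mathbf{Y}\mathbf{Y}'\mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2})\mathbf{w}$, or $\mathbf{w}$ as the first left-hand singular vector of $\mathbf{S}^{-1/2}\mathbf{X}'_{\mathrm{Input}}\mathbf{Y}$.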
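    The corresponding check for the first PLSR loading weight is sketched below on simulated data. The covariance S and the two-column Y are hypothetical; only the first component is considered, so no deflation is needed.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(3)
N, K, J = 23, 48, 2
X_input = rng.normal(size=(N, K))
X_input -= X_input.mean(axis=0)
Y = rng.normal(size=(N, J))
Y -= Y.mean(axis=0)
L = rng.normal(size=(2, K))                    # hypothetical interferant spectra
S = np.eye(K) + 5.0 * L.T @ L                  # hypothetical nuisance covariance
G = inv_sqrt(S)

# w as first left-hand singular vector of S^-1/2 X_Input' Y
C = G @ X_input.T @ Y
U, s, Vt = np.linalg.svd(C, full_matrices=False)
w_svd = U[:, 0]

# w as dominant eigenvector of S^-1/2 X_Input' Y Y' X_Input S^-1/2
M = G @ X_input.T @ Y @ Y.T @ X_input @ G
evals, evecs = np.linalg.eigh(M)
w_eig, lam = evecs[:, -1], evals[-1]

# First score vector from the pre-whitened spectra, t = X w
t = (X_input @ G) @ w_svd
```

    Standard PLSR software applied to the pre-whitened spectra would compute the same `w` internally, which is the practical point of the pre-processing formulation.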

    Hence the PCA/PCR and PLSR solutions may be

    obtained either by covariance-weighted pre-processing

    $\mathbf{X} = \mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1/2}$ followed by standard OLS-based software

    for PCA/PCR or PLSR, or by eigenvector decomposition of

    cross-product matrices weighted by $\mathbf{S}^{-1}$. The latter is

    analogous to generalised least squares (GLS) regression.

    APPENDIX II. GLS AND COVARIANCE-WEIGHTED PRE-PROCESSING

    The relationship between generalized least squares (GLS)

    regression and covariance-weighted pre-processing will be

    demonstrated here. In weighted least squares (WLS) the

    regressor–regressor and regressor–regressand cross-product matrices are modified by the inverse error covariance matrix

    $\mathbf{S}^{-1}$. When $\mathbf{S}$ has off-diagonal elements, this approach is

    called `GLS' in some statistical literature [2]. The terms `WLS'

    and `GLS' are therefore employed here to distinguish purely

    variance-based weighting from covariance-based weighting.

    In some other statistical literature the WLS and GLS terms

    are used more interchangeably. More details are given in

    Reference [1].

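    II.1. Regression over objects

    In the conventional OLS case the input data for one or more regressands, $\mathbf{Y}_{\mathrm{Input}}$ ($N \times J$) $= [\mathbf{y}_{\mathrm{Input},j},\ j = 1,2,\ldots,J]$, are modelled by projection on one or more regressors, $\mathbf{X}_{\mathrm{Input}}$ ($N \times K$) $= [\mathbf{x}_{\mathrm{Input},k},\ k = 1,2,\ldots,K]$, over a set of $N$ objects, according to the linear model $\mathbf{Y}_{\mathrm{Input}} = \mathbf{X}_{\mathrm{Input}}\mathbf{B} + \mathbf{F}_{\mathrm{Input}}$ (ignoring the mean centring). To estimate the regression coefficients $\mathbf{B}$ ($K \times J$), the conventional estimator fits each regressand $\mathbf{y}_{\mathrm{Input}}$ ($N \times 1$) individually to $\mathbf{X}_{\mathrm{Input}}$ by minimizing $\mathbf{f}'_{\mathrm{Input}}\mathbf{f}_{\mathrm{Input}}$. This yields the conventional full-rank OLS estimator $\hat{\mathbf{B}} = (\mathbf{X}'_{\mathrm{Input}}\mathbf{X}_{\mathrm{Input}})^{-1}\mathbf{X}'_{\mathrm{Input}}\mathbf{Y}_{\mathrm{Input}}$.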

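    If the correlation pattern between the response errors in the $N$ objects, $\mathbf{S}_N$ ($N \times N$), is known, the GLS estimator $\hat{\mathbf{B}} = (\mathbf{X}'_{\mathrm{Input}}\mathbf{S}_N^{-1}\mathbf{X}_{\mathrm{Input}})^{-1}\mathbf{X}'_{\mathrm{Input}}\mathbf{S}_N^{-1}\mathbf{Y}_{\mathrm{Input}}$ yields better estimates, because it minimizes $\mathbf{f}'_{\mathrm{Input}}\mathbf{S}_N^{-1}\mathbf{f}_{\mathrm{Input}}$ for each regressand, i.e. the importance of the correlated error pattern is down-weighted.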

    Equivalently, the pre-whitening operators $\mathbf{X} = \mathbf{S}_N^{-1/2}\mathbf{X}_{\mathrm{Input}}$ and $\mathbf{Y} = \mathbf{S}_N^{-1/2}\mathbf{Y}_{\mathrm{Input}}$ allow the model to be rewritten as $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{F}$. The same GLS estimator may now be rewritten as $\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$, which shows that covariance-weighted pre-processing allows the GLS estimation of $\mathbf{B}$ to be performed by conventional OLS tools. This was here shown for full-rank OLS/GLS regression, but is equally applicable for regression methods that handle collinear X-variables, such as ridge regression and the bilinear methods PCR and PLSR.
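    A short numerical check of this equivalence is given below. It is a sketch on simulated data: the object covariance `S_N` is assumed to have a first-order autoregressive form purely for illustration, and the direct GLS estimator is compared with an ordinary least-squares fit to the pre-whitened data.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(5)
N, K, J = 30, 4, 2
X_input = rng.normal(size=(N, K))
B_true = rng.normal(size=(K, J))

# Hypothetical error covariance between objects: correlation 0.8^|i-j|
idx = np.arange(N)
S_N = 0.8 ** np.abs(idx[:, None] - idx[None, :])
F_input = 0.1 * np.linalg.cholesky(S_N) @ rng.normal(size=(N, J))
Y_input = X_input @ B_true + F_input

# Direct GLS estimator
S_inv = np.linalg.inv(S_N)
B_gls = np.linalg.solve(X_input.T @ S_inv @ X_input,
                        X_input.T @ S_inv @ Y_input)

# Pre-whitening followed by ordinary least squares
W = inv_sqrt(S_N)
X, Y = W @ X_input, W @ Y_input
B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

    The two coefficient matrices coincide to numerical precision, so any OLS-based routine can be reused once the rows of both data tables have been pre-whitened.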

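    II.2. Regression over X-variables

    The converse case is traditional direct multivariate calibration or multicomponent curve resolution according to Beer's law. Here each spectrum $\mathbf{x}_{\mathrm{Input}}$ ($1 \times K$) in the matrix $\mathbf{X}_{\mathrm{Input}} = [\mathbf{x}_{\mathrm{Input},k};\ k = 1,2,\ldots,K]$ is modelled by a set of $J$ known analyte spectra $\mathbf{K}_{\mathrm{Input}}$ ($K \times J$) in the linear regression model $\mathbf{X}_{\mathrm{Input}} = \mathbf{C}\mathbf{K}'_{\mathrm{Input}} + \mathbf{E}_{\mathrm{Input}}$, where $\mathbf{C}$ ($N \times J$) is the matrix of unknown analyte concentrations and $\mathbf{E}_{\mathrm{Input}}$ ($N \times K$) is the matrix of spectral residuals (ignoring baseline offsets). When the constituent spectrum matrix $\mathbf{K}_{\mathrm{Input}}$ has full column rank, the OLS estimator minimizes $\mathbf{e}_{\mathrm{Input}}\mathbf{e}'_{\mathrm{Input}}$ for each row in $\mathbf{X}_{\mathrm{Input}}$, yielding $\hat{\mathbf{C}} = \mathbf{X}_{\mathrm{Input}}\mathbf{K}_{\mathrm{Input}}(\mathbf{K}'_{\mathrm{Input}}\mathbf{K}_{\mathrm{Input}})^{-1}$.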

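    If the correlation pattern between the response errors in the $K$ X-variables, $\mathbf{S}$ ($K \times K$), is known, then the GLS estimator minimizes $\mathbf{e}_{\mathrm{Input}}\mathbf{S}^{-1}\mathbf{e}'_{\mathrm{Input}}$ and yields $\hat{\mathbf{C}} = \mathbf{X}_{\mathrm{Input}}\mathbf{S}^{-1}\mathbf{K}_{\mathrm{Input}}(\mathbf{K}'_{\mathrm{Input}}\mathbf{S}^{-1}\mathbf{K}_{\mathrm{Input}})^{-1}$.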

    The equivalent covariance-weighted pre-processing solution for curve resolution pre-whitens the spectra $[\mathbf{X}; \mathbf{K}'] = [\mathbf{X}_{\mathrm{Input}}; \mathbf{K}'_{\mathrm{Input}}]\mathbf{S}^{-1/2}$, thereby shrinking away the noise correlations between the X-variables. The model may then be written as $\mathbf{X} = \mathbf{C}\mathbf{K}' + \mathbf{E}$ and the GLS concentration estimate may be obtained by $\hat{\mathbf{C}} = \mathbf{X}\mathbf{K}(\mathbf{K}'\mathbf{K})^{-1}$, i.e. by an OLS expression.

    In summary, prior knowledge about the uncertainty covariances $\mathbf{S}$ may be used to improve linear regression. In Appendix I the same was shown for bilinear regressions. In both cases, one may either analyse the input data directly by GLS or GLS-like expressions, involving $\mathbf{S}^{-1}$, or perform covariance-weighted pre-processing of the input data by $\mathbf{S}^{-1/2}$, followed by OLS or OLS-like expressions, as illustrated in this paper.
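    The curve-resolution case can be checked in the same way. The sketch below uses simulated spectra with hypothetical analyte spectra `K_input`, hypothetical interferant spectra `L`, and a nuisance covariance S assumed to be the sum of L'L and a white-noise term; it compares the direct GLS concentration estimate with the OLS expression applied to pre-whitened spectra.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    evals, evecs = np.linalg.eigh(S)
    return (evecs / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(6)
N, K, J = 10, 48, 2
K_input = np.abs(rng.normal(size=(K, J)))      # hypothetical analyte spectra (columns)
C_true = rng.uniform(size=(N, J))              # simulated concentrations
L = rng.normal(size=(2, K))                    # hypothetical interferant spectra (rows)

# Spectral residuals: interferant contributions plus white noise
E_input = rng.normal(size=(N, 2)) @ L + 0.1 * rng.normal(size=(N, K))
X_input = C_true @ K_input.T + E_input
S = L.T @ L + 0.01 * np.eye(K)                 # assumed nuisance covariance

# Direct GLS estimate of the concentrations
S_inv = np.linalg.inv(S)
C_gls = X_input @ S_inv @ K_input @ np.linalg.inv(K_input.T @ S_inv @ K_input)

# Pre-whiten both the mixture spectra and the analyte spectra, then use OLS
G = inv_sqrt(S)
X = X_input @ G
K_white = G @ K_input                          # so that K_white' = K_input' S^-1/2
C_ols = X @ K_white @ np.linalg.inv(K_white.T @ K_white)
```

    Note that the known analyte spectra must be scaled by the same matrix as the measured spectra; scaling only one of the two would not reproduce the GLS solution.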

    REFERENCES

    1. Read BC. Weighted least squares. In Encyclopedia of Statistical Sciences, vol. 9, Kotz S, Johnson NL (eds). Wiley Interscience, J. Wiley & Sons Inc: New York, 1988; 576–578.

    2. Martens H, Naes T. Multivariate Calibration. Wiley: Chichester, 1989.

    3. Gower JC. Generalised canonical analysis. In Multiway Data Analysis, Coppi R, Bolasco S (eds). Elsevier: Amsterdam, 1989; 221–232.

    4. Bullmore E, Long C, Suckling J, Fadili J, Calvert G, Zelaya F, Carpenter A, Brammer M. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Human Brain Mapp. 2001; 12: 61–78.

    5. De Lathauwer L, de Moor B, Vandewalle J. An introduction to independent component analysis. J. Chemometrics 2000; 14: 123–149.

    6. Kuldvee R, Kaljurand M, Smit HC. Improvement of signal-to-noise ratio of electropherograms and analysis reproducibility with digital signal processing and multiple injections. J. High Resol. Chromatogr. 1998; 21: 169–174.

    7. Wentzell PD, Andrews DT, Kowalski BR. Maximum likelihood multivariate calibration. Anal. Chem. 1997; 69: 2299–2311.

    8. Wentzell PD, Lohnes MT. Maximum likelihood principal component analysis with correlated measurement errors: theoretical and practical considerations. Chemometrics Intell. Lab. Syst. 1999; 45: 65–85.

    9. Paatero P, Tapper U. Positive matrix factorisation: a non-negative factor model with optimal utilisation of error estimates of data values. Environmetrics 1994; 5: 111–126.

    10. Höskuldsson A. PLS regression methods. J. Chemometrics 1988; 2: 211–228.

    11. Martens H, Martens M. Multivariate Analysis of Quality. An Introduction. Wiley: Chichester, 2001.

    12. Martens H, Stark E. Extended multiplicative signal correction and spectral interference subtraction: new pre-processing methods for near infrared spectroscopy. J. Pharmaceut. Biomed. Anal. 1991; 9: 625–635.

    13. Wold S, Antti H, Lindgren F, Öhman J. Orthogonal signal correction of near-infrared spectra. Chemometrics Intell. Lab. Syst. 1998; 44: 175–185.

    14. Andersson CA. Direct orthogonalization. Chemometrics Intell. Lab. Syst. 1999; 47: 51–63.

    15. Martens H, Pram Nielsen J, Balling Engelsen S. Light scattering and light absorbance separated by extended multiplicative signal correction (EMSC). Application to NIT analysis of powder mixtures. Anal. Chem. 2003; 75: 394–404.
