Improving the performance of hollow waveguide-based infrared gas sensors via tailored chemometrics


RESEARCH PAPER

David Perez-Guaita & Andreas Wilk & Julia Kuligowski & Guillermo Quintás & Miguel de la Guardia & Boris Mizaikoff

Received: 10 May 2013 / Revised: 21 June 2013 / Accepted: 8 July 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract The use of chemometrics in order to improve the molecular selectivity of infrared (IR) spectra has been evaluated using classic least squares (CLS), partial least squares (PLS), science-based calibration (SBC), and multivariate curve resolution-alternate least squares (MCR-ALS) techniques for improving the discriminatory and quantitative performance of infrared hollow waveguide gas sensors. Spectra of mixtures of isobutylene, methane, carbon dioxide, butane, and cyclopropane were recorded, analyzed, and validated for optimizing the prediction of associated concentrations. PLS, CLS, and SBC provided equivalent results in the absence of interferences. After addition of the spectral characteristics of water by humidifying the sample mixtures, CLS and SBC results were similar to those obtained by PLS only if the water spectrum was included in the calibration model. In the presence of an unknown interferant, CLS revealed errors up to six times higher than those obtained by PLS. However, SBC provided similar results compared to PLS by adding a measured noise matrix to the model. Using MCR-ALS provided an excellent estimation of the spectra of the unknown interference. Furthermore, this method also provided a qualitative and quantitative estimation of the components of an unknown set of samples. In summary, using the most suitable chemometrics approach could improve the selectivity and quality of the calibration model derived for a sensor system, and may avoid the need to analyze expensive calibration data sets. The results obtained in the present study demonstrated that (1) if all sample components of the system are known, CLS provides a sufficiently accurate solution; (2) the selection between PLS and SBC methods depends on whether it is easier to measure a calibration data set or a noise matrix; and (3) MCR-ALS appears to be the most suitable method for detecting interferences within a sample. However, the latter approach requires the most extensive calculations and may thus result in limited temporal resolution, if the concentration of a component should be continuously monitored.

Keywords Hollow waveguide · Infrared sensor · Gas sensing · Chemometrics · PLS · SBC · CLS · MCR-ALS

D. Perez-Guaita · M. de la Guardia
Analytical Chemistry Department, University of Valencia, Edificio Jeroni Muñoz, 46100 Burjassot, Valencia, Spain

A. Wilk · B. Mizaikoff (*)
Institute of Analytical and Bioanalytical Chemistry, University of Ulm, 89081 Ulm, Germany
e-mail: [email protected]

J. Kuligowski
Division of Neonatology, University Hospital Materno-Infantil La Fe, Bulevar Sur, s/n, 46026 Valencia, Spain

G. Quintás
Leitat Technological Center, Bio In Vitro Division, Health Research Institute La Fe, 46026 Valencia, Spain

Anal Bioanal Chem
DOI 10.1007/s00216-013-7230-5

Introduction

Infrared gas sensing is a well-developed analytical technology. Since a wide variety of analytically relevant gases composed of two or more dissimilar atoms are active in the infrared wavelength regime, infrared (IR) spectroscopy may be used for studying gas compositions within numerous applications including process analysis and control [1], atmospheric analysis [2], and clinical research [3].

In the last decades, IR-based gas sensors have been extensively improved with the implementation of quantum cascade lasers [4] and the introduction of hollow waveguides, simultaneously serving as waveguide and as miniaturized gas cell [5]. Their main advantages include high (laser) power thresholds, low insertion losses, robustness, and small beam divergence [6]. Some examples of their usage include the determination of volatile organics in field environments [7], the determination of 12CO2 and 13CO2 ratios in exhaled mouse breath [8, 9], the quantification of carbon monoxide in sidestream cigarette smoke [10], and use as a detector in gas chromatography [11]. However, since any organic molecule absorbs in the mid-infrared region, mixtures of gases frequently reveal highly convoluted and/or overlapping absorption features rendering the direct quantification of selected target analytes difficult [12, 13]. Consequently, chemometric algorithms are frequently applied for addressing this issue.

While some studies report on the application of principal component regression (PCR) or classical least squares (CLS) techniques [2, 14], the most commonly applied method for quantifying gas mixtures is based on partial least squares (PLS) regression [1, 7, 15–17]. PLS is among the most frequently used calibration and data evaluation techniques in vibrational spectroscopy for predicting analyte concentrations in complex matrices [18]. However, PLS regression requires a robust calibration data set based on a collection of well-defined and representative samples including linearly independent variations of concentrations of the analytes in the presence of any potential interferant. Considering the substantial efforts required for establishing reliable calibrations in PLS, alternative multivariate calibration methods including science-based calibration (SBC) or multivariate curve resolution-alternate least squares (MCR-ALS) are increasingly adopted.

The aim of the present study was to compare CLS, PLS, SBC, and MCR-ALS for the prediction of individual concentrations in a mixture of gases analyzed via mid-infrared sensors based on hollow waveguides in order to take maximum advantage of the inherent information content of broadband IR gas sensors. Isobutylene, methane, CO2, butane, and cyclopropane were selected as exemplary analytes using a series of samples containing mixtures of these components. The selected methods were validated with the same set of samples in different measurement scenarios, i.e., (1) a direct comparison without interferences, (2) in the presence of a known interference, (3) in the presence of an unknown interference, and (4) for entirely unknown samples. Furthermore, it was investigated whether guidelines may be established for rationally supporting the selection of the most appropriate algorithm for individual measurement scenarios and sensor technologies.

Theoretical background

A summarized description of the chemometric methods used in the present study is provided; for more detailed information, continuative references have been included. Bold uppercase letters represent matrices (X), bold italic lowercase characters represent vectors (v), and italic uppercase letters represent scalars (N). Both simulated and experimental data sets have been converted into matrices X (N×J), where rows (N) and columns (J) correspond to samples and variables, respectively.

Partial least squares regression

In PLS, regressions are computed via least squares algorithms. PLS aims at establishing a linear relationship between the predictive X block of spectra and the predicted y vector of analyte concentrations [19] by extracting a set of latent variables (LVs), which explain the sources of variation in the X block correlated to the y vector [20]. This can be explained by the representation of the spectra in the space of wavenumbers in order to reveal directions that are linear combinations of wavenumbers (i.e., LVs), which best describe the studied property.

In case of the prediction of a concentration vector of N samples through their spectra of J wavenumbers, the relationship between the spectra matrix X (N×J) and the concentration vector y (N×1) may be described as:

$$\mathbf{y} = \mathbf{X}\mathbf{b}^{T} + \mathbf{e} \qquad (1)$$

where b (1×J) is the vector of regression coefficients calculated by the LVs and e (N×1) is the error vector (i.e., residuals). Since latent variable selection is performed according to the covariance matrix between the data and the investigated parameter, PLS may be described as a powerful predictive version of PCR. However, the main drawback of this technique is the necessity for a robust calibration data set with an X matrix composed of representative spectra of the samples to be analyzed, and their potentially time- and cost-intensive laboratory reference values.
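As a rough illustration of Eq. 1, the following NumPy sketch fits a single-analyte PLS model (PLS1) with the NIPALS algorithm on simulated, noise-free spectra. It is not the PLS Toolbox routine used in this study; the function names `pls1_fit` and `pls1_predict`, the Gaussian band shapes, and all numerical values are assumptions made for the example.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit PLS1 by NIPALS; returns the regression vector b and the centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean           # mean centering of spectra and concentrations
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk                         # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                            # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt                     # X loadings
        qk = yk @ t / tt                      # y loading
        Xk = Xk - np.outer(t, p)              # deflate X
        yk = yk - qk * t                      # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)       # regression vector of Eq. 1 (length J)
    return b, x_mean, y_mean

def pls1_predict(X, b, x_mean, y_mean):
    return y_mean + (X - x_mean) @ b

# Simulated three-component mixtures (arbitrary concentration units, no noise).
rng = np.random.default_rng(0)
axis = np.arange(60)
bands = np.stack([np.exp(-0.5 * ((axis - c) / 4.0) ** 2) for c in (20, 28, 40)])  # 3 x J
C_cal = rng.uniform(0.2, 2.2, size=(25, 3))   # calibration concentrations
C_val = rng.uniform(0.4, 2.0, size=(15, 3))   # validation concentrations
X_cal, X_val = C_cal @ bands, C_val @ bands

b, x_mean, y_mean = pls1_fit(X_cal, C_cal[:, 0], n_lv=3)
y_pred = pls1_predict(X_val, b, x_mean, y_mean)
rmsep = np.sqrt(np.mean((y_pred - C_val[:, 0]) ** 2))
```

With noise-free data and as many LVs as varying components, the first analyte is predicted essentially exactly; with real spectra, the number of LVs would have to be chosen by cross-validation.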

Classic least squares regression

CLS modeling is based on the bilinear relationship of the spectra of N samples with the concentration of P components to be determined in each sample, and the reference spectra of the components measured at J wavenumbers [21, 22]:

$$\mathbf{X} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E} \qquad (2)$$

where X (N×J) is the matrix containing the spectra, C (N×P) contains the vectors of concentrations, S (J×P, one reference spectrum per column, so that S^T is P×J) is the matrix containing the reference spectra, and E is the matrix representing the residual noise.

The sample concentrations are calculated by multiplying their spectra with the transpose of a regression vector in the same way as described for Eq. 1. However, in CLS, the matrix B (P×J) of regression vectors is calculated using Eq. 3:

$$\mathbf{B}^{T} = \mathbf{S}\left(\mathbf{S}^{T}\mathbf{S}\right)^{-1} \qquad (3)$$
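Eqs. 2 and 3 can be sketched in a few lines of NumPy. This is a minimal example on simulated, noise-free data; the Gaussian band shapes, the variable names, and the concentration range are assumptions, and the reference spectra are stored as columns of `S` (J×P), the orientation for which Eqs. 2 and 3 are dimensionally consistent.

```python
import numpy as np

axis = np.arange(60)
# Reference spectra as columns of S (J x P); Gaussian bands are an assumption.
S = np.stack([np.exp(-0.5 * ((axis - c) / 4.0) ** 2) for c in (15, 30, 45)], axis=1)

rng = np.random.default_rng(1)
C_true = rng.uniform(0.4, 2.0, size=(15, 3))   # N x P concentrations (arbitrary units)
X = C_true @ S.T                                # Eq. 2 with E = 0

B_T = S @ np.linalg.inv(S.T @ S)                # Eq. 3, J x P
C_hat = X @ B_T                                 # predicted concentrations, N x P
max_error = np.abs(C_hat - C_true).max()
```

Because every component present in `X` is also present in `S`, the concentrations are recovered to within rounding error.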

The main advantage of CLS is its simplicity and ease of calibration, which only requires access to reference spectra of the components at their reference concentrations. As CLS refers to a “closed system” limited to n components, the main drawback is its lack of robustness, and its sensitivity to any interferences whose spectra are not included in S.
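The sensitivity of CLS to spectra missing from S follows directly from Eqs. 2 and 3. As a short worked derivation, assume a single noise-free sample spectrum x^T (1×J) that contains the P modeled components with concentrations c^T (1×P) plus one unmodeled interferant with spectrum k (J×1) and concentration c_k. The symbols c, k, and c_k are introduced here for illustration only, and S is taken with one reference spectrum per column (J×P).

```latex
\hat{\mathbf{c}}^{T}
  = \mathbf{x}^{T}\mathbf{B}^{T}
  = \left(\mathbf{c}^{T}\mathbf{S}^{T} + c_k\,\mathbf{k}^{T}\right)
    \mathbf{S}\left(\mathbf{S}^{T}\mathbf{S}\right)^{-1}
  = \mathbf{c}^{T}
    + c_k\,\mathbf{k}^{T}\mathbf{S}\left(\mathbf{S}^{T}\mathbf{S}\right)^{-1}
```

The second term is the prediction error. Under these assumptions it grows linearly with c_k and vanishes only if k is orthogonal to every reference spectrum in S, i.e., if the interferant does not overlap with the modeled bands.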

Science-based calibration

SBC is a relatively new multivariate method that, according to Marbach [23], “combines the best features of classical calibration and inverse calibration.” For this purpose, SBC assumes that a measured spectrum may be described as:

$$\mathbf{x}^{T} = Y\,\mathbf{g}^{T} + \mathbf{x}_{n}^{T} \qquad (4)$$

where the $\mathbf{x}^{T}$ vector is the measured spectrum, Y is the concentration of the analyte, $\mathbf{g}^{T}$ is the reference spectrum of the analyte of interest, and $\mathbf{x}_{n}^{T}$ is every feature in the experimentally obtained spectrum that is not resulting from an analyte, i.e., including instrumental noise and spectra of interferants.

For estimating $\mathbf{x}_{n}^{T}$, a so-called noise matrix $\tilde{\mathbf{X}}$ (M×J) of M spectra and J wavenumbers is experimentally recorded, where the differences between the spectra are only due to variations in the concentration of constituents other than the target analytes and variations of the instrumental noise [24]. Once this experimentally obtained noise matrix is mean centered, it represents only “spectral noise”:

$$\tilde{\mathbf{X}} = \tilde{\mathbf{X}}_{n} \qquad (5)$$

In a next step, the covariance of the spectral noise may be estimated:

$$\mathbf{\Sigma} \cong \frac{\tilde{\mathbf{X}}^{T}\tilde{\mathbf{X}}}{M-1} \qquad (6)$$

The optimum regression vector b (1×J), i.e., the vector providing the minimum squared prediction error, is calculated as:

$$\mathbf{b}_{opt} = \frac{\mathbf{\Sigma}^{-}\mathbf{g}}{\mathbf{g}^{T}\mathbf{\Sigma}^{-}\mathbf{g}} \qquad (7)$$
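The chain from noise matrix to regression vector (Eqs. 5 to 7) may be sketched as follows. This is not the SBC code provided by Marbach; the function name `sbc_regression_vector`, the simulated band shapes, and in particular the small `ridge` term are assumptions. The ridge term acts as an assumed white-noise floor that keeps the covariance estimate invertible when fewer noise spectra than wavenumbers are available, so that an ordinary inverse can stand in for the inverse written in Eq. 7.

```python
import numpy as np

def sbc_regression_vector(g, noise_spectra, ridge=1e-6):
    """b_opt of Eq. 7 from a reference spectrum g (J,) and a noise matrix (M x J)."""
    M = noise_spectra.shape[0]
    Xn = noise_spectra - noise_spectra.mean(axis=0)    # Eq. 5: mean-centered noise
    sigma = Xn.T @ Xn / (M - 1)                        # Eq. 6: noise covariance
    # Assumption: a small white-noise floor keeps sigma invertible when M < J.
    sigma += ridge * np.eye(sigma.shape[0])
    sg = np.linalg.solve(sigma, g)                     # inverse of sigma applied to g
    return sg / (g @ sg)                               # Eq. 7

axis = np.arange(60)
g = np.exp(-0.5 * ((axis - 25) / 4.0) ** 2)            # analyte reference (assumed shape)
k = np.exp(-0.5 * ((axis - 30) / 4.0) ** 2)            # overlapping interferant (assumed)

rng = np.random.default_rng(2)
# Noise matrix: only the interferant varies, the analyte is absent.
noise_matrix = rng.uniform(0.0, 1.0, size=(20, 1)) * k
b_opt = sbc_regression_vector(g, noise_matrix)

x = 3.0 * g + 0.7 * k                                  # test spectrum, analyte level 3
y_hat = x @ b_opt
```

By construction the regression vector has unit response to the analyte spectrum, while the variation captured in the noise matrix is strongly suppressed, so `y_hat` stays close to 3 despite the overlapping interferant.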

According to Marbach [23], by statistically estimating the spectral noise after the spectral signal has been obtained via experimental analysis, the prediction accuracy of the inverse model may be combined with the simple and easily interpreted classical least squares regression model. Although the method is only applicable whenever the response spectrum of the analyte of interest is available, SBC can eliminate “cost of calibration” as a roadblock to the more widespread use of multivariate calibration.

Multivariate curve resolution-alternate least squares regression

MCR-ALS has matured to a commonly applied chemometric method used for resolving multiple component responses of unknown and unresolved mixtures [25]. MCR-ALS methods are mathematically based on the same bilinear model represented in Eq. (2). However, MCR-ALS iteratively solves this equation by an alternating least squares algorithm, thereby calculating the reference spectra matrix S and concentration matrix C for optimum fit to the experimental data X. This optimization process progresses via a suggested number of components n, and an initial estimation of either the reference spectra or concentration matrix. The iteration is finished once the relative differences in standard deviations of the residuals between values predicted by ALS and the experimental data are less than a deliberately selected cutoff value.

In contrast to CLS, MCR-ALS modeling can be highly complex and several variables must be selected. For the calculation of the initial estimation of the reference spectra, several algorithms such as simple-to-use interactive self-modeling mixture analysis (SIMPLISMA) or evolving factor analysis are available. Furthermore, constraints may be introduced including, e.g., non-negativity, unimodality, closure, or other relevant modeling constraints.
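The alternating scheme described above can be sketched as a short loop. This is a simplified illustration rather than the MCR-ALS toolbox: non-negativity is imposed by simple clipping instead of a dedicated non-negative least squares solver, the function name `mcr_als` and its parameters are hypothetical, and the initial estimate (boxcar windows where each component is expected to absorb) merely stands in for SIMPLISMA or evolving factor analysis.

```python
import numpy as np

def mcr_als(X, S_init, max_iter=50, tol=1e-8):
    """Alternating least squares for X ~ C S^T with non-negativity by clipping."""
    S = S_init.copy()
    prev = np.inf
    for iteration in range(1, max_iter + 1):
        # Concentrations for fixed spectra, then spectra for fixed concentrations.
        C = np.clip(np.linalg.lstsq(S, X.T, rcond=None)[0].T, 0.0, None)
        S = np.clip(np.linalg.lstsq(C, X, rcond=None)[0].T, 0.0, None)
        resid = np.std(X - C @ S.T)
        # Stop once the residual standard deviation no longer changes appreciably.
        if abs(prev - resid) <= tol * max(resid, np.finfo(float).tiny):
            break
        prev = resid
    return C, S, iteration

axis = np.arange(50)
S_true = np.stack([np.exp(-0.5 * ((axis - c) / 3.0) ** 2) for c in (15, 35)], axis=1)  # J x 2
rng = np.random.default_rng(3)
C_true = rng.uniform(0.1, 1.0, size=(20, 2))
X = C_true @ S_true.T                        # simulated mixtures, no noise

# Assumed initial estimate: boxcar windows around the expected band positions.
S_init = np.zeros_like(S_true)
S_init[10:21, 0] = 1.0
S_init[30:41, 1] = 1.0

C_est, S_est, n_iter = mcr_als(X, S_init)
lack_of_fit = np.linalg.norm(X - C_est @ S_est.T) / np.linalg.norm(X)
```

The resolved profiles are only defined up to a scaling factor between `C_est` and `S_est`, which is why additional information such as a spectrum of known concentration is needed to obtain absolute concentrations.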

Since in the first two scenarios of the present study (i.e., (1) direct comparison and (2) presence of a known interference) all reference spectra were available, performing MCR-ALS models would be equivalent to establishing CLS models. Therefore, MCR-ALS evaluation was omitted in these cases.

Materials and methods

Preparation of gas mixtures and IR analysis

A gas mixing system developed by B. Mizaikoff (Institute of Analytical and Bioanalytical Chemistry, Univ. Ulm) and C. Carter (Lawrence Livermore National Laboratory, Livermore/CA, USA) was used for providing gas standards. Ten thousand parts per million (1 %) standards of cyclopropane, isobutylene, methane, butane, and CO2 were mixed and diluted with nitrogen as IR-transparent background gas. All gas standards were provided by MTI Industriegase AG (Neu-Ulm, Germany). The samples provided by the gas mixing system were directly inserted into a hollow waveguide (HWG) with an optical path length of 3 cm [8], which simultaneously served as a waveguide and as a miniaturized absorption gas cell. The HWG sensing module was coupled to an FTIR spectrometer (IRcube, Bruker Optics Inc., Billerica/MA, USA) and a mercury–cadmium–telluride detector (Infrared Associates, Stuart/FL, USA), cooled with liquid nitrogen. Spectra were recorded in the range between 4,000 and 600 cm−1 with a spectral resolution of 2 cm−1, averaging 200 scans per spectrum. A spectrum of the HWG module flushed with nitrogen was used as background spectrum.

Software

Data analysis was performed using MATLAB 7.7.0 (MathWorks, Natick, USA) with in-house programmed MATLAB functions based on the PLS Toolbox 6.2 (Eigenvector Research Inc., Wenatchee/WA, USA), the MCR-ALS toolbox provided by Jaumot et al. available at http://www.ub.edu/mcr/web_mcr/mcrals.html, and the SBC algorithm provided by Marbach [23].

Direct comparison

CLS, SBC, and PLS were compared for the analysis of mixtures of gases. For this purpose, a calibration and a validation sample data set were measured according to Brereton [26]. Isobutylene, methane, cyclopropane, butane, and carbon dioxide were mixed to obtain a calibration data set with 25 samples at concentration levels of 200, 600, 1,000, 1,400, 1,800, and 2,200 ppm, and a validation sample set of 15 samples at concentration levels of 400, 800, 1,200, 1,600, and 2,000 ppm; the corresponding spectra of both data sets were experimentally obtained. Fifteen blanks of nitrogen and a reference spectrum of each analyte at 2,000 ppm were also recorded. In each case, six spectra of a mixture of the five analytes with a concentration of 1,000 ppm were used to evaluate the repeatability of the chemometric methods studied herein. For evaluating the accuracy, a spectra matrix (XVAL) was established using the 15 validation sample spectra (see Fig. 1, bottom) and 12 blanks. Table 1 summarizes the matrices and vectors used for modeling and validation in the “Direct comparison” to “Studies on entirely unknown samples” sections.

Preliminary studies evidenced that the selected algorithms are highly sensitive to the spectral regions selected for data evaluation and to the data preprocessing method applied prior to modeling. In order to provide a fair comparison among the methods and analytes, only spectral regions with relevant contributions of the target analytes were used, i.e., the spectral regions 3,125–2,825 and 925–850 cm−1 for isobutylene, 3,150–2,875 and 1,375–1,225 cm−1 for methane, 3,150–2,975 and 1,100–800 cm−1 for cyclopropane, 3,050–2,800 and 1,525–1,350 cm−1 for butane (see black lines in Fig. 2), and 2,400–2,250 cm−1 for carbon dioxide. These spectral regions were used for all predictions throughout this study except in the case of suspected unknown interferants (see “Studies on entirely unknown samples” section). Since in this case no information on the sample components is available, the entire spectrum was used for modeling. Hence, models were sequentially created for each analyte using the specified spectral regions. For PLS, first derivatives and mean centering were used for data preprocessing of the spectra matrix, whereas baseline correction using the baseline function available in MATLAB and mean centering was used prior to CLS, SBC, and MCR modeling. The concentration values were mean centered as well.
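The region selection and preprocessing steps may be sketched as follows, here for the methane windows quoted above. The 1 cm−1 wavenumber grid, the helper name `region_mask`, and the random placeholder spectra are assumptions; the derivative is a simple finite difference and does not reproduce the MATLAB routines used in this study.

```python
import numpy as np

wavenumbers = np.arange(4000, 599, -1, dtype=float)     # cm^-1; 1 cm^-1 grid is assumed

def region_mask(wn, regions):
    """True where wn falls inside any (high, low) window given in cm^-1."""
    mask = np.zeros(wn.shape, dtype=bool)
    for high, low in regions:
        mask |= (wn <= high) & (wn >= low)
    return mask

methane_regions = [(3150, 2875), (1375, 1225)]           # windows quoted in the text
mask = region_mask(wavenumbers, methane_regions)

rng = np.random.default_rng(4)
X = rng.normal(size=(5, wavenumbers.size))               # placeholder spectra, not real data

# CLS/SBC/MCR branch: select the windows, then mean center.
X_sel = X[:, mask]
X_centered = X_sel - X_sel.mean(axis=0)

# PLS branch: first derivative on the full spectrum (so that no difference is taken
# across the gap between windows), then select the windows and mean center.
X_deriv = np.gradient(X, axis=1)[:, mask]
X_deriv_centered = X_deriv - X_deriv.mean(axis=0)
```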

PLS modeling was performed using an XCAL spectra matrix composed of the 25 calibration spectra and 3 blanks. Prior to the validation with XVAL, the number of LVs was selected using leave-one-out cross-validation.

CLS was developed using a matrix XREF composed of the five reference spectra of the analytes, and then validated with XVAL. These spectra were measured averaging 1,000 scans (compared to 200 for all other measurements) in order to provide a similar signal-to-noise ratio for the three methods (i.e., 5 spectra × 1,000 scans in comparison to 25 spectra × 200 scans).

In case of SBC, the XREF matrix was used again. The spectrum of the analyte under study was used as a reference, and those of the other constituents were combined for creating the artificial noise matrix. Once the optimum regression vector was determined, the model was validated with XVAL.

Studies in the presence of known interferences

Here, the three aforementioned methods (PLS, CLS, SBC) were evaluated in the presence of a known interferant. This scenario is a common situation in gas sensing and can be exemplified by the errors associated with the presence of variable concentrations of water, which is a particularly strong and omnipresent interferant in IR vapor phase sensing. For this purpose, air saturated with water was introduced into the HWG and a spectrum was recorded at 20 °C averaging 1,000 scans. In order to avoid the inclusion of additional noise, only the spectral regions with significant water contributions (1,250–2,210 and 3,400–4,000 cm−1) were selected, whereas absorbances at other wavenumbers were artificially set to 0. This “water vector”, called w, was added to each row of XVAL, multiplying w by a random number between 0 and 1 prior to each addition. Therefore, an XVAL_W matrix of 15 validation samples with variable concentrations of water ranging from 0 to w was obtained (see Fig. 1, top). As the introduction of the random number induces certain variability, the XVAL_W construction and evaluation was repeated 15 times. The obtained XVAL_W matrices were then used to evaluate the performance of PLS, CLS, and SBC, as already discussed for the case of direct comparison (see “Direct comparison” section). For studying the impact of different concentrations of water on those developed models, the aforementioned experiment was also performed using the experimentally obtained water spectra multiplied by 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, and 0. The spectral region of CO2 (2,400–2,250 cm−1) was excluded from this evaluation, as water does not show any absorption features therein.

Fig. 1 Spectra corresponding to the validation dataset before (bottom) and after (top) artificial water addition. Note: spectra were shifted along the y-axis
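The construction of XVAL_W described above may be sketched as follows. The arrays are random placeholders rather than measured spectra, and the 1 cm−1 wavenumber grid and all variable names are assumptions; only the water regions and the random scaling between 0 and 1 are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
wavenumbers = np.arange(4000, 599, -1, dtype=float)            # assumed 1 cm^-1 grid
n_val = 15

X_val = rng.uniform(0.0, 0.1, size=(n_val, wavenumbers.size))  # placeholder for XVAL
water = rng.uniform(0.0, 1.0, size=wavenumbers.size)           # placeholder water spectrum

# Keep the water spectrum only in the quoted regions and set it to zero elsewhere.
in_water = (((wavenumbers >= 1250) & (wavenumbers <= 2210))
            | ((wavenumbers >= 3400) & (wavenumbers <= 4000)))
w = np.where(in_water, water, 0.0)

scale = rng.uniform(0.0, 1.0, size=(n_val, 1))                 # one random factor per sample
X_val_w = X_val + scale * w                                    # XVAL_W
```

Repeating the last two lines with fresh random factors corresponds to the repeated construction of XVAL_W; multiplying `w` by a fixed factor such as 0.5 before the addition corresponds to the reduced water levels.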

In order to study the potential approaches to compensate for a known interference (here, water) with the chemometric methods considered herein, the validation was repeated to correct for water interferences. For PLS modeling, it was necessary to build a calibration set, which included all possible interferences present in the validation set. Hence, a matrix XCAL_W was constructed by adding the spectra of water to XCAL in the same way as for XVAL_W. For the correction of water interference within the other methods, a reference spectrum of breath (wref) was experimentally obtained averaging 500 scans. In case of CLS, this spectrum was added as a component to the calibration matrix XREF, while for SBC this spectrum was used for creating the artificial noise matrix.

Studies in the presence of unknown interferences

The goal of this evaluation was to test the accuracy of PLS, CLS, SBC, and MCR-ALS in the presence of an unknown interference. As the spectral features of cyclopropane overlap with butane, isobutylene, and methane, cyclopropane was selected as quasi-unknown interference to be corrected for by the algorithms. Since CO2 does not show any significant spectral overlap with cyclopropane, it was excluded from this study. XVAL was used for validating the models in the same way described for the direct comparison (compare “Direct comparison” section); however, this time cyclopropane reference spectra were excluded from the XREF matrix of the CLS calibration and from the artificial noise matrix development for SBC.

Table 1 Summary describing the matrices employed for the modeling and validation of “Direct comparison” to “Studies on entirely unknown samples” sections

| | Direct comparison | Known interference (water), uncorrected | Known interference (water), corrected | Unknown interference (cyclopropane), uncorrected | Unknown interference (cyclopropane), corrected | Entirely unknown samples |
|---|---|---|---|---|---|---|
| Section | 3.3 | 3.4 | 3.4 | 3.5 | 3.5 | 3.6 |
| Analytes | Iso, Met, Cyc, But, CO2 | Met, But^a | Met, But^a | Iso, Met, But^b | Iso, Met, But^b | Iso, Met, Cyc, But |
| Validation set | XVAL | XVAL_W | XVAL_W | XVAL | XVAL | |
| Calibration set, PLS | XCAL | XCAL | XCAL_W | –^c | –^c | – |
| Calibration set, CLS | XREF | XREF | [XREF; wref] | XREF (without Cyc reference spectrum) | –^d | – |
| Calibration set, SBC^e | Artificial NM (based on (XREF-[a])^f) + XREF(a,:)^g | Artificial NM (based on (XREF-[a])^f) + XREF(a,:)^g | Artificial NM (based on (XREF-[a])^f and wref) + XREF(a,:)^g | Artificial NM (based on (XREF-[a])^f without Cyc reference spectrum) + XREF(a,:)^g | NM including 8 spectra from XCAL with concentration values of 1,400 and 1,800 ppm + XREF(a,:)^g | |
| Calibration set, MCR | –^h | –^h | –^h | [XVAL; XREF(a,:)] | [XVAL; XREF(a,:)] | [XCAL; XVAL] |

Iso isobutylene, Met methane, Cyc cyclopropane, But butane, CO2 carbon dioxide (for more information, see text)
a Only the analytes whose considered regions overlapped with the water spectrum
b Only the analytes whose considered regions overlapped with the cyclopropane spectrum
c It was not computed because in the “Direct comparison” section the PLS was already performed in presence of an unknown interference, since the cyclopropane reference spectrum was not used
d Unable to correct
e Noise matrix (NM) + reference spectrum
f For each analyte under study, XREF-[a] is XREF without the reference spectrum of the analyte
g Reference spectrum of the analyte as a row in XREF
h It was not computed since it was equivalent to CLS

Then, it was tested whether the models appropriately correct for the effects of cyclopropane in SBC. Since not all reference spectra of the individual components within the mixture were available (the interferant is assumed to be unknown), a noise matrix was created for each target analyte (isobutylene, methane, or butane) from a collection of sample spectra with similar concentration of that analyte and variable concentration of the interferences (i.e., the other components of the mixture, including cyclopropane). For this purpose, samples from the XCAL matrix utilized for the PLS calibration were employed. Using the above-mentioned matrix and the reference spectra of each single compound obtained from XREF, SBC models were established and validated with spectra of XVAL.

Although CLS was unable to treat an unknown interference, it was still applied in MCR for the decomposition of spectra, i.e., for determining the usefulness of MCR-ALS for correction of an unknown interference. XVAL was used as X matrix along with a reference spectrum of the analyte under study obtained from XREF. As an initial estimation of the components of the mixture, the reference spectra of the target analyte and the known interferences were used, which were fixed within the iteration; a noise spectrum was created using the rand function of MATLAB as initial estimation of the unknown interference (i.e., cyclopropane) spectrum. Different constraints regarding the non-negativity of the spectrum and of the concentrations were studied. The concentration of the reference spectrum of the analyte in the X matrix was also fixed at the value of 1. Therefore, the concentrations of samples in XVAL were obtained by multiplying the vector of arbitrary concentrations by the real concentration of the reference spectrum (i.e., 2,000 ppm).

Fig. 2 Comparison between the spectra measured (solid blue) and calculated by MCR-ALS (dotted red) for the four analytes under study in “Studies on entirely unknown samples” section (a) and the fifth calculated spectrum, which corresponds to the presence of water vapor in the measurement zone (b). Black bars correspond to the intervals used for modeling each analyte
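The conversion from arbitrary MCR-ALS concentrations to ppm described above amounts to a single rescaling. In the following sketch, the relative concentrations are hypothetical numbers chosen for illustration and are not results of this study; only the reference concentration of 2,000 ppm is taken from the text.

```python
import numpy as np

REFERENCE_PPM = 2000.0                      # concentration of the reference spectrum

# Hypothetical MCR-ALS output for one analyte: relative concentrations for three
# validation spectra followed by the appended reference spectrum (last entry).
c_relative = np.array([0.2, 0.6, 1.0, 1.0])

c_scaled = c_relative / c_relative[-1]      # enforce "reference spectrum = 1"
c_ppm = c_scaled[:-1] * REFERENCE_PPM       # concentrations of the validation spectra
```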

Studies on entirely unknown samples

In the cases discussed above, a measurement situation was simulated where some information on the composition of the gas sample was available. In the case where the IR sensor obtains a series of spectra without any signal of an a priori known constituent, only the MCR-ALS method was tested for the ability to extract information of interest from unknown spectra. In the X matrix, XCAL and XVAL were introduced as test samples. Since CO2 does not show any spectral overlap with any of the other analytes, it was excluded from these considerations. The spectral regions considered here were 3,276–2,553 and 1,830–744 cm−1. The number of constituents in the samples was calculated using an algorithm based on a singular value decomposition (SVD) available within the MCR-ALS toolbox [http://www.ub.edu/mcr/web_mcr/mcrals.html]. In short, after the SVD analysis was performed, the explained variance for each singular value was plotted against the number of components, and the minimum number of components that explained the maximum amount of variance was selected. Then, a first estimation of the spectra was established using the SIMPLISMA algorithm being introduced as a y matrix; no constraints were applied.
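The SVD-based estimate of the number of constituents may be sketched as follows. The toolbox routine is not reproduced here; the function name `n_components_by_svd`, the fixed threshold of 99.9 % cumulative explained variance (used in place of a visual inspection of the plot), and the simulated four-component data are assumptions.

```python
import numpy as np

def n_components_by_svd(X, explained=0.999):
    """Smallest number of singular values whose cumulative variance reaches `explained`."""
    s = np.linalg.svd(X, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, explained) + 1), cumulative

axis = np.arange(80)
S = np.stack([np.exp(-0.5 * ((axis - c) / 4.0) ** 2) for c in (15, 30, 45, 60)], axis=1)
rng = np.random.default_rng(6)
C = rng.uniform(0.2, 2.2, size=(40, 4))
X = C @ S.T + 1e-4 * rng.normal(size=(40, 80))   # four components plus weak noise

n_comp, cumulative = n_components_by_svd(X)
```

For real spectra with stronger noise or minor constituents, the appropriate threshold is a judgment call, which is why the cumulative variance is returned alongside the estimate.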

Results

Direct comparison

In general, when comparing the selected chemometric methods in absence of any interference, the obtained results were quite comparable for all analytes in terms of prediction capability and repeatability (see Table 2). Performing three paired t tests (each algorithm against the other two) for comparing the root mean square error of prediction (RMSEP), no statistically significant differences were found (α=0.05). CLS and SBC revealed slightly higher similarity, thus evidencing the rather close relationship of the two methods, if an appropriate artificial noise matrix was used in SBC. Regarding the gases studied herein, the prediction of carbon dioxide shows reduced accuracy and reproducibility compared to the prediction of the other analytes. This is readily explained by the lack of isolation of the system from the ambient atmosphere, which consequently results in interferences from variations in atmospheric CO2. As expected, for well-known and well-defined analytical systems and measurement scenarios, more complex PLS algorithms do not need to be introduced, as CLS is certainly sufficient, thereby avoiding potential efforts for establishing representative calibration data sets.
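The two quantities used in this comparison, the RMSEP and the paired t statistic, may be sketched as follows. The helper names and the toy numbers are assumptions chosen for easy checking; they are not results of this study.

```python
import numpy as np

def rmsep(y_pred, y_true):
    """Root mean square error of prediction."""
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (degrees of freedom = n - 1)."""
    d = np.asarray(a, float) - np.asarray(b, float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))

# Toy numbers chosen for easy checking; they are not results from the study.
example_rmsep = rmsep([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_t = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 5.0])
```

In a comparison of two algorithms, `a` and `b` would hold the RMSEP values obtained by each algorithm for the same analytes, and the resulting statistic would be compared against the tabulated critical value for the chosen α.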

Studies in the presence of known interferences

Figure 3 shows the evolution of the prediction capability of the algorithms as a function of the concentration of water, which was artificially added as a known interferant to the samples. Only analytes with spectral features overlapping with the water IR spectrum were plotted. Once again, CLS and SBC provided similar results. It is observed that the addition of water distorts the predictive capabilities, and the predicted concentration of the analytes apparently increases proportionally with increasing water concentration. If the signal corresponding to a water-saturated atmosphere was added, the RMSEP increased to values 6 times and 1.5 times higher than those obtained for the prediction of the original validation set of butane and methane, respectively. Once the correction was applied and water was integrated in the calibration system, the prediction of the analytes substantially improved, and the RMSEP values were only slightly higher than those obtained in the absence of water. In contrast, PLS validation in the presence of water was considerably more robust compared to CLS and SBC, revealing an increased error only in the case of butane, which was again appropriately corrected once water was added to the calibration data set. The small amount of water vapor present while the initial data were recorded results from the limited isolation of the system from the ambient environment and explains the absence of significant variations in the PLS errors once the artificial water spectrum was added.

In summary, SBC and CLS are more sensitive than PLS to the water interference, i.e., a known interferant, in IR-HWG measurements. However, this influence may be readily corrected if the interference is known and the spectrum of the interferant can be either (1) measured using the IR-HWG sensor system or (2) modeled, and then (3) integrated into the calibration data set.
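The effect of integrating a known interferant spectrum into a CLS model can be sketched with synthetic data. The Gaussian bands, the function names, and the concentrations below are hypothetical and serve only to show the principle; they do not reproduce the measured spectra.

```python
import numpy as np

def gaussian_band(axis, center, width):
    """Hypothetical absorption band used as a stand-in for a real spectrum."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

def cls_predict(spectrum, pure_spectra):
    """Least-squares concentrations c minimising ||spectrum - c @ pure_spectra||."""
    coef, *_ = np.linalg.lstsq(pure_spectra.T, spectrum, rcond=None)
    return coef

axis = np.linspace(0, 100, 500)
analytes = np.vstack([gaussian_band(axis, 30, 5), gaussian_band(axis, 60, 5)])
water = gaussian_band(axis, 35, 8)          # overlaps with the first analyte

c_true = np.array([1.0, 2.0])
spectrum = c_true @ analytes + 0.5 * water  # mixture plus known interferant

# Model without the interferant: the overlapping analyte is overestimated.
c_uncorrected = cls_predict(spectrum, analytes)
# Model with the water spectrum added as an extra component.
c_corrected = cls_predict(spectrum, np.vstack([analytes, water]))[:2]

print("uncorrected:", c_uncorrected)
print("corrected:  ", c_corrected)
```

In this noise-free example, the corrected model recovers the true concentrations, whereas the uncorrected model attributes part of the water signal to the overlapping analyte.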

Studies in the presence of unknown interferences

PLS validation of the three analytes isobutylene, methane, and butane was already performed in the “Direct comparison” section without using information about the cyclopropane spectrum. The fact that the validation was performed without prior knowledge of the interference spectrum evidences one of the main advantages of PLS, which may readily deal with an interference without a priori knowledge of its spectrum, as long as the interferant was also present during the recording of the calibration data set and provided that the variation of its signals is not correlated with the analytes of interest.

In the event that an unknown interference appears, the accuracy of CLS and SBC models is strongly affected, and the validation errors were found to be more than three times higher for all analytes compared to when these interferences are absent (see Fig. 4). However, in this case, the spectrum of the interference is not available and, thus, cannot be integrated into the CLS calibration. Therefore, the interference cannot be corrected for this model. For SBC, the artificial noise matrix cannot be established either. Instead, a noise matrix composed of spectra of the interferences with a constant concentration of the analyte may be established. Once this matrix was used, the obtained SBC errors were equivalent to those obtained in the absence of interferences for CLS, SBC, and PLS (see Fig. 4, black bar). As no information on the interference was required for SBC modeling, these results confirm that this method may substitute for PLS, if a noise matrix with the variations of all interferences and a constant concentration of the analyte is readily available.
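The role of a measured noise matrix can be illustrated with a minimal sketch. It assumes the commonly cited form of the SBC regression vector, b = Σ⁻¹g / (gᵀΣ⁻¹g), where g is the analyte response spectrum and Σ is the covariance of the noise spectra; the notation, the synthetic bands, and the small ridge term added to keep Σ invertible are assumptions of this sketch and are not taken from the text.

```python
import numpy as np

def band(axis, center, width):
    """Hypothetical absorption band used as a stand-in for a real spectrum."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

axis = np.arange(200)
g = band(axis, 80, 8)               # analyte response per unit concentration
interferent = band(axis, 95, 10)    # never given to the model as a spectrum

# Measured "noise" set: constant analyte level, varying interferent level.
rng = np.random.default_rng(1)
c0 = 1.0
k_noise = rng.uniform(0.0, 2.0, size=30)
noise_matrix = c0 * g + np.outer(k_noise, interferent)

x_mean = noise_matrix.mean(axis=0)
# Noise covariance; the small ridge term is an assumption of this sketch.
sigma = np.cov(noise_matrix, rowvar=False) + 1e-6 * np.eye(axis.size)

w = np.linalg.solve(sigma, g)
b = w / (g @ w)                     # assumed SBC regression vector

def sbc_predict(x):
    """Predicted analyte concentration for spectrum x."""
    return c0 + (x - x_mean) @ b

c_true, k_new = 1.7, 1.3
x_new = c_true * g + k_new * interferent
c_sbc = sbc_predict(x_new)
c_cls = (x_new @ g) / (g @ g)       # CLS using only the analyte spectrum

print("SBC:", c_sbc, "CLS:", c_cls, "true:", c_true)
```

In this idealized example, the regression vector becomes nearly orthogonal to the interferent direction captured by the noise matrix, so the SBC prediction stays close to the true value, while the CLS prediction is biased by the overlapping band.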

Table 2 RMSEP and repeatability values obtained for the direct validation of PLS, CLS, and SBC models built in the “Direct comparison” section

Analyte          PLS                      CLS                      SBC
                 RMSEP   Repeatability    RMSEP   Repeatability    RMSEP   Repeatability
Cyclopropane     45.59   12.61            38.54   10.75            40.21   10.39
Methane          10.67    6.27            27.59    5.75            28.21    5.80
Isobutylene      35.16   15.82            20.17   14.87            17.07   13.87
Butane           29.52   15.61            22.65   16.15            21.80   15.70
Carbon dioxide   66.40   11.89            94.33   27.39            95.09   26.51

All values are in parts per million. The repeatability corresponds to the standard deviation of the concentrations predicted for six spectra with the same concentration (1,000 ppm of each compound)

Fig. 3 Evolution of the RMSEP as a function of the amount of water added to the spectra for the uncorrected (blue triangles) and corrected (red squares) PLS, CLS, and SBC models; for details, see “Studies in the presence of known interferences” section

Fig. 4 Comparison of CLS, SBC, and MCR results obtained in different scenarios for isobutylene, methane, and butane. Errors were computed as root mean square error of prediction (RMSEP) and correspond to the prediction of the independent set of samples integrated in the XVAL matrix by different models performed in this study. Violet bars and blue bars indicate the error obtained in the direct comparison without considering the presence of any interference (see “Direct comparison” section). Red and green bars indicate the error obtained when considering cyclopropane as an unknown interference that was not included in the calibration process, black bars indicate the error obtained in the validation of the SBC after correction using a measured noise matrix, and orange bars indicate the error obtained by the MCR (see “Studies in the presence of unknown interferences” section)

[Bar chart: RMSEP (ppm), axis from 0 to 450, for isobutylene, methane, and butane. Legend: CLS; SBC; CLS without cyclopropane; SBC without cyclopropane; SBC without cyclopropane (corrected); MCR]

MCR-ALS was also tested for predicting the three considered analytes in the presence of cyclopropane. The obtained errors were between 40 and 100 % higher than those obtained for the initial CLS and SBC models (see orange bars in Fig. 4). However, MCR-ALS provided a predicted spectrum of the interference, which may prove useful for identification of an unknown interferant. For the evaluation of isobutylene, methane, and butane, the predicted cyclopropane spectra were determined to be very close to the real spectra of this constituent, with correlation values in the specific spectral region of the analyte under study of 0.9577, 0.9874, and 0.9886 for isobutylene, methane, and butane, respectively.

Summarizing, if unknown interferants are present, SBC is a competitive alternative to PLS, especially if it is experimentally easier to determine a noise matrix with an IR-HWG sensor than to obtain a sizeable collection of well-characterized calibration samples. Additionally, MCR-ALS may provide qualitative information on the interference spectrum.

Studies on entirely unknown samples

Finally, the hypothetical case of spectra without any information on the samples has been studied to evaluate the extraction of any kind of information from such complex analytical scenarios. In this scenario, any calibration is rendered useless, and data operations are limited to the decomposition or deconvolution of the obtained complex spectra into the potential spectra of the contained individual constituents. In the present study, this process was tested using MCR-ALS. Using the algorithm based on the SVD analysis, the initial estimation of the number of contained components was determined to be 5. Figure 2 compares the measured constituent spectra with those extracted by MCR assuming a mixture with five components. As the fifth component, a spectrum was extracted matching the water vapor spectrum, which can be related to small variations of atmospheric humidity during the measurement of the samples. In general, excellent correlation was observed between the measured constituent spectra and the spectra extracted via MCR. The obtained correlation coefficients were 0.8514, 0.9977, 0.9915, and 0.9996 for isobutylene, methane, cyclopropane, and butane, respectively. MCR also provided an arbitrary concentration value for the potential components according to their calculated spectra. Comparing those predicted values to the real values within the samples, the correlation coefficients were 0.9941, 0.9990, 0.9887, and 0.9686 for isobutylene, methane, cyclopropane, and butane, respectively, thus evidencing the outstanding capability of MCR to provide qualitative and relative quantitative information on sample constituents, even if no calibration or reference spectrum of the analytes is available.
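The alternating least squares step at the core of such a decomposition, and the correlation used to compare extracted and measured spectra, can be sketched as follows. This is a bare, unconstrained ALS loop on synthetic data, not the MCR-ALS toolbox; the function name `mcr_als`, the Gaussian bands, and the perturbed starting spectra (a stand-in for a SIMPLISMA estimate) are assumptions of this sketch.

```python
import numpy as np

def mcr_als(D, S_init, n_iter=50):
    """Unconstrained alternating least squares for the bilinear model D ~= C @ S."""
    S = S_init.copy()
    for _ in range(n_iter):
        # Update concentrations for fixed spectra, then spectra for fixed concentrations.
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        S = np.linalg.lstsq(C, D, rcond=None)[0]
    return C, S

# Synthetic mixtures of three hypothetical constituents with overlapping bands.
axis = np.arange(200)
pure = np.vstack([np.exp(-0.5 * ((axis - c) / 12.0) ** 2) for c in (70, 100, 130)])
rng = np.random.default_rng(2)
conc = rng.random((25, 3))
D = conc @ pure

# Rough initial spectra: true spectra plus noise (stand-in for an initial estimate).
S_init = pure + 0.05 * rng.standard_normal(pure.shape)
C_est, S_est = mcr_als(D, S_init)

lack_of_fit = np.linalg.norm(D - C_est @ S_est) / np.linalg.norm(D)
correlations = [np.corrcoef(S_est[i], pure[i])[0, 1] for i in range(3)]
print("lack of fit:", lack_of_fit)
print("spectral correlations:", np.round(correlations, 4))
```

Without constraints, the solution is only defined up to a linear mixing of the components, so the quality of the extracted spectra depends strongly on the initial estimate; the recovered concentrations are likewise on an arbitrary scale.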

Taking into account the analytical application of the aforementioned algorithms for IR-HWG gas sensing, the main drawback of MCR-ALS may be the complexity of the involved calculations. As the prediction of the concentrations results from an iterative process, a first estimation of the number of components must be selected prior to computation and, accordingly, may have to be adapted later on. Therefore, the real-time response of a sensor may be affected when using MCR-ALS. Since the prediction of the values in CLS, PLS, and SBC is performed by simply multiplying the spectrum with the regression vector (see Eq. 1), conventional calculations are computationally less demanding and may be performed in real time.
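The low computational cost of such a prediction can be sketched for the CLS case, where the regression matrix is computed once and every new spectrum only requires a matrix-vector product. Eq. 1 is not reproduced in this section, so the pseudoinverse form used here, as well as the synthetic pure spectra, are assumptions of this sketch.

```python
import numpy as np

def band(axis, center, width):
    """Hypothetical absorption band used as a stand-in for a real spectrum."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

axis = np.arange(200)
# Hypothetical pure-component spectra, one row per analyte.
K = np.vstack([band(axis, 60, 8), band(axis, 100, 8), band(axis, 140, 8)])

# Computed once during calibration: one regression vector per analyte (columns of B).
B = np.linalg.pinv(K)

def predict(spectrum):
    """Concentrations from a single spectrum via one matrix-vector product."""
    return spectrum @ B

c_true = np.array([0.5, 1.0, 2.0])
spectrum = c_true @ K
c_pred = predict(spectrum)
print(c_pred)
```

Each prediction therefore scales linearly with the number of spectral points, in contrast to an iterative decomposition that has to be rerun on the accumulated data.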

Conclusions

In this study, the application of PLS, CLS, SBC, and MCR-ALS for improving the functionality of IR-HWG gas sensors has been detailed, providing guidelines for selecting the most beneficial algorithm in various measurement scenarios of increasing complexity. The obtained results evidenced that there is no single best chemometric method, but that the complexity of the analytical scenario guides toward the most beneficial algorithm. Thus, the choice of the best chemometric method depends on practical considerations for each measurement situation, such as the complexity of the matrix and the amount of interferences, the cost in terms of money, the time needed for obtaining calibration samples and reference data, and finally, the need for real-time monitoring. Despite the fact that substantial efforts are usually needed for establishing a representative calibration data set, PLS was found to be the most robust method, and the only algorithm that may deal with changes within the spectra due to, e.g., interactions among the sample constituents. CLS provided a simple yet useful calibration, if all components of interest are known and well-defined, with least computational efforts. However, CLS was found to be highly sensitive to any unexpected interferences within the sample. In this case, SBC readily competes with PLS, and the decision for either one of these techniques depends on whether it is experimentally easier to analyze a sizeable calibration data set or a noise matrix. The utility of MCR-ALS is particularly pronounced for unknown samples, as the algorithm provided a close estimation of the spectra of unknown interferences in well-defined samples, as well as qualitative and quantitative estimations of the constituents within unknown samples. Nevertheless, due to the iterative nature of this algorithm, extensive computational efforts may be required, thus decreasing the time resolution of an in situ or continuously operating sensor system. The applicability range of such a sensor improves with the use of specifically selected chemometric methods studied in this paper, thereby covering application areas including industrial process monitoring, the analysis of volatile organic constituents in breath, and remote sensing of environmental samples.

Acknowledgments DPG acknowledges financial support by the grant “Segles V” provided by the University of Valencia, the Ministerio de Educación y Ciencia (CTQ2011-25743) and the Generalitat Valenciana (PROMETEO 2010–055), enabling a research stay at the Institute of Analytical and Bioanalytical Chemistry, Univ. Ulm. JK acknowledges her grant (Sara Borrell CD12/00667) from the Instituto Carlos III (Ministry of Economy and Competitiveness). This work was performed in part under the auspices of the US Department of Energy by the University of California, Lawrence Livermore National Laboratory (LLNL) under contract no. W-7405-Eng-48. This project was funded in part by the Laboratory Directed Research and Development Program at LLNL under subcontract nos. B565491, B594450, B590992, B598643, and B603018.

References

1. Nordberg A, HanssonM, Sundh I, Nordkvist E, Carisson H,MathisenB (2000) Monitoring of a biogas process using electronic gas sensorsand near-infrared spectroscopy (NIR). Water Sci Technol 41:1–8

2. Grutter M (2003) Multi-gas analysis of ambient air using FTIRspectroscopy over Mexico City. Atmosfera 16:1–13

3. Wilk A, Seichter F, Kim S-S, Tuetuencue E, Mizaikoff B, Vogt JA,Wachter U, Radermacher P (2012) Toward the quantification of the(CO2)-C-13/(CO2)-C-12 ratio in exhaled mouse breath with mid-infrared hollow waveguide gas sensors. Anal Bioanal Chem402:397–404

4. Wörle K, Seichter F, Wilk A, Armacost C, Day T, Godejohann M,Wachter U, Vogt J, Radermacher P, Mizaikoff B (2013) Breathanalysis with broadly tunable quantum cascade lasers. Anal Chem85:2697–2702

5. Saito M, Kikuchi K (1997) Infrared optical fiber sensors. Opt Rev4:527–538

6. Harrington JA (2000) A review of IR transmitting, hollow wave-guides. Fiber Integr Opt 19:211–227

7. Young CR, Menegazzo N, Riley AE, Brons CH, DiSanzo FP,Givens JL, Martin JL, Disko MM, Mizaikoff B (2011) Infraredhollow waveguide sensors for simultaneous gas phase detection ofbenzene, toluene, and xylenes in field environments. Anal Chem83:6141–6147

8. Fortes PR, Wilk A, Seichter F, Cajlakovic M, Koestler S, RibitschV, Wachter U, Vogt J, Radermacher P, Carter C, Raimundo IC,Mizaikoff B (2013) Combined sensing platform for advanceddiagnostics in exhaled mouse breath. Proc SPIE 8570:85700Q

9. Seichter F, Wilk A, Wörle K, Kim S-S, Vogt JA, Wachter U,Radermacher P, Mizaikoff B (2013) Multivariate determination of13CO2/12CO2 ratios in exhaled mouse breath with mid—infraredhollow waveguide gas sensors. Anal Bioanal Chem 405:4945–4951

10. Thompson BT, Inberg A, Croitoru N, Mizaikoff B (2006) Charac-terization of a mid-infrared hollow waveguide gas cell for theanalysis of carbon monoxide and nitric oxide. Appl Spectrosc60:266–271

11. Wu S, Deev A, Tang Y (2011) Quantum cascade laser sensors foronline gas chromatography. Quantum Sensing and NanophotonicDevices VIII, 794506, doi:10.1117/12.871398

12. Lehmann H, Bartelt H, Willsch R, Amezcua-Correa R, Knight JC(2011) In-line gas sensor based on a photonic bandgap fiber withlaser-drilled lateral microchannels. IEEE Sens J 11:2926–2931

13. Nikiforova OY, Kapitanov VA, Ponomarev YN (2008) Influence ofethylene spectral lines on methane concentration measurementswith a diode laser methane sensor in the 1.65 μm region. ApplPhys B 90:263–268

14. Bacsik Z, Gyivicsan A, Horvath K, Mink J (2006) Determination ofcarbon monoxide concentration and total pressure in gas cavities inthe silica glass body of light bulbs by FT-IR spectrometry. AnalChem 78:2382–2387

15. Villar A, Gorritxategi E, Otaduy D, Ciria JI, Fernandez LA (2011)Chemometric methods applied to the calibration of a Vis-NIRsensor for gas engine's condition monitoring. Anal Chim Acta705:174–181

16. Zhang G, Li Y, Li Q (2010) A miniaturized carbon dioxide gassensor based on infrared absorption. Opt Lasers Eng 48:1206–1212

17. Young C, Kim S-S, Luzinova Y, Weida M, Arnone D, Takeuchi E,Day T, Mizaikoff B (2009) External cavity widely tunable quantumcascade laser based hollow waveguide gas sensors for multianalytedetection. Sensors Actuators B: Chem 140:24–28

18. Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basictool of chemometrics. Chemom Intell Lab Syst 58:109–130

19. Roggo Y, Chalus P, Maurer L, Lema-Martinez C, Edmond A, JentN (2007) A review of near infrared spectroscopy and chemometricsin pharmaceutical technologies. J Pharm Biomed Anal 44:683–700

20. Trygg J, Lundstedt T (2007) Chapter 6—chemometrics techniquesfor metabonomics. In: Lindon JC, Nicholson JK, Holmes E (eds)The handbook of metabonomics and metabolomics. Elsevier, Am-sterdam, pp 171–199

21. Franke JE (2006) Inverse least squares and classical least squaresmethods for quantitative vibrational spectroscopy. In: ChalmersJM, Griffiths PR (eds) Handbook of vibrational spectroscopy. Wi-ley, Chichester

22. Vajna B, Patyi G, Nagy Z, Bódis A, Farkas A, Marosi G (2011)Comparison of chemometric methods in the analysis of pharma-ceuticals with hyperspectral Raman imaging. J Raman Spectrosc42:1977–1986

23. Marbach R (2005) A new method for multivariate calibration. JNear Infrared Spectrosc 13:241

24. Kuligowski J, Galera MM, García MDG, Culzoni MJ, GoicoecheaHC, Garrigues S, Quintás G, de la Guardia M (2011) Science basedcalibration for the extraction of “analyte-specific” HPLC-DADchromatograms in environmental analysis. Talanta 83:1158–1165

25. Jaumot J, Gargallo R, de Juan A, Tauler R (2005) A graphical user-friendly interface for MCR-ALS: a new tool for multivariate curveresolution in MATLAB. Chemom Intell Lab Syst 76:101–110

26. Brereton RG (1997) Multilevel multifactor designs for multivariatecalibration. Analyst 122:1521–1529
