Unmixing Complex Chromatograms

Scott Ramos, Marlana Blackburn, and Brian Rohrback, Infometrix, Inc., Woodinville, WA

ABSTRACT

The goal of mixture analysis is to extract, from a set of bilinear data, estimates of the composition and profiles of the underlying pure components. Traditionally, mixture analysis has been employed in hyphenated techniques to assess peak purity and to identify and quantitate unresolved constituents. For example, in a GC-MS experiment, when unresolved chromatographic peaks occur, the goal is to discover the elution profiles of the pure individual components and their corresponding mass spectra. The concentration (elution) profiles and the spectra form the two dimensions in the bilinear data set.

The same approach can be used to separate pure chromatographic profiles from a series of mixed-source chromatograms. The bilinear matrices in this case are composed of a composition dimension (the chromatographic profiles) and a concentration dimension (derived from the relative contributions of the end members). Each mixture analysis method has its own advantages and disadvantages. Two popular methods were chosen for evaluation: multivariate curve resolution (MCR) and alternating least squares (ALS). These mixture analysis approaches were applied to two petroleum industry applications.

BACKGROUND

MCR. A bilinear data set (e.g., GC-MS, LC-UV) is assumed to be of the form:

[1]

where D is an m x n matrix of detector responses for m samples described by n measurements, C is an m x k matrix of concentrations for the k components that comprise the samples, P is an n x k matrix containing the k component measurement profiles, and E is an m x n matrix of residual errors. The matrix D can also be described via principal components analysis:

[2]

where U is the left singular matrix, S is the diagonal matrix of singular values (square roots of the eigenvalues), and V is the right singular matrix (the eigenvectors). If we truncate the principal components to only relevant information, i.e., excluding noise, then D can be approximated by:

[3]

where $\hat{U}$ is m x q, $\hat{S}$ is q x q, and $\hat{V}$ is n x q, q being the number of principal components retained in the truncation.
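As a minimal numerical sketch of the bilinear model (D = CP' + E), its decomposition (D = USV'), and the truncation to q factors, the following Python/NumPy example builds a small synthetic two-component data set. The Gaussian profile shapes, matrix sizes, and noise level are illustrative assumptions and are not taken from the data in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 11, 200, 2            # samples, measurements, components (assumed sizes)
x = np.arange(n)

# Hypothetical pure-component profiles: columns of P (n x k), area normalized
P = np.column_stack([
    np.exp(-0.5 * ((x - 80) / 15.0) ** 2),
    np.exp(-0.5 * ((x - 120) / 20.0) ** 2),
])
P /= P.sum(axis=0)

# Hypothetical compositions: rows of C (m x k), blends from 100% to 0% of source 1
f = np.linspace(1.0, 0.0, m)
C = np.column_stack([f, 1.0 - f])

E = 1e-5 * rng.standard_normal((m, n))   # residual errors (assumed noise level)
D = C @ P.T + E                          # bilinear model

# Full decomposition D = U S V'
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# Truncation to q factors: U_hat is m x q, S_hat is q x q, V_hat is n x q
q = k
U_hat, S_hat, V_hat = U[:, :q], np.diag(s[:q]), Vt[:q].T
D_hat = U_hat @ S_hat @ V_hat.T
rel_err = np.linalg.norm(D - D_hat) / np.linalg.norm(D)

print(U_hat.shape, S_hat.shape, V_hat.shape)
print(s[:4], rel_err)
```

In this synthetic case the first two singular values stand well above the rest, which is the behavior the truncation relies on to separate relevant information from noise.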

A spectrum $p_i$ can be estimated from the eigenvectors via a transformation vector $t_i$:

[4]

and for the set of k components that comprise the mixture, a k x k transformation matrix T provides the conversion:

[5]

Similarly, the concentration matrix C can be estimated from the same transformation matrix:

[6]

Thus, to estimate the measurement profiles and composition, it is necessary to determine a reasonable expression for T. Unfortunately, without unique measurements, an infinite number of vector combinations might comprise T. The result is a solution region containing a collection of mathematically feasible profiles. The task of finding a reasonable solution can be simplified by the application of constraints.
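The ambiguity in T can be illustrated with a short Python/NumPy sketch: for any invertible k x k matrix T, the pair P = VT and C = US(T')^-1 (using the truncated factors) reproduces the same data. The random test data and the two transformation matrices below are arbitrary, hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
C_true = rng.random((8, 2))
P_true = rng.random((50, 2))
D = C_true @ P_true.T                      # noise-free data of rank 2

U, s, Vt = np.linalg.svd(D, full_matrices=False)
U_hat, S_hat, V_hat = U[:, :2], np.diag(s[:2]), Vt[:2].T

def profiles_and_compositions(T):
    """Apply P = V_hat T and C = U_hat S_hat (T')^-1 for a k x k matrix T."""
    P = V_hat @ T
    C = U_hat @ S_hat @ np.linalg.inv(T.T)
    return P, C

# Two different, arbitrarily chosen invertible transformation matrices
candidates = (
    np.array([[1.0, 0.2], [0.3, 1.0]]),
    np.array([[2.0, -1.0], [0.5, 1.5]]),
)
for T in candidates:
    P, C = profiles_and_compositions(T)
    fits_data = np.allclose(C @ P.T, D)
    non_negative = bool(P.min() >= 0 and C.min() >= 0)
    print(fits_data, non_negative)
```

Every candidate fits the data equally well; only constraints such as non-negativity distinguish physically reasonable choices of T from the rest.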

For a two-component chromatographic peak, a solution region can be found by imposing non-negativity constraints on the spectral intensities and compositional proportions. This results in a feasible region like that in Figure 1a. Normalizing the spectra collapses these regions to the short line segments in Figure 1b.

Figure 1. Feasible region (a) in scores space of data from a fused chromatographic peak, and (b) projected onto the normalization line

Additional constraints (closure, unimodality) may be possible depending on the type of data. One advantage of MCR (ref. 1) is the presentation of solution regions, with realistic bounds for the inferred spectral shapes and mixture compositions. A disadvantage is the difficulty in extending the algorithm to more than 2 components; several approaches have appeared in the literature (refs. 2-4) but will not be described further.
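For the two-component case with area-normalized data, the scores of all mixtures fall on a line segment in the two-factor space. The Python/NumPy sketch below illustrates one simple way to read compositions from that segment: it assumes that the two most extreme samples are the pure sources, which corresponds to only one bound of the feasible region and is not necessarily how the MCR implementation used in this work chooses its solution. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def two_source_fractions(D):
    """Estimate source fractions, assuming the two most extreme samples are pure."""
    Dn = D / D.sum(axis=1, keepdims=True)           # area-normalize each row
    U, s, Vt = np.linalg.svd(Dn, full_matrices=False)
    scores = U[:, :2] * s[:2]                        # scores in the 2-factor space
    centered = scores - scores.mean(axis=0)
    # Direction of largest spread of the scores (the normalization line)
    direction = np.linalg.svd(centered, full_matrices=False)[2][0]
    pos = centered @ direction                       # position of each sample on the line
    a, b = int(pos.argmax()), int(pos.argmin())      # the two extreme samples
    frac_a = (pos - pos[b]) / (pos[a] - pos[b])      # fraction of the source found at sample a
    return a, b, frac_a

# Synthetic, noise-free blends of two overlapping profiles (assumed shapes)
x = np.arange(100)
p1 = np.exp(-0.5 * ((x - 40) / 8.0) ** 2)
p1 /= p1.sum()
p2 = np.exp(-0.5 * ((x - 55) / 10.0) ** 2)
p2 /= p2.sum()
f = np.linspace(1.0, 0.0, 11)
D = np.outer(f, p1) + np.outer(1.0 - f, p2)

a, b, frac_a = two_source_fractions(D)
print(a, b, np.round(frac_a, 3))
```

Because the extreme samples are taken as pure, their estimated fractions are exactly 1 and 0 by construction; if the true sources are not among the samples, this choice biases the remaining estimates.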

ALS. This method takes a complementary approach to that of MCR: given a reasonable initial estimation of the measurement profiles (or compositions), generate an estimate of the compositions (or profiles), then apply constraints and refine the estimate of the profiles (or compositions). This process is iterated until no further refinement is possible.

Implementations of ALS differ in the way the initial profile (or composition) estimates are generated. For example, the SIMPLISMA method (ref. 5) derives estimates of the purest variables. The intensities of these variables in D can be starting points for C. Then a least squares approach is used to generate estimates of the profiles:

[7]

Following normalization (usually, an area norm) of the estimated profiles $\hat{P}$, the same least squares approximation of the composition is performed:

[8]

In contrast, polytopic vector analysis (ref. 6) determines initial estimates of the most representative profiles by finding the k most mutually extreme samples. The profile estimates are normalized, a least squares estimate of C is made, as above, and the alternating least squares is continued, again, until convergence.

As with MCR, non-negativity constraints on the profiles and the compositions are typically applied during each least squares step. Other constraints (ref. 7) have been proposed, depending on the nature of the data. Unlike MCR, ALS extends readily to more than two components, although complications can arise when the profiles are similar. However, inappropriate initial estimates may produce unreasonable profiles and compositions.
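The alternating scheme described above can be sketched in Python/NumPy as follows. This is a simplified illustration, not the implementation used in this work: non-negativity is imposed by clipping negative values to zero after each unconstrained least squares step (a true non-negative least squares solver could be substituted), the convergence test is an assumed one, and the initial compositions are taken from two hand-picked channels standing in for the purest variables. The function name `als_unmix` and all data are hypothetical.

```python
import numpy as np

def als_unmix(D, C0, n_iter=200, tol=1e-10):
    """Alternate least squares estimates of P (n x k) and C (m x k) for D ~ C P'."""
    C = np.array(C0, dtype=float)
    prev = np.inf
    for _ in range(n_iter):
        # Profile step: solve C P' = D in the least squares sense, then clip negatives
        P = np.linalg.lstsq(C, D, rcond=None)[0].T
        P = np.clip(P, 0.0, None)
        # Area-normalize each profile (assumes no profile has been clipped to all zeros)
        P = P / P.sum(axis=0, keepdims=True)
        # Composition step: solve P C' = D' in the least squares sense, then clip negatives
        C = np.linalg.lstsq(P, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)
        resid = np.linalg.norm(D - C @ P.T)
        if abs(prev - resid) < tol:          # stop when the residual no longer changes
            break
        prev = resid
    return C, P

# Synthetic two-source blends with well-separated profiles (assumed shapes)
x = np.arange(100)
P_true = np.column_stack([
    np.exp(-0.5 * ((x - 30) / 6.0) ** 2),
    np.exp(-0.5 * ((x - 70) / 6.0) ** 2),
])
P_true /= P_true.sum(axis=0)
f = np.linspace(1.0, 0.0, 11)
C_true = np.column_stack([f, 1.0 - f])
D = C_true @ P_true.T

C0 = D[:, [30, 70]]      # intensities at two channels chosen by hand as "purest"
C_est, P_est = als_unmix(D, C0)
print(np.round(C_est, 3))
```

With well-separated profiles and sensible starting values, the estimates match the known blend fractions; with highly similar profiles or poor initial estimates, the same loop can settle on unreasonable solutions, as noted in the text.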

Number of Components. One of the major issues with any of these algorithms is a reliable determination of the number of components. The initial determination of the number of mixture components is critical: underestimating this value will almost surely generate incorrect profiles.

Many approaches for estimating the correct number of components have been described; in this work, the FIEFA method, developed by Infometrix for Beckman, was used. A brief summary of the FIEFA method follows.

Fixed Interval Evolving Factor Analysis (FIEFA) attempts to describe the complexity of a data set by examining, by SVD, a window of data of fixed size that slides across the data set. The algorithm was originally designed for real-time interpretation of LC/Diode Array matrices, so an updating mechanism for the covariance matrix was developed.

With a given interval size, say 5 measurement scans, the eigenvalues of the covariance matrix are found and stored. The covariance matrix is updated by removing the contribution of the first sample in the interval, incorporating the effect of the new sample added to the interval, then computing a new set of eigenvalues. Examining the collection of eigenvalues can assist decisions on factor complexity in the data. Regions of continuous presence of eigenvalues can indicate complexity of the data and serve as guidance in preparing a data matrix for mixture analysis.
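A minimal Python/NumPy sketch of the sliding-window idea follows. It is not the FIEFA implementation itself: for simplicity the eigenvalues are recomputed from scratch for every window rather than updated incrementally, and the uncentered cross-product matrix of each window is used, which is an assumption about how the covariance matrix is formed. The function name and the synthetic two-component peak are hypothetical.

```python
import numpy as np

def fixed_window_eigenvalues(D, window=5, n_ev=4):
    """Leading eigenvalues of the cross-product matrix for each window of scans.

    D has one row per scan and one column per measurement channel.
    """
    n_windows = D.shape[0] - window + 1
    out = np.zeros((n_windows, n_ev))
    for start in range(n_windows):
        block = D[start:start + window]
        # Squared singular values of the block are the eigenvalues of block @ block.T
        ev = np.linalg.svd(block, compute_uv=False) ** 2
        n_found = min(n_ev, ev.size)
        out[start, :n_found] = ev[:n_found]
    return out

# Synthetic fused peak: two overlapped elution profiles with distinct spectra
t = np.arange(20)
elution = np.column_stack([
    np.exp(-0.5 * ((t - 9) / 2.5) ** 2),
    np.exp(-0.5 * ((t - 11) / 2.5) ** 2),
])
ch = np.arange(30)
spectra = np.column_stack([
    np.exp(-0.5 * ((ch - 10) / 4.0) ** 2),
    np.exp(-0.5 * ((ch - 20) / 4.0) ** 2),
])
rng = np.random.default_rng(3)
D = elution @ spectra.T + 1e-3 * rng.standard_normal((20, 30))

ev = fixed_window_eigenvalues(D, window=5, n_ev=4)
print(ev.shape)
print(ev[7])        # window covering scans 7 to 11, near the peak center
```

In the window near the peak center, two eigenvalues rise well above the noise level while the third and fourth do not, which is the pattern used to flag a two-factor region.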

Figure 2a shows a highly overlapped chromatographic peak. Although two components are present, there is not even an inflection point in the total intensity plot. However, the FIEFA plot (Figure 2b) shows that a region of two significant eigenvalues extends from scans 4 to 9.

Figure 2. Chromatographic peak (a) of two highly overlapped components, and (b) first 4 eigenvalues (EV) from FIEFA analysis

The FIEFA results can be summarized in a purity bar (e.g., see Figure 3): a horizontal line drawn across regions in the profile at the maximum number of significant factors.

EXPERIMENTAL

Fuel blend data. Refineries have a variety of process units to separate, crack and treat hydrocarbons. These generate multiple liquid and gas streams that are blended to produce transportation fuels (gasoline, diesel, and jet/kerosene). Mathematical unmixing of the GC traces allows a simple QC of the final fuel product. In one common instance, kerosene can be back-blended into diesel fuel to increase the supply of the latter. There are limits to how much kerosene can be added due to specification constraints.

Production lots of diesel and kerosene were blended in mixtures ranging from 100% diesel to 100% kerosene in steps of 10%. Samples were analyzed by high resolution GC, on a 60 m DB1 column, with a generic temperature program for petroleum hydrocarbons.

Reservoir data. In exploration and production, a field will often have multiple production zones. Flow from individual zones can be monitored only if the production is halted and barriers are placed in the wellbore to sample the output one stratum at a time. Mathematical unmixing yields the relative contributions of each producing zone in a more cost-effective manner.

Core samples of a producing reservoir in Angola were characterized by traditional methods into two source types. Mixtures of these oils were made in various ratios which were then analyzed by high resolution GC.

Computational details. Analyses were performed with in-house algorithms developed in Matlab 5.3 (The MathWorks, Natick, MA) and with Pirouette 3.10 (Infometrix, Inc., Woodinville, WA), running on a Dell Precision 4200 workstation, 866 MHz processor with 256 MB RAM.

RESULTS

Fuel blend data. The diesel mixture data set is shown below. A FIEFA purity bar indicates that although most of the chromatographic region is made up of a single factor, the region from 8000 to 14000 scans is largely composed of two factors. Small regions of 3 factors are indicated as well. Also apparent are baseline differences from run to run. Finally, although difficult to perceive in this view, there is some misalignment among the chromatograms.

Figure 3. Overlaid chromatograms of diesel data; FIEFA bar shows regions of up to 3 factors

The full data set was area normalized, then processed by the 2-component MCR algorithm. The resulting estimates for the pure component profiles are shown in the next figure.

Figure 4. Resolved profiles of raw diesel data

The general shapes of the profiles are in reasonable agreement with the true profiles; the significant negative-going spikes adjacent to the paraffin peaks are a manifestation of the misalignment among these profiles. The corresponding contribution estimates from the two sources are shown in Table 1.

Table 1. Amounts of diesel sources in sample mixtures, resolved by MCR

To compensate for the non-reproducibility in retention times, the profiles were aligned by a rubber-banding procedure, using the n-paraffins as alignment markers. The aligned data are shown below.

Figure 5. Overlaid chromatograms of aligned diesel data
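The alignment step can be illustrated with a short Python/NumPy sketch. The details of the rubber-banding procedure are not given here, so the sketch assumes a piecewise-linear stretching of the scan axis between marker peaks, with the ends of the trace held fixed; the function name `rubber_band_align` and the two-peak test trace are hypothetical stand-ins for a chromatogram with n-paraffin markers.

```python
import numpy as np

def rubber_band_align(signal, markers, ref_markers):
    """Warp `signal` so that the points at `markers` move to `ref_markers`.

    Both marker lists must be increasing and lie strictly inside the trace.
    """
    n = signal.size
    x = np.arange(n, dtype=float)
    # Hold both ends of the trace fixed so the warp covers the whole scan range
    src = np.concatenate(([0.0], np.asarray(markers, dtype=float), [n - 1.0]))
    dst = np.concatenate(([0.0], np.asarray(ref_markers, dtype=float), [n - 1.0]))
    # For each aligned scan position, find the matching position in the original trace
    sample_at = np.interp(x, dst, src)
    # Resample the original trace at those positions (linear interpolation)
    return np.interp(sample_at, x, signal)

# Test trace with two marker peaks at scans 100 and 300 (assumed positions)
x = np.arange(400)
trace = np.exp(-0.5 * ((x - 100) / 3.0) ** 2) + np.exp(-0.5 * ((x - 300) / 3.0) ** 2)

# Move the markers to reference positions 105 and 295
aligned = rubber_band_align(trace, markers=[100, 300], ref_markers=[105, 295])
first_peak = int(np.argmax(aligned[:200]))
second_peak = 200 + int(np.argmax(aligned[200:]))
print(first_peak, second_peak)
```

Segments between markers are stretched or compressed independently, so peaks between two markers shift in proportion to their distance from each marker.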

With the aligned data, a 2-factor region (8500 - 12500 scans) is pronounced, while regions of 3 factors are of little significance. The profiles resolved by MCR are shown in the next figure.

Figure 6. Resolved profiles of aligned diesel data

The negative spikes in these profiles are no longer present, and the source contributions in Table 1 are closer to the known values, validating an improvement from the alignment procedure.

Two additional refinements were made to these data to determine feasibility and reduce processing time. First, the data were subsampled by a bunching procedure, reducing the data density by 1/4. Second, only the region shown earlier to contain 2 factors was retained. The result is the following data set.

Figure 7. Overlaid chromatograms of aligned, binned and truncated diesel data
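The two refinements can be sketched in Python/NumPy as shown below. The sketch assumes that bunching means averaging each group of adjacent points and that the reduction keeps one point for every four original points; neither detail is spelled out in the text. The function name `bunch`, the small test matrix, and the column range used for truncation are hypothetical.

```python
import numpy as np

def bunch(D, factor=4):
    """Average each run of `factor` adjacent points along the measurement axis."""
    m, n = D.shape
    n_keep = (n // factor) * factor            # drop any incomplete trailing bunch
    return D[:, :n_keep].reshape(m, n_keep // factor, factor).mean(axis=2)

# Small test matrix: 2 chromatograms of 10 points each
D = np.arange(20, dtype=float).reshape(2, 10)
bunched = bunch(D, factor=4)
print(bunched)

# Truncation: keep only a chosen range of (bunched) columns, here columns 0 and 1
truncated = bunched[:, 0:2]
```

Averaging rather than simple decimation also smooths high-frequency noise, which is one reason a bunched data set can perform as well as the full-density data.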

MCR results from this smaller data set are essentially identical to those for the full, aligned data set; the source contributions are again shown in Table 1.

These 3 data sets were also processed with ALS. The contributions of the sources found by ALS are presented in Table 2.

Table 2. Amounts of diesel sources in sample mixtures, resolved by ALS

Standard errors for the refined data sets (aligned; and aligned, filtered and truncated) are slightly lower with MCR than with ALS.
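The RMSEP values reported in the tables are not defined explicitly in the text. The Python/NumPy sketch below assumes the usual root mean square error over all samples and applies it to the Diesel #2 columns of Table 1 (true amounts, raw profiles, and aligned profiles); under that assumption the reported values are reproduced to three decimals.

```python
import numpy as np

def rmsep(true, est):
    """Root mean square error between known and estimated fractions."""
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((est - true) ** 2)))

# Diesel #2 fractions from Table 1 (MCR results)
true_d2 = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
mcr_raw = [1.000, 0.915, 0.900, 0.751, 0.724, 0.583, 0.326, 0.288, 0.185, 0.058, 0.000]
mcr_aligned = [1.000, 0.901, 0.797, 0.719, 0.601, 0.517, 0.400, 0.324, 0.207, 0.124, 0.000]

print(round(rmsep(true_d2, mcr_raw), 3))       # 0.062
print(round(rmsep(true_d2, mcr_aligned), 3))   # 0.013
```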

Reservoir data. The mixture data set from the reservoir samples is shown, together with the FIEFA purity bar, in the following figure. In contrast with the diesel data, the chromatographic profiles of the reservoir data are very similar, with essentially no regions of unique signal.

Figure 8. Overlaid chromatograms of aligned reservoir data

Most of the chromatographic region is composed of 2 factors, although the first 2000-3000 scan points appear to contain 3 or more factors. Applying MCR to the whole profiles resulted in significant errors in the estimated source contributions, as shown in the corresponding columns of Table 3. In fact, the scores space was so perturbed by inclusion of the early eluting region that the algorithm improperly inferred which samples were the pure sources.

Table 3. Amounts of reservoir sources in sample mixtures

These data were filtered by the bunching procedure described previously, and the region of early eluting peaks was truncated. The result is a smaller data set which is composed mostly of 2 factors, as shown below.

Figure 9. Overlaid chromatograms of aligned, filtered, truncated reservoir data

The MCR algorithm applied to these data generated the source profiles presented in the following figure. Because the two resolved profiles are so similar, a subregion has been enlarged in the plot to show the small differences discerned by the algorithm. The corresponding source amounts are compared to the mixture compositions in Table 3.

Figure 10. Resolved profiles of aligned and filtered reservoir data

The results from the ALS algorithm run on the reservoir data are very similar to those from MCR. Table 3 shows the estimated source contributions, and the figure below compares the resolved profiles for the two algorithms. Note that the profiles for source 2 are essentially identical, while only very small differences can be detected for the profiles of source 1.

Figure 11. Resolved profiles of Reservoir data, comparing MCR and ALS results

SUMMARY

The diesel blend mixtures were prepared in a well-controlled laboratory setting, and the results from the mixture analyses are quite good. In addition, the regions in the chromatograms unique to one source yield a very restricted feasible region and good estimates of the underlying profiles.

The reservoir data represent a more typical scenario in that the source materials are not as well defined and the profiles have no unique channels. Yet, both mixture analysis algorithms yield acceptable results even for this more challenging case. However, to achieve these results, it is necessary to exclude regions of confounding information such as the early eluting peaks in the chromatograms.

Both algorithms produce acceptable results on the whole chromatographic profiles when confounding regions are excluded. Filtering to reduce the data density produces outcomes of equal or better quality, indicating that data rates could be reduced accordingly, with concomitant savings in storage requirements.

Mathematical unmixing of chromatographic data is of use in the routine assessment of hydrocarbon mixtures both in upstream (exploration) and downstream (refining) applications. The result is a rapid assessment of the contribution of complex, often chemically-similar sources.

REFERENCES

1. Lawton, W.H. and Sylvestre, E.A. Technometrics. 13(3):617-633 (1971).

2. Borgen, O.S. and Kowalski, B.R. Anal. Chim. Acta. 174:1-26 (1985).

3. Kim, B.M. and Henry, R.C. Chemometrics Intell. Lab. Systems. 49:67-77 (1999).

4. Leger, M.N. and Wentzell, P.D. Chemometrics Intell. Lab. Systems. 62:171-188 (2002).

5. Windig, W. and Guilment, J. Anal. Chem. 63(14):1425-1432 (1991).

6. Full, W.E.; Ehrlich, R.; and Klovan, J.E. J. Math. Geology 13:331-344 (1981).

7. Tauler, R.; Smilde, A.; and Kowalski, B.R. J. Chemometrics 9:31-58 (1995).

Special thanks are due Russ Kaufman and Elizabeth Harvey of ChevronTexaco for supplying the chromatographic data and providing funding for this research.

[1]  $D = CP' + E$

[2]  $D = USV'$

[3]  $D \approx \hat{U}\hat{S}\hat{V}'$

[4]  $p_i = \hat{V}t_i$

[5]  $P = \hat{V}T$

[6]  $C = \hat{U}\hat{S}(T')^{-1}$

[Figure 1 plots: panels (a) and (b), each with axes Factor1 and Factor2]

[7]  $P' = (C'C)^{-1}C'D$

[8]  $C = DP(P'P)^{-1}$

Table 1 values (diesel sources resolved by MCR)

          True amounts        Raw profiles        Aligned profiles    Filtered profiles
Sample #  Diesel #2 Diesel #1 Diesel #2 Diesel #1 Diesel #2 Diesel #1 Diesel #2 Diesel #1
1         1.0       0.0       1.000     0.000     1.000     0.000     1.000     0.000
2         0.9       0.1       0.915     0.085     0.901     0.099     0.896     0.104
3         0.8       0.2       0.900     0.100     0.797     0.203     0.797     0.203
4         0.7       0.3       0.751     0.249     0.719     0.281     0.718     0.282
5         0.6       0.4       0.724     0.276     0.601     0.399     0.590     0.410
6         0.5       0.5       0.583     0.417     0.517     0.483     0.507     0.493
7         0.4       0.6       0.326     0.674     0.400     0.600     0.396     0.604
8         0.3       0.7       0.288     0.712     0.324     0.676     0.316     0.684
9         0.2       0.8       0.185     0.815     0.207     0.793     0.201     0.799
10        0.1       0.9       0.058     0.942     0.124     0.876     0.118     0.882
11        0.0       1.0       0.000     1.000     0.000     1.000     0.000     1.000
RMSEP                         0.062     0.062     0.013     0.013     0.010     0.010

[Figure 2 plots: (a) "Fused Peak", Intensity vs. Scan Number; (b) "FIEFA Eigenvalues", Variance vs. Scan Number, with traces EV 1 and EV 2]


[Figure 4 plot: "Diesel Mixtures:MCR:Source Profiles", Source Profiles (E -05) vs. Retention Time Index (E +02)]

Table 2 values (diesel sources resolved by ALS)

          True amounts        Raw profiles        Aligned profiles    Filtered profiles
Sample #  Diesel #2 Diesel #1 Diesel #2 Diesel #1 Diesel #2 Diesel #1 Diesel #2 Diesel #1
1         1.0       0.0       0.966     0.034     0.989     0.000     0.988     0.000
2         0.9       0.1       0.885     0.115     0.891     0.098     0.903     0.102
3         0.8       0.2       0.865     0.135     0.812     0.201     0.796     0.201
4         0.7       0.3       0.728     0.272     0.734     0.280     0.720     0.280
5         0.6       0.4       0.696     0.304     0.603     0.400     0.605     0.407
6         0.5       0.5       0.566     0.434     0.512     0.481     0.510     0.490
7         0.4       0.6       0.334     0.666     0.413     0.598     0.406     0.600
8         0.3       0.7       0.296     0.704     0.326     0.673     0.321     0.679
9         0.2       0.8       0.199     0.801     0.216     0.790     0.212     0.794
10        0.1       0.9       0.083     0.917     0.120     0.872     0.118     0.876
11        0.0       1.0       0.022     0.978     0.000     0.996     0.000     0.993
RMSEP                         0.048     0.048     0.017     0.015     0.012     0.012


[Figure 6 plot: "Diesel Mixtures:MCR:Source Profiles", Source Profiles vs. Retention Time Index (E +02)]


Table 3 values (reservoir sources)

          True amounts            Whole profiles, MCR     Filtered profiles, MCR  Filtered profiles, ALS
Sample #  Reserv. #1  Reserv. #2  Reserv. #1  Reserv. #2  Reserv. #1  Reserv. #2  Reserv. #1  Reserv. #2
1         1.0         0.0         0.416       0.584       1.000       0.000       0.995       0.005
2         0.0         1.0         0.602       0.398       0.000       1.000       0.026       0.987
3         0.8         0.2         0.000       1.000       0.787       0.213       0.807       0.202
4         0.6         0.4         0.745       0.255       0.612       0.388       0.603       0.395
5         0.5         0.5         0.489       0.511       0.471       0.529       0.450       0.544
6         0.4         0.6         0.542       0.458       0.492       0.508       0.481       0.517
7         0.2         0.8         1.000       0.000       0.178       0.822       0.116       0.873
RMSEP                             0.538       0.538       0.038       0.038       0.049       0.045


[Figure 10 plot: "Reservoir Data:MCR:Source Profiles", Source Profiles vs. Retention Time Index; panel label "Full Data"]

[Figure 11 plot: Response (E -05) vs. Retention Time Index (640 to 720); legend: Source 1 - MCR, Source 1 - ALS, Source 2 - MCR, Source 2 - ALS]
