
Real time phase detection based online monitoring of batch...

Date post: 06-May-2018
Page 1: Real time phase detection based online monitoring of batch ...dspace.library.iitb.ac.in/jspui/bitstream/10054/1713/1/5278.pdf · Real time phase detection based online monitoring

Real time phase detection based online monitoring of batch fermentation processes

Soumen K. Maiti a, Rajesh K. Srivastava b, Mani Bhushan a, Pramod P. Wangikar a,b

a Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

b School of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

Keywords: Principal component analysis; Mean; Covariance; Moving window; Singular point

ABSTRACT

Industrial fermentations conducted in a batch or semi-batch mode demonstrate significant batch-to-batch variability. Current batch process monitoring strategies involve manual interpretation of highly informative but low frequency offline measurements such as concentrations of products, biomass and substrates. Fermentors are also fitted with computer interfaced instrumentation, enabling high frequency online measurements of several variables, and automated techniques which can utilize this data would be desirable. Evolution of a batch fermentation, which typically uses complex medium, can be conceptualized as a sequence of several distinct metabolic phases. Monitoring of batch processes can then be achieved by detecting the phase change events, also termed singular points (SP). In this work, we propose a novel moving window based real-time monitoring strategy for SP detection based only on online measurements. The key hypothesis of the strategy is that the statistical properties of the online data undergo a significant change around an SP. The strategy is easily implementable and does not require past data or prior knowledge of the number or time of occurrence of SPs. The efficacy of the proposed approach has been demonstrated to be superior compared to that of reported techniques for industrially relevant model organisms. The proposed approach can be used to decide offline sampling timings in real time.

1. Introduction

Fermentation processes are widely used in food, pharmaceutical, agrochemical and chemical industries. The production units range from small scale for biopharmaceuticals to large scale for bulk chemicals. A majority of the processes are operated in a batch or semi-batch mode. Intense competition and regulatory requirements pose severe demands on consistency of these batches in terms of the end of batch productivity and product quality [1]. However, fermentation processes are subject to intrinsic batch-to-batch variability due to variability in raw material quality, state of the seed culture and operator skills. It is therefore desirable to automate monitoring, fault detection and diagnosis and control of fermentation processes. This can lead to improved process reliability, product quality and productivity as well as reduced development time, manpower inputs and cost of production [2].

Typically, during operation, the product quality and batch performance are monitored via off-line measurements of concentrations of the product, byproducts, biomass and substrates. These measurements are expensive, labor intensive and time consuming, are obtained at low frequencies (e.g., every few hours) at pre-defined intervals and hence may not always lead to timely information about the status of the batch. Further, in some processes, the product formation begins only towards the later parts of the batch and this leads to additional difficulty in adequately monitoring the process using these offline measurements [3]. Fermentors are typically equipped with several on-line sensors such as pH, temperature, concentrations of dissolved oxygen (DO) and carbon dioxide and partial pressure of oxygen and carbon dioxide in the exhaust gas. These measurements are inexpensive, usually available at high frequencies (e.g., every few seconds) and are obtained in an automated fashion. Hence, there is enormous potential to use these measurements to effectively monitor batch fermentation processes.

In the general process systems engineering literature, several different techniques have been reported for process monitoring and fault diagnosis [4]. These can be broadly classified as process model based, knowledge based and historical data based. The success of any model based strategy depends critically on the adequacy of the underlying model. Industrial fermentation processes typically employ complex media with multiple substitutable carbon and nitrogen substrates, which leads to difficulties in developing adequate process models. Further, several aspects of fermentation processes, such as the dynamic evolution of pH and concentration of dissolved oxygen, are not well understood in general and this may lead to additional difficulties in developing reliable process models. Hence, model based strategies may not be suitable for monitoring of the majority of industrial fermentation processes. Knowledge based monitoring techniques such as those based on fuzzy logic require expert knowledge of the system and therefore are system specific [4,5]. Such expert knowledge may not be available for the system of interest. Further, even for cases where such knowledge exists in terms of the manpower knowledgeable about the system, it is not straightforward to translate such knowledge to a form that can be readily utilized by automated monitoring systems. Historical data based methods rely on a large amount of past data to capture the underlying relationships between the process variables [4,6,7]. However, due to batch-to-batch variability intrinsic to fermentation processes, it is difficult for these techniques to delineate between normal and abnormal variations.

Another set of methods, based on ideas from statistical control literature, have been proposed that rely only on data available from the current batch [8–10]. Fermentation processes typically utilize complex organic substrates such as yeast extract in addition to defined components such as glucose and ammonia. This provides a substitutable multisubstrate milieu, which may result in sequential and/or simultaneous utilization of the substrates. The cellular metabolism may be different in each such substrate uptake phase [11]. Evolution of a batch fermentation process can then be conceptualized as a sequence of such phases, each with its own duration and dynamics. It is expected that batch-to-batch variability would therefore, among other things, translate to variations in switching times between the phases [12]. Hence, effective monitoring can be achieved by detecting the time of occurrence of these various phases. The reported technique based on this philosophy detects the phase change time by identifying qualitative changes in trajectories of the test statistic $T^2$ and principal component score plots [9]. Being qualitative in nature, this technique is difficult to automate. While other statistical process monitoring techniques such as Shewhart charts, cumulative sum (CUSUM) and exponentially weighted moving average (EWMA) [8,10,13] have been applied for monitoring batch processes in general, they have not been specifically applied for monitoring fermentation processes characterized by multiple phases, since it is typically assumed in these techniques that the entire batch data is characterized by a single set of statistical properties (such as mean and covariance).

In this article, we present a real time phase detection based process monitoring scheme that does not require a process model or historical data. The scheme is inspired by statistical control literature, is multivariate in nature, relies only on online measurements and can be easily automated to work with industrial processes. The basic premise in our approach is that statistical properties of online measured data are different in different phases. Hence, the problem of phase change detection is treated to be equivalent to that of detection of changes in statistical properties of the data. To be consistent with earlier work [9,12], we refer to a point where phase change is detected as a singular point (SP).

2. Experimental methods

In this study, experimental data has been collected for two different strains, Amycolatopsis balhimycina DSM5908 and Bacillus pumilus ATCC 21951, while the data for Amycolatopsis mediterranei S699 was taken from Doan et al. [9]. For A. balhimycina and B. pumilus, the fermentation experiments were performed in a 2.5 l fermentor equipped with various sensors and a data acquisition system (Model: Biostat B, B. Braun, Germany). The fermentor was aerated at a constant flow rate of 1.0 vvm (volume of air per unit volume of medium per minute) using a mass flow controller. Dissolved oxygen (DO) concentration in the fermentor was maintained at 40% of saturation value by controlling the stirrer speed in cascade mode with DO. The concentrations of oxygen and carbon dioxide in the exhaust gas were measured by paramagnetic analysis and infrared spectroscopy, respectively (Analyser BINOS1002 M, Rosemount Analytical, Germany). The online measurements were stored at 5 min intervals.

The Amycolatopsis balhimycina strain was a gift from Prof Anna Eliasson Lantz of Denmark's Technical University, Denmark, and was stored on Bennett agar plates at 4 °C. Seed culture was grown in 100 ml medium in a 500 ml capacity Erlenmeyer flask with single baffle and incubated at 30 °C and 150 rpm. The seed medium contained per liter of distilled water: glucose: 15 g, glycerol: 15 g, soya peptone: 15 g, NaCl: 5 g and yeast extract: 3 g. Upon reaching an optical density of approximately 12 at 600 nm, 25 ml of the seed culture was transferred to a fermentor containing 1 l of production medium. The production medium contained, per liter of distilled water, glucose: 54–100 g, glycerol: 0–16 g, ammonium sulfate: 3–6.6 g, yeast extract: 0.75–1.5 g, defatted soybean flour: 0.25–1.0 g, ZnSO4: 0.02 g, FeSO4: 0.02 g, trisodium citrate: 0.025 g, MgSO4: 1.5 g, MnSO4: 0.01 g, NaCl: 1 g, MES: 1.045 g and KH2PO4: 0.2 g. In addition, the following vitamins were added: biotin: 0.00005 g, calcium pantothenate: 0.001 g, nicotinic acid: 0.001 g, myo-inositol: 0.025 g, thiamine HCl: 0.001 g, pyridoxine HCl: 0.001 g and para-aminobenzoic acid: 0.0002 g. Temperature was maintained at 30 °C and pH was maintained at 7.0 by adding 1.5 N NaOH solution using a pH controller. The online measurements included NaOH flow rate, pH, agitator speed and DO.

A transketolase (tkt) deficient strain of Bacillus pumilus ATCC 21951 was procured from the Institute for Fermentation, Osaka, Japan. The strain was maintained on Luria Bertani agar slant and was stored at 4 °C. The preparation of pre-seed and seed cultures and the culture transfer criteria were as described earlier [14]. The production medium contained per liter of distilled water: glucose: 200 g, casamino acids: 15 g or corn steep liquor: 12 g, ammonium sulfate: 5 g, CaCO3: 16 g, MnSO4: 0.5 g, leucine: 0.5 g and tryptophan: 0.05 g. The temperature was maintained at 37 °C. The online measurements available for Bacillus pumilus were: pH, dissolved oxygen, agitator speed and CO2 and O2 concentration in exhaust gas.

For both the strains, samples were drawn from the fermentation medium at regular intervals to obtain the time profiles of concentrations of dry cell weight (DCW), product(s) and substrate(s). Glucose, glycerol, D-ribose, acetate, acetoin and 2,3-butanediol were analyzed via RI detector on HPLC (Hitachi, Merck KGaA, Darmstadt, Germany) using an HP-Aminex-87-H column (Biorad, Hercules, CA, USA) with column temperature maintained at 60 °C. A mobile phase of 5 mM sulfuric acid with a flow rate of 0.6 ml/min was used. The concentration of free amino acids was estimated via the ninhydrin method. The details are described in earlier works [11,14,15]. Ammonia was measured using Nessler's reagent [16]. For A. balhimycina, DCW was measured by filtering 10 ml of the fermentation broth using pre-weighed filter papers (Whatman, Brentford, Middlesex, UK) as reported elsewhere [11]. Micrococcus luteus was used as a test organism to measure antimicrobial activity of balhimycin [17]. For this purpose, agar test plates with Micrococcus luteus growth medium were prepared. Holes were punched in the agar medium and filled with fermentor samples. Then the plates were incubated for two days at 30 °C. The growth inhibition diameter around the holes was measured and the concentration of balhimycin was determined using a pre-computed calibration curve.

The data for Amycolatopsis mediterranei S699 was taken from literature [9] and consisted of the following online measurements: pH, dissolved oxygen, agitator speed and CO2 and O2 concentration in exhaust gas.

3. Phase detection technique

3.1. Algorithm

In this work, the problem of monitoring of fermentation process has been posed as that of detection of singular points (SPs). We assume that the underlying characteristic dynamics and in turn the statistical properties of the online data vary from one phase to another. Thus, we propose that an SP can be detected by appropriately detecting the change in the statistical properties of the available online data as described below (Fig. 1). For the current phase $\phi_i$ and a new data point $x_k$ ($x_k = [x_{1k}\ x_{2k}\ x_{3k} \ldots x_{pk}]$, where p is the number of variables being measured), the following hypothesis is checked:

$$\begin{aligned} \text{Null hypothesis: } & H_0: x_k \in \phi_i \\ \text{Alternative hypothesis: } & H_1: x_k \notin \phi_i \end{aligned} \tag{1}$$

Fig. 1. Schematic representation of the proposed "Moving window-dynamic principal component analysis (MW-DPCA)" approach for singular point (SP) detection.

Let the data belonging to phase $\phi_i$ correspond to a probability distribution $P_i$. Then these hypotheses can be tested by constructing an appropriate test statistic depending on the nature of the distribution $P_i$. In this work, we assume that $P_i$ is a normal distribution, i.e. $P_i = N(\mu_i, \Sigma_i)$, where $\mu_i$ and $\Sigma_i$ are the mean and covariance matrix of the data corresponding to phase $\phi_i$, which can be approximated by the sample average $\bar{x}_i$ and sample covariance $S_i$ calculated from the available data belonging to phase $\phi_i$ as

$$\bar{x}_i = \frac{1}{n_i} \sum_{x_j \in \phi_i} x_j, \qquad S_i = \frac{1}{n_i - 1} \sum_{x_j \in \phi_i} (x_j - \bar{x}_i)^T (x_j - \bar{x}_i) \tag{2}$$

where $n_i$ is the number of data points belonging to phase $\phi_i$. In order to obtain reliable estimates of the mean $\mu_i$ and covariance matrix $\Sigma_i$, the hypothesis testing is performed only after collecting data for a minimum window length ($W_{min}$), i.e., $n_i \ge W_{min}$. The relevant test statistic is then [18]:

$$T_k^2 = (x_k - \bar{x}_i)\, S_i^{-1}\, (x_k - \bar{x}_i)^T \tag{3}$$

which represents the squared Mahalanobis distance of the current point $x_k$ from the mean of the data corresponding to phase $\phi_i$. The null hypothesis is rejected when $T_k^2$ violates the lower or upper control limit, $T^2_{LCL}$ or $T^2_{UCL}$, i.e. when

$$T_k^2 \le T^2_{LCL} \quad \text{or} \quad T_k^2 \ge T^2_{UCL} \tag{4}$$

where

$$T^2_{LCL} = \frac{p (n_i - 1)(n_i + 1)}{n_i (n_i - p)}\, F(\alpha/2;\ p,\ n_i - p) \tag{5}$$

$$T^2_{UCL} = \frac{p (n_i - 1)(n_i + 1)}{n_i (n_i - p)}\, F(1 - \alpha/2;\ p,\ n_i - p) \tag{6}$$

where $\alpha$ is the significance level. For this study, $\alpha = 0.01$ has been used. When the null hypothesis is rejected, a phase change event is declared and the index i, which keeps track of the number of phases detected so far, is incremented by 1: i = i + 1. Data corresponding to the new phase is then collected afresh and the procedure continued. On the other hand, when the null hypothesis is accepted, the current point is appended to the sample available for phase $\phi_i$ and the statistical properties of this phase are recomputed by Eq. (2) before testing the hypothesis (i.e. Eq. (1)) for the next available measurement.
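The basic scheme of Eqs. (2) to (6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, NumPy and SciPy are assumed to be available, and the point that triggers a rejection is assumed to become the first point of the new phase (the text does not state this explicitly).

```python
import numpy as np
from scipy.stats import f as f_dist


def t2_limits(n, p, alpha=0.01):
    """Lower and upper control limits of Eqs. (5) and (6)."""
    scale = p * (n - 1) * (n + 1) / (n * (n - p))
    lcl = scale * f_dist.ppf(alpha / 2, p, n - p)
    ucl = scale * f_dist.ppf(1 - alpha / 2, p, n - p)
    return lcl, ucl


def t2_statistic(x_new, phase_data):
    """T^2 of Eq. (3) for one new row vector against the current phase data."""
    mean = phase_data.mean(axis=0)
    cov = np.cov(phase_data, rowvar=False)  # divides by n_i - 1, as in Eq. (2)
    diff = x_new - mean
    return float(diff @ np.linalg.solve(cov, diff))


def detect_singular_points(data, w_min=30, alpha=0.01):
    """Return the row indices at which a phase change is declared."""
    sps, start = [], 0
    for k in range(len(data)):
        n = k - start  # number of points currently assigned to the phase
        if n < w_min:
            continue  # keep collecting data before testing
        lcl, ucl = t2_limits(n, data.shape[1], alpha)
        t2 = t2_statistic(data[k], data[start:k])
        if t2 <= lcl or t2 >= ucl:
            sps.append(k)
            start = k  # assumption: the rejected point starts the new phase
    return sps
```

A point that is not rejected is implicitly appended to the phase, because the slice `data[start:k]` grows by one row at the next iteration.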


The proposed approach, if implemented directly, can suffer from the following drawbacks: (a) high sensitivity to measurement and process noise: this can lead to a high false alarm rate (detection of an SP even if there has been no phase change in the process), (b) inability to capture dynamic relationships in the measured data, and (c) unnecessary computational overload if online data is very frequent, since time constants of fermentation processes may be much larger. To deal with these drawbacks, we incorporate the following modifications to the basic approach.

3.1.1. Incorporating robustness to noise

(i) A phase change event is declared only if the null hypothesis is rejected for at least h out of j consecutive data points. The parameters h and j can be tuned to achieve an acceptable trade off between false alarm rate and speed of phase change detection.

(ii) Principal component analysis (PCA): PCA involves projecting the measured data onto a few orthogonal directions (referred to as loadings) and monitoring the projection of the data (scores) only on those directions. These orthogonal directions are the eigenvectors of the covariance matrix corresponding to its largest eigenvalues. The number of directions used depends on the fraction of variability of the data captured in those directions. Based on this number b, $T^2$ and the control limits used for monitoring are changed as [18]:

$$T_k^2 = (x_k - \bar{x}_i)\, P_b\, [\mathrm{diag}(\lambda_b)]^{-1}\, P_b^T\, (x_k - \bar{x}_i)^T \tag{7}$$

where $\mathrm{diag}(\lambda_b)$ is the diagonal matrix of the b largest eigenvalues of the covariance matrix $S_i$ and the columns of the matrix $P_b$ are the corresponding b eigenvectors of $S_i$. The corresponding expressions for the lower and upper control limits are [19]

$$T^2_{LCL} = \frac{b (n_i - 1)(n_i + 1)}{n_i (n_i - b)}\, F(\alpha/2;\ b,\ n_i - b) \tag{8}$$

$$T^2_{UCL} = \frac{b (n_i - 1)(n_i + 1)}{n_i (n_i - b)}\, F(1 - \alpha/2;\ b,\ n_i - b) \tag{9}$$

While the covariance matrix was utilized in the above discussion on PCA, use of the correlation matrix for PCA has also been reported [9]. We provide results based on both the techniques.
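As an illustration of the PCA-based statistic, the sketch below computes the $T^2$ value in the space of the b leading principal components of the sample covariance matrix. It is a minimal sketch under stated assumptions: the function name `pca_t2` is hypothetical, NumPy is assumed, and only the covariance-based variant is shown (for the correlation-based variant the data would first be scaled to unit variance).

```python
import numpy as np


def pca_t2(x_new, phase_data, b):
    """T^2 of a new row vector using the b leading principal components."""
    mean = phase_data.mean(axis=0)
    cov = np.cov(phase_data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:b]    # indices of the b largest
    lam_b = eigvals[order]                   # retained eigenvalues
    p_b = eigvecs[:, order]                  # loadings (columns)
    scores = (x_new - mean) @ p_b            # projection onto the loadings
    return float(np.sum(scores ** 2 / lam_b))
```

When b equals the number of measured variables, this value coincides with the full statistic computed from the inverse covariance matrix, which gives a convenient sanity check.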

3.1.2. Incorporation of dynamic relationships among variables

The current online measurements may depend on the past online measurements. To capture such dynamic relationships, appropriately lagged data can be added to the current measurement [12]. Let the data at current time be related to data up to d samples in the past, where d is known as the lag; then the current data $x_k$ is modified as $x_k^d = [x_k\ x_{k-1} \ldots x_{k-d}]$. Accordingly, the current phase data matrix $X_i$ is changed to $X_i^d$, and the mean and covariance matrix of the current phase, and the upper and lower control limits, are also changed to reflect this modification. The parameter d needs to be tuned to obtain a balance between the predictive ability of the model, computational cost and speed of SP detection. To be consistent with the nomenclature used in literature [18], PCA when applied to the lagged data will be referred to as Dynamic PCA (DPCA).
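The construction of the lagged data can be sketched as follows. The helper name `lagged_matrix` is hypothetical and NumPy is assumed; each output row holds the current sample followed by its d predecessors, so the first d samples of a phase produce no row.

```python
import numpy as np


def lagged_matrix(data, d):
    """Stack each row with its d predecessors: [x_k, x_{k-1}, ..., x_{k-d}]."""
    n = len(data)
    if d >= n:
        raise ValueError("lag d must be smaller than the number of samples")
    # Row r of the result corresponds to time index k = r + d.
    return np.hstack([data[d - lag : n - lag] for lag in range(d + 1)])
```

With p measured variables the lagged matrix has p(d + 1) columns, which is why larger lags raise both the computational cost and the number of samples needed before the covariance matrix can be estimated reliably.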

3.1.3. Reducing the computational requirement

Typically the online data is measured at time scales (order of seconds) which are much faster than the time constants (typically order of hours) of fermentation processes. We consider data sampled at a lower frequency (sampling rate $\tau$) for detecting phase shifts. This sampling rate is then a tuning parameter which should be chosen to be consistent with the time constants of the fermentation process.

The overall approach with the above modifications is summarized in Fig. 1 and this technique will be referred to as "moving window dynamic PCA" (MW-DPCA). For the sake of comparison, in the results section, we have also considered SP detection without reducing the dimensionality of the data. For this purpose, Eqs. (3), (5) and (6) are used. This approach will be referred to as "moving window all dimensions" (MW-AD). The results are also compared with the conventional PCA (with and without lag) approach where the entire data is assumed to correspond to a single mean vector and covariance matrix [9]. For such a scenario, the $T^2$ for every data point follows a beta distribution since each data point is used to estimate the mean and covariance [19]. Then, with m denoting the number of data points used to estimate the mean and covariance, the lower control limit ($T^2_{LCL}$) and upper control limit ($T^2_{UCL}$) are determined as:

$$T^2_{LCL} = \frac{(m - 1)^2}{m}\, B\big(\alpha/2;\ b/2,\ (m - b - 1)/2\big) \tag{10}$$

$$T^2_{UCL} = \frac{(m - 1)^2}{m}\, B\big(1 - \alpha/2;\ b/2,\ (m - b - 1)/2\big) \tag{11}$$

Similar to the modifications adopted for the strategies proposed in this article, the SP detection algorithm based on these conventional PCA techniques also utilizes the heuristic that for an SP to be declared, h out of j consecutive points should be out of control limits, where data points are assumed to be available at sampling frequency $\tau$.
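The h-out-of-j heuristic can be sketched with a sliding window over the per-point test outcomes. This is an illustrative sketch only: the generator name is hypothetical, and which point inside the window is reported as the SP time is left open because the text does not specify it.

```python
from collections import deque


def h_of_j_alarm(violations, h, j):
    """Yield True whenever at least h of the last j points violated the limits."""
    window = deque(maxlen=j)  # keeps only the j most recent outcomes
    for violated in violations:
        window.append(bool(violated))
        yield sum(window) >= h
```

For example, with h = 2 and j = 4 the sequence 0, 1, 0, 1, 0, 0, 0 first raises an alarm at the fourth point. In the full algorithm the window would be cleared once a phase change is declared, since data for the new phase is collected afresh.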

3.2. Selection of model parameters

Implementation of the proposed algorithm in an effective manner requires the specification of the tuning parameters d (lag), $\tau$ (sampling rate), j, and h. In general, the optimal values of the parameters will vary from one organism to another due to significant differences in their fermentation physiology. While searching through the parameter space, the following values for these parameters have been considered regardless of the organism: $d \in \{0, 4, 8, 12, 16\}$, $\tau \in \{5, 10, 15, 20\}$ min, $j \in \{4, 5, 6, 7, 8\}$ and $h \in \{2, 3, 4, 5, 6, 7, 8\}$. We first construct the receiver operating characteristic (ROC) curve for all possible models (i.e. all combinations of parameters). ROC captures the trade off between the sensitivity and specificity for a binary classifier system [20,21]. The problem of SP detection can also be considered to be a binary classification problem where each data point needs to be classified as either a normal point (not an SP) or an SP as shown in Eq. (12).

$$\begin{aligned} \text{Null hypothesis } & H_0: \text{Point is not SP} \\ \text{Alternative hypothesis } & H_1: \text{Point is SP} \end{aligned} \tag{12}$$

Four types of outcomes are possible while testing these competing hypotheses: (i) true negative: H0 is actually true and it is not rejected by the model, (ii) true positive: H1 is actually true and H0 is rejected by the model, (iii) false negative: H0 is actually false but it is not rejected by the model, and (iv) false positive: H0 is actually true but is rejected by the model. For a given model and batch data, let the number of instances of each of the above outcomes be denoted by TN, TP, FN and FP respectively. For the given batch, knowing the true status of each time point (whether it is a normal point or an SP) from offline measurement data and/or expert knowledge, these numbers are then computed for each model (combination of parameters) under consideration.
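The parameter search described in this section amounts to enumerating a grid of candidate models. The sketch below lists the grid with the values quoted in the text; the variable names are illustrative, and discarding combinations with h > j is an assumption made here (such a rule could never trigger), not something stated in the text.

```python
from itertools import product

LAGS = [0, 4, 8, 12, 16]            # d, number of lagged samples
SAMPLING_MIN = [5, 10, 15, 20]      # sampling rate in minutes
J_VALUES = [4, 5, 6, 7, 8]          # j, window of consecutive points
H_VALUES = [2, 3, 4, 5, 6, 7, 8]    # h, required number of rejections

all_models = list(product(LAGS, SAMPLING_MIN, J_VALUES, H_VALUES))

# Assumption: a rule with h > j can never fire, so such models are skipped.
feasible_models = [(d, tau, j, h) for d, tau, j, h in all_models if h <= j]
```

Each tuple in `feasible_models` would then be run on a batch to obtain its TN, TP, FN and FP counts.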

All these models are then represented on the ROC curve, which is a plot of sensitivity versus 1 − specificity, where sensitivity and specificity are defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

In the ROC curve, models lying near the top left hand corner indicate the optimal trade off between high sensitivity and specificity.
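For a single model, its position on the ROC plot follows directly from the four counts. A minimal sketch, with hypothetical function names and made-up counts in the usage note:

```python
def sensitivity(tp, fn):
    """Fraction of true SPs that the model detects."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal points that the model leaves unflagged."""
    return tn / (tn + fp)


def roc_point(tp, tn, fp, fn):
    """(x, y) coordinates of one model on the ROC plot."""
    return 1 - specificity(tn, fp), sensitivity(tp, fn)
```

As an illustration with invented counts, a model with TP = 4, FN = 1, TN = 90 and FP = 10 has sensitivity 0.8 and specificity 0.9, and so sits at approximately (0.1, 0.8) on the plot.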


Based on the specific requirements (related to sensitivity and specificity) of the user, any of the models which capture the best trade off can be used. However, in the absence of such requirements this choice is not straightforward, and single metrics which are a combination of specificity and sensitivity can be used to rank these models [22]. One such popular metric is the Matthews Correlation Coefficient (MCC), which is defined as [23]

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{13}$$
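Eq. (13) translates directly into code. In the sketch below the function name is illustrative, and returning 0 when the denominator vanishes is a common convention assumed here rather than something specified in the text.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient computed from the four outcome counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention for an undefined ratio
    return (tp * tn - fp * fn) / denom
```

A model that classifies every point correctly gives 1, and one that classifies every point wrongly gives −1.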

The MCC takes values between −1 and +1. A value of −1 indicates complete mismatch between model predictions and the true nature of every point (SP or not an SP), while a value of +1 indicates complete agreement. For finding a common (across different batches), high-performing model for a given organism, models which have MCC values greater than t% of the maximum MCC (for that batch) and which do not miss more than some specified number $n_m$ of true SPs (for that batch) for all batches are considered. Here t and $n_m$ are user specified parameters. If more than one model satisfies these criteria, any of these models can be selected.

Fig. 2. Detection of singular points (SPs) in batch fermentation of D-ribose producing, transketolase deficient strain of Bacillus pumilus ATCC 21951. A) Profiles of the online measurements; B) $T^2$ plot for the MW-DPCA approach with covariance matrix using model parameters as d = 4, $\tau$ = 5 min, j = 6 and h = 2; C) $T^2$ plot for DPCA with covariance matrix using model parameters as d = 8, $\tau$ = 5 min, j = 4 and h = 2; and D) Profiles for the offline measurement of concentrations of the substrates, products and dry cell weight. The measured SPs (MSPs) are q: end of lag phase and start of exponential growth phase, r: ribose production start, s: end of exponential growth phase and start of stationary phase, u: end of stationary phase and v: glucose exhaustion. Initial media composition for the batch: glucose 200 g l⁻¹, corn steep liquor (CSL) 12 g l⁻¹, ammonium sulfate 5 g l⁻¹, CaCO3 16 g l⁻¹, MnSO4 0.5 g l⁻¹, leucine 0.5 g l⁻¹ and tryptophan 0.05 g l⁻¹. Gray filled circle symbols indicate the MSP, dark filled circle symbols (i.e., q', r', u' and v') denote the model predicted SPs (PSPs) which match with MSP and open circle symbols (i.e., w, z, y and t) denote the PSPs which do not match with MSP. In B, the arrows are indicative of the presence of points which go out of the range of the plot. In this particular batch, acetate and acetoin were not detected.
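The model selection rule based on the MCC threshold t and the allowed number of missed SPs can be sketched as a simple filter over per-batch results. The data layout (nested dictionaries keyed by batch and model) and the function name are assumptions made for illustration, and the numbers in the usage note are invented.

```python
def select_models(mcc_by_batch, missed_by_batch, t_percent, n_m):
    """Return models that satisfy both criteria in every batch.

    mcc_by_batch:    {batch: {model: MCC value}}
    missed_by_batch: {batch: {model: number of true SPs missed}}
    """
    batches = list(mcc_by_batch)
    models = sorted(mcc_by_batch[batches[0]])

    def passes(model, batch):
        best = max(mcc_by_batch[batch].values())  # maximum MCC for that batch
        good_mcc = mcc_by_batch[batch][model] > t_percent / 100 * best
        few_missed = missed_by_batch[batch][model] <= n_m
        return good_mcc and few_missed

    return [m for m in models if all(passes(m, b) for b in batches)]
```

With invented values, say MCCs of 0.8, 0.5 and 0.3 for models A, B and C in one batch and 0.6, 0.7 and 0.2 in another, and missed-SP counts of (1, 0, 0) and (3, 2, 0), a threshold of 50% with at most 2 missed SPs leaves only model B.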


4. Results

We present the results for the proposed moving window based techniques: (i) MW-DPCA using covariance; (ii) MW-DPCA using correlation; and (iii) MW-AD. The results are also compared with conventional approaches which use the data for the entire batch duration, namely, (iv) DPCA using covariance; (v) DPCA using correlation; and (vi) using all dimensions (AD). For the PCA based approaches, three PCs were used. For all the approaches, $W_{min}$ is chosen to be 30. A brief discussion about the rationale behind this choice is presented in Appendix A. Efficacy of these six approaches is compared for SP detection for case studies involving three different microorganisms. The predicted SPs (PSP) were compared with SPs identified manually (MSP) based on the offline measurements. Note that due to the low frequency of the offline measurements, the occurrence of true SPs may differ from the identified MSPs by some tolerance $\delta$. Hence, the following strategy was used:

If $\text{PSP} \in [\text{MSP} \pm \delta]$, then the PSP is considered a true SP.
If $\text{PSP} \notin [\text{MSP} \pm \delta]$, then the PSP is considered a false SP.
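The matching rule can be sketched as below, with the tolerance passed in as a parameter. The function name is hypothetical, and the additional list of missed MSPs (measured SPs with no predicted SP inside the tolerance band) is one plausible way of counting missed SPs rather than a definition taken from the text.

```python
def classify_psps(psps, msps, tol):
    """Split predicted SP times into true and false ones; also list missed MSPs."""
    def near(a, others):
        return any(abs(a - o) <= tol for o in others)

    true_sps = [p for p in psps if near(p, msps)]
    false_sps = [p for p in psps if not near(p, msps)]
    missed = [m for m in msps if not near(m, psps)]  # assumed definition
    return true_sps, false_sps, missed
```

For instance, with predicted SPs at 3, 10, 21 and 40 h, measured SPs at 4, 20 and 30 h and a tolerance of 2.5 h, the predictions at 3 and 21 h count as true SPs, those at 10 and 40 h as false SPs, and the measured SP at 30 h is missed.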

In this work, the tolerance $\delta$ was chosen to be 2.5 h.

4.1. Case study 1: Bacillus pumilus

The tkt deficient strains of Bacillus are reported to be commercially important for the production of D-ribose. Two batches I and II with different initial conditions were conducted, where the following online variables were recorded: pH, DO, agitator speed and concentrations of CO2 and O2 in the exhaust gas. Batch II was conducted in triplicate (labeled as IIa, IIb and IIc). As explained in the section on model parameter selection, for a given approach, MCC values and the number of missed SPs for all combinations of model parameters were calculated for these batches. The SP detection results for batch I with $t = 50$ and $n_m = 2$ are presented in Fig. 2. Other satisfactory models had only minor variations in model parameters such as $\eta$ and $\xi$ and are therefore not presented. The MSPs shown in this figure were identified based on the physiological characteristics as captured by the offline measurements shown in Fig. 2D. Fig. 2A shows the raw online data while Fig. 2B shows the $T^2$ values for the MW-DPCA approach along with the corresponding time varying values for $T^2_{\mathrm{LCL}}$ and $T^2_{\mathrm{UCL}}$. By monitoring the control limit violation of the $T^2$ value, the proposed MW-DPCA with covariance approach has detected four of the five MSPs. Two additional SPs are also detected. These may be false alarms or true SPs that are not captured as MSPs due to the low frequency of offline measurements. This type of prediction accuracy would be difficult to achieve simply by visual inspection of the online data, which shows sharp changes at several time points. For example, the plot of agitator speed has several visible troughs and peaks. However, not all these sharp changes correspond to SPs. Results for MW-DPCA with correlation and MW-AD approaches are not presented since these were inferior to those of MW-DPCA with covariance for this microorganism. For the sake of comparison, results for conventional single model techniques viz. DPCA with covariance, DPCA with correlation and AD were also generated. Fig. 2C shows the $T^2$ values along with the corresponding $T^2_{\mathrm{UCL}}$ and $T^2_{\mathrm{LCL}}$ for DPCA with covariance, which was found to be the best among these three conventional approaches. The static nature of DPCA is reflected in the time invariant nature of the $T^2$ control limits. It is seen from Fig. 2B and C that MW-DPCA performs better than DPCA.

Fig. 3. Monitoring of the batch-to-batch variability by monitoring the occurrence of SPs for batch fermentation of Bacillus pumilus conducted in triplicate. (A) MSPs (gray filled symbols) and PSPs (dark filled symbols when the time of PSP matches with that of MSP and open symbols otherwise) detected by MW-DPCA with covariance matrix using model parameter values of $d = 4$, $\tau = 5$ min, $\xi = 6$ and $\eta = 2$ for the triplicate batches. (B–H) Profiles of the offline measurements of concentrations of the substrates, products and dry cell weight. The MSPs for different batches are: Batch IIa: q: start of acetoin and ribose production, r: start of acetate formation, s: start of stationary phase, u: ammonium sulfate consumption stops, v: amino acid in culture broth starts to increase and death phase starts. Batch IIb: q: start of logarithmic phase, r: ribose production starts, s: end of logarithmic phase and start of acetate production, u: ammonium sulfate consumption stops, v: amino acid in media starts to increase and death phase starts. Batch IIc: q: ribose production starts and logarithmic phase ends, r: start of acetate formation, s: acetoin production starts and ammonium sulfate consumption stops, u: amino acid in media starts to increase and death phase starts. Media composition for triplicate batches is: glucose 200 g l⁻¹, cas amino acid 15 g l⁻¹, ammonium sulfate 5 g l⁻¹, CaCO3 16 g l⁻¹, MnSO4 0.5 g l⁻¹, leucine 0.5 g l⁻¹ and tryptophan 0.05 g l⁻¹.

Offline data and results for Batch II are shown in Fig. 3. From the offline data it can be seen that there is significant variability in these batches. In particular, D-ribose, acetate and acetoin production in batch IIc appears to be delayed compared to the other two batches. This variability is reflected in different times of occurrence of MSPs corresponding to similar events such as the start of death phase. The MW-DPCA model applied to batch I is able to capture the batch-to-batch variability in terms of the PSPs in batch II as well. Moreover, the results are superior to those of conventional DPCA (data not shown).

4.2. Case study 2: Amycolatopsis balhimycina

Balhimycin (a glycopeptide antibiotic) producer strain of A. balhimycina was cultivated in media containing multiple carbon and nitrogen substrates including complex sources such as yeast extract and defatted soybean flour. The substitutable substrates may be taken up sequentially or simultaneously, thereby complicating the task of SP detection from the raw online data. Seven batches (labeled I–VII) were conducted. Results for batches I–IV, with $t = 50$ and $n_m = 2$, and common model parameter values across batches for a given approach, are shown in Figs. 4 and 5. Due to space limitations, results for batches V–VII are presented as supplementary material. Batch I is characterized by three MSPs based on the offline data (Fig. 4B). The three moving window based approaches have successfully identified these SPs whereas the three single model conventional approaches have missed one or more of these SPs (Fig. 4A). For example, strategy (v) has missed all MSPs. It is interesting to note that between 36 and 48 h, most of the techniques have predicted two additional SPs, which may correspond to the diauxic nature of the growth as the organism begins to utilize glucose in this interval. Identification of such events, which fall in the interval between two offline sampling times, is possible only based on online data. The online data for this batch, in particular DO and agitator speed, is noisier than that for B. pumilus (Fig. 2A). This precludes manual identification of SPs by visual inspection of online data for this batch. However, the automated MW techniques have successfully identified the SPs.

Fig. 4. Comparison of different monitoring techniques for batch fermentation of balhimycin producing strain of Amycolatopsis balhimycina, batch I. (A) The MSPs (gray filled circle symbols) and PSPs (dark filled circle symbols when the time of PSP matches with that of MSP and open circle symbols otherwise) by different approaches: (i) MW-DPCA using covariance matrix, $d = 0$, $\tau = 5$ min, $\xi = 8$ and $\eta = 5$, (ii) MW-DPCA using correlation matrix, $d = 4$, $\tau = 10$ min, $\xi = 7$ and $\eta = 2$, (iii) MW-AD, $d = 0$, $\tau = 5$ min, $\xi = 5$ and $\eta = 3$, (iv) DPCA using covariance matrix, $d = 8$, $\tau = 5$ min, $\xi = 4$ and $\eta = 2$, (v) DPCA using correlation matrix, $d = 8$, $\tau = 5$ min, $\xi = 4$ and $\eta = 2$, (vi) using all dimensions (AD), $d = 8$, $\tau = 5$ min, $\xi = 8$ and $\eta = 3$ and (vii) MSPs. (B) Profiles for the offline measurement of concentrations of the substrates, products and dry cell weight. (C) Profiles of the online measurements. The MSPs are: q: start of antibiotic production, r: glycerol exhaustion, s: glucose consumption stops and stationary phase starts. Media composition for batch I: glucose 100 g l⁻¹, glycerol 10 g l⁻¹, yeast extract 1 g l⁻¹, ammonium sulfate 4.95 g l⁻¹, defatted soybean flour 1 g l⁻¹, and micronutrients ZnSO4 0.02 g l⁻¹, FeSO4 0.02 g l⁻¹, trisodium citrate 0.025 g l⁻¹, MgSO4 1.5 g l⁻¹, MnSO4 0.01 g l⁻¹, NaCl 1 g l⁻¹, KH2PO4 0.16 g l⁻¹, biotin 0.00005 g l⁻¹, calcium-pantothenate 0.001 g l⁻¹, nicotinic acid 0.001 g l⁻¹, myo-inositol 0.025 g l⁻¹, thiamin HCl 0.001 g l⁻¹, pyridoxine HCl 0.001 g l⁻¹, para-aminobenzoic acid 0.0002 g l⁻¹ and MES buffer 1.045 g l⁻¹.

Fig. 5. Monitoring of batch fermentation of A. balhimycina via MW-DPCA covariance matrix, $d = 0$, $\tau = 5$ min, $\xi = 8$ and $\eta = 5$ for three different batches. The profiles of offline measurements, MSPs (gray filled circle symbols) and PSPs (dark filled circle symbols when the time of PSP matches with that of MSP and open circle symbols otherwise) are shown for (A) Batch II; (B) Batch III; and (C) Batch IV. The MSPs of batch II are: q: end of glycerol and exponential growth phase, r: start of antibiotic production, s: start of death phase as well as ammonia increase in media and u: complete glucose exhaustion. Media composition for batch II: glucose 84 g l⁻¹, yeast extract 1.64 g l⁻¹, ammonium sulfate 6.0 g l⁻¹, defatted soybean flour 0.37 g l⁻¹. The MSPs of batch III are: q: start of antibiotic production, r: end of glycerol and end of logarithmic phase, s: start of death phase, u: glucose exhaustion. Media composition for batch III: glucose 51.5 g l⁻¹, glycerol 16.5 g l⁻¹, yeast extract 0.35 g l⁻¹, ammonium sulfate 6.0 g l⁻¹, defatted soybean flour 2.3 g l⁻¹. The MSPs of batch IV are: q: end of lag phase, r: end of first exponential phase, s: end of glycerol and start of antibiotic production, u: ammonium and amino acid start to increase in culture broth, v: end of glucose and start of death phase. Media composition for batch IV: glucose 68 g l⁻¹, glycerol 2.7 g l⁻¹, yeast extract 1.7 g l⁻¹, ammonium sulfate 6.6 g l⁻¹, defatted soybean flour 0.16 g l⁻¹. In each batch, micronutrients were added as mentioned in legend to Fig. 4.

The MW-DPCA results for batches II, III and IV along with the corresponding offline data are presented in Fig. 5A–C respectively. For batch II, all the MSPs are captured by the MW-DPCA approach. Some of the additional PSPs detected in this batch may correspond to phase change events not identified by the offline measurements. For batch III, three of the four MSPs are correctly identified while the MSP u does not match any PSP. However, it should be noted that the time of occurrence of u may be inaccurate due to lack of offline measurements between 108 and 120 h. It is interesting to note that there is a PSP at ~112 h, which may indicate the actual time of exhaustion of glucose (the event corresponding to MSP u). For batch IV, four of the five MSPs are captured by MW-DPCA. The missed MSP r corresponds to the end of first exponential growth phase. However, due to lack of offline measurements, this event could have actually occurred anytime in the 24–36 h interval. Once again, it is interesting to note that there is a PSP in this interval, which may correspond to the actual time of occurrence of this event.

4.3. Case study 3: Amycolatopsis mediterranei S699

Data for two batches I and II for this case study have been taken from Doan et al. [9]. The results for batch I for MW-DPCA with $t = 75$, $n_m = 2$ and DPCA with $t = 50$, $n_m = 1$ along with online and offline data, and the $T^2$ plots are shown in Fig. 6. For DPCA there was no model parameter combination which met the $t = 75$ and $n_m = 2$ criteria for both batches and hence a lower $t$ value had to be used to obtain a common model. The MW-DPCA approach has successfully predicted the three MSPs in batch I while the DPCA approach has predicted only one MSP. From the online data (Fig. 6A), it is seen that there are no sharp changes corresponding to MSPs q and r and this may be the reason for the failure of the DPCA approach in capturing these events. However, the use of multiple models in MW-DPCA enables successful prediction of these events.

Fig. 6. Comparison of MW-DPCA and DPCA approaches for monitoring of Amycolatopsis mediterranei fermentation case study, batch I. (A) Profiles of the online measurements; (B) $T^2$ plot for the MW-DPCA approach with covariance matrix using model parameters $d = 4$, $\tau = 5$ min, $\xi = 8$ and $\eta = 5$; (C) $T^2$ plot for DPCA with covariance matrix using model parameters $d = 16$, $\tau = 5$ min, $\xi = 8$ and $\eta = 6$ and (D) Profiles for the offline measurement of concentrations of the substrates, product and dry cell weight. The MSPs are: q: exhaustion of amino acids and beginning of adaptation to ammonium sulfate, r: beginning of exponential growth on ammonium sulfate and start of rapid consumption of ammonium sulfate and s: exhaustion of ammonium sulfate. Media composition of batch I: glucose 80 g l⁻¹ and ammonium sulfate 4 g l⁻¹. Additional micronutrients included potassium sulfate 1 g l⁻¹, magnesium sulfate 1 g l⁻¹, ferrous sulfate 1 g l⁻¹, zinc sulfate 0.01 g l⁻¹ and cobalt chloride 0.03 g l⁻¹. In B–D, MSPs are denoted by gray filled circle symbols while PSPs are denoted by dark filled circle symbols when the time of PSP matches with that of MSP and open circle symbols otherwise. Data in A and D reproduced from Doan et al. [9]. The arrows are indicative of the presence of points which go out of the range of the plot.

Batch II contains two alternate nitrogen substrates, namely ammonia and nitrate, leading to sequential uptake. MW-DPCA successfully predicted three out of four MSPs whereas the conventional DPCA is able to predict only one (Fig. 7A). Note that in the neighbourhood of MSPs q and r, there are no noticeable sharp changes in the online profiles. Despite this, MW-DPCA has captured MSP r and has predicted an SP approximately 8 h prior to MSP q.

Fig. 7. Comparison of different monitoring techniques for batch fermentation of Amycolatopsis mediterranei, batch II. (A) The MSPs (gray filled circle symbols) and PSPs (dark filled circle symbols when the time of PSP matches with that of MSP and open circle symbols otherwise) by different approaches (i) MW-DPCA using covariance matrix with $d = 4$, $\tau = 5$ min, $\xi = 8$ and $\eta = 5$, (ii) MW-DPCA using correlation matrix with $d = 4$, $\tau = 5$ min, $\xi = 8$ and $\eta = 3$, (iii) MW-AD with $d = 4$, $\tau = 5$ min, $\xi = 6$ and $\eta = 5$, (iv) DPCA using covariance matrix with $d = 16$, $\tau = 5$ min, $\xi = 8$ and $\eta = 6$, (v) DPCA using correlation matrix with $d = 8$, $\tau = 5$ min, $\xi = 6$ and $\eta = 3$, (vi) using all dimensions (AD) with $d = 8$, $\tau = 5$ min, $\xi = 8$ and $\eta = 3$ and (vii) manually identified SPs from experiment (MSPs). (B) Profiles for the offline measurement of concentrations of the substrates, products and dry cell weight. (C) Profiles of the online measurements. The MSPs are: q: start of log phase, r: end of KNO3 adaptation period, s: ammonium sulfate consumption stops, u: stationary phase starts. Media composition: glucose 80 g l⁻¹, KNO3 4.76 g l⁻¹, ammonium sulfate 1.3 g l⁻¹ and micronutrients as mentioned in legend to Fig. 6.

5. Discussion

In this work we have presented a moving window based approach for the detection of SPs in fermentation processes. The approach has the following salient features: (i) the method does not need to assume that a single statistical model is applicable for the entire batch duration, (ii) the switching times from one statistical model to the next are not decided a priori and are instead decided in real time based on the dynamic evolution of the batch under consideration, (iii) similarly, the $T^2$ control limits are not fixed a priori but are decided in real time based on the amount of data available in the corresponding phase, and (iv) the approach can be used even in the absence of historical data. To demonstrate the efficacy of our approach, we have presented a comparison with the conventional single model based approach. The results were found to be superior to the conventional single model approach even though the latter utilizes the data for the entire batch duration. In contrast, our approach utilizes only the currently available data in an evolving batch in real time. This feature makes our approach amenable to real time implementation, which is not possible with the conventional single model approach.
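To make the real-time decision in feature (ii) more concrete, the following Python sketch shows one possible way to declare an SP from a stream of $T^2$ values and time varying control limits. It assumes a rule of the form "at least $\eta$ of the most recent $\xi$ points violate either limit", which is inferred here from the definitions of $\eta$ and $\xi$ in the nomenclature; the function name `detect_phase_shift` and the numbers in the example are hypothetical and not taken from the paper.

```python
from collections import deque


def detect_phase_shift(t2_values, lcl_values, ucl_values, xi, eta):
    """Return the first index at which an SP would be declared, else None.

    Assumed rule (an interpretation, not the authors' exact algorithm):
    an SP is declared at index k when at least `eta` of the most recent
    `xi` points lie outside the interval [LCL, UCL]. The limits are given
    per time point so that they may vary with time.
    """
    recent = deque(maxlen=xi)  # violation flags for the last xi points
    for k, (t2, lcl, ucl) in enumerate(zip(t2_values, lcl_values, ucl_values)):
        recent.append(t2 < lcl or t2 > ucl)
        if sum(recent) >= eta:
            return k
    return None


# Hypothetical T^2 trace with constant limits, for illustration only.
t2 = [1.0, 1.2, 0.9, 5.0, 1.1, 6.0, 0.05, 1.0]
lcl = [0.1] * len(t2)
ucl = [4.0] * len(t2)
first_sp = detect_phase_shift(t2, lcl, ucl, xi=4, eta=2)
print(first_sp)
```

In a moving window scheme, a detection would presumably be followed by starting a new statistical model from the detected point; that step is not shown in this sketch.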

Note that the proposed SP detection is based on violation of either the upper or the lower control limit by the $T^2$ statistic (Eqs. (5) and (6)). In contrast, monitoring techniques have conventionally relied on violation of $T^2_{\mathrm{UCL}}$ alone [24,25]. Note that while changes in $\mu$ mainly lead to violation of $T^2_{\mathrm{UCL}}$, changes in $\Sigma$ can manifest as violations of either $T^2_{\mathrm{UCL}}$ or $T^2_{\mathrm{LCL}}$ [19]. Since a phase change can correspond to either a change in the mean (operating level of the variables) or the covariance (relationships between variables), we chose to use both $T^2_{\mathrm{LCL}}$ and $T^2_{\mathrm{UCL}}$ to detect SPs. Indeed, we have observed several cases where violation of $T^2_{\mathrm{LCL}}$ detects the SP (data not shown). Additionally, to provide insight into the nature of the SP, we perform statistical tests to check if the means and the covariances of adjacent phases are identical. Details about these statistical tests are presented in Appendix B. Based on these tests, it was observed that several of the phase shift points which were detected due to $T^2_{\mathrm{LCL}}$ violation corresponded to changes only in the covariance and not in the mean (data not shown). This indicates the utility of using $T^2_{\mathrm{LCL}}$, apart from $T^2_{\mathrm{UCL}}$, as a bound on $T^2$ to improve SP detection.

From the point of view of sensor selection, it would be of interest to determine the utility of the various online measurements in SP detection. To this end, the contribution of various online measurements in SP detection was quantified for the case studies presented in this work (see Appendix C). Fig. 8 shows the contribution plots for SP prediction by MW-DPCA for batches in case study 1. Note that the contributions of agitator speed and DO were more significant than those of other online variables in the majority of the batches. This is consistent with the fact that the rate of aerobic growth dictates the oxygen requirement which in turn determines the DO and agitator speed as seen from Eq. (14).

$$\frac{dC_0}{dt} = k_L a\,(C^* - C_0) - \frac{\mu X_{BM}}{Y_{B/O}} \qquad (14)$$

Fig. 8. Contribution of the different online variables toward SP detection by MW-DPCA approach with covariance matrix for batch fermentation of B. pumilus. (A) Batch I; (B) Batch IIa; (C) Batch IIb; (D) Batch IIc. Exhaust CO2: gray filled bars; dissolved oxygen: dark filled bars and agitator speed: bars filled with slanted lines. The contributions of pH and exhaust oxygen are not significant to detection of the SPs and are therefore not shown.

Fig. A1. The profiles of $T^2_{\mathrm{UCL}}$ and $T^2_{\mathrm{LCL}}$ at different sample sizes and different numbers of variables ($p$), used to define $W_{\min}$.

At pseudo-steady state, $dC_0/dt$ becomes zero and Eq. (15) holds:

$$k_L a \propto f(\mu X_{BM}) \qquad (15)$$

But since

$$k_L a \propto h(N) \qquad (16)$$

we get

$$N \propto g(\mu X_{BM}) \qquad (17)$$
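The step from Eq. (14) to Eq. (15) can be written out explicitly. The following is only a sketch: setting the accumulation term to zero is the pseudo-steady state condition stated in the text, while treating the driving force $(C^* - C_0)$ and the yield $Y_{B/O}$ as approximately constant is an additional assumption made here for illustration, since the text leaves the function $f$ unspecified.

```latex
\begin{align*}
0 &= k_L a\,(C^* - C_0) - \frac{\mu X_{BM}}{Y_{B/O}} \\
k_L a &= \frac{\mu X_{BM}}{Y_{B/O}\,(C^* - C_0)}
\end{align*}
```

Under that assumption $k_L a$ scales with the oxygen demand $\mu X_{BM}$. Combining this with the dependence of $k_L a$ on agitator speed in Eq. (16) ties $N$ to $\mu X_{BM}$, which is the content of Eq. (17).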

The proposed SP detection technique can in principle be used in conjunction with online data collected via a variety of sensors [26–29]. These sensors may range from simple probes such as those for pH, dissolved oxygen concentration, and optical density to complex probes which can acquire near infrared (NIR) or fluorescence spectroscopy based measurements [30,31]. Some of the complex probes are more informative but may suffer from drawbacks such as limited measurement range and high cost. We believe that the proposed approach can be used to evaluate the utility of a given sensor in SP detection. Additionally, the proposed approach can be used as a guide in real time in making decisions about the timings of offline sampling. In particular, an offline sample can be collected whenever an SP based on online data is detected instead of collecting offline samples based on arbitrarily specified timings.

Acknowledgements

The authors acknowledge the generous gift of the Amycolatopsis balhimycina strain from Anna Eliasson Lantz of Denmark’s Technical University, Denmark. The work was partially supported by a grant from the Department of Biotechnology, Government of India.

Appendix A. Selection of Wmin

The case studies considered in this article involved either four (A. balhimycina DSM5908) or five (B. pumilus ATCC 21951 and A. mediterranei S699) online measurements. For these case studies, $W_{\min}$ was taken to be 30 since it was found (Fig. A1) that there was not much variation in the $T^2_{\mathrm{LCL}}$ and $T^2_{\mathrm{UCL}}$ values with respect to $W_{\min}$ beyond 30 for $p = 4$ and 5.

Appendix B. Covariance and mean comparison of adjacent phases

The key idea in our proposed algorithm is that occurrence of an SP corresponds to a change in statistical properties of online data. Under the assumption of normally distributed data, this change in statistical properties will be reflected as differences in the mean vectors and/or covariance matrices of adjacent data sets (before and after detection of an SP). The data in the two adjacent populations are considered to be normal with means $\mu_1$, $\mu_2$ and covariances $\Sigma_1$, $\Sigma_2$ respectively. To identify changes in these parameters, the following statistical tests were used:

The modified likelihood test [32] was used for testing whether the covariance matrices of adjacent populations are identical. In particular, the following hypotheses were tested:

$$\text{null hypothesis: } H_0: \Sigma_1 = \Sigma_2$$

$$\text{alternative hypothesis: } H_1: \Sigma_1 \neq \Sigma_2$$

An approximate test of $H_0$ at significance level $\alpha$ based on the modified likelihood ratio statistic is to reject $H_0$ if $-2\rho \log \Lambda^* > c_f(1 - \alpha)$, where $c_f(1 - \alpha)$ denotes the percentage point from the $\chi^2_f$ distribution such that the area to the left is $1 - \alpha$,

$$\rho = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)n}\left(\sum_{i=1}^{2} \frac{1}{k_i} - 1\right),$$

$f = p(p + 1)/2$ is the degrees of freedom and

$$\Lambda^* = \frac{\prod_{i=1}^{2} (\det S_i)^{(n_i - 1)/2}}{(\det S)^{(n - 2)/2}} \cdot \frac{(n - 2)^{p(n - 2)/2}}{\prod_{i=1}^{2} (n_i - 1)^{p(n_i - 1)/2}}$$

is the modified likelihood ratio. In these expressions, $S_i$ is the $i$th population sample covariance matrix, $S = \sum_{i=1}^{2} S_i$, $n_i$ is the size of the $i$th population sample, $n = \sum_{i=1}^{2} n_i$ and $k_i = (n_i - 1)/(n - 2)$.

For checking whether the mean vectors of adjacent populations are identical, the following hypothesis was tested [33]:

$$\text{null hypothesis: } H_0: \mu_1 = \mu_2$$

$$\text{alternative hypothesis: } H_1: \mu_1 \neq \mu_2$$

The null hypothesis was rejected if the test statistic $T^2_S > T_C$ (assuming $n_1 \le n_2$), where

$$T^2_S = n_1 (\bar{x}_1 - \bar{x}_2)^T C^{-1} (\bar{x}_1 - \bar{x}_2)$$

$$\bar{x}_1 = \sum_{\gamma=1}^{n_1} x_{1\gamma} / n_1, \quad x_{1\gamma} \text{ is the } \gamma\text{th data point of the 1st population}$$

$$\bar{x}_2 = \sum_{\beta=1}^{n_2} x_{2\beta} / n_2, \quad x_{2\beta} \text{ is the } \beta\text{th data point of the 2nd population}$$

$$C = \frac{\sum_{\gamma=1}^{n_1} (u_\gamma - \bar{u})(u_\gamma - \bar{u})^T}{n_1 - 1}$$

$$u_\gamma = x_{1\gamma} - (n_1/n_2)^{1/2} x_{2\gamma}, \quad \gamma = 1, 2, \ldots, n_1$$

$$\bar{u} = \sum_{\gamma=1}^{n_1} u_\gamma / n_1$$

$$T_C = \frac{n_1 p}{n_1 - p + 1} F(1 - \alpha; p, n_1 - p + 1)$$

For both covariance and mean checking, the value of $\alpha$ was taken as 0.01.
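As an illustration of the covariance comparison, the following Python sketch evaluates the statistic $-2\rho \log \Lambda^*$ and the degrees of freedom $f$ for two data sets, following the expressions above term by term. Two points are assumptions made here rather than statements from the text: $S_i$ is computed as the sum of squares and cross-products (scatter) matrix of the $i$th data set, which is the form for which $\Lambda^* = 1$ when both data sets have the same sample covariance, and the determinants are evaluated on a log scale for numerical safety. All function names and the example data are hypothetical. The critical value $c_f(1 - \alpha)$ is not computed and would have to be taken from a $\chi^2$ table or a statistics library.

```python
import math


def scatter_matrix(rows):
    """Sum of squares and cross-products about the sample mean."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    a = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += d[i] * d[j]
    return a


def log_det(matrix):
    """log(det) of a symmetric positive definite matrix (Gaussian elimination)."""
    m = [row[:] for row in matrix]
    total = 0.0
    for c in range(len(m)):
        pivot = m[c][c]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        total += math.log(pivot)
        for r in range(c + 1, len(m)):
            factor = m[r][c] / pivot
            for j in range(c, len(m)):
                m[r][j] -= factor * m[c][j]
    return total


def covariance_test_statistic(x1, x2):
    """Return (-2 * rho * log(Lambda*), f) for two data sets (lists of rows)."""
    p = len(x1[0])
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    s1, s2 = scatter_matrix(x1), scatter_matrix(x2)
    s = [[s1[i][j] + s2[i][j] for j in range(p)] for i in range(p)]
    log_lambda = (
        0.5 * (n1 - 1) * log_det(s1)
        + 0.5 * (n2 - 1) * log_det(s2)
        - 0.5 * (n - 2) * log_det(s)
        + 0.5 * p * (n - 2) * math.log(n - 2)
        - 0.5 * p * (n1 - 1) * math.log(n1 - 1)
        - 0.5 * p * (n2 - 1) * math.log(n2 - 1)
    )
    k = [(n1 - 1) / (n - 2), (n2 - 1) / (n - 2)]
    rho = 1.0 - (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * n) * (
        sum(1.0 / ki for ki in k) - 1.0
    )
    f = p * (p + 1) // 2
    return -2.0 * rho * log_lambda, f


# Hypothetical two-variable data sets, for illustration only.
phase_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
phase_b = [[3.0 * v for v in row] for row in phase_a]  # same shape, larger spread
stat_same, f = covariance_test_statistic(phase_a, phase_a)
stat_diff, _ = covariance_test_statistic(phase_a, phase_b)
print(stat_same, stat_diff, f)
```

Identical data sets give a statistic of essentially zero, while the data set with the inflated spread gives a positive value that would then be compared with $c_f(1 - \alpha)$.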

Appendix C. Contribution of variables towards SP detection

When an SP is detected, the variables primarily responsible for occurrence of the SP can be identified based on the contribution plot of the variables. The procedure is as follows [18]:

When an SP is detected at the $k$th time point corresponding to observation $x_k$, then $T^2_k > T^2_{\mathrm{UCL}}$ or $T^2_k < T^2_{\mathrm{LCL}}$. The normalized scores $t_i^2/\lambda_i$ are then computed for the $i$th principal component ($i = 1, 2, \ldots, b$), where $t_i$ is the score of the projection of $x_k$ onto the $i$th loading vector. The principal components for which $(t_i^2/\lambda_i) > (1/b)\,T^2_{\mathrm{UCL}}$ (in case $T^2_k > T^2_{\mathrm{UCL}}$) or $(t_i^2/\lambda_i) < (1/b)\,T^2_{\mathrm{LCL}}$ (in case $T^2_k < T^2_{\mathrm{LCL}}$) are determined to be responsible for the out of control status. Let the number of such principal components be $r$. Then the contribution of each variable $j$ to the out of control score $t_i$ can be defined as

$$\mathrm{cont}_{i,j} = \frac{t_i}{\lambda_i}\, p_{i,j}\, (x_j - \mu_j),$$

where $p_{i,j}$ is the $(i, j)$th element of the loading matrix $P$. If $\mathrm{cont}_{i,j}$ is negative, it is set equal to zero. The total contribution of the $j$th process variable is then:

$$\mathrm{CONT}_j = \sum_{i=1}^{r} \mathrm{cont}_{i,j}$$

The variables with large values of CONT are identified as primary causes for phase change detection. This information can potentially aid the process operator in determining the nature of the phase change as well as in taking any control action if required.
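The contribution procedure above amounts to a few lines of arithmetic, sketched below in Python. The function name `variable_contributions`, its argument layout (scores $t_i$, eigenvalues $\lambda_i$, loadings $p_{i,j}$ stored row-wise per principal component) and the numbers in the example are assumptions made for illustration and are not taken from the paper.

```python
def variable_contributions(scores, eigenvalues, loadings, x, mean, t2, lcl, ucl):
    """Total contribution CONT_j of each variable to a detected SP.

    scores[i]      : t_i, score of x on the i-th principal component
    eigenvalues[i] : lambda_i
    loadings[i][j] : p_{i,j}, loading of variable j on component i
    x, mean        : current observation and phase mean (per variable)
    t2, lcl, ucl   : current T^2 value and its control limits
    """
    b = len(scores)
    normalized = [scores[i] ** 2 / eigenvalues[i] for i in range(b)]
    if t2 > ucl:
        responsible = [i for i in range(b) if normalized[i] > ucl / b]
    elif t2 < lcl:
        responsible = [i for i in range(b) if normalized[i] < lcl / b]
    else:
        responsible = []  # no control limit violated, nothing to attribute
    totals = [0.0] * len(x)
    for i in responsible:
        for j in range(len(x)):
            cont = scores[i] / eigenvalues[i] * loadings[i][j] * (x[j] - mean[j])
            totals[j] += max(cont, 0.0)  # negative contributions are set to zero
    return totals


# Hypothetical two-variable, two-component example, for illustration only.
loadings = [[0.8, 0.6], [-0.6, 0.8]]
x, mean = [2.7, 1.4], [0.0, 0.0]
scores = [3.0, -0.5]       # projections of (x - mean) on the loading vectors
eigenvalues = [1.0, 0.5]
t2 = sum(s ** 2 / lam for s, lam in zip(scores, eigenvalues))
totals = variable_contributions(scores, eigenvalues, loadings, x, mean,
                                t2, lcl=0.1, ucl=6.0)
print(totals)
```

In this hypothetical case only the first principal component exceeds its share of the upper limit, and the first variable carries the larger total contribution.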

Appendix D. Nomenclature

$b$: number of eigenvalues to be considered for PCA
$B(\alpha/2; a, b)$: beta distribution at the $\alpha$% significance level with $a$ and $b$ degrees of freedom
$C^*$: saturation concentration of oxygen
$C_0$: dissolved oxygen concentration in medium
$d$: dynamic lag
$\mathrm{diag}(\lambda_b)$: the diagonal matrix of the $b$ largest eigenvalues of covariance matrix $S_i$
DPCA: dynamic principal component analysis
$F(\alpha/2; c, b)$: percentage point from the F-distribution with $a$ and $b$ degrees of freedom such that the area to the left is $\alpha/2$
$k_L a$: volumetric oxygen mass transfer coefficient
$m$: total number of data points for the entire batch
MSP: SP based on offline measurements
MW-AD: moving window all dimensions
MW-DPCA: moving window dynamic PCA
$n_i$: the number of data points belonging to phase $\phi_i$
$n_m$: number of missed SPs during prediction
$N$: agitator speed
$p$: number of variables being measured
$P_i$: probability distribution corresponding to the $i$th phase
$P_b$: matrix of $b$ eigenvectors of $S_i$ corresponding to the largest $b$ eigenvalues
PSP: predicted SPs
$S_i$: sample covariance matrix of the $i$th phase
SP: singular point
$t$: percent of MCC considered to find the common model
$T^2_k$: Mahalanobis distance of the current point $x_k$ from the mean of the data corresponding to phase $\phi_i$
$T^2_{\mathrm{LCL}}$: lower control limit of $T^2$
$T^2_{\mathrm{UCL}}$: upper control limit of $T^2$
$W_{\min}$: minimum window length for calculating covariance matrix
$x_k$: data (row) vector at the $k$th time
$x^p_k$: value of the $p$th variable at the $k$th time
$x^d_k$: data (row) vector with lag $d$ at the $k$th time
$X^d_i$: data matrix of the $i$th phase with lag $d$
$X_{BM}$: biomass concentration
$Y_{B/O}$: yield of biomass per unit of oxygen consumed

Greek letters

$\alpha$: the significance level
$\phi_i$: $i$th phase
$\eta$: number of points required to violate the control limits for SP detection
$\mu$: specific growth rate
$\tau$: sampling rate
$\xi$: number of consecutive points checked for violation of control limits for SP detection
$\delta$: tolerance for comparing MSP and PSP
$\mu_i$: population mean of the $i$th phase
$\Sigma_i$: population covariance matrix of the $i$th phase

Appendix E. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.procbio.2009.03.008.

References

[1] Sonnleitner B, Locher G, Fiechter A. Automatic bioprocess control 1. A general concept. J Biotechnol 1991;19:1–17.

[2] DePalma A. Process monitoring on-line & in real-time. Genet Eng News 2006;26:50–1.

[3] Lopes JA, Costa PF, Alves TP, Menezes JC. Chemometrics in bioprocess engineering: process analytical technology (PAT) applications. Chemometr Intell Lab 2004;74:269–75.

[4] Venkatasubramanian V, Rengaswamy R, Kavuri SN, Yin K. A review of process fault detection and diagnosis Part III: Process history based methods. Comput Chem Eng 2003;27:327–46.

[5] Kamimura R, Konstantinov K, Stephanopoulos G. Knowledge-based systems, artificial neural networks and pattern recognition: applications to biotechnological processes. Curr Opin Biotechnol 1996;7:231–4.

[6] Lee JM, Yoo CK, Lee IB. On-line batch process monitoring using a consecutively updated multiway principal component analysis model. Comput Chem Eng 2003;27:1903–12.

[7] Lee JM, Yoo CK, Lee IB. Enhanced process monitoring of fed-batch penicillin cultivation using time-varying and multivariate statistical analysis. J Biotechnol 2004;110:119–36.

[8] de Vargas VDCC, Lopes LFD, Souza AM. Comparative study of the performance of the CuSum and EWMA control charts. Comput Ind Eng 2004;46:707–24.

[9] Doan XT, Srinivasan R, Bapat PM, Wangikar PP. Detection of phase shifts in batch fermentation via statistical analysis of the online measurements: a case study with rifamycin B fermentation. J Biotechnol 2007;132:156–66.

[10] Neubauer AS. The EWMA control chart: properties and comparison with other quality-control procedures by computer simulation. Clin Chem 1997;43:594–601.

[11] Bapat PM, Das D, Dave NN, Wangikar PP. Phase shifts in the stoichiometry of rifamycin B fermentation and correlation with the trends in the parameters measured online. J Biotechnol 2006;127:115–28.

[12] Doan XT, Srinivasan R. Online monitoring of multi-phase batch processes using phase-based multivariate statistical process control. Comput Chem Eng 2008;32:230–43.

[13] Cinar A, Parulekar SJ, Undey C, Birol G. Batch Fermentation Modeling, Monitoring and Control. New York: Marcel Dekker; 2003. p. 245–61.

[14] Srivastava RK, Wangikar PP. Combined effects of carbon, nitrogen and phosphorus substrates on D-ribose production via transketolase deficient strain of Bacillus pumilus. J Chem Technol Biotechnol 2008;83:1110–9.

[15] Bapat PM, Bhartiya S, Venkatesh KV, Wangikar PP. Structured kinetic model to represent the utilization of multiple substrates in complex media during rifamycin B fermentation. Biotechnol Bioeng 2006;93:779–90.

[16] Morrison GR. Microchemical determination of organic nitrogen with nessler reagent. Anal Biochem 1971;43:527–32.

[17] Allen NE, LeTourneau DL, Hobbs Jr JN. The role of hydrophobic side chains as determinants of antibacterial activity of semisynthetic glycopeptide antibiotics. J Antibiot (Tokyo) 1997;50:677–84.

[18] Chiang LH, Russell EL, Braatz RD. Fault Detection and Diagnosis in Industrial Systems. London: Springer-Verlag; 2001. p. 21–55.

[19] Tracy ND, Young JC, Mason RL. Multivariate control charts for individual observations. J Qual Technol 1992;24:88–95.

[20] Obuchowski NA. Receiver operating characteristic curves and their use in radiology. Radiology 2003;229:3–8.

[21] Zweig MH, Campbell G. Receiver operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–77.

[22] Baldi P, Brunak S, Chauvin Y, Andersen CAF, Nielsen H. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 2000;16:412–24.

[23] Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975;405:442–51.

[24] Chen JH, Liu KC. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem Eng Sci 2002;57:63–75.

[25] Lennox B, Montague GA, Hiden HG, Kornfeld G, Goulding PR. Process monitoring of an industrial fed-batch fermentation. Biotechnol Bioeng 2001;74:125–35.

[26] Esbensen K, Kirsanov D, Legin A, Rudnitskaya A, Mortensen J, Pedersen J, Vognsen L, Makarychev-Mikhailov S, Vlasov Y. Fermentation monitoring using multisensor systems: feasibility study of the electronic tongue. Anal Bioanal Chem 2004;378:391–5.

[27] Grube M, Gapes JR, Schuster KC. Application of quantitative IR spectral analysis of bacterial cells to acetone–butanol–ethanol fermentation monitoring. Anal Chim Acta 2002;471:127–33.

[28] Riley MR, Rhiel M, Zhou XJ, Arnold MA, Murhammer DW. Simultaneous measurement of glucose and glutamine in insect cell culture media by near infrared spectroscopy. Biotechnol Bioeng 1997;55:11–5.

[29] Turner C, Rudnitskaya A, Legin A. Monitoring batch fermentations with an electronic tongue. J Biotechnol 2003;103:87–91.

[30] Arnold SA, Crowley J, Woods N, Harvey LM, McNeill B. In-situ near infrared spectroscopy to monitor key analytes in mammalian cell cultivation. Biotechnol Bioeng 2003;84:13–9.

[31] Rhee JI, Kang TH. On-line process monitoring and chemometric modeling with 2D fluorescence spectra obtained in recombinant E. coli fermentations. Process Biochem 2007;42:1124–34.

[32] Muirhead RJ. Aspects of Multivariate Statistical Theory. New York: John Wiley & Sons; 1982. p. 291–311.

[33] Ito K. On the effect of heteroscedasticity and nonnormality upon some multivariate test procedure. In: Krishnaiah PR, editor. Multivariate Analysis II. Academic Press; 1969. p. 87–119.

