Neural networks for process control and optimization: Two industrial applications

Gérard Bloch,a,* Thierry Denoeux b,†

a Centre de Recherche en Automatique de Nancy (CRAN), UMR CNRS 7039, France, and Ecole Supérieure des Sciences et Technologies de l'Ingénieur de Nancy (ESSTIN), Université Henri Poincaré, Nancy 1, France

b Heudiasyc, UMR CNRS 6599, France, and Université de Technologie de Compiègne, France

(Received 17 July 2001; accepted 16 April 2002)

Abstract

The two most widely used neural models, the multilayer perceptron (MLP) and the radial basis function network (RBFN), are presented in the framework of system identification and control. The main steps for building such nonlinear black box models are regressor choice, selection of internal architecture, and parameter estimation. The advantages of neural network models are summarized: universal approximation capabilities, flexibility, and parsimony. Two applications are described, in the steel industry and in water treatment: the control of an alloying process in a hot dipped galvanizing line and the control of a coagulation process in a drinking water treatment plant. These examples highlight the interest of neural techniques when complex nonlinear phenomena are involved but the empirical knowledge of control operators can be learned. © 2003 ISA—The Instrumentation, Systems, and Automation Society.

Keywords: Neural networks; Computer modeling and simulation; Control; Optimization; Steel industry; Drinking water treatment

1. INTRODUCTION

Artificial neural networks have been the focus of a great deal of attention during the last two decades, due to their ability to solve nonlinear problems by learning from data. Although a broad range of neural network architectures can be found, multilayer perceptrons (MLPs) and radial basis function networks (RBFNs) are the most popular neural models, particularly for system modeling and identification [1,2], control [3,4], and time series forecasting.

In Section 2, these two neural models are presented and related to the general task of system identification from experimental data. The different methods for choosing the input variables (regressors), selecting the internal architecture, and learning the weights (i.e., estimating the parameters) are reviewed. The advantages of these models are then summarized, as compared to other nonlinear structures. The third part briefly introduces the application of neural networks to process control. Finally, two case studies are described in the last section: the intelligent control of a hot dipped galvanizing line, and the control of a coagulation process in a water treatment plant. These two applications show the interest of neural learning in an industrial production context when complex physical phenomena are involved, particularly at the upper level of set-point determination.

* E-mail address: [email protected]
† E-mail address: [email protected]


2. NONLINEAR SYSTEM MODELING WITH NEURAL NETWORKS

2.1 Two neural models

Only a reduced form of the multilayer perceptron (MLP), or feedforward sigmoid neural network, will first be presented here: the one hidden layer perceptron with a linear output unit. Although particular, this model will be called MLP in the following. Its form is given, for a single output f, by

$$f = \sum_{k=1}^{n} w_k^2\, g\!\left(\sum_{j=1}^{p} w_{kj}^1 \varphi_j + b_k^1\right) + b^2, \qquad (1)$$

where φ_j, j = 1,...,p, are the inputs of the network; w_{kj}^1 and b_k^1, k = 1,...,n, j = 1,...,p, are the weights and biases of the hidden layer; the activation function g is a sigmoid function, often chosen as the hyperbolic tangent g(x) = 2/(1+e^{-2x}) - 1; and w_k^2, k = 1,...,n, and b^2 are the weights and bias of the output neuron or node (see Figs. 1 and 2).

The restriction to only one hidden layer and to a linear activation function at the output brings the general perceptron closer to other nonlinear models, neural or not. Indeed, the one hidden layer perceptron corresponds to a unique particular choice, the sigmoid function, for the basis function g_k, and to a "ridge" construction for the inputs [2] in a function expansion:

$$f(\varphi,\theta) = \sum_{k=1}^{n} \alpha_k\, g_k(\varphi, \beta_k), \qquad (2)$$

where φ = [φ_1 ... φ_p]^T is the regression vector and the parameter vector θ is the concatenation of all the weights w and biases b.

Choosing a Gaussian function g(x) = e^{-x²/σ²} as basis function and a radial construction for the inputs leads to the radial basis function network (RBFN) [5]:

$$f(\varphi,\theta) = \sum_{k=1}^{n} \alpha_k\, g_k(\varphi) + \alpha_0
= \sum_{k=1}^{n} \alpha_k\, g\big(\|\varphi - \gamma_k\|_{\beta_k}\big) + \alpha_0
= \sum_{k=1}^{n} \alpha_k \exp\!\left(-\frac{1}{2}\sum_{j=1}^{p}\frac{(\varphi_j - \gamma_{kj})^2}{\beta_{kj}^2}\right) + \alpha_0, \qquad (3)$$

where γ_k = [γ_{k1} ... γ_{kp}]^T is the "center" or "position" of the kth Gaussian and β_k = [β_{k1} ... β_{kp}]^T its "scale" or "width" (see Figs. 3 and 4).
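For concreteness, the forward passes of Eqs. (1) and (3) can be sketched in a few lines of NumPy; this is an illustrative reimplementation with our own variable names, not code from the applications described later.

```python
import numpy as np

def mlp_forward(phi, W1, b1, w2, b2):
    """One hidden layer perceptron of Eq. (1): tanh hidden units and a
    linear output node. phi: (p,), W1: (n, p), b1 and w2: (n,), b2: scalar."""
    h = np.tanh(W1 @ phi + b1)   # hidden activations g(.)
    return w2 @ h + b2           # linear output node

def rbfn_forward(phi, gamma, beta, alpha, alpha0):
    """RBFN of Eq. (3). gamma: (n, p) centers, beta: (n, p) widths,
    alpha: (n,) linear parameters, alpha0: output bias."""
    d2 = np.sum(((phi - gamma) / beta) ** 2, axis=1)  # scaled squared distances
    return alpha @ np.exp(-0.5 * d2) + alpha0
```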

The process of approximating a nonlinear relationship from data can be decomposed into several steps:

• determining the structure of the regression vector φ, or selecting the inputs of the network;

Fig. 1. One hidden layer perceptron, with linear output node.

Fig. 2. Hyperbolic tangent function.

Fig. 3. Radial basis function network.


• choosing the nonlinear mapping f or, in neural network terminology, selecting an internal network architecture;

• estimating the parameter vector θ, i.e., (weight) "learning" or "training."

As recalled in Fig. 5, this approach is similar to the classical one for linear system identification [6], the selection of the model structure being, nevertheless, more involved. Several general comments concerning these three points are made in the following.

2.2 The regressors

For dynamic systems in discrete time t, a natural approach [7,3] is to reuse the input structure of linear models, particularly the general input-output model family [6]

$$A(q^{-1})\,y(t) = \frac{B(q^{-1})}{F(q^{-1})}\,u(t) + \frac{C(q^{-1})}{D(q^{-1})}\,e(t), \qquad (4)$$

where u(t) and y(t) are, respectively, the system input and output, e(t) is a white noise independent of past inputs, and where A, B, C, D, and F are polynomials in the backward shift operator q^{-1}.

This approach has several attractive advantages, pointed out by Nørgaard [8], namely:

• It is a natural extension of well-known linear models. The internal architecture can be increased gradually as higher flexibility is needed to model more complex nonlinear relationships.

• The structural decisions required by the userare reduced to a level that is reasonable tohandle.

• The approach is suitable for the design ofcontrol systems.

The predictor associated with model (4) can be expressed in "pseudolinear" form as ŷ(t|θ) = φ(t,θ)^T θ, where φ is the regression vector and θ is the parameter vector. It can be extended to nonlinear models as ŷ(t|θ) = f(φ(t,θ), θ). Depending on the choice of the regressors in φ(t), different models, with N (for nonlinear) or NN (for neural network) added, can be derived [9]:

• NFIR, with delayed measured inputs u(t-k) as regressors;

• NARX, with delayed measured inputs u(t-k) and outputs y(t-k) as regressors;

• NOE, with u(t-k) and outputs ŷ_u(t-k|θ), simulated from past inputs u only, as regressors;

• NARMAX, with u(t-k), y(t-k), and prediction errors ε(t-k) = y(t-k) - ŷ(t-k|θ) as regressors;

• NBJ, with u(t-k), prediction errors ε(t-k) = y(t-k) - ŷ(t-k|θ), and simulation errors ε_u(t-k) = y(t-k) - ŷ_u(t-k|θ) as regressors.

As an example, Fig. 6 illustrates the parallel between (linear) ARX and NARX models.
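As an illustration of the NARX case, the regressor matrix can be assembled directly from recorded input-output sequences; the following sketch is ours, with na past outputs, nb past inputs, and a pure delay nk as assumed structural parameters.

```python
import numpy as np

def narx_regressors(u, y, na, nb, nk=1):
    """Build the NARX regressor matrix Phi and target vector: regressors
    are y(t-1),...,y(t-na) and u(t-nk),...,u(t-nk-nb+1), target is y(t)."""
    t0 = max(na, nb + nk - 1)           # first index with a full history
    Phi, target = [], []
    for t in range(t0, len(y)):
        row = [y[t - i] for i in range(1, na + 1)]
        row += [u[t - nk - i] for i in range(nb)]
        Phi.append(row)
        target.append(y[t])
    return np.array(Phi), np.array(target)
```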

Several methods have been proposed for the selection of the regressors prior to parameter estimation.

Fig. 4. Gaussian bell.

Fig. 5. Identification procedure.

Fig. 6. ARX (left) and NARX (right) models.


Battiti [10] used entropy measures for selecting "features." For dynamic systems, He and Asada [11] described a very computationally extensive method to determine the lag space, i.e., the number of delayed signals used as regressors, for deterministic systems. But most of the time, the selection of the network inputs is done after or during learning (see, e.g., Ref. [12]), and is part of the network architecture determination process.

2.3 Selection of the network architecture

Most of the methods for finding the optimal network architecture for a particular estimation problem are iterative techniques, more or less derived from linear regression algorithms, in which the architecture selection is embedded in parameter estimation. They are applied to the selection of inputs, hidden nodes, or individual weights. They can be classified into three groups:

• Forward selection adds the "best" neuron (and the corresponding parameters) to an existing model; see, for instance, Refs. [13] and [14].

• Backward selection removes the "least relevant" parameters; this includes pruning methods, see Refs. [15] and [16] for reviews. The optimal brain damage (OBD) [17] and the optimal brain surgeon (OBS) [18] algorithms are the most widely used pruning methods. In these methods, an initial network, "large enough" to describe the system, is determined and then reduced iteratively, by removing useless "spurious" parameters. As pruning leads to a simpler model, it alleviates the overfitting problem, i.e., the learning of the noise together with the unknown underlying model of the system; as a result, it generally improves the generalization abilities of the model.

• Finally, stepwise regression combines both approaches; see Ref. [12] for regressor selection, or Ref. [19].

2.4 Parameter estimation

Learning (i.e., parameter estimation) methods for MLPs are very numerous and can be presented in three classes. The first one comprises methods exploiting the particular architecture of these networks as a succession of layers; see Refs. [20] and [21], for instance. The second class comprises various first- or second-order local, gradient-based procedures; see Ref. [22] for a review. Global, or stochastic, optimization methods constitute the third class, including in particular evolutionary algorithms. As reviewed by Yao [23], such algorithms are used not only for parameter estimation, but also for architecture determination and learning rule adaptation. Hybrid methods, combining gradient descent and evolutionary algorithms, have also been proposed.

The performance of learning algorithms for MLPs is sensitive to a large range of factors, including:

• the choice of the error function, which can be simply quadratic, robust to outliers [24], or regularized;

• the weight initialization scheme, which can considerably influence the number of iterations and may have an impact on generalization [25,26];

• the stopping criterion;

• parameters specific to the different methods, as well as the user's skill in applying a particular one.

The best method is thus problem dependent. The batch Levenberg-Marquardt algorithm, although giving no guarantee of reaching a global minimum, is often recommended.
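A minimal sketch of batch Levenberg-Marquardt training for the MLP of Eq. (1), handing the residual function to SciPy's generic LM solver; the parameter packing and initialization scheme are our own illustrative choices, not those of the paper.

```python
import numpy as np
from scipy.optimize import least_squares

def train_mlp_lm(Phi, y, n_hidden, seed=0):
    """Fit the one hidden layer perceptron of Eq. (1) by batch
    Levenberg-Marquardt on the prediction residuals."""
    N, p = Phi.shape
    rng = np.random.default_rng(seed)

    def unpack(theta):
        W1 = theta[:n_hidden * p].reshape(n_hidden, p)
        b1 = theta[n_hidden * p:n_hidden * (p + 1)]
        w2 = theta[n_hidden * (p + 1):n_hidden * (p + 2)]
        return W1, b1, w2, theta[-1]

    def residuals(theta):
        W1, b1, w2, b2 = unpack(theta)
        return np.tanh(Phi @ W1.T + b1) @ w2 + b2 - y

    theta0 = 0.1 * rng.standard_normal(n_hidden * (p + 2) + 1)  # small random init
    sol = least_squares(residuals, theta0, method='lm')  # batch LM, local minimum only
    return unpack(sol.x)
```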

For RBF networks, there are different approaches to estimate the parameters α_k, k = 0,...,n, which appear linearly in model (3), the centers γ_k = [γ_{k1} ... γ_{kp}]^T, and the widths β_k = [β_{k1} ... β_{kp}]^T, k = 1,...,n. A commonly used method separates the estimation of the α_k parameters, on one hand, from that of the centers and scales, on the other hand. The centers and the scales are determined in an unsupervised manner, i.e., without using the outputs y of the system, by one of the various clustering methods, such as the hard or fuzzy C-means, for example. Such methods aim at determining compact clusters in a set of multidimensional points, in our case the different system input observations. The centers of gravity of the clusters are used as centers for the RBFs. The scales β_k can then be computed from the clusters or fixed heuristically by the user (sometimes the same β is used for each RBF and each input dimension). The radial basis functions g_k(φ) being fixed, the α_k parameters are simply determined by least-squares estimation.


Simplicity is often claimed as the main advantage of RBF networks. Nevertheless, the determination of the centers by clustering is not always obvious, and it suffers from the same drawbacks as other methods: numerous parameters to tune (type of algorithm, number of clusters, initial centers, metric used, ...) and dependence of the results on initialization. Moreover, learning the centers without supervision is obviously suboptimal with respect to the approximation task.
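The two-stage procedure just described can be sketched as follows, using K-means for the centers and ordinary least squares for the linear parameters; the single shared width is a simplifying assumption, and scikit-learn's KMeans stands in for any clustering routine.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbfn_two_stage(Phi, y, n_centers, width):
    """Unsupervised centers by K-means, then least squares for the
    linear parameters alpha_k and bias alpha_0 of Eq. (3)."""
    centers = KMeans(n_clusters=n_centers, n_init=10).fit(Phi).cluster_centers_
    d2 = ((Phi[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-0.5 * d2 / width ** 2)          # Gaussian design matrix
    G = np.hstack([G, np.ones((len(Phi), 1))])  # bias column for alpha_0
    coef, *_ = np.linalg.lstsq(G, y, rcond=None)
    return centers, coef[:-1], coef[-1]         # centers, alpha_k, alpha_0
```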

Another approach starts from a small enough number of centers and estimates simultaneously the "linear" parameters, the centers, and the widths, by an iterative method, such as one of those previously described for MLPs. See, for example, Ref. [27].

Finally, there are efficient but memory-demanding methods, more or less derived from the orthogonal least-squares (OLS) algorithm [28], which allow us to obtain simultaneously the "linear" parameters, the centers, and their number [29]. All the input observations are considered as candidate centers and, after orthogonalization, are incorporated one by one in a forward manner until a specified error threshold is reached.

2.5 The advantages of the one hidden layer perceptron and RBFN

Among the numerous nonlinear models, neural or not, which can be used to estimate a nonlinear relationship, the one hidden layer perceptron (OHLP), as well as the radial basis function network (RBFN), present interesting features, which can be summarized in a few words: they are flexible and parsimonious nonlinear black box models, with universal approximation capabilities.

Several researchers have proved that the OHLP [30] as well as the RBFN [5] are universal approximators, i.e., they can approximate any nonlinear function, from a space of finite dimension to another, with any degree of accuracy. Other models share this property, such as polynomial models, trigonometric series, splines, and orthogonal function expansions. However, roughly speaking, OHLPs and RBFNs are expansions of parametrized functions involving adjustable parameters; consequently, it can be shown that they require fewer parameters than expansions of fixed functions to reach a specified error goal [31]. In that sense, they are parsimonious. The price to pay for using parametrized functions in the expansion is the existence of numerous local minima in the error surface.

Moreover, OHLPs and RBFNs are flexible: the more complex (nonlinear) the relationship to be modeled, the more numerous the nodes or the parameters of the corresponding neural network. This means that their internal complexity can be easily increased, without changing the global form of the model. They belong to the general class of nonparametric models that do not make any assumption about the parametric form of the function to be approximated. In that sense, they constitute flexible regression tools.

For these various reasons, only these two neural models, the one hidden layer perceptron and the radial basis function network, are employed in the applications described further. This choice avoids having to determine the number of hidden layers.

3. NEURAL NETWORKS FOR CONTROL

Neural networks can be included in various control schemes [4,32,33]. Agarwal [34] proposed a systematic classification, with two main classes. In the first category, neural networks are only used as aids for system modeling, control-law implementation, or supervisory action. In the second one, they are used as controllers, with different training approaches. Before presenting examples of systems in each category, we first discuss some general issues regarding the design of neural network based controllers.

3.1 Controller learning

One of the first control strategies proposed is to "train" a neural network to behave like the inverse of the process, and then use it as a controller (see Fig. 7). For a nonlinear SISO process to be controlled, with input u and output y, it is assumed that the model can be expressed by

Fig. 7. Direct inverse control (open loop).


$$y(t+1) = f\big(y(t), \dots, y(t-n+1),\; u(t), \dots, u(t-m)\big).$$

So the inverse model of the system can be built by using a neural network:

$$u(t) = f^{-1}\big(y(t+1), y(t), \dots, y(t-n+1),\; u(t-1), \dots, u(t-m)\big).$$

This latter model can be used for control by replacing the actual system output at time t+1, y(t+1), by the reference y_d(t+1) [35].
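Direct learning of the inverse model thus reduces to a regression problem whose target is the past control input; a hypothetical sketch of the training set construction (function and argument names are ours):

```python
import numpy as np

def inverse_model_dataset(u, y, n, m):
    """Training pairs for direct inverse learning: the network maps
    (y(t+1), y(t), ..., y(t-n+1), u(t-1), ..., u(t-m)) to u(t)."""
    X, target = [], []
    for t in range(max(n - 1, m), len(y) - 1):
        row = [y[t + 1]] + [y[t - i] for i in range(n)]
        row += [u[t - i] for i in range(1, m + 1)]
        X.append(row)
        target.append(u[t])
    return np.array(X), np.array(target)
```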

In the case where the direct model is one-to-one and stable, learning of the inverse model can be done directly from the system alone [cf. Fig. 8(a)]. The inverse model can also be taught while configured as a process controller C, by a recursive gradient-based algorithm. This "specialized" learning requires the process Jacobian (gradient), which can be obtained from physical knowledge of the process, if available, or approximated by

• applying small variations (Δu) to the process input;

• observing the output (Δy); and

• calculating an approximate gradient (Δy/Δu), as sketched below.
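On a simulation model, this perturbation-based approximation is essentially one line of code (illustrative only; on a real plant the variation Δu must respect actuator and safety limits):

```python
def approx_jacobian(process, u, du=1e-2):
    """Finite-difference estimate of the process gradient dy/du
    around the operating point u."""
    return (process(u + du) - process(u)) / du
```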

Alternatively, the process can be approximated by a (direct) linear model M or by a nonlinear neural one, from which the gradient can be derived [see Fig. 8(b)] [27]. Note that the latter approach is quite close to conventional adaptive control.

3.2 Several neural control schemes

Control schemes using neural networks can also be divided according to whether a direct model of the process is used. If such a model is not required, the methods are called direct control. They include copying an existing controller (in the case of complicated or costly devices used as controllers, or of human, nonexplicit control laws); direct control with an inverse model (Fig. 7); adaptive direct control; feedforward direct control; etc. In feedforward direct control, the inverse neural model is used in parallel with a conventional controller (e.g., PID) (see Fig. 9). The role of the PID is to ensure regulation and stabilization of the controlled process, while the neural network compensates for the nonlinearities of the process. Supplementary variables, static or dynamic, can be applied at the input of the inverse neural model to take into account changes in operating points. This scheme has been implemented in one of the applications described in the next section.

Indirect-type control methods require a direct model of the process to be controlled. They include optimal control, indirect adaptive control, internal model control, predictive control, control by feedback linearization, etc. As an example, a scheme of neural internal model control is given, for stable processes, in Fig. 10.

Most of the classical control schemes can be extended by using neural networks. The major difficulties in their application to control lie in the associated computation costs, for fast systems, and in the difficulty of establishing the stability of the resulting schemes.

Fig. 9. Neural feedforward control.

Fig. 8. (a) Direct learning. (b) Specialized learning through a direct neural model.

Fig. 10. Neural internal model control.


It must be said that developing neural dynamic control often remains an excessively heavy task when classical linear methods can be applied with acceptable results, even if computer-aided tools are now available [36]. As shown in the following, neural techniques nevertheless appear as very useful alternatives when the physical knowledge of the process is insufficient, particularly at control levels higher than classical dynamic control.

4. TWO APPLICATIONS

4.1 Intelligent control of alloying process in a hot dipped galvanizing line

The first study presented was conducted as part of a collaboration between CRAN and the Sollac company, concerning the hot dip galvanizing line of Florange (France) [37]. The line was designed for the production of galvanized steel sheets for outside car panels with optimum surface quality. The line is 500 m long and is equipped with about 5000 sensors. It produces 300 000 tons/year of galvanized steel sheet, at a speed of 80–120 m/min. A layout of the galvannealing section of this plant, including the galvannealing furnace, soaking furnace, and air cooling section, is shown in Fig. 11. At the exit of the zinc bath, the coated strip is annealed to allow the diffusion of strip iron into the coating. The strip is reheated with an inductive furnace up to a set point, named the inductive temperature (θ_inductive), and goes through a soaking furnace, the inner temperature of which is called the mean temperature (θ_mean). It then passes through the cooling part to stop the galvannealing reaction. The quality of the product is related to the percentage of iron at the surface. An underalloyed product (lack of iron in the coat) is caused by an insufficient alloying temperature, while an overalloyed product is obtained when the thermal cycle is too high. The problem is to determine and control the optimal inductive temperature, knowing the operating conditions, which are the speed, width, and thickness of the strip and the heating power applied to the furnace.

This brief description highlights some general features of plants in the steel industry: numerous and interconnected describing variables, complex physical phenomena only partially known in an industrial production context, nonlinear relationships, and the importance of the operators' skill. As pointed out by Harris [33], many processes, being too complex for direct modeling based on physical laws, are manually regulated by human operators before automatic controls are installed. The plant operator is able to cope with plant nonlinearities and slowly varying parameters, to respond to complex sets of noisy observations and poorly specified constraints, and to satisfy multiple subjective performance criteria. Thus one of the basic ideas of the presented "intelligent" control application is to incorporate the flexible and creative attributes of human controllers, while avoiding their associated characteristics of unreliability.

The overall equipment effectiveness (OEE) of the line can be increased by analyzing the sources of losses: stopping (failures, tool changes, preparations, and setting), slowing (light running, microfailures, production rate lowering), and product quality faults, and classifying them with respect to their economic impact. The means for eliminating or reducing these loss causes are then related to the functional hierarchical decomposition of the plant (computer integrated production): sensors, control, optimization. The approach must improve the performance over a wide range of operating conditions, as well as the fault tolerance and reconfigurability of the plant. An important aspect is to reduce the design cost of control procedures. The low-level validation of measurements (temperatures, pressures) and the supervision of sensors and operating conditions will not be described here.

Fig. 11. The galvannealing section.


Only the optimization and control aspects are presented.

4.1.1. Optimization of the alloying thermal cycle

The improvement of product quality requires the determination of the optimal set points of the thermal cycle, particularly θ_inductive, for different line speeds and product types. Metallurgical knowledge is not sufficient to explain and predict the optimal temperature of the alloying reaction. The complexity of the reaction and the numerous nonlinear relationships between all variables led to the use of learning algorithms applied to the operating points fixed by the control operators over several months.

Figure 12 summarizes the scheme used to estimate the θ_inductive temperature. The idea is to model the energy supplied to the strip, with respect to the features of the strip and the operating conditions, and then use the constraints of the thermal cycle to calculate θ_inductive. For the energy model building (cf. Fig. 13), the considered variables are the line speed, the features of the strip, and the "measured" energy, calculated from all the temperatures of the tower and the line speed, and used as the target for learning. From dynamic data, operating points are extracted to generate static databases.

Each point is validated as good or bad using several criteria fixed by the operators, with respect to the quality of the product, and only good points are kept. The remaining steady states are then separated into two sets, a modeling one of 260 points and a test one of 130 points. The process typically operates around a finite number of operating points corresponding to the different strip formats and speeds; consequently, the most suitable neural model was found to be a radial basis function network. The neural model building process is described in Ref. [37], where K-means and fuzzy C-means clustering methods are compared with the orthogonal least-squares (OLS) algorithm [28] for the determination of the number of hidden nodes (see paragraph 2.4). The final results are considered very satisfactory: the prediction errors of the inductive temperature calculated from the validation database are never greater than the absolute precision of the corresponding sensor. As shown in Fig. 14, 98% of the points are estimated with an error lower than 1.2%.

4.1.2. Control of the induction furnace

This part focuses on the alloying cycle control, and particularly on the strip temperature at the exit of the induction furnace. A power preset (P_gal) to apply to the furnace is determined using a steady-state inverse model, so as to obtain a strip temperature close to the optimal temperature estimated previously (cf. Fig. 15).

Fig. 12. Estimation of the inductive temperature.

Fig. 13. Model learning.

Fig. 14. Precision of predicted temperatures (% of points with respect to error in %).

Fig. 15. Open loop control of the inductive temperature.


The behavior of the furnace being nonlinear, a perceptron with one hidden layer of sigmoidal units and a linear output node is used to build the inverse model. Quadratic or robust criteria are employed to estimate the weights. The results are compared, for the different criteria and for various numbers of hidden nodes, by considering the minimal values of the root mean square error and the maximal absolute error on the learning and test data sets. The best model, with only four hidden nodes, is obtained with robust learning. Because of the small modeling errors, and to take into account the weak fluctuations of the unknown strip temperature at the entrance of the furnace, a control loop is implemented on the process (see Fig. 16). Note that the chosen strategy includes the possibility of disconnecting the control loop in case of measurement unavailability of θ_inductive. This allows the use of the neural inverse model alone in open loop, in order to maintain a sufficient degree of fault tolerance.

The neural learning approach allows the system to incorporate the skill of the control operators into automatic control and optimization systems, at a moderate design cost. While guaranteeing a required degree of fault tolerance, the implemented control architecture leads to a decrease in the occurrence of underalloyed products and permits a progressive reduction of operator intervention in furnace control.

4.2 Control of a coagulation process in water treatment

4.2.1. Context of the application

Water treatment involves complex physical, chemical, and biological processes that transform raw water into drinking water. In spite of important fluctuations in raw water characteristics, due to natural perturbations or occasional pollution, the quality of the drinking water produced has to be maintained at a level compatible with official standards, while minimizing operating costs. The objective of this second study, which is the result of a collaboration between Heudiasyc and Ondeo, was to build a model of the coagulation process, so as to determine the optimum quantity of chemical reagents as a function of input water quality [38]. As no model of this process was available, a neural network based system was designed for that purpose.

Figure 17 depicts the main processes in a typical plant for surface water treatment. Raw water is abstracted from the resource (a river in this case) and pumped to the treatment works. A typical plant consists of two main process units: clarification and filtration. The coagulation process, which takes place in the clarification unit, is brought about by adding a highly ionic salt (aluminum sulfate) to the water. A bulky precipitate is formed and removed as sludge. The coagulation process accounts for the elimination of most of the undesirable substances from the raw water, and hence tight monitoring and control of this process is essential. The main difficulty is to determine the optimum quantity of chemical reagent related to the raw water characteristics. Poor control leads to waste of expensive chemicals, failure to meet the water quality targets, and reduced efficiency of the sedimentation and filtration processes.

Fig. 16. Control architecture for the inductive temperature.

Fig. 17. Simplified synopsis of a water treatment plant.



4.2.2. Specific requirements

Given the high variability of the inputs and the low reliability of the available sensors, an important requirement in this application is robustness against erroneous sensor measurements or unusual water characteristics due to accidental pollution. In our system, such robustness is achieved using a modular architecture composed of two levels: a preprocessing level responsible for outlier rejection and missing data reconstruction, and a prediction level involving the determination of the optimal coagulant amount from raw water characteristics (Fig. 18). Neural network models are involved at both levels: data validation and reconstruction is carried out by a self-organizing feature map (SOM), which compares input vectors to reference patterns learned in an unsupervised manner, and prediction of the coagulant amount is performed by an MLP.

A second important requirement of the considered application is the possibility of installing the system at low cost on various sites, which necessitates a methodology for designing and training the neural networks automatically from new data, including the phases of data validation and model choice. Our system uses pruning and resampling techniques for automatic determination of the network architecture and computation of confidence bounds for the predictions.

4.2.3. Data preprocessing

Applications in the environmental domain, such as the one considered in this section, generally rely on complex sensors located at remote sites. The processing of the corresponding measurements for generating higher level information (such as predictions of optimal coagulant dosage) must therefore account for possible sensor failures and incoherent input data. This can be achieved by computing distances between input vectors and reference patterns, or prototypes.

The determination of prototypes from data in an unsupervised way may be performed using the self-organizing map (SOM) algorithm introduced by Kohonen [39]. The SOM model can be used at the same time to visualize the clusters in a data set and to represent the data on a two-dimensional map in a manner that preserves the nonlinear relations of the data items, nearby items being mapped to neighboring positions on the map. A previous application of SOMs to water quality monitoring was described in Ref. [40].

Self-organizing maps allow the system to detect atypical data or outliers by monitoring the distance between each input vector x and its closest reference vector. If this distance is greater than a specified threshold, the current sample is considered invalid. The contributions of each of the components of vector x to the distance are then examined to determine more precisely which sensors should be declared faulty. These sensor measurements are then disconnected to compute a new winning prototype with only valid parameters.

For reconstruction, each missing value of a given input variable is estimated by the value of the corresponding component of the winning prototype. In order to improve the reconstruction accuracy, a combination of the k nearest nodes is used: each missing or invalid value is estimated by a combination of the corresponding component in the k nearest prototypes. More details about this procedure can be found in Ref. [38].
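A schematic version of this validation and reconstruction step, assuming the SOM prototypes have already been learned; the array layout and threshold are our assumptions, and the simple average stands in for the combination rule of Ref. [38].

```python
import numpy as np

def validate_and_reconstruct(x, prototypes, threshold, k=3):
    """Flag an input vector as invalid if it lies too far from its winning
    prototype, and fill missing components (NaN) from the k nearest
    prototypes; distances use only the valid components of x."""
    valid = ~np.isnan(x)
    d2 = np.sum((prototypes[:, valid] - x[valid]) ** 2, axis=1)
    is_outlier = d2.min() > threshold          # atypical sample detection
    nearest = np.argsort(d2)[:k]               # k closest reference vectors
    x_rec = x.copy()
    x_rec[~valid] = prototypes[nearest][:, ~valid].mean(axis=0)  # simple average
    return x_rec, is_outlier
```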

4.2.4. Prediction of coagulant dosage

The prediction of optimal coagulant dosage from water characteristics was addressed as a nonlinear regression problem. Six variables measured continuously on the raw water (turbidity, conductivity, pH, temperature, dissolved oxygen, and UV absorption) were used as input variables. Target values for the coagulant dosing rate were provided by the results of laboratory analyses ("jar tests") performed once a day by the plant operators.

Fig. 18. Structure of the system for automatic coagulation control.


One year of raw data was available, from which a set of 1600 complete learning samples was constructed by removing erroneous and incomplete measurements and averaging the data over 1-h time intervals. A total of 1120 samples (about 70%) was used to build the model, the rest being kept as an independent test set. Among the training data, approximately 30% was left out as a validation set for optimizing the architecture.

We used a conventional MLP architecture with one hidden layer of sigmoidal units, and training was performed by minimization of the mean-squared error function. For the determination of the architecture, the optimal brain damage (OBD) pruning algorithm [17] was used. This method is summarized in Fig. 19. A large initial network is first trained using the back-propagation algorithm applied to the sum-of-squares error function. The "saliency" of each weight is then computed as a function of the second derivatives of the error function, and the weights with the lowest saliency values are deleted. The process is iterated until the cross-validation error, as estimated using an independent data set, starts to increase. In this approach, the initial number of hidden units is only required to exceed the optimal hidden layer size. It was fixed arbitrarily to 20 in our application, corresponding to 161 initial connection weights. The network complexity was then automatically reduced by the pruning procedure to a final number of 16 hidden units and 66 weights.
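In outline, with the OBD saliency of weight w_i given by s_i = h_ii w_i²/2 (h_ii being the corresponding diagonal Hessian term), the pruning loop looks as follows; train, hessian_diag, and val_error are placeholders for the back-propagation machinery, and train is assumed to keep pruned weights clamped at zero.

```python
import numpy as np

def obd_prune(weights, train, hessian_diag, val_error, frac=0.1):
    """Optimal brain damage: repeatedly delete the lowest-saliency
    weights and retrain, stopping when validation error rises."""
    w = train(weights)
    best_err = val_error(w)
    while True:
        saliency = 0.5 * hessian_diag(w) * w ** 2        # s_i = h_ii w_i^2 / 2
        n_cut = max(1, int(frac * np.count_nonzero(w)))
        cut = np.argsort(np.where(w == 0, np.inf, saliency))[:n_cut]
        w_new = w.copy()
        w_new[cut] = 0.0                                 # delete pruned connections
        w_new = train(w_new)                             # retrain remaining weights
        err = val_error(w_new)
        if err > best_err:                               # validation error increases
            return w                                     # keep previous network
        w, best_err = w_new, err
```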

To better assess the reliability of the system in on-line operation, the operators demanded that the system provide not only point estimates of the coagulant dosing rate, but also confidence intervals. Bootstrap sampling [41] was used to generate confidence intervals for the system outputs, following an approach first proposed in Ref. [42]. This technique is illustrated in Fig. 20. In this approach, b bootstrap subsets of the initial training set are used to train b MLP models using the architecture and training procedure described previously. When a vector is fed into these networks, the b outputs provide an estimate of the distribution of the target variable for the current input. Lower and upper confidence limits for the prediction related to any given input vector are then obtained by sorting these outputs and selecting, e.g., the 10% and 90% cumulative levels. The prediction accuracy and confidence bounds computed on a validation set are shown in Fig. 21.
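The procedure amounts to resampling the training set with replacement, training one network per replicate, and reading off percentiles of the ensemble outputs; a sketch with a generic fit(X, y) function returning a predictor (the 10%/90% levels mirror the text).

```python
import numpy as np

def bootstrap_intervals(X, y, X_new, fit, b=100, lo=10, hi=90, seed=0):
    """Train b models on bootstrap resamples and return the lo% and hi%
    percentiles of their predictions as confidence limits."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(b):
        idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
        predict = fit(X[idx], y[idx])               # one model per bootstrap subset
        preds.append(predict(X_new))
    preds = np.asarray(preds)                       # shape (b, n_new)
    return np.percentile(preds, lo, axis=0), np.percentile(preds, hi, axis=0)
```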

Extensive field testing on a pilot site has demonstrated the efficiency of the approach, and widespread dissemination to other sites is currently planned. Expected benefits are treated water of a more consistently high quality, together with improved security of service, as the system will respond reliably and effectively over long periods. Significant savings in coagulant usage have already been observed.

Fig. 21. Actual (thick line) versus predicted (thin line) coagulant dosage with the neural network model on test data, and confidence interval (shaded region).

Fig. 19. Learning and pruning algorithm.

Fig. 20. Bootstrap sampling for the generation of prediction intervals.


The performance of the network is obviously dependent on the quality and completeness of the data available for training the system. Consequently, continuous updating of the training data during operational use is expected to further improve the performance of the system.

5. CONCLUSION

The use of artificial neural networks for control is motivated by their universal approximation capabilities. The feedforward one hidden layer perceptron, with linear activation at the output, and the radial basis function network provide simple and flexible structures for nonlinear modeling. As illustrated by the two applications described above, neural learning is particularly useful when complex nonlinear phenomena, only partially known in an industrial production context, are involved. In such cases, the empirical knowledge of control operators can be learned, particularly for determining optimal set points. So, while most of the classical dynamic control schemes can be extended by using neural networks, neural techniques seem particularly adapted to control levels higher than classical dynamic control, and yield control applications with interesting properties: operation over a wide range of conditions, improved fault tolerance, and moderate design cost.

References

[1] Chen, S. and Billings, S., Int. J. Control 56, 319–346 (1992).
[2] Sjöberg, J. et al., Automatica 31, 1691–1724 (1995).
[3] Narendra, K. S. and Parthasarathy, K., IEEE Trans. Neural Netw. 1, 4–27 (1990).
[4] Hunt, K., Sbarbaro, D., Żbikowski, R., and Gawthrop, P. J., Automatica 28, 1083–1112 (1992).
[5] Poggio, T. and Girosi, F., Proc. IEEE 78, 1481–1497 (1990).
[6] Ljung, L., System Identification: Theory for the User, 2nd edition. Prentice-Hall, Englewood Cliffs, NJ, 1999.
[7] Sjöberg, J. et al., Automatica 31, 1691–1724 (1995).
[8] Nørgaard, M., System identification and control with neural networks. Ph.D. thesis, Dept. of Automation, Technical University of Denmark, 1996.
[9] Sjöberg, J. and Ngia, L. S. H., in Nonlinear Modeling: Advanced Black-Box Techniques, edited by J. Suykens. Kluwer Academic Publishers, Boston, 1998, Chap. 1, pp. 1–28.
[10] Battiti, R., IEEE Trans. Neural Netw. 5, 537–550 (1994).
[11] He, X. and Asada, H., American Control Conference, San Francisco, CA, 1993.
[12] Cibas, T., Fogelman Soulié, F., Gallinari, P., and Raudys, S., Neurocomputing 12, 223–248 (1996).
[13] Fahlman, S. E. and Lebiere, C., in Advances in Neural Information Processing Systems, edited by D. S. Touretzky. Morgan Kaufmann, San Mateo, CA, 1990, Vol. 2, pp. 524–532.
[14] Lengellé, R. and Denoeux, T., Neural Networks 9, 83–97 (1996).
[15] Reed, R., IEEE Trans. Neural Netw. 4, 740–747 (1994).
[16] Jutten, C. and Fambon, O., 3rd European Symposium on Artificial Neural Networks ESANN'95, Brussels, Belgium, 129–140, 1995.
[17] Le Cun, Y., Denker, J. S., and Solla, S., in Advances in Neural Information Processing Systems (Ref. [13]), pp. 598–605.
[18] Hassibi, B. and Stork, D. G., in Advances in Neural Information Processing Systems, edited by S. J. Hanson. Morgan Kaufmann, San Mateo, CA, 1993, Vol. 5, pp. 164–171.
[19] Ji, C. and Psaltis, D., Neural Networks 10, 1133–1141 (1997).
[20] Parisi, R., Di Claudio, E., Orlandi, G., and Rao, B. D., IEEE Trans. Neural Netw. 7, 1450–1460 (1996).
[21] Thomas, P., Bloch, G., and Humbert, C., 6th European Symposium on Artificial Neural Networks ESANN'98, Bruges, Belgium, 279–284, 1998.
[22] Shepherd, A. J., Second-Order Methods for Neural Networks. Springer, London, 1997.
[23] Yao, X., Proc. IEEE 87, 1423–1447 (1999).
[24] Bloch, G., Thomas, P., and Theilliol, D., Neurocomputing 14, 85–99 (1997).
[25] Denoeux, T. and Lengellé, R., Neural Networks 6, 351–363 (1993).
[26] Thomas, P. and Bloch, G., 15th IMACS World Congress on Scientific Computation, Modelling and Applied Mathematics, edited by A. Sydow. Berlin, 1997, Vol. 4, pp. 295–300.
[27] Renders, J. M., Algorithmes Génétiques et Réseaux de Neurones. Hermès, Paris, 1995.
[28] Chen, S., Cowan, C. F. N., and Grant, P. M., IEEE Trans. Neural Netw. 2, 302–309 (1991).
[29] Orr, M. J. L., Recent advances in radial basis function networks. Technical Report, www.ed.ac.uk/~mjo/papers/recad.ps, Institute for Adaptive and Neural Computation, Edinburgh University, UK, 1999.
[30] Cybenko, G., Math. Control, Signals, Syst. 2, 303–314 (1989).
[31] Barron, A. R., IEEE Trans. Inf. Theory 39, 930–945 (1993).
[32] Narendra, K. S. and Parthasarathy, K., IEEE Trans. Neural Netw. 1, 4–27 (1990).
[33] Harris, C. J., Advances in Intelligent Control. Taylor & Francis, London, 1994.
[34] Agarwal, M., IEEE Control Syst. Mag. 17, 78–84 (1997).
[35] Hunt, K., Sbarbaro, D., Żbikowski, R., and Gawthrop, P. J., Automatica 28, 1083–1112 (1992).
[36] Nørgaard, M., Neural network based system identification toolbox. Technical Report 95-E-773, Institute of Automation, Technical University of Denmark, 1995.
[37] Bloch, G., Sirou, F., Eustache, V., and Fatrez, P., IEEE Trans. Neural Netw. 8, 910–918 (1997).
[38] Valentin, N. and Denoeux, T., Intell. Data Anal. 5, 23–39 (2001).
[39] Kohonen, T., Self-Organizing Maps. Springer-Verlag, Heidelberg, 1995.
[40] Trautmann, T. and Denoeux, T., ICNN'95, Perth, Australia, 73–78, 1995.
[41] Efron, B. and Tibshirani, R. J., An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
[42] Lippmann, R. P., Kukolich, L., and Shahian, D., in Advances in Neural Information Processing Systems, edited by G. Tesauro. MIT Press, Cambridge, MA, 1995, Vol. 7, pp. 1055–1062.

Gérard Bloch received the Ph.D. degree in automatic control in 1988 from University Henri Poincaré, Nancy, France. He is currently a professor with the Ecole Supérieure des Sciences et Technologies de l'Ingénieur de Nancy at the same university. He is with the Centre de Recherche en Automatique de Nancy, and his research interests include nonlinear system identification and observation, process diagnosis, and the application of neural networks to automatic control.

Thierry Denoeux graduated in 1985 as an engineer from the Ecole Nationale des Ponts et Chaussées in Paris, and received a doctorate from the same institution in 1989. He is currently a full professor with the Department of Information Processing Engineering at the Université de Technologie de Compiègne, France. His current research interests concern fuzzy data analysis, belief functions theory and, more generally, the management of imprecision and uncertainty in data analysis, pattern recognition, and information fusion.
