
218 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 12, NO. 2, APRIL 2004

Online Global Learning in Direct Fuzzy Controllers

Hector Pomares, Ignacio Rojas, Jesús González, Miguel Damas, Begoña Pino, and Alberto Prieto

Abstract—A novel approach to achieve real-time global learning in fuzzy controllers is proposed. Both the rule consequents and the membership functions defined in the premises of the fuzzy rules are tuned using a one-step algorithm, which is capable of controlling nonlinear plants with no prior offline training. Direct control is achieved by means of two auxiliary systems: the first one is responsible for adapting the consequents of the main controller’s rules to minimize the error arising at the plant output, while the second auxiliary system compiles real input–output data obtained from the plant. The system then learns in real time from these data, taking into account not the current state of the plant but rather the global identification performed. Simulation results show that this approach leads to an enhanced control policy thanks to the global learning performed, avoiding overfitting.

Index Terms—Complete rule-based fuzzy systems, fuzzy control, global learning, real-time direct control.

I. INTRODUCTION

REAL-TIME control is one of the most important topics in the realm of intelligent systems [1], [2]. As is well known, fuzzy-logic controllers have proved successful in a number of applications where no analytical model of the plant to be controlled is available [3], [4], [5]. Among the different ways of implementing a fuzzy controller, adaptive fuzzy controllers are, at least in principle, able to deal with unpredictable or unmodeled behavior, which enables them to outperform nonadaptive control policies when the real implementation is accomplished [6].

The first adaptive fuzzy controller, called the linguistic self-organizing controller (SOC), was introduced by Procyk and Mamdani in 1979 [7]; they proposed a learning algorithm capable of generating and modifying control rules by assigning a credit or reward value to the individual control action(s) that make a major contribution to the present performance.

In more recent approaches, adaptive fuzzy systems have focused on merging concepts and techniques from conventional adaptive systems into a fuzzy systems framework. Most notable is the work of Wang [8], where the author deals with plants governed by a certain class of differential equations whose bounds must be known. The algorithms proposed by Wang also need offline pretraining before working in real time. A similar approach, but considering discrete-time control, was proposed in [9].

Another attractive approach is the well-known model reference adaptive control (MRAC), which arose from research into how to improve self-organizing controllers by using certain ideas from conventional control [10]–[13]. MRAC algorithms employ a reference model, i.e., a model of how you would like the plant to behave, to provide closed-loop performance feedback for synthesizing and tuning the fuzzy controller.

Manuscript received September 18, 2000; revised June 7, 2002 and July 7, 2003. This work was supported in part by the Spanish CICYT Project DPI2001-3219.

The authors are with the Department of Computer Architecture and Computer Technology, University of Granada, E-18071 Granada, Spain (e-mail: [email protected]).

Digital Object Identifier 10.1109/TFUZZ.2004.825081

Recently, several researchers have explored different algorithms in the adaptive fuzzy systems area. Y.-M. Park et al. [14] proposed a self-organizing fuzzy logic controller using a fuzzy auto-regressive moving average model (FARMA). In their approach, using input and output history at every sampling step, the fuzzy rules are generated for each state-space domain. To avoid memory overflow and improper rules formed during the initial stages, the input domain is partitioned a priori and therefore the number of rules is limited. Using a reference model, the new rule whose output is closer to the reference output in a given domain replaces the old one.

There have also been other approaches to adaptive fuzzy control concerned with the adaptation of input and/or output scale factors of the fuzzy controller. Tuning the scale factor of a fuzzy system is similar to the gain tuning of a classical proportional–integral–derivative controller. Changing the scale factors gives elasticity to the control policy without drastic control changes (for example, to deal with an incorrect control policy). In [15], a self-tuning algorithm to adjust the scaling factors and improve the control rules of a fuzzy system is proposed. The adjustment of the scaling factors is performed offline, i.e., when the control ends, evaluating the overshoot, rise time and the amplitude of the oscillation about the set point. In [16], input scale factors are tuned online for a fixed rule base in order to achieve a helical trajectory in the error space converging to the origin. Online tuning of the output scale factor is used in [17] to improve the control policy in the first iterations, when no initial knowledge exists in the fuzzy controller.

Algorithms such as those proposed in [15], [17], and [18] are able to work with very limited plant information, mainly qualitative knowledge, adapting the fuzzy controller in real time based on the actual error committed at the plant output. Nevertheless, although they function well, these algorithms are only capable of modifying the consequents of the rules, not the parameters that define the membership functions (MFs). In [19], however, based on the work of [20], the premises of the rules can also be tuned without knowledge of the plant equations. In this interesting approach, the controller output error is used to provide real input/output data concerning the system to be controlled. Nevertheless, this method has the disadvantage of requiring the system to have been previously controlled by another controller, as it does not attempt to reduce the plant output error directly.

The approach described in this paper, which is an improved and expanded version of [21], synergistically incorporates the advantages of the aforementioned methods and attempts to resolve their disadvantages. Requiring practically no knowledge of the system to be controlled and by means of two auxiliary systems, the algorithm is able to exploit virtually all the information that can be obtained during the online control of the plant and manages to control the system in real time with no offline pretraining. This is accomplished by introducing intelligence in the control system for learning, in real time, a rule-based inverse model of the plant. After this introduction, Section II states the mathematical basis of the problem to be solved. In Section III the architecture of the proposed approach is presented, which is then explained in detail in Sections IV and V. Some simulations to clarify the main characteristics of the proposed method are presented in Section VI. Finally, conclusions are drawn in Section VII.

1063-6706/04$20.00 © 2004 IEEE

POMARES et al.: ONLINE GLOBAL LEARNING IN DIRECT FUZZY CONTROLLERS 219

II. STATEMENT OF THE PROBLEM

The goal of this paper is to achieve real-time control of a system which, in general, may be nonlinear and whose differential equations are unknown. Furthermore, we assume there is no model of the plant available, so there cannot be any offline pretraining of the main controller parameters. Starting from this “void” fuzzy controller, we attempt to optimize the controller’s rules and the parameters defining it in order to bring the state of the plant to the desired value in the shortest possible time.

The system or plant to be controlled is usually expressed in the form of its differential equations or, equivalently, by its difference equations, provided these are obtained from the former with the use of a short enough sampling period. In mathematical terms

$$y(k+d) = f\big(y(k), \ldots, y(k-p), u(k), \ldots, u(k-q)\big) \tag{1}$$

where $y(k)$ and $u(k)$ are, respectively, the plant output and the controller output, $d$ is the delay of the plant, and $f$ is an unknown continuous and differentiable function.

The restriction we are going to impose on the plant is that there always exists a control policy capable of bringing the output to the desired value (within the operation range). This means that there must not be any state in which the output variable does not depend on the control input; if such a state existed, no control policy would be possible for it. Therefore, the partial derivative of the plant output with respect to the control signal must never vanish and, as the plants are, in particular, differentiable and continuous with respect to the control input, this derivative must have a constant sign, i.e., the plant must be monotonic with respect to the control signal. Thus, we can assume there exists a function $F$ such that the control signal given by

$$u(k) = F\big(\mathbf{x}(k)\big) \tag{2}$$

with

$$\mathbf{x}(k) = \big(r(k), y(k), \ldots, y(k-p), u(k-1), \ldots, u(k-q)\big) \tag{3}$$

and $r(k)$ being the desired output at instant $k$, is capable of reaching the set point target after $d$ instants of time, i.e., $y(k+d) = r(k)$.

In the proposed algorithm, no information is needed on the equations determining the plant, although it is necessary to know the monotonicity of its output with respect to the control signal, the delay of the plant (which can nearly always be taken as 1 if we use a sampling period that is not very small) and the inputs that have a significant influence on the plant output.

It should be noted that the monotonicity assumption concerns only the partial derivative of the plant output with respect to the control signal, but does not impose anything about the dependency of the plant output with respect to other signals, namely, previous values of the plant output.
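The role of the monotonicity assumption can be illustrated with a small sketch. The plant below is hypothetical (it is not the plant used in the paper); it is only assumed to increase with the control signal for every state. Under that assumption a unique control value reaches any reachable target, so an inverse function exists and can be located by bisection. The names `plant` and `inverse_control` and the actuator limits are illustrative.

```python
import math


def plant(y, u):
    """Hypothetical plant, increasing in u for every y (not the paper's plant)."""
    return 0.5 * math.sin(y) + u + u ** 3


def inverse_control(y, target, u_min=-2.0, u_max=2.0, tol=1e-10):
    """Find u such that plant(y, u) equals target.

    Bisection is valid only because the plant is monotonically
    increasing in u; the actuator range [u_min, u_max] is assumed.
    """
    lo, hi = u_min, u_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if plant(y, mid) < target:
            lo = mid   # output too small: a larger control is needed
        else:
            hi = mid   # output too large: a smaller control is needed
    return 0.5 * (lo + hi)
```

The controller described in the paper never has access to such a plant model; the sketch only shows why a well-defined inverse can be assumed to exist.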

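As usual in control studies, in our approach we use a complete rule-based fuzzy controller [22], with each rule defined by

$$\text{IF } x_1 \text{ is } X_1^{i_1} \text{ AND } \ldots \text{ AND } x_N \text{ is } X_N^{i_N} \text{ THEN } u = R_{i_1 i_2 \ldots i_N} \tag{4}$$

where $X_v^{i_v}$ is the $i_v$th MF of variable $x_v$, $N$ is the number of input variables and $R_{i_1 i_2 \ldots i_N}$ is a scalar value.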

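The fuzzy inference method uses the product as T-norm and the centroid method with sum-product operator as the defuzzification strategy. The strength or $\alpha$-level of rule $i_1 i_2 \ldots i_N$ is then calculated by

$$\alpha_{i_1 i_2 \ldots i_N}(\mathbf{x}) = \prod_{v=1}^{N} \mu_{X_v^{i_v}}(x_v) \tag{5}$$

Thus, the output of our fuzzy controller is given by

$$u = F(\mathbf{x}) = \frac{\sum_{i_1=1}^{n_1} \cdots \sum_{i_N=1}^{n_N} R_{i_1 i_2 \ldots i_N}\, \alpha_{i_1 i_2 \ldots i_N}(\mathbf{x})}{\sum_{i_1=1}^{n_1} \cdots \sum_{i_N=1}^{n_N} \alpha_{i_1 i_2 \ldots i_N}(\mathbf{x})} \tag{6}$$

where $n_v$ is the number of MFs defined in variable $x_v$.

Though no specific MF type is required in the methodology proposed, in this paper we use triangular functions, characterized by the following MF:

$$\mu(x) = \frac{x - a}{c - a}\, U(x; a, c) + \frac{b - x}{b - c}\, U(x; c, b) \tag{7}$$

where $c$, $a$ and $b$ are the central point and the left and right bounds of the triangular function, respectively. $U$ is the step function defined by

$$U(x; p, q) = \begin{cases} 1 & \text{if } p \le x < q \\ 0 & \text{otherwise} \end{cases} \tag{8}$$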
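As a minimal sketch of the inference scheme described above (product T-norm, triangular MFs, weighted average of scalar consequents), the following Python code may help. All names are illustrative; the helper `uniform_partition`, which spaces the MF centers evenly, and the treatment of the MF endpoints are assumptions, not details taken from the paper.

```python
from itertools import product


def triangular(x, left, center, right):
    """Triangular MF with value 1 at `center` and support (left, right)."""
    if left < x <= center:
        return (x - left) / (center - left)
    if center < x < right:
        return (right - x) / (right - center)
    return 0.0


def uniform_partition(lo, hi, n):
    """n triangular MFs with evenly spaced centers on [lo, hi] (assumed layout)."""
    step = (hi - lo) / (n - 1)
    return [(lo + (i - 1) * step, lo + i * step, lo + (i + 1) * step)
            for i in range(n)]


def activations(x, partitions):
    """Strength of every active rule: product of the MF degrees of each input."""
    degrees = [[triangular(xv, *mf) for mf in part]
               for xv, part in zip(x, partitions)]
    alphas = {}
    for idx in product(*(range(len(p)) for p in partitions)):
        a = 1.0
        for v, i in enumerate(idx):
            a *= degrees[v][i]
        if a > 0.0:
            alphas[idx] = a
    return alphas


def controller_output(x, partitions, consequents):
    """Weighted average of the scalar consequents of the active rules."""
    alphas = activations(x, partitions)
    total = sum(alphas.values())
    if total == 0.0:
        return 0.0  # input lies outside the support of every MF
    return sum(a * consequents[idx] for idx, a in alphas.items()) / total
```

With two inputs and five MFs each, `consequents` would be a dictionary with 25 entries keyed by `(i1, i2)`; at most four rules are active for any input.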

It must be noted that, according to (6), the control problem could be regarded as a problem of function approximation [23], [24] when input–output (I/O) data of the true inverse plant function are available. Nevertheless, tasks such as those discussed in this paper, i.e., real-time control starting from no knowledge, are much more complex because the approximation of (6) to the real inverse plant function must be done while working in real time and attempting to direct the plant output to the target set point at every instant. For this reason, in these cases, control performance is measured not by the mean-square error (MSE) of the function approximated by the controller, but rather by the MSE, over a number of iterations or epochs (Num_epochs), between the set point and the plant output measured after $d$ instants of time, $d$ being the delay of the plant

$$\mathrm{MSE} = \frac{1}{\text{Num\_epochs}} \sum_{k=1}^{\text{Num\_epochs}} \big(r(k) - y(k+d)\big)^2 \tag{9}$$

Fig. 1. Block diagram of the proposed control architecture.
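The performance index described above compares each set point with the plant output observed after the plant delay. A minimal sketch, assuming the set points and outputs are stored in plain Python lists indexed by sampling instant (the function name `control_mse` is illustrative):

```python
def control_mse(set_points, outputs, delay):
    """Mean-square error between r(k) and y(k + delay).

    Only instants k for which y(k + delay) has been recorded are used.
    """
    pairs = [(set_points[k], outputs[k + delay])
             for k in range(len(outputs) - delay)]
    return sum((r - y) ** 2 for r, y in pairs) / len(pairs)
```

Note that the error at instant `k` is attributed to the output measured `delay` steps later, not to the output measured at `k` itself.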

III. ARCHITECTURE OF THE PROPOSED FUZZY CONTROLLER

Fig. 1 depicts the block diagram of the proposed control architecture. The larger block is the main fuzzy controller, where special emphasis has been laid on the knowledge base, which is divided into two sets of parameters: the rule premises, i.e., the definition of the MFs, and the scalar rule consequents. Coupled to the main controller there are two auxiliary systems, which are in charge of finding suitable parameters for the fuzzy controller from the control evolution:

The Adaptation Block (A-Block) is capable of a coarse tuning of the rule consequents, using information such as the monotonicity of the plant output with respect to the control signal and the delay of the plant. This block is necessary in the first iterations of the control process when no initial parameter values are available.

The Global Learning Block (GL-Block) is responsible for the fine-tuning of the MFs and the rule consequents, using I/O data collected from the plant evolution.

Finally, the input selection block provides the input values needed by every block and collects I/O data in real time; these data are subsequently used by the learning process, as we show in Section V.

IV. A-BLOCK

The main problem when real-time control strategies must be faced lies in the fact that, as the internal functioning of the system to be controlled is unknown, we are unaware of how to modify the controller’s parameters. To use a gradient-based algorithm, we would have to compute $\partial y / \partial u$, an unknown derivative. Moreover, in the case of long sampling periods, such a derivative cannot be approximated by the finite-difference quotient $\Delta y / \Delta u$.

Nevertheless, as stated before, we do have the information regarding the monotonicity of the plant, which allows us to obtain the right direction in which to move the consequents of our rules. Thus, in a plant with a delay that is shorter than the sampling period (i.e., the output at instant $k$ is a direct consequence of the control input at the previous instant), if the control input $u(k-1)$ provides a plant output $y(k) > r(k-1)$, we know that a lower input should have been used, assuming plant output increases directly with the control signal (alternatively, we should have used a larger $u(k-1)$ if the monotonicity were of the opposite sign). This is the basis of the SOC proposed by Procyk and Mamdani [7] and has the advantage of needing neither a model of the plant nor the desired control output at each instant of time. Common approaches based on SOC use a fuzzy auxiliary system in charge of this modification of the consequents of the fuzzy rules [15], [17], [18].

As stated above, the monotonicity of the plant provides valuable information on how to adapt the consequents of the fuzzy rules. To modify these, we only need to take into account the rules actually used to obtain $u(k-d)$ as the fuzzy controller output. In the approach proposed by Singh [18] in 1998, all the activated rules (at most four, since he used two-dimensional functions with a triangular partition configuration [25]) were modified by the same amount. In [15] and [17], however, the reward/penalty of the rules was modulated by their activation degree, i.e., each rule is modified according to its degree of responsibility in obtaining the current state, outperforming Singh’s approach.

In order to accurately build a self-organizing controller, the auxiliary system should possess information on how the plant output varies with respect to the control signal for every possible operation region. This entails knowing the Jacobian matrix of the plant function but, unfortunately, this information is normally unavailable in a control task. To overcome this problem, the aforementioned authors use as an auxiliary system a fuzzy controller based on heuristically built metarules constructed using the plant output error and error rate as inputs. Nevertheless, these metarules assume that step functions are required as set points to the plant and are not suitable when these are time variable.

From this, it is evident that with the kind of information available from the plant, only a relatively coarse control can be applied to the system. In this paper, the adaptation block shown in Fig. 1 is responsible for this coarse control, implementing a modified self-organizing controller. This block evaluates the current state of the plant and proposes the correction of the rules responsible for the existence of such a state, either as a reward or as a penalty. Mathematically, if the output of the adaptation block is denoted by $e_y(k)$, the modification proposed at instant $k$ for rule $i_1 i_2 \ldots i_N$ [see (4)] is given by

$$\Delta R_{i_1 i_2 \ldots i_N}(k) = \alpha_{i_1 i_2 \ldots i_N}(k-d)\, e_y(k) \tag{10}$$

where, as in [17], this modification is proportional to $\alpha_{i_1 i_2 \ldots i_N}(k-d)$, the degree with which the rule was activated in achieving the control output $u(k-d)$ now being evaluated at instant $k$. As it is necessary to wait $d$ iterations in order to evaluate $u(k-d)$, implementation of this adaptation method requires the definition of a queue with a depth given by the delay of the plant; this is where the degrees of activation of the rules are stored.

Now, assuming that plant output increases with control input, if this output is smaller than the required set point then $u(k-d)$ would have to be made bigger (alternatively, we should have used a smaller control signal if the monotonicity were of the opposite sign). This implies that the output of the adaptation block, $e_y(k)$, must be proportional to the current plant error:

$$e_y(k) = C\,\big(r(k-d) - y(k)\big) \tag{11}$$

with $C$ being negative when the plant output decreases with the control signal. In the previous expression, $r(k-d)$ is the set point required of the plant output at instant $k$ and $y(k)$ is the current plant output. Note that it would be incorrect to use $r(k)$ in (11), as the rules that are activated at instant $k-d$ serve to achieve the desired value $r(k-d)$ and not $r(k)$.

It is important to note that $e_y(k)$ cannot depend on the error rate (as in other approaches) since the set point may be time-variable, nor on other input variables since no further information on the plant is supposed to be available.

The next task is to assign a proper value to the proportionality parameter in (11). As is apparent from the proposed control architecture in Fig. 1, both the A-Block and the GL-Block are in charge of tuning the consequents of the fuzzy rules. In order to make them work harmoniously, we must concentrate on what we expect of the adaptation block. As stated above, with the relatively small amount of information available from the plant, the adaptation block is only capable of a coarse tuning of the fuzzy rules, necessary in the first steps of the control evolution. As the process advances, the adaptation block must give way to the global learning block since only this is capable of fine-tuning the control actions. Thus, the influence of the adaptation block must decrease with time, for example, in the following form:

(12)

where the parameter that sets how fast this influence decays should not be too small, to let the rule consequents take suitable values, nor too big, which would make the learning process too slow. A typical value of this parameter might be in the range of 1000–5000 epochs. The other factor in (12) is a scale factor that is necessary to avoid out-of-range modifications in the fuzzy rules. This factor can be determined offline by

(13)

where the quantities involved are the range in which the plant output is going to operate and the range of the controller’s actuator.

To sum up, the adaptation block of Fig. 1 is basically an evaluator of the plant output and provides a coarse tuning of the consequents of the rules, with a reward/penalty value. Its main characteristic is that it can work on the basis of chiefly qualitative information from the plant (mainly, its monotonicity), adapting the main controller fuzzy rules according to the following final expression:

(14)
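The adaptation mechanism of this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the gain is held constant here (the paper lets the influence of the block decrease with time), its sign is assumed to encode the plant monotonicity, and the class and argument names are hypothetical. The queue stores the rule activations until the plant has answered the corresponding control action.

```python
from collections import deque


class AdaptationBlock:
    """Coarse reward/penalty tuning of the rule consequents (sketch)."""

    def __init__(self, delay, gain):
        self.delay = delay      # plant delay in sampling periods
        self.gain = gain        # assumed constant; negative for decreasing plants
        self.pending = deque()  # (activations, set point) awaiting evaluation

    def step(self, consequents, alphas, set_point, plant_output):
        """Store the current activations and correct the rules used `delay` steps ago."""
        self.pending.append((alphas, set_point))
        if len(self.pending) <= self.delay:
            return  # no stored action has reached the plant output yet
        old_alphas, old_set_point = self.pending.popleft()
        # Error between the set point requested `delay` steps ago and the
        # output measured now, scaled by the gain.
        error = self.gain * (old_set_point - plant_output)
        for rule, alpha in old_alphas.items():
            # Each rule is corrected in proportion to its past activation.
            consequents[rule] += alpha * error
```

At every sampling instant `step` is called with the activations of the rules just fired, the current set point and the current plant output; `consequents` is modified in place.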

V. GLOBAL LEARNING BLOCK

In the previous section, we noted the difficulty of using a gradient-based algorithm in the control process due to the impracticability of computing the partial derivative of the plant output with respect to the control signal for every plant state. In this section, we show how it is possible to use the gradient descent methodology, based on the error in the control output instead of that in the plant output, in order to achieve a fine-tuning of the main controller parameters. For this purpose, we base our approach on a modified version of the algorithm proposed by Andersen et al. [19]. The main characteristic of the methodology presented in this section is that, analogously to the adaptation algorithm of the previous section, it does not rely on a plant model or need to know its differential equations or require a reference model [26].

When the controller provides a control signal $u(k)$ at instant $k$ and the output $y(k+d)$ is evaluated $d$ sampling periods later, the error committed at the plant output is not the only information that may be obtained. Regardless of whether or not this was the intended response, we now know that, if the same transition from the same initial conditions but now with $r(k) = y(k+d)$ is ever required again, the optimal control signal is precisely $u(k)$. Therefore, at every sampling time, we do get an exact value of the true inverse function of the plant [26].

One way to use this online information concerning the true inverse plant function is to store the recently obtained data values in a memory $M$. As this memory has a finite capacity, the data being received must be filtered in some way. The best way to do this is to define a grid in the input space and to store the most recent datum belonging to each of the hypercubes defined by such a grid, substituting a pre-existing datum in the hypercube. By these means, the memory contains a uniform representation of the inverse function of the plant, and with every step of the gradient descent algorithm, the learning process is performed in a global way, taking into consideration the whole input space and thus eliminating overfitting. Since some operation regions are more important than others, a weight parameter is assigned to each of the hypercubes, indicating the number of times a datum belonging to that hypercube is collected. Thus, the more important dynamic regions will have a greater influence on the fine-tuning of the main controller parameters.
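A minimal sketch of such a memory is given below. It assumes a regular grid with the same number of cells along every input axis and clamps inputs that fall outside the declared limits; both choices, and all names, are illustrative and not taken from the paper.

```python
import math


class GridMemory:
    """Keeps the most recent datum per grid cell plus a visit count (sketch)."""

    def __init__(self, lows, highs, cells_per_axis):
        self.lows = lows
        self.highs = highs
        self.cells = cells_per_axis
        self.data = {}     # cell index -> (input vector, control value)
        self.weights = {}  # cell index -> number of data collected in the cell

    def cell_of(self, x):
        """Index of the hypercube containing x, clamped to the grid limits."""
        index = []
        for xv, lo, hi in zip(x, self.lows, self.highs):
            i = math.floor((xv - lo) / (hi - lo) * self.cells)
            index.append(min(max(i, 0), self.cells - 1))
        return tuple(index)

    def store(self, x, u):
        """Replace the datum of the cell and increase its weight."""
        cell = self.cell_of(x)
        self.data[cell] = (tuple(x), u)
        self.weights[cell] = self.weights.get(cell, 0) + 1
```

Because each cell holds a single datum, the memory size is bounded by the number of cells regardless of how long the controller runs, while the weights keep track of how often each operating region is visited.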

In mathematical terms, the control signal exerted at the plant at instant $k$ is given by [see (6)]

$$u(k) = F\big(\mathbf{x}(k); \Theta(k)\big) \tag{15}$$

where $\Theta(k)$ represents the set of parameters that define the controller at instant $k$ (rules plus MFs) and $\mathbf{x}(k)$ is given by (3).


Fig. 2. (a) Plant to be controlled. (b) Inverse plant function.

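After $d$ iterations, we obtain at the plant output the value $y(k+d)$. If we now replace the input vector $\mathbf{x}(k)$ by

$$\hat{\mathbf{x}}(k) = \big(y(k+d), y(k), \ldots, y(k-p), u(k-1), \ldots, u(k-q)\big) \tag{16}$$

an expression that only differs from $\mathbf{x}(k)$ in the first element, where $y(k+d)$ replaces $r(k)$, we obtain the following datum belonging to the actual inverse plant function:

$$\big(\hat{\mathbf{x}}(k),\, u(k)\big) \tag{17}$$

This datum is stored in the corresponding position of $M$, increasing its associated weight.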
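The relabeling step described above amounts to replacing the first element of the controller input (the set point) by the output the plant actually reached. A tiny sketch, with an illustrative function name and the assumption that the input vector is stored as a tuple whose first element is the set point:

```python
def inverse_datum(x, u, y_reached):
    """Build a sample of the inverse plant function from one control step.

    x         : controller input used at instant k, set point first
    u         : control signal applied at instant k
    y_reached : plant output observed after the plant delay
    """
    x_hat = (y_reached,) + tuple(x[1:])  # only the first element changes
    return x_hat, u
```

For example, if the controller asked for 1.0 from state 0.2 with control 0.4 and the plant reached 0.7, the stored datum says that 0.4 is the correct control for going from 0.2 to 0.7.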

To perform the global learning process on the basis of the data stored in $M$, we must evaluate, for each datum, the output given by the controller for each possible input, thus obtaining an error signal in the output of the controller

$$e_u(m) = u_m - \hat{u}_m \tag{18}$$

where $u_m$ and $\hat{u}_m$ are the desired output and that obtained by the current parameters of the main fuzzy controller, respectively. Index $m$ runs through the size of $M$. It is important to note that, although $\hat{u}_m$ is produced by the controller, it is not applied to the plant. Its only purpose is to calculate $e_u(m)$.

Fig. 3. Initial conditions: (a) MFs. (b) Rule consequents.

Fig. 4. Control performance versus number of epochs for several conditions. Top three plots: adaptation block only. Four lower plots: adaptation + global learning for different memory grid sizes.

Fig. 5. (a) Initial control evolution. (b) Function implemented by the main controller after 200 epochs.

In each iteration $k$, it is necessary to compute the error in the output of the controller for each of the valid data stored, where the magnitude to be minimized is given by

(19)

Therefore, the parameters of the main fuzzy controller are optimized in each iteration in the following way:

(20)

Fig. 6. (a) Final control evolution obtained. (b) Controller output after theglobal learning process (15000 epochs).

which can be computed taking into account that

(21)

The expression for these derivatives can be found in Appendix A. Finally, the learning factor can be given by (see Appendix B)

(22)

with .

By this procedure we have eliminated the difficulty arising when using a gradient-based algorithm based on the plant output error, whereby it is impossible to compute $\partial y / \partial u$ because of the unknown internal functioning of the plant. Instead, we use the error in the controller output, for which we can calculate the partial derivatives, since we do know its internal functioning. However, it should be noted that the global learning block would not work without the existence of the adaptation block, as it is the latter that is initially capable of obtaining truly useful data from the plant.
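The idea of descending the gradient of the controller output error, rather than the plant output error, can be sketched for the rule consequents alone. Several points are assumptions made for illustration: the cost is taken as the weighted sum of squared controller output errors over the stored data, the learning rate is a fixed constant instead of the expression of Appendix B, and the MF parameters (whose derivatives are given in Appendix A) are left untouched. Each memory entry is assumed to hold the rule activations of the stored input, the stored control value and the cell weight.

```python
def learning_step(consequents, memory, rate):
    """One gradient step on the weighted squared controller-output error.

    memory: list of (alphas, target_u, weight), where alphas maps each
    active rule to its activation for the stored input.
    Returns the cost evaluated before the update.
    """
    grad = {rule: 0.0 for rule in consequents}
    cost = 0.0
    for alphas, target_u, weight in memory:
        total = sum(alphas.values())
        u_hat = sum(a * consequents[r] for r, a in alphas.items()) / total
        err = target_u - u_hat
        cost += weight * err ** 2
        for r, a in alphas.items():
            # The controller output is a weighted average, so its
            # derivative with respect to consequent r is alpha_r / total.
            grad[r] += -2.0 * weight * err * a / total
    for r in consequents:
        consequents[r] -= rate * grad[r]
    return cost
```

Every derivative used here depends only on the controller itself, which is why no knowledge of the plant is needed; the plant enters solely through the data stored in the memory.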


Fig. 7. Final MFs. (a) Variable r(k). (b) Variable y(k).

It should be noted here that the claim for “global learning” is made based on the fact that we are not using only the last data point from the actual inverse plant function but, instead, we consider a memory that contains a uniform representation of the inverse function of the plant whose weight parameters reflect the importance of each of the dynamic regions. In case the set point, $r(k)$, does not change much (for example, $r(k)$ = constant), the algorithm takes this fact into consideration by increasing the weight factor belonging to that region of operation. In this case, the global inverse function will be erratic but, as long as the desired plant output is a constant, there is no need for learning other operation regions (since the user seems to be only concerned with that constant value for the plant output).

VI. SIMULATIONS

To show the applicability of the proposed methodology and to gain an insight into the advantages of the global learning feature of the algorithm, three simulation examples are used in this section.

A. Global Learning Using Random Set-Points

Consider the system described by the following difference equation [27]:

(23)

which corresponds to a nonlinear plant both with respect to the control signal and the variable to be controlled. In this subsection we use random set points in the range . The function implemented by the plant and its inverse are plotted in Fig. 2(a) and (b).

Fig. 8. Control evolution with fixed parameters for the test pattern.

As input variables, we use the desired plant output r(k) and its current output y(k), with five triangular MFs homogeneously distributed in the range as depicted in Fig. 3(a). All rule consequents are initially taken as zero; thus, the function implemented by the main fuzzy controller at epoch 0 is that of Fig. 3(b). For this example, can be roughly taken as 0.5 from expression (13). This is a positive value since the plant output increases with respect to the control signal (positive monotonicity). Finally, we use .
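The initial controller described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the range [-1, 1] is a placeholder (the actual range is not recoverable from the text), and the product T-norm with weighted-average defuzzification is assumed. The function names are hypothetical.

```python
import numpy as np

def triangular_memberships(x, centers):
    """Membership degrees of scalar x in a triangular partition.

    Each MF peaks at one entry of `centers` (sorted) and falls linearly to
    zero at the neighbouring centers, so the degrees always sum to one.
    """
    centers = np.asarray(centers, dtype=float)
    x = np.clip(x, centers[0], centers[-1])
    mu = np.zeros(len(centers))
    j = np.searchsorted(centers, x, side="right") - 1
    j = min(j, len(centers) - 2)        # keep x = centers[-1] in the last interval
    left, right = centers[j], centers[j + 1]
    mu[j + 1] = (x - left) / (right - left)
    mu[j] = 1.0 - mu[j + 1]
    return mu

def controller_output(r, y, centers_r, centers_y, R):
    """Output of a complete rule base with consequents R[i, j] (assumed form)."""
    firing = np.outer(triangular_memberships(r, centers_r),
                      triangular_memberships(y, centers_y))  # product T-norm
    return float(np.sum(firing * R) / np.sum(firing))

centers = np.linspace(-1.0, 1.0, 5)     # five evenly spaced MFs, placeholder range
R0 = np.zeros((5, 5))                   # all rule consequents initially zero
u0 = controller_output(0.3, -0.7, centers, centers, R0)
```

With all consequents at zero the controller returns zero for every input pair, which corresponds to the flat surface the text attributes to epoch 0.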

In Fig. 4, the control performance measured as the mean-square error (9) every 2000 epochs is plotted for several different conditions. The upper three plots correspond to the control performance using only the adaptation block for different values of the parameter (see (12)). As can be seen from these graphs, the final control performances are very similar ( – ) in spite of using rather different values of this parameter. The main difference resides in convergence speed: the bigger the parameter, the faster the convergence. Nevertheless, excessively large values can provoke undesirable oscillations in the rule consequents, which may cause instabilities.
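A performance curve of this kind can be produced by averaging the squared tracking error over consecutive windows of epochs. The sketch below assumes that expression (9) is the ordinary mean of squared differences between desired and actual plant output; the function name `windowed_mse` and the toy data are hypothetical.

```python
import numpy as np

def windowed_mse(r, y, window):
    """Mean-square error between r and y over consecutive windows.

    Trailing samples that do not fill a complete window are discarded.
    """
    e2 = (np.asarray(r, dtype=float) - np.asarray(y, dtype=float)) ** 2
    n = len(e2) // window
    return e2[: n * window].reshape(n, window).mean(axis=1)

# Toy example: the error is 1 in the first window and 2 in the second.
mse = windowed_mse(np.zeros(6), [1, 1, 1, 2, 2, 2], window=3)
```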

The four lowest plots of Fig. 4 correspond to the proposed algorithm for different grid sizes of the storage memory, using [see (12)]

(24)

Fig. 9. Control performance vs. number of epochs for several memory grid sizes. (a) With weight factors. (b) Without weight factors.

The figure makes plain the great improvement in control performance achieved by the global learning block. When a 1 × 1 memory is used, i.e., we store only the last datum from the plant, the controller’s performance is very poor. With higher memory sizes, it collects data from many other operation regions, thus avoiding the overfitting problem and, besides, it makes parameter learning faster. It is important to note that in this example, 5 × 5, 10 × 10, and 15 × 15 memory arrays obtain very similar results ( – ).

To see how the control process develops, the initial control evolution is plotted in Fig. 5(a). The dots represent the desired plant values and the solid lines the plant output obtained. Fig. 5(b) shows the controller function after 200 epochs, where it may be seen that the adaptation block has already been capable of roughly tuning the main rule consequents of the fuzzy controller. When the learning process is practically finished, the control evolution is as shown in Fig. 6(a), from which it can be concluded that the control actions are virtually perfect. The controller output for this case is plotted in Fig. 6(b), which to the naked eye is indistinguishable from the optimal inverse plant function of Fig. 2(b). The final tuned MFs are shown in Fig. 7(a) and (b).

As random desired values have been used in this example, we can test the final main fuzzy controller obtained by fixing its parameters (deactivating both the adaptation and global learning blocks). In Fig. 8, a 150-epoch test pattern composed of a random signal, a sine wave and a ramp is used. From this figure, it can be stated that the methodology proposed in this paper has been capable of finding nearly optimal values of the fuzzy controller, which has now learned globally the plant it has been required to control.
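A test pattern of this kind could be generated as in the following sketch. The split into three equal 50-epoch segments, the single sine period, and the range [-1, 1] are assumptions made for illustration; the text only states the total length and the three components. The function name is hypothetical.

```python
import numpy as np

def make_test_pattern(low, high, seg_len=50, seed=0):
    """Concatenate a random, a sine and a ramp segment of set points."""
    rng = np.random.default_rng(seed)
    mid, amp = 0.5 * (high + low), 0.5 * (high - low)
    t = np.arange(seg_len)
    random_part = rng.uniform(low, high, seg_len)             # random set points
    sine_part = mid + amp * np.sin(2 * np.pi * t / seg_len)   # one sine period
    ramp_part = np.linspace(low, high, seg_len)               # linear ramp
    return np.concatenate([random_part, sine_part, ramp_part])

pattern = make_test_pattern(-1.0, 1.0)   # placeholder range, 150 epochs in total
```

Running the fixed-parameter controller on such a sequence checks that what was learned from random set points carries over to signals it was never trained on.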

B. Global Learning Using a Desired Periodical Pattern

To verify how the global learning strategy proposed in this paper can also deal with nonrandom set points, consider the following system:

(25)

with

(26)

which is even more difficult to control than the previous system. Let us now suppose that the plant output is required to describe a periodic sine wave of 100 iterations in the range . It must be noted that within this range, the plant output has a definite monotonicity with respect to the control signal (in this case, positive).

For this example, the same input variables and number of MFs (but now distributed in the range ) are used. The same initial rule values, and are also used.

In Fig. 9(a), the control performance of the whole algorithm for several memory grid values is depicted. Final MSE values using only the adaptation block (not shown in this figure) are in the range [0.015–0.020]. With a memory of only one storage element (1 × 1) the MSE achieved is 0.009. When larger sizes of storage memory are used the performance is considerably improved; the greater the size of M, the better the performance (and the overfitting is reduced). This result is achieved thanks to the weight factors introduced in expression (19). It may seem that using bigger sizes of M would lead to a poorer performance, since the learning action would take into account a bigger number of operation regions which are not the ones in which the plant is working at a given instant. However, by using weight parameters, the more important control regions are assigned bigger weights, thus allowing the global learning algorithm to concentrate on the most important zones. Fig. 9(b) depicts MSE performances without the use of the weight factors. In these cases, once the influence of the adaptation block fades [due to the exponential factor in (24)] the control performance deteriorates because the global-learning block is trying to learn all the control regions as if they were all equally important.

The initial control evolution can be seen in Fig. 10(a), where the difficulty of this control problem can already be predicted. At the end of the learning stage, the evolution we find is that depicted in Fig. 10(b). The control action (also plotted) indicates, as we have anticipated, the particular actions that should be exerted at the plant input to make its output follow the smooth sine wave. The final MFs are also represented in Fig. 11(a) and (b).

Fig. 10. (a) Initial control evolution. (b) Final control evolution (15 × 15 grid size).

Fig. 11. Final MFs. (a) Variable r(k). (b) Variable y(k).

TABLE I
MSE (×1000), UP TO 3 SIGNIFICANT DIGITS, OBTAINED EVERY 1000 ITERATIONS FOR DIFFERENT INITIAL VALUES OF THE SCALE FACTOR C (FORGETTING FACTOR = 5000)

C. Noisy Plants

As a final example, let us now consider the system described by the following difference equations:

(27)

Fig. 12. Effect of the forgetting factor for the first 1000 iterations of the process (C = 1.0).

Fig. 13. (a) Initial control evolution. (b) Final control evolution (C = 1.0; forgetting factor = 5000; 5 × 5 × 5 grid size).

being a random Gaussian signal with standard deviation equal to 5% of the range of , which will again be a periodic sine wave of 100 iterations, now in the range . This noise signal is added in order to simulate the possible effect of sensor disturbances or deviations of the plant behavior due to exogenous factors.
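The noise model described above can be sketched as follows. The range [-1, 1] is a placeholder, and the noise is simply added to a signal for illustration; where exactly it enters the plant equation (27) is not recoverable from the text. The function names are hypothetical.

```python
import numpy as np

def sine_reference(n_steps, low, high, period=100):
    """Periodic sine set point spanning [low, high] with the given period."""
    k = np.arange(n_steps)
    mid, amp = 0.5 * (high + low), 0.5 * (high - low)
    return mid + amp * np.sin(2 * np.pi * k / period)

def add_sensor_noise(signal, low, high, rng, fraction=0.05):
    """Add zero-mean Gaussian noise whose std is `fraction` of the range."""
    sigma = fraction * (high - low)
    return signal + rng.normal(0.0, sigma, size=np.shape(signal))

rng = np.random.default_rng(0)
r = sine_reference(100_000, -1.0, 1.0)        # placeholder range
noisy = add_sensor_noise(r, -1.0, 1.0, rng)   # sigma = 0.05 * 2 = 0.1
```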

Again, we will start the algorithm from an empty set of rules (all consequents set to zero), with five MFs homogeneously distributed in the range for the three main controller’s inputs: , and a 5 × 5 × 5 memory array.

Table I reports the MSE every 1000 iterations (i.e., every ten presentations of the desired sine wave) obtained for different initial values of the scale factor C, with the forgetting factor fixed at 5000 iterations. From the table, we can see the following.

• For scale factors within the range [0.5, 3.0], the evolution is very similar and the controller rapidly learns how to successfully deal with the plant. The optimum value seems to be , although any value within that range is perfectly good.

• For very small values of the scale factor (0.1 and 0.25), the control policy is still stable and robust but the learning process is very slow. For those values, the adaptation block cannot adapt the initially zero rule consequents quickly enough, and this negatively influences the controller’s performance.

• Finally, for large values of the scale factor C, the rule corrections are so large at the beginning of the algorithm that the control policy seems to be unstable. Nevertheless, once the scale factor becomes smaller (due to the effect of the forgetting factor), the control system stabilizes and we finally obtain a controller performance as good as that obtained with the previous scale factors.
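The stabilizing effect noted in the last item can be illustrated with a simple decay law. Expression (24) is not recoverable from the text, so the form C(k) = C0 · exp(−k / τ) used below is an assumption, with `c0` the initial scale factor and `tau` standing in for the forgetting factor; both names and the function are hypothetical.

```python
import math

def scale_factor(k, c0, tau):
    """Assumed exponential decay of the adaptation scale factor."""
    return c0 * math.exp(-k / tau)

# Under this assumed law, a large initial scale factor has dropped to about
# 37% of its value after tau iterations and to about 5% after 3 * tau.
schedule = [scale_factor(k, c0=3.0, tau=5000) for k in (0, 5000, 15000)]
```

Under such a law the adaptation block dominates early on, when coarse corrections are needed, and hands over to the global learning block as its corrections shrink.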

As far as the forgetting factor is concerned, its influence is very low for this example. Fig. 12 represents the control performance measured as the MSE (9) every 100 iterations (every complete sine wave pattern) for the first 1000 iterations of the algorithm using different forgetting factors, with C fixed to 1.0. As can be seen from the figure, the value of the forgetting factor only slightly affects the convergence speed of the algorithm, being the value , the optimum one. It should be noted that for very small values of the forgetting factor, oscillations in the MSE could take place because the adaptation block is given little time to coarsely adapt the rule consequents.

Finally, to see how the control process develops, the initial control evolution is plotted in Fig. 13(a), for the case of C = 1.0 and a forgetting factor of 5000. Again, the dots represent the desired plant values and the solid lines the plant output obtained. Fig. 13(b) shows the control evolution we find after 5000 epochs. As can be seen from both figures, despite the fact that the algorithm is started from an empty rule base and the plant is affected by noise, the control policy is capable of extracting all necessary information from the plant in real time so as to finally follow the desired sine wave.

VII. CONCLUSION

This paper presents a new methodology for the automatic implementation of real-time fuzzy controllers to control nonlinear systems when a model of the plant is unavailable or its differential equations unknown. The principal feature of the algorithm is that it requires no prior offline training and is capable of performing global learning of the inverse plant function using any set of desired trajectories (randomly selected or patterned). By means of two auxiliary blocks, namely the A-Block and the GL-Block, it is possible to exploit virtually all the information that can be obtained during the online control of the plant. Finally, the enhanced performance of the proposed method has been verified with three different control problems. For future research, it could be very interesting to study how to include an extra block in the algorithm for selecting the input variables of the main controller and assigning the optimum number of MFs to each of them.

APPENDIX A

In this appendix, we report the expressions used to employ the gradient descent algorithm presented in Section V. If the parameter with respect to which the partial derivative must be calculated is the consequent of one of the fuzzy rules, namely , from (6), we have

(28)

(29)

If, alternatively, the parameter belongs to the definition of the th MF of variable , see (29), and, as only the MFs of variable can depend on parameter , we have

(30)

Finally, in this paper, we are using triangular MFs defined by (7). For this case, we have

(31)

(32)

APPENDIX B

In this appendix, we develop the expressions from which the learning rate in (22) has been deduced.

Let us define (a) as the initial set of parameters and (b) as those obtained after the gradient step. From (20), we have

(33)

and we want to obtain a proper value of which achieves, these being equal if and only if .

If we assume that

(34)

then we can approximate by its first-order Taylor series expansion

(35)

Inserting (33) in the previous expression

(36)

and for any . But must also satisfy assumption (34). Then, choosing

(37)

we obtain

(38)

Thus, must be a positive number such that . Typical values are in the range [0.01–0.1].
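For readers following the argument, the standard first-order reasoning behind a normalized gradient step of this kind can be written out as below. This is a hedged sketch and not a reconstruction of expressions (33)–(38): the notation is assumed, with J the cost, w_a and w_b the parameters before and after the step, η the learning factor, and ρ a hypothetical name for the small positive constant.

```latex
% Assumed notation (not taken from the source): J, w_a, w_b, \eta, \rho
\begin{align}
  w_b &= w_a - \eta \, \nabla J(w_a)
      && \text{gradient step} \\
  J(w_b) &\approx J(w_a) + \nabla J(w_a)^{\mathsf{T}} (w_b - w_a)
      = J(w_a) - \eta \, \lVert \nabla J(w_a) \rVert^{2}
      && \text{first-order Taylor expansion} \\
  \eta &= \rho \, \frac{J(w_a)}{\lVert \nabla J(w_a) \rVert^{2}}
      \;\Longrightarrow\;
      J(w_b) \approx (1 - \rho) \, J(w_a)
      && \text{normalized step, } 0 < \rho < 1
\end{align}
```

Under these assumptions, each step removes roughly a fraction ρ of the current cost, and keeping ρ small keeps the step within the region where the first-order approximation is reasonable.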

REFERENCES

[1] P. Antsaklis, “Defining intelligent control,” IEEE Contr. Syst. Mag., vol. 14, pp. 4–5, 58–66, Mar. 1994.

[2] P. Antsaklis, “Intelligent learning control,” IEEE Contr. Syst. Mag., vol. 14, pp. 5–7, Apr. 1995.

[3] O. Yagishita, O. Itoh, and M. Sugeno, “Application of fuzzy reasoning to the water purification process,” in Industrial Application of Fuzzy Control, M. Sugeno, Ed. Amsterdam, The Netherlands: North-Holland, 1985, pp. 19–40.

[4] J. A. Bernard, “Use of rule-based system for process control,” IEEE Contr. Syst. Mag., vol. 8, pp. 3–13, May 1988.

[5] Y. Kasai and Y. Morimoto, “Electronically controlled continuously variable transmission,” in Proc. Int. Congr. Transportation Electronics, Dearborn, MI, Oct. 1988, pp. 69–85.

[6] R. Ordoñez, J. Zumberge, J. T. Spooner, and K. M. Passino, “Adaptive fuzzy control: Experiments and comparative analyses,” IEEE Trans. Fuzzy Syst., vol. 5, pp. 167–188, May 1997.

[7] T. Procyk and E. Mamdani, “A linguistic self-organizing process controller,” Automatica, vol. 15, no. 1, pp. 15–30, 1979.

[8] L. X. Wang, Adaptive Fuzzy Systems and Control. Design and Stability Analysis. Upper Saddle River, NJ: Prentice-Hall, 1994.

[9] F.-C. Chen and H. K. Khalil, “Adaptive control of a class of nonlinear discrete-time systems using neural networks,” IEEE Trans. Automat. Contr., vol. 40, pp. 791–801, May 1995.

[10] K. F. Fong and A. P. Loh, “MRAC control of nonlinear systems using neural networks with recursive least squares adaptation,” in Proc. IEEE Int. Conf. Neural Networks, 1993, pp. 529–533.

[11] J. R. Layne and K. M. Passino, “Fuzzy model reference learning control,” J. Intell. Fuzz. Syst., vol. 4, pp. 33–47, 1996.

[12] K. Narendra and A. Annaswamy, Stable Adaptive Systems. Upper Saddle River, NJ: Prentice-Hall, 1989.

[13] I. Skrjanc, S. Blazic, and D. Matko, “Direct fuzzy model-reference adaptive control,” Int. J. Intell. Syst., vol. 17, pp. 943–963, 2002.

[14] Y. Park, U. Moon, and K. Y. Lee, “A self-organizing fuzzy logic controller for dynamic systems using a fuzzy auto-regressive moving average (FARMA) model,” IEEE Trans. Fuzzy Syst., vol. 3, pp. 75–82, Feb. 1995.

[15] M. Maeda and S. Murakami, “A self-tuning fuzzy controller,” Fuzzy Sets Syst., vol. 51, pp. 29–40, 1992.

[16] C. Chou and H. Lu, “A heuristic self-tuning fuzzy controller,” Fuzzy Sets Syst., vol. 61, pp. 249–264, 1994.

[17] I. Rojas, H. Pomares, F. J. Pelayo, M. Anguita, E. Ros, and A. Prieto, “New methodology for the development of adaptive and self-learning fuzzy controllers in real time,” Int. J. Approx. Reason., vol. 21, pp. 109–136, 1999.

[18] Y. P. Singh, “A modified self-organizing controller for real-time process control applications,” Fuzzy Sets Syst., vol. 96, pp. 147–159, 1998.

[19] H. C. Andersen, A. Lotfi, and A. C. Tsoi, “A new approach to adaptive fuzzy control: The controller output error method,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 686–691, Aug. 1997.

[20] J. S. Albus, “Data storage in the cerebellar model articulation controller (CMAC),” Trans. ASME, J. Dyna. Syst., Meas., Control, vol. 97, pp. 228–233, Sept. 1975.

[21] H. Pomares, I. Rojas, F. J. Fernández, M. Anguita, E. Ros, and A. Prieto, “A new approach for the design of fuzzy controllers in real time,” in Proc. 8th Int. Conf. Fuzzy Systems, Seoul, Korea, Aug. 1999, pp. 522–526.

[22] C. C. Lee, “Fuzzy logic in control systems: Fuzzy logic controller - Part I, II,” IEEE Trans. Syst., Man, Cybern., vol. 20, pp. 404–435, Mar. 1990.

[23] I. Rojas, H. Pomares, J. Ortega, and A. Prieto, “Self-organized fuzzy system generation from training examples,” IEEE Trans. Fuzzy Syst., vol. 8, pp. 23–36, Feb. 2000.

[24] H. Pomares, I. Rojas, J. Ortega, J. Gonzalez, and A. Prieto, “A systematic approach to a self-generating fuzzy rule-table for function approximation,” IEEE Trans. Syst., Man, Cybern., vol. 30, pp. 431–447, June 2000.

[25] E. H. Ruspini, “A new approach to clustering,” Info. Control, no. 15, pp. 22–32, 1969.

[26] D. S. Reay, “Comments on ‘A new approach to adaptive fuzzy control: The controller output error method’,” IEEE Trans. Syst., Man, Cybern. B, vol. 29, pp. 545–546, Aug. 1999.

[27] H. C. Andersen, “The controller output error method,” Ph.D. dissertation, Univ. Queensland, Queensland, Australia, 1998.

Hector Pomares received the M.A.Sc. degree in electrical engineering, the M.Sc. degree in physics, and the Ph.D. degree, all from the University of Granada, Granada, Spain, in 1995, 1997, and 2000, respectively.

He is currently an Associate Professor with the Department of Computer Architecture and Computer Technology, the University of Granada. His current areas of research interest are in the fields of function approximation and online control using adaptive and self-organizing fuzzy systems.

Ignacio Rojas received the M.S. degree in physics and electronics and the Ph.D. degree, both from the University of Granada, Granada, Spain, in 1992 and 1996, respectively.

He was with the University of Dortmund, Dortmund, Germany, as an Invited Researcher from 1993 to 1995. During 1998, he was a Visiting Researcher at the BISC Group, University of California, Berkeley. He is currently an Associate Professor with the Department of Computer Architecture and Technology, the University of Granada. His research interests are in the fields of hybrid systems and the combination of fuzzy logic, genetic algorithms and neural networks, and financial forecasting.

Jesús Gonzalez was born in 1974. He received the M.Sc. degree in computer science and the Ph.D. degree, both from the University of Granada, Granada, Spain, in 1997 and 2001, respectively.

He is currently an Assistant Professor in the Department of Computer Architecture and Computer Technology, the University of Granada. His current areas of research interest are in the fields of function approximation using radial basis function neural networks, fuzzy systems, and evolutionary computation.

Miguel Damas received the M.Sc. degree in computer science and the Ph.D. degree, both from the University of Granada, Granada, Spain, in 1992 and 2000, respectively.

Currently, he is Professor of Computer Engineering and Electronic Engineering with the Department of Architecture and Computer Technology, the University of Granada. His main research interests are in the fields of industrial control and communications, as well as algorithms and parallel architectures for optimization problems.

Begoña del Pino received the Ph.D. degree in computer science from the University of Granada, Granada, Spain, in 1999.

She is an Associate Professor with the Department of Architecture and Computer Technology, the University of Granada. She teaches courses in technology of computers and design of microelectronic circuits. Her main research interests lie in VLSI implementation and modeling of neuro-fuzzy systems and hardware/software codesign of embedded systems.

Alberto Prieto received the B.Sc. degree in electronic physics from the Complutense University, Madrid, Spain, and the Ph.D. degree from the University of Granada, Granada, Spain, in 1968 and 1976, respectively.

From 1969 to 1970, he was with the “Centro de Investigaciones Técnicas de Guipuzcoa” and at the “E.T.S.I Industriales,” San Sebastián, Spain. From 1971 to 1984, he was Director of the Computer Centre and, from 1985 to 1990, Dean of the Computer Science and Technology studies of the University of Granada, where he is currently a Full Professor and Director of the Department of Computer Architecture and Technology. His research interests are in the area of intelligent systems.

Dr. Prieto is a nominated Member of the IFIP WG 10.6 (Neural Computer Systems) and Chairman of the Spanish RIG of the IEEE Neural Networks Council. He received the Award of Ph.D. Dissertations and the Citema Foundation National Award in 1976.

